This paper deals with analysis of variance with fuzzy data (ANOVAF) based on the permutation method. The permutation method is a nonparametric method introduced by Heap and Johnson for data for which a normal distribution cannot be assumed. We propose two different approaches to testing hypotheses about fuzzy means using the empirical distribution. To compare the results, several distances are considered especially using
Analysis of variance (ANOVA) is a widely used statistical technique in various fields such as design of experiments, survey design, quality improvement and many other industrial applications. In general, the purpose of ANOVA is to test for significant differences of means among more than two independent populations. In ANOVA, the experimental data are assumed to be drawn from normal distributions having equal variances. In many practical situations, however, the underlying distribution is far from normal. In these situations, a nonparametric method can be used as an alternative approach. In addition, the data sometimes cannot be recorded as precise values because of ambiguous information or linguistic structure. An appropriate way of solving problems like these is to apply the concepts of fuzzy set theory.
ANOVA using fuzzy set theory concepts has been studied by several authors. De Garibay [1] considered one-way fuzzy ANOVA and compared it with other classical techniques. Konishi et al. [2] proposed an ANOVA method that can treat fuzzy interval data using the moment correction. Wu [3] solved one-way fuzzy ANOVA using the h-level set and the notions of pessimistic degree and optimistic degree. ANOVA for fuzzy data was further developed by Gonzalez-Rodriguez et al. [4] and Montenegro et al. [5]. They presented an exact one-way ANOVA testing procedure for the case of normal fuzzy random variables and, on the other hand, large-sample tests for simple fuzzy random variables. A bootstrap approach to ANOVA for fuzzy-valued sample data was introduced in Gil et al. [6] and Montenegro et al. [7]. Lubiano and Trutschnig [8] used the R package SAFD (Statistical Analysis of Fuzzy Data). Also, Jiryaei et al. [9] applied the least squares method.
In this paper, we discuss an ANOVA model with fuzzy data, applying the permutation method to compare the fuzzy mean responses. Permutation tests are significance tests based on permutation resamples drawn from the original data. Since permutation resamples are drawn without replacement, in contrast to bootstrap samples, which are drawn with replacement, each permutation resample includes all of the original data. If the number of all possible permutations is large, then for computational reasons we can use a random permutation test. In this paper we perform the permutation test based on Monte Carlo simulation to determine how many permutation samples are required to provide a sufficiently accurate p-value.
This paper is organized as follows. In Section 2, some preliminary concepts required to develop the main results are presented. In Section 3, numerical examples are presented to illustrate the ANOVA model with fuzzy data applying the permutation test under several distances that are special cases of the ρ-distance.
In this section, we briefly describe the main theoretical tools used in the proposed ANOVA model based on the permutation method.
Some basic concepts of fuzzy theory are provided as follows:
A fuzzy subset Ã of the set of real numbers
The α-cut of a fuzzy number Ã is a finite closed interval,
where S^{n–1} is the (n – 1)-dimensional unit sphere in
The ρ-distance for any two fuzzy numbers Ã and B̃ is defined as
where S = S^{n–1} × [0, 1] × S^{n–1} × [0, 1] and K is a symmetric and positive definite kernel.
Let Ã = (a^{l}, a^{c}, a^{r}) and B̃ = (b^{l}, b^{c}, b^{r}) be triangular fuzzy numbers, where a^{c} is the center of Ã and a^{l}, a^{r} are the left and right spreads of Ã. With special choices of the kernel, the ρ-distance reduces to several distances as follows:
The ρ-distance reduces to the well-known Diamond distance [11].
The distance by Ming et al. [12]
is included in the ρ-distance.
Bertoluzza et al. [13] defined the distance
where f_{Ã}(α, λ) = λ sup A_{α} + (1 − λ) inf A_{α}, and W, ϕ are weighting measures.
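As a minimal sketch, the helper functions below show how the α-cut, the function f_{Ã}(α, λ) defined above, and a Diamond-type squared distance could be computed for a triangular fuzzy number. Two assumptions are made that the text does not state explicitly: a triple is stored as (lower endpoint, peak, upper endpoint), which is how the observations in the milk-pouch example appear to be recorded, and the Diamond distance is taken as the sum of squared differences of the three defining points. All function names are hypothetical.

```python
def alpha_cut(a, alpha):
    """Return the alpha-cut [inf A_alpha, sup A_alpha] of a triangular fuzzy number.

    Assumption: `a` is stored as (lower endpoint, peak, upper endpoint).
    """
    low, peak, high = a
    return (low + alpha * (peak - low), high - alpha * (high - peak))


def f_convex(a, alpha, lam):
    """f_A(alpha, lam) = lam * sup A_alpha + (1 - lam) * inf A_alpha."""
    inf_a, sup_a = alpha_cut(a, alpha)
    return lam * sup_a + (1 - lam) * inf_a


def diamond_sq(a, b):
    """Squared Diamond-type distance (assumed form): sum of squared
    differences of the three defining points of the two fuzzy numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Two observations taken from the milk-pouch data of Example 2
A = (8, 9, 14)
B = (11, 14, 18)
print(alpha_cut(A, 0.5))
print(f_convex(A, 0.5, 0.5))
print(diamond_sq(A, B))
```

The Ming and Bertoluzza distances are not sketched here because their exact forms and weighting measures are not recoverable from the text.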
The traditional one-factor ANOVA model can be stated as follows:

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \dots, r, \quad j = 1, \dots, n_i,$$

where Y_{ij} is the value of the response variable in the jth trial for the ith factor level or treatment, μ_{i} is the mean of the ith factor level, and the errors ε_{ij} are assumed to be independent and normally distributed with equal variances under random sampling. ANOVA is a statistical method used to test differences between group means. The relevant hypotheses are

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_r \qquad \text{versus} \qquad H_a: \text{not all } \mu_i \text{ are equal}.$$

The number of cases for the ith factor level is denoted by n_{i} and the total number of cases in the study is denoted by N, where

$$N = \sum_{i=1}^{r} n_i.$$

The test statistic is

$$f = \frac{\sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 / (r - 1)}{\sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 / (N - r)},$$

where Ȳ_{i.} is the mean of observations within factor level i, and Ȳ.. is the grand mean (i.e., the mean of all N observations). Under H_{0}, f follows an F distribution with r – 1 and N – r degrees of freedom. There are two equivalent decision rules for testing the given hypotheses. Since f is distributed as F_{r–1,N–r} when H_{0} holds and large values of f lead to the conclusion H_{a}, the decision rule that controls the level of significance at α is as follows:

$$\text{if } f \le F_{\alpha;\, r-1,\, N-r}, \text{ conclude } H_0; \qquad \text{if } f > F_{\alpha;\, r-1,\, N-r}, \text{ conclude } H_a,$$

where F_{α;r–1,N–r} is the (1–α)100 percentile of the F distribution and f is the observed F-statistic value. The other decision rule is the p-value approach. The p-value is calculated as the area under the null sampling distribution of F to the right of f. If the p-value is smaller than the level of significance, reject the null hypothesis that all means are the same for the different groups.
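The classical critical-value rule can be illustrated with a short sketch. The data below are only illustrative crisp values (the central values of the milk-pouch observations of Example 2), the critical value F_{0.05;3,6} = 4.7571 is the one quoted later in the paper, and the function name `one_way_f` is hypothetical.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F statistic (between-group mean square
    divided by within-group mean square) for a list of samples."""
    r = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (ss_between / (r - 1)) / (ss_within / (n_total - r))


# Illustrative crisp data: r = 4 groups, N = 10 observations
groups = [[9, 14], [11, 10, 11], [15, 14, 17], [15, 21]]
F_CRIT = 4.7571  # F_{0.05;3,6}, value quoted in the paper

f = one_way_f(groups)
reject = f > F_CRIT  # critical-value decision rule
print(f"f = {f:.4f}, reject H0: {reject}")
```

For these illustrative values f is about 4.59, which is below 4.7571, so the classical rule would not reject H_{0} at α = 0.05.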
In many studies, it is assumed that the data are drawn from a normally distributed population. However, when the sample size is small, it is not appropriate to assume a normal population. Even with a large data set, it is uncommon for the data to come from an exactly normal population. Hence, nonparametric methods have been used as alternatives. Some nonparametric methods, however, use only the ranks of the data, which results in a loss of the information in the raw data. To overcome this problem, Heap [14] and Johnson [15] proposed permutation methods, which employ the raw data and do not depend on the population distribution. A permutation method is a distribution-free method that can be used when the test statistic does not follow a well-known distribution. For example, when testing the difference between the means of two populations, a permutation method may use the t-statistic without assuming that it follows a t-distribution. Instead, it obtains an empirical distribution of the test statistic directly from the data set: the statistic is calculated for all possible permutations of the observed data, and the p-value is determined by locating the observed test statistic within this empirical distribution.

This method could not be widely used in the past because of the huge amount of computation involved, but it has been actively studied recently owing to advances in computing. As the sample size increases, the number of possible permutations grows very rapidly, so that it becomes almost impossible to apply the exact permutation method, which requires the empirical distribution over all possible permutations. In this case, a random permutation method can be applied, which uses randomly chosen permutations to generate an approximate empirical distribution.
The general permutation method proceeds as follows:
1. Choose a statistic for testing the hypothesis.
2. Calculate the test statistic from the observed data.
3. Generate the permutation data sets (b = 1, · · · , B).
4. Calculate the test statistic for each permutation data set, and find the empirical distribution.
5. Calculate the p-value of the observed test statistic in the empirical distribution.
6. Test the hypothesis using the p-value.
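The six steps above can be sketched for the two-sample case mentioned earlier. This is a minimal illustration, not the authors' implementation: the statistic (absolute difference of means), the toy data, and the function names are all assumptions chosen to keep the example small enough for exact enumeration.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def exact_permutation_test(x, y):
    """Exact two-sample permutation test using |mean(x) - mean(y)|."""
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(mean(x) - mean(y))                   # steps 1 and 2
    count = total = 0
    for idx in combinations(range(len(pooled)), n_x):   # step 3: every split
        chosen = set(idx)
        perm_x = [pooled[i] for i in idx]
        perm_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stat = abs(mean(perm_x) - mean(perm_y))         # step 4
        count += stat >= observed - 1e-12               # tolerance for float ties
        total += 1
    return observed, count / total                      # step 5


observed, p_value = exact_permutation_test([12, 15, 19], [8, 9, 10])
alpha = 0.05
print(observed, p_value, p_value < alpha)               # step 6
```

With only 20 possible splits, two of them (the observed split and its mirror image) give a statistic at least as large as the observed one, so the two-sided p-value is 2/20 = 0.1 and H_{0} is not rejected at α = 0.05.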
Consider the one-way ANOVA model with fuzzy data (ANOVAF).
where μ̃_{i} is the fuzzy mean of the ith population and ε_{ij} is an error term without any distributional assumption. Given r populations with fuzzy data, we construct the hypotheses for the test as follows:
As in the traditional ANOVA model, the following property holds in ANOVAF. The sum of squares of the distance deviations about the fuzzy grand mean can be decomposed into two sums, the first of which represents the within-group sum of squares and the second the between-group sum of squares:
where
If the errors are assumed to be normally distributed, the test statistic follows an F distribution with r – 1 and N – r degrees of freedom [5]. In this study, we test the hypothesis using the permutation method, which does not require restrictive assumptions such as normality. The proposed method proceeds as follows:
1. Calculate the F-value from the observed fuzzy data.
2. Generate all possible permutations from the data X̃ = { X̃_{11}, · · · , X̃_{1n_{1}}, X̃_{21}, · · · , X̃_{rn_{r}} }. Let b = 1, 2, · · · , B index a specific permutation of X̃, where B = N!/(n_{1}! · · · n_{r}!).
3. Calculate the F-value from the bth permutation sample and denote it by F^{b}. Repeat for b = 1, · · · , B.
4. Find the empirical permutation distribution of the test statistic.
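To get a feel for how quickly the number of permutations B grows, the sketch below counts the distinct assignments of N observations to r groups of sizes n_1, ..., n_r as N!/(n_1! · · · n_r!). This counting formula is an assumption, but it agrees with the 25,200 permutations reported in the examples for N = 10 observations; the group sizes (2, 3, 3, 2) are those of the milk-pouch data in Example 2, and the function name is hypothetical.

```python
from math import factorial


def n_permutations(group_sizes):
    """Number of distinct ways to assign sum(group_sizes) observations
    to labelled groups of the given sizes (multinomial coefficient)."""
    total = factorial(sum(group_sizes))
    for n_i in group_sizes:
        total //= factorial(n_i)  # each partial quotient is an integer
    return total


print(n_permutations([2, 3, 3, 2]))   # design of Example 2
print(n_permutations([5, 5, 5, 5]))   # a modestly larger balanced design
```

The first call gives 25,200, while four groups of five observations already give more than 10^10 arrangements, which is why exact enumeration quickly becomes impractical.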
Now, we present two different approaches to the hypothesis test. The first approach uses the p-value calculated as the proportion of permutation samples whose F-value is greater than the F-value obtained from the observations.
where I(·) is the indicator function, equaling ‘1’ if the condition in parentheses is true, and ‘0’ otherwise. This permutation test is denoted by PT1.
The second approach uses the p-value calculated as the proportion of permutation samples whose F-value is greater than the critical point of the F distribution. This permutation test is denoted by PT2.
If the p-value is smaller than the level of significance, reject the null hypothesis that all means are the same for the different groups.
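The two approaches can be sketched on the milk-pouch data of Example 2 by enumerating every arrangement exactly. This is a sketch under stated assumptions rather than the authors' code: the distance is taken to be a Diamond-type sum of squared component differences, the fuzzy mean is taken componentwise, the PT1 count uses "greater than or equal" so that the observed arrangement counts itself, and all function names are hypothetical.

```python
from itertools import combinations

# Milk-pouch observations (Example 2), one list per machine
DATA = [
    [(8, 9, 14), (11, 14, 18)],
    [(7, 11, 15), (8, 10, 14), (8, 11, 14)],
    [(12, 15, 18), (12, 14, 19), (14, 17, 23)],
    [(12, 15, 19), (19, 21, 24)],
]
F_CRIT = 4.7571  # F_{0.05;3,6} as quoted in the paper


def fuzzy_mean(obs):
    """Componentwise mean of triangular fuzzy numbers (assumed form)."""
    return tuple(sum(c) / len(obs) for c in zip(*obs))


def dist_sq(a, b):
    """Assumed Diamond-type squared distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def f_stat(groups):
    """Distance-based F statistic: between / within mean squares."""
    pooled = [x for g in groups for x in g]
    r, n = len(groups), len(pooled)
    grand = fuzzy_mean(pooled)
    means = [fuzzy_mean(g) for g in groups]
    between = sum(len(g) * dist_sq(m, grand) for g, m in zip(groups, means))
    within = sum(dist_sq(x, m) for g, m in zip(groups, means) for x in g)
    return (between / (r - 1)) / (within / (n - r))


def groupings(items, sizes):
    """Yield every distinct assignment of items to groups of the given sizes."""
    if not sizes:
        yield []
        return
    for idx in combinations(range(len(items)), sizes[0]):
        chosen = set(idx)
        head = [items[i] for i in idx]
        rest = [items[i] for i in range(len(items)) if i not in chosen]
        for tail in groupings(rest, sizes[1:]):
            yield [head] + tail


f_obs = f_stat(DATA)
pooled = [x for g in DATA for x in g]
sizes = [len(g) for g in DATA]
f_perm = [f_stat(g) for g in groupings(pooled, sizes)]

B = len(f_perm)
p_pt1 = sum(f >= f_obs - 1e-9 for f in f_perm) / B  # PT1: compare with observed F
p_pt2 = sum(f > F_CRIT for f in f_perm) / B         # PT2: compare with F critical point
print(f"F = {f_obs:.4f}, B = {B}, PT1 p = {p_pt1:.4f}, PT2 p = {p_pt2:.4f}")
```

Under these assumptions the sketch gives F = 4.9019 and B = 25,200, which agree with the values the paper reports for the Diamond distance in Example 2; the reported exact p-values for comparison are 0.0337 (EPT1) and 0.0395 (EPT2).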
If the number of all possible permutations is large, we randomly choose some of them for the test and use the estimated significance probability p̂. Let ϕ be the true value of the significance probability. The indicators obtained from the n randomly chosen permutations satisfy p_{1}, · · · , p_{n} ~ Bernoulli(ϕ), so that Var(p̂) = ϕ(1 – ϕ)/n. Therefore, the variance of the estimated significance probability is small enough if we use more than a certain number of permutations.
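A small numerical sketch makes the variance formula concrete. It evaluates the standard error sqrt(ϕ(1 – ϕ)/n) for the numbers of permutations used in the simulation tables, taking ϕ = 0.0337 (the exact EPT1 p-value reported for the Diamond distance in Example 2) purely as an illustrative value; the function name is hypothetical.

```python
from math import sqrt


def p_hat_std_error(phi, n):
    """Standard error of the estimated p-value from n random permutations,
    sqrt(phi * (1 - phi) / n), under the Bernoulli model."""
    return sqrt(phi * (1 - phi) / n)


phi = 0.0337  # illustrative true p-value
for n in (50, 100, 1000, 3000, 5000, 10000):
    print(n, round(p_hat_std_error(phi, n), 4))
```

Because the standard error shrinks only at the rate 1/sqrt(n), going from 100 to 10,000 permutations reduces it by a factor of ten.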
Consider fuzzy data given as symmetric triangular fuzzy numbers [3]. We test whether or not the fuzzy means are the same for the four package designs. The hypothesis for the test is given by:
The fuzzy means for four package designs are
In exact PT1 (EPT1), the p-values, that is, the proportions of permutation F-values greater than the F-value calculated from the observed data, are 0.0203, 0.0198 and 0.0203, respectively. The null hypothesis is rejected because all p-values are smaller than the significance level α = 0.05. Moreover, the critical points of the empirical distribution are 3.9103, 4.0712 and 4.1893, which are smaller than the F-values obtained from the observed data. Hence we conclude that the fuzzy means of the package designs are different.
In EPT2, the critical point of the F distribution is F_{0.05;3,6} = 4.7571, and F_{D}, F_{M} and F_{B} are all greater than F_{0.05;3,6}. In addition, the p-values, defined as the proportions of permutation F-values greater than F_{0.05;3,6}, are 0.0254, 0.0297 and 0.0325, respectively. Therefore, the null hypothesis is rejected at the significance level α = 0.05.

In practice, the number of all permutations can be too large to be tractable. In this case, we can test the hypothesis with randomly chosen permutation data. Table 3 shows Monte Carlo simulation results based on 5,000 iterations. Taking the p-value calculated by EPT as the true parameter, the estimated p-value gets closer to the true parameter as B, the number of permutation samples, increases. In other words, the sampling error decreases as B increases. In addition, there are no significant differences in the errors across distances (Diamond, Ming, Bertoluzza) or test methods (RPT1, RPT2). Moreover, when the alternative hypothesis (H_{a}) is true, the power of the test increases as B increases, and it does not differ significantly across distances or test methods. Here, the power of the test is calculated as the proportion of rejections among the 5,000 simulations. The proportion of cases in which the true parameter is included in the 95% confidence interval also increases as B increases. In particular, more than about 4,000 permutation samples are needed to attain the 95% confidence level.
Some samples are selected to compare the means of milk pouches produced by 4 different machines. The samples are observed as fuzzy numbers because the data could not be recorded exactly owing to unexpected situations. Table 4 shows the fuzzy observations, which are non-symmetric triangular fuzzy numbers [16].
The fuzzy means for the four machines are
F_{D}, F_{M} and F_{B} are 4.9019, 4.9399 and 5.0134, respectively. The number of permutation samples is 25,200. In EPT1, the critical points at significance level 0.05 from the empirical distribution are 4.3584, 4.4211 and 4.4866, and the p-values are 0.0337, 0.0358 and 0.0354. Therefore, the null hypothesis is rejected. In EPT2, since F-value > F_{0.05;3,6} and p-value < α, H_{0} is rejected. That is, we can conclude that the means of the number of pouches are different. Table 6 shows the results of the simulation with 5,000 iterations. As B increases, the estimates of the p-value approach the true parameter and the power of the test also improves. In addition, the confidence level increases as B increases. In particular, the number of permutation samples should be more than about 4,000 in order to satisfy the significance level α = 0.05.
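A single run of the random permutation test (RPT1) on the milk-pouch data can be sketched as follows. The assumptions are the same as for the exact sketch (Diamond-type squared distance, componentwise fuzzy mean), and in addition the 95% interval around p̂ is built with the normal approximation p̂ ± 1.96 sqrt(p̂(1 – p̂)/B), which is an assumption about how the confidence level in the simulation tables might be assessed. The seed and function names are hypothetical.

```python
import random
from math import sqrt

DATA = [
    [(8, 9, 14), (11, 14, 18)],
    [(7, 11, 15), (8, 10, 14), (8, 11, 14)],
    [(12, 15, 18), (12, 14, 19), (14, 17, 23)],
    [(12, 15, 19), (19, 21, 24)],
]


def fuzzy_mean(obs):
    return tuple(sum(c) / len(obs) for c in zip(*obs))


def dist_sq(a, b):
    # Assumed Diamond-type squared distance
    return sum((x - y) ** 2 for x, y in zip(a, b))


def f_stat(groups):
    pooled = [x for g in groups for x in g]
    r, n = len(groups), len(pooled)
    grand = fuzzy_mean(pooled)
    means = [fuzzy_mean(g) for g in groups]
    between = sum(len(g) * dist_sq(m, grand) for g, m in zip(groups, means))
    within = sum(dist_sq(x, m) for g, m in zip(groups, means) for x in g)
    return (between / (r - 1)) / (within / (n - r))


def rpt1(groups, n_perm, rng):
    """Estimate the PT1 p-value from n_perm random permutations and
    return it with a normal-approximation 95% interval."""
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    f_obs = f_stat(groups)
    hits = 0
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)            # reassign observations without replacement
        perm, start = [], 0
        for n_i in sizes:
            perm.append(shuffled[start:start + n_i])
            start += n_i
        hits += f_stat(perm) >= f_obs - 1e-9
    p_hat = hits / n_perm
    half = 1.96 * sqrt(p_hat * (1 - p_hat) / n_perm)
    return p_hat, (max(0.0, p_hat - half), min(1.0, p_hat + half))


rng = random.Random(0)  # fixed seed only for reproducibility of the sketch
for n_perm in (50, 1000, 5000):
    p_hat, ci = rpt1(DATA, n_perm, rng)
    print(n_perm, round(p_hat, 4), tuple(round(c, 4) for c in ci))
```

Repeating such runs many times and recording how often H_{0} is rejected and how often the interval covers the exact p-value would give quantities analogous to the power and C.L entries of the simulation tables.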
In this study, a permutation method is applied to the ANOVA model with fuzzy data based on the distances introduced by Diamond [11], Ming et al. [12] and Bertoluzza et al. [13]. A permutation method uses an empirical distribution obtained through resampling and therefore does not need any assumption regarding a specific distribution. Regardless of the distance or test method, Monte Carlo simulation shows that, as the number of permutation samples increases, the power of the test increases, the estimated significance probability (p̂-value) approaches the true parameter (p-value), and the confidence level increases. Therefore, even when the total number of possible permutations is very large, the test yields an accurate conclusion with only a sufficiently large randomly selected permutation sample. This permutation method can be extended to the two-way ANOVA model, which will be discussed in our further studies.
No potential conflict of interest relevant to this article was reported.
Empirical permutation distribution in Example 1. (a) Diamond [
Table 1. Fuzzy data for sales volumes
Package design i | |||
---|---|---|---|
1 | |||
2 | |||
3 | |||
4 |
Table 2. The illustration and the summary statistics for exact permutation method in Example 1
Replicate | Package design 1 | Package design 2 | Package design 3 | Package design 4 | F_{D} (Diamond [11]) | F_{M} (Ming [12]) | F_{B} (Bertoluzza [13])
---|---|---|---|---|---|---|---
1 ( | | | | | 5.2410 | 5.4330 | 5.5721
2 | | | | | 7.0197 | 7.3946 | 7.6733
3 | | | | | 3.4223 | 3.4963 | 3.5487
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮
25,200 | | | | | 5.2410 | 5.4330 | 5.5721
Critical point of empirical distribution | | | | | 3.9103 | 4.0712 | 4.1893
p-value (EPT1) | | | | | 0.0203 | 0.0198 | 0.0203
Critical point of F distribution | | | | | 4.7571 | 4.7571 | 4.7571
p-value (EPT2) | | | | | 0.0254 | 0.0297 | 0.0325
Table 3. The statistical results of random permutation method in Example 1
Distance | Test | Measure | B = 50 | B = 100 | B = 1,000 | B = 3,000 | B = 5,000 | B = 10,000
---|---|---|---|---|---|---|---|---
Diamond [11] | RPT1 | p̂ – p | 0.0197 | 0.0097 | 0.0009 | 0.0002 | 0.0001 | 0.0001
 | | Power | 0.7380 | 0.8584 | 1.0000 | 1.0000 | 1.0000 | 1.0000
 | | C.L | 0.3612 | 0.4012 | 0.7768 | 0.9386 | 0.9830 | 0.9986
 | RPT2 | p̂ – p | 0.0197 | 0.0097 | 0.0009 | 0.0003 | 0.0002 | 0.0001
 | | Power | 0.6452 | 0.7584 | 1.0000 | 1.0000 | 1.0000 | 1.0000
 | | C.L | 0.2708 | 0.5384 | 0.8002 | 0.9404 | 0.9816 | 0.9986
Ming et al. [12] | RPT1 | p̂ – p | 0.0195 | 0.0097 | 0.0009 | 0.0002 | 0.0001 | 0.0001
 | | Power | 0.7506 | 0.8698 | 1.0000 | 1.0000 | 1.0000 | 1.0000
 | | C.L | 0.3718 | 0.4198 | 0.8050 | 0.9442 | 0.9834 | 0.9984
 | RPT2 | p̂ – p | 0.0195 | 0.0097 | 0.0009 | 0.0003 | 0.0002 | 0.0001
 | | Power | 0.5678 | 0.6650 | 0.9996 | 1.0000 | 1.0000 | 1.0000
 | | C.L | 0.2182 | 0.4290 | 0.7658 | 0.9410 | 0.9772 | 0.9970
Bertoluzza et al. [13] | RPT1 | p̂ – p | 0.0197 | 0.0098 | 0.0009 | 0.0003 | 0.0002 | 0.0001
 | | Power | 0.7408 | 0.8600 | 1.0000 | 1.0000 | 1.0000 | 1.0000
 | | C.L | 0.5600 | 0.3986 | 0.7714 | 0.9352 | 0.9852 | 0.9994
 | RPT2 | p̂ – p | 0.0194 | 0.0096 | 0.0010 | 0.0003 | 0.0002 | 0.0002
 | | Power | 0.5206 | 0.5956 | 0.9968 | 1.0000 | 1.0000 | 1.0000
 | | C.L | 0.1894 | 0.3682 | 0.7630 | 0.9364 | 0.9808 | 0.9970
Table 4. Fuzzy data for milk pouches
Machine ( | |||
---|---|---|---|
1 | (8,9,14) | (11,14,18) | |
2 | (7,11,15) | (8,10,14) | (8,11,14) |
3 | (12,15,18) | (12,14,19) | (14,17,23) |
4 | (12,15,19) | (19,21,24) |
Table 5. The statistical results of exact permutation method in Example 2
Metric | F-value | C.P. (empirical distribution) | C.P. (F distribution) | p-value (EPT1) | p-value (EPT2)
---|---|---|---|---|---
Diamond [11] | 4.9019 | 4.3584 | 4.7571 | 0.0337 | 0.0395
Ming et al. [12] | 4.9399 | 4.4211 | 4.7571 | 0.0358 | 0.0406
Bertoluzza et al. [13] | 5.0134 | 4.4866 | 4.7571 | 0.0354 | 0.0419
Table 6. The statistical results of random permutation method in Example 2
Distance | Test | Measure | B = 50 | B = 100 | B = 1,000 | B = 3,000 | B = 5,000 | B = 10,000
---|---|---|---|---|---|---|---|---
Diamond [11] | RPT1 | p̂ – p | 0.0193 | 0.0102 | 0.0013 | 0.0007 | 0.0005 | 0.0005
 | | Power | 0.5080 | 0.5592 | 0.9928 | 1.0000 | 1.0000 | 1.0000
 | | C.L | 0.1764 | 0.3368 | 0.7798 | 0.9378 | 0.9778 | 0.9982
 | RPT2 | p̂ – p | 0.0191 | 0.0098 | 0.0010 | 0.0003 | 0.0002 | 0.0001
 | | Power | 0.4190 | 0.4434 | 0.9268 | 0.9964 | 0.9996 | 1.0000
 | | C.L | 0.4190 | 0.4434 | 0.7922 | 0.9288 | 0.9764 | 0.9986
Ming et al. [12] | RPT1 | p̂ – p | 0.0192 | 0.0099 | 0.0010 | 0.0003 | 0.0002 | 0.0001
 | | Power | 0.4732 | 0.5236 | 0.9810 | 1.0000 | 1.0000 | 1.0000
 | | C.L | 0.4730 | 0.5236 | 0.7844 | 0.9382 | 0.9776 | 0.9988
 | RPT2 | p̂ – p | 0.0189 | 0.0098 | 0.0010 | 0.0003 | 0.0002 | 0.0001
 | | Power | 0.4082 | 0.4230 | 0.8932 | 0.9900 | 0.9990 | 1.0000
 | | C.L | 0.4082 | 0.4230 | 0.7816 | 0.9392 | 0.9788 | 0.9974
Bertoluzza et al. [13] | RPT1 | p̂ – p | 0.0192 | 0.0099 | 0.0010 | 0.0003 | 0.0002 | 0.0001
 | | Power | 0.4790 | 0.5318 | 0.9852 | 1.0000 | 1.0000 | 1.0000
 | | C.L | 0.4790 | 0.5318 | 0.7532 | 0.9306 | 0.9776 | 0.9984
 | RPT2 | p̂ – p | 0.0188 | 0.0097 | 0.0010 | 0.0003 | 0.0002 | 0.0001
 | | Power | 0.3912 | 0.3978 | 0.8470 | 0.9792 | 0.9968 | 1.0000
 | | C.L | 0.3912 | 0.3978 | 0.7656 | 0.9388 | 0.9794 | 0.9982