

Analysis of Variance for Fuzzy Data Based on Permutation Method
Int. J. Fuzzy Log. Intell. Syst. 2017;17(1):43-50
Published online March 31, 2017
© 2017 Korean Institute of Intelligent Systems.

Woo-Joo Lee1, Hye-Young Jung2, Jin Hee Yoon3, and Seung Hoe Choi4

1Department of Mathematics, Yonsei University, Seoul, Korea, 2Department of Statistics, Seoul National University, Seoul, Korea, 3School of Mathematics and Statistics, Sejong University, Seoul, Korea, 4School of Liberal Arts and Science, Korea Aerospace University, Goyang, Korea
Correspondence to: Seung Hoe Choi (
Received March 13, 2017; Revised March 20, 2017; Accepted March 23, 2017.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper deals with analysis of variance with fuzzy data (ANOVAF) based on the permutation method. The permutation method is a nonparametric method introduced by Heap and Johnson for data for which a normal distribution cannot be assumed. We propose two different approaches to testing hypotheses about fuzzy means using the empirical distribution. To compare the results, several distances are considered, each a special case of the ρ-distance. Using Monte Carlo simulation, numerical examples confirm that, under the proposed method, the estimated significance probability (p̂-value) approaches the true parameter (p-value) regardless of the distance or testing method. In addition, the number of permutation samples required to attain a specified accuracy is determined in the examples.

Keywords : Fuzzy number, ANOVA, Permutation method, Monte Carlo simulation
1. Introduction

Analysis of variance (ANOVA) is a widely used statistical technique in various fields such as design of experiments, survey design, quality improvement and many other industries. In general, the purpose of ANOVA is to test for significant differences of means among more than two independent populations. In ANOVA, the experimental data are assumed to be drawn from normal distributions with equal variances. In many practical situations, however, the underlying distribution is far from normal, and a nonparametric method can be used as an alternative. In addition, the data sometimes cannot be recorded as precise values because of ambiguous information or linguistic structure. An appropriate way to handle such problems is to apply the concepts of fuzzy set theory.

ANOVA using fuzzy set theory has been studied by several authors. De Garibay [1] considered one-way fuzzy ANOVA and compared it with other classical techniques. Konishi et al. [2] proposed an ANOVA method that treats fuzzy interval data using moment correction. Wu [3] solved one-way fuzzy ANOVA using the h-level set and the notions of pessimistic degree and optimistic degree. ANOVA for fuzzy data was further developed by Gonzalez-Rodriguez et al. [4] and Montenegro et al. [5], who presented an exact one-way ANOVA testing procedure for normal fuzzy random variables and large-sample tests for simple fuzzy random variables. A bootstrap approach to ANOVA for fuzzy-valued sample data was introduced in Gil et al. [6] and Montenegro et al. [7]. Lubiano and Trutschnig [8] used the R package SAFD (Statistical Analysis of Fuzzy Data). Also, Jiryaei et al. [9] applied the least squares method.

In this paper, we discuss the ANOVA model with fuzzy data, applying the permutation method to compare the fuzzy mean responses. Permutation tests are significance tests based on permutation resamples drawn from the original data. Since permutation resamples are drawn without replacement, in contrast to bootstrap samples, which are drawn with replacement, each permutation resample includes all the original data. If the number of all possible permutations is large, then for computational reasons we can use a random permutation test. In this paper we perform the permutation test based on Monte Carlo simulation to determine how many permutation samples are required to provide a sufficiently accurate p-value.

This paper is organized as follows. In Section 2, some preliminary concepts required to develop the main results are presented. In Section 3, the ANOVA model with fuzzy data is introduced, and numerical examples illustrate the permutation test under several distances that are special cases of the ρ-distance.

2. Preliminaries

In this section, we briefly describe the main theoretical tools used in the proposed ANOVA model based on the permutation method.

2.1 Fuzzy Numbers

Some basic concepts of fuzzy theory are provided as follows:

Definition 1

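A fuzzy subset $\tilde{A}$ of $\mathbb{R}^n$ with membership function $u_{\tilde{A}} : \mathbb{R}^n \to [0, 1]$ is called a fuzzy number if it satisfies: (i) $\tilde{A}$ is normal, i.e., $u_{\tilde{A}}(x_0) = 1$ for some $x_0 \in \mathbb{R}^n$; (ii) $\tilde{A}$ is a convex set, i.e., for all $x_1, x_2 \in \mathbb{R}^n$ and $\lambda \in [0, 1]$, $u_{\tilde{A}}(\lambda x_1 + (1 - \lambda) x_2) \ge \min(u_{\tilde{A}}(x_1), u_{\tilde{A}}(x_2))$; (iii) $u_{\tilde{A}}$ is semicontinuous.

The α-cut of a fuzzy number $\tilde{A}$ is a finite closed interval, $A_\alpha = \{x \in \mathbb{R}^n \mid u_{\tilde{A}}(x) \ge \alpha\} = [A_\alpha^L, A_\alpha^U]$ for $\alpha \in [0, 1]$, where $A_\alpha^L = \inf\{x \in \mathbb{R}^n \mid u_{\tilde{A}}(x) \ge \alpha\}$ and $A_\alpha^U = \sup\{x \in \mathbb{R}^n \mid u_{\tilde{A}}(x) \ge \alpha\}$. The statistical treatment of fuzzy numbers usually requires a suitable metric between them. The support function of any compact convex set $A \subset \mathbb{R}^n$ is defined as the function $s_A : S^{n-1} \to \mathbb{R}$ given by, for all $x \in S^{n-1}$,

$$s_A(x) = \sup_{a \in A} \langle a, x \rangle,$$

where $S^{n-1}$ is the $(n - 1)$-dimensional unit sphere in $\mathbb{R}^n$ and $\langle \cdot, \cdot \rangle$ is the inner product in $\mathbb{R}^n$. The ρ-distance, which unifies different distances applied in statistical inference with fuzzy data, was introduced by Näther [10].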

Definition 2

The ρ-distance for any two fuzzy numbers à and B̃ is defined as


where $S = S^{n-1} \times [0, 1] \times S^{n-1} \times [0, 1]$ and K is a symmetric and positive definite kernel.

Let $\tilde{A} = (a_l, a_c, a_r)$ and $\tilde{B} = (b_l, b_c, b_r)$ be triangular fuzzy numbers, where $a_c$ is the center of $\tilde{A}$ and $a_l$, $a_r$ are its left and right spreads. With special choices of the kernel, the ρ-distance reduces to several distances as follows:

The ρ-distance reduces to the well-known Diamond distance [11].


The distance by Ming et al. [12]


is included in the ρ-distance.

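Bertoluzza et al. [13] defined the distance

$$d_B^2(\tilde{A}, \tilde{B}) = \int_{[0,1]} \int_{[0,1]} \left[ f_{\tilde{A}}(\alpha, \lambda) - f_{\tilde{B}}(\alpha, \lambda) \right]^2 \, dW(\lambda) \, d\phi(\alpha),$$

where $f_{\tilde{A}}(\alpha, \lambda) = \lambda \sup A_\alpha + (1 - \lambda) \inf A_\alpha$, and W, ϕ are weighting measures.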
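As an illustration of how such a distance can be evaluated for triangular fuzzy numbers, the following minimal sketch approximates the Bertoluzza-type squared distance by a midpoint rule. It rests on two assumptions that are not taken from the paper: each triple is read as (left endpoint, center, right endpoint), and both weighting measures W and ϕ are taken to be the uniform (Lebesgue) measure on [0, 1]. The function names are hypothetical.

```python
def alpha_cut(tri, alpha):
    """Return (inf, sup) of the alpha-cut of a triangular fuzzy number.

    Assumption: tri = (left endpoint, center, right endpoint).
    """
    left, center, right = tri
    return left + alpha * (center - left), right - alpha * (right - center)


def f_value(tri, alpha, lam):
    """f(alpha, lambda) = lambda * sup A_alpha + (1 - lambda) * inf A_alpha."""
    lo, hi = alpha_cut(tri, alpha)
    return lam * hi + (1.0 - lam) * lo


def bertoluzza_sq(a, b, steps=200):
    """Midpoint-rule approximation of the squared distance.

    Assumption: W and phi are both the uniform measure on [0, 1].
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        alpha = (i + 0.5) * h
        for j in range(steps):
            lam = (j + 0.5) * h
            diff = f_value(a, alpha, lam) - f_value(b, alpha, lam)
            total += diff * diff
    return total * h * h


# Hypothetical triangular fuzzy numbers, for illustration only.
a = (11.0, 13.0, 15.0)
b = (12.0, 15.0, 18.0)
print(bertoluzza_sq(a, b))
```

For two crisp numbers written as degenerate triples, for instance (1, 1, 1) and (3, 3, 3), the integrand is constant and the sketch returns the ordinary squared difference, which is a convenient sanity check. Other choices of W and ϕ would require weighting the grid points accordingly.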

2.2 Traditional Analysis of Variance Model

The traditional one-factor ANOVA model can be stated as follows:

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \ldots, r;\; j = 1, \ldots, n_i,$$

where $Y_{ij}$ is the value of the response variable in the jth trial for the ith factor level or treatment. The model assumes random sampling with equal variances, independent errors, and a normal distribution. ANOVA is a statistical method used to test differences between group means. The relevant hypotheses are

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_r \quad \text{vs.} \quad H_a: \text{at least one of the } \mu_i\text{'s is not equal.}$$

The number of cases for the ith factor level is denoted by $n_i$ and the total number of cases in the study is denoted by N, where $N = \sum_{i=1}^{r} n_i$. The total sum of squares (SSTO), the treatment sum of squares (SSTR) and the error sum of squares (SSE) are defined respectively as

$$SSTO = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{..})^2, \qquad SSTR = \sum_{i=1}^{r} n_i (\bar{Y}_{i.} - \bar{Y}_{..})^2, \qquad SSE = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i.})^2,$$

where $\bar{Y}_{i.}$ is the mean of the observations within factor level i, and $\bar{Y}_{..}$ is the grand mean (i.e., $\bar{Y}_{..} = \sum_{i=1}^{r} \sum_{j=1}^{n_i} \bar{Y}_{i.} / N$). The test statistic

$$f = \frac{SSTR / (r - 1)}{SSE / (N - r)}$$

follows an F distribution with r − 1 and N − r degrees of freedom when $H_0$ holds. There are two equivalent decision rules for testing the given hypotheses. Since f is distributed as $F_{r-1, N-r}$ when $H_0$ holds and large values of f lead to the conclusion $H_a$, the decision rule that controls the level of significance at α is as follows:

$$\text{If } f \le F_{\alpha; r-1, N-r}, \text{ conclude } H_0; \qquad \text{if } f > F_{\alpha; r-1, N-r}, \text{ conclude } H_a,$$

where $F_{\alpha; r-1, N-r}$ is the (1 − α)100 percentile of the F distribution and f is the observed F-statistic value. The other decision rule is the p-value approach. The p-value is calculated as the area under the null sampling distribution of F to the right of f. If the p-value is smaller than the level of significance, reject the null hypothesis that all means are the same for the different groups.
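The computation of the classical F-statistic can be sketched as follows. This is a minimal illustration using only the standard library; the function name `one_way_anova` and the data are hypothetical and are not taken from the paper.

```python
def one_way_anova(groups):
    """Return (SSTR, SSE, f) for a list of groups of real observations."""
    n_total = sum(len(g) for g in groups)
    r = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Treatment (between-group) sum of squares.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error (within-group) sum of squares.
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    # Ratio of mean squares with r - 1 and N - r degrees of freedom.
    f = (sstr / (r - 1)) / (sse / (n_total - r))
    return sstr, sse, f


# Hypothetical data: three factor levels with three observations each.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
sstr, sse, f = one_way_anova(groups)
print(sstr, sse, f)
```

For these hypothetical data a hand calculation gives SSTR = 26, SSE = 6 and f = (26/2)/(6/6) = 13, up to floating-point rounding in the printed values. The observed f would then be compared with the percentile of the F distribution with 2 and 6 degrees of freedom, which this sketch does not compute.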

2.3 Permutation Method

In many studies, it is assumed that the data were taken from a normally distributed population. However, when the sample size is small, it is not appropriate to assume a normal population. Even with a large data set, the data are rarely taken from an exactly normally distributed population. Hence, nonparametric methods have been used as alternatives. But some nonparametric methods use only the ranks of the data, which results in a loss of the information in the raw data. To overcome this problem, Heap [14] and Johnson [15] proposed permutation methods, which employ the raw data. Moreover, these methods do not depend on the distribution of the population. A permutation method is a distribution-free method, which can be used when the test statistic does not follow a well-known distribution. For example, when testing the difference between the means of two populations, a permutation method may use the t-statistic but does not assume that it follows a t-distribution. Instead, it obtains an empirical distribution for the test statistic directly from the data set: it computes the test statistic for all possible permutations of the observed data, and then calculates the p-value by locating the observed test statistic in that empirical distribution. This method could not be widely used in the past because of the huge amount of computation involved, but it has been studied actively in recent years thanks to advances in computing. As the sample size increases, the number of possible permutations grows very rapidly. It then becomes practically impossible to apply the exact permutation method, which requires the empirical distribution over all possible permutations. In this case, a random permutation method can be applied, which uses randomly chosen permutations to generate an approximate empirical distribution.

The general permutation method proceeds as follows:

  1. Set a statistic for the test of the hypothesis.

  2. Calculate the test statistic based on observed data.

  3. Generate permutation data (b = 1, · · · ,B).

  4. Calculate the test statistic based on each permutation data, and find the empirical distribution.

  5. Calculate the p-value in the empirical distribution based on the test statistic using observed data.

  6. Test the hypothesis using p-value.
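The six steps above can be sketched for the simplest case, a two-sample comparison of means with randomly chosen permutations. The choice of test statistic (absolute difference of sample means), the function names, the seed and the data are all assumptions made for illustration; they are not taken from the paper.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(x, y, n_perm=2000, seed=0):
    """Random permutation test for a difference of means; returns the p-value."""
    rng = random.Random(seed)
    # Steps 1-2: choose the statistic and compute it on the observed data.
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    count = 0
    for _ in range(n_perm):
        # Step 3: generate one permutation of the pooled data.
        rng.shuffle(pooled)
        # Step 4: recompute the statistic on the permuted groups.
        stat = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        # A small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            count += 1
    # Step 5: locate the observed statistic in the empirical distribution.
    return count / n_perm


# Hypothetical samples, for illustration only.
x = [12.1, 11.4, 13.0, 12.7, 11.9]
y = [13.2, 14.1, 12.9, 14.5, 13.8]
p_value = permutation_test(x, y)
# Step 6: compare with the significance level.
print(p_value, p_value < 0.05)
```

Because the pooled data are only reshuffled, every permutation sample contains all original observations, in contrast to a bootstrap resample. Fixing the seed makes the sketch reproducible; a different seed gives a slightly different estimated p-value.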

3. Analysis of Variance Model with Fuzzy Data

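Consider the one-way ANOVA model with fuzzy data (ANOVAF),

$$\tilde{X}_{ij} = \tilde{\mu}_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, r;\; j = 1, 2, \ldots, n_i,$$

where $\tilde{\mu}_i$ is the fuzzy mean of the ith population and $\varepsilon_{ij}$ is an error term with no distributional assumption. Given r populations with fuzzy data, we construct the hypotheses for the test as follows:

$$H_0: \tilde{\mu}_1 = \tilde{\mu}_2 = \cdots = \tilde{\mu}_r \quad \text{vs.} \quad H_a: \text{at least one of the } \tilde{\mu}_i\text{'s is not equal.}$$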

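As in the traditional ANOVA model, the following properties hold in ANOVAF. The sum of squares of the distance deviations about the fuzzy grand mean can be decomposed into two sums, the first of which represents the within-group sum of squares, and the second, the between-group sum of squares:

$$\sum_{i=1}^{r} \sum_{j=1}^{n_i} d^2(\tilde{X}_{ij}, \bar{\tilde{X}}_{..}) = \sum_{i=1}^{r} \sum_{j=1}^{n_i} d^2(\tilde{X}_{ij}, \bar{\tilde{X}}_{i.}) + \sum_{i=1}^{r} n_i \, d^2(\bar{\tilde{X}}_{i.}, \bar{\tilde{X}}_{..}),$$

where d denotes the chosen distance between fuzzy numbers, $\bar{\tilde{X}}_{i.}$ is the fuzzy mean of the ith group and $\bar{\tilde{X}}_{..}$ is the fuzzy grand mean. The test statistic to be used is

$$F = \frac{\sum_{i=1}^{r} n_i \, d^2(\bar{\tilde{X}}_{i.}, \bar{\tilde{X}}_{..}) / (r - 1)}{\sum_{i=1}^{r} \sum_{j=1}^{n_i} d^2(\tilde{X}_{ij}, \bar{\tilde{X}}_{i.}) / (N - r)}.$$

If the errors are assumed to be normally distributed, the test statistic follows an F distribution with r − 1 and N − r degrees of freedom [5]. In this study, we test the hypothesis using the permutation method, which does not require a restrictive assumption such as normality. The proposed method proceeds as follows: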

  1. Calculate the F-value from the observed fuzzy data.

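  2. Generate all possible permutations from the data $\tilde{X} = \{\tilde{X}_{11}, \ldots, \tilde{X}_{1n_1}, \tilde{X}_{21}, \ldots, \tilde{X}_{rn_r}\}$. Let $b = 1, 2, \ldots, B$ index a specific permutation of $\tilde{X}$, where $B = \binom{N}{n_1} \times \binom{N - n_1}{n_2} \times \cdots \times \binom{N - \sum_{i=1}^{r-1} n_i}{n_r}$. The bth permutation sample is denoted by $\tilde{X}^b = \{\tilde{X}^b_{11}, \ldots, \tilde{X}^b_{1n_1}, \tilde{X}^b_{21}, \ldots, \tilde{X}^b_{rn_r}\}$.

  3. Calculate the F-value from the bth permutation sample and denote it by $F_b$. Repeat for b = 1, · · · , B.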

  4. Find empirical permutation distribution of test statistics.
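A minimal sketch of these four steps for triangular fuzzy data is given below. Several ingredients are assumptions rather than the authors' implementation: each fuzzy number is stored as a triple (left endpoint, center, right endpoint), the fuzzy mean is taken componentwise, the squared distance is a Diamond-type sum of squared componentwise differences, and the data are hypothetical. All function names are illustrative.

```python
from itertools import combinations
from math import comb


def diamond_sq(a, b):
    """Assumed squared distance: sum of squared componentwise differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_mean(sample):
    """Componentwise mean of triangular fuzzy numbers (left, center, right)."""
    return tuple(sum(t[k] for t in sample) / len(sample) for k in range(3))


def fuzzy_f(groups, dist_sq=diamond_sq):
    """Step 1: between-group over within-group mean squared distance."""
    pooled = [t for g in groups for t in g]
    grand = fuzzy_mean(pooled)
    means = [fuzzy_mean(g) for g in groups]
    between = sum(len(g) * dist_sq(m, grand) for g, m in zip(groups, means))
    within = sum(dist_sq(t, m) for g, m in zip(groups, means) for t in g)
    return (between / (len(groups) - 1)) / (within / (len(pooled) - len(groups)))


def count_partitions(sizes):
    """B: number of ways to split sum(sizes) observations into the groups."""
    total, remaining = 1, sum(sizes)
    for n in sizes:
        total *= comb(remaining, n)
        remaining -= n
    return total


def partitions(items, sizes):
    """Step 2: yield every assignment of items to groups of the given sizes."""
    if not sizes:
        yield []
        return
    for chosen in combinations(range(len(items)), sizes[0]):
        picked = set(chosen)
        rest = [x for i, x in enumerate(items) if i not in picked]
        for tail in partitions(rest, sizes[1:]):
            yield [[items[i] for i in chosen]] + tail


def permutation_f_values(groups, dist_sq=diamond_sq):
    """Steps 3-4: F-value of every permutation sample."""
    pooled = [t for g in groups for t in g]
    sizes = [len(g) for g in groups]
    return [fuzzy_f(p, dist_sq) for p in partitions(pooled, sizes)]


# Hypothetical triangular fuzzy data: three groups of two observations.
data = [
    [(1.0, 2.0, 3.0), (2.0, 3.5, 4.0)],
    [(3.0, 4.0, 6.0), (4.0, 5.0, 5.5)],
    [(6.0, 7.5, 8.0), (7.0, 8.0, 9.5)],
]
f_obs = fuzzy_f(data)
f_perm = permutation_f_values(data)
print(len(f_perm), f_obs)
print(count_partitions([2, 3, 3, 2]))
```

For group sizes (2, 3, 3, 2), `count_partitions` gives 45 × 56 × 10 × 1 = 25,200. The enumeration treats observations as distinguishable by position, so repeated values are handled correctly. When B is too large to enumerate, the generator would be replaced by random shuffles of the pooled data.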

Now, we present two different approaches to the hypothesis test. The first approach uses the p-value defined as the proportion of permutation samples whose F-value is greater than the F-value obtained from the observations:

$$p = \frac{1}{B} \sum_{b=1}^{B} I(F_b > F),$$

where F is the F-value calculated from the observed fuzzy data and I(·) is the indicator function, equaling ‘1’ if the condition in parentheses is true, and ‘0’ otherwise. This permutation test is denoted by PT1.

The second approach uses the p-value defined as the proportion of permutation samples whose F-value is greater than the critical point of the F distribution:

$$p = \frac{1}{B} \sum_{b=1}^{B} I(F_b > F_{\alpha; r-1, N-r}).$$

This permutation test is denoted by PT2.

If the p-value is smaller than the level of significance, reject the null hypothesis that all means are the same for the different groups.
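Given a list of permutation F-values, the two p-values can be computed as in the following sketch. The strict inequality follows the wording "greater than" and is an assumption; the function names and the list of F-values are hypothetical, while 4.7571 is the critical point $F_{0.05;3,6}$ quoted in the examples.

```python
def exceedance_ratio(threshold, f_perm):
    """Proportion of permutation F-values strictly greater than threshold."""
    return sum(1 for fb in f_perm if fb > threshold) / len(f_perm)


def pt1_p_value(f_obs, f_perm):
    """PT1: compare permutation F-values with the observed F-value."""
    return exceedance_ratio(f_obs, f_perm)


def pt2_p_value(f_crit, f_perm):
    """PT2: compare permutation F-values with the critical point of F."""
    return exceedance_ratio(f_crit, f_perm)


# Hypothetical permutation F-values, for illustration only.
f_perm = [0.5, 1.2, 2.0, 3.1, 4.9, 5.0, 6.3, 0.8, 1.5, 2.7]
alpha = 0.05
p1 = pt1_p_value(5.5, f_perm)      # hypothetical observed F-value 5.5
p2 = pt2_p_value(4.7571, f_perm)   # critical point F_{0.05;3,6}
print(p1, p1 < alpha)
print(p2, p2 < alpha)
```

The two tests differ only in the threshold: PT1 uses the observed F-value, whereas PT2 uses the theoretical critical point and additionally requires the observed F-value to exceed it, as in the examples.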

If the number of all possible permutations is large, we randomly choose some of them for the test and use the estimated significance probability. Let ϕ be the true value of the significance probability being estimated. The indicators obtained from n randomly chosen permutation samples satisfy $p_1, \ldots, p_n \sim \mathrm{Bernoulli}(\phi)$, so their mean $\hat{p}$ has $\mathrm{Var}(\hat{p}) = \phi(1 - \phi)/n$. Therefore, the variance of the estimated significance probability is small enough if we use more than a specific number of permutations.
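The variance formula suggests a simple way to gauge how many random permutations are needed. The sketch below computes the standard error of the estimate and, under a normal approximation with z = 1.96 (an assumption not stated in the paper), the number of permutations needed for a given half-width of a 95% interval. The function names are hypothetical; the value 0.0203 is the EPT1 p-value reported for the Diamond distance in Example 1.

```python
import math


def std_error(phi, n):
    """Standard error of the estimated p-value: sqrt(phi * (1 - phi) / n)."""
    return math.sqrt(phi * (1.0 - phi) / n)


def required_permutations(phi, half_width, z=1.96):
    """Smallest n with z * std_error(phi, n) <= half_width.

    Assumption: normal approximation to the binomial proportion.
    """
    return math.ceil(z * z * phi * (1.0 - phi) / (half_width ** 2))


phi = 0.0203  # EPT1 p-value for the Diamond distance in Example 1
for n in (100, 1000, 4000, 10000):
    print(n, std_error(phi, n))
print(required_permutations(phi, 0.005))
```

The standard error shrinks at the rate $1/\sqrt{n}$, so quadrupling the number of permutations halves it. Since ϕ is unknown in practice, a pilot estimate or the conservative value ϕ = 0.5 could be used instead.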

3.1 Example 1

Suppose that the fuzzy data are given as symmetric triangular fuzzy numbers [3]. We test whether or not the fuzzy means are the same for the four package designs. The hypotheses for the test are given by:

$$H_0: \tilde{\mu}_1 = \tilde{\mu}_2 = \tilde{\mu}_3 = \tilde{\mu}_4 \quad \text{vs.} \quad H_a: \text{at least one of the } \tilde{\mu}_i\text{'s is not equal.}$$

The fuzzy means for the four package designs are $\bar{\tilde{Y}}_{1.} = (11.5, 13.5, 15.5)$, $\bar{\tilde{Y}}_{2.} = (10.67, 14.33, 18)$, $\bar{\tilde{Y}}_{3.} = (15.33, 18.33, 21.33)$ and $\bar{\tilde{Y}}_{4.} = (18, 21.5, 25)$, and the fuzzy grand mean is $\bar{\tilde{Y}}_{..} = (13.7, 16.8, 19.9)$. As special cases of the ρ-distance, we apply the distances introduced by Diamond [11], Ming et al. [12] and Bertoluzza et al. [13]. The F-values calculated from the observed fuzzy data are 5.2410 ($F_D$), 5.4330 ($F_M$) and 5.5721 ($F_B$), respectively, where $F_D$, $F_M$ and $F_B$ are the F-values using the distances of Diamond, Ming and Bertoluzza. We apply the permutation method to test the hypothesis. The total number of possible permutations is $B = {}_{10}C_2 \times {}_8C_3 \times {}_5C_3 \times {}_2C_2 = 25{,}200$. Figure 1 shows the empirical distributions of the F-values ($F_b$; b = 1, · · · , 25,200) based on the permutation samples.

In exact PT1 (EPT1), the p-values, i.e., the proportions of permutation F-values greater than the F-value calculated from the observed data, are 0.0203, 0.0198 and 0.0203, respectively. The null hypothesis is rejected because all p-values are smaller than the significance level α = 0.05. Equivalently, the critical points of the empirical distributions are 3.9103, 4.0712 and 4.1893, which are smaller than the F-values obtained from the observed data. Hence we conclude that the fuzzy means of the package designs are different.

In EPT2, the critical point of the F distribution is $F_{0.05;3,6}$ = 4.7571, and $F_D$, $F_M$ and $F_B$ are all greater than $F_{0.05;3,6}$. In addition, the p-values, defined as the proportions of permutation F-values greater than $F_{0.05;3,6}$, are 0.0254, 0.0297 and 0.0325, respectively. Therefore, the null hypothesis is rejected at the significance level α = 0.05. In practice, the number of all permutations can be too large to be tractable. In this case, we can test the hypothesis with randomly chosen permutation data. Table 3 shows the Monte Carlo simulation results over 5,000 iterations. Taking the p-value calculated by EPT as the true parameter, the estimated p-value gets closer to the true parameter as B, the number of permutation samples, increases. In other words, the sampling error decreases as B increases. In addition, there are no significant differences in the errors across distances (Diamond, Ming, Bertoluzza) or test methods (RPT1, RPT2). Moreover, when the alternative hypothesis ($H_a$) is true, the power of the test increases as B increases. Further, the power of the test does not differ significantly across distances or test methods. Here, the power of the test is calculated as the proportion of rejections among the 5,000 simulations. In addition, the proportion of cases in which the true parameter is included in the 95% confidence interval also increases as B increases. In particular, the number of permutation samples should be more than 4,000 in order to attain the 95% confidence level.

3.2 Example 2

Samples are selected to compare the mean numbers of milk pouches produced by four different machines. The samples are observed as fuzzy numbers because the data could not be recorded exactly owing to unexpected situations. Table 4 shows the fuzzy observations, which are non-symmetric triangular fuzzy numbers [16].

The fuzzy means for the four machines are $\bar{\tilde{Y}}_{1.} = (9.5, 11.5, 16)$, $\bar{\tilde{Y}}_{2.} = (7.67, 10.67, 14.33)$, $\bar{\tilde{Y}}_{3.} = (12.67, 15.33, 20)$ and $\bar{\tilde{Y}}_{4.} = (15.5, 18, 21.5)$, and the fuzzy grand mean is $\bar{\tilde{Y}}_{..} = (11.1, 13.7, 17.8)$. The hypotheses for the test of the mean numbers of milk pouches produced by each machine are as follows:

$$H_0: \tilde{\mu}_1 = \tilde{\mu}_2 = \tilde{\mu}_3 = \tilde{\mu}_4 \quad \text{vs.} \quad H_a: \text{at least one of the } \tilde{\mu}_i\text{'s is not equal.}$$

$F_D$, $F_M$ and $F_B$ are 4.9019, 4.9399 and 5.0134, respectively. The number of permutation samples is 25,200. In EPT1, the critical points at significance level 0.05 from the empirical distributions are 4.3584, 4.4211 and 4.4866, and the p-values are 0.0337, 0.0358 and 0.0354. Therefore, the null hypothesis is rejected. In EPT2, since each F-value > $F_{0.05;3,6}$ and each p-value < α, $H_0$ is rejected. That is, we can conclude that the mean numbers of pouches are different. Table 6 shows the results of the simulation over 5,000 iterations. As B increases, the estimates of the p-value approach the true parameter and the power of the test also improves. In addition, the confidence level increases as B increases. In particular, the number of permutation samples should be more than 4,000 in order to satisfy the significance level α = 0.05.

4. Conclusion

In this study, a permutation method is applied to the ANOVA model with fuzzy data based on the distances introduced by Diamond [11], Ming et al. [12] and Bertoluzza et al. [13]. A permutation method uses an empirical distribution obtained through resampling and therefore does not need any assumption about a specific distribution. Regardless of the distance or test method, Monte Carlo simulation shows that, as the number of permutation samples increases, the power of the test increases, the estimated significance probability (p̂-value) approaches the true parameter (p-value), and the confidence level increases. Therefore, if the number of randomly selected permutation samples is large enough, the test yields an accurate conclusion even though only part of all possible permutations is used. This permutation method can be extended to the two-way ANOVA model, which will be discussed in our further studies.

Conflict of Interest

No potential conflict of interest relevant to this article was reported.

Fig. 1.

Empirical permutation distribution in Example 1. (a) Diamond [11], (b) Ming et al. [12], and (c) Bertoluzza et al. [13].


Table 1

Fuzzy data for sales volumes

Package design i

Table 2

The illustration and the summary statistics for exact permutation method in Example 1

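Replicate B | Package design 1 | Package design 2 | Package design 3 | Package design 4 | Fb, Diamond [11] | Fb, Ming [12] | Fb, Bertoluzza [13]
1 (F-value) | 11˜, 16˜ | 15˜, 15˜, 13˜ | 18˜, 17˜, 20˜ | 19˜, 24˜ | 5.2410 | 5.4330 | 5.5721
Critical point of empirical distribution | | | | | 3.9103 | 4.0712 | 4.1893
p-value of EPT1 | | | | | 0.0203 | 0.0198 | 0.0203
Critical point of F | | | | | 4.7571 | 4.7571 | 4.7571
p-value of EPT2 | | | | | 0.0254 | 0.0297 | 0.0325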

Table 3

The statistical results of random permutation method in Example 1

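Diamond [11] | RPT1 | p | 0.0197 | 0.0097 | 0.0009 | 0.0002 | 0.0001 | 0.0001
Diamond [11] | RPT2 | p̂ – p | 0.0197 | 0.0097 | 0.0009 | 0.0003 | 0.0002 | 0.0001

Ming et al. [12] | RPT1 | p | 0.0195 | 0.0097 | 0.0009 | 0.0002 | 0.0001 | 0.0001
Ming et al. [12] | RPT2 | p̂ – p | 0.0195 | 0.0097 | 0.0009 | 0.0003 | 0.0002 | 0.0001

Bertoluzza et al. [13] | RPT1 | p | 0.0197 | 0.0098 | 0.0009 | 0.0003 | 0.0002 | 0.0001
Bertoluzza et al. [13] | RPT2 | p̂ – p | 0.0194 | 0.0096 | 0.0010 | 0.0003 | 0.0002 | 0.0002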

Table 4

Fuzzy data for milk pouches

Machine (i)

Table 5

The statistical results of exact permutation method in Example 2

Metric | F-value | C.P. | F0.05;3,6 | p-value of EPT1 | p-value of EPT2
Diamond [11] | 4.9019 | 4.3584 | 4.7571 | 0.0337 | 0.0395
Ming et al. [12] | 4.9399 | 4.4211 | 4.7571 | 0.0358 | 0.0406
Bertoluzza et al. [13] | 5.0134 | 4.4866 | 4.7571 | 0.0354 | 0.0419

Table 6

The statistical results of random permutation method in Example 2

Diamond [11] | RPT1 | p | 0.0193 | 0.0102 | 0.0013 | 0.0007 | 0.0005 | 0.0005

Ming et al. [12] | RPT1 | p | 0.0192 | 0.0099 | 0.0010 | 0.0003 | 0.0002 | 0.0001

Bertoluzza et al. [13] | RPT1 | p | 0.0192 | 0.0099 | 0.0010 | 0.0003 | 0.0002 | 0.0001

  1. De Garibay, VG (1987). Behaviour of fuzzy ANOVA. Kybernetes. 16, 107-112.
  2. Konishi, M, Okuda, T, and Asai, K (2006). Analysis of variance based on fuzzy interval data using moment correction method. International Journal of Innovative Computing, Information and Control. 2, 83-99.
  3. Wu, HC (2007). Analysis of variance for fuzzy data. International Journal of Systems Science. 38, 235-246.
  4. Gonzalez-Rodriguez, G, Colubi, A, and Gil, MA (2012). Fuzzy data treated as functional data: a one-way ANOVA test approach. Computational Statistics and Data Analysis. 56, 943-955.
  5. Montenegro, M, Gonzalez-Rodriguez, G, Gil, MA, Colubi, A, and Casals, MR (2004). Introduction to ANOVA with fuzzy random variables. Soft Methodology and Random Information Systems. 26, 487-494.
  6. Gil, MA, Montenegro, M, Gonzalez-Rodriguez, G, Colubi, A, and Casals, MR (2006). Bootstrap approach to the multi-sample test of means with imprecise data. Computational Statistics and Data Analysis. 51, 148-162.
  7. Montenegro, M, Colubi, A, Casals, MR, and Gil, MA (2004). Asymptotic and bootstrap techniques for testing the expected value of a fuzzy random variable. Metrika. 59, 31-49.
  8. Lubiano, MA, and Trutschnig, W (2010). ANOVA for fuzzy random variables using the R-package SAFD. Combining Soft Computing and Statistical Methods in Data Analysis, Advances in Intelligent and Soft Computing, Borgelt, C, ed. Berlin: Springer, pp. 449-456
  9. Jiryaei, A, Parchami, A, and Mashinchi, M (2013). One-way ANOVA and least squares method based on fuzzy random variables. Turkish Journal of Fuzzy Systems. 4, 18-33.
  10. Näther, W (2001). Random fuzzy variables of second order and applications to statistical inference. Information Sciences. 133, 69-88.
  11. Diamond, P (1988). Fuzzy least squares. Information Sciences. 46, 141-157.
  12. Ming, M, Friedman, M, and Kandel, A (1997). General fuzzy least squares. Fuzzy Sets and Systems. 88, 107-118.
  13. Bertoluzza, C, Corral Blanco, N, and Salas, A (1995). On a new class of distances between fuzzy numbers. Mathware & Soft Computing. 2, 71-78.
  14. Heap, BR (1963). Permutations by Interchanges. The Computer Journal. 6, 293-298.
  15. Johnson, SM (1963). Generation of permutations by adjacent transposition. Mathematics of Computation. 17, 282-285.
  16. Kalpanapriya, D, and Pandian, P (2012). Fuzzy hypothesis testing of Anova model with fuzzy data. International Journal of Modern Engineering Research. 2, 2951-2956.

Woo-Joo Lee obtained a Ph.D. in Mathematics, majoring in mathematical statistics, from Yonsei University, Seoul, Korea. His main research interests are data mining, fuzzy time series and intelligent systems.

Hye-Young Jung received her Ph.D. in Mathematics from Yonsei University in 2014. Currently, she is an associate teaching professor at the Faculty of Liberal Education, Seoul National University. Her research interests include statistical methods, fuzzy theory, and statistical genetics.

Jin Hee Yoon received Bachelor, Master and Ph.D. degree in Mathematics from Yonsei University. She is currently faculty of school of Mathematics and Statistics at Sejong University, Seoul, Korea. Her research interests are fuzzy regression analysis, fuzzy time series and F-transform. She is a board member of KIIS (Korean Institute of Intelligent Systems) and has been working as an editor, guest editor and editorial board member of The Scientific World Journal, Information, International Journal of Applied Economics and International Journal of Fuzzy logic and Intelligent Systems etc. Also, she has been working as an organizer and committee member of several international conferences such as 2012, 2013 NIMS Hot topic workshop, 2012, 2013 ICIS, 2014, 2015 FUZZ IEEE, 2015 ISIS, 2016 SCIS&ISIS, 2016 ICFMsquare 2016, ICMF 2016 etc.

Seung Hoe Choi obtained his Ph.D. degree in Mathematical Statistics from Yonsei University, Korea, in 1994. He has been with Korea Aerospace University since 1996, where he is currently a full professor. His main research interests are mathematical prediction methods using soft computing and statistical prediction in sports such as soccer, baseball and basketball.