A general approach to nonparametric statistical hypothesis testing is proposed for the case in which the available data are fuzzy and the significance level is given as a fuzzy number. To do this, the usual concepts of test statistic and critical value are extended to the fuzzy test statistic and fuzzy critical value, by using the
Nonparametric approaches, including nonparametric tests, provide inferential procedures that rely only on weak assumptions about the underlying population distributions. A particular class of nonparametric tests is composed of two-sample tests. Such tests are commonly based on crisp (exact/non-fuzzy) observations. In the real world, however, there are many situations in which the available data are imprecise (vague/fuzzy) rather than precise (crisp). For instance, in source water studies, the water level of a river cannot be measured exactly because of fluctuation. In this case, the water level may be reported as an imprecise quantity such as “about 180 (cm)” or “approximately 145 (cm)”. As another example, in survival analysis, we may not be able to determine an exact value for the lifetime of a certain virus. A virus may be fully active over a certain period, then lose its effect for some time, and finally die completely at a certain time. In such a case, we may report the lifetimes as imprecise quantities such as “approximately 40 (h)”, “approximately 55 (h)”, and the like. To apply suitable statistical methods to imprecise observations, we first need to model such data and then extend the usual approach to the imprecise environment. Fuzzy set theory seems to offer suitable tools for modeling these data and for developing appropriate statistical methods based on them.
Since the introduction of fuzzy set theory, there have been many attempts to develop fuzzy statistical methods. To the authors' knowledge, however, there have been only a few works on nonparametric approaches in a fuzzy environment. Concerning the purposes of this article, let us briefly review some of the literature on this topic. Kahraman et al. [1] proposed some algorithms for fuzzy nonparametric rank-sum tests based on fuzzy random variables.
Grzegorzewski [2] introduced a method for inference about the median of a population using fuzzy random variables.
Also, the author demonstrated a straightforward generalization of some classical nonparametric tests for fuzzy random variables [3]. The last work relies on the quasi-ordering based on a metric in the space of fuzzy numbers. He also studied some nonparametric median tests for fuzzy observations, based on the necessity index of strict dominance suggested by Dubois and Prade [4] [5, 6]. In this manner, he obtained a fuzzy test showing a degree of possibility and a degree of necessity for evaluating the underlying hypotheses. Denoeux et al. [7], using a fuzzy partial ordering on closed intervals, extended the nonparametric rank-sum tests to fuzzy data. For evaluating the hypotheses of interest, they employed the concepts of the fuzzy p-value and the degree of rejection of the null hypothesis, quantified by a degree of possibility and a degree of necessity, when the given significance level is a crisp number or a fuzzy set. Hryniewicz investigated the fuzzy version of the Goodman-Kruskal γ statistic for data described by ordered categories [8] (see also [9]). Grzegorzewski and Szymanowski [10] studied the problem of goodness-of-fit tests for fuzzy data. Taheri and Hesamian [11] developed the Wilcoxon rank test for fuzzy data. They also studied some linear rank tests for fuzzy data, by using a p-value-based method [12]. Recently, Taheri et al. [13] studied statistical inference for contingency tables when the available data are fuzzy rather than crisp.
The present paper aims to develop some nonparametric two-sample tests for fuzzy data, based on extending the concept of the classical critical value and comparing it with the corresponding test statistic. This paper is organized as follows: In Section 2, we recall some concepts of fuzzy numbers and fuzzy random variables. In Section 3, based on an index for ranking fuzzy numbers, we introduce a method to construct a version of linear rank tests for fuzzy data. We also extend the concept of the classical critical value to the case in which the given significance level is a fuzzy number. In Section 4, we provide the decision-making method for accepting or rejecting the hypothesis of interest. To do this, we use a preference degree to compare the observed fuzzy test statistic and the fuzzy critical value. A numerical example is given in Section 5 to clarify the proposed approach. Finally, a brief conclusion is provided in Section 6. A brief review on some linear rank tests is given in
A fuzzy set Ã of the universal set
One of the most practical kinds of fuzzy numbers is the triangular fuzzy number, denoted by Ã = (ε_{1}, a, ε_{2})_{T}, with the following membership function
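For concreteness, a triangular fuzzy number and its α-cuts can be sketched in code. This is only an illustrative sketch: it assumes that (ε_{1}, a, ε_{2})_{T} lists the left endpoint, the peak, and the right endpoint of the support, which is how the data in the numerical example appear to be reported, and the function names are hypothetical.

```python
def triangular_membership(x, left, peak, right):
    """Membership degree of x in the triangular fuzzy number (left, peak, right)_T.

    Assumption: the triple gives the left endpoint, the peak, and the right
    endpoint of the support, with left <= peak <= right.
    """
    if x == peak:
        return 1.0
    if left < x < peak:
        return (x - left) / (peak - left)   # rising left side
    if peak < x < right:
        return (right - x) / (right - peak)  # falling right side
    return 0.0


def alpha_cut(alpha, left, peak, right):
    """Closed interval {x : membership(x) >= alpha} for 0 < alpha <= 1."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return (left + alpha * (peak - left), right - alpha * (right - peak))
```

For example, under this parametrization the observation (25, 35, 45) has membership 0.5 at 30, and its 0.5-cut is the interval [30, 40].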
We denote by (ℝ) the set of all fuzzy real numbers (for more on fuzzy numbers see [14]).
Note that, given a real number z, we can induce a fuzzy number z̃ with membership function μ_{z̃}(r) such that μ_{z̃}(z) = 1 and μ_{z̃}(r) < 1 for r ≠ z [15].
Let ℱ(ℝ) ⊆ (ℝ) be the set of all real fuzzy numbers induced by the real set ℝ. We define the relation ~ on ℱ(ℝ) as z̃_{1}~z̃_{2} iff z̃_{1} and z̃_{2} are induced by the same real number z. Then ~ is an equivalence relation, which induces the equivalence classes [z̃] = {ã: ã ~ z̃}. The quotient set ℱ(ℝ)/~ is the set of all equivalence classes. We call ℱ(ℝ)/~ the fuzzy real number system. In practice, we take only one element z̃ from each equivalence class [z̃] to form the fuzzy real number system (ℱ(ℝ)/~)_{ℝ} that is
To treat imprecise observations, we use the concept of fuzzy random variable similar to those of Grzegorzewski [6] and Wu [15].
Letbe the sample space of a random variableXdefined on the probability space (Ω, ,
(a) For any α ∈ [0, 1] and all ω ∈ Ω, the real-valued mappings inf X̃_{α}: Ω → ℝ, satisfying inf X̃_{α}(ω) = inf(X̃(ω))_{α}, and sup X̃_{α}: Ω → ℝ, satisfying sup X̃_{α}(ω) = sup(X̃(ω))_{α}, are real-valued random variables.
(b) { X̃_{α}(ω): α ∈ [0, 1]} is a set representation of X̃(ω) for all ω ∈ Ω.
Note that, in the above definition, (ℱ( )/~)_{} is the support of the fuzzy random variable X̃. Therefore, each α-cut of X̃ depends on the random variable X. Thus, the crisp random variable X may be interpreted as the origin of the fuzzy random variable X̃.
We say that X̃_{1}, X̃_{2}, …, X̃_{n} (with observed values x̃_{1}, x̃_{2}, …, x̃_{n}) are independent and identically distributed (briefly, a fuzzy random sample) if their related origins X_{1}, X_{2}, …, X_{n} are independent and identically distributed crisp random variables.
In this section, we extend the linear rank tests to examine hypotheses about differences in location or variability between two populations, based on a set of imprecise (fuzzy) observations. The proposed approach is based on two key concepts: the fuzzy test statistic and the fuzzy critical value.
First, note that the classical linear rank statistic can be rewritten as follows (see
where I denotes the indicator function and z_{j} denotes the jth observation in the combined sample x_{1}, x_{2}, …, x_{n} and y_{1}, y_{2}, …, y_{m}.
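The idea of writing ranks through indicator functions can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact expression used above: it assumes no ties, takes the rank of z_{i} to be the number of combined observations z_{j} with z_{i} ≥ z_{j}, and sums the scores over the first sample. The function names and the `score` parameter are illustrative.

```python
def ranks_by_indicators(z):
    """Rank of each z[i] in the combined sample, as a sum of indicators I(z[i] >= z[j]).

    Assumption: there are no ties among the observations.
    """
    return [sum(1 for zj in z if zi >= zj) for zi in z]


def linear_rank_statistic(x, y, score=lambda r: r):
    """Sum of scores a(R_i) over the ranks of the x-observations in the combined sample.

    Assumption: the statistic is taken over the first sample; the identity
    score gives the Wilcoxon rank sum.
    """
    z = list(x) + list(y)
    ranks = ranks_by_indicators(z)
    return sum(score(r) for r in ranks[:len(x)])
```

With x = (3.1, 5.2) and y = (4.0, 1.0, 6.0), the combined ranks are 2, 4, 3, 1, 5, so the rank sum of the first sample is 6. Replacing the crisp indicator by a graded comparison of fuzzy numbers is what leads to a fuzzy version of the statistic.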
Now, to perform the nonparametric two-sample tests for the location or scale problem based on fuzzy observations x̃_{1}, x̃_{2}, …, x̃_{n} and ỹ_{1}, ỹ_{2}, …, ỹ_{m}, we need a suitable method of ranking fuzzy numbers. Here, we recall the definition of a common method of ranking fuzzy numbers called the necessity index of strict dominance (NSD index), suggested by Dubois and Prade [4].
For two fuzzy numbers Ã and B̃, we can evaluate the degree of necessity to which the relation Ã ≻ B̃ is fulfilled by
The fuzzy relation Nec is antisymmetric and transitive (i.e. it is a fuzzy partial order on (ℝ)). We use the NSD index because of its properties and its natural interpretation in some problems of statistics (for more details, see [7, 16]).
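As an illustration, the NSD index can be computed in closed form for triangular fuzzy numbers. The sketch below assumes the standard Dubois and Prade form Nec(Ã ≻ B̃) = 1 − sup_{x ≤ y} min{μ_{Ã}(x), μ_{B̃}(y)}, and assumes each fuzzy number is given as a triple (left endpoint, peak, right endpoint). The function name `nsd` is hypothetical.

```python
def nsd(a, b):
    """Nec(A > B) for triangular fuzzy numbers a = (a1, a2, a3) and b = (b1, b2, b3).

    Assumption: Nec(A > B) = 1 - sup_{x <= y} min(mu_A(x), mu_B(y)),
    with triples given as (left endpoint, peak, right endpoint).
    """
    a1, a2, a3 = a
    b1, b2, b3 = b
    if a2 <= b2:
        # The peaks themselves satisfy x <= y, so the supremum equals 1.
        possibility = 1.0
    elif a1 >= b3:
        # The supports do not overlap in the required direction.
        possibility = 0.0
    else:
        # Height at which the left side of A meets the right side of B.
        possibility = (b3 - a1) / ((a2 - a1) + (b3 - b2))
    return 1.0 - possibility
```

For instance, nsd((46, 56, 66), (36, 46, 56)) gives 0.5, while nsd((56, 66, 76), (25, 35, 45)) gives 1 because the supports are fully separated. For degenerate (crisp) triples the index reduces to the indicator of a > b, which is consistent with the fuzzy statistic reducing to the classical one for crisp data.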
Consider the problem of a nonparametric linear rank test based on the combined imprecise observations x̃_{1}, x̃_{2}, …, x̃_{n} and ỹ_{1}, ỹ_{2}, …, ỹ_{m} (denoted by z̃_{1}, z̃_{2}, …, z̃_{N}) whose α-cuts are
in which,
If the available fuzzy observations x̃_{1}, x̃_{2}, …, x̃_{n} and ỹ_{1}, ỹ_{2}, …, ỹ_{m} reduce to the crisp values x_{1}, x_{2}, …, x_{n}, and y_{1}, y_{2}, …, y_{m}, then the fuzzy linear rank test statistic T̃_{N} reduces to the classical linear rank test statistic T_{N} (see
In the linear rank tests for comparing two statistical populations, the usual approach is to compare the observed test statistic with the corresponding critical value. In this section, by introducing and applying the concept of a fuzzy critical value, we generalize this approach to the location and scale problems for fuzzy observations. We establish the approach for the location problem. First, we consider the case of testing H_{0}: θ = 0 against H_{1}: θ > 0. At a significance level of δ, the common method rejects H_{0} if T_{N} ≤ w_{δ}, where
Now, assume that the level of significance is given by a triangular fuzzy number as δ̃ = (ε_{1}, δ,ε_{2})_{T} [17]. Hence, every value satisfying
Consider the problem of testing the location hypotheses H_{0}: θ = 0 against H_{1}: θ > 0. At the fuzzy significance level δ̃ = (ε_{1}, δ, ε_{2})_{T}, the fuzzy critical value is defined to be a fuzzy set with the following membership function
in which
Suppose we wish to test H_{0}: θ = 0 against H_{1}: θ < 0, at the fuzzy significance level δ̃ = (ε_{1}, δ, ε_{2})_{T}. The fuzzy critical value is defined to be a fuzzy set with a membership function as follows
where
Now, we consider the case of testing H_{0}: θ = 0 against the alternative H_{1}: θ ≠ 0. In the classical case, at a crisp significance level of δ, we reject the null hypothesis H_{0} when T_{N} ≤ w_{δ} or
In the problem of testing H_{0}: θ = 0 against H_{1}: θ ≠ 0, at the fuzzy significance level δ̃ = (ε_{1}, δ, ε_{2})_{T}, the lower and upper fuzzy critical values are defined to be fuzzy sets with the following membership functions
Fuzzy lower critical value:
where,
Fuzzy upper critical value:
where,
Using normal approximation, we can also define the fuzzy critical values, for large sample sizes, in a similar way.
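The crisp critical values from which a fuzzy critical value is built can be computed exactly for small samples, or approximated for large ones. The following is a sketch under stated assumptions: T_{N} is taken to be the Wilcoxon rank sum of the first sample, and w_{δ} is taken to be the largest value w with P(T_{N} ≤ w) ≤ δ under H_{0}. Both choices, and all function names, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations
from statistics import NormalDist


def wilcoxon_null_cdf(n, m):
    """Exact null CDF of the rank sum of a sample of size n among N = n + m ranks."""
    N = n + m
    counts = {}
    # Under H0 every choice of n ranks out of N is equally likely.
    for ranks in combinations(range(1, N + 1), n):
        s = sum(ranks)
        counts[s] = counts.get(s, 0) + 1
    total = sum(counts.values())
    cdf, running = {}, 0
    for s in sorted(counts):
        running += counts[s]
        cdf[s] = Fraction(running, total)
    return cdf


def lower_critical_value(cdf, delta):
    """Largest w with P(T_N <= w) <= delta (assumed definition), or None if none exists."""
    feasible = [w for w, p in cdf.items() if p <= delta]
    return max(feasible) if feasible else None


def normal_approx_lower_critical_value(n, m, delta):
    """Large-sample approximation using the null mean and variance of the rank sum."""
    N = n + m
    mean = n * (N + 1) / 2
    sd = (n * m * (N + 1) / 12) ** 0.5
    return mean + NormalDist().inv_cdf(delta) * sd
```

For sample sizes 5 and 4, evaluating `lower_critical_value` at the levels 0.02, 0.05 and 0.08 (the endpoints and peak of a fuzzy level such as (0.02, 0.05, 0.08)_{T}) gives 16, 17 and 18, respectively. Sweeping δ over the α-cuts of the fuzzy significance level in this way is one natural route to a fuzzy critical value, though the membership functions defined above should be taken as authoritative.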
Let the probability distribution of T_{N} be symmetric. Then the upper fuzzy critical value for the two-sided alternative hypothesis H_{1}: θ ≠ 0 reduces as follows
Hence, based on interval arithmetic, it is easy to see that
Consider the linear rank test for the location problem with imprecise observations at a given fuzzy significance level. One can expect that, for the case of testing H_{0}: θ = 0 versus H_{1}: θ > 0, if the observed fuzzy test statistic is less than the corresponding fuzzy critical value, then H_{0} is rejected; otherwise H_{0} is accepted (a similar argument applies to the other two cases). To do this, we need a criterion for comparing the observed fuzzy test statistic and the fuzzy critical value. The generalized rejection regions may be represented as given in Table 1, in which ⋄ denotes a suitable ranking operator (see, for example, [18, 19]). One of the most commonly used methods for ranking fuzzy sets is based on the preference degree P_{c} [19].
For two discrete fuzzy sets Ã and B̃ with the supports {a_{1}, …, a_{n}} and {b_{1}, …, b_{m}}, the P_{c} index is defined as
and
where ⊙ is a t-norm operator that for a > 0 and b > 0 satisfies the condition a ⊙ b > 0 (see [14, 18]). In the following, we utilize the min operator as the t-norm operator.
The preference degree P_{c} takes its values in the interval [0, 1]. In addition, P_{c}(Ã,B̃) = 1 − P_{c}(B̃,Ã). It is obvious that if P_{c}(Ã,B̃) = 1, then Ã is absolutely preferred to B̃, if P_{c}(Ã,B̃) = 0.5 then Ã and B̃ are equally preferred, and B̃ is absolutely preferred to Ã if P_{c}(Ã,B̃) = 0.
Consider the null hypothesis H_{0}: θ = 0 against an alternative hypothesis given in Table 1 for the location problem. Based on Definition 4.1,
in the one-sided case (1), we reject H_{0} with preference degree
in the one-sided case (2), we reject H_{0} with preference degree
in the two-sided case (3), we reject H_{0} with preference degree
Note that the comparison approach used in the previous definition is subjective, and so none of the main results of the present work will be lost by altering these definitions to ones that fit the demands of the decision makers.
By an argument similar to that of the previous definition, we can define the preference degree to accept or reject the hypothesis H_{0}: θ = 1 versus an alternative hypothesis given in Table 6 for the scale problem.
It should be mentioned that the proposed test method (illustrated in Section 3) and the decision-making method are general, so one can also use them with nonsymmetric fuzzy numbers (observations). There is no limitation in this regard in the definitions of the fuzzy test statistic and fuzzy critical value, or in the definition of the preference degree P_{c}.
To clarify our proposed method, a numerical example is provided in this section.
Two potential suppliers of street lighting equipment, A and B, want to present their bids to a city manager. Two independent random samples of street lighting equipment, of sizes 5 and 4, were tested, one from each supplier; the samples are small because the tests are expensive and may take considerable time to complete. Since, under some unexpected situations, the life lengths cannot be measured precisely, each life length can only be reported as being around some number. The life lengths are reported as triangular fuzzy numbers, as shown in Table 2. We wish to test whether the life lengths of suppliers A and B have equal variability (i.e. H_{0}: σ_{A} = σ_{B}).
Before testing for scale, we must determine whether the locations (medians) can be regarded as equal (i.e. H_{0}: M_{A} = M_{B}). One of the most commonly used tests for the location problem is the Wilcoxon test. Using Definition 3.2, the fuzzy Wilcoxon test statistic is obtained as follows
At the fuzzy significance level δ̃ = (0.02, 0.05, 0.08)_{T}, since the probability distribution of W_{N} is symmetric, from Definition 3.5 and Remark 3.2, the fuzzy lower and upper critical values are calculated as follows
By using Definition 4.1, we obtain
Now, we wish to test whether the life lengths of suppliers A and B have equal variability. The fuzzy Siegel-Tukey test statistic can be obtained as
Since the probability distribution of S̃_{N} is the same as that of W_{N}, the fuzzy critical values are the same as those of the Wilcoxon test. We obtain
As natural generalizations of some nonparametric statistical tests, we proposed the corresponding tests based on fuzzy observations for the case in which the given significance level is also a fuzzy number. To do this, the usual concepts of the test statistic and critical value were extended to the concepts of the fuzzy test statistic and fuzzy critical value. For decision making (about rejection/acceptance of the null hypothesis of interest), a method was used to compare the observed fuzzy test statistic and the associated fuzzy critical value on the basis of a preference degree for ranking fuzzy sets. The proposed method is general, and so it can be applied to other nonparametric tests such as the Mann-Whitney test and the Kendall test.
Finally, it should be mentioned that in many real-world problems we initially come across fuzzy (vague) data, so the method proposed in this article (and, more generally, methods for fuzzy data) can be applied to such problems. Besides, we may consider the problem of testing fuzzy rather than crisp hypotheses, in which, by using crisp or fuzzy data, we wish to test some vague claims about the underlying statistical population(s) (see e.g., [21, 22]). Such extensions are suggested to expand the idea proposed in this article.
No potential conflict of interest relevant to this article was reported.
Rejection regions
Alternative  Rejection region 

(1) 

(2) 

(3) 
Data set in Example 5.1
Supplier A  Supplier B
(25, 35, 45)  (36, 46, 56)
(56, 66, 76)  (46, 56, 66)
(48, 58, 68)  (50, 60, 70)
(73, 83, 93)  (39, 49, 59)
(61, 71, 81)
Some well-known test statistics for the two-sample location problem
Test statistic  

Wilcoxon: 

van der Waerden: 

Terry-Hoeffding: 
Rejection regions for the general two-sample location problem
Alternative hypothesis: 
Rejection region 

Some well-known test statistics for the two-sample scale problem
Test Statistic  

Mood: 
( 
Ansari-Bradley: 

Klotz normal scores: 
(Φ^{−1}( 
Siegel-Tukey: 
Rejection regions for the general two-sample scale problem
Alternative hypothesis: 
Rejection region 
