Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2018; 18(4): 303-315

Published online December 31, 2018

https://doi.org/10.5391/IJFIS.2018.18.4.303

© The Korean Institute of Intelligent Systems

An Intelligent Similarity Model between Generalized Trapezoidal Fuzzy Numbers in Large Scale

Mohamedou Cheikh Tourad and Abdelmounaim Abdali

Applied Mathematics and Computer Science Laboratory, Cadi Ayyad University, Marrakech, Morocco

Correspondence to: Mohamedou Cheikh Tourad (cheikhtouradmohamedou@gmail.com)

Received: November 13, 2018; Revised: December 19, 2018; Accepted: December 24, 2018

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The rapid expansion of data published on the web has given rise to the large-scale similarity problem, an important research subject in computer science. Several methods have been developed to address it. In this paper, we propose the first mathematical model to find the similarity value between generalized trapezoidal fuzzy numbers (GTFNs). The model uses a fuzzy inference system to determine effective weights for the different similarity methods it combines, so that it can handle data at large scale. This model will allow us to develop intelligent systems. A comparative study based on 21 sets of GTFNs has been carried out to show the difference between our approach and existing methods. The study shows that our model gives more reasonable results than existing methods.

Keywords: Cosine coefficient, Jaccard index, GTFNs, FIS, Similarity, Large scale

1. Introduction

The concept of fuzzy logic, proposed in 1965 by Zadeh [1], has been used to manage a kind of probability. In practice, generalized trapezoidal fuzzy numbers (GTFNs) are the most popular way to employ this concept. Several similarity methods between GTFNs have been introduced in the literature (e.g., [2–10]). However, these existing similarity measures have many weaknesses: in many situations they cannot appropriately determine the similarity between two GTFNs. In the present study, we construct a novel mathematical model for measuring the similarity between GTFNs, based on weights associated with each underlying similarity measure. The model uses the cosine coefficient and the Jaccard index. We also state and prove three properties of the proposed model. A comparative study based on 21 sets of GTFNs shows that the model can overcome the limitations of the existing measures.

The remainder of this article is structured as follows. Section 2 summarizes the fundamental notions and the existing methods. Section 3 introduces the novel approach, presents the similarity methods it employs, and determines the weights associated with each of them; three properties are then stated and proven. Section 4 compares the approach with the existing similarity methods. Section 5 gives our conclusions.

2. Related Work

The notion of a generalized fuzzy number (GFN) T̃ is presented by Chen [11, 12] as follows: T̃ = {t1, t2, t3, t4, w}, where the tk (k = 1, 2, 3, 4) are real numbers and w is a weight with 0 < w ≤ 1. When w = 1, T̃ is a normal trapezoidal fuzzy number (NTFN). When t2 = t3, T̃ is called a generalized triangular fuzzy number. When t1 = t2 and t3 = t4, T̃ is called a crisp interval. When w = 1 and t1 = tk (where k = 1, 2, 3, 4), T̃ is a real number.
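The definition above can be captured in a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `GFN`, the method `kinds`, and the ordering constraint t1 ≤ t2 ≤ t3 ≤ t4 are assumptions made for illustration (the ordering is customary for trapezoidal numbers but is not stated in the text).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GFN:
    """Generalized fuzzy number (t1, t2, t3, t4; w). Hypothetical helper class."""
    t1: float
    t2: float
    t3: float
    t4: float
    w: float = 1.0

    def __post_init__(self):
        # Assumption: the four points are ordered, and the weight lies in (0, 1].
        if not (self.t1 <= self.t2 <= self.t3 <= self.t4):
            raise ValueError("expected t1 <= t2 <= t3 <= t4")
        if not (0.0 < self.w <= 1.0):
            raise ValueError("expected 0 < w <= 1")

    def kinds(self):
        """List the special cases of the definition that this number satisfies."""
        labels = []
        if self.w == 1.0:
            labels.append("normal")
        if self.t2 == self.t3:
            labels.append("triangular")
        if self.t1 == self.t2 and self.t3 == self.t4:
            labels.append("crisp interval")
        if self.w == 1.0 and self.t1 == self.t4:
            labels.append("real number")
        return labels


print(GFN(0.1, 0.2, 0.3, 0.4, 1.0).kinds())  # ['normal']
print(GFN(0.2, 0.2, 0.2, 0.2, 1.0).kinds())
# ['normal', 'triangular', 'crisp interval', 'real number']
```

The special cases are not mutually exclusive, which is why the sketch returns a list of labels rather than a single one.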

Several similarity methods between two GTFNs T̃ and H̃ = {h1, h2, h3, h4, wH̃} have been introduced. In the formulas below, wT̃ and wH̃ denote the weights of T̃ and H̃.

Chen [2] proposed a method for the similarity problem of GFNs based on the geometric distance, and applied it to fuzzy risk analysis and subjective mental workload assessment:

$$S_C(\tilde{T},\tilde{H}) = 1 - \frac{\sum_{k=1}^{4} |t_k - h_k|}{4}. \tag{1}$$

Hsieh and Chen [3] introduced an approach based on the graded mean integration representation distance:

$$S_{HC}(\tilde{T},\tilde{H}) = \frac{1}{1 + d(\tilde{T},\tilde{H})}, \tag{2}$$

where $d(\tilde{T},\tilde{H}) = |P(\tilde{T}) - P(\tilde{H})|$, $P(\tilde{T}) = \frac{t_1 + 2t_2 + 2t_3 + t_4}{6}$ and $P(\tilde{H}) = \frac{h_1 + 2h_2 + 2h_3 + h_4}{6}$.
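The two distance-based measures of Chen [2] and Hsieh and Chen [3] are short enough to sketch directly. This is an illustrative sketch only: a fuzzy number is assumed to be a 5-tuple (t1, t2, t3, t4, w), the function names are hypothetical, and the input pair is an assumed example, since the article's sets are given only as figures.

```python
def s_c(t, h):
    """Chen [2]: one minus the mean absolute difference of the four points."""
    return 1.0 - sum(abs(tk - hk) for tk, hk in zip(t[:4], h[:4])) / 4.0


def graded_mean(t):
    """P(T) = (t1 + 2*t2 + 2*t3 + t4) / 6."""
    return (t[0] + 2.0 * t[1] + 2.0 * t[2] + t[3]) / 6.0


def s_hc(t, h):
    """Hsieh and Chen [3]: 1 / (1 + |P(T) - P(H)|)."""
    return 1.0 / (1.0 + abs(graded_mean(t) - graded_mean(h)))


# Assumed illustrative inputs: a trapezoid and a triangle on the same support.
T = (0.1, 0.2, 0.3, 0.4, 1.0)
H = (0.1, 0.25, 0.25, 0.4, 1.0)
print(round(s_c(T, H), 4))   # 0.975
print(round(s_hc(T, H), 4))  # 1.0
```

For this pair the two numbers differ but share the same graded mean, so the second measure returns 1, which illustrates how a measure built on a single summary value cannot separate shapes with the same summary.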

Lee [4] presented a measure based on the $L_P$ norm, and used it to aggregate individual opinions into a group consensus in group decision making:

$$S_L(\tilde{T},\tilde{H}) = 1 - \frac{\|\tilde{T}-\tilde{H}\|_{L_P}}{\|U\|} \times 4^{-1/P}, \tag{3}$$

where $\|\tilde{T}-\tilde{H}\|_{L_P} = \left(\sum_{k=1}^{4} |t_k - h_k|^{P}\right)^{1/P}$ for a positive integer $P$, and $\|U\| = \max U - \min U$, where $U$ denotes the universe of discourse.

Chen and Chen [5] proposed a method based on the simple center of gravity method (SCGM), and used it for fuzzy risk analysis:

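$$S_{CC}(\tilde{T},\tilde{H}) = \left[1 - \frac{\sum_{k=1}^{4} |t_k - h_k|}{4}\right] \times \left(1 - |x^*_{\tilde{T}} - x^*_{\tilde{H}}|\right)^{B(S_{\tilde{T}},S_{\tilde{H}})} \times \frac{\min(y^*_{\tilde{T}},y^*_{\tilde{H}})}{\max(y^*_{\tilde{T}},y^*_{\tilde{H}})}, \tag{4}$$

where

$$y^*_{\tilde{T}} = \begin{cases} \dfrac{w_{\tilde{T}}}{6} \times \left(2 + \dfrac{t_3 - t_2}{t_4 - t_1}\right) & \text{if } t_4 \neq t_1, \\[2ex] \dfrac{w_{\tilde{T}}}{2} & \text{if } t_4 = t_1, \end{cases} \qquad y^*_{\tilde{H}} = \begin{cases} \dfrac{w_{\tilde{H}}}{6} \times \left(2 + \dfrac{h_3 - h_2}{h_4 - h_1}\right) & \text{if } h_4 \neq h_1, \\[2ex] \dfrac{w_{\tilde{H}}}{2} & \text{if } h_4 = h_1, \end{cases}$$

$$x^*_{\tilde{T}} = \begin{cases} \dfrac{y^*_{\tilde{T}} (t_3 + t_2) + (t_4 + t_1)(w_{\tilde{T}} - y^*_{\tilde{T}})}{2 w_{\tilde{T}}} & \text{if } w_{\tilde{T}} \neq 0, \\[2ex] \dfrac{t_4 + t_1}{2} & \text{if } w_{\tilde{T}} = 0, \end{cases} \qquad x^*_{\tilde{H}} = \begin{cases} \dfrac{y^*_{\tilde{H}} (h_3 + h_2) + (h_4 + h_1)(w_{\tilde{H}} - y^*_{\tilde{H}})}{2 w_{\tilde{H}}} & \text{if } w_{\tilde{H}} \neq 0, \\[2ex] \dfrac{h_4 + h_1}{2} & \text{if } w_{\tilde{H}} = 0, \end{cases}$$

$$B(S_{\tilde{T}},S_{\tilde{H}}) = \begin{cases} 1 & \text{if } S_{\tilde{T}} + S_{\tilde{H}} > 0, \\ 0 & \text{if } S_{\tilde{T}} + S_{\tilde{H}} = 0, \end{cases}$$

where $S_{\tilde{T}} = t_4 - t_1$ and $S_{\tilde{H}} = h_4 - h_1$.

Wei and Chen [7] proposed an approach based on the geometric distance, the perimeter and the height of GFNs, and used it for fuzzy risk analysis: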

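$$S_{WC}(\tilde{T},\tilde{H}) = \left(1 - \frac{\sum_{k=1}^{4} |t_k - h_k|}{4}\right) \times \frac{\min(P(\tilde{T}),P(\tilde{H})) + \min(w_{\tilde{T}},w_{\tilde{H}})}{\max(P(\tilde{T}),P(\tilde{H})) + \max(w_{\tilde{T}},w_{\tilde{H}})}, \tag{5}$$

where

$$P(\tilde{T}) = \sqrt{(t_1 - t_2)^2 + w_{\tilde{T}}^2} + (t_3 - t_2) + \sqrt{(t_4 - t_3)^2 + w_{\tilde{T}}^2} + (t_4 - t_1),$$

$$P(\tilde{H}) = \sqrt{(h_1 - h_2)^2 + w_{\tilde{H}}^2} + (h_3 - h_2) + \sqrt{(h_4 - h_3)^2 + w_{\tilde{H}}^2} + (h_4 - h_1).$$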
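Lee's measure (Eq. 3) and the perimeter-based measure of Wei and Chen (Eq. 5) can be sketched as follows. The sketch makes two assumptions that are not fixed by the text: the universe of discourse U is taken as the range spanned by the two supports, and P = 1 is used as the default order of the norm. Function names and the input pair are hypothetical.

```python
import math


def s_l(t, h, p=1):
    """Eq. (3), assuming U spans from the smallest t1/h1 to the largest t4/h4."""
    norm = sum(abs(tk - hk) ** p for tk, hk in zip(t[:4], h[:4])) ** (1.0 / p)
    universe = max(t[3], h[3]) - min(t[0], h[0])
    if universe == 0.0:
        # Both numbers collapse to one point: the quotient in Eq. (3) is undefined.
        raise ZeroDivisionError("||U|| is zero; Eq. (3) cannot be evaluated")
    return 1.0 - norm / (universe * 4.0 ** (1.0 / p))


def perimeter(t):
    """P(T): the two slanted sides plus the top and bottom bases."""
    t1, t2, t3, t4, w = t
    left = math.sqrt((t1 - t2) ** 2 + w ** 2)
    right = math.sqrt((t4 - t3) ** 2 + w ** 2)
    return left + (t3 - t2) + right + (t4 - t1)


def s_wc(t, h):
    """Eq. (5): distance term scaled by the perimeter/height ratio."""
    dist = sum(abs(tk - hk) for tk, hk in zip(t[:4], h[:4])) / 4.0
    pt, ph = perimeter(t), perimeter(h)
    ratio = (min(pt, ph) + min(t[4], h[4])) / (max(pt, ph) + max(t[4], h[4]))
    return (1.0 - dist) * ratio


# Assumed illustrative inputs.
T = (0.1, 0.2, 0.3, 0.4, 1.0)
H = (0.1, 0.25, 0.25, 0.4, 1.0)
print(round(s_l(T, H), 4))   # 0.9167
print(round(s_wc(T, H), 4))  # 0.95
```

Under the assumption on U, two identical crisp points give a zero-width universe, so the sketch raises an error instead of returning a value.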

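More recently, Xu et al. [8] modified the similarity method $S_{CC}$ of Chen and Chen [5], introducing a new similarity $S_X$, which they also applied to fuzzy risk analysis:

$$S_X(\tilde{T},\tilde{H}) = 1 - \frac{\sum_{k=1}^{4} |t_k - h_k|}{8} - \frac{d(\tilde{T},\tilde{H})}{2}, \tag{6}$$

where $d(\tilde{T},\tilde{H}) = \dfrac{\sqrt{(x^*_{\tilde{T}} - x^*_{\tilde{H}})^2 + (y^*_{\tilde{T}} - y^*_{\tilde{H}})^2}}{\sqrt{1.25}}$.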

Chen [9] introduced a method based on the geometric mean operator, and applied it to fuzzy information retrieval:

$$S_{SJ}(\tilde{T},\tilde{H}) = \left[\left(\prod_{k=1}^{4} \left(2 - |t_k - h_k|\right)\right)^{1/4} - 1\right] \times \frac{\min(y^*_{\tilde{T}},y^*_{\tilde{H}})}{\max(y^*_{\tilde{T}},y^*_{\tilde{H}})}, \tag{7}$$

where $y^*_{\tilde{T}}$ and $y^*_{\tilde{H}}$ are calculated as in Chen and Chen [5].
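The three measures that rely on the simple center of gravity (Eqs. 4, 6 and 7) share the same helper, so they can be sketched together. This is an illustrative sketch with hypothetical function names; a fuzzy number is assumed to be a 5-tuple (t1, t2, t3, t4, w) with w > 0, and the input pair is an assumed example rather than one of the article's sets.

```python
import math


def cog(t):
    """Simple center of gravity (x*, y*) of (t1, t2, t3, t4; w)."""
    t1, t2, t3, t4, w = t
    y = w * ((t3 - t2) / (t4 - t1) + 2.0) / 6.0 if t4 != t1 else w / 2.0
    x = (y * (t3 + t2) + (t4 + t1) * (w - y)) / (2.0 * w) if w != 0 else (t4 + t1) / 2.0
    return x, y


def mean_abs_diff(t, h):
    return sum(abs(tk - hk) for tk, hk in zip(t[:4], h[:4])) / 4.0


def s_cc(t, h):
    """Eq. (4), Chen and Chen [5]."""
    (xt, yt), (xh, yh) = cog(t), cog(h)
    b = 1 if (t[3] - t[0]) + (h[3] - h[0]) > 0 else 0
    return (1.0 - mean_abs_diff(t, h)) * (1.0 - abs(xt - xh)) ** b * min(yt, yh) / max(yt, yh)


def s_x(t, h):
    """Eq. (6), Xu et al. [8]; sum/8 equals half of the mean absolute difference."""
    (xt, yt), (xh, yh) = cog(t), cog(h)
    d = math.hypot(xt - xh, yt - yh) / math.sqrt(1.25)
    return 1.0 - mean_abs_diff(t, h) / 2.0 - d / 2.0


def s_sj(t, h):
    """Eq. (7), Chen [9]: geometric mean of (2 - |t_k - h_k|), minus one."""
    (_, yt), (_, yh) = cog(t), cog(h)
    prod = 1.0
    for tk, hk in zip(t[:4], h[:4]):
        prod *= 2.0 - abs(tk - hk)
    return (prod ** 0.25 - 1.0) * min(yt, yh) / max(yt, yh)


# Assumed illustrative inputs.
T = (0.1, 0.2, 0.3, 0.4, 1.0)
H = (0.1, 0.25, 0.25, 0.4, 1.0)
print(round(s_cc(T, H), 4))  # 0.8357
print(round(s_x(T, H), 4))   # 0.9627
print(round(s_sj(T, H), 4))  # 0.8356
```

With this assumed pair, the values coincide to four decimals with the Set1 entries reported for these three methods in Table 2, which suggests (but does not confirm) that the reconstruction of the formulas matches what the authors computed.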

We now present a novel mathematical model, MCESTA (Mohamedou Cheikh Elghotob Cheikh Saad bouh Cheikh Tourad Abass, Abdelmounaim Abdali), a large-scale similarity method between two GTFNs T̃ and H̃. It is a hybrid of similarity measures $S_k$ and sub-measures $S_{kq}$, each associated with an importance weight ($\alpha_k$ and $\beta_q$, respectively). MCESTA extends the formula $FBHSM = \sum_{k=1}^{n} \alpha_k \times S_k$ proposed by Gupta et al. [13] for calculating similarity in information retrieval (IR), and employs a fuzzy inference system (FIS) to find the effective weights. The hybrid similarity measure is calculated as follows:

$$\begin{cases} MCESTA(\tilde{T},\tilde{H}) = \sum_{k=1}^{n} \alpha_k \times S_k(\tilde{T},\tilde{H}), \\[1ex] S_k(\tilde{T},\tilde{H}) = \sum_{q=1}^{m_k} \beta_q \times S_{kq}(\tilde{T},\tilde{H}), \end{cases} \tag{8}$$

where $\sum_{k=1}^{n} \alpha_k \le 1$ and $\sum_{q=1}^{m_k} \beta_q \le 1$.

Substituting the second line into the first, our model reads

$$MCESTA(\tilde{T},\tilde{H}) = \sum_{k=1}^{n} \alpha_k \times \left[\sum_{q=1}^{m_k} \beta_q S_{kq}(\tilde{T},\tilde{H})\right], \tag{9}$$

where $S_k$ is a similarity method between T̃ and H̃, and $S_{kq}$ is a similarity sub-measure between T̃ and H̃ that should take into account the two criteria of their relative location and size. The sub-measures are combined to obtain a reasonable value of the similarity $S_k$. $\alpha_k$ is the importance weight associated with $S_k$, and $\beta_q$ is the importance weight associated with $S_{kq}$. The number $n$ is the number of similarity methods $S_k$ used in the model, and $m_k$ is the number of similarity sub-measures $S_{kq}$ used to calculate $S_k$.
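The two-level weighted sum of the model can be written as a short generic function. This is a minimal sketch under stated assumptions: the function name `hybrid_similarity`, the `groups` data layout, and the constant placeholder sub-measures in the usage example are all hypothetical and stand in for real similarity measures.

```python
def hybrid_similarity(t, h, groups):
    """Two-level weighted sum: sum_k alpha_k * sum_q beta_q * S_kq(t, h).

    `groups` is a list of (alpha_k, [(beta_q, S_kq), ...]) pairs, where each
    S_kq is a callable taking the two fuzzy numbers.
    """
    total = 0.0
    for alpha, sub_measures in groups:
        inner = sum(beta * measure(t, h) for beta, measure in sub_measures)
        total += alpha * inner
    return total


# Placeholder sub-measures returning fixed values (hypothetical, for illustration).
groups = [
    (0.5, [(1.0, lambda t, h: 0.8)]),                             # n = 1st method, m_1 = 1
    (0.5, [(0.5, lambda t, h: 0.6), (0.5, lambda t, h: 1.0)]),    # 2nd method, m_2 = 2
]
print(round(hybrid_similarity(None, None, groups), 4))  # 0.8
```

Keeping the weights next to the callables makes it straightforward to swap in weights produced by another component, such as a fuzzy inference system.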

3.1 Similarity Measures Sk and Skq

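The similarity methods employed, the cosine coefficient and the Jaccard index [14], are described below. This choice of similarity measures will be validated in Section 4. We have $n = 2$, $S_1 = \cos$ and $S_2 = \mathrm{Jaccard}$, and we choose $m_1 = 1$ and $m_2 = 2$, so that (9) becomes

$$MCESTA(\tilde{T},\tilde{H}) = \sum_{k=1}^{2} \alpha_k \times \left[\sum_{q=1}^{m_k} \beta_q S_{kq}(\tilde{T},\tilde{H})\right].$$

Writing out the sum with the corresponding $m_k$ for each $k$ gives

$$\begin{aligned} MCESTA(\tilde{T},\tilde{H}) &= \alpha_1 \times \left[\sum_{q=1}^{m_1} \beta_q S_{1q}(\tilde{T},\tilde{H})\right] + \alpha_2 \times \left[\sum_{q=1}^{m_2} \beta_q S_{2q}(\tilde{T},\tilde{H})\right] \\ &= \alpha_1 \times \left[\beta_1 S_{11}(\tilde{T},\tilde{H})\right] + \alpha_2 \times \left[\beta_1 S_{21}(\tilde{T},\tilde{H}) + \beta_2 S_{22}(\tilde{T},\tilde{H})\right] \\ &= \alpha_1 \times \beta_1 \cos(\tilde{T},\tilde{H}) + \alpha_2 \times \beta_1 \mathrm{Jaccard}_1(\tilde{T},\tilde{H}) + \alpha_2 \times \beta_2 \mathrm{Jaccard}_2(\tilde{T},\tilde{H}), \end{aligned}$$

where $S_{11} = \cos$, $S_{21} = \mathrm{Jaccard}_1$ and $S_{22} = \mathrm{Jaccard}_2$. To simplify the calculation, we introduce the variables $C = \alpha_1 \times \beta_1$, $J_1 = \alpha_2 \times \beta_1$ and $J_2 = \alpha_2 \times \beta_2$, so that the equation becomes

$$MCESTA(\tilde{T},\tilde{H}) = C \times \cos(\tilde{T},\tilde{H}) + J_1 \times \mathrm{Jaccard}_1(\tilde{T},\tilde{H}) + J_2 \times \mathrm{Jaccard}_2(\tilde{T},\tilde{H}).$$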
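3.1.1 The cosine coefficient cos(T̃, H̃)

The cosine coefficient measures the similarity between two vectors by comparing their directions. To compute cos(T̃, H̃), each GTFN is first mapped to a vector of eight components:

$$x_k = t_k \ (k = 1, 2, 3, 4), \quad x_5 = w_{\tilde{T}}, \quad x_6 = t_2 - t_1, \quad x_7 = t_3 - t_2, \quad x_8 = t_4 - t_3,$$

and likewise $y_k = h_k$ ($k = 1, 2, 3, 4$), $y_5 = w_{\tilde{H}}$, $y_6 = h_2 - h_1$, $y_7 = h_3 - h_2$, $y_8 = h_4 - h_3$. Then

$$\cos(\tilde{T},\tilde{H}) = \frac{\sum_{k=1}^{8} x_k \times y_k}{\left(\sum_{k=1}^{8} x_k^2\right)^{1/2} \times \left(\sum_{k=1}^{8} y_k^2\right)^{1/2}}.$$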
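The eight-component mapping and the cosine coefficient translate directly into code. The following is a minimal sketch with hypothetical function names; a fuzzy number is assumed to be a 5-tuple (t1, t2, t3, t4, w), and the input pair is an assumed example.

```python
import math


def features(t):
    """Eight components: the four points, the weight, and the three gaps."""
    t1, t2, t3, t4, w = t
    return [t1, t2, t3, t4, w, t2 - t1, t3 - t2, t4 - t3]


def cosine(t, h):
    """Cosine of the angle between the two eight-component vectors."""
    x, y = features(t), features(h)
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Assumed illustrative inputs.
T = (0.1, 0.2, 0.3, 0.4, 1.0)
H = (0.1, 0.25, 0.25, 0.4, 1.0)
print(round(cosine(T, H), 4))  # 0.9925
```

Because the weight w is one of the components and is positive, neither vector is the zero vector, so the division is always defined under the assumption w > 0.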
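3.1.2 The Jaccard Index

The Jaccard index, Jaccard(T̃, H̃) [10], measures similarity as the size of the intersection divided by the size of the union.

Jaccard1(T̃, H̃) is calculated as follows: $x_k = t_k - m$ ($k = 1, 2, 3, 4$), $x_5 = w_{\tilde{T}}$, $x_6 = t_2 - t_1$, $x_7 = t_3 - t_2$, $x_8 = t_4 - t_3$, and $y_k = h_k - m$ ($k = 1, 2, 3, 4$), $y_5 = w_{\tilde{H}}$, $y_6 = h_2 - h_1$, $y_7 = h_3 - h_2$, $y_8 = h_4 - h_3$, where $m = \min(t_1, h_1)$. Then

$$\mathrm{Jaccard}_1(\tilde{T},\tilde{H}) = \frac{\sum_{k=1}^{8} x_k \times y_k}{\sum_{k=1}^{8} x_k^2 + \sum_{k=1}^{8} y_k^2 - \sum_{k=1}^{8} x_k \times y_k}.$$

Jaccard2(T̃, H̃) is calculated as follows: $x^*_{5-k} = M - x_k$ ($k = 1, 2, 3, 4$), $x^*_5 = w_{\tilde{T}}$, $x^*_6 = t_2 - t_1$, $x^*_7 = t_3 - t_2$, $x^*_8 = t_4 - t_3$, and $y^*_{5-k} = M - y_k$ ($k = 1, 2, 3, 4$), $y^*_5 = w_{\tilde{H}}$, $y^*_6 = h_2 - h_1$, $y^*_7 = h_3 - h_2$, $y^*_8 = h_4 - h_3$, where $M = \max(t_4, h_4)$. Then

$$\mathrm{Jaccard}_2(\tilde{T},\tilde{H}) = \frac{\sum_{k=1}^{8} x^*_k \times y^*_k}{\sum_{k=1}^{8} (x^*_k)^2 + \sum_{k=1}^{8} (y^*_k)^2 - \sum_{k=1}^{8} x^*_k \times y^*_k}.$$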
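The two Jaccard sub-measures differ only in how the four points are shifted before the index is computed, which the following sketch makes explicit. It is illustrative only: function names are hypothetical, a fuzzy number is assumed to be a 5-tuple (t1, t2, t3, t4, w), and for Jaccard2 the sketch assumes that the shifted components are the original points mirrored about M (that is, M minus t_k and M minus h_k), which is one reading of the definition above.

```python
def _jaccard(x, y):
    """Generalized Jaccard index of two real vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)


def _shape(t):
    """Components 5 to 8: the weight and the three gaps between points."""
    t1, t2, t3, t4, w = t
    return [w, t2 - t1, t3 - t2, t4 - t3]


def jaccard1(t, h):
    """Points measured from the common left end m = min(t1, h1)."""
    m = min(t[0], h[0])
    x = [tk - m for tk in t[:4]] + _shape(t)
    y = [hk - m for hk in h[:4]] + _shape(h)
    return _jaccard(x, y)


def jaccard2(t, h):
    """Points measured from the common right end M = max(t4, h4).

    Assumption: x*_{5-k} = M - t_k, so the order of the four points is reversed.
    """
    big_m = max(t[3], h[3])
    x = [big_m - tk for tk in reversed(t[:4])] + _shape(t)
    y = [big_m - hk for hk in reversed(h[:4])] + _shape(h)
    return _jaccard(x, y)


# Assumed illustrative inputs: the same trapezoid shifted to the right by 0.3.
T = (0.1, 0.2, 0.3, 0.4, 1.0)
H = (0.4, 0.5, 0.6, 0.7, 1.0)
print(round(jaccard1(T, H), 4), round(jaccard2(T, H), 4))
```

Measuring once from the left end and once from the right end lets the pair of sub-measures account for relative location from both sides of the support.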

3.2 The Methods to Define the Weights C, J1 and J2

Our choice is based on an FIS; such systems are well established and have been used in many different fields [15, 16]. To calculate the weights in Eq. 9, we use the Mamdani-type FIS described in [13, 17–19], adapted to our model. Figure 1 shows the Mamdani FIS for MCESTA.

Table 1 presents the FIS fuzzy rules for MCESTA, a set of linguistic statements that define how the Mamdani FIS maps the input state (Cosine, Jaccard1 and Jaccard2) to the controlled outputs (C, J1 and J2). The table uses the membership functions shown in Figure 2.

Figure 3 shows the FIS diagram for MCESTA, that is, the procedure that maps a given input value to an output value using the concepts of fuzzy logic. This figure shows the most important values for the weights C, J1 and J2.

Figures 4 to 9 show the FIS rule surface diagrams for the weights (C, J1 and J2) in MCESTA.
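To make the Mamdani procedure concrete, the sketch below implements a small FIS with the rule pattern of Table 1, in which each output takes the linguistic level of its matching input (J1 follows Jaccard1, J2 follows Jaccard2, C follows Cos). It is not the authors' system: the triangular membership functions, the min/max operators, the centroid defuzzification on a uniform grid, and all names are assumptions, because the actual membership functions appear only in Figure 2.

```python
from itertools import product

# Assumed membership functions on [0, 1] for Low, Medium and High.
TERMS = {
    "L": lambda x: max(0.0, 1.0 - 2.0 * x),
    "M": lambda x: max(0.0, 1.0 - abs(2.0 * x - 1.0)),
    "H": lambda x: max(0.0, 2.0 * x - 1.0),
}


def mamdani_weights(jaccard1, jaccard2, cos, steps=200):
    """Return (C, J1, J2) from the three similarity values (hypothetical sketch)."""
    inputs = (jaccard1, jaccard2, cos)
    # strength[i][term]: strongest firing among rules whose i-th output is `term`.
    strength = [dict.fromkeys(TERMS, 0.0) for _ in inputs]
    for levels in product(TERMS, repeat=3):  # the 27 rules of Table 1
        fire = min(TERMS[lv](v) for lv, v in zip(levels, inputs))
        for i, lv in enumerate(levels):
            strength[i][lv] = max(strength[i][lv], fire)
    grid = [k / steps for k in range(steps + 1)]
    outputs = []
    for per_term in strength:
        # Clip each output set by its firing strength, then aggregate with max.
        agg = [max(min(per_term[lv], TERMS[lv](z)) for lv in TERMS) for z in grid]
        area = sum(agg)
        outputs.append(sum(z * a for z, a in zip(grid, agg)) / area if area else 0.0)
    j1, j2, c = outputs
    return c, j1, j2


c, j1, j2 = mamdani_weights(0.5, 0.5, 1.0)
print(round(c, 3), round(j1, 3), round(j2, 3))
```

With these assumed membership functions a higher cosine value yields a higher weight C; the numerical weights reported in the article depend on its own membership functions and will differ from the ones produced here.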

3.3 Properties

3.3.1 Identical

$$\tilde{T} = \tilde{H} \iff MCESTA(\tilde{T},\tilde{H}) = 1.$$

3.3.2 Symmetric relation

$$MCESTA(\tilde{T},\tilde{H}) = MCESTA(\tilde{H},\tilde{T}).$$

3.3.3 Scale-free

This property was used by Hwang and Yang [10] to validate their proposed similarity method. We have

$$MCESTA(\tilde{T}_1,\tilde{H}_1) = MCESTA(\tilde{T},\tilde{H})$$

for T̃1 = α × T̃ = (α × t1, α × t2, α × t3, α × t4, α × w) and H̃1 = α × H̃ = (α × h1, α × h2, α × h3, α × h4, α × w), where α > 0. A good similarity measure should normally be scale-free, and this property is particularly significant at large scale. From it we deduce that a very large data set Set1 = (T̃1, H̃1) can be analyzed through a smaller batch of data Set = (T̃, H̃): under the scale relation T̃1 = α × T̃ and H̃1 = α × H̃, interpreting and analyzing Set gives a global picture of Set1. This matters when it is difficult to process everything in a big data context, or when the machines do not have sufficient capacity.

3.4 Proof of Properties

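To show that our model satisfies the three properties, it is enough to show that every similarity measure used in the model (Cosine, Jaccard1 and Jaccard2) satisfies them. For the sub-measures Jaccard1 and Jaccard2, this is proved by Hwang and Yang [10]. It remains to treat Cosine.

3.4.1 MCESTA is identical

$$\tilde{T} = \tilde{H} \iff \cos(\tilde{T},\tilde{H}) = 1.$$

  • The ⇒ direction: since T̃ = H̃, we have $x_k = y_k$ for $k = 1, 2, \ldots, 8$. Therefore

    $$\frac{\sum_{k=1}^{8} x_k \times y_k}{\left(\sum_{k=1}^{8} x_k^2\right)^{1/2} \times \left(\sum_{k=1}^{8} y_k^2\right)^{1/2}} = \frac{\sum_{k=1}^{8} x_k \times x_k}{\left(\sum_{k=1}^{8} x_k^2\right)^{1/2} \times \left(\sum_{k=1}^{8} x_k^2\right)^{1/2}} = \frac{\sum_{k=1}^{8} x_k^2}{\sum_{k=1}^{8} x_k^2} = 1.$$

    Thus,

    $$\tilde{T} = \tilde{H} \Rightarrow \cos(\tilde{T},\tilde{H}) = 1.$$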

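  • The ⇐ direction is proved as follows. Since

    $$\cos(\tilde{T},\tilde{H}) = 1 \Rightarrow \frac{\sum_{k=1}^{8} (x_k \times y_k)}{\left(\sum_{k=1}^{8} x_k^2\right)^{1/2} \times \left(\sum_{k=1}^{8} y_k^2\right)^{1/2}} = 1 \Rightarrow \sum_{k=1}^{8} (x_k \times y_k) = \left(\sum_{k=1}^{8} x_k^2\right)^{1/2} \times \left(\sum_{k=1}^{8} y_k^2\right)^{1/2} \Rightarrow \left(\sum_{k=1}^{8} x_k \times y_k\right)^2 = \left(\sum_{k=1}^{8} x_k^2\right) \times \left(\sum_{k=1}^{8} y_k^2\right),$$

we have, on the one hand,

$$\text{(a)} \quad \left(\sum_{k=1}^{8} x_k^2\right) \times \left(\sum_{k=1}^{8} y_k^2\right) = \sum_{k=1}^{8} x_k^2 \times y_k^2 + \sum_{k=1}^{8} \sum_{q=1, q \neq k}^{8} x_k^2 \times y_q^2,$$

and, on the other hand,

$$\text{(b)} \quad \left(\sum_{k=1}^{8} x_k \times y_k\right)^2 = \sum_{k=1}^{8} x_k^2 \times y_k^2 + \sum_{k=1}^{8} \sum_{q=1, q \neq k}^{8} (x_k \times y_k) \times (x_q \times y_q).$$

The difference (a) − (b) is therefore 0:

$$\sum_{k=1}^{8} \sum_{q=1, q \neq k}^{8} x_k^2 \times y_q^2 - \sum_{k=1}^{8} \sum_{q=1, q \neq k}^{8} (x_k \times y_k) \times (x_q \times y_q) = 0, \qquad \sum_{k=1}^{8} \sum_{q=1, q \neq k}^{8} (x_k \times y_q) \times \left[(x_k \times y_q) - (x_q \times y_k)\right] = 0.$$

Pairing the terms $(k, q)$ and $(q, k)$ turns this double sum into $\sum_{k<q} (x_k \times y_q - x_q \times y_k)^2$, a sum of squares, so it vanishes only if every term vanishes.

Moreover, we have

$$0 \le \frac{\sum_{k=1}^{8} x_k \times y_k}{\left(\sum_{k=1}^{8} x_k^2\right)^{1/2} \times \left(\sum_{k=1}^{8} y_k^2\right)^{1/2}} \le 1 \Rightarrow 0 \le \frac{\left(\sum_{k=1}^{8} x_k \times y_k\right)^2}{\left(\sum_{k=1}^{8} x_k^2\right) \times \left(\sum_{k=1}^{8} y_k^2\right)} \le 1 \Rightarrow \text{(a)} - \text{(b)} \ge 0,$$

so (a) − (b) = 0 implies $(x_k \times y_q) \times \left[(x_k \times y_q) - (x_q \times y_k)\right] = 0$ for all $k \neq q$. Since $x_k \neq 0$ and $y_k \neq 0$, it follows that $(x_k \times y_q) - (x_q \times y_k) = 0$, that is, there exists $\lambda \ge 0$ such that $\frac{x_k}{y_k} = \frac{x_q}{y_q} = \lambda$. In particular, $\lambda = 1$ gives $x_k = y_k$.

Therefore, we have $x_k = y_k$ for $k = 1, 2, \ldots, 8$, and T̃ = H̃.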

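3.4.2 MCESTA is a symmetric relation

$$\cos(\tilde{T},\tilde{H}) = \cos(\tilde{H},\tilde{T}).$$

This follows since $\sum_{k=1}^{8} x_k \times y_k = \sum_{k=1}^{8} y_k \times x_k$ and $\left(\sum_{k=1}^{8} x_k^2\right)^{1/2} \times \left(\sum_{k=1}^{8} y_k^2\right)^{1/2} = \left(\sum_{k=1}^{8} y_k^2\right)^{1/2} \times \left(\sum_{k=1}^{8} x_k^2\right)^{1/2}$, so that cos(T̃, H̃) = cos(H̃, T̃).

3.4.3 MCESTA is scale-free

$$\cos(\tilde{T}_1,\tilde{H}_1) = \cos(\tilde{T},\tilde{H}).$$

Suppose T̃1 = α × T̃ = (α × t1, α × t2, α × t3, α × t4, α × w) and H̃1 = α × H̃ = (α × h1, α × h2, α × h3, α × h4, α × w), where α > 0. Every component $x_k$ and $y_k$ is then multiplied by α, so

$$\cos(\tilde{T}_1,\tilde{H}_1) = \frac{\sum_{k=1}^{8} (\alpha \times x_k) \times (\alpha \times y_k)}{\left(\sum_{k=1}^{8} (\alpha \times x_k)^2\right)^{1/2} \times \left(\sum_{k=1}^{8} (\alpha \times y_k)^2\right)^{1/2}} = \frac{\alpha^2 \sum_{k=1}^{8} x_k \times y_k}{\alpha^2 \left(\sum_{k=1}^{8} x_k^2\right)^{1/2} \times \left(\sum_{k=1}^{8} y_k^2\right)^{1/2}} = \cos(\tilde{T},\tilde{H}).$$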
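The symmetry and scale-free properties of the cosine coefficient, and the value 1 for identical inputs, can also be checked numerically. The sketch below is a sanity check on one assumed pair of inputs, not a proof; function names are hypothetical, and the scaling follows the definition above, where the weight w is multiplied by α as well.

```python
import math


def features(t):
    t1, t2, t3, t4, w = t
    return [t1, t2, t3, t4, w, t2 - t1, t3 - t2, t4 - t3]


def cosine(t, h):
    x, y = features(t), features(h)
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def scale(t, alpha):
    """Multiply every entry, including the weight, by alpha."""
    return tuple(alpha * v for v in t)


# Assumed illustrative inputs.
T = (0.1, 0.2, 0.3, 0.4, 1.0)
H = (0.4, 0.5, 0.6, 0.7, 1.0)

assert math.isclose(cosine(T, T), 1.0)                                    # identical inputs
assert math.isclose(cosine(T, H), cosine(H, T))                           # symmetric
assert math.isclose(cosine(scale(T, 0.5), scale(H, 0.5)), cosine(T, H))   # scale-free
print("all three checks passed")
```

The check uses `math.isclose` rather than exact equality because the scaled and unscaled computations round differently in floating point.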

4.1 Implementation

Our similarity approach and the existing methods SC (Eq. 1), SHC (Eq. 2), SL (Eq. 3), SCC (Eq. 4), SWC (Eq. 5), SX (Eq. 6) and SSJ (Eq. 7) are implemented in Python 3.6 with data science libraries (NumPy, SciPy). All calculations were done on a 3.60 GHz quad-core processor with 16 GB of memory for the JVM. The sets of GTFNs are implemented as associated lists of arrays (Array List).

4.2 Examples and Comparisons

To validate our contribution and compare it with the existing methods, we performed the calculations on the sets already used by Hwang and Yang [10] in 2014. There are 21 sets (Set1 to Set21) of fuzzy numbers, shown in Figures 10 to 30, respectively. The values obtained by our approach and by the existing methods are given in Table 2, from which we can demonstrate and analyze the weaknesses and limitations of each existing similarity measure.

We analyze Table 2 under three headings: incorrect results, scale-dependent results, and direction.

4.2.1 Incorrect results
  • In Set1, SHC(Ã,Ã1) = 1: we can judge this result as incorrect.

  • For Set3 and Set4, the similarity in Set4 should be higher than in Set3, but the similarity methods SC(Ã,Ã1), SHC(Ã,Ã1) and SL(Ã,Ã1) produce the same result for both sets: we can judge this result as incorrect.

  • In Set5, Ã ≠ Ã1, yet the similarity produced by SC(Ã,Ã1), SHC(Ã,Ã1) and SL(Ã,Ã1) is 1: the result is incorrect.

  • In Set6, SL(Ã,Ã1) cannot evaluate the degree of similarity: we can judge this result as incorrect.

  • For Set7, the similarity produced by SL(Ã,Ã1) is 0: the result is incorrect.

  • For Set8 and Set9, the similarity in Set8 should differ from that in Set9, but the similarity methods SC(Ã,Ã1) and SHC(Ã,Ã1) produce the same result for both sets: the result is incorrect.

  • For Set9 and Set10, the similarity in Set9 should be higher than in Set10. However, Table 2 shows that the methods SC(Ã,Ã1), SHC(Ã,Ã1), SL(Ã,Ã1), SCC(Ã,Ã1) and SX(Ã,Ã1) produce incorrect results.

  • In Set11, Ã ≠ Ã1, yet the similarity produced by SHC(Ã,Ã1) is 1: we can judge this result as incorrect.

  • In Set14, Ã ≠ Ã1, yet the similarity produced by SC(Ã,Ã1), SHC(Ã,Ã1) and SL(Ã,Ã1) is 1: the result is incorrect.

  • For Set14 and Set15, the similarity in Set14 should be higher than in Set15. However, Table 2 shows that the methods SC(Ã,Ã1), SHC(Ã,Ã1), SL(Ã,Ã1), SCC(Ã,Ã1), SX(Ã,Ã1) and SSJ(Ã,Ã1) produce incorrect results.

  • For the two different sets Set16 and Set17, the methods SC(Ã,Ã1), SHC(Ã,Ã1), SCC(Ã,Ã1) and SX(Ã,Ã1) produce incorrect results.

  • For Set18 and Set19, the similarity in Set18 should be higher than in Set19, but the similarity methods SC(Ã,Ã1), SHC(Ã,Ã1), SL(Ã,Ã1), SCC(Ã,Ã1), SWC(Ã,Ã1), SX(Ã,Ã1) and SSJ(Ã,Ã1) produce the same result for both sets: the result is incorrect.

4.2.2 Scale-dependent results
  • Set20 and Set21 are in a double-scale relation. A good similarity method should satisfy property 3 (scale-free), but the similarity methods SC(Ã,Ã1), SHC(Ã,Ã1), SL(Ã,Ã1), SCC(Ã,Ã1), SWC(Ã,Ã1), SX(Ã,Ã1) and SSJ(Ã,Ã1) are scale-dependent (not scale-free), whereas our approach is scale-free.
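The scale-dependence claim can be read off Table 2 mechanically by comparing the Set20 and Set21 rows. The snippet below does this with the values as they appear in Table 2; the column order (SL, SHC, SC, SCC, SWC, SX, SSJ, our approach) follows the table header, and the variable names and tolerance are illustrative choices.

```python
METHODS = ["SL", "SHC", "SC", "SCC", "SWC", "SX", "SSJ", "MCESTA"]

# Rows for Set20 and Set21 as listed in Table 2.
SET20 = [0.5, 0.7692, 0.7, 0.49, 0.7, 0.7158, 0.7, 0.8551]
SET21 = [0.81, 0.8696, 0.85, 0.7225, 0.85, 0.8579, 0.85, 0.8551]

# A method is flagged when its value changes between the two scaled sets.
scale_dependent = [m for m, a, b in zip(METHODS, SET20, SET21) if abs(a - b) > 1e-4]
print(scale_dependent)
# ['SL', 'SHC', 'SC', 'SCC', 'SWC', 'SX', 'SSJ']
```

Only the last column keeps the same value on both rows, which is the behaviour the scale-free property describes.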

4.2.3 Direction (Cosine) passing to large scale
  • The direction similarity is very important. This measurement is used in the big data framework when the data tends towards the infinite. However, the similarity methods SC(Ã,Ã1), SHC(Ã,Ã1), SL(Ã,Ã1), SCC(Ã,Ã1), SWC(Ã,Ã1), SX(Ã,Ã1) and SSJ(Ã,Ã1) do not use the direction, whereas our approach uses it with a weight C = 0.508 (see Figure 3).

From the analysis of Table 2, the three properties and the direction captured by the cosine coefficient make this similarity approach the best choice for the large scale.

A novel mathematical model for GTFNs, MCESTA, has been presented, together with the existing methods. The model is based on weights associated with each of several different similarity measures, and we infer these importance weights with a Mamdani-type FIS [17]. The model uses the cosine coefficient and the Jaccard index. Three properties of the model are proved, one of which is advantageous for use with large-scale datasets. A comparative study has also been presented to show how the novel approach can overcome the limitations and weaknesses of the existing methods. This approach will help us develop intelligent filtering for the pub-sub system [20, 21].

Fig. 1. The Mamdani for MCESTA.

Fig. 2. FIS value membership functions.

Fig. 3. FIS-diagram for the MCESTA.

Fig. 4. FIS-rule surface diagram for C by Cosine and Jaccard1 in MCESTA.

Fig. 5. FIS-rule surface diagram for C by Cosine and Jaccard2 in MCESTA.

Fig. 6. FIS-rule surface diagram for J1 by Jaccard1 and Cosine in MCESTA.

Fig. 7. FIS-rule surface diagram for J1 by Jaccard1 and Jaccard2 in MCESTA.

Fig. 8. FIS-rule surface diagram for J2 by Jaccard2 and Cosine in MCESTA.

Fig. 9. FIS-rule surface diagram for J2 by Jaccard2 and Jaccard1 in MCESTA.

Fig. 10. Set1.

Fig. 11. Set2.

Fig. 12. Set3.

Fig. 13. Set4.

Fig. 14. Set5.

Fig. 15. Set6.

Fig. 16. Set7.

Fig. 17. Set8.

Fig. 18. Set9.

Fig. 19. Set10.

Fig. 20. Set11.

Fig. 21. Set12.

Fig. 22. Set13.

Fig. 23. Set14.

Fig. 24. Set15.

Fig. 25. Set16.

Fig. 26. Set17.

Fig. 27. Set18.

Fig. 28. Set19.

Fig. 29. Set20.

Fig. 30. Set21.

Table 1. FIS fuzzy rule for MCESTA.

IF (Jaccard1 is L) and (Jaccard2 is L) and (Cos is L) THEN (J1 is L) (J2 is L) (C is L) (1).
IF (Jaccard1 is L) and (Jaccard2 is M) and (Cos is L) THEN (J1 is L) (J2 is M) (C is L) (1).
IF (Jaccard1 is L) and (Jaccard2 is H) and (Cos is L) THEN (J1 is L) (J2 is H) (C is L) (1).
IF (Jaccard1 is M) and (Jaccard2 is L) and (Cos is L) THEN (J1 is M) (J2 is L) (C is L) (1).
IF (Jaccard1 is M) and (Jaccard2 is M) and (Cos is L) THEN (J1 is M) (J2 is M) (C is L) (1).
IF (Jaccard1 is M) and (Jaccard2 is H) and (Cos is L) THEN (J1 is M) (J2 is H) (C is L) (1).
IF (Jaccard1 is H) and (Jaccard2 is L) and (Cos is L) THEN (J1 is H) (J2 is L) (C is L) (1).
IF (Jaccard1 is H) and (Jaccard2 is M) and (Cos is L) THEN (J1 is H) (J2 is M) (C is L) (1).
IF (Jaccard1 is H) and (Jaccard2 is H) and (Cos is L) THEN (J1 is H) (J2 is H) (C is L) (1).
IF (Jaccard1 is L) and (Jaccard2 is L) and (Cos is M) THEN (J1 is L) (J2 is L) (C is M) (1).
IF (Jaccard1 is L) and (Jaccard2 is M) and (Cos is M) THEN (J1 is L) (J2 is M) (C is M) (1).
IF (Jaccard1 is L) and (Jaccard2 is H) and (Cos is M) THEN (J1 is L) (J2 is H) (C is M) (1).
IF (Jaccard1 is M) and (Jaccard2 is L) and (Cos is M) THEN (J1 is M) (J2 is L) (C is M) (1).
IF (Jaccard1 is M) and (Jaccard2 is M) and (Cos is M) THEN (J1 is M) (J2 is M) (C is M) (1).
IF (Jaccard1 is M) and (Jaccard2 is H) and (Cos is M) THEN (J1 is M) (J2 is H) (C is M) (1).
IF (Jaccard1 is H) and (Jaccard2 is L) and (Cos is M) THEN (J1 is H) (J2 is L) (C is M) (1).
IF (Jaccard1 is H) and (Jaccard2 is M) and (Cos is M) THEN (J1 is H) (J2 is M) (C is M) (1).
IF (Jaccard1 is H) and (Jaccard2 is H) and (Cos is M) THEN (J1 is H) (J2 is H) (C is M) (1).
IF (Jaccard1 is L) and (Jaccard2 is L) and (Cos is H) THEN (J1 is L) (J2 is L) (C is H) (1).
IF (Jaccard1 is L) and (Jaccard2 is M) and (Cos is H) THEN (J1 is L) (J2 is M) (C is H) (1).
IF (Jaccard1 is L) and (Jaccard2 is H) and (Cos is H) THEN (J1 is L) (J2 is H) (C is H) (1).
IF (Jaccard1 is M) and (Jaccard2 is L) and (Cos is H) THEN (J1 is M) (J2 is L) (C is H) (1).
IF (Jaccard1 is M) and (Jaccard2 is M) and (Cos is H) THEN (J1 is M) (J2 is M) (C is H) (1).
IF (Jaccard1 is M) and (Jaccard2 is H) and (Cos is H) THEN (J1 is M) (J2 is H) (C is H) (1).
IF (Jaccard1 is H) and (Jaccard2 is L) and (Cos is H) THEN (J1 is H) (J2 is L) (C is H) (1).
IF (Jaccard1 is H) and (Jaccard2 is M) and (Cos is H) THEN (J1 is H) (J2 is M) (C is H) (1).
IF (Jaccard1 is H) and (Jaccard2 is H) and (Cos is H) THEN (J1 is H) (J2 is H) (C is H) (1).

L, Low; M, Medium; H, High.


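Table 2. Comparison.

| Set | SL | SHC | SC | SCC | SWC | SX | SSJ | Our approach |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.9167 | 1.0 | 0.975 | 0.8357 | 0.95 | 0.9627 | 0.8356 | 0.9877 |
| 2 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 3 | 0.5 | 0.7692 | 0.7 | 0.42 | 0.682 | 0.7136 | 0.5997 | 0.8475 |
| 4 | 0.5 | 0.7692 | 0.7 | 0.49 | 0.7 | 0.7158 | 0.7 | 0.8551 |
| 5 | 1.0 | 1.0 | 1.0 | 0.8 | 0.8248 | 0.9652 | 0.8 | 0.9797 |
| 6 | # | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 7 | 0.0 | 0.9091 | 0.9 | 0.9 | 0.9 | 0.9053 | 0.9 | 0.9725 |
| 8 | 0.5 | 0.9091 | 0.9 | 0.54 | 0.8411 | 0.8631 | 0.5991 | 0.9482 |
| 9 | 0.6667 | 0.9091 | 0.9 | 0.81 | 0.9 | 0.9053 | 0.9 | 0.9756 |
| 10 | 0.8333 | 1.0 | 0.9 | 0.9 | 0.7833 | 0.95 | 0.8974 | 0.9311 |
| 11 | 0.75 | 1.0 | 0.9 | 0.72 | 0.8003 | 0.9127 | 0.72 | 0.9572 |
| 12 | 0.8 | 0.9375 | 0.9 | 0.78 | 0.8309 | 0.8904 | 0.8959 | 0.9068 |
| 13 | 0.75 | 0.9091 | 0.9 | 0.81 | 0.9 | 0.9053 | 0.9 | 0.979 |
| 14 | 1.0 | 1.0 | 1.0 | 0.7 | 0.7209 | 0.9553 | 0.7 | 0.9484 |
| 15 | 0.75 | 1.0 | 0.95 | 0.9048 | 0.6215 | 0.9675 | 0.9042 | 0.9187 |
| 16 | 0.4 | 0.7692 | 0.7 | 0.49 | 0.6222 | 0.7158 | 0.6971 | 0.8125 |
| 17 | 0.25 | 0.7692 | 0.7 | 0.49 | 0.7 | 0.7158 | 0.7 | 0.828 |
| 18 | 0.5 | 0.7692 | 0.7 | 0.49 | 0.7 | 0.7158 | 0.7 | 0.8551 |
| 19 | 0.5 | 0.7692 | 0.7 | 0.49 | 0.7 | 0.7158 | 0.7 | 0.7636 |
| 20 | 0.5 | 0.7692 | 0.7 | 0.49 | 0.7 | 0.7158 | 0.7 | 0.8551 |
| 21 | 0.81 | 0.8696 | 0.85 | 0.7225 | 0.85 | 0.8579 | 0.85 | 0.8551 |

Bold text represents incorrect results and italicized text, scale-dependent results.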


  1. Zadeh, LA (1965). Fuzzy sets. Information and Control. 8, 338-353. https://doi.org/10.1016/S0019-9958(65)90241-X
  2. Chen, SM (1996). New methods for subjective mental workload assessment and fuzzy risk analysis. Cybernetics and Systems. 27, 449-472. https://doi.org/10.1080/019697296126417
  3. Hsieh, CH, and Chen, SH (1999). Similarity of generalized fuzzy numbers with graded mean integration representation. Proceedings of the 8th International Fuzzy Systems Association World Congress, Taipei, Taiwan, pp. 551-555.
  4. Lee, HS (1999). An optimal aggregation method for fuzzy opinions of group decision. Proceedings of 1999 IEEE International Conference on Systems, Man, and Cybernetics, Tokyo, Japan, pp. 314-319. https://doi.org/10.1109/ICSMC.1999.823219
  5. Chen, SJ, and Chen, SM (2003). Fuzzy risk analysis based on similarity measures of generalized fuzzy numbers. IEEE Transactions on Fuzzy Systems. 11, 45-56. https://doi.org/10.1109/TFUZZ.2002.806316
  6. Yang, MS, Hung, WL, and ChangChien, SJ (2005). On a similarity measure between LR-type fuzzy numbers and its application to database acquisition. International Journal of Intelligent Systems. 20, 1001-1016. https://doi.org/10.1002/int.20102
  7. Wei, SH, and Chen, SM (2009). A new approach for fuzzy risk analysis based on similarity measures of generalized fuzzy numbers. Expert Systems with Applications. 36, 589-598. https://doi.org/10.1016/j.eswa.2007.09.033
  8. Xu, Z, Shang, S, Qian, W, and Shu, W (2010). A method for fuzzy risk analysis based on the new similarity of trapezoidal fuzzy numbers. Expert Systems with Applications. 37, 1920-1927. https://doi.org/10.1016/j.eswa.2009.07.015
  9. Chen, SJ (2011). Fuzzy information retrieval based on a new similarity measure of generalized fuzzy numbers. Intelligent Automation & Soft Computing. 17, 465-476. https://doi.org/10.1080/10798587.2011.10643161
  10. Hwang, CM, and Yang, MS (2014). New similarity measures between generalized trapezoidal fuzzy numbers using the Jaccard Index. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 22, 831-844. https://doi.org/10.1142/S0218488514500445
  11. Chen, SH (1998). Operations of fuzzy numbers with step form membership function using function principle. Information Sciences. 108, 149-155. https://doi.org/10.1016/S0020-0255(97)10070-6
  12. Chiao, KP (2016). Ranking interval type 2 fuzzy sets using parametric graded mean integration representation. Proceedings of 2016 International Conference on Machine Learning and Cybernetics, Jeju, Korea, pp. 606-611. https://doi.org/10.1109/ICMLC.2016.7872956
  13. Gupta, Y, Saini, A, and Saxena, AK (2014). Fuzzy logic-based approach to develop hybrid similarity measure for efficient information retrieval. Journal of Information Science. 40, 846-857. https://doi.org/10.1177/0165551514548989
  14. Real, R, and Vargas, JM (1996). The probabilistic basis of Jaccard's index of similarity. Systematic Biology. 45, 380-385. https://doi.org/10.1093/sysbio/45.3.380
  15. Samiri, MY, Najib, M, Elfazziki, A, Abourraja, MN, Boudebous, D, and Bouain, A (2017). APRICOIN: an adaptive approach for prioritizing high-risk containers inspections. IEEE Access. 5, 18238-18249. https://doi.org/10.1109/ACCESS.2017.2746838
  16. Tsai, JT, Chou, JH, and Lin, CF (2015). Designing microstructure parameters for backlight modules by using improved adaptive neuro-fuzzy inference system. IEEE Access. 3, 2626-2636. https://doi.org/10.1109/ACCESS.2015.2508144
  17. Mamdani, EH, and Assilian, S (1975). An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies. 7, 1-13. https://doi.org/10.1016/S0020-7373(75)80002-2
  18. Gupta, Y, Saini, A, and Saxena, AK (2015). A new fuzzy logic based ranking function for efficient information retrieval system. Expert Systems with Applications. 42, 1223-1234. https://doi.org/10.1016/j.eswa.2014.09.009
  19. Lee, E, Choi, C, and Kim, P (2017). Intelligent handover scheme for drone using fuzzy inference systems. IEEE Access. 5, 13712-13719. https://doi.org/10.1109/ACCESS.2017.2724067
  20. Tourad, MC, and Abdali, A (2017). Toward efficient ranked-key algorithm for the web notification of big data systems. Proceedings of the 2nd International Conference on Big Data, Cloud and Applications, Tetouan, Morocco. https://doi.org/10.1145/3090354.3090386
  21. Tourad, MC, Abdali, A, and Outfarouin, A (2016). On a new index of publish/subscribe system in the context of big data. Proceedings of 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications, Agadir, Morocco, pp. 1-2. https://doi.org/10.1109/AICCSA.2016.7945756

Mohamedou Cheikh Tourad received the degree of specialized Engineer from the Faculty of Sciences and Technology, Marrakesh, Morocco, in 2014, where he is currently pursuing a Ph.D. degree with the Computer Science Department of the Faculty of Sciences and Technology. His research interests are in Intelligent Publish/Subscribe System, Big Data and Fuzzy logic, Data science, Machine learning, Deep learning.

E-mail: cheikhtouradmohamedou@gmail.com

Abdelmounaim Abdali received a Ph.D. in Solid Mechanics and Structures from the University of Amiens, France, in 1996. He is a professor of Computer Science at the University Cadi Ayyad, Faculty of Sciences and Technology, Marrakech, Morocco, and a member of the Laboratory of Applied Mathematics and Computer Science (LAMAI), Marrakesh, Morocco. His research interests are: Publish/Subscribe System, software engineering, Big Data, Fuzzy logic, computer science, DTN network, business intelligence, design and implementation in data warehouse, ubiquitous systems, modeling & design, metamodel design, model transformation, model verification & validation methods, numerical simulation, smart cities, biomechanics, and bone remodeling.

Article

Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2018; 18(4): 303-315

Published online December 31, 2018 https://doi.org/10.5391/IJFIS.2018.18.4.303

Copyright © The Korean Institute of Intelligent Systems.

An Intelligent Similarity Model between Generalized Trapezoidal Fuzzy Numbers in Large Scale

Mohamedou Cheikh Tourad, and Abdelmounaim Abdali

Applied Mathematics and Computer Science Laboratory, Cadi Ayyad University, Marrakech, Morocco

Correspondence to: Mohamedou Cheikh Tourad, (cheikhtouradmohamedou@gmail.com)

Received: November 13, 2018; Revised: December 19, 2018; Accepted: December 24, 2018

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

The rapid expansion of data published on the web has given rise to the similarity problem on a large scale, a very important subject for scientific research in the field of computer science. Several methods have been developed for this. In this paper, we propose the first mathematical model to find the similarity value between generalized trapezoidal fuzzy numbers (GTFNs). This model employs fuzzy inference systems to find the value of an effective weighting, the weights to be associated to different kinds of methods that can handle an important scale of the data. This model will allow us to develop intelligent systems. A comparative study based on 21 sets of GTFNs has been carried out to demonstrate the difference between our approach and existing methods. This study shows that our model is more reasonable than existing methods.

Keywords: Cosine coefficient, Jaccard index, GTFNs, FIS, Similarity, Large scale

1. Introduction

The concept of fuzzy logic, proposed in 1965 by Zadeh [1], has been used to manage a kind of probability. To employ this concept, the generalized trapezoidal fuzzy numbers (GTFNs) are most the most popular in practice. In the literature, several similarity methods between GTFNs have been introduced (e.g., [210]). But, these existing methods of similarity measures have many weaknesses. In many situations, such methods cannot appropriately find the similarity between two GTFNs. In the present study, a novel mathematical model for a fuzzy-number similarity method between GTFNs has been created, based on the weights associated to each similarity measure. This model uses the cosine coefficient and the Jaccard Index. Additionally, we describe and provide three characteristics of the proposed model. A comparative study has been carried out, based on 21 sets of GTFNs, to show that the model can surmount the limitations of the existing measures.

The remainder of this article is structured as follows. Section 2 presents a summary of the fundamental notions of existing methods. Section 3 introduces the novel approach, presenting the similarity methods employed and finding the weights associated to each similarity method. Many properties are suggested and proven. Section 4 compares it with the existing similarity methods. We give our conclusions in Section 5.

2. Related Work

The notion of a generalized fuzzy number (GFN) T̃ is presented by Chen [11, 12] as follows: T̃ = {t1, t2, t3, t4, w}, where the tk (k = 1, 2, 3, 4) denote real numbers, and we take into account a weight w with 0 < w ≤ 1. When w = 1, T̃ is a normal trapezoidal fuzzy number (NTFN). When t2 = t3, T̃ is called a generalized triangular fuzzy number. When t1 = t2 and t3 = t4, T̃ is called a crisp interval. When w = 1 and t1 = tk (where k = 1, 2, 3, 4), T̃ is a real number.

Several similarity methods between the GTFNs T̃ and H̃ = {h1, h2, h3, h4, w} have been introduced:

Chen [2] proposed a method to solve the similarity problems of GFNs. This method is based on the geometric distance. The author used this method to address the problems of fuzzy risk analysis and subjective mental workload assessment.

$$S_C(\tilde{T},\tilde{H}) = 1 - \frac{\sum_{k=1}^{4} |t_k - h_k|}{4}. \tag{1}$$

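Hsieh and Chen [3] introduced an approach to the similarity problems of GFNs. This approach is based on the graded mean integration representation distance.

$$S_{HC}(\tilde{T},\tilde{H}) = \frac{1}{1 + d(\tilde{T},\tilde{H})}, \tag{2}$$

where $d(\tilde{T},\tilde{H}) = |P(\tilde{T}) - P(\tilde{H})|$, with

$$P(\tilde{T}) = \frac{t_1 + 2t_2 + 2t_3 + t_4}{6}, \qquad P(\tilde{H}) = \frac{h_1 + 2h_2 + 2h_3 + h_4}{6}.$$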
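The two measures above are simple enough to sketch directly in Python. This is a minimal illustration rather than the authors' implementation: the function names and the example pair `T`, `H` are assumptions chosen for the demonstration, and each GTFN is passed as its four abscissas (t1, ..., t4) only, since neither measure uses the height w.

```python
from typing import Sequence


def s_c(t: Sequence[float], h: Sequence[float]) -> float:
    """Geometric-distance similarity S_C: one minus the mean absolute difference."""
    return 1.0 - sum(abs(a - b) for a, b in zip(t, h)) / 4.0


def graded_mean(t: Sequence[float]) -> float:
    """Graded mean integration representation P = (t1 + 2*t2 + 2*t3 + t4) / 6."""
    t1, t2, t3, t4 = t
    return (t1 + 2.0 * t2 + 2.0 * t3 + t4) / 6.0


def s_hc(t: Sequence[float], h: Sequence[float]) -> float:
    """Similarity S_HC = 1 / (1 + |P(T) - P(H)|)."""
    return 1.0 / (1.0 + abs(graded_mean(t) - graded_mean(h)))


# Illustrative pair (an assumption, not read from the paper's figures).
T = (0.1, 0.2, 0.3, 0.4)
H = (0.1, 0.25, 0.25, 0.4)
print(round(s_c(T, H), 4), round(s_hc(T, H), 4))
```

For this pair the two graded means coincide, so S_HC reports a similarity of 1 even though the shapes differ, which is the kind of weakness the comparative study points out.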

Lee [4] presented a measure to answer the similarity problems of GFNs. This measure is based on the $L_p$ norm. The author used this measure to solve the problem of aggregating individual opinions into a group consensus in group decision making.

$$S_L(\tilde{T},\tilde{H}) = 1 - \frac{\|\tilde{T}-\tilde{H}\|_{L_p}}{\|U\|} \times 4^{-\frac{1}{p}}, \tag{3}$$

where $\|\tilde{T}-\tilde{H}\|_{L_p} = \left(\sum_{k=1}^{4} |t_k - h_k|^p\right)^{\frac{1}{p}}$ for a positive integer p, and $\|U\| = \max U - \min U$, where U indicates the universe of discourse.

Chen and Chen [5] defined a method to address the similarity problems of GFNs. It is based on the simple center of gravity method (SCGM). The authors used this method to solve the problems of fuzzy risk analysis.

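$$S_{CC}(\tilde{T},\tilde{H}) = \left[1 - \frac{\sum_{k=1}^{4} |t_k - h_k|}{4}\right] \times \left(1 - |x^*_{\tilde{T}} - x^*_{\tilde{H}}|\right)^{B(S_{\tilde{T}},S_{\tilde{H}})} \times \frac{\min(y^*_{\tilde{T}}, y^*_{\tilde{H}})}{\max(y^*_{\tilde{T}}, y^*_{\tilde{H}})}, \tag{4}$$

where

$$y^*_{\tilde{T}} = \begin{cases} \dfrac{w_{\tilde{T}}}{6} \times \left(2 + \dfrac{t_3 - t_2}{t_4 - t_1}\right) & \text{if } t_4 \neq t_1, \\[2mm] \dfrac{w_{\tilde{T}}}{2} & \text{if } t_4 = t_1, \end{cases} \qquad y^*_{\tilde{H}} = \begin{cases} \dfrac{w_{\tilde{H}}}{6} \times \left(2 + \dfrac{h_3 - h_2}{h_4 - h_1}\right) & \text{if } h_4 \neq h_1, \\[2mm] \dfrac{w_{\tilde{H}}}{2} & \text{if } h_4 = h_1, \end{cases}$$

$$x^*_{\tilde{T}} = \begin{cases} \dfrac{y^*_{\tilde{T}} \times (t_3 + t_2) + (t_4 + t_1) \times (w_{\tilde{T}} - y^*_{\tilde{T}})}{2 w_{\tilde{T}}} & \text{if } w_{\tilde{T}} \neq 0, \\[2mm] \dfrac{t_4 + t_1}{2} & \text{if } w_{\tilde{T}} = 0, \end{cases} \qquad x^*_{\tilde{H}} = \begin{cases} \dfrac{y^*_{\tilde{H}} \times (h_3 + h_2) + (h_4 + h_1) \times (w_{\tilde{H}} - y^*_{\tilde{H}})}{2 w_{\tilde{H}}} & \text{if } w_{\tilde{H}} \neq 0, \\[2mm] \dfrac{h_4 + h_1}{2} & \text{if } w_{\tilde{H}} = 0, \end{cases}$$

$$B(S_{\tilde{T}},S_{\tilde{H}}) = \begin{cases} 1 & \text{if } S_{\tilde{T}} + S_{\tilde{H}} > 0, \\ 0 & \text{if } S_{\tilde{T}} + S_{\tilde{H}} = 0, \end{cases}$$

where $S_{\tilde{T}} = t_4 - t_1$ and $S_{\tilde{H}} = h_4 - h_1$.

Wei and Chen [7] proposed an approach to solve the similarity problems of GFNs. It is based on the concepts of geometric distance, the perimeter and the height of GFNs. The authors used this approach to answer the problems of fuzzy risk analysis.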

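$$S_{WC}(\tilde{T},\tilde{H}) = \left(1 - \frac{\sum_{k=1}^{4} |t_k - h_k|}{4}\right) \times \frac{\min(P(\tilde{T}),P(\tilde{H})) + \min(w_{\tilde{T}},w_{\tilde{H}})}{\max(P(\tilde{T}),P(\tilde{H})) + \max(w_{\tilde{T}},w_{\tilde{H}})}, \tag{5}$$

where P denotes the perimeter:

$$P(\tilde{T}) = \sqrt{(t_1 - t_2)^2 + w_{\tilde{T}}^2} + (t_3 - t_2) + \sqrt{(t_4 - t_3)^2 + w_{\tilde{T}}^2} + (t_4 - t_1),$$

$$P(\tilde{H}) = \sqrt{(h_1 - h_2)^2 + w_{\tilde{H}}^2} + (h_3 - h_2) + \sqrt{(h_4 - h_3)^2 + w_{\tilde{H}}^2} + (h_4 - h_1).$$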
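The perimeter-based measure of Wei and Chen can be sketched as follows. This is a hedged illustration: it assumes the two perimeter terms involving the height are square roots (the lengths of the two slanted sides), and the function names and example pair are hypothetical.

```python
import math
from typing import Sequence


def perimeter(t: Sequence[float], w: float) -> float:
    """Perimeter of a GTFN: two slanted sides plus the top and bottom bases."""
    t1, t2, t3, t4 = t
    left = math.sqrt((t1 - t2) ** 2 + w ** 2)
    right = math.sqrt((t4 - t3) ** 2 + w ** 2)
    return left + (t3 - t2) + right + (t4 - t1)


def s_wc(t: Sequence[float], w_t: float, h: Sequence[float], w_h: float) -> float:
    """S_WC: geometric-distance term times a perimeter/height ratio."""
    distance = sum(abs(a - b) for a, b in zip(t, h)) / 4.0
    p_t, p_h = perimeter(t, w_t), perimeter(h, w_h)
    ratio = (min(p_t, p_h) + min(w_t, w_h)) / (max(p_t, p_h) + max(w_t, w_h))
    return (1.0 - distance) * ratio


# Illustrative pair with height 1 (an assumption, not read from the paper's figures).
T = (0.1, 0.2, 0.3, 0.4)
H = (0.1, 0.25, 0.25, 0.4)
print(round(s_wc(T, 1.0, H, 1.0), 4))
```

The ratio factor is 1 only when both perimeters and both heights agree, so two numbers with the same corner points but different heights are no longer judged identical.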

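Recently, Xu et al. [8] modified the similarity method SCC proposed by Chen and Chen [5], introducing the new similarity SX. The authors used this method to answer the problems of fuzzy risk analysis.

$$S_X(\tilde{T},\tilde{H}) = 1 - \frac{\sum_{k=1}^{4} |t_k - h_k|}{8} - \frac{d(\tilde{T},\tilde{H})}{2}, \tag{6}$$

where

$$d(\tilde{T},\tilde{H}) = \frac{\sqrt{(x^*_{\tilde{T}} - x^*_{\tilde{H}})^2 + (y^*_{\tilde{T}} - y^*_{\tilde{H}})^2}}{\sqrt{1.25}}.$$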

Chen [9] introduced a method to find a solution to the similarity problems of GFNs. This method is based on the geometric mean operator. The author used this approach to address fuzzy-number problems in information retrieval.

$$S_{SJ}(\tilde{T},\tilde{H}) = \left[\left(\prod_{k=1}^{4} (2 - |t_k - h_k|)\right)^{\frac{1}{4}} - 1\right] \times \frac{\min(y^*_{\tilde{T}}, y^*_{\tilde{H}})}{\max(y^*_{\tilde{T}}, y^*_{\tilde{H}})}, \tag{7}$$

where $y^*_{\tilde{T}}$ and $y^*_{\tilde{H}}$ are calculated as in Chen and Chen [5].
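The three measures that rely on the center of gravity (SCC, SX and SSJ) share the same building block, so they can be sketched together. The code below is an illustration under stated assumptions, not the authors' implementation: the helper names are hypothetical, heights are assumed to be strictly positive so that the min/max ratio is defined, and the distance in SX is taken to be a Euclidean distance divided by the square root of 1.25.

```python
import math
from typing import Sequence, Tuple


def cog(t: Sequence[float], w: float) -> Tuple[float, float]:
    """Simple center of gravity (x*, y*) of a GTFN with corners t and height w."""
    t1, t2, t3, t4 = t
    y = w * ((t3 - t2) / (t4 - t1) + 2.0) / 6.0 if t4 != t1 else w / 2.0
    x = (y * (t3 + t2) + (t4 + t1) * (w - y)) / (2.0 * w) if w != 0 else (t4 + t1) / 2.0
    return x, y


def _abs_diffs(t: Sequence[float], h: Sequence[float]):
    return [abs(a - b) for a, b in zip(t, h)]


def s_cc(t, w_t, h, w_h) -> float:
    """Chen and Chen: distance term, x* term (switched on by B), y* ratio."""
    (x_t, y_t), (x_h, y_h) = cog(t, w_t), cog(h, w_h)
    b = 1 if (t[3] - t[0]) + (h[3] - h[0]) > 0 else 0
    base = 1.0 - sum(_abs_diffs(t, h)) / 4.0
    return base * (1.0 - abs(x_t - x_h)) ** b * min(y_t, y_h) / max(y_t, y_h)


def s_x(t, w_t, h, w_h) -> float:
    """Xu et al.: half weight on corner distance, half on COG distance."""
    (x_t, y_t), (x_h, y_h) = cog(t, w_t), cog(h, w_h)
    d = math.hypot(x_t - x_h, y_t - y_h) / math.sqrt(1.25)
    return 1.0 - sum(_abs_diffs(t, h)) / 8.0 - d / 2.0


def s_sj(t, w_t, h, w_h) -> float:
    """Chen (2011): geometric mean of (2 - |t_k - h_k|), minus 1, times y* ratio."""
    (_, y_t), (_, y_h) = cog(t, w_t), cog(h, w_h)
    product = 1.0
    for diff in _abs_diffs(t, h):
        product *= 2.0 - diff
    return (product ** 0.25 - 1.0) * min(y_t, y_h) / max(y_t, y_h)


# Illustrative pair with height 1 (an assumption, not read from the paper's figures).
T = (0.1, 0.2, 0.3, 0.4)
H = (0.1, 0.25, 0.25, 0.4)
for name, fn in (("S_CC", s_cc), ("S_X", s_x), ("S_SJ", s_sj)):
    print(name, round(fn(T, 1.0, H, 1.0), 4))
```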

3. Approach

We now present a novel mathematical model, MCESTA (Mohamedou Cheikh Elghotob Cheikh Saad bouh Cheikh Tourad Abass), Abdelmounaim Abdali, a large-scale similarity method between two GTFNs, call them T̃ and H̃. It is a hybrid of the similarity measures Sk and Skq, where each measure is associated with an importance weight (αk and βq). MCESTA is an efficient development of the formula $\mathrm{FBHSM} = \sum_{k=1}^{n} \alpha_k \times S_k$ proposed by Gupta et al. [13] to calculate similarity in information retrieval (IR). MCESTA employs a fuzzy inference system (FIS) to find the value of the effective weighting. This hybrid similarity measure is calculated as follows:

$$\begin{cases} \mathrm{MCESTA}(\tilde{T},\tilde{H}) = \sum_{k=1}^{n} \alpha_k \times S_k(\tilde{T},\tilde{H}), \\ S_k(\tilde{T},\tilde{H}) = \sum_{q=1}^{m_k} \beta_q \times S_{kq}(\tilde{T},\tilde{H}), \end{cases} \tag{8}$$

where $\sum_{k=1}^{n} \alpha_k \le 1$ and $\sum_{q=1}^{m_k} \beta_q \le 1$.

Our model will be as follows:

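$$\mathrm{MCESTA}(\tilde{T},\tilde{H}) = \sum_{k=1}^{n} \alpha_k \times \left[\sum_{q=1}^{m_k} \beta_q S_{kq}(\tilde{T},\tilde{H})\right], \tag{9}$$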

where Sk is a similarity method between T̃ and H̃, and Skq is a similarity sub-measure between T̃ and H̃ that should take into account the two criteria of their relative location and size. The sub-measures are combined to obtain a reasonable value of the similarity Sk. αk is an importance weight associated with Sk and βq is an importance weight associated with Skq. The number n is the number of similarity methods Sk used in the model, and mk is the number of similarity sub-measures Skq employed to calculate Sk.
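The nested weighted sum of the model can be sketched generically, independently of which sub-measures are plugged in. In this minimal sketch the name `hybrid_similarity`, the data layout (a list of `(alpha_k, [(beta_q, S_kq), ...])` pairs) and the two toy sub-measures are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Gtfn = Sequence[float]
SubMeasure = Callable[[Gtfn, Gtfn], float]
Measure = Tuple[float, List[Tuple[float, SubMeasure]]]


def hybrid_similarity(t: Gtfn, h: Gtfn, measures: List[Measure]) -> float:
    """Sum over k of alpha_k * (sum over q of beta_q * S_kq(t, h))."""
    total = 0.0
    for alpha, sub_measures in measures:
        s_k = sum(beta * s_kq(t, h) for beta, s_kq in sub_measures)
        total += alpha * s_k
    return total


# Toy sub-measures (hypothetical placeholders, not real similarity measures).
def always_one(t: Gtfn, h: Gtfn) -> float:
    return 1.0


def always_zero(t: Gtfn, h: Gtfn) -> float:
    return 0.0


toy_measures: List[Measure] = [
    (0.5, [(1.0, always_one)]),                       # n = 2, m_1 = 1
    (0.5, [(0.5, always_one), (0.5, always_zero)]),   # m_2 = 2
]
value = hybrid_similarity((0.1, 0.2, 0.3, 0.4, 1.0), (0.1, 0.2, 0.3, 0.4, 1.0), toy_measures)
print(value)
```

Keeping the weights and sub-measures as data rather than hard-coding them makes it easy to swap in weights produced by a separate procedure such as an FIS.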

3.1 Similarity Measures Sk and Skq

The descriptions of the similarity methods employed, the cosine coefficient and the Jaccard Index [14], are given below. This choice of similarity measures will be validated in Section 4. We have n = 2, S1 = cos and S2 = Jaccard and we choose m1 = 1 and m2 = 2, which implies, in (9),

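$$\mathrm{MCESTA}(\tilde{T},\tilde{H}) = \sum_{k=1}^{2} \alpha_k \times \left[\sum_{q=1}^{m_k} \beta_q S_{kq}(\tilde{T},\tilde{H})\right].$$

For each k we provide the corresponding mk:

$$\begin{aligned} \mathrm{MCESTA}(\tilde{T},\tilde{H}) &= \alpha_1 \times \left[\sum_{q=1}^{m_1} \beta_q S_{1q}(\tilde{T},\tilde{H})\right] + \alpha_2 \times \left[\sum_{q=1}^{m_2} \beta_q S_{2q}(\tilde{T},\tilde{H})\right] \\ &= \alpha_1 \times \left[\beta_1 S_{11}(\tilde{T},\tilde{H})\right] + \alpha_2 \times \left[\beta_1 S_{21}(\tilde{T},\tilde{H}) + \beta_2 S_{22}(\tilde{T},\tilde{H})\right] \\ &= \alpha_1 \times \beta_1 \cos(\tilde{T},\tilde{H}) + \alpha_2 \times \beta_1 \mathrm{Jaccard}_1(\tilde{T},\tilde{H}) + \alpha_2 \times \beta_2 \mathrm{Jaccard}_2(\tilde{T},\tilde{H}), \end{aligned}$$

where $S_{11} = \cos$, $S_{21} = \mathrm{Jaccard}_1$ and $S_{22} = \mathrm{Jaccard}_2$. To simplify the calculation, we make the following changes of variable: C = α1 × β1, J1 = α2 × β1 and J2 = α2 × β2. The equation then becomes

$$\mathrm{MCESTA}(\tilde{T},\tilde{H}) = C \times \cos(\tilde{T},\tilde{H}) + J_1 \times \mathrm{Jaccard}_1(\tilde{T},\tilde{H}) + J_2 \times \mathrm{Jaccard}_2(\tilde{T},\tilde{H}).$$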

3.1.1 The cosine coefficient cos(T̃, H̃)

The cosine coefficient measures the similarity between two vectors in a simple way, by comparing their directions. cos(T̃, H̃) is calculated as follows:

$$x_k = t_k,$$

where k ∈ {1, 2, 3, 4}, $x_5 = w_{\tilde{T}}$, $x_6 = t_2 - t_1$, $x_7 = t_3 - t_2$, $x_8 = t_4 - t_3$. Likewise, $y_k = h_k$ where k ∈ {1, 2, 3, 4}, $y_5 = w_{\tilde{H}}$, $y_6 = h_2 - h_1$, $y_7 = h_3 - h_2$, $y_8 = h_4 - h_3$.

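$$\cos(\tilde{T},\tilde{H}) = \frac{\sum_{k=1}^{8} x_k \times y_k}{\left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} y_k^2\right)^{\frac{1}{2}}}.$$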
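The eight-component vectors and the cosine coefficient translate directly into NumPy. The sketch below is illustrative only; the function names `features` and `cosine` and the example pair are assumptions, and the side differences are taken as t2 − t1, t3 − t2 and t4 − t3.

```python
import numpy as np


def features(t, w: float) -> np.ndarray:
    """Vector (t1, t2, t3, t4, w, t2 - t1, t3 - t2, t4 - t3)."""
    t = np.asarray(t, dtype=float)
    return np.concatenate([t, [w], np.diff(t)])


def cosine(t, w_t: float, h, w_h: float) -> float:
    """Cosine of the angle between the two feature vectors."""
    x, y = features(t, w_t), features(h, w_h)
    return float(x @ y) / float(np.linalg.norm(x) * np.linalg.norm(y))


# Illustrative pair with height 1 (an assumption, not read from the paper's figures).
T = (0.1, 0.2, 0.3, 0.4)
H = (0.1, 0.25, 0.25, 0.4)
print(round(cosine(T, 1.0, H, 1.0), 4))
```

Because only the direction of the vectors matters, multiplying every component of both numbers by the same positive factor leaves the value unchanged.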

3.1.2 The Jaccard Index

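The Jaccard Index, Jaccard(T̃, H̃) [10], measures the similarity as the size of the intersection divided by the size of the union.

Jaccard1(T̃, H̃) is calculated as follows: $x_k = t_k - m$, where k ∈ {1, 2, 3, 4}, $x_5 = w_{\tilde{T}}$, $x_6 = t_2 - t_1$, $x_7 = t_3 - t_2$, $x_8 = t_4 - t_3$, and $y_k = h_k - m$, where k ∈ {1, 2, 3, 4}, $y_5 = w_{\tilde{H}}$, $y_6 = h_2 - h_1$, $y_7 = h_3 - h_2$, $y_8 = h_4 - h_3$, where m = min(t1, h1).

$$\mathrm{Jaccard}_1(\tilde{T},\tilde{H}) = \frac{\sum_{k=1}^{8} x_k \times y_k}{\sum_{k=1}^{8} x_k^2 + \sum_{k=1}^{8} y_k^2 - \sum_{k=1}^{8} x_k \times y_k}.$$

Jaccard2(T̃, H̃) is calculated as follows: $x^*_{5-k} = M - t_k$, where k ∈ {1, 2, 3, 4}, $x^*_5 = w_{\tilde{T}}$, $x^*_6 = t_2 - t_1$, $x^*_7 = t_3 - t_2$, $x^*_8 = t_4 - t_3$, and $y^*_{5-k} = M - h_k$, where k ∈ {1, 2, 3, 4}, $y^*_5 = w_{\tilde{H}}$, $y^*_6 = h_2 - h_1$, $y^*_7 = h_3 - h_2$, $y^*_8 = h_4 - h_3$, where M = max(t4, h4).

$$\mathrm{Jaccard}_2(\tilde{T},\tilde{H}) = \frac{\sum_{k=1}^{8} x^*_k \times y^*_k}{\sum_{k=1}^{8} (x^*_k)^2 + \sum_{k=1}^{8} (y^*_k)^2 - \sum_{k=1}^{8} x^*_k \times y^*_k}.$$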
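The two Jaccard sub-measures and the final weighted combination can be sketched as below. Several points are assumptions rather than facts from the paper: the first four components are taken as t_k − m (Jaccard1) and M − t_k in reversed order (Jaccard2), the function names are hypothetical, and the weights used in the example (0.5, 0.25, 0.25) are placeholders, not the values produced by the authors' FIS.

```python
import numpy as np


def _vec(head: np.ndarray, t: np.ndarray, w: float) -> np.ndarray:
    """Eight components: four location terms, the height, three side differences."""
    return np.concatenate([head, [w], np.diff(t)])


def _jaccard(x: np.ndarray, y: np.ndarray) -> float:
    """Extended Jaccard: x.y / (|x|^2 + |y|^2 - x.y); undefined if both are zero."""
    dot = float(x @ y)
    return dot / (float(x @ x) + float(y @ y) - dot)


def cosine(t, w_t, h, w_h) -> float:
    t, h = np.asarray(t, dtype=float), np.asarray(h, dtype=float)
    x, y = _vec(t, t, w_t), _vec(h, h, w_h)
    return float(x @ y) / float(np.linalg.norm(x) * np.linalg.norm(y))


def jaccard1(t, w_t, h, w_h) -> float:
    t, h = np.asarray(t, dtype=float), np.asarray(h, dtype=float)
    m = min(t[0], h[0])  # shift both numbers by the smallest left corner
    return _jaccard(_vec(t - m, t, w_t), _vec(h - m, h, w_h))


def jaccard2(t, w_t, h, w_h) -> float:
    t, h = np.asarray(t, dtype=float), np.asarray(h, dtype=float)
    big_m = max(t[3], h[3])  # reflect both numbers about the largest right corner
    return _jaccard(_vec((big_m - t)[::-1], t, w_t), _vec((big_m - h)[::-1], h, w_h))


def mcesta(t, w_t, h, w_h, c: float, j1: float, j2: float) -> float:
    """C * cos + J1 * Jaccard1 + J2 * Jaccard2 with caller-supplied weights."""
    return (c * cosine(t, w_t, h, w_h)
            + j1 * jaccard1(t, w_t, h, w_h)
            + j2 * jaccard2(t, w_t, h, w_h))


# Illustrative pair and placeholder weights (assumptions, not the paper's values).
T = np.array([0.1, 0.2, 0.3, 0.4])
H = np.array([0.1, 0.25, 0.25, 0.4])
print(round(mcesta(T, 1.0, H, 1.0, c=0.5, j1=0.25, j2=0.25), 4))
```

Scaling both numbers, heights included, by the same positive factor multiplies every vector component by that factor, so each of the three ratios, and therefore the weighted sum, is unchanged.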

3.2 The Methods to Define the Weights C, J1 and J2

Our choice is based on an FIS; such systems are well established and have been used in many different fields [15, 16]. To calculate the weights mentioned in Eq. 9, we use the Mamdani-type FIS described in [13, 17–19], adapted to our model. Figure 1 shows the Mamdani FIS for MCESTA.

Table 1 presents the FIS fuzzy rules for MCESTA, a set of linguistic statements that define how the Mamdani FIS makes its decisions, mapping the input state (Cosine, Jaccard1 and Jaccard2) to the controlled outputs (C, J1 and J2). The rules use the membership functions presented in Figure 2.

Figure 3 shows the FIS diagram for MCESTA, that is, the procedure that maps given input values to output values based on the concepts of fuzzy logic. This figure shows the most important values for the weights C, J1 and J2.

Figure 4 to Figure 9 show the FIS rule surface diagrams for the weights (C, J1 and J2) in MCESTA.
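A Mamdani-type inference over the 27 rules of Table 1 can be sketched in a few lines of NumPy. This is only a sketch under explicit assumptions, since the membership functions of Figure 2 are not recoverable from the text: it uses triangular Low/Medium/High functions peaking at 0, 0.5 and 1 on [0, 1], min for AND and for implication, max for aggregation, and centroid defuzzification. The names `mu` and `mamdani_weights` are hypothetical.

```python
import itertools

import numpy as np

# Assumed peak positions of the Low / Medium / High membership functions.
PEAKS = {"L": 0.0, "M": 0.5, "H": 1.0}
# Discretized output universe used for centroid defuzzification.
U = np.linspace(0.0, 1.0, 101)


def mu(x, level: str):
    """Triangular membership of x in the given level (half-width 0.5)."""
    return np.clip(1.0 - np.abs(x - PEAKS[level]) / 0.5, 0.0, 1.0)


def mamdani_weights(cos_v: float, jac1_v: float, jac2_v: float) -> dict:
    """Return crisp weights C, J1, J2 for inputs in [0, 1].

    Each rule reads: IF Jaccard1 is a AND Jaccard2 is b AND Cos is c
    THEN J1 is a, J2 is b, C is c (all rule weights equal to 1).
    """
    aggregated = {name: np.zeros_like(U) for name in ("J1", "J2", "C")}
    for a, b, c in itertools.product("LMH", repeat=3):
        strength = min(mu(jac1_v, a), mu(jac2_v, b), mu(cos_v, c))
        for name, level in (("J1", a), ("J2", b), ("C", c)):
            clipped = np.minimum(strength, mu(U, level))   # min implication
            aggregated[name] = np.maximum(aggregated[name], clipped)  # max aggregation
    return {name: float((U * curve).sum() / curve.sum())
            for name, curve in aggregated.items()}


weights = mamdani_weights(cos_v=0.9, jac1_v=0.2, jac2_v=0.5)
print({name: round(value, 3) for name, value in weights.items()})
```

No normalization is applied to the three outputs here; whether and how the resulting weights are rescaled is left open by this sketch.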

3.3 Properties

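3.3.1 Identical

$$\tilde{T} = \tilde{H} \iff \mathrm{MCESTA}(\tilde{T},\tilde{H}) = 1.$$

3.3.2 Symmetric relation

$$\mathrm{MCESTA}(\tilde{T},\tilde{H}) = \mathrm{MCESTA}(\tilde{H},\tilde{T}).$$

3.3.3 Scale-free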

This property has been used by Hwang and Yang [10] for validating their proposed similarity method. We have that

$$\mathrm{MCESTA}(\tilde{T}_1,\tilde{H}_1) = \mathrm{MCESTA}(\tilde{T},\tilde{H})$$

for T̃1 = α × T̃ = (α × t1, α × t2, α × t3, α × t4, α × w) and H̃1 = α × H̃ = (α × h1, α × h2, α × h3, α × h4, α × w), where α > 0. Normally, the best similarity measure should be scale-free, which is a very significant property at large scale. From this result, we deduce that a very large data set Set1 = (T̃1, H̃1) can be analyzed using a smaller batch of data Set = (T̃, H̃): under the scale relation T̃1 = α × T̃ and H̃1 = α × H̃, the interpretation and analysis of Set gives a global idea of the interpretation of Set1. This matters when it is difficult to process everything in a big data context or when the machines do not have sufficient capacity.

3.4 Proof of Properties

To see that our model verifies the three properties, it is enough that every similarity (Cosine, Jaccard1 and Jaccard2) used in this model satisfies these properties. For the sub-measures Jaccard1 and Jaccard2, this is proved by Hwang and Yang [10]. It remains to treat Cosine.

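3.4.1 MCESTA is identical

$$\tilde{T} = \tilde{H} \iff \cos(\tilde{T},\tilde{H}) = 1.$$

  • The ⇒ is proved by observing that since T̃ = H̃, then xk = yk for k ∈ {1, 2, 3, 4, 5, 6, 7, 8}. Therefore

    $$\frac{\sum_{k=1}^{8} x_k \times y_k}{\left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} y_k^2\right)^{\frac{1}{2}}} = \frac{\sum_{k=1}^{8} x_k \times x_k}{\left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}}} = \frac{\sum_{k=1}^{8} x_k^2}{\sum_{k=1}^{8} x_k^2} = 1.$$

    Thus,

    $$\tilde{T} = \tilde{H} \Rightarrow \cos(\tilde{T},\tilde{H}) = 1.$$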

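  • The ⇐ is proved as follows. Since

    $$\cos(\tilde{T},\tilde{H}) = 1 \Rightarrow \frac{\sum_{k=1}^{8} x_k \times y_k}{\left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} y_k^2\right)^{\frac{1}{2}}} = 1 \Rightarrow \sum_{k=1}^{8} x_k \times y_k = \left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} y_k^2\right)^{\frac{1}{2}} \Rightarrow \left(\sum_{k=1}^{8} x_k \times y_k\right)^2 = \left(\sum_{k=1}^{8} x_k^2\right) \times \left(\sum_{k=1}^{8} y_k^2\right),$$

    we expand both sides. On the one hand,

    $$\text{(a)} \quad \left(\sum_{k=1}^{8} x_k^2\right) \times \left(\sum_{k=1}^{8} y_k^2\right) = \sum_{k=1}^{8} x_k^2 \times y_k^2 + \sum_{k=1}^{8} x_k^2 \times \left(\sum_{q=1, q \neq k}^{8} y_q^2\right) = \sum_{k=1}^{8} x_k^2 \times y_k^2 + \sum_{k=1}^{8} \sum_{q=1, q \neq k}^{8} x_k^2 \times y_q^2,$$

    and on the other hand,

    $$\text{(b)} \quad \left(\sum_{k=1}^{8} x_k \times y_k\right)^2 = \sum_{k=1}^{8} x_k^2 \times y_k^2 + \sum_{k=1}^{8} \sum_{q=1, q \neq k}^{8} (x_k \times y_k) \times (x_q \times y_q).$$

    The difference between (a) and (b) is 0:

    $$\sum_{k=1}^{8} \sum_{q=1, q \neq k}^{8} x_k^2 \times y_q^2 - \sum_{k=1}^{8} \sum_{q=1, q \neq k}^{8} (x_k \times y_k) \times (x_q \times y_q) = 0 \Rightarrow \sum_{k=1}^{8} \sum_{q=1, q \neq k}^{8} (x_k \times y_q) \times \left[(x_k \times y_q) - (x_q \times y_k)\right] = 0.$$

    We also have

    $$0 \le \frac{\sum_{k=1}^{8} x_k \times y_k}{\left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} y_k^2\right)^{\frac{1}{2}}} \le 1 \Rightarrow 0 \le \frac{\left(\sum_{k=1}^{8} x_k \times y_k\right)^2}{\left(\sum_{k=1}^{8} x_k^2\right) \times \left(\sum_{k=1}^{8} y_k^2\right)} \le 1 \Rightarrow \text{(a)} - \text{(b)} \ge 0,$$

    so (a) − (b) = 0 ⇒ (xk × yq) × [(xk × yq) − (xq × yk)] = 0. We have xk ≠ 0 and yk ≠ 0. Thus (a) − (b) = 0 ⇒ (xk × yq) − (xq × yk) = 0 ⇒ there exists λ ≥ 0 such that xk/yk = xq/yq = λ; in particular, λ = 1 ⇒ xk = yk.

    Therefore, we have that xk = yk for k = 1, 2, 3, 4, 5, 6, 7, 8, that is, T̃ = H̃.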

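3.4.2 MCESTA is a symmetric relation

$$\cos(\tilde{T},\tilde{H}) = \cos(\tilde{H},\tilde{T}).$$

This follows since $\sum_{k=1}^{8} x_k \times y_k = \sum_{k=1}^{8} y_k \times x_k$ and $\left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} y_k^2\right)^{\frac{1}{2}} = \left(\sum_{k=1}^{8} y_k^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}}$. It is then simple to show that cos(T̃, H̃) = cos(H̃, T̃).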

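3.4.3 MCESTA is scale-free

$$\cos(\tilde{T}_1,\tilde{H}_1) = \cos(\tilde{T},\tilde{H}).$$

Suppose T̃1 = α × T̃ = (α × t1, α × t2, α × t3, α × t4, α × w) and H̃1 = α × H̃ = (α × h1, α × h2, α × h3, α × h4, α × w), where α > 0.

We have

$$\cos(\tilde{T}_1,\tilde{H}_1) = \frac{\sum_{k=1}^{8} (\alpha \times x_k) \times (\alpha \times y_k)}{\left(\sum_{k=1}^{8} (\alpha \times x_k)^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} (\alpha \times y_k)^2\right)^{\frac{1}{2}}} = \frac{\alpha^2 \sum_{k=1}^{8} x_k \times y_k}{\alpha^2 \left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} y_k^2\right)^{\frac{1}{2}}} = \frac{\sum_{k=1}^{8} x_k \times y_k}{\left(\sum_{k=1}^{8} x_k^2\right)^{\frac{1}{2}} \times \left(\sum_{k=1}^{8} y_k^2\right)^{\frac{1}{2}}} = \cos(\tilde{T},\tilde{H}).$$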

4. A Comparative Study

4.1 Implementation

Our similarity approach and the existing methods SC (Eq. 1), SHC (Eq. 2), SL (Eq. 3), SCC (Eq. 4), SWC (Eq. 5), SX (Eq. 6) and SSJ (Eq. 7) are implemented in Python 3.6 with data science libraries (NumPy, SciPy). All calculations were done with a 3.60 GHz quad-core processor and 16 GB of memory for the JVM. The sets of GTFNs are implemented as associated lists of arrays (Array List).

4.2 Examples and Comparisons

To validate our contribution and compare it with the existing methods, we ran the calculations on the sets already used by Hwang and Yang [10] in 2014. There are 21 sets (Set1 to Set21) of fuzzy numbers, shown in Figure 10 to Figure 30 respectively. The values obtained by our approach and by the existing methods are given in Table 2. From the table, we can demonstrate and analyze the weaknesses and limitations of each of the existing similarity measures.

We can analyze Table 2 in terms of three types: incorrect results, scale-dependent results, and direction.

4.2.1 Incorrect results
  • In Set1, SHC(Ã, Ã1) = 1 even though Ã ≠ Ã1: we can judge the result as incorrect.

  • For Set3 and Set4, the fuzzy numbers of Set4 should be more similar than those of Set3, but the similarity methods SC(Ã, Ã1), SHC(Ã, Ã1) and SL(Ã, Ã1) produce the same result for both sets: we can judge the result as incorrect.

  • In Set5, Ã ≠ Ã1. The similarity produced by SC(Ã, Ã1), SHC(Ã, Ã1) and SL(Ã, Ã1) is 1: we can say the result is incorrect.

  • In Set6, SL(Ã, Ã1) cannot evaluate the similarity degree: we can judge the result as incorrect.

  • For Set7, the similarity produced by SL(Ã, Ã1) is 0: the result is incorrect.

  • For Set8 and Set9, the similarity of Set8 should be different from that of Set9, but the similarity methods SC(Ã, Ã1) and SHC(Ã, Ã1) produce the same result: we can say the result is incorrect.

  • For Set9 and Set10, the fuzzy numbers of Set9 should be more similar than those of Set10. However, Table 2 shows that the methods SC(Ã, Ã1), SHC(Ã, Ã1), SL(Ã, Ã1), SCC(Ã, Ã1) and SX(Ã, Ã1) produce incorrect results.

  • In Set11, Ã ≠ Ã1. The similarity produced by SHC(Ã, Ã1) is 1: we can judge the result as incorrect.

  • In Set14, Ã ≠ Ã1. The similarity produced by SC(Ã, Ã1), SHC(Ã, Ã1) and SL(Ã, Ã1) is 1: the result is incorrect.

  • For Set14 and Set15, the fuzzy numbers of Set14 should be more similar than those of Set15. However, Table 2 shows that the methods SC(Ã, Ã1), SHC(Ã, Ã1), SL(Ã, Ã1), SCC(Ã, Ã1), SX(Ã, Ã1) and SSJ(Ã, Ã1) produce incorrect results.

  • For the two different sets Set16 and Set17, the methods SC(Ã, Ã1), SHC(Ã, Ã1), SCC(Ã, Ã1) and SX(Ã, Ã1) produce incorrect results.

  • For Set18 and Set19, the fuzzy numbers of Set18 should be more similar than those of Set19, but the similarity methods SC(Ã, Ã1), SHC(Ã, Ã1), SL(Ã, Ã1), SCC(Ã, Ã1), SWC(Ã, Ã1), SX(Ã, Ã1) and SSJ(Ã, Ã1) produce the same result for both sets: we can say the result is incorrect.

4.2.2 Scale-dependent results
  • Set20 and Set21 are in a double-scale relation. The best similarity methods should verify property 3 (scale-free), but the similarity methods SC(Ã, Ã1), SHC(Ã, Ã1), SL(Ã, Ã1), SCC(Ã, Ã1), SWC(Ã, Ã1), SX(Ã, Ã1) and SSJ(Ã, Ã1) are scale-dependent (not scale-free), while our approach is scale-free.
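The scale-dependence observation can be checked mechanically from the Set20 and Set21 rows of Table 2. The snippet below is a small sketch: the values are copied from Table 2, while the variable names and the tolerance of 1e-4 (matching the four-decimal rounding of the table) are assumptions.

```python
# Column order of Table 2; "MCESTA" stands for the "Our approach" column.
METHODS = ["S_L", "S_HC", "S_C", "S_CC", "S_WC", "S_X", "S_SJ", "MCESTA"]

# Similarity values reported in Table 2 for the two sets in double-scale relation.
SET20 = [0.5, 0.7692, 0.7, 0.49, 0.7, 0.7158, 0.7, 0.8551]
SET21 = [0.81, 0.8696, 0.85, 0.7225, 0.85, 0.8579, 0.85, 0.8551]

# A method behaves as scale-free on this pair if both rows agree.
scale_free = [
    name for name, a, b in zip(METHODS, SET20, SET21) if abs(a - b) < 1e-4
]
scale_dependent = [name for name in METHODS if name not in scale_free]

print("scale-free on Set20/Set21:", scale_free)
print("scale-dependent:", scale_dependent)
```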

4.2.3 Direction (cosine) when passing to large scale
  • The direction similarity is very important. This measurement is used in the big data framework when the data tends towards the infinite. However, the similarity methods SC(Ã, Ã1), SHC(Ã, Ã1), SL(Ã, Ã1), SCC(Ã, Ã1), SWC(Ã, Ã1), SX(Ã, Ã1) and SSJ(Ã, Ã1) do not use the direction, while our approach uses this measurement with a weight C = 0.508 (see Figure 3).

According to the analysis of Table 2, the three properties and the direction based on the cosine coefficient make this similarity approach the best choice for the large scale.

5. Conclusion

A novel mathematical model, MCESTA, for GTFNs has been presented, together with a review of the existing methods. This model is based on weights associated with each of several different similarity measures. We have been able to infer the importance weights with a Mamdani-type FIS [17]. The model uses the cosine coefficient and the Jaccard Index. Three properties of the model are proved, one of which is advantageous for use with large-scale datasets. A comparative study has also been presented to explain how our novel approach can overcome the limitations and weaknesses of the existing methods. This approach will help us develop intelligent filtering for the pub-sub system [20, 21].

Figure 1. The Mamdani for MCESTA.

Figure 2. FIS value membership functions.

Figure 3. FIS-diagram for the MCESTA.

Figure 4. FIS-rule surface diagram for C by Cosine and Jaccard1 in MCESTA.

Figure 5. FIS-rule surface diagram for C by Cosine and Jaccard2 in MCESTA.

Figure 6. FIS-rule surface diagram for J1 by Jaccard1 and Cosine in MCESTA.

Figure 7. FIS-rule surface diagram for J1 by Jaccard1 and Jaccard2 in MCESTA.

Figure 8. FIS-rule surface diagram for J2 by Jaccard2 and Cosine in MCESTA.

Figure 9. FIS-rule surface diagram for J2 by Jaccard2 and Jaccard1 in MCESTA.

Figure 10. Set1.

Figure 11. Set2.

Figure 12. Set3.

Figure 13. Set4.

Figure 14. Set5.

Figure 15. Set6.

Figure 16. Set7.

Figure 17. Set8.

Figure 18. Set9.

Figure 19. Set10.

Figure 20. Set11.

Figure 21. Set12.

Figure 22. Set13.

Figure 23. Set14.

Figure 24. Set15.

Figure 25. Set16.

Figure 26. Set17.

Figure 27. Set18.

Figure 28. Set19.

Figure 29. Set20.

Figure 30. Set21.

Table 1. FIS fuzzy rules for MCESTA.

IF (Jaccard1 is L) and (Jaccard2 is L) and (Cos is L) THEN (J1 is L) (J2 is L) (C is L) (1).
IF (Jaccard1 is L) and (Jaccard2 is M) and (Cos is L) THEN (J1 is L) (J2 is M) (C is L) (1).
IF (Jaccard1 is L) and (Jaccard2 is H) and (Cos is L) THEN (J1 is L) (J2 is H) (C is L) (1).
IF (Jaccard1 is M) and (Jaccard2 is L) and (Cos is L) THEN (J1 is M) (J2 is L) (C is L) (1).
IF (Jaccard1 is M) and (Jaccard2 is M) and (Cos is L) THEN (J1 is M) (J2 is M) (C is L) (1).
IF (Jaccard1 is M) and (Jaccard2 is H) and (Cos is L) THEN (J1 is M) (J2 is H) (C is L) (1).
IF (Jaccard1 is H) and (Jaccard2 is L) and (Cos is L) THEN (J1 is H) (J2 is L) (C is L) (1).
IF (Jaccard1 is H) and (Jaccard2 is M) and (Cos is L) THEN (J1 is H) (J2 is M) (C is L) (1).
IF (Jaccard1 is H) and (Jaccard2 is H) and (Cos is L) THEN (J1 is H) (J2 is H) (C is L) (1).
IF (Jaccard1 is L) and (Jaccard2 is L) and (Cos is M) THEN (J1 is L) (J2 is L) (C is M) (1).
IF (Jaccard1 is L) and (Jaccard2 is M) and (Cos is M) THEN (J1 is L) (J2 is M) (C is M) (1).
IF (Jaccard1 is L) and (Jaccard2 is H) and (Cos is M) THEN (J1 is L) (J2 is H) (C is M) (1).
IF (Jaccard1 is M) and (Jaccard2 is L) and (Cos is M) THEN (J1 is M) (J2 is L) (C is M) (1).
IF (Jaccard1 is M) and (Jaccard2 is M) and (Cos is M) THEN (J1 is M) (J2 is M) (C is M) (1).
IF (Jaccard1 is M) and (Jaccard2 is H) and (Cos is M) THEN (J1 is M) (J2 is H) (C is M) (1).
IF (Jaccard1 is H) and (Jaccard2 is L) and (Cos is M) THEN (J1 is H) (J2 is L) (C is M) (1).
IF (Jaccard1 is H) and (Jaccard2 is M) and (Cos is M) THEN (J1 is H) (J2 is M) (C is M) (1).
IF (Jaccard1 is H) and (Jaccard2 is H) and (Cos is M) THEN (J1 is H) (J2 is H) (C is M) (1).
IF (Jaccard1 is L) and (Jaccard2 is L) and (Cos is H) THEN (J1 is L) (J2 is L) (C is H) (1).
IF (Jaccard1 is L) and (Jaccard2 is M) and (Cos is H) THEN (J1 is L) (J2 is M) (C is H) (1).
IF (Jaccard1 is L) and (Jaccard2 is H) and (Cos is H) THEN (J1 is L) (J2 is H) (C is H) (1).
IF (Jaccard1 is M) and (Jaccard2 is L) and (Cos is H) THEN (J1 is M) (J2 is L) (C is H) (1).
IF (Jaccard1 is M) and (Jaccard2 is M) and (Cos is H) THEN (J1 is M) (J2 is M) (C is H) (1).
IF (Jaccard1 is M) and (Jaccard2 is H) and (Cos is H) THEN (J1 is M) (J2 is H) (C is H) (1).
IF (Jaccard1 is H) and (Jaccard2 is L) and (Cos is H) THEN (J1 is H) (J2 is L) (C is H) (1).
IF (Jaccard1 is H) and (Jaccard2 is M) and (Cos is H) THEN (J1 is H) (J2 is M) (C is H) (1).
IF (Jaccard1 is H) and (Jaccard2 is H) and (Cos is H) THEN (J1 is H) (J2 is H) (C is H) (1).

L, Low; M, Medium; H, High.


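Table 2. Comparison.

Set | SL     | SHC    | SC    | SCC    | SWC    | SX     | SSJ    | Our approach
1   | 0.9167 | 1.0    | 0.975 | 0.8357 | 0.95   | 0.9627 | 0.8356 | 0.9877
2   | 1.0    | 1.0    | 1.0   | 1.0    | 1.0    | 1.0    | 1.0    | 1.0
3   | 0.5    | 0.7692 | 0.7   | 0.42   | 0.682  | 0.7136 | 0.5997 | 0.8475
4   | 0.5    | 0.7692 | 0.7   | 0.49   | 0.7    | 0.7158 | 0.7    | 0.8551
5   | 1.0    | 1.0    | 1.0   | 0.8    | 0.8248 | 0.9652 | 0.8    | 0.9797
6   | #      | 1.0    | 1.0   | 1.0    | 1.0    | 1.0    | 1.0    | 1.0
7   | 0.0    | 0.9091 | 0.9   | 0.9    | 0.9    | 0.9053 | 0.9    | 0.9725
8   | 0.5    | 0.9091 | 0.9   | 0.54   | 0.8411 | 0.8631 | 0.5991 | 0.9482
9   | 0.6667 | 0.9091 | 0.9   | 0.81   | 0.9    | 0.9053 | 0.9    | 0.9756
10  | 0.8333 | 1.0    | 0.9   | 0.9    | 0.7833 | 0.95   | 0.8974 | 0.9311
11  | 0.75   | 1.0    | 0.9   | 0.72   | 0.8003 | 0.9127 | 0.72   | 0.9572
12  | 0.8    | 0.9375 | 0.9   | 0.78   | 0.8309 | 0.8904 | 0.8959 | 0.9068
13  | 0.75   | 0.9091 | 0.9   | 0.81   | 0.9    | 0.9053 | 0.9    | 0.979
14  | 1.0    | 1.0    | 1.0   | 0.7    | 0.7209 | 0.9553 | 0.7    | 0.9484
15  | 0.75   | 1.0    | 0.95  | 0.9048 | 0.6215 | 0.9675 | 0.9042 | 0.9187
16  | 0.4    | 0.7692 | 0.7   | 0.49   | 0.6222 | 0.7158 | 0.6971 | 0.8125
17  | 0.25   | 0.7692 | 0.7   | 0.49   | 0.7    | 0.7158 | 0.7    | 0.828
18  | 0.5    | 0.7692 | 0.7   | 0.49   | 0.7    | 0.7158 | 0.7    | 0.8551
19  | 0.5    | 0.7692 | 0.7   | 0.49   | 0.7    | 0.7158 | 0.7    | 0.7636
20  | 0.5    | 0.7692 | 0.7   | 0.49   | 0.7    | 0.7158 | 0.7    | 0.8551
21  | 0.81   | 0.8696 | 0.85  | 0.7225 | 0.85   | 0.8579 | 0.85   | 0.8551

Bold text represents incorrect results and italicized text, scale-dependent results.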


References

  1. Zadeh, LA (1965). Fuzzy sets. Information and Control. 8, 338-353. https://doi.org/10.1016/S0019-9958(65)90241-X
  2. Chen, SM (1996). New methods for subjective mental workload assessment and fuzzy risk analysis. Cybernetics and Systems. 27, 449-472. https://doi.org/10.1080/019697296126417
  3. Hsieh, CH, and Chen, SH (1999). Similarity of generalized fuzzy numbers with graded mean integration representation. Proceedings of the 8th International Fuzzy Systems Association World Congress, Taipei, Taiwan, pp. 551-555.
  4. Lee, HS (1999). An optimal aggregation method for fuzzy opinions of group decision. Proceedings of 1999 IEEE International Conference on Systems, Man, and Cybernetics, Tokyo, Japan, pp. 314-319. https://doi.org/10.1109/ICSMC.1999.823219
  5. Chen, SJ, and Chen, SM (2003). Fuzzy risk analysis based on similarity measures of generalized fuzzy numbers. IEEE Transactions on Fuzzy Systems. 11, 45-56. https://doi.org/10.1109/TFUZZ.2002.806316
  6. Yang, MS, Hung, WL, and ChangChien, SJ (2005). On a similarity measure between LR-type fuzzy numbers and its application to database acquisition. International Journal of Intelligent Systems. 20, 1001-1016. https://doi.org/10.1002/int.20102
  7. Wei, SH, and Chen, SM (2009). A new approach for fuzzy risk analysis based on similarity measures of generalized fuzzy numbers. Expert Systems with Applications. 36, 589-598. https://doi.org/10.1016/j.eswa.2007.09.033
  8. Xu, Z, Shang, S, Qian, W, and Shu, W (2010). A method for fuzzy risk analysis based on the new similarity of trapezoidal fuzzy numbers. Expert Systems with Applications. 37, 1920-1927. https://doi.org/10.1016/j.eswa.2009.07.015
  9. Chen, SJ (2011). Fuzzy information retrieval based on a new similarity measure of generalized fuzzy numbers. Intelligent Automation & Soft Computing. 17, 465-476. https://doi.org/10.1080/10798587.2011.10643161
  10. Hwang, CM, and Yang, MS (2014). New similarity measures between generalized trapezoidal fuzzy numbers using the Jaccard Index. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 22, 831-844. https://doi.org/10.1142/S0218488514500445
  11. Chen, SH (1998). Operations of fuzzy numbers with step form membership function using function principle. Information Sciences. 108, 149-155. https://doi.org/10.1016/S0020-0255(97)10070-6
  12. Chiao, KP (2016). Ranking interval type 2 fuzzy sets using parametric graded mean integration representation. Proceedings of 2016 International Conference on Machine Learning and Cybernetics, Jeju, Korea, pp. 606-611. https://doi.org/10.1109/ICMLC.2016.7872956
  13. Gupta, Y, Saini, A, and Saxena, AK (2014). Fuzzy logic-based approach to develop hybrid similarity measure for efficient information retrieval. Journal of Information Science. 40, 846-857. https://doi.org/10.1177/0165551514548989
  14. Real, R, and Vargas, JM (1996). The probabilistic basis of Jaccard's index of similarity. Systematic Biology. 45, 380-385. https://doi.org/10.1093/sysbio/45.3.380
  15. Samiri, MY, Najib, M, Elfazziki, A, Abourraja, MN, Boudebous, D, and Bouain, A (2017). APRICOIN: an adaptive approach for prioritizing high-risk containers inspections. IEEE Access. 5, 18238-18249. https://doi.org/10.1109/ACCESS.2017.2746838
  16. Tsai, JT, Chou, JH, and Lin, CF (2015). Designing microstructure parameters for backlight modules by using improved adaptive neuro-fuzzy inference system. IEEE Access. 3, 2626-2636. https://doi.org/10.1109/ACCESS.2015.2508144
  17. Mamdani, EH, and Assilian, S (1975). An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies. 7, 1-13. https://doi.org/10.1016/S0020-7373(75)80002-2
  18. Gupta, Y, Saini, A, and Saxena, AK (2015). A new fuzzy logic based ranking function for efficient information retrieval system. Expert Systems with Applications. 42, 1223-1234. https://doi.org/10.1016/j.eswa.2014.09.009
  19. Lee, E, Choi, C, and Kim, P (2017). Intelligent handover scheme for drone using fuzzy inference systems. IEEE Access. 5, 13712-13719. https://doi.org/10.1109/ACCESS.2017.2724067
  20. Tourad, MC, and Abdali, A (2017). Toward efficient ranked-key algorithm for the web notification of big data systems. Proceedings of the 2nd International Conference on Big Data, Cloud and Applications, Tetouan, Morocco. https://doi.org/10.1145/3090354.3090386
  21. Tourad, MC, Abdali, A, and Outfarouin, A (2016). On a new index of publish/subscribe system in the context of big data. Proceedings of 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications, Agadir, Morocco, pp. 1-2. https://doi.org/10.1109/AICCSA.2016.7945756
