
Original Article


International Journal of Fuzzy Logic and Intelligent Systems 2013; 13(4): 254-268

Published online December 30, 2013

https://doi.org/10.5391/IJFIS.2013.13.4.254

© The Korean Institute of Intelligent Systems

An Overview of Unsupervised and Semi-Supervised Fuzzy Kernel Clustering

Hichem Frigui1, Ouiem Bchir2, and Naouel Baili3

1Multimedia Research Lab, University of Louisville, Louisville, KY, USA
2Computer Science Department, College of Computer and Information Systems (CCIS), King Saud University, Riyadh, Saudi Arabia
3Quintiles, USA

Correspondence to :
Hichem Frigui (h.frigui@louisville.edu)

Received: December 12, 2013; Revised: December 23, 2013; Accepted: December 24, 2013

For real-world clustering tasks, the input data are typically not easily separable because the underlying structure is complex or because clusters vary in size, density, and shape. Kernel-based clustering has proven to be an effective approach to partition such data. In this paper, we provide an overview of several fuzzy kernel clustering algorithms. We focus on methods that optimize a fuzzy C-means-type objective function. We highlight the advantages and disadvantages of each method. In addition to the completely unsupervised algorithms, we also provide an overview of some semi-supervised fuzzy kernel clustering algorithms. These algorithms use partial supervision information to guide the optimization process and avoid local minima. We also provide an overview of the different approaches that have been used to extend kernel clustering to handle very large data sets.

Keywords: Fuzzy clustering, Kernel-based clustering, Relational Kernel clustering, Multiple Kernel clustering, Semi-supervised clustering

1. Introduction

Clustering is an essential and frequently performed task in pattern recognition and data mining. It aids in a variety of tasks related to understanding and exploring the structure of large and high dimensional data. The goal of cluster analysis is to find natural groupings in a set of objects such that objects in the same cluster are as similar as possible and objects in different clusters are as dissimilar as possible.

In most applications, categories are rarely well separated and boundaries overlap. Describing these real-world situations with crisp sets does not allow the user to quantitatively distinguish objects that are strongly associated with a particular category from those that have only a marginal association with multiple categories, particularly along the overlapping boundaries. Fuzzy clustering methods are well suited to these situations: data elements can belong to more than one cluster with fuzzy membership degrees.

Fuzzy clustering is widely used in the machine learning field. Areas of application of fuzzy cluster analysis include data analysis [1, 2], information retrieval [3, 4], image segmentation [5], and robotics [6]. One of the most widely used fuzzy clustering algorithms is the fuzzy C-means (FCM) algorithm [7].

Typically, the data to be clustered could have an object-based or a relational representation. In the object data representation, each object is represented by a feature vector, while in the relational representation only information about how two objects are related is available. The relational representation is more general in the sense that it can be applied when only the degree of dissimilarity between objects is available, or when groups of similar objects cannot be represented efficiently by a single prototype.

Despite the large number of existing clustering methods, clustering remains a challenging task when the structure of the data does not correspond to easily separable categories, and when clusters vary in size, density, and shape. Kernel-based approaches [8–14] can adapt a specific distance measure in order to make the problem easier. They have attracted great attention and have been applied in many fields such as fault diagnosis of marine diesel engines [12], bridge parameter estimation [13], watermarking [14], automatic classification of fragments of ceramic cultural relics [15], image segmentation [16], model construction for an erythromycin fermentation process [17], oil-immersed transformer fault diagnosis [18], analysis of enterprises' independent innovation capability in order to promote different strategies [19], segmentation of magnetic resonance imaging brain images [20, 21], and classification of audio signals [22].

Kernel clustering approaches rely on their ability to produce nonlinear separating hypersurfaces between clusters by performing a mapping φ from the input space X to a high-dimensional feature space F. One of the most relevant aspects of kernel applications is that it is possible to compute Euclidean distances in F without knowing the mapping explicitly. This can be done using the distance kernel trick [23]:

\| \Phi(x_i) - \Phi(x_j) \|^2 = K(x_i, x_i) + K(x_j, x_j) - 2 K(x_i, x_j).

In Eq. (1), K(x_i, x_j) = Φ(x_i)^T Φ(x_j) is a Mercer kernel [24]. Gaussian kernels,

K^{(g)}(x_i, x_j) = \exp\left( -\frac{\| x_i - x_j \|^2}{2 \sigma^2} \right), \quad \sigma > 0,

are the most commonly used ones.
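
To make the kernel trick concrete, the short NumPy sketch below (our illustration, not part of any of the surveyed implementations; the function names are ours) builds a Gaussian kernel matrix and recovers the pairwise feature-space distances of Eq. (1) from it.

import numpy as np

def gaussian_kernel_matrix(X, sigma=1.0):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

def feature_space_distances(K):
    # ||Phi(x_i) - Phi(x_j)||^2 = K(x_i, x_i) + K(x_j, x_j) - 2 K(x_i, x_j)
    d = np.diag(K)
    return d[:, None] + d[None, :] - 2.0 * K

X = np.random.rand(5, 2)                  # five 2-D points
K = gaussian_kernel_matrix(X, sigma=0.5)
D = feature_space_distances(K)            # pairwise distances in the feature space F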

In this paper, we survey existing fuzzy kernel clustering algorithms. We provide an overview of unsupervised algorithms as well as semi-supervised algorithms that integrate partial supervision into the objective function to guide the optimization process. For most algorithms, we describe the objective function being optimized and the necessary conditions to optimize it. We also highlight the advantages and disadvantages of the different methods.

2. Unsupervised Fuzzy Kernel Based Clustering

Let {x1, · · ·, xN} be a set of N data points to be partitioned into C clusters, and let R = [rjk] be a relational matrix, where rjk represents the degree to which objects xj and xk are related. The matrix R could be given, or it could be constructed from the features of the objects. Each object xj belongs to every cluster i with a fuzzy membership uij that satisfies [7]:

0 \le u_{ij} \le 1,

and

\sum_{i=1}^{C} u_{ij} = 1, \quad \text{for } j \in \{1, \ldots, N\}.

The exponent m ∈ (1, ∞) is a constant that determines the level of cluster fuzziness.

2.1 The Feature Space Kernel (FSK) FCM Algorithm

The FSK-FCM algorithm [25] derives a kernel version of the FCM in the feature space by minimizing

J^{\phi} = \sum_{i=1}^{C} \sum_{j=1}^{N} (u_{ij})^m \left\| \phi(x_j) - a_i^{\phi} \right\|^2,

subject to Eq. (3). In Eq. (4), a_i^φ is the center of cluster i in the feature space, defined as:

a_i^{\phi} = \frac{ \sum_{j=1}^{N} (u_{ij})^m \phi(x_j) }{ \sum_{j=1}^{N} (u_{ij})^m }.

Minimization of Eq. (4) with respect to uij yields [25]

u_{ih}^{-1} = \sum_{j=1}^{C} \left[ \frac{ k_{hh} - 2 b_i \sum_{r=1}^{n} (u_{ir})^m k_{hr} + b_i^2 \sum_{r=1}^{n} \sum_{s=1}^{n} (u_{ir} u_{is})^m k_{rs} }{ k_{hh} - 2 b_j \sum_{r=1}^{n} (u_{jr})^m k_{hr} + b_j^2 \sum_{r=1}^{n} \sum_{s=1}^{n} (u_{jr} u_{js})^m k_{rs} } \right]^{1/(m-1)},

where

b_i = \frac{1}{ \sum_{r=1}^{n} (u_{ir})^m }.

The FSK-FCM algorithm is one of the early kernel versions of the FCM. It is simple and has the flexibility of incorporating different kernels. It achieves this by simply fixing the update equation for the centers in the feature space. Since this equation is not derived by optimizing the objective function, there is no guarantee that the obtained centers are optimal. Moreover, in [25] the authors use Gaussian kernels with one global scale (σ) for the entire data set. The selection of this parameter is not discussed in [25].
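
To illustrate how this membership update can be computed directly from a precomputed kernel matrix, we give a minimal NumPy sketch below (our own illustration, not the implementation of [25]; all names are ours).

import numpy as np

def fsk_fcm_memberships(K, U, m=2.0):
    # K: (n, n) kernel matrix; U: (C, n) current fuzzy memberships
    Um = U ** m
    b = 1.0 / Um.sum(axis=1)                          # b_i = 1 / sum_r u_ir^m
    lin = Um @ K                                      # lin[i, h] = sum_r u_ir^m k_hr
    quad = np.einsum('ir,rs,is->i', Um, K, Um)        # sum_r sum_s u_ir^m u_is^m k_rs
    # squared feature-space distance of point h to the (implicit) center of cluster i
    d2 = np.diag(K)[None, :] - 2.0 * b[:, None] * lin + (b ** 2 * quad)[:, None]
    d2 = np.maximum(d2, 1e-12)
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)                    # updated memberships, shape (C, n)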

2.2 The Metric Kernel (MK) FCM Algorithm

The MK-FCM [26] is an extension of the FSK-FCM that allows the cluster centers to be optimized. It minimizes

J^{\phi} = \sum_{i=1}^{C} \sum_{j=1}^{N} (u_{ij})^m \left\| \phi(x_j) - \phi(a_i) \right\|^2,

subject to the constraint in Eq. (3). In Eq. (8), φ(ai) is the center of cluster i in the feature space F. Minimization of Eq. (8) has been proposed only for the case of a Gaussian kernel using the fact that

\frac{\partial K(x_j, a_i)}{\partial a_i} = \frac{(x_j - a_i)}{\sigma^2} K(x_j, a_i).

In this case, it can be shown [26] that the update equations for the memberships and centers are

u_{ih} = \left[ \sum_{j=1}^{C} \left( \frac{1 - K(x_h, a_i)}{1 - K(x_h, a_j)} \right)^{1/(m-1)} \right]^{-1},

and

a_i = \frac{ \sum_{h=1}^{n} (u_{ih})^m K(x_h, a_i)\, x_h }{ \sum_{h=1}^{n} (u_{ih})^m K(x_h, a_i) }.

Unlike the FSK-FCM, the MK-FCM learns the optimal cluster centers in the feature space. However, the update equations have been derived only for the case of Gaussian kernels with one fixed global scale.
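
The sketch below (ours, assuming a single Gaussian kernel with one fixed scale sigma; the function name is illustrative) performs one MK-FCM iteration: it evaluates the kernel between points and centers, updates the memberships, and then updates the centers.

import numpy as np

def mk_fcm_step(X, A, m=2.0, sigma=1.0):
    # X: (n, p) data; A: (C, p) current cluster centers
    d2 = ((X[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)        # (n, C)
    Kxa = np.exp(-d2 / (2.0 * sigma ** 2))                         # Gaussian kernel values
    one_minus = np.maximum(1.0 - Kxa, 1e-12)
    ratio = (one_minus[:, :, None] / one_minus[:, None, :]) ** (1.0 / (m - 1.0))
    U = 1.0 / ratio.sum(axis=2)                                    # memberships, (n, C)
    W = (U ** m) * Kxa                                             # weights u_ih^m K(x_h, a_i)
    A_new = (W.T @ X) / W.sum(axis=0)[:, None]                     # updated centers
    return U.T, A_new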

2.3 The Kernelized Non-Euclidean Relational FCM (kNERF) Algorithm

The FSK-FCM and MK-FCM are object-based algorithms and require an explicit feature representation of the data to be clustered. The kNERF algorithm [27], on the other hand, is a kernelized version of the non-Euclidean relational FCM algorithm [28], and works on relational data. kNERF does not formulate a new objective function. It simply uses a Gaussian kernel to compute a relational similarity matrix R̂ = [r̂_jk] from R using

\hat{r}_{jk} = 1 - \exp\left( -\frac{r_{jk}}{\sigma^2} \right).

Then, it uses the non-Euclidean relational fuzzy (NERF) C-means algorithm [28] to cluster R̂.

kNERF is not a true kernel clustering algorithm. It simply uses kernels as a preprocessing step to create the similarity matrix. Since this step is not part of the optimization process, any kernel function can be used. However, this also prevents the kernel parameters from being optimized for the given data. Also, kNERF constructs a relational matrix with one global Gaussian parameter for the entire data. The selection of this parameter is discussed in [27] but there has been no attempt to devise methods to automatically select it.
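
Since kNERF only kernelizes the relational matrix before clustering, its preprocessing step can be sketched as follows (our illustration; nerf_c_means is a hypothetical placeholder for any NERF C-means implementation).

import numpy as np

def knerf_transform(R, sigma):
    # Gaussian transformation of the relational matrix R = [r_jk]
    return 1.0 - np.exp(-R / sigma ** 2)

R = np.random.rand(6, 6)                 # toy dissimilarity matrix
R = (R + R.T) / 2.0
np.fill_diagonal(R, 0.0)
R_hat = knerf_transform(R, sigma=1.0)
# partition = nerf_c_means(R_hat, C=2)   # hypothetical NERF C-means routine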

2.4 The Clustering and Local Scale Learning (LSL) Algorithm

Although good results were obtained using the Gaussian kernel function, its performance depends on the selection of the scale parameter. Moreover, since one global σ is used for the entire data set, it may not be possible to find one optimal parameter when there are large variations between the distributions of the different clusters in the feature space. One way to learn optimal Gaussian parameters is through an exhaustive search of one parameter for each cluster. However, this approach is not practical since it is computationally expensive especially when the data include a large number of clusters and when the dynamic range of possible values for these parameters is large. Moreover, it is not trivial to evaluate the resulting partition in order to select the optimal parameters. To overcome this limitation, the LSL algorithm [29] has been proposed. It minimizes one objective function for both the optimal partition and for cluster dependent Gaussian parameters that reflect the intra-cluster characteristics of the data. The LSL algorithm minimizes

J = \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m \left( 1 - \exp\left( -\frac{r_{jk}}{\sigma_i} \right) \right) - \sum_{i=1}^{C} \frac{K}{\sigma_i^2},

subject to the membership constraint in Eq. (3). The first term in Eq. (13) seeks compact clusters using a local relational distance, D_jk^i, with respect to each cluster i. This distance is defined as

D_{jk}^{i} = 1 - \exp\left( -\frac{r_{jk}}{\sigma_i} \right),

where the scaling σi controls the rate of decay of D_jk^i as a function of the distance between xj and xk with respect to cluster i. The cluster-dependent σi allows LSL to deal with the large variations, in the feature space, between the distributions and the geometric characteristics of the different clusters. The second term in Eq. (13) is a regularization term to avoid the trivial solution where all the scaling parameters σi are infinitely large.

Optimization of J with respect to uij and σi yields the following update equations [29]:

u_{ij} = \frac{1}{ \sum_{t=1}^{C} \left( d_{ij}^2 / d_{tj}^2 \right)^{\frac{1}{m-1}} },

and

\sigma_i = \left( \frac{ K\, (2\pi)^{(2-p)/2}\, N }{ \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m\, r_{jk} } \right)^{\frac{2}{2+p}},

where |N_j| is the cardinality of the neighborhood of xj.

In Eq. (16), σi is inversely proportional to the intra-cluster distances with respect to cluster i. Thus, when the intra-cluster dissimilarity is small, σi is large allowing the pairwise distances over the same cluster to be smaller and thus, obtain a more compact cluster. On the other hand, when the intra-cluster dissimilarity is high, σi is small to prevent points which are not highly similar from being mapped to the same location. According to Eq. (16), σi can also be seen as the average time to move between points in cluster i.

LSL has the advantage of learning cluster-dependent resolution parameters and can be used to identify clusters of various densities. However, this also makes the optimization process more complex and prone to local minima. Moreover, the partition generated by LSL depends on the choice of the constant K in Eq. (13).
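
For illustration, the following sketch (ours, not the authors' implementation) evaluates the LSL cost for given memberships and scaling parameters, using the regularizer in the form written above.

import numpy as np

def lsl_objective(R, U, sigmas, K_reg, m=2.0):
    # R: (N, N) relational matrix; U: (C, N) memberships; sigmas: (C,) scaling parameters
    J = 0.0
    for i in range(U.shape[0]):
        D_i = 1.0 - np.exp(-R / sigmas[i])    # cluster-dependent local distance D_jk^i
        Um = U[i] ** m
        J += Um @ D_i @ Um - K_reg / sigmas[i] ** 2
    return J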

2.5 The Fuzzy Clustering with Learnable Cluster Dependent Kernels (FLeCK) Algorithm

FLeCK [30] is an extension of LSL that does not assume that K is fixed. It learns the scaling parameters and K by optimizing both the intra-cluster and the inter-cluster dissimilarities. Consequently, the learned scale parameters reflect the relative density, size, and position of each cluster with respect to the other clusters. In particular, FLeCK minimizes the intra-cluster distances

J^{intra} = \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m \left( 1 - \exp\left( -\frac{r_{jk}}{\sigma_i} \right) \right) + K \sum_{i=1}^{C} \sigma_i,

and maximizes the inter-cluster distances

J^{inter} = \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{N} \left[ \left( u_{ij}^m (1 - u_{ik}^m) + u_{ik}^m (1 - u_{ij}^m) \right) \cdot \left( 1 - \exp\left( -\frac{r_{jk}}{\sigma_i} \right) \right) \right] + K \sum_{i=1}^{C} \sigma_i.

The constant K provides a way of specifying the relative importance of the regularization term (second term in Eqs. (17) and (18)), compared to the sum of intra-cluster (first term in Eq. (17)) and inter-cluster distances (first term in Eq. (18)). The parameter σi controls the rate of decay of D_jk^i. This approach learns a cluster-dependent σi in order to deal with the large variations between the distributions and the geometric characteristics of each cluster.

Using the Lagrange multiplier technique and assuming that the values of σi do not change significantly from one iteration (t – 1) to the next one (t), it can be shown [30] that

\sigma_i^{(t)} = \frac{Q_{1i}}{Q_{2i}},

where

Q_{1i} = \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m\, r_{jk}^2 \exp\left( -\frac{r_{jk}}{\sigma_i^{(t-1)}} \right),

and

Q_{2i} = \sum_{j=1}^{N} \sum_{k=1}^{N} \left[ \left( u_{ij}^m (1 - u_{ik}^m) + u_{ik}^m (1 - u_{ij}^m) \right) \cdot r_{jk} \exp\left( -\frac{r_{jk}}{\sigma_i^{(t-1)}} \right) \right] + \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m\, r_{jk} \exp\left( -\frac{r_{jk}}{\sigma_i^{(t-1)}} \right).

The simultaneous optimization of Eqs. (17) and (18) with respect to uij yields [30]:

u_{ij} = \frac{1}{ \sum_{t=1}^{C} \left( d_{ij}^2 / d_{tj}^2 \right)^{\frac{1}{m-1}} },

where

d_{ik}^2 = \frac{ \sum_{j=1}^{N} D_{jk}^{i} u_{ij}^m }{ \sum_{v=1}^{N} u_{iv}^m } - \frac{1}{2} \frac{ \sum_{j=1}^{N} \sum_{q=1}^{N} u_{ij}^m D_{jq}^{i} u_{iq}^m }{ \left( \sum_{v=1}^{N} u_{iv}^m \right)^2 },

and D_jk^i is as defined in Eq. (14).
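
A sketch of the FLeCK fixed-point update for the scaling parameters is given below (our illustration; names are ours). As in the derivation above, the previous value of sigma_i is kept inside the exponentials.

import numpy as np

def fleck_sigma_update(R, U, sigmas, m=2.0):
    # R: (N, N) relational matrix; U: (C, N) memberships; sigmas: (C,) previous scales
    new_sigmas = np.empty_like(sigmas)
    for i in range(U.shape[0]):
        Um = U[i] ** m
        E = np.exp(-R / sigmas[i])
        P_intra = np.outer(Um, Um)                                # u_ij^m u_ik^m
        P_inter = np.outer(Um, 1.0 - Um) + np.outer(1.0 - Um, Um)
        Q1 = (P_intra * R ** 2 * E).sum()
        Q2 = (P_inter * R * E).sum() + (P_intra * R * E).sum()
        new_sigmas[i] = Q1 / Q2
    return new_sigmas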

2.6 The Fuzzy Clustering with Multiple Kernels (FCMK) Algorithm

LSL [29] and FLeCK [30] have the advantage of learning cluster-dependent scaling parameters. Thus, they can be used to identify clusters of different densities. However, these algorithms have two main limitations. First, learning σi for each cluster involves complex optimization and is prone to local minima. Second, data points assigned to one cluster are constrained to have one common density. An alternative approach, which involves relatively simpler optimization and overcomes the above limitations, is based on multiple kernel learning (MKL) [31–34]. Instead of using one kernel (fixed or learned) per cluster, MKL uses multiple kernels for each cluster.

Most existing MKL methods assume that kernel weights remain constant for all data (i.e., space-level), while algorithms like Localized MKL [35] seek kernel weights that are data dependent and locally smooth (i.e., sample-level). Although sample-level non-uniform methods give the largest flexibility, in practice relaxations are typically introduced to enhance tractability. Most of the previous MKL approaches have focused on supervised [32, 33] and semi-supervised learning [35]. Recently, an extension of multiple kernels to unsupervised learning, based on maximum margin clustering, was proposed in [36]. However, this method is only designed for crisp clustering. In [37], Huang et al. designed a multiple kernel fuzzy clustering algorithm which uses one set of global kernel weights for all clusters. Therefore, their approach is not appropriate for clusters of various densities. To overcome these drawbacks, Baili and Frigui proposed the FCMK [38]. FCMK is based on the observation that data within the same cluster tend to manifest similar properties. Thus, the intra-cluster weights can be approximately uniform while kernel weights are allowed to vary across clusters.

Let Φ = {φ1, . . ., φM} be a set of M embedding mappings that map the data to new feature spaces. Each mapping φk maps the p-dimensional data point x to a vector φk(x) in its feature space, whose dimensionality is Lk. Let {K1, . . ., KM} be the kernels corresponding to these implicit mappings, respectively,

K_k(x_j, x_h) = \phi_k(x_j)^T \phi_k(x_h).

Since these implicit mappings do not necessarily have the same dimensionality, the authors in [38] construct a new set of independent mappings, Ψ = {ψ1, ψ2, . . ., ψM}, from the original mappings Φ as

\psi_1(x_j) = \begin{bmatrix} \phi_1(x_j) \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \psi_2(x_j) = \begin{bmatrix} 0 \\ \phi_2(x_j) \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad \psi_M(x_j) = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \phi_M(x_j) \end{bmatrix}.

Each of these constructed mappings converts x into an L-dimensional vector, where L = Σ_{k=1}^{M} L_k. The linear combination of the bases in Ψ within each cluster i is defined as

\psi^{(i)}(x) = \sum_{k=1}^{M} w_{ik} \psi_k(x),

with

w_{ik} \ge 0 \;\; \forall i, \quad \text{and} \quad \sum_{k=1}^{M} w_{ik} = 1.

A new cluster-dependent kernel, K(i), between object j and cluster i, is computed as a convex combination of M Gaussian kernels K1, K2, . . ., KM with fixed scaling parameters σ1, σ2, . . ., σM, respectively. That is,

K^{(i)}(x_j, a_i) = \sum_{k=1}^{M} w_{ik}^2 K_k(x_j, a_i).

In Eq. (26), W = [wik], where wik ∈ [0, 1] is a resolution weight for kernel Kk with respect to cluster i. A low value of wik indicates that kernel Kk is not relevant for the density estimation of cluster i (due to its scaling parameter), and that this kernel should not have a significant impact on the creation of this cluster. Similarly, a high value of wik indicates that the bandwidth of kernel Kk is highly relevant for the density estimation of cluster i, and that this kernel should be the main factor in the creation of this cluster.

The FCMK algorithm minimizes

J = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^m \left\| \psi^{(i)}(x_j) - \psi^{(i)}(a_i) \right\|^2,

subject to the constraints in Eqs. (3) and (25). It can be shown [38] that minimization of Eq. (27) with respect to uij yields

u_{ij} = \frac{1}{ \sum_{q=1}^{C} \left( \frac{D_{ij}^2}{D_{qj}^2} \right)^{\frac{1}{m-1}} },

where Dij denotes the distance between data point xj and cluster center ai, i.e., D_ij^2 = ||ψ^(i)(xj) - ψ^(i)(ai)||^2. Similarly, minimization of Eq. (27) with respect to ai yields

a_i = \frac{ \sum_{j=1}^{N} u_{ij}^m \psi^{(i)}(x_j) }{ \sum_{j=1}^{N} u_{ij}^m } = \sum_{j=1}^{N} \hat{u}_{ij}\, \psi^{(i)}(x_j),

where û_ij = u_ij^m / Σ_{h=1}^{N} u_ih^m is the normalized membership.

Using Eq. (29), the distance between data xj and cluster center ai can be written as

D_{ij}^2 = \sum_{k=1}^{M} \alpha_{ijk} w_{ik}^2,

where the coefficients αijk are defined as

\alpha_{ijk} = K_k(x_j, x_j) - 2 \sum_{h=1}^{N} \hat{u}_{ih} K_k(x_j, x_h) + \sum_{h=1}^{N} \sum_{h'=1}^{N} \hat{u}_{ih} \hat{u}_{ih'} K_k(x_h, x_{h'}).

Replacing Eq. (30) back in Eq. (27) and introducing a Lagrange multiplier, the optimal kernel weights need to be updated using

w_{ik} = \frac{ 1/\beta_{ik} }{ \sum_{k'=1}^{M} 1/\beta_{ik'} },

where

\beta_{ik} = \sum_{j=1}^{N} u_{ij}^m \alpha_{ijk}.

The FCMK is based on the MK-FCM [26] and inherits the limitations of object-based clustering. First, multiple runs of the algorithm may be needed since it is susceptible to initialization problems. Second, FCMK is restricted to data for which there is a notion of center (centroid). Finally, FCMK is not suitable for all types of data. For instance, the density is estimated by counting the number of points within a specified radius, σk, of the cluster center. However, for data with multiple resolutions within the same cluster, density should take into account the variation between pairs of points and not only the distances of points to the center.
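
To show how the FCMK quantities fit together, the following NumPy sketch (ours, assuming the M kernel matrices have been precomputed; names are illustrative) performs one alternating update of the memberships and the kernel weights.

import numpy as np

def fcmk_step(kernels, U, W, m=2.0):
    # kernels: (M, N, N) stack of kernel matrices K_k; U: (C, N) memberships; W: (C, M) weights
    Um = U ** m
    U_hat = Um / Um.sum(axis=1, keepdims=True)                       # normalized memberships
    diag = np.stack([np.diag(Kk) for Kk in kernels])                 # K_k(x_j, x_j), shape (M, N)
    cross = np.einsum('ih,kjh->ikj', U_hat, kernels)                 # sum_h u^_ih K_k(x_j, x_h)
    quad = np.einsum('ih,khl,il->ik', U_hat, kernels, U_hat)         # double sum over point pairs
    alpha = diag[None, :, :] - 2.0 * cross + quad[:, :, None]        # alpha[i, k, j]
    D2 = np.maximum(np.einsum('ik,ikj->ij', W ** 2, alpha), 1e-12)   # distances D_ij^2
    ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))
    U_new = 1.0 / ratio.sum(axis=1)                                  # membership update
    beta = np.einsum('ij,ikj->ik', U_new ** m, alpha)                # beta_ik
    W_new = (1.0 / beta) / (1.0 / beta).sum(axis=1, keepdims=True)   # kernel weight update
    return U_new, W_new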

2.7 The Relational Fuzzy Clustering With Multiple Kernels (RFCMK) Algorithm

To overcome the limitations of FCMK, the RFCMK was proposed in [39]. The RFCMK involves dissimilarity between pairs of objects instead of dissimilarity of the objects to a cluster center. It minimizes

J = \sum_{i=1}^{C} \frac{ \sum_{j=1}^{N} \sum_{h=1}^{N} u_{ij}^m u_{ih}^m\, \hat{r}_{jh}^{(i)} }{ \sum_{h=1}^{N} u_{ih}^m },

subject to the constraints in Eqs. (3) and (25). In Eq. (34), r̂_jh^(i) is the transformed relational data between feature points xj and xh with respect to cluster i, defined using the implicit mapping ψ^(i):

\hat{r}_{jh}^{(i)} = \left\| \psi^{(i)}(x_j) - \psi^{(i)}(x_h) \right\|^2 = \sum_{k=1}^{M} w_{ik}^2 K_k(x_j, x_j) + \sum_{k=1}^{M} w_{ik}^2 K_k(x_h, x_h) - 2 \sum_{k=1}^{M} w_{ik}^2 K_k(x_j, x_h).

The distance D_ij^2 from feature vector xj to the center of the ith cluster, ai, can be written in terms of the relational matrix R̂^(i) = [r̂_jh^(i)] using [40]

D_{ij}^2 = \left( \hat{R}^{(i)} v_i \right)_j - \frac{ v_i^T \hat{R}^{(i)} v_i }{2},

where vi is the membership vector defined by

v_i = \frac{ \left( u_{i1}^m, \ldots, u_{iN}^m \right)^T }{ \sum_{j=1}^{N} u_{ij}^m }.

It can be shown [39] that optimization of Eq. (34) with respect to uij yields

u_{ij} = \frac{1}{ \sum_{t=1}^{C} \left( \frac{D_{ij}^2}{D_{tj}^2} \right)^{\frac{1}{m-1}} }.

Rewriting the relational data Eq. (35) as

\hat{r}_{jh}^{(i)} = \sum_{k=1}^{M} \alpha_{jhk} w_{ik}^2,

where

\alpha_{jhk} = K_k(x_j, x_j) + K_k(x_h, x_h) - 2 K_k(x_j, x_h),

it can be shown [39] that the kernel weights need to be updated using

w_{ik} = \frac{1}{ \sum_{p=1}^{M} \left( \bar{D}_{ik} / \bar{D}_{ip} \right) },

where

\bar{D}_{ik} = \sum_{j=1}^{N} \sum_{h=1}^{N} u_{ij}^m u_{ih}^m \alpha_{jhk}.

In Eq. (41), the resolution weight wik is inversely proportional to αjhk, which is the distance between objects xj and xh induced by kernel k. When the objects are mapped close to each other, wik will be large, indicating that kernel k is relevant. On the other hand, when the objects are mapped far apart, the weight wik will be small, indicating that kernel k is irrelevant.
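
The corresponding relational computations can be sketched as follows (our illustration, not the published implementation): the cluster-dependent relational matrices are assembled from the base kernels, the relational distances are computed as above, and the memberships and kernel weights are then updated.

import numpy as np

def rfcmk_step(kernels, U, W, m=2.0):
    # kernels: (M, N, N) kernel matrices; U: (C, N) memberships; W: (C, M) kernel weights
    diag = np.stack([np.diag(Kk) for Kk in kernels])                     # K_k(x_j, x_j), (M, N)
    alpha = diag[:, :, None] + diag[:, None, :] - 2.0 * kernels          # alpha[k, j, h]
    R_hat = np.einsum('ik,kjh->ijh', W ** 2, alpha)                      # transformed matrices R^(i)
    Um = U ** m
    V = Um / Um.sum(axis=1, keepdims=True)                               # membership vectors v_i
    Rv = np.einsum('ijh,ih->ij', R_hat, V)                               # (R^(i) v_i)_j
    D2 = np.maximum(Rv - 0.5 * np.einsum('ij,ij->i', V, Rv)[:, None], 1e-12)
    ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))
    U_new = 1.0 / ratio.sum(axis=1)                                      # membership update
    Dbar = np.einsum('ij,ih,kjh->ik', U_new ** m, U_new ** m, alpha)     # Dbar_ik
    W_new = 1.0 / (Dbar[:, :, None] / Dbar[:, None, :]).sum(axis=2)      # kernel weight update
    return U_new, W_new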

3. Semi-Supervised Fuzzy Kernel Based Clustering

Clustering is a difficult combinatorial problem that is susceptible to local minima, especially for high dimensional data. Incorporating prior knowledge into the unsupervised learning task, in order to guide the clustering process, has attracted considerable interest among researchers in the data mining and machine learning communities. Some of these methods use available side-information to learn the parameters of a Mahalanobis distance (e.g., [41–43]). Non-linear methods have focused on the kernelization of Mahalanobis metric learning methods (e.g., [44–46]). However, these approaches are computationally expensive or even infeasible for high dimensional data.

Another approach to guide the clustering process is to use available information in the form of hints, constraints, or labels. Supervision in the form of constraints is more practical than providing class labels. This is because in many real world applications, the true class labels may not be known, and it is much easier to specify whether pairs of points should belong to the same or to different clusters. In fact, pairwise constraints occur naturally in many domains.

3.1 The Semi-Supervised Kernel FCM (SS-KFCM) Algorithm

The SS-KFCM [47] uses L labeled points and U unlabeled points. It assumes that fuzzy memberships can be assigned, using expert knowledge, to the subset of labeled data. Its objective function is defined by adding the classification errors of both the labeled and the unlabeled data, i.e.,

J = w \sum_{j=1}^{L} \sum_{i=1}^{C} \left( u_{ij}^m - u_{ij,0}^m \right) \left( K(x_j, a_{i,0}) - K(x_j, a_i) \right) + (1 - w) \sum_{j=1}^{U} \sum_{i=1}^{C} u_{ij}^m \left( 2 - 2 K(x_j, a_i) \right).

The first term in Eq. (43) is the error between the fuzzy centers calculated based on the learned clusters and the labeled information. The second term minimizes the intra-cluster distances. The optimal solution to Eq. (43) involves learning the fuzzy memberships and the kernel parameters. It can be shown [47] that for the labeled data, the memberships need to be updated using:

u_{ij} = \left( \sum_{k=1}^{C} \left( \frac{ K(x_j, x_j) - 2 K(x_j, a_i) + K(a_i, a_i) }{ K(x_j, x_j) - 2 K(x_j, a_k) + K(a_k, a_k) } \right)^{\frac{1}{m-1}} \right)^{-1},

and for the unlabeled data, the fuzzy memberships need to be updated using:

u_{ij} = \frac{ u_{ij} K(x_j, a_i) }{ \sum_{k=1}^{C} u_{kj} K(x_j, a_k) }.

Optimization of Eq. (43) with respect to σ does not lead to a closed form expression. Instead, σ is updated iteratively using:

\sigma = \sigma - \eta_0 \frac{ \partial J(L, U) }{ \partial \sigma }.

The SS-KFCM algorithm assumes that a subset of the data is labeled with fuzzy membership degrees. However, for real applications, this information may not be available and can be tedious to generate using expert knowledge. An alternative approach uses pairwise constraints. For the following semi-supervised algorithms, we assume that pairwise “Should-Link” constraints (pairs of points that should belong to the same cluster) and “Should not-Link” constraints (pairs of points that should belong to different clusters) are provided with the input. Let Sl be the indicator matrix for the set of “Should-Link” pairs of constraints, such that Sl(j, k) = 1 means that xj and xk should be assigned to the same cluster, and 0 otherwise. Similarly, let SNl be the indicator matrix for the set of “Should not-Link” pairs, such that SNl(j, k) = 1 means that xj and xk should not be assigned to the same cluster, and 0 otherwise.
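
A small sketch of how such indicator matrices can be built from lists of constrained index pairs is given below (purely illustrative; names are ours).

import numpy as np

def constraint_matrices(N, should_link, should_not_link):
    # Build the symmetric indicator matrices S_l and S_Nl from lists of (j, k) pairs.
    S_l = np.zeros((N, N))
    S_nl = np.zeros((N, N))
    for j, k in should_link:
        S_l[j, k] = S_l[k, j] = 1.0
    for j, k in should_not_link:
        S_nl[j, k] = S_nl[k, j] = 1.0
    return S_l, S_nl

S_l, S_nl = constraint_matrices(6, should_link=[(0, 1)], should_not_link=[(0, 5)])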

3.2 Semi-Supervised Relational Clustering with Local Scaling

The semi-supervised local scaling learning (SS-LSL) [48] minimizes

J = \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m \left( 1 - \exp\left( -\frac{r_{jk}}{\sigma_i} \right) \right) - \sum_{i=1}^{C} \frac{K_1}{\sigma_i^2} - w \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m S_l(j, k) + w \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m S_{Nl}(j, k),

subject to the constraint in Eq. (3). SS-LSL is an extension of the LSL algorithm [29] that incorporates partial supervision. The third term in Eq. (47) is a reward term for satisfying “Should-Link” constraints, and the fourth term is a penalty term for violating “Should not-Link” constraints.

In Eq. (47), the weight w ∈ (0, 1) provides a way of specifying the relative importance of the “Should-Link” and “Should not-Link” constraints compared to the sum of intra-cluster distances. In [48], the authors recommend fixing it as the ratio of the number of constraints to the total number of points.

Setting the derivative of J with respect to σi to zero gives the same update equation for σi as in the LSL algorithm [29] (refer to Eq. (16)).

The objective function in Eq. (47) can be rewritten as

J = \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m \hat{D}_{jk}^{i} - \sum_{i=1}^{C} \frac{K_1}{\sigma_i^2},

where

\hat{D}_{jk}^{i} = D_{jk}^{i} - w S_l(j, k) + w S_{Nl}(j, k)

can be regarded as the “effective distance” that takes into account the satisfaction and violation of the constraints.

It can be shown [48] that optimization of J with respect to uij yields

u_{ij} = \frac{1}{ \sum_{t=1}^{C} \left( \hat{d}_{ij}^2 / \hat{d}_{tj}^2 \right)^{\frac{1}{m-1}} },

where

\hat{d}_{ik}^2 = \frac{ \sum_{j=1}^{N} \hat{D}_{jk}^{i} u_{ij}^m }{ \sum_{v=1}^{N} u_{iv}^m } - \frac{1}{2} \frac{ \sum_{j=1}^{N} \sum_{q=1}^{N} u_{ij}^m \hat{D}_{jq}^{i} u_{iq}^m }{ \left( \sum_{v=1}^{N} u_{iv}^m \right)^2 }.
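
The effective distance is straightforward to compute once the constraint matrices are available, as the following sketch (ours) illustrates for a single cluster; this matrix then replaces D_jk^i in the relational membership computation.

import numpy as np

def ss_lsl_effective_distance(R, sigma_i, S_l, S_nl, w):
    # Local kernel distance for cluster i, reduced for Should-Link pairs
    # and increased for Should not-Link pairs.
    D_i = 1.0 - np.exp(-R / sigma_i)
    return D_i - w * S_l + w * S_nl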

3.3 The SS-FLeCK Algorithm

SS-FLeCK [49] is an extension of FLeCK [30] that incorporates partial supervision information. It attempts to satisfy a set of “Should-Link” and “Should-Not Link” constraints by minimizing

J^{intra} = \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m \left( 1 - \exp\left( -\frac{r_{jk}}{\sigma_i} \right) \right) + K \sum_{i=1}^{C} \sigma_i - w \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m S_l(j, k) + w \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{N} u_{ij}^m u_{ik}^m S_{Nl}(j, k).

The optimal K is determined by simultaneously minimizing the intra-cluster distances Eq. (52) and maximizing the inter-cluster distances in Eq. (18).

It can be shown [49] that optimization of Eq. (52) and Eq. (18) with respect to σi yields the same update equation for σi as in FLeCK (i.e., Eq. (19)). Similarly, optimization of Eq. (52) with respect to the memberships yields the same update equation as SS-LSL (i.e. Eq. (50)).

3.4 The SS-FCMK Algorithm

The SS-FCMK [50] uses the same partial supervision information to extend FCMK [38] by minimizing:

J = \sum_{i=1}^{C} \sum_{j=1}^{N} \sum_{k=1}^{M} \frac{ u_{ij}^2 w_{ik}^2 }{ \sigma_k } \left( 1 - \exp\left( -\frac{ \| x_j - a_i \|^2 }{ 2 \sigma_k^2 } \right) \right) + \gamma \left( \sum_{S_L(j,h)=1} \sum_{i=1}^{C} \sum_{s=1, s \ne i}^{C} u_{ij} u_{sh} + \sum_{S_{NL}(j,h)=1} \sum_{i=1}^{C} u_{ij} u_{ih} \right),

subject to Eqs. (3) and (25).

In Eq. (53), the weight γ ∈ (0, 1) provides a way of specifying the relative importance of the should-link and should not-link constraints compared to the sum of intra-cluster distances. It can be shown [50] that the update equation for the membership of point xj to cluster i is

u_{ij} = u_{ij}^{FCMK} + u_{ij}^{Constraints},

where u_ij^FCMK is given by Eq. (28), and

u_{ij}^{Constraints} = \frac{\gamma}{2 D_{ij}^2} \left( \bar{C}_j - C_{ij} \right).

In Eq. (55), C_ij and C̄_j are defined as

C_{ij} = \sum_{S_L(j,h)=1} \sum_{s=1, s \ne i}^{C} u_{sh} + \sum_{S_{NL}(j,h)=1} u_{ih},

and

\bar{C}_j = \frac{ \sum_{i=1}^{C} \left( \sum_{S_L(j,h)=1} \sum_{s=1, s \ne i}^{C} u_{sh} + \sum_{S_{NL}(j,h)=1} u_{ih} \right) / D_{ij}^2 }{ \sum_{i=1}^{C} 1 / D_{ij}^2 }.

The first term in Eq. (54) is the membership term in the FCMK algorithm and only focuses on distances between feature points and prototypes. The second term takes into account the available supervision: memberships are reinforced or reduced according to the pairwise constraints provided by the user. C_ij is the constraint violation cost associated with feature point xj if it is assigned to cluster i, while C̄_j is the weighted average, over all the clusters, of the violation costs associated with feature point xj. If the violated constraints do not involve feature point xj, then u_ij^Constraints = 0.

Since the constraints in Eq. (53) do not depend on W explicitly, optimization of Eq. (53) with respect to W yields the same update Eq. (32) as in FCMK.
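
The following sketch (ours; it assumes that the memberships produced by the unconstrained FCMK update and the distances D_ij^2 are already available) illustrates how the constraint-based correction term adjusts the memberships.

import numpy as np

def ss_fcmk_membership_adjustment(U_fcmk, D2, S_l, S_nl, gamma):
    # U_fcmk: (C, N) unconstrained FCMK memberships; D2: (C, N) distances D_ij^2
    # S_l, S_nl: (N, N) constraint indicator matrices; gamma: constraint weight
    other = U_fcmk.sum(axis=0)[None, :] - U_fcmk                 # sum over s != i of u_sh
    C_cost = other @ S_l.T + U_fcmk @ S_nl.T                     # violation costs C_ij
    inv_D2 = 1.0 / np.maximum(D2, 1e-12)
    C_bar = (C_cost * inv_D2).sum(axis=0) / inv_D2.sum(axis=0)   # weighted average per point
    return U_fcmk + gamma / (2.0 * np.maximum(D2, 1e-12)) * (C_bar[None, :] - C_cost)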

3.5 The SS-RFCMK Algorithm

The SS-RFCMK [50] uses partial supervision to extend the RFCMK [39]. It minimizes

J = \sum_{i=1}^{C} \frac{ \sum_{j=1}^{N} \sum_{h=1}^{N} u_{ij}^2 u_{ih}^2\, \hat{r}_{jh}^{(i)} }{ 2 \sum_{h=1}^{N} u_{ih}^2 } + \gamma \left( \sum_{S_L(j,h)=1} \sum_{i=1}^{C} \sum_{s=1, s \ne i}^{C} u_{ij} u_{sh} + \sum_{S_{NL}(j,h)=1} \sum_{i=1}^{C} u_{ij} u_{ih} \right),

subject to Eqs. (3) and (25).

Using the Lagrange multiplier technique, it can be shown [50] that the memberships need to be updated using

u_{ij} = u_{ij}^{RFCMK} + u_{ij}^{Constraints},

where u_ij^RFCMK is as defined in Eq. (38), and

u_{ij}^{Constraints} = \frac{\gamma}{2 D_{ij}^2} \left( \bar{C}_j - C_{ij} \right).

In Eq. (60), C_ij and C̄_j are as defined in Eqs. (56) and (57).

Since the constraints in Eq. (58) do not depend on wik explicitly, optimization of Eq. (58) with respect to W yields the same update Eq. (41) as in RFCMK.

In this paper, we have focused on kernel clustering algorithms that are based on the FCM objective function. There are several other fuzzy kernel clustering approaches. For instance, the multisphere support vector (MSV) clustering [51] extends the SV clustering approach [52]. It defines a cluster as a sphere in the feature space. This yields a kernel-based mapping between the original space and the feature space. The kernel possibilistic C-means (KPCM) algorithm [53] applies the kernel approach to the possibilistic C-means (PCM) algorithm [54]. The weighted kernel fuzzy clustering algorithm (WKFCA) [55] is a kernelized version of the SCAD algorithm [56]. It performs feature weighting along with fuzzy kernel clustering. In [57], the authors propose a kernel fuzzy clustering model that extends the additive fuzzy clustering model to the case of a nonlinear model. More specifically, it has been shown that additive clustering [58] is a special case of fuzzy kernel clustering. The similarity structure is then captured and mapped to a higher dimensional space by using kernel functions. The kernel fuzzy clustering method based on local adaptive distances [59] performs feature weighting and fuzzy kernel clustering simultaneously. The sum of the weights of the variables with respect to each cluster is not equal to one as in [55]. However, the product of the weights of the variables for each cluster is constrained to be equal to one. The genetic multiple kernel interval type-2 FCM clustering algorithm [60] combines a heuristic method based on a genetic algorithm (GA) with MK-FCM. It automatically determines the optimal number of clusters and the initial centroids in the first step. Then, it adjusts the coefficients of the kernels and combines them in the feature space to produce a new kernel. Other kernel clustering algorithms, based on type-2 fuzzy sets, include [61, 62]. An evolutionary kernel intuitionistic FCM clustering algorithm (EKIFCM) was proposed in [63]. EKIFCM has two main phases. The first one is KIFCM, and the second phase is parameter selection for KIFCM with the GA. KIFCM is a combination of Atanassov’s intuitionistic fuzzy sets (IFSs) [64] with kernel-based FCM (KFCM) [47].

None of the kernel clustering algorithms that we have outlined scales to very large (VL) data sets. VL data, or big data, are any data that cannot be loaded into the computer’s working memory. Since clustering is one of the primary tasks used in the pattern recognition and data mining communities to search VL databases in various applications, it is desirable to have clustering algorithms that scale well to VL data.

The scalability issue has been studied by many researchers. In general, there are three main approaches to clustering VL data: sampling-based methods, incremental algorithms, and data approximation algorithms. The sample and extend algorithms apply clustering to a sample of the data set obtained by either progressive [65] or random sampling [66]. Then, the sample results are non-iteratively extended to approximate the partition for the rest of the data. Representative algorithms include the random sample and extend kernel FCM (rseKFCM) algorithm [67] and the random sample and extend RFCMK (rseRFCMK) [68]. The rseRFCMK is an extension of the RFCMK algorithm to VL data based on sampling followed by non-iterative extension. The main problem with sampling-based methods is the choice of the sample. For instance, if the sample is not representative of the full data set, then sample and extend methods cannot accurately approximate the clustering solution.

On the other hand, the incremental algorithms are designed to operate on the full dataset by separating it into multiple chunks. First, these algorithms sequentially load manageable chunks of the data. Then, they cluster each chunk in a single pass. They construct the final partition as a combination of the results from each chunk. In [66, 67], a single pass kernel fuzzy C-means (spKFCM) algorithm was proposed. The spKFCM algorithm runs weighted KFCM (wKFCM) on sequential chunks of the data, passing the clustering solution from each chunk onto the next. spKFCM is scalable as its space complexity is only based on the size of the sample. The single pass RFCMK (spRFCMK) [68] is an extension of the RFCMK algorithm to VL data. The spRFCMK is an incremental technique that makes one sequential pass through subsets of the data.

In [66], an online algorithm for kernel FCM (oKFCM) was proposed in which data is assumed to arrive in chunks. Each chunk is clustered and the memory is freed by summarizing the clustering result by weighted centroids. In contrast to spKFCM, oKFCM is not truly scalable and is not recommended for VL data. This is because, rather than passing the clustering results from one chunk to the next, oKFCM clusters the weighted centroids in one final run to obtain the clustering for the entire stream.
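
The incremental pattern shared by these single-pass algorithms can be summarized by the schematic loop below (a generic sketch, not the authors' implementation; cluster_chunk is a placeholder for any weighted kernel clustering routine).

import numpy as np

def single_pass_clustering(chunks, C, cluster_chunk):
    # chunks: an iterable of (n_chunk, p) arrays; cluster_chunk(points, weights, C)
    # returns (centroids, centroid_weights) and stands for a weighted KFCM-type routine.
    summary_points, summary_weights = None, None
    for chunk in chunks:
        weights = np.ones(len(chunk))
        if summary_points is not None:
            # carry the previous solution forward as weighted points
            chunk = np.vstack([summary_points, chunk])
            weights = np.concatenate([summary_weights, weights])
        summary_points, summary_weights = cluster_chunk(chunk, weights, C)
    return summary_points, summary_weights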

Fuzzy kernel clustering has proven to be an effective approach to partition data when the clusters are not well-defined. Several algorithms have been proposed in the past few years and were described in this paper. Some algorithms, such as FSK-FCM, are simple and can incorporate any kernel function. However, these methods impose an intuitive equation to update the centers in the feature space. Thus, there is no guarantee that the optimized objective function of these algorithms corresponds to the optimal partition.

Other kernel algorithms can solve for the optimal centers by restricting the kernel to be Gaussian. Some of these algorithms, such as MK-FCM, use one global scale (σ) for all clusters. These are relatively simpler algorithms that are not very sensitive to the initialization. However, they require the user to specify the scale, and they may not perform well when the data exhibit large variations between the distributions of the different clusters. Other algorithms, such as FLeCK, use more complex objective functions to learn cluster-dependent kernel resolution parameters. However, because the search space of these methods includes many more parameters, they are more prone to local minima. Clustering with multiple kernels is a good compromise between methods that use one fixed global scale and methods that learn one scale for each cluster. These algorithms, such as FCMK, use a set of kernels with different, but fixed, scales. Then, they learn relevance weights for each kernel within each cluster.

Clustering, in general, is a difficult combinatorial problem that is susceptible to local minima. This problem is more acute in kernel-based clustering methods, as they solve the partitioning problem in a much higher dimensional feature space. As a result, several semi-supervised kernel clustering methods have emerged. These algorithms incorporate prior knowledge in order to guide the optimization process. This prior knowledge is usually available in the form of a small set of constraints that specify which pairs of points should be assigned to the same cluster, and which ones should be assigned to different clusters.

Scalability to very large data is another desirable feature to have in a clustering algorithm. In fact, many applications involve very large data that cannot be loaded into the computer’s memory. In this case, algorithm scalability becomes a necessary condition. We have outlined three main approaches that have been used with kernel clustering methods. These include sampling-based methods, incremental algorithms, and data approximation algorithms.

  1. Pedrycz, W, Amato, A, Di Lecce, V, and Piuri, V (2008). Fuzzy clustering with partial supervision in organization and classification of digital images. IEEE Transactions on Fuzzy Systems. 16, 1008-1026. http://dx.doi.org/10.1109/TFUZZ.2008.917287
  2. Loia, V, Pedrycz, W, and Senatore, S (2007). Semantic web content analysis: a study in proximity-based collaborative clustering. IEEE Transactions on Fuzzy Systems. 15, 1294-1312. http://dx.doi.org/10.1109/TFUZZ.2006.889970
  3. Frigui, H, and Hwang, C (2008). Fuzzy clustering and aggregation of relational data with instance-level constraints. IEEE Transactions on Fuzzy Systems. 16, 1565-1581. http://dx.doi.org/10.1109/TFUZZ.2008.2005692
  4. Horng, YJ, Chen, SM, Chang, YC, and Lee, CH (2005). A new method for fuzzy information retrieval based on fuzzy hierarchical clustering and fuzzy inference techniques. IEEE Transactions on Fuzzy Systems. 13, 216-228. http://dx.doi.org/10.1109/TFUZZ.2004.840134
  5. Chatzis, SP, and Varvarigou, TA (2008). A fuzzy clustering approach toward Hidden Markov random field models for enhanced spatially constrained image segmentation. IEEE Transactions on Fuzzy Systems. 16, 1351-1361. http://dx.doi.org/10.1109/TFUZZ.2008.2005008
  6. Liu, PX, and Meng, MQH (2004). Online data-driven fuzzy clustering with applications to real-time robotic tracking. IEEE Transactions on Fuzzy Systems. 12, 516-523. http://dx.doi.org/10.1109/TFUZZ.2004.832521
  7. Bezdek, JC (1981). Pattern Recognition With Fuzzy Objective Function Algorithms. New York, NY: Plenum Press
  8. Girolami, M (2002). Mercer kernel-based clustering in feature space. IEEE Transactions on Neural Networks. 13. http://dx.doi.org/10.1109/TNN.2002.1000150
  9. Inokuchi, R, and Miyamoto, S. LVQ clustering and SOM using a kernel function. Proceedings of the IEEE International Conference on Fuzzy Systems, July 25–29, 2004, Budapest, Hungary, pp.1497-1500. http://dx.doi.org/10.1109/FUZZY.2004.1375395
  10. Qin, AK, and Suganthan, PN. Kernel neural gas algorithms with application to cluster analysis. Proceedings of the 17th International Conference on Pattern Recognition, August 23–26, 2004, Cambridge, United Kingdom, pp.617-620. http://dx.doi.org/10.1109/ICPR.2004.1333848
  11. Luxburg, U (2007). A tutorial on spectral clustering. Statistics and Computing. 17, 395-416. http://dx.doi.org/10.1007/s11222-007-9033-z
  12. Peng, X, Chai, Y, Xu, L, and Man, X. Research on fault diagnosis of marine diesel engine based on grey relational analysis and kernel fuzzy C-means clustering. The Fifth International Conference on Intelligent Computation Technology and Automation, January 12–14, 2012, Zhangjiajie, China, pp.283-286. http://dx.doi.org/10.1109/ICICTA.2012.78
  13. Liu, H, Wu, C, Wan, H, and Wang, H. Parameter identification based on stabilization diagram with kernel fuzzy clustering method. International Conference on Transportation, Mechanical, and Electrical Engineering, December 16–18, 2011, Changchun, China, pp.1185-1188. http://dx.doi.org/10.1109/TMEE.2011.6199417
  14. Fan, J, and Wu, Y. Watermarking algorithm based on kernel fuzzy clustering and singular value decomposition in the complex wavelet transform domain. International Conference on Information Technology, Computer Engineering and Management Sciences, September 24–25, 2011, Nanjing, China, pp.42-46. http://dx.doi.org/10.1109/ICM.2011.121
  15. Qi, LY, and Wang, KG. Kernel fuzzy clustering based classification of Ancient-Ceramic fragments. The 2nd IEEE International Conference on Information Management and Engineering, April 16–18, 2010, Chengdu, China, pp.348-350. http://dx.doi.org/10.1109/ICIME.2010.5477818
  16. Tsai, DM, and Lin, CC (2011). Fuzzy C-means based clustering for linearly and nonlinearly separable data. Pattern Recognition. 44, 1750-1760. http://dx.doi.org/10.1016/j.patcog.2011.02.009
  17. Mei, C, Xu, H, and Liu, J. A novel NN-based soft sensor based on modified fuzzy kernel clustering for fermentation process. International Conference on Intelligent Human-Machine Systems and Cybernetics, August 26–27, 2009, Hangzhou, China, pp.113-117. http://dx.doi.org/10.1109/IHMSC.2009.153
  18. Liu, DH, Bian, JP, and Sun, XY. The study of fault diagnosis model of DGA for oil-immersed transformer based on fuzzy means Kernel clustering and SVM multi-class object simplified structure. Proceedings of the 7th International Conference on Machine Learning and Cybernetics, July 12–15, 2008, Kunming, China, pp.1505-1509. http://dx.doi.org/10.1109/ICMLC.2008.4620644
  19. Qu, B, and Wang, H. Dynamic fuzzy kernel clustering analysis of enterprises independent innovation capability based on artificial immunity. International Workshop on Modelling, Simulation and Optimization, December 27–28, 2009, Hong Kong, pp.216-220. http://dx.doi.org/10.1109/WMSO.2008.102
  20. Li, X, and Bian, S. A kernel fuzzy clustering algorithm with spatial constraint based on improved expectation maximization for image segmentation. International Conference on Measuring Technology and Mechatronics Automation, April 11–12, 2009, Zhangjiajie, China, pp.529-533. http://dx.doi.org/10.1109/ICMTMA.2009.59
  21. Liao, L, and Lin, TS. A fast spatial constrained fuzzy kernel clustering algorithm for MRI brain image segmentation. International Conference on Wavelet Analysis and Pattern Recognition, November 2–4, 2008, Beijing, China, pp.82-87. http://dx.doi.org/10.1109/ICWAPR.2007.4420641
  22. Campello, RJGB (2007). A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Letters. 28, 833-841. http://dx.doi.org/10.1016/j.patrec.2006.11.010
  23. Schölkopf, B, Smola, A, and Müller, KR (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation. 10, 1299-1319. http://dx.doi.org/10.1162/089976698300017467
  24. Aronszajn, N (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society. 68, 337-404.
  25. Zhang, D, and Chen, S. Fuzzy clustering using kernel method. International Conference on Control and Automation, June 19, 2002, Xiamen, China, pp.162-163. http://dx.doi.org/10.1109/ICCA.2002.1229535
  26. Zhang, DQ, and Chen, SC . Kernel-based fuzzy and possibilistic c-means clustering., Proceedings of the International Conference on Artificial Neural Network, June 26–29, 2003, Istanbul, Turkey, pp.122-125.
  27. Hathaway, RJ, Huband, JM, and Bezdek, JC. A kernelized non-Euclidean relational fuzzy c-means algorithm. IEEE International Conference on Fuzzy Systems, May 22–25, 2005, Reno, NV, pp.414-419. http://dx.doi.org/10.1109/FUZZY.2005.1452429
  28. Hathaway, RJ, and Bezdek, JC (1994). Nerf c-means: non-Euclidean relational fuzzy clustering. Pattern Recognition. 27, 429-437. http://dx.doi.org/10.1016/0031-3203(94)90119-8
  29. Bchir, O, and Frigui, H. Fuzzy relational kernel clustering with local scaling parameter learning. Proceedings of the 20th IEEE International Workshop on Machine Learning for Signal Processing, August 29-September 1, 2010, Kittila, Finland, pp.289-294. http://dx.doi.org/10.1109/MLSP.2010.5589234
  30. Bchir, O, and Frigui, H. Fuzzy clustering with learnable cluster dependent kernels. IEEE International Conference on Fuzzy Systems, June 27–30, 2011, Taipei, Taiwan, pp.2521-2527. http://dx.doi.org/10.1109/FUZZY.2011.6007411
  31. Lanckriet, GRG, Cristianini, N, Bartlett, P, El Ghaoui, L, and Jordan, MI (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research. 5, 27-72.
  32. Bach, FR, Lanckriet, GRG, and Jordan, MI . Multiple kernel learning, conic duality, and the SMO algorithm., Proceedings of the 21st International Conference on Machine Learning, July 4–8, 2004, Banff, Canada, pp.41-48.
  33. Rakotomamonjy, A, Bach, F, Canu, S, and Grandvalet, Y. More efficiency in multiple kernel learning. Proceedings of the 24th International Conference on Machine Learning, June 20–24, 2007, Corvallis, OR, pp.775-782. http://dx.doi.org/10.1145/1273496.1273594
  34. Ye, J, Ji, S, and Chen, J. Learning the kernel matrix in discriminant analysis via quadratically constrained quadratic programming. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 12–15, 2007, San Jose, CA, pp.854-863. http://dx.doi.org/10.1145/1281192.1281283
  35. Gönen, M, and Alpaydin, E . Localized multiple kernel learning., Proceedings of the 25th International Conference on Machine Learning, July 5–9, 2008, Helsinki, Finland, pp.352-359.
  36. Zhao, B, Kwok, JT, and Zhang, C. Multiple kernel clustering. Proceedings of the 9th SIAM International Conference on Data Mining, April 30-May 2, 2009, Sparks, NV, pp.638-649.
  37. Huang, HC, Chuang, YY, and Chen, CS (2012). Multiple kernel fuzzy clustering. IEEE Transactions on Fuzzy Systems. 20, 120-134. http://dx.doi.org/10.1109/TFUZZ.2011.2170175
  38. Baili, N, and Frigui, H. Fuzzy clustering with multiple kernels in feature space. IEEE International Conference on Fuzzy Systems, June 10–15, 2012, Brisbane, Australia. http://dx.doi.org/10.1109/FUZZ-IEEE.2012.6251146
  39. Baili, N, and Frigui, H. Relational fuzzy clustering with multiple kernels. Proceedings of the 11th IEEE International Conference on Data Mining, December 11, 2011, Vancouver, Canada, pp.488-495. http://dx.doi.org/10.1109/ICDMW.2011.145
  40. Hathaway, RJ, Davenport, JW, and Bezdek, JC (1989). Relational duals of the c-means clustering algorithms. Pattern Recognition. 22, 205-212.
  41. Xing, EP, Ng, AY, Jordan, MI, and Russell, S . Distance metric learning, with application to clustering with side-information., The 17th Annual Conference on Neural Information Processing Systems, December 8–13, 2003, Vancouver & Whistler, Canada.
  42. Weinberger, K, Blitzer, J, and Saul, L . Distance metric learning for large margin nearest neighbor classification., The 19th Annual Conference on Neural Information Processing Systems, December 5–10, 2005, Vancouver & Whistler, Canada.
  43. Globerson, A, and Roweis, S . Metric learning by collapsing classes., The 19th Annual Conference on Neural Information Processing Systems, December 5–10, 2005, Vancouver & Whistler, Canada.
  44. Shalev-Shwartz, S, Singer, Y, and Ng, AY . Online and batch learning of pseudo-metrics., Proceedings of the 21st International Conference on Machine Learning, July 4–8, 2004, Banff, Canada, pp.94.
  45. Davis, JV, Kulis, B, Jain, P, Sra, S, and Dhillon, IS . Information-theoretic metric learning., Proceedings of the 24th International Conference on Machine Learning, June 20–24, 2007, Corvallis, OR, pp.209-216.
  46. Chatpatanasiri, R, Korsrilabutr, T, Tangchanachaianan, P, and Kijsirikul, B. On kernelization of supervised Mahalanobis distance learners. arXiv:0804.1441. http://arxiv.org/abs/0804.1441
  47. Zhang, H, and Lu, J (2009). Semi-supervised fuzzy clustering: a kernel-based approach. Knowledge-Based Systems. 22, 477-481. http://dx.doi.org/10.1016/j.knosys.2009.06.009
  48. Bchir, O, Frigui, H, and Ben Ismail, MM. Semisupervised clustering and local scale learning algorithm. World Congress on Computer and Information Technology, June 22–24, 2013, Sousse, Tunisia, article number 6618774. http://dx.doi.org/10.1109/WCCIT.2013.6618774
  49. Bchir, O, Frigui, H, and Ismail, MMB (2013). Semi-supervised fuzzy clustering with learnable cluster dependent kernels. International Journal on Artificial Intelligence Tools. 22. article number 1350013
  50. Baili, N, and Frigui, H. Semi-supervised clustering with cluster-dependent multiple kernels. The 4th International Conference on Information, Intelligence, Systems and Applications, July 10–12, 2013, Piraeus, Greece.
  51. Chiang, JH, and Hao, PY (2003). A new kernel-based fuzzy clustering approach: support vector clustering with cell growing. IEEE Transactions on Fuzzy Systems. 11, 518-527. http://dx.doi.org/10.1109/TFUZZ.2003.814839
  52. Schölkopf, B, Platt, JC, Shawe-Taylor, J, Smola, AJ, and Williamson, RC (2001). Estimating the support of a highdimensional distribution. Neural Computation. 13, 1443-1471. http://dx.doi.org/10.1162/089976601750264965
  53. Rhee, FCH, Choi, KS, and Choi, BI (2009). Kernel approach to possibilistic C-means clustering. International Journal of Intelligent Systems. 24, 272-292. http://dx.doi.org/10.1002/int.20336
  54. Krishnapuram, R, and Keller, JM (1993). A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems. 1, 98-110. http://dx.doi.org/10.1109/91.227387
  55. Shen, H, Yang, J, Wang, S, and Liu, X (2006). Attribute weighted mercer kernel based fuzzy clustering algorithm for general non-spherical datasets. Soft Computing. 10, 1061-1073. http://dx.doi.org/10.1007/s00500-005-0043-5
  56. Frigui, H, and Nasraoui, O (2004). Unsupervised learning of prototypes and attribute weights. Pattern Recognition. 37, 567-581. http://dx.doi.org/10.1016/j.patcog.2003.08.002
  57. Sato-Ilic, M, Ito, S, and Takahashi, S. Generalized kernel fuzzy clustering model. IEEE International Conference on Fuzzy Systems, August 20–24, 2009, Jeju, Korea, pp.421-426. http://dx.doi.org/10.1109/FUZZY.2009.5276876
  58. Sato, M, and Sato, Y (1995). On a general fuzzy additive clustering model. Intelligent Automation & Soft Computing. 1, 439-448. http://dx.doi.org/10.1080/10798587.1995.10750648
  59. Ferreira, MRP, and de Carvalho, FDAT. Kernel fuzzy clustering methods based on local adaptive distances. IEEE International Conference on Fuzzy Systems, June 10–15, 2012, Brisbane, Australia, pp.1-8. http://dx.doi.org/10.1109/FUZZ-IEEE.2012.6251352
  60. Nguyen, DD, Ngo, LT, and Pham, LT. GMKIT2-FCM: a genetic-based improved multiple kernel interval type-2 fuzzy C-means clustering. IEEE International Conference on Cybernetics, June 13–15, 2013, Lausanne, Switzerland, pp.104-109. http://dx.doi.org/10.1109/CYBConf.2013.6617457
  61. Abhishek, JA, and Rhee, FCH. Interval type-2 fuzzy C-means using multiple kernels. IEEE International Conference on Fuzzy Systems, July 7–10, 2013, Hyderabad, India, pp.1-8. http://dx.doi.org/10.1109/FUZZ-IEEE.2013.6622306
  62. Raza, MA, and Rhee, FCH. Interval type-2 approach to kernel possibilistic C-means clustering. IEEE International Conference on Fuzzy Systems, June 10–15, 2012, Brisbane, Australia, pp.1-7. http://dx.doi.org/10.1109/FUZZ-IEEE.2012.6251233
  63. Lin, K (2013). A novel evolutionary kernel intuitionistic fuzzy C-means clustering algorithm. IEEE Transactions on Fuzzy Systems. PP. article number TFS-2013-0121.R2
  64. Atanassov, KT (1986). Intuitionistic fuzzy sets. Fuzzy Sets and Systems. 20, 87-96. http://dx.doi.org/10.1016/S0165-0114(86)80034-3
  65. Hathaway, RJ, and Bezdek, JC (2006). Extending fuzzy and probabilistic clustering to very large data sets. Computational Statistics & Data Analysis. 51, 215-234. http://dx.doi.org/10.1016/j.csda.2006.02.008
  66. Havens, TC, Bezdek, JC, Leckie, C, Hall, LO, and Palaniswami, M (2012). Fuzzy c-Means algorithms for very large data. IEEE Transactions on Fuzzy Systems. 20, 1130-1146. http://dx.doi.org/10.1109/TFUZZ.2012.2201485
  67. Havens, TC, Bezdek, JC, and Palaniswami, M (2012). Incremental kernel fuzzy c-means. Computational Intelligence, Studies in Computational Intelligence vol 399, Madani, K, Correia, AD, Rosa, A, and Filipe, J, ed. Heidelberg: Springer Berlin, pp. 3-18 http://dx.doi.org/10.1007/978-3-642-27534-01
  68. Baili, N, and Frigui, H (2014). Incremental fuzzy clustering with multiple kernels. ATSP under review.

Hichem Frigui received the Ph.D. degree in computer engineering and computer science from the University of Missouri, Columbia, in 1997. From 1998 to 2004, he was an Assistant Professor with the University of Memphis, Memphis, TN. He is currently a Professor and the Director of the Multimedia Research Lab, University of Louisville, Louisville, KY. He has been active in the research fields of fuzzy pattern recognition, data mining, and image processing with applications to content-based multimedia retrieval and land mine detection. He has participated in the development, testing, and real-time implementation of several land mine detection systems. He has published over 160 journal and refereed conference articles. Dr. Frigui has received the National Science Foundation Career Award for outstanding young scientists.

Ouiem Bchir is an Assistant Professor in the Computer Science Department, College of Computer and Information Systems (CCIS), King Saud University, Riyadh, Saudi Arabia. She received her Ph.D. degree from the University of Louisville, KY, USA. Her research interests include kernel clustering and learning, pattern recognition, and hyperspectral image analysis.

Naouel Baili received the Ph.D. degree from the Department of Computer Engineering and Computer Science, University of Louisville, USA, in 2013. She is currently a data scientist with Quintiles, USA. Her research interests include pattern recognition, computer vision, multimedia analysis and machine learning.

Article

Original Article

International Journal of Fuzzy Logic and Intelligent Systems 2013; 13(4): 254-268

Published online December 30, 2013 https://doi.org/10.5391/IJFIS.2013.13.4.254

Copyright © The Korean Institute of Intelligent Systems.

An Overview of Unsupervised and Semi-Supervised Fuzzy Kernel Clustering

Hichem Frigui1, Ouiem Bchir2, and Naouel Baili3

1Multimedia Research Lab, University of Louisville, Louisville, KY, USA
2Computer Science Department, College of Computer and Information Systems (CCIS), King Saud University, Riyadh, Saudi Arabia
3Quintiles, USA

Correspondence to:Hichem Frigui (h.frigui@louisville.edu)

Received: December 12, 2013; Revised: December 23, 2013; Accepted: December 24, 2013

Abstract

For real-world clustering tasks, the input data is typically not easily separable due to the highly complex data structure or when clusters vary in size, density and shape. Kernel-based clustering has proven to be an effective approach to partition such data. In this paper, we provide an overview of several fuzzy kernel clustering algorithms. We focus on methods that optimize an fuzzy C-mean-type objective function. We highlight the advantages and disadvantages of each method. In addition to the completely unsupervised algorithms, we also provide an overview of some semi-supervised fuzzy kernel clustering algorithms. These algorithms use partial supervision information to guide the optimization process and avoid local minima. We also provide an overview of the different approaches that have been used to extend kernel clustering to handle very large data sets.

Keywords: Fuzzy clustering,Kernel-based clustering,Relational Kernel clustering,Multiple Kernel clustering,Semi-supervised clustering

1. Introduction

Clustering is an essential and frequently performed task in pattern recognition and data mining. It aids in a variety of tasks related to understanding and exploring the structure of large and high dimensional data. The goal of cluster analysis is to find natural groupings in a set of objects such that objects in the same cluster are as similar as possible and objects in different clusters are as dissimilar as possible.

In most applications, categories are rarely well separated and boundaries are overlapping. Describing these real world situations by crisp sets does not allow the user to quantitatively distinguish between objects which are strongly associated with a particular category from those that have only a marginal association with multiple ones, particularly, along the overlapping boundaries. Fuzzy clustering methods are good at dealing with these situations. In fact, data elements can belong to more than one cluster with fuzzy membership degrees.

Fuzzy clustering is widely used in the machine learning field. Areas of application of fuzzy cluster analysis include data analysis [1, 2], information retrieval [3, 4], image segmentation [5], and robotics [6]. One of the most widely used fuzzy clustering algorithm is the fuzzy C-means (FCM) algorithm [7].

Typically, the data to be clustered could have an object based or a relational based representation. In object data representation, each object is represented by a feature vector, while for the relational representation only information about how two objects are related is available. Relational data representation is more general in the sense that it can be applied when only the degree of dissimilarity between objects is available, or when groups of similar objects cannot be represented efficiently by a single prototype.

Despite the large number of existing clustering methods, clustering remains a challenging task when the structure of the data does not correspond to easily separable categories, and when clusters vary in size, density, and shape. Kernel based approaches [814] can adapt a specific distance measure in order to make the problem easier. They have attracted great attention and have been applied in many fields such as fault diagnosis of marine diesel engine [12], bridge parameters estimation [13], watermarking [14], automatic classiffication fragments of ceramic cultural relics [15], image segmentation [16], model construction for an erythromycin fermentation process [17], oil-immersed transformer fault diagnosis [18], analyzing enterprises independent innovation capability in order to promote different strategies [19], segmentingmagnetic resonance imaging brain images [20, 21], and classification of audio signals [22].

Kernel clustering approaches rely on their ability to produce nonlinear separating hypersurfaces between clusters by performing a mapping φ from the input space X to a high-dimensional feature space F. One of the most relevant aspects of kernel methods is that it is possible to compute Euclidean distances in F without knowing the mapping φ explicitly. This can be done using the distance kernel trick [23]:

$$\left\|\Phi(x_i)-\Phi(x_j)\right\|^2 = K(x_i,x_i) + K(x_j,x_j) - 2K(x_i,x_j).$$

In Eq. (1), $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$ is the Mercer kernel [24]. Gaussian kernels,

$$K^{(g)}(x_i,x_j) = \exp\left(-\frac{\left\|x_i-x_j\right\|^2}{2\sigma^2}\right), \qquad \sigma>0,$$

are the most commonly used ones.
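To make the kernel trick concrete, the following minimal sketch (in Python/NumPy, with arbitrary illustrative vectors and our own function names, not taken from any of the cited works) evaluates this feature-space distance with a Gaussian kernel, without ever constructing Φ explicitly:

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    # K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def feature_space_distance2(xi, xj, kernel):
    # ||Phi(xi) - Phi(xj)||^2 = K(xi, xi) + K(xj, xj) - 2 K(xi, xj)   (Eq. (1))
    return kernel(xi, xi) + kernel(xj, xj) - 2.0 * kernel(xi, xj)

xi = np.array([0.0, 1.0])
xj = np.array([2.0, 0.5])
print(feature_space_distance2(xi, xj, gaussian_kernel))
```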

In this paper, we survey existing fuzzy kernel clustering algorithms. We provide an overview of unsupervised algorithms as well as semi-supervised algorithms that integrate partial supervision into the objective function to guide the optimization process. For most algorithms, we describe the objective function being optimized and the necessary conditions to optimize it. We also highlight the advantages and disadvantages of the different methods.

2. Unsupervised Fuzzy Kernel-Based Clustering

Let {x1, · · ·, xN} be a set of N data points to be partitioned into C clusters, and let R = [rjk] be a relational matrix, where rjk represents the degree to which objects xj and xk are related. The matrix R could be given or it could be constructed from the features of the objects. Each object xj belongs to every cluster i with a fuzzy membership uij that satisfies [7]:

$$0 \le u_{ij} \le 1,$$

and

$$\sum_{i=1}^{C} u_{ij} = 1, \qquad \text{for } j \in \{1,\dots,N\}.$$

The exponent m ∈ (1, ∞) is a constant that determines the level of cluster fuzziness.

2.1 The Feature Space Kernel (FSK) FCM Algorithm

The FSK-FCM algorithm [25] derives a kernel version of the FCM in the feature space by minimizing

$$J^{\phi} = \sum_{i=1}^{C}\sum_{j=1}^{N}\left(u_{ij}\right)^m \left\|\phi(x_j) - a_i^{\phi}\right\|^2,$$

subject to Eq. (3). In Eq. (4), aiφ is the center of cluster i, in the feature space, defined as:

$$a_i^{\phi} = \frac{\sum_{j=1}^{N}\left(u_{ij}\right)^m \phi(x_j)}{\sum_{j=1}^{N}\left(u_{ij}\right)^m}.$$

Minimization of Eq. (4) with respect to uij yields [25]

$$u_{ih}^{-1} = \sum_{j=1}^{C}\left[\frac{k_{hh} - 2b_i\sum_{r=1}^{N}\left(u_{ir}\right)^m k_{hr} + b_i^2\sum_{r=1}^{N}\sum_{s=1}^{N}\left(u_{ir}u_{is}\right)^m k_{rs}}{k_{hh} - 2b_j\sum_{r=1}^{N}\left(u_{jr}\right)^m k_{hr} + b_j^2\sum_{r=1}^{N}\sum_{s=1}^{N}\left(u_{jr}u_{js}\right)^m k_{rs}}\right]^{1/(m-1)},$$

where

$$b_i = \frac{1}{\sum_{r=1}^{N}\left(u_{ir}\right)^m}.$$

The FSK-FCM algorithm is one of the early kernel versions of the FCM. It is simple and has the flexibility of incorporating different kernels. It achieves this by simply fixing the update equation for the centers in the feature space. Thus, since this equation is not derived to optimize the objective function, there is no guarantee that the obtained centers are optimal. Moreover, in [25] the authors use Gaussian kernels with one global scale (σ) for the entire data. The selection of this parameter is not discussed in [25].

2.2 The Metric Kernel (MK) FCM Algorithm

The MK-FCM [26] is an extension of the FSK-FCM that allows the cluster centers to be optimized. It minimizes

$$J^{\phi} = \sum_{i=1}^{C}\sum_{j=1}^{N}\left(u_{ij}\right)^m \left\|\phi(x_j) - \phi(a_i)\right\|^2,$$

subject to the constraint in Eq. (3). In Eq. (8), φ(ai) is the center of cluster i in the feature space F. Minimization of Eq. (8) has been proposed only for the case of a Gaussian kernel using the fact that

$$\frac{\partial K(x_j,a_i)}{\partial a_i} = \frac{(x_j-a_i)}{\sigma^2}\, K(x_j,a_i).$$

In this case, it can be shown [26] that the update equations for the memberships and centers are

$$u_{ih} = \left[\sum_{j=1}^{C}\left(\frac{1-K(x_h,a_i)}{1-K(x_h,a_j)}\right)^{1/(m-1)}\right]^{-1},$$

and

$$a_i = \frac{\sum_{h=1}^{N}\left(u_{ih}\right)^m K(x_h,a_i)\, x_h}{\sum_{h=1}^{N}\left(u_{ih}\right)^m K(x_h,a_i)}.$$

Unlike the FSK-FCM, the MK-FCM learns the optimal cluster centers in the feature space. However, the update equations have been derived only for the case of Gaussian kernels with one fixed global scale.
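As an illustration of these updates, the following is a hedged sketch (our own array layout, names, and initialization, not the authors' implementation) of MK-FCM-style alternating updates for the Gaussian-kernel case described above:

```python
import numpy as np

def mkfcm_gaussian(X, C, m=2.0, sigma=1.0, n_iter=50, seed=0):
    """Sketch of MK-FCM-style updates with a Gaussian kernel.
    X: N x p data matrix, C: number of clusters."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    A = X[rng.choice(N, C, replace=False)]                     # initial centers (input space)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - A[None, :, :]) ** 2).sum(-1)   # squared distances, N x C
        K = np.exp(-d2 / (2.0 * sigma ** 2))                   # K(x_j, a_i)
        # membership update: u_ij proportional to (1 - K(x_j, a_i))^(-1/(m-1))
        inv = np.maximum(1.0 - K, 1e-12) ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)               # N x C, rows sum to one
        # center update: kernel-weighted mean of the data points
        W = (U ** m) * K                                       # N x C weights
        A = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, A
```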

2.3 The Kernelized Non-Euclidean Relational FCM (kNERF) Algorithm

The FSK-FCM and MK-FCM are object-based algorithms and require an explicit feature representation of the data to be clustered. The kNERF algorithm [27], on the other hand, is a kernelized version of the non-Euclidean relational FCM algorithm [28], and works on relational data. kNERF does not formulate a new objective function. It simply uses a Gaussian kernel to transform the relational dissimilarity matrix R = [rjk] into a kernelized relational matrix using

$$\hat{R} = \left[\hat{r}_{jk}\right], \qquad \hat{r}_{jk} = 1 - \exp\left(-\frac{r_{jk}}{\sigma^2}\right).$$

Then, it uses the non-Euclidean relational fuzzy (NERF) C-means algorithm [28] to cluster $\hat{R}$.

kNERF is not a true kernel clustering algorithm. It simply uses kernels as a preprocessing step to create the similarity matrix. Since this step is not part of the optimization process, any kernel function can be used. However, this also prevents the kernel parameters from being optimized for the given data. Also, kNERF constructs a relational matrix with one global Gaussian parameter for the entire data. The selection of this parameter is discussed in [27] but there has been no attempt to devise methods to automatically select it.
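The preprocessing step itself is straightforward; a hedged sketch (our own code, using the transformation as reconstructed above, and leaving the NERF C-means step out) is:

```python
import numpy as np

def kernelize_relational(R, sigma=1.0):
    """Map a relational dissimilarity matrix R through a Gaussian kernel,
    producing the matrix that is then fed to NERF C-means (not shown here)."""
    R_hat = 1.0 - np.exp(-np.asarray(R, dtype=float) / sigma ** 2)
    np.fill_diagonal(R_hat, 0.0)   # an object is at distance zero from itself
    return R_hat
```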

2.4 The Clustering and Local Scale Learning (LSL) Algorithm

Although good results were obtained using the Gaussian kernel function, its performance depends on the selection of the scale parameter. Moreover, since one global σ is used for the entire data set, it may not be possible to find one optimal parameter when there are large variations between the distributions of the different clusters in the feature space. One way to learn optimal Gaussian parameters is through an exhaustive search of one parameter for each cluster. However, this approach is not practical since it is computationally expensive, especially when the data include a large number of clusters and when the dynamic range of possible values for these parameters is large. Moreover, it is not trivial to evaluate the resulting partition in order to select the optimal parameters. To overcome this limitation, the LSL algorithm [29] has been proposed. It minimizes a single objective function to obtain both the optimal partition and cluster-dependent Gaussian parameters that reflect the intra-cluster characteristics of the data. The LSL algorithm minimizes

$$J = \sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m\left(1-\exp\left(-\frac{r_{jk}}{\sigma_i}\right)\right) - \sum_{i=1}^{C}\frac{K}{\sigma_i^2},$$

subject to the membership constraint in Eq. (3). The first term in Eq. (13) seeks compact clusters using a local relational distance, $D_{jk}^{i}$, defined with respect to each cluster i as
$$D_{jk}^{i} = 1 - \exp\left(-\frac{r_{jk}}{\sigma_i}\right),$$

where the scaling σi controls the rate of decay of Djki as a function of the distance between xj and xk with respect to cluster i. The cluster dependent σi allows LSL to deal with the large variations, in the feature space, between the distributions and the geometric characteristics of the different clusters. The second term in Eq. (13) is a regularization term to avoid the trivial solution where all the scaling parameters σi are infinitely large.

Optimization of J with respect to uij and σi yields the following update equations [29]:

$$u_{ij} = \frac{1}{\sum_{t=1}^{C}\left(d_{ij}^{2}/d_{tj}^{2}\right)^{\frac{1}{m-1}}},$$

and

$$\sigma_i = \left(\frac{2\,K\,\pi^{\frac{2-p}{2}}\,N}{|\mathcal{N}|\sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m r_{jk}}\right)^{\frac{2}{2+p}},$$

where p is the dimensionality of the data and $|\mathcal{N}|$ is the cardinality of the neighborhood of j.

In Eq. (16), σi is inversely proportional to the intra-cluster distances with respect to cluster i. Thus, when the intra-cluster dissimilarity is small, σi is large allowing the pairwise distances over the same cluster to be smaller and thus, obtain a more compact cluster. On the other hand, when the intra-cluster dissimilarity is high, σi is small to prevent points which are not highly similar from being mapped to the same location. According to Eq. (16), σi can also be seen as the average time to move between points in cluster i.

LSL has the advantage of learning cluster-dependent resolution parameters and can be used to identify clusters of various densities. However, this also makes the optimization process more complex and prone to local minima. Moreover, the partition generated by LSL depends on the choice of the constant K in Eq. (13).

2.5 The Fuzzy Clustering With Learnable Cluster-Dependent Kernels (FLeCK) Algorithm

FLeCK [30] is an extension of LSL that does not assume that K is fixed. It learns the scaling parameters and K by optimizing both the intra-cluster and the inter-cluster dissimilarities. Consequently, the learned scale parameters reflect the relative density, size, and position of each cluster with respect to the other clusters. In particular, FLeCK minimizes the intra-cluster distances

$$J_{intra} = \sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m\left(1-\exp\left(-\frac{r_{jk}}{\sigma_i}\right)\right) + K\sum_{i=1}^{C}\sigma_i,$$

and maximizes the inter-cluster distances

$$J_{inter} = \sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{N}\left[u_{ij}^m\left(1-u_{ik}^m\right) + u_{ik}^m\left(1-u_{ij}^m\right)\right]\left(1-\exp\left(-\frac{r_{jk}}{\sigma_i}\right)\right) + K\sum_{i=1}^{C}\sigma_i.$$

The constant K provides a way of specifying the relative importance of the regularization term (second term in Eqs. (17) and (18)) compared to the sum of intra-cluster distances (first term in Eq. (17)) and inter-cluster distances (first term in Eq. (18)). The parameter σi controls the rate of decay of Djki. This approach learns a cluster-dependent σi in order to deal with the large variations between the distributions and the geometric characteristics of the different clusters.

Using the Lagrange multiplier technique and assuming that the values of σi do not change significantly from one iteration (t – 1) to the next one (t), it can be shown [30] that

$$\sigma_i^{(t)} = \frac{Q_1^i}{Q_2^i},$$

where

$$Q_1^i = \sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m\, r_{jk}^2 \exp\left(-\frac{r_{jk}}{\sigma_i^{(t-1)}}\right),$$

and

$$Q_2^i = \sum_{j=1}^{N}\sum_{k=1}^{N}\left[u_{ij}^m\left(1-u_{ik}^m\right) + u_{ik}^m\left(1-u_{ij}^m\right)\right] r_{jk}\exp\left(-\frac{r_{jk}}{\sigma_i^{(t-1)}}\right) + \sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m\, r_{jk}\exp\left(-\frac{r_{jk}}{\sigma_i^{(t-1)}}\right).$$

The simultaneous optimization of Eqs. (17) and (18) with respect to uij yields [30]:

$$u_{ij} = \frac{1}{\sum_{t=1}^{C}\left(d_{ij}^{2}/d_{tj}^{2}\right)^{\frac{1}{m-1}}},$$

where

$$d_{ik}^{2} = \frac{\sum_{j=1}^{N} D_{jk}^{i}\, u_{ij}^m}{\sum_{v=1}^{N} u_{iv}^m} - \frac{1}{2}\,\frac{\sum_{j=1}^{N}\sum_{q=1}^{N} u_{ij}^m D_{jq}^{i}\, u_{iq}^m}{\left(\sum_{v=1}^{N} u_{iv}^m\right)^{2}},$$

and Djki is as defined in Eq. (14).
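To make the relational updates concrete, the following hedged sketch (our own vectorized code, not taken from [29] or [30]) computes the cluster-dependent local distances, the implicit point-to-cluster distances, and the resulting memberships for given scales σi:

```python
import numpy as np

def relational_membership_update(R, U, sigmas, m=2.0):
    """One LSL/FLeCK-style membership update.
    R: N x N dissimilarity matrix, U: C x N memberships, sigmas: C scales."""
    C, N = U.shape
    Um = U ** m
    d2 = np.zeros((C, N))
    for i in range(C):
        Di = 1.0 - np.exp(-R / sigmas[i])                 # local distances for cluster i
        s = Um[i].sum()
        point_term = Di @ Um[i] / s                       # first term, one value per object
        cluster_term = 0.5 * (Um[i] @ Di @ Um[i]) / s**2  # second term, scalar
        d2[i] = np.maximum(point_term - cluster_term, 1e-12)
    # u_ij = 1 / sum_t (d_ij^2 / d_tj^2)^(1/(m-1))
    ratios = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratios.sum(axis=1)
```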

2.6 The Fuzzy Clustering With Multiple Kernels (FCMK) Algorithm

LSL [29] and FLeCK [30] have the advantage of learning cluster-dependent scaling parameters. Thus, they can be used to identify clusters of different densities. However, these algorithms have two main limitations. First, learning σi for each cluster involves complex optimization and is prone to local minima. Second, data points assigned to one cluster are constrained to have one common density. An alternative approach that involves relatively simpler optimization and overcomes the above limitations is based on multiple kernel learning (MKL) [31–34]. Instead of using one kernel (fixed or learned) per cluster, MKL uses multiple kernels for each cluster.

Most existing MKL methods assume that kernel weights remain constant for all data (i.e., space-level), while algorithms like Localized MKL [35] seek kernel weights that are data dependent and locally smooth (i.e., sample-level). Although sample-level non-uniform methods give the largest flexibility, in practice relaxations are typically introduced to enhance tractability. Most of the previous MKL approaches have focused on supervised [32, 33] and semi-supervised learning [35]. Recently, an extension of multiple kernels to unsupervised learning, based on maximum margin clustering, was proposed in [36]. However, this method is only designed for crisp clustering. In [37], Huang et al. designed a multiple kernel fuzzy clustering algorithm which uses one set of global kernel weights for all clusters. Therefore, their approach is not appropriate for clusters of various densities. To overcome these drawbacks, Baili and Frigui proposed the FCMK [38]. FCMK is based on the observation that data within the same cluster tend to manifest similar properties. Thus, the intra-cluster weights can be approximately uniform while kernel weights are allowed to vary across clusters.

Let Φ = {φ1, . . ., φM} be a set of M embedding mappings that map the data to new feature spaces. Each mapping φk maps the p-dimensional data point x to a vector φk(x) in its feature space, whose dimensionality is Lk. Let {K1, . . ., KM} be the kernels corresponding to these implicit mappings, respectively,

$$K_k(x_j,x_h) = \phi_k(x_j)^T \phi_k(x_h).$$

Since these implicit mappings do not necessarily have the same dimensionality, the authors in [38] construct a new set of independent mappings, Ψ = {ψ1, ψ2, . . ., ψM}, from the original mappings Φ as

$$\psi_1(x_j) = \begin{bmatrix}\phi_1(x_j)\\ 0\\ \vdots\\ 0\end{bmatrix},\quad \psi_2(x_j) = \begin{bmatrix}0\\ \phi_2(x_j)\\ \vdots\\ 0\end{bmatrix},\quad \dots,\quad \psi_M(x_j) = \begin{bmatrix}0\\ \vdots\\ 0\\ \phi_M(x_j)\end{bmatrix}.$$

Each of these constructed mappings converts x into an L-dimensional vector, where $L = \sum_{k=1}^{M} L_k$. The linear combination of the bases in Ψ within each cluster i is defined as

$$\psi^{(i)}(x) = \sum_{k=1}^{M} w_{ik}\, \psi_k(x),$$

with

$$w_{ik} \ge 0\;\;\forall\, i, \qquad \text{and} \qquad \sum_{k=1}^{M} w_{ik} = 1.$$

A new cluster-dependent kernel, K(i), between object j and cluster i, is computed as a convex combination of M Gaussian kernels K1, K2, . . ., KM with fixed scaling parameters σ1, σ2, . . ., σM, respectively. That is,

$$K^{(i)}(x_j,a_i) = \sum_{k=1}^{M} w_{ik}^2\, K_k(x_j,a_i).$$

In Eq. (26), W = [wik], where wik ∈ [0, 1] is a resolution weight for kernel Kk with respect to cluster i. A low value of wik indicates that kernel Kk is not relevant for the density estimation of cluster i (due to its scaling parameter), and that this kernel should not have a significant impact on the creation of this cluster. Similarly, a high value of wik indicates that the bandwidth of kernel Kk is highly relevant for the density estimation of cluster i, and that this kernel should be the main factor in the creation of this cluster.

The FCMK algorithm minimizes

$$J = \sum_{i=1}^{C}\sum_{j=1}^{N} u_{ij}^m \left\|\psi^{(i)}(x_j) - \psi^{(i)}(a_i)\right\|^2,$$

subject to the constraints in Eqs. (3) and (25). It can be shown [38] that minimization of Eq. (27) with respect to uij yields

$$u_{ij} = \frac{1}{\sum_{q=1}^{C}\left(\frac{D_{ij}^{2}}{D_{qj}^{2}}\right)^{\frac{1}{m-1}}},$$

where Dij denotes the distance between data point xj and cluster center ai, i.e., $D_{ij}^{2} = \left\|\psi^{(i)}(x_j) - \psi^{(i)}(a_i)\right\|^2$. Similarly, minimization of Eq. (27) with respect to ai yields

$$a_i = \frac{\sum_{j=1}^{N} u_{ij}^m\, \psi^{(i)}(x_j)}{\sum_{j=1}^{N} u_{ij}^m} = \sum_{j=1}^{N} \hat{u}_{ij}\, \psi^{(i)}(x_j),$$

where $\hat{u}_{ij} = u_{ij}^m / \sum_{h=1}^{N} u_{ih}^m$ is the normalized membership.

Using Eq. (29), the distance between data xj and cluster center ai can be written as

$$D_{ij}^{2} = \sum_{k=1}^{M} \alpha_{ijk}\, w_{ik}^{2},$$

where the coefficients αijk are defined as

$$\alpha_{ijk} = K_k(x_j,x_j) - 2\sum_{h=1}^{N}\hat{u}_{ih}\, K_k(x_j,x_h) + \sum_{h=1}^{N}\sum_{l=1}^{N}\hat{u}_{ih}\,\hat{u}_{il}\, K_k(x_h,x_l).$$

Substituting Eq. (30) back into Eq. (27) and introducing a Lagrange multiplier, it can be shown that the optimal kernel weights need to be updated using

$$w_{ik} = \frac{1/\beta_{ik}}{\sum_{k'=1}^{M} 1/\beta_{ik'}},$$

where

$$\beta_{ik} = \sum_{j=1}^{N} u_{ij}^m\, \alpha_{ijk}.$$
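The two closed-form computations above translate directly into code; a hedged sketch (our own array layout and names, not the authors' implementation) is:

```python
import numpy as np

def fcmk_distances(W, alpha):
    """Cluster-dependent multiple-kernel distances: D_ij^2 = sum_k alpha_ijk * w_ik^2.
    W: C x M kernel weights, alpha: C x N x M coefficients."""
    return np.einsum('ijk,ik->ij', alpha, W ** 2)        # C x N

def fcmk_weight_update(U, alpha, m=2.0):
    """Kernel-weight update: w_ik = (1/beta_ik) / sum_k' (1/beta_ik')."""
    beta = np.einsum('ij,ijk->ik', U ** m, alpha)        # beta_ik = sum_j u_ij^m alpha_ijk
    inv = 1.0 / np.maximum(beta, 1e-12)
    return inv / inv.sum(axis=1, keepdims=True)          # C x M, rows sum to one
```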

The FCMK is based on the MK-FCM [26] and inherits the limitations of object-based clustering. First, multiple runs of the algorithm may be needed since it is susceptible to initialization problems. Second, FCMK is restricted to data for which there is a notion of center (centroid). Finally, FCMK is not suitable for all types of data. For instance, the density is estimated by counting the number of points within a specified radius, σk, of the cluster center. However, for data with multiple resolutions within the same cluster, the density estimate should take into account the variance between pairs of points and not only the distance of each point to the center.

2.7 The Relational Fuzzy Clustering With Multiple Kernels (RFCMK) Algorithm

To overcome the limitations of FCMK, the RFCMK was proposed in [39]. The RFCMK involves dissimilarity between pairs of objects instead of dissimilarity of the objects to a cluster center. It minimizes

$$J = \sum_{i=1}^{C}\frac{\sum_{j=1}^{N}\sum_{h=1}^{N} u_{ij}^m u_{ih}^m\, \hat{r}_{jh}^{(i)}}{\sum_{h=1}^{N} u_{ih}^m},$$

subject to the constraints in Eqs. (3) and (25). In Eq. (34), r^jh(i) is the transformed relational data between feature points xj and xh, with respect to cluster i, defined using the implicit mapping ψ(i):

$$\hat{r}_{jh}^{(i)} = \left\|\psi^{(i)}(x_j) - \psi^{(i)}(x_h)\right\|^2 = \sum_{k=1}^{M} w_{ik}^2 K_k(x_j,x_j) + \sum_{k=1}^{M} w_{ik}^2 K_k(x_h,x_h) - 2\sum_{k=1}^{M} w_{ik}^2 K_k(x_j,x_h).$$

The distance Dij2 from feature vector xj to the center of the ith cluster, ai, can be written in terms of the relational matrix R^(i)=[r^jh(i)] using [40]

$$D_{ij}^{2} = \left(\hat{R}^{(i)} v_i\right)_j - \frac{v_i^T \hat{R}^{(i)} v_i}{2},$$

where vi is the membership vector defined by

$$v_i = \frac{\left(u_{i1}^m,\dots,u_{iN}^m\right)^T}{\sum_{j=1}^{N} u_{ij}^m}.$$

It can be shown [39] that optimization of Eq. (34) with respect to uij yields

$$u_{ij} = \frac{1}{\sum_{t=1}^{C}\left(\frac{D_{ij}^{2}}{D_{tj}^{2}}\right)^{\frac{1}{m-1}}}.$$

Rewriting the relational data Eq. (35) as

$$\hat{r}_{jh}^{(i)} = \sum_{k=1}^{M}\alpha_{jhk}\, w_{ik}^{2},$$

where

$$\alpha_{jhk} = K_k(x_j,x_j) + K_k(x_h,x_h) - 2K_k(x_j,x_h),$$

it can be shown [39] that the kernel weights need to be updated using

$$w_{ik} = \frac{1}{\sum_{p=1}^{M}\left(\bar{D}_{ik}/\bar{D}_{ip}\right)},$$

where

$$\bar{D}_{ik} = \sum_{j=1}^{N}\sum_{h=1}^{N} u_{ij}^m u_{ih}^m\, \alpha_{jhk}.$$

In Eq. (41), the resolution weight wik is inversely proportional to D̄ik, which aggregates the distances αjhk between objects xj and xh induced by kernel k. When the objects are mapped close to each other, wik will be large, indicating that kernel k is relevant. On the other hand, when the objects are mapped far apart, wik will be small, indicating that kernel k is irrelevant.
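For illustration, a hedged sketch (our own code, not taken from [39]) of the two relational quantities above, the implicit point-to-cluster distance and the kernel-weight update, is:

```python
import numpy as np

def rfcmk_point_to_cluster(R_hat_i, u_i, m=2.0):
    """Point-to-cluster distances for one cluster i.
    R_hat_i: N x N transformed relational matrix, u_i: length-N memberships."""
    v = (u_i ** m) / (u_i ** m).sum()             # normalized membership vector
    return R_hat_i @ v - 0.5 * (v @ R_hat_i @ v)  # length-N vector of D_ij^2

def rfcmk_weight_update(U, alpha, m=2.0):
    """Kernel-weight update from pairwise coefficients.
    U: C x N memberships, alpha: N x N x M coefficients alpha_jhk."""
    Um = U ** m
    Dbar = np.einsum('ij,ih,jhk->ik', Um, Um, alpha)   # Dbar_ik
    inv = 1.0 / np.maximum(Dbar, 1e-12)
    return inv / inv.sum(axis=1, keepdims=True)        # equivalent form of the update above
```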

3. Semi-Supervised Fuzzy Kernel-Based Clustering

Clustering is a difficult combinatorial problem that is susceptible to local minima, especially for high dimensional data. Incorporating prior knowledge into the unsupervised learning task in order to guide the clustering process has attracted considerable interest among researchers in the data mining and machine learning communities. Some of these methods use available side-information to learn the parameters of a Mahalanobis distance (e.g., [41–43]). Non-linear methods have focused on the kernelization of Mahalanobis metric learning methods (e.g., [44–46]). However, these approaches are computationally expensive or even infeasible for high dimensional data.

Another approach to guide the clustering process is to use available information in the form of hints, constraints, or labels. Supervision in the form of constraints is more practical than providing class labels. This is because in many real world applications, the true class labels may not be known, and it is much easier to specify whether pairs of points should belong to the same or to different clusters. In fact, pairwise constraints occur naturally in many domains.

3.1 The Semi-Supervised Kernel FCM (SS-KFCM) Algorithm

The SS-KFCM [47] uses L labeled points and U unlabeled points. It assumes that fuzzy memberships can be assigned, using expert knowledge, to the subset of labeled data. Its objective function is defined by adding the classification errors of both the labeled and the unlabeled data, i.e.,

$$J = w\sum_{j=1}^{L}\sum_{i=1}^{C}\left(u_{ij}^m - u_{ij,0}^m\right)\left(K(x_j,a_{i,0}) - K(x_j,a_i)\right) + (1-w)\sum_{j=1}^{U}\sum_{i=1}^{C} u_{ij}^m\left(2 - 2K(x_j,a_i)\right).$$

The first term in Eq. (43) is the error between the fuzzy centers calculated based on the learned clusters and the labeled information. The second term minimizes the intra-cluster distances. The optimal solution to Eq. (43) involves learning the fuzzy memberships and the kernel parameters. It can be shown [47] that for the labeled data, the memberships need to be updated using:

$$u_{ij} = \left(\sum_{k=1}^{C}\left(\frac{K(x_j,x_j) - 2K(x_j,a_i) + K(a_i,a_i)}{K(x_j,x_j) - 2K(x_j,a_k) + K(a_k,a_k)}\right)^{\frac{1}{m-1}}\right)^{-1},$$

and for the unlabeled data, the fuzzy memberships need to be updated using:

$$u_{ij} = \frac{u_{ij}\, K(x_j,a_i)}{\sum_{k=1}^{C} u_{kj}\, K(x_j,a_k)}.$$

Optimization of Eq. (43) with respect to σ does not lead to a closed form expression. Instead, σ is updated iteratively using:

$$\sigma = \sigma - \eta_0\,\frac{\partial J(L,U)}{\partial\sigma}.$$

The SS-KFCM algorithm assumes that a subset of the data is labeled with fuzzy membership degrees. However, for real applications, this information may not be available and can be tedious to generate using expert knowledge. An alternative approach uses pairwise constraints. For the following semi-supervised algorithms, we assume that pairwise “Should-Link” constraints (pairs of points that should belong to the same cluster) and “Should not-Link” constraints (pairs of points that should belong to different clusters) are provided with the input. Let Sl be the indicator matrix for the set of “Should-Link” pairs of constraints such that Sl(j, k) = 1 means that xj and xk should be assigned to the same cluster, and 0 otherwise. Similarly, let SNl be the indicator matrix for the set of “Should not-Link” pairs such that SNl(j, k) = 1 means that xj and xk should not be assigned to the same cluster, and 0 otherwise.
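For concreteness, a small hedged sketch (our own helper function, with hypothetical names) of building these indicator matrices from lists of constrained index pairs:

```python
import numpy as np

def constraint_matrices(N, should_link, should_not_link):
    """Build the N x N indicator matrices Sl and SNl from index-pair lists."""
    Sl = np.zeros((N, N), dtype=int)
    SNl = np.zeros((N, N), dtype=int)
    for j, k in should_link:
        Sl[j, k] = Sl[k, j] = 1      # x_j and x_k should share a cluster
    for j, k in should_not_link:
        SNl[j, k] = SNl[k, j] = 1    # x_j and x_k should be in different clusters
    return Sl, SNl
```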

3.2 Semi-Supervised Relational Clustering With Local Scaling

The semi-supervised local scaling learning (SS-LSL) [48] minimizes

$$J = \sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m\left(1-\exp\left(-\frac{r_{jk}}{\sigma_i}\right)\right) - \sum_{i=1}^{C}\frac{K_1}{\sigma_i^2} - w\sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m S_l(j,k) + w\sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m S_{Nl}(j,k),$$

subject to the constraint in Eq. (3). SS-LSL is an extension of the LSL algorithm [29] that incorporates partial supervision. The last two terms in Eq. (47) are, respectively, a reward term for satisfying “Should-Link” constraints and a penalty term for violating “Should not-Link” constraints.

In Eq. (47), the weight w ∈ (0, 1) provides a way of specifying the relative importance of the “Should-Link” and “Should not-Link” constraints compared to the sum of inter-cluster distances. In [48], the authors recommend fixing it as the ratio of the number of constraints to the total number of points.

Setting the derivative of J with respect to σi to zero gives the same update equation for σi as in the LSL algorithm [29] (refer to Eq. (16)).

The objective function in Eq. (47) can be rewritten as

$$J = \sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m\, \hat{D}_{jk}^{i} - \sum_{i=1}^{C}\frac{K_1}{\sigma_i^2},$$

where

$$\hat{D}_{jk}^{i} = D_{jk}^{i} - w\, S_l(j,k) + w\, S_{Nl}(j,k)$$

can be regarded as the “effective distance” that takes into account the satisfaction and violation of the constraints.

It can be shown [48] that optimization of J with respect to uij yields

$$u_{ij} = \frac{1}{\sum_{t=1}^{C}\left(\hat{d}_{ij}^{2}/\hat{d}_{tj}^{2}\right)^{\frac{1}{m-1}}},$$

where

$$\hat{d}_{ik}^{2} = \frac{\sum_{j=1}^{N}\hat{D}_{jk}^{i}\, u_{ij}^m}{\sum_{v=1}^{N} u_{iv}^m} - \frac{1}{2}\,\frac{\sum_{j=1}^{N}\sum_{q=1}^{N} u_{ij}^m\hat{D}_{jq}^{i}\, u_{iq}^m}{\left(\sum_{v=1}^{N} u_{iv}^m\right)^{2}}.$$
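In code, the only change with respect to the unsupervised case is the adjustment of the pairwise distances before the membership computation; a small hedged sketch (our own names) is:

```python
def effective_distances(D_i, Sl, SNl, w):
    """Effective distances for cluster i: reward Should-Link pairs and
    penalize Should-not-Link pairs (all arguments are N x N arrays)."""
    return D_i - w * Sl + w * SNl
```

These effective distances then simply replace Djki in the point-to-cluster distances and memberships, exactly as in the unsupervised update of Section 2.5.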

3.3 The SS-FLeCK Algorithm

SS-FLeCK [49] is an extension of FLeCK [30] that incorporates partial supervision information. It attempts to satisfy a set of “Should-Link” and “Should-Not Link” constraints by minimizing

$$J_{intra} = \sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m\left(1-\exp\left(-\frac{r_{jk}}{\sigma_i}\right)\right) + K\sum_{i=1}^{C}\sigma_i - w\sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m S_l(j,k) + w\sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{N} u_{ij}^m u_{ik}^m S_{Nl}(j,k).$$

The optimal K is determined by simultaneously minimizing the intra-cluster distances in Eq. (52) and maximizing the inter-cluster distances in Eq. (18).

It can be shown [49] that optimization of Eq. (52) and Eq. (18) with respect to σi yields the same update equation for σi as in FLeCK (i.e., Eq. (19)). Similarly, optimization of Eq. (52) with respect to the memberships yields the same update equation as SS-LSL (i.e. Eq. (50)).

3.4 The SS-FCMK Algorithm

The SS-FCMK [50] uses the same partial supervision information to extend FCMK [38] by minimizing:

$$J = \sum_{i=1}^{C}\sum_{j=1}^{N}\sum_{k=1}^{M} u_{ij}^{2}\,\frac{w_{ik}^{2}}{\sigma_k}\left(1-\exp\left(-\frac{\left\|x_j-a_i\right\|^2}{2\sigma_k^2}\right)\right) + \gamma\left(\sum_{S_L(j,h)=1}\sum_{i=1}^{C}\sum_{\substack{s=1\\ s\ne i}}^{C} u_{ij}\, u_{sh} + \sum_{S_{NL}(j,h)=1}\sum_{i=1}^{C} u_{ij}\, u_{ih}\right),$$

subject to Eqs. (3) and (25).

In Eq. (53), the weight γ ∈ (0, 1) provides a way of specifying the relative importance of the should-link and should not-link constraints compared to the sum of inter-cluster distances. It can be shown [50] that the update equation for the membership of point xj to cluster i is

$$u_{ij} = u_{ij}^{FCMK} + u_{ij}^{Constraints},$$

where uijFCMK is given by Eq. (28), and

$$u_{ij}^{Constraints} = \frac{\gamma}{2 D_{ij}^{2}}\left(\bar{C}_j - C_{ij}\right).$$

In Eq. (55), $C_{ij}$ and $\bar{C}_j$ are defined as

$$C_{ij} = \sum_{S_L(j,h)=1}\;\sum_{\substack{s=1\\ s\ne i}}^{C} u_{sh} + \sum_{S_{NL}(j,h)=1} u_{ih},$$

and

$$\bar{C}_j = \frac{\sum_{i=1}^{C}\frac{1}{D_{ij}^{2}}\left(\sum_{S_L(j,h)=1}\sum_{\substack{s=1\\ s\ne i}}^{C} u_{sh} + \sum_{S_{NL}(j,h)=1} u_{ih}\right)}{\sum_{i=1}^{C}\frac{1}{D_{ij}^{2}}}.$$

The first term in Eq. (54) is the membership term in the FCMK algorithm and only focuses on the distances between feature points and prototypes. The second term takes into account the available supervision: memberships are reinforced or reduced according to the pairwise constraints provided by the user. $C_{ij}$ is the constraint violation cost associated with feature point xj if it is assigned to cluster i, while $\bar{C}_j$ is the weighted average, over all the clusters, of the violation costs associated with feature point xj. If the violated constraints do not involve feature point xj, then $u_{ij}^{Constraints} = 0$.

Since the constraints in Eq. (53) do not depend on W explicitly, optimization of Eq. (53) with respect to W yields the same update Eq. (32) as in FCMK.

3.5 The SS-RFCMK Algorithm

The SS-RFCMK [50] uses the same partial supervision to extend RFCMK [39]. It minimizes

$$J = \sum_{i=1}^{C}\frac{\sum_{j=1}^{N}\sum_{h=1}^{N} u_{ij}^{2} u_{ih}^{2}\, \hat{r}_{jh}^{(i)}}{2\sum_{h=1}^{N} u_{ih}^{2}} + \gamma\left(\sum_{S_L(j,h)=1}\sum_{i=1}^{C}\sum_{\substack{s=1\\ s\ne i}}^{C} u_{ij}\, u_{sh} + \sum_{S_{NL}(j,h)=1}\sum_{i=1}^{C} u_{ij}\, u_{ih}\right),$$

subject to Eqs. (3) and (25).

Using the Lagrange multiplier technique, it can be shown [50] that the memberships need to be updated using

$$u_{ij} = u_{ij}^{RFCMK} + u_{ij}^{Constraints},$$

where $u_{ij}^{RFCMK}$ is as defined in Eq. (38), and

$$u_{ij}^{Constraints} = \frac{\gamma}{2 D_{ij}^{2}}\left(\bar{C}_j - C_{ij}\right).$$

In Eq. (60), $C_{ij}$ and $\bar{C}_j$ are as defined in Eqs. (56) and (57).

Since the constraints in Eq. (58) do not depend on wik explicitly, optimization of Eq. (58) with respect to W yields the same update Eq. (41) as in RFCMK.

4. Other Fuzzy Kernel Clustering Algorithms

In this paper, we have focused on kernel clustering algorithms that are based on the FCM objective function. There are several other fuzzy kernel clustering approaches. For instance, the multisphere support vector (MSV) clustering [51] extends the SV clustering approach [52]. It defines a cluster as a sphere in the feature space. This yields a kernel-based mapping between the original space and the feature space. The kernel possibilistic C-means (KPCM) algorithm [53] applies the kernel approach to the possibilistic C-means (PCM) algorithm [54]. The weighted kernel fuzzy clustering algorithm (WKFCA) [55] is a kernelized version of the SCAD algorithm [56]. It performs feature weighting along with fuzzy kernel clustering. In [57], the authors propose a kernel fuzzy clustering model that extends the additive fuzzy clustering model to the case of a nonlinear model. More specifically, it has been shown that additive clustering [58] is a special case of fuzzy kernel clustering. The similarity structure is then captured and mapped to a higher dimensional space by using kernel functions. The kernel fuzzy clustering method based on local adaptive distances [59] performs feature weighting and fuzzy kernel clustering simultaneously. Unlike in [55], the sum of the weights of the variables with respect to each cluster is not constrained to be equal to one. However, the product of the weights of the variables for each cluster is constrained to be equal to one. The genetic multiple kernel interval type-2 FCM clustering algorithm [60] combines a heuristic method based on a genetic algorithm (GA) with MK-FCM. It automatically determines the optimal number of clusters and the initial centroids in the first step. Then, it adjusts the coefficients of the kernels and combines them in the feature space to produce a new kernel. Other kernel clustering algorithms based on type-2 fuzzy sets include [61, 62]. An evolutionary kernel intuitionistic FCM clustering algorithm (EKIFCM) was proposed in [63]. EKIFCM has two main phases. The first one is KIFCM, and the second phase is parameter selection for KIFCM with GA. KIFCM is a combination of Atanassov’s intuitionistic fuzzy sets (IFSs) [64] with kernel-based FCM (KFCM) [47].

5. Fuzzy Kernel-Based Clustering for Very Large Data

None of the kernel clustering algorithms that we have outlined scales to very large (VL) data sets. VL data, or big data, are any data that cannot be loaded into the computer’s working memory. Since clustering is one of the primary tasks used in the pattern recognition and data mining communities to search VL databases in various applications, it is desirable to have clustering algorithms that scale well to VL data.

The scalability issue has been studied by many researchers. In general, there are three main approaches to clustering for VL data: sampling-based methods, incremental algorithms, and data approximation algorithms. The sample-and-extend algorithms apply clustering to a sample of the dataset obtained by either progressive [65] or random sampling [66]. Then, the sample results are noniteratively extended to approximate the partition for the rest of the data. Representative algorithms include the random sample and extend kernel FCM (rseKFCM) algorithm [67] and the random sample and extend RFCMK (rseRFCMK) algorithm [68]. The rseRFCMK is an extension of the RFCMK algorithm to VL data based on sampling followed by non-iterative extension. The main problem with sampling-based methods is the choice of the sample: if the sample is not representative of the full dataset, then sample-and-extend methods cannot accurately approximate the clustering solution.

On the other hand, the incremental algorithms are designed to operate on the full dataset by separating it into multiple chunks. First, these algorithms sequentially load manageable chunks of the data. Then, they cluster each chunk in a single pass. They construct the final partition as a combination of the results from each chunk. In [66, 67], a single pass kernel fuzzy C-means (spKFCM) algorithm was proposed. The spKFCM algorithm runs weighted KFCM (wKFCM) on sequential chunks of the data, passing the clustering solution from each chunk onto the next. spKFCM is scalable as its space complexity is only based on the size of the sample. The single pass RFCMK (spRFCMK) [68] is an extension of the RFCMK algorithm to VL data. The spRFCMK is an incremental technique that makes one sequential pass through subsets of the data.
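The single-pass pattern itself is simple to express. The following hedged sketch assumes a weighted kernel clustering routine `cluster_weighted(X, weights, C)` (standing in for wKFCM, and not defined here) that returns C summarizing centroids together with the membership mass each one carries:

```python
import numpy as np

def single_pass_clustering(chunks, C, cluster_weighted):
    """Cluster a stream of data chunks in one pass: each chunk is clustered
    together with the weighted centroids carried over from previous chunks."""
    centroids, weights = None, None
    for X in chunks:                                   # X: array of samples in this chunk
        if centroids is None:
            data, w = X, np.ones(len(X))
        else:
            # previous centroids join the new chunk, weighted by the mass they summarize
            data = np.vstack([centroids, X])
            w = np.concatenate([weights, np.ones(len(X))])
        centroids, weights = cluster_weighted(data, w, C)
    return centroids
```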

In [66], an online algorithm for kernel FCM (oKFCM) was proposed in which data is assumed to arrive in chunks. Each chunk is clustered and the memory is freed by summarizing the clustering result by weighted centroids. In contrast to spKFCM, oKFCM is not truly scalable and is not recommended for VL data. This is because, rather than passing the clustering results from one chunk to the next, oKFCM clusters the weighted centroids in one final run to obtain the clustering for the entire stream.

6. Conclusion

Fuzzy kernel clustering has proven to be an effective approach to partition data when the clusters are not well-defined. Several algorithms have been proposed in the past few years and were described in this paper. Some algorithms, such as FSK-FCM, are simple and can incorporate any kernel function. However, these methods impose an intuitive equation to update the centers in the feature space. Thus, there is no guarantee that the optimized objective function of these algorithms corresponds to the optimal partition.

Other kernel algorithms can solve for the optimal centers by restricting the kernel to be Gaussian. Some of these algorithms, such as MK-FCM, use one global scale (σ) for all clusters. These are relatively simple algorithms that are not very sensitive to the initialization. However, they require the user to specify the scale, and may not perform well when the data exhibit large variations between the distributions of the different clusters. Other algorithms, such as FLeCK, use more complex objective functions to learn cluster-dependent kernel resolution parameters. However, because the search space of these methods includes many more parameters, they are more prone to local minima. Clustering with multiple kernels is a good compromise between methods that use one fixed global scale and methods that learn one scale for each cluster. These algorithms, such as FCMK, use a set of kernels with different, but fixed, scales. Then, they learn relevance weights for each kernel within each cluster.

Clustering, in general, is a difficult combinatorial problem that is susceptible to local minima. This problem is more acute in kernel-based clustering since the partitioning problem is solved in a much higher-dimensional feature space. As a result, several semi-supervised kernel clustering methods have emerged. These algorithms incorporate prior knowledge in order to guide the optimization process. This prior knowledge is usually available in the form of a small set of constraints that specify which pairs of points should be assigned to the same cluster, and which ones should be assigned to different clusters.

Scalability to very large data is another desirable feature to have in a clustering algorithm. In fact, many applications involve very large data that cannot be loaded into the computer’s memory. In this case, algorithm scalability becomes a necessary condition. We have outlined three main approaches that have been used with kernel clustering methods. These include sampling-based methods, incremental algorithms, and data approximation algorithms.

References

  1. Pedrycz, W, Amato, A, Di Lecce, V, and Piuri, V (2008). Fuzzy clustering with partial supervision in organization and classification of digital images. IEEE Transactions on Fuzzy Systems. 16, 1008-1026. http://dx.doi.org/10.1109/TFUZZ.2008.917287
  2. Loia, V, Pedrycz, W, and Senatore, S (2007). Semantic web content analysis: a study in proximity-based collaborative clustering. IEEE Transactions on Fuzzy Systems. 15, 1294-1312. http://dx.doi.org/10.1109/TFUZZ.2006.889970
  3. Frigui, H, and Hwang, C (2008). Fuzzy clustering and aggregation of relational data with instance-level constraints. IEEE Transactions on Fuzzy Systems. 16, 1565-1581. http://dx.doi.org/10.1109/TFUZZ.2008.2005692
  4. Horng, YJ, Chen, SM, Chang, YC, and Lee, CH (2005). A new method for fuzzy information retrieval based on fuzzy hierarchical clustering and fuzzy inference techniques. IEEE Transactions on Fuzzy Systems. 13, 216-228. http://dx.doi.org/10.1109/TFUZZ.2004.840134
  5. Chatzis, SP, and Varvarigou, TA (2008). A fuzzy clustering approach toward Hidden Markov random field models for enhanced spatially constrained image segmentation. IEEE Transactions on Fuzzy Systems. 16, 1351-1361. http://dx.doi.org/10.1109/TFUZZ.2008.2005008
  6. Liu, PX, and Meng, MQH (2004). Online data-driven fuzzy clustering with applications to real-time robotic tracking. IEEE Transactions on Fuzzy Systems. 12, 516-523. http://dx.doi.org/10.1109/TFUZZ.2004.832521
  7. Bezdek, JC (1981). Pattern Recognition With Fuzzy Objective Function Algorithms. New York, NY: Plenum Press
  8. Girolami, M (2002). Mercer kernel-based clustering in feature space. IEEE Transactions on Neural Networks. 13. http://dx.doi.org/10.1109/TNN.2002.1000150
  9. Inokuchi, R, and Miyamoto, S . LVQ clustering and SOM using a kernel function., Proceedings of the IEEE International Conference on Fuzzy Systems, July 25–29, 2004, Budapest, Hungary, Array, pp.1497-1500. http://dx.doi.org/10.1109/FUZZY.2004.1375395
  10. Qin, AK, and Suganthan, PN . Kernel neural gas algorithms with application to cluster analysis., Proceedings of the 17th International Conference on Pattern Recognition, August 23–26, 2004, Cambridge, United Kingdom, Array, pp.617-620. http://dx.doi.org/10.1109/ICPR.2004.1333848
  11. Luxburg, U (2007). A tutorial on spectral clustering. Statistics and Computing. 17, 395-416. http://dx.doi.org/10.1007/s11222-007-9033-z
  12. Peng, X, Chai, Y, Xu, L, and Man, X . Research on fault diagnosis of marine diesel engine based on grey relational analysis and kernel fuzzy C-means clustering., The Fifth International Conference on Intelligent Computation Technology and Automation, January 12–14, 2012, Zhangjiajie, China, Array, pp.283-286. http://dx.doi.org/10.1109/ICICTA.2012.78
  13. Liu, H, Wu, C, Wan, H, and Wang, H . Parameter identification based on stabilization diagram with kernel fuzzy clustering method., International Conference on Transportation, Mechanical, and Electrical Engineering, December 16–18, 2011, Changchun, China, Array, pp.1185-1188. http://dx.doi.org/10.1109/TMEE.2011.6199417
  14. Fan, J, and Wu, Y . Watermarking algorithm based on kernel fuzzy clustering and singular value decomposition in the complex wavelet transform domain., International Conference on Information Technology, Computer Engineering and Management Sciences, September 24–25, 2011, Nanjing, China, Array, pp.42-46. http://dx.doi.org/10.1109/ICM.2011.121
  15. Qi, LY, and Wang, KG . Kernel fuzzy clustering based classification of Ancient-Ceramic fragments., The 2nd IEEE International Conference on Information Management and Engineering, April 16–18, 2010, Chengdu, China, Array, pp.348-350. http://dx.doi.org/10.1109/ICIME.2010.5477818
  16. Tsai, DM, and Lin, CC (2011). Fuzzy C-means based clustering for linearly and nonlinearly separable data. Pattern Recognition. 44, 1750-1760. http://dx.doi.org/10.1016/j.patcog.2011.02.009
  17. Mei, C, Xu, H, and Liu, J . A novel NN-based soft sensor based on modified fuzzy kernel clustering for fermentation process., International Conference on Intelligent Human-Machine Systems and Cybernetics, August 26–27, 2009, Hangzhou, China, Array, pp.113-117. http://dx.doi.org/10.1109/IHMSC.2009.153
  18. Liu, DH, Bian, JP, and Sun, XY . The study of fault diagnosis model of DGA for oil-immersed transformer based on fuzzy means Kernel clustering and SVM multi-class object simplified structure., Proceedings of the 7th International Conference on Machine Learning and Cybernetics, July 12–15, 2008, Kunming, China, Array, pp.1505-1509. http://dx.doi.org/10.1109/ICMLC.2008.4620644
  19. Qu, B, and Wang, H . Dynamic fuzzy kernel clustering analysis of enterprises independent: innovation capability based on artificial immunity., International Workshop on Modelling, Simulation and Optimization, December 27–28, 2009, Hong Kong, Array, pp.216-220. http://dx.doi.org/10.1109/WMSO.2008.102
  20. Li, X, and Bian, S . A kernel fuzzy clustering algorithm with spatial constraint based on improved expectation maximization for image segmentation., International Conference on Measuring Technology and Mechatronics Automation, April 11–12, 2009, Zhangjiajie, China, Array, pp.529-533. http://dx.doi.org/10.1109/ICMTMA.2009.59
  21. Liao, L, and Lin, TS . A fast spatial constrained fuzzy kernel clustering algorithm for MRI brain image segmentation., International Conference on Wavelet Analysis and Pattern Recognition, November 2–4, 2008, Beijing, China, Array, pp.82-87. http://dx.doi.org/10.1109/ICWAPR.2007.4420641
  22. Campello, RJGB (2007). A fuzzy extension of the Rand index and other related indexes for clustering and classification assessment. Pattern Recognition Letters. 28, 833-841. http://dx.doi.org/10.1016/j.patrec.2006.11.010
  23. Schölkopf, B, Smola, A, and Mller, KR (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation. 10, 1299-1319. http://dx.doi.org/10.1162/089976698300017467
  24. Aronszajn, N (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society. 68, 337-404.
  25. Zhang, D, and Chen, S . Fuzzy clustering using kernel method., International Conference on Control and Automation, June 19, 2002, Xiamen, China, Array, pp.162-163. http://dx.doi.org/10.1109/ICCA.2002.1229535
  26. Zhang, DQ, and Chen, SC . Kernel-based fuzzy and possibilistic c-means clustering., Proceedings of the International Conference on Artificial Neural Network, June 26–29, 2003, Istanbul, Turkey, pp.122-125.
  27. Hathaway, RJ, Huband, JM, and Bezdek, JC . A kernelized non-euclidean relational fuzzy c-means algorithm., IEEE International Conference on Fuzzy Systems, May 22–25, 2005, Reno, NV, Array, pp.414-419. http://dx.doi.org/10.1109/FUZZY.2005.1452429
  28. Hathaway, RJ, and Bezdek, JC (1994). Nerf c-means: non-Euclidean relational fuzzy clustering. Pattern Recognition. 27, 429-437. http://dx.doi.org/10.1016/0031-3203(94)90119-8
  29. Bchir, O, and Frigui, H . Fuzzy relational kernel clustering with local scaling parameter learning., Proceedings of the 20th IEEE International Workshop on Machine Learning for Signal Processing, August 29-September 1, 2010, Kittila, Finland, Array, pp.289-294. http://dx.doi.org/10.1109/MLSP.2010.5589234
  30. Bchir, O, and Frigui, H . Fuzzy clustering with learnable cluster dependent kernels., IEEE International Conference on Fuzzy Systems, June 27–30, 2011, Taipei, Taiwan, Array, pp.2521-2527. http://dx.doi.org/10.1109/FUZZY.2011.6007411
  31. Lanckriet, GRG, Cristianini, N, Bartlett, P, El Ghaoui, L, and Jordan, MI (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research. 5, 27-72.
  32. Bach, FR, Lanckriet, GRG, and Jordan, MI . Multiple kernel learning, conic duality, and the SMO algorithm., Proceedings of the 21st International Conference on Machine Learning, July 4–8, 2004, Banff, Canada, pp.41-48.
  33. Rakotomamonjy, A, Bach, F, Canu, S, and Grandvalet, Y . More efficiency in multiple kernel learning., Proceedings of the 24th International Conference on Machine Learning, June 20–24, 2007, Corvalis, OR, Array, pp.775-782. http://dx.doi.org/10.1145/1273496.1273594
  34. Ye, J, Ji, S, and Chen, J . Learning the kernel matrix in discriminant analysis via quadratically constrained quadratic programming., Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 12–15, 2007, San Jose, CA, Array, pp.854-863. http://dx.doi.org/10.1145/1281192.1281283
  35. Gönen, M, and Alpaydin, E . Localized multiple kernel learning., Proceedings of the 25th International Conference on Machine Learning, July 5–9, 2008, Helsinki, Finland, pp.352-359.
  36. Zhao, B, Kwok, JT, and Zhang, C . Multiple kernel clustering., Proceedings of the 9th SIAM International Conference on Data Mining, Sparks, April 30-May 2, 2009, NV, pp.638-649.
  37. Huang, HC, Chuang, YY, and Chen, CS (2012). Multiple kernel fuzzy clustering. IEEE Transactions on Fuzzy Systems. 20, 120-134. http://dx.doi.org/10.1109/TFUZZ.2011.2170175
  38. Baili, N, and Frigui, H . Fuzzy clustering with multiple kernels in feature space., IEEE International Conference on Fuzzy Systems, June 10–15, 2012, Brisbane, Australia, Array. http://dx.doi.org/10.1109/FUZZ-IEEE.2012.6251146
  39. Baili, N, and Frigui, H . Relational fuzzy clustering with multiple kernels., Proceedings of the 11th IEEE International Conference on Data Mining, December 11, 2011, Vancouver, Canada, Array, pp.488-495. http://dx.doi.org/10.1109/ICDMW.2011.145
  40. Hathaway, RJ, Davenport, JW, and Bezdek, JC (1989). Relational duals of the c-means clustering algorithms. Pattern Recognition. 22, 205-212.
  41. Xing, EP, Ng, AY, Jordan, MI, and Russell, S . Distance metric learning, with application to clustering with side-information., The 17th Annual Conference on Neural Information Processing Systems, December 8–13, 2003, Vancouver & Whistler, Canada.
  42. Weinberger, K, Blitzer, J, and Saul, L . Distance metric learning for large margin nearest neighbor classification., The 19th Annual Conference on Neural Information Processing Systems, December 5–10, 2005, Vancouver & Whistler, Canada.
  43. Globerson, A, and Roweis, S . Metric learning by collapsing classes., The 19th Annual Conference on Neural Information Processing Systems, December 5–10, 2005, Vancouver & Whistler, Canada.
  44. Shalev-Shwartz, S, Singer, Y, and Ng, AY . Online and batch learning of pseudo-metrics., Proceedings of the 21st International Conference on Machine Learning, July 4–8, 2004, Banff, Canada, pp.94.
  45. Davis, JV, Kulis, B, Jain, P, Sra, S, and Dhillon, IS . Information-theoretic metric learning., Proceedings of the 24th International Conference on Machine Learning, June 20–24, 2007, Corvallis, OR, pp.209-216.
  46. Chatpatanasiri, R, Korsrilabutr, T, Tangchanachaianan, P, and Kijsirikul, B. On kernelization of supervised Mahalanobis distance learners. arXiv:0804.1441. http://arxivorg/abs/08041441
  47. Zhang, H, and Lu, J (2009). Semi-supervised fuzzy clustering: a kernel-based approach. Knowledge-Based Systems. 22, 477-481. http://dx.doi.org/10.1016/j.knosys.2009.06.009
  48. Bchir, O, Frigui, H, and Ben Ismail, MM. Semisupervised clustering and local scale learning algorithm., World Congress on Computer and Information Technology, June 22–24, 2013, Sousse, Tunisia, article number 6618774. http://dx.doi.org/10.1109/WCCIT.2013.6618774
  49. Bchir, O, Frigui, H, and Ismail, MMB (2013). Semi-supervised fuzzy clustering with learnable cluster dependent kernels. International Journal on Artificial Intelligence Tools. 22. article number 1350013
  50. Baili, N, and Frigui, H . Semi-supervised clustering with cluster-dependent multiple kernels., The 4th International Conference on Information, Intelligence, Systems and Applications Piraeus, July 10–12, 2013, Greece.
  51. Chiang, JH, and Hao, PY (2003). A new kernel-based fuzzy clustering approach: support vector clustering with cell growing. IEEE Transactions on Fuzzy Systems. 11, 518-527. http://dx.doi.org/10.1109/TFUZZ.2003.814839
  52. Schölkopf, B, Platt, JC, Shawe-Taylor, J, Smola, AJ, and Williamson, RC (2001). Estimating the support of a highdimensional distribution. Neural Computation. 13, 1443-1471. http://dx.doi.org/10.1162/089976601750264965
  53. Rhee, FCH, Choi, KS, and Choi, BI (2009). Kernel approach to possibilistic C-means clustering. International Journal of Intelligent Systems. 24, 272-292. http://dx.doi.org/10.1002/int.20336
  54. Krishnapuram, R, and Keller, JM (1993). A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems. 1, 98-110. http://dx.doi.org/10.1109/91.227387
  55. Shen, H, Yang, J, Wang, S, and Liu, X (2006). Attribute weighted mercer kernel based fuzzy clustering algorithm for general non-spherical datasets. Soft Computing. 10, 1061-1073. http://dx.doi.org/10.1007/s00500-005-0043-5
  56. Frigui, H, and Nasraoui, O (2004). Unsupervised learning of prototypes and attribute weights. Pattern Recognition. 37, 567-581. http://dx.doi.org/10.1016/j.patcog.2003.08.002
  57. Sato-Ilic, M, Ito, S, and Takahashi, S . Generalized kernel fuzzy clustering model., IEEE International Conference on Fuzzy Systems, August 20–24, 2009, Jeju, Korea, Array, pp.421-426. http://dx.doi.org/10.1109/FUZZY.2009.5276876
  58. Sato, M, and Sato, Y (1995). On a general fuzzy additive clustering model. Intelligent Automation & Soft Computing. 1, 439-448. http://dx.doi.org/10.1080/10798587.1995.10750648
  59. Ferreira, MRP, and de Carvalho, FDAT . Kernel fuzzy clustering methods based on local adaptive distances., IEEE International Conference on Fuzzy Systems, June 10–15, 2012, Brisbane, Australia, Array, pp.1-8. http://dx.doi.org/10.1109/FUZZ-IEEE.2012.6251352
  60. Nguyen, DD, Ngo, LT, and Pham, LT . GMKIT2-FCM: a genetic-based improved multiple kernel interval type-2 fuzzy C-means clustering., IEEE International Conference on Cybernetics, June 13–15, 2013, Lausanne, Switzerland, Array, pp.104-109. http://dx.doi.org/10.1109/CYBConf.2013.6617457
  61. Abhishek, JA, and Rhee, FCH . Interval type-2 fuzzy C-means using multiple kernels., IEEE International Conference on Fuzzy Systems, July 7–10, 2013, Hyderabad, India, Array, pp.1-8. http://dx.doi.org/10.1109/FUZZ-IEEE.2013.6622306
  62. Raza, MA, and Rhee, FCH . Interval type-2 approach to kernel possibilistic C-means clustering., IEEE International Conference on Fuzzy Systems, June 10–15, 2012, Brisbane, Australia, Array, pp.1-7. http://dx.doi.org/10.1109/FUZZ-IEEE.2012.6251233
  63. Lin, K (2013). A novel evolutionary kernel intuitionistic fuzzy C-means clustering algorithm. IEEE Transactions on Fuzzy Systems. PP. article number TFS-2013-0121.R2
  64. Atanassov, KT (1986). Intuitionistic fuzzy sets. Fuzzy Sets and Systems. 20, 87-96. http://dx.doi.org/10.1016/S0165-0114(86)80034-3
  65. Hathaway, RJ, and Bezdek, JC (2006). Extending fuzzy and probabilistic clustering to very large data sets. Computational Statistics & Data Analysis. 51, 215-234. http://dx.doi.org/10.1016/j.csda.2006.02.008
  66. Havens, TC, Bezdek, JC, Leckie, C, Hall, LO, and Palaniswami, M (2012). Fuzzy c-Means algorithms for very large data. IEEE Transactions on Fuzzy Systems. 20, 1130-1146. http://dx.doi.org/10.1109/TFUZZ.2012.2201485
  67. Havens, TC, Bezdek, JC, and Palaniswami, M (2012). Incremental kernel fuzzy c-means. Computational Intelligence, Studies in Computational Intelligence vol 399, Madani, K, Correia, AD, Rosa, A, and Filipe, J, ed. Heidelberg: Springer Berlin, pp. 3-18 http://dx.doi.org/10.1007/978-3-642-27534-01
  68. Baili, N, and Frigui, H (2014). Incremental fuzzy clustering with multiple kernels. ATSP under review.