Agglomerative Fuzzy K-Means Clustering Algorithm with Selection of Number of Clusters. Li, M. J., Ng, M. K., Cheung, Y., & Huang, J. Z. IEEE Transactions on Knowledge and Data Engineering, 20(11):1519–1534, November, 2008.
In this paper, we present an agglomerative fuzzy K-means clustering algorithm for numerical data, an extension of the standard fuzzy K-means algorithm that introduces a penalty term into the objective function to make the clustering process insensitive to the initial cluster centers. The new algorithm produces more consistent clustering results from different sets of initial cluster centers. Combined with cluster validation techniques, the new algorithm can determine the number of clusters in a data set, which is a well-known problem in $k$-means clustering. Experimental results on synthetic data sets (2 to 5 dimensions, 500 to 5000 objects, and 3 to 7 clusters), the BIRCH two-dimensional data set of 20000 objects and 100 clusters, and the WINE data set of 178 objects, 17 dimensions, and 3 clusters from UCI demonstrate the effectiveness of the new algorithm in producing consistent clustering results and determining the correct number of clusters in different data sets, some with overlapping inherent clusters.
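
To make the idea concrete, the sketch below implements an entropy-regularized fuzzy K-means iteration, one common way to add a penalty term that softens the dependence on initial centers. This is an illustrative assumption, not the paper's exact formulation: the penalty form (lam * sum(u * log u)), the parameter names (lam, n_init_clusters), and the stopping rule are all my own choices for illustration.

import numpy as np

def fuzzy_kmeans_entropy(X, n_init_clusters=20, lam=1.0, n_iter=100, tol=1e-6, seed=0):
    # X: (n_points, n_dims) numerical data.
    # Assumed entropy-regularized objective:
    #   sum_{i,k} u_ik * ||x_i - c_k||^2 + lam * sum_{i,k} u_ik * log(u_ik)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_init_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distances, shape (n_points, n_clusters).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        # Membership update: softmax of -d2 / lam, which is the closed-form
        # minimizer under the entropy penalty (shifted for numerical stability).
        u = np.exp(-(d2 - d2.min(1, keepdims=True)) / lam)
        u /= u.sum(1, keepdims=True)
        # Center update: membership-weighted means of the data.
        new_centers = (u.T @ X) / u.sum(0)[:, None]
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, u

In the agglomerative setting the abstract describes, one would start from many initial clusters and then merge clusters whose centers coalesce, using cluster validation to pick the final number of clusters; that merging step is omitted from this sketch.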
@article{li_agglomerative_2008,
	title = {Agglomerative {Fuzzy} {K}-{Means} {Clustering} {Algorithm} with {Selection} of {Number} of {Clusters}},
	volume = {20},
	issn = {1558-2191},
	doi = {10.1109/TKDE.2008.88},
	abstract = {In this paper, we present an agglomerative fuzzy K-means clustering algorithm for numerical data, an extension of the standard fuzzy K-means algorithm that introduces a penalty term into the objective function to make the clustering process insensitive to the initial cluster centers. The new algorithm produces more consistent clustering results from different sets of initial cluster centers. Combined with cluster validation techniques, the new algorithm can determine the number of clusters in a data set, which is a well-known problem in \$k\$-means clustering. Experimental results on synthetic data sets (2 to 5 dimensions, 500 to 5000 objects, and 3 to 7 clusters), the BIRCH two-dimensional data set of 20000 objects and 100 clusters, and the WINE data set of 178 objects, 17 dimensions, and 3 clusters from UCI demonstrate the effectiveness of the new algorithm in producing consistent clustering results and determining the correct number of clusters in different data sets, some with overlapping inherent clusters.},
	number = {11},
	journal = {IEEE Transactions on Knowledge and Data Engineering},
	author = {Li, Mark Junjie and Ng, Michael K. and Cheung, Yiu-ming and Huang, Joshua Zhexue},
	month = nov,
	year = {2008},
	keywords = {Algorithm design and analysis, Application software, Clustering, Clustering algorithms, Clustering methods, Computer vision, Data mining, Genetic algorithms, Minimization methods, Mining methods and algorithms, Optimization methods, Pattern recognition, Statistical analysis},
	pages = {1519--1534},
}
