Enhancing K-Means Using Class Labels

Enhancing K-Means Using Class Labels. Peralta, B., Espinace, P., & Soto, A. Intelligent Data Analysis (IDA), 17(6):1023-1039, 2013.

Clustering is a relevant problem in machine learning where the main goal is to locate meaningful partitions of unlabeled data. In the case of labeled data, a related problem is supervised clustering, where the objective is to locate class-uniform clusters. Most current approaches to supervised clustering optimize a score related to cluster purity with respect to class labels. In particular, we present Labeled K-Means (LK-Means), an algorithm for supervised clustering based on a variant of K-Means that incorporates information about class labels. LK-Means replaces the classical cost function of K-Means by a convex combination of the joint cost associated to: (i) A discriminative score based on class labels, and (ii) A generative score based on a traditional metric for unsupervised clustering. We test the performance of LK-Means using standard real datasets and an application for object recognition. Moreover, we also compare its performance against classical K-Means and a popular K-Medoids-based supervised clustering method. Our experiments show that, in most cases, LK-Means outperforms the alternative techniques by a considerable margin. Furthermore, LK-Means presents execution times considerably lower than the alternative supervised clustering method under evaluation.
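
As a rough illustration of the convex combination the abstract describes, the sketch below scores a given cluster assignment with a weighted sum of a label-based term and a distance-based term. The abstract does not state the actual scoring functions, so both terms here are stand-ins chosen for illustration: cluster impurity (1 minus purity) for the discriminative part and mean squared Euclidean distance for the generative part. The weight `alpha` and all function names are hypothetical, and this should not be read as the authors' formulation.

```python
from collections import Counter


def generative_cost(points, assign, centroids):
    """Mean squared Euclidean distance of each point to its assigned centroid.

    Stand-in for the unsupervised (classical K-Means style) term.
    """
    total = 0.0
    for point, k in zip(points, assign):
        total += sum((a - b) ** 2 for a, b in zip(point, centroids[k]))
    return total / len(points)


def discriminative_cost(labels, assign):
    """Fraction of points that disagree with their cluster's majority label.

    Stand-in for the label-based term (1 - purity); an assumption, not the
    score used in the paper.
    """
    counts_by_cluster = {}
    for label, k in zip(labels, assign):
        counts_by_cluster.setdefault(k, Counter())[label] += 1
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return 1.0 - majority_total / len(labels)


def combined_cost(points, labels, assign, centroids, alpha):
    """Convex combination of the two terms; alpha (hypothetical) lies in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * discriminative_cost(labels, assign)
            + (1.0 - alpha) * generative_cost(points, assign, centroids))


# Toy data: two well-separated 1-D groups with matching class labels.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = ["a", "a", "b", "b"]
centroids = [(0.5,), (10.5,)]
pure_assign = [0, 0, 1, 1]
mixed_assign = [0, 1, 0, 1]
```

With `alpha = 0` the score reduces to the distance term alone, and with `alpha = 1` it depends only on how class-uniform the clusters are; intermediate values trade the two off. In practice the two terms would need comparable scales for the weight to be meaningful, which this sketch does not address.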

@Article{	  peralta:etal:2013,
  author	= {B. Peralta and P. Espinace and A. Soto},
  title		= {Enhancing K-Means Using Class Labels},
  journal	= {Intelligent Data Analysis (IDA)},
  volume	= {17},
  number	= {6},
  pages		= {1023--1039},
  year		= {2013},
  abstract	= {Clustering is a relevant problem in machine learning where
		  the main goal is to locate meaningful partitions of
		  unlabeled data. In the case of labeled data, a related
		  problem is supervised clustering, where the objective is to
		  locate class-uniform clusters. Most current approaches to
		  supervised clustering optimize a score related to cluster
		  purity with respect to class labels. In particular, we
		  present Labeled K-Means (LK-Means), an algorithm for
		  supervised clustering based on a variant of K-Means that
		  incorporates information about class labels. LK-Means
		  replaces the classical cost function of K-Means by a convex
		  combination of the joint cost associated to: (i) A
		  discriminative score based on class labels, and (ii) A
		  generative score based on a traditional metric for
		  unsupervised clustering. We test the performance of
		  LK-Means using standard real datasets and an application
		  for object recognition. Moreover, we also compare its
		  performance against classical K-Means and a popular
		  K-Medoids-based supervised clustering method. Our
		  experiments show that, in most cases, LK-Means outperforms
		  the alternative techniques by a considerable margin.
		  Furthermore, LK-Means presents execution times considerably
		  lower than the alternative supervised clustering method
		  under evaluation.},
  url		= {http://saturno.ing.puc.cl/media/papers_alvaro/supClustering.pdf}
}
