Unsupervised Feature Selection for Efficient Exploration of High Dimensional Data

Unsupervised Feature Selection for Efficient Exploration of High Dimensional Data. Chakrabarti, A., Das, A., Cochez, M., & Quix, C. In Bellatreche, L., Dumas, M., Karras, P., & Matulevičius, R., editors, Advances in Databases and Information Systems, pages 183–197, Cham, August, 2021. Springer International Publishing.

The exponential growth in the ability to generate, capture, and store high dimensional data has driven sophisticated machine learning applications. However, high dimensionality often poses a challenge for analysts to effectively identify and extract relevant features from datasets. Though many feature selection methods have shown good results in supervised learning, the major challenge lies in the area of unsupervised feature selection. For example, in the domain of data visualization, high-dimensional data is difficult to visualize and interpret due to the limitations of the screen, resulting in visual clutter. Visualizations are more interpretable when visualized in a low dimensional feature space. To mitigate these challenges, we present an approach to perform unsupervised feature clustering and selection using our novel graph clustering algorithm based on Clique-Cover Theory. We implemented our approach in an interactive data exploration tool which facilitates the exploration of relationships between features and generates interpretable visualizations.

@inproceedings{chakrabarti_unsupervised_2021,
	address = {Cham},
	title = {Unsupervised {Feature} {Selection} for {Efficient} {Exploration} of {High} {Dimensional} {Data}},
	isbn = {978-3-030-82472-3},
	url = {https://www.cochez.nl/papers/feature_selection_for_exploration.pdf},
	abstract = {The exponential growth in the ability to generate, capture, and store high dimensional data has driven sophisticated machine learning applications. However, high dimensionality often poses a challenge for analysts to effectively identify and extract relevant features from datasets. Though many feature selection methods have shown good results in supervised learning, the major challenge lies in the area of unsupervised feature selection. For example, in the domain of data visualization, high-dimensional data is difficult to visualize and interpret due to the limitations of the screen, resulting in visual clutter. Visualizations are more interpretable when visualized in a low dimensional feature space. To mitigate these challenges, we present an approach to perform unsupervised feature clustering and selection using our novel graph clustering algorithm based on Clique-Cover Theory. We implemented our approach in an interactive data exploration tool which facilitates the exploration of relationships between features and generates interpretable visualizations.},
	booktitle = {Advances in {Databases} and {Information} {Systems}},
	publisher = {Springer International Publishing},
	author = {Chakrabarti, Arnab and Das, Abhijeet and Cochez, Michael and Quix, Christoph},
	editor = {Bellatreche, Ladjel and Dumas, Marlon and Karras, Panagiotis and Matulevičius, Raimundas},
	month = aug,
	year = {2021},
	pages = {183--197},
}

