ClusterFit: Improving Generalization of Visual Representations. Yan, X., Misra, I., Gupta, A., Ghadiyaram, D., & Mahajan, D. arXiv:1912.03330 [cs], December 2019.
Abstract: Pre-training convolutional neural networks with weakly-supervised and self-supervised strategies is becoming increasingly popular for several computer vision tasks. However, due to the lack of strong discriminative signals, these learned representations may overfit to the pre-training objective (e.g., hashtag prediction) and not generalize well to downstream tasks. In this work, we present a simple strategy, ClusterFit (CF), to improve the robustness of the visual representations learned during pre-training. Given a dataset, we (a) cluster its features extracted from a pre-trained network using k-means and (b) re-train a new network from scratch on this dataset using cluster assignments as pseudo-labels. We empirically show that clustering helps reduce the pre-training task-specific information from the extracted features, thereby minimizing overfitting to the same. Our approach is extensible to different pre-training frameworks (weak- and self-supervised), modalities (images and videos), and pre-training tasks (object and action classification). Through extensive transfer learning experiments on 11 different target datasets of varied vocabularies and granularities, we show that ClusterFit significantly improves the representation quality compared to the state-of-the-art large-scale (millions / billions) weakly-supervised image and video models and self-supervised image models.
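Step (a) of the abstract's procedure can be sketched as follows. This is a minimal illustrative k-means (Lloyd's algorithm) over pre-extracted feature vectors, not the paper's implementation: the actual work clusters millions of features with a scalable k-means and the function name and parameters here are hypothetical.

```python
import numpy as np

def kmeans_pseudo_labels(features, k, n_iters=20, seed=0):
    """Cluster pre-trained features (N x D array) into k groups and
    return a pseudo-label per example, as in ClusterFit's step (a).
    Minimal Lloyd's-algorithm sketch for illustration only."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct feature vectors.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each feature to its nearest centroid (squared L2 distance).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned features.
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels
```

Step (b) then uses the returned `labels` as classification targets to train a fresh network from scratch; the clustering discards pre-training-task-specific detail, which is the paper's proposed mechanism for better transfer.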
@article{yan_clusterfit_2019,
title = {{ClusterFit}: {Improving} {Generalization} of {Visual} {Representations}},
shorttitle = {{ClusterFit}},
url = {http://arxiv.org/abs/1912.03330},
abstract = {Pre-training convolutional neural networks with weakly-supervised and self-supervised strategies is becoming increasingly popular for several computer vision tasks. However, due to the lack of strong discriminative signals, these learned representations may overfit to the pre-training objective (e.g., hashtag prediction) and not generalize well to downstream tasks. In this work, we present a simple strategy - ClusterFit (CF) to improve the robustness of the visual representations learned during pre-training. Given a dataset, we (a) cluster its features extracted from a pre-trained network using k-means and (b) re-train a new network from scratch on this dataset using cluster assignments as pseudo-labels. We empirically show that clustering helps reduce the pre-training task-specific information from the extracted features thereby minimizing overfitting to the same. Our approach is extensible to different pre-training frameworks -- weak- and self-supervised, modalities -- images and videos, and pre-training tasks -- object and action classification. Through extensive transfer learning experiments on 11 different target datasets of varied vocabularies and granularities, we show that ClusterFit significantly improves the representation quality compared to the state-of-the-art large-scale (millions / billions) weakly-supervised image and video models and self-supervised image models.},
urldate = {2022-03-02},
journal = {arXiv:1912.03330 [cs]},
author = {Yan, Xueting and Misra, Ishan and Gupta, Abhinav and Ghadiyaram, Deepti and Mahajan, Dhruv},
month = dec,
year = {2019},
note = {arXiv: 1912.03330},
keywords = {Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning},
}