On the Marginal Benefit of Active Learning: Does Self-Supervision Eat its Cake?. Chan, Y., Li, M., & Oymak, S. In ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3455–3459, June, 2021. ISSN: 2379-190X
Active learning is the set of techniques for intelligently labeling large unlabeled datasets to reduce the labeling effort. In parallel, recent developments in self-supervised and semi-supervised learning (S4L) provide powerful techniques, based on data augmentation, contrastive learning, and self-training, that enable superior utilization of unlabeled data and have led to a significant reduction in required labeling on standard machine learning benchmarks. A natural question is whether these paradigms can be unified to obtain superior results. To this end, this paper provides a novel algorithmic framework integrating self-supervised pretraining, active learning, and consistency-regularized self-training. We conduct extensive experiments with our framework on the CIFAR10 and CIFAR100 datasets. These experiments enable us to isolate and assess the benefits of individual components, which are evaluated using state-of-the-art methods (e.g., Core-Set, VAAL, SimCLR, FixMatch). Our experiments reveal two key insights: (i) self-supervised pre-training significantly improves semi-supervised learning, especially in the few-label regime; (ii) the benefit of active learning is undermined and subsumed by S4L techniques. Specifically, we fail to observe any additional benefit of state-of-the-art active learning algorithms when combined with state-of-the-art S4L techniques.
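The abstract describes a three-stage pipeline: self-supervised pretraining on the unlabeled pool, an active-learning query for labels, and consistency-regularized self-training. As a rough illustration of how these pieces compose, here is a minimal PyTorch sketch under stated assumptions: the SimCLR-style pretraining is left as a stub, the query uses simple least-confidence sampling as a stand-in for Core-Set/VAAL, and the training step mimics FixMatch's confidence-thresholded pseudo-labeling. All function names, thresholds, and the toy data are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the pipeline outlined in the abstract:
# (1) self-supervised pretraining on the unlabeled pool (SimCLR-style, stubbed),
# (2) an active-learning query that picks the least-confident unlabeled points,
# (3) FixMatch-style consistency self-training on labeled + pseudo-labeled data.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pretrain_self_supervised(encoder, unlabeled_x):
    """Placeholder for SimCLR-style contrastive pretraining.
    A real implementation would optimize the NT-Xent loss over augmented
    view pairs; here we simply return the encoder unchanged."""
    return encoder

def least_confidence_query(model, unlabeled_x, budget):
    """Uncertainty-based active learning: select the `budget` points whose
    top softmax probability is lowest (a simple stand-in for Core-Set/VAAL)."""
    with torch.no_grad():
        probs = F.softmax(model(unlabeled_x), dim=1)
        confidence = probs.max(dim=1).values
    return torch.argsort(confidence)[:budget]

def fixmatch_step(model, opt, x_lab, y_lab, x_unlab, tau=0.95, lam=1.0):
    """One consistency-regularized self-training step (FixMatch-style):
    supervised cross-entropy on labeled data plus cross-entropy against
    confident pseudo-labels on perturbed unlabeled data."""
    opt.zero_grad()
    sup_loss = F.cross_entropy(model(x_lab), y_lab)
    with torch.no_grad():
        probs = F.softmax(model(x_unlab), dim=1)          # "weak" view (identity here)
        conf, pseudo = probs.max(dim=1)
        mask = conf >= tau                                 # keep only confident predictions
    strong = x_unlab + 0.1 * torch.randn_like(x_unlab)     # stand-in for strong augmentation
    unsup_loss = (F.cross_entropy(model(strong), pseudo, reduction="none") * mask).mean()
    loss = sup_loss + lam * unsup_loss
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    # Toy data standing in for CIFAR features (flattened), 10 classes.
    torch.manual_seed(0)
    unlabeled_x = torch.randn(512, 32)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
    model = pretrain_self_supervised(model, unlabeled_x)

    # Active-learning round: query 64 labels from an oracle (random labels here).
    idx = least_confidence_query(model, unlabeled_x, budget=64)
    x_lab, y_lab = unlabeled_x[idx], torch.randint(0, 10, (64,))

    opt = torch.optim.SGD(model.parameters(), lr=0.03, momentum=0.9)
    for _ in range(5):
        fixmatch_step(model, opt, x_lab, y_lab, x_unlab=unlabeled_x)
```

The question the paper studies corresponds, in this sketch, to whether replacing `least_confidence_query` with a stronger acquisition strategy still yields measurable gains once the pretraining and consistency-training stages are already in place.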
@inproceedings{chan_marginal_2021,
	title = {On the {Marginal} {Benefit} of {Active} {Learning}: {Does} {Self}-{Supervision} {Eat} its {Cake}?},
	shorttitle = {On the {Marginal} {Benefit} of {Active} {Learning}},
	doi = {10.1109/ICASSP39728.2021.9414665},
	abstract = {Active learning is the set of techniques for intelligently labeling large unlabeled datasets to reduce the labeling effort. In parallel, recent developments in self-supervised and semi-supervised learning (S4L) provide powerful techniques, based on data-augmentation, contrastive learning, and self-training, that enable superior utilization of unlabeled data which led to a significant reduction in required labeling in the standard machine learning benchmarks. A natural question is whether these paradigms can be unified to obtain superior results. To this aim, this paper provides a novel algorithmic framework integrating self-supervised pretraining, active learning, and consistency-regularized self-training. We conduct extensive experiments with our framework on CIFAR10 and CIFAR100 datasets. These experiments enable us to isolate and assess the benefits of individual components which are evaluated using state-of-the-art methods (e.g. Core-Set, VAAL, simCLR, FixMatch). Our experiments reveal two key insights: (i) Self-supervised pre-training significantly improves semi-supervised learning, especially in the few-label regime, (ii) The benefit of active learning is undermined and subsumed by S4L techniques. Specifically, we fail to observe any additional benefit of state-of-the-art active learning algorithms when combined with state-of-the-art S4L techniques.},
	booktitle = {{ICASSP} 2021 - 2021 {IEEE} {International} {Conference} on {Acoustics}, {Speech} and {Signal} {Processing} ({ICASSP})},
	author = {Chan, Yao-Chun and Li, Mingchen and Oymak, Samet},
	month = jun,
	year = {2021},
	note = {ISSN: 2379-190X},
	keywords = {Benchmark testing, Conferences, Machine learning, Machine learning algorithms, Semisupervised learning, Signal processing, Signal processing algorithms, active learning, contrastive learning, self-supervision, semi-supervised learning},
	pages = {3455--3459},
}
