DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning. Tamkin, A., Liu, V., Lu, R., Fein, D., Schultz, C., & Goodman, N. arXiv:2111.12062 [cs], November, 2021. arXiv: 2111.12062
Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at https://github.com/alextamkin/dabs.
@article{tamkin_dabs_2021,
	title = {{DABS}: {A} {Domain}-{Agnostic} {Benchmark} for {Self}-{Supervised} {Learning}},
	shorttitle = {{DABS}},
	url = {http://arxiv.org/abs/2111.12062},
	abstract = {Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at https://github.com/alextamkin/dabs.},
	language = {en},
	urldate = {2021-12-08},
	journal = {arXiv:2111.12062 [cs]},
	author = {Tamkin, Alex and Liu, Vincent and Lu, Rongfei and Fein, Daniel and Schultz, Colin and Goodman, Noah},
	month = nov,
	year = {2021},
	note = {arXiv: 2111.12062},
	keywords = {Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning},
}
