DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning

DABS: A Domain-Agnostic Benchmark for Self-Supervised Learning. Tamkin, A., Liu, V., Lu, R., Fein, D., Schultz, C., & Goodman, N. arXiv:2111.12062 [cs], November, 2021. arXiv: 2111.12062


Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at https://github.com/alextamkin/dabs.

@article{tamkin_dabs_2021,
	title = {{DABS}: {A} {Domain}-{Agnostic} {Benchmark} for {Self}-{Supervised} {Learning}},
	shorttitle = {{DABS}},
	url = {http://arxiv.org/abs/2111.12062},
	abstract = {Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is evaluated on seven diverse domains: natural images, multichannel sensor data, English text, speech recordings, multilingual text, chest x-rays, and images with text descriptions. Each domain contains an unlabeled dataset for pretraining; the model is then scored based on its downstream performance on a set of labeled tasks in the domain. We also present e-Mix and ShED: two baseline domain-agnostic algorithms; their relatively modest performance demonstrates that significant progress is needed before self-supervised learning is an out-of-the-box solution for arbitrary domains. Code for benchmark datasets and baseline algorithms is available at https://github.com/alextamkin/dabs.},
	language = {en},
	urldate = {2021-12-08},
	journal = {arXiv:2111.12062 [cs]},
	author = {Tamkin, Alex and Liu, Vincent and Lu, Rongfei and Fein, Daniel and Schultz, Colin and Goodman, Noah},
	month = nov,
	year = {2021},
	note = {arXiv: 2111.12062},
	keywords = {Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning},
}

