SciRepEval: A Multi-Format Benchmark for Scientific Document Representations. Singh, A., D'Arcy, M., Cohan, A., Downey, D., & Feldman, S. November 2022. arXiv:2211.13308 [cs]
Abstract: Learned representations of scientific documents can serve as valuable input features for downstream tasks, without the need for further fine-tuning. However, existing benchmarks for evaluating these representations fail to capture the diversity of relevant tasks. In response, we introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations. It includes 25 challenging and realistic tasks, 11 of which are new, across four formats: classification, regression, ranking and search. We then use the benchmark to study and improve the generalization ability of scientific document representation models. We show how state-of-the-art models struggle to generalize across task formats, and that simple multi-task training fails to improve them. However, a new approach that learns multiple embeddings per document, each tailored to a different format, can improve performance. We experiment with task-format-specific control codes and adapters in a multi-task setting and find that they outperform the existing single-embedding state-of-the-art by up to 1.5 points absolute.
@misc{singh_scirepeval_2022,
title = {{SciRepEval}: {A} {Multi}-{Format} {Benchmark} for {Scientific} {Document} {Representations}},
shorttitle = {{SciRepEval}},
url = {http://arxiv.org/abs/2211.13308},
abstract = {Learned representations of scientific documents can serve as valuable input features for downstream tasks, without the need for further fine-tuning. However, existing benchmarks for evaluating these representations fail to capture the diversity of relevant tasks. In response, we introduce SciRepEval, the first comprehensive benchmark for training and evaluating scientific document representations. It includes 25 challenging and realistic tasks, 11 of which are new, across four formats: classification, regression, ranking and search. We then use the benchmark to study and improve the generalization ability of scientific document representation models. We show how state-of-the-art models struggle to generalize across task formats, and that simple multi-task training fails to improve them. However, a new approach that learns multiple embeddings per document, each tailored to a different format, can improve performance. We experiment with task-format-specific control codes and adapters in a multi-task setting and find that they outperform the existing single-embedding state-of-the-art by up to 1.5 points absolute.},
urldate = {2022-11-28},
publisher = {arXiv},
author = {Singh, Amanpreet and D'Arcy, Mike and Cohan, Arman and Downey, Doug and Feldman, Sergey},
month = nov,
year = {2022},
note = {arXiv:2211.13308 [cs]},
keywords = {Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Information Retrieval, Computer Science - Machine Learning},
}