Targeted Syntactic Evaluation of Language Models. Marvin, R. & Linzen, T. August, 2018. arXiv:1808.09031 [cs]
Abstract: We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this data set, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM's accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.
@misc{marvin_targeted_2018,
title = {Targeted {Syntactic} {Evaluation} of {Language} {Models}},
url = {http://arxiv.org/abs/1808.09031},
doi = {10.48550/arXiv.1808.09031},
abstract = {We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this data set, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM's accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.},
urldate = {2022-09-02},
publisher = {arXiv},
author = {Marvin, Rebecca and Linzen, Tal},
month = aug,
year = {2018},
note = {arXiv:1808.09031 [cs]},
keywords = {Computer Science - Computation and Language},
}