Targeted Syntactic Evaluation of Language Models. Marvin, R. & Linzen, T. August, 2018.
We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this data set, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM's accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.
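The evaluation criterion the abstract describes can be illustrated with a minimal sketch: a model is counted correct on a minimal pair when it assigns higher probability to the grammatical sentence. Here a toy hand-coded bigram scorer stands in for the paper's LSTM; the specific sentences and scores are illustrative only.

```python
# Toy illustration of the minimal-pair criterion (not the paper's model):
# score each sentence under a hand-rolled bigram "language model" and
# check that the grammatical member of the pair scores higher.
BIGRAM_LOGPROB = {
    ("the", "author"): -1.0,
    ("author", "laughs"): -2.0,   # grammatical agreement
    ("author", "laugh"): -5.0,    # agreement violation
}
UNSEEN_LOGPROB = -10.0

def sentence_logprob(sentence):
    """Sum of bigram log-probabilities over the tokenized sentence."""
    words = sentence.lower().split()
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
               for pair in zip(words, words[1:]))

def correct_on_pair(grammatical, ungrammatical):
    """Paper's criterion: P(grammatical) > P(ungrammatical)."""
    return sentence_logprob(grammatical) > sentence_logprob(ungrammatical)

print(correct_on_pair("The author laughs", "The author laugh"))  # True
```

The dataset itself aggregates this per-pair judgment into an accuracy per construction, which is what the LSTM is reported to do poorly on.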
@misc{marvin2018,
	title = {Targeted {Syntactic} {Evaluation} of {Language} {Models}},
	url = {http://arxiv.org/abs/1808.09031},
	doi = {10.48550/arXiv.1808.09031},
	abstract = {We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this data set, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM's accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.},
	language = {en},
	urldate = {2022-09-02},
	publisher = {arXiv},
	author = {Marvin, Rebecca and Linzen, Tal},
	month = aug,
	year = {2018},
	keywords = {Computer Science - Computation and Language},
}
