Targeted Syntactic Evaluation of Language Models. Marvin, R. & Linzen, T. August, 2018.
We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this data set, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM's accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.
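The evaluation criterion the abstract describes can be illustrated with a minimal sketch: a model is counted correct on a minimal pair when it assigns higher probability to the grammatical sentence. Here a toy hand-coded bigram scorer stands in for the paper's LSTM; the specific sentences and scores are illustrative only.

```python
# Toy illustration of the minimal-pair criterion (not the paper's model):
# score each sentence under a hand-rolled bigram "language model" and
# check that the grammatical member of the pair scores higher.
BIGRAM_LOGPROB = {
    ("the", "author"): -1.0,
    ("author", "laughs"): -2.0,   # grammatical agreement
    ("author", "laugh"): -5.0,    # agreement violation
}
UNSEEN_LOGPROB = -10.0

def sentence_logprob(sentence):
    """Sum of bigram log-probabilities over the tokenized sentence."""
    words = sentence.lower().split()
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
               for pair in zip(words, words[1:]))

def correct_on_pair(grammatical, ungrammatical):
    """Paper's criterion: P(grammatical) > P(ungrammatical)."""
    return sentence_logprob(grammatical) > sentence_logprob(ungrammatical)

print(correct_on_pair("The author laughs", "The author laugh"))  # True
```

The dataset itself aggregates this per-pair judgment into an accuracy per construction, which is what the LSTM is reported to do poorly on.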
@misc{marvin2018,
	title = {Targeted {Syntactic} {Evaluation} of {Language} {Models}},
	url = {http://arxiv.org/abs/1808.09031},
	doi = {10.48550/arXiv.1808.09031},
	abstract = {We present a dataset for evaluating the grammaticality of the predictions of a language model. We automatically construct a large number of minimally different pairs of English sentences, each consisting of a grammatical and an ungrammatical sentence. The sentence pairs represent different variations of structure-sensitive phenomena: subject-verb agreement, reflexive anaphora and negative polarity items. We expect a language model to assign a higher probability to the grammatical sentence than the ungrammatical one. In an experiment using this data set, an LSTM language model performed poorly on many of the constructions. Multi-task training with a syntactic objective (CCG supertagging) improved the LSTM's accuracy, but a large gap remained between its performance and the accuracy of human participants recruited online. This suggests that there is considerable room for improvement over LSTMs in capturing syntax in a language model.},
	language = {en},
	urldate = {2022-09-02},
	publisher = {arXiv},
	author = {Marvin, Rebecca and Linzen, Tal},
	month = aug,
	year = {2018},
	keywords = {Computer Science - Computation and Language},
}
