Evaluating Social Intelligence in NLP Models with Theory of Mind Stories (ToMS): A New Challenging Benchmark. Chen, Y. Multidisciplinary Undergraduate Research Conference, March, 2021.
Neural natural language processing (NLP) models, mostly Transformers, have recently achieved higher-than-human performance on multiple tasks such as reading comprehension and translation. However, existing NLP benchmarks rarely test for social reasoning, which is a significant facet of human abilities. To quantify the level of social intelligence in NLP models, we adapted and expanded psychology batteries to construct the Theory of Mind Stories (ToMS) benchmark. In contrast to normal reading comprehension, ToMS poses the unique challenge of inferring the unobservable mental states of humans. We evaluated several state-of-the-art NLP models and reported the results. We also made this benchmark open-source in the hope of assisting future research and development of human-centric NLP models.
@article{chen_evaluating_2021,
	title = {Evaluating {Social} {Intelligence} in {NLP} {Models} with {Theory} of {Mind} {Stories} ({ToMS}): {A} {New} {Challenging} {Benchmark}},
	abstract = {Neural natural language processing (NLP) models, mostly Transformers, have recently achieved higher-than-human performance on multiple tasks such as reading comprehension and translation. However, existing NLP benchmarks rarely test for social reasoning, which is a significant facet of human abilities. To quantify the level of social intelligence in NLP models, we adapted and expanded psychology batteries to construct the Theory of Mind Stories (ToMS) benchmark. In contrast to normal reading comprehension, ToMS poses the unique challenge of inferring the unobservable mental states of humans. We evaluated several state-of-the-art NLP models and reported the results. We also made this benchmark open-source in the hope of assisting future research and development of human-centric NLP models.},
	journal = {Multidisciplinary Undergraduate Research Conference},
	author = {Chen, Yifu},
	month = mar,
	year = {2021},
}
