Evaluating Social Intelligence in NLP Models with Theory of Mind Stories (ToMS): A New Challenging Benchmark

Evaluating Social Intelligence in NLP Models with Theory of Mind Stories (ToMS): A New Challenging Benchmark. Chen, Y. Multidisciplinary Undergraduate Research Conference, March, 2021.

Neural natural language processing (NLP) models, mostly Transformers, have recently achieved higher-than-human performance on multiple tasks such as reading comprehension and translation. However, existing NLP benchmarks rarely test for social reasoning, which is a significant facet of human ability. To quantify the level of social intelligence in NLP models, we adapted and expanded psychology batteries to construct the Theory of Mind Stories (ToMS) benchmark. In contrast to ordinary reading comprehension, ToMS poses the unique challenge of inferring unobservable human mental states. We evaluated several state-of-the-art NLP models and report the results. We also made the benchmark open-source in the hope of assisting future research and development of human-centric NLP models.

@article{chen_evaluating_2021,
	title = {Evaluating {Social} {Intelligence} in {NLP} {Models} with {Theory} of {Mind} {Stories} ({ToMS}): {A} {New} {Challenging} {Benchmark}},
	abstract = {Neural natural language processing (NLP) models, mostly Transformers, have recently achieved higher-than-human performance on multiple tasks such as reading comprehension and translation. However, existing NLP benchmarks rarely test for social reasoning, which is a significant facet of human ability. To quantify the level of social intelligence in NLP models, we adapted and expanded psychology batteries to construct the Theory of Mind Stories (ToMS) benchmark. In contrast to ordinary reading comprehension, ToMS poses the unique challenge of inferring unobservable human mental states. We evaluated several state-of-the-art NLP models and report the results. We also made the benchmark open-source in the hope of assisting future research and development of human-centric NLP models.},
	journal = {Multidisciplinary Undergraduate Research Conference},
	author = {Chen, Yifu},
	month = mar,
	year = {2021},
}
