Learning Latent Parameters without Human Response Patterns: Item Response Theory with Artificial Crowds. Lalor, J. P., Wu, H., & Yu, H. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4240–4250, Hong Kong, China, November 2019. Association for Computational Linguistics. NIHMSID: NIHMS1059054
Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable information about model performance and behavior. Traditionally, IRT models are learned using human response pattern (RP) data, presenting a significant bottleneck for large data sets like those required for training deep neural networks (DNNs). In this work we propose learning IRT models using RPs generated from artificial crowds of DNN models. We demonstrate the effectiveness of learning IRT models using DNN-generated data through quantitative and qualitative analyses for two NLP tasks. Parameters learned from human and machine RPs for natural language inference and sentiment analysis exhibit medium to large positive correlations. We demonstrate a use-case for latent difficulty item parameters, namely training set filtering, and show that using difficulty to sample training data outperforms baseline methods. Finally, we highlight cases where human expectation about item difficulty does not match difficulty as estimated from the machine RPs.
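For intuition, a minimal sketch of the core idea follows: fit a two-parameter logistic (2PL) IRT model to a binary response-pattern matrix produced by an artificial crowd of models, then rank items by the learned difficulty parameter (the paper's training-set-filtering use case). This is not the authors' implementation; the simulated toy responses, variable names, and the simple MAP gradient-descent estimator are illustrative assumptions (PyTorch assumed available), standing in for however one prefers to fit the model to real DNN-generated response patterns.

# A minimal sketch (not the authors' code): fit a 2PL IRT model to a
# binary response-pattern matrix from an "artificial crowd" of models.
# The simulated responses and MAP objective are illustrative assumptions;
# in the paper's setting, response patterns come from trained DNN ensembles.
import torch

torch.manual_seed(0)
n_subjects, n_items = 100, 50        # e.g., 100 DNN variants x 50 test items

# Toy response patterns: entry (j, i) = 1 if model j labels item i correctly.
true_theta = torch.randn(n_subjects, 1)     # latent ability
true_b = torch.randn(1, n_items)            # latent difficulty
responses = torch.bernoulli(torch.sigmoid(true_theta - true_b))

# Learnable 2PL parameters: ability theta_j, difficulty b_i, discrimination a_i.
theta = torch.zeros(n_subjects, 1, requires_grad=True)
b = torch.zeros(1, n_items, requires_grad=True)
log_a = torch.zeros(1, n_items, requires_grad=True)  # log keeps a_i > 0

opt = torch.optim.Adam([theta, b, log_a], lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    logits = torch.exp(log_a) * (theta - b)  # 2PL: P(correct) = sigmoid(a * (theta - b))
    loss = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, responses, reduction="sum")
    loss = loss + 0.5 * (theta ** 2).sum() + 0.5 * (b ** 2).sum()  # N(0,1) priors (MAP)
    loss.backward()
    opt.step()

# The paper's filtering use case ranks items by learned difficulty b_i;
# here, the five items the artificial crowd found hardest:
print("hardest items:", torch.argsort(b.detach().squeeze(), descending=True)[:5].tolist())

Once difficulty estimates are in hand, difficulty-aware sampling of training data amounts to selecting items by their fitted b_i rather than uniformly, which is the strategy the abstract reports outperforming baseline methods.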
@inproceedings{lalor_learning_2019,
	address = {Hong Kong, China},
	title = {Learning {Latent} {Parameters} without {Human} {Response} {Patterns}: {Item} {Response} {Theory} with {Artificial} {Crowds}},
	shorttitle = {Learning {Latent} {Parameters} without {Human} {Response} {Patterns}},
	url = {https://www.aclweb.org/anthology/D19-1434},
	doi = {10.18653/v1/D19-1434},
	abstract = {Incorporating Item Response Theory (IRT) into NLP tasks can provide valuable information about model performance and behavior. Traditionally, IRT models are learned using human response pattern (RP) data, presenting a significant bottleneck for large data sets like those required for training deep neural networks (DNNs). In this work we propose learning IRT models using RPs generated from artificial crowds of DNN models. We demonstrate the effectiveness of learning IRT models using DNN-generated data through quantitative and qualitative analyses for two NLP tasks. Parameters learned from human and machine RPs for natural language inference and sentiment analysis exhibit medium to large positive correlations. We demonstrate a use-case for latent difficulty item parameters, namely training set filtering, and show that using difficulty to sample training data outperforms baseline methods. Finally, we highlight cases where human expectation about item difficulty does not match difficulty as estimated from the machine RPs.},
	urldate = {2019-11-11},
	booktitle = {Proceedings of the 2019 {Conference} on {Empirical} {Methods} in {Natural} {Language} {Processing} and the 9th {International} {Joint} {Conference} on {Natural} {Language} {Processing} ({EMNLP}-{IJCNLP})},
	publisher = {Association for Computational Linguistics},
	author = {Lalor, John P. and Wu, Hao and Yu, Hong},
	month = nov,
	year = {2019},
	pmcid = {PMC6892593},
	pmid = {31803865},
	note = {NIHMSID: NIHMS1059054},
	pages = {4240--4250},
}
