Accounting for Language Effect in the Evaluation of Cross-lingual AMR Parsers. Wein, S. & Schneider, N. In Calzolari, N., Huang, C., Kim, H., Pustejovsky, J., Wanner, L., Choi, K., Ryu, P., Chen, H., Donatelli, L., Ji, H., Kurohashi, S., Paggio, P., Xue, N., Kim, S., Hahm, Y., He, Z., Lee, T. K., Santus, E., Bond, F., & Na, S., editors, Proceedings of the 29th International Conference on Computational Linguistics, pages 3824–3834, Gyeongju, Republic of Korea, October, 2022. International Committee on Computational Linguistics.
Cross-lingual Abstract Meaning Representation (AMR) parsers are currently evaluated in comparison to gold English AMRs, despite parsing a language other than English, due to the lack of multilingual AMR evaluation metrics. This evaluation practice is problematic because of the established effect of source language on AMR structure. In this work, we present three multilingual adaptations of monolingual AMR evaluation metrics and compare the performance of these metrics to sentence-level human judgments. We then use our most highly correlated metric to evaluate the output of state-of-the-art cross-lingual AMR parsers, finding that Smatch may still be a useful metric in comparison to gold English AMRs, while our multilingual adaptation of S2match (XS2match) is best for comparison with gold in-language AMRs.
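As background for the metrics named in the abstract: Smatch-family scores compare a parsed AMR to a gold AMR as an F1 over matched triples under a mapping between the two graphs' variables (S2match relaxes exact concept matches with embedding similarity, and XS2match is the paper's multilingual adaptation). The sketch below is only an illustration of that triple-level F-score, not the authors' implementation; real Smatch searches over variable mappings with hill-climbing, whereas here the mapping is assumed to be given, and all names and the toy graphs are hypothetical.

	# Illustrative sketch of a Smatch-style triple F-score under a fixed
	# variable mapping (real Smatch searches over mappings; simplified here).
	def triple_f_score(parsed_triples, gold_triples, var_map):
	    """Precision/recall/F1 over AMR triples, given a variable mapping.

	    parsed_triples, gold_triples: sets of (source, relation, target) tuples.
	    var_map: dict mapping parsed-graph variables to gold-graph variables.
	    """
	    # Rename parsed-graph variables into the gold graph's variable space.
	    mapped = {(var_map.get(s, s), r, var_map.get(t, t))
	              for s, r, t in parsed_triples}
	    matched = len(mapped & set(gold_triples))
	    precision = matched / len(parsed_triples) if parsed_triples else 0.0
	    recall = matched / len(gold_triples) if gold_triples else 0.0
	    f1 = (2 * precision * recall / (precision + recall)
	          if precision + recall else 0.0)
	    return precision, recall, f1

	# Toy example: parser output vs. gold AMR for "the boy wants to go".
	parsed = {("w", "instance", "want-01"), ("w", "ARG0", "b"),
	          ("b", "instance", "boy"), ("w", "ARG1", "g"),
	          ("g", "instance", "go-02")}
	gold = {("w2", "instance", "want-01"), ("w2", "ARG0", "b2"),
	        ("b2", "instance", "boy"), ("w2", "ARG1", "g2"),
	        ("g2", "instance", "go-02"), ("g2", "ARG0", "b2")}
	print(triple_f_score(parsed, gold, {"w": "w2", "b": "b2", "g": "g2"}))
	# -> precision 1.0, recall ~0.83, F1 ~0.91 (the gold graph's ARG0 of go-02 is unmatched)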
@inproceedings{wein_accounting_2022,
	address = {Gyeongju, Republic of Korea},
	title = {Accounting for {Language} {Effect} in the {Evaluation} of {Cross}-lingual {AMR} {Parsers}},
	url = {https://aclanthology.org/2022.coling-1.336},
	abstract = {Cross-lingual Abstract Meaning Representation (AMR) parsers are currently evaluated in comparison to gold English AMRs, despite parsing a language other than English, due to the lack of multilingual AMR evaluation metrics. This evaluation practice is problematic because of the established effect of source language on AMR structure. In this work, we present three multilingual adaptations of monolingual AMR evaluation metrics and compare the performance of these metrics to sentence-level human judgments. We then use our most highly correlated metric to evaluate the output of state-of-the-art cross-lingual AMR parsers, finding that Smatch may still be a useful metric in comparison to gold English AMRs, while our multilingual adaptation of S2match (XS2match) is best for comparison with gold in-language AMRs.},
	urldate = {2023-12-22},
	booktitle = {Proceedings of the 29th {International} {Conference} on {Computational} {Linguistics}},
	publisher = {International Committee on Computational Linguistics},
	author = {Wein, Shira and Schneider, Nathan},
	editor = {Calzolari, Nicoletta and Huang, Chu-Ren and Kim, Hansaem and Pustejovsky, James and Wanner, Leo and Choi, Key-Sun and Ryu, Pum-Mo and Chen, Hsin-Hsi and Donatelli, Lucia and Ji, Heng and Kurohashi, Sadao and Paggio, Patrizia and Xue, Nianwen and Kim, Seokhwan and Hahm, Younggyun and He, Zhong and Lee, Tony Kyungil and Santus, Enrico and Bond, Francis and Na, Seung-Hoon},
	month = oct,
	year = {2022},
	keywords = {PhdApp24-Georgetown, people-nert},
	pages = {3824--3834},
}
