Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora Accessibility.
Zhu, X.; Zhou, Z.; Charlow, S.; and Frank, R.
In Che, W.; Nabende, J.; Shutova, E.; and Pilehvar, M. T., editor(s), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8824–8842, Vienna, Austria, July 2025. Association for Computational Linguistics.

@inproceedings{zhu-etal-2025-meaning,
  title = "Meaning Beyond Truth Conditions: Evaluating Discourse Level Understanding via Anaphora Accessibility",
  author = "Zhu, Xiaomeng and
    Zhou, Zhenghao and
    Charlow, Simon and
    Frank, Robert",
  editor = "Che, Wanxiang and
    Nabende, Joyce and
    Shutova, Ekaterina and
    Pilehvar, Mohammad Taher",
  booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = jul,
  year = "2025",
  address = "Vienna, Austria",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.acl-long.432/",
  doi = "10.18653/v1/2025.acl-long.432",
  pages = "8824--8842",
  ISBN = "979-8-89176-251-0",
  abstract = "We present a hierarchy of natural language understanding abilities and argue for the importance of moving beyond assessments of understanding at the lexical and sentence levels to the discourse level. We propose the task of anaphora accessibility as a diagnostic for assessing discourse understanding, and to this end, present an evaluation dataset inspired by theoretical research in dynamic semantics. We evaluate human and LLM performance on our dataset and find that LLMs and humans align on some tasks and diverge on others. Such divergence can be explained by LLMs' reliance on specific lexical items during language comprehension, in contrast to human sensitivity to structural abstractions."
}

We present a hierarchy of natural language understanding abilities and argue for the importance of moving beyond assessments of understanding at the lexical and sentence levels to the discourse level. We propose the task of anaphora accessibility as a diagnostic for assessing discourse understanding, and to this end, present an evaluation dataset inspired by theoretical research in dynamic semantics. We evaluate human and LLM performance on our dataset and find that LLMs and humans align on some tasks and diverge on others. Such divergence can be explained by LLMs' reliance on specific lexical items during language comprehension, in contrast to human sensitivity to structural abstractions.
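
To make the diagnostic concrete, here is a minimal Python sketch of an anaphora-accessibility probe in the spirit of the paper; the two discourses are textbook dynamic-semantics contrasts, and the ask_llm() helper is a placeholder, not the authors' dataset or evaluation code.

# Each item pairs a two-sentence discourse with whether the pronoun
# should have an accessible referent under dynamic semantics.
ITEMS = [
    ("A dog walked in. It sat down.", True),    # indefinite introduces a referent
    ("No dog walked in. It sat down.", False),  # negation blocks the referent
]

def ask_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError

def accuracy() -> float:
    correct = 0
    for discourse, accessible in ITEMS:
        reply = ask_llm(
            "Does the pronoun 'it' in the following discourse have a clear "
            f"referent? Answer yes or no.\n\n{discourse}"
        )
        correct += reply.strip().lower().startswith("yes") == accessible
    return correct / len(ITEMS)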

XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs.
He, L.; Nie, E.; Dindar, S. S.; Firoozi, A.; Nguyen, V.; Puffay, C.; Shimizu, R.; Ye, H.; Brennan, J.; Schmid, H.; Schütze, H.; and Mesgarani, N.
In Hahn, M.; Rani, P.; Kumar, R.; Shcherbakov, A.; Sorokin, A.; Serikov, O.; Cotterell, R.; and Vylomova, E., editor(s), Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 75–81, Vienna, Austria, August 2025. Association for Computational Linguistics.

@inproceedings{he-etal-2025-xcomps,
  title = "{XCOMPS}: A Multilingual Benchmark of Conceptual Minimal Pairs",
  author = "He, Linyang and
    Nie, Ercong and
    Dindar, Sukru Samet and
    Firoozi, Arsalan and
    Nguyen, Van and
    Puffay, Corentin and
    Shimizu, Riki and
    Ye, Haotian and
    Brennan, Jonathan and
    Schmid, Helmut and
    Sch{\"u}tze, Hinrich and
    Mesgarani, Nima",
  editor = "Hahn, Michael and
    Rani, Priya and
    Kumar, Ritesh and
    Shcherbakov, Andreas and
    Sorokin, Alexey and
    Serikov, Oleg and
    Cotterell, Ryan and
    Vylomova, Ekaterina",
  booktitle = "Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP",
  month = aug,
  year = "2025",
  address = "Vienna, Austria",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.sigtyp-1.9/",
  doi = "10.18653/v1/2025.sigtyp-1.9",
  pages = "75--81",
  ISBN = "979-8-89176-281-7",
  abstract = "In this work, we introduce XCOMPS, a multilingual conceptual minimal pair dataset that covers 17 languages. Using this dataset, we evaluate LLMs' multilingual conceptual understanding through metalinguistic prompting, direct probability measurement, and neurolinguistic probing. We find that: 1) LLMs exhibit weaker conceptual understanding for low-resource languages, and accuracy varies across languages despite being tested on the same concept sets. 2) LLMs excel at distinguishing concept-property pairs that are visibly different but exhibit a marked performance drop when negative pairs share subtle semantic similarities. 3) More morphologically complex languages yield lower concept understanding scores and require deeper layers for conceptual reasoning."
}

In this work, we introduce XCOMPS, a multilingual conceptual minimal pair dataset that covers 17 languages. Using this dataset, we evaluate LLMs' multilingual conceptual understanding through metalinguistic prompting, direct probability measurement, and neurolinguistic probing. We find that: 1) LLMs exhibit weaker conceptual understanding for low-resource languages, and accuracy varies across languages despite being tested on the same concept sets. 2) LLMs excel at distinguishing concept-property pairs that are visibly different but exhibit a marked performance drop when negative pairs share subtle semantic similarities. 3) More morphologically complex languages yield lower concept understanding scores and require deeper layers for conceptual reasoning.
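
Of the three evaluation settings, direct probability measurement is the most mechanical; the sketch below shows one standard way to implement it with Hugging Face Transformers, where a model passes a minimal pair if it assigns a higher total log-probability to the conceptually sound member. The model name and the example pair are placeholders, not XCOMPS items.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # any causal LM
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def total_logprob(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss            # mean NLL per predicted token
    return -loss.item() * (ids.shape[1] - 1)          # undo the mean

good, bad = "A raven has wings.", "A raven has gills."   # placeholder minimal pair
print(total_logprob(good) > total_logprob(bad))          # True if the model "passes"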

LLM Dependency Parsing with In-Context Rules.
Ginn, M.; and Palmer, A.
In Fei, H.; Tu, K.; Zhang, Y.; Hu, X.; Han, W.; Jia, Z.; Zheng, Z.; Cao, Y.; Zhang, M.; Lu, W.; Siddharth, N.; Øvrelid, L.; Xue, N.; and Zhang, Y., editor(s), Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025), pages 186–196, Vienna, Austria, August 2025. Association for Computational Linguistics.

@inproceedings{ginn-palmer-2025-llm,
  title = "{LLM} Dependency Parsing with In-Context Rules",
  author = "Ginn, Michael and
    Palmer, Alexis",
  editor = "Fei, Hao and
    Tu, Kewei and
    Zhang, Yuhui and
    Hu, Xiang and
    Han, Wenjuan and
    Jia, Zixia and
    Zheng, Zilong and
    Cao, Yixin and
    Zhang, Meishan and
    Lu, Wei and
    Siddharth, N. and
    {\O}vrelid, Lilja and
    Xue, Nianwen and
    Zhang, Yue",
  booktitle = "Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)",
  month = aug,
  year = "2025",
  address = "Vienna, Austria",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.xllm-1.17/",
  doi = "10.18653/v1/2025.xllm-1.17",
  pages = "186--196",
  ISBN = "979-8-89176-286-2",
  abstract = "We study whether incorporating rules (in various formats) can aid large language models to perform dependency parsing. We consider a paradigm in which LLMs first produce symbolic rules given fully labeled examples, and the rules are then provided in a subsequent call that performs the actual parsing. In addition, we experiment with providing human-created annotation guidelines in-context to the LLMs. We test on eight low-resource languages from Universal Dependencies, finding that while both methods for rule incorporation improve zero-shot performance, the benefit disappears with a few labeled in-context examples."
}

We study whether incorporating rules (in various formats) can aid large language models to perform dependency parsing. We consider a paradigm in which LLMs first produce symbolic rules given fully labeled examples, and the rules are then provided in a subsequent call that performs the actual parsing. In addition, we experiment with providing human-created annotation guidelines in-context to the LLMs. We test on eight low-resource languages from Universal Dependencies, finding that while both methods for rule incorporation improve zero-shot performance, the benefit disappears with a few labeled in-context examples.
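
The two-call paradigm is easy to picture in code; the outline below is an illustrative sketch assuming a generic llm() chat helper, not the authors' prompts or pipeline.

def llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError

def induce_rules(labeled_examples: str) -> str:
    # Call 1: turn fully labeled parses into symbolic rules.
    return llm(
        "Here are sentences with gold dependency parses:\n"
        f"{labeled_examples}\n"
        "State general head-finding and relation-labeling rules for this language."
    )

def parse_with_rules(sentence: str, rules: str) -> str:
    # Call 2: parse a new sentence with the induced rules in context.
    return llm(
        f"Rules for this language:\n{rules}\n\n"
        "Using these rules, give the head and relation for each token in:\n"
        f"{sentence}"
    )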

How Humans and LLMs Organize Conceptual Knowledge: Exploring Subordinate Categories in Italian.
Pedrotti, A.; Rambelli, G.; Villani, C.; and Bolognesi, M.
In Che, W.; Nabende, J.; Shutova, E.; and Pilehvar, M. T., editor(s), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4464–4482, Vienna, Austria, July 2025. Association for Computational Linguistics.

@inproceedings{pedrotti-etal-2025-humans,
  title = "How Humans and {LLM}s Organize Conceptual Knowledge: Exploring Subordinate Categories in {I}talian",
  author = "Pedrotti, Andrea and
    Rambelli, Giulia and
    Villani, Caterina and
    Bolognesi, Marianna",
  editor = "Che, Wanxiang and
    Nabende, Joyce and
    Shutova, Ekaterina and
    Pilehvar, Mohammad Taher",
  booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = jul,
  year = "2025",
  address = "Vienna, Austria",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.acl-long.224/",
  doi = "10.18653/v1/2025.acl-long.224",
  pages = "4464--4482",
  ISBN = "979-8-89176-251-0",
  abstract = "People can categorize the same entity at multiple taxonomic levels, such as basic (bear), superordinate (animal), and subordinate (grizzly bear). While prior research has focused on basic-level categories, this study is the first attempt to examine the organization of categories by analyzing exemplars produced at the subordinate level. We present a new Italian psycholinguistic dataset of human-generated exemplars for 187 concrete words. We then leverage these data to evaluate whether textual and vision LLMs produce meaningful exemplars that align with human category organization across three key tasks: exemplar generation, category induction, and typicality judgment. Our findings show a low alignment between humans and LLMs, consistent with previous studies. However, their performance varies notably across different semantic domains. Ultimately, this study highlights both the promises and the constraints of using AI-generated exemplars to support psychological and linguistic research."
}

People can categorize the same entity at multiple taxonomic levels, such as basic (bear), superordinate (animal), and subordinate (grizzly bear). While prior research has focused on basic-level categories, this study is the first attempt to examine the organization of categories by analyzing exemplars produced at the subordinate level. We present a new Italian psycholinguistic dataset of human-generated exemplars for 187 concrete words. We then leverage these data to evaluate whether textual and vision LLMs produce meaningful exemplars that align with human category organization across three key tasks: exemplar generation, category induction, and typicality judgment. Our findings show a low alignment between humans and LLMs, consistent with previous studies. However, their performance varies notably across different semantic domains. Ultimately, this study highlights both the promises and the constraints of using AI-generated exemplars to support psychological and linguistic research.
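
As an illustration of the first task, exemplar generation can be scored as overlap between a model's list and the human-produced list for the same category; everything here (the helper, the prompt, the scoring) is a hypothetical sketch, not the paper's protocol.

def ask_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError

def exemplar_overlap(category: str, human_exemplars: set[str]) -> float:
    # Prompt for exemplars, then measure recall of the human-generated set.
    reply = ask_llm(f"List ten typical kinds of {category}, comma-separated.")
    model_exemplars = {e.strip().lower() for e in reply.split(",")}
    return len(model_exemplars & human_exemplars) / len(human_exemplars)

# e.g. exemplar_overlap("dog", {"labrador", "poodle", "german shepherd"})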

Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books.
Zhang, C.; Lin, J.; Liu, X.; Zhang, Z.; and Feng, Y.
In Che, W.; Nabende, J.; Shutova, E.; and Pilehvar, M. T., editor(s), Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3977–3997, Vienna, Austria, July 2025. Association for Computational Linguistics.

@inproceedings{zhang-etal-2025-read,
  title = "Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books",
  author = "Zhang, Chen and
    Lin, Jiuheng and
    Liu, Xiao and
    Zhang, Zekai and
    Feng, Yansong",
  editor = "Che, Wanxiang and
    Nabende, Joyce and
    Shutova, Ekaterina and
    Pilehvar, Mohammad Taher",
  booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month = jul,
  year = "2025",
  address = "Vienna, Austria",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.acl-long.202/",
  doi = "10.18653/v1/2025.acl-long.202",
  pages = "3977--3997",
  ISBN = "979-8-89176-251-0",
  abstract = "While large language models (LLMs) have shown promise in translating extremely low-resource languages using resources like dictionaries, the effectiveness of grammar books remains debated. This paper investigates the role of grammar books in translating extremely low-resource languages by decomposing it into two key steps: grammar rule retrieval and application. To facilitate the study, we introduce ZhuangRules, a modularized dataset of grammar rules and their corresponding test sentences. Our analysis reveals that rule retrieval constitutes a primary bottleneck in grammar-based translation. Moreover, although LLMs can apply simple rules for translation when explicitly provided, they encounter difficulties in handling more complex rules. To address these challenges, we propose representing grammar rules as code functions, considering their similarities in structure and the benefit of code in facilitating LLM reasoning. Our experiments show that using code rules significantly boosts both rule retrieval and application, ultimately resulting in a 13.1{\%} BLEU improvement in translation."
}

While large language models (LLMs) have shown promise in translating extremely low-resource languages using resources like dictionaries, the effectiveness of grammar books remains debated. This paper investigates the role of grammar books in translating extremely low-resource languages by decomposing it into two key steps: grammar rule retrieval and application. To facilitate the study, we introduce ZhuangRules, a modularized dataset of grammar rules and their corresponding test sentences. Our analysis reveals that rule retrieval constitutes a primary bottleneck in grammar-based translation. Moreover, although LLMs can apply simple rules for translation when explicitly provided, they encounter difficulties in handling more complex rules. To address these challenges, we propose representing grammar rules as code functions, considering their similarities in structure and the benefit of code in facilitating LLM reasoning. Our experiments show that using code rules significantly boosts both rule retrieval and application, ultimately resulting in a 13.1% BLEU improvement in translation.
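
The core idea of code-form rules can be shown in a few lines of Python; the function below is a simplified illustration of how a prose word-order rule from a grammar book might be made executable, not an actual ZhuangRules entry.

def apply_noun_modifier_order(noun: str, modifier: str) -> str:
    """Zhuang, like other Tai languages, typically places modifiers
    after the noun, so English 'big house' surfaces as noun + modifier."""
    return f"{noun} {modifier}"

# Retrieval can match on the docstring; application is just calling the
# function, giving the LLM an explicit, checkable step to reason over.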

Linguistic Blind Spots of Large Language Models.
Cheng, J.; and Amiri, H.
In Kuribayashi, T.; Rambelli, G.; Takmaz, E.; Wicke, P.; Li, J.; and Oh, B., editor(s), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 1–17, Albuquerque, New Mexico, USA, May 2025. Association for Computational Linguistics.

@inproceedings{cheng-amiri-2025-linguistic,
  title = "Linguistic Blind Spots of Large Language Models",
  author = "Cheng, Jiali and
    Amiri, Hadi",
  editor = "Kuribayashi, Tatsuki and
    Rambelli, Giulia and
    Takmaz, Ece and
    Wicke, Philipp and
    Li, Jixing and
    Oh, Byung-Doh",
  booktitle = "Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics",
  month = may,
  year = "2025",
  address = "Albuquerque, New Mexico, USA",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2025.cmcl-1.3/",
  doi = "10.18653/v1/2025.cmcl-1.3",
  pages = "1--17",
  ISBN = "979-8-89176-227-5",
  abstract = "Large language models (LLMs) serve as the foundation of numerous AI applications today. However, despite their remarkable proficiency in generating coherent text, questions linger regarding their ability to perform fine-grained linguistic annotation tasks, such as detecting nouns or verbs, or identifying more complex syntactic structures like clauses or T-units in input texts. These tasks require precise syntactic and semantic understanding of input text, and when LLMs underperform on specific linguistic structures, it raises concerns about their reliability for detailed linguistic analysis and whether their (even correct) outputs truly reflect an understanding of the inputs. In this paper, we empirically study recent LLMs' performance across fine-grained linguistic annotation tasks. Through a series of experiments, we find that recent LLMs show limited efficacy in addressing linguistic queries and often struggle with linguistically complex inputs. We show that the most capable LLM (Llama3-70b) makes notable errors in detecting linguistic structures, such as misidentifying embedded clauses, failing to recognize verb phrases, and confusing complex nominals with clauses. Our study provides valuable insights to inform future endeavors in LLM design and development."
}

Large language models (LLMs) serve as the foundation of numerous AI applications today. However, despite their remarkable proficiency in generating coherent text, questions linger regarding their ability to perform fine-grained linguistic annotation tasks, such as detecting nouns or verbs, or identifying more complex syntactic structures like clauses or T-units in input texts. These tasks require precise syntactic and semantic understanding of input text, and when LLMs underperform on specific linguistic structures, it raises concerns about their reliability for detailed linguistic analysis and whether their (even correct) outputs truly reflect an understanding of the inputs. In this paper, we empirically study recent LLMs' performance across fine-grained linguistic annotation tasks. Through a series of experiments, we find that recent LLMs show limited efficacy in addressing linguistic queries and often struggle with linguistically complex inputs. We show that the most capable LLM (Llama3-70b) makes notable errors in detecting linguistic structures, such as misidentifying embedded clauses, failing to recognize verb phrases, and confusing complex nominals with clauses. Our study provides valuable insights to inform future endeavors in LLM design and development.
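
The sketch below shows the general shape of such an annotation probe: ask the model for a structure, then score its spans against a gold annotation with F1. The helper and the scoring granularity are illustrative assumptions, not the paper's exact setup.

def ask_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError

def verb_phrase_f1(sentence: str, gold_vps: set[str]) -> float:
    # Extract predicted verb-phrase spans from the model's reply.
    reply = ask_llm(f"List every verb phrase in this sentence, one per line:\n{sentence}")
    pred = {vp.strip().lower() for vp in reply.splitlines() if vp.strip()}
    gold = {vp.lower() for vp in gold_vps}
    hit = len(pred & gold)
    if not pred or not gold or hit == 0:
        return 0.0
    p, r = hit / len(pred), hit / len(gold)
    return 2 * p * r / (p + r)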

Can LLMs help create grammar?: Automating grammar creation for endangered languages with in-context learning.
Spencer, P. T.; and Kongborrirak, N.
In Rambow, O.; Wanner, L.; Apidianaki, M.; Al-Khalifa, H.; Eugenio, B. D.; and Schockaert, S., editor(s), Proc. of COLING, 2025.

@inproceedings{spencer-25,
  title = {Can {LLMs} help create grammar?: {A}utomating grammar creation for endangered languages with in-context learning},
  url = {https://aclanthology.org/2025.coling-main.681/},
  booktitle = {Proc. of {COLING}},
  author = {Spencer, Piyapath T. and Kongborrirak, Nanthipat},
  editor = {Rambow, Owen and Wanner, Leo and Apidianaki, Marianna and {Al-Khalifa}, Hend and Eugenio, Barbara Di and Schockaert, Steven},
  year = {2025}
}