Controllable Text Generation for All Ages: Evaluating a Plug-and-Play Approach to Age-Adapted Dialogue.
Lennert Jansen, Štěpán Lars Laichter, Arabella Sinclair, Margot van der Goot, Raquel Fernández, & Sandro Pezzelle.
In Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM), pages 172–188, Abu Dhabi, United Arab Emirates (Hybrid), 2022. Association for Computational Linguistics.

@inproceedings{jansen-etal-2022-controllable,
    title = "Controllable Text Generation for All Ages: Evaluating a Plug-and-Play Approach to Age-Adapted Dialogue",
    author = "Jansen, Lennert and
      Laichter, {\v{S}}t{\v{e}}p{\'a}n Lars and
      Sinclair, Arabella and
      van der Goot, Margot and
      Fern{\'a}ndez, Raquel and
      Pezzelle, Sandro",
    booktitle = "Proceedings of the 2nd Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)",
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.gem-1.14",
    pages = "172--188",
    abstract = "To be trusted and perceived as natural and coherent, conversational systems must adapt to the language of their users. While personalized dialogue is a promising direction, controlling generation for fine-grained language features remains a challenge in this approach. A recent line of research showed the effectiveness of leveraging pre-trained language models toward adapting to a text{'}s topic or sentiment. In this study, we build on these approaches and focus on a higher-level dimension of language variation: speakers{'} age. We frame the task as a dialogue response generation, and test methods based on bag-of-words (BoW) and neural discriminators (Disc) to condition the output of GPT-2 and DialoGPT without altering the parameters of the language models. We show that Disc models achieve a higher degree of detectable control than BoW models based on automatic evaluation. In contrast, humans can partially detect age differences in BoW but not Disc responses. Since BoW responses are deemed better than Disc ones by humans, simple controllable methods thus appear to be a better tradeoff between adaptation and language quality. Our work confirms the challenges of adapting to higher-level dimensions of language variation. Moreover, it highlights the need to evaluate natural language generation thoroughly.",
}

To be trusted and perceived as natural and coherent, conversational systems must adapt to the language of their users. While personalized dialogue is a promising direction, controlling generation for fine-grained language features remains a challenge in this approach. A recent line of research showed the effectiveness of leveraging pre-trained language models toward adapting to a text's topic or sentiment. In this study, we build on these approaches and focus on a higher-level dimension of language variation: speakers' age. We frame the task as a dialogue response generation, and test methods based on bag-of-words (BoW) and neural discriminators (Disc) to condition the output of GPT-2 and DialoGPT without altering the parameters of the language models. We show that Disc models achieve a higher degree of detectable control than BoW models based on automatic evaluation. In contrast, humans can partially detect age differences in BoW but not Disc responses. Since BoW responses are deemed better than Disc ones by humans, simple controllable methods thus appear to be a better tradeoff between adaptation and language quality. Our work confirms the challenges of adapting to higher-level dimensions of language variation. Moreover, it highlights the need to evaluate natural language generation thoroughly.

Stop Measuring Calibration When Humans Disagree.
Joris Baan, Wilker Aziz, Barbara Plank, & Raquel Fernández.
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1892–1915, Abu Dhabi, United Arab Emirates, 2022. Association for Computational Linguistics.

@inproceedings{baan-etal-2022-emnlp,
    title = "Stop Measuring Calibration When Humans Disagree",
    author = "Baan, Joris and
      Aziz, Wilker and
      Plank, Barbara and
      Fern{\'a}ndez, Raquel",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.124",
    pages = "1892--1915",
    abstract = "Calibration is a popular framework to evaluate whether a classifier knows when it does not know - i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to human majority has been measured on tasks where humans inherently disagree about which class applies. We show that measuring calibration to human majority given inherent disagreements is theoretically problematic, demonstrate this empirically on the ChaosNLI dataset, and derive several instance-level measures of calibration that capture key statistical properties of human judgements - including class frequency, ranking and entropy.",
}

Calibration is a popular framework to evaluate whether a classifier knows when it does not know - i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to human majority has been measured on tasks where humans inherently disagree about which class applies. We show that measuring calibration to human majority given inherent disagreements is theoretically problematic, demonstrate this empirically on the ChaosNLI dataset, and derive several instance-level measures of calibration that capture key statistical properties of human judgements - including class frequency, ranking and entropy.

Towards Pragmatic Production Strategies for Natural Language Generation Tasks.
Mario Giulianelli.
In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 7978–7984, Abu Dhabi, United Arab Emirates, 2022. Association for Computational Linguistics.

@inproceedings{giulianelli-2022-emnlp,
    title = "Towards Pragmatic Production Strategies for Natural Language Generation Tasks",
    author = "Giulianelli, Mario",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.emnlp-main.544",
    pages = "7978--7984",
    abstract = "This position paper proposes a conceptual framework for the design of Natural Language Generation (NLG) systems that follow efficient and effective production strategies in order to achieve complex communicative goals. In this general framework, efficiency is characterised as the parsimonious regulation of production and comprehension costs while effectiveness is measured with respect to task-oriented and contextually grounded communicative goals. We provide concrete suggestions for the estimation of goals, costs, and utility via modern statistical methods, demonstrating applications of our framework to the classic pragmatic task of visually grounded referential games and to abstractive text summarisation, two popular generation tasks with real-world applications. In sum, we advocate for the development of NLG systems that learn to make pragmatic production decisions from experience, by reasoning about goals, costs, and utility in a human-like way.",
}

This position paper proposes a conceptual framework for the design of Natural Language Generation (NLG) systems that follow efficient and effective production strategies in order to achieve complex communicative goals. In this general framework, efficiency is characterised as the parsimonious regulation of production and comprehension costs while effectiveness is measured with respect to task-oriented and contextually grounded communicative goals. We provide concrete suggestions for the estimation of goals, costs, and utility via modern statistical methods, demonstrating applications of our framework to the classic pragmatic task of visually grounded referential games and to abstractive text summarisation, two popular generation tasks with real-world applications. In sum, we advocate for the development of NLG systems that learn to make pragmatic production decisions from experience, by reasoning about goals, costs, and utility in a human-like way.

Alignment of code switching varies with proficiency in second language learning dialogue.
Arabella J. Sinclair, & Raquel Fernández.
System, Special Issue on Linguistic alignment in Second Language Acquisition: occurrences, learning effects, and beyond, 2022. Elsevier.

@article{sinclair-fernandez-system-2022,
    title = "Alignment of code switching varies with proficiency in second language learning dialogue",
    author = "Arabella J. Sinclair and Raquel Fern{\'a}ndez",
    journal = "System. Special Issue on Linguistic alignment in Second Language Acquisition: occurrences, learning effects, and beyond",
    year = "2022",
    issn = "0346-251X",
    publisher = "Elsevier",
    doi = "10.1016/j.system.2022.102952",
    url = "https://www.sciencedirect.com/science/article/pii/S0346251X22002342",
    abstract = "Speakers in dialogue tend to adopt the language patterns of the other, aligning their language to their interlocutor. This can happen at many levels of communication, including the tendency to code switch (CS), or change to another language. Alignment has often been considered the result of an unconscious automatic process that facilitates speakers' mutual understanding. In dialogues with a second language (L2) learner, alignment is constrained by the proficiency of the learner, and additional non-automatic processes will be at play, namely the individual pedagogical goals of learner and tutor. In this study, we investigate alignment in dialogues between Spanish/Catalan learners of English and their tutors. We analyse CS incidence, whether code switching can be explained as automatic alignment between speakers, and whether this is independent of other, non-automatic factors related to speakers' goals. We find that alignment of code switching is present, varies with learner proficiency, and that code switching can additionally be triggered by lexical overlap and turn taking asymmetry, which we attribute to conscious pedagogical choices on the part of both tutor, at lower levels, and learner, at higher levels of student proficiency.",
}

Speakers in dialogue tend to adopt the language patterns of the other, aligning their language to their interlocutor. This can happen at many levels of communication, including the tendency to code switch (CS), or change to another language. Alignment has often been considered the result of an unconscious automatic process that facilitates speakers' mutual understanding. In dialogues with a second language (L2) learner, alignment is constrained by the proficiency of the learner, and additional non-automatic processes will be at play, namely the individual pedagogical goals of learner and tutor. In this study, we investigate alignment in dialogues between Spanish/Catalan learners of English and their tutors. We analyse CS incidence, whether code switching can be explained as automatic alignment between speakers, and whether this is independent of other, non-automatic factors related to speakers' goals. We find that alignment of code switching is present, varies with learner proficiency, and that code switching can additionally be triggered by lexical overlap and turn taking asymmetry, which we attribute to conscious pedagogical choices on the part of both tutor, at lower levels, and learner, at higher levels of student proficiency.

Structural Persistence in Language Models: Priming as a Window into Abstract Language Representations.
Arabella Sinclair, Jaap Jumelet, Willem Zuidema, & Raquel Fernández.
Transactions of the Association for Computational Linguistics (TACL), 2022.

@article{sinclair-etal-2022-tacl,
    title = "Structural Persistence in Language Models: Priming as a Window into Abstract Language Representations",
    author = "Arabella Sinclair and Jaap Jumelet and Willem Zuidema and Raquel Fern{\'a}ndez",
    journal = "Transactions of the Association for Computational Linguistics (TACL)",
    year = "2022",
    url = "https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00504/113019/Structural-Persistence-in-Language-Models-Priming",
    abstract = "We investigate the extent to which modern, neural language models are susceptible to structural priming, the phenomenon whereby the structure of a sentence makes the same structure more probable in a follow-up sentence. We explore how priming can be used to study the potential of these models to learn abstract structural information, which is a prerequisite for good performance on tasks that require natural language understanding skills. We introduce a novel metric and release PRIME-LM, a large corpus where we control for various linguistic factors which interact with priming strength. We find that Transformer models indeed show evidence of structural priming, but also that the generalisations they learned are to some extent modulated by semantic information. Our experiments also show that the representations acquired by the models may not only encode abstract sequential structure but involve a certain level of hierarchical syntactic information. More generally, our study shows that the priming paradigm is a useful, additional tool for gaining insights into the capacities of language models and opens the door to future priming-based investigations that probe the model's internal states.",
}

We investigate the extent to which modern, neural language models are susceptible to structural priming, the phenomenon whereby the structure of a sentence makes the same structure more probable in a follow-up sentence. We explore how priming can be used to study the potential of these models to learn abstract structural information, which is a prerequisite for good performance on tasks that require natural language understanding skills. We introduce a novel metric and release PRIME-LM, a large corpus where we control for various linguistic factors which interact with priming strength. We find that Transformer models indeed show evidence of structural priming, but also that the generalisations they learned are to some extent modulated by semantic information. Our experiments also show that the representations acquired by the models may not only encode abstract sequential structure but involve a certain level of hierarchical syntactic information. More generally, our study shows that the priming paradigm is a useful, additional tool for gaining insights into the capacities of language models and opens the door to future priming-based investigations that probe the model's internal states.

AnaLog: Testing Analytical and Deductive Logic Learnability in Language Models.
Samuel Ryb, Mario Giulianelli, Arabella Sinclair, & Raquel Fernández.
In Proceedings of the 11th Joint Conference on Lexical and Computational Semantics, pages 55–68, Seattle, Washington, July 2022. Association for Computational Linguistics.

@inproceedings{ryb-etal-2022-analog,
    title = "{A}na{L}og: Testing Analytical and Deductive Logic Learnability in Language Models",
    author = "Ryb, Samuel and
      Giulianelli, Mario and
      Sinclair, Arabella and
      Fern{\'a}ndez, Raquel",
    booktitle = "Proceedings of the 11th Joint Conference on Lexical and Computational Semantics",
    month = jul,
    year = "2022",
    address = "Seattle, Washington",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.starsem-1.5",
    pages = "55--68",
    abstract = "We investigate the extent to which pre-trained language models acquire analytical and deductive logical reasoning capabilities as a side effect of learning word prediction. We present AnaLog, a natural language inference task designed to probe models for these capabilities, controlling for different invalid heuristics the models may adopt instead of learning the desired generalisations. We test four language models on AnaLog, finding that they have all learned, to a different extent, to encode information that is predictive of entailment beyond shallow heuristics such as lexical overlap and grammaticality. We closely analyse the best performing language model and show that while it performs more consistently than other language models across logical connectives and reasoning domains, it still is sensitive to lexical and syntactic variations in the realisation of logical statements.",
}

We investigate the extent to which pre-trained language models acquire analytical and deductive logical reasoning capabilities as a side effect of learning word prediction. We present AnaLog, a natural language inference task designed to probe models for these capabilities, controlling for different invalid heuristics the models may adopt instead of learning the desired generalisations. We test four language models on AnaLog, finding that they have all learned, to a different extent, to encode information that is predictive of entailment beyond shallow heuristics such as lexical overlap and grammaticality. We closely analyse the best performing language model and show that while it performs more consistently than other language models across logical connectives and reasoning domains, it still is sensitive to lexical and syntactic variations in the realisation of logical statements.

Less Descriptive yet Discriminative: Quantifying the Properties of Multimodal Referring Utterances via CLIP.
Ece Takmaz, Sandro Pezzelle, & Raquel Fernández.
In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 36–42, Dublin, Ireland, May 2022. Association for Computational Linguistics.

@inproceedings{takmaz-etal-2022-cmcl,
    title = "Less Descriptive yet Discriminative: Quantifying the Properties of Multimodal Referring Utterances via {CLIP}",
    author = "Takmaz, Ece and
      Pezzelle, Sandro and
      Fern{\'a}ndez, Raquel",
    booktitle = "Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.cmcl-1.4",
    pages = "36--42",
    abstract = "In this work, we use a transformer-based pre-trained multimodal model, CLIP, to shed light on the mechanisms employed by human speakers when referring to visual entities. In particular, we use CLIP to quantify the degree of descriptiveness (how well an utterance describes an image in isolation) and discriminativeness (to what extent an utterance is effective in picking out a single image among similar images) of human referring utterances within multimodal dialogues. Overall, our results show that utterances become less descriptive over time while their discriminativeness remains unchanged. Through analysis, we propose that this trend could be due to participants relying on the previous mentions in the dialogue history, as well as being able to distill the most discriminative information from the visual context. In general, our study opens up the possibility of using this and similar models to quantify patterns in human data and shed light on the underlying cognitive mechanisms.",
}

In this work, we use a transformer-based pre-trained multimodal model, CLIP, to shed light on the mechanisms employed by human speakers when referring to visual entities. In particular, we use CLIP to quantify the degree of descriptiveness (how well an utterance describes an image in isolation) and discriminativeness (to what extent an utterance is effective in picking out a single image among similar images) of human referring utterances within multimodal dialogues. Overall, our results show that utterances become less descriptive over time while their discriminativeness remains unchanged. Through analysis, we propose that this trend could be due to participants relying on the previous mentions in the dialogue history, as well as being able to distill the most discriminative information from the visual context. In general, our study opens up the possibility of using this and similar models to quantify patterns in human data and shed light on the underlying cognitive mechanisms.

Do Not Fire the Linguist: Grammatical Profiles Help Language Models Detect Semantic Change.
Mario Giulianelli, Andrey Kutuzov, & Lidia Pivovarova.
In Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change, pages 54–67, Dublin, Ireland, May 2022. Association for Computational Linguistics.

@inproceedings{giulianelli-etal-2022-fire,
    title = "Do Not Fire the Linguist: Grammatical Profiles Help Language Models Detect Semantic Change",
    author = "Giulianelli, Mario and
      Kutuzov, Andrey and
      Pivovarova, Lidia",
    booktitle = "Proceedings of the 3rd Workshop on Computational Approaches to Historical Language Change",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.lchange-1.6",
    pages = "54--67",
    abstract = "Morphological and syntactic changes in word usage {---} as captured, e.g., by grammatical profiles {---} have been shown to be good predictors of a word{'}s meaning change. In this work, we explore whether large pre-trained contextualised language models, a common tool for lexical semantic change detection, are sensitive to such morphosyntactic changes. To this end, we first compare the performance of grammatical profiles against that of a multilingual neural language model (XLM-R) on 10 datasets, covering 7 languages, and then combine the two approaches in ensembles to assess their complementarity. Our results show that ensembling grammatical profiles with XLM-R improves semantic change detection performance for most datasets and languages. This indicates that language models do not fully cover the fine-grained morphological and syntactic signals that are explicitly represented in grammatical profiles. An interesting exception are the test sets where the time spans under analysis are much longer than the time gap between them (for example, century-long spans with a one-year gap between them). Morphosyntactic change is slow so grammatical profiles do not detect semantic change in such cases. In contrast, language models, thanks to their access to lexical information, are able to detect fast topical changes.",
}

Morphological and syntactic changes in word usage — as captured, e.g., by grammatical profiles — have been shown to be good predictors of a word's meaning change. In this work, we explore whether large pre-trained contextualised language models, a common tool for lexical semantic change detection, are sensitive to such morphosyntactic changes. To this end, we first compare the performance of grammatical profiles against that of a multilingual neural language model (XLM-R) on 10 datasets, covering 7 languages, and then combine the two approaches in ensembles to assess their complementarity. Our results show that ensembling grammatical profiles with XLM-R improves semantic change detection performance for most datasets and languages. This indicates that language models do not fully cover the fine-grained morphological and syntactic signals that are explicitly represented in grammatical profiles. An interesting exception are the test sets where the time spans under analysis are much longer than the time gap between them (for example, century-long spans with a one-year gap between them). Morphosyntactic change is slow so grammatical profiles do not detect semantic change in such cases. In contrast, language models, thanks to their access to lexical information, are able to detect fast topical changes.