Lexical choice criteria in language generation. Stede, M. In Proceedings, Sixth conference of the European Chapter of the Association for Computational Linguistics, pages 454–459, Utrecht, April, 1993.

In natural language generation (NLG), a semantic representation of some kind (possibly enriched with pragmatic attributes) is successively transformed into one or more linguistic utterances. No matter what particular architecture is chosen to organize this process, one of the crucial decisions to be made is lexicalization: selecting words that adequately express the content that is to be communicated and, if represented, the intentions and attitudes of the speaker. Nirenburg and Nirenburg [1988] give this example to illustrate the lexical choice problem: If we want to express the meaning "a person whose sex is male and whose age is between 13 and 15 years", then candidate realizations include: boy, kid, teenager, youth, child, young man, schoolboy, adolescent, man. The criteria influencing such choices remain largely in the dark, however.
As it happens, the problem of lexical choice has not been a particularly popular one in NLG. For instance, Marcus [1987] complained that most generators don't really choose words at all; McDonald [1991], amongst others, lamented that lexical choice has attracted only very little attention in the research community. Implemented generators tend to provide a one-to-one mapping from semantic units to lexical items, and their producers occasionally acknowledge this as a shortcoming (e.g., [Novak, 1991, p. 666]); thereby the task of lexical choice becomes a non-issue. For many applications, this is indeed a feasible scheme, because the sub-language under consideration can be sufficiently restricted such that a direct mapping from content to words does not present a drawback — the generator is implicitly tailored towards the type of situation (or register) in which it operates. But in general, with an eye on more expressive and versatile generators, this state of affairs calls for improvement.
Why is lexical choice difficult? Unlike many other decisions in generation (e.g., whether to express an attribute of an object as a relative clause or an adjective) the choice of a word very often carries implicatures that can change the overall message significantly — if in some sentence the word boy is replaced with one of the alternatives above, the meaning shifts considerably. Also, often there are quite a few similar lexical options available to a speaker, whereas the number of possible syntactic sentence constructions is more limited. To solve the choice problem, first of all the differences between similar words have to be represented in the lexicon, and the criteria for choosing among them have to be established. In the following, I give a tentative list of choice criteria, classify them into constraints and preferences, and outline a (partly implemented) model of lexicalization that can be incorporated into language generators.
@InProceedings{ stede10,
author = {Manfred Stede},
title = {Lexical choice criteria in language generation},
booktitle = {Proceedings, Sixth conference of the European Chapter of
the Association for Computational Linguistics},
address = {Utrecht},
month = {April},
year = {1993},
pages = {454--459},
abstract = {<P>In natural language generation (NLG), a semantic
representation of some kind --- possibly enriched with
pragmatic attributes --- is successively transformed into
one or more linguistic utterances. No matter what
particular architecture is chosen to organize this process,
one of the crucial decisions to be made is lexicalization:
selecting words that adequately express the content that is
to be communicated and, if represented, the intentions and
attitudes of the speaker. Nirenburg and Nirenburg [1988]
give this example to illustrate the lexical choice problem:
If we want to express the meaning <I>a person whose sex is
male and whose age is between 13 and 15 years</I>, then
candidate realizations include: <I>boy, kid, teenager,
youth, child, young man, schoolboy, adolescent, man</I>.
The criteria influencing such choices remain largely in the
dark, however.</p> <P> As it happens, the problem of
lexical choice has not been a particularly popular one in
NLG. For instance, Marcus [1987] complained that most
generators don't really choose words at all; McDonald
[1991], amongst others, lamented that lexical choice has
attracted only very little attention in the research
community. Implemented generators tend to provide a
one-to-one mapping from semantic units to lexical items,
and their producers occasionally acknowledge this as a
shortcoming (e.g., [Novak, 1991, p. 666]); thereby the task
of lexical choice becomes a non-issue. For many
applications, this is indeed a feasible scheme, because the
sub-language under consideration can be sufficiently
restricted such that a direct mapping from content to words
does not present a drawback --- the generator is implicitly
tailored towards the type of situation (or register) in
which it operates. But in general, with an eye on more
expressive and versatile generators, this state of affairs
calls for improvement.</p> <P> Why is lexical choice
difficult? Unlike many other decisions in generation (e.g.,
whether to express an attribute of an object as a relative
clause or an adjective) the choice of a word very often
carries implicatures that can change the overall message
significantly --- if in some sentence the word <I>boy</I>
is replaced with one of the alternatives above, the meaning
shifts considerably. Also, often there are quite a few
similar lexical options available to a speaker, whereas the
number of possible syntactic sentence constructions is more
limited. To solve the choice problem, first of all the
differences between similar words have to be represented in
the lexicon, and the criteria for choosing among them have
to be established. In the following, I give a tentative
list of choice criteria, classify them into constraints and
preferences, and outline a (partly implemented) model of
lexicalization that can be incorporated into language
generators.</p>},
download = {http://ftp.cs.toronto.edu/pub/gh/Stede-1993.pdf}
}