Lexical choice criteria in language generation. Stede, M. In Proceedings, Sixth conference of the European Chapter of the Association for Computational Linguistics, pages 454–459, Utrecht, April 1993.

In natural language generation (NLG), a semantic representation of some kind, possibly enriched with pragmatic attributes, is successively transformed into one or more linguistic utterances. No matter what particular architecture is chosen to organize this process, one of the crucial decisions to be made is lexicalization: selecting words that adequately express the content that is to be communicated and, if represented, the intentions and attitudes of the speaker. Nirenburg and Nirenburg [1988] give this example to illustrate the lexical choice problem: If we want to express the meaning "a person whose sex is male and whose age is between 13 and 15 years", then candidate realizations include: boy, kid, teenager, youth, child, young man, schoolboy, adolescent, man. The criteria influencing such choices remain largely in the dark, however.

As it happens, the problem of lexical choice has not been a particularly popular one in NLG. For instance, Marcus [1987] complained that most generators don't really choose words at all; McDonald [1991], amongst others, lamented that lexical choice has attracted only very little attention in the research community. Implemented generators tend to provide a one-to-one mapping from semantic units to lexical items, and their producers occasionally acknowledge this as a shortcoming (e.g., [Novak, 1991, p. 666]); thereby the task of lexical choice becomes a non-issue. For many applications, this is indeed a feasible scheme, because the sub-language under consideration can be sufficiently restricted such that a direct mapping from content to words does not present a drawback: the generator is implicitly tailored towards the type of situation (or register) in which it operates. But in general, with an eye on more expressive and versatile generators, this state of affairs calls for improvement.

Why is lexical choice difficult? Unlike many other decisions in generation (e.g., whether to express an attribute of an object as a relative clause or an adjective), the choice of a word very often carries implicatures that can change the overall message significantly: if in some sentence the word "boy" is replaced with one of the alternatives above, the meaning shifts considerably. Also, often there are quite a few similar lexical options available to a speaker, whereas the number of possible syntactic sentence constructions is more limited. To solve the choice problem, first of all the differences between similar words have to be represented in the lexicon, and the criteria for choosing among them have to be established. In the following, I give a tentative list of choice criteria, classify them into constraints and preferences, and outline a (partly implemented) model of lexicalization that can be incorporated into language generators.

@InProceedings{	  stede10,
  author	= {Manfred Stede},
  title		= {Lexical choice criteria in language generation},
  booktitle	= {Proceedings, Sixth conference of the European Chapter of
		  the Association for Computational Linguistics},
  address	= {Utrecht},
  month		= {April},
  year		= {1993},
  pages		= {454--459},
  abstract	= {<P>In natural language generation (NLG), a semantic
		  representation of some kind --- possibly enriched with
		  pragmatic attributes --- is successively transformed into
		  one or more linguistic utterances. No matter what
		  particular architecture is chosen to organize this process,
		  one of the crucial decisions to be made is lexicalization:
		  selecting words that adequately express the content that is
		  to be communicated and, if represented, the intentions and
		  attitudes of the speaker. Nirenburg and Nirenburg [1988]
		  give this example to illustrate the lexical choice problem:
		  If we want to express the meaning <I>a person whose sex is
		  male and whose age is between 13 and 15 years</I>, then
		  candidate realizations include: <I>boy, kid, teenager,
		  youth, child, young man, schoolboy, adolescent, man</I>.
		  The criteria influencing such choices remain largely in the
		  dark, however.</p> <P> As it happens, the problem of
		  lexical choice has not been a particularly popular one in
		  NLG. For instance, Marcus [1987] complained that most
		  generators don't really choose words at all; McDonald
		  [1991], amongst others, lamented that lexical choice has
		  attracted only very little attention in the research
		  community. Implemented generators tend to provide a
		  one-to-one mapping from semantic units to lexical items,
		  and their producers occasionally acknowledge this as a
		  shortcoming (e.g., [Novak, 1991, p. 666]); thereby the task
		  of lexical choice becomes a non-issue. For many
		  applications, this is indeed a feasible scheme, because the
		  sub-language under consideration can be sufficiently
		  restricted such that a direct mapping from content to words
		  does not present a drawback --- the generator is implicitly
		  tailored towards the type of situation (or register) in
		  which it operates. But in general, with an eye on more
		  expressive and versatile generators, this state of affairs
		  calls for improvement.</p> <P> Why is lexical choice
		  difficult? Unlike many other decisions in generation (e.g.,
		  whether to express an attribute of an object as a relative
		  clause or an adjective) the choice of a word very often
		  carries implicatures that can change the overall message
		  significantly --- if in some sentence the word <I>boy</I>
		  is replaced with one of the alternatives above, the meaning
		  shifts considerably. Also, often there are quite a few
		  similar lexical options available to a speaker, whereas the
		  number of possible syntactic sentence constructions is more
		  limited. To solve the choice problem, first of all the
		  differences between similar words have to be represented in
		  the lexicon, and the criteria for choosing among them have
		  to be established. In the following, I give a tentative
		  list of choice criteria, classify them into constraints and
		  preferences, and outline a (partly implemented) model of
		  lexicalization that can be incorporated into language
		  generators.</p>},
  download	= {http://ftp.cs.toronto.edu/pub/gh/Stede-1993.pdf}
}
