Building a Lexical Knowledge-Base of Near-Synonym Differences. Inkpen, D. Ph.D. Thesis, Department of Computer Science, University of Toronto, October 2003.

Current natural language generation or machine translation systems cannot distinguish among near-synonyms — words that share the same core meaning but vary in their lexical nuances. This is due to a lack of knowledge about differences between near-synonyms in existing computational lexical resources.

The goal of this thesis is to automatically acquire a lexical knowledge-base of near-synonym differences (LKB of NS) from multiple sources, and to show how it can be used in a practical natural language processing system.

I designed a method to automatically acquire knowledge from dictionaries of near-synonym discrimination written for human readers. An unsupervised decision-list algorithm learns patterns and words for classes of distinctions. The patterns are learned automatically, followed by a manual validation step. The extraction of distinctions between near-synonyms is entirely automatic. The main types of distinctions are: stylistic (for example, inebriated is more formal than drunk), attitudinal (for example, skinny is more pejorative than slim), and denotational (for example, blunder implies accident and ignorance, while error does not).
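
For a concrete sense of the learning step, here is a minimal Python sketch of a decision-list learner in the Yarowsky style; the seed-labelled examples, the feature representation, and the smoothing constant are illustrative assumptions, not the thesis's actual implementation.

from collections import defaultdict
from math import log

def learn_decision_list(examples, smoothing=0.1):
    """Rank candidate patterns and words by log-likelihood ratio.

    `examples` is a list of (features, label) pairs: `features` is the
    set of patterns and words found in one dictionary sentence, and
    `label` is a distinction class such as 'stylistic', 'attitudinal',
    or 'denotational' (hypothetical seed labels).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for features, label in examples:
        for f in features:
            counts[f][label] += 1
    rules = []
    for f, by_label in counts.items():
        for label, c in by_label.items():
            others = sum(v for l, v in by_label.items() if l != label)
            score = log((c + smoothing) / (others + smoothing))
            rules.append((score, f, label))
    return sorted(rules, reverse=True)  # strongest evidence first

def classify(decision_list, features, default='denotational'):
    """Apply the first (highest-scoring) rule whose pattern matches."""
    for score, f, label in decision_list:
        if f in features:
            return label
    return default

A decision list is attractive here because each classification is traceable to a single high-confidence pattern, which keeps the manual validation step straightforward.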

I enriched the initial LKB of NS with information extracted from other sources. First, information about the senses of the near-synonyms was added (WordNet senses). The other near-synonyms in the same dictionary entry and the text of the entry provide a strong context for disambiguation. Second, knowledge about the collocational behaviour of the near-synonyms was acquired from free text. Collocations between a word and the near-synonyms in a dictionary entry were classified as preferred collocations, less-preferred collocations, or anti-collocations. Third, knowledge about distinctions between near-synonyms was acquired from machine-readable dictionaries (the General Inquirer and the Macquarie Dictionary). These distinctions were merged with the initial LKB of NS, and inconsistencies were resolved.
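
To make the collocation step concrete, the sketch below buckets a (collocate, near-synonym) pair by corpus evidence using pointwise mutual information. The thresholds, and the use of PMI alone, are simplifying assumptions; the thesis combined several association measures.

from math import log2

def pmi(count_xy, count_x, count_y, n):
    """Pointwise mutual information of a word pair in a corpus of n tokens."""
    if count_xy == 0:
        return float('-inf')
    return log2((count_xy * n) / (count_x * count_y))

def classify_collocation(count_xy, count_x, count_y, n,
                         pmi_min=3.0, anti_max=0):
    """Bucket a (collocate, near-synonym) pair; thresholds are illustrative."""
    if count_xy <= anti_max:
        return 'anti-collocation'       # pair essentially never co-occurs
    if pmi(count_xy, count_x, count_y, n) >= pmi_min:
        return 'preferred collocation'  # strong, habitual pairing
    return 'less-preferred collocation'

With counts from a large corpus, a pairing like "daunting task" would score far above the PMI threshold, while "daunting error" would land in the anti-collocation bucket.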

The generic LKB of NS needs to be customized in order to be used in a natural language processing system. The parts that need customization are the core denotations and the strings that describe peripheral concepts in the denotational distinctions. To show how the LKB of NS can be used in practice, I present Xenon, a natural language generation system that chooses the near-synonym that best matches a set of input preferences. I implemented Xenon by adding a near-synonym choice module and a near-synonym collocation module to an existing general-purpose surface realizer.
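
As a toy illustration of the near-synonym choice module, the Python sketch below scores each member of a cluster by how many input preferences its recorded distinctions satisfy, penalizing conflicts; the miniature LKB entries and the weighting are invented for illustration and are not Xenon's actual preference-satisfaction model.

def choose_near_synonym(cluster, preferences, lkb):
    """Pick the near-synonym whose distinctions best match the preferences."""
    def satisfaction(word):
        distinctions = lkb.get(word, set())
        met = len(preferences & distinctions)
        violated = len({p for p in preferences
                        if ('not ' + p) in distinctions})
        return met - 2 * violated  # weight conflicts more heavily (assumed)
    return max(cluster, key=satisfaction)

# Hypothetical miniature LKB entries for the 'error' cluster:
lkb = {
    'blunder': {'implies accident', 'implies ignorance', 'informal'},
    'error':   {'neutral'},
    'howler':  {'informal', 'implies stupidity'},
}
print(choose_near_synonym(['blunder', 'error', 'howler'],
                          {'informal', 'implies accident'}, lkb))
# -> 'blunder'

In Xenon itself this choice interacts with the collocation module, so a word that satisfies the input preferences can still lose out if it would produce an anti-collocation in the generated sentence.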

@PhDThesis{	  inkpen4,
  author	= {Diana Inkpen},
  title		= {Building a Lexical Knowledge-Base of Near-Synonym
		  Differences},
  school	= {Department of Computer Science, University of Toronto},
  month		= {October},
  year		= {2003},
  abstract	= {Current natural language generation or machine
		  translation systems cannot distinguish among near-synonyms
		  --- words that share the same core meaning but vary in
		  their lexical nuances. This is due to a lack of knowledge
		  about differences between near-synonyms in existing
		  computational lexical resources.

		  The goal of this thesis is to automatically acquire a
		  lexical knowledge-base of near-synonym differences (LKB of
		  NS) from multiple sources, and to show how it can be used
		  in a practical natural language processing system.

		  I designed a method to automatically acquire knowledge
		  from dictionaries of near-synonym discrimination written
		  for human readers. An unsupervised decision-list algorithm
		  learns patterns and words for classes of distinctions. The
		  patterns are learned automatically, followed by a manual
		  validation step. The extraction of distinctions between
		  near-synonyms is entirely automatic. The main types of
		  distinctions are: stylistic (for example,
		  \emph{inebriated} is more formal than \emph{drunk}),
		  attitudinal (for example, \emph{skinny} is more pejorative
		  than \emph{slim}), and denotational (for example,
		  \emph{blunder} implies \emph{accident} and
		  \emph{ignorance}, while \emph{error} does not).

		  I enriched the initial LKB of NS with information
		  extracted from other sources. First, information about the
		  senses of the near-synonyms was added (WordNet senses).
		  The other near-synonyms in the same dictionary entry and
		  the text of the entry provide a strong context for
		  disambiguation. Second, knowledge about the collocational
		  behaviour of the near-synonyms was acquired from free
		  text. Collocations between a word and the near-synonyms in
		  a dictionary entry were classified as preferred
		  collocations, less-preferred collocations, or
		  anti-collocations. Third, knowledge about distinctions
		  between near-synonyms was acquired from machine-readable
		  dictionaries (the \emph{General Inquirer} and the
		  \emph{Macquarie Dictionary}). These distinctions were
		  merged with the initial LKB of NS, and inconsistencies
		  were resolved.

		  The generic LKB of NS needs to be customized in order to
		  be used in a natural language processing system. The parts
		  that need customization are the core denotations and the
		  strings that describe peripheral concepts in the
		  denotational distinctions. To show how the LKB of NS can
		  be used in practice, I present Xenon, a natural language
		  generation system that chooses the near-synonym that best
		  matches a set of input preferences. I implemented Xenon by
		  adding a near-synonym choice module and a near-synonym
		  collocation module to an existing general-purpose surface
		  realizer.},
  download	= {http://ftp.cs.toronto.edu/pub/gh/Inkpen-thesis.pdf}
}
