Building a Lexical Knowledge-Base of Near-Synonym Differences. Inkpen, D. Ph.D. Thesis, Department of Computer Science, University of Toronto, October 2003.

Current natural language generation or machine translation systems cannot distinguish among near-synonyms — words that share the same core meaning but vary in their lexical nuances. This is due to a lack of knowledge about differences between near-synonyms in existing computational lexical resources.
The goal of this thesis is to automatically acquire a lexical knowledge-base of near-synonym differences (LKB of NS) from multiple sources, and to show how it can be used in a practical natural language processing system.
I designed a method to automatically acquire knowledge from dictionaries of near-synonym discrimination written for human readers. An unsupervised decision-list algorithm learns patterns and words for classes of distinctions. The patterns are learned automatically, followed by a manual validation step. The extraction of distinctions between near-synonyms is entirely automatic. The main types of distinctions are: stylistic (for example, inebriated is more formal than drunk), attitudinal (for example, skinny is more pejorative than slim), and denotational (for example, blunder implies accident and ignorance, while error does not).
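The pattern-and-class idea above can be sketched as a tiny decision list: candidate rules are tried in order and the first matching pattern assigns the distinction class. The patterns, their ordering, and the example sentences below are invented illustrations, not the rules the thesis actually learned.

```python
import re

# Minimal decision-list sketch for classifying sentences from a
# near-synonym discrimination dictionary into distinction classes.
# All patterns here are hypothetical stand-ins for learned rules.
RULES = [
    (re.compile(r"\bmore (formal|informal|concrete|abstract)\b"), "stylistic"),
    (re.compile(r"\bmore (pejorative|favou?rable)\b"), "attitudinal"),
    (re.compile(r"\b(implies|suggests|connotes)\b"), "denotational"),
]

def classify_distinction(sentence: str) -> str:
    """Return the class of the first matching rule, else 'unknown'."""
    s = sentence.lower()
    for pattern, label in RULES:
        if pattern.search(s):
            return label
    return "unknown"
```

In a real decision list the rules would be ranked by a learned confidence score rather than listed by hand; the first-match-wins control flow is the part this sketch preserves.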
I enriched the initial LKB of NS with information extracted from other sources. First, information about the senses of the near-synonyms was added (their WordNet senses). The other near-synonyms in the same dictionary entry and the text of the entry provide a strong context for disambiguation. Second, knowledge about the collocational behaviour of the near-synonyms was acquired from free text. Collocations between a word and the near-synonyms in a dictionary entry were classified into: preferred collocations, less-preferred collocations, and anti-collocations. Third, knowledge about distinctions between near-synonyms was acquired from machine-readable dictionaries (the General Inquirer and the Macquarie Dictionary). These distinctions were merged with the initial LKB of NS, and inconsistencies were resolved.
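The three-way collocation classification can be sketched with a placeholder frequency-ratio test: a collocate's count with one near-synonym is compared against the best count in the whole near-synonym group. The actual acquisition used statistical association measures over free text; the thresholds and counts below are invented.

```python
# Toy sketch of sorting word + near-synonym pairs into preferred
# collocations, less-preferred collocations, and anti-collocations.
# A plain frequency ratio stands in for the statistical measures
# used in the thesis; the thresholds are hypothetical.
def classify_collocation(count: int, max_count_in_group: int,
                         strong: float = 0.5, weak: float = 0.1) -> str:
    """Classify one pair relative to the best pair in its group."""
    if max_count_in_group == 0:
        return "anti-collocation"
    ratio = count / max_count_in_group
    if ratio >= strong:
        return "preferred collocation"
    if ratio >= weak:
        return "less-preferred collocation"
    return "anti-collocation"
```

For example, if "spelling error" is the most frequent pairing in the cluster, "spelling blunder" with a near-zero count would land in the anti-collocation class.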
The generic LKB of NS needs to be customized in order to be used in a natural language processing system. The parts that need customization are the core denotations and the strings that describe peripheral concepts in the denotational distinctions. To show how the LKB of NS can be used in practice, I present Xenon, a natural language generation system that chooses the near-synonym that best matches a set of input preferences. I implemented Xenon by adding a near-synonym choice module and a near-synonym collocation module to an existing general-purpose surface realizer.
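The near-synonym choice step can be sketched as preference matching: each candidate in a cluster is scored by how many of the requested nuances it carries, and the best match is realized. The nuance labels and cluster entries below are hypothetical; Xenon's actual scoring over core denotations and peripheral concepts is far richer than this overlap count.

```python
# Minimal sketch of near-synonym choice against input preferences.
# The LKB entries and nuance labels are invented for illustration.
def choose_near_synonym(preferences: set[str],
                        lkb: dict[str, set[str]]) -> str:
    """Return the candidate sharing the most nuances with the preferences."""
    return max(lkb, key=lambda word: len(preferences & lkb[word]))

# Hypothetical cluster for the 'error' group of near-synonyms.
lkb = {
    "blunder": {"implies-accident", "implies-ignorance"},
    "error":   {"neutral"},
}
```

Requesting the nuance "implies-accident" would steer the realizer toward "blunder"; with no attitudinal or denotational preferences, a neutral member of the cluster wins instead.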
@PhDThesis{ inkpen4,
author = {Diana Inkpen},
title = {Building a Lexical Knowledge-Base of Near-Synonym
Differences},
school = {Department of Computer Science, University of Toronto},
month = {October},
year = {2003},
abstract = {<P> Current natural language generation or machine
translation systems cannot distinguish among near-synonyms
--- words that share the same core meaning but vary in
their lexical nuances. This is due to a lack of knowledge
about differences between near-synonyms in existing
computational lexical resources.</p> <P> The goal of this
thesis is to automatically acquire a lexical knowledge-base
of near-synonym differences (LKB of NS) from multiple
sources, and to show how it can be used in a practical
natural language processing system.</p> <P> I designed a
method to automatically acquire knowledge from dictionaries
of near-synonym discrimination written for human readers.
An unsupervised decision-list algorithm learns patterns and
words for classes of distinctions. The patterns are learned
automatically, followed by a manual validation step. The
extraction of distinctions between near-synonyms is
entirely automatic. The main types of distinctions are:
stylistic (for example, <i>inebriated</i> is more formal
than <i>drunk</i>), attitudinal (for example, <i>skinny</i>
is more pejorative than <i>slim</i>), and denotational (for
example, <i>blunder</i> implies <i>accident</i> and
<i>ignorance</i>, while <i>error</i> does not).</p> <P> I
enriched the initial LKB of NS with information extracted
from other sources. First, information about the senses of
the near-synonym was added (WordNet senses). The other
near-synonyms in the same dictionary entry and the text of
the entry provide a strong context for disambiguation.
Second, knowledge about the collocational behaviour of the
near-synonyms was acquired from free text. Collocations
between a word and the near-synonyms in a dictionary entry
were classified into: preferred collocations,
less-preferred collocations, and anti-collocations. Third,
knowledge about distinctions between near-synonyms was
acquired from machine-readable dictionaries (the <i>General
Inquirer</i> and the <i>Macquarie Dictionary</i>). These
distinctions were merged with the initial LKB of NS, and
inconsistencies were resolved.</p> <P> The generic LKB of
NS needs to be customized in order to be used in a natural
language processing system. The parts that need
customization are the core denotations and the strings that
describe peripheral concepts in the denotational
distinctions. To show how the LKB of NS can be used in
practice, I present Xenon, a natural language generation
system that chooses the near-synonym that best
matches a set of input preferences. I implemented Xenon by
adding a near-synonym choice module and a near-synonym
collocation module to an existing general-purpose surface
realizer.</p>},
download = {http://ftp.cs.toronto.edu/pub/gh/Inkpen-thesis.pdf}
}