The rhetorical parsing, summarization, and generation of natural language texts. Marcu, D. Ph.D. Thesis, Department of Computer Science, University of Toronto, January, 1998. Published as technical report CSRI-371.
This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically.
The thesis proposes a first-order formalization of the high-level, rhetorical structure of text. The formalization assumes that text can be sequenced into elementary units; that discourse relations hold between textual units of various sizes; that some textual units are more important to the writer's purpose than others; and that trees are a good approximation of the abstract structure of text. The formalization also introduces a linguistically motivated compositionality criterion, which is shown to hold for the text structures that are valid.
The thesis proposes, analyzes theoretically, and compares empirically four algorithms for determining the valid text structures of a sequence of units among which some rhetorical relations hold. Two algorithms apply model-theoretic techniques; the other two apply proof-theoretic techniques.
The formalization and the algorithms mentioned so far correspond to the theoretical facet of the thesis. An exploratory corpus analysis of cue phrases provides the means for applying the formalization to unrestricted natural language texts. A set of empirically motivated algorithms were designed in order to determine the elementary textual units of a text, to hypothesize rhetorical relations that hold among these units, and eventually, to derive the discourse structure of that text. The process that finds the discourse structure of unrestricted natural language texts is called rhetorical parsing.
The thesis explores two possible applications of the text theory that it proposes. The first application concerns a discourse-based summarization system, which is shown to significantly outperform both a baseline algorithm and a commercial system. An empirical psycholinguistic experiment not only provides an objective evaluation of the summarization system, but also confirms the adequacy of using the text theory proposed here in order to determine the most important units in a text. The second application concerns a set of text planning algorithms that can be used by natural language generation systems in order to construct text plans in the cases in which the high-level communicative goal is to map an entire knowledge pool into text.
@PhDThesis{ marcu8,
  author   = {Daniel Marcu},
  title    = {The rhetorical parsing, summarization, and generation of
              natural language texts},
  school   = {Department of Computer Science, University of Toronto},
  month    = {January},
  year     = {1998},
  note     = {Published as technical report CSRI-371},
  abstract = {This thesis is an inquiry into the nature of the
              high-level, rhetorical structure of unrestricted natural
              language texts, computational means to enable its
              derivation, and two applications (in automatic
              summarization and natural language generation) that follow
              from the ability to build such structures
              automatically.

              The thesis proposes a first-order
              formalization of the high-level, rhetorical structure of
              text. The formalization assumes that text can be sequenced
              into elementary units; that discourse relations hold
              between textual units of various sizes; that some textual
              units are more important to the writer's purpose than
              others; and that trees are a good approximation of the
              abstract structure of text. The formalization also
              introduces a linguistically motivated compositionality
              criterion, which is shown to hold for the text structures
              that are valid.

              The thesis proposes, analyzes
              theoretically, and compares empirically four algorithms for
              determining the valid text structures of a sequence of
              units among which some rhetorical relations hold. Two
              algorithms apply model-theoretic techniques; the other two
              apply proof-theoretic techniques.

              The formalization
              and the algorithms mentioned so far correspond to the
              theoretical facet of the thesis. An exploratory corpus
              analysis of cue phrases provides the means for applying the
              formalization to unrestricted natural language texts. A set
              of empirically motivated algorithms were designed in order
              to determine the elementary textual units of a text, to
              hypothesize rhetorical relations that hold among these
              units, and eventually, to derive the discourse structure of
              that text. The process that finds the discourse structure
              of unrestricted natural language texts is called rhetorical
              parsing.

              The thesis explores two possible
              applications of the text theory that it proposes. The first
              application concerns a discourse-based summarization
              system, which is shown to significantly outperform both a
              baseline algorithm and a commercial system. An empirical
              psycholinguistic experiment not only provides an objective
              evaluation of the summarization system, but also confirms
              the adequacy of using the text theory proposed here in
              order to determine the most important units in a text. The
              second application concerns a set of text planning
              algorithms that can be used by natural language generation
              systems in order to construct text plans in the cases in
              which the high-level communicative goal is to map an entire
              knowledge pool into text.},
  download = {http://ftp.cs.toronto.edu/pub/gh/Marcu-PhDthesis.pdf}
}