Mixed-depth representations for natural language text. Hirst, G. & Ryan, M. In Jacobs, P. S. (Ed.), Text-based intelligent systems, pages 59–82. Lawrence Erlbaum Associates, Hillsdale, NJ, 1992.

Intelligent text-based systems will vary as to the degree of difficulty of the texts they deal with. Some may have a relatively easy time with texts for which fairly superficial processes will get useful results, such as, say, The New York Times or Julia Child's Favorite Recipes. But many systems will have to work on more difficult texts. Often, it is the complexity of the text that makes the system desirable in the first place. It is for such systems that we need to think about making the deeper methods that are already studied in AI and computational linguistics more robust and suitable for processing long texts without interactive human help. The dilemma is that on one hand, we have the limitations of raw text databases and superficial processing methods; on the other we have the difficulty of deeper methods and conceptual representations. Our proposal here is to have the best of both, and accordingly we develop the notion of a heterogeneous, or mixed, type of representation.

In our model, a text base permits two parallel representations of meaning: the text itself, for presentation to human users, and a conceptual encoding of the text, for use by intelligent components of the system. The two representations are stored in parallel; that is, there are links between each unit of text (a sentence or paragraph in most cases) and the corresponding conceptual encoding. This encoding could be created en masse when the text was entered into the system. But if it is expected that only a small fraction of the text base will ever be looked at by processes that need the conceptual representations, then the encoding could be performed on each part of the text as necessary for inference and understanding to answer some particular request. The results could then be stored so that they don't have to be redone if the same area of the text is searched again. Thus, a text would gradually grow its encoding as it continues to be used. (And the work will never be done for texts or parts of texts that are never used.)
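As an illustrative sketch only (not from the chapter), the parallel storage and on-demand encoding described above might look like the following in Python; the names TextBase, encoder, text, and encoding are hypothetical, not the authors' own.

class TextBase:
    def __init__(self, units, encoder):
        self.units = list(units)      # text units (sentences or paragraphs), kept for presentation
        self.encoder = encoder        # maps a text unit to its conceptual encoding
        self.encodings = {}           # unit index -> cached encoding; grows as the text is used

    def text(self, i):
        # Raw text of unit i, for presentation to human users.
        return self.units[i]

    def encoding(self, i):
        # Conceptual encoding of unit i, computed only when some request needs it,
        # then stored so it need not be redone on a later search of the same area.
        if i not in self.encodings:
            self.encodings[i] = self.encoder(self.units[i])
        return self.encodings[i]

# Usage: only the units actually consulted ever get encoded.
tb = TextBase(["Beat the egg whites.", "Fold in the sugar."],
              encoder=lambda s: {"tokens": s.split()})
tb.encoding(0)   # computed and cached now
tb.text(1)       # never encoded unless some later request requires it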

So far, this is straightforward. But we can go one step further. The encoding itself may be deep or shallow at different places, depending on what happened to be necessary at the time it was generated—or on what was possible. Or, to put it a different way, we can view natural-language text and AI-style knowledge representations as two ends of a spectrum.
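Again as a hypothetical sketch (not the authors' notation), a mixed-depth encoding can be pictured as each unit carrying whatever depth of analysis was needed or possible at the time, from raw text through a shallow parse to a full conceptual structure:

from dataclasses import dataclass
from typing import Any

@dataclass
class UnitEncoding:
    depth: str      # "raw", "shallow", or "conceptual": one point on the spectrum
    content: Any    # raw string, token list, or conceptual structure

encoded_text = [
    UnitEncoding("raw", "Preheat the oven to 180 degrees."),
    UnitEncoding("shallow", ["Beat", "the", "egg", "whites"]),
    UnitEncoding("conceptual", {"event": "beat", "patient": "egg-whites", "until": "stiff"}),
]

# A consumer falls back to shallower information where no deeper encoding exists.
for unit in encoded_text:
    print(unit.depth, "->", unit.content)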

@InBook{	  hirst22,
  author	= {Graeme Hirst and Mark Ryan},
  chapter	= {Mixed-depth representations for natural language text},
  editor	= {Paul S. Jacobs},
  title		= {Text-based intelligent systems},
  address	= {Hillsdale, NJ},
  publisher	= {Lawrence Erlbaum Associates},
  year		= {1992},
  pages		= {59--82},
  download	= {http://ftp.cs.toronto.edu/pub/gh/Hirst+Ryan-92.pdf}
}
