Automatic Detection of Authorship Changes within Single Documents. Graham, N. Master's thesis, Department of Computer Science, University of Toronto, January 2000. Published as technical report CSRG-406.

One of the most difficult tasks facing anyone who must compile or maintain any large, collaboratively-written document is to foster a consistent style throughout. In this thesis, we explore whether it is possible to identify stylistic inconsistencies within documents even in principle, given our understanding of how style can be captured statistically.

We carry out this investigation by computing stylistic statistics on very small samples of text comprising a set of synthetic collaboratively-written documents, and using these statistics to train and test a series of neural networks. We are able to show that this method does allow us to recover the boundaries of authors' contributions. We find that time-delay neural networks, hitherto ignored in this field, are especially effective in this regard. Along the way, we observe that statistics characterizing the syntactic style of a passage appear to hold much more information for small text samples than those concerned with lexical choice or complexity.

@MastersThesis{	  graham3,
  author	= {Neil Graham},
  title		= {Automatic Detection of Authorship Changes within Single
		  Documents},
  school	= {Department of Computer Science, University of Toronto},
  month		= {January},
  year		= {2000},
  note		= {Published as technical report CSRG-406},
  abstract	= {<P> One of the most difficult tasks facing anyone who must
		  compile or maintain any large, collaboratively-written
		  document is to foster a consistent style throughout. In
		  this thesis, we explore whether it is possible to identify
		  stylistic inconsistencies within documents even in
		  principle, given our understanding of how style can be
		  captured statistically.</p> <P>We carry out this
		  investigation by computing stylistic statistics on very
		  small samples of text comprising a set of synthetic
		  collaboratively-written documents, and using these
		  statistics to train and test a series of neural networks.
		  We are able to show that this method does allow us to
		  recover the boundaries of authors' contributions. We find
		  that time-delay neural networks, hitherto ignored in this
		  field, are especially effective in this regard. Along the
		  way, we observe that statistics characterizing the
		  syntactic style of a passage appear to hold much more
		  information for small text samples than those concerned
		  with lexical choice or complexity.</p>},
  download	= {http://ftp.cs.toronto.edu/pub/gh/Graham-thesis.pdf}
}
