Building readability lexicons with unannotated corpora. Brooke, J., Tsang, V., Jacob, D., Shein, F., & Hirst, G. In Proceedings, Workshop on Predicting and Improving Text Readability for Target Reader Populations, Montreal, 2012.
Lexicons of word difficulty are useful for various educational applications, including readability classification and text simplification. In this work, we explore automatic creation of these lexicons using methods which go beyond simple term frequency, but without relying on age-graded texts. In particular, we derive information for each word type from the readability of the web documents in which it appears and from the words it co-occurs with, linearly combining these various features. We show the efficacy of this approach by comparing our lexicon with an existing coarse-grained, low-coverage resource and with a new crowdsourced annotation.
@InProceedings{	  brooke7,
  author	= {Julian Brooke and Vivian Tsang and David Jacob and Fraser
		  Shein and Graeme Hirst},
  title		= {Building readability lexicons with unannotated corpora},
  address	= {Montreal},
  booktitle	= {Proceedings, Workshop on Predicting and Improving Text
		  Readability for Target Reader Populations},
  year		= {2012},
  download	= {http://ftp.cs.toronto.edu/pub/gh/Brooke-etal-PITR-2012.pdf},
  note		= {Poster:
		  \url{http://ftp.cs.toronto.edu/pub/gh/Brooke-etal-PITR-2012-poster.pdf}},
  abstract	= {Lexicons of word difficulty are useful for various
		  educational applications, including readability
		  classification and text simplification. In this work, we
		  explore automatic creation of these lexicons using methods
		  which go beyond simple term frequency, but without relying
		  on age-graded texts. In particular, we derive information
		  for each word type from the readability of the web
		  documents they appear in and the words they co-occur with,
		  linearly combining these various features. We show the
		  efficacy of this approach by comparing our lexicon with an
		  existing coarse-grained, low-coverage resource and a new
		  crowdsourced annotation.}
}
