Authorship verification with entity coherence and other rich linguistic features. Feng, V. W. & Hirst, G. In Proceedings, PAN 2013 Lab: Uncovering Plagiarism, Authorship and Social Software Misuse — at the CLEF 2013 Conference and Labs of the Evaluation Forum: Information Access Evaluation meets Multilinguality, Multimodality, and Visualization, Valencia, Spain, September 2013.
We adopt Koppel et al.'s unmasking approach as the major framework of our authorship verification system. We enrich Koppel et al.'s original word frequency features with a novel set of coherence features, derived from our earlier work, together with a full set of stylometric features. For texts written in languages other than English, some stylometric features are unavailable due to the lack of appropriate NLP tools, and their coherence features are derived from their translations produced by Google Translate service. Evaluated on the training corpus, we achieve an overall accuracy of 65.7%: 100.0% for both English and Spanish texts, while only 40% for Greek texts; evaluated on the test corpus, we achieve an overall accuracy of 68.2%, and roughly the same performance across three languages.
@InProceedings{	  feng2013p,
  author	= {Vanessa Wei Feng and Graeme Hirst},
  title		= {Authorship verification with entity coherence and other
		  rich linguistic features},
  address	= {Valencia, Spain},
  booktitle	= {Proceedings, PAN 2013 Lab: Uncovering Plagiarism,
		  Authorship and Social Software Misuse --- at the CLEF 2013
		  Conference and Labs of the Evaluation Forum: Information
		  Access Evaluation meets Multilinguality, Multimodality, and
		  Visualization},
  year		= {2013},
  month		= {September},
  url		= {http://www.clef-initiative.eu/documents/71612/278c06fc-20c1-4340-a6f9-eeec1e87913c},
  abstract	= {We adopt Koppel et al.'s unmasking approach as the major
		  framework of our authorship verification system. We enrich
		  Koppel et al.'s original word frequency features with a
		  novel set of coherence features, derived from our earlier
		  work, together with a full set of stylometric features. For
		  texts written in languages other than English, some
		  stylometric features are unavailable due to the lack of
		  appropriate NLP tools, and their coherence features are
		  derived from their translations produced by Google
		  Translate service. Evaluated on the training corpus, we
		  achieve an overall accuracy of 65.7\%: 100.0\% for both
		  English and Spanish texts, while only 40\% for Greek texts;
		  evaluated on the test corpus, we achieve an overall
		  accuracy of 68.2\%, and roughly the same performance across
		  three languages.}
}
