Detecting Machine-Obfuscated Plagiarism. Foltýnek, T., Ruas, T., Scharpf, P., Meuschke, N., Schubotz, M., Grosky, W., & Gipp, B. In Sundqvist, A., Berget, G., Nolin, J., & Skjerdingstad, K. I., editors, Sustainable Digital Communities, volume 12051 of LNCS, pages 816–827. Springer International Publishing, Cham, March 2020.
Research on academic integrity has identified online paraphrasing tools as a severe threat to the effectiveness of plagiarism detection systems. To enable the automated identification of machine-paraphrased text, we make three contributions. First, we evaluate the effectiveness of six prominent word embedding models in combination with five classifiers for distinguishing human-written from machine-paraphrased text. The best performing classification approach achieves an accuracy of 99.0% for documents and 83.4% for paragraphs. Second, we show that the best approach outperforms human experts and established plagiarism detection systems for these classification tasks. Third, we provide a Web application that uses the best performing classification approach to indicate whether a text underwent machine-paraphrasing. The data and code of our study are openly available.
