In Sundqvist, A., Berget, G., Nolin, J., & Skjerdingstad, K. I., editors, Sustainable Digital Communities, volume 12051 LNCS, pages 816–827. Springer International Publishing, Cham, March, 2020. Paper Data Demo doi abstract bibtex
Research on academic integrity has identified online paraphrasing tools as a severe threat to the effectiveness of plagiarism detection systems. To enable the automated identification of machine-paraphrased text, we make three contributions. First, we evaluate the effectiveness of six prominent word embedding models in combination with five classifiers for distinguishing human-written from machine-paraphrased text. The best performing classification approach achieves an accuracy of 99.0% for documents and 83.4% for paragraphs. Second, we show that the best approach outperforms human experts and established plagiarism detection systems for these classification tasks. Third, we provide a Web application that uses the best performing classification approach to indicate whether a text underwent machine-paraphrasing. The data and code of our study are openly available.