In Proceedings of the 5th International Plagiarism Conference, Newcastle upon Tyne, UK, July, 2012.
This paper presents an open-source prototype of a citation-based plagiarism detection system called CitePlag. The underlying idea of the system is to evaluate the citations in academic documents as language-independent markers for detecting plagiarism. CitePlag uses three different detection algorithms that analyze the citation sequences of academic documents for similar patterns that may indicate text or ideas taken from other sources without proper attribution. To compute document similarity scores, the algorithms consider multiple citation-related factors, such as the proximity and order of citations within the text and their probability of co-occurrence. We present technical details of CitePlag's detection algorithms and of the acquisition of test data from the PubMed Central Open Access Subset. Future work on the prototype focuses on enlarging the reference database by enabling the system to process more document and citation formats. Furthermore, we aim to improve CitePlag's detection algorithms and scoring functions to reduce the number of false positives. Eventually, we plan to integrate text-based with citation-based detection algorithms within CitePlag.
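The abstract does not name the three algorithms, so the following is only a minimal sketch of how citation order could feed a similarity score. It assumes each document has already been reduced to a list of reference identifiers in the order they are cited, and it uses a longest common subsequence as a stand-in technique. The function names, the normalization by the shorter sequence, and the sample identifiers are hypothetical and are not taken from CitePlag.

```python
from typing import Sequence


def longest_common_citation_subsequence(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest sequence of citations that appears in the same
    order in both documents (not necessarily adjacent)."""
    # prev[j] holds the LCS length of the citations of `a` processed so far
    # against the first j citations of `b`.
    prev = [0] * (len(b) + 1)
    for cited_in_a in a:
        curr = [0]
        for j, cited_in_b in enumerate(b, start=1):
            if cited_in_a == cited_in_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def order_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical score in [0, 1]: shared in-order citations divided by the
    length of the shorter citation sequence (an assumed normalization)."""
    if not a or not b:
        return 0.0
    return longest_common_citation_subsequence(a, b) / min(len(a), len(b))


# Illustrative citation sequences; the identifiers are made up.
doc_a = ["ref1", "ref2", "ref3", "ref4", "ref5"]
doc_b = ["ref2", "ref9", "ref3", "ref5"]
score = order_similarity(doc_a, doc_b)
print(score)  # 0.75: ref2, ref3, ref5 occur in the same order in both
```

A score like this captures order only. Proximity of citations within the text and the probability of co-occurrence, which the abstract also mentions, would need additional inputs such as citation positions and corpus-wide citation counts.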