Springer Fachmedien Wiesbaden, 2023.
Identifying plagiarism is a pressing problem for research institutions, publishers, and funding bodies. Current detection methods focus on textual analysis and find copied, moderately reworded, or translated content. However, detecting more subtle forms of plagiarism, including strong paraphrasing, sense-for-sense translations, or the reuse of non-textual content and ideas, remains a challenge. This book presents a novel approach to address this problem: analyzing non-textual elements in academic documents, such as citations, images, and mathematical content. The proposed detection techniques are validated in five evaluations using confirmed plagiarism cases and exploratory searches for new instances. The results show that non-textual elements contain much semantic information, are language-independent, and are resilient to typical tactics for concealing plagiarism. Incorporating non-textual content analysis complements text-based detection approaches and increases detection effectiveness, particularly for disguised forms of plagiarism. The book introduces the first integrated plagiarism detection system that combines citation, image, math, and text similarity analysis. Its user interface features visual aids that significantly reduce the time and effort users must invest in examining content similarity.