THE CONSTRUCTION OF INDONESIAN-ENGLISH CROSS LANGUAGE PLAGIARISM DETECTION SYSTEM USING FINGERPRINTING TECHNIQUE
Abstract: Cross-language
plagiarism detection is an important task because it helps protect a person's
intellectual property rights. Since English is the most popular international
language, we propose an Indonesian-English cross-language plagiarism detection
system for the case where the suspected
plagiarized document is written in Indonesian and the source document is written
in English. To minimize translation error, we build the system by translating
the Indonesian document into English and then comparing the translated document
with the English document collection. The detection system consists of
a preprocessing component, a heuristic retrieval component, and a detailed analysis
component. The main technique used in the retrieval process is fingerprinting, which
extracts lexical features from text and is therefore suited to detecting
plagiarism produced by literal translation. In this paper, we also
propose additional methods for the heuristic retrieval component
to increase the performance of the system: phrase chunking, stop word removal,
stemming, and synonym selection. We evaluated the system's performance and the
effect of the additional methods on that performance using several test data
sets, each of which represents a type of plagiarism. From the experiments, we
concluded that the system works correctly on 83.33% of the test cases. We also
concluded that, in general, all of the additional methods except phrase chunking
improve the system's accuracy.
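
The abstract does not spell out the fingerprinting scheme, so the following is only a minimal sketch of one common variant: hashing overlapping word n-grams after stop word removal and scoring candidate sources by fingerprint containment. The stop word list, the n-gram size, the SHA-256 based hash, and all function names are assumptions made for illustration and are not taken from the paper. The suspect text is assumed to have already been translated from Indonesian into English.

```python
import hashlib
import re

# Assumption: a tiny illustrative stop word list, not the one used in the paper.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "and", "by"}


def tokenize(text, remove_stop_words=True):
    """Lowercase the text, keep alphabetic words, optionally drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def fingerprint(text, n=3, remove_stop_words=True):
    """Return the set of 32-bit hashes of all overlapping word n-grams."""
    words = tokenize(text, remove_stop_words)
    hashes = set()
    for i in range(len(words) - n + 1):
        gram = " ".join(words[i:i + n])
        digest = hashlib.sha256(gram.encode("utf-8")).hexdigest()
        hashes.add(int(digest[:8], 16))  # keep the first 32 bits of the digest
    return hashes


def containment(suspect_fp, source_fp):
    """Fraction of the suspect's fingerprints that also occur in the source."""
    if not suspect_fp:
        return 0.0
    return len(suspect_fp & source_fp) / len(suspect_fp)


def rank_sources(translated_suspect, sources, n=3):
    """Rank source documents by containment score, best match first."""
    suspect_fp = fingerprint(translated_suspect, n)
    scores = {
        name: containment(suspect_fp, fingerprint(text, n))
        for name, text in sources.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_sources(translated_text, {"doc1": "...", "doc2": "..."})` returns the candidate sources ordered by how many of the suspect's n-gram hashes they share. Because the features are purely lexical, a sketch like this can only catch near-literal translations, which is consistent with the motivation given above for adding stemming and synonym selection.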
Keywords: detection system;
fingerprinting; Indonesian-English cross language;
phrase chunking; plagiarism
Author: Zakiy Firdaus Alfikri,
Ayu Purwarianti
Journal Code: jptkomputergg120007