The Effect of Phrase Detection with a POS-Tagger on Sentiment Classification Accuracy using SVM
Abstract: Sentiment analysis, or opinion mining, is an application of Natural
Language Processing (NLP), a field that aims to let humans communicate with a
computer in their everyday language. To simplify the processing of human
language, a computer carries out three important stages: tokenizing, stemming,
and filtering. Tokenizing breaks a sentence into single words, so the computer
treats all words (tokens) alike. If a phrase contains an unimportant word that
happens to be in the stoplist, the phrase is lost during filtering. The
proposed solution is tokenizing based on phrase detection with a Hidden Markov
Model (HMM) POS-Tagger, with the goal of improving the classification
performance of a Support Vector Machine (SVM).
With this approach, the computer can distinguish a phrase from other tokens
and store it as a single entity. Classification with phrase detection
increases accuracy by approximately 6% on Dataset I and 3% on Dataset II,
because fewer features go missing in the filtering process. In addition, the
phrase-based approach produces the best classification model, as seen from
its ROC value, which reaches 0.897.
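The filtering problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the POS tags are assigned by hand rather than by an HMM POS-Tagger, and the stoplist, the tag names, and the phrase patterns are hypothetical values chosen only for this example.

```python
# Hypothetical stoplist, used only for illustration.
STOPLIST = {"the", "was", "not"}

# Adjacent tag pairs that are merged into one phrase token.
# This rule set is an assumption, not the rule set used in the paper.
PHRASE_PATTERNS = {("NEG", "ADJ"), ("ADV", "ADJ")}


def filter_stopwords(tokens):
    """Remove every token that appears in the stoplist."""
    return [t for t in tokens if t not in STOPLIST]


def merge_phrases(tagged):
    """Join adjacent (word, tag) pairs that match a phrase pattern."""
    merged = []
    i = 0
    while i < len(tagged):
        if (i + 1 < len(tagged)
                and (tagged[i][1], tagged[i + 1][1]) in PHRASE_PATTERNS):
            # Store the phrase as a single entity, e.g. "not_good".
            merged.append(tagged[i][0] + "_" + tagged[i + 1][0])
            i += 2
        else:
            merged.append(tagged[i][0])
            i += 1
    return merged


# Hand-tagged sentence; a real pipeline would obtain tags from a POS-Tagger.
tagged_sentence = [
    ("the", "DET"), ("movie", "NOUN"), ("was", "VERB"),
    ("not", "NEG"), ("good", "ADJ"),
]

# Word-level tokenizing: "not" is in the stoplist, so the negation is lost.
word_level = filter_stopwords([word for word, _ in tagged_sentence])

# Phrase-level tokenizing: "not good" is merged first and survives filtering.
phrase_level = filter_stopwords(merge_phrases(tagged_sentence))

print(word_level)    # ['movie', 'good']
print(phrase_level)  # ['movie', 'not_good']
```

In this sketch the word-level features suggest a positive sentiment, while the phrase-level features keep the negation as one feature that a classifier such as an SVM could then use.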
Keywords: sentiment analysis, phrase detection, HMM POS-Tagger, ROC, Support Vector Machine, tokenization
Authors: Hermawan Arief Putranto, Onny Setyawati, Wijono Wijono
Journal Code: jptlisetrodd160424