THE IMPLEMENTATION OF THE MACHINE LEARNING ALGORITHM FOR THE SENTIMENT ANALYSIS OF INDONESIA’S 2019 PRESIDENTIAL ELECTION
DOI:
https://doi.org/10.31436/iiumej.v22i1.1532
Keywords:
sentiment analysis, president, Indonesia, Naive Bayes classifier, Support Vector Machine, machine learning
Abstract
In 2019, citizens of Indonesia took part in the democratic process of electing a new president, a vice president, and various legislative candidates. The candidates' campaigns for the 2019 Indonesian presidential election were very tense in cyberspace, especially on social media sites such as Facebook, Twitter, Instagram, Google+, Tumblr, and LinkedIn. The Indonesian people used these platforms to express positive, neutral, and negative opinions on the respective presidential candidates. Social media users campaigned for their chosen candidates, from regents, governors, and legislative positions up to the presidency, via the Internet and online media. The aim of this paper is therefore to conduct sentiment analysis on the candidates in the 2019 Indonesian presidential election based on Twitter datasets. The study used opinions expressed by the Indonesian people on Twitter under hashtags (#) containing "Jokowi" and "Prabowo." Data pre-processing consisted of comment selection, data cleansing, text parsing, sentence normalization and tokenization of the Indonesian-language text, and determination of class attributes. Finally, we classified the hashtagged Twitter posts using a Naïve Bayes Classifier (NBC) and a Support Vector Machine (SVM), with the goal of maximizing classification accuracy. The study helps the community research opinions on Twitter that carry positive, neutral, or negative sentiment. Conducting sentiment analysis on the candidates in the 2019 Indonesian presidential election on Twitter using non-conventional processes saved cost, time, and effort. The results showed that the combination of the SVM machine learning algorithm and alphabetic tokenization produced the highest accuracy, 79.02%, while the combination of the NBC machine learning algorithm and N-gram tokenization produced the lowest accuracy, 44.94%.
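The tokenize-then-classify step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is written in plain Python, it covers only a multinomial Naïve Bayes classifier (the SVM is omitted), and the two tokenizers are only loosely modeled on the idea of alphabetic and N-gram tokenization. The function names, the class `NaiveBayes`, and the five example posts are all hypothetical and invented for illustration; they are not taken from the study's dataset.

```python
import math
import re
from collections import Counter, defaultdict


def alphabetic_tokens(text):
    """Lowercase the text and keep only runs of letters a-z as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_tokens(text, n=2):
    """Build word n-grams (bigrams by default) from the alphabetic tokens."""
    words = alphabetic_tokens(text)
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def fit(self, texts, labels):
        # Count documents per class and token occurrences per class.
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(self.tokenizer(text))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        tokens = self.tokenizer(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Hypothetical toy posts, invented for this sketch only.
texts = [
    "#Jokowi kerja bagus",
    "#Prabowo hebat dan tegas",
    "#Jokowi gagal total",
    "#Prabowo buruk sekali",
    "#Jokowi dan #Prabowo debat malam ini",
]
labels = ["positive", "positive", "negative", "negative", "neutral"]

alpha_model = NaiveBayes(alphabetic_tokens).fit(texts, labels)
ngram_model = NaiveBayes(ngram_tokens).fit(texts, labels)

print(alpha_model.predict("kerja bagus"))
print(ngram_model.predict("kerja bagus"))
```

Passing the tokenizer as a constructor argument keeps the classifier unchanged while the tokenization varies, which mirrors the kind of algorithm-and-tokenizer comparison the abstract reports. Accuracy on a toy set like this says nothing about the figures reported in the study.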
ABSTRACT (translated from Malay): In 2019 the people of Indonesia took part in the democratic process of electing a new president, a vice president, and various legislative candidates. The 2019 Indonesian presidential election was very tense in terms of the candidates' campaigns in cyberspace, especially on social media sites such as Facebook, Twitter, Instagram, Google+, Tumblr, and LinkedIn. The Indonesian people used social media platforms to express positive, neutral, and negative opinions on the respective presidential candidates. Campaigns for the candidacies of ministers, governors, and legislators up to presidential candidates were conducted via the Internet and online media. This study was therefore carried out to assess sentiment toward the candidates in the 2019 Indonesian presidential election based on Twitter datasets. The study used data expressed by the Indonesian people on Twitter under hashtags (#) containing "Jokowi" and "Prabowo." Data processing consisted of comment selection, data cleansing, text parsing, sentence normalization, and tokenization of the Indonesian-language text, determination of class attributes, and, finally, classification of the hashtagged Twitter posts using a Naïve Bayes Classifier (NBC) and a Support Vector Machine (SVM) to achieve optimal and maximum accuracy. The study helps the community examine opinions on Twitter that carry positive, neutral, or negative sentiment. Sentiment analysis of the candidates in the 2019 Indonesian presidential election on Twitter using non-conventional processes resulted in cost, time, and effort savings. This research showed that the combination of the SVM machine learning algorithm and alphabetic tokenization produced the highest accuracy, 79.02%, while the combination of the NBC machine learning algorithm and N-gram tokenization produced the lowest accuracy, 44.94%.
License
Copyright (c) 2020 IIUM Press
This work is licensed under a Creative Commons Attribution 4.0 International License.