IMPLEMENTATION OF A MACHINE LEARNING ALGORITHM FOR SENTIMENT ANALYSIS OF INDONESIA’S 2019 PRESIDENTIAL ELECTION

In 2019, citizens of Indonesia participated in the democratic process of electing a new president, vice president, and various legislative candidates for the country. The 2019 Indonesian presidential election was very tense in terms of the candidates' campaigns in cyberspace, especially on social media sites such as Facebook, Twitter, Instagram, Google+, Tumblr, LinkedIn, etc. The Indonesian people used social media platforms to express their positive, neutral, and also negative opinions on the respective presidential candidates. The campaigning of respective social media users on their choice of candidates for regents, governors, and legislative positions up to presidential candidates was conducted via the Internet and online media. Therefore, the aim of this paper is to conduct sentiment analysis on the candidates in the 2019 Indonesia presidential election based on Twitter datasets. The study used datasets on the opinions expressed by the Indonesian people available on Twitter with the hashtags (#) containing "Jokowi and Prabowo." We conducted data pre-processing using a selection of comments, data cleansing, text parsing, sentence normalization and tokenization based on the given text in the Indonesian language, determination of class attributes, and, finally, we classified


INTRODUCTION
The turmoil resulting from organizing the 2019 Indonesian general election, notably the presidential election, has been felt since last year. This has applied not only in the real world but also in cyberspace, mainly on social media sites such as Twitter, Instagram, Facebook, etc., which people used to discuss their potential presidential candidates. The stages of the general election and presidential election in 2019 were announced by the Indonesian General Elections Commission (KPU). The names of the presidential candidates had been widely discussed on social media as far back as the candidate registration phase in early 2019 by the Indonesian KPU [1]. The virtual world is a world that is so free and difficult to control, where everyone is free to speak or give their opinion on their respective candidates. The opinions expressed by the public may be positive, neutral, or even negative.
The world of information has developed so fast that there is now a significant amount of online media, ranging from news sites to social networking platforms such as Facebook, Twitter, Path, Instagram, Google+, and many more. Twitter has a total of 330 million active users to date, while around 500 million tweets are made worldwide every day. There are around 100 million active daily users of Twitter around the world [2].
Social media is used not only for making friends but also for activities such as the promotion and sale of merchandise and for political party promotion or campaigns for regent, presidential, and legislative candidates. A campaign team for a presidential or regional head candidate, for example, may justify any means of campaigning, as evidenced by the many Black Campaigns conducted against candidates during the campaign period [3], especially on social media. Today, campaigning and image-building are conducted not only in the real world but also in the virtual world. Social media, especially Twitter, is now one of the most effective and efficient campaign venues.
Sentiment analysis continues to be used as part of opinion mining research. It is the process of understanding, extracting, and processing textual data automatically to obtain the sentiment information contained in an opinion sentence [4].
In this study, sentiment analysis was conducted to retrieve the opinions expressed in the Indonesian language on Twitter about the candidates in the 2019 Indonesian presidential election and to determine whether those opinions were positive, neutral, or negative. To test the accuracy of the sentiment analysis, we used two machine learning algorithms, namely the Naïve Bayes Classifier (NBC) and Support Vector Machine (SVM) [5], and 7 tokenizations: Alphabetic Tokenizer, Character N-gram Tokenizer, Unigram, Bigram, Trigram, N-gram, and Word Tokenizer. The results show the accuracy of each combination of machine learning algorithm and tokenization for sentiment analysis of the 2019 Indonesian presidential candidates.
https://doi.org/10.31436/iiumej.v22i1.1532

RELATED WORK
Sentiment analysis research used machine learning to classify Turkish political news [6]. This research classified the sentiment toward Turkish political news and determined whether the sentiment expressed was positive or negative. The different features of Turkish political news were extracted with the machine learning algorithms of Naïve Bayes Classifier (NBC), Maximum Entropy (ME), and SVM to produce a classification model. Sentiment analysis was used to group texts according to their positive or negative orientation [7]. This paper explains the experimental results that apply SVMs to conduct benchmarking with standard datasets to train sentiment analysis classifiers. N-grams and different weighting schemes were used to extract the most classic features. This study also explores the Chi-Square weight feature to select informative features for the classification method. The results of this experimental analysis reveal that using the Chi-Square feature selection can significantly enhance classification accuracy.
The automatic detection of abusive language in online media has been a main challenge for law enforcement in recent years [8]. That work makes three contributions: first, a deep learning architecture that uses word frequency vectorization to implement its features; second, a method that is language independent because it does not use pre-trained word embeddings; and third, a comprehensive evaluation of the model using public datasets of labelled tweets and an open-source implementation built with Keras. An ensemble classifier has also been presented for detecting hate speech in short texts, such as opinion tweets used as corpus datasets [9]. That classifier uses deep learning and combines a set of features related to user behaviour characteristics, such as the tendency to send rough messages, as input to a combination of machine learning algorithms [10,11]. Sentiment analysis research has also been carried out using a hybrid approach [12] whose methods, including association rule mining, dependency parsing, and SentiWordNet, were applied to solve the aspect-based sentiment analysis problem [13]. The performance of that research was evaluated using negative and racial domains and other benchmarks to assess the accuracy of aspect-based sentiment classification.

Tweet Data Collection
Tweet data were collected from Twitter by crawling [14] with R Programming using R-Studio. Only tweets in Indonesian were taken: 5,000 tweets containing the Jokowi keywords and 5,000 tweets containing the Prabowo keywords, giving a total of 10,000 tweets. The data were taken randomly from ordinary users of Twitter (Fig. 1).

Data Pre-Processing
The data pre-processing stage [16] in this study consisted of 4 steps, which are described as follows:

A. Selection of Comments
At this stage, comments containing the hashtag (#) keywords Jokowi and Prabowo were selected; any data that did not contain these keywords were deleted. Crawling collects every comment with either hashtag, including comments in which both appear in the same sentence. Duplicate comments were then deleted, even when they came from different Twitter accounts, so that only unique tweet data remained.
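The selection step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `select_comments`, the sample tweets, the rule that a tweet is kept when it mentions at least one keyword, and the definition of a duplicate as identical text after trimming and lowercasing are all assumptions made for illustration.

```python
def select_comments(tweets, keywords=("jokowi", "prabowo")):
    """Keep tweets mentioning at least one keyword and drop repeated texts.

    Assumption: two tweets are duplicates when their text is identical
    after trimming whitespace and lowercasing, regardless of the account.
    """
    seen = set()
    selected = []
    for text in tweets:
        key = text.strip().lower()
        if not any(keyword in key for keyword in keywords):
            continue  # tweet mentions neither candidate keyword
        if key in seen:
            continue  # same comment already kept once
        seen.add(key)
        selected.append(text)
    return selected


# Hypothetical sample data, for illustration only.
sample = [
    "#Jokowi kerja nyata",
    "#Prabowo tegas",
    "#Jokowi kerja nyata",
    "Cuaca hari ini cerah",
    "#Jokowi dan #Prabowo debat malam ini",
]
unique_tweets = select_comments(sample)
```

Deduplicating on the text alone, rather than on account plus text, mirrors the statement that repeated comments are removed even when they come from different accounts.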

B. Cleansing
This process aimed to clean up any comments from Twitter that were still dirty and contained a lot of noise. The opinion sentences obtained from Twitter usually contained a certain level of noise, i.e., random errors or variants in measured variables; therefore, we had to eliminate and clean the noise. The items omitted were usually HTML characters, symbols, emoticon icons, hashtags (#), usernames (@username), URL addresses (http://websitename.com), and email addresses (name@websitename.com).
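A cleansing pass of this kind might look like the following Python sketch. The regular expressions, the function name `cleanse`, and the choice to drop whole hashtag and username tokens (rather than only the `#` and `@` symbols) are assumptions; the paper does not give its exact rules.

```python
import html
import re

URL_RE = re.compile(r"(https?://\S+|www\.\S+)")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
# Anything that is not a letter, digit, whitespace, or basic punctuation
# (this also removes emoticon icons and stray symbols).
NON_TEXT_RE = re.compile(r"[^A-Za-z0-9\s.,!?'-]")


def cleanse(text):
    """Remove the noise items listed above from one tweet."""
    text = html.unescape(text)        # decode HTML entities such as &amp;
    text = HTML_TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)    # before mentions, so name@site is removed whole
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)  # assumption: the whole hashtag token is dropped
    text = NON_TEXT_RE.sub(" ", text)
    return " ".join(text.split())     # collapse the whitespace left behind


cleaned = cleanse(
    "@budi Pemilu 2019 &amp; debat seru!! 😀 #Jokowi http://t.co/abc email a.b@mail.com"
)
```

Email addresses are removed before usernames because the username pattern would otherwise consume only the part of the address after the `@`.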

C. Parsing
The third data pre-processing step in this study was parsing [17]. The aim was to break the document into a string of words and then analyse the collection of words by separating them and determining the syntactic structure of each word.

D. Sentence Normalization
The aim of this step was to normalize the sentences taken from Twitter; for example, a sentence containing Gaul or Alay words [18] would be normalized so that it could be recognized as language following KBBI (The Great Dictionary of the Indonesian Language) [19]. The normalization of sentences involved the following processes:
• Stretch punctuation and symbols other than the alphabet: Stretching punctuation involves inserting space around punctuation attached to the word that comes before or after it. The aim is to prevent punctuation and/or symbols other than alphabetic characters from merging with words during the tokenization process.
• Change to all lowercase letters
• Normalization of words: The rules in the normalization process are shown in Table 1. When happy or upset, someone may write opinions based on their emotions; often, when expressing this in written form, they will repeat the same letter, for example "kereeen" to express pleasure. Words with repeated letters like "kereeen" will be normalized to "keren" ("cool").
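The three normalization processes can be sketched in Python as follows. The rules of Table 1 are not reproduced in this text, so the `SLANG` dictionary holds two hypothetical entries, and collapsing any run of three or more identical letters to a single letter is an assumed rule that happens to turn "kereeen" into "keren".

```python
import re

# Hypothetical slang entries standing in for the rules of Table 1.
SLANG = {"yg": "yang", "gk": "tidak"}


def normalize(sentence, slang=SLANG):
    """Stretch punctuation, lowercase, and normalize words."""
    # 1. Insert spaces around every non-alphanumeric symbol.
    spaced = re.sub(r"([^\w\s])", r" \1 ", sentence)
    # 2. Change to all lowercase letters.
    lowered = spaced.lower()
    # 3a. Assumed rule: collapse a letter repeated three or more times.
    collapsed = re.sub(r"([a-z])\1{2,}", r"\1", lowered)
    # 3b. Replace slang words using the dictionary.
    words = [slang.get(word, word) for word in collapsed.split()]
    return " ".join(words)


normalized = normalize("Kereeen!Jokowi yg menang")
```

Restricting the collapse to runs of three or more letters leaves ordinary double letters, as in "maaf", untouched.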

Tokenization
After the sentence was normalized, it was broken down into tokens [20] using a delimiter such as a space. The tokenizers used in this study are:
• Alphabetic Tokenizer: Tokens are formed only from contiguous alphabetic sequences; for example: aku, anak, asli, baik, bagus, cara, cinta, demi, engkau, enak, film
• Character N-gram Tokenizer: This tokenizer divides a word into sequences of characters; for example: pe, mi, lu, pe, mi, li, han, u, mum
• Unigram: This tokenizer divides the sentence into tokens, with each token consisting of only one word; for example, "Pemilu".
• Bigram: This tokenizer divides the sentence into tokens, with each token consisting of two words; for example, "Pemilihan Umum".
• Trigram: This tokenizer divides the sentence into tokens, with each token consisting of three words; for example, "Pemilihan Umum Indonesia".
• N-gram Tokenizer: This tokenizer divides the string into n-grams with the minimum and maximum number of grams as specified; for example, "pemilihan, pemilihan umum, pemilihan umum Indonesia, aku, aku anak, aku anak indonesia"
• Word Tokenizer: This tokenizer divides the text into tokens based on words; for example, "aku, akun, akuntansi, alam, alami, alamiah"
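To make the difference between these tokenizers concrete, the sketch below implements simplified Python versions of three of them. These are illustrative functions with assumed names, not the tokenizers of the tool used in the study, and details such as token order may differ. Unigram, bigram, and trigram tokens correspond to calling `word_ngrams` with the minimum and maximum both set to 1, 2, or 3.

```python
import re


def alphabetic_tokens(text):
    """Tokens made only of contiguous alphabetic characters."""
    return re.findall(r"[A-Za-z]+", text)


def word_ngrams(text, n_min, n_max):
    """Word n-grams for every n from n_min to n_max (inclusive)."""
    words = text.split()
    grams = []
    for n in range(n_min, n_max + 1):
        for start in range(len(words) - n + 1):
            grams.append(" ".join(words[start:start + n]))
    return grams


def char_ngrams(word, n_min, n_max):
    """Character n-grams of a single word."""
    return [
        word[start:start + n]
        for n in range(n_min, n_max + 1)
        for start in range(len(word) - n + 1)
    ]


unigrams = word_ngrams("pemilihan umum indonesia", 1, 1)
bigrams = word_ngrams("pemilihan umum indonesia", 2, 2)
all_grams = word_ngrams("pemilihan umum indonesia", 1, 3)
char_bigrams = char_ngrams("pemilu", 2, 2)
```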

Determination of Class Attribute
After pre-processing, the next stage in this research is to determine the class attribute. The class attribute used here is sentiment class; in this study, there are 3 class attributes [21], namely positive, neutral, and negative. The use of 3 class attributes provides a more detailed and accurate classification of public opinion toward a particular object.

Load Dictionary
Following the class attribute determination, the next step is to apply the Lexicon-based method [22]. The dictionary used in this study comprises positive words (positive keywords), negative words (negative keywords), and negation words (negation keywords).

Determination of Sentiment
This is the process used for determining the sentiment (Positive, Neutral, or Negative) in Twitter data once the pre-processing has been performed. The sentiment determination in this study used the Lexicon-based or Dictionary-based method with Python Programming, using the Positive and Negative Dictionary. The polarity score of an opinion word (p) is 1 if the word is in the positive dictionary, meaning the word is positive. A word that is in neither the positive nor negative dictionary is worth 0, meaning it is neutral, while a word in the negative dictionary is worth -1, meaning it is negative [23]. The sentiment is determined with a sum formula: the polarity scores of the opinion words (p) that comment on the feature (f) are added together.
After determining which words in a Twitter opinion sentence are positive, neutral, or negative, the weight of the sentence is calculated by totalling the value of each opinion word. If the total value of the opinion words in the sentence is ≥ 1, then the sentiment value of the opinion sentence is positive; if the total value is 0, then the sentiment value of the opinion sentence is neutral; and if the total value is ≤ -1, then the sentiment value of the opinion sentence is negative. The determination of sentiment can be seen in Table 2.
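The scoring rule can be sketched in Python as below. The two small word sets are hypothetical stand-ins for the study's dictionaries, and the function names are assumptions. Negation keywords are left out because this text does not state how they modify the score. A total of 1 or more is labelled positive, a total of 0 neutral, and a total of -1 or less negative.

```python
# Hypothetical dictionary entries, for illustration only.
POSITIVE = {"baik", "bagus", "hebat"}
NEGATIVE = {"buruk", "gagal", "bohong"}


def word_polarity(word):
    """Return 1 for a positive word, -1 for a negative word, else 0."""
    if word in POSITIVE:
        return 1
    if word in NEGATIVE:
        return -1
    return 0


def sentence_sentiment(sentence):
    """Sum the word polarities and map the total to a sentiment class."""
    score = sum(word_polarity(word) for word in sentence.lower().split())
    if score >= 1:
        return "positive"
    if score <= -1:
        return "negative"
    return "neutral"


labels = [
    sentence_sentiment("kerja bagus dan hebat"),  # total +2
    sentence_sentiment("janji gagal"),            # total -1
    sentence_sentiment("bagus tapi gagal"),       # total 0
]
```

The third example shows a limitation of the summed score: a sentence with one positive and one negative word is labelled neutral even though it expresses mixed opinion.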

Classification Processes
Following the process for determining sentiment and having established the sentiment value of each opinion sentence using Python Programming, the next step is the sentiment classification process. The classification process uses the WEKA 3.8.3 Machine Learning tool [24], and the machine learning algorithms used in this study are NBC and SVM. In the classification process, the data were tested using the 10-fold cross-validation method [25]. The method works by dividing the dataset into 10 parts, with 9/10 parts used as training data and 1/10 part used as testing data. The process is iterated 10 times, each time with a different combination of training and testing parts, so that every part serves as testing data once.
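The splitting scheme described above can be sketched in plain Python. This illustrates the general 10-fold idea only and is not the procedure of the tool used in the study; the function name `k_fold_splits`, the fixed seed, and the unstratified shuffle are assumptions.

```python
import random


def k_fold_splits(n_items, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)        # assumed: simple random shuffle
    folds = [indices[i::k] for i in range(k)]   # k roughly equal parts
    for i in range(k):
        test = folds[i]                         # 1/k of the data for testing
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(100, k=10))
```

Each of the 10 iterations trains on 90 items and tests on the remaining 10, and every item appears in exactly one test part.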

Evaluation of Results
The evaluation stage of the study examines the Accuracy, Precision, and Recall of the experiments that have been carried out. The results evaluation process is conducted using a Confusion Matrix [26] featuring as its indicators a true positive rate (TP rate), true negative rate (TN rate), false positive rate (FP rate), and false negative rate (FN rate). The TP rate is the percentage of the positive class that is successfully classified as a positive class, while the TN rate is the percentage of the negative class that is successfully classified as a negative class. The FP rate is the percentage of the negative class that is classified as a positive class, and the FN rate is the percentage of the positive class that is classified as a negative class.
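These indicators can be computed from actual and predicted labels as in the Python sketch below. Because the study has three sentiment classes, the sketch treats one class at a time as the "positive" class and the other two as "negative" (a one-vs-rest view); that choice, the function names, and the toy labels are assumptions for illustration.

```python
def confusion_counts(actual, predicted, target):
    """Count TP, FP, FN, TN for one target class (one-vs-rest)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a == target and p == target:
            tp += 1
        elif a != target and p == target:
            fp += 1
        elif a == target and p != target:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(actual, predicted, target):
    """Accuracy over all classes plus precision, recall, TN rate for target."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, target)
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return {
        "accuracy": correct / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,   # same as TP rate
        "tn_rate": tn / (tn + fp) if tn + fp else 0.0,
    }


# Toy labels, for illustration only.
actual = ["positive", "positive", "negative", "negative", "neutral", "positive"]
predicted = ["positive", "negative", "negative", "negative", "neutral", "neutral"]
scores = evaluate(actual, predicted, target="positive")
```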

EXPERIMENT AND RESULTS
In this study, the dataset was derived from tweets of public opinion on the Indonesian 2019 presidential candidates. The data were taken from Twitter using the crawling method [27] with R Programming using R-Studio. Only tweets in Indonesian were taken: 5,000 tweets containing the Jokowi keywords and 5,000 tweets containing the Prabowo keywords, giving a total of 10,000 tweets. The tweet data were taken randomly from both ordinary users and online news media on Twitter.
Following the data pre-processing, tokenization, and class attribute determination steps, the dataset used for this study contained opinion sentences from Twitter classified into their respective sentiment classes (Positive, Neutral, or Negative) with Python Programming. The size of the dataset is not the same as the amount of data collected because crawling takes every opinion sentence, including repeated ones, whereas data pre-processing deletes repeated opinion sentences to retain only unique data. Table 4 contains the results of the determination of the sentiment class using the Lexicon-based method [28] in Python Programming with three class attributes, namely positive, neutral, and negative. After the sentiment value of each opinion sentence was determined, the opinion sentences were formed into a dataset in the Attribute-Relation File Format (ARFF) [29] as the input for classifying data with WEKA. The tweet data were then classified and tested for accuracy using the NBC and SVM machine learning algorithms with WEKA version 3.8.3 software.
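As an illustration of the ARFF input mentioned above, the following Python sketch writes labelled sentences in that format. The relation name `tweet_sentiment`, the attribute names `text` and `sentiment`, and the function name `to_arff` are assumptions; the study's actual file layout is not given in this text.

```python
def to_arff(rows, relation="tweet_sentiment"):
    """Build ARFF text from (sentence, label) pairs.

    Assumed layout: one string attribute holding the sentence and one
    nominal attribute holding the sentiment class.
    """
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute sentiment {positive,neutral,negative}",
        "",
        "@data",
    ]
    for text, label in rows:
        # Quote the sentence and escape backslashes and single quotes.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}',{label}")
    return "\n".join(lines) + "\n"


# Hypothetical labelled sentences, for illustration only.
arff_text = to_arff([
    ("kerja bagus dan hebat", "positive"),
    ("janji gagal", "negative"),
])
```

A string attribute like this one generally has to be converted into word features, for example by applying one of the tokenizers, before a classifier can use it.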
This study uses the 10-fold cross-validation method for classifying and testing the tweet data. In this process, the data are divided into 10 parts, with 9/10 parts used for training and 1/10 part used for testing. Iteration takes place 10 times, each time with a different combination of training and testing parts. Table 5 compares the results of the NBC machine learning algorithm with those of SVM, giving the accuracy, precision, recall, TP rate, and TN rate values for each trial. The columns contain the tokenizations used in this study, while the rows contain the accuracy, precision, recall, TP rate, and TN rate values for each trial conducted. The dataset produced by the process from data pre-processing to the determination of the sentiment class was used as the input to the classification process, which was carried out with WEKA using the NBC and SVM machine learning algorithms. The classification tests with 7 tokenizations produced values for accuracy, precision, recall, TP rate, and TN rate for each trial.
Accuracy was one of the main parameters in the assessment of the sentiment analysis model used in this study. The accuracy value is the amount of data correctly classified according to sentiment class divided by the total amount of data classified. Therefore, the greater the amount of data that were correctly classified according to the sentiment class, the higher the accuracy value. The highest accuracy value, 79.02%, was obtained with the combination of the SVM machine learning algorithm and alphabetic tokenization.
In this study, the SVM machine learning algorithm produced the highest accuracy because it works by recognizing word patterns. The algorithm can readily recognize and memorize the word patterns of a given sentiment class in an opinion sentence. Alphabetic tokenization further improves accuracy by breaking a sentence into words, which makes sentences easier to classify by sentiment. The lowest accuracy value in this study, 44.94%, was obtained for the NBC machine learning algorithm with N-gram tokenization.
[...] of 55.1% came from the NBC machine learning algorithm with N-gram tokenization. In Figure 7, we can see that the highest Recall value of 79% was obtained with the SVM machine learning algorithm and alphabetic tokenization, while the lowest Recall value of 51.6% was obtained with the NBC machine learning algorithm with N-gram tokenization. The precision value is the number of positive class items that were correctly classified as positive divided by the total data classified as positive, whereas the recall value is the number of positive class items that were correctly classified as positive divided by the number of actual positive class items.
[...] data that were correctly classified according to the sentiment class, which in this case was negative.
From the research carried out, it can be seen that the model delivered the greatest accuracy when using a combination of the SVM machine learning algorithm and alphabetic tokenization, while the lowest accuracy value was obtained with a combination of the NBC machine learning algorithm and N-gram tokenization. The accuracy results were quite good; however, the model still made a number of mistakes because the distribution of sentiments in the dataset was not as balanced as this study intended. Using datasets with an imbalanced distribution leads to minority class data being incorrectly classified as majority class data [30], which results in a large difference in values because most classifiers classify the majority class correctly more often than the minority class [31].

CONCLUSIONS
From the series of studies conducted, we can conclude that the Sentiment Analysis model built was suitable for determining the sentiment of public opinion on Twitter with respect to the 2019 Indonesian presidential candidates. The study aimed to determine which machine learning algorithms were suitable for the classification of public opinion on Twitter, and which of 7 tokenizations produced high accuracy when combined with the Naïve Bayes Classifier (NBC) and Support Vector Machine (SVM) machine learning algorithms. The sentiment analysis revealed that there was much negative public sentiment on Twitter aimed at the 2019 Indonesian presidential candidates. The greatest accuracy value, 79.02%, was obtained with the combination of the SVM machine learning algorithm and alphabetic tokenization. The lowest accuracy value, 44.94%, was obtained for the NBC machine learning algorithm with N-gram tokenization. This study has therefore demonstrated that the SVM machine learning algorithm produces higher accuracy than the NBC machine learning algorithm. It is suggested that further research should endeavour to use more data and real-time data from both Twitter and other social media sites such as Facebook and YouTube.