STEMMING IMPACT ANALYSIS ON INDONESIAN QURAN TRANSLATION AND THEIR TAFSIR CLASSIFICATION FOR ONTOLOGY INSTANCES

Fandy Setyo Utomo; Nanna Suryana; Mohd Sanusi Azmi

doi:10.31436/iiumej.v21i1.1170

Authors

Fandy Setyo Utomo Universitas AMIKOM Purwokerto https://orcid.org/0000-0001-6347-6514
Nanna Suryana Center for Advanced Computing Technology (C-ACT), Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia https://orcid.org/0000-0003-3695-639X
Mohd Sanusi Azmi Center for Advanced Computing Technology (C-ACT), Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia

Keywords:

K-Nearest Neighbor, Neural Network, Ontology Learning, Ontology Population, Support Vector Machine

Abstract

The current gap in the Quran ontology population domain is the lack of stemming impact analysis on the Indonesian Quran translation and its Tafsir for developing ontology instances. Existing studies of stemming effects have been performed across various languages, datasets, stemming methods, cases, and classifiers. However, the literature lacks studies of the influence of stemming on instance classification for Quran ontology using different datasets and classifiers on the Indonesian Quran translation and its Tafsir. To address this problem, our study investigates and analyzes the impact of stemming on instance classification results, using the Indonesian Quran translation and its Tafsir as datasets with multiple supervised classifiers. Our classification framework consists of text pre-processing, feature extraction, and text classification stages. The Sastrawi stemmer was used to perform stemming in the text pre-processing stage. Our experimental results show that the Support Vector Machine (SVM) with Term Frequency-Inverse Document Frequency (TF-IDF) and stemming achieves the best classification performance, i.e., 70.75% accuracy and 71.55% precision on the Indonesian Quran translation dataset with a 20% test data size. With a 30% test data size, SVM and TF-IDF with stemming again achieve the best classification performance, i.e., 67.30% accuracy and 68.10% precision on the Ministry of Religious Affairs Indonesia dataset. Furthermore, this study also found that the Backpropagation Neural Network suffers the largest reduction in precision and accuracy due to the negative impact of stemming.
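To illustrate how stemming in the pre-processing stage changes the TF-IDF features that a classifier receives, the following is a minimal sketch using only the Python standard library. It is not the authors' implementation: `toy_stem` is a hypothetical affix stripper standing in for the Sastrawi stemmer, the four sentences are invented toy examples rather than verses from the datasets, and the smoothed IDF formula is one common variant chosen here as an assumption. The classification stage (SVM, K-Nearest Neighbor, or Backpropagation Neural Network) is omitted.

```python
import math
from collections import Counter

# Hypothetical stand-in for a real Indonesian stemmer such as Sastrawi.
# It strips at most one prefix and one suffix and is NOT the Sastrawi algorithm.
PREFIXES = ("meng", "mem", "men", "me", "ber", "di")
SUFFIXES = ("kan", "nya", "an", "i")


def toy_stem(word, min_len=4):
    """Strip one known prefix and one known suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def tfidf(docs, stem=False):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents."""
    tokenized = [
        [toy_stem(w) if stem else w for w in doc.lower().split()]
        for doc in docs
    ]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append([
            # Relative term frequency times smoothed IDF (assumed variant).
            (tf[term] / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term in vocab
        ])
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Invented toy sentences (not taken from the Quran translation datasets).
docs = [
    "dia membaca kitab",
    "kitab itu dibaca",
    "bacaan kitab itu",
    "kami berjalan jauh",
]

vocab_raw, vecs_raw = tfidf(docs, stem=False)
vocab_stem, vecs_stem = tfidf(docs, stem=True)

print("vocabulary without stemming:", vocab_raw)
print("vocabulary with stemming:   ", vocab_stem)
print("similarity of docs 0 and 1 without stemming:", cosine(vecs_raw[0], vecs_raw[1]))
print("similarity of docs 0 and 1 with stemming:   ", cosine(vecs_stem[0], vecs_stem[1]))
```

In this toy setting, stemming merges "membaca", "dibaca", and "bacaan" into a single feature, which shrinks the vocabulary and raises the similarity between documents that share a root. Whether that merging helps or hurts a given classifier is the empirical question the study examines.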

Author Biographies

Fandy Setyo Utomo, Universitas AMIKOM Purwokerto

Fandy Setyo Utomo received his Master's degree in Computer Science from the Faculty of Mathematics and Natural Sciences, Gadjah Mada University, in 2015. He has been a Ph.D. student in Software and Information Systems Engineering at the Faculty of Information and Communication Technology, Universiti Teknikal Malaysia Melaka (UTeM), since 2016. He currently works as a lecturer at Universitas AMIKOM Purwokerto. His research interests are ontologies and the semantic web and their use in question answering systems.

Nanna Suryana, Center for Advanced Computing Technology (C-ACT), Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia

Nanna Suryana is currently a full professor and former Director of the International Office at Universiti Teknikal Malaysia Melaka (UTeM), Faculty of Information and Communication Technology. He obtained his B.Sc. in Soil & Water Engineering at Padjadjaran University – Indonesia (1980), M.Sc. in Computer Assisted for Geoinformatics & Earth Science at International Institute for Geoinformatics and Earth Observation (ITC), Enschede – the Netherlands (1987), and Ph.D. in Geographical Information System (GIS) and Remote Sensing from the Department of GIS and Remote Sensing, Wageningen Research University (WUR) – the Netherlands (1996). He currently holds the position of Chairman of the Center of Advanced Computing Technology (C-ACT) - Centre for Research and Innovation Management, Faculty of Information and Communication Technology (FTMK), UTeM. He has published journal articles and book chapters in his current research interests, which are GIS, Large Spatial Data and Information Retrieval, Image Processing, Spatial Modelling and Analysis, Mobile GIS, and Interoperability.

Mohd Sanusi Azmi, Center for Advanced Computing Technology (C-ACT), Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia

Mohd Sanusi Azmi received his PhD from Universiti Kebangsaan Malaysia in 2013. He is currently the Head of the Software Engineering Department at Universiti Teknikal Malaysia Melaka (UTeM), Faculty of Information and Communication Technology. His specialization is feature extraction for Arabic/Jawi handwriting images, a domain in which he has proposed a novel feature. He is also interested in image processing, especially the preprocessing, segmentation, and classification of handwriting images in the Arabic/Jawi domain.
