Classifying Muslim Ideologies from Islamic Websites using Text Analysis Based on Naive Bayes and TF-IDF

Authors

  • Akeem Olowolayemo, Department of Computer Science, International Islamic University Malaysia, Kuala Lumpur, Malaysia
  • Salma Moustafa Sharey Moustafa, Department of Computer Science, KICT, International Islamic University Malaysia

DOI:

https://doi.org/10.31436/ijpcc.v10i1.321

Keywords:

Digital deception, Web Classification, Text Classification, Content Filtering, Text Analysis, Naive Bayes, TF-IDF

Abstract

Finding reliable digital Islamic information is a challenge for innocent information seekers such as young Muslims, new Muslims, and others who wish to find authentic information about Islam, Prophet Muhammad (saw), and Muslims in general. Several deviant ideologies abound, and their proponents also present information on the internet, sometimes involving digital deception. In the digital era, misleading Islamic information may affect people’s beliefs, behaviours, and attitudes. Many websites are based on different schools of thought regarding Islamic practices, which can make it difficult for new Muslims and the young generation of Muslims to recognize what to follow from the information presented on these sites. Some other variants of practice are considered deviant by mainstream Sunni scholars and may mislead innocent Islamic information seekers, including non-Muslims. Consequently, categorizing Islamic websites by school and branch becomes imperative. This initial study focuses on the classification of Islamic websites by applying website categorization and text classification to their textual content. The proposed technique classified 60 Islamic websites into two categories, Sunni and Shia, using TF-IDF for feature extraction and Multinomial Naive Bayes for classification. In addition, extracting the keywords for each of the two categories assisted the classification process. The results show that Multinomial Naive Bayes was easy to implement and predicted the categories of Islamic websites with an accuracy of 0.89, a precision of 1.0, a recall of 0.80, and an F1 score of 0.89. The keywords that differentiate Sunni websites from Shia websites were also extracted. It was found that the best keywords for identifying Sunni websites in search engines are "Islam" and "Muslim", while "Shia" and "Imam" are the most prominent keywords for identifying Shia websites.
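As a rough illustration of the approach the abstract describes, the sketch below weights terms with TF-IDF and classifies with a Multinomial Naive Bayes model using Laplace smoothing. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the function names, and the four toy training strings are all assumptions made for this example, and the toy strings are not the study's data. The last lines only check that the reported precision (1.0) and recall (0.80) are consistent with the reported F1 score.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization; real web pages would
    # need HTML stripping, stop-word removal, and similar cleaning.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency for each training term.
    n = len(docs)
    df = Counter(t for d in docs for t in set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(doc, idf):
    # TF-IDF weights for one document; terms unseen in training are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def train(docs, labels, alpha=1.0):
    # Multinomial Naive Bayes fitted on summed TF-IDF weights per class.
    idf = fit_idf(docs)
    weights = defaultdict(Counter)
    for d, y in zip(docs, labels):
        weights[y].update(tfidf(d, idf))
    model = {"idf": idf, "prior": {}, "loglik": {}}
    for y, w in weights.items():
        total = sum(w.values()) + alpha * len(idf)
        model["prior"][y] = math.log(labels.count(y) / len(labels))
        model["loglik"][y] = {t: math.log((w[t] + alpha) / total) for t in idf}
    return model


def predict(model, doc):
    # Pick the class with the highest log posterior score.
    x = tfidf(doc, model["idf"])

    def score(y):
        return model["prior"][y] + sum(
            v * model["loglik"][y][t] for t, v in x.items()
        )

    return max(model["prior"], key=score)


# Hypothetical toy corpus, invented for this sketch only.
train_docs = [
    "islam muslim prayer guidance",
    "muslim community islam lessons",
    "shia imam teachings history",
    "imam shia community lectures",
]
train_labels = ["sunni", "sunni", "shia", "shia"]

model = train(train_docs, train_labels)
pred_a = predict(model, "islam muslim articles")
pred_b = predict(model, "imam shia articles")

# Consistency check on the reported metrics: F1 = 2PR / (P + R).
precision, recall = 1.0, 0.80
f1 = 2 * precision * recall / (precision + recall)
print(pred_a, pred_b, round(f1, 2))
```

In practice, a library such as scikit-learn offers equivalent building blocks (`TfidfVectorizer` and `MultinomialNB`), which would likely be preferable to a hand-written version for a corpus of real websites.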



Published

2024-01-28

How to Cite

Olowolayemo, A., & Moustafa Sharey Moustafa, S. (2024). Classifying Muslim Ideologies from Islamic Websites using Text Analysis Based on Naive Bayes and TF-IDF. International Journal on Perceptive and Cognitive Computing, 10(1), 8–15. https://doi.org/10.31436/ijpcc.v10i1.321
