Semantic Similarity Exploration in Heterogeneous Sparse Multidimensional Numeric Spaces

A Case Study of the Quran Text using SemSim

Authors

  • Adel Sabour University of Washington, Tacoma, USA
  • Abdeltawab Hendawi University of Rhode Island, USA
  • Mohamed Ali University of Washington, Tacoma, USA

Keywords:

Similarity detection, multidimensional data, heterogeneous entities, semantic similarity, numeric similarity, knowledge graphs, Quran, text analysis

Abstract

Comparing heterogeneous entities that seem to have no common denominator, and inferring similarities among their attributes, is more complex than doing so for homogeneous entity sets. Two types of similarity exist: semantic and numeric. This paper presents SemSim, a system that helps users explore similarities in heterogeneous multidimensional data sets. The system helps the user (a) define entities, entity groups, and entity dimensions; (b) detect numeric similarities among entities, either across the same dimensions or across different dimensions; (c) correlate the semantic similarities given in a knowledge graph with the detected numeric similarities; and (d) use the detected numeric similarities to enhance the knowledge graph by mining for other, hidden semantic similarities. As a case study, we apply the proposed system to the text of the Holy Quran to explore the correlation between semantic and numeric similarities for chapters, verses, and words.
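Steps (b) through (d) can be illustrated with a minimal sketch. It is not the SemSim implementation: it assumes that a "numeric similarity" simply means two entities share an equal value, in the same or in different dimensions, and that the knowledge graph is a set of unordered entity pairs. The entity names, dimension names, values, and the helpers `numeric_matches` and `split_by_semantics` are all hypothetical toy placeholders.

```python
from itertools import combinations

# Toy, made-up entities (assumption): each maps a dimension name to a number.
# The space is sparse, so an entity simply omits dimensions it lacks.
entities = {
    "entity_a": {"word_count": 29, "verse_count": 7},
    "entity_b": {"letter_count": 29, "verse_count": 5},
    "entity_c": {"word_count": 12, "verse_count": 7},
}

# Hypothetical knowledge-graph edges: pairs already known to be
# semantically related.
semantic_edges = {frozenset({"entity_a", "entity_c"})}


def numeric_matches(entities):
    """Yield (name1, dim1, name2, dim2, value) for each pair of distinct
    entities that share an equal value, in the same or different dimensions."""
    for (n1, d1), (n2, d2) in combinations(sorted(entities.items()), 2):
        for dim1, v1 in sorted(d1.items()):
            for dim2, v2 in sorted(d2.items()):
                if v1 == v2:
                    yield (n1, dim1, n2, dim2, v1)


def split_by_semantics(matches, semantic_edges):
    """Separate numeric matches that coincide with a known semantic edge
    from those that do not (candidate hidden similarities)."""
    confirmed, candidates = [], []
    for match in matches:
        pair = frozenset({match[0], match[2]})
        (confirmed if pair in semantic_edges else candidates).append(match)
    return confirmed, candidates


matches = list(numeric_matches(entities))
confirmed, candidates = split_by_semantics(matches, semantic_edges)
print("confirmed:", confirmed)
print("candidates:", candidates)
```

In this toy setup, matches that coincide with an existing knowledge-graph edge correspond to step (c), while the remaining matches are only candidates that a user would still need to review before adding them to the graph in step (d).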

Published

2023-07-28

How to Cite

Sabour, A., Hendawi, A., & Ali, M. (2023). Semantic Similarity Exploration in Heterogeneous Sparse Multidimensional Numeric Spaces: A Case Study of the Quran Text using SemSim. International Journal on Perceptive and Cognitive Computing, 9(2), 25–32. Retrieved from https://journals.iium.edu.my/kict/index.php/IJPCC/article/view/399

Section

Articles