Keywords: Data mining, Big data, Decision tree, Parallel classifier, SPRINT classifier


Data mining is the extraction of useful information and patterns from vast amounts of data, and it is one of the most active research topics today. Massive amounts of data are generated and stored every day, and this data carries valuable information across many fields, attracting the attention of programmers and engineers. The decision tree is one of the primary classification algorithms in data mining. Decision tree techniques offer several advantages but also have drawbacks; a main one is the requirement that the training data reside in main memory. SPRINT is a decision tree classifier that was proposed to address this problem. In this paper, we develop a new parallel decision tree classifier that builds on SPRINT. Our experimental results show considerable improvements in runtime and memory requirements compared to the SPRINT classifier. The proposed algorithm can be implemented in both serial and parallel environments and can handle big data.
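SPRINT-family decision-tree builders choose each split by minimizing the Gini index over candidate split points of an attribute. As an illustrative sketch only (not the authors' implementation, whose parallel attribute-list machinery is described in the paper itself), the core split evaluation looks like this:

```python
# Minimal sketch of Gini-based split evaluation, the impurity measure
# used by SPRINT-style decision-tree classifiers. All names here are
# illustrative, not taken from the paper.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gini(values, labels, threshold):
    """Weighted Gini impurity of splitting a numeric attribute at `threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Example: a split at 30 separates the two classes perfectly.
values = [10, 20, 30, 40, 50]
labels = ["A", "A", "A", "B", "B"]
print(split_gini(values, labels, 30))  # → 0.0
```

The builder evaluates this quantity at every candidate threshold of every attribute and splits on the minimum; SPRINT's contribution is doing so over presorted, disk-resident attribute lists so the data need not fit in main memory.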






Author Biography

Mahdi Bahaghighat, Amirkabir University of Technology

He received his Ph.D. in Electrical Engineering from Amirkabir University of Technology (AUT) in 2017 and his M.Sc. from IUST in 2007. He is currently an assistant professor and chairman of the Electrical Engineering group at Raja University. His research interests include signal, image, and video processing; computer vision; artificial intelligence; machine learning; deep learning; sensor networks; and wireless multimedia transmission.


Fayyad, U., G. Piatetsky-Shapiro, and P. Smyth (1996). From data mining to knowledge discovery in databases. AI Magazine. 17(3):37-54.

Berry, M.J. and G.S. Linoff (2004). Data mining techniques: for marketing, sales, and customer relationship management. John Wiley & Sons.

Tan, P.-N., M. Steinbach, and V. Kumar (2016). Introduction to data mining. Pearson Education India.

Aggarwal, C.C. (2014). An Introduction to Data Classification. Data Classification: Algorithms and Applications. Chapman and Hall/CRC.

Zaki, M.J. and W. Meira (2014). Data mining and analysis: fundamental concepts and algorithms. Cambridge University Press.

Quinlan, J.R. (1986). Induction of decision trees. Machine Learning. 1(1):81-106.

Rennie, J.D., et al (2003). Tackling the poor assumptions of naive Bayes text classifiers. In Proceedings of the Twentieth International Conference on Machine Learning: 21-24 August 2003; Washington, DC.

Hagan, M., et al (2014). Neural Network Design. 2nd Edition. Oklahoma: Martin Hagan.

Jain, A.K. and R.C. Dubes (1988). Algorithms for clustering data. Prentice-Hall, Inc.

Zhang, C. and S. Zhang (2003). Association rule mining: models and algorithms. Springer.

Aggarwal, C (2014). Data Classification: Algorithms and Applications, ser. Frontiers in physics. Chapman and Hall/CRC.

Esposito, F., et al (1997). A comparative analysis of methods for pruning decision trees. IEEE Transactions on Pattern Analysis and Machine Intelligence. 19(5):476-491.

Mingers, J. (1989). An empirical comparison of pruning methods for decision tree induction. Machine Learning. 4(2):227-243.

Rokach, L. and O.Z. Maimon (2008). Data mining with decision trees: theory and applications. World Scientific.

Kass, G.V. (1980). An exploratory technique for investigating large quantities of categorical data. Journal of the Royal Statistical Society: Series C (Applied Statistics). 29(2):119-127.

Breiman, L., et al (1984). Classification and regression trees. CRC press.

Hunt, E.B., J. Marin, and P.J. Stone (1966). Experiments in induction. Academic Press.

Friedman, J.H. (1991). Multivariate adaptive regression splines. The Annals of Statistics. 19(1):1-67.

Mehta, M., R. Agrawal, and J. Rissanen (1996). SLIQ: A fast scalable classifier for data mining. In Proceedings of the International Conference on Extending Database Technology: 25-29 March 1996; Avignon, France; pp 18-32.

Shafer, J., R. Agrawal, and M. Mehta (1996). SPRINT: A scalable parallel classifier for data mining. In Proceedings of the 22nd VLDB Conference: 3-6 September 1996; Mumbai (Bombay), India; pp 544-555.

Joshi, M.V., G. Karypis, and V. Kumar (1998). ScalParC: A new scalable and efficient parallel classification algorithm for mining large datasets. In Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing: 30 March-3 April 1998; Orlando, FL, USA.

Bowyer, K.W., et al (2000). A parallel decision tree builder for mining very large visualization datasets. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics: 'Cybernetics Evolving to Systems, Humans, Organizations, and their Complex Interactions': 8-11 October 2000; Nashville, TN, USA.

Ranka, S. and V. Singh (1998). CLOUDS: A decision tree classifier for large datasets. In Proceedings of the 4th Knowledge Discovery and Data Mining Conference: 27-31 August 1998; New York, USA.

Liaw, A. and M. Wiener (2002). Classification and regression by randomForest. R News. 2(3):18-22.

Rastogi, R. and K. Shim (2000). PUBLIC: A decision tree classifier that integrates building and pruning. Data Mining and Knowledge Discovery. 4(4):315-344.

Bahaghighat, M., et al (2020). Estimation of wind turbine angular velocity remotely found on video mining and convolutional neural network. Applied Sciences. 10(10):3544.

Bahaghighat, M., et al (2020). ConvLSTMConv network: A deep learning approach for sentiment analysis in cloud computing. Journal of Cloud Computing: Advances, Systems and Applications. 9(16).

Abedini, F., et al (2019). Wind turbine tower detection using feature descriptors and deep learning. Facta Universitatis, Series: Electronics and Energetics. 33(1):133-153.

Bahaghighat, M., et al (2019). Vision Inspection of Bottle Caps in Drink Factories Using Convolutional Neural Networks. In Proceedings of the IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP): 5-7 September 2019; Cluj-Napoca, Romania.

Bahaghighat, M., et al (2019). A machine learning based approach for counting blister cards within drug packages. IEEE Access. 7:83785-83796.




How to Cite

Shamseen, A. ., Mohammadi Zanjireh , M. ., Bahaghighat, M., & Xin, Q. (2021). DEVELOPING A PARALLEL CLASSIFIER FOR MINING IN BIG DATA SETS. IIUM Engineering Journal, 22(2), 119–134.



Electrical, Computer and Communications Engineering