Design of Intelligent Feature Selection Technique for Phishing Detection

Sharvari Sagar Patil; Narendra M. Shekokar; Sridhar Chandramohan Iyer

doi:10.31436/iiumej.v26i1.3337

Authors

Sharvari Sagar Patil, Dwarkadas J. Sanghvi College of Engineering, https://orcid.org/0000-0002-0721-8788
Narendra M. Shekokar, Dwarkadas J. Sanghvi College of Engineering
Sridhar C. Iyer, Dwarkadas J. Sanghvi College of Engineering, https://orcid.org/0000-0003-3964-2476

Keywords:

Reinforcement Learning, Feature Selection, Phishing Detection, Machine Learning

Abstract

Phishing attacks pose significant threats to individuals and organizations by enabling unauthorized access. Attackers redirect users to fake websites and steal their credentials and other confidential data. Existing detection techniques use either machine learning algorithms or static methods that blacklist web URLs. Because attackers keep changing how they launch attacks, traditional phishing detection techniques struggle to safeguard users. The performance of conventional detection methods depends on exhaustive data and on the features selected for classification, and the selected features contribute substantially to the performance of the detection system. Phishing detection techniques rely mainly on static features chosen by traditional feature selection or ranking techniques. This paper proposes an innovative approach to phishing detection: a feature selection technique based on reinforcement learning (RL). A novel RL agent is designed that uses a dynamic, adaptive, and data-driven approach to improve classifier performance in phishing detection by selecting features dynamically. We evaluated the technique on a real-world phishing dataset and compared its performance with existing techniques. In this evaluation, the proposed dynamic feature selection methodology achieves its best accuracy, 99.07%, with the random forest classifier model. Our work contributes to advancing phishing detection methodology by developing a dynamic feature selection technique.
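The abstract does not specify the agent's states, actions, or reward, so the following is only a minimal sketch of what reinforcement-learning-driven feature selection can look like, not the authors' method. It assumes an epsilon-greedy agent whose action is to add one feature, with the change in validation accuracy as the reward. The data are synthetic, and a nearest-centroid classifier stands in for the random forest so that the example needs only the Python standard library. All names (`make_data`, `accuracy`, `select_features`) are hypothetical.

```python
import random

def make_data(n, rng):
    """Synthetic stand-in for a phishing dataset (assumption):
    features 0 and 1 carry signal, features 2-5 are pure noise."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(2.0 * y, 1.0), rng.gauss(-2.0 * y, 1.0)]
        x += [rng.gauss(0.0, 1.0) for _ in range(4)]
        rows.append((x, y))
    return rows

def accuracy(train, val, subset):
    """Validation accuracy of a nearest-centroid classifier
    restricted to the feature indices in `subset`."""
    subset = sorted(subset)
    centroids = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        centroids[label] = [sum(p[j] for p in pts) / len(pts) for j in subset]
    correct = 0
    for x, y in val:
        dist = {l: sum((x[j] - c) ** 2 for j, c in zip(subset, centroids[l]))
                for l in (0, 1)}
        correct += (min(dist, key=dist.get) == y)
    return correct / len(val)

def select_features(train, val, n_features, episodes=200, eps=0.2,
                    alpha=0.1, seed=0):
    """Epsilon-greedy agent: q[j] tracks the accuracy gain observed
    when feature j is added to the current subset."""
    rng = random.Random(seed)
    q = [0.0] * n_features
    best_subset, best_acc = [], 0.0
    for _ in range(episodes):
        subset, acc = [], 0.5            # empty subset scored at chance level
        remaining = list(range(n_features))
        while remaining:
            if rng.random() < eps:
                j = rng.choice(remaining)               # explore
            else:
                j = max(remaining, key=lambda f: q[f])  # exploit
            remaining.remove(j)
            new_acc = accuracy(train, val, subset + [j])
            reward = new_acc - acc                      # accuracy gain
            q[j] += alpha * (reward - q[j])             # incremental update
            subset.append(j)
            acc = new_acc
            if acc > best_acc:
                best_subset, best_acc = sorted(subset), acc
    return best_subset, best_acc, q

data_rng = random.Random(42)
train = make_data(300, data_rng)
val = make_data(200, data_rng)
best_subset, best_acc, q_values = select_features(train, val, n_features=6)
print("selected features:", best_subset, "validation accuracy:", best_acc)
```

In this sketch the reward is tied directly to classifier performance, which is one way to make selection data-driven rather than fixed by a static ranking. A fuller treatment would evaluate the chosen subset on a held-out test set, since the validation accuracy used as the reward is optimistically biased.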

ABSTRAK (English translation of the Malay abstract): Phishing attacks pose a major threat to individuals and organizations by gaining unauthorized access. Attackers redirect users to fake websites and steal login credentials and other confidential data. Various techniques are used to detect phishing, using machine learning algorithms or static detection techniques that rely on blacklisted website URLs. Attackers tend to change their approach when launching attacks, making it difficult for traditional phishing detection techniques to protect users. The performance of conventional detection methods depends on exhaustive data and on the features selected for classification. Phishing detection techniques mostly rely on static features selected through traditional feature selection or ranking methods. This study proposes an innovative approach to phishing detection by designing a feature selection technique based on reinforcement learning. A new reinforcement learning agent is designed that uses a dynamic, adaptive, and data-driven approach to improve classifier performance in phishing detection. The technique is designed to select features dynamically using the RL agent. The technique is evaluated on a real phishing dataset and its performance is compared with existing techniques. Based on the evaluation, this dynamic feature selection methodology achieves a best accuracy of 99.07% with the random forest classifier model. This work contributes to the advancement of phishing detection methodology by developing a dynamic feature selection technique.
