CLASSIFICATION MODEL FOR BREAST CANCER MAMMOGRAMS

Suzani Mohamad Samuri; Try Viananda Nova; Bahbibi Rahmatullah; Shir Li Wang; Z.T Al-Qaysi

doi:10.31436/iiumej.v23i1.1825

Authors

Suzani Mohamad Samuri Sultan Idris Education University https://orcid.org/0000-0003-4651-1431
Try Viananda Nova https://orcid.org/0000-0002-2533-3528
Bahbibi Rahmatullah https://orcid.org/0000-0002-6920-8112
Shir Li Wang https://orcid.org/0000-0003-4417-3213
Z.T Al-Qaysi

Keywords:

Machine learning, Breast cancer detection, Mammogram images, Data mining, Data-driven modelling.

Abstract

Machine learning has been a topic of interest in research on the early detection of breast cancer from mammogram images. In this study, we compare the performance of three (3) machine learning techniques, 1) Naïve Bayes (NB), 2) Neural Network (NN) and 3) Support Vector Machine (SVM), on 2000 digital mammogram images to choose the technique that best models the relationship between the extracted features and the state of the breast (‘Normal’ or ‘Cancer’). The Grey Level Co-occurrence Matrix (GLCM), which represents the two-dimensional variation of grey levels in the image, is used in the feature extraction process. Six (6) attributes, namely contrast, variance, standard deviation, kurtosis, mean and smoothness, were computed as the extracted features and used as inputs to the classification process. The data were randomized and the experiment was repeated ten (10) times to check the consistency of the performance of all techniques. 70% of the data were used for training and the remaining 30% for testing. The results after ten (10) experiments show that the Support Vector Machine (SVM) gives the most consistent results in correctly classifying the state of the breast as ‘Normal’ or ‘Cancer’, with an accuracy of 99.4% in training and 98.76% in testing. The SVM classification model outperformed the NN and NB models in the study, which suggests that SVM is a good choice for determining the state of the breast at an early stage.
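As an illustration of the six attributes named in the abstract, the following is a minimal sketch in pure Python. It is not the authors' implementation: the function names `glcm_contrast` and `texture_features` are hypothetical, the GLCM is built from horizontally adjacent pixel pairs only, kurtosis is taken as the fourth standardised moment, and smoothness is assumed to be the common textbook definition R = 1 - 1/(1 + variance). The abstract does not specify any of these choices.

```python
import math


def glcm_contrast(image, levels):
    """Contrast of a normalised GLCM built from horizontally adjacent pixel pairs.

    Assumption: offset (0, 1), one direction only; the paper's offsets are not stated.
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            total += 1
    # Contrast weights each co-occurrence probability by the squared grey-level gap.
    return sum(
        (i - j) ** 2 * counts[i][j] / total
        for i in range(levels)
        for j in range(levels)
    )


def texture_features(image, levels):
    """Return the six attributes listed in the abstract for one grey-level image."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n  # population variance
    std = math.sqrt(variance)
    # Fourth standardised moment; defined as 0.0 for a perfectly flat image.
    fourth = sum((p - mean) ** 4 for p in pixels) / n
    kurtosis = fourth / variance ** 2 if variance > 0 else 0.0
    # Assumed definition of smoothness: 0 for a flat image, approaching 1 as variance grows.
    smoothness = 1 - 1 / (1 + variance)
    return {
        "contrast": glcm_contrast(image, levels),
        "variance": variance,
        "std": std,
        "kurtosis": kurtosis,
        "mean": mean,
        "smoothness": smoothness,
    }


# Tiny 4x4 image with 4 grey levels, used only to exercise the functions.
toy_image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(toy_image, levels=4)
print(features)
```

In practice each mammogram would first be quantised to a fixed number of grey levels, and the resulting six-element feature vector would be passed to the classifier.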

ABSTRAK: Machine learning has become a topic of interest in research related to the early detection of breast cancer based on mammogram images. In this study, we compare the performance results of three (3) types of machine learning techniques, 1) Naïve Bayes (NB), 2) Neural Network (NN) and 3) Support Vector Machine (SVM), with 2000 digital mammogram images until the best technique that can model the relationship between the extracted features and the state of the breast ('Normal' or 'Cancer') is obtained. The Grey Level Co-occurrence Matrix (GLCM), which represents the two-dimensional variation of grey levels in the image, is used in the feature extraction process. Six (6) attributes, consisting of contrast, variance, standard deviation, kurtosis, mean and smoothness, were computed as the extracted features and used as inputs to the classification process. The experiment was repeated ten (10) times to check the consistency of the performance of all techniques. 70% of the data were used as training data and the other 30% as testing data. The results after ten (10) experiments show that the Support Vector Machine (SVM) gives the most consistent results in correctly classifying the state of the breast as 'Normal' or 'Cancer', with an accuracy of 99.4% in training and 98.76% in testing. The SVM classification model outperformed the NN and NB models in this study, and this shows that SVM is a good choice for determining the state of the breast at an early stage.
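The evaluation protocol described above (randomize, split 70/30, repeat ten times, compare training and testing accuracy) can be sketched as follows. This is a hedged illustration, not the study's code: it uses only a small hand-written Gaussian Naïve Bayes classifier as a stand-in for the three techniques, the data are synthetic six-feature vectors rather than mammogram features, and the names `fit_gaussian_nb`, `predict_gaussian_nb` and `repeated_holdout` are hypothetical.

```python
import math
import random
import statistics


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means = [statistics.fmean(col) for col in zip(*rows)]
        # A small floor keeps every variance strictly positive.
        variances = [statistics.pvariance(col) + 1e-9 for col in zip(*rows)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior under independent Gaussians."""
    def log_posterior(label):
        prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


def repeated_holdout(X, y, runs=10, train_fraction=0.7, seed=0):
    """Shuffle, split train/test, fit and score; repeat `runs` times."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    cut = round(train_fraction * len(X))
    results = []
    for _ in range(runs):
        rng.shuffle(indices)
        train, test = indices[:cut], indices[cut:]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])

        def accuracy(idx):
            return sum(predict_gaussian_nb(model, X[i]) == y[i] for i in idx) / len(idx)

        results.append((accuracy(train), accuracy(test)))
    return results


# Synthetic stand-in data: 100 samples per class, six features each (assumed values).
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(100)]
X += [[data_rng.gauss(2.0, 1.0) for _ in range(6)] for _ in range(100)]
y = ["Normal"] * 100 + ["Cancer"] * 100

results = repeated_holdout(X, y)
test_scores = [test_acc for _, test_acc in results]
print("mean test accuracy:", statistics.fmean(test_scores))
print("std of test accuracy:", statistics.pstdev(test_scores))
```

Reporting the spread of test accuracy across the ten runs, alongside the mean, is one way to quantify the consistency that the abstract uses to rank the classifiers.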
