SPECTROSCOPY DATA CALIBRATION USING STACKED ENSEMBLE MACHINE LEARNING

DOI:

https://doi.org/10.31436/iiumej.v25i1.2796

Keywords:

chemometrics calibration, ensemble machine learning, near infrared spectroscopy

Abstract

Near infrared spectroscopy (NIRS) is a widely used technique for non-destructive analysis of various materials, with applications that include food fraud detection. However, accurate calibration of NIRS data can be challenging because the relationships between the spectral data and the target variables of interest are complex. Ensemble learning, which combines multiple models to make predictions, has been shown to improve the accuracy and robustness of predictive models in various domains. This paper proposes stacking ensemble machine learning (SEML), which involves two levels of learning, for the calibration of NIRS data. Eight (8) spectroscopy datasets, drawn from public repositories and from previously published work by the authors, are used as case studies. The model generalized well: on the test samples it achieved a coefficient of determination (R²) of at least 0.8 in the regression tasks and a classification accuracy (CA) of at least 0.8 in the classification tasks. In addition, the proposed SEML improves on, or at least matches, the accuracy of the individual base learners in both training and test samples for all regression and classification datasets. On the test samples, R² ranges from 0.86 to nearly 1 for the regression datasets and CA ranges from 0.89 to 1 for the classification datasets.

ABSTRAK (translated from Malay): Near infrared spectroscopy (NIRS) is an analytical technique widely used to analyse various materials without damaging them, including for detecting food fraud. However, accurate calibration of NIRS data is very challenging because the relationship between the spectral data and the target variables under study is complex. Ensemble learning, that is, the combination of multiple models to make predictions, has been shown to improve the accuracy and robustness of predictive models in various domains. This study proposes Stacking Ensemble Machine Learning (SEML) for the calibration of NIRS data, involving two levels of learning. Eight (8) spectroscopy datasets from public repositories and from earlier studies by the authors were used as case studies. The model generalizes the data with R² of at least 0.8 on the test samples in the respective regression tasks and with classification accuracy (CA) of at least 0.8 in the respective classification tasks. In addition, the proposed SEML improves on, or at least matches, the accuracy of the individual base learners in both training and test samples for all regression and classification datasets. It shows the best performance on the test samples for both groups of datasets, with R² between 0.86 and nearly 1 for regression and CA between 0.89 and 1 for classification.
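The two-level idea named in the abstract can be illustrated with a small, dependency-free sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the two base learners (a single-band linear fit and a k-nearest-neighbour regressor), the one-weight blending meta-learner, the five-fold out-of-fold scheme, the synthetic "spectra", and all function names. The abstract does not state which learners or software the authors used.

```python
import random


def make_band_fitter(band):
    """Level-0 learner (assumed): least-squares line on one spectral band."""
    def fit(X, y):
        xs = [row[band] for row in X]
        mx, my = sum(xs) / len(xs), sum(y) / len(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        slope = sum((x - mx) * (t - my) for x, t in zip(xs, y)) / sxx
        return lambda row: my + slope * (row[band] - mx)
    return fit


def fit_knn(X, y, k=3):
    """Level-0 learner (assumed): k-nearest-neighbour regression."""
    def predict(row):
        def dist(i):
            return sum((a - b) ** 2 for a, b in zip(X[i], row))
        nearest = sorted(range(len(X)), key=dist)[:k]
        return sum(y[i] for i in nearest) / k
    return predict


def out_of_fold(fit, X, y, folds=5):
    """Predict each sample with a model that did not see it in training."""
    preds = [0.0] * len(X)
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in range(f, len(X), folds):
            preds[i] = model(X[i])
    return preds


def fit_stack(X, y, base_fits=(make_band_fitter(0), fit_knn)):
    """Level 1: blend two base learners with one least-squares weight w."""
    p1, p2 = (out_of_fold(fit, X, y) for fit in base_fits)
    diff = [a - b for a, b in zip(p1, p2)]
    w = sum(d * (t - b) for d, t, b in zip(diff, y, p2)) / sum(d * d for d in diff)
    m1, m2 = (fit(X, y) for fit in base_fits)  # refit bases on all training data
    return lambda row: w * m1(row) + (1 - w) * m2(row)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Synthetic stand-in for spectra: 80 samples, 4 "bands" (not real NIRS data).
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(80)]
y = [3 * r[0] + r[1] ** 2 + rng.gauss(0, 0.05) for r in X]
X_train, y_train, X_test, y_test = X[:60], y[:60], X[60:], y[60:]

stack = fit_stack(X_train, y_train)
print("test R^2:", round(r2_score(y_test, [stack(r) for r in X_test]), 3))
```

The meta-learner is trained on out-of-fold predictions so that it does not simply reward whichever base learner memorises the training set best. A practical calibration would more likely use a library implementation with several base learners and a richer meta-learner.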

Published

2024-01-01

How to Cite

Solihin, M. I., Yuan, C. J., Hong, W. S., Pui, L. P., Kit, A. C., Hossain, W., & Machmudah, A. (2024). SPECTROSCOPY DATA CALIBRATION USING STACKED ENSEMBLE MACHINE LEARNING. IIUM Engineering Journal, 25(1), 208–224. https://doi.org/10.31436/iiumej.v25i1.2796

Section

Electrical, Computer and Communications Engineering