Integration of MFCCs and CNN for Multi-Class Stress Speech Classification on Unscripted Dataset

Authors

  • Nur Aishah Zainal International Islamic University Malaysia https://orcid.org/0000-0002-3718-374X
  • Ani Liza Asnawi International Islamic University Malaysia
  • Ahmad Zamani Jusoh International Islamic University Malaysia
  • Siti Noorjannah Ibrahim International Islamic University Malaysia https://orcid.org/0000-0002-2892-5959
  • Huda Adibah Mohd. Ramli International Islamic University Malaysia

DOI:

https://doi.org/10.31436/iiumej.v25i2.3207

Keywords:

Multi-class stress classification, Unscripted dataset, Speech stress detection, MFCCs, CNN

Abstract

Stress arises from the interaction between individuals and their environment: perceived threats can lead to serious consequences if prolonged, and stress is consistently linked to adverse physical and mental health outcomes. Our study explores methods for classifying stress from speech, using an unscripted dataset from an experimental study that captured the spontaneous reactions of stressed individuals. Mel-Frequency Cepstral Coefficients (MFCCs) are a promising speech feature because they represent the short-term power spectrum on the mel frequency scale, which reflects human auditory perception, a property that is especially useful in stressed-speech recognition. Leveraging deep learning, specifically a Convolutional Neural Network (CNN), our research combines these speech features with CNN algorithms for stress classification. Although publications on unscripted datasets and multi-class stress classification remain scarce, our study advocates their adoption, aiming to improve performance metrics and broaden research in this area. The proposed system shows that MFCCs achieve an accuracy of 95.67% in distinguishing among three stress classes (low-stress, medium-stress, and high-stress), surpassing the prior study on an unscripted dataset by 81.86%. This highlights the efficacy of the proposed MFCCs-CNN system for stress classification.

ABSTRAK (Malay abstract, translated): Stress is an interaction between individuals and their environment, in which threats lead to serious consequences if prolonged, and it is consistently associated with adverse physical and mental health effects. This study examines methods of stress classification through speech, using an unscripted dataset obtained from an experimental study that is able to capture the spontaneous reactions of stressed individuals. Mel-Frequency Cepstral Coefficients (MFCCs) emerge as a promising speech feature, adept at concisely representing the power spectrum important to human auditory perception, especially in stressed-speech recognition. Leveraging deep learning technology, specifically the Convolutional Neural Network (CNN), this study combines speech features and CNN algorithms for stress classification. Although publications on unscripted datasets and multi-class stress classification are scarce, this study promotes their use, aiming to improve performance metrics and broaden research in the field. The proposed system shows that MFCCs achieve an accuracy of 95.67% in distinguishing among three stress classes (low stress, medium stress, and high stress), surpassing the previous unscripted-dataset study by 81.86%. This demonstrates the effectiveness of the MFCCs-CNN system in stress classification.
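The abstract describes a pipeline in which MFCCs are extracted from speech and passed to a CNN that outputs one of three stress classes. The following NumPy-only sketch illustrates that general idea; it is not the authors' implementation. The 16 kHz sample rate, 25 ms frames with a 10 ms hop, 0.97 pre-emphasis, 26 mel filters, 13 coefficients, and the tiny one-layer CNN (eight 3×3 kernels, ReLU, global average pooling, a dense layer, softmax) are all illustrative assumptions, and the function names (`mfcc`, `mel_filterbank`, `dct_ii`, `conv2d_valid`, `tiny_cnn_predict`) are hypothetical. The CNN weights are random and untrained, so the predicted class carries no meaning; the example only shows how an MFCC matrix flows through a convolution to three class probabilities.

```python
import numpy as np

# --- MFCC extraction (illustrative parameters, not taken from the paper) ---

def hz_to_mel(f):
    # HTK-style mel scale
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale; shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge
            fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge (peak of 1 at k == center)
            fb[i - 1, k] = (right - k) / (right - center)
    return fb

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sr=16000, frame_ms=25, hop_ms=10, n_fft=512, n_filters=26, n_ceps=13):
    frame_len, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)                    # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft           # power spectrum
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    return dct_ii(log_mel, n_ceps)                                      # (n_frames, n_ceps)

# --- Tiny CNN forward pass (hypothetical architecture, random untrained weights) ---
CLASSES = ("low-stress", "medium-stress", "high-stress")

def conv2d_valid(x, kernels):
    """Valid 2-D cross-correlation of x (H, W) with kernels (C, kh, kw) -> (C, H-kh+1, W-kw+1)."""
    c, kh, kw = kernels.shape
    out = np.zeros((c, x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            out[:, i, j] = np.tensordot(kernels, x[i:i + kh, j:j + kw], axes=([1, 2], [0, 1]))
    return out

def tiny_cnn_predict(features, params):
    z = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)  # per-coefficient standardization
    fmap = np.maximum(conv2d_valid(z, params["kernels"]) + params["conv_bias"][:, None, None], 0.0)  # conv + ReLU
    pooled = fmap.mean(axis=(1, 2))                  # global average pooling -> (C,)
    logits = params["W"] @ pooled + params["b"]      # dense layer -> (3,)
    e = np.exp(logits - logits.max())
    return e / e.sum()                               # softmax over the three classes

rng = np.random.default_rng(0)
params = {"kernels": rng.normal(0.0, 0.1, size=(8, 3, 3)), "conv_bias": np.zeros(8),
          "W": rng.normal(0.0, 0.1, size=(3, 8)), "b": np.zeros(3)}
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 220 * t) + 0.01 * rng.normal(size=sr)  # 1 s synthetic tone, not real speech
feats = mfcc(signal, sr)
probs = tiny_cnn_predict(feats, params)
print(feats.shape)                                   # (98, 13)
print(CLASSES[int(np.argmax(probs))], probs.round(3))
```

In a real system the CNN weights would be learned from labelled recordings with a deep learning framework, and a tested MFCC implementation such as `librosa.feature.mfcc` would normally replace the hand-written extractor shown here.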

Published

2024-07-14

How to Cite

Zainal, N. A., Asnawi, A. L., Jusoh, A. Z., Ibrahim, S. N., & Mohd. Ramli, H. A. (2024). Integration of MFCCs and CNN for Multi-Class Stress Speech Classification on Unscripted Dataset. IIUM Engineering Journal, 25(2), 381–395. https://doi.org/10.31436/iiumej.v25i2.3207

Issue

Vol. 25 No. 2

Section

Mechatronics and Automation Engineering
