Utilizing MFCCs and TEO-MFCCs to Classify Stress in Females Using SSNNA

Nur Aishah Zainal; Ani Liza Asnawi; Siti Noorjannah Ibrahim; Nor Fadhillah Mohamed Azmin; Norharyati Harum; Nora Mat Zin

doi:10.31436/iiumej.v26i1.3411

Authors

Nur Aishah Zainal International Islamic University Malaysia https://orcid.org/0000-0002-3718-374X
Ani Liza Asnawi International Islamic University Malaysia https://orcid.org/0000-0003-1964-5661
Siti Noorjannah Ibrahim International Islamic University Malaysia https://orcid.org/0000-0002-2892-5959
Nor Fadhillah Mohamed Azmin International Islamic University Malaysia https://orcid.org/0000-0003-4299-8828
Norharyati Harum International Islamic University Malaysia https://orcid.org/0000-0003-0068-6025
Nora Mat Zin International Islamic University Malaysia https://orcid.org/0000-0002-0679-2534

Keywords:

stress detection via speech, stress classification for female, MFCCs, CNN

Abstract

Everyone is susceptible to stress in everyday life, but stress affects females more strongly because of both biological and environmental factors. This study used female speech to detect stress and to classify recordings as stress or no-stress. Speech is a non-invasive and non-intrusive signal, which makes it well suited to identifying stress in females. Mel-frequency Cepstral Coefficients (MFCCs) were compared with Teager Energy Operator-MFCCs (TEO-MFCCs) to determine which speech feature best classifies emotions associated with stress and no-stress conditions in female voices. Using the Stress Speech Neural Network Architecture (SSNNA), the study achieved an improved accuracy of 93.9%. The results showed that MFCCs enhanced the higher-frequency components of stressed speech, which helped distinguish the stress and no-stress classes. SSNNA achieved high accuracy with 14 female voices, confirming that it can function independently of speaker identity.

ABSTRAK: All individuals are exposed to stress in their daily lives. However, stress has a greater influence on women because of biological and environmental factors. This study uses speech to detect and classify stress and no-stress among women. Speech, which is a non-invasive and non-intrusive approach, helps identify stress more effectively among women. A comparative analysis was conducted between Mel-frequency Cepstral Coefficients (MFCCs) and Teager Energy Operator-MFCCs (TEO-MFCCs). Its purpose was to determine the best speech feature for classifying emotions associated with stress and no-stress conditions in female voices. With the help of the Stress Speech Neural Network Architecture (SSNNA), higher performance metrics were achieved, with an accuracy of 93.9%. This research shows that MFCCs enhance the high-frequency components of stressed speech, effectively distinguishing between the stress and no-stress classes. This study shows that SSNNA achieved high accuracy with 14 female voices, confirming that it functions independently of speaker identity.
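
For readers unfamiliar with the Teager Energy Operator named in the abstract, the following is a minimal sketch of its standard discrete form, psi[x(n)] = x(n)^2 - x(n-1) * x(n+1), applied to a synthetic tone. It is an illustration only: the abstract does not describe how the authors combine the operator with MFCC extraction, so the function name `teager_energy`, the test tone, and its parameters are assumptions made here and are not taken from the paper.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator: x[n]^2 - x[n-1] * x[n+1].

    Hypothetical helper for illustration; the output is two samples
    shorter than the input because the first and last samples have
    no neighbour on one side.
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Synthetic example (assumed values, not data from the study).
amplitude = 2.0
omega = 0.3                      # angular frequency in radians per sample
n = np.arange(100)
tone = amplitude * np.cos(omega * n)

energy = teager_energy(tone)

# For a pure tone the operator is constant: A^2 * sin^2(omega),
# so it grows with both amplitude and frequency.
expected = (amplitude * np.sin(omega)) ** 2
```

Because the output depends on frequency as well as amplitude, the operator emphasizes rapid changes in a signal. In a TEO-MFCC pipeline the operator's output would feed a cepstral feature extractor, but the exact ordering and framing used in the study are not stated in the abstract.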
