Discrete wavelet transforms, Feature extraction, Hidden Markov Models, Speaker recognition, Wavelet coefficients


Speaker recognition is the process of identifying a speaker from their speech. It has many practical applications, such as remote access to a personal device, securing access to voice-controlled systems, and forensic investigation. In speaker recognition, extracting features from the speech is the most critical step, because the features must represent each speech sample distinctively enough to tell samples apart. This research proposes a combination of Wavelet and Mel Frequency Cepstral Coefficient (MFCC), Wavelet-MFCC, as the feature extraction method, and the Hidden Markov Model (HMM) as the classifier. The speech signal is first decomposed with a one-level wavelet transform; only the detail sub-band coefficients are then passed to MFCC for further feature extraction. The system was applied to a dataset of 300 speech samples from 30 speakers uttering the word “HADIR” in the Indonesian language. K-fold cross-validation was implemented with five folds: for each fold, 80% of the data were used for training and the remaining 20% for testing. In testing, the system using the Wavelet-MFCC combination achieved an accuracy of 96.67%.
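The two steps of the pipeline that the abstract specifies numerically, one-level wavelet decomposition and the five-fold split, can be illustrated with a minimal sketch. Several choices here are assumptions, not the authors' implementation: the abstract does not name the wavelet family, so Haar is used only because it is the simplest; the folds are contiguous and unshuffled, whereas a real evaluation would likely shuffle or stratify by speaker; and the function names `haar_dwt_level1` and `k_fold_indices` are hypothetical. The MFCC and HMM stages are omitted.

```python
import math


def haar_dwt_level1(signal):
    """One-level Haar DWT (assumed wavelet); returns (approx, detail) lists."""
    samples = list(signal)
    if len(samples) % 2:
        # Pad odd-length input by repeating the last sample.
        samples.append(samples[-1])
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(samples), 2)
    approx = [(samples[i] + samples[i + 1]) * scale for i in pairs]
    detail = [(samples[i] - samples[i + 1]) * scale for i in pairs]
    return approx, detail


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous folds.

    Assumes n_samples is divisible by k, as with 300 samples and 5 folds.
    """
    fold_size = n_samples // k
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx


if __name__ == "__main__":
    # Only the detail sub-band would be passed on to MFCC extraction.
    _, detail = haar_dwt_level1([0.1, 0.4, -0.2, 0.3, 0.0, 0.5])
    print("detail coefficients:", detail)

    # 300 utterances in 5 folds: 240 for training, 60 for testing per fold.
    for train_idx, test_idx in k_fold_indices(300, 5):
        print(len(train_idx), len(test_idx))
```

With 300 utterances and five folds, each fold trains on 240 samples and tests on 60, which matches the 80%/20% split described in the abstract.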

ABSTRAK: Speaker recognition is the process of identifying a speaker from their speech, and it can be used in many aspects of life, such as gaining remote access to a personal device, controlling voice-based access, and conducting forensic investigations. Extracting distinctive features from the speech is the most critical process in speaker recognition. These features capture the unique characteristics of an utterance so that utterances can be told apart. This research proposes a combination of Wavelet and Mel Frequency Cepstral Coefficient (MFCC), Wavelet-MFCC, as the feature extraction method, and the Hidden Markov Model (HMM) as the classifier. The speech signal is first decomposed using Wavelet into one level of decomposition, and then only the detail sub-band coefficients are used for further feature extraction using MFCC. The model was applied to a dataset of 300 speech samples from 30 speakers uttering the word "HADIR" in the Indonesian language. K-fold cross-validation was implemented with 5 folds. For each fold, 80% of the data were used for training, while the rest were used as testing data. Based on this testing, the system using the Wavelet-MFCC combination achieved an accuracy of 96.67%.



Author Biographies

Syahroni Hidayat, University of Mataram

Department of Agricultural Engineering, University of Mataram, Mataram City, Indonesia

Research and Development, Sekawan Institute, Mataram City, Indonesia

Muhammad Tajuddin, Universitas Bumigora

Department of Computer Science, Universitas Bumigora, Mataram City, Indonesia

Siti Agrippina Alodia Yusuf, Sekawan Institute

Research and Development, Sekawan Institute, Mataram City, Indonesia

Jihadil Qudsi, Politeknik Medica Farma Husada

Department of Medical Record, Politeknik Medica Farma Husada, Mataram City, Indonesia

Nenet Natasudian Jaya, Universitas Mahasaraswati Mataram

Department of Management, Universitas Mahasaraswati Mataram, Mataram City, Indonesia






How to Cite

Hidayat, S., Muhammad Tajuddin, Siti Agrippina Alodia Yusuf, Jihadil Qudsi, & Jaya, N. N. (2022). WAVELET DETAIL COEFFICIENT AS A NOVEL WAVELET-MFCC FEATURES IN TEXT-DEPENDENT SPEAKER RECOGNITION SYSTEM. IIUM Engineering Journal, 23(1), 68–81. https://doi.org/10.31436/iiumej.v23i1.1760



Electrical, Computer and Communications Engineering