Speech Intelligibility Score Detection Based on Light Weight Deep Learning

Authors

  • Osama Ali Mohammed, Dept. of Computer Science, Faculty of Computer Science and Information Technology, Kerbala University, Kerbala, Iraq
  • Noor AL-Shakarchy, Dept. of Computer Science, Faculty of Computer Science and Information Technology, Kerbala University, Kerbala, Iraq

DOI:

https://doi.org/10.32792/jeps.v16i1.790

Keywords:

Classification, CNN, Deep learning, Speech intelligibility, UA-Speech, Speech disorders, Dysarthria

Abstract

Speech intelligibility evaluation is essential for determining the clarity of speech in individuals with speech impairments, as it directly supports clinical diagnosis and treatment planning. Because traditional evaluation techniques are frequently time-consuming, subjective, and unsuitable for routine or large-scale use, reliable automated systems are needed. In this study, we present a lightweight deep learning system that automates the classification of speech intelligibility using a two-dimensional convolutional neural network (2D CNN). Mel-frequency cepstral coefficients (MFCC) and log-mel spectrogram features are extracted from segmented speech samples and fed to the model. Experiments were conducted on the UA-Speech and TORGO benchmark datasets. The model was trained independently on male and female speech in UA-Speech to investigate adaptability, and a female-trained model was then fine-tuned on male data to apply transfer learning. To assess robustness, additional tests were conducted on mixed-gender speech from the TORGO dataset. The results show that the proposed model achieves high classification accuracy under all training settings, with spectrogram-based models exhibiting strong discriminative capacity and MFCC-based models converging more quickly. Comparisons with existing methods indicate that the proposed approach offers a good trade-off between accuracy and computational cost. Overall, this work presents a feasible and scalable method for automated speech intelligibility evaluation, with potential benefits for both clinical and research applications.
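The log-mel feature extraction mentioned in the abstract can be sketched in plain NumPy as follows. This is an illustrative implementation under assumed parameters (16 kHz audio, 512-point FFT, 256-sample hop, 40 mel bands), not the authors' exact pipeline; in practice a library such as librosa would typically be used, and its defaults may differ.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with center frequencies evenly spaced on the mel scale
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(y, sr, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, and take the power spectrum per frame
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project onto the mel filterbank, then convert to decibels
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(np.maximum(mel, 1e-10))  # shape: (frames, n_mels)

# Example: a 1-second synthetic 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(y, sr)
print(S.shape)  # (61, 40)
```

The resulting two-dimensional (time x mel-band) array is the kind of input a 2D CNN consumes directly; MFCCs would be obtained by further applying a discrete cosine transform along the mel axis.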

References

S. Venugopalan, J. Shor, M. Plakal, J. Tobin, K. Tomanek, J. R. Green, and M. P. Brenner, “Comparing Supervised Models and Learned Speech Representations for Classifying Intelligibility of Disordered Speech on Selected Phrases,” arXiv:2107.03985, July 2021, doi: 10.48550/arXiv.2107.03985.

J. Kim, N. Kumar, A. Tsiartas, M. Li, and S. S. Narayanan, “Automatic intelligibility classification of sentence-level pathological speech,” Computer Speech & Language, vol. 29, no. 1, pp. 132–144, Jan. 2015, doi: 10.1016/j.csl.2014.02.001.

C. Bhat and H. Strik, “Speech Technology for Automatic Recognition and Assessment of Dysarthric Speech: An Overview,” J Speech Lang Hear Res, vol. 68, no. 2, pp. 547–577, Feb. 2025, doi: 10.1044/2024_JSLHR-23-00740.

F. Javanmardi, S. R. Kadiri, and P. Alku, “Pre-trained models for detection and severity level classification of dysarthria from speech,” Speech Communication, vol. 158, p. 103047, Mar. 2024, doi: 10.1016/j.specom.2024.103047.

S. M. Shabber and E. P. Sumesh, “AFM signal model for dysarthric speech classification using speech biomarkers,” Front. Hum. Neurosci., vol. 18, p. 1346297, Feb. 2024, doi: 10.3389/fnhum.2024.1346297.

C. Spille, S. D. Ewert, B. Kollmeier, and B. T. Meyer, “Predicting speech intelligibility with deep neural networks,” Computer Speech & Language, vol. 48, pp. 51–66, Mar. 2018, doi: 10.1016/j.csl.2017.10.004.

A. Tripathi, S. Bhosale, and S. K. Kopparapu, “Automatic Speaker Independent Dysarthric Speech Intelligibility Assessment System,” Computer Speech & Language, vol. 69, p. 101213, Sept. 2021, doi: 10.1016/j.csl.2021.101213.

S. Venugopalan, J. Tobin, S. J. Yang, K. Seaver, R. J. N. Cave, P.-P. Jiang, N. Zeghidour, R. Heywood, J. Green, and M. P. Brenner, “Speech Intelligibility Classifiers from 550k Disordered Speech Samples,” arXiv:2303.07533, Mar. 2023, doi: 10.48550/arXiv.2303.07533.

A. Huang, K. Hall, C. Watson, and S. R. Shahamiri, “A Review of Automated Intelligibility Assessment for Dysarthric Speakers,” in 2021 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), Bucharest, Romania: IEEE, Oct. 2021, pp. 19–24. doi: 10.1109/SpeD53181.2021.9587400.

E. Yeo, J. M. Liss, V. Berisha, and D. R. Mortensen, “Potential Applications of Artificial Intelligence for Cross-language Intelligibility Assessment of Dysarthric Speech,” arXiv:2501.15858, doi: 10.48550/arXiv.2501.15858.

K. L. Kadi, S. A. Selouani, B. Boudraa, and M. Boudraa, “Fully automated speaker identification and intelligibility assessment in dysarthria disease using auditory knowledge,” Biocybernetics and Biomedical Engineering, vol. 36, no. 1, pp. 233–247, 2016, doi: 10.1016/j.bbe.2015.11.004.

A. A. Joshy and R. Rajan, “Automated Dysarthria Severity Classification: A Study on Acoustic Features and Deep Learning Techniques,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 30, pp. 1147–1157, 2022, doi: 10.1109/TNSRE.2022.3169814.

H. Tong, H. Sharifzadeh, and I. McLoughlin, “Automatic Assessment of Dysarthric Severity Level Using Audio-Video Cross-Modal Approach in Deep Learning,” in Interspeech 2020, ISCA, Oct. 2020, pp. 4786–4790. doi: 10.21437/Interspeech.2020-1997.

S. Gupta, A. T. Patil, M. Purohit, M. Parmar, M. Patel, H. A. Patil, and R. C. Guido, “Residual Neural Network precisely quantifies dysarthria severity-level based on short-duration speech segments,” Neural Networks, vol. 139, pp. 105–117, July 2021, doi: 10.1016/j.neunet.2021.02.008.

H. Kim, M. Hasegawa-Johnson, A. Perlman, J. Gunderson, T. Huang, K. Watkin, and S. Frame, “Dysarthric speech database for universal access research,” in Interspeech 2008, ISCA, Sept. 2008. doi: 10.21437/interspeech.2008-480.

F. Rudzicz, A. K. Namasivayam, and T. Wolff, “The TORGO database of acoustic and articulatory speech from speakers with dysarthria,” Lang Resources & Evaluation, vol. 46, no. 4, pp. 523–541, Dec. 2012, doi: 10.1007/s10579-011-9145-0.

A. Al-Ali, S. Al-Maadeed, M. Saleh, R. C. Naidu, Z. C. Alex, P. Ramachandran, R. Khoodeeram, and R. Kumar, “Classification of Dysarthria based on the Levels of Severity: A Systematic Review,” arXiv:2310.07264, Oct. 2023, doi: 10.48550/arXiv.2310.07264.

A. H. Andersen, E. Schoenmaker, and S. Van De Par, “Speech intelligibility prediction as a classification problem,” in 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), Vietri sul Mare, Salerno, Italy: IEEE, Sept. 2016, pp. 1–6. doi: 10.1109/MLSP.2016.7738814.

A. S. Al-Ali, R. M. Haris, Y. Akbari, M. Saleh, S. Al-Maadeed, and M. Rajesh Kumar, “Integrating binary classification and clustering for multi-class dysarthria severity level classification: a two-stage approach,” Cluster Comput, vol. 28, no. 2, p. 136, Apr. 2025, doi: 10.1007/s10586-024-04748-1.

P. N. Chowdary, V. S. Aravind, G. V. N. S. L. V. Vardhan, M. S. Akshay, M. S. Aashish, and J. L. G, “A Few-Shot Approach to Dysarthric Speech Intelligibility Level Classification Using Transformers,” in 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), July 2023, pp. 1–6. doi: 10.1109/ICCCNT56998.2023.10308067.

Published

2026-03-01