Automatic Lip Reading for Decimal Digits Using the ResNet50 Model

Authors

  • Computer Science Department, College of Education for Pure Sciences, University of Thi-Qar, Iraq
  • Computer Science Department, College of Education for Pure Sciences, University of Thi-Qar, Iraq

Keywords:

CNN, ResNet50, Viola-Jones

Abstract

Lip reading is a method of understanding speech through the movement of the lips. Audible speech is not accessible to every category of society, especially the hearing impaired and people in noisy environments, and lip reading offers an effective alternative solution to this problem. Our proposed system addresses it by taking a video of a person speaking digits. Pre-processing is then carried out with the Viola-Jones algorithm: the video is cut into sequential frames, the face and then the mouth are detected, the mouth region of interest (ROI) is extracted, and the mouth frames are fed into a convolutional neural network (ResNet50), which classifies them. If a test frame matches the training frames, the network is working correctly and the correct spoken digit is recognized; if it does not match, the mismatch counts toward the network's error rate. We evaluated the system on a standard database of the spoken digits 0 to 9, using seven speakers (five male and two female), and obtained an accuracy of 86%.
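To make the pipeline above concrete, the following is a minimal sketch of the frame-extraction, Viola-Jones detection, and ResNet50 classification steps, assuming OpenCV and PyTorch/torchvision. The specific cascade files (OpenCV's bundled frontal-face and smile cascades, the latter standing in for a mouth detector), the 224x224 input size, and the 10-way classification head are illustrative assumptions rather than the authors' exact configuration.

```python
import cv2
import torch
import torchvision.models as models
import torchvision.transforms as T

# Viola-Jones detectors shipped with OpenCV. OpenCV's default data bundle
# has no dedicated mouth cascade, so the smile cascade stands in here.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
mouth_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

# ResNet50 with its final fully connected layer replaced by a 10-way head,
# one class per spoken digit 0-9. In practice this network would be trained
# on the mouth-ROI frames before being used for prediction.
net = models.resnet50(weights=None)
net.fc = torch.nn.Linear(net.fc.in_features, 10)
net.eval()

preprocess = T.Compose([
    T.ToTensor(),                      # HWC uint8 -> CHW float in [0, 1]
    T.Resize((224, 224)),              # ResNet50's expected input size
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def digits_from_video(path):
    """Cut a video into frames, detect face then mouth, classify each ROI."""
    cap = cv2.VideoCapture(path)
    predictions = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
            # Restrict the mouth search to the lower half of the face box.
            lower = gray[y + h // 2 : y + h, x : x + w]
            for (mx, my, mw, mh) in mouth_cascade.detectMultiScale(lower, 1.7, 11):
                roi = frame[y + h // 2 + my : y + h // 2 + my + mh,
                            x + mx : x + mx + mw]
                batch = preprocess(cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)).unsqueeze(0)
                with torch.no_grad():
                    predictions.append(int(net(batch).argmax(dim=1)))
    cap.release()
    return predictions
```

In practice the ResNet50 would first be trained (or fine-tuned) on the mouth-ROI frames of the training speakers, and its per-frame predictions on test video would then be compared against the ground-truth digits to measure the accuracy reported above.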



Published

2023-02-14