Automatic Lip Reading for Decimal Digits Using the ResNet50 Model
DOI: https://doi.org/10.32792/jeps.v12i2.196
Keywords: CNN, ResNet50, Viola-Jones
Abstract
Lip reading is a method of understanding speech from the movement of the lips. Audio speech is not accessible to all groups of society, in particular the hearing impaired and people in noisy environments, and lip reading offers an alternative solution to this problem. Our proposed system takes a video of a person speaking digits. Pre-processing is carried out with the Viola-Jones algorithm: the video is cut into sequential frames, the face and then the mouth are detected, and the mouth region of interest (ROI) is extracted. The mouth frames are then fed into a convolutional neural network (ResNet50), which classifies them; each test frame is matched against the training frames, and when the match succeeds the network is working correctly and the spoken digit is recognized.
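The pre-processing stage can be illustrated with OpenCV's Haar-cascade (Viola-Jones) detectors. This is a minimal sketch rather than the authors' exact implementation: the cascade files, the use of the bundled smile cascade as a stand-in mouth detector, the lower-half-of-face search window, and the 224x224 output size are all assumptions made for illustration.

```python
import cv2

# Haar cascades bundled with OpenCV implement the Viola-Jones detector.
# The smile cascade is used here as a stand-in mouth detector (an assumption
# of this sketch, not a detail reported in the paper).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
mouth_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_smile.xml")

def extract_mouth_rois(video_path, size=(224, 224)):
    """Cut a video into frames and return the cropped mouth ROI of each frame."""
    rois = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces:
            # Search for the mouth only in the lower half of the detected face.
            lower = gray[y + h // 2: y + h, x: x + w]
            mouths = mouth_cascade.detectMultiScale(lower, scaleFactor=1.5, minNeighbors=11)
            for (mx, my, mw, mh) in mouths:
                roi = frame[y + h // 2 + my: y + h // 2 + my + mh,
                            x + mx: x + mx + mw]
                rois.append(cv2.resize(roi, size))
                break  # keep one mouth ROI per face
            break  # keep one face per frame
    cap.release()
    return rois
```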
If a test frame does not match the training frames, the mismatch contributes to the network's error rate. We evaluated the system on a standard database of the spoken digits 0 to 9, recorded by seven speakers (5 males and 2 females), and obtained an accuracy of 86%.
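The classification stage could be set up roughly as follows, as a hedged sketch in Keras. The choice of ImageNet weights, the pooling head, the optimizer, and the placeholder training arrays are assumptions for illustration, not details reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_DIGITS = 10  # classes: the spoken digits 0-9

def build_digit_classifier(input_shape=(224, 224, 3)):
    """ResNet50 backbone with a small classification head for the ten digits."""
    base = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)
    outputs = layers.Dense(NUM_DIGITS, activation="softmax")(x)
    model = models.Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Typical usage (array names are placeholders for the mouth-ROI frames and labels):
# model = build_digit_classifier()
# model.fit(train_mouth_frames, train_labels, validation_split=0.1, epochs=20)
# test_accuracy = model.evaluate(test_mouth_frames, test_labels)[1]
```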