Automatic Lip Reading for Decimal Digits Using the ResNet50 Model
DOI: https://doi.org/10.32792/jeps.v12i2.196
Keywords: CNN, ResNet50, Viola-Jones
Abstract
Lip reading is a method of understanding speech from the movement of the lips. Audio speech is not accessible to all groups in society, particularly the hearing impaired and people in noisy environments, and lip reading offers an alternative solution to this problem. Our proposed system takes a video of a person speaking digits. Preprocessing is then carried out with the Viola-Jones algorithm: the video is split into sequential frames, the face is detected, then the mouth, and the mouth region of interest (ROI) is cropped. The mouth frames are fed into a convolutional neural network (ResNet50), which classifies them. If a test frame matches the training frames, the network has worked correctly and the spoken digit is recognized; if it does not match, the mismatch contributes to the network's error rate. We used a standard database of the spoken digits 0 to 9 with seven speakers, 5 males and 2 females, and achieved an accuracy of 86%.
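The pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fixed mouth bounding box, the frame dimensions, and the intensity-binning stand-in for the classifier are all assumptions made so the sketch runs self-contained; a real system would obtain the box from a Viola-Jones cascade (e.g. OpenCV's `cv2.CascadeClassifier`) and feed the ROI sequence to a trained ResNet50.

```python
import numpy as np

# Hypothetical mouth bounding box (x, y, width, height) that a
# Viola-Jones mouth detector would return; hard-coded here so the
# sketch runs without OpenCV.
MOUTH_BOX = (40, 60, 32, 16)

def extract_mouth_rois(frames, box=MOUTH_BOX):
    """Crop the mouth region of interest (ROI) from every frame."""
    x, y, w, h = box
    return [f[y:y + h, x:x + w] for f in frames]

def classify_digit(rois):
    """Stand-in for the ResNet50 classifier: bin the mean ROI
    intensity into ten classes (0-9). A real system would run the
    ROI frames through a pretrained ResNet50 and take the argmax."""
    mean = float(np.mean(rois))
    return int(mean * 10 / 256) % 10

# Simulate a 10-frame grayscale video of 100x100-pixel frames.
video = [np.full((100, 100), 128, dtype=np.uint8) for _ in range(10)]
rois = extract_mouth_rois(video)
digit = classify_digit(rois)
print(len(rois), rois[0].shape, digit)  # 10 (16, 32) 5
```

Matching a test frame against the training frames, as described above, then amounts to comparing the predicted class label with the ground-truth digit for that clip.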
Copyright Policy
Authors retain copyright of their articles published in the Journal of Education for Pure Science (JEPS).
By submitting their work, authors grant the journal a non-exclusive license to publish, distribute, and archive the article in all formats and media.
License
All articles published in JEPS are licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0).
This license permits unrestricted use, distribution, and reproduction in any medium, provided that the original author(s) and the source are properly credited.
Author Rights
Authors have the right to:
- Share their articles on personal websites, institutional repositories, and academic platforms
- Reuse their work in future research and publications
- Distribute the published version without restriction
Journal Rights
The journal retains the right to:
- Publish and archive the articles
- Include them in indexing and archiving systems such as LOCKSS and CLOCKSS
- Promote and disseminate the published work
Responsibility
The contents of all articles are the sole responsibility of the authors. The journal, editors, and editorial board are not responsible for any errors, opinions, or statements expressed in the published articles.
Open Access Statement
JEPS provides immediate open access to its content, supporting the principle that making research freely available to the public enhances global knowledge exchange.