Decimal Digits Recognition from Lip Movement Using GoogleNet Network
DOI: https://doi.org/10.32792/jeps.v12i2.195

Keywords: Viola-Jones, GoogleNet

Abstract
Lip reading is a visual way of understanding speech from the movement of the lips. It is useful especially for the hearing impaired and for people in noisy environments such as stadiums and airports. Lip reading is not easy, however: it faces many difficulties, especially when a video of the speaker is recorded, including lighting, rotation, the person's pose, and differences in skin color. Researchers are therefore constantly looking for new lip-reading techniques.
The main objective of this paper is to design and implement an effective system for identifying decimal digits from lip movement. Our proposed system consists of two stages. In the first, preprocessing, stage the Viola-Jones algorithm detects the face and mouth region in each video frame, and the lip region is cropped and stored in a temporary folder, as sketched below.
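The paper does not give its preprocessing code, so the following is a minimal Python sketch assuming OpenCV's Haar-cascade implementation of Viola-Jones for face detection and a lower-third-of-face heuristic for the lip crop; the function name extract_lip_frames and the 224x224 output size are illustrative choices, not taken from the paper.

import os
import cv2

def extract_lip_frames(video_path, out_dir):
    """Detect the face in each frame and save the cropped lip region to out_dir."""
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        for (x, y, w, h) in faces[:1]:  # take the first detected face only
            # Assumed lip region: the lower third of the face bounding box.
            lips = frame[y + 2 * h // 3 : y + h, x : x + w]
            lips = cv2.resize(lips, (224, 224))  # GoogleNet's input resolution
            cv2.imwrite(os.path.join(out_dir, "lip_%06d.png" % saved), lips)
            saved += 1
    cap.release()
    return saved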
In the second stage the lip frames are fed into a GoogleNet neural network, where features are extracted by the convolutional layers and the frames are then classified. The results were convincing: we obtained an accuracy of 87% on a database of 35 videos, recorded by seven males and two females, comprising 21,501 lip images.
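Likewise, a minimal sketch of the classification stage, assuming transfer learning with torchvision's pretrained GoogLeNet; the 10-way replacement head, the Adam optimizer, and the learning rate are assumptions for illustration, since the paper does not report its exact training setup.

import torch
import torch.nn as nn
from torchvision import models

NUM_DIGITS = 10  # decimal digits 0-9

# Load GoogLeNet pretrained on ImageNet and replace its 1000-way
# classifier head with a 10-way layer for the digit classes.
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_DIGITS)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed settings

def train_step(images, labels):
    """One optimization step on a batch of 224x224 lip-frame tensors."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

At inference time each lip frame would be passed through the network in eval mode, with the arg-max over the ten outputs giving the predicted digit.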