Decimal Digits Recognition from Lip Movement Using GoogleNet network


  • Computer Science Department, College of Education for Pure Sciences, University of Thi-Qar, Iraq.


Viola-Jones and GoogleNet


Lip reading is a visual means of understanding speech from the movement of the lips. It is especially
useful for the hearing impaired and for people in noisy environments such as stadiums and airports. Lip
reading is not easy: video of a speaker presents many difficulties, including lighting, rotation, the
person's position, and differences in skin color. Researchers are therefore constantly looking for new
lip-reading techniques.
The main objective of this paper is to design and implement an effective system for identifying decimal
digits from lip movement. The proposed system consists of two stages. In the first stage, preprocessing,
the face and mouth area are detected with the Viola-Jones algorithm, and the lip region is located and
stored in a temporary folder. In the second stage, the lip frames are fed to a GoogleNet neural network,
where features are extracted in the convolutional layers and then classified. The results were
convincing: we obtained an accuracy of 87% on a database of 35 videos recorded from seven males and two
females, comprising 21,501 lip images in total.
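
The preprocessing stage relies on the Viola-Jones detector, which evaluates Haar-like rectangle features in constant time using an integral image (summed-area table). The following is a minimal sketch of that one building block in plain Python, not the authors' implementation. The function names (integral_image, rect_sum, two_rect_feature) and the toy 4x4 patch are illustrative assumptions. A full detector would also need a trained cascade of boosted classifiers, which is omitted here.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Return a summed-area table padded with one extra row and column of zeros.

    ii[y][x] holds the sum of img[r][c] for all r < y and c < x.
    """
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of pixels in the w-by-h rectangle with top-left corner (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Horizontal-edge Haar-like feature: top half minus bottom half (h must be even)."""
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return top - bottom


# Toy 4x4 grayscale patch (hypothetical data): bright upper rows, dark lower rows.
patch = [
    [9, 9, 9, 9],
    [9, 9, 9, 9],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
ii = integral_image(patch)
print(rect_sum(ii, 0, 0, 4, 4))          # 80, total intensity of the patch
print(two_rect_feature(ii, 0, 0, 4, 4))  # 64, strong response to the horizontal edge
```

Because every rectangle sum costs four table lookups regardless of its size, a detector can afford to evaluate many such features at every window position and scale, which is what makes this kind of face and mouth detection fast enough for per-frame use on video.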


