Literature Survey - Lip Reading Model

Gauresh Chopadekar¹, Nandini Pandey², Numan Rakhangi³, Shraddha Balsaraf⁴, Prof. V. P. Patil⁵

  1. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  2. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  3. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  4. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  5. Professor, Dept. of AI & ML, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India

IRJIET, Volume 8, Issue 4, April 2024, pp. 143–151
DOI: https://doi.org/10.47001/IRJIET/2024.804019

Abstract

Although automatic speech recognition (ASR) technology is mature, some problems remain unsolved, such as accurately recognizing what a speaker is saying in a noisy environment. Lipreading is a visual speech recognition technology that recognizes speech content from the motion of the speaker's lips, without using the audio signal; it can therefore recover the speaker's words in noisy environments, or even when no voice signal is available at all. This article surveys the main research on lipreading, from traditional methods to deep learning methods. Traditional lipreading methods are discussed from three aspects: lip detection and extraction, lip feature extraction, and classification. Traditional feature extraction relies on handcrafted features, which are not reliable under unconstrained conditions. In recent years, traditional methods have gradually been replaced by deep learning methods, whose advantage is that they can learn the most discriminative features from large databases. This article analyzes typical deep learning methods in detail according to their structural characteristics, and lists existing lipreading databases, including their detailed information and the methods applied to them. Finally, the problems and challenges of current lipreading methods are discussed, and future research directions are outlined.
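
To make the traditional pipeline concrete, the sketch below is a minimal, illustrative Python example rather than code from any surveyed paper: it localizes a mouth region with OpenCV's bundled Haar face detector and computes a handcrafted feature in the spirit of the frame-difference and horizontal-vertical image projection approach of [6]. The lower-third mouth heuristic and the 64×32 ROI size are illustrative assumptions.

```python
import cv2
import numpy as np

# Face detector shipped with opencv-python; the mouth ROI below is a
# heuristic (lower third of the detected face box), assumed for illustration.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def mouth_roi(gray_frame):
    """Return the grayscale mouth region, or None if no face is found."""
    faces = face_cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return gray_frame[y + 2 * h // 3 : y + h, x : x + w]

def projection_features(prev_roi, roi, size=(64, 32)):
    """Frame difference of consecutive mouth ROIs, followed by horizontal
    and vertical projections, giving a fixed-length handcrafted feature."""
    a = cv2.resize(prev_roi, size).astype(np.float32)
    b = cv2.resize(roi, size).astype(np.float32)
    diff = np.abs(a - b)
    h_proj = diff.sum(axis=1)   # one motion-energy value per row
    v_proj = diff.sum(axis=0)   # one motion-energy value per column
    return np.concatenate([h_proj, v_proj])
```

The per-frame feature vectors produced this way would then feed the classification stage of the traditional pipeline, for example an HMM or SVM sequence classifier.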
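
On the deep learning side, the following is a minimal PyTorch sketch of the common design the survey describes: a 3D convolutional front-end that learns spatiotemporal lip features directly from pixels, followed by a recurrent back-end, in the spirit of LipNet [4] and the 3D-2D-CNN BLSTM models [8]. The layer sizes, clip shape, and 500-word vocabulary are illustrative assumptions, not a reproduction of any published architecture.

```python
import torch
import torch.nn as nn

class LipReadingNet(nn.Module):
    """Word-level lipreading sketch: 3D-CNN front-end + BiGRU back-end."""

    def __init__(self, num_classes: int = 500, hidden: int = 256):
        super().__init__()
        # 3D convolutions learn spatiotemporal lip-motion features
        # directly from pixels, replacing handcrafted descriptors.
        self.frontend = nn.Sequential(
            nn.Conv3d(1, 32, kernel_size=(3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)),
            nn.BatchNorm3d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep the time axis, pool space
        )
        # A bidirectional GRU models the temporal dynamics of the clip.
        self.gru = nn.GRU(64, hidden, num_layers=2,
                          batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (B, 1, T, H, W) grayscale mouth-region video
        feats = self.frontend(clips)             # (B, 64, T, 1, 1)
        feats = feats.squeeze(-1).squeeze(-1)    # (B, 64, T)
        feats = feats.transpose(1, 2)            # (B, T, 64)
        out, _ = self.gru(feats)                 # (B, T, 2*hidden)
        return self.classifier(out.mean(dim=1)) # average over time, then classify

# Example: 8 clips of 29 frames at 64x64 -> logits over a 500-word vocabulary.
model = LipReadingNet()
logits = model(torch.randn(8, 1, 29, 64, 64))
print(logits.shape)  # torch.Size([8, 500])
```

Sentence-level systems such as LipNet replace the time-averaged classifier head with per-frame outputs trained under a CTC loss, but the front-end/back-end split is the same.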

References

  1. M. Hao et al., "A survey of research on lipreading technology," IEEE Access, vol. 8, 2020, doi: 10.1109/ACCESS.2020.3036865.
  2. S. Petridis and M. Pantic, "Audio-visual automatic speech recognition: An overview," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 5, pp. 1033–1048, 2017, doi: 10.1109/TPAMI.2016.2578619.
  3. M. Wand, D. Kottke, C. Meyer, and F. Schüssel, "Survey on visual speech synthesis: A step towards lifelike virtual avatars," 2020, arXiv:2004.04579.
  4. Y. M. Assael, B. Shillingford, S. Whiteson, and N. de Freitas, "LipNet: End-to-end sentence-level lipreading," 2016, arXiv:1611.01599.
  5. J. Zhang, W. Sun, J. Du, and J. Chen, "Deep lip reading: A comparison between models," IEEE Access, vol. 7, pp. 16723–16733, 2019, doi: 10.1109/ACCESS.2019.2891403.
  6. A. Nasuha, F. Arifin, T. Sardjono, H. Takahashi, and M. H. Purnomo, "Automatic lip reading for daily Indonesian words based on frame difference and horizontal-vertical image projection," J. Theor. Appl. Inf. Technol., vol. 95, pp. 393–402, Jan. 2017.
  7. J. S. Chung, A. Senior, O. Vinyals, and A. Zisserman, "Lip reading sentences in the wild," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 3444–3453.
  8. D. K. Margam, R. Aralikatti, T. Sharma, A. Thanda, P. A K, S. Roy, and S. M. Venkatesan, "LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models," 2019, arXiv:1906.12170. [Online]. Available: http://arxiv.org/abs/1906.12170
  9. J. Xiao, S. Yang, Y. Zhang, S. Shan, and X. Chen, "Deformation flow based two-stream network for lip reading," in Proc. 15th IEEE Int. Conf. Automat. Face Gesture Recognit. (FG), Mar. 2020, pp. 836–842.
  10. X. Zhao, S. Yang, S. Shan, and X. Chen, "Mutual information maximization for effective lip reading," in Proc. 15th IEEE Int. Conf. Automat. Face Gesture Recognit. (FG), Mar. 2020, pp. 843–850.
  11. P. Zhou, W. Yang, W. Chen, Y. Wang, and J. Jia, "Modality attention for end-to-end audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2019, pp. 6565–6569.
  12. Y. Pei, T.-K. Kim, and H. Zha, "Unsupervised random forest manifold alignment for lipreading," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2013, pp. 129–136.
  13. A. Pass, J. Zhang, and D. Stewart, "An investigation into features for multi-view lipreading," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2010, pp. 2417–2420.
  14. A. Fernandez-Lopez, O. Martinez, and F. M. Sukno, "Towards estimating the upper bound of visual-speech recognition: The visual lip-reading feasibility database," in Proc. 12th IEEE Int. Conf. Automat. Face Gesture Recognit. (FG), May 2017, pp. 208–215.
  15. S. Petridis, J. Shen, D. Cetin, and M. Pantic, "Visual-only recognition of normal, whispered and silent speech," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2018, pp. 6219–6223.
  16. T. Stafylakis, M. H. Khan, and G. Tzimiropoulos, "Pushing the boundaries of audiovisual word recognition using residual networks and LSTMs," Comput. Vis. Image Understand., vols. 176–177, pp. 22–32, Nov. 2018.
  17. L. Wang, Y. Xu, J. Cheng, H. Xia, J. Yin, and J. Wu, "Human action recognition by learning spatio-temporal features with deep neural networks," IEEE Access, vol. 6, pp. 17913–17922, 2018.
  18. A. Gutierrez and Z. Robert, "Lip reading word classification," Stanford Univ., Stanford, CA, USA, Project Rep. CS231n, 2017.
  19. J. S. Chung and A. Zisserman, "Learning to lip read words by watching videos," Comput. Vis. Image Understand., vol. 173, pp. 76–85, Aug. 2018.
  20. D.-W. Jang, H.-I. Kim, C. Je, R.-H. Park, and H.-M. Park, "Lip reading using committee networks with two different types of concatenated frame images," IEEE Access, vol. 7, pp. 90125–90131, 2019.
  21. B. Shillingford, Y. Assael, M. W. Hoffman, T. Paine, C. Hughes, U. Prabhu, H. Liao, H. Sak, K. Rao, L. Bennett, M. Mulville, B. Coppin, B. Laurie, A. Senior, and N. de Freitas, "Large-scale visual speech recognition," 2018, arXiv:1807.05162. [Online]. Available: http://arxiv.org/abs/1807.05162
  22. J. R. Movellan, "Visual speech recognition with stochastic networks," in Proc. Adv. Neural Inf. Process. Syst., 1994, pp. 851–858.
  23. E. Bailly-Baillière, S. Bengio, F. Bimbot, M. Hamouz, J. Kittler, J. Mariéthoz, J. Matas, K. Messer, V. Popovici, F. Porée, B. Ruiz, and J.-P. Thiran, "The BANCA database and evaluation protocol," in Audio- and Video-Based Biometric Person Authentication. Berlin, Germany: Springer, 2003, pp. 625–638.
  24. N. A. Fox, B. A. O'Mullane, and R. B. Reilly, "VALID: A new practical audio-visual database, and comparative results," in Proc. Int. Conf. Audio Video Biometric Person Authentication. Berlin, Germany: Springer, 2005, pp. 777–786.
  25. E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, "CUAVE: A new audio-visual database for multimodal human-computer interface research," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2002, p. II-2017.
  26. I. Matthews, T. F. Cootes, J. A. Bangham, S. Cox, and R. Harvey, "Extraction of visual features for lipreading," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 2, pp. 198–213, Feb. 2002.