Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Although automatic speech recognition (ASR) technology is mature, some problems
remain unsolved, such as how to accurately recognize what a speaker is saying in
a noisy environment. Lipreading is a visual speech recognition technology that
recognizes speech content from the motion characteristics of the speaker's lips,
without using the audio signal. Lipreading can therefore recognize what the
speaker says in noisy environments, and even when no audio signal is available.
This article summarizes the main lipreading research, from traditional methods
to deep learning methods. Traditional lipreading methods are discussed under
three aspects: lip detection and extraction, lip feature extraction, and
classification. Traditional feature extraction methods rely on hand-crafted
features, which are not very reliable under unconstrained conditions. In recent
years, traditional lipreading methods have gradually been replaced by deep
learning methods, whose main advantage is that they learn features directly from
large databases instead of relying on hand-designed ones. This article analyzes
typical deep learning methods in detail according to their structural
characteristics, and lists existing lipreading databases together with their
details and the methods that have been evaluated on them. Finally, the problems
and challenges facing current lipreading methods are discussed, and future
research directions are outlined.
Country: India
IRJIET, Volume 8, Issue 4, April 2024 pp. 143-151