MediaPipe and Deep Learning for Robust Real-Time Hand Gesture Recognition in Sign Language

D Sumathi, Department of Computer Science Engineering, Alliance School of Advanced Computing, Bengaluru, India
Potteti Tejaswini, Department of Computer Science Engineering, Alliance School of Advanced Computing, Bengaluru, India
Sadineni Aasritha, Department of Computer Science Engineering, Alliance School of Advanced Computing, Bengaluru, India
Gadamsetty Deepthika, Department of Computer Science Engineering, Alliance School of Advanced Computing, Bengaluru, India

Vol 9 No 2025 (2025): Volume 9, Special Issue of ICCIS-2025 May 2025 | Pages: 144-149

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 11-06-2025

DOI: https://doi.org/10.47001/IRJIET/2025.ICCIS-202523

Abstract

This project centers on creating an AI-based framework for real-time sign language detection using computer vision and deep learning techniques. The primary objective is to bridge the communication gap between deaf and hard-of-hearing people and the hearing community by accurately recognizing hand gestures and converting them into text or speech. The study uses MediaPipe Hands, OpenCV, and a deep learning model trained on a dataset of sign language gestures. Techniques such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are used to improve gesture recognition accuracy.
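Converting recognized gestures into text needs a step beyond the per-frame classifier: frame-level predictions can flicker between labels, and a sign held for a second would otherwise be emitted once per frame. The following is a minimal sketch, assuming the recognition model outputs one letter label per video frame (or `None` when no hand is detected). The function `frames_to_text` and its `min_stable` threshold are hypothetical names, not the authors' implementation, and speech synthesis is not covered. A label is appended only after it has been predicted for `min_stable` consecutive frames, and it is not appended again until the prediction changes.

```python
from typing import Iterable, List, Optional


def frames_to_text(labels: Iterable[Optional[str]], min_stable: int = 3) -> str:
    """Collapse a stream of per-frame gesture labels into text (hypothetical helper).

    A label is appended once it has been predicted for `min_stable`
    consecutive frames; it is not appended again until the prediction
    changes. `None` marks frames where no hand was detected, so a
    repeated letter can be entered by briefly lowering the hand.
    """
    if min_stable < 1:
        raise ValueError("min_stable must be at least 1")
    out: List[str] = []
    current: Optional[str] = None  # label of the current run of identical predictions
    run = 0                        # number of consecutive frames in the current run
    emitted = False                # whether the current run has already produced output
    for label in labels:
        if label == current:
            run += 1
        else:
            # A different prediction starts a new run.
            current, run, emitted = label, 1, False
        if current is not None and not emitted and run >= min_stable:
            out.append(current)
            emitted = True
    return "".join(out)
```

For example, `frames_to_text(["H", "H", "H", "H", None, "I", "I", "I"])` returns `"HI"`. A larger `min_stable` suppresses more flicker at the cost of latency: at 30 frames per second, `min_stable=3` requires a sign to be held for about 0.1 s before it is written out.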

The MediaPipe Hands framework, combined with OpenCV, enables robust real-time hand tracking and keypoint extraction. Deep learning models, particularly CNN-based models, achieve high accuracy in classifying sign language gestures. The system performs well in controlled environments but faces challenges with variations in lighting, background clutter, and hand occlusions. Future work includes expanding the dataset with more hand shapes, skin tones, and lighting conditions, and integrating temporal models (e.g., LSTMs or Transformers) to improve recognition accuracy, particularly for continuous sign language.
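MediaPipe Hands reports 21 landmarks per hand, with each landmark's x and y normalized to the image width and height, so the raw coordinates still depend on where the hand sits in the frame and how far it is from the camera. A common preprocessing step, sketched below under stated assumptions, is to express every landmark relative to the wrist (landmark 0) and divide by a hand-size scale before classification. The function name `normalize_landmarks` and the choice of the largest wrist-to-keypoint distance as the scale are illustrative assumptions; the paper does not specify its preprocessing. In a live pipeline, the `(x, y, z)` tuples could be read from the `landmark` list of each entry in `results.multi_hand_landmarks` returned by MediaPipe's `Hands.process()`; the sketch takes plain tuples so it runs without MediaPipe installed.

```python
import math
from typing import List, Sequence, Tuple

Landmark = Tuple[float, float, float]
NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 keypoints per hand
WRIST = 0           # index of the wrist keypoint in MediaPipe's hand model


def normalize_landmarks(landmarks: Sequence[Landmark]) -> List[float]:
    """Return a flat, wrist-relative, scale-normalized feature vector (63 floats).

    Assumption: this normalization scheme is illustrative, not the
    authors' documented preprocessing.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    wx, wy, wz = landmarks[WRIST]
    # Translate so the wrist sits at the origin (removes hand position in the frame).
    rel = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Divide by the largest wrist-to-keypoint distance (removes hand size and camera distance).
    scale = max(math.sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in rel)
    if scale == 0.0:
        raise ValueError("degenerate hand: all landmarks coincide")
    # Flatten to [x0, y0, z0, x1, y1, z1, ...] for a downstream classifier.
    return [c / scale for point in rel for c in point]
```

The resulting 63-value vector can be fed to a per-frame classifier, or per-frame vectors can be stacked into a sequence for a temporal model such as an LSTM or Transformer. This normalization removes position and scale only; it does not help when lighting or occlusion cause the landmark detector itself to fail.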

Keywords

Sign Language Recognition, Real-time Detection, Deep Learning, MediaPipe Hands, OpenCV, CNN, RNN, LSTM, Transformers, Hand Gesture Recognition, Temporal Modeling, Edge Deployment, Multimodal Interpretation


Citation of this Article

D Sumathi, Potteti Tejaswini, Sadineni Aasritha, & Gadamsetty Deepthika. (2025). MediaPipe and Deep Learning for Robust Real-Time Hand Gesture Recognition in Sign Language. In Proceedings of the Second International Conference on Computing and Intelligent Systems (ICCIS-2025), published in IRJIET, Volume 9, Special Issue ICCIS-2025, pp. 144-149. Article DOI: https://doi.org/10.47001/IRJIET/2025.ICCIS-202523
