MediaPipe and Deep Learning for Robust Real-Time Hand Gesture Recognition in Sign Language

Abstract

This project focuses on developing an AI-based system for real-time sign language detection using computer vision and deep learning techniques. The primary objective is to bridge the communication gap between the deaf and hearing communities by accurately recognizing hand gestures and converting them into text or speech. The work uses MediaPipe Hands, OpenCV, and a deep learning model trained on a dataset of sign language gestures. Techniques such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are employed to improve gesture recognition accuracy.
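A common preprocessing step in pipelines like this is to normalize the hand keypoints extracted by MediaPipe before they are fed to a classifier, making the features invariant to the hand's position and distance from the camera. The sketch below illustrates this idea; it is not the paper's implementation, and the function name and the (x, y)-only landmark layout are assumptions (MediaPipe Hands actually reports 21 landmarks with x, y, and z).

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Normalize 21 (x, y) hand keypoints: translate so the wrist
    (landmark 0 in MediaPipe's layout) sits at the origin, then scale
    by the largest wrist-to-fingertip distance so the hand fits in a
    unit circle regardless of its size in the frame."""
    pts = np.asarray(landmarks, dtype=float)   # shape (21, 2)
    pts = pts - pts[0]                         # wrist-relative coordinates
    scale = np.linalg.norm(pts, axis=1).max()  # farthest landmark from wrist
    if scale > 0:
        pts = pts / scale
    return pts
```

With this normalization, the same gesture performed close to or far from the camera produces (approximately) the same feature vector, which typically makes a downstream classifier easier to train.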

The MediaPipe Hands framework, combined with OpenCV, enables robust real-time hand tracking and keypoint extraction. Deep learning models, particularly CNN-based models, achieve high accuracy in classifying sign language gestures. The system performs well in controlled environments but faces challenges with variations in lighting, background clutter, and hand occlusions. Future work includes expanding the dataset to cover a wider range of hand shapes, skin tones, and lighting conditions, and incorporating temporal models (e.g., LSTMs or Transformers) to improve recognition of continuous sign language.
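The classification stage described above can be illustrated with a minimal numpy forward pass of a 1-D CNN over the 21 extracted keypoints: convolution, ReLU, global average pooling, and a softmax output over gesture classes. This is a hedged sketch of the general technique, not the paper's architecture; the function names, layer sizes, and parameter layout are all assumptions for illustration.

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1-D convolution: x is (length, in_ch), w is (k, in_ch, out_ch)."""
    k, _, out_ch = w.shape
    length = x.shape[0] - k + 1
    out = np.empty((length, out_ch))
    for i in range(length):
        # Correlate the window x[i:i+k] with every output filter at once.
        out[i] = np.tensordot(x[i:i + k], w, axes=([0, 1], [0, 1])) + b
    return out

def classify_keypoints(keypoints, params):
    """Forward pass of a tiny 1-D CNN over 21 (x, y) keypoints:
    conv -> ReLU -> global average pool -> linear -> softmax."""
    h = np.maximum(conv1d(keypoints, params["w1"], params["b1"]), 0.0)
    h = h.mean(axis=0)                      # global average pooling
    logits = h @ params["w2"] + params["b2"]
    e = np.exp(logits - logits.max())       # numerically stable softmax
    return e / e.sum()                      # class probabilities
```

In practice a framework such as TensorFlow or PyTorch would replace this hand-rolled forward pass, and temporal models (LSTMs, Transformers) would consume sequences of such keypoint frames rather than a single frame.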

Country: India

D Sumathi¹, Potteti Tejaswini², Sadineni Aasritha³, Gadamsetty Deepthika⁴

  1. Department of Computer Science Engineering, Alliance School of Advanced Computing, Bengaluru, India
  2. Department of Computer Science Engineering, Alliance School of Advanced Computing, Bengaluru, India
  3. Department of Computer Science Engineering, Alliance School of Advanced Computing, Bengaluru, India
  4. Department of Computer Science Engineering, Alliance School of Advanced Computing, Bengaluru, India

IRJIET, Volume 9, Special Issue of ICCIS-2025 May 2025 pp. 144-149

doi.org/10.47001/IRJIET/2025.ICCIS-202523
