VOCASIGHT: An AI Assistive Navigation System for the Visually Impaired

Abstract

Vocasight is a hybrid assistive system that combines software and hardware to support visually impaired individuals with real-time navigation and environmental awareness. The system uses computer vision, machine learning, and embedded systems to perform object detection, face recognition, and scene understanding, and it delivers the results as audio feedback through Text-to-Speech. A mobile application developed with Android and Flutter uses OpenCV, the YOLOv8n object detection model, and EasyOCR for efficient image processing. In addition, a portable hardware device based on the ESP32-CAM, equipped with ultrasonic sensors and a buzzer, detects obstacles in real time and alerts the user. The system can be extended with features such as navigation assistance and emotion detection. Overall, Vocasight offers a cost-effective and user-friendly solution that improves safety, independence, and situational awareness for visually impaired users.
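As an illustration of the obstacle-alert idea described in the abstract, the following is a minimal sketch of how an ultrasonic echo time could be converted to a distance and mapped to a buzzer alert level. It is not the authors' firmware: the function names, the speed-of-sound constant, and the 50 cm and 150 cm thresholds are assumptions chosen only for the example.

```python
# Hypothetical sketch: echo time -> distance -> alert level.
# All names and threshold values here are illustrative assumptions,
# not values taken from the Vocasight paper.

# Approximate speed of sound in air at room temperature, in cm per microsecond.
SPEED_OF_SOUND_CM_PER_US = 0.0343


def echo_to_distance_cm(echo_us: float) -> float:
    """Convert a round-trip echo time (microseconds) to a one-way distance (cm)."""
    # The pulse travels to the obstacle and back, so halve the path length.
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def alert_level(distance_cm: float, near_cm: float = 50.0, far_cm: float = 150.0) -> str:
    """Map a distance to an alert level that a buzzer routine could act on."""
    if distance_cm <= near_cm:
        return "urgent"   # e.g. continuous or rapid beeping
    if distance_cm <= far_cm:
        return "warning"  # e.g. slow beeping
    return "clear"        # no alert


if __name__ == "__main__":
    for echo_us in (1000, 5000, 12000):
        distance = echo_to_distance_cm(echo_us)
        print(f"{echo_us} us -> {distance:.1f} cm -> {alert_level(distance)}")
```

On a real device the echo time would come from the ultrasonic sensor's echo pin, and the thresholds would need tuning for walking speed and sensor range.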

Country: India

Sakshi Asodekar (1), Madhumita Ghosh (2), Harshna Patil (3), Harsh Gaikwad (4), Tularam Bansode (5)

  1. Student, Department of CSE (AI & ML), Smt. Indira Gandhi College of Engineering, Ghansoli, New Mumbai, Maharashtra, India
  2. Student, Department of CSE (AI & ML), Smt. Indira Gandhi College of Engineering, Ghansoli, New Mumbai, Maharashtra, India
  3. Student, Department of CSE (AI & ML), Smt. Indira Gandhi College of Engineering, Ghansoli, New Mumbai, Maharashtra, India
  4. Student, Department of CSE (AI & ML), Smt. Indira Gandhi College of Engineering, Ghansoli, New Mumbai, Maharashtra, India
  5. Professor, Department of CSE (AI & ML), Smt. Indira Gandhi College of Engineering, Ghansoli, New Mumbai, Maharashtra, India

IRJIET, Volume 10, Issue 4, April 2026, pp. 160-165

doi.org/10.47001/IRJIET/2026.104023
