Automatic Video Subtitle Generation System through AI

M.Mamatha, Assistant Professor, Dept. of Computer Science and Engineering, Mahatma Gandhi Institute of Technology, Hyderabad, India

Y Pavan Narashimha Rao, Dept. of Computer Science and Engineering, Mahatma Gandhi Institute of Technology, Hyderabad, India

Ch.Manoj Babu, Dept. of Computer Science and Engineering, Mahatma Gandhi Institute of Technology, Hyderabad, India

E.Vikram, Dept. of Computer Science and Engineering, Mahatma Gandhi Institute of Technology, Hyderabad, India

Vol 10 No 5 (2026): Volume 10, Issue 5, May 2026 | Pages: 625-629

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 29-05-2026

DOI: doi.org/10.47001/IRJIET/2026.105084

Abstract

The rapid growth of digital video content on online platforms has created significant demand for automated subtitle generation systems that improve accessibility, content understanding, and multilingual communication. Subtitles help hearing-impaired individuals, non-native speakers, and viewers in noisy environments follow video content. However, manual subtitle creation is time-consuming, labor-intensive, and costly, especially for multilingual videos. To address this problem, this project presents an AI-powered multilingual subtitle generation system that automatically generates and embeds subtitles for video content in Telugu and English. The proposed system uses speech recognition and natural language processing to convert the spoken audio of a video into text subtitles. It integrates the OpenAI Whisper speech recognition model for automatic transcription and translation with the FFmpeg multimedia processing tool for audio extraction and subtitle embedding. The entire system is deployed as a web application using Flask, allowing users to upload videos and download subtitled videos through a browser interface. The workflow begins with a video upload through the web interface. FFmpeg processes the uploaded video to extract the audio stream as 16 kHz mono PCM, a format suitable for speech recognition. The extracted audio is then converted into log-mel spectrogram features and processed by the transformer-based encoder-decoder architecture of the Whisper model. The model generates time-stamped text segments directly from the speech audio, eliminating the need for separate alignment tools.
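The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_extract_command`, `format_timestamp`, `segments_to_srt`) are hypothetical, and the segment layout (dictionaries with `start`, `end`, and `text` keys) is assumed to match what the open-source `whisper` package returns under the `"segments"` key of its transcription result. The sketch builds an FFmpeg argument list for 16 kHz mono PCM extraction and converts time-stamped segments into SubRip Text (SRT).

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]


def build_extract_command(video_path: str, wav_path: str) -> List[str]:
    """Build an FFmpeg argument list that extracts 16 kHz mono PCM audio."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # ignore the video stream
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        "-ar", "16000",           # 16 kHz sample rate
        "-ac", "1",               # mono
        wav_path,
    ]


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Render time-stamped segments as numbered SRT cues."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(float(seg["start"]))
        end = format_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_extract_command("input.mp4", "audio.wav")))

    # Hand-written segments standing in for a transcription result;
    # with the whisper package they would come from result["segments"].
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
        {"start": 2.5, "end": 5.0, "text": " This is a demo."},
    ]
    print(segments_to_srt(demo))
```

The command list could be passed to `subprocess.run` to perform the extraction, and the resulting SRT text written to a file for FFmpeg to embed. Because the timestamps come directly from the segments, no separate alignment step appears in the sketch.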

Keywords

Automatic Speech Recognition, Transformer Encoder-Decoder, ML model, Log-Mel Spectrogram, Subtitle Generation, FFmpeg, SubRip Text, Multilingual Video Processing, Web Application.


Citation of this Article

M.Mamatha, Y Pavan Narashimha Rao, Ch.Manoj Babu, & E.Vikram. (2026). Automatic Video Subtitle Generation System through AI. International Research Journal of Innovations in Engineering and Technology - IRJIET, 10(5), 625-629. Article DOI https://doi.org/10.47001/IRJIET/2026.105084
