Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Vol 10 No 5 (2026): Volume 10, Issue 5, May 2026 | Pages: 625-629
International Research Journal of Innovations in Engineering and Technology
OPEN ACCESS | Research Article | Published Date: 29-05-2026
The rapid growth of digital video content on online platforms has created significant demand for automated subtitle generation to improve accessibility, content understanding, and multilingual communication. Subtitles help hearing-impaired individuals, non-native speakers, and viewers in noisy environments follow video content. However, manual subtitle creation is time-consuming, labor-intensive, and costly, especially for multilingual videos. To address this problem, this project presents an AI-powered multilingual subtitle generation system that automatically generates and embeds subtitles for video content in Telugu and English. The system uses speech recognition and natural language processing to convert the spoken audio of a video into text subtitles. It integrates the OpenAI Whisper speech recognition model for automatic transcription and translation, and the multimedia processing tool FFmpeg for audio extraction and subtitle embedding. The entire system is deployed as a Flask web application that lets users upload videos and download the subtitled results through a browser interface.

The workflow begins with video upload through the web interface. FFmpeg extracts the audio stream from the uploaded video as 16 kHz mono PCM, a format suitable for speech recognition. The extracted audio is converted into log-mel spectrogram features and processed by the transformer-based encoder-decoder architecture of the Whisper model. The model generates time-stamped text segments directly from the speech audio, eliminating the need for separate alignment tools.
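The audio-extraction and subtitle-formatting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ffmpeg_extract_args`, `srt_timestamp`, `segments_to_srt`) are hypothetical, and the segment layout (dictionaries with `start`, `end`, and `text` keys, times in seconds) is assumed to match what a Whisper transcription returns. The sketch only builds the FFmpeg argument list and the SubRip text; running FFmpeg and the model is left to the caller.

```python
from typing import Dict, List


def ffmpeg_extract_args(video_path: str, wav_path: str) -> List[str]:
    """Build an FFmpeg command that extracts 16 kHz mono 16-bit PCM audio."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", video_path,      # input video
        "-vn",                 # drop the video stream
        "-ac", "1",            # one audio channel (mono)
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # signed 16-bit little-endian PCM
        wav_path,
    ]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Turn time-stamped segments into SubRip (.srt) text.

    Assumption: each segment is a dict with 'start', 'end' (seconds)
    and 'text', as in the 'segments' list of a Whisper result.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

The argument list can be passed to `subprocess.run(..., check=True)` on a machine where FFmpeg is installed, and the resulting `.srt` text can then be written to a file for FFmpeg to embed in the video.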
Automatic Speech Recognition, Transformer Encoder-Decoder, ML model, Log-Mel Spectrogram, Subtitle Generation, FFmpeg, SubRip Text, Multilingual Video Processing, Web Application.
M. Mamatha, Y. Pavan Narashimha Rao, Ch. Manoj Babu, & E. Vikram. (2026). Automatic Video Subtitle Generation System through AI. International Research Journal of Innovations in Engineering and Technology - IRJIET, 10(5), 625-629. Article DOI: https://doi.org/10.47001/IRJIET/2026.105084
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.