DeepVision: A Hybrid Deepfake Detection Framework Using Deep Learning Approaches

Dheeraj Shukla; Dinesh Sonawane; Jitendra Kulkarni; Neha Agale; Dakshita Pawar

DOI: https://doi.org/10.47001/IRJIET/2026.105038


Dheeraj Shukla, Dinesh Sonawane, Jitendra Kulkarni, Neha Agale, Dakshita Pawar: Department of Computer Engineering, P.S.G.V.P. Mandal’s D.N. Patel College of Engineering, Shahada, India

Volume 10, Issue 5, May 2026 | Pages: 284-290

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 15-05-2026


Abstract

Over the past decade, rapid progress in artificial intelligence (AI), machine learning, and deep learning has introduced sophisticated techniques for multimedia manipulation. Although such technologies have legitimate applications in entertainment and education, malicious actors increasingly exploit them for disinformation campaigns, political propaganda, identity fraud, and targeted harassment. High-quality synthetic videos and images, commonly known as deepfakes, pose a growing threat to digital security and public trust. This paper introduces DeepVision, a hybrid deepfake detection framework that fuses EfficientNet-B0 with a Vision Transformer (ViT-B/16) to exploit local texture features and global spatial dependencies simultaneously. The EfficientNet-B0 branch extracts fine-grained local texture and manipulation artefacts, while the Vision Transformer captures long-range contextual relationships across facial regions using multi-head self-attention. The model is trained on a combined dataset derived from FaceForensics++ (FF++) and the DeepFake Detection Challenge (DFDC), comprising 120,000 labeled face images. Model performance is evaluated using accuracy, precision, recall, F1-score, the confusion matrix, and ROC-AUC. Experimental results show strong classification performance, with 98% accuracy and an AUC of 0.9973 on the combined dataset, which is competitive with recent state-of-the-art studies. The proposed framework supports both image-based and video-based deepfake detection and is suitable for real-world deployment in digital forensics and media authentication applications.
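The evaluation metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' evaluation code: the label convention (1 = fake, 0 = real), the 0.5 decision threshold, the function names, and the toy labels and scores are all assumptions chosen for illustration, and none of the numbers come from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels; assumes 1 = fake, 0 = real."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 computed from the confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (fake, real) pairs in which the fake sample
    receives the higher score; tied pairs count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical, not from the paper): model scores = P(fake).
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(confusion_matrix(y_true, y_pred))
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the raw scores and is threshold-independent, which is one reason the two kinds of figures are usually reported side by side. The pairwise AUC loop is quadratic in the number of samples; on a dataset of the size described above, a rank-based implementation would be the practical choice.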

Keywords

Deepfake Detection, Deep Learning, Convolutional Neural Network (CNN), Vision Transformer (ViT).

Citation of this Article

Dheeraj Shukla, Dinesh Sonawane, Jitendra Kulkarni, Neha Agale, & Dakshita Pawar. (2026). DeepVision: A Hybrid Deepfake Detection Framework Using Deep Learning Approaches. International Research Journal of Innovations in Engineering and Technology - IRJIET, 10(5), 284-290. Article DOI https://doi.org/10.47001/IRJIET/2026.105038

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

