Enhanced Speech Emotion Recognition Using Hybrid Machine Learning and Deep Learning Models

Abstract

Recognizing human emotions accurately has become crucial for enhancing human-computer interaction. Speech Emotion Recognition (SER) enables systems to interpret emotional states from speech signals, improving applications such as virtual assistants, mental health monitoring, and affective computing. However, accurately classifying emotions remains challenging because of the complexity and variability of speech. In this paper, we propose a hybrid approach that integrates traditional machine learning techniques with deep learning models to improve emotion classification. Logistic Regression (LR) and Decision Trees (DT) are used for initial feature extraction and classification, ensuring that critical speech features are preserved, while Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks are employed for deep feature learning and sequential pattern recognition. This integration allows the model to capture complex acoustic patterns and temporal dependencies, improving classification accuracy. The proposed model was trained and tested on the Toronto Emotional Speech Set (TESS), which provides a diverse range of emotional utterances. Our integrated approach achieved 98-99% accuracy in classifying emotions, significantly outperforming traditional methods. These results demonstrate the model's potential for improving emotion recognition, making it valuable for real-world applications in interactive AI systems and healthcare.
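As a minimal sketch of the pipeline described above, the following Python code outlines how the pieces could fit together. It assumes MFCCs as the extracted speech features and uses librosa, scikit-learn, and TensorFlow/Keras; the layer sizes, frame count, and the way the classical and deep models are combined here are illustrative assumptions, not the exact configuration reported in the paper.

# Minimal sketch of a hybrid SER pipeline (illustrative hyperparameters).
# Assumes: TESS WAV files on disk, MFCC features, and librosa, scikit-learn,
# and TensorFlow/Keras installed. Not the authors' exact configuration.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models

def extract_mfcc(path, n_mfcc=40, frames=128):
    """Load one utterance and return a fixed-size MFCC matrix (frames x n_mfcc)."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (time, n_mfcc)
    # Pad or truncate to a fixed frame count so all inputs share one shape.
    if mfcc.shape[0] < frames:
        mfcc = np.pad(mfcc, ((0, frames - mfcc.shape[0]), (0, 0)))
    return mfcc[:frames]

def build_cnn_lstm(n_classes=7, frames=128, n_mfcc=40):
    """CNN layers learn local acoustic patterns; the LSTM models temporal dependencies."""
    model = models.Sequential([
        layers.Input(shape=(frames, n_mfcc)),
        layers.Conv1D(64, 5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.LSTM(128),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def classical_baselines(X, y):
    """LR and DT baselines on utterance-level MFCC statistics.

    X: (n_samples, frames, n_mfcc) MFCC tensors; y: integer emotion labels
    (7 classes in TESS). Loading of file paths and labels is omitted here.
    """
    X_flat = X.mean(axis=1)  # mean MFCC vector per utterance
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_flat, y, test_size=0.2, stratify=y, random_state=42)
    for clf in (LogisticRegression(max_iter=1000), DecisionTreeClassifier()):
        clf.fit(X_tr, y_tr)
        print(type(clf).__name__, "accuracy:", clf.score(X_te, y_te))

In this sketch the LR and DT baselines classify utterance-level MFCC statistics, while the CNN-LSTM consumes the full frame sequence; the precise fusion of the two stages in the published model is detailed in the full paper.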

¹Roopa R, ²Harshitha Lakshmi N V, ³Dhana Lakshmi S, ⁴Dilip B

  1. Assistant Professor, Department of Computer Science & Engineering (Data Science), Madanapalle Institute of Technology & Science, Madanapalle, Andhra Pradesh, India
  2. Department of Computer Science & Engineering (Data Science), Madanapalle Institute of Technology & Science, Madanapalle, Andhra Pradesh, India
  3. Department of Computer Science & Engineering (Data Science), Madanapalle Institute of Technology & Science, Madanapalle, Andhra Pradesh, India
  4. Department of Computer Science & Engineering (Data Science), Madanapalle Institute of Technology & Science, Madanapalle, Andhra Pradesh, India

IRJIET, Volume 9, Special Issue of ICCIS-2025, May 2025, pp. 194-199

DOI: https://doi.org/10.47001/IRJIET/2025.ICCIS-202531
