Enhanced Speech Emotion Recognition Using Hybrid Machine Learning and Deep Learning Models

Abstract

Recognizing human emotions accurately has become crucial for enhancing human-computer interaction. Speech Emotion Recognition (SER) enables systems to interpret emotional states from speech signals, improving applications such as virtual assistants, mental health monitoring, and affective computing. However, accurately classifying emotions remains challenging because of the complexity and variability of speech. In this paper, we propose a hybrid approach that integrates traditional machine learning techniques with deep learning models to improve emotion classification. Logistic Regression (LR) and Decision Trees (DT) are used for initial feature extraction and classification, ensuring that critical speech features are preserved, while Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks are employed for deep feature learning and sequential pattern recognition. This integration allows the model to capture complex acoustic patterns and temporal dependencies, improving classification accuracy. The proposed model was trained and tested on the Toronto Emotional Speech Set (TESS), which provides a diverse range of emotional utterances. Our integrated approach achieved 98-99% accuracy in classifying emotions, significantly outperforming traditional methods. These results demonstrate the model's potential for improving emotion recognition, making it valuable for real-world applications in interactive AI systems and healthcare.
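As a minimal sketch of the pipeline described above, the following Python code outlines how the pieces could fit together. It assumes MFCCs as the extracted speech features and uses librosa, scikit-learn, and TensorFlow/Keras; the layer sizes, frame count, and the way the classical and deep models are combined here are illustrative assumptions, not the exact configuration reported in the paper.

# Minimal sketch of a hybrid SER pipeline (illustrative hyperparameters).
# Assumes: TESS WAV files on disk, MFCC features, and librosa, scikit-learn,
# and TensorFlow/Keras installed. Not the authors' exact configuration.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models

def extract_mfcc(path, n_mfcc=40, frames=128):
    """Load one utterance and return a fixed-size MFCC matrix (frames x n_mfcc)."""
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (time, n_mfcc)
    # Pad or truncate to a fixed frame count so all inputs share one shape.
    if mfcc.shape[0] < frames:
        mfcc = np.pad(mfcc, ((0, frames - mfcc.shape[0]), (0, 0)))
    return mfcc[:frames]

def build_cnn_lstm(n_classes=7, frames=128, n_mfcc=40):
    """CNN layers learn local acoustic patterns; the LSTM models temporal dependencies."""
    model = models.Sequential([
        layers.Input(shape=(frames, n_mfcc)),
        layers.Conv1D(64, 5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.Conv1D(128, 5, activation="relu", padding="same"),
        layers.MaxPooling1D(2),
        layers.LSTM(128),
        layers.Dropout(0.3),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def classical_baselines(X, y):
    """LR and DT baselines on utterance-level MFCC statistics.

    X: (n_samples, frames, n_mfcc) MFCC tensors; y: integer emotion labels
    (7 classes in TESS). Loading of file paths and labels is omitted here.
    """
    X_flat = X.mean(axis=1)  # mean MFCC vector per utterance
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_flat, y, test_size=0.2, stratify=y, random_state=42)
    for clf in (LogisticRegression(max_iter=1000), DecisionTreeClassifier()):
        clf.fit(X_tr, y_tr)
        print(type(clf).__name__, "accuracy:", clf.score(X_te, y_te))

In this sketch the LR and DT baselines classify utterance-level MFCC statistics, while the CNN-LSTM consumes the full frame sequence; the precise fusion of the two stages in the published model is detailed in the full paper.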

¹Roopa R, ²Harshitha Lakshmi N V, ³Dhana Lakshmi S, ⁴Dilip B

  1. Assistant Professor, Department of Computer Science & Engineering (Data Science), Madanapalle Institute of Technology & Science, Madanapalle, Andhra Pradesh, India
  2. Department of Computer Science & Engineering (Data Science), Madanapalle Institute of Technology & Science, Madanapalle, Andhra Pradesh, India
  3. Department of Computer Science & Engineering (Data Science), Madanapalle Institute of Technology & Science, Madanapalle, Andhra Pradesh, India
  4. Department of Computer Science & Engineering (Data Science), Madanapalle Institute of Technology & Science, Madanapalle, Andhra Pradesh, India

IRJIET, Volume 9, Special Issue of ICCIS-2025, May 2025, pp. 194-199

DOI: https://doi.org/10.47001/IRJIET/2025.ICCIS-202531
