Speech Emotion Recognition using Convolutional Neural Networks with Attention Mechanisms

A.Poongodai; Y.Nandini; T.Mounika; A.Karishma; N.Kevalya Kumar

doi:https://doi.org/10.47001/IRJIET/2025.ICCIS-202526

A.Poongodai, Assistant Professor, Department of CSE (AI), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, India
Y.Nandini, Student, Department of CSE (AI), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, India
T.Mounika, Student, Department of CSE (AI), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, India
A.Karishma, Student, Department of CSE (AI), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, India
N.Kevalya Kumar, Student, Department of CSE (AI), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, India

Vol 9 No 2025 (2025): Volume 9, Special Issue of ICCIS-2025 May 2025 | Pages: 162-167

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 11-06-2025

Abstract

Speech Emotion Recognition (SER) is a crucial component of human-computer interaction because it enables machines to recognize and respond to human emotions. This study proposes a novel SER framework that uses Convolutional Neural Networks (CNNs) augmented with attention mechanisms. The CNNs capture hierarchical and spatial features from spectrogram representations of speech signals, while the attention mechanisms focus on emotionally salient regions, improving interpretability and accuracy. The proposed model is evaluated on benchmark datasets, where it outperforms traditional methods. By prioritizing critical emotional features, the model improves practical utility and reliability in real-world SER applications such as virtual assistants, customer support systems, and mental health monitoring. This work underlines the importance of deep learning techniques in developing SER technologies, paving the way for more intuitive and effective human-computer interaction.
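The general idea of convolutional features followed by attention over time can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the layer sizes, the additive attention form, the random weights, and the names `conv2d_valid`, `forward`, and `init_params` are all assumptions made for illustration, and a random array stands in for a real spectrogram.

```python
import numpy as np

def conv2d_valid(x, kernels):
    """Valid 2-D cross-correlation.
    x: (F, T) spectrogram; kernels: (C, kf, kt) -> output (C, F-kf+1, T-kt+1)."""
    c, kf, kt = kernels.shape
    out_f, out_t = x.shape[0] - kf + 1, x.shape[1] - kt + 1
    out = np.zeros((c, out_f, out_t))
    for i in range(kf):
        for j in range(kt):
            # Accumulate each kernel tap against the matching shifted view of x.
            out += kernels[:, i, j, None, None] * x[None, i:i + out_f, j:j + out_t]
    return out

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def init_params(rng, channels=8, att_dim=16, n_classes=4):
    # Hypothetical sizes; random weights stand in for trained ones.
    return {
        "kernels": rng.normal(0, 0.1, (channels, 3, 3)),
        "W_att": rng.normal(0, 0.1, (channels, att_dim)),
        "v_att": rng.normal(0, 0.1, att_dim),
        "W_out": rng.normal(0, 0.1, (channels, n_classes)),
    }

def forward(spec, params):
    # Convolution + ReLU produces a feature map of shape (C, F', T').
    feat = np.maximum(conv2d_valid(spec, params["kernels"]), 0.0)
    # Average over frequency so each time frame is a C-dimensional vector.
    frames = feat.mean(axis=1).T                                  # (T', C)
    # Additive attention: one scalar relevance score per time frame.
    scores = np.tanh(frames @ params["W_att"]) @ params["v_att"]  # (T',)
    alpha = softmax(scores)                                       # weights sum to 1
    # Attention-weighted summary of the utterance.
    context = alpha @ frames                                      # (C,)
    probs = softmax(context @ params["W_out"])                    # (n_classes,)
    return probs, alpha

rng = np.random.default_rng(0)
spec = rng.normal(size=(40, 100))   # stand-in for 40 bands x 100 frames
probs, alpha = forward(spec, init_params(rng))
```

Here `alpha` holds one weight per time frame, so inspecting it shows which frames the model attended to most, which is the sense in which attention can aid interpretability.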

Keywords

Speech Emotion Recognition, Deep Learning, Convolutional Neural Networks, Attention Mechanisms

Citation of this Article

A.Poongodai, Y.Nandini, T.Mounika, A.Karishma, & N.Kevalya Kumar. (2025). Speech Emotion Recognition using Convolutional Neural Networks with Attention Mechanisms. In Proceedings of the Second International Conference on Computing and Intelligent Systems (ICCIS-2025), published in IRJIET, Volume 9, Special Issue ICCIS-2025, pp. 162-167. Article DOI: https://doi.org/10.47001/IRJIET/2025.ICCIS-202526

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
