Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Speech Emotion Recognition (SER) is a crucial component in enhancing human-computer
interaction by enabling machines to recognize and respond to human emotions
effectively. This study proposes a novel SER framework using Convolutional
Neural Networks (CNNs) augmented with attention mechanisms. The CNNs are
employed to capture hierarchical and spatial features from spectrogram
representations of speech signals, while attention mechanisms focus on
emotionally salient regions, improving interpretability and accuracy. The
proposed model is evaluated on benchmark datasets, demonstrating superior
performance compared to traditional methods. The combination of CNNs and
attention mechanisms shows potential for advancing real-world SER applications
such as virtual assistants, customer support systems, and mental health
monitoring. By prioritizing critical emotional features, the
model improves its practical utility and reliability. This work underlines the
importance of deep learning techniques in developing SER technologies, paving
the way for more intuitive and effective human-computer interactions.
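
The idea of attention focusing on emotionally salient regions can be illustrated with a minimal sketch of attention-weighted pooling over time frames. This is not the architecture evaluated in the paper: the function names `softmax` and `attention_pool`, the dot-product scoring, and the toy `frames` and `query` values are all assumptions made for illustration. In a full model the frame features would come from the CNN applied to the spectrogram, and the query vector would be learned.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(frames, query):
    # frames: list of T feature vectors of length D
    #         (assumed stand-ins for CNN outputs per time step).
    # query:  vector of length D (hypothetical; learned in a real model).
    # Score each frame by its dot product with the query.
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in frames]
    # Normalize scores into attention weights that sum to 1.
    weights = softmax(scores)
    # Weighted sum of the frames gives one utterance-level vector.
    dim = len(query)
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights

# Toy example: the second frame has the largest score under this query.
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
pooled, weights = attention_pool(frames, query)
```

The returned `weights` can be inspected per frame, which is one way an attention layer supports the interpretability mentioned above: frames with larger weights contributed more to the pooled representation passed to the classifier.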
Country: India
IRJIET, Volume 9, Special Issue of ICCIS-2025 May 2025 pp. 162-167