Multimodal Identification of Emotions Using Facial Expressions and Physiological Signals

Abstract

Multimodal identification seeks to understand individual behaviors through the simultaneous analysis of voice signals and facial expressions. Feature fusion is central to this process, as it yields a richer representation of information across the modalities. However, multimodal systems frequently suffer from temporal misalignment between modalities and from overfitting caused by high-dimensional feature spaces. To address these problems, an attention mechanism is developed that enables the network to automatically concentrate on the most informative local features; the network applies this mechanism to both audio-visual feature integration and temporal modeling. The two primary contributions of this work are: first, a multi-head self-attention mechanism fuses the audio and video features, reducing the influence of prior assumptions on the fusion process; and second, a bidirectional gated recurrent unit models the temporal dynamics of the fused features, using autocorrelation coefficients along the time dimension as attention weights. Experimental results show that the proposed attention-based approach significantly improves multimodal emotion recognition accuracy.
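To make the two contributions concrete, the sketch below pairs a multi-head self-attention fusion module with a bidirectional GRU whose temporal attention weights are derived from autocorrelation coefficients. This is a minimal PyTorch illustration under assumed settings (the feature dimensions, head count, and class count are hypothetical placeholders), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioVisualFusion(nn.Module):
    """Multi-head self-attention fusion of audio and video features.
    All dimensions are illustrative placeholders."""
    def __init__(self, audio_dim=128, video_dim=256, d_model=256, n_heads=4):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.video_proj = nn.Linear(video_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, audio, video):
        # audio: (B, T, audio_dim), video: (B, T, video_dim).
        # Concatenating along time lets attention relate audio frames to
        # video frames without a hand-crafted alignment prior.
        tokens = torch.cat(
            [self.audio_proj(audio), self.video_proj(video)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        return self.norm(fused + tokens)            # (B, 2T, d_model)

class AutocorrBiGRU(nn.Module):
    """Bidirectional GRU over the fused sequence; per-step attention
    weights come from lag-1 autocorrelation coefficients (one plausible
    reading of the paper's description)."""
    def __init__(self, d_model=256, hidden=128, n_classes=7):
        super().__init__()
        self.bigru = nn.GRU(d_model, hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        h, _ = self.bigru(x)                        # (B, T, 2*hidden)
        xc = x - x.mean(dim=1, keepdim=True)        # mean-centre per clip
        ac = (xc[:, 1:] * xc[:, :-1]).sum(-1)       # lag-1 autocorrelation
        ac = F.pad(ac, (1, 0))                      # align scores to T steps
        w = torch.softmax(ac, dim=1).unsqueeze(-1)  # temporal attention weights
        pooled = (w * h).sum(dim=1)                 # weighted temporal pooling
        return self.fc(pooled)                      # emotion logits
```

Under these assumptions, a forward pass over time-aligned clips would chain the two modules, e.g. `AutocorrBiGRU()(AudioVisualFusion()(audio, video))`.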


Nayana S Patel¹

¹ Sri Siddhartha Institute of Business Management, Maraluru, Tumkur, Karnataka, India

IRJIET, Volume 10, Issue 3, March 2026, pp. 169–174

DOI: doi.org/10.47001/IRJIET/2026.103024
