Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Vol 10 No 3 (2026): Volume 10, Issue 3, March 2026 | Pages: 169-174
International Research Journal of Innovations in Engineering and Technology
OPEN ACCESS | Research Article | Published Date: 23-03-2026
Multimodal emotion recognition seeks to understand individual behavior by jointly analyzing voice signals and facial expressions. Feature fusion is central to this process, as it yields a richer representation of information across modalities. However, multimodal systems frequently suffer from temporal misalignment between modalities and from overfitting caused by high-dimensional feature spaces. To address these problems, an attention mechanism is developed that lets the network automatically focus on the most informative local features; the network applies it to both audio-visual feature integration and temporal modeling. This work makes two primary contributions: first, it fuses audio and video features with a multi-head self-attention mechanism, reducing the influence of prior assumptions on the fusion process; second, it models the temporal dynamics of the fused features with a bidirectional gated recurrent unit, using autocorrelation coefficients along the time dimension as attention weights. Experimental results show that the proposed attention-based approach significantly improves multimodal emotion recognition accuracy.
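To make the two contributions concrete, the following is a minimal PyTorch sketch of the architecture the abstract describes: multi-head self-attention over concatenated audio and video feature sequences, followed by a bidirectional GRU whose outputs are pooled with autocorrelation-derived attention weights. The feature dimensions, the lag-1 formulation of the autocorrelation coefficients, and the class name `AudioVisualFusion` are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch, assuming PyTorch; dimensions are arbitrary examples.
import torch
import torch.nn as nn


class AudioVisualFusion(nn.Module):
    """Self-attention fusion of audio/video features + BiGRU temporal model."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, hidden: int = 128):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gru = nn.GRU(d_model, hidden, batch_first=True, bidirectional=True)

    @staticmethod
    def autocorr_weights(x: torch.Tensor) -> torch.Tensor:
        """Lag-1 autocorrelation of each time step with its predecessor,
        softmax-normalized to act as temporal attention weights (an
        assumed reading of 'autocorrelation coefficients as weights')."""
        x = x - x.mean(dim=1, keepdim=True)                # center over time
        prev = torch.cat([x[:, :1], x[:, :-1]], dim=1)     # shift by one step
        num = (x * prev).sum(dim=-1)                       # (B, T) dot products
        den = x.norm(dim=-1) * prev.norm(dim=-1) + 1e-8
        return torch.softmax(num / den, dim=1)             # (B, T) weights

    def forward(self, audio: torch.Tensor, video: torch.Tensor) -> torch.Tensor:
        # Concatenating modalities along time lets attention relate audio
        # frames to video frames without a hand-designed alignment prior.
        seq = torch.cat([audio, video], dim=1)             # (B, Ta+Tv, d)
        fused, _ = self.attn(seq, seq, seq)                # multi-head self-attn
        out, _ = self.gru(fused)                           # BiGRU, (B, T, 2*hidden)
        w = self.autocorr_weights(out).unsqueeze(-1)       # (B, T, 1)
        return (w * out).sum(dim=1)                        # attention-weighted pool


# Usage with dummy tensors: 8 clips, 50 audio frames, 30 video frames, 256-d features.
model = AudioVisualFusion()
emb = model(torch.randn(8, 50, 256), torch.randn(8, 30, 256))
print(emb.shape)  # torch.Size([8, 256])
```

Concatenating the two modalities before self-attention, rather than fixing a cross-modal alignment in advance, is one plausible way to realize the stated goal of reducing prior assumptions in the fusion step.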
Multimodal Emotion Identification, Audio–Visual Fusion, Attention Mechanism, Self-Attention, Temporal Dynamics, Emotion Recognition
Nayana S Patel. (2026). Multimodal Identification of Emotions Using Facial Expressions and Physiological Signals. International Research Journal of Innovations in Engineering and Technology - IRJIET, 10(3), 169-174. Article DOI: https://doi.org/10.47001/IRJIET/2026.103024
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.