EMOSENSE – Multi-Modal Emotion Recognition to Identify Emotions

Abstract

Extensive research over the past years has sought to better understand human emotions, and technology that recognizes and reacts to them has become an essential component of society. In this study we present a fully functional multi-modal emotion recognition system that integrates data from text, voice, facial expressions, and body language. The automatic classification of anger, fear, joy, sadness, surprise, disgust, and neutral emotions from these four modalities has been studied on the TESS, MELD, FER2013, and EDNLP datasets. A Random Forest classifier is used to classify emotions from body language, a pre-trained VGG16 model for facial emotion classification, logistic regression for text emotion classification, and a CNN for voice emotion classification. The logistic regression model leverages natural language processing (NLP) techniques to extract emotions from textual data. The CNN-based voice model applies speech recognition and emotion recognition algorithms to analyze audio signals and detect emotional cues in the speaker's voice. The facial expression model combines the pre-trained VGG16 network with modified convolutional layers to detect emotions. Meanwhile, the Random Forest classifier captures and interprets non-verbal cues, such as gestures, posture, and overall body movement, to enrich the emotion detection process. The real strength of the proposed system lies in its ability to synergistically combine information from multiple modalities.
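The abstract states that the system's strength is combining information from multiple modalities, but does not detail the fusion mechanism. A minimal sketch of one common approach, late fusion by weighted averaging of each modality's class-probability vector, is shown below; the emotion ordering, function name, and equal default weights are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: late fusion of per-modality emotion probabilities.
# Each modality model (text, voice, face, body) is assumed to output a
# probability vector over the seven emotion classes from the paper.
EMOTIONS = ["anger", "fear", "joy", "sadness", "surprise", "disgust", "neutral"]

def fuse(modality_probs, weights=None):
    """Weighted average of probability vectors from each modality.

    modality_probs: list of length-7 probability vectors, one per modality.
    weights: optional per-modality reliability weights (default: equal).
    Returns the predicted emotion label and the fused probability vector.
    """
    if weights is None:
        weights = [1.0] * len(modality_probs)
    total = sum(weights)
    fused = [
        sum(w * p[i] for w, p in zip(weights, modality_probs)) / total
        for i in range(len(EMOTIONS))
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best], fused

# Example: text and voice models both lean toward "joy".
label, fused = fuse([[0.1, 0, 0.9, 0, 0, 0, 0],
                     [0.2, 0, 0.8, 0, 0, 0, 0]])
```

With equal weights this reduces to a simple average; per-modality weights could instead be tuned on validation accuracy so that stronger modalities dominate the fused decision.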

Country: Sri Lanka

De Silva J.A.D.P.R.¹, Lanka P.A.C.², Jayawardena R.D.T.M.³, Nandakumara K.S.S.⁴, Lakmini Abeywardhana⁵, Dilshan De Silva⁶

  1–6. Faculty of Computing (FoC), Sri Lanka Institute of Information Technology (SLIIT), Malabe, Sri Lanka

IRJIET, Volume 7, Issue 10, October 2023 pp. 428-436

https://doi.org/10.47001/IRJIET/2023.710057
