AI-Based Speech Analysis Framework

Abstract

The AI-Based Speech Analysis Framework presented in this paper was designed to help people overcome linguistic obstacles, particularly in English-language communication. Through thorough speech analysis, the framework's multimodal approach enables real-time evaluation of emotional state, fluency, stress levels, and speaker identity. By applying modern artificial-intelligence techniques, it delivers a nuanced and perceptive interpretation of spoken language, thereby supporting a more effective communication experience. The study pursues four sub-objectives, each advancing the main goal of greater self-awareness and improved communication. First, the framework performs emotional assessment by detecting subtle emotional indicators embedded in the voice. Its AI models identify emotional patterns such as enthusiasm, apprehension, or calm; this real-time analysis enables tailored communication strategies and a deeper understanding of the speaker's feelings. Second, the framework introduces a novel method for assessing fluency through voice analysis. It examines facets of speech such as pace, intonation, and word choice, giving language learners immediate feedback on their linguistic proficiency and enabling focused improvement toward effective communication. Third, the framework addresses the complex relationship between stress and effective communication. It measures stress levels through vocal-pattern analysis, shedding light on moments of heightened tension or anxiety while speaking; this knowledge helps people overcome stress-related barriers. Fourth, at the heart of the framework's innovation is its capacity to identify individuals accurately from their distinctive voice traits.
This identity-recognition capability is unaffected by language barriers and provides an effective, secure method of identification across diverse settings. Voice-based identification accelerates procedures and promotes inclusion in contexts ranging from workplaces to public services. The culmination of this research is an AI-Based Speech Analysis Framework that opens fresh perspectives on language evaluation and communication improvement. By merging emotional analysis, fluency assessment, stress analysis, and identity recognition through voice, it not only encourages self-improvement but also highlights the transformative potential of AI in redefining language landscapes and fostering genuine connections.
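The paper does not include an implementation, but the kind of signal the fluency- and stress-analysis modules described above would consume can be illustrated with a minimal sketch. The framing scheme, frame sizes, and autocorrelation pitch tracker below are illustrative assumptions (not the authors' method), applied to a synthetic 150 Hz tone standing in for voiced speech; real systems would add voicing detection and richer features such as MFCCs.

```python
# Hypothetical sketch: short-time energy and pitch, two basic prosodic
# features underlying fluency and stress analysis. NumPy only.
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a waveform into overlapping analysis frames."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def pitch_autocorr(frame, sr, fmin=80.0, fmax=400.0):
    """Estimate fundamental frequency of one frame via autocorrelation."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)      # search plausible voice lags
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(sr) / sr                            # 1 second of "audio"
x = np.sin(2 * np.pi * 150 * t)                   # synthetic 150 Hz voicing

frames = frame_signal(x, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
energy = (frames ** 2).mean(axis=1)               # short-time energy contour
f0 = np.array([pitch_autocorr(f, sr) for f in frames])

print(np.median(f0))                              # close to the true 150 Hz
print(energy.shape[0])                            # one value per frame
```

Energy variability over time is one crude proxy for vocal tension, and frame-level pitch contours feed intonation analysis; both are standard low-level descriptors, not claims about the paper's pipeline.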
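The identity-recognition objective can likewise be sketched in its simplest form: reduce each utterance to a fixed-length "voiceprint" vector and compare vectors by cosine similarity. Everything here is a hypothetical illustration — the averaged log-spectrum embedding, the synthetic "speakers", and the function names are assumptions; production systems use learned speaker embeddings rather than raw spectra.

```python
# Hypothetical sketch: speaker comparison via cosine similarity of crude
# spectral voiceprints, using synthetic signals with distinct harmonics.
import numpy as np

def voiceprint(x, n_fft=512):
    """Average log-magnitude spectrum over frames -- a crude embedding."""
    frames = x[: len(x) // n_fft * n_fft].reshape(-1, n_fft)
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1))
    return np.log1p(spec).mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sr = 16000
t = np.arange(2 * sr) / sr

def speak(f0, seed):
    """Two seconds of a toy 'speaker': two harmonics plus noise."""
    noise = np.random.default_rng(seed).normal(0, 0.05, t.size)
    return np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t) + noise

alice_1, alice_2, bob = speak(120, 1), speak(120, 2), speak(210, 3)

same = cosine(voiceprint(alice_1), voiceprint(alice_2))
diff = cosine(voiceprint(alice_1), voiceprint(bob))
print(same > diff)   # same-speaker similarity exceeds cross-speaker
```

The design point this illustrates is that identification reduces to a distance comparison in embedding space, which is why it is language-independent: the embedding captures vocal characteristics, not words.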

Country: Sri Lanka

¹W.A.D. Perera, ²Mr. Jeewaka Perera, ³Mr. Tharaniyawarma K.

  1. Faculty of Computing, Sri Lanka Institute of Information Technology, Sri Lanka
  2. Faculty of Computing, Sri Lanka Institute of Information Technology, Sri Lanka
  3. Faculty of Computing, Sri Lanka Institute of Information Technology, Sri Lanka

IRJIET, Volume 8, Issue 1, January 2024 pp. 94-104

DOI: https://doi.org/10.47001/IRJIET/2024.801013
