Sentiment Analysis of Imbalanced Sarcastic Flood Disaster Texts Using Deep Learning Models

Abstract

Sentiment analysis often faces challenges such as manual labeling, sarcasm detection, and imbalanced class labels. Sentiment analysis of Twitter/X data is resource-intensive because the posts must be labeled manually. The BERT model performs adequately on Indonesian sentiment analysis, but sarcasm remains challenging. This research evaluates the performance of BERT, LSTM, and BERT-LSTM models in classifying sarcastic text, specifically flood-related posts from Indonesia. We used Twitter/X data collected from December 19, 2023, to January 13, 2024, and labeled by three annotators. We handled the class imbalance with Random Undersampling, SMOTE, and SMOTETomek, and compared model performance with ANOVA on balance-weighted accuracy. The BERT and BERT-LSTM models performed best, achieving balance-weighted accuracies of 98.61% and 98.06%, respectively. This research advances sentiment analysis methods, particularly for natural disaster contexts in Indonesia.
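
The abstract says that the posts were labeled by three annotators, but it does not say how their labels were combined. The following is a minimal sketch that assumes a simple majority vote. The helper names `majority_label` and `aggregate` are hypothetical, and so are the label strings.

```python
from collections import Counter
from typing import List, Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators, or None.

    Assumption: disagreements are resolved by majority vote; the abstract
    only states that three annotators labeled each post.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # With three annotators a strict majority means at least two agree;
    # a three-way split returns None so the post can be reviewed or dropped.
    return label if count * 2 > len(votes) else None


def aggregate(annotations: Sequence[Sequence[str]]) -> List[Optional[str]]:
    """Apply majority_label to each post's list of annotator votes."""
    return [majority_label(votes) for votes in annotations]


if __name__ == "__main__":
    # Hypothetical label names; the abstract does not list the label set.
    posts = [
        ["negative", "negative", "positive"],
        ["positive", "neutral", "negative"],
    ]
    print(aggregate(posts))  # ['negative', None]
```

Returning `None` rather than picking a label arbitrarily makes three-way disagreements visible. Whether such posts were re-annotated or discarded is not stated in the source.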
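
The three imbalance-handling techniques named in the abstract can be sketched in NumPy. This is an illustrative re-implementation and not the authors' code. It assumes that each post has already been converted to a numeric feature vector, such as an embedding, because SMOTE interpolates in feature space and cannot operate on raw text. The function names are hypothetical. `random_undersample` corresponds to Random Undersampling and `smote` to SMOTE. Running `smote` followed by `remove_tomek_links` gives a SMOTETomek-style combination.

```python
import numpy as np


def random_undersample(X, y, rng):
    """Randomly drop rows so every class shrinks to the size of the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes]
    )
    return X[keep], y[keep]


def smote(X, y, rng, k=5):
    """Grow each smaller class to the majority size by interpolating between a
    sample and one of its k nearest same-class neighbours (SMOTE)."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n == n_max or n < 2:
            continue
        Xc = X[y == c]
        # Pairwise Euclidean distances within the class; a point is not its own neighbour.
        d = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        k_eff = min(k, n - 1)
        neighbours = np.argsort(d, axis=1)[:, :k_eff]
        n_new = n_max - n
        base = rng.integers(0, n, size=n_new)  # seed sample for each synthetic point
        nb = neighbours[base, rng.integers(0, k_eff, size=n_new)]  # one random neighbour
        gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
        X_parts.append(Xc[base] + gap * (Xc[nb] - Xc[base]))
        y_parts.append(np.full(n_new, c, dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link: a pair of opposite-class points
    that are each other's nearest neighbour."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    idx = np.arange(len(X))
    is_link = (nn[nn] == idx) & (y[nn] != y)
    return X[~is_link], y[~is_link]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 4))  # toy stand-in for post embeddings
    y = np.array([0] * 9 + [1] * 3)
    Xs, ys = smote(X, y, rng)
    Xst, yst = remove_tomek_links(Xs, ys)  # SMOTETomek-style cleanup
    print(np.bincount(ys), np.bincount(yst))
```

The pairwise-distance matrices make the sketch O(n²) in memory. For real data, the `imbalanced-learn` package provides `RandomUnderSampler`, `SMOTE`, and `SMOTETomek`, which share a common `fit_resample(X, y)` method. Resampling is normally applied to the training split only, so that evaluation still reflects the original class distribution.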
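
The abstract reports results as balance-weighted accuracy and compares the models with ANOVA. It gives neither the metric's formula nor the ANOVA design. The sketch below makes two assumptions. First, it treats the metric as the usual balanced accuracy, which is the mean of per-class recalls. Second, it assumes each model contributes several scores, for example from repeated runs, to a one-way ANOVA. Both assumptions and the function names are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    totals: Dict[Hashable, int] = {}
    hits: Dict[Hashable, int] = {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over the groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need two or more non-empty groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    if ss_within == 0:
        # Zero within-group variance: F is infinite, or undefined if the means also match.
        return (float("inf") if ss_between > 0 else float("nan")), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Plain accuracy here is 0.75, but the minority class is never predicted.
    print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5
    print(one_way_anova([[1, 2, 3], [4, 5, 6]]))  # (13.5, 1, 4)
```

The first example shows why balanced accuracy suits imbalanced data. A classifier that always predicts the majority class scores 0.75 on plain accuracy but only 0.5 here. `scipy.stats.f_oneway(*groups)` returns the same F statistic together with its p-value.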

Country: Indonesia

Nur Khamidah¹, Khairil Anwar Notodiputro², Sachnaz Desta Oktarina³

  1. Student, Department of Statistics and Data Science, IPB University, Bogor, Indonesia
  2. Lecturer, Department of Statistics and Data Science, IPB University, Bogor, Indonesia
  3. Professor Lecturer, Department of Statistics and Data Science, IPB University, Bogor, Indonesia

IRJIET, Volume 8, Issue 11, November 2024, pp. 150–158

DOI: https://doi.org/10.47001/IRJIET/2024.811015
