Performance Analysis of Logistic Regression, Naive Bayes, KNN, Decision Tree, Random Forest and SVM on Hate Speech Detection from Twitter

Abstract

Hate speech, especially racist, gender-based and religious discrimination and defamatory comments, has become one of the biggest problems on Twitter and is driving people to switch to other social media platforms. Its effects are long-lasting and difficult to prevent. Machine learning approaches are needed to curb such hateful activity. This research article analyzes the performance and effectiveness of Logistic Regression, Gaussian Naive Bayes, K-Nearest Neighbor, Decision Tree, Random Forest and Support Vector Machine (SVM) in detecting hate speech on Twitter. SVM, Decision Tree and Random Forest outperformed the other models, achieving state-of-the-art accuracies of 95.5%, 96.2% and 98.2% respectively on comments gathered over a period of time.
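The comparison described above can be illustrated with a minimal sketch. It assumes scikit-learn, TF-IDF features, default-like hyperparameters and a tiny hypothetical toy corpus; none of these are stated in the abstract, so the sketch is not the authors' pipeline and will not reproduce the reported accuracies.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy corpus (1 = hateful, 0 = not hateful); NOT the paper's data.
texts = [
    "i hate those people they are awful", "they are awful and should leave",
    "those people are disgusting i hate them", "get out you awful people",
    "i hate them all they are disgusting", "awful disgusting people go away",
    "you people are the worst i hate you", "leave now you disgusting lot",
    "what a lovely day at the park", "i love this new song so much",
    "great game last night well played", "thanks for the kind birthday wishes",
    "the food at this place is lovely", "had a great time with friends",
    "i love reading on rainy days", "well done on the new job",
]
labels = [1] * 8 + [0] * 8


def compare_models(texts, labels, seed=0):
    """Train the six classifiers on TF-IDF features; return test accuracy per model."""
    train_txt, test_txt, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, random_state=seed, stratify=labels
    )
    # Fit the vectorizer on training text only, so no test vocabulary leaks in.
    vectorizer = TfidfVectorizer()
    # GaussianNB needs dense input, so convert the sparse TF-IDF matrices.
    X_train = vectorizer.fit_transform(train_txt).toarray()
    X_test = vectorizer.transform(test_txt).toarray()

    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Gaussian Naive Bayes": GaussianNB(),
        "K-Nearest Neighbor": KNeighborsClassifier(n_neighbors=3),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": SVC(kernel="linear"),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


scores = compare_models(texts, labels)
for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {acc:.3f}")
```

On a corpus this small the accuracies are not meaningful; the sketch only shows how the six models can be trained and scored on the same train/test split so that their accuracies are directly comparable.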

Country: India

Subhajeet Das (1), Koushikk Bhattacharyya (2), Sonali Sarkar (3)

  1. Department of Computer Science & Engineering, Swami Vivekananda Institute of Science & Technology, Kolkata, India
  2. Department of Computer Science & Engineering, Swami Vivekananda Institute of Science & Technology, Kolkata, India
  3. Department of Chemical Engineering, Swami Vivekananda Institute of Science & Technology, Kolkata, India

IRJIET, Volume 7, Issue 3, March 2023, pp. 24-28

doi.org/10.47001/IRJIET/2023.703004
