Performance Analysis of Logistic Regression, Naive Bayes, KNN, Decision Tree, Random Forest and SVM on Hate Speech Detection from Twitter

Subhajeet Das; Koushikk Bhattacharyya; Sonali Sarkar

DOI: https://doi.org/10.47001/IRJIET/2023.703004

Subhajeet Das, Department of Computer Science & Engineering, Swami Vivekananda Institute of Science & Technology, Kolkata, India
Koushikk Bhattacharyya, Department of Computer Science & Engineering, Swami Vivekananda Institute of Science & Technology, Kolkata, India
Sonali Sarkar, Department of Chemical Engineering, Swami Vivekananda Institute of Science & Technology, Kolkata, India

Volume 7, Issue 3, March 2023 | Pages: 24-28

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 11-03-2023

Abstract

Hate speech, especially racist, gender-based and religious discrimination and defamatory comments, is becoming one of the biggest problems on Twitter and is driving people to switch to other social media platforms. Its effects are long-lasting and difficult to undo. To stop such hateful activity, machine learning approaches need to be applied. This research article analyzes the performance and effectiveness of Logistic Regression, Gaussian Naive Bayes, K-Nearest Neighbor, Decision Tree, Random Forest and Support Vector Machine for detecting hate speech on Twitter. SVM, Decision Tree and Random Forest outperformed the other models, achieving state-of-the-art accuracies of 95.5%, 96.2% and 98.2%, respectively, on comments gathered over a period of time.
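The abstract names the six classifiers but not the feature pipeline. As a rough illustration only, the sketch below assumes tweets are turned into bag-of-words count vectors (what the Count Vectorizer listed in the keywords produces) and classifies them with K-Nearest Neighbor, one of the six compared models, using Euclidean distance and k = 3. The toy tweets, labels, and helper names (`tokenize`, `build_vocabulary`, `count_vectorize`, `knn_predict`) are hypothetical; this is not the authors' implementation or dataset.

```python
import math
import re
from collections import Counter

# Hypothetical toy data (NOT the authors' dataset): 1 = hateful, 0 = not hateful.
TRAIN_TEXTS = [
    "i hate people like you",
    "you people are disgusting",
    "get out of our country",
    "what a lovely sunny day",
    "congrats on the new job",
    "i love this song",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(texts):
    """Map each distinct training token to a column index (sorted for determinism)."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def count_vectorize(text, vocab):
    """Bag-of-words count vector; tokens missing from the vocabulary are dropped."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec


def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Return the majority label among the k nearest training vectors (Euclidean)."""
    nearest = sorted(
        (math.dist(vec, query_vec), label)
        for vec, label in zip(train_vecs, train_labels)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


vocab = build_vocabulary(TRAIN_TEXTS)
train_vecs = [count_vectorize(t, vocab) for t in TRAIN_TEXTS]

queries = ["i hate you people", "what a lovely song"]
predictions = [
    knn_predict(train_vecs, TRAIN_LABELS, count_vectorize(q, vocab)) for q in queries
]
print(predictions)  # [1, 0]
```

Euclidean distance on raw counts is the simplest choice; because it tends to favor short texts, TF-IDF weighting or cosine similarity are common alternatives.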

Keywords

Hate Speech, Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor, Gaussian Naïve Bayes, Support Vector Machine, Count Vectorizer, One Hot Encoder, Precision, Recall, Accuracy
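The keywords list accuracy, precision and recall as evaluation measures, while the abstract reports accuracy only. For reference, here is a minimal sketch of the three metrics for a binary task, assuming label 1 marks a hateful tweet (the positive class); the labels and predictions are hypothetical. The last lines illustrate why precision and recall matter alongside accuracy: when hateful tweets are a small minority, a model that never flags anything can still score high accuracy.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred):
    """Of the tweets flagged as hateful, the fraction that really are (0.0 if none flagged)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred):
    """Of the truly hateful tweets, the fraction that were flagged (0.0 if none exist)."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical labels and predictions, 1 = hateful.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(accuracy(y_true, y_pred), precision(y_true, y_pred), recall(y_true, y_pred))
# 0.625 0.6 0.75

# A model that never flags anything still looks accurate on skewed data.
skewed_true = [1] + [0] * 9
never_flag = [0] * 10
print(accuracy(skewed_true, never_flag), recall(skewed_true, never_flag))  # 0.9 0.0
```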

Citation of this Article

Subhajeet Das, Koushikk Bhattacharyya, Sonali Sarkar, “Performance Analysis of Logistic Regression, Naive Bayes, KNN, Decision Tree, Random Forest and SVM on Hate Speech Detection from Twitter” Published in International Research Journal of Innovations in Engineering and Technology - IRJIET, Volume 7, Issue 3, pp 24-28, March 2023. Article DOI https://doi.org/10.47001/IRJIET/2023.703004

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
