Feature Engineering for Sentiment Analysis: Insights from Twitter Data

Mansi A. Shah; Ravi M. Gulati

doi:https://doi.org/10.47001/IRJIET/2026.101006

Mansi A. Shah, Department of Computer Science, Veer Narmad South Gujarat University, Surat, Gujarat, India

Ravi M. Gulati, Department of Computer Science, Veer Narmad South Gujarat University, Surat, Gujarat, India

Volume 10, Issue 1, January 2026 | Pages: 39-50

International Research Journal of Innovations in Engineering and Technology

OPEN ACCESS | Research Article | Published Date: 17-01-2026

Abstract

Twitter, one of the most popular social media sites, is an essential source of information for opinion mining and sentiment analysis. With millions of tweets generated daily, analysing them to extract opinions and sentiments on various topics has become a critical task. In a democratic country like India, Twitter is a prominent medium for expressing views on diverse subjects, such as newly released movies, political figures and events, current affairs, and the stock market. This paper uses a balanced collection of positive and negative tweets drawn from the Sentiment140 benchmark dataset on Kaggle. Two widely used feature extraction techniques, TF-IDF (Term Frequency-Inverse Document Frequency) vectorization and count vectorization, were implemented with unigram, bigram, trigram, and combined n-gram (1,3) features, where (1,3) covers unigrams through trigrams together. Logistic Regression, a supervised machine learning model, was used as the classifier to capture sentiment patterns in the dataset. Among the feature configurations, TF-IDF with n-gram (1,3) modelling performed best on all evaluation metrics. The paper presents a well-structured sentiment analysis pipeline that can serve as a baseline for future studies, and it highlights how integrating feature engineering techniques with machine learning algorithms can improve sentiment classification accuracy on Twitter data.
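The two feature extraction techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything here is an assumption made for illustration: the three toy tweets, the function names, whitespace tokenization, and the plain textbook weighting idf = ln(N / df). Library vectorizers commonly apply smoothing and normalization on top of this, so their values would differ.

```python
import math
from collections import Counter


def ngrams(tokens, n_min, n_max):
    """Return every contiguous n-gram with n_min <= n <= n_max as a space-joined string."""
    out = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def count_features(docs, ngram_range=(1, 1)):
    """Count vectorization: raw n-gram counts for each document."""
    return [Counter(ngrams(doc.lower().split(), *ngram_range)) for doc in docs]


def tfidf_features(docs, ngram_range=(1, 1)):
    """TF-IDF: raw count times ln(N / df). Unsmoothed textbook form (an assumption)."""
    counts = count_features(docs, ngram_range)
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once per document.
    df = Counter(term for doc_counts in counts for term in doc_counts)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in doc_counts.items()}
        for doc_counts in counts
    ]


# Hypothetical toy tweets, not taken from Sentiment140.
tweets = ["i love this movie", "i hate this movie", "love love the cast"]

uni = count_features(tweets, (1, 1))    # unigrams only
combo = count_features(tweets, (1, 3))  # unigrams, bigrams, and trigrams together
tfidf = tfidf_features(tweets, (1, 3))

print(sorted(combo[0]))
print(tfidf[0]["i love"], tfidf[0]["this movie"])
```

In this toy corpus "i love" occurs in one tweet while "this movie" occurs in two, so TF-IDF gives the first a higher weight, whereas count vectorization scores both as 1. Feature vectors like these would then be passed to a classifier such as Logistic Regression, which is not shown here.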

Keywords

Sentiment Analysis, Twitter, TF-IDF, Feature Extraction

Citation of this Article

Mansi A. Shah, & Ravi M. Gulati. (2026). Feature Engineering for Sentiment Analysis: Insights from Twitter Data. International Research Journal of Innovations in Engineering and Technology - IRJIET, 10(1), 39-50. Article DOI https://doi.org/10.47001/IRJIET/2026.101006

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
