Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Volume 10, Issue 1, January 2026 | Pages: 39-50
International Research Journal of Innovations in Engineering and Technology
OPEN ACCESS | Research Article | Published Date: 17-01-2026
Twitter, one of the most popular social media sites, is an essential source of data for opinion mining and sentiment analysis. With millions of tweets generated daily, analysing them to extract opinions and sentiments on various topics has become a critical task. In a democratic country like India, Twitter is a prominent medium for expressing views on diverse subjects such as newly released movies, political figures and events, current affairs, and the stock market. This paper uses a balanced collection of positive and negative tweets drawn from the Sentiment140 benchmark dataset on Kaggle. Two widely used feature extraction techniques, TF-IDF (Term Frequency-Inverse Document Frequency) Vectorization and Count Vectorization, were implemented with unigram, bigram, trigram, and n-gram (1,3) features. Among these, TF-IDF with n-gram (1,3) modelling performed best on all evaluation metrics. For classification, Logistic Regression, a supervised machine learning model, was used to capture sentiment patterns within the dataset. The paper presents a well-structured sentiment analysis pipeline that can serve as a baseline for future studies, and it highlights how combining feature engineering with a machine learning classifier can improve sentiment classification accuracy on Twitter data.
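To make the difference between the two feature extraction techniques concrete, the following is a minimal standard-library sketch, not the authors' implementation. It assumes whitespace tokenization, reads "n-gram (1,3)" as unigrams through trigrams combined, and uses one common smoothed IDF variant, ln((1 + N) / (1 + df)) + 1; the paper does not state which weighting it used. The function names and the two toy tweets are hypothetical, and the Logistic Regression step is omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=3):
    """Return all contiguous n-grams with n_min <= n <= n_max, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def count_vectors(docs, n_min=1, n_max=3):
    """Count Vectorization: one Counter of raw n-gram counts per document."""
    return [Counter(ngrams(doc.lower().split(), n_min, n_max)) for doc in docs]


def tfidf_vectors(docs, n_min=1, n_max=3):
    """TF-IDF: raw counts reweighted by a smoothed inverse document frequency."""
    counts = count_vectors(docs, n_min, n_max)
    n_docs = len(counts)
    # Document frequency: number of documents that contain each n-gram.
    df = Counter(term for c in counts for term in c)
    # Assumed smoothed IDF variant: ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


# Toy tweets, invented for illustration only (not from Sentiment140).
tweets = ["loved the new movie", "hated the new movie"]
counts = count_vectors(tweets)
weights = tfidf_vectors(tweets)
```

In this toy example both tweets give "movie" and "loved" a raw count of 1, but TF-IDF assigns "loved" a higher weight because it appears in only one of the two documents, which is the kind of discriminative signal a downstream classifier can exploit.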
Sentiment Analysis, Twitter, TF-IDF, Feature Extraction
Mansi A. Shah & Ravi M. Gulati. (2026). Feature Engineering for Sentiment Analysis: Insights from Twitter Data. International Research Journal of Innovations in Engineering and Technology - IRJIET, 10(1), 39-50. Article DOI: https://doi.org/10.47001/IRJIET/2026.101006
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.