Twitter, one of the most popular social media sites, is an essential source of information
for opinion mining and sentiment analysis. With millions of tweets generated
daily, analysing these tweets to extract opinions and sentiments on various
topics has become a critical task. In a democratic country like India, Twitter
is a prominent medium for expressing views on diverse subjects, such as newly
released movies, political figures and events, current affairs, the stock
market, and more. This paper utilizes a balanced collection of positive and
negative tweets drawn from the Sentiment140 benchmark dataset, obtained from Kaggle. Two
widely used feature extraction techniques, TF-IDF (Term Frequency-Inverse
Document Frequency) vectorization and count vectorization, were each applied
with unigram, bigram, trigram, and combined n-gram (1,3) features. Among
these configurations, TF-IDF with the (1,3) n-gram range performed best on all
evaluation metrics. For classification, Logistic Regression, a supervised machine learning
model, was employed to capture sentiment patterns within the dataset
effectively. This paper presents a well-structured pipeline for sentiment
analysis, which can serve as a baseline for future studies. It
shows that pairing n-gram TF-IDF features with a simple, well-understood
classifier such as Logistic Regression is an effective approach to sentiment
classification on Twitter data.
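
The pipeline described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' implementation: the choice of scikit-learn, the reading of unigram, bigram, and trigram as the ranges (1,1), (2,2), and (3,3), the names `build_pipeline` and `fit_all`, the `max_iter` setting, and the eight toy tweets (which are not taken from Sentiment140) are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in tweets (hypothetical, NOT from Sentiment140).
# Label 1 = positive, 0 = negative; the set is balanced, as in the abstract.
TWEETS = [
    "loved the new movie it was great fun",
    "what a wonderful day so happy right now",
    "this phone is amazing best purchase ever",
    "really enjoyed the match great win today",
    "the movie was terrible total waste of time",
    "so sad and tired worst day ever",
    "this phone is awful battery died again",
    "hate the traffic today really bad commute",
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

# Assumed mapping from the abstract's feature names to n-gram ranges.
NGRAM_RANGES = {
    "unigram": (1, 1),
    "bigram": (2, 2),
    "trigram": (3, 3),
    "ngram_1_3": (1, 3),
}
VECTORIZERS = {"count": CountVectorizer, "tfidf": TfidfVectorizer}


def build_pipeline(vectorizer_name, ngram_name):
    """Return an unfitted vectorizer + Logistic Regression pipeline."""
    vectorizer = VECTORIZERS[vectorizer_name](ngram_range=NGRAM_RANGES[ngram_name])
    return make_pipeline(vectorizer, LogisticRegression(max_iter=1000))


def fit_all(texts, labels):
    """Fit one pipeline per (vectorizer, n-gram range) combination."""
    models = {}
    for vectorizer_name in VECTORIZERS:
        for ngram_name in NGRAM_RANGES:
            pipeline = build_pipeline(vectorizer_name, ngram_name)
            models[(vectorizer_name, ngram_name)] = pipeline.fit(texts, labels)
    return models


MODELS = fit_all(TWEETS, LABELS)

# Classify an unseen tweet with the configuration the abstract reports as best.
best = MODELS[("tfidf", "ngram_1_3")]
prediction = best.predict(["great movie loved it"])
```

On real data, each of the eight fitted pipelines would be scored on a held-out split to reproduce the kind of comparison the abstract summarizes; the toy set here is far too small for its predictions to mean anything.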
Country: India
IRJIET, Volume 10, Issue 1, January 2026 pp. 39-50