Feature Engineering for Sentiment Analysis: Insights from Twitter Data

Abstract

Twitter, one of the most popular social media sites, is an essential source of information for opinion mining and sentiment analysis. With millions of tweets generated daily, analysing them to extract opinions and sentiments on various topics has become a critical task. In a democratic country like India, Twitter is a prominent medium for expressing views on diverse subjects, such as newly released movies, political figures and events, current affairs, and the stock market. This paper uses a balanced collection of positive and negative tweets drawn from the Sentiment140 benchmark dataset on Kaggle. Two widely used feature extraction techniques, TF-IDF (Term Frequency-Inverse Document Frequency) vectorization and count vectorization, were implemented with unigram, bigram, trigram, and n-gram (1,3) features. For classification, Logistic Regression, a supervised machine learning model, was employed to capture sentiment patterns in the dataset. Among the configurations tested, TF-IDF with n-gram (1,3) features performed best on all evaluation metrics. The paper presents a well-structured pipeline for sentiment analysis that can serve as a baseline for future studies, and it highlights the effectiveness of combining feature engineering techniques with machine learning algorithms to improve sentiment classification accuracy on Twitter data.
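As a rough illustration of the two feature extraction techniques named in the abstract, the following sketch builds count and TF-IDF features over n-grams in plain Python. It is not the authors' implementation. Several things here are assumptions: "n-gram (1,3)" is read as unigrams through trigrams combined, tokenization is simple lowercase whitespace splitting, the idf uses one common smoothed variant, ln((1 + N) / (1 + df)) + 1, without length normalization, and the two example tweets and all function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n_min=1, n_max=3):
    """Return all contiguous n-grams of length n_min..n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]

def count_vectors(docs, n_min=1, n_max=3):
    """Count vectorization: one Counter of raw n-gram counts per document."""
    # Assumption: lowercase + whitespace split stands in for real tweet preprocessing.
    return [Counter(ngrams(doc.lower().split(), n_min, n_max)) for doc in docs]

def tfidf_vectors(docs, n_min=1, n_max=3):
    """TF-IDF: raw term count times a smoothed idf (one common variant, assumed here)."""
    counts = count_vectors(docs, n_min, n_max)
    n_docs = len(docs)
    # Document frequency: number of documents in which each n-gram appears.
    df = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]

# Toy examples for illustration only; they are not taken from Sentiment140.
tweets = ["i love this movie", "i hate this movie"]
counts = count_vectors(tweets)
tfidf = tfidf_vectors(tweets)
```

In this toy corpus, "love" appears in only one tweet and so receives a higher TF-IDF weight than "movie", which appears in both, while count vectorization assigns both a value of 1. The resulting sparse feature dictionaries could then be passed to a classifier such as Logistic Regression.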

Country: India

1 Mansi A. Shah, 2 Ravi M. Gulati

  1. Department of Computer Science, Veer Narmad South Gujarat University, Surat, Gujarat, India
  2. Department of Computer Science, Veer Narmad South Gujarat University, Surat, Gujarat, India

IRJIET, Volume 10, Issue 1, January 2026 pp. 39-50

doi.org/10.47001/IRJIET/2026.101006
