Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Vol 9 No 25 (2025): Volume 9, Special Issue of INSPIRE’25 April 2025 | Pages: 164-171
International Research Journal of Innovations in Engineering and Technology
OPEN ACCESS | Research Article | Published Date: 24-04-2025
Building high-performance text classification models in low-resource languages is a challenging task due to the scarcity of labelled data. Traditional approaches rely on manually annotated corpora, which are expensive and time-consuming to obtain. Data augmentation offers a practical way to expand limited training data; however, most existing augmentation methods are language-dependent, leveraging linguistic tools such as synonym replacement, word embeddings, or grammar-based transformations, which restricts their applicability to multilingual and low-resource settings. In this work, we propose LiDA, a language-independent data augmentation framework for text classification. Our approach combines back-translation, token-level perturbations, and contrastive learning to create diverse, semantically meaningful augmented samples that enhance model learning. Back-translation introduces natural variations while preserving meaning, token-level perturbations modify individual tokens to improve robustness, and contrastive learning helps the model distinguish subtle differences in text representations, leading to better generalization on unseen data. Our results show that LiDA outperforms traditional augmentation techniques by generating more contextually relevant and linguistically diverse samples, particularly in low-resource environments. Furthermore, our method improves model adaptability to multilingual data, demonstrating its potential as a scalable, language-agnostic augmentation strategy.
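As a rough illustration of the augmentation steps described in the abstract (a minimal sketch, not the authors' implementation), the following Python snippet composes back-translation with token-level perturbation. The `translate(text, src, tgt)` helper is a hypothetical stand-in for any machine-translation service, and the perturbation probabilities are assumed values, not parameters reported in the paper.

```python
import random


def back_translate(text, translate, src_lang="en", pivot_lang="fr"):
    """Back-translation: translate to a pivot language and back to the source.

    `translate(text, src, tgt)` is a hypothetical MT helper supplied by the caller.
    """
    pivot = translate(text, src_lang, pivot_lang)
    return translate(pivot, pivot_lang, src_lang)


def perturb_tokens(text, p_drop=0.1, p_swap=0.1, seed=None):
    """Token-level perturbation: randomly drop or swap adjacent whitespace tokens."""
    rng = random.Random(seed)
    tokens = text.split()
    # Randomly drop tokens, keeping at least one so the sample stays non-empty.
    kept = [t for t in tokens if rng.random() > p_drop] or tokens[:1]
    # Randomly swap adjacent tokens to introduce mild word-order noise.
    i = 0
    while i < len(kept) - 1:
        if rng.random() < p_swap:
            kept[i], kept[i + 1] = kept[i + 1], kept[i]
            i += 2
        else:
            i += 1
    return " ".join(kept)


def augment(text, translate, n=2):
    """Produce n augmented variants by composing both operations."""
    return [perturb_tokens(back_translate(text, translate)) for _ in range(n)]
```

In a contrastive-learning setup, each augmented variant would be treated as a positive pair with its source sentence, while other sentences in the batch serve as negatives.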
Keywords: LiDA, MBERT, SBERT, XLM-RoBERTa, LSTM, Token-Level Perturbation, Contrastive Learning, Back-Translation
M. Sharmila Devi, G. Sharanya, B. Himaja, A. Bhavya Rohitha, A. Sujitha, J. Swapna kumari. (2025). Language-Independent Data Augmentation for Text Classification [LiDA]. In Proceedings of the International Conference on Sustainable Practices and Innovations in Research and Engineering (INSPIRE'25), published by IRJIET, Volume 9, Special Issue of INSPIRE'25, pp. 164-171. Article DOI: https://doi.org/10.47001/IRJIET/2025.INSPIRE27
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.