Privacy Preservation for NLP Using Differential Privacy

Abstract

Differential privacy is one of the most popular frameworks for guaranteeing data privacy while preserving statistical utility. However, its practical application faces critical challenges, including the lack of a standardized approach for selecting privacy parameters, limited flexibility across diverse real-world scenarios, and vulnerabilities in data-dependent settings. This paper tackles these issues with an enhanced differential privacy mechanism tailored to real-world datasets. We introduce an adaptive method for dynamically selecting the privacy parameter (ε), maintaining the best possible balance between data utility and privacy protection. In addition, we extend differential privacy mechanisms to support broader applications by customizing noise-injection techniques, making them more adaptable to various data types and use cases. Experimental evaluations demonstrate that our approach significantly improves privacy preservation while maintaining analytical accuracy. Furthermore, we propose a robust solution to mitigate the vulnerabilities of differential privacy in data-dependent contexts, reducing the impact of inference attacks that exploit social, behavioral, and genetic relationships within datasets. By refining existing methodologies and introducing novel adaptations, our work enhances the effectiveness of differential privacy for real-world deployment. These findings advance privacy-preserving techniques, enabling more secure and practical data-analytics solutions for sensitive data handling across a variety of sectors.
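As a minimal illustration of the kind of mechanism the abstract describes, the sketch below applies the standard Laplace mechanism and a simple data-size-based heuristic for choosing ε. The `adaptive_epsilon` function and its parameters (`eps_min`, `eps_max`, `n_ref`) are hypothetical placeholders for exposition only, not the paper's actual selection rule.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value perturbed by Laplace noise with scale = sensitivity / epsilon,
    the calibration required for epsilon-differential privacy."""
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

def adaptive_epsilon(n, eps_min=0.1, eps_max=1.0, n_ref=10_000):
    """Hypothetical heuristic: interpolate epsilon with dataset size, so smaller
    datasets (where each record is more exposed) receive stronger protection."""
    frac = min(n / n_ref, 1.0)
    return eps_min + (eps_max - eps_min) * frac

# Example: privatize a count query over a small dataset.
eps = adaptive_epsilon(n=2_500)          # smaller n -> smaller epsilon
noisy_count = laplace_mechanism(true_value=412, sensitivity=1.0, epsilon=eps)
```

Smaller ε yields a larger noise scale and hence stronger privacy at the cost of accuracy, which is the utility/privacy trade-off the adaptive selection aims to balance.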


D. Akhil¹, K. Yogananda², A. Komala³

  1. Student, Department of Computer Science and Engineering (Cyber Security) (UG), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, AP, India
  2. Student, Department of Computer Science and Engineering (Cyber Security) (UG), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, AP, India
  3. Assistant Professor, Department of Computer Science and Engineering (Cyber Security) (UG), Madanapalle Institute of Technology & Science (Autonomous), Madanapalle, AP, India

IRJIET, Volume 9, Special Issue of ICCIS-2025 May 2025 pp. 124-130

doi.org/10.47001/IRJIET/2025.ICCIS-202520
