AI-Driven Image Generation and Virtual Try-On for Personalized Fashion Experiences

Abstract

The integration of Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) offers a promising approach to generating personalized, contextually relevant images that reflect specific user preferences. This project aims to combine RAG and LLMs in a robust, scalable image generation pipeline that links natural language processing with computer vision techniques. The process begins with a RAG model, which combines retrieval-based methods with generative models to produce images that are coherent with the input prompts and enriched with context from external knowledge sources. After generation, a dedicated preprocessing module resizes and optimizes the images so that they meet the quality requirements of the later integration step. The next phase detects human upper bodies in photographs using Haar Cascade classifiers, a machine-learning approach known for its efficiency in real-time object detection. Accurate identification of the upper-body regions is crucial for the following step, in which the generated images are overlaid onto the detected regions using OpenCV, an open-source computer vision library. This step aligns each image with the detected upper-body region to create a visually realistic and aesthetically pleasing effect. To support user interaction and deployment, the entire process is wrapped in a Flask application, which forms the backbone of the system's architecture. Flask handles the backend processing, including API requests and image-processing tasks, and also serves a user-friendly frontend interface through which users interact with the system.
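The detection and overlay steps described above can be sketched in Python with OpenCV and NumPy. This is a minimal illustration under stated assumptions, not the author's implementation. It assumes the stock `haarcascade_upperbody.xml` model bundled with the `opencv-python` package, illustrative `detectMultiScale` parameters, and a preprocessed garment image that ideally carries an alpha channel (loaded as BGRA, for example with `cv2.IMREAD_UNCHANGED`). The helper names `detect_upper_bodies`, `alpha_overlay`, and `try_on` are hypothetical.

```python
import numpy as np


def detect_upper_bodies(frame_bgr, scale_factor=1.1, min_neighbors=5):
    """Return (x, y, w, h) rectangles for upper bodies found by a Haar cascade."""
    import cv2  # imported here so the NumPy-only helper below works without OpenCV

    # Assumption: the stock model shipped with the opencv-python package.
    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_upperbody.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    # Illustrative parameters; they would need tuning for real input resolutions.
    return cascade.detectMultiScale(
        gray, scaleFactor=scale_factor, minNeighbors=min_neighbors, minSize=(60, 60)
    )


def alpha_overlay(frame, garment, box):
    """Blend a garment image (already resized to the box size) onto a copy of frame.

    box is (x, y, w, h); any part of the box outside the frame is clipped.
    """
    x, y, w, h = box
    out = frame.copy()
    frame_h, frame_w = frame.shape[:2]
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, frame_w), min(y + h, frame_h)
    if x0 >= x1 or y0 >= y1:
        return out  # box lies entirely outside the frame
    # Crop the garment to the part of the box that is visible in the frame.
    g = garment[y0 - y:y1 - y, x0 - x:x1 - x]
    if g.shape[2] == 4:
        alpha = g[..., 3:4].astype(np.float32) / 255.0  # per-pixel opacity
    else:
        alpha = np.ones(g.shape[:2] + (1,), dtype=np.float32)  # no alpha: fully opaque
    roi = out[y0:y1, x0:x1].astype(np.float32)
    blended = alpha * g[..., :3].astype(np.float32) + (1.0 - alpha) * roi
    out[y0:y1, x0:x1] = np.clip(np.rint(blended), 0, 255).astype(np.uint8)
    return out


def try_on(frame_bgr, garment_bgra):
    """Resize the garment to each detected upper body and overlay it."""
    import cv2

    out = frame_bgr
    for x, y, w, h in detect_upper_bodies(frame_bgr):
        x, y, w, h = int(x), int(y), int(w), int(h)
        # cv2.resize takes dsize as (width, height).
        resized = cv2.resize(garment_bgra, (w, h), interpolation=cv2.INTER_AREA)
        out = alpha_overlay(out, resized, (x, y, w, h))
    return out
```

A Haar cascade returns axis-aligned bounding rectangles, so this sketch stretches the garment over each rectangle rather than following body contours. A call might look like `try_on(cv2.imread("person.jpg"), cv2.imread("garment.png", cv2.IMREAD_UNCHANGED))`, where both file names are placeholders.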

Country: India

1 Ashwini K. Suganawar

  1. M.Tech Student, Department of Computer Science & Engineering, Shri Balasaheb Mane Shikshan Prasarak Mandal's Ashokrao Mane Group of Institutions, Vathar, Kolhapur, India

IRJIET, Volume 8, Issue 9, September 2024, pp. 112-118

doi.org/10.47001/IRJIET/2024.809014
