Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Vol 8 No 9 (2024): Volume 8, Issue 9, September 2024 | Pages: 112-118
International Research Journal of Innovations in Engineering and Technology
OPEN ACCESS | Research Article | Published Date: 17-09-2024
Integrating Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs) offers a way to generate personalized, contextually relevant images tailored to specific user preferences. This project combines RAG and LLMs in a scalable image generation pipeline that brings together natural language processing and computer vision techniques. The process begins with a RAG model, which combines retrieval-based methods with generative models to produce images that are coherent with the input prompts and enriched with context from external knowledge sources. Following image generation, a dedicated preprocessing module resizes and optimizes the images so that they meet the quality requirements of the subsequent integration step. The next phase detects human upper bodies in photographs using Haar Cascade classifiers, a machine learning-based approach known for its efficiency in real-time object detection. Accurate identification of the upper-body regions is crucial for the following step, in which the generated images are overlaid onto the detected regions using OpenCV, a computer vision library. This integration aligns the images with the contours of the human body, creating a visually realistic and aesthetically pleasing effect. To facilitate user interaction and deployment, the entire process is wrapped in a Flask application. Flask handles the backend processing, including API requests and image processing tasks, and serves a user-friendly frontend through which users interact with the system.
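The detection and overlay steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract gives no code, so the function names (`detect_upper_bodies`, `resize_nearest`, `overlay_on_box`), the RGBA garment format, and the detector parameters are all assumptions. Detection uses the upper-body Haar cascade file bundled with the `opencv-python` package; the resize and blending are written in plain NumPy as a stand-in for `cv2.resize` so that the overlay logic can be read and run without OpenCV installed.

```python
import numpy as np


def detect_upper_bodies(gray_image):
    """Return (x, y, w, h) boxes found by OpenCV's bundled upper-body Haar cascade.

    Assumption: scaleFactor and minNeighbors are illustrative defaults,
    not values reported in the paper.
    """
    import cv2  # imported lazily so the helpers below work without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_upperbody.xml"
    )
    return cascade.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)


def resize_nearest(image, width, height):
    """Nearest-neighbour resize in NumPy (simple stand-in for cv2.resize)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows[:, None], cols]


def overlay_on_box(photo, garment_rgba, box):
    """Alpha-blend an RGBA garment image onto the box (x, y, w, h) of a photo.

    Returns a new array; the input photo is left unchanged. Parts of the
    box that fall outside the photo are clipped.
    """
    x, y, w, h = (int(v) for v in box)
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, photo.shape[1]), min(y + h, photo.shape[0])
    out = photo.copy()
    if w <= 0 or h <= 0 or x0 >= x1 or y0 >= y1:
        return out  # nothing visible to draw

    # Scale the garment to the box, then keep only the part inside the photo.
    resized = resize_nearest(garment_rgba, w, h)[y0 - y:y1 - y, x0 - x:x1 - x]
    alpha = resized[..., 3:4].astype(np.float32) / 255.0
    region = out[y0:y1, x0:x1].astype(np.float32)
    blended = alpha * resized[..., :3] + (1.0 - alpha) * region
    out[y0:y1, x0:x1] = np.rint(blended).astype(photo.dtype)
    return out
```

In a full pipeline, each box returned by `detect_upper_bodies` would be passed to `overlay_on_box`, and a Flask route could return the resulting image. A bounding-box overlay of this kind only approximates the body's position, so matching actual body contours would need additional steps that the abstract does not specify.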
Retrieval-Augmented Generation (RAG), Large Language Models (LLMs), image generation, Machine Learning, OpenCV, Deep Learning
Ashwini K. Suganawar (2024). AI-Driven Image Generation and Virtual Try-On for Personalized Fashion Experiences. International Research Journal of Innovations in Engineering and Technology - IRJIET, 8(9), 112-118. Article DOI https://doi.org/10.47001/IRJIET/2024.809014
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.