Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
The integration of Retrieval-Augmented Generation (RAG) with Large Language
Models (LLMs) offers a transformative approach to generating personalized,
contextually relevant images that cater to specific user preferences. This
project harnesses RAG and LLMs together to develop a robust and scalable image
generation pipeline that combines state-of-the-art natural language processing
with advanced computer vision techniques. The process begins with a RAG stage,
which pairs a retrieval component with a generative model: context retrieved
from external knowledge sources enriches the input prompt, so that the resulting
high-quality images are both coherent with the prompt and informed by that
context. After generation, a dedicated preprocessing module resizes and
optimizes the images so that they meet the quality standards required for the
integration step. The next critical phase is the detection of human upper bodies
in photographs using Haar Cascade classifiers, a machine learning-based approach
known for its efficiency in real-time object detection. Accurate localization of
the upper-body region is crucial for the following step, in which the generated
images are overlaid onto the detected regions using OpenCV, a widely used
computer vision library. Fitting each image to the detected upper-body region
produces a visually realistic and aesthetically pleasing result. To support user
interaction and deployment, the entire process is encapsulated within a Flask
application, which serves as the backbone of the application’s architecture.
Flask handles the backend processing, including API requests and image
processing tasks, and also serves a user-friendly frontend interface through
which users can interact with the system easily.
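The abstract does not specify how the retrieval half of the RAG stage works. The sketch below shows the general idea under loud assumptions: a tiny in-memory text knowledge base and a bag-of-words cosine similarity stand in for whatever retriever (for example, dense embeddings over a vector store) the system actually uses. All names (`tokenize`, `cosine`, `retrieve`, `build_augmented_prompt`, `KNOWLEDGE_BASE`) are hypothetical, and the step that sends the augmented prompt to an image generator is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lower-case alphanumeric tokens; a deliberately simple stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_augmented_prompt(query: str, docs: list[str], k: int = 2) -> str:
    # Prepend the retrieved snippets to the user's request; the result would be
    # handed to the image generator (hypothetical, not shown here).
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Context:\n{context}\n\nRequest: {query}"


# Hypothetical knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "Minimalist line-art logos suit plain cotton t-shirts.",
    "Vintage sunset gradients are popular on graphic tees.",
    "Python is a programming language.",
]

prompt = build_augmented_prompt("retro sunset graphic t-shirt", KNOWLEDGE_BASE, k=1)
print(prompt)
```

Here the snippet about sunset gradients is retrieved because it shares the most terms with the request, and it is prepended as context. A production retriever would replace the word-count similarity, but the retrieve-then-augment shape stays the same.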
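The abstract credits Haar Cascade classifiers with efficiency in real-time detection. The sketch below illustrates where much of that efficiency comes from in the Viola-Jones formulation: an integral image (summed-area table) lets any rectangle sum be read with four lookups, so a Haar-like feature, which is a difference of adjacent rectangle sums, costs the same regardless of its size. This is an illustration of feature evaluation only, not a trained cascade. The pipeline itself presumably relies on OpenCV's `cv2.CascadeClassifier` loaded with a pretrained model such as `haarcascade_upperbody.xml`, whose `detectMultiScale` method returns `(x, y, w, h)` rectangles; the function names below are hypothetical.

```python
def integral_image(img: list[list[int]]) -> list[list[int]]:
    # Summed-area table padded with a leading zero row and column:
    # ii[y][x] is the sum of img over rows < y and columns < x.
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: list[list[int]], x: int, y: int, w: int, h: int) -> int:
    # Sum of the w-by-h rectangle whose top-left pixel is (x, y): four lookups, O(1).
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: list[list[int]], x: int, y: int, w: int, h: int) -> int:
    # Two-rectangle (edge) Haar-like feature: upper half minus lower half.
    # Assumes h is even so both halves have the same size.
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


# A 4x4 grayscale patch with a bright top half and a dark bottom half.
patch = [[10, 10, 10, 10], [10, 10, 10, 10], [0, 0, 0, 0], [0, 0, 0, 0]]
ii = integral_image(patch)
print(rect_sum(ii, 0, 0, 4, 4))          # total intensity: 80
print(two_rect_feature(ii, 0, 0, 4, 4))  # strong horizontal-edge response: 80
```

A cascade evaluates thousands of such features per window but rejects most windows after only a few cheap stages, which is why the approach is fast enough for real-time use.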
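Between detection and overlay, the pipeline has to decide where and at what size the generated image goes inside each detected upper-body rectangle. The abstract does not describe this step, so the sketch below is one plausible approach under stated assumptions: it preserves the generated image's aspect ratio, scales it to the largest size that fits the box (optionally shrunk by a hypothetical `fill` fraction, for example to mimic a chest print), and centres it. `fit_into_box` and `fill` are illustrative names, not part of OpenCV.

```python
def fit_into_box(
    gen_w: int, gen_h: int, box: tuple[int, int, int, int], fill: float = 1.0
) -> tuple[int, int, int, int]:
    """Return (left, top, width, height) for placing a gen_w x gen_h image in box.

    box is (x, y, w, h), the rectangle format returned by detectMultiScale.
    fill (hypothetical knob) is the fraction of the box the design may occupy.
    """
    x, y, w, h = box
    if gen_w <= 0 or gen_h <= 0 or w <= 0 or h <= 0:
        raise ValueError("image and box sizes must be positive")
    if not 0.0 < fill <= 1.0:
        raise ValueError("fill must be in (0, 1]")
    # Largest uniform scale that keeps both sides within the allowed area.
    scale = min(w * fill / gen_w, h * fill / gen_h)
    new_w = max(1, int(gen_w * scale))
    new_h = max(1, int(gen_h * scale))
    # Centre the scaled image inside the detected box.
    left = x + (w - new_w) // 2
    top = y + (h - new_h) // 2
    return left, top, new_w, new_h


print(fit_into_box(512, 512, (100, 50, 200, 300)))  # (100, 100, 200, 200)
print(fit_into_box(400, 200, (0, 0, 100, 100)))     # (0, 25, 100, 50)
```

The returned rectangle could then drive the actual compositing, for example `cv2.resize(generated, (new_w, new_h))` (OpenCV's `dsize` is width first) followed by a NumPy slice assignment such as `frame[top:top + new_h, left:left + new_w] = resized`. A realistic overlay would likely also blend or mask transparent regions, which this sketch leaves out.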
Country: India
IRJIET, Volume 8, Issue 9, September 2024 pp. 112-118