Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Vol 9 No 5 (2025): Volume 9, Issue 5, May 2025 | Pages: 175-180
International Research Journal of Innovations in Engineering and Technology
OPEN ACCESS | Research Article | Published Date: 20-05-2025
Accurate, real-time polyp segmentation is critical for early colorectal cancer detection in computer-aided diagnosis systems. We propose a novel deep learning segmentation model that combines transformer-based global feature extraction with multiscale contextual refinement. The encoder is the Pyramid Vision Transformer V2 (PVTv2-B1), which extracts hierarchical feature maps at four scales with 64, 128, 320, and 512 channels, respectively. These multi-resolution features capture the global context needed to segment polyps of varying sizes and shapes. At the core of the model, a dilated bottleneck block enlarges the receptive field without reducing spatial resolution. It comprises four parallel dilated convolutional branches with dilation rates of 1, 3, 5, and 7, followed by a channel fusion block that uses a 1×1 convolution to aggregate contextual information. This module lets the network learn the robust multiscale features that accurate segmentation requires. The decoder consists of three hierarchical blocks, each composed of a transposed convolution layer for upsampling, concatenation with the corresponding encoder skip connection, and a double-convolution refinement block. These stages progressively restore spatial resolution and refine boundary details. The final segmentation mask is produced by bilinear upsampling followed by a 1×1 convolution. Evaluated on standard polyp segmentation datasets, the model achieves an IoU of 0.8395, a Dice score of 0.9029, a recall of 0.9217, a precision of 0.9072, and a Hausdorff distance of 2.8736, indicating precise boundary prediction. The model also runs at 47 FPS, making it suitable for real-time clinical applications.
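The overlap metrics reported above (IoU, Dice, precision, recall) can all be derived from the pixel-wise counts of true positives, false positives, and false negatives. The following is a minimal sketch using only the Python standard library, not the authors' evaluation code. The function names, the toy masks, and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration. The Hausdorff distance is not covered here.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives
    over two binary masks given as equally sized nested lists."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def safe_div(numerator, denominator):
    """Return 0.0 for an empty denominator (an assumed convention)."""
    return numerator / denominator if denominator else 0.0


def overlap_metrics(pred, target):
    """Compute IoU, Dice, precision, and recall for one pair of masks."""
    tp, fp, fn = confusion_counts(pred, target)
    return {
        "iou": safe_div(tp, tp + fp + fn),
        "dice": safe_div(2 * tp, 2 * tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
    }


# Hypothetical 2x3 masks: 1 marks a polyp pixel, 0 marks background.
predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
ground_truth_mask = [[1, 0, 0],
                     [0, 1, 1]]

metrics = overlap_metrics(predicted_mask, ground_truth_mask)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Dataset-level scores are commonly obtained by averaging such per-image values, although the averaging scheme used in the paper is not stated in the abstract.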
This combination of transformer-based encoding, dilated context aggregation, and U-Net-inspired decoding demonstrates a powerful architecture for accurate and efficient medical image segmentation.
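To illustrate why the dilated bottleneck can enlarge the receptive field while preserving spatial resolution, the sketch below works through the standard convolution arithmetic for the four dilation rates named in the abstract. The 3×3 kernel size, the stride of 1, and the 11×11 feature-map size are assumptions chosen for illustration; the abstract does not state them.

```python
KERNEL_SIZE = 3                 # assumed; not stated in the abstract
DILATION_RATES = (1, 3, 5, 7)   # taken from the abstract
FEATURE_MAP_SIZE = 11           # hypothetical bottleneck height/width


def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def same_padding(kernel_size, dilation):
    """Padding that keeps the output size equal to the input size
    for a stride-1 convolution with an odd kernel size."""
    return dilation * (kernel_size - 1) // 2


def conv_output_size(input_size, kernel_size, dilation, padding, stride=1):
    """Standard output-size formula for a 2D convolution along one axis."""
    return (input_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


for rate in DILATION_RATES:
    pad = same_padding(KERNEL_SIZE, rate)
    extent = effective_kernel_size(KERNEL_SIZE, rate)
    out = conv_output_size(FEATURE_MAP_SIZE, KERNEL_SIZE, rate, pad)
    print(f"dilation={rate}: covers {extent}x{extent}, padding={pad}, output={out}x{out}")
```

Under these assumptions every branch returns a feature map of the same size as its input, so the four branch outputs can be concatenated along the channel axis and fused by the 1×1 convolution.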
Computer-aided diagnosis, out-of-distribution, polyp segmentation, dilated convolutions, Pyramid Vision Transformer
Laxmi Jha, & Prakash Chandra Prasad. (2025). Transformer Based Architecture for Out of Distribution in Polyp Segmentation. International Research Journal of Innovations in Engineering and Technology - IRJIET, 9(5), 175-180. Article DOI https://doi.org/10.47001/IRJIET/2025.905022
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.