Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Accurate and real-time polyp segmentation is critical for early colorectal cancer
detection in computer-aided diagnosis systems. We propose a novel deep
learning-based segmentation model that integrates the strengths of
transformer-based global feature extraction and multiscale contextual
refinement. The architecture uses the Pyramid Vision Transformer v2
(PVTv2-B1) as the encoder, which extracts hierarchical feature maps at four
scales with 64, 128, 320, and 512 channels, respectively. These multi-resolution
features effectively capture global contextual representations essential for
segmenting polyps with varying sizes and shapes. At the core of the model lies
a dilated bottleneck block that enhances the receptive field without reducing
spatial resolution. It comprises four parallel dilated convolutional branches
with dilation rates of 1, 3, 5, and 7, followed by a channel fusion block that uses a
1×1 convolution to aggregate contextual information. This module enables the
network to learn robust multiscale features crucial for accurate segmentation.
The decoder consists of three hierarchical decoder blocks, each composed of a
transposed convolution layer for upsampling, followed by concatenation with the
corresponding encoder skip connection and a double convolutional refinement
block. These decoder stages progressively reconstruct the spatial resolution
and refine boundary details. The final output is generated through bilinear
upsampling and a 1×1 convolution to produce the segmentation mask. Evaluated on
standard polyp segmentation datasets, the model achieves an IoU of 0.8395, a
Dice score of 0.9029, a recall of 0.9217, a precision of 0.9072, and a low
Hausdorff distance of 2.8736, indicating precise boundary prediction. The model
also runs at 47 FPS, which makes it suitable for real-time clinical
applications. This combination of transformer-based encoding, dilated context
aggregation, and U-Net-inspired decoding yields an accurate and efficient
architecture for medical image segmentation.
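
The data flow described above can be traced as a short, framework-free sketch. It is not the authors' implementation. The channel widths and dilation rates come from the abstract, and the stage strides (4, 8, 16, 32) are the standard PVTv2 ones. Everything else is an assumption made for illustration: a 352×352 input, 3×3 dilated kernels with padding equal to the dilation rate, transposed convolutions that double the resolution, decoder widths that mirror the encoder skips, and a final ×4 bilinear upsampling. The function name `trace_shapes` is hypothetical.

```python
# Shape trace for the encoder / dilated bottleneck / decoder pipeline.
# Pure Python; no deep learning framework is required.

ENCODER_CHANNELS = (64, 128, 320, 512)  # from the abstract (PVTv2-B1)
ENCODER_STRIDES = (4, 8, 16, 32)        # standard PVTv2 stage strides
DILATION_RATES = (1, 3, 5, 7)           # from the abstract


def conv_output_size(size, kernel=3, dilation=1, padding=0, stride=1):
    """Standard convolution output-size formula for one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def effective_kernel(kernel, dilation):
    """Spatial extent covered by a dilated kernel."""
    return dilation * (kernel - 1) + 1


def trace_shapes(input_size=352, kernel=3):
    """Return (encoder, branches, decoder, final_size) for a square input.

    Assumption: input_size is divisible by 32, the deepest encoder stride.
    """
    # Encoder: one (channels, spatial size) pair per stage.
    encoder = [(c, input_size // s)
               for c, s in zip(ENCODER_CHANNELS, ENCODER_STRIDES)]
    _, size = encoder[-1]

    # Bottleneck: with padding == dilation (assumed), every branch keeps
    # the spatial size while its effective kernel grows with the rate.
    branches = [
        (d, effective_kernel(kernel, d),
         conv_output_size(size, kernel, dilation=d, padding=d))
        for d in DILATION_RATES
    ]

    # Decoder: three blocks, each doubling the resolution (assumed) and
    # concatenating with the encoder skip of matching size.
    decoder = []
    for skip_channels, skip_size in reversed(encoder[:-1]):
        size *= 2
        assert size == skip_size, "skip connection must match spatially"
        decoder.append((skip_channels, size))

    # Final bilinear upsampling back to the input resolution (assumed x4).
    final_size = size * ENCODER_STRIDES[0]
    return encoder, branches, decoder, final_size


if __name__ == "__main__":
    encoder, branches, decoder, final_size = trace_shapes(352)
    print("encoder (channels, size):", encoder)
    print("bottleneck (rate, effective kernel, output size):", branches)
    print("decoder (channels, size):", decoder)
    print("mask size:", final_size)
```

Under these assumptions the four bottleneck branches cover 3, 7, 11, and 15 pixels while leaving the feature map size unchanged, which is the sense in which dilation enlarges the receptive field without reducing spatial resolution.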
Country: Nepal
IRJIET, Volume 9, Issue 5, May 2025 pp. 175-180