Transformer Based Architecture for Out of Distribution in Polyp Segmentation

Abstract

Accurate, real-time polyp segmentation is critical for early colorectal cancer detection in computer-aided diagnosis systems. We propose a deep learning segmentation model that combines transformer-based global feature extraction with multiscale contextual refinement. The encoder is the Pyramid Vision Transformer V2 (PVTv2-B1), which extracts hierarchical feature maps at four scales with 64, 128, 320, and 512 channels, respectively. These multi-resolution features capture the global context needed to segment polyps of varying sizes and shapes. At the core of the model is a dilated bottleneck block that enlarges the receptive field without reducing spatial resolution. It comprises four parallel dilated convolutional branches with dilation rates of 1, 3, 5, and 7, followed by a channel fusion block that aggregates the contextual information with a 1×1 convolution. This module enables the network to learn robust multiscale features. The decoder consists of three hierarchical decoder blocks. Each block upsamples with a transpose convolution, concatenates the result with the corresponding encoder skip connection, and refines it with a double convolutional block. These stages progressively restore spatial resolution and refine boundary details. The final segmentation mask is produced by bilinear upsampling followed by a 1×1 convolution. Evaluated on standard polyp segmentation datasets, the model achieves an IoU of 0.8395, a Dice score of 0.9029, a recall of 0.9217, a precision of 0.9072, and a Hausdorff distance of 2.8736, the last indicating precise boundary prediction. The model also runs at 47 FPS, which makes it suitable for real-time clinical applications.
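The encoder, bottleneck, and decoder layout above can be checked with simple size arithmetic. The Python sketch below is not the authors' implementation. It only traces feature-map sizes and channel counts through the described stages. It assumes the following, none of which is stated in the abstract: a 352×352 input, PVTv2 stage strides of 4, 8, 16, and 32, 3×3 dilated kernels with padding equal to the dilation rate, 2×2 stride-2 transpose convolutions, and hypothetical channel widths for the bottleneck branches and decoder blocks. The helper names `conv_out`, `conv_transpose_out`, `effective_kernel`, and `trace` are illustrative.

```python
# Size arithmetic for the encoder, dilated bottleneck, and decoder described above.
# ASSUMPTIONS (not stated in the abstract): 352x352 input, PVTv2 stage strides of
# 4/8/16/32, 3x3 dilated kernels with padding = dilation, 2x2 stride-2 transpose
# convolutions, and hypothetical channel widths for bottleneck and decoder layers.


def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a standard 2D convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of a standard 2D transpose convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + (kernel - 1) + output_padding + 1


def effective_kernel(kernel, dilation):
    """Input pixels spanned per side by a dilated kernel: k + (k - 1)(d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


def trace(input_size=352, rates=(1, 3, 5, 7)):
    """Return (step name, channels, spatial size) for each stage of the sketch."""
    channels = [64, 128, 320, 512]   # PVTv2-B1 stage widths, from the abstract
    strides = [4, 8, 16, 32]         # assumed encoder stage strides
    skips = [(c, input_size // s) for c, s in zip(channels, strides)]
    c, h = skips[-1]                 # assumed: bottleneck operates on the deepest stage
    steps = [("encoder stage 4", c, h)]

    # Dilated bottleneck: with padding = dilation, every 3x3 branch keeps H x W.
    for d in rates:
        assert conv_out(h, 3, padding=d, dilation=d) == h
    steps.append(("bottleneck concat", len(rates) * c, h))  # hypothetical: each branch keeps c channels
    steps.append(("bottleneck 1x1 fusion", c, h))           # hypothetical: fuse back to c channels

    # Three decoder blocks: x2 transpose conv, concat with skip, double conv.
    for skip_c, skip_h in reversed(skips[:-1]):
        h = conv_transpose_out(h, kernel=2, stride=2)
        assert h == skip_h           # upsampled map must align with the skip connection
        steps.append((f"decoder concat (skip {skip_c})", 2 * skip_c, h))  # hypothetical: upconv outputs skip_c
        steps.append(("decoder double conv", skip_c, h))
        c = skip_c

    # Bilinear upsampling to the input size, then a 1x1 conv to a single mask channel.
    steps.append(("bilinear upsample", c, input_size))
    steps.append(("1x1 conv -> mask", 1, input_size))
    return steps


if __name__ == "__main__":
    for d in (1, 3, 5, 7):
        k = effective_kernel(3, d)
        print(f"dilation {d}: effective kernel {k}x{k}")
    for name, ch, size in trace():
        print(f"{name:28s} {ch:5d} ch  {size}x{size}")
```

Under these assumptions, dilation rates of 1, 3, 5, and 7 give effective 3×3, 7×7, 11×11, and 15×15 kernels at unchanged resolution. Three ×2 upsampling stages (11 → 22 → 44 → 88) followed by a ×4 bilinear step return the bottleneck map to the 352×352 input size.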
Combining transformer-based encoding, dilated context aggregation, and U-Net-inspired decoding thus yields an accurate and efficient architecture for medical image segmentation.
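The reported metrics can be illustrated on binary masks. This pure-Python sketch uses the common per-image definitions. The abstract does not say how its scores were computed or averaged, so treat these details as assumptions: the Hausdorff distance is measured in pixels between full foreground point sets (benchmarks often use boundary pixels or a 95th-percentile variant instead), and a small `eps` guards against division by zero. The function names are hypothetical.

```python
# Per-image segmentation metrics on binary masks given as lists of 0/1 rows.
# ASSUMPTIONS: common per-image definitions; Hausdorff distance in pixels between
# full foreground point sets; eps avoids division by zero on empty masks.
import math


def confusion(pred, gt):
    """Count true positives, false positives, and false negatives pixel-wise."""
    tp = fp = fn = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
    return tp, fp, fn


def overlap_metrics(pred, gt, eps=1e-7):
    """IoU, Dice, precision, and recall for one predicted mask."""
    tp, fp, fn = confusion(pred, gt)
    return {
        "iou": tp / (tp + fp + fn + eps),
        "dice": 2 * tp / (2 * tp + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall": tp / (tp + fn + eps),
    }


def foreground(mask):
    """(row, col) coordinates of all foreground pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def hausdorff(pred, gt):
    """Symmetric Hausdorff distance in pixels (brute force, fine for small masks)."""
    a, b = foreground(pred), foreground(gt)
    if not a or not b:
        return math.inf  # undefined for empty masks; benchmarks handle this differently

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


if __name__ == "__main__":
    gt = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
    pred = [[0, 0, 0, 0],   # prediction spills one column to the right
            [0, 1, 1, 1],
            [0, 1, 1, 1],
            [0, 0, 0, 0]]
    print(overlap_metrics(pred, gt))  # IoU ~0.667, Dice ~0.8, recall ~1.0
    print(hausdorff(pred, gt))        # 1.0
```

For a single mask, Dice = 2·IoU/(1 + IoU). The toy example satisfies this identity (IoU 4/6, Dice 0.8). Scores averaged over many images, however, need not satisfy it exactly.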

Country: Nepal

Laxmi Jha¹, Prakash Chandra Prasad²

  1. Software Engineer, Nepal Water Supply Corporation, Tripureshwor, Nepal
  2. Assistant Professor, Department of Computer & Electronics Engineering, Pulchowk Campus, Nepal

IRJIET, Volume 9, Issue 5, May 2025, pp. 175–180

doi.org/10.47001/IRJIET/2025.905022
