Abstract
Unlike existing fully supervised approaches, we rethink colorectal polyp segmentation from an out-of-distribution perspective with a simple but effective self-supervised learning approach. We leverage the ability of masked autoencoders (self-supervised vision transformers trained on a reconstruction task) to learn in-distribution representations, here the distribution of healthy colon images. We then perform out-of-distribution reconstruction and inference, with feature space standardisation to align the latent distribution of the diverse abnormal samples with the statistics of the healthy samples. We generate per-pixel anomaly scores for each image by calculating the difference between the input and reconstructed images, and use this signal for out-of-distribution (i.e., polyp) segmentation. Experimental results on six benchmarks show that our model has excellent segmentation performance and generalises across datasets. Our code is publicly available at https://github.com/GewelsJI/Polyp-OOD.
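The inference steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`standardise_latents`, `anomaly_map`, `segment`), the per-channel form of the standardisation, the mean absolute difference used as the anomaly score, and the fixed threshold are all assumptions made for illustration, and the masked autoencoder itself is left out.

```python
import numpy as np


def standardise_latents(latents, healthy_mean, healthy_std, eps=1e-6):
    """Align latent statistics with those of healthy samples (assumed form).

    latents: (num_tokens, channels) array from a hypothetical MAE encoder.
    healthy_mean, healthy_std: per-channel statistics, shape (channels,).
    """
    # Normalise each channel of the test latents to zero mean, unit variance.
    normalised = (latents - latents.mean(axis=0)) / (latents.std(axis=0) + eps)
    # Rescale and shift so the channels carry the healthy statistics.
    return normalised * healthy_std + healthy_mean


def anomaly_map(image, reconstruction):
    """Per-pixel anomaly score: mean absolute difference over colour channels.

    image, reconstruction: (H, W, C) float arrays on the same intensity scale.
    """
    return np.abs(image - reconstruction).mean(axis=-1)


def segment(scores, threshold=0.5):
    """Binary mask of pixels whose anomaly score exceeds the threshold.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    return scores > threshold
```

In this sketch, a region that the model reconstructs poorly (for example, a polyp that a model trained only on healthy images cannot reproduce) receives a high score in `anomaly_map` and is kept by `segment`.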
Acknowledgements
The authors would like to thank the anonymous reviewers and editors for their helpful comments on this manuscript. Open Access funding enabled and organized by CAUL and its Member Institutions.
Ethics declarations
The authors declare that they have no conflicts of interest related to this work.
Additional information
Colored figures are available in the online version at https://link.springer.com/journal/11633
Ge-Peng Ji received the M. Sc. degree in communication and information systems at the School of Computer Science, Wuhan University, China in 2021. Currently, he is a Ph.D. degree candidate at the College of Engineering and Computer Science, Australian National University, Australia, supervised by Prof. Nick Barnes. He received the MICCAI Student Travel Award in 2021.
His research interest is computer vision, especially in a variety of dense prediction tasks, such as video segmentation, medical image segmentation, camouflaged object segmentation, and saliency detection.
Jing Zhang received the B. Sc. degree in electronic engineering and the M. Sc. degree in signal and information processing from Department of Electronics and Information and Engineering in Northwestern Polytechnical University, China in 2007 and 2010, respectively, and the Ph.D. degree in computer vision from Australian National University, Australia in 2022. Currently, she is a lecturer with the College of Engineering and Computer Science, Australian National University, Australia.
Her research interest is the general area of saliency detection, particularly weakly supervised and unsupervised saliency detection, generative model-based probabilistic saliency detection, confidence estimation, and model calibration estimation.
Dylan Campbell received the B. Eng. degree in mechatronic engineering (Hons) from the University of New South Wales, Australia in 2012, and the Ph.D. degree in geometric vision problems from Australian National University, Australia in 2018, under the supervision of Lars Petersson, Laurent Kneip and Hongdong Li, supported by Data61/CSIRO. Currently, he is a lecturer in the School of Computing at the Australian National University, Australia. Previously, he was a research fellow of the Visual Geometry Group at the University of Oxford, where he was supervised by Andrea Vedaldi and João Henriques. Before that, he was a research fellow of the Australian Centre for Robotic Vision and ANU, where he was supervised by Stephen Gould.
His research interests include computer vision, optimisation, machine learning, and robotics, with particular expertise in 3D vision and optimisation for deep learning. He has investigated problems of geometric sensor data alignment (including camera localisation, simultaneous localisation and mapping, structure from motion, and optical flow), 3D representations (including neural radiance fields), and differentiable optimisation layers (inserting constrained optimisation problems into deep learning systems). Current topics of interest include discovering and exploiting symmetries in data to share information across long-range physically-motivated connections, and optimisation in deep learning for training neural networks efficiently with respect to time and the quantity of data.
Huan Xiong received the B.Sc. and M. Sc. degrees in mathematics from the School of Mathematical Sciences, Peking University, China in 2010 and 2013, respectively, and the Ph.D. degree in mathematics from the Institute of Mathematics, University of Zürich, Switzerland in 2016. Currently, he is an assistant professor with Mohamed bin Zayed University of Artificial Intelligence, UAE.
His research interests include machine learning and discrete mathematics.
Nick Barnes received the B.Sc. (Hons) and Ph.D. degrees in engineering and robotic vision from the University of Melbourne, Australia in 1994 and 1999, respectively. Currently, he is a professor with the School of Computing, Australian National University (ANU), Australia. He was a visiting researcher with the LIRA Lab at the University of Genoa, Italy in 1999, and a tenured lecturer with the University of Melbourne, Australia until 2003. He was then with NICTA, an ICT Centre of Excellence, from 2003 to 2016, where he was a senior principal researcher and executive leader of the Computer Vision Research Group. He was with CSIRO from 2016 to 2019, where he led the Computer Vision Research Group. He has received best paper awards and nominations, including from CVPR, Robotics and Systems Science, IROS, the MICCAI Workshop on Computer Assisted Endoscopy, and DICTA. He has multiple patents in the area of vision processing for prosthetic vision, which contributed to the creation of the company Bionic Vision Technologies.
His research interests include weakly supervised dense prediction, 3D vision, and computer vision for prosthetic vision.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Ji, GP., Zhang, J., Campbell, D. et al. Rethinking Polyp Segmentation From An Out-of-distribution Perspective. Mach. Intell. Res. 21, 631–639 (2024). https://doi.org/10.1007/s11633-023-1472-2
DOI: https://doi.org/10.1007/s11633-023-1472-2