Article Data Paper

A multi-centre polyp detection and segmentation dataset for generalisability assessment

Journal

SCIENTIFIC DATA
Volume 10, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41597-023-01981-y

Polyps in the colon are precursors to colon cancer and can be identified through colonoscopy. Automatic detection and segmentation methods have been developed, but they have not been rigorously tested on a large multi-centre dataset. To address this, the authors curated a dataset from six centres, covering more than 300 patients, with precisely annotated polyp labels. The dataset, called PolypGen, is to the authors' knowledge the most comprehensive polyp detection and segmentation dataset, and the paper gives insight into its construction and validation.
Polyps in the colon are widely known cancer precursors identified by colonoscopy. Whilst most polyps are benign, their number, size and surface structure are linked to the risk of colon cancer. Several methods have been developed to automate polyp detection and segmentation. However, the main issue is that they have not been tested rigorously on a large, purpose-built multicentre dataset, one reason being the lack of a comprehensive public dataset. As a result, the developed methods may not generalise to different population datasets. To this end, we have curated a dataset from six unique centres incorporating more than 300 patients. The dataset includes both single-frame and sequence data, with 3762 annotated polyp labels and precise delineation of polyp boundaries verified by six senior gastroenterologists. To our knowledge, this is the most comprehensive detection and pixel-level segmentation dataset (referred to as PolypGen) curated by a team of computational scientists and expert gastroenterologists. The paper provides insight into data construction and annotation strategies, quality assurance, and technical validation.
