A comparison of rule-based and centroid single-sample multiclass predictors for transcriptomic classification

BIOINFORMATICS (2022)

Journal

BIOINFORMATICS

Volume 38, Issue 4, Pages 1022-1029

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btab763

Funding

Swedish Cancer Society [CAN 2020/709, 2019/51]
Lund Medical Faculty (ALF)
Skane University Hospital Research Funds
Skane County Council's Research and Development Foundation [REGSKANE-821461]
Cancer Research Fund at Malmo General Hospital
Mrs. Berta Kamprad's Cancer Foundation [FBKS-2019-35]

Automated Summary

In this study, we evaluated the behavior of gene-pair-based single sample predictors (SSPs) in gene expression-based multiclass prediction tasks. We found that these methods showed excellent performance, with high accuracy and informative prediction scores, in many expression-based classification tasks. However, the compatibility of gene-pair-based predictors with new datasets still needs to be verified.

Abstract

Motivation: Gene expression-based multiclass prediction, such as tumor subtyping, is a non-trivial bioinformatic problem. Most classifier methods operate by comparing expression levels relative to other samples. Methods that base predictions on the expression pattern within a sample have been proposed as an alternative. As these methods are invariant to the cohort composition and can be applied to a sample in isolation, they can collectively be termed single sample predictors (SSPs). Such predictors could potentially be used for preprocessing-free classification of new samples and be built to function across different expression platforms where proper batch and dataset normalization is challenging. Here, we evaluate the behavior of several multiclass SSPs based on binary gene-pair rules (k-Top Scoring Pairs, Absolute Intrinsic Molecular Subtyping and a new Random Forest approach) and compare them to centroids built with centered or raw expression values, with the criteria that an optimal predictor should have high accuracy, overcome differences in tumor purity, be robust across expression platforms and provide an informative prediction output score.

Results: We found that gene-pair-based SSPs showed excellent performance on many expression-based classification tasks. The three methods differed in prediction score output, handling of tied scores and behavior in low purity samples. The k-Top Scoring Pairs and Random Forest approach both achieved high classification accuracy while providing an informative prediction score. Although gene-pair-based SSPs have been touted as being cross-platform compatible (through training on mixed platform data), out-of-the-box compatibility with a new dataset remains a potential issue that warrants cohort-to-cohort verification.
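To illustrate why binary gene-pair rules can be applied to a sample in isolation, the following is a minimal sketch and not the implementation evaluated in the paper. The gene names, the rule list and the function `predict_single_sample` are hypothetical, and the example is reduced to two classes with simple majority voting. Each rule only asks whether one gene is expressed higher than another within the same sample, so any strictly increasing transformation of that sample's values leaves the prediction unchanged.

```python
import math

# Hypothetical rules: (gene_a, gene_b, class if a > b, class otherwise).
# In practice such rules would be learned from training data.
RULES = [
    ("GENE1", "GENE2", "A", "B"),
    ("GENE3", "GENE4", "A", "B"),
    ("GENE5", "GENE6", "B", "A"),
]


def predict_single_sample(expr, rules=RULES):
    """Classify one sample by majority vote over within-sample gene-pair rules.

    Returns the winning class and the fraction of rules voting for it,
    which serves as a simple prediction score.
    """
    votes = {}
    for gene_a, gene_b, cls_if_greater, cls_otherwise in rules:
        # Only the relative order of the two genes in this sample matters.
        winner = cls_if_greater if expr[gene_a] > expr[gene_b] else cls_otherwise
        votes[winner] = votes.get(winner, 0) + 1
    label = max(votes, key=votes.get)
    return label, votes[label] / len(rules)


# Made-up expression values for a single sample.
sample = {"GENE1": 8.0, "GENE2": 3.0, "GENE3": 5.0,
          "GENE4": 6.0, "GENE5": 1.0, "GENE6": 4.0}

# A strictly increasing transformation (here log2(x + 1)) preserves every
# pairwise ordering, so the rules give the same answer on both versions.
log_sample = {gene: math.log2(value + 1) for gene, value in sample.items()}

print(predict_single_sample(sample))
print(predict_single_sample(log_sample))
```

An odd number of rules avoids ties in this two-class toy case. With more classes, ties between vote counts become possible, which is one of the behaviors the abstract notes as differing between methods.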
