4.6 Article

MSIFinder: a python package for detecting MSI status using random forest classifier

Journal

BMC BIOINFORMATICS
Volume 22, Issue 1, Pages -

Publisher

BMC
DOI: 10.1186/s12859-021-03986-z

Keywords

Microsatellite instability; Genome sequencing; Machine learning technology; Random forest classifier; Immunotherapy

Ask authors/readers for more resources

MSIFinder, a Python package using random forest classifier (RFC)-based genome sequencing, is a robust and effective tool for MSI classification, achieving high sensitivity and specificity in detecting MSI, regardless of sequencing depth and panel size influences.
Background Microsatellite instability (MSI) is a common genomic alteration in colorectal cancer, endometrial carcinoma, and other solid tumors. MSI is characterized by a high degree of polymorphism in microsatellite lengths owing to the deficiency in the mismatch repair system. Based on the degree, MSI can be classified as microsatellite instability-high (MSI-H) and microsatellite stable (MSS). MSI is a predictive biomarker for immunotherapy efficacy in advanced/metastatic solid tumors, especially in colorectal cancer patients. Several computational approaches based on target panel sequencing data have been used to detect MSI; however, they are considerably affected by the sequencing depth and panel size. Results We developed MSIFinder, a python package for automatic MSI classification, using random forest classifier (RFC)-based genome sequencing, which is a machine learning technology. We included 19 MSI-H and 25 MSS samples as training sets. First, we selected 54 feature markers from the training sets, built an RFC model, and validated the classifier using a test set comprising 21 MSI-H and 379 MSS samples. With this test set, MSIFinder achieved a sensitivity (recall) of 1.0, a specificity of 0.997, an accuracy of 0.998, a positive predictive value of 0.954, an F1 score of 0.977, and an area under the curve of 0.999. To further verify the robustness and effectiveness of the model, we used a prospective cohort consisting of 18 MSI-H samples and 122 MSS samples. MSIFinder achieved a sensitivity (recall) of 1.0 and a specificity of 1.0. We discovered that MSIFinder is less affected by a low sequencing depth and can achieve a concordance of 0.993 while exhibiting a sequencing depth of 100x. Furthermore, we realized that MSIFinder is less affected by the panel size and can achieve a concordance of 0.99 when the panel size is 0.5 M (million bases). Conclusion These results indicate that MSIFinder is a robust and effective MSI classification tool that can provide reliable MSI detection for scientific and clinical purposes.

Authors

I am an author on this paper
Click your name to claim this paper and add it to your profile.

Reviews

Primary Rating

4.6
Not enough ratings

Secondary Ratings

Novelty
-
Significance
-
Scientific rigor
-
Rate this paper

Recommended

No Data Available
No Data Available