Article

Updated benchmarking of variant effect predictors using deep mutational scanning

Journal

MOLECULAR SYSTEMS BIOLOGY
Volume 19, Issue 8

Publisher

WILEY
DOI: 10.15252/msb.202211474

Keywords

Benchmark; Circularity; DMS; MAVE; VEP


This study evaluates 55 variant effect predictors (VEPs) against independently generated protein function measurements from deep mutational scanning (DMS) experiments on 26 human proteins, an approach that minimizes data circularity. The top-performing VEPs are mostly unsupervised methods, including EVE, DeepSequence, and the protein language model ESM-1v. However, recent supervised VEPs such as VARITY also perform strongly, which suggests that developers are taking data circularity and bias seriously. Results for discriminating known pathogenic from putatively benign missense variants with DMS data and unsupervised VEPs are mixed: some DMS datasets perform exceptionally well, while others perform poorly. Notably, VEP agreement with DMS data correlates strongly with the ability to identify clinically relevant variants, supporting the validity of the rankings and the utility of DMS for independent benchmarking.
The assessment of variant effect predictor (VEP) performance is fraught with biases introduced by benchmarking against clinical observations. In this study, building on our previous work, we use independently generated measurements of protein function from deep mutational scanning (DMS) experiments for 26 human proteins to benchmark 55 different VEPs, while introducing minimal data circularity. Many top-performing VEPs are unsupervised methods including EVE, DeepSequence and ESM-1v, a protein language model that ranked first overall. However, the strong performance of recent supervised VEPs, in particular VARITY, shows that developers are taking data circularity and bias issues seriously. We also assess the performance of DMS and unsupervised VEPs for discriminating between known pathogenic and putatively benign missense variants. Our findings are mixed, demonstrating that some DMS datasets perform exceptionally at variant classification, while others are poor. Notably, we observe a striking correlation between VEP agreement with DMS data and performance in identifying clinically relevant variants, strongly supporting the validity of our rankings and the utility of DMS for independent benchmarking.
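As an illustration of what "VEP agreement with DMS data" could mean in practice, the following is a minimal sketch that scores one predictor against one DMS dataset with a rank correlation. The abstract does not state which agreement metric was used, so the choice of Spearman correlation, the use of its absolute value (predictors differ in whether high scores mean damaging or tolerated), and all names and variant labels below are assumptions made for this example only.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def vep_dms_agreement(dms_scores, vep_scores):
    """Absolute Spearman correlation over variants scored by both sources.

    Both arguments are hypothetical dicts mapping a variant label to a score.
    """
    shared = sorted(set(dms_scores) & set(vep_scores))
    if len(shared) < 3:
        raise ValueError("too few shared variants to correlate")
    return abs(spearman([dms_scores[v] for v in shared],
                        [vep_scores[v] for v in shared]))


# Made-up scores for illustration only; not data from the study.
dms = {"A1V": 0.9, "A1G": 0.2, "K2R": 0.7, "K2E": 0.1}
vep = {"A1V": 0.1, "A1G": 0.8, "K2R": 0.3, "K2E": 0.95, "L3P": 0.99}
agreement = vep_dms_agreement(dms, vep)
```

Restricting the comparison to variants present in both sources keeps the score from depending on how many variants a given predictor happens to cover.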
