Article

Outlier analyses of the Protein Data Bank archive using a probability-density-ranking approach

SCIENTIFIC DATA (2018)

Journal

SCIENTIFIC DATA

Volume 5, Issue -, Pages -

Publisher

NATURE PUBLISHING GROUP

DOI: 10.1038/sdata.2018.293

Category

Multidisciplinary Sciences

Funding

National Science Foundation
National Institute of General Medical Sciences
National Cancer Institute
Department of Energy [NSF-DBI 1338415]

Abstract

Outlier analyses are central to scientific data assessments. Conventional outlier identification methods do not work effectively for Protein Data Bank (PDB) data, which are characterized by heavy skewness and the presence of bounds and/or long tails. We have developed a data-driven nonparametric method to identify outliers in PDB data based on kernel probability density estimation. Unlike conventional outlier analyses based on location and scale, Probability Density Ranking can be used for robust assessments of distance from other observations. Analyzing PDB data from the vantage points of probability and frequency enables proper outlier identification, which is important for quality control during deposition-validation-biocuration of new three-dimensional structure data. Ranking of Probability Density also permits use of Most Probable Range as a robust measure of data dispersion that is more compact than Interquartile Range. The Probability-Density-Ranking approach can be employed to analyze outliers and data-spread on any large data set with continuous distribution.
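The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: estimate a kernel density, rank each observation by the density at its own value, flag the lowest-ranked observations as outlier candidates, and report the range spanned by the most probable half of the data. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian kernel, the Silverman rule-of-thumb bandwidth, the 5% outlier fraction, the 50% coverage used for the Most Probable Range, and all function names.

```python
import math
import random
import statistics


def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth (an assumed choice, not the paper's)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    if spread <= 0:
        raise ValueError("data has no spread; cannot choose a bandwidth")
    return 0.9 * spread * n ** (-0.2)


def kde_at_points(data, bandwidth):
    """Gaussian kernel density estimate evaluated at every observation."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - y) / bandwidth) ** 2) for y in data)
        for x in data
    ]


def density_ranks(data, bandwidth=None):
    """Rank observations by estimated density, scaled to (0, 1].

    The lowest-density observation gets 1/n and the highest gets 1.
    Ties are broken by input order.
    """
    h = bandwidth if bandwidth is not None else silverman_bandwidth(data)
    dens = kde_at_points(data, h)
    order = sorted(range(len(data)), key=lambda i: dens[i])
    ranks = [0.0] * len(data)
    for position, i in enumerate(order, start=1):
        ranks[i] = position / len(data)
    return ranks


def flag_outliers(data, ranks, fraction=0.05):
    """Observations whose density rank falls in the lowest `fraction`."""
    return [x for x, r in zip(data, ranks) if r <= fraction]


def most_probable_range(data, ranks, coverage=0.5):
    """Min and max of the `coverage` share of data with the highest density.

    Assumes a roughly unimodal sample, so that share forms one interval.
    """
    kept = [x for x, r in zip(data, ranks) if r > 1.0 - coverage]
    return min(kept), max(kept)


if __name__ == "__main__":
    # Hypothetical skewed, lower-bounded sample standing in for a PDB quantity.
    rng = random.Random(0)
    sample = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]
    ranks = density_ranks(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)
    print("outlier candidates:", sorted(flag_outliers(sample, ranks)))
    print("most probable range:", most_probable_range(sample, ranks))
    print("interquartile range:", (q1, q3))
```

Because the ranking depends on local density rather than on distance from a mean or median, a value in a long tail is flagged only when few other observations lie near it. The pairwise density evaluation here is quadratic in the sample size, so a larger data set would need a binned or tree-based estimator.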
