Article

FILER: a framework for harmonizing and querying large-scale functional genomics knowledge

Journal

NAR GENOMICS AND BIOINFORMATICS
Volume 4, Issue 1

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nargab/lqab123

Funding

  1. National Institute on Aging [U24-AG041689, U54-AG052427, U01-AG032984]
  2. Biomarkers Across Neurodegenerative Diseases (BAND 3) [18062]
  3. Michael J Fox Foundation
  4. Alzheimer's Association
  5. Alzheimer's Research UK
  6. Weston Brain Institute

Querying and summarizing large-scale functional genomic and annotation data collections is a crucial step in genetic analysis. However, the heterogeneity and breadth of data sources and formats make this process difficult. FILER is a framework that provides streamlined access to harmonized genomic datasets, a scalable querying interface, and the ability to analyze users' experimental data. The resource is highly scalable and facilitates reproducible research.
Querying massive functional genomic and annotation data collections, and linking and summarizing the query results across data sources and data types, are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge that couples a large, curated, integrated catalog of harmonized functional genomic and annotation data with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) the ability to analyze and annotate users' experimental data. This rich resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying 7 × 10^9 hg19 FILER records shows that FILER is highly scalable, with a sub-linear 32-fold increase in querying time when the number of queries increases 1000-fold, from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating and querying large-scale genomic data within analyses and workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).
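To make the sub-linear scaling claim concrete, the short sketch below works through the arithmetic for the two figures quoted in the abstract (a 1000-fold increase in queries and a 32-fold increase in querying time). It assumes, purely for illustration, that querying time follows a power law in the number of query intervals; the abstract does not state this model, and the function name `scaling_exponent` is hypothetical rather than part of FILER.

```python
import math


def scaling_exponent(query_fold: float, time_fold: float) -> float:
    """Return k such that time_fold == query_fold ** k.

    Assumption (not from the paper): time grows as n ** k,
    where n is the number of query intervals.
    """
    return math.log(time_fold) / math.log(query_fold)


# Figures quoted in the abstract: 1000x more queries, 32x more time.
k = scaling_exponent(query_fold=1000, time_fold=32)

# Average time per query shrinks by this factor at the larger batch size.
per_query_speedup = 1000 / 32

print(f"implied exponent k = {k:.3f}")
print(f"per-query time reduction = {per_query_speedup}x")
```

Under that assumption the implied exponent is about 0.5, well below the value of 1 that linear scaling would give, and the average cost per query falls roughly 31-fold when moving from 1000 to 1 000 000 intervals.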
