Article

IPDS: A semantic mediator-based system using Spark for the integration of heterogeneous proteomics data sources

Journal

Publisher

WILEY
DOI: 10.1002/cpe.5814

Keywords

Apache Spark; mediator; ontology; proteomics; semantic data integration

The article surveys biomedical data integration systems and presents IPDS, a mediator-based semantic integration system built on Apache Spark. It illustrates the system with a proteomics case study that links UniProt, STRING, PDB, and PubMed.
With data volumes rising steadily in many disciplines, new Big Data management systems have emerged to provide scalable tools for efficient data integration, processing, and analysis. In this article, we provide an overview of biomedical data integration systems, focusing on ontology-based semantic systems and on systems built on Big Data technologies such as Apache Spark. We also propose a new semantic data integration system, the Integrated Proteomics Data System (IPDS), which uses a mediator approach. IPDS gives users a unified interface for query processing and data exploration. The system uses the Apache Spark framework to perform the query transformation and execution needed to query the integrated data sources. We develop a domain ontology that lets users formulate their queries in terms defined in the ontology. IPDS is a case study of semantic proteomics data integration linking four data sources: UniProt (protein annotation), STRING (protein-protein interaction), PDB (protein structure), and PubMed (biomedical citations).
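
To make the mediator idea concrete, the following is a minimal sketch in plain Python, not the IPDS implementation. It assumes a hypothetical mapping from ontology terms to per-source wrapper functions, and it uses in-memory dictionaries with placeholder identifiers in place of the real sources. The term names (`Protein.name`, `Protein.interactsWith`, and so on) and the `mediate` function are invented for illustration. The paper performs query transformation and execution with Apache Spark, which this sketch deliberately leaves out.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the four sources. All identifiers and values
# are placeholders chosen for illustration, not real records.
UNIPROT = {"ACC1": {"name": "protein A"}, "ACC2": {"name": "protein B"}}
STRING = {"ACC1": ["ACC2"], "ACC2": ["ACC1"]}
PDB = {"ACC1": ["STRUCT1"]}
PUBMED = {"ACC1": ["CIT1", "CIT2"]}

# Assumed mapping: each ontology term resolves to a wrapper that knows
# which source to ask and how to read its answer.
ONTOLOGY_MAPPING: Dict[str, Callable[[str], object]] = {
    "Protein.name": lambda acc: UNIPROT.get(acc, {}).get("name"),
    "Protein.interactsWith": lambda acc: STRING.get(acc, []),
    "Protein.hasStructure": lambda acc: PDB.get(acc, []),
    "Protein.citedIn": lambda acc: PUBMED.get(acc, []),
}


def mediate(accession: str, terms: List[str]) -> Dict[str, object]:
    """Answer a query phrased in ontology terms for one protein accession.

    The mediator rewrites each requested term into a call on the matching
    source wrapper and merges the partial answers into a single result.
    """
    unknown = [t for t in terms if t not in ONTOLOGY_MAPPING]
    if unknown:
        raise KeyError(f"terms not defined in the ontology: {unknown}")
    return {term: ONTOLOGY_MAPPING[term](accession) for term in terms}


result = mediate("ACC1", ["Protein.name", "Protein.hasStructure"])
print(result)
```

The user only names ontology terms and never addresses a source directly, which is the property the unified interface relies on. In a Spark-based setting, each wrapper would plausibly return a distributed dataset and the merge step would become a join on the protein accession, but that is an assumption rather than a description of IPDS.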
