Article

Spark-Based Parallel Deep Neural Network Model for Classification of Large Scale RNAs into piRNAs and Non-piRNAs

Journal

IEEE Access
Volume 8, Pages 136978-136991

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/ACCESS.2020.3011508

Keywords

Computational modeling; RNA; Biological system modeling; Data models; Mathematical model; Feature extraction; Sparks; Deep neural network; spark; big data; piRNA; classification algorithm; artificial intelligence

With recent advances in computational biology, high-throughput next-generation sequencing has become the de facto standard technology for gene expression studies involving DNA, RNA, and proteins. As a promising technology, it has a significant impact on medical science and genomic research. However, a single run generates several million short DNA and RNA sequences, amounting to several petabytes of data. In addition, raw sequencing datasets such as RNA data are growing exponentially, creating a big data analytics problem in computational biology. Because of this explosive growth of RNA sequences, the timely classification of RNA sequences into piRNAs and non-piRNAs has become a challenge for traditional technologies and conventional machine learning algorithms. Parallel and distributed computing models, combined with deep neural networks, have become a major computing platform for big data analytics and are now required in computational biology. This paper presents a computational model based on a parallel deep neural network that takes advantage of a parallel and distributed computing platform to classify large numbers of RNA sequences into piRNAs and non-piRNAs in a timely manner. The performance of the proposed model was extensively evaluated using two sets of performance metrics. The first set consisted of accuracy-based metrics: accuracy, specificity, sensitivity, and the Matthews correlation coefficient (MCC). The second set consisted of computation-based metrics: computation time, speedup, and scalability. The model was evaluated first on a real benchmark dataset and subsequently on a replicated benchmark dataset. In both cases, the evaluation results showed that the proposed model improved computational speedup by an order of magnitude compared with the sequential approach, without affecting accuracy.
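The two-fold evaluation described in the abstract rests on standard formulas. The sketch below computes them from binary confusion-matrix counts and wall-clock times. It is a minimal illustration, not the authors' code: treating piRNA as the positive class is an assumption, the names `ConfusionCounts`, `accuracy_metrics`, `speedup`, and `efficiency` are hypothetical, and parallel efficiency (speedup divided by worker count) is only one common way to summarize scalability. The abstract does not say how scalability was quantified.

```python
import math
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Binary confusion-matrix counts; piRNA is assumed to be the positive class."""
    tp: int  # piRNAs predicted as piRNA
    tn: int  # non-piRNAs predicted as non-piRNA
    fp: int  # non-piRNAs predicted as piRNA
    fn: int  # piRNAs predicted as non-piRNA


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero (e.g. no negatives).
    return num / den if den else 0.0


def accuracy_metrics(c: ConfusionCounts) -> dict:
    """Accuracy-based metrics: accuracy, sensitivity, specificity and MCC."""
    total = c.tp + c.tn + c.fp + c.fn
    denom = math.sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true negative rate
        "mcc": _ratio(c.tp * c.tn - c.fp * c.fn, denom),
    }


def speedup(t_sequential: float, t_parallel: float) -> float:
    """Speedup S = T_sequential / T_parallel."""
    return t_sequential / t_parallel


def efficiency(t_sequential: float, t_parallel: float, workers: int) -> float:
    """Parallel efficiency E = S / p; values near 1.0 indicate near-linear scaling."""
    return speedup(t_sequential, t_parallel) / workers


if __name__ == "__main__":
    # Illustrative numbers only; these are NOT results reported in the paper.
    m = accuracy_metrics(ConfusionCounts(tp=90, tn=85, fp=15, fn=10))
    print({k: round(v, 4) for k, v in m.items()})
    print(speedup(1200.0, 200.0), efficiency(1200.0, 200.0, workers=8))
```

MCC is included because, unlike plain accuracy, it uses all four confusion-matrix cells and stays informative when the piRNA and non-piRNA classes are imbalanced. Measuring speedup and efficiency at several worker counts gives a simple scalability curve.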
