Article

Enhancing genomic mutation data storage optimization based on the compression of asymmetry of sparsity

Related references

Note: Only part of the references are listed.
Article Multidisciplinary Sciences

Integrating multi-type aberrations from DNA and RNA through dynamic mapping gene space for subtype-specific breast cancer driver discovery

Jianing Xi et al.

Summary: Driver event discovery is important for breast cancer diagnosis and therapy, especially in determining subtype-specific drivers for personalized biomarker discovery and precision treatment. However, most existing studies focus on DNA aberrations and gene interactions, and integrating multi-type aberrations from both DNA and RNA remains a challenge for breast cancer driver discovery.

PEERJ (2023)

Article Biology

An omics-to-omics joint knowledge association subtensor model for radiogenomics cross-modal modules from genomics and ultrasonic images of breast cancers

Jianing Xi et al.

Summary: Radiogenomics analysis can infer the genomic features of tumors from their radiogenomic associations through low-cost and non-invasive screening ultrasonic images, providing connections between genomics and radiomics. Existing studies mainly focus on the relationship between ultrasonic features and popular cancer genes, but overlook the many-to-many relationships and sample associations with tumor heterogeneity. To address these challenges, we propose an omics-to-omics joint knowledge association subtensor model that discovers cross-modal modules and identifies sample subgroups. Experimental results demonstrate the jointness of discovered modules, their association with tumorigenesis contribution, and their relation to cancer-related functions. In conclusion, our proposed model can effectively facilitate radiogenomic knowledge associations and promote the construction of explainable AI cancer diagnosis.

COMPUTERS IN BIOLOGY AND MEDICINE (2023)

Article Computer Science, Artificial Intelligence

Sparse Tensor-Based Multiscale Representation for Point Cloud Geometry Compression

Jianqiang Wang et al.

Summary: This study develops a unified Point Cloud Geometry (PCG) compression method called SparsePCGC, a low-complexity solution that performs convolutions only on sparsely distributed Most-Probable Positively-Occupied Voxels (MP-POV). The method uses a multiscale representation to compress scale-wise MP-POVs by exploiting cross-scale and same-scale correlations. It also introduces a Sparse Convolution-based Neural Network (SparseCNN), a SparseCNN-based Occupancy Probability Approximation (SOPA) model, and a SparseCNN-based Local Neighborhood Embedding (SLNE) to improve compression performance.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2023)

Article Genetics & Heredity

Consistent count region-copy number variation (CCR-CNV): an expandable and robust tool for clinical diagnosis of copy number variation at the exon level using next-generation sequencing data

Man Jin Kim et al.

Summary: This study developed a novel diagnostic tool, CCR-CNV, for identifying exonic copy number variations (CNVs) with high confidence. Compared to other well-known CNV tools, CCR-CNV showed higher sensitivity and specificity. However, its low region coverage, low positive predictive value, and high false discovery rate limit its use in clinical settings. Combining CCR-CNV with existing CNV tools improves its performance.

GENETICS IN MEDICINE (2022)

Article Computer Science, Hardware & Architecture

An effective SPMV based on block strategy and hybrid compression on GPU

Huanyu Cui et al.

Summary: This study proposes PBC, a new matrix compression algorithm that computes SPMV efficiently by taking load balancing into account and preprocessing the original matrix in the CSR and COO formats. Experimental results show that PBC is significantly more efficient than the algorithms it is compared against, with a noticeable speedup.

JOURNAL OF SUPERCOMPUTING (2022)
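For readers unfamiliar with the CSR format mentioned in the summary above, the following is a minimal sketch of plain CSR storage and a sequential sparse matrix-vector product. It is not the PBC algorithm and does not model GPU execution or load balancing; the function names `dense_to_csr` and `csr_spmv` are hypothetical and chosen only for illustration.

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix (list of lists) to CSR arrays.

    Returns (values, col_idx, row_ptr), where row r owns the entries
    values[row_ptr[r]:row_ptr[r + 1]].
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # After each row, record where the next row's entries start.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        # Only the stored nonzeros of row r contribute to y[r].
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Illustrative data only: a 3x3 matrix with one empty row.
A = [[1, 0, 2],
     [0, 0, 0],
     [0, 3, 0]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_spmv(values, col_idx, row_ptr, [1, 2, 3])
```

Rows with very different nonzero counts do different amounts of work in the inner loop, which is the kind of imbalance that load-balancing schemes for parallel SPMV aim to reduce.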

Article Genetics & Heredity

JAX-CNV: A Whole-genome Sequencing-based Algorithm for Copy Number Detection at Clinical Grade Level

Wan-Ping Lee et al.

Summary: This study aimed to develop a CNV calling algorithm based on whole-genome sequencing that could replace the use of chromosomal microarray assay (CMA) in clinical diagnosis. The algorithm, called JAX-CNV, demonstrated excellent performance, with a false discovery rate of 4% and the ability to detect CNVs even at low coverage.

GENOMICS PROTEOMICS & BIOINFORMATICS (2022)

Article Agriculture, Dairy & Animal Science

CNV detection and their association with growth, efficiency and carcass traits in Santa Ines sheep

Giovanni Coelho Ladeira et al.

Summary: This study investigated the association between CNV and growth, efficiency, and carcass traits in Santa Ines sheep. Significant CNV segments were found to be associated with carcass yield and residual feed intake. The study provides valuable information about CNV in sheep and its impact on productive traits.

JOURNAL OF ANIMAL BREEDING AND GENETICS (2022)

Article Biochemical Research Methods

ACO:lossless quality score compression based on adaptive coding order

Yi Niu et al.

Summary: With the rapid development of high-throughput sequencing technology, the cost of whole genome sequencing has dropped rapidly, contributing to the exponential growth of genome data. This paper proposes a novel lossless quality score compressor based on adaptive coding order (ACO) to address the challenge of efficiently compressing quality score data. ACO achieves state-of-the-art compression performance with moderate complexity for next-generation sequencing (NGS) data.

BMC BIOINFORMATICS (2022)

Article Biochemical Research Methods

CMIC: an efficient quality score compressor with random access functionality

Hansen Chen et al.

Summary: CMIC is an adaptive and random access supported compressor for lossless compression of quality score sequences. Our experiments show that our compressor has good performance in terms of compression rates on all the tested datasets. The file sizes can be reduced by up to 21.91% compared with LCQS. In terms of compression speed, CMIC is better than all other compressors on most of the tested cases. In terms of random access speed, CMIC is faster than LCQS, which provides a random access function for compressed quality scores.

BMC BIOINFORMATICS (2022)

Article Biochemical Research Methods

SparkGC: Spark based genome compression for large collections of genomes

Haichang Yao et al.

Summary: Since the completion of the Human Genome Project, the volume of sequencing data has increased significantly, making the data difficult to store and process. This study proposes a new genome compression method called SparkGC, which uses Apache Spark and in-memory computation to efficiently compress and handle large genomic datasets.

BMC BIOINFORMATICS (2022)

Article Computer Science, Software Engineering

An efficient sparse stiffness matrix vector multiplication using compressed sparse row storage format on AMD GPU

Longyue Xing et al.

Summary: The performance of sparse stiffness matrix-vector multiplication is crucial for large-scale structural mechanics numerical simulation. This article introduces a new CSR-vector row algorithm that achieves fine-grained computing optimization for sparse stiffness matrices on AMD GPUs, demonstrating efficient reduce operations and deep memory access optimization, resulting in improved computing performance.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2022)

Article Genetics & Heredity

DETexT: An SNV detection enhancement for low read depth by integrating mutational signatures into TextCNN

Tian Zheng

Summary: Detecting SNV at low read depths is crucial for reducing sequencing requirements and costs, and aiding in early cancer screening, diagnosis, and treatment. In this study, we present DETexT, an SNV detection method specifically designed for low read depths. By combining mutational signature with deep learning algorithms, DETexT is able to classify false positive variants by mining correlation information around bases in individual reads.

FRONTIERS IN GENETICS (2022)

Article Biology

Cancer classification based on multiple dimensions: SNV patterns

Bo Li et al.

Summary: This study proposes a method to classify cancer based on multidimensional SNV features. By analyzing SNVs in cancer samples, the extracted features exhibit similar distribution patterns in the cluster centers of each cancer type. The classification accuracy using the KNN algorithm reaches approximately 97%, with the potential for oncogene discovery. The validated oncogenes in the identified features have the highest importance among the 8 cancers.

COMPUTERS IN BIOLOGY AND MEDICINE (2022)

Article Biotechnology & Applied Microbiology

Comprehensive characterization of copy number variation (CNV) called from array, long- and short-read data

Ksenia Lavrichenko et al.

Summary: Different high-throughput genome technologies have varying abilities in detecting copy number variants (CNVs), with long-read platforms being able to detect CNVs in genomic regions inaccessible to arrays or short reads. The reproducibility of CNV detection within each technology is strongly linked to other CNV evidence measures. Additionally, the public database frequency profiles for CNVs vary depending on the technology the database was built on.

BMC GENOMICS (2021)

Article Genetics & Heredity

Improved SNV Discovery in Barcode-Stratified scRNA-seq Alignments

N. M. Prashant et al.

Summary: The study shows that variant calls from individual cell alignments can identify a higher number of SNVs, which are enriched in novel variants including stop-codon and missense substitutions.

GENES (2021)

Article Biochemical Research Methods

Fast numerical optimization for genome sequencing data in population biobanks

Ruilin Li et al.

Summary: This paper addresses the computational challenges posed by large-scale and high-dimensional genome sequencing data, and develops two efficient solvers for optimization problems in this context. Using a two-bit representation for genetic matrices reduces the memory requirement and improves computational speed. The proposed methods successfully solve Lasso, group Lasso, linear, logistic, and Cox regression problems on sparse genetic matrices within 10 minutes using less than 32 GB of memory.

BIOINFORMATICS (2021)
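The two-bit representation mentioned in the summary above can be illustrated with a minimal sketch. It assumes each genotype is coded as 0, 1, or 2 (count of the alternate allele) with 3 reserved for a missing value, and it packs four genotypes into each byte. The coding scheme, bit order, and function names are assumptions made for illustration, not the layout used by the cited paper.

```python
def pack_genotypes(genotypes):
    """Pack genotype codes (0, 1, 2, or 3 for missing) four per byte."""
    packed = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype code out of range: {g}")
        # Genotype i occupies bits 2*(i % 4) and 2*(i % 4) + 1 of byte i // 4.
        packed[i // 4] |= g << (2 * (i % 4))
    return bytes(packed)


def unpack_genotypes(packed, n):
    """Recover the first n genotype codes from a packed byte string."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


# Illustrative data only: six genotypes fit in two bytes instead of six.
genotypes = [0, 1, 2, 3, 2, 0]
packed = pack_genotypes(genotypes)
restored = unpack_genotypes(packed, len(genotypes))
```

Compared with storing one genotype per byte, this layout uses a quarter of the memory; the number of genotypes has to be stored separately because the last byte may be only partly filled.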

Article Biotechnology & Applied Microbiology

Exploiting genomic synteny in Felidae: cross-species genome alignments and SNV discovery can aid conservation management

Georgina Samaha et al.

Summary: By using cross-species genome alignment methods, the study successfully identified a large number of variants in cheetah, snow leopard, and Sumatran tiger relative to the domestic cat reference assembly. These variants provided insights into population structure, adaptive traits, evolutionary history, and pathogenesis of heritable diseases. The high degree of synteny among felid genomes allowed for reliable SNV detection and highlighted the potential for improving conservation outcomes.

BMC GENOMICS (2021)

Article Biotechnology & Applied Microbiology

CNproScan: Hybrid CNV detection for bacterial genomes

Robin Jugas et al.

Summary: CNV detection has received less attention in bacteria than in eukaryotes, but interest is increasing because of the challenges posed by bacterial drug resistance. CNproScan is a CNV detection method for bacterial genomes that can detect shorter events and classify them, showing improvements over existing methods.

GENOMICS (2021)

Article Genetics & Heredity

CNV-MEANN: A Neural Network and Mind Evolutionary Algorithm-Based Detection of Copy Number Variations From Next-Generation Sequencing Data

Tihao Huang et al.

Summary: This research proposes CNV-MEANN, an improved CNV detection method. It adjusts the neural network structure, adds mapping quality as a new feature, considers the impact of CNV loss categories on disease prediction, and optimizes the neural network model with a mind evolutionary algorithm, improving on the performance of existing CNV detection methods.

FRONTIERS IN GENETICS (2021)

Article Biochemical Research Methods

HetRCNA: A Novel Method to Identify Recurrent Copy Number Alternations from Heterogeneous Tumor Samples Based on Matrix Decomposition Framework

Jianing Xi et al.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2020)

Article Biochemistry & Molecular Biology

The International Genome Sample Resource (IGSR) collection of open human genomic variation resources

Susan Fairley et al.

NUCLEIC ACIDS RESEARCH (2020)

Article Multidisciplinary Sciences

Pan-cancer analysis of whole genomes

Peter J. Campbell et al.

NATURE (2020)

Article Biochemical Research Methods

LCQS: an efficient lossless compression tool of quality scores with random access functionality

Jiabing Fu et al.

BMC BIOINFORMATICS (2020)

Review Cardiac & Cardiovascular Systems

Prognostic implications of programmed death ligand 1 expression in resected lung adenocarcinoma: a systematic review and meta-analysis

Donglai Chen et al.

EUROPEAN JOURNAL OF CARDIO-THORACIC SURGERY (2020)

Article Medicine, Research & Experimental

A literature-based approach for curating gene signatures in multifaceted diseases

Mathieu Garand et al.

JOURNAL OF TRANSLATIONAL MEDICINE (2020)

Article Computer Science, Hardware & Architecture

Balancing Computation Loads and Optimizing Input Vector Loading in LSTM Accelerators

Junki Park et al.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2020)

Article Physics, Multidisciplinary

Bayesian Compressive Sensing of Sparse Signals with Unknown Clustering Patterns

Mohammad Shekaramiz et al.

ENTROPY (2019)

Review Genetics & Heredity

Substitutions Are Boring: Some Arguments about Parallel Mutations and High Mutation Rates

Maximilian Oliver Press et al.

TRENDS IN GENETICS (2019)

Article Computer Science, Theory & Methods

Huffman Coding

Alistair Moffat

ACM COMPUTING SURVEYS (2019)

Article Biotechnology & Applied Microbiology

Evaluating the quality of the 1000 genomes project data

Saurabh Belsare et al.

BMC GENOMICS (2019)

Article Biochemistry & Molecular Biology

SNV identification from single-cell RNA sequencing data

Patricia M. Schnepp et al.

HUMAN MOLECULAR GENETICS (2019)

Article Genetics & Heredity

Human mitochondrial genome compression using machine learning techniques

Rongjie Wang et al.

HUMAN GENOMICS (2019)

Article Computer Science, Software Engineering

An efficient SIMD compression format for sparse matrix-vector multiplication

Xinhai Chen et al.

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2018)

Article Multidisciplinary Sciences

Earth BioGenome Project: Sequencing life for the future of life

Harris A. Lewin et al.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2018)

Article Biochemical Research Methods

Discovering Recurrent Copy Number Aberrations in Complex Patterns via Non-Negative Sparse Singular Value Decomposition

Jianing Xi et al.

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2016)

Editorial Material Multidisciplinary Sciences

The Genome Project-Write

Jef D. Boeke et al.

SCIENCE (2016)

Review Genetics & Heredity

Sequencing Structural Variants in Cancer for Precision Therapeutics

Geoff Macintyre et al.

TRENDS IN GENETICS (2016)

Article Biochemical Research Methods

QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles

Koen van der Borght et al.

BMC BIOINFORMATICS (2015)

Article Mathematics, Applied

COMPRESSED MULTIROW STORAGE FORMAT FOR SPARSE MATRICES ON GRAPHICS PROCESSING UNITS

Zbigniew Koza et al.

SIAM JOURNAL ON SCIENTIFIC COMPUTING (2014)

Article Multidisciplinary Sciences

A public resource facilitating clinical use of genomes

Madeleine P. Ball et al.

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA (2012)

Review Medicine, Research & Experimental

Structural Variation in the Human Genome and its Role in Disease

Pawel Stankiewicz et al.

ANNUAL REVIEW OF MEDICINE (2010)

Review Biochemical Research Methods

Computational methods for discovering structural variation with next-generation sequencing

Paul Medvedev et al.

NATURE METHODS (2009)

Article Engineering, Biomedical

An ECG signals compression method and its validation using NNs

Catalina Monica Fira et al.

IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING (2008)

Article Biotechnology & Applied Microbiology

Further understanding human disease genes by comparing with housekeeping genes and other genes

ZD Tu et al.

BMC GENOMICS (2006)