期刊
BIOINFORMATICS
卷 26, 期 17, 页码 2192-2194出版社
OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btq346
关键词
-
Large volumes of data generated by high-throughput sequencing instruments present non-trivial challenges in data storage, content access and transfer. We present G-SQZ, a Huffman coding-based sequencing-reads-specific representation scheme that compresses data without altering the relative order. G-SQZ has achieved from 65% to 81% compression on benchmark datasets, and it allows selective access without scanning and decoding from start. This article focuses on describing the underlying encoding scheme and its software implementation, and a more theoretical problem of optimal compression is out of scope. The immediate practical benefits include reduced infrastructure and informatics costs in managing and analyzing large sequencing data.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据