Article

CEPZ: A Novel Predictor for Identification of DNase I Hypersensitive Sites

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TCBB.2021.3053661

Keywords

DNA; Feature extraction; Proteins; Genomics; Encoding; Strain; DNase I hypersensitive sites; XGBoost; dinucleotide; human genome

Funding

  1. National Natural Science Foundation of China [61772362, 61902271, 61972280]
  2. National Key R&D Program of China [2020YFA0908400]

The study addresses the identification of DNase I hypersensitive sites (DHSs) with computational techniques based on composition information and physicochemical properties. The resulting predictor, CEPZ, selects features with a boosting algorithm and reports a Matthews correlation coefficient of 0.7740 and an accuracy of 0.9113, which suggests it could be a useful tool for future DHS research.
DNase I hypersensitive sites (DHSs) have proven to be tightly associated with cis-regulatory elements and commonly indicate specific functions of the chromatin structure. Identifying DHSs therefore plays a fundamental role in decoding gene regulatory behavior. While traditional experimental methods turn out to be time-consuming and expensive, computational techniques promise to be practical for discovering and analyzing regulatory factors. In this study, we applied an efficient model that considers composition information and physicochemical properties and selects features effectively with a boosting algorithm. CEPZ, our predictor, greatly improved performance, reaching a Matthews correlation coefficient of 0.7740 and an accuracy of 0.9113, and is more competitive than any previous predictor. This result suggests that it may become a useful tool for DHS research in the human genome and other complex genomes. Our research was anchored in the properties of dinucleotides, and we identified several dinucleotides whose distributions differ significantly between DHS and non-DHS samples; these are likely to have a special meaning in the chromatin structure. The datasets, feature sets and the relevant algorithm are available at https://github.com/YanZheng-16/CEPZ_DHS/.
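
To make the two quantities named in the abstract concrete, the following is a minimal sketch of how dinucleotide composition features and the reported metrics (accuracy and Matthews correlation coefficient) are commonly computed. It is not the authors' implementation, which is in the repository linked above. The function names `dinucleotide_composition` and `accuracy_and_mcc` are hypothetical, the label convention (1 = DHS, 0 = non-DHS) is an assumption, and the physicochemical features and boosting-based feature selection are not shown.

```python
from itertools import product
from math import sqrt

# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_composition(seq):
    """Return the 16 dinucleotide frequencies of seq, in DINUCLEOTIDES order.

    Hypothetical helper for illustration; overlapping windows of length 2
    are counted, and windows with ambiguous bases (e.g. N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DINUCLEOTIDES)
    return [counts[d] / total for d in DINUCLEOTIDES]


def accuracy_and_mcc(y_true, y_pred):
    """Compute accuracy and Matthews correlation coefficient.

    Assumes binary labels with 1 = DHS and 0 = non-DHS (an assumption made
    for this sketch). MCC is defined as 0.0 when its denominator is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    features = dinucleotide_composition("ACGTACGTNNACGT")
    print(dict(zip(DINUCLEOTIDES, features)))
    print(accuracy_and_mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

A feature vector of this kind could be passed, alongside other descriptors, to a gradient-boosting classifier; MCC is often reported next to accuracy because it stays informative when the DHS and non-DHS classes are imbalanced.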
