Article

CpG-island-based annotation and analysis of human housekeeping genes

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 1, Pages 515-525

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbz134

Keywords

CpG island density; housekeeping genes; genome annotation; statistical genetics; genome analysis

Funding

  1. National Natural Science Foundation of China [61372138]
  2. National Science and Technology Major Project [2018ZX10201002]

A review of previous studies on CpG-related genes suggests that approximately half of human genes, particularly housekeeping genes, are regulated by CpG islands (CGIs). However, the specific definition of CGIs, their positioning in gene structures, and the regulatory mechanisms associated with CGIs require further explanation. A combination of analysis methods shows that genes associated with high CpG density are more likely to be housekeeping genes, while the characteristics of genes with intermediate CpG density are less distinct.
By reviewing previous CpG-related studies, we propose that the transcriptional regulation of about half of human genes, mostly housekeeping (HK) genes, involves CpG islands (CGIs), their methylation states, CpG spacing and other chromosomal parameters. However, the precise definition of CGIs, their positioning within gene structures and the specific CGI-associated regulatory mechanisms all remain to be explained at the individual-gene and gene-family levels, with consideration of species and lineage specificity. Although previous studies have classified CGIs into high-CpG (HCGI), intermediate-CpG (ICGI) and low-CpG (LCGI) classes based on CpG density variation, the correlation between CGI density and the regulation of gene expression, such as the co-regulation of HK genes by CGIs and the TATA box, remains to be elucidated. First, this study introduces a problem-solving protocol for human-genome annotation that combines GTEx, JBLA and Gene Ontology (GO) analysis. Next, we discuss why CGI-associated genes are most likely regulated by HCGI and tend to be HK genes; the HCGI/TATA +/- and LCGI/TATA +/- combinations show different GO enrichment, whereas the ICGI/TATA +/- combination is less characteristic in GO enrichment analysis. Finally, we demonstrate that the Hadoop MapReduce-based MR-JBLA algorithm is more efficient than the original JBLA in k-mer counting and CGI-associated gene analysis.
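
The abstract names two computational ingredients, k-mer counting in a map/reduce style and CpG density, without showing them. The following is a minimal Python sketch of both ideas, not the authors' JBLA or MR-JBLA implementation, whose details are not given here. The function names (`map_kmers`, `reduce_counts`, `count_kmers`, `cpg_obs_exp`) are hypothetical, the map and reduce steps run in a single process rather than on Hadoop, and the density measure is assumed to be the commonly used observed/expected CpG ratio, (number of CpG x sequence length) / (number of C x number of G). No HCGI/ICGI/LCGI thresholds are encoded, since the abstract does not state them.

```python
from collections import Counter
from functools import reduce


def map_kmers(read, k):
    """Map step (illustrative): count every k-mer in one sequence."""
    read = read.upper()
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def reduce_counts(left, right):
    """Reduce step (illustrative): merge two partial k-mer count tables."""
    return left + right


def count_kmers(reads, k):
    """Run the map step on each read, then fold the partial counts together."""
    return reduce(reduce_counts, (map_kmers(r, k) for r in reads), Counter())


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G, an assumption made
    here to avoid division by zero.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


if __name__ == "__main__":
    counts = count_kmers(["ACGCG", "CGCG"], 2)
    print(counts.most_common())
    print(cpg_obs_exp("CGCG"))
```

Because each read is mapped independently and the merge is associative, the same two functions could in principle be distributed across workers, which is the property a MapReduce formulation relies on.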
