Junk DNA, black-box DNA, repeat elements, transposons, endogenous retroviruses and long non-coding RNAs are just some of the names for, and components of, the non-protein-coding DNA that makes up around 99% of the human genome. Aside from structural elements such as telomeres, many “junk DNA” elements, such as promoters, enhancers and various types of regulatory RNA, are involved in regulating gene expression. In statistical parlance, variants in these regions that are associated with expression levels are called expression quantitative trait loci (eQTLs).
Unsurprisingly, given the ratio of protein-coding to non-coding DNA, most single nucleotide polymorphisms (SNPs) are found in non-coding DNA, but it is often difficult to predict their mechanistic effect, if any.
Many promoters, enhancers and regulatory RNAs are modulated by epigenetic mechanisms such as DNA methylation or histone modification. The authors of a new preprint released on bioRxiv, with senior author Hongyu Zhao of Yale School of Public Health, Connecticut, reasoned that SNPs carrying epigenetic annotations are more likely to drive gene expression changes, and therefore more likely to produce an observed phenotype. They developed a new method for genome-wide association study (GWAS) data, called T-GEN (Transcriptome-mediated identification of disease-associated Genes with Epigenetic aNnotation), that uses tissue-specific epigenetic information to identify disease-associated genes.
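The idea that annotated SNPs deserve more weight can be illustrated with a toy calculation. The sketch below is not the preprint's model: it simply assumes a logistic link between a SNP's epigenetic annotations and its prior probability of affecting expression. The function name, the chosen marks, the coefficients and the intercept are all hypothetical values picked for illustration.

```python
import math

def prior_inclusion_probability(annotations, coefficients, intercept=-3.0):
    """Toy prior that a SNP affects expression, given 0/1 epigenetic annotations.

    Assumption: a logistic link; the intercept and coefficients are made up.
    """
    score = intercept + sum(c * a for c, a in zip(coefficients, annotations))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical annotation columns (e.g. H3K4me1, H3K27ac, DNase) and weights.
coefficients = [1.0, 1.5, 0.5]

unannotated = prior_inclusion_probability([0, 0, 0], coefficients)
annotated = prior_inclusion_probability([1, 1, 0], coefficients)

# A SNP overlapping active marks receives a higher prior than one with none.
print(f"unannotated: {unannotated:.3f}, annotated: {annotated:.3f}")
```

Under these made-up numbers, a SNP overlapping two active marks gets a noticeably higher prior than an unannotated one, which is the intuition behind prioritizing epigenetically active variants.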
Data from the Roadmap Epigenomics Project, which analyzed samples taken directly from human tissues and cells (embryonic and adult, diseased and healthy), and from the Genotype-Tissue Expression (GTEx) project eQTL database were incorporated directly into the T-GEN model. Tissue-specific gene expression imputation models were generated and then combined with GWAS summary statistics to identify gene-level associations with disease phenotypes.
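To make the last step more concrete, here is a minimal sketch of how SNP-level GWAS z-scores can be combined with imputation weights into a single gene-level z-score. It uses the generic summary-statistics formula common to transcriptome-wide association approaches, which is assumed here for illustration and may differ from what T-GEN actually implements. The function names and the toy inputs are hypothetical.

```python
import math

def gene_level_z(weights, snp_z, snp_cov):
    """Combine SNP z-scores into a gene-level z-score.

    weights : imputation weights of each SNP for one gene in one tissue
    snp_z   : GWAS z-scores for the same SNPs
    snp_cov : SNP genotype covariance matrix (from a reference panel)
    """
    n = len(weights)
    # Variance of the imputed expression: w^T * Cov * w
    var_g = sum(weights[i] * snp_cov[i][j] * weights[j]
                for i in range(n) for j in range(n))
    if var_g <= 0:
        raise ValueError("imputed expression has zero variance")
    # Each SNP contributes weight * SNP standard deviation * z-score.
    numerator = sum(weights[i] * math.sqrt(snp_cov[i][i]) * snp_z[i]
                    for i in range(n))
    return numerator / math.sqrt(var_g)

def two_sided_p(z):
    """Two-sided p-value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

# Toy example: two uncorrelated SNPs with equal weights (made-up numbers).
z = gene_level_z([1.0, 1.0], [3.0, 3.0], [[1.0, 0.0], [0.0, 1.0]])
print(f"gene z = {z:.3f}, p = {two_sided_p(z):.2e}")
```

Only summary statistics and a reference covariance matrix are needed, so no individual-level genotype data has to be shared for this step.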
The authors used T-GEN to analyze data from LD Hub, a centralised database of summary-level GWAS results. As a proof of concept, applying T-GEN to late-onset Alzheimer’s disease revealed 96 genes influenced by 15 loci, including two novel loci that influence COG4 and TMEM135. COG4 encodes a Golgi protein. TMEM135 encodes a transmembrane protein that regulates mitochondrial dynamics and cholesterol transport. Five loci were identified in brain tissues and four in whole blood.
The T-GEN method is limited by the fact that it only considers cis (nearby) gene regulatory elements. The model also cannot determine causal relationships. For example, changes in TMEM135 expression could be caused by Alzheimer’s, rather than driving the development of the disease.
“In this paper, we have introduced a new method called T-GEN, which leverages epigenetic signals to improve gene expression imputation and identify trait-associated genes. Different from previous methods, T-GEN uses data from GTEx and Roadmap Epigenomics Project to prioritize SNPs with active epigenetic annotations for gene expression imputation,” stated the authors.