Abstract
Background: Explaining the genetics of many diseases is challenging because most associations localize to incompletely understood regulatory regions.
Methods: Using novel computational methods, we show that transcription factors (TFs) occupy multiple loci of individual complex genetic disorders far more often than expected by chance.
Results: Applying these methods to 213 phenotypes and 1,544 TF binding datasets identifies 2,264 relationships between hundreds of TFs and 94 phenotypes, including AR in prostate cancer and GATA3 in breast cancer. Strikingly, nearly half of the systemic lupus erythematosus risk loci are occupied by the Epstein-Barr virus (EBV) Nuclear Antigen 2 (EBNA2) protein (OR = 6, P < 10^-24 after Bonferroni correction). EBNA2 co-clusters with a subset (<60) of human TFs, revealing a gene-environment interaction and identifying the EBV-transformed B cell as a putative site for some of the genetic mechanisms that alter disease risk. Analogous EBNA2-anchored associations exist in multiple sclerosis, rheumatoid arthritis, inflammatory bowel disease, type 1 diabetes, juvenile idiopathic arthritis, and celiac disease. Instances of allele-dependent DNA binding at plausibly causal variants, with downstream effects on gene expression, are consistent with EBNA2-dependent genetic mechanisms.
Conclusions: Our results nominate mechanisms that operate across risk loci within disease phenotypes. They suggest new paradigms for disease origin and strongly support a role for Epstein-Barr virus in the development of systemic lupus erythematosus, as well as of certain other autoimmune diseases that appear to be related to lupus through the genomic mechanisms that produce them.
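The abstract reports TF enrichment at risk loci as an odds ratio with a Bonferroni-corrected P value. To illustrate what those two numbers express, here is a minimal sketch that builds a 2x2 table of locus overlap counts (risk loci vs. background loci, overlapping a TF dataset or not), computes the odds ratio and a one-sided Fisher's exact test, and then applies a Bonferroni correction. This is not the authors' method, which the abstract does not detail. The use of Fisher's test, the background set, the function names, and the counts are all hypothetical assumptions, and the number of tests assumes every phenotype is tested against every TF dataset.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]] (hypothetical helper).

    Rows: disease risk loci vs. background (control) loci.
    Columns: overlaps a TF binding dataset vs. does not.
    """
    if b * c == 0:
        # Undefined without a continuity correction; report inf/nan explicitly.
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test, P(X >= a), for enrichment of overlap.

    X is hypergeometric: n = a + b risk loci are drawn from N loci in total,
    of which K = a + c overlap the TF dataset.
    """
    n_total = a + b + c + d
    k_overlap = a + c
    n_risk = a + b
    upper = min(k_overlap, n_risk)
    # Sum exact integer counts first, then divide once to limit rounding error.
    numerator = sum(
        math.comb(k_overlap, k) * math.comb(n_total - k_overlap, n_risk - k)
        for k in range(a, upper + 1)
    )
    return numerator / math.comb(n_total, n_risk)


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)


if __name__ == "__main__":
    # Hypothetical counts, NOT data from the study.
    risk_hit, risk_miss = 20, 20   # risk loci overlapping / not overlapping
    bg_hit, bg_miss = 100, 900     # background loci overlapping / not overlapping
    n_tests = 213 * 1544           # assumes all phenotype x dataset pairs are tested

    or_value = odds_ratio(risk_hit, risk_miss, bg_hit, bg_miss)  # 9.0
    p_raw = fisher_greater(risk_hit, risk_miss, bg_hit, bg_miss)
    print(f"OR = {or_value:.1f}, P = {p_raw:.2e}, "
          f"Bonferroni P = {bonferroni(p_raw, n_tests):.2e}")
```

Multiplying by the number of tests is the simplest family-wise correction. With roughly 329,000 phenotype-by-dataset pairs, it is deliberately conservative, so an association that survives it, as the EBNA2 result in lupus is reported to, is unlikely to be a multiple-testing artifact.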