With the increasing incidence of prostate cancer, identifying common genetic variants that confer risk of the disease is important. Here we report such a variant on chromosome 8q24, a region initially identified through a study of Icelandic families. Allele -8 of the microsatellite DG8S737 was associated with prostate cancer in three case-control series of European ancestry from Iceland, Sweden and the US. The estimated odds ratio (OR) of the allele is 1.62 (P = 2.7 × 10⁻¹¹). About 19% of affected men and 13% of the general population carry at least one copy, yielding a population attributable risk (PAR) of approximately 8%. The association was also replicated in an African American case-control group with a similar OR, in which 41% of affected individuals and 30% of the population are carriers. This leads to a greater estimated PAR (16%) that may contribute to higher incidence of prostate cancer in African American men than in men of European ancestry.
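As a rough check on the PAR figures quoted above, the following minimal sketch applies Levin's formula, PAR = p(RR - 1) / (1 + p(RR - 1)), to the carrier frequencies in the abstract. Two points are assumptions and not the authors' stated method: the allelic OR of 1.62 is treated as a carrier-level relative risk, and the same OR is reused for the African American group because the abstract only says the OR is "similar". The function name `levin_par` is illustrative.

```python
def levin_par(exposure_freq: float, relative_risk: float) -> float:
    """Levin's population attributable risk for a binary exposure."""
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Carrier frequencies in the general population, taken from the abstract.
# Assumption: the allelic OR (1.62) stands in for the carrier relative risk.
par_european = levin_par(0.13, 1.62)
# Assumption: the "similar OR" in the African American group is set to 1.62.
par_african_american = levin_par(0.30, 1.62)

print(f"European ancestry PAR: {par_european:.1%}")
print(f"African American PAR:  {par_african_american:.1%}")
```

Under these simplifications the two estimates land near 7.5% and 15.7%, in line with the rounded 8% and 16% reported in the abstract.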
Genome-wide association studies and candidate gene studies in ulcerative colitis have identified 18 susceptibility loci. We conducted a meta-analysis of six ulcerative colitis genome-wide association study datasets, comprising 6,687 cases and 19,718 controls, and followed up the top association signals in 9,628 cases and 12,917 controls. We identified 29 additional risk loci (P < 5 × 10⁻⁸), increasing the number of ulcerative colitis-associated loci to 47. After annotating associated regions using GRAIL, expression quantitative trait loci data and correlations with non-synonymous SNPs, we identified many candidate genes that provide potentially important insights into disease pathogenesis, including IL1R2, IL8RA-IL8RB, IL7R, IL12B, DAP, PRDM1, JAK2, IRF5, GNA12 and LSP1. The total number of confirmed inflammatory bowel disease risk loci is now 99, including a minimum of 28 shared association signals between Crohn's disease and ulcerative colitis.
We simultaneously investigated the genetic landscape of ankylosing spondylitis, Crohn's disease, psoriasis, primary sclerosing cholangitis and ulcerative colitis to investigate pleiotropy and the relationship between these clinically related diseases. Using high-density genotype data from more than 86,000 individuals of European ancestry, we identified 244 independent multidisease signals, including 27 new genome-wide significant susceptibility loci and 3 unreported shared risk loci. Complex pleiotropy was supported when contrasting multidisease signals with expression data sets from human, rat and mouse together with epigenetic and expressed enhancer profiles. The comorbidities among the five immune diseases were best explained by biological pleiotropy rather than heterogeneity (a subgroup of cases genetically identical to those with another disease, possibly owing to diagnostic misclassification, molecular subtypes or excessive comorbidity). In particular, the strong comorbidity between primary sclerosing cholangitis and inflammatory bowel disease is likely the result of a unique disease, which is genetically distinct from classical inflammatory bowel disease phenotypes.
We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10⁻⁸). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.
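The abstract above does not say how the six scans were combined. A common choice in GWAS meta-analysis is fixed-effect inverse-variance weighting, sketched below purely as an illustration. The per-study log odds ratios and standard errors are hypothetical values invented for this example, not data from the study, and `fixed_effect_meta` is an illustrative name.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors
    Returns (combined beta, combined SE, two-sided P from a normal test).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return beta, se, p


# Hypothetical per-study estimates for one SNP across six scans.
study_betas = [0.20, 0.18, 0.25, 0.15, 0.22, 0.19]
study_ses = [0.06, 0.07, 0.08, 0.05, 0.09, 0.06]

beta, se, p = fixed_effect_meta(study_betas, study_ses)
genome_wide_significant = p < 5e-8  # threshold quoted in the abstract
print(beta, se, p, genome_wide_significant)
```

Weighting each study by the inverse of its variance gives larger scans more influence and yields a combined standard error smaller than that of any single study, which is why pooled analyses can reach the 5 × 10⁻⁸ threshold when individual scans do not.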
Somatic mutations in noncoding sequences are poorly explored in cancer, a rare exception being the recent identification of activating mutations in TERT regulatory DNA. Although this finding is suggestive of a general mechanism for oncogene activation, this hypothesis remains untested. Here we map somatic mutations in 505 tumor genomes across 14 cancer types and systematically screen for associations between mutations in regulatory regions and RNA-level changes. We identify recurrent promoter mutations in several genes but find that TERT mutations are exceptional in showing a strong and genome-wide significant association with increased expression. Detailed analysis of TERT across cancers shows that the strength of this association is highly variable and is strongest in copy number stable cancers such as thyroid carcinoma. We additionally propose that TERT promoter mutations control expression of the nearby gene CLPTM1L. Our analysis provides a detailed pan-cancer view of TERT transcriptional activation but finds no clear evidence for frequent oncogenic promoter mutations beyond TERT.
Genome-wide association studies of the related chronic inflammatory bowel diseases (IBD) known as Crohn's disease and ulcerative colitis have shown strong evidence of association to the major histocompatibility complex (MHC). This region encodes a large number of immunological candidates, including the antigen-presenting classical human leukocyte antigen (HLA) molecules. Studies in IBD have indicated that multiple independent associations exist at HLA and non-HLA genes, but they have lacked the statistical power to define the architecture of association and causal alleles. To address this, we performed high-density SNP typing of the MHC in >32,000 individuals with IBD, implicating multiple HLA alleles, with a primary role for HLA-DRB1*01:03 in both Crohn's disease and ulcerative colitis. Noteworthy differences were observed between these diseases, including a predominant role for class II HLA variants and heterozygous advantage observed in ulcerative colitis, suggesting an important role of the adaptive immune response in the colonic environment in the pathogenesis of IBD.
Chlamydia trachomatis is responsible for both trachoma and sexually transmitted infections, causing substantial morbidity and economic cost globally. Despite this, our knowledge of its population and evolutionary genetics is limited. Here we present a detailed phylogeny based on whole-genome sequencing of representative strains of C. trachomatis from both trachoma and lymphogranuloma venereum (LGV) biovars from temporally and geographically diverse sources. Our analysis shows that predicting phylogenetic structure using ompA, which is traditionally used to classify Chlamydia, is misleading because extensive recombination in this region masks any true relationships present. We show that in many instances, ompA is a chimera that can be exchanged in part or as a whole both within and between biovars. We also provide evidence for exchange of, and recombination within, the cryptic plasmid, which is another key diagnostic target. We used our phylogenetic framework to show how genetic exchange has manifested itself in ocular, urogenital and LGV C. trachomatis strains, including the epidemic LGV serotype L2b.
Pancreatitis occurs in approximately 4% of patients treated with the thiopurines azathioprine or mercaptopurine. Its development is unpredictable and almost always leads to drug withdrawal. From 168 sites around the world, we identified patients with inflammatory bowel disease (IBD) who had developed pancreatitis within 3 months of starting these drugs. After detailed case adjudication, we performed a genome-wide association study on 172 cases and 2,035 controls with IBD. We identified strong evidence of association within the class II HLA region, with the most significant association identified at rs2647087 (odds ratio 2.59, 95% confidence interval 2.07-3.26, P = 2 × 10⁻¹⁶). We replicated these findings in an independent set of 78 cases and 472 controls with IBD matched for drug exposure. Fine mapping of the HLA region identified association with the HLA-DQA1*02:01-HLA-DRB1*07:01 haplotype. Patients heterozygous at rs2647087 have a 9% risk of developing pancreatitis after administration of a thiopurine, whereas homozygotes have a 17% risk.
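The odds ratio, confidence interval and P value reported for rs2647087 can be cross-checked against each other. The sketch below assumes a Wald-type interval that is symmetric on the log-odds scale, which the abstract does not state explicitly. It recovers the standard error from the interval width and derives a two-sided P value from it. The function name `p_from_or_ci` is illustrative.

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover SE, z and two-sided P from an OR and its 95% CI.

    Assumes the interval is exp(log(OR) +/- z_crit * SE).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return se, z, p


se, z, p = p_from_or_ci(2.59, 2.07, 3.26)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, P = {p:.1e}")
```

With the published numbers this gives z of about 8.2 and a P value of about 2 × 10⁻¹⁶, consistent with the figure in the abstract.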
Twin studies have provided the basis for genetic and epidemiological studies in human complex traits. As epigenetic factors can contribute to phenotypic outcomes, we conducted a DNA methylation analysis in white blood cells (WBC), buccal epithelial cells and gut biopsies of 114 monozygotic (MZ) twins as well as WBC and buccal epithelial cells of 80 dizygotic (DZ) twins using 12K CpG island microarrays. Here we provide the first annotation of epigenetic metastability of approximately 6,000 unique genomic regions in MZ twins. An intraclass correlation (ICC)-based comparison of matched MZ and DZ twins showed significantly higher epigenetic difference in buccal cells of DZ co-twins (P = 1.2 × 10⁻²⁹⁴). Although such higher epigenetic discordance in DZ twins can result from DNA sequence differences, our in silico SNP analyses and animal studies favor the hypothesis that it is due to epigenomic differences in the zygotes, suggesting that molecular mechanisms of heritability may not be limited to DNA sequence differences.
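To illustrate what an ICC-based comparison of co-twins measures, here is a minimal sketch of the one-way random-effects intraclass correlation, ICC(1,1), for twin pairs. The abstract does not specify which ICC variant was used, so this choice is an assumption. The methylation values are hypothetical numbers made up for the example, and `one_way_icc` is an illustrative name.

```python
def one_way_icc(pairs):
    """One-way random-effects ICC(1,1) for a list of (twin_a, twin_b) values."""
    n = len(pairs)
    k = 2  # two co-twins per pair
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    pair_means = [(a + b) / k for a, b in pairs]
    # Between-pair and within-pair mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in pair_means) / (n - 1)
    ms_within = sum(
        (a - m) ** 2 + (b - m) ** 2 for (a, b), m in zip(pairs, pair_means)
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical methylation levels at one locus for four twin pairs each.
mz_pairs = [(0.10, 0.12), (0.50, 0.48), (0.90, 0.88), (0.30, 0.33)]
dz_pairs = [(0.10, 0.30), (0.50, 0.35), (0.90, 0.60), (0.30, 0.45)]

print("MZ ICC:", one_way_icc(mz_pairs))
print("DZ ICC:", one_way_icc(dz_pairs))
```

An ICC near 1 means co-twins are almost identical relative to the spread between pairs. A lower ICC in DZ than in MZ pairs corresponds to the greater epigenetic difference between DZ co-twins described in the abstract.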
Bacterial speciation is a fundamental evolutionary process characterized by diverging genotypic and phenotypic properties. However, the selective forces that affect genetic adaptations and how they relate to the biological changes that underpin the formation of a new bacterial species remain poorly understood. Here, we show that the spore-forming, healthcare-associated enteropathogen Clostridium difficile is actively undergoing speciation. Through large-scale genomic analysis of 906 strains, we demonstrate that the ongoing speciation process is linked to positive selection on core genes in the newly forming species that are involved in sporulation and the metabolism of simple dietary sugars. Functional validation shows that the new C. difficile produces spores that are more resistant and have increased sporulation and host colonization capacity when glucose or fructose is available for metabolism. Thus, we report the formation of an emerging C. difficile species, selected for metabolizing simple dietary sugars and producing high levels of resistant spores, that is adapted for healthcare-mediated transmission.
Inflammatory bowel diseases (IBDs) are chronic disorders of the gastrointestinal tract with the following two subtypes: Crohn's disease (CD) and ulcerative colitis (UC). To date, most IBD genetic associations were derived from individuals of European (EUR) ancestries. Here we report the largest IBD study of individuals of East Asian (EAS) ancestries, including 14,393 cases and 15,456 controls. We found 80 IBD loci in EAS alone and 320 when meta-analyzed with ∼370,000 EUR individuals (∼30,000 cases), among which 81 are new. EAS-enriched coding variants implicate many new IBD genes, including ADAP1 and GIT2. Although IBD genetic effects are generally consistent across ancestries, genetics underlying CD appears more ancestry dependent than UC, driven by allele frequency (NOD2) and effect (TNFSF15). We extended the IBD polygenic risk score (PRS) by incorporating both ancestries, greatly improving its accuracy and highlighting the importance of diversity for the equitable deployment of PRS. Genome-wide association analyses across individuals of East Asian and European ancestries identify new risk loci for inflammatory bowel diseases. A polygenic risk score derived from the combined datasets shows improved prediction accuracy.
Ulcerative colitis is a chronic, relapsing inflammatory condition of the gastrointestinal tract with a complex genetic and environmental etiology. In an effort to identify genetic variation underlying ulcerative colitis risk, we present two distinct genome-wide association studies of ulcerative colitis and their joint analysis with a previously published scan, comprising, in aggregate, 2,693 individuals with ulcerative colitis and 6,791 control subjects. Fifty-nine SNPs from 14 independent loci attained an association significance of P < 10⁻⁵. Seven of these loci exceeded genome-wide significance (P < 5 × 10⁻⁸). After testing an independent cohort of 2,009 cases of ulcerative colitis and 1,580 controls, we identified 13 loci that were significantly associated with ulcerative colitis (P < 5 × 10⁻⁸), including the immunoglobulin receptor gene FCGR2A, 5p15, 2p16 and ORMDL3 (orosomucoid1-like 3). We confirmed association with 14 previously identified ulcerative colitis susceptibility loci, and an analysis of acknowledged Crohn's disease loci showed that roughly half of the known Crohn's disease associations are shared with ulcerative colitis. These data implicate approximately 30 loci in ulcerative colitis, thereby providing insight into disease pathogenesis.
Genome-wide association studies (GWAS) have identified dozens of risk loci for many complex disorders, including Crohn's disease. However, common disease-associated SNPs explain at most ∼20% of the genetic variance for Crohn's disease. Several factors may account for this unexplained heritability, including rare risk variants not adequately tagged thus far in GWAS. That rare susceptibility variants indeed contribute to variation in multifactorial phenotypes has been demonstrated for colorectal cancer, plasma high-density lipoprotein cholesterol levels, blood pressure, type 1 diabetes, hypertriglyceridemia and, in the case of Crohn's disease, for NOD2 (refs. 14,15). Here we describe the use of high-throughput resequencing of DNA pools to search for rare coding variants influencing susceptibility to Crohn's disease in 63 GWAS-identified positional candidate genes. We identify low frequency coding variants conferring protection against inflammatory bowel disease in IL23R, but we conclude that rare coding variants in positional candidates do not make a large contribution to inherited predisposition to Crohn's disease.
The timing of puberty is highly variable(1). We carried out a genome-wide association study for age at menarche in 4,714 women and report an association in LIN28B on chromosome 6 (rs314276, minor allele frequency (MAF) = 0.33, P = 1.5 × 10⁻⁸). In independent replication studies in 16,373 women, each major allele was associated with 0.12 years earlier menarche (95% CI = 0.08-0.16; P = 2.8 × 10⁻¹⁰; combined P = 3.6 × 10⁻¹⁶). This allele was also associated with earlier breast development in girls (P = 0.001; N = 4,271); earlier voice breaking (P = 0.006; N = 1,026) and more advanced pubic hair development in boys (P = 0.01; N = 4,588); a faster tempo of height growth in girls (P = 0.00008; N = 4,271) and boys (P = 0.03; N = 4,588); and shorter adult height in women (P = 3.6 × 10⁻⁷; N = 17,274) and men (P = 0.006; N = 9,840) in keeping with earlier growth cessation. These studies identify variation in LIN28B, a potent and specific regulator of microRNA processing(2), as the first genetic determinant regulating the timing of human pubertal growth and development.
More than 1,000 susceptibility loci have been identified through genome-wide association studies (GWAS) of common variants; however, the specific genes and full allelic spectrum of causal variants underlying these findings have not yet been defined. Here we used pooled next-generation sequencing to study 56 genes from regions associated with Crohn's disease in 350 cases and 350 controls. Through follow-up genotyping of 70 rare and low-frequency protein-altering variants in nine independent case-control series (16,054 Crohn's disease cases, 12,153 ulcerative colitis cases and 17,575 healthy controls), we identified four additional independent risk factors in NOD2, two additional protective variants in IL23R, a highly significant association with a protective splice variant in CARD9 (P < 1 × 10⁻¹⁶, odds ratio ≈ 0.29) and additional associations with coding variants in IL18RAP, CUL2, C1orf106, PTPN22 and MUC19. We extend the results of successful GWAS by identifying new, rare and probably functional variants that could aid functional experiments and predictive models.
We carried out a fine-mapping study in the HNF1B gene at 17q12 in two study populations and identified a second locus associated with prostate cancer risk, 26 kb centromeric to the first known locus (rs4430796); these loci are separated by a recombination hot spot. We confirmed the association with a SNP in the second locus (rs11649743) in five additional populations, with P = 1.7 × 10⁻⁹ for an allelic test of the seven studies combined. The association at each SNP remained significant after adjustment for the other SNP.