Systemic Lupus Erythematosus (SLE or lupus) is an incurable, debilitating autoimmune disease characterized by widespread inflammation and rampant production of autoantibodies. Genetics plays a definite role in lupus, with 83 established SLE risk loci contributing to disease development in an incompletely understood manner. Improved comprehension of the mechanisms by which genetic variants promote SLE would lead to more efficient diagnosis, treatment, and even prevention of lupus. One factor complicating resolution of mechanisms of genetic risk is the localization of >90% of SLE risk loci in non-coding regions of the genome. These non-coding variants can affect disease risk by altering transcription factor (TF) binding and subsequent gene expression. Expression quantitative trait loci (eQTLs) provide numerous examples of genotype-dependent expression of genes in many different cell types. In addition, twenty-one independent SLE risk loci encode TFs, including examples of both coding and non-coding variants that impact TF expression or function. Thus, gene regulation by TFs is assuredly critical in SLE etiology.
Some of the lupus projects in the Kottyan Lab are aimed at identifying “causal variants” at lupus-associated genetic loci containing immunologically relevant genes in order to understand the molecular mechanisms that contribute to increased lupus risk. We use a fine-mapping strategy that combines genotyping data from multiple ancestries, HapMap and 1000 Genomes Project data, sequencing in lupus patients, and Bayesian and frequentist fine-mapping approaches. The result is a “credible set” that contains all of the genetic variants that are likely to be mechanistically causal. Many of these projects are performed in close collaboration with John Harley and Ken Kaufman, investigators who participated in the acquisition of the extensive genetic datasets that we use in our analyses.
i. IRF5-TNPO3 risk locus at human 7q32
ii. PXK risk locus at human 3p14.3
iii. STAT1-STAT4 risk locus at human 2q32.2-2q32.3
iv. ETS1 risk locus at human 11q23.3
v. IRF7 risk locus at human 11p15.5
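The “credible set” produced by the fine-mapping strategy above can be illustrated with a small sketch. It is not the lab's pipeline: it assumes a single causal variant per locus, equal prior probability for every variant, and per-variant log Bayes factors that have already been computed. The function name, the variant IDs, and the numbers are hypothetical.

```python
import math

def credible_set(log_bayes_factors, coverage=0.95):
    """Return (variants, posteriors) for the smallest set of variants whose
    summed posterior probability reaches `coverage`.

    Assumption: exactly one causal variant at the locus and equal priors,
    so each posterior is proportional to the variant's Bayes factor.
    """
    # Subtract the maximum before exponentiating for numerical stability.
    max_lbf = max(log_bayes_factors.values())
    weights = {v: math.exp(lbf - max_lbf) for v, lbf in log_bayes_factors.items()}
    total = sum(weights.values())
    posteriors = {v: w / total for v, w in weights.items()}

    # Add variants from most to least probable until coverage is reached.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, pp in ranked:
        selected.append(variant)
        cumulative += pp
        if cumulative >= coverage:
            break
    return selected, posteriors

# Hypothetical natural-log Bayes factors for four variants at one locus.
example = {"var_A": 6.0, "var_B": 5.0, "var_C": 1.0, "var_D": 0.0}
variants, posteriors = credible_set(example)
print(variants)
```

With these made-up inputs, the two strongest variants together carry more than 95% of the posterior probability, so the set contains only those two. Real loci with extensive linkage disequilibrium typically yield larger sets.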
Eosinophilic esophagitis (EoE) is a chronic, food-driven, tissue-specific, esophageal, inflammatory allergic disease characterized by marked mucosal eosinophil accumulation. EoE remits after removal of specific food types and food re-introduction causes EoE recurrence with marked dysregulation of esophageal transcripts — all of which support the concept that EoE is a food allergic disease. Esophageal transcripts in EoE are rich in elements involved in allergic inflammation (e.g. T helper cell type 2 [Th2] cytokines such as interleukin [IL]-13, eosinophils, and mast cells), and EoE-like disease can be induced in mice by allergen exposure through IL-5– and IL-13–driven pathways. EoE frequently co-occurs with other allergic diseases including asthma and eczema; however, why EoE patients develop a tissue-specific response and why most EoE patients do not have food-induced anaphylaxis despite harboring food–specific IgE remain outstanding questions. Indeed, food sensitization is common in the majority of EoE subjects, but animal models, as well as association studies and a preliminary intervention with omalizumab, support the hypothesis that IgE is dispensable.
Despite distinct features compared with other allergic diseases (e.g. tissue specificity and lack of anaphylaxis), studies to date have identified EoE-genetic risk loci that are also linked to other allergic diseases. For example, 5q22 harbors the gene for thymic stromal lymphopoietin, TSLP, which was identified as an EoE-associated genetic locus in the first genome-wide association study (GWAS) of EoE, yet it is also associated with susceptibility to other allergic diseases.
The work of the Kottyan Lab on EoE is aimed towards identifying and then fine mapping “causal variants” at EoE-associated risk loci. In collaboration with Marc Rothenberg and the Center for Eosinophilic Disorders, we are actively genotyping new subjects and increasing our power to successfully perform these studies.
Electrophoretic Mobility Shift Assay (EMSA) analysis: We apply nuclear lysates to fluorescently labeled dsDNA oligos containing either allele of the variant and the surrounding genomic sequence. The resulting DNA-protein complexes are then run on a non-reducing 6% TBE polyacrylamide gel, and the bands are imaged using a fluorescence-detecting gel doc. When we suspect that a specific transcription factor binds differentially, we can "supershift" the complexes with an antibody against the hypothesized protein component. The antibody either further retards the oligo:transcription factor complex or prevents it from forming, and either outcome indicates that the protein is part of the complex. Together, these steps provide a systematic and practical way to test whether a variant alters protein binding.
DNA Affinity Precipitation Assays: A second method to identify allele-specific ligands applies cell lysate to either risk or non-risk oligos attached to magnetic beads using streptavidin. These oligos have the same sequence as those used in the EMSAs but carry biotin labels. This is a column-based method with elution of the DNA oligo-bound material. The eluted proteins are run on a polyacrylamide gel, and the bands can be visualized using a silver stain. When we have a predicted transcription factor, a Western blot is then performed using an antibody to that transcription factor for detection. When we do not have a candidate transcription factor, we can use the mass spectrometry method used in previous projects in our group to identify one. This approach complements the EMSA by exploring the available ligands under reducing conditions; sometimes one of the two procedures identifies a ligand that is not detected by the other.
Chromatin immunoprecipitation with Next Generation Sequencing (ChIP-seq): When we have a candidate transcription factor that differentially binds oligos with the risk and non-risk sequence, we select relevant tissue types from subjects who are homozygous risk, homozygous non-risk, or heterozygous for the variant. From these isolated cells or patient biopsies, we perform ChIP for the transcription factor and use the immunoprecipitated DNA to assess the presence of the DNA surrounding the risk and non-risk variants.
For our ChIP-seq analysis, we perform standard quality control to confirm the appropriate complexity of reads and peaks and the anticipated transcription factor motif enrichment. Next, we map sequencing reads to genomes masked for genetic variants so that we can compare binding between alleles using our novel MARIO computational pipeline. Using MARIO, we identify risk-genotype-dependent binding of transcription factors at the heterozygous genetic variants known to be associated with SLE. We developed the MARIO pipeline to identify allele-dependent protein binding by weighing the imbalance between the number of sequencing reads for each allele of a given genetic variant, the total number of reads available at the variant, and the number and consistency of available experimental replicates. MARIO extends existing methods by (1) calculating a score that explicitly reflects reproducibility across experimental replicates; (2) reducing run-time by using multiple computational cores; and (3) allowing the user to directly provide genotyping data as input. To identify heterozygotes for analysis, we have already genotyped the cell lines we will use and performed genome-wide imputation.
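The idea of weighing allelic read imbalance against read depth can be illustrated with a simple stand-in. The sketch below is not the MARIO scoring scheme, which also accounts for replicate consistency. It is an exact two-sided binomial test of whether reads at one heterozygous variant depart from the 50:50 split expected without allele-dependent binding. The function names, the `min_reads` threshold, and the read counts are assumptions made for illustration.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: the total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))

def allelic_imbalance(ref_reads, alt_reads, min_reads=10):
    """Summarize read imbalance at one heterozygous variant.

    Returns None when coverage is below `min_reads` (a hypothetical
    threshold), because too few reads cannot support a call.
    """
    total = ref_reads + alt_reads
    if total < min_reads:
        return None
    return {
        "total_reads": total,
        "alt_fraction": alt_reads / total,
        "p_value": binomial_two_sided_p(alt_reads, total),
    }

# Hypothetical read counts at a heterozygous variant inside a ChIP-seq peak.
print(allelic_imbalance(ref_reads=30, alt_reads=5))
print(allelic_imbalance(ref_reads=10, alt_reads=10))
```

The same allelic ratio gives a much smaller p-value at higher depth, which is why total read count matters alongside the imbalance itself. For very deep coverage, an established statistics library would be a more robust choice than this direct calculation.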
Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq): We use ATAC-seq to measure chromatin openness in cell lines, sorted primary cells, and patient-derived biopsy tissues. These data are used to understand changes in chromatin availability for gene expression and the pioneering activity of transcription factors. As with our ChIP-seq analytical pipeline, we use the MARIO pipeline to identify genotype-dependent chromatin openness.
Genotype-dependent Luciferase reporter assays: To confirm the ability of noncoding variants to affect gene expression, we clone regulatory regions into a dual luciferase reporter system upstream of a minimal promoter, such that they control the expression of firefly luciferase or an enhanced, more stable nanoluciferase in a variety of disease-relevant cells. Site-directed mutagenesis is used to create reporter vectors that are identical except at the genetic variant of interest. A separate vector constitutively expressing Renilla luciferase is co-transfected and used as a control so that we can measure each signal separately and normalize for well-to-well differences in transfection efficiency. We use constructs containing either the risk or non-risk variant, and reporter expression is measured relative to the constitutively expressed Renilla. Both transient (luciferase) and stable (lentiviral-based) reporter assays can be modified to work regardless of whether the variants are located in a promoter, enhancer, or repressor region.
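The normalization described above amounts to a ratio of ratios, sketched below. Each well's firefly signal is divided by its Renilla signal, replicate wells are averaged, and the risk construct is compared with the non-risk construct. The function names and luminescence values are hypothetical, and a real analysis would also include background subtraction and a statistical test across replicates.

```python
from statistics import mean

def normalized_activity(firefly, renilla):
    """Per-well firefly/Renilla ratios, which correct for well-to-well
    differences in transfection efficiency."""
    return [f / r for f, r in zip(firefly, renilla)]

def allele_fold_change(risk_wells, nonrisk_wells):
    """Mean normalized activity of the risk construct divided by that of
    the non-risk construct. Each argument is a (firefly, renilla) pair of
    equal-length lists of replicate readings."""
    risk = mean(normalized_activity(*risk_wells))
    nonrisk = mean(normalized_activity(*nonrisk_wells))
    return risk / nonrisk

# Hypothetical luminescence readings from three replicate wells per construct.
risk = ([1200, 1000, 1100], [100, 100, 100])
nonrisk = ([500, 600, 550], [100, 100, 100])
print(allele_fold_change(risk, nonrisk))  # 2.0
```

A fold change above 1 in this sketch means the risk allele drives more reporter expression than the non-risk allele in that cell type.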
Luciferase reporter assays can also be used to explore the degree to which differential binding of a transcription factor affects gene expression. Lentivirus-mediated over- and under-expression of the transcription factor of interest allows investigators to assess the effect of a specific transcription factor on the regulatory activity of the DNA surrounding a genetic variant. Site-directed mutagenesis can also be used to mutate other parts of the DNA binding motif of the predicted transcription factor or transcription factor complex.
Massively parallel reporter assays (MPRAs): MPRAs are a new technology that allows us to build a library of a large set of genetic variants and screen them for their effect on gene regulation in numerous biological settings. We are building an MPRA library specifically designed for the analysis of autoimmune and allergic genetic risk variants. Our hypothesis is that many of these variants will result in genotype-dependent regulatory activity in the context of the right cell type and stimulation. MPRAs are important tools for our group because they allow us to screen many genetic variants to nominate causal functional variants in many different cell types (including primary cells) and under disease-specific inflammatory conditions.
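One common way to score genotype-dependent regulatory activity in an MPRA is sketched below. This is an illustration under stated assumptions, not the lab's analysis. Each barcode's activity is taken as the log2 ratio of RNA counts to plasmid DNA counts, barcodes for the same allele are summarized by their median, and the allelic skew is the difference between alleles. The function names, the pseudocount, and the counts are hypothetical, and library-size normalization is omitted for brevity.

```python
from math import log2
from statistics import median

def barcode_activity(rna_count, dna_count, pseudocount=1):
    """log2(RNA/DNA) for one barcode. The pseudocount avoids division by
    zero and log of zero for barcodes with no reads."""
    return log2((rna_count + pseudocount) / (dna_count + pseudocount))

def allelic_skew(risk_barcodes, nonrisk_barcodes):
    """Median log2 activity of risk-allele barcodes minus that of
    non-risk-allele barcodes. Each argument is a list of (rna, dna)
    count pairs, one pair per barcode."""
    risk = median([barcode_activity(r, d) for r, d in risk_barcodes])
    nonrisk = median([barcode_activity(r, d) for r, d in nonrisk_barcodes])
    return risk - nonrisk

# Hypothetical barcode counts for the two alleles of one variant.
risk = [(79, 19), (159, 39), (39, 9)]
nonrisk = [(19, 19), (39, 39), (9, 9)]
print(allelic_skew(risk, nonrisk))  # 2.0
```

A skew of 2.0 on the log2 scale corresponds to four-fold higher reporter expression from the risk allele in this made-up example.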
The Functional Genomics Core is led by Dr. Kottyan and was established in July 2016 under NIH P30 AR070549 (led by Dr. Susan Thompson). The goals of the core are:
The Functional Genomics Core provides the reagents and the strategic, logistical, and technical expertise required for efficient investigation of genetic variants likely to be mechanistically important in pediatric rheumatic and inflammatory diseases. In doing so, we enhance research productivity and efficiency by reducing the “start-up” time required to implement these complex experimental systems, thus reducing costs and allowing for higher quality studies than if investigators were to set up this technology individually. Further, this core provides access to state-of-the-art technology without requiring the users to become experts in these technologies. In this way, the core becomes a practical vehicle for translating basic advances into clinical disease applications and for gaining important insights into pathogenesis. These collaborations have resulted in exciting findings in diseases ranging from preterm birth to occupational asthma.