Novel mapping approach for DNA sequence binding motifs sharply expands library of genetic knowledge
In a study with wide-ranging impact, researchers increased the number of known DNA sequence binding motifs for eukaryotic transcription factors more than 10-fold, and doubled the number known for human transcription factors.
This expanded knowledge significantly improves the capacity to predict how gene expression is regulated, with relevance to many questions about disease mechanisms and to essentially all of eukaryotic biology.
The study, led by Matthew Weirauch, PhD, a computational biologist in the Center for Autoimmune Genomics and Etiology, was published Sept. 11, 2014, in the journal Cell. The findings enable researchers who study any organism to begin to understand how genes are regulated on a global scale. For human disease, the study increases researchers’ ability to understand the function of disease-associated genetic variants that fall in non-coding regions.
An estimated 90 percent of disease-associated variants are non-coding. In genomics, non-coding DNA sequences are the parts of an organism's DNA that do not encode protein sequences. “Doubling our knowledge of human DNA sequence binding motifs essentially doubles our chance of figuring out which proteins these variants might affect the binding of,” Weirauch says.
The center’s primary focus is to understand the genesis of lupus and other immunological diseases, and to explore the mechanisms of disease through the complex interactions of genetics, the immune system and environmental factors such as stress, exercise and diet.
Two findings of the study surprised the researchers. “First, that our scheme for mapping DNA sequence binding motifs across organisms based on protein similarity works for most protein families,” says Weirauch. “Second, the fact that we increased knowledge of these motifs so substantially across all of eukaryotic life, from less than one percent to almost 40 percent of all proteins.”