Buried in non-gene sequences are so-called "regulatory elements" that contain instructions for switching genes on or off, and for controlling how DNA is packaged and replicated within a human cell. Scientists believe these DNA sequences may play a very important role in some diseases, such as prostate or colon cancer.
The UW was a leading institution among dozens that participated in the Encyclopedia of DNA Elements (ENCODE) consortium, supported by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health. The consortium's initial results are being published in the June 14 issue of the journal Nature, and in companion articles in the June issue of the journal Genome Research.
Launched in September 2003, the consortium began a four-year pilot project to identify the function of all DNA sequences in a small section of the human genome. Though the pilot study examined only about 1 percent of the genome, scientists hope the work will build our understanding of the 98 percent of the genome that is made up of non-gene sections of DNA. The consortium also set out to create methods for easily scaling up the ENCODE project to encompass the entire human genome.
The UW is one of eight major data-producing centers in the consortium, along with Yale University, Stanford University, Affymetrix Inc., the University of Virginia, the University of California San Diego, the Wellcome Trust Sanger Centre in Cambridge, UK, and the NHGRI. UW researchers also led a computational analysis project that integrated data from multiple consortium members, uncovering how different aspects of genome function relate to one another.
"The ENCODE project has given us unprecedented insight into how functional information is organized in the human genome," said Dr. John Stamatoyannopoulos, UW professor of genome sciences, one of the leaders of the ENCODE project, and a senior author on the Nature article. "The diverse nature and sheer volume of the ENCODE data enabled us for the first time to visualize how the packaging of DNA, the replication of DNA, the production of RNA, and the evolution of DNA sequences all fit together."
Each ENCODE center used a different experimental approach for mapping functional sequences within the genome. The UW ENCODE team of molecular and computational biologists focused on identifying functional elements hidden in non-gene DNA by carefully mapping how DNA is packaged in the cell nucleus.
In order to fit within the nucleus, DNA is tightly wound around proteins to create a substance called chromatin. Many years ago, researchers found differences between the chromatin at gene-controlling DNA sequences and the chromatin in other areas of the genome. In these gene-controlling regions, specialized proteins latch onto DNA and unfold the chromatin, activating nearby genes.
The UW team created new methods for finding these unfolded regions within the billions of DNA bases in the genome. First, they used an inexpensive enzyme, called DNaseI, as a molecular bloodhound. When it is injected into a living cell, the enzyme seeks out regions of unfolded chromatin that correspond to functional elements. By mapping where the DNaseI traveled, the researchers created a high-resolution map of chromatin structure that pinpointed the locations of thousands of gene-controlling sequences.
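As a rough illustration of the mapping idea, and not the UW team's actual method, the sketch below assumes the enzyme's activity has already been recorded as a list of genome positions where DNaseI cut the DNA. It counts cuts in fixed-size windows and reports the windows with unusually many cuts as candidate regions of unfolded chromatin. The function name, window size, and threshold are all hypothetical choices made for this example.

```python
from collections import Counter


def find_open_regions(cut_positions, window=100, min_cuts=5):
    """Return (start, end, cut_count) for each fixed window that
    contains at least `min_cuts` DNaseI cut positions.

    Illustrative only: `window` and `min_cuts` are arbitrary values,
    not parameters taken from the ENCODE project.
    """
    # Assign every cut to the window it falls in and tally the cuts.
    counts = Counter(pos // window for pos in cut_positions)
    # Keep only the densely cut windows, in genomic order.
    return [
        (idx * window, (idx + 1) * window, n)
        for idx, n in sorted(counts.items())
        if n >= min_cuts
    ]


# Made-up cut positions: a dense cluster between bases 1000 and 1100.
cuts = [12, 40, 1010, 1022, 1035, 1048, 1071, 1090, 2500]
print(find_open_regions(cuts))
```

In this toy data only the window from base 1000 to base 1100 is reported, because it holds six cuts while the other windows hold one or two. A real analysis would also need to account for background cutting and for variation between experiments.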
To handle the massive amounts of data produced by the many ENCODE teams, Stamatoyannopoulos teamed up with Dr. William Noble, UW associate professor of genome sciences and computer science. Together, they created new computational techniques and used powerful computers to unravel the locations of gene-controlling sequences, and to pull together data generated by other ENCODE centers to gain a more complete picture of how the genome works.
"In the beginning we only had the vaguest idea how the various functional data sets might fit together," said Noble. Because of that, the researchers had to build their computational tools from scratch, which required several years of work. The group used the Internet to help rapidly disseminate and review complex computational analyses with researchers around the world.
Even though the ENCODE consortium studied only 1 percent of the human genome, Stamatoyannopoulos believes their research will be useful for understanding the entire genome.
"The progress of the past four years has been breathtaking, as we have moved from being able to analyze only a few thousand DNA base pairs at a time, up to 1 percent of the genome, or 30 million bases," he said. "Now, with new technologies in hand, the entire genome is within reach. Expanding the ENCODE project to the entire genome will have a major impact on our understanding of the genetic basis of common diseases, where defects in gene-controlling sequences likely play a key role."
Non-gene elements of the human genome have been relatively understudied in the past, but recent research has shown that non-gene DNA defects play a critical role in many common diseases, like prostate and colon cancer, inflammatory bowel disease, arthritis, and diabetes. By having a complete map of functional DNA within the genome, biomedical scientists will be able to track down the causes of these diseases far more rapidly than before, and will be better equipped to treat or prevent them.