Powerful data-sifting algorithms developed by computer scientists at Brown University are helping scientists untangle the profoundly complex genetics of cancer.
In a study reported today in the New England Journal of Medicine, researchers from Washington University in St. Louis used two algorithms developed at Brown to assemble the most complete genetic profile yet of acute myeloid leukemia (AML), an aggressive form of blood cancer. The researchers hope the work will lead to new AML treatments based on the genetics of each patient's disease.
The algorithms, developed by Ben Raphael, Eli Upfal, and Fabio Vandin from the Department of Computer Science and the Center for Computational Molecular Biology (CCMB), played a key role in making sense of the giant datasets required for the study. The work was part of The Cancer Genome Atlas project, which aims to catalog the genetic mutations that cause cells to become cancerous. Doing that requires sequencing the entire genome of cancer cells and comparing it to the genome of healthy cells. Without computational tools like the ones the Brown team has developed, analyzing those data would be impossible.
"Genes don't usually act on their own, but instead act together in pathways or networks," said Raphael, associate professor of computer science. "Cancer-causing mutations often target these networks and pathways." This presents a problem for researchers trying to find important mutations, because these mutations are often spread across the network and hidden in the genetic data.
Imagine a cellular pathway containing five genes. If any one of those genes acquires a mutation, the pathway fails and the cell becomes cancerous. That means five patients with the same cancer can have any one of five different mutations. That makes life difficult for researchers trying to find the mutations that cancer cells have in common. The algorithms developed by Raphael and his team are designed to connect those dots and identify the important pathways, rather than looking only at individual genes.
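The five-gene scenario above can be made concrete with a small Python sketch. The gene and patient names are made up for illustration and do not come from the study. It compares how often each individual gene is mutated with how often the pathway as a whole is hit.

```python
from collections import Counter

# Hypothetical toy data: a five-gene pathway, and five patients who each
# carry a mutation in a different gene of that pathway.
pathway = ["G1", "G2", "G3", "G4", "G5"]
patients = {
    "patient_a": {"G1"},
    "patient_b": {"G2"},
    "patient_c": {"G3"},
    "patient_d": {"G4"},
    "patient_e": {"G5"},
}

def gene_frequencies(patients):
    """Fraction of patients carrying a mutation in each individual gene."""
    counts = Counter(g for muts in patients.values() for g in muts)
    return {gene: counts[gene] / len(patients) for gene in counts}

def pathway_frequency(patients, pathway):
    """Fraction of patients with at least one mutated gene in the pathway."""
    members = set(pathway)
    hit = sum(1 for muts in patients.values() if muts & members)
    return hit / len(patients)

per_gene = gene_frequencies(patients)
per_pathway = pathway_frequency(patients, pathway)
print(per_gene)
print(per_pathway)
```

In this toy cohort no single gene stands out, since each is mutated in only one patient in five, yet every patient has a hit somewhere in the pathway. That gap is why looking gene by gene can miss what the patients have in common.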
The HotNet algorithm works by plotting mutation data from patients onto a map of known gene interactions and looking for connected networks that are mutated more often than would be expected by chance. The program represents frequently mutated genes as heat sources. By looking at the way heat is distributed and clustered across the map, the program finds the "hot" networks involved in cancer.
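The heat analogy can be sketched in a few lines of Python. This is a simplified stand-in written for illustration, not HotNet's actual procedure: the gene names, the interaction map, the diffusion rule, and the `rate`, `steps`, and `threshold` values are all assumptions, and the statistical test for "more often than would be expected by chance" is left out entirely.

```python
# Hypothetical interaction map (undirected): each gene lists its neighbors.
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": ["E"],
    "E": ["D"],
}

# Hypothetical starting heat: fraction of patients with a mutation in each gene.
mutation_freq = {"A": 0.4, "B": 0.3, "C": 0.0, "D": 0.02, "E": 0.01}

def diffuse(graph, heat, rate=0.3, steps=3):
    """Let every gene pass a share of its heat to its neighbors, repeatedly."""
    current = {node: heat.get(node, 0.0) for node in graph}
    for _ in range(steps):
        nxt = {node: (1 - rate) * h for node, h in current.items()}
        for node, h in current.items():
            neighbors = graph[node]
            if not neighbors:
                nxt[node] += rate * h  # an isolated gene keeps all its heat
                continue
            share = rate * h / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        current = nxt
    return current

def hot_subnetworks(graph, heat, threshold):
    """Connected groups of genes whose heat is at or above the threshold."""
    hot = {n for n, h in heat.items() if h >= threshold}
    seen, components = set(), []
    for start in sorted(hot):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(nb for nb in graph[node] if nb in hot and nb not in seen)
        components.append(comp)
    return components

diffused = diffuse(graph, mutation_freq)
hot = hot_subnetworks(graph, diffused, threshold=0.05)
print(hot)
```

In this toy map, gene C has no mutations of its own but sits next to frequently mutated genes, so it warms up and lands in the same hot group as A and B. The rarely mutated pair D and E stays cold.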
HotNet picked out several networks that seem to be active in the AML genome. In a study published in 2011, HotNet identified networks important to ovarian cancer as well.
Dendrix, the newest algorithm developed at Brown, takes the power of HotNet one step further. HotNet works by looking for mutations in networks that are already known to researchers. However, there are countless gene networks that researchers have not yet identified. Dendrix is designed to look for mutations in those previously unknown networks.
To find new networks, Dendrix takes advantage of the fact that cancer-causing mutations are relatively rare. A patient with a mutation in one gene in a network is unlikely to have a concurrent mutation in another gene in that network. Dendrix looks for combinations of mutations that happen frequently across patients but rarely happen together in a single patient. Put another way: Imagine that a substantial number of patients with a given cancer have a mutation in gene X. Another large group of patients has a mutation in gene Y. But very few patients have mutations in both X and Y at the same time. Dendrix looks for these patterns of exclusivity and predicts that groups of genes with high exclusivity are probably working together.
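The X-and-Y example can be turned into a rough Python sketch. The patient data are invented, and the score used here (patients covered by a gene set, minus the extra hits from patients mutated in more than one gene of the set) is an illustrative assumption; this article does not describe how Dendrix scores or searches gene sets. The exhaustive search is only practical on toy inputs like this one.

```python
from itertools import combinations

# Hypothetical toy data: the set of mutated genes observed in each patient.
mutations = {
    "p1": {"X"},
    "p2": {"X"},
    "p3": {"X", "Z"},
    "p4": {"Y"},
    "p5": {"Y", "Z"},
    "p6": {"Y"},
    "p7": {"X", "Y"},  # the rare patient with mutations in both X and Y
    "p8": {"Z"},
}

def exclusivity_score(gene_set, mutations):
    """Reward gene sets that cover many patients with few co-occurring hits."""
    covered = sum(1 for muts in mutations.values() if muts & gene_set)
    total_hits = sum(len(muts & gene_set) for muts in mutations.values())
    overlap = total_hits - covered  # hits beyond the first in any one patient
    return covered - overlap

def best_gene_set(mutations, size):
    """Try every gene set of the given size and return the top-scoring one."""
    genes = sorted(set().union(*mutations.values()))
    return max(
        (frozenset(combo) for combo in combinations(genes, size)),
        key=lambda gene_set: exclusivity_score(gene_set, mutations),
    )

best_pair = best_gene_set(mutations, size=2)
best_score = exclusivity_score(best_pair, mutations)
print(sorted(best_pair), best_score)
```

Here X and Y together cover seven of the eight patients, and only one patient carries both, so that pair scores highest among the three possible pairs.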
"Where we see those patterns of exclusivity," Raphael said, "it suggests a possible pathway." The group has tested Dendrix on cancers in which the pathways were already known, just to see if the program would find them. Indeed, the pathways "just fall right out of the data," Raphael said.
For the AML paper, Raphael's group developed an improved algorithm, Dendrix++, which better handles extremely rare mutations. Dendrix++ picked out three potential new pathways in AML for doctors to investigate.
Raphael and Vandin, along with computational biology graduate students Max Leiserson and Hsin-Ta Wu, are continuing to improve their algorithms and to apply them to new datasets. The group recently started putting the algorithms to work on what's called the Pan-Cancer project, which looks for commonalities in mutations across cancer types.
"For us as computational people, it's fun to push these algorithms and apply them to new datasets," Raphael said. "At the same time, in analyzing cancer data we hope that the algorithms produce actionable information that is clinically important."