Scientists at the Broad Institute of Harvard and MIT and Beth Israel Deaconess Medical Center have announced the discovery of a vast new class of previously unrecognised mammalian genes that do not encode proteins, but instead function as long RNA molecules.
The researchers call the newly found genes "large intervening non-coding RNAs" (lincRNAs).
According to the researchers, these genes play critical roles in both health and disease, including cancer, immune signalling and stem cell biology.
"We've known that the human genome still has many tricks up its sleeve. But, it is astounding to realize that there is a huge class of RNA-based genes that we have almost entirely missed until now," Nature magazine quoted Eric Lander, founding director of the Broad Institute, as saying.
The researchers say that, unlike standard "textbook" genes, which encode proteins, the newly discovered lincRNAs function as RNA molecules that are thousands of bases long.
Because only about ten examples of functional lincRNAs were known previously, they seemed more like genomic oddities than critical components.
The team said that the new finding demonstrates that there are actually thousands of such genes, and that they have been conserved across mammalian evolution.
"The challenge in finding these lincRNAs is that they have been hiding in plain sight," said John Rinn, a Harvard Medical School assistant professor at Beth Israel Deaconess Medical Center and an associate member of the Broad Institute of Harvard and MIT.
"The human and mouse genomes are already known to produce many large RNA molecules, but the vast majority show no evolutionary conservation across species, suggesting that they may simply be 'genomic noise' without any biological function," he added.
During their study, the researchers looked not at the RNA molecules themselves but at telltale signs in the DNA called chromatin modifications or epigenomic marks.
The scientific team looked for genomic regions that have the same chromatin patterns as protein-coding genes, but do not encode proteins.
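The search strategy described above can be illustrated with a minimal Python sketch. It assumes that chromatin-marked domains and annotated protein-coding genes are both available as simple genomic intervals, and it keeps only the marked domains that overlap no known protein-coding gene. The `Interval` type, the function names and all coordinates are hypothetical and are not taken from the study, whose actual pipeline is not described here.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval (hypothetical representation, half-open coordinates)."""
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def unannotated_domains(marked: List[Interval], coding: List[Interval]) -> List[Interval]:
    """Keep chromatin-marked domains that overlap no annotated protein-coding gene."""
    return [d for d in marked if not any(overlaps(d, g) for g in coding)]


# Toy data, invented for illustration only.
marked_domains = [
    Interval("chr1", 1000, 6000),    # overlaps a known gene, so it is discarded
    Interval("chr1", 20000, 27000),  # no overlapping gene, so it is a candidate
    Interval("chr2", 500, 4500),     # no overlapping gene, so it is a candidate
]
coding_genes = [
    Interval("chr1", 500, 5500),
    Interval("chr2", 10000, 15000),
]

candidates = unannotated_domains(marked_domains, coding_genes)
print(candidates)
```

The pairwise comparison is quadratic and is meant only to show the idea; a genome-scale analysis would normally rely on sorted intervals or an interval tree.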
Surveying the genomes of four types of mouse cells, including embryonic stem cells and cells from various tissue types, the researchers found an astounding 1,586 such loci that had not been previously described.
The study also showed that the vast majority of these genomic regions are transcribed into lincRNAs, and that these are conserved across mammals.
"The epigenomic marks revealed where these genes were hiding. Analysis of their sequence then revealed that the genes are highly conserved in mammalian genomes, which strongly suggested that these genes play critical biological functions," said Mitch Guttman, an MIT graduate student working at the Broad Institute.
The scientists correlated the expression patterns of lincRNAs in various cell types with those of known critical protein-coding genes in the same cells. They found that lincRNAs likely help regulate a variety of cellular processes, including cell proliferation, immune surveillance, maintenance of embryonic stem cell pluripotency, neuronal and muscle development, and gametogenesis.
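As a rough illustration of this kind of comparison, the sketch below ranks protein-coding genes by how closely their expression across cell types tracks that of a single lincRNA. The article does not say which statistic the team used, so the Pearson correlation here is an assumption, and the gene names and expression values are invented.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Hypothetical expression levels across four cell types (arbitrary units).
linc_profile = [1.0, 2.0, 3.0, 4.0]
gene_profiles = {
    "gene_A": [2.0, 4.0, 6.0, 8.0],  # rises with the lincRNA
    "gene_B": [4.0, 3.0, 2.0, 1.0],  # falls as the lincRNA rises
    "gene_C": [1.0, 3.0, 1.0, 3.0],  # only weakly related
}

# Correlate the lincRNA with every gene, then rank from most to least correlated.
correlations = {name: pearson(linc_profile, prof) for name, prof in gene_profiles.items()}
ranked = sorted(correlations.items(), key=lambda item: item[1], reverse=True)

for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

Genes with strongly correlated profiles would then suggest which cellular processes a lincRNA might be involved in, a hypothesis that still requires experimental follow-up.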
Their observations were verified by further experimental evidence from several of the identified lincRNAs.
In their paper, the researchers say that, because they applied stringent criteria in identifying the roughly 1,600 lincRNAs, many more lincRNA genes are likely hiding in plain sight in the genome, along with other RNA-encoding genes that are as important to genome function as their better-recognized protein-coding counterparts.