Researchers from the University of North Carolina have found a way of categorizing long non-coding RNAs (lncRNAs) by their probable function.
Mauro Calabrese and colleagues have discovered a code that links the molecular makeup of lncRNAs with what they actually do, and have developed an algorithm that categorizes the mysterious molecules on that basis.
“Long non-coding RNAs are part of what you might call the ‘dark matter’ of the genome, and this tool we’ve developed should help us understand much better how they work in health and disease,” says Calabrese.
Most of the DNA in animals and plants is transcribed into RNAs that do not encode proteins. Non-coding RNAs longer than 200 nucleotides are referred to as long non-coding RNAs. Many of these lncRNAs “switch” genes on or off when they bind to proteins or other molecules, thereby regulating cellular processes. Scientists generally agree that disruption of these regulatory roles contributes to cancer and other serious illnesses. However, until now, they have been able to pinpoint the functions of only a very small fraction of the many thousands of lncRNAs present in our cells.
One reason for this is that the function of a lncRNA is not apparent from its nucleotide sequence alone. Two lncRNAs that perform similar functions may have very different nucleotide sequences.
Calabrese and team therefore attempted to decipher the obscure relationship between lncRNA nucleotide sequence and function. They began with the observation that lncRNAs appear to function by binding to proteins, and that they do so using short sequences within the molecule.
“We reasoned that the presence of protein-binding sequences in a lncRNA would be more important than their relative positioning within the lncRNA. This notion ended up being true, and allowed us to succeed where more traditional approaches have failed,” says Calabrese.
The researchers developed a computational method called SEEKR that compares lncRNAs by their content of “kmers”, short nucleotide sequences of a fixed length k that can serve as protein-binding sites, irrespective of where the kmers are located within each molecule.
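To make the idea of position-independent comparison concrete, here is a minimal Python sketch. It is not the SEEKR implementation: the function names, the choice of k = 3, the scaling to counts per 1,000 nucleotides and the use of Pearson correlation as the similarity score are all assumptions made for illustration. It shows only the general principle of counting every kmer in a sequence while ignoring where it occurs, then comparing the resulting profiles.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_profile(seq, k=3):
    """Return kmer counts per 1,000 nt, in a fixed alphabetical kmer order.

    Illustrative helper (hypothetical name); positions of kmers are discarded.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    scale = 1000.0 / max(len(seq), 1)  # length normalization (an assumption)
    return [counts[km] * scale for km in all_kmers]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def kmer_similarity(seq_a, seq_b, k=3):
    """Compare two sequences by kmer content alone, ignoring kmer positions."""
    return pearson(kmer_profile(seq_a, k), kmer_profile(seq_b, k))


if __name__ == "__main__":
    # Toy sequences: the same two blocks in opposite order, plus an unrelated one.
    a = "ACGACGACG" + "TTTTTTTT"
    b = "TTTTTTTT" + "ACGACGACG"
    c = "GCGCGCGCGCGCGCGCG"
    print(kmer_similarity(a, b))  # high: shared kmer content despite reordering
    print(kmer_similarity(a, c))  # much lower: little kmer content in common
```

In this toy example, sequences a and b would look quite different to a conventional alignment because their blocks are rearranged, yet their kmer profiles are nearly identical. That is the kind of relationship a kmer-based comparison is designed to pick up.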
As reported in the journal Nature Genetics, Calabrese and colleagues found that approximately half of all lncRNAs could be categorized into five different groups based on similarities between their kmer content. This approach also enabled them to predict where in a cell the molecules are usually found and what type of proteins they bind to.
“We can now take sequence information from a well-studied lncRNA, and use it to discover lncRNAs that may be functioning through a related mechanism. In a way, it’s like being able to finally understand the different scripts in the Rosetta Stone,” says Calabrese.
The researchers now hope to use the kmer-based approach to help discover and study lncRNAs involved in disease and to refine the technique to better predict the molecules’ function based on their sequences.
“Our genomes produce so many lncRNAs, and now we have a much better idea of how to look at the sequences of these molecules to predict which ones are doing important things in our cells,” concludes Calabrese.