What is the Nussinov algorithm for RNA folding?

Imagine we have a string of RNA, which is like a long chain of smaller molecules called nucleotides. Each nucleotide can be one of four types:

adenine (A)
cytosine (C)
guanine (G)
uracil (U)

Now, when RNA folds, it forms a structure where different parts of the chain pair up with each other to create what’s called a secondary structure. This secondary structure is crucial for the function of RNA in cells. By predicting how an RNA sequence folds, we can gain insights into its function and interactions. This has significant implications for fields ranging from drug design to understanding genetic regulation.

RNA folding

Folding is essential to RNA’s function. The secondary structure of RNA arises from interactions between nucleotides, chiefly hydrogen bonding between complementary bases (A pairs with U, and G pairs with C). Runs of paired bases form stems, while the unpaired stretches between and around them form loops and bulges. Together these give the molecule a complex shape that enables it to perform its biological roles. The specific shape of an RNA molecule can affect how it interacts with other molecules, such as proteins, other RNA molecules, or small ligands. For example, transfer RNA (tRNA) adopts a cloverleaf secondary structure that is essential for its role in translating genetic information into proteins.
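As a minimal sketch of the pairing rule just described, the hypothetical helper `can_pair` below checks whether two nucleotides are complementary. It accepts only the A-U and G-C pairs named above. Many folding tools also allow G-U "wobble" pairs, which this sketch deliberately leaves out.

```python
# Complementary (Watson-Crick) pairs only, in both orientations.
# G-U wobble pairs are intentionally excluded in this sketch.
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(x: str, y: str) -> bool:
    """Return True if nucleotides x and y form a complementary base pair."""
    return (x.upper(), y.upper()) in COMPLEMENTARY


print(can_pair("A", "U"))  # True
print(can_pair("G", "C"))  # True
print(can_pair("A", "G"))  # False
```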

Similarly, ribozymes, which are RNA molecules with enzymatic activity, require precise folding to catalyze chemical reactions. Predicting how an RNA sequence folds is challenging because the number of possible secondary structures grows exponentially with the length of the sequence, so checking every candidate one by one quickly becomes infeasible. This is where computational methods like the Nussinov algorithm come into play.
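To get a feel for why exhaustive search breaks down, here is a small Python sketch that counts every nested (pseudoknot-free) set of complementary pairs a sequence can form, including the structure with no pairs at all. The function name `count_structures` is our own, not from any library. It assumes only A-U and G-C pairs and imposes no minimum loop length, so it overcounts compared with physically realistic models; the point is simply how quickly the count grows.

```python
from functools import lru_cache

# Only the A-U and G-C pairs described above; no G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def count_structures(seq: str) -> int:
    """Count every nested (pseudoknot-free) set of complementary pairs in seq,
    including the structure with no pairs at all. No minimum loop length applies."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Empty or single-nucleotide segment: only the all-unpaired structure.
        if i >= j:
            return 1
        # Case 1: position i stays unpaired.
        total = count(i + 1, j)
        # Case 2: position i pairs with some k; the pair splits the rest into an
        # inside part (i+1..k-1) and an outside part (k+1..j), counted independently.
        for k in range(i + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)


print(count_structures("GGCC"))  # 6
for m in (2, 4, 8, 16):
    print(len("GC" * m), count_structures("GC" * m))
```

Even this toy count climbs steeply as the sequence gets longer, which is why an algorithm that avoids enumerating structures one by one is needed.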

The Nussinov algorithm

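The Nussinov algorithm is a dynamic programming method that predicts an RNA molecule’s secondary structure from its nucleotide sequence alone. It does this by finding the structure with the maximum number of complementary base pairs, considering only nested pairings in which no two pairs cross (that is, no pseudoknots). It’s like predicting how a string of beads will loop and connect to form a necklace.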

Here’s how the Nussinov algorithm works:

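Initialization: We create an n-by-n matrix, where n is the length of the sequence. The cell in row i and column j represents the subsequence running from nucleotide i to nucleotide j, and its value is the maximum number of base pairs (pairs of complementary nucleotides) that can form within that subsequence. Cells on the main diagonal represent single nucleotides, which have nothing to pair with inside their own subsequence, so they are set to 0.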
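Recursion: We fill in the matrix one diagonal at a time, starting just above the main diagonal and moving toward the top-right corner, so shorter subsequences are always solved before longer ones. Writing N(i, j) for the value of cell (i, j), each new cell takes the best of four cases: nucleotide i stays unpaired, giving N(i+1, j); nucleotide j stays unpaired, giving N(i, j-1); i pairs with j if they are complementary, giving N(i+1, j-1) + 1; or the subsequence splits at some k with i < k < j into two independent parts, giving N(i, k) + N(k+1, j).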
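Backtracking: Once the matrix is filled, the top-right cell holds the maximum number of base pairs for the whole sequence. We then trace back from that cell, retracing which case produced each value, to recover the actual pairs of nucleotides that make up a secondary structure with that maximum number of base pairs.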
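The three steps above can be sketched in Python as follows. This is a minimal, unoptimized illustration rather than a reference implementation: it assumes only A-U and G-C pairs, uses 0-based indices, and adds a hypothetical `min_loop` parameter (the minimum number of unpaired nucleotides a hairpin must enclose) that the description above does not mention. The structure is returned in dot-bracket notation, a common text format in which matching parentheses mark paired positions and dots mark unpaired ones.

```python
def nussinov(seq: str, min_loop: int = 0) -> tuple[int, str]:
    """Return (maximum number of base pairs, dot-bracket structure) for seq.

    min_loop is an illustrative parameter: the minimum number of unpaired
    nucleotides a hairpin must enclose (0 means no restriction).
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""

    # Initialization: N[i][j] holds the maximum number of pairs within seq[i..j].
    # Every cell starts at 0, which is already correct for the main diagonal,
    # for segments too short to close a hairpin, and for empty segments (j < i).
    N = [[0] * n for _ in range(n)]

    # Recursion: fill diagonal by diagonal, i.e. by increasing segment length,
    # so every sub-result needed for N[i][j] is ready when we reach it.
    for length in range(min_loop + 1, n):
        for i in range(n - length):
            j = i + length
            best = max(N[i + 1][j], N[i][j - 1])      # i unpaired, or j unpaired
            if (seq[i], seq[j]) in pairs:
                best = max(best, N[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                  # split into two halves
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best

    # Backtracking: retrace which case produced each value, starting at N[0][n-1].
    structure = ["."] * n

    def traceback(i: int, j: int) -> None:
        if i >= j or N[i][j] == 0:
            return
        if N[i][j] == N[i + 1][j]:
            traceback(i + 1, j)
        elif N[i][j] == N[i][j - 1]:
            traceback(i, j - 1)
        elif (seq[i], seq[j]) in pairs and N[i][j] == N[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            traceback(i + 1, j - 1)
        else:
            for k in range(i + 1, j):
                if N[i][j] == N[i][k] + N[k + 1][j]:
                    traceback(i, k)
                    traceback(k + 1, j)
                    break

    traceback(0, n - 1)
    return N[0][n - 1], "".join(structure)


score, dot_bracket = nussinov("GGGAAACCC", min_loop=3)
print(score, dot_bracket)  # 3 (((...)))
```

When several structures tie for the maximum, this traceback simply returns the first one it reaches. The filling step runs in O(n³) time because of the split loop over k, and the matrix takes O(n²) memory.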

Let’s illustrate this with a simple example.