What is the Nussinov algorithm for RNA folding?

Imagine we have a string of RNA, which is like a long chain of smaller molecules called nucleotides. Each nucleotide can be one of four types:

  • adenine (A)

  • cytosine (C)

  • guanine (G)

  • uracil (U)

Now, when RNA folds, it forms a structure where different parts of the chain pair up with each other to create what’s called a secondary structure. This secondary structure is crucial for the function of RNA in cells. By predicting how an RNA sequence folds, we can gain insights into its function and interactions. This has significant implications for fields ranging from drug design to understanding genetic regulation.

RNA folding

The folding of RNA is essential for the RNA’s function. The secondary structure of RNA involves various interactions between the nucleotides, such as hydrogen bonding between complementary bases (A pairs with U, and G pairs with C). These interactions create loops, bulges, and stems, forming a complex shape that enables RNA to perform its biological roles effectively. The specific shape of an RNA molecule can affect how it interacts with other molecules, such as proteins, other RNA molecules, or small ligands. For example, transfer RNA (tRNA) adopts a cloverleaf structure essential for its role in translating genetic information into proteins.

RNA folding pairs
RNA folding pairs

Similarly, ribozymes, which are RNA molecules with enzymatic activity, require precise folding to catalyze chemical reactions. Predicting how an RNA sequence folds is challenging due to the number of possible structures that can form. This is where computational methods like the Nussinov algorithm come into play.

The Nussinov algorithm

The Nussinov algorithm is a clever way to predict how RNA molecules will fold into this secondary structure based solely on the sequence of nucleotides. It’s like predicting how a string of beads will loop and connect to form a necklace.

Here’s how the Nussinov algorithm works:

  1. Initialization: We start by creating a matrix where each cell represents a region of the RNA sequence. The value in each cell represents the maximum number of base pairs (matching nucleotides) that can form between the two regions represented by the row and column of that cell.

  2. Recursion: We fill in the matrix diagonally, starting from the diagonal and moving upwards. At each step, we consider all possible ways that the RNA sequence could fold in that region and choose the one that maximizes the number of base pairs.

  3. Backtracking: Once the matrix is filled, we backtrack through it to find the actual pairs of nucleotides that form the secondary structure with the maximum number of base pairs.

Let’s illustrate this with a simple example.

Code example

Suppose we have an RNA sequence “GGGAAAUCC”. We can apply the Nussinov algorithm through the following steps:

Step 1: Initialization

We initialize the diagonals of row 1 and row 2 with 0. This will help to perform formula calculations.

Initialization of matrix
Initialization of matrix

Step 2: Recursion

We iterate diagonally and at each step, we consider all possible ways the RNA sequence could fold in that region and choose the one that maximizes the number of base pairs.

To choose the optimal way the RNA sequence can fold, we use the following equation:

where s(ai,aj)=1s(a_i,a_j) = 1 if aia_i and aja_j are complement and s(ai,aj)=0s(a_i,a_j) = 0 if aia_i and aja_j are non-complement.

Different types of pairing of sequences
Different types of pairing of sequences

The process of recursion and choosing the optimal way RNA would fold is as follows:

Initialization
Initialization
1 of 10

Step 3: Backtracking

We backtrack through the matrix to find the pairs of nucleotides that form the secondary structure. Starting from the top-right corner of the matrix, we'll trace the path that leads to the maximum number of base pairs.

Initialization of matrix
Initialization of matrix

The traceback above gave us the following structure:

Final RNA sequence
Final RNA sequence

The structure above represents one of the optimal ways to fold an RNA sequence. The double bonds are between nucleotides which are complementary to each other.

Knowledge test

Answer the following questions to assess your understanding.

Choose the correct answer.

1

What is the primary purpose of the Nussinov algorithm?

A)

To sequence RNA molecules

B)

To predict the secondary structure of RNA molecules

C)

To synthesize RNA in a laboratory

D)

To analyze protein-DNA interactions

Question 1 of 20 attempted

Conclusion

The Nussinov algorithm is a pivotal tool in RNA folding prediction, offering simplicity and efficiency in discerning RNA secondary structures. Despite its limitations, it remains invaluable in various fields, from drug design to evolutionary biology. As research progresses, its role in unraveling RNA complexities and guiding investigations into RNA function is poised to endure.

Free Resources

Copyright ©2025 Educative, Inc. All rights reserved