BLAT stands for BLAST-like alignment tool. It assists in the annotation and assembly of the human genome.
BLAT is used to analyze and compare biological sequences, which include DNA, RNA, and proteins. It aims to infer the
It works best for sequences that have high similarity.
BLAT has a few specifications in terms of input and output. These specifications are discussed below.
BLAT can handle long database sequences, but it works best with short query sequences. It accepts sequences as plain text.
The size of the input is limited. For multiple sequences, nucleotides must not exceed 50,000
BLAT gives the output as a list of results in decreasing order of their
BLAT output displays the following information:
The BLAT algorithm works in two steps. These are the search stage and alignment stage.
The search stage has three different methodologies to search for homologous regions. These are searching with single perfect matches, single almost perfect matches, and multiple perfect matches.
The first method requires a single perfect match between the search query and the database. This is not an ideal search approach because it needs small k-mers to achieve sensitivity, which increases the number of false-positive hits and, in turn, the time needed to search.
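The single-perfect-match search above can be sketched in a few lines. This is a minimal illustration, not BLAT's actual code: the function names `build_index` and `single_perfect_hits`, the toy sequences, and the choice of `k=4` are all assumptions made for the example. The database is indexed by non-overlapping k-mers, and every overlapping k-mer of the query is looked up in that index.

```python
from collections import defaultdict

def build_index(database: str, k: int) -> dict:
    """Index the non-overlapping k-mers of the database by start position."""
    index = defaultdict(list)
    for start in range(0, len(database) - k + 1, k):
        index[database[start:start + k]].append(start)
    return index

def single_perfect_hits(query: str, index: dict, k: int) -> list:
    """Look up every overlapping k-mer of the query; return (query_pos, db_pos) hits."""
    hits = []
    for q_pos in range(len(query) - k + 1):
        for db_pos in index.get(query[q_pos:q_pos + k], []):
            hits.append((q_pos, db_pos))
    return hits

# Toy data (hypothetical): the query occurs in the database at position 6.
database = "ACGTACGGTCAATTGCCGTA"
query = "GGTCAATT"
index = build_index(database, k=4)
hits = single_perfect_hits(query, index, k=4)
print(hits)
```

With a small `k`, many unrelated database positions share the same k-mer, which is where the false positives mentioned above come from.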
The second method allows at least one mismatch between the k-mers of the query and the database. It is valuable for identifying small homologous regions. Allowing a mismatch decreases the number of false positives because larger k-mers can be used, and the resulting hits are computationally less expensive to handle.
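One straightforward way to tolerate a single mismatch is to look up every one-base variant of each query k-mer in an exact index. The sketch below takes that route as an assumption; it is not a claim about how BLAT implements the criterion, and the names `one_mismatch_variants` and `near_perfect_hits` are hypothetical.

```python
from collections import defaultdict

ALPHABET = "ACGT"

def build_index(database: str, k: int) -> dict:
    """Index the non-overlapping k-mers of the database by start position."""
    index = defaultdict(list)
    for start in range(0, len(database) - k + 1, k):
        index[database[start:start + k]].append(start)
    return index

def one_mismatch_variants(kmer: str):
    """Yield the k-mer itself and every k-mer that differs at exactly one position."""
    yield kmer
    for i, base in enumerate(kmer):
        for alt in ALPHABET:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]

def near_perfect_hits(query: str, index: dict, k: int) -> list:
    """Return sorted (query_pos, db_pos) hits that match with at most one mismatch."""
    hits = set()
    for q_pos in range(len(query) - k + 1):
        for variant in one_mismatch_variants(query[q_pos:q_pos + k]):
            for db_pos in index.get(variant, []):
                hits.add((q_pos, db_pos))
    return sorted(hits)

# Toy data (hypothetical): "TCGA" differs from the indexed k-mer "TCAA" by one base.
index = build_index("ACGTACGGTCAATTGCCGTA", k=4)
near_hits = near_perfect_hits("TCGA", index, k=4)
print(near_hits)
```

In this sketch each nucleotide k-mer expands to 1 + 3k lookups, so the tolerance is paid for in lookups rather than in extra hits to process.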
The third method requires multiple perfect matches that lie close to each other. It accounts for the small insertions and deletions that occur within homologous regions.
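The idea of requiring several nearby perfect matches can be illustrated by grouping hits that sit close together on nearly the same diagonal (database position minus query position). This is a simplified sketch: the function name `clump_hits` and the parameters `min_hits`, `window`, and `max_diag_shift` are hypothetical, and the default values are arbitrary. Allowing the diagonal to shift slightly is what lets a clump survive a small insertion or deletion.

```python
def clump_hits(hits, min_hits=2, window=20, max_diag_shift=2):
    """Group (query_pos, db_pos) hits that are close together on nearly the same diagonal.

    Only clumps with at least `min_hits` members are returned.
    """
    clumps = []
    for q_pos, db_pos in sorted(hits, key=lambda h: h[1]):
        diag = db_pos - q_pos
        for clump in clumps:
            last_q, last_db = clump[-1]
            near = db_pos - last_db <= window
            same_band = abs(diag - (last_db - last_q)) <= max_diag_shift
            if near and same_band:
                clump.append((q_pos, db_pos))
                break
        else:
            # No existing clump is compatible, so this hit starts a new one.
            clumps.append([(q_pos, db_pos)])
    return [c for c in clumps if len(c) >= min_hits]

# Toy hits (hypothetical): three hits around diagonal 100 and one isolated hit.
example_hits = [(0, 100), (8, 108), (16, 117), (3, 500)]
print(clump_hits(example_hits))
```

The isolated hit at database position 500 is dropped because it has no partner, which is how this criterion filters out chance matches.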
The alignment stage has three parts. These are nucleotide alignments for base pairs, protein alignments for amino acids, and the stitching and fitting of multiple homologous regions.
The nucleotide alignment algorithm begins by building a hit list between the search query and the homologous region of the database. The algorithm looks for small, perfect hits. If a k-mer in the query matches multiple k-mers in the database, the k-mer is extended one base at a time until it matches a unique sequence in the database or reaches a specific size. The extended hits are then combined into a single alignment.
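The extension and merging of hits described above can be sketched as follows. This is a simplification under stated assumptions: hits are extended without gaps or mismatches, blocks are merged only when they lie on the same diagonal, and the names `extend_hit` and `merge_extended` are hypothetical rather than taken from BLAT.

```python
def extend_hit(query, database, q_pos, db_pos, k):
    """Grow an exact k-mer hit to the left and right while the bases keep matching.

    Returns (q_start, q_end, db_start) with a half-open query span.
    """
    q_start, db_start = q_pos, db_pos
    while q_start > 0 and db_start > 0 and query[q_start - 1] == database[db_start - 1]:
        q_start -= 1
        db_start -= 1
    q_end, db_end = q_pos + k, db_pos + k
    while q_end < len(query) and db_end < len(database) and query[q_end] == database[db_end]:
        q_end += 1
        db_end += 1
    return (q_start, q_end, db_start)

def merge_extended(blocks):
    """Combine overlapping or touching blocks that lie on the same diagonal."""
    merged = []
    for q_start, q_end, db_start in sorted(set(blocks)):
        if merged:
            p_start, p_end, p_db = merged[-1]
            same_diagonal = (db_start - q_start) == (p_db - p_start)
            if same_diagonal and q_start <= p_end:
                merged[-1] = (p_start, max(p_end, q_end), p_db)
                continue
        merged.append((q_start, q_end, db_start))
    return merged

# Toy data (hypothetical): a 4-mer hit at query 2 / database 8 grows to the full query.
database = "ACGTACGGTCAATTGCCGTA"
query = "GGTCAATT"
block = extend_hit(query, database, q_pos=2, db_pos=8, k=4)
print(block)
print(merge_extended([(0, 5, 6), (3, 8, 9)]))
```

Blocks on different diagonals are left separate here; joining them across gaps is the job of the later stitching step.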
The protein alignment algorithm extends hits from the search stage into high-scoring pairs using a scoring function. A graph is built with the high-scoring pairs as nodes. A dynamic programming algorithm extracts the highest-scoring alignment by traversing the graph. The high-scoring pairs in that alignment are removed, and if any high-scoring pairs are left, the dynamic program is rerun on the remaining graph.
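The graph procedure described above can be sketched with a small chaining routine. Everything in it is an assumption made for illustration: each high-scoring pair is a tuple `(q_start, q_end, db_start, db_end, score)` with half-open spans, an edge links two pairs when the first ends before the second starts in both sequences, and a chain's score is the plain sum of its members with no gap penalty. The names `best_chain` and `all_alignments` are hypothetical.

```python
def best_chain(hsps):
    """Return the indices of the highest-scoring colinear chain of HSPs."""
    order = sorted(range(len(hsps)), key=lambda i: hsps[i][0])
    best = {}  # index -> best total score of a chain ending at this HSP
    prev = {}  # index -> predecessor of this HSP in that chain
    for pos, i in enumerate(order):
        q_start, _, db_start, _, score = hsps[i]
        best[i], prev[i] = score, None
        for j in order[:pos]:
            # Edge j -> i: j must end before i starts in both query and database.
            ends_before = hsps[j][1] <= q_start and hsps[j][3] <= db_start
            if ends_before and best[j] + score > best[i]:
                best[i], prev[i] = best[j] + score, j
    end = max(best, key=best.get)
    chain = []
    while end is not None:
        chain.append(end)
        end = prev[end]
    return chain[::-1]

def all_alignments(hsps):
    """Extract the best chain, remove its HSPs, and repeat until none are left."""
    remaining = list(hsps)
    alignments = []
    while remaining:
        chain = best_chain(remaining)
        alignments.append([remaining[i] for i in chain])
        chosen = set(chain)
        remaining = [h for i, h in enumerate(remaining) if i not in chosen]
    return alignments

# Toy HSPs (hypothetical): the first two are colinear; the other two stand alone.
hsps = [
    (0, 10, 100, 110, 20),
    (12, 20, 115, 123, 15),
    (5, 15, 300, 310, 12),
    (22, 30, 90, 98, 8),
]
for alignment in all_alignments(hsps):
    print(alignment)
```

Rerunning the dynamic program after each extraction mirrors the loop in the text: every pass reports one alignment and shrinks the graph.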
If an alignment is scattered across multiple homologous regions, the pieces are stitched together using a slightly modified version of the algorithm that stitches the high-scoring pairs. This stitching also makes it possible to find small internal exons that are out of range of other exons or too small to be found in the search stage.
BLAT is used for the following purposes:
BLAT | BLAST |
BLAT indexes the genome or protein database, retains its index, and then scans the query sequence for matches. | BLAST makes an index of the query sequence and searches the database for matches. |
BLAT can extend on near-perfect matches and multiple perfect matches. It extends on two perfect matches for nucleotide searches and three for protein searches. | BLAST can extend only when one or two matches are found. |
BLAT connects all homologous regions between two sequences and then aligns them into one significant alignment. | BLAST returns each homologous region as a separate local alignment. |
BLAT places each base of the mRNA at its correct position on the genome, which identifies exon-intron boundaries. | BLAST displays a list of exons, with each alignment extending just past the ends of the exons. |
BLAT is less sensitive than BLAST. | BLAST is more sensitive than BLAT. |