What is BLAT?

Overview

BLAT stands for BLAST-like alignment tool. It assists in the annotation and assembly of the human genome.

BLAT is used to analyze and compare biological sequences, which include DNA, RNA, and proteins. It aims to infer homology (similarity in structure or development across different species) in order to discover the biological functions of genomic sequences.

It works best for sequences that have high similarity.

BLAT input and output

BLAT has a few specifications in terms of input and output. These specifications are discussed below.

Input

BLAT can search long database sequences, but it works best with short query sequences. It accepts sequences as plain text.

The size of the input is limited. For multiple sequences, nucleotide input must not exceed 50,000 bases (the molecules that combine to form DNA), and protein input must not exceed 25,000 letters, each letter standing for an amino acid (the molecules that combine to form proteins). Moreover, matching a query relies on k-mers (substrings of length k obtained by breaking down a biological sequence): typically k-mers of 8 to 12 bases for nucleotides and of three to seven letters for amino acids.
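To make the k-mer idea concrete, the following minimal sketch splits a sequence into its overlapping substrings of length k. The function name `kmers` and the example sequence are illustrative choices, not part of BLAT itself.

```python
def kmers(sequence, k):
    """Return every overlapping substring of length k, in order of position."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


print(kmers("ACGTAC", 3))  # ['ACG', 'CGT', 'GTA', 'TAC']
```

A sequence shorter than k yields an empty list, which is why very short queries cannot produce any k-mer match.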

Output

BLAT returns a list of results in decreasing order of alignment score, a value calculated by assigning a score to each aligned pair of letters and then summing these scores over the length of the sequence.
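As a rough illustration of such a score, the sketch below sums a per-column value over two gap-free aligned strings and then ranks a few candidate alignments in decreasing order. The values of +1 for a match and -1 for a mismatch are assumptions chosen for readability, not BLAT's actual scoring scheme.

```python
def alignment_score(a, b, match=1, mismatch=-1):
    """Sum a per-column score over two equal-length, gap-free aligned strings."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(match if x == y else mismatch for x, y in zip(a, b))


# Hypothetical (query, target) alignments, ranked best first.
hits = [("ACGT", "TCGA"), ("ACGT", "ACGT"), ("ACGT", "ACGA")]
ranked = sorted(hits, key=lambda pair: alignment_score(*pair), reverse=True)
print([alignment_score(q, t) for q, t in ranked])  # [4, 2, 0]
```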

BLAT output displays the following information:

Alignment score.
Matching region of the query sequence and the database.
Query size.
Level of identity as a percentage of the alignment.
Chromosome and position at which the query sequence maps.
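The level of identity in the list above can be pictured as the share of aligned columns that carry the same letter. The following sketch computes that share for two gap-free aligned strings. It is a simplified definition used here for illustration, and the figure BLAT reports may be calculated differently.

```python
def percent_identity(aligned_query, aligned_target):
    """Percentage of aligned columns in which both sequences carry the same letter."""
    if not aligned_query or len(aligned_query) != len(aligned_target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(q == t for q, t in zip(aligned_query, aligned_target))
    return 100.0 * matches / len(aligned_query)


print(percent_identity("ACGTACGT", "ACGTTCGT"))  # 87.5
```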

BLAT algorithm

The BLAT algorithm works in two steps. These are the search stage and alignment stage.

Search stage

The search stage has three different methodologies to search for homologous regions. These are searching with single perfect matches, single almost perfect matches, and multiple perfect matches.

Searching with single perfect matches

The first method requires a single perfect k-mer match between the search query and the database. This is not an ideal search approach: achieving sensitivity requires small k-mers, which increases the chance of false-positive hits and, in turn, the amount of time needed to search.
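A minimal sketch of this first method follows. It indexes every overlapping k-mer of the database and looks up each k-mer of the query, which is a simplification made for clarity. The helper names `build_index` and `perfect_hits` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_index(database, k):
    """Map each k-mer to the list of positions where it starts in the database."""
    index = defaultdict(list)
    for pos in range(len(database) - k + 1):
        index[database[pos:pos + k]].append(pos)
    return index


def perfect_hits(query, index, k):
    """Return (query_pos, database_pos) pairs whose k-mers match exactly."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, dpos))
    return hits


database = "TTACGTACGGA"
query = "ACGTA"
print(perfect_hits(query, build_index(database, 4), 4))       # [(0, 2), (1, 3)]
print(len(perfect_hits(query, build_index(database, 2), 2)))  # 7
```

Shrinking k from 4 to 2 raises the number of hits from 2 to 7 on the same data, which illustrates why small k-mers produce more spurious hits to sift through.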

Searching with single almost perfect matches

The second method allows at least one mismatch within a k-mer match. It is valuable for identifying small homologous regions. Tolerating a mismatch lets the search use larger k-mers without losing sensitivity, which decreases the number of false positives and makes the search computationally less expensive.
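The idea of an almost perfect match can be sketched as a k-mer comparison that tolerates a bounded number of differing positions. The brute-force scan below is written for readability rather than speed, and the names `hamming` and `near_perfect_hits` are illustrative assumptions.

```python
def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def near_perfect_hits(query, database, k, max_mismatches=1):
    """Return (query_pos, database_pos) pairs whose k-mers differ in at most
    max_mismatches positions."""
    hits = []
    for qpos in range(len(query) - k + 1):
        q_kmer = query[qpos:qpos + k]
        for dpos in range(len(database) - k + 1):
            if hamming(q_kmer, database[dpos:dpos + k]) <= max_mismatches:
                hits.append((qpos, dpos))
    return hits


# The database copy of the query carries one substitution (G -> C).
print(near_perfect_hits("ACGTAC", "TTACCTACGG", 6))                    # [(0, 2)]
print(near_perfect_hits("ACGTAC", "TTACCTACGG", 6, max_mismatches=0))  # []
```

With no mismatch allowed, the 6-mer finds nothing, while allowing one mismatch recovers the region without having to shrink k.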

Searching with multiple perfect matches

The third method requires multiple perfect matches that lie close to each other. Because the matches need not be contiguous, this method tolerates the small insertions and deletions that occur within homologous regions.
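One way to picture this third method is to keep only pairs of exact k-mer hits that sit near each other and on nearly the same diagonal, where the diagonal of a hit is its database position minus its query position. The sketch below is an assumption-laden illustration: the `window` and `max_diagonal_shift` parameters and both function names are hypothetical.

```python
def exact_hits(query, database, k):
    """Return (query_pos, database_pos) pairs whose k-mers match exactly."""
    hits = []
    for qpos in range(len(query) - k + 1):
        q_kmer = query[qpos:qpos + k]
        for dpos in range(len(database) - k + 1):
            if database[dpos:dpos + k] == q_kmer:
                hits.append((qpos, dpos))
    return hits


def paired_hits(hits, k, window=20, max_diagonal_shift=2):
    """Keep pairs of non-overlapping hits that lie close together.

    A small shift between the two diagonals corresponds to a short
    insertion or deletion between the hits.
    """
    pairs = []
    for i, (q1, d1) in enumerate(hits):
        for q2, d2 in hits[i + 1:]:
            non_overlapping = q2 - q1 >= k and d2 - d1 >= k
            close = d2 - d1 <= window
            shift = abs((d2 - q2) - (d1 - q1))
            if non_overlapping and close and shift <= max_diagonal_shift:
                pairs.append(((q1, d1), (q2, d2)))
    return pairs


# The database copy of the query contains one inserted base.
hits = exact_hits("ACGTTTGCA", "GGACGTATTGCAGG", 4)
print(hits)                  # [(0, 2), (4, 7), (5, 8)]
print(paired_hits(hits, 4))  # [((0, 2), (4, 7)), ((0, 2), (5, 8))]
```

The hits on either side of the inserted base fall on diagonals that differ by one, so they still pair up even though no single long exact match spans the region.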

Alignment stage

The alignment stage has three parts. These are nucleotide alignments for base sequences, protein alignments for amino acid sequences, and the stitching and filling in of multiple homologous regions.

Nucleotide alignments

This algorithm begins by building a hit list between the search query and the homologous region of the database. It looks for small, perfect hits. If a k-mer in the query matches multiple k-mers in the database, the k-mer is extended one base at a time until it matches a unique sequence in the database or reaches a specific size. The extended hits are then combined into a single alignment.
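The extension of a small perfect hit can be sketched as growing the match to the left and right while the flanking bases still agree. This is a simplified, gap-free illustration. The function name `extend_hit`, the seed positions, and the sequences are hypothetical.

```python
def extend_hit(query, database, qpos, dpos, k):
    """Grow an exact k-mer hit left and right while the flanking bases match.

    Returns (query_start, database_start, length) of the extended exact match.
    """
    q_start, d_start = qpos, dpos
    while q_start > 0 and d_start > 0 and query[q_start - 1] == database[d_start - 1]:
        q_start -= 1
        d_start -= 1
    q_end, d_end = qpos + k, dpos + k
    while q_end < len(query) and d_end < len(database) and query[q_end] == database[d_end]:
        q_end += 1
        d_end += 1
    return q_start, d_start, q_end - q_start


query = "GACGTACT"
database = "TTGACGTACGG"
print(extend_hit(query, database, 2, 4, 3))  # (0, 2, 7)

# Several seeds inside the same region extend to the same match,
# so collecting the results in a set combines them.
seeds = [(0, 2), (2, 4), (4, 6)]
print({extend_hit(query, database, q, d, 3) for q, d in seeds})  # {(0, 2, 7)}
```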

Protein alignments

This algorithm extends hits from the search stage into high-scoring pairs using a scoring function. A graph is built with the high-scoring pairs as nodes. A dynamic programming algorithm extracts the highest-scoring alignment by traversing the graph. The high-scoring pairs in that alignment are then removed, and if any high-scoring pairs are left, the dynamic program is rerun on the graph.
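The graph traversal described above can be sketched as a chaining problem: each high-scoring pair (HSP) is a node, an edge links two HSPs when the second starts after the first ends in both sequences, and dynamic programming picks the best-scoring path. The tuple layout, the scores, and the function names below are assumptions made for illustration, not BLAT's internal representation.

```python
def best_chain(hsps):
    """Return the highest-scoring chain of HSPs that are collinear in both sequences.

    Each HSP is (query_start, query_end, db_start, db_end, score), ends exclusive.
    """
    order = sorted(range(len(hsps)), key=lambda i: hsps[i][0])
    best = {}  # node -> best chain score ending at that node
    prev = {}  # node -> predecessor on that best chain
    for pos, i in enumerate(order):
        best[i], prev[i] = hsps[i][4], None
        for j in order[:pos]:
            follows = hsps[j][1] <= hsps[i][0] and hsps[j][3] <= hsps[i][2]
            if follows and best[j] + hsps[i][4] > best[i]:
                best[i], prev[i] = best[j] + hsps[i][4], j
    node = max(best, key=best.get)
    chain = []
    while node is not None:
        chain.append(hsps[node])
        node = prev[node]
    return chain[::-1]


def all_chains(hsps):
    """Extract the best chain, remove its HSPs, and repeat until none remain."""
    remaining, chains = list(hsps), []
    while remaining:
        chain = best_chain(remaining)
        chains.append(chain)
        remaining = [h for h in remaining if h not in chain]
    return chains


hsps = [(0, 10, 100, 110, 30), (12, 20, 113, 121, 25),
        (5, 15, 300, 310, 20), (22, 30, 90, 98, 10)]
for chain in all_chains(hsps):
    print(sum(h[4] for h in chain), chain)
```

On this toy input the first pass chains the two collinear HSPs for a combined score of 55, and the reruns return the two leftover HSPs as separate alignments.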

Stitching and filling in

If an alignment is scattered across multiple homologous regions, the pieces are stitched together using a slightly modified version of the algorithm that stitches the high-scoring pairs. This stitching makes it possible to find small internal exons that are out of range of other exons or are too small to be detected in the search stage.

BLAT function

BLAT is used for the following purposes:

Inferring genomic coordinates by aligning multiple mRNA sequences onto a genome assembly.
Determining homology by aligning proteins or mRNA sequences of one species to another.
Determining the distribution of exonic and intronic regions in a gene.
Displaying the protein-coding sequence in a gene.
Detecting the gene family members of a specific gene query.

Comparison between BLAT and BLAST

BLAT	BLAST
BLAT indexes the genome or protein database, retains its index, and then scans the query sequence for matches.	BLAST makes an index of the query sequence and searches the database for matches.
BLAT can extend near-perfect matches and multiple perfect matches. It extends on two perfect matches for nucleotide searches and three for protein searches.	BLAST can extend only when one or two matches are found.
BLAT connects all homologous regions between two sequences and then aligns them into one significant alignment.	BLAST returns each homologous region as a separate local alignment.
BLAT displays the correctly placed bases of the mRNA onto the genome to identify exon-intron boundaries.	BLAST displays a list of exons, with each alignment extending just past the ends of the exons.
BLAT is less sensitive than BLAST.	BLAST is more sensitive than BLAT.
