Algorithms for third-generation sequencing

Third-generation sequencing techniques improve on first-generation and second-generation techniques in many respects. However, third-generation sequencing and mapping technologies currently suffer from high error rates. These technologies are still being worked on, and their error rates are expected to improve over time. In the meantime, third-generation sequencing is most useful for applications that can tolerate errors.

Third-generation sequencing

Third-generation sequencing is also known as long-read sequencing. We define it as a generation of DNA sequencing methods that are currently under active development.

Algorithms for third-generation sequencing

There are many algorithms available for third-generation sequencing. A few of them are described below.

Genome assembly

Genome assembly is the reconstruction of an organism's whole-genome DNA sequence. Vast quantities of DNA fragments are aligned to one another during assembly.

Reference alignment

When a reference genome is available, newly sequenced reads can be aligned to it in order to characterize them. This type of assembly is quick and easy to implement. Its drawback is that it can hide novel sequences and significant copy number variants (a genetic trait involving the number of copies of a particular gene present in an organism's genome). Moreover, reference genomes are not available for the majority of organisms.
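As a rough illustration of aligning reads to a reference, the sketch below slides each read along the reference and keeps the ungapped position with the fewest mismatches. The function name `align_read`, the toy sequences, and the brute-force search are assumptions made for this example; real aligners use indexes and allow insertions and deletions.

```python
def align_read(reference: str, read: str) -> tuple[int, int]:
    """Return (offset, mismatches) for the best ungapped placement of read."""
    best_offset, best_mismatches = -1, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        # Hamming distance between the read and this window of the reference.
        mismatches = sum(a != b for a, b in zip(window, read))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


# Hypothetical toy data: the second read differs from the reference at one base.
reference = "ACGTTAGCCGATAGGCTTAACG"
reads = ["TAGCCGAT", "GATAGGGT", "CTTAACG"]
placements = {read: align_read(reference, read) for read in reads}

for read, (offset, mismatches) in placements.items():
    print(f"{read}: offset {offset}, {mismatches} mismatch(es)")
```

A read that places with a small number of mismatches points at a candidate variant or a sequencing error. A read that fits nowhere is simply lost, which is how reference alignment can hide novel sequences.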

De novo assembly

De novo assembly is the alternative to reference alignment. This method reconstructs the whole genome sequence from the raw sequence reads alone. It is chosen when no reference genome is available, when the species of the given organism is unknown, or when specific genetic variants of interest cannot be detected by reference genome alignment.

De novo assembly is computationally difficult because the current generation of sequencing produces short reads. Assemblers approach the problem by iteratively finding and connecting sequences with suitable overlaps. Paired-end reads are a possible aid, but they have drawbacks of their own.
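The iterative process of finding and connecting overlapping sequences can be sketched as a greedy merge. This is a minimal illustration, not how production assemblers work: the names `overlap` and `greedy_assemble`, the `min_length` threshold, and the error-free toy reads are all assumptions made for the example.

```python
from itertools import permutations


def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no suitable overlap remains."""
    contigs = list(reads)
    while True:
        best_len, best_pair = 0, None
        for a, b in permutations(contigs, 2):
            length = overlap(a, b, min_length)
            if length > best_len:
                best_len, best_pair = length, (a, b)
        if best_pair is None:
            return contigs  # nothing left to connect
        a, b = best_pair
        contigs.remove(a)
        contigs.remove(b)
        contigs.append(a + b[best_len:])  # join, keeping the overlap once


# Hypothetical error-free reads sampled from one short sequence.
assembled = greedy_assemble(["GCGTGCA", "TGCAATCC", "ATGGCGT"])
print(assembled)
```

With repetitive sequence, several reads can overlap equally well and the greedy choice may join the wrong pair, which is one reason short reads make the problem hard.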

Hybrid assembly

The longer reads offered by third-generation technologies may ease the problems faced by de novo assembly. For instance, if an entire repetitive region is sequenced unambiguously within a single read, no computational inference is required to reconstruct it.

Third-generation and second-generation sequencing are used together to lessen the error rates. This works in both directions: long reads from third-generation sequencing can resolve the ambiguities present in sequences assembled from second-generation reads, while the short second-generation reads can correct the errors present in the long third-generation reads.
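One direction of this exchange, using accurate short reads to correct a noisy long read, can be sketched as a per-base majority vote. The sketch assumes substitution errors only and ungapped placement; the names `best_offset` and `polish` and the toy data are hypothetical, and real hybrid correction tools are considerably more involved.

```python
from collections import Counter


def best_offset(target: str, read: str) -> int:
    """Offset in target where read fits with the fewest mismatches (no gaps)."""
    offsets = range(len(target) - len(read) + 1)
    return min(offsets, key=lambda o: sum(a != b for a, b in zip(target[o:], read)))


def polish(long_read: str, short_reads: list[str]) -> str:
    """Replace each base of long_read with the majority vote of the short reads."""
    votes = [Counter() for _ in long_read]
    for read in short_reads:
        start = best_offset(long_read, read)
        for i, base in enumerate(read):
            votes[start + i][base] += 1
    # Positions that no short read covers keep the original base.
    return "".join(
        vote.most_common(1)[0][0] if vote else original
        for vote, original in zip(votes, long_read)
    )


# Hypothetical data: the long read carries two substitution errors.
noisy_long_read = "ACGTTTGCCGATCGGC"
short_reads = ["ACGTTAGC", "TTAGCCGA", "GCCGATAG", "GATAGGC"]
corrected = polish(noisy_long_read, short_reads)
print(corrected)
```

The long read supplies the scaffold on which the short reads are placed, and the short reads supply the base-level accuracy.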

Chromosome scaffolding

Greedy approaches (approaches that solve a problem by selecting the best option available at the moment) are used along with third-generation mapping technologies to order and orient contigs (contiguous assembled sequences) into large scaffolds. The algorithm iteratively links the contigs and performs a global alignment (an alignment of two sequences letter by letter) to satisfy the linking information.

Combining sequencing with long-range mapping data improves the assemblies and is cost-effective. The drawback of this method is that scaffolded chromosomes contain less information than fully sequenced chromosomes. The gaps between the linked sequences remain unsequenced, so important sequences that fall within those gaps can be overlooked.
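The greedy linking step can be sketched as follows. The sketch assumes that the mapping data has already been summarized as a support score for each proposed "left contig is followed by right contig" link, and it ignores contig orientation and the global alignment step. The function name `greedy_scaffold`, the contig names, and the scores are hypothetical.

```python
def greedy_scaffold(contigs, links):
    """links maps (left, right) to a support score; returns ordered scaffolds."""
    successor, predecessor = {}, {}

    def chain_end(name):
        # Follow accepted links to the last contig of the chain containing name.
        while name in successor:
            name = successor[name]
        return name

    # Greedy choice: consider the best-supported link first.
    for (left, right), _support in sorted(links.items(), key=lambda kv: -kv[1]):
        if left in successor or right in predecessor:
            continue  # one of the two ends is already linked
        if chain_end(right) == left:
            continue  # accepting this link would close a cycle
        successor[left] = right
        predecessor[right] = left

    scaffolds = []
    for name in contigs:
        if name not in predecessor:  # this contig starts a chain
            chain = [name]
            while chain[-1] in successor:
                chain.append(successor[chain[-1]])
            scaffolds.append(chain)
    return scaffolds


# Hypothetical contigs and link support counts.
contigs = ["c1", "c2", "c3", "c4"]
links = {("c1", "c2"): 9, ("c2", "c3"): 7, ("c4", "c2"): 3,
         ("c1", "c3"): 2, ("c3", "c1"): 1}
scaffolds = greedy_scaffold(contigs, links)
print(scaffolds)
```

The result records only the order of the contigs. Nothing is learned about the bases that lie between them, which is the information loss noted above.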

Haplotype phasing

This method is used for phasing heterozygous variants (variants having two different alleles of a specific gene or genes) into haplotype-resolved sequences, where a haplotype is a group of genetic determinants located on a single chromosome. Phasing helps analyze allele-specific expression, determine the parent of origin of de novo mutations, and so on. Phasing algorithms analyze the heterozygous variants in the genome and use read sequences or mapping technologies to link together the alleles that are present on the same chromosome.

This analysis is complicated by sequencing errors and uneven coverage, which can introduce false variants and cause true heterozygous variants to be missed.

Optimization frameworks are used to make phasing more robust by minimizing the disagreements between the haplotype assignment and the reads.
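One way to make this optimization concrete is to search for the pair of complementary haplotypes that disagrees with the reads at the fewest positions, an objective often called minimum error correction. The sketch below is an assumption about what such a framework might look like, not a specific tool's method: it encodes each read as a mapping from heterozygous site index to allele (0 or 1), tries every haplotype exhaustively, and therefore only works for a handful of sites. The names `disagreements` and `phase` are hypothetical.

```python
from itertools import product


def disagreements(read, haplotype):
    """Count covered sites where the read's allele differs from the haplotype."""
    return sum(allele != haplotype[site] for site, allele in read.items())


def phase(reads, n_sites):
    """Exhaustively find the haplotype pair with the fewest read disagreements."""
    best_cost, best_hap = None, None
    # Fix the first site to allele 0 so each complementary pair is tried once.
    for rest in product((0, 1), repeat=n_sites - 1):
        hap = (0,) + rest
        complement = tuple(1 - allele for allele in hap)
        # Each read is charged against whichever haplotype it fits better.
        cost = sum(
            min(disagreements(read, hap), disagreements(read, complement))
            for read in reads
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_hap = cost, hap
    return best_hap, tuple(1 - allele for allele in best_hap), best_cost


# Hypothetical reads over four heterozygous sites; the last read contains an error.
reads = [
    {0: 0, 1: 1}, {1: 1, 2: 1}, {2: 1, 3: 0},
    {0: 1, 1: 0}, {1: 0, 2: 0, 3: 1},
    {2: 0, 3: 0},
]
hap_a, hap_b, cost = phase(reads, n_sites=4)
print(hap_a, hap_b, cost)
```

Because the objective tolerates a small number of disagreements, a single erroneous read does not prevent the remaining reads from being linked into consistent haplotypes.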
