Kiko Garcia

About PacBio Sequencing and Its Applications

Blog Post created by Kiko Garcia on Dec 25, 2018

Brief Introduction to PacBio Sequencing

Single-molecule real-time (SMRT) sequencing, developed by Pacific Biosciences (PacBio), is an emerging third-generation DNA sequencing technology and the first commercially available long-read sequencing technology. Compared with second-generation sequencing (also called high-throughput sequencing) platforms such as Illumina and SOLiD, the PacBio sequencing system is significantly less expensive per run, does not rely on amplification for library generation, and supports shorter turnaround times.

PacBio sequencing produces two kinds of reads. One is the continuous long read (CLR), with an average error rate of ~15%; the other is the circular consensus sequencing (CCS) read, a shorter read built from multiple passes over the same insert, with an accuracy of >97%. Because CCS requires three or more full passes over the insert, the insert size is limited to <2.5 kb, whereas CLR reads can reach ~40 kb by using a DNA polymerase anchored in a zero-mode waveguide. In contrast, second-generation sequencing typically generates shorter reads, with a median length of ~100-250 bp for Illumina and ~500 bp for Roche 454. The CLR reads generated by the PacBio platform are therefore a key advance in high-throughput sequencing and are expected to benefit many genomic projects in the near future. Long reads can span extended repetitive regions, which gives them more power to reveal complex structural variation in DNA samples, for example by accurately pinpointing where copy number variations occur relative to a reference sequence. De novo genome assembly also benefits: long reads provide large scaffolds, and complete assembly of bacterial genomes is becoming routine on the PacBio platform.
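To see why repeated passes over the same insert can raise accuracy, here is a minimal Python sketch. It assumes a deliberately simple model that is not PacBio's actual consensus algorithm: every pass miscalls a given base independently with the same probability, and the consensus is wrong only when a majority of passes are wrong at that base. The function name `consensus_error` is made up for this illustration.

```python
from math import comb


def consensus_error(per_pass_error: float, passes: int) -> float:
    """Probability that a strict majority of passes miscall one base.

    Toy assumption: errors are independent between passes and all
    wrong passes agree with each other (a worst case for voting).
    """
    majority = passes // 2 + 1
    return sum(
        comb(passes, k)
        * per_pass_error ** k
        * (1 - per_pass_error) ** (passes - k)
        for k in range(majority, passes + 1)
    )


# Per-pass error of 15%, as quoted for CLR reads; odd pass counts avoid ties.
for n in (1, 3, 5, 7, 9):
    print(f"{n} passes: consensus error ~ {consensus_error(0.15, n):.4f}")
```

Under these assumptions the per-base error falls from 15% for a single pass to about 6.1% with three passes and about 2.7% with five. This is only a rough illustration of the trend and should not be expected to reproduce the accuracy figures quoted above.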

PacBio sequencing provides longer read lengths than second-generation sequencing technologies. It has revolutionized de novo genome assembly and enabled automated reconstruction of reference-quality genomes. Because of its wide range of applications, fast, high-fidelity sequencing simulators are needed to support the development and comparison of downstream analysis tools.
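As a rough idea of what such a simulator does at its core, the Python sketch below draws one noisy read from a reference sequence. It is a toy under stated assumptions, not any published simulator: the function name `simulate_read` is hypothetical, the error rate is applied per base, and errors are split evenly between substitutions, insertions, and deletions, which is an arbitrary choice made only for illustration.

```python
import random

BASES = "ACGT"


def simulate_read(reference, start, length, error_rate, rng):
    """Copy reference[start:start+length], corrupting each base with
    probability error_rate (assumed even mix of sub/ins/del errors)."""
    out = []
    for base in reference[start:start + length]:
        if rng.random() >= error_rate:
            out.append(base)  # base read correctly
            continue
        kind = rng.choice(("sub", "ins", "del"))
        if kind == "sub":
            # replace with one of the three other bases
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "ins":
            # keep the base and add a random extra one after it
            out.append(base)
            out.append(rng.choice(BASES))
        # "del": emit nothing for this base
    return "".join(out)


rng = random.Random(0)  # fixed seed so runs are repeatable
reference = "".join(rng.choice(BASES) for _ in range(1000))
read = simulate_read(reference, 100, 200, 0.15, rng)
print(len(read), read[:60])
```

A realistic simulator would also model the read-length distribution, quality values, and platform-specific error profiles; those are left out here to keep the sketch short.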

Applications to genome research: de novo assembly

De novo genome assembly is one of the main applications of PacBio sequencing because long reads can provide large scaffolds. PacBio long reads overcome many of the limitations of genome assembly from second-generation sequencing (SGS) data, such as the presence of highly repetitive genomic regions. Although the error rate of PacBio data is higher than that of SGS, increased coverage or hybrid sequencing can greatly improve accuracy. Attempts to perform de novo genome assembly using PacBio data began with small targets, such as microbial genomes. One example is the Hierarchical Genome-assembly Process (HGAP), developed by Chin et al., which generates a de novo assembly using PacBio sequencing data from a single, long-insert shotgun DNA library.
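The advantage of long reads in repetitive regions can be shown with a small Python check. The idea, sketched here under simple assumptions, is that a read helps resolve a repeat only if it covers the entire repeat plus some unique flanking sequence on both sides. The function name `spans_repeat`, the 50 bp anchor, and the coordinates are all hypothetical values chosen for illustration.

```python
def spans_repeat(read_start, read_length, repeat_start, repeat_length, anchor=50):
    """True if the read covers the whole repeat plus `anchor` unique
    bases on each side (coordinates are 0-based positions on the genome)."""
    read_end = read_start + read_length
    repeat_end = repeat_start + repeat_length
    return (read_start <= repeat_start - anchor
            and read_end >= repeat_end + anchor)


# A hypothetical 5 kb repeat starting at position 20,000.
short_read = spans_repeat(21_000, 250, 20_000, 5_000)     # 250 bp read inside the repeat
long_read = spans_repeat(18_000, 10_000, 20_000, 5_000)   # 10 kb read across the repeat
print(short_read, long_read)
```

The 250 bp read lies entirely inside the repeat and cannot be placed uniquely, while the 10 kb read reaches unique sequence on both sides, which is what lets an assembler connect the flanking regions.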

Applications to transcriptome research

Understanding the complete expression of a gene isoform (i.e., a transcript) is the basis of transcriptome research. Although SGS is often used for gene expression profiling, it is often unable to identify full-length gene isoforms and may introduce amplification bias. In complex eukaryotic genomes, SGS faces particularly severe limitations in transcript recall and in identifying splicing products. An assessment of SGS methods for transcript reconstruction found that, even when based on similar transcript models, expression level estimates vary widely across methods. Because PacBio sequencing produces longer reads, it can be used to identify transcripts more comprehensively.