Excitement has been building in the molecular biology community about long read sequencing, and by “long reads” I mean reads several kilobases (kb) in length.
PacBio has been offering long read sequencing since 2011 and is currently a leader in the long read sequencing segment. Although PacBio came first, its competitors are not far behind. It must be noted, however, that neither PacBio’s instruments nor runs bought from sequencing providers come cheap. This is something that the new technologies are aiming to beat.
The first to compete for this market segment are modified short read library prep approaches combined with clever bioinformatics pipelines that can, in theory, accurately assemble very long DNA stretches. Two companies that have announced such strategies are Illumina, with its TruSeq Synthetic Long-Read Technology (formerly Moleculo), and 10x Genomics, with its GemCode Platform.
The second group of competitors, which will soon reach the market, uses other single-molecule approaches. One of these is Oxford Nanopore, which has been beta-testing its MinION Platform in the field for about a year. The platform uses a protein pore as a sensor and calls nucleotide sequences from the changes in ion current generated when DNA passes through the pore. Although MinION reads are error-prone, some users have already reported results from sequencing a bacterial antibiotic resistance island, the E. coli genome and the S. cerevisiae genome.
But why such a buzz? What are the benefits for biologists of long reads over short reads?
Most obviously, genomes and transcriptomes will be much easier to assemble de novo. At the moment, a hybrid approach that combines long and short reads still yields better results than using either alone, because long read technologies have substantially higher error rates than short read technologies. When long read error rates or prices drop (and both probably will), short reads may become unnecessary for de novo assembly applications.
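One commonly cited reason for the assembly benefit, not spelled out above, is that a read can only resolve a repetitive region if it covers the whole repeat plus some unique sequence on both sides. The toy Python sketch below illustrates that idea. The repeat coordinates, read lengths, read spacing and the function name `count_spanning_reads` are all made-up assumptions for illustration, not data from any real platform or part of any assembler.

```python
def count_spanning_reads(read_starts, read_length, repeat_start, repeat_end, flank=50):
    """Count reads covering the whole repeat plus `flank` unique bases on each side.

    All coordinates are 0-based positions on a hypothetical reference;
    each read occupies the half-open interval [start, start + read_length).
    """
    spanning = 0
    for start in read_starts:
        end = start + read_length
        if start <= repeat_start - flank and end >= repeat_end + flank:
            spanning += 1
    return spanning


# Hypothetical 6 kb repeat at positions 20,000-26,000,
# with one read starting every 500 bp along a 50 kb region.
read_starts = range(0, 50_000, 500)

short_spanning = count_spanning_reads(read_starts, 150, 20_000, 26_000)
long_spanning = count_spanning_reads(read_starts, 10_000, 20_000, 26_000)

print("150 bp reads spanning the repeat:", short_spanning)
print("10 kb reads spanning the repeat:", long_spanning)
```

In this toy setup no 150 bp read can bridge the repeat, while several 10 kb reads do. The sketch ignores sequencing errors entirely, which is exactly where short reads still help today.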
For human genomics, the most interesting applications of long read technologies are haplotype genotyping, which distinguishes paternal from maternal single-nucleotide polymorphisms (SNPs), and the unraveling of structural genome variations (rearrangements). How long reads are instrumental for these applications is elegantly explained in a short clip from 10x Genomics and a recent LabSpace blog post.
The message you should take away from this is: You should be excited about long sequencing reads and think about how these can fit into your next genomics or transcriptomics project.
By Marko Petek, PhD, Research and Development Associate, BioSistemika LLC