qPCR, Microarrays and RNA-seq: Why and When Should You Choose One Over the Others?

It all basically depends on the goals of the project, your budget and the organism of interest. If you have to analyze the expression of a few genes (let’s say a maximum of 30) whose sequences you know, just go for qPCR. Why? It has the widest dynamic range, the lowest quantification limits and the least biased results of the three methods. It also needs very little starting material, and running 30 reactions per sample will still be cheaper than doing microarrays or RNA-seq. To make your research credible and reproducible, follow the MIQE guidelines (Minimum Information for Publication of Quantitative Real-Time PCR Experiments), which will tell you to include controls, check PCR efficiency, etc. qPCR is the gold standard for expression analysis, so if you use microarrays or RNA-seq you’ll have to use it anyway to confirm your results.
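
To make the “check PCR efficiency” advice concrete, here is a minimal Python sketch (an illustration, not something prescribed by MIQE itself). It estimates amplification efficiency from the slope of a standard curve, assuming a dilution series with known log10 template amounts, and then plugs the efficiencies into an efficiency-corrected (Pfaffl-style) expression ratio. The function names and example numbers are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pcr_efficiency(log10_input, cq_values):
    """Efficiency E from a standard curve of Cq versus log10(template amount).

    E = 10 ** (-1 / slope) - 1, so a slope of about -3.32 gives E = 1 (100 %).
    """
    slope, _ = linear_fit(log10_input, cq_values)
    return 10 ** (-1 / slope) - 1


def efficiency_corrected_ratio(e_target, e_ref, dcq_target, dcq_ref):
    """Pfaffl-style expression ratio of treated versus control.

    dcq_* = Cq(control) - Cq(treated) for the target and the reference gene.
    """
    return (1 + e_target) ** dcq_target / (1 + e_ref) ** dcq_ref


# Hypothetical ideal standard curve: 10x more template -> about 3.32 cycles earlier.
log10_copies = [1, 2, 3, 4, 5]
cqs = [35 - 3.3219 * x for x in log10_copies]
print(round(pcr_efficiency(log10_copies, cqs), 3))                 # 1.0
print(efficiency_corrected_ratio(1.0, 1.0, dcq_target=2, dcq_ref=0))  # 4.0
```

When both genes amplify with 100 % efficiency the ratio reduces to the familiar 2^-ΔΔCq; the point of measuring efficiency is that real assays rarely sit exactly at 100 %.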

If you don’t know yet which genes you should analyze, or you want to do a whole-transcriptome differential expression (DE) analysis and you have a good reference sequence for your organism, microarrays are a cheap and robust option. For microarrays, good bioinformatics and statistical practices are well established, and there are easy-to-use free software packages and pipelines (e.g. oneChannelGUI, dChip, Chipster, Orange) that can basically do everything for you. All the raw data together with the analysis results can easily fit on a normal-sized hard drive (maybe even a USB stick), and your laptop will be able to handle the analysis. The downsides of arrays are their low dynamic range and the need to know the sequences for probe design. And of course, arrays have gone out of fashion since the arrival of next-generation sequencing (NGS).
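
Quantile normalization, which RMA preprocessing applies to Affymetrix arrays, is one example of these established practices: it forces every array to share the same intensity distribution. The sketch below is a simplified, hypothetical pure-Python illustration, not code taken from any of the packages named above; real implementations also handle tied values and missing data.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    `arrays` is a list of equal-length lists, one list of probe
    intensities per array. Simplified: tied values are not averaged.
    """
    n_arrays = len(arrays)
    n_probes = len(arrays[0])
    # Mean intensity at each rank, taken across all sorted arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / n_arrays for r in range(n_probes)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity in this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Each probe keeps its rank within its own array; only the values are replaced by the cross-array mean for that rank.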

RNA-seq lets you look at differential expression over a broader dynamic range than microarrays, and with the same dataset you can examine sequence variation (SNPs, insertions, deletions) or even discover new genes and alternative splice variants. Bear in mind that it is still more expensive than arrays and presents a bigger challenge at the planning stage. First you’ll have to decide which technology to use (Illumina, SOLiD, Ion Torrent, PacBio or a combination of these), what kind of library preparation to go for (strand-specific or not, barcoded or not, PCR-amplified or not, rRNA depletion or oligo(dT) bead selection, uhhhh!) and what kind of sequencing run (read length, single or paired end). And that’s not all: you also have to decide how many reads to sequence. Is 100x transcriptome coverage enough? It may not be if you want to analyze lowly expressed genes. Once you get the data, you’ll again have to decide how to analyze it. There are already some good practices and commonly used pipelines (the Tuxedo protocol for RNA-seq and GATK for SNP calling), but since wet-lab technology advances really quickly, pipelines soon become obsolete, slow or simply stop working.
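
The coverage question can be turned into a quick back-of-the-envelope calculation with the Lander-Waterman-style estimate C = N × L / G (number of reads × read length / transcriptome size). The sketch below assumes a hypothetical 50 Mb transcriptome. Keep in mind that in RNA-seq the reads are spread very unevenly across genes, so a given mean coverage says little about how well lowly expressed genes are covered.

```python
import math


def reads_for_coverage(target_coverage, transcriptome_bp, read_length_bp, paired_end=False):
    """Reads (or read pairs, if paired_end) needed for a given mean coverage.

    Uses C = N * L / G, counting both mates of a pair as sequenced bases.
    """
    bases_per_fragment = read_length_bp * (2 if paired_end else 1)
    return math.ceil(target_coverage * transcriptome_bp / bases_per_fragment)


# Hypothetical 50 Mb transcriptome, 100 bp reads, 100x mean coverage.
print(reads_for_coverage(100, 50_000_000, 100))                   # 50000000 reads
print(reads_for_coverage(100, 50_000_000, 100, paired_end=True))  # 25000000 pairs
```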

Do you want to do the RNA-seq bioinformatics yourself and not pay a pile of money for software suites such as CLC Genomics? Then you’ll have to forget about nice graphical interfaces: it’s all Linux command-line programs and scripts in different programming languages, a great diversity of file formats, and no single tool that suits every dataset and every question you want to ask. Even if you just want to get fold changes out of an RNA-seq experiment, for each of the commonly performed steps (adapter trimming, filtering, alignment/mapping, counting, normalization and statistical analysis) you’ll have to choose from many software options that each claim to perform best (see the list of short-read alignment programs on Wikipedia). Of course, the fold changes and the number of DE genes you end up with will differ depending on the choices you make.
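
For the normalization and statistics steps, here is a minimal Python sketch of three common building blocks: counts-per-million (CPM) scaling, log2 fold change with a pseudocount, and Benjamini-Hochberg false discovery rate correction of per-gene p-values. It is only an illustration with hypothetical counts: dedicated DE packages such as edgeR and DESeq2 use more robust normalization (TMM, median-of-ratios) and their own statistical models, and the pseudocount of 1 is an arbitrary, assumed choice, exactly the kind of decision that shifts the fold changes.

```python
import math


def cpm(counts):
    """Counts per million: divide raw read counts by the library size."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(cpm_control, cpm_treated, pseudocount=1.0):
    """log2(treated / control); the pseudocount avoids log(0) and division by zero."""
    return {
        gene: math.log2((cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount))
        for gene in cpm_control
    }


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum
    # so the adjusted values stay monotone.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw counts for two libraries.
control = cpm({"geneA": 10, "geneB": 30, "geneC": 60})
treated = cpm({"geneA": 40, "geneB": 30, "geneC": 30})
print(log2_fold_change(control, treated))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```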

Hiring a bioinformatician and/or a statistician means more $$ on top of the library prep and sequencing $$. Also consider that the files you get from the sequencer are big (a few GB per sample), so you’ll have to check whether you need more storage space and computing power: this again means additional $$.
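
To see where “a few GB per sample” comes from, here is a rough estimate of uncompressed FASTQ size. It assumes the standard four-line FASTQ record (header, sequence, “+” separator, quality string), an average header of 50 characters, and a hypothetical single-end sample of 20 million 100 bp reads; paired-end data doubles the figure, while gzip-compressed files are considerably smaller.

```python
def fastq_size_gb(n_reads, read_length, header_chars=50):
    """Rough uncompressed FASTQ size in gigabytes (10**9 bytes).

    One record = header line + sequence line + '+' line + quality line,
    each terminated by a newline; header_chars is an assumed average.
    """
    bytes_per_record = (header_chars + 1) + (read_length + 1) + 2 + (read_length + 1)
    return n_reads * bytes_per_record / 1e9


print(fastq_size_gb(20_000_000, 100))  # roughly 5.1 GB for one single-end sample
```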

RNA-seq is the only clever option if you want to find DE genes in an organism with a large, unsequenced genome (if you’re working with a small genome, such as a bacterial one, sequence the genome first!). Without a reference genome you’ll have to assemble the transcriptome de novo. For that you’ll need a lot of RAM and some serious computing power if you don’t want to wait for ages. Again, there are at least five of “the best” programs/pipelines to do it, which means you can end up with several versions of the transcriptome assembly and have to choose one as your reference. Some guidelines for assessing transcriptome assembly quality already exist, but they may not hold true for all organisms and datasets. The problem is that no metric really guarantees that the transcripts you’re interested in have been assembled correctly. When it comes to interpreting RNA-seq results, a transcriptome assembly simply doesn’t give you the confidence of a good-quality reference sequence.
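
N50 is one of the most widely reported assembly statistics and illustrates the point: it is the contig length L such that contigs of length L or longer contain at least half of all assembled bases. A minimal sketch follows. Note that a high N50 says nothing about whether individual transcripts are correct, and merging transcripts into chimeric contigs can even increase it.

```python
def n50(contig_lengths):
    """Length L such that contigs >= L hold at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


print(n50([100, 200, 300, 400, 500]))  # 400
```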

Although RNA-seq is an invaluable tool to study gene expression and variation, make sure you carefully plan the experiments and estimate the costs before you decide which method to use.


By Marko Petek, PhD, Research and Development Associate, BioSistemika LLC


