A Comparison of Base-calling Algorithms for Illumina Sequencing Technology

Brief Bioinform. 2016 Sep;17(5):786-95. doi: 10.1093/bib/bbv088. Epub 2015 Oct 5.

Abstract

Recent advances in next-generation sequencing technology have increased both cost-effectiveness and throughput per run, which in turn has greatly influenced the analysis of DNA sequences. Among the various sequencing technologies, Illumina is by far the most widely used platform. However, the Illumina sequencing platform suffers from several imperfections that can be attributed to the chemical processes inherent to the sequencing-by-synthesis technology. With the enormous number of reads produced, statistical methodologies and computationally efficient algorithms are required to improve the accuracy and speed of base-calling. Over the past few years, several papers have proposed methods to model the various imperfections, giving rise to accurate and/or efficient base-calling algorithms. In this article, we provide a comprehensive comparison of the performance of recently developed base-callers and present a general statistical model that unifies a large majority of these base-callers.

Keywords: Illumina; base-calling; next-generation sequencing.

Publication types

  • Comparative Study

MeSH terms

  • Algorithms*
  • Base Sequence
  • High-Throughput Nucleotide Sequencing
  • Models, Statistical
  • Sequence Analysis, DNA