Recent advances in next-generation sequencing technology have lowered the cost and raised the throughput of each run, which in turn has greatly influenced the analysis of DNA sequences. Among the various sequencing technologies, Illumina is by far the most widely used platform. However, the Illumina platform suffers from several imperfections that can be attributed to the chemical processes inherent to sequencing-by-synthesis technology. Given the enormous number of reads produced, statistical methodologies and computationally efficient algorithms are required to improve the accuracy and speed of base-calling. Over the past few years, several papers have proposed methods to model these imperfections, giving rise to accurate and/or efficient base-calling algorithms. In this article, we provide a comprehensive comparison of the performance of recently developed base-callers and present a general statistical model that unifies a large majority of them.
Keywords: Illumina; base-calling; next-generation sequencing.
© The Author 2015. Published by Oxford University Press. For Permissions, please email: [email protected].