featureCounts: an efficient general purpose program for assigning sequence reads to genomic features

Yang Liao; Gordon K Smyth; Wei Shi

doi:10.1093/bioinformatics/btt656

Bioinformatics. 2014 Apr 1;30(7):923-30. doi: 10.1093/bioinformatics/btt656. Epub 2013 Nov 13.

Authors

Yang Liao¹, Gordon K Smyth, Wei Shi

Affiliation

¹ Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052; Department of Computing and Information Systems and Department of Mathematics and Statistics, The University of Melbourne, Parkville, VIC 3010, Australia.

PMID: 24227677
DOI: 10.1093/bioinformatics/btt656

Abstract

Motivation: Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to each genomic feature, for example to each exon or each gene. The process of counting reads is called read summarization. Read summarization is required for a great variety of genomic analyses but has so far received relatively little attention in the literature.

Results: We present featureCounts, a read summarization program suitable for counting reads generated from either RNA or genomic DNA sequencing experiments. featureCounts implements highly efficient chromosome hashing and feature blocking techniques. It is considerably faster than existing methods (by an order of magnitude for gene-level summarization) and requires far less computer memory. It works with either single-end or paired-end reads and provides a wide range of options appropriate for different sequencing applications.
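
To make the idea of read summarization concrete, the following is a toy Python sketch, not the featureCounts implementation. It assumes features and reads are given as plain tuples with 1-based inclusive coordinates, uses a dictionary keyed by chromosome name as a loose stand-in for chromosome hashing, and leaves unassigned any read that overlaps zero features or more than one. The function names `build_index` and `count_reads`, the input layout, and that assignment rule are all assumptions made for illustration; feature blocking is not modelled, and the scan within a chromosome is deliberately naive.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(features):
    """Group features by chromosome (hash lookup) and sort each group by start."""
    index = defaultdict(list)
    for name, chrom, start, end in features:
        index[chrom].append((start, end, name))
    for chrom in index:
        index[chrom].sort()
    return index


def count_reads(features, reads):
    """Count reads per feature; reads hitting 0 or >1 features stay unassigned."""
    index = build_index(features)
    # Pre-extract the sorted start positions so bisect can be used per chromosome.
    starts = {chrom: [f[0] for f in feats] for chrom, feats in index.items()}
    counts = {name: 0 for name, _, _, _ in features}
    unassigned = 0
    for chrom, r_start, r_end in reads:
        feats = index.get(chrom, [])
        # Only features that start at or before the read end can overlap it.
        hi = bisect_right(starts.get(chrom, []), r_end)
        # Naive linear scan of the candidates (hypothetical simplification).
        hits = {name for start, end, name in feats[:hi] if end >= r_start}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        else:
            unassigned += 1
    return counts, unassigned


# Hypothetical example data: (name, chrom, start, end) and (chrom, start, end).
features = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 150, 300),
    ("geneC", "chr2", 50, 120),
]
reads = [
    ("chr1", 90, 110),   # overlaps geneA only
    ("chr1", 160, 190),  # overlaps geneA and geneB, left unassigned
    ("chr1", 250, 280),  # overlaps geneB only
    ("chr2", 100, 130),  # overlaps geneC only
    ("chr3", 1, 50),     # chromosome has no features, left unassigned
]
counts, unassigned = count_reads(features, reads)
print(counts, unassigned)
```

Looking up the chromosome first means each read is compared only against features on its own chromosome, which is the intuition behind hashing by chromosome; a production tool would also need a far more efficient search within each chromosome than the linear scan used here.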

Availability and implementation: featureCounts is available under the GNU General Public License as part of the Subread (http://subread.sourceforge.net) or Rsubread (http://www.bioconductor.org) software packages.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Genome
Genomics / methods*
High-Throughput Nucleotide Sequencing
Histones / chemistry
Histones / genetics
Sequence Analysis, RNA
Software*

Substances

Histones