EcoCyc Project Overview

The EcoCyc database describes the genome, metabolic pathways, and regulatory network of E. coli K-12 substr. MG1655. The long-term goal of the project is to describe the molecular catalog of the E. coli cell, as well as the functions of each of its molecular parts, to facilitate a system-level understanding of E. coli. EcoCyc is an electronic reference source for E. coli biologists, and for biologists who work with related microorganisms.

The value of EcoCyc comes from its rich, extensively curated data content and its broad set of bioinformatics tools, both of which are described below.

"EcoCyc" sounds like "ecology" and like "encyclopedia".

EcoCyc Data Content

Genome. EcoCyc contains the complete genome sequence of E. coli, and describes the nucleotide position and function of every E. coli gene. EcoCyc curators update the annotation of the E. coli genome on an ongoing basis using a literature-based curation strategy. Curators author mini-review summaries of E. coli genes, and annotate them with Gene Ontology terms. Users can retrieve the nucleotide sequence of a gene, and the amino-acid sequence of a gene product.

Proteome. EcoCyc describes each gene product and the protein complexes that the gene products form. The database includes protein features (e.g., enzyme active sites); protein localizations; and enzyme cofactors, activators, and inhibitors.

Regulation. EcoCyc contains the most complete description of the regulatory network of any organism, capturing substrate-level enzyme regulation, attenuation, and regulation by small RNAs. Regulation of transcription initiation is curated by the group of Dr. Julio Collado-Vides at the UNAM. Actual curation of the data occurs within EcoCyc, and the information is periodically propagated to RegulonDB. Both databases therefore have the same data content on transcriptional regulation of gene expression.

Metabolism. EcoCyc describes all known metabolic pathways and signal-transduction pathways of E. coli. It describes each metabolic enzyme and the reaction(s) that the enzyme catalyzes. See also the MetaCyc project.

Gene Essentiality. Several gene essentiality datasets are included.

Literature-Based Curation. Curation is the process of manually refining and updating a database. As of February 2024, EcoCyc has encoded information from more than 44,142 publications. Curators author textual summaries with extensive citations for proteins, RNAs, pathways, and operons that capture phenotypes caused by mutation, depletion, or overproduction of each gene product; any genetic interactions known; protein domain architecture and structural studies; similarity to other proteins; or any functional complementation experiments that have been described. Summaries can also be used to note cases in which the published reports present contradictory results. In such cases, both viewpoints will be presented with proper attribution.

See also the list of data sources from which EcoCyc integrates data.

EcoCyc Bioinformatics Tools

More than 60 software tools support a number of search, visualization, and analysis operations for EcoCyc data.

Search and visualization. Scientists can use the EcoCyc web site to search for genes, pathways, metabolites, etc. The navigation capabilities of the software enable a user to move from a display of an enzyme to a display of a reaction that the enzyme catalyzes, or to the gene that encodes the enzyme. The EcoCyc genome browser visualizes the layout of genes within the E. coli chromosome, and the metabolic network browser provides a zoomable view of the full E. coli metabolic network.

Analysis of omics data. EcoCyc provides four tools for analysis of single-omics datasets. Two of these tools, the Cellular Omics Viewer and the Dashboard, can also be used to analyze multi-omics datasets.

Comparative genomics. We have computed orthologs between EcoCyc and the 20,000 other microbial genomes within BioCyc, including 502 other E. coli strains. The genome browser can be run in a comparative mode that aligns multiple chromosomes at orthologous genes. Metabolic network comparison tools are also available.

For More Information

EcoCyc Certification

EcoCyc has been designated a Global Core Biodata Resource by the Global Biodata Coalition, meaning that EcoCyc is one of 37 resources whose long-term funding and sustainability are critical to life-science and biomedical research worldwide.

We Encourage Your Feedback

Feedback from the scientific community has been invaluable to improving EcoCyc during its many years of development. We strongly encourage your comments and suggestions for improvements. Please email us your suggestions or questions.

Acknowledgments

Contributors to EcoCyc are listed on the credits page.

The development of EcoCyc is funded by NIH grant 1R24GM150703 from the NIH National Institute of General Medical Sciences.