Peptides are polymers of amino acids and constitute one of the major classes of molecules in living organisms. Peptides and small proteins function in diverse biological processes, and they frequently convey molecular signals by binding to cell-surface receptors and regulating intracellular signaling pathways. Peptide hormones, neurotransmitters, and neuropeptides are examples of such molecules that serve important signaling functions in organisms with central nervous systems. Several outstanding questions remain in the field of peptide biology: How many biologically active peptides exist? How are they produced? What biological functions do they serve? How are their functions regulated? Which biological pathways do they regulate? Answering these questions will not only reveal the diversity of peptide and small protein effectors in biological systems but also provide a deeper understanding of how these biological processes are regulated.
Recent proteogenomic studies that combine next-generation sequencing with proteomics have revealed hundreds of peptides and small proteins, also called microproteins or small open reading frame (smORF)-encoded polypeptides, in Escherichia coli, Saccharomyces cerevisiae, Mus musculus, and Homo sapiens. Because proteogenomic techniques have successfully identified smORFs and microproteins, we asked whether they could also be applied to other classes of peptides and small proteins that have historically been challenging to detect: peptide hormones, neurotransmitters, and neuropeptides.
First, we developed an integrated proteogenomic strategy optimized for the detection of peptides and proteins. Applying this strategy to mouse brain tissue, we identified known peptide hormones, neurotransmitters, and neuropeptides, as well as microproteins encoded by unannotated smORFs that may be candidates for similar biological functions. We then applied the same strategy to extracellular fluids and detected secreted microproteins encoded by unannotated smORFs. Finally, we characterized the biochemical structure of the human neuropeptide C4ORF48 and its mouse ortholog Gm1673. Taken together, our findings expand the known diversity of the human and mouse genomes and proteomes, and the approach reported here could be applied more broadly to discover and characterize microproteins in other organisms.