Machine learning for predicting antibody-antigen interaction from amino acid sequences

Access & Terms of Use
open access
Copyright: Ye, Chao
Abstract
Background: The mammalian immune system can generate antibodies against a huge variety of antigens, including bacteria, viruses and toxins. “Ultra-deep” DNA sequencing of rearranged immunoglobulin genes has considerable potential to further our understanding of the immune response, but is limited by the lack of a high-throughput, sequence-based method for predicting the antigen(s) a given immunoglobulin will recognize.

Objective: As a step towards predicting antibody-antigen binding from sequence data alone, we aimed to compare a range of machine learning approaches on a collated dataset of antibody-antigen pairs.

Methods: Data for training and testing were extracted from the PDB and CoV-AbDab databases, and additional antibody-antigen pair data were generated using a molecular docking protocol. Several machine learning methods, including weighted nearest neighbor, nearest neighbor with BLOSUM62 matrices, and random forests, were applied to the problem.

Results: The final dataset contained 1157 antibodies and 57 antigens, combined in 5041 antibody-antigen (Ab-Ag) pairs. The best performance for prediction of interactions was obtained using nearest neighbor with BLOSUM62 matrices, which achieved around 82% accuracy on the full dataset. These results provide a useful frame of reference, as well as protocols and considerations for machine learning and dataset creation in this area.

Conclusions: Several machine learning approaches were compared for predicting antibody-antigen interaction from protein sequences. Both the dataset (in CSV format) and the machine learning program (written in Python) are freely available for download at https://github.com/jessye123/ab-ag-seq-machine-learning
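
To make the nearest neighbor idea in the Methods concrete, the following is a minimal sketch of a 1-nearest-neighbor classifier over antibody-antigen sequence pairs that uses a substitution score as the similarity measure. It is not the thesis implementation. The function names, the ungapped position-wise comparison, the length normalisation, the way the two similarities are summed, and the toy sequences and labels are all assumptions made for illustration. In particular, `toy_score` is a stand-in for BLOSUM62; a real matrix could be loaded instead, for example with Biopython's `Bio.Align.substitution_matrices.load("BLOSUM62")`.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]           # (antibody sequence, antigen sequence)
Example = Tuple[Pair, int]       # (pair, label): 1 = binds, 0 = does not bind
ScoreFn = Callable[[str, str], int]


def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for a BLOSUM62 lookup: reward identity only."""
    return 4 if a == b else -1


def similarity(s1: str, s2: str, score: ScoreFn = toy_score) -> float:
    """Ungapped, position-wise similarity, normalised by the longer length.

    Assumption: no alignment is performed; residues are compared by index.
    """
    if not s1 or not s2:
        return 0.0
    total = sum(score(a, b) for a, b in zip(s1, s2))
    return total / max(len(s1), len(s2))


def pair_similarity(p: Pair, q: Pair, score: ScoreFn = toy_score) -> float:
    """Similarity of two Ab-Ag pairs: antibody part plus antigen part."""
    return similarity(p[0], q[0], score) + similarity(p[1], q[1], score)


def predict_1nn(query: Pair, training: List[Example],
                score: ScoreFn = toy_score) -> int:
    """Return the label of the most similar training pair."""
    _, label = max(training,
                   key=lambda ex: pair_similarity(query, ex[0], score))
    return label


# Toy data, invented for illustration only (not taken from the dataset).
train: List[Example] = [
    (("QVQLVQ", "MKTAYI"), 1),
    (("EVQLLE", "GSHMAA"), 0),
]

print(predict_1nn(("QVQLVE", "MKTAYL"), train))
```

Passing the scoring function as a parameter keeps the classifier independent of the matrix, so swapping the toy score for a BLOSUM62 lookup would not require changes to `predict_1nn`.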
Publication Year
2024
Resource Type
Thesis
Degree Type
PhD Doctorate
UNSW Faculty