Machine learning for predicting antibody-antigen interaction from amino acid sequences

Ye, Chao

doi:10.26190/unsworks/30143

Access & Terms of Use

open access
Copyright: Ye, Chao

CC BY 4.0

Abstract

Background: The mammalian immune system can generate antibodies against a huge variety of antigens, including bacteria, viruses and toxins. “Ultra-deep” DNA sequencing of rearranged immunoglobulin genes has considerable potential to further our understanding of the immune response, but its usefulness is limited by the lack of a high-throughput, sequence-based method for predicting the antigen(s) a given immunoglobulin will recognize.

Objective: As a step towards predicting antibody-antigen binding from sequence data alone, we aimed to compare a range of machine learning approaches applied to a collated dataset of antibody-antigen pairs.

Methods: Training and test data were extracted from the PDB and CoV-AbDab databases, and additional antibody-antigen pairs were generated using a molecular docking protocol. Several machine learning methods were applied to the problem, including weighted nearest neighbor, nearest neighbor with BLOSUM62 matrices, and random forests.

Results: The final dataset contained 1157 antibodies and 57 antigens, combined into 5041 antibody-antigen pairs. The best prediction performance was obtained using nearest neighbor with BLOSUM62 matrices, which achieved around 82% accuracy on the full dataset. These results provide a useful frame of reference, as well as protocols and considerations for machine learning and dataset creation in this area.

Conclusions: Several machine learning approaches were compared for predicting antibody-antigen interaction from protein sequences. Both the dataset (in CSV format) and the machine learning program (written in Python) are freely available for download at https://github.com/jessye123/ab-ag-seq-machine-learning
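The abstract does not specify how the nearest-neighbor method scores sequence similarity with BLOSUM62. As an illustration only, the Python sketch below shows one way such a classifier could work. It assumes, without support from the thesis, that two antibody-antigen pairs are compared by summing the Needleman-Wunsch global-alignment scores of their antibody and antigen sequences (BLOSUM62, linear gap penalty of -4), that sequences contain only the 20 standard uppercase residues, and that each training pair carries a binary binding label (1 = binds, 0 = does not). The names `align_score` and `predict` are hypothetical; the author's actual program is in the linked GitHub repository and may differ.

```python
"""Hypothetical 1-nearest-neighbor antibody-antigen binding predictor using BLOSUM62.

Illustrative assumptions (not taken from the thesis): pairs are compared by summing
the global-alignment scores of their antibody and antigen sequences, using BLOSUM62
and a linear gap penalty of -4; sequences use only the 20 standard residues.
"""
from typing import List, Tuple

_ORDER = "ARNDCQEGHILKMFPSTWYV"
# BLOSUM62 substitution scores; rows and columns follow _ORDER.
_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""

# Map each (residue, residue) pair to its BLOSUM62 score.
BLOSUM62 = {
    (a, b): int(score)
    for a, row in zip(_ORDER, _ROWS.strip().splitlines())
    for b, score in zip(_ORDER, row.split())
}


def align_score(s: str, t: str, gap: int = -4) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(t) + 1)]  # row for the empty prefix of s
    for i, a in enumerate(s, start=1):
        cur = [i * gap]  # s[:i] aligned against the empty prefix of t
        for j, b in enumerate(t, start=1):
            cur.append(max(prev[j - 1] + BLOSUM62[(a, b)],  # substitution
                           prev[j] + gap,                    # s[i-1] against a gap
                           cur[j - 1] + gap))                # t[j-1] against a gap
        prev = cur
    return prev[-1]


def predict(query: Tuple[str, str], training: List[Tuple[str, str, int]]) -> int:
    """Return the binding label (1/0) of the most similar training pair."""
    ab, ag = query
    nearest = max(training,
                  key=lambda rec: align_score(ab, rec[0]) + align_score(ag, rec[1]))
    return nearest[2]


if __name__ == "__main__":
    # Toy sequence fragments invented for illustration; not records from the dataset.
    train = [("EVQLVESGGG", "KVFGRCELAA", 1),
             ("QVQLQQWGAG", "MFVFLVLLPL", 0)]
    print(predict(("EVQLVESGGS", "KVFGRCELAV"), train))  # prints 1
```

Scoring with a substitution matrix rather than plain percent identity lets conservative replacements (for example, I to V, which scores 3 in BLOSUM62) count toward similarity, so a query with a few such substitutions still lands next to its closest relative.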

Publication Year

2024

Resource Type

Thesis

Degree Type

PhD Doctorate

UNSW Faculty

Files

public version.pdf

4.83 MB

Adobe Portable Document Format
