Beyond MHC Binding: Immunogenicity Prediction Tools To Refine Neoantigen Selection in Cancer Patients
*Correspondence: María Marcela Barrio, Centro de Investigaciones Oncológicas, Fundación Cáncer, Ciudad Autónoma de
Buenos Aires C1426ANZ, Argentina. [email protected]
Academic Editor: Pierre-Antoine Gourraud, Public Health Université de Nantes, France
Received: October 21, 2022 Accepted: January 29, 2023 Published: April 25, 2023
Cite this article: Carri I, Schwab E, Podaza E, Garcia Alvarez HM, Mordoh J, Nielsen M, et al. Beyond MHC binding:
immunogenicity prediction tools to refine neoantigen selection in cancer patients. Explor Immunol. 2023;3:82–103.
https://doi.org/10.37349/ei.2023.00091
Abstract
In recent years, multiple efforts have been made to accurately predict neoantigens derived from somatic
mutations in cancer patients, either to develop personalized therapeutic vaccines or to study immune
responses after cancer immunotherapy. In this context, the increasing accessibility of paired whole-exome
sequencing (WES) of tumor biopsies and matched normal tissue as well as RNA sequencing (RNA-Seq)
has provided a basis for the development of bioinformatics tools that predict and prioritize neoantigen
candidates. Most pipelines rely on the binding prediction of candidate peptides to the patient’s major
histocompatibility complex (MHC), but these methods return a high number of false positives since they lack
information related to other features that influence T cell responses to neoantigens. This review explores
available computational methods that incorporate information on T cell preferences to predict their activation
after encountering a peptide-MHC complex. Specifically, it covers methods that predict i) biological features that
may increase the availability of a neopeptide to be exposed to the immune system, ii) metrics of self-similarity
representing the likelihood that a neoantigen breaks immune tolerance, iii) pathogen immunogenicity, and
iv) tumor immunogenicity. This review also describes the characteristics of these tools and addresses their
performance in the context of a novel benchmark dataset of experimentally validated neoantigens from
patients treated with a melanoma vaccine (VACCIMEL) in a phase II clinical study. The overall results of the
evaluation indicate that current tools have a limited ability to predict the activation of a cytotoxic response
against neoantigens. Based on this result, the limitations that make this problem an unsolved challenge in
immunoinformatics are discussed.
© The Author(s) 2023. This is an Open Access article licensed under a Creative Commons Attribution 4.0 International
License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution
and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the
original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Introduction
Neoantigens are defined as patient-specific antigens that arise from tumor-specific genetic variations
such as somatic mutations, gene fusions, and alternative splicing variants [1]. Other variants that
expand the repertoire of targetable neoantigens for cancer immunotherapy are derived from aberrant
transcription-induced chimeric RNAs, generated from trans-splicing of precursor mRNAs or via cis-splicing of
adjacent genes [2, 3], post-translational modifications [4, 5], and transposable elements [6]. Consequently,
neoantigens are expressed only in tumor tissues, and the immune response against them is therefore highly
tumor-specific. Multiple studies have demonstrated that T cells can recognize these neoantigens and
distinguish tumor cells from normal cells [7]. In this scenario, targeting strongly immunogenic neoepitopes
has relevant therapeutic potential. Highly mutated tumors allow more neoepitopes to emerge and
be recognized by T cells and, indeed, respond better to immunotherapy with immune checkpoint inhibitors (ICI),
monoclonal antibodies that block immune checkpoints. This has been demonstrated in melanoma as well as in lung cancer,
urothelial cancer [8], and mismatch repair-deficient tumors [9]. Yet even when the tumor mutational burden (TMB)
is high, major histocompatibility complex (MHC) allele homozygosity or the expression of MHCs with highly
similar recognition motifs can limit the number of presented peptides in a given individual [10].
However, the generation of immunogenic neoepitopes is not the only factor that influences whether a
variant results in clinically relevant tumor cell recognition and lysis by T cells. Proteins containing mutated
peptides must be efficiently transcribed, translated, processed by the antigen-processing cells, and loaded
onto MHC molecules for presentation on the cell surface to be recognized by a T cell (Figure 1). Alterations
in genes that modulate these processes, as well as downregulation of MHC expression in tumor cells, can
abrogate the immunogenicity of neoepitopes in cancer patients [11, 12].
Figure 1. Steps involved in antigen processing, presentation, and T cell recognition. Four categories of computational
predictive tools that address each aspect were defined, which are displayed in the lower panels. This figure was created
with BioRender.com. iPCPS: improved proteasome cleavage prediction server; IEDB: Immune Epitope Database;
INeo-Epp: immunogenic epitope/neoepitope prediction; TA predictor: tumor antigen predictor; PRIME: Predictor of Immunogenic
Epitopes; iTTCA-RF: Identification of Tumor T cell Antigens-Random Forest
Accurate neoepitope prediction pursues several purposes: i) to design personalized cancer treatments,
such as neoantigen-targeted vaccines [13–15] and adoptive cell therapies [16], where the immune system
is stimulated to recognize neoantigens, increasing the frequency of specific CD8+ T cells and potentially
ProteaSMM: For most MHC I ligands, proteasomal cleavage at the C-terminus is the first step in antigen
processing [45]. ProteaSMM [46] is a tool that uses the stabilized matrix method (SMM) algorithm for
predicting proteasomal cleavages. The authors constructed two different matrices that account for digests
Figure 2. Characteristics of the in-house neoepitope dataset. (A) Immunogenic fraction of neoepitopes per patient; (B) predicted
binding to corresponding patient’s MHC for immunogenic and non-immunogenic neopeptides (Mann Whitney U = 0.498, n.s.);
(C) immunogenic fraction of neoepitopes per length; (D) immunogenic fraction of neoepitopes per MHC allele
Figure 3. Amino acid enrichment in central positions of immunogenic (top) vs. non-immunogenic (bottom) peptides from
different sources. The amino acids discussed are shown in blue. (A) Neopeptides from in-house dataset (immunogenic: 26,
non-immunogenic: 68); (B) neopeptides from the CEDAR and NEPdb databases (immunogenic: 527, non-immunogenic: 2541);
(C) viral peptides from the IEDB (immunogenic: 367, non-immunogenic: 7080)
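The kind of comparison summarized in Figure 3 can be sketched as a per-residue log2 ratio of frequencies between immunogenic and non-immunogenic peptides. The following is only a minimal illustration and not the procedure used to produce the figure: the definition of "central positions" (here, skipping the first three residues and the last one), the pseudocount, the function names, and the toy peptides are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def central_residues(peptide: str, n_skip: int = 3, c_skip: int = 1) -> str:
    """Return the residues between the N- and C-terminal flanks.

    ASSUMPTION: 'central' means dropping the first `n_skip` and the last
    `c_skip` residues; the source does not define the window in this excerpt.
    """
    return peptide[n_skip: len(peptide) - c_skip]


def residue_frequencies(peptides: Sequence[str], pseudocount: float = 1.0) -> Dict[str, float]:
    """Frequency of each standard amino acid in central positions (with pseudocount)."""
    counts: Counter = Counter()
    for peptide in peptides:
        counts.update(central_residues(peptide))
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def log2_enrichment(immunogenic: Sequence[str], non_immunogenic: Sequence[str]) -> Dict[str, float]:
    """log2(frequency in immunogenic / frequency in non-immunogenic) per residue."""
    f_pos = residue_frequencies(immunogenic)
    f_neg = residue_frequencies(non_immunogenic)
    return {a: math.log2(f_pos[a] / f_neg[a]) for a in AMINO_ACIDS}


# Hypothetical toy peptides, not taken from any dataset in this article.
toy_immunogenic = ["AAAFFFFFA"]
toy_non_immunogenic = ["AAALLLLLA"]
toy_enrichment = log2_enrichment(toy_immunogenic, toy_non_immunogenic)
```

A positive value indicates a residue that is more frequent in the central positions of immunogenic peptides. With groups as small as the immunogenic fraction of the in-house dataset (26 peptides), such ratios are noisy, and the pseudocount strongly influences residues that are rarely observed.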
Table 2. Area under the receiver operating characteristic curve (AUC ROC) of the best-performing methods on the in-house dataset

Category   Method                     AUC ROC
i          MHCflurry processing       0.609
iv         PRIME score                0.604
i          Variant allele frequency   0.6
iv         INeo-Epp neoantigen        0.584
i          HLAthena MSiCE             0.58
i          ProteaSMM c                0.58
i          HLAthena MSiC              0.576
i          MHCflurry PS               0.571
i          NetCTLpan TAP              0.568
i          ProteaSMM i                0.561
N/A        MixMHCpred                 0.556
iv         TA predictor               0.552

Tools are grouped by the categories established in this article (i: biological features; iv: tumor immunogenicity).
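To make the metric in Table 2 concrete, the AUC ROC can be read as the probability that a randomly chosen immunogenic peptide receives a higher score than a randomly chosen non-immunogenic one, with ties counted as one half. The sketch below computes it directly from that definition. It is an illustration only: the function name and the toy scores are hypothetical and do not come from the evaluation reported here.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5.

    `labels` uses 1 for immunogenic and 0 for non-immunogenic peptides.
    Higher scores are assumed to mean 'more likely immunogenic'.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy predictions for six peptides (two immunogenic).
toy_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
toy_labels = [1, 0, 1, 0, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)  # 6 of 8 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to random ranking, which puts the values in Table 2 (0.55 to 0.61) in perspective. For outputs where lower values indicate better candidates, such as percentile ranks, the scores would need to be negated before applying this function.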
Interferon gamma (IFNγ) enzyme-linked immunospot (ELISPOT) assays were used to assess
the immune responses against our in-house neopeptide dataset. This technique quantifies the number of
specific T cells that secrete IFNγ in response to a given stimulus [94]. Considering the number of observed spots relative
to the nonspecific background, a quantitative value reflecting the strength of the immunogenicity was
defined (Supplementary materials). It can be hypothesized that some of the tools better predict the most
immunogenic neoepitopes, i.e., those capable of eliciting the highest number of IFNγ-producing cells. To
identify such tools, the correlation between the quantitative ELISPOT values and the estimations of the
reviewed methods was calculated. For the immunogenic neoepitopes, the ELISPOT values were positively associated
with the presentation predicted by NetMHCpan 4.0 and MHCflurry (Figure 4); for NetMHCpan this appears as a
negative coefficient because lower EL rank values indicate stronger predicted presentation. With the entire dataset (which
also contains non-immunogenic neopeptides), no significant association was found (Table S1). These results
Figure 4. Correlation between immunogenicity values obtained from IFNγ ELISPOT assays and values obtained with predictive
methods. (A) NetMHCpan 4.0 EL rank (Pearson’s correlation test, r = –0.399; Spearman’s correlation test, ρ = –0.31);
(B) MHCflurry presentation score (Pearson’s correlation test, r = 0.38; Spearman’s correlation test, ρ = 0.47)
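The two coefficients reported in Figure 4 capture different things: Pearson's r measures linear association, whereas Spearman's ρ is Pearson's r computed on ranks and therefore measures monotonic association. The following sketch implements both with the standard library only. The helper names and the toy values are assumptions for illustration and are unrelated to the measurements in this article.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: ELISPOT-derived values vs. a percentile rank score
# (lower rank = stronger predicted presentation, hence a negative coefficient).
toy_elispot = [10.0, 20.0, 40.0, 80.0, 160.0]
toy_el_rank = [2.0, 1.5, 1.0, 0.4, 0.1]
toy_r = pearson(toy_elispot, toy_el_rank)
toy_rho = spearman(toy_elispot, toy_el_rank)
```

In this toy case the relationship is perfectly monotonic but not linear, so ρ is exactly -1 while r is negative but weaker. This is one reason the two coefficients in Figure 4 can differ for the same predictor.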
Discussion
This article has reviewed multiple bioinformatic and immunoinformatic tools proposed to contribute to
the prediction of immunogenic neopeptide candidates, besides MHC binding. A common characteristic of
these tools is that the sequence of the mutated peptide is the most relevant information considered. Other
characteristics derived from or complementary to the peptide sequence are: i) peptide availability (e.g., processing,
presentation, and abundance); ii) T cell availability (e.g., self-similarity and foreignness); iii, iv) TCR
preferences [e.g., location and type (charge, size, etc.) of amino acids in mutated peptides].
To evaluate the methods reviewed here, a novel neoepitope dataset was assembled. In the evaluation,
it was observed that most of the methods misclassify immunogenic and non-immunogenic neopeptides.
The authors acknowledge that the small number of peptides in this dataset (especially those from the
immunogenic fraction) may impose a limitation that could lead to underestimating the performance of the
tools. In addition, an important factor that may explain these results is the rationale behind neoepitope selection.
In biased datasets, composed of peptides preselected by some criterion (e.g., antigen presentation, peptide
binding to MHC, or antigen expression), the specific feature used for selection will in general show little or no
predictive performance, because its range of values is restricted among the selected peptides. In our in-house
neoantigen dataset, the main selection criterion was the predicted
binding to MHC according to the NetMHCpan 4.0 EL model. This selection penalizes not only NetMHCpan
but also all models that directly or indirectly predict binding and antigen presentation by MHC.
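The effect of preselection described above can be illustrated with a small simulation. This is a sketch under explicit assumptions, not a model of the in-house dataset. A latent "binding" value drives both a noisy predictor score and the immunogenicity label; the Gaussian noise levels, the label threshold, the 10% selection cutoff, and all function names are hypothetical choices made for this example. The predictor's AUC is then compared on all simulated peptides and on the subset preselected by that same predictor.

```python
import random
from typing import List, Sequence, Tuple


def rank_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC from the rank sum of positives (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both classes to compute an AUC")
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if labels[i] == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate_selection_bias(
    n: int = 5000, top_fraction: float = 0.1, seed: int = 0
) -> Tuple[float, float]:
    """Return (AUC on all peptides, AUC on peptides preselected by the score)."""
    rng = random.Random(seed)
    scores: List[float] = []
    labels: List[int] = []
    for _ in range(n):
        binding = rng.gauss(0.0, 1.0)             # latent property (assumption)
        score = binding + rng.gauss(0.0, 0.5)     # noisy predictor of that property
        label = 1 if binding + rng.gauss(0.0, 0.5) > 1.0 else 0
        scores.append(score)
        labels.append(label)
    auc_all = rank_auc(scores, labels)
    # Keep only the top-scoring peptides, as a preselection step would.
    k = int(n * top_fraction)
    keep = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    auc_selected = rank_auc([scores[i] for i in keep], [labels[i] for i in keep])
    return auc_all, auc_selected


auc_all, auc_selected = simulate_selection_bias()
```

Under these assumptions the predictor separates the classes well across all peptides but noticeably worse within the preselected subset, because the selected peptides span only a narrow range of scores. Any method correlated with the selection feature would be affected in a similar way.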
Pathogen-associated epitope datasets are the most abundant among peptides validated for T cell
immune response, and for this reason, methods designed specifically to predict cancer antigens generally suffer from
being trained on small datasets. Several tools described in this review have been trained with epitopes
derived from pathogens (mostly viral). For instance, IEDB immunogenicity [73] and PRIME [87] are
methods that rely on the amino acid composition of immunogenic peptides, and both were trained with
pathogen-derived data (for PRIME this data was complemented by a minor proportion of neoepitopes). To
test the hypothesis that T cell recognition rules are general, the preferences for neoepitopes and viral
epitopes were analyzed. An enrichment in aromatic residues among our neoepitopes was observed, in
line with what was reported by the authors of these tools [73, 87]. However, tryptophan (W) was found to be completely
absent in our data, as well as in a large neoepitope dataset (combining CEDAR and NEPdb), although it was
highly abundant in viral epitopes (IEDB). Also, it was observed that neoepitopes do not have a clear amino
Conclusions
In recent years, much progress has been made in the selection of tumor neoepitopes that have clinical
applications such as the development of personalized therapies. This was made possible by two major
technological developments: i) next generation sequencing (NGS), which yields tumor sequences in a reasonably
short time and at low cost, and ii) improvements in bioinformatic and immunoinformatic algorithms that provide
highly accurate variant calls and predictions of neopeptide binding to MHC. Although very powerful, the
combination of these two technologies still yields a large number of neoepitope candidates that are very
expensive and laborious to test. The present work evaluated tools that could refine the selection of these
candidates, and the results indicate that there is still work ahead to achieve this purpose accurately. Mutated
peptide sequences indeed contain relevant information, but this information is not sufficient to accurately predict their
immunogenicity. In our opinion, the lack of neoantigen data is a major challenge. Also, there is still a need
to integrate the complexity of the immune response in cancer patients, in particular, the generation of T cell
repertoires capable of recognizing neoepitopes. Solving this issue will require a technological advance
comparable in magnitude to the striking development of NGS and bioinformatics, which is expected to
arrive in the years to come. Finally, the phenomenon of immunological ignorance, which is partially
determined by patient-specific factors, causes good neoepitope candidates (in terms of the features
reviewed in this article) to be detected as negatives in in vitro assays. This imposes an intrinsic limitation
on the prediction of neoantigens, which at the moment remains to be solved.
Supplementary materials
The supplementary figure, tables, and methods for this article are available at:
https://www.explorationpub.com/uploads/Article/file/100391_sup_1.xlsx and
https://www.explorationpub.com/uploads/Article/file/100391_sup_2.pdf.
Declarations
Acknowledgments
We dedicate this work to our patients. This work has been performed using the Danish National Life
Science Supercomputing Center, Computerome. We thank Emilio Fenoy for insightful discussions about
this research.
Author contributions
IC, MN, and MMB: Conceptualization, Writing—original draft. IC: Formal analysis. IC, ES, EP, and HMGA:
Investigation. IC and HMGA: Software. IC: Visualization. JM, MN, and MMB: Funding acquisition, Resources.
MMB: Supervision. IC, ES, EP, HMGA, JM, MN, and MMB: Writing—review & editing. All authors read and
approved the submitted version.
Conflicts of interest
The authors declare that they have no conflicts of interest.
Ethical approval
The CASVAC-0401 study was carried out after approval of the Ethics Committee of the Instituto Alexander
Fleming. The study was also approved by the Argentine Regulatory Agency (ANMAT, Disposition 1299/09).
Consent to publication
Not applicable.
Funding
This work was supported by grants from CONICET, Agencia Nacional de Promoción Científica y Tecnológica
(ANPCyT), Instituto Nacional del Cáncer—Ministerio de Salud de la Nación Argentina (INC-MSal), Fundación
Sales, Fundación Cáncer, and Fundación Pedro F. Mosoteguy, Argentina. The CASVAC-0401 Phase II clinical
study (ClinicalTrials.gov, NCT01729663) was sponsored by Laboratorio Pablo Cassará S.R.L. The funders
had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Copyright
© The Author(s) 2023.
References
1. Türeci Ö, Vormehr M, Diken M, Kreiter S, Huber C, Sahin U. Targeting the heterogeneity of cancer with
individualized neoepitope vaccines. Clin Cancer Res. 2016;22:1885–96.
2. Zhang H, Lin W, Kannan K, Luo L, Li J, Chao PW, et al. Aberrant chimeric RNA GOLM1-MAK10 encoding
a secreted fusion protein as a molecular signature for human esophageal squamous cell carcinoma.
Oncotarget. 2013;4:2135–43.
3. Xiong X, Ke X, Wang L, Lin Y, Wang S, Yao Z, et al. Neoantigen‐based cancer vaccination using chimeric
RNA‐loaded dendritic cell‐derived extracellular vesicles. J Extracell Vesicles. 2022;11:e12243.
4. Katayama H, Kobayashi M, Irajizad E, Sevillarno A, Patel N, Mao X, et al. Protein citrullination as a
source of cancer neoantigens. J Immunother Cancer. 2021;9:e002549.
5. De Bousser E, Meuris L, Callewaert N, Festjens N. Human T cell glycosylation and implications on
immune therapy for cancer. Hum Vaccin Immunother. 2020;16:2374–88.
6. Bonté PE, Arribas YA, Merlotti A, Carrascal M, Zhang JV, Zueva E, et al. Single-cell RNA-seq-based
proteogenomics identifies glioblastoma-specific transposable elements encoding HLA-I-presented
peptides. Cell Rep. 2022;39:110916.
7. Schumacher TN, Schreiber RD. Neoantigens in cancer immunotherapy. Science. 2015;348:69–74.
8. Chan TA, Yarchoan M, Jaffee E, Swanton C, Quezada SA, Stenzinger A, et al. Development of tumor
mutation burden as an immunotherapy biomarker: utility for the oncology clinic. Ann Oncol.
2019;30:44–56.
9. Le DT, Uram JN, Wang H, Bartlett BR, Kemberling H, Eyring AD, et al. PD-1 blockade in tumors with
mismatch-repair deficiency. N Engl J Med. 2015;372:2509–20.
10. Chowell D, Krishna C, Pierini F, Makarov V, Rizvi NA, Kuo F, et al. Evolutionary divergence of HLA class I
genotype impacts efficacy of cancer immunotherapy. Nat Med. 2019;25:1715–20.
11. Maeurer MJ, Gollin SM, Martin D, Swaney W, Bryant J, Castelli C, et al. Tumor escape from immune
recognition: lethal recurrent melanoma in a patient associated with downregulation of the peptide
transporter protein TAP-1 and loss of expression of the immunodominant MART-1/Melan-A antigen.
J Clin Invest. 1996;98:1633–41.