Immunoinformatics and epitope prediction in the age of genomic medicine

Genome Medicine, Nov 2015

Immunoinformatics involves the application of computational methods to immunological problems. Prediction of B- and T-cell epitopes has long been the focus of immunoinformatics, given the potential translational implications, and many tools have been developed. With the advent of next-generation sequencing (NGS) methods, an unprecedented wealth of information has become available that requires more advanced immunoinformatics tools. Based on information from whole-genome sequencing, exome sequencing and RNA sequencing, it is possible to characterize with high accuracy an individual’s human leukocyte antigen (HLA) allotype (i.e., the individual set of HLA alleles of the patient), as well as changes arising in the HLA ligandome (the collection of peptides presented by the HLA) owing to genomic variation. This has opened up new opportunities for translational applications of epitope prediction, such as epitope-based design of prophylactic and therapeutic vaccines, and personalized cancer immunotherapies. Here, we review a wide range of immunoinformatics tools, with a focus on B- and T-cell epitope prediction. We also highlight fundamental differences in the underlying algorithms and discuss the various metrics employed to assess prediction quality, comparing their strengths and weaknesses. Finally, we discuss the new challenges and opportunities presented by high-throughput datasets for the field of epitope prediction.

Backert and Kohlbacher, Genome Medicine (2015) 7:119, DOI 10.1186/s13073-015-0245-0. Review, Open Access.

Linus Backert* and Oliver Kohlbacher

* Correspondence: Applied Bioinformatics, Center of Bioinformatics and Department of Computer Science, University of Tübingen, Sand 14, 72076 Tübingen, Germany. Full list of author information is available at the end of the article.

Keywords: Immunoinformatics, Bioinformatics, Next-generation sequencing, Machine learning, HLA, Vaccine design, Personalized medicine

From genomics to epitope prediction

Immunoinformatics deals with the application of computational methods to immunological problems and is thus considered a part of bioinformatics. Historically, tools for the prediction of HLA-binding peptides were the first tools developed specifically for immunoinformatics applications (Box 1). These tools paved the way for more complex applications. The availability of sufficient experimental data has been crucial to the development of immunoinformatics tools, and high-throughput human leukocyte antigen (HLA) binding assays led to major progress in this area. More recently, next-generation sequencing (NGS) has facilitated many of the novel applications and challenges that we will review here.

A first area where the availability of cost-effective sequencing is having a large impact is our knowledge of the major histocompatibility complex (MHC, HLA in human) itself. The number of known HLA alleles, as registered in the International ImMunoGeneTics information system (IMGT) database, has increased from 1000 in 1998 to more than 13,000 in 2015 [1]. Initially, tools for prediction of HLA binding (often also, slightly inaccurately, called epitope prediction) were trained on data for each HLA allele independently, but the number of new alleles renders this approach increasingly impractical. This growth has necessitated the development of novel predictors, so-called pan-specific binding predictors. In general, the availability of large-scale data has improved the performance of immunoinformatics tools, and for many, although not all, applications there is now a wealth of data available. This increase in data volume often translates into increased accuracy of these tools, primarily because many tools are based on machine learning methods, which profit greatly from additional data.
In this context, the availability of comprehensive and well-curated immunological databases is essential. Here, we will first review how immunoinformatics tools can be used to infer HLA allotypes from NGS data, and [...] construct vaccines based only on the genomic sequence of a pathogen [2], and the availability of personal genomic data enables personalized approaches to cancer immunotherapy [3]. It is in these areas that we expect the combination of NGS data and novel computational tools to impact healthcare most profoundly.

© 2015 Backert and Kohlbacher. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Box 1. The adaptive immune system

The adaptive immune system is the component of the immune system that can learn to recognize specific threats (e.g., pathogens). This immunological memory results in long-lasting immunity and rapid immune responses. Humoral immunity is mediated by the recognition of antigens by B cells, whereas cell-mediated immunity is based on the presentation of antigens on human leukocyte antigen (HLA) and the recognition of these antigens by T cells. B cells recognize antigens through membrane-bound antibodies known as B-cell receptors (BCRs), resulting in the secretion of antibodies that bind to the antigen and deactivate or eliminate it. Processing and presentation of peptide epitopes are essential steps in cell-mediated immunity. In general, the HLA class I pathway processes proteins originating from inside the cell, whereas the class II pathway presents extracellular proteins (Fig. 2).

The HLA system is encoded by 21 genes, which are located on chromosome 6 and are highly polymorphic. HLA class I comprises three different loci, HLA-A, HLA-B and HLA-C, and HLA class II encompasses HLA-DR, HLA-DP and HLA-DQ. Because the human genome is diploid, each individual can thus have between three and six different HLA class I allotypes.

Immunoinformatics methods and databases for epitope prediction

The availability of the sequence data of HLA-binding peptides in the early 1990s [4] led to a search for commonalities among these sequences, that is, allele-specific motifs that convey binding. It quickly became clear that the interaction between HLA and peptides is rather complex, and thus increasingly involved pattern-recognition methods were developed. Learning patterns from data is a field in computer science that is typically called machine learning (ML), and, in particular, supervised ML has been applied to HLA-ligand binding. (...truncated)
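The idea of an allele-specific binding motif can be illustrated with a position-specific scoring matrix (PSSM), one of the simplest ways to summarize commonalities among peptides known to bind a given allele. The following is a minimal sketch, not the method of any particular tool discussed in the review. The peptides in `TOY_BINDERS` are invented for illustration and are not experimental data, and the uniform background frequency, the pseudocount and the function names are assumptions made here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def build_pssm(peptides, pseudocount=1.0):
    """Build a log-odds matrix (one dict per position) from equal-length peptides.

    Assumes a uniform background frequency of 1/20 per residue.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)  # missing residues count as 0
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, peptide):
    """Sum the per-position log-odds scores of a peptide."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length does not match the matrix")
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Hypothetical 9-mers sharing L at position 2 and V at position 9.
TOY_BINDERS = ["ALAAAAAAV", "GLDEKTRSV", "SLQWERTYV", "KLPNMHGFV"]

if __name__ == "__main__":
    matrix = build_pssm(TOY_BINDERS)
    for candidate in ("QLAAAAAAV", "QDAAAAAAK"):
        print(candidate, round(score(matrix, candidate), 2))
```

In this toy setting, a peptide that matches the shared positions scores higher than one that does not. A matrix of this kind treats positions independently, which is one reason the more involved supervised ML methods mentioned above were developed.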


This is a preview of a remote PDF: https://genomemedicine.biomedcentral.com/track/pdf/10.1186/s13073-015-0245-0
Article home page: https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-015-0245-0

Linus Backert, Oliver Kohlbacher. Immunoinformatics and epitope prediction in the age of genomic medicine, Genome Medicine, 2015, pp. 119, Volume 7, Issue 1, DOI: 10.1186/s13073-015-0245-0