A guide to in silico vaccine discovery for eukaryotic pathogens

BRIEFINGS IN BIOINFORMATICS. VOL 14. NO 6. 753–774
Advance Access published on 24 October 2012
doi:10.1093/bib/bbs066

Stephen J. Goodswen, Paul J. Kennedy and John T. Ellis

Submitted: 21st April 2012; Received (in revised form): 23rd August 2012

Abstract

In this article, a framework for an in silico pipeline is presented as a guide to high-throughput vaccine candidate discovery for eukaryotic pathogens, such as helminths and protozoa. Eukaryotic pathogens are mostly parasitic and cause some of the most damaging and difficult-to-treat diseases in humans and livestock. Consequently, these parasitic pathogens have a significant impact on the economy and on human health. The pipeline is based on the principle of reverse vaccinology and is constructed from freely available bioinformatics programs. There are several successful applications of reverse vaccinology to the discovery of subunit vaccines against prokaryotic pathogens but not yet against eukaryotic pathogens. The overriding aim of the pipeline, which focuses on eukaryotic pathogens, is to generate a ranked list of proteins, based on a scoring system, through computational processes of elimination and evidence gathering. These proteins are either surface components of the target pathogen or are secreted by the pathogen and are of a type known to be antigenic. No perfect predictive method is yet available; therefore, the highest-scoring proteins from the list require laboratory validation.

Keywords: reverse vaccinology; eukaryotic pathogens; in silico vaccine discovery; apicomplexans; immunoinformatics

Corresponding author. John T. Ellis, School of Medical and Molecular Sciences, Ithree Institute, University of Technology Sydney. Tel.: +61 2 9514 4161.

Stephen Goodswen did his research for MSc at CSIRO while enrolled at the University of New England. He is now pursuing a PhD at the University of Technology Sydney focusing on an in silico vaccine discovery pipeline for parasitic protozoa.

Paul Kennedy obtained his PhD in Computing Science at the University of Technology, Sydney, in 1999, where he currently directs the Knowledge Infrastructure Laboratory in the Centre for Quantum Computation and Intelligent Systems. His interests involve data mining of biomedical data, particularly visualization and classification of childhood cancer patients using their systems biology.

John Ellis has research interests focused on translational research that includes development of vaccines and diagnostics for parasitic diseases of economic importance. For the past 20 years, he has studied parasitic protozoa of both veterinary and medical importance and in recent times has broadened his interests to environmental protozoology and groundwater.

© The Author 2012. Published by Oxford University Press.

INTRODUCTION

After almost a century of laboratory culture-based approaches to vaccine discovery, researchers are beginning to capitalize on the vast potential of omics data (genomes, transcriptomes and proteomes) to make an in silico approach to vaccine discovery possible, without the need to cultivate the pathogen. Eukaryotic pathogens are extremely complicated systems with multifaceted life cycles. The key challenge of this in silico approach is how best to transform mere biological abstractions of complex systems (in the form of digital information) into the knowledge required to identify vaccine candidates.

In 2000, Rino Rappuoli [1] first proposed the idea of mining biological data to predict antigens that are most likely to be vaccine candidates. Effectively, the wet laboratory used in the traditional culture-based approach to cultivate, dissect and identify antigens is replaced by a computer. His approach has been widely accepted as a way of discovering vaccines and is referred to as ‘Reverse Vaccinology’ because, in its basic form, the approach starts with the genome of the pathogen rather than the pathogen itself. There are several successful applications of reverse vaccinology to the discovery of subunit vaccines against prokaryotic pathogens [2–6]. The key to subunit vaccine development is the successful identification of molecules of a pathogen that evoke a safe immune response, as opposed to using the entire entity.

The candidate molecules from a eukaryotic pathogen expected to induce immunity comprise proteins that are as follows: (i) present on the surface of the pathogen, (ii) excreted/secreted from the pathogen and (iii) homologous to [...]

[...] reviews and suggests freely available bioinformatics programs that can complete each explicit stage of an in silico vaccine discovery pipeline.

PIPELINE OVERVIEW

As a proof of concept, a vaccine discovery pipeline was constructed and evaluated using data from the eukaryotic pathogen Toxoplasma gondii, which is an important model system for the phylum Apicomplexa [8–10]. The focus here, however, is on the construction of the pipeline; no attempt is made to propose scientific findings for T. gondii, as that is beyond the scope of the present article. Despite the similarity of eukaryotic pathogens, realistically there can be no ‘off-the-shelf’ pipeline for vaccine discovery that would instantly work for all pathogens. A generic pipeline comprising the same linked programs can, nevertheless, theoretically be used. The challenge from a user’s perspective is that these programs critically need appropriate training sets specific to the pathogen of interest. A pipeline here simply refers to a chain of data processing stages. Freely available bioinformatics programs are suggested for each stage described herein.
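The idea of a pipeline as a chain of data processing stages that eliminates proteins, gathers evidence and produces a ranked list can be illustrated with a minimal sketch. The sketch is written in Python purely for illustration and is not the authors' implementation. Every name in it (`Protein`, `make_stage`, `run_pipeline`), the prediction values, the thresholds and the choice to score a protein by summing its evidence are hypothetical assumptions; in a real pipeline, the prediction values would be parsed from the output of the bioinformatics program used at each stage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Protein:
    """A candidate protein and the evidence gathered for it so far."""
    protein_id: str
    evidence: Dict[str, float] = field(default_factory=dict)

    @property
    def score(self) -> float:
        # Assumption: the overall score is the plain sum of all evidence values.
        return sum(self.evidence.values())


# A stage inspects one protein and returns True to keep it, False to reject it.
Stage = Callable[[Protein], bool]


def make_stage(name: str, predictions: Dict[str, float], threshold: float) -> Stage:
    """Build a stage from already-parsed predictions (protein id -> value)."""
    def stage(protein: Protein) -> bool:
        value = predictions.get(protein.protein_id, 0.0)
        if value < threshold:
            return False                  # elimination: protein leaves the pipeline
        protein.evidence[name] = value    # evidence gathering: record the value
        return True
    return stage


def run_pipeline(proteins: Iterable[Protein], stages: List[Stage]) -> List[Protein]:
    """Pass proteins through each stage in order, then rank the survivors."""
    survivors = list(proteins)
    for stage in stages:
        survivors = [p for p in survivors if stage(p)]
    return sorted(survivors, key=lambda p: p.score, reverse=True)


# Made-up prediction values for three hypothetical proteins.
secreted = make_stage("secreted", {"P1": 0.9, "P2": 0.4, "P3": 0.8}, threshold=0.5)
antigenic = make_stage("antigenic", {"P1": 0.6, "P3": 0.9}, threshold=0.5)

ranked = run_pipeline([Protein("P1"), Protein("P2"), Protein("P3")], [secreted, antigenic])
for p in ranked:
    print(p.protein_id, round(p.score, 2), p.evidence)
```

In this sketch, the output of one stage (the surviving proteins) is the input of the next, and a protein rejected early is never evaluated by later stages. A weighted score, or a design that keeps every protein and only accumulates evidence, would be equally plausible alternatives.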
An ideal objective of the pipeline is to have a seamless transition from start to end, in which the output of each stage is the input of the next one. The transition between stages can be achieved by writing simple parsing and reformatting programs. A critical aspect of these programs, which tie the pipeline together, is extracting the pertinent data from the stage outputs and providing the logic to accept data into, or reject data from, the pipeline. Example stage outputs are provided throughout the present article, and the parts of the output that are useful are indicated. The stage transitions in the pipeline presented were written in Perl.

There were five underlying criteria for selecting the programs used to complete each stage: public availability, operating platform, high-throughput functionality, cell type and software support. Each criterion is now described in more detail:

(i) Public availability: the program had to be freely downloadable and have stand-alone capability.

(ii) Type of operating platform: the numerous programs potentially available can be classified into three platform categories: web interface, Microsoft Windows and Linux. The web interface programs are by far the most prevalent because of their immediate accessi (...truncated)