Genome annotation improvements from cross-phyla proteogenomics and time-of-day differences in malaria mosquito proteins using untargeted quantitative proteomics

PLOS ONE, Jul 2019

The malaria mosquito, Anopheles stephensi, and other mosquitoes modulate their biology to match the time of day. In the present work, we used a non-hypothesis-driven approach (untargeted proteomics) to identify proteins in mosquito tissue, and then quantified the relative abundance of the identified proteins from An. stephensi bodies. Using these quantified protein levels, we then analyzed the data for proteins that were detectable only at certain times of day, highlighting the need to consider time-of-day in experimental design. Further, we extended our time-of-day analysis to look for proteins that cycle in a rhythmic 24-hour (“circadian”) manner, identifying 31 rhythmic proteins. Finally, to maximize the utility of our data, we performed a proteogenomic analysis to improve the genome annotation of An. stephensi. We compared peptides that were detected using mass spectrometry but are ‘missing’ from the An. stephensi predicted proteome to reference proteomes from 38 other species, primarily human disease vectors. We found 239 such peptide matches, showing that genome annotation can be improved using proteogenomic analysis with taxonomically diverse reference proteomes. Examination of ‘missing’ peptides revealed reading frame errors, errors in gene-calling, overlapping gene models, and suspected gaps in the genome assembly.
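The proteogenomic comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes simple exact substring matching, and every sequence, species label, protein identifier, and function name below is made up for illustration. A real analysis would load proteomes from FASTA files and would likely need to handle details such as the isobaric residues leucine and isoleucine, which mass spectrometry cannot tell apart.

```python
from typing import Dict, List, Tuple

Proteome = Dict[str, str]  # protein identifier -> amino-acid sequence


def find_missing_peptides(peptides: List[str], proteome: Proteome) -> List[str]:
    """Return peptides that occur in no protein of the given predicted proteome."""
    return [p for p in peptides if not any(p in seq for seq in proteome.values())]


def match_to_references(
    missing: List[str], references: Dict[str, Proteome]
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each missing peptide to the (species, protein_id) pairs that contain it."""
    hits: Dict[str, List[Tuple[str, str]]] = {}
    for pep in missing:
        found = [
            (species, pid)
            for species, proteome in references.items()
            for pid, seq in proteome.items()
            if pep in seq
        ]
        if found:  # peptides with no reference match are left out
            hits[pep] = found
    return hits


# Hypothetical toy data (not real An. stephensi or vector sequences).
target_proteome = {"ASTE_toy1": "MKTAYIAKQRQISFVKSHFSRQ"}
reference_proteomes = {
    "species_A": {"A_toy1": "MSLLGDPEPTIDEKAAVR"},
    "species_B": {"B_toy1": "MKTAYIAKQR", "B_toy2": "GGDPEPTIDEKLL"},
}
detected_peptides = ["AYIAKQR", "DPEPTIDEK", "WWWWCCC"]

# Step 1: peptides seen by mass spectrometry but absent from the predicted proteome.
missing = find_missing_peptides(detected_peptides, target_proteome)
# Step 2: look for those peptides in the reference proteomes of other species.
hits = match_to_references(missing, reference_proteomes)

for peptide, locations in hits.items():
    print(peptide, locations)
```

In this toy run, a peptide that is absent from the target proteome but present in another species' proteome is the kind of match that would flag a candidate annotation problem for manual inspection; a peptide matching nowhere is simply left unexplained.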

RESEARCH ARTICLE

Lisa Imrie1☯, Thierry Le Bihan1,2,3☯, Áine O’Toole4, Paul V. Hickner5, W. Augustine Dunn6, Benjamin Weise2, Samuel S. C. Rund2,5*

1 SynthSys–Synthetic and Systems Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom, 2 Centre for Immunity, Infection and Evolution, University of Edinburgh, Edinburgh, United Kingdom, 3 Rapid Novor, Kitchener, Ontario, Canada, 4 Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, United Kingdom, 5 Eck Institute for Global Health, University of Notre Dame, Notre Dame, Indiana, United States of America, 6 Boston Children’s Hospital, Boston, Massachusetts, United States of America

☯ These authors contributed equally to this work.

Citation: Imrie L, Le Bihan T, O’Toole Á, Hickner PV, Dunn WA, Weise B, et al. (2019) Genome annotation improvements from cross-phyla proteogenomics and time-of-day differences in malaria mosquito proteins using untargeted quantitative proteomics. PLoS ONE 14(7): e0220225. https://doi.org/10.1371/journal.pone.0220225

Editor: Walter S. Leal, University of California-Davis, UNITED STATES

Received: October 24, 2018

Accepted: July 11, 2019

Published: July 29, 2019

Copyright: © 2019 Imrie et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the manuscript and its Supporting Information files with the exception of the raw mass spectroscopy files, which can be found on DataDryad at https://doi.org/10.5061/dryad.8p20m31.

Funding: Samuel S.C. Rund was funded by a Royal Society Newton International Fellowship (NF140517) and a strategic award from the Wellcome Trust (No. 095831) for the Centre for Immunity, Infection and Evolution. Áine O’Toole was funded by the Wellcome Trust (202769/Z/16/Z; PhD programme in Hosts, Pathogens and Global Health). The LC-MS QExactive equipment was purchased by a Wellcome Trust Institutional Strategic Support Fund and a strategic award from the Wellcome Trust for the Centre for Immunity, Infection and Evolution (095831/Z/11/Z). Rapid Novor provided support in the form of a salary for author TLB, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The remaining funders also had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: TLB has received salary from Rapid Novor. TLB’s employment at Rapid Novor does not alter our adherence to PLOS ONE policies on sharing data and materials. The other authors have declared that no competing interests exist. The specific roles of all authors are articulated in the ‘author contributions’ section.

Introduction

Anopheles stephensi is a major malaria vector in southern Asia, where its geographic range extends across the Indian subcontinent [1]. Research on the African Anopheles gambiae mosquito has demonstrated that the behavior and physiology of the mosquito are highly dependent on circadian biology and time-of-day. For example, ~20% of An. gambiae genes were rhythmically expressed over the 24-hour day [2]; rhythmically expressed mosquito olfaction genes correspond with rhythmic protein levels and time-of-day changes in electrophysiological sensitivity to host odorants [3]; and time-of-day effects are associated with mosquito insecticide resistance [4]. An. stephensi has been demonstrated to have 24-hour nocturnal rhythms of flight behavior that persist even in the absence of light:dark cues [5]. Finally, rhythms in the biology of the mosquito, and indeed possibly in the human host and Plasmodium parasite, may interact to affect disease transmission [6–8].

To date, the genomes of two strains of An. stephensi have been sequenced, one from India and one from Pakistan (SDA-500) [9, 10]. To our knowledge, proteomics in this species is limited to an Edman degradation of their salivary glands [11]; mass spectrometry proteomics analysis of salivary proteomes [11, 12]; fat bodies [13, 14]; midguts/fat bodies [14]; a mass spectrometry proteomics analysis of ageing in the head and thorax [15]; and a recent work across multiple tissues which included genome annotation improvements [16]. In An. gambiae, several mass spectrometry-based studies have been performed on various tissues, including the antennae, head, body, midgut peritrophic matrix, salivary glands, and cuticle [3, 17–20].

Proteomic experiments can be used to identify post-translational modifications, improve genome annotation, and identify and quantify proteins in a biological sample [16, 21, 22]. A previous study in An. gambiae mosquito antennae utilized targeted quantitative proteomics, in which the mass spectrometer was tuned to specifically identify and quantify the abundance of proteins from an a priori list of genes of interest [3] where only targeted proteins are in (...truncated)


Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0220225

Lisa Imrie, Thierry Le Bihan, Áine O'Toole, Paul V. Hickner, W. Augustine Dunn, Benjamin Weise, Samuel S. C. Rund. Genome annotation improvements from cross-phyla proteogenomics and time-of-day differences in malaria mosquito proteins using untargeted quantitative proteomics, PLOS ONE, 2019, Volume 14, Issue 7, DOI: 10.1371/journal.pone.0220225