MosaicSolver: a tool for determining recombinants of viral genomes from pileup data

Nucleic Acids Research, Sep 2014

Viral recombination is a key evolutionary mechanism, aiding escape from host immunity, contributing to changes in tropism and possibly assisting transmission across species barriers. The ability to determine whether recombination has occurred and to locate associated specific recombination junctions is thus of major importance in understanding emerging diseases and pathogenesis. This paper describes a method for determining recombinant mosaics (and their proportions) originating from two parent genomes, using high-throughput sequence data. The method involves setting the problem geometrically and the use of appropriately constrained quadratic programming. Recombinants of the honeybee deformed wing virus and the Varroa destructor virus-1 are inferred to illustrate the method from both siRNAs and reads sampling the viral genome population (cDNA library); our results are confirmed experimentally. Matlab software (MosaicSolver) is available.

Article PDF: https://nar.oxfordjournals.org/content/42/16/e123.full.pdf

Graham R. Wood (1), Eugene V. Ryabov (0), Jessica M. Fannon (0), Jonathan D. Moore (1), David J. Evans (0), Nigel Burroughs (1)

(0) School of Life Sciences, University of Warwick, Coventry, CV4 7AL, UK
(1) Warwick Systems Biology Centre, Senate House, University of Warwick, Coventry, CV4 7AL, UK

INTRODUCTION

Recombination provides a mechanism for the rapid evolution of viruses and is implicated in the emergence of many recent pathogenic viral strains in public health and agriculture. A recombination event has been implicated as a primary cause of recent outbreaks of avian influenza (1,2), honeybee population decline is associated with a deformed wing virus (DWV) recombinant (3,4), and current global potato crop devastation is caused by the highly pathogenic NTN strain of potato virus Y (5,6). Further, human immunodeficiency virus continues to evolve, with recombinants now predominating in many geographical areas and complicating control measures (7), whilst recombination has also become a focus as a potential risk factor in the use of live attenuated virus vaccines (8).
These are all examples of virulence shifts, in which the recombined virus acquires new capabilities such as escape from the immune system, drug resistance, increased transmission rates, changes in tissue tropism or acquisition of novel host tropism allowing cross-species transmission. Despite these evolutionary advantages, a recent review (9) suggests that recombination of ribonucleic acid (RNA) viruses may not be a selected trait but a by-product of the RNA polymerase mechanism.

Recombination is mediated through co-infection of a cell and can in principle occur anywhere along the genome, although recombination points do have preferred hotspots (10–12). For instance, recombination in poliovirus was shown to be associated with RNA structure and exhibits a GC content bias over an infection cycle (11), whilst protein incompatibility and selection pressure on regulatory, maturation or associated protein functions are likely to add a further layer of selection for the location of recombination points, producing the well-known bias between structural and nonstructural genes (10). Furthermore, recent evidence indicates that the recombination mechanism is biphasic, involving distinct crossover and resolution events (12). Mapping these locations is vital for identifying the determinants of recombination and understanding the characteristics of emergent strains.

Identification of recombinants within a population of mixed viral genomes, together with their abundance, is thus a problem of fundamental significance. Detection of recombinants is difficult, especially when there is no prior knowledge of recombination junctions (which would allow construction of suitable primers), and particularly if more than one recombinant progeny form is present. Next-generation sequencing (NGS) approaches provide a new opportunity to perform this task; new challenges arise, however, particularly in the reconstruction of the underlying genomes from small sequences [typically less than 100 nucleotides (nt)].
In this paper, we present a novel approach to identify, characterize, quantify and assess the statistical significance of recombinant genomes in NGS sampling of population mixtures. Throughout we assume that the parent viral genomes can be globally aligned and that any recombination involves exchange of homologous regions.

The current work was motivated by ongoing investigations into honeybees (Apis mellifera) infested with a parasitic mite (Varroa destructor). The latter acts as a vector for a range of pathogenic viruses (13–15), the most important of which (both in terms of the individual honeybee and the penetration of colonies in the UK) are viruses related to the deformed wing virus (DWV-like viruses), which include DWV itself and its relative Varroa destructor virus-1 (VDV-1); the two share 84% nt (95% amino acid) identity. VDV-1 was first extracted from Varroa mites (16). High levels of DWV-like viruses are associated primarily with deformed wings, including atrophied wing development and abdominal stunting (17). DWV-like viruses are endemic in honeybees worldwide and are usually asymptomatic, with the virus presumably being controlled and thus not reaching harmful levels; however, they have been reported to be responsible for overwintering colony demise, although the cause of the shift from a benign to a pathogenic infection is unknown. Co-infection of either the host honeybee or the mite with DWV and VDV-1 may result in the formation of recombinants between the two viruses. Such recombinants could accumulate to high levels and it is hypothesized that one or a very limited range of such recombinant forms is responsible for colony demise (3,4,18). Thus, ascertaining the recombinant profile within a population is a problem of key significance to food security. Different recombinants of DWV and VDV-1 strains have been reported (3,19).
This makes the identification of DWV/VDV-1 recombinants a good system for the development of methods for recombinant identification, especially as mixed infections (parental and recombinant genomes) are present in the same individual. As part of the analysis of the virological consequences of infesting Varroa-free colonies with mites we acquired two types of high-throughput sequence data, specifically sequencing of small interfering RNAs (siRNA; single-stranded RNAs that were generated as a result of the action of several components of the honeybee RNAi pathway) and short reads from the viral genome population [amplified complementary deoxyribonucleic acid (cDNA)], both extracted from Varroa-exposed, high viral load pupae. These independently generated data sets allowed us to investigate the development of a method to disentangle recombinant populations using both short, 21–22-nt reads (siRNA) and long, around 100-nt (cDNA) reads. These data, arising from parent genomes and potential recombinants within the viral population, allow the relative abundance of DWV and VDV-1 reads to be determined in any continuous (...truncated)
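
The abstract describes the method as constrained quadratic programming applied to the relative abundance of parent reads. The sketch below illustrates that general idea only and is not the MosaicSolver algorithm, whose details are not given in this preview. It assumes the genome has already been divided into regions, that each candidate genome is encoded as a 0/1 vector (1 where a region derives from DWV, 0 where it derives from VDV-1), and that the observed DWV read fraction per region is known. The function names, the candidate list and the toy data are all hypothetical, and projected gradient descent stands in for whatever solver the authors use.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:  # holds for a prefix of the sorted values; keep the last one
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fit_mosaic_proportions(mosaics, observed, iterations=2000):
    """Least-squares proportions of candidate mosaics (hypothetical helper).

    mosaics[j][k] is 1 if region k of candidate j derives from DWV, else 0.
    observed[k] is the observed fraction of DWV reads in region k.
    Minimises sum_k (sum_j p[j] * mosaics[j][k] - observed[k])**2
    subject to p[j] >= 0 and sum(p) = 1, by projected gradient descent.
    """
    n_mosaics = len(mosaics)
    n_regions = len(observed)
    # Safe step size: the gradient's Lipschitz constant is at most
    # twice the squared Frobenius norm of the mosaic matrix.
    frobenius_sq = sum(x * x for row in mosaics for x in row)
    step = 1.0 / (2.0 * frobenius_sq) if frobenius_sq > 0 else 1.0
    p = [1.0 / n_mosaics] * n_mosaics  # start from equal proportions
    for _ in range(iterations):
        predicted = [sum(p[j] * mosaics[j][k] for j in range(n_mosaics))
                     for k in range(n_regions)]
        residual = [predicted[k] - observed[k] for k in range(n_regions)]
        gradient = [2.0 * sum(residual[k] * mosaics[j][k] for k in range(n_regions))
                    for j in range(n_mosaics)]
        p = project_to_simplex([p[j] - step * gradient[j] for j in range(n_mosaics)])
    return p


# Toy data (invented for illustration): four regions, four candidate genomes.
candidates = [
    [1, 1, 1, 1],  # parent DWV
    [0, 0, 0, 0],  # parent VDV-1
    [0, 0, 1, 1],  # recombinant: VDV-1 in the first half, DWV in the second
    [1, 0, 0, 1],  # a second candidate recombinant, absent from the toy mixture
]
dwv_fraction = [0.2, 0.2, 0.9, 0.9]  # observed DWV read fraction per region
proportions = fit_mosaic_proportions(candidates, dwv_fraction)
print([round(x, 3) for x in proportions])
```

In this toy mixture the fitted proportions should come out close to 0.2 DWV, 0.1 VDV-1, 0.7 of the first recombinant and 0 of the second. Real pileup data are noisy, and different sets of mosaics can explain the same abundance profile equally well, so the candidate set has to be chosen with care.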


Article home page: http://nar.oxfordjournals.org/content/42/16/e123.abstract

Graham R. Wood, Eugene V. Ryabov, Jessica M. Fannon, Jonathan D. Moore, David J. Evans, Nigel Burroughs. MosaicSolver: a tool for determining recombinants of viral genomes from pileup data, Nucleic Acids Research, 2014, pp. e123-e123, 42/16, DOI: 10.1093/nar/gku524