TargetClone: A multi-sample approach for reconstructing subclonal evolution of tumors
November
Marleen M. Nieboer 0 1
Lambert C. J. Dorssers 1
Roy Straver 0 1
Leendert H. J. Looijenga 1
Jeroen de Ridder 0 1
0 Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands, 2 Department of Pathology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands, 3 Princess Maxima Center for Pediatric Oncology, Utrecht, The Netherlands
1 Editor: Santosh K. Patnaik, Roswell Park Cancer Institute, UNITED STATES
Most tumors are composed of a heterogeneous population of subclones. More detailed insight into the subclonal evolution of these tumors can help in studying progression and treatment response. However, tumor samples are typically very heterogeneous, which makes deconvolving individual tumor subclones a major challenge. One viable way to overcome this limitation is to reduce heterogeneity, for example by microdissection, and to combine this with targeted sequencing. However, computational methods that reconstruct the evolutionary relationships require unbiased read depth measurements, which are difficult to obtain in this setting. We introduce TargetClone, a novel method to reconstruct the subclonal evolution tree of tumors from single-nucleotide polymorphism allele frequency and somatic single-nucleotide variant measurements. Furthermore, our method infers copy numbers, alleles and the fraction of the tumor component in each sample. TargetClone was specifically designed for targeted sequencing data obtained from microdissected samples. We demonstrate that our method obtains low error rates on simulated data. Additionally, we show that our method is able to reconstruct expected trees in a testicular germ cell cancer dataset and an ovarian cancer dataset. The TargetClone package, including tree visualization, is written in Python and is publicly available at https://github.com/UMCUGenetics/targetclone.
Introduction
Tumors develop from the accumulation of somatic mutations over time. Within a tumor,
various subclonal populations with (partially) overlapping mutation patterns often co-exist. These
subclones are formed through an evolutionary process [1–3]. Reconstructing the subclonal
evolution is important, as it can assist in characterizing the mutations driving tumor
development and progression, and can be helpful to decipher the mechanisms underlying treatment
response [4, 5].
A number of algorithms have been developed to reconstruct subclonal evolution trees
from rapidly emerging next-generation sequencing data (S1 Fig). The existing methods can
coarsely be divided into two categories: those based on somatic single-nucleotide variants
(SNVs) and those based on somatic copy number variations (CNVs). Somatic SNV-based
methods, such as LICHeE, PhyloSub, TrAp and AncesTree, most often rely on two
important assumptions: the sum rule and the infinite sites assumption (ISA) [6–9].
Under the sum rule, a branched tree can be ruled out in favor of a linear tree if the summed
variant allele frequencies (VAFs) of the SNVs in the child subclones exceed the VAF
of the SNVs in the parent [7]. Under the ISA, somatic SNVs are not expected to be gained twice
independently. Furthermore, somatic SNVs are not expected to be lost once gained. An
important limitation is that the VAF is affected by CNVs. As a result, SNV-based methods
are restricted to using somatic SNVs in copy number-neutral regions. To overcome potential
loss of information due to these restrictions, alternative methods, such as CNTMD, ThetA,
TITAN, MEDICC, CloneCNA and CLImAT-HET, have been developed that aim to either
infer the copy numbers of subclones, or reconstruct (subclonal) evolution trees from CNVs
inferred from e.g. read depth information [10–15]. Additionally, the PhyloWGS algorithm
combines somatic SNVs and CNVs to further increase the tree reconstruction accuracy [16].
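The sum rule and the effect of CNVs on the VAF described above can be illustrated with a small Python sketch. This is not code from any of the cited tools: the function names, the tolerance parameter and the example numbers are hypothetical, and the VAF formula assumes a simple mixture of one tumor population and diploid normal cells.

```python
def violates_sum_rule(parent_vaf, child_vafs, tolerance=0.0):
    """Return True if the children cannot all branch from this parent.

    Under the sum rule, sibling subclones are excluded when their summed
    VAFs exceed the VAF of the parent (plus an optional noise tolerance).
    """
    return sum(child_vafs) > parent_vaf + tolerance


def expected_vaf(mutated_copies, tumor_copies, tumor_fraction, normal_copies=2):
    """Expected VAF of a somatic SNV in a tumor/normal mixture (assumed model)."""
    variant = tumor_fraction * mutated_copies
    total = tumor_fraction * tumor_copies + (1.0 - tumor_fraction) * normal_copies
    return variant / total


# Two children at VAF 0.30 and 0.25 cannot both branch from a parent at 0.40.
branching_excluded = violates_sum_rule(0.40, [0.30, 0.25])

# The same SNV on one allele at 60% tumor fraction, in a copy number-neutral
# region and after a gain of the non-mutated allele (2 -> 3 copies).
vaf_neutral = expected_vaf(mutated_copies=1, tumor_copies=2, tumor_fraction=0.6)
vaf_gain = expected_vaf(mutated_copies=1, tumor_copies=3, tumor_fraction=0.6)
```

In this toy setting the gain lowers the VAF of an unchanged SNV, which is why a VAF measured in a region with CNVs cannot be read directly as a subclonal fraction.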
However, using read depth to determine the copy number of individual subclones in
heterogeneous tumor populations is a challenging problem, as such populations consist of several
subclones and non-tumor cells mixed in different unknown fractions [3, 15, 17]. It is
therefore hard to distinguish between CNVs and differences in subclonal fraction, and multiple
combinations of subclonal fraction and subclonal CNVs may explain the overall read depth
profile.
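A minimal sketch of this ambiguity, assuming that the read depth at a locus is proportional to the fraction-weighted mean copy number of the cell populations in the sample. The helper name and the three example mixtures are hypothetical and chosen only for illustration.

```python
def mean_copy_number(populations):
    """Fraction-weighted mean copy number of a mixed sample.

    populations: list of (fraction, copy_number) pairs whose fractions
    sum to 1. Read depth is assumed to scale with the returned value.
    """
    total_fraction = sum(fraction for fraction, _ in populations)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("population fractions must sum to 1")
    return sum(fraction * copies for fraction, copies in populations)


# Three different compositions of a sample at the same locus.
mix_a = [(0.5, 4), (0.5, 2)]    # 50% tumor with 4 copies, 50% normal cells
mix_b = [(1.0, 3)]              # pure tumor with 3 copies
mix_c = [(0.25, 6), (0.75, 2)]  # 25% subclone with 6 copies, 75% normal cells

depths = [mean_copy_number(mix) for mix in (mix_a, mix_b, mix_c)]
```

All three mixtures give the same mean copy number, so read depth at a single locus cannot tell them apart without additional information.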
While single-cell sequencing approaches largely mitigate the problem of sample
heterogeneity, it is not yet possible to sample an accurate representation of the entire subclonal
diversity using these techniques [18–20]. Therefore, an interesting alternative is to perform
microdissections to obtain multiple samples of the same tumor (S2 Fig), while at the same
time reducing sample heterogeneity [21–23]. However, the typically low read depth of whole
genome sequencing (WGS) data compli (...truncated)