A KolmogorovSmirnov test for the molecular clock based on Bayesian ensembles of phylogenies
January
A KolmogorovSmirnov test for the molecular clock based on Bayesian ensembles of phylogenies
Fernando Antoneli 0 1
Fernando M. Passos 0 1
Luciano R. Lopes 0 1
Marcelo R. S. Briones 0 1
0 Laborat oÂrio de Gen oÃmica Evolutiva e Biocomplexidade, Escola Paulista de Medicina, Universidade Federal de São Paulo , Rua Pedro de Toledo, UNIFESP, São Paulo, SP , Brazil , 2 Departamento de Inform aÂtica em SauÂ de, Escola Paulista de Medicina, Universidade Federal de São Paulo, Ed. Jos eÂ Leal Prado , São Paulo, SP , Brazil , 3 Departamento de AnaÂ lises ClÂõnicas e Toxicol oÂgicas, Faculdade de Ciências Farmacêuticas, Universidade de São Paulo , São Paulo, SP , Brazil
1 Editor: Art F. Y. Poon, Western University , CANADA

Data Availability Statement: All relevant data are
within the paper and its Supporting Information
files.
Competing interests: The authors have declared
that no competing interests exist.
Divergence date estimates are central to understand evolutionary processes and depend, in
the case of molecular phylogenies, on tests of molecular clocks. Here we propose two
nonparametric tests of strict and relaxed molecular clocks built upon a framework that uses the
empirical cumulative distribution (ECD) of branch lengths obtained from an ensemble of
Bayesian trees and well known nonparametric (onesample and twosample) Kolmogorov
Smirnov (KS) goodnessoffit test. In the strict clock case, the method consists in using the
onesample KolmogorovSmirnov (KS) test to directly test if the phylogeny is clocklike, in
other words, if it follows a Poisson law. The ECD is computed from the discretized branch
lengths and the parameter λ of the expected Poisson distribution is calculated as the
average branch length over the ensemble of trees. To compensate for the autocorrelation in
the ensemble of trees and pseudoreplication we take advantage of thinning and effective
sample size, two features provided by Bayesian inference MCMC samplers. Finally, it is
observed that tree topologies with very long or very short branches lead to Poisson mixtures
and in this case we propose the use of the twosample KS test with samples from two
continuous branch length distributions, one obtained from an ensemble of clockconstrained trees
and the other from an ensemble of unconstrained trees. Moreover, in this second form the
test can also be applied to test for relaxed clock models. The use of a statistically equivalent
ensemble of phylogenies to obtain the branch lengths ECD, instead of one consensus tree,
yields considerable reduction of the effects of small sample size and provides a gain of
power.
Introduction
The molecular clock hypothesis postulates that for a given informational macromolecule
(DNA or protein sequence) the evolutionary rate is approximately constant over time in all
evolutionary lines of descent. This implies that if genetic divergence accumulates in a
stochastic clocklike manner, that is, approximately constant number of mutations accumulated per
time interval, then, time scales could be determined for evolutionary events, with calibration
using fossil evidence. Moreover, the evolutionary rate variation between lineages could shed
light on mechanisms of molecular evolution.
If the substitution rate is constant between lineages, then evolutionary distances are such
that all external nodes of a phylogenetic tree should be of the same size starting from the root.
In [1±3] authors suggest that the substitution process is approximately Poisson, meaning that
the average number of substitutions, and its variance, in different lineages during the same
time interval should be equal.
When the neutral theory of molecular evolution was proposed [
4,5
], the observed clocklike
behavior of molecular evolution was advocated as strong evidence supporting the theory [4±
6]. However, the reliability of the clock and its implications for the mechanism of molecular
evolution were a focus of immediate controversy, entwined in the ªneutralist±selectionistº
controversy. The debate surrounding the neutral theory has generated a rich body of
population genetics theory and analytical tools. For instance, in the strict neutral model the dynamics
depends on the neutral mutation rate alone, however one may expect most sites in a functional
protein to be constrained during most of the evolutionary time. This observation motivated
the introduction of doubly stochastic Poisson process, or Cox process, as a model for the
substitution process, implying that positive selection, if it occurs, is in episodic fashion and should
affect only a few sites [
7,8
]. More recently, these ideas motivated the introduction of relaxed
molecular clock models and advanced their use for inferring dates of divergence events.
Despite the great impact of the molecular clock in evolutionary biology, as comparative
molecular data have been accumulating over the past decades, no prediction has been proven
satisfactory; the dispersion index (the ratio of the variance to the mean value of the number of
substitutions) is generally greater than 1, suggesting that the substitution process is
overdispersed. Furthermore, it was observed that the substitution rate usually display variation along
lineages, a fact that has created great controversy on its use in divergence date estimates [
9
].
In order to address these issues, several statistical tests have been developed to examine
whether rates of molecular evolution vary significantly among phylogenetic lineages. Fitch
proposed a simple test for statistically examining whether the observed difference in
evolutionary rates between two sequences is significantly greater than that expected by chance [
10
].
More powerful versions of Fitch test have appeared subsequently leading to a general
framework for testing the molecular clock hypothesis for both DNA and protein sequences between
two lineages, given an outgroup species [
11
].
More recently, new procedures for testing the molecular clock on multiple lineages
simultaneously in a phylogenetic tree have been proposed. Some of these tests identify anomalous
groups or lineages, whereas some merely test an entire tree for conformity to the hypothesis.
The index of dispersion has been suggested as an estimator to test the molecular clock. The
rationale is that when the number of substitutions follows a Poisson law the index of dispersion
is equal to one [
12
]. The problems with this type of test have been extensively discussed [
13,14
]
and since then the approach based on this estimator has been dismissed.
Despite the observation that the strict molecular clock hypothesis does not fully explain the
substitution process, it still remains a promising concept and a powerful analytical tool in
evolutionary biology. Therefore, testing the molecular clock in phylogenetic trees is an essential
2 / 22
task. The problem is that if one assumes that the substitution process is Poisson then there is
no homogeneity in the distribution of substitutions along a lineage, that is, even though the
clock rate is constant, the variance is as large as the rate itself. In fact, in a very precise sense, a
Poisson process is as heterogeneous as it is possible, meaning that it distributes dots ªat
randomº over a halfinfinite line and is often called the ªcompletely random processº [
15
].
Takahata [
16
] observed that the rates of molecular evolution in several loci are more irregular than
described by simple Poisson processes and therefore the clock is overdispersed in these
situations. Accordingly, statistical models for the overdispersed molecular clock were proposed
which suggested that the overdispersion of molecular clock is due either to a major molecular
reconfiguration led by a series of subliminal neutral changes or to selected substitutions
finetuning a molecule after a major molecular change [
17
].
In this work we propose to directly verify PaulingZuckerkandl's assumption that the
substitution process along the branches of a (strictly) clocklike phylogeny is approximately
Poisson by testing the hypothesis that the distribution of the branch lengths a phylogenetic tree,
when measured as the average number of substitutions, follows a Poisson law. Accordingly, we
developed a procedure for testing the strict molecular clock in phylogenetic trees where the
inference is made using a nonparametric goodnessoffit test.
The method proposed here introduces two novelties: (1) it is based on an ensemble of trees,
instead of using only one single consensus tree±this is quite natural from the Bayesian
framework point of view. Indeed, the Bayesian inference procedure generates a posterior
distribution over the set of phylogenetic trees, in the form of an ensemble of representative trees; (2)
the use of wellknown nonparametric goodnessoffit test known as KolmogorovSmirnov
(KS) test [18±20], modified to account for discrete variables in Poisson distributions with
estimated parameter λ.
The classical KS test is performed given a sample of size N of independent and identically
distributed (IID) observations of the variable of interest one computes the empirical cumulative
distribution function (ECD) of the observed data, defined as
FN
x number of elements in the sample which are
Nsu
x
1
where x is a positive real number.
The expected cumulative distribution FE(x) is the cumulative distribution function
corresponding to the expected probability distribution function of the variable of interest, and the
test statistic is DKS = supFN(x) − FE(x), which is a measure of distance between the two
distributions. The null hypothesis H0: ªFE(x) = FN(x) for all xº is rejected if DKS exceeds a critical
value Dα for a fixed significance level α. In the classical KS test, the null distribution of DKS
does not depend on the expected distribution FE(x) and is given by an explicit
formula±tabulated critical values have been available from [
21
]. Moreover, the test has statistical power, or
sensitivity, tending to 1 as the sample size tends to infinity [
22
], i.e., the probability of a Type II
Error (false negative rate) goes to zero.
However, the universality of the null distribution of DKS comes at a price: (i) the test only
applies to continuous distributions and (ii) the parameters of the expected distribution cannot
be estimated from the data (the expected distribution must be completely specified in
advance)±in fact, these two conditions are necessary to show that the distribution of DKS is
independent of expected distribution. This seems to be a serious obstruction for the use of the
KS test in practical applications. Indeed, the use of the tables associated with standard KS test
when one of the conditions (i) or (ii) are not satisfied results in conservative pvalues, in the
sense that the probability of Type I Error (false positive rate) is smaller than that given by the
3 / 22
standard table [
23
]. This difficulty has prevented the dissemination of the KS test in biology
and other fields.
Nevertheless, it is possible to circumvent these limitations and modify the KS test to the
case of discrete distributions with or without estimation of parameters [24±26]. The test
statistic remains unchanged, but its null distribution is not universal anymore; unlike the
continuous completely specified case, it depends on the type of the expected distribution±more
specifically, it depends on the set of points of discontinuity of the expected distribution±and
on the parameters that are estimated from the data. Nowadays, with the advent of fast and
cheap computers, pvalues and critical values for modified KS tests can be easily calculated.
In the particular case where the expected distribution is Poisson, the modified test is a
procedure for the following null hypothesis H0: ªFN(x) is Poisson with estimated mean value λº,
against the alternative hypothesis H1: ªFN(x) is not Poissonº, where the Poisson parameter
(mean value) λ is estimated as the arithmetic mean of a list of nonnegative integer numbers
(the observed data).
The test is performed by calculating the KS statistics (Dscore):
DPKS
N supjFy
x
P
x; lj
where P(x,λ) is the cumulative distribution function for the Poisson distribution with parameter
λ, defined for all positive real values x with k x < k+1 for all integral values k = 0,1,2,. . .,1,
as
The null hypothesis is rejected if DPKS exceeds the critical value Dα for a fixed significance
level α. Now the critical value Dα must be obtained from the distribution of the statistic DPKS
which depends on the fact that the empirical distribution function is expected to be Poisson
with parameter λ. This procedure is usually referred as the PoissonKolmogorovSmirnov (PKS)
test.
Campbell and Oprian [
27
] computed tables for approximated critical values of the PKS test
(see also [
28
]). One can eliminate the need for tables by performing the parametric bootstrap
(Monte Carlo simulation) for the PKS test as described in [
29,30
]. However, neither the tables
nor the parametric bootstrap for the PKS test provide the exact critical values and pvalues.
More recently, Frey [31] gives an algorithm for the computation of exact pvalues and exact
critical values for the PKS test.
The dual coverage band, also called confidence bands, for the PKS test can be constructed as
follows. Given a fixed significance level α with corresponding critical value Dα define the
functions: U(x) = min{FN(x) + Dα, 1} and L(x) = max{FN(x) − Dα, 0}. Then, the pair (L(x),U(x)) is a
100(1−α)% nonparametric coverage band for FN(x). Thus, for every piecewise continuous
function H with jump discontinuities on the natural numbers, the probability that L(x) H(x)
U(x) for all x, conditional on H having mean value λ, is greater than (1−α) [
32,33
].
Another variation of the KS test is the twosample KolmogorovSmirnov (2KS) test. In this
case, it is given two samples of independent and identically distributed (IID) realvalued
observations of sizes N and M, respectively, and the test examines whether the two samples come
from the same continuous distribution or not. For each sample, one computes the respective
ECDs FN(x) and GM(x) (defined by the righthanded side of Eq (1) with N replaced by M) and
the test statistics (Dscore) is defined as:
D
N; M supjFN
x
GM
xj
Here, the null hypothesis is H0: ªFN(x) = GM(x) for all xº and the the alternative hypothesis H1:
P
x; l e l 1 l l2
.
2!
lk
.
k!
2
3
4
ªFN(x) 6 GM(x) for some xº. The null hypothesis is rejected if D(N,M) exceeds a critical value
Dα for a fixed significance level α. As in the standard onesample KS test, the null distribution
of D(N,M) does not depend on the cumulative distributions FN and GM when they are assumed
to be continuous (that is, when the two samples are realvalued). Tabulated critical values have
been available from Massey [
34
]. As before, it is possible to compute approximated critical
values and pvalues by parametric bootstrap and the coverage bands are given by the standard
procedure [
32,33
].
In statistics, the term ªnonparametricº has at least two different meanings. The first
meaning refers to techniques that do not rely on data belonging to any particular distribution. In
particular, it includes procedures that test hypotheses that are not statements about population
parameters and procedures that make no assumption about the sampled population, also
called distribution free procedures [
35
]. The second meaning refers to techniques that do not
assume that the structure of a model is fixed. For instance, the simple procedures for testing
the molecular clock hypothesis between two lineages of [
10,11
] are nonparametric in this
second sense, given that they do not require information on the model of evolutionary change.
The methods proposed in this paper are nonparametric tests in the first meaning described
above.
Results and discussion
The PoissonKolmogorovSmirnov test for the strict molecular clock
Our first proposal for testing the (strict) molecular clock is to apply the KolmogorovSmirnov
test for Poisson distributions with estimated parameter to an ensemble of phylogenetic trees
that has been generated by the Bayesian inference method. The PKS test for the (strict)
molecular clock phylogeny is performed as follows:
1. Generate one ensemble of unconstrained phylogenetic trees. It is recommended to choose
an ªoutgroup taxonº that should be used to root the tree. The ªoutgroup branchº must be
removed before performing the test.
2. Perform a burnin discarding at least 25%. After the burnin the loglikelihood scores
stabilize, and therefore all trees are considered statistically equivalent.
3. Extract all branch lengths of each tree and convert each real value into a nonnegative
integer, the average number of substitutions, by multiplying each branch length by the size of
the alignment and rounding the value, producing an ensemble of discrete, integervalued,
branch lengths.
4. Compute the ECD FN(x) using all the discrete branch lengths of the ensemble by formula
(1). Here, the unadjusted sample size N is a cutoff value that determine the number of trees
used in the test (see below).
5. Compute the discrete mean branch length λ using the same set of branches from item (4),
as the average value of all branch lengths and compute the expected cumulative distribution
P(x,λ) by formula (3).
6. Compute the test statistics DPKS(N) by formula (2) with the unadjusted sample size N.
7. Compute the adjusted PKS sample size NADJ (see below).
8. Compute the appropriate critical value Dα (for a fixed significance level α) or the pvalue,
using the adjusted sample size NADJ. The simplest way to obtain PKS critical values or
pvalues is from the tables of Campbell and Oprian [
27
].
5 / 22
Conclusion: If DPKS(N) > Dα or the pvalue is smaller than the threshold (say p < 0.01)
then the null hypothesis is rejected, that is, the phylogeny is not clocklike, more precisely, the
clocklike phylogenetic model does not fit the data well enough (underfitting). If DPKS(N) <
Dα or the pvalue is bigger than the threshold (say p > 0.01) then the null hypothesis is not
rejected, that is, the phylogeny is clocklike, more precisely, the clocklike phylogenetic can fit
the data as well as the nonclock like phylogenetic model, and since the clocklike model has
fewer parameters it is preferable.
Definition of PKS sample sizes: The unadjusted sample size is defined as N = τ B, where B is
number of branches of the trees and τ is the least number of trees that satisfies the following
two conditions: (i) τ number of taxa; (ii) DPKS is minimal with respect to τ, that is, the fit of
the ECD is the best possible, given that condition (i) is satisfied. The adjusted sample size is
defined as NADJ = k N = k τ B, where the autocorrelation coefficient is defined as k = TESS/T,
with T the total number of trees generated by the MCMC sampler, after the burnin and TESS
the effective sample size associated to the tree lengths (TL) computed by the MCMC sampler.
The twosample KolmogorovSmirnov test for molecular clock
Our second proposal for testing the molecular clock is to apply the 2KS test with two
ensembles of trees that have been generated by Bayesian phylogeny inference. The test is performed
as follows:
1. Generate one ensemble of unconstrained phylogenetic trees. It is recommended to choose
an ªoutgroup taxonº that should be used to root the tree.
2. Generate one ensemble of clockconstrained phylogenetic trees. If the clock being tested is
the strict clock then the trees are rooted by default and hence the ªoutgroup taxonº should
not be included.
3. Perform a burnin discarding at least 25%. After the burnin, the loglikelihood scores
stabilize, and therefore all trees in the ensemble are considered statistically equivalent.
4. Extract all the branch lengths, as real numbers, of each tree in both ensembles.
5. Compute the ECD FN(x) using all the branch lengths in the ensemble of unconstrained
phylogenetic trees by formula (1). Here the 2KS sample size N is the total number of branches
used to compute the ECD FN(x) (see below).
6. Compute the ECD GM(x) using all the branch lengths in the ensemble of clockconstrained
phylogenetic trees by formula (1) with N replaced by M. Here the 2KS sample size M is the
total number of branches used to compute the ECD GM(x) (see below).
7. Compute the test statistics D(N,M) using formula (4).
8. Compute the adjusted sample sizes NADJ and MADJ (see below).
9. Compute the appropriate critical value Dα (for a fixed significance level α) or pvalue, using
the adjusted sample sizes NADJ and MADJ. In the 2KS test, critical values Dα or pvalues are
the usual ones (see Massey [
34
]).
Conclusion: If D(N,M) > Dα or the pvalue is smaller than some threshold (say p < 0.01)
the null hypothesis is rejected, that is, the phylogeny is not clocklike, more precisely, the
clocklike phylogenetic model does not fit the data well enough (underfitting). If D(N,M) <
Dα or the pvalue is bigger than the threshold (say p > 0.01) then the null hypothesis is not
rejected, that is, the phylogeny is clocklike, more precisely, the clocklike phylogenetic can fit
the data as well as the nonclock like phylogenetic model.
6 / 22
Definition of 2KS sample sizes: The unadjusted sample sizes are defined as N = M = τ B and
where B is number of branches of the trees and τ is the least number of trees that satisfies the
following two conditions: (i) τ number of taxa; (ii) DPKS is minimal with respect to τ, that is,
the fit of the ECD is the best possible, given that condition (i) is satisfied. The adjusted sample
sizes are defined as NADJ = kC N and MADJ = kU M, where the autocorrelation coefficient kC is
obtained from the clockconstrained ensemble and the autocorrelation coefficient kU is
obtained from the unconstrained ensemble, associated to the tree lengths (TL) computed by
the MCMC sampler, as before.
Poisson mixtures
When the tree topology is highly unbalanced, with some branches much longer than others, it
is expected that the long branches accumulate more changes than the short branches. For
example, in a tree ((A,B),C), if it is perfectly clocklike, branches ªAº and ªBº will be equal in
length, but branch ªCº will be longer than both. The interior branch is unlikely to be exactly
the same length as the three exterior branches. In such cases, if the tree is clocklike, we would
expect that the (discretized) branch length distribution is a mixture of several Poisson
distributions.
In such cases, the PKS test is expected to reject the null hypothesis. This does not mean that
the tree is not clocklike, even though the branch length distribution is multimodal. It would
be more appropriate to consider that the test was inconclusive and try other approaches: (i)
directly test the null hypothesis of a Poisson mixture using the branch lengths of the consensus
tree to estimate the several Poisson parameters; (ii) apply the PKS test to appropriate subtrees;
(iii) apply the 2KS test.
Performing the tests
We have performed the test proposed here with several ensembles of trees obtained from three
types of sequence data: a set of simulated sequences, a set of sequences generated by in vitro
evolution [
36
], three small datasets of real sequences and a large dataset of real sequences [
37
]
of fungal 18S rRNA gene (18S rDNA).
Simulated data. We generated a dataset of simulated sequence containing 9 sequences
with 10,000 nucleotides each, using the Kimura 2Parameter (K2P) nucleotide substitution
model [
38
] to evolve the sequences. Since the counting process associated to the K2P
substitution model is a Poisson process [
12
] we expected the branch lengths to follow a Poisson law.
The tree used to evolve the nucleotide sequences was the ªninetaxon treeº of [
39
] (Fig 1). The
tree has a long branch corresponding to the ªoutgroupº {O} and a subtree, the ªingroupº
{S1,. . .,S8}, consisting of 8 taxa. We generated an ensemble of 1,000 trees (obtained from 2,000
trees with 50% burnin). The topology was fixed to be the ninetaxon tree, with the taxon ªOº
as the outgroup and the K2P model was used to estimate the substitution rates.
First, we observe that the mean branch length of the ingroup is λ = 5.75, while the mean
branch length of the outgroup is λ = 25.18; the outgroup branch is approximately 5 times
longer than the branches of the ingroup. Taking into account that the tree is unrooted, this is
consistent with the fact the long branches accumulate an average of five times more substitutions
than the branches of the ingroup. The mean branch length of the full tree is λ = 7.01, is
approximately equal to the weighted average of branch lengths of the ingroup and the outgroup: 14/
15 x 5.75 + 1/15 x 25.18 = 7.03. This fact suggests, as expected, that the branch length
distribution of the full tree is a (14/15,1/15) weighted mixture of two independent Poisson
distributions with parameters λ1 = 5.75 and λ2 = 25.18, respectively. The hypothesis of Poisson
mixtures is further strengthened by comparing the ECD obtained from the branch lengths of
7 / 22
Fig 1. The ªninetaxon treeº used to evolve the simulated nucleotide sequences. Dataset of simulated
sequences containing 9 sequences with 10,000 nucleotides each, using the Kimura 2Parameter (K2P)
nucleotide substitution model [
38
] to evolve the sequences.
full tree ensemble with a Poisson cumulative distribution with parameter λ = 7.01 and the
cumulative distribution of a Poisson mixture with parameters λ1 = 5.75 and λ2 = 25.18 with
weights (14/15, 1/15) (Fig 2A and 2B).
We have performed the PKS test with the full tree, with the purpose of illustrating the
situation when the observed data is a (bimodal) Poisson mixture. As expected the PKS test
concludes that the phylogeny was not clocklike, with a pvalue < 0.001%. On the other hand,
when we removed the ªoutgroup branchº and performed the test with the ingroup the PKS
test concluded that the phylogeny was clocklike, with pvalue of 98% (Table 1). We also
performed the test with the outgroup and the PKS test concluded that the tree is clocklike, with
pvalue of 34% (Table 1). The ECDs of branch lengths obtained from the ingroup and the
outgroup ensembles with the respective expected cumulative Poisson distributions are shown in
Fig 2C and 2D.
In vitro evolution data. The 16 sequences of 2,238 nucleotides were generated by
fourstep serial bifurcate PCR method, where the ancestor sequence 18S rRNA gene (18S rDNA)
evolved in vitro for 280 nested PCR cycles [
36
]. The real phylogeny obtained in the experiment,
with the number of substitutions on each branch is shown in Fig 3.
We generated an ensemble of 1,000 trees (obtained from 2,000 trees with 50% burnin).
The additional ªoutgroup taxonº that was removed during the extraction of branch lengths,
and we used the General Time Reversible (GTR) model to estimate the substitution rates
(actually, the best substitution model fitting the in vitro evolution obtained in [
36
] is not even
timereversible).
We have performed the PKS test with the ensemble of trees and with the real tree. The PKS
test concluded that phylogeny was not clocklike, with a pvalue < 0.001%. On the other hand,
when performed just with the consensus tree the PKS test concluded that the phylogeny was
clocklike, with a pvalue of 65% (Table 2). The ECDs of branch lengths with the respective
8 / 22
Fig 2. Empirical cumulative distributions and expected cumulative distributions of the simulated
sequences ensemble. Vertical dotted line indicates the mean value. Panels (a) and (b) display the ECD of
the full tree against the expected cumulative Poisson distribution with mean branch length λ = 7.01 and the
expected cumulative Poisson mixture, respectively. Panel (c) displays the ECD of the ingroup against the
expected cumulative Poisson distribution with mean branch length λ = 5.75. Panel (d) displays the ECD of the
outgroup against the expected cumulative Poisson distribution with mean branch length λ = 25.18.
expected cumulative Poisson distributions are shown in Fig 4A and 4B. The incongruence
found between the results of the test on the real tree and the ensemble of trees is most likely
because the adjusted PKS samples size (NADJ = 465) obtained for the ensemble of trees, even
though being small in relation to the ensemble, was capable of providing enough information
and power to the test. Another possibility is that the tree topology was not well balanced and
hence a test for a Poisson mixture could provide another conclusion.
Small datasets of real sequences. In order to illustrate the method with highly
unbalanced tree topologies we have performed both tests with three datasets of real sequences: (1)
the ENV viral gene of immunodeficiency Lentiviruses (9 sequences of 3,013 nucleotides), (Fig
5); (2) the COX1 gene of Primates (9 sequences of 1,569 nucleotides), (Fig 6); (3) the 18S
rRNA gene (18S rDNA) of Ascomycete yeasts (17 sequences of 1,845 nucleotides), (Fig 7). For
each dataset we generated 3 ensembles with 1,000 trees (2,000 trees with 50% burnin)±one
nonclock, one strict clock and one relaxed clock±each of them under the GTR model.
Full Tree (K2P)
7.01
0.15
33 (15)
247
0.05
< 0.00001
> 0.99
Ingroup (K2P)
5.75
0.01
49 (14)
343
0.04
0.98
> 0.99
Fig 3. The real phylogeny obtained by in vitro PCR evolution. As described in Sanson et al. [
36
] with the
number of observed substitutions along each branch.
We performed the PKS test with the three ensembles. The PKS test concluded that the
phylogenies were not clocklike in all the three cases (Table 3). Since the trees are highly
unbalanced this was expected as the branch length distributions are multimodal in all cases.
However, the PKS test could have concluded this because of the unbalanced topology and not
because the tree is nor clocklike.
We performed the 2KS test for the unconstrained ensemble against the strict
clockconstrained ensemble and the uncorrelated rates lognormal relaxed clock model [
40
]. The 2KS
test concluded that the phylogenies were not strict clocklike in the three cases (Table 4). In all
three cases, it is clear from the ECDs that the branch length distribution of the unconstrained
ensemble is overdispersed in relation to the strict clockconstrained ensemble. On the other
hand, the 2KS test concluded that the phylogenies were clocklike for the relaxed clock model
(Table 5). Nevertheless, it is apparent from the ECDs that for the Lentiviruses ENV the test
was not overwhelmingly conclusive as in the other cases. The difference between the ENV and
the other two cases can be seen by constructing the coverage bands for the three cases (we have
removed the longer branches from the ENV ensemble). In fact, the 99% confidence band of
the ENV ECD was very wide (0.125) compared with other two cases (0.062 for COX1 and
Ensemble of Trees (GTR)
5.76
0.07
31 (30)
465
0.04
< 0.00001
> 0.83
Real Tree (Sanson et al. 2002)
5.36
0.07
1 (30)
30
0.16
0.65
> 0.83
10 / 22
Fig 4. Empirical cumulative distributions of the Sanson et al. [
36
] sequences ensemble. Vertical dotted
line indicates the mean value. Panels (a) and (b) display the empirical cumulative distribution of the observed
data from ensemble of trees against the expected cumulative Poisson distribution with mean branch length
λ = 5.75 and the empirical cumulative distribution of the observed data from the real tree against the expected
cumulative Poisson distribution with mean branch length λ = 5.36, respectively.
0.091 for 18S rDNA), see Fig 8. The situation of ENV could be resolved by trying to fit other
relaxed clock models and/or performing more powerful twosample tests. More importantly,
it should be noted that ultrametricity of the chronogram/timetree is not a requirement for this
method [
41
].
Large dataset of real sequences. Until now we have used some small datasets to
illustrate several aspects of the method, which might give the impression that the size of the
dataset is a limitation of the method. This is not the case, the method works equally well,
independently of the number of sequences.
Fig 5. The ENV phylogeny of immunodeficiency Lentiviruses. Taxa and accession numbers are: HIV1 (K03455.1), HIV2 (M30502.1), BIV (M32690.1),
FIV (M25381.1), SIVgm (U58991.1), SIVcpz (X52154.1), SIVmne (M32741.1), SIVrh (FJ842859) and SIVsmm (X14307.1).
11 / 22
Fig 6. The COX1 phylogeny of Primates. Taxa and accession numbers are: Homo sapiens (YP003024028.1), Pan troglodytes (NP008188.1), Pan
paniscus (NP008201.1), Gorilla gorilla (YP002120661.1), Pongo abelii (NP007837.1), Nomascus leucogenys (YP008379101.1), Macaca fascicularis
(YP002884228.1), Mus musculus (NP904330.1).
We have performed the tests on the dataset of [
37
], consisting of an alignment of 134
sequences (with 1474 bp) of fungal 18S rDNA, of which 131 are representative of all groups of
Fungi, 2 are representatives of animal (Clathrina cerebrum) and plants (Sphagnum cuspidatum)
and 1 outgroup (Developayella elegans), see [
37
] for the complete list of taxa, accession
numbers and the phylogeny. We generated 3 ensembles with 15,000 trees (20,000 trees with 25%
burnin)±one nonclock, one strict clock and one relaxed clock±each of them under the GTR
model.
We performed the PKS test, which concluded that the phylogeny was not clocklike
(Table 6). Since the tree is extremely unbalanced this was expected, as the branch length
distributions are multimodal in all cases. However, the PKS test could have concluded this because
of the unbalanced topology and not because the tree is nor clocklike.
We performed the 2KS test for the unconstrained ensemble against the strict
clockconstrained ensemble and the uncorrelated rates lognormal relaxed clock model [
40
]. The 2KS
test concluded that the phylogeny was not strict clocklike. Moreover, the 2KS test concluded
that the phylogeny was not clocklike for the relaxed clock model, as well (Table 7). From the
ECDs it seems that the strict clock and the relaxed clock look very similar. We performed the
2KS test for the strict clockconstrained ensemble against the relaxed clockconstrained
ensemble and, as expected, the test concluded that they are not the same (Table 7). It is clear
from the ECDs that the branch length distribution of the unconstrained ensemble is highly
overdispersed in relation to the strict clockconstrained and the relaxed clockconstrained
ensembles. In fact, the 99% confidence band of the unconstrained ensemble ECD excludes a
12 / 22
Fig 7. The 18S rDNA phylogeny of Ascomycetes. Taxa and accession numbers are: Schizosaccharomyces pombe (CU329672.1), Lachancea waltii
(X89527), Wickerhamomyces canadensis (AB054565.1), Eremothecium gossypii (AY046265.1), Zygosaccharomyces rouxii (AY227011.1),
Kluyveromyces lactis (HM009311.1), Vanderwaltozyma polyspora (JQ698890.1), Nakaseomyces bacillisporus (AY046252.1), Nakaseomyces delphensis
(AY198400.1), Candida glabrata (KT229542.1), Naumovozyma castellii (HE576754), Candida castellii (AY046253.1), Saccharomyces bayanus
(AY046227), Saccharomyces cerevisiae (JQ409454.1), Saccharomyces paradoxus (BR000309.1), Lachancea kluyveri (Z75580.1), Lachancea
thermotolerants (CU928180.1).
large portion of the ECDs of the other two clock constrained ensembles (Fig 9). In this case,
the conclusion indicates that other relaxed clock models should be considered.
Comparison with other tests
There are other statistical tests for the molecular clock, including relaxed clocks and they are of
two types: (i) tests that are applied directly to a consensus tree or to the ensemble of trees
generated by a standard run of the MCMC sampler (e.g., Likelihood Ratio (LR) test [
42
]); (ii) tests
that require extensive additional computations beyond the standard run of the MCMC
< 0.00001
26.5
sampler (e.g., Bayes Factor). Since our test is of the first type we have performed a simple
comparison between the PKS test and the LR test in order to illustrate their application.
Let us recall the main points of the Likelihood Ratio (LR) test [
42
]. Under the strict clock
hypothesis (H0), there are S−1 parameters corresponding to the ages of the S−1 internal nodes
on a rooted tree with S species. The more general nonclock hypothesis (H1) allows every
branch to have its own rate. Because time and rate are confounded, there are 2S−3 free
parameters, corresponding to the branch lengths in the unrooted tree. The clock model is equivalent to
the nonclock model by applying S−2 equality constraints. If L0 and L1 are the loglikelihood
values under clock and nonclock models, respectively, then 2ΔL = −2(L1−L0) is compared with
the critical value from the χ2 (chisquare) distribution with ν = S−2 degrees of freedom to decide
whether the clock hypothesis is rejected (the chisquare distribution is the asymptotic null
distribution of the test when ν is sufficiently large). In order to perform the LR test with trees
generated by Bayesian inference one must compute two consensus trees (one for each ensemble).
We have performed the LR test with both the simulated data and in vitro evolution data in
order to compare with the PKS test (Table 8). To perform the LR test we also needed to compute
a clockconstrained consensus tree. In both cases, the null hypothesis that the phylogeny is strictly
clocklike was not rejected. This is incongruent with the result of the PKS test performed on the
ensemble of trees obtained for the in vitro evolution data, most likely because the effective samples
size achieved in the PKS test was capable of providing enough information and power to the test.
It should be noted that the LR test does not examine whether the rate is constant over time. In
fact, what is tested is the weather the hypothesis that all tips of the tree are equidistant from the
root, with distances measured by the number of substitutions. Therefore, if the evolutionary rate
has been equally accelerating (or decelerating) over time in all lineages, in the absence of a
calibration the tree will be ultrametric, although the rate is not constant. Second, the test cannot
distinguish a constant rate from an average variable rate within a lineage, although the latter may be a
more sensible explanation than the former when the clock is rejected and the rate is variable
across lineages.
Lentiviruses ENV
349 (15)
670
354
0.043
0.107
0.773
Primates COX1
940 (15)
2,707
1,395
0.005
0.050
0.999
Yeasts 18S rDNA
211 (31)
1,255
647
0.007
0.070
0.999
14 / 22
Fig 8. Empirical cumulative distributions and 99% confidence bands for the three datasets of real
sequences. The confidence band (the two black curves) is constructed around the ECD of the unconstrained
ensemble. In the COX1 and 18S rDNA, the blue curve is underneath the red curve.
Conclusion
In the present paper we introduce two nonparametric goodnessoffit GOF tests based on the
empirical cumulative distribution (ECD) to the context of testing for the molecular clock in
phylogeny, by using branch lengths extracted from an ensemble of statistically equivalent
Nonclock ensemble
15.0
trees. The use of an ensemble of statistically equivalent trees to compute the ECD from the
observed data, instead of a consensus tree, relieves the effects of small sample size and issues
related to lack of power and lack of information.
The PKS test allows to directly verify whether a phylogeny is clocklike following a Poisson
law. The PKS test is very simple to apply and could even be performed simultaneously with the
generation of the ensemble of trees, but it is more limited since it has a very restricted null
hypothesis. The test may be extended to the case of Poisson mixtures in order to allow for
more flexible null hypothesis.
The 2KS test is more flexible allowing for the investigation whether a phylogeny follows a
relaxed clock model, but has more intricate usage since it requires the generation and
comparison of two ensembles of trees. The 2KS test can distinguish the strict clock model from a
relaxed clock model but it seems that it is not powerful enough to distinguish one relaxed
clock model from another. Generally speaking, in both methods it is possible to replace the KS
tests by another ECD based goodnessoffit test, such as the CrameÂrvon Mises test or the
AndersonDarling test [43±45]. Another possible extension of these methods is to employ
Bayesian nonparametric goodnessoffit tests.
Fig 9. Empirical cumulative distributions for the unconstrained (blue), strict clock (red) and relaxed
clock (green) and a 99% confidence band for the large dataset of real sequences. The confidence band
(the two black curves) is constructed around the ECD of the unconstrained ensemble.
16 / 22
Clock (L0)
−1,476.80
−4,312.59
Nonclock (L1)
−1,477.02
−4,321.71
Materials and methods
Theoretical framework
In this section we provide the theoretical basis for the tests described in Section 2. Let us start
with the computation of the ECD using the branch lengths obtained from an ensemble of
trees. There are two points that must be clarified: (i) the fact that the trees used in the
computation of FN(x) are not independent±this is an important issue, since one of the assumptions of
all KS tests is that the data used to compute FN(x) is that it is IID and of failure this assumption
may cause a pseudoreplication effect; (ii) what is the appropriate sample size N for the KS test,
that is, how many trees must be used to compute FN(x)±in our case this definition is not trivial,
since it is not the original data (the sequence alignment) that is used in the test, but the trees
generated by a MCMC sampler from it.
The issue of nonindependence of the trees is due to the fact that Bayesian phylogeny
inference employs a Markov Chain Monte Carlo (MCMC) sampler to compute the trees. In
addition, it is well known that critical values of the KS statistics are sensitive to sample
autocorrelation, that is, the actual false positive rate tends to be higher when there exists a positive
sample autocorrelation. As a result, direct use of the KS test might give misleading results
when used with an autocorrelated sample, even though, it is possible to modify the test to
account for sample autocorrelations [
46,47
]. Nevertheless, when sample autocorrelation is
due to a process with short range memory (e.g. markovian process) there are two simple
adjustments that allows for the application of KS test for IID samples to the case of
autocorrelated samples [48]. Fortunately, both adjustments are already implemented in most of the
MCMC samplers for Bayesian phylogeny inference. The first adjustment is called thinning of
sample, which consists in discarding a fixed number of consecutive sampled trees after one tree
is included in the ensemble so that the remaining trees are almost independent of each other.
The second procedure is called effective sample size (ESS). When a sample has autocorrelations,
the information contained in the data is (usually) less than the information contained in an IID
sample with the same size. In other words, the number of equivalent independent observations
is fewer than the actual sample size. The effective sample size (ESS) is the size of a putative IID
sample that carries the same amount of information as the correlated sample. In [
49
] the authors
discuss the notion of effective number of independent observations in an autocorrelated time
series, specifically with respect to estimating the mean and the variance of a distribution.
Furthermore, it has been demonstrated that several hypothesis tests can be modified for
autocorrelated data using an ESS adjustment. For example, [
50
] proposed a modified MannKendall trend
detection test with ESS adjustment. Similar to the KS test, the original MannKendall test is a
nonparametric test based on the IID sample assumption, which is not satisfied by most of time
series data. The modification proposed in [
50
] is simply to replace the actual sample size by the
effective sample size in the computation of the pvalue or the critical value. With the the ESS
adjustment the modified MannKendall provides the correct rejection rate. Here we propose the
same type of ESS adjustment for the KS test, by replacing the actual sample size by the effective
sample size computed by the MCMC sampler for the tree length (TL) estimation. It should be
remarked that in a strict clockconstrained tree the branch lengths are not independent, as well.
In fact, the branch lengths are constrained in such a way that the distance from the root to the
17 / 22
tips is the same for all tips. In this case, the ESS adjustment in the 2KS takes into account the lack
of independence of the branch lengths by employing the adjustment with the coefficient kC
associated to the tree length (TL) estimation.
Regarding the definition of appropriate sample size of the KS test, the reason why this is an
issue here is because the the ensemble of trees used as the sample for the tests is generated by a
MCMC sampler from the original data, and so the sample size N may, in principle, be
arbitrarily large. But this would be an artifact, since the input for the computation of the ensemble
is a finite set of finite sequences and one cannot extract an arbitrary amount of information
form it by computing an arbitrarily large number of sample trees. Moreover, it is known that
the critical value Dα = Dα(λ,N) of the PKS test, as a function of the sample size, goes to zero as
fast as N ½, when N goes to infinity [
51,52
]. On the other hand, since DPKS(N) does not go to
zero, when N goes to infinity (because the original data contains a finite amount of
information), it is possible to increase the number of trees used to compute DPKS in such a way that the
null hypothesis is always rejected, thus, rendering the test completely useless. In order to
circumvent this difficulty we proposed the definition of the sample size in terms of the least
number of trees τ such that DPKS is minimal with respect to τ, given that τ the number of taxa
(this last condition ensures that the ensemble of trees is not too small). Then, if the null
hypothesis is rejected under these conditions, increasing the sample size will not change the
result, although it would artificially inflate the power of the test. In this case, the use of the ESS
adjusted sample size NADJ ensures that the power is not inflated. On the other hand, if the null
hypothesis is not rejected, increasing the sample size would artificially revert this result.
Hence, in this case, the use of NADJ ensures that this reversion does not occur. Failure to reject
the null hypothesis have two different meanings, depending on the magnitude of NADJ: (i) if
NADJ is sufficiently large and (H0) is not rejected then it is possible that the null hypothesis is
actually true; (ii) if NADJ is small and (H0) is not rejected then the test is inconclusive due to
lack of information in the data from which the ensemble of trees was generated.
It is convenient to be able to estimate the power of the KS test in order to evaluate if the test
correctly rejects the null hypothesis when the alternative hypothesis is true, equivalently, the
probability of accepting the alternative hypothesis when it is true. However, it is very difficult
to compute the asymptotic distribution of the KS statistics under the alternative hypothesis
and thus it is difficult to compute the power of the KS test. It is possible, nevertheless, to find a
lower bound for the power and use it to evaluate the asymptotic power of the KS test [
22,53
].
Although, this lower bound is conservative for discrete versions of the KS test [54], which
includes the PKS version used here, it still may be used to evaluate the power of the test.
Historically, KS tests have only been used as goodnessoffit tests for continuous
distributions, while the chisquare test has been commonly employed for discrete data. In [
55
] the
author gives a comprehensive review of both and their competitors. The chisquare test
statistics may also be written as a measure of discrepancy between the ECD FN(x) and expected
cumulative distribution FE(x). However, it does not take into account the natural ordering
among the observations, a fact exploited in analysis of attribute data. More specifically, the
chisquare test statistics is invariant under permutations. In contrast, the KS test statistics is
sensitive to the overweighting or underweighting of any tail or segment of the empirical
distribution relative to the hypothesized distribution. It is for this reason that KS tests derive their
greater advantages and is the main motivation behind several efforts to adapt KS tests to
discrete data.
Finally, it is interesting to observe that the idea behind the likelihood ratio (LR) test of
examining if all tips of the tree are equidistant has an analogue in the nonparametric setting
proposed here. For each tree in the ensemble, a tree branch randomly picked and the distance
from the root computed. Then, the ECD of distances from the tips to the root is computed and
18 / 22
tested if the distribution follows a Poisson law (the same discussion about sample size and ESS
adjustment should apply here). A hint that this procedure might work is provided by the
analysis involving the outgroup ECD in the simulated example, which is the distance from the root
to the tip and is very close to a Poisson distribution. The potential importance of this variation
in the PKS test, is beyond the scope of the present study and may be investigated in a future
work.
Software and computation resources
The alignments were made with Clustal W2 [
56
], the phylogenies were computed with
MrBayes 3.2.6 [
57
] and the model selection was computed with jModelTest 2 [
58
]. The
statistical software R 3.3.2 [
59
] was used to prepare figures of ECDs and confidence bands, compute
critical values and pvalues. Phylogenetic tree manipulation and branch length extraction was
made with the R package APE 3.5 [
60
]. The simulation of nucleotide sequence was made with
SeqGen 1.3.2 [
61
]. Alternatively, we have created a simple program to perform the
onesample PKS test. The program imports the output files generated by MrBayes, extract the branch
lengths from the ensemble of trees. The PKS critical values are computed using the tables of
Campbell and Oprian [
27
]. The user may choose one of the three possible significance levels α:
10%, 5% and 1%. The program outputs the following information: (i) the mean value and
variance of the loglikelihood scores of the ensemble of trees; (ii) the mean and the variance of the
ECD of branch lengths, measured in number of substitutions; (iii) the PKS statistics DPKS and
the PKS critical value Dα for the chosen significance level. Finally, it is possible to plot the ECD
of branch lengths and the expected Poisson cumulative distribution. The implementation of
the PKS test is done in Python 2.7 [
62
] with the libraries Numpy, Pylab, Matplotlib, Tkinter.
Program availability
The source code of program PKS, that implements the method here described, is available at
GitHub (github.com/FernandoMarcon/PKS_Test).
Supporting information
S1 File. Alignments used to generate phylogenies and analyses shown in all tables and
figures in this study are available as a single zip file containing 6 folders named 18SrRNA,
COX1, ENV, Fungi, Sanson and simulated. Each folder contains the corresponding
alignments in nexus format.
(ZIP)
Acknowledgments
The authors thank Dr. Joseph Felsenstein for critical review of the manuscript and valuable
suggestions regarding the sets of branch lengths as IID variables and effects of covariation
between long branches.
Author Contributions
Conceptualization: Fernando Antoneli, Marcelo R. S. Briones.
Data curation: Luciano R. Lopes.
Formal analysis: Marcelo R. S. Briones.
Funding acquisition: Marcelo R. S. Briones.
19 / 22
Investigation: Fernando Antoneli.
Methodology: Fernando M. Passos, Luciano R. Lopes.
Project administration: Marcelo R. S. Briones.
Resources: Luciano R. Lopes.
Software: Fernando Antoneli, Fernando M. Passos.
Supervision: Marcelo R. S. Briones.
Validation: Fernando Antoneli.
Writing ± original draft: Fernando Antoneli, Marcelo R. S. Briones.
Writing ± review & editing: Fernando Antoneli, Marcelo R. S. Briones.
20 / 22
21 / 22
1. Zuckerkandl E , Pauling L. Molecular Disease , Evolution and Genic Heterogeneity . In: Kasha M , Pullman B , editors. Horizons in Biochemistry: Albert SzentGyoÈrgyi Dedicatory Volume. 2nd ed. New York: Academic Press; 1962 . pp. 189 ± 225 .
2. Zuckerkandl E , Pauling L . Evolutionary divergence and convergence in proteins . In: Bryson V , Vogel H , editors. Evolving Genes and Proteins . New York: Academic Press; 1965 . pp. 97 ± 166 .
3. Zuckerkandl E , Pauling L . Molecules as documents of evolutionary history . J Theor Biol . 1965 ; 8 : 357 ± 366 . PMID: 5876245
4. Kimura M , Ohta T. Protein polymorphism as a phase of molecular evolution . Nature . 1971 ; 229 : 467 ± 469 . PMID: 4925204
5. Kimura M. The Neutral Theory of Molecular Evolution . Cambridge University Press; 1984 .
6. Kimura M. Molecular evolutionary clock and the neutral theory . J Mol Evol . 1987 ; 26 : 24 ± 33 . PMID: 3125335
7. Gillespie JH . The molecular clock may be an episodic clock . PNAS . 1984 ; 81 : 8009 ± 8013 . PMID: 6595674
8. Gillespie JH . Rates of Molecular Evolution. Annual Review of Ecology and Systematics . 1986 ; 17 : 637 ± 665 . https://doi.org/10.1146/annurev.es. 17 .110186.003225
9. Swofford DL , Olsen GJ , Waddell PJ , Hillis DM . Phylogenetic Inference . In: Hillis DM , Moritz C , Mable BK , editors. Molecular Systematics . 2nd ed. Massachusetts: Sinauer Associates, Inc.; 1996 . pp. 407 ± 514 .
10. Fitch WM . Molecular evolutionary clocks . In: Ayala FJ, editor. Molecular Evolution . Sinauer Associates, Inc.; 1976 . pp. 160 ± 178 .
11. Tajima F. Simple methods for testing the molecular evolutionary clock hypothesis . Genetics . 1993 ; 135 : 599 ± 607 . PMID: 8244016
12. Zheng Q. On the dispersion index of a Markovian molecular clock . Mathematical Biosciences . 2001 ; 172 : 115 ± 128 . https://doi.org/10.1016/S0025 5564 ( 01 ) 00067  0 PMID: 11520502
13. Goldman N. Variance to mean ratio, R(t), for poisson processes on phylogenetic trees . Mol Phylogenet Evol . 1994 ; 3 : 230 ± 239 . https://doi.org/10.1006/mpev. 1994 .1025 PMID: 7820287
14. Nielsen R . Robustness of the estimator of the index of dispersion for DNA sequences . Mol Phylogenet Evol . 1997 ; 7 : 346 ± 351 . https://doi.org/10.1006/mpev. 1997 .0411 PMID: 9187093
15. ReÂnyi A . On an extremal property of the poisson process . Ann Inst Stat Math . 1964 ; 16 : 129 ± 133 . https://doi.org/10.1007/BF02868567
16. Takahata N. On the overdispersed molecular clock . Genetics . 1987 ; 116 : 169 ± 179 . PMID: 3596230
17. Takahata N. Statistical models of the overdispersed molecular clock . Theor Popul Biol . 1991 ; 39 : 329 ± 344 . PMID: 1896948
18. Kolmogorov AN . Sulla determinazione empirica di una legge di distribuzione . Giornale dell'Istituto Italiano degli Attuari . 1933 ; 4 : 83 ± 91 .
19. Kolmogorov AN . On the empirical determination of a distribution law . In: Shiryayev AN, editor. Selected Works of A N Kolmogorov . Springer Netherlands; 1992 . pp. 139 ± 146 . Available: http://link.springer. com/chapter/10.1007/ 978 9401122603_ 15
20. Smirnov NV . On the Estimation of the Discrepancy Between Empirical Curves of Distribution for Two Independent Samples . Bul Math de l'Univ de Moscou. 1939 ; 2 : 3± 14 .
21. Smirnov NV . Table for estimating the goodness of fit of empirical distributions . Ann Math Statist . 1948 ; 19 : 279 ± 281 . https://doi.org/10.1214/aoms/1177730256
22. Massey FJ Jr. The KolmogorovSmirnov test for goodness of fit . Journal of the American Statistical Association . 1951 ; 46 : 68 ± 78 . https://doi.org/10.2307/2280095
23. Noether GE. Note on the kolmogorov statistic in the discrete case . Metrika . 1963 ; 7 : 115 ± 116 . https:// doi.org/10.1007/BF02613966
24. Schmid P. On the Kolmogorov and Smirnov Limit Theorems for Discontinuous Distribution Functions . The Annals of Mathematical Statistics . 1958 ; 29 : 1011 ± 1027 .
25. Pettitt AN , Stephens MA . The KolmogorovSmirnov GoodnessofFit Statistic with Discrete and Grouped Data . Technometrics . 1977 ; 19 : 205 ± 210 . https://doi.org/10.1080/00401706. 1977 .10489529
26. Wood CL , Altavela MM . LargeSample Results for KolmogorovSmirnov Statistics for Discrete Distributions . Biometrika. 1978 ; 65 : 235 ± 239 . https://doi.org/10.2307/2335304
27. Campbell DB , Oprian CA. On the KolmogorovSmirnov test for the Poisson distribution with unknown mean . Biom J . 1979 ; 21 : 17 ± 24 . https://doi.org/10.1002/bimj.4710210104
28. Papadopoulos AS , Qiao N. On the KolmogorovSmirnov test for the Poisson distribution with unknown parameter . Journal of Interdisciplinary Mathematics . 2003 ; 6 : 65 ± 82 . https://doi.org/10.1080/09720502. 2003 .10700331
29. Conover WJ . A Kolmogorov goodnessoffit test for discontinuous distributions . Journal of the American Statistical Association . 1972 ; 67 : 591 ± 596 . https://doi.org/10.2307/2284444
30. Henze N. Empiricaldistributionfunction goodnessoffit tests for discrete models . The Canadian Journal of Statistics / La Revue Canadienne de Statistique . 1996 ; 24 : 81 ± 93 . https://doi.org/10.2307/ 3315691
31. Frey J. An exact Kolmogorov±Smirnov test for the Poisson distribution with unknown mean . Journal of Statistical Computation and Simulation . 2011 ; 82 : 1023 ± 1033 . https://doi.org/10.1080/00949655. 2011 . 563740
32. Wasserman L. All of Nonparametric Statistics. Springer New York; 2006 .
33. Hollander M , Wolfe DA , Chicken E . Nonparametric Statistical Methods. 3 edition . Hoboken, New Jersey: Wiley; 2013 .
34. Massey FJ . Distribution Table for the Deviation Between two Sample Cumulatives . Ann Math Statist. 1952 ; 23 : 435 ± 441 . https://doi.org/10.1214/aoms/1177729388
35. Kendall M , Stuart A , Ord K . Kendall's Advanced Theory of Statistics . 6th ed. Wiley; 1994 .
36. Sanson GFO , Kawashita SY , Brunstein A , Briones MRS . Experimental phylogeny of neutrally evolving DNA sequences generated by a bifurcate series of nested polymerase chain reactions . Mol Biol Evol . 2002 ; 19 : 170 ± 178 . PMID: 11801745
37. Padovan ACB , Sanson GFO , Brunstein A , Briones MRS . Fungi evolution revisited: application of the penalized likelihood method to a Bayesian fungal phylogeny provides a new perspective on phylogenetic relationships and divergence dates of Ascomycota groups . J Mol Evol . 2005 ; 60 : 726 ± 735 . https:// doi.org/10.1007/s002390040164y PMID: 15909224
38. Kimura M. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences . J Mol Evol . 1980 ; 16 : 111 ± 120 . PMID: 7463489
39. Ho SYW , Phillips MJ , Drummond AJ , Cooper A . Accuracy of Rate Estimation Using RelaxedClock Models with a Critical Focus on the Early Metazoan Radiation . Mol Biol Evol . 2005 ; 22 : 1355 ± 1363 . https://doi.org/10.1093/molbev/msi125 PMID: 15758207
40. Lepage T , Bryant D , Philippe H , Lartillot N . A General Comparison of Relaxed Molecular Clock Models . Molecular Biology and Evolution . 2007 ; 24 : 2669 ± 2680 . https://doi.org/10.1093/molbev/msm193 PMID: 17890241
41. Wu CI , Li WH . Evidence for higher rates of nucleotide substitution in rodents than in man . PNAS . 1985 ; 82 : 1741 ± 1745 . PMID: 3856856
42. Felsenstein J . Phylogenies from Molecular Sequences: Inference and Reliability. Annual Review of Genetics . 1988 ; 22 : 521 ± 565 . https://doi.org/10.1146/annurev.ge. 22 .120188.002513 PMID: 3071258
43. Stephens MA . Use of the KolmogorovSmirnov, CramerVon Mises and Related Statistics Without Extensive Tables . Journal of the Royal Statistical Society Series B (Methodological) . 1970 ; 32 : 115 ± 122 .
44. Stephens MA . EDF Statistics for Goodness of Fit and Some Comparisons . Journal of the American Statistical Association . 1974 ; 69 : 730 ± 737 . https://doi.org/10.1080/01621459. 1974 .10480196
45. D'Agostino RB , Stephens MA , editors. Goodnessoffittechniques. 1 edition . New York: Dekker; 1986 .
46. Weiss MS. Modification of the KolmogorovSmirnov Statistic for use with correlated data . Journal of the American Statistical Association . 1978 ; 73 : 872 ± 875 . https://doi.org/10.2307/2286297
47. Dehling H , Mikosch T , Sorensen, editors. Empirical Process Techniques for Dependent Data [Internet] . Boston: BirkhaÈuser Basel; 2002 . Available: http://www.springer.com/la/book/9780817642013
48. Chicheportiche R , Bouchaud JP. Goodnessoffit tests with dependent observations . J Stat Mech . 2011 ; 2011: P09003 . https://doi.org/10.1088/ 1742  5468 / 2011 /09/P09003
49. Bayley GV , Hammersley JM . The ªEffectiveº Number of Independent Observations in an Autocorrelated Time Series . Supplement to the Journal of the Royal Statistical Society . 1946 ; 8 : 184 ± 197 . https://doi. org/10.2307/2983560
50. Yue S , Wang C . The MannKendall test modified by effective sample size to detect trend in serially correlated hydrological series . Water Resources Management . 2004 ; 18 : 201 ± 218 . https://doi.org/10. 1023/B:WARM. 0000043140 .61082.60
51. Dvoretzky A , Kiefer J , Wolfowitz J . Asymptotic Minimax Character of the Sample Distribution Function and of the Classical Multinomial Estimator . Ann Math Statist. 1956 ; 27 : 642 ± 669 . https://doi.org/10. 1214/aoms/1177728174
52. Massart P. The Tight Constant in the DvoretzkyKieferWolfowitz Inequality . Ann Probab. 1990 ; 18 : 1269 ± 1283 . https://doi.org/10.1214/aop/1176990746
53. Capon J . On the Asymptotic Efficiency of the KolmogorovSmirnov Test . Journal of the American Statistical Association . 1965 ; 60 : 843 ± 853 . https://doi.org/10.2307/2283250
54. Gleser LJ . Exact Power of GoodnessofFit Tests of Kolmogorov Type for Discontinuous Distributions . Journal of the American Statistical Association . 1985 ; 80 : 954 ± 958 . https://doi.org/10.1080/01621459. 1985 .10478210
55. Horn SD . GoodnessofFit Tests for Discrete Data: A Review and an Application to a Health Impairment Scale . Biometrics. 1977 ; 33 : 237 ± 247 . https://doi.org/10.2307/2529319 PMID: 843576
56. Larkin MA , Blackshields G , Brown NP , Chenna R , McGettigan PA , McWilliam H , et al. Clustal W and Clustal X version 2 .0. Bioinformatics . 2007 ; 23 : 2947 ± 2948 . https://doi.org/10.1093/bioinformatics/ btm404 PMID: 17846036
57. Ronquist F , Teslenko M , van der Mark P , Ayres DL , Darling A , HoÈhna S , et al. MrBayes 3 . 2: efficient Bayesian phylogenetic inference and model choice across a large model space . Syst Biol . 2012 ; 61 : 539 ± 542 . https://doi.org/10.1093/sysbio/sys029 PMID: 22357727
58. Posada D. jModelTest: Phylogenetic Model Averaging . Molecular Biology and Evolution . 2008 ; 25 : 1253 ± 1256 . https://doi.org/10.1093/molbev/msn083 PMID: 18397919
59. R Core Team. R: A language and environment for statistical computing [Internet] . Vienna, Austria: R Foundation for Statistical Computing; 2015 . Available: http://www.Rproject.org/
60. Paradis E , Claude J , Strimmer K. APE: Analyses of Phylogenetics and Evolution in R language . Bioinformatics . 2004 ; 20 : 289 ± 290 . https://doi.org/10.1093/bioinformatics/btg412 PMID: 14734327
61. Rambaut A , Grass NC . SeqGen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees . Comput Appl Biosci . 1997 ; 13 : 235 ± 238 . https://doi.org/10.1093/ bioinformatics/13.3.235 PMID: 9183526
62. van Rossum G. Python [Internet]. Amsterdam: Python Software Foundation; 1995 . Available: https:// www.python.org/