Diffusion Tensor Imaging with Deterministic Error Bounds
J Math Imaging Vis
Artur Gorokh · Yury Korolev · Tuomo Valkonen

Faculty of Physics, Lomonosov Moscow State University, Moscow, Russia
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
School of Engineering and Materials Science, Queen Mary University of London, London, UK
Errors in the data and the forward operator of an inverse problem can be handily modelled using partial order in Banach lattices. We present some existing results of the theory of regularisation in this novel framework, where errors are represented as bounds by means of the appropriate partial order. We apply the theory to diffusion tensor imaging, where correct noise modelling is challenging: it involves the Rician distribution and the nonlinear Stejskal–Tanner equation. Linearisation of the latter in the statistical framework would complicate the noise model even further. We avoid this using the error bounds approach, which preserves the simple error structure under monotone transformations.
Keywords Diffusion tensor imaging · Noise modelling · Total generalised variation · Deterministic error bounds

Mathematics Subject Classification 92C55 · 94A08
1 Introduction
Often in inverse problems, we have only very rough knowledge of noise models, or the exact model is too difficult to realise in a numerical reconstruction method. The data may also contain process artefacts from black-box devices [44]. Partial order in Banach lattices has therefore recently been investigated in [32–34] as a less assuming error modelling approach for inverse problems. This framework allows the representation of errors in the data as well as in the forward operator of an inverse problem by means of order intervals (i.e. lower and upper bounds by means of appropriate partial orders). An important advantage of this approach over statistical noise modelling is that deterministic error bounds preserve their simple structure under monotone transformations.

We apply partial order in Banach lattices to diffusion tensor imaging (DTI). We will in due course explain the diffusion tensor imaging process, as well as the theory of inverse problems in Banach lattices, but start by introducing our model

min_u R(u)  s.t.  u ≥ 0,  g_j^l ≤ A_j u ≤ g_j^u  (j = 1, …, N).
That is, we want to find a field of symmetric 2-tensors u : Ω → Sym²(R³) on the domain Ω ⊂ R³, minimising the value of the regulariser R on the feasible set. The tensor field u is our unknown image. It is subject to a positivity constraint, as well as partial order constraints imposed through the operators [A_j u](x) := −⟨b_j, u(x)b_j⟩, and the upper and lower bounds g_j^l := log(ŝ_j^l/ŝ_0^u) and g_j^u := log(ŝ_j^u/ŝ_0^l). These arise from the linearisation (via the monotone logarithmic transformation) of the Stejskal–Tanner equation

s_j(x) = s_0(x) exp(−⟨b_j, u(x)b_j⟩),  (j = 1, …, N),  (1)

central to the diffusion tensor imaging process.
To shed more light on u and equation (1), let us briefly outline the diffusion tensor imaging process. As a first step towards DTI, diffusion-weighted magnetic resonance imaging (DWI) is performed. This process measures the anisotropic diffusion of water molecules. To capture the diffusion information, the magnetic resonance images have to be measured with diffusion-sensitising gradients in multiple directions. These are the different b_j's in (1). Eventually, multiple DWI images {s_j} are related through the Stejskal–Tanner equation (1) to the symmetric positive-definite diffusion tensor field u : Ω → Sym²(R³) [4,29]. At each point x ∈ Ω, the tensor u(x) is the covariance matrix of a normal distribution for the probability of water diffusing in different spatial directions.
The fact that multiple b_j's are needed to recover u leads to very long acquisition times, even with ultra-fast sequences like echo planar imaging (EPI). Therefore, DTI is inherently a low-resolution and low-SNR method. In theory, the amplitude DWI images exhibit Rician noise [24]. However, as the histogram of an in vivo measurement in Fig. 1 illustrates, this may not be the case for practical data sets from black-box devices. Moreover, the DWI process is prone to eddy-current distortions [53], and due to its slowness, it is very sensitive to patient motion [1,27]. We therefore have to use techniques that remove these artefacts in solving for u(x). We also need to ensure the positivity of u, as non-positive-definite diffusion tensors are non-physical. One proposed approach for the satisfaction of this constraint is that of log-Euclidean metrics [3]. This approach has several theoretically desirable aspects, but some practical shortcomings [57]. Special Perona–Malik-type constructions on Riemannian manifolds can also be used to maintain the structure of the tensor field [14,54]. Such anisotropic diffusion is, however, severely ill-posed [60]. Recently, manifold-valued discrete-domain total variation models have also been applied to diffusion tensor imaging [6].
Our approach is also in the total variation family, first considered for diffusion tensor imaging in [48]. Namely, we follow up on the work in [55,57–59] on the application of total generalised variation regularisation [9] to DTI. We should note that in all of these works, the fidelity function was the ROF-type [45] L² fidelity. This would only be correct, under the assumption that the noise of the MRI measurements is Gaussian, if we had access to the original complex k-space MRI data. The noise of the inverse-Fourier-transformed magnitude data s_j, to which we have access in practice, is however Rician under the Gaussian assumption on the original complex data [24]. This is not modelled by the L² fidelity. Numerical implementation of Rician noise modelling has been studied in [22,40]. As already discussed, in this work, we take the other direction. Instead of modelling the errors in a statistically accurate fashion, not assuming to know an exact noise model, we represent them by means of pointwise bounds. The details of the model are presented in Sect. 3. We study the practical performance in Sect. 5 using the numerical method presented in Sect. 4. First, however, we start with the general error modelling theory in Sect. 2. Readers who are not familiar with the notation for Banach lattices or symmetric tensor fields are advised to start with Appendix 2, where we introduce our mathematical notation and techniques.
2 Deterministic Error Modelling
2.1 Mathematical Basis
[Fig. 1 caption fragment: … cannot be Rician, some bins of the 50-bin histogram being empty. The measurement setup of the data used here is described in Sect. 5.3. a Slice of a real MRI measurement. b 50-bin histogram of noise estimated from background. c 50-bin histogram after eddy-current correction with FSL]

We now briefly outline the theoretical framework [32] that is the basis for our approach. Consider a linear operator equation

Au = f,  u ∈ U,  f ∈ F,  (2)
where U and F are Banach lattices and A : U → F is a regular injective operator. The inaccuracies in the right-hand side f and the operator A are represented as bounds by means of appropriate partial orders, i.e.

f^l, f^u :  f^l ≤_F f ≤_F f^u,
A^l, A^u :  A^l ≤_{L∼(U,F)} A ≤_{L∼(U,F)} A^u,  (3)

where the symbol ≤_F stands for the partial order in F, and ≤_{L∼(U,F)} for the partial order on regular operators induced by the partial orders in U and F. In the following, we will drop the subscripts on the inequality signs where this does not cause confusion.
The exact right-hand side f and operator A are not available. Given the approximate data (f^l, f^u, A^l, A^u), we need to find an approximate solution u that converges to the exact solution ū as the inaccuracies in the data diminish. This statement needs to be formalised. We consider monotone convergent sequences of lower and upper bounds

f_n^l ≤ f_{n+1}^l,  f_{n+1}^u ≤ f_n^u,  f_n^l ≤ f ≤ f_n^u,  ‖f_n^l − f_n^u‖ → 0,
A_n^l ≤ A_{n+1}^l,  A_{n+1}^u ≤ A_n^u,  A_n^l ≤ A ≤ A_n^u  ∀n ∈ N,  ‖A_n^l − A_n^u‖ → 0 as n → ∞.  (4)
We are looking for an approximate solution u_n such that ‖u_n − ū‖_U → 0 as n → ∞.

Let us ask the following question: what are the elements u ∈ U that could have produced data within the tolerances (4)? Obviously, the exact solution is one such element. Let us call the set containing all such elements the feasible set U_n ⊂ U.

Suppose that we know a priori that the exact solution is positive (by means of the appropriate partial order in U). Then it is easy to verify that the following inequalities hold for all n ∈ N:

ū ≥_U 0,  A_n^u ū ≥_F f_n^l,  A_n^l ū ≤_F f_n^u.
This observation motivates our choice of the feasible set:

U_n = { u ∈ U : u ≥_U 0,  A_n^u u ≥_F f_n^l,  A_n^l u ≤_F f_n^u }.

It is clear that the exact solution ū belongs to the sets U_n for all n ∈ N. Our goal is to define a rule that, for any n, chooses an element u_n of the set U_n such that the sequence u_n ∈ U_n strongly converges to the exact solution ū. We do so by minimising an appropriate regularisation functional R(u) on U_n:

u_n = arg min_{u ∈ U_n} R(u).  (5)
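In a finite-dimensional setting, the feasible set U_n is simply an order interval, a "tube" of signals, and (5) picks from it an element with the smallest regulariser value. The following sketch, with hypothetical bounds, A the identity, and discrete total variation as R, illustrates this: a value that moves only when forced by the tube attains the minimal total variation for this particular tube (in general, the choice of the starting value matters).

```python
import numpy as np

# Hypothetical componentwise bounds f^l <= u <= f^u defining the feasible set U_n
# (with A = identity, for illustration only).
l = np.array([0.0, 0.0, 0.5, 0.8, 0.8])
u = np.array([0.3, 0.3, 0.9, 1.2, 1.2])

def follow_clamped(l, u, start):
    """Track the previous value, clamping it into [l_i, u_i]: moves only when forced."""
    x = [float(np.clip(start, l[0], u[0]))]
    for lo, hi in zip(l[1:], u[1:]):
        x.append(float(np.clip(x[-1], lo, hi)))
    return np.array(x)

x = follow_clamped(l, u, start=0.3)
tv = np.abs(np.diff(x)).sum()
# Any feasible signal satisfies x[0] <= 0.3 and x[-1] >= 0.8, so TV(x) >= 0.5;
# the clamped path attains exactly this lower bound here.
print(x, tv)
```

The point of the construction is only to make the feasible set and the selection rule (5) tangible; it is not a general-purpose solver.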
This method is, in fact, a lattice analogue of the well-known residual method [23,52]. The convergence result is as follows [32].

Theorem 1 Suppose that

1. R(u) is bounded from below on U,
2. R(u) is lower semicontinuous,
3. the level sets {u : R(u) ≤ C} (C = const) are sequentially compact in U (in the strong topology induced by the norm).

Then the sequence defined in (5) strongly converges to the exact solution ū, and R(u_n) → R(ū).
Examples of regularisation functionals that satisfy the conditions of Theorem 1 are as follows. Total variation in L¹(Ω), where Ω is a subset of R^n, assures strong convergence in L¹, given that the L¹-norm of the solution is bounded. The Sobolev norm ‖u‖_{W^{1,q}(Ω)} yields strong convergence in the spaces L^p(Ω) with p < nq/(n − q) (for q < n). The latter fact follows from the compact embedding of the corresponding Sobolev space W^{1,q}(Ω) into L^p(Ω) [16].
However, the assumption that the sets {u : R(u) ≤ C} are strongly compact in U is quite restrictive. It can be replaced by the assumption of weak compactness, provided that the regularisation functional possesses the so-called Radon–Riesz property.

Definition 1 A functional F : U → R has the Radon–Riesz property (sometimes called the H-property) if, for any sequence u_n ∈ U, weak convergence u_n ⇀ u_0 together with convergence of the values F(u_n) → F(u_0) implies strong convergence u_n → u_0.
Theorem 2 Suppose that

1. R(u) is bounded from below on U,
2. R(u) is weakly lower semicontinuous,
3. the level sets {u : R(u) ≤ C} (C = const) are weakly sequentially compact in U,
4. R(u) possesses the Radon–Riesz property.

Then the sequence defined in (5) strongly converges to the exact solution ū, and R(u_n) → R(ū).

It is easy to verify that the norm in any Hilbert space possesses the Radon–Riesz property. Moreover, this holds for the norm in any reflexive Banach space [16].
As we explain in Appendix 2, L^p(Ω; Sym²(R^m)) is not a Banach lattice. Therefore, Theorems 1 and 2 cannot be applied directly. Further theoretical work will be undertaken to extend the framework to the non-lattice case. For the moment, however, we will prove that if there are no errors in the operator A in (2), the requirement that the solution space U is a lattice can be dropped.

Theorem 3 Let U be a Banach space and F a Banach lattice. Let the operator A in (2) be linear, continuous and injective. Let f_n^l and f_n^u be sequences of lower and upper bounds for the right-hand side as defined in (4), and suppose that there are no errors in the operator A. Let us redefine the feasible set in the following way:

U_n = { u ∈ U : f_n^l ≤_F Au ≤_F f_n^u }.

Suppose also that the regulariser R(u) satisfies the conditions of either Theorem 1 or Theorem 2. Then the sequence defined in (5) strongly converges to the exact solution ū, and R(u_n) → R(ū).
Proof Let us define an approximate right-hand side and its approximation error in the following way:

f_{δ_n} = (f_n^u + f_n^l)/2,   δ_n = ‖f_n^u − f_n^l‖/2.

One can easily verify that the inequality ‖f − f_{δ_n}‖ ≤ δ_n holds. Indeed, we have

f − f_{δ_n} ≤ f_n^u − f_{δ_n} = (f_n^u − f_n^l)/2,
−(f − f_{δ_n}) ≤ f_{δ_n} − f_n^l = (f_n^u − f_n^l)/2,
|f − f_{δ_n}| = (f − f_{δ_n}) ∨ (−(f − f_{δ_n})) ≤ (f_n^u − f_n^l)/2,
‖f − f_{δ_n}‖ ≤ ‖f_n^u − f_n^l‖/2 = δ_n.

The first two inequalities are consequences of the conditions (4); the third one holds by the definition of the supremum and the equality |f| = f ∨ (−f), valid for all f ∈ F; and the last inequality is due to the monotonicity of the norm in a Banach lattice.

Similarly, one can show that for any u ∈ U_n, we have ‖Au − f_{δ_n}‖ ≤ δ_n. Therefore, the inclusion U_n ⊂ {u : ‖Au − f_{δ_n}‖ ≤ δ_n} holds.
Now we proceed with the proof of the convergence ‖u_n − ū‖ → 0. We prove it for the case when the regulariser R(u) satisfies the conditions of Theorem 1. Suppose that the sequence u_n does not converge to the exact solution ū; then it contains a subsequence u_{n_k} such that ‖u_{n_k} − ū‖ ≥ ε for all k ∈ N and some fixed ε > 0.

Since the inclusion ū ∈ U_n holds for all n ∈ N, we have R(u_n) ≤ R(ū) for all n ∈ N. Since the level set {u : R(u) ≤ R(ū)} is compact by the assumptions of the theorem, the sequence u_{n_k} contains a strongly convergent subsequence. With no loss of generality, let us assume that u_{n_k} → u_0. We will now show that u_0 = ū. Indeed, we have

‖Au_{n_k} − Aū‖ ≤ ‖Au_{n_k} − f_{δ_{n_k}}‖ + ‖f − f_{δ_{n_k}}‖ ≤ 2δ_{n_k} → 0.

On the other hand, we have ‖Au_{n_k} − Aū‖ → ‖Au_0 − Aū‖ due to the continuity of A and of the norm. Therefore, Au_0 = Aū, and u_0 = ū since A is injective. By contradiction, we get ‖u_n − ū‖ → 0.

Finally, since the regulariser R(u) is lower semicontinuous, we get lim inf R(u_n) ≥ R(ū). However, for every n we have R(u_n) ≤ R(ū); therefore, we get the convergence R(u_n) → R(ū) as n → ∞. □
2.2 Philosophical Discussion and Statistical
Interpretation
In practice, our data are discrete. So let us momentarily switch
to measurements fˆ = ( fˆ1, . . . , fˆn) ∈ Rn of a true data
f ∈ Rn. If we actually knew the pointwise noise model
of the data, then one way to obtain potentially useful upper
and lower bounds for the deterministic model is by means of
statistical interval estimates: confidence intervals. Roughly,
the idea is to find for each true signal f individual random
upper and lower bounds fˆu and fˆl such that
P fˆu ≤ f ≤ fˆl
If fˆu and fˆl are computed based on multiple experiments
(i.e. multiple noisy samples fˆ1, . . . , fˆm , of the true data f ),
the interval [ fˆu,i , fˆl,i ] will converge in probability to the true
data fˆi , as the number of experiments m increases. Thus we
obtain a probabilistic version of the convergences in (4).
Let us try to see how such intervals might work in practice. For the purpose of the present discussion, assume that the noise is additive and normally distributed with zero mean and standard deviation σ, an assumption that does not hold in practice, as we have already seen in Fig. 1, but that will suffice for the following thought experiments. That is, f̂_j = f + ν_j for the noise ν_j. Let the sample mean of {f̂_j}_{j=1}^m be f̄_m = (f̄_m^1, …, f̄_m^n), and the pointwise sample standard deviation σ = (σ^1, …, σ^n). The product of the pointwise confidence intervals I_i with confidence 1 − θ is [15,49]

I_i = [ f̄_m^i − k*_{1−θ/2} σ^i/√m,  f̄_m^i + k*_{1−θ/2} σ^i/√m ],

where k*_{1−t} := Φ^{−1}(1 − t), with Φ the cumulative distribution function of the standard normal distribution. For θ = 0.05, i.e. the 95 % confidence interval, k*_{1−0.05/2} = Φ^{−1}(0.975) ≈ 1.96. Now, the probability that I_i covers the true f^i is 1 − θ, e.g. 95 %. If we have only a single sample, m = 1, the intervals stay large, but the joint probability (1 − θ)^n goes to zero as n increases. As an example, for a rather typical single 128 × 128 slice of a DTI measurement, the probability that exactly φ = 5 % (to the closest discrete value possible) of the 1 − θ = 95 % confidence intervals do not cover the true parameter would be about 1.4 %, or

1.4 % ≈ (n choose m̃) θ^{m̃} (1 − θ)^{n−m̃},

where n = 128² and m̃ = ⌈φn⌉. The probability of at least φ = 5 % of the pointwise 95 % confidence intervals not covering the true parameter is, in this setting, approximately 49 %. This can be verified by summing the above estimates over m̃ = ⌈φn⌉, …, n.
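These figures can be checked numerically. Evaluating the binomial probabilities in log-space (via lgamma) avoids floating-point underflow for n = 128²:

```python
import math

n = 128 * 128            # voxels in a single 128x128 slice
theta = 0.05             # miss probability of one 95% confidence interval
m = math.ceil(0.05 * n)  # exactly 5% of the intervals missing

def log_binom_pmf(n, k, p):
    """log P(X = k) for X ~ Binomial(n, p), computed stably in log-space."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

# P(exactly 5% of the intervals miss): about 0.014, the 1.4% figure above
p_exact = math.exp(log_binom_pmf(n, m, theta))
# P(at least 5% of the intervals miss): close to one half, the ~49% figure above
p_tail = sum(math.exp(log_binom_pmf(n, k, theta)) for k in range(m, n + 1))
print(p_exact, p_tail)
```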
In summary, unless θ simultaneously goes to 1, the product intervals are very unlikely to cover the true parameter. Based on a single experiment, the deterministic approach, as interpreted statistically through confidence intervals, is therefore very likely to fail to discover the true solution as the data size n increases, unless the pointwise confidence is very low. But if we let the pointwise confidences be arbitrarily high, such that the intervals are very large, the discovered solution in our applications of interest would be just a constant!
Asymptotically, the situation is more encouraging. Indeed, if we could perform more experiments to compute the confidence intervals, then for any fixed n and θ, it is easy to see that the solution of the "deterministic" error model is an asymptotically consistent, and hence asymptotically unbiased, estimator of the true f. That is, the estimates converge in probability to f as the experiment count m increases. Indeed, the error-bounds-based estimator f̃_m, based on m experiments, by definition satisfies f̃_m ∈ ∩_{i=1}^n I_i. Therefore, for any ε > 0, we have

P( |f̃_m^i − f̄_m^i| > ε for some i ) = 0  whenever m ≥ (k*_{1−θ/2} max_i σ^i / ε)².

Thus f̃_m − f̄_m → 0 in probability. Since by the law of large numbers also f̄_m → f in probability, this proves the claim, and to some extent justifies our approach from the statistical viewpoint.
It should be noted that this is roughly the most that has previously been known of the maximum a posteriori (MAP) estimate, corresponding to the Tikhonov models

min_u ½‖f − Au‖² + αR(u).

In particular, the MAP is not the Bayes estimator for the typical squared cost functional; that is, it does not minimise f̃ ↦ E[‖f − f̃‖²]. The minimiser in this case is the conditional mean (CM) estimate, which is why the latter has been preferred by Bayesian statisticians despite its increased computational cost. The MAP estimate is merely an asymptotic Bayes estimator for the uniform cost function. In a very recent work [12], it has, however, been proved that the MAP estimate is the Bayes estimator for certain Bregman distances. One possible critique of this result is that these distances are not universal and depend on the regulariser R, unlike the squared distance for the CM estimate. The CM estimate, however, has other problems in the setting of total variation and its discretisation [35,36].
3 Application to Diffusion Tensor Imaging
We now build our model for applying the deterministic error
modelling theory to diffusion tensor imaging. We start by
building our forward model based on the Stejskal–Tanner
equation, and then briefly introduce the regularisers we use.
3.1 The Forward Model
For u : Ω → Sym²(R³), Ω ⊂ R³, a mapping from Ω to symmetric second-order tensors, let us introduce the nonlinear operators T_j, defined by

[T_j(u)](x) := s_0(x) exp(−⟨b_j, u(x)b_j⟩),  (j = 1, …, N).

Their role is to model the so-called Stejskal–Tanner equation [4]

s_j(x) = s_0(x) exp(−⟨b_j, u(x)b_j⟩),  (j = 1, …, N).  (6)

Each tensor u(x) models the covariance of a Gaussian probability distribution at x for the diffusion of water molecules. The data s_j ∈ L²(Ω), (j = 1, …, N), are the diffusion-weighted MRI images. Each of them is obtained by performing the MRI scan with a different nonzero diffusion-sensitising gradient b_j, while s_0 is obtained with a zero gradient. After correcting the original k-space data for coil sensitivities, each s_j is assumed real. As a consequence, any measurement ŝ_j of s_j has, in theory, a Rician noise distribution [24].
Our goal is to reconstruct u with simultaneous denoising. Following [31,55], we consider, for a suitable regulariser R, the Tikhonov model

min_{u ≥ 0} Σ_{j=1}^N ½‖ŝ_j − T_j(u)‖² + αR(u).  (7)

The constraint u ≥ 0 is to be understood in the sense that u(x) is positive semi-definite for L^n-a.e. x ∈ Ω (see Appendix 2 for more details). Due to the Rician noise of ŝ_j, the Gaussian noise model implied by the L²-norm in (7) is not entirely correct. However, in some cases the L² model may be accurate enough, as for suitable parameters the Rician distribution is not too far from a Gaussian distribution. If one were to model the problem correctly, one should either modify the fidelity term to model Rician noise or include the (unit-magnitude complex) coil sensitivities in the model. The Rician noise model is highly nonlinear due to the Bessel function logarithms involved. Its approximations have been studied in [5,22,40] for single MR images and DTI. Coil sensitivities could be included either by knowing them in advance or by simultaneous estimation as in [30]. Either way, significant complexity is introduced into the model, and for the present work, we are content with the simple L² model.
We may also consider, as is often the case, and as was done with TGV in [57], the linearised model

min_{u ≥ 0} ‖f − u‖² + αR(u),  (8)

where, for each x ∈ Ω, f(x) is solved by regression for u(x) from the system of equations (6) with s_j(x) = ŝ_j(x). Further, as in [58], we may also consider

min_{u ≥ 0} Σ_{j=1}^N ½‖g_j − A_j u‖² + αR(u),  (9)

with

[A_j u](x) := ⟨b_j, u(x)b_j⟩, and g_j(x) := log(ŝ_j(x)/ŝ_0(x)).

In both of these linearised models, the assumption of Gaussian noise is in principle even further from the truth than in the nonlinear model (7). We will employ (8) and (7) as benchmark models.
We want to simplify the model further and forgo accurate noise modelling. After all, we often do not know the real noise model for the data available in practice. The data can be corrupted by process artefacts from black-box algorithms in the MRI devices. This problem of black-box devices has been discussed extensively in [44], in the context of computed tomography. Moreover, as we have discussed above, even without such artefacts, the correct model may be difficult to realise numerically. So we might be best off choosing the least assuming model of all: that of error bounds. This is what we propose in the reconstruction model

min_u R(u)  s.t.  u ≥ 0,  g_j^l ≤ A_j u ≤ g_j^u  (j = 1, …, N).  (10)

Here g_j^l := log(ŝ_j^l/ŝ_0^u) and g_j^u := log(ŝ_j^u/ŝ_0^l), with g_j^l, g_j^u ∈ L²(Ω), are our lower and upper bounds on g_j that we derive from the data.
3.2 Choice of the Regulariser R
A prototypical regulariser in image processing is the total variation, first studied in this context in [45]. It can be defined for u ∈ L¹(Ω; Sym^k(R^m)) as

TV(u) := ‖Eu‖_{M(Ω;Sym^{k+1}(R^m))}
      := sup { ∫_Ω ⟨div φ(x), u(x)⟩ dx : φ ∈ C_c^∞(Ω; Sym^{k+1}(R^m)), sup_x ‖φ(x)‖_F ≤ 1 }.

Observe that for scalar or vector fields, i.e. the cases k = 0, 1, we have Sym⁰(R^m) = T⁰(R^m) = R and Sym¹(R^m) = T¹(R^m) = R^m. Therefore, for scalars in particular, this gives the usual isotropic total variation

TV(u) = ‖Du‖_{M(Ω;R^m)}.  (11)
Total generalised variation was introduced in [9] as a higher-order extension of TV. Following [11,57], the second-order variant may be defined, through the differentiation cascade formulation for symmetric tensor fields u ∈ L¹(Ω; Sym^k(R^m)), as the marginal

TGV²_{(β,α)}(u) := inf { Φ_{(β,α)}(u, w) : w ∈ L¹(Ω; Sym^{k+1}(R^m)) }  (12)

of

Φ_{(β,α)}(u, w) := α‖Eu − w‖_{F,M(Ω;Sym^{k+1}(R^m))} + β‖Ew‖_{F,M(Ω;Sym^{k+2}(R^m))}.
It turns out that the standard BV-norm

‖u‖_{BV(Ω;Sym^k(R^m))} := ‖u‖_{L¹(Ω;Sym^k(R^m))} + TV(u)

and the "BGV norm" [9]

‖u‖ := ‖u‖_{L¹(Ω;Sym^k(R^m))} + TGV²_{(β,α)}(u)
are topologically equivalent norms [10,11] on the space BV(Ω; Sym^k(R^m)), yielding the same convergence results for TGV regularisation as for TV regularisation. The geometric regularisation behaviour is, however, different, and TGV tends to avoid the staircasing observed in TV regularisation.

Regarding topologies, we say that a sequence {u_i} in BV(Ω; Sym^k(R^m)) converges weakly* to u if u_i → u strongly in L¹ and Eu_i ⇀* Eu weakly* as Radon measures [2,51,57]. The latter means that ∫_Ω ⟨div φ(x), u_i(x)⟩ dx → ∫_Ω ⟨div φ(x), u(x)⟩ dx for all φ ∈ C_c^∞(Ω; Sym^{k+1}(R^m)).
3.3 Compact Subspaces
Now, for a weak* lower semicontinuous seminorm R on BV(Ω; Sym^k(R^m)), let us set

BV_{0,R}(Ω; Sym^k(R^m)) := BV(Ω; Sym^k(R^m)) / ker R.

That is, we identify elements u, ũ ∈ BV(Ω; Sym^k(R^m)) such that R(u − ũ) = 0. Now R is a norm on the space BV_{0,R}(Ω; Sym^k(R^m)); compare, e.g. [42] for the case of R = TV.
Suppose that

‖u‖ := ‖u‖_{L¹(Ω)} + R(u)

is a norm on BV(Ω; Sym^k(R^m)), equivalent to the standard norm with equivalence constant C′. If also the R-Sobolev–Korn–Poincaré inequality

inf_{R(v)=0} ‖u − v‖_{L¹(Ω)} ≤ C R(u)  (13)

holds, we may then bound

inf_{R(v)=0} ‖u − v‖_{BV(Ω;Sym^k(R^m))} ≤ inf_{R(v)=0} C′ ( ‖u − v‖_{L¹(Ω)} + R(u − v) )
  = C′ inf_{R(v)=0} ‖u − v‖_{L¹(Ω)} + C′ R(u) ≤ C′(1 + C) R(u).  (14)

Now, using the weak* lower semicontinuity of the BV-norm and the weak* compactness of the unit ball in BV(Ω; Sym^k(R^m)) (we refer to [2] for these and other basic properties of BV spaces), we may thus find a representative ũ in the BV_{0,R}(Ω; Sym^k(R^m)) equivalence class of u satisfying

‖ũ‖_{BV(Ω;Sym^k(R^m))} ≤ C′(1 + C) R(u).

Again using the weak* compactness of the unit ball in BV(Ω; Sym^k(R^m)) and the weak* lower semicontinuity of R, it follows that the sets

lev_a R := { u ∈ BV_{0,R}(Ω; Sym^k(R^m)) : R(u) ≤ a },  (a > 0),

are weak* compact in BV_{0,R}(Ω; Sym^k(R^m)), in the topology inherited from BV(Ω; Sym^k(R^m)). Consequently, they are strongly compact subsets of the space L¹(Ω; Sym^k(R^m)). This feature is crucial for the application of the regularisation theory in Banach lattices above.

On a connected domain Ω, in particular,

BV_{0,TV}(Ω) ≅ { u ∈ BV(Ω) : ∫_Ω u dx = 0 }.

That is, the space consists of zero-mean functions. Then u ↦ ‖Du‖_{M(Ω;R^m)} is a norm on BV_{0,TV}(Ω) [42], and this space is weak* compact. In particular, the sets lev_a TV are compact in L¹(Ω).

More generally, we know from [8] that on a connected domain Ω, ker TV consists of Sym^k(R^m)-valued polynomials of maximal degree k. By extension, the kernel of TGV² consists of Sym^k(R^m)-valued polynomials of maximal degree k + 1. In both cases, (13), the weak* lower semicontinuity of R, and the equivalence of ‖·‖ to ‖·‖_{BV(Ω;Sym^k(R^m))} hold by the results in [8,11,51]. Therefore, we have proved the following.

Lemma 1 Let Ω ⊂ R^m and k ≥ 0. Then the sets lev_a TV and lev_a TGV² are weak* compact in BV(Ω; Sym^k(R^m)) and strongly compact in L¹(Ω; Sym^k(R^m)).

Now, in the above cases, ker R is finite-dimensional, and we may write

BV(Ω; Sym^k(R^m)) ≅ BV_{0,R}(Ω; Sym^k(R^m)) ⊕ ker R.

Denoting by

B_X(r) := { x ∈ X : ‖x‖ ≤ r }

the closed ball of radius r in a normed space X, we obtain by the finite-dimensionality of ker R the following result.

Proposition 1 Let Ω ⊂ R^m and k ≥ 0. Pick a > 0. Then the sets

V := lev_a R ⊕ B_{ker R}(a),

for both regularisers R = TV and R = TGV², are weak* compact in BV(Ω; Sym^k(R^m)) and strongly compact in L¹(Ω; Sym^k(R^m)).
The next result combines Theorem 3 and Proposition 1.

Theorem 4 With U = L¹(Ω; Sym^k(R^m)), let the operator A : U → F be linear, continuous and injective. Let f_n^l and f_n^u be sequences of lower and upper bounds for the right-hand side such that

f_n^l ≤ f_{n+1}^l,  f_{n+1}^u ≤ f_n^u,  f_n^l ≤ f ≤ f_n^u,  ‖f_n^l − f_n^u‖ → 0 as n → ∞.

Supposing that there are no errors in the operator A and that the exact solution ū exists, define the feasible set as

U_n = { u ∈ U : f_n^l ≤_F Au ≤_F f_n^u },

and suppose that for some a > 0 and all n ∈ N,

‖u^⊥‖ ≤ a for every u ∈ U_n, where u^⊥ denotes the component of u in ker R.  (15)

Then, for R = TV and R = TGV², the sequence defined in (5) converges strongly in L¹(Ω; Sym^k(R^m)) to the exact solution ū, and R(u_n) → R(ū).

Proof With the decomposition u_n = u_{0,n} + u_n^⊥, where u_n^⊥ ∈ ker R, we have u_{0,n} ∈ lev_a R for suitably large a > 0 through

R(u_{0,n}) = R(u_n) = min_{u ∈ U_n} R(u) ≤ R(ū).

The assumption (15) bounds ‖u_n^⊥‖ ≤ a. Thus u_n ∈ V for V as in Proposition 1. The proposition thus implies the necessary compactness in U = L¹(Ω; Sym^k(R^m)) for the application of Theorem 3. □
Remark 1 The condition (15) simply says, for R = TV, that the data have to bound the solution in mean. This is very reasonable to expect for practical data; anything else would be very degenerate. For R = TGV², we also need the data to bound the entire affine part of the solution. Again, this is very likely for real data. Indeed, in DTI practice, with at least six independent diffusion-sensitising gradients, A is an invertible or even overdetermined linear operator. In that typical case, the bounds f_n^l and f_n^u translate into U_n being a bounded set.
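The invertibility claim is easy to verify for a concrete six-direction scheme: writing ⟨b_j, u b_j⟩ in the six free components of a symmetric tensor gives a 6 × 6 design matrix, which has full rank. The gradient set below is a common choice but serves here only as an illustration:

```python
import numpy as np

dirs = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                 [1, 1, 0], [1, 0, 1], [0, 1, 1]], dtype=float)
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)

def design_row(b):
    x, y, z = b
    # coefficients of (u_xx, u_yy, u_zz, u_xy, u_xz, u_yz) in <b, u b>
    return [x * x, y * y, z * z, 2 * x * y, 2 * x * z, 2 * y * z]

A = np.array([design_row(b) for b in dirs])
rank = np.linalg.matrix_rank(A)
print(rank)  # 6: the discrete operator A is invertible for this scheme
```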
4 Solving the Optimisation Problem
4.1 The Chambolle–Pock Method
The Chambolle–Pock algorithm is an inertial primal-dual backward–backward splitting method, classified in [18] as the modified primal-dual hybrid gradient method (PDHGM). It solves the minimax problem

min_x max_y G(x) + ⟨Kx, y⟩ − F*(y),  (16)

where G : X → R and F* : Y → R are convex, proper, lower semicontinuous functionals on (finite-dimensional) Hilbert spaces X and Y. The operator K : X → Y is linear, although an extension of the method to nonlinear K has recently been derived [55]. The PDHGM can also be seen as a preconditioned ADMM (alternating directions method of multipliers); we refer to [18,47,56] for reviews of optimisation methods popular in image processing. For step sizes τ, σ > 0 and an over-relaxation parameter ω > 0, each iteration of the algorithm consists of the updates

u^{i+1} := (I + τ∂G)^{−1}(u^i − τK*y^i),  (17a)
ū^{i+1} := u^{i+1} + ω(u^{i+1} − u^i),  (17b)
y^{i+1} := (I + σ∂F*)^{−1}(y^i + σKū^{i+1}).  (17c)
We should remark that the order of the primal (u) and dual (y) updates here is reversed from the original presentation in [13]. The reason is that, when reordered, the updates can, as discovered in [26], easily be written in a proximal point form.

The first and last updates are the backward (proximal) steps for the primal (u) and dual (y) variables, respectively, keeping the other variable fixed. The dual step, however, includes some "inertia" or over-relaxation, as specified by the parameter ω. Usually ω = 1, which is required for the convergence proofs of the method. If G or F* is uniformly convex, then by smartly choosing for each iteration the step length parameters τ, σ and the inertia ω, the method can be shown to have convergence rate O(1/N²). This is similar to Nesterov's optimal gradient method [43]. In the general case, the rate is O(1/N). In practice, the method produces visually pleasing solutions in rather few iterations when applied to image processing problems.
In the implementation of the method, it is crucial that the resolvents (I + τ∂G)^{−1} and (I + σ∂F*)^{−1} can be computed quickly. We recall that they may be written as

(I + τ∂G)^{−1}(u) = arg min_{u′} { ‖u′ − u‖²/(2τ) + G(u′) }.

Usually in applications, these computations turn out to be simple projections or linear operations, or the soft-thresholding operation for the L¹-norm.
As a further implementation note, since the algorithm (17) is formulated in Hilbert spaces (see, however, [28]) while our problems are formulated in the Banach space BV(Ω; Sym²(R³)), we have to discretise our problems before applying the algorithm. We do this by simple forward-differences discretisation of the operator E with cell width h = 1 on a regular rectangular grid corresponding to the image voxels.
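For a scalar image, the forward-difference discretisation with h = 1 and the induced discrete isotropic TV look as follows (a minimal two-dimensional sketch; the actual implementation acts on symmetric tensor fields via E):

```python
import numpy as np

def grad_fd(v):
    """Forward differences with cell width h = 1; the last row/column of each
    component is left zero, as is standard for this discretisation."""
    gx = np.zeros_like(v)
    gy = np.zeros_like(v)
    gx[:-1, :] = v[1:, :] - v[:-1, :]
    gy[:, :-1] = v[:, 1:] - v[:, :-1]
    return np.stack([gx, gy])

def tv_discrete(v):
    """Isotropic discrete TV: sum over pixels of the Euclidean gradient norm."""
    g = grad_fd(v)
    return np.sqrt((g ** 2).sum(axis=0)).sum()

v = np.zeros((4, 4))
v[2:, :] = 1.0                 # a step image with one unit jump in each column
print(tv_discrete(v))          # jump length 1 across 4 columns
```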
4.2 Implementation of Deterministic Constraints
We now reformulate the problem (10) of DTI imaging with deterministic error bounds in the form (16). Suppose we have some upper and lower bounds s_j^l ≤ s_j ≤ s_j^u on the DWI signals s_j, (j = 0, …, N). Then the bounds for g_j = log(s_j/s_0) are

g_j^l = log(s_j^l/s_0^u),  g_j^u = log(s_j^u/s_0^l),  (j = 1, …, N),  (18)

because g_j is monotone with respect to s_j. We are thus trying to solve

u = arg min_{u′ ∈ U = ∩_j U_j} R(u′),  (19)

where

U_j = { u : g_j^l ≤ A_j u ≤ g_j^u }.
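Numerically, the transfer of bounds (18) through the monotone logarithm is a one-liner; the intensities and the error radius below are hypothetical:

```python
import numpy as np

s_hat = np.array([820.0, 640.0, 710.0])   # measured DWI values s_j at one voxel
s0_hat, delta = 1000.0, 25.0               # b = 0 value and a bound half-width

s_l, s_u = s_hat - delta, s_hat + delta    # bounds s_j^l <= s_j <= s_j^u
s0_l, s0_u = s0_hat - delta, s0_hat + delta

g_l = np.log(s_l / s0_u)                   # g_j^l = log(s_j^l / s_0^u)
g_u = np.log(s_u / s0_l)                   # g_j^u = log(s_j^u / s_0^l)

g = np.log(s_hat / s0_hat)                 # nominal g_j
# The interval structure survives the monotone transform:
assert np.all(g_l <= g) and np.all(g <= g_u)
```

This is precisely the advantage claimed for the deterministic model: no distributional bookkeeping is needed when pushing bounds through monotone maps.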
For ease of notation, we write

g = (g_1, …, g_N)  and  Au = (A_1u, …, A_Nu),

so that the Stejskal–Tanner equation is satisfied when

Au = g.  (20)
The problem (19) may be rewritten as

min_u F_0(Au) + R(u),

for

F_0(y) = δ(g^l ≤ y ≤ g^u),

with δ(g^l ≤ y ≤ g^u) denoting the indicator function of the convex set {y : g^l ≤ y ≤ g^u}. Solving the conjugate, we find, pointwise,

F_0*(y) = ⟨g^l, y⟩ where y < 0, and ⟨g^u, y⟩ where y ≥ 0.

Also writing

R(u) = R_0(K_0 u)

for some R_0 and K_0, the problem can further be written in the saddle point form (16) with

G(u) = 0,  K = (A; K_0) (the two operators stacked),  F*(y, ψ) = F_0*(y) + R_0*(ψ).
To apply algorithm (17), we need to compute the resolvents of F_0^* and R_0^*. For details regarding R_0^* for R = TGV²_(β,α) and R = α TV in the discretised setting, we refer to [57,58]; here it suffices to note that for R = α TV, we have K_0 = αE and R_0^*(φ) is the indicator function of the dual ball {φ : sup_x ‖φ(x)‖_F ≤ 1}. Thus the resolvent (I + σ∂R_0^*)⁻¹ becomes a projection to the dual ball. The case of R = TGV²_(β,α) is similar. For F_0^* we have

(I + σ∂F_0^*)⁻¹(y) = arg min_{y'} { F_0^*(y') + ‖y' − y‖² / (2σ) },

which resolves pointwise at each ξ ∈ Ω into the expression

[(I + σ∂F_0^*)⁻¹(y)](ξ) = S(y(ξ))

for

S(y(ξ)) =
  y(ξ) − g^u(ξ)σ,   if y(ξ) ≥ g^u(ξ)σ,
  0,                if g^l(ξ)σ ≤ y(ξ) ≤ g^u(ξ)σ,
  y(ξ) − g^l(ξ)σ,   if y(ξ) ≤ g^l(ξ)σ.
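A direct numpy transcription of this pointwise shrinkage (assuming g^l ≤ g^u and σ > 0; it is equivalently y − σ proj_[g^l, g^u](y/σ), and the function name is ours):

```python
import numpy as np

def shrink(y, gl, gu, sigma):
    """Pointwise resolvent of F0*: shrink y towards zero by the amount
    allowed by the scaled bounds [gl*sigma, gu*sigma]."""
    return np.where(y >= gu * sigma, y - gu * sigma,
                    np.where(y <= gl * sigma, y - gl * sigma, 0.0))
```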
Finally, we note that the saddle point system (16) has to have a solution for the Chambolle–Pock algorithm to converge. In our setting, in particular, we need to find error bounds g^l and g^u for which there exists a solution u to

g^l ≤ Au ≤ g^u.   (21)

If one uses at most six independent diffusion directions (N = 6), as we will, then, for any g, there in fact exists a solution to g = Au. The condition (21) becomes g^l ≤ g^u, immediately guaranteed through the monotonicity of (18), and the trivial conditions s_j^l ≤ s_j^u.

We are thus ready to apply the algorithm (17) to diffusion tensor imaging with deterministic error bounds. For the realisation of (17) for models (8), (9), and (7), we refer to [55,57–59].
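The transformation (18) and the resulting feasibility check g^l ≤ g^u can be sketched as follows (a hedged illustration; the variable names are ours):

```python
import numpy as np

def log_bounds(sl, su, s0l, s0u):
    """Bounds for g_j = log(s_j / s_0) from signal bounds, as in (18).

    By monotonicity, the lower bound pairs s_j^l with s_0^u and the
    upper bound pairs s_j^u with s_0^l, so g^l <= g^u automatically
    whenever s^l <= s^u componentwise.
    """
    gl = np.log(sl / s0u)
    gu = np.log(su / s0l)
    assert np.all(gl <= gu), "infeasible bounds"
    return gl, gu
```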
5 Experimental Results
We now study the efficiency of the proposed reconstruction model in comparison to the different L2-squared models, i.e. ones with a Gaussian error assumption. This is based on synthetic data, for which a ground-truth is available, as well as a real in vivo DTI data set. First, however, we have to describe in detail the procedure for obtaining upper and
lower bounds for real data, when we do not know the true
noise model, and are unable to perform multiple experiments
as required by the theory of Sect. 2.2.
5.1 Estimating Lower and Upper Bounds from Real Data
As we have already discussed, in practice the noise in the measurement signals ŝ_j is not Gaussian or Rician; in fact we do not know the true noise distribution or the other corruptions present. Therefore, we have to estimate the noise distribution from the image background. To do this, we require a known correspondence between the measurement, the noise, and the true value. As we have no better assumptions available, the standard one that we use is that of additive noise. Continuing in the statistical setting of Sect. 2.2, we now describe the procedure, working on discrete images expressed as vectors f̂ = ŝ_j ∈ R^n for some fixed j ∈ {0, 1, …, N}. We use superscripts to denote the voxel indices, that is f̂ = (f̂^1, …, f̂^n).
(Figure caption fragment) For the linear and nonlinear L2 the free parameter chosen by the parameter choice criterion is the regularisation parameter α, and for the constrained problem it is the confidence interval. d Linear L2, error-optimal. e Nonlinear L2, discrepancy principle. f Nonlinear L2, error-optimal. g Constrained, 95 % confidence intervals. h Constrained, 90 % confidence intervals
In the i-th voxel, the measured value f̂^i is the sum of the true value f^i and additive noise ν^i:

f̂^i = f^i + ν^i.
All ν^i are assumed independent and identically distributed (i.i.d.), but their distribution is unknown. If we did know the true underlying cumulative distribution function F of the noise, we could choose a confidence parameter θ ∈ (0, 1) and use the cumulative distribution function to calculate ν_{θ/2}, ν_{1−θ/2} such that¹

P(ν_{θ/2} ≤ ν^i ≤ ν_{1−θ/2}) = 1 − θ.   (22)

¹ Recall that for a random variable X with a cumulative distribution function F, the quantile function F⁻¹ returns a number x_θ = F⁻¹(θ) such that P(X ≤ x_θ) = θ.

Fig. 5 Colour-coded errors of fractional anisotropy and principal eigenvector for the computations on the synthetic test data. The legend on the right indicates the colour-coding of errors between u and g0 as functions of the principal eigenvector angle error θ = cos⁻¹(⟨v̂_u, v̂_{g0}⟩) in terms of the hue, and the fractional anisotropy error e_FA = |FA_u − FA_{g0}| in terms of whiteness. a Regression result. b Linear L2, discrepancy principle. c Linear L2, error-optimal. d Nonlinear L2, discrepancy principle. f Nonlinear L2, error-optimal. g Constrained, 95 % confidence intervals. h Constrained, 90 % confidence intervals

Let us instead proceed nonparametrically, and divide the whole image into two groups of voxels: the ones belonging to the background region and the rest. For simplicity, let the indices i = 1, …, k, (k < n), stand for the background voxels. In this region, we have f^i = 0 and f̂^i = ν^i. Therefore, the background region provides us with a number of independent samples from the unknown distribution of ν. The Dvoretzky–Kiefer–Wolfowitz inequality [17,37,41] states that

P( sup_t |F(t) − F_k(t)| > ε ) ≤ 2 e^{−2kε²},

for the empirical cumulative distribution function

F_k(t) := (1/k) Σ_{i=1}^k 1{ν^i ≤ t}.

Therefore, computing ν_{θ/2} and ν_{1−θ/2} such that

P_k(ν_{θ/2} ≤ ν^i ≤ ν_{1−θ/2}) = 1 − θ,

we also have

P_k(f^i + ν_{θ/2} ≤ f̂^i ≤ f^i + ν_{1−θ/2}) = 1 − θ.

We may therefore use the values

f̂^{l,i} = f̂^i − ν_{1−θ/2},   f̂^{u,i} = f̂^i − ν_{θ/2},   (23)

as lower and upper bounds for the true values f^i outside the background region.
The Dvoretzky–Kiefer–Wolfowitz inequality implies that the interval estimates converge to the true intervals, determined by (22), as the number of background pixels k increases with the image size n. This procedure, with large k, will therefore provide an estimate of a single-experiment (m = 1) confidence interval for f^i. We note that this procedure will, however, not yield the convergence of the interval estimate [f̂^{l,i}, f̂^{u,i}] to the true data; for that we would need multiple experiments, i.e. multiple sample images (m > 1), not just agglomeration of the background voxels into a single noise distribution estimate. In practice, however, we can only afford a single experiment (m = 1), and cannot go to the limit.
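The estimation procedure can be sketched with empirical quantiles of the background voxels (np.quantile interpolates the empirical cumulative distribution function; the function name is ours):

```python
import numpy as np

def interval_bounds(f_hat, background, theta=0.05):
    """Estimate f_hat - nu_{1-theta/2} <= f <= f_hat - nu_{theta/2}
    from background voxels, which are pure noise samples (f = 0)."""
    nu_lo = np.quantile(background, theta / 2)      # nu_{theta/2}
    nu_hi = np.quantile(background, 1 - theta / 2)  # nu_{1-theta/2}
    return f_hat - nu_hi, f_hat - nu_lo
```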
5.2 Verification of the Approach with Synthetic Data
To verify the effectiveness of the considered approach and to compare it to the other models, we use synthetic data. For the ground-truth tensor field u_g.t. we take a helix region in a 3D box 100 × 100 × 30, and choose the tensor at each point inside the helix in such a way that the principal eigenvector coincides with the helix direction (Fig. 2). The helix region is described by the following equations:
is described by the following equations:
x = (R + r cos(θ )) cos(φ),
y = (R + r cos(θ )) sin(φ),
z = r sin(θ ) + φ/φmax,
φ ∈ [0, φmax], r ∈ [0, rmax], θ ∈ [0, 2π ].
The vector direction at every point coincides with the helix direction:

r = ( −R sin(φ),  R cos(φ),  1/φ_max )ᵀ.
We take the parameters R = 0.3, φmax = 4π, rmax = 0.07
in this numerical experiment.
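The construction of the direction field can be sketched as follows. The tensor construction principal_tensor below is our own plausible assumption: any symmetric tensor of the form λ_iso I + (λ₁ − λ_iso) vvᵀ with λ₁ > λ_iso has principal eigenvector v; the text does not spell out the exact eigenvalue choice used.

```python
import numpy as np

def helix_direction(phi, phi_max=4 * np.pi, R=0.3):
    """Unit tangent direction of the helix at angle phi; the radial
    coordinates r, theta do not affect the direction."""
    v = np.array([-R * np.sin(phi), R * np.cos(phi), 1.0 / phi_max])
    return v / np.linalg.norm(v)

def principal_tensor(v, lam1=1.0, lam_iso=0.1):
    """A symmetric 2-tensor with principal eigenvector v (hypothetical
    eigenvalue choice: lam1 along v, lam_iso in the orthogonal plane)."""
    return lam_iso * np.eye(3) + (lam1 - lam_iso) * np.outer(v, v)
```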
We apply the forward operators T j (ug.t.) for each j =
0, . . . , 6 to obtain the data s j (x ). We then add Rician noise
to this data s¯j = s j + δ with σ = 2, which corresponds to
PSNR ≈ 27dB.
We apply several models for solving the inverse problem
of reconstructing u: the linear and nonlinear L2 approaches
(8) and (7), and the constrained problem (11). As the regulariser we use R = TGV²_(0.9α,α), where the choice β = 0.9α was made somewhat arbitrarily, but yielded good results for all the models. This is slightly lower than the range [1, 1.5]α discovered in comprehensive experiments for other imaging modalities [7,38].
(Figure caption fragment) a Ground truth. b Linear L2, discrepancy principle. c Linear L2, error-optimal. d Nonlinear L2, discrepancy principle. e Nonlinear L2, error-optimal. f Constrained, 95 % confidence intervals

For the linear and nonlinear L2 models (8) and (7), respectively, the regularisation parameter α is chosen either by a version of the discrepancy principle for inconsistent problems [52], or optimally with regard to the ‖·‖_F,2 distance between the solution and the ground-truth. In the case of the discrepancy principle, such an α was chosen that the following equality holds:

Δρ(α) := Σ_j ‖T_j(u) − s̄_j‖² − τ Σ_j ‖s̄_j − s_j‖² = 0.   (24)
We find α by solving this equation numerically using the bisection method. We start by finding α1, α2 such that Δρ(α1) > 0 and Δρ(α2) < 0. We calculate Δρ(α3) for α3 = (α1 + α2)/2 and, depending on its sign, replace either α1 or α2 with α3. We repeat this procedure until the stopping criterion is reached.
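The bisection loop can be sketched as follows, with Δρ passed in as a callable (a sketch under the assumption Δρ(α₁) > 0 > Δρ(α₂); each evaluation of Δρ requires solving the regularised problem for that α):

```python
def bisect_alpha(delta_rho, a1, a2, eps=0.01, max_iter=50):
    """Solve delta_rho(alpha) = 0 by bisection, assuming
    delta_rho(a1) > 0 and delta_rho(a2) < 0."""
    for _ in range(max_iter):
        a3 = 0.5 * (a1 + a2)
        d = delta_rho(a3)
        if abs(d) < eps:          # stopping criterion |delta_rho| < eps
            return a3
        if d > 0:
            a1 = a3
        else:
            a2 = a3
    return 0.5 * (a1 + a2)
```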
As the stopping criterion we use |Δρ(α)| < ε. We use τ = 1.05, ε = 0.01 for the linear and τ = 1.2, ε = 0.0001 for the nonlinear L2 solution. A value of τ yielding a reasonable degree of smoothness has been chosen by trial and error, and is different for the nonlinear model, reflecting the different nonlinear objective in the discrepancy principle. For the constrained problem we calculate θ = 90, 95, and 99 % confidence intervals to generate the upper and lower bounds. We, however,
digress a little from the approach of Sect. 2.2. Since we do not know the true underlying distribution, which fails to be Rician as illustrated in Fig. 1, we do not use it to calculate the confidence intervals, but instead use the estimation procedure described in Sect. 5.1. We stress that we only have a single sample of each signal s_j, so we are unable to verify any asymptotic estimation properties.
The numerical results are in Table 1 and Figs. 3, 4, and 5, with the first of the figures showing the colour-coded principal eigenvector of the reconstruction, the second showing the fractional anisotropy and principal eigenvectors, and the last one the errors in the latter two, in a colour-coded manner. All plots are masked to represent only the nonzero region.
The field of fractional anisotropy is defined for a field u of 2-tensors on Ω ⊂ R^m as

FA_u(x) = √(m/(m−1)) · ( Σ_{i=1}^m (λ_i − λ̄)² )^{1/2} / ( Σ_{i=1}^m λ_i² )^{1/2}  ∈ [0, 1],
Fig. 8 Colour-coded errors of fractional anisotropy and principal eigenvector for the computations on the synthetic test data. The legend on the right indicates the colour-coding of errors between u and g0 as functions of the principal eigenvector angle error θ = cos⁻¹(⟨v̂_u, v̂_{g0}⟩) in terms of the hue, and the fractional anisotropy error e_FA = |FA_u − FA_{g0}| in terms of whiteness. a Linear L2, discrepancy principle. b Linear L2, error-optimal. c Nonlinear L2, discrepancy principle. d Nonlinear L2, error-optimal. e Constrained, 95 % confidence intervals
with λ1, …, λm denoting the eigenvalues of u(x) at a point x ∈ Ω. It measures how far the ellipsoid prescribed by the eigenvalues and eigenvectors is from a sphere, with FA_u(x) = 0 corresponding to a full sphere, and FA_u(x) = 1 corresponding to a degenerate object not having full dimension.
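A sketch of this computation for a single voxel, assuming the standard normalisation factor √(m/(m−1)) that makes the range [0, 1] attainable:

```python
import numpy as np

def fractional_anisotropy(T):
    """Fractional anisotropy of a symmetric m x m tensor T."""
    lam = np.linalg.eigvalsh(T)
    m = lam.size
    num = np.sqrt(m / (m - 1)) * np.linalg.norm(lam - lam.mean())
    den = np.linalg.norm(lam)
    return 0.0 if den == 0 else num / den
```

An isotropic tensor (all eigenvalues equal) gives FA = 0; a rank-one tensor gives FA = 1.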
As we can see, the nonlinear approach (7) performs overall the best by a wide margin in terms of the pointwise Frobenius error, i.e. the error in ‖·‖_F,2. This is expressed as a PSNR in Table 1. What is interesting, however, is that the constraint-based approach (11) yields a much better reconstruction of the principal eigenvector angle, and a comparable reconstruction of its magnitude. Indeed, the 95 % confidence interval in Figs. 3(g) and 4(g) suggests a nearly perfect reconstruction in terms of smoothness. However, the Frobenius PSNR in Table 1 for this approach is worse than that of the simple unregularised inversion by regression. The problem is revealed by
Fig. 5(f): the large white cloudy areas indicate huge
fractional anisotropy errors, while at the same time, the principal
eigenvector angle errors expressed in colour are much lower
than for other approaches. Good reconstruction of the
principal eigenvector is important for the process of tractography,
i.e. the reconstruction of neural pathways in a brain. One
explanation for our good results is that the regulariser
completely governs the solution in areas where the error bounds
are inactive due to generally low errors. This results in very
smooth reconstructions, which is in the present case
desirable as our synthetic tensor field is also smooth within the
helix.
5.3 Results with In Vivo Brain Imaging Data
We now wish to study the proposed regularisation model on
a real in vivo diffusion tensor image. Our data are those of a human brain, with the measurements of a volunteer performed on a clinical 3T system (Siemens Magnetom TIM Trio, Erlangen, Germany) with a 32-channel head coil. A 2D diffusion-weighted single-shot EPI sequence was used, with diffusion-sensitising gradients applied in 12 independent directions (b = 1000 s/mm²). An additional reference scan without diffusion weighting was acquired, with the parameters: TR = 7900 ms, TE = 94 ms, flip angle 90°. Each slice of the 3D dataset
has plane resolution 1.95 mm × 1.95 mm, with a total of
128 × 128 pixels. The total number of slices is 60 with a slice
thickness of 2mm. The dataset consists of 4 repeated
measurements. The GRAPPA acceleration factor is 2. Prior to the
reconstruction of the diffusion tensor, eddy-current correction was performed with FSL [50]. Written informed consent was obtained from the volunteer before the examination.
For error bounds calculation according to the procedure of
Sect. 5.1, to avoid systematic bias near the brain, we only use
about 0.6 % of the total volume near the borders, or roughly
k ≈ 6000 voxels.
To estimate errors for all the considered reconstruction
models, for each gradient direction bi we use only one out of
the four duplicate measurements. We then calculate the errors
using a somewhat less than ideal pseudo-ground-truth, which is the linear regression reconstruction from all the available measurements.
The results are in Table 2 and Figs. 6, 7, and 8, again with
the first of the figures showing the colour-coded principal eigenvector of the reconstruction, the second showing the fractional anisotropy and principal eigenvectors, and the last one the errors in the latter two, in a colour-coded manner.
Again, all plots are masked to represent only the nonzero
region. In the figures, we concentrate on error bounds based
on 95 % confidence intervals, as the results for the 90 and
99 % cases do not differ significantly according to Table 2.
This time, the linear L2 approach (8) has the best overall reconstruction (Frobenius PSNR), while the nonlinear L2 approach (7) has clearly the best principal eigenvector angle reconstruction besides the regression, which does not seem entirely reliable given our regression-based pseudo-ground-truth. The constraints-based approach (11) with 95 % confidence intervals is, however, not far behind in terms of numbers. A more detailed study of the corpus callosum in Fig. 8 (small picture-in-picture) and Fig. 7, however, indicates a better reconstruction of this important region by the nonlinear approach. The constrained approach has some very short vectors there in the white region. Naturally, however, these results on the in vivo data should be taken with a grain of salt, as we have only a somewhat unreliable pseudo-ground-truth available for comparison purposes.
5.4 Conclusions from the Numerical Experiments
Our conclusion is that the error-bounds-based approach is a feasible alternative to standard modelling with incorrect Gaussian assumptions. It can produce good reconstructions, although the nonlinear L2 approach of [55] is possibly slightly more reliable. The latter does, however, in principle depend on a good initialisation of the optimisation method, unlike the convex bounds-based approach.
Further theoretical work will be undertaken to extend the partial-order-based approach to modelling errors in linear operators to the non-lattice case of the semidefinite partial order for symmetric matrices, which will allow us to consider problems of diffusion MRI with errors in the forward operator.
It also needs to be investigated whether the error bounds approach should be combined with an alternative, novel regulariser that would ameliorate the fractional anisotropy errors that the approach exhibits. It is important to note, however, that from the practical point of view of using the reconstructed tensor field for basic tractography methods based solely on principal eigenvectors, these errors are not that critical. As pointed out by one of the reviewers, the situation could differ with more recent geodesic tractography methods [20,21,25] employing the full tensor. We provide basic principal eigenvector tractography results for reference in Fig. 9, without attempting to interpret the results extensively. It suffices to say that the results look comparable. With this in mind, the error bounds approach produces a very good reconstruction of the direction of the principal eigenvectors, although we saw some problems with the magnitude within the corpus callosum.
Acknowledgments While at the Center for Mathematical Modelling of the Escuela Politécnica Nacional in Quito, Ecuador, T. Valkonen has been supported by a Prometeo scholarship of the Senescyt (Ecuadorian Ministry of Science, Technology, Education, and Innovation). In Cambridge, T. Valkonen has been supported by the EPSRC Grants No. EP/J009539/1 "Sparse & Higher-order Image Restoration" and No. EP/M00483X/1 "Efficient computational tools for inverse imaging problems". A. Gorokh and Y. Korolev are grateful to the RFBR (Russian Foundation for Basic Research) for partial financial support (Projects 14-01-31173 and 14-01-91151). The authors would also like to thank Karl Koschutnig for the in vivo dataset, Kristian Bredies for scripts used to generate the tractography images, and Florian Knoll for many inspiring discussions.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Appendix 1: A Data Statement for the EPSRC
The MRI scans used for this publication have been used by
the courtesy of the producer. Data that are, for all intents and purposes, similar are widespread, and can be easily produced by making a measurement of a human subject with an
MRI scanner. Our source codes are archived at https://www.
repository.cam.ac.uk/handle/1810/253422.
Appendix 2: Notation and Techniques
We recall some basic, not completely standard, mathematical notation and concepts in this appendix. We begin with partially ordered vector spaces, following the book [46]. Then we proceed to tensor calculus and tensor fields of bounded variation and of bounded deformation. These are also covered in more detail for the diffusion tensor imaging application in [57].
A linear space X endowed with a partial order relation ≤ is called an ordered vector space if the partial order agrees with the linear operations in the following way:

x ≤ y ⇒ x + z ≤ y + z   ∀x, y, z ∈ X,
x ≤ y ⇒ λx ≤ λy   ∀x, y ∈ X and λ ∈ R₊.
An ordered vector space is called a vector lattice if each pair of elements x, y ∈ X has a supremum x ∨ y ∈ X and an infimum x ∧ y ∈ X. The supremum of two elements x, y of a vector lattice X is the element z = x ∨ y with the following properties: z ≥ x, z ≥ y, and for any z̃ ∈ X such that z̃ ≥ x and z̃ ≥ y we have z̃ ≥ z.
For any x ∈ X, the element x₊ = x ∨ 0 is called its positive part, the element x₋ = (−x) ∨ 0 = (−x)₊ is called its negative part, and the element |x| = x₊ + x₋ is its absolute value. The equalities x = x₊ − x₋ and |x| = x ∨ (−x) hold for any x ∈ X.
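In the concrete lattices L^p(Ω), these operations are pointwise; discretising functions as numpy arrays, the identities above can be checked directly (a small illustration, not from the paper):

```python
import numpy as np

x = np.array([2.0, -3.0, 0.5])
x_pos = np.maximum(x, 0.0)    # positive part x+ = x v 0
x_neg = np.maximum(-x, 0.0)   # negative part x- = (-x) v 0
x_abs = x_pos + x_neg         # absolute value |x| = x+ + x-
```

Here x = x_pos − x_neg and x_abs coincides with both x ∨ (−x) and the pointwise absolute value.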
It is obvious that suprema and infima exist for any finite number of elements of a vector lattice. A vector lattice X is said to be order complete if any set in X that is bounded from above has a supremum.
Let X and Y be ordered vector spaces. A linear operator U : X → Y is called positive if x ≥_X 0 implies U x ≥_Y 0. An operator U is called regular if it can be written as U = U₁ − U₂, where U₁ and U₂ are positive operators. Denote the linear space of all regular operators X → Y by L∼(X, Y). A partial order can be introduced in L∼(X, Y) in the following way: U₁ ≥ U₂ if U₁ − U₂ is a positive operator. If X and Y are vector lattices and Y is order complete, then L∼(X, Y) is also an order complete vector lattice.
A norm ‖·‖ defined on a vector lattice X is called monotone if |x| ≤ |y| implies ‖x‖ ≤ ‖y‖. A vector lattice endowed with a monotone norm is called a Banach lattice if it is norm complete. If X and Y are Banach lattices, then all operators in L∼(X, Y) are continuous.
Let us list some examples of Banach lattices. The space of continuous functions C(Ω), where Ω ⊂ R^n, is a Banach lattice under the natural pointwise ordering: f ≥_C g if and only if f(x) ≥ g(x) for all x ∈ Ω. The spaces L^p(Ω), 1 ≤ p ≤ ∞, are also Banach lattices under the following partial ordering: f ≥_{L^p} g if and only if f(x) ≥ g(x) almost everywhere in Ω. With this partial order, the spaces L^p(Ω), 1 ≤ p ≤ ∞, are order complete Banach lattices. The Banach lattice of continuous functions C(Ω) is not order complete.
Tensors in the Euclidean Setting
We call a k-linear mapping A : R^m × ⋯ × R^m → R a k-tensor, denoted A ∈ T^k(R^m). This is a simplification of the full differential-geometric definition, sufficient for our finite-dimensional setting. We say that A is symmetric, denoted A ∈ Sym^k(R^m), if it satisfies for any permutation π of {1, …, k} that

A(c_{π1}, …, c_{πk}) = A(c_1, …, c_k).
With e₁, …, e_m the standard basis of R^m, we define on T^k(R^m) the inner product

⟨A, B⟩ := Σ_{p ∈ {1,…,m}^k} A(e_{p1}, …, e_{pk}) B(e_{p1}, …, e_{pk}),

and the Frobenius norm

‖A‖_F := √⟨A, A⟩.
The Frobenius norm is rotationally invariant in a sense crucial for DTI. We refer to [57] for a detailed discussion of this, as well as of alternative rotationally invariant norms.
Example 1 (Vectors) Vectors A ∈ R^m are of course symmetric 1-tensors; the inner product is the usual inner product in R^m, and the Frobenius norm is the two-norm, ‖A‖_F = ‖A‖₂.

Example 2 (Matrices) Matrices are 2-tensors: A(x, y) = ⟨Ax, y⟩, while symmetric matrices A = Aᵀ are symmetric 2-tensors. The inner product is ⟨A, B⟩ = Σ_{i,j} A_{ij} B_{ij}, and ‖A‖_F is the matrix Frobenius norm.
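The rotational invariance mentioned above, ‖R A Rᵀ‖_F = ‖A‖_F for any rotation R, can be checked numerically in the matrix case (a small illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
A = 0.5 * (A + A.T)                      # a symmetric 2-tensor

t = 0.7                                  # a rotation about the z-axis
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t),  np.cos(t), 0.0],
               [0.0,        0.0,       1.0]])

# The Frobenius norm is invariant under orthogonal conjugation
lhs = np.linalg.norm(Rz @ A @ Rz.T)
rhs = np.linalg.norm(A)
```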
We use the notation A ≥ 0 for positive-semidefinite matrices A. One can verify that this relation indeed defines a partial order in the space of symmetric matrices:

A ≥ B   iff   A − B is positive semidefinite.   (25)

With this partial order, the space of all symmetric matrices becomes an ordered vector space, but not a vector lattice. However, it enjoys some properties similar to those of vector lattices: for example, any directed upwards subset² has a supremum [39, Ch. 8].
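A sketch of checking the order (25) numerically via the smallest eigenvalue of A − B; note that the order is only partial: for instance diag(1, 3) and diag(3, 1) are incomparable (the function name is ours):

```python
import numpy as np

def psd_geq(A, B, tol=1e-12):
    """A >= B in the semidefinite order iff A - B is positive
    semidefinite, i.e. its smallest eigenvalue is nonnegative."""
    return np.linalg.eigvalsh(A - B).min() >= -tol
```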
Symmetric Tensor Fields of Bounded Deformation
Let u : Ω → Sym^k(R^m) for a domain Ω ⊂ R^m. We set

‖u‖_{F,p} := ( ∫_Ω ‖u(x)‖_F^p dx )^{1/p},   (p ∈ [1, ∞)),

and

‖u‖_{F,∞} := ess sup_{x∈Ω} ‖u(x)‖_F.

The spaces L^p(Ω; Sym^k(R^m)) are defined in the natural way using these norms, and clearly satisfy all the usual properties of L^p spaces.
In the particular case of matrices (k = 2), a partial order can be introduced in the space L^p(Ω; Sym²(R^m)) in the following way:

u ≥ v   iff   u(x) ≥ v(x) almost everywhere in Ω.   (26)

Recall that the symbol ≥ stands for the positive-semidefinite order (25) in the space of symmetric matrices. Since the positive-semidefinite order is not a lattice order, neither is the order (26).
If u ∈ C¹(Ω; Sym^k(R^m)), k ≥ 1, we define by contraction the divergence div u ∈ C(Ω; Sym^{k−1}(R^m)) as

[div u(·)](e_{i2}, …, e_{ik}) := Σ_{i1=1}^m ⟨e_{i1}, ∇u(·)(e_{i1}, …, e_{ik})⟩.   (27)

It is easily verified that div u(x) is indeed symmetric. Given a tensor field u ∈ L¹(Ω; T^k(R^m)), we then define the symmetrised distributional gradient E u ∈ [C_c^∞(Ω; Sym^{k+1}(R^m))]* by

E u(ϕ) := − ∫_Ω ⟨u(x), div ϕ(x)⟩ dx,   (28)

and the space of symmetric tensor fields of bounded deformation

BD(Ω; Sym^k(R^m)) := { u ∈ L¹(Ω; Sym^k(R^m)) : sup_{ϕ ∈ V^{k+1}_{F,s}} E u(ϕ) < ∞ },

where

V^k_{F,s} := { ϕ ∈ C_c^∞(Ω; Sym^k(R^m)) : ‖ϕ‖_{F,∞} ≤ 1 }.

² Recall that an indexed subset {x_τ : τ ∈ {τ}} of an ordered vector space X is called directed upwards if for any pair τ1, τ2 ∈ {τ} there exists τ3 ∈ {τ} such that x_{τ3} ≥ x_{τ1} and x_{τ3} ≥ x_{τ2}.

For u ∈ BD(Ω; Sym^k(R^m)), the symmetrised gradient E u is a Radon measure, E u ∈ M(Ω; Sym^{k+1}(R^m)). For the proof of this fact we refer to [19, §4.1.5].
Example 3 The space BD(Ω; Sym⁰(R^m)) is the space BV(Ω) of functions of bounded variation, while BD(Ω; Sym¹(R^m)) is the space BD(Ω) of functions of bounded deformation of [51].
Artur Gorokh is a Ph.D.
student working at the
Center for Applied Mathematics
at Cornell University. Prior to
beginning the Ph.D. program,
Artur completed his
undergraduate degree in Mathematical
Physics at Moscow State
University. His research interests lie in
the area of optimization, inverse
problems and variational
analysis. He has previously worked on projects concerning MRI imaging, and on the characterization and optimal design of optical coatings.
Tuomo Valkonen received his
Ph.D. from the University of
Jyväskylä in 2008. Since then
he has been in Graz, Cambridge,
and Quito. In February 2016 he
started as a lecturer at the
University of Liverpool.
1. Aksoy , M. , Forman , C. , Straka , M. , Skare , S. , Holdsworth , S. , Hornegger , J. , Bammer , R.: Realtime optical motion correction for diffusion tensor imaging . Magn. Reson. Med . 66 ( 2 ), 366  378 ( 2011 ). doi: 10 .1002/mrm.22787
2. Ambrosio , L. , Fusco , N. , Pallara , D. : Functions of Bounded Variation and Free Discontinuity Problems . Oxford University Press, Oxford ( 2000 )
3. Arsigny , V. , Fillard , P. , Pennec , X. , Ayache , N.: Logeuclidean metrics for fast and simple calculus on diffusion tensors . Magn. Reson. Med . 56 ( 2 ), 411  421 ( 2006 )
4. Basser , P.J. , Jones , D.K. : Diffusiontensor MRI: theory, experimental design and data analysisa technical review . NMR Biomed . 15 ( 78 ), 456  467 ( 2002 ). doi: 10 .1002/nbm.783
5. Basu , S. , Fletcher , T. , Whitaker , R.: Rician noise removal in diffusion tensor mri . In: Larsen, R. , Nielsen , M. , Sporring , J . (eds.) Medical Image Computing and ComputerAssisted Intervention  MICCAI 2006, Lecture Notes in Computer Science , vol. 4190 , pp. 117  125 . Springer, Berlin ( 2006 ). doi: 10 .1007/11866565_ 15
6. Bacˇák , M. , Bergmann , R. , Steidl , G. , Weinmann , A. : A second order nonsmooth variational model for restoring manifoldvalued images ( 2015 )
7. Benning , M. , Gladden , L. , Holland , D. , Schönlieb , C.B. , Valkonen , T. : Phase reconstruction from velocityencoded MRI measurementsa survey of sparsitypromoting variational approaches . J. Magn. Reson . 238 , 26  43 ( 2014 ). doi: 10 .1016/j. jmr. 2013 . 10 .003
8. Bredies , K. : Symmetric tensor fields of bounded deformation . Annali di Matematica Pura ed Applicata 192 ( 5 ), 815  851 ( 2013 ). doi:10.1007/s1023101102484
9. Bredies , K. , Kunisch , K. , Pock , T. : Total generalized variation . SIAM J. Imaging Sci. 3 , 492  526 ( 2011 ). doi:10.1137/090769521
10. Bredies , K. , Kunisch , K. , Valkonen , T. : Properties of L1TGV2: the onedimensional case . J. Math. Anal. Appl . 398 , 438  454 ( 2013 ). doi: 10 .1016/j.jmaa. 2012 . 08 .053
11. Bredies , K. , Valkonen , T. : Inverse problems with secondorder total generalized variation constraints . In: Proceedings of the 9th International Conference on Sampling Theory and Applications (SampTA) 2011 , Singapore ( 2011 )
12. Burger , M. , Lucka , F. : Maximum a posteriori estimates in linear inverse problems with logconcave priors are proper bayes estimators . Inverse Prob . 30 ( 11 ), 114 , 004 ( 2014 )
13. Chambolle , A. , Pock , T. : A firstorder primaldual algorithm for convex problems with applications to imaging . J. Math. Imaging Vis . 40 , 120  145 ( 2011 ). doi:10.1007/s1085101002511
14. Chefd 'hotel, C. , Tschumperlé , D. , Deriche , R. , Faugeras , O. : Constrained flows of matrixvalued functions: Application to diffusion tensor regularization . In: Heyden, A. , Sparr , G. , Nielsen , M. , Johansen , P. (eds.) Computer VisionECCV 2002, Lecture Notes in Computer Science , vol. 2350 , pp. 251  265 . Springer, Berlin ( 2002 ). doi: 10 .1007/3540479694_ 17
15. Cox , D. , Hinkley , D. : Theoretical Statistics. Taylor & Francis, London ( 1979 )
16. Dunford , N. , Schwartz , J.T. : Linear Operators, Part I General Theory . Interscience Publishers, Hoboken ( 1958 )
17. Dvoretzky , A. , Kiefer , J. , Wolfowitz , J.: Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator . Ann. Math. Stat. 27 ( 3 ), 642  669 ( 1956 ). doi: 10 .1214/aoms/1177728174
18. Esser , E. , Zhang , X. , Chan , T.F. : A general framework for a class of first order primaldual algorithms for convex optimization in imaging science . SIAM J. Imaging Sci . 3 ( 4 ), 1015  1046 ( 2010 ). doi: 10 .1137/09076934X
19. Federer , H.: Geometric Measure Theory . Springer, New York ( 1969 )
20. Fuster , A. , Dela Haije , T. , TristánVega , A. , Plantinga , B. , Westin , C.F. , Florack , L. : Adjugate diffusion tensors for geodesic tractography in white matter . J. Math. Imaging Vis . 54 ( 1 ), 1  14 ( 2016 ). doi:10.1007/s1085101505868
21. Fuster , A. , TristanVega , A. , Haije , T. , Westin , C.F. , Florack , L. : A novel riemannian metric for geodesic tractography in dti . In: Schultz, T. , NedjatiGilani , G. , Venkataraman , A. , O'Donnell , L. , Panagiotaki , E. (eds.) Computational Diffusion MRI and Brain Connectivity, Mathematics and Visualization , pp. 97  104 . Springer, New York ( 2014 ). doi: 10 .1007/9783 319 02475 2 _ 9
22. Getreuer , P. , Tong , M. , Vese , L.A. : A variational model for the restoration of MR images corrupted by blur and Rician noise . In: Advances in Visual Computing, Lecture Notes in Computer Science , vol. 6938 , pp. 686  698 . Springer, Berlin ( 2011 ). doi: 10 .1007/ 9783 642 240287_ 63
23. Grasmair , M. , Haltmeier , M. , Scherzer , O. : The residual method for regularizing illposed problems . Appl. Math. Comp. 218 ( 6 ), 2693  2710 ( 2011 ). doi: 10 .1016/j.amc. 2011 . 08 .009
24. Gudbjartsson , H. , Patz , S.: The Rician distribution of noisy MRI data . Magn. Reson. Med . 34 ( 6 ), 910  914 ( 1995 )
25. Hao , X. , Whitaker , R. , Fletcher , P. : Adaptive riemannian metrics for improved geodesic tracking of white matter . In: Székely, G. , Hahn , H.K. (eds.) Information Processing in Medical Imaging, Lecture Notes in Computer Science , vol. 6801 , pp. 13  24 . Springer, Berlin ( 2011 ). doi: 10 .1007/9783 642 22092 0 _ 2
26. He , B. , Yuan , X. : Convergence analysis of primaldual algorithms for a saddlepoint problem: From contraction perspective . SIAM J. Imaging Sci . 5 ( 1 ), 119  149 ( 2012 ). doi:10.1137/100814494
27. Herbst , M. , Maclaren , J. , Weigel , M. , Korvink , J. , Hennig , J. , Zaitsev , M. : Prospective motion correction with continuous gradient updates in diffusion weighted imaging . Magn. Reson. Med . ( 2011 ). doi: 10 .1002/mrm.23230
28. Hohage , T. , Homann , C. : A generalization of the ChambollePock algorithm to Banach spaces with applications to inverse problems ( 2014 )
29. Kingsley , P. : Introduction to diffusion tensor imaging mathematics: Parts IIII. Concepts Magn . Reson. A 28 ( 2 ), 101  179 ( 2006 ). doi: 10 .1002/cmr.a. 20048
30. Knoll , F. , Clason , C. , Bredies , K. , Uecker , M. , Stollberger , R.: Parallel imaging with nonlinear reconstruction using variational penalties . Magn. Reson. Med . 67 ( 1 ), 34  41 ( 2012 )
31. Knoll , F. , Raya , J.G. , Halloran , R.O. , Baete , S. , Sigmund , E. , Bammer , R. , Block , T. , Otazo , R. , Sodickson , D.K. : A modelbased reconstruction for undersampled radial spinecho dti with variational penalties on the diffusion tensor . NMR Biomed . 28 ( 3 ), 353  366 ( 2015 ). doi: 10 .1002/nbm.3258
32. Korolev , Y. : Making use of a partial order in solving inverse problems: II. Inverse Prob . 30 ( 8 ), 085 , 003 ( 2014 )
33. Korolev , Y. , Yagola , A. : On inverse problems in partially ordered spaces with a priori information . J. Inverse IllPosed Prob . 20 ( 4 ), 567  573 ( 2012 )
34. Korolev , Y. , Yagola , A. : Making use of a partial order in solving inverse problems . Inverse Prob . 29 ( 9 ), 095 , 012 ( 2013 )
35. Lassas, M., Saksman, E., Siltanen, S.: Discretization-invariant Bayesian inversion and Besov space priors. Inverse Prob. Imaging 3(1), 87–122 (2009). doi:10.3934/ipi.2009.3.87
36. Lassas, M., Siltanen, S.: Can one use total variation prior for edge-preserving Bayesian inversion? Inverse Prob. 20(5), 1537 (2004). doi:10.1088/0266-5611/20/5/013
37. Lehmann, E., Romano, J.: Testing Statistical Hypotheses. Springer Texts in Statistics. Springer, New York (2008)
38. de Los Reyes, J.C., Schönlieb, C.B., Valkonen, T.: Bilevel parameter learning for higher-order total variation regularisation models. arXiv:1508.07243 (2015)
39. Luxemburg, W., Zaanen, A.: Riesz Spaces. North-Holland Publishing Company, Amsterdam (1971)
40. Martín, A., Schiavi, E.: Automatic total generalized variation-based DTI Rician denoising. In: Image Analysis and Recognition, Lecture Notes in Computer Science, vol. 7950, pp. 581–588. Springer, Berlin (2013). doi:10.1007/978-3-642-39094-4_66
41. Massart, P.: The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Prob. 18(3), 1269–1283 (1990). doi:10.1214/aop/1176990746
42. Meyer, Y.: Oscillating Patterns in Image Processing and Nonlinear Evolution Equations. American Mathematical Society, Boston (2001)
43. Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k²). Sov. Math. Dokl. 27(2), 372–376 (1983)
44. Pan, X., Sidky, E.Y., Vannier, M.: Why do commercial CT scanners still employ traditional, filtered back-projection for image reconstruction? Inverse Prob. 25(12), 123009 (2009). doi:10.1088/0266-5611/25/12/123009
45. Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60, 259–268 (1992)
46. Schaefer, H.: Banach Lattices and Positive Operators. Springer, New York (1974)
47. Setzer, S.: Operator splittings, Bregman methods and frame shrinkage in image processing. Int. J. Comput. Vis. 92(3), 265–280 (2011). doi:10.1007/s11263-010-0357-3
48. Setzer, S., Steidl, G., Popilka, B., Burgeth, B.: Variational methods for denoising matrix fields. In: Weickert, J., Hagen, H. (eds.) Visualization and Processing of Tensor Fields, pp. 341–360. Springer, New York (2009)
49. Shiryaev, A.N.: Probability. Graduate Texts in Mathematics. Springer, New York (1996)
50. Smith, S.M., Jenkinson, M., Woolrich, M.W., Beckmann, C.F., Behrens, T.E.J., Johansen-Berg, H., Bannister, P.R., Luca, M.D., Drobnjak, I., Flitney, D.E., Niazy, R.K., Saunders, J., Vickers, J., Zhang, Y., Stefano, N.D., Brady, J.M., Matthews, P.M.: Advances in functional and structural MR image analysis and implementation as FSL. Neuroimage 23(Suppl 1), S208–S219 (2004). doi:10.1016/j.neuroimage.2004.07.051
51. Temam, R.: Mathematical Problems in Plasticity. Gauthier-Villars, Paris (1985)
52. Tikhonov, A.N., Goncharsky, A.V., Stepanov, V.V., Yagola, A.G.: Numerical Methods for the Solution of Ill-Posed Problems. Kluwer, Dordrecht (1995)
53. Tournier, J.D., Mori, S., Leemans, A.: Diffusion tensor imaging and beyond. Magn. Reson. Med. 65(6), 1532–1556 (2011). doi:10.1002/mrm.22924
54. Tschumperlé, D., Deriche, R.: Diffusion tensor regularization with constraints preservation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 948–953 (2001)
55. Valkonen, T.: A primal-dual hybrid gradient method for nonlinear operators with applications to MRI. Inverse Prob. 30(5), 055012 (2014). doi:10.1088/0266-5611/30/5/055012
56. Valkonen, T.: Big images. In: Emrouznejad, A. (ed.) Big Data Optimization: Recent Developments and Challenges, Studies in Big Data. Springer, New York (2015). Accepted
57. Valkonen, T., Bredies, K., Knoll, F.: Total generalised variation in diffusion tensor imaging. SIAM J. Imaging Sci. 6(1), 487–525 (2013). doi:10.1137/120867172
58. Valkonen, T., Knoll, F., Bredies, K.: TGV for diffusion tensors: a comparison of fidelity functions. J. Inverse Ill-Posed Prob. 21, 355–377 (2013). doi:10.1515/jip-2013-0005. Special issue for IP:M&S 2012, Antalya, Turkey
59. Valkonen, T., Liebmann, M.: GPU-accelerated regularisation of large diffusion tensor volumes. Computing 95, 771–784 (2013). doi:10.1007/s00607-012-0277-x. Special issue for ESCO2012, Pilsen, Czech Republic
60. Weickert, J.: Anisotropic Diffusion in Image Processing, vol. 1. Teubner, Stuttgart (1998)

Yury Korolev is a Postdoctoral Research Fellow at the School of Engineering and Materials Science, Queen Mary University of London. He obtained his undergraduate degree in Physics from the Lomonosov Moscow State University in 2010 and a Ph.D. in Mathematical Physics from the same university in 2013, working on error estimation in ill-posed problems with a priori information. He also spent some time at the Corporate Technology department of Siemens AG