LIPIcs.APPROX-RANDOM.
How Long Can Optimal Locally Repairable Codes Be?
0 Centrum Wiskunde & Informatica, Amsterdam, Netherlands. https://orcid.org/0000000237308397
1 Chaoping Xing, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore. https://orcid.org/0000000212571033
2 Computer Science Department, Carnegie Mellon University, Pittsburgh, USA. https://orcid.org/0000000179263396
3 Venkatesan Guruswami
4 Nanyang Technological University, Singapore, and the Center of Mathematical Sciences and Applications, Harvard University. Research supported in part by NSF CCF-1563742. Nanyang Technological University, Singapore. Research supported in part by ERC H2020 grant No. 74079, ALGSTRONGCRYPTO
A locally repairable code (LRC) with locality r allows for the recovery of any erased codeword symbol using only r other codeword symbols. A Singleton-type bound dictates the best possible tradeoff between the dimension and distance of LRCs; an LRC attaining this tradeoff is deemed optimal. Such optimal LRCs have been constructed over alphabets growing linearly in the block length. Unlike the classical Singleton bound, however, it was not known whether such a linear growth in the alphabet size is necessary, or for that matter even whether the alphabet needs to grow at all with the block length. Indeed, for small code distances 3 and 4, arbitrarily long optimal LRCs were known over fixed alphabets.
Keywords and phrases: Locally Repairable Code; Singleton Bound

1 Introduction
Modern distributed storage systems have been transitioning to erasure-coding-based schemes with good storage efficiency in order to cope with the explosion in the amount of data stored online. Locally Repairable Codes (LRCs) have emerged as the codes of choice for many such scenarios and have been implemented in a number of large-scale systems, e.g., Microsoft Azure [10] and Hadoop [17].
A block code is called a locally repairable code (LRC) with locality r if every symbol in the encoding is a function of r other symbols. This enables recovery of any single erased symbol in a local fashion by downloading at most r other symbols. On the other hand, one would like the code to have a good minimum distance to enable recovery of many erasures in the worst case. LRCs have been the subject of extensive study in recent years [8, 6, 16, 18, 11, 13, 5, 15, 19, 20, 3]. LRCs offer a good balance between very efficient erasure recovery in the typical case in distributed storage systems, where a single node fails (or becomes temporarily unavailable due to maintenance or other causes), and still allowing recovery of the data from a larger number of erasures, thus safeguarding the data in more worst-case scenarios.
A Singleton-type bound for locally repairable codes relating the length n, dimension k, minimum distance d and locality r was first shown in the highly influential work [6]. It states that a linear locally repairable code C must obey³
$$d(C) \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2. \qquad (1)$$
Note that any linear code of dimension k has locality at most k, so in the case r = k the above bound specializes to the classical Singleton bound d ≤ n − k + 1; in general, it quantifies how much one must back off from this bound to accommodate locality.
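As a quick sanity check of bound (1), the short Python sketch below evaluates its right-hand side and confirms that setting r = k recovers the classical Singleton bound. The function names are illustrative and not taken from the paper.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b (avoids float rounding)."""
    return -(-a // b)


def singleton_type_bound(n: int, k: int, r: int) -> int:
    """Right-hand side of bound (1): n - k - ceil(k/r) + 2."""
    return n - k - ceil_div(k, r) + 2


def classical_singleton_bound(n: int, k: int) -> int:
    """Classical Singleton bound: n - k + 1."""
    return n - k + 1


# With r = k we have ceil(k/r) = 1, so bound (1) collapses to n - k + 1.
for n, k in [(10, 4), (15, 7), (20, 1)]:
    assert singleton_type_bound(n, k, r=k) == classical_singleton_bound(n, k)

# Locality costs distance: for n = 12 and k = 6, locality r = 2 allows d <= 5,
# two less than the classical bound of 7.
print(singleton_type_bound(12, 6, r=2))  # 5
```

Integer ceiling division is used instead of `math.ceil(k / r)` so that the check stays exact for large parameters.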
A linear LRC that meets the bound (1) with equality is said to be an optimal LRC. This work concerns the tradeoff between alphabet size and code length for linear codes that are optimal LRCs. Initially, existence results and constructions for such optimal LRCs were only known over fields that are exponentially large in the block length [9, 18].⁴
In a celebrated paper, Tamo and Barg [19] constructed clever subcodes of Reed-Solomon codes that yield a class of optimal locally repairable codes inheriting the field size q ≈ n of Reed-Solomon codes. This shows that one can have optimal LRCs with a field size similar to that of Maximum Distance Separable (MDS) codes, which attain the classical Singleton bound d = n − k + 1.
One is thus tempted to make an analogy between optimal LRCs and MDS codes. The famous MDS conjecture says that there are no nontrivial (meaning, distance d > 2) MDS codes with length exceeding q + 1, where q is the alphabet size, except in two corner cases (q even and k ∈ {3, q − 1}) where the length is at most q + 2. This conjecture was resolved by Ball [1] in the case when q is prime.
For optimal LRCs, it was shown in [13] that an analogous strong conjecture does not hold for almost every distance d: using elliptic curves, the authors of [13] gave optimal LRCs of length q + 2√q (an earlier construction using rational function fields achieved length q + 1 [11]). A construction of length $n \approx \frac{r+1}{r}q$ was given for small distances in [21]. Note that all these constructions have length at most O(q).
³ The bound in [6] was shown even for a weaker requirement of locality only for the information symbols, but we focus on the more general all-symbol locality.
⁴ If locality is desired only for the information symbols, then it is easy to construct optimal LRCs over linear-sized fields using any MDS code via the "Pyramid" construction [9]. As we said, our focus is on LRCs with all-symbol locality, which is more challenging to ensure.
The MDS conjecture makes a very precise statement about the maximum possible length of MDS codes. An asymptotic upper bound of n = O(q) (in fact even n ≤ 2q) is much easier to establish for MDS codes. Given this apparent parallel and the above-mentioned constructions, which don't achieve code lengths exceeding O(q), one might wonder whether the Tamo-Barg result is asymptotically optimal, in the sense that optimal LRCs must have length at most O(q). Rather surprisingly, it was not even known whether n must be bounded as a function of q at all; that is, it was conceivable that one could have arbitrarily long optimal LRCs over an alphabet of fixed size! Indeed, Barg et al. [2] gave optimal LRCs of length n ≈ q² using algebraic surfaces when the distance d = 3 and r ≤ 4. This then inspired the discovery of optimal LRCs of unbounded length for d = 3, 4 via cyclic codes [14]. In Appendix A.1, we include a simple construction of arbitrarily long optimal LRCs for d = 3, 4 over any fixed field size satisfying q ≥ r + 1.
Our Results. Given this state of knowledge, the natural question is whether there is any upper bound at all on the length of optimal locally repairable codes (as a function of the alphabet size). In this paper, we answer this question affirmatively. In fact, we show that as soon as the distance d ≥ 5, one cannot have optimal LRCs of unbounded length (unlike the cases d = 3, 4). Below is a statement of our upper bound on the code length of optimal LRCs. To the best of our knowledge, this is the first upper bound on the length of optimal LRCs.⁵
Theorem 1 (Upper bound on code length of LRCs). Let d ≥ 5, and let C be an optimal LRC with locality r (that meets the bound (1) with equality) of length $n \ge \Omega(dr^2)$ over an alphabet of size q. Then $n \le O(dq^3)$ when d is not divisible by 4, and $n \le O(dq^{3+4/(d-4)})$ when 4 | d.
Our actual upper bound is a bit better when d ≡ 1 (mod 4), and in particular yields $n \le O(q^2)$ when d = 5. The technical condition that n is at least $\Omega(dr^2)$ arises in ensuring that the code consists of n/(r + 1) disjoint recovery groups of size r + 1 each, which together ensure recoverability with locality r for every codeword symbol.
In our second result, we complement the above limitation on LRCs with a construction of superlinear (in q) length for d ≤ r + 2.
Theorem 2 (Construction of long LRCs). For every r, d with d ≤ r + 2, there exist optimal LRCs of length $n \ge \Omega_{d,r}\big(q^{1+1/\lfloor (d-3)/2 \rfloor}\big)$.⁶
Again, to the best of our knowledge, this is the first code achieving superlinear length in q for d ≥ 5. The previous best construction, due to [21], achieved a length of $\frac{r+1}{r}q$ for d ≤ r + 1.
Organization of the paper. The paper is organized as follows. In Section 2, we provide some preliminaries on locally repairable codes. In Section 3, we prove an upper bound on the length of optimal LRCs. In Section 4, we present a construction of optimal LRCs with superlinear length in the alphabet size. Due to space restrictions, some of the proofs are omitted and can be found in the full version.
⁵ Using the bound of Theorem 1 of [4], one can deduce an upper bound of O(qr) on the distance of optimal LRCs.
⁶ When d = r + 2, it turns out that one cannot achieve bound (1) with equality; so we get codes with $d = n - k - \lceil k/r \rceil + 1$, which is the optimal tradeoff in this case. For d ≤ r + 1 we attain (1) with equality.
2 Preliminaries
[n] stands for {1, ..., n}. The floor and ceiling functions of x are denoted by ⌊x⌋ and ⌈x⌉, respectively. An $[n, k, d]_q$ code is a linear code over the field of size q that has length n, dimension k, and distance d. We now define the local recoverability property of a code formally. We give this definition in general without assuming linearity, though we restrict our focus to linear codes in this paper.
Definition 3. Let C be a q-ary block code of length n. For each α ∈ F_q and i ∈ {1, 2, ..., n}, define C(i, α) := {c = (c_1, ..., c_n) ∈ C : c_i = α}. For a subset I ⊆ {1, 2, ..., n} \ {i}, we denote by $C_I(i, \alpha)$ the projection of C(i, α) on I. For i ∈ {1, 2, ..., n}, a subset R of {1, 2, ..., n} that contains i is called a recovery set for i if $C_{I_i}(i, \alpha)$ and $C_{I_i}(i, \beta)$ are disjoint for any α ≠ β, where $I_i = R \setminus \{i\}$. Furthermore, C is called a locally recoverable code with locality r if, for every i ∈ {1, 2, ..., n}, there exists a recovery set $R_i$ for i of size r + 1.
Remark 4. The above definition of recovery sets is slightly different from the one in the literature, where i is excluded from the recovery set (that is, the recovery set is $I_i$). We include i in the recovery set $R_i$ of i for convenience in the proofs of this paper.
For linear codes, which are the focus of this paper, the following lemma establishes a
connection between the locality and the dual code C⊥. The proof is folklore.
Lemma 5. A subset R of {1, 2, ..., n} is a recovery set at i of a linear code C over F_q if and only if there exists a codeword in C^⊥ whose support contains i and is a subset of R.
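Lemma 5 also shows how a repair is carried out in practice: a dual codeword with i in its support and support contained in R gives one linear equation, which is solved for the erased symbol. The Python sketch below assumes a prime field F_p, so that arithmetic is simply integers mod p, and uses a hypothetical helper `repair_symbol`. It needs Python 3.8+ for `pow(x, -1, p)`.

```python
def repair_symbol(received, dual_codeword, i, p):
    """Recover the erased symbol at position i over the prime field F_p.

    `dual_codeword` is a codeword of the dual code whose support contains i,
    as in Lemma 5. Since it is orthogonal to every codeword c, we have
    sum_j dual_codeword[j] * c[j] = 0, so c[i] is determined by the other
    positions in the support. Positions outside the support are never read.
    """
    if dual_codeword[i] % p == 0:
        raise ValueError("position i must lie in the support of the dual codeword")
    acc = 0
    for j, coeff in enumerate(dual_codeword):
        if j != i and coeff % p != 0:
            acc += coeff * received[j]
    return (-acc * pow(dual_codeword[i], -1, p)) % p


# Toy example over F_7: the dual codeword (0, 3, 2, 5) has support {1, 2, 3},
# so it certifies {1, 2, 3} as a recovery set for position 3; position 0 is
# never read.
word = [None, 1, 1, None]   # position 3 erased, position 0 unknown
print(repair_symbol(word, [0, 3, 2, 5], 3, 7))  # 6
```

The recovered value satisfies 3·1 + 2·1 + 5·6 = 35 ≡ 0 (mod 7), as the dual codeword requires.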
For a q-ary [n, k, d] linear LRC with locality r, the Singleton-type bound says
$$d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2. \qquad (2)$$
Like the classical Singleton bound, the Singleton-type bound (2) does not take into account the cardinality q of the code alphabet. Augmenting this result, a recent work [4] established a bound on the distance of locally repairable codes that depends on q, sometimes yielding better results. However, in this paper, we specifically use the term optimal LRC for a linear code achieving the bound (2). We now rewrite this bound in a form that will be more convenient for us.
Lemma 6. Let n, k, d, r be positive integers with (r + 1) | n. If the Singleton-type bound (2) is achieved, then
$$n - k = \frac{n}{r+1} + d - 2 - \left\lfloor \frac{d-2}{r+1} \right\rfloor. \qquad (3)$$
Remark 7. It turns out that the other direction of Lemma 6 also holds if d − 2 ≢ r (mod r + 1).
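Lemma 6 and Remark 7 can be checked exhaustively for small parameters. The Python sketch below enumerates every dimension k, computes the distance at which (2) is tight, and compares the redundancy with (3). The helper names are ours and chosen only for illustration.

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def bound2(n: int, k: int, r: int) -> int:
    """Right-hand side of the Singleton-type bound (2)."""
    return n - k - ceil_div(k, r) + 2


def redundancy_lemma6(n: int, d: int, r: int) -> int:
    """n - k predicted by equation (3) of Lemma 6; requires (r + 1) | n."""
    assert n % (r + 1) == 0
    return n // (r + 1) + d - 2 - (d - 2) // (r + 1)


def check_lemma6(n: int, r: int) -> None:
    # Lemma 6: whenever (2) holds with equality for some d >= 2, the
    # redundancy n - k is given by (3), and d - 2 is never congruent to r
    # modulo r + 1.
    for k in range(1, n):
        d = bound2(n, k, r)
        if d >= 2:
            assert n - k == redundancy_lemma6(n, d, r)
            assert (d - 2) % (r + 1) != r
    # Remark 7: conversely, if d - 2 is not congruent to r mod (r + 1), the
    # dimension read off from (3) meets (2) with equality.
    for d in range(2, n):
        if (d - 2) % (r + 1) != r:
            k = n - redundancy_lemma6(n, d, r)
            if 1 <= k < n:
                assert bound2(n, k, r) == d


for n, r in [(12, 2), (20, 4), (30, 1), (42, 6)]:
    check_lemma6(n, r)
print(redundancy_lemma6(12, 5, 2))  # 6, i.e. k = 6 for n = 12, d = 5, r = 2
```

The second assertion inside the first loop is the content of Remark 7 seen from the other side: distances with d − 2 ≡ r (mod r + 1) are never attained.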
3 An Upper Bound on Code Lengths
In this section, we investigate upper bounds on the code lengths of optimal LRCs over a finite field F_q. For simplicity, we assume that n is divisible by r + 1 throughout this section. However, in Remarks 9 and 11, we extend our results to cover the case when n is not divisible by r + 1.
3.1 Justifying the assumption of disjoint recovery sets
We first argue that an r-local LRC with block length n divisible by r + 1 can be assumed, under modest conditions on the parameters, to contain n/(r + 1) disjoint recovery sets, each of which allows for the recovery of r + 1 codeword symbols. This structure will then be helpful in upper bounding the length of LRCs.
We remark that the structure theorem in [6] showed that the information symbols can be arranged into k/r disjoint groups, each with a local parity check, under the assumption that r | k. However, we seek all-symbol locality, and their argument does not directly apply.
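Lemma 8. Let C be an $[n, k, d]_q$ linear optimal LRC with locality r. Then there exist $\frac{n}{r+1}$ disjoint recovery sets, each of size r + 1, provided that
$$\frac{n}{r+1} \ge \left(d - 2 - \left\lfloor \frac{d-2}{r+1} \right\rfloor\right)(3r+2) + \left\lfloor \frac{d-2}{r+1} \right\rfloor + 1. \qquad (4)$$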
Remark 9. A similar result holds when n is not divisible by r + 1. In this case, one can guarantee $\lceil n/(r+1) \rceil$ recovery sets that cover all the n codeword positions.
3.2 Proving the upper bound
In this subsection, we prove Theorem 1 (restated more formally below) that gives an upper
bound on the length n of an LRC in terms of its alphabet size q. The parity-check view of an
LRC will be instrumental in our argument, in a manner similar to the bound obtained for
maximally recoverable (MR) LRCs in [7]. We will make use of Lemma 8 and the classical
Hamming upper bound on the size of codes as a function of minimum distance to derive our
result.
Theorem 10. Let C be an optimal q-ary [n, k, d] linear locally repairable code of locality r with (r + 1) | n and parameters satisfying inequality (4) given in Lemma 8. If d ≥ 5 and d ≡ a (mod 4) for some 1 ≤ a ≤ 4, then
$$n = \begin{cases} O\!\left(d\, q^{\frac{4(d-2)}{d-a} - 1}\right) & \text{if } a = 1, 2,\\[4pt] O\!\left(d\, q^{\frac{4(d-3)}{d-a} - 1}\right) & \text{if } a = 3, 4. \end{cases}$$
In particular, we have $n = O\big(dq^{3 + \frac{4}{d-4}}\big)$. Furthermore, we have $n = O(q^2), O(q^3), O(q^3), O(q^4), O(q^{2.5})$ and $O(q^3)$ for d = 5, 6, 7, 8, 9 and 10, respectively.
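The case analysis in Theorem 10 is easy to mistype, so the Python sketch below evaluates the exponent of q for a given d, reproducing the special values listed after the theorem and checking the uniform bound 3 + 4/(d − 4). The function name is ours; exact rational arithmetic via `fractions.Fraction` avoids rounding.

```python
from fractions import Fraction


def length_exponent(d: int) -> Fraction:
    """Exponent e such that Theorem 10 gives n = O(d * q**e); needs d >= 5."""
    if d < 5:
        raise ValueError("Theorem 10 requires d >= 5")
    a = d % 4 or 4                      # write d = 4*d1 + a with 1 <= a <= 4
    top = d - 2 if a in (1, 2) else d - 3
    return Fraction(4 * top, d - a) - 1


for d in range(5, 11):
    print(d, length_exponent(d))
# 5 2
# 6 3
# 7 3
# 8 4
# 9 5/2
# 10 3
```

The uniform exponent 3 + 4/(d − 4) is attained exactly when 4 | d; in the other residue classes the exponent is at most 3.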
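Proof. Again we let $n - k = \frac{n}{r+1} + h$ with $h = d - 2 - \lfloor \frac{d-2}{r+1} \rfloor \le d - 2$. By Lemma 8, we know that there exist $\ell := \frac{n}{r+1}$ codewords $c_1, \dots, c_\ell$ of $C^\perp$ such that the supports $\mathrm{Supp}(c_1), \dots, \mathrm{Supp}(c_\ell)$, each of size r + 1, are pairwise disjoint. Put $R_i = \mathrm{Supp}(c_i)$. By considering an equivalent code, we may assume that $R_i = \{(i-1)(r+1)+1, \dots, i(r+1)\}$ for $i = 1, 2, \dots, \ell$ and that the projection of $c_i$ onto $R_i$ is the all-one vector $\mathbf{1}$ of length r + 1. The parity-check matrix H then has the form
$$H = \left(\begin{array}{cccc} \mathbf{1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1} \\ \hline \multicolumn{4}{c}{A} \end{array}\right), \qquad (6)$$
where $\mathbf{1}$ and $\mathbf{0}$ denote the all-one and all-zero row vectors of length r + 1, and A is an h × n matrix over F_q. The submatrix consisting of the first ℓ rows of H is a block diagonal matrix. Let $h_{i,j}$ be the $((i-1)(r+1)+j)$-th column of H, i.e.,
$$h_{i,j} = (\underbrace{0, \dots, 0}_{i-1}, 1, \underbrace{0, \dots, 0}_{\ell-i}, v_{i,j})^T \qquad (7)$$
for some $v_{i,j} \in \mathbb{F}_q^h$, where T stands for transpose. Define
$$h'_{i,j} := h_{i,j} - h_{i,r+1} = (\underbrace{0, \dots, 0}_{\ell}, v_{i,j} - v_{i,r+1})^T$$
for $i \in [\ell]$ and $j \in [r]$. We claim that any $\lfloor \frac{d-1}{2} \rfloor$ of $h'_{1,1}, \dots, h'_{\ell,r}$ are linearly independent. Indeed, for any $t := \lfloor \frac{d-1}{2} \rfloor$ vectors $h'_{i_1,j_1}, \dots, h'_{i_t,j_t}$ and scalars $\lambda_{i_1,j_1}, \dots, \lambda_{i_t,j_t} \in \mathbb{F}_q$ satisfying $\sum_{k=1}^{t} \lambda_{i_k,j_k} h'_{i_k,j_k} = 0$, i.e., $\sum_{k=1}^{t} \lambda_{i_k,j_k} (h_{i_k,j_k} - h_{i_k,r+1}) = 0$, we have
$$\sum_{k=1}^{t} \lambda_{i_k,j_k} h_{i_k,j_k} - \sum_{k=1}^{t} \lambda_{i_k,j_k} h_{i_k,r+1} = 0.$$
Note that $h_{i_1,j_1}, \dots, h_{i_t,j_t}$ together with $h_{i_1,r+1}, \dots, h_{i_t,r+1}$ are at most $2t \le d - 1$ distinct columns of H. Since C has minimum distance d, any d − 1 columns of H are linearly independent; moreover, each $h_{i_k,j_k}$ (with $j_k \le r$) appears exactly once in the above sum, with coefficient $\lambda_{i_k,j_k}$. Thus the coefficients $\lambda_{i_1,j_1}, \dots, \lambda_{i_t,j_t}$ must all be zero.
Moreover, we note that the first ℓ components of $h'_{i,j}$ are all zero for $(i, j) \in [\ell] \times [r]$. We shorten the vector $h'_{i,j}$ by deleting its first ℓ coordinates, and denote by $\tilde{h}_{i,j}$ the shortened vectors. It is clear that any $\lfloor \frac{d-1}{2} \rfloor$ of $\tilde{h}_{1,1}, \dots, \tilde{h}_{\ell,r}$ are still linearly independent. Let $H_2$ be the matrix whose columns consist of $\tilde{h}_{i,j}$ for $i = 1, \dots, \ell$ and $j = 1, \dots, r$, and let $C_2$ be the linear code whose parity-check matrix is $H_2$. Then $C_2$ is a linear code with length $N := n - \ell = \frac{rn}{r+1}$, dimension at least N − h and distance at least $\lfloor \frac{d-1}{2} \rfloor + 1$. We now apply the Hamming bound to the $[N, \ge N - h, \ge \lfloor \frac{d-1}{2} \rfloor + 1]$ linear code $C_2$.
Let $d = 4d_1 + a$ for some $d_1 \ge 1$ and $1 \le a \le 4$.
Case 1: a = 1 or 2. In this case, we have $\lfloor \frac{d-1}{2} \rfloor + 1 = 2d_1 + 1$. Applying the Hamming bound to $C_2$ gives
$$q^{N-h} \le \frac{q^N}{\sum_{i=1}^{d_1} \binom{N}{i} (q-1)^i} \le \frac{q^N}{\binom{N}{d_1} (q-1)^{d_1}} \le \frac{q^N}{\left(\frac{N}{d_1}\right)^{d_1} (q-1)^{d_1}},$$
i.e.,
$$\frac{rn}{r+1} = N \le \frac{d_1}{q-1}\, q^{h/d_1} = \frac{d-a}{4(q-1)}\, q^{\frac{4h}{d-a}} \le \frac{d-a}{4(q-1)}\, q^{\frac{4(d-2)}{d-a}}.$$
The last inequality follows from the fact that h ≤ d − 2.
Case 2: a = 3 or 4. In this case, we have $\lfloor \frac{d-1}{2} \rfloor + 1 = 2d_1 + 2$. Deleting the first coordinate of $C_2$ gives a q-ary $[N-1, N-h, \ge 2d_1+1]$ linear code. Applying the Hamming bound to this code gives
$$q^{N-h} \le \frac{q^{N-1}}{\sum_{i=1}^{d_1} \binom{N-1}{i} (q-1)^i} \le \frac{q^{N-1}}{\binom{N-1}{d_1} (q-1)^{d_1}} \le \frac{q^{N-1}}{\left(\frac{N-1}{d_1}\right)^{d_1} (q-1)^{d_1}},$$
i.e.,
$$\frac{rn}{r+1} - 1 = N - 1 \le \frac{d_1}{q-1}\, q^{\frac{h-1}{d_1}} = \frac{d-a}{4(q-1)}\, q^{\frac{4(h-1)}{d-a}} \le \frac{d-a}{4(q-1)}\, q^{\frac{4(d-3)}{d-a}}.$$
In conclusion, we have
$$n \le \begin{cases} \frac{r+1}{r} \cdot \frac{d-a}{4(q-1)} \cdot q^{\frac{4(d-2)}{d-a}} & \text{if } a = 1, 2,\\[4pt] \frac{r+1}{r}\left(\frac{d-a}{4(q-1)} \cdot q^{\frac{4(d-3)}{d-a}} + 1\right) & \text{if } a = 3, 4. \end{cases}$$
The desired result follows. ◀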
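In Case 1 the code C₂ has redundancy at most h and distance at least 2d₁ + 1. The Python sketch below computes, for a few small illustrative values of q, h and d₁, the largest length allowed by the Hamming (sphere-packing) bound and compares it with the closed-form estimate $d_1 q^{h/d_1}/(q-1)$ used in the proof; the estimate is weaker, as expected. The function names are ours.

```python
from math import comb


def hamming_volume(N: int, radius: int, q: int) -> int:
    """Number of vectors in a q-ary Hamming ball of the given radius in F_q^N."""
    return sum(comb(N, i) * (q - 1) ** i for i in range(radius + 1))


def hamming_max_length(q: int, h: int, d1: int) -> int:
    """Largest N allowed by the Hamming bound for a q-ary code of length N,
    redundancy h and minimum distance >= 2*d1 + 1, i.e. the largest N with
    hamming_volume(N, d1, q) <= q**h."""
    N = 0
    while hamming_volume(N + 1, d1, q) <= q ** h:
        N += 1
    return N


def closed_form_bound(q: int, h: int, d1: int) -> float:
    """The estimate N <= d1 * q**(h/d1) / (q - 1) from Case 1 of the proof."""
    return d1 * q ** (h / d1) / (q - 1)


for q, h, d1 in [(2, 3, 1), (3, 2, 1), (4, 6, 2), (5, 6, 3)]:
    exact = hamming_max_length(q, h, d1)
    print(q, h, d1, exact, closed_form_bound(q, h, d1))
```

For (q, h, d₁) = (2, 3, 1) the exact maximum is 7, the length of the binary Hamming code, while the closed-form estimate gives 8.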
Remark 11. Let us extend this result to the case when n is not divisible by r + 1. From Remark 9, we obtain $\lceil \frac{n}{r+1} \rceil$ recovery sets $R_1, \dots, R_{\lceil n/(r+1) \rceil}$ covering all of the n indices. There are at most $(r+1)\lceil \frac{n}{r+1} \rceil - n \le r$ indices that belong to more than one of these recovery sets. We first build the parity-check matrix H whose first $\lceil \frac{n}{r+1} \rceil$ rows are $c_1, \dots, c_{\lceil n/(r+1) \rceil}$, where $c_i$ corresponds to the recovery set $R_i$. Then we remove the columns of H whose indices belong to multiple recovery sets. After removing at most r columns, we apply the same argument to the resulting matrix. It is thus clear that the same result also holds when n is not divisible by r + 1, with a small adjustment of r in the final upper bound on the code length.
Remark 12. Our proof of Theorem 10 shows why the argument does not apply to optimal LRCs with distance less than 5. In our argument, an optimal LRC of distance d is reduced to a code of distance at least $\lfloor \frac{d+1}{2} \rfloor$ without locality. If d ≤ 4, this reduced code might be a Hamming code, and the length of Hamming codes is not bounded in terms of the alphabet size. On the other hand, there indeed exist optimal LRCs of unbounded length with distance d ≤ 4. Therefore, our argument reveals an inherent difference between optimal LRCs with distance less than 5 and those with distance at least 5.
4 Construction of LRCs of superlinear length
To the best of our knowledge, all known constructions of optimal LRCs have block length n ≤ O(q) unless d ≤ 4. Our upper bound in the preceding section implies that n must be upper bounded by (roughly) q³. A natural question is whether there exist optimal LRCs with superlinear length in q, e.g., $n = \Omega(q^{1+\varepsilon})$, for some constant distance d > 4. In this section, we answer this question affirmatively, exhibiting such codes for all d ≤ r + 2.
When d = r + 2 and (r + 1) | n, the Singleton-type bound (1) cannot be met [6, Corollary 10]. In this case, by an optimal LRC we mean a code attaining the tradeoff $d = n - k - \lceil k/r \rceil + 1$. When n is not divisible by r + 1, it is still possible to obtain optimal LRCs by shortening the code.
Theorem 13. Assume d ≤ r + 2 and (r + 1) | n. There exist optimal LRCs of length $n = \Omega_{d,r}\big(q^{1+1/\lfloor (d-3)/2 \rfloor}\big)$. In particular, one obtains the best possible length $n = O(q^2)$ for optimal LRCs of minimum distance 5 if r ≥ 3 and (r + 1) | n.
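For a side-by-side view of the two main results, the Python sketch below tabulates the exponent of q achieved by Theorem 13 against the exponent in the upper bound of Theorem 10 (the latter ignoring the factor d). The two agree only at d = 5, the case in which the construction is best possible. The function names are ours.

```python
from fractions import Fraction


def construction_exponent(d: int) -> Fraction:
    """Exponent of q in the length q^(1 + 1/floor((d-3)/2)) of Theorem 13 (d >= 5)."""
    return 1 + Fraction(1, (d - 3) // 2)


def upper_bound_exponent(d: int) -> Fraction:
    """Exponent of q in the upper bound of Theorem 10 (d >= 5)."""
    a = d % 4 or 4
    top = d - 2 if a in (1, 2) else d - 3
    return Fraction(4 * top, d - a) - 1


for d in range(5, 13):
    print(d, construction_exponent(d), upper_bound_exponent(d))
```

As d grows, the construction exponent tends to 1 while the upper-bound exponent stays at least 2, so a gap remains for every d ≥ 6.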
Proof. Let $n = \eta q^{1+1/\lfloor (d-3)/2 \rfloor}$ for some constant η that depends only on d and r, i.e., $\eta = \Omega_{d,r}(1)$. We will determine η later. It suffices to construct a matrix H and show that the code C with parity-check matrix H is an optimal LRC. Label and order the n coordinates by $(i, j) \in [\frac{n}{r+1}] \times [r+1]$, i.e., $(i_1, j_1)$ precedes $(i_2, j_2)$ if $i_1 < i_2$, or $i_1 = i_2$ and $j_1 < j_2$. Let $H = (h_{i,j})_{(i,j) \in [\frac{n}{r+1}] \times [r+1]}$, where $h_{i,j} \in \mathbb{F}_q^{n-k}$; that is, H consists of the columns $h_{i,j}$ for $(i, j) \in [\frac{n}{r+1}] \times [r+1]$. We start from $h_{1,1}$ and determine the value of $h_{i,j}$ column by column in the above order. In each step, we make sure that the new column $h_{i,j}$ together with any d − 2 columns preceding the (i, j)-th column are linearly independent. Meanwhile, the matrix H keeps the same form⁷ as the matrix in (6). If we can achieve both conditions, we are done. Define $\frac{n}{r+1}$ blocks $B_1, \dots, B_{n/(r+1)}$ such that $B_i = \{h_{i,1}, \dots, h_{i,r+1}\}$; that is, we partition the n columns into $\frac{n}{r+1}$ disjoint blocks. Algorithm 1 gives the iterative method to compute the columns $h_{i,j}$.
⁷ Here "the same form" means that the pattern of nonzero entries in the upper part of the matrix (the rows lying above A) is the same, i.e., the entries 1 and 0 in this upper part represent nonzero and zero entries, respectively.
Algorithm 1
For i = 1, ..., $\frac{n}{r+1}$ and j = 1, ..., r + 1, do the following:
Find $v \in \mathbb{F}_q^{n-k}$ of the form (7)⁸ such that v is linearly independent of any subset of at most d − 2 columns chosen before this step, i.e., v does not lie in the span of any such subset.
Let v be the (i, j)-th column of H, i.e., $h_{i,j} = v$.
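To make the greedy step concrete, here is a brute-force Python sketch of Algorithm 1 over a small prime field F_p. It is only feasible for toy parameters: instead of using the counting argument below, it simply enumerates the spans of all (d − 2)-subsets of earlier columns. It also fixes the nonzero upper entry of each column to 1, which loses nothing because rescaling a column does not affect linear independence. The names `greedy_columns` and `rank_mod_p` and the toy parameters (d = 5, r = 3, two blocks, p = 7) are our own choices.

```python
from itertools import combinations, product


def rank_mod_p(vectors, p):
    """Rank over F_p (p prime) of a list of equal-length integer vectors."""
    rows = [[x % p for x in v] for v in vectors]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(x - f * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def greedy_columns(num_blocks, r, d, p):
    """Brute-force version of Algorithm 1 over F_p for toy parameters.

    Columns are built in the order (i, j); each has a 1 in row i of the upper
    part (form (7), with the nonzero entry normalised to 1) and a free lower
    part of length d - 2. A candidate is accepted if it avoids the span of
    every set of d - 2 earlier columns (or of all of them, if fewer exist).
    """
    m = num_blocks + d - 2
    cols = []
    for i in range(num_blocks):
        top = tuple(1 if t == i else 0 for t in range(num_blocks))
        for _ in range(r + 1):
            bad = set()
            for subset in combinations(cols, min(d - 2, len(cols))):
                for coeffs in product(range(p), repeat=len(subset)):
                    bad.add(tuple(sum(c * v[t] for c, v in zip(coeffs, subset)) % p
                                  for t in range(m)))
            for bottom in product(range(p), repeat=d - 2):
                if top + bottom not in bad:
                    cols.append(top + bottom)
                    break
            else:
                raise RuntimeError("no admissible column; try a larger field")
    return cols


# Toy run: d = 5, r = 3, two blocks (n = 8) over F_7.
H_cols = greedy_columns(num_blocks=2, r=3, d=5, p=7)
print(all(rank_mod_p(S, 7) == 4 for S in combinations(H_cols, 4)))  # True
```

The final check confirms the invariant the proof maintains: any d − 1 = 4 columns of the resulting H are linearly independent.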
We justify Algorithm 1 by showing that such a column $h_{i,j}$ always exists for every $(i, j) \in [\frac{n}{r+1}] \times [r+1]$. Assume that we arrive at the (a, b)-th column. If b = 1, the construction is trivial: let $h_{a,b}$ be a column vector whose first $\frac{n}{r+1}$ components are zero except the a-th one. Obviously, it matches the form (7). The linear independence is also trivial, since the a-th component of every column $h_{i,j}$ with i < a is 0. If b > 1, to simplify our discussion, we assume that the first d − 2 columns have already been found. Since any d − 2 columns prior to the (a, b)-th column are already linearly independent by our algorithm, it suffices to show that $h_{a,b}$ is linearly independent of these d − 2 columns. To achieve this, we need to check all possible combinations of these d − 2 columns. Assume that these d − 2 columns are chosen from exactly t blocks. Block $B_a$ must be among them; otherwise, the same reasoning as for b = 1 implies that $h_{a,b}$ is linearly independent of these d − 2 columns. Without loss of generality, we assume that these t blocks are $B_1, \dots, B_{t-1}$ and $B_a$, and that $i_j$ columns are picked from block $B_j$. Then, the submatrix $H_1$ consisting of these d − 2 columns has the following form:
$$H_1 = \left(\begin{array}{cccc} x_1 & 0 & \cdots & 0 \\ 0 & x_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_a \\ \hline \multicolumn{4}{c}{A_1} \end{array}\right)$$
(up to all-zero rows in the upper part), where $x_j \in \mathbb{F}_q^{i_j}$ and $A_1$ is a (d − 2) × (d − 2) matrix. If some $B_j$, j ∈ {1, 2, ..., t − 1}, contains only one of the chosen columns, then that column is linearly independent of the remaining d − 3 columns and $h_{a,b}$, and can therefore be removed from consideration. Thus we may assume that at least two columns are chosen from each block except block $B_a$. Since $d - 2 \ge 2(t-1) + i_a$ and $i_a \ge 1$, t is at most $\lfloor \frac{d-1}{2} \rfloor$. Recall that our goal is to ensure that $h_{a,b}$ is linearly independent of these at most d − 2 columns chosen from the blocks $B_1, \dots, B_{t-1}$ and $B_a$. Given the t blocks and the d − 2 columns chosen from them, we count the number of bad $h_{a,b}$, i.e., those that are linear combinations of these d − 2 columns. If the number of such linear combinations is smaller than the size of the whole space of possible choices of $h_{a,b}$, we are done. To this end, we determine the maximal subspace V of the span of these d − 2 columns such that all vectors in V have the same form as $h_{a,b}$, i.e., the first $\frac{n}{r+1}$ components except the a-th component are zero.
For a block $B_j$ with j ≠ a, by the form of the matrix $H_1$, the linear combinations of the $i_j$ columns of $B_j$ whose first $\frac{n}{r+1}$ components vanish form an $(i_j - 1)$-dimensional subspace. That is, each block $B_j$ with j ≠ a contributes $i_j - 1$ linearly independent vectors to the maximal subspace V. Block $B_a$ contributes at most $i_a$ linearly independent vectors to V. It follows that the dimension of V is at most $\sum_{j=1}^{t-1}(i_j - 1) + i_a = d - 1 - t$. This implies that at most $q^{d-1-t}$ choices of $h_{a,b}$ lie in the space spanned by these d − 2 columns.
⁸ Only the i-th component out of the first $\frac{n}{r+1}$ components is nonzero.
It remains to count the number of distinct sets of d − 2 columns. Note that $B_a$ is always selected. Thus there are at most $\binom{a-1}{t-1} \le \binom{n/(r+1)}{t-1} \le \left(\frac{n}{r+1}\right)^{t-1}$ choices for these t blocks. After fixing the t blocks, there are at most $(t(r+1))^{d-2}$ ways to pick d − 2 columns from them, since these t blocks contain only t(r + 1) columns. In total, there are at most $(t(r+1))^{d-2}\left(\frac{n}{r+1}\right)^{t-1}$ ways to pick d − 2 columns that precede the (a, b)-th column. Each combination contributes at most $q^{d-1-t}$ bad vectors $h_{a,b}$. Thus, the number of bad $h_{a,b}$ is upper bounded by
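$$\sum_{t=1}^{\lfloor (d-1)/2 \rfloor} (t(r+1))^{d-2} \left(\frac{n}{r+1}\right)^{t-1} q^{d-1-t} \le \left(\frac{q(d-1)(r+1)}{2}\right)^{d-2} \sum_{t=1}^{\lfloor (d-1)/2 \rfloor} \left(\frac{n}{q(r+1)}\right)^{t-1} \le \left(\frac{q(d-1)(r+1)}{2}\right)^{d-2} \frac{d-1}{2} \left(\frac{n}{q(r+1)}\right)^{\lfloor (d-3)/2 \rfloor}.$$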
The first inequality is due to t ≤ (d − 1)/2, and the last inequality is due to n ≥ q(r + 1). Plugging $n = \eta q^{1+1/\lfloor (d-3)/2 \rfloor}$ into this expression, the number of bad $h_{a,b}$ is upper bounded by $q^{d-1}(d-1)^{d-1}(r+1)^{d-2}\eta^{\lfloor (d-3)/2 \rfloor}$; by picking η small enough as a function of d and r, we can ensure that this quantity is less than $q^{d-1}/2$.
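These estimates are easy to get wrong, so the Python sketch below checks them numerically. It uses exact rational arithmetic on a small grid of (d, r, q, n) with n ≥ q(r + 1), and also checks the cruder bound $q^{d-1}(d-1)^{d-1}(r+1)^{d-2}\eta^{\lfloor (d-3)/2 \rfloor}$ obtained after substituting $n = \eta q^{1+1/\lfloor (d-3)/2 \rfloor}$. The grid values and function names are arbitrary illustrative choices.

```python
from fractions import Fraction


def bad_count(n, q, r, d):
    """Sum over t of (t(r+1))^(d-2) * (n/(r+1))^(t-1) * q^(d-1-t)."""
    ell = Fraction(n, r + 1)
    return sum((t * (r + 1)) ** (d - 2) * ell ** (t - 1) * q ** (d - 1 - t)
               for t in range(1, (d - 1) // 2 + 1))


def bad_count_bound(n, q, r, d):
    """(q(d-1)(r+1)/2)^(d-2) * (d-1)/2 * (n/(q(r+1)))^floor((d-3)/2)."""
    return (Fraction(q * (d - 1) * (r + 1), 2) ** (d - 2) * Fraction(d - 1, 2)
            * Fraction(n, q * (r + 1)) ** ((d - 3) // 2))


def plugged_bound(n, q, r, d):
    """q^(d-1) (d-1)^(d-1) (r+1)^(d-2) eta^s, where n = eta * q^(1 + 1/s)."""
    s = (d - 3) // 2
    eta = n / q ** (1 + 1 / s)
    return q ** (d - 1) * (d - 1) ** (d - 1) * (r + 1) ** (d - 2) * eta ** s


for d in range(5, 11):
    for r in range(1, 5):
        for q in (2, 3, 5, 7):
            for mult in (1, 2, 3):
                n = mult * q * (r + 1)          # the argument assumes n >= q(r+1)
                lhs, rhs = bad_count(n, q, r, d), bad_count_bound(n, q, r, d)
                assert lhs <= rhs
                assert float(rhs) <= plugged_bound(n, q, r, d) * (1 + 1e-9)
print("all checks passed")
```

The last comparison uses floating point only because η involves a fractional power of q; the margin between the two sides is at least a factor of 2^{d−1}, so rounding cannot flip it.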
On the other hand, according to Algorithm 1, the a-th component of $h_{a,b}$ should be nonzero, and the first $\frac{n}{r+1}$ components other than the a-th are all zero. This means that the whole space of candidates for $h_{a,b}$ has size $q^{d-1} - q^{d-2} \ge \frac{1}{2} q^{d-1}$. Thus, there always exists an $h_{a,b}$ satisfying the requirement of our algorithm.
We are almost done. Let C be the code whose parity-check matrix is H. It is clear that C has locality r. Since any d − 1 columns of H are linearly independent, C has minimum distance d(C) at least d. Because H has $\frac{n}{r+1} + d - 2$ rows, the dimension of C satisfies $k(C) \ge n - \frac{n}{r+1} - (d-2) = \frac{rn}{r+1} - (d-2)$. This implies
$$\frac{k(C)}{r} \ge \frac{n}{r+1} - \frac{d-2}{r}.$$
We distinguish two cases.
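If d − 2 < r, the condition (r + 1) | n implies $\lceil \frac{k(C)}{r} \rceil \ge \frac{n}{r+1}$ and thus $k(C) + \lceil \frac{k(C)}{r} \rceil \ge n - d + 2$. It follows that
$$d(C) \ge d \ge n - k(C) - \left\lceil \frac{k(C)}{r} \right\rceil + 2.$$
Thus, C is an optimal LRC.
If d − 2 = r, the condition (r + 1) | n implies $\lceil \frac{k(C)}{r} \rceil \ge \frac{n}{r+1} - 1$ and thus $k(C) + \lceil \frac{k(C)}{r} \rceil \ge n - d + 1$. It follows that
$$d(C) \ge d \ge n - k(C) - \left\lceil \frac{k(C)}{r} \right\rceil + 1.$$
C is still an optimal LRC in this case, because no LRC attains the Singleton-type bound (1) when d = r + 2. ◀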
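The case distinction at the end of the proof is plain integer arithmetic. The Python sketch below takes the guaranteed lower bound k = rn/(r + 1) − (d − 2) on the dimension (the true k(C) can only be larger, which only helps) and confirms over a range of small parameters that bound (1) is met with equality when d − 2 < r and missed by exactly one when d − 2 = r. The helper name is ours.

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def singleton_gap(n: int, r: int, d: int) -> int:
    """(n - k - ceil(k/r) + 2) - d for k = rn/(r+1) - (d-2), the lower bound
    on k(C) coming from the number of rows of H."""
    assert n % (r + 1) == 0
    k = r * n // (r + 1) - (d - 2)
    return n - k - ceil_div(k, r) + 2 - d


for r in range(1, 8):
    for d in range(3, r + 3):                        # d <= r + 2
        for n in range((r + 1) * d, 300, r + 1):     # multiples of r + 1, k >= 1
            assert singleton_gap(n, r, d) == (0 if d - 2 < r else 1)
print(singleton_gap(12, 3, 4), singleton_gap(12, 2, 4))  # 0 1
```

A gap of 1 for d = r + 2 is exactly the tradeoff $d = n - k - \lceil k/r \rceil + 1$ that the text calls optimal in that case.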
Next, we extend this theorem to the case when n is not divisible by r + 1 and d ≤ r + 2.
Corollary 14. Assume n ≡ a (mod r + 1) and a ≥ d − 1. There exist optimal LRCs of length $n = \Omega_{d,r}\big(q^{1+1/\lfloor (d-3)/2 \rfloor}\big)$. In particular, one obtains the best possible length $n = O(q^2)$ for optimal LRCs of minimum distance 5 if n mod (r + 1) ≥ 4.
It was shown in [12, Theorem III.3] that, under the assumption that C has $\lceil \frac{n}{r+1} \rceil$ disjoint recovery sets, a linear code C with length n ≡ a (mod r + 1), a ≠ 0, 1, and dimension k satisfying either k mod r ≥ a or r | k must obey $d \le n - k - \lceil k/r \rceil + 1$.
With the help of [12, Theorem III.3], we can extend the result in Corollary 14 to cover the case a ≤ d − 1 and a ≠ 1.
Corollary 15. Assume n ≡ a (mod r + 1) and a ≠ 1. There exist optimal LRCs of length $n = \Omega_{d,r}\big(q^{1+1/\lfloor (d-3)/2 \rfloor}\big)$.
A Appendix
A.1 Unbounded length LRCs for distances 3 and 4
Theorem 16. Assume that d ∈ {3, 4}, d − 2 ≤ r and (r + 1) | n. Then there exist optimal LRCs of arbitrary length as long as q ≥ r + 1.
Proof. For d ∈ {3, 4}, d − 2 ≤ r and (r + 1) | n, the Singleton-type bound implies that $n - k = \frac{n}{r+1} + d - 2$. Since q ≥ r + 1, we can let A be a (d − 2) × (r + 1) matrix over F_q such that
$$A_1 = \begin{pmatrix} \mathbf{1} \\ A \end{pmatrix}$$
is a (d − 1) × (r + 1) Vandermonde matrix. Define the $(\frac{n}{r+1} + d - 2) \times n$ matrix
$$H = \left(\begin{array}{cccc} \mathbf{1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1} \\ A & A & \cdots & A \end{array}\right),$$
where $\mathbf{1}$ and $\mathbf{0}$ are the all-one and all-zero vectors in $\mathbb{F}_q^{r+1}$, respectively. We partition the columns of H into $\frac{n}{r+1}$ blocks $B_1, \dots, B_{n/(r+1)}$ such that $h \in B_i$ if its i-th component is nonzero. From the form of H, it is clear that each column belongs to exactly one block and that columns in distinct blocks are linearly independent. Moreover, any d − 1 columns in the same block are linearly independent due to the property of the Vandermonde matrix $A_1$.
Next we show that any d − 1 columns of H are linearly independent. It suffices to verify this claim for the case d = 4. To see this, we pick any three columns $h_i, h_j, h_t$ of H. There is nothing to prove if these three columns belong to the same block, so assume that they belong to at least two blocks. Without loss of generality, $h_t$ is in a block that contains neither $h_i$ nor $h_j$. By the above observation, $h_t$ is not in the span of $h_i$ and $h_j$. Moreover, $h_i$ and $h_j$ are linearly independent whether they belong to the same block or to different blocks. Thus, any 3 columns of H are linearly independent. Let C be the linear code whose parity-check matrix is H. It is clear that C has length n, dimension $k(C) \ge n - \frac{n}{r+1} - (d-2) = \frac{rn}{r+1} - (d-2)$, distance d(C) ≥ d and locality r. The condition d − 2 ≤ r leads to $\lceil \frac{k(C)}{r} \rceil \ge \frac{n}{r+1}$ and thus $k(C) + \lceil \frac{k(C)}{r} \rceil \ge n - (d-2)$. The desired result follows since $d(C) \ge d \ge n - k(C) - \lceil \frac{k(C)}{r} \rceil + 2$.
◀
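The Appendix construction is small enough to verify exhaustively. The Python sketch below builds H over the prime field F_5 with r = 3 and d = 4, using the evaluation points 0, 1, 2, 3 (our choice) and placing the same matrix A under every block (our reading of the construction). It then checks that any three columns are independent and that the resulting code meets bound (1) with equality at length n = 28 > q. The helper names are hypothetical.

```python
from itertools import combinations


def rank_mod_p(vectors, p):
    """Rank over F_p (p prime) of a list of equal-length integer vectors."""
    rows = [[x % p for x in v] for v in vectors]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)
        rows[rank] = [x * inv % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(x - f * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def appendix_columns(num_blocks, alphas, d, p):
    """Columns of H: column (i, j) is e_i on top and
    (alpha_j, alpha_j^2, ..., alpha_j^(d-2)) below, so every block sees the
    same lower matrix A and (1; A) is a Vandermonde matrix."""
    cols = []
    for i in range(num_blocks):
        top = [1 if t == i else 0 for t in range(num_blocks)]
        for a in alphas:
            cols.append(top + [pow(a, e, p) for e in range(1, d - 1)])
    return cols


p, d, alphas = 5, 4, [0, 1, 2, 3]      # r = 3, so q = 5 >= r + 1 and d - 2 < r
r, num_blocks = len(alphas) - 1, 7     # n = 28 > q: the length is not tied to q
cols = appendix_columns(num_blocks, alphas, d, p)
n = len(cols)
assert all(rank_mod_p(S, p) == d - 1 for S in combinations(cols, d - 1))
k = n - rank_mod_p(cols, p)            # dimension of the code with parity-check matrix H
print(n, k, n - k - (-(-k // r)) + 2)  # 28 19 4
```

Adding more blocks increases n without changing q, which is the point of the Appendix: for d = 3, 4 the length of optimal LRCs is not bounded by the field size.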
[1] S. Ball. On large subsets of a finite vector space in which every subset of basis size is a basis. J. Eur., 14:733-748, October 2012.
[2] Walker, editors, Algebraic Geometry for Coding Theory and Cryptography, pages 95-126. Springer, 2017.
[3] A. Barg, I. Tamo, and S. Vlăduţ. Locally recoverable codes on algebraic curves. IEEE Trans. Inform. Theory, 63:4928-4939, 2017.
[4] V. Cadambe and A. Mazumdar. Bounds on the size of locally recoverable codes. IEEE Trans. Inform. Theory, 61:5787-5794, 2015.
[5] Discrete Mathematics, 324(6):78-84, 2014.
[6] IEEE Trans. Inform. Theory, 58:6925-6934, 2012.
[7] S. Gopi, V. Guruswami, and S. Yekhanin. On maximally recoverable local reconstruction codes. Electronic Colloquium on Computational Complexity, 24:183, 2017.
[8] J. Han and L. A. Lastras-Montaño. Reliable memories with subline accesses. In Proc. IEEE Internat. Sympos. Inform. Theory, pages 2531-2535, 2007.
[9] C. Huang, M. Chen, and J. Li. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. In Sixth IEEE International Symposium on Network Computing and Applications, pages 79-86, 2007.
[10] Erasure coding in Windows Azure storage. In USENIX Annual Technical Conference (ATC), pages 15-26, 2012.
[11] L. Jin, L. Ma, and C. Xing. Construction of optimal locally repairable codes via automorphism groups of rational function fields. URL: https://arxiv.org/abs/1710.09638.
[12] O. Kolosov, A. Barg, I. Tamo, and G. Yadgar. Optimal LRC codes for all lengths n ≤ q. URL: https://arxiv.org/pdf/1802.00157.
[13] X. Li, L. Ma, and C. Xing. Optimal locally repairable codes via elliptic curves. To appear in IEEE Trans. Inf. Theory, 2017. URL: https://arxiv.org/abs/1712.03744.
[14] Y. Luo, C. Xing, and C. Yuan. Optimal locally repairable codes of distance 3 and 4 via cyclic codes. To appear in IEEE Trans. Inf. Theory, 2018. arXiv:1801.03623.
[15] D. S. Papailiopoulos and A. G. Dimakis. Locally repairable codes. IEEE Trans. Inform. Theory, 60:5843-5855, 2014.
[16] N. Prakash, G. M. Kamath, V. Lalitha, and P. V. Kumar. Optimal linear codes with a local-error-correction property. In Proc. 2012 IEEE Int. Symp. Inform. Theory, pages 2776-2780, 2012.
[17] M. Sathiamoorthy, M. Asteris, D. S. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur. XORing elephants: novel erasure codes for big data. Proceedings of the VLDB Endowment (PVLDB), pages 325-336, 2013.
[18] N. Silberstein, A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath. Optimal locally repairable codes via rank-metric codes. In Proc. IEEE Int. Symp. Inf. Theory, pages 1819-1823, 2013.
[19] I. Tamo and A. Barg. A family of optimal locally recoverable codes. IEEE Trans. Inform. Theory, 60:4661-4676, 2014.
[20] I. Tamo, D. S. Papailiopoulos, and A. G. Dimakis. Optimal locally repairable codes and connections to matroid theory. IEEE Trans. Inform. Theory, 62:6661-6671, 2016.
[21] Z. Zhang, J. Xu, and M. Liu. Constructions of optimal locally repairable codes over small fields. SCIENTIA SINICA Mathematica, 47(11):1607-1614, 2017.