How Long Can Optimal Locally Repairable Codes Be?

LIPIcs - Leibniz International Proceedings in Informatics, Aug 2018

A locally repairable code (LRC) with locality r allows for the recovery of any erased codeword symbol using only r other codeword symbols. A Singleton-type bound dictates the best possible trade-off between the dimension and distance of LRCs; an LRC attaining this trade-off is deemed optimal. Such optimal LRCs have been constructed over alphabets growing linearly in the block length. Unlike the classical Singleton bound, however, it was not known if such a linear growth in the alphabet size is necessary, or for that matter even if the alphabet needs to grow at all with the block length. Indeed, for small code distances 3 and 4, arbitrarily long optimal LRCs were known over fixed alphabets. Here, we prove that for distances d ≥ 5, the code length n of an optimal LRC over an alphabet of size q must be at most roughly O(dq^3). For the case d = 5, our upper bound is O(q^2). We complement these bounds by showing the existence of optimal LRCs of length Ω_{d,r}(q^{1+1/⌊(d−3)/2⌋}) when d ≤ r+2. Our bounds match when d = 5, pinning down n = Θ(q^2) as the asymptotically largest length of an optimal LRC for this case.


Venkatesan Guruswami
Computer Science Department, Carnegie Mellon University, Pittsburgh, USA. https://orcid.org/0000-0001-7926-3396

Chaoping Xing
School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore. https://orcid.org/0000-0002-1257-1033

Chen Yuan
Centrum Wiskunde & Informatica, Amsterdam, Netherlands. https://orcid.org/0000-0002-3730-8397

Author notes: Nanyang Technological University, Singapore, and the Center of Mathematical Sciences and Applications, Harvard University. Research supported in part by NSF CCF-1563742. Nanyang Technological University, Singapore. Research supported in part by ERC H2020 grant No. 74079, ALGSTRONGCRYPTO.

Keywords and phrases: Locally Repairable Code; Singleton Bound

1 Introduction

Modern distributed storage systems have been transitioning to erasure-coding-based schemes with good storage efficiency in order to cope with the explosion in the amount of data stored online. Locally Repairable Codes (LRCs) have emerged as the codes of choice for many such scenarios and have been implemented in a number of large-scale systems, e.g., Microsoft Azure [10] and Hadoop [17].

A block code is called a locally repairable code (LRC) with locality r if every symbol in the encoding is a function of r other symbols. This enables recovery of any single erased symbol in a local fashion by downloading at most r other symbols. On the other hand, one would like the code to have a good minimum distance to enable recovery of many erasures in the worst case. LRCs have been the subject of extensive study in recent years [8, 6, 16, 18, 11, 13, 5, 15, 19, 20, 3]. LRCs offer a good balance between very efficient erasure recovery in the typical case in distributed storage systems, where a single node fails (or becomes temporarily unavailable due to maintenance or other causes), and still allowing recovery of the data from a larger number of erasures, thus safeguarding the data in more worst-case scenarios.

A Singleton-type bound for locally repairable codes relating the length n, dimension k, minimum distance d and locality r was first shown in the highly influential work [6]. It states that a linear locally repairable code C must obey (footnote 3)

\[ d(C) \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2. \qquad (1) \]

Note that any linear code of dimension k has locality at most k, so in the case when r = k the above bound specializes to the classical Singleton bound d ≤ n − k + 1, and in general it quantifies how much one must back off from this bound to accommodate locality. A linear LRC that meets the bound (1) with equality is said to be an optimal LRC.

This work concerns the trade-off between alphabet size and code length for linear codes that are optimal LRCs. Initially, the existence of such optimal LRCs and constructions were only known over fields that were exponentially large in the block length [9, 18] (footnote 4). In a celebrated paper, Tamo and Barg [19] constructed clever subcodes of Reed-Solomon codes that yield a class of optimal locally repairable codes inheriting the field size q ≈ n of Reed-Solomon codes.
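The Singleton-type bound (1) is easy to evaluate numerically. The following is a minimal sketch, not taken from the paper: the function names `singleton_type_bound` and `classical_singleton_bound` are hypothetical, and the parameter check `1 <= r <= k <= n` is an assumption made for the illustration.

```python
def singleton_type_bound(n: int, k: int, r: int) -> int:
    """Right-hand side of bound (1): n - k - ceil(k / r) + 2."""
    if not (1 <= r <= k <= n):
        raise ValueError("expected 1 <= r <= k <= n")
    ceil_k_over_r = -(-k // r)  # integer ceiling, avoids floating point
    return n - k - ceil_k_over_r + 2


def classical_singleton_bound(n: int, k: int) -> int:
    """Classical Singleton bound: d <= n - k + 1."""
    return n - k + 1


if __name__ == "__main__":
    # With r = k the locality constraint is vacuous and (1) reduces to n - k + 1.
    print(singleton_type_bound(10, 4, 4), classical_singleton_bound(10, 4))
    # Smaller locality costs distance: here n = 12, k = 6, r = 3.
    print(singleton_type_bound(12, 6, 3))
```

The gap between the two functions for r < k is exactly the amount by which one has to back off from the classical bound to accommodate locality.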
This shows that one can have optimal LRCs with a field size similar to that of Maximum Distance Separable (MDS) codes, which attain the classical Singleton bound d = n − k + 1. One is thus tempted to make an analogy between optimal LRCs and MDS codes. The famous MDS conjecture says that there are no non-trivial (meaning, distance d > 2) MDS codes with length exceeding q + 1, where q is the alphabet size, except in two corner cases (q even and k = 3, or k = q − 1) where the length is at most q + 2. This conjecture was famously resolved in the case when q is prime by Ball [1]. For optimal LRCs, it was shown in [13] that an analogous strong conjecture does not hold for almost every distance d: using elliptic curves, the authors gave LRCs of length q + 2√q (an earlier construction using rational function fields achieved length q + 1 [11]). A construction of length n ≈ ((r+1)/r) q was given for small distances in [21]. Note that all these constructions have length that is at most O(q).

The MDS conjecture makes a very precise statement about the maximum possible length of MDS codes. An asymptotic upper bound of n = O(q) (in fact even n ≤ 2q) is much easier to establish for MDS codes. Given this apparent parallel and the above-mentioned constructions, which don't achieve code lengths exceeding O(q), one might wonder if the Tamo-Barg result is asymptotically optimal, in the sense that optimal LRCs must have length at most O(q).

Footnote 3: The bound in [6] was shown even for a weaker requirement of locality only for the information symbols, but we focus on the more general all-symbol locality.

Footnote 4: If locality is desired only for the information symbols, then it is easy to construct optimal LRCs over linear-sized fields using any MDS code via the "Pyramid" construction [9]. As we said, our focus is on LRCs with all-symbol locality, which is more challenging to ensure.
Rather surprisingly, it was not even known if n must be bounded as a function of q at all; that is, it was conceivable that one could have arbitrarily long optimal LRCs over an alphabet of fixed size! Indeed, Barg et al. [2] gave optimal LRCs using algebraic surfaces of length n ≈ q^2 when the distance d = 3 and r ≤ 4. This then inspired the discovery of optimal LRCs with unbounded length for d = 3, 4 via cyclic codes [14]. In Appendix A.1, we include a simple construction of arbitrarily long optimal LRCs for d = 3, 4 over any fixed field size that satisfies q ≥ r + 1.

Our Results. Given this state of knowledge, the natural question that arises is whether there is any upper bound at all on the length of optimal locally repairable codes (as a function of the alphabet size). In this paper, we answer this question affirmatively. In fact, we show that as soon as the distance d ≥ 5, one cannot have unbounded length optimal LRCs (unlike the cases of d = 3, 4). Below is a statement of our upper bound on the code length of optimal LRCs. To the best of our knowledge, this is the first upper bound on the length of optimal LRCs (footnote 5).

Theorem 1 (Upper bound on code length of LRCs). Let d ≥ 5, and let C be an optimal LRC with locality r (that meets the bound (1) with equality) of length n ≥ Ω(dr^2) over an alphabet of size q. Then n ≤ O(dq^3) when d is not divisible by 4, and n ≤ O(dq^{3+4/(d−4)}) when 4 | d.

Our actual upper bound is a bit better when d ≡ 1 (mod 4) and in particular yields n ≤ O(q^2) when d = 5. The technical condition that n is at least Ω(dr^2) arises in ensuring that the code consists of n/(r + 1) disjoint recovery groups of size (r + 1) each, that together ensure recoverability with locality r for every codeword symbol.

In our second result, we complement the above result on the limitation of LRCs with a construction of super-linear (in q) length for d ≤ r + 2.

Theorem 2 (Construction of long LRCs).
For every r, d with d ≤ r + 2, there exist optimal LRCs of length n ≥ Ω_{d,r}(q^{1+1/⌊(d−3)/2⌋}) (footnote 6).

Again, to the best of our knowledge, this is the first code achieving super-linear length in q for d ≥ 5. The previous best construction, due to [21], achieved a length of ((r+1)/r) q for d ≤ r + 1.

Organization of the paper. The paper is organized as follows. In Section 2, we provide some preliminaries on locally repairable codes. In Section 3, we prove an upper bound on the length of optimal LRCs. In Section 4, we present a construction of optimal LRCs with super-linear length in the alphabet size. Due to space restrictions, some of the proofs are omitted and can be found in the full version.

Footnote 5: Using the bound of Theorem 1 of [4], one can deduce an upper bound of O(qr) on the distance of optimal LRCs.

Footnote 6: When d = r + 2, it turns out that one cannot achieve bound (1) with equality; so we get codes with d = n − k − ⌈k/r⌉ + 1, which is the optimal trade-off in this case. For d ≤ r + 1 we attain (1) with equality.

2 Preliminaries

[n] stands for {1, . . . , n}. The floor function and ceiling function of x are denoted by ⌊x⌋ and ⌈x⌉, respectively. An [n, k, d]_q code is a linear code over the field of size q that has length n, dimension k, and distance d. We now define the local recoverability property of a code formally. We give this definition in general without assuming linearity, though we restrict our focus to linear codes in this paper.

Definition 3. Let C be a q-ary block code of length n. For each α ∈ F_q and i ∈ {1, 2, . . . , n}, define C(i, α) := {c = (c_1, . . . , c_n) ∈ C : c_i = α}. For a subset I ⊆ {1, 2, . . . , n} \ {i}, we denote by C_I(i, α) the projection of C(i, α) on I. For i ∈ {1, 2, . . . , n}, a subset R of {1, 2, . . . , n} that contains i is called a recovery set for i if C_{I_i}(i, α) and C_{I_i}(i, β) are disjoint for any α ≠ β, where I_i = R \ {i}.
Furthermore, C is called a locally recoverable code with locality r if, for every i ∈ {1, 2, . . . , n}, there exists a recovery set R_i for i of size r + 1.

Remark 4. The above definition of recovery sets is slightly different from that given in the literature, where i is excluded from the recovery set I_i. We include i in the recovery set R_i of i for convenience in the proofs of this paper.

For linear codes, which are the focus of this paper, the following lemma establishes a connection between the locality and the dual code C^⊥. The proof is folklore.

Lemma 5. A subset R of {1, 2, . . . , n} is a recovery set for i of a linear code C over F_q if and only if there exists a codeword in C^⊥ whose support contains i and is a subset of R.

For a q-ary [n, k, d]-linear LRC with locality r, the Singleton-type bound says

\[ d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2. \qquad (2) \]

Like the classical Singleton bound, the Singleton-type bound (2) does not take into account the cardinality q of the code alphabet. Augmenting this result, a recent work [4] established a bound on the distance of locally repairable codes that depends on q, sometimes yielding better results. However, in this paper, by an optimal LRC we specifically mean a linear code achieving the bound (2).

We now rewrite this bound in a form that will be more convenient to us.

Lemma 6. Let n, k, d, r be positive integers with (r + 1) | n. If the Singleton-type bound (2) is achieved, then

\[ n - k = \frac{n}{r+1} + d - 2 - \left\lfloor \frac{d-2}{r+1} \right\rfloor. \qquad (3) \]

Remark 7. It turns out that the other direction of Lemma 6 is also true if d − 2 ≢ r (mod r + 1).

3 An Upper Bound on Code Lengths

In this section, we investigate the upper bound on the code lengths of optimal LRCs over a finite field F_q. For simplicity, we assume that n is divisible by r + 1 throughout this section. However, in Remarks 9 and 11, we extend our results to cover the cases when n is not divisible by r + 1.
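The identity of Lemma 6 can be sanity-checked by brute force over small parameters. The sketch below is an illustration only, not part of the paper: the names `redundancy_from_lemma6` and `check_lemma6` are hypothetical, and the search ranges are arbitrary choices.

```python
def redundancy_from_lemma6(n: int, d: int, r: int) -> int:
    """Right-hand side of (3): n/(r+1) + d - 2 - floor((d-2)/(r+1))."""
    if n % (r + 1) != 0:
        raise ValueError("Lemma 6 assumes (r + 1) divides n")
    return n // (r + 1) + (d - 2) - (d - 2) // (r + 1)


def check_lemma6(max_r: int = 5, max_n: int = 40) -> int:
    """Check (3) for every (n, k, r) in range whose distance meets bound (2).

    Returns the number of parameter triples that were checked.
    """
    checked = 0
    for r in range(1, max_r + 1):
        for n in range(r + 1, max_n + 1, r + 1):   # multiples of r + 1
            for k in range(1, n):
                d = n - k - (-(-k // r)) + 2        # equality in bound (2)
                if d < 2:
                    continue                        # not a meaningful distance
                assert n - k == redundancy_from_lemma6(n, d, r)
                checked += 1
    return checked


if __name__ == "__main__":
    print(check_lemma6(), "parameter triples satisfy identity (3)")
```

The check only exercises the arithmetic of (2) and (3); it does not construct any code.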
3.1 Justifying the assumption of disjoint recovery sets

We first argue that an r-local LRC with block length n divisible by r + 1 can be assumed, under modest conditions on the parameters, to contain n/(r + 1) disjoint recovery sets that each allow for recovery of (r + 1) codeword symbols. This structure will then be helpful to us in upper bounding the length of LRCs. We remark that the structure theorem in [6] showed that the information symbols can be arranged into k/r disjoint groups, each with a local parity check, under the assumption that r | k. However, we seek all-symbol locality, and their argument does not directly apply.

Lemma 8. Let C be an [n, k, d]_q linear optimal LRC with locality r. Then, there exist n/(r+1) disjoint recovery sets, each of size r + 1, provided that

\[ \frac{n}{r+1} \ge \left( d - 2 - \left\lfloor \frac{d-2}{r+1} \right\rfloor \right)(3r+2) + \left\lfloor \frac{d-2}{r+1} \right\rfloor + 1. \qquad (4) \]

Remark 9. A similar result holds when n is not divisible by r + 1. In this case, one can guarantee ⌈n/(r+1)⌉ recovery sets that cover all the n codeword positions.

3.2 Proving the upper bound

In this subsection, we prove Theorem 1 (restated more formally below), which gives an upper bound on the length n of an LRC in terms of its alphabet size q. The parity-check view of an LRC will be instrumental in our argument, in a manner similar to the bound obtained for maximally recoverable (MR) LRCs in [7]. We will make use of Lemma 8 and the classical Hamming upper bound on the size of codes as a function of minimum distance to derive our result.

Theorem 10. Let C be an optimal [n, k, d]_q-linear locally repairable code of locality r with (r + 1) | n and parameters satisfying the inequality (4) given in Lemma 8. If d ≥ 5 and d ≡ a (mod 4) for some 1 ≤ a ≤ 4, then

\[ n = \begin{cases} O\!\left(d\, q^{\frac{4(d-2)}{d-a} - 1}\right) & \text{if } a = 1, 2, \\ O\!\left(d\, q^{\frac{4(d-3)}{d-a} - 1}\right) & \text{if } a = 3, 4. \end{cases} \qquad (5) \]

In particular, we have n = O(d q^{3 + 4/(d−4)}). Furthermore, we have n = O(q^2), O(q^3), O(q^3), O(q^4), O(q^{2.5}) and O(q^3) for d = 5, 6, 7, 8, 9, and 10, respectively.

Proof.
Again we let n − k = n/(r+1) + h with h = d − 2 − ⌊(d−2)/(r+1)⌋ ≤ d − 2. By Lemma 8, we know that there exist ℓ := n/(r+1) codewords c_1, . . . , c_ℓ of C^⊥ such that the supports Supp(c_1), . . . , Supp(c_ℓ), each of size r + 1, are pairwise disjoint. Put R_i = Supp(c_i). By considering an equivalent code, we may assume that R_i = {(i − 1)(r + 1) + 1, . . . , i(r + 1)} for i = 1, 2, . . . , ℓ and that the projection of c_i on R_i is equal to the all-one vector 1 of length r + 1. The parity-check matrix H has the following form

\[ H = \begin{pmatrix} \mathbf{1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1} \\ \multicolumn{4}{c}{A} \end{pmatrix}, \qquad (6) \]

where 1 and 0 denote the all-one and all-zero row vectors of length r + 1 and A is an h × n matrix over F_q. The submatrix consisting of the first ℓ rows of H is a block diagonal matrix. Let h_{i,j} be the ((i − 1)(r + 1) + j)-th column of H, i.e.,

\[ h_{i,j} = (\underbrace{0, \dots, 0}_{i-1}, 1, \underbrace{0, \dots, 0}_{\ell - i}, v_{i,j})^T \qquad (7) \]

for some v_{i,j} ∈ F_q^h, where T stands for transpose. Define

\[ h'_{i,j} := h_{i,j} - h_{i,r+1} = (\underbrace{0, \dots, 0}_{\ell}, v_{i,j} - v_{i,r+1})^T \]

for i ∈ [ℓ] and j ∈ [r].

We claim that any ⌊(d−1)/2⌋ of h'_{1,1}, . . . , h'_{ℓ,r} are linearly independent. Indeed, consider any t := ⌊(d−1)/2⌋ vectors h'_{i_1,j_1}, . . . , h'_{i_t,j_t} and scalars λ_{i_1,j_1}, . . . , λ_{i_t,j_t} ∈ F_q satisfying \sum_{k=1}^{t} λ_{i_k,j_k} h'_{i_k,j_k} = 0, i.e., \sum_{k=1}^{t} λ_{i_k,j_k} (h_{i_k,j_k} − h_{i_k,r+1}) = 0. Then we have

\[ \sum_{k=1}^{t} \lambda_{i_k,j_k} h_{i_k,j_k} - \sum_{k=1}^{t} \lambda_{i_k,j_k} h_{i_k,r+1} = 0. \]

Note that h_{i_1,j_1}, . . . , h_{i_t,j_t} together with h_{i_1,r+1}, . . . , h_{i_t,r+1} are at most 2t ≤ d − 1 distinct columns of H. It follows that they are linearly independent, and thus the coefficients λ_{i_1,j_1}, . . . , λ_{i_t,j_t} must all be zero.

Moreover, we note that the first ℓ components of h'_{i,j} are all zero for (i, j) ∈ [ℓ] × [r]. We shorten the vector h'_{i,j} by puncturing its first ℓ coordinates. Denote by h̃_{i,j} the shortened vectors. It is clear that any ⌊(d−1)/2⌋ of h̃_{1,1}, . . . , h̃_{ℓ,r} are still linearly independent.

Let H_2 be the matrix whose columns consist of h̃_{i,j} for i = 1, . . . , ℓ and j = 1, . . . , r, and let C_2 be the linear code whose parity-check matrix is H_2. Then C_2 is a linear code with length N := n − ℓ = rn/(r+1), dimension at least N − h and distance at least ⌊(d−1)/2⌋ + 1. We now apply the Hamming bound to C_2, which is an [n − ℓ, ≥ n − ℓ − h, ≥ ⌊(d−1)/2⌋ + 1]-linear code. Let d = 4d_1 + a for some d_1 ≥ 1 and 1 ≤ a ≤ 4.

Case 1. a = 1 or 2. In this case, we have ⌊(d−1)/2⌋ + 1 = 2d_1 + 1. Applying the Hamming bound to C_2 gives

\[ q^{N-h} \le \frac{q^N}{\sum_{i=0}^{d_1} \binom{N}{i}(q-1)^i} \le \frac{q^N}{\binom{N}{d_1}(q-1)^{d_1}} \le \frac{q^N}{\left(\frac{N}{d_1}\right)^{d_1}(q-1)^{d_1}}, \]

i.e.,

\[ \frac{rn}{r+1} = N \le \frac{d_1}{q-1} \times q^{\frac{h}{d_1}} = \frac{d-a}{4(q-1)} \times q^{\frac{4h}{d-a}} \le \frac{d-a}{4(q-1)} \times q^{\frac{4(d-2)}{d-a}}. \]

The last inequality follows from the fact that h ≤ d − 2.

Case 2. a = 3 or 4. In this case, we have ⌊(d−1)/2⌋ + 1 = 2d_1 + 2. Deleting the first coordinate of C_2 gives a q-ary [N − 1, ≥ N − h, ≥ 2d_1 + 1]-linear code. Applying the Hamming bound to this code gives

\[ q^{N-h} \le \frac{q^{N-1}}{\sum_{i=0}^{d_1} \binom{N-1}{i}(q-1)^i} \le \frac{q^{N-1}}{\binom{N-1}{d_1}(q-1)^{d_1}} \le \frac{q^{N-1}}{\left(\frac{N-1}{d_1}\right)^{d_1}(q-1)^{d_1}}, \]

i.e.,

\[ \frac{rn}{r+1} - 1 = N - 1 \le \frac{d_1}{q-1} \times q^{\frac{h-1}{d_1}} = \frac{d-a}{4(q-1)} \times q^{\frac{4(h-1)}{d-a}} \le \frac{d-a}{4(q-1)} \times q^{\frac{4(d-3)}{d-a}}. \]

In conclusion, we have

\[ n \le \begin{cases} \dfrac{r+1}{r} \times \dfrac{d-a}{4(q-1)} \times q^{\frac{4(d-2)}{d-a}} & \text{if } a = 1, 2, \\[2ex] \dfrac{r+1}{r} \left( \dfrac{d-a}{4(q-1)} \times q^{\frac{4(d-3)}{d-a}} + 1 \right) & \text{if } a = 3, 4. \end{cases} \]

The desired result follows. ∎

Remark 11. Let us extend this result to the case where n is not divisible by r + 1. From Remark 9, we obtain ⌈n/(r+1)⌉ recovery sets R_1, . . . , R_{⌈n/(r+1)⌉} covering all of the n indices. There are at most (r + 1)⌈n/(r+1)⌉ − n ≤ r indices that belong to more than one of these ⌈n/(r+1)⌉ recovery sets. We first build the parity-check matrix H whose first ⌈n/(r+1)⌉ rows are c_1, . . . , c_{⌈n/(r+1)⌉}, where c_i corresponds to the recovery set R_i. Then, we remove the columns from H whose indices belong to multiple recovery sets. After removing at most r columns, we apply the same argument to the resulting matrix. It is thus clear that the same result also holds for the case where n is not divisible by r + 1, with a small adjustment of r in the final upper bound on the code length.

Remark 12.
From our proof of Theorem 10, one can see why our argument is not applicable to optimal LRCs with distance less than 5. In our argument, the optimal LRC of distance d is reduced to a code of distance at least ⌊(d+1)/2⌋ without locality. If d ≤ 4, this reduced code might be the Hamming code. As we know, the length of the Hamming code is independent of the alphabet size. On the other hand, there indeed exist optimal LRCs of unbounded length and distance d ≤ 4. Therefore, our argument reveals the inherent difference between optimal LRCs with distance less than 5 and those with distance at least 5.

4 Construction of LRCs of super-linear length

To the best of our knowledge, all known constructions of optimal LRCs have block length n ≤ O(q) unless d ≤ 4. Our upper bound in the preceding section implies that n must be upper bounded by (roughly) q^3. A natural question arises whether there exist optimal LRCs with super-linear length in q, e.g., n = Ω(q^{1+ε}) for some constant d > 4. In this section we answer this question affirmatively, showing such codes for all d ≤ r + 2.

When d = r + 2 and (r + 1) | n, the Singleton-type bound (1) can't be met [6, Corollary 10]. In this case, by an optimal LRC we mean a code attaining the trade-off d = n − k − ⌈k/r⌉ + 1. When n is not divisible by r + 1, by shortening the code, it is still possible to obtain optimal LRCs.

Theorem 13. Assume d ≤ r + 2 and (r + 1) | n. There exist optimal LRCs of length n = Ω_{d,r}(q^{1+1/⌊(d−3)/2⌋}). In particular, one obtains the best possible length n = Θ(q^2) for optimal LRCs of minimum distance 5 if r ≥ 3 and (r + 1) | n.

Proof. Let n = ηq^{1+1/⌊(d−3)/2⌋} with some constant η that only depends on d and r, i.e., η = Ω_{d,r}(1). We will determine η later. It suffices to construct a matrix H and show that the code C derived from this parity-check matrix is an optimal LRC. Label and order the n coordinates with (i, j) ∈ [n/(r+1)] × [r + 1], i.e., (i_1, j_1) precedes (i_2, j_2) if i_1 < i_2, or i_1 = i_2 and j_1 < j_2.
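As a numerical aside on the statements of Theorem 10 and Theorem 13 (not a step of the proof), the exponents of q in the upper and lower bounds can be tabulated for small d. This is a minimal sketch: the function names `upper_exponent` and `lower_exponent` are hypothetical, the factor d and the constants depending on d and r are ignored, and the lower bound is only claimed by Theorem 13 when d ≤ r + 2.

```python
from fractions import Fraction


def upper_exponent(d: int) -> Fraction:
    """Exponent of q in the upper bound (5) of Theorem 10, for d >= 5."""
    if d < 5:
        raise ValueError("Theorem 10 requires d >= 5")
    a = d % 4 or 4                       # residue of d mod 4, taken in {1, 2, 3, 4}
    top = d - 2 if a in (1, 2) else d - 3
    return Fraction(4 * top, d - a) - 1


def lower_exponent(d: int) -> Fraction:
    """Exponent 1 + 1/floor((d-3)/2) of q in Theorem 13, for d >= 5."""
    if d < 5:
        raise ValueError("the exponent is only defined for d >= 5")
    return 1 + Fraction(1, (d - 3) // 2)


if __name__ == "__main__":
    for d in range(5, 11):
        print(d, lower_exponent(d), upper_exponent(d))
```

For d = 5 both functions return 2, which is the matching case highlighted in the abstract; for larger d the table shows the gap that the two theorems leave open.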
Let H = (h_{i,j})_{(i,j)∈[n/(r+1)]×[r+1]} where h_{i,j} ∈ F_q^{n−k}. That means H consists of the columns h_{i,j} for (i, j) ∈ [n/(r+1)] × [r + 1]. We start from h_{1,1} and determine the value of h_{i,j} column by column in the above order. In each step, we make sure that the new column h_{i,j} together with any d − 2 columns preceding the (i, j)-th column are linearly independent. Meanwhile, the matrix H keeps the same form (footnote 7) as the matrix in (6). If we can achieve both of these conditions, we are done. Define n/(r+1) blocks B_1, . . . , B_{n/(r+1)} such that B_i = {h_{i,1}, . . . , h_{i,r+1}}. That means we partition the n columns into n/(r+1) disjoint blocks. Algorithm 1 gives the iterative method to compute the columns h_{i,j}.

Footnote 7: By the same form we mean that the distribution of nonzero entries in the upper half of the matrix (the part lying above A) is the same, i.e., entries of value 1 and 0 in this upper half represent the nonzero and zero entries, respectively.

Algorithm 1. For i = 1, . . . , n/(r+1) and j = 1, . . . , r + 1, do the following operation. Find v ∈ F_q^{n−k} of form (7) (footnote 8) such that v is linearly independent of any subset of at most (d − 2) columns h_{i,j} chosen before this step. Let v be the (i, j)-th column of H, i.e., h_{i,j} = v.

We justify Algorithm 1 by showing that there always exists such an h_{i,j} for any (i, j) ∈ [n/(r+1)] × [r + 1]. Assume that we arrive at the (a, b)-th column. If b = 1, the construction is trivial. Let h_{a,b} be a column vector such that the first n/(r+1) components except the a-th component are zero. Obviously, it matches the form of Equation (7). The linear independence is also trivial since the a-th component of all the columns h_{i,j} for i < a is 0. Otherwise, to simplify our discussion, we assume that the first d − 2 columns are already found. Since any d − 2 columns prior to the (a, b)-th column are already linearly independent by our algorithm, it suffices to show that h_{a,b} is linearly independent from these d − 2 columns.
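Algorithm 1 can be prototyped by exhaustive search for tiny parameters. The sketch below is an illustration under stated assumptions, not the authors' implementation: it works over a prime field F_p, fixes the nonzero top entry of every column to 1 (the paper allows any nonzero value), assumes d ≤ r + 2 so that H has `blocks + d - 2` rows, and tests independence by brute-force rank computations. The names `rank_mod_p`, `greedy_parity_check` and `any_columns_independent` are hypothetical.

```python
from itertools import combinations, product


def rank_mod_p(vectors, p):
    """Rank over F_p (p prime) of a list of equal-length integer vectors."""
    m = [list(v) for v in vectors]
    rank = 0
    width = len(m[0]) if m else 0
    for c in range(width):
        pivot = next((i for i in range(rank, len(m)) if m[i][c] % p), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)                 # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][c] % p:
                f = m[i][c]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def greedy_parity_check(p, r, d, blocks):
    """Greedily pick the columns of H, block by block, as in Algorithm 1.

    Each column is (e_a, v) with v in F_p^(d-2); a candidate is accepted if it
    is independent of every set of at most d-2 previously chosen columns.
    """
    cols = []
    for a in range(blocks):
        top = [1 if i == a else 0 for i in range(blocks)]
        for _ in range(r + 1):
            size = min(d - 2, len(cols))
            for v in product(range(p), repeat=d - 2):
                cand = top + list(v)
                if all(rank_mod_p(list(s) + [cand], p) == size + 1
                       for s in combinations(cols, size)):
                    cols.append(cand)
                    break
            else:
                raise ValueError("no admissible column; try a larger p")
    return cols


def any_columns_independent(cols, t, p):
    """True if every choice of t columns is linearly independent over F_p."""
    return all(rank_mod_p(list(s), p) == t for s in combinations(cols, t))


if __name__ == "__main__":
    H_cols = greedy_parity_check(p=5, r=3, d=5, blocks=2)
    print(len(H_cols), any_columns_independent(H_cols, 4, 5))
```

With p = 5, r = 3, d = 5 and two blocks, a direct count of the excluded vectors (at most 97 out of 125 for the last column) shows that the search cannot get stuck; for larger instances the exhaustive search becomes slow, and the counting argument of the proof is what guarantees that a column exists.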
To achieve this, we need to check all possible combinations of these d − 2 columns. Assume that these d − 2 columns are chosen from exactly t blocks. Obviously, block B_a must be selected. Otherwise, the same reasoning as for b = 1 implies that h_{a,b} is linearly independent of these d − 2 columns. Without loss of generality, we assume that these t blocks are B_1, . . . , B_{t−1} and B_a, and that there are i_j columns picked from block B_j. Then, the submatrix H_1 consisting of these d − 2 columns has the following form:

\[ H_1 = \begin{pmatrix} x_1 & & & \\ & \ddots & & \\ & & x_{t-1} & \\ & & & x_a \\ \multicolumn{4}{c}{A_1} \end{pmatrix}, \]

where x_j ∈ F_q^{i_j} (rows that are identically zero on these columns are omitted) and A_1 is a (d − 2) × (d − 2) matrix. If any B_j, j = 1, 2, . . . , t − 1, contains only one column, then that column is linearly independent of the remaining d − 3 columns and h_{a,b}, and therefore can be removed from consideration. Thus we may assume that there are at least two columns chosen in each block except block B_a. Thus, t is at most ⌊(d−1)/2⌋.

Recall that our goal is to ensure that h_{a,b} is linearly independent of these at most d − 2 columns in total, chosen from the blocks B_1, . . . , B_{t−1} and B_a. Given the t blocks and the d − 2 columns chosen from them, we count the number of bad h_{a,b}, i.e., those which are linear combinations of these d − 2 columns. If the number of such linear combinations is smaller than the size of the whole space of possible choices of h_{a,b}, we are done. To achieve this, we need to determine the maximal subspace V spanned by these d − 2 columns such that all the vectors in V have the same form as the vector h_{a,b}, i.e., the first n/(r+1) components except the a-th component are zero. For a block B_j with j ≠ a, by the expression of the matrix H_1, the i_j columns of B_j span an (i_j − 1)-dimensional subspace in which the first n/(r+1) components of all the vectors are 0. That means block B_j for j ≠ a contributes i_j − 1 linearly independent vectors to the maximal subspace V.

Footnote 8: Only the i-th component out of the first n/(r+1) components is nonzero.
Block B_a contributes at most i_a linearly independent vectors to the maximal subspace V. It follows that the dimension of V is at most \sum_{j=1}^{t-1}(i_j − 1) + i_a = d − 1 − t. This implies that there are at most q^{d−1−t} vectors h_{a,b} lying in the space spanned by these d − 2 columns.

It remains to count the number of distinct sets of d − 2 columns. Note that B_a is always selected. Thus, we have at most \binom{a-1}{t-1} ≤ \binom{n/(r+1)}{t-1} ≤ (n/(r+1))^{t−1} combinations of these t blocks. After fixing these t blocks, there are at most (t(r + 1))^{d−2} ways to pick d − 2 columns from them, due to the fact that these t blocks contain only t(r + 1) columns. In total, there are at most (t(r + 1))^{d−2}(n/(r+1))^{t−1} ways to pick d − 2 columns that precede the (a, b)-th column. Each combination contributes at most q^{d−1−t} bad h_{a,b}. Thus, the number of bad h_{a,b} is upper bounded by

\[ \sum_{t=1}^{\lfloor \frac{d-1}{2} \rfloor} \big(t(r+1)\big)^{d-2} \left(\frac{n}{r+1}\right)^{t-1} q^{d-1-t} \le \left(\frac{q(d-1)(r+1)}{2}\right)^{d-2} \sum_{t=1}^{\lfloor \frac{d-1}{2} \rfloor} \left(\frac{n}{q(r+1)}\right)^{t-1} \le \left(\frac{q(d-1)(r+1)}{2}\right)^{d-2} \frac{d-1}{2} \left(\frac{n}{q(r+1)}\right)^{\lfloor \frac{d-3}{2} \rfloor}. \]

The first inequality is due to t ≤ (d−1)/2 and the last inequality is due to n ≥ q(r + 1). Plug n = ηq^{1+1/⌊(d−3)/2⌋} into the formula. This number is upper bounded by q^{d−1}(d − 1)^{d−1}(r + 1)^{d−2}η^{⌊(d−3)/2⌋}; by picking η small enough as a function of d and r, we can ensure this quantity is less than q^{d−1}/2.

On the other hand, according to Algorithm 1, the a-th component of h_{a,b} should be nonzero. Moreover, the first n/(r+1) components except the a-th component are all zero. That means the whole space of candidates for h_{a,b} is of size q^{d−1} − q^{d−2} ≥ q^{d−1}/2. Thus, there always exists an h_{a,b} satisfying our algorithm's requirement.

We are almost done. Let C be the code whose parity-check matrix is H. It is clear that C has locality r. Since any d − 1 columns of H are linearly independent, C has minimum distance d(C) at least d. Because H has n/(r+1) + d − 2 rows, the dimension of C is k(C) ≥ n − n/(r+1) − (d − 2) = rn/(r+1) − (d − 2). This implies

\[ \frac{k(C)}{r} \ge \frac{n}{r+1} - \frac{d-2}{r}. \]

We divide the argument into two cases.
If d − 2 < r, the condition (r + 1) | n implies ⌈k(C)/r⌉ ≥ n/(r+1) and thus k(C) + ⌈k(C)/r⌉ ≥ n − d + 2. It follows that

\[ d(C) \ge d \ge n - k(C) - \left\lceil \frac{k(C)}{r} \right\rceil + 2. \]

Thus, C is an optimal LRC. We are done.

If d − 2 = r, the condition (r + 1) | n implies ⌈k(C)/r⌉ ≥ n/(r+1) − 1 and thus k(C) + ⌈k(C)/r⌉ ≥ n − d + 1. It follows that C is still an optimal LRC, because in this case no LRC reaches the Singleton-type bound. ∎

Next, we extend this theorem to the case where n is not divisible by r + 1 and d ≤ r + 2.

Corollary 14. Assume n ≡ a (mod r + 1) and a ≥ d − 1. There exist optimal LRCs of length n = Ω_{d,r}(q^{1+1/⌊(d−3)/2⌋}). In particular, one obtains the best possible length n = Θ(q^2) for optimal LRCs of minimum distance 5 if n (mod r + 1) ≥ 4.

It was shown in [12, Theorem III.3] that, under the assumption that C has ⌈n/(r+1)⌉ disjoint recovery sets, a linear code C with length n ≡ a (mod r + 1), a ≠ 0, 1, and dimension satisfying either k mod r ≥ a or r | k must obey d ≤ n − k − ⌈k/r⌉ + 1. With the help of Theorem III.3 in [12], we can extend the result in Corollary 14 to cover the case a ≤ d − 1 and a ≠ 1.

Corollary 15. Assume n ≡ a (mod r + 1) and a ≠ 1. There exist optimal LRCs of length n = Ω_{d,r}(q^{1+1/⌊(d−3)/2⌋}).

Appendix

A.1 Unbounded length LRCs for distances 3 and 4

Theorem 16. Assume that d = 3 or 4, d − 2 ≤ r and (r + 1) | n. Then there exist optimal LRCs of arbitrary length as long as q ≥ r + 1.

Proof. For d = 3, 4, d − 2 ≤ r and (r + 1) | n, the Singleton-type bound implies that n − k = n/(r+1) + d − 2. Since q ≥ r + 1, we let A be a (d − 2) × (r + 1) matrix over F_q such that

\[ A_1 = \begin{pmatrix} \mathbf{1} \\ A \end{pmatrix} \]

is a (d − 1) × (r + 1) Vandermonde matrix. Define the (n/(r+1) + d − 2) × n matrix

\[ H = \begin{pmatrix} \mathbf{1} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{1} & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{1} \\ A & A & \cdots & A \end{pmatrix}, \]

where 1 and 0 are the all-1 and all-0 vectors in F_q^{r+1}, respectively. We partition the columns of H into n/(r+1) blocks B_1, . . . , B_{n/(r+1)} such that h ∈ B_i if its i-th component is nonzero.
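The matrix H just defined can be built and checked directly for small parameters. The sketch below is an illustration, not the authors' code: it assumes a prime field F_p with p ≥ r + 1, uses the evaluation points 0, 1, . . . , r for the Vandermonde part, and verifies the independence property by brute force. The names `rank_mod_p` and `vandermonde_lrc_columns` are hypothetical.

```python
from itertools import combinations


def rank_mod_p(vectors, p):
    """Rank over F_p (p prime) of a list of equal-length integer vectors."""
    m = [list(v) for v in vectors]
    rank = 0
    width = len(m[0]) if m else 0
    for c in range(width):
        pivot = next((i for i in range(rank, len(m)) if m[i][c] % p), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][c], -1, p)                 # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][c] % p:
                f = m[i][c]
                m[i] = [(x - f * y) % p for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def vandermonde_lrc_columns(p, r, d, blocks):
    """Columns of H for d in {3, 4}: block indicator on top, (alpha, alpha^2, ...) below."""
    if d not in (3, 4) or d - 2 > r or p < r + 1:
        raise ValueError("need d in {3, 4}, d - 2 <= r and p >= r + 1")
    alphas = range(r + 1)                            # r + 1 distinct elements of F_p
    cols = []
    for i in range(blocks):
        top = [1 if t == i else 0 for t in range(blocks)]
        for alpha in alphas:
            cols.append(top + [pow(alpha, e, p) for e in range(1, d - 1)])
    return cols


if __name__ == "__main__":
    p, r, d, blocks = 5, 4, 4, 6
    cols = vandermonde_lrc_columns(p, r, d, blocks)
    n = len(cols)                                    # 30, already larger than q = 5
    k = n - rank_mod_p(cols, p)                      # dimension of the code
    ok = all(rank_mod_p(list(s), p) == d - 1 for s in combinations(cols, d - 1))
    print(n, k, ok, n - k - (-(-k // r)) + 2)
```

Increasing `blocks` lengthens the code without changing p, which is the sense in which the length is unbounded for d = 3, 4.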
From the expression of the matrix H, it is clear that each column belongs to exactly one block and that columns in distinct blocks are linearly independent. Moreover, any d − 1 columns in the same block are linearly independent due to the property of the Vandermonde matrix A_1.

Next we show that any d − 1 columns of H are linearly independent. It suffices to verify this claim for the case d = 4. To see this, we pick any three columns h_i, h_j, h_t from H. There is nothing to prove if these three columns belong to the same block. We assume that they belong to at least two blocks. Without loss of generality, h_t is in a block that does not contain h_i and h_j. From the above observation, we see that h_t is linearly independent from h_i and h_j. It is clear that h_i and h_j are linearly independent no matter whether they belong to the same block or different blocks. Thus, any 3 columns of H are linearly independent.

Let C be the linear code whose parity-check matrix is H. It is clear that C has length n, dimension k(C) ≥ n − n/(r+1) − (d − 2) = rn/(r+1) − (d − 2), distance d(C) ≥ d and locality r. The condition d − 2 ≤ r leads to ⌈k(C)/r⌉ ≥ n/(r+1) and thus k(C) + ⌈k(C)/r⌉ ≥ n − (d − 2). The desired result follows since d(C) ≥ d ≥ n − k(C) − ⌈k(C)/r⌉ + 2. ∎

References

[1] S. Ball. On large subsets of a finite vector space in which every subset of basis size is a basis. J. Eur. […], 14:733-748, October 2012.
[2] […] Walker, editors, Algebraic Geometry for Coding Theory and Cryptography, pages 95-126. Springer, 2017.
[3] A. Barg, I. Tamo, and S. Vlăduţ. Locally recoverable codes on algebraic curves. IEEE Trans. Inform. Theory, 63:4928-4939, 2017.
[4] V. Cadambe and A. Mazumdar. Bounds on the size of locally recoverable codes. IEEE Trans. Inform. Theory, 61:5787-5794, 2015.
[5] […] Discrete Mathematics, 324(6):78-84, 2014.
[6] […] IEEE Trans. Inform. Theory, 58:6925-6934, 2012.
[7] S. Gopi, V. Guruswami, and S. Yekhanin. On maximally recoverable local reconstruction codes. Electronic Colloquium on Computational Complexity, 24:183, 2017.
[8] J. Han and L. A. Lastras-Montano. Reliable memories with subline accesses. In Proc. IEEE Internat. Sympos. Inform. Theory, pages 2531-2535, 2007.
[9] C. Huang, M. Chen, and J. Li. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. In Sixth IEEE International Symposium on Network Computing and Applications, pages 79-86, 2007.
[10] […] Erasure coding in Windows Azure storage. In USENIX Annual Technical Conference (ATC), pages 15-26, 2012.
[11] L. Jin, L. Ma, and C. Xing. Construction of optimal locally repairable codes via automorphism groups of rational function fields. URL: https://arxiv.org/abs/1710.09638.
[12] O. Kolosov, A. Barg, I. Tamo, and G. Yadgar. Optimal LRC codes for all lengths n ≤ q. URL: https://arxiv.org/pdf/1802.00157.
[13] X. Li, L. Ma, and C. Xing. Optimal locally repairable codes via elliptic curves. To appear in IEEE Trans. Inf. Theory, 2017. URL: https://arxiv.org/abs/1712.03744.
[14] Y. Luo, C. Xing, and C. Yuan. Optimal locally repairable codes of distance 3 and 4 via cyclic codes. To appear in IEEE Trans. Inf. Theory, 2018. arXiv:1801.03623.
[15] D. S. Papailiopoulos and A. G. Dimakis. Locally repairable codes. IEEE Trans. Inform. Theory, 60:5843-5855, 2014.
[16] N. Prakash, G. M. Kamath, V. Lalitha, and P. V. Kumar. Optimal linear codes with a local-error-correction property. In Proc. 2012 IEEE Int. Symp. Inform. Theory, pages 2776-2780, 2012.
[17] M. Sathiamoorthy, M. Asteris, D. S. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur. XORing elephants: novel erasure codes for big data. Proceedings of VLDB Endowment (PVLDB), pages 325-336, 2013.
[18] N. Silberstein, A. S. Rawat, O. O. Koyluoglu, and S. Vishwanath. Optimal locally repairable codes via rank-metric codes. In Proc. IEEE Int. Symp. Inf. Theory, pages 1819-1823, 2013.
[19] I. Tamo and A. Barg. A family of optimal locally recoverable codes. IEEE Trans. Inform. Theory, 60:4661-4676, 2014.
[20] I. Tamo, D. S. Papailiopoulos, and A. G. Dimakis. Optimal locally repairable codes and connections to matroid theory. IEEE Trans. Inform. Theory, 62:6661-6671, 2016.
[21] Z. Zhang, J. Xu, and M. Liu. Constructions of optimal locally repairable codes over small fields. SCIENTIA SINICA Mathematica, 47(11):1607-1614, 2017.


This is a preview of a remote PDF: http://drops.dagstuhl.de/opus/volltexte/2018/9445/pdf/LIPIcs-APPROX-RANDOM-2018-41.pdf

Venkatesan Guruswami, Chaoping Xing, Chen Yuan. How Long Can Optimal Locally Repairable Codes Be?, LIPIcs - Leibniz International Proceedings in Informatics, 2018, 41:1-41:11, DOI: 10.4230/LIPIcs.APPROX-RANDOM.2018.41