Perturbation of convex risk minimization and its application in differential private learning algorithms

Journal of Inequalities and Applications, Jan 2017

Convex risk minimization is a commonly used setting in learning theory. In this paper, we first give a perturbation analysis for such algorithms and then apply the result to differential private learning algorithms. Our analysis requires the objective functions to be strongly convex. This extends our previous analysis to non-differentiable loss functions when constructing differential private algorithms. Finally, an error analysis is provided to guide the selection of the parameters.


Weilin Nie, Cheng Wang
Huizhou University, Huizhou, P.R. China

Keywords: learning algorithms; differential privacy; convex risk minimization; perturbation

© The Author(s) 2017. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

1 Introduction

In learning theory, convex optimization is one of the most powerful tools for analysis and algorithm design, and it is used in particular for empirical risk minimization (ERM) (Vapnik 1998 [1]). When run on a sensitive data set, an algorithm may leak private information. This has motivated the notion of differential privacy (Dwork et al. [, ]). For the sample space $Z$, denote the Hamming distance between two sample sets $z_1, z_2 \in Z^m$ as

$$d(z_1, z_2) = \#\{i = 1, \dots, m : z_{1,i} \ne z_{2,i}\},$$

and consider pairs with $d(z_1, z_2) = 1$, i.e., there is only one element that is different. Then $\epsilon$-differential privacy is defined as follows.

Definition 1 A random algorithm $A : Z^m \to H$ is $\epsilon$-differential private if for every two sample sets $z_1, z_2$ with $d(z_1, z_2) = 1$ and every set $O$ of possible outputs,

$$\Pr\{A(z_1) \in O\} \le e^{\epsilon} \cdot \Pr\{A(z_2) \in O\}.$$

Throughout the paper, we assume $\epsilon < 1$ for meaningful privacy guarantees. The relaxation to $(\epsilon, \delta)$-differential privacy is also common in the literature. However, it is out of our scope and we will just focus on $\epsilon$-differential privacy throughout the paper. Extension of our results to $(\epsilon, \delta)$-differential privacy or concentrated differential privacy [3] may be studied in future work.

A mechanism usually obtains differential privacy by adding a perturbation term to an otherwise deterministic output (Dwork et al. 2006 [4]), i.e., the so-called Laplacian mechanism. McSherry and Talwar 2007 [5] proposed the exponential mechanism, which chooses an output based on its utility function. The two mechanisms are related, and both depend on some kind of sensitivity of the original deterministic output. We refer to Dwork 2008 [6] and Ji et al. 2014 [7] for a general overview of differential private algorithms and applications.

A line of work, beginning with Chaudhuri et al. 2011 [8], introduced the output perturbation and objective perturbation algorithms to obtain differential privacy for ERM algorithms. This was followed by [9–12], etc. However, most of the literature requires a differentiable loss function, and sometimes a twice-differentiable one (see [] for a detailed analysis). This limits the applicability of the algorithms, for example to ERM with the hinge loss (SVM) or the pinball loss ([14]), and it motivates our work. On the other hand, the sensitivity of a differential private algorithm, which can be viewed as a perturbation or stability property of the ERM algorithm, has been studied by Bousquet and Elisseeff 2002 [15] and Shalev-Shwartz et al. 2010 [16] in the classical learning theory setting. More recently, the relationship between stability and differential privacy was revealed in Wang et al. [17].
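As a minimal illustration of the Laplacian mechanism mentioned above, the following sketch releases a clipped sample mean with Laplace noise calibrated to its sensitivity. The function names `laplace_noise` and `private_mean`, the clipping interval, and the data are hypothetical choices made for this example and do not come from the paper.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from the Laplace distribution with mean 0 and the given scale."""
    # 1 - U lies in (0, 1], so the logarithm is always finite.
    magnitude = -scale * math.log(1.0 - rng.random())
    # A fair random sign turns the exponential magnitude into a Laplace variable.
    return magnitude if rng.random() < 0.5 else -magnitude


def private_mean(values, lower, upper, epsilon, rng):
    """Release the mean of `values`, clipped to [lower, upper], with Laplace noise.

    Replacing one entry changes the clipped mean by at most (upper - lower) / m,
    so noise of scale sensitivity / epsilon matches the Laplacian mechanism.
    """
    m = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / m
    return sum(clipped) / m + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
data = [0.2, 0.9, 0.4, 0.7, 0.5]  # illustrative data only
print(private_mean(data, 0.0, 1.0, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives a larger noise scale, which is the privacy and accuracy trade-off that the error analysis later in the paper quantifies.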
The main contribution of this paper is a different perturbation analysis for ERM algorithms, in which the only conditions are a convex loss function and a strongly convex regularization term. Thus the output perturbation mechanism remains directly valid for SVM and other non-differentiable loss cases. In addition, an error analysis is conducted, from which we find a choice of the parameter $\epsilon$ that balances privacy and generalization ability.

2 Perturbation analysis for ERM algorithms

In this section we consider general regularized ERM algorithms. Let $X$ be a compact metric space and let the output space be $Y \subset \mathbb{R}$, where $|y| \le M$ for some $M > 0$. (We refer to Cucker and Smale 2002 [18] and Cucker and Zhou 2007 [19] for more details on this learning theory setting.) A function $f_{z,A} : X \to Y$ is obtained via some algorithm $A$ based on the sample $z = \{z_i\}_{i=1}^m = \{(x_i, y_i)\}_{i=1}^m$, which is drawn according to a distribution $\rho$ on the sample space $Z := X \times Y$. Furthermore, we assume there is a marginal distribution $\rho_X$ on $X$ and a conditional distribution $\rho(y|x)$ on $Y$ given $x$.

We first introduce the notation used in the following statements and analysis. Let the loss function $L(f(x), y)$ be positive and convex in its first variable. Denote

$$\mathcal{E}(f) = \int_Z L(f(x), y)\, d\rho, \qquad \mathcal{E}_z(f) = \frac{1}{m} \sum_{i=1}^m L(f(x_i), y_i).$$

Without loss of generality, we set $\bar z = \{z_1, z_2, \dots, z_{m-1}, \bar z_m\}$, which replaces the last element of $z$, and $z^- = \{z_1, z_2, \dots, z_{m-1}\}$, the sample set obtained by deleting the last element of $z$. Similar notation can then be given:

$$\mathcal{E}_{\bar z}(f) = \frac{1}{m} \Big( \sum_{i=1}^{m-1} L(f(x_i), y_i) + L(f(\bar x_m), \bar y_m) \Big), \qquad \mathcal{E}_{z^-}(f) = \frac{1}{m-1} \sum_{i=1}^{m-1} L(f(x_i), y_i).$$

Denote by $(H_K, \|\cdot\|_K)$ the reproducing kernel Hilbert space (RKHS) on $X$, i.e., $H_K := \overline{\operatorname{span}}\{K(x, \cdot), x \in X\}$, where $K : X \times X \to \mathbb{R}$ is a Mercer kernel. Let $K_x(y) = K(x, y)$ for any $x, y \in X$, and $\kappa = \sup_{x,y \in X} \sqrt{|K(x, y)|}$. Then the reproducing property tells us that $f(x) = \langle f, K_x \rangle_K$.
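The reproducing property yields the pointwise bound $|f(x)| \le \kappa \|f\|_K$, which is what later converts a bound in the $K$ norm into a sup-norm sensitivity. The sketch below checks this numerically for a finite kernel expansion. The Gaussian kernel, the centers, and the coefficients are assumptions chosen only for illustration; for this kernel $\kappa = 1$.

```python
import math


def gaussian_kernel(x, y, width=1.0):
    """A Mercer kernel on the real line, used here as a stand-in for K."""
    return math.exp(-((x - y) ** 2) / (2.0 * width ** 2))


def evaluate(coeffs, centers, x, kernel=gaussian_kernel):
    """Evaluate f(x) = sum_i c_i K(x_i, x), which equals <f, K_x>_K."""
    return sum(c * kernel(xi, x) for c, xi in zip(coeffs, centers))


def rkhs_norm(coeffs, centers, kernel=gaussian_kernel):
    """Compute ||f||_K = sqrt(c^T G c) with Gram matrix G_ij = K(x_i, x_j)."""
    total = 0.0
    for ci, xi in zip(coeffs, centers):
        for cj, xj in zip(coeffs, centers):
            total += ci * cj * kernel(xi, xj)
    return math.sqrt(max(total, 0.0))


centers = [0.0, 0.5, 1.5]   # hypothetical expansion points
coeffs = [1.0, -2.0, 0.7]   # hypothetical coefficients
kappa = 1.0                 # sup of sqrt(K(x, x)) for the Gaussian kernel

norm = rkhs_norm(coeffs, centers)
grid = [-2.0 + 0.05 * k for k in range(121)]
sup_on_grid = max(abs(evaluate(coeffs, centers, x)) for x in grid)
print(sup_on_grid <= kappa * norm)
```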
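Now a typical regularized ERM algorithm can be stated as

$$f_z = \arg\min_{f \in H_K} \big\{ \mathcal{E}_z(f) + \lambda \Omega(f) \big\}.$$

Here $\lambda > 0$ is the regularization parameter and $\Omega(f)$ is a $\gamma$-strongly ($\gamma > 0$) convex function with respect to the $K$ norm, i.e., for any $f_1, f_2 \in H_K$ and $t \in [0, 1]$,

$$\Omega(t f_1 + (1 - t) f_2) \le t\, \Omega(f_1) + (1 - t)\, \Omega(f_2) - \frac{\gamma}{2}\, t (1 - t) \|f_1 - f_2\|_K^2.$$

This definition of strong convexity is taken from Sridharan et al. 2008 [20], where the authors derived a kind of uniform convergence under the strong convexity assumption. It has been widely used in the subsequent literature such as [, , , ], etc. By denoting

$$f_{\bar z} = \arg\min_{f \in H_K} \big\{ \mathcal{E}_{\bar z}(f) + \lambda \Omega(f) \big\}, \qquad f_{z^-} = \arg\min_{f \in H_K} \big\{ \mathcal{E}_{z^-}(f) + \lambda \Omega(f) \big\},$$

we have the following result.

Theorem 1 Let $f_z$ and $f_{\bar z}$ be defined as above. Suppose $\Omega$ is $\gamma$-strongly convex and $L$ is convex w.r.t. the first variable. Assume there is a $B > 0$ such that $\lambda \Omega(f_S) \le B$ and $|L(f_S(x), y)| \le B$ for any $S \in Z^m$, $m \in \mathbb{N}$ and $(x, y) \in Z$. Then we have

$$\|f_z - f_{\bar z}\|_K \le 4 \sqrt{\frac{B}{\lambda \gamma m}}.$$

Proof We will prove the result in three steps.

(1) For any $S \in Z^m$ and $f_S$ obtained from the ERM scheme above, it is obvious from the definitions that

$$|\mathcal{E}_z(f_S) - \mathcal{E}_{\bar z}(f_S)| = \frac{1}{m} \big| L(f_S(x_m), y_m) - L(f_S(\bar x_m), \bar y_m) \big| \le \frac{2B}{m}.$$

(2) The minima of the two objective functions are close, i.e.,

$$\big| \big( \mathcal{E}_z(f_z) + \lambda \Omega(f_z) \big) - \big( \mathcal{E}_{\bar z}(f_{\bar z}) + \lambda \Omega(f_{\bar z}) \big) \big| \le \frac{2B}{m}.$$

From the notation above, we have $\mathcal{E}_z(f_{z^-}) + \lambda \Omega(f_{z^-}) \ge \mathcal{E}_z(f_z) + \lambda \Omega(f_z)$. Using in addition the nonnegativity of $L$ and $\Omega$ and the fact that $f_{z^-}$ minimizes $\mathcal{E}_{z^-} + \lambda \Omega$, we obtain

$$T \le \sum_{i=1}^m L(f_z(x_i), y_i) + \lambda m\, \Omega(f_z) \le T + L(f_{z^-}(x_m), y_m) + \lambda \Omega(f_{z^-}), \qquad T := \sum_{i=1}^{m-1} L(f_{z^-}(x_i), y_i) + \lambda (m - 1)\, \Omega(f_{z^-}).$$

A similar analysis for $f_{\bar z}$ can be given as follows:

$$T \le \sum_{i=1}^{m-1} L(f_{\bar z}(x_i), y_i) + L(f_{\bar z}(\bar x_m), \bar y_m) + \lambda m\, \Omega(f_{\bar z}) \le T + L(f_{z^-}(\bar x_m), \bar y_m) + \lambda \Omega(f_{z^-}).$$

Note that $\sum_{i=1}^m L(f_z(x_i), y_i) + \lambda m\, \Omega(f_z)$ is indeed $m(\mathcal{E}_z(f_z) + \lambda \Omega(f_z))$, and that the two lower bounds above are the same. Hence

$$m \big| \big( \mathcal{E}_z(f_z) + \lambda \Omega(f_z) \big) - \big( \mathcal{E}_{\bar z}(f_{\bar z}) + \lambda \Omega(f_{\bar z}) \big) \big| \le \max \big\{ L(f_{z^-}(x_m), y_m) + \lambda \Omega(f_{z^-}),\ L(f_{z^-}(\bar x_m), \bar y_m) + \lambda \Omega(f_{z^-}) \big\} \le 2B,$$

and we can deduce the claimed bound $2B/m$.

(3) Now we can prove our main result. Since $\Omega$ is $\gamma$-strongly convex, and $L(f(x), y)$ is convex w.r.t. the first argument, which leads to the convexity of $\mathcal{E}_z(f)$, for any $0 < t < 1$ it follows that

$$\mathcal{E}_z(t f_z + (1 - t) f_{\bar z}) + \lambda \Omega(t f_z + (1 - t) f_{\bar z}) \le t \big( \mathcal{E}_z(f_z) + \lambda \Omega(f_z) \big) + (1 - t) \big( \mathcal{E}_z(f_{\bar z}) + \lambda \Omega(f_{\bar z}) \big) - \frac{\lambda \gamma}{2}\, t (1 - t) \|f_z - f_{\bar z}\|_K^2.$$

Since $f_z$ is the minimizer, the left-hand side is at least $\mathcal{E}_z(f_z) + \lambda \Omega(f_z)$. Together with steps (1) and (2) this gives

$$\frac{\lambda \gamma}{2}\, t\, \|f_z - f_{\bar z}\|_K^2 \le \big( \mathcal{E}_z(f_{\bar z}) - \mathcal{E}_{\bar z}(f_{\bar z}) \big) + \big( \mathcal{E}_{\bar z}(f_{\bar z}) + \lambda \Omega(f_{\bar z}) \big) - \big( \mathcal{E}_z(f_z) + \lambda \Omega(f_z) \big) \le \frac{4B}{m}.$$

Simply taking $t = 1/2$ we have

$$\|f_z - f_{\bar z}\|_K \le 4 \sqrt{\frac{B}{\lambda \gamma m}},$$

which proves our result.

Now let us make a brief remark about this result. In our theorem, only convexity of the loss function and $\gamma$-strong convexity of $\Omega$ are assumed. The assumption $\lambda \Omega(f_S) \le B$ is trivial for algorithms such as general SVM or coefficient regularization [], since $\mathcal{E}_S(f_S) + \lambda \Omega(f_S)$ is the minimum value.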
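To make the strong convexity condition concrete, the sketch below checks it for the squared norm, the regularizer used in the applications later in the paper. It assumes the definition in the form $\Omega(t f_1 + (1-t) f_2) \le t\,\Omega(f_1) + (1-t)\,\Omega(f_2) - \frac{\gamma}{2} t(1-t)\|f_1 - f_2\|^2$, and it uses plain Euclidean vectors as a stand-in for elements of $H_K$. Under that reading the squared norm satisfies the condition with equality for $\gamma = 2$ and fails for any larger $\gamma$. The helper names are hypothetical.

```python
import random


def sq_norm(v):
    """Squared Euclidean norm, standing in for Omega(f) = ||f||_K^2."""
    return sum(a * a for a in v)


def strong_convexity_gap(f, g, t, gamma=2.0):
    """Return (right-hand side) - (left-hand side) of the strong convexity inequality.

    A nonnegative value means the inequality holds for this f, g, t and gamma.
    """
    mix = [t * a + (1.0 - t) * b for a, b in zip(f, g)]
    diff = [a - b for a, b in zip(f, g)]
    rhs = (t * sq_norm(f) + (1.0 - t) * sq_norm(g)
           - 0.5 * gamma * t * (1.0 - t) * sq_norm(diff))
    return rhs - sq_norm(mix)


rng = random.Random(0)
f = [rng.uniform(-1.0, 1.0) for _ in range(5)]
g = [rng.uniform(-1.0, 1.0) for _ in range(5)]
for t in (0.1, 0.5, 0.9):
    # gamma = 2 gives a gap of zero up to rounding; gamma = 3 gives a negative gap.
    print(t, strong_convexity_gap(f, g, t, 2.0), strong_convexity_gap(f, g, t, 3.0))
```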
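The advantage of this result is that most of our learning algorithms satisfy this condition, including in particular the hinge loss for SVM and the pinball loss for quantile regression. Perturbation, or stability, analysis has already been performed in [15, 16]. There the authors proposed quite a few stability definitions, which are mainly used for classical generalization analysis. References [, ] also studied differential private learning algorithms with different kernels and Lipschitz losses, with a squared-norm regularization term. A similar result to theirs, in our notation, is as follows.

Theorem 2 Let $f_z$, $f_{\bar z}$, $f_{z^-}$ be defined as above. Assume $|L(t_1, y) - L(t_2, y)| \le C_L |t_1 - t_2|$ for any $t_1, t_2, y$ and some $C_L > 0$. Then we have

$$\|f_z - f_{\bar z}\|_K \le \frac{2 \kappa C_L}{\lambda \gamma m}.$$

Proof From the convexity of the loss function and the strong convexity of the regularization term, we have, for any $f \in H_K$ and $0 < t < 1$,

$$\mathcal{E}_z(f_z) + \lambda \Omega(f_z) \le \mathcal{E}_z(t f_z + (1 - t) f) + \lambda \Omega(t f_z + (1 - t) f) \le t \big( \mathcal{E}_z(f_z) + \lambda \Omega(f_z) \big) + (1 - t) \big( \mathcal{E}_z(f) + \lambda \Omega(f) \big) - \frac{\lambda \gamma}{2}\, t (1 - t) \|f - f_z\|_K^2.$$

This leads to

$$\mathcal{E}_z(f_z) + \lambda \Omega(f_z) \le \mathcal{E}_z(f) + \lambda \Omega(f) - \frac{\lambda \gamma}{2}\, t\, \|f - f_z\|_K^2.$$

Letting $t$ tend to 1, we have

$$\mathcal{E}_z(f_z) + \lambda \Omega(f_z) \le \mathcal{E}_z(f) + \lambda \Omega(f) - \frac{\lambda \gamma}{2} \|f - f_z\|_K^2$$

for any $f \in H_K$. Similarly, we also have

$$\mathcal{E}_{\bar z}(f_{\bar z}) + \lambda \Omega(f_{\bar z}) \le \mathcal{E}_{\bar z}(f) + \lambda \Omega(f) - \frac{\lambda \gamma}{2} \|f - f_{\bar z}\|_K^2$$

for any $f \in H_K$. Therefore, taking $f = f_{\bar z}$ in the first inequality and $f = f_z$ in the second, and adding the two inequalities, we have

$$\lambda \gamma \|f_{\bar z} - f_z\|_K^2 \le \mathcal{E}_{\bar z}(f_z) - \mathcal{E}_z(f_z) + \mathcal{E}_z(f_{\bar z}) - \mathcal{E}_{\bar z}(f_{\bar z}) = \frac{1}{m} \big[ L(f_z(\bar x_m), \bar y_m) - L(f_{\bar z}(\bar x_m), \bar y_m) + L(f_{\bar z}(x_m), y_m) - L(f_z(x_m), y_m) \big].$$

By the Lipschitz condition and the reproducing property, the right-hand side is at most $\frac{2 C_L}{m} \|f_z - f_{\bar z}\|_\infty \le \frac{2 \kappa C_L}{m} \|f_z - f_{\bar z}\|_K$, and the theorem is proved.

Though the condition for the latter result is stronger than for the first one, we will still apply it in the analysis below, as the bound is sharper and most loss functions satisfy the Lipschitz condition above.

3 Differential private learning algorithms

In this section, we describe general differential private learning algorithms based on an output perturbation method. Perturbation ERM algorithms give a random output by adding a random perturbation term to the above deterministic output. That is,

$$f_{z,A} = f_z + b,$$

where $f_z$ is derived from the regularized ERM scheme of the previous section. To determine the distribution of $b$, we first recall the sensitivity, introduced in Dwork [], in our setting.

Definition 2 We denote by $\Delta f$ the maximum sup norm of the difference between the outputs when one sample point in $z$ is changed.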
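Let $z$ and $\bar z$ be defined as in the previous section, and let $f_z$ and $f_{\bar z}$ be derived from the regularized ERM scheme accordingly. Then

$$\Delta f = \sup_{z, \bar z} \|f_z - f_{\bar z}\|_\infty.$$

Then a similar result to [] is the following.

Lemma 1 Assume $\Delta f$ is bounded by $B_\Delta > 0$, and $b$ has a density function proportional to $\exp\{-\epsilon |b| / B_\Delta\}$. Then the algorithm $f_{z,A} = f_z + b$ provides $\epsilon$-differential privacy.

Proof For every possible output function $r$, and $z$, $\bar z$ differing in the last element,

$$\Pr\{f_{z,A} = r\} = \Pr_b\{b = r - f_z\} \propto \exp\Big( -\frac{\epsilon |r - f_z|}{B_\Delta} \Big), \qquad \Pr\{f_{\bar z,A} = r\} = \Pr_b\{b = r - f_{\bar z}\} \propto \exp\Big( -\frac{\epsilon |r - f_{\bar z}|}{B_\Delta} \Big).$$

So by the triangle inequality,

$$\Pr\{f_{z,A} = r\} \le \Pr\{f_{\bar z,A} = r\} \cdot \exp\Big( \frac{\epsilon |f_z - f_{\bar z}|}{B_\Delta} \Big) \le e^{\epsilon} \Pr\{f_{\bar z,A} = r\}.$$

Then the lemma is proved by a union bound.

Combining this with the result in the previous section, we can choose the noise term $b$ as follows.

Proposition 1 Assume the conditions in Theorem 2 hold, and $b$ takes values in $(-\infty, +\infty)$. If we choose the density of $b$ to be $\frac{1}{\alpha} \exp\big( -\frac{\lambda \gamma m \epsilon |b|}{2 \kappa^2 C_L} \big)$, where $\alpha = \frac{4 \kappa^2 C_L}{\lambda \gamma m \epsilon}$, then the algorithm $f_{z,A} = f_z + b$ provides $\epsilon$-differential privacy.

Proof From the previous section we have

$$\|f_z - f_{\bar z}\|_K \le \frac{2 \kappa C_L}{\lambda \gamma m}$$

for any $z$ and $\bar z$ differing in the last sample point. Then from the reproducing property,

$$\|f_z - f_{\bar z}\|_\infty \le \kappa \|f_z - f_{\bar z}\|_K \le \frac{2 \kappa^2 C_L}{\lambda \gamma m},$$

and the result follows from Lemma 1 with $B_\Delta = \frac{2 \kappa^2 C_L}{\lambda \gamma m}$.

4 Error analysis

In this section, we conduct the error analysis for the general differential private ERM algorithm $f_{z,A} = f_z + b$. We take

$$\mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) \le \mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) + \lambda \Omega(f_z)$$

as our goal function. In the rest of this section, we always assume the Lipschitz continuity condition for the loss function, i.e., $|L(t_1, y) - L(t_2, y)| \le C_L |t_1 - t_2|$ for any $t_1, t_2, y$ and some $C_L > 0$.

Now let us introduce our error decomposition. For a function $f_\lambda \in H_K$ to be specified later,

$$\mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) + \lambda \Omega(f_z) \le \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}) + \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(f_z) + \mathcal{E}_z(f_z) + \lambda \Omega(f_z) - \mathcal{E}(f_\rho)$$
$$\le \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}) + \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(f_z) + \mathcal{E}_z(f_\lambda) + \lambda \Omega(f_\lambda) - \mathcal{E}(f_\rho) = R_1 + R_2 + S + D(\lambda),$$

where

$$R_1 = \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}), \qquad R_2 = \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(f_z), \qquad S = \mathcal{E}_z(f_\lambda) - \mathcal{E}(f_\lambda), \qquad D(\lambda) = \mathcal{E}(f_\lambda) - \mathcal{E}(f_\rho) + \lambda \Omega(f_\lambda).$$

Here $R_1$ and $R_2$ involve the function $f_{z,A}$ from the random algorithm, so we call them random errors. $S$ and $D(\lambda)$ are similar to the classical ones in the learning theory literature and are called the sample error and the approximation error. In the following we will study these errors, respectively.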
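The output perturbation step of Section 3 can be sketched as follows. This is only an illustration under a stated assumption: it takes the sup-norm sensitivity bound to be $2\kappa^2 C_L / (\lambda \gamma m)$, which is our reading of Theorem 2 combined with the reproducing property, and it draws a scalar shift $b$ from the Laplace density of Lemma 1. All function names are hypothetical, and the toy predictor stands in for a trained $f_z$.

```python
import math
import random


def sensitivity_bound(kappa, c_lip, lam, gamma, m):
    """Assumed sup-norm sensitivity bound 2 * kappa^2 * C_L / (lam * gamma * m)."""
    return 2.0 * kappa ** 2 * c_lip / (lam * gamma * m)


def sample_perturbation(kappa, c_lip, lam, gamma, m, epsilon, rng):
    """Draw b with density proportional to exp(-epsilon * |b| / sensitivity)."""
    scale = sensitivity_bound(kappa, c_lip, lam, gamma, m) / epsilon
    # Exponential magnitude with mean `scale`; 1 - U is in (0, 1], so log is finite.
    magnitude = -scale * math.log(1.0 - rng.random())
    return magnitude if rng.random() < 0.5 else -magnitude


def privatize(f, b):
    """Return the perturbed output f_{z,A} = f_z + b for a scalar shift b."""
    return lambda x: f(x) + b


rng = random.Random(0)
f_z = lambda x: 0.3 * x - 0.1  # hypothetical deterministic ERM output
b = sample_perturbation(kappa=1.0, c_lip=1.0, lam=0.1, gamma=2.0, m=1000,
                        epsilon=0.5, rng=rng)
f_private = privatize(f_z, b)
print(f_private(1.0) - f_z(1.0))  # equals the sampled shift b
```

The noise scale shrinks like $1/(\lambda m \epsilon)$, so more data or stronger regularization allows less noise for the same privacy level.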
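4.1 Concentration inequality and error bounds for random errors

To bound the first random error, we need a concentration inequality. Dwork et al. 2015 [23] proposed such an inequality in their differential private setting. Soon afterwards Bassily et al. 2015 [13] gave a different proof of the concentration inequality, which inspires our error analysis.

Theorem 3 Suppose an algorithm $A$ provides $\epsilon$-differential privacy and outputs a positive function $g_{z,A} : Z \to \mathbb{R}$ with bounded expectation $\mathbb{E}_{z,A} \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) \le G$ for some $G > 0$, where the expectation is taken over the sample and the output of the random algorithm. Then

$$\mathbb{E}_{z,A} \Big[ \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) - \int_Z g_{z,A}(z)\, d\rho \Big] \le \epsilon G$$

and

$$\mathbb{E}_{z,A} \Big[ \int_Z g_{z,A}(z)\, d\rho - \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) \Big] \le 2 \epsilon G.$$

Proof Denote by $z^{(i)}$ the sample set obtained from $z$ by replacing $z_i$ with an independent copy $z_i'$. By $\epsilon$-differential privacy,

$$\mathbb{E}_{z,A} \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) = \frac{1}{m} \sum_{i=1}^m \mathbb{E}_z \int_0^{+\infty} \Pr_A\{g_{z,A}(z_i) \ge t\}\, dt \le e^{\epsilon}\, \frac{1}{m} \sum_{i=1}^m \mathbb{E}_{z, z_i'} \int_0^{+\infty} \Pr_A\{g_{z^{(i)},A}(z_i) \ge t\}\, dt = e^{\epsilon}\, \mathbb{E}_{z,A} \int_Z g_{z,A}(z)\, d\rho.$$

Hence

$$\mathbb{E}_{z,A} \Big[ \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) - \int_Z g_{z,A}(z)\, d\rho \Big] \le (1 - e^{-\epsilon})\, \mathbb{E}_{z,A} \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) \le \epsilon G.$$

On the other hand,

$$\mathbb{E}_{z,A} \int_Z g_{z,A}(z)\, d\rho = \frac{1}{m} \sum_{i=1}^m \mathbb{E}_{z, z_i'} \int_0^{+\infty} \Pr_A\{g_{z^{(i)},A}(z_i) \ge t\}\, dt \le e^{\epsilon}\, \frac{1}{m} \sum_{i=1}^m \mathbb{E}_z \int_0^{+\infty} \Pr_A\{g_{z,A}(z_i) \ge t\}\, dt = e^{\epsilon}\, \mathbb{E}_{z,A} \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i).$$

This leads to

$$\mathbb{E}_{z,A} \Big[ \int_Z g_{z,A}(z)\, d\rho - \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) \Big] \le (e^{\epsilon} - 1)\, \mathbb{E}_{z,A} \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) \le 2 \epsilon G,$$

where the last step uses $\epsilon < 1$. These verify our results.

Remark 1 In [23] and [13], the authors restrict the function to take values in $[0, 1]$ or $\{0, 1\}$ for their special use; our result here extends it to functions taking values in $\mathbb{R}_+$. This makes the following error analysis feasible.

Since $y$ is bounded by $M > 0$ throughout our paper, it is reasonable to assume that $\mathcal{E}_z(0) = \frac{1}{m} \sum_{i=1}^m L(0, y_i) \le B_0$ for some $B_0 > 0$ depending only on $M$. Then we apply the concentration inequality to the random error $R_1$.

Proposition 2 Let $f_{z,A}$ be obtained from the algorithm $f_{z,A} = f_z + b$. Assume $\mathcal{E}_z(0) \le B_0$ for some constant $B_0 > 0$. We have

$$\mathbb{E}_{z,A} R_1 = \mathbb{E}_{z,A} \big( \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}) \big) \le 2 \epsilon \tilde B + 2 \epsilon\, \mathbb{E}_{z,A} R_2,$$

where $\tilde B = B_0 + \lambda \Omega(0)$.

Proof Let $g_{z,A}(z) = L(f_{z,A}(x), y)$, which is a positive function. Then

$$\mathbb{E}_{z,A} \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) = \mathbb{E}_{z,A} \frac{1}{m} \sum_{i=1}^m L(f_{z,A}(x_i), y_i) = \mathbb{E}_{z,A} R_2 + \mathbb{E}_{z,A} \mathcal{E}_z(f_z).$$

Since

$$\mathcal{E}_z(f_z) \le \mathcal{E}_z(f_z) + \lambda \Omega(f_z) \le \mathcal{E}_z(0) + \lambda \Omega(0) \le B_0 + \lambda \Omega(0),$$

we have

$$\mathbb{E}_{z,A} \frac{1}{m} \sum_{i=1}^m g_{z,A}(z_i) \le \mathbb{E}_{z,A} R_2 + B_0 + \lambda \Omega(0).$$

By applying the concentration inequality to the given $g_{z,A}$ we can prove the result with constant $\tilde B = B_0 + \lambda \Omega(0)$.

For the random error $R_2$, we have the following estimate.

Proposition 3 For the function $f_{z,A}$ obtained from the algorithm $f_{z,A} = f_z + b$ with $b$ chosen as in Proposition 1, we have

$$\mathbb{E}_{z,A} R_2 = \mathbb{E}_{z,A} \big( \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(f_z) \big) \le \frac{2 \kappa^2 C_L^2}{\lambda \gamma m \epsilon}.$$

Proof Note that

$$\big| L(f_{z,A}(x_i), y_i) - L(f_z(x_i), y_i) \big| \le C_L |f_{z,A}(x_i) - f_z(x_i)| = C_L |b|.$$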
Ez,AR = Ez,A m i= L fz,A(xi), yi – L fz(xi), yi This verifies our bound. 4.2 Error estimate for the other error terms For the sample error and approximation error, we choose fλ to be some function in HK close to fρ , which satisfies |L(fλ(x), y)| ≤ Bρ for some Bρ > . Explicit expressions of fλ and Bρ will be presented in the next section, with respect to different algorithms. To bound the sample error, we should recall the Hoeffding inequality []. Lemma  Let ξ be a random variable on a probability space Z satisfying |ξ (z) – Eξ | ≤ for some >  for almost all z ∈ Z. Denote σ  = σ (ξ ), then, for any t > , m i= Now we have the following proposition. Ez,AS ≤ Proof Since S = Ez,AS ≤ Ez|S| = +∞ +∞ Pzr |S| ≥ t dt dt ≤ and the proposition is proved. Let us turn to the approximation error D(λ). It is difficult to give the upper bound for the abstract approximation error. So we use the natural assumption on D(λ), which is for some  < β <  and cβ > . This assumption is trivial in concrete algorithms; see [– ], etc. 4.3 Total error bound Now we can deduce our total error by combining all the error bounds above. Theorem  Let fz,A defined as (), fρ defined as above. Assume Ez() ≤ B, |L(fλ(x), y)| ≤ Bρ , and () hold. By choosing = /√λm and λ = m–/(β+) we have B +  () + Proof By substituting the upper bounds above in the error decomposition (), we have Ez,A E (fz,A) – E (fρ ) ≤  B + λ () Here we present a general convergence result for the general differential private ERM learning algorithms. In this theorem, we provide a choice for the parameters and λ, under some conditions above, which leads to a learning rate m–β/(β+) with fixed B and γ . However, in an explicit algorithm B and γ may depend on λ and the learning rate will vary accordingly. We cannot go further without a specific description of the algorithms, which will be studied in the next section. 
5 Applications

In this section, we apply our results to several frequently used learning algorithms. First of all, let us take a look at the assumptions on $f_\rho$. Denote the integral operator $L_K$ as

$$L_K f(t) = \int_X f(x) K(x, t)\, d\rho_X(x).$$

It is well known [] that $\|L_K\| \le \kappa^2$. The condition $f_\rho \in L_K^r(L^2_{\rho_X})$ for some $r > 0$ is often used in the learning theory literature. When $r = 1/2$, it is the same as $f_\rho \in H_K$ [].

It is natural to consider $L(\pi(f(x)), y) \le L(f(x), y)$ for any function $f$ and $(x, y) \in Z$, which means that $\pi(f(x))$ is closer to $y$ than $f(x)$ in some sense, as $|y| \le M$. Here

$$\pi(f(x)) = \begin{cases} M, & f(x) > M, \\ f(x), & -M \le f(x) \le M, \\ -M, & f(x) < -M. \end{cases}$$

Then $\int_Z L(\pi(f_\rho(x)), y)\, d\rho \le \int_Z L(f_\rho(x), y)\, d\rho$, i.e., $|f_\rho(x)| \le M$ always holds. So without loss of generality, we also assume $\|f_\rho\|_\infty \le M$.

5.1 Differential private least squares regularization

Our first example is the differential private least squares regularization algorithm,

$$f^{ls}_{z,A} = f^{ls}_z + b_{ls}, \qquad f^{ls}_z = \arg\min_{f \in H_K} \Big\{ \frac{1}{m} \sum_{i=1}^m (f(x_i) - y_i)^2 + \lambda \|f\|_K^2 \Big\}.$$

Such an algorithm has been studied in our previous work [28]. Now we will try to apply the above analysis. First, we can verify that $\Omega(f) = \|f\|_K^2$ is 2-strongly convex, i.e., $\gamma = 2$ in our setting. Since

$$\mathcal{E}_z(f^{ls}_z) + \lambda \|f^{ls}_z\|_K^2 \le \mathcal{E}_z(0) + 0 \le M^2$$

with $|y| \le M$, we have $\|f^{ls}_z\|_K \le M/\sqrt{\lambda}$, which leads to $\|f^{ls}_z\|_\infty \le \kappa M/\sqrt{\lambda}$ for any $z \in Z^m$. Therefore, though the least squares loss is not Lipschitz continuous, for $\lambda \le 1$ it satisfies

$$\big| L(f^{ls}_S(x), y) - L(f^{ls}_{S'}(x), y) \big| = \big| (f^{ls}_S(x) - y)^2 - (f^{ls}_{S'}(x) - y)^2 \big| \le \big| f^{ls}_S(x) + f^{ls}_{S'}(x) - 2y \big| \cdot \big| f^{ls}_S(x) - f^{ls}_{S'}(x) \big| \le \frac{2M(\kappa + 1)}{\sqrt{\lambda}}\, \big| f^{ls}_S(x) - f^{ls}_{S'}(x) \big|$$

for any $S, S' \in Z^m$. So we set $C_L = 2M(\kappa + 1)/\sqrt{\lambda}$ in Proposition 1. Then $b_{ls}$ has the density function $\frac{1}{\alpha} \exp\{-2|b|/\alpha\}$ with $\alpha = \frac{4 M \kappa^2 (\kappa + 1)}{\lambda^{3/2} m \epsilon}$, which makes the algorithm provide $\epsilon$-differential privacy.

A generalization analysis for this algorithm can also be found in [28]. What we should mention here is that direct use of our error bound from the previous section leads to an unsatisfactory learning rate, since $C_L$ tends to $\infty$ as $m \to \infty$. However, note that for any $i = 1, 2, \dots, m$, […] then […]. When $f^{ls}_\rho \in L_K^r(L^2_{\rho_X})$, let $f_\lambda = (L_K + \lambda I)^{-1} L_K f_\rho$; we have $B_\rho = 4M^2$, and the approximation assumption $D(\lambda) \le c_\beta \lambda^\beta$ holds with $\beta = \min\{1, 2r\}$ by Theorem […] of []. Then by choosing $\epsilon$ and $\lambda$ as suitable powers of $m$ […], we can derive an error bound of the form […] for some $\tilde C$ independent of $m$, from the total error bound in the last section. We omit the detailed analysis here.

5.2 Differential private SVM

The second example is differential private SVM. We describe the SVM algorithm as in [], i.e., when $Y = \{-1, +1\}$,

$$f^h_{z,A} = f^h_z + b_h,$$

where the hinge loss $L_h(f(x), y) = (1 - y f(x))_+ = \max\{0, 1 - y f(x)\}$ is used in the ERM setting. Then the output classifier is $\operatorname{sgn}(f^h_{z,A})$.

First, we consider the differential privacy of this algorithm. Note that $|a_+ - b_+| \le |a - b|$ for any $a, b \in \mathbb{R}$. By this observation, we have

$$\big| L_h(f_1(x), y) - L_h(f_2(x), y) \big| \le |f_1(x) - f_2(x)|.$$

Then $C_L = 1$ and $\gamma = 2$ in Proposition 1. Therefore $b_h$ here has the density function $\frac{1}{\alpha} \exp\{-2|b|/\alpha\}$ with $\alpha = \frac{2 \kappa^2}{\lambda m \epsilon}$. In this case, we have, for any possible output set $O$,

$$\Pr\{f^h_{z,A} \in O\} \le e^{\epsilon} \Pr\{f^h_{\bar z,A} \in O\},$$

where $\bar z$ differs from $z$ in one element. Then, for any possible classifier $g$ defined on $X$,

$$\Pr\{\operatorname{sgn}(f^h_{z,A}) = g\} \le e^{\epsilon} \Pr\{\operatorname{sgn}(f^h_{\bar z,A}) = g\}.$$

This verifies the $\epsilon$-differential privacy of the algorithm.

Now let us turn to the error analysis. When the hinge loss is applied in the ERM setting, Theorem […] of [] gives the comparison theorem. That is, denote $R(f) = \Pr(y \ne f(x)) = \int_X \Pr(y \ne f(x) | x)\, d\rho_X$; then

$$R(f) - R(f_c) \le \mathcal{E}(f) - \mathcal{E}(f_c)$$

for any measurable function $f$. Here

$$f_c(x) = \begin{cases} 1, & \Pr(y = 1 | x) \ge \Pr(y = -1 | x), \\ -1, & \Pr(y = 1 | x) < \Pr(y = -1 | x). \end{cases}$$

Hence

$$\mathbb{E}_{z,A} R(f^h_{z,A}) - R(f_c) \le \mathbb{E}_{z,A} \mathcal{E}(f^h_{z,A}) - \mathcal{E}(f_c).$$

We still choose the stepping-stone function $f_\lambda = (L_K + \lambda I)^{-1} L_K f^h_\rho$, which leads to $\|f_\lambda\|_\infty \le M$ and $B_\rho = M + 1$. Reference [] shows that $D(\lambda) \le \lambda^{\min\{r, 1\}}$, so we can follow the choice of $\epsilon$ and $\lambda$ in Theorem 4 with $\beta = \min\{r, 1\}$ to get the learning rate

$$\mathbb{E}_{z,A} R(f^h_{z,A}) - R(f_c) \le \tilde C\, m^{-\beta/(2\beta+1)},$$

where $\tilde C$ is a constant independent of $m$.

6 Results and conclusions

In this paper, we present two results in the analysis of differential private convex risk minimization algorithms.
The first one is the perturbation result for general convex risk minimization algorithms. We studied two cases of the general algorithms. The second case is applied in the subsequent analysis, as it leads to a sharper upper bound on the error between two outputs that differ in one sample point. However, the first case is more relaxed, as it does not need Lipschitz continuity of the loss function. Based on these perturbation results we obtain a choice for the random terms of the differential private algorithms, i.e., Proposition 1. This gives us a theoretical and practical construction of differential private algorithms.

An error analysis is the second contribution of this paper. The analysis relies on the concentration inequality in the setting of differential privacy. After conducting a different error decomposition using the above concentration inequality, we provide an upper bound, or learning rate, for the expected generalization error. In this result we find a selection of the differential privacy parameter $\epsilon$ and the regularization parameter $\lambda$, both of which depend on the sample size $m$. Since a smaller $\epsilon$ always means more effective privacy protection, this indicates that an algorithm that generalizes well cannot be too strongly privacy protected.

In [], the authors proposed that the learning rate can be […] under a strong assumption on the loss function and with the regularization term […] $\|f\|_K^2$. However, the differential privacy parameter $\epsilon$ is fixed there. In this paper we obtain a learning rate […] with weak conditions on the loss function and $r \ge$ […] when choosing appropriate parameters $\epsilon$ and $\lambda$. As we pointed out above, $\epsilon$ should not be too small if we want convergent algorithms. In fact, for a fixed $\epsilon$, we can also deduce a learning rate of […] (with a slightly different form); see [] for a detailed analysis.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
All authors contributed equally to the writing of this paper.
All authors read and approved the final manuscript.

Acknowledgements
This work is supported by NSFC (Nos. 11326096, 11401247), NSF of Guangdong Province in China (No. 2015A030313674), Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (No. 2013LYM_0089), National Social Science Fund in China (No. 15BTJ024), Planning Fund Project of Humanities and Social Science Research in Chinese Ministry of Education (No. 14YJAZH040) and Doctor Grants of Huizhou University (No. C511.0206). The authors would like to thank the associate editors and anonymous referees for their valuable comments and suggestions, which have helped to improve the paper.

References
1. Vapnik, V: Statistical Learning Theory. Wiley, New York (1998)
2. Dwork, C: Differential privacy. In: ICALP, pp. 1-12. Springer, Berlin (2006)
3. Dwork, C, Rothblum, GN: Concentrated differential privacy. arXiv:1603.01887
4. Dwork, C, McSherry, F, Nissim, K, Smith, A: Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography, pp. 265-284. Springer, Berlin (2006)
5. McSherry, F, Talwar, K: Mechanism design via differential privacy. In: Proceedings of the 48th Annual Symposium on Foundations of Computer Science, pp. 94-103 (2007)
6. Dwork, C: Differential privacy: a survey of results. In: Theory and Applications of Models of Computation. Lecture Notes in Computer Science, vol. 4978, pp. 1-19 (2008)
7. Ji, ZL, Lipton, ZC, Elkan, C: Differential privacy and machine learning: a survey and review (2014). arXiv:1412.7584
8. Chaudhuri, K, Monteleoni, C, Sarwate, AD: Differentially private empirical risk minimization. J. Mach. Learn. Res. 12, 1069-1109 (2011)
9. Kifer, D, Smith, A, Thakurta, A: Private convex empirical risk minimization and high-dimensional regression. In: Conference on Learning Theory, pp. 25.1-25.40
10. Jain, P, Thakurta, AG: Differentially private learning with kernels. In: ICML (2013)
11. Jain, P, Thakurta, AG: (Near) dimension independent risk bounds for differentially private learning. In: ICML (2014)
12. Bassily, R, Smith, A, Thakurta, A: Differential private empirical risk minimization: efficient algorithms and tight error bounds. In: FOCS. IEEE (2014)
13. Bassily, R, Nissim, K, Smith, A, Steinke, T, Stemmer, U, Ullman, J: Algorithmic stability for adaptive data analysis (2015). arXiv:1511.02513
14. Steinwart, I, Christmann, A: Estimating conditional quantiles with the help of the pinball loss. Bernoulli 17(1), 211-225 (2008)
15. Bousquet, O, Elisseeff, A: Stability and generalization. J. Mach. Learn. Res. 2, 499-526 (2002)
16. Shalev-Shwartz, S, Shamir, O, Srebro, N, Sridharan, K: Learnability, stability and uniform convergence. J. Mach. Learn. Res. 11, 2635-2670 (2010)
17. Wang, Y-X, Lei, J, Fienberg, SE: Learning with differential privacy: stability, learnability and the sufficiency and necessity of ERM principle. arXiv:1502.06309
18. Cucker, F, Smale, S: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1-49 (2002)
19. Cucker, F, Zhou, DX: Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge (2007)
20. Sridharan, K, Srebro, N, Shalev-Shwartz, S: Fast rates for regularized objectives. In: Advances in Neural Information Processing Systems 22, pp. 1545-1552 (2008)
21. Wu, Q, Zhou, DX: Learning with sample dependent hypothesis space. Comput. Math. Appl. 56, 2896-2907 (2008)
22. Rubinstein, BIP, Bartlett, PL, Huang, L, Taft, N: Learning in a large function space: privacy-preserving mechanisms for SVM learning. J. Priv. Confid. 4(1), 65-100 (2012)
23. Dwork, C, Feldman, V, Hardt, M, Pitassi, T, Reingold, O, Roth, A: Preserving statistical validity in adaptive data analysis. In: ACM Symposium on the Theory of Computing (STOC). ACM (2015)
24. Hoeffding, W: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13-30 (1963)
25. Wang, C, Zhou, DX: Optimal learning rates for least squares regularized regression with unbounded sampling. J. Complex. 27, 55-67 (2011)
26. Shi, L: Learning theory estimates for coefficient-based regularized regression. Appl. Comput. Harmon. Anal. 34, 252-265 (2013)
27. Xiang, DH: Conditional quantiles with varying Gaussians. Adv. Comput. Math. 38, 723-735 (2013)
28. Nie, WL, Wang, C: Error analysis and variable selection for differential private learning algorithm. Preprint (2016)
29. Smale, S, Zhou, DX: Learning theory estimates via integral operators and their applications. Constr. Approx. 26, 153-172 (2007)
30. Chen, DR, Wu, Q, Ying, Y, Zhou, DX: Support vector machine soft margin classifiers: error analysis. J. Mach. Learn. Res. 5, 1143-1175 (2004)
31. Xiang, DH, Hu, T, Zhou, DX: Approximation analysis of learning algorithms for support vector regression and quantile regression. J. Appl. Math. 2012, Article ID 902139 (2012). doi:10.1155/2012/902139


Weilin Nie, Cheng Wang. Perturbation of convex risk minimization and its application in differential private learning algorithms, Journal of Inequalities and Applications, 2017, 9, DOI: 10.1186/s13660-016-1280-0