[Paper Notes] Stochastic gradient descent with differentially private updates


Recording a few questions

  • The sample size required for a target utility level increases with the privacy constraint.
  • Optimization methods for large data sets must also be scalable.
  • SGD algorithms satisfy asymptotic guarantees.


Introduction

  • Summary of the main contribution:
    In this paper we derive differentially private versions of single-point SGD and mini-batch SGD, and evaluate them on real and synthetic data sets.

  • Why SGD is so widely used:
    Stochastic gradient descent (SGD) algorithms are simple and satisfy the same asymptotic guarantees as more computationally intensive learning methods.

  • What relying on asymptotic guarantees implies in practice:
    To obtain reasonable performance on finite data sets, practitioners must take care in setting parameters such as the learning rate (step size) for the updates.

  • How this sensitivity is addressed:
    Grouping updates into "mini-batches" alleviates some of this sensitivity and improves the performance of SGD. This can improve the robustness of the updating at a moderate expense in terms of computation, but it also introduces the batch size as a free parameter.


Preliminaries

  • Optimization objective:
    Solve a regularized convex optimization problem:
    $$w^* = \mathop{\mathrm{argmin}}_{w \in \mathbb{R}^d} \; \frac{\lambda}{2} \Vert w \Vert^2 + \frac{1}{n} \sum_{i=1}^n \ell(w, x_i, y_i)$$
    where $w$ is the normal vector to the hyperplane separator and $\ell$ is a convex loss function.
    If $\ell$ is the logistic loss, i.e. $\ell(w,x,y) = \log(1 + e^{-y w^T x})$, the problem is logistic regression.
    If $\ell$ is the hinge loss, i.e. $\ell(w,x,y) = \max(0, 1 - y w^T x)$, the problem is the SVM.

  • Optimization algorithm:
    SGD with mini-batch updates (a short sketch follows this list):
    $$w_{t+1} = w_t - \eta_t \Big( \lambda w_t + \frac{1}{b} \sum_{(x_i, y_i) \in B_t} \nabla \ell(w_t, x_i, y_i) \Big)$$
    where $\eta_t$ is a learning rate and the update at each step $t$ is based on a small subset $B_t$ of examples of size $b$.
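To make the objective and the update concrete, here is a minimal NumPy sketch of regularized mini-batch SGD with the logistic loss. The function names (`logistic_grad`, `minibatch_sgd`), the fixed step size, and the way batches are sampled are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def logistic_grad(w, x, y):
    # Gradient of the logistic loss log(1 + exp(-y w^T x)) with respect to w.
    # If ||x|| <= 1, then ||grad|| <= 1, the condition used later for privacy.
    return -y * x / (1.0 + np.exp(y * np.dot(w, x)))

def minibatch_sgd(X, Y, lam, eta, b, T, rng):
    """Regularized mini-batch SGD: w_{t+1} = w_t - eta (lam * w_t + (1/b) sum of gradients)."""
    n, d = X.shape
    w = np.zeros(d)  # starting point chosen independently of the data
    for t in range(T):
        batch = rng.choice(n, size=b, replace=False)  # mini-batch B_t of size b
        grad = np.mean([logistic_grad(w, X[i], Y[i]) for i in batch], axis=0)
        w = w - eta * (lam * w + grad)
    return w

# Toy usage: features rescaled so that ||x_i|| <= 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)
Y = np.sign(X @ rng.normal(size=5))
w_hat = minibatch_sgd(X, Y, lam=1e-3, eta=0.1, b=50, T=200, rng=rng)
```

Rescaling each feature vector to norm at most 1 keeps the logistic-loss gradient norm at most 1, which is exactly the condition required by the privacy theorem in the next section.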



SGD with Differential Privacy

  • Mini-batch SGD satisfying differential privacy:
    A differentially private version of the mini-batch update (see the sketch after this list):
    $$w_{t+1} = w_t - \eta_t \Big( \lambda w_t + \frac{1}{b} \sum_{(x_i, y_i) \in B_t} \nabla \ell(w_t, x_i, y_i) + \frac{1}{b} Z_t \Big)$$
    where $Z_t$ is a random noise vector in $\mathbb{R}^d$ drawn independently from the density $\rho(z) \propto e^{-(\alpha/2) \Vert z \Vert}$.

  • Conditions under which these mini-batch updates are $\alpha$-differentially private:
    Theorem. If the initialization point $w_0$ is chosen independently of the sensitive data, the batches $B_t$ are disjoint, and $\Vert \nabla \ell(w, x, y) \Vert \leq 1$ for all $w$ and all $(x_i, y_i)$, then SGD with mini-batch updates is $\alpha$-differentially private.
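Below is a minimal sketch of the noisy update. It assumes the standard recipe for sampling from $\rho(z) \propto e^{-(\alpha/2)\Vert z \Vert}$ in $\mathbb{R}^d$: a direction drawn uniformly from the unit sphere and a norm drawn from a Gamma$(d, 2/\alpha)$ distribution. The function names and the single pass over disjoint batches are illustrative choices, not the paper's reference implementation.

```python
import numpy as np

def logistic_grad(w, x, y):
    # Same helper as in the earlier sketch; ||grad|| <= 1 whenever ||x|| <= 1.
    return -y * x / (1.0 + np.exp(y * np.dot(w, x)))

def sample_noise(d, alpha, rng):
    # Sample Z ~ rho(z) proportional to exp(-(alpha/2) ||z||) in R^d:
    # uniform direction on the unit sphere, norm ~ Gamma(shape=d, scale=2/alpha).
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    return rng.gamma(shape=d, scale=2.0 / alpha) * direction

def dp_minibatch_sgd(X, Y, lam, eta, b, alpha, rng):
    """One pass of alpha-DP mini-batch SGD with disjoint batches, as the theorem requires."""
    n, d = X.shape
    w = np.zeros(d)                 # initialization independent of the sensitive data
    order = rng.permutation(n)      # disjoint batches: each example appears at most once
    for start in range(0, n - b + 1, b):
        batch = order[start:start + b]
        grad = np.mean([logistic_grad(w, X[i], Y[i]) for i in batch], axis=0)
        Z = sample_noise(d, alpha, rng)
        w = w - eta * (lam * w + grad + Z / b)   # noise scaled by 1/b
    return w
```

Because the noise enters the update as $Z_t / b$, a larger batch size $b$ directly shrinks the injected noise per update, which is the mechanism behind the experimental observations below.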



Experiments

  • Experimental observation:
    With a batch size of 1, DP-SGD has noticeably higher variance than ordinary SGD; increasing the batch size reduces this variance considerably.

  • Takeaway from this observation:
    In terms of objective value, guaranteeing differential privacy can come for "free" using SGD with a moderate batch size.

  • In practice, the effect of batch size is non-monotonic (helpful at first, then harmful):
    Increasing the batch size improves the performance of private SGD, but only up to a point; much larger batch sizes actually degrade performance.



A few additional takeaways

  • The data dimension $d$ and the privacy parameter affect how much data is needed:
    Differentially private learning algorithms often have a sample complexity that scales linearly with the data dimension $d$ and inversely with the privacy risk $\alpha$. Thus a moderate reduction in $\alpha$ or increase in $d$ may require more data.


Ref

S. Song, K. Chaudhuri, and A. Sarwate. Stochastic gradient descent with differentially private updates. In IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2013.
