二类分类问题的泛化误差上界的证明

阅读: 评论:0

二类分类问题的泛化误差上界的证明

二类分类问题的泛化误差上界的证明

文章目录

  • Markov 不等式
  • Hoeffding 不等式
    • Hoeffding 引理
  • 二类分类问题的泛化误差上界
  • 参考文献


Markov 不等式


对于非负随机变量 X X X 及 a > 0 a>0 a>0,有:
P ( X ≥ a ) ≤ E [ X ] a P(X geq a) leq frac{E[X]}{a} P(X≥a)≤aE[X]​

Proof:
E [ X ] = ∫ 0 + ∞ x f ( x ) d x ≥ ∫ a + ∞ a f ( x ) d x = a ⋅ P ( x ≥ a ) E[X] = int_{0}^{ +infty } xf(x) , dx geq int_{a}^{ +infty } af(x) , dx = a cdot P(x geq a) E[X]=∫0+∞​xf(x)dx≥∫a+∞​af(x)dx=a⋅P(x≥a)
对于离散情况,把积分变为求和即可。


Hoeffding 不等式


考虑独立随机变量 X i ∈ [ a i , b i ] ( i = 1 , 2 , . . . , n ) X_{i} in [a_{i}, b_{i}] , (i=1,2,...,n) Xi​∈[ai​,bi​](i=1,2,...,n) 的和 S n = ∑ i = 1 n X i S_{n} = sum_{i=1}^n X_{i} Sn​=∑i=1n​Xi​,则对任意 s > 0 s > 0 s>0 有:
P ( ∣ S n − E [ S n ] ∣ ≥ t ) ≤ e − 2 t 2 ∑ i = 1 n ( b i − a i ) 2 P(|S_{n} - E[S_{n}]| geq t) leq e^{ frac {-2t^2} {sum_{i=1}^{n} (b_{i} - a_{i})^2} } P(∣Sn​−E[Sn​]∣≥t)≤e∑i=1n​(bi​−ai​)2−2t2​

Proof:

Hoeffding 引理


考虑独立随机变量 X i ∈ [ a , b ] ( i = 1 , 2 , . . . , n ) X_{i} in [a, b] , (i=1,2,...,n) Xi​∈[a,b](i=1,2,...,n), E [ X ] = 0 E[X] = 0 E[X]=0,则对任意 s > 0 s > 0 s>0 有:
E [ e s X ] ≤ e 1 8 s 2 ( b − a ) 2 E[e^{sX}] leq e^{frac{1}{8} s^2 (b-a)^2} E[esX]≤e81​s2(b−a)2

Proof:

注意到 e s X e^{sX} esX 是关于 X X X 的一个凸函数,则有:
e s X ≤ b − X b − a e s a + X − a b − a e s b e^{sX} leq frac{b-X}{b-a} e^{sa} + frac{X-a}{b-a} e^{sb} esX≤b−ab−X​esa+b−aX−a​esb
对上式两边取期望,结合条件 E [ X ] = 0 E[X] = 0 E[X]=0 有:
E [ e s X ] ≤ b − E [ X ] b − a e s a + E [ X ] − a b − a e s b = b b − a e s a − a b − a e s b E[e^{sX}] leq frac{b-E[X]}{b-a} e^{sa} + frac{E[X]-a}{b-a} e^{sb} = frac{b}{b-a} e^{sa} - frac{a}{b-a} e^{sb} E[esX]≤b−ab−E[X]​esa+b−aE[X]−a​esb=b−ab​esa−b−aa​esb
令 θ = − a b − a theta = - frac{a}{b-a} θ=−b−aa​, u = s ( b − a ) u=s(b-a) u=s(b−a),由于 E [ X ] = 0 E[X] = 0 E[X]=0 ,故 a ≤ 0 ≤ b a leq 0 leq b a≤0≤b, θ ≥ 0 theta geq 0 θ≥0, u ≥ 0 u geq 0 u≥0,得:
b b − a e s a − a b − a e s b frac{b}{b-a} e^{sa} - frac{a}{b-a} e^{sb} b−ab​esa−b−aa​esb
= − a b − a e s a ( e s ( b − a ) − b a ) = − a b − a e − s ( b − a ) − a b − a ( e s ( b − a ) − b − a a − 1 ) = -frac{a}{b-a} e^{sa} (e^{s(b-a)} - frac{b}{a}) = -frac{a}{b-a} e^{-s(b-a) frac{-a}{b-a}} (e^{s(b-a)} - frac{b-a}{a} - 1) =−b−aa​esa(es(b−a)−ab​)=−b−aa​e−s(b−a)b−a−a​(es(b−a)−ab−a​−1)
= θ e − θ u ( e u + 1 θ − 1 ) = e − θ u ( θ e u + 1 − θ ) = theta e^{- theta u} (e^u + frac{1}{theta} - 1) = e^{- theta u} (theta e^u + 1 - theta) =θe−θu(eu+θ1​−1)=e−θu(θeu+1−θ)
令 ψ ( u ) = e − θ u ( θ e u + 1 − θ ) psi (u) = e^{- theta u} (theta e^u +1 - theta) ψ(u)=e−θu(θeu+1−θ),由于 u ≥ 0 u geq 0 u≥0,故 ψ ( u ) > 0 psi (u) >0 ψ(u)>0。
令 ϕ ( u ) = ln ⁡ [ ψ ( u ) ] = − θ u + ln ⁡ ( θ e u + 1 − θ ) phi (u) = ln [psi (u)] = - theta u + ln (theta e^u + 1 - theta) ϕ(u)=ln[ψ(u)]=−θu+ln(θeu+1−θ),
根据泰勒公式, ∃ v ∈ [ 0 , u ] exists v in [0,u] ∃v∈[0,u],使得:
ϕ ( u ) = ϕ ( 0 ) + u ⋅ ϕ ′ ( 0 ) + u 2 2 ! ⋅ ϕ ′ ′ ( v ) ( 1 ) phi (u) = phi (0) + u cdot phi ' (0) + frac{u^2}{2!} cdot phi '' (v) quad (1) ϕ(u)=ϕ(0)+u⋅ϕ′(0)+2!u2​⋅ϕ′′(v)(1)
其中:
ϕ ′ ( u ) = − θ + θ e u θ e u + 1 − θ phi ' (u) = - theta + frac {theta e^u} {theta e^u + 1 - theta} ϕ′(u)=−θ+θeu+1−θθeu​
ϕ ′ ′ ( u ) = θ e u ( 1 − θ ) ( θ e u + 1 − θ ) 2 phi '' (u) = frac {theta e^u (1 - theta)} {(theta e^u + 1 - theta)^2} ϕ′′(u)=(θeu+1−θ)2θeu(1−θ)​
可得:
ϕ ( 0 ) = 0 phi (0) = 0 ϕ(0)=0
ϕ ′ ( 0 ) = 0 phi ' (0) = 0 ϕ′(0)=0
ϕ ′ ′ ( u ) = 1 2 + ( θ 1 − θ e u + 1 − θ θ e − u ) ≤ 1 4 phi '' (u) = frac {1} {2 + (frac{theta}{1 - theta} e^u + frac{1 - theta}{theta} e^{-u})} leq frac{1}{4} ϕ′′(u)=2+(1−θθ​eu+θ1−θ​e−u)1​≤41​
由 (1) 式得:
ϕ ( u ) ≤ 1 8 u 2 = 1 8 s 2 ( b − a ) 2 phi (u) leq frac{1}{8} u^2 = frac{1}{8} s^2 (b-a)^2 ϕ(u)≤81​u2=81​s2(b−a)2
故:
E [ e s X ] ≤ b b − a e s a − a b − a e s b = ψ ( u ) = e ϕ ( u ) ≤ e 1 8 s 2 ( b − a ) 2 E[e^{sX}] leq frac{b}{b-a} e^{sa} - frac{a}{b-a} e^{sb} = psi (u) = e^{phi (u)} leq e^{frac{1}{8} s^2 (b-a)^2} E[esX]≤b−ab​esa−b−aa​esb=ψ(u)=eϕ(u)≤e81​s2(b−a)2
Hoeffding 引理得证。

下面证明 Hoeffding 不等式。

首先证明不等式 P ( S n − E [ S n ] ≥ t ) ≤ e − 2 t 2 ∑ i = 1 n ( b i − a i ) 2 P(S_{n} - E[S_{n}] geq t) leq e^{frac {-2t^2} {sum_{i=1}^{n} (b_{i} - a_{i})^2}} P(Sn​−E[Sn​]≥t)≤e∑i=1n​(bi​−ai​)2−2t2​。
对于任意 s > 0 s>0 s>0,有:
P ( S n − E [ S n ] ≥ t ) = P ( e s ( S n − E [ S n ] ) ≥ e s t ) P(S_{n} - E[S_{n}] geq t) = P(e^{s(S_{n} - E[S_{n}])} geq e^{st}) P(Sn​−E[Sn​]≥t)=P(es(Sn​−E[Sn​])≥est)
由于随机变量 e s ( S n − E [ S n ] ) > 0 e^{s(S_{n} - E[S_{n}])} > 0 es(Sn​−E[Sn​])>0,则由 Markov 不等式得:
P ( e s ( S n − E [ S n ] ) ≥ e s t ) ≤ E [ e s ( S n − E [ S n ] ) ] e s t = e − s t ⋅ ∏ i = 1 n E [ e s ( X i − E [ X i ] ) ] P(e^{s(S_{n} - E[S_{n}])} geq e^{st}) leq frac {E[e^{s(S_{n} - E[S_{n}])}]} {e^{st}} = e^{-st} cdot prod_{i=1}^{n} E[e^{s(X_{i} - E[X_{i}])}] P(es(Sn​−E[Sn​])≥est)≤estE[es(Sn​−E[Sn​])]​=e−st⋅i=1∏n​E[es(Xi​−E[Xi​])]
对于随机变量 X i − E [ X i ] X_{i} - E[X_{i}] Xi​−E[Xi​],有 E [ X i − E [ X i ] ] = 0 E[X_{i} - E[X_{i}]] = 0 E[Xi​−E[Xi​]]=0,且 X i − E [ X i ] ∈ [ a i − E [ X i ] , b i − E [ X i ] ] X_{i} - E[X_{i}] in [a_{i} - E[X_{i}], b_{i} - E[X_{i}]] Xi​−E[Xi​]∈[ai​−E[Xi​],bi​−E[Xi​]]。
则由 Hoeffding 引理得:
e − s t ⋅ ∏ i = 1 n E [ e s ( X i − E [ X i ] ) ] ≤ e − s t ⋅ ∏ i = 1 n e s 2 ( b i − a i ) 2 8 = e − s t + 1 8 s 2 ∑ i = 1 n ( b i − a i ) 2 e^{-st} cdot prod_{i=1}^{n} E[e^{s(X_{i} - E[X_{i}])}] leq e^{-st} cdot prod_{i=1}^{n} e^{frac {s^2 (b_{i} - a_{i})^2} {8}} =e^{-st + frac{1}{8} s^2 sum_{i=1}^{n} (b_{i} - a_{i})^2} e−st⋅i=1∏n​E[es(Xi​−E[Xi​])]≤e−st⋅i=1∏n​e8s2(bi​−ai​)2​=e−st+81​s2∑i=1n​(bi​−ai​)2
令 g ( s ) = − s t + 1 8 s 2 ∑ i = 1 n ( b i − a i ) 2 g(s) = -st + frac{1}{8} s^2 sum_{i=1}^{n} (b_{i} - a_{i})^2 g(s)=−st+81​s2∑i=1n​(bi​−ai​)2,则:
g ′ ( s ) = − t + 1 4 s ∑ i = 1 n ( b i − a i ) 2 g'(s) = -t + frac{1}{4} s sum_{i=1}^{n} (b_{i} - a_{i})^2 g′(s)=−t+41​si=1∑n​(bi​−ai​)2
求解 g ′ ( s ) = 0 g'(s) = 0 g′(s)=0,得:
s = 4 t ∑ i = 1 n ( b i − a i ) 2 s = frac {4t} {sum_{i=1}^{n} (b_{i} - a_{i})^2} s=∑i=1n​(bi​−ai​)24t​
故:
P ( S n − E [ S n ] ≥ t ) ≤ e − 2 t 2 ∑ i = 1 n ( b i − a i ) 2 P(S_{n} - E[S_{n}] geq t) leq e^{frac {-2t^2} {sum_{i=1}^{n} (b_{i} - a_{i})^2}} P(Sn​−E[Sn​]≥t)≤e∑i=1n​(bi​−ai​)2−2t2​
不等式 P ( E [ S n ] − S n ≥ t ) ≤ e − 2 t 2 ∑ i = 1 n ( b i − a i ) 2 P(E[S_{n}] -S_{n} geq t) leq e^{frac {-2t^2} {sum_{i=1}^{n} (b_{i} - a_{i})^2}} P(E[Sn​]−Sn​≥t)≤e∑i=1n​(bi​−ai​)2−2t2​ 的证明同理。

Hoeffding 不等式得证。


二类分类问题的泛化误差上界


定理:

对二类分类问题,当假设空间是有限个函数的集合 F = { f 1 , f 1 , . . . , f d } F = {f_{1}, f_{1}, ..., f_{d}} F={f1​,f1​,...,fd​} 时,对任意一个函数 f ∈ F f in F f∈F,至少以概率 1 − δ 1 - delta 1−δ,以下不等式成立:
R ( f ) ≤ R ^ ( f ) + ε ( d , N , δ ) ( 2 ) R(f) leq hat R(f) + varepsilon (d, N, delta) quad (2) R(f)≤R^(f)+ε(d,N,δ)(2)
其中,不等式左侧 R ( f ) R(f) R(f) 为泛化误差:
R ( f ) = E [ L ( Y , f ( X ) ) ] R(f) = E[L(Y, f(X))] R(f)=E[L(Y,f(X))]
不等式右侧即为泛化误差的上界:
R ^ ( f ) = 1 N ∑ i = 1 N L ( y i , f ( x i ) ) hat R(f) = frac{1}{N} sum_{i=1}^{N} L(y_{i}, f(x_{i})) R^(f)=N1​i=1∑N​L(yi​,f(xi​))
ε ( d , N , δ ) = 1 2 N ( ln ⁡ d + ln ⁡ 1 δ ) ( 3 ) varepsilon (d, N, delta) = sqrt {frac{1}{2N} (ln {d} + ln {frac{1}{delta}})} quad (3) ε(d,N,δ)=2N1​(lnd+lnδ1​) ​(3)
在泛化误差上界中,第1项是训练误差;
第2项 ε ( d , N , δ ) varepsilon (d, N, delta) ε(d,N,δ) 是 N N N 的单调递减函数,当 N → + ∞ N to +infty N→+∞ 时趋于 0 0 0;同时它也是 ln ⁡ d sqrt {ln d} lnd ​ 阶的函数,假设空间 F F F 包含的函数越多,其值越大。

Proof:

对任意函数 f ∈ F f in F f∈F, R ^ ( f ) hat R(f) R^(f) 是 N N N 个独立的随机变量 L ( Y , f ( X ) ) 的 样 本 均 值 , L(Y, f(X)) 的样本均值, L(Y,f(X))的样本均值,R(f)$ 是随机变量 L ( Y , f ( X ) ) L(Y, f(X)) L(Y,f(X)) 的期望值,则:
P ( R ( f ) − R ^ ( f ) ≥ ε ) = P ( N ⋅ R ( f ) − N ⋅ R ^ ( f ) ≥ N ⋅ ε ) P(R(f) - hat R(f) geq varepsilon) = P(N cdot R(f) - N cdot hat R(f) geq N cdot varepsilon) P(R(f)−R^(f)≥ε)=P(N⋅R(f)−N⋅R^(f)≥N⋅ε)
= P ( N ⋅ E [ L ( Y , f ( X ) ) ] − ∑ i = 1 N L ( y i − f ( x i ) ) ≥ N ⋅ ε ) = P(N cdot E[L(Y, f(X))] - sum_{i=1}^{N} L(y_{i} - f(x_{i})) geq N cdot varepsilon) =P(N⋅E[L(Y,f(X))]−i=1∑N​L(yi​−f(xi​))≥N⋅ε)
令 S n = ∑ i = 1 N L ( y i , f ( x i ) ) S_{n} = sum_{i=1}^{N} L(y_{i}, f(x_{i})) Sn​=∑i=1N​L(yi​,f(xi​)),则:
E [ S n ] = E [ ∑ i = 1 N L ( y i , f ( x i ) ) ] = N ⋅ E [ L ( Y , f ( X ) ] E[S_{n}] = E[sum_{i=1}^{N} L(y_{i}, f(x_{i}))] = N cdot E[L(Y, f(X)] E[Sn​]=E[i=1∑N​L(yi​,f(xi​))]=N⋅E[L(Y,f(X)]
如果损失函数取值于区间 [ 0 , 1 ] [0, 1] [0,1],即对所有 i i i, [ a i , b i ] = [ 0 , 1 ] [a_{i}, b_{i}] = [0, 1] [ai​,bi​]=[0,1],则由 Hoeffding 不等式可知,对任意 ε > 0 varepsilon > 0 ε>0,以下不等式成立:
P ( N ⋅ E [ L ( Y , f ( X ) ) ] − ∑ i = 1 N L ( y i − f ( x i ) ) ≥ N ⋅ ε ) P(N cdot E[L(Y, f(X))] - sum_{i=1}^{N} L(y_{i} - f(x_{i})) geq N cdot varepsilon) P(N⋅E[L(Y,f(X))]−i=1∑N​L(yi​−f(xi​))≥N⋅ε)
= P ( E [ S n ] − S n ≥ N ⋅ ε ) ≤ e − 2 ( N ε 2 ) N ( 1 − 0 ) 2 = e − 2 N ε 2 = P(E[S_{n}] - S_{n} geq N cdot varepsilon) leq e^{frac {-2(N varepsilon ^ 2)} {N(1-0)^2}} = e^{-2N varepsilon ^ 2} =P(E[Sn​]−Sn​≥N⋅ε)≤eN(1−0)2−2(Nε2)​=e−2Nε2
由于 F = { f 1 , f 1 , . . . , f d } F = {f_{1}, f_{1}, ..., f_{d}} F={f1​,f1​,...,fd​} 是一个有限集合,故:
P ( ∃ f ∈ F : R ( f ) − R ^ ( f ) ≥ ε ) = P ( ⋃ f ∈ F { R ( f ) − R ^ ( f ) ≥ ε } ) P(exists f in F: , R(f) - hat R(f) geq varepsilon) = P(bigcup_{f in F} { R(f) - hat R(f) geq varepsilon }) P(∃f∈F:R(f)−R^(f)≥ε)=P(f∈F⋃​{R(f)−R^(f)≥ε})
≤ ∑ f ∈ F P ( R ( f ) − R ^ ( f ) ≥ ε ) ≤ d ⋅ e − 2 N ε 2 leq sum_{f in F} P(R(f) - hat R(f) geq varepsilon) leq d cdot e^{-2N varepsilon ^ 2} ≤f∈F∑​P(R(f)−R^(f)≥ε)≤d⋅e−2Nε2
或者,等价的, ∀ f ∈ F forall f in F ∀f∈F,有:
P ( R ( f ) − R ^ ( f ) < ε ) ≥ 1 − d ⋅ e − 2 N ε 2 P(R(f) - hat R(f) < varepsilon) geq 1 - d cdot e^{-2N varepsilon ^ 2} P(R(f)−R^(f)<ε)≥1−d⋅e−2Nε2
令:
δ = d ⋅ e − 2 N ε 2 ( 4 ) delta = d cdot e^{-2N varepsilon ^ 2} quad (4) δ=d⋅e−2Nε2(4)
可得:
P ( R ( f ) < R ^ ( f ) + ε ) ≥ 1 − δ P(R(f) < hat R(f) + varepsilon) geq 1 - delta P(R(f)<R^(f)+ε)≥1−δ
即:至少以概率 1 − δ 1 - delta 1−δ,有 R ( f ) < R ^ ( f ) + ε R(f) < hat R(f) + varepsilon R(f)<R^(f)+ε,其中 ε varepsilon ε 由式 (4) 得到,即为式 (3)。
故不等式 (2) 得证,即定理得证。


参考文献


李航【统计学习方法】第一版,1.6.2

本文发布于:2024-01-31 03:57:57,感谢您对本站的认可!

本文链接:https://www.4u4v.net/it/170664467825255.html

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。

标签:上界   误差   二类
留言与评论(共有 0 条评论)
   
验证码:

Copyright ©2019-2022 Comsenz Inc.Powered by ©

网站地图1 网站地图2 网站地图3 网站地图4 网站地图5 网站地图6 网站地图7 网站地图8 网站地图9 网站地图10 网站地图11 网站地图12 网站地图13 网站地图14 网站地图15 网站地图16 网站地图17 网站地图18 网站地图19 网站地图20 网站地图21 网站地图22/a> 网站地图23