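This chapter summarizes several conceptual topics:
1. Target shuffle: after training a model, you can check how well it predicts as follows. First use the model to predict the labels of a number of samples (label). Then shuffle those predicted labels to obtain label*. Compare both label and label* with the samples' true labels, label_y. If label matches label_y for most samples, while label* matches label_y only about as often as chance would produce, the model is predicting the samples effectively. Repeating the shuffle many times gives a distribution of chance-level agreement against which the model's real agreement can be judged. Target shuffling is a useful way to convince a client to trust the model.
Reference article: target shuffle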
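The comparison described above can be sketched in R. This is a minimal illustration, not a definitive implementation: the data are simulated, and the names `label_y`, `label`, `n_shuffles`, and the 85% agreement rate are assumptions made only for this example.

```r
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical data: true labels, and predictions that agree with them ~85% of the time
n <- 200
label_y <- sample(c(0, 1), n, replace = TRUE)
flip <- runif(n) < 0.15
label <- ifelse(flip, 1 - label_y, label_y)

# Agreement between the model's predictions and the true labels
observed_acc <- mean(label == label_y)

# Agreement after shuffling the predictions (label*), repeated many times
n_shuffles <- 1000
shuffled_acc <- replicate(n_shuffles, mean(sample(label) == label_y))

# Share of shuffles that match the truth at least as well as the real predictions
share_as_good <- mean(shuffled_acc >= observed_acc)

cat("observed accuracy:", observed_acc, "\n")
cat("mean shuffled accuracy:", mean(shuffled_acc), "\n")
cat("share of shuffles at least as good:", share_as_good, "\n")
```

If almost no shuffle reaches the observed accuracy, chance alone is an unlikely explanation for the model's agreement with the true labels.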
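2. Confidence interval and confidence level
A confidence interval is always paired with a confidence level. In the figure referenced here, a 90% confidence level corresponds to the confidence interval [53610, 62279].
These can be read as follows: the interval comes from a procedure that, over repeated sampling, captures the true population value about 90% of the time. It is in this sense that we are 90% confident that the true value lies in [53610, 62279].
In general, for the same sample, the lower the confidence level, the narrower the confidence interval.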
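To make the link between confidence level and interval width concrete, here is a small R sketch. The sample is simulated and is not the data behind the interval quoted above; the helper `mean_ci` is a hypothetical name, and a t-based interval for the mean is assumed, which may differ from the method used for the figure.

```r
set.seed(1)

# Hypothetical sample of 50 values (made-up data)
x <- rnorm(50, mean = 58000, sd = 15000)

# t-based confidence interval for the mean at a given confidence level
mean_ci <- function(x, level) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                          # standard error of the mean
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)  # two-sided critical value
  c(lower = mean(x) - t_crit * se, upper = mean(x) + t_crit * se)
}

ci_90 <- mean_ci(x, 0.90)
ci_95 <- mean_ci(x, 0.95)

width_90 <- unname(ci_90["upper"] - ci_90["lower"])
width_95 <- unname(ci_95["upper"] - ci_95["lower"])

print(ci_90)
print(ci_95)
cat("90% width:", width_90, " 95% width:", width_95, "\n")
```

The 90% interval comes out narrower than the 95% interval for the same sample, which is the relationship stated above.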
3. In the real world, raw data rarely follow a normal (Gaussian) distribution. To judge whether a set of raw data is normally distributed, we can use the following steps:
step1: Convert each value in the raw data to a z-score, denoted Z. (The z-score of a value is (value - mean) / std.)
step2: Sort Z in ascending order.
step3: Plot each value's z-score on the y-axis; the x-axis is the corresponding quantile of a normal distribution for that value's rank.
step4: If the points fall roughly on the diagonal line, the sample distribution can be considered close to normal.
An implementation in R:
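# Draw 100 values from a standard normal distribution
norm_samp <- rnorm(100)
# QQ plot of the sample against theoretical normal quantiles
qqnorm(norm_samp)
# Reference diagonal y = x
abline(a=0, b=1, col='grey')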
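The four steps can also be written out by hand, which shows what `qqnorm` does internally. This is a sketch under assumptions: `raw_data` is simulated here and stands in for whatever data you want to check, and `ppoints` is used to choose the plotting positions for the theoretical quantiles.

```r
set.seed(7)

# Hypothetical raw data; replace with the values you want to check
raw_data <- rnorm(100, mean = 10, sd = 2)

# step1: z-score transform
z <- (raw_data - mean(raw_data)) / sd(raw_data)

# step2: sort in ascending order
z_sorted <- sort(z)

# step3: theoretical normal quantile for each rank, then plot
theo_q <- qnorm(ppoints(length(z_sorted)))
plot(theo_q, z_sorted,
     xlab = "Theoretical normal quantiles",
     ylab = "Sample z-scores")

# step4: reference diagonal; points near it suggest approximate normality
abline(a = 0, b = 1, col = "grey")
```

Because the data are standardized first, the diagonal y = x is the right reference line; on unstandardized data, `qqline` would be the usual choice instead.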
4. An introduction to several distributions
Distribution | Parameters | What it describes |
---|---|---|
Poisson distribution | lambda: the rate (per unit of time or space) at which events occur; lambda = mean = variance | The frequency distribution of the number of events in sampled units of time or space. |
Exponential distribution | lambda: the rate (per unit of time or space) at which events occur | The frequency distribution of the time or distance from one event to the next event. |
Weibull distribution | Two parameters. The shape parameter "beta" describes how the event rate changes: if beta > 1, the probability of an event increases over time; if beta < 1, it decreases. The scale parameter "eta" describes the characteristic life. | A generalized version of the exponential, in which the event rate is allowed to shift over time. |
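
The properties in the table can be checked by simulation in R. This is an illustrative sketch only: the values chosen for `lambda`, `beta`, and `eta` are arbitrary assumptions, and R's `rweibull` calls these parameters `shape` and `scale`.

```r
set.seed(123)

n <- 10000
lambda <- 2    # assumed event rate per unit of time
beta <- 1.5    # assumed Weibull shape (> 1: event rate increases over time)
eta <- 5000    # assumed Weibull scale (characteristic life)

# Poisson: number of events per unit of time; mean and variance both near lambda
counts <- rpois(n, lambda)
cat("Poisson mean:", mean(counts), " variance:", var(counts), "\n")

# Exponential: time between consecutive events; mean near 1 / lambda
gaps <- rexp(n, rate = lambda)
cat("Exponential mean:", mean(gaps), "\n")

# Weibull: lifetimes with a time-varying event rate
lifetimes <- rweibull(n, shape = beta, scale = eta)
cat("Share failed by t = eta:", mean(lifetimes <= eta), "\n")

# With shape = 1 the Weibull reduces to the exponential with rate 1 / scale
same_cdf <- all.equal(pweibull(1, shape = 1, scale = 1 / lambda),
                      pexp(1, rate = lambda))

# At t = eta the Weibull CDF equals 1 - exp(-1), about 63.2%, for any shape
cdf_at_eta <- pweibull(eta, shape = beta, scale = eta)
```

The last two lines show why the Weibull is called a generalization of the exponential, and what the characteristic life means: the time by which about 63.2% of units have experienced the event.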
Published: 2024-02-02 09:07:05