Machine Learning, Chapter 4: Decision Trees


Contents

  • Decision Tree Homework
    • 1. Problem
    • 2. Solution
      • 2.1 What is the entropy of IsPoisonous?
      • 2.2 Which attribute should be chosen as the root of the decision tree?
      • 2.3 What is the information gain?
      • 2.4 Build a decision tree to classify mushrooms as poisonous or not

Decision Tree Homework

1. Problem

You are stranded on a deserted island. Mushrooms of various types grow wild all over the island, but no other food is anywhere to be found. Some of the mushrooms have been determined to be poisonous and others not, through your former companions' trial and error. You are the only one remaining on the island. You have the following data to consider.

You know whether or not mushrooms A through H are poisonous, but you do not know about U through W. Build a decision tree to classify mushrooms as poisonous or not.

Questions:

(a) What is the entropy of IsPoisonous?
(b) Which attribute should you choose as the root of a decision tree? Hint: You can figure this out by looking at the data without explicitly computing the information gain of all four attributes.
(c) What is the information gain of the attribute you chose in the previous question?
(d) Build a decision tree to classify mushrooms as poisonous or not.
(e) Classify mushrooms U, V, and W using this decision tree as poisonous or not poisonous.


Example  IsHeavy  IsSmelly  IsSpotted  IsSmooth  IsPoisonous
A        0        0         0          0         0
B        0        0         1          0         0
C        1        1         0          1         0
D        1        0         0          1         1
E        0        1         1          0         1
F        0        0         1          1         1
G        0        0         0          1         1
H        1        1         0          0         1
U        1        1         1          1         ?
V        0        1         0          1         ?
W        1        1         0          0         ?

2. Solution

2.1 What is the entropy of IsPoisonous?

Entropy formula:

$$Entropy(t) = -\sum_{i=0}^{c-1} p(i \mid t)\,\log_2 p(i \mid t)$$

Information gain formula:

$$Gain(D, a) = Entropy(D) - \sum_{i=1}^{k} \frac{|D_i|}{|D|}\, Entropy(D_i)$$

Computing the entropies (all logs base 2):

$$
\begin{aligned}
Entropy(IsPoisonous) &= -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} = 0.954434002924965 \\
Entropy(IsHeavy)     &= -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} = 0.954434002924965 \\
Entropy(IsSmelly)    &= -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} = 0.954434002924965 \\
Entropy(IsSpotted)   &= -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} = 0.954434002924965 \\
Entropy(IsSmooth)    &= -\frac{4}{8}\log_2\frac{4}{8} - \frac{4}{8}\log_2\frac{4}{8} = 1.0
\end{aligned}
$$

So the answer to (a) is $Entropy(IsPoisonous) \approx 0.9544$ bits.
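As a sanity check, here is a minimal Python sketch (the entropy helper and the transcribed columns are my own, not from the original post) that reproduces these numbers:

```python
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))

# Columns transcribed from the table above (labeled examples A..H only).
is_poisonous = [0, 0, 0, 1, 1, 1, 1, 1]  # A..H: 3 not poisonous, 5 poisonous
is_smooth    = [0, 0, 1, 1, 0, 1, 1, 0]  # A..H: 4 smooth, 4 not smooth

print(entropy(is_poisonous))  # ≈ 0.954434002924965
print(entropy(is_smooth))     # 1.0
```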

2.2 Which attribute should be chosen as the root of the decision tree?

There are 8 labeled examples in total. Tallying each attribute against the class:

  • IsHeavy: heavy 3 (1 not poisonous, 2 poisonous); not heavy 5 (2 not poisonous, 3 poisonous)
  • IsSmelly: smelly 3 (1 not poisonous, 2 poisonous); not smelly 5 (2 not poisonous, 3 poisonous)
  • IsSpotted: spotted 3 (1 not poisonous, 2 poisonous); not spotted 5 (2 not poisonous, 3 poisonous)
  • IsSmooth: smooth 4 (1 not poisonous, 3 poisonous); not smooth 4 (2 not poisonous, 2 poisonous)
  • Class totals: 5 poisonous, 3 not poisonous

IsHeavy, IsSmelly, and IsSpotted all produce exactly the same class counts in their branches, so they must have identical information gains; only IsSmooth splits the data differently. This is the hint from part (b): IsSmooth is the only attribute that can stand out.

First, compute the information gain for the IsHeavy attribute.

Splitting on IsHeavy (parent: 5 poisonous, 3 not poisonous) gives: heavy → 1 not poisonous, 2 poisonous; not heavy → 2 not poisonous, 3 poisonous.

The computation:

$$
\begin{aligned}
Gain(IsHeavy) &= Entropy(IsPoisonous) - \frac{|D_{\text{not heavy}}|}{|D|} Entropy(D_{\text{not heavy}}) - \frac{|D_{\text{heavy}}|}{|D|} Entropy(D_{\text{heavy}}) \\
&= 0.954434002924965 - \frac{5}{8}\left[-\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5}\right] - \frac{3}{8}\left[-\frac{2}{3}\log_2\frac{2}{3} - \frac{1}{3}\log_2\frac{1}{3}\right] \\
&= 0.0032289436203635224
\end{aligned}
$$

Next, IsSmooth.

Splitting on IsSmooth (parent: 5 poisonous, 3 not poisonous) gives: smooth → 1 not poisonous, 3 poisonous; not smooth → 2 not poisonous, 2 poisonous.

The same computation gives $Gain(IsSmooth) = 0.048794940695398636$.

$Gain(IsSmelly) = 0.0032289436203635224$

$Gain(IsSpotted) = 0.0032289436203635224$

IsSmooth has the largest information gain, so it is chosen as the root.
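The same bookkeeping can be done in a few lines of Python. A sketch (rows transcribed from the table; the info_gain helper is my own) that computes all four gains and confirms the choice of root:

```python
from math import log2

# Rows A..H from the table: (IsHeavy, IsSmelly, IsSpotted, IsSmooth, IsPoisonous)
DATA = [
    (0, 0, 0, 0, 0),  # A
    (0, 0, 1, 0, 0),  # B
    (1, 1, 0, 1, 0),  # C
    (1, 0, 0, 1, 1),  # D
    (0, 1, 1, 0, 1),  # E
    (0, 0, 1, 1, 1),  # F
    (0, 0, 0, 1, 1),  # G
    (1, 1, 0, 0, 1),  # H
]
ATTRS = ["IsHeavy", "IsSmelly", "IsSpotted", "IsSmooth"]

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))

def info_gain(rows, attr):
    """Gain(D, a) = Entropy(D) - sum_i |D_i|/|D| * Entropy(D_i)."""
    gain = entropy([r[-1] for r in rows])
    for value in (0, 1):
        branch = [r[-1] for r in rows if r[attr] == value]
        if branch:
            gain -= len(branch) / len(rows) * entropy(branch)
    return gain

for i, name in enumerate(ATTRS):
    print(f"{name:10s} {info_gain(DATA, i):.16f}")
# IsHeavy    0.0032289436203635
# IsSmelly   0.0032289436203635
# IsSpotted  0.0032289436203635
# IsSmooth   0.0487949406953986  <- largest, so IsSmooth is the root
```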

2.3 What is the information gain?

The information gain of the chosen attribute is $Gain(IsSmooth) = 0.048794940695398636$.

2.4 Build a decision tree to classify mushrooms as poisonous or not

First split the examples into two groups on the root attribute IsSmooth:

IsSmooth
├── not smooth → {A, B, E, H}
└── smooth → {C, D, F, G}

Then split each branch further.

For the subset {A, B, E, H}, tally the remaining attributes:

  • IsHeavy: heavy 1 (0 not poisonous, 1 poisonous); not heavy 3 (2 not poisonous, 1 poisonous)
  • IsSmelly: smelly 2 (0 not poisonous, 2 poisonous); not smelly 2 (2 not poisonous, 0 poisonous)
  • IsSpotted: spotted 2 (1 not poisonous, 1 poisonous); not spotted 2 (1 not poisonous, 1 poisonous)
  • Class totals: 2 poisonous, 2 not poisonous

$Gain(IsHeavy) = 0.31127812445913283$

$Gain(IsSmelly) = 1$

$Gain(IsSpotted) = 0$

So IsSmelly is chosen for this branch. The smooth branch {C, D, F, G} splits perfectly on IsSmelly as well: C (smelly) is not poisonous, while D, F, G (not smelly) are poisonous.
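Running the same kind of computation on this subset reproduces the three gains above; a self-contained sketch (rows for A, B, E, H transcribed from the table):

```python
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(v) / n) * log2(labels.count(v) / n)
                for v in set(labels))

def info_gain(rows, attr):
    gain = entropy([r[-1] for r in rows])
    for value in (0, 1):
        branch = [r[-1] for r in rows if r[attr] == value]
        if branch:
            gain -= len(branch) / len(rows) * entropy(branch)
    return gain

# Not smooth branch, rows as (IsHeavy, IsSmelly, IsSpotted, IsSmooth, IsPoisonous).
SUBSET = [
    (0, 0, 0, 0, 0),  # A
    (0, 0, 1, 0, 0),  # B
    (0, 1, 1, 0, 1),  # E
    (1, 1, 0, 0, 1),  # H
]
for i, name in enumerate(["IsHeavy", "IsSmelly", "IsSpotted"]):
    print(name, info_gain(SUBSET, i))
# IsHeavy   ≈ 0.31127812445913283
# IsSmelly  1.0  <- perfect split, so IsSmelly is chosen here
# IsSpotted 0.0
```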

The finished tree:

IsSmooth
├── not smooth → IsSmelly
│   ├── not smelly → {A, B}: not poisonous
│   └── smelly → {E, H}: poisonous
└── smooth → IsSmelly
    ├── not smelly → {D, F, G}: poisonous
    └── smelly → {C}: not poisonous
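The finished tree is small enough to encode directly, which also answers part (e): classifying U, V, and W. A minimal sketch (the classify helper is my own encoding of the tree above; the feature values come from the table):

```python
def classify(is_smelly, is_smooth):
    """The tree above: IsSmooth at the root, IsSmelly on each branch.
    Returns 1 for poisonous, 0 for not poisonous."""
    if is_smooth:
        return 0 if is_smelly else 1  # smelly -> {C}; not smelly -> {D, F, G}
    return 1 if is_smelly else 0      # smelly -> {E, H}; not smelly -> {A, B}

# U, V, W as (IsHeavy, IsSmelly, IsSpotted, IsSmooth) from the table.
unknown = {"U": (1, 1, 1, 1), "V": (0, 1, 0, 1), "W": (1, 1, 0, 0)}
for name, (_, smelly, _, smooth) in unknown.items():
    print(name, "poisonous" if classify(smelly, smooth) else "not poisonous")
# U not poisonous
# V not poisonous
# W poisonous
```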
