Original text:
Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].
This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/
It can also be found in the UCI Machine Learning Repository: +Cancer+Wisconsin+%28Diagnostic%29
Attribute Information:
1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32)
Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
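As a rough illustration of definitions a), c), d) and f), the sketch below computes radius, perimeter, area and compactness for a contour given as an ordered list of (x, y) points. The function name `contour_features` and the polygon representation are assumptions made for this example. The description above does not say how the original software traced or measured the nuclei, and the published values may be scaled differently.

```python
import math

def contour_features(points):
    """Compute a few shape features for a closed contour.

    `points` is a list of (x, y) vertices in order around the boundary.
    Hypothetical helper for illustration; not the original software.
    """
    n = len(points)
    if n < 3:
        raise ValueError("a contour needs at least 3 points")

    # Center approximated as the mean of the boundary points (assumption).
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # a) radius: mean distance from the center to the perimeter points.
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n

    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the contour
        perimeter += math.hypot(x1 - x0, y1 - y0)  # c) sum of edge lengths
        twice_area += x0 * y1 - x1 * y0            # shoelace formula term
    area = abs(twice_area) / 2.0                   # d) enclosed area

    # f) compactness exactly as written in the list: perimeter^2 / area - 1.0
    compactness = perimeter ** 2 / area - 1.0

    return {
        "radius": radius,
        "perimeter": perimeter,
        "area": area,
        "compactness": compactness,
    }

# Toy contour: a 2 x 2 square (made-up data, not taken from the dataset).
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
features = contour_features(square)
print(features)
```

For the toy square the perimeter is 8 and the area is 4, so the compactness formula gives 8^2 / 4 - 1.0 = 15.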
The mean, standard error and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
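The three per-image statistics can be sketched as follows for a single feature measured on several nuclei. The function name `summarize` and the sample values are made up for illustration, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of nuclei. The description does not state which estimator was used.

```python
import math

def summarize(values):
    """Return (mean, standard_error, worst) for one feature in one image.

    `values` holds one measurement per cell nucleus. Hypothetical helper.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 nuclei to average the three largest")

    mean = sum(values) / n

    # Assumed estimator: sample variance (n - 1), then SE = sd / sqrt(n).
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    standard_error = math.sqrt(variance / n)

    # "Worst": mean of the three largest values.
    worst = sum(sorted(values, reverse=True)[:3]) / 3

    return mean, standard_error, worst

# Made-up radii for five nuclei in one image (not real data).
radii = [10.0, 12.0, 14.0, 16.0, 18.0]
mean_radius, radius_se, worst_radius = summarize(radii)
print(mean_radius, radius_se, worst_radius)
```

Applying the same function to each of the ten features yields the 30 values stored per image.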
All feature values are recoded with four significant digits.
Missing attribute values: none
Class distribution: 357 benign, 212 malignant
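The field layout and class counts above can be checked with a short sketch. The column labels such as `radius_mean` are hypothetical names chosen for this example. The order of features within each block is assumed to follow the list a) to j), which is consistent with fields 3, 13 and 23 being the three radius statistics.

```python
# Ten base features in the order a) to j) from the attribute list.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# Three blocks of ten: means, standard errors, then "worst" values.
STATS = ["mean", "se", "worst"]

def column_names():
    """Build the 32 column labels (hypothetical names) in field order."""
    names = ["id", "diagnosis"]
    for stat in STATS:
        for feature in BASE_FEATURES:
            names.append(f"{feature}_{stat}")
    return names

columns = column_names()
# The description numbers fields from 1, so shift the index by one.
field = {i + 1: name for i, name in enumerate(columns)}
print(field[3], field[13], field[23])

# Class counts as stated in the description.
benign, malignant = 357, 212
total = benign + malignant
# Accuracy of always predicting the majority class (benign).
majority_baseline = max(benign, malignant) / total
print(total, round(majority_baseline, 3))
```

The majority-class baseline is a useful reference point: any classifier trained on these features should do better than always answering "benign".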
Translation:
Breast Cancer Wisconsin (Diagnostic) Data Set
Predict whether the cancer is benign or malignant
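Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
The actual linear program used to obtain the separating plane in the 3-dimensional space is that described in: [K. P. Bennett and O. L. Mangasarian: "Robust Linear Programming Discrimination of Two Linearly Inseparable Sets", Optimization Methods and Software 1, 1992, 23-34].
This database is also available through the UW CS ftp server:
ftp ftp.cs.wisc.edu
cd math-prog/cpo-dataset/machine-learn/WDBC/
It can also be found in the UCI Machine Learning Repository: +Cancer+Wisconsin+%28Diagnostic%29
Attribute information:
1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32)
Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)
The mean, standard error and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
All feature values are recoded with four significant digits.
Missing attribute values: none
Class distribution: 357 benign, 212 malignant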
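You can download the dataset from the official site. I have also shared a copy on Baidu Netdisk: follow my WeChat official account and reply "2020101809" to get the download link.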
Published: 2024-02-02 00:28:15
Article link: https://www.4u4v.net/it/170681037040196.html