Ma Tinghuai, male, professor, born in March 1974, E-mail: thma@nuist.edu.cn.
Parallel Implementing KNN Classification Algorithm Using MapReduce Programming Mode
Yan Yonggang¹, Ma Tinghuai¹,², Wang Jian³
(1. School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, China; 2. Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, China; 3. School of Electronic Science and Engineering, Nanjing University, Nanjing 210093, China)
Abstract: To improve the ability of the KNN algorithm to process massive data, a new technique based on the Hadoop platform is used. Taking the characteristics of the KNN algorithm itself into account, KNN is parallelized on top of the MapReduce programming model. Three functions, named Map, Combine and Reduce, are designed to implement the parallelism. The Map function evaluates the similarity between each test instance and the training dataset. To reduce computational complexity and save network bandwidth, the Combine function acts as a local Reduce operation. The Reduce function then derives the KNN classification from the intermediate results. Experiments on the Hadoop platform show that the method achieves excellent linear speedup as the number of computer nodes increases, and that it scales well.
Key words: KNN classification; parallel computing; MapReduce programming model; Hadoop
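The Map/Combine/Reduce division described in the abstract can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the authors' Hadoop implementation: Euclidean distance stands in for the similarity measure (the abstract does not name one), each training split plays the role of one map task, and the Combine step is assumed to keep only the k nearest candidates per test instance. All function names (`map_split`, `combine`, `reduce_all`, `knn_mapreduce`) are hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict

# Assumed data layout (hypothetical): a training record is (features, label),
# a test instance is (test_id, features).

def euclidean(a, b):
    # Assumption: Euclidean distance stands in for the unspecified similarity.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def map_split(train_split, test_set):
    """Map: for one training split, emit (test_id, (distance, label))
    for every pairing of a training record with a test instance."""
    for features, label in train_split:
        for test_id, test_features in test_set:
            yield test_id, (euclidean(features, test_features), label)

def combine(pairs, k):
    """Combine (local Reduce): keep only the k nearest candidates per
    test_id on this node, so at most k pairs per test instance leave it."""
    grouped = defaultdict(list)
    for test_id, candidate in pairs:
        grouped[test_id].append(candidate)
    return {tid: heapq.nsmallest(k, cands) for tid, cands in grouped.items()}

def reduce_all(partials, k):
    """Reduce: merge per-split candidates, take the global k nearest
    and assign each test instance the majority label among them."""
    merged = defaultdict(list)
    for partial in partials:
        for test_id, candidates in partial.items():
            merged[test_id].extend(candidates)
    result = {}
    for test_id, candidates in merged.items():
        nearest = heapq.nsmallest(k, candidates)
        votes = Counter(label for _, label in nearest)
        result[test_id] = votes.most_common(1)[0][0]
    return result

def knn_mapreduce(train_splits, test_set, k):
    # Each split is processed independently, as a separate map task would be.
    partials = [combine(map_split(split, test_set), k) for split in train_splits]
    return reduce_all(partials, k)

# Usage with two toy training splits (made-up data for illustration only).
train_splits = [
    [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((5.0, 5.0), "B")],
    [((0.2, 0.1), "A"), ((5.1, 4.9), "B"), ((4.8, 5.2), "B")],
]
test_set = [("t1", (0.05, 0.05)), ("t2", (5.0, 5.1))]
print(knn_mapreduce(train_splits, test_set, k=3))  # {'t1': 'A', 't2': 'B'}
```

Because each Combine step forwards at most k candidates per test instance, the volume shuffled to the reducer grows with the number of splits times k rather than with the size of the training set, which is one way to read the abstract's bandwidth argument.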
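With the progress of information technology and the development of the information society, data volumes are growing rapidly in fields such as scientific research, computer simulation, Internet applications and e-commerce [1]. For example, the Large Hadron Collider accumulates about 15 PB of new data each year, and Walmart sells more than 267 million items to customers worldwide every day through more than 6,000 stores. Analyzing and exploiting such enormous data resources depends on effective data analysis techniques. Data mining is a technique that analyzes massive data in order to find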