Q1: Can different data streams be joined in Spark Streaming?
Yes, different DStreams in Spark Streaming can be joined;
Spark Streaming is an extension of the core Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ, or plain old TCP sockets, and can be processed using complex algorithms expressed with high-level functions like map, reduce, join and window.
join(otherStream, [numTasks]): When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.
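The per-batch semantics of that (K, V) × (K, W) → (K, (V, W)) join can be sketched in plain Python. This is not Spark code, just an illustration of what the join emits for one micro-batch; the stream names and data are made up:

```python
from collections import defaultdict

# Minimal sketch of the (K, V) x (K, W) -> (K, (V, W)) inner-join
# semantics that DStream.join applies within each micro-batch.
# Pure Python, no Spark; names and data are illustrative.
def batch_join(left, right):
    """Inner-join two lists of key-value pairs on the key."""
    rights = defaultdict(list)
    for k, w in right:
        rights[k].append(w)
    # Every (v, w) combination is emitted per matching key.
    return [(k, (v, w)) for k, v in left for w in rights.get(k, [])]

clicks = [("user1", "pageA"), ("user2", "pageB")]
profiles = [("user1", "US"), ("user3", "CN")]
print(batch_join(clicks, profiles))  # only user1 appears in both streams
```

Keys that appear in only one of the two streams (user2, user3 above) are dropped, exactly as in an inner join.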
Q2: Are Flume and Spark Streaming suitable for cluster mode?
Flume and Spark Streaming were built for clusters;
For input streams that receive data over the network (such as Kafka, Flume, sockets, etc.), the default persistence level is set to replicate the data to two nodes for fault-tolerance.
Using any input source that receives data through a network - For network-based data sources like Kafka and Flume, the received input data is replicated in memory between nodes of the cluster (default replication factor is 2).
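The idea behind that default replication factor of 2 can be sketched as a toy block-placement function. This is an illustration of the concept only, not Spark's actual internals; the node names and placement policy are invented:

```python
# Illustrative sketch (NOT Spark internals): place each received block on
# two distinct nodes, mirroring the default replication factor of 2 used
# for network input streams, so the block survives a single-node failure.
def place_block(block_id, nodes, replication=2):
    """Pick `replication` distinct nodes for a block, round-robin by id hash."""
    assert len(nodes) >= replication, "need at least `replication` nodes"
    start = hash(block_id) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]

nodes = ["node-a", "node-b", "node-c"]
replicas = place_block("block-42", nodes)
assert len(set(replicas)) == 2  # two distinct copies of every block
```

With two in-memory copies on different nodes, losing any one worker does not lose received-but-unprocessed data, which is why this is the default for network sources.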
Q3: Does Spark have any drawbacks?
Spark's main drawback is its relatively large memory footprint;
in earlier versions, Spark processed data at a coarse granularity, making fine-grained control difficult;
after the Fair scheduling mode was added, finer-grained control became possible;
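The Fair mode mentioned above is Spark's fair scheduler; enabling it is a one-line configuration change. A minimal sketch (the allocation-file path is a placeholder):

```properties
# spark-defaults.conf: switch from the default FIFO scheduler to FAIR,
# so concurrent jobs share cluster resources more evenly.
spark.scheduler.mode  FAIR
# optional: an allocation file defining scheduler pools and weights
# spark.scheduler.allocation.file  /path/to/fairscheduler.xml
```

Under FIFO, a large job can monopolize the cluster; under FAIR, tasks from different jobs are interleaved so short jobs get resources sooner.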
Q4: Is Spark Streaming used in production today?
Spark Streaming is very easy to use in production;
no separate deployment is needed: once Spark is installed, Spark Streaming comes with it;
in China, companies such as 皮皮网 are already using Spark Streaming;
Published: 2024-02-02 06:40:21
本文链接:https://www.4u4v.net/it/170682722042046.html