crontab schedules are defined per user.
Rule: minute hour day-of-month month day-of-week cmd
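For example, one scheduled command per line; a minimal sketch (the script path here is hypothetical):
# Run a cleanup script at 02:30 every day
30 2 * * * /home/cloudera/bin/cleanup.sh >> /tmp/cleanup.log 2>&1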
For scheduling Hadoop jobs (Oozie):
well suited to ETL
Official documentation: oozie-4.1.0-cdh5.11.1/DG_Examples.html
Example:
Using the cloudera user as an example:
// Create the cloudera user's home directory
# hdfs dfs -mkdir -p /user/cloudera
// Put the examples bundled with Oozie onto HDFS
# cd /opt/cloudera/parcels/CDH/share/doc/oozie-4.1.0+cdh5.11.1+431
# tar -zxvf oozie-examples.tar.gz -C ~
# hdfs dfs -mkdir examples
# hdfs dfs -put examples/* examples
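To confirm the upload, list the directory:
# hdfs dfs -ls /user/cloudera/examples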
Model a MapReduce Action on the examples bundled with Oozie:
// Copy the map-reduce example from the bundled examples into oozie-apps/mr-wordcount-wf/
# mkdir oozie-apps
# cp -r examples/apps/map-reduce/ oozie-apps/
# cd oozie-apps
# mv map-reduce mr-wordcount-wf
// Remove the files we do not need
# cd mr-wordcount-wf
# rm -rf job-with-config-class.properties
# rm -rf workflow-with-config-class.xml
# rm -rf lib/oozie-examples-4.1.0-cdh5.11.1.jar
// In the end only the following three files remain
# ls
job.properties lib workflow.xml
(1) job.properties
- Key point: points to the HDFS location of the workflow.xml file
(2) workflow.xml
(3) lib directory: the jars the job depends on
MapReduce Action:
How to schedule a MapReduce program with Oozie
Key point: the [Driver] part of the original Java MapReduce program is moved into the Configuration section of the workflow XML file
Key point: job.properties points to the HDFS location of the workflow.xml file
job.properties:
## Define variables shared by job.properties and workflow.xml
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
oozieAppsRoot=user/cloudera/oozie-apps
oozieDataRoot=user/cloudera/oozie/datas
# HDFS location of the workflow.xml to execute
oozie.wf.application.path=${nameNode}/${oozieAppsRoot}/mr-wordcount-wf/workflow.xml
inputDir=mr-wordcount-wf/input
outputDir=mr-wordcount-wf/output
# hadoop jar wordcount2.jar wordcount2.WordCount2 /user/hdfs/mapreduce/wordcount/input /user/hdfs/mapreduce/wordcount/output
The jar used is wordcount2.jar
Main class: wordcount2.WordCount2
Mapper class: wordcount2.TokenizerMapper
Reducer class: wordcount2.IntSumReducer
Note:
Although the program refers to the Mapper as wordcount2.TokenizerMapper and the Reducer as wordcount2.IntSumReducer, both are inner classes of WordCount2, so the classes actually generated in wordcount2.jar are:
Mapper class: wordcount2.WordCount2$TokenizerMapper
Reducer class: wordcount2.WordCount2$IntSumReducer
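One way to verify the generated class names is to list the jar contents; the expected output is sketched below, assuming the inner classes above:
# jar tf wordcount2.jar
wordcount2/WordCount2.class
wordcount2/WordCount2$TokenizerMapper.class
wordcount2/WordCount2$IntSumReducer.class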
workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.5" name="mr-wordcount-wf">
    <start to="mr-node-wordcount"/>
    <action name="mr-node-wordcount">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/${oozieDataRoot}/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>wordcount2.WordCount2$TokenizerMapper</value>
                </property>
                <property>
                    <name>mapreduce.map.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapreduce.map.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapreduce.job.reduce.class</name>
                    <value>wordcount2.WordCount2$IntSumReducer</value>
                </property>
                <property>
                    <name>mapreduce.job.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapreduce.job.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapreduce.input.fileinputformat.inputdir</name>
                    <value>${nameNode}/${oozieDataRoot}/${inputDir}</value>
                </property>
                <property>
                    <name>mapreduce.output.fileoutputformat.outputdir</name>
                    <value>${nameNode}/${oozieDataRoot}/${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Note: the Map-Reduce example on the official site uses the old API. Since the map-reduce program we run on the cluster uses the new API, we switch the workflow to the new API:
<property>
    <name>mapred.mapper.new-api</name>
    <value>true</value>
</property>
<property>
    <name>mapred.reducer.new-api</name>
    <value>true</value>
</property>
Then just put wordcount2.jar into /home/cloudera/oozie-apps/mr-wordcount-wf/lib.
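The application directory also has to exist on HDFS at the path given by oozie.wf.application.path; presumably it is uploaded with something like:
# hdfs dfs -put oozie-apps /user/cloudera/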
# oozie job -oozie http://master:11000/oozie -config oozie-apps/mr-wordcount-wf/job.properties -run
or
# export OOZIE_URL="http://master:11000/oozie"
# oozie job -config oozie-apps/mr-wordcount-wf/job.properties -run
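The submit command prints a job id, which can then be used to check progress (the id below is made up):
# oozie job -info 0000000-170814123456789-oozie-oozi-W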
Hive Action
Official documentation: oozie-4.1.0-cdh5.11.1/DG_HiveActionExtension.html
Every Action type is in fact created in much the same way:
We model our own Hive Action on examples/apps/hive.
First, create data in Hive so there is something to query. Following《Hive应用实例:WordCount》, create the word_count table (queried below as default.word_count).
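As a rough sketch of that setup (the column names here are assumptions, not taken from the referenced article):
# hive -e "CREATE TABLE IF NOT EXISTS default.word_count (word STRING, count INT)"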
The Hive Action directory structure is:
# ls hive-select/
hive-site.xml job.properties lib wordcount.sql workflow.xml
job.properties:
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
oozieAppsRoot=user/cloudera/oozie-apps
oozieDataRoot=user/cloudera/oozie/datas
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/${oozieAppsRoot}/hive-select/workflow.xml
outputDir=hive-select/output
wordcount.sql:
insert overwrite directory '${OUTPUT}'
select * from default.word_count;
workflow.xml:
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="wf-hive-select">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.5">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/${oozieDataRoot}/${outputDir}"/>
            </prepare>
            <job-xml>${nameNode}/${oozieAppsRoot}/hive-select/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>wordcount.sql</script>
            <param>OUTPUT=${nameNode}/${oozieDataRoot}/${outputDir}</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Run it (with OOZIE_URL exported as above):
# oozie job -config oozie-apps/hive-select/job.properties -run
Sqoop Action
Official documentation: oozie-4.1.0-cdh5.11.1/DG_SqoopActionExtension.html
The source table in MariaDB:
MariaDB [test]> select * from my_user;
+----------+------+
| name | age |
+----------+------+
| zhangsan | 14 |
| lisi | 34 |
| wangwu | 55 |
+----------+------+
3 rows in set (0.00 sec)
# sqoop import --connect jdbc:mysql://master:3306/test --username root --password 123456 --table my_user --target-dir /user/cloudera/oozie/datas/sqoop-import-user/output --fields-terminated-by "\t" --num-mappers 1
Note: in the Sqoop command, --fields-terminated-by "\t" only works with double quotes " ", not single quotes ' '.
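After the import runs, the result can be inspected on HDFS (the part file name assumes --num-mappers 1):
# hdfs dfs -cat /user/cloudera/oozie/datas/sqoop-import-user/output/part-m-00000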
Moreover, Sqoop has been verified to use the new API, so the following configuration must be added to workflow.xml:
<property>
    <name>mapred.mapper.new-api</name>
    <value>true</value>
</property>
<property>
    <name>mapred.reducer.new-api</name>
    <value>true</value>
</property>
The Sqoop Action directory structure:
# ls sqoop-import-user/
job.properties lib workflow.xml
lib: contains the MySQL JDBC driver
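For example (the connector path and version here are hypothetical):
# cp /usr/share/java/mysql-connector-java-5.1.34.jar sqoop-import-user/lib/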
job.properties:
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
oozieAppsRoot=user/cloudera/oozie-apps
oozieDataRoot=user/cloudera/oozie/datas
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/${oozieAppsRoot}/sqoop-import-user/workflow.xml
outputDir=sqoop-import-user/output
workflow.xml (with the new-api properties added, per the note above):
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="sqoop-wf">
    <start to="sqoop-node"/>
    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/${oozieDataRoot}/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:mysql://master:3306/test --username root --password 123456 --table my_user --target-dir /user/cloudera/oozie/datas/sqoop-import-user/output --fields-terminated-by "\t" --num-mappers 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Shell Action
Official documentation: oozie-4.1.0-cdh5.11.1/DG_ShellActionExtension.html
# ls shell-hive-select/
job.properties user-select.sh user-select.sql workflow.xml
job.properties:
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
oozieAppsRoot=user/cloudera/oozie-apps
oozieDataRoot=user/cloudera/oozie/datas
oozie.wf.application.path=${nameNode}/${oozieAppsRoot}/shell-hive-select/workflow.xml
exec=user-select.sh
script=user-select.sql
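The contents of user-select.sh are not shown here; presumably it just runs the SQL file shipped next to it, roughly:
#!/bin/bash
# Hypothetical sketch: run the query file distributed alongside this shell action
hive -f user-select.sql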
workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.5" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>${exec}</exec>
            <file>${nameNode}/${oozieAppsRoot}/shell-hive-select/${exec}#${exec}</file>
            <file>${nameNode}/${oozieAppsRoot}/shell-hive-select/${script}#${script}</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
This Shell Action did not run successfully; I will sort it out when I have time.
Oozie Coordinator
Oozie uses UTC by default.
To change the Oozie time zone in Cloudera Manager:
Cloudera Oozie -> Configuration -> Oozie Server (scope) -> Advanced (category) -> Oozie Server Advanced Configuration Snippet (Safety Valve) for oozie-site.xml
That is:
<property>
    <name>oozie.processing.timezone</name>
    <value>GMT+0800</value>
</property>
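Valid time zone ids can be listed with the Oozie CLI:
# oozie info -timezones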
When running a Coordinator, the following error may appear:
E1003: Invalid coordinator application attributes, Coordinator job with frequency [1] minutes is faster than allowed maximum of 5 minutes (oozie.service.coord.check.maximum.frequency is set to true)
The fix is to set oozie.service.coord.check.maximum.frequency to false in the same oozie-site.xml safety valve:
<property>
    <name>oozie.service.coord.check.maximum.frequency</name>
    <value>false</value>
</property>
To make the Oozie web console also display times in GMT+0800, edit:
# vi /opt/cloudera/parcels/CDH/lib/oozie/webapps/oozie/oozie-console.js
changing getTimeZone() to:
function getTimeZone() {
    Ext.state.Manager.setProvider(new Ext.state.CookieProvider());
    return Ext.state.Manager.get("TimezoneId", "GMT+0800");
}
Example:
# ls cron-schedule/
coordinator.xml job.properties workflow.xml
A Coordinator wraps a Workflow (job.properties, workflow.xml) and adds one trigger file (coordinator.xml).
job.properties:
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
oozieAppsRoot=user/cloudera/oozie-apps
oozieDataRoot=user/cloudera/oozie/datas
oozie.coord.application.path=${nameNode}/${oozieAppsRoot}/cron-schedule
start=2017-08-14T19:15+0800
end=2017-08-14T19:19+0800
workflowAppUri=${nameNode}/${oozieAppsRoot}/cron-schedule
workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.5" name="no-op-wf">
    <start to="end"/>
    <end name="end"/>
</workflow-app>
coordinator.xml:
<coordinator-app name="cron-coord" frequency="${coord:minutes(1)}" start="${start}" end="${end}"
                 timezone="GMT+0800" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
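The coordinator is submitted the same way as a workflow (assuming OOZIE_URL is still exported):
# oozie job -config oozie-apps/cron-schedule/job.properties -run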
The example above only runs an empty workflow on a schedule. The next example runs a MapReduce Action on a schedule:
# ls cron
coordinator.xml job.properties lib workflow.xml
Here job.properties, lib, and workflow.xml are the mr-wordcount-wf program from the MapReduce Action section above; only the schedule in coordinator.xml still has to be written.
job.properties:
nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
oozieAppsRoot=user/cloudera/oozie-apps
oozieDataRoot=user/cloudera/oozie/datas
oozie.coord.application.path=${nameNode}/${oozieAppsRoot}/cron
start=2017-08-14T21:08+0800
end=2017-08-14T21:12+0800
workflowAppUri=${nameNode}/${oozieAppsRoot}/cron
inputDir=mr-wordcount-wf/input
outputDir=mr-wordcount-wf/output
workflow.xml:
<workflow-app xmlns="uri:oozie:workflow:0.5" name="mr-wordcount-wf">
    <start to="mr-node-wordcount"/>
    <action name="mr-node-wordcount">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${nameNode}/${oozieDataRoot}/${outputDir}"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapred.reducer.new-api</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>wordcount2.WordCount2$TokenizerMapper</value>
                </property>
                <property>
                    <name>mapreduce.map.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapreduce.map.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapreduce.job.reduce.class</name>
                    <value>wordcount2.WordCount2$IntSumReducer</value>
                </property>
                <property>
                    <name>mapreduce.job.output.key.class</name>
                    <value>org.apache.hadoop.io.Text</value>
                </property>
                <property>
                    <name>mapreduce.job.output.value.class</name>
                    <value>org.apache.hadoop.io.IntWritable</value>
                </property>
                <property>
                    <name>mapreduce.input.fileinputformat.inputdir</name>
                    <value>${nameNode}/${oozieDataRoot}/${inputDir}</value>
                </property>
                <property>
                    <name>mapreduce.output.fileoutputformat.outputdir</name>
                    <value>${nameNode}/${oozieDataRoot}/${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
coordinator.xml (the frequency here uses Oozie's cron syntax, i.e. every 2 minutes):
<coordinator-app name="cron-coord" frequency="0/2 * * * *" start="${start}" end="${end}"
                 timezone="GMT+0800" xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
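A running coordinator can be stopped before its end time with -kill (the job id below is made up):
# oozie job -kill 0000001-170814123456789-oozie-oozi-C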
Homework:
workflow: multiple actions
Case:
- start node
- hive action: table result -> hdfs
- sqoop action: hdfs -> mysql
- end
- kill
Hive tables provide a set of virtual columns that make this kind of work easier.
INPUT__FILE__NAME: the name of the file a row's data came from
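For example (reusing the word_count table from the Hive Action section):
# hive -e "SELECT INPUT__FILE__NAME FROM default.word_count LIMIT 3"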