2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
1. Introduction to workflow schedulers
(1) Why use a workflow scheduler?
- A complete data analysis system usually consists of a large number of task units: shell scripts, Java programs, MapReduce programs, Hive scripts, and so on.
- There are temporal and prerequisite dependencies among the task units.
- To organize such a complex execution plan, a workflow scheduling system is needed to schedule execution.
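To make the dependency point concrete, here is a minimal sketch (not part of the original article) that uses the POSIX tsort utility to derive a valid execution order from "prerequisite, dependent" pairs. The task names import, clean, aggregate, and report are hypothetical. A real scheduler does much more (timing, retries, logging), so treat this only as an illustration of dependency ordering.

```shell
#!/bin/sh
# Each line reads "A B": task A must finish before task B starts.
# The task names are hypothetical placeholders.
deps='import clean
clean aggregate
aggregate report'

# tsort prints the tasks in an order that respects every dependency.
order=$(printf '%s\n' "$deps" | tsort)
echo "$order"
```

Because these four tasks form a single chain, only one order satisfies all the pairs. With branching dependencies, tsort may print any one of several valid orders.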
(2) Common workflow schedulers
In the Hadoop ecosystem, common workflow schedulers include Oozie, Azkaban, Cascading, and Hamake.
(3) Comparison between Oozie and Azkaban
The two most popular schedulers in enterprises are Oozie and Azkaban. Generally speaking, compared with Azkaban, Oozie is a heavyweight task scheduling system with comprehensive functionality, but it is also more complex to configure and use. If you can live without some of that functionality, the lightweight scheduler Azkaban is a good candidate.
The differences between the two can be described in the following aspects:
Functionality
Both can schedule MapReduce, Pig, Java, and script workflow tasks, and both can run workflow tasks on a schedule.
Workflow definition
Azkaban uses properties files to define workflows.
Oozie uses XML files to define workflows.
Workflow parameter passing
Azkaban supports passing parameters directly, such as ${input}.
Oozie supports parameters and EL expressions, such as ${fs:dirSize(myInputDir)} (compare OGNL in Struts2).
Scheduled execution
Scheduled execution in Azkaban is based on time.
Scheduled execution in Oozie is based on time and on input data.
Resource management
Azkaban has strict permission control, such as user read / write / execute permissions on workflows.
Oozie currently has no strict permission control.
Workflow execution
Azkaban has two modes of operation: solo server mode (the executor server and web server are deployed on the same node) and multi server mode (the executor server and web server can be deployed on different nodes).
Oozie runs as a workflow server and supports multiple users and multiple workflows.
Workflow management
Azkaban supports operating workflows through the browser and through AJAX.
Oozie supports operating workflows through the command line, HTTP REST, the Java API, and the browser.
2. Azkaban installation and deployment
Azkaban is a batch workflow task scheduler open-sourced by LinkedIn. It is used to run a set of jobs and processes in a specific order within a workflow.
Functional features of Azkaban:
- Web user interface
- Easy uploading of workflows
- Easy setup of relationships between tasks
- Authentication / authorization for task flows
- Ability to kill and re-run tasks
- Modular and pluggable plug-in mechanism
- Logging and auditing of workflows and tasks
Installing Azkaban:
Installation packages:
Azkaban web server: azkaban-web-server-2.5.0.tar.gz
Azkaban executor server: azkaban-executor-server-2.5.0.tar.gz
Azkaban initialization script file: azkaban-sql-script-2.5.0.tar.gz
Download address: http://azkaban.github.io/downloads.html
① Unpack the installation packages
② Import the Azkaban SQL script
[root@hadoop03 ~]# tar -zxvf azkaban-sql-script-2.5.0.tar.gz -C apps/azkaban/
# Log in to MySQL and run the script:
mysql> create database azkaban;
Query OK, 1 row affected (0.01 sec)
mysql> use azkaban;
Database changed
mysql> source /home/hadoop/apps/azkaban/azkaban-script-2.5.0/create-all-sql-2.5.0.sql
③ Create the SSL configuration
# Preferably run this in the azkaban directory:
[root@hadoop03 ~]# keytool -keystore keystore -alias jetty -genkey -keyalg RSA
# After running this command, you will be prompted for the password of the keystore being generated and for the related certificate information. Remember the password you enter.
The keystore certificate file is then generated in the current directory. Copy it to the Azkaban web server root directory, for example:
[root@hadoop03 ~]# cp keystore azkaban/azkaban-web-2.5.0
④ Modify the configuration files
# First configure the time zone on the server node: generate the time zone file Asia/Shanghai (the interactive command tzselect can be used), then copy it over the system's local time zone configuration
[hadoop@hadoop03 ~]$ sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
# Azkaban web server configuration: enter the conf directory under the web server installation directory
[hadoop@hadoop03 ~]$ cd apps/azkaban/azkaban-web-2.5.0/conf/
# Modify the azkaban.properties file
# User configuration; see below for details
# Loader for projects
# Global configuration file location
executor.global.properties=/home/hadoop/apps/azkaban/azkaban-executor-2.5.0/conf/global.properties
azkaban.project.dir=projects
# Database type
database.type=mysql
# Port number
mysql.port=3306
# Database host
mysql.host=hadoop03
# Database instance name
mysql.database=azkaban
# Database username
mysql.user=root
# Database password
mysql.password=root
# Maximum number of connections
mysql.numconnections=100
# Velocity dev mode
velocity.dev.mode=false
# Jetty server attributes
# Maximum number of threads
jetty.maxThreads=25
# Jetty SSL port
jetty.ssl.port=8443
# Jetty port
jetty.port=8081
# SSL keystore file
jetty.keystore=/home/hadoop/apps/azkaban/azkaban-web-2.5.0/keystore
# SSL keystore password
jetty.password=hadoop
# Jetty key password, same as the keystore password
jetty.keypassword=hadoop
# SSL truststore file
jetty.truststore=/home/hadoop/apps/azkaban/azkaban-web-2.5.0/keystore
# SSL truststore password
jetty.trustpassword=hadoop
# Executor server properties
# Executor server port
executor.port=12321
# Mail settings (optional)
# Sender mailbox
mail.sender=xxxxxxxx@163.com
# SMTP address of the sender mailbox
mail.host=smtp.163.com
# Name used when sending email
mail.user=xxxxxxxx
# Password
mail.password=*
# Address to send mail to when a task fails
job.failure.email=xxxxxxxx@163.com
# Address to send mail to when a task succeeds
job.success.email=xxxxxxxx@163.com
lockdown.create.projects=false
# Cache directory
cache.directory=cache
# In the Azkaban web server conf directory, modify the azkaban-users.xml user configuration
# Azkaban executor server configuration: enter the conf directory under the executor server installation directory and modify azkaban.properties
# Azkaban
# Time zone
default.timezone.id=Asia/Shanghai
# Azkaban JobTypes plug-in configuration: location of the plug-ins
azkaban.jobtype.plugin.dir=/home/hadoop/apps/azkaban/azkaban-executor-2.5.0/plugins/jobtypes
# Loader for projects
executor.global.properties=/home/hadoop/apps/azkaban/azkaban-executor-2.5.0/conf/global.properties
azkaban.project.dir=projects
# Database settings
# Database type (currently only mysql is supported)
database.type=mysql
# Database port number
mysql.port=3306
# Database host
mysql.host=hadoop03
# Database instance name
mysql.database=azkaban
# Database username
mysql.user=root
# Database password
mysql.password=root
# Maximum number of connections
mysql.numconnections=100
# Executor server configuration
# Maximum number of threads
executor.maxThreads=50
# Port number (if you change it, keep it consistent with the web server)
executor.port=12321
# Number of flow threads
executor.flow.threads=30
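Since executor.port has to be the same in the web server and executor server copies of azkaban.properties, a small check can catch a mismatch before startup. The following is a minimal sketch under stated assumptions: get_prop and check_executor_port are hypothetical helper names, not Azkaban tools, and the parsing handles only plain key=value lines.

```shell
#!/bin/sh
# get_prop FILE KEY: print the value of KEY from a properties file.
# Hypothetical helper; only simple "key=value" lines are handled.
get_prop() {
    # Escape dots so "executor.port" is matched literally.
    key=$(printf '%s' "$2" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# check_executor_port WEB_FILE EXEC_FILE: succeed only if both files
# define the same, non-empty executor.port.
check_executor_port() {
    web_port=$(get_prop "$1" executor.port)
    exe_port=$(get_prop "$2" executor.port)
    if [ -n "$web_port" ] && [ "$web_port" = "$exe_port" ]; then
        echo "executor.port matches: $web_port"
    else
        echo "executor.port mismatch: web='$web_port' executor='$exe_port'" >&2
        return 1
    fi
}
```

With the directory layout used in this guide, it could be called as check_executor_port /home/hadoop/apps/azkaban/azkaban-web-2.5.0/conf/azkaban.properties /home/hadoop/apps/azkaban/azkaban-executor-2.5.0/conf/azkaban.properties.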
⑤ Configure environment variables
[hadoop@hadoop03 ~]$ vim /etc/profile
# /etc/profile
export AZKABAN_WEB_HOME=/home/hadoop/apps/azkaban/azkaban-web-2.5.0
export AZKABAN_EXE_HOME=/home/hadoop/apps/azkaban/azkaban-executor-2.5.0
export PATH=$PATH:$AZKABAN_WEB_HOME/bin:$AZKABAN_EXE_HOME/bin
⑥ Start the servers
# Start the web server
nohup azkaban-web-start.sh 1> /home/hadoop/azwebstd.out 2> /home/hadoop/azweberr.out &
# Start the executor server
nohup azkaban-executor-start.sh 1> /home/hadoop/azexstd.out 2> /home/hadoop/azexerr.out &
⑦ Verify that login works
Enter https://hadoop03:8443/ in the browser.
If the Azkaban web interface appears, the installation was successful.
It is recommended to use absolute paths in the Azkaban web and executor configuration files; otherwise you will often run into file-not-found errors.
3. Resolving Azkaban installation and deployment errors
A startup error can occur because a jar called derby.jar is missing from Azkaban's web server and executor server.
Solution: copy it from the installed JDK:
cp $JAVA_HOME/db/lib/derby.jar $AZKABAN_WEB_HOME/extlib
cp $JAVA_HOME/db/lib/derby.jar $AZKABAN_EXE_HOME/extlib
If you encounter permission problems with MySQL, please refer to the article at https://blog.51cto.com/14048416/2344516.
4. Basic use of Azkaban
① Create a job: command.job
# command.job
type=command
command=echo 'hello'
② Package the job resource files
[hadoop@hadoop03 ~]$ zip command.zip command.job
③ Create a project and upload the compressed package through the Azkaban web management platform
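A flow with more than one job is expressed by adding a dependencies line to a job file. The sketch below goes one step beyond the single-job example and is an illustration only: the job names first and second, the scratch directory, and the archive name flow.zip are hypothetical, while type, command, and dependencies are standard Azkaban job properties.

```shell
#!/bin/sh
# Build a two-job flow in a scratch directory (all names are hypothetical).
flow_dir=$(mktemp -d)

# first.job has no prerequisites.
cat > "$flow_dir/first.job" <<'EOF'
# first.job
type=command
command=echo 'first step'
EOF

# second.job names first as its dependency, so it runs after first.
cat > "$flow_dir/second.job" <<'EOF'
# second.job
type=command
dependencies=first
command=echo 'second step'
EOF

# Package the .job files at the top level of the archive, if zip is installed.
if command -v zip >/dev/null 2>&1; then
    (cd "$flow_dir" && zip -q flow.zip first.job second.job)
fi
ls "$flow_dir"
```

The resulting archive would be uploaded through the web management platform in the same way as the single-job package.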