Deployment and simple use of azkaban 07/06 Update SLTechnology News&Howtos

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

1. Introduction to workflow schedulers

(1) Why use a workflow scheduler?

- A complete data analysis system usually consists of a large number of task units: shell scripts, Java programs, MapReduce programs, Hive scripts, etc.

- Task units depend on each other in time and in order: some tasks can only start after others have finished.

- To organize such a complex execution plan, a workflow scheduling system is needed to schedule the execution.
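The dependency idea can be illustrated with the standard tsort utility: given "A must run before B" pairs, it prints an order that respects every dependency, which is the basic ordering problem a workflow scheduler solves before running jobs. This is a sketch with hypothetical task names, not a description of how Azkaban is implemented.

```shell
#!/usr/bin/env bash
# Illustration only: the task names below are hypothetical.
# Each line "A B" means "A must finish before B starts".
deps='ingest_logs clean_logs
clean_logs hive_aggregate
ingest_users hive_aggregate
hive_aggregate export_report'

# tsort prints a topological order: every task appears after all of its prerequisites.
order=$(printf '%s\n' "$deps" | tsort)
echo "$order"
```

When several tasks have no dependency on each other (here ingest_logs and ingest_users), any relative order between them is valid; a scheduler is free to run them in parallel.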

(2) Common workflow schedulers

In the Hadoop ecosystem, common workflow schedulers include Oozie, Azkaban, Cascading, Hamake, and so on.

(3) Comparison between Oozie and Azkaban

The two most popular schedulers in enterprises are Oozie and Azkaban. Generally speaking, compared with Azkaban, Oozie is a heavyweight task scheduling system with comprehensive features, but it is also more complex to configure and use. If you can do without some of those features, the lightweight scheduler Azkaban is a good candidate.

The differences between the two can be described in the following aspects:

Functionality

Both can schedule MapReduce, Pig, Java, and script workflow tasks, and both can execute workflow tasks on a schedule.

Workflow definition

Azkaban uses Properties files (.job files) to define workflows.

Oozie uses XML files to define workflows.

Workflow parameter passing

Azkaban supports passing parameters directly, such as ${input}.
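Azkaban's ${input} style can be sketched with a project that holds a shared .properties file and a job that references the parameter. The file names, the path value, and the job are hypothetical; the sketch assumes that, as in Azkaban 2.x, .properties files packaged in the project are inherited by the jobs in the same directory.

```shell
#!/usr/bin/env bash
# Sketch of Azkaban-style parameter passing (file names and values are hypothetical).
projdir=$(mktemp -d)

# Shared parameters: assumed to be visible to jobs in the same project directory.
cat > "$projdir/system.properties" <<'EOF'
input=/user/hadoop/wordcount/input
EOF

# The quoted heredoc ('EOF') keeps ${input} literal, so Azkaban, not the shell
# writing this file, performs the substitution when the job runs.
cat > "$projdir/print_input.job" <<'EOF'
type=command
command=echo ${input}
EOF

echo "project files written to $projdir"
```

Zipping both files together and uploading them as one project keeps the parameter definition next to the job that uses it.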

Oozie supports parameters and EL expressions, such as ${fs:dirSize(myInputDir)}, similar to OGNL expressions in Struts2.

Scheduled execution

Azkaban schedules tasks based on time.

Oozie schedules tasks based on time and on the availability of input data.

Permission control

Azkaban has strict permission control, such as users' read/write/execute permissions on workflows.

Oozie currently has no strict permission control.

Workflow execution

Azkaban has two modes of operation: solo server mode (the executor server and web server are deployed on the same node) and multi server mode (the executor server and web server can be deployed on different nodes).

Oozie runs as a workflow server and supports multiple users and workflows.

Workflow management

Azkaban supports managing workflows through the browser and through AJAX calls.

Oozie supports managing workflows through the command line, HTTP REST, the Java API, and the browser.
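As a sketch of the AJAX side: Azkaban's AJAX API accepts a login request (action=login with username and password posted to the web server) and answers with JSON containing a session.id that later AJAX calls reuse. The address assumes the hadoop03:8443 setup configured later in this article, the credentials are placeholders, and azkaban_login/extract_session_id are hypothetical helper names. The sed-based extraction is a simplification, not a JSON parser.

```shell
#!/usr/bin/env bash
# extract_session_id: read a JSON login response on stdin and print the value
# of "session.id" (a simple pattern match, not a full JSON parser).
extract_session_id() {
  sed -n 's/.*"session\.id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# azkaban_login URL USER PASSWORD: print the session id, or fail if none came back.
# -k skips certificate verification because the keystore is self-signed.
azkaban_login() {
  local id
  id=$(curl -k -s --data "action=login" \
         --data-urlencode "username=$2" --data-urlencode "password=$3" "$1" \
       | extract_session_id)
  [ -n "$id" ] && printf '%s\n' "$id"
}

# Usage (requires a running web server; credentials are placeholders):
#   SESSION_ID=$(azkaban_login https://hadoop03:8443 azkaban azkaban)
```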

2. Azkaban installation and deployment

Azkaban is a batch workflow job scheduler open-sourced by LinkedIn. It runs a set of jobs and processes in a specific order within a workflow.

Features of Azkaban

- A web user interface
- Easy uploading of workflows
- Easy setup of dependencies between tasks
- Authentication/authorization of task flows
- The ability to kill and rerun tasks
- A modular, pluggable plug-in mechanism
- Logging and auditing of workflows and tasks

Hands-on installation of Azkaban:

Installation packages:

Azkaban web server: azkaban-web-server-2.5.0.tar.gz

Azkaban executor server: azkaban-executor-server-2.5.0.tar.gz

Azkaban initialization SQL scripts: azkaban-sql-script-2.5.0.tar.gz

Download address: http://azkaban.github.io/downloads.html

① Unpack the installation packages
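Unpacking can be sketched as follows, assuming the tarballs sit in the current directory and go into /home/hadoop/apps/azkaban, the parent directory used by the configuration paths below (which suggest the archives unpack to azkaban-web-2.5.0 and azkaban-executor-2.5.0). unpack_all is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# unpack_all DEST TARBALL...: extract each gzipped tarball into DEST,
# stopping at the first failure.
unpack_all() {
  local dest=$1 tarball
  shift
  mkdir -p "$dest" || return 1
  for tarball in "$@"; do
    tar -zxf "$tarball" -C "$dest" || return 1
  done
}

# Usage:
#   unpack_all /home/hadoop/apps/azkaban \
#     azkaban-web-server-2.5.0.tar.gz azkaban-executor-server-2.5.0.tar.gz
```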

② Import the Azkaban SQL script

[root@hadoop03 ~]# tar -zxvf azkaban-sql-script-2.5.0.tar.gz -C apps/azkaban/
# enter MySQL and execute the script:
mysql> create database azkaban;
Query OK, 1 row affected (0.01 sec)
mysql> use azkaban;
Database changed
mysql> source /home/hadoop/apps/azkaban/azkaban-script-2.5.0/create-all-sql-2.5.0.sql

③ Create the SSL configuration

# preferably run this in the azkaban directory:
[root@hadoop03 ~]# keytool -keystore keystore -alias jetty -genkey -keyalg RSA

# After running this command you will be prompted for the keystore password and the certificate details. Remember the password: the jetty.password settings in azkaban.properties must match it.

A keystore file is then generated in the current directory. Copy it to the Azkaban web server root directory, for example:

[root@hadoop03 ~]# cp keystore azkaban/azkaban-web-2.5.0
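For scripted installs, the same keystore can be created without interactive prompts by passing the passwords and certificate fields on the command line. A sketch, assuming the password hadoop used in azkaban.properties below; the -dname values are placeholders and make_keystore is a hypothetical helper name. keytool refuses to overwrite an existing jetty alias, so delete an old keystore first.

```shell
#!/usr/bin/env bash
# make_keystore FILE PASSWORD: generate an RSA key pair under alias "jetty".
make_keystore() {
  local keystore=${1:-keystore} pass=${2:-hadoop}
  if ! command -v keytool >/dev/null 2>&1; then
    echo "keytool not found; install a JDK first" >&2
    return 1
  fi
  # -genkeypair is the current name of the -genkey command used above.
  # Store and key passwords are kept equal, matching jetty.password/jetty.keypassword.
  keytool -genkeypair -keystore "$keystore" -alias jetty -keyalg RSA \
    -keysize 2048 -validity 3650 -storepass "$pass" -keypass "$pass" \
    -dname "CN=hadoop03, OU=dev, O=example, L=Shanghai, ST=Shanghai, C=CN"
}

# Usage: make_keystore keystore hadoop
```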

④ Modify the configuration files

# First set the server node's time zone to Asia/Shanghai. The interactive command tzselect helps you find the zone name; then copy the time zone file over the system's local time zone configuration:
[hadoop@hadoop03 ~]$ sudo cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
# Azkaban web server configuration: enter the conf directory of the web server installation directory
[hadoop@hadoop03 ~]$ cd apps/azkaban/azkaban-web-2.5.0/conf/

# modify the azkaban.properties file (keep each # comment on its own line)

# user configuration: see the following
# Loader for projects
# global configuration file location
executor.global.properties=/home/hadoop/apps/azkaban/azkaban-executor-2.5.0/conf/global.properties
azkaban.project.dir=projects
# database type
database.type=mysql
# port number
mysql.port=3306
# database host
mysql.host=hadoop03
# database instance name
mysql.database=azkaban
# database username
mysql.user=root
# database password
mysql.password=root
# maximum number of connections
mysql.numconnections=100
# Velocity dev mode
velocity.dev.mode=false
# Jetty server properties
# maximum number of threads
jetty.maxThreads=25
# Jetty SSL port
jetty.ssl.port=8443
# Jetty port
jetty.port=8081
# SSL keystore file
jetty.keystore=/home/hadoop/apps/azkaban/azkaban-web-2.5.0/keystore
# SSL keystore password
jetty.password=hadoop
# Jetty key password, same as the keystore password
jetty.keypassword=hadoop
# SSL truststore file
jetty.truststore=/home/hadoop/apps/azkaban/azkaban-web-2.5.0/keystore
# SSL truststore password
jetty.trustpassword=hadoop
# executor server properties
# executor server port
executor.port=12321
# mail settings (optional)
# sender address
mail.sender=xxxxxxxx@163.com
# SMTP server of the sender mailbox
mail.host=smtp.163.com
# sender account name
mail.user=xxxxxxxx
# password used when sending email
mail.password=*
# address notified when a task fails
job.failure.email=xxxxxxxx@163.com
# address notified when a task succeeds
job.success.email=xxxxxxxx@163.com
lockdown.create.projects=false
# cache directory
cache.directory=cache
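Java .properties files treat # (or !) as a comment marker only at the start of a line, so a line such as executor.port=12321 # port would set executor.port to the whole string "12321 # port". A small lint sketch that flags such lines; find_inline_comments is a hypothetical helper name, and it may also flag a value that legitimately contains " #".

```shell
#!/usr/bin/env bash
# find_inline_comments FILE: print "lineno:line" for key=value lines whose value
# contains whitespace followed by '#'. Exit status 0 means something was found.
find_inline_comments() {
  grep -nE '^[[:space:]]*[^#![:space:]][^=]*=.*[[:space:]]#' "$1"
}

# Usage:
#   if find_inline_comments conf/azkaban.properties; then
#     echo "move these comments onto their own lines" >&2
#   fi
```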

# Enter the conf directory of the Azkaban web server and modify the azkaban-users.xml user configuration
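As a sketch of what azkaban-users.xml can contain, assuming the XmlUserManager format of Azkaban 2.x: the azkaban and metrics entries mirror the default file shipped with the distribution, while the admin user and its password are hypothetical additions. write_users_xml is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# write_users_xml FILE: write a user configuration with one extra admin user.
write_users_xml() {
  cat > "${1:-azkaban-users.xml}" <<'EOF'
<azkaban-users>
  <user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
  <user username="metrics" password="metrics" roles="metrics" />
  <user username="admin" password="admin" roles="admin,metrics" />
  <role name="admin" permissions="ADMIN" />
  <role name="metrics" permissions="METRICS" />
</azkaban-users>
EOF
}

# Usage (inside the web server conf directory): write_users_xml azkaban-users.xml
```

Change the hypothetical admin password before exposing the web UI.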

# Azkaban executor server configuration: enter the conf directory of the executor server installation directory and modify azkaban.properties

# Azkaban
# time zone
default.timezone.id=Asia/Shanghai
# Azkaban JobTypes plug-in configuration
# location of the plug-ins
azkaban.jobtype.plugin.dir=/home/hadoop/apps/azkaban/azkaban-executor-2.5.0/plugins/jobtypes
# Loader for projects
executor.global.properties=/home/hadoop/apps/azkaban/azkaban-executor-2.5.0/conf/global.properties
azkaban.project.dir=projects
# Database settings
# database type (currently only mysql is supported)
database.type=mysql
# database port
mysql.port=3306
# database host
mysql.host=hadoop03
# database instance name
mysql.database=azkaban
# database user name
mysql.user=root
# database password
mysql.password=root
# maximum number of connections
mysql.numconnections=100
# Executor server configuration
# maximum number of threads
executor.maxThreads=50
# port number (if you change it, keep it consistent with the web server)
executor.port=12321
# number of flow threads
executor.flow.threads=30
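The executor settings require executor.port to match the web server's value. A small consistency check, assuming simple key=value lines (no line continuations, no ':' separators, no escapes); prop_get and check_executor_port are hypothetical helper names.

```shell
#!/usr/bin/env bash
# prop_get FILE KEY: print the value of KEY from a simple .properties file.
prop_get() {
  awk -F= -v k="$2" '
    /^[ \t]*[#!]/ { next }                      # skip comment lines
    { key = $1; gsub(/^[ \t]+|[ \t]+$/, "", key) }
    key == k { sub(/^[^=]*=[ \t]*/, ""); print; exit }
  ' "$1"
}

# check_executor_port WEB_CONF EXEC_CONF: succeed only if both files declare
# the same non-empty executor.port.
check_executor_port() {
  local web exe
  web=$(prop_get "$1" executor.port)
  exe=$(prop_get "$2" executor.port)
  if [ -n "$web" ] && [ "$web" = "$exe" ]; then
    echo "executor.port OK ($web)"
  else
    echo "executor.port mismatch: web='$web' executor='$exe'" >&2
    return 1
  fi
}

# Usage:
#   check_executor_port "$AZKABAN_WEB_HOME/conf/azkaban.properties" \
#                       "$AZKABAN_EXE_HOME/conf/azkaban.properties"
```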

⑤ Configure environment variables

[hadoop@hadoop03 ~]$ sudo vim /etc/profile
# /etc/profile
export AZKABAN_WEB_HOME=/home/hadoop/apps/azkaban/azkaban-web-2.5.0
export AZKABAN_EXE_HOME=/home/hadoop/apps/azkaban/azkaban-executor-2.5.0
export PATH=$PATH:$AZKABAN_WEB_HOME/bin:$AZKABAN_EXE_HOME/bin
# reload the profile so the variables take effect in the current shell
[hadoop@hadoop03 ~]$ source /etc/profile
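When the setup is scripted and may run more than once, the exports can be added only if they are not already present. A sketch; append_once and add_azkaban_env are hypothetical helper names, and writing /etc/profile requires root.

```shell
#!/usr/bin/env bash
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
append_once() {
  local file=$1 line=$2
  # -x: whole-line match; -F: fixed string, so $PATH etc. are not regex syntax.
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# add_azkaban_env FILE: add the three exports from step ⑤ to FILE.
add_azkaban_env() {
  local profile=${1:-/etc/profile}
  append_once "$profile" 'export AZKABAN_WEB_HOME=/home/hadoop/apps/azkaban/azkaban-web-2.5.0'
  append_once "$profile" 'export AZKABAN_EXE_HOME=/home/hadoop/apps/azkaban/azkaban-executor-2.5.0'
  append_once "$profile" 'export PATH=$PATH:$AZKABAN_WEB_HOME/bin:$AZKABAN_EXE_HOME/bin'
}

# Usage (as root): add_azkaban_env /etc/profile, then source /etc/profile
```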

⑥ Start the servers

# start the web server

nohup azkaban-web-start.sh 1>/home/hadoop/azwebstd.out 2>/home/hadoop/azweberr.out &

# start the executor server

nohup azkaban-executor-start.sh 1>/home/hadoop/azexstd.out 2>/home/hadoop/azexerr.out &
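Because both servers run in the background under nohup, a script may want to wait until they accept connections before continuing. A bash-only sketch that relies on bash's /dev/tcp redirection and the coreutils timeout command; the ports are the executor.port (12321) and jetty.ssl.port (8443) values configured above, and wait_for_port is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# wait_for_port HOST PORT [TRIES]: poll once per second until a TCP connection
# to HOST:PORT succeeds; return 1 after TRIES failed attempts.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30} i
  for ((i = 0; i < tries; i++)); do
    # Open the socket in a child bash; timeout bounds hangs on filtered ports.
    if timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_port hadoop03 12321 && wait_for_port hadoop03 8443 && echo "Azkaban is up"
```

If a port never opens, the azweberr.out and azexerr.out files written by the nohup commands are the first place to look.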

⑦ Verify that you can log in

Enter https://hadoop03:8443/ in the browser.

If the Azkaban login page appears, the installation succeeded.

It is recommended to use absolute paths in the Azkaban web and executor configuration files; otherwise you will often run into file-not-found errors.

3. Azkaban installation and deployment error resolution

A common cause of startup failure is that the derby.jar package is missing from Azkaban's web server and executor.

Solution: copy derby.jar from the installed JDK (Oracle JDK 8 and earlier bundle it under $JAVA_HOME/db/lib):

cp $JAVA_HOME/db/lib/derby.jar $AZKABAN_WEB_HOME/extlib
cp $JAVA_HOME/db/lib/derby.jar $AZKABAN_EXE_HOME/extlib
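The same fix with a guard, since not every JDK ships $JAVA_HOME/db (many OpenJDK packages and JDK 9+ do not). It assumes the AZKABAN_WEB_HOME and AZKABAN_EXE_HOME variables from step ⑤; copy_derby is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# copy_derby: copy derby.jar from the JDK into both extlib directories,
# failing early with a message if the JDK has no bundled Java DB.
copy_derby() {
  local src="$JAVA_HOME/db/lib/derby.jar" dir
  if [ ! -f "$src" ]; then
    echo "derby.jar not found at $src" >&2
    return 1
  fi
  for dir in "$AZKABAN_WEB_HOME/extlib" "$AZKABAN_EXE_HOME/extlib"; do
    mkdir -p "$dir" && cp "$src" "$dir/" || return 1
  done
}

# Usage: copy_derby
```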

If you run into MySQL permission problems:

See the article at https://blog.51cto.com/14048416/2344516.

4. Simple use of Azkaban

① Create a job: command.job

# command.job
type=command
command=echo 'hello'

② Package the job resource files

[hadoop@hadoop03 ~]$ zip command.zip command.job
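Steps ① and ② can be scripted together. A sketch that works in a fresh temporary directory (an arbitrary choice for the example) and only calls zip if it is installed.

```shell
#!/usr/bin/env bash
# Write command.job and package it as command.zip in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# type=command tells Azkaban to run the "command" value as a shell command.
cat > command.job <<'EOF'
# command.job
type=command
command=echo 'hello'
EOF

# The archive name is arbitrary; the upload only needs a .zip containing the .job file.
if command -v zip >/dev/null 2>&1; then
  zip -q command.zip command.job
else
  echo "zip is not installed" >&2
fi
echo "created $workdir"
```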

③ Create a project and upload the zip package through the Azkaban web management platform
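A project with more than one job expresses order through the dependencies property of a .job file. The sketch below, with hypothetical job names foo and bar, builds such a two-job project in a temporary directory; in Azkaban 2.x the resulting flow is typically named after its final job (here bar). It uploads the same way as the single-job zip.

```shell
#!/usr/bin/env bash
# Two-job flow: bar depends on foo (job names and commands are hypothetical).
projdir=$(mktemp -d)

cat > "$projdir/foo.job" <<'EOF'
type=command
command=echo 'foo runs first'
EOF

# dependencies lists the jobs that must finish successfully before bar starts.
cat > "$projdir/bar.job" <<'EOF'
type=command
dependencies=foo
command=echo 'bar runs after foo'
EOF

# Package both jobs into one archive for upload (skipped if zip is missing).
if command -v zip >/dev/null 2>&1; then
  (cd "$projdir" && zip -q flow.zip foo.job bar.job)
fi
echo "project files written to $projdir"
```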
