Azkaban Usage Tutorial

This tutorial walks through everyday use of Azkaban: logging in, creating projects and jobs, wiring jobs into flows with dependencies, uploading and running a flow, and scheduling it.
Log in
https://localhost:8443

Note that the scheme is https; the web server uses Jetty with SSL. Log in with the account and password azkaban/azkaban (the defaults, if you have not changed them).
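If you prefer scripting, logging in can also be done through Azkaban's AJAX API. A minimal sketch, assuming a default Azkaban 3.x install with its self-signed certificate (hence the -k flag):

# Authenticate; the JSON response contains a session.id used by later calls.
curl -k -X POST --data "action=login&username=azkaban&password=azkaban" https://localhost:8443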
The home page has four menus:

Projects: the most important part; you create a project here, and all flows run inside it.
Scheduling: shows scheduled tasks.
Executing: shows currently running tasks.
History: shows historical task runs.
This tutorial focuses on the Projects section.
First create a project, filling in its name and description, for example o2olog.
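This step can also be scripted through the AJAX API. A hedged sketch, assuming SESSION_ID holds the session.id returned at login:

# Create an empty project named o2olog.
curl -k -X POST "https://localhost:8443/manager?action=create" \
    --data "session.id=${SESSION_ID}&name=o2olog&description=o2o log flow"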
Inside the project, define a job as a .job file:

type=command
command=echo "data 2 hive"
That is a complete, simple job. type=command tells Azkaban to run the job with native unix commands, for example a built-in command or a shell script; other types exist as well and are described later.
A project rarely has only one job. Next we create several jobs that depend on each other, which is the main reason to adopt Azkaban.
Creating flows
As noted above, several jobs plus their dependencies make up a flow. To create a dependency, just set the dependencies parameter. For example: the data must be cleaned before it is imported into hive, uploaded before it is cleaned, and fetched from ftp before it is uploaded.
Define 5 jobs:

1. o2o_2_hive.job: load the cleaned data into the hive library
2. o2o_clean_data.job: call mr to clean the data on hdfs
3. o2o_up_2_hdfs.job: upload the files to hdfs
4. o2o_get_file_ftp1.job: get logs from ftp1
5. o2o_get_file_ftp2.job: get logs from ftp2
Dependencies: job 3 depends on jobs 4 and 5, job 2 depends on job 3, and job 1 depends on job 2; jobs 4 and 5 depend on nothing.
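Drawn as a dependency tree (each job listed under the job that depends on it), the flow is:

o2o_2_hive
    └── o2o_clean_data
            └── o2o_up_2_hdfs
                    ├── o2o_get_file_ftp1
                    └── o2o_get_file_ftp2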
o2o_2_hive.job:

type=command
# Executing an sh script is recommended: later you only maintain the script,
# while Azkaban just defines the workflow.
command=sh /job/o2o_2_hive.sh
dependencies=o2o_clean_data
o2o_clean_data.job:

type=command
# Executing an sh script is recommended, for the same reason as above.
command=sh /job/o2o_clean_data.sh
dependencies=o2o_up_2_hdfs
o2o_up_2_hdfs.job:

type=command
# The hadoop command must be available on the executor; writing the call into a
# shell script is recommended for easier maintenance later.
command=hadoop fs -put /data/*
# Multiple dependencies are separated with commas.
dependencies=o2o_get_file_ftp1,o2o_get_file_ftp2
o2o_get_file_ftp1.job:

type=command
command=wget "ftp://file1" -O /data/file1
o2o_get_file_ftp2.job:

type=command
command=wget "ftp://file2" -O /data/file2
The command can be any unix command or a python script (highly recommended). Pack the five .job files above into a zip package.
PS: to test the flow end to end, I changed every command above to echo followed by the original command.
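A minimal packaging sketch, assuming the five .job files sit in the current directory:

zip o2olog.zip o2o_2_hive.job o2o_clean_data.job o2o_up_2_hdfs.job \
    o2o_get_file_ftp1.job o2o_get_file_ftp2.job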
Upload:
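In the web UI this is the project page's Upload button; as a hedged AJAX API sketch (a multipart form, reusing the SESSION_ID from the login step):

# Upload the zip into the o2olog project.
curl -k -X POST https://localhost:8443/manager \
    --form "session.id=${SESSION_ID}" \
    --form 'ajax=upload' \
    --form 'project=o2olog' \
    --form 'file=@o2olog.zip;type=application/zip'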
After uploading, click o2o_2_hive to enter the flow; Azkaban names a flow after its final job, the one that no other job depends on.
At the top right are the options to execute the current flow once or to execute it on a schedule.
Flow View: graph view of the flow; individual jobs can be disabled and enabled here.
Notification: defines whether to send an email when the flow succeeds or fails.
Failure Options: defines how the remaining jobs are handled when a job fails.
Concurrent: settings for concurrent executions of the flow.
Flow Parameters: parameter settings.
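These options correspond to parameters of the AJAX executeFlow call. A hedged sketch (failureAction, for instance, accepts finishCurrent, cancelImmediately, or finishPossible):

# Trigger one run of the o2o_2_hive flow.
curl -k --get https://localhost:8443/executor \
    --data "session.id=${SESSION_ID}" \
    --data 'ajax=executeFlow' \
    --data 'project=o2olog' \
    --data 'flow=o2o_2_hive' \
    --data 'failureAction=finishCurrent'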
1. Execute once
Set the above parameters and click execute.
Green means success, blue means running, and red means failure. You can view each job's runtime, dependencies and logs, and click Details to see a job's running status, its detailed output, and any errors it reports.
2. Scheduled execution
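The original walkthrough configures the schedule in the UI. As a hedged alternative sketch, Azkaban 3.3+ also accepts a Quartz cron expression through the AJAX API; the expression below (01:00 every day) is only an illustration:

# Schedule the flow with a Quartz cron expression.
curl -k -X POST https://localhost:8443/schedule \
    --data "session.id=${SESSION_ID}" \
    --data 'ajax=scheduleCronFlow' \
    --data 'projectName=o2olog' \
    --data 'flow=o2o_2_hive' \
    --data-urlencode 'cronExpression=0 0 1 ? * *'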
Other job configuration options
A job can depend on an entire other flow, configured as:

type=flow
flow.name=first_flow
A job can also run several subcommands:
type=command
command=echo "hello"
command.1=echo "world"
You can configure how many times a failed job is retried and the interval between retries. For example, the ftp log fetch above can be set to retry 12 times, once every 5 minutes:
type=command
command=wget "ftp://file1" -O /data/file1
retries=12
# The unit is milliseconds: 300000 ms = 5 minutes.
retry.backoff=300000

This concludes the Azkaban usage tutorial.