Azkaban Usage Tutorial

This tutorial walks through everyday use of Azkaban: logging in, creating projects and jobs, wiring jobs into flows with dependencies, uploading and running a flow, and scheduling it.
Log in
https://localhost:8443

Note that the scheme is https; the web server uses Jetty with SSL. Log in with the account and password azkaban/azkaban (the defaults, if you have not changed them).
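If you prefer scripting, logging in can also be done through Azkaban's AJAX API. A minimal sketch, assuming a default Azkaban 3.x install with its self-signed certificate (hence the -k flag):

# Authenticate; the JSON response contains a session.id used by later calls.
curl -k -X POST --data "action=login&username=azkaban&password=azkaban" https://localhost:8443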
The home page has four menus:

Projects: the most important part; you create a project here, and all flows run inside it.
Scheduling: shows scheduled tasks.
Executing: shows currently running tasks.
History: shows historical task runs.
This tutorial focuses on the Projects section.
First create a project, filling in its name and description, for example o2olog.
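This step can also be scripted through the AJAX API. A hedged sketch, assuming SESSION_ID holds the session.id returned at login:

# Create an empty project named o2olog.
curl -k -X POST "https://localhost:8443/manager?action=create" \
    --data "session.id=${SESSION_ID}&name=o2olog&description=o2o log flow"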
Inside the project, define a job as a .job file:

type=command
command=echo "data 2 hive"
That is a complete, simple job. type=command tells Azkaban to run the job with native unix commands, for example a built-in command or a shell script; other types exist as well and are described later.
A project rarely has only one job. Next we create several jobs that depend on each other, which is the main reason to adopt Azkaban.
Creating flows
As noted above, several jobs plus their dependencies make up a flow. To create a dependency, just set the dependencies parameter. For example: the data must be cleaned before it is imported into hive, uploaded before it is cleaned, and fetched from ftp before it is uploaded.
Define 5 jobs:

1. o2o_2_hive.job: load the cleaned data into the hive library
2. o2o_clean_data.job: call mr to clean the data on hdfs
3. o2o_up_2_hdfs.job: upload the files to hdfs
4. o2o_get_file_ftp1.job: get logs from ftp1
5. o2o_get_file_ftp2.job: get logs from ftp2
Dependencies: job 3 depends on jobs 4 and 5, job 2 depends on job 3, and job 1 depends on job 2; jobs 4 and 5 depend on nothing.
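Drawn as a dependency tree (each job listed under the job that depends on it), the flow is:

o2o_2_hive
    └── o2o_clean_data
            └── o2o_up_2_hdfs
                    ├── o2o_get_file_ftp1
                    └── o2o_get_file_ftp2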
o2o_2_hive.job:

type=command
# Executing an sh script is recommended: later you only maintain the script,
# while Azkaban just defines the workflow.
command=sh /job/o2o_2_hive.sh
dependencies=o2o_clean_data
o2o_clean_data.job:

type=command
# Executing an sh script is recommended, for the same reason as above.
command=sh /job/o2o_clean_data.sh
dependencies=o2o_up_2_hdfs
o2o_up_2_hdfs.job:

type=command
# The hadoop command must be available on the executor; writing the call into a
# shell script is recommended for easier maintenance later.
command=hadoop fs -put /data/*
# Multiple dependencies are separated with commas.
dependencies=o2o_get_file_ftp1,o2o_get_file_ftp2
o2o_get_file_ftp1.job:

type=command
command=wget "ftp://file1" -O /data/file1
o2o_get_file_ftp2.job:

type=command
command=wget "ftp://file2" -O /data/file2
The command can be any unix command or a python script (highly recommended). Pack the five .job files above into a zip package.
PS: to test the flow end to end, I changed every command above to echo followed by the original command.
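A minimal packaging sketch, assuming the five .job files sit in the current directory:

zip o2olog.zip o2o_2_hive.job o2o_clean_data.job o2o_up_2_hdfs.job \
    o2o_get_file_ftp1.job o2o_get_file_ftp2.job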
Upload:
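In the web UI this is the project page's Upload button; as a hedged AJAX API sketch (a multipart form, reusing the SESSION_ID from the login step):

# Upload the zip into the o2olog project.
curl -k -X POST https://localhost:8443/manager \
    --form "session.id=${SESSION_ID}" \
    --form 'ajax=upload' \
    --form 'project=o2olog' \
    --form 'file=@o2olog.zip;type=application/zip'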
After uploading, click o2o_2_hive to enter the flow; Azkaban names a flow after its final job, the one that no other job depends on.
At the top right are the options to execute the current flow once or to execute it on a schedule.
Flow View: graph view of the flow; individual jobs can be disabled and enabled here.
Notification: defines whether to send an email when the flow succeeds or fails.
Failure Options: defines how the remaining jobs are handled when a job fails.
Concurrent: settings for concurrent executions of the flow.
Flow Parameters: parameter settings.
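These options correspond to parameters of the AJAX executeFlow call. A hedged sketch (failureAction, for instance, accepts finishCurrent, cancelImmediately, or finishPossible):

# Trigger one run of the o2o_2_hive flow.
curl -k --get https://localhost:8443/executor \
    --data "session.id=${SESSION_ID}" \
    --data 'ajax=executeFlow' \
    --data 'project=o2olog' \
    --data 'flow=o2o_2_hive' \
    --data 'failureAction=finishCurrent'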
1. Execute once
Set the above parameters and click execute.
Green means success, blue means running, and red means failure. You can view each job's runtime, dependencies and logs, and click Details to see a job's running status, its detailed output, and any errors it reports.
2. Scheduled execution
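The original walkthrough configures the schedule in the UI. As a hedged alternative sketch, Azkaban 3.3+ also accepts a Quartz cron expression through the AJAX API; the expression below (01:00 every day) is only an illustration:

# Schedule the flow with a Quartz cron expression.
curl -k -X POST https://localhost:8443/schedule \
    --data "session.id=${SESSION_ID}" \
    --data 'ajax=scheduleCronFlow' \
    --data 'projectName=o2olog' \
    --data 'flow=o2o_2_hive' \
    --data-urlencode 'cronExpression=0 0 1 ? * *'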
Other job configuration options
A job can depend on an entire other flow, configured as:

type=flow
flow.name=first_flow
A job can also run several subcommands:
type=command
command=echo "hello"
command.1=echo "world"
You can configure how many times a failed job is retried and the interval between retries. For example, the ftp log fetch above can be set to retry 12 times, once every 5 minutes:
type=command
command=wget "ftp://file1" -O /data/file1
retries=12
# The unit is milliseconds: 300000 ms = 5 minutes.
retry.backoff=300000

This concludes the Azkaban usage tutorial.