2025-01-25 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how the local Jar package mode of Spring Batch remote partitioning works and walks through the code needed to set it up.
1 Preface
Spring Batch remote partitioning is well suited to processing large volumes of data. It can be implemented in several ways, such as the local Jar package mode, the MQ mode, and the Kubernetes mode:
(1) Local Jar package mode: each worker that processes a partition is a Java process launched from a jar package; parameters are passed to it through command-line arguments and the database. Official sample code is available.
(2) MQ mode: the worker is a resident process, and the Manager and Worker exchange parameters through message queues. There is plenty of sample code for this mode online.
(3) Kubernetes mode: the worker is a Pod in K8s, and the Manager starts the Pod directly to do the processing. I found no sample code for this mode online.
This article explains the first mode (local Jar package mode) through code; the other two will be covered later.
The following articles are recommended as background reading:
Introduction to Spring Batch: an introduction, through examples, to Spring Batch, an excellent batch processing framework
Introduction to Spring Batch parallel processing: large data volumes are not a problem; a preliminary study of the four modes of Spring Batch parallel processing
2 Code explanation
In the code for this article, the Manager and the Worker live in the same project, so we only need to build one jar package. We use a Spring Profile to distinguish the manager from the worker, so that one code base carries two pieces of logic. The code could be split into two projects, but keeping it together makes testing easier, and the amount of code is small enough that splitting is not necessary.
2.1 Project preparation
2.1.1 Database
First, we need a database. Both the Manager and the Worker synchronize their state to the DB, so an embedded in-memory database will not work; the database must be reachable from separate processes. I am using H2 Database in server mode here. For installation, see the article "Deploy H2 database from jar package to Kubernetes, and solve the problem that Ingress does not support TCP".
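2.1.2 Adding dependencies
The Maven dependencies are as follows:
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-batch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-task</artifactId>
    </dependency>
    <dependency>
        <groupId>com.h2database</groupId>
        <artifactId>h2</artifactId>
        <scope>runtime</scope>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-deployer-local</artifactId>
        <version>2.4.1</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.batch</groupId>
        <artifactId>spring-batch-integration</artifactId>
    </dependency>
spring-cloud-deployer-local is the critical one: it deploys and launches the workers. The rest are the Spring Batch and Spring Cloud Task dependencies plus the database driver.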
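2.1.3 Main class
The Spring Boot main class is as follows:
    import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.task.configuration.EnableTask;

    @EnableTask
    @SpringBootApplication
    @EnableBatchProcessing
    public class PkslowRemotePartitionJar {
        public static void main(String[] args) {
            SpringApplication.run(PkslowRemotePartitionJar.class, args);
        }
    }
On top of Spring Boot, @EnableBatchProcessing and @EnableTask add support for Spring Batch and Spring Cloud Task.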
2.2 Key code
With the database and project setup out of the way, we can write the key code.
2.2.1 Partition management: Partitioner
The Partitioner is a core bean in remote partitioning. It defines how many partitions there are, how the work is divided, and which variables are passed to each worker. It returns a set of key-value pairs, namely a Map<String, ExecutionContext> keyed by partition name. Put the variables to be passed to the worker into the ExecutionContext, which supports several value types such as String, int, and long. We do not recommend passing much data through the ExecutionContext; pass identifiers or primary keys instead, and let the worker fetch the data itself.
The code is as follows:
    private static final int GRID_SIZE = 4;

    @Bean
    public Partitioner partitioner() {
        return new Partitioner() {
            @Override
            public Map<String, ExecutionContext> partition(int gridSize) {
                Map<String, ExecutionContext> partitions = new HashMap<>(gridSize);
                // The loop uses the GRID_SIZE constant, not the gridSize argument
                for (int i = 0; i < GRID_SIZE; i++) {
                    ExecutionContext executionContext = new ExecutionContext();
                    executionContext.put("partitionNumber", i);
                    partitions.put("partition" + i, executionContext);
                }
                return partitions;
            }
        };
    }
The code above creates four partitions, so the program launches a worker four times, once per partition. The parameter passed to each worker is partitionNumber.
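The advice above about passing only identifiers can be sketched in plain Java. This is a minimal illustration, not part of the original project: the class name PartitionRanges and the keys minId and maxId are hypothetical, and ordinary maps stand in for Spring Batch's ExecutionContext so that the sketch runs without Spring. It splits an inclusive ID range into one sub-range per partition; each worker would then query its own rows by range.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits an inclusive ID range into per-partition sub-ranges.
// Plain maps stand in for Spring Batch's ExecutionContext (an assumption of this sketch).
class PartitionRanges {

    static Map<String, Map<String, Long>> split(long minId, long maxId, int gridSize) {
        if (gridSize <= 0 || maxId < minId) {
            throw new IllegalArgumentException("gridSize must be positive and maxId >= minId");
        }
        Map<String, Map<String, Long>> partitions = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long base = total / gridSize;       // minimum number of IDs per partition
        long remainder = total % gridSize;  // the first `remainder` partitions get one extra ID
        long start = minId;
        for (int i = 0; i < gridSize; i++) {
            long size = base + (i < remainder ? 1 : 0);
            if (size == 0) {
                break; // fewer IDs than partitions: stop creating empty partitions
            }
            Map<String, Long> context = new LinkedHashMap<>();
            context.put("minId", start);
            context.put("maxId", start + size - 1);
            partitions.put("partition" + i, context);
            start += size;
        }
        return partitions;
    }

    public static void main(String[] args) {
        split(1, 10, 4).forEach((name, ctx) -> System.out.println(name + " -> " + ctx));
    }
}
```

In a real Partitioner, the two bounds would be put into each ExecutionContext in the same way that partitionNumber is, which keeps the data exchanged through the database small.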
2.2.2 Partition processor: PartitionHandler
The PartitionHandler is also a core bean. It determines how the workers are launched and which command-line arguments are passed to them (a separate channel from the ExecutionContext described above).
    @Bean
    public PartitionHandler partitionHandler(TaskLauncher taskLauncher, JobExplorer jobExplorer,
            TaskRepository taskRepository) throws Exception {
        // Location of the worker jar
        Resource resource = this.resourceLoader.getResource(workerResource);

        DeployerPartitionHandler partitionHandler =
                new DeployerPartitionHandler(taskLauncher, jobExplorer, resource, "workerStep", taskRepository);

        // Arguments passed to each worker process on launch
        List<String> commandLineArgs = new ArrayList<>(3);
        commandLineArgs.add("--spring.profiles.active=worker");
        commandLineArgs.add("--spring.cloud.task.initialize-enabled=false");
        commandLineArgs.add("--spring.batch.initializer.enabled=false");
        partitionHandler.setCommandLineArgsProvider(new PassThroughCommandLineArgsProvider(commandLineArgs));
        partitionHandler.setEnvironmentVariablesProvider(new SimpleEnvironmentVariablesProvider(this.environment));
        partitionHandler.setMaxWorkers(2);
        partitionHandler.setApplicationName("PkslowWorkerJob");
        return partitionHandler;
    }
In the above code:
resource is the location of the worker jar package, that is, the program that will be launched.
workerStep is the step that the worker will execute.
commandLineArgs defines the arguments used to start the worker, such as --spring.profiles.active=worker.
environment holds the system environment variables of the manager; passing them on to the worker is optional.
maxWorkers is the maximum number of workers that can run at the same time, similar to a thread pool size. It is set to 2 here, so at most two workers run concurrently to handle the four partitions.
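The thread pool comparison for maxWorkers can be made concrete with a small plain-Java analogy. This is only an illustration under stated assumptions: DeployerPartitionHandler launches separate Java processes, whereas this sketch uses threads, and the class name MaxWorkersDemo is hypothetical. It runs four simulated partitions on a pool of two threads and reports the highest number that ran at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Analogy only: threads stand in for the worker processes that the real handler launches.
class MaxWorkersDemo {

    // Runs `partitions` simulated tasks with at most `maxWorkers` active at once
    // and returns the highest concurrency that was observed.
    static int run(int partitions, int maxWorkers) {
        ExecutorService pool = Executors.newFixedThreadPool(maxWorkers);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < partitions; i++) {
                final int partitionNumber = i;
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(100); // simulated partition work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                    System.out.println("finished partition " + partitionNumber);
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every partition to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + run(4, 2));
    }
}
```

With four partitions and a limit of two, the remaining partitions wait until a slot frees up, which matches the behaviour described for maxWorkers above.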
2.2.3 Batch definitions for Manager and Worker
With the partition-related code in place, what remains is the business code for the Manager and the Worker.
The Manager only coordinates, so it needs little business logic. The code is as follows:
    @Bean
    @Profile("!worker")
    public Job partitionedJob(PartitionHandler partitionHandler) throws Exception {
        // A random suffix gives each run a new job name
        Random random = new Random();
        return this.jobBuilderFactory.get("partitionedJob" + random.nextInt())
                .start(step1(partitionHandler))
                .build();
    }

    @Bean
    public Step step1(PartitionHandler partitionHandler) throws Exception {
        return this.stepBuilderFactory.get("step1")
                .partitioner(workerStep().getName(), partitioner())
                .step(workerStep())
                .partitionHandler(partitionHandler)
                .build();
    }
The Worker does the actual data processing and holds our business code. Here we demonstrate how to obtain the partitionNumber passed by the Manager:
    @Bean
    public Step workerStep() {
        return this.stepBuilderFactory.get("workerStep")
                .tasklet(workerTasklet(null))
                .build();
    }

    @Bean
    @StepScope
    public Tasklet workerTasklet(
            final @Value("#{stepExecutionContext['partitionNumber']}") Integer partitionNumber) {
        return new Tasklet() {
            @Override
            public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) throws Exception {
                // Add a delay so the effect can be observed: in jar mode,
                // jps shows a new Java process for each running worker
                Thread.sleep(6000);
                System.out.println("This tasklet ran partition: " + partitionNumber);
                return RepeatStatus.FINISHED;
            }
        };
    }
The worker obtains the variable passed by the Manager through the expression @Value("#{stepExecutionContext['partitionNumber']}"). Note that the bean must be annotated with @StepScope; the null argument in workerStep() is only a placeholder that is replaced with the real value when the step runs.
3 Running the program
The Manager and the Worker share the same code, so we need to build the jar first; otherwise the Manager cannot start. The database and the location of the Worker jar package are configured as follows:
    spring.datasource.url=jdbc:h2:tcp://localhost:9092/test
    spring.datasource.username=pkslow
    spring.datasource.password=pkslow
    spring.datasource.driver-class-name=org.h2.Driver
    pkslow.worker.resource=file://pkslow/target/remote-partitioning-jar-1.0-SNAPSHOT.jar
The execution output is as follows:
You can see that the Java program has been launched four times, and the log path of each launch is also given.
Checking with the jps command while the job is running, you can see one Manager process and two worker processes:
4 Passing complex variables
As mentioned earlier, the Manager can pass simple variables such as String and long through the ExecutionContext. It can also pass complex Java objects, provided the corresponding classes are serializable, for example:
    import java.io.Serializable;

    public class Person implements Serializable {
        private Integer age;
        private String name;
        private String webSite;
        // constructors, getters and setters
    }
Manager sends:
    executionContext.put("person", new Person(0, "pkslow", "www.pkslow.com"));
Worker receives:
    @Value("#{stepExecutionContext['person']}") Person person
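Why the class needs to be serializable can be illustrated with a round trip through Java serialization. This is a sketch under assumptions: it assumes the execution context is stored using standard Java serialization, while the serializer actually used depends on the Spring Batch version and configuration. The constructor and getters on Person, and the class SerializationRoundTrip, are hypothetical additions for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Same fields as the Person class above; constructor and getters added for the demo.
class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Integer age;
    private final String name;
    private final String webSite;

    Person(Integer age, String name, String webSite) {
        this.age = age;
        this.name = name;
        this.webSite = webSite;
    }

    Integer getAge() { return age; }
    String getName() { return name; }
    String getWebSite() { return webSite; }
}

// Hypothetical helper that mimics "manager writes, worker reads" with a byte array.
class SerializationRoundTrip {

    // What the sending side does conceptually: turn the object into bytes.
    static byte[] toBytes(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // What the receiving side does conceptually: rebuild the object from bytes.
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Person copy = (Person) fromBytes(toBytes(new Person(0, "pkslow", "www.pkslow.com")));
        System.out.println(copy.getName() + " " + copy.getWebSite());
    }
}
```

If Person did not implement Serializable, writeObject would fail with a NotSerializableException, which is the kind of failure the requirement above guards against.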
That covers the local Jar package mode of Spring Batch remote partitioning: a Partitioner that defines the partitions, a PartitionHandler that launches the worker processes, and Manager and Worker batch definitions separated by a Spring Profile.