How to manage Java thread pool and build a distributed Hadoop scheduling framework


2025-04-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

How do you manage a Java thread pool and build a distributed Hadoop scheduling framework? This article analyzes the problem and walks through a solution in detail, in the hope of giving readers who face it a simpler and more practical approach.

Java thread pool management and distributed Hadoop scheduling framework.

Threads are indispensable in everyday development. For example, Tomcat handles each servlet request on a thread; without threads, how could it serve many users at once? Yet many developers who are new to threads struggle with them. Building a simple threaded development framework that lets everyone move quickly from single-threaded to multi-threaded development is a fairly difficult project.

What exactly is a thread? First, let's look at what a process is. A process is a program executing in the system that can use memory, the processor, the file system, and other related resources. For example, QQ, Eclipse, and Tomcat are each executables that run as a process once started. Why do we need multithreading? Suppose each process could handle only one thing at a time: we could chat with only one person in QQ, we could not compile while writing code in Eclipse, and Tomcat could serve only one user request at a time. We would still be in the stone age. The purpose of multithreading is to let one process handle multiple things or requests at the same time. That is why QQ can hold several conversations at once, Eclipse can compile while we edit, and Tomcat can serve multiple user requests simultaneously.

With so many benefits, how do you turn a single-threaded program into a multi-threaded one? Different languages implement this differently. Java offers two ways: extending the java.lang.Thread class and implementing the java.lang.Runnable interface.

Let's look at an example. Suppose there are 100 pieces of data to process. First, the processing speed of a single thread:
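The two approaches can be sketched as follows. This is a minimal illustration, and the class names TwoWaysDemo, Worker, and Job as well as the runBoth helper are hypothetical and not from the article:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; all names here are illustrative assumptions.
class TwoWaysDemo {
    // Thread-safe record of which tasks ran.
    static final List<String> log = Collections.synchronizedList(new ArrayList<>());

    // Way 1: extend java.lang.Thread and override run().
    static class Worker extends Thread {
        @Override
        public void run() {
            log.add("thread-subclass");
        }
    }

    // Way 2: implement java.lang.Runnable and hand the object to a Thread.
    static class Job implements Runnable {
        @Override
        public void run() {
            log.add("runnable");
        }
    }

    // Starts one thread of each kind, waits for both, and returns the sorted log.
    static List<String> runBoth() {
        log.clear();
        Thread a = new Worker();
        Thread b = new Thread(new Job());
        a.start(); // start() runs run() on a new thread; calling run() directly would not
        b.start();
        try {
            a.join(); // wait for both threads to finish
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        List<String> result = new ArrayList<>(log);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runBoth());
    }
}
```

Implementing Runnable is often the more flexible choice, because the task class stays free to extend another class and the same object can later be handed to a thread pool.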

package thread;

import java.util.Vector;

public class OneMain {
    public static void main(String[] args) throws InterruptedException {
        Vector<Integer> list = new Vector<>();
        for (int i = 0; i < 100; i++) {
            list.add(i);
        }
        long start = System.currentTimeMillis();
        while (list.size() > 0) {
            int val = list.remove(0);
            // Simulate processing. The delay value was lost in the source;
            // 100 ms per item matches the roughly 10 s total reported below.
            Thread.sleep(100);
            System.out.println(val);
        }
        long end = System.currentTimeMillis();
        System.out.println("consumed " + (end - start) + " ms");
    }
    // consumed 10063 ms
}

Now let's look at the processing speed with multithreading, using 10 threads:


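package thread;

import java.util.Vector;
import java.util.concurrent.CountDownLatch;

public class MultiThread extends Thread {
    static Vector<Integer> list = new Vector<>();
    static CountDownLatch count = new CountDownLatch(10);

    public void run() {
        while (list.size() > 0) {
            try {
                int val = list.remove(0);
                System.out.println(val);
                Thread.sleep(100); // simulate processing
            } catch (Exception e) {
                // Another thread may empty the list between the size check and
                // remove(0), so an index-out-of-bounds error is possible. This
                // example only illustrates the idea, so the error is ignored.
            }
        }
        count.countDown(); // this thread is finished: decrement the latch by one
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 100; i++) {
            list.add(i);
        }
        long start = System.currentTimeMillis();
        for (int i = 0; i < 10; i++) {
            new MultiThread().start();
        }
        count.await(); // wait until all 10 threads have finished
        long end = System.currentTimeMillis();
        System.out.println("consumed " + (end - start) + " ms");
    }
    // consumed 1001 ms
}

Now you can see the benefit of threads: a single thread needs 10 s, while 10 threads need only 1 s, making full use of system resources to compute in parallel. This may create a misconception, though: does adding more threads always increase efficiency? The idea that more threads means higher performance is wrong. Everything has a suitable range, and going beyond it hurts.

Some background on computer hardware helps here. The CPU is the computing unit, and a thread needs it in order to execute. But there is only one of this resource, so threads compete for it. CPU scheduling generally resolves this competition with one of the following algorithms:

Queue (first come, first served). Whatever the task is, it waits its turn in the queue in order of arrival.

Time-slice rotation (round robin), which is also the oldest CPU scheduling algorithm. A time slice is set, and no task may use the CPU for longer than that. If a task exceeds its slice, it is suspended, its state is saved, and it is placed at the tail of the queue to wait for its next turn.

Priority. Tasks are assigned priorities; higher-priority tasks run first and the rest wait.

Each of the three algorithms has advantages and disadvantages. Real operating systems combine several of them so that high-priority tasks are handled first without letting them occupy the CPU indefinitely. On the hardware side, multi-core CPUs and multi-threaded CPUs also exist to improve efficiency.

It should now be clear that more threads mean a heavier scheduling load. The CPU has to schedule a large number of threads, which includes creating threads, destroying threads, and deciding whether a thread should be swapped out of the CPU or assigned to it. All of this consumes system resources, so we need a mechanism to manage this pile of thread resources in one place.

The thread pool was proposed to remove the cost of frequently creating and destroying threads. A thread pool creates a certain number of threads in advance and keeps them waiting to serve user tasks at any time, instead of creating them only when a user needs them. In Java development in particular, reducing the cost of garbage collection means reducing the frequent creation and destruction of objects.

In the past we implemented thread pools ourselves. Since JDK 1.5, however, the JDK has shipped the java.util.concurrent framework, which removes most of the repetitive work of building a thread pool framework. You can use Executors to create thread pools. The main factory methods are listed here and introduced later:

newCachedThreadPool creates a thread pool that caches and reuses threads.

newFixedThreadPool creates a thread pool with a fixed number of threads.

newScheduledThreadPool creates a thread pool that supports time-based scheduling.

Once you have a thread pool, several questions need to be considered:

How are threads managed, for example when creating a new task thread?

How are threads stopped and started?

Besides the fixed-interval timing of the scheduled mode, can a thread be started at an exact time, for example at 1 a.m.?

How are threads monitored? If a thread dies or terminates abnormally during execution, how do we find out?

To address these points, threads need to be managed centrally, which java.util.concurrent alone cannot do. The following is required:

Separate threads from business logic, and keep the business configuration in a table of its own.

Build a thread scheduling framework on top of java.util.concurrent, including management of thread state, an interface for stopping threads, a thread liveness heartbeat mechanism, and a module that logs thread exceptions.

Build a flexible timer component, adding the Quartz scheduling component to implement a precise timing system.

Combine this with the business configuration to build a thread pool task scheduling system in which thread tasks can be configured, added, monitored, scheduled, and managed.

The component diagram is: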
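As a rough sketch of the Executors approach described above, the 100-item example can be rebuilt on newFixedThreadPool. The class name PoolDemo, the processAll helper, the 10 ms simulated delay, and the use of ConcurrentLinkedQueue in place of Vector are all assumptions made for illustration. Waiting on each Future is one simple way to notice a worker that ended abnormally; it is not the monitoring framework outlined in the article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: the 100-item example on a fixed-size thread pool.
class PoolDemo {
    // Processes `items` integers on `threads` pooled workers; returns how many were handled.
    static int processAll(int items, int threads) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < items; i++) {
            queue.add(i);
        }
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    Integer val;
                    // poll() returns null once the queue is empty, so there is no
                    // gap between checking the size and removing an element.
                    while ((val = queue.poll()) != null) {
                        try {
                            Thread.sleep(10); // simulate processing (assumed delay)
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                            return;
                        }
                        processed.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                try {
                    f.get(); // blocks until the worker ends and surfaces its failure, if any
                } catch (ExecutionException e) {
                    System.err.println("worker failed: " + e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        } finally {
            pool.shutdown(); // pool threads are reused while the pool lives, then released here
        }
        return processed.get();
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int done = processAll(100, 10);
        System.out.println(done + " items in " + (System.currentTimeMillis() - start) + " ms");
    }
}
```

Compared with starting 10 Thread objects by hand, the pool owns the thread lifecycle, and the Future objects replace the CountDownLatch as the way to wait for completion.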

Is a thread scheduling framework on its own enough to handle large-scale computation? The answer is no. The resources of a single machine are limited: as mentioned above, the CPU is shared through time slicing, so more tasks simply mean longer queues. Even if CPUs are added, the number a single machine can hold is limited too. The thread pool framework therefore has to become a distributed task scheduling framework that can scale horizontally. For example, when one machine hits a resource bottleneck, deploying the scheduling framework and the business code on an additional machine immediately adds computing power. So how do we build it? As shown below:

The basic components of the scheduling framework described above remain unchanged, and the following components and functions are added:

By modifying the scheduling framework, your own thread tasks can be turned into MapReduce tasks and submitted to the Hadoop cluster.

The Hadoop cluster can call the business interfaces, built on Spring and iBATIS, to access the database.

The data Hadoop needs can be queried through Hive.

Hadoop can read from and write to HDFS and HBase.

Business data should be loaded into the Hive warehouse promptly.

Hive handles offline data, HBase handles frequently updated data, and HDFS is the underlying storage for both Hive and HBase. Regular files can also be stored on it.

With this, the transformation is essentially complete. Note, however, that the architecture must reduce the complexity of the programs developers write. Although Hadoop is introduced here, it stays hidden from developers inside the framework. Business processing classes can run either in stand-alone mode or on Hadoop, and can call Spring and iBATIS in both cases. This lowers the learning cost, and developers pick up the new skills gradually through real work.
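One way to picture how Hadoop can stay hidden from business developers is an interface split like the one below. This is only a sketch under stated assumptions: the article does not show its framework's API, so BusinessTask, TaskRunner, LocalRunner, and RunnerSketch are all hypothetical names, and only the stand-alone back end is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// All names below are hypothetical; they are not taken from the article.
class RunnerSketch {
    // What a business developer writes: plain logic, no threading or cluster code.
    interface BusinessTask {
        String process(String record);
    }

    // What the framework provides: one interface, several execution back ends.
    interface TaskRunner {
        List<String> run(BusinessTask task, List<String> records);
    }

    // Stand-alone back end built on a local thread pool.
    static class LocalRunner implements TaskRunner {
        private final int threads;

        LocalRunner(int threads) {
            this.threads = threads;
        }

        @Override
        public List<String> run(BusinessTask task, List<String> records) {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try {
                List<Future<String>> futures = new ArrayList<>();
                for (String r : records) {
                    Callable<String> call = () -> task.process(r);
                    futures.add(pool.submit(call));
                }
                List<String> out = new ArrayList<>();
                for (Future<String> f : futures) {
                    out.add(f.get()); // collected in input order
                }
                return out;
            } catch (ExecutionException e) {
                throw new IllegalStateException("task failed", e.getCause());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted", e);
            } finally {
                pool.shutdown();
            }
        }
    }

    public static void main(String[] args) {
        // A cluster-backed TaskRunner could be swapped in here without touching the task.
        TaskRunner runner = new LocalRunner(4);
        System.out.println(runner.run(String::toUpperCase, Arrays.asList("a", "b", "c")));
    }
}
```

Under this split, the business class depends only on BusinessTask, so choosing between stand-alone mode and a cluster becomes a deployment decision rather than a code change.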

