
How to Deal with Java High Concurrency


This article focuses on how to deal with high concurrency in Java; interested readers may wish to take a look. The methods introduced here are simple, fast, and practical. Let the editor take you through how to deal with high concurrency in Java.

Catalogue

Ten core technologies that must be mastered in high-performance development

I/O optimization: zero-copy technology

I/O optimization: multiplexing technology

Thread pool technology

Lock-free programming technology

Interprocess communication technology

Scale-out (horizontal scaling)

Caching

Async

High-performance, high-availability, high-scalability solutions

High-performance practical solutions

Highly available practical solutions

Highly scalable practical solutions

Summary

Ten core technologies that must be mastered in high-performance development

Step by step, we work up through multiple levels (memory, disk I/O, network I/O, CPU, cache, architecture, and algorithms) and connect the ten core technologies that must be mastered in high-performance development.

- I/O optimization: zero-copy technology
- I/O optimization: multiplexing technology
- Thread pool technology
- Lock-free programming technology
- Interprocess communication technology
- RPC & serialization technology
- Database indexing technology
- Caching technology & Bloom filters
- Full-text search technology
- Load balancing technology

I/O optimization: zero-copy technology

To read a file from disk and then send the data over the network, the data traditionally has to be copied four times on its way from disk to network, and the CPU itself has to perform two of those copies.

Recently I was studying the underlying design of nginx and happened to come across this; it could be added to the nginx series later.

Zero-copy technology frees the CPU: the file data is sent directly from the kernel, without first being copied into an application buffer, which would be a waste of resources.

Linux API:

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

The function name already explains what it does: send a file. Specify the file descriptor to read from and the network socket descriptor to send to, and one call does the job.
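In Java, a comparable zero-copy path is available through NIO's FileChannel.transferTo, which delegates to sendfile on Linux where possible. A minimal sketch, with the file name and target address as placeholders:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        Path file = Path.of("data.bin"); // placeholder file
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ);
             SocketChannel out = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0;
            long remaining = in.size();
            // transferTo may move fewer bytes than requested, so loop until the file is sent.
            while (remaining > 0) {
                long sent = in.transferTo(position, remaining, out);
                position += sent;
                remaining -= sent;
            }
        }
    }
}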

I/O optimization: multiplexing technology

With one thread per connection, each thread blocks in recv waiting for the peer's request. The more visitors there are, the more threads are opened; a large number of threads sit blocked, and the whole system slows down.

At this point you need multiplexing: with the select model, all the waiting (accept, recv) happens in the main thread, and the worker threads no longer have to wait.

After a while, more and more people visit the site, even select starts to be overwhelmed, and the boss keeps asking you to optimize performance.

At this point, you need to upgrade the multiplexing model to epoll.

Select has three disadvantages and epoll has three advantages.

- select manages socket descriptors with an array under the hood, so it can only handle a few thousand at the same time; epoll manages them with a tree and a linked list, so it can handle a very large number simultaneously.
- select does not tell you which socket has data; you have to ask each one in turn. epoll tells you directly which sockets are ready, with no need to poll.
- select copies the socket list back and forth between user space and kernel space on every system call, which is wasteful when select is called in a loop. epoll keeps the socket descriptors managed inside the kernel, so they do not have to be copied back and forth.

With epoll multiplexing in place (call it version 3.0 of the site), your website can handle many user requests at the same time.
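Java exposes this kind of multiplexing through NIO's Selector, which is backed by epoll on Linux. A minimal echo-style sketch, with the port as a placeholder:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class MultiplexedServer {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000)); // placeholder port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
            selector.select(); // one thread waits on all sockets at once
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(1024);
                    if (client.read(buf) == -1) {
                        client.close(); // peer closed the connection
                    } else {
                        buf.flip();
                        client.write(buf); // echo back what was read
                    }
                }
            }
        }
    }
}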

In the previous scheme, a worker thread was created for every request and closed after use. When a large number of requests arrive, threads are constantly being created and destroyed, which is very expensive. At this point, you need:

Thread pool technology

We can start a batch of worker threads as soon as the program starts, instead of creating them when a request arrives, and use a shared task queue: when a request comes in, the task is put into the queue, and each worker thread takes tasks from the queue and processes them. This is thread pool technology.
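In Java this is exactly what the java.util.concurrent executor framework provides. A minimal sketch, with the pool size and task body as placeholders:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WorkerPool {
    public static void main(String[] args) throws InterruptedException {
        // A fixed batch of worker threads backed by a shared task queue.
        ExecutorService pool = Executors.newFixedThreadPool(8); // placeholder size

        for (int i = 0; i < 100; i++) {
            final int requestId = i;
            pool.submit(() -> handle(requestId)); // enqueue the task; an idle worker picks it up
        }

        pool.shutdown();                            // stop accepting new tasks
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for queued tasks to drain
    }

    private static void handle(int requestId) {
        // Placeholder for real request handling.
        System.out.println("handled request " + requestId);
    }
}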

Multithreading improves the server's concurrency to a certain extent, but multiple threads usually need mutexes, semaphores, condition variables, and similar primitives to synchronize access to shared data. These heavyweight synchronization mechanisms often force threads to switch between user mode and kernel mode, and system calls and thread switches are not a small overhead.

Thread pool technology relies on the shared task queue just mentioned, from which every worker thread takes tasks to process, so multiple worker threads have to synchronize their access to this shared queue.

Is there a lighter-weight way for multiple threads to access data safely? At this point, you need:

Lock-free programming technology

In multithreaded concurrent programming, threads must be synchronized whenever they touch shared data. Synchronization can be divided into blocking synchronization and non-blocking synchronization.

Blocking synchronization is easy to understand: the mechanisms provided by common operating systems, such as mutexes, semaphores, and condition variables, all belong to blocking synchronization, and their essence is to take a "lock".

Non-blocking synchronization, correspondingly, achieves synchronization without locks. There are currently three classes of techniques:

- Wait-free
- Lock-free
- Obstruction-free

All three achieve synchronization without blocking and waiting, through particular algorithms and techniques, and among them lock-free is the most widely used.

Lock-free programming is widely applicable because mainstream CPUs provide an atomic read-modify-write primitive, the famous CAS (Compare-And-Swap) operation; on Intel x86 processors it is the cmpxchg family of instructions.

We often see lock-free queues, lock-free lists, lock-free hash maps, and similar data structures, and the lock-free core of most of them comes from CAS. In daily development, using lock-free programming appropriately can effectively reduce the extra overhead of blocking and thread switching and improve performance.
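In Java, CAS is exposed through the java.util.concurrent.atomic classes. A minimal sketch of a lock-free counter built on compareAndSet; the class here is illustrative (in practice AtomicLong.incrementAndGet already does this):

import java.util.concurrent.atomic.AtomicLong;

public class LockFreeCounter {
    private final AtomicLong value = new AtomicLong(0);

    // Classic CAS retry loop: read the current value, compute the new one,
    // and only publish it if no other thread changed the value in the meantime.
    public long increment() {
        while (true) {
            long current = value.get();
            long next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
            // CAS failed: another thread won the race, so retry.
        }
    }

    public long get() {
        return value.get();
    }
}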

After the server has been online for a while, you find that the service often crashes. Troubleshooting shows the cause is a bug in the worker thread code, and a single crash makes the whole service unavailable. So you decide to split the worker threads and the main thread into separate processes, so that a worker crash no longer affects the service as a whole. Now that there are multiple processes, you need:

Interprocess communication technology

What comes to mind when you think of interprocess communication?

- Pipes
- Named pipes
- Sockets
- Message queues
- Semaphores
- Shared memory

These interprocess communication mechanisms are introduced and compared in detail in a separate article revisiting interprocess communication, which is recommended reading; I will not repeat it here.
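As one small illustration, the simplest cross-process channel available from Java is a pipe to a child process started with ProcessBuilder; sockets, memory-mapped files (shared memory), and external message queues cover the other cases. A minimal sketch, with the child command as a placeholder:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class PipeIpc {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Spawn a child process; its stdout reaches the parent through a pipe.
        // "echo hello-from-child" is a placeholder for a real worker program.
        Process child = new ProcessBuilder("echo", "hello-from-child").start();

        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(child.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println("parent received: " + line);
            }
        }
        child.waitFor();
    }
}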

Scale-out (horizontal scaling)

Distributed deployment spreads the traffic out, so that each server bears only part of the concurrency and traffic. This is also one of my favorite methods.

Caching

Use caching to improve system performance.

Why can caching greatly improve the performance of the system?

The comparison has to be made against the ordinary disk, so let's look at how fast a normal disk is:

The seek time of an ordinary disk is about 10 ms. By comparison, the time for the CPU to execute instructions and address memory is at the nanosecond (ns) level, and the time to read data from a gigabit network card is at the microsecond (μs) level. So in the whole computer system, the disk is the slowest link, several orders of magnitude slower than the other components. That is why we usually use a cache with memory as the storage medium to improve performance.

As for why a cache is fast: because it lives in memory. There is a drawback, though, namely that it burns memory.
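A minimal in-process cache in Java can be sketched with ConcurrentHashMap.computeIfAbsent; a production system would more likely use a local cache library or a distributed cache such as Redis, and loadFromDatabase here is a hypothetical loader:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SimpleCache {
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Return the cached value, loading it from the slow backing store on a miss.
    public String get(String key) {
        return cache.computeIfAbsent(key, this::loadFromDatabase);
    }

    // Hypothetical slow lookup standing in for a real database query.
    private String loadFromDatabase(String key) {
        return "value-for-" + key;
    }
}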

Async

This is asynchronous at the business level.

Kernel-level asynchrony requires calling the kernel's dedicated asynchronous functions (the aio family); otherwise, both blocking and non-blocking I/O are still synchronous.
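At the business level, Java commonly expresses asynchronous flows with CompletableFuture. A minimal sketch; queryOrder and sendNotification are hypothetical steps:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncFlow {
    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Run the secondary step asynchronously so the caller does not block on it.
        CompletableFuture<String> order = CompletableFuture.supplyAsync(AsyncFlow::queryOrder, pool);
        order.thenAccept(AsyncFlow::sendNotification)
             .whenComplete((ignored, error) -> pool.shutdown());

        System.out.println("main thread keeps going without waiting");
    }

    // Hypothetical slow lookup.
    private static String queryOrder() {
        return "order-42";
    }

    // Hypothetical follow-up action, e.g. pushing a message to MQ.
    private static void sendNotification(String orderId) {
        System.out.println("notify for " + orderId);
    }
}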

High-performance, high-availability, high-scalability solutions

Of the practical solutions below, some I have tried myself, some I have not used but understand, and some I do not know when I will get the chance to practice.

High-performance practical solutions

1. Cluster deployment, reducing the pressure on any single machine through load balancing.
2. Multi-level caching, including putting static data on a CDN, local caches, and distributed caches, as well as handling hot keys, cache penetration, cache concurrency, data consistency, and other problems in caching scenarios.
3. Database optimization: splitting databases and tables and tuning indexes, with a search engine to handle complex queries.
4. Considering NoSQL databases such as HBase or TiDB, but the team must be familiar with these components and have strong operations capability.
5. Asynchronization, handling secondary flows asynchronously through multithreading, MQ, or even delayed tasks.
6. Rate limiting, which first requires deciding whether the business allows it (for example, a flash-sale scenario does), including frontend limiting, Nginx access-layer limiting, and server-side limiting (see the sketch after this list).
7. Peak shaving and valley filling for traffic, absorbing traffic through MQ.
8. Concurrent processing, parallelizing serial logic with multiple threads.
9. Pre-computation, for example in a red-packet-grabbing scenario the amounts can be computed in advance and cached, then used directly when the packets are handed out.
10. Cache warm-up, prefetching data into the local or distributed cache ahead of time via asynchronous tasks.
11. Reducing the number of IO operations, for example batch reads and writes for the database and cache, batch RPC interfaces, or eliminating RPC calls altogether by keeping redundant data.
12. Reducing the size of each IO packet, including lightweight communication protocols, suitable data structures, removing redundant fields from interfaces, shrinking cache keys, compressing cache values, and so on.
13. Program logic optimization, such as moving forward the checks most likely to cut the execution flow short, optimizing the computation inside for loops, or using more efficient algorithms.
14. Pooling of all kinds and proper pool sizing, including HTTP request pools, thread pools (sizing core parameters according to whether the work is CPU-intensive or IO-intensive), database and Redis connection pools, and so on.
15. JVM tuning, including the sizes of the young and old generations and the choice of GC algorithm, to minimize GC frequency and pause time.
16. Lock selection, using optimistic locking in read-heavy, write-light scenarios, or considering segmented locks to reduce lock contention.
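For item 6, server-side limiting can be sketched in plain Java with a Semaphore that caps the number of in-flight requests; a real system would more likely use a token-bucket limiter (for example Guava's RateLimiter) or the gateway's built-in limiting. The capacity below is a placeholder:

import java.util.concurrent.Semaphore;

public class InFlightLimiter {
    // Placeholder capacity: at most 100 requests processed at the same time.
    private final Semaphore permits = new Semaphore(100);

    // Returns false (so the caller can reject with an error code) when over capacity.
    public boolean handle(Runnable request) {
        if (!permits.tryAcquire()) {
            return false; // limit exceeded, reject immediately
        }
        try {
            request.run();
            return true;
        } finally {
            permits.release();
        }
    }
}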

The solutions above consider possible optimization points only along the two dimensions of computation and IO. You also need a supporting monitoring system to see current performance in real time and help you locate bottlenecks, and then follow the 80/20 rule to attack the main problem first.

Highly available practical solutions

1. Failover of peer nodes: both Nginx and service-governance frameworks support switching to another node after one fails.
2. Failover of non-peer nodes: detect failures by heartbeat and perform a master/slave switch (for example Redis sentinel or cluster mode, or MySQL master-slave switching).
3. Timeouts, retry strategies, and idempotent design at the interface level (a retry sketch follows this list).
4. Degradation: protect core services by sacrificing non-core ones, with circuit breaking if necessary; or provide a fallback link when there is a problem on the core link.
5. Rate limiting: directly reject requests that exceed the system's processing capacity, or return an error code.
6. Message reliability in MQ scenarios, including the retry mechanism on the producer side, persistence on the broker side, and the ack mechanism on the consumer side.
7. Grayscale releases, deploying to a small slice of traffic by machine, observing system logs and business metrics, and rolling out fully once everything runs smoothly.
8. Monitoring and alerting: an all-round monitoring system covering the basics (CPU, memory, disk, network) as well as web servers, the JVM, the database, all kinds of middleware, and business metrics.
9. Disaster drills: similar to today's "chaos engineering", applying destructive measures to the system and observing whether local failures cause availability problems.
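For item 3, a minimal retry-with-backoff sketch in Java; the call being retried must be idempotent, and the attempt count and delay below are placeholder values:

import java.util.concurrent.Callable;

public class Retry {
    // Retry an idempotent call up to maxAttempts times with a fixed backoff.
    public static <T> T withRetry(Callable<T> call, int maxAttempts, long backoffMillis)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
                Thread.sleep(backoffMillis);
            }
        }
        throw last; // all attempts failed
    }

    public static void main(String[] args) throws Exception {
        // Placeholder call standing in for a remote request that has its own timeout configured.
        String result = withRetry(() -> "ok", 3, 200);
        System.out.println(result);
    }
}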

High availability is mainly approached from three angles: redundancy, trade-offs, and system operations. It also needs a supporting on-call mechanism and an incident-handling process, so that online problems can be followed up in time.

Highly scalable practical solutions

1. A sensible layered architecture, for example the common layered architecture of Internet systems mentioned above; within microservices, layers can be split further into a data access layer and a business logic layer (but the performance impact must be evaluated, since it adds one more network hop).
2. Splitting the storage layer: split vertically by business dimension, and further horizontally by data characteristics (sharding databases and tables).
3. Splitting the business layer: the most common splits are by business dimension (for example product services and order services in e-commerce), by core versus non-core interfaces, and by request source (for example To C versus To B, App versus H5).

Summary

1. Start with the simplest system design that meets the current business needs and traffic level, and choose the technology stack the team is most familiar with.

2. As traffic grows and the business changes, correct the problems in the architecture, such as single points of failure, scale-out limitations, and components whose performance can no longer meet requirements.

In this process, choose mature community components that the team knows well to solve the problem, and only build your own wheel when the community has no suitable solution.

3. When minor repairs to the architecture can no longer meet demand, consider refactoring, rewriting, and other large-scale adjustments to solve the existing problems.

At this point, I believe you have a deeper understanding of how to deal with high concurrency in Java. You might as well put it into practice; follow us and keep learning for more related content!
