What is the high concurrency solution of Java architecture

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/02 Report

What is the high concurrency solution for a Java architecture? Many inexperienced developers are unsure how to approach this problem, so this article summarizes its causes and solutions. I hope that after reading it you will be able to solve the problem yourself.

1. Separation of application and static resources

At the beginning, the application and its static resources live on the same server. When concurrency reaches a certain level, the static resources need to be moved to a dedicated server. Static resources mainly include images, videos, JS, CSS, and other resource files. These files are relatively simple to separate because they are stateless: they can be stored directly on the dedicated server, which is usually accessed through its own domain name.

Using a different domain name lets the browser fetch resources from the resource server directly, without going through the application server. The architecture diagram is as follows:

2. Page caching

Page caching means caching the pages generated by the application so that they do not have to be regenerated on every request, which saves a lot of CPU; it is even faster if the cached pages are kept in memory. If you use an Nginx server, you can use its built-in caching, or you can use a dedicated Squid server. By default the page cache is invalidated after a fixed time, but you can also manually invalidate the corresponding cache after modifying the data.
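The two invalidation mechanisms described above can be sketched in a few lines of Java. This is a minimal illustration, not any framework's API; the class and method names are invented for the example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a page cache supporting both time-based (TTL)
// and manual invalidation.
public class PageCache {
    private static class Entry {
        final String html;
        final long expiresAt;
        Entry(String html, long expiresAt) { this.html = html; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public PageCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Return the cached page, or null if absent or expired (time-based invalidation).
    public String get(String url) {
        Entry e = cache.get(url);
        if (e == null || System.currentTimeMillis() > e.expiresAt) {
            cache.remove(url);
            return null;
        }
        return e.html;
    }

    public void put(String url, String html) {
        cache.put(url, new Entry(html, System.currentTimeMillis() + ttlMillis));
    }

    // Manual invalidation, called after the underlying data changes.
    public void invalidate(String url) { cache.remove(url); }
}
```

In practice Nginx or Squid plays this role in front of the application, but the logic they apply per URL is essentially the same.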

Page caching is mainly used for pages whose data rarely changes. On many pages, however, most of the data rarely changes while a small part changes very frequently. For example, a page that displays an article can be made static, but if the article has "like" and "dislike" buttons that display their counts, that data changes often and interferes with making the page static. To solve this, first generate the static page, then use Ajax to read and update the frequently changing data. This kills two birds with one stone: you keep the page cache and still display the frequently changing data in real time.

As we all know, the most efficient and least resource-hungry page is a purely static HTML page, so we should try to serve the pages on our site as static pages; this simplest method is in fact the most effective. For sites with a large amount of frequently updated content, however, this cannot all be done by hand, which is where the familiar content management system (CMS) comes in. The news channels of the portal sites we often visit, and even their other channels, are managed through such systems. A CMS automates basic content entry and the generation of static pages, and can also provide channel management, permission management, automatic crawling, and other functions. For a large website, an efficient, manageable CMS is essential.

Beyond portals and information-publishing sites, static generation is also a necessary performance technique for highly interactive community sites. Generating static copies of posts and articles in real time, and regenerating them when they are updated, is a widely used strategy: Mop's "hodgepodge" board uses it, as do NetEase's community forums.

HTML static generation is also used by some caching strategies. For features that query the database frequently but whose content rarely changes, consider HTML static generation. For example, mainstream forums let administrators manage public forum settings in the back end and store them in the database. This information is read constantly by front-end code but updated very rarely, so you can regenerate the static content whenever it changes in the back end, avoiding a large number of database requests.

3. Clustering and distribution

In a cluster, every server provides the same function, and any of them may be called on to process a given request; clustering mainly spreads load.

In a distributed setup, different services are placed on different servers, and several servers may be needed to process a single request, which improves the speed of processing each request. Clustering and distribution can also be used together.

There are two kinds of clusters: static resource clusters and application clusters. Static resource clusters are relatively simple. The core problem for application clusters is Session synchronization.

There are two ways to handle Session synchronization: one is to automatically propagate Session changes to the other servers; the other is to use a dedicated program to manage Sessions so that all servers in the cluster share the same Session. Tomcat supports the first method by default, and it can be enabled with simple configuration. For the second method, you can install an efficient caching program such as Memcached on a dedicated server to manage the sessions, then wrap the Request in the application and override its getSession method to fetch the Session from that server.
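The second approach, a centralized session store, can be sketched as follows. This is a simplified illustration: a ConcurrentHashMap stands in for Memcached, whereas a real deployment would talk to the cache server over the network, and the class names are invented for the example:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of centralized session management: sessions live in one shared
// store instead of in each application server's memory, so every server
// in the cluster sees the same session data.
public class SharedSessionStore {
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // Equivalent of the overridden request.getSession(): look up the session
    // by its id (normally carried in a cookie), creating it if necessary.
    public Map<String, Object> getSession(String sessionId) {
        return sessions.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>());
    }

    public String newSessionId() { return UUID.randomUUID().toString(); }
}
```

Whichever application server handles a request, it resolves the cookie's session id against the same store, so the user stays logged in across the cluster.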

The other core problem for a cluster is load balancing: when a request arrives, which server should handle it? This can be solved with software or with dedicated hardware (such as F5).
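The simplest software strategy is round-robin: hand each incoming request to the next server in turn. A minimal sketch (real balancers such as Nginx or HAProxy also offer weighted and least-connections strategies):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin load balancer: requests are assigned to servers
// in rotation, spreading load evenly across the cluster.
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> servers) { this.servers = servers; }

    // Pick the server for the next request; AtomicInteger keeps this
    // safe under concurrent calls, floorMod guards against int overflow.
    public String pick() {
        int i = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```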

4. Reverse proxy

A reverse proxy means that the server the client accesses directly does not actually provide the service itself; it fetches resources from other servers and returns the result to the user.

Figure:

4.1 Differences between a reverse proxy server and a proxy server

A proxy server fetches the resources we want on our behalf and returns the results to us, and we actively tell it which resources to fetch. For example, if we want to access Facebook but cannot reach it directly, we can have a proxy server access it and return the results to us.

With a reverse proxy server, we access a server normally, and that server fetches resources from other servers and returns the results to us without our knowing it.

A proxy server is something we use actively; it works for us and needs no domain name of its own. A reverse proxy server is used by the server side without our knowledge; it has its own domain name, and visiting it is no different from visiting an ordinary URL.

The reverse proxy server has three main functions:

1. It can act as a front-end server, integrated with the servers that actually handle requests.

2. It can perform load balancing.

3. It can forward requests, for example routing different types of resource requests to different servers.

5. CDN

A CDN is really a special clustered page-cache system. Compared with the page-cache servers in an ordinary cluster, the main differences are where the servers are located and how requests are assigned to them. CDN servers are distributed across the country, and when a user request arrives, it is assigned to the most suitable CDN node. For example, China Unicom users are assigned to Unicom nodes, and users in Shanghai are assigned to Shanghai nodes.

Each CDN node is in fact a page-cache server. If the requested resource is not in its cache, it is fetched from the origin server; otherwise the cached page is returned directly.

A CDN assigns requests (load balancing) during domain-name resolution, using special CDN DNS servers. The usual practice is to use a CNAME record to resolve the site's domain name to a specific domain name of the CDN provider, which a dedicated CDN DNS server then resolves to the appropriate CDN node. As shown in the figure.

In the second step, reaching the CDN's DNS server is achieved by pointing the target domain of the CNAME record at the CDN's DNS server with an NS record. Each CDN node may itself be a cluster of servers.

6. Optimization of the underlying layer

Everything above builds on the basic infrastructure already described. Many of these components transfer data over the network, so if network transmission can be made faster, the whole system improves.

7. Database clusters and database/table hashing

Large websites run complex applications, and those applications depend on databases. Facing a huge number of visits, the database quickly becomes the bottleneck: a single database soon cannot keep up with the application. We then need database clusters or database/table hashing.

For database clustering, many databases have their own solutions, such as Oracle and Sybase. The Master/Slave replication provided by the widely used MySQL is a similar solution. Whatever DB you use, refer to its corresponding solution.

The database clusters mentioned above are constrained by the DB vendor in architecture, cost, and scalability, so we also need to improve the system from the application's point of view; database and table hashing is the most common and effective approach. We split the database inside the application by business, application, or functional module, so that different modules map to different databases or tables, and then hash a given page or feature into smaller databases or tables by some strategy, such as by user table and user ID. This improves system performance at low cost and scales well. The Sohu forum uses such an architecture: user, settings, and post data are split into separate databases, then posts and users are hashed into databases and tables by board and ID. In the end, a simple change to a configuration file lets the system add a low-cost database at any time to boost performance.
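The ID-based hashing described above can be sketched in a few lines. This is an illustrative example, not Sohu's actual code; the shard counts and naming scheme are assumptions:

```java
// Sketch of routing a row to a database and a table by hashing the user id.
// With 4 databases and 16 tables per database, rows spread evenly and new
// shards can be added by changing the counts (plus migrating data).
public class ShardRouter {
    private final int dbCount;
    private final int tablesPerDb;

    public ShardRouter(int dbCount, int tablesPerDb) {
        this.dbCount = dbCount;
        this.tablesPerDb = tablesPerDb;
    }

    // e.g. user 123 with 4 databases -> "user_db_3"
    public String databaseFor(long userId) { return "user_db_" + (userId % dbCount); }

    // e.g. user 123 with 16 tables -> "user_11"
    public String tableFor(long userId) { return "user_" + (userId % tablesPerDb); }
}
```

The application computes the target database and table before issuing each query, so no single database ever holds the full data set.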

8. Summary

The evolution of website architecture revolves around two problems, big data and high concurrency, and the solutions fall into two categories: using caches and using more resources. "More resources" mainly means more storage (including memory), more CPUs, and more network capacity, and it splits further into a single resource handling a complete request versus multiple resources cooperating on one request: for example, clusters and distribution on the storage and CPU side, and CDN and static-resource separation on the network side. Once you understand this overall idea, you grasp the essence of architecture evolution, and you may be able to design better architectures yourself.

A few additional summary notes:

First of all, I think we must have a clear line of thought before solving a problem. If we only copy other people's solutions wholesale, there is no real understanding and nothing can be inferred from them.

Massive data and high concurrency are often mentioned together, though they are entirely different things: massive data refers to a huge volume of data in the database, while high concurrency refers to heavy traffic hitting the database and the servers.

So here is the question: given that the database holds a massive amount of data, what should we do? To solve a problem, you must first know what the problem is. What problems does massive data actually cause?

The problems brought by massive data come down to creating, deleting, updating, and querying. What else could there be? Surely not a security problem... (I stand corrected: it could also be a security problem.)

1. Slow database reads.

2. Slow inserts and updates, which can only be solved by splitting databases and tables.

There are several ways to deal with slow database access. Since accessing the database is slow, why access it at all when the logic allows you not to?

1. Use caching.

2. Use static pages.

Since we cannot avoid the database entirely, let's optimize it.

3. Optimize the database (a large topic in itself: parameter configuration, index optimization, SQL optimization, and so on).

4. Separate out the active data in the database.

5. Separate reads from writes.

6. Batch reads and delay writes.

7. Use a search engine to query the data in the database.

8. Use technologies such as NoSQL and Hadoop.

9. Split up the business.
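Read-write separation from the list above can be sketched as a small router in the application layer. This is a simplified illustration under assumed data-source names; production systems usually do this through a connection-pool or ORM plugin rather than by inspecting SQL strings:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of read-write separation: writes always go to the master,
// reads are spread round-robin across the replicas.
public class ReadWriteRouter {
    private final String master;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger(0);

    public ReadWriteRouter(String master, List<String> replicas) {
        this.master = master;
        this.replicas = replicas;
    }

    // Decide which data source should execute this statement.
    public String route(String sql) {
        String s = sql.trim().toLowerCase();
        if (s.startsWith("select")) {
            // Reads can be served by any replica.
            return replicas.get(Math.floorMod(next.getAndIncrement(), replicas.size()));
        }
        // Inserts, updates, and deletes must hit the master.
        return master;
    }
}
```

Replication lag between master and replicas is the main caveat: reads that must see a just-completed write are usually forced to the master as well.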

High concurrency solution

This problem really has to be discussed together with the massive-data problem above. When does high concurrency occur? Usually when day-to-day traffic is already relatively heavy, and heavy traffic in turn means ever more stored data; the two feed each other. There are exceptions, of course, such as systems facing hard demand spikes like 12306, where concurrency is high even though the data is not massive by comparison. So how do we handle heavy everyday traffic? Since both the servers and the database are involved, we must optimize on both fronts.

1. Increase the number of web servers, i.e., clustering plus load balancing. If one server cannot do the job, use several; if one machine room is not enough, use several machine rooms.

Before moving to the second approach: besides the database server, is there anything else we can optimize? Of course there is.

1.1 Page caching

1.2 CDN

1.3 Reverse proxy

1.4 Separation of applications and static resources (for example, keep downloadable resources on their own server and give that server high bandwidth)

2. Increase the number of database servers, with the same clustering and load balancing.

Solutions for massive data

1. Use caching

Many of these measures complement one another. Caching is mostly used to solve high-concurrency problems: massive data makes access slow, which easily aggravates high-concurrency problems, and since the database is generally the bottleneck of web access, we avoid touching the database wherever business logic allows. Hence the cache. Keeping the necessary data in memory, instead of reading it from the database every time, avoids needless performance waste and makes access faster; these are the benefits of caching. So what should you watch out for when using caching and choosing cache-management software?
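One thing to watch is eviction: memory is finite, so the cache must decide what to drop when full. A common policy is LRU (least recently used), which Java's own LinkedHashMap can express directly in its access-order mode. A minimal sketch; standalone caches like Memcached or Redis apply the same idea across a whole cluster:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an LRU cache: when capacity is exceeded, the entry that has
// gone unused the longest is evicted automatically.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = order entries by access, not insertion
        this.capacity = capacity;
    }

    // LinkedHashMap calls this after every put; returning true evicts
    // the least-recently-used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

Other things to watch when choosing cache software include consistency with the database after writes, cache penetration on missing keys, and expiry stampedes when many entries lapse at once.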

2. Separate out the active data in the database

Why separate it? Let me describe a problem from my own environment. A table with only a dozen or so fields and 1.3 million rows had grown to 5 GB, which is unreasonable in itself: so little data taking so much space means some fields store very long strings (article bodies and the like). Most queries on the table never use those large fields, yet the fields slow every query down and generate many slow-query log entries. In this situation you can split the table vertically and separate out the active data, which greatly speeds up access.

After reading the above, have you mastered the high concurrency solutions for a Java architecture? If you want to learn more skills or dig deeper, you are welcome to follow the industry information channel. Thank you for reading!
