Shulou
2025-01-17 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Today, the editor will share how to use Java Web to build a simple e-commerce system. The content is detailed and the logic is clear. Many people are still not very familiar with this topic, so this article is shared for your reference. I hope you get something out of it. Let's take a look.
Stage 1. Build the website on a single machine
In the early days of the website, we often run all our programs and software on a single computer. We use a container such as Tomcat, Jetty or JBoss, and then either use JSP/Servlet directly or an open-source stack such as Maven + Spring + Struts + Hibernate or Maven + Spring + Spring MVC + MyBatis. We then choose a database management system such as MySQL, SQL Server or Oracle to store the data, and connect to and operate on the database through JDBC.
All of this software, both the database and the application, runs on the same machine, which can be regarded as a small system. At this point, the system structure is as follows:
Stage 2. Separation of application server and database
After the website launches, traffic gradually increases and so does the server load. Before the server becomes overloaded, we should prepare to increase the website's capacity. If the code is already hard to optimize further and we do not want to upgrade the single machine, adding another machine is a good option: it effectively increases the system's capacity and is also cost-effective.
What is the additional machine used for? We can move the database onto its own server, separate from the web server. This not only increases the load each machine can handle but also improves disaster recovery.
The architecture of the application server separated from the database is shown in the following figure:
Stage 3. Application server cluster
As traffic keeps growing, a single application server can no longer meet demand. Assuming the database server is still not under pressure, we can go from one application server to two or more and spread user requests across them to increase capacity. The application servers do not interact with each other directly; each relies on the database to serve requests. A well-known piece of failover software is Keepalived, which works similarly to layer 3, 4 and 7 switching. It is not tied to any particular application and can be used with all kinds of software. Combined with ipvsadm, Keepalived can also do load balancing, which makes it a very powerful tool.
Taking the addition of one application server as an example, the resulting system structure is as follows:
At this point in the evolution of the system, the following four problems will arise:
Who forwards the user's request to the specific application server?
What forwarding algorithms and strategies can be used?
How does the application server return the response to the user?
If a user reaches a different server on each request, how do we keep the session consistent?
The common solutions to the above problems are as follows:
1. Load balancing
In general, there are 5 solutions:
1), HTTP redirection
HTTP redirection forwards requests at the application layer. The user's request first reaches the HTTP-redirect load balancing server, which picks a real server according to its algorithm and answers with a redirect. On receiving the redirect, the user sends the request again, this time to the real server in the cluster.
Pros: easy to use
Disadvantages: poor performance, because every request costs the user an extra round trip.
2), DNS domain name resolution load balancing
With DNS load balancing, when the user asks the DNS server for the IP address of the domain name, the DNS server directly returns the IP of a server chosen by its load balancing policy.
Advantages: the work is handed to DNS, so we do not have to maintain a load balancing server.
Disadvantages: when an application server crashes, DNS cannot be updated in time, and because DNS load balancing is controlled by the domain name service provider, the website cannot refine it or manage it more powerfully.
3), reverse proxy server
When the user's request arrives at the reverse proxy server (by this point it has already reached the website's data center), the proxy forwards it to a specific server according to its algorithm. Commonly used servers such as Apache and Nginx can act as reverse proxies.
Pros: easy deployment
Disadvantages: proxy servers can become performance bottlenecks, especially when uploading large files at once.
4), IP layer load balancing
After the request arrives at the load balancer, the load balancer can forward the request and achieve load balancing by modifying the destination IP address of the request.
Advantages: better performance
Disadvantages: the bandwidth of the load balancer becomes a bottleneck.
5), data link layer load balancing
After the request arrives at the load balancer, the load balancer achieves load balancing by modifying the destination MAC address of the request. Unlike IP-layer load balancing, the server sends its response directly back to the user without going through the load balancer.
2. Cluster scheduling and forwarding algorithm
1), rr round-robin scheduling algorithm
As the name implies, requests are distributed to the servers in turn.
Pros: easy to implement
Disadvantages: does not take the processing power of each server into account
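A minimal Java sketch of rr, assuming a fixed list of servers given as plain names (the class and server names are hypothetical). An atomic counter keeps it safe when many request threads call it at once.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// rr sketch: hand out servers in a fixed cyclic order, ignoring their capacity.
class RoundRobinScheduler {
    private final List<String> servers;
    private final AtomicLong counter = new AtomicLong(); // shared, thread-safe request counter

    RoundRobinScheduler(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
    }

    String next() {
        long n = counter.getAndIncrement();
        return servers.get((int) (n % servers.size()));
    }
}
```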
2), wrr weighted round-robin scheduling algorithm
We assign each server a weight, and the load balancer dispatches requests so that the number of requests each server receives is proportional to its weight.
Advantages: takes the different processing power of each server into account
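The description above does not say how requests are interleaved. One common way, shown here as an assumption rather than the article's own method, is the "smooth" weighted round-robin scheme associated with Nginx: on every pick each server's current weight grows by its configured weight, the largest one wins, and the winner is pushed back by the total weight. Class and server names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// wrr sketch using the "smooth weighted round-robin" variant.
class SmoothWeightedRoundRobin {
    private final Map<String, Integer> weights;                     // configured weights
    private final Map<String, Integer> current = new LinkedHashMap<>(); // running weights
    private final int totalWeight;

    SmoothWeightedRoundRobin(Map<String, Integer> weights) {
        this.weights = new LinkedHashMap<>(weights);
        int sum = 0;
        for (Map.Entry<String, Integer> e : this.weights.entrySet()) {
            if (e.getValue() <= 0) throw new IllegalArgumentException("weight must be > 0");
            sum += e.getValue();
            current.put(e.getKey(), 0);
        }
        this.totalWeight = sum;
    }

    synchronized String next() {
        String best = null;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            int cw = current.get(e.getKey()) + e.getValue(); // grow by configured weight
            current.put(e.getKey(), cw);
            if (best == null || cw > current.get(best)) best = e.getKey(); // ties: first wins
        }
        current.put(best, current.get(best) - totalWeight); // push the winner back
        return best;
    }
}
```

With weights A=5, B=1, C=1 the order is A, A, B, A, C, A, A: A still receives five of every seven requests, but B and C are interleaved instead of all of A's requests arriving in one burst.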
3), sh source address hashing algorithm
Take the user's IP, compute a key with a hash function, and then look up the corresponding value, i.e. the target server's IP, in a static mapping table. If the target server is overloaded, null is returned.
Advantages: the same user always reaches the same server.
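A sketch of sh following the description above. Java's String.hashCode stands in for whatever hash function a real balancer uses, the server list plays the role of the static mapping table, and the set of overloaded servers is a hypothetical input. dh would be the same code with the destination IP passed in instead of the client IP.

```java
import java.util.List;
import java.util.Set;

// sh sketch: hash the client IP into a fixed table of servers.
class SourceHashScheduler {
    private final List<String> table; // static mapping table: slot -> server

    SourceHashScheduler(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.table = List.copyOf(servers);
    }

    // Same IP -> same slot -> same server. As described above, no fallback is
    // tried: if that server is overloaded, null is returned.
    String pick(String clientIp, Set<String> overloaded) {
        int slot = Math.floorMod(clientIp.hashCode(), table.size());
        String server = table.get(slot);
        return overloaded.contains(server) ? null : server;
    }
}
```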
4), dh destination address hashing algorithm
The principle is the same as above, but the destination IP address is hashed instead.
Advantages: requests for the same destination IP always reach the same server.
5), lc least connection algorithm
Requests are forwarded preferentially to the server with the fewest connections.
Advantages: spreads the load more evenly across the servers in the cluster.
6), wlc weighted least connection algorithm
Adds a weight to each server on top of lc. The score is (active connections * 256 + inactive connections) / weight, and the server with the smallest score is selected first.
Pros: requests can be allocated according to the capabilities of the server.
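The wlc score can be computed directly from the formula above. In this sketch (class names are hypothetical) the connection counts are plain fields; in a real balancer they would come from its connection table. Setting every weight to 1 turns it into plain lc.

```java
import java.util.List;

class WlcServer {
    final String name;
    final int weight;
    int active;    // connections currently transferring data
    int inactive;  // idle connections, e.g. waiting to close

    WlcServer(String name, int weight, int active, int inactive) {
        this.name = name;
        this.weight = weight;
        this.active = active;
        this.inactive = inactive;
    }
}

class WeightedLeastConnection {
    // (active connections * 256 + inactive connections) / weight
    static double overhead(WlcServer s) {
        return (s.active * 256.0 + s.inactive) / s.weight;
    }

    static WlcServer pick(List<WlcServer> servers) {
        WlcServer best = null;
        for (WlcServer s : servers) {
            if (s.weight <= 0) continue; // assumption: weight 0 means "take no new requests"
            if (best == null || overhead(s) < overhead(best)) best = s;
        }
        return best; // null if no server can take requests
    }
}
```

Multiplying active connections by 256 makes them dominate the score, so inactive connections mostly matter for breaking near-ties.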
7), sed shortest expected delay algorithm
In fact, sed is similar to wlc, except that inactive connections are not considered. The score is (active connections + 1) * 256 / weight, and again the server with the smallest value is selected first.
8), nq never queue algorithm
An improvement on sed. When can a request "never queue"? When a server has 0 connections. So if some server has 0 connections, the load balancer forwards the request directly to it without doing the sed calculation; otherwise it falls back to sed.
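sed and nq differ only in the idle-server shortcut, so one sketch covers both. Class names are hypothetical, and a server with weight 0 is assumed to take no new requests.

```java
import java.util.List;

class SedServer {
    final String name;
    final int weight;
    int active; // active connections; sed ignores inactive ones

    SedServer(String name, int weight, int active) {
        this.name = name;
        this.weight = weight;
        this.active = active;
    }
}

class ShortestExpectedDelay {
    // (active connections + 1) * 256 / weight
    static double expectedDelay(SedServer s) {
        return (s.active + 1) * 256.0 / s.weight;
    }

    static SedServer sed(List<SedServer> servers) {
        SedServer best = null;
        for (SedServer s : servers) {
            if (s.weight <= 0) continue;
            if (best == null || expectedDelay(s) < expectedDelay(best)) best = s;
        }
        return best;
    }

    // nq: an idle server gets the request immediately, without the sed calculation.
    static SedServer nq(List<SedServer> servers) {
        for (SedServer s : servers) {
            if (s.weight > 0 && s.active == 0) return s;
        }
        return sed(servers);
    }
}
```

With servers A (weight 1, no connections) and B (weight 10, one connection), sed picks B because (1 + 1) * 256 / 10 = 51.2 is below (0 + 1) * 256 / 1 = 256, while nq sends the request straight to the idle A.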
9), LBLC locality-based least connection algorithm
Based on the request's destination IP address, the load balancer finds the server most recently used for that address and forwards the request to it. If there is no such server or it is overloaded, the least connection algorithm is used to choose one.
10), LBLCR locality-based least connection with replication algorithm
Based on the request's destination IP address, the load balancer finds the "server group" (not a single server) recently used for that address, picks the server in that group with the fewest connections, and forwards the request to it. If that server is overloaded, the load balancer uses the least connection algorithm to choose a server from the rest of the cluster (outside this group), adds it to the group, and forwards the request to it.
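A sketch of LBLC under stated assumptions: the class names are hypothetical, weights are positive, and "overloaded" is simplified to active connections >= 2 * weight, which is not the exact rule of any particular balancer. The fallback picks the lowest active/weight ratio. LBLCR would keep a set of servers per address instead of a single one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LblcServer {
    final String name;
    final int weight; // assumed > 0
    int active;       // active connections

    LblcServer(String name, int weight, int active) {
        this.name = name;
        this.weight = weight;
        this.active = active;
    }
}

class LocalityLeastConnection {
    private final List<LblcServer> servers;
    private final Map<String, LblcServer> lastUsed = new HashMap<>(); // destination IP -> server

    LocalityLeastConnection(List<LblcServer> servers) {
        this.servers = List.copyOf(servers);
    }

    // Simplified overload rule (assumption): twice as many active connections as the weight.
    private static boolean overloaded(LblcServer s) {
        return s.active >= 2 * s.weight;
    }

    // Lowest active/weight ratio, compared by cross-multiplication to avoid division.
    private LblcServer leastConnection() {
        LblcServer best = null;
        for (LblcServer s : servers) {
            if (best == null || s.active * best.weight < best.active * s.weight) best = s;
        }
        return best;
    }

    synchronized LblcServer pick(String destinationIp) {
        LblcServer s = lastUsed.get(destinationIp);
        if (s == null || overloaded(s)) {
            s = leastConnection();
            lastUsed.put(destinationIp, s); // remember the new choice for this address
        }
        return s;
    }
}
```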
3. How the cluster returns responses
1), NAT
The load balancer receives the user's request and forwards it to a specific server. The server processes the request and returns the response to the load balancer, which then returns it to the user.
2), DR
The load balancer receives the user's request and forwards it to a specific server, which processes the request and returns the response directly to the user. The load balancer forwards requests by rewriting their MAC address, so no IP Tunneling support is needed and most systems can act as servers.
3), TUN
Like DR, the server returns the response directly to the user, but requests are forwarded through an IP tunnel, so the servers must support the IP Tunneling protocol, which makes it harder to use across platforms.
4. Cluster Session consistency problem
1), Session Sticky
Session Sticky assigns all requests from the same user within a session to a fixed server, so we do not need to deal with sessions across servers. A common algorithm is ip_hash, i.e. hashing on the user's IP as in the source address hashing algorithm mentioned above.
Pros: easy to implement
Cons: the sessions held by an application server are lost when it restarts.
2), Session Replication
Session replication replicates session in a cluster so that each server holds the session data of all users.
Advantages: reduces the pressure on the load balancing server, which no longer needs the ip_hash algorithm to forward requests.
Disadvantages: replication consumes a lot of network bandwidth, and with many visitors the sessions take up a large, wasteful amount of memory on every server.
3), Session data centralized storage
Centralized session storage keeps session data in a database, which decouples sessions from the application servers.
Advantages: compared with session replication, bandwidth and memory pressure within the cluster is greatly reduced.
Cons: the database where Session is stored needs to be maintained.
4), Cookie Base
Cookie Base stores the session in a cookie, so the browser tells the application server what its session is on every request. This also decouples sessions from the application servers.
Advantages: easy to implement, basically maintenance-free.
Disadvantages: cookie length limit, low security, bandwidth consumption.
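The "low security" drawback can be softened by signing the cookie. The sketch below, an assumption rather than part of the original design, signs the session data with HMAC-SHA256 (javax.crypto.Mac) using a secret shared by all application servers, so any server can verify the cookie and a tampered value is rejected. The value is signed but not encrypted, so it must not hold secrets, and the size limit still applies. The class name and the "payload.signature" layout are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SignedCookieSession {
    private final SecretKeySpec key;

    SignedCookieSession(byte[] sharedSecret) {
        this.key = new SecretKeySpec(sharedSecret, "HmacSHA256");
    }

    private byte[] hmac(String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Cookie value = base64url(data) + "." + base64url(HMAC of the encoded data)
    String encode(String sessionData) {
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        String payload = enc.encodeToString(sessionData.getBytes(StandardCharsets.UTF_8));
        return payload + "." + enc.encodeToString(hmac(payload));
    }

    // Returns the session data, or null if the cookie is malformed or was modified.
    String decode(String cookieValue) {
        int dot = cookieValue.indexOf('.');
        if (dot < 0) return null;
        String payload = cookieValue.substring(0, dot);
        byte[] given;
        try {
            given = Base64.getUrlDecoder().decode(cookieValue.substring(dot + 1));
        } catch (IllegalArgumentException e) {
            return null;
        }
        if (!MessageDigest.isEqual(hmac(payload), given)) return null; // constant-time compare
        return new String(Base64.getUrlDecoder().decode(payload), StandardCharsets.UTF_8);
    }
}
```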
It is worth mentioning that:
The load balancing algorithms supported by Nginx include wrr, sh (ip_hash, with consistent hashing also available) and lc (least_conn). Besides acting as a load balancer, Nginx can also serve as a static resource server.
Keepalived + ipvsadm is relatively powerful, and its supported algorithms include rr, wrr, lc, wlc, lblc, sh and dh.
Keepalived supports cluster modes such as NAT, DR and TUN.
Nginx itself does not provide a solution for session synchronization, while Apache provides support for session sharing.
After solving the above problems, the structure of the system is as follows:
Stage 4. Database read-write separation
So far we have assumed that the database load is normal, but as traffic grows, the database load also rises. Someone may immediately think of doing what we did with the application servers: split the database into two and load balance between them.
But for a database it is not that simple. If we simply split the database into two and send database requests to machine A and machine B respectively, the two databases will obviously become inconsistent. For this case, we can first consider read-write separation combined with master-slave replication.
The structure of the system after read-write separation is as follows:
This structural change will also bring about two problems:
Data synchronization between master and slave databases.
How the application selects the data source.
Solution:
Use MySQL's built-in Master + Slave replication to keep the master and slave databases in sync.
Use third-party database middleware such as MyCat to select the data source. MyCat grew out of Cobar, Alibaba's open-source database middleware whose development later stopped. MyCat is a well-regarded open-source MySQL sharding (database and table splitting) middleware in China.
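Whichever option is used, the core decision is which data source each statement goes to. A minimal sketch, with hypothetical data source names standing in for real DataSource objects: writes and SELECT ... FOR UPDATE go to the master, other SELECTs rotate over the slaves, and reads inside a transaction stay on the master because MySQL replication is asynchronous by default and a slave may lag behind. In a Spring application this decision is typically plugged into AbstractRoutingDataSource.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

class ReadWriteRouter {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String master, List<String> slaves) {
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    // Returns the name of the data source the statement should run on.
    String route(String sql, boolean inTransaction) {
        String s = sql.trim().toLowerCase(Locale.ROOT);
        boolean isRead = s.startsWith("select") && !s.contains(" for update");
        // Writes, locking reads and reads inside a transaction stay on the master
        // so they see the latest data; replication to the slaves may lag.
        if (!isRead || inTransaction || slaves.isEmpty()) return master;
        return slaves.get(Math.floorMod(next.getAndIncrement(), slaves.size()));
    }
}
```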
Stage 5. Use a search engine to relieve pressure on the read database
Even a dedicated read database handles fuzzy search poorly, and read-write separation does not solve this problem. Take our trading website as an example: listed products are stored in the database, and the function users use most is searching for products, especially by title. We usually implement this with LIKE queries, but that approach is very expensive and the results are inaccurate. Instead, we can use a search engine's inverted index.
The advantages of search engine: it can greatly improve the query speed and search accuracy.
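To show why an inverted index beats a LIKE '%keyword%' scan, here is a toy version: each term maps to the set of product IDs whose title contains it, so a query becomes a few map lookups and set intersections. Splitting titles on whitespace is a simplifying assumption; real search engines use proper analyzers, and Chinese titles need word segmentation. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class InvertedIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>(); // term -> product IDs

    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    void add(long productId, String title) {
        for (String term : tokenize(title)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(productId);
        }
    }

    // AND query: products whose title contains every query term.
    Set<Long> search(String query) {
        Set<Long> result = null;
        for (String term : tokenize(query)) {
            Set<Long> ids = postings.getOrDefault(term, Set.of());
            if (result == null) result = new TreeSet<>(ids);
            else result.retainAll(ids);
        }
        return result == null ? Set.of() : result;
    }
}
```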
The costs of introducing a search engine:
A lot of maintenance work: we need to implement the index-building process and design full and incremental builds to handle non-real-time and real-time query requirements.
The search engine cluster must be maintained.
A search engine cannot replace the database; it provides accurate, fast and efficient "reads" in certain scenarios. Whether to introduce one should be decided by considering the needs of the whole system.
The structure of the system after introducing the search engine is as follows:
Stage 6. Use caching to relieve pressure on the read database
Common caching mechanisms include page-level caching, application data caching and database caching.
Caching in the application tier and database tier
As traffic grows, many users access the same hot content, and there is no need to read such popular content from the database every time. We can use caching: for example, Google's open-source Guava or Memcached as the application-layer cache, and Redis as the database-layer cache.
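A minimal cache-aside sketch for the application layer: look in the cache first and call the loader, which stands in for a database query, only on a miss. A LinkedHashMap in access order provides simple LRU eviction; it is only a stand-in for Guava's cache or Memcached, which add features such as expiry and, for Memcached, sharing between servers. Class names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class LruCacheAside<K, V> {
    private final Function<K, V> loader; // e.g. a DAO method that queries the database
    private final Map<K, V> cache;

    LruCacheAside(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true: iteration order runs from least to most recently used
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    synchronized V get(K key) {
        V v = cache.get(key);
        if (v == null) {              // miss: read from the "database" and remember it
            v = loader.apply(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    // Call after updating the underlying data so readers do not see a stale value.
    synchronized void invalidate(K key) {
        cache.remove(key);
    }
}
```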
In addition, relational databases are not a good fit for some scenarios. Suppose we want to "limit the number of password errors per day". The idea is that when a user fails to log in, we record the user's IP and the number of errors. Where should this data go? Keeping it in application memory would obviously take up too much memory; putting it in a relational database means creating a table, writing the corresponding Java bean, writing SQL, and so on. Looking at the data we want to store, it is nothing more than key:value pairs like {ip:errorNumber}. For this kind of data, we can use a NoSQL database instead of a traditional relational database.
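The {ip:errorNumber} idea as a sketch: the key combines the IP and the date, so each day's count starts from zero. A ConcurrentHashMap stands in for the NoSQL store here; with Redis, for example, the same pattern is usually an INCR on a per-IP, per-day key plus an EXPIRE so old keys disappear. The limit of 5 and the class name are hypothetical.

```java
import java.time.LocalDate;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class LoginErrorLimiter {
    static final int MAX_ERRORS_PER_DAY = 5; // hypothetical policy value

    // Stand-in for a key-value store; old keys are never cleaned up here,
    // whereas a real store would let them expire.
    private final ConcurrentMap<String, Integer> errors = new ConcurrentHashMap<>();

    private static String key(String ip, LocalDate day) {
        return ip + ":" + day; // e.g. "10.0.0.7:2025-01-17"
    }

    boolean isBlocked(String ip, LocalDate day) {
        return errors.getOrDefault(key(ip, day), 0) >= MAX_ERRORS_PER_DAY;
    }

    // Called after a failed login; atomically increments and returns today's count.
    int recordFailure(String ip, LocalDate day) {
        return errors.merge(key(ip, day), 1, Integer::sum);
    }
}
```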
Page caching
In addition to data caching, there is page caching, for example using HTML5 localStorage or cookies. Beyond the performance gain from page caching, pages that receive highly concurrent access but change infrequently should be made static wherever possible.
Advantages: reduce the pressure on the database and greatly improve the access speed
Disadvantages: need to maintain the cache server, which increases the complexity of coding.
It is worth mentioning that:
The scheduling algorithm of the cache cluster differs from those of the application servers and databases mentioned above. Consistent hashing is used to keep the cache hit rate high when cache servers are added or removed.
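A sketch of consistent hashing for a cache cluster: each cache server is placed on a hash ring as several virtual nodes, and a key is stored on the first node clockwise from its hash. Adding or removing a server only remaps the keys next to that server's nodes, so most cached entries stay where they were, unlike hash(key) % n, where changing n remaps most keys. MD5 and 100 virtual nodes per server are arbitrary choices for this example, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 100;
    private final TreeMap<Long, String> ring = new TreeMap<>(); // hash -> server

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF); // first 8 bytes as a long
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    void addServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) ring.put(hash(server + "#" + i), server);
    }

    void removeServer(String server) {
        for (int i = 0; i < VIRTUAL_NODES; i++) ring.remove(hash(server + "#" + i), server);
    }

    String serverFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no servers");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue(); // wrap around the ring
    }
}
```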
After adding the cache, the system structure is as follows:
Stage 7. Horizontal split and vertical split of database
By now, the website's transaction, product and user data are still in the same database. Although caching and read-write separation are in place, the pressure on the database keeps growing and its data volume becomes an increasingly prominent bottleneck. At this point we have two options: vertical splitting and horizontal splitting.
Data vertical split
Vertical splitting moves the data of different businesses into different databases; in our example, this means separating transaction, product and user data.
Advantages:
Relieves the pressure of keeping all businesses in one database.
More optimizations can be made according to the characteristics of the business.
Disadvantages:
The state consistency and data synchronization of multiple databases need to be maintained.
Question:
Transactions that used to span businesses must be reconsidered.
Joins across databases.
Solution to the problem:
At the application layer, avoid distributed transactions across databases wherever possible; where they are unavoidable, handle them in code.
Use third-party middleware such as MyCat, mentioned above, which provides a range of cross-database join solutions; see the MyCat official documentation for details.
The structure of the data after vertical split is as follows:
Data horizontal split
Horizontal splitting spreads the rows of one table across two or more databases. We do this when the data volume or update rate of a business reaches the limit of a single database.
Advantages:
If we can overcome the problems below, we will be able to cope with growth in both data volume and writes.
Question:
Applications that access user information must solve SQL routing: the user data is now split across two databases, so they need to know which database holds the data they want to operate on.
Primary keys must be handled differently; for example, the original auto-increment field can no longer simply be used.
If you need a paged query, it's even more troublesome.
Solution to the problem:
We can again use third-party middleware such as MyCat, which parses our SQL with its SQL parsing module and forwards each request to a specific database according to our configuration.
We can use a custom ID scheme, such as UUIDs, to guarantee globally unique primary keys.
MyCat also provides various paging solutions, for example running the paged query on each database, then merging the results and paging again.
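The three solutions can be sketched together for a user table split across databases, modelled here as in-memory lists. This is a hypothetical illustration of what middleware such as MyCat does behind its SQL interface, not MyCat's API: the shard is computed from the primary key (SQL routing), IDs are UUIDs because per-database auto-increment values would collide, and a cross-shard page is built by taking the first offset + pageSize rows from every shard, merging, sorting again and cutting out the page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

class ShardedUserTable {
    static final class User {
        final String id;
        final String name;
        final long createdAt;

        User(String id, String name, long createdAt) {
            this.id = id;
            this.name = name;
            this.createdAt = createdAt;
        }
    }

    private final List<List<User>> shards = new ArrayList<>(); // one list per database

    ShardedUserTable(int shardCount) {
        for (int i = 0; i < shardCount; i++) shards.add(new ArrayList<>());
    }

    // SQL routing: the target database is computed from the primary key.
    int shardOf(String id) {
        return Math.floorMod(id.hashCode(), shards.size());
    }

    // Per-database auto-increment IDs would collide, so IDs are random UUIDs.
    User insert(String name, long createdAt) {
        User u = new User(UUID.randomUUID().toString(), name, createdAt);
        shards.get(shardOf(u.id)).add(u);
        return u;
    }

    User findById(String id) {
        for (User u : shards.get(shardOf(id))) { // only one database is queried
            if (u.id.equals(id)) return u;
        }
        return null;
    }

    // Any of the first (offset + pageSize) rows of a shard could land on the page,
    // so each shard returns that many; the merge is sorted again and sliced.
    List<User> page(int offset, int pageSize) {
        Comparator<User> byCreatedAt = Comparator.comparingLong(u -> u.createdAt);
        List<User> merged = new ArrayList<>();
        for (List<User> shard : shards) {
            shard.stream().sorted(byCreatedAt).limit(offset + pageSize).forEach(merged::add);
        }
        merged.sort(byCreatedAt);
        int from = Math.min(offset, merged.size());
        int to = Math.min(offset + pageSize, merged.size());
        return new ArrayList<>(merged.subList(from, to));
    }
}
```

The merge step is why deep pages get expensive: every shard must return offset + pageSize rows even though only pageSize of them are kept.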
The structure of the data split horizontally is as follows:
Stage 8. Split of applications
Split applications by microservice
As the business develops, there are more and more services and applications, and we need to keep the application from becoming bloated. This means splitting it from one application into two or more. Using the example above, we can separate users, products and transactions into two subsystems: "user, product" and "user, transaction".
Split structure:
Question:
After this split, some code is duplicated. For example, both products and transactions need user information, so both systems keep the same code for handling user information. How to reuse this code is a problem that needs to be solved.
Resolve the problem:
Take the service-oriented (SOA) route and extract the frequently used common functionality into shared services.
Take the SOA service-governance route
To solve the problems caused by splitting the application, we extract the common services into a service-oriented architecture, SOA for short.
Adopt the service-oriented system structure:
Advantages:
The same code is no longer scattered across applications; the implementations live in the service centers, so the code is easier to maintain.
Database access lives in the service centers, so the front-end web application can focus on interacting with the browser.
Question:
How to make a remote service call?
Solution:
This can be solved by introducing message middleware, covered in the next stage.
Stage 9. Introduction of message middleware
As the website keeps developing, some modules may be written in different languages and some subsystems deployed on different platforms. At this point we need a platform that transmits data reliably regardless of platform and language, makes load balancing transparent, and collects and analyzes call data so that we can infer things such as the website's traffic growth rate and predict how the website should grow. Open-source options include Alibaba's Dubbo (strictly an RPC service framework rather than message middleware), which can be used with Apache ZooKeeper, an open-source distributed coordination service, to implement service registration and discovery.
The structure after introducing message middleware: