What is the distributed architecture of large e-commerce?

2025-04-19 update, from SLTechnology News&Howtos (Shulou.com, 06/02 report)

This article introduces the distributed architecture of large-scale e-commerce websites. It first gives an overview of large-scale distributed website architecture and then works through an e-commerce website as a case study.

1. Overview of large-scale distributed website architecture

1.1. Characteristics of large websites

Many users, widely distributed

High traffic and high concurrency

Massive data volumes and high service availability requirements

A hostile security environment, exposed to network attacks

Many features, rapid change, frequent releases

Gradual growth from small to large

User-centric

Free service, paid experience

1.2. Architecture goals of large websites

High performance: provide a fast access experience.

High availability: the website service can always be accessed normally.

Scalability: increase or decrease processing capacity by adding or removing hardware.

Security: provide secure access to the website, together with data encryption, secure storage, and similar strategies.

Extensibility: features and modules can be added or removed easily.

Agility: respond on demand and quickly.

1.3. Architecture patterns of large websites

Layering: the system can generally be divided into an application layer, service layer, data layer, management layer, and analysis layer.

Segmentation: split by business, module, or functional characteristics; for example, the application layer is split into the home page, the user center, and so on.

Distribution: deploy applications separately (for example on multiple physical machines) and let them cooperate through remote calls.

Clustering: deploy multiple copies of an application, module, or function (for example on multiple physical machines) and expose them together through a load balancer.

Caching: place data as close as possible to the application or the user to speed up access.

Asynchrony: turn synchronous operations into asynchronous ones. The client sends a request without waiting for the server's response; once the server has finished processing, the requester is informed by notification or by polling. This generally refers to the request-response-notification pattern.

Redundancy: add copies to improve availability, security, and performance.

Security: have effective solutions to known problems and establish discovery and defense mechanisms for unknown or potential problems.

Automation: hand repetitive work that does not need human involvement over to tools and machines.

Agility: actively accept changes in requirements and respond quickly to business development needs.

1.4. High performance architecture

A high-performance architecture is user-centric and provides a fast web access experience. The main indicators are short response time, high concurrent processing capacity, high throughput, and stable performance.

Optimization can be divided into front-end optimization, application layer optimization, code layer optimization, and storage layer optimization.

Front-end optimization: the part that comes before the website's business logic.

Browser optimization: reduce the number of HTTP requests, use browser caching, enable compression, place CSS and JS sensibly in the page, load JS asynchronously, and reduce cookie transfers.

CDN acceleration and reverse proxies.

Application layer optimization: the servers that handle the website's business. Use caching, asynchronous processing, and clustering.

Code optimization: a sound architecture, multithreading, resource reuse (object pools, thread pools, etc.), good data structures, JVM tuning, singletons, caching, and so on.

Storage optimization: caching, solid-state disks, fiber-optic transmission, optimized reads and writes, disk redundancy, distributed storage (HDFS), NoSQL, and so on.

1.5. High availability architecture

A large website should be accessible at all times and provide its services normally. Because large websites are complex and distributed, and are built on inexpensive servers and open-source databases and operating systems, high availability is hard to guarantee; in other words, failures are inevitable.

How to improve availability is therefore a pressing problem. Availability has to be considered at the architectural level from the planning stage. The industry usually expresses the availability target as a number of nines. For example, four nines (99.99%) allows about 53 minutes of unavailability per year.
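The "number of nines" figure is simple arithmetic, sketched below. The function name and the 365-day year are assumptions of this example, not something the article specifies.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget, in minutes, for an availability of `nines` nines."""
    unavailability = 10 ** (-nines)  # e.g. 4 nines -> 0.0001 (0.01%)
    return MINUTES_PER_YEAR * unavailability


for n in (2, 3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes per year")
```

For four nines this gives roughly 52.6 minutes, which matches the article's rounded figure of 53 minutes.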

Different layers use different strategies; redundant backup and failover are the general means of achieving high availability.

Application layer: generally designed to be stateless, so that it does not matter which server handles a given request. High availability is usually achieved with load balancing (which requires solving the session synchronization problem).

Service layer: load balancing, tiered management, fail-fast behavior (timeout settings), asynchronous invocation, service degradation, idempotent design, and so on.

Data layer: redundant backup (cold backup, hot backup [synchronous or asynchronous], warm backup) and failover (failure confirmation, transfer, recovery). The well-known theoretical basis for highly available data is the CAP theorem (consistency [strong consistency, user consistency, eventual consistency], availability, partition tolerance).

1.6. Scalable architecture

Scalability means increasing or decreasing the system's processing capacity by adding or removing hardware (servers) without changing the original architecture design.

Application layer: split the application vertically or horizontally, then load-balance each resulting function (via DNS, HTTP [reverse proxy], IP, or the link layer).

Service layer: similar to the application layer.

Data layer: database sharding, table sharding, NoSQL, and so on; the commonly used algorithms are hashing and consistent hashing.
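To make the consistent hashing mentioned for the data layer concrete, here is a minimal sketch of a hash ring with virtual nodes. The class name, the node names (`db0` to `db3`), the choice of SHA-256, and the figure of 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes; adding a node only remaps the keys it takes over."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per physical node
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def get_node(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First virtual node clockwise from the key, wrapping around the ring.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing(["db0", "db1", "db2"])
keys = [f"order:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("db3")
moved = sum(1 for k in keys if ring.get_node(k) != before[k])
print(f"{moved} of {len(keys)} keys moved after adding db3")
```

With plain `hash(key) % node_count`, adding a fourth node would remap most keys; on the ring, only the keys now owned by the new node move.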

1.7. Extensible architecture

Functional modules can be added or removed easily, and the code and modules themselves are easy to extend.

Modularization and componentization: high cohesion and low coupling, improving reusability and extensibility.

Stable interfaces: define a stable interface; as long as the interface stays the same, the internal structure can change "at will".

Design patterns: apply object-oriented ideas and principles, and use design patterns to design at the code level.

Message queues: modules interact through message queues, which decouples the dependencies between them.

Distributed services: extract common modules into services that other systems can use, improving reusability and extensibility.

1.8. Security architecture

Have effective solutions to known problems and establish discovery and defense mechanisms for unknown or potential problems. Security starts with awareness: establish safe and effective mechanisms and enforce them at the policy and organizational levels. For example, server passwords must not be disclosed, must be changed every month, and must not repeat any of the last three; run a security scan every week; and so on. Strengthen the security system by institutionalizing it. At the same time, pay attention to every link related to security, including infrastructure security, application system security, and data security. Security issues cannot be ignored.

Infrastructure security: covers hardware procurement, the operating system, and the network environment. In general, buy high-quality products through regular channels, choose a secure operating system, patch vulnerabilities promptly, and install antivirus software and a firewall to guard against viruses and backdoors. Set firewall policies, build a DDoS defense system, use an attack detection system, isolate subnets, and so on.

Application system security: during development, solve the known common problems correctly at the code level. Prevent cross-site scripting (XSS), injection attacks, cross-site request forgery (CSRF), information leaks through error messages and HTML comments, malicious file uploads, path traversal, and so on. A web application firewall (such as ModSecurity) and security vulnerability scanning can further strengthen application-level security.

Data security: storage security (store data on reliable equipment, with real-time and scheduled backups), custody security (encrypt important information when storing it, assign suitable personnel to be responsible for its custody and inspection, etc.), and transmission security (prevent data theft and tampering).

Commonly used encryption and decryption algorithms: one-way hashing [MD5, SHA], symmetric encryption [DES, 3DES, RC], asymmetric encryption [RSA], and so on.
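As one illustration of one-way hashing from the SHA family, the sketch below stores a password as a salted hash and verifies it later. The use of PBKDF2, the iteration count, and the function names are assumptions of this example; the article itself only lists the algorithm families.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_password(password: str, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 from the standard library."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Only the salt and the digest are stored; the password itself cannot be recovered from them, which is the point of a one-way hash.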

1.9. Agility

Website architecture design and operations management must adapt to change and provide high scalability and high extensibility, so that rapid business growth, sudden traffic spikes, and similar demands can be handled easily.

In addition to the architectural elements described above, the ideas of agile management and agile development need to be introduced, so that business, product, technology, and operations work as one, on demand and with rapid response.

1.10. Examples of large architectures

The example uses a seven-layer logical architecture: the first layer is the client layer, the second the front-end optimization layer, the third the application layer, the fourth the service layer, the fifth the data storage layer, the sixth the big data storage layer, and the seventh the big data processing layer.

Client layer: supports PC browsers and mobile apps. The difference is that a mobile app can reach the reverse proxy servers directly by IP.

Front-end layer: uses DNS load balancing, CDN local acceleration, and reverse proxy services.

Application layer: the website application cluster, split vertically by business, for example the product application and the member center.

Service layer: provides public services such as the user service, order service, and payment service.

Data layer: supports a relational database cluster (with read-write separation), a NoSQL cluster, a distributed file system cluster, and a distributed cache.

Big data storage layer: collects log data from the application and service layers, and structured and semi-structured data from the relational and NoSQL databases.

Big data processing layer: performs offline analysis with MapReduce or real-time analysis with Storm, and stores the processed data in a relational database. (In practice, offline and real-time data are classified by business requirement and stored in different databases for use by the application layer or the service layer.)

2. E-commerce website architecture case

2.1. Initial architecture of the website

A typical website starts out with three servers: one for the application, one for the database, and one for an NFS file system.

This was the traditional approach a few years ago. I once saw a website with more than 100,000 members, a vertical portal for clothing design with a very large number of images, that ran the application, the database, and image storage on a single server. It had many performance problems.

As shown below:

Mainstream website architecture has since changed dramatically. Clusters are now generally used to build a highly available design. At a minimum it looks like this:

(1) Use a cluster to make the application servers redundant and achieve high availability (the load balancer can be deployed together with the applications).

(2) Use database active/standby mode to achieve data backup and high availability.

2.2. System capacity estimation

Estimate steps:

(1) Registered users → average daily UV → daily PV → daily concurrency.

(2) Peak estimate: 2 to 3 times the normal value.

(3) Calculate the system capacity from the concurrency (concurrent requests, number of transactions) and the storage volume.

Customer requirement: reach 10 million registered users within 3 years.

Estimates of concurrency per second:

(1) Daily UV is 2 million (80/20 principle: 20% of the 10 million registered users).

(2) Each user clicks through about 30 pages a day.

(3) Daily PV: 2,000,000 × 30 = 60,000,000.

(4) Concentrated traffic: 24 × 0.2 = 4.8 hours carry 60,000,000 × 0.8 = 48,000,000 page views (80/20 principle).

(5) Concurrency per minute: 4.8 × 60 = 288 minutes; 48,000,000 / 288 ≈ 167,000.

(6) Concurrency per second: 167,000 / 60 ≈ 2,780.

(7) Assuming the peak is three times the normal value, concurrency per second can reach 8,340.

(8) At that peak, 1 millisecond sees about 8.3 requests (8,340 / 1,000).
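The estimation chain (registered users, daily UV, daily PV, concentrated traffic, per-second concurrency) can be reproduced in a few lines. The function and parameter names are assumptions of this sketch; the inputs are the article's own: 10 million users, the 80/20 rule applied twice, 30 page views per user, and a peak factor of 3. Integer percentages are used to avoid floating-point noise.

```python
def estimate_concurrency(registered_users=10_000_000, active_pct=20,
                         pv_per_user=30, busy_time_pct=20,
                         busy_traffic_pct=80, peak_factor=3):
    """Follow the 80/20 estimate: most traffic arrives in a small part of the day."""
    daily_uv = registered_users * active_pct // 100      # 2,000,000
    daily_pv = daily_uv * pv_per_user                    # 60,000,000
    busy_minutes = 24 * 60 * busy_time_pct // 100        # 288 minutes (4.8 hours)
    busy_pv = daily_pv * busy_traffic_pct // 100         # 48,000,000
    per_minute = busy_pv / busy_minutes
    per_second = per_minute / 60
    return {
        "daily_uv": daily_uv,
        "daily_pv": daily_pv,
        "busy_minutes": busy_minutes,
        "per_minute": per_minute,
        "per_second": per_second,
        "peak_per_second": per_second * peak_factor,
    }


for name, value in estimate_concurrency().items():
    print(f"{name}: {value:,.0f}")
```

Without intermediate rounding the result is about 2,778 requests per second and about 8,333 at peak; the article rounds at each step and arrives at 2,780 and 8,340.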

Do you regret not studying math well?! (I don't know if there is any mistake in the above calculation, hehe)

Server estimate (taking a Tomcat server as an example):

(1) Assuming one web server supports 300 concurrent requests per second, normal load requires about 10 servers (2,780 / 300, rounded up); [the Tomcat default configuration is 150]

(2) Peak period: about 30 servers are required (8,340 / 300 ≈ 28, rounded up with some headroom).
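The server counts follow from dividing the concurrency by the per-server capacity and rounding up. A small sketch, where the helper name is hypothetical and 300 requests per second per server is the article's assumption:

```python
import math


def servers_needed(requests_per_second: float, capacity_per_server: int = 300) -> int:
    """Smallest number of servers whose combined capacity covers the load."""
    return math.ceil(requests_per_second / capacity_per_server)


print("normal load:", servers_needed(2780))
print("peak load:", servers_needed(8340))
```

The peak result is 28; the article's figure of 30 can be read as the same number rounded up with a little spare capacity.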

Capacity estimate: the 70/90 principle

System CPU usage is generally kept at around 70% and reaches about 90% at peak. This wastes no resources and is relatively stable. Memory and I/O are treated similarly.

The above estimates are for reference only, because server configuration, the complexity of the business logic, and other factors all have an impact. CPU, disk, network, and so on are not evaluated further here.

2.3. Website architecture analysis

Based on the above estimates, there are several problems:

A large number of servers must be deployed. Sizing for the peak may mean deploying 30 web servers, yet these 30 servers are only fully used during flash sales and promotions, which is very wasteful.

All applications are deployed on the same server, and the coupling between applications is severe. Vertical and horizontal splitting are needed.

A large amount of redundant code exists across applications.

Server session synchronization consumes a lot of memory and network bandwidth.

Data requires frequent database access, so the database is under heavy pressure.

Large websites generally need the following architectural optimizations. (Optimization is what is considered during architecture design and is generally solved at the architecture or code level. Tuning is mainly the adjustment of simple parameters, such as JVM tuning; if "tuning" involves a large number of code changes, it is not tuning but refactoring.)

Business split

Application cluster deployment (distributed deployment, cluster deployment and load balancing)

Multi-level cache

Single sign-on (distributed Session)

Database cluster (read-write separation, database and table sharding)

Service-orientation

Message queue

Other technologies

2.4. Website architecture optimization

Business split

By business attribute, the system is divided into a product subsystem, shopping subsystem, payment subsystem, comment subsystem, customer service subsystem, and interface subsystem (which connects to external systems such as purchase-sale-inventory and SMS).

By grade, the business subsystems are divided into core and non-core systems. Core systems: product subsystem, shopping subsystem, payment subsystem. Non-core systems: comment subsystem, customer service subsystem, interface subsystem.

Purpose of the business split: once a business area is promoted to a subsystem, a dedicated team or department can own it, so that specialists do specialized work; this addresses the coupling and extensibility problems between modules. Each subsystem is deployed separately, which avoids the problem of centralized deployment where the failure of one application makes all applications unavailable.

Purpose of the grade definition: during traffic bursts, key applications are protected from being affected and the rest of the system degrades gracefully.
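A minimal sketch of how the core and non-core grading could drive graceful degradation: core subsystems are always served, while non-core ones are shed once load passes a threshold. The subsystem identifiers, the `load` measure (a fraction between 0 and 1), and the 0.9 threshold are assumptions made for illustration.

```python
CORE = {"product", "shopping", "payment"}
NON_CORE = {"comment", "customer_service", "interface"}


def should_serve(subsystem: str, load: float, degrade_threshold: float = 0.9) -> bool:
    """Core subsystems are always served; non-core ones are shed under heavy load."""
    if subsystem in CORE:
        return True
    return load < degrade_threshold


for name in ("payment", "comment"):
    for load in (0.5, 0.95):
        print(f"{name} at load {load}: serve={should_serve(name, load)}")
```

In a real system the degraded path would return a cached or simplified response rather than simply refusing the request.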

Split architecture diagram:

Refer to deployment scenario 2

(1) As shown above, each application is deployed separately.

(2) Core systems and non-core systems are deployed in combination.

Application cluster deployment (distributed, clustering, load balancing)

Distributed deployment: the split applications are deployed separately and communicate with each other directly through RPC.

Cluster deployment: because an e-commerce website requires high availability, each application is deployed as a cluster of at least two servers.

Load balancing: essential for a highly available system. General applications achieve high availability through a load balancer, distributed services through their built-in load balancing, and relational databases through active/standby mode.
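To show how load balancing over a small cluster gives high availability, here is a sketch of a round-robin balancer that skips servers marked as down. The class, its method names, and the server names are hypothetical; real deployments would use a dedicated load balancer or reverse proxy with health checks.

```python
import itertools


class RoundRobinBalancer:
    """Rotates over servers in order and skips those marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app1", "app2"])
print([lb.pick() for _ in range(4)])
lb.mark_down("app1")  # simulate a failed server
print([lb.pick() for _ in range(2)])
```

With at least two servers per application, losing one still leaves a server for `pick()` to return, which is the reason for the two-server minimum.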

Architecture diagram after cluster deployment:

Multi-level cache

Caches can generally be divided into two categories by where the data is stored: local cache and distributed cache. This case uses a two-level cache design: the first-level cache is a local cache and the second-level cache is a distributed cache. (There are also page caching, fragment caching, and so on, which are finer-grained divisions.)

The first-level cache holds the data dictionary, frequently used hot data, and other information that is essentially immutable or changes on a regular schedule; the second-level cache holds everything that needs caching. When the first-level cache entry has expired or is unavailable, read from the second-level cache. If the second-level cache has no entry either, read from the database.

As for the ratio at which caching pays off: at around 1:4 it is generally worth considering a cache (in theory, 1:2 is already enough).

The following cache expiration policies can be used depending on business characteristics:

(1) Automatic expiration: entries expire on their own.

(2) Triggered expiration: entries are expired by an explicit trigger.
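The two-level lookup and both expiration policies can be sketched as follows. Both levels are in-process dictionaries here; in the article's design the second level would be a distributed cache. All class and method names are assumptions of this sketch, and `None` is treated as "not cached".

```python
import time


class TTLCache:
    """A tiny cache whose entries expire automatically after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:  # automatic expiration
            del self._data[key]
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self._data.pop(key, None)


class TwoLevelCache:
    """Local cache first, then the shared cache, then the database loader."""

    def __init__(self, local, distributed, load_from_db):
        self.local = local
        self.distributed = distributed
        self.load_from_db = load_from_db

    def get(self, key):
        value = self.local.get(key)
        if value is not None:
            return value
        value = self.distributed.get(key)
        if value is None:
            value = self.load_from_db(key)
            self.distributed.set(key, value)
        self.local.set(key, value)
        return value

    def invalidate(self, key):
        """Triggered expiration: call this when the underlying data changes."""
        self.local.delete(key)
        self.distributed.delete(key)


cache = TwoLevelCache(TTLCache(60), TTLCache(3600), lambda key: f"row for {key}")
print(cache.get("product:1"))  # first call falls through to the loader
print(cache.get("product:1"))  # second call is served from the local cache
```

A short lifetime on the local level and a longer one on the shared level is one possible choice; the article does not prescribe specific values.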

Single sign-on (distributed Session)

Once the system is split into multiple independently deployed subsystems, session management inevitably becomes a problem. The usual options are session synchronization, cookies, and a distributed session. E-commerce websites generally use a distributed session.

Building on the distributed session, a complete single sign-on or account management system can then be established.

Process description

(1) When a user logs in for the first time, write the session information (user Id and user information) to the distributed session, for example with the user Id as the key.

(2) When the user comes back, look up the distributed session; if no session information exists, redirect to the login page.

(3) The session is generally implemented with cache middleware. Redis is recommended because it supports persistence, so after downtime the distributed session can reload its session information from persistent storage.

(4) When saving a session, a lifetime can be set, such as 15 minutes, after which the session times out automatically.

Combined with cache middleware, a distributed session can emulate an ordinary session very well.
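The process above can be sketched with an in-memory store standing in for the cache middleware. The class and function names are hypothetical, and one detail differs from step (1): the sketch keys the session by a random token rather than by the user Id, which is an assumption of this example.

```python
import secrets
import time


class SessionStore:
    """In-memory stand-in for a shared session store such as Redis."""

    def __init__(self, ttl_seconds=15 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._sessions = {}  # session id -> (session data, expires_at)

    def login(self, user_id, user_info):
        """Create a session after a successful login and return its id."""
        session_id = secrets.token_hex(16)
        data = {"user_id": user_id, **user_info}
        self._sessions[session_id] = (data, self.clock() + self.ttl)
        return session_id

    def get(self, session_id):
        """Return the session data, or None if it is missing or timed out."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        data, expires_at = entry
        if self.clock() >= expires_at:
            del self._sessions[session_id]
            return None
        return data


def handle_request(store, session_id):
    """Any subsystem can call this; all of them share the same store."""
    session = store.get(session_id) if session_id else None
    if session is None:
        return "redirect:/login"
    return f"hello user {session['user_id']}"


store = SessionStore()
sid = store.login(42, {"name": "alice"})
print(handle_request(store, sid))
print(handle_request(store, None))
```

Because every subsystem reads the same store, a login performed in one subsystem is visible to all the others, which is what makes single sign-on possible.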

Database cluster (read-write separation, database and table sharding)

Large websites need to store massive amounts of data. To achieve massive storage, high availability, and high performance, the system is generally designed with redundancy. There are generally two approaches: read-write separation, and database and table sharding.

Read-write separation: generally used when reads far outnumber writes. The options are one master with one standby, one master with multiple standbys, or multiple masters with multiple standbys.

This case builds on the business split and combines database and table sharding with read-write separation. As shown below:

(1) After the business split, each subsystem needs its own database.

(2) If a single database is too large, it can be split again by business characteristics, for example into a product category database and a product database.

(3) After the database split, if a table holds a large amount of data, the table is split, generally by Id, time, and so on (the advanced approach is consistent hashing).

(4) Read-write separation is applied on top of database and table sharding.
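Steps (3) and (4) amount to a routing decision for every query: pick the shard from the Id, then pick the master for writes or a standby for reads. A minimal sketch, in which the class name, the modulo sharding rule, and the database names are assumptions made for illustration:

```python
import itertools


class ShardRouter:
    """Chooses a database for a record: shard by Id, then master or standby."""

    def __init__(self, shards):
        # shards: list of {"master": name, "standbys": [names]}
        self.shards = shards
        # Reads rotate over the standbys; fall back to the master if there are none.
        self._read_cycles = [
            itertools.cycle(s["standbys"] or [s["master"]]) for s in shards
        ]

    def shard_index(self, record_id: int) -> int:
        return record_id % len(self.shards)

    def route(self, record_id: int, write: bool) -> str:
        idx = self.shard_index(record_id)
        if write:
            return self.shards[idx]["master"]  # all writes go to the master
        return next(self._read_cycles[idx])   # reads are spread over standbys


router = ShardRouter([
    {"master": "order_0_master", "standbys": ["order_0_s1", "order_0_s2"]},
    {"master": "order_1_master", "standbys": ["order_1_s1"]},
])
print(router.route(7, write=True))
print(router.route(7, write=False))
print(router.route(8, write=False))
```

Middleware such as that listed in the article performs this routing transparently; the sketch only shows the decision itself.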

For related middleware, see Cobar (Alibaba, no longer maintained), TDDL (Alibaba), Atlas (Qihoo 360), and MyCat (built on Cobar by a number of skilled developers in China, and billed as China's number one open-source project).

The problems that arise after database and table sharding, namely sequences, JOINs, and transactions, will be covered in a separate piece on sharding.

Service-orientation

Functions and modules shared by multiple subsystems are extracted and offered as public services. In this case, for example, the membership subsystem can be extracted as a public service.

Message queue

Message queues remove the coupling between subsystems and modules and make the system asynchronous, highly available, and high-performance. They are standard equipment for distributed systems. In this case, message queues are mainly used for shopping and distribution.

(1) After the user places an order, the order is written to the message queue and a response is returned to the client immediately.

(2) Inventory subsystem: reads the message from the queue and reduces the inventory.

(3) Distribution subsystem: reads the message from the queue and arranges delivery.

Many message queue products are in wide use, such as ActiveMQ, RabbitMQ, ZeroMQ, and MSMQ; choose according to the specific business scenario. RabbitMQ is worth studying.
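The order flow in steps (1) to (3) can be sketched with the standard library's `queue` module standing in for a real broker. The `Broker` class, the message fields, and the worker functions are assumptions of this sketch; a production system would use one of the products listed above.

```python
import queue


class Broker:
    """In-process stand-in for a message broker: one queue per subscriber."""

    def __init__(self):
        self._queues = {}

    def subscribe(self, name):
        self._queues[name] = queue.Queue()
        return self._queues[name]

    def publish(self, message):
        for q in self._queues.values():
            q.put(message)


broker = Broker()
inventory_q = broker.subscribe("inventory")
delivery_q = broker.subscribe("delivery")
stock = {"sku-1": 10}
shipments = []


def place_order(order_id, sku, qty):
    """Step (1): enqueue the order and return without waiting for processing."""
    broker.publish({"order_id": order_id, "sku": sku, "qty": qty})
    return {"order_id": order_id, "status": "accepted"}


def inventory_worker():
    """Step (2): consume pending orders and reduce the stock."""
    while not inventory_q.empty():
        msg = inventory_q.get()
        stock[msg["sku"]] -= msg["qty"]


def delivery_worker():
    """Step (3): consume pending orders and schedule delivery."""
    while not delivery_q.empty():
        msg = delivery_q.get()
        shipments.append(msg["order_id"])


print(place_order(1, "sku-1", 2))
inventory_worker()
delivery_worker()
print(stock, shipments)
```

The ordering code never calls the inventory or delivery code directly, so either consumer can be slow, restarted, or replaced without affecting order placement.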

Other architectures (technologies)

Besides the business split, application clustering, multi-level caching, single sign-on, database clustering, service-orientation, and message queues described above, there are also CDNs, reverse proxies, distributed file systems, big data processing systems, and other systems.

They are not covered in detail here; you can look them up on Baidu or Google, and I will share more when there is an opportunity.

2.5. Architecture summary
