What are the knowledge points of large-scale Internet architecture?

2025-02-27 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article introduces the main knowledge points of large-scale Internet architecture. Many readers have questions about this topic, so the material below is organized into simple, practical notes; I hope it helps resolve your doubts. Read on to learn more.

1. Characteristics of large-scale website system

High concurrency and high traffic

High availability

Huge amount of data

Users are widely distributed and network conditions are complex.

Hostile security environment

Rapid change of requirements and frequent iterations

Gradual development

2. The evolution of large-scale website architecture

2.1. Initial-phase architecture

Problem: when the website first launches, there are few visitors, and one server is more than enough.

Features: all resources such as applications, databases, files, etc., are on one server.

Description: the server operating system is usually Linux, the application is developed in PHP and deployed on Apache, and the database is MySQL, a stack commonly known as LAMP. By combining free, open-source software with one cheap server, you can get the system off the ground.

2.2. Separation of application services and data services

Problem: growing user traffic degrades performance, growing data exhausts storage space, and one server is no longer enough.

Features: application server, database server and file server are deployed independently.

Description: the three servers have different performance requirements: the application server handles a lot of business logic, so it needs a faster, more powerful CPU; the database server needs fast disk retrieval and data caching, so it needs faster disks and more memory; the file server stores a large number of files, so it needs larger disk capacity.

2.3. Use caching to improve performance

Problem: as the number of users increases, too much pressure on the database leads to access delays.

Features: website access follows the same law as wealth distribution: 80% of accesses concentrate on 20% of the data. Caching this small, hot portion of the database's data in memory reduces the number of database accesses and relieves database load.

Description: there are two kinds of cache: the local cache on the application server and the remote cache on a distributed cache server. The local cache is faster to access, but its capacity is limited and it competes with the application for memory. A distributed cache can be clustered, so in theory it provides a caching service without memory-capacity limits.
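To make the local-cache idea concrete, here is a minimal sketch in Python. The `load_user_from_db` function and its query are hypothetical stand-ins for a real database call; real systems would use a purpose-built cache (e.g. a bounded LRU) exactly as `functools.lru_cache` does here.

```python
# A minimal sketch of a local (in-process) cache. `load_user_from_db`
# is a hypothetical stand-in for a real SELECT query.
from functools import lru_cache

DB_CALLS = 0  # count simulated database round-trips

@lru_cache(maxsize=1024)  # bounded: local caches compete with the app for memory
def load_user_from_db(user_id: int) -> dict:
    global DB_CALLS
    DB_CALLS += 1  # stand-in for a real database query
    return {"id": user_id, "name": f"user-{user_id}"}

# The 80/20 rule in action: repeated hot reads hit memory, not the database.
for _ in range(100):
    load_user_from_db(42)
print(DB_CALLS)  # the database was queried only once
```

One hundred reads of the same hot key cost a single database round-trip; the other ninety-nine are served from memory.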

2.4. Use an application server cluster

Problem: after caching is in place, database pressure is effectively relieved, but the number of request connections a single application server can handle is limited, and it becomes the bottleneck at peak times.

Features: multiple servers provide services to the outside world at the same time through load balancing to solve the problem of insufficient processing capacity and storage space of a single server.

Description: the use of clusters is a common means for the system to solve the problems of high concurrency and massive data. By adding resources to the cluster to improve the concurrent processing capacity of the system, the load pressure of the server is no longer the bottleneck of the whole system.
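The simplest load-balancing strategy is round-robin, sketched below with hypothetical server names; production balancers (Nginx, LVS, HAProxy) add health checks and weighting on top of the same idea.

```python
# A toy round-robin load balancer: requests are spread evenly across
# peer application servers. Server names are hypothetical.
from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, servers):
        self._servers = cycle(servers)

    def route(self, request):
        server = next(self._servers)  # pick the next server in turn
        return f"{server} handled {request}"

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
results = [lb.route(f"req-{i}") for i in range(4)]
# req-3 wraps around to app-1 again
```

Adding capacity means adding a name to the list: no single server's limit caps the cluster.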

2.5. Database read-write separation

Problem: after the website uses cache, most of the data read operations can be accessed without going through the database, but some read operations and all write operations still need to access the database. After the users of the website reach a certain scale, the database becomes the bottleneck of the website because of the high load pressure.

Features: most mainstream databases provide master-slave hot backup. By configuring a master-slave relationship between two database servers, data updates on one server are synchronized to the other. The website uses this master-slave hot standby to separate database reads from writes, thereby relieving database load.

Description: the application server writes to the master database, and the master synchronizes updates to the slave through the master-slave replication mechanism, so read operations can be served from the slave. To make access convenient after the split, a dedicated data access module on the application server makes the read-write separation transparent to the application.
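Such a data access module can be sketched as a router that classifies each statement: reads go to a slave, everything else to the master. Connection objects here are just placeholder strings; a real module would hand back actual pooled connections.

```python
# Sketch of a read-write-splitting data access module: writes go to the
# master, reads are balanced across slaves. Connections are illustrative strings.
import itertools

class ReadWriteRouter:
    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)  # balance reads across slaves

    def connection_for(self, sql: str) -> str:
        # Crude classification: anything that is not a SELECT goes to the master.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._slaves)
        return self.master

router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
assert router.connection_for("SELECT * FROM users") == "slave-1"
assert router.connection_for("UPDATE users SET name = 'x'") == "master-db"
```

Callers just ask for a connection; they never need to know the split exists.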

2.6. Reverse proxy and CDN acceleration

Problem: the network environment in China is complex, and when users from different regions visit the website, the speed varies greatly.

Features: CDN and reverse proxy are used to speed up the static resource access of the system.

Description: the basic principle of both CDN and reverse proxy is caching. The difference is that a CDN is deployed in the network provider's data centers, so a user's request is served from the provider's data center nearest to them, while the reverse proxy sits in the website's central data center: when a request reaches it, the reverse proxy server is hit first, and if the requested resource is cached there, it is returned to the user directly.

2.7. Distributed file system and distributed database

Problem: as a large website's business keeps growing, even a database split from one server into two still cannot meet demand.

Features: the database adopts distributed database, and the file system adopts distributed file system.

Description: a distributed database is the last resort of database splitting, used only when single-table data reaches a very large scale. Before that, the more common means of splitting is by business: deploying different business databases on different physical servers.
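Both splitting strategies reduce to routing. A sketch, with hypothetical host names: a static map implements business sub-databases, and a hash function implements the last-resort sharding of one huge table.

```python
# Business sub-databases: each business line lives on its own physical
# database server. Host names are hypothetical.
BUSINESS_DATABASES = {
    "users":  "db-users.internal:3306",
    "orders": "db-orders.internal:3306",
    "items":  "db-items.internal:3306",
}

def database_for(business: str) -> str:
    """Route a query to the database that owns this business line."""
    return BUSINESS_DATABASES[business]

# Last resort for a single huge table: hash-shard rows across N databases.
def shard_for(user_id: int, shards: int = 4) -> str:
    return f"db-users-shard-{user_id % shards}.internal:3306"

print(database_for("orders"))  # db-orders.internal:3306
print(shard_for(10))           # db-users-shard-2.internal:3306
```

The business map is easy to operate and reason about; the hash shard is reserved for when one table alone outgrows a server.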

2.8. Use NoSQL and search engines

Problem: as the business of the website becomes more and more complex, the demand for data storage and retrieval becomes more and more complex.

Features: the system introduces NoSQL database and search engine.

Description: NoSQL databases and search engines have better support for scalable and distributed features. The application server accesses all kinds of data through the unified data access module, which reduces the trouble of the application program managing many data sources.

2.9. Business split

Problem: the business scenarios of large websites are becoming more and more complex, divided into multiple product lines.

Features: divide and conquer the entire website business into different product lines. The system is split and reformed according to the business, and the application server is deployed separately according to the business differentiation.

Description: applications can establish relationships through hyperlinks or distribute data through message queues; more often, of course, they form a complete, associated system by accessing the same data storage.

Vertical split: split a large application into multiple small ones; if a new business is relatively independent, design and deploy it directly as an independent Web application. Vertical splitting is relatively simple: by sorting out the business, the less-related parts can be spun off.

Horizontal split: extract reused services and deploy them independently as distributed services; new business then only needs to call these services. This requires identifying reusable services, designing service interfaces, and standardizing service dependencies.

2.10. Distributed service

Problem: as splitting makes each application smaller while the storage system keeps growing, the overall complexity of the system increases exponentially, and deployment and maintenance become harder and harder. Because every application has to connect to every database system, database connection resources run out and service is denied.

Features: public services are extracted and deployed independently. These reusable services connect to the database and provide common business services through distributed services.

3. Large-scale website architecture patterns

3.1. Layering

A layered structure is common in large-scale website architecture: the software system is divided into an application layer, a service layer, and a data layer.

Application layer-responsible for specific business and view presentation. Such as the home page of the website and search input and results display.

Service layer-provides service support for the application layer. Such as user management service, shopping cart service and so on.

Data layer-provides data storage and access services. Such as database, cache, file, search engine and so on.

Layered architecture constraints: cross-layer calls (the application layer calling the data layer directly) and reverse calls (the data layer calling the service layer, or the service layer calling the application layer) are prohibited.

Layering can continue within each layer: for example, the application layer can be subdivided into a view layer and a business logic layer, and the service layer into a data interface layer and a logic processing layer.

3.2. Segmentation

Separate different functions and services and package them into module units with high cohesion and low coupling. This contributes to the development and maintenance of the software, facilitates the distributed deployment of different modules, and improves the concurrent processing ability and function expansion ability of the website.

3.3. Distributed system

For large websites, one of the main purposes of layering and segmentation is to enable distributed deployment of the resulting modules: different modules are deployed on different servers and cooperate through remote calls.

Distribution means more machines can share the work; with more CPU, memory, and storage resources, the system can handle more concurrent access and more data, and thus serve more users.

Distribution also introduces some problems:

Service calls must go through the network, and network latency affects performance.

The more servers there are, the higher the probability that one fails, which lowers availability.

Keeping data consistent is very difficult, and distributed transactions are hard to guarantee.

Dependencies between website components become tangled, making development, management, and maintenance difficult.

Common distributed solutions:

Distributed applications and services

Distributed static resources

Distributed data and storage

Distributed computing

3.4. Cluster

A cluster is multiple servers deploying the same application and providing service together through a load-balancing device.

A cluster needs scalability and a failover mechanism: scalability means machines can be added to or removed from the cluster according to user traffic; failover means that when a machine fails, the load balancer or failover mechanism forwards its requests to other machines in the cluster, so users are unaffected.

3.5. Caching

Caching stores data closest to where it is used in order to speed up processing; it is the first resort for improving software performance.

In website applications, caching can not only speed up data access, but also reduce the load of back-end applications and data storage.

Common caching methods:

CDN

Reverse proxy

Local cache

Distributed cache

There are two prerequisites for using caching:

Data access hotspots are uneven, and frequently accessed data should be placed in the cache.

Data should remain valid for a certain period and not expire too quickly; otherwise cached data becomes stale and causes dirty reads.
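The expiry prerequisite can be sketched as a tiny TTL cache: entries are valid for a fixed window, after which reads report a miss and the caller must reload from the source instead of being served stale data. This is a minimal illustration, not a production cache.

```python
# A minimal TTL cache sketch: entries expire after a fixed window,
# so stale ("dirty") data is never served past its validity period.
import time

class TTLCache:
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # expired: evict and report a miss
            del self._store[key]
            return default
        return value

cache = TTLCache(ttl_seconds=0.05)
cache.set("price", 100)
assert cache.get("price") == 100   # fresh hit
time.sleep(0.06)
assert cache.get("price") is None  # expired: caller must reload
```

Choosing the TTL is the trade-off the text describes: too long risks dirty reads, too short wastes the cache.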

3.6. Async

One of the important goals and driving forces of software development is to reduce software coupling. The less direct relationship between things, the less influence each other, and the easier it is to develop independently.

In large-scale website architecture, the means of decoupling a system include not only layering, segmentation, and distribution, but also another important one: asynchrony.

Messages between business steps are not passed through synchronous calls; instead, a business operation is split into multiple stages that cooperate asynchronously by sharing data.

Within a single server, asynchrony can be achieved with multiple threads sharing a memory queue: the thread handling the front of the business operation writes to the queue, and a subsequent thread reads from the queue and processes the data.

In a distributed system, multiple server clusters are asynchronous through distributed message queues.

An asynchronous architecture is a typical producer-consumer pattern: producers and consumers never call each other directly. Asynchronous message queues also bring the following benefits:

Improve system availability

Speed up the response

Smooth out peaks in concurrent access
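The single-server case described above (threads sharing a memory queue) can be sketched with Python's thread-safe `queue.Queue`; in a distributed system the queue would be replaced by a message broker, but the shape is the same.

```python
# Producer-consumer decoupling within one process: the front of the
# business operation enqueues work, a background thread drains it.
import queue
import threading

tasks: "queue.Queue" = queue.Queue()
results = []

def consumer():
    while True:
        item = tasks.get()
        if item is None:          # sentinel: stop the worker
            break
        results.append(item * 2)  # stand-in for the slow back half of the operation
        tasks.task_done()

worker = threading.Thread(target=consumer)
worker.start()

for i in range(5):  # the producer returns immediately after enqueueing
    tasks.put(i)
tasks.put(None)
worker.join()

print(results)  # [0, 2, 4, 6, 8]
```

The producer never waits on the consumer, which is exactly what absorbs traffic peaks: bursts pile up in the queue instead of overwhelming the back end.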

3.7. Redundancy

For large websites, server downtime is an inevitable event. In order to ensure that the website can continue to serve when some servers are down, and there is no data loss, a certain degree of redundant operation of the server and redundant backup of data are needed. In this way, when a server goes down, the services and data access on it can be transferred to other machines.

Even services with low traffic and low load must be deployed on at least two servers as a cluster, so that redundancy provides high availability. Besides regular backup and cold archival storage, high availability of online business also requires master-slave separation of the database with real-time hot backup.

To survive total paralysis caused by earthquakes, tsunamis, and other force majeure events, some large websites back up the entire data center, deploying disaster-recovery data centers around the world and synchronizing programs and data to them in real time.

3.8. Automation

The automation architecture design of large-scale website architecture mainly focuses on release, operation and maintenance:

Automation of the release process

Automated code management

Automated testing

Automated security monitoring

Automated deployment

Automation of operation and maintenance

Automatic monitoring and control

Automatic alarm

Automated failover

Automatic failure recovery

Automatic degradation

Automatic allocation of resources

3.9. Security

Passwords and mobile verification codes for identity authentication

Important operations such as login and transactions require encrypted network communication, and stored sensitive data such as user information is also encrypted.

Use CAPTCHAs to identify and block bot attacks on the website.

Defend against XSS attacks, SQL injection, and other common website attacks through input escaping and similar measures.

Filter spam and sensitive information

Apply risk control to important operations such as transfers, based on transaction patterns and transaction information

4. Core architecture elements of large-scale websites

A popular definition of architecture: the highest-level plan, and the decisions that are hardest to change.

In addition to the system functional requirements, the architecture needs to focus on the following architectural elements:

4.1. Performance

Performance problems are everywhere, so there are many ways to optimize website performance:

Front end

Browser cache

Static resource compression

Rational layout of pages

Reduce cookie transmission

CDN

Application server

Local cache

Distributed cache

Asynchronous message queuing

Cluster

Code level: multithreading, better memory management

Database

Indexes

Database caching

SQL optimization

4.2. Availability

Availability refers to the ability to provide services to users when some servers fail.

Redundancy

Set up clusters to provide services through load balancing equipment

Data is stored on multiple servers and backed up with each other

Automation: reduce the possibility of introducing faults into the online environment by means of pre-release verification, automated testing, automated release, grayscale release, etc.

4.3. Scalability

Scalability is measured by whether a cluster can be built from multiple servers, whether server nodes can easily be added to or removed from the cluster, whether the service remains the same after nodes are added or removed, and whether there is a limit on the total number of servers the cluster can hold.

Application server cluster-as long as no state is stored on the servers, all servers are peers, and servers can be continuously added to the cluster through the load-balancing device.

Cache server cluster-adding a new server may invalidate the cache routing, making most cached data in the cluster inaccessible. Although cached data can be reloaded from the database, if the application depends heavily on the cache, this may crash the website. The cache routing algorithm must be improved to keep cached data reachable.
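Consistent hashing is one common improvement to the cache routing algorithm: when a server is added, only the keys near the new node's position on the hash ring remap, so most cached data stays reachable. A minimal ring, without the virtual nodes production implementations add:

```python
# Minimal consistent-hash ring: adding a node remaps only a fraction of
# keys, unlike modulo hashing which remaps almost all of them.
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, nodes):
        self._ring = sorted((self._hash(n), n) for n in nodes)
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # First node clockwise from the key's position (wrapping around).
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._keys)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
before = {k: ring.node_for(k) for k in map(str, range(100))}
bigger = ConsistentHashRing(["cache-1", "cache-2", "cache-3", "cache-4"])
moved = sum(before[k] != bigger.node_for(k) for k in before)
# only keys falling into the new node's arc of the ring remap
```

Real deployments place many virtual nodes per physical server on the ring to even out the arcs.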

Relational database cluster-although relational databases support mechanisms such as data replication and master-slave hot backup, they are hard to scale into large clusters, so scalability schemes for relational databases must be implemented outside the database: servers hosting multiple databases are formed into a cluster by means of routing and partitioning.

NoSQL database cluster-support for scalability is usually very good, because NoSQL was born to handle huge volumes of data.

4.4. Extensibility

Extensibility is measured by whether new business products can be added transparently, without affecting existing products and with little or no change to existing functionality. The main means are event-driven architecture and distributed services.

4.5. Security

Security protects the website from malicious attacks and important data of the website from being stolen.

This concludes the study of the knowledge points of large-scale Internet architecture. Pairing the theory with hands-on practice is the best way to learn, so go and try it!
