Developers learning Linux, the finale: experience in large-scale system development

2025-02-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

I. Preface

This article is based on one of my slide decks, which I prepared when a friend invited me to give a technical lecture of about two hours at a company. The topic I chose was "how to develop a large system".

Here I define a large-scale system as one whose average daily PV (page views) is above 10 million; the likes of JD.com and Taobao belong to the category of giant systems.

This article therefore covers only open-source, free technologies. Technologies or services that have to be purchased, such as F5 hardware acceleration, DNS-based load balancing and CDN acceleration, are not covered.

This article was originally meant to appear at the end of the "developers learn Linux" series, but since I may not be able to finish the whole series, whether for lack of time and energy or because my attention moves elsewhere, I am publishing it ahead of schedule.

II. Starting from two systems

2.1 The server-side architecture of a mobile Internet company

The image above is the server-side architecture of a mobile Internet company, which supports access requests from millions of clients at home and abroad. It has the following characteristics:

1. Multi-level clustering. Clusters are deployed at the Web server layer, the NoSQL layer and the database layer, which greatly shortens the response time of each layer so that more requests can be served per unit time.

2. NoSQL (Memcached). In the NoSQL field, both Memcached and Redis have a large user base; this architecture uses Memcached.

3. Database read-write separation. Most database servers now support a master-slave mechanism or a publish-subscribe mechanism, which makes read-write separation possible, reduces the conditions under which database contention and deadlocks arise, and greatly shortens response time (where no database cluster is available, splitting the data across several databases can also be considered).

4. Load balancing. Nginx load-balances the Web servers, and Memcached comes with its own load balancing. (Note: Nginx load balancing should be covered in this series; take a look if you are interested.)

2.2 Architecture diagram of a company's production management system

The picture above is the architecture designed for a company's distributed system. The company has several production sites in different cities and provinces, each with its own workshops, and each site is connected to the head office through a data link.

The distinguishing feature of this system is that every product on the assembly line carries a unique bar code. Before an operation is performed at a given station on the production line, the bar code on the product is scanned and the system runs a number of checks against it: for example, whether the bar code may be used (it must not belong to a product that has already been delivered to a customer), and whether the product has completed every process that must be finished before this one. If the checks pass, the name of the current process, the operator, the operation time and the operation result are recorded.
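The scan-time check described above can be sketched roughly as follows. This is only an illustration under assumptions of my own: the process names in ROUTE, the Product class and the check_and_record function are all hypothetical and do not come from the actual system, which would read and write a database rather than an in-memory object.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical process route for illustration; a real route has dozens of steps.
ROUTE = ["assemble", "solder", "inspect", "pack"]


@dataclass
class Product:
    barcode: str
    shipped: bool = False                        # already delivered to a customer
    history: list = field(default_factory=list)  # recorded operations


def check_and_record(product, step, operator, result="OK"):
    """Validate the scanned product for `step`, then record the operation."""
    if product.shipped:
        return False, "bar code belongs to a product already delivered"
    # Every process that precedes `step` in the route must have finished OK.
    done = {rec["step"] for rec in product.history if rec["result"] == "OK"}
    missing = [s for s in ROUTE[:ROUTE.index(step)] if s not in done]
    if missing:
        return False, "unfinished prior processes: " + ", ".join(missing)
    product.history.append(
        {"step": step, "operator": operator, "time": datetime.now(), "result": result}
    )
    return True, "recorded"
```

Scanning a new product at the "solder" station before "assemble" has been recorded is rejected, while scanning it at "assemble" first and then at "solder" records both operations.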

A product goes through dozens of processes from launch to completion, and anywhere from several hundred thousand to several million products come off the line every month, so the amount of data generated in a month is considerable. The particular challenge is how to keep the impact on production to a minimum when the network between plants is unstable.

The system architecture features:

● All business logic is centralized on the server and exposed as services, so that clients pick up adjustments to the business logic as soon as they are deployed.

● The Service servers are deployed as a cluster, with Nginx doing the scheduling.

● Redis is used for NoSQL. Compared with Memcached, Redis supports more data types. Redis also offers persistence, so the latest information on the product behind each bar code can be stored in Redis, and routine queries (such as whether a bar code has been used, or a product's current status) can be answered from Redis instead of the database, which greatly reduces the pressure on the database.

● The database uses a master-slave mechanism to separate reads from writes, which also improves response speed.

● A message queue (MQ) and ETL are used: actions that can be processed asynchronously are stored in MQ and then executed by ETL (for example, notifying the relevant personnel by email once an order is completed).

● System monitoring is implemented with Zabbix, which monitors servers, applications and key network equipment 7 × 24 and notifies IT support personnel of major exceptions in time.

Because production at the sites other than the headquarters is on a smaller scale, the branch plants do not use a complex architecture. However, because defective products returned by customers are repaired in the headquarters workshop, the headquarters production system needs to keep the branch workshop data. Branch workshop data is therefore written to both the branch production database and the branch MQ server, and the headquarters ETL server then reads it and writes it into the headquarters system.

If the network between a branch and the headquarters is interrupted, the branch system can still work independently until the network is restored.

III. System quality assurance

3.1 Unit Test

Unit testing refers to checking and verifying the smallest testable unit of the software. Generally speaking, a unit test determines the behavior of a specific function under a specific condition (or scenario). Common development languages have corresponding unit testing frameworks; commonly used unit testing tools include:

JUnit / NUnit / xUnit.net / Microsoft.VisualStudio.TestTools

The importance of unit testing and how to write unit test cases are not detailed in this article; there are plenty of articles on the subject online. In short, the larger and the more important the system, the more important unit testing becomes.

For unit tests that depend on external components, such as a Web container, mock testing can be used. Java developers can use the EasyMock testing framework: http://easymock.org/.
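To make the idea of mock testing concrete, here is a minimal sketch. It uses Python's standard unittest and unittest.mock modules purely for illustration rather than EasyMock; the fetch_title function, the shape of the response dictionary and the example URL are all assumptions of mine. The point is that the external dependency (an HTTP client) is replaced by a mock, so the test needs no network or Web container.

```python
import unittest
from unittest.mock import Mock


def fetch_title(http_client, url):
    """Return the upper-cased title from a response (hypothetical function under test)."""
    response = http_client.get(url)
    if response["status"] != 200:
        return None
    return response["body"]["title"].upper()


class FetchTitleTest(unittest.TestCase):
    def test_returns_upper_cased_title(self):
        client = Mock()  # stands in for the real external dependency
        client.get.return_value = {"status": 200, "body": {"title": "hello"}}
        self.assertEqual(fetch_title(client, "http://example.invalid/a"), "HELLO")
        client.get.assert_called_once_with("http://example.invalid/a")

    def test_returns_none_on_error_status(self):
        client = Mock()
        client.get.return_value = {"status": 500, "body": {}}
        self.assertIsNone(fetch_title(client, "http://example.invalid/a"))
```

Saved to a file, the tests can be run with python -m unittest followed by the file name.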

3.2 Code quality management platform

For multi-person team projects, there is usually a coding specification that tells everyone how to write code in a consistent team style. Even so, there is no guarantee that every member of the team, especially those who join later, will actually follow it, so a platform is needed to enforce it. SonarQube is recommended here.

SonarQube is an open source platform for managing the quality of source code. Sonar is not only a quality data reporting tool, but also a code quality management platform. Supported languages include: Java, PHP, C#, C, Cobol, PL/SQL, Flex and so on. Main features:

● Code coverage: shows which lines of code are exercised by the unit tests

● Improving coding rules

● Searching coding rules: query by name, plug-in, activation level and category

● Project search: query by project name

● Comparing data: compare the trends of any measures in the same table

Of course, besides the code quality management platform, a source code management system also helps: code is reviewed before each commit, and every change to the code can be traced.

Some important systems that I have managed or worked on took this approach: besides all program code, the creation scripts for the tables, views, functions and stored procedures in the system's database are also kept under source version control, at a very fine granularity, with one SQL file per object.

Although this approach is somewhat tedious in practice, it makes tracing code changes very convenient.

IV. System performance guarantee

4.1 Cache

A cache keeps data that is frequently used but relatively rarely changed in memory. Every update is persisted to the database or file system and applied to the cache at the same time, and queries use the cache whenever possible.
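The update-and-query pattern just described can be sketched as follows. This is a minimal illustration, not a production cache: the WriteThroughCache class is hypothetical, and a plain dictionary stands in for the database or file system.

```python
class WriteThroughCache:
    """Keeps hot data in memory; every update goes to the store and to the cache."""

    def __init__(self, store):
        self.store = store   # stands in for a database or file system
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def update(self, key, value):
        self.store[key] = value   # persist first
        self.cache[key] = value   # then refresh the cached copy

    def query(self, key):
        if key in self.cache:     # use the cache whenever possible
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value   # populate the cache for later queries
        return value
```

The first query for a key falls through to the store and counts as a miss; repeated queries for the same key are served from memory.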

Cache implementation methods: a custom implementation, or NoSQL.

Custom implementation: can take advantage of classes provided by the SDK, such as Dictionary.

Advantages: partially improves query efficiency.

Disadvantages: cannot be shared across applications or servers, being limited to a single application; lacks a good cache lifecycle management strategy.

NoSQL

Memcached

Advantages: works across applications and servers; flexible lifecycle management strategy; supports high concurrency; supports distributed deployment.

Disadvantages: no persistence; data is held only in memory and is lost after a restart, so it needs to be "hot loaded" again; supports only Key/Value.

Redis

Advantages: works across applications and servers; flexible lifecycle management strategy; supports high concurrency; supports clustering; supports persistence; supports Key/Value, List, Set and Hash data structures.

All of the above share one characteristic: the corresponding Value, List, Set or Hash has to be looked up by Key.

Besides Memcached and Redis, there are other NoSQL databases, such as MongoDB, as well as databases that support NoSQL features, such as PostgreSQL (> V9.4). Here is a comparison of the NoSQL features of MongoDB and PostgreSQL:

Characteristics of document-oriented NoSQL databases:

(1) No table structure needs to be defined

Data can be used as if a table structure had been defined even though none was, which saves the hassle of changing the table structure.

(2) Complex query conditions can be used

Unlike key-value stores, document-oriented databases can retrieve data through complex query conditions. Although they lack relational-database capabilities such as transaction processing and Join, they can do basically everything else.

NoSQL mainly improves efficiency, while relational databases ensure data safety; each has its own scenarios. A typical enterprise management system with little concurrency has no need for NoSQL, whereas Internet projects and other systems with high concurrency use NoSQL more. In the end, though, important data should still be saved to a relational database.

This is why many companies use both NoSQL and relational databases.

4.2 Asynchronous

Asynchronous means that after calling a method, the caller does not wait for the method to finish before carrying on with subsequent operations; it returns immediately after the call and is ready for the user's next instruction.

The print manager is an example of asynchrony. Someone may have several documents of hundreds of pages each to print. They can open one document and click print, then open another document and print that too. Although printing hundreds of pages takes a long time, subsequent print requests are queued in the print manager, and the second document is printed once the first has finished.

Asynchrony exists at two levels: at the programming language level, and through mechanisms such as message queuing.

Language-level asynchrony: most languages, such as Java and C#, support asynchronous processing.
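As an illustration of language-level asynchrony, here is a minimal sketch using Python's asyncio instead of Java or C#. The print_document coroutine and its timings are made up for the example, and asyncio.sleep merely stands in for slow I/O such as printing.

```python
import asyncio


async def print_document(name, seconds):
    """Simulate a slow job; asyncio.sleep stands in for real I/O."""
    await asyncio.sleep(seconds)
    return f"{name} done"


async def main():
    # Start both jobs without waiting for the first one to finish.
    first = asyncio.create_task(print_document("report", 0.2))
    second = asyncio.create_task(print_document("invoice", 0.1))
    # The caller is free to do other work here before collecting the results.
    return await asyncio.gather(first, second)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Both jobs run concurrently, so the total time is roughly that of the longer job rather than the sum of the two.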

Asynchrony through message queuing

Implementing asynchrony is only one of the basic functions of a message queue, which also provides the following:

● Decoupling

● Elasticity & peak-load handling

● Recoverability

● Delivery guarantee

● Ordering guarantee

● Buffering

● Understanding data flow

● Asynchronous communication

Note: message queuing is a highly effective form of communication between processes or applications, and it is key to building powerful distributed applications.
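The producer/consumer relationship behind a message queue can be sketched within a single process. This is only a stand-in under stated assumptions: Python's queue.Queue and a worker thread play the role of the broker and the consumer, whereas a real deployment would use one of the message queue products and separate processes or servers. The send_email and run_demo functions are hypothetical.

```python
import queue
import threading


def send_email(order_id, outbox):
    """Hypothetical slow action; here it only records what would be sent."""
    outbox.append(f"order {order_id} completed")


def worker(tasks, outbox):
    """Consumer: take messages off the queue and process them one by one."""
    while True:
        order_id = tasks.get()
        if order_id is None:       # sentinel: no more messages
            tasks.task_done()
            break
        send_email(order_id, outbox)
        tasks.task_done()


def run_demo(order_ids):
    tasks = queue.Queue()
    outbox = []
    consumer = threading.Thread(target=worker, args=(tasks, outbox))
    consumer.start()
    for order_id in order_ids:     # producer: put() returns immediately
        tasks.put(order_id)
    tasks.put(None)
    tasks.join()                   # wait until every message has been processed
    consumer.join()
    return outbox
```

The producer never calls send_email directly (decoupling), messages wait in the queue while the consumer is busy (buffering), and with a single consumer they are handled in the order they were sent.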

Common message queues are listed below; choose according to the characteristics of the system and how familiar the operations and maintenance team is with each:

● MSMQ

● ActiveMQ

● RabbitMQ

● ZeroMQ

● Kafka

● MetaMQ

● RocketMQ

4.3 Load balancing

Load balancing distributes requests across the servers in a cluster according to a given load strategy, so that the whole server farm shares the work of handling the website's requests.
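As one example of a "load strategy", here is a minimal round-robin sketch. Round-robin is only one of several possible strategies, and the RoundRobinBalancer class and the server names are hypothetical; real load balancers also handle health checks, weights and failover.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backend servers in turn, one per request."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


balancer = RoundRobinBalancer(["web1:8080", "web2:8080", "web3:8080"])
assignments = [balancer.pick() for _ in range(5)]
```

Here the five requests are assigned to web1, web2, web3, web1 and web2 in that order.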

Common load balancing schemes

● Windows load balancing: NLB

● Linux load balancing: LVS

● Web load balancing: Nginx

● hardware-level load balancing: F5

The first three are free solutions. As a hardware-based solution, F5 is rarely used in ordinary enterprises because it is said to be quite expensive; the only user I know of is a world-class beverage company.

4.4 Separation of read and write

To ensure stability, many database products offer a dual-machine hot-backup function.

That is, the first database server is the production server that handles inserts, deletes and updates, while the second database server mainly handles reads.

Principle:

Let the master database (master) handle transactional insert, update and delete operations (INSERT, UPDATE, DELETE), and let the slave database (slave) handle SELECT queries.

Usually we handle this in application code, but there is also a good deal of read-write separation middleware, including commercial products, that can automatically dispatch read and write operations to different databases.
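Handling read-write separation in code can be sketched as a small router. This is a simplified illustration with assumptions of my own: the ReadWriteRouter class is hypothetical, connections are represented by plain names, and the statement type is detected only by its first keyword.

```python
import itertools


class ReadWriteRouter:
    """Chooses a connection name based on the kind of SQL statement."""

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master if no slave is configured.
        self._slaves = itertools.cycle(slaves or [master])

    def route(self, sql):
        stripped = sql.strip()
        verb = stripped.split(None, 1)[0].upper() if stripped else ""
        if verb == "SELECT":
            return next(self._slaves)   # reads rotate across the slaves
        return self.master              # writes and anything else go to the master
```

A router built with one master and two slaves sends INSERT, UPDATE and DELETE to the master and alternates SELECT statements between the slaves. Because replication to a slave may lag behind the master, a read that must see a row written a moment ago may still need to go to the master; the sketch does not handle that case.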

In large-scale systems, the master and the slave are sometimes each a cluster, which ensures faster response, and the failure of a single server in a cluster does not affect the system's ability to respond to the outside.

V. System security assurance

5.1 XSS attacks

Guarding against XSS attacks

An XSS attack is similar to a SQL injection attack: before attacking, the attacker first looks for a website with an XSS vulnerability. There are two types of XSS vulnerability, DOM Based XSS and Stored XSS. In theory, an XSS vulnerability exists wherever input is accepted and the input data is not processed. How harmful the vulnerability is depends on the power of the attack code, and attack code is not limited to script.

DOM Based XSS

DOM Based XSS is an attack based on the DOM structure of a web page. Its characteristic is that only a small number of people are affected.

Stored XSS

Stored XSS is a stored (persistent) XSS vulnerability. Because the attack code has been saved on the server or in the database, many people become victims. Suppose there are two pages, one responsible for submitting content and the other for displaying the submitted content (forum posting and reading posts are a typical case):

Submission: window.open("www.b.com?param=" + [xss_clean])

Page content:

In this way, if content submitted by a user on site A is not processed when it is displayed, it will open a page on site B and display the relevant sensitive content there.

Precautions against XSS attacks:

● HTML encoding

● Special character filtering
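HTML encoding can be illustrated with Python's standard html.escape function. The render_comment function and the sample input are hypothetical; the sketch only shows that encoded input is displayed as text instead of being executed as script.

```python
import html


def render_comment(user_text):
    """Encode user-supplied text before placing it inside HTML."""
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


malicious = '<script>window.open("www.b.com")</script>'
safe_html = render_comment(malicious)
```

After encoding, the angle brackets and quotes in the submitted text become character entities such as &lt; and &quot;, so the browser shows the text rather than running it.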

5.2 SQL injection

SQL Injection

A SQL injection attack inserts SQL commands into the input fields of a Web form or into the query string of a page request in order to trick the server into executing malicious SQL commands.

Forms in which user input is used directly to construct (or influence) dynamic SQL commands, or is passed as input parameters to stored procedures, are particularly vulnerable to SQL injection.

For example, when we log in to a system, the software runs a query like the following underneath:

Login SQL statement: SELECT COUNT(*) FROM Login WHERE UserName='admin' AND Password='123456'

If admin'-- is entered as the user name, the statement becomes the following. Everything after the -- comment marker, including the password check, is ignored, so the login succeeds without the correct password:

SELECT COUNT(*) FROM Login
WHERE UserName='admin'--
Password='123'

Preventive measures against SQL injection:

● Data input validation

● Special character filtering

● Parameterized SQL statements (including stored procedures)

● Do not use sa-level accounts as connection accounts, or restrict the IP addresses that are allowed to connect
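The difference between string concatenation and a parameterized statement can be sketched with Python's built-in sqlite3 module and an in-memory database. The Login table mirrors the example in this section; the login_unsafe and login_safe functions are hypothetical, and the plain-text password is kept only to match that example (a real system would store a password hash).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Login (UserName TEXT, Password TEXT)")
conn.execute("INSERT INTO Login VALUES ('admin', '123456')")


def login_unsafe(user, password):
    # Vulnerable: user input is concatenated into the SQL text.
    sql = ("SELECT COUNT(*) FROM Login WHERE UserName='" + user +
           "' AND Password='" + password + "'")
    return conn.execute(sql).fetchone()[0] > 0


def login_safe(user, password):
    # Parameterized: input is passed as data and is never parsed as SQL.
    sql = "SELECT COUNT(*) FROM Login WHERE UserName=? AND Password=?"
    return conn.execute(sql, (user, password)).fetchone()[0] > 0
```

With the user name admin'-- and a wrong password, login_unsafe lets the attacker in because the password condition is commented out, while login_safe rejects the login because the whole input is compared as a literal user name.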


5.3 CSRF attacks

CSRF (cross-site request forgery), also known as "One Click Attack" or Session Riding and usually abbreviated to CSRF or XSRF, is a malicious exploitation of a website.

Although it sounds like cross-site scripting (XSS), it is very different from XSS, and the two attack in almost opposite ways. XSS exploits trusted users within a site, whereas CSRF exploits a trusted site by forging requests that appear to come from a trusted user.

Compared with XSS attacks, CSRF attacks tend to be less well known (so resources for defending against them are relatively scarce) and harder to prevent, so they are considered more dangerous than XSS.

Its core strategy is to exploit the browser's Cookie or the server's Session mechanism to hijack the user's identity.

Precautions against CSRF attacks:

● Form token

● Verification code (CAPTCHA)

● Referer check

● Identity confirmation for key operations
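The form token measure can be sketched as follows. This is one possible scheme under assumptions of my own, not necessarily how any particular framework does it: the token is an HMAC of the session ID computed with a server-side secret, and the names SECRET_KEY, issue_token and verify_token are hypothetical.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)   # server-side secret (hypothetical setup)


def issue_token(session_id):
    """Derive a form token bound to the user's session."""
    return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()


def verify_token(session_id, submitted_token):
    """Accept the request only if the token matches this session."""
    expected = issue_token(session_id)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, submitted_token)
```

The server embeds the token in a hidden field of each form it renders and verifies it on submission. A forged request sent from another site carries the victim's cookie automatically but cannot supply the matching token.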

5.4 Other attacks

Error Code: error message echo. For debugging convenience, many Web servers display detailed error information by default, such as the context in which the error occurred and server and application information, which can easily be exploited maliciously.

System or framework vulnerabilities: for example, the "JPG vulnerability" in IIS 6.0 and earlier versions; the Apache Struts2 vulnerability that allows arbitrary methods to be invoked when dynamic method invocation is enabled (CVE-2016-3081); the OpenSSL heartbeat vulnerability (CVE-2014-0160); the Apache parsing vulnerability; Nginx (
