What are the techniques of high-performance development?


This article explains the main techniques of high-performance development. The explanations are simple, clear, and easy to follow. Read along with the editor to learn what these techniques are.

I/O optimization: zero-copy technology

A worker thread that reads a file from disk and sends it over the network copies the data four times on its way from disk to network card, and two of those copies are performed by the CPU.

Zero-copy technology frees the CPU from this work: file data is sent directly from the kernel, without first being copied into an application buffer, a copy that only wastes resources.

Linux API:

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

The name says it all: send a file. Specify the file descriptor to read from and the network socket descriptor to write to, and one call does the job!
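A minimal sketch of serving a file this way, assuming client_fd is an already-connected socket; error handling is kept short:

#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Send a whole file to a connected socket without copying it
 * through user space. Returns 0 on success, -1 on error. */
int send_file_zero_copy(int client_fd, const char *path) {
    int file_fd = open(path, O_RDONLY);
    if (file_fd < 0)
        return -1;

    struct stat st;
    if (fstat(file_fd, &st) < 0) {
        close(file_fd);
        return -1;
    }

    off_t offset = 0;
    while (offset < st.st_size) {
        /* The kernel moves data from the page cache to the socket
         * buffer directly; no user-space buffer is involved. */
        ssize_t sent = sendfile(client_fd, file_fd, &offset,
                                st.st_size - offset);
        if (sent <= 0) {
            close(file_fd);
            return -1;
        }
    }
    close(file_fd);
    return 0;
}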

With zero-copy in place, version 2.0 ships, and image loading speed improves noticeably. But once the boss notices that more people are visiting at the same time, the site slows down again, and you are asked to keep optimizing. At this point, you need:

I/O optimization: multiplexing technology

In the previous version, every thread blocked in recv waiting for the client's request, so the more visitors arrived, the more threads were created; with a large number of threads sitting blocked, the system slowed down.

At this point you need multiplexing. With the select model, all the waiting (accept, recv) is done in the main thread, and the worker threads never have to wait.

After a while, the site attracts even more visitors, and even select starts to feel the strain; the boss asks you to optimize performance yet again.

At this point, you need to upgrade the multiplexing model to epoll.

Select has three disadvantages and epoll has three advantages.

Under the hood, select tracks socket descriptors in a fixed-size structure (an fd_set), so the number it can watch at once is capped (FD_SETSIZE, 1024 by default). Epoll manages them with a red-black tree plus a ready list, so it can watch a very large number at the same time.

Select won't tell you which socket has data; you have to check every one. Epoll tells you directly which sockets are ready, so no polling is needed.

Select also copies the whole descriptor set between user space and kernel space on every call, which is wasteful when select runs in a loop. Epoll keeps the descriptors registered inside the kernel, with no copying back and forth.
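A minimal sketch of an epoll event loop, assuming listen_fd is an already-listening socket and handle_request() is a hypothetical handler for readable connections:

#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 64

void handle_request(int fd);  /* hypothetical handler, defined elsewhere */

/* Event loop: one thread waits on all sockets at once. */
void event_loop(int listen_fd) {
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        /* Blocks until at least one descriptor is ready; only the
         * ready ones are returned, no scanning of the full set. */
        int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                int conn_fd = accept(listen_fd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN,
                                           .data.fd = conn_fd };
                epoll_ctl(epfd, EPOLL_CTL_ADD, conn_fd, &cev);
            } else {
                handle_request(events[i].data.fd);
            }
        }
    }
}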

With epoll-based multiplexing, version 3.0 of your website can handle many user requests at the same time.

But the greedy boss is not satisfied. Unwilling to upgrade the hardware, he asks you to squeeze more throughput out of the server. Researching it, you find that in the previous scheme worker threads are created and destroyed for every request; under heavy load, threads are created, closed, created, and closed without end, and that churn is very expensive. At this point, you need:

Thread pool technology

We can start a batch of worker threads as soon as the program launches, instead of creating them on each request, and use a shared task queue: when a request arrives, a task is pushed onto the queue, and the worker threads each pull tasks off the queue and process them. This is thread pool technology.
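A minimal sketch of such a pool in C with pthreads; the pool size, queue capacity, and task representation are illustrative assumptions:

#include <pthread.h>

#define POOL_SIZE 4     /* illustrative sizes */
#define QUEUE_CAP 256

typedef struct {
    void (*fn)(void *);  /* the work to run */
    void *arg;
} task_t;

static task_t queue[QUEUE_CAP];
static int head, tail, count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;

/* Called when a request arrives: push a task onto the shared queue. */
void submit_task(void (*fn)(void *), void *arg) {
    pthread_mutex_lock(&lock);
    while (count == QUEUE_CAP)
        pthread_cond_wait(&not_full, &lock);
    queue[tail] = (task_t){ fn, arg };
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

/* Each worker loops forever, pulling tasks off the queue. */
static void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0)
            pthread_cond_wait(&not_empty, &lock);
        task_t t = queue[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        t.fn(t.arg); /* run the task outside the lock */
    }
    return NULL;
}

/* Start the whole pool once, at program startup. */
void pool_start(void) {
    for (int i = 0; i < POOL_SIZE; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);
    }
}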

Multithreading improves the server's concurrency to a degree, but threads sharing data usually need synchronization: mutexes, semaphores, condition variables and the like. These heavyweight mechanisms often force threads to switch between user mode and kernel mode, and system calls plus thread switches are no small overhead.

The thread pool above relies on a shared task queue from which every worker thread pulls tasks, so multiple worker threads must synchronize their access to this shared queue.

Is there a lightweight way for multiple threads to access shared data safely? At this point, you need:

Lock-free programming technology

In multithreaded programming, threads must synchronize whenever they share data. Synchronization comes in two flavors: blocking and non-blocking.

Blocking synchronization is easy to understand: the mechanisms operating systems commonly provide, such as mutexes, semaphores, and condition variables, are all blocking synchronization, and their essence is taking a "lock".

Non-blocking synchronization, by contrast, synchronizes without locks. There are currently three technical approaches:

Wait-free

Lock-free

Obstruction-free

All three approaches synchronize without blocking and waiting, each through its own algorithms and techniques; of them, Lock-free is the most widely used.

Lock-free is widely applicable because mainstream CPUs provide atomic read-modify-write primitives: the famous CAS (Compare-And-Swap) operation. On Intel x86 processors, it is the cmpxchg family of instructions.

// Implement lock-free updates through the CAS operation:
do {
    // ... read old_data, compute new_data ...
} while (!CAS(ptr, old_data, new_data));

We often see lock-free queues, lock-free lists, lock-free HashMaps and similar data structures, and the core of most of them is this loop. In daily development, appropriate use of lock-free programming can effectively reduce the extra overhead of blocking and context switching and improve performance.
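Here is what that loop looks like concretely with C11 atomics, incrementing a shared counter without a lock (a minimal sketch; real lock-free data structures must additionally deal with problems such as ABA):

#include <stdatomic.h>

static _Atomic long counter;

/* Lock-free increment: retry until no other thread has changed the
 * value between our read and our write. */
void increment(void) {
    long old_val, new_val;
    do {
        old_val = atomic_load(&counter);
        new_val = old_val + 1;
    } while (!atomic_compare_exchange_weak(&counter, &old_val, new_val));
}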

After the server has been online for a while, you find the service often crashes. Troubleshooting traces it to a bug in the worker thread code: one crash and the whole service is unavailable. So you decide to split the worker threads and the main logic into separate processes, so that a crashing worker no longer takes down the whole service. Now there are multiple processes, and you need:

Interprocess communication technology

What comes to mind when you think of interprocess communication?

Pipe

Named pipe

Socket

Message queue

Signal

Semaphore

Shared memory

Each of these interprocess communication mechanisms deserves a detailed introduction and comparison of its own; a good dedicated article on interprocess communication will cover them, so I won't repeat it all here.

For exchanging large amounts of data between local processes, shared memory is the scheme of choice.

Modern operating systems manage memory virtually, and under virtual memory processes are strictly isolated from one another. The addresses used in program code are virtual addresses, which the operating system's memory manager allocates and maps to physical memory pages; when the CPU executes code, each memory access is translated on the fly.

So although two processes may use identical virtual addresses, with the cooperation of the operating system and the CPU, the physical pages actually holding their data are different.

The core of shared-memory IPC is this: map the same physical memory page into both processes' address spaces, and the two sides can read and write it directly, with no copying at all.

Of course, shared memory is only the carrier for the data; the two sides still need notification mechanisms such as signals or semaphores to coordinate with each other.
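A minimal sketch using POSIX shared memory; the region name "/my_shm" and its size are illustrative assumptions, and the signaling between the two processes is omitted:

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

#define SHM_NAME "/my_shm"   /* illustrative name */
#define SHM_SIZE 4096

/* Map a named shared memory region; both processes call this with
 * the same name and end up on the same physical pages. */
void *attach_shared_region(void) {
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, SHM_SIZE) < 0) {
        close(fd);
        return NULL;
    }
    void *addr = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    return addr == MAP_FAILED ? NULL : addr;
}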

With a high-performance shared-memory channel, multiple service processes can work together happily: even if a worker process crashes, the whole service no longer goes down.

Soon the boss raises the bar again: static page browsing is no longer enough, the site needs dynamic interaction. This time the boss has a conscience and buys you an extra hardware server.

So you use Java/PHP/Python or another language to build a web framework as a separate service providing dynamic content, working alongside the static content servers.

At this point you find that the static and dynamic services frequently need to communicate with each other.

At first you use HTTP-based RESTful interfaces between the servers, but you soon find that shipping data around as JSON is inefficient, and you need a more efficient communication scheme.

At this point, you need:

RPC & serialization technology

What is RPC technology?

RPC stands for Remote Procedure Call. In everyday programming we call functions all the time, and those functions are local: blocks of code somewhere inside the current process. But what if the function you want to call is not local, but sits on a server across the network? That is where remote procedure calls come in.

A function call across the network involves packing and unpacking the parameters, transmitting them over the network, then packing and unpacking the results. The packing and unpacking depend on serialization technology.

What is serialization technology?

Serialization simply means converting in-memory objects into data that can be transmitted and stored; the reverse operation is deserialization. Serialization & deserialization technology moves memory objects between local and remote machines. It is like putting an elephant into the refrigerator in three steps:

Encode local memory objects into data streams

Transmit the above data stream over the network

Rebuild the object in memory from the received data stream
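As a minimal sketch of steps 1 and 3, here is hand-rolled serialization of a small struct in C; the user_t type and its two-field layout are illustrative assumptions (real projects use a framework like the ones compared below):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

typedef struct {        /* illustrative object */
    uint32_t id;
    uint16_t age;
} user_t;

/* Encode to network byte order so any machine can decode it.
 * buf must hold at least 6 bytes; returns bytes written. */
size_t serialize_user(const user_t *u, unsigned char *buf) {
    uint32_t id  = htonl(u->id);
    uint16_t age = htons(u->age);
    memcpy(buf,     &id,  sizeof id);
    memcpy(buf + 4, &age, sizeof age);
    return 6;
}

/* Decode: the reverse operation (deserialization). */
void deserialize_user(const unsigned char *buf, user_t *u) {
    uint32_t id;
    uint16_t age;
    memcpy(&id,  buf,     sizeof id);
    memcpy(&age, buf + 4, sizeof age);
    u->id  = ntohl(id);
    u->age = ntohs(age);
}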

There are many free and open-source serialization frameworks, and several metrics for judging one:

Does it support cross-language use, and which languages?

Is it only a serialization library, or does it also include an RPC framework?

Serialization and transmission performance

Extensibility (forward and backward compatibility when fields are added to or removed from data objects)

Whether dynamic parsing is supported (that is, parsing immediately from the data-format definition file, with no ahead-of-time compilation)

Here is a comparison of three popular serialization frameworks: protobuf, thrift, and avro.

ProtoBuf:

Manufacturer: Google

Supported languages: C++, Java, Python, etc.

Dynamic support: poor; schemas generally must be compiled ahead of time

Includes RPC: no

Introduction: ProtoBuf is Google's serialization framework: mature, stable, high-performance, and used by many large companies. It is serialization only, with no RPC functionality, but it pairs with gRPC (also from Google) as a golden partner for back-end RPC service development.

Its weak spot is dynamic parsing, though newer versions have improved on this. Overall, ProtoBuf is a highly recommended serialization framework.

Thrift

Manufacturer: Facebook

Supported languages: C++, Java, Python, PHP, C#, Go, JavaScript, etc.

Dynamic support: poor

Includes RPC: yes

Introduction: this is an RPC framework from Facebook that ships with a binary serialization scheme, but Thrift decouples its RPC from its data serialization; you can even choose XML, JSON, or other custom data formats. Several large companies use it, and its performance is on a par with ProtoBuf. Like ProtoBuf, its weakness is that dynamic parsing support is not very friendly.

Avro

Supported languages: C, C++, Java, Python, C#, etc.

Dynamic support: good

Includes RPC: yes

Summary: this is a serialization framework from the Hadoop ecosystem. It ships its own RPC framework but can also be used on its own. Its biggest advantage over the other two is support for dynamic data parsing.

Why do I keep bringing up dynamic parsing? In a previous project, I (Xuanyuan) faced exactly this three-way choice: a C++ service and a Java service needed to call each other over RPC.

Both Protobuf and Thrift require "compiling" the data-protocol definition files into C++/Java source code, which is then integrated into the project and compiled with it before any data can be parsed.

At the time, the Java team strongly rejected this approach: compiled, business-specific code would be baked into their business-agnostic framework service, while the business itself kept changing. Not elegant.

In the end, after testing, we chose Avro. The Java side only needs to load the data-format definition file dynamically to parse incoming data, and performance is good. (For the C++ side, of course, we still compile ahead of time.)

Now that your site supports dynamic content, you inevitably have to deal with a database. As the number of users grows, you find database queries getting slower and slower.

At this point, you need to:

Database indexing technology

Imagine you have a math textbook, but the table of contents has been torn out, and you need to find the page covering trigonometric functions. What do you do?

Without the table of contents, you have only two options: flip through page by page, or jump around at random, until you stumble on the trigonometric functions.

Databases face the same problem. If a data table has no "table of contents", then finding the rows that match a condition means scanning the entire table, which is painfully slow. To speed up queries, you build a table of contents for the data table; in the database world, this is called an index.

In general, a data table will have multiple fields, so different indexes can be set up according to different fields.

Classification of indexes

Primary key index

Clustered index

Nonclustered index

As we all know, the primary key is the field that uniquely identifies a data record (several fields together can also form a composite primary key), and it corresponds to the primary key index.

A clustered index is one whose logical order matches the physical storage order of the table's records. The primary key index usually fits this definition, so the primary key index is generally also the clustered index. But this is not absolute; it varies between databases, and even between storage engines within the same database.

The leaf nodes of a clustered index store the actual data (they are the data nodes), while the leaf nodes of a non-clustered index do not store the data itself, so a second lookup is required.

The implementation principles of indexes

There are three main implementations of indexes:

B+ tree

Hash table

Bitmap

Of these, the B+ tree is used the most. Its defining feature is that each node has many children: compared with a binary tree, it is a multi-way tree, short and wide. Reducing the tree's depth reduces the number of disk I/O operations, which matches the storage characteristics of a database.

An index implemented with a hash table is called a hash index; a hash function locates the data directly. Hashing is fast, with constant time complexity, but it suits only exact matches, not fuzzy matching or range queries.

Bitmap indexes are comparatively rare. Imagine a field with only a handful of possible values: gender, province, blood type, and so on. What happens if you build a B-tree index on such a field? You get a huge number of leaf nodes sharing the same index value, which is really a waste of storage.

The bitmap index optimizes exactly this case: when a column takes only a small, fixed set of values and those values repeat heavily across the table, the bitmap index gets to show its skill.

A bitmap is just that: a binary bitmap built for each distinct value of the field, marking for each record in the table whether its column equals that value.
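A minimal sketch of the idea in C, assuming a gender column with two values; one bitmap per distinct value, one bit per row:

#include <stdint.h>

#define NUM_ROWS 1000

/* One bitmap per distinct value: bit i set means row i has that value. */
typedef struct {
    uint8_t bits[(NUM_ROWS + 7) / 8];
} bitmap_t;

static bitmap_t male_map, female_map; /* illustrative value bitmaps */

static void bitmap_set(bitmap_t *bm, int row) {
    bm->bits[row / 8] |= (uint8_t)(1u << (row % 8));
}
static int bitmap_test(const bitmap_t *bm, int row) {
    return (bm->bits[row / 8] >> (row % 8)) & 1;
}

/* Index a row: set the bit in the bitmap matching its value. */
void index_row(int row, char gender) {
    if (gender == 'M') bitmap_set(&male_map, row);
    else               bitmap_set(&female_map, row);
}

/* Queries like "gender = 'F' AND ..." then become fast bitwise
 * ANDs over bitmaps instead of full table scans. */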

Indexes are useful, but they must not be abused. For one thing, indexes ultimately live on disk, which adds storage overhead. More importantly, inserts, updates, and deletes on the table generally require updating its indexes too, so indexes slow down database writes to some degree.

Your site now attracts ever more visitors, and the number of concurrent users has grown sharply. Those user requests translate into a flood of back-end database accesses; gradually the database becomes the bottleneck, unable to support the growing user base. Once again, the boss hands you the task of improving performance.

Cache technology & Bloom filters

From the CPU caching memory data to the browser caching web content, caching technology can be found in every corner of the computer world.

In the face of the current database bottleneck, we can also use caching technology to solve it.

Every visit to the database requires a table lookup (the database has its own optimizations, of course), which at bottom means one or more disk I/O operations, and anything involving disk I/O slows down. For data that is frequently read but rarely changed, why not cache it in memory and skip asking the database every time, thereby reducing the pressure on the database?

Where there is demand there is a market, and where there is a market there are products: in-memory object caching systems, represented by memcached and Redis, came into being.

There are three famous problems with caching systems:

Cache penetration: the purpose of a cache is to intercept, at some layer, requests headed for the database storage layer. Penetration means the interception fails: the request reaches the database anyway, and the cache delivers none of its intended value.

Cache breakdown: think of the cache as a wall in front of the database that "absorbs" query requests; a breakdown is a hole punched in that wall. It typically happens when a hot piece of data expires from the cache just as a crowd of queries for that data arrives, and they all land on the database.

Cache avalanche: once you understand breakdown, avalanche follows. As the saying goes, a breakdown is one item's avalanche; an avalanche is a mass breakdown. If the cache wall is riddled with holes, how can it keep standing? It will collapse sooner or later.

For a more detailed treatment of these three problems, look for a good article on the "three mountains" of caching systems; I won't expand on them here.

With a cache system in place, before querying the database we first ask the cache whether it holds the data we need. If it does, we save a database query; if not, we fall back to the database (as in the sketch below).
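A minimal sketch of that read path in C; cache_get, cache_put, and db_query are hypothetical stand-ins for a real cache client (such as Redis) and a database client, not library functions:

#include <stddef.h>

/* Hypothetical clients, assumed to be implemented elsewhere. */
const char *cache_get(const char *key);                    /* NULL on miss */
void        cache_put(const char *key, const char *val, int ttl_seconds);
const char *db_query(const char *key);

/* Cache-aside read path: try the cache, fall back to the database. */
const char *get_value(const char *key) {
    const char *val = cache_get(key);   /* 1. ask the cache first  */
    if (val != NULL)
        return val;                     /* hit: database untouched */

    val = db_query(key);                /* 2. miss: query the DB   */
    if (val != NULL)
        cache_put(key, val, 60);        /* 3. fill cache for later */
    return val;
}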

Note that there is a key question here: how do we determine whether the data we want is in the cache system?

More abstractly: how do we quickly determine whether a collection holding a huge amount of data contains a given element?

This is where the Bloom filter shows its skill; it was invented to solve exactly this problem. So how does it do it?

Going back to the question above, this is really a lookup problem, and the most common solutions to lookup problems are search trees and hash tables.

The problem has two key constraints: speed, and a huge amount of data. Tree structures are ruled out first. A hash table can deliver constant-time lookups, but with this much data its capacity becomes enormous, and designing a hash function good enough to map so much data is itself a hard problem.

For the capacity problem: since we only need to know whether an element exists, not retrieve it, we can shrink each hash table slot to a single bit, 1 for present and 0 for absent, which slashes the table's size.

For the hash function problem: if we relax our requirements on the hash function, collisions become more likely. So what if one hash function collides easily? Use several; the probability that multiple hash functions all collide at the same time is much smaller.

The Bloom filter is based on this design idea:

When a key is inserted, a set of hash functions is computed and each corresponding bit is set to 1.

When a key is deleted, however, its bits cannot simply be reset to 0, because another key's hash may map to the same positions.

This is exactly what gives the Bloom filter its defining property: if it says an element exists, the element may or may not exist; but if it says an element does not exist, it definitely does not.
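A minimal Bloom filter sketch in C with a family of two simple string hashes; the filter size and hash choices are illustrative assumptions (production filters derive the bit-array size and hash count from the expected element count and the target false-positive rate):

#include <stdint.h>

#define FILTER_BITS 8192   /* illustrative size */

static uint8_t filter[FILTER_BITS / 8];

/* Two simple string hashes (djb2 and sdbm) as our hash family. */
static uint32_t hash1(const char *s) {
    uint32_t h = 5381;
    while (*s) h = h * 33 + (uint8_t)*s++;
    return h % FILTER_BITS;
}
static uint32_t hash2(const char *s) {
    uint32_t h = 0;
    while (*s) h = (uint8_t)*s++ + (h << 6) + (h << 16) - h;
    return h % FILTER_BITS;
}

static void set_bit(uint32_t i)  { filter[i / 8] |= (uint8_t)(1u << (i % 8)); }
static int  test_bit(uint32_t i) { return (filter[i / 8] >> (i % 8)) & 1; }

/* Insert: set every bit the hash family points at. */
void bloom_add(const char *key) {
    set_bit(hash1(key));
    set_bit(hash2(key));
}

/* Query: 0 means definitely absent; 1 means only probably present. */
int bloom_maybe_contains(const char *key) {
    return test_bit(hash1(key)) && test_bit(hash2(key));
}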

Your company's website carries more and more content, and demand for fast site-wide search keeps growing. At this point, you need:

Full-text search technology

A traditional relational database can cope with simple query needs. But once search requirements grow complex, say, keyword search over article content with multiple conditions combined by logical operators, the database is out of its depth, and a dedicated index system is needed.

ElasticSearch (ES for short), now widely used across the industry, is a powerful search engine. With full-text retrieval, data analysis, and distributed deployment among its strengths, it has become the first choice for enterprise search.

ES exposes a RESTful interface, uses JSON as its data transmission format, supports many kinds of query matching, and provides SDKs for all mainstream languages, making it easy to adopt.

In addition, ES often teams up with two other open-source tools, Logstash and Kibana, to form a complete solution for log collection, analysis, and presentation: the ELK stack.

Logstash handles data collection and parsing, ElasticSearch handles search, and Kibana handles visualization and interaction; together they are the iron triangle of enterprise log analysis and management.

No matter how much we optimize, a single server's power is limited. As the company's business grows rapidly, the original server becomes overwhelmed, so the company purchases more servers and deploys multiple copies of the service to meet the growing demand.

Now the same service runs on multiple servers, and user requests must be distributed evenly among them. At this point, you need:

Load balancing technology

As the name implies, load balancing means distributing load evenly across multiple business nodes.

Like caching technology, load balancing technology also exists in every corner of the computer world.

According to the balancing entity, it can be divided into software load balancer (such as LVS, Nginx, HAProxy) and hardware load balancer (such as A10, F5).

By network layer, it can be divided into layer-4 load balancing (based on network connections) and layer-7 load balancing (based on application content).

By balancing algorithm, it can be divided into round-robin, hash, weighted, random, or combinations of these.

For the problem at hand, you can use nginx for load balancing. Nginx supports multiple load balancing configurations: round-robin, weights, IP hash, least connections, least response time, and so on.

Round-robin

upstream web-server {
    server 192.168.1.100;
    server 192.168.1.101;
}

Weighted

upstream web-server {
    server 192.168.1.100 weight=1;
    server 192.168.1.101 weight=2;
}

IP hash

upstream web-server {
    ip_hash;
    server 192.168.1.100 weight=1;
    server 192.168.1.101 weight=2;
}

Least connections

upstream web-server {
    least_conn;
    server 192.168.1.100 weight=1;
    server 192.168.1.101 weight=2;
}

Least response time

upstream web-server {
    server 192.168.1.100 weight=1;
    server 192.168.1.101 weight=2;
    fair;  # provided by the third-party upstream_fair module
}

Thank you for reading. That concludes "what are the techniques of high-performance development". After studying this article, I believe you have a deeper understanding of these techniques; how to apply them still needs to be verified in practice.
