

How to analyze the distributed algorithm of memcached

2025-01-21 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article explains how memcached's distributed algorithm works. The explanation is kept concise and easy to follow, and I hope you take something useful away from it.

Let's start with how memcached distributes data across servers, rather than with memcached's internal structure.

Distribution of memcached

Although memcached is described as a "distributed" cache server, the server side has no "distributed" functionality at all: it provides only memory storage, and its implementation is quite simple. memcached's distribution is implemented entirely by the client library, and this design is memcached's most distinctive feature.

What does memcached's "distributed" mean?

The word "distributed" has been used many times so far without a detailed explanation. Here is a brief introduction to the principle; the implementation is essentially the same in every client.

Assume there are three memcached servers, node1 through node3, and that the application wants to save data under the keys "tokyo", "kanagawa", "chiba", "saitama", and "gunma".

Figure 1 distributed introduction: preparation

First, add "tokyo" to memcached. When "tokyo" is passed to the client library, an algorithm implemented on the client side determines, from the key, which memcached server will store the data. Once the server is selected, the library instructs it to save "tokyo" and its value.

Figure 2 introduction to distribution: when adding

Similarly, "kanagawa", "chiba", "saitama" and "gunma" all select the server before saving.

Next comes retrieval. The key "tokyo" is again passed to the library, which selects a server using the same algorithm as when the data was saved. Because the algorithm is the same, the same server is chosen; the library then sends the get command, and as long as the data has not been deleted for some reason, the saved value is returned.

Figure 3 introduction to distribution: when getting

In this way, memcached achieves distribution by saving different keys to different servers. As the number of memcached servers grows, keys are spread across them, and even if one memcached server becomes unreachable, the other caches are unaffected and the system keeps running.

Next, let's look at the distribution method implemented by Cache::Memcached, the Perl client library introduced in the first installment of this series.

Distributed method of Cache::Memcached

Perl's memcached client library Cache::Memcached is the work of Brad Fitzpatrick, the author of memcached, which can be said to be the original function library.

Cache::Memcached-search.cpan.org

This library implements the distributed functionality and represents the standard distribution method for memcached.

Calculate the dispersion according to the remainder

Cache::Memcached's distribution method can be summed up as "dispersion by remainder of the number of servers": take the integer hash value of the key, divide it by the number of servers, and select the server according to the remainder.

Let's simplify Cache::Memcached to the following Perl script to illustrate.

use strict;
use warnings;
use String::CRC32;

my @nodes = ('node1', 'node2', 'node3');
my @keys  = ('tokyo', 'kanagawa', 'chiba', 'saitama', 'gunma');

foreach my $key (@keys) {
    my $crc    = crc32($key);            # CRC value of the key
    my $mod    = $crc % ($#nodes + 1);
    my $server = $nodes[$mod];           # select the server based on the remainder
    printf "%s => %s\n", $key, $server;
}

Cache::Memcached uses CRC when calculating the hash value.

String::CRC32-search.cpan.org

First, the CRC value of the key string is computed, and the server is determined by the remainder of dividing that value by the number of server nodes. Running the above code produces the following result:

tokyo    => node2
kanagawa => node3
chiba    => node2
saitama  => node1
gunma    => node1

According to this result, "tokyo" is dispersed to node2, "kanagawa" to node3, and so on. Incidentally, when the selected server cannot be connected to, Cache::Memcached appends the number of connection attempts to the key, computes the hash again, and tries to connect to another server. This behavior is called rehash. If you do not want rehashing, specify the "rehash => 0" option when creating the Cache::Memcached object.
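The rehash idea can be sketched in a few lines. This is a minimal illustration of the concept (in Python rather than Perl), not Cache::Memcached's actual code; the function name `pick_server` and its parameters are invented for the example.

```python
import zlib

def pick_server(key, nodes, alive, max_retries=3):
    """Select a server by CRC32 remainder; if it is unreachable,
    append the retry count to the key and hash again ("rehash")."""
    for attempt in range(max_retries + 1):
        # First attempt hashes the key itself; retries hash "key" + attempt.
        probe = key if attempt == 0 else f"{key}{attempt}"
        server = nodes[zlib.crc32(probe.encode()) % len(nodes)]
        if alive(server):
            return server
    return None  # every rehash landed on an unreachable server

nodes = ["node1", "node2", "node3"]
print(pick_server("tokyo", nodes, alive=lambda s: True))
```

Because the retry count is part of the hashed string, every client performs the same sequence of fallbacks for a given key, so they keep agreeing on where the data lives while a server is down.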

The disadvantage of calculating dispersion according to the remainder

The remainder method is simple and disperses data very well, but it has a drawback: cache reorganization is costly when servers are added or removed. After a server is added, the remainders change drastically, so most keys no longer map to the server where they were saved, which hurts the cache hit ratio. Let's write a short piece of Perl to verify the cost.

use strict;
use warnings;
use String::CRC32;

my @nodes = @ARGV;
my @keys  = ('a' .. 'z');
my %nodes;

foreach my $key (@keys) {
    my $hash   = crc32($key);
    my $mod    = $hash % ($#nodes + 1);
    my $server = $nodes[$mod];
    push @{$nodes{$server}}, $key;
}

foreach my $node (sort keys %nodes) {
    printf "%s: %s\n", $node, join(",", @{$nodes{$node}});
}

This Perl script demonstrates saving the keys from "a" to "z" to memcached and accessing them. Save it as mod.pl and execute it.

First, when there are only three servers:

$ mod.pl node1 node2 node3
node1: a,c,d,e,h,j,n,u,w,x
node2: g,i,k,l,p,r,s,y
node3: b,f,m,o,q,t,v,z

As the result shows, node1 holds a, c, d, e..., node2 holds g, i, k..., and each server holds 8 to 10 keys.

Next, add a memcached server.

$ mod.pl node1 node2 node3 node4
node1: d,f,m,o,t,v
node2: b,i,k,p,r,y
node3: e,g,l,n,u,w
node4: a,c,h,j,q,s,x,z

node4 has been added, and only d, i, k, p, r, and y still hit. When nodes are added like this, the servers to which keys are dispersed change drastically: only 6 of the 26 keys still map to their original server, and all the rest have moved. The hit ratio drops to 23%. When memcached is used in a Web application, cache efficiency collapses the moment a server is added, the load concentrates on the database server, and the site may no longer be able to serve requests normally.
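As a cross-check, the same experiment can be ported to Python. This is an illustrative port, not part of the original article; it relies on the fact that Perl's String::CRC32 and Python's zlib.crc32 both compute standard CRC-32, so the key-to-server mapping matches. The article's run found 6 of the 26 keys still hitting.

```python
import zlib
from string import ascii_lowercase

def assign(keys, nodes):
    # Map each key to a node by CRC32 remainder, as mod.pl does.
    return {k: nodes[zlib.crc32(k.encode()) % len(nodes)] for k in keys}

keys = list(ascii_lowercase)                        # 'a' .. 'z'
before = assign(keys, ["node1", "node2", "node3"])
after  = assign(keys, ["node1", "node2", "node3", "node4"])

# Keys that still map to their original server after adding node4
hits = [k for k in keys if before[k] == after[k]]
print(f"{len(hits)}/26 keys still hit")
```

A key survives only when its CRC value gives the same index modulo 3 and modulo 4, which happens for roughly a quarter of keys on average, consistent with the sharp hit-ratio drop observed above.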

mixi's Web application had this very problem, which made it impractical to add memcached servers. Thanks to a newer distribution method, however, adding memcached servers is now easy. That method is called Consistent Hashing.

Consistent Hashing

The idea behind Consistent Hashing has been introduced in many places, including the blog of mixi's engineers, so only a brief explanation is given here.

Mixi Engineers' Blog - a comfortable cache life through smart dispersion

ConsistentHashing - an explanation of the consistent hashing method

A brief description of Consistent Hashing

Consistent Hashing works as follows. First, compute the hash value of each memcached server (node) and place it on a circle (continuum) spanning 0 to 2^32. Then compute the hash value of the key for the data to be stored and map it onto the same circle. Starting from the point where the key maps, search clockwise and save the data to the first server found. If no server is found before passing 2^32, the data is saved to the first memcached server on the circle.

Figure 4 Consistent Hashing: fundamentals
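The clockwise search just described can be sketched with a sorted list and binary search. This is a minimal Python illustration, not a production client; the md5-based hash and the helper names (`build_ring`, `lookup`) are assumptions for the example, not what any particular memcached client uses.

```python
import bisect
import hashlib

RING = 2 ** 32

def h(value):
    # Map a string onto the 0 .. 2**32 - 1 circle (illustrative hash).
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % RING

def build_ring(nodes):
    # Place each server on the continuum at its hash value.
    return sorted((h(n), n) for n in nodes)

def lookup(ring, key):
    # Search clockwise from the key's position for the first server,
    # wrapping past 2**32 back to the first point on the circle.
    points = [p for p, _ in ring]
    i = bisect.bisect(points, h(key)) % len(ring)
    return ring[i][1]

ring = build_ring(["node1", "node2", "node3"])
print(lookup(ring, "tokyo"))
```

A useful property falls directly out of this structure: adding a server inserts one new point, so the only keys whose lookup changes are those that now land on the new server; every other key keeps its old mapping.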

Now add a memcached server to the state shown above. With the remainder algorithm, the servers where keys are stored change enormously and the hit ratio suffers, but with Consistent Hashing only the keys on the arc between the newly added server's position on the continuum and the first server counterclockwise from it are affected.

Figure 5 Consistent Hashing: add a server

Consistent Hashing therefore minimizes the redistribution of keys. Moreover, some Consistent Hashing implementations also adopt the idea of virtual nodes. With an ordinary hash function, the servers' positions on the circle end up very unevenly distributed, so each physical node (server) is instead assigned 100 to 200 points on the continuum. This smooths out the unevenness and minimizes cache redistribution when servers are added or removed.
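The virtual-node idea can be sketched as follows. This is an illustrative Python example, not ketama's exact scheme: the md5-based hash, the "node-i" naming of virtual points, and the count of 160 points per node are all assumptions chosen for the sketch.

```python
import bisect
import hashlib
from collections import Counter

RING = 2 ** 32

def h(value):
    # Illustrative hash onto the 0 .. 2**32 - 1 continuum.
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % RING

def build_ring(nodes, points_per_node=160):
    # Give each physical server many virtual points ("node1-0",
    # "node1-1", ...) so the arcs between points even out.
    return sorted((h(f"{n}-{i}"), n)
                  for n in nodes for i in range(points_per_node))

def lookup(ring, key):
    points = [p for p, _ in ring]
    i = bisect.bisect(points, h(key)) % len(ring)
    return ring[i][1]

ring = build_ring(["node1", "node2", "node3"])
load = Counter(lookup(ring, f"key{i}") for i in range(10000))
print(load)
```

Running this, each of the three servers receives roughly a third of the 10,000 keys; with only one point per server the split is typically far more lopsided.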

Testing with a memcached client library that uses Consistent Hashing (introduced below) gives a hit ratio determined by the number of servers (n) and the number of servers added (m) as follows:

(1 - n/(n+m)) * 100

Function libraries that support Consistent Hashing

Although Cache::Memcached, which has appeared repeatedly in this series, does not support Consistent Hashing, several client libraries already support this new distributed algorithm. The first memcached client library to support Consistent Hashing and virtual nodes was the PHP library libketama, developed by last.fm.

libketama - a consistent hashing algo for memcache clients - Users at Last.fm

As for Perl clients, Cache::Memcached::Fast and Cache::Memcached::libmemcached, introduced in the first installment of this series, both support Consistent Hashing.

Cache::Memcached::Fast-search.cpan.org

Cache::Memcached::libmemcached-search.cpan.org

Both have almost the same interface as Cache::Memcached, so if you are already using Cache::Memcached you can switch easily. Cache::Memcached::Fast reimplements the libketama algorithm, and you can enable Consistent Hashing by specifying the ketama_points option when creating an object.

my $memcached = Cache::Memcached::Fast->new({
    servers       => ["192.168.0.1:11211", "192.168.0.2:11211"],
    ketama_points => 150,
});

In addition, Cache::Memcached::libmemcached is a Perl module that uses libmemcached, a C library developed by Brian Aker. libmemcached itself supports several distributed algorithms, including Consistent Hashing, and its Perl binding supports Consistent Hashing as well.

Tangent Software: libmemcached

That covers how memcached's distributed algorithm works. I hope you have picked up some useful knowledge from it.
