2025-04-12 Update From: SLTechnology News&Howtos > Network Security
I don't know whether anyone has ever played with routing this way before. Maybe, maybe not.
Time and space always seem to favor one at the other's expense, but only because our facilities are incomplete: in an era of scarce resources we had to choose. With abundant resources, you can have both the fish and the bear's paw, as the saying goes. For routing lookups, compact data structures occupy little space; must they necessarily pay a price in time? Once we take the MMU facility into account, we find that a compact data structure not only saves space but also improves speed.
We have long been taught that nothing is gained without sacrifice, and we console ourselves for our losses simply because we did not give up enough. But think about it carefully: even in an era when resources were scarce, does a compact data structure necessarily waste time? Conversely, must a fast and efficient algorithm waste memory? If so, how could the MMU ever have been designed? It would have been ruthlessly rejected, because the time and space needed to maintain it would exceed the benefit it delivers. Yet, as we all know, it was built in the end. Moreover, thanks to the efficient use of the CPU cache, the MMU has degenerated into a slow path taken only on a cache miss; by locality, most of the time execution flows through the fast path. It seems it is time for a change of thinking.
What I do know is that the DxR algorithm works this way, so this is not just my personal nonsense. But my way of playing differs a little from DxR's.
Advance the longest-mask logic to insertion time
For a general-purpose operating system such as Linux, the cost of a routing-table lookup is concentrated in the "longest mask match". During IP routing the only input is an IPv4 address, the destination address extracted from the IP header; the routing module does not know in advance which table entries relate to that address and must find out by searching, and the longest-mask logic runs inside that search. The hash-based organization keeps one hash table per mask length and matches downward from the 32-bit mask; the situation is similar for the trie-based organization. For details, see "Overview of Internet routing table lookup algorithms: hash / LC-trie / 256-way mtrie".
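As a concrete illustration of what the per-mask-length search costs, here is a minimal sketch (all names hypothetical, not Linux's actual fib code) that scans mask lengths from /32 downward and returns the first entry whose masked prefix equals the masked destination:

```c
#include <stdint.h>

/* Toy routing table as one flat list; a real kernel would keep
 * one hash table per mask length.  Names are hypothetical. */
struct toy_route {
    uint32_t prefix;   /* network prefix, host byte order */
    int      plen;     /* prefix length, 0..32 */
    int      nexthop;  /* opaque next-hop id */
};

/* Longest-prefix match: try mask lengths from /32 down to /0 and
 * return the first entry that matches the masked destination. */
int lpm_lookup(const struct toy_route *tbl, int n, uint32_t dst)
{
    for (int plen = 32; plen >= 0; plen--) {
        uint32_t mask = plen ? ~0u << (32 - plen) : 0;
        for (int i = 0; i < n; i++)
            if (tbl[i].plen == plen && (dst & mask) == tbl[i].prefix)
                return tbl[i].nexthop;
    }
    return -1; /* no match: network unreachable */
}
```

Every lookup repeats this comparison work per packet; the scheme described below moves all of it to insertion time.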
Searching, especially with a hash scheme, inevitably means comparing; each comparison yields either zero or non-zero, and removing that cost would in theory save half the time. For tries, once backtracking is included, the potential savings are even larger. Both the hash and trie algorithms apply many optimizations around the data structure itself and the characteristics of addresses, but unfortunately these treat the symptoms rather than the disease: they cannot eradicate the "search" operation itself. Eliminating search and comparison is the fundamental goal!
Indexing by the IPv4 address is the answer! The IPv4 address space holds 4G addresses in total, and using each address as an index would consume a great deal of memory, but that is not the essential problem; the table can be organized as a hierarchical index, just like a page table. The essential problem is how to build a many-to-one correspondence between routes and indexes: given an IPv4 address used as an index, the index must point directly to the routing result with the longest matching mask. That is not hard. Rather than introduce multi-level indexes here, I will keep using the flat 32-bit IPv4 address space as a single-level index, as shown in the following figure:
The key is to divide the IPv4 address space into multiple intervals by routing prefix. In the figure above, take the destination IP address as the index, move to the right, and the first route encountered is the result. The longest-mask logic is expressed entirely in the insert/delete path: prefixes grow shorter from left to right, and a route with a longer prefix covers the front of the interval belonging to a route with a shorter prefix. That is the core idea. The HiPac firewall uses the same idea, interval priority. It is nothing more than a reasonable, ingenious arrangement of the data structure: advance the longest-mask logic to insert/delete time and index by IP address, and the matching process collapses to a single step.
Note that a long prefix cannot simply obliterate what lies behind it: when the long prefix is deleted, the shorter one behind it must be exposed again.
All right, let's sum up: the insert/delete operations guarantee that, over any address range, the route with the longest mask covers the front of that range.
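The insert-time covering rule above can be sketched on a scaled-down, 8-bit toy address space (all names hypothetical). Each slot remembers the prefix length that wrote it, so a longer prefix wins regardless of insertion order, and that stored length is also what a deletion would need in order to re-expose the shorter prefix behind it; lookup is a single array access with no comparison:

```c
#include <stdint.h>

/* Scaled-down sketch: an 8-bit address space fully indexed by a
 * 256-entry array.  Hypothetical names, not production code. */
struct slot { int nexthop; int plen; };   /* plen -1 means "no route" */

static struct slot table[256];

void route_init(void) {
    for (int i = 0; i < 256; i++)
        table[i] = (struct slot){ -1, -1 };
}

/* Insert prefix/plen -> nexthop: cover the interval, but never
 * overwrite a slot already claimed by a longer (more specific) prefix. */
void route_insert(uint8_t prefix, int plen, int nexthop) {
    int span = 1 << (8 - plen);            /* addresses covered */
    for (int a = prefix; a < prefix + span; a++)
        if (table[a].plen < plen)
            table[a] = (struct slot){ nexthop, plen };
}

/* Lookup degenerates to one access: no comparison, no backtracking. */
int route_lookup(uint8_t addr) { return table[addr].nexthop; }
```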
Routing or switching
There is a philosophical reason why the IP Internet was designed around routing rather than switching. Nowadays, however, switching characteristics have gradually been grafted onto IP routing, yielding many hardware-based fast-forwarding devices, or forwarding tables generated from routing tables, as in layer-3 switches. But what exactly is switching, and what is routing? Simply put, routing is the softer term: its executor "takes a field from the protocol header, performs a longest-prefix match against the routing table, and goes through many memory accesses and comparisons." Switching is the harder term: its executor "takes an index field from the protocol header, indexes the switching table directly, and forwards according to the result the index points to." Seen this way, my scheme, like DxR, turns routing into switching. You may think this is nothing more than a cheap trick, but shouldn't life be happy about just such small things?
An envisioned implementation: turning "search" into "access"
As we all know, modern operating systems are built on virtual memory, which provides isolation and access control between processes. This article is not about that; it is about "a kind of exploitation" of the same principle.
In fact, on a computer running a modern operating system, every address accessed goes through a "lookup". The process is so fast that most users, and even most programmers (system programmers excepted), never notice it; many do not even know it exists. It is the MMU's virtual-to-physical address translation.
Suppose I use the IPv4 routing prefix as a virtual memory address, take the routing result (the next hop and related information) as the contents of a physical page, and build the page-table mapping from that correspondence. Then, to get the routing result, I need only access the destination IPv4 address extracted from the IP header. A coincidence? No! The page content simply is the routing result. I simplified the diagram from the first section and changed it slightly, into this:
Do you see it? Isn't this exactly a page table? Yes: the IPv4 address serves as the index, the routing result serves as the physical page, and the longest-mask match is absorbed into building the mapping. But there is a problem: it takes up far too much space! Yes, and the MMU's solution is multi-level mapping, a principle the routing table can adopt as well. Bend the diagram above, and it becomes a route match table organized as an MMU-like facility:
All right, now the route match table is wrapped entirely in the MMU facility, and the IPv4 address is fully indexed! "Access" an IPv4 address exactly as you would access a memory address. For example, if the IPv4 address is 0x01020304, then to obtain its routing entry you need only:
char *addr = (char *)0x01020304;
struct fib_res *res = *(struct fib_res **)addr;
If a page fault occurs, there is no matching route, that is, Network is unreachable; and if a default route exists, every virtual address without an explicit mapping falls onto the default route's page.
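Under this analogy, a soft version of the two-level walk, including the "page fault means unreachable" case, might look like the following hedged sketch (fib_walk, fib_map_block and the 16/16 bit split are my own illustrative choices, not the author's actual layout):

```c
#include <stdint.h>
#include <stdlib.h>

/* Two-level "routing page table": the top 16 bits of the destination
 * index a directory, the low 16 bits index a leaf page.  Leaves are
 * allocated only for populated /16 blocks, which is what keeps the
 * full 32-bit index affordable.  Hypothetical names throughout. */
struct fib_res { int nexthop; };

static struct fib_res **dir[1 << 16];        /* level 1: 64K slots */

struct fib_res *fib_walk(uint32_t dst) {
    struct fib_res **leaf = dir[dst >> 16];  /* level-1 index, like a PGD */
    if (!leaf)
        return NULL;                         /* "page fault": unreachable */
    return leaf[dst & 0xFFFF];               /* level-2 index, like a PTE */
}

/* Map every address of one /16 block to one result (illustration only;
 * error handling for calloc is omitted in this sketch). */
void fib_map_block(uint16_t hi, struct fib_res *res) {
    if (!dir[hi])
        dir[hi] = calloc(1 << 16, sizeof(struct fib_res *));
    for (int lo = 0; lo < (1 << 16); lo++)
        dir[hi][lo] = res;
}
```

A default route would simply pre-fill every otherwise-unmapped slot with the default result, so that no walk ever "faults".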
An interlude caused by the difference between the MMU and the routing table
Although the picture above really does look like an MMU facility, have you noticed the difference between them?
The physical pages an MMU maps are of fixed size, but the range of addresses covered by each route in the routing table is not fixed. Does that matter? After most of a day I was about to write a simulation, feeling quite excited, and went to take a shower first. (I can't help it; I like cold showers, but it is too cold at home, so I hoped a hot bath would bring some ideas.) It brought none. Instead I found a serious problem: a route cannot simply be equated with a physical page, because its size is not fixed. If the IPv4 address space is split into 4096-style pages, so that the second-level "routing page table" ultimately indexes ranges of 4096 IPv4 addresses, must all the addresses in one range use the same routing entry? I felt so stupid at that moment! Push the thinking one step further and the problem dissolves; I had already drawn it clearly in the last figure above. I index with all 32 bits of the IPv4 address, instead of leaving the low 12 bits out the way a 4096-size page table does! I am really building an address table, not an address-block table. The complexity all moves into insertion and into encoding the next hop. Storing pointers in the final routing "page" is out of the question: on a 32-bit system a pointer needs 4 bytes, and on 64-bit systems more. To cope with the extreme case of one route per IPv4 address, each destination IPv4 index must finally land on an "entry" of just one byte!
How can one byte be enough? What if I have 10,000 entries? Ha! Look at it the other way: what do we actually obtain in the end? Just a next hop! How many distinct next hops can there be in total? Would 256 suffice? I think so. You may have 10,000 routing entries, but they reuse a far smaller set of next hops. Have you ever seen a router with more than 200 cables blooming out of it? That would be a switch! So I might encode as follows: place all next hops in one contiguous block of memory, each entry of fixed size, and let the byte in the final routing page act as an offset indexing into that block. (If the number of next hops exceeds 256, bits left unused for alignment can be borrowed; alignment not only speeds memory addressing but also helps cache-line mapping.)
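A scaled sketch of this one-byte encoding (hypothetical names; a single /16 leaf and a 256-entry pool, index 0 reserved for "no route") could be:

```c
#include <stdint.h>
#include <stddef.h>

/* The final "routing page" stores one byte per address; that byte
 * indexes a small shared pool of fixed-size next hops.  Many routes,
 * few distinct next hops.  All names are hypothetical. */
struct nexthop { uint32_t gateway; int ifindex; };

static struct nexthop pool[256];        /* shared next-hop pool */
static uint8_t leaf[1 << 16];           /* one byte per address in a /16 */

/* Resolve the low 16 bits of a destination inside this leaf. */
struct nexthop *leaf_lookup(uint16_t lo) {
    uint8_t idx = leaf[lo];
    return idx ? &pool[idx] : NULL;     /* 0 means no route */
}
```

The leaf costs 64KB per populated /16 regardless of route count, while a pointer-per-address layout would cost at least four times that.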
I drew the figure above only afterwards. In the shower I was not following this line of thought but pondering the rationality of D16R (a DxR instance that uses 16 bits as a direct index). Had I been led into the DxR mindset after all? The thought left me both excited and depressed: excited because I had essentially re-derived DxR on my own, depressed because I really did not want to imitate it. I want to design a fully indexed multi-level index table without adding any so-called "algorithm" at all, avoiding every kind of tree, binary search included, and even avoiding hashing and traversal. So before I resort to any of them, let me record my reasons for avoiding them. The following section ought to be encrypted; if you do read it, please don't sneer. This is not ranting, it is a hobby.
Reasons to avoid trees, hashes, and subtle algorithms
Must O(1) be fast? What about O(n) and O(lg n)? Big O should speak for itself!
First of all, when designing and implementing a real system, do not be bound by the theory in algorithm textbooks. Big O exists to characterize scalability. Roughly: if an algorithm's latency does not grow as the number of elements grows, it is O(1); if latency grows logarithmically with the element count, it is O(lg n). But how large is n in practice, and where does the curve actually "turn"? You might object that this MMU-based routing table does not suit IPv6 because of its space consumption, hence it does not scale; but I never said it was for IPv6. Is it not exactly like using a 32-bit virtual address space for IPv4 routing? Why wasn't the MMU itself designed for unbounded extensibility? Because on 64-bit systems it can make other choices, such as inverted hash tables, while for 32-bit systems a fully indexed MMU definitely beats every tree and every hash. Moreover, it suits hardware implementation, because it is "logic-free" and simple! An exaggerated example: suppose one algorithm is O(1) but takes 100 years per run, a perfectly constant 100 years even at n = 10000000000; and another is O(n^2) but takes 1 ns at n = 100, and heaven knows that in the given environment n will never exceed 500. Which would you choose?
In the IPv4 world, or in an IPv6 deployment where there is no shortage of money to buy memory, or in any controlled and bounded environment (don't speak of infinity! There is no infinity inside a computer; see how hard big-number arithmetic is in OpenSSL), the multi-level index table is without doubt the fastest data structure. Hashing is fine, of course, but it is simply not as fast as direct indexing. Indexing guarantees the speed, and the multiple levels keep the space occupation from exploding; the number of levels is the number of operations the algorithm performs, and everything else is mere clouds.
Big-O analysis suits algorithm analysis, but a real system must consider many other constraints. Big O ignores the addressing overhead of memory access; it smooths away the efficiency gaps between cache levels (which are orders of magnitude); it smooths away the differing latencies of individual instructions; it ignores the cache, and it ignores the MMU. None of these can be ignored in a real implementation. Algorithm analysis is not even an analysis of software performance, and that is no flaw, because that is simply not its job. The same algorithm can be improved by changing either software or hardware, and different hardware wiring produces different real costs, say a different bus or a different component placement. Final performance is therefore a function of the algorithm itself, the software implementation, and the hardware implementation, each with its own weight. People usually care about the algorithm first and the software implementation second; as for the hardware, they mostly just look on, since without money there is nothing to be done, whereas the first two can be fixed by changing the algorithm or rewriting the implementation.
From idea to reality: using it
Such a wonderful analogy yields a perfect lookup structure, and it is time to use it!
Simply put, you need only create an "address space" and populate the MMU with the contents of the routing table. But it is not that simple. On Linux, for example, you will face the following problems:
1. You cannot use the C library or any other library.
An address space contains instructions as well as data. Every instruction of the process itself occupies a virtual address, and such addresses can no longer serve as IPv4 addresses. A library packs in a huge number of instructions, so libraries are out.
2. You can't even use the kernel.
This is no fun. The kernel is shared by every address space; as the managing entity, its code is mapped into each of them. For example, many addresses above 0xC0000000 are mapped to physical memory, and so on.
Because code instructions must be mapped in, the whole virtual address space cannot be given over to IPv4 addresses. So what is the way out?
Now that you have learned to think, why copy the mechanism wholesale? Use the hardware MMU facility directly? The idea is too crazy, and it proves only that the thinker is lazy. Admittedly, with virtualization support you could claim a set of virtual MMU facilities, but that only demonstrates mastery of the hardware. Why not build your own soft MMU?
There is no doubt that the DxR routing table is compact and subtle. It never expected to use the off-the-shelf MMU; it adds a binary search on top, a good compromise simulation, and I can do the same. I do not expect an algorithm that simulates the MMU in software to be fast in itself; what I want to learn from DxR is the idea that a compact data structure improves CPU cache utilization, caching results close to the CPU as much as possible instead of sending requests out over the bus! And even if I could use the system's hardware MMU outright, could I use its TLB? If not, what would be the point? Do you know what a TLB hit means? Did you know that most MMU address translations hit in the TLB rather than walking the page table? The TLB is a cache! So simulating the MMU is not the fundamental goal; exploiting the cache is king.
We know the CPU caches (the TLB included) achieve their considerable hit rates because of the locality of memory addressing. Does such locality exist for IP addresses? Picture the many packets of one flow passing continuously, while packets of different flows pass at staggered peaks, and you will see that locality is a universal principle. Traffic engineering on core paths is path-based, while QoS is application-based, and such classification promotes locality rather than cancelling it! After all, what is classification? That is a philosophical question; two thousand years after Plato, people still debate whether classification is for aggregation or for hashing.
This was one harvest of the Spring Festival of the Year of the Goat: comparing the MMU with a simulated MMU. The other harvest was reading a pile of history books and watching several films, including a decent horror film, "Vengeful Spirit", plus some reading on the history of Lanting in Shaoxing.