From the failure of simulating an MMU, to designing a routing table, and back to DxR


2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

In an article I wrote the other day, I described a failure. For someone like me, who cares a great deal about the process, I would still call it a success. Even so, I had to go back to DxR and study its essence rather than just its algorithmic ideas. I failed because of my rebellious mindset: I started without really studying what DxR is. I was fighting a fierce battle unprepared and completely ignorant of my opponent. Had I pushed on, the result would have been as tragic as my earlier Bloom attempt!

The essence of DxR

DxR does not invent any new algorithm; it is efficient because it separates the routing prefix from the next hop in each route entry. On that basis, it uses three tables to achieve both high speed and a small memory footprint. To sum up:

The premise of the DxR algorithm is to separate the routing prefix from the next hop.

This premise is very important! Separating prefixes from next hops eliminates data redundancy, and it changes what the lookup table has to be: instead of a plain match table, it becomes a mapping from intervals of the IPv4 address space to entries of the next-hop table, which leads directly to interval lookup. Compare the closely related trie lookup, in which the routing prefix and the next hop are bound together as one route entry, so a lookup is a process of exact matching plus backtracking. The DxR algorithm eliminates the backtracking.
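To make the premise concrete, here is a minimal Python sketch (the route format and the helper name split_routes are hypothetical, not taken from DxR) that takes route entries with the next hop stored inline and replaces each next hop with an index into a shared, deduplicated next-hop table:

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (prefix, prefix length, next hop) with the next hop inline.
Route = Tuple[str, int, str]

def split_routes(routes: List[Route]) -> Tuple[List[Tuple[str, int, int]], List[str]]:
    """Replace each inline next hop with an index into a shared next-hop table."""
    nexthops: List[str] = []        # the next-hop table: one slot per distinct next hop
    slot_of: Dict[str, int] = {}    # next hop -> its slot in the table
    prefixes: List[Tuple[str, int, int]] = []
    for prefix, length, nh in routes:
        if nh not in slot_of:
            slot_of[nh] = len(nexthops)
            nexthops.append(nh)
        prefixes.append((prefix, length, slot_of[nh]))
    return prefixes, nexthops

routes = [
    ("10.0.0.0", 8, "192.168.1.1"),
    ("10.1.0.0", 16, "192.168.1.2"),
    ("172.16.0.0", 12, "192.168.1.1"),
    ("0.0.0.0", 0, "192.168.1.254"),
]
prefixes, nexthops = split_routes(routes)
print(nexthops)  # ['192.168.1.1', '192.168.1.2', '192.168.1.254']
print(prefixes)  # [('10.0.0.0', 8, 0), ('10.1.0.0', 16, 1), ('172.16.0.0', 12, 0), ('0.0.0.0', 0, 2)]
```

Four routes now share three next-hop slots; in a real table with hundreds of thousands of prefixes and a handful of next hops, the saving is far larger.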

The building blocks of the DxR algorithm: a direct index table, an interval table, and a next-hop table.

I'll come back to these later, but remember: they are not the core, just a means of implementing it.

DxR algorithm table design: the significance of the direct index table

The direct index table merges large ranges of IPv4 addresses, so that the interval table only has to search among the far fewer intervals left after the merge; both tables ultimately point to an index in the next-hop table. This establishes the mapping from interval to next hop.

Dividing the IPv4 address space into intervals with routing prefixes

If you still don't know what the DxR algorithm is at this point, it doesn't matter; the idea is very simple. What a routing table ultimately does is map a contiguous address block to a next hop (non-contiguous masks are not allowed), so a routing table effectively divides the entire IPv4 address space into a number of intervals, each associated with exactly one next hop. In the article recording my failure, I posted a correct picture of this, as follows:

Use the destination IP address as the index and move to the right; the first route entry you meet is the result. The longest-prefix logic lives entirely in insertion and deletion: prefixes get shorter from left to right, and an entry with a longer prefix is placed in front of, and covers, an entry with a shorter prefix. Although I have since abandoned indexing IPv4 addresses directly, the core idea has not changed: "map XX to a specific next hop". In that failure record, XX was the IPv4 address used as an index; in the correct approach, XX is the interval. The same idea, interval search, is also used in the HiPac firewall, where the interval is the match domain and the next hop corresponds to a rule.
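A minimal sketch of how a set of prefixes divides the IPv4 space into intervals with exactly one next hop each, assuming routes are given as (CIDR prefix, next-hop index) pairs and using Python's standard ipaddress module. As described above, longest-prefix matching is resolved once, at build time, so lookups never need it. The helper name to_intervals is hypothetical, and the quadratic loop is for clarity only:

```python
import ipaddress
from typing import List, Optional, Tuple

def to_intervals(routes: List[Tuple[str, int]]) -> List[Tuple[int, Optional[int]]]:
    """Divide the IPv4 space into intervals, each mapped to a single next hop.

    routes: (CIDR prefix, next-hop index) pairs.
    Returns sorted (interval start, next-hop index) pairs; each interval runs up
    to the next start. A next hop of None means no route covers the interval.
    """
    nets = [(ipaddress.IPv4Network(p), nh) for p, nh in routes]
    # Every prefix start, and every address just past a prefix end, is a boundary.
    bounds = {0}
    for net, _ in nets:
        bounds.add(int(net.network_address))
        end = int(net.broadcast_address) + 1
        if end < 2 ** 32:
            bounds.add(end)
    intervals: List[Tuple[int, Optional[int]]] = []
    for start in sorted(bounds):
        # No prefix boundary falls inside [start, next start), so the longest
        # prefix covering `start` covers the whole interval.
        best: Optional[int] = None
        best_len = -1
        for net, nh in nets:
            covers = int(net.network_address) <= start <= int(net.broadcast_address)
            if covers and net.prefixlen > best_len:
                best, best_len = nh, net.prefixlen
        if intervals and intervals[-1][1] == best:
            continue  # same next hop as the previous interval: merge them
        intervals.append((start, best))
    return intervals

routes = [("0.0.0.0/0", 0), ("10.0.0.0/8", 1), ("10.1.0.0/16", 2)]
for start, nh in to_intervals(routes):
    print(ipaddress.IPv4Address(start), "->", nh)
# 0.0.0.0 -> 0
# 10.0.0.0 -> 1
# 10.1.0.0 -> 2
# 10.2.0.0 -> 1
# 11.0.0.0 -> 0
```

Three prefixes yield five intervals: the /16 punches a hole in the /8, which in turn punches a hole in the default route.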

The DxR algorithm is then a step-by-step optimization of the diagram above. To illustrate DxR better, here is a transformed version of that figure:

Interval search

Following the diagram above, the entire IPv4 address space is divided into N intervals, and the ultimate goal of a route lookup is to map an IPv4 address to one of those intervals! In principle, the work is already done. But there is a precondition: you have to find or implement a high-performance interval-matching algorithm. That is, build an interval table holding N interval entries, each with a next-hop index (for example, interval m maps to next hop C). Given an IPv4 address, the goal is to determine which interval it belongs to. Such algorithms abound, and it doesn't seem hard to implement one yourself, for example with binary search or hashing, so this article does not dwell on them. However, DxR does not seem satisfied with stopping here, and of course neither am I: DxR aims for a more optimized way to perform this interval matching.
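As an example of the plain interval matching described above, here is a sketch of the binary-search variant using Python's bisect module. The interval table below is hypothetical (sorted interval starts in one list, the next-hop index of each interval in a parallel list), and the first interval is assumed to start at 0.0.0.0 so every address falls into some interval:

```python
import bisect
import ipaddress

# Hypothetical interval table: sorted starts and, in parallel, each interval's next-hop index.
STARTS = [int(ipaddress.IPv4Address(a)) for a in
          ("0.0.0.0", "10.0.0.0", "10.1.0.0", "10.2.0.0", "11.0.0.0")]
NEXTHOP = [0, 1, 2, 1, 0]

def interval_lookup(addr: str) -> int:
    """Binary search for the last interval whose start is <= addr."""
    key = int(ipaddress.IPv4Address(addr))
    i = bisect.bisect_right(STARTS, key) - 1  # never -1, because STARTS[0] == 0
    return NEXTHOP[i]

print(interval_lookup("10.1.2.3"))    # 2
print(interval_lookup("10.200.0.1"))  # 1
print(interval_lookup("8.8.8.8"))     # 0
```

Each lookup costs about log2(N) comparisons over the whole table; the rest of the article is about shrinking the N that a single lookup has to search.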

Before presenting the framework of DxR, note what we have found so far: DxR essentially uses interval matching to map a destination IPv4 address to an interval and then fetches the next hop associated with that interval!

Dividing into sub-intervals

If the destination IPv4 address of every incoming packet had to be matched against all N intervals, it would not be elegant. If the N intervals can be split into groups of sub-intervals, the number of intervals to match drops sharply: for example, with N = 100, if the entire IPv4 address space can be divided into 20 equal parts, each lookup only has to match about 5 intervals on average instead of 100! But there is another precondition here: the cost of dividing into sub-intervals must be outweighed by the benefit of matching fewer intervals, and the benefit must clearly dominate!

At this point, things become easy if you understand two-level page tables well. A page directory entry points to a page table of 1024 page table entries, and each page table entry points to a 4096-byte page. The page directory therefore divides the entire 32-bit virtual address space into 1024 equal segments of 1024 × 4096 virtual addresses each. Map this onto 32-bit IPv4 addresses: isn't it the same thing? Two-level (or multi-level) page tables, however, exist to handle sparse address spaces: with a single-level page table there would be many "holes", because how a process lays out its virtual addresses is beyond the control of the kernel and the MMU. In our problem, the purpose of a similar hierarchy is to divide the space into sub-intervals so that each interval match is faster. Note that the index is not the goal here. I mistakenly treated the index as a goal rather than a means, and so I fell into the abyss!
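For readers who want the analogy spelled out, here is a sketch of how a 32-bit virtual address is split under classic x86 two-level paging (10/10/12 bits, no PAE); split_va is a hypothetical helper name:

```python
def split_va(va: int):
    """Split a 32-bit virtual address the classic x86 two-level way (10/10/12 bits)."""
    pde = va >> 22            # top 10 bits: page-directory index (1024 entries)
    pte = (va >> 12) & 0x3FF  # next 10 bits: page-table index (1024 entries)
    off = va & 0xFFF          # low 12 bits: offset inside a 4096-byte page
    return pde, pte, off

print(split_va(0x12345678))  # (72, 837, 1656)
print(1024 * 4096)           # 4194304 addresses covered by one page-directory entry
```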

However, for IPv4 addresses, instead of using 10 bits (a value chosen with virtual address translation and the page size in mind), we split on k bits. Note that there is no concept of a page in a routing table! If k equals 16, the high 16 bits of the IPv4 address are used as the index. Since the low 16 bits can take any value, each index entry covers 2^16 = 65,536 IPv4 addresses. The interval lookup table now looks like this:

Notice that the high 16 bits of an IPv4 address index directly into its sub-range in a single step, an instant operation! The next question is how to lay out these sub-intervals properly.
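The single-step indexing can be sketched as a shift and a mask. The helper name split_key and the default k = 16 are assumptions for illustration; k is a free parameter:

```python
import ipaddress

def split_key(addr: str, k: int = 16):
    """Split an IPv4 address into its high-k-bit direct index and the remaining low bits."""
    a = int(ipaddress.IPv4Address(addr))
    return a >> (32 - k), a & ((1 << (32 - k)) - 1)

print(split_key("10.1.2.3"))  # (2561, 515): index 10*256+1, low bits 2*256+3
print(1 << (32 - 16))         # 65536 addresses covered by each index entry when k = 16
```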

Sub-interval layout

How the sub-intervals are laid out matters, because a compact data structure is one that can fit in the CPU cache!

Taking the last picture above as an example, we certainly want all the intervals to be stored contiguously, which seems to be the only way to make them compact. We call this compact, merged table of sub-intervals the interval table, as shown in the following figure:

At this point, how can an entry of the high-16-bit index table tell which sub-intervals make up the 65,536-address range it indexes? The answer, of course, is to record a starting position and the number of intervals. Putting everything together gives the final picture:

The figure above contains only three tables: an index table, an interval table, and a next-hop table. The next-hop table is not shown in the diagram because its content is not fixed: it can be just an IP address, device information, status information, and so on, or a linked list for load balancing, or of course a pointer to something else. The most important are the first two tables, the index table and the interval table. Both can be packed into a very compact space and take up very little memory, which gives them the best possible chance of being loaded into the CPU cache.
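Putting the three tables together, here is a minimal Python sketch of the structure described above. It assumes k = 16, plain Python lists standing in for packed arrays, an input interval table like the one in the interval-search section (sorted starts, first start 0), and hypothetical helper names build and lookup. It follows this article's layout (a start position plus a count per index entry), not the bit-level encoding of any particular DxR implementation:

```python
import bisect
import ipaddress
from typing import List, Tuple

def build(starts: List[int], nexthops: List[int], k: int = 16):
    """Build a direct index table over the high k bits plus a compact interval table.

    starts/nexthops: a flat interval table (sorted starts, starts[0] == 0).
    index[c] = (position in `table`, number of intervals) for chunk c;
    table entries = (start offset inside the chunk, next-hop index).
    """
    shift = 32 - k
    index: List[Tuple[int, int]] = []
    table: List[Tuple[int, int]] = []
    for chunk in range(1 << k):
        lo = chunk << shift
        hi = lo + (1 << shift)  # first address of the next chunk
        i = bisect.bisect_right(starts, lo) - 1  # interval covering the chunk's first address
        pos = len(table)
        table.append((0, nexthops[i]))
        i += 1
        while i < len(starts) and starts[i] < hi:  # intervals that begin inside this chunk
            table.append((starts[i] - lo, nexthops[i]))
            i += 1
        index.append((pos, len(table) - pos))
    return index, table

def lookup(index, table, addr: str, k: int = 16) -> int:
    """One direct-index access, then a search among only this chunk's intervals."""
    a = int(ipaddress.IPv4Address(addr))
    shift = 32 - k
    pos, count = index[a >> shift]
    low = a & ((1 << shift) - 1)
    offsets = [off for off, _ in table[pos:pos + count]]
    j = bisect.bisect_right(offsets, low) - 1
    return table[pos + j][1]

starts = [int(ipaddress.IPv4Address(a)) for a in
          ("0.0.0.0", "10.0.0.0", "10.1.0.0", "10.1.128.0", "10.2.0.0", "11.0.0.0")]
index, table = build(starts, [0, 1, 2, 3, 1, 0])
print(index[2561])                         # (2561, 2): chunk 10.1.x.x holds two intervals
print(lookup(index, table, "10.1.200.1"))  # 3
print(lookup(index, table, "10.7.0.1"))    # 1
```

Most chunks end up with a single interval; only chunks that a prefix boundary cuts through need more than one entry, which is why the interval table stays small.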

What does DxR look like?

I'm a little embarrassed, because everything that needs saying has already been said above.

In fact, the picture above is DxR! The only differences are that in DxR:

1. The x in DxR refers to the k above. I used 16, but other values are possible; generally speaking, it is 16 or more.
2. The third field of the index table entry in the figure, the encoding-optimization data, can be used to make these tables even more compact.
3. If an index entry covers only one interval in the interval table, the index entry can point directly to the next-hop index.
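Point 3 can be sketched by packing each index entry into one 32-bit word with a flag bit saying whether the entry holds a next-hop index directly or a (position, count) pair. The 1/19/12-bit layout and the names encode and decode are assumptions chosen for illustration, not a documented DxR format:

```python
DIRECT = 1 << 31               # hypothetical flag: the entry holds a next-hop index directly
POS_BITS, COUNT_BITS = 19, 12  # hypothetical field widths for the (position, count) case

def encode(pos: int, count: int, first_nexthop: int) -> int:
    """Pack one index entry into a 32-bit word (illustrative layout)."""
    if count == 1:
        # A chunk with a single interval skips the interval table entirely.
        return DIRECT | first_nexthop
    if pos >= 1 << POS_BITS or count >= 1 << COUNT_BITS:
        raise ValueError("entry does not fit the assumed layout")
    return (pos << COUNT_BITS) | count

def decode(entry: int):
    """Return ('nexthop', index) or ('range', position, count)."""
    if entry & DIRECT:
        return ("nexthop", entry & (DIRECT - 1))
    return ("range", entry >> COUNT_BITS, entry & ((1 << COUNT_BITS) - 1))

print(decode(encode(0, 1, 7)))     # ('nexthop', 7)
print(decode(encode(2561, 2, 2)))  # ('range', 2561, 2)
```

With this shortcut, a lookup in a single-interval chunk touches only the index table and the next-hop table.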

Generally speaking, the larger k is, the more space the index table takes. If k is 32, bad luck: the index table has 4G (2^32) entries and the interval table disappears entirely, because every single IP address is mapped to its next hop explicitly. That is exactly where my own MMU-simulating design ended up. In short, the larger the index table, the finer-grained the mapping from IP addresses to next hops and, statistically, the smaller the interval table: a classic trade of space for time. When the size of the index table is fixed, the size of the interval table is not; it depends on how the route entries in your routing table are laid out, so you cannot get the most out of DxR without some skill in route planning. For example, you should make as much use of techniques such as route summarization as possible. To make routes summarizable, you may also need to re-plan them so that summarizable routes share a next hop on the same interface. This in turn involves route redistribution, especially when you mix dynamic and static routes. In short, IP routing is complex and calls for a combination of skills: algorithms, understanding of IP addressing, address planning, route redistribution, dynamic routing, configuration commands, and even structured cabling.
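The space side of this trade-off is easy to quantify. Assuming 4-byte index entries (an assumption; real encodings vary), the direct index table alone grows as 2^k; index_table_bytes is a hypothetical helper name:

```python
def index_table_bytes(k: int, entry_bytes: int = 4) -> int:
    """Memory for a direct index table over the high k bits (2**k entries)."""
    return (1 << k) * entry_bytes

for k in (16, 20, 24, 32):
    print(f"k={k}: {1 << k:,} entries, {index_table_bytes(k) // 1024:,} KiB")
# k=16: 65,536 entries, 256 KiB
# k=20: 1,048,576 entries, 4,096 KiB
# k=24: 16,777,216 entries, 65,536 KiB
# k=32: 4,294,967,296 entries, 16,777,216 KiB
```

At k = 16 the index table fits comfortably in a CPU cache; at k = 32 it needs 16 GiB, which is the MMU-style dead end described above.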

I haven't covered how to insert entries into or delete entries from these tables; I think you can work that out yourself. Updates are driven mainly by dynamic routing. After all, if link states don't change often, the routing table is generally stable.

