How to analyze the origin and scheduling of CDN

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains in detail how to analyze the origin and scheduling of CDN. Interested readers can use it as a reference, and I hope it is helpful.

CDN is a technology that distributes the content of the origin server to nodes all over the country, so that users fetch objects with lower latency and the website responds faster and is more available. It effectively addresses limited network bandwidth, heavy user traffic, and uneven network distribution.

Now for the main text. It is divided into four short parts, each introduced briefly.

The Origin of CDN

CDN was born more than 20 years ago. As demand for long-distance transmission grew, the pressure on the backbone network kept increasing and long-distance transmission performance got worse and worse. So in 1995, Tom Leighton, a professor of applied mathematics at MIT, led graduate student Danny Lewin and several other top researchers in trying to solve network congestion with mathematics.

They used mathematical algorithms to handle the dynamic routing of content, and eventually solved a problem that had been troubling Internet users. Later, Jonathan Seelig, an MBA student at the Sloan School of Management, joined Leighton, and they began to carry out their business plan, eventually founding a company named Akamai on August 20, 1998.

In 1998, ChinaCache, the first CDN company in China, was established.

Over the following 20 years, the CDN industry kept changing and developing, and many cloud CDN vendors emerged. Aliyun CDN started as Taobao CDN in 2008 and formally became Aliyun CDN in 2014. It not only serves all subsidiaries of Alibaba Group, but also offers its resources and technologies to outside customers as a cloud computing service.

So what is CDN?

CDN is an abbreviation for Content Delivery Network, also rendered as "content distribution network".

The figure above shows the topology after a CDN is deployed. Several concepts in it need to be clarified:

Origin Server: the customer's real server, that is, the server that served the content before the CDN was introduced.

User: the visitor, that is, the netizen who accesses the website.

Edge Server: the CDN's servers. These are not only "edge servers" in the narrow sense, as will be discussed in more detail later.

There are also three "mile" concepts in CDN: First Mile, Middle Mile, and Last Mile.

First Mile: the hop to the CDN device placed as close as possible to the CDN customer's server, that is, the first mile.

Last Mile: the hop from the visitor (netizen) to the nearest CDN server, that is, the last mile.

Middle Mile: all the links that data traverses between entering the CDN network and leaving it, that is, the middle mile.

Why use CDN?

As the picture above shows, the left side depicts trans-oceanic, long-distance access before a CDN. A user in Spain who visits a server in New York has to cross the North Atlantic, a straight-line distance of about 6000 km. At the speed of light, 300000 km/s, a beam of light takes at least 20 ms to travel from Spain to New York, and a round trip takes 40 ms. Real data travels through optical fiber, and once transmission loss and the delay introduced by transmission equipment are added, the round trip may take hundreds of milliseconds. Even fetching a very small picture with a browser means waiting hundreds of milliseconds, and these waits add up, so visiting an American shopping website becomes unacceptable to users.
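The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the distance and the speed of light are the figures from the text, while the fiber factor of about 2/3 is an added assumption (light travels more slowly in glass than in a vacuum), and equipment delay is not modelled at all.

```python
# Back-of-the-envelope propagation delay for the Spain -> New York example.
SPEED_OF_LIGHT_KM_S = 300_000   # figure used in the text
DISTANCE_KM = 6_000             # approximate straight-line distance from the text
FIBER_SPEED_FACTOR = 2 / 3      # assumption: light in fiber moves at about 2/3 of c


def one_way_delay_ms(distance_km, speed_km_s):
    """Return the one-way propagation delay in milliseconds."""
    return distance_km / speed_km_s * 1000


vacuum_one_way = one_way_delay_ms(DISTANCE_KM, SPEED_OF_LIGHT_KM_S)
vacuum_round_trip = 2 * vacuum_one_way
fiber_round_trip = 2 * one_way_delay_ms(
    DISTANCE_KM, SPEED_OF_LIGHT_KM_S * FIBER_SPEED_FACTOR
)

print(f"one way in vacuum:    {vacuum_one_way:.0f} ms")
print(f"round trip in vacuum: {vacuum_round_trip:.0f} ms")
print(f"round trip in fiber:  {fiber_round_trip:.0f} ms")
```

Propagation alone accounts for only tens of milliseconds. The hundreds of milliseconds mentioned in the text come from longer real-world routes, equipment delay, and the several round trips a browser needs before the first byte of the picture arrives.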

The picture on the right is a schematic diagram after a CDN is deployed. As it shows, the server that netizens actually access is not the real server in the United States but a CDN server in the United Kingdom. A CDN has a caching function: it distributes the unchanging content of web pages, such as pictures, music, and videos, to its service nodes and caches it there. Netizens therefore do not have to reach New York from Spain but can visit the UK node closer to them, which saves more than 80% of the time.

Of course, in this example a user in Spain reaches a CDN node in the UK. If a CDN node is located in Spain itself, the effect is even more obvious; the details are explained later.

Next, let's talk about scheduling. Scheduling is the top priority in a CDN: accepting traffic, steering traffic, and selecting an appropriate CDN node server are all done at the scheduling stage.

To understand the scheduling strategy and principle, we must first understand the DNS protocol and how it works.

The computers we work on are configured, manually or automatically, with a DNS server address. We call this server the "local DNS" (Local DNS, or LDNS for short). When we visit a domain name, the connection is actually made to an IP address rather than to the name, so the job of the LDNS server is to translate the domain name into an IP address that the Internet can recognize.

When a domain name is requested, LDNS is in one of two situations: either it already holds a record for the domain name, or it does not. The two cases are handled differently.

Suppose we access the 163.com domain name. If LDNS has a cached record, it returns the IP address directly.

If there is no cached record, LDNS sends requests to the upstream servers step by step, gathers the answers, and hands the final result to the client. This is called "recursion".

On a complete cache miss, LDNS first asks one of the 13 root name servers around the world where the .com domain is served. After the root server answers, LDNS asks the .com server where 163.com is served, and so on step by step, until it finally gets the IP address for www.163.com. The full process is more complicated; if you are interested, consult the relevant material, as I will not go into it here.
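The two LDNS cases, a cached record and a full walk from the root, can be illustrated with a toy model. This is only a sketch: the zone tables, the server labels, and the address 192.0.2.10 are made up for illustration, and real DNS messages, timeouts, and TTLs are left out.

```python
# Toy model of LDNS: answer from cache, otherwise walk root -> .com -> 163.com.
# All zone data below is hypothetical and exists only for this illustration.
ROOT = {"com.": "com-server"}                      # root knows who serves .com
SERVERS = {
    "com-server": {"163.com.": "163-server"},      # .com knows who serves 163.com
    "163-server": {"www.163.com.": "192.0.2.10"},  # made-up A record
}


class LocalDNS:
    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0  # number of questions sent to upstream servers

    def resolve(self, name):
        # Case 1: a cached record exists, so answer directly.
        if name in self.cache:
            return self.cache[name]

        # Case 2: no record, so ask step by step, starting at the root.
        labels = name.rstrip(".").split(".")       # e.g. ["www", "163", "com"]
        tld = labels[-1] + "."
        domain = ".".join(labels[-2:]) + "."

        self.upstream_queries += 1
        tld_server = ROOT[tld]                     # ask the root about .com
        self.upstream_queries += 1
        auth_server = SERVERS[tld_server][domain]  # ask .com about 163.com
        self.upstream_queries += 1
        ip = SERVERS[auth_server][name]            # ask 163.com for the address

        self.cache[name] = ip                      # remember it for next time
        return ip


ldns = LocalDNS()
first = ldns.resolve("www.163.com.")   # walks the hierarchy
second = ldns.resolve("www.163.com.")  # served from the cache
print(first, second, ldns.upstream_queries)
```

The second lookup sends no upstream queries, which is why a warm LDNS cache answers so much faster than a cold one.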

Many people are surely curious about how scheduling locates the user. In fact, it is done through the address of the LDNS, as shown in the figure above.

Suppose the netizen is in Beijing. The DNS server he uses performs the recursion and, in doing so, reaches the CDN vendor's GLB (Global Load Balance). The GLB can see which LDNS the domain name request came from. Given ordinary usage habits, the netizen and the LDNS are in the same location, so the GLB indirectly learns where the netizen is.

For example, suppose the netizen is a Beijing Unicom user, the LDNS he uses also belongs to Beijing Unicom, and that LDNS reaches the GLB from a Beijing Unicom address. The GLB then concludes that the netizen is on Beijing Unicom and returns the address of a Beijing Unicom CDN server to the LDNS. The LDNS returns this IP address, resolved for www.a.com, to the netizen. When the netizen's browser initiates the request, it communicates directly with the Beijing Unicom CDN node, which achieves the acceleration.

This scheduling theory has an obvious weak point: the key assumption, "given ordinary usage habits". Only when the LDNS that the netizen uses is in the same area as the netizen is accurate scheduling possible (a later chapter focuses on why it is only "possible").

But suppose the netizen is a Beijing Unicom user who insists on using a Shenzhen Telecom LDNS, whose egress IP address also belongs to Shenzhen Telecom. The GLB will then misjudge the netizen as being on Shenzhen Telecom and assign a Shenzhen Telecom CDN server. The netizen ends up visiting Shenzhen Telecom from Beijing Unicom, which may slow access down instead of speeding it up.
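The GLB decision described above, including the misjudgment, can be sketched as a lookup keyed on the LDNS address. This is a simplified illustration, not any vendor's actual algorithm: the prefixes, region names, and node addresses are hypothetical and taken from documentation address ranges.

```python
import ipaddress

# Hypothetical mapping from LDNS source prefixes to a region/ISP label.
LDNS_REGIONS = {
    ipaddress.ip_network("198.51.100.0/24"): "beijing-unicom",
    ipaddress.ip_network("203.0.113.0/24"): "shenzhen-telecom",
}

# Hypothetical CDN node address for each region/ISP.
CDN_NODES = {
    "beijing-unicom": "192.0.2.1",
    "shenzhen-telecom": "192.0.2.2",
}
DEFAULT_NODE = "192.0.2.254"  # fallback when the LDNS is not recognised


def glb_pick_node(ldns_ip):
    """Choose a CDN node from the LDNS address, the only address the GLB sees."""
    addr = ipaddress.ip_address(ldns_ip)
    for network, region in LDNS_REGIONS.items():
        if addr in network:
            return CDN_NODES[region]
    return DEFAULT_NODE


# A Beijing Unicom user with a Beijing Unicom LDNS gets the Beijing node.
print(glb_pick_node("198.51.100.53"))
# The same user with a Shenzhen Telecom LDNS is sent to the Shenzhen node.
print(glb_pick_node("203.0.113.53"))
```

Note that the function never receives the netizen's own address. That is exactly why a user whose LDNS is in another region is scheduled to the wrong node.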

As mentioned earlier, user habits or other factors can make LDNS-based scheduling inaccurate, so there is another scheduling method: HTTP 302 scheduling.

The principle is simple. Whether or not the IP address the netizen first receives is the right one, the netizen eventually has to communicate with the CDN server at that IP address, and at that point the CDN server can see the netizen's real address. (DNS scheduling learns the netizen's address only indirectly. EDNS-Client-Subnet can solve this problem, but it has not been deployed on a large scale.)

HTTP has a special response status: 302. When the HTTP server returns a 302 status code, the response can carry a new URL (one that uses the correct IP) in its Location header. On receiving the 302 status code, the browser extracts the new URL and sends its request there, so the request is rescheduled.
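A minimal sketch of 302 rescheduling follows. It models only the decision a CDN node makes once it sees the client's real address, not a full HTTP server. The client prefixes, node addresses, and the function names best_node_for and handle_request are all hypothetical.

```python
import ipaddress

# Hypothetical client prefixes and the CDN node that should serve each of them.
CLIENT_REGIONS = {
    ipaddress.ip_network("198.51.100.0/24"): "192.0.2.1",  # e.g. a Beijing node
    ipaddress.ip_network("203.0.113.0/24"): "192.0.2.2",   # e.g. a Shenzhen node
}


def best_node_for(client_ip):
    """Return the preferred node for this client, or None if it is unknown."""
    addr = ipaddress.ip_address(client_ip)
    for network, node in CLIENT_REGIONS.items():
        if addr in network:
            return node
    return None


def handle_request(client_ip, this_node, path):
    """Return (status, headers) for a request that arrived at this_node."""
    best = best_node_for(client_ip)
    if best is not None and best != this_node:
        # The client was scheduled to the wrong node: redirect it to the right one.
        return 302, {"Location": f"http://{best}{path}"}
    # The client is already on the right node (or unknown): serve it here.
    return 200, {}


# A Beijing client that DNS wrongly sent to the Shenzhen node is redirected.
print(handle_request("198.51.100.7", "192.0.2.2", "/logo.png"))
# The same client on the Beijing node is served directly.
print(handle_request("198.51.100.7", "192.0.2.1", "/logo.png"))
```

The redirect costs one extra round trip, so it pays off mainly for large objects, where the time saved on the transfer outweighs the extra request.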

In addition to DNS scheduling and HTTP 302 scheduling, there is also a scheduling strategy that carries DNS over HTTP.

As networks have developed and evolved rapidly, a number of little-known techniques and devices have appeared, such as hijacking (described separately in a later chapter). After hijacking, the target a netizen reaches may no longer be the real server, and even if it is the real server, the content may have been falsified or replaced, which is very dangerous for business security. Hijacking is especially common on the mobile Internet.

To avoid this problem, there is the HTTP DNS scheduling method, which carries DNS requests and replies inside HTTP messages. This method is not backed by any RFC, so no operating system supports it out of the box: the application must bring its own HTTP DNS client to talk to the HTTP DNS server, which means both ends need support. The practice is widely used in mobile apps.
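The client side of HTTP DNS might look roughly like the sketch below. Because the method has no standard, everything specific here is an assumption: the endpoint URL, the host query parameter, and the JSON reply shape are hypothetical, and a real service defines its own. The network call is passed in as a function so the logic can be exercised without a server.

```python
import json
from urllib.parse import urlencode

# Hypothetical HTTP DNS endpoint. It is addressed by IP so that reaching the
# resolver does not itself depend on ordinary DNS.
HTTPDNS_ENDPOINT = "http://203.0.113.53/resolve"


def build_query_url(domain):
    """Build the request URL; the 'host' parameter name is an assumption."""
    return f"{HTTPDNS_ENDPOINT}?{urlencode({'host': domain})}"


def parse_reply(body):
    """Extract addresses from an assumed reply like {"host": ..., "ips": [...]}."""
    return json.loads(body).get("ips", [])


def resolve(domain, fetch):
    """Resolve domain over HTTP; fetch is any callable mapping a URL to a body."""
    return parse_reply(fetch(build_query_url(domain)))


def fake_fetch(url):
    """Stand-in for a real HTTP request, returning a canned reply."""
    return '{"host": "www.a.com", "ips": ["192.0.2.1"]}'


print(build_query_url("www.a.com"))
print(resolve("www.a.com", fake_fetch))
```

In an app, fake_fetch would be replaced by a real HTTP call, and the returned address would be used to connect directly, bypassing the LDNS.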

So how does a CDN bring users' traffic into the CDN network?

Without a CDN, visiting a domain name returns the real server's IP address directly. The DNS record that holds this IP address is called an A record, and it usually looks like the one in the following figure.

When a service is to be put behind a CDN, the customer only needs to adjust the DNS configuration: change the A record to a CNAME record, and set its content to the access domain name provided by the CDN vendor.
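The switch from an A record to a CNAME record can be illustrated with two toy record tables and a resolver that follows CNAMEs. All names and addresses are hypothetical; www.a.com.cdn.example stands in for whatever access domain name a CDN vendor would actually provide.

```python
# Toy DNS record tables, keyed by (name, record type). All values are made up.
BEFORE_CDN = {
    ("www.a.com", "A"): "192.0.2.80",                 # the origin server itself
}
AFTER_CDN = {
    ("www.a.com", "CNAME"): "www.a.com.cdn.example",  # name given by the CDN vendor
    ("www.a.com.cdn.example", "A"): "192.0.2.1",      # node picked by CDN scheduling
}


def resolve(name, records, max_hops=8):
    """Follow CNAME records until an A record is found."""
    for _ in range(max_hops):
        if (name, "A") in records:
            return records[(name, "A")]
        if (name, "CNAME") in records:
            name = records[(name, "CNAME")]  # continue with the alias target
            continue
        break
    raise LookupError(f"cannot resolve {name}")


print(resolve("www.a.com", BEFORE_CDN))  # the origin server address
print(resolve("www.a.com", AFTER_CDN))   # a CDN node address
```

After the change, the final A record is answered by the CDN vendor's own DNS, which is where the scheduling decisions described earlier take effect.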

That is all on how to analyze the origin and scheduling of CDN. I hope the content above is of some help and that you have learned something from it. If you think the article is good, feel free to share it so more people can see it.
