What is CDN? Is it always faster with CDN than without it?

2025-04-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

For developers, the term CDN is both familiar and unfamiliar.

You rarely need to touch it in day-to-day development, yet you hear others mention it all the time.

We have all heard that it speeds things up, and we roughly know why. But ask one question further:

Is it always faster with CDN than without it?

and things get a little fuzzy. That's fine. Today, let's get to know CDN from a different angle.

What is CDN

For numeric and text data, such as names and phone numbers, we need a place to store it.

We usually store it in a MySQL database.

Once the text is stored in MySQL, reading it back means querying the MySQL database.

However, because MySQL keeps its data on disk, the read performance of a single instance is only around 5k QPS.

That looks fine, but for a slightly larger system it is a bit tight.

To improve performance, we put an in-memory cache layer, such as Redis, in front of MySQL. Data is read from memory first, and only read from MySQL on a miss. This greatly reduces the number of MySQL reads. With this combination, handling tens of thousands of read QPS is easy.

MySQL and Redis are familiar ground. This is the kind of development scenario most of us run into regularly.

But what if the data I'm dealing with is no longer text like the above, but image data?

For example, I have a handsome picture. This is the one below.

Every time I'm scrolling through a certain short-video app and hear someone sing Tanya Chua's "Letting Go", I can't help wanting to post this picture,

captioned "still can't forget".

So here comes the question.

Where should this image's data be stored? And where should it be read from?

Going back to the MySQL and Redis scenario, it comes down to a storage layer and a cache layer.

[Figure: Storage layer and cache layer]

For file objects such as images, the storage layer is unlikely to be MySQL. It should be a dedicated object store, such as Amazon's S3 (Amazon Simple Storage Service; the name has three words starting with S, hence S3) or Alibaba Cloud's OSS (Object Storage Service). In what follows, we will use the more familiar OSS to explain.

The cache layer can no longer be Redis either. It needs to be a CDN (Content Delivery Network) instead.

A CDN can be understood simply as the cache layer for object storage.

[Figure: CDN and OSS]

Now we can answer the question above. For users, the image data is stored in object storage and read from the CDN when needed.

How CDN works

Now that we have a CDN and object storage, let's look at how they work together.

When we see a picture on a web page, we can right-click it and copy its URL.

We will find that the picture's URL looks like this.

https://cdn.xiaobaidebug.top/1667106197000.png

Here cdn.xiaobaidebug.top at the front is the CDN domain name, and 1667106197000.png after it is the path of the image.
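As a small illustration, the same split can be done with Python's standard library. This is only a sketch of how the URL decomposes; the variable names are arbitrary.

```python
from urllib.parse import urlsplit

url = "https://cdn.xiaobaidebug.top/1667106197000.png"
parts = urlsplit(url)

cdn_domain = parts.hostname  # the part that DNS will have to resolve
image_path = parts.path      # the part the CDN node uses to find the file

print(cdn_domain)
print(image_path)
```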

When we enter this URL in the browser, it issues an HTTP GET request, which goes through the following process.

[Figure: CDN query process]

Stage one: your computer first uses the DNS protocol to obtain the IP address for the domain name cdn.xiaobaidebug.top.

Step 1 and step 2: check the browser cache first, then the /etc/hosts file in the operating system. If neither has an entry, ask the nearest DNS server (for example, the router in your home). If the nearest DNS server has the answer cached, it returns it.

Step 3: if the nearest DNS server has no cached answer, it queries the root, top-level, second-level, and third-level domain name servers in turn.

Step 4: the nearest DNS server then gets the alias (CNAME) of cdn.xiaobaidebug.top, for example cdn.xiaobaidebug.top.w.kunlunaq.com.

kunlunaq.com is the dedicated DNS scheduling system for Alibaba Cloud CDN.

Steps 5 to 7: the nearest DNS server then queries kunlunaq.com, which returns the IP address of the nearest node.

Stage two: this corresponds to step 8 in the figure above. The browser uses that IP to access the CDN node, and the CDN node returns the data.
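Stage one boils down to "follow the CNAME until you reach an address". The sketch below simulates that with a hard-coded record table instead of real DNS queries. The domain names are the ones used in this article; the IP addresses are sample data, and the `resolve` helper and its `max_hops` limit are assumptions made for illustration.

```python
# Toy record table. Treat the IP addresses as sample data.
RECORDS = {
    "cdn.xiaobaidebug.top": ("CNAME", ["cdn.xiaobaidebug.top.w.kunlunaq.com"]),
    "cdn.xiaobaidebug.top.w.kunlunaq.com": ("A", ["122.228.7.243", "122.228.7.241"]),
    "xiaobaidebug.top": ("A", ["47.102.221.141"]),
}


def resolve(name: str, max_hops: int = 5) -> list[str]:
    """Follow CNAME aliases until an A record is found, then return its IPs."""
    for _ in range(max_hops):
        record = RECORDS.get(name)
        if record is None:
            raise LookupError(f"no record for {name}")
        rtype, values = record
        if rtype == "A":
            return values
        name = values[0]  # CNAME: restart the lookup with the alias
    raise LookupError("too many CNAME hops")


print(resolve("xiaobaidebug.top"))      # ordinary domain: one step
print(resolve("cdn.xiaobaidebug.top"))  # CDN domain: CNAME first, then A
```

A real resolver also handles caching, TTLs, and the walk from the root servers down, all of which are left out here.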

Stage one above mentions many new terms, such as CNAME, root domain, and top-level domain. They are described in detail in the earlier article "What excellent designs are worth learning in DNS". If any of them are unclear, take a look at it.

We know that the purpose of DNS is to obtain an IP address from a domain name.

But that is only one of its many functions.

DNS has many record types. An A record maps a domain name to its IP address (A stands for Address). A CNAME record maps a domain name to an alias of that domain.

For an ordinary domain name, DNS resolution returns the IP address directly, that is, an A record.

For example, I use the dig command to issue a DNS request and print the resolution process.

$ dig +trace xiaobaidebug.top
;; ANSWER SECTION:
xiaobaidebug.top. 600 IN A 47.102.221.141

You can see that xiaobaidebug.top resolves directly to the IP address 47.102.221.141.

For the CDN domain name, however, the first query returns a CNAME record, xx.kunlunaq.com, and only a further dig on xx.kunlunaq.com returns the IP addresses.

$ dig +trace cdn.xiaobaidebug.top
cdn.xiaobaidebug.top. 600 IN CNAME cdn.xiaobaidebug.top.w.kunlunaq.com.

$ dig +trace cdn.xiaobaidebug.top.w.kunlunaq.com
cdn.xiaobaidebug.top.w.kunlunaq.com. 300 IN A 122.228.7.243
cdn.xiaobaidebug.top.w.kunlunaq.com. 300 IN A 122.228.7.241
cdn.xiaobaidebug.top.w.kunlunaq.com. 300 IN A 122.228.7.244
cdn.xiaobaidebug.top.w.kunlunaq.com. 300 IN A 122.228.7.249
cdn.xiaobaidebug.top.w.kunlunaq.com. 300 IN A 122.228.7.248
cdn.xiaobaidebug.top.w.kunlunaq.com. 300 IN A 122.228.7.242
cdn.xiaobaidebug.top.w.kunlunaq.com. 300 IN A 122.228.7.250
cdn.xiaobaidebug.top.w.kunlunaq.com. 300 IN A 122.228.7.251

Seeing this, another question arises.

Why bother adding a CNAME?

The CNAME actually points to a DNS name server dedicated to the CDN. To the DNS system as a whole, it is just one more name server that looks like any other, and DNS requests reach it in the normal way.

What makes it special is what happens once a request arrives. An ordinary DNS name server simply returns the IP addresses recorded for the domain name, but the CDN's dedicated DNS name server returns the IP of the "nearest" server to the caller.

[Figure: The CDN's dedicated DNS server returns the IP of the nearest CDN node]

How does it know which server is nearest to the caller? Notice that "nearest" is in quotation marks.

The CDN's dedicated DNS name server is operated by the CDN provider. Alibaba Cloud, for example, certainly knows which CDN nodes it has, along with the current load, response latency, and even weight of each CDN server. It also knows the caller's IP address, from which it can infer the caller's network operator and approximate location. It then filters out the most suitable CDN server based on these conditions. That is what "nearest" means here.

For instance, suppose the geographically nearest CDN data center is handling heavy traffic and responding slowly, while a more distant one can respond to the current request faster. Choosing the more distant CDN server is then the reasonable decision.

In other words, the selected server may not be the geographically nearest one, but it should be the most suitable one at the moment.
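To make "most suitable, not geographically nearest" concrete, here is a toy scoring function. It is only a sketch: real CDN schedulers are proprietary, and the `Node` fields, the weights, and the sample numbers below are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    distance_km: float   # rough geographic distance to the caller
    latency_ms: float    # recently measured response delay
    load: float          # 0.0 (idle) to 1.0 (saturated)
    same_operator: bool  # whether the node is on the caller's carrier


def score(node: Node) -> float:
    """Lower is better. The weights are invented for illustration."""
    penalty = 0.0 if node.same_operator else 30.0
    return node.latency_ms + 100.0 * node.load + 0.01 * node.distance_km + penalty


def pick_node(nodes: list[Node]) -> Node:
    """Return the node with the lowest score."""
    return min(nodes, key=score)


nodes = [
    Node("near-but-busy", distance_km=20, latency_ms=80, load=0.95, same_operator=True),
    Node("far-but-idle", distance_km=800, latency_ms=35, load=0.20, same_operator=True),
]
print(pick_node(nodes).name)
```

With these made-up numbers, the distant but lightly loaded node wins, which matches the example in the text.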

What is back-to-origin

The image URL above has the form https://<CDN domain>/<image path>.png.

In other words, the image was fetched through the CDN.

So, could we access object storage directly to get the image data and display it?

Like this:

https://<OSS domain>/<image path>.png

This is like asking whether you can skip Redis, read the text data from MySQL directly, and display it.

Of course you can.

That is what I used to do with the pictures on my blog.

But the cost is higher, where cost can mean either performance cost or billing cost. Take a look at the picture below.

You can see that requesting OSS directly costs almost twice as much as requesting it through the CDN. Since I'm on a tight budget, and to make my blog load pictures faster, I connected to the CDN.

But having read this far, another question arises.

In the screenshot above, the red box contains a term: "back-to-origin" (also called origin pull).

What is back-to-origin?

When we access https://<CDN domain>/<image path>.png, the request goes to the CDN server.

But the CDN server is essentially a cache layer, not a data source. Object storage is the data source.

The first time you ask the CDN for an image, the CDN most likely has no data for it, so it has to go back to the data source to fetch the image data and then store it on the CDN. On later visits, as long as the cached copy has not expired, the CDN hits the cache and returns it directly, with no need to go back to the origin.

So the access flow becomes this.

So in what other cases does back-to-origin happen?

Besides the case above, where the data is not on the CDN, expiry of the cached copy on the CDN also triggers back-to-origin.

In addition, even when a cached copy exists and has not expired, you can actively trigger back-to-origin through the open API the CDN provides, though we rarely need to.
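The three triggers above (first visit, expiry, active refresh) can be sketched with a toy CDN node: a TTL cache in front of an origin fetch function. The `EdgeNode` class, its `purge` method, and the explicit `now` argument are assumptions made to keep the example deterministic; they do not mirror any vendor's API.

```python
from typing import Callable


class EdgeNode:
    """Toy CDN node: a TTL cache in front of an origin fetch function."""

    def __init__(self, fetch_origin: Callable[[str], bytes], ttl: float):
        self.fetch_origin = fetch_origin
        self.ttl = ttl
        self.store: dict[str, tuple[bytes, float]] = {}  # path -> (body, expires_at)
        self.origin_pulls = 0

    def get(self, path: str, now: float) -> tuple[bytes, str]:
        entry = self.store.get(path)
        if entry is not None and now < entry[1]:
            return entry[0], "HIT"            # fresh copy, no origin traffic
        body = self.fetch_origin(path)        # missing or expired: back to origin
        self.origin_pulls += 1
        self.store[path] = (body, now + self.ttl)
        return body, "MISS"

    def purge(self, path: str) -> None:
        """Active refresh: drop the cached copy so the next get() goes to origin."""
        self.store.pop(path, None)


origin = {"/a.png": b"image-bytes"}
node = EdgeNode(fetch_origin=lambda p: origin[p], ttl=60)

statuses = [
    node.get("/a.png", now=0)[1],    # first visit
    node.get("/a.png", now=10)[1],   # within the TTL
    node.get("/a.png", now=100)[1],  # after expiry
]
node.purge("/a.png")
statuses.append(node.get("/a.png", now=101)[1])  # after an active refresh
print(statuses, node.origin_pulls)
```

Only the second request is served from cache; the other three each cost a trip to the origin.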

Also, back-to-origin is invisible to the user. When users load a picture, all they can tell is whether it loaded or not.

A successful read, however, further divides into two cases: read directly from the CDN, or returned after the CDN went back to object storage.

A direct return from cache and a back-to-origin fetch on a miss are different. So, is there any way to tell whether back-to-origin occurred?

Yes. Let's move on.

How to determine whether back-to-origin occurred

Let's take one cloud vendor's object storage and CDN as an example.

Suppose I want to request the following picture: https://cdn.xiaobaidebug.top/image/image-20220404094549469.png

To make the HTTP headers of the response easier to inspect, we can use Postman.

Request the picture with the GET method.

Then switch to the tab below to view the response headers.

[Figure: View response header]

When back-to-origin occurs, the value of X-Cache in the response headers is MISS TCP_MISS. It means the cache was missed, so the CDN went back to OSS, fetched the data, and returned it.

At this point the CDN must hold a cached copy of the picture. Let's run the same GET again.

The value of X-Cache becomes HIT TCP_MEM_HIT, which means a cache hit.

This is how one cloud vendor does it. Others, such as Tencent Cloud, are much the same: almost all of them expose the relevant information in the response headers.
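The same check can be scripted instead of done by hand in Postman. The sketch below assumes the header is named X-Cache and that its first token is HIT or MISS, as in the two values shown above; other vendors may use different header names or formats. `fetch_x_cache` makes a real network request, so only the offline classifier is exercised here.

```python
import urllib.request


def classify_x_cache(value: str) -> str:
    """Map an X-Cache header value to 'hit', 'miss', or 'unknown'."""
    tokens = value.upper().split()
    if not tokens:
        return "unknown"
    if tokens[0] == "HIT":
        return "hit"
    if tokens[0] == "MISS":
        return "miss"
    return "unknown"


def fetch_x_cache(url: str) -> str:
    """Issue a GET and return the X-Cache response header ('' if absent)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.headers.get("X-Cache", "")


print(classify_x_cache("MISS TCP_MISS"))    # the back-to-origin case
print(classify_x_cache("HIT TCP_MEM_HIT"))  # the cache-hit case
```

To try it against a live URL, call `classify_x_cache(fetch_x_cache(url))` twice in a row and compare the two results.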

Is it always faster with CDN than without?

Having read this far, we can answer the question from the beginning of the article.

Without a CDN, requests go straight to the origin server. The flow looks like this.

[Figure: Accessing the origin server directly]

With a CDN connected, if the CDN has no cached copy of the data, back-to-origin is triggered.

This is equivalent to adding an extra CDN hop to the original flow.

That is, with a CDN, a cache miss triggers back-to-origin, and the request is slower than it would have been without the CDN.
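A rough back-of-the-envelope model shows when the extra hop pays off. The three latency figures below are invented for illustration, and the model ignores everything except the hit ratio: every request pays the hop to the CDN node, and misses also pay the trip from the node to the origin.

```python
def cdn_latency(hit_ratio: float, edge_ms: float, origin_pull_ms: float) -> float:
    """Expected latency via CDN: every request pays the edge hop,
    and misses additionally pay the origin pull."""
    return edge_ms + (1.0 - hit_ratio) * origin_pull_ms


DIRECT_MS = 70.0       # user -> origin, no CDN (assumed)
EDGE_MS = 20.0         # user -> nearby CDN node (assumed)
ORIGIN_PULL_MS = 60.0  # CDN node -> origin on a miss (assumed)

for hit_ratio in (0.0, 0.5, 0.9):
    via_cdn = cdn_latency(hit_ratio, EDGE_MS, ORIGIN_PULL_MS)
    verdict = "faster" if via_cdn < DIRECT_MS else "slower"
    print(f"hit ratio {hit_ratio:.0%}: {via_cdn:.0f} ms via CDN, {verdict} than direct")
```

Under these assumed numbers, a request that misses is slower than going direct, while a decent hit ratio makes the CDN faster on average.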

A cache miss can mean either that the CDN never had the data, or that it once did but the copy has expired.

Both cases are normal and usually need no special handling.

In a few scenarios, though, some optimization may be needed. Suppose a major release changes your data, for example by switching to a new CDN domain name, and from the moment of launch all users request pictures through the new domain. The new CDN nodes will then go back to the origin for practically 100% of requests, and in severe cases this can even bring down the object storage. In that situation you may want to identify the hot data in advance and use a tool to request it ahead of time so the CDN loads it into its cache. The CDN of one cloud vendor, for example, offers such a "refresh and warm-up" feature.

Of course, the CDN can also be warmed through a gray (canary) release: let a small number of users try the new version first so that their requests warm up the CDN, then gradually open up the traffic.
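The warm-up idea can be sketched as two steps: pick the hot paths from past traffic, then load them before launch. Everything here is a stand-in: the access log is a plain list of paths, and dicts play the origin and the new CDN cache. A real warm-up would go through the vendor's own prefetch tooling.

```python
from collections import Counter


def pick_hot_paths(access_log: list[str], top_n: int) -> list[str]:
    """Return the top_n most requested paths from a list of logged paths."""
    return [path for path, _ in Counter(access_log).most_common(top_n)]


def warm_up(cache: dict[str, bytes], origin: dict[str, bytes], paths: list[str]) -> None:
    """Pre-load the given paths into the cache before real traffic arrives."""
    for path in paths:
        cache[path] = origin[path]


origin = {"/a.png": b"A", "/b.png": b"B", "/c.png": b"C"}
old_log = ["/a.png", "/b.png", "/a.png", "/c.png", "/a.png", "/b.png"]

new_cdn_cache: dict[str, bytes] = {}
warm_up(new_cdn_cache, origin, pick_hot_paths(old_log, top_n=2))
print(sorted(new_cdn_cache))
```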

For the case where the data once existed but has expired: for hot data, you can increase the CDN cache time appropriately.

When should CDN not be used?

From the description above, the biggest advantages of a CDN are that it can assign users all over the world a nearby CDN node to fetch data from, and that it accelerates repeated fetches of the same file through caching.

This is perfect for scenarios such as web images. And because the underlying layer is object storage, any file object, such as video, can be accelerated through a CDN in the same way. The well-known short-video apps, for example, work this way.

Now think about it the other way around.

Under what circumstances should I not use a CDN?

If your service runs on a company intranet, and the files it requests, such as images, are unlikely to be requested repeatedly, there is no need for a CDN.

Note the two key conditions above: the service is on an intranet, and the files are unlikely to be requested repeatedly.

The intranet condition ensures that you know where the service's requests come from and that the service can get read permission on the object storage. If your object storage is also internal to the company, it is probably in the same data center as your service, which is already very close. Adding a CDN brings none of the benefit of "being assigned a nearby CDN node".

If images or other files are unlikely to be requested repeatedly, then with a CDN nearly every request finds no data on the CDN node and has to go back to object storage to fetch it. The CDN then amounts to one more layer of proxy, and one more layer of proxy means more latency.

For the second point, if you want a clear metric to convince yourself, here is one. As described above, the X-Cache field in the HTTP headers of the CDN response shows whether a request triggered back-to-origin. Count those requests and divide by the total number of requests to get the back-to-origin ratio. If the ratio is as high as 90%, for example, what is the CDN even for?
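That metric is a one-line division. The sketch below assumes you have already collected the X-Cache value of each response, and that a value starting with MISS means back-to-origin, as in the example values shown earlier; the sample list is made up.

```python
def origin_pull_ratio(x_cache_values: list[str]) -> float:
    """Share of responses whose X-Cache value starts with MISS."""
    if not x_cache_values:
        return 0.0
    misses = sum(1 for v in x_cache_values if v.upper().startswith("MISS"))
    return misses / len(x_cache_values)


# Made-up sample: nine misses and one hit.
sample = ["MISS TCP_MISS"] * 9 + ["HIT TCP_MEM_HIT"]
print(f"{origin_pull_ratio(sample):.0%}")
```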

Summary

For text data, we are used to MySQL for storage and Redis for caching. For file data, such as images and videos, we use object storage such as OSS for storage and a CDN for caching.

With a CDN, a request that triggers back-to-origin is actually slower than it would be without the CDN.

The biggest advantages of a CDN are that it can assign users all over the world a nearby CDN node to fetch data from, and that it accelerates repeated fetches of the same file through caching. If your service and object storage are both on the intranet, and the files are unlikely to be requested repeatedly, there is no need for a CDN.

This article comes from the WeChat official account rookie debug (ID: xiaobaidebug), author: Xiaobai.
