Shulou (Shulou.com) 11/24 Report. Updated 2025-01-15. From: SLTechnology News&Howtos.
Looking back on my student days, I realize the boys at my school had remarkable memories. They could always remember long, cryptic alphanumeric domain names, and a few true gurus could even type an IP address directly to get online.
Every night, after climbing over the school wall to get to the Internet café, you would find them hunting for open-source "learning materials" on some forum, never forgetting to leave a reply at the bottom of the thread wishing the kind original poster a safe and happy life.
Even then, they were already learning the most important spirit of the Internet: openness and sharing.
Whenever I think of it, I am deeply moved.
Once the emotion passes, we notice a few technical questions worth discussing.
For example, why can you get online with either a domain name or an IP address?
What is the relationship between the two?
Going further, we can look at how DNS works and what is worth learning from its design.
Today, let's start with why DNS exists at all.
Why DNS exists
If we want to visit Baidu, we can type its IP address, 112.80.248.76, into the browser's address bar and go straight to the page.
Visiting web pages by IP is perfectly legitimate, but a bit masochistic.
Most people can't even remember their partner's phone number, so how could they remember a string of digits like this?
Oops, sorry, brothers, I forgot: you don't have a partner.
But let's pretend you do.
Think about it: even though you can't remember your partner's number, that doesn't stop you from calling her. You just open the address book, type "rich woman", her number pops up, and you tap to dial.
Computers have the same problem. You can't be expected to remember IP addresses, so you need something like an address book: you type only www.baidu.com, and it finds the corresponding 112.80.248.76 for you and visits it.
[Figure: Access with a domain name]
Here www.baidu.com is the domain name, and through it we can find the IP behind it: 112.80.248.76.
Just as a person can have several phone numbers, one domain name can map to several IP addresses.
Resolving a domain name to an IP address, that is, looking it up in the address book, is exactly what the DNS (Domain Name System) protocol does.
Note that the IP address above worked when I wrote this article, but it may not work when you read it, because the IP behind the domain name can change. You can get the current address with ping www.baidu.com.
[Figure: Getting the IP with ping]
But here comes the problem.
In an ordinary person's address book, a thousand phone numbers already makes you a bit of a social butterfly, and a phone's address book handles that with room to spare.
Website domain names are a different story: their number reportedly exceeded 300 million back in 2015.
Putting all 300 million-plus records on a single server would cause two problems.
First, the data volume is too large: more than 300 million domain records, and the number keeps growing.
Second, the server has to absorb a huge number of reads. Each domain may be queried thousands of times; added up and rounded, that comes to hundreds of billions of QPS.
Clearly, a single-point service like a phone's address book cannot deliver this; DNS has to be a distributed system.
So the question becomes: how do you design a large distributed system that handles hundreds of billions of QPS or more?
I know someone will say, "Is this something a person whose service handles 10 QPS needs to worry about?"
Our own services may only handle 10 QPS, but that doesn't stop us from learning from the good design in DNS.
Let's start with the hierarchical structure of domain names.
The hierarchical structure of domain names
Take a common domain name such as www.baidu.com.
It contains two periods, which divide it into three parts.
com is the first-level, or top-level, domain (other common top-level domains include cn and co); baidu is the second-level domain; and www is the third-level domain.
In fact, there is also an omitted period after com, which stands for the root domain.
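To make the levels concrete, here is a minimal Python sketch (the function name zones_from_root is our own, not part of any library) that restores the omitted trailing period and lists the domains from the root down to the full name:

```python
def zones_from_root(domain: str) -> list[str]:
    """List the domains from the root down to `domain`.

    A fully qualified name ends with the usually omitted period,
    which stands for the root domain.
    """
    fqdn = domain if domain.endswith(".") else domain + "."
    labels = fqdn.rstrip(".").split(".")  # e.g. ["www", "baidu", "com"]
    zones = ["."]  # the root domain
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


print(zones_from_root("www.baidu.com"))
# ['.', 'com.', 'baidu.com.', 'www.baidu.com.']
```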
[Figure: The hierarchy of domain names]
As the number of domain names grows, merging their common parts turns them into a tree like this.
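The tree can be sketched as a nested dictionary keyed by labels and built right to left. This is an illustrative data structure only, not how real name servers store their data, and qq.com and gov.cn are just extra sample names:

```python
def build_domain_tree(domains):
    """Merge domain names into a nested dict, starting from the root.

    Labels are read right to left, so shared parts such as "com"
    become a single node.
    """
    tree = {}
    for name in domains:
        node = tree
        for label in reversed(name.rstrip(".").split(".")):
            node = node.setdefault(label, {})
    return tree


def print_tree(node, depth=0):
    """Print one label per line, indented by its depth in the tree."""
    for label, child in sorted(node.items()):
        print("  " * depth + label)
        print_tree(child, depth + 1)


tree = build_domain_tree(["www.baidu.com", "map.baidu.com", "www.qq.com", "www.gov.cn"])
print_tree(tree)
```

Reading labels from the right is what lets "com" appear once with baidu and qq hanging under it.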
[Figure: Hierarchical structure]
Now we can see that these domains form a hierarchy, much like a school, its grades, and its classes.
To locate a specific domain name, you walk down this hierarchy level by level.
For example, you probably remember the classic announcement over the school loudspeaker: "Li Xiaoming, Class 2, Grade 3, your mom brought you two cans of Wang Zai milk." Li Xiaoming's mom was finding him level by level, through the school, the grade, and the class.
The principle of DNS
Now let's go back and look at how the experts designed DNS.
Here are the key conclusions first:
Use the hierarchical structure to split the service
Add multi-level caching
Let's go through them one by one.
Using the domain name hierarchy to split the service
DNS carries enormous traffic, so it has to be a distributed service, and the crux of the problem is how to split that service.
Since domain names form a tree-like hierarchy, the services that store them can naturally be split into a tree along the same lines.
Each server maintains the information for one or more domains, so the service takes on the following hierarchy.
When we want to visit www.baidu.com,
the query process looks roughly like the following figure.
[Figure: The DNS query process]
The request first goes to the nearest DNS server (for example, your home router). If that server cannot find the record, it asks the root domain server directly. The root server has no record for www.baidu.com, but it knows the name belongs to the com domain, so it returns the IP address of the com domain server. The nearest DNS server then repeats the same step with the com server, which tells it which server holds the baidu domain, and so on, until it finds the record for www.baidu.com and finally returns the corresponding IP address.
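The walk down the tree can be simulated in a few lines. This is a toy model under simplifying assumptions: the server names (root, com-server, baidu-server) and the dictionary layout are hypothetical, and a real server answers with NS records plus their IP addresses rather than a bare server name. The IP is the article's example.

```python
# Hypothetical in-memory "servers": each holds final answers and/or
# referrals telling the resolver which server to ask next for a suffix.
SERVERS = {
    "root": {"referrals": {"com.": "com-server"}},
    "com-server": {"referrals": {"baidu.com.": "baidu-server"}},
    "baidu-server": {"answers": {"www.baidu.com.": ["112.80.248.76"]}},
}


def resolve(name, start="root", max_hops=10):
    """Follow referrals from the root until some server has the answer."""
    fqdn = name if name.endswith(".") else name + "."
    server, path = start, [start]
    for _ in range(max_hops):
        info = SERVERS[server]
        answers = info.get("answers", {})
        if fqdn in answers:
            return answers[fqdn], path
        referrals = info.get("referrals", {})
        # Pick the most specific (longest) zone that contains the name.
        matches = [zone for zone in referrals
                   if fqdn == zone or fqdn.endswith("." + zone)]
        if not matches:
            raise LookupError(f"{server} has no record or referral for {fqdn}")
        server = referrals[max(matches, key=len)]
        path.append(server)
    raise LookupError("too many referrals")


ips, path = resolve("www.baidu.com")
print(ips, path)
```

The path comes back as root, then com-server, then baidu-server, which mirrors the steps in the figure.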
As you can see, the principle is fairly simple, but it raises two questions:
How does this machine know the IP of the nearest DNS server?
How does the nearest DNS server know the IP of the root domain server?
Let's answer them one at a time.
How does this machine know the IP of the nearest DNS server? I covered this in an earlier article, "Just plugged in the network cable: how does the computer know its own IP?" When the cable is plugged in, the machine obtains its local IP address, subnet mask, router address, and DNS server IP through the DHCP protocol.
Below is a packet capture of the DHCP protocol on my Mac, showing its second phase, DHCP Offer. As you can see, the returned information includes the IP of the DNS server.
Besides the Offer message, you can also find the DNS server's IP via the Apple menu in the upper-left corner > System Preferences > Network > Advanced > DNS.
A small detail: the capture above shows that the router address, the DNS server address, and the DHCP server address are all 192.168.31.1. That is my home router's IP, which means most home routers provide all of these functions.
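To see where the DNS server IP sits in a DHCP Offer, here is a sketch that parses the options section of a hand-built packet. Option codes 1 (subnet mask), 3 (router), 6 (DNS servers), 53 (message type, where 2 means Offer), and 54 (DHCP server identifier) are the standard ones; the packet bytes and the 255.255.255.0 mask are made up for illustration, with 192.168.31.1 taken from the capture described above.

```python
import ipaddress

MAGIC_COOKIE = bytes([99, 130, 83, 99])  # marks the start of the DHCP options

# Standard DHCP option codes used in this example.
OPTION_NAMES = {1: "subnet_mask", 3: "router", 6: "dns_servers",
                53: "message_type", 54: "dhcp_server"}


def parse_dhcp_options(data: bytes) -> dict:
    """Parse the code/length/value options that follow the magic cookie."""
    if not data.startswith(MAGIC_COOKIE):
        raise ValueError("missing DHCP magic cookie")
    i, result = len(MAGIC_COOKIE), {}
    while i < len(data):
        code = data[i]
        if code == 0:      # Pad option: one byte, no length
            i += 1
            continue
        if code == 255:    # End option
            break
        length = data[i + 1]
        value = data[i + 2:i + 2 + length]
        i += 2 + length
        name = OPTION_NAMES.get(code)
        if name == "message_type":
            result[name] = value[0]  # 2 = DHCPOFFER
        elif name in ("subnet_mask", "dhcp_server"):
            result[name] = str(ipaddress.IPv4Address(value))
        elif name in ("router", "dns_servers"):
            # These options may list several 4-byte addresses.
            result[name] = [str(ipaddress.IPv4Address(value[j:j + 4]))
                            for j in range(0, length, 4)]
    return result


def opt(code: int, payload: bytes) -> bytes:
    """Encode one option as code, length, value."""
    return bytes([code, len(payload)]) + payload


home_router = ipaddress.IPv4Address("192.168.31.1").packed
offer_options = (MAGIC_COOKIE
                 + opt(53, bytes([2]))
                 + opt(54, home_router)
                 + opt(1, ipaddress.IPv4Address("255.255.255.0").packed)
                 + opt(3, home_router)
                 + opt(6, home_router)
                 + bytes([255]))
print(parse_dhcp_options(offer_options))
```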
Cloud servers (CVMs) likewise get their DNS servers through DHCP. Checking the DNS server IPs is easy there too: just run cat /etc/resolv.conf.
The nameserver lines show two DNS servers. Requests go to them in the order they appear in the file; if the first server does not respond, the second one is tried.
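The order-and-fallback behavior described above can be sketched as follows. The addresses are RFC 5737 documentation placeholders rather than the ones in the screenshot, send is a hypothetical stand-in for a real network query, and real resolvers also honor options such as timeout and attempts, which this sketch ignores.

```python
def parse_nameservers(resolv_conf: str) -> list[str]:
    """Return nameserver addresses in the order they appear."""
    servers = []
    for line in resolv_conf.splitlines():
        stripped = line.strip()
        if not stripped or stripped[0] in "#;":  # blank or comment line
            continue
        parts = stripped.split()
        if parts[0] == "nameserver" and len(parts) >= 2:
            servers.append(parts[1])
    return servers


def query_with_fallback(name, servers, send):
    """Ask each server in order, moving on when one does not respond.

    `send(server, name)` is a hypothetical callable that returns an answer
    or raises TimeoutError.
    """
    last_error = None
    for server in servers:
        try:
            return server, send(server, name)
        except TimeoutError as exc:
            last_error = exc
    raise TimeoutError(f"no nameserver answered for {name}") from last_error


SAMPLE_RESOLV_CONF = """\
# placeholder addresses (RFC 5737), not real resolvers
nameserver 192.0.2.53
nameserver 198.51.100.53
options timeout:2
"""


def fake_send(server, name):
    if server == "192.0.2.53":  # pretend the first server is down
        raise TimeoutError("no response")
    return ["112.80.248.76"]


servers = parse_nameservers(SAMPLE_RESOLV_CONF)
print(query_with_fallback("www.baidu.com", servers, fake_send))
```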
How does the nearest DNS server know the IP of the root domain? The root domain is the top of the domain name tree, and being at the top, it holds relatively little information: there are only 13 IPv4 addresses and 25 IPv6 addresses.
We can use the +trace option of the dig command (for example, dig +trace www.baidu.com) to view the full resolution process for a domain name.
The 13 legendary root servers mentioned above, lettered A through M, all appear in the output above.
But this raises another problem: everything you see above is a domain name.
That's awkward.
"I wanted to find an IP through a domain name, and now you tell me to look up the IPs of other domain names first?"
That sounds absurd: it's an infinite loop.
Exactly. So the IPs of these root server names are placed on every DNS server in a configuration file.
In other words, there is no need to query for the root servers' IPs; the server simply reads them from its configuration.
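Reading root server addresses from configuration can be sketched like this. The excerpt follows the column layout of a root hints file (owner, TTL, type, data) but keeps only the A root entry shown in the screenshot; the parser assumes exactly four columns per record, so it is a sketch rather than a general zone-file reader.

```python
ROOT_HINTS_EXCERPT = """\
; excerpt in the root hints layout: owner, TTL, type, data
.                        3600000      NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
"""


def load_root_hints(text: str) -> dict[str, list[str]]:
    """Map each root server name to its addresses, read from configuration."""
    root_names, addresses = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        owner, _ttl, rtype, rdata = line.split()
        if rtype == "NS" and owner == ".":
            root_names.append(rdata.lower())  # a root server's name
        elif rtype in ("A", "AAAA"):
            addresses.setdefault(owner.lower(), []).append(rdata)
    return {name: addresses.get(name, []) for name in root_names}


print(load_root_hints(ROOT_HINTS_EXCERPT))
# {'a.root-servers.net.': ['198.41.0.4']}
```

No network lookup is involved: the name-to-IP mapping for the root comes straight from the file, which is what breaks the loop.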
The screenshot below shows this configuration on a DNS server.
You can see that the root server whose name starts with A has the IPv4 address 198.41.0.4.
Adding multi-level caching
Caching is almost standard in high-concurrency scenarios with many more reads than writes.
DNS is no exception: it adds caching, and more than one layer of it.
When you type a URL into the browser, it checks the browser cache, then the operating system cache and the /etc/hosts file, then the cache of the nearest DNS server. Only if all of these miss does it send queries out to the root domain, top-level (first-level) domain, second-level domain, and other DNS servers.
[Figure: The order of DNS queries after caching is added]
So the request flow now looks like this. I've added a small green file icon at each of the cache points mentioned above; queries check these caches first.
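The cache chain can be sketched as a list of layers checked in order, with a full miss going upstream and backfilling every layer. The layer names, the fixed 300-second TTL, and fake_upstream are assumptions for illustration; real DNS caches use the TTL carried by each record, and /etc/hosts is a static file rather than an expiring cache.

```python
import time


class TTLCache:
    """A tiny cache whose entries expire after `ttl` seconds."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def put(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def lookup(name, layers, resolve_upstream, ttl=300):
    """Check each (layer_name, cache) in order; return (answer, where_found)."""
    for i, (layer_name, cache) in enumerate(layers):
        answer = cache.get(name)
        if answer is not None:
            for _, closer in layers[:i]:  # backfill the layers that missed
                closer.put(name, answer, ttl)
            return answer, layer_name
    answer = resolve_upstream(name)  # full miss: root, TLD, and so on
    for _, cache in layers:
        cache.put(name, answer, ttl)
    return answer, "upstream"


now = [0.0]


def fake_clock():
    return now[0]


layers = [(n, TTLCache(fake_clock)) for n in ("browser", "os", "resolver")]
upstream_calls = []


def fake_upstream(name):
    upstream_calls.append(name)
    return ["112.80.248.76"]


first = lookup("www.baidu.com", layers, fake_upstream)   # goes upstream
second = lookup("www.baidu.com", layers, fake_upstream)  # browser cache hit
now[0] = 301.0                                           # every entry has expired
third = lookup("www.baidu.com", layers, fake_upstream)   # upstream again
print(first, second, third)
```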
[Figure: The cached DNS query process]
The nearest DNS server also caches information about the tree structure above, so it no longer has to start from the root domain every time. For example, if it finds the IP of the baidu.com server in its cache, it can skip straight to that second-level domain server.
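The shortcut of skipping the root can be sketched as finding the deepest zone whose server is already cached. The referral cache contents and server names below are hypothetical:

```python
def closest_known_server(name, referral_cache, root_server="root"):
    """Return (zone, server) for the deepest zone whose server is cached.

    Tries www.baidu.com. -> baidu.com. -> com., then falls back to the root.
    """
    labels = name.rstrip(".").split(".")
    for i in range(len(labels)):
        zone = ".".join(labels[i:]) + "."
        if zone in referral_cache:
            return zone, referral_cache[zone]
    return ".", root_server


cache = {"com.": "com-server", "baidu.com.": "baidu-server"}  # hypothetical
print(closest_known_server("www.baidu.com", cache))  # ('baidu.com.', 'baidu-server')
print(closest_known_server("www.qq.com", cache))     # ('com.', 'com-server')
print(closest_known_server("www.gov.cn", cache))     # ('.', 'root')
```

Only a name with no cached ancestor at all has to start from the root.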
Thanks to multi-level caching, the number of requests each layer actually receives is greatly reduced. And since each person visits only a handful of websites a day, most lookups hit a cache and return the IP address directly.
A quick summary:
DNS splits its service along a hierarchical structure, distributing traffic across many servers.
Adding multi-level caching greatly reduces the requests each level actually receives, which greatly improves the system's performance.
Both of these are excellent designs we can borrow in our own business development.
But there is one more technique that most of us probably can't borrow, called anycast. It also plays an important part in DNS's ability to handle high concurrency, and I'll save it for the next article.
The protocol format
DNS is a domain name resolution system, and the protocol that runs on it is called the DNS protocol.
Like HTTP, DNS is an application layer protocol.
The following figure shows its message format.
So many fields in a DNS message that you feel dizzy? That's normal.
Let's just pick out the key ones.
Transaction ID: a request and its reply carry the same transaction ID, much like a log_id in a microservice system.
Flags: a 2-byte (16-bit) field. The parts to pay attention to are QR, OpCode, and RCode.
QR indicates whether the message is a query or a response: 0 means query, 1 means response.
OpCode is the operation code, and a standard query is 0. Whether you look up an IP by domain name or a domain name by IP, it is a standard query, so roughly speaking, 0 is the only value we normally see.
RCode is the response code, similar to HTTP status codes such as 404 and 502. It indicates whether the request succeeded: 0 means no error, 1 means the query message was malformed, and 2 means the name server hit an internal error.
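The flags field packs several values into 16 bits. This sketch decodes them using the bit layout from RFC 1035 (QR in the top bit, then a 4-bit OpCode, then the AA, TC, RD, and RA bits, with RCode in the lowest 4 bits; the bits in between are not decoded here). The sample values are examples, not taken from the capture:

```python
def decode_flags(flags: int) -> dict[str, int]:
    """Split the 16-bit DNS flags field into its named parts (RFC 1035 layout)."""
    return {
        "QR": (flags >> 15) & 0x1,      # 0 = query, 1 = response
        "OpCode": (flags >> 11) & 0xF,  # 0 = standard query
        "AA": (flags >> 10) & 0x1,      # authoritative answer
        "TC": (flags >> 9) & 0x1,       # message was truncated
        "RD": (flags >> 8) & 0x1,       # recursion desired
        "RA": (flags >> 7) & 0x1,       # recursion available
        "RCode": flags & 0xF,           # 0 = no error, 1 = format error, 2 = server failure
    }


print(decode_flags(0x0100))  # a query asking for recursion
print(decode_flags(0x8180))  # a response with recursion available and no error
```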
Queries: the content of your actual query. It contains three parts: Name, Type, and Class.
[Figure: The three parts of a query]
Name holds a domain name or an IP. To look up the IP for baidu.com, Name is the domain name; conversely, to look up the domain name for an IP, Name carries the IP, written in reverse order under in-addr.arpa (for 112.80.248.76, that is 76.248.80.112.in-addr.arpa) with Type set to PTR.
Type specifies what kind of information you want. To find the IP address for a domain name, use A (Address). To find the canonical name that a domain name is an alias of, use CNAME (Canonical Name). To find the mail server for an address like xiaobaidebug@gmail.com, query the MX (Mail Exchanger) record of its domain, gmail.com. There are many other types; the table below lists some common ones.
The Class field is more interesting. You can simply assume it will always be IN (Internet). When DNS was first designed, its authors expected more use cases, so you can also put CH or HS here. You don't even need to know what they mean, because they have long since become fossils; the only practical use of this field is to let you show off a little in an interview and then walk away modestly.
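Here is a sketch of how a query message is assembled: a 12-byte header followed by one question holding Name (as length-prefixed labels), Type, and Class. The transaction ID 0x1234 is arbitrary, the type codes in QTYPES are the standard numeric values, and build_query is our own helper, not a library function.

```python
import ipaddress
import struct

# Standard numeric codes for some common record types.
QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12,
          "MX": 15, "TXT": 16, "AAAA": 28}
QCLASS_IN = 1  # the IN (Internet) class


def encode_name(name: str) -> bytes:
    """Encode a name as length-prefixed labels ending in a zero byte (the root)."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"invalid label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: str = "A", txid: int = 0x1234) -> bytes:
    """Build a DNS query: 12-byte header plus one question (Name, Type, Class)."""
    flags = 0x0100  # QR=0 (query), OpCode=0 (standard query), RD=1
    # ID, flags, then counts: 1 question, 0 answer/authority/additional records.
    header = struct.pack("!HHHHHH", txid, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", QTYPES[qtype], QCLASS_IN)
    return header + question


print(build_query("www.baidu.com").hex())          # A query
print(build_query("gmail.com", "MX").hex())        # mail server query
reverse = ipaddress.ip_address("112.80.248.76").reverse_pointer
print(reverse, build_query(reverse, "PTR").hex())  # IP -> domain name
```

The reverse lookup shows why OpCode stays 0: asking "which name has this IP?" is still a standard query, just with a PTR type and an in-addr.arpa name.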
Answers: as the name suggests, this field corresponds to Queries: one asks, the other answers. It carries the query results; for example, when you look up the IP address for a domain name, the IP information goes here.
Packet capture
Now that we've covered the theory, let's capture some packets.
Open Wireshark, then run dig www.baidu.com. A DNS request goes out asking for the IP address of www.baidu.com.
[Figure: DNS_Query]
The figure above shows the content of the DNS query (request). You can see that DNS is an application layer protocol and that the transport layer uses UDP. The parts marked in red are the message fields discussed above. The flags field is displayed bit by bit, so Wireshark shows it across several lines.
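To show what "the transport layer uses UDP" means in practice, this sketch sends one query datagram to a toy responder on the loopback interface and reads one datagram back. The responder is hypothetical: it only echoes the transaction ID and question and sets QR, with no answers. The 512-byte buffer matches the classic size limit for DNS messages over UDP.

```python
import socket
import struct
import threading

# A hand-built query for www.baidu.com (type A, class IN), transaction ID 0xBEEF.
QUERY = (struct.pack("!HHHHHH", 0xBEEF, 0x0100, 1, 0, 0, 0)
         + b"\x03www\x05baidu\x03com\x00" + struct.pack("!HH", 1, 1))


def toy_responder(sock: socket.socket) -> None:
    """Hypothetical responder: echo the ID and question, set QR=1, no answers."""
    data, client_addr = sock.recvfrom(512)
    txid, flags, qdcount = struct.unpack("!HHH", data[:6])
    reply_flags = 0x8000 | (flags & 0x0100) | 0x0080  # QR=1, copy RD, set RA
    header = struct.pack("!HHHHHH", txid, reply_flags, qdcount, 0, 0, 0)
    sock.sendto(header + data[12:], client_addr)


def udp_exchange(query: bytes) -> bytes:
    """Send one UDP datagram and wait for one datagram back."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    server.settimeout(2.0)
    worker = threading.Thread(target=toy_responder, args=(server,))
    worker.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    try:
        client.sendto(query, server.getsockname())
        reply, _ = client.recvfrom(512)
    finally:
        worker.join(timeout=2.0)
        client.close()
        server.close()
    return reply


reply = udp_exchange(QUERY)
print(hex(struct.unpack("!H", reply[:2])[0]), hex(reply[2]))  # same ID, QR bit set
```

There is no connection setup: one datagram out and one datagram back, matched up only by the transaction ID.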
Next, let's look at the response packet.
[Figure: DNS_Response]
The transaction ID in the response matches the one in the DNS request, and the Answers field contains two IP addresses. I tried both, and each one can be accessed normally.
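Reading the response comes down to checking the transaction ID, skipping the echoed question, and walking the answer records. This sketch parses a hand-built response with two A records: 112.80.248.76 is the article's example, 192.0.2.10 is a documentation placeholder, and the 0xC00C value is a compression pointer back to the name in the question. Non-A records such as CNAME are skipped.

```python
import socket
import struct


def skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a name, which may end in a compression pointer."""
    while True:
        length = msg[offset]
        if (length & 0xC0) == 0xC0:  # pointer: 2 bytes, and the name ends here
            return offset + 2
        if length == 0:              # zero-length root label ends the name
            return offset + 1
        offset += 1 + length


def parse_a_answers(msg: bytes, expected_txid: int) -> list[str]:
    """Check the header, skip the question, and collect IPv4 addresses."""
    txid, flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", msg[:12])
    if txid != expected_txid:
        raise ValueError("transaction ID mismatch: not the reply to our query")
    if (flags & 0x8000) == 0:
        raise ValueError("QR=0: this is a query, not a response")
    offset = 12
    for _ in range(qdcount):                 # skip the echoed question(s)
        offset = skip_name(msg, offset) + 4  # plus Type and Class
    ips = []
    for _ in range(ancount):
        offset = skip_name(msg, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        rdata = msg[offset:offset + rdlength]
        offset += rdlength
        if rtype == 1 and rclass == 1 and rdlength == 4:  # A record, class IN
            ips.append(socket.inet_ntoa(rdata))
    return ips


def a_record(ip: str, ttl: int = 300) -> bytes:
    # 0xC00C points back to the question's name at offset 12.
    return b"\xc0\x0c" + struct.pack("!HHIH", 1, 1, ttl, 4) + socket.inet_aton(ip)


response = (struct.pack("!HHHHHH", 0xBEEF, 0x8180, 1, 2, 0, 0)
            + b"\x03www\x05baidu\x03com\x00" + struct.pack("!HH", 1, 1)
            + a_record("112.80.248.76") + a_record("192.0.2.10"))
print(parse_a_answers(response, expected_txid=0xBEEF))
```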
Summary
DNS is an excellent high-concurrency distributed system. It splits its service along a hierarchical structure, distributing traffic across many servers, and it adds multi-level caching so that each level receives far fewer actual requests, which greatly improves the system's performance. Both ideas are worth borrowing in business development.
When the network cable is plugged in, the machine obtains the DNS server's address through the DHCP protocol.
The IPs of the root domain servers are loaded onto every DNS server as configuration, so any DNS server can easily find the IP addresses of the root domain.
Finally, I'll leave you with two questions.
From the packet capture, we can see that DNS uses UDP at the transport layer. Does it use only UDP?
As mentioned above, DNS has only 13 IPv4 root server addresses, and many of them are deployed in the United States. Does that mean that if they got unhappy and cut off our access, our network would be paralyzed?
This article comes from the WeChat official account Rookie Debug (ID: xiaobaidebug), author: Xiaobai.