2025-01-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains a problem encountered with the UDP protocol under the Docker container network and how to solve it.
Reproducing the problem
This problem is easy to reproduce. I ran my experiment with the netcat (nc) command on Ubuntu 16.04; other systems should behave similarly. Listen on UDP port 56789 with nc on the host, then use nc inside a container to send data. The first message gets through, but later messages, although visible on the network, are never received by the other party.
Run the nc UDP server on the host (-u for the UDP protocol, -l to listen on a port):
$ nc -ul 56789
Then start a container and run the client:
$ docker run -it alpine sh
/ # nc -u 172.16.13.13 56789
nc communication is bidirectional: whatever one side types, the other side receives it as soon as Enter is pressed. In this setup, however, only the client's first message is received by the server; subsequent messages are not received by the other party.
In this experiment the container uses Docker's default network. The container's IP is 172.17.0.3, and it connects to the virtual bridge docker0 (IP address 172.17.0.1) through a veth pair (not shown in the figure). The host's own network interface is eth0, with IP address 172.16.13.13.
 172.17.0.3
+----------+
|   eth0   |
+----+-----+
     |
     |
+----+-----+          +----------+
| docker0  |          |   eth0   |
+----------+          +----------+
 172.17.0.1            172.16.13.13
Capturing packets with tcpdump
When facing a puzzling problem like this, capture packets. The capture should be done on docker0, because every message must pass through it. Filtering on the container's IP address makes it easy to find the messages of interest:
$ tcpdump -i docker0 -nn host 172.17.0.3
In order to simulate the question-and-answer communication mode of most applications, we send a total of three messages and use tcpdump to grab the messages on the docker0 interface:
The client sends the hello string to the server first
The server replies with world
The client continues to send hi messages
The result of the capture is shown below. The first message is sent without any problem. (UDP has no ACK messages, so the client cannot know whether the other party received it; "no problem" here means only that no corresponding ICMP message appears.) However, when the second message is sent from the server, the other party returns an ICMP message saying that port 38908 is unreachable. The same happens to the third message, sent from the client. Later messages behave the same way, and the two sides can no longer communicate.
[...] IP 172.17.0.3.38908 > 172.16.13.13.56789: UDP, length 6
[...]50.102018 IP 172.17.0.1.56789 > 172.17.0.3.38908: UDP, length 6
[...]50.102129 IP 172.17.0.3 > 172.17.0.1: ICMP 172.17.0.3 udp port 38908 unreachable, length 42
[...]54.503198 IP 172.17.0.3.38908 > 172.16.13.13.56789: UDP, length [...]
[...] IP 172.16.13.13 > 172.17.0.3: ICMP 172.16.13.13 udp port 56789 unreachable, length 39
At this point the UDP nc server on the host has not exited; lsof -i :56789 shows that it is still listening on the port.
Cause of the problem
From the packet capture we can see that the source address of the server's reply is not the eth0 address we expected but the docker0 address. The client therefore treats the message as invalid and returns an ICMP message to the sender.
Then the cause of the problem can also be divided into two parts:
Why is the source address of the reply message wrong?
Since UDP is stateless, how can the kernel determine that the source address is incorrect?
UDP source address selection on hosts with multiple network interfaces
The key words of the first question are UDP and multiple network interfaces. If the host has only one network interface, the source address of outgoing messages cannot be wrong, and we have verified that TCP handles this situation correctly.
A search shows that this is indeed a known problem; it is described in the book UNP (Unix Network Programming).
The problem can be summed up in one sentence: on a host with multiple network interfaces, the source address of a UDP server's reply may be wrong, because it is chosen by kernel routing. Why do UDP and TCP behave differently? Because UDP is a stateless protocol, the kernel does not keep information about the two ends of a connection, so every message sent through the socket layer is treated as independent. By default the socket layer specifies only the peer's address, not the source address to use. The kernel therefore picks a source IP for each outgoing message, usually the address of the device through which the message is routed.
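The routing-based choice described above can be observed from user space. The following is a minimal sketch in Python, assuming a Linux host: connecting a UDP socket sends no packet but makes the kernel perform the route lookup, after which getsockname reports the source address it picked. The helper name default_source_for and the port number 9 are illustrative only and do not come from the original experiment.

```python
import socket


def default_source_for(dest_ip, dest_port=9):
    """Return the source IP the kernel would pick for a UDP datagram to dest_ip.

    dest_port is arbitrary (an assumption for illustration); nothing is sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket transmits nothing. It only records the
        # peer and triggers the route lookup that selects the local address.
        s.connect((dest_ip, dest_port))
        return s.getsockname()[0]
```

On a host laid out like the one in this experiment, calling default_source_for("172.17.0.3") would be expected to return the docker0 address rather than the eth0 address, since the route to the container goes through docker0.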
With this cause in hand, one question remains: why doesn't the dnsmasq service have this problem? I used the strace tool to capture the network socket system calls of dnsmasq and of the problematic application, to see how they differ.
During startup, dnsmasq listens on port 54 for both UDP and TCP (because the test ran on my local machine, I chose port 54 instead of the standard port 53 to avoid conflicting with the DNS service already listening locally):
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(4, {sa_family=AF_INET, sin_port=htons(54), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
setsockopt(4, SOL_IP, IP_PKTINFO, [1], 4) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
setsockopt(5, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(54), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(5, 5) = 0
Compared with the TCP part, the UDP part has no listen call but has an extra setsockopt(4, SOL_IP, IP_PKTINFO, [1], 4). Whether these two differences are related to our problem is left aside for now; let's move on to the part that transmits messages.
To receive and send packets, dnsmasq uses the recvmsg and sendmsg system calls directly:
recvmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(52072), sin_addr=inet_addr("10.111.59.4")}, msg_iov(1)=[{"\315\n\1\0\1\0\0\0\1\fterminal19-0\5u5016\3"..., 4096}], msg_controllen=32, {cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 67
sendmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(52072), sin_addr=inet_addr("10.111.59.4")}, msg_iov(1)=[{"\315\n\201\0\1\0\0\1\fterminal19-0\5u5016\3", 83}], msg_controllen=28, {cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 83
The strace output of the problematic application is as follows:
[pid 477] socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 124
[pid 477] setsockopt(124, SOL_IPV6, IPV6_V6ONLY, [0], 4) = 0
[pid 477] setsockopt(124, SOL_IPV6, IPV6_MULTICAST_HOPS, [1], 4) = 0
[pid 477] bind(124, {sa_family=AF_INET6, sin6_port=htons(6088), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
[pid 477] getsockname(124, {sa_family=AF_INET6, sin6_port=htons(6088), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 477] getsockname(124, {sa_family=AF_INET6, sin6_port=htons(6088), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
[pid 477] recvfrom(124, [...], 0, {sa_family=AF_INET6, sin6_port=htons(38790), inet_pton(AF_INET6, "::ffff:172.17.0.3", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 168
[pid 477] sendto(124, [...], 0, {sa_family=AF_INET6, sin6_port=htons(38790), inet_pton(AF_INET6, "::ffff:172.17.0.3", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 533
The corresponding logic is as follows: bind an IPv6 socket to the wildcard address (the equivalent of 0.0.0.0) on port 6088, call getsockname to get the address and port the socket is bound to, and transfer data with recvfrom and sendto.
By comparison, there are several differences between the two:
The latter uses IPv6, while the former uses IPv4
The latter uses recvfrom and sendto to transmit data, while the former uses recvmsg and sendmsg
The former calls setsockopt to set IP_PKTINFO, while the latter does not
Because the error occurs while transmitting data, the suspicion is that some difference between sendmsg and sendto leads to a different choice of source address. From man sendto we learn that sendmsg carries additional control information in msghdr. A reasonable guess is that msghdr contains information that determines which source address the kernel selects.
A search shows that the IP_PKTINFO option tells the kernel to keep information about the IP packet with the socket, including the addresses the packet used. The relationship between IP_PKTINFO and msghdr is explained in this Stack Overflow question: https://stackoverflow.com/questions/3062205/setting-the-source-ip-for-a-udp-socket.
The man 7 ip documentation also explains how IP_PKTINFO controls source address selection:
IP_PKTINFO (since Linux 2.2)
    Pass an IP_PKTINFO ancillary message that contains a pktinfo structure that supplies some information about the incoming packet. This only works for datagram oriented sockets. The argument is a flag that tells the socket whether the IP_PKTINFO message should be passed or not. The message itself can only be sent/retrieved as control message with a packet using recvmsg(2) or sendmsg(2).

    struct in_pktinfo {
        unsigned int   ipi_ifindex;  /* Interface index */
        struct in_addr ipi_spec_dst; /* Local address */
        struct in_addr ipi_addr;     /* Header Destination address */
    };

    ipi_ifindex is the unique index of the interface the packet was received on. ipi_spec_dst is the local address of the packet and ipi_addr is the destination address in the packet header. If IP_PKTINFO is passed to sendmsg(2) and ipi_spec_dst is not zero, then it is used as the local source address for the routing table lookup and for setting up IP source route options. When ipi_ifindex is not zero, the primary local address of the interface specified by the index overwrites ipi_spec_dst for the routing table lookup.
If ipi_spec_dst or ipi_ifindex is non-zero, it serves as the basis for source address selection, instead of leaving the choice to the kernel's routing decision.
In other words, by setting the IP_PKTINFO socket option to 1 and then transferring data with recvmsg and sendmsg, we can make sure the selected source address meets our expectations. This is the approach dnsmasq uses; the problematic application fails because it uses the default recvfrom and sendto.
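The approach above can be sketched in Python. This is only an illustration under stated assumptions, not dnsmasq's actual code: it assumes Linux, where struct in_pktinfo is 12 bytes and IP_PKTINFO has the value 8 (used as a fallback because socket.IP_PKTINFO is missing from older Python versions). The names make_pktinfo_server and echo_once_with_pktinfo are hypothetical.

```python
import socket
import struct

# Assumption: Linux. Fall back to the Linux constant if Python lacks the name.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)
# struct in_pktinfo: ipi_ifindex, ipi_spec_dst, ipi_addr
PKTINFO_FMT = "@i4s4s"
PKTINFO_LEN = struct.calcsize(PKTINFO_FMT)


def make_pktinfo_server(port=0):
    """Create a UDP socket bound to all addresses with IP_PKTINFO enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)
    sock.bind(("0.0.0.0", port))
    return sock


def echo_once_with_pktinfo(sock):
    """Receive one datagram and echo it back from the address it arrived on."""
    data, ancdata, _flags, peer = sock.recvmsg(4096, socket.CMSG_SPACE(PKTINFO_LEN))
    local_addr = None
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
            _ifindex, spec_dst, _hdr_dst = struct.unpack(PKTINFO_FMT, cdata[:PKTINFO_LEN])
            local_addr = spec_dst
    ancillary = []
    if local_addr is not None:
        # ipi_ifindex = 0 so only ipi_spec_dst drives source address selection.
        pktinfo = struct.pack(PKTINFO_FMT, 0, local_addr, b"\0" * 4)
        ancillary.append((socket.IPPROTO_IP, IP_PKTINFO, pktinfo))
    sock.sendmsg([data], ancillary, 0, peer)
    return socket.inet_ntoa(local_addr) if local_addr is not None else None
```

The reply reuses the local address reported for the incoming packet as ipi_spec_dst, so the source address of the answer matches the address the client originally sent to, regardless of which interface the route back goes through.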
The puzzle about UDP connections
Another question: why does the kernel discard a message whose source address differs from the earlier one and treat it as invalid? As noted above, UDP is connectionless, and by default the socket does not keep any information about a connection between the two parties. One would expect that even if the source address of the server's reply is wrong, communication would not break as long as the other side can receive and process the message.
The answer is conntrack. The kernel's netfilter module keeps track of connection state and uses it as the basis for firewall rules. The UDP connection it records is simply the local IP and port on the host plus the peer IP and port; nothing more is saved.
For details, refer to the article on the iptables.info website: http://www.iptables.info/en/connection-state.html#UDPCONNECTIONS.
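To make the address-and-port matching concrete, here is a toy model in Python. It is a deliberately simplified illustration and not how the kernel implements conntrack; the class ToyConntrack and the tuple type Tuple4 are hypothetical names. It only shows why a reply whose source address differs from the original destination does not match the recorded entry.

```python
from collections import namedtuple

# Hypothetical 4-tuple describing one UDP datagram.
Tuple4 = namedtuple("Tuple4", "src_ip src_port dst_ip dst_port")


class ToyConntrack:
    """Toy model: remembers which reply tuples are expected, nothing more."""

    def __init__(self):
        self.expected_replies = set()

    def outgoing(self, pkt):
        # The only acceptable reply is the mirror image of the original packet.
        self.expected_replies.add(
            Tuple4(pkt.dst_ip, pkt.dst_port, pkt.src_ip, pkt.src_port)
        )

    def is_known_reply(self, pkt):
        return pkt in self.expected_replies
```

With the addresses from the capture, an entry created by the container's message to 172.16.13.13:56789 would accept a reply from 172.16.13.13:56789 but not one from 172.17.0.1:56789.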
Before finding the root cause, we tried using SNAT to rewrite the source address of the server's reply, hoping to fix the problem. This did not work. Why?
Because SNAT is performed at the last stage of netfilter processing, and by then netfilter's conntrack has already discarded the message because it does not recognize the connection. Adding SNAT therefore has no effect.
Can we then turn off conntrack? For example, with rules like these:
iptables -I OUTPUT -t raw -p udp --sport 5060 -j CT --notrack
iptables -I PREROUTING -t raw -p udp --dport 5060 -j CT --notrack
The answer is again no: NAT relies on conntrack to do address translation, so removing conntrack would make SNAT completely useless.
Solution
Knowing the cause of the problem, the solution is easy to find.
Use the TCP protocol
If the server and the client communicate over TCP, communication between them works normally.
$ nc -l 56789
Listening on a specific IP address
Start a UDP server with nc, listening on the eth0 address:
$ nc -ul 172.16.13.13 56789
nc accepts two arguments, an IP address and a port, which means the server listens only on that IP. A message whose destination address is not 172.16.13.13 is discarded directly by the kernel.
In this case, the server and the client can also communicate normally.
Change the application implementation
Modify the application's logic: set IP_PKTINFO on the UDP socket, and transfer data with the recvmsg and sendmsg functions.