This article explains how connection tracking is implemented under Linux. Most people are not very familiar with it, so it is shared here for reference; I hope you learn something useful from it.
1 Introduction
Connection tracking is the foundation of many network applications. For example, Kubernetes Service, ServiceMesh sidecar, software layer 4 load balancer LVS/IPVS, Docker network, OVS, iptables host firewall, and so on, all rely on connection tracking.
1.1 Concepts
Connection tracking (conntrack)
Figure 1.1. Connection tracking and its kernel location
Connection tracking, as the name implies, tracks (and records) the status of the connection.
For example, figure 1.1 shows a Linux machine with IP address 10.1.1.2, and we can see that there are three connections on this machine:
A connection from this machine to an external HTTP service (destination port 80)
A connection from an external client to the FTP service on this machine (destination port 21)
A connection from this machine to an external DNS service (destination port 53)
What connection tracking does is discover and track the status of these connections, including:
Extract tuple (tuple) information from the packet to identify the data flow (flow) and the corresponding connection (connection)
Maintain a state database (conntrack table) for all connections, such as connection creation time, number of packets sent, number of bytes sent, etc.
Reclaim expired connections (GC)
Provide services for higher-level functions such as NAT
Note that the "connection" in connection tracking is not exactly the same concept as the "connection" in the TCP/IP protocols' notion of "connection-oriented" (connection oriented). Briefly:
In the TCP/IP protocol suite, a connection is a Layer 4 concept.
TCP is connection-oriented: every packet sent requires an acknowledgement (ACK) from the peer, and there is a retransmission mechanism.
UDP is connectionless: packets do not require a reply from the peer, and there is no retransmission mechanism.
In connection tracking (CT), a data flow (flow) defined by a tuple represents a connection.
As we will see later, even UDP and ICMP (a Layer 3 protocol) have connection records in CT.
But not all protocols are tracked by connections.
When the word "connection" is used in this article, it mostly refers to the latter, that is, "connection" in "connection tracking".
Network Address Translation (NAT)
Figure 1.2. NAT and its kernel location
The meaning of network address translation (NAT) is fairly clear from the name: it translates the network address (IP + port) of a packet.
For example, in figure 1.2 the machine's own IP 10.1.1.2 can communicate with the outside world, but the 192.168 range is a private segment that the outside world cannot reach; that is, if a packet leaves with a 192.168 source address, the reply can never come back.
Therefore, when a packet whose source address is in the 192.168 segment is about to go out, the machine first rewrites the source IP to its own 10.1.1.2 and then sends it; when the reply arrives, it performs the opposite translation. This is the basic NAT process.
This is exactly how Docker's default bridge network mode works [4]: each container is assigned an IP address in a private segment; containers on the same host can reach each other with these addresses, but container traffic must be NATed when it leaves the host.
NAT can be subdivided into several categories:
SNAT: translating the source address (source)
DNAT: translates the destination address (destination)
Full NAT: translates both source and destination addresses
The scenario above is SNAT: different private IPs are mapped to the same "public" IP so that they can reach external network services. This kind of setup is similar in spirit to a forward proxy.
NAT relies on the results of connection tracking. The most important usage scenario for connection tracking is NAT.
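To make the relationship between the two concrete, here is a minimal conceptual sketch (not kernel code; the names pkt, nat_entry, snat_egress and snat_reverse_ingress are invented for illustration) of how SNAT uses a per-connection record to rewrite outgoing packets and reverse-translate the replies:

#include <stdint.h>
#include <stdbool.h>

/* A toy packet and a toy per-connection NAT record;
 * the real kernel uses struct sk_buff and struct nf_conn. */
struct pkt { uint32_t src_ip, dst_ip; uint16_t src_port, dst_port; };

struct nat_entry {
    struct pkt orig;       /* the tuple as seen before translation (192.168.x.x) */
    uint32_t   snat_ip;    /* the address we masquerade as (e.g. 10.1.1.2)       */
    uint16_t   snat_port;  /* the (possibly remapped) source port                */
};

/* Egress: rewrite the source address and remember the mapping. */
void snat_egress(struct pkt *p, struct nat_entry *e,
                 uint32_t public_ip, uint16_t public_port)
{
    e->orig      = *p;
    e->snat_ip   = public_ip;
    e->snat_port = public_port;
    p->src_ip    = public_ip;
    p->src_port  = public_port;
}

/* Ingress: if a reply targets our mapping, restore the original address. */
bool snat_reverse_ingress(struct pkt *p, const struct nat_entry *e)
{
    if (p->dst_ip != e->snat_ip || p->dst_port != e->snat_port)
        return false;                   /* not a reply to a translated flow     */
    p->dst_ip   = e->orig.src_ip;       /* hand it back to the 192.168.x.x host */
    p->dst_port = e->orig.src_port;
    return true;
}

The per-connection record (nat_entry here) is exactly the kind of state that the conntrack table supplies to the real NAT module.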
Layer-4 load balancing (L4 LB)
Figure 1.3. L4LB: Traffic path in NAT mode [3]
Broadening the scope a little, let us look at layer-4 load balancing in NAT mode.
A layer-4 load balancer distributes traffic based on the packet's layer-4 information (such as src/dst IP, src/dst port, protocol).
VIP (Virtual IP) is one way to implement layer-4 load balancing:
Multiple backend real IPs (Real IPs) are attached to the same virtual IP (VIP)
Client traffic first arrives at the VIP and is then forwarded to a specific backend IP by the load-balancing algorithm
If NAT is used between the VIP and the Real IP nodes (other techniques are also possible), then when a client accesses the server, the L4LB node performs NAT in both directions (Full NAT), as shown in figure 1.3.
1.2 Principles
After understanding the above concepts, let's consider the technical principles of connection tracking.
To track the state of all connections on a machine, we need to:
Intercept (or filter) every packet passing through the machine and analyze it
Build a connection state database (the conntrack table) for the machine based on this information
Continuously update the database with information from the intercepted packets
For example,
When a TCP SYN packet is intercepted, a new TCP connection is being established, so a new conntrack entry must be created to record it
When a packet belonging to an existing conntrack entry is intercepted, that entry's statistics (packets and bytes sent/received) must be updated
Beyond these two functional requirements, performance must also be considered, because connection tracking filters and analyzes every single packet. Performance matters a great deal, but it is not the focus of this article; we will touch on it again in the implementation sections.
Finally, these functions are best accompanied by management tools that make them easier to use.
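Putting the steps above together, the core idea can be sketched in a few dozen lines of C. This is a conceptual sketch only; every name in it (tuple, ct_entry, ct_lookup_or_create, conntrack_in, the fixed-size table) is invented for illustration, and the real kernel data structures are covered in section 3:

#include <stdint.h>
#include <string.h>
#include <time.h>

/* The flow identifier extracted from each packet. */
struct tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* One record in the (toy) conntrack table. */
struct ct_entry {
    struct tuple t;
    time_t       created, last_seen;  /* timestamps, used for expiry (GC) */
    uint64_t     packets, bytes;      /* statistics                       */
    int          in_use;
};

#define CT_MAX 1024
static struct ct_entry ct_table[CT_MAX];

static int tuple_eq(const struct tuple *a, const struct tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

static struct ct_entry *ct_lookup_or_create(const struct tuple *t, time_t now)
{
    int free_slot = -1;
    for (int i = 0; i < CT_MAX; i++) {
        if (ct_table[i].in_use && tuple_eq(&ct_table[i].t, t))
            return &ct_table[i];                 /* packet belongs to a known flow */
        if (!ct_table[i].in_use && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return 0;                                /* table full */
    struct ct_entry *e = &ct_table[free_slot];   /* first packet of a new flow */
    memset(e, 0, sizeof(*e));
    e->t = *t;
    e->created = now;
    e->in_use = 1;
    return e;
}

/* Called for every intercepted packet. */
void conntrack_in(const struct tuple *t, uint64_t pkt_len)
{
    time_t now = time(0);
    struct ct_entry *e = ct_lookup_or_create(t, now);
    if (!e)
        return;
    e->last_seen = now;   /* a separate GC task reclaims entries whose last_seen is too old */
    e->packets++;
    e->bytes += pkt_len;
}

The kernel's real implementation replaces the linear scan with a hash table keyed by the tuple, which is what the rest of this article examines.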
1.3 Design: Netfilter
Figure 1.4. Netfilter architecture inside Linux kernel
Connection tracking for Linux is implemented in Netfilter.
Netfilter is a framework in the Linux kernel for manipulating and filtering packets. It places several hook points in the kernel protocol stack, at which packets can be intercepted, filtered, or otherwise processed.
"to put it more bluntly, the hook mechanism is to set up a number of detection points on the necessary path of a packet, and all packets arriving at these points must be tested, based on the results of the test:
Accepted: the packet is not modified; the check exits and normal packet processing continues
Modified: for example, the IP address is rewritten for NAT, after which the packet is handed back to normal processing
Dropped: e.g. by a security policy or firewall feature
The connection tracking module only collects and records connection information; it does not modify or drop packets. That is done by other modules (such as NAT) built on Netfilter hooks."
Netfilter, one of the oldest kernel frameworks, was developed in 1998 and merged into the 2.4.x kernel mainline version in 2000 [5].
1.4 Design: further reflection
When it comes to connection tracking (conntrack), you may think of Netfilter first. However, as we can see from the discussion in Section 1.2, the concept of connection tracking is independent of Netfilter, and Netfilter is just a connection tracking implementation in the Linux kernel.
In other words, as long as you have the hook capability, you can intercept every packet entering and leaving the host, and you can implement a set of connection tracking on this basis.
Figure 1.5. Cilium's conntrack and NAT architecture
Cloud native network solution Cilium implements such an independent connection tracking and NAT mechanism in version 1.7.4 + (full functionality requires Kernel 4.19 +). The basic principles are as follows:
Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community
Packet interception implemented with BPF hooks (the equivalent of the hook mechanism in Netfilter)
A new set of conntrack and NAT implemented on top of those BPF hooks
Therefore, even if Netfilter is uninstalled, it will not affect Cilium's support for functions such as Kubernetes ClusterIP, NodePort, ExternalIPs and LoadBalancer [2].
Because this connection-tracking mechanism is independent of Netfilter, its conntrack and NAT information is not stored in the kernel's (that is, Netfilter's) conntrack table and NAT table. As a result, the usual tools (conntrack, netstat, ss, lsof, etc.) cannot see it; use Cilium's own commands instead, for example:
$ cilium bpf nat list
$ cilium bpf ct list global
The configuration is also independent and must be done in Cilium, e.g. via the command-line option --bpf-ct-tcp-max.
In addition, although this article repeatedly emphasizes that the connection-tracking module and the NAT module are independent, in practice their code may be coupled for performance reasons. For example, when Cilium garbage-collects (GC) its conntrack entries, it also reclaims the corresponding NAT entries, rather than running a separate GC for NAT.
That concludes the theory. Now let us look at the kernel implementation.
2 Implementation of the Netfilter hook mechanism
Netfilter consists of several modules, the most important of which are the connection tracking (CT) module and the network address translation (NAT) module.
The CT module's main responsibility is to identify trackable packets and maintain their state. It is independent of the NAT module, but its main purpose is to serve the latter.
2.1 Netfilter framework
5 hook points
Figure 2.1. The 5 hook points in netfilter framework
As shown in the figure above, Netfilter provides five hook points on the packet processing path of the kernel protocol stack, namely:
// include/uapi/linux/netfilter_ipv4.h

#define NF_IP_PRE_ROUTING   0   /* After promisc drops, checksum checks. */
#define NF_IP_LOCAL_IN      1   /* If the packet is destined for this box. */
#define NF_IP_FORWARD       2   /* If the packet is destined for another interface. */
#define NF_IP_LOCAL_OUT     3   /* Packets coming from a local process. */
#define NF_IP_POST_ROUTING  4   /* Packets about to hit the wire. */
#define NF_IP_NUMHOOKS      5
Users can register their own handlers at these hook points. When a packet passes through a hook point, the corresponding handlers are called.
"there is also a set of definitions at the beginning of NF_INET_, include/uapi/linux/netfilter.h. These two sets are equivalent, and from the point of view of comments, the definition at the beginning of NF_IP_ may be to maintain compatibility.
Enum nf_inet_hooks {NF_INET_PRE_ROUTING, NF_INET_LOCAL_IN, NF_INET_FORWARD, NF_INET_LOCAL_OUT, NF_INET_POST_ROUTING, NF_INET_NUMHOOKS}
"
Hook return value type
After a hook function inspects or processes a packet, it must return a verdict telling the kernel what to do with the packet next. The possible verdicts are:
// include/uapi/linux/netfilter.h

#define NF_DROP   0   // the packet is dropped
#define NF_ACCEPT 1   // accept the packet and continue to the next step
#define NF_STOLEN 2   // the packet has been consumed by the current handler; later handlers need not process it
#define NF_QUEUE  3   // the packet should be queued (for further processing)
#define NF_REPEAT 4   // the current handler should be called again
Hook priority
Multiple handlers can be registered at each hook point. A priority must be specified at registration time, so that when the hook is triggered, the handlers are called in priority order.
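For illustration, this is roughly what registering a handler looks like on recent kernels (4.x and later). It is a minimal sketch of a kernel module, not taken from the article; the module and function names are invented, but nf_register_net_hook(), struct nf_hook_ops and the verdicts are the real Netfilter API:

#include <linux/module.h>
#include <linux/skbuff.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>
#include <net/net_namespace.h>

/* The handler: inspect the packet and return a verdict. */
static unsigned int demo_hook_fn(void *priv, struct sk_buff *skb,
                                 const struct nf_hook_state *state)
{
    /* A conntrack-like handler would extract the tuple from skb here. */
    return NF_ACCEPT;   /* let the packet continue through the stack */
}

static struct nf_hook_ops demo_hook_ops = {
    .hook     = demo_hook_fn,
    .pf       = NFPROTO_IPV4,
    .hooknum  = NF_INET_PRE_ROUTING,   /* one of the 5 hook points above       */
    .priority = NF_IP_PRI_FIRST,       /* run before other handlers at this hook */
};

static int __init demo_hook_init(void)
{
    return nf_register_net_hook(&init_net, &demo_hook_ops);
}

static void __exit demo_hook_exit(void)
{
    nf_unregister_net_hook(&init_net, &demo_hook_ops);
}

module_init(demo_hook_init);
module_exit(demo_hook_exit);
MODULE_LICENSE("GPL");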
2.2 Organization of filtering rules
iptables is the user-space tool for configuring Netfilter filtering. For ease of management, the filtering rules are grouped into several tables by function:
raw
filter
nat
mangle
This is not the focus of this article; for more information, please refer to the in-depth introduction to iptables and Netfilter architecture.
3 Netfilter conntrack implementation
The connection-tracking module maintains the connection state of trackable protocols. In other words, connection tracking targets packets of specific protocols, not all protocols; which protocols are supported is shown later.
3.1 Important structures and functions
Important structures:
struct nf_conntrack_tuple {}: defines a tuple.
struct nf_conntrack_man_proto {}: the protocol-specific part of the manipulable part of a tuple.
struct nf_conntrack_man {}: the manipulable part of a tuple.
struct nf_conntrack_l4proto {}: the set of methods (and other protocol-specific fields) that a protocol must implement to support connection tracking.
struct nf_conntrack_tuple_hash {}: an entry in the conntrack hash table.
struct nf_conn {}: defines a flow (connection).
Important functions:
hash_conntrack_raw(): computes a 32-bit hash key from a tuple.
nf_conntrack_in(): the core of the connection-tracking module; the entry point where packets enter connection tracking.
resolve_normal_ct() -> init_conntrack() -> l4proto->new(): creates a new connection record (conntrack entry).
nf_conntrack_confirm(): confirms the new connection created earlier in nf_conntrack_in().
3.2 struct nf_conntrack_tuple {}: tuple
Tuple is one of the most important concepts in connection tracking.
A tuple defines a unidirectional flow. The kernel code has the following comments:
"/ / include/net/netfilter/nf_conntrack_tuple.h
A tuple is a structure containing the information to uniquely identify a connection. Ie. If two packets have the same tuple, they are in the same connection; if not, they are not. "
Structure definition
// include/net/netfilter/nf_conntrack_tuple.h
// To simplify the implementation of NAT, the kernel splits the tuple into a
// "manipulatable" part and a "non-manipulatable" part; the "_man" in the
// structures below is short for "manipulatable".

// include/uapi/linux/netfilter.h
union nf_inet_addr {
    __u32           all[4];
    __be32          ip;
    __be32          ip6[4];
    struct in_addr  in;
    struct in6_addr in6;
};

// include/uapi/linux/netfilter/nf_conntrack_tuple_common.h
// Protocol-specific part
union nf_conntrack_man_proto {
    __be16 all;                       /* Add other protocols here. */
    struct { __be16 port; } tcp;
    struct { __be16 port; } udp;
    struct { __be16 id;   } icmp;
    struct { __be16 port; } dccp;
    struct { __be16 port; } sctp;
    struct { __be16 key;  } gre;
};

/* The manipulable part of the tuple. */
struct nf_conntrack_man {
    union nf_inet_addr           u3;     // L3 address
    union nf_conntrack_man_proto u;      // protocol-specific part
    u_int16_t                    l3num;  // L3 protocol number
};

struct nf_conntrack_tuple {              /* This contains the information to distinguish a connection. */
    struct nf_conntrack_man src;         // source address information, manipulable part

    struct {                             // destination address information
        union nf_inet_addr u3;
        union {
            __be16 all;                  /* Add other protocols here. */
            struct { __be16 port; } tcp;
            struct { __be16 port; } udp;
            struct { u_int8_t type, code; } icmp;
            struct { __be16 port; } dccp;
            struct { __be16 port; } sctp;
            struct { __be16 key;  } gre;
        } u;
        u_int8_t protonum;               /* The protocol. */
        u_int8_t dir;                    /* The direction (for tuplehash) */
    } dst;
};
The tuple structure has only two fields, src and dst, which hold the source and destination information respectively. src and dst are themselves structures and can hold data for different protocol types. Taking IPv4 UDP as an example, the five-tuple is stored in the following fields:
dst.protonum: protocol type
src.u3.ip: source IP address
dst.u3.ip: destination IP address
src.u.udp.port: source port number
dst.u.udp.port: destination port number
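As an illustration (a sketch, not from the article), a tuple for an IPv4 UDP flow from 192.168.1.100:53152 to 10.1.1.2:53 would be filled in roughly like this; the helper name fill_example_tuple and the addresses are made up, but the fields are the real ones listed above:

#include <linux/in.h>
#include <linux/string.h>
#include <net/netfilter/nf_conntrack_tuple.h>

static void fill_example_tuple(struct nf_conntrack_tuple *t)
{
    memset(t, 0, sizeof(*t));
    t->src.l3num      = AF_INET;            /* L3 protocol: IPv4                 */
    t->dst.protonum   = IPPROTO_UDP;        /* dst.protonum: protocol type       */
    t->src.u3.ip      = htonl(0xC0A80164);  /* src.u3.ip: 192.168.1.100          */
    t->dst.u3.ip      = htonl(0x0A010102);  /* dst.u3.ip: 10.1.1.2               */
    t->src.u.udp.port = htons(53152);       /* src.u.udp.port: source port       */
    t->dst.u.udp.port = htons(53);          /* dst.u.udp.port: destination port  */
}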
Protocols supported by CT
As you can see from the above definition, the connection tracking module currently supports only the following six protocols: TCP, UDP, ICMP, DCCP, SCTP, and GRE.
Note the ICMP protocol. You might assume that, since the connection-tracking module hashes on Layer 3 and Layer 4 information and ICMP is a Layer 3 protocol with no Layer 4 information, ICMP could not be recorded by CT. But it is: as the code above shows, ICMP uses the type and code fields from its header to define the tuple.
3.3 struct nf_conntrack_l4proto {}: the set of methods a protocol must implement
Protocols that support connection tracking need to implement the methods defined in the struct nf_conntrack_l4proto {} structure, such as pkt_to_tuple ().
// include/net/netfilter/nf_conntrack_l4proto.h
struct nf_conntrack_l4proto {
    u_int16_t l3proto;   /* L3 Protocol number. */
    u_int8_t  l4proto;   /* L4 Protocol number. */

    // extract the tuple from the packet
    bool (*pkt_to_tuple)(struct sk_buff *skb, ...,
                         struct nf_conntrack_tuple *tuple);

    // make a decision on the packet and return it (returns verdict for packet)
    int  (*packet)(struct nf_conn *ct, const struct sk_buff *skb, ...);

    // create a new connection; returns true on success,
    // in which case the packet() method will be called next
    bool (*new)(struct nf_conn *ct, const struct sk_buff *skb, unsigned int dataoff);

    // determine whether the current packet can be connection-tracked;
    // on success, the packet() method will be called next
    int  (*error)(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb, ...);

    ...
};
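For example, UDP's pkt_to_tuple() simply copies the two port numbers from the UDP header into the tuple. The sketch below is a slightly simplified version modeled on net/netfilter/nf_conntrack_proto_udp.c (the exact signature varies a bit across kernel versions):

// modeled on net/netfilter/nf_conntrack_proto_udp.c (simplified)
static bool udp_pkt_to_tuple(const struct sk_buff *skb, unsigned int dataoff,
                             struct net *net, struct nf_conntrack_tuple *tuple)
{
    const struct udphdr *hp;
    struct udphdr _hdr;

    /* Only the first 4 bytes of the header are needed to get the ports. */
    hp = skb_header_pointer(skb, dataoff, sizeof(_hdr), &_hdr);
    if (hp == NULL)
        return false;

    tuple->src.u.udp.port = hp->source;  /* ports stay in network byte order */
    tuple->dst.u.udp.port = hp->dest;
    return true;
}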
3.4 struct nf_conntrack_tuple_hash {}: hash table entry
Conntrack stores the state of the active connection in a hash table (key: value).
hash_conntrack_raw() computes a 32-bit hash key from the tuple:
// net/netfilter/nf_conntrack_core.c
static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
                              const struct net *net)
{
    unsigned int n;
    u32 seed;

    get_random_once(&nf_conntrack_hash_rnd, sizeof(nf_conntrack_hash_rnd));

    /* The direction must be ignored, so we hash everything up to the
     * destination ports (which is a multiple of 4) and treat the last
     * three bytes manually.
     */
    seed = nf_conntrack_hash_rnd ^ net_hash_mix(net);
    n = (sizeof(tuple->src) + sizeof(tuple->dst.u3)) / sizeof(u32);

    return jhash2((u32 *)tuple, n, seed ^
                  (((__force __u16)tuple->dst.u.all << 16) |
                   tuple->dst.protonum));
}
Notice how the hash is calculated using the different fields of tuple.
nf_conntrack_tuple_hash is the value stored in the hash table:
// include/net/netfilter/nf_conntrack_tuple.h
// Each connection corresponds to two entries in the hash table, one per direction (egress/ingress)
// Connections have two entries in the hash table: one for each way
struct nf_conntrack_tuple_hash {
    struct hlist_nulls_node   hnnode;  // points to the connection (struct nf_conn) this hash belongs to;
                                       // a linked list is used to resolve hash collisions
    struct nf_conntrack_tuple tuple;   // the tuple, described in detail above
};
3.5 struct nf_conn {}: connection
Every flow in Netfilter is called a connection, even for non-connection-oriented protocols such as UDP. Each connection is represented by struct nf_conn {}. The main fields are as follows:
// include/linux/skbuff.h
struct nf_conntrack {
    atomic_t use;                         // reference count of the connection
};

// include/net/netfilter/nf_conntrack.h
struct nf_conn {
    struct nf_conntrack            ct_general;

    struct nf_conntrack_tuple_hash tuplehash[IP_CT_DIR_MAX];  // hash table entries; the array records the flow in both directions

    unsigned long                  status;        // connection status, see enum ip_conntrack_status below
    u32                            timeout;       // connection status timer
    possible_net_t                 ct_net;

    struct hlist_node              nat_bysource;

    struct nf_conn                *master;

    u_int32_t                      mark;          /* Special tagging of skb */
    u_int32_t                      secmark;

    union nf_conntrack_proto       proto;         // per conntrack: protocol private data
};

// protocol private data
union nf_conntrack_proto {
    /* insert conntrack proto private data here */
    struct nf_ct_dccp dccp;
    struct ip_ct_sctp sctp;
    struct ip_ct_tcp  tcp;
    struct nf_ct_gre  gre;
    unsigned int      tmpl_padto;
};
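Because the two tuplehash[] entries are embedded directly inside struct nf_conn, the kernel can go from a hash-table entry back to the connection it belongs to with container_of(). The real helper looks roughly like this (slightly simplified):

// include/net/netfilter/nf_conntrack.h (simplified)
static inline struct nf_conn *
nf_ct_tuplehash_to_ctrack(const struct nf_conntrack_tuple_hash *hash)
{
    return container_of(hash, struct nf_conn,
                        tuplehash[hash->tuple.dst.dir]);
}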
The set of possible connection states, enum ip_conntrack_status:
// include/uapi/linux/netfilter/nf_conntrack_common.h
enum ip_conntrack_status {
    IPS_EXPECTED   = (1 << 0),  // an expected connection
    IPS_SEEN_REPLY = (1 << 1),  // packets have been seen in both directions
    IPS_ASSURED    = (1 << 2),  // this entry should never be expired early
    IPS_CONFIRMED  = (1 << 3),  // confirmed: the originating packet has left the box
    IPS_SRC_NAT    = (1 << 4),  // SNAT is needed in the original direction
    IPS_DST_NAT    = (1 << 5),  // DNAT is needed in the original direction
    ...                         // (further bits omitted here)
};

The NAT module is built on top of these status bits: when a packet of a tracked connection passes a NAT hook, the module checks the connection's IPS_SRC_NAT / IPS_DST_NAT bits ("Non-atomic: these bits don't change") and, if translation is needed, calls nf_nat_manip_pkt() to do the actual protocol-specific rewriting:

// net/netfilter/nf_nat_core.c
static unsigned int nf_nat_manip_pkt(struct sk_buff *skb, struct nf_conn *ct,
                                     enum nf_nat_manip_type mtype,
                                     enum ip_conntrack_dir dir)
{
    const struct nf_nat_l3proto *l3proto;
    const struct nf_nat_l4proto *l4proto;
    struct nf_conntrack_tuple target;

    /* We are aiming to look like inverse of other direction. */
    nf_ct_invert_tuplepr(&target, &ct->tuplehash[!dir].tuple);

    l3proto = __nf_nat_l3proto_find(target.src.l3num);
    l4proto = __nf_nat_l4proto_find(target.src.l3num, target.dst.protonum);

    if (!l3proto->manip_pkt(skb, 0, l4proto, &target, mtype))   // protocol-specific rewriting
        return NF_DROP;

    return NF_ACCEPT;
}

That is all of "how to implement connection tracking under Linux". Thank you for reading, and I hope this article was helpful.