TCP Connection Analysis: Why the Socket Remains in the FIN_WAIT_1 State After the Process Is Killed

Symptoms

A process on an ECS instance establishes a socket connection to another server. After the process is killed, however, tcpdump does not capture any FIN packet. As a result, the connection on the peer server is never closed properly. The following sections analyze the reasons behind this problem.

Analysis

Usually, after a process is killed, its sockets are closed just as if close() had been called, which initiates a TCP FIN to the peer end. Therefore, the preceding symptom is abnormal. The key information in this regard is as follows:

Root Cause

Assuming that no serious kernel bug is involved, the preceding analysis narrows the focus to the mechanisms that can affect packets on their way out: iptables (Netfilter) and TC. It turns out that many iptables rules are configured on the ECS instance. To find out which rule a packet hits, use iptables -nvL to print the match count of each rule, or add LOG rules. The following snippet shows an example.

# Log packets that are in the NEW state
iptables -A INPUT -p tcp -m state --state NEW -j LOG --log-prefix "[iptables] INPUT NEW: "
# iptables -A OUTPUT -m state --state INVALID -j DROP
  • Why is the FIN packet considered INVALID?

When Does the Problem Trigger

Let’s address the first question. Does the problem always occur? What are the triggering conditions?

Why the FIN Packet Is Considered INVALID

For a TCP connection, it seems logical to treat a FIN packet as INVALID if one end sends it when conntrack holds no tracking entry for the connection. However, no document clearly describes how the conntrack module determines the state of a “new” packet when the TCP socket still exists on the host but the conntrack entry does not.

Test: Iptables Rule Setting

Use the following script to set iptables rules.

#!/bin/sh
iptables -P INPUT ACCEPT
iptables -F
iptables -X
iptables -Z
# Log the state of every packet that traverses the INPUT chain
iptables -A INPUT -p tcp -m state --state NEW -j LOG --log-prefix "[iptables] INPUT NEW: "
iptables -A INPUT -p TCP -m state --state ESTABLISHED -j LOG --log-prefix "[iptables] INPUT ESTABLISHED: "
iptables -A INPUT -p TCP -m state --state RELATED -j LOG --log-prefix "[iptables] INPUT RELATED: "
iptables -A INPUT -p TCP -m state --state INVALID -j LOG --log-prefix "[iptables] INPUT INVALID: "
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p tcp --dport 21 -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
iptables -A INPUT -p tcp --dport 8088 -m state --state NEW -j ACCEPT
iptables -A INPUT -p icmp --icmp-type 8 -j ACCEPT
iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
# Log the state of every packet that traverses the OUTPUT chain
iptables -A OUTPUT -p tcp -m state --state NEW -j LOG --log-prefix "[iptables] OUTPUT NEW: "
iptables -A OUTPUT -p TCP -m state --state ESTABLISHED -j LOG --log-prefix "[iptables] OUTPUT ESTABLISHED: "
iptables -A OUTPUT -p TCP -m state --state RELATED -j LOG --log-prefix "[iptables] OUTPUT RELATED: "
iptables -A OUTPUT -p TCP -m state --state INVALID -j LOG --log-prefix "[iptables] OUTPUT INVALID: "
# iptables -A OUTPUT -m state --state INVALID -j DROP
iptables -P INPUT DROP
iptables -P OUTPUT ACCEPT
iptables -P FORWARD DROP
service iptables save
systemctl restart iptables.service
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=20

Code Logic

The nf_conntrack module starts processing a packet in the nf_conntrack_in function. The call path for a packet that has no existing conntrack entry is shown below.

nf_conntrack_in @net/netfilter/nf_conntrack_core.c
|--> resolve_normal_ct @net/netfilter/nf_conntrack_core.c // Looks up the conntrack entry with __nf_conntrack_find_get; if none is found, initializes a new conntrack entry
|--> init_conntrack @net/netfilter/nf_conntrack_core.c // Initializes the conntrack entry
|--> tcp_new @net/netfilter/nf_conntrack_proto_tcp.c // TCP-specific handling, called when a new connection for this protocol is found. The state is decided here based on the tcp_conntracks array.

resolve_normal_ct

In resolve_normal_ct, the logic first looks for the corresponding conntrack entry using __nf_conntrack_find_get. In the scenario described in this article, the conntrack entry has timed out, so it no longer exists. The code then goes to init_conntrack to initialize a new entry.

/* look for tuple match */
hash = hash_conntrack_raw(&tuple, zone);
h = __nf_conntrack_find_get(net, zone, &tuple, hash);
if (!h) {
    h = init_conntrack(net, tmpl, &tuple, l3proto, l4proto,
                       skb, dataoff, hash);
    if (!h)
        return NULL;
    if (IS_ERR(h))
        return (void *)h;
}

init_conntrack

In the following init_conntrack logic, the "new" callback of nf_conntrack_l4proto (tcp_new for TCP) reads and verifies the content of a packet that is new to the conntrack module. If the callback returns "false", the "if" statement below frees the entry and ends the initialization. In the scenario described in this article, initialization of the conntrack entry ends here.

if (!l4proto->new(ct, skb, dataoff, timeouts)) {
    nf_conntrack_free(ct);
    pr_debug("init conntrack: can't track with proto module\n");
    return NULL;
}

tcp_new

In the following tcp_new logic, the key step is the assignment to new_state. If new_state is equal to or greater than TCP_CONNTRACK_MAX, the function returns "false" and exits. For the FIN packet, the value assigned to new_state is TCP_CONNTRACK_MAX (sIV). The specific logic is analyzed as follows.

/* Called when a new connection for this protocol found. */
static bool tcp_new(struct nf_conn *ct, const struct sk_buff *skb,
                    unsigned int dataoff, unsigned int *timeouts)
{
    enum tcp_conntrack new_state;
    const struct tcphdr *th;
    struct tcphdr _tcph;
    struct net *net = nf_ct_net(ct);
    struct nf_tcp_net *tn = tcp_pernet(net);
    const struct ip_ct_tcp_state *sender = &ct->proto.tcp.seen[0];
    const struct ip_ct_tcp_state *receiver = &ct->proto.tcp.seen[1];
    th = skb_header_pointer(skb, dataoff, sizeof(_tcph), &_tcph);
    BUG_ON(th == NULL);
    /* Don't need lock here: this conntrack not in circulation yet */
    // Here get_conntrack_index returns TCP_FIN_SET, a value of the enum type tcp_bit_set
    new_state = tcp_conntracks[0][get_conntrack_index(th)][TCP_CONNTRACK_NONE];
    /* Invalid: delete conntrack */
    if (new_state >= TCP_CONNTRACK_MAX) {
        pr_debug("nf_ct_tcp: invalid new deleting.\n");
        return false;
    }
    ......
}
  • In the scenario described in this article, the middle index of the lookup is determined by get_conntrack_index. Based on the FIN flag in the packet, get_conntrack_index(th) returns TCP_FIN_SET, a value of enum tcp_bit_set (defined as follows). The values of tcp_bit_set correspond one-to-one to the middle-dimension indexes of the tcp_conntracks array introduced below.
/* What TCP flags are set from RST/SYN/FIN/ACK. */
enum tcp_bit_set {
TCP_SYN_SET,
TCP_SYNACK_SET,
TCP_FIN_SET,
TCP_ACK_SET,
TCP_RST_SET,
TCP_NON

tcp_conntracks Array

The following snippet shows the content of the array. The source code has many comments that describe the state transitions (omitted here). This article focuses only on the state assigned to the first packet that is seen after the conntrack entry has timed out.

static const u8 tcp_conntracks[2][6][TCP_CONNTRACK_MAX] = {
{
/* ORIGINAL */
/*syn*/ { sSS, sSS, sIG, sIG, sIG, sIG, sIG, sSS, sSS, sS2 },
/*synack*/ { sIV, sIV, sSR, sIV, sIV, sIV, sIV, sIV, sIV, sSR },
/*fin*/ { sIV, sIV, sFW, sFW, sLA, sLA, sLA, sTW, sCL, sIV },
/*ack*/ { sES, sIV, sES, sES, sCW, sCW, sTW, sTW, sCL, sIV },
/*rst*/ { sIV, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL },
/*none*/ { sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV }
},
{
/* REPLY */
/*syn*/ { sIV, sS2, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sS2 },
/*synack*/ { sIV, sSR, sIG, sIG, sIG, sIG, sIG, sIG, sIG, sSR },
/*fin*/ { sIV, sIV, sFW, sFW, sLA, sLA, sLA, sTW, sCL, sIV },
/*ack*/ { sIV, sIG, sSR, sES, sCW, sCW, sTW, sTW, sCL, sIG },
/*rst*/ { sIV, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL, sCL },
/*none*/ { sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV, sIV }
}
};
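tcp_conntracks[0][get_conntrack_index(th)][TCP_CONNTRACK_NONE] => tcp_conntracks[0][get_conntrack_index(th)][0]
  • When the packet carries FIN: tcp_conntracks[0][TCP_FIN_SET][0] = tcp_conntracks[0][2][0] => sIV => INVALID state
  • When the packet carries RESET: tcp_conntracks[0][TCP_RST_SET][0] = tcp_conntracks[0][4][0] => sIV => INVALID state
  • When the packet carries SYNACK: tcp_conntracks[0][TCP_SYNACK_SET][0] = tcp_conntracks[0][1][0] => sIV => INVALID state
  • When the packet carries only SYN or only ACK, the lookup returns sSS or sES respectively, and the packet is considered NEW by the conntrack module.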

Conclusion

When iptables is used in the operating system (or hooks provided by Netfilter are used in other scenarios), we recommend that you set nf_conntrack_tcp_timeout_established to a value smaller than the default (5 days). This is a common best practice to keep the conntrack table from accumulating too many entries. The critical question, however, is how to determine whether a given value of nf_conntrack_tcp_timeout_established is appropriate. If the timeout is shorter than the idle time of a connection, the entry expires while the socket is still open, and a later FIN is classified as INVALID as analyzed above. Unless you know exactly which filtering action the iptables rules apply to every packet, we do not recommend setting the value to hundreds of seconds or less.
