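The connection state of a TCP protocol instance is stored in the sk_state field of the struct sock data structure.
A TCP instance handles packets differently depending on the state of the connection, so every incoming packet is checked against the TCP state machine. The state machine is divided into three phases:

  • Phase 1: connection establishment.
  • Phase 2: data transfer.
  • Phase 3: connection teardown.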
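
As a rough illustration of the three phases, the sketch below maps each TCP state to a phase. The state numbering follows the kernel's tcp_states.h; the enum tcp_phase type, the tcp_phase_of helper, and the grouping itself are assumptions made for this example and do not exist in the kernel.

```c
#include <assert.h>

/* State numbering as in the kernel's include/net/tcp_states.h. */
enum tcp_state {
    TCP_ESTABLISHED = 1,
    TCP_SYN_SENT,
    TCP_SYN_RECV,
    TCP_FIN_WAIT1,
    TCP_FIN_WAIT2,
    TCP_TIME_WAIT,
    TCP_CLOSE,
    TCP_CLOSE_WAIT,
    TCP_LAST_ACK,
    TCP_LISTEN,
    TCP_CLOSING
};

/* Hypothetical type: the three phases named in the text, plus "no connection". */
enum tcp_phase { PHASE_NONE, PHASE_SETUP, PHASE_TRANSFER, PHASE_TEARDOWN };

/* Hypothetical helper: classify a state into one of the three phases. */
enum tcp_phase tcp_phase_of(enum tcp_state s)
{
    assert(s >= TCP_ESTABLISHED && s <= TCP_CLOSING);

    switch (s) {
    case TCP_LISTEN:
    case TCP_SYN_SENT:
    case TCP_SYN_RECV:
        return PHASE_SETUP;
    case TCP_ESTABLISHED:
        return PHASE_TRANSFER;
    case TCP_FIN_WAIT1:
    case TCP_FIN_WAIT2:
    case TCP_CLOSE_WAIT:
    case TCP_CLOSING:
    case TCP_LAST_ACK:
    case TCP_TIME_WAIT:
        return PHASE_TEARDOWN;
    default:
        return PHASE_NONE;  /* TCP_CLOSE: no connection exists */
    }
}
```

For example, tcp_phase_of(TCP_SYN_SENT) yields PHASE_SETUP, while TCP_CLOSE belongs to none of the three phases.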

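1. TCP connection establishment (tcp_v4_connect)

The tcp_v4_connect function initiates an outgoing connection request.
Main steps:
  • Build a connection request segment with the SYN flag set and send it.
  • Switch the TCP state from the initial CLOSED (TCP_CLOSE) to SYN_SENT.
  • Initialize some TCP options, such as the initial sequence number, window size, MSS, and socket transmit timeout.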

/* This will initiate an outgoing connection. */
int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
{
    struct inet_sock *inet = inet_sk(sk);
    struct tcp_sock *tp = tcp_sk(sk);
    struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
    struct rtable *rt;          /* routing table entry */
    __be32 daddr, nexthop;      /* destination address and next-hop (gateway) address */
    int tmp;
    int err;

    /* Sanity checks */
    if (addr_len < sizeof(struct sockaddr_in))
        return -EINVAL;

    if (usin->sin_family != AF_INET)
        return -EAFNOSUPPORT;

    /* Set the destination and next-hop addresses, honoring the source-route IP option */
    nexthop = daddr = usin->sin_addr.s_addr;
    if (inet->opt && inet->opt->srr) {
        if (!daddr)
            return -EINVAL;
        nexthop = inet->opt->faddr;
    }

    /*
     * With the destination and next-hop addresses known, call
     * ip_route_connect to look up the route to the destination.
     * The lookup is based on the destination IP address, the network
     * device interface, the source port (inet->inet_sport), the
     * destination port (usin->sin_port) and so on. The resulting
     * route entry is returned in rt.
     */
    tmp = ip_route_connect(&rt, nexthop, inet->inet_saddr,
                           RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,
                           IPPROTO_TCP,
                           inet->inet_sport, usin->sin_port, sk, 1);
    if (tmp < 0) {
        if (tmp == -ENETUNREACH)
            IP_INC_STATS_BH(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
        return tmp;
    }

    if (rt->rt_flags & (RTCF_MULTICAST | RTCF_BROADCAST)) {
        ip_rt_put(rt);
        return -ENETUNREACH;
    }

    if (!inet->opt || !inet->opt->srr)
        daddr = rt->rt_dst;

    if (!inet->inet_saddr)
        inet->inet_saddr = rt->rt_src;
    inet->inet_rcv_saddr = inet->inet_saddr;

    /* If the destination changed, discard timestamp state inherited from an earlier use of the socket */
    if (tp->rx_opt.ts_recent_stamp && inet->inet_daddr != daddr) {
        /* Reset inherited state */
        tp->rx_opt.ts_recent       = 0;
        tp->rx_opt.ts_recent_stamp = 0;
        tp->write_seq              = 0;
    }

    if (tcp_death_row.sysctl_tw_recycle &&
        !tp->rx_opt.ts_recent_stamp && rt->rt_dst == daddr) {
        struct inet_peer *peer = rt_get_peer(rt);
        /*
         * VJ's idea. We save last timestamp seen from
         * the destination in peer table, when entering state
         * TIME-WAIT * and initialize rx_opt.ts_recent from it,
         * when trying new connection.
         */
        if (peer != NULL &&
            (u32)get_seconds() - peer->tcp_ts_stamp <= TCP_PAWS_MSL) {
            tp->rx_opt.ts_recent_stamp = peer->tcp_ts_stamp;
            tp->rx_opt.ts_recent = peer->tcp_ts;
        }
    }

    /* Record the destination port, destination address and IP option length */
    inet->inet_dport = usin->sin_port;
    inet->inet_daddr = daddr;

    inet_csk(sk)->icsk_ext_hdr_len = 0;
    if (inet->opt)
        inet_csk(sk)->icsk_ext_hdr_len = inet->opt->optlen;

    tp->rx_opt.mss_clamp = TCP_MSS_DEFAULT;

    /* Socket identity is still unknown (sport may be zero).
     * However we set state to SYN-SENT and not releasing socket
     * lock select source port, enter ourselves into the hash tables and
     * complete initialization after this.
     */
    /*
     * Switch the connection state to SYN_SENT, then call
     * inet_hash_connect, which selects a source port if needed and
     * inserts sk into the TCP connection hash table.
     */
    tcp_set_state(sk, TCP_SYN_SENT);
    err = inet_hash_connect(&tcp_death_row, sk);
    if (err)
        goto failure;

    /* The source port may have just been chosen: update the route with the final port pair */
    err = ip_route_newports(&rt, IPPROTO_TCP,
                            inet->inet_sport, inet->inet_dport, sk);
    if (err)
        goto failure;

    /* OK, now commit destination to socket.  */
    sk->sk_gso_type = SKB_GSO_TCPV4;
    sk_setup_caps(sk, &rt->u.dst);

    /* Choose the initial sequence number */
    if (!tp->write_seq)
        tp->write_seq = secure_tcp_sequence_number(inet->inet_saddr,
                                                   inet->inet_daddr,
                                                   inet->inet_sport,
                                                   usin->sin_port);

    inet->inet_id = tp->write_seq ^ jiffies;

    /* tcp_connect builds and sends the SYN segment */
    err = tcp_connect(sk);
    rt = NULL;
    if (err)
        goto failure;

    return 0;

failure:
    /*
     * This unhashes the socket and releases the local port,
     * if necessary.
     */
    /* Failure handling */
    tcp_set_state(sk, TCP_CLOSE);
    ip_rt_put(rt);
    sk->sk_route_caps = 0;
    inet->inet_dport = 0;
    return err;
}

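2. TCP connection state management (tcp_rcv_state_process)

After a TCP connection has been initialized, state management and state transitions are handled by the tcp_rcv_state_process function.

When a TCP instance receives a packet (through the tcp_v4_rcv function), TCP must inspect the protocol header to tell whether the segment carries only payload data or also carries control information such as SYN, FIN, RST, or ACK. Different segment types have different flag bits set in the TCP header.

Depending on the segment type, tcp_rcv_state_process decides how the connection state should change and how the packet should be handled.

Packet processing for every state is done mostly in tcp_rcv_state_process, except for the ESTABLISHED and TIME_WAIT states.

If a packet arrives while the connection is in the CLOSED state, the packet is dropped.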

/*
 *  This function implements the receiving procedure of RFC 793 for
 *  all states except ESTABLISHED and TIME_WAIT.
 *  It's called from both tcp_v4_rcv and tcp_v6_rcv and should be
 *  address independent.
 */
int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
                          struct tcphdr *th, unsigned len)
{
    struct tcp_sock *tp = tcp_sk(sk);
    struct inet_connection_sock *icsk = inet_csk(sk);
    int queued = 0;
    int res;

    tp->rx_opt.saw_tstamp = 0;

    switch (sk->sk_state) {
    case TCP_CLOSE:
        goto discard;

    /* The socket is in LISTEN: it is a server waiting for a connection request. */
    case TCP_LISTEN:
        /* An ACK here is invalid; returning 1 makes the caller send a reset. */
        if (th->ack)
            return 1;

        /* RST: the client reset the connection, drop the packet. */
        if (th->rst)
            goto discard;

        /*
         * A connection request from a client: call
         * icsk_af_ops->conn_request to process it and move the
         * connection to SYN_RECV. For TCP over IPv4 the
         * conn_request pointer is initialized to tcp_v4_conn_request.
         */
        if (th->syn) {
            if (icsk->icsk_af_ops->conn_request(sk, skb) < 0)
                return 1;

            /* Now we have several options: In theory there is
             * nothing else in the frame. KA9Q has an option to
             * send data with the syn, BSD accepts data with the
             * syn up to the [to be] advertised window and
             * Solaris 2.1 gives you a protocol error. For now
             * we just ignore it, that fits the spec precisely
             * and avoids incompatibilities. It would be nice in
             * future to drop through and process the data.
             *
             * Now that TTCP is starting to be used we ought to
             * queue this data.
             * But, this leaves one open to an easy denial of
             * service attack, and SYN cookies can't defend
             * against this problem. So, we drop the data
             * in the interest of security over speed unless
             * it's still in use.
             */
            kfree_skb(skb);
            return 0;
        }
        goto discard;

    /*
     * The socket is in SYN_SENT: it is a client that has sent a SYN
     * to request a connection and set itself to SYN_SENT.
     */
    case TCP_SYN_SENT:
        /* Inspect the ACK and SYN flags of the incoming segment to decide whether to move to ESTABLISHED. */
        queued = tcp_rcv_synsent_state_process(sk, skb, th, len);
        if (queued >= 0)
            return queued;

        /* Do step6 onward by hand. */
        tcp_urg(sk, skb, th);
        __kfree_skb(skb);
        tcp_data_snd_check(sk);
        return 0;
    }

    /* Validate the incoming packet */
    res = tcp_validate_incoming(sk, skb, th, 0);
    if (res <= 0)
        return -res;

    /* step 5: check the ACK field */
    /* The more involved part: in SYN_RECV an acceptable ACK moves the connection to ESTABLISHED. */
    if (th->ack) {
        int acceptable = tcp_ack(sk, skb, FLAG_SLOWPATH) > 0;

        switch (sk->sk_state) {
        /* Connection is in SYN_RECV */
        case TCP_SYN_RECV:
            if (acceptable) {
                tp->copied_seq = tp->rcv_nxt;
                smp_mb();
                /* Passive open completes: switch to ESTABLISHED */
                tcp_set_state(sk, TCP_ESTABLISHED);
                sk->sk_state_change(sk);

                /* Note, that this wakeup is only for marginal
                 * crossed SYN case. Passively open sockets
                 * are not waked up, because sk->sk_sleep ==
                 * NULL and sk->sk_socket == NULL.
                 */
                if (sk->sk_socket)
                    sk_wake_async(sk, SOCK_WAKE_IO, POLL_OUT);

                /* Update snd_una and the send window */
                tp->snd_una = TCP_SKB_CB(skb)->ack_seq;
                tp->snd_wnd = ntohs(th->window) << tp->rx_opt.snd_wscale;
                tcp_init_wl(tp, TCP_SKB_CB(skb)->seq);

                /* tcp_ack considers this ACK as duplicate
                 * and does not calculate rtt.
                 * Force it here.
                 */
                tcp_ack_update_rtt(sk, 0, 0);

                if (tp->rx_opt.tstamp_ok)
                    tp->advmss -= TCPOLEN_TSTAMP_ALIGNED;

                /* Make sure socket is routed, for
                 * correct metrics.
                 */
                icsk->icsk_af_ops->rebuild_header(sk);

                tcp_init_metrics(sk);

                tcp_init_congestion_control(sk);

                /* Prevent spurious tcp_cwnd_restart() on
                 * first data packet.
                 */
                tp->lsndtime = tcp_time_stamp;

                tcp_mtup_init(sk);
                tcp_initialize_rcv_mss(sk);
                tcp_init_buffer_space(sk);
                tcp_fast_path_on(tp);
            } else {
                return 1;
            }
            break;

        case TCP_FIN_WAIT1:
            if (tp->snd_una == tp->write_seq) {
                /* Switch from FIN_WAIT_1 to FIN_WAIT_2 */
                tcp_set_state(sk, TCP_FIN_WAIT2);
                sk->sk_shutdown |= SEND_SHUTDOWN;
                dst_confirm(__sk_dst_get(sk));

                if (!sock_flag(sk, SOCK_DEAD))
                    /* Wake up lingering close() */
                    sk->sk_state_change(sk);
                else {
                    int tmo;

                    if (tp->linger2 < 0 ||
                        (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
                         after(TCP_SKB_CB(skb)->end_seq - th->fin, tp->rcv_nxt))) {
                        tcp_done(sk);
                        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
                        return 1;
                    }

                    tmo = tcp_fin_time(sk);
                    if (tmo > TCP_TIMEWAIT_LEN) {
                        inet_csk_reset_keepalive_timer(sk, tmo - TCP_TIMEWAIT_LEN);
                    } else if (th->fin || sock_owned_by_user(sk)) {
                        /* Bad case. We could lose such FIN otherwise.
                         * It is not a big problem, but it looks confusing
                         * and not so rare event. We still can lose it now,
                         * if it spins in bh_lock_sock(), but it is really
                         * marginal case.
                         */
                        inet_csk_reset_keepalive_timer(sk, tmo);
                    } else {
                        tcp_time_wait(sk, TCP_FIN_WAIT2, tmo);
                        goto discard;
                    }
                }
            }
            break;

        case TCP_CLOSING:
            if (tp->snd_una == tp->write_seq) {
                /*
                 * Everything we sent has been acknowledged, so the
                 * sending side has no more data waiting to go out:
                 * enter TIME_WAIT directly.
                 */
                tcp_time_wait(sk, TCP_TIME_WAIT, 0);
                goto discard;
            }
            break;

        case TCP_LAST_ACK:
            /*
             * Passive close: the application has already called close().
             * An ACK received in this state means the socket can be
             * closed, so call tcp_done.
             */
            if (tp->snd_una == tp->write_seq) {
                tcp_update_metrics(sk);
                tcp_done(sk);
                goto discard;
            }
            break;
        }
    } else
        goto discard;

    /* step 6: check the URG bit (urgent data handling) */
    tcp_urg(sk, skb, th);

    /* step 7: process the segment text (the data carried by the segment) */
    switch (sk->sk_state) {
    case TCP_CLOSE_WAIT:
    case TCP_CLOSING:
    case TCP_LAST_ACK:
        if (!before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt))
            break;
    case TCP_FIN_WAIT1:
    case TCP_FIN_WAIT2:
        /* RFC 793 says to queue data in these states,
         * RFC 1122 says we MUST send a reset.
         * BSD 4.4 also does reset.
         */
        if (sk->sk_shutdown & RCV_SHUTDOWN) {
            if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
                after(TCP_SKB_CB(skb)->end_seq - th->fin, tp->rcv_nxt)) {
                NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPABORTONDATA);
                tcp_reset(sk);
                return 1;
            }
        }
        /* Fall through */
    /*
     * Ordinary data segment received in ESTABLISHED: tcp_data_queue
     * puts the segment on the socket's receive queue.
     */
    case TCP_ESTABLISHED:
        tcp_data_queue(sk, skb);
        queued = 1;
        break;
    }

    /*
     * tcp_data_snd_check and tcp_ack_snd_check decide whether to send
     * the peer a segment carrying data or a pure ACK.
     */
    /* tcp_data could move socket to TIME-WAIT */
    if (sk->sk_state != TCP_CLOSE) {
        tcp_data_snd_check(sk);
        tcp_ack_snd_check(sk);
    }

    if (!queued) {
discard:
        __kfree_skb(skb);
    }
    return 0;
}
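
The ACK-driven transitions in step 5 above can be condensed into a small table-like function. This is a simplified model only: next_state_on_ack and its parameters are hypothetical names, all_sent_acked stands in for the kernel test tp->snd_una == tp->write_seq, and side effects (timers, wakeups, sending a reset on an unacceptable ACK in SYN_RECV, the linger handling in FIN_WAIT1) are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* State numbering as in the kernel's include/net/tcp_states.h. */
enum tcp_state {
    TCP_ESTABLISHED = 1,
    TCP_SYN_RECV = 3,
    TCP_FIN_WAIT1,
    TCP_FIN_WAIT2,
    TCP_TIME_WAIT,
    TCP_CLOSE,
    TCP_CLOSE_WAIT,
    TCP_LAST_ACK,
    TCP_LISTEN,
    TCP_CLOSING
};

/*
 * Hypothetical model of step 5: return the state the connection is in
 * after an ACK has been processed.
 *   acceptable     - tcp_ack() accepted the ACK
 *   all_sent_acked - models tp->snd_una == tp->write_seq
 */
enum tcp_state next_state_on_ack(enum tcp_state cur, bool acceptable,
                                 bool all_sent_acked)
{
    assert(cur >= TCP_ESTABLISHED && cur <= TCP_CLOSING);

    switch (cur) {
    case TCP_SYN_RECV:      /* passive open completes */
        return acceptable ? TCP_ESTABLISHED : cur;
    case TCP_FIN_WAIT1:     /* our FIN has been acknowledged */
        return all_sent_acked ? TCP_FIN_WAIT2 : cur;
    case TCP_CLOSING:       /* simultaneous close, our FIN acknowledged */
        return all_sent_acked ? TCP_TIME_WAIT : cur;
    case TCP_LAST_ACK:      /* passive close finished; tcp_done() ends in TCP_CLOSE */
        return all_sent_acked ? TCP_CLOSE : cur;
    default:                /* other states are not changed by step 5 */
        return cur;
    }
}
```

Calling next_state_on_ack(TCP_LAST_ACK, true, true) gives TCP_CLOSE, matching the tcp_done call in the listing.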

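3. Receive processing in the ESTABLISHED state (tcp_rcv_established)

The goal of TCP is to transfer data reliably and quickly. When the connection between two hosts is in the ESTABLISHED state, the connection has been set up successfully and the two sides start exchanging data.

When tcp_v4_do_rcv processes an incoming packet, it checks whether the socket is in the ESTABLISHED state; if so, it calls tcp_rcv_established to do the actual receive processing.

The main job of tcp_rcv_established is to deliver the data carried by the segment to the receiver: it is copied directly into user address space when possible, and queued on the socket otherwise.

Conditions for a packet to enter the "Fast Path"

The Linux kernel provides a "Fast Path" to speed up TCP data transfer.

Linux uses header prediction to choose which packets are handled on the "Fast Path". The conditions are:

  • The packet arrives in order.
  • The packet needs no further analysis and can be copied directly into the application's receive buffer.

The prediction flags are stored in the pred_flags field of struct tcp_sock (tp->pred_flags).

Even when the prediction flags have been computed, further checks are made while processing an incoming packet in the ESTABLISHED state. If any of the following applies, the packet is handed to the "Slow Path":

  • The receiving end of the connection has announced a zero receive window. Zero-window probing can only be handled on the "Slow Path".
  • The packet arrived out of order.
  • Urgent data is expected. The "Fast Path" is disabled in this case and re-enabled once the urgent data has been copied to user address space.
  • No receive buffer space is left.
  • Header prediction fails for any reason (unexpected TCP flags, window value, or header length).
  • Data flows in both directions. The "Fast Path" only supports a pure sender or a pure receiver, so when data is being sent in both directions the incoming segment goes to the "Slow Path".
  • The incoming packet carries any option other than the timestamp option.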
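
The header prediction test can be illustrated in user space. The sketch below works on the fourth 32-bit word of the TCP header in host byte order (data offset, reserved bits, flags, window); the kernel itself compares in network byte order. The macro and function names are assumptions made for this example, and the check on the ACK number that the kernel also performs is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-order layout of the 4th header word: doff(4) | reserved(4) | flags(8) | window(16). */
#define FLAG_ACK  UINT32_C(0x00100000)
#define FLAG_PSH  UINT32_C(0x00080000)
#define RESERVED  UINT32_C(0x0F000000)
/* Bits that take part in prediction: everything except reserved bits and PSH. */
#define HP_BITS   ((uint32_t)~(RESERVED | FLAG_PSH))

/*
 * Hypothetical helper: build the expected header word for a segment with
 * the given header length, only ACK set, and the given raw window field.
 * header_len_bytes << 26 places header_len_bytes / 4 in the top 4 bits.
 */
uint32_t make_pred_flags(unsigned header_len_bytes, uint16_t raw_wnd)
{
    assert(header_len_bytes % 4 == 0);
    assert(header_len_bytes >= 20 && header_len_bytes <= 60);
    return ((uint32_t)header_len_bytes << 26) | FLAG_ACK | raw_wnd;
}

/*
 * Hypothetical helper: true when the segment matches the prediction and
 * is the next one expected in sequence. A pred_flags value of 0 can never
 * match, because a real header always has a nonzero data offset.
 */
bool fast_path_candidate(uint32_t flag_word, uint32_t pred_flags,
                         uint32_t seq, uint32_t rcv_nxt)
{
    return (flag_word & HP_BITS) == pred_flags && seq == rcv_nxt;
}
```

With a 20-byte header and a window field of 1000, the predicted word is 0x501003E8. A segment with ACK and PSH set still matches because PSH is masked out, while a segment with FIN set does not.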

/*
 *  TCP receive function for the ESTABLISHED state.
 *
 *  It is split into a fast path and a slow path. The fast path is
 *  disabled when:
 *  - A zero window was announced from us - zero window probing
 *    is only handled properly in the slow path.
 *  - Out of order segments arrived.
 *  - Urgent data is expected.
 *  - There is no buffer space left
 *  - Unexpected TCP flags/window values/header lengths are received
 *    (detected by checking the TCP header against pred_flags)
 *  - Data is sent in both directions. Fast path only supports pure senders
 *    or pure receivers (this means either the sequence number or the ack
 *    value must stay constant)
 *  - Unexpected TCP option.
 *
 *  When these conditions are not satisfied it drops into a standard
 *  receive procedure patterned after RFC793 to handle all cases.
 *  The first three cases are guaranteed by proper pred_flags setting,
 *  the rest is checked inline. Fast processing is turned on in
 *  tcp_data_queue when everything is OK.
 */
int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
                        struct tcphdr *th, unsigned len)
{
    struct tcp_sock *tp = tcp_sk(sk);
    int res;

    /*
     *  Header prediction.
     *  The code loosely follows the one in the famous
     *  "30 instruction TCP receive" Van Jacobson mail.
     *
     *  Van's trick is to deposit buffers into socket queue
     *  on a device interrupt, to call tcp_recv function
     *  on the receive process context and checksum and copy
     *  the buffer to user space. smart...
     *
     *  Our current scheme is not silly either but we take the
     *  extra cost of the net_bh soft interrupt processing...
     *  We do checksum and copy also but from device to kernel.
     */

    tp->rx_opt.saw_tstamp = 0;

    /*  pred_flags is 0xS?10 << 16 + snd_wnd
     *  if header_prediction is to be made
     *  'S' will always be tp->tcp_header_len >> 2
     *  '?' will be 0 for the fast path, otherwise pred_flags is 0 to
     *  turn it off (when there are holes in the receive
     *  space for instance)
     *  PSH flag is ignored.
     */

    /* Does the packet qualify for the fast path? */
    /* 1. Compare the prediction flags with the flags of the incoming segment */
    if ((tcp_flag_word(th) & TCP_HP_BITS) == tp->pred_flags &&
        /* 2. The segment arrived in order */
        TCP_SKB_CB(skb)->seq == tp->rcv_nxt &&
        !after(TCP_SKB_CB(skb)->ack_seq, tp->snd_nxt)) {
        int tcp_header_len = tp->tcp_header_len;

        /* Timestamp header prediction: tcp_header_len
         * is automatically equal to th->doff*4 due to pred_flags
         * match.
         */

        /* 3. Timestamp check */
        /* Check timestamp */
        if (tcp_header_len == sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) {
            /* No? Slow path! */
            if (!tcp_parse_aligned_timestamp(tp, th))
                goto slow_path;

            /* 4. PAWS check */
            /* If PAWS failed, check it more carefully in slow path */
            if ((s32)(tp->rx_opt.rcv_tsval - tp->rx_opt.ts_recent) < 0)
                goto slow_path;

            /* DO NOT update ts_recent here, if checksum fails
             * and timestamp was corrupted part, it will result
             * in a hung connection since we will drop all
             * future packets due to the PAWS test.
             */
        }

        if (len <= tcp_header_len) {
            /* Bulk data transfer: sender */
            if (len == tcp_header_len) {
                /* Predicted packet is in window by definition.
                 * seq == rcv_nxt and rcv_wup <= rcv_nxt.
                 * Hence, check seq<=rcv_wup reduces to:
                 */
                /* 5. The predicted packet is within the window */
                if (tcp_header_len ==
                    (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
                    tp->rcv_nxt == tp->rcv_wup)
                    tcp_store_ts_recent(tp);

                /* We know that such packets are checksummed
                 * on entry.
                 */
                tcp_ack(sk, skb, 0);
                __kfree_skb(skb);
                tcp_data_snd_check(sk);
                return 0;
            } else { /* Header too small */
                TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_INERRS);
                goto discard;
            }
        } else {
            int eaten = 0;
            int copied_early = 0;

            if (tp->copied_seq == tp->rcv_nxt &&
                len - tcp_header_len <= tp->ucopy.len) {
#ifdef CONFIG_NET_DMA
                if (tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
                    copied_early = 1;
                    eaten = 1;
                }
#endif
                /*
                 * The direct copy runs in the context of the application
                 * process, so check whether the current process is the
                 * one waiting for the data.
                 */
                if (tp->ucopy.task == current &&
                    sock_owned_by_user(sk) && !copied_early) {
                    __set_current_state(TASK_RUNNING);

                    if (!tcp_copy_to_iovec(sk, skb, tcp_header_len))
                        eaten = 1;
                }
                if (eaten) {
                    /* Predicted packet is in window by definition.
                     * seq == rcv_nxt and rcv_wup <= rcv_nxt.
                     * Hence, check seq<=rcv_wup reduces to:
                     */
                    if (tcp_header_len ==
                        (sizeof(struct tcphdr) +
                         TCPOLEN_TSTAMP_ALIGNED) &&
                        tp->rcv_nxt == tp->rcv_wup)
                        tcp_store_ts_recent(tp);

                    tcp_rcv_rtt_measure_ts(sk, skb);

                    /*
                     * Receiving end of a bulk transfer: the payload has
                     * already been copied to user space, so strip the
                     * header from the skb and advance rcv_nxt.
                     */
                    __skb_pull(skb, tcp_header_len);
                    tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
                    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITSTOUSER);
                }
                if (copied_early)
                    tcp_cleanup_rbuf(sk, skb->len);
            }
            if (!eaten) {
                /* Not copied to user space directly: verify the checksum and queue the segment */
                if (tcp_checksum_complete_user(sk, skb))
                    goto csum_error;

                /* Predicted packet is in window by definition.
                 * seq == rcv_nxt and rcv_wup <= rcv_nxt.
                 * Hence, check seq<=rcv_wup reduces to:
                 */
                if (tcp_header_len ==
                    (sizeof(struct tcphdr) + TCPOLEN_TSTAMP_ALIGNED) &&
                    tp->rcv_nxt == tp->rcv_wup)
                    tcp_store_ts_recent(tp);

                tcp_rcv_rtt_measure_ts(sk, skb);

                if ((int)skb->truesize > sk->sk_forward_alloc)
                    goto step5;

                NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPHPHITS);

                /* Bulk data transfer: receiver */
                __skb_pull(skb, tcp_header_len);
                __skb_queue_tail(&sk->sk_receive_queue, skb);
                skb_set_owner_r(skb, sk);
                tp->rcv_nxt = TCP_SKB_CB(skb)->end_seq;
            }

            tcp_event_data_recv(sk, skb);

            if (TCP_SKB_CB(skb)->ack_seq != tp->snd_una) {
                /* Well, only one small jumplet in fast path... */
                tcp_ack(sk, skb, FLAG_DATA);
                tcp_data_snd_check(sk);
                if (!inet_csk_ack_scheduled(sk))
                    goto no_ack;
            }

            if (!copied_early || tp->rcv_nxt != tp->rcv_wup)
                __tcp_ack_snd_check(sk, 0);
no_ack:
#ifdef CONFIG_NET_DMA
            if (copied_early)
                __skb_queue_tail(&sk->sk_async_wait_queue, skb);
            else
#endif
            if (eaten)
                __kfree_skb(skb);
            else
                sk->sk_data_ready(sk, 0);
            return 0;
        }
    }

slow_path:
    if (len < (th->doff << 2) || tcp_checksum_complete_user(sk, skb))
        goto csum_error;

    /*
     *  Standard slow path.
     */

    res = tcp_validate_incoming(sk, skb, th, 1);
    if (res <= 0)
        return -res;

step5:
    if (th->ack && tcp_ack(sk, skb, FLAG_SLOWPATH) < 0)
        goto discard;

    tcp_rcv_rtt_measure_ts(sk, skb);

    /* Process urgent data. */
    tcp_urg(sk, skb, th);

    /* step 7: process the segment text */
    /*
     * tcp_data_queue puts the data on the socket's regular receive
     * queue; out-of-order segments go to out_of_order_queue instead.
     * Processing of the regular receive queue continues when the
     * application issues a read system call on the open socket,
     * which ends up in tcp_recvmsg.
     */
    tcp_data_queue(sk, skb);

    tcp_data_snd_check(sk);
    tcp_ack_snd_check(sk);
    return 0;

csum_error:
    TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_INERRS);

discard:
    __kfree_skb(skb);
    return 0;
}

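4. TCP TIME_WAIT state processing (tcp_timewait_state_process)

A TCP connection maintains the TIME_WAIT state for two reasons:
● To let both ends of the connection terminate reliably. (The final ACK may be lost, in which case the peer retransmits its FIN and this end must still be able to acknowledge it.)
● To let old duplicate segments in the network expire.

In the Linux TCP/IP stack, the TCP connection, its state, and its receive buffers are recorded in the struct sock data structure.
Every instance of struct sock needs memory, both for struct sock itself and for the data buffers it manages.
A server has a large number of connections opening and closing all the time, so keeping a full struct sock for every socket that is merely waiting to close would be very expensive.
For this reason the Linux kernel uses a separate data structure to manage TCP connections in the TIME_WAIT state.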
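
The memory saving relies on both structures starting with the same common header, so lookup code can treat either kind of entry uniformly while the TIME_WAIT entry stays small. The sketch below shows the idea with made-up types: common_hdr, full_sock, and tw_sock are hypothetical stand-ins for struct sock_common, struct sock, and struct inet_timewait_sock, and the field sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock_common: fields needed for lookup. */
struct common_hdr {
    unsigned short family;
    unsigned char  state;
    unsigned int   hash;
};

/* Hypothetical stand-in for struct sock: common header plus bulky per-connection data. */
struct full_sock {
    struct common_hdr common;       /* must stay the first member */
    char              buffers[512]; /* placeholder for queues, timers, buffers */
};

/* Hypothetical stand-in for struct inet_timewait_sock: common header plus a few fields. */
struct tw_sock {
    struct common_hdr common;       /* must stay the first member */
    int               timeout;
    unsigned char     substate;
};

/* The shared prefix only works if the common header sits at offset 0 in both. */
static_assert(offsetof(struct full_sock, common) == 0, "common header must be first");
static_assert(offsetof(struct tw_sock, common) == 0, "common header must be first");

/* Read the state of a hash-table entry without knowing which kind it is. */
unsigned char entry_state(const void *entry)
{
    return ((const struct common_hdr *)entry)->state;
}
```

This is why the kernel comment in the structure below asks that nothing be added before the first member.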

The struct inet_timewait_sock data structure


/*
 * This is a TIME_WAIT sock. It works around the memory consumption
 * problems of sockets in such a state on heavily loaded servers, but
 * without violating the protocol specification.
 */
struct inet_timewait_sock {
    /*
     * Now struct sock also uses sock_common, so please just
     * don't add nothing before this first member (__tw_common) --acme
     */
    struct sock_common  __tw_common;
#define tw_family       __tw_common.skc_family
#define tw_state        __tw_common.skc_state
#define tw_reuse        __tw_common.skc_reuse
#define tw_bound_dev_if __tw_common.skc_bound_dev_if
#define tw_node         __tw_common.skc_nulls_node
#define tw_bind_node    __tw_common.skc_bind_node
#define tw_refcnt       __tw_common.skc_refcnt
#define tw_hash         __tw_common.skc_hash
#define tw_prot         __tw_common.skc_prot
#define tw_net          __tw_common.skc_net
    int                     tw_timeout;
    volatile unsigned char  tw_substate;
    /* 3 bits hole, try to pack */
    unsigned char           tw_rcv_wscale;
    /* Socket demultiplex comparisons on incoming packets. */
    /* these five are in inet_sock */
    __be16                  tw_sport;
    __be32                  tw_daddr __attribute__((aligned(INET_TIMEWAIT_ADDRCMP_ALIGN_BYTES)));
    __be32                  tw_rcv_saddr;
    __be16                  tw_dport;
    __u16                   tw_num;
    kmemcheck_bitfield_begin(flags);
    /* And these are ours. */
    unsigned int            tw_ipv6only     : 1,
                            tw_transparent  : 1,
                            tw_pad          : 14,   /* 14 bits hole */
                            tw_ipv6_offset  : 16;
    kmemcheck_bitfield_end(flags);
    unsigned long           tw_ttd;
    struct inet_bind_bucket *tw_tb;
    struct hlist_node       tw_death_node;
};

The tcp_timewait_state_process function handles most of the processing of incoming packets for TCP connections in the TIME_WAIT state.


/*
 * * Main purpose of TIME-WAIT state is to close connection gracefully,
 *   when one of ends sits in LAST-ACK or CLOSING retransmitting FIN
 *   (and, probably, tail of data) and one or more our ACKs are lost.
 * * What is TIME-WAIT timeout? It is associated with maximal packet
 *   lifetime in the internet, which results in wrong conclusion, that
 *   it is set to catch "old duplicate segments" wandering out of their path.
 *   It is not quite correct. This timeout is calculated so that it exceeds
 *   maximal retransmission timeout enough to allow to lose one (or more)
 *   segments sent by peer and our ACKs. This time may be calculated from RTO.
 * * When TIME-WAIT socket receives RST, it means that another end
 *   finally closed and we are allowed to kill TIME-WAIT too.
 * * Second purpose of TIME-WAIT is catching old duplicate segments.
 *   Well, certainly it is pure paranoia, but if we load TIME-WAIT
 *   with this semantics, we MUST NOT kill TIME-WAIT state with RSTs.
 * * If we invented some more clever way to catch duplicates
 *   (f.e. based on PAWS), we could truncate TIME-WAIT to several RTOs.
 *
 * The algorithm below is based on FORMAL INTERPRETATION of RFCs.
 * When you compare it to RFCs, please, read section SEGMENT ARRIVES
 * from the very beginning.
 *
 * NOTE. With recycling (and later with fin-wait-2) TW bucket
 * is _not_ stateless. It means, that strictly speaking we must
 * spinlock it. I do not want! Well, probability of misbehaviour
 * is ridiculously low and, seems, we could use some mb() tricks
 * to avoid misread sequence numbers, states etc.  --ANK
 */
enum tcp_tw_status
tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
                           const struct tcphdr *th)
{
    struct tcp_options_received tmp_opt;
    u8 *hash_location;
    struct tcp_timewait_sock *tcptw = tcp_twsk((struct sock *)tw);
    int paws_reject = 0;

    tmp_opt.saw_tstamp = 0;
    if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) {
        tcp_parse_options(skb, &tmp_opt, &hash_location, 0);

        if (tmp_opt.saw_tstamp) {
            tmp_opt.ts_recent       = tcptw->tw_ts_recent;
            tmp_opt.ts_recent_stamp = tcptw->tw_ts_recent_stamp;
            paws_reject = tcp_paws_reject(&tmp_opt, th->rst);
        }
    }

    if (tw->tw_substate == TCP_FIN_WAIT2) {
        /* Just repeat all the checks of tcp_rcv_state_process() */

        /* Out of window, send ACK */
        if (paws_reject ||
            !tcp_in_window(TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq,
                           tcptw->tw_rcv_nxt,
                           tcptw->tw_rcv_nxt + tcptw->tw_rcv_wnd))
            return TCP_TW_ACK;

        if (th->rst)
            goto kill;

        if (th->syn && !before(TCP_SKB_CB(skb)->seq, tcptw->tw_rcv_nxt))
            goto kill_with_rst;

        /* Dup ACK? */
        if (!th->ack ||
            !after(TCP_SKB_CB(skb)->end_seq, tcptw->tw_rcv_nxt) ||
            TCP_SKB_CB(skb)->end_seq == TCP_SKB_CB(skb)->seq) {
            inet_twsk_put(tw);
            return TCP_TW_SUCCESS;
        }

        /* New data or FIN. If new data arrive after half-duplex close,
         * reset.
         */
        if (!th->fin ||
            TCP_SKB_CB(skb)->end_seq != tcptw->tw_rcv_nxt + 1) {
kill_with_rst:
            inet_twsk_deschedule(tw, &tcp_death_row);
            inet_twsk_put(tw);
            return TCP_TW_RST;
        }

        /* FIN arrived, enter true time-wait state. */
        tw->tw_substate   = TCP_TIME_WAIT;
        tcptw->tw_rcv_nxt = TCP_SKB_CB(skb)->end_seq;
        if (tmp_opt.saw_tstamp) {
            tcptw->tw_ts_recent_stamp = get_seconds();
            tcptw->tw_ts_recent       = tmp_opt.rcv_tsval;
        }

        /* I am shamed, but failed to make it more elegant.
         * Yes, it is direct reference to IP, which is impossible
         * to generalize to IPv6. Taking into account that IPv6
         * do not understand recycling in any case, it not
         * a big problem in practice. --ANK */
        if (tw->tw_family == AF_INET &&
            tcp_death_row.sysctl_tw_recycle && tcptw->tw_ts_recent_stamp &&
            tcp_v4_tw_remember_stamp(tw))
            inet_twsk_schedule(tw, &tcp_death_row, tw->tw_timeout,
                               TCP_TIMEWAIT_LEN);
        else
            inet_twsk_schedule(tw, &tcp_death_row, TCP_TIMEWAIT_LEN,
                               TCP_TIMEWAIT_LEN);
        return TCP_TW_ACK;
    }

    /*
     *  Now real TIME-WAIT state.
     *
     *  RFC 1122:
     *  "When a connection is [...] on TIME-WAIT state [...]
     *  [a TCP] MAY accept a new SYN from the remote TCP to
     *  reopen the connection directly, if it:
     *
     *  (1)  assigns its initial sequence number for the new
     *  connection to be larger than the largest sequence
     *  number it used on the previous connection incarnation,
     *  and
     *
     *  (2)  returns to TIME-WAIT state if the SYN turns out
     *  to be an old duplicate".
     */

    if (!paws_reject &&
        (TCP_SKB_CB(skb)->seq == tcptw->tw_rcv_nxt &&
         (TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq || th->rst))) {
        /* In window segment, it may be only reset or bare ack. */

        if (th->rst) {
            /* This is TIME_WAIT assassination, in two flavors.
             * Oh well... nobody has a sufficient solution to this
             * protocol bug yet.
             */
            if (sysctl_tcp_rfc1337 == 0) {
kill:
                inet_twsk_deschedule(tw, &tcp_death_row);
                inet_twsk_put(tw);
                return TCP_TW_SUCCESS;
            }
        }
        inet_twsk_schedule(tw, &tcp_death_row, TCP_TIMEWAIT_LEN,
                           TCP_TIMEWAIT_LEN);

        if (tmp_opt.saw_tstamp) {
            tcptw->tw_ts_recent       = tmp_opt.rcv_tsval;
            tcptw->tw_ts_recent_stamp = get_seconds();
        }

        inet_twsk_put(tw);
        return TCP_TW_SUCCESS;
    }

    /* Out of window segment.

       All the segments are ACKed immediately.

       The only exception is new SYN. We accept it, if it is
       not old duplicate and we are not in danger to be killed
       by delayed old duplicates. RFC check is that it has
       newer sequence number works at rates <40Mbit/sec.
       However, if paws works, it is reliable AND even more,
       we even may relax silly seq space cutoff.

       RED-PEN: we violate main RFC requirement, if this SYN will appear
       old duplicate (i.e. we receive RST in reply to SYN-ACK),
       we must return socket to time-wait state. It is not good,
       but not fatal yet.
     */

    if (th->syn && !th->rst && !th->ack && !paws_reject &&
        (after(TCP_SKB_CB(skb)->seq, tcptw->tw_rcv_nxt) ||
         (tmp_opt.saw_tstamp &&
          (s32)(tcptw->tw_ts_recent - tmp_opt.rcv_tsval) < 0))) {
        u32 isn = tcptw->tw_snd_nxt + 65535 + 2;
        if (isn == 0)
            isn++;
        TCP_SKB_CB(skb)->when = isn;
        return TCP_TW_SYN;
    }

    if (paws_reject)
        NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_PAWSESTABREJECTED);

    if (!th->rst) {
        /* In this case we must reset the TIMEWAIT timer.
         *
         * If it is ACKless SYN it may be both old duplicate
         * and new good SYN with random sequence number <rcv_nxt.
         * Do not reschedule in the last case.
         */
        if (paws_reject || th->ack)
            inet_twsk_schedule(tw, &tcp_death_row, TCP_TIMEWAIT_LEN,
                               TCP_TIMEWAIT_LEN);

        /* Send ACK. Note, we do not put the bucket,
         * it will be released by caller.
         */
        return TCP_TW_ACK;
    }
    inet_twsk_put(tw);
    return TCP_TW_SUCCESS;
}
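
The before/after tests and the (s32) casts used throughout these listings compare 32-bit sequence numbers and timestamps modulo 2^32. The sketch below reproduces that arithmetic in user space together with the rule for accepting a new SYN in TIME_WAIT. The function names are hypothetical, the flag and PAWS-reject checks from the kernel condition are omitted, and the unsigned-to-signed cast assumes the usual two's-complement behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a comes before b, allowing for 32-bit wraparound. */
static inline bool seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* True if sequence number a comes after b. */
static inline bool seq_after(uint32_t a, uint32_t b)
{
    return seq_before(b, a);
}

/*
 * Hypothetical helper: would a bare SYN be accepted as a new connection
 * while in TIME_WAIT? Either its sequence number is beyond what the old
 * connection expected, or its timestamp is newer than the last one seen.
 */
bool tw_accepts_new_syn(uint32_t syn_seq, uint32_t tw_rcv_nxt,
                        bool saw_tstamp, uint32_t tw_ts_recent,
                        uint32_t rcv_tsval)
{
    return seq_after(syn_seq, tw_rcv_nxt) ||
           (saw_tstamp && (int32_t)(tw_ts_recent - rcv_tsval) < 0);
}

/* Initial sequence number for the reopened connection, never zero. */
uint32_t tw_next_isn(uint32_t tw_snd_nxt)
{
    uint32_t isn = tw_snd_nxt + 65535 + 2;

    if (isn == 0)
        isn++;
    assert(isn != 0);
    return isn;
}
```

For instance, seq_after(5, 0xFFFFFFF0) is true because 5 lies just past the wraparound point, and tw_next_isn skips the value 0 when the addition wraps exactly.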
