本文转自:http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html

When an application puts a socket into LISTEN state using the listen syscall, it needs to specify a backlog for that socket. The backlog is usually described as the limit for the queue of incoming connections.

TCP state diagram

1、The implementation uses a single queue, the size of which is determined by the backlog argument of the listen syscall. When a SYN packet is received, it sends back a SYN/ACK packet and adds the connection to the queue. When the corresponding ACK is received, the connection changes its state to ESTABLISHED and becomes eligible for handover to the application. This means that the queue can contain connections in two different state: SYN RECEIVED and ESTABLISHED. Only connections in the latter state can be returned to the application by the accept syscall.

2、The implementation uses two queues, a SYN queue (or incomplete connection queue) and an accept queue (or complete connection queue). Connections in state SYN RECEIVED are added to the SYN queue and later moved to the accept queue when their state changes to ESTABLISHED, i.e. when the ACK packet in the 3-way handshake is received. As the name implies, the accept call is then implemented simply to consume connections from the accept queue. In this case, the backlog argument of the listen syscall determines the size of the accept queue.

Historically, BSD derived TCP implementations use the first approach. That choice implies that when the maximum backlog is reached, the system will no longer send back SYN/ACK packets in response to SYN packets. Usually the TCP implementation will simply drop the SYN packet (instead of responding with a RST packet) so that the client will retry. This is what is described in section 14.5, listen Backlog Queue in W. Richard Stevens’ classic textbook TCP/IP Illustrated, Volume 3.

Note that Stevens actually explains that the BSD implementation does use two separate queues, but they behave as a single queue with a fixed maximum size determined by (but not necessary exactly equal to) the backlog argument, i.e. BSD logically behaves as described in option 1:

The queue limit applies to the sum of […] the number of entries on the incomplete connection queue […] and […] the number of entries on the completed connection queue […].

On Linux, things are different, as mentioned in the man page of the listen syscall:

The behavior of the backlog argument on TCP sockets changed with Linux 2.2. Now it specifies the queue length for completely established sockets waiting to be accepted, instead of the number of incomplete connection requests. The maximum length of the queue for incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog.

This means that current Linux versions use the second option with two distinct queues: a SYN queue with a size specified by a system wide setting and an accept queue with a size specified by the application.

The interesting question is now how such an implementation behaves if the accept queue is full and a connection needs to be moved from the SYN queue to the accept queue, i.e. when the ACK packet of the 3-way handshake is received. This case is handled by the tcp_check_req function in net/ipv4/tcp_minisocks.c. The relevant code reads:

child = inet_csk(sk)->icsk_af_ops->syn_recv_sock(sk, skb, req, NULL);
if (child == NULL)goto listen_overflow;

For IPv4, the first line of code will actually call tcp_v4_syn_recv_sock in net/ipv4/tcp_ipv4.c, which contains the following code:

if (sk_acceptq_is_full(sk))goto exit_overflow;

We see here the check for the accept queue. The code after the exit_overflow label will perform some cleanup, update the ListenOverflows and ListenDrops statistics in /proc/net/netstat and then return NULL. This will trigger the execution of the listen_overflow code in tcp_check_req:

listen_overflow:if (!sysctl_tcp_abort_on_overflow) {inet_rsk(req)->acked = 1;return NULL;}

This means that unless /proc/sys/net/ipv4/tcp_abort_on_overflow is set to 1 (in which case the code right after the code shown above will send a RST packet), the implementation basically does… nothing!

To summarize, if the TCP implementation in Linux receives the ACK packet of the 3-way handshake and the accept queue is full, it will basically ignore that packet. At first, this sounds strange, but remember that there is a timer associated with the SYN RECEIVED state: if the ACK packet is not received (or if it is ignored, as in the case considered here), then the TCP implementation will resend the SYN/ACK packet (with a certain number of retries specified by /proc/sys/net/ipv4/tcp_synack_retries and using an exponential backoff algorithm).

This can be seen in the following packet trace for a client attempting to connect (and send data) to a socket that has reached its maximum backlog:

  0.000  127.0.0.1 -> 127.0.0.1  TCP 74 53302 > 9999 [SYN] Seq=0 Len=00.000  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=00.000  127.0.0.1 -> 127.0.0.1  TCP 66 53302 > 9999 [ACK] Seq=1 Ack=1 Len=00.000  127.0.0.1 -> 127.0.0.1  TCP 71 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=50.207  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=50.623  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=51.199  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=01.199  127.0.0.1 -> 127.0.0.1  TCP 66 [TCP Dup ACK 6#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=01.455  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=53.123  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=53.399  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=03.399  127.0.0.1 -> 127.0.0.1  TCP 66 [TCP Dup ACK 10#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=06.459  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=57.599  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=07.599  127.0.0.1 -> 127.0.0.1  TCP 66 [TCP Dup ACK 13#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=013.131  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=515.599  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=015.599  127.0.0.1 -> 127.0.0.1  TCP 66 [TCP Dup ACK 16#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=026.491  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=531.599  127.0.0.1 -> 127.0.0.1  TCP 74 9999 > 53302 [SYN, ACK] Seq=0 Ack=1 Len=031.599  127.0.0.1 -> 127.0.0.1  TCP 66 [TCP Dup ACK 19#1] 53302 > 9999 [ACK] Seq=6 Ack=1 Len=053.179  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
106.491  127.0.0.1 -> 127.0.0.1  TCP 71 [TCP Retransmission] 53302 > 9999 [PSH, ACK] Seq=1 Ack=1 Len=5
106.491  127.0.0.1 -> 127.0.0.1  TCP 54 9999 > 53302 [RST] Seq=1 Len=0

Since the TCP implementation on the client side gets multiple SYN/ACK packets, it will assume that the ACK packet was lost and resend it (see the lines with TCP Dup ACK in the above trace). If the application on the server side reduces the backlog (i.e. consumes an entry from the accept queue) before the maximum number of SYN/ACK retries has been reached, then the TCP implementation will eventually process one of the duplicate ACKs, transition the state of the connection from SYN RECEIVED to ESTABLISHED and add it to the accept queue. Otherwise, the client will eventually get a RST packet (as in the sample shown above).

The packet trace also shows another interesting aspect of this behavior. From the point of view of the client, the connection will be in state ESTABLISHED after reception of the first SYN/ACK. If it sends data (without waiting for data from the server first), then that data will be retransmitted as well. Fortunately TCP slow-start should limit the number of segments sent during this phase.

On the other hand, if the client first waits for data from the server and the server never reduces the backlog, then the end result is that on the client side, the connection is in state ESTABLISHED, while on the server side, the connection is considered CLOSED. This means that we end up with a half-open connection!

There is one other aspect that we didn’t discuss yet. The quote from the listen man page suggests that every SYN packet would result in the addition of a connection to the SYN queue (unless that queue is full). That is not exactly how things work. The reason is the following code in the tcp_v4_conn_request function (which does the processing of SYN packets) in net/ipv4/tcp_ipv4.c:

/* Accept backlog is full. If we have already queued enough* of warm entries in syn queue, drop request. It is better than* clogging syn queue with openreqs with exponentially increasing* timeout.*/
if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1) {NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);goto drop;
}

What this means is that if the accept queue is full, then the kernel will impose a limit on the rate at which SYN packets are accepted. If too many SYN packets are received, some of them will be dropped. In this case, it is up to the client to retry sending the SYN packet and we end up with the same behavior as in BSD derived implementations.

To conclude, let’s try to see why the design choice made by Linux would be superior to the traditional BSD implementation. Stevens makes the following interesting point:

1、The backlog can be reached if the completed connection queue fills
(i.e., the server process or the server host is so busy that the
process cannot call accept fast enough to take the completed entries
off the queue) or if the incomplete connection queue fills. The latter
is the problem that HTTP servers face, when the round-trip time
between the client and server is long, compared to the arrival rate of
new connection requests, because a new SYN occupies an entry on this
queue for one round-trip time. […]

2、The completed connection queue is almost always empty because when an
entry is placed on this queue, the server’s call to accept returns,
and the server takes the completed connection off the queue.

The solution suggested by Stevens is simply to increase the backlog. The problem with this is that it assumes that an application is expected to tune the backlog not only taking into account how it intents to process newly established incoming connections, but also in function of traffic characteristics such as the round-trip time. The implementation in Linux effectively separates these two concerns: the application is only responsible for tuning the backlog such that it can call accept fast enough to avoid filling the accept queue); a system administrator can then tune /proc/sys/net/ipv4/tcp_max_syn_backlog based on traffic characteristics.

linux backlog详解相关推荐

  1. Linux Socket详解 大全 基础知识

    1. Socket基础概念: 1.1:形象类比: Socket和电话网络的概念可以做一个很好的类比: Linux 编程中所说的socket就如同一个端点,类比到电话网中,它就如同一个电话机. 而Soc ...

  2. 《Linux命令详解手册》——Linux畅销书作家又一力作

    关注IT,更要关心IT人,让系统管理员以及程序员工作得更加轻松和快乐.鉴于此, 图灵公司引进了国外知名出版社John Wiley and Sons出版的Fedora Linux Toolbox: 10 ...

  3. Linux系统详解 系统的启动、登录、注销与开关机

    Linux系统详解 第六篇:系统的启动.登录.注销与开关机 原创作品,允许转载,转载时请务必以超链接形式标明文章 原始出处 .作者信息和本声明.否则将追究法律责任.http://johncai.blo ...

  4. 每天一个linux命令(25):linux文件属性详解

    每天一个linux命令(25):linux文件属性详解 Linux 文件或目录的属性主要包括:文件或目录的节点.种类.权限模式.链接数量.所归属的用户和用户组.最近访问或修改的时间等内容.具体情况如下 ...

  5. c linux time微秒_学习linux,看这篇1.5w多字的linux命令详解(6小时讲明白Linux)

    用心分享,共同成长 没有什么比每天进步一点点更重要了 本篇文章主要讲解了一些linux常用命令,主要讲解模式是,命令介绍.命令参数格式.命令参数.命令常用参数示例.由于linux命令较多,我还特意选了 ...

  6. Linux系统结构 详解

    Linux系统结构 详解 标签: 产品产品设计googleapple互联网 2011-01-07 14:14 31038人阅读 评论(6) 收藏 举报 分类: Linux(21) 版权声明:本文为博主 ...

  7. 《嵌入式Linux软硬件开发详解——基于S5PV210处理器》——2.2 DDR2 SDRAM芯片

    本节书摘来自异步社区<嵌入式Linux软硬件开发详解--基于S5PV210处理器>一书中的第2章,第2.2节,作者 刘龙,更多章节内容可以访问云栖社区"异步社区"公众号 ...

  8. linux系统服务详解 用于Linux系统服务优化

    linux系统服务详解 用于Linux系统服务优化 服务名        必需(是/否)用途描述        注解 acon              否       语言支持        特别支 ...

  9. linux /proc 详解

    linux /proc 详解 本文整理了一下 linux /proc下的几个常用的目录和文件,可供查阅,之后在学习工作中有别的用到的话会再补充. /proc 简介 Linux系统上的/proc目录是一 ...

最新文章

  1. 更换ubuntu软件源的方法
  2. VMware 创建开启虚拟机时候报错的解决方式
  3. mybatis insert 重复数据2条_Mybatis框架lt;增gt;:添加一条数据到数据库中,insert...
  4. 河南理工大学计算机学院课表,河南理工大学实验课课程表.doc
  5. 列表理解与lambda +过滤器
  6. Centos7安装SCIP with AMPL
  7. eclipse 快捷调整字体_eclipse字体大小设置快捷键
  8. 【bb平台刷课记】wireshark结合实例学抓包
  9. Wget下载网页与镜像网站
  10. eclipse安装插件速度很慢的解决方案
  11. 联想微型计算机m4350q升级,拆解:高度集成化的联想M4350q
  12. 计算机用几个字节储存,一个文字在计算机中用两个字节来储存。()
  13. Date的getDay()和getDate()的区别:
  14. 以逗号为分隔符对字符串进行分隔
  15. PHPStorm 配置 debug 默认参数
  16. 广义表的头尾链表存储表示(第五章 P115 算法5.5,5.6,5.8)
  17. CentOS7 下MariaDB安装与简单配置(最新)
  18. 新⼀代USDP开源套件,可替代CDH的免费大数据套件平台及架构选型
  19. 相位差和相移理论知识概括
  20. 拥抱开源,我们是认真的-网易易数2020年Apache Spark贡献总结

热门文章

  1. Robot Localization AMCL原理以及代码
  2. iMeta | 复旦大学附属华山医院证实肾上腺皮质癌中瘤内菌的存在并与预后相关...
  3. 查询关键字并显示标红(html+js)
  4. poj1273(网络流模板)
  5. Ext表单组件之textField
  6. HDU-1846-Brave Game(巴什博弈)
  7. Java后台实现网站微信扫码登录功能,获取用户openid,及微信用户信息(小程序码方案),关联微信小程序(个人主体小程序也可以)
  8. 点到反比例函数最短距离怎么求_谁教教我反比例函数距离公式?
  9. 如何使用 YOLOv5 训练自己的数据集
  10. hrtf 旋转音效matlab实现