TCP-Nagle: Re-explaining the Nagle Algorithm Through the Code

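My first task of the new year was to apply patches to the latest kernel.

What I had not expected was that I had thrown the Nagle algorithm out with last year's rubbish as well.

I found some material online and digested the theory quickly, but when I looked at the kernel implementation I was stuck for a long time. It took a full day of sitting there to work something out, and I came away feeling the need for an explanation of TCP-Nagle from the code's point of view. Combined with the usual black-box explanations of the Nagle algorithm, that should give a complete picture of TCP-Nagle.

Hence this article.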

The Nagle Algorithm

The usual description of the Nagle algorithm (originally specified in RFC 896)

Nagle’s algorithm, named after John Nagle, is a means of improving the efficiency of TCP/IP networks
by reducing the number of packets that need to be sent over the network.
Nagle’s algorithm works by combining a number of small outgoing messages, and sending them
all at once. Specifically, as long as there is a sent packet for which the sender has received no
acknowledgment, the sender should keep buffering its output until it has a full packet’s worth of output,
so that output can be sent all at once.
Nagle’s algorithm purposefully delays transmission, increasing bandwidth efficiency at the expense of latency.

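In other words: TCP-Nagle avoids sending large numbers of small packets (it holds small packets back and coalesces them into larger ones), which reduces the number of transmissions at the cost of some latency.

As for why it works this way, the experts' articles linked at the end cover it in detail and are well worth reading.

The logic of TCP-Nagle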

if there is new data to send
    if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    else
        if there is unconfirmed data still in the pipe
            enqueue data in the buffer until an acknowledge is received
        else
            send data immediately
        end if
    end if
end if
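The pseudocode above can be written as a small decision function. This is only a user-space sketch of the textbook rule, not kernel code; the enum and the name nagle_classic_decide are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum nagle_action {
	NAGLE_IDLE,		/* nothing to send */
	NAGLE_SEND_FULL,	/* send one complete MSS-sized segment now */
	NAGLE_QUEUE,		/* keep buffering until an ACK arrives */
	NAGLE_SEND_NOW		/* send whatever is pending immediately */
};

/* One pass through the pseudocode for the current connection state. */
static enum nagle_action nagle_classic_decide(size_t window, size_t pending,
					      size_t mss, bool unacked_in_flight)
{
	assert(mss > 0);
	if (pending == 0)
		return NAGLE_IDLE;
	if (window >= mss && pending >= mss)
		return NAGLE_SEND_FULL;
	if (unacked_in_flight)
		return NAGLE_QUEUE;
	return NAGLE_SEND_NOW;
}
```

With an MSS of 1460, 100 pending bytes and an unacknowledged segment in flight, the function returns NAGLE_QUEUE; once nothing is in flight any more, the same call returns NAGLE_SEND_NOW.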

TCP-Nagle illustrated

[Figure: TCP-Nagle diagram]

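The diagram follows the latest kernel implementation exactly.

As it shows, the implementation is short and compact. Let's go through it in detail.

TCP-Nagle in the code

Explicit TCP-Nagle flags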

/* Flags in tp->nonagle */
#define TCP_NAGLE_OFF		1	/* Nagle's algo is disabled */
#define TCP_NAGLE_CORK		2	/* Socket is corked	    */
#define TCP_NAGLE_PUSH		4	/* Cork is overridden for already queued data */

When none of these three flags is set, nonagle = 0.
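From user space these flags are normally driven through the TCP_NODELAY and TCP_CORK socket options. That mapping is my understanding of the kernel and is not shown in the excerpts here, so treat the following as a sketch; the helper names set_nodelay, get_nodelay and set_cork are made up.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Turn Nagle off (on != 0) or back on (on == 0) for a TCP socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
static int set_nodelay(int fd, int on)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Read back the current TCP_NODELAY setting: 1, 0, or -1 on error. */
static int get_nodelay(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) < 0)
		return -1;
	return val != 0;
}

#ifdef TCP_CORK
/* Linux-specific: cork (on != 0) or uncork (on == 0) the socket. */
static int set_cork(int fd, int on)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}
#endif
```

A freshly created TCP socket is expected to report TCP_NODELAY as 0, which corresponds to the nonagle = 0 case described above.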

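TCP-Nagle explained through the code

TCP-Nagle only applies to new segments, which is why it is implemented only in tcp_write_xmit.
Before calling tcp_transmit_skb, TCP first decides whether now is the right time to send the segment (a reliability check). TCP-Nagle is one of the factors that influence this decision; its purpose is to reduce the number of small packets sent.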

	if (tso_segs == 1) {
		if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
					     (tcp_skb_is_last(sk, skb) ?
					      nonagle : TCP_NAGLE_PUSH))))
			break;
	} else {
		if (!push_one &&
		    tcp_tso_should_defer(sk, skb, &is_cwnd_limited,
					 &is_rwnd_limited, max_segs))
			break;
	}

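In fact, tcp_nagle_test and tcp_tso_should_defer both implement TCP-Nagle. tcp_tso_should_defer covers the case where GSO/TSO is involved; apart from more intricate details, it follows the same reasoning as tcp_nagle_test, so only tcp_nagle_test is covered here.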

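One detail: tcp_skb_is_last(sk, skb) ? nonagle : TCP_NAGLE_PUSH ensures that Nagle only affects the last skb in the send queue sk->sk_write_queue, because only the last skb can still have data coalesced into it. The comment inside tcp_nagle_test says the same.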
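To make the caller's choice concrete, here is a minimal user-space sketch of that expression. The function effective_nonagle and its index/queue_len parameters are hypothetical stand-ins for tcp_skb_is_last(sk, skb); only the TCP_NAGLE_PUSH value is taken from the kernel flags listed earlier.

```c
#include <stdbool.h>
#include <stddef.h>

#define TCP_NAGLE_PUSH 4	/* same value as in the kernel flags */

/* Mirrors the argument built by the caller:
 * tcp_skb_is_last(sk, skb) ? nonagle : TCP_NAGLE_PUSH */
static int effective_nonagle(size_t index, size_t queue_len, int nonagle)
{
	bool is_last = (index + 1 == queue_len);

	return is_last ? nonagle : TCP_NAGLE_PUSH;
}
```

For a queue of three segments with nonagle = 0, the first two get TCP_NAGLE_PUSH and therefore bypass Nagle; only the third is passed the socket's real setting.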

Next, let's look at tcp_nagle_test, where most of the TCP-Nagle work happens.

/* Return true if the Nagle test allows this packet to be
 * sent now.
 */
/* That is: true means TCP-Nagle does not hold the packet back;
 * false means Nagle holds it back for coalescing.
 */
static inline bool tcp_nagle_test(const struct tcp_sock *tp, const struct sk_buff *skb,
				  unsigned int cur_mss, int nonagle)
{
	/* Nagle rule does not apply to frames, which sit in the middle of the
	 * write_queue (they have no chances to get new data).
	 *
	 * This is implemented in the callers, where they modify the 'nonagle'
	 * argument based upon the location of SKB in the send queue.
	 */
	if (nonagle & TCP_NAGLE_PUSH)
		return true;

	/* Don't use the nagle rule for urgent data (or for the final FIN). */
	if (tcp_urg_mode(tp) || (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
		return true;

	if (!tcp_nagle_check(skb->len < cur_mss, tp, nonagle))
		return true;

	return false;
}

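The implementation of tcp_nagle_test shows that TCP-Nagle is subject to quite a few conditions.

The first ones are straightforward: TCP_NAGLE_PUSH, tcp_urg_mode(tp) and (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) all mean the packet must be sent immediately.

Most misreadings of TCP-Nagle arise in tcp_nagle_check(skb->len < cur_mss, tp, nonagle).

So tcp_nagle_check deserves a closer look.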

/* Return false, if packet can be sent now without violation Nagle's rules:
 * 1. It is full sized. (provided by caller in %partial bool)
 * 2. Or it contains FIN. (already checked by caller)
 * 3. Or TCP_CORK is not set, and TCP_NODELAY is set.
 * 4. Or TCP_CORK is not set, and all sent packets are ACKed.
 *    With Minshall's modification: all sent small packets are ACKed.
 */
static bool tcp_nagle_check(bool partial, const struct tcp_sock *tp,
			    int nonagle)
{
	return partial &&
		((nonagle & TCP_NAGLE_CORK) ||
		 (!nonagle && tp->packets_out && tcp_minshall_check(tp)));
}

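Yet another pile of conditions. This is where care is needed, so let's go through them one by one.

The first parameter of tcp_nagle_check is bool partial, and the argument passed in above is skb->len < cur_mss. So the first condition for TCP-Nagle to take effect is that the total data length of the skb is less than the MSS; conversely, as soon as the skb holds MSS bytes or more, the packet is sent at once.

Strictly speaking, then, TCP-Nagle is not a pure per-packet stop-and-wait protocol.

Now for the case below the MSS: the remaining conditions that enable TCP-Nagle once the skb's data length is less than the MSS.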
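A quick piece of arithmetic shows what the partial test means for a single write. This sketch ignores TSO/GSO and assumes the write is cut into MSS-sized pieces; struct split, split_by_mss and is_partial are made-up helpers, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: how a write of `bytes` splits at a given MSS. */
struct split {
	size_t full_segments;	/* segments of exactly mss bytes */
	size_t tail;		/* leftover bytes, 0 if none */
};

static struct split split_by_mss(size_t bytes, size_t mss)
{
	struct split s;

	assert(mss > 0);
	s.full_segments = bytes / mss;
	s.tail = bytes % mss;
	return s;
}

/* Same comparison the caller feeds into tcp_nagle_check(): len < cur_mss. */
static bool is_partial(size_t len, size_t mss)
{
	return len < mss;
}
```

A 3000-byte write with an MSS of 1460 yields two full segments and an 80-byte tail. Under this model the two full segments are never held back by Nagle; only the 80-byte tail is partial and can be delayed.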

((nonagle & TCP_NAGLE_CORK) ||(!nonagle && tp->packets_out && tcp_minshall_check(tp)))

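The first condition, (nonagle & TCP_NAGLE_CORK): the TCP_NAGLE_CORK flag explicitly forces TCP-Nagle on.

Without TCP_NAGLE_CORK, the second condition can still enable TCP-Nagle. (!nonagle && tp->packets_out && tcp_minshall_check(tp)) means:

1. !nonagle: TCP_NAGLE_OFF is not set. TCP_NAGLE_PUSH and TCP_NAGLE_CORK have already been ruled out earlier, so the only explicit flag left is TCP_NAGLE_OFF; in every other case nonagle is 0.
2. tp->packets_out: the number of packets in flight. This is the strict condition: when tp->packets_out is 0, everything sent has been acknowledged and the packet must go out immediately; when tp->packets_out > 0, there are packets in flight and TCP-Nagle can apply.
3. tcp_minshall_check(tp): a small packet is still in flight. This is the relaxed condition: as long as no small packet is in flight, the pending packet is sent immediately, whether or not other packets are still unacknowledged; when a small packet is in flight, TCP-Nagle is enabled.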
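The article does not show the body of tcp_minshall_check. The sketch below is a user-space model based on my reading of the kernel, where the check compares snd_sml (the end of the last small packet sent) against snd_una and snd_nxt; treat the field names as an assumption and verify them against your own tree. struct seq_state, seq_after and minshall_check are made-up names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "seq1 is after seq2" for 32-bit sequence numbers. */
static bool seq_after(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq2 - seq1) < 0;
}

/* Hypothetical stand-in for the three tcp_sock fields involved. */
struct seq_state {
	uint32_t snd_una;	/* oldest unacknowledged byte */
	uint32_t snd_nxt;	/* next byte to be sent */
	uint32_t snd_sml;	/* end of the last small packet sent */
};

/* True while the last small packet has not been fully acknowledged. */
static bool minshall_check(const struct seq_state *st)
{
	return seq_after(st->snd_sml, st->snd_una) &&
	       !seq_after(st->snd_sml, st->snd_nxt);
}
```

If a small packet ended at sequence 1100 and snd_una is still 1000, the check is true and a new partial segment is held; once snd_una advances to 1100 the check turns false and the segment can go out.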

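A small packet in flight necessarily means there are packets in flight; with no small packet in flight, the pending data is sent at once whether or not other packets are in flight. So condition 2 is the strict (stronger) form of the rule and condition 3 the relaxed form, and because the two are ANDed together, condition 3 is the one that decides in practice.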
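Putting the two functions together, the whole decision can be modelled in user space with plain booleans. This is a sketch for experimenting with the conditions, not kernel code; struct nagle_inputs and the model_ functions are made up, and each field stands for the kernel expression named in its comment.

```c
#include <stdbool.h>

#define TCP_NAGLE_OFF	1
#define TCP_NAGLE_CORK	2
#define TCP_NAGLE_PUSH	4

/* Hypothetical snapshot of the inputs the two kernel functions look at. */
struct nagle_inputs {
	bool partial;		/* skb->len < cur_mss */
	bool urgent_or_fin;	/* tcp_urg_mode() or FIN flag set */
	bool packets_out;	/* some segment is still unacknowledged */
	bool small_unacked;	/* result of tcp_minshall_check() */
	int nonagle;		/* flags as passed by the caller */
};

/* Model of tcp_nagle_check(): true means "hold the segment back". */
static bool model_nagle_check(const struct nagle_inputs *in)
{
	return in->partial &&
	       ((in->nonagle & TCP_NAGLE_CORK) ||
		(!in->nonagle && in->packets_out && in->small_unacked));
}

/* Model of tcp_nagle_test(): true means "may be sent now". */
static bool model_nagle_test(const struct nagle_inputs *in)
{
	if (in->nonagle & TCP_NAGLE_PUSH)
		return true;
	if (in->urgent_or_fin)
		return true;
	return !model_nagle_check(in);
}
```

With nonagle = 0, a partial segment is held only while small_unacked is true; setting small_unacked to false releases it even if packets_out is still true. A corked socket holds a partial segment even when nothing is in flight.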

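Many blog posts say that delayed ACK combined with the Nagle algorithm increases latency. That is correct, but a black-box description can mislead readers into thinking that a single delayed ACK is what makes TCP-Nagle hold a packet back.

In fact, conditions 2 and 3 spell it out: once all outstanding segments have been acknowledged, or once no small packet is left in flight, the data is sent immediately.

Put differently: when conditions 2 and 3 are evaluated, it may take several delayed ACKs before TCP-Nagle lets go. A single delayed ACK already adds delay; when the network has a lot of capacity, several delayed ACKs only make the lag worse, and TCP-Nagle becomes more of a burden than a help. (TCP-Nagle looks at the packets in flight; it does not look at ACKs directly.)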
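The cost can be estimated with a very rough timing model of a "write, write, read" exchange. It assumes the first write produced a small segment, the receiver has no data to piggyback its ACK on, and it waits the full ACK delay; struct timing and both functions are hypothetical, and the numbers in the usage note are made up.

```c
/* Hypothetical model; all times in milliseconds from the first write. */
struct timing {
	unsigned rtt_ms;	/* round-trip time */
	unsigned delack_ms;	/* how long the receiver delays its ACK */
	unsigned gap_ms;	/* time between the two small writes */
};

/* With Nagle enabled the second small segment waits for the ACK of the
 * first one, which arrives after one RTT plus the receiver's ACK delay,
 * unless the second write itself happens later than that. */
static unsigned second_send_nagle(const struct timing *t)
{
	unsigned ack_arrival = t->rtt_ms + t->delack_ms;

	return t->gap_ms > ack_arrival ? t->gap_ms : ack_arrival;
}

/* With Nagle disabled the segment leaves as soon as it is written. */
static unsigned second_send_nodelay(const struct timing *t)
{
	return t->gap_ms;
}
```

With an RTT of 10 ms, an ACK delay of 40 ms and the second write issued 1 ms after the first, the second segment leaves at 50 ms with Nagle and at 1 ms without it.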

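Conclusion

TCP-Nagle is actually fairly simple compared with other mechanisms, but pure theory, the black-box view, does not explain it thoroughly. So I have written this first white-box walkthrough, and I hope it will not be the last.

The blog posts I drew on: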
1.https://blog.csdn.net/zhangskd/article/details/7712002?ops_request_misc=&request_id=&biz_id=102&utm_term=nagle%20zhangskd&utm_medium=distribute.pc_search_result.none-task-blog-2_allsobaiduweb~default-0-7712002.first_rank_v2_pc_rank_v29&spm=1018.2226.3001.4187

2.https://blog.csdn.net/wdscq1234/article/details/52432095?ops_request_misc=&request_id=&biz_id=102&utm_term=nagle算法&utm_medium=distribute.pc_search_result.none-task-blog-2_allsobaiduweb~default-1-52432095.first_rank_v2_pc_rank_v29&spm=1018.2226.3001.4187

3.https://blog.csdn.net/dog250/article/details/21303679?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522164432001516780271977764%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fall.%2522%257D&request_id=164432001516780271977764&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2_allfirst_rank_ecpm_v1~rank_v31_ecpm-2-21303679.first_rank_v2_pc_rank_v29&utm_term=dog250+nagle算法&spm=1018.2226.3001.4187