Spring Boot 2.0 became generally available in early 2018 and was well received by many developers. Its reactive stack uses Netty as the default web container, which indicates that the “reactive” model is already the trend of the times. Whether it is the coroutines of the Go language or the reactor thread model in Java, both achieve high concurrency through event-driven programming. Initially, I wanted to introduce NIO in detail, but I realized that before introducing NIO, it is necessary to understand the TCP protocol. Most developers currently work at the application layer, so a lot of network details are hidden. Knowing these details and principles is helpful for troubleshooting.
This article introduces TCP in detail, including its underlying principle and application architectures, and discusses how to use it to build high-performance servers.
TCP is a connection-oriented protocol that provides a reliable full-duplex byte stream to user processes. It ensures that data packets arrive reliably and in order, and it supports flow control. We start from the following aspects to explain why and how TCP does this:
- Why the IP network layer does not ensure the reliability of data packets
- How TCP ensures that data packets arrive and stay in order
- How TCP supports flow control
- TCP states and their application
OSI Network Layers
To understand why the IP network layer does not ensure the reliability of data packets, let’s look at the OSI network layers first. TCP is located at the transport layer, which ensures the reliability and continuity of the communication. The actual sending and receiving of packets is handled by the underlying link layer and physical layer, so TCP builds on these lower layers and improves on what they provide.
The communication between the client and the server uses the application protocol. The communication at the transport layer uses TCP, while TCP uses the lower layer IP, and IP uses some form of data link layer to communicate.
We know that data in the network is ultimately transmitted across multiple routers. The underlying Ethernet protocol specifies how electronic signals form data packets, which solves the point-to-point communication problem within a local area network (LAN), but it cannot solve the problem of communication between multiple LANs.
The IP protocol used by the network layer defines its own set of address rules, which mainly solve the addressing and routing problems: finding the best path to transmit information according to the IP address of the other party. LANs are connected through routers, which, based on the IP protocol, forward each packet to a particular interface. However, the IP protocol does not ensure the arrival and integrity of packets. In particular, when the network is congested, some packets are discarded to keep data moving.
Ensuring the integrity, orderliness, and reliability of data packets is the job of TCP.
Deep Dive into TCP
Composition of the TCP Packet
Many networks have a maximum transmission unit (MTU), which limits the size of data frames at the link layer. For example, the MTU is 1,500 bytes over Ethernet. When an IP datagram is transmitted over Ethernet and its length is greater than the MTU, it must be split into fragments so that each fragment is no larger than the MTU.
In addition, a data packet also carries header information: an Ethernet header and an IP header in addition to its own TCP header. The IP header takes at least 20 bytes of the payload of an Ethernet frame, so the payload of an IP packet is at most 1,480 bytes.
So what is the size of a TCP packet?
This is determined by the MSS (maximum segment size), a TCP concept carried in the options field of the header. The MSS is the largest amount of data that one TCP packet can carry. When the data to send is larger than the MSS, it must be transmitted in segments. If the MSS is not set, the default value is 536 bytes. That is to say, a TCP packet carries about 500 bytes by default.
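The arithmetic above can be sketched in a few lines of Python. This is only an illustration: it assumes the minimum 20-byte IP and TCP headers with no options, and the helper names `mss_for_mtu` and `segment` are made up for this example.

```python
ETHERNET_MTU = 1500   # bytes, the Ethernet MTU quoted above
IP_HEADER = 20        # minimum IPv4 header (assumption: no IP options)
TCP_HEADER = 20       # minimum TCP header (assumption: no TCP options)
DEFAULT_MSS = 536     # default used when no MSS option is exchanged


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one frame of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def segment(data, mss):
    """Split an application byte stream into chunks of at most `mss` bytes."""
    return [data[i:i + mss] for i in range(0, len(data), mss)]


payload = bytes(3000)                      # 3,000 bytes of application data
ethernet_mss = mss_for_mtu(ETHERNET_MTU)   # 1460
ethernet_sizes = [len(s) for s in segment(payload, ethernet_mss)]
default_sizes = [len(s) for s in segment(payload, DEFAULT_MSS)]

print(ethernet_mss, ethernet_sizes)   # 1460 [1460, 1460, 80]
print(default_sizes)                  # [536, 536, 536, 536, 536, 320]
```

With the default MSS, the same 3,000 bytes need six packets instead of three, which is why both ends usually announce a larger MSS when the link allows it.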
As mentioned above, the underlying router does not ensure the reliability and orderliness of the packet when forwarding it.
First, to ensure the integrity of packets, TCP splits data larger than the MSS into MSS-sized packets. The default MSS is 536 bytes, which is smaller than the MTU and prevents packets from being fragmented at the network layer.
Secondly, SEQ and ACK numbers are added, and a timeout retransmission mechanism is adopted to ensure the reliability of packets.
To ensure the orderliness of packets, TCP allocates a sequence number (SEQ) for each packet. In this way, the receiving party can restore the packets in sequence. In case of a packet loss, it is also possible to know which packet is lost. Generally, the SEQ of the first packet is a random number, which can also start from 1.
Now that the SEQ has been assigned, how do we ensure that the packet arrives?
This is determined based on the ACK. Each time a packet is received, the receiver must return an ACK so that the sender can confirm that the packet has arrived. In addition, the receiver must verify each packet. If verification fails, no ACK is sent, which triggers a timeout retransmission on the sender.
An ACK contains the following information:
- The SEQ of the next packet expected to be received
- Remaining capacity of the receiving window of the receiver
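To make the interplay of SEQ and ACK concrete, here is a toy receiver in Python. It is a sketch under simplifying assumptions, not how a real TCP stack is written: segments never overlap partially, the checksum step is skipped, and the class name `Receiver` is invented for this example. Each call returns the two pieces of information listed above.

```python
class Receiver:
    """Toy receiver: reorders segments by SEQ and answers with (ack, window)."""

    def __init__(self, initial_seq, buffer_size):
        self.next_expected = initial_seq   # SEQ of the next in-order byte
        self.buffer_size = buffer_size     # total receive buffer, in bytes
        self.out_of_order = {}             # seq -> payload, held until the gap fills
        self.delivered = bytearray()       # in-order data not yet read by the application

    def receive(self, seq, payload):
        # Segments below next_expected are duplicates and are ignored.
        if seq >= self.next_expected:
            self.out_of_order.setdefault(seq, payload)
        # Deliver every segment that is now contiguous with what we already have.
        while self.next_expected in self.out_of_order:
            chunk = self.out_of_order.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        used = len(self.delivered) + sum(len(p) for p in self.out_of_order.values())
        # The ACK carries the next expected SEQ and the remaining window.
        return self.next_expected, self.buffer_size - used


rx = Receiver(initial_seq=1, buffer_size=100)
print(rx.receive(1, b"aaaa"))   # (5, 96)   in order
print(rx.receive(9, b"cccc"))   # (5, 92)   gap: still expecting SEQ 5
print(rx.receive(5, b"bbbb"))   # (13, 88)  gap filled, both chunks delivered
```

The second ACK repeats the value 5, which tells the sender exactly which data is still missing.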
We use Wireshark to capture a connection to oschina and view the three-way handshake data.
Local IP: 192.168.1.103
oschina IP: 184.108.40.206
Three-way handshake process:
1. me->osChina: syn=1 seq=x ack=0
2. osChina->me: syn=1 seq=y ack=x+1
3. me->osChina: seq=x+1 ack=y+1
In the capture, Wireshark displays relative sequence numbers, so x and y both appear as 0:
1. me->osChina: syn=1 seq=0 ack=0
2. osChina->me: syn=1 seq=0 ack=0+1
3. me->osChina: seq=0+1 ack=0+1
The captured values match the three-way handshake process described above.
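The numbering rule of the handshake is small enough to write down as a function. The following Python sketch is illustrative only; the function name and the tuple layout are assumptions made for this example. Sequence numbers are 32-bit values, so the arithmetic wraps around at 2^32.

```python
SEQ_SPACE = 2 ** 32   # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(x, y):
    """Return (sender, flags, seq, ack) for a handshake with initial SEQs x and y."""
    return [
        ("client", "SYN", x, 0),
        ("server", "SYN+ACK", y, (x + 1) % SEQ_SPACE),
        ("client", "ACK", (x + 1) % SEQ_SPACE, (y + 1) % SEQ_SPACE),
    ]


# Relative numbering as displayed in the capture: x = 0 and y = 0
for step in three_way_handshake(0, 0):
    print(step)
```

Calling it with `x = 0` and `y = 0` reproduces the seq and ack values of the capture.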
We know that the network is extremely unstable. Even though SEQ and ACK are added to data packets to ensure their orderliness, there is still no guarantee that packet loss or timeouts will not occur. What if the data transmitted by the sender or the ACK returned by the receiver is lost in the network or times out?
To determine whether a packet has timed out, TCP uses a retransmission timeout (RTO), and it needs a way to estimate a suitable value. The RTT is the round-trip time measured on a given connection. Because network traffic changes, the RTT changes accordingly, so TCP needs to track these changes and dynamically adjust the RTO.
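The article does not say how the RTO is derived from the RTT. One standard way is the smoothed estimator described in RFC 6298, sketched below in Python. The class name `RtoEstimator` is made up for this example, and the 1 ms clock granularity is an assumption; the constants (1/8, 1/4, 4, and the 1-second floor) are the ones the RFC recommends.

```python
class RtoEstimator:
    """Smoothed RTT / RTO estimator in the style of RFC 6298 (times in seconds)."""

    ALPHA = 1 / 8    # gain for the smoothed RTT
    BETA = 1 / 4     # gain for the RTT variance
    K = 4            # variance multiplier
    MIN_RTO = 1.0    # lower bound on the RTO

    def __init__(self, clock_granularity=0.001):
        self.g = clock_granularity   # assumption: 1 ms timer resolution
        self.srtt = None             # smoothed RTT
        self.rttvar = None           # RTT variance
        self.rto = 1.0               # initial RTO before any measurement

    def on_rtt_sample(self, rtt):
        if self.srtt is None:
            # First measurement initialises both estimates.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # The variance is updated first, using the old smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = max(self.MIN_RTO, self.srtt + max(self.g, self.K * self.rttvar))
        return self.rto

    def on_timeout(self):
        # Exponential backoff: double the RTO after each expiry.
        self.rto *= 2
        return self.rto


est = RtoEstimator()
print(est.on_rtt_sample(0.5))   # 1.5
print(est.on_rtt_sample(0.5))   # 1.25, the variance shrinks as samples agree
print(est.on_timeout())         # 2.5
```

A stable RTT pulls the RTO down toward the smoothed RTT, while jittery samples widen the safety margin.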
If the sender does not receive the ACK of the packet within a certain period of time, it can be determined that the packet is lost in the network and the packet is automatically retransmitted. This mechanism is called timeout retransmission.
If the ACK was lost on its way back from the receiver, the sender also retransmits the packet once the timer expires. The receiver then returns an ACK again after receiving the retransmitted packet. If the original ACK was only delayed and arrives after the sender has already retransmitted the packet because of the timeout, the sender does not process it and simply discards it.
From the above, we know that TCP can ensure the reliability of data, but it also has to give consideration to efficiency. The following three aspects need to be considered:
- Support for transmitting packets in batches
- Support for congestion control based on network conditions
- The ability to understand the status of the receiver to prevent the receiver from being overwhelmed
Based on the above three requirements, the following measures have been taken.
If TCP packets are sent and acknowledged one at a time, reliability is ensured but efficiency is too low. A way to send and acknowledge packets in batches is needed, which is what the sliding window provides.
The sliding Send window:
Reading from left to right, the data before the Send window has been transmitted and acknowledged by the receiver, the data inside the Send window is what the sender may transmit now, and the data after the Send window cannot be transmitted yet.
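The three regions can be modelled with two counters. The Python sketch below is a simplification: it numbers whole packets instead of bytes, uses a fixed window size, and the class name `SendWindow` is invented for this example.

```python
class SendWindow:
    """Toy sender-side sliding window over packet numbers 0, 1, 2, ..."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.base = 0       # left edge: oldest packet not yet acknowledged
        self.next_seq = 0   # next packet that has never been sent

    def can_send(self):
        # Only packets inside [base, base + window_size) may be sent.
        return self.next_seq < self.base + self.window_size

    def send(self):
        if not self.can_send():
            raise RuntimeError("window full, wait for an ACK")
        seq = self.next_seq
        self.next_seq += 1
        return seq

    def on_ack(self, ack):
        """Cumulative ACK: `ack` is the next packet the receiver expects."""
        if ack > self.base:
            self.base = min(ack, self.next_seq)   # slide the window to the right


w = SendWindow(window_size=3)
print([w.send() for _ in range(3)])   # [0, 1, 2]
print(w.can_send())                   # False: the window is full
w.on_ack(2)                           # packets 0 and 1 are confirmed
print(w.send(), w.send())             # 3 4
```

Everything below `base` is the confirmed region, `base` up to `base + window_size` is the window, and anything beyond that has to wait.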
In case of timeout or loss, there are two solutions:
- Go-Back-N: all packets whose SEQ follows that of the lost packet are retransmitted
- Selective Repeat ARQ: only the lost packets are retransmitted, which is more efficient because it avoids sending duplicates
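The difference between the two strategies shows up in which packets are sent again. A minimal Python comparison, with hypothetical function names and packets identified by number:

```python
def go_back_n(in_flight, lost):
    """Resend the first lost packet and everything that was sent after it."""
    if not lost:
        return []
    first_lost = min(lost)
    return [seq for seq in in_flight if seq >= first_lost]


def selective_repeat(in_flight, lost):
    """Resend only the packets that were actually lost."""
    return [seq for seq in in_flight if seq in lost]


in_flight = [4, 5, 6, 7, 8]   # packets sent but not yet acknowledged
lost = {5, 7}                 # packets that never arrived

print(go_back_n(in_flight, lost))          # [5, 6, 7, 8]
print(selective_repeat(in_flight, lost))   # [5, 7]
```

Go-Back-N resends packets 6 and 8 even though they arrived, which keeps the receiver simple at the price of extra traffic.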
The sliding window also lets the sender know how well the receiver is keeping up. Suppose the receive buffer of the TCP receiver is full and it cannot process more data, but the sender does not know it. To stop the sender from transmitting more data, the receiver tells the sender the current size of its window each time it acknowledges packets. The receiver can react in two ways:
- The receiver sends an ACK immediately after receiving the data, but declares a window size of 0 to the sender. In this way, the sender stops transmitting data for the time being.
- The receiver holds back the ACK until there is enough space in the buffer, which prevents the sender from sliding its window. However, this has a problem: the delay before the ACK is sent must not exceed the timeout. If it is too long, the sender may mistakenly think that the data is lost and retransmit it.
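The first approach (acknowledge immediately, but advertise a window of 0) can be traced with a small simulation. This Python sketch makes several assumptions: the class and variable names are invented, the application drains 600 bytes whenever the sender is stalled, and the sender learns about the reopened window right away, whereas a real sender has to probe for it.

```python
class FlowControlledReceiver:
    """Toy receiver that advertises its free buffer space with every ACK."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffered = 0   # bytes received but not yet read by the application

    def window(self):
        return self.buffer_size - self.buffered

    def on_data(self, nbytes):
        self.buffered += min(nbytes, self.window())
        return self.window()   # window size carried in the ACK

    def app_read(self, nbytes):
        self.buffered -= min(nbytes, self.buffered)
        return self.window()   # window update after the application reads


rx = FlowControlledReceiver(buffer_size=1000)
pending = 2500            # bytes the sender still has to transmit
window = rx.window()
trace = []
while pending > 0:
    burst = min(pending, window)   # never send more than the advertised window
    if burst == 0:
        window = rx.app_read(600)  # sender waits until the application drains data
        trace.append(("stalled", window))
        continue
    pending -= burst
    window = rx.on_data(burst)
    trace.append((burst, window))

print(trace)
```

Each `(burst, 0)` entry in the trace is an ACK that closes the window, and each `("stalled", 600)` entry is the moment the window reopens.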
We know that the network situation is unstable. In good conditions, more packets can be transmitted. In bad conditions, if the sending rate stays the same, it not only increases the network burden but also causes too many packets to be lost, resulting in more timeout retransmissions, which undoubtedly reduces communication efficiency.
Based on this, both TCP communication parties maintain a value called the congestion window (cwnd), which depends on the level of congestion in the network. The Send window of the sender is limited by the congestion window (in practice, it is the smaller of the congestion window and the window advertised by the receiver). If no congestion occurs in the network, the congestion window can be increased so that the sender can send more data to the network. Otherwise, the congestion window is reduced to avoid making the congestion worse.
TCP currently has the following four major algorithms for congestion control:
- Slow start
- Congestion avoidance
- Fast retransmit
- Fast recovery
The specific algorithm implementations are not covered here. Roughly, they find an appropriate transmission rate based on the current network conditions to prevent the network from being overloaded. For example, Slow Start means that the sending rate is low at the beginning and is then adjusted based on packet loss. If no packet loss occurs, the rate is increased. If packet loss occurs, the rate is reduced.
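As a rough illustration of the first two algorithms, the following Python sketch tracks the congestion window per round trip. It is a simplified model, not a real implementation: the window is counted in whole segments, a loss is treated as a timeout that restarts slow start, and fast retransmit and fast recovery are left out. The function name and its parameters are assumptions made for this example.

```python
def simulate_cwnd(rounds, ssthresh, loss_rounds):
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd = 1
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            # Loss detected by timeout: remember half the window, restart from 1.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: the window doubles every round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: the window grows by one segment per round trip.
            cwnd += 1
    return history


print(simulate_cwnd(rounds=10, ssthresh=8, loss_rounds={6}))
# [1, 2, 4, 8, 9, 10, 11, 1, 2, 4]
```

Despite its name, slow start grows exponentially. It is only "slow" compared with sending a full window from the very first packet.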
All TCP users know that a three-way handshake occurs when TCP establishes a connection and a four-way handshake occurs when the connection is closed. So what are the states involved?
The figure above takes some effort to remember. Let’s take a look at the following figure to sort it out and see how the states apply in practice.
As shown above, when the connection is established successfully, the state is ESTABLISHED. When the state of the receiver is SYN_RECV, the receiver has replied with the second handshake message and is waiting for the sender to confirm it. If the network suffers a large number of SYN attacks, a large number of connections in the SYN_RECV state exist. In this case, you can locate the offending IP addresses and filter them with a firewall to get rid of the large number of false connections.
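The states can also be written down as a transition table. The Python sketch below covers only the common open and close paths (simultaneous open and close are omitted), and the event names such as `recv_syn` are labels invented for this example. The state names follow the ones used in this article.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("CLOSED", "active_open"): "SYN_SENT",        # client sends SYN
    ("CLOSED", "passive_open"): "LISTEN",         # server starts listening
    ("LISTEN", "recv_syn"): "SYN_RECV",           # server replies with SYN+ACK
    ("SYN_SENT", "recv_syn_ack"): "ESTABLISHED",  # client replies with ACK
    ("SYN_RECV", "recv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",       # active close: send FIN
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",    # passive close: ACK the FIN
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",      # ACK the peer's FIN
    ("CLOSE_WAIT", "close"): "LAST_ACK",          # send our own FIN
    ("LAST_ACK", "recv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def run(state, events):
    """Follow a list of events and return every state visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


print(run("CLOSED", ["passive_open", "recv_syn", "recv_ack"]))
print(run("ESTABLISHED", ["close", "recv_ack", "recv_fin", "timeout_2msl"]))
print(run("ESTABLISHED", ["recv_fin", "close", "recv_ack"]))
```

The first path is the server side of the handshake, where a missing final ACK leaves the connection stuck in SYN_RECV. The second is the side that closes first and ends up in TIME_WAIT, and the third is the side that closes second.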
Lost Connection — TIME_WAIT
Suppose one party closes the connection actively, without going through the four-way handshake. Is the channel established by TCP still there? How long does it take to be closed? The TCP state at this point is TIME_WAIT. This situation occurs often in practice: most connections are closed actively by one side rather than through the handshake communication. If a connection is closed this way, can the previous TCP channel be reused, or does it need to be recreated?
Every TCP implementation must choose a value for the MSL, the longest time an IP data packet can survive in the network. Common values are 2 minutes or 30 seconds. TIME_WAIT lasts twice the MSL, so its duration is between 1 and 4 minutes.
TIME_WAIT exists for two reasons:
1. To terminate the full-duplex TCP connection reliably
2. To allow old duplicate packets to disappear from the network
TCP must prevent old duplicate packets of a connection from reappearing after the connection has been terminated and being misinterpreted as belonging to a new incarnation of the same connection. Because TIME_WAIT lasts twice the MSL, a packet in one direction survives for at most one MSL before being discarded, and so does the reply in the other direction.
Moving from the TIME_WAIT state to the CLOSED state is governed by a timeout of 2 * MSL (RFC 793 defines MSL as 2 minutes, while Linux uses 30 seconds). Once this time has elapsed, the TCP channel is considered closed.