MTU Technology White Paper
Copyright © 2024 New H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.
Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.
This document provides generic technical information, some of which might not be applicable to your products.
Contents
IP MTU fragmentation mechanism
MPLS MTU fragmentation mechanism
MTU configuration and negotiation
Setting an IP MTU for an interface
Negotiating the path MTU with RSVP-TE
Setting MTUs in an SRv6 network
Software forwarding, hardware forwarding, and MTU
Planning the MTU value in a data center interconnect (DCI) scenario
Overview
What is MTU
The maximum transmission unit (MTU) is an important concept in computer networks. It refers to the maximum packet size that the sender can transmit in a single transmission. The MTU directly affects network performance. An MTU that is too large can cause packets to be fragmented or discarded along the path, increasing packet processing overhead. Conversely, an MTU that is too small reduces the effective payload of each packet, leading to low transmission efficiency.
Role of MTU
In different network technologies and protocols, the optimal packet size varies, and flexible MTU value settings are required. Adjusting the MTU value appropriately is crucial, because the MTU directly affects the data transmission efficiency and stability of the network. An appropriate MTU value can reduce the need for packet fragmentation, increase bandwidth efficiency, and lower the risk of latency and packet loss. Therefore, finding the most suitable MTU setting based on the specific network environment and application scenario is one of the key steps in optimizing network performance.
MTU implementation
MTU types
As shown in Figure 1, the MTU is mainly divided into two types.
· IP MTU: Refers to the maximum size of an IP packet, excluding the link layer header and trailer, and is primarily used in the IP network layer.
· MPLS MTU: Refers to the maximum size of an MPLS packet, primarily used in MPLS networks. Similar to IP MTU, MPLS MTU refers to the packet size excluding the link layer header information, but it additionally includes the size of MPLS labels. In MPLS networks, packets are encapsulated with one or more MPLS labels, resulting in a larger MPLS MTU than the traditional IP MTU.
MTU fragmentation mechanism
Fragmentation is a mechanism in the IP protocol that divides a complete packet into several fragments for transmission, to prevent a single packet from exceeding the link's capacity limit. An MTU defines the link's capacity limit and packets exceeding the MTU are usually fragmented or discarded.
IP MTU fragmentation mechanism
As shown in Figure 2, when a device sends an IP packet, it determines whether the packet needs to be fragmented.
1. If the length of the IP packet does not exceed the set IP MTU, the device sends the packet directly. If the length exceeds the set IP MTU, the device performs a further check.
2. If the packet can be fragmented, the device fragments it according to the IP MTU. If the packet cannot be fragmented, the device discards the packet. If the device is an intermediate device in the packet forwarding process, it also sends an ICMP or ICMPv6 error packet to the source of the packet to notify it that the packet has been discarded because it is too large and cannot be fragmented.
Different protocol stacks have varying rules regarding whether packets can be fragmented.
¡ For IPv4 packets, there is a DF (Don't Fragment) flag in the Flags field of the IPv4 header. If the flag bit is set (value 1), it indicates that the packet cannot be fragmented. If the flag bit is not set (value 0), it indicates that the packet can be fragmented. DF is typically set to 1 to ensure data integrity for security reasons or when path MTU discovery is used.
¡ For IPv6 packets, there is no DF flag bit, but IPv6 inherently requires that packets only be fragmented at the source of transmission, and that intermediate devices not fragment packets during forwarding.
3. Each fragment carries the original packet's IP header. After the device sends the fragments, the destination device reassembles them to recover the complete original packet. (See the code sketch after Figure 2.)
Figure 2 IP MTU fragmentation mechanism
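The decision process above can be modeled with a short Python sketch. It is illustrative only and omits implementation details such as 8-byte fragment offset alignment and IP option handling; the 20-byte header length is an assumption for an IPv4 header without options.

```python
def handle_ip_packet(packet_len, ip_mtu, can_fragment, is_intermediate_node=False):
    """Simplified model of the IP MTU fragmentation decision.

    packet_len   -- total IP packet length in bytes (header plus payload)
    ip_mtu       -- IP MTU configured on the outgoing interface
    can_fragment -- False if the IPv4 DF flag is set, or if the node is an
                    intermediate node forwarding IPv6 (IPv6 fragments only at the source)
    """
    if packet_len <= ip_mtu:
        return "send as is"
    if not can_fragment:
        # An intermediate node also returns an ICMP or ICMPv6 error to the source.
        return "discard" + (" and send ICMP error" if is_intermediate_node else "")
    # Fragment: each fragment carries an IP header plus part of the payload.
    header_len = 20                                       # IPv4 header without options (assumption)
    payload_per_fragment = ip_mtu - header_len
    payload_len = packet_len - header_len
    fragments = -(-payload_len // payload_per_fragment)   # ceiling division
    return f"fragment into {fragments} pieces"

# Example: a 4000-byte packet sent over a link with a 1500-byte IP MTU.
print(handle_ip_packet(4000, 1500, can_fragment=True))    # fragment into 3 pieces
```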
MPLS MTU fragmentation mechanism
MPLS adds the label stack between the link layer header and network layer header of each packet. To make sure the size of MPLS labeled packets is smaller than the MTU of an interface, configure an MPLS MTU on the interface. MPLS compares each MPLS packet against the interface MPLS MTU.
As shown in Figure 3, before a device encapsulates an IP packet as an MPLS packet, it determines whether the packet needs to be fragmented.
1. If the length of the IP packet plus the MPLS label stack does not exceed the set MPLS MTU, the device sends the packet directly. If it exceeds the MPLS MTU, the device proceeds with further judgment.
2. If the packet can be fragmented, the device fragments the packet, excluding the MPLS label stack, based on the IP MTU derived from the MPLS MTU minus the label length. If the packet cannot be fragmented, the device encapsulates the IP packet with MPLS and sends the packet directly.
Different protocol stacks have varying rules regarding whether packets can be fragmented.
¡ For IPv4 packets, there is a DF (Don't Fragment) flag in the Flags field of the IPv4 header. If the flag bit is set (value 1), it indicates that the packet cannot be fragmented. If the flag bit is not set (value 0), it indicates that the packet can be fragmented. DF is typically set to 1 to ensure data integrity for security reasons or when path MTU discovery is used.
¡ For IPv6 packets, there is no DF flag bit, but IPv6 inherently requires that packets only be fragmented at the source of transmission, and that intermediate devices not fragment packets during forwarding.
3. Each fragment carries the original IP packet's header. After the device fragments the IP packet, it adds the same MPLS label stack to each fragment and then sends the fragments. The destination device reassembles the fragments to recover the complete original packet. (See the sketch after Figure 3.)
Figure 3 MPLS MTU fragmentation mechanism
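A minimal sketch of the MPLS MTU check described above, assuming 4 bytes per MPLS label and ignoring fragment offset alignment and other implementation details:

```python
MPLS_LABEL_LEN = 4   # each MPLS label occupies 4 bytes

def handle_mpls_encapsulation(ip_packet_len, label_count, mpls_mtu, can_fragment):
    """Simplified model of the MPLS MTU fragmentation decision on the ingress device."""
    labeled_len = ip_packet_len + label_count * MPLS_LABEL_LEN
    if labeled_len <= mpls_mtu:
        return "label and send"
    if not can_fragment:
        # Per the mechanism above, the packet is still labeled and sent without fragmentation.
        return "label and send without fragmentation"
    # Fragment the IP packet against the MPLS MTU minus the label stack length,
    # then push the same label stack onto every fragment.
    effective_ip_mtu = mpls_mtu - label_count * MPLS_LABEL_LEN
    return f"fragment the IP packet against an effective IP MTU of {effective_ip_mtu}, then label each fragment"

# Example: a 1500-byte IP packet, two labels, and a 1500-byte MPLS MTU.
print(handle_mpls_encapsulation(1500, 2, 1500, can_fragment=True))
```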
MTU configuration and negotiation
Setting an IP MTU for an interface
Currently, network administrators can use the mtu size, ip mtu size, and ipv6 mtu size commands to set IP MTUs. The mtu size command takes effect on both IPv4 and IPv6 packets sent from an interface. The ip mtu size command takes effect on only IPv4 packets sent from an interface. The ipv6 mtu size command takes effect on only IPv6 packets sent from an interface.
The IP MTU set for an interface only affects the size of IP packets sent from that interface.
Setting MPLS MTUs
Setting an MPLS MTU for an interface
Use the mpls mtu command to set the MPLS MTU in interface view for an interface. The MPLS MTU affects all forwarded MPLS packets, including packets from IP to MPLS and from MPLS to MPLS.
If no MPLS MTU is set on an interface by using the mpls mtu command, fragmentation for MPLS packets is based on the IP MTU set by using the ip mtu command. If no IP MTU is set, fragmentation for MPLS packets is based on the MTU of the interface set by using the mtu command. When fragmenting an MPLS packet based on the IP MTU or the interface MTU, the device removes the label stack from the MPLS packet, fragments the IP packet, and then adds the removed label stack to each fragment.
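The fallback order described in the previous paragraph can be summarized with a small helper. The values below are illustrative, and None stands for an MTU that has not been configured.

```python
def mpls_fragmentation_mtu(mpls_mtu=None, ip_mtu=None, interface_mtu=1500):
    """Return the MTU used to fragment MPLS packets on an interface: the MPLS MTU if set,
    otherwise the IP MTU if set, otherwise the interface MTU."""
    if mpls_mtu is not None:
        return mpls_mtu
    if ip_mtu is not None:
        return ip_mtu
    return interface_mtu

print(mpls_fragmentation_mtu())                             # 1500 (interface MTU)
print(mpls_fragmentation_mtu(ip_mtu=1480))                  # 1480
print(mpls_fragmentation_mtu(mpls_mtu=1508, ip_mtu=1480))   # 1508
```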
When you set an MPLS MTU for an interface, follow these restrictions and guidelines:
· As a best practice, set an appropriate MPLS MTU. If the MPLS MTU is too small, performance degradation or even packet loss might occur.
· If the MPLS MTU of an interface is greater than the IP MTU of the interface, data forwarding might fail on the interface.
· MPLS packets that carry L2VPN or IPv6 packets are always forwarded by an interface without being fragmented, even if the length of the MPLS packets exceeds the MPLS MTU of the interface. Whether the forwarding can succeed depends on the actual forwarding capacity of the interface.
Setting an MPLS MTU in an MPLS L2VPN
MPLS L2VPN is divided into VPLS and VPWS, where VPLS is a point-to-multipoint L2VPN service and VPWS is a point-to-point L2VPN service. Both VPLS and VPWS services support setting MTU values individually, where the MTU value represents the maximum packet length with labels that a PW can carry. VPLS supports setting an MTU for all PWs under a VSI using the mtu command in VSI view. VPWS supports setting an MTU using the mtu command in cross-connect view or auto-discovery cross-connect group view. This MTU value applies to all PWs established in the cross-connect view or the auto-discovery cross-connect group view.
When setting an MTU for VPLS or VPWS, follow these restrictions and guidelines:
· For a PW to come up, configure the same MTU value on both PE devices at the ends of the PW.
· As a best practice, set an appropriate MTU value. If the length of a packet entering a PW is longer than the set MTU value, the PW will discard the packet (see the sketch after this list).
· As a best practice to avoid data forwarding failures, set the MTU to be smaller than the interface MTUs on intermediate devices.
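A brief sketch of the two checks above, assuming that PW negotiation and per-packet forwarding use the same configured MTU value:

```python
def pw_can_come_up(local_mtu, peer_mtu):
    """A PW comes up only when both PE devices are configured with the same MTU."""
    return local_mtu == peer_mtu

def pw_forward(packet_len, pw_mtu):
    """Packets longer than the PW MTU are discarded by the PW."""
    return "forward" if packet_len <= pw_mtu else "discard"

print(pw_can_come_up(1500, 1500))   # True
print(pw_forward(1600, 1500))       # discard
```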
Negotiating the path MTU with RSVP-TE
Resource Reservation Protocol (RSVP) reserves resources in a network to meet quality of service (QoS) requirements. RSVP Traffic Engineering (RSVP-TE) supports optimizing data transmission by negotiating the path MTU. The negotiated path MTU ensures that the data packets transmitted on a Constraint-based Routed Label Switched Path (CRLSP) established by RSVP-TE will not be discarded or fragmented due to exceeding the processing capacity of any network device.
As shown in Figure 4, the process for RSVP-TE to negotiate the path MTU is as follows:
1. The ingress node carries the interface's MTU in a Path message. When the MPLS TE tunnel's ingress node sends a Path message downstream, it includes the MTU value of the corresponding physical egress interface in the Adspec object of the Path message. This MTU value is set for the physical egress interface by using the mtu command.
2. Intermediate nodes negotiate the MTU. When an intermediate node along an MPLS TE tunnel receives a Path message, it compares the MTU value in the Adspec object with the interface MTU set for the corresponding local physical egress interface. If the local interface MTU is smaller than the MTU value in the Adspec object, the latter is updated to the local interface MTU. If the local interface MTU is greater than or equal to the MTU value in the Adspec object, the latter remains unchanged.
3. The egress node receives the modified Path message that passes through the MPLS TE tunnel. The Adspec object in the Path message carries the smallest MTU value encountered along the path, known as the path MTU.
4. The egress node announces the path MTU to the ingress node through a Resv message: The egress node sends a Resv message upstream, carrying the negotiated path MTU in the FLOW_SPEC object. This way, the ingress node knows the maximum permitted packet size along the path (path MTU).
Figure 4 Path MTU negotiation with RSVP-TE
The path MTU value negotiated by RSVP-TE acts as the IP MTU for the CRLSP established by RSVP-TE, guiding the fragmentation process of packets.
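The negotiation amounts to taking the minimum egress-interface MTU along the CRLSP. A simplified sketch, assuming the Adspec MTU is compared and updated hop by hop as described above:

```python
def negotiate_path_mtu(egress_interface_mtus):
    """Model of RSVP-TE path MTU negotiation.

    egress_interface_mtus -- MTUs of the physical egress interfaces along the CRLSP,
    in order from the ingress node to the last hop before the egress node.
    """
    adspec_mtu = egress_interface_mtus[0]      # carried in the Path message by the ingress node
    for mtu in egress_interface_mtus[1:]:      # each downstream node compares and updates
        if mtu < adspec_mtu:
            adspec_mtu = mtu
    # The egress node returns this value upstream in the FLOW_SPEC object of a Resv message.
    return adspec_mtu

print(negotiate_path_mtu([1500, 1500, 1400, 1500]))   # 1400
```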
Negotiating IP MTUs in an IGP
Negotiating an IP MTU in OSPF
Figure 5 shows the format of an OSPF DD packet. By default, an OSPF interface adds a value of 0, rather than the actual interface MTU, to the interface MTU field of outgoing DD packets. On receipt of a DD packet, the interface does not check the interface MTU field of that packet. This mechanism ensures that two interfaces on different devices can establish a neighbor relationship regardless of their interface MTUs.
If an OSPF interface is configured to check the interface MTU in received DD packets, set the same MTU on the interfaces at both ends to ensure that they can establish a neighbor relationship in Full state.
Figure 5 OSPF DD packet format
Negotiating an IP MTU in OSPFv3
Figure 6 shows the format of an OSPFv3 DD packet. By default, the device uses the actual interface MTU value when sending DD packets on OSPFv3 interfaces. Additionally, an OSPFv3 interface checks the MTU value carried in received DD packets. If the MTU values of the interfaces at the two ends are different, the neighbor relationship cannot reach the Full state. (The sketch after Figure 6 contrasts the OSPF and OSPFv3 behaviors.)
Figure 6 OSPFv3 DD packet format
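The difference between the two protocols can be condensed into the following sketch. It models only the MTU compatibility check, not the full DD exchange; the default behaviors follow the description above.

```python
def dd_mtu_compatible(local_mtu, received_mtu_field, check_mtu):
    """Return True if the DD exchange can proceed toward the Full state.

    For OSPF, check_mtu is False by default and the interface MTU field is sent as 0,
    so the check always passes. For OSPFv3, check_mtu is True and the actual MTU is sent,
    so mismatched MTUs keep the neighbors from reaching Full.
    """
    if not check_mtu:
        return True
    return local_mtu == received_mtu_field

print(dd_mtu_compatible(1500, 0, check_mtu=False))      # OSPF default: True
print(dd_mtu_compatible(1500, 9000, check_mtu=True))    # OSPFv3 MTU mismatch: False
```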
Negotiating an IP MTU in IS-IS
IS-IS messages cannot be fragmented at the IP layer because they are directly encapsulated in link layer frames. Therefore, when a device running IS-IS establishes a neighbor relationship with a peer device, both devices negotiate a common MTU. This avoids issues where smaller PDUs can pass but larger PDUs cannot. To establish an IS-IS neighbor relationship, the interfaces at the two ends must have the same MTU.
In real-world network environments, most interfaces have the same MTU. Frequently sending hello packets padded to the interface MTU size wastes network resources. To address this issue, IS-IS provides the ability to send small hello packets without CLVs.
Setting MTUs in an SRv6 network
In addition to the interface IPv6 MTU, SRv6 introduces the following MTUs to properly control the SRv6 packet length:
· SRv6 path MTU: Controls the SRv6 packet length on the source node.
· SRv6 reserved MTU: Reserves MTU overhead on the source node for potential additional headers during SRv6 packet forwarding.
Challenges in setting the MTU in an SRv6 network
As shown in Figure 7, in an SRv6 network, when forwarding an original packet through an SRv6 tunnel, the SRv6 source node encapsulates the original packet with an IPv6 header to form an SRv6 packet.
· If a packet is forwarded in SRv6 BE mode, the encapsulation includes only the basic IPv6 header, without a Segment Routing Header (SRH).
· If a packet is forwarded in SRv6 TE mode, the encapsulated IPv6 header includes at least the IPv6 basic header and the Segment Routing Header (SRH).
Due to the addition of the IPv6 header, the packet after encapsulation by the SRv6 source node might become very long. If the IPv6 MTU set on an interface in the network is small, the SRv6 source node will fragment the original packet into multiple segments according to the smallest IPv6 MTU on the link. Each segment carries an IPv6 header, which affects the bandwidth efficiency of the link.
Figure 7 Packet encapsulation on the SRv6 source node
Intermediate nodes on the path (including Endpoint and Transit nodes) do not fragment IPv6 packets. If the total length of the IPv6 packet header and payload exceeds the IPv6 MTU value of the interface, the intermediate node discards the SRv6 packet. It then sends an ICMPv6 error packet to inform the SRv6 source node of the interface's IPv6 MTU value, requesting the source node to re-encapsulate the packet according to that MTU value. Therefore, when the IPv6 MTU value of an interface is small, packet re-encapsulation and retransmission might occur, affecting the overall link forwarding efficiency.
Administrators can set a larger IPv6 MTU value for some interfaces, but doing so disrupts global IPv6 MTU consistency and complicates network design and planning. If the administrator increases the IPv6 MTU on interfaces for all nodes globally to accommodate SRv6 encapsulation, oversized packets might appear. These packets might not be accepted by the receiver, increase forwarding latency, and heighten the risk of bit errors leading to retransmissions, thus impacting forwarding efficiency.
Based on the reasons mentioned above, SRv6 path MTU was introduced in the SRv6 networking scenario to specifically control the SRv6 packet length.
SRv6 path MTU
The SRv6 path MTU does not affect non-SRv6 packets and does not impact the mechanism of the interface IPv6 MTU.
After you set an SRv6 path MTU on the SRv6 source node, the SRv6 source node will calculate the total length of the encapsulated IPv6 packet header and the payload during encapsulation.
· If the total length does not exceed the smaller value between the SRv6 path MTU and the IPv6 MTU of the egress interface, the SRv6 source node performs normal SRv6 encapsulation and forwarding.
· If the total length exceeds the smaller value between the SRv6 path MTU and the IPv6 MTU of the egress interface, the SRv6 source node fragments the payload (the original packet) appropriately and then performs SRv6 encapsulation on each fragment. This ensures that the final encapsulated SRv6 packets do not exceed the IPv6 MTUs of the node interfaces.
The SRv6 path MTU is required only on the SRv6 source node. It affects the encapsulation process of SRv6 packets and is irrelevant for intermediate nodes. The SRv6 path MTU and the IPv6 MTU of the egress interface together limit the SRv6 packet length. Generally, administrators set the SRv6 path MTU slightly smaller than the IPv6 MTU of the interface. This ensures that after a 40-byte IPv6 basic header and an SRH extension header are encapsulated, the total length of the SRv6 packet's IPv6 header and payload remains less than the interface's IPv6 MTU.
In an SRv6 TE policy, you can also specify the SRv6 path MTU with explicit path Segment List granularity, allowing more precise control of SRv6 packet encapsulation.
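A simplified sketch of the source-node decision above. The header sizes are assumptions for illustration: a 40-byte IPv6 basic header, and an SRH consisting of 8 fixed bytes plus 16 bytes per SID when SRv6 TE is used.

```python
IPV6_HEADER = 40      # basic IPv6 header
SRH_FIXED = 8         # fixed part of the SRH (assumption for illustration)
SID_LEN = 16          # each SID in the segment list is an IPv6 address

def srv6_source_encapsulate(payload_len, sid_count, srv6_path_mtu, egress_ipv6_mtu):
    """Model of the SRv6 source node: fragment the payload before encapsulation so that
    the final SRv6 packet fits both the SRv6 path MTU and the egress interface IPv6 MTU."""
    overhead = IPV6_HEADER + (SRH_FIXED + sid_count * SID_LEN if sid_count else 0)
    limit = min(srv6_path_mtu, egress_ipv6_mtu)
    if overhead + payload_len <= limit:
        return "encapsulate and forward"
    max_payload = limit - overhead
    fragments = -(-payload_len // max_payload)    # ceiling division
    return f"fragment the payload into {fragments} pieces, then encapsulate each"

# SRv6 TE with 3 SIDs, a 1500-byte original packet, path MTU 1600, egress IPv6 MTU 1500.
print(srv6_source_encapsulate(1500, 3, 1600, 1500))
```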
SRv6 reserved MTU
As shown in Figure 8, in SRv6 networking scenarios, intermediate nodes might also encounter special situations that further increase the original SRv6 packet length. For example, in the SRv6 packet stitching scenario, if a SID in the SRH equals the Binding SID on an Endpoint node in the forwarding path, the Segment List represented by the Binding SID is encapsulated into the SRv6 packet using the Encaps or Insert method. As another example, in the SRv6 TI-LFA FRR scenario, a node on the forwarding path triggers a TI-LFA FRR switchover and encapsulates the repair list into the SRv6 packet. In these cases, the SRv6 path MTU set on the SRv6 source node might no longer be suitable, and intermediate nodes might still forward SRv6 packets even if the total length of the IPv6 packet header and payload exceeds the interface's IPv6 MTU.
Figure 8 Intermediate node increasing the SRv6 packet length
To address the preceding issue, SRv6 introduces a parameter called reserved MTU. The reserved MTU is an engineering empirical value, roughly equal to the additional IP packet header length added by SRv6 packet stitching or TI-LFA FRR. The SRv6 reserved MTU on the source node further reduces the packet length after SRv6 encapsulation and reserves space for additional packet length added by intermediate nodes.
The mechanisms of the SRv6 path MTU, SRv6 reserved MTU, and interface IPv6 MTU slightly vary across different products.
· For some products, the size of SRv6 packets sent by the SRv6 source node is controlled by both the value of the SRv6 path MTU minus the SRv6 reserved MTU and the value of the interface IPv6 MTU. The effective MTU is the smaller of the two values. For example, if the SRv6 path MTU is set to 1700 and the SRv6 reserved MTU is set to 50, subtract the reserved MTU from the SRv6 path MTU to get 1650.
¡ If the IPv6 MTU of the interface is greater than or equal to 1650, the actual MTU used by the SRv6 source node is 1650.
¡ If the IPv6 MTU of the interface is less than 1650, the SRv6 source node uses the IPv6 MTU of the interface.
· For other products, the SRv6 source node first compares the SRv6 path MTU with the interface IPv6 MTU and takes the smaller value. The MTU that actually takes effect is this smaller value minus the reserved MTU and then minus an additional 64 bytes. For example, set the SRv6 path MTU to 1600 and the SRv6 reserved MTU to 100.
¡ If the IPv6 MTU of the interface is greater than or equal to 1600, the actual MTU used by the source node is the SRv6 path MTU minus the SRv6 reserved MTU and an additional 64 bytes, which is 1436.
¡ If the IPv6 MTU of the interface is smaller than 1600, such as 1500, the MTU actually used by the source node is the IPv6 MTU of the interface minus the SRv6 reserved MTU and an additional 64 bytes, which is 1336.
IMPORTANT: The SRv6 path MTU minus the SRv6 reserved MTU must be greater than or equal to 1280 bytes. If the subtraction result is smaller than 1280 bytes, the effective value will still be 1280 bytes.
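The two product behaviors and the 1280-byte floor can be written out as follows. The numbers reproduce the examples above; applying the 1280-byte floor to the final result is an assumption, because the exact point at which the floor is enforced may vary by product.

```python
def effective_srv6_mtu_variant_a(path_mtu, reserved_mtu, egress_ipv6_mtu):
    """Some products: the smaller of (path MTU - reserved MTU) and the interface IPv6 MTU."""
    return max(min(path_mtu - reserved_mtu, egress_ipv6_mtu), 1280)

def effective_srv6_mtu_variant_b(path_mtu, reserved_mtu, egress_ipv6_mtu):
    """Other products: min(path MTU, interface IPv6 MTU) minus the reserved MTU minus 64 bytes."""
    return max(min(path_mtu, egress_ipv6_mtu) - reserved_mtu - 64, 1280)

print(effective_srv6_mtu_variant_a(1700, 50, 9000))   # 1650
print(effective_srv6_mtu_variant_a(1700, 50, 1600))   # 1600
print(effective_srv6_mtu_variant_b(1600, 100, 9000))  # 1436
print(effective_srv6_mtu_variant_b(1600, 100, 1500))  # 1336
```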
MTU priority
If you execute both the mtu size command and the ip mtu size or ipv6 mtu size command on an interface, the MTU set by the ip mtu size or ipv6 mtu size command takes precedence.
If you set an MTU manually and also enable the path MTU discovery feature, the smaller of the manually set MTU and the discovered path MTU takes effect.
To avoid MPLS packet forwarding failures on an interface, make sure the MPLS MTU of the interface is not smaller than the IP MTU that takes effect on the interface.
To avoid MPLS packet forwarding failures on a VSI or cross connect, make sure the MTU of the VSI or cross connect is not greater than the MPLS MTUs on intermediate devices.
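A condensed sketch of these precedence rules, with None standing for a value that is not configured or not discovered:

```python
def effective_ip_mtu(interface_mtu, ip_mtu=None, discovered_path_mtu=None):
    """Apply the precedence rules above for IPv4 (the ipv6 mtu command behaves the same way
    for IPv6): ip mtu overrides mtu, and a discovered path MTU caps the manually set value."""
    mtu = ip_mtu if ip_mtu is not None else interface_mtu
    if discovered_path_mtu is not None:
        mtu = min(mtu, discovered_path_mtu)
    return mtu

print(effective_ip_mtu(1500, ip_mtu=1400))                # 1400
print(effective_ip_mtu(1500, discovered_path_mtu=1380))   # 1380
```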
MTU extensions
MTU and TCP MSS
In the TCP protocol, the maximum segment size (MSS) is an important concept. It defines the maximum amount of data a single segment can carry during TCP transmission, excluding the TCP and IP headers, as shown in Figure 9.
The purpose of the TCP MSS is similar to that of the MTU, aiming to optimize network transmission, reduce fragmentation, and enhance efficiency. During the TCP handshake phase, both parties exchange their MSS values and inform each other of the maximum MSS value they can accept. When sending TCP packets subsequently, the device will limit the size of the TCP packets to not exceed the peer's MSS value. If a device receives a packet longer than the TCP MSS, it fragments or discards the packet according to the fragmentation mechanism.
The default value of the TCP MSS is typically determined by subtracting the IP header and TCP header from the IP MTU value of the interface. For example, if the IP MTU is 1500 bytes, considering the IP header (usually 20 bytes) and TCP header (usually 20 bytes), the default MSS value will be set to 1460 bytes.
You can manually set the TCP MSS value. If it conflicts with the IP MTU configuration, such as being larger than the IP MTU on the same interface, the device will use the smaller of the two values as the effective value. For example, if you set the TCP MSS to 1600 bytes and the IP MTU to 1500 bytes, the effective TCP MSS for this interface is 1460 (1500-20-20) bytes.
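The MSS arithmetic in the two preceding paragraphs, assuming 20-byte IPv4 and TCP headers without options:

```python
IP_HEADER = 20    # IPv4 header without options
TCP_HEADER = 20   # TCP header without options

def effective_tcp_mss(ip_mtu, configured_mss=None):
    """The default MSS is the IP MTU minus the IP and TCP headers; a configured MSS larger
    than that derived value is clamped down to it."""
    derived_mss = ip_mtu - IP_HEADER - TCP_HEADER
    if configured_mss is None:
        return derived_mss
    return min(configured_mss, derived_mss)

print(effective_tcp_mss(1500))                       # 1460
print(effective_tcp_mss(1500, configured_mss=1600))  # 1460
```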
Selecting the appropriate MSS is crucial for network performance. If the MSS is set too high and exceeds the MTU of a node in the network path, it will cause IP layer fragmentation. This increases the likelihood of retransmissions and lowers network efficiency. Conversely, if the MSS is set too low, it can prevent fragmentation but increases overhead, leading to reduced network efficiency. Therefore, setting the MSS appropriately can reduce network latency and improve data transmission efficiency.
MTU and jumbo frames
The standard default MTU value for Ethernet is 1500 bytes, which has remained unchanged throughout the development of Ethernet. However, with the growth of technologies like programmable networks and software-defined networks, the length of the control fields carried in a single packet has surged dramatically. The 1500-byte MTU value will lead to a sharp increase in the number of packets, significantly affecting the efficiency of network transmission due to the additional overhead. Thus, the concept of jumbo frames emerged.
Jumbo frames were first introduced by Alteon Networks in the mid-1990s for their Ethernet switch products to enhance network throughput and reduce CPU processing load. As the demand for network bandwidth increases, especially in data centers and enterprise back-end networks, Jumbo frames have gradually become an effective way to enhance data transmission efficiency.
Jumbo frames reduce the count of frames needed per data transmission by permitting the MTU to be set to a value greater than 1500 bytes. This not only reduces the total overhead but also decreases the frequency of disruptions caused by processing numerous small frames, thereby lowering the CPU usage of network devices. For applications with high data transmission requirements, such as large-scale file transfers, backup operations, or video streams, using jumbo frames can significantly enhance network transmission efficiency.
Software forwarding, hardware forwarding, and MTU
When H3C devices perform software forwarding (such as protocol packet forwarding and policy-based forwarding), the packet length is affected by the MTU settings configured for each protocol. Packets that are too long will be fragmented or discarded.
When H3C devices perform hardware forwarding, such as packet forwarding through table lookup, the packet length for outgoing packets is not affected by the MTU settings of various protocols. However, the device will discard incoming packets that exceed the jumbo size on the link (for example, 9416 or 9964 bytes).
MTU value planning
Manual MTU setting
Default MTU
Since the IEEE 802.3 standard, 1500 bytes has been the maximum length for the data portion of an Ethernet frame. This standard has had a profound impact on devices and protocols across the entire network field and is still in use today.
Advanced network requirements, technological developments, evolving application scenarios, and higher demands for transmission efficiency have led to an increase in packet length. Consequently, the default IP MTU and MPLS MTU values on most device interfaces are set to the maximum length defined by the IEEE 802.3 standard (1500 bytes). Using this default value ensures complete packet transmission while maintaining Ethernet data transmission compatibility, thus improving transmission efficiency.
The default MTU value for some special interfaces is less than 1500 bytes, because these interfaces typically require adding special encapsulation content before sending packets. For example, the default MTU for a VSI interface is usually 1444 bytes. This is to ensure that the inner packet, after adding a VXLAN packet header (8-byte outer VXLAN header plus 8-byte outer UDP header plus 20-byte or 40-byte outer IP header), still does not exceed the standard maximum length of 1500 bytes.
Considering that the packet length often needs to exceed 1500 bytes in many cases, use the default MTU value in the following scenarios as a best practice:
· Home networks and small campus networks: These networks typically have simple configurations, and an MTU value of 1500 bytes is sufficient to meet daily requirements.
· Scenarios where the path MTU is uncertain: The 1500-byte MTU is the commonly used MTU value on the Internet. If you are unaware of the specific MTU value settings for each device during the packet forwarding process, using the default 1500-byte MTU value can maximize compatibility and interoperability.
Manually adjusting the MTU
In the following scenarios, manually adjust the MTU value to more than 1500 bytes as a best practice:
· Data center networks: Data centers handle vast amounts of data transmission. To maximize transmission efficiency, set the MTU values of all network devices within the data center to the maximum allowable value.
· Networks with high bandwidth requirements or strictly controlled internal networks: In networks with high bandwidth requirements, a larger MTU can prevent packet fragmentation and improve the transmission rate. In networks with strict control and consistent internal configurations, you can use a larger MTU value to enhance transmission efficiency without worrying about security risks.
In the following scenarios, manually adjust the MTU to a value smaller than 1500 bytes as a best practice:
· Outdated network environment: Older devices might not support an MTU value of 1500 bytes, so reducing the MTU value can ensure smooth packet transmission.
· VPN and multi-encapsulation network environment: A smaller MTU value can prevent the packet length from exceeding 1500 bytes due to encapsulation overhead, which could otherwise cause fragmentation or packet loss.
· Security considerations: Reducing the MTU value can protect the network against attacks that use oversized packets.
Path MTU discovery
Path MTU discovery is a network communication mechanism used to determine the minimum MTU of the transmission path between the source and destination nodes, thereby avoiding IP fragmentation and reducing resource waste.
As shown in Figure 10, the process for path MTU discovery is as follows:
1. The sender initially assumes a larger MTU to send a packet and sets the DF bit if the packet is an IPv4 packet.
2. During packet transmission, if a device's MTU is less than the packet length, the device discards the packet and sends an ICMP or ICMPv6 error packet to the source, notifying the source that the packet was discarded because it exceeds the local MTU. The error packet carries the local MTU value.
3. After receiving the ICMP or ICMPv6 error packet, the source reduces its MTU value to the one specified in the error packet and resends smaller packets.
4. The preceding process might be repeated multiple times until the source finds an MTU size that successfully transmits the packet to the destination node. This MTU is also the smallest MTU value among all nodes in the transmission path.
Figure 10 Path MTU discovery process
To avoid potential fragmentation and packet loss risks caused by manually setting the MTU value, you can use the path MTU discovery feature to help the device automatically set the MTU value, ensuring packets are transmitted in the most effective way.
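The iterative process can be simulated as follows. The sketch assumes every hop that drops a packet reports its own MTU in the ICMP or ICMPv6 error packet, as described above.

```python
def path_mtu_discovery(initial_mtu, hop_mtus):
    """Simulate path MTU discovery: keep lowering the probe size until a packet of that size
    can traverse every hop, and return the discovered path MTU."""
    probe = initial_mtu
    while True:
        # Find the first hop whose MTU is smaller than the current probe size.
        blocking = next((mtu for mtu in hop_mtus if mtu < probe), None)
        if blocking is None:
            return probe       # the probe reached the destination
        probe = blocking       # the ICMP error reports this hop's MTU; retry with that size

print(path_mtu_discovery(1500, [1500, 1400, 1300, 1500]))   # 1300
```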
Application scenarios
Planning the MTU value in a data center interconnect (DCI) scenario
As shown in Figure 11, the DCI network contains VTEPs and edge devices (EDs) located at the edge of the transport network. A VXLAN tunnel is established between a VTEP and an ED, and a VXLAN-DCI tunnel is established between two EDs. VXLAN-DCI tunnels use VXLAN encapsulation. Each ED de-encapsulates incoming VXLAN packets and re-encapsulates them based on the destination before forwarding the packets through a VXLAN or VXLAN-DCI tunnel.
In VXLAN networks, avoid fragmenting VXLAN packets as much as possible. Some devices might discard fragmented VXLAN packets upon receipt, or some intermediate devices might not support fragmentation. These issues can lead to data loss across data centers, thereby affecting the stability of data transmission in the data center.
Therefore, plan the MTU values to ensure that the packet length does not exceed the MTU value of each hop in the transmission path. As a best practice, set the MTU value for each hop in the network to the maximum value to achieve the highest transmission efficiency, and adjust the size of the packets sent by the server. In this manner, after the VXLAN encapsulation (8-byte outer VXLAN header plus 8-byte outer UDP header plus 20-byte or 40-byte outer IP header) is added, the packet length does not exceed the minimum MTU value on the transmission path.
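Using the overhead figures above, the maximum inner packet size the server should send can be derived as follows. The outer Ethernet header is not counted against the IP MTU, consistent with the rest of this document.

```python
VXLAN_HEADER = 8
UDP_HEADER = 8

def max_inner_packet_len(path_min_mtu, outer_ipv6=False):
    """Largest inner packet that still fits the smallest MTU on the transmission path after
    VXLAN encapsulation (outer VXLAN + UDP + 20-byte IPv4 or 40-byte IPv6 header)."""
    outer_ip = 40 if outer_ipv6 else 20
    return path_min_mtu - VXLAN_HEADER - UDP_HEADER - outer_ip

print(max_inner_packet_len(1500))                    # 1464
print(max_inner_packet_len(1500, outer_ipv6=True))   # 1444
```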
References
· RFC 894: A Standard for the Transmission of IP Datagrams over Ethernet Networks
· RFC 1191: Path MTU Discovery
· RFC 2460: Internet Protocol, Version 6 (IPv6) Specification