H3C 400G Ethernet Technology White Paper
Copyright © 2023 New H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.
Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.
This document describes only the most common information. Some information might not be applicable to your products.
H3C 400G network smooth upgrade solution
Key technologies of 400G network products
Requirements of a 400G system for high-speed links
High-speed link technology implementation in the H3C 400G system
Power consumption control technology
Optimize the system link design
Low-power transceiver module technology
Heat dissipation of transceiver modules
Evolution and product introduction of 400G transceiver modules
400G encapsulation form factors
Common specifications for 400G modules
H3C 400G transceiver modules and cables
H3C 400G network devices for data centers
S12500R data center switch router series
H3C 400G unbranded data center switch
Typical network applications of H3C 400G switches
Ultra-large-scaled 400G data center interconnect application
Building an ultra-large-scaled lossless RDMA 400G network
About 400G Ethernet
Introduction
The rapid development of the digital industry greatly impacts people's lives and drives social progress. For example, technologies such as cloud computing and virtualization are now widely applied, entertainment formats such as short videos and virtual reality (VR) have become popular, and emerging technologies such as autonomous driving and artificial intelligence (AI) are gradually being deployed. These applications place higher requirements on network computing power, bandwidth, and quality. Data center bandwidth is expected to grow by over 50% annually.
The development of emerging applications has significantly impacted the distribution of network traffic. Cloud computing applications such as remote desktop and servers require putting compute, storage, and network resources in the data centers and resource pools, which increases horizontal traffic in data centers. Thus, network bandwidth needs to be improved.
Currently, 100G data center networks are becoming increasingly popular. Upgrading to 400G data center networks is a crucial direction. 400G networks are the foundation for upgrading data center networks to 800G or higher speeds.
H3C leads the industry in developing and implementing 400G products. In 2020, H3C and Spirent Communications jointly completed a test of a Segment Routing over IPv6 (SRv6) network with 72 full-mesh 400G ports running at line speed. The test, which resulted in outstanding performance certification, proved the superior quality of H3C 400G products. Currently, H3C has successfully implemented multiple commercial projects by using 400G Ethernet.
Figure 1 H3C S12500CR 400G SRv6 network test
Benefits of 400G Ethernet
400G Ethernet delivers the following benefits:
· High performance—The bandwidth capacity is quadrupled, with a single port supporting up to 400 Gbps.
· High port density—The network performance has been greatly improved to meet different application requirements through high port density. The bandwidth per rack unit (RU) has increased from 3.2-3.6T to 12.8-14.4T.
· Reduced power consumption per unit of bandwidth—Reduces operating total cost of ownership (TCO) by decreasing power consumption per unit of bandwidth.
· Reduced network complexity—Compared to a 100G network with the same total bandwidth, a 400G network greatly reduces the number of devices, transceiver modules, optical fiber links, and cable trays, and reduces floor space in the equipment room. In this way, you can maintain and manage the network structure more easily.
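The per-RU figures above follow from simple port arithmetic. The sketch below assumes a 1RU front panel with 32 or 36 ports, which matches the OSFP and QSFP-DD port counts listed later in this document; the function name is illustrative only.

```python
def bandwidth_per_ru_tbps(ports: int, port_speed_gbps: int) -> float:
    """Aggregate front-panel bandwidth of a 1RU device, in Tbps."""
    return ports * port_speed_gbps / 1000


# Assumed port counts per RU: 32 (OSFP) and 36 (QSFP-DD/QSFP28).
for ports in (32, 36):
    at_100g = bandwidth_per_ru_tbps(ports, 100)
    at_400g = bandwidth_per_ru_tbps(ports, 400)
    print(f"{ports} ports: {at_100g:.1f}T at 100G, {at_400g:.1f}T at 400G")
```

With the same number of ports per RU, moving each port from 100G to 400G quadruples the bandwidth per RU.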
H3C 400G network smooth upgrade solution
Although a 400G network has cost advantages over a 100G network with the same total bandwidth, customers still need to invest heavily in the upgrading process. Meanwhile, different applications require devices with high-density 400G/200G/100G ports, which cannot be fulfilled by a single device form factor.
The H3C 400G network smooth upgrade solution is built on backward compatibility with 100G and a smooth upgrade path to 400G. It protects customer investment and controls costs while meeting service requirements. The following designs of H3C 400G products can be used to achieve a smooth upgrade to 400G networks.
· The H3C S12500R switch series uses a midplaneless design, allowing for quick upgrades in network performance through replacing service modules and switching fabric modules. The switch series also offers a wide range of service module options. Therefore, the S12500R switch series can meet the requirements for high-density 10G/40G/100G ports and support smooth upgrade to high-density 100G/200G/400G ports in the same chassis.
· The S12500CR switch series developed based on the S12500R switch series can meet high-density 400G port requirements with nine switching fabric modules. When it is installed with six switching fabric modules, it is compatible with all service modules of the S12500R switch series.
· The S12500R and S12500CR switches can share switching fabric modules, allowing you to flexibly allocate resources or utilize existing resources as much as possible during network upgrades.
· The fixed-port devices have various types of ports. The links on the access/aggregation layer can select 100G/200G/400G ports as needed, while the links on the core layer select 400G ports.
In the future, the cost of the ecosystem supporting 400G core switches will also be reduced significantly. The bandwidth of a 400G network is expected to be four times that of a 100G network, while the overall deployment cost is expected to be only twice that of the 100G network. At the same time, because the H3C smooth upgrade solution tries to protect customer investments, it has a cost advantage over similar solutions at the same level.
Key technologies of 400G network products
High-speed link technology
Requirements of a 400G system for high-speed links
The 400G system transmits service information by using 56G 4-level Pulse Amplitude Modulation (PAM4) signals. As a popular high-order modulation mode, PAM4 has gained widespread usage in high-speed interconnects.
Before the emergence of PAM4 modulation, Non-Return-to-Zero (NRZ) encoding was always mainstream. In this method, data is encoded as a series of fixed voltage levels (low=0, high=1), with each signal period transmitting 1-bit logical information. As transmission rates increase, limitations of NRZ modulation in cost, optical-electric conversion bandwidth, and external interference become more prominent and NRZ modulation gradually fails to meet the requirements of high-performance networks.
The PAM4 modulation method uses four different signal levels, 00/01/10/11, to transmit data. Each signal period represents 2 bits of logical information. For the same bit rate, the baud rate of PAM4 is therefore only half that of NRZ. However, the spacing between adjacent PAM4 levels is only 1/3 of the NRZ signal amplitude, so a system using PAM4 signals places higher requirements on its high-speed links. If link quality is insufficient, the low signal-to-noise ratio degrades communication quality.
Figure 2 The levels of NRZ and PAM4 (NRZ uses two levels of 0/1, while PAM4 uses four levels of 00/01/10/11)
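The relationship between NRZ and PAM4 can be illustrated with a small sketch. It assumes an illustrative 56 Gbps lane and a natural binary mapping of bit pairs to levels, as in Figure 2; practical links commonly use Gray coding instead. All function names are hypothetical.

```python
import math


def bits_per_symbol(levels: int) -> int:
    """Bits carried by one symbol of a modulation with `levels` amplitude levels."""
    return int(math.log2(levels))


def symbol_rate_gbaud(bit_rate_gbps: float, levels: int) -> float:
    """Symbol (baud) rate needed to carry the given bit rate."""
    return bit_rate_gbps / bits_per_symbol(levels)


def encode(bits: str, levels: int) -> list[int]:
    """Group a bit string into level indexes (natural binary mapping, an assumption)."""
    n = bits_per_symbol(levels)
    if len(bits) % n:
        raise ValueError("bit string length must be a multiple of bits per symbol")
    return [int(bits[i:i + n], 2) for i in range(0, len(bits), n)]


def level_spacing(levels: int) -> float:
    """Gap between adjacent levels as a fraction of the full signal swing."""
    return 1 / (levels - 1)


payload = "11011000"
print(encode(payload, 2))  # NRZ: one symbol per bit, eight symbols
print(encode(payload, 4))  # PAM4: one symbol per bit pair, four symbols
print(symbol_rate_gbaud(56, 2), symbol_rate_gbaud(56, 4))
print(level_spacing(2), round(level_spacing(4), 3))
```

PAM4 carries the same payload in half as many symbols, but adjacent levels are only one third as far apart as the two NRZ levels.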
High-speed link technology implementation in the H3C 400G system
The main components of the internal high-speed links in the switch systems are the high-speed links between the switching fabric module chips and service module chips, and the high-speed links between service module chips and ports. Currently, most high-speed links are carried on printed circuit boards (PCBs). Such a link includes components such as chip packaging traces, packaging solder balls, PCB vias, PCB traces, connectors, connector vias, and transceiver module PCB traces. The following ways are available for improving the performance of high-speed links:
· Control the high-speed link channel loss by reducing the loss of one or more components on the high-speed link. If controlling the loss of PCB traces and other link components cannot meet the loss control requirements, take the following measures:
¡ Add a PHY chip, which retransmits received signals to improve signal quality. This method will increase the product cost.
¡ Change the link between the switching fabric module and the service module from a PCB trace solution to a high-speed cable solution. Due to the large number of required connections inside the system, this method poses significant challenges to the internal cabling of the device.
· Control the interference among high-speed link channels, improve the signal-to-noise ratio, and ensure signal quality within the system.
With its professional design and development team and technical expertise, H3C developed 400G switches mainly through controlling link loss and interference among link channels. All service modules have achieved PHY-less design to improve product reliability and competitiveness, while ensuring sufficient system design margin.
High-speed link loss control technology
As the interconnect signal rate within the system increases to 53G-56G (PAM4) and higher, PCB loss becomes greater, requiring lower-loss card materials to match device requirements. H3C strictly evaluates candidate materials through electrical performance and process reliability testing.
· H3C introduces multiple ultra-low-loss materials to meet PCB loss requirements of the 53G-56G (PAM4) systems.
· By using technologies such as PCB high-speed signal stacking and normalization, H3C achieves precise control over PCB trace loss.
· The simulation information is extracted through comprehensive traversal of system links to ensure that all links meet the design requirements.
· H3C uses the back end control technology, including insertion loss management, to ensure that the final product meets design requirements.
Interference control technology for high-speed link channels
The voltage step of a PAM4 coded signal is 1/3 of that of traditional NRZ, which results in a 9.5dB signal-to-noise ratio loss compared to NRZ under the same rate and noise conditions. This poses higher requirements for interference design in high-speed links.
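The 9.5dB figure can be reproduced from the voltage step ratio. The sketch below assumes the same peak-to-peak swing and the same noise for both signals, so the penalty is the amplitude ratio expressed in decibels; the function name is illustrative.

```python
import math


def pam_snr_penalty_db(levels: int) -> float:
    """SNR penalty versus NRZ for a PAM signal with the same swing and noise.

    The step between adjacent levels shrinks to 1/(levels - 1) of the NRZ
    step, and an amplitude ratio r corresponds to 20*log10(r) decibels.
    """
    return 20 * math.log10(levels - 1)


print(round(pam_snr_penalty_db(4), 2))  # PAM4 versus NRZ: 9.54
```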
H3C optimizes the chip fan-out and connector fan-out through techniques such as micro-vias and offset-vias. The multi-wire-diameter interference control technique is used to control trace interference. These techniques ensure connector contact surface compatibility when upgrading connector performance. This improves the signal-to-noise ratio by over 10dB for the entire link and achieves a smooth upgrade for the high-speed system link from 25G NRZ to 53G-56G PAM4.
Power consumption control technology
Optimize the system link design
In the system link design, using fewer devices leads to lower losses while ensuring link signal quality. H3C reduces system power consumption by selecting low-loss PCB materials, finely controlling PCB trace losses, and avoiding the use of lite-PHY chips and retimer chips.
Low-power chip technology
Using advanced chips
One of the most effective methods to reduce chip power consumption is to upgrade chip processes. As the transistor size of chips decreases, the proportion of leakage power to overall power consumption increases. Therefore, reducing leakage power becomes more crucial in lowering chip power consumption. The determining factor affecting leakage power is the gate length of the transistor. The smaller the gate length, the lower the leakage power consumption. The transistor gate length of advanced process chips has evolved from 28nm to 16nm and further to 7nm, greatly enhancing chip integration while significantly reducing power consumption per unit.
Using a chip that supports AVS design
The overall power consumption of chips is positively correlated with voltage. Lowering the voltage reduces power consumption, but the voltage must meet frequency requirements. The Adaptive Voltage Scaling (AVS) mechanism can autosense the processor's performance (frequency) requirements, adjust the voltage accordingly, and reduce overall power consumption as much as possible while meeting the requirements.
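The effect of AVS can be sketched with the standard dynamic power relation, in which switching power scales with the square of the supply voltage and linearly with frequency. The operating points below are hypothetical values chosen for illustration, and a lookup table stands in for the closed-loop sensing that a real AVS implementation would use.

```python
# Hypothetical operating points: (maximum frequency in GHz, required voltage in V).
OPERATING_POINTS = [(0.8, 0.70), (1.0, 0.75), (1.2, 0.80), (1.4, 0.90)]


def select_voltage(required_ghz: float) -> float:
    """Return the lowest voltage whose operating point supports the frequency."""
    for max_ghz, volts in OPERATING_POINTS:
        if required_ghz <= max_ghz:
            return volts
    raise ValueError("required frequency exceeds the highest operating point")


def dynamic_power(volts: float, ghz: float, c_eff: float = 1.0) -> float:
    """Relative dynamic power, P = C_eff * V^2 * f (arbitrary units)."""
    return c_eff * volts ** 2 * ghz


required_ghz = 1.0
fixed_volts = OPERATING_POINTS[-1][1]      # worst-case voltage without AVS
scaled_volts = select_voltage(required_ghz)
saving = 1 - dynamic_power(scaled_volts, required_ghz) / dynamic_power(fixed_volts, required_ghz)
print(f"{scaled_volts} V instead of {fixed_volts} V saves {saving:.1%} dynamic power")
```

Because power depends on the square of the voltage, even a modest voltage reduction at a given frequency yields a noticeable saving.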
Low-power transceiver module technology
H3C uses advanced processes for transceiver module chips. The integration is improved and power consumption is reduced for transceiver module chips through the following two methods.
· Use a digital signal processor (DSP) with a transistor gate length of 16nm or 7nm. The DSP is the core chip of the transceiver module and consumes a significant portion of its power. The smaller the transistor gate length, the lower the power consumption, which is the same principle that applies to the manufacturing processes of device chips.
· Integrate the DSP with driver and trans-impedance amplifier (TIA).
NOTE: The driver and TIA are important components in the transceiver module. The driver is located in the transmitter of the transceiver module and converts an electrical signal into a corresponding modulation signal to drive the laser to emit. The TIA is located directly after the detector in the receiver of the transceiver module. It converts the photocurrent generated by the detector into a voltage signal and provides initial amplification.
Furthermore, H3C is testing the application of refrigeration-free Electro-absorption Modulated Laser (EML) chips in transceiver modules. In the future, power consumption of transceiver modules might be further reduced through the application of refrigeration-free EML chips.
NOTE: EMLs are widely used in the industry as the signal transmitters of 400G transceiver modules. Typically, EMLs need to be cooled by thermo-electric coolers (TECs), which consume a lot of power. Improving the chip design process makes it possible to avoid TEC cooling and reduce the power consumption of transceiver modules.
Heat dissipation technology
Heat dissipation of chips
Compared to 100G system chips, 400G system chips upgrade SerDes links from 25G NRZ signals to 56G PAM4 signals and increase the number of links. As a result, chip power consumption increases to about 2.5 times that of 100G systems. The increase in chip power consumption and density poses significant heat dissipation challenges. H3C resolves the 400G system's heat dissipation issues with the following methods:
· Use high-performance thermal conductive materials—The materials quickly conduct the heat generated inside the chip to the radiator. H3C continuously studies the heat dissipation performance and production process of various heat-dissipating materials such as thermal conductive silicone grease, thermal conductive silicone paste, phase-change thermal-conductive materials, and carbon nano thermal conductive materials. H3C cooperates with third-party organizations to conduct experimental research and study the long-term reliability of materials. Through analysis and testing, H3C builds a comprehensive platform for selecting high-performance heat-dissipating materials. This platform matches suitable heat-dissipating materials to different chip heat dissipation requirements.
· Use high-efficiency radiators—In air-cooling systems, radiator design is the key element in the system's cooling solution. H3C collaborates with a professional radiator manufacturer to develop and qualify high-performance radiators such as heat pipe radiators, vapor chamber (VC) radiators, and siphon radiators to better improve chip cooling. H3C also pioneers the commercial use of VC radiators in the industry. To solve the uneven heating of multiple chips, H3C creatively uses a solution of multiple chips sharing a VC radiator for heat dissipation.
· Host air duct design—H3C has introduced a futuristic orthogonal direct (OD) system design that allows for direct front-to-rear connection without a backplane in the middle. The design also innovatively increases the slot height and ventilation area for service modules. The direct airflow, increased slot height, and fine air duct control with precise obstruction clearance inside the module lead to greater ventilation for chip heat dissipation. These designs provide robust technical support for the smooth upgrade to 400G networks.
Heat dissipation of transceiver modules
The main form factors of 400GE transceiver modules are CFP8, QSFP-DD, and OSFP. Among them, QSFP-DD is the mainstream choice for 400G network development due to its good downward compatibility and high port density implementation. The power consumption of a 400G transceiver module is much higher than that of a 100G transceiver module. Take QSFP-DD transceiver modules as an example. A single port typically consumes 12-15W (which might increase to 20W for long-haul modules in the future), about four times the power consumption (3.5W) of a 100G transceiver module on a single port. This makes heat dissipation for the ports more challenging. H3C employs the following methods to address the heat dissipation of transceiver modules:
· Optimize the air inlet area of a port—Increase the effective air inlet area of a port by finely designing the ventilation shape and opening distance of the panel in the port position.
· Improve the heat dissipation capability of the transceiver module cage—Add suitable radiators and customize a double-layered cage with a higher pitch to enhance ventilation for the lower-layer transceiver modules.
· Enhance PCB's auxiliary heat dissipation capacity for transceiver modules.
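To give a sense of scale for the port heat load discussed above, the sketch below totals the module power of a fully populated 1RU panel. It assumes 36 ports per RU (the QSFP-DD density listed later in this document) and uses the per-module wattages quoted above; the function name is illustrative.

```python
PORTS_PER_RU = 36  # assumed fully populated QSFP-DD or QSFP28 panel


def panel_module_power_w(ports: int, watts_per_module: float) -> float:
    """Total transceiver module power on one front panel, in watts."""
    return ports * watts_per_module


baseline_100g = panel_module_power_w(PORTS_PER_RU, 3.5)
print(f"100G modules at 3.5W: {baseline_100g:.0f}W")
for watts in (12, 15, 20):
    total = panel_module_power_w(PORTS_PER_RU, watts)
    print(f"400G modules at {watts}W: {total:.0f}W ({total / baseline_100g:.1f}x)")
```

Under these assumptions the panel must dissipate several hundred watts from the transceiver modules alone, which is why port air inlet area and cage design receive dedicated attention.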
High-performance fans
H3C has an experienced thermal design team and a comprehensive fan selection library. H3C uses advanced and efficient simulation platforms for modeling and evaluation. Ultimately, the team selects large-sized, dual-rotor, high-performance counter-rotating fans suitable for 400G products to ensure the heat dissipation for the 400G solution.
Liquid cooling
The high power density of 400G network devices and deployment of servers with higher computing power can cause heat dissipation bottlenecks in conventional air-cooled data centers, reducing device utilization and making it difficult to improve power usage effectiveness (PUE). New data center constructions should consider the benefits of liquid cooling. Among liquid cooling solutions, plate liquid cooling has a relatively small impact on the infrastructure and allows more existing technology to be reused. Plate liquid cooling is recommended for customers.
H3C has made sufficient preparations for liquid cooling solutions for devices. When the equipment rooms are switched to liquid cooling for heat dissipation, H3C is capable of rapidly launching liquid-cooled devices.
Power supply mode
When constructing a new data center, preferentially consider the high-voltage DC power supply solution. Compared to the conventional DC power supply solution, it better controls transmission losses, lowers power transmission costs, and better meets the power consumption requirements of 400G network devices and transceiver modules.
The power supply design for H3C 400G core devices S12500R/S12500CR has the following advantages:
· Support for AC, high-voltage DC, and conventional DC (-48V) power supply solutions.
· The power design of the S12500CR is the same as that of the core routers. It uses a power supply frame with higher reliability and flexibility, supports flexible power supply solution switchovers, and can even accommodate mixed power supplies.
· A high-density power supply design that ensures the system meets N+N redundancy power supply requirements even under high power consumption.
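N+N redundancy means that N power modules are enough to carry the full load and a second set of N modules stands in reserve, typically on a separate feed. The sketch below sizes such a configuration. The 12kW load is taken from the S12508R entry in Table 3, while the 3kW module rating is a hypothetical value used only for illustration.

```python
import math


def size_n_plus_n(load_w: float, module_capacity_w: float) -> tuple[int, int]:
    """Return (N, total modules) for an N+N redundant power configuration."""
    n = math.ceil(load_w / module_capacity_w)
    return n, 2 * n


def survives_feed_loss(load_w: float, module_capacity_w: float, installed: int) -> bool:
    """Check that half of the installed modules can still carry the load."""
    return (installed // 2) * module_capacity_w >= load_w


n, total = size_n_plus_n(12_000, 3_000)  # hypothetical 3kW modules
print(n, total)
print(survives_feed_loss(12_000, 3_000, total))
```

A higher-density power module reduces N for a given load, which is what allows N+N redundancy to fit in the chassis at high power consumption.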
Evolution and product introduction of 400G transceiver modules
The technical means to improve the transmission rate of transceiver modules include increasing the signal speed of a single channel, increasing the number of parallel fiber channels, and increasing the number of multiplexed wavelengths.
· Improving modulation methods (NRZ→PAM→Coherent) can effectively increase the signal speed of a single channel.
· Increasing the number of fiber channels can further increase the transceiver module interface rate, while at the same time resulting in a multiplied increase in the quantity and cost of optical fibers.
· Increasing the number of multiplexed wavelengths will require an increase in lasers, which will result in increase in power consumption and complexity of the transceiver modules.
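The three directions above multiply together to give the module rate. The sketch below illustrates this with two common 400G interface types: SR8, which uses eight parallel fibers per direction at 50G each, and FR4, which multiplexes four 100G wavelengths onto one fiber per direction. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpticalInterface:
    name: str
    lane_rate_gbps: int         # signal speed of a single channel
    fibers_per_direction: int   # parallel fiber channels
    wavelengths_per_fiber: int  # multiplexed wavelengths

    @property
    def rate_gbps(self) -> int:
        """Module rate as the product of the three scaling factors."""
        return self.lane_rate_gbps * self.fibers_per_direction * self.wavelengths_per_fiber


interfaces = [
    OpticalInterface("400G SR8", lane_rate_gbps=50, fibers_per_direction=8, wavelengths_per_fiber=1),
    OpticalInterface("400G FR4", lane_rate_gbps=100, fibers_per_direction=1, wavelengths_per_fiber=4),
]
for itf in interfaces:
    print(f"{itf.name}: {itf.rate_gbps}G over {2 * itf.fibers_per_direction} fibers")
```

Both designs reach 400G, but SR8 pays in fiber count (16 fibers) while FR4 pays in lasers and multiplexing complexity (four wavelengths over 2 fibers).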
Based on these technical directions, 400G transceiver modules have developed different encapsulation forms and application specifications, shaped by when the market needed them, their usage scenarios, and the difficulty of technical implementation.
Figure 3 Illustration of transceiver module rate improvement technology
NOTE: The graph is referenced from Ethernet Alliance.
400G encapsulation form factors
The overall evolution direction of the 400G modules is high density, low power consumption, and low cost.
· The CFP8 form factor supports 16 lanes of 25G NRZ signals. It is large and relatively easy to implement technically. As a result, CFP8 was adopted early as an encapsulation specification in the telecommunications industry for long-haul transmission.
· As the card electrical signals upgrade to 50G PAM4, QSFP-DD modules, with their high density, low power consumption, and downward compatibility with QSFP28, have become the mainstream choice in the market.
· The OSFP form factor is slightly larger than QSFP-DD. It is easy to implement in engineering and to upgrade to higher rates. It has become the choice for some Internet data centers in the 400G era.
Encapsulation type | CFP8 | QSFP-DD | OSFP |
---|---|---|---|
Protocols and standards | CFP8 MSA | QSFP-DD MSA | OSFP MSA |
Electric signal rate | 25G NRZ/50G PAM4 | 50G PAM4 | 50G PAM4 |
Length*width*height (mm) | 102*40*9.5 | 89.4*18.35*8.5 | 100.4*22.58*13 |
Power consumption | 12 to 18W | 12W | 12 to 15W |
Port quantity and bandwidth per RU | 16 ports, 8T | 36 ports, 14.4T | 32 ports, 12.8T |
Compatibility | N/A | Compatible with QSFP28 | N/A |
Common specifications for 400G modules
H3C 400G transceiver modules and cables
Table 1 H3C 400G transceiver modules
Transceiver module model | Central wavelength (nm) | Connector | Interface cable specifications | Modal bandwidth (MHz*km) | Maximum transmission distance |
---|---|---|---|---|---|
QSFPDD-400G-SR8-MM850 | 850 | MPO (APC end face, 16 cores) | 50/125µm MMF | 2000 | 70m |
QSFPDD-400G-SR8-MM850 | 850 | MPO (APC end face, 16 cores) | 50/125µm MMF | 4700 | 100m |
QSFPDD-400G-FR4-WDM1300 | 1310 | LC | 9/125µm SMF | N/A | 2km |
Table 2 H3C 400G cables
Cable type | Cable model | Cable length |
---|---|---|
400G QSFP-DD cable | QSFPDD-400G-D-CAB-2M | 2m |
H3C 400G data center solution
H3C 400G network devices for data centers
H3C has a wide range of 400G network devices to meet various customer scenario requirements.
S12500R data center switch router series
The S12500R switch router series includes S12500R-2L, S12504R, S12508R, S12516R, S12508CR, and S12516CR, and can meet port density and performance requirements of different network scales. They provide powerful device support for wide area interconnect constructions. At the same time, H3C offers a complete solution series for consolidated wide area interconnect scenarios by combining H3C routers, switches, security devices, iMC, and SDN solutions.
Figure 4 S12500R data center switch router series
Table 3 Specifications for main H3C S12500R switches
Item | S12500R-2L | S12504R | S12508R | S12516R | S12508CR | S12516CR |
---|---|---|---|---|---|---|
Interface module slots | 2 | 4 | 8 | 16 | 8 | 16 |
Maximum number of 400G interfaces | 36 (Note) | 96 | 192 | 384 | 384 (Note) | 768 (Note) |
Maximum number of 200G interfaces | 72 (Note) | 192 | 384 | 768 | 384 | 768 |
Maximum number of 100G interfaces | 96 | 192 | 384 | 768 | 384 | 768 |
Maximum number of 40G interfaces | 96 | 192 | 384 | 768 | 384 | 768 |
Maximum number of 10G interfaces | 96 | 192 | 384 | 768 | 384 | 768 |
Switching fabric module slots | N/A | 6 | 6 | 6 | 9 | 9 |
Fan trays | 1+1 redundant backup | 1+1 redundant backup | 1+1 redundant backup | 1+1 redundant backup | 4+1 redundant backup | 4+1 redundant backup |
Environment management modules | N/A | - | - | - | 1+1 redundant backup | 1+1 redundant backup |
MPUs | 1+1 redundant backup | 1+1 redundant backup | 1+1 redundant backup | 1+1 redundant backup | 1+1 redundant backup | 1+1 redundant backup |
Chassis height | 3RU | 6RU | 12RU | 21RU | 18RU | 30RU |
Maximum power consumption | 3kW + 3kW (dual-power supply) | 6kW + 6kW (dual-power supply) | 12kW + 12kW (dual-power supply) | 24kW + 24kW (dual-power supply) | 36kW + 36kW (dual-power supply) | 54kW + 54kW (dual-power supply) |
Note: Being planned and to be supported in the future.
H3C fixed-port 400G devices
H3C fixed-port 400G devices include the industry's highest-density 100G port device S9820-8C (supports 400G with a 400G interface module installed), industry's first high-density 400G port device S9825-64D, and 400G top of rack (ToR) switch S9855-48CD8D.
· The H3C S9820-8C switch supports high-density 400G/200G/100GE/40GE ports and has powerful forwarding capabilities. It features flexible subcard configurations and can support up to 32 × 400GE ports or 128 × 100GE ports.
· The S9825-64D fixed-port switch supports 64 × 400GE ports with high port density and powerful forwarding capabilities.
· The H3C S9855-48CD8D fixed-port switch supports 48 × 100GE (DSFP interface) ports and 8 × 400GE ports, meeting the server NIC upgrade requirements for different levels.
Figure 5 H3C fixed-port 400G devices
Table 4 Specifications for main H3C fixed-port 400G devices
Item | S9820-8C | S9825-64D | S9855-48CD8D |
---|---|---|---|
Interface module slots | 8 | 0 (fixed-port device) | 0 (fixed-port device) |
Maximum number of 400G interfaces | 32 | 64 | 8 |
Maximum number of 200G interfaces | 64 | 128 | 16 |
Maximum number of 100G interfaces | 128 | 256 | 48 DSFP + 32 QSFP28 |
Fan trays | 4+1 redundant backup | 5+1 redundant backup | 5+1 redundant backup |
Chassis height | 3RU | 4RU | 1RU |
Maximum power consumption | 3.2kW + 3.2kW (dual-power supply) | 3.2kW + 3.2kW (dual-power supply) | 1.6kW + 1.6kW (dual-power supply) |
H3C 400G unbranded data center switch
In the 25G/100G era, H3C developed unbranded switches for customers with software development capabilities. Some customers have already deployed these switches at scale.
In the era of 400G, H3C will continue to dedicate itself to the research and development of unbranded switches. The unbranded switches and commercial switches are developed and manufactured on the same platform to ensure the highest quality in design and manufacturing for unbranded switches. This fully meets customers' unbranded switch customization requirements.
Typical network applications of H3C 400G switches
Ultra-large-scaled 400G data center interconnect application
Figure 6 Ultra-large-scaled 400G data center interconnect application (three-layer architecture)
As shown in Figure 6, deploy the access, aggregation, and core layers for the data center network.
· Users access the S9855-48CD8D switches through 100G interfaces.
· The access switches S9855-48CD8D connect to the aggregation switches S9820-8C through 400G links.
· The aggregation switches S9820-8C connect to the core switches S12500R through 400G links.
The preceding architecture can also be applied to the leaf-spine-border architecture of EVPN VXLAN networks. Access devices serve as EVPN VXLAN distributed gateways (leaf devices), aggregation switches act as spine devices, and core devices act as border devices.
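As a rough capacity check for the access layer in this architecture, the sketch below computes the oversubscription ratio of an S9855-48CD8D, assuming all 48 × 100GE downlink ports and all 8 × 400GE uplink ports are in use. The function name is illustrative, and actual deployments might populate fewer ports.

```python
def oversubscription_ratio(down_ports: int, down_gbps: int,
                           up_ports: int, up_gbps: int) -> float:
    """Downlink capacity divided by uplink capacity (1.0 means non-blocking)."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)


# Assumed fully populated S9855-48CD8D: 48 x 100GE down, 8 x 400GE up.
ratio = oversubscription_ratio(48, 100, 8, 400)
print(f"{ratio}:1")
```

Under these assumptions the access switch offers 4.8T of downlink capacity against 3.2T of uplink capacity, which is a ratio of 1.5:1.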
Building an ultra-large-scaled lossless RDMA 400G network
Figure 7 Ultra-large-scaled lossless RDMA 400G network (two-layer architecture)
The server NICs are connected to the S9855 switches through 100G/200G links. The S9855 switches are uplinked to the S9820-8C switches through 400G links, enabling the deployment of a two-layer Remote Direct Memory Access (RDMA) network within a data center.