Intelligent Computing Technical Insights | Why Do We Need More Open and Decoupled Intelligent Computing Center Networks? Part 2

2025-06-26

    In the previous article, we discussed the application of the "fat-tree" architecture in AI computing networks: it typically ensures high data transmission availability through dense link connections.

    The key lies in how to effectively utilize these links to distribute traffic rationally across each one. This is akin to a highway: even with four lanes, if all vehicles cluster in one lane, congestion is inevitable. In AI computing scenarios, especially during operations like “All Reduce,” where all GPUs simultaneously exchange data, the enormous communication pressure demands that every link in the AI computing network be fully utilized.

    Under these requirements, we must balance reliability and efficiency through precise control. The core of the control strategy is to detect network congestion and initiate traffic diversion, promptly notifying upstream devices to throttle their speeds until the path clears. However, improper regulation wastes resources; for instance, throttling in response to merely transient congestion unnecessarily compromises training efficiency.

    Thus, our goal resembles a racecar driver finding the optimal speed through a curve: neither excessive deceleration to maintain control nor reckless acceleration that risks instability. The ideal state is to maintain the balance between grip and slip, achieving maximum speed safely. Similarly, an ideal AI computing network should transmit data at the highest possible rate without blocking, requiring extremely precise scheduling. Even a one-second congestion should trigger only a brief speed reduction, followed by rapid recovery to maximize efficiency.
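    The throttle-and-recover behavior described above can be sketched in a few lines of Python. This is a toy illustration, not H3C's algorithm; the class name, the 400 Gbps line rate, and the halve-then-ramp constants are all hypothetical.

```python
class RateController:
    """Cut the send rate on congestion; recover quickly once the path clears."""

    def __init__(self, line_rate_gbps: float = 400.0):
        self.line_rate = line_rate_gbps
        self.rate = line_rate_gbps  # start at full speed

    def on_congestion(self) -> None:
        # A brief, bounded slowdown: a one-second transient congestion
        # event should cost only a little throughput, not a full stop.
        self.rate = max(self.rate * 0.5, self.line_rate * 0.1)

    def on_clear(self) -> None:
        # Rapid recovery back toward line rate once congestion clears.
        self.rate = min(self.rate * 1.5, self.line_rate)


ctrl = RateController()
ctrl.on_congestion()              # transient congestion detected
while ctrl.rate < ctrl.line_rate:
    ctrl.on_clear()               # path clears: ramp back up quickly
print(ctrl.rate)                  # back at line rate
```

    The design point mirrored here is asymmetry: the reaction to congestion is brief and bounded, while recovery is aggressive, so the link spends as much time as possible near full speed.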

    To achieve this balance, H3C has developed a series of precise scheduling technologies that ensure network reliability while maximizing efficiency. These enable every switch a customer invests in to operate at peak performance, avoiding idle resources and delivering maximum value.

    Next, we will delve into these precise scheduling technologies.

    Before implementing precise scheduling technologies, we must establish a foundational architecture that is adaptable to various scheduling techniques.

    Thanks to its open standards, Ethernet technology has gained widespread adoption, and H3C has remained committed to it. Building on Ethernet, the functions and interfaces we implement in data center networks adhere to openness principles, ensuring compatibility with users' computing scheduling systems, SDN controllers, or cloud providers. Moreover, H3C also operates in the server domain, enabling preemptive compatibility validation of customers' GPU NICs and optical modules. This reduces procurement risk, as components from any brand can be validated on H3C platforms to ensure smooth operation.

    In product offerings, H3C provides diverse options, whether chassis or box devices, catering to networks of varying scales. This is particularly advantageous for large-scale computing networks.

    Therefore, we created a flexible foundational architecture that can adapt to various scheduling technologies and customer scenarios.

    With this foundation in place, the next critical step is ensuring effective network load balancing: the rational allocation of bandwidth across available links. Depending on network topology and traffic patterns, users can choose among different load balancing technologies.

    LBN

    First is LBN (Load Balancing Network), which we explain using a highway analogy.

    A common congestion scenario on highways (especially at entrances/exits) occurs when vehicles from multiple lanes merge chaotically, creating bottlenecks.

    Imagine four toll booths, each corresponding to one entrance lane and one exit lane. Vehicles entering through Booth 1 are assigned to Lane 1, Booth 2 to Lane 2, and so on. This binds entrance and exit lane bandwidth, preventing cross-lane contention. In data centers, LBN implements similarly precise management. It binds each switch inbound port to which a GPU server's NIC connects (like a highway on-ramp) to a specific outbound port (an off-ramp), creating dedicated "express lanes" for data traffic. This ensures that large flows from any entrance enjoy exclusive upstream bandwidth, avoiding contention. The mechanism guarantees matched bandwidth between upstream and downstream ports, enabling exclusive paths and high network utilization.
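    The binding described above is essentially a static one-to-one table, which a short Python sketch makes concrete. The port names are hypothetical; a real configuration would use the switch's own interface naming.

```python
# Each server-facing inbound port is pinned to exactly one uplink,
# like one toll booth feeding one dedicated lane.
lbn_binding = {
    "eth1/1": "uplink1",   # Booth 1 -> Lane 1
    "eth1/2": "uplink2",   # Booth 2 -> Lane 2
    "eth1/3": "uplink3",
    "eth1/4": "uplink4",
}

def select_uplink(inbound_port: str) -> str:
    # Deterministic lookup: no hashing, so flows from different
    # entrances can never collide on the same upstream lane.
    return lbn_binding[inbound_port]

print(select_uplink("eth1/2"))  # -> uplink2
```

    Because the mapping is fixed, a flow's upstream bandwidth is guaranteed, which is exactly the exclusivity property LBN relies on.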

    DLB

    Next is DLB (Dynamic Load Balancing), illustrated with a train container shipping analogy.

    Traditional container shipping uses fixed routes between cities—same origin-destination pairs always take the same path, even if alternate routes are idle during peak demand.

    DLB acts like an intelligent cargo dispatcher. First, it splits goods to be transported into flowlets (small groups). Before shipping between cities, it identifies the least congested route and assigns the flowlet accordingly. Each flowlet is dispatched independently, ensuring equilibrium across all paths.

    In AI computing networks, the intelligence of DLB is crucial. When thousands of GPUs transmit data simultaneously, similar to peak-hour container shipping, fixed-path approaches can lead to congestion. DLB ensures that each flowlet (packet group) finds the optimal path, providing stable and efficient networking for AI training.
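    The dispatcher behavior can be sketched as least-loaded assignment over flowlets. This is an illustrative model only; real DLB runs in switch hardware and judges congestion from live queue state rather than accumulated byte counts.

```python
import heapq

def dispatch_flowlets(flowlet_sizes, num_paths):
    """Assign each flowlet to the currently least-loaded path."""
    # Min-heap of (accumulated_load, path_index); ties favor lower index.
    paths = [(0, i) for i in range(num_paths)]
    heapq.heapify(paths)
    assignment = []
    for size in flowlet_sizes:
        load, idx = heapq.heappop(paths)       # least congested route
        assignment.append(idx)
        heapq.heappush(paths, (load + size, idx))
    return assignment

# Six equal flowlets over four paths spread out evenly instead of
# all taking one fixed route.
print(dispatch_flowlets([10, 10, 10, 10, 10, 10], 4))  # -> [0, 1, 2, 3, 0, 1]
```

    The contrast with the fixed-route shipping model is that the choice is re-evaluated per flowlet, so no single path accumulates load while others sit idle.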

    Spray Link

    The third solution, Spray Link, addresses elephant flows and hash polarization.

    Returning to the four-lane highway: distributing vehicles from different entrances requires a universal algorithm. Conventional methods might classify vehicles by color, size, or height (hashing on packet header fields), achieving equilibrium across diverse traffic. But AI computing networks differ: all GPUs transmit concurrently in a few massive flows, each like an indivisible super-long truck confined to one lane, causing "hash polarization" and reduced efficiency.

    Spray Link takes inspiration from motorcycle convoys. Even in large groups, each motorcycle can travel independently. By splitting data flows into segments (like motorcycles) and assigning them to separate lanes, Spray Link overcomes traditional limitations, enabling equilibrium distribution and higher network efficiency.
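    In code, the motorcycle-convoy idea reduces to spreading the segments of a single flow round-robin across every link, rather than hashing the whole flow onto one. A minimal, hypothetical sketch:

```python
from itertools import cycle

def spray(packets, num_links):
    """Round-robin the segments of one flow across all available links."""
    lanes = cycle(range(num_links))
    return [next(lanes) for _ in packets]

# An 8-segment elephant flow over 4 links: every link carries traffic,
# so no single lane is polarized by the whole flow.
print(spray(range(8), 4))  # -> [0, 1, 2, 3, 0, 1, 2, 3]
```

    One consequence of spraying is that segments of a flow may arrive out of order, so deployments pair it with reordering tolerance at the receiving side.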

    FGLB

    Previous strategies—whether flow slicing or dedicated lanes—focused on single-switch entrance bottlenecks. However, networks comprise multiple switches, and data often traverses several nodes. While individual switches make local forwarding decisions, they lack visibility into distant congestion. For example, even if adjacent "toll booths" assign specific lanes, remote exit congestion may go unnoticed, worsening bottlenecks. Without global awareness, overall network efficiency suffers.

    To address this, H3C developed FGLB (Flexible Global Load Balancing), a hardware-based probing technology enabling real-time cross-switch monitoring. "Toll booths" share exit-lane traffic status instantly. Upon detecting congestion, the system immediately redirects local traffic to alternate lanes, avoiding traffic jams. Though real-world networks are far more complex (with hundreds of devices), FGLB allows each device to continuously monitor traffic, rapidly identifying and resolving exit congestion for superior load balancing and performance.
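    The effect of global awareness can be illustrated by combining local queue state with congestion reports probed from downstream switches. Everything here is a hypothetical model; the names and the cost formula are not FGLB's actual algorithm.

```python
def pick_path(local_load, remote_reports):
    """Choose a path using both local and probed downstream congestion."""
    # Cost of a path = local queue depth + worst congestion reported
    # by any downstream switch on that path.
    costs = {
        path: local_load[path] + max(remote_reports.get(path, [0]))
        for path in local_load
    }
    return min(costs, key=costs.get)

local = {"pathA": 2, "pathB": 3}
# pathA looks better locally, but a distant switch reports a hotspot on it.
remote = {"pathA": [9], "pathB": [1]}
print(pick_path(local, remote))  # -> pathB
```

    A purely local decision would have picked pathA; the shared downstream report is what lets the ingress switch route around the remote bottleneck.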

    Of course, FGLB isn't a silver bullet—it demands advanced hardware. In our latest products, H3C innovated with DDC architecture (Diversified Dynamic-connectivity), a higher-level load balancing approach we’ll explore next time.
