From 30% Idling to 100% Throughput: DDC Pushes the Limits of Intelligent Computing Performance

2025-07-25

    In Artificial Intelligence (AI) large-model training, tech giants face a significant waste of computing power: GPU clusters sit idle for up to 30% of the time, costing millions daily. This inefficiency stems from the gap between network transmission latency and GPU processing speed. Diversified Dynamic-Connectivity (DDC) network technology aims to eliminate this waste.

    Cell Spraying: A Disruptive Transmission Revolution

    DDC is a revolutionary innovation in network architecture. It decouples the switching fabric and line cards of traditional chassis-based switches, reconstructing them into distributed, interconnected clusters of independent units. By adopting cell-based technology, DDC achieves non-blocking transmission. H3C has innovatively fused cell switching with Ethernet protocols, enabling this architecture to not only support diverse computing resources but also facilitate multi-vendor network device interoperability. DDC completely addresses the challenges of efficient transmission, multi-brand GPU and Network Interface Card (NIC) compatibility, and network heterogeneity in intelligent computing, significantly enhancing network flexibility, scalability, and overall performance.

    DDC adopts an innovative cell-spraying method, slicing data streams into standardized, fixed-length cells—akin to transforming vehicles of all sizes into identically sized "mini-cars" that can efficiently utilize every lane. This method is transparent to the data flow and agnostic to specific protocol details. Unlike InfiniBand (IB), DDC reassembles data packets on the network side, eliminating the need for NICs to support hardware packet reassembly and thereby achieving compatibility with standard NICs from multiple vendors.
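The slice-and-reassemble idea above can be illustrated with a minimal sketch. The cell size and the header fields (`flow_id`, `seq`, `len`) are assumptions for illustration only, not H3C's actual cell format:

```python
CELL_SIZE = 256  # bytes of payload per cell (illustrative, not the real size)

def slice_into_cells(flow_id: int, payload: bytes, cell_size: int = CELL_SIZE):
    """Split a packet into fixed-length cells, each tagged for reassembly."""
    cells = []
    for seq, off in enumerate(range(0, len(payload), cell_size)):
        chunk = payload[off:off + cell_size]
        # Pad the final cell so every cell on the wire has the same length.
        cells.append({"flow_id": flow_id, "seq": seq,
                      "len": len(chunk), "data": chunk.ljust(cell_size, b"\x00")})
    return cells

def reassemble(cells):
    """Network-side reassembly: order by sequence number, strip padding."""
    ordered = sorted(cells, key=lambda c: c["seq"])
    return b"".join(c["data"][:c["len"]] for c in ordered)

packet = b"x" * 1000
cells = slice_into_cells(7, packet)
assert len(cells) == 4  # ceil(1000 / 256)
# Cells may arrive in any order over different paths; reassembly still works.
assert reassemble(list(reversed(cells))) == packet
```

Because reassembly happens on the network side, the endpoint NIC only ever sees whole packets, which is why no special NIC hardware is required.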

    Traditional network transmission methods pale in comparison. Flow-based forwarding directs all data streams with identical characteristics (e.g., five-tuple) to the same link, making it sensitive to flow attributes and prone to overloading certain links while leaving others underutilized, relying heavily on manual optimization. Packet-based spraying attempts to distribute packets across different links, but uneven packet sizes still prevent ideal load balancing. Imagine a mixed-traffic lane where large trucks squeeze out smaller cars, inevitably leading to imbalanced loads.
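A toy simulation makes the contrast concrete. The link count, flow sizes, and the use of a random draw to stand in for a five-tuple hash are all illustrative assumptions:

```python
import random

LINKS = 4
random.seed(0)
flows = [random.randint(1, 100) for _ in range(20)]  # flow sizes, in MB

# Flow-based forwarding: each whole flow lands on one hashed link,
# so a few "elephant" flows can overload a single link.
flow_load = [0] * LINKS
for size in flows:
    flow_load[random.randrange(LINKS)] += size  # stand-in for a 5-tuple hash

# Cell spraying: every flow is sliced into 1 MB units sprayed round-robin,
# so load differs across links by at most one unit.
cell_load = [0] * LINKS
nxt = 0
for size in flows:
    for _ in range(size):
        cell_load[nxt % LINKS] += 1
        nxt += 1

def imbalance(load):
    return max(load) - min(load)

print("flow-based spread:", imbalance(flow_load))
print("cell-based spread:", imbalance(cell_load))  # always 0 or 1
```

Packet-based spraying would sit between the two: packets are distributed across links, but their uneven sizes keep the spread well above the cell-based minimum.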

    Figure 1: Comparison of Three Network Transmission Methods

    Let’s delve into this revolutionary transmission system: When data enters the network, the Network Connectivity Processor (NCP) precisely slices the data stream into standardized "cells." These intelligently tagged cells then enter the high-speed traffic network built by the Network Connectivity Fabric (NCF). Here, each cell follows an optimized path like maglev trains on dedicated tracks. Upon arrival, the system reassembles the scattered cells with jigsaw-like precision. The brilliance of this design lies in its creation of two parallel worlds: the "packet forwarding domain" for regular traffic (like city streets) and the "cell forwarding domain" optimized for AI training (like expressways). These domains operate independently yet intelligently collaborate, dynamically allocating network resources based on demand.
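The split into two domains can be sketched as a simple classifier at the network edge. The rule used here, steering RoCEv2 traffic (UDP destination port 4791) into the cell domain and everything else into the packet domain, is an assumed criterion for illustration, not H3C's actual classification logic:

```python
from dataclasses import dataclass

ROCE_V2_PORT = 4791  # RoCEv2 (RDMA over Converged Ethernet) UDP port

@dataclass
class Packet:
    dst_port: int
    payload: bytes

def forwarding_domain(pkt: Packet) -> str:
    """Route AI/RDMA traffic to the cell domain, regular traffic to packets."""
    return "cell" if pkt.dst_port == ROCE_V2_PORT else "packet"

assert forwarding_domain(Packet(4791, b"rdma write")) == "cell"   # expressway
assert forwarding_domain(Packet(443, b"https")) == "packet"       # city street
```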

    Figure 2: DDC Transmission System

    Architectural Revolution: Decentralized Design and Open Ecosystem

    DDC introduces a revolutionary, decentralized, and open intelligent network architecture. By decoupling and distributing the core functions of traditional centralized control planes to edge NCP nodes, it achieves a "distributed cooperative forwarding, edge autonomous decision-making" model. Each NCP node has full local decision-making capabilities, receiving real-time network status data from NCFs, running distributed scheduling algorithms, and executing fine-grained cell-level traffic scheduling. This design not only eliminates latency bottlenecks from centralized controllers but also exponentially improves scheduling efficiency through distributed computing.
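The edge-local scheduling described above can be sketched as a greedy per-cell dispatch: each NCP sends the next cell to whichever NCF uplink currently reports the least load. The load values and the greedy policy are illustrative assumptions, not the actual distributed algorithm:

```python
import heapq

def schedule_cells(num_cells: int, ncf_load: dict) -> list:
    """Dispatch each cell to the currently least-loaded NCF (greedy)."""
    heap = [(load, ncf) for ncf, load in ncf_load.items()]
    heapq.heapify(heap)
    path = []
    for _ in range(num_cells):
        load, ncf = heapq.heappop(heap)
        path.append(ncf)
        heapq.heappush(heap, (load + 1, ncf))  # one cell adds one load unit
    return path

# ncf2 starts empty, so it absorbs cells first until loads level out.
choice = schedule_cells(6, {"ncf1": 3, "ncf2": 0, "ncf3": 1})
```

Because each NCP runs this loop locally on status it receives from the NCFs, there is no round trip to a central controller on the decision path.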

    Figure 3: DDC Architecture Diagram

    For ecosystem openness, DDC leverages standard BGP protocols to create an open interoperability framework. By extending BGP for TEP information distribution, it establishes a unified device communication standard. H3C, in collaboration with industry partners, developed the DDC core framework standard based on the Open Scheduling Framework (OSF) for AI Networks, providing comprehensive guidance from requirements and architecture through to technical solutions. This standardization breaks vendor lock-in, enabling multi-brand hardware to interoperate seamlessly within the same network—truly achieving "hardware-defined freedom."

    This "distributed intelligence + open standards" architecture delivers three core advantages:

    • Millisecond-level response through localized decision-making.
    • Elastic scalability, where each new NCP node automatically integrates, linearly boosting overall capacity.
    • Self-healing capabilities, ensuring single-point failures don’t disrupt the network.
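The self-healing behavior in the last point amounts to the NCP simply excluding a failed uplink from its spray set. A minimal sketch, with made-up link names and round-robin spraying standing in for the real scheduler:

```python
def spray(cells, links, failed=frozenset()):
    """Round-robin cells over the uplinks that are still healthy."""
    healthy = [link for link in links if link not in failed]
    if not healthy:
        raise RuntimeError("no healthy uplinks left")
    return {cell: healthy[i % len(healthy)] for i, cell in enumerate(cells)}

links = ["ncf1", "ncf2", "ncf3"]
before = spray(range(6), links)
after = spray(range(6), links, failed={"ncf2"})  # single-point failure
assert "ncf2" not in after.values()              # traffic keeps flowing
```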

    Ultimate Adaptation for AI Scenarios: DDC's Architectural Innovations and a New Benchmark in Performance Testing

    In AI computing scenarios, DDC demonstrates exceptional adaptability. Its architecture achieves breakthroughs in scalability: single clusters support nearly 10,000 GPUs, while multi-cluster solutions extend this to tens of thousands—meeting the demands of even the largest AI models. For network performance, DDC optimizes critical bottlenecks in distributed training. Its innovative traffic scheduling algorithms deliver 107% higher effective bandwidth compared to traditional ECMP networks, particularly excelling in All-to-All communication patterns that plague multi-GPU training. In such high-demand scenarios, DDC-based RoCE networks outperform industry solutions by 2.5% on average, rivaling InfiniBand’s performance while maintaining plug-and-play simplicity, load balancing, and full decoupling at the endpoint.

    DDC also shines in multi-tenancy and heterogeneous compatibility. Its hardware-software co-design enables 16K-level tenant isolation—finer-grained and larger-scale than traditional ACL/VxLAN solutions, with zero bandwidth overhead. For heterogeneous environments, DDC seamlessly connects GPUs and NICs from different vendors, solving compatibility headaches.

    On the operations and maintenance front, DDC simplifies management dramatically. Native cell-level scheduling eliminates manual tuning, ensuring true plug-and-play functionality. One-click auto-deployment and end-to-end visualization give operators full network transparency. Chip-level fault detection enables instant failover, ensuring uninterrupted AI training cycles.

    Conclusion

    The mature application of DDC technology marks the transition of AI computing networks from "constrained" to "liberated." Traditional network architectures, with their centralized design and hash polarization, once severely hindered the training efficiency of large AI models. However, DDC has completely redefined data transmission through cell spraying and hardware decoupling. As large models evolve toward even greater scale, DDC—with its exceptional performance and flexible scalability—has become a critical cornerstone supporting the advancement of AI computing technology.
