Intelligent Computing Technical Insights | Why Do We Need More Open and Decoupled Intelligent Computing Center Networks Part 1

2025-06-20 3 min read
    Traditional Data Centers vs. Intelligent Computing Centers

    Before the emergence of AI technology, users generally adopted a simpler, more direct architecture when building data centers, with connectivity as the primary concern. Over time, virtualization emerged and paved the way for cloud computing. Yet regardless of this evolution, traditional data centers have always relied on CPUs for serial computing, ultimately delivering deterministic computational results to users.

    With the rapid development of AI technology, GPUs have become an irreplaceable core component in intelligent computing scenarios. The key difference between CPUs and GPUs lies in how they process data and draw conclusions. CPUs derive precise results from predefined rules and deterministic data, while GPUs process massive amounts of raw data and, through training and inference, produce probabilistic predictions that users cannot fully control.

    In this context, the network of an intelligent computing center must handle the transmission of vast amounts of data and frequent internal data interactions. Therefore, compared to traditional data centers, intelligent computing networks face greater challenges. These challenges not only involve ensuring network stability during high-speed computations and interactions but also require careful planning before construction to maximize the value of the investment.

    Why Are Openness and Decoupling Necessary?

    Usually, when building an intelligent computing network, the focus tends to be on GPUs and supporting hardware. However, due to the limited choice of GPUs available in the market, builders tend to opt for well-known manufacturers. These manufacturers offer comprehensive product ecosystems covering almost every aspect of an intelligent computing center, including switches, specialized optical modules, GPUs, and servers, and they provide integrated solutions built on these products.

    This creates a misconception: many users assume that an intelligent computing center must come from a single vendor as an all-in-one solution. In fact, network and computing infrastructure can be procured separately. Just as with a general-purpose data center, users can purchase servers from one vendor and switches from another, selecting the most advanced products in each category to maximize value.

    The Value of Decoupling

    1. Leveraging Leading Innovations Across Fields

    First, computing and networking are highly complex domains, involving components such as GPUs, NICs, optical modules, and switches, each with dozens or even hundreds of manufacturers forming a vast ecosystem. By adopting a decoupled approach, customers can combine cutting-edge AI computing platforms with high-quality network connectivity, resulting in a superior overall solution for intelligent computing centers. Additionally, introducing more suppliers prevents vendor lock-in and preserves bargaining power, thereby reducing procurement costs.

    2. Flexibility and Scalability

    Choosing an open architecture when building an intelligent computing network lays a flexible foundation for future development.

    Take Ethernet as an example: its ability to integrate with all intelligent computing platforms, combined with its open-standard nature, allows for phased construction of intelligent computing center networks. This ensures seamless interoperability with existing infrastructure while enabling flexible expansion and upgrades to meet new business demands. As technology evolves, this architecture can adapt to future needs, whether by switching vendors or upgrading hardware like CPUs and GPUs, ensuring smooth scalability.

    Thus, an open and decoupled intelligent computing network is crucial for the construction of intelligent computing centers and a key driver for the continued advancement of intelligent computing technology.

    Deployability

    When transitioning from traditional data centers to intelligent computing centers, builders often fall into another misconception: they assume that simply purchasing some hardware boxes, pairing them with a fixed-architecture framework, and connecting servers to the leaf layer is sufficient. However, intelligent computing centers differ fundamentally from general-purpose data centers. They not only require high-performance hardware but also demand careful consideration of future-proof architecture design.

    Take hardware as an example: intelligent computing centers often incorporate cutting-edge equipment such as high-performance GPUs, high-speed optical modules, and switches supporting 200G/400G ports. The complexity of technology selection increases significantly, leading some users to prioritize performance while overlooking the deployability of the intelligent computing center itself.

    Extensive experience and numerous cases demonstrate that merely stacking equipment does not create an efficient intelligent computing network. It requires in-depth planning, design, and implementation.

    1. Network Scale and Equipment Selection

    Given their unique traffic patterns, intelligent computing networks typically adopt a 1:1 non-blocking fat-tree architecture, which differs markedly from the oversubscribed architectures used in traditional data centers. The core characteristic of this architecture is that its capacity is closely tied to the port count of individual devices: the more ports each switch has, the larger the network can scale, with total capacity growing as the cube of the port count in a three-tier design. A widely adopted approach is to build two- or three-tier network structures from uniformly configured fixed-configuration (box) switches (e.g., 64-port 400G products) for large-scale deployments. In a three-tier architecture, Core, Spine, and Leaf layers form multiple PODs, with Spine and Leaf serving as intra-POD nodes and Core handling inter-POD connectivity. According to the fat-tree scaling model, the capacity of a single POD is K²/4 server ports, while the total capacity across all PODs is K³/4 (where K is the number of ports per switch).
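    The fat-tree arithmetic above can be made concrete with a short sketch (the function names are ours, purely illustrative, not from any vendor tool):

```python
def pod_capacity(k: int) -> int:
    """Server ports per POD in a three-tier fat-tree of k-port switches.

    Each POD has k/2 leaf switches, and each leaf dedicates k/2 ports
    to servers (the other half go up to the spine), giving (k/2)^2 = k^2/4.
    """
    return (k // 2) * (k // 2)


def total_capacity(k: int) -> int:
    """Total server ports across all PODs.

    A full three-tier fat-tree supports k PODs, so the total is
    k * k^2/4 = k^3/4 server ports.
    """
    return k * pod_capacity(k)


# The 64-port 400G switch cited in the text:
k = 64
print(pod_capacity(k))    # server ports per POD
print(total_capacity(k))  # server ports across the whole fabric
```

    For K = 64, this yields 1,024 ports per POD and 65,536 ports in total, which is why port count dominates scalability planning.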

    During the initial planning phase, intelligent computing networks must account for future scaling needs and select network equipment accordingly to ensure the architecture remains flexible and upgradable, adapting to evolving business demands and technological innovations.

    2. Installation Environment and Power Consumption

    Following the principle that port count correlates with scalability, chassis switches, which can support hundreds of ports, offer better scalability than box switches at comparable cost. However, power consumption is a critical concern. When a chassis switch is fully configured with 400G ports, its peak power draw can reach 20 kilowatts, far exceeding the capacity of traditional data center racks designed for a maximum of 10 kilowatts per cabinet. By comparison, even high-performance servers equipped with eight GPUs typically consume less than 10 kilowatts. If large-scale power infrastructure upgrades are required just to accommodate network equipment, the cost advantage vanishes.
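    The power comparison above reduces to a simple budget check; a minimal sketch using the figures cited in the text (the 10 kW rack budget and device peaks are taken from the article, the function name is ours):

```python
# Traditional per-cabinet power budget cited in the article (kW).
RACK_BUDGET_KW = 10.0


def needs_power_retrofit(device_peak_kw: float,
                         rack_budget_kw: float = RACK_BUDGET_KW) -> bool:
    """True if a single device's peak draw alone exceeds the rack budget."""
    return device_peak_kw > rack_budget_kw


# Fully configured 400G chassis switch (~20 kW peak, per the article):
print(needs_power_retrofit(20.0))  # exceeds the budget -> retrofit required
# Typical 8-GPU server (<10 kW, per the article):
print(needs_power_retrofit(9.5))   # fits a traditional rack
```

    The asymmetry is the point: the compute gear fits existing racks, but a fully loaded chassis switch does not, so the network choice can drive facility costs.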

    On the other hand, while an all-box architecture can theoretically support larger intelligent computing networks, not all customers require such massive deployments (e.g., enterprise-built intelligent computing centers are often smaller than those of internet companies). For smaller-scale projects, a single-chassis or multi-chassis one-tier architecture may be more cost-effective and efficient, meeting current business needs without unnecessary energy waste or costly retrofits.

    Therefore, selecting the appropriate switch type (chassis or box) based on actual network scale is crucial. When designing the network, a balance must be struck between scalability and power constraints.

    3. Optical Module Interoperability

    In some intelligent computing center layouts, a single rack may house only one GPU server. As a result, the distance between switches and connected servers can vary significantly, and multiple types of optical modules may be needed to accommodate the different cabling runs. Additionally, since 400G ports may be split into lower-rate breakout links, and interoperability issues may arise between the QSFP112 and QSFP-DD form factors, these complexities must be carefully considered during the planning phase.
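    A hedged sketch of how distance-driven module selection might be encoded. The reach thresholds below are typical industry datasheet figures, not values from the article, and actual reach varies by vendor and fiber type:

```python
def pick_400g_link(distance_m: float) -> str:
    """Suggest a 400G interconnect type for a given cable run length.

    Reach values are typical datasheet figures (assumptions, not from
    the article); always confirm against the specific vendor's specs.
    """
    if distance_m <= 3:
        return "DAC (passive copper)"
    if distance_m <= 30:
        return "AOC (active optical cable)"
    if distance_m <= 100:
        return "400G-SR8 (multimode fiber)"
    if distance_m <= 500:
        return "400G-DR4 (single-mode fiber)"
    if distance_m <= 2000:
        return "400G-FR4 (single-mode fiber)"
    return "400G-LR4 or other long-reach module"


print(pick_400g_link(2))     # in-rack run
print(pick_400g_link(300))   # cross-row run
print(pick_400g_link(1500))  # cross-hall run
```

    A mixed layout of this kind is exactly why module planning belongs in the design phase rather than being deferred to installation.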

    For decisions regarding scale and architecture alignment, customers can seek assistance from professional network vendors for detailed planning. The key lies in selecting equipment with diverse form factors and support for open protocols, enabling flexible network architectures that ensure both openness and high deployability.

    Take H3C’s switch products as an example. Their diversity of form factors, protocol openness, and architectural flexibility have made them industry benchmarks, demonstrating exceptional adaptability in demanding intelligent computing scenarios. H3C offers a comprehensive portfolio of box and chassis products supporting rates from 100G and 200G up to 400G and 800G, as well as innovative Diversified Dynamic-connectivity architecture solutions, catering to intelligent computing centers of all scales and deployment environments.

    H3C adheres to a philosophy of open ecosystems and collaborative development, integrating the strengths of major switch chip manufacturers and leveraging the standardized RoCE protocol to deliver lossless network solutions. Furthermore, its products provide standard NETCONF interfaces for seamless integration with third-party management systems such as SDN controllers and cloud platforms, maximizing application scenarios and customer compatibility. On the hardware side, H3C has pre-validated end-to-end connectivity with mainstream GPUs, NICs, and optical modules, ensuring customers can deploy with confidence, free from compatibility concerns.
