H3C Cloud Cluster Technology Best Practices-6W100

H3C Cloud Cluster Technology Best Practices

Copyright © 2024 New H3C Technologies Co., Ltd. All rights reserved.

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.

Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.

This document provides generic technical information, some of which might not be applicable to your products.



Introduction

Background

In a WLAN, multiple devices are typically used for redundancy to enhance network reliability. The cloud cluster technology, independently developed by H3C, uses a dual-cluster virtualization architecture of physical and container resources. It virtualizes two ACs into one, and runs container applications based on Docker on the virtual AC. This ensures that WLAN services remain uninterrupted if one AC fails.

Cloud cluster provides configuration synchronization, simplified management, and service hot backup. Unlike traditional IRF, a cloud cluster does not virtualize the entire OS, so it consumes fewer resources, deploys and migrates faster, and is easier to configure and maintain. This addresses the complexity and inflexibility of traditional IRF management.

Application scenarios

In scenarios where clients have high requirements on real-time communication, such as office or production environments, use two ACs for redundant backup. When the master AC fails, APs and clients can seamlessly continue working with the backup AC. The backup AC uses the WLAN service data backed up in real time to ensure uninterrupted traffic flow.

Network model

Cloud cluster networking applications

A cloud cluster is divided into the following layers:

·     Physical cluster—Virtualizes two physical devices (ACs) into one device. This virtualization technology integrates hardware resources from multiple devices. On one hand, it enables unified management and allocation of hardware resources from multiple devices, increasing resource utilization and reducing management complexity. On the other hand, it achieves hardware-level backup, which improves the reliability of the entire system.

·     Container cluster—Connects the Comware containers running on the physical devices and virtualizes them into one system. This virtualization technology integrates the software processing capabilities of multiple Comware containers, enabling collaborative operation, unified management, and uninterrupted maintenance of multiple containers.

In a cloud cluster, physical devices have a one-to-one correspondence with Comware containers, with one Comware container running on each physical device. Physical devices can be clustered to form a device-level backup, while Comware containers can be clustered to form a service-level backup. The entire physical cluster corresponds to one container cluster. As shown in Figure 1, two devices form a physical cluster. For their upper and lower layer devices, the two physical devices are virtualized into one network device (corresponding to the container cluster in Figure 1). The virtual network device owns and manages the resources on all its member devices.

Figure 1 Cloud cluster network diagram

 

Cloud cluster architecture

Figure 2 shows the physical architecture of a cloud cluster.

Figure 2 Physical architecture of a cloud cluster

 

·     The components that implement the cloud cluster functionalities within a physical device are collectively referred to as the cloud platform. The cloud platform software runs directly on a standard distribution of the Linux system. The platforms on different physical devices communicate with each other through Layer 3 channels to virtualize the physical devices into a physical cluster.

·     A Comware container is a container running the Comware system, providing basic communication functions for the device. Containers run on the cloud platform software and are managed by the cloud platform software. The containers on physical devices communicate with each other through Leopard Inter-process Communication (LIPC)/Management Bus (MBUS) channels to virtualize the containers into a container cluster.

Figure 3 shows the logical architecture of a cloud cluster.

Figure 3 Logical architecture of a cloud cluster

 

Cloud cluster virtualizes physical devices into a dual-cluster virtualization architecture with a physical cluster and a container cluster. The cloud platform contains the following components:

·     Manager—Runs on the host operating system of each physical node that participates in physical cluster management. The Manager component is responsible for providing cloud platform HA, establishing the cluster, and managing cluster members. It provides the following functions:

¡     Manage, establish, and maintain the physical cluster, manage cluster members, and generate and update the cluster topology.

¡     Manage the container cluster, intelligently deploy Comware containers based on the distribution of physical hardware resources, and elect the master and standby containers for the container cluster.

·     Worker—Runs on the host operating system of each physical node. The Worker component is responsible for managing the lifecycle of physical nodes and containers. It periodically reports the physical resources and status of the nodes, responds to scheduling instructions from the Manager component, and creates and runs containers based on instructions from the Manager component.

·     Admin—Runs on each physical node. The Admin component receives and processes configuration messages from the primary Comware container. It is responsible for managing device operating modes and container description files, and sending container deployment requests to the Manager cluster.

·     Agent—Runs in containers. The Agent component is responsible for reporting the health status of the services inside the container and notifying the service module of cluster and container events.

Figure 4 shows the internal running locations of cloud platform components within a device. After being powered on, the physical device automatically runs the components.

Figure 4 Running locations of cloud platform components

 

Implementation

Physical cluster implementation

Basic concepts of physical cluster

1.     Member device roles

Each device in the physical cluster is called a member device. A member device operates as a manager in the physical cluster and is responsible for the following:

¡     Manage, establish, and maintain the physical cluster, manage cluster members, and generate and update the cluster topology.

¡     Manage the container cluster, deploy Comware containers as needed based on the hardware resources in the physical cluster, and elect the master and standby containers for the container cluster.

When a physical cluster is created for the first time, the network administrator determines the role of each manager member. Managers have the following roles:

¡     Leader: Primary manager, responsible for managing and controlling the entire cloud cluster, acting as the control center of the entire cloud cluster.

¡     Follower: Backup manager, running as a backup while handling service and forwarding packets at the same time.

2.     Member ID

Used to uniquely identify a physical device. Both physical and container clusters use member IDs for cluster establishment and maintenance.

3.     Member IP address

Used for internal communication within a physical cluster, specifically for exchanging physical cluster protocol packets. The control packets of a physical cluster are Layer 3 IP packets. All member devices in the physical cluster must be configured with a member IP address, and all member IP addresses must belong to the same network segment. Make sure all member devices can reach each other at Layer 3.

4.     Join-cluster IP address

Used to guide a device to join a physical cluster. The administrator must configure this address on the device. The join-cluster IP can be the member IP of any existing member device in the physical cluster. As a best practice, configure the member IP address of the leader as the join-cluster IP. During the initial setup of a physical cluster, you do not need to configure the join-cluster IP for the leader. A device not configured with a join-cluster IP considers itself the leader.

Physical cluster topology

The control packets of a physical cluster are Layer 3 IP packets. The physical cluster requires that member devices be configured in the same network segment and use this network segment to exchange physical cluster control packets. A physical cluster supports chain-shaped connection and star-shaped connection, as shown in Figure 5.

Figure 5 Physical cluster topology

 

 

NOTE:

Currently, the physical cluster uses the control channel of the container cluster link to transmit physical cluster control packets. To set up a container cluster network, the network administrator must use commands to bind physical interfaces with the control channel and data channel of the container cluster link on the device. The control channel will be used to transmit physical cluster control packets and container cluster control packets between cloud clusters, while the data channel will be used to transmit data packets for cross-container forwarding.

 

Physical cluster establishment

To establish a physical cluster for the first time, you must configure each device so that it can determine its identity. Before you build the cluster, complete the cluster planning, including the devices that will participate in physical cluster management, the leader device, the member IDs, and the network segment used for internal communication.

For a device to act as the leader, configure the following on the device and restart the device:

·     Specify the member ID.

·     Specify the member IP address.

Newly joined devices in a physical cluster also determine their identities based on the configurations. Configure the following on follower devices:

·     Specify the member ID. Make sure the member ID is unique in the cluster.

·     Specify the member IP address. Make sure the address is on the same subnet as the member IP of the leader and the two devices can reach each other.

·     Specify the join-cluster IP address. As a best practice, specify the leader's member IP as the join-cluster IP. You can also specify the member IP of any other member device.

After configuring a follower device, restart the device. The device reads the configuration at startup. Once the device detects that a join-cluster IP is configured, it starts up as a follower and sends a cluster join request to the join-cluster IP address. After receiving the join request, the leader sends a unicast reply indicating a successful join.
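
To make this identity decision concrete, the following is a minimal, illustrative Python sketch of the role decision described above. It is not device code; the data structure and field names are assumptions chosen for readability.

# Illustrative sketch only: how a member decides its startup role from the
# cloud cluster configuration it reads after a restart.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClusterConfig:
    member_id: int                         # unique within the cluster
    member_ip: str                         # on the shared cluster subnet
    join_cluster_ip: Optional[str] = None  # leader's member IP, if configured

def startup_role(cfg: ClusterConfig) -> str:
    # A device without a join-cluster IP considers itself the leader.
    if cfg.join_cluster_ip is None:
        return "leader"
    # Otherwise the device starts as a follower and sends a join request
    # to the join-cluster IP address.
    return "follower"

print(startup_role(ClusterConfig(1, "192.168.10.10")))                   # leader
print(startup_role(ClusterConfig(2, "192.168.10.11", "192.168.10.10")))  # follower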

As shown in Figure 6, the process for Device B to join the physical cluster formed by Device A is as follows:

1.     Device B starts the Admin, Manager, and Worker components of the cloud platform based on the configuration file.

2.     The Worker component automatically starts the Comware container, and the Manager (follower) and Worker automatically register with the leader and start the cluster join timer.

3.     Device A is the leader in the physical cluster and replies with a successful join message to the Manager (follower) and Worker.

4.     The leader periodically sends unicast Hello packets (announcing itself as a healthy leader) to the members.

5.     After receiving the Hello packet, Device B records the leader's information and reports its local physical resource information to the leader.

6.     If the network administrator issues a command to create a container, the leader schedules Device B to create and start the container based on the resource information reported by each member device.

7.     After the container on Device B is successfully started, the Worker component reports the container's information to the leader.

8.     The leader synchronizes the physical cluster information with the Manager component of Device B so that the follower can act as a backup to the leader. The leader also synchronizes the information about other containers in the current cloud cluster with the Worker component of Device B.

Figure 6 Adding a new device to the physical cluster

 

Member leaving

After the physical cluster is successfully established, the leader generates the cluster topology based on the connections. The leader and followers maintain their relationship by exchanging Hello packets. A member device can actively leave the physical cluster or be forced to leave the cluster:

·     Active leaving

Active leaving refers to the scenario where an administrator executes the undo join-cluster command in cloud-cluster member view to remove a device from the physical cluster. The device sends a leave cluster packet to the leader, and the leader replies with a leave cluster response. Then, the leader removes the device from the physical cluster device list and physical cluster topology.

·     Passive leaving

If a member device cannot reach the member IP address of the leader, control packets cannot be transmitted between the member and the leader, which can trigger passive leaving of the member. The process of passive leaving is as follows (see the sketch after this list):

a.     The leader periodically sends unicast Hello packets to announce its status to followers.

b.     Each follower creates a timer locally. If a Hello packet is received before the timer expires, it is considered that the leader is running normally, and the follower responds with a Hello response.

c.     When the leader receives a Hello response, it considers that the corresponding follower is running normally. If the leader does not receive a Hello response from a follower, the leader decreases the remaining Hello timeout count for that follower by 1. If the count reaches 0 and the leader still has not received a Hello response from that follower, the leader considers that the follower has temporarily left the physical cluster and sets the follower's status to Offline.

d.     If a follower does not receive a Hello packet from the leader before the timer expires, it considers that the leader has failed.
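
For illustration only, the leader-side bookkeeping in this process can be sketched in Python as follows. The class name and the initial timeout count are assumptions, not values taken from the device.

# Illustrative sketch only: how the leader could track a follower's Hello
# responses and mark the follower Offline, per the passive-leaving rules above.
HELLO_TIMEOUT_COUNT = 3  # assumed number of tolerated missed Hello responses

class FollowerRecord:
    def __init__(self, member_id: int):
        self.member_id = member_id
        self.remaining = HELLO_TIMEOUT_COUNT
        self.status = "Online"

    def on_hello_response(self):
        # A Hello response means the follower is running normally.
        self.remaining = HELLO_TIMEOUT_COUNT
        self.status = "Online"

    def on_missed_hello_response(self):
        # No Hello response in this interval: decrease the timeout count by 1.
        self.remaining -= 1
        if self.remaining <= 0:
            # The follower is considered to have temporarily left the cluster.
            self.status = "Offline"

follower = FollowerRecord(member_id=2)
for _ in range(HELLO_TIMEOUT_COUNT):
    follower.on_missed_hello_response()
print(follower.status)  # Offline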

Figure 7 Passive leaving

 

Physical cluster splitting

During the operation of a physical cluster, the leader and followers in the cluster periodically send Hello packets to each other to maintain the cluster relationship. If the Hello timer expires and no response is received from the peer end, the device considers that the peer end has failed and sets the state of the peer end to Offline.

Once a physical cluster is formed, if a link fails between member devices and the Hello packets cannot reach the destination, the physical cluster splits into two separate physical clusters. This process is called cluster splitting.

As shown in Figure 8, devices split from a cluster act as followers and cannot manage containers deployed on a physical cluster. Services are managed by the container cluster, and physical cluster splitting does not affect services running on the containers.

Figure 8 Physical cluster splitting

 

Physical cluster merging

The process of interconnecting two physical clusters to form one physical cluster is called physical cluster merging. As shown in Figure 9, Device A and Device B formed a physical cluster, with Device B as the leader. When the cluster link between Device A and Device B fails, the cluster splits into two. When the cluster link between Device A and Device B is repaired, Device A and Device B can receive voting requests from each other. The one that receives the voting response first becomes the leader, and the other becomes a follower.

Figure 9 Merging of two clusters that do not contain a leader

 

Only physical clusters with member IPs on the same subnet can be merged into one cluster. Physical clusters with member IPs on different subnets, even if the clusters can reach each other at Layer 3, cannot be merged into one cluster.

Container cluster implementation

Basic concepts of container cluster

1.     Member container roles

Each container in a container cluster is called a member container. Member containers are divided into the following roles according to their functions:

¡     Master container: Responsible for managing and controlling the entire container cluster.

¡     Standby container: Runs as a backup container for the master container while processing service and forwarding packets. When the master container fails, the system automatically elects a new master container from the standby containers.

Only one master container exists in a container cluster at a time. In a correctly operating physical cluster, the master and standby roles are determined by the leader of the physical cluster. When the physical cluster fails, the master and standby roles are determined through election within the container cluster.

2.     Container ID

A container ID is the unique identifier of a container in a container cluster, and a member ID is the unique identifier of a member device in a cloud cluster. Member containers run on physical devices, and the container IDs are assigned by the leader of the physical cluster in a unified way.

3.     Container cluster domain

A domain is a logical concept, and one container cluster corresponds to one container cluster domain.

To accommodate various network applications, multiple container clusters can be deployed in a network, and the container clusters are distinguished by domain IDs. As shown in Figure 10, Device A and Device B form container cluster 1, and Device C and Device D form container cluster 2. The two container clusters are configured with different domain numbers, ensuring that the operation and services of the two clusters do not interfere with each other.

Figure 10 Container cluster domain for different container clusters

 

Container cluster topology

The physical cluster uses the control channel of the container cluster link to transmit physical cluster control packets. To set up a container cluster network, the network administrator must use commands to bind physical interfaces with the control channel and data channel of the container cluster link on the device. The control channel will be used to transmit physical cluster control packets and container cluster control packets between cloud clusters, while the data channel will be used to transmit data packets for cross-container forwarding. Thus, the topology of a container cluster is the same as the topology of the corresponding physical cluster.

A container cluster supports chain-shaped connection and star-shaped connection, as shown in Figure 11.

Figure 11 Container cluster topology


 

Container cluster establishment

The Worker component of the cloud platform is responsible for creating and deleting containers.

A Comware 9 container is the basic container of the device and implements routing and forwarding functions. Therefore, the device supports the Comware 9 container by default. Currently, a physical cluster collaborates only with Comware 9 containers and can manage only Comware 9 containers (for example, determining the master and standby containers). A physical cluster can host other Docker-based containers, but cannot use these containers to form a container cluster.

The process of establishing a container cluster is as follows:

1.     After a device starts up, it automatically attempts to start Comware 9 containers. The Agent component inside the container notifies the Worker component of the container creation and deletion events.

2.     The Worker component forwards the container creation and deletion events to the leader in the physical cluster.

3.     The leader decides whether to allow the creation or deletion of containers based on the physical resource usage. If allowed, the first created container is the master container, and containers created later are standby containers (see the sketch after this list).

4.     The leader notifies the Worker component to create or delete containers.

5.     After the Worker component successfully creates or deletes containers, it notifies the leader of the creation or deletion result.

6.     The leader updates the container information table (including LIP and container MAC) and the container topology (including Container ID, Member ID, and container MAC), and then synchronizes the updated container information table and container topology to all containers in the cloud cluster.
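
For illustration only, the admission decision and role assignment in steps 3 through 6 can be modeled with the Python sketch below. The resource check, its threshold, and the field names are assumptions chosen for readability and do not describe actual device behavior.

# Illustrative sketch only: leader-side admission of container creation
# requests and master/standby role assignment.
containers = []  # simplified container information table kept by the leader

def on_create_request(member_id: int, free_mem_mb: int,
                      required_mem_mb: int = 2048) -> str:
    # Admit the container only if the member reports enough free resources
    # (the threshold here is an assumption for illustration).
    if free_mem_mb < required_mem_mb:
        return "rejected"
    # The first container created becomes the master; later ones are standby.
    role = "master" if not containers else "standby"
    containers.append({"container_id": len(containers) + 1,
                       "member_id": member_id,
                       "role": role})
    return role

print(on_create_request(member_id=1, free_mem_mb=8192))  # master
print(on_create_request(member_id=2, free_mem_mb=8192))  # standby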

Master container election

Master container election takes place in the following situations:

·     A container cluster is established.

·     The master container leaves or fails.

·     A container cluster splits.

·     Two independently running container clusters merge into one container cluster.

When a container cluster is established for the first time or the entire container cluster restarts, the container that starts first becomes the master container. The other containers become the standby containers. Therefore, after the entire container cluster restarts, it is possible for another container to be elected as the master container.

When the master container leaves or fails, or when the container cluster splits, the system elects a new master in the following order (see the sketch after this list):

1.     The current master container keeps running as the master. If the current master is still available, the container cluster will not elect a new master even if a new container with a higher priority joins. This rule does not apply when a container cluster is established, because all joining containers consider themselves the master.

2.     The container with the highest health score is selected. (The container health score obtained periodically reflects the actual health status of the device.)

3.     The container with the longest running time is selected. In the container cluster, the measurement precision of running time is 5 seconds. If the startup time interval of two devices is less than or equal to 5 seconds, they are considered to have equal running time.

4.     The container with the lowest CPU MAC address is selected.
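
For illustration only, this election order can be expressed as a single comparison key, as in the Python sketch below. The field names are assumptions; this is not device code.

# Illustrative sketch only: the master container election order as a sort key.
from dataclasses import dataclass

@dataclass
class Container:
    container_id: int
    is_current_master: bool
    health_score: int
    uptime_seconds: int
    cpu_mac: str  # H3C-style MAC, for example "0001-0002-0003"

def election_key(c: Container):
    return (
        c.is_current_master,                                # Rule 1: the current master keeps its role.
        c.health_score,                                     # Rule 2: highest health score wins.
        c.uptime_seconds // 5,                              # Rule 3: running time compared at 5-second precision.
        [-int(part, 16) for part in c.cpu_mac.split("-")],  # Rule 4: lowest CPU MAC address wins.
    )

def elect_master(candidates):
    return max(candidates, key=election_key)

candidates = [
    Container(1, False, 90, 3600, "0001-0002-0003"),
    Container(2, False, 90, 3602, "0001-0002-0001"),
]
print(elect_master(candidates).container_id)  # 2: equal health and uptime bucket, lower CPU MAC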

Once determined, the master container immediately broadcasts a Hello packet to announce its master identity and health information. Upon receiving this packet, the other containers stop the election process and function as standby containers. Standby containers send unicast Hello packets that carry the role and health information to the master container. The master container collects information and topology of all standby containers through Hello packets and reports the information to the leader. Once the container cluster information finishes updating, Hello packets are periodically sent between the master and standby containers to maintain the container cluster relationship.

Cloud clusters support a dual-layer election mechanism for the master container, which enhances the reliability and robustness of the cluster:

·     When the physical cluster is running normally, the leader of the physical cluster selects the master container based on the master container election rules.

·     When the physical cluster does not have a leader and cannot run normally, the container cluster itself selects the master container based on the master container election rules.

After role election is complete, the container cluster is formed and enters the cluster management and maintenance stage.

The container cluster formed by the virtualization of fixed-port devices is equivalent to a modular distributed device, where the master container acts as the active MPU of the virtual device, and the standby containers act as both standby MPUs and interface modules, as shown in Figure 12.

Figure 12 Fixed-port device virtualization

 

Container cluster splitting

A link failure in a container cluster can cause the cluster to split into multiple new container clusters. These container clusters have conflicting configurations, such as the same management IP address, bridge MAC address, and IP addresses, which might amplify the network failure and affect packet forwarding. To improve system availability, a mechanism is required to detect the presence of multiple container clusters in the network and handle them accordingly. This minimizes the impact of container cluster splitting on service operations. Multi-Active Detection (MAD) is such a mechanism. It provides split detection, conflict resolution, and fault recovery functionalities.

Cloud clusters support the following types of MAD: cloud platform MAD and LACP MAD within containers. Cloud clusters prefer to use cloud platform MAD. When cloud platform MAD is not available, LACP MAD is used.

Table 1 Comparison of different MAD types

Cloud platform MAD

·     Advantages: Feature that comes with physical clusters; no additional configuration required.

·     Limitations: For products that support both physical clusters using dedicated links and physical clusters using container cluster links, cloud platform MAD takes effect as long as the physical cluster links are up (physical cluster not split).

·     Application scenarios: All cloud cluster networks.

LACP MAD

·     Advantages: Fast detection; supplements cloud platform MAD.

·     Limitations: Requires an H3C device (supporting extended LACP protocol packets) as the intermediate device. Each member container must connect to the intermediate device to transmit LACP MAD packets.

·     Application scenarios: The container cluster uses aggregated links to connect with uplink or downlink devices.

 

·     Split detection

During the operation of a container cluster, the master container and the standby containers in the cluster periodically exchange Hello packets to maintain the cluster relationship. If a standby container does not receive a Hello packet from the master container before the Hello timeout timer expires, the standby container considers the master container faulty.

¡     If a leader exists in the current physical cluster, the leader triggers the cloud platform MAD, which then determines that the container cluster has split and proceeds with conflict processing.

See Figure 13. Physical devices Device A and Device B form a physical cluster, with each running a Comware container. These containers form a container cluster. When a communication error occurs and the standby container does not receive any Hello packet from the master before the Hello timeout timer expires, the container cluster splits into container cluster 1 and container cluster 2. Because the physical cluster link is available, the physical cluster can still function normally. In this case, cloud platform MAD handles the split of the container cluster.

Figure 13 Cloud platform MAD networking

 

¡     If no leader exists in the current physical cluster, the container cluster without the master container will elect a new master container according to the master container election rules. If LACP MAD detects two master containers within the same container cluster domain through Link Aggregation Control Protocol (LACP), it considers that the container cluster has split.

See Figure 14. Physical devices Device A and Device B form a physical cluster, with each running a Comware container. These containers form a container cluster. When a link error occurs in the container cluster, the cluster splits into two container clusters. Because the physical cluster uses the links of the container cluster, when the container cluster splits, the physical cluster also splits. The physical cluster does not have a leader to process the MAD event, so LACP MAD processes the splitting of the container cluster.

Figure 14 LACP MAD networking

 

·     Conflict processing of cloud platform MAD

When cloud platform MAD detects a container cluster split, it handles the conflict as follows:

a.     Prioritize the container with a higher health score.

Compare the health state of the master containers in two container clusters. The cluster with better health state continues to work, and the other cluster enters Recovery state (disabled state).

b.     Prioritize the container with a longer operation time.

c.     Prioritize the container with a smaller CPU MAC address.

After entering Recovery state, the container cluster closes all service ports on all member containers except for the reserved ports. This ensures that the cluster cannot forward service packets anymore. You can use the mad exclude interface command to specify the ports to be reserved.

·     Conflict processing of LACP MAD in containers

When LACP MAD detects a split in the container cluster, it handles the conflict as follows (see the sketch after this list):

a.     Compare the health state of the master containers in two container clusters. The cluster with better health state continues to work, and the other cluster enters Recovery state (disabled state).

b.     Compare the member IDs of the master containers in the two clusters. The cluster with the smaller master member ID continues to work, and the other cluster enters Recovery state (disabled state).

After entering Recovery state, the container cluster closes all service ports on all member containers except for the reserved ports. This ensures that the cluster cannot forward service packets anymore. You can use the mad exclude interface command to specify the ports to be reserved.
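
For illustration only, the two conflict-processing orderings above can be summarized as comparison keys, as in the Python sketch below. The field names and values are assumptions; this is not device code.

# Illustrative sketch only: which split container cluster keeps working after
# MAD detects a split. Each dict describes the master container of one cluster.
def conflict_key(master, lacp_mad=False):
    if lacp_mad:
        # LACP MAD: higher health score first, then smaller member ID.
        return (master["health_score"], -master["member_id"])
    # Cloud platform MAD: higher health score, then longer operation time,
    # then smaller CPU MAC address.
    return (master["health_score"], master["uptime_seconds"],
            [-int(part, 16) for part in master["cpu_mac"].split("-")])

def surviving_cluster(a, b, lacp_mad=False):
    # The winner keeps working; the loser enters Recovery state and shuts down
    # all service ports except the reserved ports.
    return max((a, b), key=lambda m: conflict_key(m, lacp_mad))

cluster1 = {"name": "container cluster 1", "member_id": 1, "health_score": 95,
            "uptime_seconds": 7200, "cpu_mac": "0001-0002-0003"}
cluster2 = {"name": "container cluster 2", "member_id": 2, "health_score": 95,
            "uptime_seconds": 9600, "cpu_mac": "0001-0002-0001"}

print(surviving_cluster(cluster1, cluster2)["name"])                 # container cluster 2 (longer uptime)
print(surviving_cluster(cluster1, cluster2, lacp_mad=True)["name"])  # container cluster 1 (smaller member ID)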

MAD fault recovery

The following fault recovery methods are available for both cloud platform MAD and LACP MAD:

1.     Repair the faulty link for the two split clusters to merge.

2.     If the operating container cluster fails before the faulty link is fixed, you can execute the mad restore command to enable the container cluster in Recovery state for emergency response.

A link fault in the container cluster causes the cluster to split, leading to multi-active conflicts. Repairing the failed link allows the conflicting container clusters to merge into a single container cluster, which resolves the fault.

After you repair the faulty link, the system will automatically restart the container cluster in Recovery state. After the restart, all member containers in the Recovery-state cluster join the operating cluster as standby containers, and the service ports that were forcibly closed automatically restore to their actual physical state. Thus, the entire container cluster recovers, as shown in Figure 15.

 

CAUTION:

Restart the container cluster in Recovery state as prompted. Incorrectly restarting the container cluster in normal operating state will cause the merged cluster to remain in Recovery state and shut down all service ports on the member devices. In this case, you must execute the mad restore command to restore the entire container cluster.

 

Figure 15 Fault recovery of MAD (container cluster link recovery)

 

If the operating container cluster fails (device fails or uplink or downlink fails) before the faulty link recovers (see Figure 16), you can execute the mad restore command on the container cluster in Recovery state to restore it to normal state and take over the work from the originally operating container cluster. Then, repair the faulty container cluster and link.

Figure 16 Fault recovery of MAD (operating cluster fails before link recovery)

 

Container cluster merging

Based on whether the MAD feature is enabled, container cluster merging is divided into two scenarios:

·     If the physical cluster functions normally or has the LACP MAD feature configured, when a container cluster link error causes the container cluster to split, the cloud cluster allows one container cluster to operate normally and disables the other (putting it in Recovery state). If the faulty link between the two split container clusters is restored, the two container clusters will automatically merge. The container cluster in Recovery state automatically restarts and joins the operational container cluster as standby containers.

·     If a physical cluster is not functioning properly and LACP MAD is not configured, when a container cluster link error causes the cluster to split, both split container clusters continue to operate normally. This situation is known as a dual-master phenomenon. In this scenario, if you restore the faulty link between two container clusters, the containers automatically merge. During the merging, a master container election takes place. The election rules are:

a.     The master container with a higher health score is selected as the new master.

b.     The master container with a longer running time is selected as the new master.

c.     The master container with a smaller CPU MAC address is selected as the new master.

The container cluster whose master container wins the election continues to operate, and the cluster whose master container fails the election automatically restarts and joins the winning cluster as standby containers.

Unified configuration management

All configurations are submitted to the master container of the container cluster for processing. The master container then synchronizes the configurations to the standby containers in the container cluster. For physical cluster configurations, the master converts the commands into internal system instructions for the physical cluster.

Configuration synchronization in a container cluster involves the following parts:

1.     Bulk synchronization at initialization.

2.     Real-time synchronization during stable operation.

High availability

Protocol hot backup

Protocol hot backup ensures that protocol configuration and operational data (such as state machine or session entries) are backed up to all other member containers. This enables the cloud cluster system to function as an independent device within the network.

For example, as shown in Figure 17, the cloud cluster device uses the RIP routing protocol for the network on the left and the OSPF routing protocol for the network on the right. When the master container receives an Update packet from a neighboring router, it updates its local routing table and immediately sends the updated entries and protocol state to all member containers. Upon receiving this information, the other member containers promptly update their local routing tables and protocol states to ensure strict synchronization of routing information across the cloud cluster system. When a standby container receives an Update packet from a neighboring router, it forwards the packet to the master container for processing.

When the master container fails, the newly elected master container can seamlessly take over its tasks. After receiving OSPF packets from neighboring routers, the new master sends the updated routing entries and protocol state information to all member containers, ensuring that the OSPF protocol operation in the cloud cluster is unaffected, as shown in Figure 18. This ensures that when a member container fails, other member containers can continue to run normally and quickly take over the failed container's functions. Consequently, the internal routing protocol remains uninterrupted, and no disruption occurs in Layer 2 or Layer 3 traffic and services, achieving seamless fault protection and container switching.

Figure 17 Protocol hot backup (before a member container fails)

 

Figure 18 Protocol hot backup (after a member container fails)

 

Cluster link redundancy

The cloud cluster uses aggregation technology to implement cluster link redundancy.

First, the cloud cluster separates control packets and data packets. The control packets in the cloud cluster are sent through the cluster control link, and cross-device service data packets within the cloud cluster are sent through the cluster data link. As a best practice, deploy the cluster control link and cluster data link on different physical links to ensure that bursts of service data do not affect the communication of control packets.

Next, multiple physical ports are bound to the cluster control link, and the cloud cluster automatically aggregates these control links. Similarly, multiple physical ports are bound to the cluster data link, and the cloud cluster automatically aggregates these data links. Multiple links within the same aggregation can load balance the packets, effectively increasing bandwidth and enhancing performance. Additionally, these links back each other up, ensuring that even if one link fails, the cloud cluster functionality remains unaffected, thereby improving device reliability.

Figure 19 Cluster link redundancy

 

AP hot backup and client hot backup

 

NOTE:

Client hot backup must be used together with AP hot backup and takes effect only when both client hot backup and AP hot backup are enabled.

 

In large WLANs, multiple ACs are usually required to manage numerous APs. Managing each AC independently is difficult, and a single AC failure will disrupt the wireless network services it provides. To resolve these issues, you can use multiple ACs to form a cloud cluster and configure AP hot backup among the member ACs. This ensures unified AP management, information backup, and continuous AP operation during AC failures.

After enabling AP hot backup and client hot backup, the leader can back up all connected AP and client information to the followers. When the leader fails, the system restores the AP and client information on a follower, ensuring continuous wireless service.

Cloud cluster deployment

Network configuration

AC 1 and AC 2 establish a cloud cluster through a direct link, with AC 1 as the master device. The cloud cluster and the core switches establish a dynamically aggregated link for LACP MAD and service packet forwarding.

Figure 20 Network diagram

 

Restrictions and guidelines

Use two ACs of the same model to form the cloud cluster.

To upgrade a network of another type to a cloud cluster, as a best practice, first clear the AC configurations, reboot the device, set up the cluster network, and finally configure other services.

In a cloud cluster, member devices must use IP addresses on the same subnet. Therefore, assign a dedicated IP subnet to the cloud cluster, and make sure no other devices use this subnet. If any other devices also use IP addresses on the subnet, cloud cluster network issues or service issues might occur. Plan the network configuration accordingly based on actual conditions.

Prerequisites

·     Configure core switch 1 and core switch 2 to form a stable Comware 7 IRF fabric.

·     Use the shutdown command to shut down the service interfaces connecting AC 1 and AC 2 to the switches, preventing loops generated before dynamic aggregation is configured. Bring up the interfaces after the cluster is set up and dynamic aggregation is configured.

Procedure

Configuring core switch 1 and core switch 2

# Create Layer 2 aggregate interface Bridge-Aggregation 1 and set its aggregation mode to dynamic.

<Core> system-view

[Core] interface bridge-aggregation 1

[Core-Bridge-Aggregation1] link-aggregation mode dynamic

[Core-Bridge-Aggregation1] quit

# Add Ten-GigabitEthernet 1/0/2 to aggregation group 1.

[Core] interface ten-gigabitethernet 1/0/2

[Core-Ten-GigabitEthernet1/0/2] port link-aggregation group 1

[Core-Ten-GigabitEthernet1/0/2] quit

# Add Ten-GigabitEthernet 2/0/2 to aggregation group 1.

[Core] interface ten-gigabitethernet 2/0/2

[Core-Ten-GigabitEthernet2/0/2] port link-aggregation group 1

[Core-Ten-GigabitEthernet2/0/2] quit

# Enable link-aggregation traffic redirection.

[Core] link-aggregation lacp traffic-redirect-notification enable

Configuring AC 1

1.     Configure the member ID of AC 1 in the cloud cluster.

Skip this step. The default member ID of the device is 1.

2.     Configure the member IP address of AC 1 in the cloud cluster.

In a cloud cluster, member devices must use IP addresses on the same subnet. Therefore, assign a dedicated IP subnet to the cloud cluster, and make sure no other devices use this subnet. If any other devices also use IP addresses on the subnet, cloud cluster network issues or service issues might occur. Plan the network configuration accordingly based on actual conditions. This section specifies the IP address as 192.168.10.10/24 as an example.

<AC1> system-view

[AC1] cloud-cluster member 1

[AC1-ccluster-member-1] member-ip 192.168.10.10 24

3.     Specify the cluster IP address.

The cluster IP address is the IP address of the leader. Skip this step in this example because AC 1 acts as the leader.

4.     Bind cloud cluster interfaces.

# Bind GigabitEthernet 1/0/2 and GigabitEthernet 1/0/3 to the control channel for redundancy.

[AC1-ccluster-member-1] cluster-link control bind interface GigabitEthernet 1/0/2

The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y

[AC1-ccluster-member-1] cluster-link control bind interface GigabitEthernet 1/0/3

The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y

# Bind Ten-GigabitEthernet 1/3/9 to the data channel. As a best practice, to meet the bandwidth requirements of the data channel, use a Ten-GigabitEthernet interface or a minimum of two GigabitEthernet interfaces.

[AC1-ccluster-member-1] cluster-link data bind interface Ten-GigabitEthernet 1/3/9

The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y

[AC1-ccluster-member-1] quit

5.     Activate the cloud cluster configuration.

The activated configuration takes effect after the device restarts.

[AC1] cloud-cluster configuration active

New cluster configuration:

  cloud-cluster service-cluster domain 0

  cloud-cluster hello cloud-timeout 3 service-timeout 5

  cloud-cluster member 1

    member-ip 192.168.10.10/24

    join-cluster ip 192.168.10.10

    role manager-worker

cluster-link control bind interface GigabitEthernet1/0/2

cluster-link data bind interface Ten-GigabitEthernet 1/3/9

The system will activate and save the configuration, and it might do a restart. Continue? [Y/N]:y

The current configuration will be written to the device. Are you sure? [Y/N]:y

Please input the file name(*.cfg)[flash:/startup.cfg]

(To leave the existing filename unchanged, press the enter key):test.cfg

Configuring AC 2

1.     Configure the member ID of AC 2 in the cloud cluster.

# Change the member ID of AC 2 to 2, as each device must have a unique member ID within the cluster. The change takes effect after you activate the cloud cluster configuration.

<AC2> system-view

[AC2] cloud-cluster member 1 renumber 2

This command will take effect after the cloud cluster configuration is activated. The command might result in configuration change or loss when it takes effect. Continue? [Y/N]: y

[AC2] cloud-cluster member 1

[AC2-ccluster-member-1]

2.     Configure the member IP address of AC 2 in the cloud cluster.

In a cloud cluster, the member IP addresses of member devices must be on the same subnet.

[AC2-ccluster-member-1] member-ip 192.168.10.11 24

3.     Specify the cluster IP address.

The cluster IP address is the IP address of the leader. In this example, AC 1 acts as the leader and the cluster IP address must be specified as 192.168.10.10.

[AC2-ccluster-member-1] join-cluster ip 192.168.10.10

4.     Bind cloud cluster interfaces.

# Bind GigabitEthernet 1/0/2 and GigabitEthernet 1/0/3 to the control channel for redundancy.

[AC2-ccluster-member-1] cluster-link control bind interface GigabitEthernet 1/0/2

The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y

[AC2-ccluster-member-1] cluster-link control bind interface GigabitEthernet 1/0/3

The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y

# Bind Ten-GigabitEthernet 1/3/9 to the data channel. As a best practice, to meet the bandwidth requirements of the data channel, use a Ten-GigabitEthernet interface or a minimum of two GigabitEthernet interfaces.

[AC2-ccluster-member-1] cluster-link data bind interface Ten-GigabitEthernet 1/3/9

The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y

[AC2-ccluster-member-1] quit

5.     Activate the cloud cluster configuration.

The activated configuration takes effect after the device restarts.

[AC2] cloud-cluster configuration active

New cluster configuration:

  cloud-cluster service-cluster domain 0

  cloud-cluster hello cloud-timeout 3 service-timeout 5

  cloud-cluster member 2

    member-ip 192.168.10.11/24

    join-cluster ip 192.168.10.10

    role manager-worker

cluster-link control bind interface GigabitEthernet2/0/2

cluster-link data bind interface Ten-GigabitEthernet 2/3/9

The system will activate and save the configuration, and it might do a restart. Continue? [Y/N]:y

The current configuration will be written to the device. Are you sure? [Y/N]:y

Please input the file name(*.cfg)[flash:/startup.cfg]

(To leave the existing filename unchanged, press the enter key):test.cfg

Configuring LACP MAD

# Create Layer 2 aggregate interface Bridge-Aggregation 1 and set its aggregation mode to dynamic.

<AC1> system-view

[AC1] interface bridge-aggregation 1

[AC1-Bridge-Aggregation1] link-aggregation mode dynamic

# Enable LACP MAD.

[AC1-Bridge-Aggregation1] mad enable

You need to assign a domain ID (range: 0-4294967295)

[Current domain ID is: 0]: 1

The assigned domain ID is: 1

[AC1-Bridge-Aggregation1] quit

# Add Ten-GigabitEthernet 1/3/10 to aggregation group 1.

[AC1] interface ten-gigabitethernet 1/3/10

[AC1-Ten-GigabitEthernet1/3/10] port link-aggregation group 1

[AC1-Ten-GigabitEthernet1/3/10] quit

# Add Ten-GigabitEthernet 2/3/10 to aggregation group 1.

[AC1] interface ten-gigabitethernet 2/3/10

[AC1-Ten-GigabitEthernet2/3/10] port link-aggregation group 1

[AC1-Ten-GigabitEthernet2/3/10] quit

Configuring wireless hot backup

# Configure AP hot backup.

<AC1> system-view

[AC1] wlan ap-backup hot-backup enable global

This operation will enable fast switchover for AP backup.

# Configure client hot backup.

[AC1] wlan client-backup hot-backup enable

Verifying the configuration

# Display physical cluster information. Verify that the cluster contains two devices, with AC 1 as the leader and AC 2 as a follower.

<AC1> display cloud-cluster

Manager list:

Member ID    Role        Member IP           State       Heartbeat(ms)

1            Leader      192.168.10.10       online      100

2            Follower    192.168.10.11       online      0

Worker list:

Member ID    State       Heartbeat(ms)       Joined at

1            online      100                 2023-02-12 06:13:28

2            online      200                 2023-02-12 06:13:28

# Display container cluster information. Verify that the container on AC 1 is the master and the container on AC 2 is a standby.

<AC1> display cloud-cluster service-cluster container

Container ID     Slot ID       Member ID      Role        Status

1                1             1              Master      Online

2                2             2              Standby     Online
