Table of Contents
Physical cluster working mechanism
Basic concepts of physical cluster
Physical cluster establishment and changing
Container cluster working mechanism
Basic concepts of container cluster
Container cluster establishment
Container monitoring and intelligent management
Container cluster splitting and MAD
Restrictions and guidelines: cloud cluster configuration
Cloud cluster configuration method
Moving a device from physical cluster A to physical cluster B
Display and maintenance commands for physical clusters
Excluding interfaces from the shutdown action upon detection of multi-active collision
Enabling LACP-based MAD auto recovery
Manually recovering a container cluster
Optimizing container cluster settings
Enabling software auto-update for software image synchronization
Configuring the container cluster bridge MAC address
Delaying reporting container cluster link down events
Enabling cloud cluster auto-merge
Blocking member IDs in a cloud cluster
Configuring cloud cluster optimization for WLAN access
Verifying and maintaining container clusters
Accessing the container cluster
Cloud cluster configuration examples
Example: Configuring a cloud cluster
Example: Replacing a faulty physical device
Example: Replacing the cluster port when a cluster link fails
Example: Migrating a physical device to another cloud cluster
Configuring a cloud cluster
About cloud cluster
Cloud cluster is an H3C-proprietary software virtualization technology. It uses the Comware 9 containerized architecture to decouple applications and physical devices as much as possible. A Comware 9-based cloud cluster is divided into the following layers:
· Physical cluster—Indicates a physical cluster at the physical device layer. The core idea of physical clustering is to connect multiple physical devices together and virtualize them into one device after necessary configuration. Using this virtualization technology can integrate hardware resources from multiple devices. On one hand, it enables unified management and allocation of hardware resources from multiple devices, increasing resource utilization and reducing management complexity. On the other hand, it also achieves hardware-level backup, which improves the reliability of the entire system.
· Container cluster—Indicates a container cluster at the application layer. The core idea of container clustering is to logically connect containers running on physical devices together and virtualize them into one system after necessary configuration. Using this virtualization technology can integrate software processing capabilities from multiple containers. It enables collaborative operation, unified management, and uninterrupted maintenance of multiple containers.
NOTE: Currently, cloud cluster supports only Comware 9-based container clustering. Unless otherwise specified, the term "container" in this document refers to Comware 9-based containers.
Networking applications
Basic networking applications
In the basic networking applications of cloud cluster, physical devices have a one-to-one correspondence with Comware containers, with one Comware container running on each physical device. Physical devices can be clustered to form a device-level backup, while Comware containers can be clustered to form a service-level backup. The entire physical cluster corresponds to the Comware container cluster. This provides a simple topology, simple configuration, and easy maintenance.
As shown in Figure 1, two devices form a physical cluster. For their upper and lower layer devices, the two physical devices are virtualized into one network device (corresponding to the container cluster in Figure 1). The virtual network device owns and manages the resources on all its member devices.
Figure 1 Basic networking application of cloud cluster
Advanced networking applications
In advanced networking applications of cloud clusters, network administrators can deploy one or more Comware containers on physical devices based on parameters such as performance, hardware resources, and processing capabilities. Comware-based containers created on the same physical cluster can belong to the same cluster, or they can be divided into different clusters to provide services for different user networks.
As shown in Figure 2, two devices form a cloud cluster. At the physical level, they are virtualized into a single physical device. At the service level, they are virtualized into two Comware-based container clusters that provide transmission services for different user networks. The super administrator of the entire network maintains the physical cluster, while the Comware-based container clusters can be assigned to user network administrators for maintenance. After logging in to the Comware-based container clusters, the user network administrators can configure features such as switching, routing, and security.
Compared to basic cloud cluster networking applications, advanced cloud cluster networking applications are more flexible and can adapt to personalized management needs of user networks. However, this application has requirements for the performance, hardware resources, and processing capabilities of the physical devices.
Figure 2 Advanced networking application of cloud cluster
Cloud cluster architecture
Figure 3 shows the physical architecture of a cloud cluster. The relationship between the physical cluster and the container cluster is as follows:
· Cloud platform modules run on physical devices, directly on the H3C-optimized Linux system. The cloud platform modules on different physical devices communicate with each other through Layer 3 channels to virtualize these physical devices into a physical cluster.
· Containers run on cloud platform modules and are managed by cloud platform modules. The containers on physical devices communicate with each other through LIPC/MBUS channels to virtualize containers into a container cluster.
· A Comware container is a container that runs the Comware system and provides basic communication functions such as routing and switching for devices. You can also install other containers on devices, but only Comware containers can be virtualized into a container cluster in the current software version.
Figure 3 Physical architecture of a cloud cluster
Cloud cluster virtualizes physical devices into a dual-cluster virtualization architecture with a physical cluster and a container cluster, as shown in Figure 4.
Figure 4 Logical architecture of cloud cluster
Benefits
The following are benefits of the overall cloud cluster architecture:
· Dual-layer virtualization architecture—Cloud cluster uses a dual-layer virtual architecture with a physical cluster and a container cluster to separate the underlay hardware infrastructure from the overlay applications. This increases the flexibility of the entire cloud cluster system as follows:
¡ The underlay physical cluster achieves unified management of physical devices, while acting as the orchestration and management platform for the overlay containers.
¡ The overlay container cluster provides high reliability and intelligent elastic scaling for services.
· Dual-layer selection of primary containers—Cloud cluster enhances the stability of service operation by adopting the following dual-layer primary container selection mechanism:
¡ When the physical cluster is operating correctly, the primary container in the container cluster is elected and maintained by the physical cluster.
¡ When the physical cluster is operating incorrectly, the container roles are determined by the role election mechanism within the container cluster.
· Simplified management—After the physical cluster is set up, users can log in to the cloud cluster system through any port of any member device to manage all member devices and containers in the cloud cluster.
The following are benefits of the container cluster layer:
· 1:N container backup—The container cluster is formed by multiple Comware containers. The primary Comware container is responsible for the operation, management and maintenance of the container cluster and the subordinate Comware containers process services as a backup. Once the primary Comware container fails, the system will quickly and automatically elect a new primary Comware container to ensure uninterrupted service operation. This achieves 1:N backup of Comware containers.
· Link aggregation across Comware containers—The physical links between Comware containers and upper and lower layer devices support aggregation. Different physical links on different Comware containers can also be aggregated into a logical link. The physical links can back up each other or share the traffic load. If a Comware container leaves the container cluster, the links on the other Comware containers can still send and receive packets, thus improving the reliability of the aggregated links.
· Powerful network expansion capabilities—You can add physical interfaces to increase the number of ports and bandwidth of the container cluster. The member devices can independently process protocol packets and forward packets with their own CPUs, allowing for easy and flexible expansion of the processing capacity of the container cluster.
Cloud platform components
The components that implement the cloud cluster functionalities within a device are collectively referred to as the cloud platform. The cloud platform contains the following components:
· Cloud platform Manager—Runs in the host operating system of each physical node that participates in physical cluster management. The Manager is responsible for providing cloud platform HA, establishing the cluster, and managing cluster members. It provides the following functions:
¡ Manage, establish, and maintain the physical cluster, manage cluster members, and generate and update the cluster topology.
¡ Manage the container cluster, intelligently deploy Comware containers based on the distribution of physical hardware resources, and elect the primary and subordinate containers for the container cluster.
· Cloud platform Worker—Runs in the host operating system of each physical node. The Worker component is responsible for managing the lifecycle of physical nodes and containers. It periodically reports the physical resources and status of the nodes, responds to scheduling instructions from the Manager component, and creates and runs containers based on instructions from the Manager component.
· Cloud platform Admin—Runs on each physical node. The Admin component receives and processes configuration messages from the primary Comware container. It is responsible for managing device operating modes and container description files, and sending container deployment requests to the Manager cluster.
· Cloud platform Agent—Runs in containers. The Agent component is responsible for reporting the health status of the services inside the container and notifying the service module of cluster and container events.
Figure 5 Cloud platform components
Figure 6 shows the internal running locations of cloud platform components within the device. After each physical device is powered on and starts up, it automatically runs the cloud platform Worker, cloud platform Admin, and cloud platform Agent components. The cloud platform Manager is an optional component. The device will run the cloud platform Manager component only when the device role is configured as manager+worker in order to participate in the management of the physical cluster.
Figure 6 Running locations of cloud platform components
Physical cluster working mechanism
Basic concepts of physical cluster
Member device roles
Every device in the physical cluster is called a member device. Member devices are divided into two roles based on different functions:
· Manager: Responsible for the high availability (HA) function of the cloud platform, establishing and managing the cluster members. It includes the following functions:
¡ Manage the physical cluster, establish and maintain it, manage cluster members, and generate and update cluster topology.
¡ Manage the container cluster, intelligently deploy Comware containers based on the distribution of hardware resources in the physical cluster, and elect the primary and subordinate containers of the container cluster.
Managers are further divided into leader and follower depending on their responsibilities.
¡ Leader: Primary manager, responsible for managing and controlling the entire cloud cluster, acting as the control center of the entire cloud cluster.
¡ Follower: Backup manager, running as a backup while handling services and forwarding packets at the same time.
When devices are configured as managers, the devices automatically run the manager component to perform the relevant manager functions.
· Worker: Responsible for local node management, reporting node resources to the leader, and receiving scheduling messages from the leader for container deployment.
When devices are configured as workers, the devices automatically run the worker component to perform the relevant worker functions.
When a physical cluster is first created, the network administrator determines the physical devices on which managers are deployed and which managers act as the leader and followers.
Member ID
In a cloud cluster, a physical device is uniquely identified by a member ID, and member IDs are used during the setup and maintenance of both physical and container clusters.
In a cloud cluster, only one device can use the default member ID, and all other devices must modify their member IDs before joining the cluster. When modifying member IDs, make sure they are unique within the cloud cluster.
· During the setup of a physical cluster, if two devices have the same member ID, the device that registers later cannot join the physical cluster.
· During the operation of a physical cluster, if a new device tries to join but its member ID conflicts with an existing member's ID, the device cannot join the physical cluster.
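The member ID admission rule can be summarized in a minimal sketch (Python, illustrative only and not H3C device code): a device whose member ID collides with an existing member is refused.

```python
# Minimal sketch of the member ID admission rule (illustrative, not device code).
def try_join(existing_member_ids: set[int], new_member_id: int) -> bool:
    """Return True if the device may join, False if its member ID conflicts."""
    if new_member_id in existing_member_ids:
        # The device that registers later stays out of the physical cluster.
        return False
    existing_member_ids.add(new_member_id)
    return True

members = {1, 2}               # hypothetical existing member IDs
print(try_join(members, 2))    # False: ID 2 is already in use
print(try_join(members, 3))    # True: ID 3 is unique, so the device joins
```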
Member IP address
Member IP addresses are used for internal communication within a physical cluster, specifically for exchanging physical cluster protocol messages (which are Layer 3 IP packets). All member devices in the physical cluster must be configured with a member IP address, and all member IP addresses must belong to the same network segment. Make sure all member devices can reach each other at Layer 3.
Join-cluster IP address
Join-cluster IP is an IP address configured by the administrator on a device to guide the device to join the physical cluster. The join-cluster IP can be the member IP of any existing member device in the physical cluster. As a best practice, configure the member IP of the leader as the join-cluster IP.
During the initial setup of the physical cluster, it is not required to configure the join-cluster IP for the leader. Devices not configured with a join-cluster IP consider themselves as the leader.
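The following sketch (Python, with purely illustrative names and addresses) restates that rule: a device without a join-cluster IP considers itself the leader, and a device with one sends a join request to that address.

```python
# Illustrative sketch of the join-cluster IP rule (not device firmware).
from typing import Optional

def startup_behavior(join_cluster_ip: Optional[str]) -> str:
    if join_cluster_ip is None:
        return "act as leader"                         # no join-cluster IP configured
    return f"send cluster join request to {join_cluster_ip}"

print(startup_behavior(None))             # act as leader
print(startup_behavior("192.168.1.1"))    # send cluster join request to 192.168.1.1
```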
Physical cluster splitting
As shown in Figure 7, after the formation of a physical cluster, if a link failure occurs and results in two adjacent member devices in the physical cluster becoming disconnected, the physical cluster splits into two separate physical clusters. This process is called physical cluster splitting.
Figure 7 Physical cluster splitting
Physical cluster merging
As shown in Figure 8, after the failure link is repaired, the split physical clusters will automatically merge, and this process is called physical cluster merging.
If physical cluster A splits into physical cluster 1 and physical cluster 2, and physical cluster B splits into physical cluster 3 and physical cluster 4, the original cluster topology information remains on the split clusters. Therefore, physical cluster 1 cannot merge with physical cluster 3 or physical cluster 4, and physical cluster 2 cannot merge with physical cluster 3 or physical cluster 4 either.
Figure 8 Physical cluster merging
Physical cluster topology
The control packets of a physical cluster are Layer 3 IP packets. The physical cluster requires that member devices be configured in the same network segment and use this network segment to exchange physical cluster control packets. A physical cluster supports chain-shaped connection and star-shaped connection.
· When two devices are set up as a physical cluster, they can be connected in a chain or a star topology.
¡ Chain connections are suitable for networks where member devices are physically concentrated.
¡ Star topology connections have lower physical location requirements for member devices than chain connections and are mainly used for networks where member devices are physically dispersed. However, an intermediary device is required to interconnect the member devices.
· When the number of member devices exceeds two, you must use the star-shaped connection.
As shown in Figure 9, the container cluster shares its control link with the physical cluster, enabling members in the physical cluster to forward control packets.
Figure 9 Physical cluster topology
NOTE: In the current software version, the physical cluster uses the control channel of the container cluster link to transmit physical cluster control packets. To set up a container cluster network, the network administrator must use commands to bind physical interfaces with the control channel and data channel of the container cluster link on the device. The control channel is used to transmit physical cluster control packets and container cluster control packets between the member devices. The data channel is used to transmit data packets during cross-container forwarding.
Physical cluster establishment and changing
Physical cluster establishment
To initially establish a physical cluster, you must configure devices to determine their identities. Before you build the cluster, complete the cluster planning, including the devices that will participate in physical cluster management, the leader device, member IDs, and the network segment used for internal communication.
For a device to act as the leader manager, configure the following on the device:
· (Optional.) Specify the device role as Manager+Worker. By default, the device role is Manager+Worker.
· (Optional.) Specify the member ID. You can use the default member ID.
· (Required.) Specify the member IP address.
Then, restart the device. The device acts as the leader and operates as follows:
1. The device starts the Admin, Manager, and Worker components of the cloud platform according to the configuration file.
2. The leader device establishes internal communication channels with worker devices. The worker devices register on the leader and report hardware resource information.
3. The leader notifies the workers to start the containers.
Figure 10 Leader startup process
Device joining
Newly joined devices in the physical cluster also determine their identities through configuration.
For a device to act as a follower manager, you must configure the following:
· (Optional.) Specify the device role as Manager+Worker. By default, the device role is Manager+Worker.
· (Required.) Specify the member ID. Make sure the member ID is unique in the cluster.
· (Required.) Specify the member IP address. Make sure the address is on the same subnet as the member IP of the leader and the two devices can reach each other.
· (Required.) Specify the Join-cluster IP address. As a best practice, specify the leader's member IP as the join-cluster IP. You can also specify the member IP of any other member device.
For a device to act as a worker, you must configure the following:
· (Optional.) Specify the device role as Worker. By default, the device role is Manager+Worker.
· (Required.) Specify the member ID. Make sure the member ID is unique in the cluster.
· (Required.) Specify the member IP address. Make sure the address is on the same subnet as the member IP of the leader and the two devices can reach each other.
· (Required.) Specify the Join-cluster IP address. As a best practice, specify the leader's member IP as the join-cluster IP. You can also specify the member IP of any other member device.
The following steps describe the process of a follower joining the cluster. The startup process of a follower, excluding steps relevant to the manager component, is the same as the startup process of a worker.
After configuring a follower device, restart the device. The device reads the configuration. Once the device detects that a join-cluster IP is configured, it starts up as a follower and sends a cluster join request to the join-cluster IP address:
· If the join-cluster IP is the leader's member IP, the leader receives the cluster join request and unicasts a reply indicating a successful join.
· If the join-cluster IP is the member IP of another follower device in the cluster, the follower device forwards the join request to the leader. The leader unicasts a reply indicating a successful join to the new device.
As shown in Figure 11, the process for Device B to join the physical cluster that contains Device A is as follows:
1. Device B starts the Admin, Manager, and Worker components of the cloud platform based on the configuration file.
2. The Worker component automatically starts the Comware container, and the Manager (follower) and Worker automatically register with the leader and start the cluster join timer.
3. Device A is the leader in the physical cluster and replies with a successful join message to the Manager (follower) and Worker.
4. The leader periodically unicasts Hello packets (announcing itself as a healthy leader) to the members.
5. After receiving the Hello packet, Device B records the leader's information and reports its local physical resource information to the leader.
6. If the network administrator issues a command to create a container, the leader schedules Device B to create and start the container based on the resource information reported by each member device.
7. After the container on Device B is successfully started, the Worker component reports the container's information to the leader.
8. The leader synchronizes the physical cluster information with the Manager component of Device B so that the follower can act as a backup to the leader. The leader also synchronizes the information about other containers in the current cloud cluster with the Worker component of Device B.
Figure 11 Adding a new device to the physical cluster
Member leaving
After the physical cluster is successfully established, the leader records the information of all managers and workers in the cluster, and draws the cluster topology based on the connection. The relationship between the leader and follower is maintained through interacting Hello packets. A member device can actively leave the physical cluster or be forced to leave the cluster:
· Active leaving
Active leaving refers to the scenario where an administrator executes the undo join-cluster command in cloud-cluster member view to remove the device from the physical cluster. The device sends a leave cluster message to the leader, and the leader replies with a leave cluster response. Then, the leader removes the device from the physical cluster device list and physical cluster topology. Finally, the updated physical cluster information and cluster topology will be synchronized to other followers.
· Passive leaving
Passive leaving of a member device from the physical cluster refers to the scenario where the member cannot reach the leader's member IP and control packets cannot reach the other end. The process of a member device passively leaving the physical cluster is as follows:
a. The leader periodically unicasts Hello packets to announce its status to all followers.
b. Each follower locally creates an election timer. If a Hello packet is received before the election timer expires, it is considered that the leader is running normally, and the follower responds with a Hello response packet.
c. When the leader receives a Hello response, it considers that the corresponding follower is running normally. If the leader does not receive a Hello response from a follower, the leader decreases the remaining Hello timeout count for that follower by 1. If the count reaches 0 and the leader still has not received a Hello response from that follower, the leader considers that the follower has temporarily left the physical cluster and sets the follower's status to Offline.
d. If a follower has not received a Hello packet from the leader until the election timer expires, it considers that the leader has failed. The follower will enter the leader role election process.
Figure 12 Passive leaving
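The timeout bookkeeping on the leader side can be modeled with a short sketch (Python; the budget of three missed responses is a hypothetical value, not the actual protocol constant):

```python
# Simplified model of passive-leaving detection (assumption, not the real protocol code).
HELLO_TIMEOUT_BUDGET = 3   # hypothetical number of tolerated missed Hello responses

class FollowerState:
    def __init__(self) -> None:
        self.remaining = HELLO_TIMEOUT_BUDGET
        self.status = "Online"

    def on_hello_round(self, response_received: bool) -> None:
        if response_received:
            self.remaining = HELLO_TIMEOUT_BUDGET     # follower is healthy, reset budget
            self.status = "Online"
        else:
            self.remaining -= 1
            if self.remaining <= 0:
                self.status = "Offline"               # follower has left the cluster

f = FollowerState()
for got_reply in (True, False, False, False):
    f.on_hello_round(got_reply)
print(f.status)   # Offline after three consecutive missed responses
```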
Physical cluster splitting
During the operation of a physical cluster, leaders, followers, and workers in the cluster periodically send Hello packets to each other to maintain the cluster relationship. If the Hello timer expires and no response is received from the peer end, the device considers that the peer end has failed and sets the state of the peer end to Offline.
Once a physical cluster is formed, if a link fails between member devices and the Hello packets cannot reach the destination, the physical cluster splits into two separate physical clusters. This process is called cluster splitting. After splitting, the following conditions might occur:
· If the number of member devices in one physical cluster is greater than half of the total number of member devices before the split, this physical cluster can function normally. The other physical cluster cannot function properly.
· If the number of member devices in both physical clusters is less than or equal to half of the total number of member devices before the split, neither of the two physical clusters can function properly.
Working properly means being able to maintain the physical cluster and manage the containers deployed on the physical cluster. Not working properly means being able to maintain the physical cluster but unable to manage the containers deployed on it.
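A minimal sketch of this majority rule (Python, illustrative only):

```python
# Illustrative quorum check: a split fragment keeps managing containers only if it
# holds more than half of the member devices counted before the split.
def fragment_can_manage(fragment_size: int, original_size: int) -> bool:
    return fragment_size > original_size / 2

print(fragment_can_manage(2, 3))   # True : 2 of 3 members, keeps working
print(fragment_can_manage(1, 3))   # False: 1 of 3 members
print(fragment_can_manage(1, 2))   # False: exactly half is not enough
```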
One operating cluster after splitting
After the split, if the number of member devices in one physical cluster exceeds half of the total number of member devices in the original cluster, this physical cluster can retain the original leader or elect a new leader to continue functioning. The other physical cluster, in which the number of member devices does not exceed half of the total number of member devices in the original cluster, cannot retain the original leader or elect a new leader, and thus cannot function properly.
As shown in Figure 13, the total number of member devices is three. After the split, physical cluster 1 has two member devices, and physical cluster 2 has only one member device.
· In physical cluster 1 that has two member devices:
¡ If one member was the leader before the split, it can detect that a member has left through Hello packets. Since the number of remaining members exceeds half of the total members, the member can continue acting as the leader. Physical cluster 1 can operate correctly.
¡ If both members were followers before the split, a new leader will be elected from the two followers according to the Raft algorithm. The newly elected leader takes over the services of the original leader, and physical cluster 1 can operate correctly.
· In physical cluster 2 that has only one member device:
¡ If the member was the leader before the split, it detects upon Hello packet timeout that the number of remaining members in physical cluster 2 does not exceed half of the total number of members in the original cluster. The member degrades to a follower.
¡ If the member was a follower before the split, it cannot obtain the majority of votes according to the Raft algorithm. The member must continue operating as a follower.
Figure 13 One operating cluster after splitting
No operating cluster after splitting
If the number of member devices in both physical clusters is less than or equal to half of the total number of member devices in the original physical cluster, neither physical cluster can retain the original leader or elect a new leader. Because both clusters lack a leader, neither of the two clusters can function properly.
As shown in Figure 14, the total number of members is two in the original physical cluster. After the split, both physical cluster 1 and physical cluster 2 have only one member. Since none of the member devices can obtain a majority vote, they can only act as followers. Both physical clusters are unable to function properly.
Figure 14 No operating cluster after splitting
Physical cluster merging
The process of interconnecting two stably operating physical clusters to form one physical cluster is called physical cluster merging.
Only physical clusters with member IPs on the same subnet can be merged into one cluster. Physical clusters with member IPs on different subnets, even if the clusters can reach each other at Layer 3, cannot be merged into one cluster.
The following conditions might occur during physical cluster merging:
· If one of the clusters contains a leader, the leader can discover the newly added devices through Hello packets. The leader role remains unchanged, and the newly added devices operate as followers.
As shown in Figure 15, Device A, Device B, and Device C form a physical cluster, with Device B as the leader. When the cluster link between Device B and Device C fails, the cluster splits into two. Device B still operates as the leader. When the cluster link between Device B and Device C is repaired, Device C can receive Hello packets from the leader and will join physical cluster 1 as a follower. Device B continues to act as the leader of the entire physical cluster.
Figure 15 Merging of a cluster that has a leader and a cluster that does not have a leader
· If no cluster has a leader, the followers elect a new leader according to the Raft algorithm. As shown in Figure 16, Device A and Device B formed a physical cluster, with Device B as the leader. When the cluster link between Device A and Device B fails, the cluster splits into two. However, because the number of members in either cluster cannot exceed half of the total members, no leader can be elected in either cluster. When the cluster link between Device A and Device B is repaired, Device A and Device B can receive voting requests from each other. The one that receives the voting response first becomes the leader, and the other acts as a follower. In the figure, suppose Device A receives the voting response first and is elected as the leader.
Figure 16 Merging of two clusters that do not contain a leader
Container cluster working mechanism
Basic concepts of container cluster
Operation mode
Comware 9 containers use the cluster mode in factory default settings and support forming container clusters with other Comware 9 containers. Even a single Comware 9 container running on its own is considered a container cluster, but with only one member.
Member container roles
Each container in a container cluster is called a member container. Member containers are divided into the following roles according to their functions:
· Master container: Responsible for managing and controlling the entire container cluster.
· Standby container: Runs as a backup container for the master container while processing business and forwarding packets. When the master container fails, the system automatically elects a new master container from the standby containers.
In a correctly operating physical cluster, the master and standby roles are determined by the leader of the physical cluster. When the physical cluster fails, both the master and standby containers are selected through election.
Only one master container exists in a container cluster at a time, and all the other member containers are standby containers. For more information about the container election process, see "Master container election."
Container ID
A container ID is the unique identifier of a container in the container cluster, and a member ID is the unique identifier of a member device in a cloud cluster. Member containers run on physical devices, and the container IDs are assigned by the leader of the physical cluster in a unified way.
In a cloud cluster, only one device can use the default member ID, and you must modify the member IDs for the other devices before adding them to the cloud cluster. When modifying a member ID, make sure the ID is unique in the cloud cluster.
· When setting up the physical cluster, if there are devices with the same member ID, the later joined device cannot join the physical cluster.
· During the operation of the physical cluster, if a new device joins the physical cluster, but its ID conflicts with the ID of an existing member device, the device cannot join the physical cluster.
Container cluster domain
A domain is a logical concept, and one container cluster corresponds to one container cluster domain.
To accommodate various network applications, multiple container clusters can be deployed in a network, and the container clusters are distinguished by domain IDs. As shown in Figure 17, Device A and Device B form container cluster 1, and Device C and Device D form container cluster 2. The two container clusters are configured with different domain numbers, ensuring that the operation and services of the two clusters do not interfere with each other.
Figure 17 Container cluster domain for different container clusters
Container cluster splitting
As shown in Figure 18, if cluster link errors occur in a container cluster and cause disconnection between two adjacent member devices, the container cluster splits into two. This process is called container cluster splitting.
Figure 18 Container cluster splitting
Container cluster merging
As shown in Figure 19, you can connect two (or more) stably operating container clusters to each other and configure required settings to form one container cluster. This process is called container cluster merging.
Figure 19 Container cluster merging
Container cluster topology
To set up a container cluster network, the network administrator must use commands to bind physical interfaces with the control channel and data channel of the container cluster link on the member devices. The control channel will be used to transmit physical cluster control packets and container cluster control packets between the member devices. The data channel will be used to transmit data packets during cross-container forwarding.
As shown in Figure 20, a container cluster supports chain-shaped connection and star-shaped connection.
· To use two devices to form a physical cluster, use the chain-shaped or star-shaped connection.
¡ Chain connections are suitable for networks where member devices are physically concentrated.
¡ Star topology connections have lower physical location requirements for member devices than chain connections and are mainly used for networks where member devices are physically dispersed. However, an intermediary device is required to interconnect the member devices.
· To use more than two devices to form a physical cluster, use the star-shaped connection.
As shown in Figure 20, the container cluster shares its control link with the physical cluster, enabling members in the physical cluster to forward control packets.
Figure 20 Container cluster topology
Container cluster establishment
The Worker component of the cloud platform is responsible for creating and deleting containers.
A Comware 9 container is the basic container of the device, used to implement routing and forwarding functions. Therefore, the device supports Comware 9 containers by default. Currently, the physical cluster supports collaboration only with Comware 9 containers and can manage Comware 9 containers (such as determining the primary and backup containers). The physical cluster can host other containers based on the Docker technology, but cannot manage non-Comware container clusters.
The process of establishing a container cluster is as follows:
1. After a device starts up, it automatically attempts to start Comware 9 containers. The cloud platform Agent component inside the container notifies the Worker component of the container creation and deletion events.
2. The Worker component forwards the container creation and deletion events to the leader in the physical cluster.
3. The leader decides whether to allow the creation or deletion of containers based on the physical resource usage. If allowed, the first created container is the master container, and containers created later are standby containers.
4. The leader notifies the Worker component to create or delete the container.
5. After the Worker component successfully creates or deletes containers, it notifies the leader of the creation or deletion result.
6. The leader updates the container information table (including LIP and container MAC) and the container topology (including Container ID, Member ID, and container MAC), and then synchronizes the updated container information table and container topology to all containers in the cloud cluster.
Container monitoring and intelligent management
The leader monitors various key and service metrics of containers and uses these values to intelligently manage the containers.
Container key metrics
Key metrics for containers refer to metrics that represent the fundamental functions of individual containers or container cluster systems, such as chip failures, abnormal CPU port detection, and abnormal board status. Since key metrics have a significant impact on devices, when a container detects an anomaly in a key metric, it immediately reports a critical event to the leader. The cloud platform or the inner layer of the container will isolate the faulty node to prevent the fault from escalating. The following isolation types are available:
· Fault isolation: When a container fails, the cloud platform actively triggers isolation of the container.
The service port of a container in fault isolation state is shut down. The container cannot forward service packets, nor can it send or receive container cluster control packets. When the fault is cleared, the container automatically restarts to exit the isolation state and can then rejoin the operating container cluster.
· Cluster split isolation: When the physical link between containers is disconnected, isolation is also triggered on the cloud platform or the inner layer of the containers.
The service port of the isolated container is shut down. The container cannot forward service packets but can send and receive control packets of the container cluster. When the faulty link recovers and the container clusters merge, the isolated container cluster automatically restarts and joins the normally running container cluster.
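The difference between the two isolation types can be summarized in the following sketch (Python, purely illustrative):

```python
# Illustrative comparison of the two isolation types described above.
ISOLATION_BEHAVIOR = {
    "fault_isolation": {
        "service_ports": "shut down",
        "cluster_control_packets": "blocked",
        "exit_condition": "container restarts automatically after the fault is cleared",
    },
    "cluster_split_isolation": {
        "service_ports": "shut down",
        "cluster_control_packets": "allowed",
        "exit_condition": "container cluster restarts and rejoins after the link recovers",
    },
}

for mode, behavior in ISOLATION_BEHAVIOR.items():
    print(mode, behavior)
```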
Table 1 Container key metrics
| No. | Key metric | Description | Values | Method for the Leader to obtain the metric | Impact of key metric failures on cloud clusters | Leader's approach to handling key metric failures |
|---|---|---|---|---|---|---|
| 1 | Chip jam | Chip blockage periodic detection | Normal, Abnormal | Proactive report | Container fault | Fault isolation |
| 2 | CPU port | CPU chip pin detection | Normal, Abnormal | Proactive report | Container fault | Fault isolation |
| 3 | Board status | Module status detection | Normal, Abnormal | Proactive report | Container fault | Fault isolation |
| 4 | Fan status | Fan status detection | Normal, Abnormal | Proactive report | Container fault | Fault isolation |
| 5 | Temperature status | Temperature sensor status detection | Normal, Abnormal | Proactive report | Container fault | Fault isolation |
| 6 | Abnormal reboot | Device abnormal restarts (> 2 times) | Normal, Abnormal | Proactive report | Container fault | Fault isolation |
Container service metrics
Container service metrics refer to the service-related metrics within a container that must be closely monitored, such as container health, the number of ARP entries, and the number of MAC entries. Container service metrics are an important basis for the election of master and standby container roles. If container service metrics are abnormal, the basic forwarding functions of the container are affected.
In a physical cluster, the Worker component of the cloud platform periodically retrieves the values of container service metrics. If the value of a service metric changes, the Worker component reports information such as the container ID, the name of the service metric, and the value of the service metric to the leader of the physical cluster. The leader then takes appropriate actions.
The container health score reflects the actual health status of the device. Containers with higher health scores have a higher priority to be elected as the master container. When several containers have the same health score, the containers with a higher cumulative service volume have a higher priority to be elected as the master container.
Table 2 Container service metrics
| No. | Service metric | Description | Values (Integer) | Metric obtaining method | Metric reference value |
|---|---|---|---|---|---|
| 0 | Device health | Container health score | 0 to 100 | Periodic obtaining | Container health status |
| 1 | Arp Resource | Number of ARP entries | ≥ 0 | Periodic obtaining | Container service volume |
| 2 | Mac Resource | Number of MAC entries | ≥ 0 | Periodic obtaining | Container service volume |
| 3 | FIB Resource | Number of FIB forwarding entries | ≥ 0 | Periodic obtaining | Container service volume |
| 4 | ND Resource | Number of ND forwarding entries | ≥ 0 | Periodic obtaining | Container service volume |
| 5 | IPv4 Resource_L2 | Number of IPv4 Layer 2 multicast entries | ≥ 0 | Periodic obtaining | Container service volume |
| 6 | IPv6 Resource_L2 | Number of IPv6 Layer 2 multicast entries | ≥ 0 | Periodic obtaining | Container service volume |
| 7 | IPv4 Resource_L3 | Number of IPv4 Layer 3 multicast entries | ≥ 0 | Periodic obtaining | Container service volume |
| 8 | IPv6 Resource_L3 | Number of IPv6 Layer 3 multicast entries | ≥ 0 | Periodic obtaining | Container service volume |
| 9 | ACL Resource | ACL resources | ≥ 0 | Periodic obtaining | Container service volume |
Master container election
Master container election takes place in the following situations:
· A container cluster is established.
· The master container leaves or fails.
· A container cluster splits.
· Two (or more) independently running container clusters merge into one container cluster.
Election at container cluster establishment
When a container cluster is established for the first time or the entire container cluster restarts, the container that starts first becomes the master container. The other containers become the standby containers. Therefore, after the entire container cluster restarts, it is possible for another container to be elected as the master container.
Election upon master container leaving or failure or cluster split
When the master container leaves or fails, or when the container cluster splits, the system elects a new master in the following order:
1. The current master container keeps running as the master. If the current master is still available, the container cluster will not elect a new master even if a new container with a higher priority joins. This rule does not apply when the container cluster is established, as all joined devices consider themselves as the master.
2. The container with the highest member priority is selected.
3. The container with the highest health score is selected.
4. The container with the longest running time is selected. In the container cluster, the measurement precision of running time is 10 minutes. If the startup time interval of two devices is less than or equal to 10 minutes, they are considered to have equal running time.
5. The container with the largest cumulative service volume is selected.
6. The container with the lowest CPU MAC addresses is selected.
Once determined, the master container immediately broadcasts a Hello packet to announce its master identity, health, and service volume information. Upon receiving this packet, the other containers stop the election process and function as standby containers. Standby containers send Hello packets that carry the role, health, and service volume information to the master container. The master container collects information and topology of all standby containers through Hello packets and reports the information to the leader. Once the container cluster information finishes updating, Hello packets are periodically sent between the master and standby containers to maintain the container cluster relationship.
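A minimal sketch of this tie-break order (Python; all data values are hypothetical and the code is illustrative rather than the shipped election logic):

```python
# Illustrative master election tie-break: member priority, health score, running
# time in 10-minute buckets, cumulative service volume, then lowest CPU MAC.
from dataclasses import dataclass

@dataclass
class Container:
    name: str
    member_priority: int       # higher wins
    health_score: int          # higher wins
    uptime_minutes: int        # longer wins, compared with 10-minute precision
    service_volume: int        # higher wins
    cpu_mac: int               # lower wins

def election_key(c: Container):
    return (
        c.member_priority,
        c.health_score,
        c.uptime_minutes // 10,    # 10-minute measurement precision
        c.service_volume,
        -c.cpu_mac,                # lowest MAC wins, so negate for max()
    )

candidates = [
    Container("container-1", 1, 95, 123, 4000, 0x0001),
    Container("container-2", 1, 95, 128, 6000, 0x0002),
]
print(max(candidates, key=election_key).name)
# container-2: same priority, health, and 10-minute uptime bucket, larger service volume
```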
Cloud clusters support a dual-layer election mechanism for the master container, which enhances the reliability and robustness of the cluster:
1. When the physical cluster is running normally, the leader of the physical cluster selects the master container based on the master container election rules.
2. When the physical cluster does not have a leader and cannot run normally, the container cluster itself selects the master container based on the master container election rules.
Election at container cluster merging
See "Container cluster merging."
Cloud cluster HA mechanism
Container cluster splitting and MAD
Split detection
When a container cluster link fails, Hello packets time out. As a result, the standby containers mistakenly assume that the master container has failed because they no longer receive its Hello packets. According to the master container election rules, the standby containers elect a new master container. This splits the original container cluster into two container clusters.
Conflict handling
The two container clusters formed by a split have the same IP address and other Layer 3 configurations, which can cause address conflicts and lead to a wider network failure. To improve system availability, the cloud cluster offers the following technologies to minimize the impact of container cluster splits on service operations:
· Multi-Active Detection (MAD)—A container cluster uses the MAD technology to obtain MAD parameters and make MAD decisions based on those parameters, ultimately achieving the following objectives:
¡ One container cluster keeps operating.
¡ The other container cluster switches to Recovery state (disabled state) and automatically shuts down all service ports on all member containers (except for reserved ports). This ensures that the container cluster in Recovery state can no longer forward service packets. To configure the reserved ports, use the mad exclude interface command.
For more information about MAD, see "MAD."
· Aggregation side selection—The cloud cluster uses aggregate interfaces to connect upstream and downstream devices. When MAD is not configured, the aggregation side selection technology enables the upstream and downstream devices to use the standard LACP protocol to select the same side device for packet forwarding, thus avoiding network conflicts. For more information about aggregation side selection, see "Aggregation side selection."
MAD fault recovery
The MAD fault recovery methods are the same regardless of the MAD method in use:
1. Repair the faulty link to automatically merge the split container clusters.
2. If the link is still not repaired but the normally working container cluster also fails, the container cluster in Recovery state can automatically or manually be enabled as an emergency backup.
MAD fault recovery achieved by repairing the failed link
A container cluster link failure causes the container cluster to split, resulting in multi-active conflicts. Therefore, repairing the faulty container cluster link and merging the conflicting container clusters back into one resolves the container cluster fault.
After the container cluster link is repaired, the system automatically restarts the container cluster in Recovery state. After the restart, all member containers in the Recovery container cluster join the normal working container cluster as container members. The service interfaces that were forcibly closed in the Recovery container cluster will automatically recover to their actual physical state. The entire container cluster system is then restored, as shown in Figure 21.
CAUTION: Restart the container cluster in Recovery state according to the instructions. If you mistakenly restart the container cluster in the normal working state, the merged container cluster will still be in Recovery state and the service interfaces on all member devices will be shut down. In such a situation, execute the mad restore command to restore the entire container cluster system.
Figure 21 MAD fault recovery (link failure in container cluster)
MAD fault recovery achieved by using LACP
With LACP MAD auto-recovery configured, if the container cluster in normal working state fails before the MAD failure is resolved (possibly due to device failure or upstream/downstream link failure), LACP automatically activates the container cluster in Recovery state to ensure service operations. Then, the system can repair the failed container cluster and cluster link.
As shown in Figure 22, LACP MAD and LACP MAD auto-recovery are enabled when the container cluster is operating normally. When a cluster link failure occurs, LACP MAD can detect multi-active conflicts and set container cluster 2 in Recovery state without shutting down the interface configured with LACP MAD. This allows container cluster 2 to interact with container cluster 1 through the interface to exchange LACP MAD packets and check whether container cluster 1 is operating normally. If container cluster 1 fails and causes the LACP MAD packets to time out, container cluster 2 immediately exits Recovery state and takes over the operations of container cluster 1.
Figure 22 LACP MAD auto-recovery
MAD fault recovery achieved by executing the mad restore command
If the container cluster in working state fails due to reasons such as device failure or uplink/downlink line failure, you can execute the mad restore command on the cluster in Recovery state. This operation restores the Recovery cluster to a normal state and replaces the failed working container cluster. Then, fix the faulty container cluster and links.
Figure 23 MAD fault recovery (normally working cluster failure before fault recovery)
MAD
About MAD
To adapt to various network requirements, cloud cluster supports the following MAD technologies:
· Cloud platform MAD
· LACP MAD
Cloud clusters primarily use cloud platform MAD. When cloud platform MAD is not operational, other detection methods are employed. Among these, LACP MAD is recommended. However, LACP MAD relies on intermediate devices to forward LACP packets, and these devices must support H3C's extended LACP protocol packets.
Figure 24 Comparison of different MAD types
| MAD type | Advantages | Limits | Application scenarios |
|---|---|---|---|
| Cloud platform MAD | Feature that comes with physical clusters. Additional configuration not required. | For products that only support sharing physical cluster links with container cluster links, cloud platform MAD can take effect only when the physical cluster links are up but the container Hello packets have timed out. For products that support both shared container cluster and physical cluster links and dedicated physical cluster links, cloud platform MAD can take effect as long as the physical cluster links are up (the physical cluster is not split). | All cloud cluster networks |
| LACP MAD | High detection speed. Supplements cloud platform MAD. | To transmit LACP MAD detection messages, H3C devices (supporting extended LACP protocol packets) must be used as intermediate devices, and each member container must be connected to the intermediate device. | Container clusters that use aggregated links to connect with upstream or downstream devices |
Cloud platform MAD
After a container cluster splits, the containers send Hello packets to detect the number of connected member containers based on their local records of container cluster member information and topology. This detected information is then reported to the leader.
If a leader exists in the current physical cluster, it triggers cloud platform MAD. If cloud platform MAD determines that the container cluster has split, it resolves the existing conflicts.
As shown in Figure 25, the physical devices Device A and Device B form a physical cluster, and Comware containers are running on each device, forming a container cluster. When a link failure occurs and the standby container fails to receive Hello packets from the master container, the container cluster splits into container cluster 1 and container cluster 2. However, because the physical cluster links are usually intact, the physical cluster continues to function normally. At this time, cloud platform MAD is used to handle the split of the container cluster.
LACP MAD
After a container cluster splits, if the cloud platform MAD cannot function properly and LACP MAD is configured, LACP MAD is triggered.
As shown in Figure 26, the physical devices Device A and Device B form a physical cluster, and Comware containers are running on each device, forming a container cluster. When a link failure occurs within the container cluster, the container cluster splits into container cluster 1 and container cluster 2. Since the physical cluster links are shared with the container cluster links, the physical cluster also splits. In the absence of a leader in the physical cluster to handle the MAD event, LACP MAD handles the split of the container cluster.
LACP MAD is implemented through extended LACP protocol packets and typically uses a network setup as shown in Figure 26:
· Each member device must be connected to an intermediate device.
· The links connecting member devices to the intermediate device are added to a dynamic aggregation group.
· The intermediate device must support extended LACP packets.
For more information about LACP, see Network Connectivity Configuration Guide.
The extended LACP protocol defines a new Type/Length/Value (TLV) data field used for exchanging the DomainID and ActiveID (member ID of the primary device) of the container cluster. When LACP MAD detection is enabled, member devices use LACP protocol packets to interact with other member devices, exchanging DomainID and ActiveID information.
· If the DomainIDs are different, it indicates that the packets are from different container clusters, and MAD processing is not required.
· If the DomainIDs are the same and the ActiveIDs are also the same, it indicates that no multi-active conflict has occurred.
· If the DomainIDs are the same but the ActiveIDs are different, it indicates a split in the container cluster. Thereby, a multi-active conflict is detected.
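The three cases can be condensed into a short decision sketch (Python, purely illustrative; the DomainID and ActiveID values are hypothetical):

```python
# Illustrative decision table for the extended-LACP TLV carrying DomainID and ActiveID.
def lacp_mad_verdict(local: tuple[int, int], peer: tuple[int, int]) -> str:
    local_domain, local_active = local
    peer_domain, peer_active = peer
    if local_domain != peer_domain:
        return "different container clusters: no MAD processing"
    if local_active == peer_active:
        return "same cluster, same master: no multi-active conflict"
    return "same domain, different masters: multi-active conflict detected"

print(lacp_mad_verdict((10, 1), (20, 1)))   # different container clusters: no MAD processing
print(lacp_mad_verdict((10, 1), (10, 1)))   # same cluster, same master: no multi-active conflict
print(lacp_mad_verdict((10, 1), (10, 2)))   # same domain, different masters: conflict detected
```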
MAD decision making
Regardless of which MAD technology is used, the decision-making principles for choosing which container cluster continues to operate and which one is disabled are the same. This ensures consistent decision-making results across different MAD mechanisms.
The principles for MAD decision-making are as follows:
· The container cluster with more members is given priority.
· The container cluster with a healthier primary device is given priority.
· The container cluster with a longer master operation time is given priority.
· The container cluster with a lower CPU MAC address of the master is given priority.
After the above comparisons, the container cluster with the higher priority continues to operate, and the container cluster with the lower priority will be disabled (enters the Recovery state and shuts down all service interfaces except for reserved ports).
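A minimal sketch of this decision order (Python; data values are hypothetical and the code is illustrative only):

```python
# Illustrative MAD decision: member count, master health, master running time,
# then the lower CPU MAC address of the master as the final tie-breaker.
from dataclasses import dataclass

@dataclass
class SplitCluster:
    name: str
    member_count: int
    master_health: int
    master_uptime: int
    master_cpu_mac: int

def mad_key(c: SplitCluster):
    return (c.member_count, c.master_health, c.master_uptime, -c.master_cpu_mac)

a = SplitCluster("cluster-1", 2, 100, 500, 0x0001)
b = SplitCluster("cluster-2", 1, 99, 500, 0x0002)
winner = max((a, b), key=mad_key)
print(f"{winner.name} keeps operating; the other enters Recovery state")
```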
Aggregation side selection
Application scenarios
When a cloud cluster is networked using Layer 2 aggregate interfaces with upstream and downstream devices, a cluster split can occur. In such cases, employing aggregation side selection can ensure that the traffic's forward and return paths remain consistent and are forwarded through the same cluster. This helps enhance the reliability of the link.
Aggregation side selection selects one side of the cloud cluster to forward packets through LACP based on the change of the container cluster bridge MAC address.
Operating mechanism
As shown in Figure 27, when the container cluster bridge MAC address is configured to change after a split, aggregation side selection operates as follows:
1. A cluster link failure causes the container cluster to split into two separate clusters. At this point, the operating conditions of MAD are not met, so both container clusters continue to operate in the network.
2. Assume that before the split, Device A was the master and Device B was the standby. After the split, Device B also becomes a master. According to the cloud cluster health algorithm, the original master is healthier. Therefore, the left container cluster (where Device A resides) has a better health status, hypothetically 100, while the right container cluster (where Device B resides) has a slightly lower health status, hypothetically 99.
3. The LACP priority is calculated as 100 minus the health status. Thus, the LACP priority for the interfaces on Device A becomes 0, making them more likely to be selected. Meanwhile, the LACP priority for the interfaces on Device B becomes 1, reducing their chances of being selected.
4. Because the bridge MAC address change is configured for the container cluster, the bridge MAC address of the cluster containing Device A remains unchanged, while the bridge MAC address of the cluster containing Device B changes. This change triggers the sending of LACP packets, which carry the updated LACP priority.
5. Devices C, D, and E, based on the LACP priority, all select the interfaces connected to Device A. This ensures that the traffic is forwarded through Device A.
Figure 27 Aggregation side selection
Container cluster merging
The merging of container clusters is divided into the following two cases depending on whether the MAD function is enabled:
· If the physical cluster can work normally or LACP MAD is configured, when a container cluster link fails and causes the container cluster to split, the cloud cluster allows one container cluster to continue working normally. The other container cluster is disabled (placed in Recovery state). If the faulty link between the two split container clusters is restored, the two container clusters automatically merge. The member containers of the Recovery-state container cluster automatically restart and join the normally running container cluster as standby containers.
· If the physical cluster cannot work normally and LACP MAD is not configured, when the container cluster link fails and causes the container cluster to split, both container clusters will work normally (dual master phenomenon). In this case, if the faulty link between the two container clusters is restored, the two container clusters will automatically merge and master container election will be carried out. The election rules are as follows:
a. The master container with more member containers wins.
b. The master container running for a longer time wins.
c. The master container with higher health score wins.
d. The master container with a higher cumulative service volume wins.
e. The master container with a smaller CPU MAC address wins.
The container cluster that wins the master container election continues to work. The containers in the losing cluster automatically restart and join the working container cluster as standby containers.
Restrictions and guidelines: cloud cluster configuration
Before you restart a cloud cluster member device or adjust its configuration, use the display system stable state command to verify that the system is running stably.
When a member device in the cloud cluster is rebooting, do not configure the cloud cluster. In particular, do not manually restart the cpfagentd process. Doing so might cause member devices to reboot multiple times or cause the cloud cluster to operate abnormally.
In a cloud cluster, to use the map-configuration command to specify the AP configuration file, import the file into the storage media of each member to prevent issues if a master-backup switchover occurs and the AP configuration file cannot be found. The AP configuration file issued by the map-configuration command is effective only on the master of the cloud cluster. You must also specify the storage path on the master as the storage path of the configuration file. For more information about the AP configuration file, see AP Management Configuration Guide.
In a cloud cluster, to use the APDB user script to extend the supported AP models, import the APDB user script into the storage media of each member device to prevent issues if a master-backup switchover occurs and the user script cannot be found. For more information about the APDB user script, see AP Management Configuration Guide.
In a cloud cluster, the following features are not supported on devices:
· Virtual AP (see AP management configuration in AP Management Configuration Guide)
· Configuration rollback (see configuration file management configuration in Fundamentals Configuration Guide)
· WAPI (see WAPI configuration in WLAN Security Configuration Guide)
· Lite control mode (see WLAN Access Configuration Guide)
· NAT (see NAT configuration in Network Connectivity Configuration Guide)
· Dual-link backup (see WLAN high availability configuration in High Availability Configuration Guide)
· ASPF (see ASPF configuration in Security Configuration Guide)
In a cloud cluster, follow these restrictions and guidelines when you configure DPI-related features:
· Using DPI, Internet access behavior management, or security policies can cause packets to be incorrectly dropped. For more information about DPI, see DPI Configuration Guide. For more information about Internet access behavior management, see Internet Access Behavior Management Configuration Guide. For more information about security policies, see Security Configuration Guide. Support for DPI-related features varies by device model. For more information, see DPI Configuration Guide and Internet Access Behavior Management Configuration Guide.
· The device supports installing the following licenses:
¡ Appliance Application Signature Update License.
¡ IPS Signature Update Service License.
¡ URL Signature Update License.
You must install the above licenses on both the master and backup ACs.
When you configure ports bound to a cloud cluster, follow these restrictions and guidelines:
· As a best practice to prevent MAD exceptions in the cloud cluster, do not bulk configure multiple ports bound to a cloud cluster.
· Cloud cluster ports do not support the mirroring feature. For more information about mirroring, see Network Management and Monitoring Configuration Guide.
After a master-backup switchover occurs in a cloud cluster, it might take a few minutes for the system to resynchronize data from the cloud platform. During this period, clients cannot access the network through PPSK authentication. For more information about cloud platform PPSK, see WLAN Security Configuration Guide.
The cloud cluster natively supports AP license synchronization. For example, AC 1 is installed with N licenses and can be connected to up to N APs. AC 2 is installed with M licenses and can be connected to up to M APs. After AC 1 and AC 2 form a cloud cluster, the cluster has N + M licenses and can be connected to up to N + M APs.
· If AC 1 fails or goes offline, AC 2 retains N + M licenses to allow time for AC 1's recovery. However, the real number of online APs cannot exceed the maximum AP specification of AC 2.
· If AC 1 remains offline for over 30 days, AC 2 will reduce its licenses by removing AC 1’s share, retaining only M licenses. In this case, AC 2 can be connected to up to M APs. This license change only applies to newly online APs. It does not restrict the online APs that were connected to AC 2 before the license change. Those APs do not need to go offline.
If WX3540X or WX3840X requires an EWPXM1XG20 card to expand its AP management capacity, both ACs in the cloud cluster should be installed with an EWPXM1XG20 card, which ensures that they have the same AP management capacity.
Cloud cluster configuration method
A Comware container runs on a physical device and the physical cluster shares the control links of the container cluster. When you make a network plan, perform the following tasks:
1. Identify the number of member devices in the cloud cluster. A cloud cluster can have a maximum of two members.
2. Identify hardware compatibility and restrictions of physical devices.
3. Determine the roles of devices in the physical cluster. Devices participating in the management of the physical cluster must be configured as manager-worker, while devices not involved in the management of the physical cluster must be configured as worker.
4. Complete the configuration of the cloud cluster, including configuring member IDs, member IPs, member roles, IPs of members to be added to the cluster, and binding cluster ports.
5. Connect the physical cables of the cluster.
6. Activate the cluster configuration for the devices to form a cluster.
Configuring a cloud cluster
1. Enter system view.
system-view
2. Enter cloud cluster member view.
cloud-cluster member member-id
By default, the member ID is 1.
3. Configure the member IP address for the device.
member-ip ipv4-addr mask-length
By default, the member IP address is not configured.
4. Specify the cluster IP address on manager devices that act as followers.
join-cluster ip ipv4-address
By default, the cluster IP address is not specified.
To set up a physical cluster, you must configure this command on manager devices that act as followers. This command is not required on the leader device. A manager device that is not configured with this command automatically creates a cluster and joins it as the leader.
5. Bind cluster links with physical interfaces.
cluster-link [ control | data ] bind interface interface-type interface-number
By default, cluster links are not bound to any physical interface.
6. Return to system view.
quit
7. Edit the member ID of the device.
cloud-cluster member member-id renumber new-member-id
By default, the member device ID is 1.
Only one device in the cloud cluster can use the default member ID and other devices must first edit their member IDs to join the cloud cluster. When you edit the member ID, make sure the ID is unique in the cluster.
8. Activate the physical cluster configuration.
cloud-cluster configuration active
Executing this command reboots the device. During the reboot process, the device displays interactive prompts. Choose to save the configuration and reboot the device. The new member ID takes effect only after the device restarts.
Moving a device from physical cluster A to physical cluster B
About this task
To move a device from physical cluster A to physical cluster B, first remove the device from cluster A. During the removal, the configuration, data, and topology information of cluster A are deleted from the device, and container-related configuration is retained. The device then acts as the leader to build a cluster of its own, and the container on the device operates as the master. Through further configuration, you can add the device to cluster B. The device cannot be added to cluster B if data of cluster A remains on the device.
Removing the device from physical cluster A
1. Disconnect cluster links and remove the device from physical cluster A.
2. Log in to the device.
3. Enter system view.
system-view
4. Enter cloud cluster member view.
cloud-cluster member member-id
By default, the device member ID is 1.
5. Remove the device from the cluster.
undo join-cluster
6. Return to system view.
quit
7. Activate the physical cluster configuration.
cloud-cluster configuration active
Refer to the prompt information on the device to see whether the device reboots automatically after this command is executed.
Adding the device to physical cluster B
1. Enter system view.
system-view
2. Enter cloud cluster member view.
cloud-cluster member member-id
By default, the device member ID is 1.
3. Configure the member IP address for the device. Make sure the IP address and other member IP addresses in cluster B are in the same subnet.
member-ip ipv4-addr mask-length
4. Specify the IP address of the leader device in cluster B as the cluster IP address.
join-cluster ip ipv4-address
5. Return to system view.
quit
6. (Optional.) Edit the device member ID. If the current member ID of the device is not used in cluster B, skip this step.
cloud-cluster member member-id renumber new-member-id
7. Connect cluster links and add the device to cluster B.
8. Activate the physical cluster configuration.
cloud-cluster configuration active
Refer to the prompt information on the device to see whether the device reboots automatically after this command is executed.
Display and maintenance commands for physical clusters
· To view information about the physical cluster, use the following command in any view:
display cloud-cluster [ member member-id ] [ verbose ]
· To view cloud cluster configuration information, use the following command in any view:
display cloud-cluster configuration [ member member-id ]
· To display the status of flags during the master/backup switchover process, use the following command in any view:
display wlan ap statistics cloud-cluster switch-over-state [ history ]
Configuring MAD
Configuring LACP MAD
About this task
The MAD domain ID of the container cluster is used only for MAD. When a container receives a MAD packet, it compares the MAD domain ID in the packet with its local MAD domain ID. If they are the same, it will process the packet. You must assign the same MAD domain ID to all containers in the cloud cluster. To ensure correct split detection, assign different MAD domain IDs to containers in different cloud clusters.
A container cluster has only one MAD domain ID. You can change the MAD domain ID by using the cloud-cluster service-cluster mad domain or mad enable command. The MAD domain IDs configured by using these commands overwrite each other. Modify the MAD domain IDs of container clusters according to the network plan and avoid making random changes.
Restrictions and guidelines
Assigning MAD domain IDs to container clusters
If LACP MAD runs between two container clusters, assign each container cluster a unique MAD domain ID.
Actions on interfaces shut down by MAD
To prevent a multi-active collision from causing network issues, avoid using the undo shutdown command to bring up the interfaces shut down by a MAD mechanism on a Recovery-state container cluster.
Procedure
1. Enter system view.
system-view
2. Assign a MAD domain ID to the container cluster.
cloud-cluster service-cluster mad domain domain-id
The default MAD domain ID is 0.
CAUTION: Changing the container cluster domain number of a container will cause the container to leave the current container cluster. The container no longer belongs to the current container cluster and will not be able to exchange container cluster control messages with devices in the current container cluster.
3. Create a Layer 2 or Layer 3 aggregate interface and enter its view.
¡ Create a Layer 2 aggregate interface.
interface bridge-aggregation interface-number
¡ Create a Layer 3 aggregate interface.
interface route-aggregation interface-number
You must also perform this task on intermediate devices.
4. Configure the aggregation group to operate in dynamic aggregation mode.
link-aggregation mode dynamic
By default, an aggregation group operates in static aggregation mode.
Perform this step also on the intermediate device.
5. Enable LACP MAD.
mad enable
By default, LACP MAD is disabled.
6. Return to system view.
quit
7. Enter Ethernet interface view.
interface interface-type interface-number
8. Assign the Ethernet port to the specified aggregation group.
port link-aggregation group group-id
Perform this step also on the intermediate device.
Excluding interfaces from the shutdown action upon detection of multi-active collision
About this task
When a container cluster transits to the Recovery state, the system automatically excludes the following network interfaces from being shut down:
· Container cluster physical interfaces.
· Member interfaces of an aggregate interface if the aggregate interface is excluded from being shut down.
You can exclude an interface from the shutdown action for management or other special purposes. For example:
· Exclude a port from the shutdown action so you can Telnet through the port to manage the device.
· Exclude a VLAN interface and its Layer 2 ports from the shutdown action so you can log in through the VLAN interface.
Restrictions and guidelines
If the Layer 2 ports of a VLAN interface are distributed on multiple member devices, the exclusion operation might introduce IP collision risks. The VLAN interface might be up on both active and inactive container clusters.
Procedure
1. Enter system view.
system-view
2. Configure an interface to not shut down when a container transits to the Recovery state.
mad exclude interface interface-type interface-number
By default, all network interfaces on a Recovery-state container are shut down, except for the network interfaces automatically excluded by the system.
Enabling LACP-based MAD auto recovery
About this task
During MAD recovery, devices in Recovery state rejoin the container cluster upon reboot, and interfaces shut down by MAD automatically return to their normal state.
With this feature enabled, if the operating container cluster fails before the MAD failure is resolved, LACP automatically activates the container cluster in Recovery state. This enables the interfaces shut down by MAD in the Recovery-state cluster to return to their normal operational state, ensuring minimal impact on services.
Procedure
1. Enter system view.
system-view
2. Create an aggregate interface and enter its view.
interface bridge-aggregation interface-number
3. Enable LACP-based MAD auto recovery.
lacp mad auto-recovery
By default, LACP-based MAD auto recovery is disabled.
Manually recovering a container cluster
About this task
If the active container cluster fails before the link is recovered, perform this task on the inactive container cluster to recover the inactive container cluster for traffic forwarding. The manual recovery operation brings up all interfaces that were shut down by MAD on the inactive container cluster.
Procedure
1. Enter system view.
system-view
2. Manually recover the inactive container cluster.
mad restore
Displaying MAD configuration
To display MAD configuration, execute the following command in any view:
display mad [ verbose ]
Optimizing container cluster settings
Enabling software auto-update for software image synchronization
About this task
The software auto-update feature automatically propagates the current software images of the master in the cloud cluster to member devices you are adding to the cloud cluster. Those devices will join the cloud cluster again after software image synchronization.
When the software auto-update feature is disabled, new devices can join the cloud cluster even if their software images are different from those of the master in the cloud cluster. However, the software image differences might affect the operation of some cloud cluster features on the new member devices. As a best practice to avoid such issues, enable the software auto-update feature.
Prerequisites
To ensure a successful software update, verify that the new device you are adding to the cloud cluster has sufficient storage space for the new software images. If the device does not have sufficient storage space, the cloud cluster automatically deletes the current software images of the device. If the reclaimed space is still insufficient, the device cannot complete the auto-update. You must reboot the device, and then access the BootWare menus to delete unused files.
Procedure
1. Enter system view.
system-view
2. Enable software auto-update.
cloud-cluster auto-update enable
By default, software auto-update is enabled.
Configuring the container cluster bridge MAC address
About this task
The bridge MAC address of a system must be unique on a switched LAN. Layer 2 protocols (for example, LACP) use the container cluster bridge MAC address to identify the container cluster on a switched LAN.
A container cluster usually uses the bridge MAC address of the primary container as its bridge MAC address. In this situation, the primary container is called the address owner of the container cluster bridge MAC address. After the primary container leaves, the container cluster bridge MAC address persists for a period of time or permanently depending on the container cluster bridge MAC persistence setting.
In certain application scenarios, you can configure the bridge MAC of the container cluster to a specified MAC address. For example, when you replace an existing container cluster with a new one in the network, you can configure the new container cluster's bridge MAC to be consistent with the original cluster's MAC, reducing downtime during the replacement process.
Once you configure the bridge MAC address of the container cluster to a specified value, the bridge MAC of the container cluster will always be the specified bridge MAC. The configured container cluster bridge MAC persistence will no longer be effective.
When container clusters merge, bridge MAC addresses are processed as follows:
1. Container cluster merge fails if any two member containers have the same bridge MAC address. The container cluster bridge MAC addresses themselves do not affect container cluster merge.
2. After container clusters merge, the new container cluster uses the bridge MAC address of the container cluster that won the election as the container cluster bridge MAC address.
Restrictions and guidelines
CAUTION: Bridge MAC address conflicts cause communication failures. Bridge MAC address changes cause transient traffic disruption.
Both aggregation side selection and MAD auto recovery require the container cluster bridge MAC address to be configured to change in order to function properly.
Configuring the container cluster bridge MAC address
1. Enter system view.
system-view
2. Configure container cluster bridge MAC persistence. Perform one of the following tasks:
¡ Retain the container cluster bridge MAC address permanently even if the address owner has left the container cluster.
cloud-cluster service-cluster mac-address persistent always
¡ Retain the container cluster bridge MAC address for 6 minutes after the address owner leaves the container cluster.
cloud-cluster service-cluster mac-address persistent timer
This command avoids unnecessary bridge MAC address changes caused by device reboot, transient link failure, or purposeful link disconnection.
By default, the container cluster bridge MAC address does not change after the address owner leaves.
Setting the retention time of the container cluster bridge MAC to a fixed value of 6 minutes is suitable for situations where the bridge MAC owner leaves and returns to the container cluster within a short time (such as device reboot or temporary link failure). This can reduce unnecessary bridge MAC switches that lead to traffic interruption.
Specifying the container cluster bridge MAC address
1. Enter system view.
system-view
2. Specify the container cluster bridge MAC address.
cloud-cluster service-cluster mac-address mac-address
By default, the bridge MAC address of the master container is used as the cluster bridge MAC address.
With the bridge MAC address specified, if the container cluster splits, all resulting clusters use the configured bridge MAC address.
Delaying reporting container cluster link down events
About this task
Application scenarios
To prevent frequent container cluster splits and merges during link flapping, configure the container cluster interfaces to delay reporting link down events.
Operating mechanism
Container cluster links have two physical states, up and down. Container cluster interfaces do not delay reporting link up events. They report link up events immediately after the container cluster links come up.
After you set a delay time for container cluster link down report, a container cluster interface does not report a link down event to the container cluster immediately after its link goes down. If the container cluster link is still down when the delay time is reached, the interface reports the link down event to the container cluster.
Restrictions and guidelines
If some features (for example, OSPF) are used in the container cluster, set the delay interval shorter than the timeout timers of those features to avoid unnecessary state changes.
As a best practice, set the container cluster link down report delay to 0 in the following conditions:
· Services require a fast primary/secondary switchover upon container cluster link failure.
· Before shutting down physical ports of the container cluster or rebooting member containers, set the container cluster link down report delay to 0. After you finish the operation, restore the former link down report delay value.
Procedure
1. Enter system view.
system-view
2. Set a delay for the container cluster interfaces to report link down events.
cloud-cluster link-delay interval
By default, the delay is 0, and the container cluster interfaces report link down events without delay.
Enabling cloud cluster auto-merge
About this task
Cloud clusters that are merging perform master election. Member devices in the cluster that loses in the election must restart to join the other cluster.
You can enable cloud cluster auto-merge for the system to automatically complete the merging process.
If auto-merge is disabled, the administrator must save the configuration and restart the member devices in the cluster that loses the election to complete the merging process.
· If the device generates a log message with a digest of SCLST_MERGE_MANUAL_NOREBOOT, it indicates that the device is in the cluster that wins the election and no restart is required.
· If the device generates a log message with a digest of SCLST_MERGE_MANUAL_REBOOT, it indicates that the device is in the cluster that loses in the election and a restart is required.
If you manually restart devices in the cluster that wins the election instead, the merging process can also be completed. However, the master will then reside in the cluster that lost the election.
Restrictions and guidelines
For the auto-merge feature to operate correctly, enable this feature on all the cloud clusters that require merging.
Procedure
1. Enter system view.
system-view
2. Enable cloud cluster auto-merge.
cloud-cluster auto-merge enable
By default, the cloud cluster auto-merge feature is enabled. The cloud cluster that has failed in the master election reboots automatically to complete the cloud cluster merge.
Blocking member IDs in a cloud cluster
About this task
In certain situations, packets forwarded across member devices might carry incorrect member IDs, often due to issues such as errors from transceiver modules, fibers, or cables used in cloud cluster connections. If the member ID in a packet received by a device is within its supported range but is not used in the current cloud cluster, it might lead to flooding of the packet or even cause disturbances in the cloud cluster topology.
To prevent such issues, you can perform this task to block unused member IDs within the cloud cluster. When a member device in the cluster receives a packet with a blocked member ID, it directly discards the packet.
Restrictions and guidelines
Once a member ID is blocked in a cloud cluster, the device using that ID cannot join the cloud cluster. Use caution when you determine which member IDs to block. To expand the cloud cluster later, you must first execute the undo cloud-cluster service-cluster block member command to unblock the member ID so that the new member device can use it.
After a cluster splits, execute the cloud-cluster service-cluster block member command on the master or subordinate devices with caution. If you execute this command, subordinate devices will restart after the cluster recovers, which will cause an active-active issue.
Procedure
1. Enter system view.
system-view
2. Block member IDs in a cloud cluster.
cloud-cluster service-cluster block member member-id
By default, no member ID is blocked.
Configuring cloud cluster optimization for WLAN access
About this task
Use this feature to guarantee reliable AP and client access. This feature accelerates cloud cluster master election, new member joining, and cloud cluster member role change to prevent cloud cluster events from causing unstable AP and client access.
Procedure
1. Enter system view.
system-view
2. Enable cloud cluster optimization for WLAN access.
cloudcluster-optimize wlan reliable-access
By default, WLAN access optimization is enabled in a cloud cluster.
Verifying and maintaining container clusters
To display container information of the container cluster, execute the following command in any view:
display cloud-cluster service-cluster container [ container-id ] [ verbose ]
Accessing the container cluster
A Comware 9-based container provides human-machine interfaces for login, such as the CLI, SNMP, NETCONF, CWMP, and the Web interface. After you log in to the Comware 9-based container cluster, you can perform the following tasks:
· Access the container cluster and the physical cluster.
· View all configurations of the cloud cluster and information about the cloud cluster (including the container cluster and the physical cluster).
Use either of the following methods to log in to the command line interface (CLI) of the container cluster:
· Local login—Log in through the AUX or console port of any member container.
· Remote login—Log in at a Layer 3 interface on any member container by using methods including Telnet, Web, and SNMP.
After you log in to the container cluster, you are placed at the CLI of the primary container. The primary container synchronizes all user settings to the secondary containers.
The primary container synchronizes physical cluster settings made by the network administrator to the cluster leader through the cloud platform Agent component. The cluster leader then synchronizes the settings to all the managers and workers. In this way, you can manage the entire cloud cluster from a single login point.
Cloud cluster configuration examples
Example: Configuring a cloud cluster
Network configuration
As shown in Figure 28, perform the following tasks:
· Set up a cloud cluster that contains AC 1 and AC 2.
· Configure dynamic aggregate links between the cloud cluster and switches Core 1 and Core 2, which are used for LACP multi-active detection (MAD) and service packet forwarding.
Procedure
1. Configure Core 1 and Core 2:
Before you configure Core 1 and Core 2, make sure they form a stable Comware 7-based cloud cluster.
# Create Layer 2 aggregate interface Bridge-Aggregation 1, and set the link aggregation mode to dynamic.
<Core> system-view
[Core] interface bridge-aggregation 1
[Core-Bridge-Aggregation1] link-aggregation mode dynamic
[Core-Bridge-Aggregation1] quit
# Assign Ten-GigabitEthernet 1/0/2 to aggregation group 1.
[Core] interface ten-gigabitethernet 1/0/2
[Core-Ten-GigabitEthernet1/0/2] port link-aggregation group 1
[Core-Ten-GigabitEthernet1/0/2] quit
# Assign Ten-GigabitEthernet 2/0/2 to aggregation group 1.
[Core] interface ten-gigabitethernet 2/0/2
[Core-Ten-GigabitEthernet2/0/2] port link-aggregation group 1
[Core-Ten-GigabitEthernet2/0/2] quit
2. Configure AC 1:
# AC 1 in the cloud cluster uses the default member ID, which is 1. You do not need to edit it.
# Specify the member IP address for AC1.
Make sure the member IP addresses of the member devices in the cloud cluster reside on the same network segment. Plan the network segment in advance. This document uses 192.168.10.x/24 as an example.
<AC1> system-view
[AC1] cloud-cluster member 1
[AC1-ccluster-member-1] member-ip 192.168.10.10 24
# Add 192.168.10.10 to the cloud cluster. (If the device is the cluster leader, skip this procedure.)
[AC1-ccluster-member-1] join-cluster ip 192.168.10.10
# Bind GigabitEthernet 1/0/2 to the control channel and Ten-GigabitEthernet 1/3/9 to the data channel.
[AC1-ccluster-member-1] cluster-link control bind interface gigabitethernet 1/0/2
The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y
[AC1-ccluster-member-1] cluster-link data bind interface ten-gigabitethernet 1/3/9
The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y
[AC1-ccluster-member-1] quit
# Activate the cloud cluster configuration.
[AC1] cloud-cluster configuration active
New cluster configuration:
cloud-cluster service-cluster domain 0
cloud-cluster hello cloud-timeout 7 service-timeout 10
cloud-cluster member 1
member-ip 192.168.10.10/24
join-cluster ip 192.168.10.10
role manager-worker
cluster-link control bind interface GigabitEthernet 1/0/2
cluster-link data bind interface Ten-GigabitEthernet 1/3/9
The system will activate and save the configuration, and it might do a restart. Continue? [Y/N]:y
The current configuration will be written to the device. Are you sure? [Y/N]:y
Please input the file name(*.cfg)[flash:/startup.cfg]
(To leave the existing filename unchanged, press the enter key):test.cfg
Validating file. Please wait...
Saved the current configuration to mainboard device successfully
The cloud cluster configuration takes effect after the reboot.
3. Configure AC 2:
# Specify 192.168.10.11/24 as the member IP address for AC 2. Make sure the member IP addresses of AC1 and AC2 reside on the same network segment.
<AC2> system-view
[AC2] cloud-cluster member 1
[AC2-ccluster-member-1] member-ip 192.168.10.11 24
# Add 192.168.10.10 to the cloud cluster.
[AC2-ccluster-member-1] join-cluster ip 192.168.10.10
# Bind GigabitEthernet 1/0/2 to the control channel and Ten-GigabitEthernet 1/3/9 to the data channel.
[AC2-ccluster-member-1] cluster-link control bind interface GigabitEthernet 1/0/2
The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y
[AC2-ccluster-member-1] cluster-link data bind interface Ten-GigabitEthernet 1/3/9
The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y
[AC2-ccluster-member-1] quit
# Change the member ID of AC 2 to 2. (The member ID of each member device in the cloud cluster must be unique.)
[AC2] cloud-cluster member 1 renumber 2
This command will take effect after the cloud cluster configuration is activated. The command might result in configuration change or loss when it takes effect. Continue? [Y/N]: y
# Activate the cloud cluster configuration.
[AC2] cloud-cluster configuration active
New cluster configuration:
cloud-cluster service-cluster domain 0
cloud-cluster hello cloud-timeout 7 service-timeout 10
cloud-cluster member 2
member-ip 192.168.10.11/24
join-cluster ip 192.168.10.10
role manager-worker
cluster-link control bind interface GigabitEthernet 2/0/2
cluster-link data bind interface Ten-GigabitEthernet 2/3/9
The system will activate and save the configuration, and it might do a restart. Continue? [Y/N]:y
The current configuration will be written to the device. Are you sure? [Y/N]:y
Please input the file name(*.cfg)[flash:/startup.cfg]
(To leave the existing filename unchanged, press the enter key):test.cfg
Validating file. Please wait...
Saved the current configuration to mainboard device successfully
The cloud cluster configuration takes effect after the reboot. AC2 is added to the physical cluster as a follower. In the container cluster, the container on AC 1 is the primary and the container on AC 2 is the secondary.
4. Display the cluster status to verify that the cloud cluster has been set up successfully.
# Display information about the physical cluster.
<AC1> display cloud-cluster
Manager list:
Member ID Role Member IP State Heartbeat(ms)
1 Leader 192.168.10.10 online 100
2 Follower 192.168.10.11 online 0
Worker list:
Member ID State Heartbeat(ms) Joined at
1 online 100 2023-02-12 06:13:28
2 online 200 2023-02-12 06:13:28
The output shows that the physical cluster has two member devices. AC 1 is the leader and AC 2 is the follower.
# Display information about the container cluster.
<AC1> display cloud-cluster service-cluster container
Container ID Slot ID Member ID Role Status
*+1 1 1 Master Online
2 2 2 Standby Online
---------------------------------------------------------------
* indicates the device is the master.
+ indicates the device through which the user logs in.
The output shows that the container on AC 1 is the primary and the container on AC 2 is the secondary.
5. Configure LACP MAD:
# Create Layer 2 aggregate interface Bridge-Aggregation 1, and set the link aggregation mode to dynamic.
<AC1> system-view
[AC1] interface bridge-aggregation 1
[AC1-Bridge-Aggregation1] link-aggregation mode dynamic
# Enable LACP MAD.
[AC1-Bridge-Aggregation1] mad enable
You need to assign a domain ID (range: 0-4294967295)
[Current domain ID is: 0]: 1
The assigned domain ID is: 1
[AC1-Bridge-Aggregation1] quit
# Assign Ten-GigabitEthernet 1/3/10 to aggregation group 1.
[AC1] interface ten-gigabitethernet 1/3/10
[AC1-Ten-GigabitEthernet1/3/10] port link-aggregation group 1
[AC1-Ten-GigabitEthernet1/3/10] quit
# Assign Ten-GigabitEthernet 2/3/10 to aggregation group 1.
[AC1] interface ten-gigabitethernet 2/3/10
[AC1-Ten-GigabitEthernet2/3/10] port link-aggregation group 1
[AC1-Ten-GigabitEthernet2/3/10] quit
Example: Replacing a faulty physical device
Network configuration
As shown in Figure 29, AC 1 in the cloud cluster has failed. Replace it with a new device of the same model.
Analysis
1. Isolate AC 2 from the cloud cluster and activate the physical cluster configuration on AC 2. Perform this procedure to delete information about AC 1 from the local topology of AC 2. If you do not perform this procedure, the physical cluster determines that a member ID conflict has occurred when you add the new device with the same member ID as AC 1 to it. As a result, the new device fails to be added to the physical cluster.
2. Log in to the new device. Specify the member IP address of AC 2 as the cluster IP address for the new device to join.
3. Remove AC 1 from the network.
4. Copy the configuration of AC 1 to the new device or perform the configuration of AC 1 on the new device again.
5. Connect the new device to the network.
Procedure
1. Configure AC 2:
# Isolate AC 2 from the cloud cluster.
<AC2> system-view
[AC2] cloud-cluster member 2
[AC2-ccluster-member-2] undo join-cluster
[AC2-ccluster-member-2] quit
[AC2] cloud-cluster configuration active
New cluster configuration:
cloud-cluster service-cluster domain 0
cloud-cluster hello cloud-timeout 7 service-timeout 10
cloud-cluster member 2
member-ip 192.168.10.11/24
role manager-worker
cluster-link bind interface GigabitEthernet 2/0/1
The system will activate and save the configuration, and it might do a restart. Continue? [Y/N]:Y
The current configuration will be written to the device. Are you sure? [Y/N]:y
Please input the file name(*.cfg)[flash:/startup.cfg]
(To leave the existing filename unchanged, press the enter key):
flash:/startup.cfg exists, overwrite? [Y/N]:y
Validating file. Please wait...
# Display information about the physical cluster on AC 2.
[AC2] display cloud-cluster
Manager list:
Member ID Role Member IP State Heartbeat(ms)
2 Leader 192.168.10.11 online 0
Worker list:
Member ID State Heartbeat(ms) Joined at
2 online 0 2023-02-25 22:49:52
The output shows that information about AC 1 has been cleared.
2. Configure the new device:
# Upload the configuration file of AC 1 to the new device. Execute the cloud-cluster configuration active command on the new device to activate the cloud cluster configuration. If you cannot upload the configuration file to the new device, configure the new device according to the configuration of AC 1. (Specify the member IP address of AC 2 as the cluster IP address for the new device to join.)
# You do not need to specify a member ID for the new device.
# Specify the member IP address for the new device.
The member IP addresses of member devices in the cloud cluster must reside on the same network segment. Plan the network segment in advance. This document uses 192.168.10.x/24 as an example.
<NewAC> system-view
[NewAC] cloud-cluster member 1
[NewAC-ccluster-member-1] member-ip 192.168.10.10 24
# Specify the member IP address of AC 2 as the cluster IP address for the new device to join.
[NewAC-ccluster-member-1] join-cluster ip 192.168.10.11
# Bind GigabitEthernet 1/0/2 to the control channel and Ten-GigabitEthernet 1/3/9 to the data channel.
[NewAC-ccluster-member-1] cluster-link control bind interface gigabitethernet 1/0/2
The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y
[NewAC-ccluster-member-1] cluster-link data bind interface ten-gigabitethernet 1/3/9
The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y
[NewAC-ccluster-member-1] quit
# Activate the cloud cluster configuration.
[NewAC] cloud-cluster configuration active
New cluster configuration:
cloud-cluster service-cluster domain 0
cloud-cluster hello cloud-timeout 3 service-timeout 5
cloud-cluster member 1
member-ip 192.168.10.10/24
join-cluster ip 192.168.10.11
role manager-worker
cluster-link control bind interface GigabitEthernet 1/0/2
cluster-link data bind interface Ten-GigabitEthernet 1/3/9
The system will activate and save the configuration, and it might do a restart. Continue? [Y/N]:y
The current configuration will be written to the device. Are you sure? [Y/N]:y
Please input the file name(*.cfg)[flash:/startup.cfg]
(To leave the existing filename unchanged, press the enter key):test.cfg
Rebooting....
3. Remove AC 1 from the network.
4. Connect the new device to the network according to connections of AC 1. The new device is automatically added to the cloud cluster where AC 2 resides.
Verifying the configuration
# Display information about the physical cluster.
<AC2> display cloud-cluster
Manager list:
Member ID Role Member IP State Heartbeat(ms)
1 Follower 192.168.10.10 online 0
2 Leader 192.168.10.11 online 100
Worker list:
Member ID State Heartbeat(ms) Joined at
1 online 100 2023-02-12 06:13:28
2 online 200 2023-02-12 06:13:28
The output shows that the physical cluster has two member devices. AC 2 is the leader and the new device is the follower.
# Display information about the container cluster.
<AC2> display cloud-cluster service-cluster container
Container ID Slot ID Member ID Role Status
1 1 1 Standby Online
*+2 2 2 Master Online
---------------------------------------------------------------
* indicates the device is the master.
+ indicates the device through which the user logs in.
The output shows that the container on AC 2 is the primary and the container on the new device is the secondary.
Example: Replacing the cluster port when a cluster link fails
Network configuration
As shown in Figure 30, both GigabitEthernet 1/0/1 and GigabitEthernet 2/0/1 are bound to the control and data channels. GigabitEthernet 1/0/1 has failed and you must configure a new cluster link.
Procedure
1. Configure AC 1:
# Display information about the physical cluster.
<AC1> display cloud-cluster
Manager list:
Member ID Role Member IP State Heartbeat(ms)
1 Follower 1.1.2.11 offline --
2 Follower 1.1.2.12 offline --
Worker list:
Member ID State Heartbeat(ms) Joined at
1 offline -- --
2 offline -- --
The output shows that both AC 1 and AC 2 are followers.
# Display detailed information about all containers in the container cluster.
<Sysname> display cloud-cluster service-cluster container verbose
Service-cluster name: System
Domain ID : 1
Cluster Bridge MAC: 00e0-fc00-1001
Container ID : 1
Member ID : 1
Slot ID : 1
Health : Healthy(0)
Bridge MAC : 00e0-fc00-1001
CPU MAC : 00f0-fc00-1001
Control links: GigabitEthernet1/0/1(DOWN)
Data links : GigabitEthernet1/0/1(DOWN)
Cluster connection : Unreachable
Status : Offline
Self hello timeout (ms) : 4000
Master hello timeout (ms): 4000
Container ID : 2
Member ID : 2
Slot ID : 2
Health : Normal(0)
Bridge MAC : 00e0-fc00-1002
CPU MAC : 00f0-fc00-1002
Ctrl port : GigabitEthernet2/0/1(DOWN)
Data port : GigabitEthernet2/0/1(DOWN)
Cluster connection : Unreachable
Status : Offline
Self hello timeout (ms) : 4000
Master hello timeout (ms): 4000
# Bind GigabitEthernet 1/0/2 to both the control and data channels.
<AC1> system-view
[AC1] cloud-cluster member 1
[AC1-ccluster-member-1] cluster-link bind interface gigabitethernet 1/0/2
The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y
[AC1-ccluster-member-1] quit
# Activate the cloud cluster configuration.
[AC1] cloud-cluster configuration active
New cluster configuration:
cloud-cluster service-cluster domain 0
cloud-cluster hello cloud-timeout 7 service-timeout 10
cloud-cluster member 1
member-ip 192.168.10.10/24
join-cluster ip 192.168.10.10
role manager-worker
cluster-link bind interface GigabitEthernet 1/0/2
The system will activate and save the configuration, and it might do a restart.Continue? [Y/N]:y
The current configuration will be written to the device. Are you sure? [Y/N]:y
Please input the file name(*.cfg)[flash:/startup.cfg]
(To leave the existing filename unchanged, press the enter key):
flash:/startup.cfg exists, overwrite? [Y/N]:y
Validating file. Please wait...
Saved the current configuration to mainboard device successfully.
2. Configure AC 2:
# Connect GigabitEthernet 1/0/2 on AC 1 to GigabitEthernet 2/0/2 on AC 2.
# Bind GigabitEthernet 2/0/2 to both the control and data channels.
<AC2> system-view
[AC2] cloud-cluster member 2
[AC2-ccluster-member-2] cluster-link bind interface GigabitEthernet 2/0/2
The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y
[AC2-ccluster-member-2] quit
# Activate the cloud cluster configuration.
[AC2] cloud-cluster configuration active
New cluster configuration:
cloud-cluster service-cluster domain 0
cloud-cluster hello cloud-timeout 7 service-timeout 10
cloud-cluster member 2
member-ip 192.168.10.11/24
join-cluster ip 192.168.10.10
role manager-worker
cluster-link bind interface GigabitEthernet 2/0/2
The system will activate and save the configuration, and it might do a restart. Continue? [Y/N]:y
The current configuration will be written to the device. Are you sure? [Y/N]:y
Please input the file name(*.cfg)[flash:/startup.cfg]
(To leave the existing filename unchanged, press the enter key):
flash:/startup.cfg exists, overwrite? [Y/N]:y
Validating file. Please wait...
Saved the current configuration to mainboard device successfully.
The cluster recovers after the reboot of AC 2.
Verifying the configuration
# Display information about the physical cluster.
<AC1> display cloud-cluster
Manager list:
Member ID Role Member IP State Heartbeat(ms)
1 Leader 192.168.10.10 online 100
2 Follower 192.168.10.11 online 0
Worker list:
Member ID State Heartbeat(ms) Joined at
1 online 100 2023-02-12 06:13:28
2 online 200 2023-02-12 06:13:28
The output shows that the physical cluster has two member devices. AC 1 is the leader and AC 2 is the follower.
# Display information about the container cluster.
<AC1> display cloud-cluster service-cluster container
Container ID Slot ID Member ID Role Status
*+1 1 1 Master Online
2 2 2 Standby Online
---------------------------------------------------------------
* indicates the device is the master.
+ indicates the device through which the user logs in.
The output shows that the container on AC 1 is the primary and the container on AC 2 is the secondary.
Example: Migrating a physical device to another cloud cluster
Network configuration
As shown in Figure 31, migrate Device B from Cloud cluster A to Cloud cluster B.
Procedure
1. Configure Device B:
# Log in to Cloud cluster A to isolate Device B from Cloud cluster A.
<Sysname> system-view
[Sysname] cloud-cluster member 2
[Sysname-ccluster-member-2] undo join-cluster
[Sysname-ccluster-member-2] quit
[Sysname] cloud-cluster configuration active
New cluster configuration:
cloud-cluster service-cluster domain 0
cloud-cluster hello cloud-timeout 7 service-timeout 10
cloud-cluster member 2
member-ip 192.168.10.11/24
role manager-worker
cluster-link control bind interface GigabitEthernet 2/0/1
cluster-link data bind interface Ten-GigabitEthernet 2/0/2
The system will activate and save the configuration, and it might do a restart. Continue? [Y/N]:y
The current configuration will be written to the device. Are you sure? [Y/N]:y
Please input the file name(*.cfg)[flash:/startup.cfg]
(To leave the existing filename unchanged, press the enter key):test.cfg
Validating file. Please wait...
Saved the current configuration to mainboard device successfully.
Device B operates alone after the reboot.
# Connect GigabitEthernet 2/0/1 on Device B to GigabitEthernet 1/0/1 on Device C and Ten-GigabitEthernet 2/0/2 on Device B to GigabitEthernet 1/0/2 on Device C to migrate Device B to Cloud cluster B.
# Specify the member IP address for Device B in Cloud cluster B. Make sure the member IP addresses of Device B and Device C reside on the same network segment.
<Sysname> system-view
[Sysname] cloud-cluster member 2
[Sysname-ccluster-member-2] member-ip 192.168.20.21 24
# Specify the member IP address of Device C as the cluster IP address for Device B to join.
[Sysname-ccluster-member-2] join-cluster ip 192.168.20.20
# Bind GigabitEthernet 2/0/1 to the control channel and Ten-GigabitEthernet 2/0/2 to the data channel.
[Sysname-ccluster-member-2] cluster-link control bind interface gigabitethernet 2/0/1
The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y
[Sysname-ccluster-member-2] cluster-link data bind interface ten-gigabitethernet 2/0/2
The system will shut down and then bring up the interface after activation the cloud cluster configuration. Continue? [Y/N]: y
[Sysname-ccluster-member-2] quit
# Activate the cloud cluster configuration.
[Sysname] cloud-cluster configuration active
New cluster configuration:
cloud-cluster service-cluster domain 0
cloud-cluster hello cloud-timeout 7 service-timeout 10
cloud-cluster member 2
member-ip 192.168.20.21/24
join-cluster ip 192.168.20.20
role manager-worker
cluster-link control bind interface GigabitEthernet 2/0/1
cluster-link data bind interface Ten-GigabitEthernet 2/0/2
The system will activate and save the configuration, and it might do a restart. Continue? [Y/N]:y
The current configuration will be written to the device. Are you sure? [Y/N]:y
Please input the file name(*.cfg)[flash:/startup.cfg]
(To leave the existing filename unchanged, press the enter key):test.cfg
Validating file. Please wait...
Saved the current configuration to mainboard device successfully.
The cloud cluster configuration takes effect after the reboot. Device B is automatically added to Cloud cluster B.
Verifying the configuration
# Display information about the physical cluster.
<Sysname> display cloud-cluster
Manager list:
Member ID Role Member IP State Heartbeat(ms)
1 Leader 192.168.20.20 online 100
2 Follower 192.168.20.21 online 0
Worker list:
Member ID State Heartbeat(ms) Joined at
1 online 100 2023-02-12 06:13:28
2 online 200 2023-02-12 06:13:28
The output shows that the physical cluster has two member devices. Device C is the leader and Device B is the follower.
# Display information about the container cluster.
<Sysname> display cloud-cluster service-cluster container
Container ID Slot ID Member ID Role Status
*+1 1 1 Master Online
2 2 2 Standby Online
---------------------------------------------------------------
* indicates the device is the master.
+ indicates the device through which the user logs in.
The output shows that the container on Device C is the primary and the container on Device B is the secondary.