Configure cluster HA

About cluster HA

Cluster HA depends on shared storage and dynamic migration technologies to provide simple and efficient HA services for applications running on all VMs in the cluster. It reduces service interruption caused by host hardware failure and is applicable to scenarios that require service continuity. After you enable HA for the cluster, Space Console monitors the running state of all hosts and VMs in the cluster.

When a host fails, Space Console migrates the VMs on the host to available hosts in the cluster.

When a VM fails, Space Console restarts the VM. If the restart succeeds, Space Console does not migrate the VM. If the restart fails, Space Console migrates the VM to another host and restarts it there.

When the network between a host and the shared storage fails, Space Console migrates the VMs on the host to available hosts in the cluster.
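The three failure responses above can be summarized as a small decision function. This is a minimal sketch for illustration only, not Space Console code. The names Failure and ha_actions are hypothetical, and the action strings are placeholders.

```python
from enum import Enum


class Failure(Enum):
    HOST = "host"                        # the host itself fails
    VM = "vm"                            # a single VM fails
    STORAGE_NETWORK = "storage_network"  # host loses its link to shared storage


def ha_actions(failure: Failure, restart_succeeds: bool = True) -> list[str]:
    """Return the ordered actions taken for a failure, per the rules above."""
    if failure is Failure.VM:
        # A failed VM is restarted in place first; migration is the fallback.
        if restart_succeeds:
            return ["restart"]
        return ["restart", "migrate", "restart"]
    # Host failure and host-to-storage network failure both trigger migration
    # of the VMs on the affected host.
    return ["migrate"]
```

For example, ha_actions(Failure.VM, restart_succeeds=False) returns the restart attempt, followed by a migration and a restart on the new host.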

Cluster HA has the following features:

Automatically monitors the running state of hosts and VMs, and migrates a failed VM or the VMs on a failed host to other hosts in the cluster.

Reserves enough resources for VMs to restart if hosts fail.

Automatically migrates VMs between hosts to ensure service continuity in case of hardware failure.

Automatically selects suitable hosts for VMs on a failed host based on the resource usage if you enable both HA and DRS for the cluster.

Application scenarios

Enable cluster HA to ensure continuity of services in case of failure and to automate maintenance in a virtualization environment.

Prerequisites

Configure an NTP server as described in "Configure time settings" so that heartbeat packets within an HA-enabled cluster carry the same time.

Restrictions and guidelines

All hosts in an HA-enabled cluster must have the same virtual switch configuration, including virtual switch quantity, name, and forwarding mode.

To ensure that VMs in an HA-enabled cluster can migrate between hosts in the cluster, make sure the image files of all VMs in the cluster are saved in the shared storage. As a best practice, do not enable HA or DRS if the VMs use the local storage.

In an HA-enabled cluster, all hosts must use CPUs from the same manufacturer. Clusters containing hosts that use CPUs of the same model from the same manufacturer can provide better migration compatibility.

In an HA-enabled cluster, make sure all hosts use the same NUMA architecture as a best practice. For VMs bound to physical CPUs, make sure the source host and destination host have exactly the same NUMA architecture in an inter-host VM migration. If the hosts have different NUMA architectures, the migration operation might fail or the VM performance might be affected.

To prevent VM name conflicts, make sure the cluster contains no hosts in an abnormal state before you disable HA for the cluster. If a VM name conflict occurs, enable HA for the cluster again.

During the process of enabling or disabling HA for a cluster, do not start, deploy, or migrate VMs or restart or shut down hosts in the cluster.

If you want to reinstall the operating system for a host in an HA-enabled cluster, first remove the host from the cluster. After you reinstall the operating system, add the host to the cluster again.

Before you enable HA for a cluster, make sure all hosts in the cluster have reserved sufficient system resources so that the VMs can migrate between the hosts.
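Two of the restrictions above (identical virtual switch configuration and CPUs from the same manufacturer) lend themselves to a pre-flight check before you enable HA. The following is a minimal sketch under stated assumptions: the Host and VSwitch records and the preflight_issues function are hypothetical, and the host inventory is assumed to be collected by other means. Space Console is not known to expose such a function.

```python
from typing import NamedTuple


class VSwitch(NamedTuple):
    name: str
    forwarding_mode: str


class Host(NamedTuple):
    name: str
    cpu_vendor: str
    vswitches: tuple[VSwitch, ...]


def preflight_issues(hosts: list[Host]) -> list[str]:
    """Compare every host against the first one and list any mismatches."""
    issues: list[str] = []
    if not hosts:
        return issues
    ref = hosts[0]
    for host in hosts[1:]:
        if host.cpu_vendor != ref.cpu_vendor:
            issues.append(f"{host.name}: CPU manufacturer differs from {ref.name}")
        # Sorting makes the comparison independent of listing order. Comparing
        # the full lists covers switch quantity, name, and forwarding mode.
        if sorted(host.vswitches) != sorted(ref.vswitches):
            issues.append(
                f"{host.name}: virtual switch configuration differs from {ref.name}"
            )
    return issues
```

An empty result means that the hosts pass these two checks only. The shared storage, NUMA, and reserved resource guidelines still need to be verified separately.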

Procedure

From the navigation pane, select Data Center > Virtualization > Cluster name.

Click HA.

Configure the parameters, and then click OK.

Parameters

Boot Priority: Select a default boot priority for the VMs in the cluster. Options include Low, Medium, and High. You can set the boot priority for a VM when you add or modify the VM. After a host fails, the system migrates the VMs on the host based on their boot priorities until all the VMs are migrated or the cluster does not have any available resources.
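The priority-based migration order can be illustrated with a short sketch. It rests on several assumptions that the description above does not state: VMs with High priority are handled first, memory is the only resource considered, and planning stops at the first VM that does not fit. The function name plan_migration and the tuple layout are hypothetical.

```python
# Lower rank means the VM is migrated earlier (assumed ordering).
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


def plan_migration(vms: list[tuple[str, str, int]], free_memory_gb: int) -> list[str]:
    """Return the names of the VMs that can be migrated, in migration order.

    Each VM is a (name, boot_priority, memory_gb) tuple.
    """
    migrated = []
    # sorted() is stable, so VMs with equal priority keep their input order.
    for name, priority, memory_gb in sorted(vms, key=lambda vm: PRIORITY_RANK[vm[1]]):
        if memory_gb > free_memory_gb:
            break  # the cluster has no resources left for this VM
        free_memory_gb -= memory_gb
        migrated.append(name)
    return migrated
```

With 12 GB free, the VMs ("a", "Low", 4), ("b", "High", 8), and ("c", "Medium", 4) yield the plan b, then c. VM a is left unmigrated because no memory remains.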

Enable Service Network HA: Configure whether to enable service network HA. When the service network of a VM fails, the VM can be migrated to another host. Virtual switches that are not bound to physical NICs and those that use the management network or VXLAN forwarding mode do not support HA failure detection.

Enable HA Access Control: Select whether to enable HA access control. If you enable HA access control, configure one of the following parameters:

Min Nodes: Specify the minimum number of hosts for HA to take effect on the cluster. If the number of hosts that are operating correctly in the cluster is smaller than the specified minimum node number, HA cannot take effect on the cluster. To avoid migration failure caused by inaccurate resource calculation, make sure all hosts in the cluster have the same CPU quantity and memory size.

Failover Host: Select the hosts used for migration of failed VMs. These hosts cannot be used for common VM migration or for adding VMs. The failover hosts must use the same shared storage as the service hosts. A host that has running VMs cannot be used as a failover host.

Reserved Resource: Set the reserved CPU and memory percentages. When the remaining resources in the cluster are less than the specified percentage of resources, you cannot start new VMs, set the VMs to running or suspending state, or migrate running VMs to the cluster.
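The Min Nodes and Reserved Resource rules reduce to simple threshold comparisons. The sketch below is illustrative only. The function names are hypothetical, and it assumes that falling below either the CPU or the memory reservation is enough to block the operation, which the description above does not spell out.

```python
def ha_effective(healthy_hosts: int, min_nodes: int) -> bool:
    """HA takes effect only if enough hosts are operating correctly."""
    return healthy_hosts >= min_nodes


def can_start_vm(
    free_cpu_pct: float,
    free_mem_pct: float,
    reserved_cpu_pct: float,
    reserved_mem_pct: float,
) -> bool:
    """Starting a VM is blocked when the remaining resources in the cluster
    are less than the reserved percentage (assumed: for CPU or for memory)."""
    return free_cpu_pct >= reserved_cpu_pct and free_mem_pct >= reserved_mem_pct
```

For example, with a Min Nodes value of 3 and only two healthy hosts, ha_effective(2, 3) returns False.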

Host Storage Failure Response: Select the action to take on VMs when a shared storage failure occurs. This parameter is editable only when the value for the Shared Storage Fault Action parameter in system settings is set to Do Not Restart Host and the HA status is changed from off to on.

Migrate: Migrates VMs that have all data stored on the shared storage to other hosts in the cluster when a shared storage failure occurs.

No Action: Freezes VMs that have all or some data stored on the shared storage. After the shared storage recovers, the VMs automatically enter running state. A VM cannot be frozen if one of the following conditions exists on it:

The disk bus type is USB.

The disk bus type is high-speed SCSI (FC&ISCSI) for block devices.

Disks are encrypted.

The disk cache mode is writeback or writethrough.

LVM raw blocks exist.

NFS storage is attached.

Timeout: The period of time between the execution of the No Action policy and the execution of the Migrate policy for Host Storage Failure Response. Within this period of time, I/O commands are re-issued to the storage device. The default is 12000 minutes. The maximum value is 2147483647 minutes.
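The conditions listed under No Action that prevent a VM from being frozen can be expressed as a checklist over the VM's disks. This is a minimal sketch, not the product's implementation. The Disk record, its field names, the string values such as "usb" and "high-speed-scsi", and the freeze_blockers function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    # All field names and string values are placeholders for illustration.
    bus: str = "virtio"
    block_device: bool = False
    encrypted: bool = False
    cache_mode: str = "none"
    lvm_raw_block: bool = False
    on_nfs: bool = False


def freeze_blockers(disks: list[Disk]) -> list[str]:
    """Return the reasons a VM with these disks cannot be frozen."""
    reasons = []
    for disk in disks:
        if disk.bus == "usb":
            reasons.append("USB disk bus")
        if disk.bus == "high-speed-scsi" and disk.block_device:
            reasons.append("high-speed SCSI bus on a block device")
        if disk.encrypted:
            reasons.append("encrypted disk")
        if disk.cache_mode in ("writeback", "writethrough"):
            reasons.append(f"{disk.cache_mode} cache mode")
        if disk.lvm_raw_block:
            reasons.append("LVM raw block")
        if disk.on_nfs:
            reasons.append("NFS storage")
    return reasons
```

An empty list means that none of the listed conditions apply, so the VM is a candidate for freezing under the No Action policy.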