Cluster/host

What should I do if a host is removed from or fails to be added to a cluster or host pool because that host is power cycled while HA is being enabled on that cluster or while that host is being added to the host pool or cluster?

Symptom

A host is removed from or fails to be added to a cluster or host pool because the host is power cycled while HA is being enabled on that cluster or while the host is being added to the host pool or cluster.

Solution

How do I recover the HA clusters on a CVM server that is solely used to provide management services after the CVM application fails?

To recover the HA clusters on the CVM server:

  1. Reinstall CVM and CVK, and then log in to CVM.

  1. Make sure all hosts in the original cluster are operating correctly.

  1. Create an empty cluster and enable HA.

  1. Add any of the hosts in the original cluster to the new empty cluster. In the dialog box that opens, choose to restore the HA configuration of the original cluster or to restore the shared file system of the host.

  1. In the dialog box for adding hosts, add all hosts in the host list to CVM by entering the username and password of their root user accounts in order.

How do I recover a failed host in an HA cluster if it acts as both a CVM server and an application server (installed with only CVK)?

To recover the host:

  1. Reinstall CVM and CVK on the host, and then assign it the same IP address and host name as the original CVM server.

  1. Make sure the remaining hosts in the cluster are operating correctly.

  1. Log in to CVM, and then create an empty cluster with HA enabled.

  1. Add any of the hosts other than the CVM server to the empty cluster. In the dialog box that opens, choose to restore the HA configuration of the original cluster or to restore the shared file system of the host.

  1. In the dialog box for adding hosts, add all hosts in the host list to CVM by entering the username and password of their root user accounts in order.

  1. Check the cluster for the CVM server. If the server is in the cluster, delete it from and then re-add it to the cluster. If the server is not in the cluster, add the server to the cluster.

What should I do if the cluster HA configuration differs between CVM and a host after a failed attempt to enable or disable HA on the cluster containing that host?

Condition

This issue might occur if the host restarts, shuts down, or loses its connectivity while HA is being enabled or disabled on the cluster.

Solution

What should I do if the host displays "/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. Press F to attempt to fix the errors, I to ignore, S to skip mounting, or M for manual recovery" at startup?

Condition

This prompt is displayed if the system cannot start up because of a corrupt file system. File system corruption typically occurs if unexpected power-off, hardware damage, or forced shutdown has occurred.

Solution

To resolve this issue with minimal data loss:

  1. Press F for an automatic recovery.

  1. Restart the host.

  1. If the issue persists, press M for a manual recovery.

  1. Enter the password of the root user, and then execute the fsck -y disk-or-partition command (fsck -y /dev/sda1 for example) to check for file system errors.

  1. Restart the host.
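
After the manual fsck run, the exit status of the command shows whether the errors were fixed and whether a restart is needed. The following is a minimal sketch, not part of CAS: the helper name describe_fsck_status is hypothetical, and the meanings are the exit status bits listed in the fsck(8) man page.

```shell
# Decode an fsck exit status, which is a bit mask (see the fsck(8) man page).
# Typical use on the host after the manual recovery step:
#   fsck -y /dev/sda1; describe_fsck_status $?
describe_fsck_status() (
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        exit 0
    fi
    out=""
    for pair in "1:errors corrected" "2:reboot required" \
                "4:errors left uncorrected" "8:operational error" \
                "16:usage or syntax error" "32:cancelled by user" \
                "128:shared library error"; do
        bit=${pair%%:*}
        text=${pair#*:}
        # Test one bit with integer division and modulo (POSIX arithmetic).
        if [ $(( (status / bit) % 2 )) -eq 1 ]; then
            out="${out:+$out; }$text"
        fi
    done
    echo "$out"
)
```

A status that includes "errors left uncorrected" suggests that the check should be repeated or the disk inspected before the host is restarted.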

What should I do when the host displays "The disk drive for /opt/mds/disk/1 is not present. Continue to wait, or Press S to skip mounting, or M for manual recovery" at startup?

Condition

This prompt is displayed if the system cannot start up because of a disk or disk partition recognition failure. This issue typically occurs when the disk is damaged or is absent.

Solution

To resolve this issue:

  1. Press S to skip mounting.

  1. Check for disk errors after the host starts up.

  1. Delete the entries for the unrecognized disk or partitions (if any) from the /etc/fstab file. Alternatively, add the nobootwait option to the mount options of the entry, for example, "UUID=2f943256-c904-45f7-9ad1-f3a79e7d70f3 /opt/mds/disk/1 ext4 nobootwait,defaults 0 2".

  1. Restart the host by executing the reboot command from the CLI of the host.
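
The nobootwait alternative in the steps above can be sketched as a small shell helper. This is an illustration under assumptions: the function name add_nobootwait is hypothetical, the helper only prints the edited content and does not modify /etc/fstab, and rewriting a field with awk collapses the whitespace on that line to single spaces.

```shell
# Print an fstab with "nobootwait" prepended to the options of one mount point.
# Arguments: <fstab-file> <mount-point>
# The input file is left unchanged; the result goes to standard output.
add_nobootwait() {
    awk -v mp="$2" '
        # Leave comment lines and blank lines untouched.
        /^[ \t]*(#|$)/ { print; next }
        # Field 2 is the mount point, field 4 is the option list.
        # Skip entries that already carry the option.
        $2 == mp && $4 !~ /(^|,)nobootwait(,|$)/ { $4 = "nobootwait," $4 }
        { print }
    ' "$1"
}
```

One possible way to use it is to write the result to a separate file, for example add_nobootwait /etc/fstab /opt/mds/disk/1 > /tmp/fstab.new, compare it with the original, and copy it into place only after a backup of /etc/fstab has been taken.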

When I edit the IQN of a host, the system displays an error message

Symptom

When I edit the IQN of a host, the system displays "You cannot Edit the configuration file of iSCSI because iSCSI is being used (Error code: 5029).".

Condition

This issue typically occurs if the host and the shared storage fail to communicate because of link failure. When an administrator deletes a shared storage pool from CVM, the host cannot delete the shared storage pool. As a result, the shared storage pool is deleted on CVM, but the session between the host and the storage pool still exists.

Solution

To resolve this issue:

  1. Log in to the host through SSH.

  1. Execute the iscsiadm -m session command to verify that the session between the host and the storage pool exists. If the session does not exist, contact technical support.

  1. Delete the folder named after the IQN of the host from the /etc/iscsi/nodes directory.

  1. Delete the folder named after the IP address of the corresponding storage server from the /etc/iscsi/send_targets directory.
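
Before deleting anything, it can help to list exactly which folders the two deletion steps refer to. The following sketch only prints candidate paths and removes nothing. The function name list_stale_iscsi_records and the optional base-directory argument are hypothetical, and the port suffix handling is an assumption: on some open-iscsi installations the folder under send_targets is named "<ip>,<port>" rather than the bare IP address.

```shell
# List the iSCSI record folders that the deletion steps refer to.
# Arguments: <iqn> <storage-server-ip> [base-dir]
# base-dir defaults to /etc/iscsi; nothing is deleted by this function.
list_stale_iscsi_records() (
    iqn=$1
    ip=$2
    base=${3:-/etc/iscsi}
    # Folder under nodes that is named after the IQN.
    [ -d "$base/nodes/$iqn" ] && echo "$base/nodes/$iqn"
    # Folder under send_targets named after the storage server address.
    # Assumption: the name may carry a port suffix such as "<ip>,3260".
    for dir in "$base/send_targets/$ip" "$base/send_targets/$ip",*; do
        [ -d "$dir" ] && echo "$dir"
    done
    exit 0
)
```

After reviewing the printed paths, each folder can be removed manually, for example with rm -r followed by the path.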

What should I do if the system stops executing scheduled tasks after I edit the system time of the management host?

Symptom

The system stops executing scheduled tasks (including scheduled snapshot tasks, backup tasks, and DRX tasks) after I edit the system time of the management host.

Condition

This issue occurs because the new system time of the management host is different from the time of the tomcat or casserver daemon.

Solution

To resolve this issue:

  1. Log in to the management host through SSH.

  1. Execute the service tomcat8 restart command to restart the tomcat daemon.

     The CVM system becomes unavailable after you execute this command. Before executing this command, stop all tasks in the system. Do not perform any operation in the system after you execute this command.

  1. Execute the service casserver restart command to restart the casserver daemon.
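
The two restart commands can be wrapped in a short shell function so that they always run in the documented order. This is a sketch under assumptions: the function name restart_cvm_daemons is hypothetical, and stopping when the first restart fails is a design choice of this example rather than documented CAS behavior. The optional argument replaces the service command with a stub for a dry run.

```shell
# Restart the tomcat and casserver daemons in the order given in the steps.
# Optional argument: the command used to control services (default: service).
# Passing a stub command allows a dry run without touching the daemons.
restart_cvm_daemons() (
    svc=${1:-service}
    "$svc" tomcat8 restart || { echo "tomcat8 restart failed" >&2; exit 1; }
    "$svc" casserver restart || { echo "casserver restart failed" >&2; exit 1; }
)
```

For a dry run, define a stub such as stub() { echo "$*"; } and call restart_cvm_daemons stub, which prints the two commands instead of executing them.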

What should I do if I receive an error message when I edit a port profile?

Symptom

The "Domain not found: no domain with matching uuid 06753ef3-7ead-42b1-9b7b-f7b09f53c8ec" error message is displayed when a port profile is edited.

Condition

This issue occurs if a port profile has been applied to a VM that is in the database of CAS, but that VM does not exist on the host.

Solution

To resolve this issue, delete the VM (most likely in unknown state) from CAS, and then edit the port profile.