
H3C UIS Manager Maintenance Guide-E0881P02 and later versions-5W100-book.pdf (7.96 MB)

Released At: 20-10-2025

Table of Contents

H3C UIS Manager Maintenance Guide-E0881P02 and later versions-5W100


H3C UIS Manager Maintenance Guide

Document version: 5W100-20251017

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.

Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.

The information in this document is subject to change without notice.

Contents

Routine maintenance

Reviewing alarms

Performing health check

Reviewing operation logs

Identifying cluster status

Identifying the cluster HA feature

Identifying the shared storage in the cluster

Identifying host information

Identifying host status

Identifying the uptime of a host

Identifying host performance monitoring information

Identifying vSwitch information

Identifying physical NIC status

Identifying VM status

Identifying the running status of CAStools

Verifying disk and NIC types

Identifying VM performance monitoring statistics

Identifying VM backup information

Identifying license information

Managing alarms

Managing CAS resources

Managing UIS resources

Backup center

VM backup

Platform backup

Configuration cautions and guidelines

Change operations

Upgrading UIS software

Handling hardware failure

Starting or shutting down a UIS host

IP address and host name change

Replacing a disk on a CVK host

Changing the password for accessing UIS Manager

Changing the root password of a host from the Web interface

Changing the admin password

Scaling out and scaling in a cluster

Changing the system time

Performing a heterogeneous or homogeneous migration

Redefining a VM

Obtaining the XML file of the VM

Identifying the storage volume for VM disk files

Copying the XML file of the VM to the target host

Defining the VM through XML

Clearing VM data on the original host

Replacing the backup node in a stateful failover system

Displaying technical support service information

VM grouping

Replacing SSDs with NVMe drives

Migrating VMware VMs

Configuring GPUs

Configuring vGPUs

Configuring anti-virus

Configuring storage disaster recovery

Collecting logs

Collecting logs of the UIS Manager

Collecting logs from the Web interface

Collecting logs at the CLI of a CVK host

Introduction to logs

Collecting logs of CAStools

Collecting logs of a VM operating system

Collecting logs of a Windows operating system

Viewing logs of a Windows operating system

Collecting logs of a Linux operating system

Troubleshooting tools and utilities

Introduction to kdump

Analysis with the Kdump file

Storage cluster logs

/var/log/ceph/ceph.log

/var/log/ceph/ceph-osd.*.log

/var/log/ceph/ceph-disk.log

/var/log/ceph/ceph-mon.*.log

/var/log/calamari/calamari.log

/var/log/onestor_cli/onestor_cli.log

Bimodal HCI logs

Distributed storage maintenance

Cluster issues

Rebalancing data placement when data imbalance occurs

In the Handy HA scenario, the system is inaccessible through the management HA IP

Node issues

Resolving host issues caused by a full system disk

Issues caused by network failure

Handling failures to add or delete hosts

Deleting a storage node offline and restoring the node

Disk issues

Identifying the data partitions to which the OSDs are mounted

OSD for a disk cannot be deleted upon a disk replacement prior to deletion of its OSD from UIS Manager

The UIS Web interface shows a slow disk alarm.

A disk fails to be added

Troubleshooting

Cluster initialization issues

Host scan failure

Compute cluster creation failure

Storage configuration failure

Cluster state

Health index lower than 100%

Host deletion

Deletion failure prompt for successful host deletion

Disk issues

No available disk

Insufficient disk count

Cluster alarms

Down monitor node

Down OSD

OSD process terminated unexpectedly

OSD soft link loss

Loose or faulty disk

Abnormal PG state

Cache alarm

Network suboptimal health alarm

Stateful failover

Monitoring node failure

Down monitoring node due to high system disk usage

Down monitoring node due to network error

Extent backup file

Extent backup state

Extent backup directory

Extent backup file decompression

Script for data restoration

Shared storage space reclamation

Releasing space of a shared volume by editing the VM bus type

Releasing space of a shared volume by deleting files

SNMP

Get responses not received by an NMS

Value-added services

Data of a value-added service in the memory is different from that in the database

Data inconsistency occurs if you mount a volume on a Windows client and create a snapshot online

If you mount multiple snapshots of a volume on a Windows client at the same time, you are prompted that some snapshots are not initialized or assigned

If you take a snapshot for a volume, delete its host mapping on the handy page without disk scanning or iSCSI disconnection, and restore the snapshot, the restored data is different from the original data.

If you create a read-only snapshot for a volume that is mounted by a directory, the snapshot cannot be mounted and the system prompts a wrong fs type message

The state of a snapshot is Creating, Deleting, or Restoring

Compatibility

When the Intel ixgbe network adapter is enabled with load balancing, storage access gets slow

Using a QoS policy with low bandwidth and IOPS limits makes the storage disks of a client slow

Failure to recognize an encryption dongle by VMs

After a USB device is plugged into a CVK host, the host cannot recognize the USB device

After a USB device is loaded to a VM, the device manager of the VM cannot recognize the USB device, or the USB device appears and disappears quickly, or there is an exclamation mark on the device

Use of USB3.0 devices

Use of USB-to-serial devices

Performance improvement

Guest OS and VM restoration

Restrictions and guidelines

Preparation before repair

Linux system repair steps

Windows repair operations and steps

Upgrade

Independent deployment failure

Unified authentication issue

CAS authentication service exception

UIS 2000 G6 hardware HA does not take effect

Operations and maintenance monitoring data fails to be displayed

Host discovery: Hosts have empty serial numbers or the same serial number.

In the Handy HA scenario, you cannot access the Web interface by using the HA IP.

Host 2 experienced a power cycle when Host 1 entered maintenance mode. After Host 2 recovered, the OSD took a long time to restart (about 100 minutes).

When the CPU frequencies of the source and destination hosts differ before and after VM migration, the CPU limit set before migration changes to an invalid value after VM migration from E801P01 to E886P01

Interoperation with a third-party alarm server

Configuring a third-party alarm server on the UIS platform

Configuring UC 2.0 to monitor UIS alarms

Alarm troubleshooting guide

Commonly used commands

UIS Manager commands

HA commands

vSwitch commands

SDN commands

iSCSI commands

Mounting FC storage

Tomcat commands

Database commands

virsh commands

casserver commands

qemu commands

ONEStor commands

ONEStor commands

File management commands

Process management commands

Networking

Disk management commands

Euler edition restrictions

Disabled commands

Disabled command autocompletion

Routine maintenance

Stable operation of the UIS system requires routine maintenance tasks, which typically include reviewing alarms, checking the cluster status, host information, virtual machine (VM) status, and license information, and reviewing logs.

Reviewing alarms

In the top right corner, the UIS platform main page displays indicators for the critical, major, minor, and informational alarms generated during UIS system operation.

If critical or major alarms are displayed, the UIS system might be operating abnormally and requires immediate troubleshooting.

By clicking the corresponding alarm indicator, you can access the associated real-time alarm page. Alternatively, you can navigate to the Alarm Management > Real-Time Alarm page.

On the real-time alarm page, you can troubleshoot based on the alarm source, type, content, and last alarm time.

Performing health check

The UIS platform provides a hot key in the top right corner that allows you to perform health checks, resource analysis, storage cleanup, resource export, VM restoration, and zombie VM operations.

Select Health Check to open the health check page, where you can perform health checks on specified modules.

You can print and export the health check results.

If a failure is detected in the health check, for example, a RAID controller or hard drive cache failure, you can click Remediation to resolve the issue.

Reviewing operation logs

The Operation Logs page records historical operations in the UIS system, including manual operations performed by users from the front end and automatic operations performed by the system in the back end.

Each operation log provides important information, including the operator name, finish time, login address, operation description, and failure reason.

If the result of an operation is Failed, troubleshoot the failure based on the failure reason. If a large number of operation logs exist, you can download them for troubleshooting and analysis.

The following figure shows the UIS Manager operation logs.

Identifying cluster status

Identifying the cluster HA feature

Verify that the HA feature is enabled for the cluster. If HA is not enabled and a CVK host in the cluster fails, the VMs on that host cannot be migrated to other CVK hosts in the cluster.

After enabling HA for the cluster, you can enable service area HA. Then, when a service area failure or a VM connectivity issue occurs, the VM can be migrated to another host.

You can specify the boot priority for the VMs in the cluster. Options include Low, Medium, and High, and the default is Medium. The boot priority is set when you add or edit a VM, and it determines the startup order of VMs after a host failure. VMs restart on the new host in descending order of priority (High, then Medium, then Low) until all VMs have restarted or no more cluster resources are available.
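The restart ordering described above can be sketched in Python as follows. This is an illustration only, not how UIS Manager actually schedules HA restarts: it assumes that memory is the only limiting resource, that each VM's memory demand and the cluster's free memory are known, and that restarting stops at the first VM that no longer fits. The `VM` class, its `mem_gb` field, and the `restart_order` function are hypothetical.

```python
from dataclasses import dataclass

# Lower rank restarts first: High, then Medium (the default), then Low.
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class VM:
    name: str
    priority: str = "Medium"  # default boot priority
    mem_gb: int = 4           # hypothetical resource demand


def restart_order(vms, free_mem_gb):
    """Return VM names in the order they would restart on surviving hosts.

    Stops as soon as the next VM no longer fits in the remaining memory,
    modeling "until all VMs restart or no more resources are available".
    """
    started = []
    # sorted() is stable, so VMs with equal priority keep their input order.
    for vm in sorted(vms, key=lambda v: PRIORITY_RANK[v.priority]):
        if vm.mem_gb > free_mem_gb:
            break
        free_mem_gb -= vm.mem_gb
        started.append(vm.name)
    return started
```

For example, with 12 GB free, a High-priority 8 GB VM and a Medium-priority 4 GB VM restart, and a Low-priority 4 GB VM does not.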

Identifying the shared storage in the cluster

During VM migration, if the target host has not mounted the shared storage used by the VM, the migration fails.

Identifying host information

Identifying host status

View host status on the Hosts page to identify whether abnormal hosts exist.

Check the CPU and memory usage of each host, and pay special attention to hosts with usage above 80%.
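If you record per-host CPU and memory usage (for example, copied from the Hosts page), a check against the 80% guideline might look like the following sketch. The input format, the function name, and the host names are hypothetical; only the 80% figure comes from this guide.

```python
# Usage threshold from this guide: pay special attention above 80%.
THRESHOLD_PCT = 80.0


def hosts_to_watch(usage, threshold=THRESHOLD_PCT):
    """usage maps host name -> (cpu_pct, mem_pct).

    Returns a dict of host name -> list of reasons for hosts above the threshold.
    """
    flagged = {}
    for host, (cpu, mem) in usage.items():
        reasons = []
        if cpu > threshold:
            reasons.append(f"CPU {cpu:.0f}%")
        if mem > threshold:
            reasons.append(f"memory {mem:.0f}%")
        if reasons:
            flagged[host] = reasons
    return flagged
```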

Identifying the uptime of a host

On the Summary page of a CVK host, you can see the detailed host configuration information. From the Uptime field, you can identify whether the host has been rebooted recently.
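On a Linux CVK host, the same information is available at the CLI: the first field of /proc/uptime is the number of seconds since the host booted. The following sketch parses it and flags a recent reboot. The 24-hour window that defines "recent" and the function names are arbitrary assumptions.

```python
from pathlib import Path

# Hypothetical definition of a "recent" reboot.
RECENT_REBOOT_SECONDS = 24 * 3600


def parse_uptime(text):
    """Return seconds since boot from the contents of /proc/uptime."""
    # /proc/uptime holds two numbers: uptime and aggregate idle time, in seconds.
    return float(text.split()[0])


def rebooted_recently(text, window=RECENT_REBOOT_SECONDS):
    return parse_uptime(text) < window


def check_local_host(path="/proc/uptime"):
    """Read the local uptime file and return (uptime in days, recent reboot?)."""
    text = Path(path).read_text()
    return parse_uptime(text) / 86400, rebooted_recently(text)
```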

Identifying host performance monitoring information

On the Performance Monitoring page of the CVK host, you can see the CPU usage, memory usage, I/O throughput, network throughput, disk usage, and partition usage of the host.

Identifying host CPU usage

On the Performance Monitoring > CPU Usage (%) page, view CPU usage over a longer time range.

Identifying host memory usage

On the Performance Monitoring > Memory Usage (%) page, view memory usage over a longer time range.

Identifying host I/O throughput

On the Performance Monitoring > I/O Throughput (KBps) page, view I/O throughput over a longer time range.

Identifying host network throughput

On the Performance Monitoring > Network Throughput (Mbps) page, view the network throughput of each physical NIC over a longer time range.

Identifying host disk usage

On the Performance Monitoring > Disk Requests (IOPS) page, you can see the host disk usage information.

Identifying host partition usage

On the Performance Monitoring > Partition Usage page, you can see the host partition usage information.

Identifying vSwitch information

Verify that vSwitch names are consistent across all hosts in the cluster.
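To compare vSwitch names across many hosts, you can collect the names from each host's vSwitches page (or, assuming the vSwitches are Open vSwitch bridges, from the output of ovs-vsctl list-br on each host) and compare them as in the following sketch. The function name and sample data are hypothetical.

```python
def missing_vswitches(vswitches_by_host):
    """vswitches_by_host maps host name -> iterable of vSwitch names.

    Returns host name -> sorted list of vSwitch names that exist on some
    other host in the cluster but are missing on this host.
    """
    all_names = set().union(*vswitches_by_host.values())
    report = {}
    for host, names in vswitches_by_host.items():
        missing = all_names - set(names)
        if missing:
            report[host] = sorted(missing)
    return report
```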

On the vSwitches page of a host, verify that the vSwitches are active. If a vSwitch is in an abnormal state, check whether its physical NIC is normal.

Make sure only one gateway is configured for all vSwitches of the host.

Identifying physical NIC status

On the Physical NICs page, verify that the rate and state of each physical NIC on the host are normal.

Abnormal physical NICs affect vSwitch performance.

Identifying VM status

Identifying the running status of CAStools

On the Summary page of the VM, verify that CAStools is installed on the VM and running correctly.

Verifying disk and NIC types

Verifying the disk type

On the Disk tab of the VM modification page, verify that the device object is Virtio disk, which significantly improves disk performance, that the source path is a shared storage path, and that the cache mode is directsync (recommended).
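In standard libvirt domain XML, a Virtio disk presumably appears as a <target> element with bus='virtio', and the cache mode as the cache attribute of the <driver> element. The following sketch checks those attributes in a VM's XML file. It assumes that UIS writes standard libvirt disk elements and that shared storage is mounted under path prefixes you supply (the /vms/shared1/ prefix in the test is hypothetical); it is not a UIS tool.

```python
import xml.etree.ElementTree as ET


def check_disks(domain_xml, shared_prefixes):
    """Return problems found in the <disk device='disk'> elements of a
    libvirt domain XML string.

    shared_prefixes: tuple of mount-point prefixes that count as shared
    storage (a site-specific assumption).
    """
    problems = []
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk[@device='disk']"):
        target = disk.find("target")
        driver = disk.find("driver")
        source = disk.find("source")
        dev = target.get("dev", "?") if target is not None else "?"
        # A Virtio disk is expressed as bus='virtio' on the <target> element.
        if target is None or target.get("bus") != "virtio":
            problems.append(f"{dev}: bus is not virtio")
        # The cache mode is an attribute of the <driver> element.
        if driver is None or driver.get("cache") != "directsync":
            problems.append(f"{dev}: cache mode is not directsync")
        path = source.get("file", "") if source is not None else ""
        if not path.startswith(tuple(shared_prefixes)):
            problems.append(f"{dev}: source {path!r} is not on shared storage")
    return problems
```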

Verifying the NIC type

On the Network tab of the VM modification page, verify that the device model is high-speed NIC and that kernel acceleration is enabled, which significantly improves NIC performance.

Identifying VM performance monitoring statistics

On the Performance Monitoring page of the VM, you can see the CPU usage, memory usage, I/O throughput, network throughput, disk usage, and partition usage of the VM.

Identifying VM CPU usage

On the Performance Monitoring > CPU Usage (%) page, view CPU usage over a longer time range.

Identifying VM memory usage

On the Performance Monitoring > Memory Usage (%) page, view memory usage over a longer time range.

Identifying VM I/O throughput

On the Performance Monitoring > I/O Throughput (KBps) page, view I/O throughput over a longer time range.

Identifying VM network throughput

On the Performance Monitoring > Network Throughput (Mbps) page, view the network throughput of each NIC of the VM over a longer time range.

Identifying VM disk usage

On the Performance Monitoring > Disk Usage page, you can see the VM disk usage information.

Identifying VM partition usage

On the Performance Monitoring > Partition Usage page, you can see VM partition usage information.

Identifying VM backup information

On the Backup Management page of a VM, you can see the backup history of the VM. As a best practice, back up all core VMs on the UIS platform.

Identifying license information

The UIS system typically contains a UIS Manager license, a CAS license, and a distributed storage license. Use formal licenses at production deployment sites. Temporary licenses can be used at test or temporary deployment sites. To prevent the expiration of temporary licenses from affecting UIS system usage, update the temporary licenses before they expire.
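A simple way to avoid missed renewals is to track the expiration date of each license and warn ahead of time. In the following sketch, the expiration dates, the license names used as keys, and the 30-day warning window are all hypothetical; read the real dates from the licensing page.

```python
from datetime import date


def license_warnings(expiry_by_license, today, warn_days=30):
    """Return (license, days_left) for licenses expiring within warn_days,
    soonest first. Negative days_left means the license has already expired."""
    due = []
    for name, expiry in expiry_by_license.items():
        days_left = (expiry - today).days
        if days_left <= warn_days:
            due.append((name, days_left))
    return sorted(due, key=lambda item: item[1])
```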

The following figure shows the licensing page of the UIS Manager component.

Managing alarms

The alarm management feature collects and displays statistics of alarms of interest to operators. In the current software version, UIS collects statistics of host resource alarms, VM resource alarms, cluster resource alarms, failure alarms, security alarms, distributed storage resource alarms, and other alarms.

Users can configure alarm thresholds for metrics such as the CPU usage and memory usage of hosts or VMs. When a metric value reaches its alarm threshold, an alarm is generated and reported, and users can view it in the real-time alarm list. Alarm filtering allows users to filter out alarms that are not of interest, and filtered alarms are not reported. In addition, the system can send alarms to users by email or SMS.
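The threshold and filtering behavior described above can be modeled with the following sketch: an alarm is raised when a metric value reaches its threshold, and suppressed if it matches a filter entry. The sample format, the function name, and the (resource, metric) filter key are simplifications for illustration, not the UIS alarm filtering model.

```python
def evaluate(samples, thresholds, filtered=frozenset()):
    """Return alarm messages for samples whose value reaches its threshold.

    samples: iterable of (resource, metric, value_pct)
    thresholds: dict mapping metric -> threshold_pct
    filtered: set of (resource, metric) pairs whose alarms are suppressed
    """
    alarms = []
    for resource, metric, value in samples:
        limit = thresholds.get(metric)
        if limit is None or value < limit:
            continue  # no threshold configured, or threshold not reached
        if (resource, metric) in filtered:
            continue  # filtered alarms are not reported
        alarms.append(f"{resource}: {metric} {value}% reached threshold {limit}%")
    return alarms
```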

Managing CAS resources

This feature allows you to manage clusters, hosts, and VMs in the CAS management platform. You can perform operations such as suspending, resuming, hibernating, rebooting, and cloning VMs as templates in the CAS platform, enabling virtual resource management, data backup and recovery, and resource sharing for CAS resources.

Managing UIS resources

This feature allows you to manage clusters, hosts, and VMs in the UIS management platform. You can perform operations such as suspending, resuming, hibernating, rebooting, and cloning VMs as templates in the UIS platform, enabling virtual resource management, data backup and recovery, and resource sharing for UIS resources.

Backup center

The backup center centrally manages backup history, backup policies, and backup configuration on the management platform, covering both VM backup and management platform backup.

VM backup

VM backup on the management platform includes backup history, backup policies, backup pools, and backup parameters.

Platform backup

Management data backup is used for automatic scheduled backups or manual immediate backups of relevant configuration data for the hyper-converged management platform, including database, version information, and configuration files. The backup files can be saved locally on the host where the hyper-converged management platform is located, or on a remote server (in a stateful failover environment, only backup to remote servers is supported). You can view and download historical backup data in the backup history, as well as upload or import system backup files. In the event of system failure, the historical backup data can be used to restore data and configuration files to the current system.

Configuration cautions and guidelines

See H3C UIS Manager Configuration Cautions and Guidelines.

See H3C UIS Manager Data Loss Prevention Best Practices.

Change operations

If issues occur while the UIS system is running, follow the prescribed procedures to resolve them. Otherwise, services on the live network might be affected.

Upgrading UIS software

See H3C UIS Upgrade Guide.

Handling hardware failure

See H3C UIS Hyper-Converged Infrastructure Component Replacement Configuration Guide.

Starting or shutting down a UIS host

When you perform comprehensive maintenance for the UIS system, you must power the devices on and off in the prescribed order. Otherwise, the service system might be damaged. Before powering on the devices, make sure the health index is 100%.

For more information, see H3C UIS Hyper-Converged Infrastructure Node Shutdown Configuration Guide.

IP address and host name change

CAUTION:

· To change the root password for a CVK host in the system, access the Web interface of UIS Manager. You cannot change the root password for a CVK host from its command shell.

· If you delete a CVK host while its shared storage is suspended, the shared storage is automatically deleted. Therefore, you must mount the shared storage to the CVK host again after the host is re-added.

· If the cluster has four or fewer nodes, or if the host whose IP address or host name you want to change is a node in the stateful failover system (for example, the primary node, backup node, quorum node, or Handy node), you cannot change its IP address by deleting and re-adding the host.

· This method is applicable to changes to the host management network IP, storage front-end IP, storage back-end IP, and host name.
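The restrictions in the cautions above can be expressed as a simple pre-check before you choose the delete-and-re-add method. In this sketch, the function name and role labels are hypothetical, and because the caution lists the stateful failover roles only as examples, the role set might not be complete.

```python
# Stateful failover roles named in the caution (examples, possibly not exhaustive).
FAILOVER_ROLES = {"primary", "backup", "quorum", "handy"}


def can_readd_to_change_ip(node_count, host_roles=()):
    """Return True if the cautions allow deleting and re-adding the host.

    node_count: number of nodes in the cluster
    host_roles: stateful failover roles the host holds, if any
    """
    if node_count <= 4:
        return False  # four or fewer nodes: method not allowed
    roles = {role.lower() for role in host_roles}
    return not (roles & FAILOVER_ROLES)
```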

After the UIS system is deployed, you might need to change the IP addresses or host names of hosts.

After a CVK host is added to the UIS cluster, you can change its IP address or host name from the Xconsole interface, as shown in the figure below. Before doing so, you must delete the CVK host from the UIS system.

If the CVK host has shared storage enabled or runs VMs, it cannot be deleted. To delete the host in this case, you must first stop or migrate VMs and pause or delete the shared file system.

After the host is deleted, you can add the host through host expansion. During the host expansion process, you can manually configure an IP address for the host and select the corresponding NIC interface, and then add the host back to the cluster. Then, you can migrate the VMs back to the host.

CAUTION:

· Make sure the IP address you enter can communicate with the management network and the internal and external storage networks of the original cluster. Otherwise, the host fails to be added.

· IP address settings are planned in the deployment phase. Determine them carefully at the beginning, because they cannot be modified later.

Replacing a disk on a CVK host

When a disk in the cluster fails, it cannot be directly replaced. Software operations and configurations are required for a successful disk replacement on UIS Manager. For more information, see H3C UIS Hyper-Converged Infrastructure Component Replacement Configuration Guide.

Changing the password for accessing UIS Manager

CAUTION:

· To change the root password for a CVK host, access the Web interface of UIS Manager. You cannot change the root password for a CVK host from its command shell.

· Configure the same password for all hosts in the cluster.

To meet security requirements, change user passwords periodically. The following procedure uses the UIS root user as an example.

Changing the root password of a host from the Web interface

1. Right-click a host, and then select Edit Host.

2. In the dialog box that opens, enter a new password, and then click OK.

If you forget the root password, contact Technical Support.

Changing the admin password

The admin account of UIS Manager has a default password. To change it, log in to UIS Manager, click admin in the upper-right corner, and then change the password as needed.

As a best practice, change the root and admin passwords at your first login to UIS Manager.

Scaling out and scaling in a cluster

See H3C UIS Manager Resource Scale-Out and Scale-In Configuration Guide.

Changing the system time

See H3C UIS Manager System Time Modification Configuration Guide.

Performing a heterogeneous or homogeneous migration

See H3C UIS HCI Cloud Migration Guide.

Redefining a VM

In some cases, for example, when a VM fails to start because of host issues, you might need to redefine and restore the VM on a host other than the original one.

Obtaining the XML file of the VM

Obtaining the XML file of the VM when HA is enabled and the CVM node is normal

When HA is enabled and the CVM node is normal, the XML files of VMs are saved in the HA directory on the CVM node by default. The HA directory is typically /etc/cvm/ha/clust_id/cvk_name, for example, /etc/cvm/ha/2/cvknode191, where clust_id is the cluster ID and cvk_name is the name of the CVK host where the VM resides. In this directory, locate the XML file of the VM, for example, test01.

Obtaining the XML file of the VM when HA is disabled and the CVM node is normal

1. On the top navigation bar, click System, and then select Data Backup > Backup History from the left navigation pane. Then, download the most recent backup file.

This example downloads backup file UIS_INFO_BACK_E0881P03_20231123203206.tar.gz.

2. Decompress the downloaded backup file and enter directory \UIS_INFO_BACK_E0881P03_20231123203206\cvknode1_crm_cvknode2\CVM_INFO_BACK_E0781P04_20231123203227\front\cvks.

3. Select the directory for the host where the VM is located, and then decompress the libvirt.tar.gz file in the directory. Then, enter the qemu subdirectory to obtain the XML file of the VM.

NOTE:

Directory cvknode1_crm_cvknode2 is named in the format of <primary CVM node name>_crm_<secondary CVM node name>. In a single-host environment, this directory is named after the CVM node.
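If several backup files are available, the timestamp embedded in the file name identifies the most recent one. The following sketch assumes that every backup file follows the pattern of the example name UIS_INFO_BACK_E0881P03_20231123203206.tar.gz, that is, UIS_INFO_BACK_<version>_<YYYYMMDDhhmmss>.tar.gz. That pattern is inferred from a single example and might not hold for all versions; the function name is hypothetical.

```python
import re
from datetime import datetime

# Pattern inferred from the example name: version, then YYYYMMDDhhmmss.
BACKUP_RE = re.compile(r"^UIS_INFO_BACK_(?P<version>\w+?)_(?P<stamp>\d{14})\.tar\.gz$")


def latest_backup(filenames):
    """Return the name of the most recent backup file, or None if none match."""
    dated = []
    for name in filenames:
        m = BACKUP_RE.match(name)
        if m:
            dated.append((datetime.strptime(m["stamp"], "%Y%m%d%H%M%S"), name))
    return max(dated)[1] if dated else None
```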

Obtaining the XML file of the VM when HA is disabled and the CVM node is faulty

If HA is disabled and the CVM node is faulty, you cannot access UIS Manager. To obtain the XML file of a VM in this case, perform the following steps:

1. Use an SSH client to access each node in the cluster to find a node that has the /vms/cvmbackup directory.

The backup data is saved on three randomly selected hosts managed by the system.

2. Enter the /vms/cvmbackup directory on the node, and then enter the cvknode1_crm_cvknode2 directory to identify the most recent backup record. Then, enter the corresponding directory to locate the front.tar.gz file.

3. Decompress the front.tar.gz file, and then enter the cvks directory. Then, enter the directory for the host where the VM is located, and then decompress the libvirt.tar.gz file in the directory.

4. Enter the libvirt/qemu directory after decompression to find the XML file of the VM.
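Steps 3 and 4 can be scripted with Python's tarfile module, as in the following sketch. It assumes that front.tar.gz contains a cvks/<host name>/libvirt.tar.gz archive and that the VM XML file is named <VM name>.xml under libvirt/qemu, which is the usual libvirt naming convention. The function names, host name, and VM name are hypothetical.

```python
import tarfile
from pathlib import Path


def _extract(archive, dest):
    """Extract a .tar.gz, using the safer "data" filter when this Python supports it."""
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)


def extract_vm_xml(front_tgz, host, vm_name, workdir):
    """Unpack front.tar.gz, then cvks/<host>/libvirt.tar.gz, and return the
    path of libvirt/qemu/<vm_name>.xml, or None if it cannot be found."""
    workdir = Path(workdir)
    front_dir = workdir / "front"
    _extract(front_tgz, front_dir)
    # Search recursively so the cvks directory can sit at any depth.
    inner = next(front_dir.rglob(f"cvks/{host}/libvirt.tar.gz"), None)
    if inner is None:
        return None
    libvirt_dir = workdir / f"libvirt_{host}"
    _extract(inner, libvirt_dir)
    return next(libvirt_dir.rglob(f"qemu/{vm_name}.xml"), None)
```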

Identifying the storage volume for VM disk files

If you already know the storage volume that holds the VM disk files, log in to the CLI of another host that has mounted the volume and verify that the volume is normal. If you do not know the storage volume, use the vim or cat command to view the XML file obtained in "Obtaining the XML file of the VM" and find the location of the VM disk files. For example:

The source file field displays the location of the VM disk files.
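To read the disk file locations without opening the file in an editor, you can parse the <source> elements of the VM XML file. In standard libvirt domain XML, file-backed disks use <source file='...'/> and block-device disks use <source dev='...'/>. The function name and the sample paths in the test are hypothetical.

```python
import xml.etree.ElementTree as ET


def disk_sources(domain_xml):
    """Return the image file or block device behind each disk in a libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    paths = []
    for source in root.findall("./devices/disk/source"):
        # File-backed disks use file=, block-device disks use dev=.
        path = source.get("file") or source.get("dev")
        if path:
            paths.append(path)
    return paths
```

The directory part of each returned path typically points to the mount point of the storage volume to check.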

Copying the XML file of the VM to the target host

Use SCP to copy the XML file of the VM to the /etc/libvirt/qemu directory on a host that has mounted the storage volume identified in "Identifying the storage volume for VM disk files."

Defining the VM through XML

1. Execute the virsh define vm.xml command in the /etc/libvirt/qemu directory.

The VM is defined through XML.

2. Verify that the VM is displayed in the output from the virsh list --all command at the CLI of the new host.

3. Connect to the host from the Web interface. Then, you can view and start the VM from the Web interface.
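When several XML files have been copied, steps 1 and 2 can be scripted as in the following sketch, which runs virsh define for each file and then lists all defined VMs with virsh list --all --name. It must run on the CVK host as a user allowed to run virsh. The function names and the file path in the test are hypothetical.

```python
import subprocess


def define_commands(xml_files):
    """Build one `virsh define <file>` command per copied XML file."""
    return [["virsh", "define", str(path)] for path in xml_files]


def define_and_list(xml_files):
    """Define each VM, then return the names reported by `virsh list --all --name`."""
    for cmd in define_commands(xml_files):
        # check=True raises CalledProcessError at the first failed definition.
        subprocess.run(cmd, check=True)
    result = subprocess.run(
        ["virsh", "list", "--all", "--name"],
        check=True, capture_output=True, text=True,
    )
    return [name for name in result.stdout.splitlines() if name.strip()]
```

Passing only the files you copied, rather than every file in /etc/libvirt/qemu, keeps the script from touching VMs that are already defined on the host.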

To define multiple VMs, you can also restart the libvirt service to define them automatically, provided that no VM names contain Chinese characters. After the VMs are defined, start them, as shown in the following figure:

Clearing VM data on the original host

If the original host has been severely damaged by hardware issues, resolve the hardware issues, and then reinstall the same UIS version as the original system.

If the original host does not have hardware issues, perform the following steps to clear VM data on the host:

1. Disconnect the network cable from the original host before the host starts up.

2. Log in to the CLI of the original host and remove the XML file of the VM. This prevents dual writes, which would occur if HA started the VM on the original host after the server restarts.

Replacing the backup node in a stateful failover system

See H3C UIS Manager Stateful Failover Configuration Guide.

Displaying technical support service information

You can display, export, and import technical support service information for a site on the system management page. For more information, see H3C UIS Manager Local Licensing Guide and H3C Software Products Remote Licensing Guide.

VM grouping

On the VM management page, you can assign VMs to different VM groups as needed. On the VM group details page, you can view VM resource usage information for each group.

Replacing SSDs with NVMe drives

See H3C UIS Manager Configuration Guide for Replacing SSDs with NVMe Disks.

Migrating VMware VMs

See H3C UIS HCI Cloud Migration Guide.

Configuring GPUs

See H3C UIS Manager GPU Passthrough Configuration Guide.

Configuring vGPUs

See H3C UIS Manager vGPU Configuration Guide.

Configuring anti-virus

Contact Technical Support.

Configuring storage disaster recovery

See H3C UIS Manager Site Recovery Management Configuration Guide.

Collecting logs

Collecting logs of the UIS Manager

Collecting logs from the Web interface

1. On the top navigation bar, click System, and then select Log Collection from the left navigation pane.

2. Select the CVK hosts from which you want to collect logs, and then click Collect to save the log files locally.

Collecting logs at the CLI of a CVK host

If you cannot collect logs from the Web interface of UIS Manager because of a CVK failure, collect logs manually at the CLI of the CVK host.

To collect logs at the CLI of a CVK host, execute the cas_collect_log.sh command. A compressed file is generated in the /vms directory, as shown in the figure.

To analyze the logs, download the file to your local computer by using SSH client software.

You cannot collect logs of ONEStor-related hosts by executing this script. Instead, manually copy the logs in the /var/log/storage and /var/log/ceph directories. If the time range for log collection is short or the logs are too large, you can collect only part of the logs archived in the /var/log/storage/backup directory.
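The manual copy described above can be packed into a single archive with Python's tarfile module. In the following sketch, the optional age cutoff for files under /var/log/storage/backup is one hypothetical way to collect only part of the archived logs; the function name and output path are also hypothetical.

```python
import tarfile
import time
from pathlib import Path

LOG_DIRS = ("/var/log/storage", "/var/log/ceph")
BACKUP_DIR = "/var/log/storage/backup"


def collect_onestor_logs(output, days=None, log_dirs=LOG_DIRS, backup_dir=BACKUP_DIR):
    """Pack the ONEStor log directories into a .tar.gz at `output`.

    If days is set, archived files under backup_dir older than that many
    days are left out, which keeps the package small.
    """
    cutoff = None if days is None else time.time() - days * 86400
    backup_arc = backup_dir.strip("/") + "/"

    def skip_old_backups(info):
        # Returning None tells tarfile to leave this member out.
        if (cutoff is not None and info.isfile()
                and info.name.startswith(backup_arc) and info.mtime < cutoff):
            return None
        return info

    with tarfile.open(output, "w:gz") as tar:
        for d in log_dirs:
            if Path(d).is_dir():
                tar.add(d, arcname=d.strip("/"), filter=skip_old_backups)
    return output
```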

Introduction to logs

Logs collected from the Web interface

UIS log files downloaded from the Web interface are named in the UIS_×××_×××.tar.gz format. After decompression, a log file contains the following types of files:

· catalina.out—Contains logs of Web functions on the UIS Manager.

· oper_log.log—Contains user operation logs.

· *.diag.tar.bz2—Contains logs of each CVK host.

· onestor—Contains operation logs and system logs of ONEStor.

· WARN*.tar.gz—Contains alarm messages.

Logs collected at the CLI

CVK host log files obtained at the CLI are named in the XXX.tar.bz2 format. After decompression, a CVK host log file contains the following directories and files:

· etc—Contains UIS configuration files, mainly VM configuration files. Each VM configuration file is stored as libvirt/qemu/VM.xml.

· var—Contains logs of each UIS feature module.

· command.out—Contains output information about frequently used commands at the CLI.

· cas_cvk-version—Contains UIS version information.

· loglist—Contains UIS log file names.

· uis_raid_card_info.log—Contains basic information about RAID controllers on the host.

The var directory mainly contains the following logs:

· messages—Host system logs, which record the system running information.

· fsm—Shared file system logs.

· cas_ha—HA logs.

· Ha_shell_XX.log—HA logs.

· libvirt—VM logs.

· openvswitch—Logs generated by the OVS running process.

· Ovs_shell_XX.log—Logs generated by calling the ovs_bridge.sh script.

· tomcat8—UIS Web logs.

· operation—Logs for manual operations at the CLI of UIS Manager.

The following describes the CVK host logs:

· Messages logs

The messages log records critical information about operating system operation. The following example shows the records generated for an abnormal reboot of a CVK host.

Feb 3 13:58:01 XJYZ-CVK01 CRON[64458]: (root) CMD (ump-node-sync )
Feb 3 13:58:01 XJYZ-CVK01 CRON[64459]: (root) CMD (ump-sync -p ALL)
Feb 3 13:58:01 XJYZ-CVK01 CRON[64460]: (root) CMD ( /opt/bin/ocfs2_iscsi_conf_chg_timer.sh)
Feb 3 13:58:01 XJYZ-CVK01 CRON[64443]: (CRON) info (No MTA installed, discarding output)
Feb 3 14:06:35 XJYZ-CVK01 kernel: imklog 5.8.6, log source = /proc/kmsg started.
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="2747" x-info="http://www.rsyslog.com"] start
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd: rsyslogd's groupid changed to 103
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd: rsyslogd's userid changed to 101
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd-2039: Could not open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Initializing cgroup subsys cpuset
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Initializing cgroup subsys cpu
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Initializing cgroup subsys cpuacct
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Linux version 3.13.6 (root@cvknode22) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #5 SMP Mon Jul 21 10:07:26 CST 2014
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.13.6 root=UUID=4beeb503-6e10-4836-93a4-0836a9a1571e ro nomodeset elevator=deadline transparent_hugepage=always crashkernel=256M quiet
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] KERNEL supported cpus:
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Intel GenuineIntel
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] AMD AuthenticAMD
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Centaur CentaurHauls
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] e820: BIOS-provided physical RAM map:
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cbff] usable
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x000000000009cc00-0x000000000009ffff] reserved
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf60ffff] usable

As shown in the example, the messages log file contains no records between 13:58:01 and 14:06:35. This indicates that the CVK host failed during this period.

The kernel logs that start at 14:06:35 record the boot of the CVK host after it restarted.
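Finding a silent period like this by eye is error-prone in a long messages file. The following Python sketch flags consecutive records that are further apart than a threshold. The five-minute default is an assumption chosen for illustration. Syslog timestamps carry no year, so the sketch adds a dummy year, and it does not handle a gap that spans a year boundary.

```python
from datetime import datetime, timedelta

def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (last_before, first_after) pairs for consecutive syslog records
    that are more than `threshold` apart."""
    gaps = []
    previous = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or truncated line
        try:
            # Syslog timestamps have no year; a dummy leap year (2000) is
            # prepended so that the string parses and Feb 29 is accepted.
            stamp = datetime.strptime("2000 " + " ".join(fields[:3]), "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # line does not start with a syslog timestamp
        if previous is not None and stamp - previous > threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

For example, pass the lines of the messages file from a decompressed log package to find_gaps(). Then compare each reported gap with the kdump and alarm records for the same period.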

· Libvirt logs

In the following example, the /var/log/libvirt/libvirtd.log file contains an alarm that the CVK host lacks memory resources. The current memory usage has reached 97% (now:97), which is above the maximum of 85% (max:85). The alarm message for insufficient CPU resources is similar.

2014-10-24 09:15:52.792+0000: 2994: warning : virIsLackOfResource:1106 : Lack of Memory resource! only 374164 free 64068 cached and vm locked memory(4194304*0%) of 16129760 total, max:85; now:97

2014-10-24 09:15:52.792+0000: 2994: error : qemuProcessStart:3419 : Lack of system resources, out of memory or cpu is too busy, please check it.
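To scan many libvirtd.log files for this alarm, you can use a small parser such as the following. It extracts the resource name and the max and now values. This is a sketch based only on the sample lines above. The field layout in other UIS versions, and in the CPU variant of the alarm, is assumed rather than confirmed. The recomputed value is an illustrative check only. It shows that (total - free - cached) / total reproduces now:97 for this sample, but it is not a documented formula.

```python
import re

# Fields observed in the sample libvirtd.log alarm.
LACK_RE = re.compile(r"Lack of (\w+) resource!.*max:(\d+); now:(\d+)")
# Memory-specific fields; assumed to appear only in the memory variant of the alarm.
MEM_RE = re.compile(r"only (\d+) free (\d+) cached .*? of (\d+) total")

def parse_lack_of_resource(line):
    """Return the resource name, the max/now percentages and, for memory
    alarms, a usage percentage recomputed from the free/cached/total fields."""
    m = LACK_RE.search(line)
    if m is None:
        return None
    result = {"resource": m.group(1), "max": int(m.group(2)), "now": int(m.group(3))}
    mem = MEM_RE.search(line)
    if mem:
        free, cached, total = (int(g) for g in mem.groups())
        # Assumption: usage = (total - free - cached) / total. This reproduces
        # now:97 for the sample line but is not a documented formula.
        result["recomputed"] = int(100 * (total - free - cached) / total)
    return result
```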

The /var/log/libvirt/qemu directory saves the log files of VMs running on the CVK host.

root@UIS-CVK01:/var/log/libvirt/qemu# ls -l

total 44

-rw------- 1 root root 7067 Jan 9 19:08 RedHat5.9.log

-rw------- 1 root root 1969 Jan 18 15:41 win7.log

-rw------- 1 root root 26574 Feb 11 16:15 windows2008.log

VM log files record VM running information. This includes the times when the VM started up and shut down, and the disk files of the VM.

2015-02-11 15:50:18.349+0000: starting up

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name windows2008 -S -machine pc-i440fx-1.5,accel=kvm,usb=off,system=windows -cpu qemu64,hv_relaxed,hv_spinlocks=0x2000 -m 1024 -smp 1,maxcpus=12,sockets=12,cores=1,threads=1 -uuid 43741f06-166d-4155-b47e-4137df68e91c -no-user-config -nodefaults -chardev file=/vms/sharefile/windows2008,if=none,id=drive-virtio-disk0,format=qcow2,cache=directsync -device

…

char device redirected to /dev/pts/0 (label charserial0)

qemu: terminating on signal 15 from pid 4530

2015-02-11 16:15:28.825+0000: shutting down
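You can pair the starting up and shutting down records to show how long each VM session lasted. This helps correlate a VM stop with events on the host. The following Python sketch assumes the timestamp format shown above and ignores all other lines.

```python
from datetime import datetime

def vm_sessions(lines):
    """Pair 'starting up' and 'shutting down' records from a QEMU VM log.

    Returns a list of (start, stop) datetimes. A start without a matching stop
    (for example, a VM that is still running) is not returned.
    """
    sessions = []
    start = None
    for line in lines:
        stamp, sep, event = line.strip().partition(": ")
        if not sep:
            continue
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f%z")
        except ValueError:
            continue  # not a timestamped lifecycle record, e.g. "qemu: terminating ..."
        if event == "starting up":
            start = when
        elif event == "shutting down" and start is not None:
            sessions.append((start, when))
            start = None
    return sessions
```

For the windows2008.log sample above, the sketch returns one session of about 25 minutes.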

· OCFS2 logs

The /var/log/fsm/fsm_core*.log files record the processing triggered by OCFS2 fencing on the CVK host. In the following example, storage pool sharefile06 receives a fence_umount event for device dm-7.

2021-11-04 06:40:35,882 manager:233 INFO Received an event: {'index': 7, 'type': 'fence_umount', 'uuid': u'851D36905AB74AFD93E1ABA8259DA3A2', 'seq': 11538, 'dev_name': u'dm-7'}

2021-11-04 06:40:35,923 manager:204 INFO Remain 0 events to be handling

2021-11-04 06:40:35,923 manager:131 INFO Manager received an event: Pool sharefile06 was fence_umount

2021-11-04 06:40:35,923 fspool:141 INFO Pool sharefile06 received a event fence_umount
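The Received an event record prints the event as a Python-style dictionary (note the u'' string prefixes). The following sketch assumes this format also holds for other fsm_core*.log records. It parses each event with ast.literal_eval, which evaluates literals only, and lists the devices affected by fence_umount events.

```python
import ast

MARKER = "Received an event: "

def fsm_events(lines):
    """Extract the event dictionaries from fsm_core 'Received an event' records."""
    events = []
    for line in lines:
        _, sep, payload = line.partition(MARKER)
        if not sep:
            continue
        try:
            # The payload is printed as a Python dict literal.
            events.append(ast.literal_eval(payload.strip()))
        except (ValueError, SyntaxError):
            continue  # payload in an unexpected format
    return events

def fence_umount_devices(lines):
    """Return the dev_name of every fence_umount event, in log order."""
    return [e.get("dev_name") for e in fsm_events(lines) if e.get("type") == "fence_umount"]
```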

· Operation logs

Operation logs record the commands executed at the CLI of the CVK host. Each log file is named in the YY-MM-DD.log format. The following example contains log files for commands executed from April 19 to April 21, 2018.

root@cvknode1:~/cas# ll /var/log/operation/

total 32

drwxrwxrwx 2 root root 4096 Apr 21 10:06 ./

drwxr-xr-x 40 root root 4096 Apr 21 11:01 ../

-rwxrwxrwx 1 root root 5162 Apr 19 17:49 18-04-19.log*

-rwxrwxrwx 1 root root 829 Apr 20 19:11 18-04-20.log*

-rwxrwxrwx 1 root root 8505 Apr 21 11:00 18-04-21.log*

The following example shows the content of an operation log file, including the following information:

¡ Time when a command was executed.

¡ Login user.

¡ Login address.

¡ Login method.

¡ Executed commands.

¡ Directory where a command was executed.

2018/04/19 16:56:50##root pts/6 (172.16.130.3)##/root## vi /var/log/tomcat8/cas.log

2018/04/19 16:57:05##root pts/6 (172.16.130.3)##/root## service tomcat8 restart

2018/04/19 17:02:21##root pts/5 (172.16.130.3)##/root## cat /etc/cvk/system_alarm.xml

2018/04/19 17:02:23##root pts/5 (172.16.130.3)##/root## lsblk

2018/04/19 17:49:04##root pts/6 (172.16.130.3)##/root## ceph osd tree

2018/04/19 17:49:19##root pts/6 (172.16.130.3)##/root## stop ceph-osd id=3
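The fields in each operation log record are separated by ##, so the records are easy to filter. For example, you can list every command run from a given address before a failure. The following Python sketch parses one record into its fields. The "user terminal (address)" layout of the second field comes from the sample above. Other login methods might produce a different second field, and in that case the function returns None.

```python
import re
from datetime import datetime

# "user terminal (address)", as seen in the sample records.
WHO_RE = re.compile(r"(\S+)\s+(\S+)\s+\(([^)]*)\)")

def parse_operation_line(line):
    """Split an operation log record into time, login and command fields."""
    parts = line.rstrip("\n").split("##", 3)  # the command itself may contain ##
    if len(parts) != 4:
        return None
    stamp, who, directory, command = parts
    m = WHO_RE.match(who.strip())
    if m is None:
        return None
    return {
        "time": datetime.strptime(stamp.strip(), "%Y/%m/%d %H:%M:%S"),
        "user": m.group(1),
        "terminal": m.group(2),
        "address": m.group(3),
        "directory": directory,
        "command": command.strip(),
    }
```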

Collecting logs of CAStools

The UIS system is separated from the VM operating systems. To monitor and manage VMs from UIS Manager, you must install CAStools in the operating system of each VM.

The log collection method for CAStools varies by the operating system installed on the VM:

· Windows operating system—Obtain the qemu-ga.log file in the C:\Program Files\castools\ directory of the VM.

· Linux operating system—Obtain the qemu-ga.log and set-ip.log files in the /var/log/ directory of the VM.

Collecting logs of a VM operating system

Collecting logs of a Windows operating system

1. Open the Event Viewer window, and then select Windows Logs from the left navigation pane. Right-click System, and then select Save All Events As.

2. Save the logs.

3. The saved log file is shown in the figure.

Viewing logs of a Windows operating system

1. On the local computer (running Windows 7), open the Event Viewer window. From the left navigation pane, right-click Windows Logs, and then select Open Saved Log.

2. On the dialog box that opens, select the saved log file.

3. The logs are displayed on the Saved Logs > event page.

Collecting logs of a Linux operating system

To collect logs from a VM running a Linux operating system, collect the logs in the /var/log directory. If the logs are large, compress them first, and then copy the compressed file to the local computer.

For example, to package the logs collected from VM vm_test on Sep 17th, 2019, execute the tar -czvf vm_test_20190917.tar.gz /var/log command.

Troubleshooting tools and utilities

Introduction to kdump

Kdump is a crash dump tool for the Linux kernel. It reserves part of the memory for a capture kernel. When the running kernel crashes, kdump uses kexec to boot the capture kernel. The capture kernel then dumps the complete information of the crashed kernel (for example, CPU registers and stack information) to a file on a local disk or on the network.

By default, the UIS system supports kdump. When the kernel of a CVK host crashes, the system generates crash files in the /vms/crash directory for troubleshooting, as shown in the following example:

root@cvk29:/vms/crash# ls -lt

drwxr-sr-x 2 root whoopsie 4096 Jul 22 17:34 2014-07-22-09:34

The file named in the dump-*** format in the 2014-07-22-09:34 directory contains the output of kdump.

Analysis with the Kdump file

You can use the crash tool to analyze the Kdump file. The analysis requires the vmlinux file that matches the kernel version of the crashed host. You can find that file at /usr/src/linux-4.1.0-generic/vmlinux-kernelversion (the kernel version in the file name might vary).

The following examples describe how to use the Kdump file to locate typical issues on live systems.

CPU error

Node cvknode1 at a site reboots repeatedly. The node keeps rebooting even after all virtual machines (VMs) are migrated and the shared storage settings are deleted from it. The syslogs show no anomalies before the reboots, but a vmcore file exists in the /vms/crash directory.

1. View abnormal call stack information in the vmcore file:

root@cvk21:/vms/tmp# crash vmlinux vmcore

crash 7.0.5

This program is free software, covered by the GNU General Public License,

and you are welcome to change it and/or distribute copies of it under

certain conditions. Enter "help copying" to see the conditions.

This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.6

License GPLv3+: GNU GPL version 3 or later [http://gnu.org/licenses/gpl.html]

This is free software: you are free to change and redistribute it.

There is NO WARRANTY, to the extent permitted by law. Type "show copying"

and "show warranty" for details.

This GDB was configured as "x86_64-unknown-linux-gnu"...

KERNEL: vmlinux

DUMPFILE: vmcore [PARTIAL DUMP]

CPUS: 8

DATE: Wed Nov 5 12:25:19 2014

UPTIME: 00:02:19

LOAD AVERAGE: 0.06, 0.05, 0.02

TASKS: 324

NODENAME: cvknode-1

RELEASE: 3.13.6

VERSION: #5 SMP Mon Jul 21 10:07:26 CST 2014

MACHINE: x86_64 (2132 Mhz)

MEMORY: 64 GB

PANIC: "Kernel panic - not syncing: Fatal Machine check"

PID: 0

COMMAND: "swapper/6"

TASK: ffff8807f4618000 (1 of 8) [THREAD_INFO: ffff8807f4620000]

CPU: 6

STATE: TASK_RUNNING (PANIC)

crash> bt

PID: 0 TASK: ffff8807f4618000 CPU: 6 COMMAND: "swapper/6"

#0 [ffff8807ffc6ac50] machine_kexec at ffffffff8104c991

#1 [ffff8807ffc6acc0] crash_kexec at ffffffff810e97e8

#2 [ffff8807ffc6ad90] panic at ffffffff8174ac9d

#3 [ffff8807ffc6ae10] mce_panic at ffffffff81038b2f

#4 [ffff8807ffc6ae60] do_machine_check at ffffffff810399d8

#5 [ffff8807ffc6af50] machine_check at ffffffff817589df

[exception RIP: intel_idle+204]

RIP: ffffffff8141006c RSP: ffff8807f4621db8 RFLAGS: 00000046

RAX: 0000000000000010 RBX: 0000000000000004 RCX: 0000000000000001

RDX: 0000000000000000 RSI: ffff8807f4621fd8 RDI: 0000000001c0d000

RBP: ffff8807f4621de8 R8: 0000000000000009 R9: 0000000000000004

R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003

R13: 0000000000000010 R14: 0000000000000002 R15: 0000000000000003

ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018

--- [MCE exception stack] ---

#6 [ffff8807f4621db8] intel_idle at ffffffff8141006c

#7 [ffff8807f4621df0] cpuidle_enter_state at ffffffff81602a8f

#8 [ffff8807f4621e50] cpuidle_idle_call at ffffffff81602be0

#9 [ffff8807f4621ea0] arch_cpu_idle at ffffffff8101e2ce

#10 [ffff8807f4621eb0] cpu_startup_entry at ffffffff810c1818

#11 [ffff8807f4621f20] start_secondary at ffffffff8104306b

crash>

The call stack shows that a machine check exception (MCE) occurred. This exception is typically caused by hardware issues.

2. Execute the dmesg command in crash to view the messages printed before the unexpected reboot:

[ 15.707981] 8021q: 802.1Q VLAN Support v1.8

[ 16.416569] drbd: initialized. Version: 8.4.3 (api:1/proto:86-101)

[ 16.416573] drbd: srcversion: F97798065516C94BE0F27DC

[ 16.416575] drbd: registered as block device major 147

[ 17.142281] Ebtables v2.0 registered

[ 139.114172] Disabling lock debugging due to kernel taint

[ 139.114185] mce: [Hardware Error]: CPU 2: Machine Check Exception: 4 Bank 5: be00000000800400

[ 139.114192] mce: [Hardware Error]: TSC 10ba0482e78 ADDR 3fff81760d32 MISC 7fff

[ 139.114199] mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1415161519 SOCKET 0 APIC 14 microcode 13

[ 139.114203] mce: [Hardware Error]: Run the above through 'mcelog --ascii'

[ 139.114208] mce: [Hardware Error]: Machine check: Processor context corrupt

[ 139.114211] Kernel panic - not syncing: Fatal Machine check

crash>

The preceding information shows that a hardware error occurred on CPU 2 (Machine check: Processor context corrupt).
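The hexadecimal value after Bank in an MCE record is the content of that bank's IA32_MCi_STATUS register. The following Python sketch decodes the upper flag bits and the low 16-bit MCA error code. It uses the bit layout described in the Intel 64 and IA-32 Architectures Software Developer's Manual. It is a rough aid only; as the kernel message itself suggests, use mcelog for authoritative decoding. For the record above, the PCC (processor context corrupt) and UC (uncorrected) flags are set, which is consistent with the fatal panic.

```python
import re

MCE_RE = re.compile(r"CPU (\d+): Machine Check Exception: (\d+) Bank (\d+): ([0-9a-fA-F]+)")

# Upper flag bits of IA32_MCi_STATUS (Intel SDM, machine-check architecture).
STATUS_BITS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

def decode_mce(line):
    """Return CPU, bank, set status flags and MCA error code from an MCE record."""
    m = MCE_RE.search(line)
    if m is None:
        return None
    status = int(m.group(4), 16)
    flags = [name for bit, name in STATUS_BITS.items() if (status >> bit) & 1]
    return {
        "cpu": int(m.group(1)),
        "bank": int(m.group(3)),
        "flags": flags,
        "mca_code": status & 0xFFFF,  # low 16 bits: MCA error code
    }
```

Applied to the memory error records in the next example, the same function shows the OVER flag set and the UC flag clear. This means the errors were corrected but overflowed.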

Memory error

A CVK node at a site reboots unexpectedly. No abnormal records are found in the syslogs before and after the reboot. Kdump records are generated at the reboots.

1. View call stack information from the Kdump records.

If output similar to the following is displayed, a hardware error might have occurred.

crash> bt

PID: 0 TASK: ffffffff81c144a0 CPU: 0 COMMAND: "swapper/0"

#0 [ffff880c0fa07c60] machine_kexec at ffffffff8104c991

#1 [ffff880c0fa07cd0] crash_kexec at ffffffff810e97e8

#2 [ffff880c0fa07da0] panic at ffffffff8174ac9d

#3 [ffff880c0fa07e20] asminline_call at ffffffffa014c895 [hpwdt]

#4 [ffff880c0fa07e40] nmi_handle at ffffffff817598da

#5 [ffff880c0fa07ec0] do_nmi at ffffffff81759b7d

#6 [ffff880c0fa07ef0] end_repeat_nmi at ffffffff81758cf1

[exception RIP: intel_idle+204]

RIP: ffffffff8141006c RSP: ffffffff81c01da8 RFLAGS: 00000046

RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000046

RDX: ffffffff81c01da8 RSI: 0000000000000018 RDI: 0000000000000001

RBP: ffffffff8141006c R8: ffffffff8141006c R9: 0000000000000018

R10: ffffffff81c01da8 R11: 0000000000000046 R12: ffffffffffffffff

R13: 0000000000000000 R14: ffffffff81c01fd8 R15: 0000000000000000

ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018

--- [NMI exception stack] ---

#7 [ffffffff81c01da8] intel_idle at ffffffff8141006c

#8 [ffffffff81c01de0] cpuidle_enter_state at ffffffff81602a8f

#9 [ffffffff81c01e40] cpuidle_idle_call at ffffffff81602be0

#10 [ffffffff81c01e90] arch_cpu_idle at ffffffff8101e2ce

#11 [ffffffff81c01ea0] cpu_startup_entry at ffffffff810c1818

#12 [ffffffff81c01f10] rest_init at ffffffff8173fc97

#13 [ffffffff81c01f20] start_kernel at ffffffff81d37f7b

#14 [ffffffff81c01f70] x86_64_start_reservations at ffffffff81d375f8

#15 [ffffffff81c01f80] x86_64_start_kernel at ffffffff81d3773e

crash>

2. Execute the dmesg command to view information before the anomaly.

crash> dmesg

…

[10753.155822] sd 3:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).

[10804.115376] sbridge: HANDLING MCE MEMORY ERROR

[10804.115386] CPU 23: Machine Check Exception: 0 Bank 9: cc1bc010000800c0

[10804.115387] TSC 0 ADDR 12422f7000 MISC 90868002800208c PROCESSOR 0:306e4 TIME 1417366012 SOCKET 1 APIC 2b

…

[10804.283467] sbridge: HANDLING MCE MEMORY ERROR

[10804.283473] CPU 9: Machine Check Exception: 0 Bank 9: cc003010000800c0

[10804.283475] TSC 0 ADDR 1242ef7000 MISC 90868000800208c PROCESSOR 0:306e4 TIME 1417366012 SOCKET 1 APIC 26

[10804.303482] EDAC MC1: 28416 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12422f7 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c0 socket:1 channel_mask:1 rank:0)

[10804.303489] EDAC MC1: 192 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12424a7 offset:0x0 grain:32

…

[10804.319474] sbridge: HANDLING MCE MEMORY ERROR

[10804.319481] CPU 6: Machine Check Exception: 0 Bank 9: cc001010000800c0

[10804.319482] TSC 0 ADDR 1243087000 MISC 90868002800208c PROCESSOR 0:306e4 TIME 1417366012 SOCKET 1 APIC 20

[10805.303772] EDAC MC1: 64 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x1243087 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c0 socket:1 channel_mask:1 rank:0)

[10813.602696] sd 3:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).

[10813.603219] sd 3:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).

[10840.833238] Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details.

crash>

3. View information in the kern.log file.

Nov 30 07:05:01 HBND-UIS-E-CVK09 kernel: [229821.496666] sd 11:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).

Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.188854] sbridge: HANDLING MCE MEMORY ERROR

Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.188873] CPU 23: Machine Check Exception: 0 Bank 9: cc1e0010000800c0

Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.188874] TSC 0 ADDR 10638f7000 MISC 90868002800208c PROCESSOR 0:306e4 TIME 1417302355 SOCKET 1 APIC 2b

…

Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.244902] EDAC MC1: 30720 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x10638f7 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c0 socket:1 channel_mask:1 rank:0)

…

root@gzh-139:/vms/issue_logs/hebeinongda/20141201/HBND-UIS-E-CVK09/logdir/var/log# grep OVERFLOW kern* | wc

225 6341 60264

root@gzh-139:/vms/issue_logs/hebeinongda/20141201/HBND-UIS-E-CVK09/logdir/var/log#

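The preceding information shows that memory errors caused the issue. The kern.log files contain 225 lines with the OVERFLOW keyword, and the corrected errors (CE) all point to the same module, CPU_SrcID#1_Channel#0_DIMM#0 (socket 1, channel 0, slot 0). The issue was resolved after the memory was replaced.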
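Counting OVERFLOW lines shows how often errors occurred. Summing the corrected-error (CE) counts that EDAC reports for each DIMM also confirms which module to replace. The following Python sketch assumes the EDAC line layout shown above.

```python
import re
from collections import Counter

# "EDAC MC<n>: <count> CE ... on <DIMM label> (" as in the samples above.
EDAC_RE = re.compile(r"EDAC MC\d+: (\d+) CE .*? on (\S+) \(")

def ce_counts_by_dimm(lines):
    """Sum the corrected-error counts reported by EDAC for each DIMM label."""
    counts = Counter()
    for line in lines:
        m = EDAC_RE.search(line)
        if m:
            counts[m.group(2)] += int(m.group(1))
    return counts
```

If one label accounts for nearly all the errors, as it does in the dmesg output above, that DIMM is the most likely candidate for replacement.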

Storage cluster logs

/var/log/ceph/ceph.log

The ceph.log file mainly records the health status and traffic of the cluster. It is available only on monitor nodes, and its content is the same as the output of the ceph -w command.

· If logs as follows are in the ceph.log file, the service network of the primary monitor node of the cluster has been disconnected.

2017-05-09 19:44:03.400143 mon.2 172.16.105.84:6789/0 2009 : cluster [INF] mon.cvknode84 calling new monitor election

2017-05-09 19:44:03.404362 mon.1 172.16.105.83:6789/0 2023 : cluster [INF] mon.cvknode83 calling new monitor election

2017-05-09 19:44:05.419510 mon.1 172.16.105.83:6789/0 2024 : cluster [INF] mon.cvknode83@1 won leader election with quorum 1,2

2017-05-09 19:44:05.428131 mon.1 172.16.105.83:6789/0 2025 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 1,2 cvknode83,cvknode84

2017-05-09 19:44:14.383590 mon.1 172.16.105.83:6789/0 2057 : cluster [INF] osdmap e1397: 18 osds: 12 up, 18 in

· If logs as follows are in the ceph.log file, the health of the cluster is not 100%, and the cluster is in the process of recovery.

2017-06-06 19:31:41.319993 mon.0 192.168.93.21:6789/0 86387 : cluster [INF] pgmap v73931: 4096 pgs: 2561 active+clean, 1532 active+remapped+wait_backfill, 3 active+remapped+backfilling; 3362 GB data, 6730 GB used, 21941 GB / 28672 GB avail; 0 B/s rd, 127 kB/s wr, 256 op/s rd, 63 op/s wr; 5/2608637 objects degraded (0.000%); 1765938/2608637 objects misplaced (67.696%); 62992 kB/s, 15 objects/s recovering

· If logs as follows are in the ceph.log file, the storage network of a non-Handy or non-primary monitor node in the cluster has been disconnected.

2017-05-12 16:05:14.585496 mon.0 172.31.1.31:6789/0 106035 : cluster [INF] osd.31 marked itself down

2017-05-12 16:05:15.095824 mon.0 172.31.1.31:6789/0 106038 : cluster [INF] osd.33 marked itself down

2017-05-12 16:05:15.195542 mon.0 172.31.1.31:6789/0 106040 : cluster [INF] osdmap e286: 36 osds: 25 up, 36 in

2017-05-12 16:05:15.287350 mon.0 172.31.1.31:6789/0 106042 : cluster [INF] osd.27 marked itself down

2017-05-12 16:05:16.186527 mon.0 172.31.1.31:6789/0 106043 : cluster [INF] osdmap e287: 36 osds: 24 up, 36 in
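The osdmap summary lines show how many OSDs went down during an event. The following Python sketch extracts these counters. The regular expression matches the format in the samples above; newer Ceph releases might print the summary differently.

```python
import re

OSDMAP_RE = re.compile(r"osdmap e(\d+): (\d+) osds: (\d+) up, (\d+) in")

def osdmap_summary(line):
    """Return the osdmap epoch and OSD counters from a ceph.log line, or None."""
    m = OSDMAP_RE.search(line)
    if m is None:
        return None
    epoch, total, up, in_cluster = (int(g) for g in m.groups())
    return {
        "epoch": epoch,
        "total": total,
        "up": up,
        "in": in_cluster,
        "down": total - up,         # OSDs whose daemons are not running
        "out": total - in_cluster,  # OSDs excluded from data placement
    }
```

For the last sample above (e287), the sketch reports 12 OSDs down and none out.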

/var/log/ceph/ceph-osd.*.log

The ceph-osd.*.log file mainly records information about an OSD in the cluster. If an error occurs on a cluster OSD, the error reasons will be recorded in the ceph-osd.*.log file for that OSD, which can be used for troubleshooting.

The following is an example about how to troubleshoot by using a ceph-osd.*.log file when an OSD is abnormal (the UI reports an OSD error):

1. Use the ceph osd tree command in the CLI to identify the identifier of the abnormal OSD.

2. Access the /var/log/ceph/ceph-osd.*.log file for the OSD and identify the reason for the OSD exception.

¡ If a log as follows is in the ceph-osd log file, the storage controller is damaged, which has corrupted the journal.

2017-04-25 14:34:08.807146 7f5bf690a780 -1 journal Unable to read past sequence 301115833 but header indicates the journal has committed up through 301115842, journal is corrupt

¡ If logs as follows are in the ceph-osd log file, the OSD has committed suicide because of excessive load.

2017-03-09 11:46:01.576034 7f0878364700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f086fa6c700' had suicide timed out after 180

2017-03-09 11:46:01.576049 common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")

¡ If a log as follows is in the ceph-osd log file, the OSD has not been mounted.

2017-04-27 19:46:18.280510 7fcfb954c700 5 filestore(/var/lib/ceph/osd/ceph-85) umount /var/lib/ceph/osd/ceph-85

¡ If logs as follows are in the ceph-osd log file, the data copies are inconsistent.

2016-10-22 06:49:23.854201 7fd2e860f700 -1 log_channel(cluster) log [ERR] : 1.ad shard 1: soid 819850ad/rbd_data.3b7055757a07.0000000000000ab1/7//1 data_digest 0xd7ac1812 != best guess data_digest 0x43d61c5d from auth shard 0

2016-10-22 06:49:23.854253 osd/osd_types.cc:4148:FAILED assert(clone_size.count(clone))
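You can collect the log signatures above into a simple lookup to triage a batch of ceph-osd.*.log files quickly. The following Python sketch maps a substring of each sample log to the cause given in this guide. The substrings come from the samples only and might not cover every variant of these messages.

```python
# (substring found in the log, cause given in this guide)
SIGNATURES = [
    ("journal is corrupt", "storage controller damaged, journal corrupted"),
    ("hit suicide timeout", "OSD committed suicide because of excessive load"),
    (" umount /var/lib/ceph/osd/", "OSD not mounted"),
    ("FAILED assert(clone_size.count(clone))", "data copies inconsistent"),
]

def diagnose(lines):
    """Return the causes whose signature appears in lines, in first-seen order."""
    causes = []
    for line in lines:
        for needle, cause in SIGNATURES:
            if needle in line and cause not in causes:
                causes.append(cause)
    return causes
```

For example, after step 1 identifies an abnormal OSD (say, a hypothetical osd.3), run diagnose() over the lines of /var/log/ceph/ceph-osd.3.log.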

/var/log/ceph/ceph-disk.log

The ceph-disk.log file mainly records information about OSD deployment and startup and is typically used in conjunction with the ceph-osd.*.log file to locate OSD related issues.

· If logs as follows are in the ceph-disk log file, the system has stopped mounting the OSD and exited the mounting process because unexpected files exist in the /var/lib/ceph/osd/ceph-* directory. This issue typically occurs when the host restarts. At a host restart, all OSDs must be reactivated and mounted. The mounting process checks whether the OSD directory contains any files other than heartbeat, osd_disk_info.ini, and osd_should_be_restart_flag, and stops if it finds any.

ceph-disk: Error: another ceph osd.71 already mounted in position(old/different cluster instance?);unmounting ours.

· If logs as follows are in the ceph-disk log file, the OSD has not been activated and cannot be mounted.

Fri. 07 Apr 2017 10:24:48 ceph-disk[line:2438] ERROR Failed to activate

Fri. 07 Apr 2017 10:24:48 ceph-disk[line:976] DEBUG Unmounting /var/lib/ceph/tmp/mnt.hD_6nh
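Before you retry mounting the OSD, you can list the unexpected files that block it. The following read-only Python sketch applies the rule described above: it reports any entry other than heartbeat, osd_disk_info.ini, and osd_should_be_restart_flag. It also assumes that mounted OSD directories should be skipped, because a mounted directory legitimately contains OSD data. Review the reported files before you move or delete any of them.

```python
import os

# Files that, per this guide, may remain in an OSD directory without
# blocking the mounting process.
ALLOWED = {"heartbeat", "osd_disk_info.ini", "osd_should_be_restart_flag"}

def unexpected_entries(names):
    """Return the names that are not in the allowed set, sorted."""
    return sorted(n for n in names if n not in ALLOWED)

def scan_osd_dirs(base="/var/lib/ceph/osd"):
    """Report (read-only) unexpected entries in each unmounted ceph-* directory."""
    report = {}
    if not os.path.isdir(base):
        return report
    for name in sorted(os.listdir(base)):
        path = os.path.join(base, name)
        # Assumption: only unmounted mount points are checked.
        if name.startswith("ceph-") and os.path.isdir(path) and not os.path.ismount(path):
            extra = unexpected_entries(os.listdir(path))
            if extra:
                report[name] = extra
    return report
```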

/var/log/ceph/ceph-mon.*.log

The ceph-mon.*.log file mainly records information of a monitor node in the Ceph cluster. Monitor nodes are responsible for monitoring the cluster. If an error occurs on a monitor node, the error reason will be recorded in the ceph-mon.*.log file for that node, which can be used for troubleshooting.

To troubleshoot for a monitor node exception (the UI reports a monitor node anomaly):

1. Check the hostname of the abnormal monitor node on the host management page.

2. Access the /var/log/ceph/ceph-mon.*.log file on the host to check for the cause of the monitor node exception. If the following logs are in the ceph-mon log file, the primary monitor node is abnormal, and the backup monitor nodes have triggered a new election. Possible reasons are that the service network of the primary monitor node has failed or that the ceph-mon process on the primary monitor node has stopped.

2017-05-08 19:24:58.017935 7fb173765700 1 mon.cvknode84@2(peon).paxos(paxos active c 24348..24883) lease_timeout -- calling new election

2017-05-08 19:24:58.024456 7fb172f64700 0 log_channel(cluster) log [INF] : mon.cvknode84 calling new monitor election

/var/log/calamari/calamari.log

The calamari.log file mainly records the operations on Handy.

If logs as follows are in the calamari.log file, the Handy node does not have network connectivity with the other nodes.

2017-05-08 15:08:29,060 - ERROR - onestor_common.py[network_check][line:494] - django.request <network_check> Host "172.16.105.84" is unreachable, retry again...

2017-05-08 15:08:29,060 - ERROR - onestor_common.py[execute][line:622] - django.request [ONEStor] onestor_request_all_node cvknode84:Host is unreachable

/var/log/onestor_cli/onestor_cli.log

The onestor_cli.log file records information about the process of collecting real-time logs on a node. It can be used to diagnose and troubleshoot any issues related to log collection.

· If a log as follows is in the onestor_cli.log file, the size of the collected logs has reached 5 GB, and log collection has stopped automatically.

[2017-05-10 10:47:01,980][WARNING][monitor.py][line:157] We detect the current collecting log size is up to 5GB, ending collecting automatically!

· If the onestor_cli.log file disappears from a node, the log disk space on the node might be full.

Bimodal HCI logs

Bimodal HCI provides VMware VM lifecycle management and VMware VM agentless migration features.

1. The vmware-api-server service on the CVM host provides VMware VM lifecycle management. It stores related logs in the /var/log/vmware-api-server directory. If an exception occurs when you operate VMware VMs on the UIS, a log is generated in that directory to record the causes for the exception, which can be used for issue diagnosis.

For example, the following log shows that the snapshot failed to be created because the snapshot hierarchy is too deep. VMware limits the depth of the snapshot hierarchy.

[Vmware VM Request Processor Manager1] Trace[] UID[] c.h.h.u.s.v.handler.VmwareHandler - vmware vm "hdm2-snapshot" to generate a snapshot fail, cause:Snapshot hierarchy is too deep.

2. The vmware-agent service on the CVK host is responsible for migrating data from VMware. It stores related logs in the /var/log/vmware-agent directory. If a migration task fails or is interrupted unexpectedly on the UIS, you can view the logs in that directory.

¡ vmware-agent.log—Migration process logs. When an exception occurs during the migration process, the vmware-agent.log file will record the causes for the exception, which can be used for future issue diagnosis.

If a log as follows is output, the known VMware issue described in https://kb.vmware.com/s/article/2035976 has been triggered.

2022-01-19 16:03:06 [ERROR] service.go:149 migrate failed, vcenter key: 172.20.67.6:443 vmref: vm-64 task 1955534340610146293 reason: {"code": 12002, "message": "Get QueryChangedDiskAreas failed. ", "error": "ServerFaultCode: Error caused by file /vmfs/volumes/61dd4ded-84b7a178-07ce-98f181b81b1c/ubuntu18041desktop/ubuntu18041desktop.vmdk"}

¡ vmware_vddk.log—VDDK operation logs. These logs record the operations related to connecting to vSphere and can assist in locating data transmission interruption during migration.

3. If an error of failed driver injection is reported on the UI during the VM migration process, you can check the relevant error logs to preliminarily locate the cause of the failure. The relevant error logs are saved in the /var/log/caslog/cas_xc_virtio_driver.log file.

4. If the UI still reports that CAStools is not running on the VM some time after the injection is completed, remount the ISO and install CAStools again.

5. If no errors are reported on the UI after the VM is migrated but you cannot access the desktop after the VM is powered on, a VM driver injection compatibility issue might exist. If this VM is in the compatible migrated VM list, contact Technical Support to locate the issue on site.

The bimodal HCI system also manages CAS resources and the lifecycle of VMs on CAS platforms. The aggregator-provider service on a CVM host adds, edits, and deletes sites. For added sites, it collects, updates, and deletes resource data, and it manages VMs, snapshots, and templates. The service logs are stored in the /var/log/aggregator-provider directory and help troubleshoot missing resources, errors, or failed operations. The resources collected by the aggregator-provider service and the operation records are stored in graph database tables, which also help locate issues.

Distributed storage maintenance

Cluster issues

Rebalancing data placement when data imbalance occurs

ONEStor uses the CRUSH algorithm to automatically distribute data evenly across the object storage daemons (OSDs) in the cluster. Each OSD maps to a disk.

To rebalance data when occasional data imbalance occurs:

1. Execute the ceph osd df command and then identify the disk utilization of each OSD in the %USE field.

Figure 1 Identifying the disk utilization of each OSD

2. If the disk utilization of some OSDs is significantly higher than that of other OSDs, execute the ceph osd reweight-by-utilization command to rebalance data.

IMPORTANT:

Data rebalancing is read and write intensive and might cause cluster performance to degrade. To minimize its impact on storage services, perform this operation at off-peak hours.

3. Verify that the system has finished the rebalancing operation successfully.

Execute the ceph -s command to monitor the cluster health state. When the cluster state changes to HEALTH_OK, you can determine that the system has finished the rebalancing operation.
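To decide which OSDs count as significantly higher in step 2, you can compare each %USE value with the average. The following Python sketch flags OSDs above 120% of the average, mirroring the default overload threshold (120) of ceph osd reweight-by-utilization. It is a simplification. It uses a plain mean rather than capacity-weighted utilization, and it does not model how Ceph limits the weight change per run. You enter the %USE values by hand from the ceph osd df output.

```python
def overloaded_osds(use_percent, oload=120):
    """Return the OSDs whose %USE exceeds oload percent of the average %USE.

    use_percent maps an OSD name to the %USE value read from `ceph osd df`.
    """
    if not use_percent:
        return []
    average = sum(use_percent.values()) / len(use_percent)
    limit = average * oload / 100
    return sorted(osd for osd, use in use_percent.items() if use > limit)
```

For example, overloaded_osds({"osd.0": 40, "osd.1": 42, "osd.2": 41, "osd.3": 70}) flags only osd.3, because its utilization exceeds 120% of the 48.25% average.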

Method to accelerate data rebalancing when the cluster is in an idle state

When the cluster is in an idle state, you can accelerate data rebalancing, as follows:

1. Log in to UIS Manager.

2. On the top navigation bar, click Storage, and then select Disk Pool Management from the left navigation pane.

3. Select the disk pool on which data rebalancing is to be performed, and then click Edit.

4. In the dialog box that opens, change the restore speed from self-adaptive to reconstruction first.

In the Handy HA scenario, the system is inaccessible through the management HA IP

Symptom

· The Handy management page is inaccessible via the management HA IP in the browser.

· After you log in to the system via the HA IP, the system prompts to use the management IP. However, logging in with the management IP prompts to use the HA IP instead.

Solution

1. Check the database process on the primary and backup Handy nodes. Identify the node where the database service fails to start. If neither node has the process running, use the last node that provided management HA service as the reference.

# ps aux | grep mariadbcluster

2. Delete the gvwstate.dat file on this node. Skip this step if the file does not exist.

# sudo rm -rf /var/lib/mariadbcluster/gvwstate.dat

3. Set safe_to_bootstrap to 1 on this node.

# vim /var/lib/mariadbcluster/grastate.dat

4. Start the database service process on this node.

# service mariadbcluster bootstrap

5. Restart the database service processes on other nodes sequentially. (The nodes include primary/backup Handy nodes and nodes identified in Method 1.)

# service mariadbcluster restart

6. Check if the database service runs normally. After recovery, log in to the Handy interface again.

# /opt/h3c/bin/python /var/lib/ceph/shell/handyha/test_psql_status.py

If the script returns PSQL_READY when executed on the primary Handy node, the database cluster has recovered.
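If you prefer to script the grastate.dat edit in step 3 rather than use vim, the following Python sketch changes the safe_to_bootstrap: 0 line to safe_to_bootstrap: 1. It keeps a .bak copy of the original file. It assumes the standard Galera grastate.dat layout of "key: value" lines. Run it only on the node selected in step 1.

```python
import re

PATTERN = re.compile(r"^(safe_to_bootstrap:[ \t]*)0[ \t]*$", re.MULTILINE)

def set_safe_to_bootstrap(text):
    """Return text with safe_to_bootstrap set to 1; other lines are unchanged."""
    return PATTERN.sub(r"\g<1>1", text)

def update_grastate(path="/var/lib/mariadbcluster/grastate.dat"):
    """Back up grastate.dat to <path>.bak and rewrite it. Returns True if changed."""
    with open(path) as f:
        original = f.read()
    with open(path + ".bak", "w") as f:
        f.write(original)
    updated = set_safe_to_bootstrap(original)
    with open(path, "w") as f:
        f.write(updated)
    return updated != original
```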

Node issues

Resolving host issues caused by a full system disk

A host might malfunction when the usage of its system disk reaches 100%. For example, Apache processes and the ceph-mon daemon might fail to start, resulting in issues such as the mon down error and inability to log in to the management node.

System disk might get full for the following reasons:

· Too many large files and log files are present.

· The fio tester stores a large test0.0 file on the system disk. This issue occurs if you run fio without specifying the --filename option.

To free up disk space:

1. Execute the df -h command on the host to identify its system disk usage. The following is sample output:

root@cvknode86:~# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/sda1 28G 4.0G 23G 16% /

If the Use% field shows that the system disk usage has reached 100%, remove unused files as follows.

2. Remove unused large files or log files:

a. Access the /var/log directory and other directories that might contain large files or unused files.

b. Execute the du -h --max-depth=1 command to view the size of each folder in the directory.

c. Delete unused files.

3. Remove the test data file generated by fio:

a. Execute the echo "" > filename command to empty the test data file.

b. Execute the rm -rf filename command to delete the test data file.
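If you want the largest entries listed first when looking for files to remove in step 2, the following Python sketch approximates du -h --max-depth=1. It sums apparent file sizes rather than allocated blocks, as du does, so the figures can differ slightly. It skips symbolic links and only reports sizes; it deletes nothing.

```python
import os

def tree_size(path):
    """Sum the apparent sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file removed or unreadable while scanning
    return total

def dir_sizes(path):
    """Return (name, bytes) for each entry directly under path, largest first."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_symlink():
                continue
            if entry.is_dir():
                sizes.append((entry.name, tree_size(entry.path)))
            elif entry.is_file():
                sizes.append((entry.name, entry.stat().st_size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

For example, dir_sizes("/var/log") lists the subdirectories and files of /var/log by size.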

Issues caused by network failure

Handling failures to add or delete hosts

Adding or deleting a host, or disks on a host, fails if a network failure occurs before the system finishes the operation. In this case, the system displays a failure message, for example, indicating that it failed to delete the host because of a management network failure.

The solution to these issues differs depending on the timing of the network failure.

Network failure occurs before the system starts deleting disks

If network failure occurs before the system starts deleting disks, you only need to select the target host from the webpage and perform the operation again after the system regains network connectivity to the host.

In extreme cases, connectivity to the host cannot be restored, for example, because the host's operating system is damaged. In this situation, select the host on the webpage and delete it offline. The data on the host's disks remains after offline deletion, and you must handle the residual data.

Network failure occurs before the system finishes deleting all disks

See "Network failure occurs before the system starts deleting disks."

Network failure occurs during disk formatting after all the disks are deleted from the cluster

The host will be invisible on the management webpages after the system deletes all its disks from the cluster and proceeds to disk formatting. If network failure occurs before the system finishes formatting all the disks, the data and Ceph partitions on the unformatted disks will remain. After the host restarts, the unformatted disks will be automatically mounted to the operating system. UIS Manager will be unable to discover these disks when the host is re-added to the cluster.

To resolve these issues, execute the umount command to manually unmount the residual disks before you add the host back to the cluster.
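To find the residual mounts before you run umount, you can read /proc/mounts. This Python sketch assumes that the residual disks are mounted under /var/lib/ceph/osd/, the OSD directory used elsewhere in this guide. If they are mounted elsewhere on your host, change the prefix. The function names are hypothetical. The sketch only builds umount commands for review and does not run them. On the host, pass it the text of /proc/mounts.

```python
def residual_osd_mounts(proc_mounts: str, prefix: str = "/var/lib/ceph/osd/") -> list[tuple[str, str]]:
    """Return (device, mount point) pairs for mounts under prefix.

    Each /proc/mounts line reads: device mountpoint fstype options dump pass.
    The kernel escapes spaces in paths as \\040, so split() is safe here.
    """
    pairs = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].startswith(prefix):
            pairs.append((fields[0], fields[1]))
    return pairs


def umount_commands(proc_mounts: str) -> list[str]:
    """Build one umount command per residual mount, for manual review."""
    return [f"umount {mountpoint}" for _device, mountpoint in residual_osd_mounts(proc_mounts)]
```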

Deleting a storage node offline and restoring the node

Delete a storage node offline from the cluster on the webpage only if network connectivity to its host cannot be restored. This operation removes the node from the cluster directly.

CAUTION:

If abnormal PGs are present, data rebalancing might be in progress. To avoid loss of data, do not delete the node at this time.

CAUTION:

Destroying the cluster data on a host will result in loss of all cluster data on that host. Be sure that the node is no longer in use when you perform the operation.

These operations ensure that you can add the host back to the cluster as a storage, monitor, or backup management node for management high availability.

Disk issues

Identifying the data partitions to which the OSDs are mounted

The following sample output shows that OSDs have been mounted:

The following sample output shows that no OSDs have been mounted:

If an OSD was unmounted because of a disk issue, you must identify the mapping between the OSD and its disk based on the partition UUID (partuuid) before you remount the OSD.

To identify the partuuid of the data partition for an OSD, view the content of the fsid file in the OSD directory for that OSD, for example:

cat /var/lib/ceph/osd/ceph-8/fsid

d6d97f59-171e-46f7-9759-8037c7209bf1

To identify the partuuid values of all partitions on the host, execute the following command:

ll /dev/disk/by-partuuid/

lrwxrwxrwx 1 root root 10 Dec 6 19:55 260c435a-2c35-4562-979d-7a3d641dda48 -> ../../sdf2

Mount the partition whose partuuid matches the OSD's fsid value to the OSD directory, for example, /var/lib/ceph/osd/ceph-8.
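The fsid-to-partuuid lookup described above can be scripted. This Python sketch reads an OSD's fsid file and resolves the symlink with the same name under /dev/disk/by-partuuid. The function name is hypothetical. The sketch only locates the device and does not mount anything.

```python
import os


def osd_data_partition(osd_dir: str, by_partuuid: str = "/dev/disk/by-partuuid") -> str:
    """Return the device node, such as /dev/sdf2, that holds an OSD's data partition.

    The fsid file in the OSD directory contains the partuuid of the data
    partition, and /dev/disk/by-partuuid/<partuuid> links to the device.
    """
    with open(os.path.join(osd_dir, "fsid"), encoding="utf-8") as f:
        partuuid = f.read().strip().lower()
    link = os.path.join(by_partuuid, partuuid)
    if not os.path.lexists(link):
        raise FileNotFoundError(f"no partition with partuuid {partuuid}")
    return os.path.realpath(link)
```

For example, osd_data_partition("/var/lib/ceph/osd/ceph-8") returns the device that you mount at /var/lib/ceph/osd/ceph-8.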

An OSD cannot be deleted if its disk was replaced before the OSD was deleted from UIS Manager

If you replace a faulty disk prior to deleting its OSD from UIS Manager, Handy adds a new disk and OSD mapping for the replacement disk. When you attempt to delete the original OSD, you will receive a no data found message and the deletion attempt will fail.

To resolve this issue:

1. Execute the lsblk command to verify that no disk is mounted at the mount point of the old OSD. If a disk is still mounted there, unmount it first.

Mount status:

Unmount status:

2. Execute the ps -ef | grep osd command to check whether the old OSD daemon has stopped.

3. Execute the following commands to stop the OSD daemon and remove the OSD from the cluster. Replace x in these command lines with the OSD ID.

CAUTION:

These commands will erase user data. Make sure you fully understand their impact on services before you use them. If you are not sure of their impact, contact H3C Support.

stop ceph-osd id=x

ceph osd out osd.x

ceph osd crush remove osd.x

ceph auth del osd.x

ceph osd rm osd.x

4. Execute the ceph osd tree command to verify that the OSD has been removed from the cluster.

5. Log in to UIS Manager to verify that the failed disk has been deleted.

The UIS Web interface shows a slow disk alarm.

· Regardless of whether the slow disk alarm is cleared within 10 minutes, strongly consider replacing the disk. After replacement, when the OSD returns to up state, any unresolved slow disk alarm will be cleared automatically. For disk replacement steps, see "Replacing a disk on a CVK host."

· If the alarm is cleared within 10 minutes and the OSD remains up without disk replacement, manually acknowledge the alarm in the Handy interface. If the alarm occurs again, replace the disk.

A disk fails to be added

Symptom

No available disks. The OSDs in this node have been used by the Ceph cluster.

To check if a disk is in use:

1. Run lsblk to view the target disk and its partitions.

2. Execute the sudo gdisk -l /dev/xxx command (xxx: disk name). If partitions contain Ceph identifiers, the disk is already in use.
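The check in step 2 can be scripted. This Python sketch parses gdisk -l output and returns partitions whose names contain ceph, such as the ceph data and ceph journal partitions that ceph-disk typically creates. It assumes the default gdisk table layout: Number, Start, End, Size value and unit, Code, and Name. The function name is hypothetical.

```python
def ceph_partitions(gdisk_output: str) -> list[tuple[int, str]]:
    """Return (partition number, name) for partitions whose name mentions ceph."""
    found = []
    in_table = False
    for line in gdisk_output.splitlines():
        fields = line.split()
        if fields[:1] == ["Number"]:
            in_table = True  # partition table header reached
            continue
        # Fields: number, start, end, size value, size unit, code, name words...
        if in_table and len(fields) >= 7 and fields[0].isdigit():
            name = " ".join(fields[6:])
            if "ceph" in name.lower():
                found.append((int(fields[0]), name))
    return found
```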

Solution

Before using this method, confirm the disk is unused to avoid accidental data deletion.

If the disk has no user data but only residual partitions, run sudo ceph-disk zap /dev/xxx (xxx: disk name) to clear residual data and retry adding the disk.

Troubleshooting

Cluster initialization issues

Host scan failure

Symptom

A host cannot be discovered during cluster setup.

Solution

To resolve this issue:

· Check the network configuration as follows:

a. Verify that the management interface of the target host is in the same LAN as the management interface of the management node.

b. Verify that link aggregation is correctly configured on the switch interfaces connected to the management interface of the target host.

- If static link aggregation is configured, shut down one of the switch interfaces. After host scan is finished, bring up that interface.

- If dynamic link aggregation is configured, configure the host-facing aggregate interface as an edge aggregate interface by using the lacp edge-port command.

· Check for cluster initialization failure as follows:

a. Log in to each CVK host.

b. Access the /etc/cvk path and delete the cvm_info file (if it exists) by using the following command.

rm -rf cvm_info

c. Access the /root/.ssh path and delete the mhost file (if it exists) by using the following command.

rm -rf mhost

· Log in to the target host, access the /root/.ssh path, and delete the isCvmFlag file by using the following command. This file indicates that the host has acted as a management host.

rm -rf isCvmFlag

· Check for server serial number errors as follows:

a. Log in to the scanned host via SSH and execute the following command, where sn1234567 is the serial number. Make sure the serial number does not conflict with those of other hosts and has the standard length.

echo "sn1234567" > /etc/cvk/.tmpSN

b. Restart the service on the host.

systemctl restart uisoncfg.service

c. Rescan the host and proceed with deployment.

Compute cluster creation failure

Symptom

Creation of a compute cluster fails.

Solution

To resolve this issue, verify that each host can reach the management, storage front-end, and storage back-end networks.

Storage configuration failure

Symptom

Storage configuration fails.

Solution

To resolve this issue:

1. If UIS fails to discover all disks or a designated disk, perform the following tasks:

a. Log in to the affected host and delete all partitions from each undiscovered disk by executing the parted /dev/sdX rm N command, where sdX is the drive letter and N is the partition number.

b. Verify that the RAID controllers are included in the H3C CAS&UIS Server Virtualization Software and Hardware Compatibility Matrix.

2. If the distributed storage service is incorrectly installed on the management node, perform the following tasks:

a. Run the /opt/bin/uis_onestor_handy_install.sh script to reinstall ONEStor.

b. If an error is reported, contact Technical Support.

3. To check whether a server or RAID controller supports device management, execute the devmgr_check_dev_type command. If the value of for_DM_ONEstor is False, device management is not supported. In this case, check the hardware against the H3C CAS&UIS Server Virtualization Software and Hardware Compatibility Matrix again.

4. If storage initialization is stuck, perform the following tasks:

a. Execute the supervisorctl status command to identify whether the onestor-peon process is restarting repeatedly.

b. Use vim to open the /var/log/supervisor/onestor-peon-stderr log file (press Tab to autocomplete the file name). If the file contains TimeoutError: Lock error: Matplotlib failed to acquire the lock file: /root/.cache/matplotlib/fontlist-v330.json.matplotlib-lock, this issue has occurred.

c. Delete /root/.cache/matplotlib/fontlist-v330.json.matplotlib-lock.
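Steps b and c can be combined in a small Python sketch. It matches only the beginning of the error text quoted above, so minor wording differences between Matplotlib versions still match. The function names are hypothetical. Remove the lock file only after you have confirmed the error.

```python
import glob
import os

LOCK_FILE = "/root/.cache/matplotlib/fontlist-v330.json.matplotlib-lock"
# Beginning of the error text quoted above; matching only the prefix tolerates
# small wording differences between Matplotlib versions.
ERROR_PREFIX = "Matplotlib failed to acquire the"


def stale_lock_detected(log_glob: str = "/var/log/supervisor/onestor-peon-stderr*") -> bool:
    """Return True if any matching onestor-peon stderr log contains the lock error."""
    for path in glob.glob(log_glob):
        with open(path, encoding="utf-8", errors="replace") as f:
            if any(ERROR_PREFIX in line for line in f):
                return True
    return False


def remove_stale_lock(lock_file: str = LOCK_FILE) -> bool:
    """Delete the lock file; return True if it existed."""
    try:
        os.remove(lock_file)
    except FileNotFoundError:
        return False
    return True
```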

Cluster state

Health index lower than 100%

Symptom

The health index for a cluster is lower than 100%.

Solution

To resolve this issue:

1. Troubleshoot node failure or network disconnection issues as follows:

a. Log in to UIS, resolve alarms, and verify that the status of hosts is normal.

b. Log in to the command line of the management node, and verify connectivity to the hosts in the cluster by using ping operations.

2. Troubleshoot disk failure or RAID controller failure as follows:

a. Log in to UIS, and resolve the alarms generated for disk failure or RAID controller failure.

b. Log in to HDM, and resolve hardware alarms.

3. Check whether storage nodes are under maintenance or data balancing is in progress as follows:

a. Log in to UIS, and check whether storage nodes are under maintenance and whether data balancing is enabled.

b. Log in to the command line of the management node, and check whether data balancing is in progress.

Host deletion

Deletion failure prompt for successful host deletion

Symptom

The system displays a deletion failure prompt when a host is deleted successfully.

Solution

To resolve this issue:

1. Execute the lsblk command on the deleted host and check for unmounted OSDs.

2. Check whether a shell session is still in an OSD directory. A session that remains in the directory prevents the OSD from being unmounted.

3. Execute the cd command to exit the OSD's directory, and then execute the umount /var/lib/ceph/osd/ceph-11 command.

4. Execute the sgdisk --zap-all /dev/sdf command to wipe the partition tables on the disk.

Disk issues

No available disk

Symptom

No disks are available.

Solution

To resolve this issue:

1. Verify that the OSDs on the affected host have been used by the Ceph cluster:

a. Execute the lsblk command to view partitions on the target disk.

b. Execute the gdisk -l /dev/sdX command (where sdX is the drive letter) to check for the ceph tag.

2. If the target disk is not in use, execute the ceph-disk zap /dev/sdX command to clear residual data on the disk, and then add the disk again.

3. If UIS still cannot discover the disk, execute the ceph-disk zap /dev/sdX command again.

Insufficient disk count

Possible causes:

· Some disks have partitions. Check and clean them as described in "No available disk."

· The management interface has residual data. Clear the browser cache and reconfigure the settings.

· Some disk cache settings do not meet deployment requirements. Reconfigure the disk cache according to the deployment requirements.

Cluster alarms

Down monitor node

Symptom

A monitor node is down.

Solution

To resolve this issue:

1. On the top navigation bar, click Storage, and then select Storage Management > Node Management > Monitor Nodes from the left navigation pane.

2. If the down monitor node is powered off or shut down, start it up. Then, verify network connectivity between the cluster and the monitor node.

Figure 2 Verifying the monitor node state

Down OSD

Symptom

An OSD is down.

Solution

To resolve this issue:

1. Verify that the storage node where the down OSD resides is not powered off or shut down and it does not have network connectivity issues.

a. On the top navigation bar, click Storage, and then select Storage Management > Node Management > Storage Nodes from the left navigation pane.

b. If the storage node where a down OSD resides is powered off or shut down (no data is displayed for the storage node), start the storage node up. Then, verify network connectivity between the cluster and the storage node.

Figure 3 Verifying the storage node state

OSD process terminated unexpectedly

Symptom

An OSD process is terminated unexpectedly on a storage node.

Solution

To resolve this issue:

1. On the top navigation bar, click Storage, and then select Storage Management > Node Management > Storage Nodes from the left navigation pane.

2. Verify that the disks on the storage node are in normal state.

3. Log in to the host acting as the storage node through SSH from the management network, and execute the ceph osd tree command to view the status of all OSDs.

4. Execute the ps -ef | grep ceph-osd command to check the status of the OSD processes.

5. If an OSD process is not running, execute the systemctl start ceph-osd@ID.service command to start it, where ID is the OSD ID.
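Steps 3 through 5 can be combined in a small helper. The following Python sketch is an illustration rather than a UIS tool. It extracts down OSDs from ceph osd tree text and builds the systemctl start commands for you to review. It assumes that each OSD line contains the name osd.N followed by its up or down status, which is typical of both older and newer ceph osd tree layouts. The function names are hypothetical.

```python
import re

# "osd.<id>" followed on the same line by its status column.
OSD_STATUS = re.compile(r"\bosd\.(\d+)[ \t]+(up|down)\b")


def down_osds(tree_output: str) -> list[int]:
    """Return the IDs of OSDs that ceph osd tree reports as down."""
    return [int(m.group(1)) for m in OSD_STATUS.finditer(tree_output) if m.group(2) == "down"]


def start_commands(osd_ids: list[int]) -> list[str]:
    """Build the step 5 commands for review; nothing is executed here."""
    return [f"systemctl start ceph-osd@{osd_id}.service" for osd_id in osd_ids]
```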

OSD soft link loss

Symptom

The OSD soft link for a disk is lost.

Solution

To resolve this issue:

1. Execute the lsblk command to view the OSD directory of the down disk.

2. Access the OSD directory by executing the following command:

cd /var/lib/ceph/osd/ceph-4

3. Execute the ll command to check whether the soft link exists. If it exists, the journal entry in the output points to the UUID of the disk partition.

4. If the soft link does not exist, execute the following command:

ceph-disk activate-all
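The soft link check in steps 3 and 4 can be expressed as follows. This Python sketch assumes an OSD directory in which journal is a symbolic link to the journal partition, as the ll check above suggests. The function name and return values are hypothetical. A result of missing or dangling corresponds to the case in step 4.

```python
import os


def journal_link_state(osd_dir: str) -> str:
    """Classify the journal entry in an OSD directory.

    Returns "missing" if there is no entry, "dangling" if it is a symlink
    whose target does not exist, "ok" if the symlink resolves, and
    "not-a-link" if it is a regular file or directory.
    """
    path = os.path.join(osd_dir, "journal")
    if not os.path.lexists(path):
        return "missing"
    if not os.path.islink(path):
        return "not-a-link"
    return "ok" if os.path.exists(path) else "dangling"
```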

Loose or faulty disk

Symptom

The OSD process of a disk is down, which indicates that the disk is loose or faulty.

Solution

To resolve this issue:

1. Examine the disk status LEDs of the affected server to locate the disk.

2. Replace the disk.

Abnormal PG state

Symptom

PGs are degraded, stale, stuck unclean, or undersized.

Solution

If no other alarms are generated for the abnormal PGs, data migration is in progress, and the PGs will recover automatically.

Cache alarm

Symptom

Physical cache alarms or logical cache alarms are generated for the following reasons:

· RAID is manually configured and the state of caches is incorrectly set during system deployment.

· Faults occur during operation of the cluster. For example, a battery fault for a RAID controller might cause logical cache errors.

Solution

To resolve this issue:

1. On the top right of the page, click Hot Key, and then select Health Check.

2. Select Physical Disk State and Logical Disk State, and then click Start.

Figure 4 Performing health check

3. Click Failure in the Cache State column for a faulty disk.

Figure 5 Disk with faulty caches

4. Fix the cache settings of the disk as described in the remediation information.

Figure 6 Remediation

Network suboptimal health alarm

When the Network suboptimal health alarm is enabled, the backend suboptimal network service triggers an alarm upon detecting NIC hardware failures.

When the system detects a NIC hardware failure, it isolates the NIC from the aggregate interface based on the configured isolation policy. To troubleshoot the issue, identify whether the NIC itself or the link is faulty, and replace a faulty NIC promptly.

1. Execute the ethtool -S ethx | grep crc_errors command to identify whether the number of CRC errors is increasing.

[root@cvknode1 ~]# ethtool -S eth0 | grep crc_errors

rx_crc_errors: 0

2. Execute the ethtool -m ethx command to identify whether the optical power of the NIC's transceiver module is normal.

3. Execute the cat /sys/class/net/ethx/carrier_changes command to identify whether the link keeps flapping. A value that keeps increasing between checks indicates flapping.
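To tell whether CRC errors or carrier changes are still increasing, sample the counters twice. This Python sketch reads the standard Linux sysfs counters /sys/class/net/<interface>/carrier_changes and /sys/class/net/<interface>/statistics/rx_crc_errors, which most drivers expose. The per-driver ethtool -S counter names can differ. The function names and the default 10-second interval are assumptions.

```python
import os
import time


def read_counter(iface: str, name: str, sysfs: str = "/sys/class/net") -> int:
    """Read one counter: carrier_changes sits in the interface directory,
    rx_crc_errors in its statistics subdirectory."""
    sub = "" if name == "carrier_changes" else "statistics"
    with open(os.path.join(sysfs, iface, sub, name), encoding="ascii") as f:
        return int(f.read().strip())


def growing_counters(iface: str, interval: float = 10.0, sysfs: str = "/sys/class/net") -> dict[str, int]:
    """Sample the counters twice and return those that increased, with the delta."""
    names = ("rx_crc_errors", "carrier_changes")
    before = {n: read_counter(iface, n, sysfs) for n in names}
    time.sleep(interval)
    after = {n: read_counter(iface, n, sysfs) for n in names}
    return {n: after[n] - before[n] for n in names if after[n] > before[n]}
```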

Stateful failover

See H3C UIS Manager Stateful Failover Configuration Guide.

Monitoring node failure

Down monitoring node due to high system disk usage

Symptom

A monitoring node goes down because the system disk usage is high. The mon process exits or cannot start if the system disk usage exceeds 95%. The low disk space alarm is generated if the system disk usage crosses 70%.

To identify this symptom:

1. Execute the following command to check whether the mon process exists.

ps -ef|grep ceph-mon

2. If the mon process is not running, execute the df -h command to view the system disk usage.

root@cvknode1:~# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/sda1 10G 9.6G 0.4G 96% /

udev 863M 12K 863M 1% /dev

tmpfs 349M 348K 349M 1% /run

none 5.0M 0 5.0M 0% /run/lock

none 873M 4.0K 873M 1% /run/shm

3. Check the status of the mon process by executing the ps aux | grep ceph-mon command.

root@cvknode20216:~/515# ps aux | grep ceph-mon

root 2619507 0.0 0.1 8112 2136 pts/3 S+ 17:47 0:00 grep --color=auto ceph-mon

Solution

To resolve this issue, free up system disk space, and then start the mon process, for example, by executing the service ceph-mon@node name start command. The service name differs between nodes. To verify the process state, execute the service ceph-mon@node name status command.
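The two thresholds in this section can be checked with a short Python sketch. It uses shutil.disk_usage to approximate the df Use% value as used / (used + available), which is the ratio df reports before rounding. The labels returned by classify_usage() are hypothetical.

```python
import shutil

WARN_PERCENT = 70       # low disk space alarm threshold described above
MON_STOP_PERCENT = 95   # above this, the mon process exits or cannot start


def classify_usage(percent: float) -> str:
    """Map a system disk usage percentage to the states described above."""
    if percent > MON_STOP_PERCENT:
        return "mon at risk: free space now"
    if percent > WARN_PERCENT:
        return "low disk space alarm"
    return "normal"


def usage_percent(path: str = "/") -> float:
    """Approximate df's Use% for path as used / (used + available)."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    return 100.0 * usage.used / denominator if denominator else 0.0
```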

Down monitoring node due to network error

Symptom

A monitoring node goes down because of a network error.

To identify this symptom:

1. Verify that the mon process is running.

2. Verify that the monitoring nodes can ping one another.

3. Execute the arp -a and ifconfig commands to verify that the ARP table of the down monitoring node is correct.
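Besides ping, you can check whether the monitor service itself accepts connections. The following Python sketch tries a TCP connection to each monitor node. It assumes the Ceph default monitor ports, 3300 (msgr2) and 6789 (legacy). Confirm the ports used in your deployment before you rely on the result. The function names are hypothetical.

```python
import socket

# Ceph default monitor ports: 3300 for msgr2 and 6789 for the legacy protocol.
# Assumed here; confirm the ports used by your deployment.
MON_PORTS = (3300, 6789)


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_monitors(hosts: list[str], ports: tuple[int, ...] = MON_PORTS) -> list[str]:
    """Return the hosts on which none of the monitor ports accepts a connection."""
    return [host for host in hosts if not any(port_open(host, port) for port in ports)]
```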

Solution

To resolve this issue, troubleshoot the network error and start the mon process.

Extent backup file

Extent backup state

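To verify that extent backup is enabled, execute the following command and check the output for the scheduled backup task, for example, the ocfs2_filesystem_layout_backup.pyc entry in the following sample output: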

cat /etc/crontab

SHELL=/bin/bash

PATH=/sbin:/bin:/usr/sbin:/usr/bin

MAILTO=""

# For details see man 4 crontabs

# Example of job definition:

# .---------------- minute (0 - 59)

# | .------------- hour (0 - 23)

# | | .---------- day of month (1 - 31)

# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...

# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat

# | | | | |

# * * * * * user-name command to be executed

0 22 * * 5 root python /opt/bin/ocfs2_pool_fstrim.pyc -s onestor

1 2 * * * root /opt/bin/cas_clean_log.sh

*/1 * * * * root python /opt/bin/uis_host_network_probe.pyc

*/5 * * * * root flock -xn /tmp/util_memory_dropcaches.sh.lock -c "/opt/bin/util_memory_dropcaches.sh"

*/3 * * * * root /opt/bin/check_abrt_memory.sh

* * * * * root /opt/bin/ocfs2_iscsi_conf_chg_timer.sh

*/10 * * * * root python /opt/bin/ocfs2_cluster_config.pyc -s

0 */12 * * * root python /opt/bin/ocfs2_filesystem_layout_backup.pyc

* * * * * root /opt/bin/tomcat_check.sh

*/10 * * * * root /opt/bin/ntp_mon.sh

* * * * * root /opt/bin/tomcat_check.sh
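To check for the backup task without reading the whole file, you can filter /etc/crontab. This Python sketch assumes the system crontab format shown above: five schedule fields, a user name, and then the command. It also assumes that the extent backup task is the ocfs2_filesystem_layout_backup.pyc entry, which runs every 12 hours in the sample. That mapping is inferred from the script name. The function name is hypothetical.

```python
def find_cron_job(crontab_text: str, keyword: str) -> list[tuple[str, str, str]]:
    """Return (schedule, user, command) for active /etc/crontab entries containing keyword."""
    jobs = []
    for line in crontab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        fields = stripped.split(None, 6)  # 5 schedule fields, user, command
        if len(fields) < 7 or "=" in fields[0]:
            continue  # variable assignment such as SHELL=/bin/bash
        if keyword in fields[6]:
            jobs.append((" ".join(fields[:5]), fields[5], fields[6]))
    return jobs
```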

Extent backup directory

To locate an extent backup file in the extent backup directory, access the /vms/.ocfs2_extent_backup directory, and search by the file names for the target .lzo file.

In the following example, defaultPool_hdd is the storage pool, and the file name contains a timestamp.

ll -a /vms/.ocfs2_extent_backup/defaultPool_hdd/normal/

-rw-r--r-- 1 root root 176 Dec 24 00:00 .8257798_root_zhanji_1_202012240000.lzo

Therefore, the path of the most recent extent backup file is as follows:

/vms/.ocfs2_extent_backup/defaultPool_hdd/normal/.8257798_root_zhanji_1_202012240000.lzo
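To pick the most recent backup programmatically, sort by the timestamp in the file name. This Python sketch assumes that the 12 digits before .lzo are a YYYYMMDDHHMM timestamp, as in the example above, where 202012240000 matches the Dec 24 00:00 modification time. The function name and the optional name_prefix filter are hypothetical.

```python
import os
import re
from datetime import datetime
from typing import Optional

STAMP = re.compile(r"_(\d{12})\.lzo$")


def latest_backup(directory: str, name_prefix: str = "") -> Optional[str]:
    """Return the path of the newest .lzo file whose name starts with name_prefix."""
    best = None
    for name in os.listdir(directory):
        match = STAMP.search(name)
        if not match or not name.startswith(name_prefix):
            continue
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d%H%M")
        except ValueError:
            continue  # 12 digits that are not a valid date and time
        if best is None or stamp > best[0]:
            best = (stamp, name)
    return os.path.join(directory, best[1]) if best else None
```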

Extent backup file decompression

To decompress an extent backup file, first copy it to another directory, for example, /home.

cp /vms/.ocfs2_extent_backup/defaultPool_hdd/normal/.8257798_root_zhanji_1_202012240000.lzo /home

cd /home

lzop -dv .8257798_root_zhanji_1_202012240000.lzo

Script for data restoration

To run the script for restoring data from an extent backup file, execute the following command:

python /opt/bin/ocfs2_restore_utils.pyc dd /dev/dm-0 /home/.8257798_root_zhanji_1_202012240000 /vms/hw235-1/8257798_root_zhanji_1_202012240000_new

The parameters in the script are as follows:

· /dev/dm-0: Device name of the shared storage that saves the extent backup file. To check the device name of the shared storage, execute the fsmcli showpool command, for example:

fsmcli showpool --name defaultPool_hdd

…

device name: /dev/dm-0

device path: /dev/disk/by-id/dm-name-360000000000000000e0000003b75836c

device naa: 360000000000000000e0000003b75836c

· /home/.8257798_root_zhanji_1_202012240000: Decompressed extent backup file.

· /vms/hw235-1: Path on newly created shared storage or local storage to save the restored file. Make sure the target path has enough space. Do not save the restored file to the original shared storage.

· 8257798_root_zhanji_1_202012240000_new: Name of the restored file. This name must be different from the name of the original file.
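The parameter rules above can be checked before you run the script. This Python sketch builds the command line. It rejects two mistakes described above: reusing the original file name and writing to the original shared storage. The function name is hypothetical, as are the original_name and original_mount parameters and the /vms/defaultPool_hdd mount point in the test values. The sketch does not check free space, because the size of the restored file cannot be inferred from the extent backup file. Check the target with df -h.

```python
import os
import shlex

RESTORE_SCRIPT = "/opt/bin/ocfs2_restore_utils.pyc"


def restore_command(device: str, extent_file: str, target_dir: str, new_name: str,
                    original_name: str, original_mount: str) -> str:
    """Return the restore command line after checking the rules listed above.

    original_name and original_mount (the mount point of the shared storage
    that held the damaged file) are hypothetical inputs used only for checks.
    """
    if new_name == original_name:
        raise ValueError("the restored file name must differ from the original file name")
    target = os.path.realpath(target_dir)
    source = os.path.realpath(original_mount)
    if os.path.commonpath([target, source]) == source:
        raise ValueError("the target directory is on the original shared storage")
    args = ["python", RESTORE_SCRIPT, "dd", device, extent_file,
            os.path.join(target_dir, new_name)]
    return " ".join(shlex.quote(arg) for arg in args)
```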

Shared storage space reclamation

Releasing space of a shared volume by editing the VM bus type

1. Execute the df -h command to check the available space of the target shared volume.

2. Log in to the VM with the shared volume attached and check the drive letter and mount path of the data disk provided by the shared volume.

3. Log in to UIS, shut down the VM, and delete the data disk.

Figure 7 Editing the VM

4. Mount the data disk to the VM again by adding hardware, and select the high-speed SCSI bus type.

Figure 8 Mounting the data disk

5. Log in to the VM, and mount the data disk again with the new drive letter.

mount /dev/sda /vms/ruitest

6. Execute the fstrim /vms/ruitest command to release space.

7. Log in to the host where the VM resides and verify that the available space of the shared volume has increased.

Releasing space of a shared volume by deleting files

1. In the VM, mount a data disk that uses the high-speed SCSI bus type with the discard option, for example:

mount -o discard /dev/sda /vms/ruitest

2. Verify that the discard option is specified in the mount command.

3. Log in to the host where the VM resides and check the available space of the shared volume.

4. Delete a large file from the shared volume and verify that the available space of the shared volume has increased.
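Step 2 can be checked from inside the VM by reading /proc/mounts. This Python sketch is an illustration. The function names are hypothetical, and /vms/ruitest is the example mount point used above.

```python
from typing import Optional


def mount_options(proc_mounts: str, mountpoint: str) -> Optional[list[str]]:
    """Return the option list for mountpoint, or None if it is not mounted.

    If several entries match (stacked mounts), the last one is the active one.
    """
    found = None
    for line in proc_mounts.splitlines():
        fields = line.split()  # device mountpoint fstype options dump pass
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = fields[3].split(",")
    return found


def has_discard(proc_mounts: str, mountpoint: str) -> bool:
    """True if the mount point is mounted with the discard option."""
    options = mount_options(proc_mounts, mountpoint)
    return options is not None and "discard" in options
```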

SNMP

Get responses not received by an NMS

Symptom 1

An NMS cannot receive get responses because the destination port for get responses is in use.

Solution 1

To resolve this issue:

1. Execute the netstat -apn | grep destination port command to obtain the IDs of the processes that use the destination port.

2. Execute the ps aux | grep process ID command to check the processes that occupy the destination port.

3. If processes other than the snmp-get-responder process occupy the destination port, stop those processes, for example, by using the kill process ID command.
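Steps 1 and 2 can be partly automated by parsing netstat -apn output. This Python sketch returns the PID and program name of each TCP or UDP socket bound to a given local port. The program column is truncated and might show an interpreter such as python instead of the service name, so confirm each process with ps before you stop it. The function name is hypothetical.

```python
def port_owners(netstat_output: str, port: int) -> dict[int, str]:
    """Return {pid: program} for TCP/UDP sockets whose local port equals port."""
    owners = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith(("tcp", "udp")):
            continue  # headers, UNIX sockets, and other non-IP lines
        local_port = fields[3].rsplit(":", 1)[-1]  # works for 0.0.0.0:161 and :::161
        pid, sep, program = fields[-1].partition("/")
        if local_port == str(port) and sep and pid.isdigit():
            owners[int(pid)] = program
    return owners
```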

Symptom 2

An incorrect OID is configured for SNMPv1 get responses on an NMS.

Solution 2

To resolve this issue:

1. Log in to the leader storage node and execute the snmpget -v1 -c $community $ip:$port $oid command.

- $community: Community name. If no community name is configured, enter public.

- $ip: Storage-end IP address.

- $port: Destination port for get responses.

- $oid: OID configured on the NMS.

If the following error message is output, the OID on the NMS is incorrect.

2. Modify the OID, and verify that the oid=string information is output.

Symptom 3

An incorrect OID is configured for SNMPv2c or SNMPv3 get responses on an NMS.

The storage supports the following OID ranges:

· 1.3.6.1.4.1.25506.1.7.1.2

· 1.3.6.1.4.1.25506.1.7.1.9

· 1.3.6.1.4.1.25506.1.7.1.10

· 1.3.6.1.4.1.25506.1.7.1.12

· 1.3.6.1.4.1.25506.1.7.1.13

On the NMS, a number in the range of 0 to 2147483647 is added to the end of an OID.

Solution 3

To resolve this issue:

1. Check the /var/log/onestor/snmp_get_responder.log file.

2. If the NoSuchObjectError error is logged, the OID is not supported by the storage and does not exist in the MIB. Verify that the OID does not exceed the valid length.

3. If the NoAccessError error is logged, the OID exists in the MIB but is not supported by the storage, because the node does not have read or write permission. Verify that the OID is not shorter than the valid length.

4. If the ValueConstraintError error is logged, make sure that the last number of the OID is in the range of 0 to 2147483647.

5. After you correct the OID, verify that the Success to write the vars log message is generated.
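The OID rules in Symptom 3 and Solution 3 can be checked offline. This Python sketch assumes that a valid OID is exactly one of the supported prefixes followed by a single index in the range 0 to 2147483647. It maps other forms to the log error they most likely produce. The mapping is a rough aid, not the storage's actual validation logic, and the function name is hypothetical.

```python
SUPPORTED_PREFIXES = (
    "1.3.6.1.4.1.25506.1.7.1.2",
    "1.3.6.1.4.1.25506.1.7.1.9",
    "1.3.6.1.4.1.25506.1.7.1.10",
    "1.3.6.1.4.1.25506.1.7.1.12",
    "1.3.6.1.4.1.25506.1.7.1.13",
)
MAX_INDEX = 2147483647


def check_oid(oid: str) -> str:
    """Return "ok" or the log error that a malformed OID most likely causes."""
    parts = oid.strip().strip(".").split(".")
    for prefix in SUPPORTED_PREFIXES:
        base = prefix.split(".")
        if parts[:len(base)] != base:
            continue  # compare whole components, so ...1.20 does not match prefix ...1.2
        suffix = parts[len(base):]
        if not suffix:
            return "NoAccessError: OID is shorter than the valid length"
        if len(suffix) > 1:
            return "NoSuchObjectError: OID exceeds the valid length"
        if not suffix[0].isdigit() or int(suffix[0]) > MAX_INDEX:
            return "ValueConstraintError: last number is outside 0 to 2147483647"
        return "ok"
    return "NoSuchObjectError: OID is not supported by the storage"
```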

Value-added services

Data of a value-added service in the memory is different from that in the database

Analysis

This issue occurs if the Handy node fails. In this case, a value-added service fails to update its data in the database, which causes data inconsistency between the memory and the database.

Solution

The solution varies by value-added service as follows:

· For the volume migration service, delete the inconsistent migration pairs, and then create migration pairs as needed.

· For the volume copy service, stop the inconsistent copy tasks, and then start copy tasks as needed.

Data inconsistency occurs if you mount a volume on a Windows client and create a snapshot online

Analysis

The product provides the storage-side snapshot function. When the system creates a snapshot, the host side might cache data. The hang IO service is used to implement data synchronization at multiple time points. This ensures that data is flushed to the data buffer on the host side at the time when a snapshot is created. Therefore, if the Windows client performs data caching at the time when a snapshot is created, data of the snapshot might be different from the real data.

Solution

As a best practice to avoid this issue, use a host-side agent that handles data caching and flushes cached data to the data buffer when a snapshot is created. However, no such agent is available at present. Alternatively, take snapshots offline.

If you mount multiple snapshots of a volume on a Windows client at the same time, you are prompted that some snapshots are not initialized or assigned

Analysis

This issue might occur if you map a volume and its snapshots to the same host at the same time. The operating system of that host might recognize the source volume and its snapshots as the same volume, due to the volume recognition mechanism used by the operating system. For example, in the Oracle ASM scenario, a host identifies different volumes by ASM disk header information. This error will result in data corruption of the source volume and its snapshots.

Solution

Do not map a volume and its snapshots to the same host at the same time.

If you take a snapshot for a volume, delete its host mapping on the handy page without disk scanning or iSCSI disconnection, and restore the snapshot, the restored data is different from the original data.

Analysis

When the volume is unmapped from the host on the storage side, the host side is not aware of this event and still has data cache. If you restore data from the volume snapshot and mount the restored volume to the host again, data cache of the host will overwrite data of the restored volume.

Solution

Perform one of the following tasks before restoring data from the volume snapshot:

· Unmap the source volume from the host and perform disk scanning.

· Tear down the iSCSI connection.

If you create a read-only snapshot for a volume that is mounted by a directory, the snapshot cannot be mounted and the system prompts a wrong fs type message

Analysis

When you mount a volume on a Linux client, the new file system might not be flushed to the data buffer due to data caching. In this situation, if you take a snapshot for the mounted volume, the snapshotted file system is incomplete. Errors will occur if you mount the snapshot later.

Solution

Unmount the volume from the Linux client before snapshot creation.

The state of a snapshot is Creating, Deleting, or Restoring

Analysis

This issue might occur if the following conditions exist:

· The system has an exception and thus fails to create, delete, or restore a snapshot.

· The system cannot roll back its system records.

Solution

· For snapshots in Creating or Deleting state, manually delete the residual records generated for those snapshots.

· For snapshots in Restoring state, restore those snapshots again.

Compatibility

When the Intel ixgbe network adapter is enabled with load balancing, storage access gets slow

To avoid this issue, perform the following tasks:

1. Use the ethtool -i eth0 command to check whether the driver is ixgbe.

2. Use the ethtool -k eth0 command to check whether the large-receive-offload (LRO) feature is disabled.

3. If LRO is enabled, use the ethtool -K eth0 lro off command to disable it.

To keep LRO disabled after a restart, add the ethtool -K eth0 lro off command to the /etc/rc.local file.
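Steps 1 through 3 can be combined into a single check. This Python sketch parses ethtool -i and ethtool -k text and reports whether LRO should be disabled. It assumes the usual key: value output layout. The function names are hypothetical.

```python
from typing import Optional


def ethtool_value(output: str, key: str) -> Optional[str]:
    """Return the value of a 'key: value' line in ethtool output, or None."""
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            return value.strip()
    return None


def needs_lro_off(driver_info: str, features: str) -> bool:
    """True if ethtool -i reports ixgbe and ethtool -k reports LRO on."""
    driver = ethtool_value(driver_info, "driver")
    lro = ethtool_value(features, "large-receive-offload") or ""
    return driver == "ixgbe" and lro.split()[:1] == ["on"]
```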

Using a QoS policy with low bandwidth and IOPS limits makes the storage disks of a client slow

Analysis

The I/O of a client might drop to 0 if the following conditions exist:

· The client uses multiple storage disks and a QoS policy with low bandwidth and IOPS limits is applied to those disks.

· Each used storage disk has high I/O concurrency. For more information about I/O concurrency, see the configuration file in Method 2.

If Number of storage disks × Number of I/O concurrencies per storage disk is greater than the number of concurrencies on the iSCSI initiator, those storage disks have high concurrency.
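As a worked example of the rule above, the following Python sketch assumes the typical open-iscsi defaults of 128 for node.session.cmds_max and 32 for node.session.queue_depth, and treats queue_depth as the per-disk concurrency. These values are illustrative only. Read the actual values from /etc/iscsi/iscsid.conf on the client.

```python
# Typical open-iscsi defaults, used here only as illustrative values.
DEFAULT_CMDS_MAX = 128    # node.session.cmds_max (initiator limit)
DEFAULT_QUEUE_DEPTH = 32  # node.session.queue_depth (assumed per-disk concurrency)


def is_high_concurrency(disks: int, per_disk: int = DEFAULT_QUEUE_DEPTH,
                        initiator_limit: int = DEFAULT_CMDS_MAX) -> bool:
    """Apply the rule above: disks x per-disk concurrency > initiator limit."""
    return disks * per_disk > initiator_limit
```

With these values, 4 disks produce 4 × 32 = 128 concurrent I/Os, which does not exceed the limit, while 8 disks produce 256 and exceed it. Raising the initiator limit to 2048 as described in Method 2 removes the condition for 8 disks.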

Solution

To resolve this issue, use one of the following methods:

· Method 1: Distribute the service load if the service load is heavy on a single client.

- If only one client is available and you must deploy multiple storage disks on the client, install the multipathing service on the client and configure multiple iSCSI connections.

- If you can use multiple clients, distribute storage disks across different clients.

· Method 2: Increase the I/O limit on the iSCSI initiator.

a. Open the iSCSI initiator configuration file on the client. The default path is /etc/iscsi/iscsid.conf.

b. In the session and device queue depth section of the configuration file, set the node.session.cmds_max parameter to its maximum value, 2048.

Figure 9 Original I/O limit

Figure 10 New I/O limit

c. After the modification, restart the iSCSI initiator.
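Step b can also be applied with a script. This Python sketch rewrites the node.session.cmds_max line in the iscsid.conf text. It uncomments the line if necessary or appends it if the line is absent. It enforces the rule stated in the comments of the stock open-iscsi iscsid.conf: the value must be a power of 2 between 2 and 2048. The function name is hypothetical. Back up the file before you write the result back.

```python
import re

# An active or commented-out "node.session.cmds_max = N" line.
CMDS_MAX_LINE = re.compile(r"^[ \t]*#?[ \t]*node\.session\.cmds_max[ \t]*=.*$", re.MULTILINE)


def set_cmds_max(conf_text: str, value: int = 2048) -> str:
    """Return iscsid.conf text with node.session.cmds_max set to value."""
    if not 2 <= value <= 2048 or value & (value - 1):
        raise ValueError("cmds_max must be a power of 2 between 2 and 2048")
    line = f"node.session.cmds_max = {value}"
    new_text, count = CMDS_MAX_LINE.subn(line, conf_text)
    if count == 0:
        # Setting absent: append it, keeping a single trailing newline.
        new_text = conf_text.rstrip("\n") + "\n" + line + "\n"
    return new_text
```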

Failure to recognize an encryption dongle by VMs

To add an encryption dongle to a VM, make sure the dongle supports USB over network.

If the issue persists, contact Technical Support. As a best practice, use USBServer. For the supported models, see the UIS compatibility matrix.

After a USB device is plugged into a CVK host, the host cannot recognize the USB device

Symptom

After a USB device is plugged into a CVK host, you cannot find the USB device when you attempt to add a USB device on the Web management page of UIS.

Analysis

Troubleshoot this issue as follows:

1. This issue occurs if the USB device is plugged into an incorrect slot. Plug the USB device into another slot, for example, a USB slot inside the server. If the server has multiple types of USB slots, make sure the USB device is plugged into a matching slot.

To check whether a USB device is plugged into the correct slot, use the lsusb -t command. The following is an output example:

root@cvk-163:~# lsusb -t

/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M

/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/15p, 480M

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M

|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/8p, 480M

/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M

|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/6p, 480M

In the command output:

- UHCI represents USB 1.1. The maximum data transfer speed of USB 1.1 is 12 Mbps.

- EHCI represents USB 2.0. The maximum data transfer speed of USB 2.0 is 480 Mbps.

- XHCI represents USB 3.0. The maximum data transfer speed of USB 3.0 is 5 Gbps.

If the server supports multiple USB standards and you plug a USB 2.0 device into the correct slot on the server, a USB device is added in the bus of USB 2.0 (ehci-pci).

At present, USB 3.0, 2.0, and 1.0 are supported. Although you can plug a lower-version USB device into a higher-version USB slot, device incompatibility issues might occur. For example, when you plug a USB 1.0 device into a server that has only USB 3.0 slots, disable USB 3.0 in the BIOS of that server to avoid incompatibility issues.

If the host still cannot recognize the USB device, proceed to the next step.

2. On the command shell of the CVK host, use the lsusb command before and after you plug the USB device into the host. Compare the outputs to identify whether a new USB device is added. The following is an output example:

root@CVK:~# lsusb

Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

Bus 006 Device 002: ID 03f0:7029 Hewlett-Packard

Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub

If no new USB device is added, the Ubuntu operating system cannot recognize the USB device. Because operating systems based on the Linux kernel support most USB devices on the market, the USB device itself might be faulty. To check whether the USB device operates correctly, plug it into an office PC. If the USB device operates correctly on the PC, the device is normal. Proceed to the next step.

3. Check whether the CAS system is faulty or the server is incompatible with the USB device.

a. Install the operating system used on an office PC on the server, plug the USB device into the server, and then check whether the system can recognize the USB device.

- If it cannot be recognized, the server is not compatible with the USB device.

- If it can be recognized, the server is compatible with the USB device.

b. Install the native CentOS system on the server, plug the USB device into the server, and then check whether the system can recognize the USB device.

- If it cannot be recognized, the CentOS system does not support the USB device. Because UIS is CentOS-based, UIS does not support the USB device either.

- If it can be recognized (a new device appears), the CentOS system supports the USB device. Proceed to the next step.

4. Use the virsh nodedev-list usb_device command to view the name of the new USB device. The following is an output example:

root@CVK:~# virsh nodedev-list usb_device

usb_2_1_5

usb_usb1

usb_usb2

usb_usb3

usb_usb4

As shown in the command output, the name of the new USB device is usb_2_1_5. Then, use the virsh nodedev-dumpxml xxx command to view XML information of USB device usb_2_1_5. The following is an output example:

NOTE:

The xxx argument represents the name of a device. You can obtain this information by using the virsh nodedev-list usb_device command.

root@CVK:~# virsh nodedev-dumpxml usb_2_1_5

<path>/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.5</path>

</driver>

<product id='0x6545'>DataTraveler G2 </product>

<vendor id='0x0930'>Kingston</vendor>

</capability>

</device>

Check whether the bus ID, device ID, product ID, and vendor ID are correct. If these IDs are all correct and you still cannot find the USB device on the Web management page of UIS, contact Technical Support.
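The slot check in step 1 and the before/after comparison in step 2 can be sketched in Python as follows. This is an illustrative helper, not part of UIS. The function names are hypothetical, and the regular expression assumes the `lsusb -t` line layout shown in step 1. The first function maps each root hub to the standard implied by its driver (uhci, ehci, or xhci); the second lists the `lsusb` lines that appear only after the device is plugged in.

```python
import re

# Driver prefixes from step 1 and the USB standard each one represents.
DRIVER_STANDARD = {"uhci": "USB 1.1", "ehci": "USB 2.0", "xhci": "USB 3.0"}

# Matches root hub lines such as:
# "/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M"
ROOT_HUB = re.compile(r"Bus (\d+)\.Port \d+: .*Driver=([\w-]+)/\d+p, (\d+)M")


def classify_root_hubs(lsusb_t: str) -> dict:
    """Map each bus number to (standard implied by the driver, speed in Mbps).

    An xHCI controller also exposes a 480M root hub for USB 2.0 devices,
    so check the speed as well as the driver name.
    """
    buses = {}
    for line in lsusb_t.splitlines():
        match = ROOT_HUB.search(line)
        if not match:
            continue  # child ports ("|__ Port ...") and blank lines
        bus, driver, speed = match.group(1), match.group(2), int(match.group(3))
        standard = next(
            (std for prefix, std in DRIVER_STANDARD.items() if driver.startswith(prefix)),
            "unknown",
        )
        buses[bus] = (standard, speed)
    return buses


def new_devices(before: str, after: str) -> list:
    """Return the `lsusb` lines that appear in `after` but not in `before`."""
    seen = {line.strip() for line in before.splitlines() if line.strip()}
    return [
        line.strip()
        for line in after.splitlines()
        if line.strip() and line.strip() not in seen
    ]
```

For example, save the `lsusb` output before and after you plug in the device, and then pass both strings to `new_devices`. An empty result corresponds to the "no new USB device is added" case in step 2.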

After a USB device is loaded to a VM, the device manager of the VM cannot recognize the USB device, the USB device appears and disappears quickly, or an exclamation mark appears on the device

Symptom

After a USB device is loaded to a VM, the device manager of the VM cannot recognize the USB device, the USB device appears and disappears quickly, or an exclamation mark appears on the device.

Analysis

To resolve this issue:

1. Connect the USB device to another USB connector. If you use a USB extension cable, connect the USB device directly to a built-in USB connector and try again. If the server provides USB connectors of multiple types, make sure the USB device is connected to the correct connector.

To identify whether the USB device is connected to the correct connector, use the lsusb -t command.

root@cvk-163:~# lsusb -t

/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M

/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/15p, 480M

/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M

|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/8p, 480M

/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M

|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/6p, 480M

UHCI represents USB 1.1, EHCI represents USB 2.0, and XHCI represents USB 3.0. Typically, the maximum transmission rate is 12 Mbps for USB 1.1, 480 Mbps for USB 2.0, and 5 Gbps for USB 3.0.

For example, if a server supports multiple USB bus standards and you plug a USB 2.0 device into it, the device is correctly inserted if it is added to the USB 2.0 (ehci-pci) bus.

2. If a USB device such as a USB key, encryption token, or SMS modem is a USB 1.0 device and the server has only USB 3.0 connectors, disable USB 3.0 in the BIOS as a best practice.

3. To identify whether the CVK host can recognize the USB device, unplug and replug the USB device, and then use the virsh nodedev-list usb_device command to check for newly added USB devices.

¡ If no newly added USB device is detected, see "After a USB device is plugged into a CVK host, the host cannot recognize the USB device."

¡ If a newly added USB device is detected, proceed to the next step.

4. When you add the USB device to a VM, verify that the selected USB controller matches the USB version of the device (USB 1.0, USB 2.0, or USB 3.0). For USB devices such as USB keys, encryption tokens, and SMS modems, use the USB 1.0 controller as a best practice.

5. If the VM does not recognize the USB device, the driver might be incompatible or outdated. Verify that the driver version matches the operating system of the VM.

To verify the driver, install the same operating system on a physical machine and test whether the driver works correctly, or consult the USB device manufacturer. Alternatively, create a similar VM on the VMware platform, install the same driver, and load the USB device to see whether the VM recognizes it.

If the correct driver is used, and the VM still cannot recognize the device, proceed to the next step.

6. Use the virsh nodedev-dumpxml xxx command to view the XML information of the newly added USB device, where xxx is the name of the newly added USB device in the output from the virsh nodedev-list usb_device command.

root@CVK:~# virsh nodedev-list usb_device

usb_2_1_5

usb_usb1

usb_usb2

usb_usb3

usb_usb4

In this example, the name of the newly added USB device is usb_2_1_5.

root@CVK:~# virsh nodedev-dumpxml usb_2_1_5

<path>/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.5</path>

</driver>

<product id='0x6545'>DataTraveler G2 </product>

<vendor id='0x0930'>Kingston</vendor>

</capability>

</device>

7. After loading the USB device to the VM, use the virsh nodedev-dumpxml xxx command again to check whether the device ID, product ID, or vendor ID has changed.

If any of these values changes, the server and the USB device might be incompatible. To troubleshoot this issue, install the operating system used by the VM directly on the server and check whether the USB device can be used normally. Check the system logs for errors. Make sure the USB device is not only visible but also functional. If the USB device works correctly when the operating system is installed directly on the server, contact Technical Support.
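Steps 6 and 7 compare the IDs reported by `virsh nodedev-dumpxml` before and after the device is loaded to the VM. A minimal Python sketch of that comparison follows. The function names are hypothetical. The sketch assumes the libvirt node device layout, in which `bus`, `device`, `product`, and `vendor` are children of `<capability type='usb_device'>`. The `bus` and `device` values in the sample are made up for illustration.

```python
import xml.etree.ElementTree as ET


def usb_ids(dumpxml: str) -> dict:
    """Extract the bus, device, product, and vendor IDs of a USB node device.

    Elements missing from the XML are reported as None.
    """
    root = ET.fromstring(dumpxml)
    cap = root.find("capability[@type='usb_device']")
    if cap is None:
        raise ValueError("XML does not describe a usb_device node")

    def text_of(tag):
        element = cap.find(tag)
        return element.text.strip() if element is not None and element.text else None

    def id_of(tag):
        element = cap.find(tag)
        return element.get("id") if element is not None else None

    return {
        "bus": text_of("bus"),
        "device": text_of("device"),
        "product_id": id_of("product"),
        "vendor_id": id_of("vendor"),
    }


def changed_ids(before: dict, after: dict) -> dict:
    """Return {field: (before, after)} for every ID whose value changed."""
    return {key: (before[key], after.get(key)) for key in before if before[key] != after.get(key)}


# Illustrative input; the bus and device numbers are hypothetical.
SAMPLE = """<device>
  <name>usb_2_1_5</name>
  <capability type='usb_device'>
    <bus>2</bus>
    <device>3</device>
    <product id='0x6545'>DataTraveler G2 </product>
    <vendor id='0x0930'>Kingston</vendor>
  </capability>
</device>"""

print(usb_ids(SAMPLE))
```

Save the dump before and after you load the device, and then pass both results to `changed_ids`. A non-empty result is the "values change" case described in step 7.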

Use of USB3.0 devices

For a USB 3.0 device, if you select the USB 3.0 controller on the Web interface when adding the device to a VM, but the device cannot be found in the VM after it is loaded, possible reasons include:

· The VM lacks a USB 3.0 driver. USB 3.0 is a relatively new protocol, and some older operating systems do not include a built-in USB 3.0 driver. In this case, download and install the USB 3.0 driver for the operating system.

In systems that support USB 3.0, you can view the item highlighted in the red box in Device Manager:

· The USB 3.0 device is incompatible with the server. In this case, after you plug the USB 3.0 device into the UIS server, log in through SSH, and execute lsusb -t, no new device is displayed.

Use of USB-to-serial devices

Plug a USB-to-serial device into a server equipped with UIS, log in through SSH, and use lsusb -t to check for new USB devices. If the speed of the new device is 12 Mbps, select the USB 1.0 controller when you add the USB device to a VM. If the speed is 480 Mbps, select the USB 2.0 controller.

For example:

After you load a USB-to-serial device to a VM, no new serial port device can be viewed on the VM, even after you install the USB-to-serial driver on the VM. This issue occurs because the selected USB 2.0 controller does not match the device speed. The issue is resolved after you switch to a USB 1.0 controller.

A USB-to-serial cable is connected to four switches on one end and to a UIS-equipped server on the other end. After you log in through SSH and use the lsusb -t command to view new devices, the four new devices cannot be seen simultaneously. If you unplug and replug the cable repeatedly, only one, two, or three devices can be seen. When an unrecognized USB connector is plugged in, the following syslog is generated:

The log is generated because bus negotiation errors occurred when the device established a connection with the server. In this case, verify whether the server is compatible with the USB-to-serial connection method as a best practice. In this example, the server was not compatible with the method. After the on-site HP FlexServer R390 server was replaced with an R590 server, all four new devices were correctly identified.
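The speed-to-controller rule for USB-to-serial devices works as a simple lookup. This Python sketch is illustrative only. `controller_for` is a hypothetical helper, the device line in the example is hypothetical, and speeds other than 12 Mbps and 480 Mbps are left for manual review because the rule covers only those two.

```python
import re

# Controller choice from the USB-to-serial rule, keyed by device speed in Mbps.
CONTROLLER_BY_SPEED = {12: "USB 1.0 controller", 480: "USB 2.0 controller"}


def controller_for(lsusb_t_line: str) -> str:
    """Suggest a VM USB controller from the speed at the end of an `lsusb -t` line."""
    match = re.search(r",\s*(\d+)M\s*$", lsusb_t_line)
    if not match:
        raise ValueError("no speed field found in the line")
    speed = int(match.group(1))
    return CONTROLLER_BY_SPEED.get(speed, f"no rule for {speed} Mbps; check manually")


# Hypothetical device line for a USB-to-serial adapter.
print(controller_for("|__ Port 2: Dev 3, If 0, Class=Vendor Specific Class, Driver=pl2303, 12M"))
```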

NOTE:

If USB issues persist after troubleshooting, check the compatibility list and use USB Server.

Performance improvement

Contact Technical Support.

Guest OS and VM restoration

Restrictions and guidelines

· This document provides a general repair process for Linux and Windows operating systems, which you can also use as a reference for other systems.

· System repair is not guaranteed to succeed. Back up data and take other necessary precautions in advance.

· The repair method might not completely repair the VM. If the damage is severe and cannot be repaired by using an ISO or related tools, professional data recovery tools such as DiskGenius and diskrec might be required. If necessary, contact a professional data recovery company for assistance.

Preparation before repair

Backup of system disks

For a damaged system disk, perform a full disk backup in advance as a best practice, so that you can try other repair methods if one repair attempt fails.

For a damaged hard drive, you can use dd or other backup tools to copy the disk and create a backup.

In virtualization systems, you can back up the VM image file and clone it to another storage pool. Alternatively, you can create a snapshot on the storage side for the disk data to prevent unexpected situations during repair.
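The following Python sketch illustrates the full disk backup described above. It copies an image file or block device in fixed-size blocks, similar to `dd bs=4M`, and then verifies the copy with SHA-256. It is a hypothetical helper, not a UIS feature. It assumes the VM is shut down so that the source does not change during the copy. It does not preserve sparse regions, so the copy can use more space than the original.

```python
import hashlib
import shutil

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB, comparable to `dd bs=4M`


def sha256_of(path: str) -> str:
    """Hash a file or block device in BLOCK_SIZE chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as source:
        for block in iter(lambda: source.read(BLOCK_SIZE), b""):
            digest.update(block)
    return digest.hexdigest()


def backup_image(src: str, dst: str) -> str:
    """Copy src to dst block by block and confirm that both hash identically."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, BLOCK_SIZE)
    expected = sha256_of(src)
    if sha256_of(dst) != expected:
        raise IOError(f"checksum mismatch between {src} and {dst}")
    return expected
```

Make sure the destination is on a different storage pool from the damaged disk, so that the backup survives a failed repair attempt.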

Preparing the corresponding ISO system

For Linux systems, prepare a CentOS or Ubuntu ISO installation disk to facilitate repair of Linux system directories. For Windows systems, use the ISO file or disk with the same version as the damaged system.

CAUTION:

· As a best practice, use the same version or a newer version of the ISO to mount and repair the system.

· An older ISO might not support the file system format of a newer system, which causes the repair to fail.

Linux system repair steps

1. Mount the optical drive and configure the system to boot from the optical drive, and then restart the system.

In a virtual environment using CAS, mount the ISO file as the optical drive on the VM to be repaired. On the Edit VM page, set the boot sequence to prioritize booting from the optical drive.

2. Start the system and attempt to repair it on the terminal.

In a virtual environment, locate the IP address of the CVK used by the VM and the corresponding VNC port in the CAS interface. Use a VNC client installed on your PC to connect to the port. TightVNC is a recommended VNC client.

NOTE:

As a best practice, do not use a browser console, because some browsers require you to clear the browser cache frequently before the console page can be opened again after a few operations.

3. On the CentOS control interface, select Troubleshooting.

4. Select Rescue a CentOS System.

5. Select option 3 to enter the shell command prompt.

If an older version of the CentOS ISO is used, select Skip to enter the shell. The options for older CentOS versions are Continue, Read-only, Skip, and Advanced.

If you use the Ubuntu ISO for repair, select Execute a shell in the installer environment.

CAUTION:

· The Ubuntu 18.04 ISO repair mode does not include XFS tools by default. As a best practice, use the latest CentOS version for XFS repair.

· Make sure to use the matching or updated version of the ISO.

6. Use the lvs command to view the LVs.

As shown in the following figure, three LVs are found. The swap LV does not need to be repaired, and the VG name is centos.

Use the lvchange -a y command to activate the corresponding LV to make it readable.

lvchange -a y centos/home

lvchange -a y centos/root

Check the file system on the corresponding LV. Different file systems require different repair commands. Use blkid /dev/centos/home to identify the file system.

blkid /dev/centos/home

CAUTION:

· VG names vary by installation (for example, centos or VolGroup01). Use the VG names shown in the actual command output.

· If the system does not use LVM, use blkid to identify the file system on the corresponding /dev/sdaX partition.

7. Repair XFS.

xfs_repair /dev/centos/lv_root

If the repair fails, collect log information (if any) and contact Technical Support.

8. Repair Ext4.

fsck /dev/datavg/lv_data

If prompted, enter yes. The repair steps for other file systems are similar.

9. Shut down the VM by executing the init 0 command.

10. Unmount the ISO, set the system to boot from the hard disk again, and then restart the system.

11. After the reboot, verify that the system operates correctly.
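Steps 6 through 8 choose the repair command from the file system type that blkid reports. The mapping can be sketched in Python as follows. `repair_command` is a hypothetical helper that covers only the XFS and ext4 cases in steps 7 and 8. It returns the command text instead of running it, so you can confirm the device path before you run the command in the rescue shell.

```python
import re

# Repair tool per file system type, following steps 7 and 8.
REPAIR_TOOL = {"xfs": "xfs_repair", "ext4": "fsck"}


def repair_command(blkid_line: str) -> str:
    """Build the repair command for one blkid output line, for example
    '/dev/centos/home: UUID="..." TYPE="xfs"'. Returns "" for swap."""
    device, _, attributes = blkid_line.partition(":")
    # \b keeps SEC_TYPE= and PTTYPE= from matching.
    match = re.search(r'\bTYPE="([^"]+)"', attributes)
    if not match:
        raise ValueError(f"no TYPE reported for {device}")
    fs_type = match.group(1)
    if fs_type == "swap":
        return ""  # swap does not need to be repaired (step 6)
    tool = REPAIR_TOOL.get(fs_type)
    if tool is None:
        raise ValueError(f"no repair rule for {fs_type}")
    return f"{tool} {device.strip()}"
```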

Windows repair operations and steps

Symptom

After a CAS upgrade, a Windows 2008 VM prompts for repair at startup. If you select repair, the loading screen freezes. If you select normal startup, a black screen appears.

Repair steps

1. Attach the disk to another working Windows VM.

If the object being repaired is a VM, mount the system disk image of the faulty VM on a working Windows VM, and then use the Windows disk check tool to check and repair disk errors. To detach the system disk from the faulty VM, use the Delete Hardware operation on the Edit VM > Disk page.

2. On the working VM, add the system disk of the faulty VM via the Add Hardware option.

3. Select the faulty VM image. At this point, the system disk of the faulty VM can be seen in the working system.

For Windows 2012, a similar process applies. Select Computer Management, select a disk to view its properties, and perform error checking.

4. After the disk is mounted, an error message might appear. Click the blue error area to proceed.

Alternatively, scan and repair both partitions from their properties.

CAUTION:

· Use original system ISO files for both the repair operations and the image files.

· In a virtualized environment, a qcow2 image file cannot be mounted on multiple VMs at the same time. Unmount the file from one VM before you mount it on another VM for repair. An image in RAW format, preallocate set to zero format, or raw block format can be mounted on multiple VMs simultaneously.

5. If errors persist after the repair, mount an ISO file for further repair. Reattach the repaired disk to the faulty VM. A black screen might appear, indicating a boot failure or a missing bootmgr.

6. Mount the system ISO in the optical drive to repair bootmgr, and change the boot order to boot from the optical drive. In Windows 2008, open Repair Computer and select the command prompt window.

7. Enter the command below to repair the bootmgr file. The machine should restart normally after the bootmgr is repaired.

CAUTION:

· In a virtualization environment, select an IDE disk and mount the appropriate version of the ISO file.

· If the system still reports errors after the repair, such as antivirus software or application startup errors, disable or uninstall the related software or program from a working Windows system (for example, rename it so that it cannot start). Then boot the system again and make adjustments based on the specific error information.

Upgrade

Contact Technical Support.

Independent deployment failure

Symptom

· After Workspace is installed in a VM, the system reports a 502 error. The Gauss installation logs show a failure to obtain the local network connection.

· The deployment uses an independent installation package.

Possible causes

The VM network is abnormal.

Solution

To resolve the issue, re-create the VM, select the correct Euler system, change the IP address, and then restart the VM.

Unified authentication issue

CAS authentication service exception

Symptom

After the CAS service is enabled, you cannot log in to UIS because of a CAS authentication failure or other issues.

Solution

1. Log in to the CLI of the CVM through SSH, and then execute the mysql -p uis command to access the MySQL console.

2. At the MariaDB [uis]> prompt, execute update TBL_PARAMETER set VALUE='0' WHERE NAME='cas.sso.enable'; to disable CAS single sign-on.

3. Restart the UIS service: service uis-core restart.

4. Log in to UIS through the browser again.

UIS 2000 G6 hardware HA does not take effect

Checking server hardware information

1. Server model: H3C UIS 2000 G6.

2. Serial numbers for the upper and lower nodes on the server: *-L and *-U.

3. BMC information. Verify that the CPLD, BIOS, and HDM versions of the upper and lower nodes match.

Checking the driver and application program status

If the CPLD_HA driver or CHD service fails to start, manually enable and start them by using the systemctl enable chd.service and systemctl start chd.service commands.

Checking the configuration file

Edit the configuration file /etc/chd/chd.conf. After editing the file, restart the CHD service for the changes to take effect.

The hardware HA feature relies on the existing HA process cvk_ha. The cvk_ha process responds to CHD interrupts and completes fast HA migration.

Description for the configuration file:

cvk_ha { # Description, which must be unique.

srv_name "cvk_ha" # Name, which must be unique.

srv_pid "/var/run/ha_cvkd.pid" # PID file for running the process.

srv_proc_name "cvk_ha" # Process name that responds to signals. To obtain the process name, use the ps command.

srv_sig_on 10 # Server signal online

srv_sig_off 0 # Server signal offline

srv_sigs_max 0 # Set this value to limit the maximum number of failure signals sent. Use 0 to send signals continuously.

}

Identifying whether the cluster HA function is enabled

Verifying the configuration

Observe whether fast HA is triggered when AC power fails or the VM shuts down.

Operations and maintenance monitoring data fails to be displayed

Possible causes

A sudden time jump in the cluster or another anomaly caused the monitoring database to become abnormal.

Symptom 1:

1. The system reports a data retrieval failure when it obtains operations and maintenance monitoring data.

2. Check the Prometheus database logs. The logs show opening storage failed. Also check the Prometheus-cluster-stderr---xxxxx.log file.

3. View logs

Solution:

1. Delete abnormal WAL files.

Access the /opt/h3c/var/lib/prometheus_node/data/wal directory. Identify whether the file numbers in this directory are consecutive. As shown in the following figure, there are two consecutive subsequences: 000001, 000002, 000003, and 000006, 000007.

2. Delete the subsequence with the smaller numbers (000001, 000002, and 000003 in this example). If you find abnormalities in Prometheus-cluster-stderr---xxx.log during the above troubleshooting, perform the same operation in the /opt/h3c/var/lib/prometheus_cluster/data/wal directory.

3. Restart the Prometheus processes:

¡ If an exception is found in Prometheus-node-stderr---xxxx.log, restart the prometheus-node process by executing the supervisorctl restart prometheus-node command.

¡ If an exception is found in Prometheus-cluster-stderr---xxxx.log, restart the prometheus-cluster process by executing the supervisorctl restart prometheus-cluster command.
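The WAL check in steps 1 and 2 can be done with a short script before anything is deleted. The following Python sketch is a hypothetical dry-run helper. It groups the numeric segment names in a WAL directory into runs of consecutive numbers and lists every run except the highest-numbered one. Treating "delete the subsequence with the smaller numbers" this way when more than two runs exist is an assumption. Non-numeric entries, such as checkpoint directories, are ignored.

```python
import re


def wal_runs(names):
    """Group numeric WAL segment names into runs of consecutive numbers."""
    numbers = sorted(int(name) for name in names if re.fullmatch(r"\d+", name))
    runs = []
    for number in numbers:
        if runs and number == runs[-1][-1] + 1:
            runs[-1].append(number)
        else:
            runs.append([number])
    return runs


def segments_to_delete(names):
    """Return the segment names outside the highest-numbered run (dry run only)."""
    original = {int(name): name for name in names if re.fullmatch(r"\d+", name)}
    return [original[number] for run in wal_runs(names)[:-1] for number in run]


# Names from the example above: two runs, 000001-000003 and 000006-000007.
print(segments_to_delete(["000001", "000002", "000003", "000006", "000007"]))
```

To check a real directory, pass `os.listdir("/opt/h3c/var/lib/prometheus_node/data/wal")` to `segments_to_delete`, and review the printed names before you delete anything.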

Symptom 2

1. The system reports a data retrieval failure, or no data exists, when it obtains operations and maintenance monitoring data.

2. Execute the following commands to check the Prometheus-related processes. The output shows that prometheus-node or prometheus-cluster keeps restarting.

# supervisorctl status prometheus-node

# supervisorctl status prometheus-cluster

3. Check the Prometheus database logs. The system displays opening storage failed: invalid block sequence: block time ranges overlap.

¡ Example: level=error ts=2023-10-26T19:42:10.042Z caller=main.go:731 err="opening storage failed: invalid block sequence: block time ranges overlap:

¡ Also check Prometheus-cluster-stderr---xxxxx.log.

Solution

1. Delete the data in the directory with data errors.

¡ For the prometheus-node process, use the following commands to back up and then delete the data:

# mkdir prometheus_node_bak

# cp -rf /opt/h3c/var/lib/prometheus_node/data/* prometheus_node_bak

# rm -rf /opt/h3c/var/lib/prometheus_node/data/*

¡ For the prometheus-cluster process, use the following commands to back up and then delete the data:

# mkdir prometheus_cluster_bak

# cp -rf /opt/h3c/var/lib/prometheus_cluster/data/* prometheus_cluster_bak

# rm -rf /opt/h3c/var/lib/prometheus_cluster/data/*

This operation also deletes historical monitoring data. Determine whether you need to back up the data before you delete it.

2. Restart the Prometheus process.

¡ If an exception is found in prometheus-node-stderr---xxxx.log, restart the prometheus-node process by executing the supervisorctl restart prometheus-node command.

Figure 11 Restarting the prometheus process

¡ If an exception is found in prometheus-cluster-stderr---xxxx.log, restart the prometheus-cluster process by executing the supervisorctl restart prometheus-cluster command.

Figure 12 Restarting the prometheus-cluster process

Host discovery: Hosts have empty serial numbers or the same serial number.

Possible causes: The host hardware does not have a serial number, or VMs were assigned the same serial number during setup.

1. Check if the serial number is empty or the same as another host.

2. Customize the serial number as shown in the following figure.

3. Rescan to identify whether the custom serial number has taken effect.
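The check in step 1 can cover many hosts at once. In this Python sketch, `serial_problems` is a hypothetical helper that takes a mapping of host name to serial number and reports empty and duplicate values. You can collect the serial numbers by any means, for example, by running `dmidecode -s system-serial-number` on each host.

```python
from collections import defaultdict


def serial_problems(serials: dict) -> dict:
    """Report hosts with an empty serial number and serial numbers shared by several hosts."""
    empty = sorted(host for host, serial in serials.items() if not (serial or "").strip())
    hosts_by_serial = defaultdict(list)
    for host, serial in serials.items():
        if serial and serial.strip():
            hosts_by_serial[serial.strip()].append(host)
    duplicates = {
        serial: sorted(hosts) for serial, hosts in hosts_by_serial.items() if len(hosts) > 1
    }
    return {"empty": empty, "duplicates": duplicates}


# Hypothetical inventory.
print(serial_problems({"cvk01": "", "cvk02": "ABC123", "cvk03": "ABC123", "cvk04": "XYZ789"}))
```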

In the Handy HA scenario, you cannot access the Web interface by using the HA IP.

Symptom

Login fails after the primary and backup Handy nodes go up and down alternately or experience an abnormal power outage.

· Symptom 1: The Handy management interface cannot be accessed from a browser by using the HA IP.

· Symptom 2: After you log in by using the HA IP, the system requires you to log in by using the management IP. However, when you log in by using the management IP, the system requires you to log in by using the HA IP instead.

Solution

1. Check the database processes on the primary and backup handy nodes. Identify the node where the database service fails to start. If neither the primary nor the backup handy node has the process running, use the node where the HA IP provides services.

# ps aux | grep mariadbcluster

2. Delete the gvwstate.dat file on this node. Skip this step if the file does not exist.

# sudo rm -rf /var/lib/mariadbcluster/gvwstate.dat

3. Set the value for the safe_to_bootstrap parameter to 1 for the node.

# vim /var/lib/mariadbcluster/grastate.dat

In the file, set the value of safe_to_bootstrap to 1.

4. Start the database service process on this node.

# service mariadbcluster bootstrap

5. Restart the database service processes on the other nodes in the cluster one by one. The nodes include the primary and backup handy nodes, as well as the nodes identified using Method 1.

# service mariadbcluster restart

6. Identify whether the database service is running correctly. Log in to the handy management interface again after recovery.

Run the following script on the primary handy node. If PSQL_READY is returned, the database cluster has recovered.

# /opt/h3c/bin/python /var/lib/ceph/shell/handyha/test_psql_status.py
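Step 3 can also be done without an interactive editor. The following Python sketch is a hypothetical helper. It assumes the usual Galera grastate.dat layout, with a `safe_to_bootstrap: <0|1>` line. The sample content, including the all-zero UUID, is illustrative. The helper refuses to change the content unless exactly one such line exists.

```python
import re


def enable_bootstrap(grastate: str) -> str:
    """Return grastate.dat content with safe_to_bootstrap set to 1."""
    updated, count = re.subn(
        r"^(safe_to_bootstrap:[ \t]*)\d+", r"\g<1>1", grastate, flags=re.MULTILINE
    )
    if count != 1:
        raise ValueError(f"expected one safe_to_bootstrap line, found {count}")
    return updated


# Illustrative grastate.dat content; the UUID is a placeholder.
SAMPLE = """# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
"""

print(enable_bootstrap(SAMPLE))
```

Apply the change only on the node selected in step 1. Read /var/lib/mariadbcluster/grastate.dat, pass its content to the helper, and write the result back before you run service mariadbcluster bootstrap.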

Host 2 experienced a power cycle when Host 1 entered maintenance mode. After Host 2 recovered, the OSD took a long time to restart (about 100 minutes).

Symptom

Host 2 experienced a power cycle when Host 1 entered maintenance mode. After Host 2 recovered, the OSD took a long time to restart (about 100 minutes).

Solution

Log out all sessions on the failed node by executing the iscsiadm -m session -u command.

When the CPU frequencies of the source and destination hosts differ before and after VM migration, the CPU limit set before migration changes to an invalid value after VM migration from E801P01 to E886P01

Symptom

Solution

Manually edit the values.

Interoperation with a third-party alarm server

Configuring a third-party alarm server on the UIS platform

1. Enter the UC 2.0 platform address as the server address.

2. Use the default port number 162.

3. Use the default community private.

Configuring UC 2.0 to monitor UIS alarms

Setting alarm rules

Adding the UIS platform

UIS platform added successfully

UC 2.0 received alarms

Alarm troubleshooting guide

If the UC platform does not receive an alarm, follow the instructions below for troubleshooting.

1. Identify whether the UIS platform has generated alarms and identify the source (frontend or backend).

2. If alarms are sent, capture them in the UC backend for inspection.

tcpdump -i any -vnn udp and port 162 -w [pcap file]

3. If you do not receive the alarm trap, check the sender.

Commonly used commands

UIS Manager commands

HA commands

H3C UIS Manager provides HA features. The following are the commonly used HA commands.

All the following commands except the cha -k set-loglevel level command run on a node where UIS Manager is deployed. The cha -k set-loglevel level command runs on a CVK host.

Obtaining the clusters managed by the HA process

cha cluster-list

# Obtain the clusters managed by the HA process.

root@UIS-UISManager:~# cha cluster-list

------------------------------------------------------------

HA database info:

Cluster list:

cluster:1, name:Cluster

HA memory info:

Cluster list:

cluster ID: 1

Obtaining state statistics for a cluster

cha cluster-status cluster-id

# Obtain the hosts and VMs in a cluster.

root@UIS-UISManager:~# cha cluster-status 1

------------------------------------------------------------

HA database info:

Cluster 1 information:

Is HA enabled: 1

Cluster priority: 1

2 nodes configured

6 VM configured

host and vm list:

Host:UIS-CVK01, vm:windows2008

Host:UIS-CVK02, vm:win2008

Host:UIS-CVK02, vm:rhce-lab

Host:UIS-CVK02, vm:Linux-RedHat5.9

Host:UIS-CVK02, vm:fundation1

Host:UIS-CVK02, vm:win7

HA memory info:

Cluster 1, Least_host_number(MIN_HOST_NUM) is 1.

Obtaining information for hosts in a cluster

cha node-list cluster-id

# Obtain information for hosts in a cluster.

root@UIS-UISManager:~# cha node-list 1

------------------------------------------------------------

HA database info:

In cluster 1, node list :

host: UIS-CVK01, in cluster: 1, IP: 192.168.11.1

host: UIS-CVK02, in cluster: 1, IP: 192.168.11.2

HA memory info:

Cluster 1, Least_host_number(PermitNum) is 1. hosts list:

host: UIS-CVK02 ID: 4

host: UIS-CVK01 ID: 3

Total host num in this cluster is: 2

Obtaining information for a host in a cluster

cha node-status host-name

# Obtain information for a host in a cluster.

root@UIS-UISManager:~# cha node-status UIS-CVK01

------------------------------------------------------------

HA database info:

Node UIS-CVK01 :

in cluster: 1

ip address: 192.168.11.1

VM count: 1

HA memory info:

Host: UIS-CVK01, ID: 3, IP address: 192.168.11.1

status: CONNECT

heart beat num: 101

storage total num: 1

storage fail num: 0

heartbeat fail num: 0

recv packet: 1

host model(maintain): 0

time statmp: Fri Jan 30 15:34:04 2015

Storage info:

storage name:sharefile path:/vms/sharefile

storage status:STORAGE_NORMAL

time stamp:0

update flag:0

last send flag:0

fail num:0
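When you check many hosts, you can extract the counters in the HA memory info section automatically. This Python sketch assumes the `key: value` layout shown in the example output. `node_health` and `looks_healthy` are hypothetical helpers. The health rule (status CONNECT with zero storage and heartbeat failures) is an assumption for illustration, not a documented UIS threshold.

```python
import re

FIELDS = ("status", "storage fail num", "heartbeat fail num")


def node_health(output: str) -> dict:
    """Extract selected fields from `cha node-status` output (None if absent)."""
    values = {}
    for key in FIELDS:
        # Anchor at line start so "storage status:" does not match "status".
        match = re.search(rf"^[ \t]*{re.escape(key)}:[ \t]*(\S+)", output, re.MULTILINE)
        values[key] = match.group(1) if match else None
    return values


def looks_healthy(values: dict) -> bool:
    """Hypothetical rule: host connected, no storage or heartbeat failures."""
    return (
        values["status"] == "CONNECT"
        and values["storage fail num"] == "0"
        and values["heartbeat fail num"] == "0"
    )
```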

Obtaining information for a VM on a host

cha vm-list host-name

# Obtain information for a VM on a host.

root@UIS-CVK03:~# cha vm-list UIS-CVK01

------------------------------------------------------------

HA database info:

1 vms in host UIS-CVK01 :

vm: windows2008 ID: 11 HA-managed: 1 Target-role: 1

Obtaining information for a VM in a cluster

cha vm-status vm-name

# Obtain information for a VM in a cluster.

root@UIS-CVK03:~# cha vm-status windows2008

------------------------------------------------------------

HA database info:

vm ID: 11 name: windows2008

at node ID: 3

target-role: 1

is-managed: 1

prority: 1

storage name: sharefile

storage psth: /vms/sharefile

Setting the log level

cha set-loglevel module level

Parameters:

· cmd|UIS managerd: Sets the log level for the cmd or UIS Manager process.

· level: Specifies the log level, including debug, info, trace, warning, error, and fatal.

# Set the log level.

root@UIS-UISManager:~# cha set-loglevel info

Setting the log level for a CVK host

cha -k set-loglevel level

Parameters:

level: Specifies the log level, including debug, info, trace, warning, error, and fatal.

# Set the log level for a CVK host.

root@UIS-CVK01:/vms/sharefile# cha -k set-loglevel debug

Set cvk log level success.

root@UIS-CVK01:/vms/sharefile#

vSwitch commands

The following are the basic commands for vSwitches in UIS Manager.

Obtaining the internal version number of the vSwitch

root@hz-cvknode2:~# ovs-vsctl -V

ovs-vsctl (Open vSwitch) 2.9.1

DB Schema 7.15.1

Displaying status of processes related to the vSwitch

Execute the ps aux | grep ovs command on a CVK host. ovs_workq is an OVS kernel process, and ovsdb-server and ovs-vswitchd represent a monitor process and a service process, respectively. If the SDN network is initialized, four additional ovsdb-server processes run: the SDN northbound and southbound database processes and their monitor processes.

[root@cvknode1 ~]# ps aux | grep ovs

root 2133 1.5 0.0 9180 5444 ? S<s Nov08 329:47 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --verbose=PATTERN:FILE:%d{%b %d %H:%M:%S}|%05N|%c|%p|%m --verbose=PATTERN:CONSOLE:%d{%b %d %H:%M:%S}|%05N|%c|%p|%m --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach

root 2255 1.3 0.0 1885032 293384 ? S<Lsl Nov08 296:37 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --verbose=PATTERN:FILE:%d{%b %d %H:%M:%S}|%05N|%c|%p|%m --verbose=PATTERN:CONSOLE:%d{%b %d %H:%M:%S}|%05N|%c|%p|%m --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach

root 371762 0.0 0.0 8200 452 ? Ss Nov18 0:00 ovsdb-server: monitoring pid 371763 (healthy)

root 371763 0.0 0.0 9400 5992 ? S Nov18 1:12 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/run/ovn/ovnnb_db.sock --pidfile=/run/ovn/ovnnb_db.pid --unixctl=/run/ovn/ovnnb_db.ctl --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db

root 371859 0.0 0.0 8200 448 ? Ss Nov18 0:00 ovsdb-server: monitoring pid 371861 (healthy)

root 371861 0.0 0.0 12188 8640 ? S Nov18 1:36 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-sb.log --remote=punix:/run/ovn/ovnsb_db.sock --pidfile=/run/ovn/ovnsb_db.pid --unixctl=/run/ovn/ovnsb_db.ctl --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections --private-key=db:OVN_Southbound,SSL,private_key --certificate=db:OVN_Southbound,SSL,certificate --ca-cert=db:OVN_Southbound,SSL,ca_cert --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers /var/lib/ovn/ovnsb_db.db

root 2172726 0.0 0.0 21964 2408 pts/5 S+ 11:00 0:00 grep --color=auto ovs
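To check the vSwitch daemons from a script rather than by reading the full ps output, you can count processes by command name. The following is a minimal sketch; check_ovs_procs is a hypothetical helper name, and it assumes that ps -eo comm= reports ovsdb-server and ovs-vswitchd as the command names, including for the ovsdb-server monitor processes, so an ovsdb-server count higher than 1 is expected when the SDN network is initialized.

```shell
# check_ovs_procs (hypothetical helper): reads "ps -eo comm=" output on stdin,
# prints how many ovsdb-server and ovs-vswitchd processes it found, and
# returns a non-zero status if either daemon is missing.
check_ovs_procs() {
  awk '
    $1 == "ovsdb-server" { db++ }
    $1 == "ovs-vswitchd" { vsd++ }
    END {
      printf "ovsdb-server=%d ovs-vswitchd=%d\n", db, vsd
      exit ((db >= 1 && vsd >= 1) ? 0 : 1)
    }'
}
```

Usage: ps -eo comm= | check_ovs_procs. A non-zero exit status means that at least one of the two daemons was not found.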

Restarting a vSwitch

root@UIS-CVK01:~# service openvswitch-switch restart

Adding a vSwitch

root@UIS-CVK01:~# ovs-vsctl add-br vswitch-app

After a vSwitch is added from the CLI, it appears on UIS Manager only after you connect all hosts on UIS Manager.

Deleting a vSwitch

root@UIS-CVK01:~# ovs-vsctl del-br vswitch-app

A vSwitch cannot be deleted from UIS Manager after it is deleted from the CVK host.

Adding a port for a vSwitch

root@UIS-CVK01:~# ovs-vsctl add-port vswitch-app eth2

Deleting a port from a vSwitch

root@UIS-CVK01:~# ovs-vsctl del-port vswitch-app eth2

The port on a vSwitch cannot be deleted from UIS Manager after it is deleted from the CVK host.
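When these changes are scripted, the --may-exist option of add-br and add-port and the --if-exists option of del-port and del-br let the commands run repeatedly without failing on objects that already exist or are already gone. The sketch below wraps them in two hypothetical helpers; the OVS_VSCTL variable is an assumption added only so that the commands can be previewed with OVS_VSCTL=echo before they are run.

```shell
# Hypothetical wrappers around ovs-vsctl. Set OVS_VSCTL=echo to print the
# commands instead of running them.
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}

# Create the vSwitch if it does not exist, then attach the NIC if it is not
# already attached. Usage: add_vswitch_port BRIDGE PORT
add_vswitch_port() {
  $OVS_VSCTL --may-exist add-br "$1" &&
    $OVS_VSCTL --may-exist add-port "$1" "$2"
}

# Detach the port if present and, with --delete-bridge, also delete the
# vSwitch if present. Usage: del_vswitch_port BRIDGE PORT [--delete-bridge]
del_vswitch_port() {
  $OVS_VSCTL --if-exists del-port "$1" "$2" || return 1
  if [ "${3:-}" = "--delete-bridge" ]; then
    $OVS_VSCTL --if-exists del-br "$1"
  fi
}
```

For example, add_vswitch_port vswitch-app eth2 creates vswitch-app if needed and attaches eth2, and del_vswitch_port vswitch-app eth2 --delete-bridge removes both. The UIS Manager behavior described above still applies to changes made this way.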

Displaying vSwitch and port information

In the following output, vswitch0 is the internal (local) port of the vSwitch, eth0 is a physical NIC port, and vnet0 is a virtual port that connects a VM NIC to the vSwitch.

root@UIS-CVK01:~# ovs-vsctl show

ba390c40-8826-4a7a-8e17-f8834dab6eb3

Bridge "vswitch0"

Port "eth0"

Interface "eth0"

Port "vswitch0"

Interface "vswitch0"

type: internal

Port "vnet0"

Interface "vnet0"

root@UIS-CVK01:~#

Displaying the configuration on a vSwitch

root@UIS-CVK01:~# ovs-vsctl list br vswitch0

_uuid : 3500114d-5619-460e-ada7-d1b97f63c93c

br_mode : [0]

controller : []

datapath_id : "0000ac162d88c35c"

datapath_type : ""

drop_unknown_unicast : []

external_ids : {}

fail_mode : []

firewall_port : []

flood_vlans : []

flow_tables : {}

ipfix : []

mirrors : []

name : "vswitch0"

netflow : []

other_config : {}

ports : [16a48463-f90b-42fe-9a12-ceacfd256235, 5495812e-29e0-4364-a89f-b54ea52dd344, dec98186-2c83-447d-9215-28f99750a410]

protocols : []

sflow : []

status : {}

stp_enable : false

Displaying port configuration

root@UIS-CVK01:~# ovs-vsctl list port vnet0

_uuid : bc0b1e57-2d72-4fae-97b4-0bbca5d17ba1

TOS : routine

bond_downdelay : 0

bond_fake_iface : false

bond_mode : []

bond_updelay : 0

dynamic_acl_enable : false

external_ids : {}

fake_bridge : false

interfaces : [5495133f-7e81-4047-a0bd-734fae81f6f3]

lacp : []

lan_acl_list : []

lan_addr : []

mac : []

name : "vnet0"

other_config : {}

qbg_mode : [4]

qos : []

statistics : {}

status : {}

tag : [4]

tcp_syn_forbid : false

trunks : []

vlan_mode : []

vm_ip : []

vm_mac : "0cda411dad80"

wan_acl_list : []

wan_addr : []

Displaying the port number for a port in user mode and kernel mode

root@UIS-CVK01:~# ovs-appctl dpif/show

system@ovs-system: hit:10133796 missed:181938

flows: cur: 11, avg: 12, max: 23, life span: 79639399ms

hourly avg: add rate: 26.506/min, del rate: 26.462/min

daily avg: add rate: 24.205/min, del rate: 24.210/min

overall avg: add rate: 24.356/min, del rate: 24.354/min

vswitch0: hit:6478229 missed:39021

eth0 1/5: (system)

vnet1 2/8: (system)

vswitch0 65534/6: (internal)

For example, the port number of eth0 is 1 in user mode (OpenFlow port number) and 5 in kernel mode (datapath port number).
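Kernel flow entries (ovs-appctl dpif/dump-flows and ovs-dpctl dump-flows) identify ports by kernel-mode number, for example in_port(5) and actions:6. To map these numbers back to port names, you can parse the dpif/show output. The following is a minimal sketch that assumes the "name ofport/dpport: (type)" layout shown above; dpif_ports is a hypothetical helper name.

```shell
# dpif_ports (hypothetical helper): reads "ovs-appctl dpif/show" output on
# stdin and prints "NAME OPENFLOW_PORT KERNEL_PORT" for each port line,
# for example "eth0 1/5: (system)" becomes "eth0 1 5".
dpif_ports() {
  awk '$2 ~ /^[0-9]+\/[0-9]+:$/ {
    split($2, num, "[/:]")
    print $1, num[1], num[2]
  }'
}
```

Usage: ovs-appctl dpif/show | dpif_ports. With the output above, it would print eth0 1 5, vnet1 2 8, and vswitch0 65534 6, which shows that in_port(5) is eth0 and actions:6 is the internal port vswitch0.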

Displaying the MAC addresses on a vSwitch

root@UIS-CVK01:~# ovs-appctl fdb/show vswitch0

port VLAN MAC Age

1 0 00:0f:e2:5a:6a:20 134

2 0 0c:da:41:1d:3d:18 95

1 0 ac:16:2d:6f:3f:4a 6

1 0 a0:d3:c1:f0:a6:ca 6

1 0 c4:ca:d9:d4:c2:ff 2

4 0 0c:da:41:1d:6d:94 2

LOCAL 0 2c:76:8a:5d:df:a2 2

3 0 0c:da:41:1d:80:03 0
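In the fdb/show output, the port column is the OpenFlow port number (LOCAL is the internal port of the vSwitch), and Age is the number of seconds since the MAC address was last seen. To find where a VM's MAC address was learned, a minimal sketch such as the following can filter the table; fdb_lookup is a hypothetical helper name, and MAC addresses are compared in lowercase as printed by OVS.

```shell
# fdb_lookup (hypothetical helper): reads "ovs-appctl fdb/show <bridge>" output
# on stdin and prints the port, VLAN, and age of the given MAC address.
# Returns a non-zero status if the MAC address has not been learned.
fdb_lookup() {
  fdb_mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
  awk -v mac="$fdb_mac" '
    $3 == mac { print "port=" $1, "vlan=" $2, "age=" $4; found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

Usage: ovs-appctl fdb/show vswitch0 | fdb_lookup 0c:da:41:1d:80:03. With the output above, this prints port=3 vlan=0 age=0. Combine the port number with the ovs-appctl dpif/show output to identify the port name.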

Displaying link aggregation (bond) information on a vSwitch

root@UIS-CVK02:~# ovs-appctl bond/show

---- vswitch-bond_bond ----

bond_mode: active-backup

bond-hash-basis: 0

updelay: 0 ms

downdelay: 0 ms

lacp_status: off

slave eth2: enabled

active slave

may_enable: true

slave eth3: disabled

may_enable: false
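In active-backup mode, only one member NIC forwards traffic, and bond/show marks it with active slave. A minimal sketch for extracting it in a script follows; bond_active_slave is a hypothetical helper name, and it assumes the output layout shown above. Some newer OVS releases print member instead of slave, so both words are accepted.

```shell
# bond_active_slave (hypothetical helper): reads "ovs-appctl bond/show" output
# on stdin and prints the member NIC that carries the "active slave" marker.
# The two-word check skips summary lines such as "active slave mac: ...".
bond_active_slave() {
  awk '
    ($1 == "slave" || $1 == "member") && NF >= 2 { name = $2; sub(/:$/, "", name) }
    $1 == "active" && NF == 2 && ($2 == "slave" || $2 == "member") { print name; found = 1 }
    END { exit (found ? 0 : 1) }'
}
```

Usage: ovs-appctl bond/show | bond_active_slave. With the output above, it prints eth2. If no member is active, the function returns a non-zero status.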

Displaying flow entry information

root@UIS-CVK01:~# ovs-ofctl dump-flows vswitch0

NXST_FLOW reply (xid=0x4):

cookie=0x0, duration=752218.541s, table=0, n_packets=15106363, n_bytes=3556156038, idle_age=0, hard_age=65534, priority=0 actions=NORMAL

Displaying kernel flow entry information on a vSwitch

root@UIS-CVK01:~# ovs-appctl dpif/dump-flows vswitch0

skb_priority(0),in_port(5),eth(src=74:25:8a:36:d8:9b,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.88.8.1/255.255.255.255,tip=10.88.8.206/255.255.255.255,op=1/0xff,sha=74:25:8a:36:d8:9b/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00), packets:2, bytes:120, used:3.018s, actions:6

skb_priority(0),in_port(5),eth(src=38:63:bb:b7:ed:6c,dst=01:00:5e:00:00:fc),eth_type(0x0800),ipv4(src=10.88.8.140/0.0.0.0,dst=224.0.0.252/0.0.0.0,proto=17/0,tos=0/0,ttl=1/0,frag=no/0xff), packets:1, bytes:66, used:1.139s, actions:6

skb_priority(0),in_port(5),eth(src=c4:34:6b:6c:ef:a8,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=10.88.8.200/0.0.0.0,dst=10.88.9.255/0.0.0.0,proto=17/0,tos=0/0,ttl=128/0,frag=no/0xff), packets:17, bytes:1564, used:3.370s, actions:6

skb_priority(0),in_port(5),eth(src=14:58:d0:b7:24:07,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=10.88.8.229/0.0.0.0,dst=10.88.9.255/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no/0xff), packets:6, bytes:692, used:0.771s, actions:6

skb_priority(0),in_port(5),eth(src=14:58:d0:b7:53:f6,dst=01:00:5e:7f:ff:fa),eth_type(0x0800),ipv4(src=10.88.8.146/0.0.0.0,dst=239.255.255.250/0.0.0.0,proto=17/0,tos=0/0,ttl=1/0,frag=no/0xff), packets:1, bytes:175, used:0.739s, actions:6

Displaying all kernel flow entries

root@UIS-CVK01:~# ovs-dpctl dump-flows

skb_priority(0),in_port(4),eth(src=c4:34:6b:6c:f5:ab,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=10.88.8.159/0.0.0.0,dst=10.88.9.255/0.0.0.0,proto=17/0,tos=0/0,ttl=128/0,frag=no/0xff), packets:25, bytes:2300, used:0.080s, actions:3

skb_priority(0),in_port(5),eth(src=14:58:d0:b7:53:f6,dst=33:33:00:01:00:02),eth_type(0x86dd),ipv6(src=fe80::288d:70d6:36ce:60f3/::,dst=ff02::1:2/::,label=0/0,proto=17/0,tclass=0/0,hlimit=1/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:6

skb_priority(0),in_port(13),eth(src=0c:da:41:1d:80:03,dst=c4:ca:d9:d4:c2:ff),eth_type(0x0800),ipv4(src=192.168.2.15/255.255.255.255,dst=192.168.2.121/0.0.0.0,proto=6/0,tos=0/0,ttl=128/0,frag=no/0xff), packets:1, bytes:54, used:2.924s, actions:2

skb_priority(0),in_port(4),eth(src=c4:34:6b:68:9b:78,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(src=fe80::85b7:25a0:d116:907a/::,dst=ff08::2/::,label=0/0,proto=17/0,tclass=0/0,hlimit=128/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:3

skb_priority(0),in_port(4),eth(src=5c:dd:70:b0:39:3d,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.11.149/255.255.255.255,tip=192.168.11.150/255.255.255.255,op=1/0xff,sha=5c:dd:70:b0:39:3d/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00), packets:1, bytes:60, used:0.264s, actions:3
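To see which ports send the most traffic through the kernel datapath, you can total the packets counter of the kernel flow entries per in_port(). This is a minimal sketch that assumes the flow format shown above; kflow_packets_by_port is a hypothetical helper name. The counters only cover flows currently cached in the kernel, so treat the result as a rough indication rather than interface statistics.

```shell
# kflow_packets_by_port (hypothetical helper): reads kernel flow dumps
# (ovs-dpctl dump-flows or ovs-appctl dpif/dump-flows) on stdin and prints
# "KERNEL_PORT TOTAL_PACKETS" for each in_port(), sorted by port number.
kflow_packets_by_port() {
  awk '
    match($0, /in_port\([0-9]+\)/) {
      port = substr($0, RSTART + 8, RLENGTH - 9)   # digits inside in_port(...)
      pk = 0
      if (match($0, /packets:[0-9]+/)) pk = substr($0, RSTART + 8, RLENGTH - 8)
      sum[port] += pk
    }
    END { for (p in sum) print p, sum[p] }' | sort -n
}
```

Usage: ovs-dpctl dump-flows | kflow_packets_by_port. Each output line is a kernel port number followed by the packet total; use the ovs-appctl dpif/show output to translate the port numbers into port names.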

Capturing packets on a port

Use tcpdump to capture packets on a vSwitch port, for example, the vnet port of a VM. For more information about the tcpdump command, see "Networking."

tcpdump -i vnet1 -s 0 -w /tmp/test.pcap host 200.1.1.1 &
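The example above runs tcpdump in the background with &, so the capture continues until the process is stopped. For a capture that stops on its own, one option is to run tcpdump under the timeout command from GNU coreutils. In the following sketch, capture_for is a hypothetical helper, and the TCPDUMP variable is an assumption added only so that the command can be previewed.

```shell
# capture_for (hypothetical helper): captures packets to and from one host on
# one interface for a fixed number of seconds, then stops tcpdump via timeout.
# Usage: capture_for SECONDS INTERFACE HOST_IP OUTPUT_FILE
TCPDUMP=${TCPDUMP:-tcpdump}

capture_for() {
  secs=$1 iface=$2 peer=$3 file=$4
  rc=0
  timeout "$secs" $TCPDUMP -i "$iface" -s 0 -w "$file" host "$peer" || rc=$?
  # timeout exits with 124 when the time limit ends the capture; that is the
  # expected way for this helper to finish, so report it as success.
  [ "$rc" -eq 124 ] && rc=0
  return "$rc"
}
```

For example, capture_for 60 vnet1 200.1.1.1 /tmp/test.pcap captures traffic to and from 200.1.1.1 on vnet1 for 60 seconds.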

SDN commands

The CVM in an H3C UIS cluster contains the OVN module. The following are commonly used OVN commands.

Obtaining the ovn version

[root@cvknode1 ~]# ovn-nbctl -V

ovn-nbctl 22.03.1

Open vSwitch Library 2.17.90

DB Schema 6.1.0

Checking the process status

· Check the status of the ovn-northd process

[root@cvknode1 ~]# systemctl status ovn-northd

● ovn-northd.service - OVN northd management daemon

Loaded: loaded (/usr/lib/systemd/system/ovn-northd.service; enabled; vendor preset: disabled)

Active: active (exited) since Wed 2023-11-22 11:40:44 CST; 2 days ago

Main PID: 577576 (code=exited, status=0/SUCCESS)

Tasks: 8 (limit: 306436)

Memory: 9.5M

CGroup: /system.slice/ovn-northd.service

├─ 577605 "ovsdb-server: monitoring pid 577606 (healthy)" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ">

├─ 577606 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/run/ovn/ovnnb_db.sock --pidfile=/run/ovn/ovnnb_db.pid --un>

├─ 577622 "ovsdb-server: monitoring pid 577623 (healthy)" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ">

├─ 577623 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-sb.log --remote=punix:/run/ovn/ovnsb_db.sock --pidfile=/run/ovn/ovnsb_db.pid --un>

├─ 577633 "ovn-northd: monitoring pid 577634 (healthy)" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" >

└─ 577634 ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db=unix:/run/ovn/ovnnb_db.sock --ovnsb-db=unix:/run/ovn/ovnsb_db.sock --no-chdir --log-file=/var/l>

Nov 24 15:47:08 cvknode1 ovsdb-server[577623]: ovs|00009|jsonrpc|WARN|unix#5: receive error: Connection reset by peer

Nov 24 15:47:08 cvknode1 ovsdb-server[577623]: ovs|00010|reconnect|WARN|unix#5: connection dropped (Connection reset by peer)

Nov 24 15:48:58 cvknode1 ovsdb-server[577606]: ovs|00049|jsonrpc|WARN|unix#36: receive error: Connection reset by peer

Nov 24 15:48:58 cvknode1 ovsdb-server[577606]: ovs|00050|reconnect|WARN|unix#36: connection dropped (Connection reset by peer)

Nov 24 15:51:22 cvknode1 ovsdb-server[577606]: ovs|00051|jsonrpc|WARN|unix#38: receive error: Connection reset by peer

Nov 24 15:51:22 cvknode1 ovsdb-server[577606]: ovs|00052|reconnect|WARN|unix#38: connection dropped (Connection reset by peer)

Nov 24 15:52:18 cvknode1 ovsdb-server[577606]: ovs|00053|jsonrpc|WARN|unix#41: receive error: Connection reset by peer

Nov 24 15:52:18 cvknode1 ovsdb-server[577606]: ovs|00054|reconnect|WARN|unix#41: connection dropped (Connection reset by peer)

Nov 24 15:56:27 cvknode1 ovsdb-server[577623]: ovs|00011|jsonrpc|WARN|unix#6: receive error: Connection reset by peer

Nov 24 15:56:27 cvknode1 ovsdb-server[577623]: ovs|00012|reconnect|WARN|unix#6: connection dropped (Connection reset by peer)

· Check the status of the ovn-controller process

[root@cvknode1 ~]# systemctl status ovn-controller

● ovn-controller.service - OVN controller daemon

Loaded: loaded (/usr/lib/systemd/system/ovn-controller.service; enabled; vendor preset: disabled)

Active: active (running) since Wed 2023-11-22 11:40:45 CST; 2 days ago

Main PID: 578155 (ovn-controller)

Tasks: 4 (limit: 306436)

Memory: 4.1M

CGroup: /system.slice/ovn-controller.service

└─ 578155 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn>

Notice: journal has been rotated since unit was started, output may be incomplete.
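When checking several hosts, systemctl is-active gives a one-word state that is easier to test than the full status output. A unit shown as active (exited), as ovn-northd is above, is reported as active. The following is a minimal sketch; check_ovn_services is a hypothetical helper, and SYSTEMCTL is an assumed override variable used only for testing.

```shell
# check_ovn_services (hypothetical helper): prints the systemd state of each
# unit given as an argument and returns a non-zero status if any unit is not
# active. Usage: check_ovn_services ovn-northd ovn-controller
SYSTEMCTL=${SYSTEMCTL:-systemctl}

check_ovn_services() {
  rc=0
  for unit in "$@"; do
    state=$($SYSTEMCTL is-active "$unit" 2>/dev/null) || :
    echo "$unit: ${state:-unknown}"
    [ "$state" = "active" ] || rc=1
  done
  return "$rc"
}
```

Usage: check_ovn_services ovn-northd ovn-controller. The function prints one line per unit.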

Viewing the north-bound database

[root@cvknode1 ~]# ovn-nbctl show

switch bfe0ebf6-c116-4838-a5bc-f8f70dd0fdcb (std-15c16e9c-1286-420d-aec4-6a32ad11553d) (aka pubnet1)

port std-15c16e9c-1286-420d-aec4-6a32ad11553d_lnet

type: localnet

addresses: ["unknown"]

port lsp-pubnet1-r1

type: router

router-port: lrp-r1-pubnet1

switch 8531bfe6-6cbe-407e-98d7-d28754a07608 (std-f0455795-d214-4b36-a62b-16f71d2ebf04) (aka net1)

port 3cebf254-8bc7-462f-9bef-7c9dd330b124 (aka xjxnj-1_0c:da:41:1d:52:99)

addresses: ["0c:da:41:1d:52:99 10.10.10.2"]

port 0906ddcb-ea28-4a60-91d2-301ebbf2a8d6 (aka xjxnj-3_0c:da:41:1d:2a:b0)

addresses: ["0c:da:41:1d:2a:b0"]

port 8b532d0e-5faa-4d71-8ae4-7f4dabca8e3e (aka xjxnj-2_0c:da:41:1d:3b:53)

addresses: ["0c:da:41:1d:3b:53 10.10.10.3"]

port lsp-net1-sub1-r1

type: router

router-port: lrp-r1-net1-sub1

port c768b637-44fc

addresses: ["66:01:00:00:00:03 10.10.10.8"]

router ec0d0744-3678-443b-9811-58542296818c (std-333e3b30-1f30-498d-bd1d-63758c716246) (aka r1)

port lrp-r1-net1-sub1

mac: "66:01:00:00:00:01"

networks: ["10.10.10.1/24"]

port lrp-r1-pubnet1

mac: "66:01:00:00:00:02"

networks: ["10.99.221.4/24"]

gateway chassis: [fdfad6dc-f57f-4eb2-8564-848909099a31]

nat 21cc0da1-32a1-470d-8f61-07b8dd67e8fc

external ip: "10.99.221.3"

logical ip: "10.10.10.3"

type: "dnat_and_snat"

nat 4c255190-3972-4d67-a6bc-f414177f1fb5

external ip: "10.99.221.2"

logical ip: "10.10.10.2"

type: "dnat_and_snat"

Viewing the south-bound database

[root@cvknode1 ~]# ovn-sbctl show

Chassis "fdfad6dc-f57f-4eb2-8564-848909099a31"

hostname: cvknode1

Encap vxlan

ip: "10.10.2.1"

options: {csum="true"}

Port_Binding cr-lrp-r1-pubnet1

Port_Binding c768b637-44fc

Viewing network egress configuration

[root@cvknode1 ~]# ovs-vsctl list open_vswitch

_uuid : a9e2a39e-5aa6-4679-96e5-c9a7b89026a9

acls : []

bridges : [077e541b-92ca-48d2-bb10-c0cec48eec58, 40b9f8b2-6714-439a-bbc4-04c08805ba82, 7f07633f-4ece-4783-96d7-be22d466294b]

cur_cfg : 22

datapath_types : [netdev, system]

datapaths : {system=313cbe36-963d-4193-b4e1-503eabdee554}

db_version : "8.3.0"

dpdk_initialized : false

dpdk_version : none

external_ids : {hostname=cvknode1, ovn-bridge-mappings="uis:vs_business", ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="10.10.2.1", ovn-encap-type=vxlan, ovn-remote="tcp:10.99.221.86:6642", rundir="/var/run/openvswitch", system-id="fdfad6dc-f57f-4eb2-8564-848909099a31"}

iface_types : [bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]

manager_options : []

next_cfg : 22

other_config : {drain_bypass=True, hw-offload="false", offload-ct="true", vlan-limit="0"}

ovs_version : "2.16.4"

ssl : []

statistics : {}

system_type : H3Linux

system_version : "2.0.2"
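The SDN egress settings are stored as keys in the external_ids column shown above: ovn-bridge-mappings maps the physical network name to a vSwitch, ovn-encap-type and ovn-encap-ip set the tunnel type and local tunnel endpoint, and ovn-remote is the address of the south-bound database. Instead of reading the whole record, you can query individual keys with ovs-vsctl get. The following is a minimal sketch; show_ovn_egress is a hypothetical helper, and OVS_VSCTL is an assumed override variable.

```shell
# show_ovn_egress (hypothetical helper): prints the OVN egress and tunnel keys
# stored in external_ids of the Open_vSwitch table on this host.
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}

show_ovn_egress() {
  for key in ovn-bridge-mappings ovn-encap-type ovn-encap-ip ovn-remote; do
    printf '%s = %s\n' "$key" \
      "$($OVS_VSCTL --if-exists get Open_vSwitch . "external_ids:$key")"
  done
}
```

With the configuration above, ovn-encap-ip would be reported as "10.10.2.1". The --if-exists option makes a missing key print an empty value instead of an error.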

Viewing the switch list

[root@cvknode1 ~]# ovn-nbctl list logical_switch

_uuid : bfe0ebf6-c116-4838-a5bc-f8f70dd0fdcb

acls : []

copp : []

dns_records : []

external_ids : {description="", external="true", from=std, managed=ovnagent, mtu="1500", "neutron:network_name"=pubnet1}

forwarding_groups : []

load_balancer : []

load_balancer_group : []

name : std-15c16e9c-1286-420d-aec4-6a32ad11553d

other_config : {vlan-passthru="false"}

ports : [4325e08f-eb21-451c-82ed-1840b193ccc7, 6cac3d4c-1023-4213-bd4a-bfbd79763e3e]

qos_rules : [65c13928-3617-42b4-ba7f-d784f086366e]

_uuid : 8531bfe6-6cbe-407e-98d7-d28754a07608

acls : []

copp : []

dns_records : []

external_ids : {description="", external="false", from=std, managed=ovnagent, mtu="1500", "neutron:network_name"=net1}

forwarding_groups : []

load_balancer : []

load_balancer_group : []

name : std-f0455795-d214-4b36-a62b-16f71d2ebf04

other_config : {vlan-passthru="false"}

ports : [1872aa4d-85c1-4a28-af9f-6ad5c7fc581b, 301f2520-f518-4351-a502-3d3672cd087b, 4b8688ab-ef09-473e-9017-2dd195324005, 8a2ed28e-c41f-45ea-917a-160ef0f0a041, 904f0a26-652c-4cb2-950b-8f7733554062]

qos_rules : []

Viewing the DHCP option list

[root@cvknode1 ~]# ovn-nbctl list dhcp_options

_uuid : 02d1d6ed-6d76-4ec4-a055-1e1d658dc04c

cidr : "10.10.10.0/24"

external_ids : {description="", from=std, internal_name=std-96c8c7fd-b389-4456-98fd-3d40f127e521, ip_version="4", linked=std-333e3b30-1f30-498d-bd1d-63758c716246, managed=ovnagent, network_id=std-f0455795-d214-4b36-a62b-16f71d2ebf04, subnet_name=sub1}

options : {classless_static_route="{0.0.0.0/0,10.10.10.1}", lease_time="3600", mtu="1500", router="10.10.10.1", server_id="10.10.10.1", server_mac="66:01:00:00:00:01"}

_uuid : 5d7693ba-2403-4b6e-bd53-ec472b94759b

cidr : "10.99.221.0/24"

external_ids : {description="", externalGatewayIp="10.99.221.4", from=std, internal_name=std-fec88d08-0e37-4e07-96c5-f3ab3c752408, ip_version="4", linked="ec0d0744-3678-443b-9811-58542296818c", managed=ovnagent, network_id=std-15c16e9c-1286-420d-aec4-6a32ad11553d, subnet_name=pubsub1}

options : {classless_static_route="{0.0.0.0/0,10.99.221.1}", lease_time="3600", mtu="1500", router="10.99.221.1", server_id="10.99.221.1", server_mac="66:01:00:00:00:00"}

Viewing the logical router list

[root@cvknode1 ~]# ovn-nbctl list logical_router

_uuid : ec0d0744-3678-443b-9811-58542296818c

copp : []

enabled : true

external_ids : {description="", from=std, managed=ovnagent, "neutron:router_name"=r1}

load_balancer : []

load_balancer_group : []

name : std-333e3b30-1f30-498d-bd1d-63758c716246

nat : [21cc0da1-32a1-470d-8f61-07b8dd67e8fc, 4c255190-3972-4d67-a6bc-f414177f1fb5]

options : {}

policies : []

ports : [08cbef44-9379-4ecf-b4a7-4c8e8c0d3105, a3a82614-fc8c-41b1-aa6a-0125a3e247a3]

static_routes : [961103a0-bfb9-4557-9b0a-50d138439f82]

Viewing the port list

[root@cvknode1 ~]# ovn-nbctl list logical_switch_Port

_uuid : 301f2520-f518-4351-a502-3d3672cd087b

addresses : ["0c:da:41:1d:2a:b0"]

dhcpv4_options : []

dhcpv6_options : []

dynamic_addresses : []

enabled : true

external_ids : {from=std, ifaceid_as_name="1", managed=ovnagent, "neutron:port_name"="xjxnj-3_0c:da:41:1d:2a:b0", qos_policy_id="", security_groups=""}

ha_chassis_group : []

name : "0906ddcb-ea28-4a60-91d2-301ebbf2a8d6"

options : {}

parent_name : []

port_security : []

tag : []

tag_request : []

type : ""

up : false
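In the Logical_Switch_Port table, up is false when the port is not up, for example because the VM that owns it is not running, as in the record above. To list only such ports, you can use the generic find command of ovn-nbctl. The following is a minimal sketch; list_down_lsps is a hypothetical helper, OVN_NBCTL is an assumed override variable, and ports whose up column is empty ([]) are not matched by up=false.

```shell
# list_down_lsps (hypothetical helper): prints the name and external_ids of
# every logical switch port whose "up" column is false.
OVN_NBCTL=${OVN_NBCTL:-ovn-nbctl}

list_down_lsps() {
  $OVN_NBCTL --bare --columns=name,external_ids find Logical_Switch_Port up=false
}
```

With --bare, each column value is printed on its own line and records are separated by blank lines.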

Viewing the QoS list

[root@cvknode1 ~]# ovn-nbctl list qos

_uuid : 65c13928-3617-42b4-ba7f-d784f086366e

action : {}

bandwidth : {burst=65536000, rate=100000000}

direction : from-lport

external_ids : {managed=ovnagent, qos_policy_id="e2cdf741-7c9c-42f3-81af-a830d06e3ad1"}

match : "ip4.src == 10.99.221.3 || ip4.src == 10.99.221.2"

priority : 1003

Viewing the ACL list

[root@cvknode1 ~]# ovn-nbctl list acl

_uuid : c97a2140-5d79-4d20-af68-2c22046f7b8a

action : drop

direction : from-lport

external_ids : {description="security group base rule -- ipv6,egress", ethertype=ip6, from=std, managed=ovnagent, port_range_max="", port_range_min="", protocol=any, remote_ip_prefix="::/0", security_group_id="732e8385-fe95-475e-8cb9-a93299c59f6d", tcp_flags=""}

label : 0

log : false

match : "inport == @std_3cfd20f7_e3a1_41f7_b6fc_df85bb8506ec && ip6 && ip6.dst == ::/0"

meter : []

name : sg_ipv6_egress_base_white

options : {}

priority : 1001

severity : []

Viewing all NAT rules

[root@cvknode1 ~]# ovn-nbctl list nat

_uuid : 4c255190-3972-4d67-a6bc-f414177f1fb5

allowed_ext_ips : []

exempted_ext_ips : []

external_ids : {_name=fip1, description="", fixed_ip_address="10.10.10.2", floating_network_id=std-15c16e9c-1286-420d-aec4-6a32ad11553d, from=std, internal_name="532812ee-ac24-4ffe-bef1-b5e211812c25", logical_port="3cebf254-8bc7-462f-9bef-7c9dd330b124", managed=ovnagent, qos_policy_id="e2cdf741-7c9c-42f3-81af-a830d06e3ad1", subnet_id="5d7693ba-2403-4b6e-bd53-ec472b94759b"}

external_ip : "10.99.221.2"

external_mac : "0c:da:41:1d:52:99"

external_port_range : ""

logical_ip : "10.10.10.2"

logical_port : "3cebf254-8bc7-462f-9bef-7c9dd330b124"

options : {}

type : dnat_and_snat

Viewing the security group list

[root@cvknode1 ~]# ovn-nbctl list port_group

_uuid : 732e8385-fe95-475e-8cb9-a93299c59f6d

acls : [4665023d-30b4-4384-8c1d-e02583d54f2a, 5d904d87-384a-4daa-ac50-ff8540152fd8, 863ce243-5c83-47c0-bece-886df88e0c1d, 9c6a450f-5516-48f6-a16d-aabb8edfccab, c97a2140-5d79-4d20-af68-2c22046f7b8a, da01fe2b-3f7a-4e48-b669-f2ba78751adf]

external_ids : {description="", from=std, managed=ovnagent, name=acl1, priority="1002", type=white}

name : std_3cfd20f7_e3a1_41f7_b6fc_df85bb8506ec

ports : []

Viewing the status of a load balancer

· View the status of the haproxy process.

[root@cvknode1 ~]# ps -ef | grep haproxy

root 862127 860223 0 16:46 pts/1 00:00:00 grep --color=auto haproxy

nobody 1329507 1 0 Nov23 ? 00:00:00 /usr/share/ovn-agent/usr/sbin/haproxy -f /var/lib/ovn-agent/lbaas/125eb2e6-e104-416e-bf11-b395dfeb14c7/haproxy.conf -p /var/lib/ovn-agent/lbaas/125eb2e6-e104-416e-bf11-b395dfeb14c7/haproxy.pid

· View the namespace list.

[root@cvknode1 ~]# ip netns ls | grep lbaas-

lbaas-125eb2e6-e104-416e-bf11-b395dfeb14c7 (id: 0)
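In the sample output above, the suffix of the lbaas- namespace matches the directory that holds the haproxy configuration and PID file. Assuming that this naming holds on your system, the following sketch checks, for each load-balancer namespace, whether the haproxy process recorded in its haproxy.pid file is still running. lbaas_haproxy_check is a hypothetical helper, and the default directory is taken from the sample output.

```shell
# lbaas_haproxy_check (hypothetical helper): reads "ip netns ls" output on
# stdin and, for each lbaas-<id> namespace, checks whether the PID stored in
# <dir>/<id>/haproxy.pid is alive. ASSUMPTION: the namespace suffix matches
# the lbaas directory name, as in the sample output.
# Usage (as root): ip netns ls | lbaas_haproxy_check [DIR]
lbaas_haproxy_check() {
  lb_dir=${1:-/var/lib/ovn-agent/lbaas}
  rc=0
  while read -r ns _; do
    case $ns in lbaas-*) ;; *) continue ;; esac
    id=${ns#lbaas-}
    pid=$(cat "$lb_dir/$id/haproxy.pid" 2>/dev/null) || pid=
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
      echo "$id haproxy running (pid $pid)"
    else
      echo "$id haproxy NOT running"
      rc=1
    fi
  done
  return "$rc"
}
```

A non-zero exit status means that at least one load balancer has no running haproxy process.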

iSCSI commands

H3C UIS uses iSCSI to mount IP SAN storage devices. When an iSCSI shared file system has exceptions, you can use iSCSI commands for troubleshooting. To use iSER mode, add the -I iser option to the iscsiadm command.

Discovering iSCSI storage

iscsiadm -m discovery -t st -p ISCSI_IP or

iscsiadm -m discovery -t st -p ISCSI_IP -I iser (iSER mode)

# Discover iSCSI storage.

root@HZ-UIS01-CVK01:~# iscsiadm -m discovery -t st -p 192.168.1.248:3260

192.168.1.248:3260,1 iqn.1991-05.com.microsoft:c09599-cmh-target

root@HZ-UIS01-CVK01:~#

Displaying iSCSI storage discovery records

iscsiadm -m node

# Display iSCSI storage discovery records.

root@HZ-UIS01-CVK01:~# iscsiadm -m node

192.168.1.248:3260,1 iqn.1991-05.com.microsoft:c09599-cmh-target

Deleting the iSCSI storage discovery records

iscsiadm -m node -o delete -T LUN_NAME -p ISCSI_IP

iscsiadm -m node -o delete -T LUN_NAME -p ISCSI_IP -I iser (iSER mode)

# Delete the iSCSI storage discovery records.

root@HZ-UIS01-CVK01:~# iscsiadm -m node -o delete -T iqn.1991-05.com.microsoft:c09599-cmh-target -p 192.168.1.248:3260

Logging in to an iSCSI storage device

iscsiadm -m node -T LUN_NAME -p ISCSI_IP -l or

iscsiadm -m node -T LUN_NAME -p ISCSI_IP -l -I iser (iSER mode)

# Log in to an iSCSI storage device.

root@HZ-UIS01-CVK01:~# iscsiadm -m node -T iqn.1991-05.com.microsoft:c09599-cmh-target -p 192.168.1.248:3260 -l

Logging in to [iface: default, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal: 192.168.1.248,3260]

Login to [iface: default, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal: 192.168.1.248,3260]: successful

Logging out of an iSCSI storage device

iscsiadm -m node -T LUN_NAME -p ISCSI_IP -u or

iscsiadm -m node -T LUN_NAME -p ISCSI_IP -u -I iser (iSER mode)

# Log out of an iSCSI storage device.

root@HZ-UIS01-CVK01:~# iscsiadm -m node -T iqn.1991-05.com.microsoft:c09599-cmh-target -p 192.168.1.248:3260 -u

Logging out of session [sid: 4, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal: 192.168.1.248,3260]

Logout of [sid: 4, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal: 192.168.1.248,3260]: successful
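When a portal exposes several targets, the login commands can be generated from the discovery records instead of typed one by one. The following sketch reads iscsiadm discovery or node output in the IP:PORT,TPGT IQN format shown above and prints, but does not run, the matching login commands. iscsi_login_cmds is a hypothetical helper; pass iser to add the -I iser option.

```shell
# iscsi_login_cmds (hypothetical helper): reads "iscsiadm -m discovery" or
# "iscsiadm -m node" output on stdin ("IP:PORT,TPGT IQN") and prints one
# login command per record. Usage: iscsiadm -m node | iscsi_login_cmds [iser]
iscsi_login_cmds() {
  iface_opt=""
  [ "${1:-}" = "iser" ] && iface_opt=" -I iser"
  awk -v opt="$iface_opt" 'NF >= 2 {
    portal = $1
    sub(/,[0-9]+$/, "", portal)        # drop the ",TPGT" suffix
    print "iscsiadm -m node -T " $2 " -p " portal " -l" opt
  }'
}
```

Review the printed commands before running them, for example by piping them to sh.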

Mounting FC storage

Obtaining the HBA card information

Method 1: Log in to the CVM system, access the storage management page, and then click a storage adapter to view HBA card information. If the card is in active state, storage access is available.

Method 2: Display driver information. If the driver is loaded correctly for the HBA card, HBA information will be displayed in the /sys/class/fc_host/host* directory.

[root@cvknode2-158 /]# ls /sys/class/fc_host/

host0 host2 host3 host4

[root@cvknode2-158 /]# ls /sys/class/fc_host/host0

device issue_lip npiv_vports_inuse port_state speed supported_classes system_hostname vport_create

dev_loss_tmo max_npiv_vports port_id port_type statistics supported_speeds tgtid_bind_type vport_delete

fabric_name node_name port_name power subsystem symbolic_name uevent
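The port_name, port_state, and speed files listed above are the most useful when checking an HBA. A minimal sketch that prints them for every FC host follows; fc_host_summary is a hypothetical helper, and the optional argument (default /sys/class/fc_host) exists only so the function can be tried on a copy of the directory. A port_state of Online generally indicates that the port is up.

```shell
# fc_host_summary (hypothetical helper): prints port_name, port_state, and
# speed for every FC host. Usage: fc_host_summary [FC_HOST_DIR]
fc_host_summary() {
  fc_root=${1:-/sys/class/fc_host}
  for h in "$fc_root"/host*; do
    [ -d "$h" ] || continue      # no FC hosts: the glob did not match
    printf '%s port_name=%s port_state=%s speed=%s\n' "${h##*/}" \
      "$(cat "$h/port_name" 2>/dev/null)" \
      "$(cat "$h/port_state" 2>/dev/null)" \
      "$(cat "$h/speed" 2>/dev/null)"
  done
}
```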

Connecting to the FC storage

Execute the following command:

echo hba_channel target_id target_lun > /sys/class/scsi_host/hostN/scan

hba_channel is the HBA channel, target_id is the target ID, target_lun is the LUN, and hostN is the SCSI host of the HBA port. To obtain these values, execute the ls /sys/class/fc_transport/ command. Each entry is named targetH:C:T, where H is the host number, C is the channel, and T is the target ID.

[root@cvknode2-158 /]#ls /sys/class/fc_transport/

target0:0:0

[root@cvknode2-158 /]# echo 0 0 0 > /sys/class/scsi_host/host0/scan
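Because each /sys/class/fc_transport entry is named targetH:C:T, the scan commands can be derived from the directory names. The sketch below prints, without running them, one scan command per FC target. It uses - as the LUN value, which the SCSI scan interface treats as a wildcard, so all LUNs of the target are scanned; fc_scan_cmds is a hypothetical helper, and the optional argument exists only for testing.

```shell
# fc_scan_cmds (hypothetical helper): prints one SCSI scan command per FC
# target found under /sys/class/fc_transport (entries named targetH:C:T).
# The LUN field is "-", a wildcard that scans all LUNs of the target.
# Usage: fc_scan_cmds [FC_TRANSPORT_DIR]
fc_scan_cmds() {
  fc_dir=${1:-/sys/class/fc_transport}
  for t in "$fc_dir"/target*; do
    [ -e "$t" ] || continue      # no targets: the glob did not match
    hct=${t##*/}                 # e.g. target0:0:0
    hct=${hct#target}            # e.g. 0:0:0
    host=${hct%%:*}
    rest=${hct#*:}
    channel=${rest%%:*}
    id=${rest#*:}
    echo "echo $channel $id - > /sys/class/scsi_host/host$host/scan"
  done
}
```

With the target0:0:0 entry above, it prints echo 0 0 - > /sys/class/scsi_host/host0/scan.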

Disconnecting the FC storage

Execute the following command:

echo 1 > /sys/block/sdX/device/delete

sdX is the block device name (for example, sdb) of the FC storage LUN. To identify it, execute the ll command on the /dev/disk/by-path directory. FC LUN entries contain fc in their names.

[root@cvknode2-158 /]# ll /dev/disk/by-path

lrwxrwxrwx 1 root root 9 Oct 12 09:48 pci-0000:05:00.0-fc-0x21020002ac01e2d7-lun-0 -> ../../sdb

[root@cvknode2-158 /]# echo 1 > /sys/block/sdb/device/delete
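Before writing to /sys/block/sdX/device/delete, make sure that the device is not mounted or otherwise in use. To list the FC LUNs and their device names in one step, you can resolve the -fc- links in /dev/disk/by-path, as in the following sketch. fc_block_devices is a hypothetical helper; it skips partition links (-partN), and the optional argument exists only for testing.

```shell
# fc_block_devices (hypothetical helper): prints "DEVICE BY_PATH_NAME" for
# each FC LUN link in /dev/disk/by-path, skipping partition links.
# Usage: fc_block_devices [BY_PATH_DIR]
fc_block_devices() {
  by_path=${1:-/dev/disk/by-path}
  for link in "$by_path"/*-fc-*; do
    [ -L "$link" ] || continue   # no FC links: the glob did not match
    case $link in *-part[0-9]*) continue ;; esac
    target=$(readlink "$link")   # e.g. ../../sdb
    printf '%s %s\n' "${target##*/}" "${link##*/}"
  done
}
```

With the link shown above, it prints sdb pci-0000:05:00.0-fc-0x21020002ac01e2d7-lun-0, so the file to write is /sys/block/sdb/device/delete.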

Tomcat commands

H3C UIS Manager provides the Tomcat service. When an exception occurs, you can restart the Tomcat service.

To view the Tomcat status:

root@HZ-UIS01-CVK01:~# service tomcat8 status

* Tomcat servlet engine is running with pid 3362

To stop the Tomcat service:

root@HZ-UIS01-CVK01:~# service tomcat8 stop

* Stopping Tomcat servlet engine tomcat8

...done.

To start the Tomcat service:

root@HZ-UIS01-CVK01:~# service tomcat8 start

* Starting Tomcat servlet engine tomcat8

...done.

To restart the Tomcat service:

root@HZ-UIS01-CVK01:~# service tomcat8 restart

* Stopping Tomcat servlet engine tomcat8

...done.

* Starting Tomcat servlet engine tomcat8

...done.

root@HZ-UIS01-CVK01:~#

Database commands

H3C UIS Manager uses mariadb to provide database service.

To view the mariadb service status:

root@HZ-UIS01-CVK01:~# systemctl status mariadb.service

● mariadb.service - MariaDB database server

Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)

Active: active (running) since Fri 2023-11-17 16:27:07 CST; 6 days ago

Main PID: 2525459 (mysqld_safe)

Tasks: 86 (limit: 819200)

Memory: 945.5M

CGroup: /system.slice/mariadb.service

├─ 2525459 /bin/sh /usr/bin/mysqld_safe --basedir=/usr --skip-name-resolve

└─ 2525826 /usr/libexec/mariadbd --basedir=/usr --datadir=/var/lib/mysql-share --plugin-dir=/usr/lib64/mariadb/plugin --skip-name-resolv>

To stop the mariadb service:

root@HZ-UIS01-CVK01:~# systemctl stop mariadb.service

To start the mariadb service:

root@HZ-UIS01-CVK01:~# systemctl start mariadb.service

virsh commands

Use virsh commands to view the VMs on a CVK host and their states, and to start or shut down VMs.

Displaying the VM status from a CVK host

Execute the virsh list --all command to view the status of all VMs on the host.

root@UIS-CVK01:/vms# virsh list --all

Id Name State

----------------------------------------------------

4 windows2008 running

- Linux-RedHat5.9 shut off

Starting a VM from a CVK host

Execute the virsh start vm_name command, where vm_name is the name of the VM.

root@UIS-CVK01:/vms# virsh start Linux-RedHat5.9

Domain Linux-RedHat5.9 started

root@UIS-CVK01:/vms#

Shutting down a VM from a CVK host

Execute the virsh shutdown vm_name command, where vm_name is the name of the VM.

root@UIS-CVK01:/vms# virsh shutdown Linux-RedHat5.9

Domain Linux-RedHat5.9 is being shutdown
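virsh shutdown only asks the guest operating system to shut down and returns immediately, which is why the output says is being shutdown. A script that must wait for the VM to stop can poll virsh domstate, as in the following minimal sketch. shutdown_and_wait is a hypothetical helper, VIRSH is an assumed override variable used only for testing, and the 5-second polling interval is arbitrary.

```shell
# shutdown_and_wait (hypothetical helper): requests a guest shutdown and polls
# "virsh domstate" every 5 seconds until the VM is shut off or TIMEOUT seconds
# have passed. Usage: shutdown_and_wait VM_NAME [TIMEOUT_SECONDS]
VIRSH=${VIRSH:-virsh}

shutdown_and_wait() {
  vm=$1
  timeout_s=${2:-120}
  $VIRSH shutdown "$vm" >/dev/null || return 1
  waited=0
  while [ "$waited" -lt "$timeout_s" ]; do
    state=$($VIRSH domstate "$vm" 2>/dev/null) || state=
    [ "$state" = "shut off" ] && return 0
    sleep 5
    waited=$((waited + 5))
  done
  echo "VM $vm is still not shut off after ${timeout_s}s" >&2
  return 1
}
```

For example, shutdown_and_wait Linux-RedHat5.9 300 waits up to 300 seconds. If the guest does not respond to the shutdown request, the function times out and returns a non-zero status.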

casserver commands

The casserver service collects statistics such as disk usage and alarm information. When an exception occurs on the casserver service, restart it by executing the service casserver restart command.

qemu commands

Use qemu commands to display image file information and convert disk file formats.

Displaying image file information for a VM

On UIS Manager, you can view the image file path for a VM. The Storage Path field displays the path for the image file for the VM.

To display basic information about an image file, such as the file format, virtual size, and disk usage, execute the qemu-img info command. If the image file has a backing file, the output also shows the backing file. For example, for the top image of a three-level image chain, the output shows the level-2 image file.

root@ZJ-UIS-001:~# qemu-img info /vms/defaultPool_hdd/A-048

image: /vms/defaultPool_hdd/A-048

file format: qcow2

virtual size: 30G (32212254720 bytes)

disk size: 1.3G

cluster_size: 262144

backing file: /vms/defaultPool_hdd/A-048_base_1

backing file format: qcow2

Format specific information:

compat: 1.1

lazy refcounts: false

refcount bits: 16

corrupt: false

For a level-2 image file, the output shows the level-1 (base) image file as its backing file.

root@ZJ-UIS-001:~# qemu-img info /vms/defaultPool_hdd/A-048_base_1

image: /vms/defaultPool_hdd/A-048_base_1

file format: qcow2

virtual size: 30G (32212254720 bytes)

disk size: 1.0M

cluster_size: 262144

backing file: /vms/defaultShareFileSystem0/fio-cent-autorun_UIS-e0602fio-cent-autorun_UIS-e0602

backing file format: qcow2

Format specific information:

compat: 1.1

lazy refcounts: false

refcount bits: 16

corrupt: false

The base image file has no backing file, so its output does not show any other image file.

root@ZJ-UIS-001:~# qemu-img info /vms/defaultShareFileSystem0/fio-cent-autorun_UIS-e0602fio-cent-autorun_UIS-e0602

image: /vms/defaultShareFileSystem0/fio-cent-autorun_UIS-e0602fio-cent-autorun_UIS-e0602

file format: qcow2

virtual size: 30G (32212254720 bytes)

disk size: 5.5G

cluster_size: 262144

Format specific information:

compat: 1.1

lazy refcounts: false

refcount bits: 16

corrupt: false
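qemu-img info --backing-chain displays every image in the chain in one command. If you prefer a plain list of file paths, you can follow the backing file lines yourself, as in the following minimal sketch. backing_chain is a hypothetical helper, QEMU_IMG is an assumed override variable, and the depth limit of 16 is an arbitrary guard against a looping chain.

```shell
# backing_chain (hypothetical helper): prints IMAGE and then each backing file
# below it, one path per line, by following the "backing file:" lines of
# qemu-img info. Usage: backing_chain IMAGE
QEMU_IMG=${QEMU_IMG:-qemu-img}

backing_chain() {
  img=$1
  depth=0
  while [ -n "$img" ] && [ "$depth" -lt 16 ]; do
    echo "$img"
    # For a relative backing file, qemu-img appends " (actual path: ...)";
    # in that case the actual path is used.
    img=$($QEMU_IMG info "$img" |
      sed -n '/^backing file: /{s/^backing file: //;s/^.* (actual path: \(.*\))$/\1/;p;}')
    depth=$((depth + 1))
  done
}
```

With the images above, backing_chain /vms/defaultPool_hdd/A-048 would print A-048, A-048_base_1, and the base image in /vms/defaultShareFileSystem0, one per line.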

Consolidating image files

If a VM uses a multi-level image file, you can use the qemu-img convert command to consolidate the image chain into a single image file.

root@ZJ-UIS-001:/vms/defaultPool_hdd# qemu-img convert -O qcow2 -f qcow2 A-048 A048-test

The consolidated image file has no backing file, so it is no longer a multi-level image file.

root@ZJ-UIS-001:~# qemu-img info /vms/defaultPool_hdd/A048-test

image: /vms/defaultPool_hdd/A048-test

file format: qcow2

virtual size: 30G (32212254720 bytes)

disk size: 1.4G

cluster_size: 262144

Format specific information:

compat: 1.1

lazy refcounts: false

refcount bits: 16

corrupt: false
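Because qemu-img convert reads the whole backing chain, consolidate an image only while the VM that uses it is shut down. After the conversion, qemu-img compare can confirm that the new file has the same guest-visible content as the original chain. The following is a minimal sketch; consolidate_image is a hypothetical helper, and QEMU_IMG is an assumed override variable.

```shell
# consolidate_image (hypothetical helper): merges an image and its backing
# chain into a new standalone qcow2 file, then verifies the content.
# Run it only while the VM is shut down. Usage: consolidate_image SRC DST
QEMU_IMG=${QEMU_IMG:-qemu-img}

consolidate_image() {
  src=$1 dst=$2
  if [ -e "$dst" ]; then
    echo "refusing to overwrite existing file: $dst" >&2
    return 1
  fi
  $QEMU_IMG convert -f qcow2 -O qcow2 "$src" "$dst" || return 1
  # qemu-img compare exits with 0 when both images have identical content.
  $QEMU_IMG compare "$src" "$dst"
}
```

For example, run consolidate_image A-048 A048-test in /vms/defaultPool_hdd. The helper refuses to overwrite an existing destination file.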

ONEStor commands

ONEStor commands are used to obtain the status of the cluster and of its monitor nodes, OSDs, and PGs.

· Mon (Monitor)—Monitor node in the cluster.

· OSD (object storage daemon)—Storage daemon that corresponds to a physical disk on a storage node.

· PG (placement group)—Logical group of objects within a storage pool. Each storage pool contains a number of PGs, so adding a storage pool adds PGs to the cluster.

Obtaining the health status of a cluster

· ceph health detail

This command displays details about PGs in unclean, inconsistent, or degraded state. If the cluster is healthy, the system displays HEALTH_OK, as shown in the following figure.

If HEALTH_WARN is displayed, the cluster is in warning state. The following figure shows that 1024 PGs are degraded and 1024 PGs are unclean. This indicates that 33.333% of the PGs in the cluster are degraded, one third of the OSDs are down, and the PGs on the down OSDs are degraded.

Possible causes of this issue are as follows:

¡ A node is unreachable. Identify whether the service network and storage network are reachable.

¡ A node has failed. Use the ceph osd tree command to identify the node where the down OSDs reside and identify whether the node hardware and operating system are operating correctly.

[Figure: Output of the ceph health detail command]

· ceph -s

To display the cluster status, use the ceph -s command.

The output from the command is as follows:

¡ health

- HEALTH_OK—The cluster is in healthy state.

- HEALTH_WARN—Alarms have been triggered.

- HEALTH_ERR—A severe error such as data inconsistency has occurred in the cluster.

Typically, prompts related to PG and OSD abnormalities or time inconsistencies will appear in the health section.

¡ monmap—Number of monitors and the nodes where the monitors reside. As shown in the figure, the cluster contains three monitors, which reside in node 117, node 118, and node 119 respectively. The first monitor is the primary monitor.

¡ osdmap—Total number of OSDs, number of OSDs in up state, and number of OSDs in in state. As shown in the figure, all 18 OSDs in the cluster are in up and in states, which indicate they are all operating correctly.

¡ pgmap—Number of PGs, number of storage pools, space used by a single data replica, and total number of objects. This field also displays cluster usage information, including used capacity, free capacity, and total capacity, as well as the PG states.

Error prompts:

¡ too many PGs per OSD—The number of PGs per OSD exceeds the configured maximum. To clear this message, add more OSDs or reduce the number of storage pools.

¡ clock skew detected—The system time is inconsistent on monitor nodes. Execute the ntpdate -u IP command to synchronize time from the primary NTP server, where IP is the IP address of the primary NTP server.

As shown in the following figure, six OSDs are in down state. The cluster puts the PGs corresponding to these OSDs in degraded state.

Execute the ceph -s command. The output shows that some PGs are abnormal, one monitor is down, 12 OSDs are up, and 18 OSDs are in in state. This indicates that node 118 might have an error or the service network is in abnormal state.
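To check the osdmap counts in a script rather than by eye, a small parser can do the arithmetic. The following Python sketch assumes the single-line osdmap format printed by older Ceph-based releases, for example osdmap e356: 18 osds: 12 up, 18 in. The epoch number in that line is made up. Newer releases format this section differently, so treat the pattern as an assumption and adjust it to the output on your system.

```python
import re

# Assumed osdmap format: "osdmap e<epoch>: <total> osds: <up> up, <in> in"
OSDMAP_RE = re.compile(r"(\d+)\s+osds:\s+(\d+)\s+up,\s+(\d+)\s+in")


def parse_osdmap(line):
    """Return OSD counts from an osdmap line; raise ValueError if the line does not match."""
    match = OSDMAP_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized osdmap line: {line!r}")
    total, up, in_count = (int(value) for value in match.groups())
    return {
        "total": total,
        "up": up,
        "in": in_count,
        "down": total - up,        # OSDs that are not in up state
        "out": total - in_count,   # OSDs that are not in in state
        "healthy": up == total and in_count == total,
    }


if __name__ == "__main__":
    # Same counts as the abnormal example above: 18 OSDs, 12 up, 18 in.
    print(parse_osdmap("osdmap e356: 18 osds: 12 up, 18 in"))
```

For the abnormal example, the sketch reports six down OSDs and no out OSDs. This matches the six down OSDs shown in the figure.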

· ceph -w

To monitor a cluster, use the ceph -w command. The command continuously outputs information and can be terminated by pressing Ctrl+C. When the cluster's PG state is normal, the output from the ceph -w command is consistent with the output from the ceph -s command, as shown in the following figure.

Figure: Output from the ceph -w command

To view cluster state changes, see the osdmap, pgmap, mon, and osd pgmap sections.

OSD commands

· ceph osd tree

To display the OSDs on each node and their positions in the CRUSH map, use the ceph osd tree command. This command helps maintain a large cluster. The following figure shows OSDs in normal state.

Figure: ceph osd tree output (all OSDs in normal state)

Take osd.1 as an example. Its weight is 0.89999, it resides in rack 3 on host node 111, and it is in down and out state.

Figure: ceph osd tree output (osd.1 in down and out state)

IMPORTANT:

The system marks an OSD as down and out five minutes after its state changes to down.

· If a single OSD is in down/out state, a hard disk failure might have occurred.

· If all OSDs on a node are down, a node exception or network exception might have occurred.
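The two rules above can be written as a simple check over the rows of the ceph osd tree output. In this Python sketch, the input is a hypothetical list of (OSD, host, status) tuples that you extract from the command output yourself. The host names are illustrative only. The check covers only the two cases described above and does not replace a hardware or network inspection.

```python
from collections import defaultdict


def diagnose_down_osds(osds):
    """Map each host that has down OSDs to a troubleshooting hint.

    osds: iterable of (osd_name, host, status) tuples, where status is "up" or "down".
    """
    by_host = defaultdict(list)
    for name, host, status in osds:
        by_host[host].append((name, status))

    hints = {}
    for host, entries in by_host.items():
        down = [name for name, status in entries if status == "down"]
        if not down:
            continue  # every OSD on this host is up
        if len(down) == len(entries):
            # All OSDs on the node are down: suspect the node or the network.
            hints[host] = "node or network exception suspected"
        else:
            # Only some OSDs are down: suspect the disks behind them.
            hints[host] = "disk failure suspected: " + ", ".join(down)
    return hints


if __name__ == "__main__":
    sample = [
        ("osd.0", "node111", "up"),
        ("osd.1", "node111", "down"),
        ("osd.6", "node118", "down"),
        ("osd.7", "node118", "down"),
    ]
    for host, hint in diagnose_down_osds(sample).items():
        print(host, "->", hint)
```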

· ceph osd perf

To display the latency of an OSD, use the ceph osd perf command. If services are running, a latency of less than 100 ms is normal. When the cluster is idle, the latency is typically within 10 ms.

Figure: ceph osd perf output

If the latency remains higher than 10 ms when the cluster is idle, troubleshoot the issue. If the latency exceeds 100 ms when a large number of services are running, identify whether a network or hardware failure has occurred.
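The latency guidelines above depend on whether the cluster is idle or busy, and they concern latency that stays high rather than a single spike. The following Python sketch encodes that rule. The samples are latency values in milliseconds for one OSD, which you collect yourself from repeated runs of ceph osd perf. Treating "stays high" as "every sample exceeds the limit" is an assumption made for illustration.

```python
IDLE_LIMIT_MS = 10.0   # typical upper bound when the cluster is idle
BUSY_LIMIT_MS = 100.0  # upper bound for normal latency while services are running


def latency_needs_attention(samples_ms, cluster_busy):
    """Return True if every latency sample (ms) for an OSD exceeds the applicable limit.

    Requiring all samples to exceed the limit approximates "the latency stays high",
    so a single spike is not reported.
    """
    limit = BUSY_LIMIT_MS if cluster_busy else IDLE_LIMIT_MS
    return bool(samples_ms) and all(sample > limit for sample in samples_ms)


if __name__ == "__main__":
    print(latency_needs_attention([12, 15, 11], cluster_busy=False))
    print(latency_needs_attention([60, 140, 80], cluster_busy=True))
```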

· ceph osd df

To display disk usage, use the ceph osd df command. The command displays OSD statistics, such as OSD size, used capacity, available capacity, and usage. If the usage of an OSD exceeds 85%, UIS Manager displays the near full alarm. If the usage of an OSD exceeds 95%, the cluster becomes unavailable.

As shown in the following figure, the cluster contains three OSDs, each with a size of 920G, used capacity of 501G, and available capacity of 419G. The total capacity is 2762G, the used capacity is 1505G, the available capacity is 1257G, and the usage is 54.48%.

Figure: ceph osd df output
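The usage figures and alarm thresholds above reduce to a simple percentage check. The following Python sketch uses the 85% near full threshold described in this section. The 95% full threshold is an assumption that matches the Ceph default full ratio, so verify it against your cluster configuration. The G values in the command output are rounded, so the computed percentages can differ slightly from the 54.48% reported in the figure.

```python
NEAR_FULL_PCT = 85.0  # near full alarm threshold described above
FULL_PCT = 95.0       # assumption: the Ceph default full ratio; verify on your cluster


def usage_pct(used, size):
    """Usage percentage; used and size must be in the same unit (for example, G)."""
    return 100.0 * used / size


def usage_state(pct):
    """Classify an OSD usage percentage against the thresholds above."""
    if pct > FULL_PCT:
        return "full"
    if pct > NEAR_FULL_PCT:
        return "near full"
    return "ok"


if __name__ == "__main__":
    # Values from the example above, as rounded in the output.
    for label, used, size in [("per OSD", 501, 920), ("cluster", 1505, 2762)]:
        pct = usage_pct(used, size)
        print(f"{label}: {pct:.2f}% ({usage_state(pct)})")
```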

Obtaining the cluster usage statistics

ceph df

Use this command to obtain usage statistics for the cluster and its storage pools. It displays the total capacity, available capacity, used capacity, and usage percentage of the cluster. It also displays information about each storage pool, such as its name, ID, usage, and number of objects.

For example, as shown in the following figure, the remaining capacity of the cluster is 1257G, the used capacity is 1505G, and the usage is 54.48%. Storage pool p1 uses 499G (54.29%), has 419G of available space, and contains 128003 objects.

Figure: ceph df output

ONEStor commands

iostat

Use the iostat command to monitor the load on system input/output (I/O) devices and the time the system takes to process I/O requests. This command helps determine whether an I/O bottleneck exists in the interaction between processes and the operating system. When executed without parameters, the command displays statistics accumulated from system startup to the time the command is executed. The following figure shows the output from the iostat command.

Figure: iostat output

The following are the descriptions for the items:

· The first line displays the system version, host name, and date.

· avg-cpu—CPU usage statistics. For a multi-core CPU, this value is the average value of all cores.

· Device—IO statistics for each disk.

· CPU and disk IO statistics.

For the CPU statistics, the value for iowait is important. It indicates the percentage of time that the CPU was idle during which the system had pending disk I/O requests.

Disk names are displayed in the sdX format.

Item	Description
tps	Number of I/O requests (transfers) per second that were issued to the device.
kB_read/s	The amount of data read from the device expressed in kilobytes per second. One sector has a size of 512 bytes.
kB_wrtn/s	The amount of data written to the device expressed in kilobytes per second.
kB_read	Total number of kilobytes read.
kB_wrtn	Total number of kilobytes written.

The iostat -x 1 command displays extended I/O device statistics at 1-second intervals. Specify the -x option when you analyze I/O usage statistics.

Figure: iostat -x 1 output

The iostat -x 1 command displays real-time disk usage information for a node. If the %util value of a single disk is high or close to 100%, that disk might have an issue. If the overall disk %util value of the cluster exceeds 80% or is close to 100%, the disk I/O capacity of the cluster has reached its limit. In this case, add more disks or reduce the services provided by the cluster.

The following are the descriptions for the items:

Item	Description
rrqm/s	Number of read requests merged per second that were queued to the device.
wrqm/s	Number of write requests merged per second that were queued to the device.
r/s	Number of read requests completed per second for the device.
w/s	Number of write requests completed per second for the device.
rkB/s	Number of kilobytes read from the device per second.
wkB/s	Number of kilobytes written to the device per second.
avgrq-sz	Average size (in sectors) of the requests that were issued to the device.
avgqu-sz	Average queue length of the requests that were issued to the device.
await	Average time (in milliseconds) for I/O requests issued to the device to be served. The time includes the time spent by the requests in queue and the time spent servicing them.
svctm	Average service time (in milliseconds) for I/O requests that were issued to the device.
%util	Percentage of CPU time during which I/O requests were issued to the device.
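The %util guidelines above can be applied to several disks at once. In the following Python sketch, the input maps disk names to %util values that you read from iostat -x yourself. The 90% per-disk cut-off for "high or close to 100%" is a hypothetical value. Approximating the cluster-wide figure as the average %util across the disks you supply is also an assumption.

```python
def check_util(util_by_disk, disk_limit=90.0, cluster_limit=80.0):
    """Apply the %util guidelines to per-disk values read from iostat -x.

    util_by_disk maps a disk name (sdX) to its %util value.
    disk_limit is a hypothetical cut-off for "high or close to 100%".
    The cluster-wide figure is approximated as the average %util across the disks given.
    """
    if not util_by_disk:
        raise ValueError("no disks given")
    hot_disks = sorted(disk for disk, util in util_by_disk.items() if util >= disk_limit)
    average = sum(util_by_disk.values()) / len(util_by_disk)
    return {"hot_disks": hot_disks, "average": average, "saturated": average > cluster_limit}


if __name__ == "__main__":
    print(check_util({"sdb": 97.5, "sdc": 22.0, "sdd": 18.4}))
```

In this example, sdb is flagged as a single busy disk, while the average stays well below 80%. The cluster as a whole is therefore not treated as saturated.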

top

The top command provides real-time monitoring of resource usage for different processes in the system. This command can sort tasks based on CPU usage, memory usage, and execution time.

Focus on the following items:

· Load average

· Tasks

· CPU usage

Sorting processes by CPU or memory usage helps identify the processes that are causing system issues. To sort processes while top is running, press uppercase F or O, and then press k (CPU usage) or n (memory usage).

The following is the output from the top command.

Figure: top output

The following are the descriptions for the items:

· The first line is task queue information. This line shows the current time, system uptime, the number of currently logged-in users, and the system load, which is the average length of the task queue, displayed as three values for the past 1 minute, 5 minutes, and 15 minutes, respectively.

· The second and third lines display information about processes and CPUs. If the host has multiple CPUs, this information might span more than two lines. The next two lines display memory and swap usage. The cached value in the swap line is the size of swap space whose content has been swapped back into memory but not yet overwritten in the swap area. When that memory is swapped out again, it does not need to be written to the swap area again.

The area below system information displays detailed information for each process.

Item	Description
PID	Process ID
RUSER	Username of the owner of the process
UID	User ID of the owner of the process
USER	Username of the owner of the process
VIRT	Total virtual memory used by the process.
RES	Physical memory (KB) that the process is using.
SHR	Shared memory (KB) used by the process.
%MEM	Memory usage of the process.
%CPU	CPU usage of the process.

You can press uppercase F or O, and then press a key from a to z to sort the processes by the corresponding column. Pressing uppercase R reverses the current sort order.

You can use the following interactive commands while top is running.

Item	Description
q /Ctrl+C	Quits the program.
m	Displays memory information.
t	Displays process and CPU information.
c	Displays command name and complete command.
M	Sorts processes by memory usage.
P	Sorts processes by CPU usage.
T	Sorts processes by time/accumulated time.

Other query commands

· lsblk

Use the lsblk command to view information about hard drive capacity, partition, usage, and mounting.

Figure: lsblk output

In the above figure, the NAME column lists all hard drives and partitions, SIZE displays the total capacity of the hard drive and partition size, TYPE displays the type of hard drive and partition, and MOUNTPOINT displays the file system mount point. The sda disk is the system disk with a size of 279.4G. Six hard disks with a size of 558.9G each are mounted as OSDs, and the size of the log partition is 10G.

· mount

Use the mount command to display all file systems mounted on a node and their types.

Figure: mount output

· df -h

Use the df -h command to list all mounted file systems, and display the total capacity, used capacity, available capacity, usage, and mount point for each mounted file system.

Figure: df -h output

The output shows that 6 OSDs have been mounted, each with a capacity of 549G and a usage of 1%.
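For scripted checks of a single mount point, Python's standard shutil.disk_usage function returns the same size, used, and available values that df reports. The following sketch is illustrative only. It uses / as the path, and an OSD mount point such as /var/lib/ceph/osd/ceph-0 would be a hypothetical alternative. The sketch computes the usage percentage the way df does, as used divided by used plus available. Because df also rounds the percentage up to a whole number, the two values can differ slightly.

```python
import shutil


def mount_usage(path="/"):
    """Return (size, used, avail, use_pct) in bytes for the file system that contains path."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    # df computes Use% as used / (used + available), so mirror that here.
    denominator = usage.used + usage.free
    use_pct = 100.0 * usage.used / denominator if denominator else 0.0
    return usage.total, usage.used, usage.free, use_pct


if __name__ == "__main__":
    gib = 1024 ** 3
    size, used, avail, pct = mount_usage("/")
    print(f"/  size {size / gib:.1f}G  used {used / gib:.1f}G  avail {avail / gib:.1f}G  use {pct:.1f}%")
```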

· fdisk -l

Use the fdisk -l command to display the hard drives, partitions, sizes, and partition types on a node.

Figure: fdisk -l output

· free

Use the free command to display the total memory, used memory, buffer, cache, and swap usage of a node.

Figure: free output

Linux commands

vi

To create or edit a file in the Linux operating system, you can use a text editor such as vi or vim.

The Vi editor has two modes: Command and Insert.

The following uses the test.txt file as an example.

Executing the vi command

Enter the vi test.txt command in the Linux command line. If the test.txt file already exists, the command opens it for editing. If the file does not exist, the command creates it.

Entering Command mode

When you open a file with vi, you are in Command mode. A newly created file does not contain any content.

In Command mode, you can use keyboard keys to navigate, delete, copy, and paste, but you cannot enter text.

Entering Insert mode

To enter Insert mode, press i, o, or a, as shown in the following figure.

Figure: Entering Insert mode

Enter the file content.

Returning to Command mode

To return to Command mode, press ESC.

Save & Exit

After you return to Command mode, enter a colon (:), type wq, and then press Enter to save the file and exit the vi editor.

To view the created file, execute the ls command.

Basic commands

Displaying the current directory

Use the pwd command to print the current working directory.

root@HZ-UIS01-CVK01:~# pwd

/root

Displaying file information

Use the ls command to display file information in the current directory.

# ls [-aAdfFhilnrRSt] directory name

Options and parameters:

-a: Lists all files including those that begin with .

-A: Lists all files except for . and ..

-d: Lists directory entries instead of contents

-h: When used with -l (long listing), prints sizes in human-readable format, for example, 4.0K or 1.5G

-i: Prints the index number of each file

-r: Reverses order while sorting

-R: Lists all subdirectories recursively

-S: Displays entries sorted by file size

-t: Sorts by modification time

Example:

root@HZ-UIS01-UIS Manager:~# ls -al

total 44

drwx------ 5 root root 4096 May 23 15:33 .

drwxr-xr-x 24 root root 4096 May 13 09:47 ..

-rw------- 1 root root 847 Jan 1 12:35 .bash_history

-rw-r--r-- 1 root root 3106 Apr 19 2012 .bashrc

drwx------ 2 root root 4096 May 17 17:23 .cache

-rw-r--r-- 1 root root 8 May 23 15:33 UIS.conf

drwxr-xr-x 2 root root 4096 May 23 15:32 h3c

-rw-r--r-- 1 root root 140 Apr 19 2012 .profile

drwxr-xr-x 2 root root 4096 May 22 09:50 .ssh

-rw------- 1 root root 4962 May 23 15:33 .viminfo

Changing the working directory

Use the cd command to change the working directory.

.: The current directory.

..: One level up from the current directory.

-: Previous working directory

~: Home directory for the current user

For example, ~account represents the home directory for the account user.

Example:

root@HZ-UIS01-CVK01:/# cd ~root

# Enter the home directory for the root user.

root@HZ-UIS01-CVK01:~# cd ~

# Return to the home directory for the current user.

root@HZ-UIS01-CVK01:~# cd

# Return to the home directory for the current user.

root@HZ-UIS01-CVK01:~# cd ..

# Enter the directory one level up from the current directory.

root@HZ-UIS01-CVK01:/# cd -

# Return to the previous directory.

root@HZ-UIS01-CVK01:~# cd /root

# Enter the /root directory.

root@HZ-UIS01-CVK01:~# cd ../root

# Enter the root directory under the parent directory.

Creating a new directory

Use the mkdir (make directory) command to create a new directory.

# mkdir [-mp] directory name

Options and parameters:

-m: Sets the access permissions of the new directory.

-p: Creates the directory together with any missing parent directories.

Example:

root@HZ-UIS01-UIS Manager:~# ls

root@HZ-UIS01-UIS Manager:~# mkdir h3c

root@HZ-UIS01-UIS Manager:~# ls

h3c

root@HZ-UIS01-UIS Manager:~#

Copying a file or directory

Use the cp (copy) command to copy a file or directory.

# cp [-adfilprsu] source destination

# cp [options] source1 source2 source3 .... destination directory

Options and parameters:

-a: Same as -pdr

-f: If any existing destination file can't be opened, delete it and attempt again

-i: Asks for confirmation before overwriting the destination file.

-p: Preserves the file attributes of the original file in the copy.

-r: Copies files recursively. All files and subdirectories in the specified source directory are copied to the destination.

If you specify two or more source files, the destination must be a directory.

Example:

# Copy a file.

root@HZ-UIS01-UIS Manager:~# ls

UIS.conf

root@HZ-UIS01-UIS Manager:~# cp UIS.conf UIS.conf.bak

root@HZ-UIS01-UIS Manager:~# ls

UIS.conf UIS.conf.bak

root@HZ-UIS01-UIS Manager:~#

# Copy a directory.

root@HZ-UIS01-UIS Manager:~# ls

h3c

root@HZ-UIS01-UIS Manager:~# cp -rf h3c h3c.bak

root@HZ-UIS01-UIS Manager:~# ls

h3c h3c.bak

root@HZ-UIS01-UIS Manager:~#

Securely copying a file

scp (secure copy) copies files and directories between hosts over SSH, so the transferred data is encrypted. It is a secure alternative to the cp (copy) command for copying files between servers. If a disk on your server is read-only, you can use the scp command to copy the files on that disk to another destination.

# scp [options] source destination

Options and parameters:

-1: Protocol 1 will be used.

-2: Protocol 2 will be used.

-4: Only IPv4 addresses will be used.

-6: Only IPv6 addresses will be used.

-B: Runs in batch mode, which prevents prompts for passwords or passphrases.

-C: Enables compression, which can improve transfer speed on slow links.

-p: Preserves file permissions, access time, and modifications while copying.

-q: Execute SCP in quiet mode. This option will not display the transfer process.

-r: Copies the directories and files recursively.

-v: Activates verbose mode. It will display the SCP command execution progress step-by-step on the terminal window. It is useful in debugging.

-c: Selects the cipher used to encrypt the data transfer. This option is passed directly to SSH.

-F ssh_config: Specifies an alternative per-user SSH configuration file. This option is passed directly to SSH.

-i identity_file: Specifies the file from which the identity (private key) for public key authentication is read. This option is passed directly to SSH.

-l limit: Limits the bandwidth, in Kbit/s.

-o ssh_option: Passes options to SSH in the format used in ssh_config.

-P port: Specifies the port to connect to on the remote host.

-S program: Specifies the program to use for the encrypted connection. The program must understand ssh(1) options.

Example:

root@HZ-UIS01-CVK01:~# scp UIS-E0218H06-Upgrade.tar.gz HZ-UIS01-CVK02:/root

UIS-E0218H06-Upgrade.tar.gz 100% 545MB 90.8MB/s 00:06

root@HZ-UIS01-CVK01:~#

Removing a file or directory

Use the rm (remove) command to remove a file or directory.

# rm [-fir] file or directory name

Options and parameters:

-f: Forces removal without prompting for confirmation and ignores nonexistent files.

-i: Removes a file interactively.

-r: Removes a directory recursively. Use this option with caution.

Example:

root@HZ-UIS01-UIS Manager:~# ls

h3c

root@HZ-UIS01-UIS Manager:~# rm -rf h3c

root@HZ-UIS01-UIS Manager:~# ls

root@HZ-UIS01-UIS Manager:~#

Moving files and directories or renaming a file or directory

Use the mv (move) command to move files and directories from one directory to another or rename a file or directory.

# mv [-fiu] source destination

# mv [options] source1 source2 source3 .... directory

Options and parameters:

-f: Overwrites the destination file or directory without asking for permission.

-i: Asks for permission to overwrite.

-u: Moves a file only when the source file is newer than the destination file or the destination file does not exist.

Example:

root@HZ-UIS01-UIS Manager:~# ls

UIS.conf

root@HZ-UIS01-UIS Manager:~# mv UIS.conf UIS.conf.bak

root@HZ-UIS01-UIS Manager:~# ls

UIS.conf.bak

root@HZ-UIS01-UIS Manager:~#

Creating an archive and extracting the archive files

# tar [-j|-z] [cv] [-f archive_name] file_name...   (creates an archive)

# tar [-j|-z] [xv] [-f archive_name] [-C directory]   (extracts an archive)

Options and parameters:

-c: Creates the archive.

-t: Displays or lists files inside the archived file.

-x: Extracts archives. This option can be used together with the -C option.

The -c, -t, and -x options cannot be used in the same command.

-j: Compresses or decompresses the archive with bzip2. As a best practice, use *.tar.bz2 as the archive name.

-z: Compresses or decompresses the archive with gzip. As a best practice, use *.tar.gz as the archive name.

-v: Displays verbose information.

-f filename: Specifies the name of the archive file.

-C directory: Use this option to extract files in a specific directory.

Example:

# Create an archive.

root@HZ-UIS01-UIS Manager:~# ls

UIS.conf UIS.conf-01 UIS.conf-02

root@HZ-UIS01-UIS Manager:~# tar -czvf UIS.tar.gz UIS.conf*

UIS.conf

UIS.conf-01

UIS.conf-02

root@HZ-UIS01-UIS Manager:~# ls

UIS.conf UIS.conf-01 UIS.conf-02 UIS.tar.gz

# Extract the archive files.

root@HZ-UIS01-UIS Manager:~# ls

UIS.tar.gz

root@HZ-UIS01-UIS Manager:~# tar -xzvf UIS.tar.gz

UIS.conf

UIS.conf-01

UIS.conf-02

root@HZ-UIS01-UIS Manager:~# ls

UIS.conf UIS.conf-01 UIS.conf-02 UIS.tar.gz

System commands

Displaying the system kernel

# uname [-asrmpi]

Options and parameters:

-a: Displays all system information.

-s: Displays the system kernel name.

-r: Displays the kernel release.

-m: Displays the machine hardware name, for example, i686 or x86_64.

-p: Displays the processor type.

-i: Displays the hardware platform, for example, x86_64.

Example:

root@ZJ-UIS-001:~# uname -a

Linux ZJ-UIS-001 4.1.0-generic #1 SMP Wed Nov 9 02:04:23 CST 2016 x86_64 x86_64 x86_64 GNU/Linux

Displaying uptime of the system

Example:

root@HZ-UIS01-UIS Manager:~# uptime

17:54:04 up 3 days, 23:28, 1 user, load average: 0.08, 0.12, 0.13

Displaying system resource statistics

# vmstat [-a] [delay [count]]

# vmstat [-fs]

# vmstat [-S unit]

# vmstat [-d]

# vmstat [-p partition]

Options and parameters:

-a: Displays active/inactive memory.

-f: Displays the number of forks since boot.

-s: Displays a table of various event counters and memory statistics.

-S: Followed by k, K, m, or M to set the output unit to 1000, 1024, 1000000, or 1048576 bytes, respectively.

-d: Lists disk statistics.

-p: Followed by some partition name for detailed statistics.

Example:

root@HZ-UIS01-CVK01:~# vmstat 1 5

procs ---------------memory----------------- -----swap---- -----io---- ----system-- -----cpu--------

r b swpd free buff cache si so bi bo in cs us sy id wa

1 0 0 60402384 58716 1712736 0 0 15 6 87 116 1 0 98 0

0 0 0 60402500 58716 1712736 0 0 1 0 631 1051 0 0 100 0

0 0 0 60402608 58756 1712752 0 0 0 840 1444 1640 2 0 98 0

0 0 0 60403360 58756 1712760 0 0 2 33 991 1346 0 0 100 0

2 0 0 60400944 58780 1712784 0 0 0 60 2225 1682 0 0 99 0

Field descriptions for VM mode:

procs

· r: Number of processes waiting for run time.

· b: Number of processes in uninterruptible sleep.

memory

· swpd: The amount of virtual memory used.

· free: The amount of idle memory.

· buff: The amount of memory used as buffers.

· cache: The amount of memory used as cache.

swap

· si: The amount of memory swapped in from disk (/s).

· so: The amount of memory swapped out to disk (/s).

If these values are large, data is frequently swapped between memory and disk, which reduces system efficiency.

· io

¡ bi: Blocks received from a block device (blocks/s).

¡ bo: Blocks sent to a block device (blocks/s). A larger value indicates that the system IO is busy.

system

· in: Number of interrupts per second, including the clock.

· cs: Number of context switches per second.

A larger value indicates more frequent communications between the system and devices such as disks, NICs, and clocks.

· CPU

¡ us: Time spent running non-kernel code.

¡ sy: Time spent running kernel code (system time).

¡ id: Time spent idle.

¡ wa: Time spent waiting for IO.

¡ st: Time stolen from a VM. Supported in versions later than Linux 2.6.11.
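To review several vmstat samples at once, you can parse the output into per-row records and summarize the fields discussed above. The following Python sketch reuses the vmstat 1 5 output from the example. It assumes a single header block, so repeated headers, which vmstat prints on long runs, are not handled. It does not decide what counts as "large". It only reports the values for you to compare.

```python
SAMPLE = """\
procs ---------------memory----------------- -----swap---- -----io---- ----system-- -----cpu--------
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 60402384 58716 1712736 0 0 15 6 87 116 1 0 98 0
0 0 0 60402500 58716 1712736 0 0 1 0 631 1051 0 0 100 0
0 0 0 60402608 58756 1712752 0 0 0 840 1444 1640 2 0 98 0
0 0 0 60403360 58756 1712760 0 0 2 33 991 1346 0 0 100 0
2 0 0 60400944 58780 1712784 0 0 0 60 2225 1682 0 0 99 0
"""


def parse_vmstat(text):
    """Return one dict per sample row, keyed by the field names in the vmstat header."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    start = next(i for i, cols in enumerate(rows) if cols[0] == "r")  # field header line
    header = rows[start]
    return [dict(zip(header, map(int, cols))) for cols in rows[start + 1:]]


def summarize(samples):
    """Summarize swapping (si/so), I/O wait (wa), and idle time (id) across samples."""
    return {
        "max_si": max(s["si"] for s in samples),
        "max_so": max(s["so"] for s in samples),
        "max_wa": max(s["wa"] for s in samples),
        "avg_id": sum(s["id"] for s in samples) / len(samples),
    }


if __name__ == "__main__":
    print(summarize(parse_vmstat(SAMPLE)))
```

For the sample, no swapping or I/O wait occurs and the CPUs are about 99% idle on average, which indicates a lightly loaded host.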

Displaying the load on a device

Use the iostat command to display CPU and I/O usage statistics.

# iostat [options] [interval] [count]

Options and parameters:

-c: Displays the CPU usage. It is mutually exclusive with the -d option.

-d: Displays the disk usage. It is mutually exclusive with the -c option.

-k: Displays statistics in kilobytes per second. The default unit is block.

-m: Displays statistics in megabytes per second.

-N: Displays logical volume mapping (LVM) statistics.

-n: Displays NFS statistics.

-p: Displays statistics for block devices and all their partitions used by the system. You can specify a device after this option, for example, # iostat -p /dev/sda. This option is mutually exclusive with the -x option.

-t: Prints the time for each report displayed.

-x: Displays detailed information.

-v: Displays version information.

Remarks:

· avg-cpu

¡ %user: Displays the percentage of CPU usage that occurred when executing at the user level.

¡ %nice: Displays the percentage of CPU usage that occurred when executing at the user level with nice priority.

¡ %system: Displays the percentage of CPU usage that occurred when executing at the system (kernel) level.

¡ %steal: Displays the percentage of time spent in involuntary wait by the virtual CPU or CPUs when the hypervisor was servicing another virtual processor.

¡ %iowait: Displays the percentage of time the CPUs were idle during which the system had an outstanding disk I/O request.

¡ %idle: Displays the percentage of time the CPUs were idle.

· Device

¡ tps: Number of IO requests per second that were issued to the device.

¡ Blk_read/s: The amount of data read from the device expressed in blocks per second.

¡ Blk_wrtn/s: The amount of data written to the device expressed in blocks per second.

¡ Blk_read: Total number of blocks read.

¡ Blk_wrtn: Total number of blocks written.

IMPORTANT:

· If the value of %iowait is too high, the disk might have I/O issues. If the value of %idle is high, the CPUs are idle.

· If the value of %idle is high but the system responds slowly, the CPUs might be waiting for memory allocation. In this case, increase the memory capacity.

· If the value of %idle remains lower than 10%, the CPU processing capability of the system is insufficient.

iostat outputs:

· Blk_read: Total number of blocks read.

· Blk_wrtn: Total number of blocks written.

· kB_read/s: The amount of data read from the device expressed in kilobytes per second.

· kB_wrtn/s: The amount of data written to the device expressed in kilobytes per second.

· kB_read: Total number of kilobytes read.

· kB_wrtn: Total number of kilobytes written.

· rrqm/s: Number of read requests merged per second that were queued to the device.

· wrqm/s: Number of write requests merged per second that were queued to the device.

· r/s: Number of read requests completed per second for the device.

· w/s: Number of write requests completed per second for the device.

· rsec/s: Number of sectors read from the device per second.

· wsec/s: Number of sectors written to the device per second.

· rkB/s: The amount of data read from the device expressed in kilobytes per second.

· wkB/s: The amount of data written to the device expressed in kilobytes per second.

· avgrq-sz: Average size (in sectors) of the requests that were issued to the device.

· avgqu-sz: Average queue length of the requests that were issued to the device.

· await: Average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

· svctm: Average service time (in milliseconds) for I/O requests that were issued to the device.

· %util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.

Example:

root@HZ-UIS01-CVK01:~# iostat

Linux 3.13.6 (HZ-UIS01-CVK01) 12/16/2015 _x86_64_ (24 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle

20.48 0.00 3.48 0.23 0.00 75.80

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn

sda 10.17 1.76 269.57 1309400 201017740

sdb 16.43 181.78 202.21 135552881 150792613

Execute the iostat -d -x -m /dev/sdb 1 5 command to display detailed statistics for /dev/sdb in megabytes per second, five times at 1-second intervals.
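The CPU guidelines in the IMPORTANT notes above can be applied to the avg-cpu section of the iostat output. The following Python sketch parses the avg-cpu lines from the example and checks the 10% %idle rule. Because the section does not give a number for "too high" %iowait, the 20% cut-off in the sketch is a hypothetical value.

```python
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          20.48    0.00    3.48    0.23    0.00   75.80
"""


def parse_avg_cpu(text):
    """Return the avg-cpu percentages from iostat output as a dict keyed by column name."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("avg-cpu:"):
            names = line.split()[1:]
            values = [float(v) for v in lines[i + 1].split()]
            return dict(zip(names, values))
    raise ValueError("no avg-cpu section found")


def cpu_hints(cpu, iowait_limit=20.0):
    """Apply the guidelines above. iowait_limit is a hypothetical cut-off for "too high"."""
    hints = []
    if cpu["%iowait"] > iowait_limit:
        hints.append("high %iowait: check the disks for I/O issues")
    if cpu["%idle"] < 10.0:
        hints.append("%idle below 10%: CPU processing capability is insufficient")
    return hints


if __name__ == "__main__":
    cpu = parse_avg_cpu(SAMPLE)
    print(cpu, cpu_hints(cpu))
```

For the sample output, neither rule is triggered: %iowait is 0.23 and %idle is 75.80.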

Testing the read and write performance for a disk

dd [option]

Options and parameters:

· if=file: Specifies the input file name. The default is standard input.

· of=file: Specifies the output file name. The default is standard output.

· ibs=bytes: Reads BYTES bytes at a time. One block is BYTES bytes.

· obs=bytes: Writes BYTES bytes at a time. One block is BYTES bytes.

· bs=bytes: Reads and writes BYTES bytes at a time. It can replace ibs and obs.

· cbs=bytes: Converts BYTES bytes at a time. It is the size of the conversion buffer.

· skip=blocks: Skips BLOCKS ibs-sized blocks at start of input.

· seek=blocks: Skips BLOCKS obs-sized blocks at start of output. This option is valid only when the output file is a disk or tape.

· count=blocks: Copies only BLOCKS input blocks. The block size is the number of bytes specified by ibs.

· conv=ascii: Converts EBCDIC to ASCII.

· conv=ebcdic: Converts ASCII to EBCDIC.

· conv=ibm: Converts ASCII to alternate EBCDIC.

· conv=block: Pads newline-terminated records with spaces to cbs size.

· conv=unblock: Replaces trailing spaces in cbs-size records with a newline.

· conv=ucase: Converts lowercase letters to uppercase letters.

· conv=lcase: Converts uppercase letters to lowercase letters.

· conv=notrunc: Does not truncate the output file.

· conv=swab: Swaps every pair of input bytes.

· conv=noerror: Continue after read errors.

· conv=sync: Pads every input block with NULLs to ibs-size; when used with block or unblock, pad with spaces rather than NULLs.

If a number is followed by one of the following suffixes, it is multiplied by the corresponding factor: b=512, c=1, k=1024, w=2, xM=M, kB=1000, K=1024, MB=1000*1000, M=1024*1024, GB=1000*1000*1000, G=1024*1024*1024.
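The suffix rules above determine how much data a dd test actually moves. The following Python sketch (dd_bytes is a hypothetical helper name) converts size operands such as 4K or 1M to bytes using the multipliers listed above. It does not handle the xM multiplication form. The command in the comment is illustrative, and /tmp/ddtest is a hypothetical file name. Never point of= at a disk device that holds data, because dd overwrites it.

```python
import re

# Multiplicative suffixes listed above.
SUFFIXES = {
    "": 1, "c": 1, "w": 2, "b": 512,
    "kB": 1000, "k": 1024, "K": 1024,
    "MB": 1000 ** 2, "M": 1024 ** 2,
    "GB": 1000 ** 3, "G": 1024 ** 3,
}


def dd_bytes(value):
    """Convert a dd size operand such as "4K" or "1M" to bytes (the xM form is not handled)."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", value)
    if match is None or match.group(2) not in SUFFIXES:
        raise ValueError(f"unsupported size: {value!r}")
    number, suffix = match.groups()
    return int(number) * SUFFIXES[suffix]


if __name__ == "__main__":
    # Total data written by a test such as: dd if=/dev/zero of=/tmp/ddtest bs=1M count=1024
    total = dd_bytes("1M") * 1024
    print(total, "bytes")
```

Note that K and kB differ. bs=1K reads and writes 1024 bytes at a time, while bs=1kB uses 1000 bytes, which matters when you compare throughput figures.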

Displaying the free and used memory

free [-b|-k|-m|-g] [-t]

Options and parameters:

· -b, -k, -m, -g: Display output in bytes, kilobytes, megabytes, or gigabytes, respectively. The default unit is kilobytes.

· -t: Displays summary for physical memory + swap space.

Example:

root@HZ-UIS01-CVK01:~# free

total used free shared buffers cached

Mem: 65939360 4208888 61730472 0 83224 277944

-/+ buffers/cache: 3847720 62091640

Swap: 10772220 0 10772220
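Older versions of free, as in this example, print a "-/+ buffers/cache" line derived from the Mem line. Its first value is used memory minus buffers and cache. Its second value is free memory plus buffers and cache, because the kernel can reclaim that memory for applications. The following Python sketch reproduces the arithmetic with the values from the example, which are in KB.

```python
def buffers_cache_line(used, free, buffers, cached):
    """Compute the two values on the "-/+ buffers/cache:" line from the Mem: line values."""
    return used - buffers - cached, free + buffers + cached


if __name__ == "__main__":
    # Mem: line from the example above (values in KB).
    print(buffers_cache_line(used=4208888, free=61730472, buffers=83224, cached=277944))
```

The result, (3847720, 62091640), shows that applications actually use about 3.7 GB of the roughly 63 GB of memory on this host.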

User commands

Creating a user group

groupadd [-g gid] groupname

Options and parameters:

-g: Group ID.

Example:

root@HZ-UIS01-CVK01:~# groupadd -g 1000 it

root@HZ-UIS01-CVK01:/etc# more /etc/group | grep it

it:x:1000:

Deleting a user group

groupdel groupname

Example:

root@HZ-UIS01-CVK01:/etc# more /etc/group | grep it

it:x:1000:

root@HZ-UIS01-CVK01:/etc# groupdel it

root@HZ-UIS01-CVK01:/etc# more /etc/group | grep it

root@HZ-UIS01-CVK01:/etc#

Creating a user

useradd [-u UID] [-g initial_group] [-G supplementary group] [-m/M] [-d home_dir] [-s shell] username

Options and parameters:

· -u: User ID.

· -g: Initial group.

· -G: A list of supplementary groups which the user is also a member of.

· -M: The user home directory will not be created.

· -m: The user’s home directory will be created if it does not exist.

· -d: Specifies a directory as the home directory.

· -s: The name of the user’s login shell. If this option is not specified, the system uses the default login shell.

Example:

root@HZ-UIS01-CVK01:~# useradd -u 1000 -g it -m -d /home/it-user01 -s /bin/bash it-user01

root@HZ-UIS01-CVK01:~# more /etc/passwd | grep it-user01

it-user01:x:1000:1000::/home/it-user01:/bin/bash

root@HZ-UIS01-CVK01:~# ls /home/

it-user01

Deleting a user

userdel [-r] username

Options and parameters:

-r: Deletes files in the user’s home directory along with the home directory itself.

Example:

root@HZ-UIS01-CVK01:~# userdel -r it-user01

root@HZ-UIS01-CVK01:~# more /etc/passwd | grep it-user01

root@HZ-UIS01-CVK01:~# ls /home

root@HZ-UIS01-CVK01:~#

Setting the password

passwd [-l] [-u] [--stdin] [-S] [-n days] [-x days] [-w days] [-i days] username

Options and parameters:

· -l: Locks the password.

· -u: Unlocks the password.

· -S: Displays password related parameters.

· -n: Sets the minimum number of days between password changes.

· -x: Sets the maximum number of days a password remains valid. After MAX_DAYS, the password must be changed.

· -w: Sets the number of days of warning before a password change is required.

· -i: Sets the number of days after the password expires before the account is disabled.

Example:

root@HZ-UIS01-CVK01:~# more /etc/passwd | grep it-user01

it-user01:x:1000:1000::/home/it-user01:/bin/bash

root@HZ-UIS01-CVK01:~#

root@HZ-UIS01-CVK01:~# passwd it-user01

Enter new UNIX password:

Retype new UNIX password:

passwd: password updated successfully

Switching the user account

su [-] [-lm] [-c command] [username]

Options and parameters:

· -: Starts a new login shell as the specified user. If you do not specify a username, you switch to the root user.

· -l: Same as the - option.

· -m: Preserves the current environment.

· -c: Passes a command to the shell.

Example:

root@HZ-UIS01-CVK01:~# su - it-user01

it-user01@HZ-UIS01-CVK01:~$ exit

logout

it-user01@HZ-UIS01-CVK01:~$ su - root

Password:

root@HZ-UIS01-CVK01:~#

File management commands

Changing the group ownership of a file or directory

chgrp [-R] group name directory/file

Options and parameters:

-R: Recursively changes the group of the directory and each file in the directory.

Example:

root@HZ-UIS01-CVK01:/home/it-user01# ls -l

total 4

drwxr-xr-x 2 it-user01 it 4096 May 30 15:44 testFile

root@HZ-UIS01-CVK01:/home/it-user01# chgrp root testFile

root@HZ-UIS01-CVK01:/home/it-user01# ls -l

total 4

drwxr-xr-x 2 it-user01 root 4096 May 30 15:44 testFile

Changing the file owner and group

chown [-R] user file or directory

chown [-R] user:group name file or directory

Options and parameters:

-R: Recursively changes the ownership of the directory and each file in the directory.

Example:

root@HZ-UIS01-CVK01:/home/it-user01# ls -l

total 4

drwxr-xr-x 2 it-user01 it 4096 May 30 15:44 testFile

root@HZ-UIS01-CVK01:/home/it-user01# chown root:root testFile

root@HZ-UIS01-CVK01:/home/it-user01# ls -l

total 4

drwxr-xr-x 2 root root 4096 May 30 15:44 testFile

root@HZ-UIS01-CVK01:/home/it-user01#

Changing file or directory permissions

chmod [-R] xyz file or directory

Options and parameters:

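· xyz: Permissions in octal notation, one digit each for the owner (x), group (y), and others (z). Each digit is the sum of r (4), w (2), and x (1). For example, 755 grants rwx to the owner and r-x to the group and others.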

· -R: Recursively changes file mode bits of the directory and the files in the directory.

Example:

root@HZ-UIS01-CVK01:/home/it-user01# ls -l

total 4

drwxr-xr-x 2 it-user01 it 4096 May 30 15:44 testFile

root@HZ-UIS01-CVK01:/home/it-user01# chmod 777 testFile

root@HZ-UIS01-CVK01:/home/it-user01# ls -l

total 4

drwxrwxrwx 2 it-user01 it 4096 May 30 15:44 testFile

root@HZ-UIS01-CVK01:/home/it-user01#
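You can derive the xyz digits mechanically from the permission string that ls -l shows. The following sketch assumes bash and GNU coreutils stat. It converts a 9-character permission string such as rwxr-x--- into octal by summing r=4, w=2, and x=1 for the owner, group, and others triplets. The helper name perm_to_octal is hypothetical.

```shell
#!/bin/bash
# perm_to_octal: hypothetical helper that converts a 9-character permission
# string (ls -l output without the leading file-type character), for example
# rwxr-xr-x, into octal digits (755). Special bits such as s and t are not handled.
perm_to_octal() {
    local perms=$1 result="" i j digit ch
    for i in 0 3 6; do                  # owner, group, others triplets
        digit=0
        for j in 0 1 2; do
            ch=${perms:$((i + j)):1}
            case $ch in
                r) digit=$((digit + 4)) ;;
                w) digit=$((digit + 2)) ;;
                x) digit=$((digit + 1)) ;;
            esac
        done
        result+=$digit
    done
    printf '%s\n' "$result"
}

# Usage: apply the computed mode to a scratch file and read it back.
tmp=$(mktemp)
chmod "$(perm_to_octal rwxr-x---)" "$tmp"
stat -c '%a' "$tmp"                     # prints 750 (GNU coreutils stat)
rm -f "$tmp"
```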

Process management commands

Displaying all running processes

top [-d number] | top [-bnp]

Options and parameters:

· -d: Specifies the delay between screen updates in seconds. The default value is 5 seconds.

· -b: Starts top in Batch mode, which is used to send output from top to a file.

· -n: Specifies the maximum number of iterations, or frames, top can produce before ending. This option is typically used together with the -b option.

· -p: Monitors only processes with the specified process IDs.

You can use the following interactive commands while top is running:

· ?: Provides a reminder of all the basic interactive commands.

· P: Sorts by CPU usage.

· M: Sorts by memory usage.

· N: Sorts by PID.

· T: Sorts by CPU time used by processes.

· k: Kills a process. You are prompted for a PID and then for the signal to send.

· r: Renices a process. You are prompted for a PID and then for the new nice value.

· q: Quits top.

Example:

top - 17:40:48 up 2:13, 1 user, load average: 0.45, 0.55, 0.66

Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie

Cpu(s): 0.6%us, 0.1%sy, 0.0%ni, 99.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 65939360k total, 5703848k used, 60235512k free, 85832k buffers

Swap: 10772220k total, 0k used, 10772220k free, 1746992k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

4939 root 20 0 4583m 1.3g 4728 S 12 2.1 17:36.67 kvm

4874 root 20 0 4520m 908m 4576 S 5 1.4 11:54.61 kvm

4043 root 20 0 10.9g 402m 16m S 1 0.6 13:43.34 java

2370 root 20 0 23676 2168 1316 S 0 0.0 0:30.29 ovs-vswitchd

3184 root 20 0 15972 744 544 S 0 0.0 0:04.78 irqbalance

1 root 20 0 24456 2444 1344 S 0 0.0 0:04.07 init

2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd

3 root 20 0 0 0 0 S 0 0.0 0:00.07 ksoftirqd/0

6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0

Output description:

· The first line displays the following:

○ Current time and length of time since the last boot

○ Number of logged-in users

○ System load averages over the last 1, 5, and 15 minutes

A small value indicates that the system is idle. If the value stays higher than the number of logical CPUs, check whether the system is overloaded.

· The second line shows the total number of tasks and their states. If the zombie count is not 0, identify which processes have become zombies.

· The third line shows the CPU state percentages. Pay particular attention to %wa, the percentage of time spent waiting for I/O completion. An I/O bottleneck can cause the system to respond slowly.

· The fourth and fifth lines show the physical memory and swap statistics. If swap usage is high, the physical memory of the system is likely insufficient.

The lower section displays statistics for each process.

· PID: ID of the process.

· USER: User of the process.

· PR: Priority of the process. A smaller value means the process has a higher execution priority.

· NI: Nice value of the process. A smaller value means the process has a higher execution priority.

· %CPU: CPU usage.

· %MEM: Memory usage.

· TIME+: Total CPU time used by the process since it started.
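The load average counts runnable and uninterruptible tasks across all CPUs, so it is usually compared with the number of logical CPUs. The sketch below reads the same three values that top shows from /proc/loadavg and divides them by the CPU count from nproc. The function name load_per_cpu and its optional arguments are assumptions made for illustration. The arguments exist only so the function can be tried on sample data. They are not part of UIS.

```shell
#!/bin/bash
# load_per_cpu: hypothetical helper. Prints the 1-, 5-, and 15-minute load
# averages divided by the number of logical CPUs. A sustained value above
# 1.00 suggests that runnable tasks are waiting for CPU time.
# $1: optional path to a loadavg-format file (defaults to /proc/loadavg)
# $2: optional CPU count (defaults to the output of nproc)
load_per_cpu() {
    local file=${1:-/proc/loadavg}
    local cpus=${2:-$(nproc)}
    awk -v cpus="$cpus" '{ printf "%.2f %.2f %.2f\n", $1 / cpus, $2 / cpus, $3 / cpus }' "$file"
}

# Usage on a live host:
#   load_per_cpu
```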

To view information about a process:

root@HZ-UIS01-CVK01:~# top -d 2 -p 4939

top - 08:59:13 up 17:31, 1 user, load average: 0.75, 0.70, 0.58

Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie

Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 65939360k total, 6484728k used, 59454632k free, 229880k buffers

Swap: 10772220k total, 0k used, 10772220k free, 1995728k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

4939 root 20 0 4583m 1.5g 4728 S 2 2.4 100:48.79 kvm

Displaying the status of processes

ps aux

ps -lA

ps axjf

Options and parameters:

· -A: Displays information about all accessible processes on the system.

· -a: Displays information about all processes that are associated with terminals.

· -u: Displays information for processes with user IDs in the userlist.

· -x: Used together with the -a option to also display processes that have no controlling terminal.

Output format:

· l: Displays BSD long format.

· j: Displays BSD job control format.

· -f: Displays a full-format listing.

# Display bash processes.

root@HZ-UIS01-CVK01:~# ps -l

F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD

4 R 0 11338 32857 0 80 0 - 2102 - pts/2 00:00:00 ps

4 S 0 32857 32797 0 80 0 - 5428 wait pts/2 00:00:00 bash

The ps -l command lists only the processes that belong to the current login environment (the current bash shell). Their parent is the current bash process, whose ancestry extends back to the init process.

· F: Flags associated with the process.

○ 4: Used super-user privileges.

○ 1: Forked but did not exec.

· S: Process state. R: Running. S: Sleeping. D: Uninterruptible sleep (typically I/O). T: Stopped. Z: Defunct (zombie) process, terminated but not reaped by its parent.

· UID/PID/PPID: User ID of the process owner, process ID, and parent process ID.

· C: CPU usage.

· PRI/NI: Priority and Nice.

· ADDR/SZ/WCHAN: Memory related.

○ ADDR: Location of the process in memory. For a running process, a hyphen (-) is displayed.

○ SZ: Size of the core image of the process, in physical pages.

○ WCHAN: Address of the kernel function where the process is sleeping.

· TTY: Controlling terminal. For a remote login, a pseudo-terminal such as pts/2 is used.

· CMD: Command.

# Display all processes.

root@HZ-UIS01-CVK01:~# ps aux

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

root 1 0.0 0.0 24572 2484 ? Ss 11:20 0:04 /sbin/init

root 2 0.0 0.0 0 0 ? S 11:20 0:00 [kthreadd]

root 3 0.0 0.0 0 0 ? S 11:20 0:00 [ksoftirqd/0]

root 6 0.0 0.0 0 0 ? S 11:20 0:00 [migration/0]

root 7 0.0 0.0 0 0 ? S 11:20 0:00 [watchdog/0]

root 8 0.0 0.0 0 0 ? S 11:20 0:00 [migration/1]

...

root 55719 1.0 0.0 71272 3520 ? Ss 17:42 0:00 sshd: root@pts/3

root 55752 8.6 0.0 21712 4204 pts/3 Ss 17:43 0:00 -bash

root 55927 0.0 0.0 16872 1284 pts/3 R+ 17:43 0:00 ps aux

root 62570 0.0 0.0 0 0 ? S 14:43 0:00 [kworker/7:2]

root 62840 0.0 0.0 0 0 ? S 16:40 0:00 [kworker/u:0]

# Display the processes owned by a specific user (mysql).

root@HZ-UIS01-CVK01:~# ps -fu mysql

UID PID PPID C STIME TTY TIME CMD

mysql 3144 1 0 11:21 ? 00:00:46 /usr/sbin/mysqld
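To follow up on a nonzero zombie count in top, you can filter ps output for processes whose state starts with Z and print each parent PID. A zombie disappears only when its parent reaps it or the parent exits. The function name list_zombies is hypothetical. It reads the output of ps -eo pid,ppid,stat,comm from standard input.

```shell
#!/bin/bash
# list_zombies: hypothetical filter. Reads the output of
#   ps -eo pid,ppid,stat,comm
# from standard input and prints "PID PPID COMMAND" for every process
# whose state begins with Z (defunct, not yet reaped by its parent).
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Usage on a live host:
#   ps -eo pid,ppid,stat,comm | list_zombies
```

Sending a signal to the zombie itself has no effect. Investigate the parent process (PPID) instead.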

Ending a process

kill -signal PID

The following are the signal types:

· 1 SIGHUP: Hangs up or disconnects a process. It is often used to make a process restart or reload its configuration.

· 9 SIGKILL: Immediately terminates a process, without allowing it to clean up or save any data.

· 15 SIGTERM: Requests that the process terminate gracefully, allowing it to clean up any resources or save any data before exiting.
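These signals are typically combined. First send SIGTERM so that the process can clean up. Fall back to SIGKILL only if the process does not exit within a grace period. The sketch below assumes bash. The function name stop_process and the default 10-second timeout are illustrative choices, not UIS defaults.

```shell
#!/bin/bash
# stop_process: hypothetical helper. Sends SIGTERM (15) to a PID, waits up
# to a timeout for the process to exit, then sends SIGKILL (9) if needed.
# $1: PID  $2: timeout in seconds (default 10)
stop_process() {
    local pid=$1 timeout=${2:-10} waited=0
    kill -15 "$pid" 2>/dev/null || return 0     # process already gone
    while kill -0 "$pid" 2>/dev/null; do        # signal 0 only tests existence
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $pid did not exit after SIGTERM, sending SIGKILL" >&2
            kill -9 "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage (as root):
#   stop_process 4939 30
```

kill -0 also succeeds for a zombie. As a result, the helper treats a terminated process as running until its parent reaps it.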

Networking

Configuring a network interface

# Display enabled network interfaces.

root@HZ-UIS01-CVK01:/etc/network# ifconfig

vs_st6251d: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500

inet 112.113.20.116 netmask 255.255.255.0 broadcast 112.113.20.255

inet6 fe80::4abd:3dff:fe35:364f prefixlen 64 scopeid 0x20<link>

ether 48:bd:3d:35:36:4f txqueuelen 1000 (Ethernet)

RX packets 92927617 bytes 259005158671 (241.2 GiB)

RX errors 0 dropped 197 overruns 0 frame 0

TX packets 86270427 bytes 264220608508 (246.0 GiB)

TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vs_storage: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500

inet6 fe80::888d:4ff:fe03:2b42 prefixlen 64 scopeid 0x20<link>

ether 8a:8d:04:03:2b:42 txqueuelen 1000 (Ethernet)

RX packets 2096773 bytes 113740663 (108.4 MiB)

RX errors 0 dropped 1383 overruns 0 frame 0

TX packets 49 bytes 3718 (3.6 KiB)

TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vswit923de: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500

inet 10.125.36.163 netmask 255.255.254.0 broadcast 10.125.37.255

inet6 fe80::4abd:3dff:fe35:364d prefixlen 64 scopeid 0x20<link>

ether 06:dc:dd:6a:a4:6b txqueuelen 1000 (Ethernet)

RX packets 12129953 bytes 35114923993 (32.7 GiB)

RX errors 0 dropped 195 overruns 0 frame 0

TX packets 10305733 bytes 2409083342 (2.2 GiB)

TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vswitch0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500

inet6 fe80::707b:10ff:fe08:6a4a prefixlen 64 scopeid 0x20<link>

ether 72:7b:10:08:6a:4a txqueuelen 1000 (Ethernet)

RX packets 2094681 bytes 111925332 (106.7 MiB)

RX errors 0 dropped 197 overruns 0 frame 0

TX packets 30 bytes 2196 (2.1 KiB)

TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

...

The ifconfig -a command displays all network interfaces, including disabled network interfaces.

# Display information about a network interface.

[root@autoCvk3 ~]# /opt/bin/ovs_dbg_listports

vs_st6251d (linux, vs_storage)

vs_st6251d 48bd3d35364f 1500

eth3 48bd3d35364f 1500

veth veth6251dlinux baf8cb16ce6d 1500 veth6251dovs

sub storage_ex 112.113.19.116/24

sub storage_in 112.113.20.116/24

vswit923de (linux, vswitch0)

vswit923de 06dcdd6aa46b 10.125.36.163/23 1500

eth2 48bd3d35364d 1500

veth veth923delinux 06dcdd6aa46b 1500 veth923deovs

# Shut down a network interface.

# ifdown vs_st6251d

# Start a network interface.

# ifup vs_st6251d

# Restart a network interface.

# /etc/init.d/networking restart

Starting with version E0883L01 of UIS 8.0, the virtual network uses the Linux engine instead of OVS. To bring aggregated ports down or up, operate on the bond interfaces.

Displaying physical NIC information

root@UIS-CVK02:~# ethtool eth1

Settings for eth1:

Supported ports: [ TP ]

Supported link modes: 10baseT/Half 10baseT/Full

100baseT/Half 100baseT/Full

1000baseT/Half 1000baseT/Full

Supported pause frame use: No

Supports auto-negotiation: Yes

Advertised link modes: 10baseT/Half 10baseT/Full

100baseT/Half 100baseT/Full

1000baseT/Half 1000baseT/Full

Advertised pause frame use: Symmetric

Advertised auto-negotiation: Yes

Link partner advertised link modes: 10baseT/Half 10baseT/Full

100baseT/Half 100baseT/Full

1000baseT/Full

Link partner advertised pause frame use: No

Link partner advertised auto-negotiation: Yes

Speed: 1000Mb/s

Duplex: Full

Port: Twisted Pair

PHYAD: 1

Transceiver: internal

Auto-negotiation: on

MDI-X: on

Supports Wake-on: g

Wake-on: g

Current message level: 0x000000ff (255)

drv probe link timer ifdown ifup rx_err tx_err

Link detected: yes

Displaying network statistics

netstat -[atunlp]

Options and parameters:

· -a: Displays the state of all sockets and all routing table entries.

· -t: Lists TCP sockets.

· -u: Lists UDP sockets.

· -n: Displays network addresses and port numbers as numbers.

· -l: Lists only listening sockets.

· -p: Displays the PID and name of the process that owns each socket.

# Display network connection statistics for the service that uses port 8080.

root@HZ-UIS01-CVK01:/etc/network# netstat -an | grep 8080

tcp6 0 0 :::8080 :::* LISTEN

tcp6 0 0 192.168.1.11:8080 10.165.136.197:55954 ESTABLISHED

tcp6 0 0 192.168.1.11:8080 10.165.136.197:55989 TIME_WAIT

tcp6 0 0 192.168.1.11:8080 10.165.136.197:55990 FIN_WAIT2

tcp6 0 0 192.168.1.11:8080 192.168.1.211:53366 ESTABLISHED

tcp6 0 0 192.168.1.11:8080 192.168.1.211:54850 TIME_WAIT
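In a listing like the port 8080 output above, a buildup of TIME_WAIT or FIN_WAIT2 entries is easier to spot as counts. The following sketch tallies TCP states for one local port from netstat -ant output read on standard input. The function name count_tcp_states is hypothetical. It assumes the default netstat column layout, with the local address in column 4 and the state in column 6.

```shell
#!/bin/bash
# count_tcp_states: hypothetical filter. Reads "netstat -ant" output from
# standard input and prints "STATE COUNT" for connections whose local
# address ends with the given port, sorted by state name.
# $1: local port number, for example 8080
count_tcp_states() {
    awk -v port="$1" '
        $1 ~ /^tcp/ {
            n = split($4, parts, ":")   # last part is the port (IPv4 or IPv6)
            if (parts[n] == port) count[$6]++
        }
        END { for (s in count) print s, count[s] }
    ' | LC_ALL=C sort
}

# Usage on a live host:
#   netstat -ant | count_tcp_states 8080
```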

# Display routing information for the system.

root@HZ-UIS01-CVK01:/etc/network# netstat -rn

Kernel IP routing table

Destination Gateway Genmask Flags MSS Window irtt Iface

0.0.0.0 192.168.1.254 0.0.0.0 UG 0 0 0 vswitch2

192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2

192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage

192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app

Capturing packets on a network

tcpdump

Options and parameters:

· -a: Converts network and broadcast addresses to names.

· -d: Displays the compiled packet-matching code in a human readable form on standard output and stops.

· -dd: Displays the matching packet code in the format of a C program segment.

· -ddd: Displays the matching packet code in decimal format.

· -e: Prints the data link layer header on each output line.

· -t: Does not print timestamps on each output line.

· -vv: Outputs detailed packet information.

· -c: Stops tcpdump after receiving the specified number of packets.

· -i: Specifies the network interface to listen on.

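· -w: Writes raw packets to a file instead of parsing and printing them.

· -s: Sets the snapshot length, which is the number of bytes captured from each packet. A value of 0 captures entire packets.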

Example:

tcpdump -i vswitch2 -s 0 -w /tmp/test.cap host 200.1.1.1 &

Displaying routing information

# Display routing information.

root@HZ-UIS01-CVK01:/etc/network# route -n

Kernel IP routing table

Destination Gateway Genmask Flags Metric Ref Use Iface

0.0.0.0 192.168.1.254 0.0.0.0 UG 100 0 0 vswitch2

192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2

192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage

192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app

# Add static routing information to access the network at 10.10.10.0/24.

# route add -net 10.10.10.0 netmask 255.255.255.0 gw 192.168.2.254

root@HZ-UIS01-CVK01:/etc/network#

root@HZ-UIS01-CVK01:/etc/network# route -n

Kernel IP routing table

Destination Gateway Genmask Flags Metric Ref Use Iface

0.0.0.0 192.168.1.254 0.0.0.0 UG 100 0 0 vswitch2

10.10.10.0 192.168.2.254 255.255.255.0 UG 0 0 0 vswitch-storage

192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2

192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage

192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app

# Delete routing information.

# route del -net 10.10.10.0 netmask 255.255.255.0 gw 192.168.2.254

root@HZ-UIS01-CVK01:/etc/network# route -n

Kernel IP routing table

Destination Gateway Genmask Flags Metric Ref Use Iface

0.0.0.0 192.168.1.254 0.0.0.0 UG 100 0 0 vswitch2

192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2

192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage

192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app

A static route added with the route command is stored only in memory and is lost when the host restarts. To make the route persistent, add the command to a system startup script so that it is executed at every boot.

For example, use the vi editor on the host to edit the /etc/rc.local file.

Add the route commands before the exit 0 line, because commands after exit 0 are not executed. The modification takes effect at the next restart.

root@HZ-UIS01-CVK01:/etc/network# vi /etc/rc.local

#!/bin/sh -e

# rc.local

# This script is executed at the end of each multiuser runlevel.

# Make sure that the script will "exit 0" on success or any other

# value on error.

# In order to enable or disable this script just change the execution

# bits.

# By default this script does nothing.

route add -net 192.168.5.0 netmask 255.255.255.0 gw 192.168.2.254

ulimit -s 10240

ulimit -c 1024

touch /var/run/h3c_UIS_cvk

/usr/bin/set-printk-console 2

exit 0
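If you edit /etc/rc.local from a script instead of by hand, avoid adding the same route twice and keep it above the final exit 0 line. The sketch below does both with awk. The function name add_rc_local_line is hypothetical. The file path is a parameter so you can try the function on a copy first.

```shell
#!/bin/bash
# add_rc_local_line: hypothetical helper. Inserts a command line before the
# first "exit 0" line of an rc.local-style file, unless the exact line is
# already present. If no "exit 0" line exists, the line is appended.
# $1: path to the file (for example /etc/rc.local)
# $2: command line to add
add_rc_local_line() {
    local file=$1 line=$2 tmp rc
    if grep -qxF -- "$line" "$file"; then
        return 0                                   # already configured
    fi
    tmp=$(mktemp) || return 1
    awk -v l="$line" '
        /^exit 0$/ && !done { print l; done = 1 }
        { print }
        END { if (!done) print l }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"     # cat keeps the file mode
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage (as root):
#   add_rc_local_line /etc/rc.local \
#     "route add -net 192.168.5.0 netmask 255.255.255.0 gw 192.168.2.254"
```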

Disk management commands

Displaying the disk capacity

df [-ahikHTm] [directory or file]

Options and parameters:

· -a: Lists all file systems, including system-specific file systems such as /proc.

· -k: Displays the capacity of each file system in KBytes.

· -m: Displays the capacity of each file system in MBytes.

· -h: Displays the capacity of each file system in a human readable format, such as GBytes, MBytes, and KBytes.

· -H: Uses M=1000K instead of M=1024K for displaying capacities in larger units.

· -T: Displays the file system type of each partition, such as ext3.

· -i: Displays the number of inodes instead of disk usage.

# Display the partition size.

root@HZ-UIS01-CVK01:/etc/network# df -h

Filesystem Size Used Avail Use% Mounted on

/dev/sda1 28G 2.4G 25G 9% /

udev 32G 4.0K 32G 1% /dev

tmpfs 13G 396K 13G 1% /run

none 5.0M 0 5.0M 0% /run/lock

none 32G 17M 32G 1% /run/shm

/dev/sda6 241G 48G 181G 21% /vms

# Display partition sizes together with file system types.

root@HZ-UIS01-CVK01:/etc/network# df -Th

Filesystem Type Size Used Avail Use% Mounted on

/dev/sda1 ext4 28G 2.4G 25G 9% /

udev devtmpfs 32G 4.0K 32G 1% /dev

tmpfs tmpfs 13G 396K 13G 1% /run

none tmpfs 5.0M 0 5.0M 0% /run/lock

none tmpfs 32G 17M 32G 1% /run/shm

/dev/sda6 ext4 241G 48G 181G 21% /vms
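During routine checks, the Use% column is often compared with a threshold. The following sketch prints every mount point whose usage is at or above a given percentage. It reads df -P output from standard input. -P selects the POSIX output format, which keeps each file system on one line. The function name df_over_threshold and the 80% example threshold are assumptions.

```shell
#!/bin/bash
# df_over_threshold: hypothetical filter. Reads "df -P" output from standard
# input and prints "MOUNT USE%" for file systems at or above the threshold.
# $1: threshold in percent, for example 80
df_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)                  # "21%" -> "21"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage on a live host:
#   df -P | df_over_threshold 80
```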

Displaying the disk usage

du [-ahskm] file or directory name

Options and parameters:

· -a: Lists the disk usage of every file, not only of directories.

· -h: Displays sizes in a human readable format, such as G or M.

· -s: Displays only the total for each specified file or directory.

· -S: Does not include the size of subdirectories in the size of each directory. This differs from -s, which summarizes everything under each argument into a single total.

· -k: Displays the capacity in KBytes.

· -m: Displays the capacity in MBytes.

Example:

root@HZ-UIS01-CVK01:/vms# du -sh *

15G images

11G isos

16K lost+found

3.4G rhel-server-6.1-x86_64-dvd.iso

4.0K share

4.0K share-test

17G templet

4.0K test
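To find what is consuming space under a directory such as /vms, du is commonly combined with sort. This sketch lists the largest entries first. It uses kilobyte counts (-k), which sort numerically without relying on sort's handling of -h suffixes. The function name largest_entries and the default count of 10 are illustrative. Names that start with a dot are not matched by the * pattern.

```shell
#!/bin/bash
# largest_entries: hypothetical helper. Prints the N largest files or
# directories directly under a directory, largest first, in KBytes.
# $1: directory  $2: number of entries to show (default 10)
largest_entries() {
    local dir=$1 count=${2:-10}
    du -sk -- "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Usage:
#   largest_entries /vms 5
```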

Partitioning a disk

fdisk [-l] disk name

Options and parameters:

-l: Lists the partition tables for the specified disk.

If no disk is specified, the system lists all partitions of all disks in the system.

Example:

root@HZ-UIS01-CVK01:~# fdisk -l

Disk /dev/sda: 300.0 GB, 299966445568 bytes

255 heads, 63 sectors/track, 36468 cylinders, total 585871964 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 262144 bytes / 262144 bytes

Disk identifier: 0x00051ce2

Device Boot Start End Blocks Id System

/dev/sda1 * 512 58593791 29296640 83 Linux

/dev/sda2 58594302 585871359 263638529 5 Extended

Partition 2 does not start on physical sector boundary.

/dev/sda5 58594304 80138751 10772224 82 Linux swap / Solaris

/dev/sda6 80139264 585871359 252866048 83 Linux

Disk /dev/sdb: 4294 MB, 4294967296 bytes

133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000

Disk /dev/sdb doesn't contain a valid partition table

# Create a partition on a disk.

root@HZ-UIS01-CVK01:~# fdisk /dev/sdb

Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel

Building a new DOS disklabel with disk identifier 0xeb665aa3.

Changes will remain in memory only, until you decide to write them.

After that, of course, the previous content won't be recoverable.

Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): m

Command action

a toggle a bootable flag

b edit bsd disklabel

c toggle the dos compatibility flag

d delete a partition

l list known partition types

m print this menu

n add a new partition

o create a new empty DOS partition table

p print the partition table

q quit without saving changes

s create a new empty Sun disklabel

t change a partition's system id

u change display/entry units

v verify the partition table

w write table to disk and exit

x extra functionality (experts only)

Command (m for help): p

Disk /dev/sdb: 4294 MB, 4294967296 bytes

133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xeb665aa3

Device Boot Start End Blocks Id System

Command (m for help): n

Partition type:

p primary (0 primary, 0 extended, 4 free)

e extended

Select (default p): p

Partition number (1-4, default 1): 1

First sector (2048-8388607, default 2048)

Using default value 2048

Last sector, +sectors or +size{K,M,G} (2048-8388607, default 8388607): 4000000

Command (m for help): n

Partition type:

p primary (1 primary, 0 extended, 3 free)

e extended

Select (default p): p

Partition number (1-4, default 2): 2

First sector (4000001-8388607, default 4000001)

Using default value 4000001

Last sector, +sectors or +size{K,M,G} (4000001-8388607, default 8388607): +500M

Command (m for help): p

Disk /dev/sdb: 4294 MB, 4294967296 bytes

133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xeb665aa3

Device Boot Start End Blocks Id System

/dev/sdb1 2048 4000000 1998976+ 83 Linux

/dev/sdb2 4000001 5024000 512000 83 Linux

Command (m for help): w

The partition table has been altered!

Calling ioctl() to re-read partition table.

Syncing disks.

# Display disk partition information.

root@HZ-UIS01-CVK01:~# fdisk -l /dev/sdb

Disk /dev/sdb: 4294 MB, 4294967296 bytes

133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors

Units = sectors of 1 * 512 = 512 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0xeb665aa3

Device Boot Start End Blocks Id System

/dev/sdb1 2048 4000000 1998976+ 83 Linux

/dev/sdb2 4000001 5024000 512000 83 Linux
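The Blocks column in this fdisk output is in 1-KiB units, so you can check it from the Start and End sectors: (End - Start + 1) * 512 / 1024. A trailing + (as in 1998976+) marks an odd sector count that does not fill the last KiB. Entering +500M as the last sector adds 500 * 1024 * 1024 / 512 = 1024000 sectors, so End = 4000001 + 1024000 - 1 = 5024000. The helper below sketches that arithmetic. It is hypothetical and assumes the 512-byte logical sectors reported above.

```shell
#!/bin/bash
# partition_kib: hypothetical helper. Computes the size of a partition in
# KiB (the fdisk "Blocks" column) from its start and end sectors, assuming
# 512-byte logical sectors. Appends "+" when the sector count is odd.
# $1: start sector  $2: end sector
partition_kib() {
    local sectors=$(( $2 - $1 + 1 ))
    local kib=$(( sectors * 512 / 1024 ))
    if (( sectors % 2 )); then
        echo "${kib}+"
    else
        echo "$kib"
    fi
}

partition_kib 2048 4000000      # /dev/sdb1, prints 1998976+
partition_kib 4000001 5024000   # /dev/sdb2 (+500M), prints 512000
```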

Making a file system

mkfs [-t file system format] disk name

Options and parameters:

-t: Specifies the file system type, for example, ext2, ext3, ext4, or ocfs2.

# Make an ext3 file system on /dev/sdb1.

root@HZ-UIS01-CVK01:~# mkfs -t ext3 /dev/sdb1

mke2fs 1.42 (29-Nov-2011)

Filesystem label=

OS type: Linux

Block size=4096 (log=2)

Fragment size=4096 (log=2)

Stride=0 blocks, Stripe width=0 blocks

125184 inodes, 499744 blocks

24987 blocks (5.00%) reserved for the super user

First data block=0

Maximum filesystem blocks=515899392

16 block groups

32768 blocks per group, 32768 fragments per group

7824 inodes per group

Superblock backups stored on blocks:

32768, 98304, 163840, 229376, 294912

Allocating group tables: done

Writing inode tables: done

Creating journal (8192 blocks): done

Writing superblocks and filesystem accounting information: done

root@HZ-UIS01-CVK01:~#

# Make an ocfs2 file system on /dev/sdb2.

root@HZ-UIS01-CVK01:~# mkfs -t ocfs2 /dev/sdb2

mkfs.ocfs2 1.6.3

Cluster stack: classic o2cb

Label:

Features: sparse backup-super unwritten inline-data strict-journal-super xattr

Block size: 1024 (10 bits)

Cluster size: 4096 (12 bits)

Volume size: 524288000 (128000 clusters) (512000 blocks)

Cluster groups: 17 (tail covers 5120 clusters, rest cover 7680 clusters)

Extent allocator size: 2097152 (1 groups)

Journal size: 16777216

Node slots: 2

Creating bitmaps: done

Initializing superblock: done

Writing system files: done

Writing superblock: done

Writing backup superblock: 0 block(s)

Formatting Journals: done

Growing extent allocator: done

Formatting slot map: done

Formatting quota files: done

Writing lost+found: done

mkfs.ocfs2 successful

root@HZ-UIS01-CVK01:~#

Checking a disk

fsck [-t file system format] [-ACay] disk name

Options and parameters:

· -t: Specifies the file system type. This option is typically not required, because Linux automatically identifies the file system type from the superblock.

· -A: Checks the file systems listed in /etc/fstab. This command is typically executed during the boot process.

· -a: Automatically repairs detected file system errors without prompting, so you do not have to keep pressing y.

· -y: Similar to -a, but some file systems support only the -y option.

· -C: Displays a progress bar during the check.

# Check the /dev/sdb1 partition.

root@HZ-UIS01-CVK01:~# fsck -C /dev/sdb1

fsck from util-linux 2.20.1

e2fsck 1.42 (29-Nov-2011)

/dev/sdb1: clean, 11/125184 files, 16807/499744 blocks
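Repairing a mounted file system with fsck can damage it, so unmount the partition before you check it. The sketch below is a pre-check that looks up the device in /proc/mounts. The function name is hypothetical. It matches only the device path exactly as it appears in the mounts table and does not resolve symlinks such as /dev/disk/by-uuid paths.

```shell
#!/bin/bash
# is_mounted: hypothetical helper. Returns 0 if the device appears as a
# mounted source in the mounts table, 1 otherwise.
# $1: device path, for example /dev/sdb1
# $2: optional mounts file (defaults to /proc/mounts)
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage (as root):
#   if is_mounted /dev/sdb1; then
#       echo "/dev/sdb1 is mounted, unmount it before running fsck" >&2
#   else
#       fsck -C /dev/sdb1
#   fi
```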

Mounting a file system

mount [-t file system type] [-L label name] [-o additional option] [-n] disk file name mount point

Options and parameters:

· -a: Mounts all file systems listed in the /etc/fstab configuration file.

· -l: Also displays the label of each file system in the mount output.

· -t: Specifies the type of file system to be mounted.

· -n: Mounts a file system without writing to /etc/mtab. By default, the system writes the actual mounting information to /etc/mtab in real time so that other programs can use it.

· -L: Mounts the partition that has the specified label.

· -o: Specifies additional mount options, for example, an account, a password, or read/write permission.

# Mount /dev/sdb1 to /mnt.

root@HZ-UIS01-CVK01:~# mount /dev/sdb1 /mnt

root@HZ-UIS01-CVK01:~# df -Th

Filesystem Type Size Used Avail Use% Mounted on

/dev/sda1 ext4 28G 5.7G 21G 22% /

udev devtmpfs 32G 4.0K 32G 1% /dev

tmpfs tmpfs 13G 408K 13G 1% /run

none tmpfs 5.0M 0 5.0M 0% /run/lock

none tmpfs 32G 17M 32G 1% /run/shm

/dev/sda6 ext4 241G 48G 181G 21% /vms

/dev/sdb1 ext3 1.9G 35M 1.8G 2% /mnt

Unmounting a file system

umount [-fn] disk file name

Options and parameters:

· -f: Forcibly unmounts a file system. Use this option, for example, when data can no longer be read from a network file system (NFS) because the server is unreachable.

· -n: Unmounts a file system without updating /etc/mtab.

Example:

root@HZ-UIS01-CVK01:~# df -Th

Filesystem Type Size Used Avail Use% Mounted on

/dev/sda1 ext4 28G 5.7G 21G 22% /

udev devtmpfs 32G 4.0K 32G 1% /dev

tmpfs tmpfs 13G 408K 13G 1% /run

none tmpfs 5.0M 0 5.0M 0% /run/lock

none tmpfs 32G 17M 32G 1% /run/shm

/dev/sda6 ext4 241G 48G 181G 21% /vms

/dev/sdb1 ext3 1.9G 35M 1.8G 2% /mnt

root@HZ-UIS01-CVK01:~#

root@HZ-UIS01-CVK01:~# umount /mnt

root@HZ-UIS01-CVK01:~# df -Th

Filesystem Type Size Used Avail Use% Mounted on

/dev/sda1 ext4 28G 5.7G 21G 22% /

udev devtmpfs 32G 4.0K 32G 1% /dev

tmpfs tmpfs 13G 408K 13G 1% /run

none tmpfs 5.0M 0 5.0M 0% /run/lock

none tmpfs 32G 17M 32G 1% /run/shm

/dev/sda6 ext4 241G 48G 181G 21% /vms

Writing data to a disk

Use the sync command to write data buffered in memory to disk.

Example:

root@HZ-UIS01-CVK01:~# sync

root@HZ-UIS01-CVK01:~#

Euler edition restrictions

To maintain system security and stability and prevent unintended background operations, Euler OS restricts certain background activities.

Disabled commands

The following commands are disabled:

· rm

· rpm

· which

· grep

· mv

· vi

· vim

· ps

· top

· bash

· sh

· find

· yum

· dd

· chmod

The system displays "command not found" when you enter these commands.

Disabled command autocompletion

Pressing Tab no longer autocompletes commands during input.