- Released At: 20-10-2025
H3C UIS Manager Maintenance Guide
Document version: 5W100-20251017
Copyright © 2025 New H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.
Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.
The information in this document is subject to change without notice.
Contents
Identifying the cluster HA feature
Identifying the shared storage in the cluster
Identifying the uptime of a host
Identifying host performance monitoring information
Identifying vSwitch information
Identifying physical NIC status
Identifying the running status of CAStools
Identifying VM performance monitoring statistics
Identifying VM backup information
Identifying license information
Configuration cautions and guidelines
Starting or shutting down a UIS host
IP address and host name change
Replacing a disk on a CVK host
Changing the password for accessing UIS Manager
Changing the root password of a host from the Web interface
Scaling out and scaling in a cluster
Performing a heterogeneous or homogeneous migration
Obtaining the XML file of the VM
Identifying the storage volume for VM disk files
Copying the XML file of the VM to the target host
Clearing VM data on the original host
Replacing the backup node in a stateful failover system
Displaying technical support service information
Replacing SSDs with NVMe drives
Configuring storage disaster recovery
Collecting logs of the UIS Manager
Collecting logs from the Web interface
Collecting logs at the CLI of a CVK host
Collecting logs of a VM operating system
Collecting logs of a Windows operating system
Viewing logs of a Windows operating system
Collecting logs of a Linux operating system
Troubleshooting tools and utilities
/var/log/calamari/calamari.log
/var/log/onestor_cli/onestor_cli.log
Distributed storage maintenance
Rebalancing data placement when data imbalance occurs
In the Handy HA scenario, the system is inaccessible through the management HA IP
Resolving host issues caused by a full system disk
Issues caused by network failure
Handling failures to add or delete hosts
Deleting a storage node offline and restoring the node
Identifying the data partitions to which the OSDs are mounted
The UIS Web interface shows a slow disk alarm.
Compute cluster creation failure
Deletion failure prompt for successful host deletion
OSD process terminated unexpectedly
Network suboptimal health alarm
Down monitoring node due to high system disk usage
Down monitoring node due to network error
Extent backup file decompression
Shared storage space reclamation
Releasing space of a shared volume by editing the VM bus type
Releasing space of a shared volume by deleting files
Get responses not received by an NMS
Data of a value-added service in the memory is different from that in the database
Data inconsistency occurs if you mount a volume on a Windows client and create a snapshot online
The state of a snapshot is Creating, Deleting, or Restoring
When the Intel ixgbe network adapter is enabled with load balancing, storage access gets slow
Using a QoS policy with low bandwidth and IOPS limits makes the storage disks of a client slow
Failure to recognize an encryption dongle by VMs
After a USB device is plugged into a CVK host, the host cannot recognize the USB device
Windows repair operations and steps
Independent deployment failure
CAS authentication service exception
UIS 2000 G6 hardware HA does not take effect
Operations and maintenance monitoring data fails to be displayed
Host discovery: Hosts have empty serial numbers or the same serial number.
In the Handy HA scenario, you cannot access the Web interface by using the HA IP.
Interoperation with a third-party alarm server
Configuring a third-party alarm server on the UIS platform
Configuring UC 2.0 to monitor UIS alarms
Disabled command autocompletion
Routine maintenance
Stable operation of the UIS system requires routine maintenance, which typically includes reviewing alarms, checking cluster status, host information, virtual machine (VM) status, and license information, and reviewing logs.
Reviewing alarms
In the top right corner, the UIS platform main page displays indicators for the critical, major, minor, and information alarms generated during UIS system operation.
If critical or major alarms are displayed, the UIS system might be operating abnormally and requires immediate troubleshooting.
By clicking the corresponding alarm indicator, you can access the associated real-time alarm page. Alternatively, you can navigate to the Alarm Management > Real-Time Alarm page.
You can perform troubleshooting based on the alarm source, type, content, and the last alarm time on the real-time alarm page.
Performing health check
The UIS platform provides a shortcut in the top right corner that allows you to perform health check, resource analysis, storage cleanup, resource export, VM restoration, and zombie VM operations.
Select Health Check to enter the health check page. You can perform a health check on the specified modules.
You can print and export the health check results.
If a failure is detected in the health check, for example, a RAID controller or hard drive cache failure, you can click Remediation to resolve the issue.
Reviewing operation logs
The Operation Logs page records historical operations in the UIS system, including manual user operations at the front end and automatic system operations at the back end.
Each operation log entry provides key information, including the operator name, finish time, login address, operation description, result, and failure reason.
If the result of an operation is Failed, troubleshoot the failure based on the failure reason. If a large number of operation logs exist, you can download them for troubleshooting and analysis.
The following figure shows the UIS Manager operation logs.
Identifying cluster status
Identifying the cluster HA feature
Verify that the HA feature is enabled for the cluster. If HA is not enabled and a CVK host in the cluster becomes abnormal, the VMs on that CVK host cannot migrate to other CVK hosts in the cluster.
After enabling HA for the cluster, you can enable service area HA. When the service area HA becomes faulty or a connectivity issue occurs for a VM, the VM can migrate to another host.
You can specify the boot priority for the VMs in the cluster. Options include Low, Medium, and High. The default boot priority is Medium. The VM boot priority is set upon adding or editing VMs. The boot priority specifies the startup order of VMs after a host failure occurs. The VMs restart on the new host according to the specified boot priorities. The VMs with the high, medium, and low boot priorities start up in descending order until all VMs restart or no more cluster resources are available.
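The restart ordering described above can be illustrated with a minimal sketch. It is not the HA scheduler itself: the `Vm` class, the `memory_mb` field, and the single memory budget that stands in for "cluster resources" are all hypothetical, and the sketch assumes that restarts stop at the first VM that no longer fits.

```python
from dataclasses import dataclass

# Lower rank restarts first. The names mirror the options listed in the text.
PRIORITY_RANK = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Vm:
    name: str
    priority: str = "Medium"  # default boot priority per the text
    memory_mb: int = 1024     # hypothetical resource figure for this sketch


def restart_order(vms, free_memory_mb):
    """Return the names of the VMs that would restart, highest priority first.

    Assumption: restarts stop as soon as a VM does not fit in the
    remaining memory budget.
    """
    started = []
    # sorted() is stable, so VMs with equal priority keep their input order.
    for vm in sorted(vms, key=lambda v: PRIORITY_RANK[v.priority]):
        if vm.memory_mb > free_memory_mb:
            break
        started.append(vm.name)
        free_memory_mb -= vm.memory_mb
    return started
```

For example, with a Low VM, a High VM, and a Medium VM and a budget that fits only two of them, the High and Medium VMs restart and the Low VM does not.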
Identifying the shared storage in the cluster
During VM migration, if the target host has no shared storage mounted for VMs, the migration will fail.
Identifying host information
Identifying host status
View host status on the Hosts page to identify whether abnormal hosts exist.
Check the CPU and memory usage of each host, and pay special attention to the hosts with usage exceeding 80%.
Identifying the uptime of a host
On the Summary page of a CVK host, you can see the detailed host configuration information. From the Uptime field, you can identify whether the host has been rebooted recently.
Identifying host performance monitoring information
On the Performance Monitoring page of the CVK host, you can see the CPU usage, memory usage, I/O throughput, network throughput, disk usage, and partition usage of the host.
Identifying host CPU usage
On the Performance Monitoring > CPU Usage (%) page, view CPU usage over a longer time range.
Identifying host memory usage
On the Performance Monitoring > Memory Usage (%) page, view memory usage over a longer time range.
Identifying host I/O throughput
On the Performance Monitoring > I/O Throughput (KBps) page, view I/O throughput over a longer time range.
Identifying host network throughput
On the Performance Monitoring > Network Throughput (Mbps) page, view the network throughput of each physical NIC over a longer time range.
Identifying host disk usage
On the Performance Monitoring > Disk Requests (IOPS) page, you can see the host disk usage information.
Identifying host partition usage
On the Performance Monitoring > Partition Usage page, you can see the host partition usage information.
Identifying vSwitch information
Verify that vSwitch names are consistent across the hosts in the cluster.
On the vSwitches page of a host, identify whether the vSwitches are active. If a vSwitch is in an abnormal state, identify whether the physical NIC is normal.
Make sure only one gateway is configured for all vSwitches of the host.
Identifying physical NIC status
On the Physical NICs page, identify whether the attributes of the host's physical NICs, such as rate and state, are normal.
Abnormal physical NICs will affect vSwitch performance.
Identifying VM status
Identifying the running status of CAStools
On the Summary page of the VM, identify whether CAStools is installed on the VM and running correctly.
Verifying disk and NIC types
Verifying the disk type
On the Disk tab of the VM modification page, verify that the device object is Virtio disk (which significantly improves disk performance), the source path is a shared storage path, and the cache mode is directsync (the recommended setting).
Verifying the NIC type
On the Network tab of the VM modification page, verify that the device model is high-speed NIC and that kernel acceleration is enabled (which significantly improves NIC performance).
Identifying VM performance monitoring statistics
On the Performance Monitoring page of the VM, you can see the CPU usage, memory usage, I/O throughput, network throughput, disk usage, and partition usage of the VM.
Identifying VM CPU usage
On the Performance Monitoring > CPU Usage (%) page, view CPU usage over a longer time range.
Identifying VM memory usage
On the Performance Monitoring > Memory Usage (%) page, view memory usage over a longer time range.
Identifying VM I/O throughput
On the Performance Monitoring > I/O Throughput (KBps) page, view I/O throughput over a longer time range.
Identifying VM network throughput
On the Performance Monitoring > Network Throughput (Mbps) page, view the network throughput of each NIC of the VM over a longer time range.
Identifying VM disk usage
On the Performance Monitoring > Disk Usage page, you can see the VM disk usage information.
Identifying VM partition usage
On the Performance Monitoring > Partition Usage page, you can see VM partition usage information.
Identifying VM backup information
On the Backup Management page of a VM, you can see the backup history of the VM. As a best practice, back up all core VMs on the UIS platform.
Identifying license information
The UIS system typically uses a UIS Manager license, a CAS license, and a distributed storage license. Use official licenses at official deployment sites. You can use temporary licenses at test or temporary deployment sites. To prevent the expiration of a temporary license from affecting UIS system usage, update the temporary license in advance.
The following figure shows the licensing page of the UIS Manager component.
Managing alarms
The alarm management feature collects and displays statistics on the alarms that are of concern to operators. In the current software version, UIS collects statistics on host resource alarms, VM resource alarms, cluster resource alarms, failure alarms, security alarms, other alarms, and distributed storage resource alarms.
Users can configure alarm thresholds for indexes such as the CPU usage and memory usage of hosts or VMs. When an index value reaches the alarm threshold, an alarm is generated and reported. Users can view the reported alarms in the real-time alarm list. Alarm filtering allows users to filter out alarms that are not of concern. Such alarms are not reported. In addition, the system supports sending alarms to users through email or SMS messages.
Managing CAS resources
This feature allows you to manage clusters, hosts, and VMs in the CAS management platform. You can perform operations such as suspending, resuming, hibernating, rebooting, and cloning VMs as templates in the CAS platform, enabling virtual resource management, data backup and recovery, and resource sharing for CAS resources.
Managing UIS resources
This feature allows you to manage clusters, hosts, and VMs in the UIS management platform. You can perform operations such as suspending, resuming, hibernating, rebooting, and cloning VMs as templates in the UIS platform, enabling virtual resource management, data backup and recovery, and resource sharing for UIS resources.
Backup center
Backup center centrally manages backup history, backup policies, and backup configuration on the management platform, including VM backup and management platform backup.
VM backup
VM backup on the management platform includes backup history, backup policies, backup pools, and backup parameters.
Platform backup
Management data backup is used for automatic scheduled backups or manual immediate backups of relevant configuration data for the hyper-converged management platform, including database, version information, and configuration files. The backup files can be saved locally on the host where the hyper-converged management platform is located, or on a remote server (in a stateful failover environment, only backup to remote servers is supported). You can view and download historical backup data in the backup history, as well as upload or import system backup files. In the event of system failure, the historical backup data can be used to restore data and configuration files to the current system.
Configuration cautions and guidelines
See H3C UIS Manager Configuration Cautions and Guidelines.
See H3C UIS Manager Data Loss Prevention Best Practices.
Change operations
If issues occur while the UIS system is running, follow the rules in this section to resolve them. Otherwise, services on the live network will be affected.
Upgrading UIS software
See H3C UIS Upgrade Guide.
Handling hardware failure
See H3C UIS Hyper-Converged Infrastructure Component Replacement Configuration Guide.
Starting or shutting down a UIS host
When you perform comprehensive maintenance for the UIS system, you must power the devices on or off in the required order. Failure to do so can damage the service system. Before powering on the device, make sure the health is 100%.
For more information, see H3C UIS Hyper-Converged Infrastructure Node Shutdown Configuration Guide.
IP address and host name change
CAUTION:
· To change the root password for a CVK host in the system, access the Web interface of UIS Manager. You cannot change the root password for a CVK host from its command shell.
· If you delete a CVK host when the shared storage of the CVK host is suspended, the shared storage will be automatically deleted. Therefore, you must mount the shared storage to the CVK host again after the CVK host is added again.
· When the number of nodes is equal to or less than four or when the host for which you want to change the IP address or host name is a node in the stateful failover system (for example, primary node, backup node, quorum node, or Handy node), you cannot modify IP addresses through directly deleting hosts.
· This method is applicable to changes to the host management network IP, storage front-end IP, storage back-end IP, and host name.
After the UIS system is deployed, you might need to modify the IP addresses or host names of hosts in the UIS system.
After a CVK host is added to the UIS cluster, you can modify the IP address or host name through the method provided by the Xconsole interface, as shown in the figure below. To do that, you must first delete the CVK host from the UIS system.
If the CVK host has shared storage enabled or runs VMs, it cannot be deleted. To delete the host in this case, you must first stop or migrate VMs and pause or delete the shared file system.
After the host is deleted, you can add the host through host expansion. During the host expansion process, you can manually configure an IP address for the host and select the corresponding NIC interface, and then add the host back to the cluster. Then, you can migrate the VMs back to the host.
CAUTION:
· Make sure the IP address you enter can communicate with the management network and the internal and external storage networks of the original cluster. Otherwise, the host cannot be added.
· The IP address settings are planned in the deployment phase. You must determine the IP address settings at the beginning, because you cannot modify the IP address settings later.
Replacing a disk on a CVK host
When a disk in the cluster fails, it cannot be directly replaced. Software operations and configurations are required for a successful disk replacement on UIS Manager. For more information, see H3C UIS Hyper-Converged Infrastructure Component Replacement Configuration Guide.
Changing the password for accessing UIS Manager
CAUTION:
· To change the root password for a CVK, access the Web interface of UIS Manager. You cannot change the root password for a CVK host from its command shell.
· Configure the same password for all hosts in the cluster.
To meet security requirements, user passwords need to be changed periodically. The following uses changing the password of the UIS root user as an example.
Changing the root password of a host from the Web interface
1. Right-click a host, and then select Edit Host.
2. In the dialog box that opens, enter a new password, and then click OK.
If you forget the root password, contact Technical Support.
Changing the admin password
UIS Manager has a default password. To change this password, access UIS Manager and click admin in the upper-right corner, and then change the password as needed.
As a best practice, change the root password and admin password promptly at the first login to UIS Manager.
Scaling out and scaling in a cluster
See H3C UIS Manager Resource Scale-Out and Scale-In Configuration Guide.
Changing the system time
See H3C UIS Manager System Time Modification Configuration Guide.
Performing a heterogeneous or homogeneous migration
See H3C UIS HCI Cloud Migration Guide.
Redefining a VM
In some cases, such as when a VM fails to start up due to host operation issues, it might be necessary to redefine and restore a VM on a different host from the original location.
Obtaining the XML file of the VM
Obtaining the XML file of the VM when HA is enabled and the CVM node is normal
When HA is enabled and the CVM node is normal, the XML file of a VM is saved in the HA directory on the CVM node by default. Typically, the HA directory is /etc/cvm/ha/clust_id/cvk_name, for example, /etc/cvm/ha/2/cvknode191. In the corresponding HA directory, enter the CVK directory for the VM to find the XML file of the VM, for example, test01.

Obtaining the XML file of the VM when HA is disabled and the CVM node is normal
1. On the top navigation bar, click System, and then select Data Backup > Backup History from the left navigation pane. Then, download the most recent backup file.
This example downloads backup file UIS_INFO_BACK_E0881P03_20231123203206.tar.gz.
2. Decompress the downloaded backup file and enter directory \UIS_INFO_BACK_E0881P03_20231123203206\cvknode1_crm_cvknode2\CVM_INFO_BACK_E0781P04_20231123203227\front\cvks.
3. Select the directory for the host where the VM is located, and then decompress the libvirt.tar.gz file in the directory. Then, enter the qemu subdirectory to obtain the XML file of the VM.
NOTE: Directory cvknode1_crm_cvknode2 is named in the format of primary CVM node name_crm_secondary CVM node name. In a single host environment, this directory is named in the format of CVM node name.
Obtaining the XML file of the VM when HA is disabled and the CVM node is faulty
If HA is disabled and the CVM node is faulty, you cannot access UIS Manager. To obtain the XML file of a VM in this case, perform the following steps:
1. Use an SSH client to access each node in the cluster to find a node that has the /vms/cvmbackup directory.
The backup data is saved on three random hosts managed by the system.
2. Enter the /vms/cvmbackup directory on the node, and then enter the cvknode1_crm_cvknode2 directory to identify the most recent backup record. Then, enter the corresponding directory to locate the front.tar.gz file.
3. Decompress the front.tar.gz file, and then enter the cvks directory. Then, enter the directory for the host where the VM is located, and then decompress the libvirt.tar.gz file in the directory.
4. Enter the libvirt/qemu directory after decompression to find the XML file of the VM.
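The decompression steps above can be sketched in Python for cases where you prefer to read the XML without unpacking everything to disk. This is a minimal sketch under stated assumptions: the function name `find_vm_xml` is hypothetical, the VM's file is assumed to be named after the VM with an .xml suffix, and the member paths inside the archives are assumed to end in cvks/<host>/libvirt.tar.gz and libvirt/qemu/<vm>.xml as the steps suggest. Verify the layout on your own backup before relying on it.

```python
import io
import tarfile


def find_vm_xml(front_tar_path, host_name, vm_name):
    """Return the VM's XML text from a front.tar.gz backup, or None if absent."""
    inner_suffix = f"cvks/{host_name}/libvirt.tar.gz"  # assumed layout
    xml_suffix = f"libvirt/qemu/{vm_name}.xml"         # assumed layout
    with tarfile.open(front_tar_path, "r:gz") as outer:
        for member in outer.getmembers():
            if not (member.isfile() and member.name.endswith(inner_suffix)):
                continue
            # Read the nested archive into memory instead of extracting it.
            inner_bytes = outer.extractfile(member).read()
            with tarfile.open(fileobj=io.BytesIO(inner_bytes), mode="r:gz") as inner:
                for item in inner.getmembers():
                    if item.isfile() and item.name.endswith(xml_suffix):
                        return inner.extractfile(item).read().decode("utf-8")
    return None
```

A call such as `find_vm_xml("front.tar.gz", "cvknode1", "test01")` returns the XML text if both archives follow the assumed layout, and None otherwise.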
Identifying the storage volume for VM disk files
If you already know the storage volume that holds the VM disk files, log in to the CLI of another host that has mounted the volume and verify that the volume is normal. If you do not know the storage volume, use the vim or cat command to read the disk file location from the XML file obtained in "Obtaining the XML file of the VM." For example:
The source file field displays the location of the VM disk files.
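As an alternative to reading the XML by eye, the disk file locations can be extracted with a short script. The sketch below assumes the file is a standard libvirt domain definition, in which each file-backed disk appears as a disk element under devices with a source element that carries a file attribute. The function name and the sample path in the usage note are illustrative only.

```python
import xml.etree.ElementTree as ET


def disk_source_files(domain_xml_text):
    """Return the 'file' attribute of every <source> under <devices>/<disk>."""
    root = ET.fromstring(domain_xml_text)
    paths = []
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        # Disks without a file-backed source (for example, an empty CD-ROM)
        # are skipped.
        if source is not None and source.get("file"):
            paths.append(source.get("file"))
    return paths
```

For a definition that contains `<source file='/vms/sharefile/test01'/>` (a hypothetical path), the function returns a list with that single path. The leading directories of the path indicate the storage volume to check.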
Copying the XML file of the VM to the target host
Use SCP to copy the XML file of the VM to the /etc/libvirt/qemu directory on the host where the storage volume location has been identified in "Identifying the storage volume for VM disk files."
Defining the VM through XML
1. Execute the virsh define vm.xml command in the /etc/libvirt/qemu directory.

The VM is defined through XML.
2. Verify that the VM is displayed in the output from the virsh list --all command at the CLI of the new host.

3. Connect the host from the Web interface. Then, you can view and start up the VM from the Web interface.
To define many VMs at once, you can restart libvirt to define them automatically, provided that no VM in the system has a name that contains Chinese characters. After the VMs are defined, start them up, as shown in the following figure:

Clearing VM data on the original host
If the original host has been completely damaged due to some hardware issues, resolve the hardware issues, and then re-install the same UIS version as the original system.
If the original host does not have hardware issues, perform the following steps to clear VM data on the host:
1. Disconnect the network cable from the original host before the host starts up.
2. Log in to the CLI of the original host and remove the XML file of the VM. This avoids the dual writes that occur if HA brings up the VM on the original host after the server restarts.
Replacing the backup node in a stateful failover system
See H3C UIS Manager Stateful Failover Configuration Guide.
Displaying technical support service information
You can display, export, and import technical support service information for a site on the system management page. For more information, see H3C UIS Manager Local Licensing Guide and H3C Software Products Remote Licensing Guide.
VM grouping
On the VM management page, you can assign VMs to different VM groups as needed. On the VM group details page, you can view VM resource usage information for each group.
Replacing SSDs with NVMe drives
See H3C UIS Manager Configuration Guide for Replacing SSDs with NVMe Disks.
Migrating VMware VMs
See H3C UIS HCI Cloud Migration Guide.
Configuring GPUs
See H3C UIS Manager GPU Passthrough Configuration Guide.
Configuring vGPUs
See H3C UIS Manager vGPU Configuration Guide.
Configuring anti-virus
Contact Technical Support.
Configuring storage disaster recovery
See H3C UIS Manager Site Recovery Management Configuration Guide.
Collecting logs
Collecting logs of the UIS Manager
Collecting logs from the Web interface
1. On the top navigation bar, click System, and then select Log Collection from the left navigation pane.
2. Select the CVK hosts for which the system collects logs, and then click Collect to save the log files locally.
Collecting logs at the CLI of a CVK host
If you cannot collect logs from the Web interface of the UIS Manager due to CVK failure, access the CLI of the CVK host to collect logs manually.
To collect logs at the CLI of a CVK host, access the CLI of the CVK host, and then execute the cas_collect_log.sh command. A compressed file is generated in the /vms directory as shown in the figure.
To analyze the logs, download the file to your local computer by using SSH client software.
The script does not collect logs for ONEStor-related hosts. To collect logs for a ONEStor-related host, manually copy the logs in the /var/log/storage and /var/log/ceph directories. If the time range for log collection is short or the log size is too large, you can collect only part of the logs archived in the /var/log/storage/backup directory.
Introduction to logs
Logs collected from the Web interface
UIS log files downloaded from the Web interface are named in the UIS_×××_×××.tar.gz format. A decompressed log file includes the following types of files:
· catalina.out—Contains logs of Web functions on the UIS Manager.
· oper_log.log—Contains user operation logs.
· *.diag.tar.bz2—Contains logs of each CVK host.
· onestor—Contains operation logs and system logs of ONEStor.
· WARN*.tar.gz—Contains alarm messages.
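When a decompressed log package contains many files, a small lookup can help sort them by type. The sketch below only restates the list above as shell-style patterns. The function name and the sample file names in the usage note are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns and descriptions taken from the list above, checked in order.
BUNDLE_PATTERNS = [
    ("catalina.out", "logs of Web functions on the UIS Manager"),
    ("oper_log.log", "user operation logs"),
    ("*.diag.tar.bz2", "logs of a CVK host"),
    ("onestor", "operation logs and system logs of ONEStor"),
    ("WARN*.tar.gz", "alarm messages"),
]


def describe_bundle_file(name):
    """Return a description for a file name from the decompressed package."""
    for pattern, description in BUNDLE_PATTERNS:
        # fnmatchcase compares case-sensitively on every platform.
        if fnmatchcase(name, pattern):
            return description
    return "unknown"
```

For example, `describe_bundle_file("cvknode1.diag.tar.bz2")` maps to the CVK host log entry, and a name that matches no pattern is reported as unknown.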
Logs collected at the CLI
CVK host log files obtained at the CLI are named in the XXX.tar.bz2 format. A decompressed CVK host log file includes the following directories and files:
· etc—Contains UIS configuration files, which are mainly VM configuration files. The VM configuration files are stored as libvirt/qemu/VM.xml.
· var—Contains logs of each UIS feature module.
· command.out—Contains output information about frequently used commands at the CLI.
· cas_cvk-version—Contains UIS version information.
· loglist—Contains UIS log file names.
· uis_raid_card_info.log—Contains basic information about RAID controllers on the host.
The var directory mainly contains the following logs:
· messages—Host system logs, which record the system running information.
· fsm—Shared file system logs.
· cas_ha—HA logs.
· Ha_shell_XX.log—HA logs.
· libvirt—VM logs.
· openvswitch—Logs generated by the OVS running process.
· Ovs_shell_XX.log—Logs generated by calling the ovs_bridge.sh script.
· tomcat8—UIS Web logs.
· operation—Logs for manual operations at the CLI of UIS Manager.
The following provides descriptions for CVK host logs:
· Messages logs
Messages logs record critical information generated while the operating system is running. The following example shows the records for an abnormal reboot of a CVK host.
Feb 3 13:58:01 XJYZ-CVK01 CRON[64458]: (root) CMD (ump-node-sync )
Feb 3 13:58:01 XJYZ-CVK01 CRON[64459]: (root) CMD (ump-sync -p ALL)
Feb 3 13:58:01 XJYZ-CVK01 CRON[64460]: (root) CMD ( /opt/bin/ocfs2_iscsi_conf_chg_timer.sh)
Feb 3 13:58:01 XJYZ-CVK01 CRON[64443]: (CRON) info (No MTA installed, discarding output)
Feb 3 14:06:35 XJYZ-CVK01 kernel: imklog 5.8.6, log source = /proc/kmsg started.
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="2747" x-info="http://www.rsyslog.com"] start
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd: rsyslogd's groupid changed to 103
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd: rsyslogd's userid changed to 101
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd-2039: Could not open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Initializing cgroup subsys cpuset
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Initializing cgroup subsys cpu
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Initializing cgroup subsys cpuacct
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Linux version 3.13.6 (root@cvknode22) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #5 SMP Mon Jul 21 10:07:26 CST 2014
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.13.6 root=UUID=4beeb503-6e10-4836-93a4-0836a9a1571e ro nomodeset elevator=deadline transparent_hugepage=always crashkernel=256M quiet
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] KERNEL supported cpus:
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Intel GenuineIntel
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] AMD AuthenticAMD
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Centaur CentaurHauls
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] e820: BIOS-provided physical RAM map:
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cbff] usable
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x000000000009cc00-0x000000000009ffff] reserved
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf60ffff] usable
As shown in the example, the messages log file does not have any records from 13:58:01 to 14:06:35, indicating that the CVK host failed in the time range.
The kernel-level logs record information about the CVK host after it restarted.
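A silent interval like the one above can also be located with a script when the messages file is long. The following is a minimal sketch, not a UIS tool. The function name, the five-minute threshold, and the default year are assumptions. Syslog timestamps of this form carry no year, so the sketch supplies one, and it assumes English month abbreviations.

```python
from datetime import datetime, timedelta


def find_gaps(lines, threshold=timedelta(minutes=5), year=2000):
    """Return (last_seen, next_seen) pairs where consecutive records are
    further apart than the threshold."""
    gaps = []
    previous = None
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            # Expected prefix: "<Mon> <day> <HH:MM:SS>", for example "Feb 3 13:58:01".
            stamp = datetime.strptime(
                f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and stamp - previous > threshold:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

Applied to the records above, the sketch reports a single gap from 13:58:01 to 14:06:35. A gap alone does not prove a failure. Confirm it against the kernel boot records that follow, as in the example.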
· Libvirt logs
In the following example, the /var/log/libvirt/libvirtd.log log file contains an alarm indicating that the CVK host lacks memory resources and that the current memory usage has reached 97%. (The alarm message for insufficient CPU resources is similar.)
2014-10-24 09:15:52.792+0000: 2994: warning : virIsLackOfResource:1106 : Lack of Memory resource! only 374164 free 64068 cached and vm locked memory(4194304*0%) of 16129760 total, max:85; now:97
2014-10-24 09:15:52.792+0000: 2994: error : qemuProcessStart:3419 : Lack of system resources, out of memory or cpu is too busy, please check it.
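To search a large libvirtd.log for such warnings, the limit and the current value can be pulled out with a regular expression. The sketch below is written against the single sample line above. Whether the CPU variant of the message uses the same "max:…; now:…" tail is an assumption, and the function name is hypothetical.

```python
import re

# Matches the "Lack of <X> resource! ... max:<limit>; now:<current>" form
# seen in the sample warning.
LACK_PATTERN = re.compile(r"Lack of (\w+) resource!.*?max:(\d+); now:(\d+)")


def parse_lack_of_resource(line):
    """Return the resource name, limit, and current value, or None if the
    line is not a lack-of-resource warning of the sampled form."""
    match = LACK_PATTERN.search(line)
    if match is None:
        return None
    resource, limit, current = match.groups()
    return {"resource": resource, "max": int(limit), "now": int(current)}
```

For the sample warning, the function returns the resource name Memory with a limit of 85 and a current value of 97.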
The /var/log/libvirt/qemu directory saves the log files of VMs running on the CVK host.
root@UIS-CVK01:/var/log/libvirt/qemu# ls -l
total 44
-rw------- 1 root root 7067 Jan 9 19:08 RedHat5.9.log
-rw------- 1 root root 1969 Jan 18 15:41 win7.log
-rw------- 1 root root 26574 Feb 11 16:15 windows2008.log
VM log files record VM running information, including the times when the VM started up and shut down and the disk files of the VM.
2015-02-11 15:50:18.349+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name windows2008 -S -machine pc-i440fx-1.5,accel=kvm,usb=off,system=windows -cpu qemu64,hv_relaxed,hv_spinlocks=0x2000 -m 1024 -smp 1,maxcpus=12,sockets=12,cores=1,threads=1 -uuid 43741f06-166d-4155-b47e-4137df68e91c -no-user-config -nodefaults -chardev file=/vms/sharefile/windows2008,if=none,id=drive-virtio-disk0,format=qcow2,cache=directsync -device
…
char device redirected to /dev/pts/0 (label charserial0)
qemu: terminating on signal 15 from pid 4530
2015-02-11 16:15:28.825+0000: shutting down
· OCFS2 logs
The /var/log/fsm/fsm_core*.log log file records information about processing triggered by OCFS2 Fence of the CVK host.
2021-11-04 06:40:35,882 manager:233 INFO Received an event: {'index': 7, 'type': 'fence_umount', 'uuid': u'851D36905AB74AFD93E1ABA8259DA3A2', 'seq': 11538, 'dev_name': u'dm-7'}
2021-11-04 06:40:35,923 manager:204 INFO Remain 0 events to be handling
2021-11-04 06:40:35,923 manager:131 INFO Manager received an event: Pool sharefile06 was fence_umount
2021-11-04 06:40:35,923 fspool:141 INFO Pool sharefile06 received a event fence_umount
· Operation logs
Operation logs record information about the commands executed at the CLI of the CVK host. The following contains commands executed from Apr 19th to Apr 21st.
root@cvknode1:~/cas# ll /var/log/operation/
total 32
drwxrwxrwx 2 root root 4096 Apr 21 10:06 ./
drwxr-xr-x 40 root root 4096 Apr 21 11:01 ../
-rwxrwxrwx 1 root root 5162 Apr 19 17:49 18-04-19.log*
-rwxrwxrwx 1 root root 829 Apr 20 19:11 18-04-20.log*
-rwxrwxrwx 1 root root 8505 Apr 21 11:00 18-04-21.log*
The following example shows the content of an operation log file, including the following information:
¡ Time when a command was executed.
¡ Login user.
¡ Login address.
¡ Login method.
¡ Executed commands.
¡ Directory where a command was executed.
2018/04/19 16:56:50##root pts/6 (172.16.130.3)##/root## vi /var/log/tomcat8/cas.log
2018/04/19 16:57:05##root pts/6 (172.16.130.3)##/root## service tomcat8 restart
2018/04/19 17:02:21##root pts/5 (172.16.130.3)##/root## cat /etc/cvk/system_alarm.xml
2018/04/19 17:02:23##root pts/5 (172.16.130.3)##/root## lsblk
2018/04/19 17:49:04##root pts/6 (172.16.130.3)##/root## ceph osd tree
2018/04/19 17:49:19##root pts/6 (172.16.130.3)##/root## stop ceph-osd id=3
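The fields listed above can be extracted from an operation log line programmatically. The following is a minimal sketch that assumes every line follows the ##-separated layout shown in the sample (time, login user with terminal and address, directory, command). The OperationRecord type and the parse_operation_line function are illustrative names and are not part of UIS.

```python
import re
from datetime import datetime
from typing import NamedTuple


class OperationRecord(NamedTuple):
    time: datetime
    user: str
    terminal: str
    address: str
    directory: str
    command: str


# Login field as it appears in the sample: "root pts/6 (172.16.130.3)"
LOGIN_RE = re.compile(r"^(\S+)\s+(\S+)\s+\(([^)]*)\)$")


def parse_operation_line(line: str) -> OperationRecord:
    # Split on the first three "##" separators only, so that a "##"
    # inside the executed command is kept intact.
    stamp, login, directory, command = line.rstrip("\n").split("##", 3)
    match = LOGIN_RE.match(login.strip())
    if match is None:
        raise ValueError(f"unexpected login field: {login!r}")
    user, terminal, address = match.groups()
    return OperationRecord(
        time=datetime.strptime(stamp.strip(), "%Y/%m/%d %H:%M:%S"),
        user=user,
        terminal=terminal,
        address=address,
        directory=directory.strip(),
        command=command.strip(),
    )


sample = "2018/04/19 17:49:19##root pts/6 (172.16.130.3)##/root## stop ceph-osd id=3"
record = parse_operation_line(sample)
print(record.time, record.user, record.address, record.command)
```

A parser of this kind can help filter a day's log for commands issued from a specific address or in a specific directory.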
Collecting logs of CAStools
The UIS system and VMs are separated. To monitor and manage VMs on the UIS Manager, you must install CAStools in the operating system of the VMs.
The log collection method for CAStools varies by the operating system installed on the VM:
· Windows operating system—Obtain the qemu-ga.log file in the C:\Program Files\castools\ directory of the VM.
· Linux operating system—Obtain the qemu-ga.log and set-ip.log files in the /var/log/ directory of the VM.
Collecting logs of a VM operating system
Collecting logs of a Windows operating system
1. Open the Event Viewer window, and then select Windows Logs from the left navigation pane. Right-click System, and then select Save All Events As.
2. Save the logs.
3. The downloaded log file is as shown in the figure.
Viewing logs of a Windows operating system
1. On the local computer (running the Windows 7 operating system), open the Event Viewer window. From the left navigation pane, right-click Windows Logs, and then select Open Saved Log.
2. In the dialog box that opens, select the saved log file.
3. The logs are displayed on the Saved Logs > event page.
Collecting logs of a Linux operating system
To collect logs for a VM installed with a Linux operating system, collect logs in the /var/log directory. If the log size is large, first compress the logs and then copy the compressed file and save it locally.
For example, to collect the logs of VM vm_test on September 17, 2019, execute the tar -czvf vm_test_20190917.tar.gz /var/log command. The -z option compresses the archive with gzip.
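If Python is available in the VM, the same compressed archive can be produced with the standard tarfile module. The following is a minimal sketch, not a UIS tool. The archive_logs function name is illustrative, and the demonstration runs on a temporary directory instead of the real /var/log directory.

```python
import os
import tarfile
import tempfile


def archive_logs(source_dir: str, archive_path: str) -> list:
    """Create a gzip-compressed tar archive of source_dir and return its member names."""
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=top_level)
    # Reopen the archive to confirm that it can be read back.
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


# Demonstration on a throwaway directory instead of the real /var/log.
with tempfile.TemporaryDirectory() as workdir:
    log_dir = os.path.join(workdir, "log")
    os.mkdir(log_dir)
    with open(os.path.join(log_dir, "messages"), "w") as handle:
        handle.write("sample log line\n")
    archive_file = os.path.join(workdir, "vm_test_20190917.tar.gz")
    demo_names = archive_logs(log_dir, archive_file)
    with open(archive_file, "rb") as handle:
        demo_magic = handle.read(2)  # gzip files start with bytes 1f 8b

print(demo_names)
```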
Troubleshooting tools and utilities
Introduction to kdump
Kdump is a crash dump tool of the Linux kernel. It reserves part of the memory for a capture kernel. When the running kernel crashes, kdump uses kexec to boot the capture kernel. The capture kernel dumps the information of the crashed kernel (for example, CPU register and stack statistics) to a file on a local disk or on the network.
By default, the UIS system supports kdump. When the kernel of a CVK host fails, the system generates a crash file in the /vms/crash directory for troubleshooting as shown in the example.
root@cvk29:/vms/crash# ls -lt
drwxr-sr-x 2 root whoopsie 4096 Jul 22 17:34 2014-07-22-09:34
The file named in the dump-*** format in the 2014-07-22-09:34 directory contains the output of kdump.
Analysis with the Kdump file
You can use the crash tool to analyze the Kdump file. The vmlinux file for the kernel version is needed for the analysis. You can find that file at /usr/src/linux-4.1.0-generic/vmlinux-kernelversion (the kernel version name might vary).
The following information describes how to use the Kdump file to locate typical online issues.
CPU error
Node cvknode1 at a site reboots repeatedly. The node continues to reboot even after all virtual machines (VMs) are migrated and the shared storage settings are deleted from the node. The syslogs do not show any anomalies before each reboot, but a vmcore file is present in the /vms/crash directory.
1. View abnormal call stack information in the vmcore file:
root@cvk21:/vms/tmp# crash vmlinux vmcore
crash 7.0.5
Copyright (C) 2002-2014 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 8
DATE: Wed Nov 5 12:25:19 2014
UPTIME: 00:02:19
LOAD AVERAGE: 0.06, 0.05, 0.02
TASKS: 324
NODENAME: cvknode-1
RELEASE: 3.13.6
VERSION: #5 SMP Mon Jul 21 10:07:26 CST 2014
MACHINE: x86_64 (2132 Mhz)
MEMORY: 64 GB
PANIC: "Kernel panic - not syncing: Fatal Machine check"
PID: 0
COMMAND: "swapper/6"
TASK: ffff8807f4618000 (1 of 8) [THREAD_INFO: ffff8807f4620000]
CPU: 6
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 0 TASK: ffff8807f4618000 CPU: 6 COMMAND: "swapper/6"
#0 [ffff8807ffc6ac50] machine_kexec at ffffffff8104c991
#1 [ffff8807ffc6acc0] crash_kexec at ffffffff810e97e8
#2 [ffff8807ffc6ad90] panic at ffffffff8174ac9d
#3 [ffff8807ffc6ae10] mce_panic at ffffffff81038b2f
#4 [ffff8807ffc6ae60] do_machine_check at ffffffff810399d8
#5 [ffff8807ffc6af50] machine_check at ffffffff817589df
[exception RIP: intel_idle+204]
RIP: ffffffff8141006c RSP: ffff8807f4621db8 RFLAGS: 00000046
RAX: 0000000000000010 RBX: 0000000000000004 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff8807f4621fd8 RDI: 0000000001c0d000
RBP: ffff8807f4621de8 R8: 0000000000000009 R9: 0000000000000004
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003
R13: 0000000000000010 R14: 0000000000000002 R15: 0000000000000003
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- [MCE exception stack] ---
#6 [ffff8807f4621db8] intel_idle at ffffffff8141006c
#7 [ffff8807f4621df0] cpuidle_enter_state at ffffffff81602a8f
#8 [ffff8807f4621e50] cpuidle_idle_call at ffffffff81602be0
#9 [ffff8807f4621ea0] arch_cpu_idle at ffffffff8101e2ce
#10 [ffff8807f4621eb0] cpu_startup_entry at ffffffff810c1818
#11 [ffff8807f4621f20] start_secondary at ffffffff8104306b
crash>
The abnormal call stack information shows that a machine check exception (MCE) occurred. This exception is typically caused by hardware issues.
2. Execute the dmesg command in the crash utility to view the kernel messages printed before the unexpected reboots:
[ 15.707981] 8021q: 802.1Q VLAN Support v1.8
[ 16.416569] drbd: initialized. Version: 8.4.3 (api:1/proto:86-101)
[ 16.416573] drbd: srcversion: F97798065516C94BE0F27DC
[ 16.416575] drbd: registered as block device major 147
[ 17.142281] Ebtables v2.0 registered
[ 17.203400] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 17.247387] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 139.114172] Disabling lock debugging due to kernel taint
[ 139.114185] mce: [Hardware Error]: CPU 2: Machine Check Exception: 4 Bank 5: be00000000800400
[ 139.114192] mce: [Hardware Error]: TSC 10ba0482e78 ADDR 3fff81760d32 MISC 7fff
[ 139.114199] mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1415161519 SOCKET 0 APIC 14 microcode 13
[ 139.114203] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 139.114208] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 139.114211] Kernel panic - not syncing: Fatal Machine check
crash>
The preceding information indicates that a hardware error occurred on CPU 2.
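The status value printed after the bank number in a Machine Check Exception line can be decoded to confirm the nature of the error. The following is a minimal sketch that assumes the high-order flag bits follow the IA32_MCi_STATUS layout of the Intel machine-check architecture (bit 63 VAL through bit 57 PCC). The decode_mce function and the dictionary keys are illustrative names. The sketch does not replace the mcelog tool mentioned in the kernel output.

```python
import re

# High-order flag bits of IA32_MCi_STATUS (assumed Intel layout).
STATUS_FLAGS = {
    63: "VAL",    # register contains valid error information
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register content is valid
    58: "ADDRV",  # ADDR register content is valid
    57: "PCC",    # processor context corrupt
}

MCE_RE = re.compile(
    r"CPU (\d+): Machine Check Exception:\s*([0-9a-fA-F]+)\s+Bank (\d+): ([0-9a-fA-F]+)"
)


def decode_mce(line):
    """Return CPU, bank, and status flags for an MCE line, or None if no match."""
    match = MCE_RE.search(line)
    if match is None:
        return None
    cpu, _mcg_status, bank, status_hex = match.groups()
    status = int(status_hex, 16)
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {"cpu": int(cpu), "bank": int(bank), "flags": flags}


cpu_sample = ("[ 139.114185] mce: [Hardware Error]: CPU 2: "
              "Machine Check Exception: 4 Bank 5: be00000000800400")
cpu_result = decode_mce(cpu_sample)
print(cpu_result)
```

For the sample line, the UC and PCC flags are set, which is consistent with the "Processor context corrupt" message and the subsequent kernel panic.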
Memory error
A CVK node at a site reboots unexpectedly. No abnormal records are found in the syslogs before and after the reboot. Kdump records are generated at the reboots.
1. View call stack information from the Kdump records.
If output similar to the following is displayed, a hardware error might have occurred.
crash> bt
PID: 0 TASK: ffffffff81c144a0 CPU: 0 COMMAND: "swapper/0"
#0 [ffff880c0fa07c60] machine_kexec at ffffffff8104c991
#1 [ffff880c0fa07cd0] crash_kexec at ffffffff810e97e8
#2 [ffff880c0fa07da0] panic at ffffffff8174ac9d
#3 [ffff880c0fa07e20] asminline_call at ffffffffa014c895 [hpwdt]
#4 [ffff880c0fa07e40] nmi_handle at ffffffff817598da
#5 [ffff880c0fa07ec0] do_nmi at ffffffff81759b7d
#6 [ffff880c0fa07ef0] end_repeat_nmi at ffffffff81758cf1
[exception RIP: intel_idle+204]
RIP: ffffffff8141006c RSP: ffffffff81c01da8 RFLAGS: 00000046
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000046
RDX: ffffffff81c01da8 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff8141006c R8: ffffffff8141006c R9: 0000000000000018
R10: ffffffff81c01da8 R11: 0000000000000046 R12: ffffffffffffffff
R13: 0000000000000000 R14: ffffffff81c01fd8 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- [NMI exception stack] ---
#7 [ffffffff81c01da8] intel_idle at ffffffff8141006c
#8 [ffffffff81c01de0] cpuidle_enter_state at ffffffff81602a8f
#9 [ffffffff81c01e40] cpuidle_idle_call at ffffffff81602be0
#10 [ffffffff81c01e90] arch_cpu_idle at ffffffff8101e2ce
#11 [ffffffff81c01ea0] cpu_startup_entry at ffffffff810c1818
#12 [ffffffff81c01f10] rest_init at ffffffff8173fc97
#13 [ffffffff81c01f20] start_kernel at ffffffff81d37f7b
#14 [ffffffff81c01f70] x86_64_start_reservations at ffffffff81d375f8
#15 [ffffffff81c01f80] x86_64_start_kernel at ffffffff81d3773e
crash>
2. Execute the dmesg command to view information before the anomaly.
crash> dmesg
…
[10753.155822] sd 3:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).
[10804.115376] sbridge: HANDLING MCE MEMORY ERROR
[10804.115386] CPU 23: Machine Check Exception: 0 Bank 9: cc1bc010000800c0
[10804.115387] TSC 0 ADDR 12422f7000 MISC 90868002800208c PROCESSOR 0:306e4 TIME 1417366012 SOCKET 1 APIC 2b
…
[10804.283467] sbridge: HANDLING MCE MEMORY ERROR
[10804.283473] CPU 9: Machine Check Exception: 0 Bank 9: cc003010000800c0
[10804.283475] TSC 0 ADDR 1242ef7000 MISC 90868000800208c PROCESSOR 0:306e4 TIME 1417366012 SOCKET 1 APIC 26
[10804.303482] EDAC MC1: 28416 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12422f7 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c0 socket:1 channel_mask:1 rank:0)
[10804.303489] EDAC MC1: 192 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12424a7 offset:0x0 grain:32
…
[10804.319474] sbridge: HANDLING MCE MEMORY ERROR
[10804.319481] CPU 6: Machine Check Exception: 0 Bank 9: cc001010000800c0
[10804.319482] TSC 0 ADDR 1243087000 MISC 90868002800208c PROCESSOR 0:306e4 TIME 1417366012 SOCKET 1 APIC 20
[10805.303772] EDAC MC1: 64 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x1243087 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c0 socket:1 channel_mask:1 rank:0)
[10813.602696] sd 3:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
[10813.603219] sd 3:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).
[10840.833238] Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details.
crash>
3. View information in the kern.log file.
Nov 30 07:05:01 HBND-UIS-E-CVK09 kernel: [229821.496666] sd 11:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).
Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.188854] sbridge: HANDLING MCE MEMORY ERROR
Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.188873] CPU 23: Machine Check Exception: 0 Bank 9: cc1e0010000800c0
Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.188874] TSC 0 ADDR 10638f7000 MISC 90868002800208c PROCESSOR 0:306e4 TIME 1417302355 SOCKET 1 APIC 2b
…
Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.244902] EDAC MC1: 30720 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x10638f7 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c0 socket:1 channel_mask:1 rank:0)
…
root@gzh-139:/vms/issue_logs/hebeinongda/20141201/HBND-UIS-E-CVK09/logdir/var/log# grep OVERFLOW kern* | wc
225 6341 60264
root@gzh-139:/vms/issue_logs/hebeinongda/20141201/HBND-UIS-E-CVK09/logdir/var/log#
The preceding information indicates that the issue is caused by a memory error. The issue was resolved after the faulty memory was replaced.
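In addition to counting matching lines with grep, you can total the reported errors for each DIMM to identify the memory module to replace. The following is a minimal sketch that assumes the EDAC lines follow the layout shown in the samples, in which the number after the memory controller name is the error count and the label after "on" identifies the DIMM. The count_edac_errors function is an illustrative name, and the sample lines are shortened copies of the dmesg output above.

```python
import re
from collections import Counter

# Example: "EDAC MC1: 192 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (..."
EDAC_RE = re.compile(r"EDAC (MC\d+): (\d+) (CE|UE) .*? on (\S+)")


def count_edac_errors(lines):
    """Sum the reported error counts for each (DIMM label, error type) pair."""
    totals = Counter()
    for line in lines:
        match = EDAC_RE.search(line)
        if match is None:
            continue
        _controller, count, kind, label = match.groups()
        totals[(label, kind)] += int(count)
    return totals


samples = [
    "[10804.303482] EDAC MC1: 28416 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0)",
    "[10804.303489] EDAC MC1: 192 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0)",
    "[10805.303772] EDAC MC1: 64 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0)",
    "[10813.602696] sd 3:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).",
]
edac_totals = count_edac_errors(samples)
for (label, kind), total in edac_totals.most_common():
    print(label, kind, total)
```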
Storage cluster logs
/var/log/ceph/ceph.log
The ceph.log file mainly records the health status and traffic of the cluster. It is available only on monitor nodes and has the same content as the output from the ceph -w command.
· If logs as follows are in the ceph.log file, the service network of the primary monitor node of the cluster has been disconnected.
2017-05-09 19:44:03.400143 mon.2 172.16.105.84:6789/0 2009 : cluster [INF] mon.cvknode84 calling new monitor election
2017-05-09 19:44:03.404362 mon.1 172.16.105.83:6789/0 2023 : cluster [INF] mon.cvknode83 calling new monitor election
2017-05-09 19:44:05.419510 mon.1 172.16.105.83:6789/0 2024 : cluster [INF] mon.cvknode83@1 won leader election with quorum 1,2
2017-05-09 19:44:05.428131 mon.1 172.16.105.83:6789/0 2025 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 1,2 cvknode83,cvknode84
2017-05-09 19:44:14.383590 mon.1 172.16.105.83:6789/0 2057 : cluster [INF] osdmap e1397: 18 osds: 12 up, 18 in
· If logs as follows are in the ceph.log file, the health of the cluster is not 100%, and the cluster is in the process of recovery.
2017-06-06 19:31:41.319993 mon.0 192.168.93.21:6789/0 86387 : cluster [INF] pgmap v73931: 4096 pgs: 2561 active+clean, 1532 active+remapped+wait_backfill, 3 active+remapped+backfilling; 3362 GB data, 6730 GB used, 21941 GB / 28672 GB avail; 0 B/s rd, 127 kB/s wr, 256 op/s rd, 63 op/s wr; 5/2608637 objects degraded (0.000%); 1765938/2608637 objects misplaced (67.696%); 62992 kB/s, 15 objects/s recovering
· If logs as follows are in the ceph.log file, the storage network of a non-Handy or non-primary monitor node in the cluster has been disconnected.
2017-05-12 16:05:14.585496 mon.0 172.31.1.31:6789/0 106035 : cluster [INF] osd.31 marked itself down
2017-05-12 16:05:15.095824 mon.0 172.31.1.31:6789/0 106038 : cluster [INF] osd.33 marked itself down
2017-05-12 16:05:15.195542 mon.0 172.31.1.31:6789/0 106040 : cluster [INF] osdmap e286: 36 osds: 25 up, 36 in
2017-05-12 16:05:15.287350 mon.0 172.31.1.31:6789/0 106042 : cluster [INF] osd.27 marked itself down
2017-05-12 16:05:16.186527 mon.0 172.31.1.31:6789/0 106043 : cluster [INF] osdmap e287: 36 osds: 24 up, 36 in
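The osdmap and pgmap entries in the ceph.log file can be summarized programmatically to track how many OSDs are down and how many PGs are not yet in the active+clean state. The following is a minimal sketch that assumes the line layouts shown in the samples above. The parse_osdmap and parse_pgmap functions are illustrative names, and the pgmap sample is truncated after the PG state list.

```python
import re

OSDMAP_RE = re.compile(r"osdmap e(\d+): (\d+) osds: (\d+) up, (\d+) in")
PGMAP_RE = re.compile(r"pgmap v\d+: (\d+) pgs: ([^;]+);")


def parse_osdmap(line):
    """Return OSD counts from an osdmap line, or None if the line does not match."""
    match = OSDMAP_RE.search(line)
    if match is None:
        return None
    epoch, total, up, in_count = map(int, match.groups())
    return {"epoch": epoch, "total": total, "up": up, "in": in_count,
            "down": total - up}


def parse_pgmap(line):
    """Return the PG total and the per-state counts from a pgmap line."""
    match = PGMAP_RE.search(line)
    if match is None:
        return None
    states = {}
    for item in match.group(2).split(","):
        count, state = item.split()
        states[state] = int(count)
    return {"total": int(match.group(1)), "states": states}


osd_line = ("2017-05-12 16:05:16.186527 mon.0 172.31.1.31:6789/0 106043 : "
            "cluster [INF] osdmap e287: 36 osds: 24 up, 36 in")
pg_line = ("2017-06-06 19:31:41.319993 mon.0 192.168.93.21:6789/0 86387 : "
           "cluster [INF] pgmap v73931: 4096 pgs: 2561 active+clean, "
           "1532 active+remapped+wait_backfill, 3 active+remapped+backfilling; "
           "3362 GB data")

osd_info = parse_osdmap(osd_line)
pg_info = parse_pgmap(pg_line)
not_clean = pg_info["total"] - pg_info["states"].get("active+clean", 0)
print(osd_info["down"], "OSDs down;", not_clean, "PGs not active+clean")
```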
/var/log/ceph/ceph-osd.*.log
The ceph-osd.*.log file mainly records information about an OSD in the cluster. If an error occurs on a cluster OSD, the error reasons will be recorded in the ceph-osd.*.log file for that OSD, which can be used for troubleshooting.
The following is an example about how to troubleshoot by using a ceph-osd.*.log file when an OSD is abnormal (the UI reports an OSD error):
1. Use the ceph osd tree command in the CLI to identify the identifier of the abnormal OSD.
2. Access the /var/log/ceph/ceph-osd.*.log file for the OSD and identify the reason for the OSD exception.
¡ If a log as follows is in the ceph-osd log file, the storage controller is damaged, causing the journal to be interrupted.
2017-04-25 14:34:08.807146 7f5bf690a780 -1 journal Unable to read past sequence 301115833 but header indicates the journal has committed up through 301115842, journal is corrupt
¡ If logs as follows are in the ceph-osd log file, the OSD has terminated itself on a suicide timeout because of excessive load.
2017-03-09 11:46:01.576034 7f0878364700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f086fa6c700' had suicide timed out after 180
2017-03-09 11:46:01.576049 common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
¡ If a log as follows is in the ceph-osd log file, the OSD has not been mounted.
2017-04-27 19:46:18.280510 7fcfb954c700 5 filestore(/var/lib/ceph/osd/ceph-85) umount /var/lib/ceph/osd/ceph-85
¡ If logs as follows are in the ceph-osd log file, the data copies are inconsistent.
2016-10-22 06:49:23.854201 7fd2e860f700 -1 log_channel(cluster) log [ERR] : 1.ad shard 1: soid 819850ad/rbd_data.3b7055757a07.0000000000000ab1/7//1 data_digest 0xd7ac1812 != best guess data_digest 0x43d61c5d from auth shard 0
2016-10-22 06:49:23.854253 osd/osd_types.cc:4148:FAILED assert(clone_size.count(clone))
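The log signatures described above can be checked in one pass over a ceph-osd log file. The following is a minimal sketch that maps each signature to the cause given in this section. The OSD_LOG_SIGNATURES table and the diagnose_osd_log function are illustrative names, the patterns cover only the four cases listed above, and the sample lines are shortened copies of the logs shown.

```python
import re

# (pattern, cause) pairs taken from the log signatures described in this section.
OSD_LOG_SIGNATURES = [
    (re.compile(r"journal is corrupt"),
     "journal interrupted (storage controller might be damaged)"),
    (re.compile(r"suicide time(d )?out"),
     "OSD terminated itself because of excessive load"),
    (re.compile(r"\bumount /var/lib/ceph/osd/"),
     "OSD not mounted"),
    (re.compile(r"!= best guess \w+_digest"),
     "data copies inconsistent"),
]


def diagnose_osd_log(lines):
    """Return the distinct causes matched in the log lines, in order of appearance."""
    causes = []
    for line in lines:
        for pattern, cause in OSD_LOG_SIGNATURES:
            if pattern.search(line) and cause not in causes:
                causes.append(cause)
    return causes


sample_lines = [
    "2017-03-09 11:46:01.576034 7f0878364700 1 heartbeat_map is_healthy "
    "'FileStore::op_tp thread 0x7f086fa6c700' had suicide timed out after 180",
    '2017-03-09 11:46:01.576049 common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")',
    "2017-04-27 19:46:18.280510 7fcfb954c700 5 filestore(/var/lib/ceph/osd/ceph-85) "
    "umount /var/lib/ceph/osd/ceph-85",
]
osd_causes = diagnose_osd_log(sample_lines)
print(osd_causes)
```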
/var/log/ceph/ceph-disk.log
The ceph-disk.log file mainly records information about OSD deployment and startup and is typically used in conjunction with the ceph-osd.*.log file to locate OSD related issues.
· If logs as follows are in the ceph-disk log file, the system stops OSD mounting and exits the OSD mounting process because files exist in the /var/lib/ceph/osd/ceph-* directory. This issue typically occurs at the restart of the host. When the host restarts, all OSDs must be reactivated and mounted and the mounting process will check whether other files than the heartbeat, osd_disk_info.ini, and osd_should_be_restart_flag files exist in the OSD directory. If other files exist in the directory, the OSD mounting process stops.
ceph-disk: Error: another ceph osd.71 already mounted in position(old/different cluster instance?);unmounting ours.
· If logs as follows are in the ceph-disk log file, the OSD has not been activated and cannot be mounted.
Fri. 07 Apr 2017 10:24:48 ceph-disk[line:2438] ERROR Failed to activate
Fri. 07 Apr 2017 10:24:48 ceph-disk[line:976] DEBUG Unmounting /var/lib/ceph/tmp/mnt.hD_6nh
/var/log/ceph/ceph-mon.*.log
The ceph-mon.*.log file mainly records information of a monitor node in the Ceph cluster. Monitor nodes are responsible for monitoring the cluster. If an error occurs on a monitor node, the error reason will be recorded in the ceph-mon.*.log file for that node, which can be used for troubleshooting.
To troubleshoot for a monitor node exception (the UI reports a monitor node anomaly):
1. Check the hostname of the abnormal monitor node on the host management page.
2. Access the /var/log/ceph/ceph-mon.*.log file for the host to identify the cause of the monitor node exception. If the following logs are found in the ceph-mon log file, the primary monitor node is abnormal and the backup monitor nodes have triggered the election mechanism. Possible reasons are that the service network of the primary monitor node is faulty or that the ceph-mon process on the primary monitor node has stopped.
2017-05-08 19:24:58.017935 7fb173765700 1 mon.cvknode84@2(peon).paxos(paxos active c 24348..24883) lease_timeout -- calling new election
2017-05-08 19:24:58.024456 7fb172f64700 0 log_channel(cluster) log [INF] : mon.cvknode84 calling new monitor election
/var/log/calamari/calamari.log
The calamari.log file mainly records the operations on Handy.
If logs as follows are in the calamari.log file, the Handy node does not have network connectivity with the other nodes.
2017-05-08 15:08:29,060 - ERROR - onestor_common.py[network_check][line:494] - django.request <network_check> Host "172.16.105.84" is unreachable, retry again...
2017-05-08 15:08:29,060 - ERROR - onestor_common.py[execute][line:622] - django.request [ONEStor] onestor_request_all_node cvknode84:Host is unreachable
/var/log/onestor_cli/onestor_cli.log
The onestor_cli.log file records information about the process of collecting real-time logs on a node. It can be used to diagnose and troubleshoot any issues related to log collection.
· If a log as follows is in the onestor_cli.log file, the size of the collected logs has exceeded 5 GB.
[2017-05-10 10:47:01,980][WARNING][monitor.py][line:157] We detect the current collecting log size is up to 5GB, ending collecting automatically!
· If the onestor_cli.log file disappears from a node, the log disk space on the node might be full.
Bimodal HCI logs
Bimodal HCI provides VMware VM lifecycle management and VMware VM agentless migration features.
1. The vmware-api-server service on the CVM host provides VMware VM lifecycle management. It stores related logs in the /var/log/vmware-api-server directory. If an exception occurs when you operate VMware VMs on the UIS, a log is generated in that directory to record the causes for the exception, which can be used for issue diagnosis.
For example, if a log as follows is generated, you can determine that the reason for failure to generate a snapshot is that the snapshot directory is too deep (which is limited by VMware):
[Vmware VM Request Processor Manager1] Trace[] UID[] c.h.h.u.s.v.handler.VmwareHandler – vmware vm “hdm2-snapshot” to generate a snapshot fail, cause:Snapshot hierarchy is too deep.
2. The vmware-agent service on the CVK host is responsible for migrating data from VMware. It stores related logs in the /var/log/vmware-agent directory. If a migration task fails or is interrupted unexpectedly on the UIS, you can view the logs in that directory.
¡ vmware-agent.log—Migration process logs. When an exception occurs during the migration process, the vmware-agent.log file will record the causes for the exception, which can be used for future issue diagnosis.
If a log as follows is output, a known VMware issue (https://kb.vmware.com/s/article/2035976) has been triggered:
2022-01-19 16:03:06 [ERROR] service.go:149 migrate failed, vcenter key: 172.20.67.6:443 vmref: vm-64 task 1955534340610146293 reason: {"code": 12002, "message": "Get QueryChangedDiskAreas failed. ", "error": "ServerFaultCode: Error caused by file /vmfs/volumes/61dd4ded-84b7a178-07ce-98f181b81b1c/ubuntu18041desktop/ubuntu18041desktop.vmdk"}
¡ vmware_vddk.log—VDDK operation logs. These logs record the operations related to connecting to vSphere and can assist in locating data transmission interruption during migration.
3. If an error of failed driver injection is reported on the UI during the VM migration process, you can check the relevant error logs to preliminarily locate the cause of the failure. The relevant error logs are saved in the /var/log/caslog/cas_xc_virtio_driver.log file.
4. If the UI still reports that CAStools is not running on the VM some time after the injection is completed, remount the ISO and install CAStools again.
5. If no errors are reported on the UI after the VM is migrated but you cannot access the desktop after the VM is powered on, a VM driver injection compatibility issue might exist. If this VM is in the compatible migrated VM list, contact Technical Support to locate the issue on site.
The bimodal HCI system also manages CAS resources and the lifecycle of VMs on CAS platforms. The aggregator-provider service on a CVM host adds, edits, and deletes sites. For added sites, it collects, updates, and deletes resource data. It also manages VMs, snapshots, and templates. The service logs are stored in /var/log/aggregator-provider and help troubleshoot missing resources, errors, or failed operations. Resources collected by the aggregator-provider service and the operation records are stored in the graph database table to help locate issues.
Distributed storage maintenance
Cluster issues
Rebalancing data placement when data imbalance occurs
ONEStor uses the CRUSH algorithm to automatically balance data across the object-based storage daemons (OSDs) in the cluster. Each OSD maps to a disk.
To rebalance data when occasional data imbalance occurs:
1. Execute the ceph osd df command and then identify the disk utilization of each OSD in the %USE field.
Figure 1 Identifying the disk utilization of each OSD
2. If the disk utilization of some OSDs is significantly higher than that of other OSDs, execute the ceph osd reweight-by-utilization command to rebalance data.
IMPORTANT: Data rebalancing is read and write intensive and might degrade cluster performance. To minimize its impact on storage services, perform this operation at off-peak hours.
3. Verify that the system has finished the rebalancing operation successfully.
Execute the ceph -s command to monitor the cluster health state. When the cluster state changes to HEALTH_OK, you can determine that the system has finished the rebalancing operation.
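The comparison in step 2 can be made explicit by checking the utilization of each OSD against the cluster average. The following is a minimal sketch. The find_overloaded_osds function is an illustrative name, the 120% threshold is an assumed value chosen for illustration, and the utilization figures are hypothetical values that stand in for the %USE column of the ceph osd df output.

```python
from statistics import mean


def find_overloaded_osds(utilization, threshold_percent=120.0):
    """Return the OSDs whose utilization exceeds threshold_percent of the average."""
    if not utilization:
        return []
    limit = mean(utilization.values()) * threshold_percent / 100.0
    return sorted(name for name, used in utilization.items() if used > limit)


# Hypothetical %USE values, not taken from a real cluster.
sample_utilization = {"osd.0": 41.0, "osd.1": 39.5, "osd.2": 72.3, "osd.3": 40.2}
overloaded = find_overloaded_osds(sample_utilization)
print(overloaded)
```

If the returned list is empty, the utilization spread is within the assumed threshold and rebalancing is probably unnecessary.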
Method to accelerate data rebalancing when the cluster is in an idle state
When the cluster is in an idle state, you can accelerate data rebalancing, as follows:
1. Log in to UIS Manager.
2. On the top navigation bar, click Storage, and then select Disk Pool Management from the left navigation pane.
3. Select the disk pool on which data rebalancing is to be performed, and then click Edit.
4. In the dialog box that opens, change the restore speed from self-adaptive to reconstruction first.
In the Handy HA scenario, the system is inaccessible through the management HA IP
Symptom
· The Handy management page is inaccessible via the management HA IP in the browser.
· After you log in to the system through the HA IP, the system prompts you to use the management IP. However, when you log in with the management IP, the system prompts you to use the HA IP instead.
Solution
1. Check the database process on the primary and backup Handy nodes. Identify the node where the database service fails to start. If neither node has the process running, use the last node that provided management HA service as the reference.
# ps aux | grep mariadbcluster
2. Delete the gvwstate.dat file on this node. Skip this step if the file does not exist.
# sudo rm -rf /var/lib/mariadbcluster/gvwstate.dat
3. Open the grastate.dat file on this node and set safe_to_bootstrap to 1.
# vim /var/lib/mariadbcluster/grastate.dat
4. Start the database service process on this node.
# service mariadbcluster bootstrap
5. Restart the database service processes on other nodes sequentially. (The nodes include primary/backup Handy nodes and nodes identified in Method 1.)
# service mariadbcluster restart
6. Check if the database service runs normally. After recovery, log in to the Handy interface again.
# /opt/h3c/bin/python /var/lib/ceph/shell/handyha/test_psql_status.py
If the script returns PSQL_READY when executed on the primary Handy node, the database cluster has recovered.
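The edit made to the grastate.dat file in step 3 changes a single key. The following is a minimal sketch of that change applied to the file content as text. It assumes the file uses the "key: value" layout of a Galera saved-state file. The mark_safe_to_bootstrap function is an illustrative name, and the sample content (including the all-zero UUID) is hypothetical.

```python
import re

# Matches the whole "safe_to_bootstrap: <number>" line without consuming the newline.
BOOTSTRAP_RE = re.compile(r"^(safe_to_bootstrap:)[ \t]*\d+[ \t]*$", re.MULTILINE)


def mark_safe_to_bootstrap(content):
    """Return the file content with safe_to_bootstrap set to 1."""
    updated, replaced = BOOTSTRAP_RE.subn(r"\g<1> 1", content)
    if replaced == 0:
        raise ValueError("safe_to_bootstrap entry not found")
    return updated


# Hypothetical file content for illustration only.
sample_state = (
    "# GALERA saved state\n"
    "version: 2.1\n"
    "uuid:    00000000-0000-0000-0000-000000000000\n"
    "seqno:   -1\n"
    "safe_to_bootstrap: 0\n"
)
updated_state = mark_safe_to_bootstrap(sample_state)
print(updated_state)
```

Only the safe_to_bootstrap line changes. All other lines, including the uuid and seqno values, must be left as they are.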
Node issues
Resolving host issues caused by a full system disk
A host might malfunction when the usage of its system disk reaches 100%. For example, Apache processes and the ceph-mon daemon might fail to start, resulting in issues such as the mon down error and inability to log in to the management node.
The system disk might become full for the following reasons:
· Too many large files and log files are present.
· The fio tester stores a large test0.0 file on the system disk. This issue occurs if you run fio without specifying the --filename option.
To free up disk space:
1. Execute the df -h command on the host to identify its system disk usage. The following is sample output:
root@cvknode86:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 28G 4.0G 23G 16% /
If the Use% field shows that the disk usage has reached 100%, proceed to remove unused files.
2. Remove unused large files or log files:
a. Access the /var/log directory and other directories that might contain large files or unused files.
b. Execute the du -h --max-depth=1 command to view the size of each folder in the directory.
c. Delete unused files.
3. Remove the test data file generated by fio:
a. Execute the echo "" > filename command to empty the test data file.
b. Execute the rm -rf filename command to delete the test data file.
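The per-folder size check in step 2 can also be scripted. The following is a minimal sketch that reports the size of each immediate child of a directory, similar in purpose to du --max-depth=1. The directory_sizes function is an illustrative name. The sketch sums file lengths rather than allocated disk blocks, so its figures can differ from those reported by du. The demonstration runs on a temporary directory with made-up file names.

```python
import os
import tempfile


def directory_sizes(root):
    """Return {child_name: total_bytes} for each immediate child of root."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_symlink():
            continue  # do not follow links out of the directory
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
        elif entry.is_dir():
            total = 0
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    try:
                        if not os.path.islink(path):
                            total += os.path.getsize(path)
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
            sizes[entry.name] = total
    return sizes


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "logs"))
    with open(os.path.join(demo_root, "logs", "a.log"), "wb") as handle:
        handle.write(b"x" * 2048)
    with open(os.path.join(demo_root, "test0.0"), "wb") as handle:
        handle.write(b"x" * 4096)
    demo_sizes = directory_sizes(demo_root)

for name, size in sorted(demo_sizes.items(), key=lambda item: item[1], reverse=True):
    print(size, name)
```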
Issues caused by network failure
Handling failures to add or delete hosts
Adding or deleting a host or the disks on the host fails if a network failure occurs before the system finishes the operation. The system then displays a failure message indicating that the operation (for example, deleting a host) failed because of a management network failure.
The solution to these issues differs depending on the timing of the network failure.
Network failure occurs before the system starts deleting disks
If network failure occurs before the system starts deleting disks, you only need to select the target host from the webpage and perform the operation again after the system regains network connectivity to the host.
If the connectivity to the host cannot be restored in extreme cases, for example, because the host's operating system is damaged, select the host from the webpage to delete it offline. However, data on the host's disks will remain. You must take action to handle residual data.
Network failure occurs before the system finishes deleting all disks
See "Network failure occurs before the system starts deleting disks."
Network failure occurs during disk formatting after all the disks are deleted from the cluster
The host will be invisible on the management webpages after the system deletes all its disks from the cluster and proceeds to disk formatting. If network failure occurs before the system finishes formatting all the disks, the data and Ceph partitions on the unformatted disks will remain. After the host restarts, the unformatted disks will be automatically mounted to the operating system. UIS Manager will be unable to discover these disks when the host is re-added to the cluster.
To resolve these issues, execute the umount command to manually unmount the residual disks before you add the host back to the cluster.
Deleting a storage node offline and restoring the node
Delete a storage node offline from the cluster on the webpage only if the network connectivity to its host cannot be restored. This operation directly removes the node from the cluster.
CAUTION: If abnormal PGs are present, data rebalancing might be in progress. To avoid loss of data, do not delete the node at this time.
CAUTION: Destroying the cluster data on a host will result in loss of all cluster data on that host. Be sure that the node is no longer in use when you perform the operation.
These operations ensure that you can add the host back to the cluster as a storage, monitor, or backup management node for management high availability.
Disk issues
Identifying the data partitions to which the OSDs are mounted
The following sample output shows that OSDs have been mounted:
The following sample output shows that no OSDs have been mounted:
You must identify the mapping between an OSD and its disk based on the partition UUID (partuuid) when you remount the OSD if it was unmounted because of a disk issue.
To identify the partuuid of the data partition for an OSD, view the content of the fsid file in the OSD directory for that OSD, for example:
cat /var/lib/ceph/osd/ceph-8/fsid
d6d97f59-171e-46f7-9759-8037c7209bf1
To identify the partuuid values of all partitions on the host, execute the following command:
ll /dev/disk/by-partuuid/
lrwxrwxrwx 1 root root 10 Dec 6 19:55 260c435a-2c35-4562-979d-7a3d641dda48 -> ../../sdf2
Mount the partition to the target disk.
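The lookup described above (reading the fsid file and then finding the matching entry in the /dev/disk/by-partuuid directory) can be scripted. The following is a minimal sketch. The find_osd_partition function is an illustrative name, and the demonstration builds a temporary directory tree with a made-up device name instead of reading the real /dev directory.

```python
import os
import tempfile


def find_osd_partition(osd_dir, by_partuuid_dir="/dev/disk/by-partuuid"):
    """Return the device path for the OSD data partition, or None if not found."""
    with open(os.path.join(osd_dir, "fsid")) as handle:
        partuuid = handle.read().strip()
    link = os.path.join(by_partuuid_dir, partuuid)
    if not os.path.islink(link):
        return None
    # Resolve the symbolic link (for example, ../../sdf2) to an absolute path.
    return os.path.realpath(link)


# Demonstration on a throwaway tree that mimics the real layout.
with tempfile.TemporaryDirectory() as root:
    osd_dir = os.path.join(root, "ceph-8")
    by_partuuid = os.path.join(root, "dev", "disk", "by-partuuid")
    os.makedirs(osd_dir)
    os.makedirs(by_partuuid)
    device = os.path.join(root, "dev", "sdf1")  # made-up device node
    open(device, "w").close()
    uuid = "d6d97f59-171e-46f7-9759-8037c7209bf1"
    with open(os.path.join(osd_dir, "fsid"), "w") as handle:
        handle.write(uuid + "\n")
    os.symlink("../../sdf1", os.path.join(by_partuuid, uuid))
    demo_device = find_osd_partition(osd_dir, by_partuuid)
    demo_expected = os.path.realpath(device)
    demo_missing = find_osd_partition(osd_dir, os.path.join(root, "dev"))

print(demo_device)
```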
OSD for a disk cannot be deleted upon a disk replacement prior to deletion of its OSD from UIS Manager
If you replace a faulty disk prior to deleting its OSD from UIS Manager, Handy adds a new disk and OSD mapping for the replacement disk. When you attempt to delete the original OSD, you will receive a no data found message and the deletion attempt will fail.
To resolve this issue:
1. Execute the lsblk command to verify that no disk has been mounted at the old OSD node. If a disk is still mounted at that OSD node, unmount it first.
Mount status:
Unmount status:
2. Execute the ps -ef | grep osd command to check whether the old OSD daemon has stopped.
3. Execute the following commands to stop the OSD daemon and remove the OSD from the cluster. Replace x in these command lines with the OSD daemon ID.
CAUTION: These commands will erase user data. Make sure you fully understand their impact on services before you use them. If you are not sure of their impact, contact H3C Support.
stop ceph-osd id=x
ceph osd out osd.x
ceph osd crush remove osd.x
ceph auth del osd.x
ceph osd rm osd.x
4. Execute the ceph osd tree command to verify that the OSD has been removed from the cluster.
5. Log in to UIS Manager to verify that the failed disk has been deleted.
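Because the commands in step 3 erase user data, it can help to print the exact command sequence for one OSD ID and review it before running anything. The following is a minimal sketch under that assumption. The function name print_osd_removal_cmds is hypothetical, and the function only prints the commands listed above without executing them.

```shell
# Sketch: print (do not run) the removal sequence for one OSD ID.
# The function name is hypothetical. The commands are the ones listed in step 3.
print_osd_removal_cmds() {
    local id
    id="$1"
    # Accept only a non-negative integer as the OSD ID.
    case "$id" in
        ''|*[!0-9]*) echo "OSD ID must be a non-negative integer" >&2; return 1 ;;
    esac
    printf '%s\n' \
        "stop ceph-osd id=$id" \
        "ceph osd out osd.$id" \
        "ceph osd crush remove osd.$id" \
        "ceph auth del osd.$id" \
        "ceph osd rm osd.$id"
}
```

For example, print_osd_removal_cmds 8 prints the five commands with osd.8 filled in, so that the ID can be double-checked against the lsblk and ps output first.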
The UIS Web interface shows a slow disk alarm.
· Regardless of whether the slow disk alarm is cleared within 10 minutes, strongly consider replacing the disk. After replacement, when the OSD returns to up state, any unresolved slow disk alarm will be cleared automatically. For disk replacement steps, see "Replacing a disk on a CVK host."
· If the alarm is cleared within 10 minutes and the OSD remains up without disk replacement, manually acknowledge the alarm in the Handy interface. If the alarm occurs again, replace the disk.
A disk fails to be added
Symptom
No available disks. The OSDs in this node have been used by the Ceph cluster.
To check if a disk is in use:
1. Run lsblk to view the target disk and its partitions.
2. Execute the sudo gdisk -l /dev/xxx command (xxx: disk name). If partitions contain Ceph identifiers, the disk is already in use.

Solution
Before using this method, confirm the disk is unused to avoid accidental data deletion.
If the disk has no user data but only residual partitions, run sudo ceph-disk zap /dev/xxx (xxx: disk name) to clear residual data and retry adding the disk.
Troubleshooting
Cluster initialization issues
Host scan failure
Symptom
A host cannot be discovered during cluster setup.
Solution
To resolve this issue:
· Check the network configuration as follows:
a. Verify that the management interface of the target host is in the same LAN as the management interface of the management node.
b. Verify that link aggregation is correctly configured on the switch interfaces connected to the management interface of the target host.
- If static link aggregation is configured, shut down one of the switch interfaces. After host scan is finished, bring up that interface.
- If dynamic link aggregation is configured, configure the host-facing aggregate interface as an edge aggregate interface by using the lacp edge-port command.
· Check for cluster initialization failure as follows:
a. Log in to each CVK host.
b. Access the /etc/cvk path and delete the cvm_info file (if it exists) by using the following command.
rm -rf cvm_info
c. Access the /root/.ssh path and delete the mhost file (if it exists) by using the following command.
rm -rf mhost
· Log in to the target host, access the /root/.ssh path, and delete the isCvmFlag file by using the following command. This file indicates that the host has acted as a management host.
rm -rf isCvmFlag
· Check for server serial number errors as follows:
a. Log in to the scanned host via SSH and execute the following command, where sn1234567 is the serial number. Make sure it does not conflict with others and matches the standard length.
echo "sn1234567" > /etc/cvk/.tmpSN
b. Restart the service on the host.
systemctl restart uisoncfg.service
c. Rescan the host and proceed with deployment.
Compute cluster creation failure
Symptom
Creation of a compute cluster fails.
Solution
To resolve this issue, verify that each host can reach the management, storage front-end, and storage back-end networks.
Storage configuration failure
Symptom
Storage configuration fails.
Solution
To resolve this issue:
1. If UIS fails to discover all disks or a designated disk, perform the following tasks:
a. Log in to the affected host and execute the parted /dev/sdX rm N command to delete all partitions from an undiscovered disk, where X is the drive letter and N is the partition number. Repeat the command for each partition on the disk.
b. Verify that the RAID controllers are included in the H3C CAS&UIS Server Virtualization Software and Hardware Compatibility Matrix.
2. If the distributed storage service is incorrectly installed on the management node, perform the following tasks:
a. Run the /opt/bin/uis_onestor_handy_install.sh script to reinstall ONEStor.
b. If an error is reported, contact Technical Support.
3. If device management is not supported by a server or RAID controller, execute the devmgr_check_dev_type command. If the value of for_DM_ONEstor is False, device management is not supported. Verify again in H3C CAS&UIS Server Virtualization Software and Hardware Compatibility Matrix.
4. If storage initialization is stuck, perform the following tasks:
a. Execute the supervisorctl status command to identify whether the onestor-peon process is restarting repeatedly.
b. Open the /var/log/supervisor/onestor-peon-stderr log file, for example, by using vim (use Tab to autocomplete the file name). If the file contains TimeoutError: Lock error: Matplotlib failed to acquire the lock file: /root/.cache/matplotlib/fontlist-v330.json.matplotlib-lock, the issue is caused by this lock file.
c. Delete the /root/.cache/matplotlib/fontlist-v330.json.matplotlib-lock file.
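The log check for the stuck-initialization case can also be done non-interactively with grep. The following is a minimal sketch. The function name has_matplotlib_lock_error is hypothetical, and the log file path is passed in as an argument because the full file name is not given in this guide.

```shell
# Sketch: report whether a log file contains the Matplotlib lock error.
# The function name is hypothetical. Pass the full path of the
# onestor-peon stderr log as the only argument.
has_matplotlib_lock_error() {
    local log_file
    log_file="$1"
    # -q: print nothing, only set the exit status (0 if the text is found).
    grep -q 'Matplotlib failed to acquire the lock file' "$log_file"
}
```

The function returns 0 if the error text is present, so it can be used in a condition before the lock file is deleted.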
Cluster state
Health index lower than 100%
Symptom
The health index for a cluster is lower than 100%.
Solution
To resolve this issue:
1. Troubleshoot node failure or network disconnection issues as follows:
a. Log in to UIS, resolve alarms, and verify that the status of hosts is normal.
b. Log in to the command line of the management node, and verify connectivity to the hosts in the cluster by using ping operations.
2. Troubleshoot disk failure or RAID controller failure as follows:
a. Log in to UIS, and resolve the alarms generated for disk failure or RAID controller failure.
b. Log in to HDM, and resolve hardware alarms.
3. Check whether storage nodes are under maintenance or data balancing is in progress as follows:
a. Log in to UIS, and check whether storage nodes are under maintenance and whether data balancing is enabled.
b. Log in to the command line of the management node, and check whether data balancing is in progress.
Host deletion
Deletion failure prompt for successful host deletion
Symptom
The system displays a deletion failure prompt when a host is deleted successfully.
Solution
To resolve this issue:
1. Execute the lsblk command on the deleted host and check for OSDs that are still mounted.
2. Check whether a shell session is currently in the directory of such an OSD. An OSD cannot be unmounted while its directory is in use.
3. Execute the cd command to exit the OSD's directory, and then execute the umount /var/lib/ceph/osd/ceph-11 command.
4. Execute the sgdisk --zap-all /dev/sdf command to clear the partitions on the disk.
Disk issues
No available disk
Symptom
No disks are available.
Solution
To resolve this issue:
1. Verify that the OSDs on the affected host have been used by the Ceph cluster:
a. Execute the lsblk command to view partitions on the target disk.
b. Execute the gdisk -l /dev/drive letter command to check for the ceph tag.

2. If the target disk is not in use, execute the ceph-disk zap /dev/drive letter command to clear residual data on the disk, and then add the disk again.

3. If UIS still cannot discover the disk, execute the ceph-disk zap /dev/drive letter command again.
Insufficient disk count
Possible causes:
· Some disks have partitions. Check and clean them as described in "No available disk."
· The management interface has residual data. Clear the browser cache and reconfigure the settings.
· Some disk cache settings do not meet deployment requirements. Reconfigure the disk cache according to the deployment requirements.
Cluster alarms
Down monitor node
Symptom
A monitor node is down.
Solution
To resolve this issue:
1. On the top navigation bar, click Storage, and then select Storage Management > Node Management > Monitor Nodes from the left navigation pane.
2. If the down monitor node is powered off or shut down, start it up. Then, verify network connectivity between the cluster and the monitor node.
Figure 2 Verifying the monitor node state
Down OSD
Symptom
An OSD is down.
Solution
To resolve this issue:
1. Verify that the storage node where the down OSD resides is not powered off or shut down and it does not have network connectivity issues.
a. On the top navigation bar, click Storage, and then select Storage Management > Node Management > Storage Nodes from the left navigation pane.
b. If the storage node where a down OSD resides is powered off or shut down (no data is displayed for the storage node), start the storage node up. Then, verify network connectivity between the cluster and the storage node.
Figure 3 Verifying the storage node state
OSD process terminated unexpectedly
Symptom
An OSD process is terminated unexpectedly on a storage node.
Solution
To resolve this issue:
1. On the top navigation bar, click Storage, and then select Storage Management > Node Management > Storage Nodes from the left navigation pane.
2. Verify that the disks on the storage node are in normal state.
3. Log in to the host acting as the storage node through SSH from the management network, and execute the ceph osd tree command to view the status of all OSDs.
4. Execute the ps -ef | grep ceph-osd command to check the status of the osd processes.
5. If an osd process is not running, execute the systemctl start ceph-osd@OSD ID.service command to start it.
OSD soft link loss
Symptom
The OSD soft link for a disk is lost.
Solution
To resolve this issue:
1. Execute the lsblk command to view the OSD directory of the down disk.
2. Access the OSD directory by executing the following command:
cd /var/lib/ceph/osd/ceph-4
3. Enter ll to check whether the soft link exists. If the soft link exists, the journal file line contains the UUID of the disk.
4. If the soft link does not exist, execute the following command:
ceph-disk activate-all
Loose or faulty disk
Symptom
The OSD process of a disk is down, which indicates that the disk is loose or faulty.
Solution
To resolve this issue:
1. Examine the disk status LEDs of the affected server to locate the disk.
2. Replace the disk.
Abnormal PG state
Symptom
PGs are degraded, stale, stuck unclean, or undersized.
Solution
If no other alarms are generated for the abnormal PGs, data migration is in process. The PGs will recover automatically.
Cache alarm
Symptom
Physical cache alarms or logical cache alarms are generated for the following reasons:
· RAID is manually configured and the state of caches is incorrectly set during system deployment.
· Faults occur during operation of the cluster. For example, a battery fault for a RAID controller might cause logical cache errors.
Solution
To resolve this issue:
1. On the top right of the page, click Hot Key, and then select Health Check.
2. Select Physical Disk State and Logical Disk State, and then click Start.
Figure 4 Performing health check
3. Click Failure in the Cache State column for a faulty disk.
Figure 5 Disk with faulty caches
4. Fix the caches of the disk according to the remediation.
Figure 6 Remediation
Network suboptimal health alarm
When the Network suboptimal health alarm is enabled, the backend suboptimal network service triggers an alarm upon detecting NIC hardware failures.
When detecting a NIC hardware failure, the system isolates the NIC in the aggregated port based on the configured isolation policy. To troubleshoot the issue, identify whether the fault lies in the NIC or in the link, and replace a faulty NIC promptly.
1. Execute the ethtool -S ethx | grep crc_errors command to identify whether the number of CRC errors is increasing.
[root@cvknode1 ~]# ethtool -S eth0 | grep crc_errors
rx_crc_errors: 0
2. Execute the ethtool -m ethx command to identify whether the optical power for the NIC is normal.
3. Execute the cat /sys/class/net/ethx/carrier_changes command to identify whether the NIC keeps flapping.
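A single reading of a counter does not show whether it is increasing, so steps 1 and 3 are usually repeated after a short interval. The following is a minimal sketch of that comparison. The function names sum_crc_errors and counter_increased are hypothetical, and the 10-second interval in the comments is an arbitrary assumption.

```shell
# Sketch: detect whether CRC error counters are increasing.
# Function names are hypothetical.

# Read "ethtool -S" output on stdin and add up every counter whose
# name contains crc_errors.
sum_crc_errors() {
    awk -F': *' '/crc_errors/ { total += $2 } END { print total + 0 }'
}

# Return 0 if the second sample is larger than the first one.
counter_increased() {
    [ "$2" -gt "$1" ]
}

# Example use on a host (interface name and interval are assumptions):
#   before="$(ethtool -S eth0 | sum_crc_errors)"
#   sleep 10
#   after="$(ethtool -S eth0 | sum_crc_errors)"
#   counter_increased "$before" "$after" && echo "CRC errors are increasing"
```

The same counter_increased comparison can be applied to two readings of the carrier_changes file to see whether the link keeps flapping.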
Stateful failover
See H3C UIS Manager Stateful Failover Configuration Guide.
Monitoring node failure
Down monitoring node due to high system disk usage
Symptom
A monitoring node goes down because the system disk usage is high. The mon process exits or cannot start if the system disk usage exceeds 95%. The low disk space alarm is generated if the system disk usage crosses 70%.
To identify this symptom:
1. Execute the following command to check whether the mon process exists.
ps -ef|grep ceph-mon
2. If the mon process is not running, execute the df -h command to view the system disk usage.
root@cvknode1:df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 10G 9.6G 0.4G 96% /
udev 863M 12K 863M 1% /dev
tmpfs 349M 348K 349M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 873M 4.0K 873M 1% /run/shm
3. Check the status of the mon process by executing the ps aux | grep ceph-mon command.
root@cvknode20216:~/515# ps aux | grep ceph-mon
root 2619507 0.0 0.1 8112 2136 pts/3 S+ 17:47 0:00 grep --color=auto ceph-mon
Solution
To resolve this issue, release system disk space, start the mon process, and then verify its state, for example, by executing the service ceph-mon@node name status command. The service name differs between nodes.
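The two thresholds in this symptom (alarm above 70%, mon process exit above 95%) can be checked with a short script. The following is a minimal sketch. The function names and the ok, warning, and critical labels are hypothetical, and the script assumes that the system disk is the file system mounted on /.

```shell
# Sketch: classify system disk usage against the thresholds in this section.
# Function names and labels are hypothetical.

# Print the usage percentage of the root file system as a bare integer.
# -P forces one line per file system so that column 5 is always Use%.
root_usage_percent() {
    df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Map a usage percentage to a label:
#   above 95 -> critical (the mon process exits or cannot start)
#   above 70 -> warning  (the low disk space alarm is generated)
classify_root_usage() {
    local pct
    pct="$1"
    if [ "$pct" -gt 95 ]; then
        echo "critical"
    elif [ "$pct" -gt 70 ]; then
        echo "warning"
    else
        echo "ok"
    fi
}

# Example use on a host:
#   classify_root_usage "$(root_usage_percent)"
```

With the sample df output above (96% used on /), the classification is critical.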
Down monitoring node due to network error
Symptom
A monitoring node goes down because of a network error.
To identify this symptom:
1. Verify that the mon process is running.
2. Verify that the monitoring nodes can ping one another.
3. Execute the arp -a and ifconfig commands to verify that the ARP table of the down monitoring node is correct.
Solution
To resolve this issue, troubleshoot the network error and start the mon process.
Extent backup file
Extent backup state
To verify that extent backup is enabled, execute the following command:
cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=""
# For details see man 4 crontabs
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed
0 22 * * 5 root python /opt/bin/ocfs2_pool_fstrim.pyc -s onestor
1 2 * * * root /opt/bin/cas_clean_log.sh
*/1 * * * * root python /opt/bin/uis_host_network_probe.pyc
*/5 * * * * root flock -xn /tmp/util_memory_dropcaches.sh.lock -c "/opt/bin/util_memory_dropcaches.sh"
*/3 * * * * root /opt/bin/check_abrt_memory.sh
* * * * * root /opt/bin/ocfs2_iscsi_conf_chg_timer.sh
*/10 * * * * root python /opt/bin/ocfs2_cluster_config.pyc -s
0 */12 * * * root python /opt/bin/ocfs2_filesystem_layout_backup.pyc
* * * * * root /opt/bin/tomcat_check.sh
*/10 * * * * root /opt/bin/ntp_mon.sh
* * * * * root /opt/bin/tomcat_check.sh
Extent backup directory
To locate an extent backup file in the extent backup directory, access the /vms/.ocfs2_extent_backup directory, and search by the file names for the target .lzo file.
In the following example, defaultPool_hdd is the storage pool, and the file name contains a timestamp.
ll -a /vms/.ocfs2_extent_backup/defaultPool_hdd/normal/
-rw-r--r-- 1 root root 176 Dec 24 00:00 .8257798_root_zhanji_1_202012240000.lzo
Therefore, the path of the most recent extent backup file is as follows:
/vms/.ocfs2_extent_backup/defaultPool_hdd/normal/.8257798_root_zhanji_1_202012240000.lzo.
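When the directory holds several backups of the same file, the most recent one can be picked by the timestamp in the file name. The following is a minimal sketch. The function name latest_extent_backup is hypothetical, and it assumes that file names follow the pattern in the example (a leading dot, a prefix, an underscore, and a fixed-width timestamp such as 202012240000), so that alphabetical order matches chronological order.

```shell
# Sketch: print the newest extent backup file for a given name prefix.
# The function name is hypothetical.
# Assumption: names look like .<prefix>_<YYYYMMDDhhmm>.lzo, so the
# alphabetically last match is the most recent one.
latest_extent_backup() {
    local dir prefix latest f
    dir="$1"
    prefix="$2"
    latest=""
    # Glob results are sorted, so the last existing match wins.
    for f in "$dir"/."$prefix"_*.lzo; do
        [ -e "$f" ] || continue
        latest="$f"
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

For example, latest_extent_backup /vms/.ocfs2_extent_backup/defaultPool_hdd/normal 8257798_root_zhanji_1 prints the path of the newest matching .lzo file and returns a nonzero status if no file matches.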
Extent backup file decompression
To decompress an extent backup file, first copy it to another directory, /home for example.
cp /vms/.ocfs2_extent_backup/defaultPool_hdd/normal/.8257798_root_zhanji_1_202012240000.lzo /home
cd /home
lzop -dv .8257798_root_zhanji_1_202012240000.lzo
Script for data restoration
To run the script for restoring data from an extent backup file, execute the following command:
python /opt/bin/ocfs2_restore_utils.pyc dd /dev/dm-0 /home/.8257798_root_zhanji_1_202012240000 /vms/hw235-1/8257798_root_zhanji_1_202012240000_new
The parameters in the script are as follows:
· /dev/dm-0—Drive letter of the shared storage that saves the extent backup file. To check the drive letter of shared storage, execute the fsmcli command.
fsmcli showpool --name defaultPool_hdd
…
device name: /dev/dm-0
device path: /dev/disk/by-id/dm-name-360000000000000000e0000003b75836c
device naa: 360000000000000000e0000003b75836c
· /home/.8257798_root_zhanji_1_202012240000—Decompressed extent backup file.
· /vms/hw235-1—Path on newly created shared storage or local storage to save the restored file. Make sure the target path has enough space. Do not save the restored file to the original shared storage.
· 8257798_root_zhanji_1_202012240000_new—Name of the restored file. This name must be different from the name of the original file.
Shared storage space reclamation
Releasing space of a shared volume by editing the VM bus type
1. Execute the df -h command to check the available space of the target shared volume.
2. Log in to the VM with the shared volume attached and check the drive letter and mount path of the data disk provided by the shared volume.
3. Log in to UIS, shut down the VM, and delete the data disk.
Figure 7 Editing the VM
4. Mount the data disk to the VM again by adding hardware, and select the high-speed SCSI bus type.
Figure 8 Mounting the data disk
5. Log in to the VM, and mount the data disk again with the new drive letter.
mount /dev/sda /vms/ruitest
6. Execute the fstrim /vms/ruitest command to release space.
7. Log in to the host where the VM resides and verify that the available space of the shared volume has increased.
Releasing space of a shared volume by deleting files
1. Mount a data disk whose bus type is high-speed SCSI to a VM by using the following command:
mount -o discard /dev/sda /vms/ruitest
2. Verify that the discard option is specified in the mount command.
3. Log in to the host where the VM resides and check the available space of the shared volume.
4. Delete a large file from the shared volume and verify that the available space of the shared volume has increased.
SNMP
Get responses not received by an NMS
Symptom 1
An NMS cannot receive get responses because the destination port for get responses is in use.
Solution 1
To resolve this issue:
1. Execute the netstat -apn | grep destination port command to obtain the process IDs for the destination port.
2. Execute the ps -aux | grep process ID command to check the processes that occupy the destination port.
3. If processes other than the snmp-get-responder process occupy the destination port, terminate those processes or kill them by using the kill process ID command.
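Steps 1 and 2 can be combined by extracting the process IDs directly from the netstat output. The following is a minimal sketch. The function name pids_on_port is hypothetical, and it assumes the usual netstat -apn column layout, in which the local address is the fourth column and the owner appears as PID/program name.

```shell
# Sketch: list the PIDs that own sockets bound to a given local port.
# The function name is hypothetical. Feed it "netstat -apn" output on stdin.
pids_on_port() {
    local port
    port="$1"
    awk -v port="$port" '
        # Keep TCP/UDP lines whose local address ends with ":<port>".
        $1 ~ /^(tcp|udp)/ && $4 ~ (":" port "$") {
            # The owner column looks like "1234/snmpd". UDP lines have no
            # state column, so search from the sixth field onward.
            for (i = 6; i <= NF; i++) {
                if ($i ~ /^[0-9]+\//) {
                    split($i, a, "/")
                    print a[1]
                    break
                }
            }
        }' | sort -u
}

# Example use on a host (161 is only an illustrative port number):
#   netstat -apn | pids_on_port 161
```

Each printed PID can then be inspected with ps before you decide whether to terminate the process.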
Symptom 2
An incorrect OID is configured for SNMPv1 get responses on an NMS.
Solution 2
To resolve this issue:
1. Log in to the leader storage node and execute the snmpget -v1 -c $community $ip:$port $oid command.
¡ $community—Community name. To ignore this configuration, enter public.
¡ $ip—Storage-end IP address.
¡ $port—Destination port for get responses.
¡ $oid—OID configured on the NMS.
If the following error message is output, the OID on the NMS is incorrect.
2. Modify the OID, and verify that the oid=string information is output.
Symptom 3
An incorrect OID is configured for SNMPv2c or SNMPv3 get responses on an NMS.
The storage supports the following OID ranges:
· 1.3.6.1.4.1.25506.1.7.1.2
· 1.3.6.1.4.1.25506.1.7.1.9
· 1.3.6.1.4.1.25506.1.7.1.10
· 1.3.6.1.4.1.25506.1.7.1.12
· 1.3.6.1.4.1.25506.1.7.1.13
On the NMS, a number in the range of 0 to 2147483647 is added to the end of an OID.
Solution 3
To resolve this issue:
1. Check the /var/log/onestor/snmp_get_responder.log file.
2. If the NoSuchObjectError error exists, the OID is not among the OIDs supported by the storage, and the OID does not exist in the MIB. Verify that the OID does not exceed the valid length.
3. If the NoAccessError error exists, the OID is not among the OIDs supported by the storage. The OID exists in the MIB, but the node does not have read or write permission. Verify that the OID is not shorter than the valid length.
4. If the ValueConstraintError error exists, make sure that the last number of the OID is in the range of 0 to 2147483647.

5. After you correct the OID, verify that the Success to write the vars log message is generated.
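An OID can be checked against the rules above before it is configured on the NMS. The following is a minimal sketch. The function name check_oid and the result labels are hypothetical, and the sketch assumes that a valid OID is exactly one of the supported OID ranges followed by a single number in the range of 0 to 2147483647.

```shell
# Sketch: validate an OID against the supported ranges in this section.
# The function name and result labels are hypothetical.
# Assumption: a valid OID is a supported range plus one trailing number.
SUPPORTED_OID_PREFIXES="1.3.6.1.4.1.25506.1.7.1.2 1.3.6.1.4.1.25506.1.7.1.9 1.3.6.1.4.1.25506.1.7.1.10 1.3.6.1.4.1.25506.1.7.1.12 1.3.6.1.4.1.25506.1.7.1.13"

check_oid() {
    local oid prefix last
    oid="${1#.}"          # tolerate a leading dot
    prefix="${oid%.*}"    # everything before the last component
    last="${oid##*.}"     # the last component
    case " $SUPPORTED_OID_PREFIXES " in
        *" $prefix "*) ;;
        *) echo "unsupported-range"; return 1 ;;
    esac
    # The last component must be made of digits only.
    case "$last" in
        ''|*[!0-9]*) echo "bad-index"; return 1 ;;
    esac
    # Reject overlong numbers first so the integer comparison cannot overflow.
    if [ "${#last}" -gt 10 ] || [ "$last" -gt 2147483647 ]; then
        echo "bad-index"
        return 1
    fi
    echo "ok"
}
```

For example, check_oid 1.3.6.1.4.1.25506.1.7.1.2.0 prints ok, and an OID that ends in 2147483648 is reported as bad-index.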
Value-added services
Data of a value-added service in the memory is different from that in the database
Analysis
This issue occurs if the handy node fails. Upon such a system event, a value-added service fails to update its data in the database, which causes data inconsistency between the memory and the database.
Solution
The solution varies by value-added service as follows:
· For the volume migration service, delete the inconsistent migration pairs, and then create migration pairs as needed.
· For the volume copy service, stop the inconsistent copy tasks, and then start copy tasks as needed.
Data inconsistency occurs if you mount a volume on a Windows client and create a snapshot online
Analysis
The product provides the storage-side snapshot function. When the system creates a snapshot, the host side might cache data. The hang IO service is used to implement data synchronization at multiple time points. This ensures that data is flushed to the data buffer on the host side at the time when a snapshot is created. Therefore, if the Windows client performs data caching at the time when a snapshot is created, data of the snapshot might be different from the real data.
Solution
As a best practice to avoid this issue, use an agent on the host side to achieve data caching and data flushing to the data buffer upon snapshot creation. However, no such agent is available at present. Alternatively, you can take snapshots offline.
If you mount multiple snapshots of a volume on a Windows client at the same time, you are prompted that some snapshots are not initialized or assigned
Analysis
This issue might occur if you synchronously map a volume and its snapshots to the same host. The operating system of that host might recognize the source volume and its snapshots as the same volume, due to the volume recognition mechanism used by the operating system. For example, in the Oracle ASM scenario, a host identifies different volumes by ASM disk header information. This error will result in data corruption of the source volume and its snapshots.
Solution
Do not map a volume and its snapshots to the same host synchronously.
If you take a snapshot for a volume, delete its host mapping on the handy page without disk scanning or iSCSI disconnection, and restore the snapshot, the restored data is different from the original data.
Analysis
When the volume is unmapped from the host on the storage side, the host side is not aware of this event and still has data cache. If you restore data from the volume snapshot and mount the restored volume to the host again, data cache of the host will overwrite data of the restored volume.
Solution
Perform one of the following tasks before restoring data from the volume snapshot:
· Unmap the source volume from the host and perform disk scanning.
· Tear down the iSCSI connection.
If you create a read-only snapshot for a volume that is mounted by a directory, the snapshot cannot be mounted and the system prompts a wrong fs type message
Analysis
When you mount a volume on a Linux client, the new file system might not be flushed to the data buffer due to data caching. In this situation, if you take a snapshot for the mounted volume, the snapshotted file system is incomplete. Errors will occur if you mount the snapshot later.
Solution
Unmount the volume from the Linux client before snapshot creation.
The state of a snapshot is Creating, Deleting, or Restoring
Analysis
This issue might occur if the following conditions exist:
· The system has an exception and thus fails to create, delete, or restore a snapshot.
· The system cannot roll back its system records.
Solution
· For snapshots in Creating or Deleting state, manually delete the residual records generated for those snapshots.
· For snapshots in Restoring state, restore those snapshots again.
Compatibility
When the Intel ixgbe network adapter is enabled with load balancing, storage access gets slow
To avoid this issue, perform the following tasks:
1. Use the ethtool -i eth0 command to check whether the driver is ixgbe.
2. Use the ethtool -k eth0 command to check whether the large-receive-offload (LRO) service is disabled.
3. If the LRO service is enabled, use the ethtool -K eth0 lro off command to disable this service.
To ensure that the LRO service is disabled upon startup, add the ethtool -K eth0 lro off command in the /etc/rc.local file.
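The LRO check in step 2 can be scripted by reading the large-receive-offload line from the ethtool -k output. The following is a minimal sketch. The function name lro_state is hypothetical, and eth0 in the comments is only an example interface name.

```shell
# Sketch: print the LRO state ("on" or "off") from "ethtool -k" output.
# The function name is hypothetical. Feed it the command output on stdin.
lro_state() {
    awk -F': *' '
        $1 ~ /large-receive-offload/ {
            # The value can be followed by a marker such as "[fixed]".
            split($2, a, " ")
            print a[1]
            exit
        }'
}

# Example use on a host:
#   if [ "$(ethtool -k eth0 | lro_state)" = "on" ]; then
#       ethtool -K eth0 lro off
#   fi
```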
Using a QoS policy with low bandwidth and IOPS limits makes the storage disks of a client slow
Analysis
The I/O of a client might drop to 0 if the following conditions exist:
· The client uses multiple storage disks and a QoS policy with low bandwidth and IOPS limits is applied to those disks.
· Each used storage disk has high I/O concurrency. For more information about I/O concurrency, see the configuration file in method 2.
If Number of storage disks × Number of I/O concurrencies per storage disk is greater than the number of concurrencies on the iSCSI initiator, those storage disks have high concurrency.
Solution
To resolve this issue, use one of the following methods:
· Method 1: Distribute the service load if the service load is heavy on a single client.
¡ If only one client is available and you must deploy multiple storage disks on the client, install the multipathing service on the client and configure multiple iSCSI connections.
¡ If you can use multiple clients, distribute storage disks across different clients.
· Method 2: Increase the I/O limit on the iSCSI initiator.
a. Open the iSCSI initiator configuration file on the client. The default path is /etc/iscsi/iscsid.conf.
b. Find the session and device queue depth area in the configuration file, and then increase the value to the maximum (2048) for the node.session.cmds_max parameter.
Figure 9 Original I/O limit
Figure 10 New I/O limit
c. After the modification, restart the iSCSI initiator.
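The high-concurrency condition from the analysis (number of storage disks multiplied by the I/O concurrency per disk, compared with the initiator limit) can be evaluated with a short script. The following is a minimal sketch. The function names read_cmds_max and high_concurrency are hypothetical, and the disk count and per-disk concurrency must be supplied by you because this guide does not state how to obtain them.

```shell
# Sketch: compare the expected I/O concurrency with node.session.cmds_max.
# Function names are hypothetical.

# Print the value of node.session.cmds_max from an iscsid.conf-style file.
# Commented-out lines are ignored. If the key appears twice, the last wins.
read_cmds_max() {
    awk -F'=' '
        /^[ \t]*node\.session\.cmds_max[ \t]*=/ { v = $2 + 0; found = 1 }
        END { if (found) print v; else exit 1 }' "$1"
}

# Return 0 if disks x per-disk concurrency exceeds the initiator limit.
# Arguments: <number of disks> <concurrency per disk> <initiator limit>
high_concurrency() {
    [ $(( $1 * $2 )) -gt "$3" ]
}

# Example use on a client (8 disks and 32 per disk are illustrative values):
#   limit="$(read_cmds_max /etc/iscsi/iscsid.conf)"
#   high_concurrency 8 32 "$limit" && echo "increase node.session.cmds_max"
```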
Failure to recognize an encryption dongle by VMs
To add an encryption dongle to a VM, make sure the dongle supports USB over network.
If an issue persists, contact Technical Support. As a best practice, use USBServer. For the supported models, see UIS compatibility matrix.
After a USB device is plugged into a CVK host, the host cannot recognize the USB device
Symptom
After a USB device is plugged into a CVK host, you cannot find the USB device when you attempt to add a USB device on the Web management page of UIS.
Analysis
Troubleshoot this issue as follows:
1. This issue occurs if the USB device is plugged into an incorrect slot. You can insert the USB device to another slot, for example, a USB slot inside the server. If the server has multiple types of USB slots, make sure the USB device is plugged into the matching slot.
To check whether a USB device is plugged into the correct slot, use the lsusb -t command. The following is an output example:
root@cvk-163:~# lsusb -t
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/15p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/8p, 480M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/6p, 480M
In the command output:
¡ UHCI represents USB 1.1. The maximum data transfer speed of USB 1.1 is 12Mbps.
¡ EHCI represents USB 2.0. The maximum data transfer speed of USB 2.0 is 480Mbps.
¡ XHCI represents USB 3.0. The maximum data transfer speed of USB 3.0 is 5Gbps.
If the server supports multiple USB standards and you plug a USB 2.0 device into the correct slot on the server, a USB device is added in the bus of USB 2.0 (ehci-pci).
At present, USB 3.0, 2.0, and 1.0 are supported. Although you can plug a lower-version USB device into a higher-version USB slot, USB device incompatibility issues might occur. For example, when you plug a USB 1.0 device into a server that has only USB 3.0 slots, disable USB 3.0 in the BIOS of that server to avoid USB device incompatibility issues.
If the host still cannot recognize the USB device, proceed to the next step.
2. On the command shell of the CVK host, use the lsusb command before and after you plug the USB device into the host. Compare the outputs to identify whether a new USB device is added. The following is an output example:
root@ CVK:~# lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 002: ID 03f0:7029 Hewlett-Packard
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
If no new USB device is added, the Ubuntu operating system cannot recognize the USB device. In this situation, the USB device might have faults, because an operating system with the Linux kernel supports most of the USB devices on the market. To check whether the USB device operates correctly, you can plug the USB device into an office PC. If the USB device can operate correctly on the PC, it is normal and you need to proceed to the next step.
3. Check whether the CAS system has faults or the server is not compatible with the USB device.
a. Install the operating system of an office PC on the server, plug the USB device into the server, and then check whether the system can recognize the USB device.
- If it cannot be recognized, the server is not compatible with the USB device.
- If it can be recognized, the server is compatible with the USB device.
b. Install the native CentOS system on the server, plug the USB device into the server, and then check whether the system can recognize the USB device.
- If it cannot be recognized, the CentOS system does not support the USB device. Because UIS is CentOS-based, UIS does not support the USB device either.
- If it can be recognized, the CentOS system supports the USB device. Proceed to the next step.
4. Use the virsh nodedev-list usb_device command to view the name of the new USB device. The following is an output example:
root@ CVK:~# virsh nodedev-list usb_device
usb_2_1_5
usb_usb1
usb_usb2
usb_usb3
usb_usb4
As shown in the command output, the name of the new USB device is usb_2_1_5. Then, use the virsh nodedev-dumpxml xxx command to view XML information of USB device usb_2_1_5. The following is an output example:
NOTE: The xxx argument represents the name of a device. You can obtain this information by using the virsh nodedev-list usb_device command.
root@CVK:~# virsh nodedev-dumpxml usb_2_1_5
<device>
<name>usb_2_1_5</name>
<path>/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.5</path>
<parent>usb_2_1</parent>
<driver>
</driver>
<capability type='usb_device'>
<bus>2</bus>
<device>70</device>
<product id='0x6545'>DataTraveler G2 </product>
<vendor id='0x0930'>Kingston</vendor>
</capability>
</device>
Check whether the bus ID, device ID, product ID, and vendor ID are correct. If these IDs are all correct and you still cannot find the USB device on the Web management page of UIS, contact Technical Support.
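The before-and-after comparison of lsusb output in step 2 can be done with saved output files instead of by eye. The following is a minimal sketch. The function name new_usb_devices and the file names in the comments are hypothetical.

```shell
# Sketch: print the lsusb lines that appear only after the device is plugged in.
# The function name is hypothetical.
# Arguments: <file with lsusb output before> <file with lsusb output after>
new_usb_devices() {
    local before_file after_file
    before_file="$1"
    after_file="$2"
    # -F: fixed strings, -x: match whole lines, -v: keep lines that do not match.
    grep -Fxv -f "$before_file" "$after_file"
}

# Example use on a CVK host:
#   lsusb > /tmp/usb_before.txt
#   (plug in the USB device)
#   lsusb > /tmp/usb_after.txt
#   new_usb_devices /tmp/usb_before.txt /tmp/usb_after.txt
```

If the function prints nothing and returns a nonzero status, the host did not register a new USB device.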
After a USB device is loaded to a VM, the device manager of the VM cannot recognize the USB device, the USB device appears and disappears quickly, or there is an exclamation mark on the device
Symptom
After a USB device is loaded to a VM, one of the following occurs: the device manager of the VM cannot recognize the USB device, the USB device appears and disappears quickly, or an exclamation mark is displayed on the device.
Analysis
To resolve this issue:
1. Connect the USB device to another USB connector. If you use a USB extension cable, connect the USB device directly to a built-in USB connector and try again. If the server provides USB slots of multiple types, make sure the USB device is connected to the correct connector.
To identify whether the USB device is connected to the correct connector, use the lsusb -t command.
root@cvk-163:~# lsusb -t
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/15p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/8p, 480M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/6p, 480M
UHCI represents USB1.1, EHCI represents USB2.0, and XHCI represents USB3.0. Typically, the maximum transmission rate for USB1.1 is 12 Mbps, for USB2.0 is 480 Mbps, and for USB3.0 is 5 Gbps.
For example, if a server supports multiple USB bus standards and you plug in a USB 2.0 device, the device is inserted in the correct slot if a new device then appears under the USB 2.0 (ehci-pci) bus.
2. If a USB device such as a USB key, encryption token, or SMS modem is a USB 1.0 device and the server has only USB 3.0 connectors, disable USB 3.0 in the BIOS as a best practice.
3. To identify whether the CVK host can recognize the USB device, unplug and plug in the USB device, and then use the virsh nodedev-list usb_device command to check if there are any newly added USB devices.
¡ If no newly added USB device is detected, see "After a USB device is plugged into a CVK host, the host cannot recognize the USB device."
¡ If a newly added USB device is detected, proceed to the next step.
4. When you add the USB device to a VM, identify the USB version of the device (USB 1.0, USB 2.0, or USB 3.0) and make sure the selected USB controller matches it. Typically, use the USB 1.0 controller for USB devices such as USB keys, encryption tokens, and SMS modems.
5. If the VM does not recognize the USB device, the driver might be incompatible or outdated. Check whether the driver version matches the operating system of the VM.
One way to identify whether the driver is correct is to install the same operating system on a physical machine and test if the driver works correctly or consult with the USB device manufacturer. Another way is to create a similar VM on the VMware platform, install the same driver, and load the USB device to see if it is recognized by the VM.
If the correct driver is used, and the VM still cannot recognize the device, proceed to the next step.
6. Use virsh nodedev-dumpxml xxx to view the XML information of the newly added USB device. xxx represents the name of the newly added USB device in the output from the virsh nodedev-list usb_device command.
root@CVK:~# virsh nodedev-list usb_device
usb_2_1_5
usb_usb1
usb_usb2
usb_usb3
usb_usb4
In this example, the name of the newly added USB device is usb_2_1_5.
root@CVK:~# virsh nodedev-dumpxml usb_2_1_5
<device>
<name>usb_2_1_5</name>
<path>/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.5</path>
<parent>usb_2_1</parent>
<driver>
<name>usb</name>
</driver>
<capability type='usb_device'>
<bus>2</bus>
<device>70</device>
<product id='0x6545'>DataTraveler G2 </product>
<vendor id='0x0930'>Kingston</vendor>
</capability>
</device>
7. After loading the USB device to the VM, use the virsh nodedev-dumpxml xxx command again to examine if there is any change in the values of device ID, product ID, and vendor ID.
If there is a change in these values, it could be a compatibility issue between the server and the USB device. To troubleshoot this issue, try installing the same operating system used by the VM directly on the server and see if the USB device can be used normally. Examine the system logs for any errors. It is important to ensure that the USB device is not only visible but also functional. If the USB device works fine when the operating system is installed directly on the server, please contact H3C Support.
Use of USB3.0 devices
For a USB 3.0 device, if you select the USB 3.0 controller on the Web interface when adding the device to a VM but the device cannot be found in the VM after loading, possible reasons include:
· The VM lacks a USB 3.0 driver. USB 3.0 is a relatively new protocol, and some old operating systems do not have a built-in driver for it. In this case, download and install the USB 3.0 driver for the operating system.
On systems that support USB 3.0, Device Manager displays the items highlighted in the red box in the following figure:
· The USB 3.0 device is incompatible with the server. In this case, after you plug the USB 3.0 device into the server equipped with UIS, log in through an SSH terminal, and execute lsusb -t, no new device is displayed.
Use of USB-to-serial devices
Plug a USB-to-serial device into a server equipped with UIS, log in through an SSH terminal, and use lsusb -t to check for new USB devices. If the speed of the newly added device is 12 Mbps, select the USB 1.0 controller when you add the USB device to a VM. If the speed is 480 Mbps, select the USB 2.0 controller.
For example:
After you load a USB-to-serial device to a VM, no new serial port device is displayed on the VM, and the device is still not displayed after you install the USB-to-serial driver on the VM. This issue occurs because the selected USB 2.0 controller does not match the device speed. The issue is resolved after you change to a USB 1.0 controller.
A USB-to-serial cable is connected to four switches on one end and to a UIS-equipped server on the other end. After you log in through an SSH terminal and use the lsusb -t command to view new devices, the four newly added devices are not all displayed at the same time. If you unplug and replug the cable repeatedly, only one, two, or three devices are displayed. When an unrecognized USB connector is plugged in, the following syslog is generated:
The log is generated because bus negotiation errors occurred when the device and the server established a connection. In this case, identify whether the server is compatible with the USB-to-serial connection method as a best practice. In this example, the server is not compatible with the method. After the HP FlexServer R390 server used on-site is replaced with an R590 server, all four new devices are correctly identified.
NOTE: If USB issues persist after troubleshooting, check the compatibility list and use USB Server.
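The controller selection rule described for USB-to-serial devices (12 Mbps maps to the USB 1.0 controller, 480 Mbps to the USB 2.0 controller) can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the parser assumes the lsusb -t line layout shown earlier in this section, and the 5000M to USB 3.0 mapping is an assumption derived from the speed description above rather than a documented rule.

```python
import re

# Sample lines copied from the lsusb -t output shown earlier.
SAMPLE = """\
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/15p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/8p, 480M
"""

BUS_RE = re.compile(r"Bus (\d+)\.")
DEV_RE = re.compile(
    r"Port (\d+): Dev (\d+),.*Class=([^,]+), Driver=([^,]*), (\d+(?:\.\d+)?)M\s*$"
)

# Speed (Mbps) to controller type. The 5000 entry is an assumption.
CONTROLLER_BY_SPEED = {"12": "USB 1.0", "480": "USB 2.0", "5000": "USB 3.0"}


def parse_lsusb_tree(text):
    """Return one dict per device line, tagged with the bus it hangs off."""
    devices = []
    bus = None
    for line in text.splitlines():
        bus_match = BUS_RE.search(line)
        if bus_match:
            bus = int(bus_match.group(1))
        dev_match = DEV_RE.search(line)
        if dev_match:
            port, dev, cls, driver, speed = dev_match.groups()
            devices.append({
                "bus": bus, "port": int(port), "dev": int(dev),
                "class": cls, "driver": driver, "speed_mbps": speed,
            })
    return devices


def suggest_controller(speed_mbps):
    """Return the controller type for a speed, or None if it is unknown."""
    return CONTROLLER_BY_SPEED.get(speed_mbps)


for d in parse_lsusb_tree(SAMPLE):
    print(d["bus"], d["dev"], d["driver"], suggest_controller(d["speed_mbps"]))
```

Running the parser before and after plugging in the device and comparing the two lists is one way to spot the newly added entry and its speed.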
Performance improvement
Contact Technical Support.
Guest OS and VM restoration
Restrictions and guidelines
· This document provides a general Linux and Windows OS repair process, which can be referenced for other systems.
· Disaster recovery system repair does not ensure complete success. Perform data backup and take other necessary measures in advance.
· The repair method might not be able to completely repair the VM. If the damage is severe and cannot be repaired using ISO or related tools, professional disaster recovery tools might be needed for data recovery and rescue, such as Diskgenius and diskrec. If necessary, contact a professional data recovery company for assistance.
Preparation before repair
Backup of system disks
For a damaged system's hard drive, perform a full disk backup in advance as a best practice, in case one repair attempt fails and additional repair methods need to be attempted.
For a damaged hard drive, you can use dd or other backup tools to copy the disk and create a backup.
In virtualization systems, you can back up the VM image file and clone it to another storage pool. Alternatively, you can create a snapshot on the storage side for the disk data to prevent unexpected situations during repair.
Preparing the corresponding ISO system
For Linux systems, prepare a CentOS or Ubuntu ISO installation disk to facilitate repair of Linux system directories. For Windows systems, use the ISO file or disk with the same version as the damaged system.
CAUTION:
· As a best practice, use the same version or a newer version of the ISO to mount and repair the system.
· During the repair process, it may be discovered that the file system format in the old version of the ISO is incompatible with the new version, leading to repair failure.
Linux system repair steps
1. Mount the optical drive and configure the system to boot from the optical drive, and then restart the system.
In a virtual environment using CAS, mount the ISO file as the optical drive on the VM to be repaired. On the Edit VM page, set the boot sequence to prioritize booting from the optical drive.
2. Start the system and attempt to repair it on the terminal.
In a virtual environment, locate the IP address of the CVK used by the VM and the corresponding VNC port in the CAS interface. Use a VNC client installed on your PC to connect to the port. TightVNC is a recommended VNC client.
NOTE: As a best practice, do not use a browser console because some browsers may require frequent clearing of the browser cache to open the corresponding page after a few operations.
3. On the CentOS control interface, select Troubleshooting.
4. Select Rescue a CentOS System.
5. Select option 3 to enter the shell command prompt.
If an older version of the CentOS ISO is used, you can select the corresponding Skip button to enter the shell interface. The options for older CentOS versions include Continue, Read-only, Skip, and Advanced.
If using the Ubuntu ISO for repair, select Execute a shell in the installer environment.
CAUTION:
· The Ubuntu 18.04 ISO repair mode does not have the XFS-related tools installed by default. As a best practice, use the latest version of CentOS for XFS repair.
· Make sure to use the matching or updated version of the ISO.
6. Use the lvs command to check whether LVs are in use.
As shown in the following figure, 3 LVs are found, the swap does not need to be repaired, and the corresponding VG name is centos.
Use the lvchange -a y command to activate the corresponding LV to make it readable.
lvchange -a y centos/home
lvchange -a y centos/root
Check the file system on the corresponding LV. Different file systems require different repair commands. Use blkid /dev/centos/home to identify the file system.
blkid /dev/centos/home
CAUTION:
· Different installation systems might have different VGs (some are centos, while others are VolGroup01, etc.). Select the VGs appropriately based on the actual output content.
· If the system does not use LVM, use blkid to identify the file system on the corresponding /dev/sdaX partition.
7. Repair XFS.
xfs_repair /dev/centos/lv_root
If the repair fails, collect log information (if any) and contact Technical Support.
8. Repair Ext4.
fsck /dev/datavg/lv_data
If you are prompted to enter yes during the repair, enter yes. The repair steps for other file systems are similar.
9. Shut down the VM by executing the init 0 command.
10. Unmount the ISO drive and fall back to booting from the hard disk, and then restart the system.
11. Upon reboot, verify that the system's operations are normal.
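The choice between the repair commands in steps 7 and 8 depends on the file system type reported by blkid. The following minimal Python sketch illustrates that decision. The function name is hypothetical, the UUID values in the usage example are made up, and only the two file systems named in the steps above are mapped; any other type returns None so that the operator decides manually.

```python
import re

# File system type to repair tool, as used in steps 7 and 8 above.
REPAIR_COMMANDS = {"xfs": "xfs_repair", "ext4": "fsck"}


def repair_command(blkid_line):
    """Map one line of blkid output to a repair command string, or None."""
    device, sep, attrs = blkid_line.partition(":")
    if not sep:
        raise ValueError("unexpected blkid output: %r" % blkid_line)
    # \b keeps SEC_TYPE= and PTTYPE= from matching.
    match = re.search(r'\bTYPE="([^"]+)"', attrs)
    fs_type = match.group(1) if match else None
    tool = REPAIR_COMMANDS.get(fs_type)
    return "%s %s" % (tool, device.strip()) if tool else None


# Illustrative input only: the UUID is a placeholder.
line = '/dev/centos/home: UUID="11111111-2222-3333-4444-555555555555" TYPE="xfs"'
print(repair_command(line))
```

The sketch only builds the command string. It does not run anything, so the operator can review the command before executing it in the rescue shell.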
Windows repair operations and steps
Symptom
After a CAS upgrade, a Windows 2008 VM prompts for repair upon starting up. Selecting repair results in a loading screen freeze, while selecting normal startup results in a black screen.
Repair steps
1. Attach the disk to another working Windows VM.
If the object being repaired is a VM, you can mount the system disk image of the faulty VM onto a working Windows VM. Then, use the disk check tool provided by Windows to check and repair disk errors. Delete the system disk of the faulty VM via the Edit VM > Disk page with the Delete Hardware operation.
2. On the working VM, add the system disk of the faulty VM via the Add Hardware option.
3. Select the faulty VM image. At this point, the system disk of the faulty VM can be seen in the working system.
For Windows 2012, a similar process applies. Select Computer Management, select a disk to view its properties, and perform error checking.
4. After mounting the disk, an error message might appear. Click on the blue error area to proceed.
Alternatively, scan and repair the properties of both partitions.
CAUTION:
· Use original system ISO files for both the repair operations and the image files.
· In a virtualized environment, multiple VMs cannot mount the same qcow2 file at the same time. One VM must unmount the file before another VM can mount it for repair. A RAW format, preallocate set to zero format, or raw block format image can be mounted to multiple VMs simultaneously.
5. If errors persist after repair, an ISO file needs to be mounted for further repair. Reattach the repaired disk to the faulty VM. A black screen error might appear, indicating boot failure or bootmgr missing.
6. Mount the system disk in the optical drive to repair the bootmgr. Change the boot order to booting from the optical drive. In Windows 2008, open Repair Computer and select the command prompt window.
7. Enter the command below to repair the bootmgr file. The machine should restart normally after the bootmgr is repaired.
CAUTION:
· In a virtualization environment, select an IDE disk and mount the appropriate version of the ISO file.
· If the system still reports errors after repair, such as antivirus software or application startup errors, close or uninstall the related software from a normally working Windows system (for example, rename the program so that it cannot start). Then boot the system again and make adjustments based on the specific error information.
Upgrade
Contact Technical Support.
Independent deployment failure
Symptom
· After Workspace is installed in a VM, the system reports a 502 error. The Gauss installation logs show a failure to obtain the local network connection.
· Deployment by using an independent installation package.

Possible causes
The VM network is abnormal.
Solution
To resolve the issue, re-create a VM and select the correct Euler system, change the IP address, and restart the VM.
Unified authentication issue
CAS authentication service exception
Symptom
After the CAS service is enabled, you cannot log in to UIS due to CAS authentication failure or other issues.
Solution
1. SSH to the CLI console of CVM and execute the mysql -p uis command to access the MySQL console.
2. At the MariaDB [uis]> prompt, execute update TBL_PARAMETER set VALUE='0' WHERE NAME='cas.sso.enable';
3. Restart the UIS service: service uis-core restart.
4. Log in to UIS through the browser again.
UIS 2000 G6 hardware HA does not take effect
Checking server hardware information
1. Server model: H3C UIS 2000 G6.
2. Serial numbers for the upper and lower nodes on the server: *-L and *-U.
3. BMC information. Identify whether the upper and lower node CPLD, BIOS, and HDM versions match.
Checking the driver and application program status
If the CPLD_HA driver or CHD service fails to start, manually enable and start them by using the systemctl enable chd.service and systemctl start chd.service commands.
Checking the configuration file
Edit the configuration file at /etc/chd/chd.conf. Restart the CHD service after editing the file for the changes to take effect.
The hardware HA feature relies on the existing HA process cvk_ha. The cvk_ha process responds to CHD interrupts and completes fast HA migration.
Description for the configuration file:
cvk_ha { # Description, which must be unique.
srv_name "cvk_ha" # Name, which must be unique.
srv_pid "/var/run/ha_cvkd.pid" # PID file for running the process.
srv_proc_name "cvk_ha" # Process name that responds to signals. To obtain the process name, use the ps command.
srv_sig_on 10 # Server signal online
srv_sig_off 0 # Server signal offline
srv_sigs_max 0 # Set this value to limit the maximum number of failure signals sent. Use 0 to send signals continuously.
}
Identifying whether the cluster HA function is enabled

Verifying the configuration
Observe whether fast HA is triggered when AC power fails or the VM shuts down.
Operations and maintenance monitoring data fails to be displayed
Possible causes
Sudden time jumps in the cluster or other anomalies caused monitoring database anomalies.
Symptom 1:
1. The system displays data retrieval failure when the operations and maintenance monitoring data is obtained.
2. Check the Prometheus database logs. The system displays opening storage failed. Also check the Prometheus-cluster-stderr---xxxxx.log file.
3. View logs
Solution:
1. Delete abnormal WAL files.
Access the /opt/h3c/var/lib/prometheus_node/data/wal directory. Identify whether the file numbers in this directory are consecutive. As shown in the following figure, there are two consecutive subsequences: 000001, 000002, 000003, and 000006, 000007.
2. Delete the sub-sequence with the smaller number. If you find abnormalities in Prometheus-cluster-stderr---xxx.log during the above troubleshooting, perform the same operation on the /opt/h3c/var/lib/prometheus_cluster/data/wal directory.
3. Restart the Prometheus process:
¡ If an exception is found in prometheus-node-stderr---xxxx.log, restart the prometheus-node process by executing the supervisorctl restart prometheus-node command.
¡ If an exception is found in prometheus-cluster-stderr---xxxx.log, restart the prometheus-cluster process by executing the supervisorctl restart prometheus-cluster command.
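The check for consecutive WAL file numbers in steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the file names are taken from the example in step 1, and treating every run except the highest-numbered one as a deletion candidate is an interpretation of "the sub-sequence with the smaller number". The sketch only reports candidates and deletes nothing.

```python
def consecutive_runs(names):
    """Group purely numeric file names into runs of consecutive numbers."""
    segments = sorted((n for n in names if n.isdigit()), key=int)
    runs = []
    for name in segments:
        if runs and int(name) == int(runs[-1][-1]) + 1:
            runs[-1].append(name)
        else:
            runs.append([name])
    return runs


def lower_numbered_segments(names):
    """Return the files in every run except the highest-numbered run."""
    runs = consecutive_runs(names)
    return [name for run in runs[:-1] for name in run]


# File names from the example in step 1.
wal_files = ["000001", "000002", "000003", "000006", "000007"]
print(consecutive_runs(wal_files))
print(lower_numbered_segments(wal_files))
```

On a host, the list of names could come from os.listdir on the WAL directory. Non-numeric entries are ignored by the sketch.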
Symptom 2
1. The system displays data retrieval failure or no data exists when the operations and maintenance monitoring data is obtained.
2. Execute the following command to check Prometheus-related processes. You will find that prometheus-node or prometheus-cluster keeps restarting.
# supervisorctl status prometheus-node
# supervisorctl status prometheus-cluster
3. Check the Prometheus database logs. The system displays opening storage failed: invalid block sequence: block time ranges overlap.
¡ Example: level=error ts=2023-10-26T19:42:10.042Z caller=main.go:731 err="opening storage failed: invalid block sequence: block time ranges overlap:
¡ Also check Prometheus-cluster-stderr---xxxxx.log.
Solution
1. Delete the data in the directory with data errors.
¡ For the prometheus-node process, use the following commands to delete it.
# mkdir prometheus_node_bak
# cp -rf /opt/h3c/var/lib/prometheus_node/data/* prometheus_node_bak
# rm -rf /opt/h3c/var/lib/prometheus_node/data/*
¡ For the prometheus-cluster process, use the following commands to delete it.
# mkdir prometheus_cluster_bak
# cp -rf /opt/h3c/var/lib/prometheus_cluster/data/* prometheus_cluster_bak
# rm -rf /opt/h3c/var/lib/prometheus_cluster/data/*
This action will also delete historical monitoring data. Identify whether you need to back it up.
2. Restart the Prometheus process.
¡ If an exception is found in prometheus-node-stderr---xxxx.log, restart the prometheus-node process by executing the supervisorctl restart prometheus-node command.
Figure 11 Restarting the prometheus process
¡ If an exception is found in prometheus-cluster-stderr---xxxx.log, restart the prometheus-cluster process by executing the supervisorctl restart prometheus-cluster command.
Figure 12 Restarting the prometheus-cluster process
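Both symptoms in this section are identified from the opening storage failed message in the Prometheus stderr logs. The following Python sketch shows one way to tell them apart from log text. The function name and the returned labels are hypothetical, and the matching relies only on the message fragments quoted in this section.

```python
def classify_storage_error(log_text):
    """Classify Prometheus log text by the symptoms described above."""
    if "opening storage failed" not in log_text:
        return None
    if "block time ranges overlap" in log_text:
        # Symptom 2: back up and clear the data directory.
        return "symptom 2"
    # Symptom 1: check the WAL directory for non-consecutive files.
    return "symptom 1"


# Sample line from the Symptom 2 description (truncated in the source).
SAMPLE = ('level=error ts=2023-10-26T19:42:10.042Z caller=main.go:731 '
          'err="opening storage failed: invalid block sequence: '
          'block time ranges overlap:')
print(classify_storage_error(SAMPLE))
```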
Host discovery: Hosts have empty serial numbers or the same serial number.
Possible causes: The host hardware does not have a serial number or the VMs share the same serial number during setup.
1. Check if the serial number is empty or the same as another host.
2. Customize the serial number as shown in the following figure.
3. Rescan to identify whether the custom serial number has taken effect.
In the Handy HA scenario, you cannot access the Web interface by using the HA IP.
Symptom
Login fails because the primary and backup Handy nodes alternately went up and down or experienced an abnormal power outage.
· Symptom 1: The Handy management interface cannot be accessed through a browser by using the HA IP.
· Symptom 2: After you log in by using the HA IP, the system requires login with the management IP. However, when you log in with the management IP, the system requires login with the HA IP instead.
Solution
1. Check the database processes on the primary and backup handy nodes. Identify the node where the database service fails to start. If neither the primary nor the backup handy node has the process running, use the node where the HA IP provides services.
# ps aux | grep mariadbcluster
2. Delete the gvwstate.dat file on this node. Skip this step if the file does not exist.
# sudo rm -rf /var/lib/mariadbcluster/gvwstate.dat
3. Set the value for the safe_to_bootstrap parameter to 1 for the node.
# vim /var/lib/mariadbcluster/grastate.dat
Set the value for safe_to_bootstrap to 1.
4. Start the database service process on this node.
# service mariadbcluster bootstrap
5. Restart the database service processes on the other nodes in the cluster one by one. The nodes include the primary and backup handy nodes, as well as the nodes identified using Method 1.
# service mariadbcluster restart
6. Identify whether the database service is running correctly. Log in to the handy management interface again after recovery.
# /opt/h3c/bin/python /var/lib/ceph/shell/handyha/test_psql_status.py
Run the script on the primary handy node. If PSQL_READY is returned, the database cluster has recovered to normal.
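The edit in step 3 changes a single line in grastate.dat. The following Python sketch shows that text change in isolation. It assumes the file uses the usual Galera layout with a safe_to_bootstrap: N line; the function name and the sample content (including the all-zero UUID) are illustrative, and the sketch works on a string rather than touching the real file.

```python
import re

# Illustrative content only, not taken from a real node.
SAMPLE_GRASTATE = """# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000000
seqno:   -1
safe_to_bootstrap: 0
"""

SAFE_RE = re.compile(r"^(safe_to_bootstrap:[ \t]*)\d+[ \t]*$", re.MULTILINE)


def mark_safe_to_bootstrap(text):
    """Return the text with safe_to_bootstrap set to 1."""
    new_text, count = SAFE_RE.subn(r"\g<1>1", text)
    if count == 0:
        raise ValueError("safe_to_bootstrap line not found")
    return new_text


print(mark_safe_to_bootstrap(SAMPLE_GRASTATE))
```

Back up the original file before applying a change like this on a node, and apply it only on the node selected in step 1.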
Host 2 experienced a power cycle when Host 1 entered maintenance mode. After Host 2 recovered, the OSD took a long time to restart (about 100 minutes).
Symptom
Host 2 experienced a power cycle when Host 1 entered maintenance mode. After Host 2 recovered, the OSD took a long time to restart (about 100 minutes).
Solution
Log out all sessions on the failed node by executing the iscsiadm -m session -u command.
When the CPU frequencies of the source and destination hosts differ before and after VM migration, the CPU limit set before migration changes to an invalid value after VM migration from E801P01 to E886P01
Symptom
When the CPU frequencies of the source and destination hosts differ before and after VM migration, the CPU limit set before migration changes to an invalid value after VM migration from E801P01 to E886P01.
Solution
Manually edit the values.
Interoperation with a third-party alarm server
Configuring a third-party alarm server on the UIS platform
1. Enter the UC 2.0 platform address as the server address.
2. Use the default port number 162.
3. Use the default community private.
Configuring UC 2.0 to monitor UIS alarms
Setting alarm rules
Adding the UIS platform
Adding the UIS platform
UIS platform added successfully
UC 2.0 received alarms
Alarm troubleshooting guide
If the UC platform does not receive an alarm, follow the instructions below for troubleshooting.
1. Identify whether the UIS platform has generated alarms and identify the source (frontend or backend).
2. If alarms are sent, capture them in the UC backend for inspection.
tcpdump -i any -vnn udp and port 162 -w [pcap file]
3. If you do not receive the alarm trap ID, check the sender.
Commonly used commands
UIS Manager commands
HA commands
H3C UIS Manager provides HA features. The following are the commonly used HA commands.
All the following commands, except for the cha -k set-loglevel level command, run on a node where UIS Manager is deployed. The cha -k set-loglevel level command runs on a CVK host.
Obtaining the clusters managed by the HA process
cha cluster-list
# Obtain the clusters managed by the HA process.
root@UIS-UISManager:~# cha cluster-list
------------------------------------------------------------
HA database info:
Cluster list:
cluster:1, name:Cluster
HA memory info:
Cluster list:
cluster ID: 1
Obtaining state statistics for a cluster
cha cluster-status cluster-id
# Obtain the hosts and VMs in a cluster.
root@UIS-UISManager:~# cha cluster-status 1
------------------------------------------------------------
HA database info:
Cluster 1 information:
Is HA enabled: 1
Cluster priority: 1
2 nodes configured
6 VM configured
host and vm list:
Host:UIS-CVK01, vm:windows2008
Host:UIS-CVK02, vm:win2008
Host:UIS-CVK02, vm:rhce-lab
Host:UIS-CVK02, vm:Linux-RedHat5.9
Host:UIS-CVK02, vm:fundation1
Host:UIS-CVK02, vm:win7
HA memory info:
Cluster 1, Least_host_number(MIN_HOST_NUM) is 1.
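When a cluster has many VMs, the host and VM list in this output can be summarized with a short script. The following Python sketch groups VM names by host. It assumes the Host:name, vm:name line format shown in the sample output; vms_by_host is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Lines copied from the cha cluster-status sample output above.
SAMPLE = """host and vm list:
Host:UIS-CVK01, vm:windows2008
Host:UIS-CVK02, vm:win2008
Host:UIS-CVK02, vm:rhce-lab
Host:UIS-CVK02, vm:Linux-RedHat5.9
Host:UIS-CVK02, vm:fundation1
Host:UIS-CVK02, vm:win7
"""

HOST_VM_RE = re.compile(r"^\s*Host:\s*([^,]+?)\s*,\s*vm:\s*(.+?)\s*$")


def vms_by_host(output):
    """Return a dict that maps each host name to its list of VM names."""
    hosts = defaultdict(list)
    for line in output.splitlines():
        match = HOST_VM_RE.match(line)
        if match:
            hosts[match.group(1)].append(match.group(2))
    return dict(hosts)


for host, vms in vms_by_host(SAMPLE).items():
    print(host, len(vms), vms)
```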
Obtaining information for hosts in a cluster
cha node-list cluster-id
# Obtain information for hosts in a cluster.
root@UIS-UISManager:~# cha node-list 1
------------------------------------------------------------
HA database info:
In cluster 1, node list :
host: UIS-CVK01, in cluster: 1, IP: 192.168.11.1
host: UIS-CVK02, in cluster: 1, IP: 192.168.11.2
HA memory info:
Cluster 1, Least_host_number(PermitNum) is 1. hosts list:
host: UIS-CVK02 ID: 4
host: UIS-CVK01 ID: 3
Total host num in this cluster is: 2
Obtaining information for a host in a cluster
cha node-status host-name
# Obtain information for a host in a cluster.
root@UIS-UISManager:~# cha node-status UIS-CVK01
------------------------------------------------------------
HA database info:
Node UIS-CVK01 :
in cluster: 1
ip address: 192.168.11.1
VM count: 1
HA memory info:
Host: UIS-CVK01, ID: 3, IP address: 192.168.11.1
status: CONNECT
heart beat num: 101
storage total num: 1
storage fail num: 0
heartbeat fail num: 0
recv packet: 1
host model(maintain): 0
time statmp: Fri Jan 30 15:34:04 2015
Storage info:
storage name:sharefile path:/vms/sharefile
storage status:STORAGE_NORMAL
time stamp:0
update flag:0
last send flag:0
fail num:0
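The fields in this output that are most often checked during troubleshooting can be pulled out with a small parser. The following Python sketch is illustrative only: the function names are hypothetical, the field labels are copied from the sample output, and the notion of "healthy" used here (CONNECT status, zero failure counters, STORAGE_NORMAL) is an assumption based on the sample rather than a documented definition.

```python
import re

# Fields copied from the cha node-status sample output above.
SAMPLE = """Host: UIS-CVK01, ID: 3, IP address: 192.168.11.1
status: CONNECT
heart beat num: 101
storage total num: 1
storage fail num: 0
heartbeat fail num: 0
Storage info:
storage name:sharefile path:/vms/sharefile
storage status:STORAGE_NORMAL
fail num:0
"""

FIELDS = {
    "status": r"^\s*status:\s*(\S+)",
    "heartbeat_fail_num": r"^\s*heartbeat fail num:\s*(\d+)",
    "storage_fail_num": r"^\s*storage fail num:\s*(\d+)",
    "storage_status": r"^\s*storage status:\s*(\S+)",
}


def parse_node_status(output):
    """Return the selected fields as strings, or None when a field is absent."""
    info = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, output, re.MULTILINE)
        info[key] = match.group(1) if match else None
    return info


def looks_healthy(info):
    """Assumed health rule: connected, no failures, storage normal."""
    return (info["status"] == "CONNECT"
            and info["heartbeat_fail_num"] == "0"
            and info["storage_fail_num"] == "0"
            and info["storage_status"] == "STORAGE_NORMAL")


info = parse_node_status(SAMPLE)
print(info, looks_healthy(info))
```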
Obtaining information for a VM on a host
cha vm-list host-name
# Obtain information for a VM on a host.
root@UIS-CVK03:~# cha vm-list UIS-CVK01
------------------------------------------------------------
HA database info:
1 vms in host UIS-CVK01 :
vm: windows2008 ID: 11 HA-managed: 1 Target-role: 1
Obtaining information for a VM in a cluster
cha vm-status vm-name
# Obtain information for a VM in a cluster.
root@UIS-CVK03:~# cha vm-status windows2008
------------------------------------------------------------
HA database info:
vm ID: 11 name: windows2008
at node ID: 3
target-role: 1
is-managed: 1
prority: 1
storage name: sharefile
storage psth: /vms/sharefile
Setting the log level
cha set-loglevel module level
Parameters:
· cmd|UIS managerd: Sets the log level for the cmd or UIS Manager process.
· level: Specifies the log level, including debug, info, trace, warning, error, and fatal.
# Set the log level.
root@UIS-UISManager:~# cha set-loglevel info
Setting the log level for a CVK host
cha -k set-loglevel level
Parameters:
level: Specifies the log level, including debug, info, trace, warning, error, and fatal.
# Set the log level for a CVK host.
root@UIS-CVK01:/vms/sharefile# cha -k set-loglevel debug
Set cvk log level success.
root@UIS-CVK01:/vms/sharefile#
vSwitch commands
The following are the basic commands for vSwitches in UIS Manager.
Obtaining the internal version number of the vSwitch
root@hz-cvknode2:~# ovs-vsctl -V
ovs-vsctl (Open vSwitch) 2.9.1
DB Schema 7.15.1
Displaying status of processes related to the vSwitch
Execute the ps aux | grep ovs command on a CVK host. ovs_workq is an OVS kernel process, and ovsdb-server and ovs-vswitchd represent a monitor process and service process, respectively. If the SDN network is initialized, there are four additional ovsdb-server processes, which represent the SDN network north-bound and south-bound database processes and north-bound and south-bound database monitor processes.
[root@cvknode1 ~]# ps aux | grep ovs
root 2133 1.5 0.0 9180 5444 ? S<s Nov08 329:47 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --verbose=PATTERN:FILE:%d{%b %d %H:%M:%S}|%05N|%c|%p|%m --verbose=PATTERN:CONSOLE:%d{%b %d %H:%M:%S}|%05N|%c|%p|%m --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach
root 2255 1.3 0.0 1885032 293384 ? S<Lsl Nov08 296:37 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --verbose=PATTERN:FILE:%d{%b %d %H:%M:%S}|%05N|%c|%p|%m --verbose=PATTERN:CONSOLE:%d{%b %d %H:%M:%S}|%05N|%c|%p|%m --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
root 371762 0.0 0.0 8200 452 ? Ss Nov18 0:00 ovsdb-server: monitoring pid 371763 (healthy)
root 371763 0.0 0.0 9400 5992 ? S Nov18 1:12 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/run/ovn/ovnnb_db.sock --pidfile=/run/ovn/ovnnb_db.pid --unixctl=/run/ovn/ovnnb_db.ctl --detach --monitor --remote=db:OVN_Northbound,NB_Global,connections --private-key=db:OVN_Northbound,SSL,private_key --certificate=db:OVN_Northbound,SSL,certificate --ca-cert=db:OVN_Northbound,SSL,ca_cert --ssl-protocols=db:OVN_Northbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Northbound,SSL,ssl_ciphers /var/lib/ovn/ovnnb_db.db
root 371859 0.0 0.0 8200 448 ? Ss Nov18 0:00 ovsdb-server: monitoring pid 371861 (healthy)
root 371861 0.0 0.0 12188 8640 ? S Nov18 1:36 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-sb.log --remote=punix:/run/ovn/ovnsb_db.sock --pidfile=/run/ovn/ovnsb_db.pid --unixctl=/run/ovn/ovnsb_db.ctl --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections --private-key=db:OVN_Southbound,SSL,private_key --certificate=db:OVN_Southbound,SSL,certificate --ca-cert=db:OVN_Southbound,SSL,ca_cert --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers /var/lib/ovn/ovnsb_db.db
root 2172726 0.0 0.0 21964 2408 pts/5 S+ 11:00 0:00 grep --color=auto ovs
Restarting a vSwitch
root@UIS-CVK01:~# service openvswitch-switch restart
Adding a vSwitch
root@UIS-CVK01:~# ovs-vsctl add-br vswitch-app
After a vSwitch is added successfully, you can see the vSwitch on UIS Manager after connecting all hosts on UIS Manager.
Deleting a vSwitch
root@UIS-CVK01:~# ovs-vsctl del-br vswitch-app
A vSwitch cannot be deleted from UIS Manager after it is deleted from the CVK host.
Adding a port for a vSwitch
root@UIS-CVK01:~# ovs-vsctl add-port vswitch-app eth2
Deleting a port from a vSwitch
root@UIS-CVK01:~# ovs-vsctl del-port vswitch-app eth2
The port on a vSwitch cannot be deleted from UIS Manager after it is deleted from the CVK host.
Displaying vSwitch and port information
vswitch0 is an internal port (or local port), eth0 is a physical port, and vnet0 is a vSwitch port.
root@UIS-CVK01:~# ovs-vsctl show
ba390c40-8826-4a7a-8e17-f8834dab6eb3
Bridge "vswitch0"
Port "eth0"
Interface "eth0"
Port "vswitch0"
Interface "vswitch0"
type: internal
Port "vnet0"
Interface "vnet0"
root@UIS-CVK01:~#
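The bridge and port hierarchy in this output can be turned into a dictionary for scripted checks, for example to confirm that a physical NIC is attached to the expected vSwitch. The following Python sketch assumes the Bridge, Port, and type line layout shown in the sample; parse_ovs_show is a hypothetical helper name, and a port without a type line is recorded as None.

```python
import re

# Copied from the ovs-vsctl show sample output above.
SAMPLE = """ba390c40-8826-4a7a-8e17-f8834dab6eb3
Bridge "vswitch0"
Port "eth0"
Interface "eth0"
Port "vswitch0"
Interface "vswitch0"
type: internal
Port "vnet0"
Interface "vnet0"
"""


def parse_ovs_show(output):
    """Return {bridge: {port: interface type or None}}."""
    bridges = {}
    bridge = port = None
    for raw in output.splitlines():
        line = raw.strip()
        match = re.match(r'Bridge "?([^"]+)"?$', line)
        if match:
            bridge, port = match.group(1), None
            bridges[bridge] = {}
            continue
        match = re.match(r'Port "?([^"]+)"?$', line)
        if match and bridge is not None:
            port = match.group(1)
            bridges[bridge][port] = None
            continue
        match = re.match(r"type: (\S+)$", line)
        if match and port is not None:
            bridges[bridge][port] = match.group(1)
    return bridges


print(parse_ovs_show(SAMPLE))
```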
Displaying the configuration on a vSwitch
root@UIS-CVK01:~# ovs-vsctl list br vswitch0
_uuid : 3500114d-5619-460e-ada7-d1b97f63c93c
br_mode : [0]
controller : []
datapath_id : "0000ac162d88c35c"
datapath_type : ""
drop_unknown_unicast: []
external_ids : {}
fail_mode : []
firewall_port : []
flood_vlans : []
flow_tables : {}
ipfix : []
mirrors : []
name : "vswitch0"
netflow : []
other_config : {}
ports : [16a48463-f90b-42fe-9a12-ceacfd256235, 5495812e-29e0-4364-a89f-b54ea52dd344, dec98186-2c83-447d-9215-28f99750a410]
protocols : []
sflow : []
status : {}
stp_enable : false
Displaying port configuration
root@UIS-CVK01:~# ovs-vsctl list port vnet0
_uuid : bc0b1e57-2d72-4fae-97b4-0bbca5d17ba1
TOS : routine
bond_downdelay : 0
bond_fake_iface : false
bond_mode : []
bond_updelay : 0
dynamic_acl_enable : false
external_ids : {}
fake_bridge : false
interfaces : [5495133f-7e81-4047-a0bd-734fae81f6f3]
lacp : []
lan_acl_list : []
lan_addr : []
mac : []
name : "vnet0"
other_config : {}
qbg_mode : [4]
qos : []
statistics : {}
status : {}
tag : [4]
tcp_syn_forbid : false
trunks : []
vlan_mode : []
vm_ip : []
vm_mac : "0cda411dad80"
wan_acl_list : []
wan_addr : []
Displaying the port number for a port in user mode and kernel mode
root@UIS-CVK01:~# ovs-appctl dpif/show
system@ovs-system: hit:10133796 missed:181938
flows: cur: 11, avg: 12, max: 23, life span: 79639399ms
hourly avg: add rate: 26.506/min, del rate: 26.462/min
daily avg: add rate: 24.205/min, del rate: 24.210/min
overall avg: add rate: 24.356/min, del rate: 24.354/min
vswitch0: hit:6478229 missed:39021
eth0 1/5: (system)
vnet1 2/8: (system)
vswitch0 65534/6: (internal)
For example, the port number of eth0 is 1 in user mode (OpenFlow port number) and 5 in kernel mode.
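The name-to-number mapping can be extracted from the output with a small filter. The following is a minimal sketch and not part of UIS: the function name ovs_port_numbers is hypothetical, and it assumes that port lines have the form shown above (port name, user-mode/kernel-mode numbers followed by a colon, port type).

```shell
# Print "name user_mode_port kernel_mode_port" for every port line found in
# `ovs-appctl dpif/show` output read from standard input.
# Port lines look like: "eth0 1/5: (system)".
ovs_port_numbers() {
    awk '$2 ~ /^[0-9]+\/[0-9]+:$/ {
        split($2, n, "/")      # n[1] = user-mode number, n[2] = "5:"
        sub(/:$/, "", n[2])    # strip the trailing colon
        print $1, n[1], n[2]
    }'
}
```

To use the sketch on a CVK host, pipe the command output into the function: ovs-appctl dpif/show | ovs_port_numbers. Lines that do not describe a port, such as the hit and flow statistics, are ignored.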
Displaying the MAC addresses on a vSwitch
root@UIS-CVK01:~# ovs-appctl fdb/show vswitch0
port VLAN MAC Age
1 0 00:0f:e2:5a:6a:20 134
2 0 0c:da:41:1d:3d:18 95
1 0 ac:16:2d:6f:3f:4a 6
1 0 a0:d3:c1:f0:a6:ca 6
1 0 c4:ca:d9:d4:c2:ff 2
4 0 0c:da:41:1d:6d:94 2
LOCAL 0 2c:76:8a:5d:df:a2 2
3 0 0c:da:41:1d:80:03 0
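When the MAC table is long, it can help to look up a single address. The following is a minimal sketch under the assumption that the output keeps the four-column layout shown above (port, VLAN, MAC, age). The function name fdb_port_of is hypothetical.

```shell
# Print the port column of every `ovs-appctl fdb/show` entry whose MAC address
# matches the first argument. The comparison ignores letter case.
fdb_port_of() {
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    awk -v mac="$mac" 'tolower($3) == mac { print $1 }'
}
```

For example: ovs-appctl fdb/show vswitch0 | fdb_port_of 0c:da:41:1d:3d:18. No output means that the vSwitch has not learned the address.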
Displaying port binding information on a vSwitch
root@UIS-CVK02:~# ovs-appctl bond/show
---- vswitch-bond_bond ----
bond_mode: active-backup
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
lacp_status: off
slave eth2: enabled
active slave
may_enable: true
slave eth3: disabled
may_enable: false
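To check which physical NIC currently carries traffic for each bond, the active slave can be extracted from the output. The following is a minimal sketch that assumes the output format shown above, in which the line "active slave" directly follows the entry of the active NIC. The function name bond_active_slave is hypothetical, and other Open vSwitch versions might label these lines differently.

```shell
# Print "bond_name active_slave" for each bond found in `ovs-appctl bond/show`
# output read from standard input.
bond_active_slave() {
    awk '
        $1 ~ /^-+$/ && NF == 3 { bond = $2 }                 # "---- name ----"
        $1 == "slave"          { s = $2; sub(/:$/, "", s) }  # "slave eth2: enabled"
        NF == 2 && $1 == "active" && $2 == "slave" { print bond, s }
    '
}
```

For example: ovs-appctl bond/show | bond_active_slave.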
Displaying flow entry information
root@UIS-CVK01:~# ovs-ofctl dump-flows vswitch0
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=752218.541s, table=0, n_packets=15106363, n_bytes=3556156038, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
Displaying kernel flow entry information on a vSwitch
root@UIS-CVK01:~# ovs-appctl dpif/dump-flows vswitch0
skb_priority(0),in_port(5),eth(src=74:25:8a:36:d8:9b,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.88.8.1/255.255.255.255,tip=10.88.8.206/255.255.255.255,op=1/0xff,sha=74:25:8a:36:d8:9b/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00), packets:2, bytes:120, used:3.018s, actions:6
skb_priority(0),in_port(5),eth(src=38:63:bb:b7:ed:6c,dst=01:00:5e:00:00:fc),eth_type(0x0800),ipv4(src=10.88.8.140/0.0.0.0,dst=224.0.0.252/0.0.0.0,proto=17/0,tos=0/0,ttl=1/0,frag=no/0xff), packets:1, bytes:66, used:1.139s, actions:6
skb_priority(0),in_port(5),eth(src=c4:34:6b:6c:ef:a8,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=10.88.8.200/0.0.0.0,dst=10.88.9.255/0.0.0.0,proto=17/0,tos=0/0,ttl=128/0,frag=no/0xff), packets:17, bytes:1564, used:3.370s, actions:6
skb_priority(0),in_port(5),eth(src=14:58:d0:b7:24:07,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=10.88.8.229/0.0.0.0,dst=10.88.9.255/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no/0xff), packets:6, bytes:692, used:0.771s, actions:6
skb_priority(0),in_port(5),eth(src=14:58:d0:b7:53:f6,dst=01:00:5e:7f:ff:fa),eth_type(0x0800),ipv4(src=10.88.8.146/0.0.0.0,dst=239.255.255.250/0.0.0.0,proto=17/0,tos=0/0,ttl=1/0,frag=no/0xff), packets:1, bytes:175, used:0.739s, actions:6
Displaying all kernel flow entries
root@UIS-CVK01:~# ovs-dpctl dump-flows
skb_priority(0),in_port(4),eth(src=c4:34:6b:6c:f5:ab,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=10.88.8.159/0.0.0.0,dst=10.88.9.255/0.0.0.0,proto=17/0,tos=0/0,ttl=128/0,frag=no/0xff), packets:25, bytes:2300, used:0.080s, actions:3
skb_priority(0),in_port(5),eth(src=14:58:d0:b7:53:f6,dst=33:33:00:01:00:02),eth_type(0x86dd),ipv6(src=fe80::288d:70d6:36ce:60f3/::,dst=ff02::1:2/::,label=0/0,proto=17/0,tclass=0/0,hlimit=1/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:6
skb_priority(0),in_port(13),eth(src=0c:da:41:1d:80:03,dst=c4:ca:d9:d4:c2:ff),eth_type(0x0800),ipv4(src=192.168.2.15/255.255.255.255,dst=192.168.2.121/0.0.0.0,proto=6/0,tos=0/0,ttl=128/0,frag=no/0xff), packets:1, bytes:54, used:2.924s, actions:2
skb_priority(0),in_port(4),eth(src=c4:34:6b:68:9b:78,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(src=fe80::85b7:25a0:d116:907a/::,dst=ff08::2/::,label=0/0,proto=17/0,tclass=0/0,hlimit=128/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:3
skb_priority(0),in_port(4),eth(src=5c:dd:70:b0:39:3d,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.11.149/255.255.255.255,tip=192.168.11.150/255.255.255.255,op=1/0xff,sha=5c:dd:70:b0:39:3d/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00), packets:1, bytes:60, used:0.264s, actions:3
Capturing packets on a port
Use tcpdump to capture packets on the vSwitch port that you want to examine. For more information about the tcpdump command, see "Networking."
tcpdump -i vnet1 -s 0 -w /tmp/test.pcap host 200.1.1.1 &
SDN commands
The CVM of an H3C UIS cluster includes the OVN module. The following are the commonly used OVN commands.
Obtaining the ovn version
[root@cvknode1 ~]# ovn-nbctl -V
ovn-nbctl 22.03.1
Open vSwitch Library 2.17.90
DB Schema 6.1.0
Checking the process status
· Check the status of the ovn-northd process
[root@cvknode1 ~]# systemctl status ovn-northd
● ovn-northd.service - OVN northd management daemon
Loaded: loaded (/usr/lib/systemd/system/ovn-northd.service; enabled; vendor preset: disabled)
Active: active (exited) since Wed 2023-11-22 11:40:44 CST; 2 days ago
Main PID: 577576 (code=exited, status=0/SUCCESS)
Tasks: 8 (limit: 306436)
Memory: 9.5M
CGroup: /system.slice/ovn-northd.service
├─ 577605 "ovsdb-server: monitoring pid 577606 (healthy)" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ">
├─ 577606 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-nb.log --remote=punix:/run/ovn/ovnnb_db.sock --pidfile=/run/ovn/ovnnb_db.pid --un>
├─ 577622 "ovsdb-server: monitoring pid 577623 (healthy)" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" ">
├─ 577623 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-sb.log --remote=punix:/run/ovn/ovnsb_db.sock --pidfile=/run/ovn/ovnsb_db.pid --un>
├─ 577633 "ovn-northd: monitoring pid 577634 (healthy)" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" "" >
└─ 577634 ovn-northd -vconsole:emer -vsyslog:err -vfile:info --ovnnb-db=unix:/run/ovn/ovnnb_db.sock --ovnsb-db=unix:/run/ovn/ovnsb_db.sock --no-chdir --log-file=/var/l>
Nov 24 15:47:08 cvknode1 ovsdb-server[577623]: ovs|00009|jsonrpc|WARN|unix#5: receive error: Connection reset by peer
Nov 24 15:47:08 cvknode1 ovsdb-server[577623]: ovs|00010|reconnect|WARN|unix#5: connection dropped (Connection reset by peer)
Nov 24 15:48:58 cvknode1 ovsdb-server[577606]: ovs|00049|jsonrpc|WARN|unix#36: receive error: Connection reset by peer
Nov 24 15:48:58 cvknode1 ovsdb-server[577606]: ovs|00050|reconnect|WARN|unix#36: connection dropped (Connection reset by peer)
Nov 24 15:51:22 cvknode1 ovsdb-server[577606]: ovs|00051|jsonrpc|WARN|unix#38: receive error: Connection reset by peer
Nov 24 15:51:22 cvknode1 ovsdb-server[577606]: ovs|00052|reconnect|WARN|unix#38: connection dropped (Connection reset by peer)
Nov 24 15:52:18 cvknode1 ovsdb-server[577606]: ovs|00053|jsonrpc|WARN|unix#41: receive error: Connection reset by peer
Nov 24 15:52:18 cvknode1 ovsdb-server[577606]: ovs|00054|reconnect|WARN|unix#41: connection dropped (Connection reset by peer)
Nov 24 15:56:27 cvknode1 ovsdb-server[577623]: ovs|00011|jsonrpc|WARN|unix#6: receive error: Connection reset by peer
Nov 24 15:56:27 cvknode1 ovsdb-server[577623]: ovs|00012|reconnect|WARN|unix#6: connection dropped (Connection reset by peer)
· Check the status of the ovn-controller process
[root@cvknode1 ~]# systemctl status ovn-controller
● ovn-controller.service - OVN controller daemon
Loaded: loaded (/usr/lib/systemd/system/ovn-controller.service; enabled; vendor preset: disabled)
Active: active (running) since Wed 2023-11-22 11:40:45 CST; 2 days ago
Main PID: 578155 (ovn-controller)
Tasks: 4 (limit: 306436)
Memory: 4.1M
CGroup: /system.slice/ovn-controller.service
└─ 578155 ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/ovn>
Notice: journal has been rotated since unit was started, output may be incomplete.
Viewing the north-bound database
[root@cvknode1 ~]# ovn-nbctl show
switch bfe0ebf6-c116-4838-a5bc-f8f70dd0fdcb (std-15c16e9c-1286-420d-aec4-6a32ad11553d) (aka pubnet1)
port std-15c16e9c-1286-420d-aec4-6a32ad11553d_lnet
type: localnet
addresses: ["unknown"]
port lsp-pubnet1-r1
type: router
router-port: lrp-r1-pubnet1
switch 8531bfe6-6cbe-407e-98d7-d28754a07608 (std-f0455795-d214-4b36-a62b-16f71d2ebf04) (aka net1)
port 3cebf254-8bc7-462f-9bef-7c9dd330b124 (aka xjxnj-1_0c:da:41:1d:52:99)
addresses: ["0c:da:41:1d:52:99 10.10.10.2"]
port 0906ddcb-ea28-4a60-91d2-301ebbf2a8d6 (aka xjxnj-3_0c:da:41:1d:2a:b0)
addresses: ["0c:da:41:1d:2a:b0"]
port 8b532d0e-5faa-4d71-8ae4-7f4dabca8e3e (aka xjxnj-2_0c:da:41:1d:3b:53)
addresses: ["0c:da:41:1d:3b:53 10.10.10.3"]
port lsp-net1-sub1-r1
type: router
router-port: lrp-r1-net1-sub1
port c768b637-44fc
addresses: ["66:01:00:00:00:03 10.10.10.8"]
router ec0d0744-3678-443b-9811-58542296818c (std-333e3b30-1f30-498d-bd1d-63758c716246) (aka r1)
port lrp-r1-net1-sub1
mac: "66:01:00:00:00:01"
networks: ["10.10.10.1/24"]
port lrp-r1-pubnet1
mac: "66:01:00:00:00:02"
networks: ["10.99.221.4/24"]
gateway chassis: [fdfad6dc-f57f-4eb2-8564-848909099a31]
nat 21cc0da1-32a1-470d-8f61-07b8dd67e8fc
external ip: "10.99.221.3"
logical ip: "10.10.10.3"
type: "dnat_and_snat"
nat 4c255190-3972-4d67-a6bc-f414177f1fb5
external ip: "10.99.221.2"
logical ip: "10.10.10.2"
type: "dnat_and_snat"
Viewing the south-bound database
[root@cvknode1 ~]# ovn-sbctl show
Chassis "fdfad6dc-f57f-4eb2-8564-848909099a31"
hostname: cvknode1
Encap vxlan
ip: "10.10.2.1"
options: {csum="true"}
Port_Binding cr-lrp-r1-pubnet1
Port_Binding c768b637-44fc
Viewing network egress configuration
[root@cvknode1 ~]# ovs-vsctl list open_vswitch
_uuid : a9e2a39e-5aa6-4679-96e5-c9a7b89026a9
acls : []
bridges : [077e541b-92ca-48d2-bb10-c0cec48eec58, 40b9f8b2-6714-439a-bbc4-04c08805ba82, 7f07633f-4ece-4783-96d7-be22d466294b]
cur_cfg : 22
datapath_types : [netdev, system]
datapaths : {system=313cbe36-963d-4193-b4e1-503eabdee554}
db_version : "8.3.0"
dpdk_initialized : false
dpdk_version : none
external_ids : {hostname=cvknode1, ovn-bridge-mappings="uis:vs_business", ovn-cms-options=enable-chassis-as-gw, ovn-encap-ip="10.10.2.1", ovn-encap-type=vxlan, ovn-remote="tcp:10.99.221.86:6642", rundir="/var/run/openvswitch", system-id="fdfad6dc-f57f-4eb2-8564-848909099a31"}
iface_types : [bareudp, erspan, geneve, gre, gtpu, internal, ip6erspan, ip6gre, lisp, patch, stt, system, tap, vxlan]
manager_options : []
next_cfg : 22
other_config : {drain_bypass=True, hw-offload="false", offload-ct="true", vlan-limit="0"}
ovs_version : "2.16.4"
ssl : []
statistics : {}
system_type : H3Linux
system_version : "2.0.2"
Viewing the switch list
[root@cvknode1 ~]# ovn-nbctl list logical_switch
_uuid : bfe0ebf6-c116-4838-a5bc-f8f70dd0fdcb
acls : []
copp : []
dns_records : []
external_ids : {description="", external="true", from=std, managed=ovnagent, mtu="1500", "neutron:network_name"=pubnet1}
forwarding_groups : []
load_balancer : []
load_balancer_group : []
name : std-15c16e9c-1286-420d-aec4-6a32ad11553d
other_config : {vlan-passthru="false"}
ports : [4325e08f-eb21-451c-82ed-1840b193ccc7, 6cac3d4c-1023-4213-bd4a-bfbd79763e3e]
qos_rules : [65c13928-3617-42b4-ba7f-d784f086366e]
_uuid : 8531bfe6-6cbe-407e-98d7-d28754a07608
acls : []
copp : []
dns_records : []
external_ids : {description="", external="false", from=std, managed=ovnagent, mtu="1500", "neutron:network_name"=net1}
forwarding_groups : []
load_balancer : []
load_balancer_group : []
name : std-f0455795-d214-4b36-a62b-16f71d2ebf04
other_config : {vlan-passthru="false"}
ports : [1872aa4d-85c1-4a28-af9f-6ad5c7fc581b, 301f2520-f518-4351-a502-3d3672cd087b, 4b8688ab-ef09-473e-9017-2dd195324005, 8a2ed28e-c41f-45ea-917a-160ef0f0a041, 904f0a26-652c-4cb2-950b-8f7733554062]
qos_rules : []
Viewing the DHCP option list
[root@cvknode1 ~]# ovn-nbctl list dhcp_options
_uuid : 02d1d6ed-6d76-4ec4-a055-1e1d658dc04c
cidr : "10.10.10.0/24"
external_ids : {description="", from=std, internal_name=std-96c8c7fd-b389-4456-98fd-3d40f127e521, ip_version="4", linked=std-333e3b30-1f30-498d-bd1d-63758c716246, managed=ovnagent, network_id=std-f0455795-d214-4b36-a62b-16f71d2ebf04, subnet_name=sub1}
options : {classless_static_route="{0.0.0.0/0,10.10.10.1}", lease_time="3600", mtu="1500", router="10.10.10.1", server_id="10.10.10.1", server_mac="66:01:00:00:00:01"}
_uuid : 5d7693ba-2403-4b6e-bd53-ec472b94759b
cidr : "10.99.221.0/24"
external_ids : {description="", externalGatewayIp="10.99.221.4", from=std, internal_name=std-fec88d08-0e37-4e07-96c5-f3ab3c752408, ip_version="4", linked="ec0d0744-3678-443b-9811-58542296818c", managed=ovnagent, network_id=std-15c16e9c-1286-420d-aec4-6a32ad11553d, subnet_name=pubsub1}
options : {classless_static_route="{0.0.0.0/0,10.99.221.1}", lease_time="3600", mtu="1500", router="10.99.221.1", server_id="10.99.221.1", server_mac="66:01:00:00:00:00"}
Viewing the logical router list
[root@cvknode1 ~]# ovn-nbctl list logical_router
_uuid : ec0d0744-3678-443b-9811-58542296818c
copp : []
enabled : true
external_ids : {description="", from=std, managed=ovnagent, "neutron:router_name"=r1}
load_balancer : []
load_balancer_group : []
name : std-333e3b30-1f30-498d-bd1d-63758c716246
nat : [21cc0da1-32a1-470d-8f61-07b8dd67e8fc, 4c255190-3972-4d67-a6bc-f414177f1fb5]
options : {}
policies : []
ports : [08cbef44-9379-4ecf-b4a7-4c8e8c0d3105, a3a82614-fc8c-41b1-aa6a-0125a3e247a3]
static_routes : [961103a0-bfb9-4557-9b0a-50d138439f82]
Viewing the port list
[root@cvknode1 ~]# ovn-nbctl list logical_switch_Port
_uuid : 301f2520-f518-4351-a502-3d3672cd087b
addresses : ["0c:da:41:1d:2a:b0"]
dhcpv4_options : []
dhcpv6_options : []
dynamic_addresses : []
enabled : true
external_ids : {from=std, ifaceid_as_name="1", managed=ovnagent, "neutron:port_name"="xjxnj-3_0c:da:41:1d:2a:b0", qos_policy_id="", security_groups=""}
ha_chassis_group : []
name : "0906ddcb-ea28-4a60-91d2-301ebbf2a8d6"
options : {}
parent_name : []
port_security : []
tag : []
tag_request : []
type : ""
up : false
Viewing the QoS list
[root@cvknode1 ~]# ovn-nbctl list qos
_uuid : 65c13928-3617-42b4-ba7f-d784f086366e
action : {}
bandwidth : {burst=65536000, rate=100000000}
direction : from-lport
external_ids : {managed=ovnagent, qos_policy_id="e2cdf741-7c9c-42f3-81af-a830d06e3ad1"}
match : "ip4.src == 10.99.221.3 || ip4.src == 10.99.221.2"
priority : 1003
Viewing the ACL list
[root@cvknode1 ~]# ovn-nbctl list acl
_uuid : c97a2140-5d79-4d20-af68-2c22046f7b8a
action : drop
direction : from-lport
external_ids : {description="security group base rule -- ipv6,egress", ethertype=ip6, from=std, managed=ovnagent, port_range_max="", port_range_min="", protocol=any, remote_ip_prefix="::/0", security_group_id="732e8385-fe95-475e-8cb9-a93299c59f6d", tcp_flags=""}
label : 0
log : false
match : "inport == @std_3cfd20f7_e3a1_41f7_b6fc_df85bb8506ec && ip6 && ip6.dst == ::/0"
meter : []
name : sg_ipv6_egress_base_white
options : {}
priority : 1001
severity : []
Viewing all NAT rules
[root@cvknode1 ~]# ovn-nbctl list nat
_uuid : 4c255190-3972-4d67-a6bc-f414177f1fb5
allowed_ext_ips : []
exempted_ext_ips : []
external_ids : {_name=fip1, description="", fixed_ip_address="10.10.10.2", floating_network_id=std-15c16e9c-1286-420d-aec4-6a32ad11553d, from=std, internal_name="532812ee-ac24-4ffe-bef1-b5e211812c25", logical_port="3cebf254-8bc7-462f-9bef-7c9dd330b124", managed=ovnagent, qos_policy_id="e2cdf741-7c9c-42f3-81af-a830d06e3ad1", subnet_id="5d7693ba-2403-4b6e-bd53-ec472b94759b"}
external_ip : "10.99.221.2"
external_mac : "0c:da:41:1d:52:99"
external_port_range : ""
logical_ip : "10.10.10.2"
logical_port : "3cebf254-8bc7-462f-9bef-7c9dd330b124"
options : {}
type : dnat_and_snat
Viewing the security group list
[root@cvknode1 ~]# ovn-nbctl list port_group
_uuid : 732e8385-fe95-475e-8cb9-a93299c59f6d
acls : [4665023d-30b4-4384-8c1d-e02583d54f2a, 5d904d87-384a-4daa-ac50-ff8540152fd8, 863ce243-5c83-47c0-bece-886df88e0c1d, 9c6a450f-5516-48f6-a16d-aabb8edfccab, c97a2140-5d79-4d20-af68-2c22046f7b8a, da01fe2b-3f7a-4e48-b669-f2ba78751adf]
external_ids : {description="", from=std, managed=ovnagent, name=acl1, priority="1002", type=white}
name : std_3cfd20f7_e3a1_41f7_b6fc_df85bb8506ec
ports : []
Viewing the status of a load balancer
· View the status of the haproxy process.
[root@cvknode1 ~]# ps -ef | grep haproxy
root 862127 860223 0 16:46 pts/1 00:00:00 grep --color=auto haproxy
nobody 1329507 1 0 Nov23 ? 00:00:00 /usr/share/ovn-agent/usr/sbin/haproxy -f /var/lib/ovn-agent/lbaas/125eb2e6-e104-416e-bf11-b395dfeb14c7/haproxy.conf -p /var/lib/ovn-agent/lbaas/125eb2e6-e104-416e-bf11-b395dfeb14c7/haproxy.pid
· View the namespace list.
[root@cvknode1 ~]# ip netns ls | grep lbaas-
lbaas-125eb2e6-e104-416e-bf11-b395dfeb14c7 (id: 0)
iSCSI commands
H3C UIS uses iSCSI to mount IP SAN storage devices. When an iSCSI shared file system has exceptions, you can use iSCSI commands for troubleshooting. To enable iser mode, add the -I iser option to the iscsiadm command.
Discovering iSCSI storage
iscsiadm -m discovery -t st -p ISCSI_IP or
iscsiadm -m discovery -t st -p ISCSI_IP -I iser (iser mode)
# Discover iSCSI storage.
root@HZ-UIS01-CVK01:~# iscsiadm -m discovery -t st -p 192.168.1.248:3260
192.168.1.248:3260,1 iqn.1991-05.com.microsoft:c09599-cmh-target
root@HZ-UIS01-CVK01:~#
Displaying iSCSI storage discovery records
iscsiadm -m node
# Display iSCSI storage discovery records.
root@HZ-UIS01-CVK01:~# iscsiadm -m node
192.168.1.248:3260,1 iqn.1991-05.com.microsoft:c09599-cmh-target
Deleting the iSCSI storage discovery records
iscsiadm -m node -o delete -T LUN_NAME -p ISCSI_IP
iscsiadm -m node -o delete -T LUN_NAME -p ISCSI_IP -I iser (iser mode)
# Delete the iSCSI storage discovery records.
# iscsiadm -m node -o delete -T iqn.1991-05.com.microsoft:c09599-cmh-target -p 192.168.1.248:3260
Logging in to an iSCSI storage device
iscsiadm -m node -T LUN_NAME -p ISCSI_IP -l or
iscsiadm -m node -T LUN_NAME -p ISCSI_IP -l -I iser (iser mode)
# Log in to an iSCSI storage device.
root@HZ-UIS01-CVK01:~# iscsiadm -m node -T iqn.1991-05.com.microsoft:c09599-cmh-target -p 192.168.1.248:3260 -l
Logging in to [iface: default, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal: 192.168.1.248,3260]
Login to [iface: default, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal: 192.168.1.248,3260]: successful
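When a host holds several discovery records, the login command for each record can be generated from the iscsiadm -m node output. The following is a minimal sketch that assumes each record has the two-field form shown earlier (portal with a trailing group tag, then target name). The function name iscsi_login_cmds is hypothetical. The commands are only printed so that you can review them before running them.

```shell
# Convert `iscsiadm -m node` records ("IP:PORT,TAG TARGET") read from standard
# input into the matching login commands. Nothing is executed.
iscsi_login_cmds() {
    awk 'NF == 2 {
        portal = $1
        sub(/,[0-9]+$/, "", portal)    # drop the ",1" portal group tag
        printf "iscsiadm -m node -T %s -p %s -l\n", $2, portal
    }'
}
```

For example: iscsiadm -m node | iscsi_login_cmds.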
Logging out of an iSCSI storage device
iscsiadm -m node -T LUN_NAME -p ISCSI_IP -u
iscsiadm -m node -T LUN_NAME -p ISCSI_IP -u -I iser (iser mode)
# Log out of an iSCSI storage device.
root@HZ-UIS01-CVK01:~# iscsiadm -m node -T iqn.1991-05.com.microsoft:c09599-cmh-target -p 192.168.1.248:3260 -u
Logging out of session [sid: 4, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal: 192.168.1.248,3260]
Logout of [sid: 4, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal: 192.168.1.248,3260]: successful
Mounting FC storage
Obtaining the HBA card information
Method 1: Log in to the CVM system, access the storage management page, and then click a storage adapter to view HBA card information. If the card is in active state, storage access is available.
Method 2: Display driver information. If the driver is loaded correctly for the HBA card, HBA information will be displayed in the /sys/class/fc_host/host* directory.
[root@cvknode2-158 /]# ls /sys/class/fc_host/
host0 host2 host3 host4
[root@cvknode2-158 /]# ls /sys/class/fc_host/host0
device issue_lip npiv_vports_inuse port_state speed supported_classes system_hostname vport_create
dev_loss_tmo max_npiv_vports port_id port_type statistics supported_speeds tgtid_bind_type vport_delete
fabric_name node_name port_name power subsystem symbolic_name uevent
Connecting to the FC storage
Execute the following command:
echo hba_channel target_id target_lun > /sys/class/scsi_host/host*/scan
hba_channel represents the HBA card channel, target_id represents the target ID, and target_lun represents the LUN. To obtain the information, list the contents of the /sys/class/fc_transport/ directory.
[root@cvknode2-158 /]#ls /sys/class/fc_transport/
target0:0:0
[root@cvknode2-158 /]# echo 0 0 0 > /sys/class/scsi_host/host0/scan
Disconnecting the FC storage
Execute the following command:
echo 1 > /sys/block/sdX/device/delete
sdX represents the disk device name that corresponds to the FC storage device. To obtain the device name, execute the ll /dev/disk/by-path command.
[root@cvknode2-158 /]# ll /dev/disk/by-path
lrwxrwxrwx 1 root root 9 Oct 12 09:48 pci-0000:05:00.0-fc-0x21020002ac01e2d7-lun-0 -> ../../sdb
[root@cvknode2-158 /]# echo 1 > /sys/block/sdb/device/delete
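Before you delete a device, it is worth confirming which sdX names belong to FC LUNs. The following is a minimal sketch that assumes FC entries in /dev/disk/by-path contain "-fc-" in their names, as in the example above. The function name fc_disks is hypothetical.

```shell
# Read `ls -l /dev/disk/by-path` output from standard input and print
# "device by-path-name" for every FC symbolic link.
fc_disks() {
    awk 'NF >= 3 && $(NF-1) == "->" && $(NF-2) ~ /-fc-/ {
        n = split($NF, p, "/")     # "../../sdb" -> last element "sdb"
        print p[n], $(NF-2)
    }'
}
```

For example: ls -l /dev/disk/by-path | fc_disks. Entries for local disks are not printed.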
Tomcat commands
H3C UIS Manager provides the Tomcat service. When an exception occurs, you can restart the Tomcat service.
To view the Tomcat status:
root@HZ-UIS01-CVK01:~# service tomcat8 status
* Tomcat servlet engine is running with pid 3362
To stop the Tomcat service:
root@HZ-UIS01-CVK01:~# service tomcat8 stop
* Stopping Tomcat servlet engine tomcat8
...done.
To start the Tomcat service:
root@HZ-UIS01-CVK01:~# service tomcat8 start
* Starting Tomcat servlet engine tomcat8
...done.
To restart the Tomcat service:
root@HZ-UIS01-CVK01:~# service tomcat8 restart
* Stopping Tomcat servlet engine tomcat8
...done.
* Starting Tomcat servlet engine tomcat8
...done.
root@HZ-UIS01-CVK01:~#
Database commands
H3C UIS Manager uses mariadb to provide database service.
To view the mariadb service status:
root@HZ-UIS01-CVK01:~# systemctl status mariadb.service
● mariadb.service - MariaDB database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
Active: active (running) since Fri 2023-11-17 16:27:07 CST; 6 days ago
Main PID: 2525459 (mysqld_safe)
Tasks: 86 (limit: 819200)
Memory: 945.5M
CGroup: /system.slice/mariadb.service
├─ 2525459 /bin/sh /usr/bin/mysqld_safe --basedir=/usr --skip-name-resolve
└─ 2525826 /usr/libexec/mariadbd --basedir=/usr --datadir=/var/lib/mysql-share --plugin-dir=/usr/lib64/mariadb/plugin --skip-name-resolv>
To stop the mariadb service:
root@HZ-UIS01-CVK01:~# systemctl stop mariadb.service
To start the mariadb service:
root@HZ-UIS01-CVK01:~# systemctl start mariadb.service
virsh commands
virsh commands allow you to list the VMs on a CVK host and view their status. You can also use the commands to start and shut down the VMs.
Displaying the VM status from a CVK host
Execute the virsh list --all command to view the status of all VMs on the host.
root@UIS-CVK01:/vms# virsh list --all
Id Name State
----------------------------------------------------
4 windows2008 running
- Linux-RedHat5.9 shut off
Starting a VM from a CVK host
Execute the virsh start VM name command.
root@UIS-CVK01:/vms# virsh start Linux-RedHat5.9
Domain Linux-RedHat5.9 started
root@UIS-CVK01:/vms#
Shutting down a VM from a CVK host
Execute the virsh shutdown VM name command.
root@UIS-CVK01:/vms# virsh shutdown Linux-RedHat5.9
Domain Linux-RedHat5.9 is being shutdown
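To act on all VMs in a given state, for example to find every VM that is shut off, the virsh list --all output can be filtered by the State column. The following is a minimal sketch that assumes the three-column layout shown above and VM names without spaces. The function name vms_in_state is hypothetical.

```shell
# Read `virsh list --all` output from standard input and print the names of
# the VMs whose state equals the first argument, for example "shut off".
vms_in_state() {
    awk -v want="$1" 'NR > 2 && NF >= 3 {
        state = $3
        for (i = 4; i <= NF; i++) state = state " " $i   # rejoin "shut off"
        if (state == want) print $2
    }'
}
```

For example: virsh list --all | vms_in_state "shut off". The first two lines of the output (the header and the separator) are skipped.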
casserver commands
The casserver service collects statistics such as disk usage and alarm information. When an exception occurs on the casserver service, you can use the service casserver restart command to restart the service.
qemu commands
Use qemu commands to display image file information and convert disk file formats.
Displaying image file information for a VM
On UIS Manager, you can view the image file path for a VM. The Storage Path field displays the path for the image file for the VM.
To display basic information for an image file, for example, file format, file size, and used file size, execute the qemu-img info command. For a three-level image file, the level-2 image file name will also be displayed.
root@ZJ-UIS-001:~# qemu-img info /vms/defaultPool_hdd/A-048
image: /vms/defaultPool_hdd/A-048
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 1.3G
cluster_size: 262144
backing file: /vms/defaultPool_hdd/A-048_base_1
backing file format: qcow2
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
If you display level-2 image file information, you can see information for the level-1 image file (base image file).
root@ZJ-UIS-001:~# qemu-img info /vms/defaultPool_hdd/A-048_base_1
image: /vms/defaultPool_hdd/A-048_base_1
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 1.0M
cluster_size: 262144
backing file: /vms/defaultShareFileSystem0/fio-cent-autorun_UIS-e0602fio-cent-autorun_UIS-e0602
backing file format: qcow2
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
If you display information for the base image file, you cannot see information for image files of other levels.
root@ZJ-UIS-001:~# qemu-img info /vms/defaultShareFileSystem0/fio-cent-autorun_UIS-e0602fio-cent-autorun_UIS-e0602
image: /vms/defaultShareFileSystem0/fio-cent-autorun_UIS-e0602fio-cent-autorun_UIS-e0602
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 5.5G
cluster_size: 262144
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
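The level-by-level lookup above can be automated by following the "backing file" field until an image has none. The following is a minimal sketch that assumes backing file paths are absolute, as in the examples above, and that the field is printed exactly as "backing file: PATH". The function names backing_file_of and image_chain are hypothetical.

```shell
# Print the backing file path found in `qemu-img info` output read from
# standard input. Prints nothing for a base image.
backing_file_of() {
    sed -n 's/^backing file: //p'
}

# Print an image and all of its backing files, top level first.
# Requires qemu-img on the host. Only reads image metadata.
image_chain() {
    img=$1
    while [ -n "$img" ]; do
        printf '%s\n' "$img"
        img=$(qemu-img info "$img" | backing_file_of)
    done
}
```

For example: image_chain /vms/defaultPool_hdd/A-048. For the three-level image above, this would list A-048, then A-048_base_1, then the base image file.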
Consolidating image files
If a VM uses a multi-level image file, you can use the qemu-img convert command to consolidate the image file.
root@ZJ-UIS-001:/vms/defaultPool_hdd# qemu-img convert -O qcow2 -f qcow2 A-048 A048-test
The consolidated image file is not a multi-level image file.
root@ZJ-UIS-001:~# qemu-img info /vms/defaultPool_hdd/A048-test
image: /vms/defaultPool_hdd/A048-test
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 1.4G
cluster_size: 262144
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
ONEStor commands
ONEStor commands are used to obtain the cluster status and the status of monitor nodes, OSDs, and PGs.
· Mon (Monitor)—Monitor node in the cluster.
· OSD—Physical disks corresponding to the storage nodes.
· PG—Placement group, a virtual node that resides in a storage pool. Every time a storage pool is added, a number of PGs are added to the cluster.
Obtaining the health status of a cluster
· ceph health detail
This command displays PGs in unclean, inconsistent, and degraded states. As shown in the following figure, if the cluster is in healthy state, the system displays HEALTH_OK.
![]()
If HEALTH_WARN is displayed, the cluster is in warning state. The following figure shows that 1024 PGs are in degraded state and 1024 PGs are in unclean state. This indicates that 33.333% of the PGs in the cluster are degraded, one third of the OSDs are in down state, and the PGs on the down OSDs are in degraded state.
The following are the possible causes of this issue:
¡ A node is unreachable. Identify whether the service network and storage network are reachable.
¡ A node has failed. Use the ceph osd tree command to identify the node where the down OSDs reside and identify whether the node hardware and operating system are operating correctly.

· ceph -s
To display the cluster status, use the ceph -s command.

The output from the command is as follows:
¡ health
- HEALTH_OK—The cluster is in healthy state.
- HEALTH_WARN—Alarms have been triggered.
- HEALTH_ERR—A severe error such as data inconsistency has occurred in the cluster.
Typically, prompts related to PG and OSD abnormalities or time inconsistencies will appear in the health section.
¡ monmap—Number of monitors and the nodes where the monitors reside. As shown in the figure, the cluster contains three monitors, which reside in node 117, node 118, and node 119 respectively. The first monitor is the primary monitor.
¡ osdmap—Total number of OSDs, number of OSDs in up state, and number of OSDs in in state. As shown in the figure, all 18 OSDs in the cluster are in up and in states, which indicate they are all operating correctly.
¡ pgmap—Number of PGs, number of storage pools, space used by data replicas, and total number of objects. This field also displays cluster usage information, including used capacity, free capacity, and total capacity. In addition, the PG state is displayed.
Error prompts:
¡ too many PGs per OSD—Each OSD carries too many PGs. The error message is cleared after you add more OSDs or reduce the number of storage pools.

¡ clock skew detected—The system time is inconsistent on the monitor nodes. Execute the ntpdate -u IP command to synchronize time from the primary NTP server, where IP is the IP address of the primary NTP server.
As shown in the following figure, six OSDs are in down state. The cluster puts the PGs corresponding to those OSDs in degraded state.

Execute the ceph -s command. The output shows that some PGs are abnormal, one monitor is down, 12 OSDs are up, and 18 OSDs are in in state. This indicates that node 118 might have an error or the service network is in abnormal state.

· ceph -w
To monitor a cluster, use the ceph -w command. The command continuously outputs information and can be terminated by pressing Ctrl+C. When the cluster's PG state is normal, the output from the ceph -w command is consistent with the output from the ceph -s command, as shown in the following figure.

To view cluster state changes, see the osdmap, pgmap, mon, and osd pgmap sections.

OSD commands
· ceph osd tree
To display the OSDs on each node and their positions in the CRUSH map, use the ceph osd tree command. This command helps maintain a large cluster. The following figure shows OSDs in normal state.

Use osd.1 as an example. The weight of the OSD is 0.89999, it is in rack 3, the host node is node 111, and the OSD is in down and out state.

IMPORTANT:
The system marks an OSD as down and out five minutes after its state changes to down.
· If a single OSD is in down/out state, a hard disk failure might have occurred.
· If all OSDs on a node are down, a node exception or network exception might have occurred.
· ceph osd perf
To display the latency of an OSD, use the ceph osd perf command. If services are running, a latency of less than 100 ms is normal. When the cluster is idle, the latency is typically within 10 ms.

If the latency stays higher than 10 ms when the cluster is idle, troubleshoot the issue. If the latency is higher than 100 ms when a large number of services are running, identify whether a network or hardware failure has occurred.
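The latency check can be applied to every OSD at once. The following is a minimal sketch, assuming that ceph osd perf prints one header line followed by rows of three columns: OSD ID, commit latency in ms, and apply latency in ms. This layout is an assumption because the figure is not reproduced here, and it can vary by version. The function name slow_osds is hypothetical.

```shell
# Read `ceph osd perf` output from standard input and print the OSDs whose
# commit or apply latency exceeds the limit in ms (first argument, default 100).
slow_osds() {
    awk -v limit="${1:-100}" '
        $1 ~ /^[0-9]+$/ && ($2 + 0 > limit + 0 || $3 + 0 > limit + 0) {
            print "osd." $1, $2, $3
        }'
}
```

For example, use ceph osd perf | slow_osds 100 while services are running, or ceph osd perf | slow_osds 10 when the cluster is idle. The header line is skipped because its first field is not a number.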
· ceph osd df
To display the disk usage, use the ceph osd df command. The command displays OSD statistics, such as OSD size, used capacity, available capacity, and usage. If the usage of an OSD is higher than 85%, the near full alarm is displayed on UIS Manager. If the usage of an OSD is higher than 95%, the cluster is unavailable.
As shown in the following figure, the cluster contains three OSDs, each having a size of 920G, used capacity of 501G, and available capacity of 419G. The total capacity is 2762G, the used capacity is 1505G, the available capacity is 1257G, and the usage is 54.48%.
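The usage value is the used capacity divided by the total capacity. The following is a minimal sketch that recomputes it from two numbers in the same unit and flags the result against the 85% near full level. The function name usage_check and the labels it prints are hypothetical and are not UIS Manager output.

```shell
# Print the usage percentage of USED out of TOTAL (same unit) and flag it when
# it exceeds LIMIT percent (third argument, default 85).
usage_check() {
    awk -v used="$1" -v total="$2" -v limit="${3:-85}" 'BEGIN {
        pct = used * 100 / total
        printf "%.2f%% %s\n", pct, (pct > limit ? "NEARFULL" : "ok")
    }'
}
```

For example: usage_check 501 920. A result computed from the rounded G values can differ slightly from the usage that ceph osd df reports, which is presumably calculated from exact byte counts.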

Obtaining the cluster usage statistics
ceph df
The command is used to obtain usage statistics for the cluster and storage pools. It displays the total capacity, remaining capacity, used capacity, and usage percentage of the cluster. In addition, it displays information about the storage pools, such as their names, IDs, usage status, and the number of objects in each storage pool.
For example, as shown in the figure below, the cluster has 1257G of remaining capacity and 1505G of used capacity, and its usage is 54.48%. Storage pool p1 uses 499G, which is a usage of 54.29%, has 419G of available space, and contains 128003 objects.

ONEStor commands
iostat
Use the iostat command to monitor the load on the system input/output (I/O) devices and the time that the system takes to process I/O requests. This command is useful for analyzing whether an I/O bottleneck exists in the interaction between a process and the operating system. When executed without any parameters, this command displays statistics accumulated from system startup to the time the command is executed. The following figure shows the output from the iostat command.

The following are the descriptions for the items:
· The first line displays the system version, host name, and date.
· avg-cpu—CPU usage statistics. For a multi-core CPU, this value is the average value of all cores.
· Device—IO statistics for each disk.
· CPU and disk IO statistics.
For the CPU statistics, the value for iowait is important. It indicates the percentage of time that the CPU was idle during which the system had pending disk I/O requests.
Disk names are displayed in the sdX format.
| Item | Description |
| --- | --- |
| tps | Number of I/O read and write requests per second that were issued to the device. |
| kB_read/s | The amount of data read from the device, expressed in kilobytes per second. One sector has a size of 512 bytes. |
| kB_wrtn/s | The amount of data written to the device, expressed in kilobytes per second. |
| kB_read | Total number of kilobytes read. |
| kB_wrtn | Total number of kilobytes written. |
The iostat -x 1 command displays real-time IO device statistics. Specify the -x option when you analyze IO usage statistics.

The iostat -x 1 command displays real-time information about the disk usage for a node. If the %util ratio of a single disk is high or close to 100%, a single disk might have an issue. If the overall disk %util ratio of the cluster is over 80% or close to 100%, the cluster's disk IO usage has reached its limit. In such a case, you can add more disks or reduce the services provided by the cluster.
The following are the descriptions for the items:
| Item | Description |
| --- | --- |
| rrqm/s | Number of read requests merged per second that were queued to the device. |
| wrqm/s | Number of write requests merged per second that were queued to the device. |
| r/s | Number of read requests completed per second for the device. |
| w/s | Number of write requests completed per second for the device. |
| rkB/s | Number of kilobytes read from the device per second. |
| wkB/s | Number of kilobytes written to the device per second. |
| avgrq-sz | Average size (in sectors) of the requests that were issued to the device. |
| avgqu-sz | Average queue length of the requests that were issued to the device. |
| await | Average time (in milliseconds) for I/O requests issued to the device to be served. The time includes the time spent by the requests in queue and the time spent servicing them. |
| svctm | Average service time (in milliseconds) for I/O requests that were issued to the device. |
| %util | Percentage of CPU time during which I/O requests were issued to the device. |
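The %util check described above can be scripted. The following is a minimal sketch, assuming that the iostat -x output contains a header row that starts with Device and includes a %util column. The function name busy_disks and the default limit of 80 are illustrative choices, not part of the product.

```shell
# Hypothetical helper: read "iostat -x" output on stdin and print every
# device whose %util value is above the given limit (default 80).
busy_disks() {
    awk -v limit="${1:-80}" '
        NF == 0 { col = 0; next }              # a blank line ends a report
        $1 ~ /^Device/ {                       # header row: locate %util
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 > limit { print $1, $col }
    '
}

# Usage on a node (three one-second samples):
#   iostat -x 1 3 | busy_disks 80
```

The column is located by name instead of by position because the set of columns differs between iostat versions. The first report in the output contains averages since system startup, so later reports reflect the current load more accurately.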
top
The top command provides real-time monitoring of resource usage for different processes in the system. This command can sort tasks based on CPU usage, memory usage, and execution time.
The following are the items that need to be focused on:
· Load average
· Tasks
· CPU usage
Sorting processes by CPU or memory usage can help identify which processes are causing system issues. To do this, press the uppercase F or O key while the top command is running, and then press k or n to select the sort column.
The following is the output from the top command.

The following are the descriptions for the items:
· The first line is task queue information. This line shows the current time, system uptime, the number of currently logged-in users, and the system load, which is the average length of the task queue, displayed as three values for the past 1 minute, 5 minutes, and 15 minutes, respectively.
· The second and third lines display information about processes and CPUs. If multiple CPUs exist, this information might span more than two lines.
· The last two lines display memory and swap information. The cached value on the swap line is the size of the swap content that also resides in memory. This content was swapped out to the swap area and later swapped back into memory, and the corresponding swap area has not been overwritten yet. When the corresponding memory is swapped out again, the system does not need to write it to the swap area again.
The area below system information displays detailed information for each process.
| Item | Description |
| --- | --- |
| PID | Process ID. |
| RUSER | Real username of the owner of the process. |
| UID | User ID of the owner of the process. |
| USER | Username of the owner of the process. |
| VIRT | Total virtual memory used by the process. |
| RES | Amount of physical memory used by the process, in KB. |
| SHR | Amount of shared memory used by the process, in KB. |
| %MEM | Memory usage of the process. |
| %CPU | CPU usage of the process. |
You can press the uppercase F or O key, and then press a-z to sort the processes according to the corresponding column. The uppercase R key can reverse the current sorting.
You can use the following commands during the execution of the top command.
| Key | Description |
| --- | --- |
| q or Ctrl+C | Quits the program. |
| m | Displays memory information. |
| t | Displays process and CPU information. |
| c | Toggles between the command name and the complete command line. |
| M | Sorts processes by memory usage. |
| P | Sorts processes by CPU usage. |
| T | Sorts processes by time/accumulated time. |
Other query commands
· lsblk
Use the lsblk command to view information about hard drive capacity, partition, usage, and mounting.

In the above figure, the NAME column lists all hard drives and partitions, SIZE displays the total capacity of the hard drive and partition size, TYPE displays the type of hard drive and partition, and MOUNTPOINT displays the file system mount point. The sda disk is the system disk with a size of 279.4G. Six hard disks with a size of 558.9G each are mounted as OSDs, and the size of the log partition is 10G.
· mount
Use the mount command to display all file systems mounted on a node and their types.

· df -h
Use the df -h command to list all mounted file systems, and display the total capacity, used capacity, available capacity, usage, and mount point for each mounted file system.

The output shows that 6 OSDs have been mounted, each with a capacity of 549G and a usage of 1%.
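File system usage can also be checked by a script. The following is a minimal sketch that uses df -P instead of df -h, because the -P option prints one line per file system in a fixed column order. The function name full_filesystems and the limit of 85 are assumptions for illustration.

```shell
# Hypothetical helper: read "df -P" output on stdin and print the mount
# point and usage of every file system at or above the given limit.
full_filesystems() {
    awk -v limit="${1:-85}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)                  # "54%" becomes "54"
            if (use + 0 >= limit) print $6, $5
        }
    '
}

df -P | full_filesystems 85
```

The sketch does not handle mount points that contain spaces.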
· fdisk -l
Use the fdisk -l command to display the hard drives, partitions, sizes, and usage of the nodes.

· free
Use the free command to display the total memory, used memory, buffer, cache, and swap usage of a node.

Linux commands
vi
To create or edit a file in the Linux operating system, you can use a text editor such as vi or vim.
The vi editor has two modes: Command and Insert.
The following uses the test.txt file as an example.
Executing the vi command
Enter the vi test.txt command in the command line window of Linux. If the test.txt file already exists, you can use the vi command to edit its content. If the file does not exist, this command creates the file.
Entering Command mode
When you first open a file with vi, you are in Command mode. A newly created file is empty.
In Command mode, you can use the keyboard to navigate, delete, copy, and paste text, but you cannot enter text.
Entering Insert mode
To enter Insert mode, press i, o, or a, as shown in the following figure.
Entering Insert mode
Enter the file content.
Returning to Command mode
To return to Command mode, press ESC.
Save & Exit
After you return to Command mode, enter a colon (:), and then execute the wq command to save the file and exit the vi editor.
To view the created file, execute the ls command.
Basic commands
Displaying the current directory
Use the pwd command to print the current working directory.
root@HZ-UIS01-CVK01:~# pwd
/root
Displaying file information
Use the ls command to display file information in the current directory.
# ls [-aAdfFhilnrRSt] directory name
Options and parameters:
-a: Lists all files including those that begin with .
-A: Lists all files except for . and ..
-d: Lists directory entries instead of contents
-h: When used with -l (long list), prints sizes in human-readable format, for example, GB or KB
-i: Prints the index number of each file
-r: Reverses order while sorting
-R: Lists all subdirectories recursively
-S: Displays entries sorted by file size
-t: Sorts by modification time
Example:
root@HZ-UIS01-UIS Manager:~# ls -al
total 44
drwx------ 5 root root 4096 May 23 15:33 .
drwxr-xr-x 24 root root 4096 May 13 09:47 ..
-rw------- 1 root root 847 Jan 1 12:35 .bash_history
-rw-r--r-- 1 root root 3106 Apr 19 2012 .bashrc
drwx------ 2 root root 4096 May 17 17:23 .cache
-rw-r--r-- 1 root root 8 May 23 15:33 UIS.conf
drwxr-xr-x 2 root root 4096 May 23 15:32 h3c
-rw-r--r-- 1 root root 140 Apr 19 2012 .profile
drwxr-xr-x 2 root root 4096 May 22 09:50 .ssh
-rw------- 1 root root 4962 May 23 15:33 .viminfo
Changing the working directory
Use the cd command to change the working directory.
.: The current directory.
..: One level up from the current directory.
-: Previous working directory
~: Home directory for the current user
For example, ~account represents the home directory for the account user.
Example:
root@HZ-UIS01-CVK01:/# cd ~root
# Enter the home directory for the root user.
root@HZ-UIS01-CVK01:~# cd ~
# Return to the home directory for the current user.
root@HZ-UIS01-CVK01:~# cd
# Return to the home directory for the current user.
root@HZ-UIS01-CVK01:~# cd ..
# Enter the directory one level up from the current directory.
root@HZ-UIS01-CVK01:/# cd -
# Return to the previous directory.
root@HZ-UIS01-CVK01:~# cd /root
# Enter the /root directory.
root@HZ-UIS01-CVK01:~# cd ../root
# Enter the root directory under the parent directory.
Creating a new directory
Use the mkdir (make directory) command to create a new directory.
# mkdir [-mp] directory name
Options and parameters:
-m: Sets the access permissions of the new directory.
-p: Creates the parent directories as needed.
Example:
root@HZ-UIS01-UIS Manager:~# ls
root@HZ-UIS01-UIS Manager:~# mkdir h3c
root@HZ-UIS01-UIS Manager:~# ls
h3c
root@HZ-UIS01-UIS Manager:~#
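The -p and -m options are not covered by the example above. The following is a minimal sketch of both options. The directory names are hypothetical, and the example runs in a temporary directory that is removed at the end.

```shell
# Work in a throwaway directory so the example has no side effects.
base=$(mktemp -d)

# -p also creates the missing parent directories h3c and h3c/logs.
mkdir -p "$base/h3c/logs/2025"

# -m sets the permissions of the new directory at creation time.
mkdir -m 700 "$base/private"

[ -d "$base/h3c/logs/2025" ] && nested=created
mode=$(ls -ld "$base/private" | cut -c1-10)
echo "nested: $nested, private mode: $mode"

rm -rf "$base"
```

Without the -p option, the first mkdir command fails because the h3c and logs directories do not exist yet.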
Copying a file or directory
Use the cp (copy) command to copy a file or directory.
# cp [-adfilprsu] source destination
# cp [options] source1 source2 source3 .... destination directory
Options and parameters:
-a: Same as -pdr
-f: If any existing destination file can't be opened, delete it and attempt again
-i: Asks for confirmation before overwriting the destination file.
-p: Preserves the file attributes of the original file in the copy.
-r: Copies files recursively. All files and subdirectories in the specified source directory are copied to the destination.
If multiple source files are specified, the destination must be a directory.
Example:
# Copy a file.
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf
root@HZ-UIS01-UIS Manager:~# cp UIS.conf UIS.conf.bak
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf UIS.conf.bak
root@HZ-UIS01-UIS Manager:~#
# Copy a directory.
root@HZ-UIS01-UIS Manager:~# ls
h3c
root@HZ-UIS01-UIS Manager:~# cp -rf h3c h3c.bak
root@HZ-UIS01-UIS Manager:~# ls
h3c h3c.bak
root@HZ-UIS01-UIS Manager:~#
Securely copying a file
scp (secure copy) allows you to securely copy files and directories between two hosts. The protocol encrypts the files in transit, which makes scp a safer alternative to the cp (copy) command for transfers between hosts. If a disk on your server has become a read-only file system, you can use the scp command to copy the files on that disk to another destination.
#scp [option] [source directory] [destination directory]
Options and parameters:
-1: Forces scp to use protocol 1.
-2: Forces scp to use protocol 2.
-4: Uses IPv4 addresses only.
-6: Uses IPv6 addresses only.
-B: Runs in batch mode, which prevents prompts for passwords or passphrases.
-C: Enables compression, which can speed up transfers over slow links.
-p: Preserves the modification times, access times, and modes of the original files.
-q: Runs in quiet mode, which disables the progress display.
-r: Copies directories and files recursively.
-v: Enables verbose mode, which prints the progress of the scp command step by step. It is useful in debugging.
-c cipher: Selects the cipher to use for encrypting the data transfer. This option is passed directly to SSH.
-F ssh_config: Specifies an alternative configuration file for SSH. This option is passed directly to SSH.
-i identity_file: Selects the file from which the identity (private key) for public key authentication is read. This option is passed directly to SSH.
-l limit: Limits the used bandwidth, in Kbit/s.
-o ssh_option: Passes options to SSH in the format used in ssh_config.
-P port: Specifies the port to connect to on the remote host.
-S program: Specifies the program to use for the encrypted connection. The program must understand ssh(1) options.
Example:
root@HZ-UIS01-CVK01:~# scp UIS-E0218H06-Upgrade.tar.gz HZ-UIS01-CVK02:/root
UIS-E0218H06-Upgrade.tar.gz 100% 545MB 90.8MB/s 00:06
root@HZ-UIS01-CVK01:~#
Removing a file or directory
Use the rm (remove) command to remove a file or directory.
# rm [-fir] file or directory name
Options and parameters:
-f: Removes files forcefully without prompting and ignores files that do not exist.
-i: Prompts for confirmation before each removal.
-r: Removes a directory and its contents recursively. Use this option with caution.
Example:
root@HZ-UIS01-UIS Manager:~# ls
h3c
root@HZ-UIS01-UIS Manager:~# rm -rf h3c
root@HZ-UIS01-UIS Manager:~# ls
root@HZ-UIS01-UIS Manager:~#
Moving files and directories or renaming a file or directory
Use the mv (move) command to move files and directories from one directory to another or rename a file or directory.
# mv [-fiu] source destination
# mv [options] source1 source2 source3 .... directory
Options and parameters:
-f: Overwrites the destination file or directory without asking for permission.
-i: Asks for permission to overwrite.
-u: Moves a file only when the source file is newer than the destination file or when the destination file does not exist.
Example:
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf
root@HZ-UIS01-UIS Manager:~# mv UIS.conf UIS.conf.bak
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf.bak
root@HZ-UIS01-UIS Manager:~#
Creating an archive and extracting the archive files
# tar [-j|-z] [cv] [-f archive name] filename... (creates an archive)
# tar [-j|-z] [xv] [-f archive name] [-C directory] (extracts an archive)
Options and parameters:
-c: Creates the archive.
-t: Lists the files inside the archive.
-x: Extracts the archive. This option can be used together with the -C option.
The -c, -t, and -x options cannot be used in the same command.
-j: Filters the archive through bzip2. As a best practice, use *.tar.bz2 as the archive name.
-z: Filters the archive through gzip. As a best practice, use *.tar.gz as the archive name.
-v: Displays verbose information.
-f filename: Specifies the name of the archive file.
-C directory: Extracts the files to the specified directory.
Example:
# Create an archive.
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf UIS.conf-01 UIS.conf-02
root@HZ-UIS01-UIS Manager:~# tar -czvf UIS.tar.gz UIS.conf*
UIS.conf
UIS.conf-01
UIS.conf-02
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf UIS.conf-01 UIS.conf-02 UIS.tar.gz
# Extract the archive files.
root@HZ-UIS01-UIS Manager:~# ls
UIS.tar.gz
root@HZ-UIS01-UIS Manager:~# tar -xzvf UIS.tar.gz
UIS.conf
UIS.conf-01
UIS.conf-02
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf UIS.conf-01 UIS.conf-02 UIS.tar.gz
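The -t and -C options are not used in the example above. The following is a minimal sketch that creates an archive, lists its contents, and extracts it to a separate directory. The file name demo.conf and the directory names are hypothetical, and the example runs in a temporary directory.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/restore"
printf 'sample\n' > "$workdir/src/demo.conf"   # hypothetical file

# -c creates the archive; -C changes to the source directory first.
tar -czf "$workdir/demo.tar.gz" -C "$workdir/src" demo.conf

# -t lists the archive contents without extracting them.
listing=$(tar -tzf "$workdir/demo.tar.gz")

# -x extracts the archive; -C selects the target directory.
tar -xzf "$workdir/demo.tar.gz" -C "$workdir/restore"
restored=$(cat "$workdir/restore/demo.conf")

echo "archive contains: $listing"
rm -rf "$workdir"
```

Listing an archive with the -t option before extracting it shows which files the extraction will create or overwrite.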
System commands
Displaying the system kernel
# uname [-asrmpi]
Options and parameters:
-a: Displays all system information.
-s: Displays the system kernel name.
-r: Displays the kernel release.
-m: Displays the machine hardware name, for example, i686 or x86_64.
-p: Displays the architecture of the CPU.
-i: Displays the hardware platform, for example, x86.
Example:
root@ZJ-UIS-001:~# uname -a
Linux ZJ-UIS-001 4.1.0-generic #1 SMP Wed Nov 9 02:04:23 CST 2016 x86_64 x86_64 x86_64 GNU/Linux
Displaying uptime of the system
Example:
root@HZ-UIS01-UIS Manager:~# uptime
17:54:04 up 3 days, 23:28, 1 user, load average: 0.08, 0.12, 0.13
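The three load average values at the end of the output can be extracted for use in scripts. The following is a minimal sketch that assumes the Linux output format shown above. The function name load_averages is hypothetical.

```shell
# Hypothetical helper: extract the 1-, 5-, and 15-minute load averages
# from a line of uptime output read on stdin.
load_averages() {
    sed -n 's/.*load average: *\([0-9.]*\), *\([0-9.]*\), *\([0-9.]*\).*/\1 \2 \3/p'
}

# The sample line is the uptime output shown above.
echo '17:54:04 up 3 days, 23:28, 1 user, load average: 0.08, 0.12, 0.13' | load_averages
# prints: 0.08 0.12 0.13
```

On a host, pipe the live output to the function by using uptime | load_averages.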
Displaying system resource statistics
# vmstat [-a] [delay [count]]
# vmstat [-fs]
# vmstat [-S unit]
# vmstat [-d]
# vmstat [-p partition]
Options and parameters:
-a: Displays active/inactive memory.
-f: Displays the number of forks since boot.
-s: Displays a table of various event counters and memory statistics.
-S: Followed by k, K, m, or M, sets the unit of the output to 1000, 1024, 1000000, or 1048576 bytes.
-d: Lists disk statistics.
-p: Followed by some partition name for detailed statistics.
Example:
root@HZ-UIS01-CVK01:~# vmstat 1 5
procs ---------------memory----------------- -----swap---- -----io---- ----system-- -----cpu--------
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 60402384 58716 1712736 0 0 15 6 87 116 1 0 98 0
0 0 0 60402500 58716 1712736 0 0 1 0 631 1051 0 0 100 0
0 0 0 60402608 58756 1712752 0 0 0 840 1444 1640 2 0 98 0
0 0 0 60403360 58756 1712760 0 0 2 33 991 1346 0 0 100 0
2 0 0 60400944 58780 1712784 0 0 0 60 2225 1682 0 0 99 0
Field descriptions for VM mode:
· procs
¡ r: Number of processes waiting for run time.
¡ b: Number of processes in uninterruptible sleep.
· memory
¡ swpd: The amount of virtual memory used.
¡ free: The amount of idle memory.
¡ buff: The amount of memory used as buffers.
¡ cache: The amount of memory used as cache.
· swap
¡ si: The amount of memory swapped in from disk (/s).
¡ so: The amount of memory swapped to disk (/s).
If the si and so values are large, data is frequently swapped between memory and disk, which degrades system performance.
· io
¡ bi: Blocks received from a block device (blocks/s).
¡ bo: Blocks sent to a block device (blocks/s). A larger value indicates that the system IO is busy.
· system
¡ in: Number of interrupts per second, including the clock.
¡ cs: Number of context switches per second.
A larger value indicates more frequent communications between the system and devices such as disks, NICs, and clocks.
· cpu
¡ us: Time spent running non-kernel code.
¡ sy: Time spent running kernel code (system time).
¡ id: Time spent idle.
¡ wa: Time spent waiting for IO.
¡ st: Time stolen from a VM. Supported in versions later than Linux 2.6.11.
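To evaluate a field over several samples, you can average one column of the vmstat output. The following is a minimal sketch that assumes the column header row starts with r and b, as in the example above. The function name vmstat_avg is hypothetical.

```shell
# Hypothetical helper: average one named vmstat column (for example id
# or wa) over all sample rows read on stdin.
vmstat_avg() {
    awk -v name="$1" '
        $1 == "r" && $2 == "b" {               # column header row
            for (i = 1; i <= NF; i++) if ($i == name) col = i
            next
        }
        col && $1 ~ /^[0-9]+$/ { sum += $col; n++ }
        END { if (n) printf "%.1f\n", sum / n }
    '
}

# Usage on a node (five one-second samples):
#   vmstat 1 5 | vmstat_avg wa
```

For the five sample rows shown in the example above, the average of the id column is 99.0. The first row of vmstat output contains averages since system startup, so it can differ noticeably from the rows that follow.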
Displaying the load on a device
Use the iostat command to display CPU and I/O usage statistics.
# iostat [parameter] [time] [count]
Options and parameters:
-c: Displays the CPU usage. It is mutually exclusive with the -d option.
-d: Displays the disk usage. It is mutually exclusive with the -c option.
-k: Displays statistics in kilobytes per second. The default unit is block.
-m: Displays statistics in megabytes per second.
-N: Displays logical volume mapping (LVM) statistics.
-n: Displays NFS statistics.
-p: Displays statistics for block devices and all their partitions used by the system. You can specify a device after this option, for example, # iostat -p /dev/sda. This option is mutually exclusive with the -x option.
-t: Prints the time for each report displayed.
-x: Displays detailed information.
-v: Displays version information.
Remarks:
· avg-cpu
¡ %user: Displays the percentage of CPU usage that occurred when executing at the user level.
¡ %nice: Displays the percentage of CPU usage that occurred when executing at the user level with nice priority.
¡ %system: Displays the percentage of CPU usage that occurred when executing at the system (kernel) level.
¡ %steal: Displays the percentage of time spent in involuntary wait by the virtual CPU or CPUs when the hypervisor was servicing another virtual processor.
¡ %iowait: Displays the percentage of time the CPUs were idle during which the system had an outstanding disk I/O request.
¡ %idle: Displays the percentage of time the CPUs were idle.
· Device
¡ tps: Number of IO requests per second that were issued to the device.
¡ Blk_read /s: The amount of data read from the device expressed in blocks per second.
¡ Blk_wrtn/s: The amount of data written to the device expressed in blocks per second.
¡ Blk_read: Total number of blocks read.
¡ Blk_wrtn: Total number of blocks written.
IMPORTANT:
· If the value of %iowait is too high, the disk has IO issues. If the value of %idle is high, the CPUs are idle.
· If the value of %idle is high but the system responds slowly, the CPUs might be waiting for memory allocation. In this case, increase the memory capacity.
· If the value of %idle remains lower than 10, the system has insufficient CPU processing capability.
iostat outputs:
· Blk_read: Total number of blocks read.
· Blk_wrtn: Total number of blocks written.
· kB_read/s: The amount of data read from the device expressed in kilobytes per second.
· kB_wrtn/s: The amount of data written to the device expressed in kilobytes per second.
· kB_read: Total number of kilobytes read.
· kB_wrtn: Total number of kilobytes written.
· rrqm/s: Number of read requests merged per second that were queued to the device.
· wrqm/s: Number of write requests merged per second that were queued to the device.
· r/s: Number of read requests completed per second for the device.
· w/s: Number of write requests completed per second for the device.
· rsec/s: Number of sectors read from the device per second.
· wsec/s: Number of sectors written to the device per second.
· rkB/s: The amount of data read from the device expressed in kilobytes per second.
· wkB/s: The amount of data written to the device expressed in kilobytes per second.
· avgrq-sz: Average size (in sectors) of the requests that were issued to the device.
· avgqu-sz: Average queue length of the requests that were issued to the device.
· await: Average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
· svctm: Average service time (in milliseconds) for I/O requests that were issued to the device.
· %util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.
Example:
root@HZ-UIS01-CVK01:~# iostat
Linux 3.13.6 (HZ-UIS01-CVK01) 12/16/2015 _x86_64_ (24 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
20.48 0.00 3.48 0.23 0.00 75.80
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 10.17 1.76 269.57 1309400 201017740
sdb 16.43 181.78 202.21 135552881 150792613
Execute the iostat -d -x -m /dev/sdb 1 5 command to display detailed information about /dev/sdb.
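The %idle value can be read from the avg-cpu report by a script. The following is a minimal sketch that assumes the two-row avg-cpu layout shown in the example above. The function name iostat_idle is hypothetical, and the limit of 10 is the %idle value mentioned in the remarks.

```shell
# Hypothetical helper: print the %idle value from the first avg-cpu
# report in iostat output read on stdin, followed by "low" when the
# value is below 10 and "ok" otherwise.
iostat_idle() {
    awk '
        /^avg-cpu:/ {
            # The value row has no "avg-cpu:" label, so shift by one.
            for (i = 2; i <= NF; i++) if ($i == "%idle") col = i - 1
            if (col && (getline > 0)) print $col, ($col + 0 < 10 ? "low" : "ok")
            exit
        }
    '
}

# Usage on a node:
#   iostat -c | iostat_idle
```

For the example output above, the function prints 75.80 ok. The function reads only the first report, which contains averages since system startup.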
Testing the read and write performance for a disk
dd [option]
Options and parameters:
· if=file: Specifies the input file name. The default is standard input.
· of=file: Specifies the output file name. The default is standard output.
· ibs=bytes: Reads BYTES bytes at a time. One block is BYTES bytes.
· obs=bytes: Writes BYTES bytes at a time. One block is BYTES bytes.
· bs=bytes: Reads and writes BYTES bytes at a time. It can replace ibs and obs.
· cbs=bytes: Converts BYTES bytes at a time. It is the size of the conversion buffer.
· skip=blocks: Skips BLOCKS ibs-sized blocks at start of input.
· seek=blocks: Skips BLOCKS obs-sized blocks at start of output. This option is valid only when the output file is a disk or tape.
· count=blocks: Copies only BLOCKS input blocks. The block size is the number of bytes specified by ibs.
· conv=ascii: Converts EBCDIC to ASCII.
· conv=ebcdic: Converts ASCII to EBCDIC.
· conv=ibm: Converts ASCII to alternate EBCDIC.
· conv=block: Pads newline-terminated records with spaces to cbs-size.
· conv=unblock: Replaces trailing spaces in cbs-size records with newline.
· conv=ucase: Converts lower-case letters to upper-case letters.
· conv=lcase: Converts upper-case letters to lower-case letters.
· conv=notrunc: Does not truncate the output file.
· conv=swab: Swaps every pair of input bytes.
· conv=noerror: Continue after read errors.
· conv=sync: Pads every input block with NULLs to ibs-size; when used with block or unblock, pad with spaces rather than NULLs.
If a number is followed by one of the following suffixes, the number is multiplied by the corresponding factor: b=512, c=1, k=1024, w=2, kB=1000, K=1024, MB=1000*1000, M=1024*1024, GB=1000*1000*1000, G=1024*1024*1024. The xM form multiplies the number by M.
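The following is a minimal sketch of a write and read test that uses a small scratch file. The conv=fsync option is a GNU dd option that is not in the list above, and its availability on your system is an assumption. Point the of parameter only at a scratch file, because writing to a disk device destroys the data on it.

```shell
# Create a scratch file for the test.
testfile=$(mktemp)

# Write test: 4 blocks of 1 MiB. conv=fsync (GNU dd) flushes the data
# to disk before dd reports the elapsed time and throughput.
dd if=/dev/zero of="$testfile" bs=1M count=4 conv=fsync 2>&1 | tail -n 1

size=$(($(wc -c < "$testfile")))
echo "bytes written: $size"

# Read test: the file was just written, so this read is likely served
# from the page cache rather than from the disk.
dd if="$testfile" of=/dev/null bs=1M 2>&1 | tail -n 1

rm -f "$testfile"
```

The file size of 4 MiB keeps the example fast. A realistic test uses a file that is larger than the memory cache and resides on the disk to be measured.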
Displaying the free and used memory
free [-b|-k|-m|-g] [-t]
Options and parameters:
· -b|-k|-m|-g: Displays output in bytes, kilobytes, megabytes, or gigabytes, respectively. The default unit is kilobytes.
· -t: Displays summary for physical memory + swap space.
Example:
root@HZ-UIS01-CVK01:~# free
total used free shared buffers cached
Mem: 65939360 4208888 61730472 0 83224 277944
-/+ buffers/cache: 3847720 62091640
Swap: 10772220 0 10772220
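The two values on the -/+ buffers/cache line are derived from the Mem line: the used memory minus buffers and cache, and the free memory plus buffers and cache. The following is a minimal sketch of that calculation. It assumes the older free layout shown above (total, used, free, shared, buffers, cached), and the function name mem_without_cache is hypothetical.

```shell
# Hypothetical helper: read free output in the layout shown above and
# print the memory used by applications and the memory that is
# effectively available, both without buffers and cache.
mem_without_cache() {
    awk '$1 == "Mem:" { print $3 - $6 - $7, $4 + $6 + $7 }'
}

# The sample line is the Mem line from the example above.
printf '%s\n' 'Mem: 65939360 4208888 61730472 0 83224 277944' | mem_without_cache
# prints: 3847720 62091640
```

Newer versions of free use a different column layout, so verify the header row before you apply the calculation.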
User commands
Creating a user group
groupadd [-g gid] groupname
Options and parameters:
-g: Group ID.
Example:
root@HZ-UIS01-CVK01:~# groupadd -g 1000 it
root@HZ-UIS01-CVK01:/etc# more /etc/group | grep it
it:x:1000:
Deleting a user group
groupdel groupname
Example:
root@HZ-UIS01-CVK01:/etc# more /etc/group | grep it
it:x:1000:
root@HZ-UIS01-CVK01:/etc# groupdel it
root@HZ-UIS01-CVK01:/etc# more /etc/group | grep it
root@HZ-UIS01-CVK01:/etc#
Creating a user
useradd [-u UID] [-g initial_group] [-G supplementary group] [-m/M] [-d home_dir] [-s shell] username
Options and parameters:
· -u: User ID.
· -g: Initial group.
· -G: A list of supplementary groups which the user is also a member of.
· -M: The user home directory will not be created.
· -m: The user’s home directory will be created if it does not exist.
· -d: Specifies a directory as the home directory.
· -s: The name of the user’s login shell. If no login shell exists, the system selects the default login shell.
Example:
root@HZ-UIS01-CVK01:~# useradd -u 1000 -g it -m -d /home/it-user01 -s /bin/bash it-user01
root@HZ-UIS01-CVK01:~# more /etc/passwd | grep it-user01
it-user01:x:1000:1000::/home/it-user01:/bin/bash
root@HZ-UIS01-CVK01:~# ls /home/
it-user01
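The /etc/passwd entry shown above consists of colon-separated fields. The following is a minimal sketch that prints the UID, GID, home directory, and login shell of one account. The function name passwd_fields is hypothetical.

```shell
# Hypothetical helper: print the UID, GID, home directory, and login
# shell of one account from passwd-format lines read on stdin.
passwd_fields() {
    awk -F: -v user="$1" '$1 == user {
        printf "uid=%s gid=%s home=%s shell=%s\n", $3, $4, $6, $7
    }'
}

# The sample line is the entry created in the example above.
printf '%s\n' 'it-user01:x:1000:1000::/home/it-user01:/bin/bash' | passwd_fields it-user01
# prints: uid=1000 gid=1000 home=/home/it-user01 shell=/bin/bash
```

Unlike grep, the function matches the username field exactly, so an account such as it-user010 is not reported when you query it-user01. On a host, use passwd_fields it-user01 < /etc/passwd.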
Deleting a user
userdel [-r] username
Options and parameters:
-r: Deletes files in the user’s home directory along with the home directory itself.
Example:
root@HZ-UIS01-CVK01:~# userdel -r it-user01
root@HZ-UIS01-CVK01:~# more /etc/passwd | grep it-user01
root@HZ-UIS01-CVK01:~# ls /home
root@HZ-UIS01-CVK01:~#
Setting the password
passwd [-l] [-u] [--stdin] [-S] [-n days] [-x days] [-w days] [-i days] username
Options and parameters:
· -l: Locks the password.
· -u: Unlocks the password.
· -S: Displays password related parameters.
· -n: Sets the minimum number of days between password changes.
· -x: Sets the maximum number of days a password remains valid. After MAX_DAYS, the password must be changed.
· -w: Sets the number of days of warning before a password change is required.
· -i: Sets the number of days after a password expires before the account is disabled.
Example:
root@HZ-UIS01-CVK01:~# more /etc/passwd | grep it-user01
it-user01:x:1000:1000::/home/it-user01:/bin/bash
root@HZ-UIS01-CVK01:~#
root@HZ-UIS01-CVK01:~# passwd it-user01
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Switching the user account
su [-lm] [-c command] [username]
Options and parameters:
· -: Starts a new login shell as the specified user. If you do not specify a username, you switch to the root user.
· -l: Similar to the - option except that you must specify the user account.
· -m: Preserves the current environment.
· -c: Passes a command to the shell.
Example:
root@HZ-UIS01-CVK01:~# su - it-user01
it-user01@HZ-UIS01-CVK01:~$ exit
logout
it-user01@HZ-UIS01-CVK01:~$ su - root
Password:
root@HZ-UIS01-CVK01:~#
File management commands
Changing the group ownership of a file or directory
chgrp [-R] group name directory/file
Options and parameters:
-R: Recursively changes the group of the directory and each file in the directory.
Example:
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxr-xr-x 2 it-user01 it 4096 May 30 15:44 testFile
root@HZ-UIS01-CVK01:/home/it-user01# chgrp root testFile
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxr-xr-x 2 it-user01 root 4096 May 30 15:44 testFile
Changing the file owner and group
chown [-R] user file or directory
chown [-R] user:group name file or directory
Options and parameters:
-R: Recursively changes the ownership of the directory and each file in the directory.
Example:
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxr-xr-x 2 it-user01 it 4096 May 30 15:44 testFile
root@HZ-UIS01-CVK01:/home/it-user01# chown root:root testFile
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxr-xr-x 2 root root 4096 May 30 15:44 testFile
root@HZ-UIS01-CVK01:/home/it-user01#
Changing file or directory mode bits or permissions
chmod [-R] xyz file or directory
Options and parameters:
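· xyz: File permissions as a three-digit number. Each digit is the sum of the values for r (4), w (2), and x (1). The digits apply to the owner, the group, and other users, in that order.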
· -R: Recursively changes file mode bits of the directory and the files in the directory.
Example:
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxr-xr-x 2 it-user01 it 4096 May 30 15:44 testFile
root@HZ-UIS01-CVK01:/home/it-user01# chmod 777 testFile
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxrwxrwx 2 it-user01 it 4096 May 30 15:44 testFile
root@HZ-UIS01-CVK01:/home/it-user01#
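The numeric mode can be derived from the permission string that the ls -l command displays. The following is a minimal sketch that converts the nine permission characters (without the leading file type character) to the three-digit number. The function name mode_to_octal is hypothetical.

```shell
# Hypothetical helper: convert a permission string such as rwxr-xr-x
# to the three-digit number used by chmod.
mode_to_octal() {
    perms=$1
    result=""
    while [ -n "$perms" ]; do
        triplet=$(printf '%s' "$perms" | cut -c1-3)
        perms=$(printf '%s' "$perms" | cut -c4-)
        digit=0
        case $triplet in r??) digit=$((digit + 4)) ;; esac   # read
        case $triplet in ?w?) digit=$((digit + 2)) ;; esac   # write
        case $triplet in ??x) digit=$((digit + 1)) ;; esac   # execute
        result="$result$digit"
    done
    printf '%s\n' "$result"
}

mode_to_octal rwxr-xr-x   # prints: 755
mode_to_octal rwxrwxrwx   # prints: 777
```

In the example above, the testFile directory changes from 755 (drwxr-xr-x) to 777 (drwxrwxrwx).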
Process management commands
Displaying all running processes
top [-d number] | top [-bnp]
Options and parameters:
· -d: Specifies the delay between screen updates in seconds. The default value is 5 seconds.
· -b: Starts top in Batch mode, which is used to send output from top to a file.
· -n: Specifies the maximum number of iterations, or frames, top can produce before ending. This option is used together with the -b option.
· -p: Monitor only processes with specified process IDs.
You can use the following interactive commands while top is running:
· ?: Provides a reminder of all the basic interactive commands.
· P: Sorts by CPU usage.
· M: Sorts by memory usage.
· N: Sorts by PID.
· T: Sorts by CPU time used by processes.
· k: You will be prompted for a PID and then the signal to be sent.
· r: You will be prompted for a PID and then the value to nice it to.
· q: Quits top.
Example:
top - 17:40:48 up 2:13, 1 user, load average: 0.45, 0.55, 0.66
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.6%us, 0.1%sy, 0.0%ni, 99.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65939360k total, 5703848k used, 60235512k free, 85832k buffers
Swap: 10772220k total, 0k used, 10772220k free, 1746992k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4939 root 20 0 4583m 1.3g 4728 S 12 2.1 17:36.67 kvm
4874 root 20 0 4520m 908m 4576 S 5 1.4 11:54.61 kvm
4043 root 20 0 10.9g 402m 16m S 1 0.6 13:43.34 java
2370 root 20 0 23676 2168 1316 S 0 0.0 0:30.29 ovs-vswitchd
3184 root 20 0 15972 744 544 S 0 0.0 0:04.78 irqbalance
1 root 20 0 24456 2444 1344 S 0 0.0 0:04.07 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0 0.0 0:00.07 ksoftirqd/0
6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0
Output description:
· The first line displays the following:
¡ Current time and length of time since last boot
¡ Total number of users
¡ System load avg over the last 1, 5 and 15 minutes
A small value indicates that the system is idle. If the value is higher than 1, you must identify whether the system is too busy.
· The second line shows total tasks or threads. If the value for zombie is not 0, you must identify which process has become a zombie process.
· The third line shows the CPU state percentages. You must focus on the %wa parameter, which represents the time waiting for I/O completion. An IO issue can cause a system to respond slowly.
· The fourth and fifth lines show the physical and virtual memory statistics. If the virtual memory usage is high, the physical memory of the system is insufficient.
The lower section displays statistics for each process.
· PID: ID of the process.
· USER: User of the process.
· PR: Priority of the process. A smaller value means the process has a higher execution priority.
· NI: Nice value of the process. A smaller value means the process has a higher execution priority.
· %CPU: CPU usage.
· %MEM: Memory usage.
· TIME+: CPU time.
To view information about a process:
root@HZ-UIS01-CVK01:~# top -d 2 -p 4939
top - 08:59:13 up 17:31, 1 user, load average: 0.75, 0.70, 0.58
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65939360k total, 6484728k used, 59454632k free, 229880k buffers
Swap: 10772220k total, 0k used, 10772220k free, 1995728k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4939 root 20 0 4583m 1.5g 4728 S 2 2.4 100:48.79 kvm
Returning the status of a process
ps aux
ps -lA
ps axjf
Options and parameters:
· -A: Displays information about all accessible processes on the system.
· -a: Displays information about all processes that are associated with terminals.
· -u: Displays information for processes with user IDs in the userlist.
· -x: Used together with the -a option to also display processes that are not associated with a terminal.
Output format:
· l: Displays output in BSD long format.
· j: Displays output in BSD job control format.
· -f: Displays a full-format listing.
# Display the processes that belong to the current shell session.
root@HZ-UIS01-CVK01:~# ps -l
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 R 0 11338 32857 0 80 0 - 2102 - pts/2 00:00:00 ps
4 S 0 32857 32797 0 80 0 - 5428 wait pts/2 00:00:00 bash
The ps -l command lists only the processes that belong to the current login shell (bash). The parent of each listed command is that bash process, whose ancestry can be traced back to the init process.
· F: Flags associated with the process.
¡ 4: used super-user privileges.
¡ 1: forked but didn't exec.
· S: Process state.
¡ R: Running.
¡ S: Sleeping.
¡ D: Uninterruptible sleep (typically I/O).
¡ T: Stopped.
¡ Z: Defunct (zombie) process, terminated but not reaped by its parent.
· UID/PID/PPID: User ID of the process owner, process ID, and parent process ID.
· C: CPU usage.
· PRI/NI: Priority and Nice.
· ADDR/SZ/WCHAN: Memory related.
¡ ADDR: Location of the process in the memory. If it is Running, a hyphen (-) is displayed.
¡ SZ: size in physical pages of the core image of the process.
¡ WCHAN: Address of the kernel function where the process is sleeping.
· TTY: Controlling tty (terminal). A remote login uses a pseudo terminal, for example, pts/2.
· CMD: Command.
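The Z state described above is what to look for when top reports a nonzero zombie count. The following minimal sketch filters ps output for zombie processes. The function name list_zombies is hypothetical, and the sketch assumes the input comes from ps -eo pid,ppid,stat,comm.

```shell
# Read "ps -eo pid,ppid,stat,comm" output on stdin and print zombie processes.
# The header line is skipped. A STAT value that starts with Z marks a zombie.
list_zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ { print "pid=" $1, "ppid=" $2, "cmd=" $4 }'
}

# Example on a live host:
# ps -eo pid,ppid,stat,comm | list_zombies
```

The ppid value identifies the parent that has not reaped the process, which is typically the process to investigate.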
# Display all processes.
root@HZ-UIS01-CVK01:~# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 24572 2484 ? Ss 11:20 0:04 /sbin/init
root 2 0.0 0.0 0 0 ? S 11:20 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 11:20 0:00 [ksoftirqd/0]
root 6 0.0 0.0 0 0 ? S 11:20 0:00 [migration/0]
root 7 0.0 0.0 0 0 ? S 11:20 0:00 [watchdog/0]
root 8 0.0 0.0 0 0 ? S 11:20 0:00 [migration/1]
...
root 55719 1.0 0.0 71272 3520 ? Ss 17:42 0:00 sshd: root@pts/3
root 55752 8.6 0.0 21712 4204 pts/3 Ss 17:43 0:00 -bash
root 55927 0.0 0.0 16872 1284 pts/3 R+ 17:43 0:00 ps aux
root 62570 0.0 0.0 0 0 ? S 14:43 0:00 [kworker/7:2]
root 62840 0.0 0.0 0 0 ? S 16:40 0:00 [kworker/u:0]
# Display the processes owned by a user (mysql in this example).
root@HZ-UIS01-CVK01:~# ps -fu mysql
UID PID PPID C STIME TTY TIME CMD
mysql 3144 1 0 11:21 ? 00:00:46 /usr/sbin/mysqld
Ending a process
kill -signal PID
The following are the signal types:
· 1 SIGHUP: Hangs up or disconnects a process. It's often used to restart a process or to update its configuration.
· 9 SIGKILL: Immediately terminates a process, without allowing it to clean up or save any data.
· 15 SIGTERM: Requests that the process terminate gracefully, allowing it to clean up any resources or save any data before exiting.
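Because SIGKILL gives a process no chance to clean up, a common approach is to send SIGTERM first and use SIGKILL only if the process does not exit. The following is a minimal sketch of that sequence. The function name stop_process and the default 5-second wait are assumptions for illustration, and the sketch assumes it runs as root.

```shell
# Ask a process to exit with SIGTERM, then fall back to SIGKILL.
# Usage: stop_process <pid> [seconds_to_wait]
stop_process() {
    pid=$1
    timeout=${2:-5}
    # Signal 15 (SIGTERM): request a graceful exit.
    kill -15 "$pid" 2>/dev/null || return 0   # process is already gone
    while [ "$timeout" -gt 0 ]; do
        # Signal 0 sends nothing and only tests whether the process exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    kill -0 "$pid" 2>/dev/null || return 0
    # Signal 9 (SIGKILL): last resort, the process cannot clean up.
    kill -9 "$pid" 2>/dev/null || true
}

# Example: stop_process 4939 10
```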
Networking
Configuring a network interface
# Display enabled network interfaces.
root@HZ-UIS01-CVK01:/etc/network# ifconfig
vs_st6251d: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 112.113.20.116 netmask 255.255.255.0 broadcast 112.113.20.255
inet6 fe80::4abd:3dff:fe35:364f prefixlen 64 scopeid 0x20<link>
ether 48:bd:3d:35:36:4f txqueuelen 1000 (Ethernet)
RX packets 92927617 bytes 259005158671 (241.2 GiB)
RX errors 0 dropped 197 overruns 0 frame 0
TX packets 86270427 bytes 264220608508 (246.0 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vs_storage: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::888d:4ff:fe03:2b42 prefixlen 64 scopeid 0x20<link>
ether 8a:8d:04:03:2b:42 txqueuelen 1000 (Ethernet)
RX packets 2096773 bytes 113740663 (108.4 MiB)
RX errors 0 dropped 1383 overruns 0 frame 0
TX packets 49 bytes 3718 (3.6 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vswit923de: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.125.36.163 netmask 255.255.254.0 broadcast 10.125.37.255
inet6 fe80::4abd:3dff:fe35:364d prefixlen 64 scopeid 0x20<link>
ether 06:dc:dd:6a:a4:6b txqueuelen 1000 (Ethernet)
RX packets 12129953 bytes 35114923993 (32.7 GiB)
RX errors 0 dropped 195 overruns 0 frame 0
TX packets 10305733 bytes 2409083342 (2.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vswitch0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet6 fe80::707b:10ff:fe08:6a4a prefixlen 64 scopeid 0x20<link>
ether 72:7b:10:08:6a:4a txqueuelen 1000 (Ethernet)
RX packets 2094681 bytes 111925332 (106.7 MiB)
RX errors 0 dropped 197 overruns 0 frame 0
TX packets 30 bytes 2196 (2.1 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
...
The ifconfig -a command displays all network interfaces, including disabled network interfaces.
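The RX errors and dropped counters in the ifconfig output are worth checking on every interface. The following minimal sketch extracts them. The function name rx_problems is hypothetical, and the sketch assumes the ifconfig output format shown above (an interface line that contains flags=, followed by an RX errors line).

```shell
# Read ifconfig output on stdin and print interfaces whose RX error or
# RX dropped counter is not zero.
rx_problems() {
    awk '
        /flags=/ { ifname = $1; sub(/:$/, "", ifname) }
        $1 == "RX" && $2 == "errors" && ($3 + 0 > 0 || $5 + 0 > 0) {
            print ifname, "rx_errors=" $3, "rx_dropped=" $5
        }
    '
}

# Example on a live host:
# ifconfig -a | rx_problems
```

The counters are cumulative since the interface came up, so compare two readings taken some time apart to see whether drops are still increasing.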
# Display information about a network interface.
[root@autoCvk3 ~]# /opt/bin/ovs_dbg_listports
vs_st6251d (linux, vs_storage)
vs_st6251d 48bd3d35364f 1500
eth3 48bd3d35364f 1500
veth veth6251dlinux baf8cb16ce6d 1500 veth6251dovs
sub storage_ex 112.113.19.116/24
sub storage_in 112.113.20.116/24
vswit923de (linux, vswitch0)
vswit923de 06dcdd6aa46b 10.125.36.163/23 1500
eth2 48bd3d35364d 1500
veth veth923delinux 06dcdd6aa46b 1500 veth923deovs
# Shut down a network interface.
# ifdown vs_st6251d
# Start a network interface.
# ifup vs_st6251d
# Restart the networking service (all network interfaces).
# /etc/init.d/networking restart
Starting from UIS 8.0 version E0883L01, the network implementation changed from OVS to Linux Engine. To bring an aggregate port down or up, operate on the corresponding bond interface.
Displaying physical NIC information
root@UIS-CVK02:~# ethtool eth1
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x000000ff (255)
drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes
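When many NICs must be checked, the key fields of the ethtool output can be condensed into one line per NIC. The following minimal sketch does that. The function name link_summary is hypothetical, and the sketch assumes the Speed, Duplex, and Link detected lines appear as in the output above.

```shell
# Read ethtool output on stdin and print speed, duplex, and link state.
link_summary() {
    awk '
        $1 == "Speed:"                    { speed = $2 }
        $1 == "Duplex:"                   { duplex = $2 }
        $1 == "Link" && $2 == "detected:" { link = $3 }
        END { print "speed=" speed, "duplex=" duplex, "link=" link }
    '
}

# Example on a live host:
# ethtool eth1 | link_summary
```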
Displaying network statistics
netstat -[atunlp]
Options and parameters:
· -a: Displays all sockets, both listening and non-listening.
· -t: Displays TCP connections.
· -u: Displays UDP connections.
· -n: Displays addresses and port numbers in numeric format instead of resolving names.
· -l: Displays only listening sockets.
· -p: Displays the PID and name of the program that owns each socket.
# Display network connection statistics for the service that uses port 8080.
root@HZ-UIS01-CVK01:/etc/network# netstat -an | grep 8080
tcp6 0 0 :::8080 :::* LISTEN
tcp6 0 0 192.168.1.11:8080 10.165.136.197:55954 ESTABLISHED
tcp6 0 0 192.168.1.11:8080 10.165.136.197:55989 TIME_WAIT
tcp6 0 0 192.168.1.11:8080 10.165.136.197:55990 FIN_WAIT2
tcp6 0 0 192.168.1.11:8080 192.168.1.211:53366 ESTABLISHED
tcp6 0 0 192.168.1.11:8080 192.168.1.211:54850 TIME_WAIT
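A count of connections per state is often easier to read than the raw list. The following minimal sketch summarizes netstat -an output for one local port. The function name count_states is hypothetical. Unlike a plain grep, it matches only the local address column, so connections whose remote port happens to be the same number are not counted.

```shell
# Read "netstat -an" output on stdin and count TCP connections per state
# for the given local port.
# Usage: netstat -an | count_states <port>
count_states() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $4 ~ (":" port "$") { n[$6]++ }
        END { for (s in n) print s, n[s] }
    ' | LC_ALL=C sort
}

# Example on a live host:
# netstat -an | count_states 8080
```

A steadily growing TIME_WAIT or FIN_WAIT2 count can indicate that clients open and close connections at a high rate.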
# Display routing information for the system.
root@HZ-UIS01-CVK01:/etc/network# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 0 0 0 vswitch2
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage
192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app
Capturing packets on a network
tcpdump
Options and parameters:
· -a: Converts network and broadcast addresses to names.
· -d: Displays the matching packet code in a human readable form to standard output and stops.
· -dd: Displays the matching packet code in the format of a C program segment.
· -ddd: Displays the matching packet code in decimal format.
· -e: Prints data link layer header information on the output line.
· -t: Does not print timestamps on each output line.
· -vv: Outputs detailed packet information.
· -c: Stops tcpdump after receiving the specified number of packets.
· -i: Specifies the network interface to listen on.
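· -w: Directly writes packets to a file without analyzing or printing them.
· -s: Specifies the number of bytes to capture from each packet. A value of 0 selects the default maximum length, which in practice captures entire packets.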
Example:
tcpdump -i vswitch2 -s 0 -w /tmp/test.cap host 200.1.1.1 &
Displaying routing information
# Display routing information.
root@HZ-UIS01-CVK01:/etc/network# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 100 0 0 vswitch2
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage
192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app
# Add static routing information to access the network at 10.10.10.0/24.
# route add -net 10.10.10.0 netmask 255.255.255.0 gw 192.168.2.254
root@HZ-UIS01-CVK01:/etc/network#
root@HZ-UIS01-CVK01:/etc/network# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 100 0 0 vswitch2
10.10.10.0 192.168.2.254 255.255.255.0 UG 0 0 0 vswitch-storage
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage
192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app
# Delete routing information.
# route del -net 10.10.10.0 netmask 255.255.255.0 gw 192.168.2.254
root@HZ-UIS01-CVK01:/etc/network# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 100 0 0 vswitch2
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage
192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app
A static route added with the route command is stored only in memory and is lost after a reboot. To make the route permanent, add the command to the system startup script so that it is executed at each startup.
Use the vi editor in the operating system of UIS Manager to edit the /etc/rc.local file.
Add the route commands to the file before the exit 0 line, and then restart the system for the modification to take effect.
root@HZ-UIS01-CVK01:/etc/network# vi /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.
route add -net 192.168.5.0 netmask 255.255.255.0 gw 192.168.2.254
ulimit -s 10240
ulimit -c 1024
touch /var/run/h3c_UIS_cvk
/usr/bin/set-printk-console 2
exit 0
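Because the script runs with sh -e, a failing command stops the script before the remaining lines are executed. A route add command fails if the route already exists, so it can be useful to test for the route first. The following is a minimal sketch. The function name has_route is hypothetical, and the sketch assumes the route -n output format shown above.

```shell
# Read "route -n" output on stdin and succeed if a route with the given
# destination and netmask exists.
# Usage: route -n | has_route <destination> <netmask>
has_route() {
    awk -v dest="$1" -v mask="$2" '
        $1 == dest && $3 == mask { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example for /etc/rc.local (add the route only when it is missing):
# route -n | has_route 192.168.5.0 255.255.255.0 ||
#     route add -net 192.168.5.0 netmask 255.255.255.0 gw 192.168.2.254
```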
Disk management commands
Displaying the disk capacity
df [-ahikHTm] [directory or file]
Options and parameters:
· -a: Lists all file systems, including system-specific file systems such as /proc.
· -k: Displays the capacity of each file system in KBytes.
· -m: Displays the capacity of each file system in MBytes.
· -h: Displays the capacity of each file system in a human readable format, such as GBytes, MBytes, and KBytes.
· -H: Uses M=1000K instead of M=1024K for displaying capacities in larger units.
· -T: Lists the file system type of each partition, such as ext3.
· -i: Displays the number of inodes instead of disk usage.
# Display the partition size.
root@HZ-UIS01-CVK01:/etc/network# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 28G 2.4G 25G 9% /
udev 32G 4.0K 32G 1% /dev
tmpfs 13G 396K 13G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 32G 17M 32G 1% /run/shm
/dev/sda6 241G 48G 181G 21% /vms
# Display the partition size and file system type.
root@HZ-UIS01-CVK01:/etc/network# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 28G 2.4G 25G 9% /
udev devtmpfs 32G 4.0K 32G 1% /dev
tmpfs tmpfs 13G 396K 13G 1% /run
none tmpfs 5.0M 0 5.0M 0% /run/lock
none tmpfs 32G 17M 32G 1% /run/shm
/dev/sda6 ext4 241G 48G 181G 21% /vms
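The Use% column can be checked against a threshold to find partitions that are filling up. The following minimal sketch does that. The function name over_threshold is hypothetical, and the sketch assumes the six-column layout of df -h or df -P (without the -T option, which adds a Type column).

```shell
# Read "df -h" or "df -P" output on stdin and print mount points whose
# usage is at or above the given percentage.
# Usage: df -P | over_threshold <percent>
over_threshold() {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%$/, "", use)
            if (use + 0 >= limit + 0) print $6, $5
        }
    '
}

# Example on a live host:
# df -P | over_threshold 80
```

The -P option keeps each file system on a single line, which makes the output safer to parse than the default format.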
Displaying the disk usage
du [-ahsSkm] file or directory name
Options and parameters:
· -a: Lists the size of all files in addition to directories.
· -h: Displays sizes in a human readable format, such as G/M.
· -s: Displays only the total size of each specified file or directory.
· -S: Does not include the size of subdirectories in the size of a directory, which is slightly different from -s.
· -k: Displays the capacity in KBytes.
· -m: Displays the capacity in MBytes.
Example:
root@HZ-UIS01-CVK01:/vms# du -sh *
15G images
11G isos
16K lost+found
3.4G rhel-server-6.1-x86_64-dvd.iso
4.0K share
4.0K share-test
17G templet
4.0K test
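To find the largest entries quickly, sort the du output by size. The following minimal sketch uses sizes in KBytes (du -sk), because human readable values such as 15G and 16K do not sort correctly with a plain numeric sort. The function name largest_entries is hypothetical.

```shell
# Read "du -sk" output on stdin and print the N largest entries.
# Usage: du -sk * | largest_entries <count>
largest_entries() {
    sort -rn | head -n "$1"
}

# Example on a live host:
# cd /vms && du -sk * | largest_entries 3
```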
Partitioning a disk
fdisk [-l] disk name
Options and parameters:
-l: Lists the partition tables for the specified disk.
If no disk is specified, the system lists all partitions of all disks in the system.
Example:
root@HZ-UIS01-CVK01:~# fdisk -l
Disk /dev/sda: 300.0 GB, 299966445568 bytes
255 heads, 63 sectors/track, 36468 cylinders, total 585871964 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes
Disk identifier: 0x00051ce2
Device Boot Start End Blocks Id System
/dev/sda1 * 512 58593791 29296640 83 Linux
/dev/sda2 58594302 585871359 263638529 5 Extended
Partition 2 does not start on physical sector boundary.
/dev/sda5 58594304 80138751 10772224 82 Linux swap / Solaris
/dev/sda6 80139264 585871359 252866048 83 Linux
Disk /dev/sdb: 4294 MB, 4294967296 bytes
133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdb doesn't contain a valid partition table
# Create a partition on a disk.
root@HZ-UIS01-CVK01:~# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xeb665aa3.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): m
Command action
a toggle a bootable flag
b edit bsd disklabel
c toggle the dos compatibility flag
d delete a partition
l list known partition types
m print this menu
n add a new partition
o create a new empty DOS partition table
p print the partition table
q quit without saving changes
s create a new empty Sun disklabel
t change a partition's system id
u change display/entry units
v verify the partition table
w write table to disk and exit
x extra functionality (experts only)
Command (m for help): p
Disk /dev/sdb: 4294 MB, 4294967296 bytes
133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xeb665aa3
Device Boot Start End Blocks Id System
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-8388607, default 2048)
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-8388607, default 8388607): 4000000
Command (m for help): n
Partition type:
p primary (1 primary, 0 extended, 3 free)
e extended
Select (default p): p
Partition number (1-4, default 2): 2
First sector (4000001-8388607, default 4000001)
Using default value 4000001
Last sector, +sectors or +size{K,M,G} (4000001-8388607, default 8388607): +500M
Command (m for help): p
Disk /dev/sdb: 4294 MB, 4294967296 bytes
133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xeb665aa3
Device Boot Start End Blocks Id System
/dev/sdb1 2048 4000000 1998976+ 83 Linux
/dev/sdb2 4000001 5024000 512000 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
# Display disk partition information.
root@HZ-UIS01-CVK01:~# fdisk -l /dev/sdb
Disk /dev/sdb: 4294 MB, 4294967296 bytes
133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xeb665aa3
Device Boot Start End Blocks Id System
/dev/sdb1 2048 4000000 1998976+ 83 Linux
/dev/sdb2 4000001 5024000 512000 83 Linux
Making a file system
mkfs [-t file system format] disk name
Options and parameters:
-t: Specifies the file system type, for example, ext2, ext3, ext4, or ocfs2.
# Make an ext3 file system on /dev/sdb1.
root@HZ-UIS01-CVK01:~# mkfs -t ext3 /dev/sdb1
mke2fs 1.42 (29-Nov-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
125184 inodes, 499744 blocks
24987 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=515899392
16 block groups
32768 blocks per group, 32768 fragments per group
7824 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
root@HZ-UIS01-CVK01:~#
# Make an ocfs2 file system on /dev/sdb2.
root@HZ-UIS01-CVK01:~# mkfs -t ocfs2 /dev/sdb2
mkfs.ocfs2 1.6.3
Cluster stack: classic o2cb
Label:
Features: sparse backup-super unwritten inline-data strict-journal-super xattr
Block size: 1024 (10 bits)
Cluster size: 4096 (12 bits)
Volume size: 524288000 (128000 clusters) (512000 blocks)
Cluster groups: 17 (tail covers 5120 clusters, rest cover 7680 clusters)
Extent allocator size: 2097152 (1 groups)
Journal size: 16777216
Node slots: 2
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 0 block(s)
Formatting Journals: done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
mkfs.ocfs2 successful
root@HZ-UIS01-CVK01:~#
Checking a disk
fsck [-t file system format] [-ACay] disk name
Options and parameters:
· -t: Specifies the file system type. This option is typically not required, because the current Linux system automatically distinguishes file system types through the superblock.
· -A: Scans the necessary disks based on the content of /etc/fstab. This command is typically executed during the boot process.
· -a: Automatically repairs detected abnormal sectors, so you don't have to keep pressing y.
· -y: Similar to -a, but some file systems only support the -y parameter.
· -C: Displays a progress bar during the check.
# Check the /dev/sdb1 partition.
root@HZ-UIS01-CVK01:~# fsck -C /dev/sdb1
fsck from util-linux 2.20.1
e2fsck 1.42 (29-Nov-2011)
/dev/sdb1: clean, 11/125184 files, 16807/499744 blocks
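In a script, the result of a check is read from the fsck exit status, which is a sum of the condition values documented in the fsck manual page. The following minimal sketch decodes it. The function name explain_fsck_status is hypothetical.

```shell
# Print the meaning of an fsck exit status (a bitwise sum of conditions).
# Usage: explain_fsck_status <status>
explain_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    if [ $((code & 1)) -ne 0 ]; then echo "file system errors corrected"; fi
    if [ $((code & 2)) -ne 0 ]; then echo "system should be rebooted"; fi
    if [ $((code & 4)) -ne 0 ]; then echo "file system errors left uncorrected"; fi
    if [ $((code & 8)) -ne 0 ]; then echo "operational error"; fi
    if [ $((code & 16)) -ne 0 ]; then echo "usage or syntax error"; fi
    if [ $((code & 32)) -ne 0 ]; then echo "checking canceled by user request"; fi
    if [ $((code & 128)) -ne 0 ]; then echo "shared-library error"; fi
    return 0
}

# Example on a live host (the partition must not be mounted):
# fsck -C /dev/sdb1; explain_fsck_status $?
```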
Mounting a file system
mount [-t file system type] [-L Label name] [-o additional option] [-n] disk file name mount point
Options and parameters:
· -a: Mounts all file systems based on the data in the /etc/fstab configuration file.
· -l: Displays the file system labels in addition to the mounting information.
· -t: Specifies the type of file system to be mounted.
· -n: Mounts the file system without writing the mounting information to /etc/mtab. By default, the system writes the mounting information to /etc/mtab in real time for use by other programs.
· -L: Mounts the partition that has the specified label.
· -o: Specifies additional mount options, for example, account, password, or read/write permission.
# Mount /dev/sdb1 to /mnt.
root@HZ-UIS01-CVK01:~# mount /dev/sdb1 /mnt
root@HZ-UIS01-CVK01:~# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 28G 5.7G 21G 22% /
udev devtmpfs 32G 4.0K 32G 1% /dev
tmpfs tmpfs 13G 408K 13G 1% /run
none tmpfs 5.0M 0 5.0M 0% /run/lock
none tmpfs 32G 17M 32G 1% /run/shm
/dev/sda6 ext4 241G 48G 181G 21% /vms
/dev/sdb1 ext3 1.9G 35M 1.8G 2% /mnt
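Instead of reading the df output by eye, a script can test whether a directory is currently a mount point. The following minimal sketch reads the kernel mount table. The function name is_mounted is hypothetical, and the sketch assumes the table has the /proc/mounts layout, in which the second column is the mount point.

```shell
# Succeed if the given directory is listed as a mount point.
# Usage: is_mounted <directory> [mount_table_file]
is_mounted() {
    awk -v mp="$1" '
        $2 == mp { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Examples on a live host:
# is_mounted /mnt || mount /dev/sdb1 /mnt
# is_mounted /mnt && umount /mnt
```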
Unmounting a file system
umount [-fn] disk file name or mount point
Options and parameters:
· -f: Unmounts a file system forcibly. Use this parameter if no data can be read from a network file system (NFS).
· -n: Unmounts a file system without writing to the /etc/mtab file.
Example:
root@HZ-UIS01-CVK01:~# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 28G 5.7G 21G 22% /
udev devtmpfs 32G 4.0K 32G 1% /dev
tmpfs tmpfs 13G 408K 13G 1% /run
none tmpfs 5.0M 0 5.0M 0% /run/lock
none tmpfs 32G 17M 32G 1% /run/shm
/dev/sda6 ext4 241G 48G 181G 21% /vms
/dev/sdb1 ext3 1.9G 35M 1.8G 2% /mnt
root@HZ-UIS01-CVK01:~#
root@HZ-UIS01-CVK01:~# umount /mnt
root@HZ-UIS01-CVK01:~# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 28G 5.7G 21G 22% /
udev devtmpfs 32G 4.0K 32G 1% /dev
tmpfs tmpfs 13G 408K 13G 1% /run
none tmpfs 5.0M 0 5.0M 0% /run/lock
none tmpfs 32G 17M 32G 1% /run/shm
/dev/sda6 ext4 241G 48G 181G 21% /vms
Writing data to a disk
Use the sync command to write data that is buffered in memory but not yet written to a disk.
Example:
root@HZ-UIS01-CVK01:~# sync
root@HZ-UIS01-CVK01:~#
Euler edition restrictions
To maintain system security and stability and prevent unintended background operations, Euler OS restricts certain background activities.
Disabled commands
The following commands are disabled:
· rm
· rpm
· which
· grep
· mv
· vi
· vim
· ps
· top
· bash
· sh
· find
· yum
· dd
· chmod
The system displays "command not found" when you enter any of these commands.
Disabled command autocompletion
Pressing Tab no longer autocompletes commands during input.