H3C UIS Manager Maintenance Guide
Document version: 5W100-20250126
Copyright © 2025 New H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.
Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.
The information in this document is subject to change without notice.
Contents
Identifying the cluster HA feature
Identifying the shared storage in the cluster
Identifying host information
Identifying the uptime of a host
Identifying host performance monitoring information
Identifying vSwitch information
Identifying physical NIC status
Identifying the running status of CAStools
Verifying disk and NIC types
Identifying VM performance monitoring statistics
Identifying VM backup information
Identifying license information
Configuration cautions and guidelines
Starting or shutting down a UIS host
IP address and host name change
Managing physical interfaces bound to a vSwitch
Replacing a disk on a CVK host
Changing the password for accessing UIS Manager
Changing the root password of a host from the Web interface
Changing the admin password
Scaling out and scaling in a cluster
Performing a heterogeneous or homogeneous migration
Obtaining the XML file of the VM
Identifying the storage volume for VM disk files
Copying the XML file of the VM to the target host
Defining the VM through XML
Clearing VM data on the original host
Configuring stateful failover
Replacing SSDs with NVMe drives
Configuring storage disaster recovery
Collecting logs of the UIS Manager
Collecting logs from the Web interface
Collecting logs at the CLI of a CVK host
Collecting logs of CAStools
Collecting logs of a VM operating system
Collecting logs of a Windows operating system
Viewing logs of a Windows operating system
Collecting logs of a Linux operating system
Troubleshooting tools and utilities
Analysis with the Kdump file
/var/log/ceph/ceph-osd.*.log
/var/log/ceph/ceph-disk.log
/var/log/ceph/ceph-mon.*.log
/var/log/calamari/calamari.log
/var/log/onestor_cli/onestor_cli.log
Distributed storage maintenance
Rebalancing data placement when data imbalance occurs
Resolving host issues caused by a full system disk
Issues caused by network failure
Handling failures to add or delete hosts
Deleting a monitor node offline and restoring the node
Deleting a storage node offline and restoring the node
Missing or changed sdX device names due to host restart
Failure to display O&M and monitoring data
Failure to display O&M and monitoring data (1)
Failure to display O&M and monitoring data (2)
Cluster initialization issues
Compute cluster creation failure
Storage configuration failure
Health index lower than 100%
Deletion failure prompt for successful host deletion
OSD process terminated unexpectedly
UIS management node failure
Down monitoring node due to high system disk usage
Down monitoring node due to network error
Extent backup file decompression
Script for data restoration
Shared storage space reclamation
Releasing space of a shared volume by editing the VM bus type
Releasing space of a shared volume by deleting files
Get responses not received by an NMS
Data of a value-added service in the memory is different from that in the database
Data inconsistency occurs if you mount a volume on a Windows client and create a snapshot online
The state of a snapshot is Creating, Deleting, or Restoring
When the Intel ixgbe network adapter is enabled with load balancing, storage access gets slow
Using a QoS policy with low bandwidth and IOPS limits makes the storage disks of a client slow
Failure to recognize an encryption dongle by VMs
After a USB device is plugged into a CVK host, the host cannot recognize the USB device
Use of USB-to-serial devices
Disk performance optimization
Guest OS and VM restoration
Restrictions and guidelines
Windows repair operations and steps
Space occupation issue due to manual operations
Space occupation issue due to software issues
Message "The maximum number of pending replies per connection has been reached" generated
Unified authentication issue
CAS authentication service exception
Cloud-native engine container service commands
Process management commands
Routine maintenance
Stable operation of the UIS system requires routine maintenance, which typically includes reviewing alarms, identifying cluster status, host information, virtual machine (VM) status, and license information, and reviewing logs.
Reviewing alarms
The UIS platform main page displays indicators for critical alarms, major alarms, minor alarms, and information alarms generated during UIS system operation in the top right corner.
If critical or major alarms are displayed, the UIS system operation might contain anomalies that require immediate troubleshooting.
By clicking the corresponding alarm indicator, you can access the associated real-time alarm page. Alternatively, you can navigate to the Alarm Management > Real-Time Alarm page.
You can perform troubleshooting based on the alarm source, type, content, and the last alarm time on the real-time alarm page.
Performing health check
The UIS platform provides a shortcut menu in the top right corner that allows you to perform health check, resource analysis, storage cleanup, resource export, VM restoration, and zombie VM operations.
Select Health Check to enter the health check page. You can perform health check for the specified modules.
You can print and export the health check results.
If a failure is detected in the health check, for example, a RAID controller or hard drive cache failure, you can click Remediation to resolve the issue.
Reviewing operation logs
The Operation Logs page records history operations in the UIS system, including front-end manual user operations and back-end automatic system operations.
The system records important information in operation logs, including the operator name, finish time, login address, operation description, and failure reason.
If the result of an operation log entry is Failed, troubleshoot the failure based on the failure reason. If a large number of operation logs exist, you can download them for offline troubleshooting and analysis.
The following figure shows the UIS Manager operation logs.
Identifying cluster status
Identifying the cluster HA feature
Verify that the HA feature is enabled for the cluster. If HA is not enabled and a CVK host anomaly occurs in the cluster, the VMs on that CVK host cannot migrate to other CVK hosts in the cluster.
After enabling HA for the cluster, you can enable service area HA. When the service area of a VM becomes faulty or a connectivity issue occurs, the VM can migrate to another host.
You can specify the boot priority for the VMs in the cluster. Options include Low, Medium, and High. The default boot priority is Medium. The VM boot priority is set upon adding or editing VMs. The boot priority specifies the startup order of VMs after a host failure occurs. The VMs restart on the new host according to the specified boot priorities. The VMs with the high, medium, and low boot priorities start up in descending order until all VMs restart or no more cluster resources are available.
Identifying the shared storage in the cluster
During VM migration, if the target host has no shared storage mounted for VMs, the migration will fail.
Identifying host information
Identifying host status
View host status on the Hosts page to identify whether abnormal hosts exist.
Check the CPU and memory usage of each host, and pay special attention to the hosts with usage exceeding 80%.
Identifying the uptime of a host
On the Summary page of a CVK host, you can see the detailed host configuration information. From the Uptime field, you can identify whether the host has been rebooted recently.
Identifying host performance monitoring information
On the Performance Monitoring page of the CVK host, you can see the CPU usage, memory usage, I/O throughput, network throughput, disk usage, and partition usage of the host.
Identifying host CPU usage
On the Performance Monitoring > CPU Usage (%) page, click … to view CPU usage in a longer time range.
Identifying host memory usage
On the Performance Monitoring > Memory Usage (%) page, click … to view memory usage in a longer time range.
Identifying host I/O throughput
On the Performance Monitoring > I/O Throughput (KBps) page, click … to view I/O throughput in a longer time range.
Identifying host network throughput
On the Performance Monitoring > Network Throughput (Mbps) page, click ... to view the network throughput of each physical NIC in a longer time range.
Identifying host disk usage
On the Performance Monitoring > Disk Requests (IOPS) page, you can see the host disk usage information.
Identifying host partition usage
On the Performance Monitoring > Partition Usage page, you can see the host partition usage information.
Identifying vSwitch information
Identify whether the names of vSwitches between hosts in the cluster are consistent.
On the vSwitches page of a host, identify whether the vSwitches are active. If a vSwitch is in abnormal state, identify whether the physical NIC is normal.
Make sure only one gateway is configured for all vSwitches of the host.
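From the CLI of a host, you can cross-check the vSwitch configuration with the standard OVS command below (a quick check, not a replacement for the Web interface):
ovs-vsctl show
The output lists each vSwitch (bridge) with its bound ports, which helps verify that vSwitch names are consistent across hosts.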
Identifying physical NIC status
On the Physical NICs page, identify whether the attributes of the physical NICs of the host, such as rate and state, are normal.
Abnormal physical NICs will affect vSwitch performance.
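From the CLI, you can also check the rate and link state of a physical NIC with ethtool (eth0 is an example NIC name):
ethtool eth0 | grep -E "Speed|Duplex|Link detected"
Output such as Link detected: no typically indicates a cabling or switch-side issue.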
Identifying VM status
Identifying the running status of CAStools
On the Summary page of the VM, identify whether CAStools is installed on the VM and running correctly.
Verifying disk and NIC types
Verifying the disk type
On the Disk tab of the VM modification page, verify that the device object is Virtio disk (which significantly improves disk performance), the source path is a shared storage path, and the cache mode is directsync (the recommended setting).
Verifying the NIC type
On the Network tab of the VM modification page, verify that the device model is high-speed NIC and kernel acceleration is enabled (which significantly improves NIC performance).
Identifying VM performance monitoring statistics
On the Performance Monitoring page of the VM, you can see the CPU usage, memory usage, I/O throughput, network throughput, disk usage, and partition usage of the VM.
Identifying VM CPU usage
On the Performance Monitoring > CPU Usage (%) page, click … to view CPU usage in a longer time range.
Identifying VM memory usage
On the Performance Monitoring > Memory Usage (%) page, click … to view memory usage in a longer time range.
Identifying VM I/O throughput
On the Performance Monitoring > I/O Throughput (KBps) page, click … to view I/O throughput in a longer time range.
Identifying VM network throughput
On the Performance Monitoring > Network Throughput (Mbps) page, click … to view the network throughput of each physical NIC in a longer time range.
Identifying VM disk usage
On the Performance Monitoring > Disk Requests (IOPS) page, you can see the VM disk usage information.
Identifying VM partition usage
On the Performance Monitoring > Partition Usage page, you can see VM partition usage information.
Identifying VM backup information
On the Backup Management page of a VM, you can see the backup history of the VM. As a best practice, back up all core VMs on the UIS platform.
Identifying license information
The UIS system typically contains a UIS Manager license, a CAS license, and a distributed storage license. Use official licenses at formal deployment sites. You can use temporary licenses at test or temporary deployment sites. To avoid impact on correct UIS system usage when temporary licenses expire, update them in advance.
The following figure shows the licensing page of the UIS Manager component.
Managing alarms
The alarm management feature collects and displays statistics of concerned alarms for operators. In the current software version, UIS collects statistics of host resource alarms, VM resource alarms, cluster resource alarms, failure alarms, security alarms, other alarms, and distributed storage resource alarms.
Users can configure alarm thresholds for metrics such as the CPU usage and memory usage of hosts or VMs. When a metric value reaches its alarm threshold, an alarm is generated and reported. Users can view the reported alarms in the real-time alarm list. The alarm filtering configuration allows users to filter out alarms of no concern, so that such alarms are not reported. In addition, the system supports sending alarms to users through email or SMS messages.
Configuration cautions and guidelines
See H3C UIS Manager Configuration Cautions and Guidelines.
See H3C UIS Manager Data Loss Prevention Best Practices.
Change operations
If issues occur while the UIS system is running, you must follow certain rules to resolve them. Failure to do so will affect normal operation of services on the live network.
Upgrading UIS software
See H3C UIS Upgrade Guide.
Handling hardware failure
See H3C UIS Hyper-Converged Infrastructure Component Replacement Configuration Guide.
Starting or shutting down a UIS host
When you perform comprehensive maintenance for the UIS system, you must power on or power off the devices in a certain order. Failure to do so might damage the service system. Before powering on the devices, make sure the health index is 100%.
For more information, see H3C UIS Hyper-Converged Infrastructure Node Shutdown Configuration Guide.
IP address and host name change
CAUTION:
· To change the root password for a CVK host, access the Web interface of UIS Manager. You cannot change the root password for a CVK host from its command shell.
· If you delete a CVK host while its shared storage is suspended, the shared storage will be automatically deleted. Therefore, you must mount the shared storage to the CVK host again after the host is re-added.
· When the cluster contains four or fewer nodes (primary, backup, and quorum nodes), you cannot modify IP addresses by directly deleting hosts. For more information, contact Technical Support.
After the UIS system is deployed, you might need to modify the IP addresses or host names of the UIS system.
After a CVK host is added to the UIS cluster, you can modify the IP address or host name through the method provided by the Xconsole interface, as shown in the figure below. To do that, you must first delete the CVK host from the UIS system.
If the CVK host has shared storage enabled or runs VMs, it cannot be deleted. To delete the host in this case, you must first stop or migrate VMs and pause or delete the shared file system.
After the host is deleted, you can add the host through host expansion. During the host expansion process, you can manually configure an IP address for the host and select the corresponding NIC interface, and then add the host back to the cluster. Then, you can migrate the VMs back to the host.
CAUTION:
· Make sure the IP address you enter can communicate with the management network and the internal/external storage networks of the original cluster. Otherwise, adding the host will fail.
· The IP address settings are planned in the deployment phase. Determine them carefully at the beginning, because you cannot modify them later.
Managing physical interfaces bound to a vSwitch
When the live network plan is improper, you might need to adjust the physical interfaces bound to a vSwitch. If you want to change the network settings after the deployment is finished, proceed with caution and make sure you are familiar with the network topology and the change requirements.
In version E0750P06 and later, you can do that from the Web interface as follows: First, configure the host to operate in maintenance mode. Then, access the Hosts > vSwitches page and edit the network settings. Finally, confirm the connectivity and exit maintenance mode.
In versions earlier than E0750P06, you cannot modify the physical interfaces bound to a vSwitch or modify the aggregation mode from the Web interface. Instead, you must do that in the back end. By assigning multiple interfaces to an aggregation group, you can load share traffic among the member ports and provide higher connection availability.
Link aggregation delivers the following benefits:
· Increases the network bandwidth—Link aggregation binds multiple links into a logical link, whose bandwidth is the sum of the bandwidth of each single link.
· Improves the network connection availability—Multiple links in a link aggregation back up each other. When a link is disconnected, the traffic will be automatically load-shared again among the remaining links.
Based on whether LACP is enabled on the bond interfaces, link aggregation includes static aggregation and dynamic aggregation.
Dynamic aggregation on an OVS
LACP is enabled on both the OVS side and switch side. On the bond interfaces of an OVS, the value for the lacp parameter can be active (enable LACP) or off (disable LACP).
The lacp_status parameter represents dynamic aggregation status. Options include negotiated (LACP negotiation succeeds), configured (LACP is enabled on the OVS side but LACP negotiation fails), and disabled (LACP is not enabled on the OVS side).
As shown in Figure 1, the lacp parameter is set to active on a bond interface to enable LACP on the bond interface of the OVS. However, the lacp_status parameter shows configured on the bond interface. A possible reason is that LACP is not enabled on the peer device.
Figure 1 Dynamic aggregation autonegotiation fails
In normal conditions, LACP negotiation succeeds. In this case, the bond interface status is as shown in Figure 2.
Figure 2 Dynamic aggregation autonegotiation succeeds
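From the CLI, you can verify the aggregation state of a bond with the following standard OVS commands (a minimal sketch; vswitch0_bond is the example bond name used in this section):
ovs-appctl bond/show vswitch0_bond     # bond_mode, lacp_status, and member link states
ovs-appctl lacp/show vswitch0_bond     # detailed LACP negotiation information
ovs-vsctl list port vswitch0_bond      # configured lacp and bond_mode parameters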
On the OVS, dynamic aggregation supports advanced (balance-tcp mode) load sharing and basic (balance-slb mode) load sharing. The difference lies in the packet fields used as hash inputs when selecting the member link for a flow.
· balance-tcp mode—Obtains the packet forwarding interface through hashing the Ethernet type, source/destination MAC address, VLAN ID, IP packet protocol, source/destination IP/IPv6 address, and source/destination Layer 4 port number fields of packets.
· balance-slb mode—Obtains the packet forwarding interface through hashing the source MAC and VLAN fields of packets. This bond_mode is deployed on the current Web interface.
Static aggregation on an OVS
LACP is disabled on both the OVS side and switch side. When the configuration succeeds, the state is as follows:
Figure 3 Static aggregation configuration state
In the bond interface configuration, the lacp parameter is set to off, and the lacp_status parameter shows off for the aggregation.
On the OVS, static aggregation supports advanced load sharing, basic load sharing, and active/backup load sharing. The difference between advanced and basic load sharing is the same as that in dynamic aggregation. The following information describes active/backup load sharing.
In the OVSDB, the bond interface configuration saves the active link selection method, and the interface configuration saves the physical NIC priority. Configure the following settings:
1. ovs-vsctl set Port bond-name other_config:active-algorithm="speed|order"
The speed option means to select the active link by NIC speed. The order option means to select the active link in the NIC configuration order. If this command is not executed, the active link is selected by NIC speed by default.
2. ovs-vsctl set Port bond-name other_config:active-algorithm="true|false"
The true option means the traffic will be switched back to the selected active link NIC when the NIC goes down and then comes up. The false option means the traffic will not be switched back. If this command is not executed, the traffic will not be switched back by default.
3. ovs-vsctl set Interface ethx other_config:slave-priority="n"
The n argument represents the ID assigned by the back end according to the configuration order, for example, 1, 2, 3... A smaller ID means a higher priority.
Figure 4 Active/backup aggregation group configuration
Figure 5 Member interface configuration for an active/backup aggregation group on an OVS
Changing single NIC interfaces to a dynamic aggregation group on an OVS
The following information describes how to change single NIC interface eth7 into a dynamic aggregation group with member interfaces eth5 and eth7 for advanced/basic load sharing on vswitch0 on the management network.
· If the peer switch of eth5 and eth7 has been configured with a dynamic aggregation group and the two interfaces have been assigned to the aggregation group, you only need to configure the dynamic aggregation group with advanced (bond_mode=balance-tcp) or basic (bond_mode=balance-slb) load sharing on the OVS.
ovs-vsctl del-port vswitch0 eth7; ovs-vsctl -- add-bond vswitch0 vswitch0_bond eth5 eth7 bond_mode=[balance-tcp | balance-slb] -- set port vswitch0_bond lacp=active
CAUTION: You must enter the commands before and after the semicolon (;) as a single command line. In this way, when the management interface is disconnected (eth7 is removed from vswitch0), vswitch0 is immediately configured with the dynamic aggregation group containing eth5 and eth7.
· If the peer switch of eth5 and eth7 is not configured with a dynamic aggregation group, you can configure a static active/backup aggregation and then switch the aggregation mode.
a. Create a static active/backup aggregation group with members eth5 and eth7 on the OVS:
ovs-vsctl del-port vswitch0 eth7;ovs-vsctl add-bond vswitch0 vswitch0_bond eth5 eth7 bond_mode=active-backup
b. Configure a dynamic aggregation group on the peer switch of eth5 and eth7, and assign the two interfaces to the aggregation group.
Without loss of generality, suppose eth5 is connected to GigabitEthernet 1/0/5 on the peer switch and eth7 is connected to GigabitEthernet 1/0/7 on the peer switch.
[H3C]interface Bridge-Aggregation 8 //Create aggregation group 8
[H3C-Bridge-Aggregation8]link-aggregation mode dynamic //Specify the aggregation group as a dynamic aggregation group
[H3C]interface GigabitEthernet 1/0/5
[H3C-GigabitEthernet1/0/5]port link-aggregation group 8 //Assign GigabitEthernet 1/0/5 to aggregation group 8
[H3C]interface GigabitEthernet 1/0/7
[H3C-GigabitEthernet1/0/7]port link-aggregation group 8 //Assign GigabitEthernet 1/0/7 to aggregation group 8
CAUTION: Make sure the configuration (especially the VLAN configuration) of aggregation group Bridge-Aggregation 8 is the same as the configuration of the member interfaces (GigabitEthernet 1/0/5 and GigabitEthernet 1/0/7 in this example). Otherwise, dynamic aggregation with advanced/basic load sharing will fail.
c. Execute the following command to configure the static active/backup aggregation group to operate in dynamic advanced (bond_mode=balance-tcp) or basic (bond_mode=balance-slb) load sharing mode:
ovs-vsctl set port vswitch0_bond bond_mode=[balance-tcp | balance-slb] lacp=active
Changing single NIC interfaces to a static aggregation group on an OVS
The following information describes how to change single NIC interface eth7 into a static advanced/basic load-sharing aggregation group with member interfaces eth5 and eth7 on vswitch0 on the management network.
· If the peer switch of eth5 and eth7 has been configured with a static aggregation group and the two interfaces have been assigned to the aggregation group, you only need to configure the static aggregation group with advanced (bond_mode=balance-tcp) or basic (bond_mode=balance-slb) load sharing on the OVS.
ovs-vsctl del-port vswitch0 eth7; ovs-vsctl -- add-bond vswitch0 vswitch0_bond eth5 eth7 bond_mode=[balance-tcp | balance-slb]
CAUTION: You must enter the commands before and after the semicolon (;) as a single command line. In this way, when the management interface is disconnected (eth7 is removed from vswitch0), vswitch0 is immediately configured with the aggregation group containing eth5 and eth7.
· If the peer switch of eth5 and eth7 is not configured with a static aggregation group, you can configure a static active/backup aggregation group first and then switch the aggregation mode.
a. Create a static active/backup aggregation group with members eth5 and eth7 on the OVS:
ovs-vsctl del-port vswitch0 eth7;ovs-vsctl add-bond vswitch0 vswitch0_bond eth5 eth7 bond_mode=active-backup
b. Configure a static aggregation group on the peer switch of eth5 and eth7, and assign the two interfaces to the aggregation group.
Without loss of generality, suppose eth5 is connected to GigabitEthernet 1/0/5 on the peer switch and eth7 is connected to GigabitEthernet 1/0/7 on the peer switch.
[H3C]interface Bridge-Aggregation 8 //Create aggregation group 8
[H3C]interface GigabitEthernet 1/0/5
[H3C-GigabitEthernet1/0/5]port link-aggregation group 8 //Assign GigabitEthernet 1/0/5 to aggregation group 8
[H3C]interface GigabitEthernet 1/0/7
[H3C-GigabitEthernet1/0/7]port link-aggregation group 8 //Assign GigabitEthernet 1/0/7 to aggregation group 8
CAUTION: Make sure the configuration (especially the VLAN configuration) of aggregation group Bridge-Aggregation 8 is the same as the configuration of the member interfaces (GigabitEthernet 1/0/5 and GigabitEthernet 1/0/7 in this example). Otherwise, static advanced/basic load sharing will fail.
c. Execute the following command to configure the static active/backup aggregation group to operate in static advanced (bond_mode=balance-tcp) or basic (bond_mode=balance-slb) load sharing mode:
ovs-vsctl set port vswitch0_bond bond_mode=[balance-tcp | balance-slb]
Changing a dynamic aggregation group to a static aggregation group on an OVS
The following information describes how to change the dynamic aggregation group with member interfaces eth5 and eth7 to a static aggregation group on vswitch0.
To smoothly change a dynamic aggregation group to a static aggregation group (to minimize packet loss), you must configure a static active/backup aggregation group as an intermediate step.
1. Change a dynamic aggregation group to a static active/backup aggregation group on the OVS.
ovs-vsctl set port vswitch0_bond bond_mode=active-backup lacp=off
2. Disable LACP for the aggregation group (Bridge-Aggregation 8 in this example) on the peer switch of eth5 and eth7.
[H3C]interface Bridge-Aggregation 8
[H3C-Bridge-Aggregation8]undo link-aggregation mode dynamic
3. Change the static active/backup aggregation group to a static aggregation group with advanced/basic load sharing on the OVS.
ovs-vsctl set port vswitch0_bond bond_mode=[balance-tcp | balance-slb]
Changing a static aggregation group to a dynamic aggregation group on an OVS
The following information switches the static aggregation group with eth5 and eth7 to a dynamic aggregation group on vswitch0.
1. Change a static aggregation group to a static active/backup aggregation group on the OVS.
Skip this step if the aggregation group on the OVS is a static active/backup aggregation group.
ovs-vsctl set port vswitch0_bond bond_mode=active-backup
2. Enable LACP for the aggregation group (Bridge-Aggregation 8 in this example) on the peer switch of eth5 and eth7.
[H3C]interface Bridge-Aggregation 8
[H3C-Bridge-Aggregation8]link-aggregation mode dynamic
3. Change the static active/backup aggregation group to a dynamic aggregation group with advanced/basic load sharing on the OVS.
ovs-vsctl set port vswitch0_bond bond_mode=[balance-tcp | balance-slb] lacp=active
Deleting an aggregation group on an OVS
The following information describes how to change a dynamic advanced load-sharing aggregation group with member interfaces eth5 and eth7 to single interface eth7 on vswitch0.
1. Change the aggregation mode to static active/backup aggregation on vswitch0.
ovs-vsctl set port vswitch0_bond bond_mode=active-backup lacp=off
2. Remove eth5 and eth7 from the aggregation group on vswitch0.
Suppose eth5 is connected to GigabitEthernet 1/0/5 on the peer switch and eth7 is connected to GigabitEthernet 1/0/7 on the peer switch.
[H3C]interface GigabitEthernet 1/0/5
[H3C-GigabitEthernet1/0/5]undo port link-aggregation group
[H3C]interface GigabitEthernet 1/0/7
[H3C-GigabitEthernet1/0/7]undo port link-aggregation group
3. Delete the static active/backup aggregation group on vswitch0, and assign eth7 to vswitch0.
ovs-vsctl del-port vswitch0_bond;ovs-vsctl add-port vswitch0 eth7
The way of switching a static advanced/basic load-sharing aggregation group to a single link is similar to the way of switching a dynamic advanced/basic load-sharing aggregation group to a single link. The difference is that the following command is executed in the first step:
ovs-vsctl set port vswitch0_bond bond_mode=active-backup
CAUTION: Because of various restrictions (for example, restrictions on the peer physical switch), the CAS OVS cannot always perform the aggregation mode switchover smoothly, and a few packets might be dropped. As a best practice, perform the aggregation mode switchover when the traffic is light.
Replacing a disk on a CVK host
When a disk in the cluster fails, it cannot be directly replaced. Software operations and configurations are required for a successful disk replacement on UIS Manager. For more information, see H3C UIS Hyper-Converged Infrastructure Component Replacement Configuration Guide.
Changing the password for accessing UIS Manager
CAUTION:
· To change the root password for a CVK host, access the Web interface of UIS Manager. You cannot change the root password for a CVK host from its command shell.
· As a best practice, configure the same password for all hosts in the cluster.
· Regularly change your password and avoid using simple or common passwords.
To meet security requirements, user passwords need to be changed periodically. The following uses changing the password of the UIS root user as an example.
Changing the root password of a host from the Web interface
1. Right-click a host, and then select Edit Host.
2. In the dialog box that opens, enter a new password, and then click OK.
If you forget the root password, see H3C UIS&CAS Host Password Retrieval Configuration Guide.
Changing the admin password
UIS Manager has a default password. To change this password, access UIS Manager and click admin in the upper-right corner, and then change the password as needed.
As a best practice, change the root password and admin password in time at the first login to UIS Manager.
Scaling out and scaling in a cluster
See H3C UIS Manager Resource Scale-Out and Scale-In Configuration Guide.
Changing the system time
See H3C UIS Manager System Time Modification Configuration Guide.
Performing a heterogeneous or homogeneous migration
See H3C UIS HCI Cloud Migration Guide.
Redefining a VM
In some cases, such as when a VM fails to start up due to host operation issues, it might be necessary to redefine and restore the VM on a host different from its original location. However, VMs that use raw block devices, encrypted disks, or multi-level images do not support redefinition.
Obtaining the XML file of the VM
Obtaining the XML file of the VM when HA is enabled and the CVM node is normal
When HA is enabled and the CVM node is normal, the XML file of a VM is saved in the HA directory on the CVM node by default. Typically, the HA directory is /etc/cvm/ha/clust_id/cvk_name, for example, /etc/cvm/ha/2/cvknode191. In the corresponding HA directory, enter the CVK directory for the VM to find the XML file of the VM, for example, test01.
Obtaining the XML file of the VM when HA is disabled and the CVM node is normal
1. On the top navigation bar, click System, and then select Data Backup > Backup History from the left navigation pane. Then, download the most recent backup file.
This example downloads backup file UIS_INFO_BACK_E0750P07_20220713123106.tar.gz.
2. Decompress the downloaded backup file and enter directory UIS_INFO_BACK_E0750P07_20220713123106\cvknode1_crm_cvknode2\CVM_INFO_BACK_E0710P21_20220713123125\front\cvks.
3. Select the directory for the host where the VM is located, and then decompress the libvirt.tar.gz file in the directory. Then, enter the qemu subdirectory to obtain the XML file of the VM.
NOTE: Directory cvknode1_crm_cvknode2 is named in the format primary CVM node name_crm_secondary CVM node name. In a single-host environment, this directory is named after the CVM node name.
Obtaining the XML file of the VM when HA is disabled and the CVM node is faulty
If HA is disabled and the CVM node is faulty, you cannot access UIS Manager. To obtain the XML file of a VM in this case, perform the following steps:
1. Use an SSH client to access each node in the cluster to find a node that has the /vms/cvmbackup directory.
The backup data is saved on three random hosts managed by the system.
2. Enter the /vms/cvmbackup directory on the node, and then enter the cvknode1_crm_cvknode2 directory to identify the most recent backup record. Then, enter the corresponding directory to locate the front.tar.gz file.
3. Decompress the front.tar.gz file, and then enter the cvks directory. Then, enter the directory for the host where the VM is located, and then decompress the libvirt.tar.gz file in the directory.
4. Enter the libvirt/qemu directory after decompression to find the XML file of the VM.
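The following is a minimal command sketch of the preceding steps (the backup directory name cvknode1_crm_cvknode2 is the example from this section; the latest backup directory and host directory names vary by environment):
ls /vms/cvmbackup                                   # run on each node until the directory is found
cd /vms/cvmbackup/cvknode1_crm_cvknode2/<latest_backup>
tar -zxvf front.tar.gz
cd cvks/<vm_host_name>
tar -zxvf libvirt.tar.gz
ls libvirt/qemu/                                    # lists the XML files of the VMs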
Identifying the storage volume for VM disk files
If you already know the storage volume for the VM disk files, access the CLI of another host that has mounted the volume and verify that the volume is normal. If you do not know the storage volume, execute the vim or cat command to obtain the disk file location of the VM from the XML file obtained in "Obtaining the XML file of the VM." For example:
The source file field displays the location of the VM disk files.
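You can also extract the same information with grep (test01.xml is the example VM XML file name used in this section; run the command in the directory where the XML file is located):
grep "source file" test01.xml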
Copying the XML file of the VM to the target host
Use SCP to copy the XML file of the VM to the /etc/libvirt/qemu directory on the host where the storage volume location has been identified in "Identifying the storage volume for VM disk files."
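For example, assuming the VM XML file is test01.xml and the target host's management IP address is 192.168.0.2 (a hypothetical address):
scp test01.xml root@192.168.0.2:/etc/libvirt/qemu/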
Defining the VM through XML
1. Execute the virsh define vm.xml command in the /etc/libvirt/qemu directory.
The VM is defined through XML.
2. Verify that the VM is also displayed in the output from the virsh list --all command at the CLI of the new host.
3. Connect to the host from the Web interface. Then, you can view and start up the VM from the Web interface.
To define many VMs, you can also restart libvirt to automatically define them, provided that no VM names contain Chinese characters. Then start up the VMs after successful definition, as shown in the following figure:
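A minimal sketch of the define-and-verify procedure in steps 1 and 2 (test01.xml is the example file name):
cd /etc/libvirt/qemu
virsh define test01.xml      # define the VM from its XML file
virsh list --all             # verify that the new VM appears in the list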
Clearing VM data on the original host
If the original host has been completely damaged due to some hardware issues, resolve the hardware issues, and then re-install the same UIS version as the original system.
If the original host does not have hardware issues, perform the following steps to clear VM data on the host:
1. Disconnect the network cable from the original host before the host starts up.
2. Log in to the CLI of the original host and remove the XML file of the VM. This avoids dual writes that can occur if HA brings up the VM on the original host after the server restarts.
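For example, you can move the XML file aside instead of deleting it (test01 is the example VM name; keeping a copy allows rollback):
mkdir -p /root/vm_xml_backup
mv /etc/libvirt/qemu/test01.xml /root/vm_xml_backup/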
Configuring stateful failover
See H3C UIS Manager Stateful Failover Configuration Guide.
Configure a stateful failover system before a version upgrade.
If you cannot access the ONEStor Web interface, access the CVM node and execute the required commands at the CLI. For the specific commands, see the preceding guide.
Replacing SSDs with NVMe drives
See H3C UIS Manager Configuration Guide for Replacing SSDs with NVMe Disks.
Migrating VMware VMs
See H3C UIS HCI Cloud Migration Guide.
Configuring GPUs
See H3C UIS Manager GPU Passthrough Configuration Guide.
Configuring vGPUs
See H3C UIS Manager vGPU Configuration Guide.
Configuring anti-virus
Contact Technical Support.
Configuring AISHU backup
See H3C UIS AISHU Solution Configuration Guide.
Configuring storage disaster recovery
See H3C UIS Manager Site Recovery Management Configuration Guide.
Collecting logs
Collecting logs of the UIS Manager
Collecting logs from the Web interface
1. On the top navigation bar, click System, and then select Log Collection from the left navigation pane.
2. Select the CVK hosts for which the system collects logs, and then click Collect to save the log files locally.
Collecting logs at the CLI of a CVK host
If you cannot collect logs from the Web interface of the UIS Manager due to CVK failure, access the CLI of the CVK host to collect logs manually.
To collect logs at the CLI of a CVK host, access the CLI of the CVK host, and then execute the cas_collect_log.sh command. A compressed file is generated in the /vms directory as shown in the figure.
To analyze the logs, download the file to your local computer by using SSH client software.
For ONEStor-related hosts, you cannot collect logs by executing the script. To collect logs for a ONEStor-related host, manually copy the logs in the /var/log/storage and /var/log/ceph directories. If only a short time range is required or the logs are too large, you can collect only the relevant logs archived in the /var/log/storage/backup directory.
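For example, to package the ONEStor logs into a single archive for download (the archive name and path are arbitrary):
tar -zcvf /vms/onestor_logs.tar.gz /var/log/storage /var/log/ceph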
Introduction to logs
Logs collected from the Web interface
UIS log files downloaded from the Web interface are named in the UIS_×××_×××.tar.gz format. A decompressed log file includes the following types of files:
· catalina.out—Contains logs of Web functions on the UIS Manager.
· oper_log.log—Contains user operation logs.
· *.diag.tar.bz2—Contains logs of each CVK host.
· onestor—Contains operation logs and system logs of ONEStor.
· WARN*.tar.gz—Contains alarm messages.
Logs collected at the CLI
CVK host log files obtained at the CLI are named in the XXX.tar.bz2 format. A decompressed CVK host log file includes the following types of directory files:
· etc—Contains UIS configuration files, mainly VM configuration files stored as libvirt/qemu/VM_name.xml.
· var—Contains logs of each UIS feature module.
· command.out—Contains output information about frequently used commands at the CLI.
· cas_cvk-version—Contains UIS version information.
· loglist—Contains UIS log file names.
· uis_raid_card_info.log—Contains basic information about RAID controllers on the host.
The var directory mainly contains the following logs:
· messages—Host system logs, which record the system running information.
· fsm—Shared file system logs.
· cas_ha—HA logs.
· Ha_shell_XX.log—HA logs.
· libvirt—VM logs.
· openvswitch—Logs generated by the OVS running process.
· Ovs_shell_XX.log—Logs generated by calling the ovs_bridge.sh script.
· tomcat8—UIS Web logs.
· operation—Logs for manual operations at the CLI of UIS Manager.
The following provides descriptions for CVK host logs:
· Messages logs
Messages logs record critical information during operating system operation. The following introduces the records for an abnormal reboot of a CVK host.
Feb 3 13:58:01 XJYZ-CVK01 CRON[64458]: (root) CMD (ump-node-sync )
Feb 3 13:58:01 XJYZ-CVK01 CRON[64459]: (root) CMD (ump-sync -p ALL)
Feb 3 13:58:01 XJYZ-CVK01 CRON[64460]: (root) CMD ( /opt/bin/ocfs2_iscsi_conf_chg_timer.sh)
Feb 3 13:58:01 XJYZ-CVK01 CRON[64443]: (CRON) info (No MTA installed, discarding output)
Feb 3 14:06:35 XJYZ-CVK01 kernel: imklog 5.8.6, log source = /proc/kmsg started.
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="2747" x-info="http://www.rsyslog.com"] start
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd: rsyslogd's groupid changed to 103
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd: rsyslogd's userid changed to 101
Feb 3 14:06:35 XJYZ-CVK01 rsyslogd-2039: Could not open output pipe '/dev/xconsole' [try http://www.rsyslog.com/e/2039 ]
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Initializing cgroup subsys cpuset
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Initializing cgroup subsys cpu
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Initializing cgroup subsys cpuacct
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Linux version 3.13.6 (root@cvknode22) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #5 SMP Mon Jul 21 10:07:26 CST 2014
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-3.13.6 root=UUID=4beeb503-6e10-4836-93a4-0836a9a1571e ro nomodeset elevator=deadline transparent_hugepage=always crashkernel=256M quiet
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] KERNEL supported cpus:
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Intel GenuineIntel
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] AMD AuthenticAMD
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] Centaur CentaurHauls
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] e820: BIOS-provided physical RAM map:
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cbff] usable
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x000000000009cc00-0x000000000009ffff] reserved
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
Feb 3 14:06:35 XJYZ-CVK01 kernel: [0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf60ffff] usable
As shown in the example, the messages log file does not have any records from 13:58:01 to 14:06:35, indicating that the CVK host failed in the time range.
The kernel-level logs record information about the CVK host after it restarted.
· Libvirt logs
The following entries in the /var/log/libvirt/libvirtd.log log file show an alarm indicating that the CVK host lacks memory resources and that the current memory usage has reached 97%. (The alarm message for insufficient CPU resources is similar.)
2014-10-24 09:15:52.792+0000: 2994: warning : virIsLackOfResource:1106 : Lack of Memory resource! only 374164 free 64068 cached and vm locked memory(4194304*0%) of 16129760 total, max:85; now:97
2014-10-24 09:15:52.792+0000: 2994: error : qemuProcessStart:3419 : Lack of system resources, out of memory or cpu is too busy, please check it.
The /var/log/libvirt/qemu directory saves the log files of VMs running on the CVK host.
root@UIS-CVK01:/var/log/libvirt/qemu# ls -l
total 44
-rw------- 1 root root 7067 Jan 9 19:08 RedHat5.9.log
-rw------- 1 root root 1969 Jan 18 15:41 win7.log
-rw------- 1 root root 26574 Feb 11 16:15 windows2008.log
VM log files record VM running information, including the times when the VM started up and shut down and the disk files of the VM.
2015-02-11 15:50:18.349+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin QEMU_AUDIO_DRV=none /usr/bin/kvm -name windows2008 -S -machine pc-i440fx-1.5,accel=kvm,usb=off,system=windows -cpu qemu64,hv_relaxed,hv_spinlocks=0x2000 -m 1024 -smp 1,maxcpus=12,sockets=12,cores=1,threads=1 -uuid 43741f06-166d-4155-b47e-4137df68e91c -no-user-config -nodefaults -drive file=/vms/sharefile/windows2008,if=none,id=drive-virtio-disk0,format=qcow2,cache=directsync -device
…
char device redirected to /dev/pts/0 (label charserial0)
qemu: terminating on signal 15 from pid 4530
2015-02-11 16:15:28.825+0000: shutting down
· OCFS2 logs
The /var/log/fsm/fsm_core*.log log file records information about processing triggered by OCFS2 Fence of the CVK host.
2021-11-04 06:40:35,882 manager:233 INFO Received an event: {'index': 7, 'type': 'fence_umount', 'uuid': u'851D36905AB74AFD93E1ABA8259DA3A2', 'seq': 11538, 'dev_name': u'dm-7'}
2021-11-04 06:40:35,923 manager:204 INFO Remain 0 events to be handling
2021-11-04 06:40:35,923 manager:131 INFO Manager received an event: Pool sharefile06 was fence_umount
2021-11-04 06:40:35,923 fspool:141 INFO Pool sharefile06 received a event fence_umount
· Operation logs
Operation logs record information about the commands executed at the CLI of the CVK host. The following contains commands executed from Apr 19th to Apr 21st.
root@cvknode1:~/cas# ll /var/log/operation/
total 32
drwxrwxrwx 2 root root 4096 Apr 21 10:06 ./
drwxr-xr-x 40 root root 4096 Apr 21 11:01 ../
-rwxrwxrwx 1 root root 5162 Apr 19 17:49 18-04-19.log*
-rwxrwxrwx 1 root root 829 Apr 20 19:11 18-04-20.log*
-rwxrwxrwx 1 root root 8505 Apr 21 11:00 18-04-21.log*
The following example shows the content of an operation log file, including the following information:
○ Time when a command was executed.
○ Login user.
○ Login address.
○ Login method.
○ Executed commands.
○ Directory where a command was executed.
2018/04/19 16:56:50##root pts/6 (172.16.130.3)##/root## vi /var/log/tomcat8/cas.log
2018/04/19 16:57:05##root pts/6 (172.16.130.3)##/root## service tomcat8 restart
2018/04/19 17:02:21##root pts/5 (172.16.130.3)##/root## cat /etc/cvk/system_alarm.xml
2018/04/19 17:02:23##root pts/5 (172.16.130.3)##/root## lsblk
2018/04/19 17:49:04##root pts/6 (172.16.130.3)##/root## ceph osd tree
2018/04/19 17:49:19##root pts/6 (172.16.130.3)##/root## stop ceph-osd id=3
Collecting logs of CAStools
The UIS system and VMs are isolated from each other. To monitor and manage VMs from UIS Manager, you must install CAStools in the operating system of the VMs.
The log collection method for CAStools varies by the operating system installed on the VM:
· Windows operating system—Obtain the qemu-ga.log file in the C:\Program Files\castools\ directory of the VM.
· Linux operating system—Obtain the qemu-ga.log and set-ip.log files in the /var/log/ directory of the VM.
Collecting logs of a VM operating system
Collecting logs of a Windows operating system
1. Open the Event Viewer window, and then select Windows Logs from the left navigation pane. Right-click System, and then select Save All Events As.
2. Save the logs.
3. The downloaded log file is as shown in the figure.
Viewing logs of a Windows operating system
1. On the local computer (installed with the Windows 7 operating system), open the Event Viewer window. From the left navigation pane, right-click Windows Logs, and then select Open Saved Log.
2. In the dialog box that opens, select the saved log file.
3. The logs are displayed on the Saved Logs > event page.
Collecting logs of a Linux operating system
To collect logs for a VM installed with a Linux operating system, collect logs in the /var/log directory. If the log size is large, first compress the logs and then copy the compressed file and save it locally.
For example, to collect logs generated on Sep 17th, 2019 for VM vm_test, execute the tar -zcvf vm_test_20190917.tar.gz /var/log command.
Troubleshooting tools and utilities
Introduction to kdump
Kdump is a crash dump tool of the Linux kernel. It reserves part of the memory for a capture kernel. When the current kernel crashes, kdump uses kexec to boot the capture kernel, which dumps complete information about the crashed kernel (for example, CPU registers and stack traces) to a file on a local disk or on the network.
By default, the UIS system supports kdump. When the kernel of a CVK host fails, the system generates a crash file in the /vms/crash directory for troubleshooting as shown in the example.
root@cvk29:/vms/crash# ls -lt
drwxr-sr-x 2 root whoopsie 4096 Jul 22 17:34 2014-07-22-09:34
The file named in the dump-*** format in the 2014-07-22-09:34 directory contains the output of kdump.
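To verify that kdump is ready to capture a crash dump, you can run the following checks (kdump-config is provided by the kdump-tools package on Ubuntu-based CVK hosts; treat its availability as an assumption for your specific version):
cat /sys/kernel/kexec_crash_loaded     # 1 means the capture kernel is loaded
kdump-config show                      # displays the kdump configuration and status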
Analysis with the Kdump file
You can use the crash tool to analyze the Kdump file. The vmlinux file for the kernel version is needed for the analysis. You can find that file at /usr/src/linux-4.1.0-generic/vmlinux-kernelversion (the kernel version name might vary).
The following information describes how to use the Kdump file to locate typical online issues.
CPU error
Node cvknode1 at a site reboots repeatedly. Even after all virtual machines (VMs) are migrated off the node and the shared storage settings are deleted from it, the node still reboots repeatedly. The syslogs do not show any anomalies before the reboots, but a vmcore file is present in the /vms/crash directory.
1. View abnormal call stack information in the vmcore file:
root@cvk21:/vms/tmp# crach vmlinux vmcore
No command 'crach' found, did you mean:
Command 'crash' from package 'crash' (main)
crach: command not found
root@cvk21:/vms/tmp# crash vmlinux vmcore
crash 7.0.5
Copyright (C) 2002-2014 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later [http://gnu.org/licenses/gpl.html]
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 8
DATE: Wed Nov 5 12:25:19 2014
UPTIME: 00:02:19
LOAD AVERAGE: 0.06, 0.05, 0.02
TASKS: 324
NODENAME: cvknode-1
RELEASE: 3.13.6
VERSION: #5 SMP Mon Jul 21 10:07:26 CST 2014
MACHINE: x86_64 (2132 Mhz)
MEMORY: 64 GB
PANIC: "Kernel panic - not syncing: Fatal Machine check"
PID: 0
COMMAND: "swapper/6"
TASK: ffff8807f4618000 (1 of 8) [THREAD_INFO: ffff8807f4620000]
CPU: 6
STATE: TASK_RUNNING (PANIC)
crash> bt
PID: 0 TASK: ffff8807f4618000 CPU: 6 COMMAND: "swapper/6"
#0 [ffff8807ffc6ac50] machine_kexec at ffffffff8104c991
#1 [ffff8807ffc6acc0] crash_kexec at ffffffff810e97e8
#2 [ffff8807ffc6ad90] panic at ffffffff8174ac9d
#3 [ffff8807ffc6ae10] mce_panic at ffffffff81038b2f
#4 [ffff8807ffc6ae60] do_machine_check at ffffffff810399d8
#5 [ffff8807ffc6af50] machine_check at ffffffff817589df
[exception RIP: intel_idle+204]
RIP: ffffffff8141006c RSP: ffff8807f4621db8 RFLAGS: 00000046
RAX: 0000000000000010 RBX: 0000000000000004 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff8807f4621fd8 RDI: 0000000001c0d000
RBP: ffff8807f4621de8 R8: 0000000000000009 R9: 0000000000000004
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000003
R13: 0000000000000010 R14: 0000000000000002 R15: 0000000000000003
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- [MCE exception stack] ---
#6 [ffff8807f4621db8] intel_idle at ffffffff8141006c
#7 [ffff8807f4621df0] cpuidle_enter_state at ffffffff81602a8f
#8 [ffff8807f4621e50] cpuidle_idle_call at ffffffff81602be0
#9 [ffff8807f4621ea0] arch_cpu_idle at ffffffff8101e2ce
#10 [ffff8807f4621eb0] cpu_startup_entry at ffffffff810c1818
#11 [ffff8807f4621f20] start_secondary at ffffffff8104306b
crash>
The abnormal call stack information shows that a machine check exception (MCE) has occurred. This exception is typically caused by hardware issues.
2. Execute the dmesg command at the crash prompt to view information printed before the unexpected reboots:
[ 15.707981] 8021q: 802.1Q VLAN Support v1.8
[ 16.416569] drbd: initialized. Version: 8.4.3 (api:1/proto:86-101)
[ 16.416573] drbd: srcversion: F97798065516C94BE0F27DC
[ 16.416575] drbd: registered as block device major 147
[ 17.142281] Ebtables v2.0 registered
[ 17.203400] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 17.247387] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 139.114172] Disabling lock debugging due to kernel taint
[ 139.114185] mce: [Hardware Error]: CPU 2: Machine Check Exception: 4 Bank 5: be00000000800400
[ 139.114192] mce: [Hardware Error]: TSC 10ba0482e78 ADDR 3fff81760d32 MISC 7fff
[ 139.114199] mce: [Hardware Error]: PROCESSOR 0:206c2 TIME 1415161519 SOCKET 0 APIC 14 microcode 13
[ 139.114203] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
[ 139.114208] mce: [Hardware Error]: Machine check: Processor context corrupt
[ 139.114211] Kernel panic - not syncing: Fatal Machine check
crash>
It can be determined from the preceding information that an error has occurred on CPU 2.
Memory error
A CVK node at a site reboots unexpectedly. No abnormal records are found in the syslogs before and after the reboot. Kdump records are generated at the reboots.
1. View call stack information from the Kdump records.
If information as follows is output, a hardware error might have occurred.
crash> bt
PID: 0 TASK: ffffffff81c144a0 CPU: 0 COMMAND: "swapper/0"
#0 [ffff880c0fa07c60] machine_kexec at ffffffff8104c991
#1 [ffff880c0fa07cd0] crash_kexec at ffffffff810e97e8
#2 [ffff880c0fa07da0] panic at ffffffff8174ac9d
#3 [ffff880c0fa07e20] asminline_call at ffffffffa014c895 [hpwdt]
#4 [ffff880c0fa07e40] nmi_handle at ffffffff817598da
#5 [ffff880c0fa07ec0] do_nmi at ffffffff81759b7d
#6 [ffff880c0fa07ef0] end_repeat_nmi at ffffffff81758cf1
[exception RIP: intel_idle+204]
RIP: ffffffff8141006c RSP: ffffffff81c01da8 RFLAGS: 00000046
RAX: 0000000000000010 RBX: 0000000000000010 RCX: 0000000000000046
RDX: ffffffff81c01da8 RSI: 0000000000000018 RDI: 0000000000000001
RBP: ffffffff8141006c R8: ffffffff8141006c R9: 0000000000000018
R10: ffffffff81c01da8 R11: 0000000000000046 R12: ffffffffffffffff
R13: 0000000000000000 R14: ffffffff81c01fd8 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0010 SS: 0018
--- [NMI exception stack] ---
#7 [ffffffff81c01da8] intel_idle at ffffffff8141006c
#8 [ffffffff81c01de0] cpuidle_enter_state at ffffffff81602a8f
#9 [ffffffff81c01e40] cpuidle_idle_call at ffffffff81602be0
#10 [ffffffff81c01e90] arch_cpu_idle at ffffffff8101e2ce
#11 [ffffffff81c01ea0] cpu_startup_entry at ffffffff810c1818
#12 [ffffffff81c01f10] rest_init at ffffffff8173fc97
#13 [ffffffff81c01f20] start_kernel at ffffffff81d37f7b
#14 [ffffffff81c01f70] x86_64_start_reservations at ffffffff81d375f8
#15 [ffffffff81c01f80] x86_64_start_kernel at ffffffff81d3773e
crash>
2. Execute the dmesg command to view information before the anomaly.
crash> dmesg
…
[10753.155822] sd 3:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).
[10804.115376] sbridge: HANDLING MCE MEMORY ERROR
[10804.115386] CPU 23: Machine Check Exception: 0 Bank 9: cc1bc010000800c0
[10804.115387] TSC 0 ADDR 12422f7000 MISC 90868002800208c PROCESSOR 0:306e4 TIME 1417366012 SOCKET 1 APIC 2b
…
[10804.283467] sbridge: HANDLING MCE MEMORY ERROR
[10804.283473] CPU 9: Machine Check Exception: 0 Bank 9: cc003010000800c0
[10804.283475] TSC 0 ADDR 1242ef7000 MISC 90868000800208c PROCESSOR 0:306e4 TIME 1417366012 SOCKET 1 APIC 26
[10804.303482] EDAC MC1: 28416 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12422f7 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c0 socket:1 channel_mask:1 rank:0)
[10804.303489] EDAC MC1: 192 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12424a7 offset:0x0 grain:32
…
[10804.319474] sbridge: HANDLING MCE MEMORY ERROR
[10804.319481] CPU 6: Machine Check Exception: 0 Bank 9: cc001010000800c0
[10804.319482] TSC 0 ADDR 1243087000 MISC 90868002800208c PROCESSOR 0:306e4 TIME 1417366012 SOCKET 1 APIC 20
[10805.303772] EDAC MC1: 64 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x1243087 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c0 socket:1 channel_mask:1 rank:0)
[10813.602696] sd 3:0:0:0: [sdc] Very big device. Trying to use READ CAPACITY(16).
[10813.603219] sd 3:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).
[10840.833238] Kernel panic - not syncing: An NMI occurred, please see the Integrated Management Log for details.
crash>
3. View information in the kern.log file.
Nov 30 07:05:01 HBND-UIS-E-CVK09 kernel: [229821.496666] sd 11:0:0:1: [sdd] Very big device. Trying to use READ CAPACITY(16).
Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.188854] sbridge: HANDLING MCE MEMORY ERROR
Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.188873] CPU 23: Machine Check Exception: 0 Bank 9: cc1e0010000800c0
Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.188874] TSC 0 ADDR 10638f7000 MISC 90868002800208c PROCESSOR 0:306e4 TIME 1417302355 SOCKET 1 APIC 2b
…
Nov 30 07:05:55 HBND-UIS-E-CVK09 kernel: [229875.244902] EDAC MC1: 30720 CE memory scrubbing error on CPU_SrcID#1_Channel#0_DIMM#0 (channel:0 slot:0 page:0x10638f7 offset:0x0 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0008:00c0 socket:1 channel_mask:1 rank:0)
…
root@gzh-139:/vms/issue_logs/hebeinongda/20141201/HBND-UIS-E-CVK09/logdir/var/log# grep OVERFLOW kern* | wc
225 6341 60264
root@gzh-139:/vms/issue_logs/hebeinongda/20141201/HBND-UIS-E-CVK09/logdir/var/log#
From the preceding information, it can be determined that the issue was caused by a memory error. The issue was resolved after the faulty memory was replaced.
Storage cluster logs
/var/log/ceph/ceph.log
The ceph.log file mainly records the health status and traffic of the cluster. It is available only on monitor nodes and has the same content as the output of the ceph -w command.
· If the following logs appear in the ceph.log file, the service network of the cluster's primary monitor node has been disconnected.
2017-05-09 19:44:03.400143 mon.2 172.16.105.84:6789/0 2009 : cluster [INF] mon.cvknode84 calling new monitor election
2017-05-09 19:44:03.404362 mon.1 172.16.105.83:6789/0 2023 : cluster [INF] mon.cvknode83 calling new monitor election
2017-05-09 19:44:05.419510 mon.1 172.16.105.83:6789/0 2024 : cluster [INF] mon.cvknode83@1 won leader election with quorum 1,2
2017-05-09 19:44:05.428131 mon.1 172.16.105.83:6789/0 2025 : cluster [INF] HEALTH_WARN; 1 mons down, quorum 1,2 cvknode83,cvknode84
2017-05-09 19:44:14.383590 mon.1 172.16.105.83:6789/0 2057 : cluster [INF] osdmap e1397: 18 osds: 12 up, 18 in
· If the following logs appear in the ceph.log file, the health of the cluster is below 100% and the cluster is in the process of recovery.
2017-06-06 19:31:41.319993 mon.0 192.168.93.21:6789/0 86387 : cluster [INF] pgmap v73931: 4096 pgs: 2561 active+clean, 1532 active+remapped+wait_backfill, 3 active+remapped+backfilling; 3362 GB data, 6730 GB used, 21941 GB / 28672 GB avail; 0 B/s rd, 127 kB/s wr, 256 op/s rd, 63 op/s wr; 5/2608637 objects degraded (0.000%); 1765938/2608637 objects misplaced (67.696%); 62992 kB/s, 15 objects/s recovering
· If the following logs appear in the ceph.log file, the storage network of a non-Handy, non-primary monitor node in the cluster has been disconnected.
2017-05-12 16:05:14.585496 mon.0 172.31.1.31:6789/0 106035 : cluster [INF] osd.31 marked itself down
2017-05-12 16:05:15.095824 mon.0 172.31.1.31:6789/0 106038 : cluster [INF] osd.33 marked itself down
2017-05-12 16:05:15.195542 mon.0 172.31.1.31:6789/0 106040 : cluster [INF] osdmap e286: 36 osds: 25 up, 36 in
2017-05-12 16:05:15.287350 mon.0 172.31.1.31:6789/0 106042 : cluster [INF] osd.27 marked itself down
2017-05-12 16:05:16.186527 mon.0 172.31.1.31:6789/0 106043 : cluster [INF] osdmap e287: 36 osds: 24 up, 36 in
/var/log/ceph/ceph-osd.*.log
The ceph-osd.*.log file mainly records information about an OSD in the cluster. If an error occurs on a cluster OSD, the error reasons will be recorded in the ceph-osd.*.log file for that OSD, which can be used for troubleshooting.
The following example shows how to troubleshoot with a ceph-osd.*.log file when an OSD is abnormal (the UI reports an OSD error):
1. Use the ceph osd tree command in the CLI to identify the identifier of the abnormal OSD (see the example output after this list).
2. Access the /var/log/ceph/ceph-osd.*.log file for the OSD and identify the reason for the OSD exception.
¡ If the following log appears in the ceph-osd log file, the storage controller is damaged and the journal is corrupt.
2017-04-25 14:34:08.807146 7f5bf690a780 -1 journal Unable to read past sequence 301115833 but header indicates the journal has committed up through 301115842, journal is corrupt
¡ If the following logs appear in the ceph-osd log file, the OSD has committed suicide because of excessive pressure.
2017-03-09 11:46:01.576034 7f0878364700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f086fa6c700' had suicide timed out after 180
2017-03-09 11:46:01.576049 common/HeartbeatMap.cc: 81: FAILED assert(0 == "hit suicide timeout")
¡ If the following log appears in the ceph-osd log file, the OSD has been unmounted.
2017-04-27 19:46:18.280510 7fcfb954c700 5 filestore(/var/lib/ceph/osd/ceph-85) umount /var/lib/ceph/osd/ceph-85
¡ If the following logs appear in the ceph-osd log file, the data copies are inconsistent.
2016-10-22 06:49:23.854201 7fd2e860f700 -1 log_channel(cluster) log [ERR] : 1.ad shard 1: soid 819850ad/rbd_data.3b7055757a07.0000000000000ab1/7//1 data_digest 0xd7ac1812 != best guess data_digest 0x43d61c5d from auth shard 0
2016-10-22 06:49:23.854253 osd/osd_types.cc:4148:FAILED assert(clone_size.count(clone))
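For reference, the following is an illustrative sketch of the ceph osd tree output mentioned in step 1. The IDs, weights, and host names are examples, not output from a real cluster; an abnormal OSD is typically reported as down:
ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 3.00000 root default
-2 1.00000     host cvknode83
 0 1.00000         osd.0           up  1.00000          1.00000
 1 1.00000         osd.1         down        0          1.00000
In this example, osd.1 is the abnormal OSD, so you would examine /var/log/ceph/ceph-osd.1.log.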
/var/log/ceph/ceph-disk.log
The ceph-disk.log file mainly records information about OSD deployment and startup and is typically used in conjunction with the ceph-osd.*.log file to locate OSD related issues.
· If the following log appears in the ceph-disk log file, the system stopped mounting the OSD and exited the OSD mounting process because extra files exist in the /var/lib/ceph/osd/ceph-* directory. This issue typically occurs at the restart of the host. When the host restarts, all OSDs must be reactivated and mounted, and the mounting process checks whether files other than the heartbeat, osd_disk_info.ini, and osd_should_be_restart_flag files exist in the OSD directory. If other files exist in the directory, the OSD mounting process stops.
ceph-disk: Error: another ceph osd.71 already mounted in position (old/different cluster instance?); unmounting ours.
· If the following logs appear in the ceph-disk log file, the OSD has not been activated and cannot be mounted.
Fri. 07 Apr 2017 10:24:48 ceph-disk[line:2438] ERROR Failed to activate
Fri. 07 Apr 2017 10:24:48 ceph-disk[line:976] DEBUG Unmounting /var/lib/ceph/tmp/mnt.hD_6nh
/var/log/ceph/ceph-mon.*.log
The ceph-mon.*.log file mainly records information of a monitor node in the Ceph cluster. Monitor nodes are responsible for monitoring the cluster. If an error occurs on a monitor node, the error reason will be recorded in the ceph-mon.*.log file for that node, which can be used for troubleshooting.
To troubleshoot for a monitor node exception (the UI reports a monitor node anomaly):
1. Check the hostname of the abnormal monitor node on the host management page.
2. Access the /var/log/ceph/ceph-mon.*.log file for the host to check for the cause of the monitor node exception. If the following logs appear in the ceph-mon log file, the primary monitor node is abnormal (possibly because an exception occurred on its service network or its ceph-mon process was stopped), and the backup monitor nodes have triggered the election mechanism.
2017-05-08 19:24:58.017935 7fb173765700 1 mon.cvknode84@2(peon).paxos(paxos active c 24348..24883) lease_timeout -- calling new election
2017-05-08 19:24:58.024456 7fb172f64700 0 log_channel(cluster) log [INF] : mon.cvknode84 calling new monitor election
/var/log/calamari/calamari.log
The calamari.log file mainly records the operations on Handy.
If the following logs appear in the calamari.log file, the Handy node does not have network connectivity with the other nodes.
2017-05-08 15:08:29,060 - ERROR - onestor_common.py[network_check][line:494] - django.request <network_check> Host "172.16.105.84" is unreachable, retry again...
2017-05-08 15:08:29,060 - ERROR - onestor_common.py[execute][line:622] - django.request [ONEStor] onestor_request_all_node cvknode84:Host is unreachable
/var/log/onestor_cli/onestor_cli.log
The onestor_cli.log file records information about the process of collecting real-time logs on a node. It can be used to diagnose and troubleshoot any issues related to log collection.
· If the following log appears in the onestor_cli.log file, the size of the collected logs has exceeded 5 GB.
[2017-05-10 10:47:01,980][WARNING][monitor.py][line:157] We detect the current collecting log size is up to 5GB, ending collecting automatically!
· If the onestor_cli.log file disappears from a node, the log disk space on the node might be full.
Bimodal HCI logs
Bimodal HCI provides VMware VM lifecycle management and VMware VM agentless migration features.
1. The vmware-api-server service on the CVM host provides VMware VM lifecycle management. It stores related logs in the /var/log/vmware-api-server directory. If an exception occurs when you operate VMware VMs on the UIS, a log is generated in that directory to record the causes for the exception, which can be used for issue diagnosis.
For example, if a log as follows is generated, you can determine that the reason for failure to generate a snapshot is that the snapshot directory is too deep (which is limited by VMware):
[Vmware VM Request Processor Manager1] Trace[] UID[] c.h.h.u.s.v.handler.VmwareHandler - vmware vm "hdm2-snapshot" to generate a snapshot fail, cause:Snapshot hierarchy is too deep.
2. The vmware-agent service on the CVK host is responsible for migrating data from VMware. It stores related logs in the /var/log/vmware-agent directory. If a migration task fails or is interrupted unexpectedly on the UIS, you can view the logs in that directory.
¡ vmware-agent.log—Migration process logs. When an exception occurs during the migration process, the vmware-agent.log file will record the causes for the exception, which can be used for future issue diagnosis.
If the following log is output, a known VMware issue (https://kb.vmware.com/s/article/2035976) has been triggered:
2022-01-19 16:03:06 [ERROR] service.go:149 migrate failed, vcenter key: 172.20.67.6:443 vmref: vm-64 task 1955534340610146293 reason: {"code": 12002, "message": "Get QueryChangedDiskAreas failed. ", "error": "ServerFaultCode: Error caused by file /vmfs/volumes/61dd4ded-84b7a178-07ce-98f181b81b1c/ubuntu18041desktop/ubuntu18041desktop.vmdk"}
¡ vmware_vddk.log—VDDK operation logs. These logs record the operations related to connecting to vSphere and can assist in locating data transmission interruption during migration.
3. If an error of failed driver injection is reported on the UI during the VM migration process, you can check the relevant error logs to preliminarily locate the cause of the failure. The relevant error logs are saved in the /var/log/caslog/cas_xc_virtio_driver.log file.
4. If the UI still reports that castools is not running on the VM some time after the injection is completed, remount the ISO and install castools again.
5. If no errors are reported on the UI after the VM is migrated but you cannot access the desktop after the VM is powered on, a VM driver injection compatibility issue might exist. If this VM is in the compatible migrated VM list, contact Technical Support to locate the issue on site.
Distributed storage maintenance
Cluster issues
Rebalancing data placement when data imbalance occurs
ONEStor uses the CRUSH algorithm to automatically balance data across the object-based storage daemons (OSDs) in the cluster. Each OSD maps to a disk.
To rebalance data when occasional data imbalance occurs:
1. Execute the ceph osd df command and then identify the disk utilization of each OSD in the %USE field.
Figure 6 Identifying the disk utilization of each OSD
2. If the disk utilization of some OSDs is unusually higher than other OSDs, execute the ceph osd reweight-by-utilization command to rebalance data.
IMPORTANT: Data rebalancing is read and write intensive and might cause cluster performance to degrade. To minimize its impact on storage services, perform this operation at off-peak hours.
3. Verify that the system has finished the rebalancing operation successfully.
Execute the ceph -s command to monitor the cluster health state. When the cluster state changes to HEALTH_OK, you can determine that the system has finished the rebalancing operation.
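The procedure above can be summarized in the following command sketch. The reweight threshold (110) is an example; it means that OSDs above 110% of the average utilization are reweighted, and the default threshold of your version might differ:
ceph osd df                            # check the %USE field for unusually high OSD utilization
ceph osd reweight-by-utilization 110   # rebalance data away from over-utilized OSDs
ceph -s                                # repeat until the cluster state is HEALTH_OK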
Method to accelerate data rebalancing when the cluster is in an idle state
When the cluster is in an idle state, you can accelerate data rebalancing, as follows:
1. Log in to UIS Manager.
2. On the top navigation bar, click Storage, and then select Disk Pool Management from the left navigation pane.
3. Select the disk pool on which data rebalancing is to be performed, and then click Edit.
4. In the dialog box that opens, change the restore speed from self-adaptive to reconstruction first.
Node issues
Resolving host issues caused by a full system disk
A host might malfunction when the usage of its system disk reaches 100%. For example, Apache processes and the ceph-mon daemon might fail to start, resulting in issues such as the mon down error and inability to log in to the management node.
The system disk might get full for the following reasons:
· Too many large files and log files are present.
· The fio tester stores a large test0.0 file on the system disk. This issue occurs if you run fio without specifying the --filename option.
To free up disk space:
1. Execute the df -h command on the host to identify its system disk usage. The following is sample output:
root@cvknode86:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 28G 4.0G 23G 16% /
If the Use% field displays that the disk usage has reached 100%, proceed to remove unused files.
2. Remove unused large files or log files:
a. Access the /var/log directory and other directories that might contain large files or unused files.
b. Execute the du -h --max-depth=1 command to view the size of each folder in the directory.
c. Delete unused files.
3. Remove the test data file generated by fio:
a. Execute the echo "" > filename command to truncate the file and release its space immediately, even if a process still holds the file open.
b. Execute the rm -rf filename command to delete the test data file, as shown in the sketch below.
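The following is a minimal sketch of this cleanup, assuming the fio test file is /test0.0 (the actual file name and location depend on how fio was run):
df -h               # confirm that the system disk is full
echo "" > /test0.0  # truncate the file to release its space immediately
rm -rf /test0.0     # delete the test data file
df -h               # verify that the space has been released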
Issues caused by network failure
Handling failures to add or delete hosts
Adding or deleting a host, or the disks on a host, fails if a network failure occurs before the system finishes the operation. The system then displays a failure message indicating that it failed to delete the host because of a management network failure.
The solution to these issues differs depending on the timing of the network failure.
Network failure occurs before the system starts deleting disks
If network failure occurs before the system starts deleting disks, you only need to select the target host from the webpage and perform the operation again after the system regains network connectivity to the host.
If the connectivity to the host cannot be restored in extreme cases, for example, because the host's operating system is damaged, select the host from the webpage to delete it offline. However, data on the host's disks will remain. You must take action to handle residual data.
Network failure occurs before the system finishes deleting all disks
See "Network failure occurs before the system starts deleting disks."
Network failure occurs during disk formatting after all the disks are deleted from the cluster
The host will be invisible on the management webpages after the system deletes all its disks from the cluster and proceeds to disk formatting. If network failure occurs before the system finishes formatting all the disks, the data and Ceph partitions on the unformatted disks will remain. After the host restarts, the unformatted disks will be automatically mounted to the operating system. UIS Manager will be unable to discover these disks when the host is re-added to the cluster.
To resolve these issues, execute the umount command to manually unmount the residual disks before you add the host back to the cluster.
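A minimal sketch of this cleanup; the OSD directory name is an example:
mount | grep /var/lib/ceph/osd    # list residual OSD mounts
umount /var/lib/ceph/osd/ceph-3   # unmount each residual OSD directory
lsblk                             # verify that the residual partitions are no longer mounted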
Deleting a monitor node offline and restoring the node
You delete a monitor node offline from the cluster on the webpage only if the network connectivity to its host cannot be restored. This operation directly removes the node from the cluster.
To minimize the impact of the operation on the cluster:
1. Remove all roles of the host in the cluster.
2. Destroy the cluster data on the host.
CAUTION: Destroying the cluster data on a host will result in loss of all cluster data on that host. Be sure that the node is no longer in use when you perform the operation.
These operations ensure that you can add the host back to the cluster as a storage, monitor, or backup management node for management high availability.
Deleting a storage node offline and restoring the node
You delete a storage node offline from the cluster on the webpage only if the network connectivity to its host cannot be restored. This operation directly removes the node from the cluster.
Before you delete a storage node offline, verify the following items:
1. Verify that no abnormal placement groups (PGs) are present for the disk pool that contains the storage node.
CAUTION: If abnormal PGs are present, data rebalancing might be in progress. To avoid loss of data, do not delete the node at this time.
2. Verify that the disk pool is in healthy state.
Then, you can safely delete the node.
To minimize the impact of the operation on the cluster:
1. Remove all roles of the host in the cluster.
2. Destroy the cluster data on the host.
CAUTION: Destroying the cluster data on a host will result in loss of all cluster data on that host. Be sure that the node is no longer in use when you perform the operation.
These operations ensure that you can add the host back to the cluster as a storage, monitor, or backup management node for management high availability.
Disk issues
Missing or changing sdX device names due to host restart
When you remove a disk, the state of its corresponding logical drive on the RAID controller changes from OK to Failed. Normally, the sdX drive letters stay unchanged after you re-insert or replace the disk and restore its logical drive to the OK state. However, if the host restarts while the logical drive is in Failed state, the disk will not be visible in the operating system. If you execute the lsblk or fdisk command to view disks on the host, you will notice that the disk is missing.
For example, the lsblk command displays that the host has disks sda, sdb, sdc, sdd, and sde when it is executed before a disk removal operation. ONEStor shows that the sdd disk is abnormal. The output from the hpssacli controller all show config command shows that the logical drive for sdd is in Failed state, as shown in the following figures:
If the host accidentally restarts before the logical drive for sdd returns to the OK state, the sdd disk becomes invisible in the system, and the device name of each subsequent disk shifts by one letter. In this example, the disk originally identified as sde is renamed sdd. Even after the logical drive returns to the OK state, the lost disk is still not visible.
To resolve this issue:
1. Delete the logical drive that was originally in Failed state, regardless of whether its current state is Failed or OK:
hpssacli ctrl slot=0 logicaldrive 4 delete forced
2. Execute the hpssacli controller all show config command to identify the unassigned physical drive displayed at the end of the output.
3. Recreate the logical drive.
hpssacli ctrl slot=0 create type=ld drives=2I:2:3 raid=0
4. Execute the lsblk command to verify that the new disk has been added to the end of the storage device list. In this example, the disk is named sde.
5. Remount the /dev/sde1 disk partition at the original OSD directory.
mount /dev/sde1 /var/lib/ceph/osd/ceph-4
6. If the ONEStor management system still shows that sde is abnormal, delete it and then add it again. The disk will be available for use.
Identifying the data partitions and journal partitions (for write caching) to which the OSDs are mounted
The following sample output shows that OSDs have been mounted:
The following sample output shows that no OSDs have been mounted:
You must identify the mapping between an OSD and its disk based on the partition UUID (partuuid) in the following situations:
· Remounting an OSD that was unmounted because of a disk issue.
· Identifying the journal partition (for write caching) of an OSD.
To identify the partuuid of the data partition for an OSD, view the content of the fsid file in the OSD directory for that OSD, for example:
cat /var/lib/ceph/osd/ceph-8/fsid
d6d97f59-171e-46f7-9759-8037c7209bf1
To identify the partuuid of the journal partition for an OSD, view the content of the journal_uuid file in the OSD directory for that OSD, for example:
cat /var/lib/ceph/osd/ceph-8/journal_uuid
1f8b0b99-69c6-404a-acfe-186f435fd877
To identify the partuuid values of all partitions on the host, execute the following command:
ll /dev/disk/by-partuuid/
lrwxrwxrwx 1 root root 10 Dec 6 19:55 1f8b0b99-69c6-404a-acfe-186f435fd877 -> ../../sdf1
lrwxrwxrwx 1 root root 10 Dec 6 19:55 260c435a-2c35-4562-979d-7a3d641dda48 -> ../../sdf2
The sample output shows the partuuid values of SSD write caches sdf1 and sdf2.
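Putting these steps together, the following hedged shell sketch prints the journal device for each mounted OSD on a host. It assumes the default paths shown above:
for osd in /var/lib/ceph/osd/ceph-*; do
    juuid=$(cat "$osd/journal_uuid" 2>/dev/null) || continue   # skip OSD directories without a journal_uuid file
    echo "$osd -> $(readlink -f /dev/disk/by-partuuid/$juuid)" # resolve the partuuid to the sdX device
done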
OSD cannot be deleted when a disk is replaced before its OSD is deleted from UIS Manager
If you replace a faulty disk prior to deleting its OSD from UIS Manager, Handy adds a new disk and OSD mapping for the replacement disk. When you attempt to delete the original OSD, you will receive a no data found message and the deletion attempt will fail.
To resolve this issue:
1. Execute the lsblk command to verify that no disk has been mounted at the old OSD node. If a disk is still mounted at that OSD node, unmount it first.
Mount status:
Unmount status:
2. Execute the ps -ef | grep osd command to check whether the old OSD daemon has stopped.
3. Execute the following commands to stop the OSD daemon and remove the OSD from the cluster. Replace x in these commands with the OSD ID.
CAUTION: These commands will erase user data. Make sure you fully understand their impact on services before you use them. If you are not sure of their impact, contact H3C Support.
stop ceph-osd id=x
ceph osd out osd.x
ceph osd crush remove osd.x
ceph auth del osd.x
ceph osd rm osd.x
4. Execute the ceph osd tree command to verify that the OSD has been removed from the cluster.
5. Log in to UIS Manager to verify that the failed disk has been deleted.
Replacing disks
For information about disk replacement, see H3C UIS Hyper-Converged Infrastructure Component Replacement Configuration Guide.
Failure to display O&M and monitoring data
Failure to display O&M and monitoring data (1)
Symptom
· When you obtain the O&M and monitoring data, the page shows that the data failed to be obtained.
· When you obtain the log information of the Prometheus database, the opening storage failed error message is generated.
In addition, check whether the prometheus-cluster-stderr---xxxxx.log file has anomalies.
Solution
To resolve this issue:
1. Delete exceptional WAL files.
a. Access the /opt/h3c/var/lib/prometheus_node/data/wal directory and identify whether the file numbers are sequential. The following figure shows two continuous sequences: 000001, 000002, 000003, and 000006, 000007.
b. Delete the sequence with the smaller numbers. If the Prometheus-cluster-stderr---xxx.log file also has anomalies, perform the same steps in the /opt/h3c/var/lib/prometheus_cluster/data/wal directory.
2. Restart the Prometheus processes.
¡ If the prometheus-node-stderr---xxxx.log file has anomalies, restart the prometheus-node process:
# supervisorctl restart prometheus-node
¡ If the prometheus-cluster-stderr---xxxx.log file has anomalies, restart the prometheus-cluster process:
# supervisorctl restart prometheus-cluster
Failure to display O&M and monitoring data (2)
Symptom
· When you obtain the O&M and monitoring data, the page shows that the data failed to be obtained or no monitoring report data is available.
· When you view information about Prometheus related processes, the prometheus-node or prometheus-cluster process is repeatedly rebooted:
# supervisorctl status prometheus-node
# supervisorctl status prometheus-cluster
· When you obtain the log information of the Prometheus database, the opening storage failed: invalid block sequence: block time ranges overlap error message is generated. For example:
level=error ts=2023-10-26T19:42:10.042Z caller=main.go:731 err="opening storage failed: invalid block sequence: block time ranges overlap:
In addition, check whether the prometheus-cluster-stderr---xxxxx.log file has anomalies.
Solution
To resolve this issue:
1. Delete the data in the data directory.
¡ For the prometheus-node process that is running on all nodes in the cluster:
# mkdir prometheus_node_bak
# cp -rf /opt/h3c/var/lib/prometheus_node/data/* prometheus_node_bak
# rm -rf /opt/h3c/var/lib/prometheus_node/data/*
¡ For the prometheus-cluster process that runs only on the primary and backup Handy nodes:
# mkdir prometheus_cluster_bak
# cp -rf /opt/h3c/var/lib/prometheus_cluster/data/* prometheus_cluster_bak
# rm -rf /opt/h3c/var/lib/prometheus_cluster/data/*
CAUTION: This step also deletes historical monitoring data. Back up the data as needed before performing this step.
2. Restart the Prometheus processes.
¡ If the prometheus-node-stderr---xxxx.log file has anomalies, restart the prometheus-node process:
# supervisorctl restart prometheus-node
¡ If the prometheus-cluster-stderr---xxxx.log file has anomalies, restart the prometheus-cluster process:
# supervisorctl restart prometheus-cluster
Troubleshooting
Cluster initialization issues
Host scan failure
Symptom
A host cannot be discovered during cluster setup.
Solution
To resolve this issue:
· Check the network configuration as follows:
a. Verify that the management interface of the target host is in the same LAN as the management interface of the management node.
b. Verify that link aggregation is correctly configured on the switch interfaces connected to the management interface of the target host.
- If static link aggregation is configured, shut down one of the switch interfaces. After host scan is finished, bring up that interface.
- If dynamic link aggregation is configured, configure the host-facing aggregate interface as an edge aggregate interface by using the lacp edge-port command.
· Check for cluster initialization failure as follows:
a. Log in to each CVK host.
b. Access the /etc/cvk path and delete the cvm_info file (if it exists) by using the following command.
rm -rf cvm_info
c. Access the /root/.ssh path and delete the mhost file (if it exists) by using the following command.
rm -rf mhost
· Log in to the target host, access the /root/.ssh path, and delete the isCvmFlag file by using the following command. This file indicates that the host has acted as a management host.
rm -rf isCvmFlag
Compute cluster creation failure
Symptom
Creation of a compute cluster fails.
Solution
To resolve this issue, verify that each host can reach the management, storage front-end, and storage back-end networks.
Storage configuration failure
Symptom
Storage configuration fails.
Solution
To resolve this issue:
1. If UIS fails to discover all disks or a designated disk, perform the following tasks:
a. Log in to the affected host and execute the parted /dev/sdX rm <partition number> command to delete all partitions from an undiscovered disk, where sdX represents the drive letter.
b. Verify that the RAID controllers are included in the H3C CAS&UIS Server Virtualization Software and Hardware Compatibility Matrix.
2. If the distributed storage service is incorrectly installed on the management node, perform the following tasks:
a. Run the /opt/bin/uis_onestor_handy_install.sh script to reinstall ONEStor.
b. If an error is reported, contact Technical Support.
3. If device management is not supported by a server or RAID controller, perform the following tasks:
a. Modify the configuration on the handy node:
- For software versions earlier than UIS 0716, execute the sed -i 's/\$result/false/g' /opt/h3c/sbin/check_raid_support command to modify the check_raid_support script. Then, execute the check_raid_support command and verify that false is output.
- For software versions later than UIS 0716, open the /opt/h3c/sbin/devmgr_check_dev_type script, and then add a return False statement to the check_raid_card() function.
b. Execute the devmgr_check_dev_type command and verify that the value of for_DM_ONEstor is False.
Cluster state
Health index lower than 100%
Symptom
The health index for a cluster is lower than 100%.
Solution
To resolve this issue:
1. Troubleshoot node failure or network disconnection issues as follows:
a. Log in to UIS, resolve alarms, and verify that the status of hosts is normal.
b. Log in to the command line of the management node, and verify connectivity to the hosts in the cluster by using ping operations.
2. Troubleshoot disk failure or RAID controller failure as follows:
a. Log in to UIS, and resolve the alarms generated for disk failure or RAID controller failure.
b. Log in to HDM, and resolve hardware alarms.
3. Check whether storage nodes are under maintenance or data balancing is in progress as follows:
a. Log in to UIS, and check whether storage nodes are under maintenance and whether data balancing is enabled.
b. Log in to the command line of the management node, and check whether data balancing is in progress.
Host deletion
Deletion failure prompt for successful host deletion
Symptom
The system displays a deletion failure prompt when a host is deleted successfully.
Solution
To resolve this issue:
1. Execute the lsblk command on the deleted host and check for OSDs that are still mounted.
2. Check whether your current working directory is inside an OSD's directory.
3. Execute the cd command to exit the OSD's directory, and then execute the umount /var/lib/ceph/osd/ceph-11 command.
4. Execute the sgdisk --zap-all /dev/sdf command to clear the partitions.
Disk issues
No available disk
Symptom
No disks are available.
Solution
To resolve this issue:
1. Verify that the OSDs on the affected host have been used by the Ceph cluster:
a. Execute the lsblk command to view partitions on the target disk.
b. Execute the gdisk -l /dev/sdX command (where sdX represents the drive letter) to check for the ceph tag.
2. If the target disk is not in use, execute the ceph-disk zap /dev/sdX command to clear residual data on the disk, and then add the disk again.
3. Clear partitions from the Web interface if you are using the most recent UIS version.
4. If UIS still cannot discover the disk, execute the ceph-disk zap /dev/sdX command again.
5. Verify that the state of device management is consistent across the cluster nodes. For example, if the handy node does not support device management, a target node for expansion must also not support device management. To disable device management on the handy node:
¡ For software versions earlier than UIS 0716, execute the sed -i 's/\$result/false/g' /opt/h3c/sbin/check_raid_support command to modify the check_raid_support script. Then, execute the check_raid_support command and verify that false is output.
¡ For software versions later than UIS 0716, open the /opt/h3c/sbin/devmgr_check_dev_type script, and then add a return False statement to the check_raid_card() function.
6. Execute the devmgr_check_dev_type command and verify that the value of for_DM_ONEstor is False.
Cluster alarms
Down monitor node
Symptom
A monitor node is down.
Solution
To resolve this issue:
1. On the top navigation bar, click Storage, and then select Storage Management > Node Management > Monitor Nodes from the left navigation pane.
2. If the down monitor node is powered off or shut down, start it up. Then, verify network connectivity between the cluster and the monitor node.
Figure 7 Verifying the monitor node state
Down OSD
Symptom
An OSD is down.
Solution
To resolve this issue:
1. Verify that the storage node where the down OSD resides is not powered off or shut down and it does not have network connectivity issues.
a. On the top navigation bar, click Storage, and then select Storage Management > Node Management > Storage Nodes from the left navigation pane.
b. If the storage node where a down OSD resides is powered off or shut down (no data is displayed for the storage node), start the storage node up. Then, verify network connectivity between the cluster and the storage node.
Figure 8 Verifying the storage node state
OSD process terminated unexpectedly
Symptom
An OSD process is terminated unexpectedly on a storage node.
Solution
To resolve this issue:
1. On the top navigation bar, click Storage, and then select Storage Management > Node Management > Storage Nodes from the left navigation pane.
2. Verify that the disks on the storage node are in normal state.
3. Log in to the host acting as the storage node through SSH from the management network, and execute the ceph osd tree command to view the status of all OSDs.
4. Execute the ps -ef | grep ceph-osd command to check the status of the osd processes.
5. If an osd process is not running, execute the systemctl start ceph-osd@<OSD ID>.service command to start it.
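The following sketch summarizes steps 3 through 5; the OSD ID (12) is an example:
ceph osd tree                        # identify which OSDs are down
ps -ef | grep ceph-osd               # check whether the corresponding osd process is running
systemctl start ceph-osd@12.service  # start the process for osd.12
ceph osd tree                        # verify that the OSD is back up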
OSD soft link loss
Symptom
The OSD soft link for a disk is lost.
Solution
To resolve this issue:
1. Execute the lsblk command to view the OSD directory of the down disk.
2. Access the OSD directory by executing the following command:
cd /var/lib/ceph/osd/ceph-4
3. Enter ll to check whether the soft link exists. If the soft link exists, the journal file line contains the UUID of the disk.
4. If the soft link does not exist, execute the following command:
ceph-disk activate-all
Loose or faulty disk
Symptom
The OSD process of a disk is down, which indicates that the disk is loose or faulty.
Solution
To resolve this issue:
1. Examine the disk status LEDs of the affected server to locate the disk.
2. Replace the disk.
Abnormal PG state
Symptom
PGs are degraded, stale, stuck unclean, or undersized.
Solution
If no other alarms are generated for the abnormal PGs, data migration is in process. The PGs will recover automatically.
Cache alarm
Symptom
Physical cache alarms or logical cache alarms are generated for the following reasons:
· RAID is manually configured and the state of caches is incorrectly set during system deployment.
· Faults occur during operation of the cluster. For example, a battery fault for a RAID controller might cause logical cache errors.
Solution
To resolve this issue:
1. On the top right of the page, click Hot Key, and then select Health Check.
2. Select Physical Disk State and Logical Disk State, and then click Start.
Figure 9 Performing health check
3. Click Failure in the Cache State column for a faulty disk.
Figure 10 Disk with faulty caches
4. Fix the caches of the disk according to the remediation.
Figure 11 Remediation
Host failure
UIS management node failure
Symptom
The management node cannot recover from failure.
Solution
To resolve this issue:
1. Install UIS Manager on a backup server.
2. Access UIS Manager as a system administrator.
3. On the top navigation bar, select System, and then select Data Backup from the left navigation pane.
4. On the Data Backup tab, configure the backup settings for accessing the backup files, and then click Connectivity.
Figure 12 Configuring data backup access
5. If the test succeeds, click Save. If the test fails, check the backup settings for misconfiguration.
UIS Manager automatically obtains backup files from the backup directory.
6. Click the Backup History tab.
Figure 13 Backup history
7. Select the target backup file, and then click its Restore UIS Data icon.
Figure 14 Restoring UIS data
8. In the dialog box that opens, click Yes.
9. Clear the browser cache, and then log in to UIS Manager again.
IMPORTANT: The two system disks back up each other. The system can still operate correctly if one of the system disks fails. However, the system cannot be restored if both system disks fail. If one of the system disks fails, replace it promptly.
Stateful failover
Quorum node failure
Symptom
The quorum node fails.
Solution
To recover the quorum node, contact Technical Support.
Monitoring node failure
Down monitoring node due to high system disk usage
Symptom
A monitoring node goes down because the system disk usage is high. The mon process exits or cannot start if the system disk usage exceeds 95%. A low disk space alarm is generated if the system disk usage exceeds 70%.
To identify this symptom:
1. Execute the following command to check whether the mon process exists.
ps -ef | grep ceph-mon
2. If the mon process is not running, execute the df -h command to view the system disk usage.
root@cvknode1:~# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 10G 9.6G 0.4G 96% /
udev 863M 12K 863M 1% /dev
tmpfs 349M 348K 349M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 873M 4.0K 873M 1% /run/shm
3. Check the status of the mon process by executing the ps aux | grep ceph-mon command.
root@cvknode20216:~/515# ps aux | grep ceph-mon
root 2619507 0.0 0.1 8112 2136 pts/3 S+ 17:47 0:00 grep --color=auto ceph-mon
Solution
To resolve this issue, release system disk space and then start the mon process, for example, by executing the service ceph-mon@<node name> start command. The service name differs between nodes.
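A minimal sketch of this recovery, assuming the node name is cvknode1 and that old files under /var/log are the space consumers (adapt the paths to what du reports):
df -h                            # confirm the system disk usage
du -h --max-depth=1 /var/log     # locate large directories and delete unused files
service ceph-mon@cvknode1 start  # start the mon process (the service name differs between nodes)
ps -ef | grep ceph-mon           # verify that the mon process is running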
Down monitoring node due to network error
Symptom
A monitoring node goes down because of a network error.
To identify this symptom:
1. Verify that the mon process is running.
2. Verify that the monitoring nodes can ping one another.
3. Execute the arp -a and ifconfig commands to verify that the ARP table of the down monitoring node is correct.
Solution
To resolve this issue, troubleshoot the network error and start the mon process.
Extent backup file
Extent backup state
To verify that extent backup is enabled, execute the following command:
cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=""
# For details see man 4 crontabs
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed
0 22 * * 5 root python /opt/bin/ocfs2_pool_fstrim.pyc -s onestor
1 2 * * * root /opt/bin/cas_clean_log.sh
*/1 * * * * root python /opt/bin/uis_host_network_probe.pyc
*/5 * * * * root flock -xn /tmp/util_memory_dropcaches.sh.lock -c "/opt/bin/util_memory_dropcaches.sh"
*/3 * * * * root /opt/bin/check_abrt_memory.sh
* * * * * root /opt/bin/ocfs2_iscsi_conf_chg_timer.sh
*/10 * * * * root python /opt/bin/ocfs2_cluster_config.pyc -s
0 */12 * * * root python /opt/bin/ocfs2_filesystem_layout_backup.pyc
* * * * * root /opt/bin/tomcat_check.sh
*/10 * * * * root /opt/bin/ntp_mon.sh
* * * * * root /opt/bin/tomcat_check.sh
Extent backup directory
To locate an extent backup file in the extent backup directory, access the /vms/.ocfs2_extent_backup directory, and search by the file names for the target .lzo file.
In the following example, defaultPool_hdd is the storage pool, and the file name contains a timestamp.
ll -a /vms/.ocfs2_extent_backup/defaultPool_hdd/normal/
-rw-r--r-- 1 root root 176 Dec 24 00:00 .8257798_root_zhanji_1_202012240000.lzo
Therefore, the path of the most recent extent backup file is as follows:
/vms/.ocfs2_extent_backup/defaultPool_hdd/normal/.8257798_root_zhanji_1_202012240000.lzo.
Extent backup file decompression
To decompress an extent backup file, first copy it to another directory, for example, /home.
cp /vms/.ocfs2_extent_backup/defaultPool_hdd/normal/.8257798_root_zhanji_1_202012240000.lzo /home
cd /home
lzop -dv .8257798_root_zhanji_1_202012240000.lzo
Script for data restoration
To run the script for restoring data from an extent backup file, execute the following command:
python /opt/bin/ocfs2_restore_utils.pyc dd /dev/dm-0 /home/.8257798_root_zhanji_1_202012240000 /vms/hw235-1/8257798_root_zhanji_1_202012240000_new
The parameters in the script are as follows:
· /dev/dm-0—Drive letter of the shared storage that saves the extent backup file. To check the drive letter of shared storage, execute the fsmcli command.
fsmcli showpool --name defaultPool_hdd
…
device name: /dev/dm-0
device path: /dev/disk/by-id/dm-name-360000000000000000e0000003b75836c
device naa: 360000000000000000e0000003b75836c
· /home/.8257798_root_zhanji_1_202012240000—Decompressed extent backup file.
· /vms/hw235-1—Path on newly created shared storage or local storage to save the restored file. Make sure the target path has enough space. Do not save the restored file to the original shared storage.
· 8257798_root_zhanji_1_202012240000_new—Name of the restored file. This name must be different from the name of the original file.
Shared storage space reclamation
Releasing space of a shared volume by editing the VM bus type
1. Execute the df -h command to check the available space of the target shared volume.
2. Log in to the VM with the shared volume attached and check the drive letter and mount path of the data disk provided by the shared volume.
3. Log in to UIS, shut down the VM, and delete the data disk.
Figure 15 Editing the VM
4. Mount the data disk to the VM again by adding hardware, and select the high-speed SCSI bus type.
Figure 16 Mounting the data disk
5. Log in to the VM, and mount the data disk again with the new drive letter.
mount /dev/sda /vms/ruitest
6. Execute the fstrim /vms/ruitest command to release space.
7. Log in to the host where the VM resides and verify that the available space of the shared volume has increased.
Releasing space of a shared volume by deleting files
1. Mount a data disk whose bus type is high-speed SCSI to a VM by using the following command:
mount -o discard /dev/sda /vms/ruitest
2. Verify that the discard option is specified in the mount command.
3. Log in to the host where the VM resides and check the available space of the shared volume.
4. Delete large files from the shared volume and verify that the available space of the shared volume has increased.
SNMP
Get responses not received by an NMS
Symptom 1
An NMS cannot receive get responses because the destination port for get responses is in use.
Solution 1
To resolve this issue:
1. Execute the netstat -apn | grep <destination port> command to obtain the ID of the process that occupies the destination port.
2. Execute the ps aux | grep <process ID> command to check the processes that occupy the destination port.
3. If processes other than the snmp-get-responder process occupy the destination port, terminate those processes or kill them by using the kill <process ID> command.
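A hedged walkthrough of these steps; the destination port (20001) and PID (12345) are illustrative:
netstat -apn | grep 20001   # find the PID of the process that occupies the destination port
ps aux | grep 12345         # identify that process
kill 12345                  # terminate it if it is not the snmp-get-responder process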
Symptom 2
An incorrect OID is configured for SNMPv1 get responses on an NMS.
Solution 2
To resolve this issue:
1. Log in to the leader storage node and execute the snmpget -v1 -c $community $ip:$port $oid command.
¡ $community—Community name. To ignore this configuration, enter public.
¡ $ip—Storage-end IP address.
¡ $port—Destination port for get responses.
¡ $oid—OID configured on the NMS.
If the following error message is output, the OID on the NMS is incorrect.
2. Modify the OID, and verify that the oid=string information is output.
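For reference, a hedged example of the verification command; the community name, IP address, port, and OID are illustrative values:
snmpget -v1 -c public 172.16.105.83:161 1.3.6.1.4.1.25506.1.7.1.2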
Symptom 3
An incorrect OID is configured for SNMPv2c or SNMPv3 get responses on an NMS.
The storage supports the following OID ranges:
· 1.3.6.1.4.1.25506.1.7.1.2
· 1.3.6.1.4.1.25506.1.7.1.9
· 1.3.6.1.4.1.25506.1.7.1.10
· 1.3.6.1.4.1.25506.1.7.1.12
· 1.3.6.1.4.1.25506.1.7.1.13
On the NMS, a number in the range of 0 to 2147483647 is added to the end of an OID.
Solution 3
To resolve this issue:
1. Check the /var/log/onestor/snmp_get_responder.log file.
2. If the NoSuchObjectError error exists, the OID is not among the OIDs supported by the storage, and the OID does not exist in the MIB. Verify that the OID does not exceed the valid length.
3. If the NoAccessError error exists, the OID is not among the OIDs supported by the storage. The OID exists in the MIB, but the node does not have read or write permission. Verify that the OID is not shorter than the valid length.
4. If the ValueConstraintError error exists, make sure that the last number of the OID is in the range of 0 to 2147483647.
5. After you correct the OID, verify that the Success to write the vars log message is generated.
Value-added services
Data of a value-added service in the memory is different from that in the database
Analysis
This issue occurs if the handy node fails. Upon such a system event, a value-added service fails to update its data in the database, which causes data inconsistency between the memory and the database.
Solution
The solution varies by value-added service as follows:
· For the volume migration service, delete the inconsistent migration pairs, and then create migration pairs as needed.
· For the volume copy service, stop the inconsistent copy tasks, and then start copy tasks as needed.
Data inconsistency occurs if you mount a volume on a Windows client and create a snapshot online
Analysis
The product provides the storage-side snapshot function. When the system creates a snapshot, the host side might cache data. The hang IO service is used to implement data synchronization at multiple time points. This ensures that data is flushed to the data buffer on the host side at the time when a snapshot is created. Therefore, if the Windows client performs data caching at the time when a snapshot is created, data of the snapshot might be different from the real data.
Solution
As a best practice to avoid this issue, use an agent on the host side to perform data caching and flush data to the data buffer upon snapshot creation. However, no such agent exists at present. Alternatively, you can take snapshots offline.
If you mount multiple snapshots of a volume on a Windows client at the same time, you are prompted that some snapshots are not initialized or assigned
Analysis
This issue might occur if you map a volume and its snapshots to the same host at the same time. The operating system of that host might recognize the source volume and its snapshots as the same volume, due to the volume recognition mechanism used by the operating system. For example, in the Oracle ASM scenario, a host identifies different volumes by ASM disk header information. This error will result in data corruption of the source volume and its snapshots.
Solution
Do not map a volume and its snapshots to the same host at the same time.
If you take a snapshot for a volume, delete its host mapping on the handy page without disk scanning or iSCSI disconnection, and restore the snapshot, the restored data is different from the original data
Analysis
When the volume is unmapped from the host on the storage side, the host side is not aware of this event and still has data cache. If you restore data from the volume snapshot and mount the restored volume to the host again, data cache of the host will overwrite data of the restored volume.
Solution
Perform one of the following tasks before restoring data from the volume snapshot:
· Unmap the source volume from the host and perform disk scanning.
· Tear down the iSCSI connection.
If you create a read-only snapshot for a volume that is mounted by a directory, the snapshot cannot be mounted and the system prompts a wrong fs type message
Analysis
When you mount a volume on a Linux client, the new file system might not be flushed to the data buffer due to data caching. In this situation, if you take a snapshot for the mounted volume, the snapshotted file system is incomplete. Errors will occur if you mount the snapshot later.
Solution
Unmount the volume from the Linux client before snapshot creation.
The state of a snapshot is Creating, Deleting, or Restoring
Analysis
This issue might occur if the following conditions exist:
1. The system has an exception and thus fails to create, delete, or restore a snapshot.
2. The system cannot roll back its system records.
Solution
· For snapshots in Creating or Deleting state, manually delete the residual records generated for those snapshots.
· For snapshots in Restoring state, restore those snapshots again.
Compatibility
When the Intel ixgbe network adapter is enabled with load balancing, storage access becomes slow
To avoid this issue, perform the following tasks:
1. Use the ethtool -i eth0 command to check whether the driver is ixgbe.
2. Use the ethtool -k eth0 command to check whether the large-receive-offload (LRO) service is disabled.
3. If the LRO service is enabled, use the ethtool -K eth0 lro off command to disable it.
To ensure that the LRO service stays disabled after startup, add the ethtool -K eth0 lro off command to the /etc/rc.local file.
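A minimal sketch of the check-and-disable sequence, assuming the interface name is eth0:
ethtool -i eth0                                  # verify that the driver is ixgbe
ethtool -k eth0 | grep large-receive-offload     # check the current LRO state
ethtool -K eth0 lro off                          # disable LRO
echo "ethtool -K eth0 lro off" >> /etc/rc.local  # keep LRO disabled across reboots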
Using a QoS policy with low bandwidth and IOPS limits makes the storage disks of a client slow
Analysis
The I/O of a client might drop to 0 if the following conditions exist:
· The client uses multiple storage disks and a QoS policy with low bandwidth and IOPS limits is applied to those disks.
· Each used storage disk has high I/O concurrency. For more information about I/O concurrency, see the configuration file in method 2.
If Number of storage disks × Number of I/O concurrencies per storage disk is greater than the number of concurrencies on the iSCSI initiator, those storage disks have high concurrency.
Solution
To resolve this issue, use one of the following methods:
· Method 1: Distribute the service load if the service load is heavy on a single client.
¡ If only one client is available and you must deploy multiple storage disks on the client, install the multipathing service on the client and configure multiple iSCSI connections.
¡ If you can use multiple clients, distribute storage disks across different clients.
· Method 2: Increase the I/O limit on the iSCSI initiator.
a. Open the iSCSI initiator configuration file on the client. The default path is /etc/iscsi/iscsid.conf.
b. Find the session and device queue depth area in the configuration file, and then increase the value to the maximum (2048) for the node.session.cmds_max parameter.
Figure 17 Original I/O limit
Figure 18 New I/O limit
c. After the modification, restart the iSCSI initiator.
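The relevant line in /etc/iscsi/iscsid.conf looks as follows (a sketch; the default value differs by distribution, and the restart command might be, for example, systemctl restart iscsid):
node.session.cmds_max = 2048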
Failure to recognize an encryption dongle by VMs
To add an encryption dongle to a VM, make sure the dongle supports USB over network.
If an issue persists, contact Technical Support.
After a USB device is plugged into a CVK host, the host cannot recognize the USB device
Symptom
After a USB device is plugged into a CVK host, you cannot find the USB device when you attempt to add a USB device on the Web management page of UIS.
Analysis
Troubleshoot this issue as follows:
1. This issue occurs if the USB device is plugged into an incorrect slot. Plug the USB device into another slot, for example, a USB slot inside the server. If the server has multiple types of USB slots, make sure the USB device is plugged into the matching slot.
To check whether a USB device is plugged into the correct slot, use the lsusb -t command. The following is an output example:
root@cvk-163:~# lsusb -t
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/15p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/8p, 480M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/6p, 480M
In the command output:
¡ UHCI represents USB 1.1. The maximum data transfer speed of USB 1.1 is 12 Mbps.
¡ EHCI represents USB 2.0. The maximum data transfer speed of USB 2.0 is 480 Mbps.
¡ XHCI represents USB 3.0. The maximum data transfer speed of USB 3.0 is 5 Gbps.
If the server supports multiple USB standards and you plug a USB 2.0 device into the correct slot, the device appears on the USB 2.0 (ehci-pci) bus.
At present, USB 3.0, 2.0, and 1.0 are supported. Although you can plug a lower-version USB device into a higher-version USB slot, USB device incompatibility issues might occur. For example, when you plug a USB 1.0 device into a server that has only USB 3.0 slots, disable USB 3.0 in the BIOS of that server to avoid incompatibility issues.
If the host still cannot recognize the USB device, proceed to the next step.
2. On the command shell of the CVK host, use the lsusb command before and after you plug the USB device into the host. Compare the outputs to identify whether a new USB device is added. The following is an output example:
root@CVK:~# lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 002: ID 03f0:7029 Hewlett-Packard
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
If no new USB device is added, the Ubuntu operating system cannot recognize the USB device. In this situation, the USB device might be faulty, because operating systems with the Linux kernel support most USB devices on the market. To check whether the USB device operates correctly, plug it into an office PC. If the USB device operates correctly on the PC, it is normal; proceed to the next step.
3. Check whether the CAS system has faults or the server is not compatible with the USB device.
a. Install the operating system of an office PC on the server, plug the USB device into the server, and then check whether the system can recognize the USB device.
- If it cannot be recognized, the server is not compatible with the USB device.
- If it can be recognized, the server is compatible with the USB device.
b. Install the native CentOS system on the server, plug the USB device into the server, and then check whether the system can recognize the USB device.
- If it cannot be recognized, the CentOS system does not support the USB device. Because UIS is CentOS-based, UIS also does not support the USB device.
- If it can be recognized, proceed to the next step.
4. Use the virsh nodedev-list usb_device command to view the name of the new USB device. The following is an output example:
root@CVK:~# virsh nodedev-list usb_device
usb_2_1_5
usb_usb1
usb_usb2
usb_usb3
usb_usb4
As shown in the command output, the name of the new USB device is usb_2_1_5. Then, use the virsh nodedev-dumpxml xxx command to view XML information of USB device usb_2_1_5. The following is an output example:
NOTE: The xxx argument represents the name of a device. You can obtain this information by using the virsh nodedev-list usb_device command.
root@CVK:~# virsh nodedev-dumpxml usb_2_1_5
<device>
<name>usb_2_1_5</name>
<path>/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.5</path>
<parent>usb_2_1</parent>
<driver>
</driver>
<capability type='usb_device'>
<bus>2</bus>
<device>70</device>
<product id='0x6545'>DataTraveler G2 </product>
<vendor id='0x0930'>Kingston</vendor>
</capability>
</device>
Check whether the bus ID, device ID, product ID, and vendor ID are correct. If these IDs are all correct and you still cannot find the USB device on the Web management page of UIS, contact Technical Support.
After a USB device is loaded to a VM, the device manager of the VM cannot recognize the USB device, the USB device appears and disappears quickly, or an exclamation mark is displayed on the device
Symptom
After a USB device is loaded to a VM, the device manager of the VM cannot recognize the USB device, the USB device appears and disappears quickly, or an exclamation mark is displayed on the device.
Solution
To resolve this issue:
1. Connect the USB device to another USB connector. If you use a USB extension cable, connect the USB device directly to a built-in USB connector and try again. If the server provides USB slots of multiple types, make sure the USB device is connected to the correct connector.
To identify whether the USB device is connected to the correct connector, use the lsusb -t command.
root@cvk-163:~# lsusb -t
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
/: Bus 03.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/15p, 480M
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/8p, 480M
/: Bus 01.Port 1: Dev 1, Class=root_hub, Driver=ehci-pci/2p, 480M
|__ Port 1: Dev 2, If 0, Class=hub, Driver=hub/6p, 480M
UHCI represents USB 1.1, EHCI represents USB 2.0, and XHCI represents USB 3.0. Typically, the maximum transmission rate is 12 Mbps for USB 1.1, 480 Mbps for USB 2.0, and 5 Gbps for USB 3.0.
For example, if a server supports multiple USB bus standards and you plug in a USB 2.0 device, the device appearing on the USB 2.0 (ehci-pci) bus indicates that it is inserted in the correct slot.
2. If USB devices such as a USB key, encryption token, or SMS modem are USB 1.0 and the server has only USB 3.0 connectors, it is recommended to disable USB 3.0 in the BIOS.
3. To identify whether the CVK host can recognize the USB device, unplug and plug in the USB device, and then use the virsh nodedev-list usb_device command to check if there are any newly added USB devices.
¡ If no newly added USB device is detected, see "After a USB device is plugged into a CVK host, the host cannot recognize the USB device."
¡ If a newly added USB device is detected, proceed to the next step.
4. When adding the USB device to a VM, it is important to examine if the selected USB controller is correct for the device and to identify the USB version of the device (USB 1.0, USB 2.0, or USB 3.0). Typically, for USB devices such as USB Key, encryption token, or SMS modem, it is recommended to use the USB 1.0 controller.
5. If the USB device is not recognized by the VM, it is possible that the driver may be incompatible or outdated. Examine if the driver version matches the operating system of the VM.
One way to identify whether the driver is correct is to install the same operating system on a physical machine and test if the driver works correctly or consult with the USB device manufacturer. Another way is to create a similar VM on the VMware platform, install the same driver, and load the USB device to see if it is recognized by the VM.
If the correct driver is used, and the VM still cannot recognize the device, proceed to the next step.
6. Use virsh nodedev-dumpxml xxx to view the XML information of the newly added USB device. xxx represents the name of the newly added USB device in the output from the virsh nodedev-list usb_device command.
root@CVK:~# virsh nodedev-list usb_device
usb_2_1_5
usb_usb1
usb_usb2
usb_usb3
usb_usb4
In this example, the name of the newly added USB device is usb_2_1_5.
root@CVK:~# virsh nodedev-dumpxml usb_2_1_5
<device>
<name>usb_2_1_5</name>
<path>/sys/devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.5</path>
<parent>usb_2_1</parent>
<driver>
<name>usb</name>
</driver>
<capability type='usb_device'>
<bus>2</bus>
<device>70</device>
<product id='0x6545'>DataTraveler G2 </product>
<vendor id='0x0930'>Kingston</vendor>
</capability>
</device>
7. After loading the USB device to the VM, use the virsh nodedev-dumpxml xxx command again to examine if there is any change in the values of device ID, product ID, and vendor ID.
If there is a change in these values, it could be a compatibility issue between the server and the USB device. To troubleshoot this issue, try installing the same operating system used by the VM directly on the server and see if the USB device can be used normally. Examine the system logs for any errors. It is important to ensure that the USB device is not only visible but also functional. If the USB device works fine when the operating system is installed directly on the server, please contact H3C Support.
Use of USB3.0 devices
For a USB 3.0 device, if you select the USB 3.0 controller from the Web interface when you add the USB device to a VM, but the USB device cannot be found in the VM after loading, possible reasons include:
· The VM lacks a USB 3.0 driver. USB 3.0 is a relatively new protocol, and some old operating systems do not have the corresponding driver built in. In this case, download and install the appropriate USB 3.0 driver for the operating system.
In systems that support USB 3.0, you can verify the driver by locating the corresponding USB 3.0 controller entry in the device manager.
· The USB 3.0 device is incompatible with the server. In this case, after you plug the USB 3.0 device into the server equipped with UIS, log in through an SSH terminal and execute lsusb -t, no new device is displayed.
Use of USB-to-serial devices
Plug a USB-to-serial device into a server equipped with UIS, log in through an SSH terminal, and use lsusb -t to check for new USB devices. If the speed of the newly added device is 12 Mbps, select the USB 1.0 controller when you add the USB device to a VM. If the speed is 480 Mbps, select the USB 2.0 controller.
For example:
After you load a USB-to-serial device to a VM, no newly added serial port device can be viewed on the VM, and the device still cannot be displayed after you install the USB-to-serial driver on the VM. This issue occurs because the selected USB 2.0 controller does not match the device speed. The issue is resolved after you change to a USB 1.0 controller.
A USB-to-serial cable is connected to four switches on one end and connected to a UIS-equipped server on the other end. After you log in through an SSH terminal and use the lsusb -t command to view new devices, the four newly added devices cannot be seen simultaneously. If you unplug and then plug the cable repeatedly, only one, two, or three devices can be seen. When an unrecognized USB connector is plugged in, the following syslog is generated:
The log is generated because bus negotiation errors occurred when the device and the server established the connection. In this case, as a best practice, identify whether the server is compatible with the USB-to-serial connection method. In this example, the server is not compatible with the method. After the HP FlexServer R390 server used on-site was replaced with an R590 server, all four new devices were correctly identified.
Performance improvement
Disk performance optimization
The disk queue mode for the E0705 and E0706 versions is cfq (changed in the E0707 version). This mode results in poor SSD performance and also significantly impacts the I/O performance of OCFS2 shared storage volumes, which ultimately degrades cluster performance and affects VM performance. To resolve this issue, switch to the deadline mode.
· Permanent change:
[root@cvknode1 ~]# cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.14.0-generic root=UUID=da51eb22-6c64-4b3b-af57-960a117823c4 ro biosdevname=0 rhgb elevator=deadline transparent_hugepage=always net.ifnames=0 crashkernel=256M quiet
Edit grub configuration:
python /opt/bin/util_kernel_cmdline.pyc -s elevator=deadline transparent_hugepage=always net.ifnames=0 crashkernel=256M
If additional grub configurations exist, include them as parameters of the command.
· Online modification:
Edit the sd device:
for i in `ls /sys/block/sd*/queue/scheduler`; do echo "deadline" > ${i};done
Edit the dm device:
for i in `ls /sys/block/dm*/queue/scheduler`; do echo "deadline" > ${i};done
The permanent modification method requires the host to be restarted for the changes to take effect. The online modification method does not take effect on newly added block devices; these devices continue to use the default cfq mode.
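To verify which scheduler a block device is using at any time (a generic Linux check; sdb is an illustrative device name), read the scheduler file. The active scheduler is displayed in brackets, for example, noop [deadline] cfq:
cat /sys/block/sdb/queue/scheduler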
Performance optimization
Adjusting the I/O priority
On the VMs > Edit > Summary page, set the I/O priority to High.
Adjusting the CPU operating mode
On the VMs > Edit > CPU page, set the operating mode to Straight-Through.
By default, the operating mode is Compatible. This mode virtualizes physical CPUs of different models into vCPUs of the same model to provide compatibility.
The straight-through mode enables the guest OS to access the physical CPUs directly. This mode provides higher performance than compatible mode.
Adjusting the VM disk mounting method
In the case of shared storage, when you create a VM, the system creates a volume on the shared storage for VM disks by default. In scenarios that have higher performance requirements, use the raw block method to directly provision the volume to the VM, bypassing the file system layer of the CVK.
Figure 19 Creating a volume for VM disks
Figure 20 Mounting the volume to the VM through raw block
Figure 21 Information about the VM disk mounted through raw block
Adjusting the VM disk preprovisioning method
· For VMs deployed on shared storage that experience performance issues, you can adjust the disk preallocation method to improve performance. When creating a volume, change the preallocation method to thin provisioning.
· Increase the VM memory size.
· Change the log severity level.
Log in to the backend of any CVK, and execute the following commands:
ceph tell osd.* injectargs --debug_osd=1/1
ceph tell osd.* injectargs --debug_ms=0/0
ceph tell osd.* injectargs --debug_bluestore=1/1
ceph tell osd.* injectargs --debug_bluefs=1/1
ceph tell osd.* injectargs --debug_rocksdb=1/1
ceph tell osd.* injectargs --debug_bdev=1/1
· Change the I/O size.
Log in to the backend of all the CVKs and execute the following command on all nodes:
cd /proc/sys/dev/flashcache; for i in `ls`; do cd ${i}; echo 16 > skip_seq_thresh_kb; cd ..; done
The value 16 indicates that the system skips flashcache for I/Os larger than 16 KB. Note that the adjustment is applicable only to small I/O workloads, such as databases, and has little effect on copy or modification operations.
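To review the current threshold on each flashcache instance before or after the change (a sketch based on the same path used above):
cd /proc/sys/dev/flashcache; for i in `ls`; do echo ${i}: `cat ${i}/skip_seq_thresh_kb`; done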
· Change the number of replicas.
This can help improve performance.
CAUTION: Changing the number of replicas can affect data balance and might cause system risks. If you are to change the number of replicas, contact Technical Support. |
· Create a Windows Scale-Out File Server.
After adding disks to VMs, select fast initialization when you initialize the volumes.
Guest OS and VM restoration
Restrictions and guidelines
· This document provides a general Linux and Windows OS repair process, which can be referenced for other systems.
· Disaster recovery system repair does not ensure complete success. Perform data backup and take other necessary measures in advance.
· The repair method might not be able to completely repair the VM. If the damage is severe and cannot be repaired using ISO or related tools, professional disaster recovery tools might be needed for data recovery and rescue, such as Diskgenius and diskrec. If necessary, contact a professional data recovery company for assistance.
Preparation before repair
Backup of system disks
For the hard drive of a damaged system, perform a full disk backup in advance as a best practice, in case a repair attempt fails and additional repair methods need to be attempted.
For a damaged hard drive, you can use dd or other backup tools to copy the disk and create a backup.
In virtualization systems, you can back up the VM image file and clone it to another storage pool. Alternatively, you can create a snapshot on the storage side for the disk data to prevent unexpected situations during repair.
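For example, a minimal dd sketch (the source device and target path are illustrative; verify the actual device name before copying):
dd if=/dev/sdb of=/backup/sdb.img bs=4M conv=noerror,sync
The conv=noerror,sync options keep dd running past read errors on a damaged disk and pad unreadable blocks with zeros.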
Preparing the corresponding ISO system
For Linux systems, prepare a CentOS or Ubuntu ISO installation disk to facilitate repair of Linux system directories. For Windows systems, use the ISO file or disk with the same version as the damaged system.
CAUTION: · As a best practice, use the same version or a newer version of the ISO to mount and repair the system. · During the repair process, it may be discovered that the file system format in the old version of the ISO is incompatible with the new version, leading to repair failure. |
Linux system repair steps
1. Mount the optical drive and configure the system to boot from the optical drive, and then restart the system.
In a virtual environment using CAS, mount the ISO file as the optical drive on the VM to be repaired. On the Edit VM page, set the boot sequence to prioritize booting from the optical drive.
2. Start the system and attempt to repair it on the terminal.
In a virtual environment, locate the IP address of the CVK used by the VM and the corresponding VNC port in the CAS interface. Use a VNC client installed on your PC to connect to the port. TightVNC is a recommended VNC client.
| NOTE: As a best practice, do not use a browser console because some browsers may require frequent clearing of the browser cache to open the corresponding page after a few operations. |
3. On the CentOS control interface, select Troubleshooting.
4. Select Rescue a CentOS System.
5. Select option 3 to enter the shell command prompt.
If an older version of the CentOS ISO is used, you can select the corresponding Skip button to enter the shell interface. The options for older CentOS versions include Continue, Read-only, Skip, and Advanced.
If using the Ubuntu ISO for repair, select Execute a shell in the installer environment.
CAUTION: · The Ubuntu 1804 ISO repair mode does not have the XFS related tools installed by default. As a best practice, use the latest version of CentOS for XFS repair. · Make sure to use the matching or updated version of the ISO. |
6. Use the lvs command to check whether LVs are in use.
As shown in the following figure, three LVs are found. The swap LV does not need to be repaired, and the corresponding VG name is centos.
Use the lvchange -a y command to activate the corresponding LV to make it readable.
lvchange -a y centos/home
lvchange -a y centos/root
Check the file system on the corresponding LV. Different file systems require different repair commands. Use blkid /dev/centos/home to identify the file system.
blkid /dev/centos/home
CAUTION: · Different installation systems might have different VGs (some are centos, while others are VolGroup01, etc.). Select the VGs appropriately based on the actual output content. · If the system does not use LVM, use blkid to identify the file system on the corresponding /dev/sdaX partition. |
7. Repair XFS.
xfs_repair /dev/centos/root
If the repair fails, collect log information (if any) and contact Technical Support.
8. Repair Ext4.
fsck /dev/datavg/lv_data
If you are prompted for confirmation during the repair, enter yes. The repair steps for other file systems are similar.
9. Shut down the VM by executing the init 0 command.
10. Unmount the ISO drive and fall back to booting from the hard disk, and then restart the system.
11. Upon reboot, verify that the system's operations are normal.
Windows repair operations and steps
Symptom
After a CAS upgrade, a Windows 2008 VM prompts for repair upon starting up. Selecting repair results in a loading screen freeze, while selecting normal startup results in a black screen.
Repair steps
1. Attach the disk to another working Windows VM.
If the object being repaired is a VM, you can mount the system disk image of the faulty VM onto a working Windows VM. Then, use the disk check tool provided by Windows to check and repair disk errors. Delete the system disk of the faulty VM via the Edit VM > Disk page with the Delete Hardware operation.
2. On the working VM, add the system disk of the faulty VM via the Add Hardware option.
3. Select the faulty VM image. At this point, the system disk of the faulty VM can be seen in the working system.
For Windows 2012, a similar process applies. Select Computer Management, select a disk to view its properties, and perform error checking.
4. After mounting the disk, an error message might appear. Click on the blue error area to proceed.
Alternatively, scan and repair the properties of both partitions.
CAUTION: · For both the repair operations and the image files, use original system ISO files. · In a virtualized environment, multiple VMs cannot mount the same qcow2-formatted file at the same time. Therefore, one VM must unmount the file before another VM can mount it for repair. An image in RAW format, preallocated (zero) format, or raw block format can be mounted to multiple VMs simultaneously. |
5. If errors persist after repair, an ISO file needs to be mounted for further repair. Reattach the repaired disk to the faulty VM. A black screen error might appear, indicating boot failure or bootmgr missing.
6. Mount the system installation disc in the optical drive to repair the bootmgr. Change the boot order to boot from the optical drive. In Windows 2008, open Repair Computer and select the command prompt window.
7. Enter the command below to repair the bootmgr file. The machine should restart normally after the bootmgr is repaired.
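The exact command appears in a figure that is not reproduced here. As a general reference only (standard Windows Recovery Environment commands, not necessarily the exact commands from the figure), the bootrec tool is commonly used to repair boot records:
bootrec /fixmbr
bootrec /fixboot
bootrec /rebuildbcd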
CAUTION: · In a virtualization environment, select an IDE disk and mount the appropriate version of the ISO file. · If the system still reports errors after repair, such as antivirus software or application startup errors, the related software or program needs to be closed or uninstalled (modify the name so that it cannot be started) in a normally working Windows system. Try booting the system again and according to the specific error information, make corresponding adjustments and modifications. |
Space occupation issue
The stable operation of UIS depends on key partitions like the root partition and /var/log partition. When these partitions are full, some critical services might fail.
Space occupation issue due to manual operations
When the operator stores large files in the root partition or log partition, these partitions might be fully occupied. To resolve this issue:
1. Use du to identify the names of large files. For example, check the /var/log directory (see the example after these steps).
2. Confirm with the customer if the files are valid data.
3. Determine whether to move these files to another directory or delete them.
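For step 1, a common du invocation is as follows (generic Linux usage; the path is an example). It lists the largest entries in /var/log last:
du -sh /var/log/* | sort -h | tail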
Space occupation issue due to software issues
Space occupation due to the large size of the /var/log/secure file
The space might be fully occupied because the /var/log/secure file is too large.
This issue is already known in versions earlier than UIS 6.5. The secure log compression mechanism is imperfect, which might cause the /var/log/secure file to become too large.
To resolve this issue temporarily, clear the secure file:
1. Access the /var/log/ directory.
2. Clear the secure file in the directory.
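For example, a minimal sketch that truncates the secure file in place (assuming a bash shell):
cd /var/log/
cat /dev/null > secure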
To resolve this issue permanently, upgrade UIS.
Space occupation due to /var/spool/postfix/maildrop/
The /var/spool/postfix/maildrop/ directory on the host records scheduled task execution logs. In early versions, these logs accumulate over time with the operation of the UIS hyper-converged environment. Then, the size of the /var/spool/postfix/maildrop/ directory increases, eventually occupying the full space of the root partition. To fundamentally resolve this issue, upgrade UIS to the most recent version.
To resolve this issue temporarily:
1. Create an empty directory in the /var/log/ path, such as blkdir.
2. Delete the contents of the /var/spool/postfix/maildrop/ directory (see the sketch below).
IMPORTANT: The deletion process might take several hours. This step is required on all nodes in the cluster with full root partitions. To ensure the deletion success, do not interrupt the deletion task. |
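One common deletion approach (an assumption about the intended procedure, because the exact command is not shown here) is to use rsync with the empty directory as the source, which empties very large directories efficiently:
mkdir -p /var/log/blkdir
rsync -a --delete /var/log/blkdir/ /var/spool/postfix/maildrop/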
Log message exception
Message The maximum number of pending replies per connection has been reached generated
Symptom
The following message is generated in the /var/log/messages file on the host system: systemd-logind: Failed to start session scope session-c202601308.scope: The maximum number of pending replies per connection has been reached.
Solution
To resolve this issue:
1. Edit the org.freedesktop.NetworkManager.conf file in the /etc/dbus-1/system.d/ path. Before you edit this file, back it up as needed.
2. Increase the value of the max_replies_per_connection field in the configuration file, for example, to 10240 (see the sketch after these steps).
3. Reboot the related services.
systemctl daemon-reexec
systemctl restart systemd-logind.service
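The following is a minimal sketch of the edited field, assuming the standard D-Bus busconfig file format (the value 10240 matches the example above):
<limit name="max_replies_per_connection">10240</limit>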
Unified authentication issue
CAS authentication service exception
Symptom
After the CAS service is enabled, you cannot log in to UIS due to CAS authentication failure or other issues.
Solution
1. SSH to the CLI console of CVM and execute the mysql -p uis command to access the MySQL console.
2. Execute MariaDB [uis]> update TBL_PARAMETER set VALUE='0' WHERE NAME='cas.sso.enable';.
3. Reboot the UIS service: service uis-core restart.
4. Log in to UIS through the browser again.
D-state process issue
Symptom
Due to storage issues or storage network failures, many processes appear in D state. This is applicable to scenarios where the cluster has only the block service deployed or uses external iSCSI storage.
Solution
IMPORTANT: Execute the commands in this section based on the actual conditions instead of copying them directly. |
To resolve this issue:
1. Continuously stop the fsm_core.service and iscsi services, which requires two SSH windows.
¡ To stop fsm_core.service continuously: while true; do systemctl stop fsm_core.service; sleep 1; done
¡ To stop iscsi continuously: while true; do iscsiadm -m node -u; sleep 1; done
To terminate the execution of these two commands, press Ctrl + C, separately.
2. Disconnect the iSCSI session: iscsiadm -m node -T IQN -u, where you fill in the IQN value as needed. For example (using the example target from "iSCSI commands" in this document):
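iscsiadm -m node -T iqn.1991-05.com.microsoft:c09599-cmh-target -u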
3. Stop the fsm_core service.
4. Observe for several minutes to identify whether the D-state processes disappear.
5. Restart the fsm_core service after the D-state processes disappear.
IMPORTANT: If you have executed the while loop commands, terminate them first before performing this step. |
If this issue persists after you perform the above steps, use a maintenance window to stop the corresponding services on hosts, and then restart the host.
Commonly used commands
UIS Manager commands
HA commands
H3C UIS Manager provides HA features. The following are the commonly used HA commands.
All the following commands, except for the cha -k set-loglevel level command, run on a node where UIS Manager is deployed. The cha -k set-loglevel level command runs on a CVK host.
Obtaining the clusters managed by the HA process
cha cluster-list
# Obtain the clusters managed by the HA process.
root@UIS-UISManager:~# cha cluster-list
------------------------------------------------------------
HA database info:
Cluster list:
cluster:1, name:Cluster
HA memory info:
Cluster list:
cluster ID: 1
Obtaining state statistics for a cluster
cha cluster-status cluster-id
# Obtain the hosts and VMs in a cluster.
root@UIS-UISManager:~# cha cluster-status 1
------------------------------------------------------------
HA database info:
Cluster 1 information:
Is HA enabled: 1
Cluster priority: 1
2 nodes configured
6 VM configured
host and vm list:
Host:UIS-CVK01, vm:windows2008
Host:UIS-CVK02, vm:win2008
Host:UIS-CVK02, vm:rhce-lab
Host:UIS-CVK02, vm:Linux-RedHat5.9
Host:UIS-CVK02, vm:fundation1
Host:UIS-CVK02, vm:win7
HA memory info:
Cluster 1, Least_host_number(MIN_HOST_NUM) is 1.
Obtaining information for hosts in a cluster
cha node-list cluster-id
# Obtain information for hosts and VMs in a cluster.
root@UIS-UISManager:~# cha node-list 1
------------------------------------------------------------
HA database info:
In cluster 1, node list :
host: UIS-CVK01, in cluster: 1, IP: 192.168.11.1
host: UIS-CVK02, in cluster: 1, IP: 192.168.11.2
HA memory info:
Cluster 1, Least_host_number(PermitNum) is 1. hosts list:
host: UIS-CVK02 ID: 4
host: UIS-CVK01 ID: 3
Total host num in this cluster is: 2
Obtaining information for a host in a cluster
cha node-status host-name
# Obtain information for a host in a cluster.
root@UIS-UISManager:~# cha node-status UIS-CVK01
------------------------------------------------------------
HA database info:
Node UIS-CVK01 :
in cluster: 1
ip address: 192.168.11.1
VM count: 1
HA memory info:
Host: UIS-CVK01, ID: 3, IP address: 192.168.11.1
status: CONNECT
heart beat num: 101
storage total num: 1
storage fail num: 0
heartbeat fail num: 0
recv packet: 1
host model(maintain): 0
time statmp: Fri Jan 30 15:34:04 2015
Storage info:
storage name:sharefile path:/vms/sharefile
storage status:STORAGE_NORMAL
time stamp:0
update flag:0
last send flag:0
fail num:0
Obtaining information for a VM on a host
cha vm-list host-name
# Obtain information for a VM on a host.
root@UIS-CVK03:~# cha vm-list UIS-CVK01
------------------------------------------------------------
HA database info:
1 vms in host UIS-CVK01 :
vm: windows2008 ID: 11 HA-managed: 1 Target-role: 1
Obtaining information for a VM in a cluster
cha vm-status vm-name
# Obtain information for a VM in a cluster.
root@UIS-CVK03:~# cha vm-status windows2008
------------------------------------------------------------
HA database info:
vm ID: 11 name: windows2008
at node ID: 3
target-role: 1
is-managed: 1
prority: 1
storage name: sharefile
storage psth: /vms/sharefile
Setting the log level
cha set-loglevel module level
Parameters:
· cmd | UIS managerd: Sets the log level for the cmd process or the UIS Manager process.
· level: Specifies the log level, including debug, info, trace, warning, error, and fatal.
# Set the log level.
root@UIS-UIS Manager:~# cha set-loglevel info
Setting the log level for a CVK host
cha -k set-loglevel level
Parameters:
level: Specifies the log level, including debug, info, trace, warning, error, and fatal.
# Set the log level for a CVK host.
root@UIS-CVK01:/vms/sharefile# cha -k set-loglevel debug
Set cvk log level success.
root@UIS-CVK01:/vms/sharefile#
vSwitch commands
The following are the basic commands for vSwitches in UIS Manager.
Obtaining the internal version number of the vSwitch
root@hz-cvknode2:~# ovs-vsctl -V
ovs-vsctl (Open vSwitch) 2.9.1
DB Schema 7.15.1
Displaying status of processes related to the vSwitch
Execute the ps aux | grep ovs command on a CVK host. ovs_workq is an OVS kernel process, and ovsdb-server and ovs-vswitchd represent a monitor process and service process, respectively.
root@UIS-CVK01:~# ps aux | grep ovs
root 2207 0.0 0.0 0 0 ? S Dec07 0:00 [ovs_workq]
root 3411 0.0 0.0 23228 772 ? Ss Dec07 6:44 ovsdb-server: monitoring pid 3412 (healthy)
root 3412 0.0 0.0 23888 2656 ? S Dec07 6:15 /usr/sbin/ovsdb-server /etc/openvswitch/conf.db --verbose=ANY:console:emer --verbose=ANY:syslog:err --log-file=/var/log/openvswitch/ovsdb-server.log --detach --no-chdir --pidfile --monitor --remote punix:/var/run/openvswitch/db.sock --remote db:Open_vSwitch,Open_vSwitch,manager_options --remote ptcp:6632 --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert
root 3421 0.0 0.0 23972 804 ? Ss Dec07 7:23 ovs-vswitchd: monitoring pid 3422 (healthy)
root 3422 0.4 0.0 1721128 9364 ? Sl Dec07 55:24 /usr/sbin/ovs-vswitchd --verbose=ANY:console:emer --verbose=ANY:syslog:err --log-file=/var/log/openvswitch/ovs-vswitchd.log --detach --no-chdir --pidfile --monitor unix:/var/run/openvswitch/db.sock
root 23503 0.0 0.0 8112 936 pts/10 S+ 10:43 0:00 grep --color=auto ovs
Restarting a vSwitch
root@UIS-CVK01:~# service openvswitch-switch restart
Adding a vSwitch
root@UIS-CVK01:~# ovs-vsctl add-br vswitch-app
After a vSwitch is added successfully, you can see the vSwitch on UIS Manager after all hosts reconnect to UIS Manager.
Deleting a vSwitch
root@UIS-CVK01:~# ovs-vsctl del-br vswitch-app
A vSwitch cannot be deleted from UIS Manager after it is deleted from the CVK host.
Adding a port for a vSwitch
root@UIS-CVK01:~# ovs-vsctl add-port vswitch-app eth2
Deleting a port from a vSwitch
root@UIS-CVK01:~# ovs-vsctl del-port vswitch-app eth2
The port on a vSwitch cannot be deleted from UIS Manager after it is deleted from the CVK host.
Displaying vSwitch and port information
vswitch0 is an internal port (or local port), eth0 is a physical port, and vnet0 is a vSwitch port.
root@UIS-CVK01:~# ovs-vsctl show
ba390c40-8826-4a7a-8e17-f8834dab6eb3
Bridge "vswitch0"
Port "eth0"
Interface "eth0"
Port "vswitch0"
Interface "vswitch0"
type: internal
Port "vnet0"
Interface "vnet0"
root@UIS-CVK01:~#
Displaying the configuration on a vSwitch
root@UIS-CVK01:~# ovs-vsctl list br vswitch0
_uuid : 3500114d-5619-460e-ada7-d1b97f63c93c
br_mode : [0]
controller : []
datapath_id : "0000ac162d88c35c"
datapath_type : ""
drop_unknown_unicast: []
external_ids : {}
fail_mode : []
firewall_port : []
flood_vlans : []
flow_tables : {}
ipfix : []
mirrors : []
name : "vswitch0"
netflow : []
other_config : {}
ports : [16a48463-f90b-42fe-9a12-ceacfd256235, 5495812e-29e0-4364-a89f-b54ea52dd344, dec98186-2c83-447d-9215-28f99750a410]
protocols : []
sflow : []
status : {}
stp_enable : false
Displaying port configuration
root@UIS-CVK01:~# ovs-vsctl list port vnet0
_uuid : bc0b1e57-2d72-4fae-97b4-0bbca5d17ba1
TOS : routine
bond_downdelay : 0
bond_fake_iface : false
bond_mode : []
bond_updelay : 0
dynamic_acl_enable : false
external_ids : {}
fake_bridge : false
interfaces : [5495133f-7e81-4047-a0bd-734fae81f6f3]
lacp : []
lan_acl_list : []
lan_addr : []
mac : []
name : "vnet0"
other_config : {}
qbg_mode : [4]
qos : []
statistics : {}
status : {}
tag : [4]
tcp_syn_forbid : false
trunks : []
vlan_mode : []
vm_ip : []
vm_mac : "0cda411dad80"
wan_acl_list : []
wan_addr : []
Displaying the port number for a port in user mode and kernel mode
root@UIS-CVK01:~# ovs-appctl dpif/show
system@ovs-system: hit:10133796 missed:181938
flows: cur: 11, avg: 12, max: 23, life span: 79639399ms
hourly avg: add rate: 26.506/min, del rate: 26.462/min
daily avg: add rate: 24.205/min, del rate: 24.210/min
overall avg: add rate: 24.356/min, del rate: 24.354/min
vswitch0: hit:6478229 missed:39021
eth0 1/5: (system)
vnet1 2/8: (system)
vswitch0 65534/6: (internal)
For example, the port number of eth0 is 1 in user mode (OpenFlow port number) and 5 in kernel mode, and the port number of vnet1 is 2 in user mode and 8 in kernel mode.
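To cross-check the user-mode (OpenFlow) port numbers, you can also use the standard OVS command ovs-ofctl show (shown here as general OVS usage):
root@UIS-CVK01:~# ovs-ofctl show vswitch0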
Displaying the MAC addresses on a vSwitch
root@UIS-CVK01:~# ovs-appctl fdb/show vswitch0
port VLAN MAC Age
1 0 00:0f:e2:5a:6a:20 134
2 0 0c:da:41:1d:3d:18 95
1 0 ac:16:2d:6f:3f:4a 6
1 0 a0:d3:c1:f0:a6:ca 6
1 0 c4:ca:d9:d4:c2:ff 2
4 0 0c:da:41:1d:6d:94 2
LOCAL 0 2c:76:8a:5d:df:a2 2
3 0 0c:da:41:1d:80:03 0
Displaying port binding information on a vSwitch
root@UIS-CVK02:~# ovs-appctl bond/show
---- vswitch-bond_bond ----
bond_mode: active-backup
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
lacp_status: off
slave eth2: enabled
active slave
may_enable: true
slave eth3: disabled
may_enable: false
Displaying flow entry information
root@UIS-CVK01:~# ovs-ofctl dump-flows vswitch0
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=752218.541s, table=0, n_packets=15106363, n_bytes=3556156038, idle_age=0, hard_age=65534, priority=0 actions=NORMAL
Displaying kernel flow entry information on a vSwitch
root@UIS-CVK01:~# ovs-appctl dpif/dump-flows vswitch0
skb_priority(0),in_port(5),eth(src=74:25:8a:36:d8:9b,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=10.88.8.1/255.255.255.255,tip=10.88.8.206/255.255.255.255,op=1/0xff,sha=74:25:8a:36:d8:9b/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00), packets:2, bytes:120, used:3.018s, actions:6
skb_priority(0),in_port(5),eth(src=38:63:bb:b7:ed:6c,dst=01:00:5e:00:00:fc),eth_type(0x0800),ipv4(src=10.88.8.140/0.0.0.0,dst=224.0.0.252/0.0.0.0,proto=17/0,tos=0/0,ttl=1/0,frag=no/0xff), packets:1, bytes:66, used:1.139s, actions:6
skb_priority(0),in_port(5),eth(src=c4:34:6b:6c:ef:a8,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=10.88.8.200/0.0.0.0,dst=10.88.9.255/0.0.0.0,proto=17/0,tos=0/0,ttl=128/0,frag=no/0xff), packets:17, bytes:1564, used:3.370s, actions:6
skb_priority(0),in_port(5),eth(src=14:58:d0:b7:24:07,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=10.88.8.229/0.0.0.0,dst=10.88.9.255/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no/0xff), packets:6, bytes:692, used:0.771s, actions:6
skb_priority(0),in_port(5),eth(src=14:58:d0:b7:53:f6,dst=01:00:5e:7f:ff:fa),eth_type(0x0800),ipv4(src=10.88.8.146/0.0.0.0,dst=239.255.255.250/0.0.0.0,proto=17/0,tos=0/0,ttl=1/0,frag=no/0xff), packets:1, bytes:175, used:0.739s, actions:6
Displaying all kernel flow entries
root@UIS-CVK01:~# ovs-dpctl dump-flows
skb_priority(0),in_port(4),eth(src=c4:34:6b:6c:f5:ab,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0800),ipv4(src=10.88.8.159/0.0.0.0,dst=10.88.9.255/0.0.0.0,proto=17/0,tos=0/0,ttl=128/0,frag=no/0xff), packets:25, bytes:2300, used:0.080s, actions:3
skb_priority(0),in_port(5),eth(src=14:58:d0:b7:53:f6,dst=33:33:00:01:00:02),eth_type(0x86dd),ipv6(src=fe80::288d:70d6:36ce:60f3/::,dst=ff02::1:2/::,label=0/0,proto=17/0,tclass=0/0,hlimit=1/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:6
skb_priority(0),in_port(13),eth(src=0c:da:41:1d:80:03,dst=c4:ca:d9:d4:c2:ff),eth_type(0x0800),ipv4(src=192.168.2.15/255.255.255.255,dst=192.168.2.121/0.0.0.0,proto=6/0,tos=0/0,ttl=128/0,frag=no/0xff), packets:1, bytes:54, used:2.924s, actions:2
skb_priority(0),in_port(4),eth(src=c4:34:6b:68:9b:78,dst=33:33:00:00:00:02),eth_type(0x86dd),ipv6(src=fe80::85b7:25a0:d116:907a/::,dst=ff08::2/::,label=0/0,proto=17/0,tclass=0/0,hlimit=128/0,frag=no/0xff), packets:0, bytes:0, used:never, actions:3
skb_priority(0),in_port(4),eth(src=5c:dd:70:b0:39:3d,dst=ff:ff:ff:ff:ff:ff),eth_type(0x0806),arp(sip=192.168.11.149/255.255.255.255,tip=192.168.11.150/255.255.255.255,op=1/0xff,sha=5c:dd:70:b0:39:3d/00:00:00:00:00:00,tha=00:00:00:00:00:00/00:00:00:00:00:00), packets:1, bytes:60, used:0.264s, actions:3
Capturing packets on a port
Use tcpdump to capture packets on the port corresponding to the vSwitch. For more information about the tcpdump command, see "Networking."
tcpdump -i vnet1 -s 0 -w /tmp/test.pcap host 200.1.1.1 &
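The trailing & runs the capture in the background. To stop the capture and read the file afterward (standard tcpdump usage; %1 assumes the capture is the only background job in the shell):
kill %1
tcpdump -nr /tmp/test.pcap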
iSCSI commands
H3C UIS uses iSCSI to mount IP SAN storage devices. When an iSCSI shared file system has exceptions, you can use iSCSI commands for troubleshooting. To enable iser mode, add the -I iser option to the iscsiadm command.
Discovering iSCSI storage
iscsiadm -m discovery -t st -p ISCSI_IP or
iscsiadm -m discovery -t st -p ISCSI_IP -I iser (iser mode)
# Discover iSCSI storage.
root@HZ-UIS01-CVK01:~# iscsiadm -m discovery -t st -p 192.168.1.248:3260
192.168.1.248:3260,1 iqn.1991-05.com.microsoft:c09599-cmh-target
root@HZ-UIS01-CVK01:~#
Displaying iSCSI storage discovery records
iscsiadm -m node
# Display iSCSI storage discovery records.
root@HZ-UIS01-CVK01:~# iscsiadm -m node
192.168.1.248:3260,1 iqn.1991-05.com.microsoft:c09599-cmh-target
Deleting the iSCSI storage discovery records
iscsiadm -m node -o delete -T LUN_NAME -p ISCSI_IP
iscsiadm -m node -o delete -T LUN_NAME -p ISCSI_IP -I iser (iser mode)
# Delete the iSCSI storage discovery records.
# iscsiadm -m node -o delete -T iqn.1991-05.com.microsoft:c09599-cmh-target -p
192.168.1.248:3260
Logging in to an iSCSI storage device
iscsiadm -m node -T LUN_NAME -p ISCSI_IP -l or
iscsiadm -m node -T LUN_NAME -p ISCSI_IP -l -I iser (iser mode)
# Log in to an iSCSI storage device.
root@HZ-UIS01-CVK01:~# iscsiadm -m node -T iqn.1991-05.com.microsoft:c09599-cmh-target
-p 192.168.1.248:3260 -l
Logging in to [iface: default, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal:
192.168.1.248,3260]
Login to [iface: default, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal:
192.168.1.248,3260]: successful
Logging out of an iSCSI storage device
iscsiadm -m node -T LUN_NAME -p ISCSI_IP -u
iscsiadm -m node -T LUN_NAME -p ISCSI_IP -u -I iser (iser mode)
# Log out of an iSCSI storage device.
root@HZ-UIS01-CVK01:~# iscsiadm -m node -T iqn.1991-05.com.microsoft:c09599-cmh-target
-p 192.168.1.248:3260 -u
Logging out of session [sid: 4, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal:
192.168.1.248,3260]
Logout of [sid: 4, target: iqn.1991-05.com.microsoft:c09599-cmh-target, portal:
192.168.1.248,3260]: successful
Mounting FC storage
Obtaining the HBA card information
Method 1: Log in to the CVM system, access the storage management page, and then click a storage adapter to view HBA card information. If the card is in active state, storage access is available.
Method 2: Display driver information. If the driver is loaded correctly for the HBA card, HBA information will be displayed in the /sys/class/fc_host/host* directory.
[root@cvknode2-158 /]#ls /sys/class/fc_host/
host0 host2 host3 host4
[root@cvknode2-158 /]#ls /sys/class/fc_host/host0
device issue_lip npiv_vports_inuse port_state speed supported_classes system_hostname vport_create
dev_loss_tmo max_npiv_vports port_id port_type statistics supported_speeds tgtid_bind_type vport_delete
fabric_name node_name port_name power subsystem symbolic_name uevent
Connecting to the FC storage
Execute the following command:
echo hba_channel target_id target_lun > /sys/class/scsi_host/host*/scan
hba_channel represents the HBA card channel, target_id represents the target ID, and target_lun represents the LUN. To obtain the information, list the /sys/class/fc_transport/ directory.
[root@cvknode2-158 /]#ls /sys/class/fc_transport/
target0:0:0
[root@cvknode2-158 /]# echo 0 0 0 > /sys/class/scsi_host/host0/scan
Disconnecting the FC storage
Execute the following command:
echo 1 > /sys/block/sdX/device/delete
sdX represents the sd device corresponding to the FC storage device. To obtain the device name, execute the ll /dev/disk/by-path command.
[root@cvknode2-158 /]# ll /dev/disk/by-path
lrwxrwxrwx 1 root root 9 Oct 12 09:48 pci-0000:05:00.0-fc-0x21020002ac01e2d7-lun-0 -> ../../sdb
[root@cvknode2-158 /]# echo 1 > /sys/block/sdb/device/delete
Tomcat commands
H3C UIS Manager provides the Tomcat service. When an exception occurs, you can restart the Tomcat service.
To view the Tomcat status:
root@HZ-UIS01-CVK01:~# service tomcat8 status
* Tomcat servlet engine is running with pid 3362
To stop the Tomcat service:
root@HZ-UIS01-CVK01:~# service tomcat8 stop
* Stopping Tomcat servlet engine tomcat8
...done.
To start the Tomcat service:
root@HZ-UIS01-CVK01:~# service tomcat8 start
* Starting Tomcat servlet engine tomcat8
...done.
To restart the Tomcat service:
root@ HZ-UIS01-CVK01:~# service tomcat8 restart
* Stopping Tomcat servlet engine tomcat8
...done.
* Starting Tomcat servlet engine tomcat8
...done.
root@ HZ-UIS01-CVK01:~#
MySQL database commands
H3C UIS Manager uses MySQL to provide database service.
To view the MySQL service status:
root@HZ-UIS01-CVK01:~# service mysql status
mysql start/running, process 3039
To stop the MySQL service:
root@HZ-UIS01-CVK01:~# service mysql stop
mysql stop/waiting
To start the MySQL service:
root@HZ-UIS01-CVK01:~# service mysql start
mysql start/running, process 4821
virsh commands
virsh commands allow you to obtain VMs attached to a CVK host and the VM status. In addition, you can start and shut down the VMs by using the commands.
Displaying the VM status from a CVK host
Execute the virsh list --all command to view the status of all VMs on the host.
root@UIS-CVK01:/vms# virsh list --all
Id Name State
----------------------------------------------------
4 windows2008 running
- Linux-RedHat5.9 shut off
Starting a VM from a CVK host
Execute the virsh start vm-name command.
root@UIS-CVK01:/vms# virsh start Linux-RedHat5.9
Domain Linux-RedHat5.9 started
root@UIS-CVK01:/vms#
Shutting down a VM from a CVK host
Execute the virsh shutdown vm-name command.
root@UIS-CVK01:/vms# virsh shutdown Linux-RedHat5.9
Domain Linux-RedHat5.9 is being shutdown
casserver commands
The casserver service collects statistics such as disk usage and alarm information. When an exception occurs on the casserver service, you can use the service casserver restart command to restart the casserver service:
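# Restart the casserver service.
root@HZ-UIS01-CVK01:~# service casserver restart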
qemu commands
Use qemu commands to display image file information and convert disk file formats.
Displaying image file information for a VM
On UIS Manager, you can view the image file path for a VM. The Storage Path field displays the path for the image file for the VM.
To display basic information for an image file, for example, file format, file size, and used file size, execute the qemu-img info command. For a three-level image file, the level-2 image file name will also be displayed.
root@ZJ-UIS-001:~# qemu-img info /vms/defaultShareFileSystem0/A_048
image: /vms/defaultShareFileSystem0/A_048
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 1.3G
cluster_size: 262144
backing file: /vms/defaultShareFileSystem0/A_048_base_1
backing file format: qcow2
Format specific information:
compat: 0.10
refcount bits: 16
If you display level-2 image file information, you can see information for the level-1 image file (base image file).
root@ZJ-UIS-001:~# qemu-img info /vms/defaultShareFileSystem0/A_048_base_1
image: /vms/defaultShareFileSystem0/A_048_base_1
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 1.0M
cluster_size: 262144
backing file: /vms/defaultShareFileSystem0/fio-cent-autorun_UIS-e0602fio-cent-autorun_UIS-e0602
backing file format: qcow2
Format specific information:
compat: 0.10
refcount bits: 16
If you display information for the base image file, you cannot see information for image files of other levels.
root@ZJ-UIS-001:~# qemu-img info /vms/defaultShareFileSystem0/fio-cent-autorun_UIS-e0602fio-cent-autorun_UIS-e0602
image: /vms/defaultShareFileSystem0/fio-cent-autorun_UIS-e0602fio-cent-autorun_UIS-e0602
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 5.5G
cluster_size: 262144
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
Consolidating image files
If a VM uses a multi-level image file, you can use the qemu-img convert command to consolidate the image file.
root@UIS-CVK01:/vms/sharefile# qemu-img convert -O qcow2 -f qcow2 windows2008 windows2008-test
root@ZJ-UIS-001:/vms/defaultShareFileSystem0# qemu-img convert -O qcow2 -f qcow2 A_048 A048-test
The consolidated image file is not a multi-level image file.
root@ZJ-UIS-001:~# qemu-img info /vms/defaultShareFileSystem0/A048-test
image: /vms/defaultShareFileSystem0/A048-test
file format: qcow2
virtual size: 30G (32212254720 bytes)
disk size: 1.4G
cluster_size: 262144
Format specific information:
compat: 1.1
lazy refcounts: false
refcount bits: 16
corrupt: false
ONEStor commands
ONEStor commands are used to obtain the cluster status and the status of monitor nodes, OSDs, and PGs.
· Mon (Monitor)—Monitor node in the cluster.
· OSD—Physical disks corresponding to the storage nodes.
· PG—Placement group, a virtual node that resides in a storage pool. Every time a storage pool is added, a number of PGs are added in the cluster.
Obtaining the health status of a cluster
· ceph health detail
This command displays PGs in unclean, inconsistent, and degraded states. As shown in the following figure, if the cluster is in healthy state, the system displays HEALTH_OK.
If HEALTH_WARN is displayed, it indicates that the cluster is in warning state. The following figure shows that 1024 PGs are in degraded state, and 1024 PGs are in unclean state. This indicates that 33.333% PGs in the cluster are degraded, 1/3 OSDs are in down state, and the PGs on the down OSDs are in degraded state.
The following are the causes of this issue:
¡ A node is unreachable. Identify whether the service network and storage network are reachable.
¡ A node has failed. Use the ceph osd tree command to identify the node where the down OSDs reside and identify whether the node hardware and operating system are operating correctly.
· ceph -s
To display the cluster status, use the ceph -s command.
The output from the command is as follows:
¡ health
- HEALTH_OK—The cluster is in healthy state.
- HEALTH_WARN—Alarms have been triggered.
- HEALTH_ERR—A severe error such as data inconsistency has occurred in the cluster.
Typically, prompts related to PG and OSD abnormalities or time inconsistencies will appear in the health section.
¡ monmap—Number of monitors and the nodes where the monitors reside. As shown in the figure, the cluster contains three monitors, which reside in node 117, node 118, and node 119 respectively. The first monitor is the primary monitor.
¡ osdmap—Total number of OSDs, number of OSDs in up state, and number of OSDs in in state. As shown in the figure, all 18 OSDs in the cluster are in up and in states, which indicates that they are all operating correctly.
¡ pgmap—Number of PGs, number of storage pools, space used by data replicas, and total number of objects. This field also displays cluster usage information, including used capacity, free capacity, and total capacity. In addition, the PG state is displayed.
Error prompts:
¡ too many PGs per OSD—This error message is no longer displayed after you add more OSDs or reduce the number of storage pools.
¡ clock skew detected—The system time is inconsistent across monitor nodes. Execute the ntpdate -u IP command to synchronize time from the primary NTP server, where IP is the IP address of the primary NTP server. As shown in the following figure, six OSDs are in down state. The cluster puts the PGs corresponding to the OSDs in degraded state.
Execute the ceph -s command. The output shows that some PGs are abnormal, one monitor is down, 12 OSDs are up, and 18 OSDs are in in state. This indicates that node 118 might have an error or the service network is in abnormal state.
· ceph -w
To monitor a cluster, use the ceph -w command. The command continuously outputs information and can be terminated by pressing Ctrl+C. When the cluster's PG state is normal, the output from the ceph -w command is consistent with the output from the ceph -s command, as shown in the following figure.
To view cluster state changes, see the osdmap, pgmap, mon, and osd pgmap sections.
OSD commands
· ceph osd tree
To display the OSDs on each node and their positions in the CRUSH map, use the ceph osd tree command. This command helps maintain a large cluster. The following figure shows OSDs in normal state.
Use osd.1 as an example. The weight of the OSD is 0.89999, it is in rack 3, the host node is node 111, and the OSD is in down and out state.
IMPORTANT: The system marks an OSD as out five minutes after its state changes to down. · If an OSD is in down/out state, a hard disk failure might have occurred. · If all OSDs on a node are down, a node exception or network exception might have occurred. |
· ceph osd perf
To display the latency of an OSD, use the ceph osd perf command. If services are running, a latency of less than 100 ms is normal. When the cluster is idle, the latency is typically within 10 ms.
If the latency keeps higher than 10 ms when the cluster is idle, troubleshoot the issue. If the latency is higher than 100 ms when a large number of services are running, identify whether a network or hardware failure has occurred.
· ceph osd df
To display the disk usage, use the ceph osd df command. The command can display OSD statistics, such as OSD size, used capacity, available capacity, and usage. If the usage of an OSD is higher than 85%, the near full alarm is displayed on UIS Manager. If the usage of an OSD is higher than 95%, the cluster becomes unavailable.
As shown in the following figure, the cluster contains three OSDs, each having a size of 920G, used capacity of 501G, and available capacity of 419G. The total capacity is 2762G, the used capacity is 1505G, the available capacity is 1257G, and the usage is 54.48%.
Obtaining the cluster usage statistics
ceph df
The command is used to obtain usage statistics for the cluster and storage pools. It displays the total capacity, remaining capacity, used capacity, and percentage of the cluster. In addition, it displays information about the storage pools, such as their names, IDs, usage status, and the number of objects in each storage pool.
For example, as shown in the figure below, the remaining capacity of the cluster is 1257G, the used capacity is 1505G, the usage is 54.48%, the used capacity by storage pool p1 is 499G, the usage is 54.29%, the available space is 419G, and the number of objects is 128003.
ONEStor commands
iostat
Use the iostat command to monitor system input/output (I/O) devices that are loaded and the length of time it takes for the system to process the I/O requests. This command is useful for analyzing whether there is a bottleneck in the IO process during the interaction between the process and the operating system. When executed without any parameters specified, this command displays statistical information from the time the system was started to the current time when the command was executed. The following figure shows the output from the iostat command.
The following are the descriptions for the items:
· The first line displays the system version, host name, and date.
· avg-cpu—CPU usage statistics. For a multi-core CPU, this value is the average value of all cores.
· Device—IO statistics for each disk.
For the CPU statistics, the value for iowait is important. It indicates the percentage of time that the CPU was idle during which the system had pending disk I/O requests.
Disk names are displayed in the sdX format.
Item | Description |
tps | Number of IO read and write requests per second that were issued by the process. |
kB_read/s | The amount of data read from the device, expressed in kilobytes per second. |
kB_wrtn/s | The amount of data written to the device expressed in kilobytes per second. |
kB_read | Total number of kilobytes read. |
kB_wrtn | Total number of kilobytes written. |
The iostat -x 1 command displays real-time IO device statistics. Specify the -x option when you analyze IO usage statistics.
The iostat -x 1 command displays real-time information about the disk usage for a node. If the %util ratio of a single disk is high or close to 100%, a single disk might have an issue. If the overall disk %util ratio of the cluster is over 80% or close to 100%, the cluster's disk IO usage has reached its limit. In such a case, you can add more disks or reduce the services provided by the cluster.
The following are the descriptions for the items:
Item | Description |
rrqm/s | Number of read requests merged per second that were queued to the device. |
wrqm/s | Number of write requests merged per second that were queued to the device. |
r/s | Number of read requests completed per second for the device. |
w/s | Number of write requests completed per second for the device. |
rkB/s | Number of kilobytes read from the device per second. |
wkB/s | Number of kilobytes written to the device per second. |
avgrq-sz | Average size (in sectors) of the requests that were issued to the device. |
avgqu-sz | Average queue length of the requests that were issued to the device. |
await | Average time (in milliseconds) for I/O requests issued to the device to be served. The time includes the time spent by the requests in queue and the time spent servicing them. |
svctm | Average service time (in milliseconds) for I/O requests that were issued to the device. |
%util | Percentage of CPU time during which I/O requests were issued to the device. |
top
The top command provides real-time monitoring of resource usage for different processes in the system. This command can sort tasks based on CPU usage, memory usage, and execution time.
The following are the items that need to be focused on:
· Load average
· Tasks
· CPU usage
Sorting processes by CPU or memory usage can help identify which processes are causing system issues. To do this, press either the uppercase F or O key and choose either k or n when you execute the top command.
The following is the output from the top command.
The following are the descriptions for the items:
· The first line is task queue information. This line shows the current time, system uptime, the number of currently logged-in users, and the system load, which is the average length of the task queue, displayed as three values for the past 1 minute, 5 minutes, and 15 minutes, respectively.
· The second and third lines display information about processes and CPUs. If multiple CPUs exist, these contents might exceed two lines.
· The memory lines include a swap cache value: content that was swapped out to the swap area and later swapped back into memory while its copy in the swap area has not been overwritten. When the corresponding memory is swapped out again, it does not need to be written to the swap area again.
The area below system information displays detailed information for each process.
Item | Description |
PID | Process ID |
RUSER | Username of the owner of the process |
UID | User ID of the owner of the process |
USER | Username of the owner of the process |
VIRT | Total virtual memory used by the process. |
RES | The amount of actual physical memory a process is consuming in kb. |
SHR | Shared memory size (kb) used by the process. |
%MEM | Memory usage of the process. |
%CPU | CPU usage of the process. |
You can press the uppercase F or O key, and then press a-z to sort the processes according to the corresponding column. The uppercase R key can reverse the current sorting.
You can use the following commands during the execution of the top command.
Item | Description |
q /Ctrl+C | Quits the program. |
m | Displays memory information. |
t | Displays process and CPU information. |
c | Displays command name and complete command. |
M | Sorts processes by memory usage. |
P | Sorts processes by CPU usage. |
T | Sorts processes by time/accumulated time. |
Other query commands
· lsblk
Use the lsblk command to view information about hard drive capacity, partition, usage, and mounting.
In the above figure, the NAME column lists all hard drives and partitions, SIZE displays the total capacity of the hard drive and partition size, TYPE displays the type of hard drive and partition, and MOUNTPOINT displays the file system mount point. The sda disk is the system disk with a size of 279.4G. Six hard disks with a size of 558.9G each are mounted as OSDs, and the size of the log partition is 10G.
· mount
Use the mount command to display all mounted file systems in a cluster and their types.
· df -h
Use the df -h command to list all mounted file systems, and display the total capacity, used capacity, available capacity, usage, and mount point for each mounted file system.
The output shows that 6 OSDs have been mounted, each with a capacity of 549G and a usage of 1%.
· fdisk -l
Use the fdisk -l command to display the hard drives, partitions, sizes, and usage of the nodes.
· free
Use the free command to display the total memory, used memory, buffer, cache, and swap usage of a node.
NVMeoF commands
Discovering NVMeoF storage
nvme discover -t rdma -a ISER_IP -s 4420
Logging in to NVMeoF storage
nvme connect -t rdma -n nqn.2010-05.com.macrosan:storage-1:50b34200-11f0-0052-5c6d-b5f32fe90761 -a ISER_IP -s 4420
Logging out of NVMeoF storage
nvme disconnect -n nqn.2010-05.com.macrosan:storage-1:50b34200-11f0-0052-5c6d-b5f32fe90761
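After logging in, you can verify that the storage appears as a local NVMe device by executing the nvme list command from the nvme-cli tool (general nvme-cli usage, assuming the tool is installed on the host):
nvme list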
Cloud-native engine container service commands
Run the commands on cloud-native engine component VMs.
Obtaining the running status of components in a cluster
Use the kubectl command to maintain a Kubernetes cluster. To display the running status or deployment status of components in the cluster, use the following command:
root@HZ-UIS01-CVK01:~# kubectl get pod -A
Item | Description |
NAMESPACE | Namespace to which the pod belongs. |
NAME | Pod Name |
READY | Number of ready containers/total containers in the pod. |
STATUS | Pod status, including Pending, Running, Succeeded, Failed, Unknown, and XXBackoff. |
RESTARTS | Number of restarts. |
AGE | Uptime. |
If a component is not in Running status, an exception has occurred.
Reviewing component logs
Cluster components run as pods in a Kubernetes cluster. To review the logs, use the following commands:
· Review all pod logs: kubectl logs (NAME) [-c CONTAINER]
Example: kubectl logs nginx
· Follow all pod logs: kubectl logs (NAME) [-c CONTAINER] -f
Example: kubectl logs nginx -f
· Review the most recent pod logs: kubectl logs (NAME) [-c CONTAINER] --tail=N
Example: kubectl logs nginx --tail=100
Restarting a cluster component
Cluster components run as pods in a Kubernetes cluster. To restart a component, use the kubectl delete pod [-n NAMESPACE] (NAME) command.
For example, to restart the abc container in the tke namespace, use the kubectl delete pod -n tke abc command.
Linux commands
vi
To create or edit a file in the Linux operating system, you must use commands such as vi and vim.
The Vi editor has two modes: Command and Insert.
The following uses the test.txt file as an example.
Executing the vi command
Enter the vi test.txt command in the command line window of Linux. If the test.txt file already exists, you can use the vi command to edit its content. If the file does not exist, this command creates the file.
Entering Command mode
When you first open a file with Vi, you are in Command mode. The file does not contain any information.
In Command mode, you can use keyboard keys to navigate, delete, copy, and paste, but you cannot enter text.
Entering Insert mode
To enter Insert mode, press i, o, or a, as shown in the following figure.
Enter the file content.
Returning to Command mode
To return to Command mode, press ESC.
Saving the file and exiting
After you return to Command mode, enter a colon (:), and then execute the wq command to save the file and exit the vi editor.
To view the created file, execute the ls command.
Basic commands
Displaying the current directory
Use the pwd command to print the current working directory.
root@HZ-UIS01-CVK01:~# pwd
/root
Displaying file information
Use the ls command to display file information in the current directory.
# ls [-aAdfFhilnrRSt] directory name
Options and parameters:
-a: Lists all files including those that begin with .
-A: Lists all files except for . and ..
-d: Lists directory entries instead of contents
-h: when used with -l (long list), prints sizes in human readable format, for example GB, KB
-i: Prints the index number of each file
-r: Reverses order while sorting
-R: Lists all subdirectories recursively
-S: Displays entries sorted by file size
-t: Sorts by modification time
Example:
root@HZ-UIS01-UIS Manager:~# ls -al
total 44
drwx------ 5 root root 4096 May 23 15:33 .
drwxr-xr-x 24 root root 4096 May 13 09:47 ..
-rw------- 1 root root 847 Jan 1 12:35 .bash_history
-rw-r--r-- 1 root root 3106 Apr 19 2012 .bashrc
drwx------ 2 root root 4096 May 17 17:23 .cache
-rw-r--r-- 1 root root 8 May 23 15:33 UIS.conf
drwxr-xr-x 2 root root 4096 May 23 15:32 h3c
-rw-r--r-- 1 root root 140 Apr 19 2012 .profile
drwxr-xr-x 2 root root 4096 May 22 09:50 .ssh
-rw------- 1 root root 4962 May 23 15:33 .viminfo
Changing the working directory
Use the cd command to change the working directory.
.: The current directory.
..: The directory one level up from the current directory.
-: The previous working directory.
~: The home directory for the current user.
For example, ~account represents the home directory for the account user.
Example:
root@HZ-UIS01-CVK01:/# cd ~root
# Enter the home directory for the root user.
root@HZ-UIS01-CVK01:~# cd ~
# Return to the home directory for the current user.
root@HZ-UIS01-CVK01:~# cd
# Return to the home directory for the current user.
root@HZ-UIS01-CVK01:~# cd ..
# Enter the directory one level up from the current directory.
root@HZ-UIS01-CVK01:/# cd -
# Return to the previous directory.
root@HZ-UIS01-CVK01:~# cd /root
# Enter the /root directory.
root@HZ-UIS01-CVK01:~# cd ../root
# Enter the root directory under the directory one level up.
Creating a new directory
Use the mkdir (make directory) command to create a new directory.
# mkdir [-mp] directory name
Options and parameters:
-m: Sets the access permissions for the directory.
-p: Creates the directory together with any missing parent directories.
Example:
root@HZ-UIS01-UIS Manager:~# ls
root@HZ-UIS01-UIS Manager:~# mkdir h3c
root@HZ-UIS01-UIS Manager:~# ls
h3c
root@HZ-UIS01-UIS Manager:~#
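To create a nested directory tree in one command, use the -p option. The directory names here are examples only:
root@HZ-UIS01-UIS Manager:~# mkdir -p h3c/logs/2025
root@HZ-UIS01-UIS Manager:~# ls h3c/logs
2025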
Copying a file or directory
Use the cp (copy) command to copy a file or directory.
# cp [-adfilprsu] source destination
# cp [options] source1 source2 source3 .... destination directory
Options and parameters:
-a: Same as -pdr.
-f: If an existing destination file cannot be opened, deletes it and tries again.
-i: Asks for confirmation before overwriting the destination file.
-p: Preserves the file attributes of the original file in the copy.
-r: Copies files recursively. All files and subdirectories in the specified source directory are copied to the destination.
If two or more source files are specified, the destination must be a directory.
Example:
# Copy a file.
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf
root@HZ-UIS01-UIS Manager:~# cp UIS.conf UIS.conf.bak
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf UIS.conf.bak
root@HZ-UIS01-UIS Manager:~#
# Copy a directory.
root@HZ-UIS01-UIS Manager:~# ls
h3c
root@HZ-UIS01-UIS Manager:~# cp -rf h3c h3c.bak
root@HZ-UIS01-UIS Manager:~# ls
h3c h3c.bak
root@HZ-UIS01-UIS Manager:~#
Securely copying a file
scp (secure copy) allows you to securely copy files and directories between two locations. The protocol encrypts files in transit, making it a safer alternative to the cp (copy) command. If a disk on your server is mounted as a read-only file system, you can use the scp command to copy the files on that disk to a destination.
# scp [option] [source] [destination]
Options and parameters:
-1: Protocol 1 will be used.
-2: Protocol 2 will be used.
-4: Only IPv4 addresses will be used.
-6: Only IPv6 addresses will be used.
-B: Executes in batch mode, which prevents prompts for user input.
-C: Enables compression to speed up the transfer.
-p: Preserves file permissions, access times, and modification times while copying.
-q: Executes scp in quiet mode. This option suppresses the transfer progress display.
-r: Copies directories and files recursively.
-v: Activates verbose mode, which displays the scp execution progress step by step on the terminal. It is useful for debugging.
-c cipher: Selects the cipher for data encryption. This option is passed directly to SSH.
-F ssh_config: Specifies an alternative SSH configuration file. This option is passed directly to SSH.
-i identity_file: Specifies the identity (private key) file for public key authentication. This option is passed directly to SSH.
-l limit: Restricts the bandwidth in Kbit/s.
-o ssh_option: Passes options to SSH in the ssh_config format.
-P port: Specifies the port to connect to on the remote host.
-S program: Specifies the program to use for the encrypted connection. The program must understand ssh(1) options.
Example:
root@HZ-UIS01-CVK01:~# scp UIS-E0218H06-Upgrade.tar.gz HZ-UIS01-CVK02:/root
UIS-E0218H06-Upgrade.tar.gz 100% 545MB 90.8MB/s 00:06
root@HZ-UIS01-CVK01:~#
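To recursively copy a directory to a remote host, or to connect on a specific SSH port, combine options as follows (the host name, directory, and port here are examples only):
root@HZ-UIS01-CVK01:~# scp -r /var/log/h3c HZ-UIS01-CVK02:/root/
root@HZ-UIS01-CVK01:~# scp -P 22 UIS.conf HZ-UIS01-CVK02:/root/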
Removing a file or directory
Use the rm (remove) command to remove a file or directory.
# rm [-fir] file or directory name
Options and parameters:
-f: Forces removal without prompting for confirmation.
-i: Asks for confirmation before each removal.
-r: Removes a directory recursively. Use this option with caution.
Example:
root@HZ-UIS01-UIS Manager:~# ls
h3c
root@HZ-UIS01-UIS Manager:~# rm -rf h3c
root@HZ-UIS01-UIS Manager:~# ls
root@HZ-UIS01-UIS Manager:~#
Moving files and directories or renaming a file or directory
Use the mv (move) command to move files and directories from one directory to another or rename a file or directory.
# mv [-fiu] source destination
# mv [options] source1 source2 source3 .... directory
Options and parameters:
-f: Overwrites the destination file or directory without asking for confirmation.
-i: Asks for confirmation before overwriting.
-u: Moves a file only when the source is newer than the destination or the destination file does not exist.
Example:
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf
root@HZ-UIS01-UIS Manager:~# mv UIS.conf UIS.conf.bak
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf.bak
root@HZ-UIS01-UIS Manager:~#
Creating an archive and extracting the archive files
# tar [-j|-z] [cv] [-f archive name] filename...    # Create an archive
# tar [-j|-z] [xv] [-f archive name] [-C directory]    # Extract an archive
Options and parameters:
-c: Creates the archive.
-t: Displays or lists files inside the archived file.
-x: Extracts archives. This option can be used together with the -C option.
The -c, -t, and -x options cannot be used in the same command.
-j: Compresses or decompresses the archive through bzip2. As a best practice, use *.tar.bz2 as the archive name.
-z: Compresses or decompresses the archive through gzip. As a best practice, use *.tar.gz as the archive name.
-v: Displays verbose information.
-f filename: Specifies the name of the archive file.
-C directory: Use this option to extract files in a specific directory.
Example:
# Create an archive.
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf UIS.conf-01 UIS.conf-02
root@HZ-UIS01-UIS Manager:~# tar -czvf UIS.tar.gz UIS.conf*
UIS.conf
UIS.conf-01
UIS.conf-02
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf UIS.conf-01 UIS.conf-02 UIS.tar.gz
# Extract the archive files.
root@HZ-UIS01-UIS Manager:~# ls
UIS.tar.gz
root@HZ-UIS01-UIS Manager:~# tar -xzvf UIS.tar.gz
UIS.conf
UIS.conf-01
UIS.conf-02
root@HZ-UIS01-UIS Manager:~# ls
UIS.conf UIS.conf-01 UIS.conf-02 UIS.tar.gz
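To extract an archive into a specific directory instead of the current directory, use the -C option. The target directory here is an example only and must already exist:
root@HZ-UIS01-UIS Manager:~# mkdir /tmp/uis-restore
root@HZ-UIS01-UIS Manager:~# tar -xzvf UIS.tar.gz -C /tmp/uis-restore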
System commands
Displaying the system kernel
# uname [-asrmpi]
Options and parameters:
-a: Displays all system information.
-s: Displays the system kernel name.
-r: Displays the kernel release.
-m: Displays the machine hardware name, for example, i686 or x86_64.
-p: Displays the processor architecture.
-i: Displays the hardware platform, for example, x86.
Example:
root@ZJ-UIS-001:~# uname -a
Linux ZJ-UIS-001 4.1.0-generic #1 SMP Wed Nov 9 02:04:23 CST 2016 x86_64 x86_64 x86_64 GNU/Linux
Displaying uptime of the system
Example:
root@HZ-UIS01-UIS Manager:~# uptime
17:54:04 up 3 days, 23:28, 1 user, load average: 0.08, 0.12, 0.13
Displaying system resource statistics
# vmstat [-a] [delay [total monitors]]
# vmstat [-fs]
# vmstat [-S unit]
# vmstat [-d]
# vmstat [-p partition]
Options and parameters:
-a: Displays active/inactive memory.
-f: Displays the number of forks since boot.
-s: Displays a table of various event counters and memory statistics.
-S: Specifies the output unit: k (1000), K (1024), m (1000000), or M (1048576).
-d: Lists disk statistics.
-p: Followed by a partition name, displays detailed statistics for that partition.
Example:
root@HZ-UIS01-CVK01:~# vmstat 1 5
procs ---------------memory----------------- -----swap---- -----io---- ----system-- -----cpu--------
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 60402384 58716 1712736 0 0 15 6 87 116 1 0 98 0
0 0 0 60402500 58716 1712736 0 0 1 0 631 1051 0 0 100 0
0 0 0 60402608 58756 1712752 0 0 0 840 1444 1640 2 0 98 0
0 0 0 60403360 58756 1712760 0 0 2 33 991 1346 0 0 100 0
2 0 0 60400944 58780 1712784 0 0 0 60 2225 1682 0 0 99 0
Field descriptions for VM mode:
procs
· r: Number of processes waiting for run time.
· b: Number of processes in uninterruptible sleep.
memory
· swpd: The amount of virtual memory used.
· free: The amount of idle memory.
· buff: The amount of memory used as buffers.
· cache: The amount of memory used as cache.
swap
· si: The amount of memory swapped in from disk (/s).
· so: The amount of memory swapped to disk (/s).
If these values are large, memory pages are frequently swapped between disk and main memory, which indicates low system efficiency.
io
· bi: Blocks received from a block device (blocks/s).
· bo: Blocks sent to a block device (blocks/s). A larger value indicates busier system IO.
system
· in: Number of interrupts per second, including the clock.
· cs: Number of context switches per second.
A larger value indicates more frequent communications between the system and devices such as disks, NICs, and clocks.
cpu
· us: Time spent running non-kernel code (user time).
· sy: Time spent running kernel code (system time).
· id: Time spent idle.
· wa: Time spent waiting for IO.
· st: Time stolen from a VM. Supported in Linux versions later than 2.6.11.
Displaying the load on a device
Use the iostat command to display CPU and I/O usage statistics.
# iostat [options] [interval] [count]
Options and parameters:
-c: Displays the CPU usage. It is mutually exclusive with the -d option.
-d: Displays the disk usage. It is mutually exclusive with the -c option.
-k: Displays statistics in kilobytes per second. The default unit is block.
-m: Displays statistics in megabytes per second.
-N: Displays logical volume mapping (LVM) statistics.
-n: Displays NFS statistics.
-p: Displays statistics for block devices and all their partitions used by the system. You can specify a device after this option, for example, # iostat -p /dev/sda. This option is mutually exclusive with the -x option.
-t: Prints the time for each report displayed.
-x: Displays detailed information.
-v: Displays version information.
Remarks:
· avg-cpu
¡ %user: Displays the percentage of CPU usage that occurred when executing at the user level.
¡ %nice: Displays the percentage of CPU usage that occurred when executing at the user level with nice priority.
¡ %system: Displays the percentage of CPU usage that occurred when executing at the system (kernel) level.
¡ %steal: Displays the percentage of time spent in involuntary wait by the virtual CPU or CPUs when the hypervisor was servicing another virtual processor.
¡ %iowait: Displays the percentage of time the CPUs were idle during which the system had an outstanding disk I/O request.
¡ %idle: Displays the percentage of time the CPUs were idle.
· Device
¡ tps: Number of IO requests per second that were issued to the device.
¡ Blk_read /s: The amount of data read from the device expressed in blocks per second.
¡ Blk_wrtn/s: The amount of data written to the device expressed in blocks per second.
¡ Blk_read: Total number of blocks read.
¡ Blk_wrtn: Total number of blocks written.
IMPORTANT:
· If the value of %iowait is too high, the disk has IO issues. If the value of %idle is high, the CPUs are idle.
· If the value of %idle is high but the system responds slowly, the CPUs might be waiting for memory allocation. In this case, increase the memory capacity.
· If the value of %idle stays lower than 10, the system has low CPU processing capability.
iostat outputs:
· Blk_read: Total number of blocks read.
· Blk_wrtn: Total number of blocks written.
· kB_read/s: The amount of data read from the driver expressed in kilobytes per second.
· kB_wrtn/s: The amount of data written to the driver expressed in kilobytes per second.
· kB_read: Total number of kilobytes read.
· kB_wrtn: Total number of kilobytes written.
· rrqm/s: Number of read requests merged per second that were queued to the device.
· wrqm/s: Number of write requests merged per second that were queued to the device.
· r/s: Number of read requests completed per second for the device.
· w/s: Number of write requests completed per second for the device.
· rsec/s: Number of sectors read from the device per second.
· wsec/s: Number of sectors written to the device per second.
· rkB/s: The amount of data read from the device expressed in kilobytes per second.
· wkB/s: The amount of data written to the device expressed in kilobytes per second.
· avgrq-sz: Average size (in sectors) of the requests that were issued to the device.
· avgqu-sz: Average queue length of the requests that were issued to the device.
· await: Average time (in milliseconds) for I/O requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
· svctm: Average service time (in milliseconds) for I/O requests that were issued to the device.
· %util: Percentage of CPU time during which I/O requests were issued to the device (bandwidth utilization for the device). Device saturation occurs when this value is close to 100%.
Example:
root@HZ-UIS01-CVK01:~# iostat
Linux 3.13.6 (HZ-UIS01-CVK01) 12/16/2015 _x86_64_ (24 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
20.48 0.00 3.48 0.23 0.00 75.80
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 10.17 1.76 269.57 1309400 201017740
sdb 16.43 181.78 202.21 135552881 150792613
Execute the iostat -d -x -m /dev/sdb 1 5 command to display detailed information about /dev/sdb.
Testing the read and write performance for a disk
dd [option]
Options and parameters:
· if=file: Specifies the input file name. The default is standard input.
· of=file: Specifies the output file name. The default is standard output.
· ibs=bytes: Reads BYTES bytes at a time. One block is BYTES bytes.
· obs=bytes: Writes BYTES bytes at a time. One block is BYTES bytes.
· bs=bytes: Reads and writes BYTES bytes at a time. It can replace ibs and obs.
· cbs=bytes: Converts BYTES bytes at a time. It is the size of the conversion buffer.
· skip=blocks: Skips BLOCKS ibs-sized blocks at start of input.
· seek=blocks: Skips BLOCKS obs-sized blocks at the start of output. This option is valid only when the output file is a disk or tape.
· count=blocks: Copies only BLOCKS input blocks. The block size is the number of bytes specified by ibs.
· conv=ascii: Converts EBCDIC to ASCII.
· conv=ebcdic: Converts ASCII to EBCDIC.
· conv=ibm: Converts ASCII to alternate EBCDIC.
· conv=block: Pads newline-terminated records with spaces to cbs-size.
· conv=unblock: Replaces trailing spaces in cbs-size records with newline.
· conv=ucase: Converts lower-case letters to upper-case letters.
· conv=lcase: Converts upper-case letters to lower-case letters.
· conv=notrunc: Does not truncate the output file.
· conv=swab: Swaps every pair of input bytes.
· conv=noerror: Continue after read errors.
· conv=sync: Pads every input block with NULLs to ibs-size; when used with block or unblock, pad with spaces rather than NULLs.
A number can be followed by a multiplier suffix: b=512, c=1, k=1024, w=2, xM=number*M, kB=1000, K=1024, MB=1000*1000, M=1024*1024, GB=1000*1000*1000, and G=1024*1024*1024.
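The following is a minimal disk performance test sketch. The file path is an example only; make sure the target file system has enough free space and delete the test file afterwards. The oflag=direct and iflag=direct flags bypass the page cache so the result reflects disk rather than memory speed.
# Write test: write a 1 GB file to the disk.
root@HZ-UIS01-CVK01:~# dd if=/dev/zero of=/vms/ddtest.img bs=1M count=1024 oflag=direct
# Read test: read the file back and discard the data.
root@HZ-UIS01-CVK01:~# dd if=/vms/ddtest.img of=/dev/null bs=1M iflag=direct
# Remove the test file.
root@HZ-UIS01-CVK01:~# rm -f /vms/ddtest.img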
Displaying the free and used memory
free [-b|-k|-m|-g] [-t]
Options and parameters:
· -b: Displays output in bytes. You can also use -k (KBytes), -m (MBytes), or -g (GBytes).
· -t: Displays summary for physical memory + swap space.
Example:
root@HZ-UIS01-CVK01:~# free
total used free shared buffers cached
Mem: 65939360 4208888 61730472 0 83224 277944
-/+ buffers/cache: 3847720 62091640
Swap: 10772220 0 10772220
User commands
Creating a user group
groupadd [-g gid] groupname
Options and parameters:
-g: Group ID.
Example:
root@HZ-UIS01-CVK01:~# groupadd -g 1000 it
root@HZ-UIS01-CVK01:/etc# more /etc/group | grep it
it:x:1000:
Deleting a user group
groupdel groupname
Example:
root@HZ-UIS01-CVK01:/etc# more /etc/group | grep it
it:x:1000:
root@HZ-UIS01-CVK01:/etc# groupdel it
root@HZ-UIS01-CVK01:/etc# more /etc/group | grep it
root@HZ-UIS01-CVK01:/etc#
Creating a user
useradd [-u UID] [-g initial_group] [-G supplementary group] [-m/M] [-d home_dir] [-s shell] username
Options and parameters:
· -u: User ID.
· -g: Initial group.
· -G: A list of supplementary groups which the user is also a member of.
· -M: The user home directory will not be created.
· -m: The user’s home directory will be created if it does not exist.
· -d: Specifies a directory as the home directory.
· -s: The name of the user's login shell. If you do not specify this option, the system uses the default login shell.
Example:
root@HZ-UIS01-CVK01:~# useradd -u 1000 -g it -m -d /home/it-user01 -s /bin/bash it-user01
root@HZ-UIS01-CVK01:~# more /etc/passwd | grep it-user01
it-user01:x:1000:1000::/home/it-user01:/bin/bash
root@HZ-UIS01-CVK01:~# ls /home/
it-user01
Deleting a user
userdel [-r] username
Options and parameters:
-r: Deletes files in the user’s home directory along with the home directory itself.
Example:
root@HZ-UIS01-CVK01:~# userdel -r it-user01
root@HZ-UIS01-CVK01:~# more /etc/passwd | grep it-user01
root@HZ-UIS01-CVK01:~# ls /home
root@HZ-UIS01-CVK01:~#
Setting the password
passwd [-l] [-u] [--stdin] [-S] [-n days] [-x days] [-w days] [-i days] username
Options and parameters:
· -l: Locks the password.
· -u: Unlocks the password.
· -S: Displays password related parameters.
· -n: Sets the minimum number of days between password changes.
· -x: Sets the maximum number of days a password remains valid. After MAX_DAYS, the password must be changed.
· -w: Sets the number of days of warning before a password change is required.
· -i: Sets the number of days of inactivity after password expiration before the account is disabled.
Example:
root@HZ-UIS01-CVK01:~# more /etc/passwd | grep it-user01
it-user01:x:1000:1000::/home/it-user01:/bin/bash
root@HZ-UIS01-CVK01:~#
root@HZ-UIS01-CVK01:~# passwd it-user01
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Switching the user account
su [-lm] [-c command] [username]
Options and parameters:
· -: Starts a new login shell as another user. If you do not specify a username, you switch to the root user.
· -l: Similar to the - option, except that you must specify the user account.
· -m: Preserves the current environment.
· -c: Passes a command to the shell.
Example:
root@HZ-UIS01-CVK01:~# su - it-user01
it-user01@HZ-UIS01-CVK01:~$ exit
logout
it-user01@HZ-UIS01-CVK01:~$ su - root
Password:
root@HZ-UIS01-CVK01:~#
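You can also run a single command as another user without starting an interactive shell. For example, the following runs id as the it-user01 account created earlier (the output is illustrative):
root@HZ-UIS01-CVK01:~# su - it-user01 -c "id"
uid=1000(it-user01) gid=1000(it) groups=1000(it)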
File management commands
Changing the group ownership of a file or directory
chgrp [-R] group name directory/file
Options and parameters:
-R: Recursively changes the group of the directory and each file in the directory.
Example:
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxr-xr-x 2 it-user01 it 4096 May 30 15:44 testFile
root@HZ-UIS01-CVK01:/home/it-user01# chgrp root testFile
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxr-xr-x 2 it-user01 root 4096 May 30 15:44 testFile
Changing the file owner and group
chown [-R] user file or directory
chown [-R] user:group name file or directory
Options and parameters:
-R: Recursively changes the ownership of the directory and each file in the directory.
Example:
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxr-xr-x 2 it-user01 it 4096 May 30 15:44 testFile
root@HZ-UIS01-CVK01:/home/it-user01# chown root:root testFile
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxr-xr-x 2 root root 4096 May 30 15:44 testFile
root@HZ-UIS01-CVK01:/home/it-user01#
Changing file or directory mode bits or permissions
chmod [-R] xyz file or directory
Options and parameters:
· xyz: Permissions in numeric form, the sum of the values for r (4), w (2), and x (1) for the owner, group, and others. For example, 754 represents rwxr-xr--.
· -R: Recursively changes file mode bits of the directory and the files in the directory.
Example:
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxr-xr-x 2 it-user01 it 4096 May 30 15:44 testFile
root@HZ-UIS01-CVK01:/home/it-user01# chmod 777 testFile
root@HZ-UIS01-CVK01:/home/it-user01# ls -l
total 4
drwxrwxrwx 2 it-user01 it 4096 May 30 15:44 testFile
root@HZ-UIS01-CVK01:/home/it-user01#
Process management commands
Displaying all running processes
top [-d number] | top [-bnp]
Options and parameters:
· -d: Specifies the delay between screen updates in seconds. The default value is 5 seconds.
· -b: Starts top in Batch mode, which is used to send output from top to a file.
· -n: Specifies the maximum number of iterations, or frames, top can produce before ending. This option is used together with the -b option.
· -p: Monitors only processes with the specified process IDs.
You can use the following interactive commands while top is running:
· ?: Provides a reminder of all the basic interactive commands.
· P: Sorts by CPU usage.
· M: Sorts by memory usage.
· N: Sorts by PID.
· T: Sorts by CPU time used by processes.
· k: You will be prompted for a PID and then the signal to be sent.
· r: You will be prompted for a PID and then the value to nice it to.
· q: Quits top.
Example:
top - 17:40:48 up 2:13, 1 user, load average: 0.45, 0.55, 0.66
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.6%us, 0.1%sy, 0.0%ni, 99.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65939360k total, 5703848k used, 60235512k free, 85832k buffers
Swap: 10772220k total, 0k used, 10772220k free, 1746992k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4939 root 20 0 4583m 1.3g 4728 S 12 2.1 17:36.67 kvm
4874 root 20 0 4520m 908m 4576 S 5 1.4 11:54.61 kvm
4043 root 20 0 10.9g 402m 16m S 1 0.6 13:43.34 java
2370 root 20 0 23676 2168 1316 S 0 0.0 0:30.29 ovs-vswitchd
3184 root 20 0 15972 744 544 S 0 0.0 0:04.78 irqbalance
1 root 20 0 24456 2444 1344 S 0 0.0 0:04.07 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0 0.0 0:00.07 ksoftirqd/0
6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0
Output description:
· The first line displays the following:
¡ Current time and length of time since last boot
¡ Total number of users
¡ System load avg over the last 1, 5 and 15 minutes
A small value indicates that the system is idle. If the value is higher than 1, you must identify whether the system is too busy.
· The second line shows total tasks or threads. If the value for zombie is not 0, you must identify which process has become a zombie process.
· The third line shows the CPU state percentages. You must focus on the %wa parameter, which represents the time waiting for I/O completion. An IO issue can cause a system to respond slowly.
· The fourth and fifth lines show the physical and virtual memory statistics. If the virtual memory usage is high, the physical memory of the system is insufficient.
The lower section displays statistics for each process.
· PID: ID of the process.
· USER: User of the process.
· PR: Priority of the process. A smaller value means the process has a higher execution priority.
· NI: Nice value of the process. A smaller value means the process has a higher execution priority.
· %CPU: CPU usage.
· %MEM: Memory usage.
· TIME+: CPU time.
To view information about a process:
root@HZ-UIS01-CVK01:~# top -d 2 -p 4939
top - 08:59:13 up 17:31, 1 user, load average: 0.75, 0.70, 0.58
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.1%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65939360k total, 6484728k used, 59454632k free, 229880k buffers
Swap: 10772220k total, 0k used, 10772220k free, 1995728k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4939 root 20 0 4583m 1.5g 4728 S 2 2.4 100:48.79 kvm
Returning the status of a process
ps aux
ps -lA
ps axjf
Options and parameters:
· -A: Displays information about all accessible processes on the system.
· -a: Displays information about all processes that are associated with terminals.
· -u: Displays information for processes with user IDs in the userlist.
· -x: Used together with the -a option to display complete information.
Output format:
· l: Displays BSD long format.
· j: BSD job control format.
· -f: Does full-format listing.
# Display bash processes.
root@HZ-UIS01-CVK01:~# ps -l
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 R 0 11338 32857 0 80 0 - 2102 - pts/2 00:00:00 ps
4 S 0 32857 32797 0 80 0 - 5428 wait pts/2 00:00:00 bash
The ps -l command lists only processes related to the current shell environment (bash). The parent of these processes is the bash shell itself, which traces back to the init process.
· F: Flags associated with the process.
¡ 4: Used super-user privileges.
¡ 1: Forked but did not exec.
· S: Process state. R: Running. S: Sleeping. D: Uninterruptible sleep (typically IO). T: Stopped. Z: Defunct (zombie) process, terminated but not reaped by its parent.
· UID/PID/PPID: User ID, process ID, and parent process ID.
· C: CPU usage.
· PRI/NI: Priority and Nice.
· ADDR/SZ/WCHAN: Memory related.
¡ ADDR: Location of the process in the memory. If it is Running, a hyphen (-) is displayed.
¡ SZ: size in physical pages of the core image of the process.
¡ WCHAN: Address of the kernel function where the process is sleeping.
· TTY: Controlling tty (terminal). For a remote login, pts/2 port is used.
· CMD: Command.
# Display all processes.
root@HZ-UIS01-CVK01:~# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 24572 2484 ? Ss 11:20 0:04 /sbin/init
root 2 0.0 0.0 0 0 ? S 11:20 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 11:20 0:00 [ksoftirqd/0]
root 6 0.0 0.0 0 0 ? S 11:20 0:00 [migration/0]
root 7 0.0 0.0 0 0 ? S 11:20 0:00 [watchdog/0]
root 8 0.0 0.0 0 0 ? S 11:20 0:00 [migration/1]
...
root 55719 1.0 0.0 71272 3520 ? Ss 17:42 0:00 sshd: root@pts/3
root 55752 8.6 0.0 21712 4204 pts/3 Ss 17:43 0:00 -bash
root 55927 0.0 0.0 16872 1284 pts/3 R+ 17:43 0:00 ps aux
root 62570 0.0 0.0 0 0 ? S 14:43 0:00 [kworker/7:2]
root 62840 0.0 0.0 0 0 ? S 16:40 0:00 [kworker/u:0]
# Display information about a process.
root@HZ-UIS01-CVK01:~# ps -fu mysql
UID PID PPID C STIME TTY TIME CMD
mysql 3144 1 0 11:21 ? 00:00:46 /usr/sbin/mysqld
Ending a process
kill -signal PID
The following are the signal types:
· 1 SIGHUP: Hangs up or disconnects a process. It's often used to restart a process or to update its configuration.
· 9 SIGKILL: Immediately terminates a process, without allowing it to clean up or save any data.
· 15 SIGTERM: Requests that the process terminate gracefully, allowing it to clean up any resources or save any data before exiting.
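For example, to end the mysqld process shown in the earlier ps example (the PID here is illustrative), request graceful termination first, and terminate the process immediately only if it does not exit:
# Request graceful termination.
root@HZ-UIS01-CVK01:~# kill -15 3144
# If the process does not exit, terminate it immediately.
root@HZ-UIS01-CVK01:~# kill -9 3144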
Networking
Configuring a network interface
# Display enabled network interfaces.
root@HZ-UIS01-CVK01:/etc/network# ifconfig
eth0 Link encap:Ethernet HWaddr 2C:76:8A:5B:3F:A0
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:26 Memory:f6000000-f67fffff
eth1 Link encap:Ethernet HWaddr 2C:76:8A:5B:3F:A4
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
Interrupt:28 Memory:f4800000-f4ffffff
...
The ifconfig -a command displays all network interfaces, including disabled network interfaces.
# Display information about a network interface.
root@HZ-UIS01-CVK01:/etc/network# ifconfig vswitch2
vswitch2 Link encap:Ethernet HWaddr 2C:76:8A:5D:DF:A0
inet addr:192.168.1.11 Bcast:192.168.1.255 Mask:255.255.255.0
inet6 addr: fe80::2e76:8aff:fe5d:dfa0/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:1134578 errors:0 dropped:7658 overruns:0 frame:0
TX packets:1013948 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:165047129 (157.4 Mb) TX bytes:111771007 (106.5 Mb)
# Shut down a network interface.
# ifconfig vswitch2 down
# Start a network interface.
# ifconfig vswitch2 up
# Configure a network interface (the configuration does not survive an interface or system restart).
# ifconfig vswitch2 192.168.2.12 netmask 255.255.255.0
# Restart a network interface.
# /etc/init.d/networking restart
To save the network interface configuration, use the vi editor to modify the /etc/network/interfaces configuration file.
Restart the network interface to have the change take effect.
auto vswitch2
iface vswitch2 inet static
address 192.168.1.11
netmask 255.255.255.0
network 192.168.1.0
broadcast 192.168.1.255
gateway 192.168.1.254
# dns-* options are implemented by the resolvconf package, if installed
dns-nameservers 192.168.1.254
auto eth2
iface eth2 inet static
address 0.0.0.0
netmask 0.0.0.0
Displaying physical NIC information
root@UIS-CVK02:~# ethtool eth1
Settings for eth1:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Half 1000baseT/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Link partner advertised pause frame use: No
Link partner advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: on
Supports Wake-on: g
Wake-on: g
Current message level: 0x000000ff (255)
drv probe link timer ifdown ifup rx_err tx_err
Link detected: yes
Displaying network statistics
netstat -[atunlp]
Options and parameters:
· -a: Displays the state of all sockets and all routing table entries.
· -t: Lists TCP network packet data.
· -u: Lists UDP network packet data.
· -n: Displays network addresses as numbers.
· -l: Lists the services that are being listened to.
· -p: Displays process PID information for the service.
# Display network connection statistics for the service that uses port 8080.
root@HZ-UIS01-CVK01:/etc/network# netstat -an | grep 8080
tcp6 0 0 :::8080 :::* LISTEN
tcp6 0 0 192.168.1.11:8080 10.165.136.197:55954 ESTABLISHED
tcp6 0 0 192.168.1.11:8080 10.165.136.197:55989 TIME_WAIT
tcp6 0 0 192.168.1.11:8080 10.165.136.197:55990 FIN_WAIT2
tcp6 0 0 192.168.1.11:8080 192.168.1.211:53366 ESTABLISHED
tcp6 0 0 192.168.1.11:8080 192.168.1.211:54850 TIME_WAIT
# Display routing information for the system.
root@HZ-UIS01-CVK01:/etc/network# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 0 0 0 vswitch2
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage
192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app
Capturing packets on a network
tcpdump
Options and parameters:
· -a: Converts network and broadcast addresses to names.
· -d: Displays the matching packet code in a human-readable form to standard output, and then stops.
· -dd: Displays the matching packet code in the format of a C program segment.
· -ddd: Displays the matching packet code in decimal format.
· -e: Prints data link layer header information on each output line.
· -t: Does not print timestamps on each output line.
· -vv: Outputs detailed packet information.
· -c: Stops tcpdump after receiving the specified number of packets.
· -i: Specifies the network interface to listen on.
· -w: Writes packets directly to a file without analyzing or printing them.
Example:
tcpdump -i vswitch2 -s 0 -w /tmp/test.cap host 200.1.1.1 &
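The trailing & runs the capture in the background. To stop the capture (assuming it is the only background job in the current shell) and then examine the capture file from the example above:
root@HZ-UIS01-CVK01:~# kill %1
root@HZ-UIS01-CVK01:~# tcpdump -nn -r /tmp/test.cap | head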
Displaying routing information
# Display routing information.
root@HZ-UIS01-CVK01:/etc/network# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 100 0 0 vswitch2
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage
192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app
# Add static routing information to access the network at 10.10.10.0/24.
# route add -net 10.10.10.0 netmask 255.255.255.0 gw 192.168.2.254
root@HZ-UIS01-CVK01:/etc/network#
root@HZ-UIS01-CVK01:/etc/network# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 100 0 0 vswitch2
10.10.10.0 192.168.2.254 255.255.255.0 UG 0 0 0 vswitch-storage
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage
192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app
# Delete routing information.
# route del -net 10.10.10.0 netmask 255.255.255.0 gw 192.168.2.254
root@HZ-UIS01-CVK01:/etc/network# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.1.254 0.0.0.0 UG 100 0 0 vswitch2
192.168.1.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch2
192.168.2.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-storage
192.168.3.0 0.0.0.0 255.255.255.0 U 0 0 0 vswitch-app
The static routing information generated by executing the command is only saved in the system's memory. For the information to take effect permanently, add the command to the system startup script so it can be executed during the startup process.
Use the vi editor in the operating system of UIS Manager to edit the /etc/rc.local file.
Add routing commands in the file. Restart the system for the modification to take effect.
root@HZ-UIS01-CVK01:/etc/network# vi /etc/rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.
route add -net 192.168.5.0 netmask 255.255.255.0 gw 192.168.2.254
ulimit -s 10240
ulimit -c 1024
touch /var/run/h3c_UIS_cvk
/usr/bin/set-printk-console 2
exit 0
Disk management commands
Displaying the disk capacity
df [-ahikHTm] [directory or file]
Options and parameters:
· -a: Lists all file systems, including system-specific file systems such as /proc.
· -k: Displays the capacity of each file system in KBytes.
· -m: Displays the capacity of each file system in MBytes.
· -h: Displays the capacity of each file system in a human readable format, such as GBytes, MBytes, and KBytes.
· -H: Uses M=1000K instead of M=1024K for displaying capacities in larger units.
· -T: Lists the file system name of each partition, such as ext3.
· -i: Displays the number of inodes instead of disk usage.
# Display the partition size.
root@HZ-UIS01-CVK01:/etc/network# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 28G 2.4G 25G 9% /
udev 32G 4.0K 32G 1% /dev
tmpfs 13G 396K 13G 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 32G 17M 32G 1% /run/shm
/dev/sda6 241G 48G 181G 21% /vms
# Display the file system type of each partition.
root@HZ-UIS01-CVK01:/etc/network# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 28G 2.4G 25G 9% /
udev devtmpfs 32G 4.0K 32G 1% /dev
tmpfs tmpfs 13G 396K 13G 1% /run
none tmpfs 5.0M 0 5.0M 0% /run/lock
none tmpfs 32G 17M 32G 1% /run/shm
/dev/sda6 ext4 241G 48G 181G 21% /vms
Displaying the disk usage
du [-ahskm] file or directory name
Options and parameters:
· -a: Lists the capacity of all files or directories.
· -h: Displays the capacity of each file system in a human readable format, such as G/M.
· -s: Displays the total capacity.
· -S: Does not include statistics from subdirectories, which is slightly different from -s.
· -k: Displays the capacity in KBytes.
· -m: Displays the capacity in MBytes.
Example:
root@HZ-UIS01-CVK01:/vms# du -sh *
15G images
11G isos
16K lost+found
3.4G rhel-server-6.1-x86_64-dvd.iso
4.0K share
4.0K share-test
17G templet
4.0K test
Partitioning a disk
fdisk [-l] disk name
Options and parameters:
-l: Lists the partition tables for the specified disk.
If no disk is specified, the system lists all partitions of all disks in the system.
Example:
root@HZ-UIS01-CVK01:~# fdisk -l
Disk /dev/sda: 300.0 GB, 299966445568 bytes
255 heads, 63 sectors/track, 36468 cylinders, total 585871964 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 262144 bytes / 262144 bytes
Disk identifier: 0x00051ce2
Device Boot Start End Blocks Id System
/dev/sda1 * 512 58593791 29296640 83 Linux
/dev/sda2 58594302 585871359 263638529 5 Extended
Partition 2 does not start on physical sector boundary.
/dev/sda5 58594304 80138751 10772224 82 Linux swap / Solaris
/dev/sda6 80139264 585871359 252866048 83 Linux
Disk /dev/sdb: 4294 MB, 4294967296 bytes
133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
Disk /dev/sdb doesn't contain a valid partition table
# Create a partition on a disk.
root@HZ-UIS01-CVK01:~# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xeb665aa3.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): m
Command action
a toggle a bootable flag
b edit bsd disklabel
c toggle the dos compatibility flag
d delete a partition
l list known partition types
m print this menu
n add a new partition
o create a new empty DOS partition table
p print the partition table
q quit without saving changes
s create a new empty Sun disklabel
t change a partition's system id
u change display/entry units
v verify the partition table
w write table to disk and exit
x extra functionality (experts only)
Command (m for help): p
Disk /dev/sdb: 4294 MB, 4294967296 bytes
133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xeb665aa3
Device Boot Start End Blocks Id System
Command (m for help): n
Partition type:
p primary (0 primary, 0 extended, 4 free)
e extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-8388607, default 2048)
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-8388607, default 8388607): 4000000
Command (m for help): n
Partition type:
p primary (1 primary, 0 extended, 3 free)
e extended
Select (default p): p
Partition number (1-4, default 2): 2
First sector (4000001-8388607, default 4000001)
Using default value 4000001
Last sector, +sectors or +size{K,M,G} (4000001-8388607, default 8388607): +500M
Command (m for help): p
Disk /dev/sdb: 4294 MB, 4294967296 bytes
133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xeb665aa3
Device Boot Start End Blocks Id System
/dev/sdb1 2048 4000000 1998976+ 83 Linux
/dev/sdb2 4000001 5024000 512000 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
# Display disk partition information.
root@HZ-UIS01-CVK01:~# fdisk -l /dev/sdb
Disk /dev/sdb: 4294 MB, 4294967296 bytes
133 heads, 62 sectors/track, 1017 cylinders, total 8388608 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xeb665aa3
Device Boot Start End Blocks Id System
/dev/sdb1 2048 4000000 1998976+ 83 Linux
/dev/sdb2 4000001 5024000 512000 83 Linux
Making a file system
mkfs [-t file system format] disk name
Options and parameters:
-t: Specifies the file system type, for example, ext2, ext3, ext4, or ocfs2.
# Make an ext3 file system on /dev/sdb1.
root@HZ-UIS01-CVK01:~# mkfs -t ext3 /dev/sdb1
mke2fs 1.42 (29-Nov-2011)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
125184 inodes, 499744 blocks
24987 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=515899392
16 block groups
32768 blocks per group, 32768 fragments per group
7824 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
root@HZ-UIS01-CVK01:~#
# Make an ocfs2 file system on /dev/sdb2.
root@HZ-UIS01-CVK01:~# mkfs -t ocfs2 /dev/sdb2
mkfs.ocfs2 1.6.3
Cluster stack: classic o2cb
Label:
Features: sparse backup-super unwritten inline-data strict-journal-super xattr
Block size: 1024 (10 bits)
Cluster size: 4096 (12 bits)
Volume size: 524288000 (128000 clusters) (512000 blocks)
Cluster groups: 17 (tail covers 5120 clusters, rest cover 7680 clusters)
Extent allocator size: 2097152 (1 groups)
Journal size: 16777216
Node slots: 2
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing backup superblock: 0 block(s)
Formatting Journals: done
Growing extent allocator: done
Formatting slot map: done
Formatting quota files: done
Writing lost+found: done
mkfs.ocfs2 successful
root@HZ-UIS01-CVK01:~#
Checking a disk
fsck [-t file system format] [-ACay] disk name
Options and parameters:
· -t: Specifies the file system type. This option is typically not required, because the current Linux system automatically distinguishes file system types through the superblock.
· -A: Scans the necessary disks based on the content of /etc/fstab. This command is typically executed during the boot process.
· -a: Automatically repairs detected abnormal sectors, so you don't have to keep pressing y.
· -y: Similar to -a, but some file systems only support the -y parameter.
· -C: Enables a histogram to display the current progress during the check.
# Check the /dev/sdb1 partition.
root@HZ-UIS01-CVK01:~# fsck -C /dev/sdb1
fsck from util-linux 2.20.1
e2fsck 1.42 (29-Nov-2011)
/dev/sdb1: clean, 11/125184 files, 16807/499744 blocks
Mounting a file system
mount [-t file system type] [-L label name] [-o additional options] [-n] device name mount point
Options and parameters:
· -a: Mounts all file systems based on the data in the /etc/fstab configuration file.
· -l: Displays the label name in addition to the mounting information.
· -t: Specifies the type of file system to be mounted.
· -n: Mounts the file system without writing the mounting information to /etc/mtab. By default, the system writes the actual mounting information to /etc/mtab in real time to facilitate operation of other programs.
· -L: Mounts the partition that has the specified label.
· -o: Specifies additional mount options, for example, account, password, or read privileges.
# Mount /dev/sdb1 to /mnt.
root@HZ-UIS01-CVK01:~# mount /dev/sdb1 /mnt
root@HZ-UIS01-CVK01:~# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 28G 5.7G 21G 22% /
udev devtmpfs 32G 4.0K 32G 1% /dev
tmpfs tmpfs 13G 408K 13G 1% /run
none tmpfs 5.0M 0 5.0M 0% /run/lock
none tmpfs 32G 17M 32G 1% /run/shm
/dev/sda6 ext4 241G 48G 181G 21% /vms
/dev/sdb1 ext3 1.9G 35M 1.8G 2% /mnt
Unmounting a file system
umount [-fn] disk file name
Options and parameters:
· -f: Unmounts a file system forcibly. Use this parameter if no data can be read from a network file system (NFS).
· -n: Unmounts a file system without writing the change to the /etc/mtab file.
Example:
root@HZ-UIS01-CVK01:~# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 28G 5.7G 21G 22% /
udev devtmpfs 32G 4.0K 32G 1% /dev
tmpfs tmpfs 13G 408K 13G 1% /run
none tmpfs 5.0M 0 5.0M 0% /run/lock
none tmpfs 32G 17M 32G 1% /run/shm
/dev/sda6 ext4 241G 48G 181G 21% /vms
/dev/sdb1 ext3 1.9G 35M 1.8G 2% /mnt
root@HZ-UIS01-CVK01:~#
root@HZ-UIS01-CVK01:~# umount /mnt
root@HZ-UIS01-CVK01:~# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/sda1 ext4 28G 5.7G 21G 22% /
udev devtmpfs 32G 4.0K 32G 1% /dev
tmpfs tmpfs 13G 408K 13G 1% /run
none tmpfs 5.0M 0 5.0M 0% /run/lock
none tmpfs 32G 17M 32G 1% /run/shm
/dev/sda6 ext4 241G 48G 181G 21% /vms
Writing data to a disk
Use the sync command to write data that has not yet been flushed from memory to disk.
Example:
root@HZ-UIS01-CVK01:~# sync
root@HZ-UIS01-CVK01:~#