H3C ONEStor Distributed Storage System V3
Component Replacement Guide
Document version: 6W100-20240630
Copyright © 2024 New H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.
Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.
The information in this document is subject to change without notice.
Contents
Application scenarios and precautions
Replacing the CPU, system board, or network adapter
Viewing the software version through the storage system management interface
Viewing the software version through the operating system
Checking the cluster health state
Checking the cluster hardware state
Replacing the CPU, system board, or NIC without shutting down the faulty node
Replacing the CPU, system board, or NIC with the faulty node shut down
Using Toolkit to perform an inspection
Replacing components such as memory modules, RAID controllers, and drive backplanes
Component replacement workflow
Identifying the software version
Viewing the software version on the ONEStor management interface
Viewing the software version on the operating system
Checking the cluster health state
Manually checking the cluster health state
Checking the service load of the cluster
Checking the hardware state of the cluster
Using Toolkit to perform an inspection
Component replacement workflow
Identifying the software version
Viewing the software version on the ONEStor management interface
Viewing the software version on the operating system
Querying disk letters through disk slot number
Determining the hard disk type through partition and mounting
Checking the cluster health state
Checking the service load of the cluster
Checking the hardware state of the cluster
Replacing data disks (without using one-click disk replacement)
One-click replacing data disks
Replacing FlashCache and Scache cache disks (one-click disk replacement)
Replacing FlashCache cache disks (ONEStor R21xx non-one-click disk replacement)
Replacing FlashCache cache disks (ONEStor E33xx non-one-click disk replacement)
Replacing Scache cache disks (ONEStor E31xx, E33xx, one-click disk replacement)
Replacing Scache cache disks (ONEStor E31xx, E33xx non-one-click disk replacement)
Using Toolkit to perform an inspection
Common operations during disk replacement
Finding FlashCache partition information
Finding and deleting FlashCache information and cache partitions
Finding and deleting Scache information and cache partitions
Overview
This document describes how to replace components in the nodes of the H3C ONEStor distributed storage system (hereinafter referred to as ONEStor).
Application scenarios and precautions
When you use this document, follow these restrictions and guidelines:
· This document is applicable only to standalone deployment of ONEStor. If ONEStor is deployed in consolidation with other products (such as H3C CAS), contact Technical Support for the operation methods on these products.
· This document primarily explains software-related operations during the replacement process of cluster node components. For specific hardware installation and removal methods, see the user guide or hardware manual of the product.
· The information in this document is subject to change without notice due to product version upgrade or other reasons. To obtain the most recent document, contact Technical Support.
· Due to product version upgrades or other reasons, the product interface and feature parameters might change.
· As a best practice, before executing component replacement operations according to this document, use the toolkit to inspect and record relevant information on site.
· Replacing certain components (such as the CPU, system board, and network adapter) might change the hardware information of the device, causing the installed licenses to become invalid. In this case, contact Technical Support to submit a license change request to update the hardware information bound to the licenses.
· This document describes RAID controller operations specific to H3C servers. For RAID controller operations on servers of other brands, contact the respective manufacturers.
Technical support
New H3C Technologies Co., Ltd. always strives to offer the most convenient and high-quality products. If you have any questions about the operations, stop the operation and obtain help through the following channels:
· Hotline: 400-810-0504
· Service website: zhiliao.h3c.com
· Technical support mailbox: [email protected]
Replacing the CPU, system board, or network adapter
Component replacement process
The workflow for replacing a CPU, the system board, or a network adapter in a cluster node is as shown in Figure 1.
Figure 1 Workflow for replacing a CPU, the system board, or a network adapter in a cluster node
Checking the software version
Before replacing components, identify the product software version, and then perform the replacement procedure that applies to that version.
Viewing the software version through the storage system management interface
Log into the storage system management interface of the cluster, as shown in Figure 2. You can view the software version in the user information area. In some versions, you can click the button in the upper right corner of the interface to view the software version. Table 1 shows the relationship between software version and product version.
Figure 2 Viewing the software version (management interface)
Table 1 Relationship between software version and product version
Software version | Product version
E011x, R011x | ONEStor 1.0
E03xx, R03xx | ONEStor 2.0
E21xx, R21xx, E31xx, R31xx, E33xx | ONEStor 3.0
E51xx, E52xx | ONEStor 5.0
Viewing the software version through the operating system
Log into the operating system's CLI of the node, as shown in Figure 3. Execute the cat /etc/onestor_external_version command to view the software version. Table 2 shows the relationship between software version and product version.
Figure 3 Viewing the software version (operating system)
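For example, on a node running an E33xx version, the check might look like the following (the version string shown is only an illustration):
cat /etc/onestor_external_version
E3321    //Example output only. E33xx versions correspond to ONEStor 3.0 in Table 2.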
Table 2 Relationship between software version and product version
Software version | Product version
E011x, R011x | ONEStor 1.0
E03xx, R03xx | ONEStor 2.0
E21xx, R21xx, E31xx, R31xx, E33xx | ONEStor 3.0
E51xx, E52xx | ONEStor 5.0
Checking the cluster state
CAUTION: Before replacing a component, complete all the checks for the cluster as outlined in this section and make sure all checklist items meet the conditions for component replacement.
Checking the cluster health state
1. As shown in Figure 4, log in to the system management page of the storage system. On the Dashboard page, verify that the cluster health state is 100% and that no alarms are present in the cluster. If the cluster health state is not 100% or alarms are present, wait for the cluster health state to automatically recover and troubleshoot the faults that triggered the alarms. If the cluster health state does not recover after a period of time, contact Technical Support.
Figure 4 Confirming the cluster health and alarm state
2. As shown in Figure 5, execute the watch ceph -s command from the CLI of the operating system on any node in the cluster. Then, continuously monitor the cluster health state for about a minute and verify that the health state recovers to Health_OK. If the health state is not Health_OK, contact Technical Support.
Figure 5 Identifying the cluster state from the CLI
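In the command output, focus on the health field. An abbreviated, illustrative example for a healthy cluster follows:
health HEALTH_OK    //Expected value. Any other value requires troubleshooting before you continue.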
Checking the cluster workload
Checking the iostat state
1. Log in to the operating system's CLI of all nodes in the cluster through SSH.
2. As shown in Figure 6, execute the iostat -x 1 command and then continuously monitor the CPU usage and disk pressure of all nodes. This command outputs iostat information every second. As a best practice, observe each node for about 2 minutes. The key thresholds for cluster service pressure are as follows:
¡ The CPU's %idle value is above 40.
¡ The disk I/O busy rate (%util) is below 40%.
¡ The average processing time for each I/O request (svctm) is under 20 milliseconds.
¡ The average I/O wait time (await), average read operation wait time (r_await), and average write operation wait time (w_await) are below 20 milliseconds.
NOTE: It is normal for a parameter value to momentarily exceed the upper limit. If a parameter value continuously remains above the upper limit, wait for the service pressure to decrease or pause some operations until the cluster service pressure meets the conditions, and then proceed with subsequent actions.
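For reference, the relevant columns appear in the iostat -x output as follows (the device name and values are illustrative of a lightly loaded node):
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.21    0.00    2.15    1.02    0.00   93.62    //%idle is above 40
Device: ...  r_await  w_await  svctm  %util
sdb     ...     1.52     2.30   0.85   6.40    //Latency values are below 20 ms and %util is below 40%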
Checking the memory usage
1. Log in to the operating system's CLI of all nodes in the cluster through SSH.
2. Execute the sync;echo 3 > /proc/sys/vm/drop_caches command to release the memory cache, and then wait for about 1 minute.
3. As shown in Figure 7, execute the free -m command to check the memory usage. Make sure the memory usage remains below 60%. The memory usage is the ratio of the used memory to the total memory capacity.
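For example (the figures below are illustrative only):
free -m
              total        used        free   ...
Mem:         128000       51200       60800   ...
//Memory usage = used / total = 51200 / 128000 = 40%, which is below the 60% threshold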
Checking the configuration
Checking the switch configuration
Verify if the storage switch and service switch have STP enabled.
· If STP is enabled, verify that the port connecting the switch to the server is configured as an edge port. For specific methods, see the command reference for the switch.
· If STP is not enabled, skip this check.
NOTE: To change the switch configuration, contact Technical Support.
Checking the host route information
1. As shown in Figure 8, execute the route -n command in the operating system on all nodes within the cluster to check the route information of the nodes.
Figure 8 Checking the node route information
2. As shown in Figure 9, execute the cat /etc/sysconfig/network-scripts/route-ethxx command on all nodes in the cluster (where ethxx is the network adapter port corresponding to the route) to check if the network configuration files of the nodes contain the relevant route configuration. If the files do not contain the relevant route configuration, write the corresponding route configuration into the /etc/sysconfig/network-scripts/route-ethxx file. If the file does not exist on a node, create the file manually.
Figure 9 Checking the node configuration file
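A route entry in the configuration file typically uses the following format (the port name, destination network, and gateway below are examples only):
cat /etc/sysconfig/network-scripts/route-eth0
172.16.20.0/24 via 172.16.10.1 dev eth0    //Destination network, next-hop gateway, and egress port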
Checking the drive cache
NOTE:
· Execute the following operations on all nodes within the cluster. If the results do not match the expectations, contact Technical Support.
· This document describes RAID controller operations specific to H3C servers. For RAID controller operations on servers of other brands, contact the respective manufacturers.
PMC RAID controller (PM8060)
1. As shown in Figure 10, execute the arcconf getconfig x pd | grep -i "write cache" command (where x is the storage controller number) to check if the drive write cache on the node is disabled. Make sure all the output results are Disabled (write-through).
Figure 10 Checking the drive write cache
2. Execute the arcconf getconfig x ld command (where x is the storage controller number). Verify that read-write cache is enabled and the power-fail safeguard mode is set for all HDD logical drives on the node, and that read-write cache is disabled for all SSD logical drives. Normally, the states of HDDs and SSDs are as shown in Figure 11 and Figure 12.
Figure 11 Example of normal HDD states
Figure 12 Example of normal SSD states
PMC RAID controller (P460)
1. As shown in Figure 13, execute the arcconf getconfig x ad |grep " Physical Drive Write Cache Policy Information" -A4 command (where x is the RAID controller number) to check if the drive write cache is disabled on the node. Verify that all output results are Disabled.
Figure 13 Checking the drive write cache
2. As shown in Figure 14, execute the arcconf getconfig x ad | grep -i cache command to query the RAID controller configuration (where x is the RAID controller number). By default, the Read Cache is 10%, the Write Cache is 90%, and the No-Battery Write Cache is disabled.
Figure 14 Viewing the RAID controller configuration
3. Execute the arcconf getconfig x ld command (where x is the storage controller number). Verify that read-write cache is enabled and the power-fail safeguard mode is set for all HDD logical drives on the node, and that read-write cache is disabled for all SSD logical drives. Normally, the states of HDDs and SSDs are as shown in Figure 15 and Figure 16.
Figure 15 Example of normal HDD states
Figure 16 Example of normal SSD states
LSI RAID controller
1. As shown in Figure 17, execute the /opt/raid/MegaRAID/storcli/storcli64 show command to determine the compatibility between StorCLI and the RAID controller. Based on the results, select the appropriate instruction set for subsequent operations.
¡ If the output information includes the target LSI RAID controller, it indicates that StorCLI is compatible with this RAID controller. To obtain the disk letter, see the procedure in this section.
¡ If the output information does not include the target LSI RAID controller, it indicates that StorCLI is incompatible with that RAID controller. To obtain the disk letter, see "Common commands used in MegaCLI" to use MegaCLI commands.
CAUTION: Before performing LSI RAID controller operations, make sure the RAID controller is compatible and select the correct instruction set to avoid anomalies caused by compatibility issues.
Figure 17 Identifying the compatibility
2. View the correspondence between the enclosure number, drive slot number, logical array, drive DID, and disk letter under the RAID controller.
a. Execute the /opt/raid/MegaRAID/storcli/storcli64 /cN show all command (where N represents the RAID controller number) to view information about physical drives and virtual drives. In the PD LIST, use the enclosure number and drive slot number to obtain the corresponding DG. Then, find the logical array VD in the VD LIST based on the DG.
Figure 18 Mappings between PD and VD
b. Execute the /opt/raid/MegaRAID/storcli/storcli64 /cN/vx show all command (where N represents the RAID controller number and x represents the VD number) to obtain the disk letter. In this example, the disk letter corresponding to VD 238 is /dev/sdb.
Figure 19 Obtaining the disk letter
3. Execute the /opt/raid/MegaRAID/storcli/storcli64 /cN show all command (where N represents the RAID controller number) to view the read and write cache of the logical drive, as shown in Figure 20. RWTD represents R (read cache enabled) and WT (write cache disabled). The read-write cache states are as shown in Table 3. Write cache enabling modes include WB and AWB. Using the WB mode requires a functioning battery backup unit (BBU). If the BBU is not in place or its state is abnormal, use the AWB mode.
Figure 20 Viewing the state of logical drive cache
Table 3 Logic drive cache states
Cache type | Field value | Description
Read cache | R(RA) | Enabled
Read cache | NR(NORA) | Disabled
Write cache | WB/AWB | Enabled
Write cache | WT | Disabled
4. To query the BBU state, as shown in Figure 21, execute the /opt/raid/MegaRAID/storcli/storcli64 /cN show all | grep -i bbu command (where N represents the RAID controller number).
Figure 21 Viewing the BBU state
5. As shown in Figure 22, execute the smartctl -g wcache -d megaraid,DID /dev/sdx command to query the drive write cache. Replace DID with the drive DID and x with the disk letter as shown in Figure 18.
Figure 22 Viewing the drive write cache
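For example, for a hypothetical drive with DID 10 that maps to /dev/sdb:
smartctl -g wcache -d megaraid,10 /dev/sdb
Write cache is:   Disabled    //Expected result: the drive write cache is disabled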
HP SSA RAID controller
1. As shown in Figure 23, execute the hpssacli ctrl all show config detail | grep -i cache command to verify if the drive write cache is disabled on the node. By default, the Cache Ratio is 10% read and 90% write, the Drive Write Cache is disabled, and the No-Battery Write Cache is disabled.
Figure 23 Checking the drive write cache
2. As shown in Figure 24, execute the hpssacli ctrl slot=x ld all show detail command (where x represents the RAID controller number) to verify that the cache mode settings of each logical drive are correct. Normally, the LD Acceleration Method is Controller Cache for HDDs and Disabled or Smart IO Path for SSDs.
Figure 24 Checking the cache mode
3. As shown in Figure 25, execute the hpssacli ctrl all show config detail | grep -i Power command to verify if the Current Power Mode of the storage controller is set to Max Performance mode.
Figure 25 Checking the storage controller mode
Checking the cluster hardware state
Log in to the HDM/iLO of all nodes in the cluster to examine for any hardware errors. If errors exist on a hardware component other than the component to be replaced, contact Technical Support.
Checking the NTP clock
NOTE: For version E33xx, if the NTP state is not as expected, you can execute the date -s "yy/mm/dd hh:mm:ss" command in the operating system to adjust the system time online, synchronizing the time across all nodes in the cluster.
As shown in Figure 26, execute the ntpq -p command in the CLI of the operating system on all nodes within the cluster to check the NTP server state of all nodes. Under normal circumstances, all nodes point to the same NTP server, the server's state is not INIT, and the offset value is within 100 ms. If the NTP state is not as expected, contact Technical Support. Parameter descriptions:
· An asterisk (*) before the server IP address indicates that the current server is the primary NTP server. A plus sign (+) indicates that the server is a secondary NTP server.
· When the Refid state is INIT, the NTP server state is abnormal.
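An illustrative ntpq -p output for a healthy node follows (the server address and values are examples only):
ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*172.16.10.10    LOCAL(0)        6 u   35   64  377    0.210    0.038   0.016
//The asterisk marks the primary NTP server, the refid is not INIT, and the offset (0.038 ms) is within 100 ms.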
Replacing components
The component replacement method differs depending on whether the faulty node is still running or has been shut down because of the fault. Perform the corresponding operations based on the on-site situation.
Replacing the CPU, system board, or NIC without shutting down the faulty node
If the faulty node is not down due to hardware failure, see this section to replace components. Figure 27 shows the replacement workflow.
Figure 27 Workflow for replacing the CPU, system board, or NIC without shutting down the faulty node
Enabling maintenance mode
Procedure for the scenario where you can log in to the management interface
1. Log in to the management interface.
2. As shown in Figure 28, from the left navigation pane, select Hosts > Storage Nodes. Click More in the Actions column for the target storage node, and then select Maintenance Mode.
Figure 28 Enabling maintenance mode (New UI)
3. As shown in Figure 29, in the dialog box that opens, select On to enable the maintenance mode, and then click OK.
Figure 29 Enabling maintenance mode (New UI)
Procedure for the scenario where you cannot log in to the management interface
If you cannot log in to the management interface, enable maintenance mode for a node through the CLI.
1. Log in to the operating system of any node in the cluster via SSH.
2. In the operating system, execute the ceph osd set-osd noout osd_id and ceph osd set-osd noup osd_id commands to enable the maintenance mode. The osd_id argument is the OSD ID for which the maintenance mode is to be enabled. You can specify multiple OSD IDs as needed, for example, ceph osd set-osd noout 1 2 3 and ceph osd set-osd noup 1 2 3.
3. As shown in Figure 30, execute the ceph -s command to verify that the cluster state has changed to Health_WARN and the system prompts noup, noout flag(s) set.
Figure 30 Enabling maintenance mode from the CLI
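An abbreviated, illustrative ceph -s output with maintenance mode in effect follows:
health HEALTH_WARN
       noup,noout flag(s) set    //These flags confirm that maintenance mode is enabled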
Disconnecting the faulty node from the network
Disconnect the network cables of the faulty node from the management network, storage network, and service network.
NOTE: Before disconnecting the network cables, record the network cable connections in advance to facilitate the restoration of the node networks after component replacement.
Manually stopping OSDs
1. Execute the systemctl stop ceph-osd.target command on the faulty node.
2. Wait for about 1 minute, and then execute the ceph osd tree command to verify that only the state of all OSDs on the faulty node has changed to down, and the state of OSDs on other nodes remains up.
3. Execute the ceph -s command to verify that the PGs are not in pg peering, pg stale, pg activating, pg incomplete, or pg inactive state. A quick filtering sketch is provided after the following note.
NOTE: PG peering, PG stale, and PG activating are intermediate states of PGs after an OSD is stopped. They typically end automatically within a few seconds to tens of seconds. If the states persist for a long period of time, contact Technical Support for help.
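To quickly check for these PG states, you can filter the ceph -s output, for example (a convenience sketch only):
ceph -s | grep -E "peering|stale|activating|incomplete|inactive"    //No output means no PG is in any of these states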
Manually shutting down a node
1. Execute the sync command to flush the memory.
2. Execute the hwclock -w command to write the clock to the BIOS.
3. Execute the shutdown -h now command to shut down the node. During the shutdown process, keep observing the power status on the HDM interface to make sure the shutdown command does not fail or get stuck.
Replacing components
After the node is properly shut down, power off the faulty node and proceed with the hardware replacement. For more information, see the user guide or hardware configuration guide for the product.
Starting the node
1. After replacing components, power on the node and check the HDM page for any component error messages.
2. Log in to the remote console through HDM, and identify whether any error messages were generated during startup inspection. If any error messages were generated, resolve the issues first.
3. After the node starts up, log in to the CLI of the node through the HDM remote console, and then execute the date command to identify whether the system time of the node is consistent with that of other cluster nodes. If they are inconsistent, execute the date -s command to manually set the system time of the node. Make sure the time difference between the node and the other cluster nodes is smaller than seven seconds. Then, execute the hwclock -w command to synchronize the system clock to hardware.
4. Execute the ifconfig -a command to identify whether the name of the physical NIC is changed after the hardware replacement.
¡ If the NIC name is not changed, connect the management network cables, and then execute the ping command to identify whether the faulty node can communicate with other nodes over the management network. If they can communicate with each other, proceed to the next step. If they cannot communicate with each other, check for abnormal network ports and links.
¡ If the NIC name is changed, reconfigure the network ports of the related NIC. For more information, contact Technical Support. After configuration, connect the management network cables, and then execute the ifup ethxx command to start the management network. ethxx is the name of the physical port connected to the management network.
5. Reconnect the other network cables. Execute the ifup ethxx command to manually bring up the physical NIC ports. ethxx represents the name of a physical NIC port. Then, execute the ip addr command to verify that all physical NIC ports are up. For example:
ifup ethB03-0 //ethB03-0 is the name of the physical network interface
6. Verify that all bond interfaces are running correctly. Example:
root@node1:~# cat /proc/net/bonding/bond0 //bond0 is the name of the bond port. All bond ports must be checked.
The output is as follows:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (0)
MII Status: up //The bond port is up.
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 13
Partner Key: 1
Partner Mac Address: 38:91:d5:e0:6c:99
Slave Interface: p4p1
MII Status: up //Member port p4p1 of bond0 is up.
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 8c:dc:d4:15:f7:58
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: p4p2
MII Status: up //Member port p4p2 of bond0 is up.
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 8c:dc:d4:15:f7:5c
Aggregator ID: 1
Slave queue ID: 0
7. Use the ping command to verify that the node can communicate with other nodes in the cluster on both the storage and service networks. As a best practice, keep the ping operation for one minute. If packet loss does not occur, the network is running correctly. If the ping operation fails or packet loss occurs, resolve the network issue first.
8. Use the ping command to verify that the node can communicate with the client on the service network. As a best practice, keep the ping operation for one minute. If packet loss does not occur, the network is running correctly. If the ping operation fails or packet loss occurs, resolve the network issue first.
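For example, to keep the ping operation for about one minute (the address below is hypothetical):
ping -c 60 172.16.10.12    //Sends 60 packets at one-second intervals; 0% packet loss indicates a healthy link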
Disabling maintenance mode
Procedure for the scenario where you can log in to the management interface
If the maintenance mode is previously enabled through the management interface of the storage system, log in to the management interface of the storage system to disable the maintenance mode.
1. As shown in Figure 31, from the left navigation pane, select Hosts > Storage Nodes. Click More in the Actions column for the target storage node, and then select Maintenance Mode.
Figure 31 Disabling the maintenance mode (New UI)
2. As shown in Figure 32, in the dialog box that opens, select Off to disable the maintenance mode, and then click OK.
Figure 32 Disabling the maintenance mode (New UI)
Procedure for the scenario where you cannot log in to the management interface
If the maintenance mode is previously enabled through the operating system CLI of the node, disable it through the operating system CLI of the node.
1. Log in to the operating system of any cluster node through SSH, and then access the CLI.
2. In the operating system, execute the ceph osd unset-osd noout osd_id and ceph osd unset-osd noup osd_id commands to disable the maintenance mode. osd_id is the OSD ID for which the maintenance mode is to be disabled. You can specify multiple OSD IDs as needed, for example, ceph osd unset-osd noout 1 2 3 and ceph osd unset-osd noup 1 2 3.
3. Wait for five minutes, and then execute the ceph osd tree command to identify whether all OSDs on the node are up. If any OSD on a node is not up, execute the ceph-disk activate-all command on the node to pull up all OSDs. Then, execute the ceph osd tree command again to verify that all OSDs on the node are up.
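For example, a quick way to confirm the OSD states (a convenience sketch only):
ceph osd tree | grep down    //Empty output means no OSD is down
ceph-disk activate-all        //Run on the node only if some of its OSDs are still not up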
Checking the cluster state
Log in to the management interface of the storage system, keep observing the cluster's health status until the health status recovers to 100% and all alarms are cleared.
Replacing the CPU, system board, or NIC with the faulty node shut down
If the faulty node goes down due to hardware failure, see this section to replace components. Figure 33 shows the replacement workflow.
Figure 33 Workflow for replacing the CPU, system board, or NIC with the faulty node shut down
Disconnecting the faulty node from the network
Disconnect the network cables of the faulty node from the management network, storage network, and service network.
NOTE: Before disconnecting the network cables, record the network cable connections in advance to facilitate the restoration of the node networks after component replacement.
Enabling maintenance mode
Procedure for the scenario where you can log in to the management interface
1. Log in to the management interface.
2. As shown in Figure 34, from the left navigation pane, select Hosts > Storage Nodes. Click More in the Actions column for the target storage node, and then select Maintenance Mode.
Figure 34 Enabling maintenance mode (New UI)
3. As shown in Figure 35, in the dialog box that opens, select On to enable the maintenance mode, and then click OK.
Figure 35 Enabling maintenance mode (New UI)
Procedure for the scenario where you cannot log in to the management interface
If you cannot log in to the management interface, enable maintenance mode for a node through the CLI.
1. Log in to the operating system of any node in the cluster via SSH.
2. In the operating system, execute the ceph osd set-osd noout osd_id and ceph osd set-osd noup osd_id commands to enable the maintenance mode. The osd_id argument is the OSD ID for which the maintenance mode is to be enabled. You can specify multiple OSD IDs as needed, for example, ceph osd set-osd noout 1 2 3 and ceph osd set-osd noup 1 2 3.
3. As shown in Figure 36, execute the ceph -s command to verify that the cluster state has changed to Health_WARN and the system prompts noup, noout flag(s) set.
Figure 36 Enabling maintenance mode from the CLI
Replacing components
Power off the faulty node, and then proceed with the hardware replacement. For more information, see the user guide or hardware configuration guide for the product.
Starting the node
1. After replacing components, power on the node and check the HDM page for any component error messages.
2. Log in to the remote console through HDM, and identify whether any error messages were generated during startup inspection. If any error messages were generated, resolve the issues first.
3. After the node starts up, log in to the CLI of the node through the HDM remote console, and then execute the date command to identify whether the system time of the node is consistent with that of other cluster nodes. If they are inconsistent, execute the date -s command to manually set the system time of the node. Make sure the time difference between the node and the other cluster nodes is smaller than seven seconds. Then, execute the hwclock -w command to synchronize the system clock to hardware.
4. Execute the ifconfig -a command to identify whether the name of the physical NIC is changed after the hardware replacement.
¡ If the NIC name is not changed, connect the management network cables, and then execute the ping command to identify whether the faulty node can communicate with other nodes over the management network. If they can communicate with each other, proceed to the next step. If they cannot communicate with each other, check for abnormal network ports and links.
¡ If the NIC name is changed, reconfigure the network ports of the related NIC. For more information, contact Technical Support. After configuration, connect the management network cables, and then execute the ifup ethxx command to start the management network. ethxx is the name of the physical port connected to the management network.
5. Reconnect the other network cables. Execute the ifup ethxx command to manually bring up the physical NIC ports. ethxx represents the name of a physical NIC port. Then, execute the ip addr command to verify that all physical NIC ports are up. For example:
ifup ethB03-0 //ethB03-0 is the name of the physical port.
6. Verify that all bond interfaces are running correctly. Example:
root@node1:~# cat /proc/net/bonding/bond0 //bond0 is the name of the bond port. All bond ports must be checked.
The output is as follows:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (0)
MII Status: up //The bond port is up.
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 13
Partner Key: 1
Partner Mac Address: 38:91:d5:e0:6c:99
Slave Interface: p4p1
MII Status: up //Member port p4p1 of bond0 is up.
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 8c:dc:d4:15:f7:58
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: p4p2
MII Status: up //Member port p4p2 of bond0 is up.
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 8c:dc:d4:15:f7:5c
Aggregator ID: 1
Slave queue ID: 0
7. Use the ping command to verify that the node can communicate with other nodes in the cluster on both the storage and service networks. As a best practice, keep the ping operation for one minute. If packet loss does not occur, the network is running correctly. If the ping operation fails or packet loss occurs, resolve the network issue first.
8. Use the ping command to verify that the node can communicate with the client on the service network. As a best practice, keep the ping operation for one minute. If packet loss does not occur, the network is running correctly. If the ping operation fails or packet loss occurs, resolve the network issue first.
Disabling maintenance mode
Procedure for the scenario where you can log in to the management interface
If the maintenance mode is previously enabled through the management interface of the storage system, log in to the management interface of the storage system to disable the maintenance mode.
1. As shown in Figure 37, from the left navigation pane, select Hosts > Storage Nodes. Click More in the Actions column for the target storage node, and then select Maintenance Mode.
Figure 37 Disabling the maintenance mode (New UI)
2. As shown in Figure 38, in the dialog box that opens, select Off to disable the maintenance mode, and then click OK.
Figure 38 Disabling the maintenance mode (New UI)
Procedure for the scenario where you cannot log in to the management interface
If the maintenance mode is previously enabled through the operating system CLI of the node, disable it through the operating system CLI of the node.
1. Log in to the operating system of any cluster node through SSH, and then access the CLI.
2. In the operating system, execute the ceph osd unset-osd noout osd_id and ceph osd unset-osd noup osd_id commands to disable the maintenance mode. osd_id is the OSD ID for which the maintenance mode is to be disabled. You can specify multiple OSD IDs as needed, for example, ceph osd unset-osd noout 1 2 3 and ceph osd unset-osd noup 1 2 3.
3. Wait for five minutes, and then execute the ceph osd tree command to identify whether all OSDs on the node are up. If any OSD on a node is not up, execute the ceph-disk activate-all command on the node to pull up all OSDs. Then, execute the ceph osd tree command again to verify that all OSDs on the node are up.
Checking the cluster state
Log in to the management interface of the storage system, keep observing the cluster's health status until the health status recovers to 100% and all alarms are cleared.
Changing licenses
If a host where components are replaced is the management node in the cluster, the component replacement might invalidate the product license. To ensure normal operation, submit a license change request and edit the hardware device information bound to the license. For more information, contact Technical Support.
Using Toolkit to perform an inspection
Use the toolkit to perform an inspection after completing the above operations. For information about how to use Toolkit, see the Toolkit usage guide.
· If the inspection result does not include any error messages, the component replacement is successful.
· If the inspection results include error messages, resolve the issues based on the messages. If you cannot resolve the issues, contact Technical Support.
Replacing components such as memory modules, RAID controllers, and drive backplanes
This chapter describes how to replace components on cluster nodes that do not involve BIOS time and network configuration, including but not limited to memory, RAID controllers, and drive backplanes.
Component replacement workflow
Figure 39 Workflow for replacing components such as memory modules, RAID controllers, and drive backplanes
Identifying the software version
Identify the software version before you replace components in the cluster.
Viewing the software version on the ONEStor management interface
Log in to the ONEStor management interface, as shown in Figure 40. Then, view the software version in the administrative section. For some versions, you can click the icon on the top right of the page to view the software version. Table 4 shows the software version and product version compatibility.
Figure 40 Viewing the software version on the management interface
Table 4 Software version and product version compatibility matrix
Software version | Product version
E011x, R011x | ONEStor 1.0
E03xx, R03xx | ONEStor 2.0
E21xx, R21xx, E31xx, R31xx, E33xx | ONEStor 3.0
E51xx, E52xx | ONEStor 5.0
Viewing the software version on the operating system
Log in to the CLI of the node, as shown in Figure 41. Then, execute the cat /etc/onestor_external_version command to view the software version. Table 5 shows the software version and product version compatibility.
Figure 41 Viewing the software version on the operating system
Table 5 Software version and product version compatibility matrix
Software version | Product version
E011x, R011x | ONEStor 1.0
E03xx, R03xx | ONEStor 2.0
E21xx, R21xx, E31xx, R31xx, E33xx | ONEStor 3.0
E51xx, E52xx | ONEStor 5.0
Checking the cluster health state
CAUTION: Before you replace components, complete all checks for the cluster and proceed with component replacement only after you verify that all conditions for component replacement are met.
Manually checking the cluster health state
1. As shown in Figure 42, log in to the management interface, and identify whether the cluster health score is 100% and whether any alarms were generated. If the cluster health score is not 100% or alarms were generated, wait for the cluster to automatically recover. If the cluster does not recover after a period of time, contact Technical Support.
Figure 42 Checking cluster health state and alarms
2. As shown in Figure 43, access the CLI of any cluster node, and then execute the watch ceph -s command to monitor the cluster health state for one minute. The cluster is in healthy state if the health state value is Health_OK. If the health state is not Health_OK, contact Technical Support.
Figure 43 Checking cluster state in the CLI
Checking the service load of the cluster
Checking the CPU usage and disk pressure
1. Log in to the operating system on all nodes in the cluster via SSH.
2. Execute the iostat -x 1 command and monitor the CPU usage and disk pressure of all nodes. This command outputs iostat information every second. As a best practice, observe each node for about 2 minutes. Key parameters for cluster workload pressure are as follows:
¡ %idle—Above 40.
¡ %util (disk I/O usage)—Below 40%.
¡ svctm (average processing time per I/O request)—Below 20 milliseconds.
¡ await (average IO wait time), r_await (average read operation wait time), and w_await (average write operation wait time)—Below 20 milliseconds.
NOTE: It is normal for parameter values to momentarily exceed the upper limit. However, if the parameter values continuously remain above the upper limit, you must wait for the service pressure to decrease or suspend some services until the cluster's service pressure meets the requirements, and then continue the subsequent operations.
Figure 44 Output from the iostat command
Checking the memory usage
1. Log in to the operating system on all nodes in the cluster via SSH.
2. Execute the sync;echo 3 > /proc/sys/vm/drop_caches command to release the memory cache, and then wait for about 1 minute.
3. Execute the free -m command to check memory usage. Memory usage should be below 60% (calculated as the ratio of the used value to the total memory capacity).
Figure 45 Memory usage
Checking the configuration
Checking the switch configuration
Identify whether the storage switch and service switch have STP enabled.
· If STP is enabled, verify that the ports connecting the switch to the server are configured as edge ports. For more information, see the command reference provided with the switch.
· If STP is not enabled, you can skip this check item.
NOTE: To change the switch configuration, contact Technical Support.
Checking host routes
1. As shown in Figure 46, execute the route -n command in the CLI of each cluster node to check the host route information.
Figure 46 Checking the route information
2. As shown in Figure 47, execute the cat /etc/sysconfig/network-scripts/route-ethxx command on each cluster node to identify whether the /etc/sysconfig/network-scripts/route-ethxx configuration file contains the correct route configuration. ethxx represents the network adapter port related to the route. If the configuration file does not contain the route configuration, add the route configuration to the file. If the configuration file does not exist, manually create the file.
Figure 47 Checking the configuration file
Checking disk caches
IMPORTANT: · Perform the following tasks on all cluster nodes. If the inspection results are not the expected ones, contact Technical Support. · The RAID controller operations introduced in this document are applicable to only H3C devices. For RAID controller operations on a third-party device, contact the related manufacturer. |
PMC controller (PM8060)
1. As shown in Figure 48, execute the arcconf getconfig x pd | grep -i "write cache" command to verify that disk write caching is disabled on the node. x represents the number of the RAID controller. Verify that the output information is Disabled (write-through).
Figure 48 Checking disk write caching state
2. Execute the arcconf getconfig x ld command, where x represents the number of the RAID controller. Verify that read caching and write caching are enabled for all HDD logical drives and disabled for all SSD logical drives. The correct states for HDDs and SSDs are as shown in Figure 49 and Figure 50, respectively.
Figure 49 Correct state for HDDs
Figure 50 Correct state for SSDs
PMC controller (P460)
1. As shown in Figure 51, execute the arcconf getconfig x ad |grep " Physical Drive Write Cache Policy Information" -A4 command to verify that disk write caching is disabled on the node. x represents the number of the RAID controller. Verify that the output information is Disabled.
Figure 51 Checking disk write caching state
2. As shown in Figure 52, execute the arcconf getconfig x ad | grep -i cache command to view the RAID controller configuration. x represents the number of the RAID controller. By default, the value is 10% for Read Cache, 90% for Write Cache, and Disabled for No-Battery Write Cache.
Figure 52 Viewing the RAID controller configuration
3. Execute the arcconf getconfig x ld command, where x represents the number of the RAID controller. Verify that read caching and write caching are enabled for all HDD logical drives and disabled for all SSD logical drives. The correct states for HDDs and SSDs are as shown in Figure 53 and Figure 54, respectively.
Figure 53 Correct state for HDDs
Figure 54 Correct state for SSDs
LSI RAID controller
1. Execute the /opt/raid/MegaRAID/storcli/storcli64 show command to determine the compatibility of StorCLI with the RAID controller. Based on the result, select the appropriate instruction set for subsequent operations.
¡ If the output contains the LSI RAID controller you intend to operate on, StorCLI is compatible with the RAID controller. See this section to obtain the disk letter.
¡ If the output does not contain the LSI RAID controller to be operated on, StorCLI is not compatible with the RAID controller. See "Common commands used in MegaCLI" to use the related commands in MegaCLI to obtain the disk letter.
CAUTION: To avoid anomalies caused by compatibility issues, ensure compatibility and select the correct instruction set before you perform operations related to LSI RAID controllers.
Figure 55 Determining compatibility
2. View the mapping relationships between the enclosure number, hard drive slot number, logical array, hard drive DID, and hard disk letter on the RAID controller.
a. Execute the /opt/raid/MegaRAID/storcli/storcli64 /cN show all command, where N is the RAID controller number, to view information about physical and virtual drives. In the PD LIST, use the enclosure number and drive slot number to find the corresponding DG, and then locate the logical array (VD) in the VD LIST based on the DG.
b. Execute the /opt/raid/MegaRAID/storcli/storcli64 /cN/vx show all command, where N is the RAID controller number and x is the VD number, to obtain the hard disk letter. In this example, the hard disk letter corresponding to VD 238 is /dev/sdb.
Figure 57 Obtaining the hard disk letter
3. Execute the /opt/raid/MegaRAID/storcli/storcli64 /cN show all command, where N is the RAID controller number, to view the read/write cache of the logical drive. RWTD stands for R (read cache enabled) and WT (write cache disabled). The state of the read/write cache is shown in Figure 58. There are differences between enabling write cache with WB and AWB. Set WB with a battery backup unit (BBU) present and in normal state. Use AWB when the BBU is not present or in an abnormal state.
Figure 58 Viewing the logic disk cache state
Table 6 Logic disk cache state
Cache type | Field value | Description
Read cache | R(RA) | Enabled
Read cache | NR(NORA) | Disabled
Write cache | WB/AWB | Enabled
Write cache | WT | Disabled
4. To obtain the BBU state, execute the /opt/raid/MegaRAID/storcli/storcli64 /cN show all | grep -i bbu command, where N is the RAID controller number.
Figure 59 Viewing the BBU state
5. As shown in Figure 60, execute the smartctl -g wcache -d megaraid,DID /dev/sdx command to obtain the hard drive write cache. Replace DID with the hard drive DID and x with the hard disk letter, as shown in Figure 56 and Figure 57, respectively.
Figure 60 Obtaining hard drive write cache
HP SSA controller
1. As shown in Figure 61, execute the hpssacli ctrl all show config detail | grep -i cache command to verify that disk write caching is disabled on the node. By default, the value is 10% read, 90% write for Cache Ratio, Disabled for Drive Write Cache, and Disabled for No-Battery Write Cache.
Figure 61 Checking disk write caching state
2. As shown in Figure 62, execute the hpssacli ctrl slot=x ld all show detail command to verify that the caching configuration is correct for each RAID controller. x represents the slot number of the RAID controller. The correct configuration is as follows: For HDDs, the value for the LD Acceleration Method parameter is Controller Cache. For SSDs, the value for the LD Acceleration Method parameter is Disabled or Smart IO Path.
Figure 62 Checking the cache mode
3. As shown in Figure 63, execute the hpssacli ctrl all show config detail | grep -i Power command to verify that the value for the Current Power Mode parameter is MaxPerformance.
Figure 63 Checking the RAID controller mode
Checking the hardware state of the cluster
Log in to the HDM/iLO of all nodes in the cluster and check for any hardware errors. If there are errors on hardware other than the component to be replaced, contact Technical Support.
Checking the NTP server state
NOTE: For E33xx, if the NTP server state is not as expected, you can execute the date -s "YY/MM/DD hh:mm:ss" command in the operating system to edit the system time. Make sure all nodes in the cluster are consistent in system time.
As shown in Figure 64, execute the ntpq -p command in the CLI of each cluster node to check the NTP server state. The expected result is as follows: All nodes use the same NTP server, the NTP server is not in INIT state, and the offset value is smaller than 100 ms. If the NTP configuration is not as expected, contact Technical Support.
Parameter description:
· An asterisk (*) in front of the IP address indicates the primary NTP server, and a plus sign (+) indicates the backup NTP server.
· The NTP server is abnormal if the Refid state is INIT.
Replacing components
The component replacement method differs depending on whether the faulty node is still running or has been shut down because of the fault. Perform the corresponding operations based on the on-site situation.
Replacing components such as memory modules, RAID controllers, and drive backplanes without shutting down the faulty node
CAUTION: If the component to be replaced is a RAID controller, the new RAID controller must be the same model as the one to be replaced. If they are not the same model, contact Technical Support.
If the faulty node is not down due to hardware failure, see this section to replace components. Figure 65 shows the replacement workflow.
Enabling maintenance mode
Procedure for the scenario where you can log in to the management interface
1. Log in to the management interface.
2. As shown in Figure 66, from the left navigation pane, select Hosts > Storage Nodes. Click More in the Actions column for the target storage node, and then select Maintenance Mode.
Figure 66 Enabling maintenance mode
3. As shown in Figure 67, in the dialog box that opens, select On to enable the maintenance mode, and then click OK.
Figure 67 Enabling maintenance mode
Procedure for the scenario where you cannot log in to the management interface
If you cannot log in to the management interface, enable maintenance mode for a node from the CLI.
1. Log in to the operating system of any cluster node through SSH, and then access the CLI.
2. Execute the ceph osd set-osd noout osd_id and ceph osd set-osd noup osd_id commands to enable maintenance mode. osd_id represents the ID of the target OSD, and you can specify multiple OSD IDs. For example, to enable maintenance mode for OSDs 1, 2, and 3, execute the ceph osd set-osd noout 1 2 3 and ceph osd set-osd noup 1 2 3 commands.
3. As shown in Figure 68, execute the ceph -s command to verify that the cluster state is Health_WARN and the system prompts noup, noout flag(s) set.
Figure 68 Enabling maintenance mode at the CLI
Manually stopping OSDs
1. Execute the systemctl stop ceph-osd.target command on the faulty node.
2. Wait for about 1 minute, and then execute the ceph osd tree command to verify that only the state of all OSDs on the faulty node has changed to down, and the state of OSDs on other nodes remains up.
3. Execute the ceph -s command to verify that the PGs are not in pg peering, pg stale, pg activating, pg incomplete, or pg inactive state.
NOTE: PG peering, PG stale, and PG activating are intermediate states of PGs after an OSD is stopped. They typically end automatically within a few seconds to tens of seconds. If the states persist for a long period of time, contact Technical Support for help.
Manually shutting down the node
1. Execute the sync command to flush the memory.
2. Execute the hwclock -w command to write the clock to the BIOS.
3. Execute the shutdown -h now command to shut down the node. During the shutdown process, keep observing the power status on the HDM interface to make sure the shutdown command does not fail or get stuck.
Replacing hardware
After the node is properly shut down, power off the faulty node and proceed with the hardware replacement. For more information, see the user guide or hardware configuration guide for the product.
Starting the node
1. After hardware replacement is completed, power on the node and check the HDM interface for any hardware errors.
2. Log in to the remote console through HDM, and identify whether any error messages were generated during startup inspection. If any error messages were generated, resolve the issues first.
3. After the node starts up, log in to the CLI of the node through the HDM remote console, and then execute the ip addr command to verify that all physical NIC ports are up. If the status is not up, execute the ifup ethxx command (where ethxx is the physical NIC port name) to activate each physical NIC port. For example:
ifup ethB03-0 //ethB03-0 is the name of the physical port.
4. Verify that all bond ports are running correctly. Example:
root@node1:~# cat /proc/net/bonding/bond0 //bond0 is the name of the bond port. All bond ports must be checked.
The output is as follows:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (0)
MII Status: up //The bond port is up.
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 13
Partner Key: 1
Partner Mac Address: 38:91:d5:e0:6c:99
Slave Interface: p4p1
MII Status: up //Member port p4p1 of bond0 is up.
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 8c:dc:d4:15:f7:58
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: p4p2
MII Status: up //Member port p4p2 of bond0 is up.
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 8c:dc:d4:15:f7:5c
Aggregator ID: 1
Slave queue ID: 0
5. Use the ping command to verify that the node can communicate with other nodes in the cluster on both the storage and service networks. As a best practice, keep the ping operation for one minute. If packet loss does not occur, the network is running correctly. If the ping operation fails or packet loss occurs, resolve the network issue first.
6. Use the ping command to verify that the node can communicate with the client on the service network. As a best practice, keep the ping operation for one minute. If packet loss does not occur, the network is running correctly. If the ping operation fails or packet loss occurs, resolve the network issue first.
Disabling maintenance mode
Procedure for the scenario where you can log in to the management interface
If the maintenance mode is previously enabled through the management interface of the storage system, log in to the management interface of the storage system to disable the maintenance mode.
1. As shown in Figure 69, from the left navigation pane, select Hosts > Storage Nodes. Click More in the Actions column for the target storage node, and then select Maintenance Mode.
Figure 69 Disabling the maintenance mode (New UI)
2. As shown in Figure 70, in the dialog box that opens, select Off to disable the maintenance mode, and then click OK.
Figure 70 Disabling the maintenance mode (New UI)
Procedure for the scenario where you cannot log in to the management interface
If the maintenance mode is previously enabled through the operating system CLI of the node, disable it through the operating system CLI of the node.
1. Log in to the operating system of any cluster node through SSH, and then access the CLI.
2. In the operating system, execute the ceph osd unset-osd noout osd_id and ceph osd unset-osd noup osd_id commands to disable the maintenance mode. osd_id is the OSD ID for which the maintenance mode is to be disabled. You can specify multiple OSD IDs as needed, for example, ceph osd unset-osd noout 1 2 3 and ceph osd unset-osd noup 1 2 3.
3. Wait for 5 minutes, and then execute the ceph osd tree command to identify whether all OSDs on the node are up. If any OSD on a node is not up, execute the ceph-disk activate-all command on the node to pull up all OSDs. Then, execute the ceph osd tree command again to verify that all OSDs on the node are up.
Checking the cluster health status
Log in to the management interface of the storage system, keep observing the cluster's health status until the health status recovers to 100% and all alarms are cleared.
Replacing components such as memory modules, RAID controllers, and disk backplanes with the faulty node shut down
If the faulty node goes down due to hardware failure, see this section to replace components. Figure 71 shows the replacement workflow.
Disconnecting the faulty node from networks
IMPORTANT: To restore networks for the faulty node after component replacement, record the network cable connections before disconnecting the network cables.
Disconnect the management, storage, and service network cables from the faulty node.
Enabling maintenance mode
Procedure for the scenario where you can log in to the management interface
1. Log in to the management interface.
2. As shown in Figure 72, from the left navigation pane, select Hosts > Storage Nodes. Click More in the Actions column for the target storage node, and then select Maintenance Mode.
Figure 72 Enabling maintenance mode (New UI)
3. As shown in Figure 73, in the dialog box that opens, select On to enable the maintenance mode, and then click OK.
Figure 73 Enabling maintenance mode (New UI)
Procedure for the scenario where you cannot log in to the management interface
If you cannot log in to the management interface, enable maintenance mode for a node through the CLI.
1. Log in to the operating system of any cluster node through SSH, and then access the CLI.
2. Execute the ceph osd set-osd noout osd_id and ceph osd set-osd noup osd_id commands to enable maintenance mode. osd_id is the OSD ID for which the maintenance mode is to be enabled. You can specify multiple OSD IDs as needed, for example, ceph osd set-osd noout 1 2 3 and ceph osd set-osd noup 1 2 3.
3. As shown in Figure 74, execute the ceph -s command to verify that the cluster state is Health_WARN and the system prompts noup, noout flag(s) set.
Figure 74 Enabling maintenance mode at the CLI
Replacing hardware
Power off the faulty node, and then proceed with the hardware replacement. For more information, see the user guide or hardware configuration guide for the product.
Starting the node
1. After hardware replacement is completed, power on the node and check the HDM interface for any hardware error messages.
2. Log in to the remote console through HDM, and identify whether any error messages were generated during startup inspection. If any error messages were generated, resolve the issues first.
3. After the node starts up, log in to the CLI of the node through the HDM remote console, and then execute the date command to identify whether the system time of the node is consistent with that of other cluster nodes. If they are inconsistent, execute the date -s command to manually set the system time of the node. Make sure the time difference between the node and the other cluster nodes is smaller than 7 seconds. Then, execute the hwclock -w command to synchronize the system clock to the hardware clock.
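For example, if the other cluster nodes report a system time of 2024-06-25 10:00:00 (a hypothetical value), set the time on this node as follows:
date                                # check the current system time of this node
date -s "2024-06-25 10:00:00"       # hypothetical value; use the time reported by the other cluster nodes
hwclock -w                          # write the system time to the hardware clock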
4. Reconnect the network cables.
5. Execute the ip addr command to verify that all physical NIC ports are up. If the status is not up, execute the ifup ethxx command (where ethxx is the physical NIC port name) to activate each physical NIC port. For example:
ifup ethB03-0 //ethB03-0 is the name of the physical port.
6. Verify that all bond ports are running correctly. Example:
root@node1:~# cat /proc/net/bonding/bond0 //bond0 is the name of the bond port. All bond ports must be checked.
The output is as follows:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (0)
MII Status: up //The bond port is up.
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 13
Partner Key: 1
Partner Mac Address: 38:91:d5:e0:6c:99
Slave Interface: p4p1
MII Status: up //Member port p4p1 of bond0 is up.
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 8c:dc:d4:15:f7:58
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: p4p2
MII Status: up //Member port p4p2 of bond0 is up.
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 8c:dc:d4:15:f7:5c
Aggregator ID: 1
Slave queue ID: 0
7. Use the ping command to verify that the node can communicate with other nodes in the cluster on both the storage and service networks. As a best practice, keep the ping operation for one minute. If packet loss does not occur, the network is running correctly. If the ping operation fails or packet loss occurs, resolve the network issue first.
8. Use the ping command to verify that the node can communicate with the client on the service network. As a best practice, keep the ping operation for one minute. If packet loss does not occur, the network is running correctly. If the ping operation fails or packet loss occurs, resolve the network issue first.
Disabling maintenance mode
Procedure for the scenario where you can log in to the management interface
If the maintenance mode is previously enabled through the management interface of the storage system, log in to the management interface of the storage system to disable the maintenance mode.
1. As shown in Figure 75, from the left navigation pane, select Hosts > Storage Nodes. Click More in the Actions column for the target storage node, and then select Maintenance Mode.
Figure 75 Disabling the maintenance mode (New UI)
2. As shown in Figure 76, in the dialog box that opens, select Off to disable the maintenance mode, and then click OK.
Figure 76 Disabling the maintenance mode (New UI)
Procedure for the scenario where you cannot log in to the management interface
If the maintenance mode is previously enabled through the operating system CLI of the node, disable it through the operating system CLI of the node.
1. Log in to the operating system of any cluster node through SSH, and then access the CLI.
2. In the operating system, execute the ceph osd unset-osd noout osd_id and ceph osd unset-osd noup osd_id commands to disable the maintenance mode. osd_id is the OSD ID for which the maintenance mode is to be disabled. You can specify multiple OSD IDs as needed, for example, ceph osd unset-osd noout 1 2 3 and ceph osd unset-osd noup 1 2 3.
3. Wait for 5 minutes, and then execute the ceph osd tree command to identify whether all OSDs on the node are up. If any OSD on a node is not up, execute the ceph-disk activate-all command on the node to pull up all OSDs. Then, execute the ceph osd tree command again to verify that all OSDs on the node are up.
Checking the cluster health status
Log in to the management interface of the storage system, keep observing the cluster's health status until the health status recovers to 100% and all alarms are cleared.
Using Toolkit to perform an inspection
Use the toolkit to perform an inspection after completing the above operations. For information about how to use Toolkit, see the Toolkit usage guide.
· If the inspection result does not include any error messages, the component replacement is successful.
· If the inspection results include error messages, resolve the issues based on the messages. If you cannot resolve the issues, contact Technical Support.
Replacing disks
IMPORTANT: · The disk replacement method introduced in this document is applicable to only H3C servers or storage appliances. For information about replacing disks on other servers, contact Technical Support. · To replace components in a cluster with running services, contact Technical Support. |
This chapter describes how to replace hot-swappable disks on cluster nodes.
Component replacement workflow
Figure 77 Workflow for replacing disks
Identifying the software version
Identify the software version before component replacement. Perform component replacement according to the software version.
Viewing the software version on the ONEStor management interface
Log in to the ONEStor management interface, as shown in Figure 78. Then, view the software version in the user information area. For some versions, you can click the icon on the top right of the page to view the software version. Table 7 shows the software version and product version compatibility.
Figure 78 Viewing the software version on the management interface
Table 7 Software version and product version compatibility matrix
Software version | Product version
E011x, R011x | ONEStor 1.0
E03xx, R03xx | ONEStor 2.0
E21xx, R21xx, E31xx, R31xx, E33xx | ONEStor 3.0
E51xx, E52xx | ONEStor 5.0
Viewing the software version on the operating system
Log in to the CLI of the node, as shown in Figure 79. Then, execute the cat /etc/onestor_external_version command to view the software version. Table 8 shows the software version and product version compatibility.
Figure 79 Viewing the software version on the operating system
Table 8 Software version and product version compatibility matrix
Software version | Product version
E011x, R011x | ONEStor 1.0
E03xx, R03xx | ONEStor 2.0
E21xx, R21xx, E31xx, R31xx, E33xx | ONEStor 3.0
E51xx, E52xx | ONEStor 5.0
Identifying the disk type
Disks can be categorized by functionality into data disks, system disks, and cache disks. Cache disks can be further categorized by acceleration method into FlashCache SSDs and Scache SSDs. Before you replace a disk, identify its type and perform operations described in the corresponding chapter.
Querying disk letters through disk slot number
IMPORTANT: · If you know only the slot number of the disk to be replaced and do not know its disk letter in the operating system, obtain its disk letter as described in this section. If you already know the disk letter, skip this section. · If a faulty disk causes the RAID array to go entirely offline, and you do not know the type of the faulty disk and cannot obtain its disk letter, contact Technical Support. · The method of obtaining disk letters varies by RAID controller model. For more information, see the related section. · The RAID controller operations introduced in this document are applicable to only H3C servers. For RAID controller operations on a third-party server, contact the related manufacturer. |
PMC controller (PM8060)
1. As shown in Figure 80, log in to the CLI of the node. Then, execute the arcconf list command to obtain the RAID controller number. The Controller ID field in the command output indicates the RAID controller number. In this example, the RAID controller number is found to be 1.
Figure 80 Querying the RAID controller number
2. As shown in Figure 81, execute the arcconf getconfig Controller ID ld command (where Controller ID is the RAID controller number obtained in the previous step) to obtain the logical device number of the disk in the target slot. In this example, the logical device number is 7 for the disk in Enclosure 0 and slot 6 with RAID controller number 1.
Figure 81 Querying the logical device number
3. As shown in Figure 82, execute the lsscsi command to obtain the disk letter of the faulty disk in the operating system. In this example, the third digit in the brackets is the logical device number. /dev/sdx represents the disk letter corresponding to the logical disk array in the system. The figure shows that the disk letter for the logical disk array with logical device number 4 is sde.
Figure 82 Querying the disk letter
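For reference, the lookup above can be condensed into the following commands, using the controller number from this example (adjust the values to your own output):
arcconf list                        # the Controller ID field is the RAID controller number (1 in this example)
arcconf getconfig 1 ld              # find the Logical Device number of the disk in the target enclosure and slot
lsscsi                              # the third number in the brackets is the logical device number; /dev/sdX is the disk letter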
PMC controller (P460)
1. As shown in Figure 83, log in to the CLI of the node. Then, execute the arcconf list command to obtain the RAID controller number. The Controller ID field in the command output indicates the RAID controller number. In this example, the RAID controller number is found to be 1.
Figure 83 Querying the RAID controller number
2. As shown in Figure 84, execute the arcconf getconfig Controller ID ld command (where Controller ID is the RAID controller number obtained in the previous step) to obtain the logical device number of the disk in the target slot. In this example, the logical device number is 8 for the disk in Enclosure 1 and slot 6 with RAID controller number 1. The Disk Name field indicates that the disk letter is sdi.
Figure 84 Querying the disk letter
LSI RAID controller
1. Execute the /opt/raid/MegaRAID/storcli/storcli64 show command to determine the compatibility of StorCLI with the RAID controller. Based on the result, select the appropriate instruction set for subsequent operations.
¡ If the output contains the LSI RAID controller you intend to operate on, StorCLI is compatible with the RAID controller. See this section to obtain the disk letter.
¡ If the output does not contain the LSI RAID controller to be operated on, StorCLI is not compatible with the RAID controller. See "Common commands used in MegaCLI" to use the related commands in MegaCLI to obtain the disk letter.
CAUTION: To avoid anomalies caused by compatibility issues, ensure compatibility and select the correct instruction set before you perform operations related to LSI RAID controllers. |
Figure 85 Determining compatibility
2. As shown in Figure 86, access the operating system CLI of the node, and execute the /opt/raid/MegaRAID/storcli/storcli64 /call show command to obtain the RAID controller number. In the command output, Controller=N indicates that N is the RAID controller number. In this example, the RAID controller number is found to be 0.
Figure 86 Obtaining the RAID controller number
3. As shown in Figure 87, execute the /opt/raid/MegaRAID/storcli/storcli64 /cN show command, where N is the RAID controller number, to view the DG of the hard drive associated with the slot number. Then obtain the virtual drive (VD) number of the hard drive associated with the slot number based on the DG.
Figure 87 Obtaining the virtual drive
4. As shown in Figure 88, execute the /opt/raid/MegaRAID/storcli/storcli64 /cN/vx show all command, where N is the RAID controller number and x is the VD number obtained in the previous step, to obtain the hard disk letter associated with the slot number. In this example, the virtual drive number is 238 for the hard drive corresponding to RAID controller 0, slot number Enclosure 250, and slot ID 2. The hard disk letter is /dev/sdb.
Figure 88 Checking disk letters
5. As shown in Figure 89, execute the lsscsi command to check the disk letter of the hard drive in the operating system. In this example, the third digit in the brackets is the logical device number. /dev/sdx represents the disk letter corresponding to the logical disk array in the system. The figure shows that the disk letter is sdd for the logical disk array with logical device number 3 in the operating system.
Figure 89 Checking the disk letter
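For reference, a condensed sketch of the StorCLI lookup, using the controller and VD numbers from this example (adjust the values to your own output):
/opt/raid/MegaRAID/storcli/storcli64 show               # verify that StorCLI recognizes the RAID controller
/opt/raid/MegaRAID/storcli/storcli64 /call show         # Controller=N indicates the RAID controller number (0 in this example)
/opt/raid/MegaRAID/storcli/storcli64 /c0 show           # find the DG and VD associated with the target slot
/opt/raid/MegaRAID/storcli/storcli64 /c0/v238 show all  # the output includes the OS drive name, /dev/sdb in this example
lsscsi                                                  # cross-check the disk letter in the operating system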
HP SSA RAID controller
1. As shown in Figure 90, access the operating system CLI of the node, and execute the hpssacli ctrl all show command to obtain the RAID controller number. In the command output, the slot number represents the RAID controller number. In this example, the RAID controller number is found to be 1.
Figure 90 Obtaining the RAID controller number
2. As shown in Figure 91, execute the hpssacli ctrl slot=x physicaldrive all show detail command (where x is the RAID controller number obtained in the previous step) to display the association between the physical disk slot number and the logical array. In this example, hard disk slot number 1I:2:2 corresponds to logical array A.
Figure 91 Obtaining the logical array
3. As shown in Figure 92, execute the hpssacli ctrl slot=x logicaldrive all show detail command (where x is the RAID controller number obtained in the first step) to display the association between the logical array number and the system disk letter. In this example, the disk letter corresponding to logical array A is sda.
Figure 92 Obtaining the disk letter
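For reference, a condensed sketch of the HP SSA lookup, using the slot number from this example (adjust the values to your own output):
hpssacli ctrl all show                                # the slot number is the RAID controller number (1 in this example)
hpssacli ctrl slot=1 physicaldrive all show detail    # map the physical disk slot (1I:2:2 in this example) to a logical array
hpssacli ctrl slot=1 logicaldrive all show detail     # map the logical array to the disk letter (sda in this example)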
Determining the hard disk type through partition and mounting
System disk
As shown in Figure 93, execute the lsblk command. The disk with a partition mounted to "/" is the system disk. In this example, the hard disk with disk letter sda has a disk partition mounted to "/", and this hard disk sda is the system disk.
Figure 93 Identifying the system disk
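As an additional cross-check not shown in the figure, you can print the partition mounted on "/" directly and then locate its parent disk in the lsblk output:
findmnt -n -o SOURCE /                 # prints the partition mounted on /, for example /dev/sda2
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT     # the disk that owns this partition is the system disk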
Identifying the FlashCache SSD (ONEStor R21xx)
Execute the lsblk command and determine whether the hard disk is a FlashCache SSD from the command output.
As shown in Figure 94, a FlashCache SSD has the following features: The size of the first partition is 15 MB or 16 MB. The other partitions are of the same size, each with a long UUID, and none of them is mounted. In this example, the hard disk with disk letter sdo is a FlashCache SSD.
Figure 94 Identifying the FlashCache SSD (ONEStor R21xx)
Identifying the Scache SSD (ONEStor E31xx and E33xx)
Execute the lsblk command and determine whether the hard disk is an Scache SSD from the command output.
As shown in Figure 95 and Figure 96, an Scache SSD has the following features: The first partition is 16 MB in size. It is followed by three groups of two partitions each (six partitions in total); in each of these groups, the first partition is 100 MB in size, is used for data, and is mounted, while the second partition is a block partition and is not mounted. The subsequent groups each contain three partitions (3x partitions in total for x accelerated data disk OSDs); the three partitions in each group correspond to FlashCache, ceph block.db, and ceph block.wal for the data disk OSD.
In this example, the hard disk with disk letter sdd is an Scache SSD.
Figure 95 Identifying Scache SSD-1 (ONEStor E31xx, E33xx)
Figure 96 Identifying Scache SSD-2 (ONEStor E31xx, E33xx)
Data disk
Execute the lsblk command. The data disk might have the following types of partition and mounting forms.
· Data disk in a non-accelerated SSD environment:
As shown in Figure 97, the data disk in a non-accelerated SSD environment has two partitions: The size of the first partition is 100 MB, which is mounted to path /var/lib/ceph/osd/ceph-x (where x is the OSD number), and the second partition occupies the remaining capacity of the hard disk and has no mounting path.
Figure 97 Data disk partition/mounting method-no-acceleration SSD
· Data disk in a FlashCache SSD environment:
As shown in Figure 98, the data disk in a FlashCache SSD environment has two partitions: The size of the first partition is 100 MB, which is mounted to path /var/lib/ceph/osd/ceph-x (where x is the OSD number), and the second partition occupies the remaining capacity of the hard disk, has no mounting path, and contains a long UUID.
Figure 98 Data disk partition/mounting method-FlashCache SSD
· Data disk in a Scache SSD environment:
For ONEStor E31xx and E33xx in the Scache SSD environment as shown in Figure 99 and Figure 100, the data disk has two partitions. The size of the first partition is 100 MB and is mounted to path /var/lib/ceph/osd/ceph-x (where x is the OSD number). The second partition occupies the remaining capacity of the hard disk and has no mounting path. The path /var/lib/ceph/osd/ceph-x (where x is the OSD number) contains the fcache_uuid, block.db_uuid, and block.wal_uuid files.
Figure 99 Data disk partition/mounting method-Scache SSD-1 (ONEStor E31xx, E33xx)
Figure 100 Data disk partition/mounting method-Scache SSD-2 (ONEStor E31xx, E33xx)
Checking the cluster state
IMPORTANT: Before performing component replacement, complete the inspection of all check items for the cluster as described in this section. Make sure all check items meet the component replacement requirements, and then perform the component replacement operation. |
Checking the cluster health state
1. As shown in Figure 101, log in to the management interface, and identify whether the cluster health score is 100% and whether any alarms were generated. If the cluster health score is not 100% or alarms were generated, wait for the cluster to automatically recover. If the cluster does not recover after a time period, contact Technical Support.
Figure 101 Checking cluster health state and alarms
2. As shown in Figure 102, access the CLI of any cluster node, and then execute the watch ceph -s command to monitor the cluster health state for one minute. The cluster is in healthy state if the health state value is Health_OK. If the health state is not Health_OK, contact Technical Support.
Figure 102 Checking cluster state through the CLI
Checking the service load of the cluster
Checking the CPU usage and disk pressure
1. Log in to the operating system on all nodes in the cluster via SSH.
2. Execute the iostat -x 1 command and monitor the CPU usage and disk pressure of all nodes. This command outputs iostat information every second. As a best practice, observe each node for about two minutes. Key parameters for cluster workload pressure are as follows:
¡ %idle—Above 40%.
¡ %util (disk I/O usage)—Below 40%.
¡ svctm (average processing time per I/O request)—Below 20 milliseconds.
¡ await (average I/O wait time), r_await (average read wait time), and w_await (average write wait time)—Below 20 milliseconds.
|
NOTE: It is normal for parameter values to occasionally exceed the upper limit. However, if the parameter values continuously remain above the upper limit, you must wait for the service pressure to decrease or suspend some services until the cluster's service pressure meets the requirements, and then continue with the subsequent operations. |
Figure 103 Output from the iostat command
Checking the memory usage
1. Log in to the operating system on all nodes in the cluster via SSH.
2. Execute the sync;echo 3 > /proc/sys/vm/drop_caches command to release the memory cache, and then wait for about one minute.
3. Execute the free -m command to check memory usage. Memory usage should be below 60% (calculated as the ratio of used value to total memory capacity).
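A minimal sketch of this memory check (the 60% threshold is the one given above):
sync; echo 3 > /proc/sys/vm/drop_caches                                # release the memory cache
sleep 60                                                               # wait about one minute
free -m                                                                # compare the used value with the total value
free -m | awk '/^Mem:/ {printf "memory usage: %.1f%%\n", $3/$2*100}'   # used/total as a percentage; should be below 60%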
Checking disk caches
|
NOTE: · Perform the following tasks on all cluster nodes. If the inspection results are not the expected ones, contact Technical Support. · The RAID controller operations introduced in this document are applicable to only H3C servers. For RAID controller operations on a third-party server, contact the related manufacturer. |
PMC controller (PM8060)
1. As shown in Figure 105, execute the arcconf getconfig x pd | grep -i "write cache" command to verify that disk write caching is disabled on the node. x represents the number of the RAID controller. Verify that the output information is Disabled (write-through).
Figure 105 Checking disk write caching state
2. Execute the arcconf getconfig x ld command, where x represents the number of the RAID controller. Verify that read caching and write caching are enabled for all HDDs and disabled for all SSDs. The correct states for HDDs and SSDs are as shown in Figure 106 and Figure 107, respectively.
Figure 106 Correct state for HDDs
Figure 107 Correct state for SSDs
PMC controller (P460)
1. As shown in Figure 108, execute the arcconf getconfig x ad |grep " Physical Drive Write Cache Policy Information" -A4 command to verify that disk write caching is disabled on the node. x represents the number of the RAID controller. Verify that the output information is Disabled.
Figure 108 Checking disk write caching state
2. As shown in Figure 109, execute the arcconf getconfig x ad | grep -i cache command to view the RAID controller configuration. x represents the number of the RAID controller. By default, the value is 10% for Read Cache, 90% for Write Cache, and Disabled for No-Battery Write Cache.
Figure 109 Viewing the RAID controller configuration
3. Execute the arcconf getconfig x ld command, where x represents the number of the RAID controller. Verify that read caching and write caching are enabled for all HDDs and disabled for all SSDs. The correct states for HDDs and SSDs are as shown in Figure 110 and Figure 111, respectively.
Figure 110 Correct state for HDDs
Figure 111 Correct state for SSDs
LSI RAID controller
1. Execute the /opt/raid/MegaRAID/storcli/storcli64 show command to determine the compatibility of StorCLI with the RAID controller. Based on the result, select the appropriate instruction set for subsequent operations.
¡ If the output contains the LSI RAID controller you intend to operate on, StorCLI is compatible with the RAID controller. See this section to obtain the disk letter.
¡ If the output does not contain the LSI RAID controller to be operated on, StorCLI is not compatible with the RAID controller. See "Common commands used in MegaCLI" to use the related commands in MegaCLI to obtain the disk letter.
CAUTION: To avoid anomalies caused by compatibility issues, ensure compatibility and select the correct instruction set before you perform operations related to LSI RAID controllers. |
Figure 112 Determining compatibility
2. Execute the /opt/raid/MegaRAID/storcli/storcli64 /cN show all command, where N is the RAID controller number, to view the logical disk read and write cache state, as shown in Figure 113. RWTD represents R (read cache enabled) and WT (write cache disabled).
Figure 113 RAID controller logic cache state
3. Execute the smartctl -g wcache -d megaraid,DID /dev/sdx command to obtain the hard drive write cache. Replace DID with the hard drive DID and x with the hard disk letter, as shown in Figure 87 and Figure 88, respectively.
Figure 114 Obtaining hard drive write cache
HP SSA RAID controller
1. As shown in Figure 115, execute the hpssacli ctrl all show config detail | grep -i cache command to verify that disk write caching is disabled on the node. By default, the value is 10% read, 90% write for Cache Ratio, Disabled for Drive Write Cache, and Disabled for No-Battery Write Cache.
Figure 115 Checking disk write caching state
2. As shown in Figure 116, execute the hpssacli ctrl slot=x ld all show detail command to verify that the caching configuration is correct for each RAID controller. x represents the slot number of the RAID controller. The correct configuration is as follows: For HDDs, the value for the LD Acceleration Method parameter is Controller Cache. For SSDs, the value for the LD Acceleration Method parameter is Disabled or Smart IO Path.
Figure 116 Checking the cache mode
3. As shown in Figure 117, execute the hpssacli ctrl all show config detail | grep -i Power command to verify that the value for the Current Power Mode parameter is MaxPerformance.
Figure 117 Checking the RAID controller mode
Checking the hardware state of the cluster
Log in to the HDM/iLO of all nodes in the cluster and check for any hardware errors. If there are errors on hardware other than the component to be replaced, contact Technical Support.
Replacing hard disks
Replacing data disks (without using one-click disk replacement)
If the ONEStor version does not support one-click disk replacement (there is no Replace Disk button on the disk management page), see this section for guidance on data disk replacement. As shown in Figure 118, if the ONEStor version supports one-click disk replacement (a Replace Disk button exists on the disk management page), see "One-click replacing data disks" for guidance on data disk replacement.
You can perform a one-click disk replacement only when the following conditions are met simultaneously:
· The ONEStor version is R2130 or later.
· The server model is UIS 3010 G3, UIS3020 G3, UIS3030 G3, UIS3040 G3, or UIS 4500 G3.
· If the server uses a RAID controller, the RAID controller model must be LSI 9361 or LSI 9460.
Figure 118 Location of the disk replacement button (New UI)
The process for replacing a data disk without one-click disk replacement is shown in Figure 119. In actual operation, see the corresponding sections based on the RAID controller and hard disk types.
Figure 119 Data disk replacement flowchart (without using one-click disk replacement)
Deleting a faulty disk from the cluster
Deleting a faulty disk on the ONEStor management interface
1. As shown in Figure 120, from the left navigation pane, select Hosts. You can see that the number of disks on the host has decreased.
Figure 120 Decreased number of disks on the host
2. As shown in Figure 121, click the host name to open the disk management page, where you can see the disk letter of the faulty disk.
Figure 121 Abnormal hard disk state
3. As shown in Figure 122, delete the faulty disk. Make sure the cluster's health is 100%. On the hard disk management interface, locate the abnormal disks (with disk letters such as sdh and sdk), select Delete, and wait for the deletion process to complete.
IMPORTANT: · You can perform this operation only on one node at a time. After deleting the faulty disk from one node, wait for data balancing to complete, and then delete faulty disks from other nodes. Data balancing takes some time. As a best practice, complete this operation one day before the component for replacement arrives on site. · If the hard disk management page displays no data or you cannot delete the hard disk, delete the faulty hard disk at the CLI. · When the mounting path of the faulty hard disk is lost and you cannot delete the faulty hard disk on the storage system management page, log in to the operating system CLI and delete the faulty disk with relevant commands. |
Figure 122 Deleting a hard disk
Deleting a faulty disk at the CLI (ONEStor R21xx)
4. As shown in Figure 123, SSH to the CLI of the faulty node and execute the ceph osd tree command to find the OSD number in down state.
In this example, the faulty OSD number is 70.
5. First, execute the touch /var/lib/ceph/shell/watch_maintaining command to prevent the OSDs on the node from being automatically activated. Then, execute the systemctl stop ceph-osd@x.service command to stop the corresponding OSD process, where x is the faulty OSD number.
6. As shown in Figure 124, execute the mount command to view the mounting information for that OSD. Execute the umount /var/lib/ceph/osd/ceph-xx command (xx is the faulty OSD number) to unmount the faulty disk.
Figure 124 Viewing OSD mounting information
7. Check the RAID controller type, and then pull out the faulty disk and insert a new disk as described in the related chapter in “Pulling out the faulty disk and inserting a new disk.”
8. As shown in Figure 125, execute the lsblk command to view the new disk, with disk letter sdn in this example.
This disk uses RAID0 and does not have partitions.
Figure 125 Viewing the new disk
9. Format the new disk:
a. As shown in Figure 126, execute the gdisk /dev/sdn command to partition the new disk. During this process, enter n, press Enter three times, and then enter w, y.
Figure 126 Partitioning the new disk
b. As shown in Figure 127, execute the lsblk command to view the partition of disk sdn, which is partition sdn1 in this example.
c. As shown in Figure 128, execute the mkfs.xfs -f /dev/sdn1 command to configure the file system for partition sdn1.
Figure 128 Configuring the file system
10. As shown in Figure 129, execute the mount /dev/sdn1 /var/lib/ceph/osd/ceph-70 command to mount the OSD in down state to the new disk.
11. If the faulty disk has the FlashCache accelerator and has residual FlashCache and cache partitions, you must locate and delete the FlashCache information and cache partitions. For more information, see “Finding and deleting FlashCache information and cache partitions.”
12. Execute the partprobe command to update disk partition information. Normally, no information is output. If the system displays that a partition is being written, contact Technical Support.
13. Delete the faulty disk from the Web management interface of the storage system.
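For reference, the CLI portion of this procedure condensed into one sequence, using the example values above (OSD 70 and new disk sdn; replace them with the actual values):
ceph osd tree                                   # find the OSD in down state (osd.70 in this example)
touch /var/lib/ceph/shell/watch_maintaining     # prevent the OSDs from being automatically activated
systemctl stop ceph-osd@70.service              # stop the faulty OSD process
umount /var/lib/ceph/osd/ceph-70                # unmount the faulty disk
# pull out the faulty disk and insert the new disk, then continue
gdisk /dev/sdn                                  # create a partition: enter n, press Enter three times, then w and y
mkfs.xfs -f /dev/sdn1                           # create an XFS file system on the new partition
mount /dev/sdn1 /var/lib/ceph/osd/ceph-70       # mount the down OSD on the new disk
partprobe                                       # update the disk partition information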
Deleting a faulty disk at the CLI (ONEStor E31xx and E33xx)
1. As shown in Figure 130, log in to the faulty node's CLI through SSH, execute the ceph osd tree command, and find the OSD number that is down.
Figure 130 Finding the OSD number
2. First, execute the touch /var/lib/ceph/shell/watch_maintaining command to prevent the OSDs on the node from being automatically activated. Then, execute the systemctl stop ceph-osd@x.service command to stop the corresponding OSD process, where x is the faulty OSD number.
3. As shown in Figure 131, execute the mount command to view the mounting information for that OSD. Execute the umount /var/lib/ceph/osd/ceph-x command again to unmount the faulty disk, where x is the faulty OSD number.
Figure 131 Viewing OSD mounting information
4. Check the RAID controller type, and then pull out the faulty disk and insert a new disk as described in the related chapter in “Pulling out the faulty disk and inserting a new disk.”
5. Execute the following commands (x is the faulty OSD number) to clear the residual OSD information on the faulty disk.
a. Mark the OSD as out: ceph osd out osd.x
b. Mark the OSD as down: ceph osd down osd.x
c. Remove the OSD from the CRUSH map: ceph osd crush remove osd.x
d. Remove the device from the CRUSH map: ceph osd crush remove devicex
e. Remove the OSD authentication key: ceph auth del osd.x
f. Remove the OSD: ceph osd rm osd.x
6. If the faulty disk has the Scache accelerator and has residual Scache and cache partitions, you must locate and delete the Scache information and cache partitions. For more information, see “Finding and deleting Scache information and cache partitions.”
CAUTION: ONEStor E33xx and ONEStor E31xx use the Scache accelerator that runs in the user mode of the operating system. ONEStor R21xx uses the FlashCache accelerator that runs in the kernel mode of the operating system. The methods for finding and deleting cache partitions vary significantly. |
7. Execute the partprobe command to update disk partition information. Normally, no information is output after this command is executed. If the system displays that a partition is being written, contact Technical Support.
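If several faulty OSDs must be cleared, the commands listed in step 5 can be repeated for each one, for example with a loop like the following (the OSD IDs are hypothetical; verify them before deletion):
for x in 2 8 14 20; do                  # hypothetical faulty OSD numbers
    ceph osd out osd.$x                 # mark the OSD as out
    ceph osd down osd.$x                # mark the OSD as down
    ceph osd crush remove osd.$x        # remove the OSD from the CRUSH map
    ceph osd crush remove device$x      # remove the device from the CRUSH map, as listed in step 5
    ceph auth del osd.$x                # remove the OSD authentication key
    ceph osd rm osd.$x                  # remove the OSD
done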
Pulling out the faulty disk and inserting a new disk
IMPORTANT: The RAID controller operations introduced in this document are applicable to only H3C servers. For RAID controller operations on a third-party server, contact the related manufacturer. |
PMC RAID controller
1. Locate the faulty disk.
If the LED for the faulty disk is orange, simply pull out the faulty disk and insert a new disk. If the LED for the faulty disk is off, perform the following tasks to locate the faulty disk:
As shown in Figure 132, execute the arcconf identify 1 logicaldrive y command (y is the Logical Device Number obtained in "Querying disk letters through disk slot number") to turn on the LED for the faulty disk. Then, pull out the faulty disk and insert a new disk.
|
NOTE: After the LED is turned on, press any key to exit to turn it off. |
Figure 132 Locating the faulty disk
2. Check the faulty disk for residual logical arrays and delete them.
a. Execute the arcconf getconfig 1 ld command to view all logical arrays under the RAID controller. Normally, the Segment field displays Present, as shown in Figure 133. If the faulty disk has residual arrays, the field will display Missing, as shown in Figure 134. Record the logical device number (assume it is y) that is displayed as Missing.
|
NOTE: If all logical arrays obtained in this step are in Present state, the faulty disk does not have residual information to be deleted. In this case, skip step b. |
b. Execute the arcconf delete 1 logicaldrive y command (y is the logical device number obtained in the previous step) to delete the residual arrays. After deletion, repeat step 2.a to verify that no residual arrays remain.
3. Configure RAID for the new disk. For more information, see “PMC RAID controller.”
LSI RAID controller
1. Execute the /opt/raid/MegaRAID/storcli/storcli64 show command to determine the compatibility of StorCLI with the RAID controller. Based on the result, select the appropriate instruction set for subsequent operations.
¡ If the output contains the LSI RAID controller to be operated on, StorCLI is compatible with the RAID controller. See this section to obtain the disk letter.
¡ If the output does not contain the LSI RAID controller to be operated on, StorCLI is not compatible with the RAID controller. See "Common commands used in MegaCLI" to use the related commands in MegaCLI to obtain the disk letter.
CAUTION: To avoid anomalies caused by compatibility issues, ensure compatibility and select the correct instruction set before you perform operations related to LSI RAID controllers. |
Figure 135 Determining compatibility
2. Locate the faulty disk.
¡ If the LED for the faulty disk is orange, simply pull out the faulty disk and insert a new disk.
¡ If the LED for the faulty disk is off, as shown in Figure 136, execute the /opt/raid/MegaRAID/storcli/storcli64 /cN/eE/sS start locate command to turn on the LED for the disk. E and S represent the enclosure number and disk slot number obtained in "Querying disk letters through disk slot number", and N represents the RAID controller number. Replace these arguments with the actual values.
Figure 136 Turning on the LED for the faulty disk
3. Check for and clear the residual cache of the offline array:
a. As shown in Figure 137, execute the /opt/raid/MegaRAID/storcli/storcli64 /call show preservedcache command to obtain the logical array number to which the residual cache belongs. In this example, the residual cache belongs to controller 0, logical array 6.
Figure 137 Obtaining the logical array number
b. As shown in Figure 138, execute the /opt/raid/MegaRAID/storcli/storcli64 /cN/vx delete preservedcache force command to clear residual data. Replace N with the RAID controller number and x with the logical array number obtained from the previous command.
Figure 138 Clearing residual cache data
4. Check for and delete the residual logical arrays of the faulty disk:
a. Execute the /opt/raid/MegaRAID/storcli/storcli64 /cN show command to obtain logical arrays in UBad state. Replace N with the RAID controller number.
b. Execute the following command to delete any residual logical arrays (skip this step if no logical array remains):
- To change the disk state from UBad to UGood, execute the storcli64 /cN/eE/sS set good command. Replace E and S with the enclosure number and disk slot number obtained in "Querying disk letters through disk slot number", and replace N with the RAID controller number.
- If external configurations are not automatically imported, execute the /opt/raid/MegaRAID/storcli/storcli64 /call/fall import command to import them manually. Alternatively, execute the /opt/raid/MegaRAID/storcli/storcli64 /call/fall del command to delete the external configurations.
5. Configure RAID for the new disk.
For more information, see “LSI RAID controller.”
HP SSA RAID controller
1. Locate the faulty disk.
If the LED for the faulty disk is orange, simply pull out the faulty disk and insert a new disk. If the LED for the faulty disk is off, perform the following tasks to locate the faulty disk:
a. As shown in Figure 122, when you delete the faulty disk on the Web management interface of the storage system, the sdx string displayed represents the disk letter of the faulty disk in the operating system. If the Web management interface of the storage system displays that no data is available for the disk, execute the lsblk command to find an unmounted data disk, which is the faulty disk. Figure 139 displays an unmounted disk.
Figure 139 Unmounted data disk
b. As shown in Figure 140, execute the hpssacli ctrl all show command to obtain the slot numbers of all RAID controllers in the server. In this example, the slot number of the RAID controller is found to be 1.
Figure 140 Obtaining the RAID controller slot numbers
c. As shown in Figure 141, execute the hpssacli ctrl slot=n logicaldrive all show detail command to obtain all logical array numbers and their disk letters. Replace n with the RAID controller slot number obtained in the previous step. In this example, the disk letter sda corresponds to logical array number array A, Logical Drive 2.
Figure 141 Obtaining the logical array number
d. Execute the hpssacli ctrl slot=n logicaldrive y modify led=on command to turn on the LED (blue LED) for the faulty disk. Replace y with the logical drive number obtained in the previous step. Then, remove the faulty disk and insert a new one.
|
NOTE: To turn off the LED for the disk, execute the hpssacli ctrl slot=n logicaldrive y modify led=off command. |
2. If the faulty disk still contains residual logical arrays, delete the residual logical arrays.
a. As shown in Figure 142, execute the hpssacli ctrl slot=n logicaldrive all show command to view the state of all logical arrays under the RAID controller. Replace n with the RAID controller number. Normally, the state displays OK. For residual logical arrays of the faulty hard disk, the state will change to Failed or Missing. Record the logical drive number (assume it is y) that is displayed as Failed or Missing.
Figure 142 Obtaining the logical array state
|
NOTE: If all logical arrays obtained in this step are in OK state, the faulty disk does not have residual logical arrays to be deleted. In this case, skip this step and proceed with the next step. |
b. Execute the hpssacli ctrl slot=n logicaldrive y delete forced command to delete the residual array information. Replace y with the logical drive number obtained in the previous step. After deletion, repeat step 2.a to verify that no logical drives in Failed or Missing state exist.
3. Configure RAID for the new disk. For more information, see “HP SSA RAID controller.”
Adding the new disk back to the cluster
Use the following method to add the new disk back to the cluster from the Web management interface of the storage system.
1. As shown in Figure 143, log in to the Web management interface of the storage system, and verify that the cluster health has restored to 100%. Click Hosts > Storage Nodes, and then click the name of the desired host to access the disk management page.
Figure 143 Storage node management page
2. As shown in Figure 144, click the Create button and select the identified new disk. Finally, verify the information and complete the operation.
One-click replacing data disks
If the ONEStor version supports one-click disk replacement (the Replace Disk button exists on the disk management page), see this section for guidance on data disk replacement. If the ONEStor version does not support one-click disk replacement (the Replace Disk button does not exist on the disk management page), see "Replacing data disks (without using one-click disk replacement)" for guidance on data disk replacement.
You can perform a one-click disk replacement only when the following conditions are all met:
· The ONEStor version is R2130 or later.
· The server model is UIS 3010 G3, UIS3020 G3, UIS3030 G3, UIS3040 G3, or UIS 4500 G3.
· If the server uses a RAID controller, the RAID controller model must be LSI 9361 or LSI 9460.
As shown in Figure 145, if the requirements for one-click disk replacement are met, the disk management page will display the Replace Disk button.
Figure 145 Location of the Replace Disk button
The one-click data disk replacement process is as shown in Figure 146.
Figure 146 Flowchart for one-click data disk replacement
Locating the faulty disk
1. As shown in Figure 147, from the left navigation pane, select Hosts > Storage Nodes. You can see that the number of disks on the host has decreased.
Figure 147 The number of disks on the host has decreased
2. As shown in Figure 148, click the host name to access the disk management page, where you can see the disk letter of the faulty disk.
Figure 148 Viewing the disk letter of the faulty disk
3. You can turn on the LED for the faulty disk in one of the following methods.
¡ As shown in Figure 149, click Cluster > Topology, select the abnormal host, and then turn on the LED for the faulty disk.
Figure 149 Method 1: Turning on the LED for the faulty disk
¡ As shown in Figure 150, click Hosts > Storage Nodes, click the name of the desired host, and then click More in the Actions column for the faulty disk.
Figure 150 Method 2: Turning on the LED for the faulty disk
Pulling out the faulty disk and inserting a new disk
One-click replacing the disk
1. As shown in Figure 151, click Hosts > Storage Nodes to access the host management page, and click the name of the desired host. Select the disk letter of the disk to be replaced, and then click Replace Disk.
2. As shown in Figure 152, select the new disk, and then click Next. Click Create, verify the settings, and then wait for the automatic data balancing to be completed.
Figure 152 Selecting the new disk
Replacing system disks
· When only one system disk is faulty:
¡ The system disk usually uses RAID 1. If only one system disk fails, you can remove the faulty disk and then insert a new disk, which will automatically start rebuilding. You can determine the disk state by observing the disk LED. For more information, see the user guide for the server. If the system disk does not use RAID 1, contact Technical Support for help.
¡ If the system does not automatically start rebuilding after a new disk is inserted, you must restart the node and enter the BIOS to manually select Rebuild. For the shutdown and startup procedures, see the node shutdown operation guide for the product. For how to select the rebuilding operation in the BIOS, see the BIOS usage guide for the corresponding model.
· When two system disks both fail:
If two system disks both fail, contact Technical Support to confirm the issue and then follow the node repair guide for the product.
Replacing FlashCache and Scache cache disks (one-click disk replacement)
If the ONEStor version supports one-click disk replacement (the Replace Disk button is available on the disk management page), follow this section to replace the FlashCache disk. If the ONEStor version does not support one-click disk replacement (the Replace Disk button is not available on the disk management page), replace the FlashCache disk according to the ONEStor version as described in "Replacing FlashCache cache disks (ONEStor R21xx non-one-click disk replacement)" and "Replacing FlashCache cache disks (ONEStor E33xx non-one-click disk replacement)."
You can perform a one-click disk replacement only when the following conditions are met simultaneously:
· The ONEStor version is R2130 or later.
· The server model is UIS 3010 G3, UIS3020 G3, UIS3030 G3, UIS3040 G3, or UIS 4500 G3.
· If the server uses a RAID controller, the RAID controller model must be LSI 9361 or LSI 9460.
If the requirements for one-click disk replacement are met, the disk management page will display the Replace Disk button, as shown in Figure 153.
Figure 153 Location of the Replace Disk button
Locating the faulty disk
1. As shown in Figure 154, from the left navigation pane, select Hosts > Storage Nodes. You can see that the number of disks on the host has decreased.
Figure 154 The number of disks on the host has decreased
2. As shown in Figure 155, click the host name to open the disk management page, where you can see the disk letter of the faulty disk.
Figure 155 View the disk letter of the faulty disk
3. You can turn on the LED for a faulty hard disk in one of the following methods:
¡ As shown in Figure 156, click Cluster > Topology, select the abnormal host, and then turn on the LED for the faulty SSD disk.
Figure 156 Turning on the LED for the faulty disk (1)
¡ As shown in Figure 157, click Hosts > Storage Nodes, click the name of the desired host to open the disk management page. In the Actions column for the faulty disk, click More and select Turn On Locator LED.
Figure 157 Turning on the LED for the faulty disk (2)
Pulling out the faulty disk and inserting a new disk
One-click disk replacement
1. As shown in Figure 158, click Hosts > Storage Nodes to enter the host management page. Click the name of the desired host to enter the disk management page. Select the disk letter of the faulty SSD disk, and then click Replace Disk.
2. As shown in Figure 159, select the replacement disk, and then click Next.
3. As shown in Figure 160, click Create, confirm the information, and then wait for the automatic data balancing to be completed.
Figure 160 Click the Create button.
Replacing FlashCache cache disks (ONEStor R21xx non-one-click disk replacement)
If the ONEStor version does not support one-click disk replacement (the Replace Disk button is not available on the disk management page), follow this section or "Replacing FlashCache cache disks (ONEStor E33xx non-one-click disk replacement)" to replace the FlashCache disk. If the ONEStor version supports one-click disk replacement (the Replace Disk button is available on the disk management page), replace the FlashCache disk as described in "Replacing FlashCache and Scache cache disks (one-click disk replacement)."
You can perform a one-click disk replacement only when the following conditions are met simultaneously:
· The ONEStor version is R2130 or later.
· The server model is UIS 3010 G3, UIS3020 G3, UIS3030 G3, UIS3040 G3, or UIS 4500 G3.
· If the server uses a RAID controller, the RAID controller model must be LSI 9361 or LSI 9460.
Figure 161 Location of the Replace Disk button (New UI)
Pulling out the faulty disk and inserting a new disk
Check the RAID controller type, and then pull out the faulty disk and insert a new disk as described in the related sections in "Pulling out the faulty disk and inserting a new disk."
Removing FlashCache information from data disks
1. When the SSD cache disk of FlashCache fails, the data disks accelerated by it enter a down state. As shown in Figure 162, use the ceph osd tree command to view the faulty data disks. In this example, osd.8, osd.20, osd.32, and osd.44 are all in the down state.
2. As shown in Figure 163, use the lsblk command to view the FlashCache ID of the OSD in down state.
Figure 163 Obtain the flashcache identifier
3. Remove FlashCache information from data disks.
a. As shown in Figure 164, execute the ls /proc/sys/dev/flashcache | grep flashcache_identifier command (where flashcache_identifier is the FlashCache ID obtained in the previous step) to obtain the FlashCache information.
Figure 164 Obtain FlashCache information
b. As shown in Figure 165, execute the sysctl -w dev.flashcache.xxx.fast_remove=1 command to set the fast_remove flag, where xxx is the FlashCache information obtained in the previous step.
Figure 165 Remove the fast remove tag
c. As shown in Figure 166, use the dmsetup command to delete FlashCache information on a data disk.
Figure 166 Delete FlashCache information
4. Repeat steps 2 and 3 to clear FlashCache information on other OSDs.
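A consolidated sketch of steps 2 and 3 for one OSD, using a hypothetical FlashCache identifier (take the actual identifier from the lsblk output for the down OSD):
FC_ID=xxxx-xxxx-xxxx                                   # hypothetical FlashCache identifier from lsblk
FC_NAME=$(ls /proc/sys/dev/flashcache | grep "$FC_ID") # full FlashCache entry name for this OSD
sysctl -w dev.flashcache."$FC_NAME".fast_remove=1      # set the fast_remove flag
dmsetup remove /dev/mapper/"$FC_ID"                    # delete the FlashCache mapping for the data disk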
Deleting a data disk
1. Unmount the faulty disk.
a. As shown in Figure 167, execute the mount command to view the mount information and corresponding disk letter of the faulty OSD. In this example, osd.8 corresponds to the disk letter sde.
Figure 167 View the corresponding disk letter for OSD
b. As shown in Figure 168, execute the umount /var/lib/ceph/osd/ceph-8 command to unmount the faulty disk.
Figure 168 Unmounting the faulty disk
2. Repartition and mount the faulty OSD
a. As shown in Figure 169, execute the ceph-disk zap /dev/sde command to format the faulty OSD.
As shown in Figure 170, execute the lsblk command. You can see that disk sde does not have partitions.
b. As shown in Figure 171, execute the gdisk /dev/sde command to partition the new disk. During this process, enter n, press Enter multiple times to accept the defaults, and then enter w and y.
Figure 171 Partitioning the new disk
As shown in Figure 172, execute the lsblk command. You can see that disk sde has partitions.
c. As shown in Figure 173, execute the mkfs.xfs -f /dev/sde1 command to configure a file system for the newly created partition.
Figure 173 Configuring the file system
d. As shown in Figure 174, execute the mount /dev/sde1 /var/lib/ceph/osd/ceph-8 command to remount the OSD in the down state.
3. Repeat steps 1 and 2 to remount other down-state OSDs (osd.20, osd.32, osd.44 in this example).
4. As shown in Figure 175, delete the remounted data disk on the storage system's management page and wait for the delete task to complete.
Figure 175 Delete the remounted data disk
Adding the data disk back to the cluster
See "Adding the new disk back to the cluster."
Replacing FlashCache cache disks (ONEStor E33xx non-one-click disk replacement)
If the ONEStor version does not support one-click disk replacement (the Replace Disk button is not available on the disk management page), follow this section or "Replacing FlashCache cache disks (ONEStor R21xx non-one-click disk replacement)" to replace the FlashCache disk. If the ONEStor version supports one-click disk replacement (the Replace Disk button is available on the disk management page), replace the FlashCache disk as described in "Replacing FlashCache and Scache cache disks (one-click disk replacement)."
You can perform a one-click disk replacement only when the following conditions are met simultaneously:
· The ONEStor version is R2130 or later.
· The server model is UIS 3010 G3, UIS3020 G3, UIS3030 G3, UIS3040 G3, or UIS 4500 G3.
· If the server uses a RAID controller, the RAID controller model must be LSI 9361 or LSI 9460.
Figure 176 Location of the Replace Disk button (New UI)
Pulling out the faulty disk and inserting a new disk
Check the RAID controller type, and then pull out the faulty disk and insert a new disk as described in the related sections in “Pulling out the faulty disk and inserting a new disk.”
Deleting a data disk
1. As shown in Figure 177, execute the ceph osd tree command to view data disks in down state. In this example, the data disks in down state are osd.2, osd.8, osd.14, and osd.20.
Figure 177 Check data disk status
2. Unmount the faulty disk
a. As shown in Figure 178, execute the mount command to view the mount information and corresponding disk letter of the OSD. In this example, osd.8 corresponds to the disk letter sde.
Figure 178 View the corresponding disk letter for OSD
b. As shown in Figure 179, execute the umount /var/lib/ceph/osd/ceph-x (x is the number of the faulty OSD) to unmount the faulty disk.
Figure 179 Unmounting the faulty disk
3. Format the corresponding disk.
a. As shown in Figure 180, execute the ceph-disk zap /dev/sde command to format the disk corresponding to the faulty OSD.
b. As shown in Figure 181, execute the lsblk command. You can see that disk sde does not have partitions.
c. Execute the following commands to delete the faulty OSD, where x is the number of the faulty OSD. Before deletion, confirm that the OSD number is correct to avoid accidental deletion.
- ceph osd crush remove osd.x
- ceph auth del osd.x
- ceph osd rm osd.x
4. Query and delete the cache partition. For more information, see "Finding and deleting FlashCache information and cache partitions."
5. Repeat the previous steps for the other faulty OSDs (in this example, osd.2, osd.14, osd.20).
Adding the data disk and cache disk back to the cluster
As shown in Figure 182, click Hosts, click the name of the desired host, select the data disk and cache disk, and then add the data disk and cache disk back to the cluster as prompted.
Figure 182 Disk management page
Replacing Scache cache disks (ONEStor E31xx, E33xx, one-click disk replacement)
If ONEStor supports one-click disk replacement (the Replace Disk button is available on the disk management page), follow this section to replace the cache disk. If ONEStor does not support one-click disk replacement (the Replace Disk button is not available on the disk management page), replace the cache disk as described in "Replacing Scache cache disks (ONEStor E31xx, E33xx non-one-click disk replacement)."
You can perform a one-click disk replacement only when the following conditions are met simultaneously:
· The ONEStor version is R2130 or later.
· The server model is UIS 3010 G3, UIS3020 G3, UIS3030 G3, UIS3040 G3, or UIS 4500 G3.
· If the server uses a RAID controller, the RAID controller model must be LSI 9361 or LSI 9460.
If the requirements for one-click disk replacement are met, the disk management page will display the Replace Disk button, as shown in Figure 183.
Figure 183 Location of the Replace Disk button
Locating the faulty disk
1. As shown in Figure 184, from the left navigation pane, select Hosts > Storage Nodes. You can see that the number of disks on the host has decreased.
Figure 184 The number of disks on the host has decreased
2. As shown in Figure 185, click the host name to open the disk management page, where you can see the disk letter of the faulty disk.
Figure 185 View the disk letter of the faulty disk
3. You can turn on the LED for a faulty hard disk in one of the following methods:
¡ As shown in Figure 186, click Cluster > Topology, select the abnormal host, and then turn on the LED for the faulty SSD disk.
Figure 186 Turning on the LED for the faulty disk (1)
¡ As shown in Figure 187, click Hosts > Storage Nodes, click the name of the desired host to open the disk management page. In the Actions column for the faulty disk, click More and select Turn On Locator LED.
Figure 187 Turning on the LED for the faulty disk (2)
Pulling out the faulty disk and inserting a new disk
Check the RAID controller type, and then pull out the faulty disk and insert a new disk as described in the related sections in "Pulling out the faulty disk and inserting a new disk."
One-click disk replacement
1. As shown in Figure 188, click Hosts > Storage Nodes to enter the host management page. Click the name of the desired host to enter the disk management page. Select the disk letter of the faulty SSD disk, and then click Replace Disk.
2. As shown in Figure 189, select the replacement disk, and then click Next.
3. As shown in Figure 190, click Create, confirm the information, and then wait for the automatic data balancing to be completed.
Figure 190 Click the Create button
Replacing Scache cache disks (ONEStor E31xx, E33xx non-one-click disk replacement)
If ONEStor does not support one-click disk replacement (the Replace Disk button is not available on the disk management page), follow this section to replace the cache disk. If ONEStor supports one-click disk replacement (the Replace Disk button is available on the disk management page), replace the cache disk as described in "Replacing Scache cache disks (ONEStor E31xx, E33xx, one-click disk replacement)."
You can perform a one-click disk replacement only when the following conditions are met simultaneously:
· The ONEStor version is R2130 or later.
· The server model is UIS 3010 G3, UIS3020 G3, UIS3030 G3, UIS3040 G3, or UIS 4500 G3.
· If the server uses a RAID controller, the RAID controller model must be LSI 9361 or LSI 9460.
Figure 191 Location of the Replace Disk button
Pulling out the faulty disk and inserting a new disk
Check the RAID controller type, and then pull out the faulty disk and insert a new disk as described in the related sections in "Pulling out the faulty disk and inserting a new disk."
Deleting a data disk
1. As shown in Figure 192, execute the ceph osd tree command to view data disks in down state. In this example, the data disks in down state are osd.2, osd.8, osd.14, and osd.20.
Figure 192 Check data disk status
2. Unmount the faulty disk:
a. As shown in Figure 193, execute the mount command to view the mount information and corresponding disk letter of the OSD. In this example, osd.8 corresponds to the disk letter sde.
Figure 193 View the corresponding disk letter for OSD
b. As shown in Figure 194, execute the umount /var/lib/ceph/osd/ceph-x command (x is the number of the faulty OSD) to unmount the faulty disk.
Figure 194 Unmounting the faulty disk
3. Format the corresponding disk.
a. As shown in Figure 195, execute the ceph-disk zap /dev/sde command to format the disk corresponding to the faulty OSD.
b. As shown in Figure 196, execute the lsblk command. You can see that sde does not have partitions.
c. Execute the following commands to delete the faulty OSD, where x is the number of the faulty OSD. Before deletion, confirm that the OSD number is correct to avoid accidental deletion.
- ceph osd crush remove osd.x
- ceph auth del osd.x
- ceph osd rm osd.x
4. Query and delete the cache partition. For more information, see "Finding and deleting Scache information and cache partitions".
5. Repeat the preceding steps for the other faulty OSDs (in this example, osd.2, osd.14, and osd.20). A consolidated example of the deletion commands for osd.8 is provided after this procedure.
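For reference, the following is a consolidated sketch of the deletion sequence for osd.8 (data disk sde) in this example. Verify the OSD number and disk letter in your own environment before executing the commands, and then clean up the cache partitions as described in step 4.
umount /var/lib/ceph/osd/ceph-8
ceph-disk zap /dev/sde
ceph osd crush remove osd.8
ceph auth del osd.8
ceph osd rm osd.8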
Adding the data disk and cache disk back to the cluster
As shown in Figure 197, click Hosts, click the name of the desired host, select the data disk and cache disk, and then add the data disk and cache disk back to the cluster as prompted.
Figure 197 Disk management page
Using Toolkit to perform an inspection
Use the toolkit to perform an inspection after completing the above operations. For information about how to use Toolkit, see the Toolkit usage guide.
· If the inspection result does not include any error messages, the component replacement is successful.
· If the inspection results include error messages, resolve the issues based on the messages. If you cannot resolve the issues, contact Technical Support.
Common operations during disk replacement
Finding FlashCache partition information
Method 1 (recommended)
As shown in Figure 198, use the lsblk command to view the mount path and symbolic link (symlink) information of the OSD. In this example, osd.11 corresponds to data disk sde. After UUID comparison, you can find that the FlashCache partition corresponding to sde is sdf8.
Figure 198 Finding FlashCache partition information (Method 1)
Method 2
If you cannot find the UUID corresponding to the faulty OSD through Method 1, you can eliminate all of the FlashCache partitions corresponding to normal OSDs. The remaining partition must be the one belonging to the faulty OSD. After finding the desired UUID, execute the following commands in sequence to remove the related FlashCache symlinks.
1. umount /var/lib/ceph/osd/ceph-x (x is the number of the faulty OSD. You can edit it as needed.)
2. dmsetup remove /dev/mapper/xxxx-xxxx-xxxx (xxxx-xxxx-xxxx is the UUID of the FlashCache partition. You can edit it as needed.)
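A minimal sketch of this removal, assuming the faulty OSD is osd.11 (as in Method 1) and that the UUID found for its FlashCache partition is e3abd762-ad2e-4221-b6d6-e9a29b6eae82 (an illustrative value borrowed from the cleanup example later in this chapter; substitute the UUID you actually found):
umount /var/lib/ceph/osd/ceph-11
dmsetup remove /dev/mapper/e3abd762-ad2e-4221-b6d6-e9a29b6eae82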
Finding and deleting FlashCache information and cache partitions
Finding FlashCache ID information
As shown in Figure 199, execute the lsblk command to view FlashCache ID information in cache disks. In this example, SSDs sdk and sdj act as cache disks. The information marked in the following figure is FlashCache ID information.
Figure 199 Viewing FlashCache ID information
Finding FlashCache partition ID information
As shown in Figure 200, repeatedly execute the lsblk | grep "FlashCacheID" command. You can edit FlashCacheID as needed. Identify whether a command output containing only one record exists. The record carries the FlashCache partition ID corresponding to the removed disk. In this example, the FlashCache partition ID corresponding to the removed disk is e3abd762-ad2e-4221-b6d6-e9a29b6eae82.
Figure 200 Viewing FlashCache partition ID information
Checking for residual FlashCache information
As shown in Figure 201, execute the ls /proc/sys/dev/flashcache |grep “FlashCacheID” command to check the FlashCache partition of the removed disk for residual FlashCache information. You can edit FlashCacheID as needed.
Figure 201 Checking for FlashCache information
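For example, assuming the FlashCache ID is f28c1e04-cf71-4853-b628-8017db519b4a, as in the removal example below, the check might look like this:
ls /proc/sys/dev/flashcache | grep "f28c1e04-cf71-4853-b628-8017db519b4a"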
Deleting residual FlashCache information
If residual FlashCache information is found, execute the following commands in sequence to remove residual FlashCache information from the disk. If no residual FlashCache information is found, skip this step.
[root@node127 ~]# sysctl -w dev.flashcache.f28c1e04-cf71-4853-b628-8017db519b4a+e3abd762-ad2e-4221-b6d6-e9a29b6eae82.fas remove=1
[root@node127 ~]# dmsetup remove e3abd762-ad2e-4221-b6d6-e9a29b6eae82
NOTE:
· Before executing the sysctl -w command, use the dmsetup table command to identify the write mode. If the write mode is write-back, the sysctl -w command is required. If the write mode is write-around (write-through), the sysctl -w command is not required, because no fast_remove flag exists in this mode.
· The italicized part in the first command is the residual FlashCache information of the FlashCache partition. The italicized part in the second command is the ID of the FlashCache partition. You can edit them as needed.
Verifying that residual FlashCache information has been deleted
As shown in Figure 202, execute the lsblk command to view information about the related cache disk. In this example, you can see that sdk2 no longer has FlashCache ID information.
Figure 202 Viewing cache disk information
Deleting cache partitions
As shown in Figure 203, execute the parted /dev/sdx -s rm n command to delete a cache partition from a cache disk. sdx is the disk letter of the cache disk, and n is the number of the partition to be deleted. You can edit them as needed.
Figure 203 Deleting a partition
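For example, to delete the second partition of cache disk sdk in this example, you would run:
parted /dev/sdk -s rm 2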
Verifying that the desired cache partition has been deleted
As shown in Figure 204, execute the lsblk command to verify that the desired cache partition has been deleted completely. In this example, you can see that the second partition of sdk has been completely deleted.
Figure 204 Viewing cache disk information
Finding and deleting Scache information and cache partitions
Method 1
If the following conditions are met, you can directly find the cache partitions corresponding to the faulty OSD:
· The ONEStor version is E3322 or later.
· You know the ID of the faulty OSD.
1. As shown in Figure 205, execute the following commands in sequence to find the cache partitions corresponding to the faulty OSD. In this example, the cache partitions corresponding to the faulty disk are sdl5, sdl6, and sdl7.
a. ll /var/lib/ceph/osd-cache-config/ceph-x
b. cd /var/lib/ceph/osd-cache-config/ceph-x
c. cat block.db_uuid
d. cat block.wal_uuid
e. cat fcache_uuid
f. ll /dev/disk/by-partuuid/ | grep xxx
NOTE:
· If the current ONEStor version is upgraded from E21xx, R21xx, E31xx, or R31xx, you only need to execute the cat fcache_uuid command (step e). When you delete cache partitions, only one cache partition needs to be deleted.
· In the above commands, x is the OSD ID, and xxx in step f is the partition UUID obtained from the cat commands. You can edit them as needed.
Figure 205 Finding cache partitions
2. Execute the parted /dev/sdl -s rm x command to delete partitions. x is the number of the partition to be deleted. You can edit it as needed.
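The following is a worked sketch of this procedure, assuming the faulty OSD ID is 8 (an illustrative value) and that the cache partitions found are sdl5, sdl6, and sdl7 as in this example:
ll /var/lib/ceph/osd-cache-config/ceph-8
cd /var/lib/ceph/osd-cache-config/ceph-8
cat block.db_uuid
cat block.wal_uuid
cat fcache_uuid
ll /dev/disk/by-partuuid/ | grep xxx      # xxx is each UUID returned by the cat commands
parted /dev/sdl -s rm 5
parted /dev/sdl -s rm 6
parted /dev/sdl -s rm 7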
Method 2
When the ONEStor version is lower than E3322, there is no /var/lib/ceph/osd-cache-config/ path to record cache partition information for OSDs. If the disk has been removed or is running incorrectly, the mount directory might be inaccessible. In this situation, you must use the existing OSD number to find the cache partitions corresponding to the faulty disk.
If the current ONEStor version meets one of the following requirements, find and delete the partitions corresponding to the faulty OSD under the guidance of the following example:
· It is E31xx or R31xx.
· It is an E33xx version lower than E3322 that was upgraded from E21xx, R21xx, E31xx, or R31xx.
1. As shown in Figure 206, use the lsblk command to view the disk letter of the cache disk. In this example, the disk letter is sdl, and the cache disk has five partitions (1 to 5).
Figure 206 Cache disk information
2. As shown in Figure 207, execute the for i in `cat /var/lib/ceph/osd/ceph-*/fcache_uuid`; do ll /dev/disk/by-partuuid/ | grep $i ; done; command to view cache partition information. Compare the command output with the cache partition information obtained in the previous step. Except partition 1, any cache partition that is not included in the command output is a partition corresponding to the faulty OSD. In this example, the partition corresponding to the faulty OSD is sdl3.
Figure 207 Viewing cache partition information
NOTE:
If the ONEStor version is higher than E3322, the following two commands need to be executed together with the for i in `cat /var/lib/ceph/osd/ceph-*/fcache_uuid`; do ll /dev/disk/by-partuuid/ | grep $i ; done; command:
· for i in `cat /var/lib/ceph/osd/ceph-*/block.wal_uuid`; do ll /dev/disk/by-partuuid/ | grep $i ; done;
· for i in `cat /var/lib/ceph/osd/ceph-*/block.db_uuid`; do ll /dev/disk/by-partuuid/ | grep $i ; done;
Compare the outputs of the three commands with the cache partition information obtained in the previous step. Only in this way can you find the partitions corresponding to the faulty OSD.
3. Execute the parted /dev/sdx -s rm n command to delete a cache partition from a cache disk. sdx is the disk letter of the cache disk, and n is the number of the partition to be deleted. You can edit them as needed.
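Continuing this example, after confirming that sdl3 is the cache partition of the faulty OSD, you would delete it as follows:
parted /dev/sdl -s rm 3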
Configuring RAID for the new disk
The RAID controller operations introduced in this document are applicable to only H3C servers. For RAID controller operations on a third-party server, contact the related manufacturer.
PMC RAID controller
Method 1 (recommended)
1. As shown in Figure 208, select Resources > RAID Controllers from the left navigation pane.
2. Enter the IP address of the node requiring disk replacement, and the root password.
3. Click Scan.
4. Select the unknown disk. Make sure read/write caching for the RAID controller is disabled for SSDs and enabled for HDDs.
5. Click Recognize Disk.
Figure 208 Configuring RAID for the new disk (Method 1 for PMC RAID controllers)
Method 2
If Method 1 is not supported, use this method.
1. As shown in Figure 209, execute the arcconf getconfig 1 pd command to find the new disk, and then record its CD information (C for channel, D for device). The new disk is in Raw status. In this example, the CD information of the new disk is 0, 15.
Figure 209 Obtaining the CD information of the new disk
2. Execute the arcconf task start 1 device C D initialize command to initialize the new disk. You can edit CD information in the command as needed.
3. Execute the arcconf create 1 logicaldrive max simple_volume C D command to configure RAID for the new disk. You can edit CD information in the command as needed. A consolidated example of steps 1 through 4 is provided after Figure 212.
4. After the RAID configuration is completed, a new logical disk array will appear. You can execute the arcconf getconfig 1 ld command to obtain the number of the new logical disk array.
5. Identify the disk type, and then execute the commands listed in Table 9 in sequence to set the caching modes. In the commands, y is the number of the new logical disk array. You can edit it as needed.
Table 9 Setting the caching modes

HDD:
1. Enable read caching for the RAID controller.
2. Enable write caching for the RAID controller and enable power failure protection mode.
3. Disable physical write caching.

SSD:
1. Disable read caching for the RAID controller.
2. Disable write caching for the RAID controller.
3. Disable physical write caching.

Remarks: As shown in Figure 210 and Figure 211, after you execute step 3, the following error message might be prompted: Controller Global Physical Devices Cache policy is already Disabled. This is a normal phenomenon.
Figure 210 Disabling physical write caching (HDD)
Figure 211 Disabling physical write caching (SSD)
6. As shown in Figure 212, execute the lsblk command to identify whether a new disk without partitions exists. If such a disk is found, the RAID configuration is successful.
Figure 212 Successful RAID configuration (PMC RAID controller)
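Putting steps 1 through 4 together with the CD information from this example (channel 0, device 15), the command sequence might look as follows. Replace the channel and device numbers with those of your new disk.
arcconf getconfig 1 pd
arcconf task start 1 device 0 15 initialize
arcconf create 1 logicaldrive max simple_volume 0 15
arcconf getconfig 1 ld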
LSI RAID controller
Method 1 (recommended)
1. As shown in Figure 213, select Resources > RAID Controllers from the left navigation pane.
2. Enter the IP address of the node requiring disk replacement, and the root password.
3. Click Scan.
4. Select the unknown disk. Make sure read/write caching for the RAID controller is disabled for SSDs and enabled for HDDs.
5. Click Recognize Disk.
Figure 213 Configuring RAID for the new disk (Method 1 for LSI RAID controllers)
Method 2
If Method 1 is not supported, use this method.
1. Execute the /opt/raid/MegaRAID/storcli/storcli64 show command to determine the compatibility of StorCLI with the RAID controller. Based on the result, select the appropriate instruction set for subsequent operations.
¡ If the output contains the LSI RAID controller to be operated on, StorCLI is compatible with the RAID controller. See this section to obtain the disk letter.
¡ If the output does not contain the LSI RAID controller to be operated on, StorCLI is not compatible with the RAID controller. See "Common commands used in MegaCLI" to use the related commands in MegaCLI to obtain the disk letter.
CAUTION: To avoid anomalies caused by compatibility issues, ensure StorCLI compatibility with LSI RAID controllers, and select the correct instruction set before you perform operations related to LSI RAID controllers.
Figure 214 Determining compatibility
2. As shown in Figure 215, execute the /opt/raid/MegaRAID/storcli/storcli64 /cN add vd r0 drives=E:S command to configure RAID0 for a single disk. E and S represent the enclosure number and the disk slot number obtained in “Querying disk letters through disk slot number”, respectively, and N represents the RAID controller number. You can edit them as needed. In this example, you can see that a logical disk array is created successfully.
Figure 215 Creating a logical disk array (LSI RAID controller)
3. After the RAID configuration is completed, a new logical disk array will appear. Execute the /opt/raid/MegaRAID/storcli/storcli64 /cN show command to view the logical disk array ID corresponding to the current E:S information. N represents the RAID controller number. You can edit it as needed. To view the caching status of the logical disk array, execute the /opt/raid/MegaRAID/storcli/storcli64 /cN/vx show all command. x represents the logical disk array ID obtained in the previous step, and N represents the RAID controller number. You can edit them as needed. Verify that the caching status meets the requirements listed in "Checking disk caches". If the caching status fails to meet the requirements, execute the commands listed in Table 10 in sequence to adjust the caching modes according to the disk type. A consolidated example of this method is provided after Figure 216.
Table 10 Setting the caching modes

HDD:
1. Disable write caching on the disk.
2. Enable write caching for the RAID controller.
3. Enable read caching for the RAID controller.

SSD:
1. Disable write caching for the RAID controller.
2. Disable read caching for the RAID controller.

Remarks: x represents the logical disk array ID obtained in the previous step, and N represents the RAID controller number. You can edit them as needed.
4. As shown in Figure 216, execute the lsblk command to identify whether a new disk without partitions exists. If such a disk is found, the RAID configuration is successful.
Figure 216 Successful RAID configuration (LSI RAID controller)
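A sketch of steps 1 through 3, assuming the RAID controller number is 0, the E:S value of the new disk is 252:5, and the new logical disk array ID returned by the show command is 1 (all illustrative values; substitute the values from your own environment):
/opt/raid/MegaRAID/storcli/storcli64 show
/opt/raid/MegaRAID/storcli/storcli64 /c0 add vd r0 drives=252:5
/opt/raid/MegaRAID/storcli/storcli64 /c0 show
/opt/raid/MegaRAID/storcli/storcli64 /c0/v1 show all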
HP SSA RAID controller
1. As shown in Figure 217, execute the hpssacli ctrl slot=n pd all show command to find the new disk (a disk in unassigned state), and then record its disk slot number. n represents the slot number of the RAID controller. You can edit it as needed. In this example, the slot number of the new disk is 1I:4:6.
Figure 217 Obtaining the slot number of a disk
2. Execute the hpssacli ctrl slot=n create type=ld drives=xx:x:x raid=0 command to configure RAID for the new disk. n represents the slot number of the RAID controller, and xx:x:x represents the slot number of the new disk. You can edit them as needed.
3. To avoid incorrect initial caching modes, adjust the caching modes as follows (a consolidated example of this procedure is provided after Figure 218):
¡ Disable caching on the physical disk (n represents the slot number of the RAID controller. You can edit it as needed.):
hpssacli ctrl slot=n modify drivewritecache=disable
¡ Enable caching on a logical disk (n represents the slot number of the RAID controller, and y represents the ID of the desired logical disk. You can edit them as needed.):
hpssacli ctrl slot=n logicaldrive y modify caching=enable
¡ Enable power failure protection for the RAID controller, and then enter y to confirm the operation (n represents the slot number of the RAID controller. You can edit it as needed.):
hpssacli ctrl slot=n modify nobatterywritecache=disable
4. As shown in Figure 218, execute the lsblk command to identify whether a new disk without partitions exists. If such a disk is found, the RAID configuration is successful.
Figure 218 Successful RAID configuration (HP SSA RAID controller)
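A consolidated sketch of this procedure, assuming the RAID controller is in slot 0 and the new logical disk is logical drive 1 (illustrative values), with the new disk 1I:4:6 from this example:
hpssacli ctrl slot=0 pd all show
hpssacli ctrl slot=0 create type=ld drives=1I:4:6 raid=0
hpssacli ctrl slot=0 modify drivewritecache=disable
hpssacli ctrl slot=0 logicaldrive 1 modify caching=enable
hpssacli ctrl slot=0 modify nobatterywritecache=disable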
Common commands used in MegaCLI
· To avoid anomalies caused by compatibility issues, ensure StorCLI compatibility with LSI RAID controllers, and select the correct instruction set before you perform operations related to LSI RAID controllers.
· If StorCLI is not compatible with the related LSI RAID controller, see this section and use the relevant MegaCLI commands for operations.
· To view the MegaCLI commands not mentioned in this document, execute the megacli -help command.
· To view the StorCLI commands not mentioned in this document, execute the /opt/raid/MegaRAID/storcli/storcli64 /cx help command. x represents the slot number of the RAID controller. You can edit it as needed.
Checking cache information on disks
When an LSI RAID controller is used, you can use the following MegaCLI commands to view cache information on disks.
Operation: View the write caching status and RAID controller status
Command: megacli -LDinfo -Lall -ax
Remarks: x represents the slot number of the RAID controller. You can edit it as needed.
Querying disk letters through disk slot number
When an LSI RAID controller is used, you can use the following MegaCLI commands for disk letter query.
Operation: Query the RAID controller number
Command: megacli -AdpAllinfo -aALL | more
Remarks: In the command output, the number following Adapter # represents the RAID controller number.

Operation: Query the virtual drive number of the disk corresponding to a slot number
Command: megacli -LDPDinfo -ax
Remarks: x represents the obtained RAID controller number.
Removing the faulty disk and inserting a new disk
When an LSI RAID controller is used, you can use the following MegaCLI commands for disk removal and disk insertion.
Operation: Turn on the LED for a disk
Command: megacli -PdLocate -start -physdrv[E:S] -aN
Remarks: E and S represent the enclosure number and disk slot number obtained in "Querying disk letters through disk slot number", respectively. N represents the RAID controller number. You can edit them as needed.

Operation: Obtain the logical disk array ID to which the residual cache data belongs
Command: megacli -GetPreservedCacheList -aN
Remarks: N represents the RAID controller number. You can edit it as needed.

Operation: Clear residual cache data
Command: megacli -DiscardPreservedCache -Lx -aN
Remarks: N represents the RAID controller number, and x represents the logical disk array ID obtained in the previous step. You can edit them as needed.

Operation: Check for logical disk arrays in Failed state
Command: megacli -LDinfo -Lall -aN
Remarks: N represents the RAID controller number. You can edit it as needed.

Operation: Delete residual information about a logical disk array
Command: megacli -cfglddel -lx -aN
Remarks: N represents the RAID controller number, and x represents the ID of the logical disk array in Failed state. You can edit them as needed.
Configuring RAID for new disks
When an LSI RAID controller is used, you can use the following MegaCLI commands to configure RAID for new disks.
Operation: Create a logical disk array
Command: megacli -CfgLDAdd -R0[E:S] -aN
Remarks: E and S represent the enclosure number and disk slot number obtained in "Querying disk letters through disk slot number", respectively. N represents the RAID controller number. You can edit them as needed.

Operation: View the caching status of a new logical disk array
Command: megacli -LDinfo -Lx -aN
Remarks: x represents the ID of the new logical disk array, and N represents the RAID controller number. You can edit them as needed.

Operation: Configure caching mode settings for HDDs
Commands:
1. Disable write caching on disks: megacli -ldsetprop DisDskCache -Lx -aN
2. Enable write caching for a RAID controller: megacli -LDSetProp WB -Lx -aN
3. Enable power failure protection for a RAID controller: megacli -ldsetprop NoCachedBadBBU -Lx -aN
4. Enable read caching for a RAID controller: megacli -LDSetProp RA -Lx -aN
Remarks: x represents the ID of the new logical disk array, and N represents the RAID controller number. You can edit them as needed.

Operation: Configure caching mode settings for SSDs
Commands:
1. Disable write caching on disks (when you execute this command, the system might prompt that the command is not supported; in that case, this command is not required): megacli -ldsetprop DisDskCache -Lx -aN
2. Disable write caching for a RAID controller: megacli -LDSetProp WT -Lx -aN
3. Disable read caching for a RAID controller: megacli -LDSetProp NORA -Lx -aN
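For example, to create and configure a RAID 0 array for a new HDD with MegaCLI, assuming an enclosure:slot value of 32:5, RAID controller number 0, and a new logical disk array ID of 2 (all illustrative values), the sequence might be:
megacli -CfgLDAdd -R0[32:5] -a0
megacli -LDinfo -L2 -a0
megacli -ldsetprop DisDskCache -L2 -a0
megacli -LDSetProp WB -L2 -a0
megacli -ldsetprop NoCachedBadBBU -L2 -a0
megacli -LDSetProp RA -L2 -a0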