Troubleshooting hardware
Card issues
Abnormal card status
Symptom
· A card is abnormal. (For example, the card status displays Absent or Fault after you execute the display device command.)
· A card fails to boot, or it reboots unexpectedly or repeatedly.
Common causes
The following are the common causes for this type of issue:
· The card is not securely installed.
· The card is damaged.
· Lighting of LEDs on the card panel is abnormal.
· A power supply has failed.
· The power supply output power is insufficient.
· The host software version does not support the card.
Troubleshooting flow
Figure 1 shows the troubleshooting flowchart.
Figure 1 Flowchart for troubleshooting the issue of abnormal card status
Solution
Card in Absent status
1. Identify whether the card is securely installed. Examine for gaps between the card and the chassis. You can also reinstall the card. Before reinstallation, make sure the connector of the card is not distorted or dirty.
2. Move the card to another slot, or move a normal card from another slot to the slot where the card is installed. This operation helps you identify whether the card is faulty.
3. Identify whether the LEDs on the card panel are lit.
4. Identify whether the power supply output power is insufficient. For example, add power supplies and identify whether the card status restores to normal.
5. Identify whether the host software version supports the card.
a. Execute the display version command to view the software version of the host.
b. Contact Technical Support to identify whether the current software version of the host supports the card.
c. If the current software version does not support the card, upgrade it to a compatible version. Before version upgrade, make sure the new version is compatible with other cards.
6. If the card is an interface module, first ensure that the MPU is operating correctly and that the subcard connectors are not deformed or dirty.
7. If you confirm that the card is faulty, replace it. Collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Card in Fault status
1. Check the system power consumption. If the power supplies cannot provide enough power for the system, the card enters Fault status.
2. Wait about 10 minutes to identify whether the card remains in Fault status or is in Normal status and then reboots again. If the card is in Normal status and automatically reboots, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
3. Move the card to another slot to identify whether the slot is faulty.
4. If you confirm that the card is faulty, replace it. Collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Card reboot anomaly
A card reboot anomaly refers to the situation where a card reboots unexpectedly but its status is normal after the reboot.
1. Determine whether a user rebooted the card by using the reboot command or by powering off and then powering on the card during the period.
2. Use the display version command to obtain the reason for the most recent reboot of the card. For example, the Last reboot reason field in the following output displays Cold reboot, which indicates that the most recent reboot occurred because the device was powered on.
<Sysname> display version
H3C Comware Software, Version 7.1.075, Release 7751P01
Copyright (c) 2004-2017 New H3C Technologies Co. Ltd. All rights reserved.
H3C xxx uptime is 0 weeks, 0 days, 4 hours, 24 minutes
Last reboot reason : Cold reboot……
3. If all cards reboot simultaneously, verify the following information:
¡ The power supplies are operating correctly.
¡ The external power source does not have a power outage.
¡ The power cables are connected securely.
4. If you cannot confirm the above information, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
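The reboot reason check in step 2 can be scripted when display version output has been saved to a file or captured over a session. The following is a minimal sketch, assuming the field appears on one line as "Last reboot reason : <text>" as in the sample above. The function name last_reboot_reason and the trimmed sample text are illustrative and not part of the device software.

```python
import re

# Matches the "Last reboot reason : ..." line shown in the sample output.
_REASON = re.compile(r"^\s*Last reboot reason\s*:\s*(.+?)\s*$", re.MULTILINE)

def last_reboot_reason(display_version_output):
    """Return the text after 'Last reboot reason :', or None if the field is absent."""
    match = _REASON.search(display_version_output)
    return match.group(1) if match else None

# Trimmed copy of the sample output in this section (illustrative only).
sample = """H3C Comware Software, Version 7.1.075, Release 7751P01
H3C xxx uptime is 0 weeks, 0 days, 4 hours, 24 minutes
Last reboot reason : Cold reboot
"""

print(last_reboot_reason(sample))
```

A None result means only that the field was not found in the captured text. It does not indicate that no reboot occurred.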
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
MPU startup failure
Symptom
The original MPU or the standby MPU newly installed on the device cannot start up.
Common causes
The following are the common causes for this type of issue:
· The MPU cannot be powered up due to hardware failure.
· The basic section of BootWare for the MPU is damaged.
· The BootWare cannot operate due to memory or CPU hardware failure.
· The app software version is lost or does not match the hardware, or the app software version verification has failed.
· The model of the standby MPU is different from that of the original MPU.
· The software versions of the standby MPU and the original MPU are different.
Troubleshooting flow
Figure 2 shows the flowchart for troubleshooting the issue of original MPU startup failure.
Figure 2 Flowchart for troubleshooting the issue of original MPU startup failure
Figure 3 shows the flowchart for troubleshooting the issue of startup failure of the standby MPU newly installed on the device.
Solution
To troubleshoot the issue of original MPU startup failure:
1. Identify whether the running status LED (RUN) on the MPU is on.
This serves as an important indicator of whether the system can boot because the RUN LED will flash fast after the basic section of BootWare starts up.
Table 1 MPU running status LED description
| LED | LED status | Description |
|---|---|---|
| RUN | Off | The card is faulty or is not in position. |
| RUN | Flashing green at 4 Hz | The card is loading or downloading software. |
| RUN | Flashing green at 0.5 Hz | The card is operating correctly. |
¡ Situation 1: The LED flashes fast.
If the LED flashes fast at 4 Hz after you power on the device, the basic section starts up normally. Proceed to step 2.
¡ Situation 2: The LED is off.
If the LED is off, the device cannot be powered on or the basic section of BootWare is damaged.
First, identify whether the device is powered on. Look through the air inlet vents on the front of the MPU to identify whether an internal LED on the MPU is flashing green or steady on. You can also remove the MPU after a period of time and examine the processor's heat sink for warmth. If the device is not powered on, check the power source and power supplies. Hardware faults can also prevent the MPU from being powered on.
If the device is powered on normally, the basic section of BootWare is damaged, and the MPU must be returned to R&D for handling.
NOTE:
· In this situation, the LED has never been on after power-on. This situation does not include the case where the LED flashes for more than 5 seconds and then turns off.
· If the LED is steady on or flashes slowly at 1 Hz, a hardware fault has occurred.
2. Identify whether BootWare runs successfully.
¡ Situation 1: The basic section runs successfully.
Identify whether the following information exists. If yes, the basic section has run successfully. Proceed to step 3.
System is starting...
Booting Normal Extended BootWare
****************************************************************************
* *
* H3C CR19000 Routers BootWare, Version 1.01 *
* *
****************************************************************************
Copyright (c) 2004-2020 New H3C Technologies Co., Ltd.
Compiled Date : Mar 9 2020
CPU Type : XLP316
CPU Clock Speed : 1200MHz
Memory Type : DDR3 SDRAM
Memory Size : 16384MB
Memory Speed : 1333MHz
Flash Size : 8MB
CPLD Version : 1.0
PCB Version : Ver.B
BootWare Validating...
¡ Situation 2: No output.
The memory or processor might be faulty. For an MPU, remove the memory module and identify whether the following information exists after startup:
RAM initialization failed
Fatal error! Please reboot the board.
If not, a failure has occurred before memory initialization due to a processor or welding issue. Please contact Technical Support. If yes, an issue has occurred during memory initialization. Try replacing the memory module.
¡ Situation 3: Memory test failure information is output.
If the device generates information as follows after power-on, the memory module might be faulty. Identify whether the memory module is securely inserted, or replace the memory module. If the issue persists, the hardware circuit of the memory channel might be faulty. Contact Technical Support.
readed value is 75555555 , expected value is 55555555
DRAM test fails at: 5ff80020
Fatal error! Please reboot the board.
NOTE: The above information was generated due to memory self-test failure. In rare cases, the system experiences a warm boot due to anomalies and the memory controller state has not yet been restored, which leads to self-test failure. To resolve such an issue, power off and then power on the device. This situation is different from the situation where self-test failure is caused by memory damage.
3. Identify whether apps can be loaded correctly.
¡ Situation 1: The app files can be loaded and decompressed successfully.
The following information indicates that the app files have been loaded and decompressed successfully. Proceed to step 4.
****************************************************************************
* *
* BootWare, Version 1.01 *
* *
****************************************************************************
Copyright (c) 2004-2020 New H3C Technologies Co., Ltd.
Compiled Date : Mar 9 2020
CPU Type : XLP316
CPU Clock Speed : 1200MHz
Memory Type : DDR3 SDRAM
Memory Size : 16384MB
Memory Speed : 1333MHz
Flash Size : 8MB
CPLD Version : 1.0
PCB Version : Ver.B
BootWare Validating...
Press Ctrl+B to access EXTENDED-BootWare MENU...
Loading the main image files...
Loading file flash:/SYSTEM.bin..................
............................................................................
............................................................................
............................................................................
.................Done.
Loading file flash:/BOOT.bin....................
..Done.
Image file flash:/BOOT.bin is
self-decompressing...................................................Done.
¡ Situation 2: An app does not exist.
The following information indicates that an app file does not exist. The app file must be downloaded again.
****************************************************************************
* *
* BootWare, Version 1.01 *
* *
****************************************************************************
Copyright (c) 2004-2020 New H3C Technologies Co., Ltd.
Compiled Date : Mar 9 2020
CPU Type : XLP316
CPU Clock Speed : 1200MHz
Memory Type : DDR3 SDRAM
Memory Size : 16384MB
Memory Speed : 1333MHz
Flash Size : 8MB
CPLD Version : 1.0
PCB Version : Ver.B
BootWare Validating...
Application program does not exist.
Please input BootWare password:
¡ Situation 3: An app file has a CRC error.
The following information indicates that an obtained app file has a verification error. Please download the file to flash memory again.
****************************************************************************
* *
* BootWare, Version 1.01 *
* *
****************************************************************************
Copyright (c) 2004-2020 New H3C Technologies Co., Ltd.
Compiled Date : Mar 9 2020
CPU Type : XLP316
CPU Clock Speed : 1200MHz
Memory Type : DDR3 SDRAM
Memory Size : 16384MB
Memory Speed : 1333MHz
Flash Size : 8MB
CPLD Version : 1.0
PCB Version : Ver.B
BootWare Validating...
Press Ctrl+B to enter extended boot menu...
Loading file flash:/SYSTEM-.bin..................
............................................................................
............................................................................
............................................................................
Something wrong with the file.
4. Check the app startup process.
¡ Situation 1: Without the system image file, the system starts up and enters the boot interface.
Loading the main image files...
Loading file flash:/BOOT.bin....................
...................................Done.
<boot>
In this case, you must download the software version again.
¡ Situation 2: The System image is starting... message is displayed and the system gets stuck.
¡ Situation 3: The System image is starting... message is displayed, but the system fails to enter the CLI and reboots repeatedly.
¡ Situation 4: The Press ENTER to get started message is displayed, but you cannot access the CLI.
¡ Situation 5: You can access the CLI, but the system automatically reboots after a while.
In these situations, a hardware failure or software version issue might occur. Please contact Technical Support.
To troubleshoot the issue of startup failure of the standby MPU newly installed on the device:
5. Identify whether the model of the newly installed MPU is the same as that of the original MPU.
The two MPUs on the same device must be the same model. If their models are different, install an MPU of the same model as the original one.
6. Collect diagnostics information.
Check the operating status of the active MPU, collect diagnostics information, and contact Technical Support.
7. Contact Technical Support.
If the issue persists, contact Technical Support.
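The console output checks in steps 2 through 4 of the original MPU procedure can be sketched as a simple text classifier. This is an illustration under stated assumptions, not a vendor tool: the marker strings are copied from the sample outputs in this section, while the function name classify_console_output and the verdict wording are hypothetical. Specific failure markers are tested before the generic "BootWare Validating" marker because the failure samples also contain that line.

```python
# Marker strings are taken from the console samples in this section.
# Verdict texts paraphrase the corresponding troubleshooting steps.
MARKERS = [
    ("RAM initialization failed",
     "memory initialization failed: try replacing the memory module"),
    ("DRAM test fails at",
     "memory self-test failed: power cycle, then reseat or replace the memory module"),
    ("Application program does not exist",
     "app file missing: download the app file again"),
    ("Something wrong with the file",
     "app file verification error: download the file to flash memory again"),
    ("<boot>",
     "system image missing: download the software version again"),
    ("BootWare Validating",
     "basic section ran: continue with the app loading checks"),
]

def classify_console_output(text):
    """Map captured console text to the matching situation in this section."""
    if not text.strip():
        return "no output: the memory or processor might be faulty"
    for marker, verdict in MARKERS:
        if marker in text:
            return verdict
    return "unrecognized output: contact Technical Support"

capture = "BootWare Validating...\nApplication program does not exist.\n"
print(classify_console_output(capture))
```

The sketch covers only the situations that print a distinctive message. Hangs and repeated reboots after the System image is starting... message still require manual observation.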
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
An MPU restarts during use and fails to start up
Symptom
An MPU restarts during use and fails to start up.
Common causes
The following are the common causes for this type of issue:
· The startup file is damaged.
· The MPU memory is damaged.
· The card is not fully inserted or is damaged, causing BootWare to run abnormally.
Troubleshooting flow
Figure 4 shows the troubleshooting flowchart.
Solution
1. Identify whether the startup file on the MPU is normal.
Log in to the faulty MPU through the console port and restart the device. If BootWare reports a CRC error or cannot find the startup file, check the file in flash memory. If the file is missing or its size differs from that of the file on the server, reload the startup file and verify that the size of the reloaded file is the same as that on the server. Then, configure the reloaded file as the current startup file. BootWare can also configure this file as the current startup file automatically during the loading process.
2. Examine the MPU memory.
If the loaded file size is correct and the file is correctly set as the current startup file, reboot the card and immediately press CTRL+T to examine the memory. If a memory error is prompted, please replace the card.
3. Identify whether BootWare still prompts an error.
If the memory is normal but BootWare still prompts an error during startup, identify the faulty component based on the prompt. Identify whether the card is securely inserted. Replace the card if it is securely inserted.
4. Contact Technical Support.
If the issue persists, contact Technical Support.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Active/standby MPU switchover failure
Symptom
· When you use the reboot command to reboot the active MPU, the standby MPU is also rebooted.
· Active/standby MPU switchover is abnormal.
Common causes
The following are the common causes for this type of issue:
· If the original standby MPU has not completed startup, it passively becomes the active MPU because of the active MPU reboot.
· The standby MPU does not receive any packets from the active MPU and becomes the active MPU.
· The active MPU reboots due to its own anomalies.
· The versions of the standby MPU and the active MPU are different.
Troubleshooting flow
Figure 5 shows the flowchart for troubleshooting the issue that the standby MPU is also rebooted when you use the reboot command to reboot the active MPU.
Solution
To troubleshoot the issue that the standby MPU is also rebooted when you use the reboot command to reboot the active MPU:
1. After the original active MPU starts up, use the ftp or tftp command to upload the up-to-date log file in the log file directory on the storage media to the file server.
2. Search the log file for the reboot log message, for example, Command is reboot slot 0.
3. Search the log file for the most recent system restart log message, for example, SYSLOG_RESTART: System restarted.
4. Search the log messages between the two log messages for a log message like Batch backup of standby board in slot 1 has finished.
¡ If no log message like the specified message is found, the original standby MPU was starting up when you executed the reboot command. This is normal and requires no action. Next time you want to use the reboot slot command to reboot the active MPU, make sure the standby MPU has completed batch backup (a log message like Batch backup of standby board in slot 1 has finished already exists).
¡ If a log message like the specified message is found, contact Technical Support.
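The log search in steps 2 through 4 can be sketched as follows for a log file that has already been uploaded to the file server. The three message patterns come from the examples in the steps above. The function names are hypothetical, and the sketch assumes one log message per line. It takes the last occurrence of the reboot command message and of the system restart message and scans the lines between them, in whichever order they appear. That ordering choice is an interpretation of "between the two log messages", not a documented rule.

```python
import re

REBOOT_CMD = re.compile(r"Command is reboot slot \d+")
RESTARTED = re.compile(r"SYSLOG_RESTART")
BACKUP_DONE = re.compile(r"Batch backup of standby board in slot \d+ has finished")

def _last_index(lines, pattern):
    """Index of the last line matching pattern, or None if no line matches."""
    for i in range(len(lines) - 1, -1, -1):
        if pattern.search(lines[i]):
            return i
    return None

def batch_backup_finished_between(log_text):
    """True/False if both markers exist; None if either marker is missing."""
    lines = log_text.splitlines()
    reboot_i = _last_index(lines, REBOOT_CMD)
    restart_i = _last_index(lines, RESTARTED)
    if reboot_i is None or restart_i is None:
        return None
    low, high = sorted((reboot_i, restart_i))
    return any(BACKUP_DONE.search(line) for line in lines[low + 1:high])

# Synthetic log lines built from the message examples in the steps above.
log_text = "\n".join([
    "SYSLOG_RESTART: System restarted",
    "Batch backup of standby board in slot 1 has finished.",
    "Command is reboot slot 0",
])
print(batch_backup_finished_between(log_text))
```

A False result corresponds to the case where the standby MPU was still starting up when the reboot command was executed. A True result corresponds to the case that requires contacting Technical Support.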
To troubleshoot an active/standby MPU switchover failure:
5. Use the display system stable state command to collect information about the active and standby MPU status.
<H3C> display system stable state
System state : Stable
Redundancy state : Stable
Slot CPU Role State
0 0 Active Stable
1 0 Standby Stable
Verify the following information:
¡ The roles of the two MPUs are Active and Standby.
¡ Both the active and standby MPUs are in Stable status.
6. Use the display boot-loader command to collect information about the versions of the active and standby MPUs. Identify whether the versions of the active and standby MPUs are the same.
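The two checks on the display system stable state output in step 5 can be sketched as a small parser. This assumes the column layout shown in the sample above (Slot, CPU, Role, State separated by whitespace). The function names are hypothetical, and the layout might differ on other software versions.

```python
def parse_mpu_rows(output):
    """Extract the Slot/CPU/Role/State rows from display system stable state output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have four fields and start with numeric slot and CPU values.
        if len(fields) == 4 and fields[0].isdigit() and fields[1].isdigit():
            rows.append({"slot": int(fields[0]), "cpu": int(fields[1]),
                         "role": fields[2], "state": fields[3]})
    return rows

def mpus_ready_for_switchover(output):
    """Apply the two checks from step 5: roles are Active/Standby, both Stable."""
    rows = parse_mpu_rows(output)
    roles = sorted(row["role"] for row in rows)
    return roles == ["Active", "Standby"] and all(
        row["state"] == "Stable" for row in rows)

sample = """<H3C> display system stable state
System state     : Stable
Redundancy state : Stable
  Slot    CPU    Role       State
  0       0      Active     Stable
  1       0      Standby    Stable
"""
print(mpus_ready_for_switchover(sample))
```

The sketch does not compare software versions. That check still relies on the display boot-loader output described in step 6.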
Fault diagnostics commands
The commands required for fault diagnostics are shown in the following table.
You can execute the following commands to enter probe view:
<Sysname> system
[Sysname] probe
[Sysname-probe]
| Command | View | Description |
|---|---|---|
| display hardware internal mss slot slot-num information | Probe view | Display driver switchover information. |
| set hardware internal mss slot slot-num heart-beat rob { disable \| enable } | Probe view | Enable or disable the standby MPU to become the active MPU forcibly. |
| display kernel exception number slot slot-num | Any view | Display exception information. |
| display system stable state | Any view | Display the current status of the active and standby MPUs. |
| display boot-loader | Any view | Display information about the versions of the active and standby MPUs. |
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Interface module startup failure
Symptom
An interface module cannot start up.
Common causes
The following are the common causes for this type of issue:
· Power supply anomaly.
· The software version does not support the interface module.
· The interface module is not securely installed.
· Interface module hardware failure.
· Chassis slot hardware failure.
Troubleshooting flow
Figure 6 shows the troubleshooting flowchart.
Figure 6 Flowchart for troubleshooting interface module startup failure
Solution
1. Identify whether the interface module is powered on.
Check the RUN LED on the interface module. If the LED is off, the interface module might not be powered on. Perform the following tasks:
a. Examine the power status LEDs to determine whether the power supplies are operating normally. If an LED indicates an error, see the abnormal power supply status troubleshooting procedure for power supply troubleshooting.
b. Calculate the system power consumption. Identify whether the remaining power of power supplies is sufficient. If the remaining power is insufficient, add power supplies.
c. If the interface module is powered on, proceed to step 3.
2. Identify whether the device software version supports the interface module.
In any view, execute the display version command to obtain the device's software version. Then, identify whether the current software version supports the interface module. If not, please upgrade the version to one that supports the interface module. Before version upgrade, make sure the new version is compatible with other cards.
3. Reinstall the interface module.
Remove the interface module, verify the connector, and then reinsert it into the device. Make sure the interface module is installed securely.
4. Install the interface module in another slot to test if it can start up.
¡ If the interface module cannot start up, it might be faulty. Replace it with a new one.
¡ If the interface module can start up, install another interface module that can start up normally in the original faulty slot. If the interface module cannot start up, the chassis slot might be faulty.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
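The remaining power calculation in step 1 is simple arithmetic: total power supply output minus the power drawn by the installed cards. The following sketch illustrates it with hypothetical wattage values. Real figures must come from the hardware documentation for the specific power supplies and cards, and the sketch ignores power supply redundancy and derating.

```python
def remaining_power_w(supply_outputs_w, card_draws_w):
    """Total power supply output minus total card power consumption, in watts."""
    return sum(supply_outputs_w) - sum(card_draws_w)

def can_power_card(supply_outputs_w, card_draws_w, new_card_w):
    """True if the remaining power covers the card that fails to power on."""
    return remaining_power_w(supply_outputs_w, card_draws_w) >= new_card_w

# Hypothetical values for illustration only.
supplies = [1200, 1200]      # two power supplies
installed = [300, 450, 500]  # cards that are already running
print(remaining_power_w(supplies, installed))
print(can_power_card(supplies, installed, 400))
```

If the second result is False, the remaining power is insufficient and more power supplies are needed, as described in step 1.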
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
An interface module restarts during use and fails to start up
Symptom
An interface module restarts during use and fails to start up.
Common causes
The following are the common causes for this type of issue:
· Power supply anomaly.
· The startup file on the MPU is abnormal.
· Interface module hardware failure.
· Chassis slot hardware failure.
Troubleshooting flow
Figure 7 shows the troubleshooting flowchart.
Solution
1. Identify whether the power supplies are operating normally.
Verify that the power status LEDs indicate normal status and the power meets the normal operation requirements of cards. If a power supply malfunctions, see the abnormal power supply status troubleshooting procedure for power supply troubleshooting.
2. Identify whether the startup file on the MPU is normal.
Execute the display boot-loader command in any view to check the next-startup software image used by the card. Execute the dir command in user view to identify whether the startup software image exists. If it does not exist or is damaged, retrieve the startup software image again or set another software image as the next-startup software image.
3. Insert an interface module that can operate correctly into the slot where the interface module cannot start up.
If the startup file loaded by the interface module is normal and conditions permit, insert an interface module that can operate correctly into the slot where the interface module cannot start up.
¡ If the interface module can start up, no anomaly exists on the backplane. Proceed to step 4.
¡ If the interface module cannot start up, the chassis slot has a hardware failure.
4. Identify whether load records exist.
Execute the display logbuffer command in any view to identify whether the log buffer on the device has load records for the card.
<Sysname> display logbuffer
%Jan 12 19:13:49:513 2022 H3C DEV/4/BOARD_LOADING: -MDC=1; Board in slot 4 is loading software images.
%Jan 12 19:14:01:718 2022 H3C DEV/5/LOAD_FINISHED: -MDC=1; Board in slot 4 has finished loading software images.
¡ If the log buffer has load records for the card, move the interface module to another slot and identify whether it can start up normally.
¡ If the log buffer does not have load records for the card, proceed to step 5.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
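The load record check in step 4 can be sketched as a filter over saved display logbuffer output. The regular expression is based on the two sample messages shown in that step (DEV/4/BOARD_LOADING and DEV/5/LOAD_FINISHED). The function name load_records is hypothetical.

```python
import re

# Captures the message mnemonic and the slot number from the sample log format.
LOAD_RECORD = re.compile(
    r"DEV/\d/(BOARD_LOADING|LOAD_FINISHED):.*?Board in slot (\d+)\b")

def load_records(logbuffer_text):
    """Return {slot: [mnemonic, ...]} in the order the records appear."""
    records = {}
    for line in logbuffer_text.splitlines():
        match = LOAD_RECORD.search(line)
        if match:
            records.setdefault(int(match.group(2)), []).append(match.group(1))
    return records

sample = """%Jan 12 19:13:49:513 2022 H3C DEV/4/BOARD_LOADING: -MDC=1; Board in slot 4 is loading software images.
%Jan 12 19:14:01:718 2022 H3C DEV/5/LOAD_FINISHED: -MDC=1; Board in slot 4 has finished loading software images.
"""
print(load_records(sample))
```

A slot that is missing from the result has no load records in the captured text, which corresponds to the second branch of step 4.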
Related alarm and log messages
Alarm messages
N/A
Log messages
· DEV/4/BOARD_LOADING
· DEV/5/LOAD_FINISHED
Hardware forwarding failure (high-end routers)
Symptom
In the live network, when the device is operating correctly, the forwarding paths do not drop packets. If forwarding paths have severe packet loss or cannot forward packets, verify the internal forwarding paths. By default, the router is enabled with data forwarding path failure detection. Connected cards will periodically detect the status of the forwarding paths between them.
· For a CEPC, CMPE-1104, CSPC, MPE-1104, or SPC card, execute the display hardware internal hgmonitor info command to display the internal forwarding path detection records for a chip.
If a forwarding link is abnormal, the command output will include a record in which the Link field displays down, for example:
[Sysname-probe] display hardware internal hgmonitor info 4 0
Link status change notice event:
Unit Port Link Clock Number
0 hg0 up 08:08:03:755732 11/12/2014 1
0 hg0 down 09:22:23:977918 11/12/2014 2
0 hg1 up 08:12:19:398227 11/12/2014 1
0 hg2 up 08:08:05:465720 11/12/2014 1
0 hg3 up 08:12:21:391922 11/12/2014 1
Identify whether the time for the record is the fault occurrence time. If yes, the interconnect link has failed.
· For a CSPEX-1204 card, use the display hardware internal forward fpga counter command to display the forwarding path detection records.
If a forwarding link is abnormal, the HG section displays that one or more HG ports are in DOWN state. For example:
[Sysname-probe] display hardware internal forward fpga counter slot 3
…
5 HG
--------------------------------------------------------------------------------
-------------------------
Value(HEX) Value(DEC) | Address | Description
--------------------------------------------------------------------------------
-------------------------
0x0 0 | 0x005D0003 | SEND: HG_0 (DOWN)
OUT
0x0 0 | 0x00610003 | SEND: HG_1 (UP)
OUT
0x0 0 | 0x00650003 | SEND: HG_2 (DOWN)
OUT
0x0 0 | 0x00690003 | SEND: HG_3 (UP)
OUT
--------------------------------------------------------------------------------
-------------------------
0x0 0 | 0x005D0005 | RECV: HG_0 (DOWN)
IN
0x0 0 | 0x00610005 | RECV: HG_1 (UP)
IN
0x0 0 | 0x00650005 | RECV: HG_2 (DOWN)
IN
0xA27 2599 | 0x00690005 | RECV: HG_3 (UP)
IN
--------------------------------------------------------------------------------
-------------------------
…
· For a CSPEX card (except for the CSPEX-1204), CEPC card, or SPE card, use the display hardware internal np serdes fabric status command to display the forwarding path detection records. If a forwarding link is abnormal, the command output displays that one or more HG ports are in DOWN state. For example:
[Sysname-probe] display hardware internal np serdes fabric status slot 18 chip 0
SERDES STATUS NP_PORT IF_NUM PEER_SLOT IF_TYPE
20 UP 106 10 23 40GE(UP)
21 UP 106 10 23 40GE(UP)
22 UP 106 10 23 40GE(UP)
23 UP 106 10 23 40GE(UP)
8 DOWN 104 8 23 40GE(DOWN)
9 DOWN 104 8 23 40GE(DOWN)
10 DOWN 104 8 23 40GE(DOWN)
11 DOWN 104 8 23 40GE(DOWN)
Hg port tuning Record:
Port Event Clock
10 Tuning_start 09:41:03:039327
10 Tuning_end(S) 09:41:04:118066
10 Switch_Route 09:41:24:705325
8 Tuning_start 09:41:04:118068
8 Tuning_end(S) 09:41:05:195958
8 Switch_Route 09:41:24:705327
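Matching a down record against the fault occurrence time, as described for the hgmonitor output above, can be sketched as follows. The parser assumes the six-column layout of the sample (Unit, Port, Link, time, date, Number). The function name down_link_records is hypothetical, and the sketch does not handle the FPGA counter or SerDes output formats.

```python
def down_link_records(hgmonitor_output):
    """Return the records whose Link column is 'down', with their timestamps."""
    records = []
    for line in hgmonitor_output.splitlines():
        fields = line.split()
        # Data rows: unit, port, link, time, date, number.
        if len(fields) == 6 and fields[0].isdigit() and fields[2].lower() == "down":
            records.append({"unit": int(fields[0]), "port": fields[1],
                            "time": fields[3], "date": fields[4]})
    return records

sample = """Link status change notice event:
Unit Port Link Clock Number
0 hg0 up 08:08:03:755732 11/12/2014 1
0 hg0 down 09:22:23:977918 11/12/2014 2
0 hg1 up 08:12:19:398227 11/12/2014 1
"""
for record in down_link_records(sample):
    print(record["port"], record["date"], record["time"])
```

The reported date and time can then be compared with the time at which packet loss was observed.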
A forwarding link detection failure is reported to the comprehensive diagnostics module. The device generates the following information:
%@169696^Dec 21 16:04:06:987 2017 H3C SWFA/2/SWFA: -Chassis=1-Slot=15; 0x0F1E0000 [3060] :
HG Monitor check fail: (SrcSlot[15] .SrcChip[0] )-> (DstSlot[10] .DstChip[0] ))
The above information indicates that a forwarding link fault might have occurred, and the fault was reported to the comprehensive diagnostics module for analysis.
%@169696^Dec 21 16:04:06:987 2017 H3C SWFA/2/SWFA: -Chassis=1-Slot=15; 0x0F1E0000 [3060] :
HG Monitor check Recover: (SrcSlot[15] .SrcChip[0] )-> (DstSlot[10] .DstChip[0] ))
The above information indicates that a forwarding link fault might have occurred, and the fault was reported to the comprehensive diagnostics module for repair. (Applicable only to CSPEX (except for the CSPEX-1204, CSPEX-1104-E, and CSPEX-1802X), SPE, and CEPC cards.)
%@169696^Dec 21 16:04:06:987 2017 H3C SWFA/2/SWFA: -Chassis=1-Slot=15; 0x0F1E0000 [3060] :
HG Monitor check clear: (SrcSlot[15] .SrcChip[0] )-> (DstSlot[10] .DstChip[0] ))
The above information indicates that the forwarding link fault has been cleared and the information reported to the comprehensive diagnostics module has been cleared.
%@169694^Dec 21 16:04:06:927 2017 H3C SWFA/2/SWFA: -Chassis=1-Slot=15; 0x0F1E0000 [401] :
16:04:06:927390 12/21/2017: unit 0 port 23 is isolated by local.
%@169695^Dec 21 16:04:06:859 2017 H3C SWFA/2/FWD: -Chassis=1-Slot=10; 0x0FD93001 [377] :
16:04:06:859252 12/21/2017: unit 0 port 67 isolated by rpc.
The above information indicates that a forwarding link fault might have occurred, and the fault was reported to the comprehensive diagnostics module for link isolation.
%@169694^Dec 21 16:04:06:927 2017 H3C SWFA/2/SWFA: -Chassis=1-Slot=15; 0x0F1E0000 [401] :
16:04:06:927390 12/21/2017: unit 0 port 23 is fault, not isolated by local.
%@169695^Dec 21 16:04:06:859 2017 H3C SWFA/2/FWD: -Chassis=1-Slot=10; 0x0FD93001 [377] :
16:04:06:859252 12/21/2017: unit 0 port 67 is fault, not isolated by rpc.
The above information indicates that a forwarding link fault might have occurred, no backup link exists, and the fault was reported to the comprehensive diagnostics module for link isolation.
%Aug 13 15:58:18:186 2019 H3C DIAG/4/DIAG_AI: -MDC=1; Board fault: chassis 0 slot 8 or chassis 0 slot 12, please check them
The above information indicates that multiple slots might be faulty.
%Aug 13 15:58:18:186 2019 H3C DIAG/4/DIAG_AI: -MDC=1; Board fault: chassis 0 slot 8, please check it
The above information indicates that a single slot might be faulty.
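The HG Monitor messages above can be correlated per link to see which faults have not yet been cleared. The following sketch assumes the message layout shown in the samples, including the spaces before the dots and closing parentheses. The function names are hypothetical. A link is treated as cleared only when its last event is "clear", because the text above describes "Recover" as a repair report and not as a cleared fault.

```python
import re

HG_EVENT = re.compile(
    r"HG Monitor check (fail|Recover|clear):\s*"
    r"\(SrcSlot\[(\d+)\]\s*\.SrcChip\[(\d+)\]\s*\)->\s*"
    r"\(DstSlot\[(\d+)\]\s*\.DstChip\[(\d+)\]")

def last_event_per_link(log_text):
    """Map (src_slot, src_chip, dst_slot, dst_chip) to the last event seen."""
    last = {}
    for match in HG_EVENT.finditer(log_text):
        link = tuple(int(value) for value in match.groups()[1:])
        last[link] = match.group(1)
    return last

def uncleared_links(log_text):
    """Links whose most recent HG Monitor event is not 'clear'."""
    return sorted(link for link, event in last_event_per_link(log_text).items()
                  if event != "clear")

sample = (
    "HG Monitor check fail: (SrcSlot[15] .SrcChip[0] )-> (DstSlot[10] .DstChip[0] ))\n"
    "HG Monitor check fail: (SrcSlot[3] .SrcChip[1] )-> (DstSlot[10] .DstChip[0] ))\n"
    "HG Monitor check clear: (SrcSlot[15] .SrcChip[0] )-> (DstSlot[10] .DstChip[0] ))\n"
)
print(uncleared_links(sample))
```

In the sample, the second message is a synthetic variation of the first with a different source slot, added only to show two links being tracked separately.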
Common causes
The following are the common causes for this type of issue:
· A switching fabric module is faulty.
· A service module is faulty.
Troubleshooting flow
Figure 8 shows the troubleshooting flowchart:
Figure 8 Flowchart for troubleshooting hardware forwarding failure
Solution
For an SR8800-X router, the MPUs and switching fabric modules are separated. The switching fabric modules perform service traffic forwarding. Traffic is load balanced among the switching fabric modules. MPUs perform control and management and do not participate in service traffic forwarding.
1. The SR8804-X or CR16006-F router supports the CSFC-04-1, CSFC-04-2, CSFC-04-3, and CSFC-04-4 switching fabric modules. For this router, contact Technical Support directly.
2. If the input interface and output interface of traffic reside on the same CSPC or CMPE-1104 card, please directly contact Technical Support.
3. If the input interface and output interface of traffic reside on the same SPEX, CSPEX, or CEPC card, or the interfaces reside on different cards, execute the switch-fabric isolate command in system view to isolate switching fabric modules one by one. (Ensure that the number of switching fabric modules is equal to or greater than 1. If only one switching fabric module exists, make sure it is not the second one.) Identify whether the issue is resolved after you isolate each switching fabric module. The following uses an SR8808-X or CR16010-F router as an example to describe the switching fabric module isolation steps. Slots 10 to 13 are designated for the switching fabric modules.
a. Isolate the switching fabric module in slot 10, wait for about 1 minute, and then identify whether the issue is resolved.
b. Execute the undo switch-fabric isolate command to cancel the isolation of the switching fabric module in slot 10. After the switching fabric module restarts and returns to normal state, wait about 3 minutes, isolate the switching fabric module in slot 11, and then identify whether the issue is resolved.
c. Repeat the steps to isolate the switching fabric modules in all slots sequentially.
4. If the issue is resolved after you isolate a switching fabric module, the switching fabric module is faulty. If the issue persists after you isolate all switching fabric modules, an interface module is faulty. As a best practice, transfer services to other interface modules, and then isolate interface modules or replace the faulty interface module to troubleshoot the issue.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
For an RX8800 router, the MPUs and switching fabric modules are separated. The switching fabric modules perform service traffic forwarding. Traffic is load balanced among the switching fabric modules. MPUs perform control and management and do not participate in service traffic forwarding.
6. If the input interface and output interface of traffic reside on the same SPE card, or the interfaces reside on different cards, execute the switch-fabric isolate command in system view to isolate switching fabric modules one by one. (Ensure that the number of switching fabric modules is equal to or greater than 1. If only one switching fabric module exists, make sure it is not the second one.) Identify whether the issue is resolved after you isolate each switching fabric module. The following uses an RX8800-08 router as an example to describe the switching fabric module isolation steps. Slots 10 to 13 are designated for the switching fabric modules.
a. Isolate the switching fabric module in slot 10, wait for about 1 minute, and then identify whether the issue is resolved.
b. Execute the undo switch-fabric isolate command to cancel the isolation of the switching fabric module in slot 10. After the switching fabric module restarts and returns to normal state, wait about 3 minutes, isolate the switching fabric module in slot 11, and then identify whether the issue is resolved.
c. Repeat the steps to isolate the switching fabric modules in all slots sequentially.
7. If the issue is resolved after you isolate a switching fabric module, the switching fabric module is faulty. If the issue persists after you isolate all switching fabric modules, an interface module is faulty. As a best practice, transfer services to other interface modules, and then isolate interface modules or replace the faulty interface module to troubleshoot the issue.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
The SR8800-X-S router does not have independent switching fabric modules and does not support the switching fabric module isolation command. Please contact Technical Support directly.
To troubleshoot the issue reported to the comprehensive diagnostics module, perform the following tasks:
9. Use the display hardware-failure-detection command to display the hardware failure detection and repair information.
10. Identify whether the interconnect HG ports are in Up status. If HG ports are in Down status, a hardware failure exists.
11. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
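The one-by-one switching fabric module isolation described in the procedures above follows a fixed sequence that can be written out as a checklist before a maintenance window. The sketch below generates that checklist as plain text. It does not emit CLI commands, because the full argument syntax of switch-fabric isolate is not given in this section. The function name isolation_plan and the parameter names are hypothetical. The default waits (about 1 minute and about 3 minutes) come from the steps above.

```python
def isolation_plan(fabric_slots, observe_s=60, settle_s=180):
    """Build an ordered checklist for isolating fabric modules one at a time."""
    steps = []
    for index, slot in enumerate(fabric_slots):
        steps.append(f"isolate the switching fabric module in slot {slot}")
        steps.append(f"wait about {observe_s} s, then check whether the issue is resolved")
        steps.append(f"cancel the isolation of slot {slot} and wait until the module is in normal state")
        # No settle wait is needed after the last module.
        if index < len(fabric_slots) - 1:
            steps.append(f"wait about {settle_s} s before isolating the next module")
    return steps

# Slots 10 to 13 are the fabric slots in the example routers above.
for number, step in enumerate(isolation_plan([10, 11, 12, 13]), start=1):
    print(f"{number}. {step}")
```

If the issue disappears while a given slot is isolated, the procedure identifies that switching fabric module as faulty and the remaining steps can be skipped.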
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A








