H3C S12500 Switch Series (R7128) Troubleshooting Guide
Copyright © 2013 Hangzhou H3C Technologies Co., Ltd. All rights reserved. No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of Hangzhou H3C Technologies Co., Ltd. The information in this document is subject to change without notice.
Contents
General troubleshooting procedures
Problem locations and possible results
Common service recovering and troubleshooting methods
Troubleshooting links and ports
A port frequently goes up and down
Troubleshooting hardware forwarding
Online hardware diagnostic and failure protection
Troubleshooting packet forwarding failure
IRF fabric establishment failure
Troubleshooting system management
General troubleshooting procedures
Obtaining information
H3C recommends that you enable the information center by using the info-center enable command for fast troubleshooting. By default, the information center is enabled.
Obtaining log information
Log information consists of log files, which record operation information, and diag files, which record diagnostic (state) information. The system stores these files on the CF card or in Flash.
You can export the log and diag files through FTP, TFTP, or USB. To identify the files exported from different MPUs, save them in a specific order, for example, in different folders named chassisXslotY.
Table 1 Log information classification
Category | File name | Content |
---|---|---|
log file | logfileX.log | Command executions, traps, and operational logs. |
diag file | XXX.gz | Device state, CPU state, memory state, configuration state, software entries, and hardware entries. |
Restrictions and guidelines
Follow these restrictions and guidelines to obtain log information:
· Record the displayed information during operations for future analysis.
· Understand the impact of each operation and make sure the configuration can be restored upon operation failures.
· Make sure the current configuration is consistent with the saved configuration. Do not save the configuration during IRF split, card faults, and card reboot.
· After you perform an operation, wait for a while before you verify the results.
· Before you replace an MPU with a new MPU, make sure the new MPU has the same software version as the old MPU.
Obtaining log files
Use the logfile save command to save logs from the log buffer to the CF card on one of the following:
· The active and standby MPUs of a standalone device.
· The MPUs of IRF master and subordinate devices.
· The MDCs of a device.
<Sysname>logfile save
The contents in the log file buffer have been saved to the file cfa0:/logfile/logfile9.log.
Display log files on the active MPU.
<Sysname> dir cfa0:/logfile/
Directory of cfa0:/logfile
0 -rw- 5233116 Apr 27 2013 09:20:44 logfile1.log
1 -rw- 5142919 May 03 2013 14:15:42 logfile2.log
2 -rw- 5193287 May 09 2013 12:28:08 logfile3.log
1021808 KB total (259072 KB free)
Display log files on the standby MPU.
<Sysname> dir slot1#cfa0:/logfile/
Directory of slot1#cfa0:/logfile
0 -rw- 5242287 May 13 2013 16:47:46 logfile4.log
1 -rw- 5143837 May 24 2013 22:56:46 logfile5.log
2 -rw- 5149806 Jun 01 2013 13:43:26 logfile6.log
1020068 KB total (643264 KB free)
Display log files on the MPU of an IRF subordinate device. If the subordinate device has two MPUs, execute this command on each MPU.
<Sysname> dir chassis2#slot0#cfa0:/logfile/
Directory of chassis2#slot0#cfa0:/logfile
0 -rw- 5215316 Jun 03 2013 05:49:20 logfile7.log
1 -rw- 5235163 Jun 21 2013 07:31:54 logfile8.log
2 -rw- 3256492 Jun 26 2013 09:01:08 logfile9.log
1021808 KB total (773424 KB free)
Display log files on each MDC. The following shows the log file on MDC 3.
<Sysname>dir cfa0:/mdc/
Directory of cfa0:/mdc
0 drw- - Jul 10 2013 14:56:50 mdc2
1 drw- - Jul 10 2013 16:48:04 mdc3
2 drw- - Jul 10 2013 16:43:20 mdc4
<Sysname>dir cfa0:/mdc/mdc3/logfile/
Directory of cfa0:/mdc/mdc3/logfile
0 -rw- 8417 Jul 10 2013 18:17:46 logfile1.log
1020068 KB total (701636 KB free)
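The chassisXslotY folder naming suggested earlier can be derived from the device paths used in the dir examples above. The following is a minimal Python sketch, not an H3C tool. The function name export_folder and its default_chassis and default_slot parameters are hypothetical. A path without a chassis or slot prefix is assumed to refer to the MPU you are logged in to, so the caller supplies those values.

```python
import re

# Optional "chassisN#" and "slotN#" prefixes in front of a storage path,
# as seen in "slot1#cfa0:/logfile/" and "chassis2#slot0#cfa0:/logfile/".
_LOCATION = re.compile(r"^(?:chassis(\d+)#)?(?:slot(\d+)#)?")


def export_folder(device_path, default_chassis, default_slot):
    """Return a local folder name such as 'chassis2slot0' for a device path.

    default_chassis and default_slot (hypothetical parameters) are used when
    the path carries no prefix, for example on the active MPU.
    """
    match = _LOCATION.match(device_path)
    chassis = match.group(1) or str(default_chassis)
    slot = match.group(2) or str(default_slot)
    return f"chassis{chassis}slot{slot}"
```

For example, export_folder("chassis2#slot0#cfa0:/logfile/", 1, 0) yields chassis2slot0, which keeps files from different MPUs apart after export.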
Obtaining diag files
Execute the display diagnostic-information command, and enter "y" at the prompt to save the diag file to the CF card. If you enter "n", the information is displayed instead, and not all the diagnostic information can be saved to the CF card. The more cards the device has, the longer the saving operation takes. Do not execute any command while the saving operation is in progress.
<Sysname>display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N]:y
Please input the file name(*.gz)[flash:/diag.gz]:cfa0:/diag.gz
Diagnostic information is outputting to cfa0:/diag.gz.
Save successfully.
<Sysname>dir cfa0:/
Directory of cfa0:
……
6 -rw- 898180 Jun 26 2013 09:23:51 diag.gz
1021808 KB total (259072 KB free)
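If you capture the dir output to a text file, a short script can confirm that diag.gz was written and report the remaining free space. The following Python sketch assumes the line layout shown in the sample output above. The function name parse_dir is hypothetical, and directory entries (which show "-" instead of a size) are skipped.

```python
import re

# A file entry: index, attributes, size in bytes, date, time, file name.
_ENTRY = re.compile(
    r"^\s*\d+\s+\S+\s+(\d+)\s+\w{3}\s+\d{1,2}\s+\d{4}\s+\d{2}:\d{2}:\d{2}\s+(\S+)\s*$"
)
# The summary line, for example "1021808 KB total (259072 KB free)".
_TOTAL = re.compile(r"(\d+) KB total \((\d+) KB free\)")


def parse_dir(text):
    """Return ({file name: size in bytes}, free space in KB or None)."""
    files = {}
    free_kb = None
    for line in text.splitlines():
        entry = _ENTRY.match(line)
        if entry:
            files[entry.group(2)] = int(entry.group(1))
            continue
        total = _TOTAL.search(line)
        if total:
            free_kb = int(total.group(2))
    return files, free_kb
```

A missing diag.gz key or a size of 0 in the returned dictionary suggests that the save did not complete.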
You can also view the diagnostic information by executing the following commands, but H3C recommends that you do not use this method. The screen-length disable command is used to avoid interruption of information output.
<Sysname>screen-length disable
% Screen-length configuration is disabled for current user.
<Sysname>display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N]:n
==================================================================
===============display cpu===============
Chassis 2 Slot 0 CPU 0 CPU usage:
4% in last 5 seconds
0% in last 1 minute
0% in last 5 minutes
Chassis 2 Slot 0 CPU 1 CPU usage:
0% in last 5 seconds
0% in last 1 minute
0% in last 5 minutes
……
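When the diagnostic information is captured from the screen, the CPU usage section can be extracted for later comparison. The following Python sketch assumes the "Chassis X Slot Y CPU Z CPU usage:" header format shown above (a standalone device might omit the chassis number, which this sketch does not handle). The function name parse_cpu_usage is hypothetical.

```python
import re

_HEADER = re.compile(r"Chassis (\d+) Slot (\d+) CPU (\d+) CPU usage:")
_SAMPLE = re.compile(r"(\d+)% in last (\d+ \w+)")


def parse_cpu_usage(text):
    """Map (chassis, slot, cpu) to {interval: usage percent}."""
    usage = {}
    current = None
    for line in text.splitlines():
        header = _HEADER.search(line)
        if header:
            current = tuple(int(group) for group in header.groups())
            usage[current] = {}
            continue
        sample = _SAMPLE.search(line)
        if sample and current is not None:
            usage[current][sample.group(2)] = int(sample.group(1))
    return usage
```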
Obtaining other information
You also need to obtain other operational information. The following lists some relevant information:
· Problem symptom, time, topology, configuration information, measures, and results.
· Operation logs, captured packet information, debug information, and information output from the console port during continual MPU and switching fabric card reboots.
· Alarms of cards, power supply, and fans.
Troubleshooting procedure
When the switch has a problem, do the following:
1. Obtain operation information.
2. Use the troubleshooting flowchart provided in "Troubleshooting flowchart" to determine the problem type.
3. Use the solution for the problem type to troubleshoot the switch.
If you cannot determine the problem, contact H3C Support.
Troubleshooting flowchart
Use the troubleshooting flowchart shown in Figure 1 to determine the problem type.
Figure 1 Troubleshooting flowchart
The following are commonly used troubleshooting methods:
· Collecting packet statistics on ports.
· Mirroring packets.
· Capturing packets.
· Configuring QoS policies to collect statistics.
· Enabling debugging functions.
· Replacing the suspect hardware, or installing the suspect hardware in another slot.
For example, if a transceiver might have a problem, do one of the following:
¡ Replace the transceiver with a transceiver that can operate correctly.
¡ Install the transceiver in another slot.
If the card in a slot might have a problem, do one of the following:
¡ Replace the card with a card that can operate correctly.
¡ Install the card into another slot.
Problem types
Card failure
A card failure might result in the following symptoms:
· A card cannot start up.
· A card reboots unexpectedly.
· A card reboots again and again.
· A card is not in the correct state.
To troubleshoot a card failure, see "Card failure."
Power failure
A power failure might result in the following symptoms:
· Power LEDs are not in the correct states.
· Power alarm messages are displayed continuously.
To troubleshoot a power failure, see "Power supply failure."
Fan failure
A fan failure might result in the following symptoms:
· Fans do not operate.
· Fan LEDs are not in the correct states.
· Fan alarm messages are displayed continuously.
To troubleshoot a fan failure, see "Fan failure."
Temperature problem
If temperature alarm messages are displayed, the device might have a temperature problem. To troubleshoot a temperature problem, see "Temperature alarm."
Port failure
A port failure might result in the following symptoms:
· A port cannot come up.
· A port goes down and comes up frequently.
· The counts of packet errors on the port are not zero.
To troubleshoot a port failure, see "Troubleshooting links and ports."
Hardware forwarding failure
If log messages such as "Forwarding fault" or "Board fault: chassis X slot Y, please check it" are displayed, the device might have a hardware forwarding failure.
To troubleshoot a hardware forwarding failure, see "Troubleshooting hardware forwarding."
Packet forwarding failure
A packet forwarding failure might result in the following symptoms:
· Some ping packets are lost, or the ping operation fails.
· Some tracert packets are lost, or the tracert operation fails.
· Layer 2 frames are lost, or the Layer 2 link is down.
· Layer 3 frames are lost, or the Layer 3 connection is down.
· The MPLS service is not running correctly.
To troubleshoot a packet forwarding failure, see "Troubleshooting packet forwarding failure."
IRF failure
An IRF failure might result in the following symptoms:
· The IRF fabric cannot be formed.
· An IRF split occurs.
To troubleshoot an IRF failure, see "Troubleshooting IRF."
Overuse of CPU
If the switch uses too much CPU, see "High CPU usage."
Overuse of memory
If the switch uses too much memory, see "High memory usage."
Insufficient resources
If the "No enough resource" message is displayed, see "Insufficient resources."
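The message-based problem types above can be expressed as a small lookup when you scan exported logs. The following Python sketch covers only the messages quoted in this section. The names MESSAGE_TO_SECTION and section_for are hypothetical, and the matching is a plain substring test.

```python
# Log message fragment -> section of this guide named in the text above.
MESSAGE_TO_SECTION = {
    "Forwarding fault": "Troubleshooting hardware forwarding",
    "Board fault": "Troubleshooting hardware forwarding",
    "No enough resource": "Insufficient resources",
}


def section_for(message):
    """Return the section to consult for a log message, or None if unknown."""
    for fragment, section in MESSAGE_TO_SECTION.items():
        if fragment in message:
            return section
    return None
```

Symptoms that are not reported as a fixed message, such as a port that goes up and down, still require the flowchart.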
Problem locations and possible results
Figure 2 shows a typical network model and the possible problem locations. For higher availability and quick switchover and restoration in response to failures, the network uses two upstream links and two core switches. Table 2 shows the possible symptoms and results of different problem locations.
Figure 2 Typical network model and the possible problem locations
Table 2 Problem locations and possible symptoms and results
Problem location | Possible symptoms | Possible results |
---|---|---|
1 (including transceivers) | A port is down. | A service switchover occurs. |
1 (including transceivers) | Counts of packet errors are increased. | All services on the link are affected. |
2 | A card fails. | A service switchover occurs. |
2 | A chip on a card fails while the card is operating correctly. | Services on the chip are affected. If a switching fabric module failure occurs, the whole device is affected. |
2 | A software error occurs. | The device reboots and a service switchover occurs. If a protocol module has a problem, the service is usually affected. |
3 | Same as problem location 1. | Services on the access switch are affected. The scope of affected services is smaller than a problem at problem location 1. |
4 | The device is down. | Services on the device are affected. |
4 | A chip on a card fails. | Some ports or all services on the device are affected. |
4 | A software error occurs. | The device reboots and all services on the device are affected. If a protocol module has a problem, the service is usually affected. |
5 | Same as problem location 1. | Server services on the link are affected. |
6 | The network is operating correctly but a service is not. | The service on the server is affected. |
Common service recovering and troubleshooting methods
Table 3 Common service recovering and troubleshooting methods
Failure category | Service recovering methods | Troubleshooting methods |
---|---|---|
Hardware | · Isolate the failed card. · Isolate the failed device by adjusting service traffic forwarding paths. For example, adjust the preferences for routes so traffic is switched to other paths. | Complete required tests on the backup hardware, and replace the failed hardware. |
Software | · Reboot the protocols on the failed device. · Isolate the failed device by adjusting service traffic forwarding paths. | · Upgrade the software or install patches. · Adjust the network topology, or modify the configuration to remove the failures. |
Link | Isolate the failed link by adjusting service traffic forwarding paths. | Remove link errors. |
Others | · Correct configuration errors. · Connect the ports of the devices correctly. · Isolate the failed link by adjusting service traffic forwarding paths. | · Correct configuration errors. · Connect the ports of the devices correctly. · Repair the power and air conditioner systems for the devices. |
Troubleshooting hardware
Card failure
Symptom
· A card runs into an abnormal state: Absent, Fault, Off, Offline, or Illegal.
· A card fails to boot, or it reboots unexpectedly or repeatedly.
NOTE: If the switch outputs log messages such as "Forwarding fault" or "Board fault: chassis X slot Y, please check it," see "Troubleshooting hardware forwarding."
How to identify a card state
A card can operate in Normal, Master, Standby, Absent, Fault, Off, Offline, or Illegal state:
· Normal—The card is operating correctly.
· Master—The card is an active MPU.
· Standby—The card is a standby MPU.
· If the card is in Fault, Off, Offline, or Illegal state, or the slot in which the card is installed is in Absent state, the card might be faulty. See "Solution" to rectify the fault.
You can execute the display device command and check the Brd Status field for the card states. The following is a sample command output.
<Sysname> display device
Slot No. Brd Type Brd Status Subslot Num Sft Ver
1/0 LST1MRPNC1 Master 0 S12500-CMW710-R7128
1/1 LST1MRPNC1 Standby 0 S12500-CMW710-R7128
1/2 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/3 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/4 NONE Absent 0 NONE
1/5 NONE Absent 0 NONE
1/6 NONE Absent 0 NONE
1/7 NONE Absent 0 NONE
1/8 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128
1/9 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128
1/10 NONE Absent 0 NONE
1/11 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/12 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/13 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/14 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/15 NONE Absent 0 NONE
1/16 NONE Absent 0 NONE
1/17 NONE Absent 0 NONE
1/18 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128
1/19 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128
1/20 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/21 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/22 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/23 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/24 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/25 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/26 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/27 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/28 LST2SF18C1 Normal 0 S12500-CMW710-R7128
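On a fully populated chassis, scanning the Brd Status column by eye is error-prone. The following Python sketch lists every slot whose state is not Normal, Master, or Standby. It assumes the chassis/slot row layout of the sample output above, and the function name cards_needing_attention is hypothetical. Empty slots also report Absent, so the operator still has to decide which Absent slots should contain a card.

```python
import re

# Row layout assumed from the sample: slot, board type, status, subslots, version.
_ROW = re.compile(r"^\s*(\d+/\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+(\S+)\s*$")
HEALTHY = {"Normal", "Master", "Standby"}


def cards_needing_attention(output):
    """Return (slot, board type, status) for every row not in a healthy state."""
    flagged = []
    for line in output.splitlines():
        row = _ROW.match(line)
        if not row:
            continue  # header line or unrelated output
        slot, brd_type, status = row.group(1), row.group(2), row.group(3)
        if status not in HEALTHY:
            flagged.append((slot, brd_type, status))
    return flagged
```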
How to confirm a card reboot
Use the display version command, or check the card running time in the log files, to confirm whether a card rebooted. If the uptime of a card is shorter than that of the other cards, the card has rebooted. See "Solution" to resolve the problem.
<Sysname>display version
H3C Comware Software, Version 7.1.034, Release 7129
Copyright (c) 2004-2013 Hangzhou H3C Tech. Co., Ltd. All rights reserved
H3C S12518 uptime is 0 weeks, 0 days, 0 hours, 57 minutes
Last reboot reason : User reboot
Boot image: cfa0:/S12500-CMW710-BOOT-R7129.bin
Boot image version: 7.1.034P04, Release 7129
System image: cfa0:/S12500-CMW710-SYSTEM-R7129.bin
System image version: 7.1.034, Release 7129
LST1MRPNC1 2/0: uptime is 0 weeks, 0 days, 0 hours, 57 minutes
Last reboot reason : User reboot
3456 Mbytes SDRAM
1024 Kbytes NVRAM Memory
Type : LST1MRPNC1
BootRom : 2.19
Software : S12500-CMW710-R7129
PCB : Ver.A
Board Cpu:
Number of Cpld: 2
Cpld 0:
SoftWare : 002
Cpld 1:
SoftWare : 002
PowChipA : 001A
CpuCard
Type : LSR1CPA
PCB : Ver.C
Number of Cpld: 1
Cpld 0:
SoftWare : 001
BootRom : 2.12
Mbus card
Type : LSR1MBCB
Software : 115
PCB : Ver.B
……
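The uptime comparison described above can be automated on captured display version output. The following Python sketch converts each per-card uptime line to minutes and reports cards that have been up for less time than the longest-running card. The function names and the tolerance_minutes parameter are hypothetical. The tolerance is an assumption that allows for cards that finish booting at slightly different times.

```python
import re

# Per-card line, for example:
# "LST1MRPNC1 2/0: uptime is 0 weeks, 0 days, 0 hours, 57 minutes"
_UPTIME = re.compile(
    r"^(\S+) (\d+/\d+): uptime is (\d+) weeks?, (\d+) days?, "
    r"(\d+) hours?, (\d+) minutes?"
)


def card_uptimes(output):
    """Map each card slot to its uptime in minutes."""
    uptimes = {}
    for line in output.splitlines():
        match = _UPTIME.match(line.strip())
        if match:
            weeks, days, hours, minutes = (int(g) for g in match.groups()[2:])
            uptimes[match.group(2)] = ((weeks * 7 + days) * 24 + hours) * 60 + minutes
    return uptimes


def recently_rebooted(uptimes, tolerance_minutes=0):
    """Return slots whose uptime is shorter than the longest uptime."""
    if not uptimes:
        return []
    longest = max(uptimes.values())
    return [slot for slot, minutes in uptimes.items()
            if minutes < longest - tolerance_minutes]
```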
Solution
In Absent state
To resolve the problem:
1. Verify that the card is fully seated. You can remove and reinstall the card to make sure the card is installed securely.
2. Do the following:
¡ Install this card into another slot.
¡ Install another card that runs correctly on the chassis into this slot to determine whether the card is faulty.
3. Verify that the LEDs on the card panel and inside the card do not indicate any fault.
4. If the card is an MPU or switching fabric module, connect the card to a terminal through a serial cable to verify that the card boots correctly.
5. If the card is confirmed to be faulty, replace the card and contact H3C Support.
In Off state
Determine whether a user powered off the card by using the power-supply off command.
¡ If they did, power on the card by using the power-supply on command.
¡ If they did not, the power supply of the card is faulty. Replace the card and contact H3C Support.
In Fault state
To resolve the problem:
1. Wait a period of time and determine whether the card remains in Fault state or reboots after becoming Normal. If the card reboots after becoming Normal, contact H3C Support.
2. Verify that the card boots correctly.
¡ For an MPU or switching fabric module, connect the card to a terminal through a serial cable to verify that the card boots correctly. If a DRAM test fails, causing repeated reboots (as shown in the following), verify that the DRAM is installed securely.
readed value is 55555555 , expected value is aaaaaaaa
DRAM test fails at: 080ffff8
DRAM test fails at: 080ffff8
Fatal error! Please reboot the board.
¡ For an LPU, verify that the system working mode supports the card type.
Use the display system-working-mode command to display the system operating mode:
<Sysname> display system-working-mode
The current system working mode is routee.
The next system working mode is routee
If the current system operating mode does not support the card, the switch generates related information as shown in the following example:
%Jun 26 10:13:04:006 2013 H3C SYSM/1/DRV_SYSM_PROMPT: -MDC=1;
This is not hardware fault, please change mode by command 'system-working-mode' in system view.
%Jun 26 10:13:04:006 2013 H3C SYSM/1/DRV_SYSM_PROMPT: -MDC=1;
chassis 2 slot 2 is an EB type board, and it supports Standard working mode only.
%Jun 26 10:13:04:006 2013 H3C SYSM/1/DRV_SYSM_PROMPT: -MDC=1;
ERROR!!! chassis 2 slot 2 doesn't support the current system working mode, board rebooting!
The output shows that the EB card is not supported in Routee mode.
If you determine that the current system operating mode does not support the card, use the system-working-mode command to modify the system operating mode. Then save the configuration. The new operating mode takes effect after the switch reboots.
[Sysname]system-working-mode standard
Do you want to change the system working mode? [Y/N]:y
The system working mode is changed, please save the configuration and reboot the system to make it effective.
[Sysname]save
The current configuration will be written to the device. Are you sure? [Y/N]:y
Please input the file name(*.cfg)[cfa0:/ali0207-V7.cfg]
(To leave the existing filename unchanged, press the enter key):
cfa0:/ali0207-V7.cfg exists, overwrite? [Y/N]:y
Validating file. Please wait...
Saved the current configuration to mainboard device successfully.
3. Install the card into another slot to determine whether the card is faulty.
4. If the card is confirmed to be faulty, replace the card and contact H3C Support.
In Offline state
To resolve the problem:
1. Determine whether a user isolated the card from the system by using the board-offline command. If the card is isolated due to this operation, use the undo board-offline command to remove the configuration. A card is also isolated from the system when POST is performed.
2. If an LPU is isolated from the system, a fault might be detected on the LPU by the online diagnostic module. You can execute the display hardware-failure-detection command, and check for the records at the time when the card was isolated. If the LPU is faulty, replace the LPU and contact H3C Support.
<Sysname>display hardware-failure-detection
Current level:
chip : isolate
board : isolate
forwarding : isolate
---------------------Chassis 2, Slot 0 executed records:-------------------
Chassis 2, Slot 6:
1. 2013-06-26, 09:49:15 some auto-down ports on this slot are down by forwarding detection.
---------------------Chassis 2, Slot 0 trapped records:--------------------
Chassis 1, Slot 3:
1. 2013-06-20, 15:17:44 warned by forwarding detection.
Chassis 2, Slot 6:
1. 2013-06-26, 09:52:22 warned by forwarding detection.
3. If switching fabric modules are isolated from the system, forwarding-plane failures might have been detected, and the system generates log messages such as "Forwarding fault" or "Board fault: chassis X slot Y, please check it." Verify that the failure is removed after the switching fabric modules are isolated from the system. You can execute the display hardware-failure-detection command to display hardware failure detection and fix information.
¡ If one switching fabric module is isolated from the system and the forwarding-plane failure is removed after the isolation, the switching fabric module is faulty. Replace the switching fabric module and contact H3C Support. If the forwarding-plane failure persists after the isolation, the switching fabric module is not faulty, because an isolated switching fabric module does not participate in traffic forwarding. (The online diagnostic module might misjudge the failed component when multiple points of failure exist.) Use the undo board-offline command to bring the switching fabric module back online. Then see "Troubleshooting hardware forwarding" to resolve the problem, and contact H3C Support.
¡ If multiple switching fabric modules are isolated, the LPUs might be faulty. See "Troubleshooting hardware forwarding" to resolve the problem, and contact H3C Support.
In Illegal state
To resolve the problem:
1. Verify that the switch supports the card.
2. Verify that the switch software version supports the card. New cards cannot boot on an earlier software version. Upgrade the software version to support the new cards.
3. Insert the card into another slot to determine whether the card is faulty.
4. If the problem persists, replace the card and contact H3C Support.
Unexpected reboot
Unexpected reboot means that a card has rebooted unexpectedly while its current state is Normal.
1. View the log messages, or execute the display version command to determine the period during which the card rebooted. Then determine whether a user rebooted the card by using the reboot command or by powering off and then powering on the card during the period.
2. On a switch running 18XX or a later version, the reason for the last reboot is displayed in the display version command output. Check the Last reboot reason field for the event that caused the last reboot. In the following example, User reboot indicates that a user rebooted the card.
<Sysname>display version
H3C Comware Software, Version 7.1.034, Release 7129
Copyright (c) 2004-2013 Hangzhou H3C Tech. Co., Ltd. All rights reserved
H3C S12518 uptime is 0 weeks, 0 days, 0 hours, 5 minutes
Last reboot reason : User reboot
Boot image: cfa0:/S12500-CMW710-BOOT-R7129.bin
Boot image version: 7.1.034P04, Release 7129
System image: cfa0:/S12500-CMW710-SYSTEM-R7129.bin
System image version: 7.1.034, Release 7129
LST1MRPNC1 2/0: uptime is 0 weeks, 0 days, 0 hours, 5 minutes
Last reboot reason : User reboot
3456 Mbytes SDRAM
1024 Kbytes NVRAM Memory
Type : LST1MRPNC1
BootRom : 2.19
Software : S12500-CMW710-R7129
PCB : Ver.A
Board Cpu:
Number of Cpld: 2
Cpld 0:
SoftWare : 002
Cpld 1:
SoftWare : 002
PowChipA : 001A
……
3. If all cards rebooted simultaneously, verify the following:
¡ The power supplies operate correctly.
¡ The power source is not powered off.
¡ The power cables are connected securely.
4. Verify that log messages such as "Slot X need to be rebooted automatically!" are not generated during the card reboot. If a message like that is displayed, replace the card and contact H3C Support.
5. Verify that the message "Hardware error" is not displayed. If the message is displayed, view the error code:
¡ If the error code is in the range of 0 to 31, or is 100 or greater, the power supply of the card is faulty. Replace the card and contact H3C Support.
¡ For other error codes, contact H3C Support.
%Jul 7 18:10:50:890 2012 H3C DIAG/1/ALERT: -MDC=1; Hardware error! slot=6, code=0
%Jul 7 18:10:50:890 2012 H3C DIAG/1/ALERT: -MDC=1; Hardware error! slot=6, code=1
%Jul 7 18:10:50:890 2012 H3C DIAG/1/ALERT: -MDC=1; Hardware error! slot=6, code=2
6. Execute the display hardware-failure-detection command. Verify that there is no card reboot record in the determined reboot period in the command output. If there is a card reboot record in the determined period, contact H3C Support.
7. If the problem persists, contact H3C Support.
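The error code rule in step 5 can be applied to saved log files with a few lines of Python. The following sketch assumes the "Hardware error! slot=N, code=N" message format shown in the sample log lines. The function name classify_hardware_error is hypothetical.

```python
import re

_HW_ERROR = re.compile(r"Hardware error! slot=(\d+), code=(\d+)")


def classify_hardware_error(log_line):
    """Return (slot, code, action) for a Hardware error message, else None."""
    match = _HW_ERROR.search(log_line)
    if match is None:
        return None
    slot, code = int(match.group(1)), int(match.group(2))
    # Codes 0 through 31, or 100 and greater, point to the card power supply.
    if 0 <= code <= 31 or code >= 100:
        action = "replace the card and contact H3C Support"
    else:
        action = "contact H3C Support"
    return slot, code, action
```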
Power supply failure
Symptom
The power LED on the switch indicates a failure. An alarm is generated, indicating that a power supply or power monitoring unit (PMU) is faulty, as shown in the following example:
%Jun 26 10:13:46:233 2013 H3C DEV/2/POWER_MONITOR_FAILED: -MDC=1; Power monitor unit 1 failed.
%Jun 27 18:10:50:890 2013 H3C DEVD/4/DRV_DEV_PSU_CHANGED: -MDC=1; Chassis 1: PSU ID may be changed, please check it!
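Alarm lines like the ones above share a common layout, so they can be split into fields when you filter exported log files. The following Python sketch infers that layout from the samples in this guide (timestamp, host name, module/severity/mnemonic, optional MDC tag, message text). It is not based on a formal log format specification, and the function name parse_log_line is hypothetical.

```python
import re

# Layout inferred from the samples:
# %Mon D hh:mm:ss:ms YYYY host MODULE/severity/MNEMONIC: -MDC=n; text
_LOG = re.compile(
    r"^%(\w{3})\s+(\d{1,2})\s+(\d{2}:\d{2}:\d{2}:\d{3})\s+(\d{4})\s+(\S+)\s+"
    r"(\w+)/(\d+)/(\w+):\s*(?:-MDC=(\d+);)?\s*(.*)$"
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = _LOG.match(line.strip())
    if match is None:
        return None
    month, day, time, year, host, module, severity, mnemonic, mdc, text = match.groups()
    return {
        "timestamp": f"{month} {day} {year} {time}",
        "host": host,
        "module": module,
        "severity": int(severity),
        "mnemonic": mnemonic,
        "mdc": int(mdc) if mdc is not None else None,
        "text": text,
    }
```

Filtering on the module and mnemonic fields, for example DEV and POWER_MONITOR_FAILED, is usually more reliable than searching the free-form message text.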
Solution
To resolve the problem:
1. Verify that the power supply or PMU is securely installed and that the power supply or PMU LEDs do not indicate any failure. If LEDs of the power supply or PMU indicate any failure, remove and reinstall the power supply or PMU to make sure the module is installed securely. You can also determine whether the power supply or PMU is faulty by exchanging it with another one that runs correctly.
2. Execute the display power-supply verbose command to display the power supply information.
¡ If the power supply and PMU are installed securely but the power supply status field is empty or Absent, a failure has occurred. The fault cause is displayed following the status field:
- If the cause is Under-vol, the power supply might not connect to the power cord, or the external power supply might have a bad contact.
- For other causes, remove and reinstall the power supply to make sure the power supply is installed securely. You can also determine whether the power supply is faulty by exchanging it with another one that runs correctly.
¡ Verify that the PMU information (System power monitoring unit in the command output) is displayed correctly. If the PMU information fails to be displayed, remove and reinstall the PMU, and determine whether the PMU is faulty by exchanging it with another one that runs correctly.
3. Verify that the card power states are On. For a card that is installed securely in a slot, do one of the following, depending on the state of the card:
¡ In Absent state—See "In Absent state" to remove the failure.
¡ In Wait state—The system power is insufficient, and the card is waiting to be powered on. Verify that the power source and the power supplies run correctly.
¡ In Off state—The card was powered off because of a user operation, over-temperature protection, or a power supply failure, and it does not power on automatically. See "In Off state" to resolve the problem.
4. If a power supply or PMU is faulty, replace the module. If the problem persists, contact H3C Support.
The following is a sample output of the display power-supply command:
<Sysname>display power-supply
Power info on chassis 0:
PSU 1/1 state: Normal
PSU 1/2 state: Normal
PSU 1/3 state: Normal
PSU 1/4 state: Normal
PSU 1/5 state: Normal
PSU 1/6 state: Normal
PSU 2/1 state: Normal
PSU 2/2 state: Normal
PSU 2/3 state: Normal
PSU 2/4 state: Normal
PSU 2/5 state: Normal
PSU 2/6 state: Normal
<Sysname>display power-supply verbose
Power info on chassis 0:
System power-supply policy: enable
System power-module redundant(configured): 1
System power usable: 22000 Watts
System power redundant(actual): 2000 Watts
System power allocated: 7350 Watts
System power available: 14650 Watts
SYSTEM POWER USED(CURRENT): 4959.21 Watts
System power monitoring unit 1:
Software version: 107
System power monitoring unit 2:
Software version: 107
Type In/Out Rated-Vol(V) Existing Usable Redundant(actual)
---------- ------ ------------ -------- ------ -----------------
PSE9000-A AC/DC 220(default) 12 11 1
DC output voltage information:
Tray Value(V) Upper-Threshold(V) Lower-Threshold(V) Status
---- -------- ------------------ ------------------ -------
1 50.08 51.00 49.00 Normal
2 50.10 51.00 49.00 Normal
DC output current information:
Total current(A): 99.00
Branch Value(A)
------ --------
1/1 9.20
1/2 8.00
1/3 8.40
1/4 7.40
1/5 9.00
1/6 7.60
2/1 7.60
2/2 9.00
2/3 7.60
2/4 7.60
2/5 9.00
2/6 8.60
PSU Status:
ID Status Input-Err Output-Err High-Temperature Fan-Err Closed Current-Limit
--- ------- ----------- ---------- ---------------- ------- ------ -------------
1/1 Normal
1/2 Normal
1/3 Normal
1/4 Normal
1/5 Normal
1/6 Normal
2/1 Normal
2/2 Normal
2/3 Normal
2/4 Normal
2/5 Normal
2/6 Normal
Line-card power status:
Slot Board-Type Watts Status
---- --------------- ----- ------
2 LST1XP8LEB1 280 On
3 LST1XP8LEB1 280 On
4 LST1XP8LEB1 280 On
5 LST1XP8LEB1 280 On
6 LST1XP8LEB1 280 On
7 LST1XP8LEB1 280 On
8 LST1XP8LEB1 280 On
9 LST1XP8LEB1 280 On
10 LST1XP8LEB1 280 On
11 LST1XP8LEB1 280 On
12 LST1XP8LEB1 240 On
13 LST1XP8LEB1 280 On
14 LST1XP8LEB1 240 On
15 LST1XP8LEB1 240 On
16 LST1XP8LEB1 280 On
17 LST1XP8LEB1 280 On
18 LST1XP8LEB1 280 On
19 LST1XP8LEB1 280 On
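In the sample output above, the available power equals the usable power minus the allocated power (22000 - 7350 = 14650 Watts). The following Python sketch extracts those three values and lists any card whose power status is not On. It assumes the row layout of the sample, and it assumes that cards in Wait or Off state appear in the same table with the same columns. The function name power_summary is hypothetical.

```python
import re

_POWER = re.compile(r"System power (usable|allocated|available): (\d+) Watts")
# Line-card row: slot, board type, watts, power status.
_CARD = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+(On|Off|Wait|Absent)\s*$")


def power_summary(output):
    """Return ({usable, allocated, available}, [(slot, board type, status), ...])."""
    budget = {}
    not_on = []
    for line in output.splitlines():
        power = _POWER.search(line)
        if power:
            budget[power.group(1)] = int(power.group(2))
            continue
        card = _CARD.match(line)
        if card and card.group(4) != "On":
            not_on.append((int(card.group(1)), card.group(2), card.group(4)))
    return budget, not_on
```

A card in Wait state together with a small available value points to insufficient system power, as described in step 3.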
Fan failure
Symptom
The fan tray LEDs indicate a failure. A fan error message is displayed on the switch, as shown in the following example:
%Jun 26 10:12:24:805 2013 H3C DEV/3/FAN_ABSENT: -MDC=1; Chassis 2 Fan 2 is absent.
%Jun 26 10:12:32:805 2013 H3C DEVD/2/DRV_DEV_FAN_CHANGE: -MDC=1; Chassis 2: Fan communication state changed: Fan 1 changed to fault.
%Jun 26 10:12:42:405 2013 H3C DEV/2/FAN_FAILED: -MDC=1; Chassis 2 Fan 1 failed.
Solution
To resolve the problem:
1. Put your hand at the air outlet to verify that there is air being exhausted from the air outlet. If no air is being exhausted from the outlet, the fans are faulty.
2. Verify that the airflow is not blocked at the air inlet and outlet.
3. Verify that the fan tray is securely installed. You can remove and reinstall the fan tray to make sure that the fan tray is securely installed.
4. Verify that the status of each fan is normal and that the speed difference between the fans does not exceed 50%. Execute the display fan verbose command to display detailed information about the fans. If there is an abnormality, verify that the fan tray is not faulty by exchanging it with another one that runs correctly.
5. If the problem persists, replace the fan tray. If there is no new fan tray, power off the switch to avoid damage caused by high temperatures. The switch can be used temporarily if there are cooling measures to maintain the switch operating temperature below 50°C (122°F).
<Sysname>display fan verbose
Fan-tray verbose state on chassis 0:
Fan-tray 1:
Software version: 108
Hardware version: Ver.A
CPLD version: 002
Fan number: 12
Temperature: 27 ℃
High temperature alarm threshold: 60 ℃
Low speed alarm threshold: 1450 rpm
Fan Status Speed(rpm)
--- ---------- ----------
1 normal 3780
2 normal 3780
3 normal 3720
4 normal 3840
5 normal 3900
6 normal 3660
7 normal 3780
8 normal 3840
9 normal 3660
10 normal 2940
11 normal 2940
12 normal 2880
Fan-tray 2:
Software version: 108
Hardware version: Ver.A
CPLD version: 002
Fan number: 12
Temperature: 21 ℃
High temperature alarm threshold: 60 ℃
Low speed alarm threshold: 1450 rpm
Fan Status Speed(rpm)
--- ---------- ----------
1 normal 3720
2 normal 3720
3 normal 3780
4 normal 3660
5 normal 3660
6 normal 3720
7 normal 3660
8 normal 3660
9 normal 3660
10 normal 2820
11 normal 2820
12 normal 2760
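The check in step 4 can be run against captured display fan verbose output. The following Python sketch flags any fan whose status is not normal, and any fan tray in which the fan speeds differ by more than 50%. The guide does not define how the speed difference is calculated, so the sketch assumes (fastest - slowest) / fastest within one fan tray. The function name check_fans and the max_spread parameter are hypothetical.

```python
import re

_TRAY = re.compile(r"^\s*Fan-tray (\d+):")
_FAN = re.compile(r"^\s*(\d+)\s+(\w+)\s+(\d+)\s*$")


def check_fans(output, max_spread=0.5):
    """Return a list of problem descriptions; an empty list means no finding."""
    trays = {}
    current = None
    for line in output.splitlines():
        tray = _TRAY.match(line)
        if tray:
            current = int(tray.group(1))
            trays[current] = []
            continue
        fan = _FAN.match(line)
        if fan and current is not None:
            trays[current].append((int(fan.group(1)), fan.group(2), int(fan.group(3))))
    problems = []
    for tray_id, fans in trays.items():
        for number, status, _ in fans:
            if status != "normal":
                problems.append(f"tray {tray_id} fan {number}: status {status}")
        speeds = [rpm for _, _, rpm in fans]
        # Assumed definition of the speed difference: relative to the fastest fan.
        if speeds and max(speeds) > 0:
            if (max(speeds) - min(speeds)) / max(speeds) > max_spread:
                problems.append(f"tray {tray_id}: speed spread exceeds {max_spread:.0%}")
    return problems
```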
Temperature alarm
Symptom
A temperature over-low or over-high alarm is generated on the switch, as shown in the following example:
%Jun 26 10:13:46:233 2013 H3C DEV/4/TEMPERATURE_WARNING: -MDC=1; Temperature is greater than warning upper limit on Chassis 1 slot 2 sensor inflow 1.
Solution
To resolve the problem:
1. Verify that the ambient temperature is in the compliant range. If the temperature is too high, find the cause. Possible causes include poor ventilation in the equipment room and faulty air conditioning.
2. Verify that the current temperature of the switch does not exceed the upper and lower warning and alarm thresholds. The card might be damaged when operating continuously at a high temperature. You can feel the card by hand, or execute the display environment command to display temperature information.
¡ If the temperature is too high, see "Fan failure" to determine whether fan failure causes the problem.
¡ If the Temperature field displays error or an abnormal value, the switch might have failed to access the card temperature sensor through the I2C bus. The switch accesses the transceiver modules through the same I2C bus, so check whether the transceiver module information is displayed correctly. If the switch can access the transceiver modules, use the temperature-limit command to reconfigure the temperature thresholds. Then use the display environment command to verify that the setting takes effect.
[Sysname]temperature-limit chassis 2 slot 0 hotspot 1 -20 85 90
<Sysname>display environment
System temperature information (degree centigrade):
-------------------------------------------------------------------------------
Slot Sensor Temperature LowerLimit WarningLimit AlarmLimit ShutdownLimit
2/0 inflow 1 35 -25 70 85 N/A
2/0 outflow 1 40 -20 80 85 N/A
2/0 hotspot 1 43 -20 85 90 N/A
2/2 inflow 1 39 -20 70 85 N/A
2/2 outflow 1 40 -10 80 90 N/A
2/2 hotspot 1 41 -10 80 90 N/A
2/3 inflow 1 41 -20 70 85 N/A
2/3 outflow 1 57 15 80 85 N/A
2/3 hotspot 1 41 -20 75 80 N/A
2/3 hotspot 2 50 0 75 80 N/A
2/4 inflow 1 43 -20 70 85 N/A
2/4 outflow 1 60 15 80 85 N/A
2/4 hotspot 1 43 -20 75 80 N/A
2/4 hotspot 2 54 0 75 80 N/A
3. If the problem persists, contact H3C Support.
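The threshold comparison in step 2 can be sketched in Python as follows. This is an illustrative helper, not an H3C tool. It assumes that the display environment output has been saved as text and that each row has the eight columns shown in the sample output. The function name classify_sensors and the state labels are hypothetical.

```python
import re

# Columns: slot, sensor name, sensor index, temperature,
# lower limit, warning limit, alarm limit, shutdown limit.
ROW = re.compile(
    r"^\s*(\S+)\s+(\w+)\s+(\d+)\s+(\S+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(\S+)\s*$"
)

def classify_sensors(text):
    """Return (slot, sensor, state) for every sensor row in the output."""
    results = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, separator, or unrelated line
        slot, sensor = m.group(1), f"{m.group(2)} {m.group(3)}"
        lower, warning, alarm = (int(m.group(i)) for i in (5, 6, 7))
        try:
            temp = int(m.group(4))
        except ValueError:
            # For example, the field displays "error" instead of a number.
            results.append((slot, sensor, "unreadable"))
            continue
        if temp < lower:
            state = "below lower limit"
        elif temp > alarm:
            state = "alarm"
        elif temp > warning:
            state = "warning"
        else:
            state = "ok"
        results.append((slot, sensor, state))
    return results
```

Sensors reported as unreadable correspond to the I2C access case described above. Sensors in the warning or alarm state point to the fan and ambient temperature checks.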
Related commands
This section lists the commands that you might use for troubleshooting hardware.
Command |
Description |
display device |
Displays device information, including the card states. |
display environment |
Displays the temperature statistics of the device, including the current temperature and temperature thresholds. |
display fan |
Displays the operating states of fans. |
display hardware-failure-detection |
Displays hardware failure detection and rectification information, including the rectification actions for each failure and historic information about the last ten fault rectifications on each card. |
display power-supply |
Displays power supply information: · Enabled/disabled status of the power supply management function. · Power supply type, rated input voltage, and rated output power. · Number of redundant power supplies and the available, redundant, used, and remaining power of each power supply. · Status of the installed power supplies. · Power supply status of the LPUs. |
display system-working-mode |
Displays the current system operating mode. |
display version |
Displays system version information, card running time, and cause of the last reboot. |
save |
Saves the running configuration to a specific configuration file. |
system-working-mode |
Sets the system operating mode to modify the hardware resources allocation. The command takes effect after the configuration is saved and the device reboots. |
temperature-limit |
Sets the temperature alarm thresholds for the device. |
Troubleshooting links and ports
This section provides troubleshooting information for common problems with links and ports.
Error packets on a port
Symptom
Use the display interface command to display the traffic statistics about incoming packets and outgoing packets of a port. The error packet count is not 0.
<Sysname> display interface GigabitEthernet1/8/0/1
GigabitEthernet1/8/0/1 current state: UP
Line protocol current state: UP
IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: b8af-67bc-24fa
Description: GigabitEthernet1/8/0/1 Interface
Loopback is not set
Media type is twisted pair, Port hardware type is 1000_BASE_T
1000Mbps-speed mode, full-duplex mode
Link speed type is autonegotiation, link duplex type is autonegotiation
Flow-control is not enabled
The Maximum Frame Length is 9216
Allow jumbo frame to pass
Broadcast MAX-ratio: 100%
Multicast MAX-ratio: 100%
Unicast MAX-ratio: 100%
PVID: 999
Mdi type: automdix
Port link-type: access
Tagged Vlan: none
UnTagged Vlan: 999
Port priority: 2
Last clearing of counters: Never
Peak value of input: 70 bytes/sec, at 2013-03-19 13:04:15
Peak value of output: 210 bytes/sec, at 2013-03-19 13:04:15
Last 300 seconds input: 0 packets/sec 70 bytes/sec 0%
Last 300 seconds output: 0 packets/sec 210 bytes/sec 0%
Input (total): 693897 packets, 72834962 bytes
22196 unicasts, 584504 broadcasts, 87197 multicasts, - pauses
Input (normal): 693897 packets, 72834962 bytes
22196 unicasts, 584504 broadcasts, 87197 multicasts, 152536 pauses
Input: 0 input errors, 0 runts, 0 giants, 0 throttles
0 CRC, 0 frame, 0 overruns, - aborts
- ignored, - parity errors
Output (total): 7515164 packets, 14001669469 bytes
20811 unicasts, 6228300 broadcasts, 1266053 multicasts, - pauses
Output (normal): 7515164 packets, 14001669469 bytes
20811 unicasts, 6228300 broadcasts, 1266053 multicasts, 0 pauses
Output: 0 output errors, - underruns, - buffer failures
0 aborts, 0 deferred, 0 collisions, 0 late collisions
- lost carrier, - no carrier
Table 4 Error packet fields for incoming packets
Field |
Description |
input errors |
Number of incoming error packets. |
Runts |
Number of incoming frames shorter than 64 bytes, in correct format, and containing valid CRCs. |
Giants |
Number of incoming frames larger than the maximum frame length configured on the interface. |
CRC |
Number of incoming frames that contained CRC errors. |
frame |
Number of incoming frames that contained CRC errors and a non-integer number of bytes. |
Table 5 Error packet fields for outgoing packets
Field |
Description |
output errors |
Number of outgoing error packets. |
aborts |
Number of packets that failed to be transmitted. |
deferred |
Number of frames that the interface failed to transmit when the delay exceeded two times the maximum packet transmission time because the medium was busy. |
collisions |
Number of frames that the interface stopped transmitting because Ethernet collisions were detected during transmission. |
late collisions |
Number of frames that the interface deferred to transmit after transmitting their first 512 bits because of detected collisions. |
Solution
The number of incoming error packets of the CRC, frame, and throttle types keeps increasing on a port
To resolve the problem:
1. Use a tester to test the link, and verify that the link quality or fiber signal attenuation of the link is normal. If a link failure exists, replace the network cable or fiber.
Poor link quality or severe fiber signal attenuation causes packet transmission errors.
2. Verify that the transceiver module is operating correctly if a transceiver module is used.
For more information, see "Transceiver module failures."
3. Use the network cable or fiber and transceiver module of the port to connect to another port that is operating correctly.
¡ If error packets do not appear on the new port but appear again after the network cable or fiber and the transceiver module are reconnected to the original port, the original port has failed. Use another port that is operating correctly, and contact H3C Support.
¡ If error packets still appear on the new port, the peer device and intermediate transmission links might fail. Examine the peer device and intermediate transmission links.
4. Verify that the peer device and intermediate devices are operating correctly.
5. If the problem persists, contact H3C Support.
The number of incoming error packets of the overrun type keeps increasing on a port
The number of overrun packets keeps increasing on a port because the input rate exceeds the processing capability of the port, which causes congestion.
To resolve the problem:
1. Execute the display interface command multiple times when both of the following are true:
¡ Only one port cannot correctly send and receive packets, or the device attached to only one port cannot transmit traffic.
¡ The other ports on the same interface card are operating correctly.
2. Perform one of the following tasks, depending on the error packet count trend:
¡ If the number of input errors increases, but the number of overruns does not increase, examine the fiber, transceiver module, and the peer device.
¡ If the number of input errors increases and the increment is the same as the increment of overruns, the interface card might be internally congested or blocked. To resolve the problem, contact H3C Support.
3. If the problem persists, contact H3C Support.
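The comparison in step 2 can be sketched in Python as follows. This is an illustrative helper, not an H3C tool. It assumes that two captures of the display interface output have been saved as text. The function names error_counters and overrun_trend and the returned strings are hypothetical.

```python
import re

def error_counters(text):
    """Extract (input errors, overruns) from one display interface capture."""
    errors = re.search(r"(\d+) input errors", text)
    overruns = re.search(r"(\d+) overruns", text)
    if not errors or not overruns:
        raise ValueError("input error or overrun counter not found")
    return int(errors.group(1)), int(overruns.group(1))

def overrun_trend(first, second):
    """Compare two (input errors, overruns) samples taken some time apart."""
    d_err = second[0] - first[0]
    d_over = second[1] - first[1]
    if d_err <= 0:
        return "input errors are not increasing"
    if d_over == 0:
        return "examine the fiber, transceiver module, and peer device"
    if d_over == d_err:
        return "interface card might be internally congested or blocked"
    return "mixed causes, compare more samples"

capture = "Input: 0 input errors, 0 runts, 0 giants, 0 throttles\n0 CRC, 0 frame, 0 overruns, - aborts"
print(error_counters(capture))  # (0, 0)
```

Take the two captures far enough apart for the counters to change, and clear no counters between them.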
The number of incoming error packets of the jumbo type keeps increasing on a port
To resolve the problem:
1. Verify that the jumbo frame configurations are the same on both ends, including:
¡ Whether jumbo frame support is enabled.
¡ The default maximum jumbo frame size allowed.
¡ The configured maximum jumbo frame size allowed.
2. If the problem persists, contact H3C Support.
The number of outgoing error packets keeps increasing on a port
To resolve the problem:
1. Examine the duplex mode of the port. Configure the port to operate in full duplex mode if the port is operating in half duplex mode.
2. If the problem persists, contact H3C Support.
A port fails to go up
Symptom
A port fails to go up.
Solution
To resolve the problem:
1. Verify that the network cable or fiber link between ports is correct.
2. Verify that the Rx end and the Tx end are correctly connected.
3. Verify that the intermediate transmission link is correct by performing one of the following tasks:
¡ Replace the network cable or fiber between ports.
¡ Connect other ports that are operating correctly by using the network cable or fiber.
4. Verify that the configurations of the local port and the peer port are correct, including whether the port is shut down, and its speed, duplex mode, negotiation mode, and MDI.
[Sysname]display current-configuration interface ten-gigabitethernet 1/6/0/1
#
interface Ten-GigabitEthernet1/6/0/1
port link-mode bridge
port link-type trunk
port trunk permit vlan 1 3102
port link-aggregation group 1
#
return
Table 6 Support for duplex modes
Duplex mode | 10 Gbps | 1000 Mbps | 100 Mbps | 10 Mbps |
---|---|---|---|---|
Full | Supported | Supported | Supported | Supported |
Half | Not supported | Not supported | Not supported | Not supported |
5. If the port has a transceiver module installed, verify that the transceiver modules at both ends of the link are consistent in the rate, wavelength, and single-mode or multi-mode status.
[Sysname]display transceiver interface ten-gigabitethernet 2/9/0/1
Ten-GigabitEthernet2/9/0/1 transceiver information:
Transceiver Type : 10G_BASE_LR_XFP
Connector Type : LC
Wavelength(nm) : 1310
Transfer Distance(km) : 10(SMF)
Digital Diagnostic Monitoring : YES
Vendor Name : H3C
6. Replace the transceiver module with a transceiver module that is operating correctly, and determine whether the transceiver modules fail.
For more information, see "Transceiver module failures."
7. If the transceiver module fails, replace the transceiver module, and contact H3C Support.
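The consistency check in step 5 can be sketched in Python as follows. This is an illustrative helper, not an H3C tool. It assumes that the display transceiver interface output from both ends has been saved as text and that the fiber mode appears in the Transfer Distance(km) field, as in the sample output. The function names parse_transceiver and mismatches, and the peer values in the example, are hypothetical.

```python
# Fields that must agree at both ends of the link (assumed selection).
COMPARED = ("Transceiver Type", "Wavelength(nm)", "Transfer Distance(km)")

def parse_transceiver(text):
    """Turn 'Key : Value' lines into a dictionary."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():  # skip the title line, which ends with ':'
            info[key.strip()] = value.strip()
    return info

def mismatches(local, peer):
    """Return the compared fields whose values differ between the two ends."""
    return [field for field in COMPARED if local.get(field) != peer.get(field)]

LOCAL = """\
Ten-GigabitEthernet2/9/0/1 transceiver information:
Transceiver Type : 10G_BASE_LR_XFP
Connector Type : LC
Wavelength(nm) : 1310
Transfer Distance(km) : 10(SMF)
"""

# Hypothetical peer output used only to show a mismatch.
PEER = """\
Transceiver Type : 10G_BASE_SR_XFP
Wavelength(nm) : 850
Transfer Distance(km) : 0.3(MMF)
"""

print(mismatches(parse_transceiver(LOCAL), parse_transceiver(PEER)))
```

An empty list means that the compared fields match. Any listed field identifies the parameter to correct before replacing hardware.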
A port in up state goes down
Symptom
A port in up state goes down.
Solution
To resolve the problem:
1. Examine the logs of the local device and the peer device, and verify that a shutdown operation has not been performed.
2. Examine the status of ports at both ends. Determine whether the port is shut down because of the protocol failures or because of the failures detected by the online diagnosis module.
3. Contact H3C Support if Protect DOWN appears in the output for a port, for example, GigabitEthernet 2/6/0/1.
Protect DOWN means that the port goes down because the isolate keyword is specified for the hardware-failure-detection command. When the online diagnosis module detects port failures, the port will be shut down and isolated, so that the traffic can be switched to the backup link.
[Sysname]display interface GigabitEthernet2/6/0/1
GigabitEthernet2/6/0/1 current state: Protect DOWN
Line protocol current state: DOWN
IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0000-e80d-c000
Description: GigabitEthernet2/6/0/1 Interface
Loopback is not set
Media type is optical fiber, Port hardware type is 1000_BASE_SX_SFP
Unknown-speed mode, unknown-duplex mode
Link speed type is autonegotiation, link duplex type is autonegotiation
Flow-control is not enabled
The Maximum Frame Length is 9216
……
4. Verify that the configurations of ports at both ends, network cables, transceiver modules, and fiber links are correct.
For more information, see "A port fails to go up."
5. If the problem persists, contact H3C Support.
A port frequently goes up and down
Symptom
A port frequently goes up and down.
Solution
1. For a fiber port, verify that the transceiver module is operating correctly.
For more information, see "Transceiver module failures."
2. For a copper port, the port status might be unstable when the speed and duplex mode are autonegotiated. Manually configure the speed and duplex mode for the port.
3. Verify that the link, peer device, and intermediate devices are operating correctly.
4. If the problem persists, contact H3C Support.
Transceiver module failures
Symptom
The interface with a transceiver module installed cannot go up, and alarms are present.
Solution
To resolve the problem:
1. Check the alarms on the transceiver module:
¡ If RX faults exist in the alarms, the peer port, fiber, or intermediate transmission devices might have failed.
¡ If TX faults or electrical current and voltage faults exist in the alarms, examine the local port.
<Sysname>display transceiver alarm interface GigabitEthernet 2/0/1
GigabitEthernet2/0/1 transceiver current alarm information:
TX fault
RX power high
Table 7 Alarms on transceiver modules
Field |
Description |
Alarms on SFP/SFP+ transceiver modules: |
|
RX loss of signal |
Received signals are lost. |
RX power high |
The received optical power is high. |
RX power low |
The received optical power is low. |
TX fault |
Transmission error. |
TX bias high |
The transmitted bias current is high. |
TX bias low |
The transmitted bias current is low. |
TX power high |
The transmitted optical power is high. |
TX power low |
The transmitted optical power is low. |
Temp high |
The temperature is high. |
Temp low |
The temperature is low. |
Voltage high |
The voltage is high. |
Voltage low |
The voltage is low. |
Transceiver info I/O error |
Transceiver information read/write error. |
Transceiver info checksum error |
Transceiver information checksum error. |
Transceiver type and port configuration mismatch |
The type of the transceiver module does not match the port configuration. |
Transceiver type not supported by port hardware |
The port does not support this type of transceiver modules. |
Alarms on XFP transceiver modules: |
|
RX loss of signal |
Received signals are lost. |
RX not ready |
The receiving status is not ready. |
RX CDR loss of lock |
Receiving CDR loss of lock. |
RX power high |
The received optical power is high. |
RX power low |
The received optical power is low. |
TX not ready |
The transmission status is not ready. |
TX fault |
Transmission error. |
TX CDR loss of lock |
Transmission CDR loss of lock. |
TX bias high |
The transmitted bias current is high. |
TX bias low |
The transmitted bias current is low. |
TX power high |
The transmitted optical power is high. |
TX power low |
The transmitted optical power is low. |
Module not ready |
The module is not ready. |
APD supply fault |
Avalanche photo diode error. |
TEC fault |
Thermoelectric cooler error. |
Wavelength unlocked |
Wavelength loss of lock. |
Temp high |
The temperature is high. |
Temp low |
The temperature is low. |
Voltage high |
The voltage is high. |
Voltage low |
The voltage is low. |
Transceiver info I/O error |
Transceiver information read/write error. |
Transceiver info checksum error |
Transceiver information checksum error. |
Transceiver type and port configuration mismatch |
The type of the transceiver module does not match the port configuration. |
Transceiver type not supported by port hardware |
The port does not support this type of transceiver modules. |
2. Cross-verify the transceiver module that might fail:
a. Install the transceiver module in another fiber port.
b. Replace the current transceiver module with a transceiver module that is operating correctly.
3. Determine whether the transceiver module fails or the neighboring devices and intermediate transmission links fail.
4. If the transceiver module fails, use the display transceiver diagnosis command to display the digital diagnosis parameters on the transceiver module, and contact H3C Support.
You might fail to query the digital diagnosis parameters of a non-H3C transceiver module. H3C recommends that you use H3C transceiver modules. To query the vendor of a transceiver module, use the display transceiver manuinfo command. If the value of the Vendor Name field is H3C, the transceiver module is customized by H3C.
<Sysname>display transceiver manuinfo interface Ten-GigabitEthernet1/2/0/15
Ten-GigabitEthernet1/2/0/15 transceiver manufacture information:
Manu. Serial Number : 213410A0000054000251
Manufacturing Date : 2012-10-26
Vendor Name : H3C
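The alarm check at the start of this procedure can be sketched in Python as follows. This is an illustrative helper, not an H3C tool. It assumes that the display transceiver alarm output for one interface has been saved as text, and it groups alarms only by the RX or TX prefix of the names listed in Table 7. The function name group_alarms is hypothetical.

```python
def group_alarms(text):
    """Group transceiver alarms into receive, transmit, and other alarms."""
    groups = {"rx": [], "tx": [], "other": []}
    for line in text.splitlines():
        alarm = line.strip()
        if not alarm or alarm.endswith(":"):
            continue  # skip blank lines and the title line
        if alarm.startswith("RX "):
            groups["rx"].append(alarm)
        elif alarm.startswith("TX "):
            groups["tx"].append(alarm)
        else:
            groups["other"].append(alarm)
    return groups

SAMPLE = """\
GigabitEthernet2/0/1 transceiver current alarm information:
TX fault
RX power high
"""

print(group_alarms(SAMPLE))
```

Temperature, voltage, and transceiver information alarms fall into the other group and are checked on the local port.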
Related commands
This section lists the commands that you might use for troubleshooting ports and links.
Command |
Description |
display current-configuration |
Displays the running configuration. With an interface specified, this command displays the running configuration of the interface. |
display interface |
Displays the incoming traffic statistics, outgoing traffic statistics, and status of a port. In the output from this command, you can view whether error packets exist and view the error packet statistics. |
display transceiver alarm |
Displays alarms present on transceiver modules. |
display transceiver diagnosis |
Displays the current values of the digital diagnosis parameters on transceiver modules. |
display transceiver interface |
Displays key parameters of the transceiver module in a specified interface to verify whether the transceiver modules at both ends are consistent in the rate, wavelength, and single-mode or multi-mode status. |
display transceiver manuinfo |
Displays the electronic label information of a transceiver module to query the vendor of the transceiver module. |
Troubleshooting hardware forwarding
Forwarding path problem
Symptom
When data forwarding path failure detection is enabled (it is enabled by default), the switch periodically sends test packets between LPUs to examine whether the forwarding chips on the LPUs are operating correctly.
[Sysname]forward-path-detection enable
If a forwarding problem occurs, the switch displays "Forwarding fault" or "Board fault" messages. For example:
%Jun 26 09:51:53:207 2013 H3C DIAG/1/ALERT: -MDC=1-Chassis=2-Slot=4; Forwarding fault: chassis 2 slot 6 to chassis 2 slot 4
%Jun 26 09:51:57:621 2013 H3C DIAG/1/ALERT: -MDC=1; Board fault: chassis 2 slot 6,please check it
%Jun 26 09:51:59:251 2013 H3C DIAG/1/ALERT: -MDC=1-Chassis=2-Slot=6; Forwarding fault: chassis 2 slot 6 to chassis 2 slot 6
%Jun 26 09:52:05:621 2013 H3C DIAG/1/ALERT: -MDC=1; Board fault: chassis 2 slot 6,please check it
%Jun 26 09:52:12:621 2013 H3C DIAG/1/ALERT: -MDC=1; Board fault: chassis 2 slot 6,please check it
%Jun 26 09:52:22:621 2013 H3C DIAG/1/ALERT: -MDC=1; Board fault: chassis 2 slot 6,please check it
Solution
The switch has MPUs, LPUs, and switching fabric modules. LPUs and switching fabric modules perform service traffic forwarding. Traffic is load balanced among the switching fabric modules. MPUs perform control and management. MPUs do not participate in service traffic forwarding.
To resolve the forwarding path problem:
· If "Forwarding fault" messages show forwarding problems between multiple LPUs, it is likely that a switching fabric module has a problem. To locate the problem source, isolate switching fabric modules one by one. (An isolated switching fabric module does not participate in traffic forwarding. Isolating a switching fabric module does not result in packet loss.)
For example, do the following on an H3C S12508 switch in which slots 10 through 18 hold switching fabric modules:
a. Isolate the switching fabric module in slot 10.
[Sysname] board-offline slot 10
Caution: This command is only for diagnostic purpose which will cause board normal service unusable. Continue? [Y/N]:y
Config successfully
b. Observe for a while to see whether the problem disappears.
c. If the problem disappears, the switching fabric module is likely to be the problem source. H3C recommends that you replace the module or install the module into another switch that is operating correctly to determine whether the module is really the problem source.
d. If the problem persists, cancel the isolation.
[Sysname]undo board-offline slot 10
This command will reboot the specified board. Continue? [Y/N]:y
Config successfully
e. After the switching fabric module in slot 10 starts up and operates correctly (in Normal state), isolate the switching fabric module in the next slot. Repeat the previous steps until you locate the failed switching fabric module and verify that other switching fabric modules are operating correctly.
· If "Forwarding fault" messages show forwarding problems from the same LPU to multiple other LPUs, the LPU is likely to have a problem. If you are not sure whether the LPU has a problem, H3C recommends that you do the following to locate the problem source:
a. Isolate switching fabric modules one by one, and observe whether the problem disappears.
b. If the problem persists during the whole isolation process, the LPU might be the source of the problem. H3C recommends that you switch the services on the LPU to other LPUs and replace or isolate the LPU. If the problem is solved, the LPU is the source of the problem.
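The first triage decision above can be sketched in Python as follows. This is an illustrative helper, not an H3C tool. It assumes that the "Forwarding fault" log messages have been saved as text in the format shown in the Symptom section. The function name suspect and the labels lpu, fabric, undetermined, and none are hypothetical, and the result is only a starting point for the isolation steps.

```python
import re

FAULT = re.compile(
    r"Forwarding fault: chassis (\d+) slot (\d+) to chassis (\d+) slot (\d+)"
)

def suspect(log_text):
    """Return (label, source) based on the fault paths found in the log."""
    # Each path is ((source chassis, source slot), (dest chassis, dest slot)).
    paths = {
        ((int(a), int(b)), (int(c), int(d)))
        for a, b, c, d in FAULT.findall(log_text)
    }
    if not paths:
        return ("none", None)
    sources = {src for src, _ in paths}
    if len(sources) > 1:
        # Faults between multiple LPUs: a switching fabric module is likely.
        return ("fabric", None)
    if len(paths) > 1:
        # One LPU fails toward several destinations: that LPU is likely.
        return ("lpu", next(iter(sources)))
    return ("undetermined", None)

LOG = """\
%Jun 26 09:51:53:207 2013 H3C DIAG/1/ALERT: -MDC=1-Chassis=2-Slot=4; Forwarding fault: chassis 2 slot 6 to chassis 2 slot 4
%Jun 26 09:51:59:251 2013 H3C DIAG/1/ALERT: -MDC=1-Chassis=2-Slot=6; Forwarding fault: chassis 2 slot 6 to chassis 2 slot 6
"""

print(suspect(LOG))  # ('lpu', (2, 6))
```

Even when the result is lpu, isolate the switching fabric modules one by one first, as described above, before replacing the LPU.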
Online hardware diagnostic and failure protection
After you enable the hardware failure detection function, the switch automatically detects hardware failures on the following elements:
· chip—Components.
· board—Cards.
· forwarding—Forwarding plane.
You can configure the switch to take the following actions in response to hardware failures:
· off—Takes no action.
· warning—Sends traps to notify you of the failures. (The default setting is warning.)
· reset—Restarts the relevant cards to recover from failures.
· isolate—Shuts down the relevant ports, prohibits loading software for the relevant cards, isolates the relevant cards, or powers off the relevant cards to reduce impact from the failures.
If there are backup links, H3C recommends that you configure the switch to take the isolate action. This action isolates the failed element and helps recover services quickly. The following shows the configuration commands:
[Sysname]hardware-failure-detection chip isolate
Config successfully
[Sysname]hardware-failure-detection board isolate
Config successfully
[Sysname]hardware-failure-detection forwarding isolate
Config successfully
To display hardware failure detection and fix information, use the following command:
<Sysname>display hardware-failure-detection
Current level:
chip : warning
board : warning
forwarding : warning
---------------------Chassis 1, Slot 0 executed records:-------------------
There is no record.
---------------------Chassis 1, Slot 0 trapped records:--------------------
There is no record.
Related commands
This section lists the commands that you might use for troubleshooting hardware forwarding.
Command |
Description |
board-offline |
Isolates a card from the system. |
display hardware-failure-detection |
Displays hardware failure detection and fix information, including the following items: · Protection actions configured for hardware failures. · Most recent 10 fix records of each card. |
forward-path-detection enable |
Enables data forwarding path failure detection to examine whether data forwarding paths are operating correctly. |
hardware-failure-detection |
Configures hardware failure detection, and specifies the actions to be taken in response to hardware failures. The purpose is to enable the device to automatically detect hardware failures and recover services. |
Troubleshooting packet forwarding failure
Ping failure or packet loss
Symptom
Packet loss or ping failure occurs.
<Sysname>ping 10.0.0.5
PING 10.0.0.5 (10.0.0.5): 56 data bytes, press CTRL_C to break
Request time out
Request time out
Request time out
Request time out
Request time out
--- 10.0.0.5 ping statistics ---
5 packet(s) transmitted, 0 packet(s) received, 100.0% packet loss
Solution
Packet statistics collection
To resolve the problem, collect packet statistics by using packet capture tools or by configuring ACL rules. The following example uses ACL rules.
1. Create an IPv4 advanced ACL rule to permit IP packets destined for 1.1.1.1.
[Sysname]acl number 3000
[Sysname-acl-adv-3000] rule 1 permit ip destination 1.1.1.1 0
2. Define a traffic class and a traffic behavior.
[Sysname] traffic classifier statistic_1
[Sysname-classifier-statistic_1] if-match acl 3000
[Sysname-classifier-statistic_1] quit
[Sysname] traffic behavior statistic_1
[Sysname-behavior-statistic_1] accounting packet
3. Create a QoS policy, and associate traffic class statistic_1 with traffic behavior statistic_1 in the QoS policy.
[Sysname] qos policy statistic_1
[Sysname-qospolicy-statistic_1] classifier statistic_1 behavior statistic_1
4. Apply the QoS policy to the incoming traffic of GigabitEthernet 8/0/1.
[Sysname] interface gigabitethernet 8/0/1
[Sysname-GigabitEthernet8/0/1] qos apply policy statistic_1 inbound
5. Display information about the QoS policies applied to GigabitEthernet 8/0/1.
[Sysname] display qos policy interface gigabitethernet8/0/1
Interface: GigabitEthernet8/0/1
Direction: Inbound
Policy: statistic_1
Classifier: statistic_1
Operator: AND
Rule(s) : If-match acl 3000
Behavior: statistic_1
Accounting Enable:
1000 (Packets)
Packet count
If the device does not receive any ping packets, check the neighboring device on the uplink. If the number of ping packets sent by the device is correct, check the neighboring device on the downlink. If the number of ping packets sent is incorrect, see "Layer 2 forwarding failure," "Layer 3 forwarding failure," and "MPLS forwarding failure."
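The packet count decision can be sketched in Python as follows. This is an illustrative helper, not an H3C tool. It assumes that the accounting counters are read from the display qos policy interface output in the format shown above, and that a "correct" sent count equals the number of ping packets expected. The function names accounted_packets and next_check and the returned strings are hypothetical.

```python
import re

def accounted_packets(text):
    """Extract the accounting packet count from display qos policy output."""
    m = re.search(r"(\d+)\s+\(Packets\)", text)
    if not m:
        raise ValueError("accounting packet count not found")
    return int(m.group(1))

def next_check(received, sent, expected):
    """Choose the next troubleshooting target from the packet counts."""
    if received == 0:
        return "check the neighboring device on the uplink"
    if sent == expected:
        return "check the neighboring device on the downlink"
    return "check Layer 2, Layer 3, and MPLS forwarding on this device"

OUTPUT = """\
Behavior: statistic_1
Accounting Enable:
1000 (Packets)
"""

print(accounted_packets(OUTPUT))  # 1000
```

Collect the received count from a policy applied inbound on the incoming port and the sent count from a policy applied outbound on the outgoing port.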
Layer 2 forwarding failure
Symptom
Layer 2 packet loss or ping failure occurs between a switch and a device on the same network segment and in the same VLAN.
A switch can perform Layer 2 forwarding only when the destination MAC address of a packet is different from any MAC address of the switch. A switch might have multiple MAC addresses in an address range. The following output shows the MAC addresses of a VLAN interface on a switch:
[Sysname]display interface vlan-interface 10
Vlan-interface10 current state: UP
Line protocol current state: UP
Description: Vlan-interface10 Interface
The Maximum Transmit Unit is 1500
Internet Address is 1.1.1.1/24 Primary
IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 00e0-fc00-6503
IPv6 Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 00e0-fc00-6503
Last clearing of counters: Never
Solution
To resolve the problem:
1. Verify that the following Layer 2 configurations are correct:
¡ VLAN and PVID.
¡ Packet filtering.
¡ Traffic redirection.
¡ Traffic policing.
¡ Generic traffic shaping (GTS).
¡ Unknown unicast suppression/multicast suppression/broadcast suppression.
2. Verify that the learned MAC addresses are correct. If they are not, determine whether loops occur. To quickly restore forwarding, you can configure static MAC address entries.
<Sysname>display mac-address
MAC Address VLAN ID State Port/NickName Aging
0010-9400-0002 10 Learned GE2/6/0/1 Y
000f-e259-79c0 25 Learned GE2/15/0/1 Y
00e0-fc12-3456 25 Learned GE2/15/0/1 Y
0023-8956-7b00 3102 Learned XGE2/4/0/1 Y
0023-8956-7b00 3202 Learned XGE2/4/0/8 Y
3. Verify traffic statistics:
¡ Execute the qos traffic-counter inbound command to collect statistics about the inbound traffic.
[Sysname]qos traffic-counter inbound counter0 slot 3 interface Gigabitethernet 3/0/1
¡ Execute the display qos traffic-counter inbound command multiple times to observe the discarded packet count in the inbound direction. If the count continuously increases, verify the port configurations according to Table 8. If the reasons for packet loss still cannot be determined, contact H3C Support.
[Sysname]display qos traffic-counter inbound counter0 slot 3
Slot 3 inbound counter0 mode:
Interface: GigabitEthernet3/0/1
VLAN: all
Traffic-counter summary:
Summary inbound: 578199 packets
Dropped of local filtering: 0 packets
Dropped of VLAN filtering: 0 packets
Dropped of security filtering: 0 packets
Field |
Description |
Summary inbound |
Number of incoming packets. |
Dropped of local filtering |
A packet might be dropped due to the following reasons: · Traffic suppression is performed. · The outgoing interface is the same as the incoming interface, according to the MAC address table lookup result. · STP sets the state of the interface to discarding. |
Dropped of VLAN filtering |
A packet might be dropped due to the following reasons: · The VLAN of the packet is different from the VLAN of the interface. · The VLAN of the packet has not been created. |
Dropped of security filtering |
A packet might be dropped due to the following reasons: · The packet matches a blackhole MAC address entry. To display blackhole MAC address entries, execute the display mac-address blackhole command. · The packet fails the MAC authentication. To display MAC authentication settings and statistics, execute the display mac-authentication interface command. · The source MAC address of the packet is a multicast MAC address or broadcast MAC address. · The source MAC address of the packet is unknown to the interface. |
¡ Execute the qos traffic-counter outbound command to collect statistics about the outbound traffic.
[Sysname]qos traffic-counter outbound counter0 slot 4 interface Gigabitethernet 4/0/1
¡ Execute the display qos traffic-counter outbound command multiple times to observe the discarded packet count in the outbound direction. If the count continuously increases, verify the port configurations according to Table 9. If the reasons for packet loss still cannot be determined, contact H3C Support.
[Sysname]display qos traffic-counter outbound counter0 slot 4
Slot 4 outbound counter0 mode:
Interface: GigabitEthernet4/0/1
VLAN: all
Local precedence: all
Drop priority: all
Traffic-counter summary:
Unicast: 0 packets
Multicast: 0 packets
Broadcast: 0 packets
Control packets: 18 packets
Bridge egress filtered packets: 0 packets
Tail drop packets: 0 packets
Tail drop multicast packets: 993827 packets
Forwarding restrictions packets: 0 packets
Field |
Description |
Unicast/Multicast/Broadcast |
Number of packets that are not dropped. |
Control packets |
Number of control packets sent by the CPU. |
Bridge egress filtered packets |
A packet might be dropped due to the following reasons: · The VLAN of the packet is different from the VLAN of the interface. · STP sets the state of the interface to discarding. · RRPP or Smart Link blocks the interface. · The outgoing interface is down. |
Tail drop packets |
A packet might be dropped due to the following reasons: · The transmit queue is congested. · Traffic shaping is performed. |
Tail drop multicast packets |
A multicast or broadcast packet might be dropped due to the following reasons: · No outgoing interface is configured for the packet. · STP blocks the interface. · The outgoing interface is down. |
Forwarding restrictions packets |
Number of packets that are prevented from being forwarded. |
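Watching for drop counters that keep increasing can be sketched in Python as follows. This is an illustrative helper, not an H3C tool. It assumes that two captures of the display qos traffic-counter output have been saved as text and that the counter names match the sample output. The function names parse_counters and increasing_drops are hypothetical.

```python
import re

# A counter line looks like "Tail drop packets: 0 packets".
COUNTER = re.compile(r"^\s*(.+?):\s+(\d+) packets\s*$")

# Counters that represent dropped packets in the sample outputs.
DROP_COUNTERS = {
    "Dropped of local filtering",
    "Dropped of VLAN filtering",
    "Dropped of security filtering",
    "Bridge egress filtered packets",
    "Tail drop packets",
    "Tail drop multicast packets",
    "Forwarding restrictions packets",
}

def parse_counters(text):
    """Return {counter name: packet count} for one capture."""
    counters = {}
    for line in text.splitlines():
        m = COUNTER.match(line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters

def increasing_drops(first, second):
    """Return drop counters that grew between two captures, with the growth."""
    before, after = parse_counters(first), parse_counters(second)
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if name in DROP_COUNTERS and after[name] > before.get(name, 0)
    }

T1 = "Unicast: 0 packets\nTail drop packets: 0 packets\nTail drop multicast packets: 993827 packets\n"
T2 = "Unicast: 4 packets\nTail drop packets: 0 packets\nTail drop multicast packets: 993900 packets\n"

print(increasing_drops(T1, T2))  # {'Tail drop multicast packets': 73}
```

Each counter name returned maps to a row in the field description tables, which list the possible drop reasons to verify.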
Layer 3 forwarding failure
Symptom
IP service failures, ping or tracert operation failures, or ping or tracert packet loss occurs.
A switch performs Layer 3 forwarding by using the driver IP forwarding table instead of the routing table. The route management module selects optimal routes through various protocols, and puts them into the FIB table. The FIB table synchronizes the routes to the driver IP forwarding table, which guides packet forwarding.
Figure 3 Relationship between the routing table and forwarding table
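To make the role of the forwarding table concrete, the following Python sketch shows a longest-prefix-match lookup, which is the standard rule for selecting an IP forwarding entry. It is a conceptual illustration, not the switch driver implementation. The lookup function is hypothetical, and the default route in the table is an invented entry.

```python
import ipaddress

def lookup(forwarding_table, destination):
    """Return the entry with the longest prefix that contains destination, or None."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop, interface in forwarding_table:
        network = ipaddress.ip_network(prefix)
        if address in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop, interface)
    return best

table = [
    ("1.1.1.0/24", "20.0.0.2", "Vlan20"),  # same entry as the display fib example in this section
    ("0.0.0.0/0", "30.0.0.1", "Vlan30"),   # hypothetical default route, for illustration only
]

print(lookup(table, "1.1.1.9"))
```

If the routing table contains a route but the forwarding table lacks the matching entry, a lookup of this kind falls through to a less specific entry or fails, which is why the procedure checks each table in turn.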
Solution
To resolve the problem:
1. Use the mirroring function or capture packets to verify that the destination MAC address of packets is the MAC address of the switch.
A switch can perform Layer 3 forwarding only when the destination MAC address of a packet is the MAC address of the switch. The switch might have multiple MAC addresses in an address range. The following output shows the MAC addresses of VLAN interfaces on a switch:
[Sysname]display interface vlan-interface 10
Vlan-interface10 current state: UP
Line protocol current state: UP
Description: Vlan-interface10 Interface
The Maximum Transmit Unit is 1500
Internet Address is 1.1.1.1/24 Primary
IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 00e0-fc00-6503
IPv6 Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 00e0-fc00-6503
Last clearing of counters: Never
2. Verify that the route to the specific destination exists in the routing table. If it does not exist, examine the routing protocol configurations and protocol states.
[Sysname]display ip routing-table 1.1.1.0
Summary Count : 1
Destination/Mask Proto Pre Cost NextHop Interface
1.1.1.0/24 Static 60 0 20.0.0.2 Vlan20
3. Verify that the route to the specific destination exists in the FIB table. If a route exists but cannot be used to guide the packet forwarding, contact H3C Support.
[Sysname]display fib 1.1.1.0
Destination count: 1 FIB entry count: 1
Flag:
U:Useable G:Gateway H:Host B:Blackhole D:Dynamic S:Static
R:Relay F:FRR
Destination/Mask Nexthop Flag OutInterface/Token Label
1.1.1.0/24 20.0.0.2 USG Vlan20
4. Verify that the interfaces in the learned ARP entries are correct. If they are not, execute the reset arp command to clear ARP entries so that the device can learn the correct ARP entries. You can also configure static ARP entries. If the problem persists, contact H3C Support.
[Sysname]display arp 20.0.0.2
Type: S-Static D-Dynamic M-Multiport I-Invalid
IP Address MAC Address VLAN Interface Aging Type
20.0.0.2 0000-0000-0001 20 GE2/0/1 N/A S
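Steps 2 through 4 check the same next hop in three tables. The following Python sketch shows one way to cross-check saved command output, assuming the column order shown in the examples above. The function names (find_row, check_l3_chain) are hypothetical helpers. The sketch applies to routes that have a real next hop, not to direct routes that point to InLoop0.

```python
def find_row(output, key, min_fields):
    """Return the whitespace-split fields of the first row whose first column is key."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= min_fields and fields[0] == key:
            return fields
    return None

def check_l3_chain(routing_output, fib_output, arp_output, prefix):
    """Return a list of problems for prefix. An empty list means all checks passed."""
    problems = []
    route = find_row(routing_output, prefix, 6)  # Destination Proto Pre Cost NextHop Interface
    fib = find_row(fib_output, prefix, 4)        # Destination Nexthop Flag OutInterface
    if route is None:
        problems.append("no route for " + prefix)
    if fib is None:
        problems.append("no FIB entry for " + prefix)
    if route and fib and (route[4], route[5]) != (fib[1], fib[3]):
        problems.append("routing table and FIB disagree for " + prefix)
    next_hop = fib[1] if fib else (route[4] if route else None)
    if next_hop and find_row(arp_output, next_hop, 4) is None:
        problems.append("no ARP entry for next hop " + next_hop)
    return problems

# Sample rows copied from the output in this section.
routing_sample = "1.1.1.0/24 Static 60 0 20.0.0.2 Vlan20"
fib_sample = "1.1.1.0/24 20.0.0.2 USG Vlan20"
arp_sample = "20.0.0.2 0000-0000-0001 20 GE2/0/1 N/A S"

print(check_l3_chain(routing_sample, fib_sample, arp_sample, "1.1.1.0/24"))
```

The sketch only confirms that the entries exist and agree. Whether the ARP entry points to the correct physical interface still requires knowledge of the cabling.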
MPLS forwarding failure
Symptom
You might experience the following problems with MPLS forwarding:
· Unreachable destination.
· No routes.
· Error message printed.
· Unstable tunnels.
· Packet sending or receiving failure.
Solution
VLL and L3VPN are implemented based on LSPs.
To resolve the common problems with MPLS, verify the LSP and route configurations on the LSRs.
Figure 4 MPLS network diagram
Troubleshooting MPLS LSPs
Perform the following tasks on the ingress node (PE 1 in Figure 4):
1. Execute the display mpls lsp command to display LSP information.
[PE1]display mpls lsp
FEC Proto In/Out Label Interface/Out NHLFE
100.100.100.100/32 LDP 3/- -
4.4.4.4/32 LDP NULL/3 Vlan103
90.0.0.0/24 LDP NULL/3 Vlan103
1.1.1.1/32 LDP 3/NULL InLoop0
50.0.0.0/24 LDP NULL/3 Vlan103
70.0.0.0/24 LDP NULL/3 Vlan103
3.3.3.3/32 LDP NULL/1025 Vlan103
If the configured LSP does not exist, see MPLS Configuration Guide to verify the MPLS LSP configuration on each LSR.
2. Execute the display mpls ldp peer command and verify the MPLS LDP session.
[PE1]display mpls ldp peer
Total number of peers: 1
Peer LDP ID State Role GR MD5 KA Sent/Rcvd
4.4.4.4:0 Operational Passive Off Off 39/39
If the session status is not Operational, the LDP session has not been established correctly. Go to steps 3 and 4 to further determine the problem. If the session status is Operational, go to step 5.
3. Execute the display current-configuration configuration ldp command, and verify that the local LSR and the peer LSR have the same MD5 password.
<PE1>display current-configuration configuration ldp
#
mpls ldp
md5-authentication 4.4.4.4 cipher $c$3$uNK0ggilqlClQ6Q/CcNQPPqux6mAqU2p
#
return
4. Execute the display mpls ldp interface command to display LDP interface information.
[PE1]display mpls ldp interface
Interface MPLS LDP Auto-config
Vlan10 Enabled Configured -
GE3/0/2 Enabled Configured -
XGE2/0/6 Enabled Configured -
If the configured information is incorrect, verify the MPLS LDP configuration on each LSR.
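5. Verify that the LSR ID is the IP address of a loopback interface. Compare the LSR ID in the running configuration with the loopback interface addresses in the display ip interface brief output. If they do not match, execute the mpls lsr-id command to change the LSR ID. H3C recommends that you configure the IP address of a loopback interface as the LSR ID.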
<PE1>display current-configuration | include lsr-id
mpls lsr-id 2.2.2.2
<PE1>display ip interface brief
*down: administratively down
(s): spoofing
Interface Physical Protocol IP Address Description
Loop0 up up(s) 100.100.100.100 LoopBack0..
Loop2 up up(s) 100.100.100.102 LoopBack2..
M-E0/0/0 up up 192.168.147.7 M-Etherne..
<PE1>system-view
[PE1]mpls lsr-id 100.100.100.100
6. Verify that the VLAN interface is enabled with MPLS and MPLS LDP.
[PE1]interface vlan-interface 103
[PE1-Vlan-interface103]display this
#
interface Vlan-interface103
ip address 1.1.1.2 255.255.255.0
mpls enable
mpls ldp enable
#
return
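Two of the checks above lend themselves to automation: the LDP session state (step 2) and the LSR ID comparison (step 5). The following Python sketch assumes the output layouts shown in this section. The function names are hypothetical, and the sketch is a starting point rather than a complete validator.

```python
def non_operational_peers(ldp_peer_output):
    """Return LDP peer IDs whose State column is not Operational."""
    peers = []
    for line in ldp_peer_output.splitlines():
        fields = line.split()
        # Peer rows start with an LDP ID such as 4.4.4.4:0.
        if len(fields) >= 2 and fields[0].count(".") == 3 and ":" in fields[0]:
            if fields[1] != "Operational":
                peers.append(fields[0])
    return peers

def loopback_addresses(ip_brief_output):
    """Return the IP addresses of LoopBack interfaces from display ip interface brief."""
    addresses = set()
    for line in ip_brief_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].startswith("Loop"):
            addresses.add(fields[3])
    return addresses

def lsr_id_is_loopback(lsr_id_line, ip_brief_output):
    """Check a line such as 'mpls lsr-id 2.2.2.2' against the loopback addresses."""
    return lsr_id_line.split()[-1] in loopback_addresses(ip_brief_output)

# Sample rows copied from the output in this section.
peer_sample = "4.4.4.4:0 Operational Passive Off Off 39/39"
brief_sample = """Loop0 up up(s) 100.100.100.100 LoopBack0..
Loop2 up up(s) 100.100.100.102 LoopBack2..
M-E0/0/0 up up 192.168.147.7 M-Etherne.."""

print(non_operational_peers(peer_sample))
print(lsr_id_is_loopback("mpls lsr-id 2.2.2.2", brief_sample))
```

With the sample data, the second check fails for LSR ID 2.2.2.2 and passes after the LSR ID is changed to 100.100.100.100, which matches the correction shown in step 5.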
Troubleshooting routes
Perform the following tasks on the ingress node (PE 1 in Figure 4):
1. Execute the display ip routing-table command to display routing table information.
[PE1]display ip routing-table
Routing Tables: Public
Destinations : 10 Routes : 10
Destination/Mask Proto Pre Cost NextHop Interface
1.1.1.1/32 Direct 0 0 127.0.0.1 InLoop0
3.3.3.3/32 OSPF 10 2 103.0.0.4 Vlan103
4.4.4.4/32 OSPF 10 1 103.0.0.4 Vlan103
50.0.0.0/24 OSPF 10 2 103.0.0.4 Vlan103
70.0.0.0/24 OSPF 10 2 103.0.0.4 Vlan103
90.0.0.0/24 OSPF 10 2 103.0.0.4 Vlan103
103.0.0.0/24 Direct 0 0 103.0.0.1 Vlan103
103.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0
Verify that the route entries include IP addresses of the loopback interfaces on PE 1, P, and PE 2, and the IP address of the remote device's VLAN interface. Otherwise, verify the routing protocol configuration on each LSR.
2. Verify that the routing protocol (this example uses OSPF) operates correctly. If it does not, verify the routing protocol configuration on each LSR.
[PE1]display ospf peer
OSPF Process 1 with Router ID 1.1.1.1
Neighbor Brief Information
Area: 0.0.0.0
Router ID Address Pri Dead-Time Interface State
4.4.4.4 103.0.0.4 1 37 Vlan103 Full/BDR
3. Verify that the loopback interface and the VLAN interface are advertised in the routing protocol. Verify that the LDP interface is enabled with a routing protocol.
[PE1-ospf-1]display this
#
ospf 1
area 0.0.0.0
network 103.0.0.0 0.0.0.255
network 1.1.1.1 0.0.0.0
#
return
4. Execute the debugging ospf packet command to verify that routing protocol packets are sent and received correctly. If they are not, verify the routing protocol configurations on the local LSR and remote LSR.
<PE1>debugging ospf packet
*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; OSPF 1: Sending packets.
*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Source address: 1.1.1.1
*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Destination address: 224.0.0.5
*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Version 2, Type: 1, Length: 44.
*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Router: 192.168.147.7, Area: 0.0.0.0, Checksum: 42732.
*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Authentication type: 00, Key(ASCII): 0 0 0 0 0 0 0 0.
*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Network mask: 255.255.255.0, Hello interval: 10, Option: _E_.
*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Router priority: 1, Dead Interval: 40, DR: 1.1.1.1, BDR: 0.0.0.0.
5. If the problem persists, contact H3C Support.
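The route and neighbor checks in steps 1 and 2 can be sketched in Python as follows. The list of required prefixes is an assumption based on the sample output (the loopback addresses of the LSRs), and the function names are hypothetical.

```python
def missing_routes(routing_output, required_prefixes):
    """Return the required prefixes that do not appear in display ip routing-table output."""
    present = set()
    for line in routing_output.splitlines():
        fields = line.split()
        # Route rows start with a prefix such as 3.3.3.3/32; the header row does not.
        if len(fields) >= 6 and "/" in fields[0] and fields[0][0].isdigit():
            present.add(fields[0])
    return [prefix for prefix in required_prefixes if prefix not in present]

def neighbors_not_full(ospf_peer_output):
    """Return (router ID, state) for OSPF neighbors whose state is not Full.

    On broadcast networks, a 2-Way state between two DROther routers is normal,
    so interpret the result with the network type in mind.
    """
    not_full = []
    for line in ospf_peer_output.splitlines():
        fields = line.split()
        if len(fields) == 6 and fields[0].count(".") == 3 and fields[1].count(".") == 3:
            if not fields[5].startswith("Full"):
                not_full.append((fields[0], fields[5]))
    return not_full

# Sample rows copied from the output in this section.
route_sample = """Destination/Mask Proto Pre Cost NextHop Interface
1.1.1.1/32 Direct 0 0 127.0.0.1 InLoop0
3.3.3.3/32 OSPF 10 2 103.0.0.4 Vlan103
4.4.4.4/32 OSPF 10 1 103.0.0.4 Vlan103"""
ospf_sample = "4.4.4.4 103.0.0.4 1 37 Vlan103 Full/BDR"

# Assumed loopback prefixes of the LSRs in the example network.
required = ["1.1.1.1/32", "3.3.3.3/32", "4.4.4.4/32"]
print(missing_routes(route_sample, required))
print(neighbors_not_full(ospf_sample))
```

Both functions return an empty list for the sample output, which corresponds to a healthy routing state in this example.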
Related commands
This section lists the commands that you might use for troubleshooting hardware forwarding.
Command | Description |
---|---|
accounting packet | Configures a traffic accounting action in the traffic behavior database to count traffic in packets. |
acl | Creates an ACL, and enters its view. |
classifier behavior | Associates a traffic behavior with a traffic class in a QoS policy. |
debugging ospf packet | Enables OSPF packet debugging to examine whether OSPF packets can be correctly sent and received. |
display arp | Displays ARP entries to check whether output interfaces can be correctly learned through ARP. |
display current-configuration \| include lsr-id | Displays the current MPLS LSR ID. |
display current-configuration configuration ldp | Displays information about MPLS LDP to verify the consistency of MD5 passwords. |
display fib | Displays FIB entries to examine whether an entry matching a specific destination network exists in the FIB table. |
display interface | Displays information about the specified interface. |
display ip interface brief | Displays brief IP configuration information for the specified Layer 3 interface or all Layer 3 interfaces. |
display ip routing-table | Displays brief information about active routes in the routing table to examine whether a route to the specified network exists in the routing table. |
display mac-address | Displays MAC address entries to examine whether interfaces can be correctly learned. |
display mpls ldp interface | Displays LDP interface information to examine whether the corresponding label advertisement mode exists. |
display mpls ldp peer | Displays LDP peer information to examine whether the configured LSPs are up. |
display mpls ldp session | Displays LDP session information. |
display mpls lsp | Displays information about LSPs. |
display ospf peer | Displays information about OSPF neighbors. |
display qos policy interface | Displays information about the QoS policy or policies applied to an interface. |
display qos traffic-counter | Displays the traffic statistics collected by the specified counter, and displays the configuration of the counter. |
display this | Displays the running configuration in the current view. |
interface | Enters interface view. |
rule | Creates an ACL rule. |
traffic behavior | Creates a traffic behavior and enters traffic behavior view. |
traffic classifier | Creates a class and enters class view. |
qos apply policy | Applies a QoS policy to a port. |
qos policy | Creates a QoS policy and enters QoS policy view. |
qos traffic-counter | Enables the traffic accounting function, and specifies the type of traffic. |
mpls lsr-id | Configures an LSR ID for the local LSR. |
ping | Examines whether the destination IP address is reachable, and displays related statistics. |
Troubleshooting IRF
This section provides troubleshooting information for common problems with IRF.
IRF fabric establishment failure
Symptom
An H3C S12500 IRF fabric cannot be established.
Solution
To resolve the problem:
1. Verify that all member chassis run the same software version and use the same type of MPUs:
a. Execute the display device command. Check the Brd Type and Sft Ver fields for the MPU type and software version.
<Sysname> display device
Slot No. Brd Type Brd Status Subslot Num Sft Ver
1/0 LST1MRPNC1 Master 0 S12500-CMW710-R7128
1/1 LST1MRPNC1 Standby 0 S12500-CMW710-R7128
1/2 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/3 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/4 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/5 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/6 NONE Absent 0 NONE
1/7 NONE Absent 0 NONE
1/8 NONE Absent 0 NONE
1/9 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128
……
b. If the member chassis run different software versions, upgrade the software to the same version. If they use different types of MPUs, replace MPUs.
2. Verify that at least one IRF physical port is up for an IRF port:
NOTE: An IRF port goes down only if all its physical ports are down.
a. Execute the display interface command. Check the current state field for the status of an IRF physical port. For example:
<Sysname> display interface gigabitethernet 2/6/0/1
GigabitEthernet2/6/0/1 current state: UP
Line protocol current state: UP
IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0000-e80d-c000
Description: GigabitEthernet2/6/0/1 Interface
Loopback is not set
Media type is optical fiber, Port hardware type is 1000_BASE_SX_SFP
……
b. If any physical port bound to an IRF port is down, bring it up.
3. Verify that all IRF physical ports are connected correctly:
IMPORTANT: When you connect two neighboring IRF members, you must connect the physical ports of IRF-port 1 on one member to the physical ports of IRF-port 2 on the other.
a. Execute the display irf configuration command. Check the IRF-Port1 and IRF-Port2 fields for IRF port bindings.
<Sysname> display irf configuration
MemberID NewID IRF-Port1 IRF-Port2
1 1 Ten-GigabitEthernet1/8/0/1 disable
Ten-GigabitEthernet1/8/0/2
2 2 disable Ten-GigabitEthernet2/12/0/1
Ten-GigabitEthernet2/12/0/2
b. Verify that the physical IRF connections are consistent with the IRF port bindings. In this example, Ten-GigabitEthernet 1/8/0/1 and Ten-GigabitEthernet 1/8/0/2 on member chassis 1 must be connected to Ten-GigabitEthernet 2/12/0/1 and Ten-GigabitEthernet 2/12/0/2 on member chassis 2.
c. If connection errors exist, reconnect the IRF physical ports.
4. Verify that all member chassis use the same system operating mode:
a. Execute the display system-working-mode command on each member chassis. Check the command output for mode inconsistency.
[Sysname] display system-working-mode
The current system working mode is standard.
The next system working mode is standard.
b. If mode inconsistency exists, execute the system-working-mode command to change the system operating mode. The system-working-mode command setting takes effect after a system reboot.
5. Verify that the MDC settings and the settings for the following commands are the same across all chassis: acl hardware-mode ipv6, irf mode enhanced, and vpn popgo:
a. Execute the display current-configuration command. Check the configuration on each member chassis for configuration inconsistency.
[Sysname] display current-configuration
……
acl hardware-mode ipv6 enable
……
irf mode enhanced
……
undo vpn popgo
……
b. If configuration inconsistency exists, modify the configuration.
6. If the problem persists, contact H3C Support.
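The consistency check in step 1 can be scripted when the display device output of every member chassis has been saved. The following Python sketch assumes the five-column layout shown above. The function names are hypothetical, and the second chassis in the sample is invented to demonstrate a mismatch.

```python
def card_rows(display_device_output):
    """Yield (slot, board type, status, software version) for installed cards."""
    for line in display_device_output.splitlines():
        fields = line.split()
        # Card rows start with a slot such as 1/0; empty slots show NONE and are skipped.
        if len(fields) == 5 and fields[0].replace("/", "").isdigit() and fields[1] != "NONE":
            yield fields[0], fields[1], fields[2], fields[4]

def fabric_mismatches(outputs):
    """Return problems found across the display device output of all member chassis."""
    versions, mpu_types = set(), set()
    for output in outputs:
        for _slot, board, status, version in card_rows(output):
            versions.add(version)
            if status in ("Master", "Standby"):
                mpu_types.add(board)
    problems = []
    if len(versions) > 1:
        problems.append("software versions differ: " + ", ".join(sorted(versions)))
    if len(mpu_types) > 1:
        problems.append("MPU types differ: " + ", ".join(sorted(mpu_types)))
    return problems

chassis_1 = """Slot No. Brd Type Brd Status Subslot Num Sft Ver
1/0 LST1MRPNC1 Master 0 S12500-CMW710-R7128
1/2 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/6 NONE Absent 0 NONE"""
# Hypothetical second chassis that runs a different release.
chassis_2 = chassis_1.replace("1/", "2/").replace("R7128", "R7129")

print(fabric_mismatches([chassis_1, chassis_2]))
```

An empty result means that the saved output shows one software version and one MPU type. It does not cover the cabling, operating mode, or configuration checks in steps 2 through 5.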
IRF split
Symptom
An IRF fabric splits.
Solution
To resolve the problem:
1. Use the system log to identify the IRF split time.
You can use this information to search the system log for events that might cause the split.
%Jun 26 10:13:46:233 2013 H3C STM/2/STM_LINK_STATUS_TIMEOUT: IRF port 1 is down because heartbeat timed out.
%Jun 26 10:13:46:436 2013 H3C STM/3/STM_LINK_STATUS_DOWN: -MDC=1; IRF port 2 is down.
2. Verify that all interface cards that have IRF physical ports are in Normal state:
a. Execute the display device command. Check the Brd Status field for the card state.
<Sysname>display device
Slot No. Brd Type Brd Status Subslot Num Sft Ver
1/0 LST1MRPNC1 Master 0 S12500-CMW710-R7128
1/1 LST1MRPNC1 Standby 0 S12500-CMW710-R7128
1/2 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/3 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/4 NONE Absent 0 NONE
1/5 NONE Absent 0 NONE
1/6 NONE Absent 0 NONE
1/7 NONE Absent 0 NONE
1/8 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128
1/9 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128
1/10 NONE Absent 0 NONE
1/11 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/12 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/13 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/14 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128
1/15 NONE Absent 0 NONE
1/16 NONE Absent 0 NONE
1/17 NONE Absent 0 NONE
1/18 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128
1/19 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128
1/20 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/21 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/22 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/23 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/24 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/25 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/26 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/27 LST2SF18C1 Normal 0 S12500-CMW710-R7128
1/28 LST2SF18C1 Normal 0 S12500-CMW710-R7128
b. If an interface card is not in Normal state, use the methods described in "Card failure" to resolve the problem.
3. Verify that each IRF port has at least one physical port in up state:
a. Execute the display interface command. Check the current state field for the state of an IRF physical port. For example:
<Sysname> display interface gigabitethernet 2/6/0/1
GigabitEthernet2/6/0/1 current state: UP
Line protocol current state: UP
IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0000-e80d-c000
Description: GigabitEthernet2/6/0/1 Interface
Loopback is not set
Media type is optical fiber, Port hardware type is 1000_BASE_SX_SFP
……
b. If any physical port bound to an IRF port is down, use the methods described in "Troubleshooting links and ports" to recover the link state and bring up the physical port.
4. Remove hardware problems that might cause recurring IRF split events:
a. Execute the display version command. Check the uptime of the member chassis, MPUs, and interface cards that have IRF links.
<Sysname> display version
H3C Comware Software, Version 7.1.034, Release 7129
Copyright (c) 2004-2013 Hangzhou H3C Tech. Co., Ltd. All rights reserved
H3C S12518 uptime is 0 weeks, 1 day, 23 hours, 6 minutes
Last reboot reason : User reboot
Boot image: cfa0:/S12500-CMW710-BOOT-R7129.bin
Boot image version: 7.1.034P04, Release 7129
System image: cfa0:/S12500-CMW710-SYSTEM-R7129.bin
System image version: 7.1.034, Release 7129
LST1MRPNC1 2/0: uptime is 0 weeks, 1 day, 23 hours, 6 minutes
Last reboot reason : User reboot
3456 Mbytes SDRAM
1024 Kbytes NVRAM Memory
Type : LST1MRPNC1
BootRom : 2.19
Software : S12500-CMW710-R7129
PCB : Ver.A
……
b. Compare the uptime of chassis, MPUs, and interface cards to determine whether a member chassis, MPU, or interface card rebooted before the IRF split.
c. If the IRF split is caused by a chassis or card reboot, identify the reboot cause:
- If the reboot occurred because of a hardware problem, replace the faulty component.
- If the reboot occurred because of power failure, use the methods described in "Power supply failure" to remove the power supply problems.
5. If the problem persists, contact H3C Support.
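The comparison between the split time (step 1) and component uptime (step 4) involves some date arithmetic. The following Python sketch shows one way to do it, assuming the log timestamp and uptime formats shown above. The function names, the observation time, and the 10-minute tolerance are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta

def log_time(log_line):
    """Parse the timestamp of a log line such as '%Jun 26 10:13:46:233 2013 H3C STM/...'."""
    match = re.match(r"%(\w{3} +\d+ \d\d:\d\d:\d\d:\d{3} \d{4}) ", log_line)
    return datetime.strptime(match.group(1), "%b %d %H:%M:%S:%f %Y")

def uptime(text):
    """Parse 'uptime is 0 weeks, 1 day, 23 hours, 6 minutes' into a timedelta."""
    match = re.search(r"uptime is (\d+) weeks?, (\d+) days?, (\d+) hours?, (\d+) minutes?", text)
    weeks, days, hours, minutes = map(int, match.groups())
    return timedelta(weeks=weeks, days=days, hours=hours, minutes=minutes)

def rebooted_near_split(split_time, observed_at, uptime_text, tolerance=timedelta(minutes=10)):
    """Return True if the component started up within tolerance of the split time."""
    boot_time = observed_at - uptime(uptime_text)
    return abs(boot_time - split_time) <= tolerance

split_line = ("%Jun 26 10:13:46:233 2013 H3C STM/2/STM_LINK_STATUS_TIMEOUT: "
              "IRF port 1 is down because heartbeat timed out.")
mpu_line = "LST1MRPNC1 2/0: uptime is 0 weeks, 1 day, 23 hours, 6 minutes"
# Hypothetical time at which display version was executed.
observed_at = datetime(2013, 6, 28, 9, 22)

print(rebooted_near_split(log_time(split_line), observed_at, mpu_line))
```

A component whose startup time falls close to the split time is a candidate cause of the split. The tolerance accounts for the time a chassis or card needs to boot, and should be adjusted to the observed boot duration.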
Related commands
This section lists the commands that you might use for troubleshooting IRF.
Command | Description |
---|---|
display device | Displays device information. Use this command to verify that all member chassis run the same software version and use the same type of MPUs. |
display interface | Displays interface information. Use this command to verify that each IRF port has at least one physical port in up state. |
display irf configuration | Displays IRF configuration on each member chassis. Use this command to identify physical ports bound to IRF-port 1 and IRF-port 2 on each member chassis before you check IRF physical connections. |
display system-working-mode | Displays system operating mode. Use this command to verify that all member chassis are operating in the same mode. |
display current-configuration | Displays the running configuration. In system view, verify that the MDC settings and the settings for the following commands are the same across all chassis: acl hardware-mode ipv6, irf mode enhanced, and vpn popgo. |
display version | Displays the system version and uptime as well as the uptime of each card. Use this command to identify the runtime of each member chassis, MPU, and interface card that has IRF physical ports. Compare their uptime to determine whether a member chassis, MPU, or interface card rebooted before an IRF split. |
Troubleshooting system management
This section provides troubleshooting information for common problems with system management.
High CPU usage
Symptom
The CPU usage on a card stays higher than 60%.
<Sysname>display cpu-usage
Slot 0 CPU usage:
0% in last 5 seconds
61% in last 1 minute
0% in last 5 minutes
Slot 0 CPU 1 CPU usage:
0% in last 5 seconds
0% in last 1 minute
0% in last 5 minutes
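The threshold check on this output can be scripted. The following Python sketch assumes the display cpu-usage layout shown above. The function name high_cpu_entries is hypothetical, and the 60% default mirrors the threshold stated in this section.

```python
import re

def high_cpu_entries(output, threshold=60):
    """Return (section, percent, window) for every reading above threshold."""
    findings = []
    section = None
    for line in output.splitlines():
        line = line.strip()
        if line.endswith("CPU usage:"):
            section = line[:-1]  # drop the trailing colon
            continue
        match = re.match(r"(\d+)% in last (.+)$", line)
        if match and int(match.group(1)) > threshold:
            findings.append((section, int(match.group(1)), match.group(2)))
    return findings

# Sample copied from the output in this section.
cpu_sample = """Slot 0 CPU usage:
0% in last 5 seconds
61% in last 1 minute
0% in last 5 minutes
Slot 0 CPU 1 CPU usage:
0% in last 5 seconds
0% in last 1 minute
0% in last 5 minutes"""

print(high_cpu_entries(cpu_sample))
```

A single reading above the threshold is only a hint. The symptom described here is usage that persists, so repeat the capture or check the history chart before you conclude that usage is abnormally high.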
Execute the display cpu-usage history command to display the CPU usage statistics within the last 60 minutes.
<Sysname>display cpu-usage history slot 0
100%|
95%|
90%|
85%|
80%|
75%|
70%|
65%|
60%|
55%|
50%|
45%|
40%|
35%| #
30%| # #
25%| # #
20%| # # # #
15%| ## # # ##
10%| ## # # ##
5%|############################################################
------------------------------------------------------------
10 20 30 40 50 60 (minutes)
cpu-usage (CPU 0) last 60 minutes (SYSTEM)
Solution
High CPU usage might occur because of the following issues:
· Route flapping.
· Too many routing policies.
· Packet attack.
· Link loop.
To resolve the problem:
1. Execute the display route-policy command, and verify that the configured routing policies are reasonable.
<Sysname> display route-policy
Route-policy: policy1
permit : 1
if-match cost 10
continue: next node 11
apply comm-list a delete
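2. Execute the display interface command, and check for link loops. A sustained input or output rate close to the port bandwidth, as in the following output, might indicate a loop.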
<Sysname>display interface GigabitEthernet2/6/0/1
GigabitEthernet2/6/0/1 current state: UP
Line protocol current state: UP
IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0000-e80d-c000
Description: GigabitEthernet2/6/0/1 Interface
Loopback is not set
Media type is optical fiber, Port hardware type is 1000_BASE_SX_SFP
1000Mbps-speed mode, full-duplex mode
……
Last clearing of counters: Never
Peak value of input: 123241940 bytes/sec, at 2013-06-27 14:33:15
Peak value of output: 80 bytes/sec, at 2013-06-27 14:13:00
Last 300 seconds input: 26560 packets/sec 123241940 bytes/sec 99%
Last 300 seconds output: 0 packets/sec 80 bytes/sec 0%
……
If any loop occurs, verify the following:
¡ The link connections and port configuration are correct.
¡ STP is enabled, and the configuration is correct.
¡ The STP status of the neighboring device is normal.
¡ If all the previous configurations are correct, the reason might be:
- STP calculation error.
- STP calculation is correct, but the driver does not block a port.
You can do all of the following:
¡ Shut down the uplink port on the ring.
¡ Remove and insert the transceiver module into the port to restart STP calculation.
¡ Contact H3C Support.
3. If the problem persists, contact H3C Support.
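The utilization check in step 2 can be scripted across many interfaces. The following Python sketch extracts the percentages from the "Last 300 seconds" lines of saved display interface output. The function names and the 90% threshold are assumptions, and high utilization alone does not prove that a loop exists.

```python
import re

def utilization(display_interface_output):
    """Return {'input': percent, 'output': percent} from the 'Last N seconds' lines."""
    result = {}
    for line in display_interface_output.splitlines():
        match = re.search(r"Last \d+ seconds (input|output):.*?(\d+)%\s*$", line)
        if match:
            result[match.group(1)] = int(match.group(2))
    return result

def saturated_directions(display_interface_output, threshold=90):
    """Return the directions whose utilization is at or above threshold."""
    rates = utilization(display_interface_output)
    return [direction for direction in ("input", "output")
            if rates.get(direction, 0) >= threshold]

# Sample lines copied from the output in this section.
interface_sample = """Last 300 seconds input: 26560 packets/sec 123241940 bytes/sec 99%
Last 300 seconds output: 0 packets/sec 80 bytes/sec 0%"""

print(saturated_directions(interface_sample))
```

Interfaces that the sketch reports are the ones to inspect first for incorrect cabling or STP problems.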
High memory usage
Symptom
The memory usage on a card stays higher than 70%.
Use the display memory command to display the memory usage of a card.
<Sysname>display memory chassis 2 slot 2
The statistics about memory is measured in KB:
Chassis 2 Slot 2:
Total Used Free Shared Buffers Cached FreeRatio
Mem: 774280 591932 182348 0 0 6548 23.6%
-/+ Buffers/Cache: 175800 598480
Swap: 0 0 0
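The display memory output reports the free ratio rather than the usage. The following Python sketch derives the usage percentage from the Total and Used columns of the Mem row. The function name is hypothetical.

```python
def memory_usage_percent(display_memory_output):
    """Return used memory as a percentage of total memory, or None if no Mem row exists."""
    for line in display_memory_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "Mem:":
            total, used = int(fields[1]), int(fields[2])
            return 100.0 * used / total
    return None

# Sample row copied from the output in this section.
memory_sample = "Mem: 774280 591932 182348 0 0 6548 23.6%"

print(round(memory_usage_percent(memory_sample), 1))
```

For the sample row the usage is approximately 76.4%, which is above the 70% threshold and consistent with the 23.6% free ratio.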
Solution
To resolve the problem:
1. Execute the display process memory command multiple times to do the following:
¡ Display the memory usage for all user processes on a card.
¡ Identify the process for which memory usage is continuously increasing.
If the memory usage of a process is continuously increasing, the process might be leaking memory.
The Dynamic field shows the heap memory that is dynamically allocated to the process. Its value becomes large when memory is leaked.
<Sysname>display process memory chassis 2 slot 2
JID Text Data Stack Dynamic Name
1 168 604 24 64 scmd
2 0 0 0 0 [kthreadd]
3 0 0 0 0 [ksoftirqd/0]
……
78 112 9368 12 320 diagd
79 76 1040 8 8 mdcagentd
80 116 8860 8 16 fsd
81 140 992 16 212 dbmd
83 72 496 8 20 syslogd
84 168 41980 16 44 drvdiagd
85 172 17112 16 12 devd
94 112 8864 12 12 edev
……
The output shows that the process with JID 78 (diagd) uses the most dynamic memory.
2. Execute the display process memory heap command multiple times to do the following:
¡ Display heap memory usage for user process 78.
¡ Identify the memory block for which memory usage is continuously increasing.
If the memory usage of a memory block is continuously increasing, the memory might be leaked.
<Sysname>display process memory heap job 78 verbose
Heap usage:
Size Free Used Total Free Ratio
16 0 385 385 0.0%
24 2 49 51 3.9%
32 0 13 13 0.0%
40 0 7 7 0.0%
64 0 411 411 0.0%
72 0 4 4 0.0%
80 1 0 1 100.0%
96 1 0 1 100.0%
104 0 8 8 0.0%
136 0 8 8 0.0%
152 0 9 9 0.0%
184 0 1 1 0.0%
368 0 8 8 0.0%
3080 0 1 1 0.0%
8200 1 0 1 100.0%
29376 1 0 1 100.0%
Large Memory Usage:
Used Blocks : 24
Used Memory(in bytes): 2031616
Free Blocks : 0
Free Memory(in bytes): 0
Summary:
Total virtual memory heap space(in bytes) : 2113536
Total physical memory heap space(in bytes) : 454656
Total allocated memory(in bytes) : 2075736
3. Contact H3C Support.
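Step 1 asks you to compare several captures of the display process memory output. The following Python sketch finds the processes whose Dynamic value increases in every consecutive capture, assuming the six-column layout shown above. The function names are hypothetical, and the second and third captures are invented for illustration.

```python
def dynamic_by_process(output):
    """Return {(JID, process name): Dynamic value} from display process memory output."""
    usage = {}
    for line in output.splitlines():
        fields = line.split()
        # Process rows have six columns: JID Text Data Stack Dynamic Name.
        if len(fields) == 6 and fields[0].isdigit():
            usage[(int(fields[0]), fields[5])] = int(fields[4])
    return usage

def growing_processes(captures):
    """Return processes whose Dynamic value strictly increases across all captures."""
    parsed = [dynamic_by_process(capture) for capture in captures]
    if len(parsed) < 2:
        return []  # growth cannot be judged from a single capture
    growing = []
    for key in parsed[0]:
        values = [capture.get(key) for capture in parsed]
        if None not in values and all(a < b for a, b in zip(values, values[1:])):
            growing.append(key)
    return growing

capture_1 = """JID Text Data Stack Dynamic Name
1 168 604 24 64 scmd
78 112 9368 12 320 diagd
83 72 496 8 20 syslogd"""
# Hypothetical later captures in which only diagd keeps growing.
capture_2 = capture_1.replace("12 320 diagd", "12 352 diagd")
capture_3 = capture_1.replace("12 320 diagd", "12 400 diagd")

print(growing_processes([capture_1, capture_2, capture_3]))
```

A process that appears in the result is a candidate for the heap analysis in step 2. Normal workload changes can also increase dynamic memory, so collect captures over a long enough period.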
Insufficient resources
Symptom
The system displays the following log and trap information when resources are insufficient:
%Jul 26 20:43:11:218 2012 H3C DRV_L3/4/NO_RESOURCE: -MDC=1-Slot=3; Insufficient system resources!
%Jul 26 20:44:51:259 2012 H3C DRV_L3/4/NO_RESOURCE: -MDC=1-Slot=6; No enough resource!
%Jul 26 20:47:18:712 2012 H3C DRV_L3/4/NO_RESOURCE: -MDC=1-Slot=3; Not enough resources are available to complete the operation.
Solution
ACL resources
The following features use ACL resources:
· QoS.
· Packet filter.
· Priority mapping and trust.
· Mirror.
· Protocol packet to CPU.
· Selective QinQ and VLAN mapping.
· Port binding, PORTAL, and EAD.
· Broadcast suppression.
· MAC-BASED-VLAN, VOICE VLAN, RSPAN, and UDP-Helper.
To resolve the problem:
1. Use the display qos-acl resource command to display the ACL usage on a card.
<Sysname>display qos-acl resource chassis 2 slot 2
Interfaces: GE2/2/0/1 to GE2/2/0/24
---------------------------------------------------------------------
Type Total Reserved Configured Remaining Usage
---------------------------------------------------------------------
IN-MQC-CAR 8192 0 0 8192 0%
IN-COMM-CAR 7168 0 0 7168 0%
IN-COUNT 8192 0 166 8026 2%
OUT-MQC-CAR 8192 0 166 8026 2%
OUT-COUNT 8192 0 166 8026 2%
ACL-RES 2048 0 73 1975 3%
Interfaces: GE2/2/0/25 to GE2/2/0/48
---------------------------------------------------------------------
Type Total Reserved Configured Remaining Usage
---------------------------------------------------------------------
IN-MQC-CAR 8192 0 0 8192 0%
IN-COMM-CAR 7168 0 0 7168 0%
IN-COUNT 8192 0 166 8026 2%
OUT-MQC-CAR 8192 0 166 8026 2%
OUT-COUNT 8192 0 166 8026 2%
ACL-RES 2048 0 73 1975 3%
2. If most ACL resources are allocated, optimize ACL configuration. For example, delete or combine ACL rules. If the configuration cannot be optimized, contact H3C Support.
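To review ACL resource usage on many cards, you can parse saved display qos-acl resource output. The following Python sketch assumes the column layout shown above. The function name and the 80% threshold are assumptions, because this guide does not define a specific limit.

```python
def high_acl_usage(output, threshold=80):
    """Return (interface range, resource type, usage percent) at or above threshold."""
    findings = []
    interfaces = None
    for line in output.splitlines():
        fields = line.split()
        if line.strip().startswith("Interfaces:"):
            interfaces = line.split(":", 1)[1].strip()
        # Resource rows: Type Total Reserved Configured Remaining Usage
        elif len(fields) == 6 and fields[1].isdigit() and fields[5].endswith("%"):
            usage = int(fields[5].rstrip("%"))
            if usage >= threshold:
                findings.append((interfaces, fields[0], usage))
    return findings

# Sample rows copied from the output in this section.
acl_sample = """Interfaces: GE2/2/0/1 to GE2/2/0/24
Type Total Reserved Configured Remaining Usage
IN-COUNT 8192 0 166 8026 2%
ACL-RES 2048 0 73 1975 3%"""

print(high_acl_usage(acl_sample))
print(high_acl_usage(acl_sample, threshold=3))
```

With the sample data, no resource reaches the default threshold. Lowering the threshold to 3 reports the ACL-RES row, which shows how the result identifies both the interface range and the resource type.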
MAC resources
MAC address resources are easily exhausted in large Layer 2 networks, which contain a large number of MAC addresses. New MAC addresses cannot be learned until old MAC address entries age out.
To resolve the problem:
1. Display MAC addresses that have been learned.
<Sysname>display mac-address count
49 mac address(es) found
The output shows that the number of MAC addresses that have been learned is small.
2. H3C recommends that you do the following:
¡ Set a smaller MAC address aging time.
¡ Create VLANs by service or by department, and connect VLANs at Layer 3.
MPLS LSP resources
To resolve the problem:
1. Display MPLS LSP statistics.
<Sysname>display mpls lsp statistics
Lsp Type Total Ingress Transit Egress
STATIC LSP 0 0 0 0
STATIC CRLSP 0 0 0 0
LDP LSP 3 1 0 2
CRLDP CRLSP 0 0 0 0
RSVP CRLSP 0 0 0 0
BGP LSP 0 0 0 0
ASBR LSP 0 0 0 0
BGP IPV6 LSP 0 0 0 0
-------------------------------------------------------------------------
LSP 3 1 0 2
CRLSP 0 0 0 0
2. If MPLS LSP resources are insufficient, contact H3C Support.
Other system resources
Contact H3C Support.
Related commands
This section lists the commands that you might use for troubleshooting system management.
Command | Remarks |
---|---|
display cpu-usage | Displays CPU usage statistics and tasks with high CPU usage. |
display cpu-usage history | Displays the historical CPU usage statistics in charts. |
display interface | Displays information about a specific interface. |
display mac-address | Displays MAC address entries. |
display memory | Displays memory usage for a card. |
display mpls lsp statistics | Displays MPLS LSP statistics. |
display process memory | Displays memory usage for all user processes on a card. |
display process memory heap | Displays heap memory usage for a user process. |
display qos-acl resource | Displays QoS and ACL resource usage. |
display route-policy | Displays routing policy information. |