H3C S12500 Switch Series Troubleshooting Guide-R7128-6W100


H3C S12500 Switch Series (R7128) Troubleshooting Guide

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of Hangzhou H3C Technologies Co., Ltd.

The information in this document is subject to change without notice.

Contents

General troubleshooting procedures

Obtaining information

Obtaining log information

Obtaining other information

Troubleshooting procedure

Troubleshooting flowchart

Problem types

Problem locations and possible results

Common service recovering and troubleshooting methods

Troubleshooting hardware

Card failure

Symptom

Solution

Power supply failure

Temperature alarm

Symptom

Solution

Related commands

Troubleshooting links and ports

Error packets on a port

Symptom

Solution

A port fails to go up

Symptom

Solution

A port in up state goes down

Symptom

Solution

A port frequently goes up and down

Symptom

Solution

Transceiver module failures

Symptom

Solution

Related commands

Troubleshooting hardware forwarding

Forwarding path problem

Symptom

Solution

Online hardware diagnostic and failure protection

Related commands

Troubleshooting packet forwarding failure

Ping failure or packet loss

Symptom

Solution

Layer 2 forwarding failure

Symptom

Solution

Layer 3 forwarding failure

Symptom

Solution

MPLS forwarding failure

Symptom

Solution

Related commands

Troubleshooting IRF

IRF fabric establishment failure

Troubleshooting system management

High CPU usage

Symptom

Solution

High memory usage

Symptom

Solution

Insufficient resources

Symptom

Solution

Related commands

General troubleshooting procedures

Obtaining information

H3C recommends that you enable the information center by using the info-center enable command for fast troubleshooting. By default, the information center is enabled.

Obtaining log information

Log information includes log files, which record operation information, and diag files, which record device state information. The system stores these files on the CF card or in the flash memory.

You can export the log and diag files through FTP, TFTP, or USB. To distinguish the files exported from different MPUs, save them in separate folders, for example, folders named chassisXslotY.

Table 1 Log information classification

Category	File name	Content
log file	logfileX.log	Command executions, traps, and operational logs.
diag file	XXX.gz	Device state, CPU state, memory state, configuration state, software entries, and hardware entries.

Restrictions and guidelines

Follow these restrictions and guidelines to obtain log information:

· Record the displayed information during operations for future analysis.

· Understand the impact of each operation and make sure the configuration can be restored upon operation failures.

· Make sure the current configuration is consistent with the saved configuration. Do not save the configuration during an IRF split, a card fault, or a card reboot.

· After you perform an operation, wait for a while before you verify the results.

· Before you replace an MPU with a new MPU, make sure the new MPU has the same software version as the old MPU.

Obtaining log files

Use the logfile save command to save logs from the log file buffer to a log file on the CF card. Log files are stored on each of the following, so obtain them from all of these locations:

· The active and standby MPUs of a standalone device.

· The MPUs of IRF master and subordinate devices.

· The MDCs of a device.

<Sysname>logfile save

The contents in the log file buffer have been saved to the file cfa0:/logfile/logfile9.log.

Display log files on the active MPU.

<Sysname> dir cfa0:/logfile/

Directory of cfa0:/logfile

0 -rw- 5233116 Apr 27 2013 09:20:44 logfile1.log

1 -rw- 5142919 May 03 2013 14:15:42 logfile2.log

2 -rw- 5193287 May 09 2013 12:28:08 logfile3.log

1021808 KB total (259072 KB free)

Display log files on the standby MPU.

<Sysname> dir slot1#cfa0:/logfile/

Directory of slot1#cfa0:/logfile

0 -rw- 5242287 May 13 2013 16:47:46 logfile4.log

1 -rw- 5143837 May 24 2013 22:56:46 logfile5.log

2 -rw- 5149806 Jun 01 2013 13:43:26 logfile6.log

1020068 KB total (643264 KB free)

Display log files on the MPU of an IRF subordinate device. If the subordinate device has two MPUs, execute this command on each MPU.

<Sysname> dir chassis2#slot0#cfa0:/logfile/

Directory of chassis2#slot0#cfa0:/logfile

0 -rw- 5215316 Jun 03 2013 05:49:20 logfile7.log

1 -rw- 5235163 Jun 21 2013 07:31:54 logfile8.log

2 -rw- 3256492 Jun 26 2013 09:01:08 logfile9.log

1021808 KB total (773424 KB free)

Display log files on each MDC. The following shows the log file on MDC 3.

<Sysname>dir cfa0:/mdc/

Directory of cfa0:/mdc

0 drw- - Jul 10 2013 14:56:50 mdc2

1 drw- - Jul 10 2013 16:48:04 mdc3

2 drw- - Jul 10 2013 16:43:20 mdc4

<Sysname>dir cfa0:/mdc/mdc3/logfile/

Directory of cfa0:/mdc/mdc3/logfile

0 -rw- 8417 Jul 10 2013 18:17:46 logfile1.log

1020068 KB total (701636 KB free)
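To keep the files exported from different MPUs apart, the device paths shown above can be mapped to local chassisXslotY folders before the transfer. The following Python sketch is one way to do that, assuming the dir output format shown in the samples. The function names and the extra mdcN subfolder for MDC logs are hypothetical, and the transfer itself (FTP, TFTP, or USB) is not shown. A path without a chassis or slot prefix, such as cfa0:/logfile/, is assumed to be on the MPU whose chassis and slot numbers you pass in.

```python
import re

# Matches dir entries such as "0 -rw- 5233116 Apr 27 2013 09:20:44 logfile1.log".
# Directory entries ("drw-") show "-" in the size column.
DIR_ENTRY = re.compile(
    r"^\s*\d+\s+(?P<perm>[-d][-r][-w][-x])\s+(?:\d+|-)\s+"
    r"\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}\s+(?P<name>\S+)\s*$"
)


def local_folder(device_dir: str, default_chassis: int, default_slot: int) -> str:
    """Build a chassisXslotY[/mdcN] folder name from a device directory path."""
    prefix = re.match(r"^(?:chassis(\d+)#)?(?:slot(\d+)#)?", device_dir)
    chassis = int(prefix.group(1)) if prefix.group(1) else default_chassis
    slot = int(prefix.group(2)) if prefix.group(2) else default_slot
    folder = f"chassis{chassis}slot{slot}"
    mdc = re.search(r"/mdc/(mdc\d+)/", device_dir)  # e.g. cfa0:/mdc/mdc3/logfile/
    if mdc:
        folder += "/" + mdc.group(1)
    return folder


def plan_exports(device_dir: str, listing: str, default_chassis: int, default_slot: int):
    """Return (device file, local path) pairs for the regular files in a dir listing."""
    folder = local_folder(device_dir, default_chassis, default_slot)
    plan = []
    for line in listing.splitlines():
        entry = DIR_ENTRY.match(line)
        if entry and entry.group("perm").startswith("-"):  # skip directories
            name = entry.group("name")
            plan.append((device_dir.rstrip("/") + "/" + name, f"{folder}/{name}"))
    return plan
```

For example, passing the chassis2#slot0#cfa0:/logfile/ listing shown above yields local paths such as chassis2slot0/logfile7.log.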

Obtaining diag files

Execute the display diagnostic-information command, and enter "y" at the prompt to save the diagnostic information to a diag file on the CF card. If you enter "n", the information is displayed on the screen instead, and not all of the diagnostic information can be saved. The more cards the device has, the longer the saving operation takes. Do not execute any commands during the saving operation.

<Sysname>display diagnostic-information

Save or display diagnostic information (Y=save, N=display)? [Y/N]:y

Please input the file name(*.gz)[flash:/diag.gz]:cfa0:/diag.gz

Diagnostic information is outputting to cfa0:/diag.gz.

Save successfully.

<Sysname>dir cfa0:/

Directory of cfa0:

……

6 -rw- 898180 Jun 26 2013 09:23:51 diag.gz

1021808 KB total (259072 KB free)

You can also display the diagnostic information on the screen by executing the following commands, but H3C does not recommend this method. The screen-length disable command prevents the output from pausing after each screen.

<Sysname>screen-length disable

% Screen-length configuration is disabled for current user.

<Sysname>display diagnostic-information

Save or display diagnostic information (Y=save, N=display)? [Y/N]:n

==================================================================

===============display cpu===============

Chassis 2 Slot 0 CPU 0 CPU usage:

4% in last 5 seconds

0% in last 1 minute

0% in last 5 minutes

Chassis 2 Slot 0 CPU 1 CPU usage:

0% in last 5 seconds

0% in last 1 minute

0% in last 5 minutes

……

Obtaining other information

You might also need the following operational information:

· Problem symptom, time, topology, configuration information, measures, and results.

· Operation logs, captured packets, debugging information, and console port output collected during repeated reboots of MPUs or switching fabric cards.

· Alarms of cards, power supply, and fans.

Troubleshooting procedure

When the switch has a problem, do the following:

1. Obtain operation information.

2. Use the troubleshooting flowchart provided in "Troubleshooting flowchart" to determine the problem type.

3. Use the solution for the problem type to troubleshoot the switch.

If you cannot determine the problem, contact H3C Support.

Troubleshooting flowchart

Use the troubleshooting flowchart shown in Figure 1 to determine the problem type.

Figure 1 Troubleshooting flowchart

The following are commonly used troubleshooting methods:

· Collecting packet statistics on ports.

· Mirroring packets.

· Capturing packets.

· Configuring QoS policies to collect statistics.

· Enabling debugging functions.

· Replacing suspect hardware or installing it in another slot.

For example, if a transceiver module might be faulty, do one of the following:

¡ Replace the transceiver module with one that operates correctly.

¡ Install the transceiver module in another port.

If the card in a slot might be faulty, do one of the following:

¡ Replace the card with one that operates correctly.

¡ Install the card in another slot.

Problem types

Card failure

A card failure might result in the following symptoms:

· A card cannot start up.

· A card reboots unexpectedly.

· A card reboots again and again.

· A card is not in the correct state.

To troubleshoot a card failure, see "Card failure."

Power failure

A power failure might result in the following symptoms:

· Power LEDs are not in the correct states.

· Power alarm messages are displayed continuously.

To troubleshoot a power failure, see "Power supply failure."

Fan failure

A fan failure might result in the following symptoms:

· Fans do not operate.

· Fan LEDs are not in the correct states.

· Fan alarm messages are displayed continuously.

To troubleshoot a fan failure, see "Fan failure."

Temperature problem

If temperature alarm messages are displayed, the device might have a temperature problem. To troubleshoot a temperature problem, see "Temperature alarm."

Port failure

A port failure might result in the following symptoms:

· A port cannot come up.

· A port goes down and comes up frequently.

· The error packet counters on the port are not zero.

To troubleshoot a port failure, see "Troubleshooting links and ports."

Hardware forwarding failure

If log messages such as "Forwarding fault" or "Board fault: chassis X slot Y, please check it" are displayed, the device might have a hardware forwarding failure.

To troubleshoot a hardware forwarding failure, see "Troubleshooting hardware forwarding."

Packet forwarding failure

A packet forwarding failure might result in the following symptoms:

· Some ping packets are lost, or the ping operation fails.

· Some tracert packets are lost, or the tracert operation fails.

· Layer 2 frames are lost, or the Layer 2 link is down.

· Layer 3 packets are lost, or the Layer 3 connection is down.

· The MPLS service is not running correctly.

To troubleshoot a packet forwarding failure, see "Troubleshooting packet forwarding failure."

IRF failure

An IRF failure might result in the following symptoms:

· The IRF fabric cannot be formed.

· An IRF split occurs.

To troubleshoot an IRF failure, see "Troubleshooting IRF."

Overuse of CPU

If the CPU usage of the switch is too high, see "High CPU usage."

Overuse of memory

If the memory usage of the switch is too high, see "High memory usage."

Insufficient resources

If the "No enough resource" message is displayed, see "Insufficient resources."

Problem locations and possible results

Figure 2 shows a typical network model and the possible problem locations. For higher availability and quick switchover and restoration in response to failures, the network uses two upstream links and two core switches. Table 2 shows the possible symptoms and results of different problem locations.

Figure 2 Typical network model and the possible problem locations

Table 2 Problem locations and possible symptoms and results

Problem location	Possible symptoms	Possible results
1 (including transceivers)	A port is down.	A service switchover occurs.
1 (including transceivers)	Counts of packet errors are increased.	All services on the link are affected.
2	A card fails.	A service switchover occurs.
	A chip on a card fails while the card is operating correctly.	Services on the chip are affected. If a switching fabric module failure occurs, the whole device is affected.
	A software error occurs.	The device reboots and a service switchover occurs. If a protocol module has a problem, the service is usually affected.
3	Same as problem location 1.	Services on the access switch are affected. The scope of affected services is smaller than that of a problem at problem location 1.
4	The device is down.	Services on the device are affected.
	A chip on a card fails.	Some ports or all services on the device are affected.
	A software error occurs.	The device reboots and all services on the device are affected. If a protocol module has a problem, the service is usually affected.
5	Same as problem location 1.	Server services on the link are affected.
6	The network is operating correctly but a service is not.	The service on the server is affected.

Common service recovering and troubleshooting methods

Table 3 Common service recovering and troubleshooting methods

Failure category	Service recovering methods	Troubleshooting methods
Hardware	· Isolate the failed card. · Isolate the failed device by adjusting service traffic forwarding paths. For example, adjust the preferences for routes so traffic is switched to other paths.	Complete required tests on the backup hardware, and replace the failed hardware.
Software	· Reboot the protocols on the failed device. · Isolate the failed device by adjusting service traffic forwarding paths.	· Upgrade the software or install patches. · Adjust the network topology, or modify the configuration to remove the failures.
Link	Isolate the failed link by adjusting service traffic forwarding paths.	Remove link errors.
Others	· Correct configuration errors. · Connect the ports of the devices correctly. · Isolate the failed link by adjusting service traffic forwarding paths.	· Correct configuration errors. · Connect the ports of the devices correctly. · Repair the power and air conditioner systems for the devices.

Troubleshooting hardware

Card failure

Symptom

· A card enters an abnormal state: Absent, Fault, Off, Offline, or Illegal.

· A card fails to boot, or it reboots unexpectedly or repeatedly.

NOTE:

If the switch outputs log messages such as "Forwarding fault" or "Board fault: chassis X slot Y, please check it," see "Troubleshooting hardware forwarding."

How to identify a card state

A card can operate in Normal, Master, Standby, Absent, Fault, Off, Offline, or Illegal state:

· Normal—The card is operating correctly.

· Master—The card is an active MPU.

· Standby—The card is a standby MPU.

If the card is in Fault, Off, Offline, or Illegal state, or the slot in which the card is installed is in Absent state, the card might be faulty. See "Solution" to resolve the problem.

You can execute the display device command and check the Brd Status field for the card states. The following is a sample command output.

<Sysname> display device

Slot No. Brd Type Brd Status Subslot Num Sft Ver

1/0 LST1MRPNC1 Master 0 S12500-CMW710-R7128

1/1 LST1MRPNC1 Standby 0 S12500-CMW710-R7128

1/2 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/3 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/4 NONE Absent 0 NONE

1/5 NONE Absent 0 NONE

1/6 NONE Absent 0 NONE

1/7 NONE Absent 0 NONE

1/8 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128

1/9 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128

1/10 NONE Absent 0 NONE

1/11 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/12 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/13 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/14 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/15 NONE Absent 0 NONE

1/16 NONE Absent 0 NONE

1/17 NONE Absent 0 NONE

1/18 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128

1/19 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128

1/20 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/21 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/22 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/23 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/24 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/25 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/26 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/27 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/28 LST2SF18C1 Normal 0 S12500-CMW710-R7128
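On a chassis with many slots, scanning the Brd Status column by eye is error-prone. The following Python sketch, assuming the column layout of the sample display device output above, lists every slot in one of the states this guide treats as suspect. Because an empty slot is also reported as Absent, the hypothetical installed parameter lets you report Absent only for slots where a card is known to be physically installed.

```python
from __future__ import annotations

import re

# States that, per this guide, might indicate a faulty card.
SUSPECT_STATES = {"Absent", "Fault", "Off", "Offline", "Illegal"}

# Matches rows such as "1/0 LST1MRPNC1 Master 0 S12500-CMW710-R7128".
ROW = re.compile(
    r"^\s*(?P<slot>\d+(?:/\d+)?)\s+(?P<type>\S+)\s+(?P<status>\S+)\s+\d+\s+\S+\s*$"
)


def suspect_cards(output: str, installed: set[str] | None = None) -> dict[str, str]:
    """Map slot -> Brd Status for every card in a suspect state.

    If installed is given, an Absent slot is reported only when it is listed
    there, because empty slots are also shown as Absent.
    """
    suspects = {}
    for line in output.splitlines():
        row = ROW.match(line)
        if not row or row.group("status") not in SUSPECT_STATES:
            continue
        slot, status = row.group("slot"), row.group("status")
        if status == "Absent" and installed is not None and slot not in installed:
            continue
        suspects[slot] = status
    return suspects
```

The header line is skipped automatically because it does not start with a slot number.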

How to confirm a card reboot

Use the display version command, or check the log files for the card uptime, to determine whether a card has rebooted. If the uptime of a card is shorter than that of the other cards, the card has rebooted. See "Solution" to resolve the problem.

<Sysname>display version

H3C Comware Software, Version 7.1.034, Release 7129

H3C S12518 uptime is 0 weeks, 0 days, 0 hours, 57 minutes

Last reboot reason : User reboot

Boot image: cfa0:/S12500-CMW710-BOOT-R7129.bin

Boot image version: 7.1.034P04, Release 7129

System image: cfa0:/S12500-CMW710-SYSTEM-R7129.bin

System image version: 7.1.034, Release 7129

LST1MRPNC1 2/0: uptime is 0 weeks, 0 days, 0 hours, 57 minutes

Last reboot reason : User reboot

3456 Mbytes SDRAM

1024 Kbytes NVRAM Memory

Type : LST1MRPNC1

BootRom : 2.19

Software : S12500-CMW710-R7129

PCB : Ver.A

Board Cpu:

Number of Cpld: 2

Cpld 0:

SoftWare : 002

Cpld 1:

SoftWare : 002

PowChipA : 001A

CpuCard

Type : LSR1CPA

PCB : Ver.C

Number of Cpld: 1

Cpld 0:

SoftWare : 001

BootRom : 2.12

Mbus card

Type : LSR1MBCB

Software : 115

PCB : Ver.B

……
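Comparing uptimes across many cards can be scripted. The following Python sketch converts each per-card "uptime is ... weeks, ... days, ... hours, ... minutes" line into minutes and reports cards whose uptime is noticeably shorter than the longest one. It assumes that the LPU lines elided above (……) use the same format as the MPU line shown. The 10-minute tolerance is an arbitrary, hypothetical value. A short uptime caused by a user reboot is still reported, so check the Last reboot reason field as well.

```python
import re

# Matches per-card lines such as
# "LST1MRPNC1 2/0: uptime is 0 weeks, 0 days, 0 hours, 57 minutes".
CARD_UPTIME = re.compile(
    r"^(?P<type>\S+) (?P<slot>\d+(?:/\d+)*): uptime is "
    r"(?P<w>\d+) weeks?, (?P<d>\d+) days?, (?P<h>\d+) hours?, (?P<m>\d+) minutes?"
)


def card_uptimes(output: str) -> dict:
    """Return {slot: uptime in minutes} for every card line in display version output."""
    uptimes = {}
    for line in output.splitlines():
        m = CARD_UPTIME.match(line.strip())
        if m:
            days = int(m.group("w")) * 7 + int(m.group("d"))
            uptimes[m.group("slot")] = (days * 24 + int(m.group("h"))) * 60 + int(m.group("m"))
    return uptimes


def likely_rebooted(output: str, tolerance_min: int = 10) -> list:
    """Slots whose uptime is shorter than the longest card uptime by more than tolerance_min."""
    uptimes = card_uptimes(output)
    if not uptimes:
        return []
    longest = max(uptimes.values())
    return sorted(slot for slot, up in uptimes.items() if longest - up > tolerance_min)
```

The device-level line ("H3C S12518 uptime is ...") is ignored because it has no slot number followed by a colon.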

Solution

In Absent state

To resolve the problem:

1. Verify that the card is fully seated. You can remove and reinstall the card to make sure the card is installed securely.

2. Do the following:

¡ Install this card into another slot.

¡ Install a card that operates correctly into this slot to determine whether the original card is faulty.

3. Verify that the LEDs on the card panel and inside the card do not indicate any fault.

4. If the card is an MPU or switching fabric module, connect the card to a terminal through a serial cable to verify that the card boots correctly.

5. If the card is confirmed to be faulty, replace the card and contact H3C Support.

In Off state

Determine whether a user powered off the card by using the power-supply off command.

¡ If they did, power on the card by using the power-supply on command.

¡ If they did not, the power supply of the card is faulty. Replace the card and contact H3C Support.

In Fault state

To resolve the problem:

1. Wait a period of time and determine whether the card remains in Fault state or reboots after becoming Normal. If the card reboots after becoming Normal, contact H3C Support.

2. Verify that the card boots correctly.

¡ For an MPU or switching fabric module, connect the card to a terminal through a serial cable to verify that the card boots correctly. If a DRAM test fails, causing repeated reboots (as shown in the following), verify that the DRAM is installed securely.

readed value is 55555555 , expected value is aaaaaaaa

DRAM test fails at: 080ffff8

Fatal error! Please reboot the board.

¡ For an LPU, verify that the system working mode supports the card type.

Use the display system-working-mode command to display the system operating mode:

<Sysname> display system-working-mode

The current system working mode is routee.

The next system working mode is routee

If the current system operating mode does not support the card, the switch generates log messages similar to the following:

%Jun 26 10:13:04:006 2013 H3C SYSM/1/DRV_SYSM_PROMPT: -MDC=1;

This is not hardware fault, please change mode by command 'system-working-mode' in system view.

%Jun 26 10:13:04:006 2013 H3C SYSM/1/DRV_SYSM_PROMPT: -MDC=1;

chassis 2 slot 2 is an EB type board, and it supports Standard working mode only.

%Jun 26 10:13:04:006 2013 H3C SYSM/1/DRV_SYSM_PROMPT: -MDC=1;

ERROR!!! chassis 2 slot 2 doesn't support the current system working mode, board rebooting!

The output shows that the EB card is not supported in Routee mode.

If you determine that the current system operating mode does not support the card, use the system-working-mode command to modify the system operating mode. Then save the configuration. The new operating mode takes effect after the switch reboots.

[Sysname]system-working-mode standard

Do you want to change the system working mode? [Y/N]:y

The system working mode is changed, please save the configuration and reboot the system to make it effective.

[Sysname]save

The current configuration will be written to the device. Are you sure? [Y/N]:y

Please input the file name(*.cfg)[cfa0:/ali0207-V7.cfg]

(To leave the existing filename unchanged, press the enter key):

cfa0:/ali0207-V7.cfg exists, overwrite? [Y/N]:y

Validating file. Please wait...

Saved the current configuration to mainboard device successfully.

3. Install the card into another slot to determine whether the card is faulty.

4. If the card is confirmed to be faulty, replace the card and contact H3C Support.

In Offline state

To resolve the problem:

1. Determine whether a user isolated the card from the system by using the board-offline command. If the card is isolated due to this operation, use the undo board-offline command to remove the configuration. A card is also isolated from the system when POST is performed.

2. If an LPU is isolated from the system, the online diagnostic module might have detected a fault on the LPU. Execute the display hardware-failure-detection command, and check for records at the time when the card was isolated. If the LPU is faulty, replace the LPU and contact H3C Support.

<Sysname>display hardware-failure-detection

Current level:

chip : isolate

board : isolate

forwarding : isolate

---------------------Chassis 2, Slot 0 executed records:-------------------

Chassis 2, Slot 6:

1. 2013-06-26, 09:49:15 some auto-down ports on this slot are down by forwarding detection.

---------------------Chassis 2, Slot 0 trapped records:--------------------

Chassis 1, Slot 3:

1. 2013-06-20, 15:17:44 warned by forwarding detection.

Chassis 2, Slot 6:

1. 2013-06-26, 09:52:22 warned by forwarding detection.

3. If switching fabric modules are isolated from the system, the system might have detected forwarding-plane failures and generated log messages such as "Forwarding fault" or "Board fault: chassis X slot Y, please check it." Determine whether the failure is removed after the switching fabric modules are isolated. You can execute the display hardware-failure-detection command to display hardware failure detection and fix information.

¡ If one switching fabric module is isolated and the forwarding-plane failure is removed after the isolation, the switching fabric module is faulty. Replace the switching fabric module and contact H3C Support. If the forwarding-plane failure persists after the switching fabric module is isolated, the switching fabric module is not faulty, because an isolated switching fabric module does not participate in traffic forwarding. (The online diagnostic module might misjudge the faulty component when multiple failure points exist.) Use the undo board-offline command to bring the switching fabric module back online. Then see "Troubleshooting hardware forwarding" to resolve the problem, and contact H3C Support.

¡ If multiple switching fabric modules are isolated, the LPUs might be faulty. See "Troubleshooting hardware forwarding" to resolve the problem, and contact H3C Support.
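When checking for records at the time a card was isolated or rebooted, it can help to filter the display hardware-failure-detection output by time. The following Python sketch assumes the section, card, and record layout of the sample output above ("Chassis X, Slot Y executed/trapped records", then "Chassis X, Slot Y:", then numbered records with a date and time). The function name and the returned tuple layout are hypothetical.

```python
import re
from datetime import datetime

SECTION = re.compile(r"Chassis \d+, Slot \d+ (?P<kind>\w+) records:")
CARD = re.compile(r"^Chassis (?P<chassis>\d+), Slot (?P<slot>\d+):$")
RECORD = re.compile(
    r"^\d+\. (?P<date>\d{4}-\d{2}-\d{2}), (?P<time>\d{2}:\d{2}:\d{2}) (?P<text>.+)$"
)


def records_in_window(output: str, start: datetime, end: datetime) -> list:
    """Return (kind, chassis/slot, timestamp, text) for records between start and end."""
    kind = card = None
    hits = []
    for raw in output.splitlines():
        line = raw.strip()
        section = SECTION.search(line)  # "executed" or "trapped" section header
        if section:
            kind, card = section.group("kind"), None
            continue
        header = CARD.match(line)  # card that the following records belong to
        if header:
            card = f"{header.group('chassis')}/{header.group('slot')}"
            continue
        record = RECORD.match(line)
        if record and card is not None:
            stamp = datetime.strptime(
                f"{record.group('date')} {record.group('time')}", "%Y-%m-%d %H:%M:%S"
            )
            if start <= stamp <= end:
                hits.append((kind, card, stamp, record.group("text")))
    return hits
```

Applied to the sample output with a window of 09:00 to 10:00 on 2013-06-26, this returns the executed and trapped records for chassis 2, slot 6, but not the 2013-06-20 record for chassis 1, slot 3.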

In Illegal state

To resolve the problem:

1. Verify that the switch supports the card.

2. Verify that the switch software version supports the card. New cards cannot boot on an earlier software version. Upgrade the software version to support the new cards.

3. Insert the card into another slot to determine whether the card is faulty.

4. If the problem persists, replace the card and contact H3C Support.

Unexpected reboot

An unexpected reboot means that a card that is currently in Normal state has rebooted unexpectedly.

1. View the log messages, or execute the display version command to determine the period during which the card rebooted. Then determine whether a user rebooted the card by using the reboot command or by powering off and then powering on the card during the period.

2. On a switch running 18XX or a later version, the display version command output shows the reason for the last reboot. Check the Last reboot reason field for the event that caused the last reboot. In the following example, User reboot indicates that a user rebooted the switch.

<Sysname>display version

H3C Comware Software, Version 7.1.034, Release 7129

H3C S12518 uptime is 0 weeks, 0 days, 0 hours, 5 minutes

Last reboot reason : User reboot

Boot image: cfa0:/S12500-CMW710-BOOT-R7129.bin

Boot image version: 7.1.034P04, Release 7129

System image: cfa0:/S12500-CMW710-SYSTEM-R7129.bin

System image version: 7.1.034, Release 7129

LST1MRPNC1 2/0: uptime is 0 weeks, 0 days, 0 hours, 5 minutes

Last reboot reason : User reboot

3456 Mbytes SDRAM

1024 Kbytes NVRAM Memory

Type : LST1MRPNC1

BootRom : 2.19

Software : S12500-CMW710-R7129

PCB : Ver.A

Board Cpu:

Number of Cpld: 2

Cpld 0:

SoftWare : 002

Cpld 1:

SoftWare : 002

PowChipA : 001A

……

3. If all cards rebooted simultaneously, verify the following:

¡ The power supplies operate correctly.

¡ The power source is not powered off.

¡ The power cables are connected securely.

4. Verify that no log messages such as "Slot X need to be rebooted automatically!" are generated during the card reboot. If such a message is displayed, replace the card and contact H3C Support.

5. Verify that the message "Hardware error" is not displayed. If the message is displayed, view the error code:

¡ If the error code is in the range of 0 to 31, or is 100 or greater, the power supply of the card is faulty. Replace the card and contact H3C Support.

¡ For other error codes, contact H3C Support.

%Jul 7 18:10:50:890 2012 H3C DIAG/1/ALERT: -MDC=1; Hardware error! slot=6, code=0

%Jul 7 18:10:50:890 2012 H3C DIAG/1/ALERT: -MDC=1; Hardware error! slot=6, code=1

%Jul 7 18:10:50:890 2012 H3C DIAG/1/ALERT: -MDC=1; Hardware error! slot=6, code=2

6. Execute the display hardware-failure-detection command. Verify that there is no card reboot record in the determined reboot period in the command output. If there is a card reboot record in the determined period, contact H3C Support.

7. If the problem persists, contact H3C Support.
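The error-code rule in step 5 can be applied to exported log files with a short script. The following Python sketch assumes the "Hardware error! slot=N, code=N" message text shown in the sample log lines. The action strings and function name are hypothetical labels for the two outcomes described in step 5.

```python
import re

HW_ERROR = re.compile(r"Hardware error! slot=(?P<slot>\d+), code=(?P<code>\d+)")

CARD_POWER_FAULT = "card power supply faulty: replace the card and contact H3C Support"
OTHER_ERROR = "contact H3C Support"


def classify_hardware_errors(log_text: str) -> list:
    """Return (slot, code, action) for every "Hardware error" message in log_text."""
    results = []
    for m in HW_ERROR.finditer(log_text):
        code = int(m.group("code"))
        # Codes 0 through 31, or 100 and above, indicate a faulty card power supply;
        # any other code needs analysis by H3C Support.
        action = CARD_POWER_FAULT if code <= 31 or code >= 100 else OTHER_ERROR
        results.append((int(m.group("slot")), code, action))
    return results
```

For the three sample messages above (slot 6, codes 0, 1, and 2), every entry is classified as a card power supply fault.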

Power supply failure

Symptom

The power LED on the switch indicates a failure. An alarm is generated, indicating that a power supply or power monitoring unit (PMU) is faulty, as shown in the following example:

%Jun 26 10:13:46:233 2013 H3C DEV/2/POWER_MONITOR_FAILED: -MDC=1; Power monitor unit 1 failed.

%Jun 27 18:10:50:890 2013 H3C DEVD/4/DRV_DEV_PSU_CHANGED: -MDC=1; Chassis 1: PSU ID may be changed, please check it!
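Power, fan, and temperature alarms all share the same log layout, so exported log files can be filtered with one parser. The following Python sketch infers the fields (timestamp, sysname, MODULE/level/MNEMONIC, an optional "-MDC=n;" prefix, and the message text) from the sample messages in this guide. It assumes that the level number follows the information center convention, in which a lower number means a more severe message. The default threshold of 3 is an arbitrary choice.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Matches messages such as
# "%Jun 26 10:13:46:233 2013 H3C DEV/2/POWER_MONITOR_FAILED: -MDC=1; Power monitor unit 1 failed."
LOG_LINE = re.compile(
    r"^%(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}:\d{3}) (?P<year>\d{4}) "
    r"(?P<host>\S+) (?P<module>\w+)/(?P<level>\d)/(?P<mnemonic>\w+): "
    r"(?:-MDC=(?P<mdc>\d+); )?(?P<text>.*)$"
)


class LogEntry(NamedTuple):
    module: str
    level: int
    mnemonic: str
    mdc: Optional[int]
    text: str


def parse_log(line: str) -> Optional[LogEntry]:
    """Parse one log line, or return None if it does not match the expected layout."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    mdc = int(m.group("mdc")) if m.group("mdc") else None
    return LogEntry(m.group("module"), int(m.group("level")), m.group("mnemonic"), mdc, m.group("text"))


def alarms(lines: Iterable[str], max_level: int = 3) -> List[LogEntry]:
    """Keep parsed entries whose level number is max_level or lower (more severe)."""
    entries = (parse_log(line) for line in lines)
    return [e for e in entries if e is not None and e.level <= max_level]
```

With the default threshold, POWER_MONITOR_FAILED (level 2) is kept and DRV_DEV_PSU_CHANGED (level 4) is dropped.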

Solution

To resolve the problem:

1. Verify that the power supply or PMU is securely installed and that the power supply or PMU LEDs do not indicate any failure. If LEDs of the power supply or PMU indicate any failure, remove and reinstall the power supply or PMU to make sure the module is installed securely. You can also determine whether the power supply or PMU is faulty by exchanging it with another one that runs correctly.

2. Execute the display power-supply verbose command to display the power supply information.

¡ If the power supply and PMU are installed securely but the power supply status field is empty or Absent, a failure occurs. The fault cause is displayed following the status field:

- If the cause is Under-vol, the power supply might not be connected to a power cord, or the external power source might have a poor connection.

- For other causes, remove and reinstall the power supply to make sure the power supply is installed securely. You can also determine whether the power supply is faulty by exchanging it with another one that runs correctly.

¡ Verify that the PMU information (System power monitoring unit in the command output) is displayed correctly. If the PMU information fails to be displayed, remove and reinstall the PMU, and determine whether the PMU is faulty by exchanging it with another one that runs correctly.

3. Verify that the card power states are On. For a card that is installed securely in a slot, do one of the following, depending on the state of the card:

¡ In Absent state—See "In Absent state" to remove the failure.

¡ In Wait state—The system power is insufficient, and the card is waiting to be powered on. Verify that the power source and the power supplies run correctly.

¡ In Off state—The card has been powered off by a user, by over-temperature protection, or because of a power supply failure, and it does not power on automatically. See "In Off state" to resolve the problem.

4. If a power supply or PMU is faulty, replace the module. If the problem persists, contact H3C Support.

The following is a sample output of the display power-supply command:

<Sysname>display power-supply

Power info on chassis 0:

PSU 1/1 state: Normal

PSU 1/2 state: Normal

PSU 1/3 state: Normal

PSU 1/4 state: Normal

PSU 1/5 state: Normal

PSU 1/6 state: Normal

PSU 2/1 state: Normal

PSU 2/2 state: Normal

PSU 2/3 state: Normal

PSU 2/4 state: Normal

PSU 2/5 state: Normal

PSU 2/6 state: Normal

<Sysname>display power-supply verbose

Power info on chassis 0:

System power-supply policy: enable

System power-module redundant(configured): 1

System power usable: 22000 Watts

System power redundant(actual): 2000 Watts

System power allocated: 7350 Watts

System power available: 14650 Watts

SYSTEM POWER USED(CURRENT): 4959.21 Watts

System power monitoring unit 1:

Software version: 107

System power monitoring unit 2:

Software version: 107

Type In/Out Rated-Vol(V) Existing Usable Redundant(actual)

---------- ------ ------------ -------- ------ -----------------

PSE9000-A AC/DC 220(default) 12 11 1

DC output voltage information:

Tray Value(V) Upper-Threshold(V) Lower-Threshold(V) Status

---- -------- ------------------ ------------------ -------

1 50.08 51.00 49.00 Normal

2 50.10 51.00 49.00 Normal

DC output current information:

Total current(A): 99.00

Branch Value(A)

------ --------

1/1 9.20

1/2 8.00

1/3 8.40

1/4 7.40

1/5 9.00

1/6 7.60

2/1 7.60

2/2 9.00

2/3 7.60

2/4 7.60

2/5 9.00

2/6 8.60

PSU Status:

ID Status Input-Err Output-Err High-Temperature Fan-Err Closed Current-Limit

--- ------- ----------- ---------- ---------------- ------- ------ -------------

1/1 Normal

1/2 Normal

1/3 Normal

1/4 Normal

1/5 Normal

1/6 Normal

2/1 Normal

2/2 Normal

2/3 Normal

2/4 Normal

2/5 Normal

2/6 Normal

Line-card power status:

Slot Board-Type Watts Status

---- --------------- ----- ------

2 LST1XP8LEB1 280 On

3 LST1XP8LEB1 280 On

4 LST1XP8LEB1 280 On

5 LST1XP8LEB1 280 On

6 LST1XP8LEB1 280 On

7 LST1XP8LEB1 280 On

8 LST1XP8LEB1 280 On

9 LST1XP8LEB1 280 On

10 LST1XP8LEB1 280 On

11 LST1XP8LEB1 280 On

12 LST1XP8LEB1 240 On

13 LST1XP8LEB1 280 On

14 LST1XP8LEB1 240 On

15 LST1XP8LEB1 240 On

16 LST1XP8LEB1 280 On

17 LST1XP8LEB1 280 On

18 LST1XP8LEB1 280 On

19 LST1XP8LEB1 280 On
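Checks 2 and 3 above can be applied to the verbose output with a short script. The following Python sketch assumes the section layout of the sample display power-supply verbose output. It reports every PSU whose status is not Normal (including an empty status field) and every line card whose power status is not On, such as Wait or Off. Other sections, such as the DC output current table, are skipped. The function name is hypothetical.

```python
import re

PSU_ROW = re.compile(r"^(?P<id>\d+/\d+)(?:\s+(?P<status>\S+))?")
CARD_ROW = re.compile(r"^(?P<slot>\d+)\s+(?P<type>\S+)\s+(?P<watts>\d+)\s+(?P<status>\S+)$")


def check_power(output: str):
    """Return ({psu_id: status}, {slot: power status}) for entries that are not healthy."""
    section = None
    bad_psus, bad_cards = {}, {}
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("PSU Status:"):
            section = "psu"
        elif line.startswith("Line-card power status:"):
            section = "card"
        elif line.endswith(":"):
            section = None  # any other section header, e.g. "DC output current information:"
        elif section == "psu":
            m = PSU_ROW.match(line)
            if m and m.group("status") != "Normal":
                bad_psus[m.group("id")] = m.group("status") or "(empty)"
        elif section == "card":
            m = CARD_ROW.match(line)
            if m and m.group("status") != "On":
                bad_cards[m.group("slot")] = m.group("status")
    return bad_psus, bad_cards
```

For the sample output above, both dictionaries are empty because every PSU is Normal and every line card is On.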

Fan failure

Symptom

The fan tray LEDs indicate a failure. A fan error message is displayed on the switch, as shown in the following example:

%Jun 26 10:12:24:805 2013 H3C DEV/3/FAN_ABSENT: -MDC=1; Chassis 2 Fan 2 is absent.

%Jun 26 10:12:32:805 2013 H3C DEVD/2/DRV_DEV_FAN_CHANGE: -MDC=1; Chassis 2: Fan communication state changed: Fan 1 changed to fault.

%Jun 26 10:12:42:405 2013 H3C DEV/2/FAN_FAILED: -MDC=1; Chassis 2 Fan 1 failed.

Solution

To resolve the problem:

1. Hold your hand near the air outlet to verify that air is being exhausted. If no air is exhausted from the outlet, the fans are faulty.

2. Verify that the airflow is not blocked at the air inlet and outlet.

3. Verify that the fan tray is securely installed. You can remove and reinstall the fan tray to make sure that the fan tray is securely installed.

4. Verify that the status of each fan is normal and that the speed difference between the fans does not exceed 50%. Execute the display fan verbose command to display detailed information about the fans. If there is an abnormality, verify that the fan tray is not faulty by exchanging it with another one that runs correctly.

5. If the problem persists, replace the fan tray. If no replacement fan tray is available, power off the switch to avoid damage caused by high temperatures. You can continue to use the switch temporarily if cooling measures keep its operating temperature below 50°C (122°F).

<Sysname>display fan verbose

Fan-tray verbose state on chassis 0:

Fan-tray 1:

Software version: 108

Hardware version: Ver.A

CPLD version: 002

Fan number: 12

Temperature: 27 ℃

High temperature alarm threshold: 60 ℃

Low speed alarm threshold: 1450 rpm

Fan Status Speed(rpm)

--- ---------- ----------

1 normal 3780

2 normal 3780

3 normal 3720

4 normal 3840

5 normal 3900

6 normal 3660

7 normal 3780

8 normal 3840

9 normal 3660

10 normal 2940

11 normal 2940

12 normal 2880

Fan-tray 2:

Software version: 108

Hardware version: Ver.A

CPLD version: 002

Fan number: 12

Temperature: 21 ℃

High temperature alarm threshold: 60 ℃

Low speed alarm threshold: 1450 rpm

Fan Status Speed(rpm)

--- ---------- ----------

1 normal 3720

2 normal 3720

3 normal 3780

4 normal 3660

5 normal 3660

6 normal 3720

7 normal 3660

8 normal 3660

9 normal 3660

10 normal 2820

11 normal 2820

12 normal 2760
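The fan check in step 4 can also be scripted against saved display fan verbose output. The following Python sketch is illustrative only: the function name check_fan_tray is hypothetical, it parses only the per-fan rows shown above, and it assumes that the speed difference is measured as (fastest - slowest) / fastest, because the guide does not define the base of the 50% figure.

```python
import re

# Per-fan rows of display fan verbose output, for example "1 normal 3780".
FAN_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def check_fan_tray(output: str, max_spread: float = 0.5) -> list:
    """Return a list of problems found in the fan rows of one fan tray.

    Assumption: the speed difference is (fastest - slowest) / fastest.
    """
    problems = []
    speeds = []
    for line in output.splitlines():
        m = FAN_ROW.match(line)
        if not m:
            continue  # skip version, temperature, header, and separator lines
        fan_id, status, rpm = m.group(1), m.group(2), int(m.group(3))
        if status != "normal":
            problems.append(f"fan {fan_id} status is {status}")
        speeds.append(rpm)
    if speeds:
        fastest = max(speeds)
        if fastest == 0:
            problems.append("no fan is spinning")
        else:
            spread = (fastest - min(speeds)) / fastest
            if spread > max_spread:
                problems.append(f"speed spread {spread:.0%} exceeds {max_spread:.0%}")
    return problems
```

For Fan-tray 1 above, the spread is (3900 - 2880) / 3900, about 26%, so the function reports no problems.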

Temperature alarm

Symptom

A low-temperature or high-temperature alarm is generated on the switch, as shown in the following example:

%Jun 26 10:13:46:233 2013 H3C DEV/4/TEMPERATURE_WARNING: -MDC=1; Temperature is greater than warning upper limit on Chassis 1 slot 2 sensor inflow 1.

Solution

To resolve the problem:

1. Verify that the ambient temperature is within the allowed range. If the temperature is too high, find the cause, for example, poor ventilation in the equipment room or a faulty air conditioner.

2. Verify that the current temperature of the switch is within the lower limit and the warning and alarm thresholds. A card might be damaged if it operates continuously at a high temperature. You can touch the card to feel its temperature, or execute the display environment command to display temperature information.

○ If the temperature is too high, see "Fan failure" to determine whether a fan failure causes the problem.

○ If the Temperature field displays error or an abnormal value, the switch might fail to access the card temperature sensor through the I2C bus. The switch accesses the transceiver modules through the same I2C bus, so verify whether transceiver module information is displayed correctly. If the switch can access the transceiver modules, use the temperature-limit command to reconfigure the temperature thresholds. Then use the display environment command to verify that the setting has taken effect.

[Sysname]temperature-limit chassis 2 slot 0 hotspot 1 -20 85 90

<Sysname>display environment

System temperature information (degree centigrade):

-------------------------------------------------------------------------------

Slot Sensor Temperature LowerLimit WarningLimit AlarmLimit ShutdownLimit

2/0 inflow 1 35 -25 70 85 N/A

2/0 outflow 1 40 -20 80 85 N/A

2/0 hotspot 1 43 -20 85 90 N/A

2/2 inflow 1 39 -20 70 85 N/A

2/2 outflow 1 40 -10 80 90 N/A

2/2 hotspot 1 41 -10 80 90 N/A

2/3 inflow 1 41 -20 70 85 N/A

2/3 outflow 1 57 15 80 85 N/A

2/3 hotspot 1 41 -20 75 80 N/A

2/3 hotspot 2 50 0 75 80 N/A

2/4 inflow 1 43 -20 70 85 N/A

2/4 outflow 1 60 15 80 85 N/A

2/4 hotspot 1 43 -20 75 80 N/A

2/4 hotspot 2 54 0 75 80 N/A

3. If the problem persists, contact H3C Support.
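Comparing each sensor reading with its thresholds in step 2 is mechanical, so a short Python sketch can flag sensors in saved display environment output. It assumes the column order shown in the sample output above (Slot, Sensor, Temperature, LowerLimit, WarningLimit, AlarmLimit, ShutdownLimit) and treats a reading greater than a limit as crossing it, matching the wording of the TEMPERATURE_WARNING log. The function name classify_sensors is hypothetical.

```python
import re

# One data row of display environment output, for example:
# 2/0 inflow 1 35 -25 70 85 N/A
ROW = re.compile(
    r"^(?P<slot>\d+/\d+)\s+(?P<sensor>[a-z]+ \d+)\s+(?P<temp>-?\d+)\s+"
    r"(?P<lower>-?\d+)\s+(?P<warning>-?\d+)\s+(?P<alarm>-?\d+)\s+\S+$"
)


def classify_sensors(output: str) -> dict:
    """Map (slot, sensor) to normal, warning, alarm, or below lower limit."""
    states = {}
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header, separator, or blank line
        temp = int(m["temp"])
        if temp < int(m["lower"]):
            state = "below lower limit"
        elif temp > int(m["alarm"]):
            state = "alarm"
        elif temp > int(m["warning"]):
            state = "warning"
        else:
            state = "normal"
        states[(m["slot"], m["sensor"])] = state
    return states
```

Against the sample output above, every sensor is reported as normal.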

Related commands

This section lists the commands that you might use for troubleshooting hardware.

Command	Description
display device	Displays device information, including the card states.
display environment	Displays the temperature statistics of the device, including the current temperature and temperature thresholds.
display fan	Displays the operating states of fans.
display hardware-failure-detection	Displays hardware failure detection and rectification information, including the rectification actions for each failure and historical records of the last ten fault rectifications on each card.
display power-supply	Displays power supply information: · Enabled/disabled status of the power supply management function. · Power supply type, rated input voltage, and rated output power. · Number of redundant power supplies and the available, redundant, used, and remaining power of each power supply. · Status of the installed power supplies. · Power supply status of the LPUs.
display system-working-mode	Displays the current system operating mode.
display version	Displays system version information, card running time, and cause of the last reboot.
save	Saves the running configuration to a specific configuration file.
system-working-mode	Sets the system operating mode to modify the hardware resources allocation. The command takes effect after the configuration is saved and the device reboots.
temperature-limit	Sets the temperature alarm thresholds for the device.

Troubleshooting links and ports

This section provides troubleshooting information for common problems with links and ports.

Error packets on a port

Symptom

The display interface command output shows that the error packet count of a port is not 0.

<Sysname> display interface GigabitEthernet1/8/0/1

GigabitEthernet1/8/0/1 current state: UP

Line protocol current state: UP

IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: b8af-67bc-24fa

Description: GigabitEthernet1/8/0/1 Interface

Loopback is not set

Media type is twisted pair, Port hardware type is 1000_BASE_T

1000Mbps-speed mode, full-duplex mode

Link speed type is autonegotiation, link duplex type is autonegotiation

Flow-control is not enabled

The Maximum Frame Length is 9216

Allow jumbo frame to pass

Broadcast MAX-ratio: 100%

Multicast MAX-ratio: 100%

Unicast MAX-ratio: 100%

PVID: 999

Mdi type: automdix

Port link-type: access

Tagged Vlan: none

UnTagged Vlan: 999

Port priority: 2

Last clearing of counters: Never

Peak value of input: 70 bytes/sec, at 2013-03-19 13:04:15

Peak value of output: 210 bytes/sec, at 2013-03-19 13:04:15

Last 300 seconds input: 0 packets/sec 70 bytes/sec 0%

Last 300 seconds output: 0 packets/sec 210 bytes/sec 0%

Input (total): 693897 packets, 72834962 bytes

22196 unicasts, 584504 broadcasts, 87197 multicasts, - pauses

Input (normal): 693897 packets, 72834962 bytes

22196 unicasts, 584504 broadcasts, 87197 multicasts, 152536 pauses

Input: 0 input errors, 0 runts, 0 giants, 0 throttles

0 CRC, 0 frame, 0 overruns, - aborts

- ignored, - parity errors

Output (total): 7515164 packets, 14001669469 bytes

20811 unicasts, 6228300 broadcasts, 1266053 multicasts, - pauses

Output (normal): 7515164 packets, 14001669469 bytes

20811 unicasts, 6228300 broadcasts, 1266053 multicasts, 0 pauses

Output: 0 output errors, - underruns, - buffer failures

0 aborts, 0 deferred, 0 collisions, 0 late collisions

- lost carrier, - no carrier

Table 4 Error packet fields for incoming packets

Field	Description
input errors	Number of incoming error packets.
Runts	Number of incoming frames shorter than 64 bytes, in correct format, and containing valid CRCs.
Giants	Number of incoming frames larger than the maximum frame length configured on the interface.
CRC	Number of incoming frames that contained CRC errors.
frame	Number of incoming frames that contained CRC errors and a non-integer number of bytes.

Table 5 Error packet fields for outgoing packets

Field	Description
output errors	Number of outgoing error packets.
aborts	Number of packets that failed to be transmitted.
deferred	Number of frames that the interface failed to transmit when the delay exceeded two times the maximum packet transmission time because the medium was busy.
collisions	Number of frames that the interface stopped transmitting because Ethernet collisions were detected during transmission.
late collisions	Number of frames whose transmission was stopped because a collision was detected after their first 512 bits had been transmitted.

Solution

The number of incoming error packets of the CRC, frame, and throttle types keeps increasing on a port

To resolve the problem:

1. Use a tester to test the link, and verify that the link quality or fiber signal attenuation of the link is normal. If a link failure exists, replace the network cable or fiber.

Poor link quality or severe fiber signal attenuation causes packet transmission errors.

2. If a transceiver module is used, verify that it is operating correctly.

For more information, see "Transceiver module failures."

3. Move the network cable or fiber and the transceiver module from the port to another port that is operating correctly.

○ If error packets do not appear on the new port but reappear after you reconnect the network cable or fiber and the transceiver module to the original port, the original port is faulty. Use another port that is operating correctly, and contact H3C Support.

○ If error packets still appear on the new port, the peer device or the intermediate transmission links might be faulty. Examine the peer device and intermediate transmission links.

4. Verify that the peer device and intermediate devices are operating correctly.

5. If the problem persists, contact H3C Support.
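Because the symptom is a counter that keeps increasing rather than a single nonzero value, it helps to compare two snapshots of display interface output taken some time apart. The following Python sketch does that for the input error counters on the "Input:" lines; the function names are hypothetical, and counters printed as - (not supported on the port) are skipped.

```python
import re

# Input error counters as printed on the "Input:" lines of display interface output.
COUNTERS = ("input errors", "runts", "giants", "throttles", "CRC", "frame", "overruns")


def parse_input_errors(output: str) -> dict:
    """Return {counter: value}; counters shown as "-" are not matched and are skipped."""
    values = {}
    for name in COUNTERS:
        m = re.search(rf"(\d+) {name}\b", output)
        if m:
            values[name] = int(m.group(1))
    return values


def increasing_counters(before: str, after: str) -> dict:
    """Return {counter: increment} for counters that grew between two snapshots."""
    old = parse_input_errors(before)
    new = parse_input_errors(after)
    return {k: new[k] - old[k] for k in new if k in old and new[k] > old[k]}
```

If CRC or frame appears in the result of successive runs, follow the steps above.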

The number of incoming error packets of the overrun type keeps increasing on a port

The number of overrun packets keeps increasing on a port because the input rate exceeds the processing capability of the port, which causes congestion.

To resolve the problem:

1. Execute the display interface command multiple times if both of the following conditions exist:

○ Only one port cannot correctly send and receive packets, or the device attached to only one port cannot transmit traffic.

○ The other ports on the same interface card are operating correctly.

2. Perform one of the following tasks, depending on the error packet count trend:

○ If the number of input errors increases but the number of overruns does not, examine the fiber, transceiver module, and peer device.

○ If the number of input errors increases by the same amount as the number of overruns, the interface card might be internally congested or blocked. To resolve the problem, contact H3C Support.

3. If the problem persists, contact H3C Support.
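The decision in step 2 reduces to comparing two increments taken from successive display interface outputs. A minimal Python sketch follows. The function name is hypothetical, and the final branch, where overruns account for only part of the input errors, is an assumption because the guide does not cover that case.

```python
def diagnose_overruns(input_errors_delta: int, overruns_delta: int) -> str:
    """Apply the step-2 rules to counter increments between two outputs."""
    if input_errors_delta <= 0:
        return "input errors are not increasing"
    if overruns_delta == 0:
        return "examine the fiber, transceiver module, and peer device"
    if overruns_delta == input_errors_delta:
        return "card might be internally congested or blocked; contact H3C Support"
    # Not covered by the guide: overruns explain only part of the errors.
    return "check the link and the card; if unresolved, contact H3C Support"
```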

The number of incoming error packets of the jumbo type keeps increasing on a port

To resolve the problem:

1. Verify that the jumbo frame configurations are the same on both ends, including:

○ Whether jumbo frame support is enabled.

○ The default maximum jumbo frame size allowed.

○ The configured maximum jumbo frame size allowed.

2. If the problem persists, contact H3C Support.
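Step 1 compares three settings between the two ends, and Table 4 defines a giant as a frame larger than the maximum frame length configured on the interface. The following Python sketch models both checks. The JumboConfig type and the function names are hypothetical; it does not read settings from the switch.

```python
from dataclasses import dataclass


@dataclass
class JumboConfig:
    """Hypothetical model of the step-1 jumbo frame settings on one end."""
    enabled: bool         # whether jumbo frame support is enabled
    default_max: int      # default maximum jumbo frame size, in bytes
    configured_max: int   # configured maximum jumbo frame size, in bytes


def jumbo_mismatches(local: JumboConfig, peer: JumboConfig) -> list:
    """Return the names of the settings that differ between the two ends."""
    names = ("enabled", "default_max", "configured_max")
    return [n for n in names if getattr(local, n) != getattr(peer, n)]


def counts_as_giant(frame_length: int, max_frame_length: int) -> bool:
    """Per Table 4, a frame longer than the interface maximum is counted as a giant."""
    return frame_length > max_frame_length
```

Under this model, if the peer sends 9000-byte frames but the local maximum frame length is 1536 bytes (a hypothetical value), counts_as_giant(9000, 1536) returns True, and the configured_max mismatch explains the growing giant count.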

The number of outgoing error packets keeps increasing on a port

To resolve the problem:

1. Examine the duplex mode of the port. Configure the port to operate in full duplex mode if the port is operating in half duplex mode.

2. If the problem persists, contact H3C Support.

A port fails to go up

Symptom

A port cannot go up.

Solution

To resolve the problem:

1. Verify that the network cable or fiber link between ports is correct.

2. Verify that the Rx end and the Tx end are correctly connected.

3. Verify that the intermediate transmission link is correct by performing one of the following tasks:

○ Replace the network cable or fiber between ports.

○ Use the network cable or fiber to connect other ports that are operating correctly.

4. Verify that the configurations of the local port and the peer port are correct, including whether the port is shut down, and its speed, duplex mode, negotiation mode, and MDI mode.

[Sysname]display current-configuration interface ten-gigabitethernet 1/6/0/1

interface Ten-GigabitEthernet1/6/0/1

port link-mode bridge

port link-type trunk

port trunk permit vlan 1 3102

port link-aggregation group 1

return

Table 6 Support for duplex modes

Duplex mode	10 Gbps	1000 Mbps	100 Mbps	10 Mbps
Full	Supported	Supported	Supported	Supported
Half	Not supported	Not supported	Not supported	Not supported

5. If the port has a transceiver module installed, verify that the transceiver modules at both ends of the link are consistent in the rate, wavelength, and single-mode or multi-mode status.

[Sysname]display transceiver interface ten-gigabitethernet 2/9/0/1

Ten-GigabitEthernet2/9/0/1 transceiver information:

Transceiver Type : 10G_BASE_LR_XFP

Connector Type : LC

Wavelength(nm) : 1310

Transfer Distance(km) : 10(SMF)

Digital Diagnostic Monitoring : YES

Vendor Name : H3C

6. Replace the transceiver module with a transceiver module that is operating correctly to determine whether the original transceiver module is faulty.

For more information, see "Transceiver module failures."

7. If the transceiver module is faulty, replace it and contact H3C Support.
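The consistency check in step 5 can be scripted by comparing display transceiver interface output from both ends. This Python sketch is a hedged illustration: the function names are hypothetical, it assumes the rate is the token before "_BASE" in the Transceiver Type field (for example, 10G in 10G_BASE_LR_XFP), and it assumes the fiber mode appears in parentheses in the Transfer Distance(km) field, as in 10(SMF) above.

```python
import re


def parse_transceiver(output: str) -> dict:
    """Extract rate, wavelength, and fiber mode from display transceiver output."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    ttype = fields.get("Transceiver Type", "")
    mode = re.search(r"\((\w+)\)", fields.get("Transfer Distance(km)", ""))
    return {
        # Assumption: the rate is the token before "_BASE", e.g. "10G".
        "rate": ttype.split("_BASE")[0] if "_BASE" in ttype else None,
        "wavelength": fields.get("Wavelength(nm)"),
        # Assumption: the fiber mode is in parentheses, e.g. "10(SMF)".
        "mode": mode.group(1) if mode else None,
    }


def transceiver_mismatches(local_output: str, peer_output: str) -> list:
    """Return the names of the fields that differ between the two ends."""
    local = parse_transceiver(local_output)
    peer = parse_transceiver(peer_output)
    return [name for name in local if local[name] != peer[name]]
```

An empty result means that the two modules agree on rate, wavelength, and single-mode or multi-mode status; any listed field points to the mismatch to correct.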

A port in up state goes down

Symptom

A port in up state goes down.

Solution

To resolve the problem:

1. Examine the logs of the local device and the peer device, and verify that a shutdown operation has not been performed.

2. Examine the status of ports at both ends. Determine whether the port was shut down because of a protocol failure or because of a failure detected by the online diagnosis module.

3. Contact H3C Support if Protect DOWN appears in the output for a port, for example, GigabitEthernet 2/6/0/1.

Protect DOWN means that the port went down because the hardware-failure-detection command is configured with the isolate keyword. When the online diagnosis module detects a port failure, the port is shut down and isolated so that traffic can be switched to the backup link.

[Sysname]display interface GigabitEthernet2/6/0/1

GigabitEthernet2/6/0/1 current state: Protect DOWN

Line protocol current state: DOWN

IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0000-e80d-c000

Description: GigabitEthernet2/6/0/1 Interface

Loopback is not set

Media type is optical fiber, Port hardware type is 1000_BASE_SX_SFP

Unknown-speed mode, unknown-duplex mode

Link speed type is autonegotiation, link duplex type is autonegotiation

Flow-control is not enabled

The Maximum Frame Length is 9216

……

4. Verify that the configurations of ports at both ends, network cables, transceiver modules, and fiber links are correct.

For more information, see "A port fails to go up."

5. If the problem persists, contact H3C Support.

A port frequently goes up and down

Symptom

A port frequently goes up and down.

Solution

1. For a fiber port, verify that the transceiver module is operating correctly.

For more information, see "Transceiver module failures."

2. For a copper port, the port status might be unstable when the speed and duplex mode are autonegotiated. Manually configure the speed and duplex mode for the port.

3. Verify that the link, peer device, and intermediate devices are operating correctly.

4. If the problem persists, contact H3C Support.

Transceiver module failures

Symptom

The interface with a transceiver module installed cannot go up, and alarms are present.

Solution

To resolve the problem:

1. Check the alarms on the transceiver module:

○ If TX faults exist in the alarms, the peer port, fiber, or intermediate transmission devices might fail.

○ If the RX faults or electrical current and voltage faults exist in the alarms, examine the local port.

<Sysname>display transceiver alarm interface GigabitEthernet 2/0/1

GigabitEthernet2/0/1 transceiver current alarm information:

TX fault

RX power high

Table 7 Alarms on transceiver modules

Field	Description
Alarms on SFP/SFP+ transceiver modules:
RX loss of signal	Received signals are lost.
RX power high	The received optical power is high.
RX power low	The received optical power is low.
TX fault	Transmission error.
TX bias high	The transmitted bias current is high.
TX bias low	The transmitted bias current is low.
TX power high	The transmitted optical power is high.
TX power low	The transmitted optical power is low.
Temp high	The temperature is high.
Temp low	The temperature is low.
Voltage high	The voltage is high.
Voltage low	The voltage is low.
Transceiver info I/O error	Transceiver information read/write error.
Transceiver info checksum error	Transceiver information checksum error.
Transceiver type and port configuration mismatch	The type of the transceiver module does not match the port configuration.
Transceiver type not supported by port hardware	The port does not support this type of transceiver module.
Alarms on XFP transceiver modules:
RX loss of signal	Received signals are lost.
RX not ready	The receiving status is not ready.
RX CDR loss of lock	Receiving CDR loss of lock.
RX power high	The received optical power is high.
RX power low	The received optical power is low.
TX not ready	The transmission status is not ready.
TX fault	Transmission error.
TX CDR loss of lock	Transmission CDR loss of lock.
TX bias high	The transmitted bias current is high.
TX bias low	The transmitted bias current is low.
TX power high	The transmitted optical power is high.
TX power low	The transmitted optical power is low.
Module not ready	The module is not ready.
APD supply fault	Avalanche photodiode (APD) supply error.
TEC fault	Thermoelectric cooler error.
Wavelength unlocked	Wavelength loss of lock.
Temp high	The temperature is high.
Temp low	The temperature is low.
Voltage high	The voltage is high.
Voltage low	The voltage is low.
Transceiver info I/O error	Transceiver information read/write error.
Transceiver info checksum error	Transceiver information checksum error.
Transceiver type and port configuration mismatch	The type of the transceiver module does not match the port configuration.
Transceiver type not supported by port hardware	The port does not support this type of transceiver module.

2. Cross-verify the suspect transceiver module:

a. Install the transceiver module in another fiber port.

b. Replace the current transceiver module with a transceiver module that is operating correctly.

3. Determine whether the fault lies in the transceiver module or in the neighboring devices and intermediate transmission links.

4. If the transceiver module is faulty, use the display transceiver diagnosis command to display the digital diagnosis parameters on the transceiver module, and contact H3C Support.

You might be unable to query the digital diagnosis parameters of a non-H3C transceiver module. H3C recommends that you use H3C transceiver modules. To query the vendor of a transceiver module, use the display transceiver manuinfo command. If the value of the Vendor Name field is H3C, the transceiver module is customized by H3C.

<Sysname>display transceiver manuinfo interface Ten-GigabitEthernet1/2/0/15

Ten-GigabitEthernet1/2/0/15 transceiver manufacture information:

Manu. Serial Number : 213410A0000054000251

Manufacturing Date : 2012-10-26

Vendor Name : H3C

Related commands

This section lists the commands that you might use for troubleshooting ports and links.

Command	Description
display current-configuration	Displays the running configuration. With an interface specified, this command displays the running configuration of the interface.
display interface	Displays the incoming traffic statistics, outgoing traffic statistics, and status of a port. In the output from this command, you can view whether error packets exist and view the error packet statistics.
display transceiver alarm	Displays alarms present on transceiver modules.
display transceiver diagnosis	Displays the current values of the digital diagnosis parameters on transceiver modules.
display transceiver interface	Displays key parameters of the transceiver module in a specified interface to verify whether the transceiver modules at both ends are consistent in the rate, wavelength, and single-mode or multi-mode status.
display transceiver manuinfo	Displays the electronic label information of a transceiver module to query the vendor of the transceiver module.

Troubleshooting hardware forwarding

Forwarding path problem

Symptom

When data forwarding path failure detection is enabled (it is enabled by default), the switch periodically sends test packets between LPUs to examine whether the forwarding chips on the LPUs are operating correctly.

[Sysname]forward-path-detection enable

If a forwarding problem occurs, the switch displays "Forwarding fault" or "Board fault" messages. For example:

%Jun 26 09:51:53:207 2013 H3C DIAG/1/ALERT: -MDC=1-Chassis=2-Slot=4; Forwarding fault: chassis 2 slot 6 to chassis 2 slot 4

%Jun 26 09:51:57:621 2013 H3C DIAG/1/ALERT: -MDC=1; Board fault: chassis 2 slot 6,please check it

%Jun 26 09:51:59:251 2013 H3C DIAG/1/ALERT: -MDC=1-Chassis=2-Slot=6; Forwarding fault: chassis 2 slot 6 to chassis 2 slot 6

%Jun 26 09:52:05:621 2013 H3C DIAG/1/ALERT: -MDC=1; Board fault: chassis 2 slot 6,please check it

%Jun 26 09:52:12:621 2013 H3C DIAG/1/ALERT: -MDC=1; Board fault: chassis 2 slot 6,please check it

%Jun 26 09:52:22:621 2013 H3C DIAG/1/ALERT: -MDC=1; Board fault: chassis 2 slot 6,please check it

Solution

The switch has MPUs, LPUs, and switching fabric modules. LPUs and switching fabric modules perform service traffic forwarding. Traffic is load balanced among the switching fabric modules. MPUs perform control and management. MPUs do not participate in service traffic forwarding.

To resolve the forwarding path problem:

· If "Forwarding fault" messages show forwarding problems between multiple LPUs, it is likely that a switching fabric module has a problem. To locate the problem source, isolate switching fabric modules one by one. (An isolated switching fabric module does not participate in traffic forwarding. Isolating a switching fabric module does not result in packet loss.)

For example, do the following on an H3C S12508 switch in which slots 10 through 18 hold switching fabric modules:

a. Isolate the switching fabric module in slot 10.

[Sysname] board-offline slot 10

Caution: This command is only for diagnostic purpose which will cause board normal service unusable. Continue? [Y/N]:y

Config successfully

b. Observe for a while to see whether the problem disappears.

c. If the problem disappears, the switching fabric module is likely to be the problem source. H3C recommends that you replace the module or install it in another switch that is operating correctly to confirm whether the module is the problem source.

d. If the problem persists, cancel the isolation.

[Sysname]undo board-offline slot 10

This command will reboot the specified board. Continue? [Y/N]:y

Config successfully

e. After the switching fabric module in slot 10 starts up and operates correctly (in Normal state), isolate the switching fabric module in the next slot. Repeat the previous steps until you locate the failed switching fabric module and verify that other switching fabric modules are operating correctly.

· If "Forwarding fault" messages show forwarding problems from the same LPU to multiple other LPUs, the LPU is likely to have a problem. If you are not sure whether the LPU has a problem, H3C recommends that you do the following to locate the problem source:

a. Isolate switching fabric modules one by one, and observe whether the problem disappears.

b. If the problem persists during the whole isolation process, the LPU might be the source of the problem. H3C recommends that you switch the services on the LPU to other LPUs and replace or isolate the LPU. If the problem is solved, the LPU is the source of the problem.
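The one-by-one isolation in steps a through e is a simple search loop. The Python sketch below expresses it with three hypothetical callbacks standing in for board-offline, undo board-offline (including the wait for the module to reach Normal state), and observing the logs. It only models the procedure and does not issue any commands to the switch.

```python
from typing import Callable, Iterable, Optional


def find_faulty_fabric(
    slots: Iterable[int],
    isolate: Callable[[int], None],
    restore: Callable[[int], None],
    problem_persists: Callable[[], bool],
) -> Optional[int]:
    """Return the first slot whose isolation clears the problem, or None.

    None means that the problem persisted throughout the isolation process,
    which points to an LPU rather than a switching fabric module.
    """
    for slot in slots:
        isolate(slot)                 # a. board-offline slot <slot>
        if not problem_persists():    # b. observe for a while
            return slot               # c. likely problem source; keep it isolated
        restore(slot)                 # d. undo board-offline slot <slot>
        # e. restore() is expected to return only after the module is in Normal state
    return None
```

For the S12508 example, slots would be range(10, 19) to cover slots 10 through 18.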

Online hardware diagnostic and failure protection

After you enable the hardware failure detection function, the switch automatically detects hardware failures on the following elements:

· chip—Components.

· board—Cards.

· forwarding—Forwarding plane.

You can configure the switch to take the following actions in response to hardware failures:

· off—Takes no action.

· warning—Sends traps to notify you of the failures. (The default setting is warning.)

· reset—Restarts the relevant cards to recover from failures.

· isolate—Shuts down the relevant ports, prohibits loading software for the relevant cards, isolates the relevant cards, or powers off the relevant cards to reduce impact from the failures.

If there are backup links, H3C recommends that you configure the switch to take the isolate action. This action isolates the failed element and helps recover services quickly. The following shows the configuration commands:

[Sysname]hardware-failure-detection chip isolate

Config successfully

[Sysname]hardware-failure-detection board isolate

Config successfully

[Sysname]hardware-failure-detection forwarding isolate

Config successfully

To display hardware failure detection and fix information, use the following command:

<Sysname>display hardware-failure-detection

Current level:

chip : warning

board : warning

forwarding : warning

---------------------Chassis 1, Slot 0 executed records:-------------------

There is no record.

---------------------Chassis 1, Slot 0 trapped records:--------------------

There is no record.

Related commands

This section lists the commands that you might use for troubleshooting hardware forwarding.

Command	Description
board-offline	Isolates a card from the system.
display hardware-failure-detection	Displays hardware failure detection and fix information, including the following items: · Protection actions configured for hardware failures. · Most recent 10 fix records of each card.
forward-path-detection enable	Enables data forwarding path failure detection to examine whether data forwarding paths are operating correctly.
hardware-failure-detection	Configures hardware failure detection and specifies the actions to take in response to hardware failures, so that the device can automatically detect hardware failures and recover services.

Troubleshooting packet forwarding failure

Ping failure or packet loss

Symptom

Ping fails or packets are lost.

<Sysname>ping 10.0.0.5

PING 10.0.0.5 (10.0.0.5): 56 data bytes, press CTRL_C to break

Request time out

--- 10.0.0.5 ping statistics ---

5 packet(s) transmitted, 0 packet(s) received, 100.0% packet loss

Solution

Packet statistics collection

To resolve the problem, collect packet statistics by using packet capture tools or by configuring ACL rules. The following example uses an ACL rule.

1. Create an IPv4 advanced ACL rule to permit IP packets destined for 1.1.1.1.

[Sysname]acl number 3000

[Sysname-acl-adv-3000] rule 1 permit ip destination 1.1.1.1 0

[Sysname-acl-adv-3000] quit

2. Define a traffic class and a traffic behavior.

[Sysname] traffic classifier statistic_1

[Sysname-classifier-statistic_1] if-match acl 3000

[Sysname-classifier-statistic_1] quit

[Sysname] traffic behavior statistic_1

[Sysname-behavior-statistic_1] accounting packet

[Sysname-behavior-statistic_1] quit

3. Create a QoS policy, and associate traffic class statistic_1 with traffic behavior statistic_1 in the QoS policy.

[Sysname] qos policy statistic_1

[Sysname-qospolicy-statistic_1] classifier statistic_1 behavior statistic_1

[Sysname-qospolicy-statistic_1] quit

4. Apply the QoS policy to the incoming traffic of GigabitEthernet 8/0/1.

[Sysname] interface gigabitethernet 8/0/1

[Sysname-GigabitEthernet8/0/1] qos apply policy statistic_1 inbound

5. Display information about the QoS policies applied to GigabitEthernet 8/0/1.

[Sysname] display qos policy interface gigabitethernet8/0/1

Interface: GigabitEthernet8/0/1

Direction: Inbound

Policy: statistic_1

Classifier: statistic_1

Operator: AND

Rule(s) : If-match acl 3000

Behavior: statistic_1

Accounting Enable:

1000 (Packets)

Packet count

If the device does not receive any ping packets, check the neighboring device on the uplink. If the number of ping packets sent by the device is correct, check the neighboring device on the downlink. If the number of ping packets sent is incorrect, see "Layer 2 forwarding failure," "Layer 3 forwarding failure," and "MPLS forwarding failure."
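The packet-count checks above can be summarized as a small decision function. In this Python sketch, the function and parameter names are hypothetical, the counts are assumed to come from accounting actions such as the QoS policy configured above, and "correct" is assumed to mean that the device sent as many ping packets as it received.

```python
def next_check(received: int, sent: int) -> str:
    """Return where to look next, based on ping packet counts on the device.

    Assumption: the sent count is "correct" when it equals the received count.
    """
    if received == 0:
        return "check the neighboring device on the uplink"
    if sent == received:
        return "check the neighboring device on the downlink"
    return "see the Layer 2, Layer 3, and MPLS forwarding failure sections"
```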

Layer 2 forwarding failure

Symptom

Layer 2 packet loss or ping failure occurs between a switch and a device on the same network segment and in the same VLAN.

A switch performs Layer 2 forwarding for a packet only when the destination MAC address of the packet does not match any MAC address of the switch. A switch might own multiple MAC addresses within an address range. The following output shows the MAC address (Hardware Address field) of a VLAN interface on a switch:

[Sysname]display interface vlan-interface 10

Vlan-interface10 current state: UP

Line protocol current state: UP

Description: Vlan-interface10 Interface

The Maximum Transmit Unit is 1500

Internet Address is 1.1.1.1/24 Primary

IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 00e0-fc00-6503

IPv6 Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 00e0-fc00-6503

Last clearing of counters: Never
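The Layer 2 forwarding condition above is a range membership test on the destination MAC address. The Python sketch below illustrates it. The guide says that a switch might own a range of MAC addresses but does not give the start or size of that range, so base_mac and count are hypothetical parameters, as are the function names.

```python
def mac_to_int(mac: str) -> int:
    """Convert a MAC address such as 00e0-fc00-6503 to an integer."""
    return int(mac.replace("-", "").replace(":", ""), 16)


def is_switch_mac(dst_mac: str, base_mac: str, count: int) -> bool:
    """True if dst_mac lies in the range of count addresses starting at base_mac."""
    offset = mac_to_int(dst_mac) - mac_to_int(base_mac)
    return 0 <= offset < count


def forwarded_at_layer2(dst_mac: str, base_mac: str, count: int) -> bool:
    """A packet is Layer 2 forwarded only if its destination is not a switch MAC."""
    return not is_switch_mac(dst_mac, base_mac, count)
```

With a hypothetical range of 16 addresses starting at 00e0-fc00-6500, the Vlan-interface10 address 00e0-fc00-6503 belongs to the switch, so packets destined for it are not Layer 2 forwarded.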

Solution

To resolve the problem:

1. Verify that the following Layer 2 configurations are correct:

○ VLAN and PVID.

○ Packet filtering.

○ Traffic redirection.

○ Traffic policing.

○ Generic traffic shaping (GTS).

○ Unknown unicast suppression/multicast suppression/broadcast suppression.

2. Verify that the learned MAC addresses are correct. If they are not, determine whether loops occur. To quickly restore forwarding, you can configure static MAC address entries.

<Sysname>display mac-address

MAC Address VLAN ID State Port/NickName Aging

0010-9400-0002 10 Learned GE2/6/0/1 Y

000f-e259-79c0 25 Learned GE2/15/0/1 Y

00e0-fc12-3456 25 Learned GE2/15/0/1 Y

0023-8956-7b00 3102 Learned XGE2/4/0/1 Y

0023-8956-7b00 3202 Learned XGE2/4/0/8 Y
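One way to look for a loop in step 2 is to compare two display mac-address outputs taken a few seconds apart and flag entries whose port changed. The same MAC address in different VLANs on different ports, as with 0023-8956-7b00 above, is not by itself a sign of a loop, so this Python sketch keys entries by MAC address and VLAN. Treating a port change between snapshots as a loop indicator is a common heuristic, not a rule stated in this guide, and the function names are hypothetical.

```python
def parse_mac_table(output: str) -> dict:
    """Map (MAC address, VLAN ID) to port for each row of display mac-address output."""
    table = {}
    for line in output.splitlines():
        parts = line.split()
        # Data rows look like: 0010-9400-0002 10 Learned GE2/6/0/1 Y
        if len(parts) == 5 and parts[1].isdigit():
            mac, vlan, _state, port, _aging = parts
            table[(mac, int(vlan))] = port
    return table


def moved_entries(before: str, after: str) -> dict:
    """Return {(MAC, VLAN): (old port, new port)} for entries whose port changed."""
    old = parse_mac_table(before)
    new = parse_mac_table(after)
    return {key: (old[key], new[key]) for key in new if key in old and old[key] != new[key]}
```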

3. Verify traffic statistics:

○ Execute the qos traffic-counter inbound command to collect statistics about the inbound traffic.

[Sysname]qos traffic-counter inbound counter0 slot 3 interface Gigabitethernet 3/0/1

○ Execute the display qos traffic-counter inbound command multiple times to observe the dropped packet counts in the inbound direction. If a count continuously increases, verify the port configurations according to Table 8. If the reasons for packet loss still cannot be determined, contact H3C Support.

[Sysname]display qos traffic-counter inbound counter0 slot 3

Slot 3 inbound counter0 mode:

Interface: GigabitEthernet3/0/1

VLAN: all

Traffic-counter summary:

Summary inbound: 578199 packets

Dropped of local filtering: 0 packets

Dropped of VLAN filtering: 0 packets

Dropped of security filtering: 0 packets

Table 8 Command output

Field	Description
Summary inbound	Number of incoming packets.
Dropped of local filtering	A packet might be dropped due to the following reasons: · Traffic suppression is performed. · The outgoing interface is the same as the incoming interface, according to the MAC address table lookup result. · STP sets the state of the interface to discarding.
Dropped of VLAN filtering	A packet might be dropped due to the following reasons: · The VLAN of the packet is different from the VLAN of the interface. · The VLAN of the packet has not been created.
Dropped of security filtering	A packet might be dropped due to the following reasons: · The packet matches a blackhole MAC address entry. To display blackhole MAC address entries, execute the display mac-address blackhole command. · The packet fails the MAC authentication. To display MAC authentication settings and statistics, execute the display mac-authentication interface command. · The source MAC address of the packet is a multicast MAC address or broadcast MAC address. · The source MAC address of the packet is unknown to the interface.

○ Execute the qos traffic-counter outbound command to collect statistics about the outbound traffic.

[Sysname]qos traffic-counter outbound counter0 slot 4 interface Gigabitethernet 4/0/1

○ Execute the display qos traffic-counter outbound command multiple times to observe the dropped packet counts in the outbound direction. If a count continuously increases, verify the port configurations according to Table 9. If the reasons for packet loss still cannot be determined, contact H3C Support.

[Sysname]display qos traffic-counter outbound counter0 slot 4

Slot 4 outbound counter0 mode:

Interface: GigabitEthernet4/0/1

VLAN: all

Local precedence: all

Drop priority: all

Traffic-counter summary:

Unicast: 0 packets

Multicast: 0 packets

Broadcast: 0 packets

Control packets: 18 packets

Bridge egress filtered packets: 0 packets

Tail drop packets: 0 packets

Tail drop multicast packets: 993827 packets

Forwarding restrictions packets: 0 packets

Table 9 Command output

Field	Description
Unicast/Multicast/Broadcast	Number of packets that are not dropped.
Control packets	Number of control packets sent by the CPU.
Bridge egress filtered packets	A packet might be dropped due to the following reasons: · The VLAN of the packet is different from the VLAN of the interface. · STP sets the state of the interface to discarding. · RRPP or Smart Link blocks the interface. · The outgoing interface is down.
Tail drop packets	A packet might be dropped due to the following reasons: · The transmit queue is congested. · Traffic shaping is performed.
Tail drop multicast packets	A multicast or broadcast packet might be dropped due to the following reasons: · No outgoing interface is configured for the packet. · STP blocks the interface. · The outgoing interface is down.
Forwarding restrictions packets	Number of packets that are prevented from being forwarded.
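Comparing successive snapshots by eye is error-prone. The following is a minimal Python sketch, assuming the output format shown above. parse_counters, increasing_drops, and the keyword list that decides which fields count as drops are hypothetical helpers, not H3C tools. The sketch parses the "<field>: <n> packets" lines from two captured outputs and reports the drop-related counters that grew.

```python
import re

# Matches counter lines such as "Dropped of VLAN filtering: 0 packets"
COUNTER_RE = re.compile(r"^\s*([A-Za-z][\w /-]*?):\s*(\d+) packets\s*$")

# Assumption: drop-related fields contain one of these words in their names
DROP_KEYWORDS = ("drop", "filtered", "restriction")


def parse_counters(output: str) -> dict:
    """Return {field name: packet count} from one display qos traffic-counter output."""
    counters = {}
    for line in output.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def increasing_drops(before: str, after: str) -> dict:
    """Return {field: increase} for drop-related counters that grew between two snapshots."""
    old, new = parse_counters(before), parse_counters(after)
    growth = {}
    for field, count in new.items():
        is_drop = any(word in field.lower() for word in DROP_KEYWORDS)
        delta = count - old.get(field, 0)
        if is_drop and delta > 0:
            growth[field] = delta
    return growth
```

Capture the display output twice, a few seconds apart, and pass both strings to increasing_drops. An empty result means that no drop counter changed between the two snapshots.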

Layer 3 forwarding failure

Symptom

IP services fail, ping or tracert operations fail, or ping or tracert packets are lost.

A switch performs Layer 3 forwarding by using the driver IP forwarding table, not the routing table. The route management module selects the optimal routes from those learned by the various routing protocols and installs them in the FIB. The FIB entries are then synchronized to the driver IP forwarding table, which is used to forward packets.

Figure 3 Relationship between the routing table and forwarding table
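The relationship described above can be modeled conceptually. The following minimal Python sketch is an illustration, not the switch's implementation. It keeps one route per prefix by choosing the lowest preference value (the Pre column), and then resolves a destination by longest-prefix match. The sketch assumes hypothetical candidate routes, ignores cost-based tie-breaking, and collapses the FIB and the driver IP forwarding table into a single dictionary.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str      # Destination/Mask, for example "1.1.1.0/24"
    proto: str       # Proto column, for example "Static"
    preference: int  # Pre column: a lower value is preferred
    next_hop: str
    interface: str


def build_fib(candidates):
    """Keep the best route per prefix (lowest preference); cost tie-breaking is ignored."""
    fib = {}
    for route in candidates:
        net = ipaddress.ip_network(route.prefix)
        current = fib.get(net)
        if current is None or route.preference < current.preference:
            fib[net] = route
    return fib


def lookup(fib, destination) -> Optional[Route]:
    """Longest-prefix match of a destination address against the FIB model."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None  # no matching route in this model
    return fib[max(matches, key=lambda net: net.prefixlen)]
```

For example, given a static route (preference 60) and a hypothetical RIP route (preference 100) to 1.1.1.0/24, build_fib keeps the static route, and lookup(fib, "1.1.1.5") then returns next hop 20.0.0.2.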

Solution

To resolve the problem:

1. Use port mirroring or packet capture to verify that the destination MAC address of the packets is a MAC address of the switch.

A switch can perform Layer 3 forwarding only when the destination MAC address of a packet is a MAC address of the switch. A switch might have multiple MAC addresses within an address range. The following output shows the MAC address of a VLAN interface on a switch:

[Sysname]display interface vlan-interface 10

Vlan-interface10 current state: UP

Line protocol current state: UP

Description: Vlan-interface10 Interface

The Maximum Transmit Unit is 1500

Internet Address is 1.1.1.1/24 Primary

IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 00e0-fc00-6503

IPv6 Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 00e0-fc00-6503

Last clearing of counters: Never

2. Verify that a route to the destination exists in the routing table. If it does not, examine the routing protocol configuration and protocol state.

[Sysname]display ip routing-table 1.1.1.0

Summary Count : 1

Destination/Mask Proto Pre Cost NextHop Interface

1.1.1.0/24 Static 60 0 20.0.0.2 Vlan20

3. Verify that the route to the destination exists in the FIB. If the route exists but packets still cannot be forwarded, contact H3C Support.

[Sysname]display fib 1.1.1.0

Destination count: 1 FIB entry count: 1

Flag:

U:Useable G:Gateway H:Host B:Blackhole D:Dynamic S:Static

R:Relay F:FRR

Destination/Mask Nexthop Flag OutInterface/Token Label

1.1.1.0/24 20.0.0.2 USG Vlan20

4. Verify that the outgoing interfaces in the learned ARP entries are correct. If they are not, execute the reset arp command to clear the ARP entries so that the device can relearn them, or configure static ARP entries. If the problem persists, contact H3C Support.

[Sysname]display arp 20.0.0.2

Type: S-Static D-Dynamic M-Multiport I-Invalid

IP Address MAC Address VLAN Interface Aging Type

20.0.0.2 0000-0000-0001 20 GE2/0/1 N/A S
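For step 1, comparing a captured destination MAC address with the Hardware Address field is easy to get wrong because capture tools and the switch use different notations (for example, 00:E0:FC:00:65:03 versus 00e0-fc00-6503). The following is a minimal Python sketch, assuming the display interface output format shown in step 1. normalize_mac, switch_macs, and is_destined_to_switch are hypothetical helper names.

```python
import re

# Captures values such as "00e0-fc00-6503" after "Hardware Address:"
HW_ADDR_RE = re.compile(r"Hardware Address:\s*([0-9A-Fa-f.:-]+)")


def normalize_mac(mac: str) -> str:
    """Reduce any common MAC notation to 12 lowercase hex digits."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def switch_macs(display_interface_output: str) -> set:
    """Collect every Hardware Address value from display interface output."""
    return {normalize_mac(m) for m in HW_ADDR_RE.findall(display_interface_output)}


def is_destined_to_switch(dest_mac: str, display_interface_output: str) -> bool:
    """True if a captured destination MAC matches one of the interface MAC addresses."""
    return normalize_mac(dest_mac) in switch_macs(display_interface_output)
```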

MPLS forwarding failure

Symptom

You might experience the following problems with MPLS forwarding:

· Unreachable destination.

· No routes.

· Error messages.

· Unstable tunnels.

· Packet sending or receiving failure.

Solution

VLL and L3VPN services are carried over LSPs.

To resolve common MPLS problems, verify the LSP and route configurations on the LSRs.

Figure 4 MPLS network diagram

Troubleshooting MPLS LSPs

Perform the following checks on the ingress node (PE 1 in Figure 4):

1. Execute the display mpls lsp command to display LSP information.

[PE1]display mpls lsp

FEC Proto In/Out Label Interface/Out NHLFE

100.100.100.100/32 LDP 3/- -

4.4.4.4/32 LDP NULL/3 Vlan103

90.0.0.0/24 LDP NULL/3 Vlan103

1.1.1.1/32 LDP 3/NULL InLoop0

50.0.0.0/24 LDP NULL/3 Vlan103

70.0.0.0/24 LDP NULL/3 Vlan103

3.3.3.3/32 LDP NULL/1025 Vlan103

If the configured LSP does not exist, see MPLS Configuration Guide to verify the MPLS LSP configuration on each LSR.

2. Execute the display mpls ldp peer command and verify the MPLS LDP session.

[PE1]display mpls ldp peer

Total number of peers: 1

Peer LDP ID State Role GR MD5 KA Sent/Rcvd

4.4.4.4:0 Operational Passive Off Off 39/39

If the state is not Operational, the LDP session has not been established. Go to steps 3 and 4 to locate the problem. If the state is Operational, go to step 5.

3. Execute the display current-configuration configuration ldp command, and verify that the local LSR and the peer LSR have the same MD5 password.

<PE1>display current-configuration configuration ldp

mpls ldp

md5-authentication 4.4.4.4 cipher $c$3$uNK0ggilqlClQ6Q/CcNQPPqux6mAqU2p

return

4. Execute the display mpls ldp interface command to display LDP interface information.

[PE1]display mpls ldp interface

Interface MPLS LDP Auto-config

Vlan10 Enabled Configured -

GE3/0/2 Enabled Configured -

XGE2/0/6 Enabled Configured -

If the configured information is incorrect, verify the MPLS LDP configuration on each LSR.

5. Verify that the LSR ID is the IP address of a loopback interface. If it is not, execute the mpls lsr-id command to change it. H3C recommends that you use the IP address of a loopback interface as the LSR ID.

<PE1>display current-configuration | include lsr-id

mpls lsr-id 2.2.2.2

<PE1>display ip interface brief

*down: administratively down

(s): spoofing

Interface Physical Protocol IP Address Description

Loop0 up up(s) 100.100.100.100 LoopBack0..

Loop2 up up(s) 100.100.100.102 LoopBack2..

M-E0/0/0 up up 192.168.147.7 M-Etherne..

<PE1>system-view

[PE1]mpls lsr-id 100.100.100.100

6. Verify that MPLS and MPLS LDP are enabled on the VLAN interface.

[PE1]interface vlan-interface 103

[PE1-Vlan-interface103]display this

interface Vlan-interface103

ip address 1.1.1.2 255.255.255.0

mpls enable

mpls ldp enable

return
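Steps 2 and 5 lend themselves to a quick scripted check. The following minimal Python sketch assumes the column layouts shown in the display mpls ldp peer and display ip interface brief outputs above, and assumes that loopback interfaces are listed with names beginning with Loop. All function names are hypothetical.

```python
import re

# An LDP ID such as "4.4.4.4:0" starts every peer row
LDP_ID_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}:\d+$")


def non_operational_peers(peer_output: str) -> dict:
    """Map each LDP peer ID to its state if the state is not Operational."""
    problems = {}
    for line in peer_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and LDP_ID_RE.match(fields[0]) and fields[1] != "Operational":
            problems[fields[0]] = fields[1]
    return problems


def loopback_addresses(brief_output: str) -> set:
    """Collect IP addresses of LoopN interfaces from display ip interface brief."""
    addresses = set()
    for line in brief_output.splitlines():
        fields = line.split()
        # Assumed column order: Interface Physical Protocol IP-Address Description
        if len(fields) >= 4 and fields[0].startswith("Loop"):
            addresses.add(fields[3])
    return addresses


def lsr_id_is_loopback(config_output: str, brief_output: str) -> bool:
    """Check that the 'mpls lsr-id' value matches a loopback interface address."""
    match = re.search(r"^\s*mpls lsr-id (\S+)", config_output, re.MULTILINE)
    return bool(match) and match.group(1) in loopback_addresses(brief_output)
```

With the outputs in step 5, lsr_id_is_loopback returns False for the original LSR ID 2.2.2.2 and True after it is changed to 100.100.100.100.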

Troubleshooting routes

Perform the following checks on the ingress node (PE 1 in Figure 4):

1. Execute the display ip routing-table command to display routing table information.

[PE1]display ip routing-table

Routing Tables: Public

Destinations : 10 Routes : 10

Destination/Mask Proto Pre Cost NextHop Interface

1.1.1.1/32 Direct 0 0 127.0.0.1 InLoop0

3.3.3.3/32 OSPF 10 2 103.0.0.4 Vlan103

4.4.4.4/32 OSPF 10 1 103.0.0.4 Vlan103

50.0.0.0/24 OSPF 10 2 103.0.0.4 Vlan103

70.0.0.0/24 OSPF 10 2 103.0.0.4 Vlan103

90.0.0.0/24 OSPF 10 2 103.0.0.4 Vlan103

103.0.0.0/24 Direct 0 0 103.0.0.1 Vlan103

103.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0

127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0

127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0

Verify that the routing table contains routes to the loopback interface addresses of PE 1, P, and PE 2, and to the IP address of the remote device's VLAN interface. If it does not, verify the routing protocol configuration on each LSR.

2. Verify that the routing protocol (this example uses OSPF) operates correctly. If it does not, verify the routing protocol configuration on each LSR.

[PE1]display ospf peer

OSPF Process 1 with Router ID 1.1.1.1

Neighbor Brief Information

Area: 0.0.0.0

Router ID Address Pri Dead-Time Interface State

4.4.4.4 103.0.0.4 1 37 Vlan103 Full/BDR

3. Verify that the routing protocol advertises the loopback interface and the VLAN interface, and that the routing protocol is enabled on each LDP interface.

[PE1-ospf-1]display this

ospf 1

area 0.0.0.0

network 103.0.0.0 0.0.0.255

network 1.1.1.1 0.0.0.0

return

4. Execute the debugging ospf packet command to verify that routing protocol packets are sent and received correctly. If they are not, verify the routing protocol configuration on the local LSR and the remote LSR.

<PE1>debugging ospf packet

*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; OSPF 1: Sending packets.

*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Source address: 1.1.1.1

*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Destination address: 224.0.0.5

*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Version 2, Type: 1, Length: 44.

*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Router: 192.168.147.7, Area: 0.0.0.0, Checksum: 42732.

*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Authentication type: 00, Key(ASCII): 0 0 0 0 0 0 0 0.

*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Network mask: 255.255.255.0, Hello interval: 10, Option: _E_.

*Jun 27 11:17:23:149 2013 PE1 OSPF/7/DEBUG: -MDC=1; Router priority: 1, Dead Interval: 40, DR: 1.1.1.1, BDR: 0.0.0.0.

5. If the problem persists, contact H3C Support.
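Steps 1 and 3 can also be sketched in Python. missing_prefixes reports required destinations that are absent from display ip routing-table output. ospf_network_covers applies the wildcard-mask rule of an OSPF network statement: bits set to 1 in the wildcard mask are ignored, and bits set to 0 must match. The helper names and the list of required prefixes are assumptions for illustration.

```python
import ipaddress


def routed_prefixes(table_output: str) -> set:
    """Collect the Destination/Mask column from display ip routing-table output."""
    prefixes = set()
    for line in table_output.splitlines():
        fields = line.split()
        if fields and "/" in fields[0]:
            try:
                prefixes.add(ipaddress.ip_network(fields[0]))
            except ValueError:
                continue  # for example, the "Destination/Mask" header
    return prefixes


def missing_prefixes(table_output: str, required: list) -> list:
    """Return the required prefixes (hypothetical list) that have no route."""
    present = routed_prefixes(table_output)
    return sorted(p for p in required if ipaddress.ip_network(p) not in present)


def ospf_network_covers(interface_ip: str, network: str, wildcard: str) -> bool:
    """True if 'network <network> <wildcard>' matches interface_ip."""
    ip = int(ipaddress.IPv4Address(interface_ip))
    net = int(ipaddress.IPv4Address(network))
    care = ~int(ipaddress.IPv4Address(wildcard)) & 0xFFFFFFFF  # bits that must match
    return (ip & care) == (net & care)
```

With the configuration shown in step 3, ospf_network_covers("100.100.100.100", "1.1.1.1", "0.0.0.0") returns False, so that statement does not cover a loopback address of 100.100.100.100.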

Related commands

This section lists the commands that you might use for troubleshooting IP forwarding.

Command	Description
accounting packet	Configures a traffic accounting action in a traffic behavior to count traffic in packets.
acl	Creates an ACL, and enters its view.
classifier behavior	Associates a traffic behavior with a traffic class in a QoS policy.
debugging ospf packet	Enables OSPF packet debugging to examine whether OSPF packets can be correctly sent and received.
display arp	Displays ARP entries to check whether output interfaces can be correctly learned through ARP.
display current-configuration | include lsr-id	Displays the current MPLS LSR ID.
display current-configuration configuration mpls-ldp	Displays information about MPLS LDP to verify the consistency of MD5 passwords.
display fib	Displays FIB entries to examine whether an entry matching a specific destination network exists in the FIB table.
display interface	Displays information about the specified interface.
display ip interface brief	Displays brief IP configuration information for the specified Layer 3 interface or all Layer 3 interfaces.
display ip routing-table	Displays brief information about active routes in the routing table to examine whether a route to the specified network exists in the routing table.
display mac-address	Displays MAC address entries to examine whether MAC addresses are learned on the correct interfaces.
display mpls ldp interface	Displays LDP interface information to examine whether the corresponding label advertisement mode exists.
display mpls ldp peer	Displays LDP peer information to examine whether the LDP sessions with peers are in Operational state.
display mpls ldp session	Displays LDP session information.
display mpls lsp	Displays information about LSPs.
display ospf peer	Displays information about OSPF neighbors.
display qos policy interface	Displays information about the QoS policy or policies applied to an interface.
display qos traffic-counter	Displays the traffic statistics collected by the specified counter, and displays the configuration of the counter.
display this	Displays the running configuration in the current view.
interface	Enters interface view.
rule	Creates an ACL rule.
traffic behavior	Creates a traffic behavior and enters traffic behavior view.
traffic classifier	Creates a class and enters class view.
qos apply policy	Applies a QoS policy to a port.
qos policy	Creates a QoS policy and enters QoS policy view.
qos traffic-counter	Enables the traffic accounting function, and specifies the type of traffic.
mpls lsr-id	Configures an LSR ID for the local LSR.
ping	Examines whether the destination IP address is reachable, and displays related statistics.

Troubleshooting IRF

This section provides troubleshooting information for common problems with IRF.

IRF fabric establishment failure

Symptom

An H3C S12500 IRF fabric cannot be established.

Solution

To resolve the problem:

1. Verify that all member chassis run the same software version and use the same type of MPUs:

a. Execute the display device command. Check the Brd Type and Sft Ver fields for the MPU type and software version.

<Sysname> display device

Slot No. Brd Type Brd Status Subslot Num Sft Ver

1/0 LST1MRPNC1 Master 0 S12500-CMW710-R7128

1/1 LST1MRPNC1 Standby 0 S12500-CMW710-R7128

1/2 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/3 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/4 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/5 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/6 NONE Absent 0 NONE

1/7 NONE Absent 0 NONE

1/8 NONE Absent 0 NONE

1/9 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128

……

b. If the member chassis run different software versions, upgrade the software to the same version. If they use different types of MPUs, replace MPUs.

2. Verify that each IRF port has at least one physical port in up state:

NOTE:

An IRF port goes down only if all its physical ports are down.

a. Execute the display interface command. Check the current state field for the status of an IRF physical port. For example:

<Sysname> display interface gigabitethernet 2/6/0/1

GigabitEthernet2/6/0/1 current state: UP

Line protocol current state: UP

IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0000-e80d-c000

Description: GigabitEthernet2/6/0/1 Interface

Loopback is not set

Media type is optical fiber, Port hardware type is 1000_BASE_SX_SFP

……

b. If any physical port bound to an IRF port is down, bring it up.

3. Verify that all IRF physical ports are connected correctly:

IMPORTANT:

When you connect two neighboring IRF members, you must connect the physical ports of IRF-port 1 on one member to the physical ports of IRF-port 2 on the other.

a. Execute the display irf configuration command. Check the IRF-Port1 and IRF-Port2 fields for IRF port bindings.

<Sysname> display irf configuration

MemberID NewID IRF-Port1 IRF-Port2

1 1 Ten-GigabitEthernet1/8/0/1 disable

Ten-GigabitEthernet1/8/0/2

2 2 disable Ten-GigabitEthernet2/12/0/1

Ten-GigabitEthernet2/12/0/2

b. Verify that the physical IRF connections are consistent with the IRF port bindings. In this example, Ten-GigabitEthernet 1/8/0/1 and Ten-GigabitEthernet 1/8/0/2 on member chassis 1 must be connected to Ten-GigabitEthernet 2/12/0/1 and Ten-GigabitEthernet 2/12/0/2 on member chassis 2.

c. If connection errors exist, reconnect the IRF physical ports.

4. Verify that all member chassis use the same system operating mode:

a. Execute the display system-working-mode command on each member chassis. Check the command output for mode inconsistency.

[Sysname] display system-working-mode

The current system working mode is standard.

The next system working mode is standard.

b. If mode inconsistency exists, execute the system-working-mode command to change the system operating mode. The system-working-mode command setting takes effect after a system reboot.

5. Verify that all member chassis have the same MDC settings and the same settings for the acl hardware-mode ipv6, irf mode enhanced, and vpn popgo commands:

a. Execute the display current-configuration command. Check the configuration on each member chassis for configuration inconsistency.

[Sysname] display current-configuration

……

acl hardware-mode ipv6 enable

……

irf mode enhanced

……

undo vpn popgo

……

b. If configuration inconsistency exists, modify the configuration.

6. If the problem persists, contact H3C Support.
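Steps 1 and 3 can be checked with a short script after you collect the outputs and record the cabling. The following minimal Python sketch assumes the display device column layout shown in step 1 and treats rows in Master or Standby state as MPUs, which is an assumption. The cabling is described as a hypothetical mapping from physical port to (member ID, IRF port number). All function names are hypothetical.

```python
def device_rows(output: str) -> list:
    """Parse slot rows such as '1/0 LST1MRPNC1 Master 0 S12500-CMW710-R7128'."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 5 and "/" in fields[0] and fields[0].replace("/", "").isdigit():
            slot, brd_type, status, _subslots, version = fields
            rows.append({"slot": slot, "type": brd_type, "status": status, "version": version})
    return rows


def irf_mismatches(outputs: dict) -> dict:
    """outputs maps a member ID to that chassis's display device output."""
    mpu_types, versions = set(), set()
    for output in outputs.values():
        for row in device_rows(output):
            if row["status"] in ("Master", "Standby"):  # assumption: these rows are MPUs
                mpu_types.add(row["type"])
            if row["version"] != "NONE":  # skip empty slots
                versions.add(row["version"])
    problems = {}
    if len(mpu_types) > 1:
        problems["mpu_types"] = sorted(mpu_types)
    if len(versions) > 1:
        problems["versions"] = sorted(versions)
    return problems


def miswired_links(bindings: dict, links: list) -> list:
    """bindings: port -> (member ID, IRF port 1 or 2); links: cabled (port, port) pairs."""
    bad = []
    for port_a, port_b in links:
        member_a, irf_a = bindings[port_a]
        member_b, irf_b = bindings[port_b]
        # A cable must join IRF-port 1 on one member to IRF-port 2 on a different member
        if member_a == member_b or {irf_a, irf_b} != {1, 2}:
            bad.append((port_a, port_b))
    return bad
```

An empty dictionary from irf_mismatches and an empty list from miswired_links mean that the checks in steps 1 and 3 found nothing under these assumptions.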

IRF split

Symptom

An IRF fabric splits.

Solution

To resolve the problem:

1. Use the system log to identify the IRF split time.

You can use this information to search the system log for events that might cause the split.

%Jun 26 10:13:46:233 2013 H3C STM/2/STM_LINK_STATUS_TIMEOUT: IRF port 1 is down because heartbeat timed out.

%Jun 26 10:13:46:436 2013 H3C STM/3/STM_LINK_STATUS_DOWN: -MDC=1; IRF port 2 is down.

2. Verify that all interface cards that have IRF physical ports are in Normal state:

a. Execute the display device command. Check the Brd Status field for the card state.

<Sysname>display device

Slot No. Brd Type Brd Status Subslot Num Sft Ver

1/0 LST1MRPNC1 Master 0 S12500-CMW710-R7128

1/1 LST1MRPNC1 Standby 0 S12500-CMW710-R7128

1/2 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/3 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/4 NONE Absent 0 NONE

1/5 NONE Absent 0 NONE

1/6 NONE Absent 0 NONE

1/7 NONE Absent 0 NONE

1/8 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128

1/9 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128

1/10 NONE Absent 0 NONE

1/11 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/12 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/13 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/14 LST1XP16LEC1 Normal 0 S12500-CMW710-R7128

1/15 NONE Absent 0 NONE

1/16 NONE Absent 0 NONE

1/17 NONE Absent 0 NONE

1/18 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128

1/19 LST1GT48LEC1 Normal 0 S12500-CMW710-R7128

1/20 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/21 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/22 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/23 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/24 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/25 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/26 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/27 LST2SF18C1 Normal 0 S12500-CMW710-R7128

1/28 LST2SF18C1 Normal 0 S12500-CMW710-R7128

b. If an interface card is not in Normal state, use the methods described in "Card failure" to resolve the problem.

3. Verify that each IRF port has at least one physical port in up state:

a. Execute the display interface command. Check the current state field for the state of an IRF physical port. For example:

<Sysname> display interface gigabitethernet 2/6/0/1

GigabitEthernet2/6/0/1 current state: UP

Line protocol current state: UP

IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0000-e80d-c000

Description: GigabitEthernet2/6/0/1 Interface

Loopback is not set

Media type is optical fiber, Port hardware type is 1000_BASE_SX_SFP

……

b. If any physical port bound to an IRF port is down, use the methods described in "Troubleshooting links and ports" to recover the link state and bring up the physical port.

4. Eliminate hardware problems that might cause recurring IRF splits:

a. Execute the display version command. Check the uptime of the member chassis, MPUs, and interface cards that have IRF links.

<Sysname> display version

H3C Comware Software, Version 7.1.034, Release 7129

H3C S12518 uptime is 0 weeks, 1 day, 23 hours, 6 minutes

Last reboot reason : User reboot

Boot image: cfa0:/S12500-CMW710-BOOT-R7129.bin

Boot image version: 7.1.034P04, Release 7129

System image: cfa0:/S12500-CMW710-SYSTEM-R7129.bin

System image version: 7.1.034, Release 7129

LST1MRPNC1 2/0: uptime is 0 weeks, 1 day, 23 hours, 6 minutes

Last reboot reason : User reboot

3456 Mbytes SDRAM

1024 Kbytes NVRAM Memory

Type : LST1MRPNC1

BootRom : 2.19

Software : S12500-CMW710-R7129

PCB : Ver.A

……

b. Compare the uptime of chassis, MPUs, and interface cards to determine whether a member chassis, MPU, or interface card rebooted before the IRF split.

c. If the IRF split is caused by a chassis or card reboot, identify the reboot cause:

- If the reboot occurred because of a hardware problem, replace the faulty component.

- If the reboot occurred because of power failure, use the methods described in "Power supply failure" to remove the power supply problems.

5. If the problem persists, contact H3C Support.
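Steps 1 and 4 both reduce to comparing timestamps. The following minimal Python sketch extracts IRF-port-down events from log lines in the format shown in step 1, converts a display version uptime string into a duration, and tests whether a component's boot time (current time minus uptime) falls near the split time. The function names and the default 15-minute window are assumptions. Widen the window to cover how long a chassis or card takes to restart, because uptime is reported only to the minute.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# %Jun 26 10:13:46:233 2013 H3C STM/2/STM_LINK_STATUS_TIMEOUT: IRF port 1 is down ...
LOG_RE = re.compile(
    r"^%(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}):(\d{3}) (\d{4}) \S+ "
    r"STM/\d/(STM_LINK_\w+):.*?(IRF port \d+ is down)"
)
UNIT_NAMES = {"week": "weeks", "day": "days", "hour": "hours", "minute": "minutes"}


def irf_down_events(lines):
    """Return (timestamp, mnemonic, message) tuples for IRF-port-down logs, oldest first."""
    events = []
    for line in lines:
        match = LOG_RE.match(line)
        if match:
            stamp = datetime.strptime(f"{match.group(1)} {match.group(3)}", "%b %d %H:%M:%S %Y")
            stamp = stamp.replace(microsecond=int(match.group(2)) * 1000)  # milliseconds field
            events.append((stamp, match.group(4), match.group(5)))
    return sorted(events)


def parse_uptime(text: str) -> timedelta:
    """Convert '0 weeks, 1 day, 23 hours, 6 minutes' into a timedelta."""
    parts = {UNIT_NAMES[unit]: int(value)
             for value, unit in re.findall(r"(\d+) (week|day|hour|minute)s?", text)}
    return timedelta(**parts)


def rebooted_near_split(uptime_text, now, split_time, window=timedelta(minutes=15)):
    """True if now minus uptime lies within window of split_time (window is an assumption)."""
    boot_time = now - parse_uptime(uptime_text)
    return abs(boot_time - split_time) <= window
```

The earliest event returned by irf_down_events approximates the split time. Pass it to rebooted_near_split together with each chassis, MPU, or interface card uptime and the time at which you ran display version.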

Related commands

This section lists the commands that you might use for troubleshooting IRF.

Command	Description
display device	Displays device information. Use this command to verify that all member chassis run the same software version and use the same type of MPUs.
display interface	Displays interface information. Use this command to verify that each IRF port has at least one physical port in up state.
display irf configuration	Displays IRF configuration on each member chassis. Use this command to identify physical ports bound to IRF-port 1 and IRF-port 2 on each member chassis before you check IRF physical connections.
display system-working-mode	Displays system operating mode. Use this command to verify that all member chassis are operating in the same mode.
display current-configuration	Displays the running configuration. In system view, verify that the MDC settings and the settings for the following commands are the same across all chassis: acl hardware-mode ipv6, irf mode enhanced, and vpn popgo.
display version	Displays the system version and uptime as well as the uptime of each card. Use this command to identify the uptime of each member chassis, MPU, and interface card that has IRF physical ports. Compare their uptime to determine whether a member chassis, MPU, or interface card rebooted before an IRF split.

Troubleshooting system management

This section provides troubleshooting information for common problems with system management.

High CPU usage

Symptom

CPU usage on a card stays above 60%.

<Sysname>display cpu-usage

Slot 0 CPU usage:

0% in last 5 seconds

61% in last 1 minute

0% in last 5 minutes

Slot 0 CPU 1 CPU usage:

0% in last 5 seconds

0% in last 1 minute

0% in last 5 minutes

Execute the display cpu-usage history command to display the CPU usage statistics within the last 60 minutes.

<Sysname>display cpu-usage history slot 0

100%|

95%|

90%|

85%|

80%|

75%|

70%|

65%|

60%|

55%|

50%|

45%|

40%|

35%| #

30%| # #

25%| # #

20%| # # # #

15%| ## # # ##

10%| ## # # ##

5%|############################################################

------------------------------------------------------------

10 20 30 40 50 60 (minutes)

cpu-usage (CPU 0) last 60 minutes (SYSTEM)
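To check whether high usage persists rather than spikes, you can parse the display cpu-usage output per CPU. The following is a minimal Python sketch, assuming the line format shown above. Using the 5-minute average as the default test for "persists" is an assumption, and the 60% threshold comes from the symptom description. The function names are hypothetical.

```python
import re

HEADER_RE = re.compile(r"^(Slot .*?) CPU usage:$")  # "Slot 0 CPU usage:", "Slot 0 CPU 1 CPU usage:"
VALUE_RE = re.compile(r"^(\d+)% in last (.+)$")      # "61% in last 1 minute"


def parse_cpu_usage(output: str) -> dict:
    """Return {cpu label: {interval: percent}} from display cpu-usage output."""
    usage, current = {}, None
    for raw in output.splitlines():
        line = raw.strip()
        header = HEADER_RE.match(line)
        if header:
            current = header.group(1)
            usage[current] = {}
            continue
        value = VALUE_RE.match(line)
        if value and current is not None:
            usage[current][value.group(2)] = int(value.group(1))
    return usage


def busy_cpus(usage: dict, window: str = "5 minutes", threshold: int = 60) -> list:
    """List CPUs whose usage over the chosen interval exceeds the threshold."""
    return sorted(cpu for cpu, stats in usage.items() if stats.get(window, 0) > threshold)
```

In the sample output, slot 0 exceeds 60% only in the 1-minute interval, so busy_cpus flags it for window="1 minute" but not for the default 5-minute window.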

Solution

High CPU usage might occur because of the following issues:

· Route flapping.

· Too many routing policies.

· Packet attack.

· Link loop.

To resolve the problem:

1. Execute the display route-policy command to verify that the configured routing policies are reasonable.

<Sysname> display route-policy

Route-policy: policy1

permit : 1

if-match cost 10

continue: next node 11

apply comm-list a delete

2. Execute the display interface command to check for link loops.

<Sysname>display interface GigabitEthernet2/6/0/1

GigabitEthernet2/6/0/1 current state: UP

Line protocol current state: UP

IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0000-e80d-c000

Description: GigabitEthernet2/6/0/1 Interface

Loopback is not set

Media type is optical fiber, Port hardware type is 1000_BASE_SX_SFP

1000Mbps-speed mode, full-duplex mode

……

Last clearing of counters: Never

Peak value of input: 123241940 bytes/sec, at 2013-06-27 14:33:15

Peak value of output: 80 bytes/sec, at 2013-06-27 14:13:00

Last 300 seconds input: 26560 packets/sec 123241940 bytes/sec 99%

Last 300 seconds output: 0 packets/sec 80 bytes/sec 0%

……

If a loop exists, verify the following:

◦ The link connections and port configuration are correct.

◦ STP is enabled and correctly configured.

◦ The STP status of the neighboring device is normal.

◦ If all the previous items are correct, the loop might be caused by one of the following:

- An STP calculation error.

- The STP calculation is correct, but the driver fails to block a port.

In this case, you can take the following actions:

◦ Shut down the uplink port on the ring.

◦ Remove and reinsert the transceiver module in the port to trigger STP recalculation.

◦ Contact H3C Support.

3. If the problem persists, contact H3C Support.
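In step 2, a port whose input rate is close to line rate, as in the sample output (99%), is one hint worth investigating for a loop, though not proof of one. The following minimal Python sketch extracts the "Last N seconds" rate lines from display interface output. The function names and the 90% threshold are hypothetical.

```python
import re

# Matches "Last 300 seconds input: 26560 packets/sec 123241940 bytes/sec 99%"
RATE_RE = re.compile(
    r"Last (\d+) seconds (input|output):\s+(\d+) packets/sec\s+(\d+) bytes/sec\s+(\d+)%"
)


def port_load(display_interface_output: str) -> dict:
    """Return {'input'|'output': rates} from the 'Last N seconds' lines."""
    load = {}
    for match in RATE_RE.finditer(display_interface_output):
        _seconds, direction, pps, bytes_ps, percent = match.groups()
        load[direction] = {
            "packets_per_sec": int(pps),
            "bytes_per_sec": int(bytes_ps),
            "percent": int(percent),
        }
    return load


def near_line_rate(load: dict, direction: str = "input", threshold: int = 90) -> bool:
    """Flag a direction whose utilization is at or above threshold percent (assumed 90)."""
    return load.get(direction, {}).get("percent", 0) >= threshold
```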

High memory usage

Symptom

Memory usage on a card stays above 70%.

Use the display memory command to display the memory usage of a card.

<Sysname>display memory chassis 2 slot 2

The statistics about memory is measured in KB:

Chassis 2 Slot 2:

Total Used Free Shared Buffers Cached FreeRatio

Mem: 774280 591932 182348 0 0 6548 23.6%

-/+ Buffers/Cache: 175800 598480

Swap: 0 0 0
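The usage figure can be derived from the Mem: line. The following is a minimal Python sketch, assuming the display memory layout shown above and assuming that the 70% threshold refers to Used divided by Total. For the sample, the sketch gives 76.4%, which agrees with the FreeRatio of 23.6%. The function names are hypothetical.

```python
import re

# First two numbers on the "Mem:" line are Total and Used
MEM_RE = re.compile(r"^\s*Mem:\s+(\d+)\s+(\d+)", re.MULTILINE)


def memory_usage_percent(display_memory_output: str) -> float:
    """Used / Total from the 'Mem:' line, as a percentage rounded to one decimal."""
    match = MEM_RE.search(display_memory_output)
    if match is None:
        raise ValueError("no 'Mem:' line found")
    total, used = int(match.group(1)), int(match.group(2))
    return round(100 * used / total, 1)


def memory_alarm(display_memory_output: str, threshold: float = 70.0) -> bool:
    """True if usage exceeds the threshold taken from the symptom description."""
    return memory_usage_percent(display_memory_output) > threshold
```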

Solution

To resolve the problem:

1. Execute the display process memory command multiple times to do the following:

◦ Display the memory usage for all user processes on a card.

◦ Identify the process whose memory usage keeps increasing.

If the memory usage of a process keeps increasing, memory might be leaking.

The Dynamic column shows the heap memory dynamically allocated to each process. This value keeps growing when memory is leaking.

<Sysname>display process memory chassis 2 slot 2

JID Text Data Stack Dynamic Name

1 168 604 24 64 scmd

2 0 0 0 0 [kthreadd]

3 0 0 0 0 [ksoftirqd/0]

……

78 112 9368 12 320 diagd

79 76 1040 8 8 mdcagentd

80 116 8860 8 16 fsd

81 140 992 16 212 dbmd

83 72 496 8 20 syslogd

84 168 41980 16 44 drvdiagd

85 172 17112 16 12 devd

94 112 8864 12 12 edev

……

The output shows that process 78 (diagd) has the largest Dynamic value, that is, the largest heap memory usage among the listed processes.

2. Execute the display process memory heap command multiple times to do the following:

◦ Display heap memory usage for user process 78.

◦ Identify the memory block whose usage keeps increasing.

If the usage of a memory block keeps increasing, memory might be leaking.

<Sysname>display process memory heap job 78 verbose

Heap usage:

Size Free Used Total Free Ratio

16 0 385 385 0.0%

24 2 49 51 3.9%

32 0 13 13 0.0%

40 0 7 7 0.0%

64 0 411 411 0.0%

72 0 4 4 0.0%

80 1 0 1 100.0%

96 1 0 1 100.0%

104 0 8 8 0.0%

136 0 8 8 0.0%

152 0 9 9 0.0%

184 0 1 1 0.0%

368 0 8 8 0.0%

3080 0 1 1 0.0%

8200 1 0 1 100.0%

29376 1 0 1 100.0%

Large Memory Usage:

Used Blocks : 24

Used Memory(in bytes): 2031616

Free Blocks : 0

Free Memory(in bytes): 0

Summary:

Total virtual memory heap space(in bytes) : 2113536

Total physical memory heap space(in bytes) : 454656

Total allocated memory(in bytes) : 2075736

3. Contact H3C Support.
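Step 1 asks you to run the command several times and look for steady growth. The following is a minimal Python sketch, assuming the JID Text Data Stack Dynamic Name column layout shown above. It reports processes whose Dynamic value increased in every successive snapshot. The helper names are hypothetical, and steady growth over a short interval suggests, but does not prove, a leak.

```python
def parse_process_memory(output: str) -> dict:
    """Return {(JID, name): Dynamic value} from display process memory output."""
    table = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows: JID Text Data Stack Dynamic Name
        if len(fields) == 6 and all(field.isdigit() for field in fields[:5]):
            table[(int(fields[0]), fields[5])] = int(fields[4])
    return table


def growing_processes(snapshots: list) -> list:
    """Return processes whose Dynamic value rose in every successive snapshot."""
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots")
    parsed = [parse_process_memory(snapshot) for snapshot in snapshots]
    growing = []
    for key in parsed[0]:
        values = [table.get(key) for table in parsed]
        # Require the process in every snapshot and a strict increase each time
        if None not in values and all(b > a for a, b in zip(values, values[1:])):
            growing.append(key)
    return sorted(growing)
```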

Insufficient resources

Symptom

The system displays the following log and trap information when resources are insufficient:

%Jul 26 20:43:11:218 2012 H3C DRV_L3/4/NO_RESOURCE: -MDC=1-Slot=3; Insufficient system resources!

%Jul 26 20:44:51:259 2012 H3C DRV_L3/4/NO_RESOURCE: -MDC=1-Slot=6; No enough resource!

%Jul 26 20:47:18:712 2012 H3C DRV_L3/4/NO_RESOURCE: -MDC=1-Slot=3; Not enough resources are available to complete the operation.

Solution

ACL resources

The following features use ACL resources:

· QoS.

· Packet filter.

· Priority mapping and trust.

· Mirroring.

· Delivery of protocol packets to the CPU.

· Selective QinQ and VLAN mapping.

· Port binding, portal, and EAD.

· Broadcast suppression.

· MAC-based VLAN, voice VLAN, RSPAN, and UDP helper.

To resolve the problem:

1. Use the display qos-acl resource command to display the ACL usage on a card.

<Sysname>display qos-acl resource chassis 2 slot 2

Interfaces: GE2/2/0/1 to GE2/2/0/24

---------------------------------------------------------------------

Type Total Reserved Configured Remaining Usage

---------------------------------------------------------------------

IN-MQC-CAR 8192 0 0 8192 0%

IN-COMM-CAR 7168 0 0 7168 0%

IN-COUNT 8192 0 166 8026 2%

OUT-MQC-CAR 8192 0 166 8026 2%

OUT-COUNT 8192 0 166 8026 2%

ACL-RES 2048 0 73 1975 3%

Interfaces: GE2/2/0/25 to GE2/2/0/48

---------------------------------------------------------------------

Type Total Reserved Configured Remaining Usage

---------------------------------------------------------------------

IN-MQC-CAR 8192 0 0 8192 0%

IN-COMM-CAR 7168 0 0 7168 0%

IN-COUNT 8192 0 166 8026 2%

OUT-MQC-CAR 8192 0 166 8026 2%

OUT-COUNT 8192 0 166 8026 2%

ACL-RES 2048 0 73 1975 3%

2. If most ACL resources are allocated, optimize the ACL configuration. For example, delete or combine ACL rules. If the configuration cannot be optimized, contact H3C Support.
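To spot nearly exhausted resource types across interface ranges, you can parse the display qos-acl resource output. The following is a minimal Python sketch, assuming the six-column layout shown in step 1. parse_acl_resources, scarce, and the 80% threshold are hypothetical.

```python
def parse_acl_resources(output: str) -> list:
    """Return (interface range, type, remaining, usage percent) tuples."""
    results, block = [], None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Interfaces:"):
            block = line.split(":", 1)[1].strip()  # e.g. "GE2/2/0/1 to GE2/2/0/24"
            continue
        fields = line.split()
        # Data rows: Type Total Reserved Configured Remaining Usage
        if len(fields) == 6 and fields[5].endswith("%") and fields[1].isdigit():
            rtype, _total, _reserved, _configured, remaining, usage = fields
            results.append((block, rtype, int(remaining), int(usage.rstrip("%"))))
    return results


def scarce(results: list, threshold: int = 80) -> list:
    """List (interface range, type) pairs whose usage is at or above threshold percent."""
    return [(block, rtype) for block, rtype, _remaining, usage in results if usage >= threshold]
```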

MAC resources

MAC resources are easily exhausted in large Layer 2 networks, which contain a large number of MAC addresses. When the MAC address table is full, new MAC addresses cannot be learned until existing entries age out.

To resolve the problem:

1. Display MAC addresses that have been learned.

<Sysname>display mac-address count

49 mac address(es) found

The output shows that the number of MAC addresses that have been learned is small.

2. H3C recommends that you do the following:

◦ Set a shorter MAC address aging time.

◦ Create VLANs by service or by department, and connect the VLANs at Layer 3.

MPLS LSP resources

To resolve the problem:

1. Display MPLS LSP statistics.

<Sysname>display mpls lsp statistics

Lsp Type Total Ingress Transit Egress

STATIC LSP 0 0 0 0

STATIC CRLSP 0 0 0 0

LDP LSP 3 1 0 2

CRLDP CRLSP 0 0 0 0

RSVP CRLSP 0 0 0 0

BGP LSP 0 0 0 0

ASBR LSP 0 0 0 0

BGP IPV6 LSP 0 0 0 0

-------------------------------------------------------------------------

LSP 3 1 0 2

CRLSP 0 0 0 0

2. If MPLS LSP resources are insufficient, contact H3C Support.

Other system resources

Contact H3C Support.

Related commands

This section lists the commands that you might use for troubleshooting system management.

Command	Description
display cpu-usage	Displays CPU usage statistics and tasks with high CPU usage.
display cpu-usage history	Displays the historical CPU usage statistics in charts.
display interface	Displays information about a specific interface.
display mac-address	Displays MAC address entries.
display memory	Displays memory usage for a card.
display mpls lsp statistics	Displays MPLS LSP statistics.
display process memory	Displays memory usage for all user processes on a card.
display process memory heap	Displays heap memory usage for a user process.
display qos-acl resource	Displays QoS and ACL resource usage.
display route-policy	Displays routing policy information.
