H3C S12500 Switch Series Troubleshooting Guide-R1825P01-6W100


H3C S12500 Switch Series (R1825P01) Troubleshooting Guide

The information in this document is subject to change without notice.



Contents

General troubleshooting procedures
  Obtaining information
    Obtaining log information
    Obtaining other information
  Troubleshooting procedure
    Troubleshooting flowchart
    Problem types
    Problem locations and possible results
    Common service recovering and troubleshooting methods
Troubleshooting hardware
  Card failure
    Symptom
    Solution
  Power supply failure
    Symptom
    Solution
  Fan failure
    Symptom
    Solution
  Temperature alarm
    Symptom
    Solution
  Related commands
Troubleshooting links and ports
  Error packets on a port
    Symptom
    Solution
  A port fails to go up
    Symptom
    Solution
  A port in up state goes down
    Symptom
    Solution
  A port frequently goes up and down
    Symptom
    Solution
  Transceiver module failures
    Symptom
    Solution
  Related commands
Troubleshooting hardware forwarding
  Forwarding path problem
    Symptom
    Solution
  Online hardware diagnostic and failure protection
  Related commands
Troubleshooting packet forwarding failure
  Ping failure or packet loss
    Symptom
    Solution
  Layer 2 forwarding failure
    Symptom
    Solution
  Layer 3 forwarding failure
    Symptom
    Solution
  MPLS forwarding failure
    Symptom
    Solution
  Related commands
Troubleshooting IRF
  IRF fabric establishment failure
    Symptom
    Solution
  IRF split
    Symptom
    Solution
  Related commands
Troubleshooting system management
  High CPU usage
    Symptom
    Solution
  Insufficient resources
    Symptom
    Solution
  Related commands

 


General troubleshooting procedures

Obtaining information

H3C recommends that you enable the information center by using the info-center enable command for fast troubleshooting. By default, the information center is enabled.

Obtaining log information

Log information includes operation information in log files and state information in diag files. The system stores these files on the CF card (cfa0 or cfa1).

You can export the log and diag files through FTP, TFTP, or USB. To tell apart the files exported from different MPUs, keep them organized, for example, in separate folders named chassisXslotY.

Table 1 Log information classification

Category | File name    | Content
---------|--------------|--------
log      | logfileX.log | Command executions, traps, and operational logs.
diag     | default.diag | Device state, CPU state, memory state, configuration state, software entries, and hardware entries.

 

Restrictions and guidelines

Follow these restrictions and guidelines to obtain log information:

·     Record the displayed information during operations for future analysis.

·     Understand the impact of each operation so that the configuration can be restored upon operation failures.

·     Make sure the current configuration is consistent with the saved configuration. Do not save the configuration during an IRF split, a card fault, or a card reboot.

·     After you perform an operation, wait for a while before you verify the results.

·     Before you replace an MPU with a new MPU, make sure the new MPU has the same software version as the old MPU.

Obtaining log files

Use the logfile save command to save logs from the log buffer to the CF card on one of the following:

·     The active and standby MPUs of a standalone device.

·     The IRF master and subordinate devices.

<Sysname>logfile save

Saved the log file buffer to file cfa0:/logfile/logfile7.log successfully.

Display log files on the active MPU.

<Sysname>dir cfa0:/logfile/

Directory of cfa0:/logfile/

                                                                               

   0     -rw-   5209069  Apr 23 2013 22:06:56   logfile1.log

   1     -rw-   5200061  May 04 2013 02:36:44   logfile2.log

   2     -rw-   5205918  May 09 2013 02:41:10   logfile3.log

                                                                               

1021808 KB total (790736 KB free)

                                                                               

File system type of cfa0: FAT16

Display log files on the standby MPU.

<Sysname>dir slot1#cfa0:/logfile/

Directory of slot1#cfa0:/logfile/

                                                                               

   0     -rw-   5221735  Apr 10 2013 17:53:14   logfile1.log

   1     -rw-   5227102  Apr 10 2013 18:54:34   logfile2.log

   2     -rw-   3352896  May 16 2013 20:15:44   logfile3.log

                                                                               

1020068 KB total (643264 KB free)

                                                                                

File system type of slot1#cfa0: FAT32

Display log files on the active MPU of an IRF subordinate device. If the IRF fabric has more than one subordinate device, execute this command on each subordinate device.

<Sysname>dir chassis2#slot0#cfa0:/logfile/

Directory of chassis2#slot0#cfa0:/logfile/

                                                                               

   0     -rw-   5223211  May 15 2013 12:38:44   logfile1.log

   1     -rw-   2639526  May 15 2013 20:01:14   logfile2.log

   2     -rw-   5223207  May 15 2013 11:22:24   logfile3.log

                                                                               

1021808 KB total (773424 KB free)

                                                                                

File system type of chassis2#slot0#cfa0: FAT16
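When log files are collected from several MPUs, it can help to check file sizes and the remaining CF card space from captured output rather than by eye. The following Python sketch parses dir output of the form shown above. It assumes the output has been captured as plain text; the names parse_dir and SAMPLE are illustrative and are not part of any H3C tool.

```python
import re

# One file entry of the dir output, for example:
#    0     -rw-   5223211  May 15 2013 12:38:44   logfile1.log
ENTRY = re.compile(
    r"^\s*\d+\s+\S+\s+(?P<size>\d+)\s+"
    r"(?P<date>\w{3} \d{2} \d{4} \d{2}:\d{2}:\d{2})\s+(?P<name>\S+)\s*$"
)
DIRECTORY = re.compile(r"^Directory of (?P<path>\S+)\s*$")
SPACE = re.compile(r"^(?P<total>\d+) KB total \((?P<free>\d+) KB free\)")


def parse_dir(text):
    """Extract the directory path, file entries, and CF card space from dir output."""
    result = {"path": None, "files": [], "total_kb": None, "free_kb": None}
    for line in text.splitlines():
        stripped = line.strip()
        match = DIRECTORY.match(stripped)
        if match:
            result["path"] = match.group("path")
            continue
        match = SPACE.match(stripped)
        if match:
            result["total_kb"] = int(match.group("total"))
            result["free_kb"] = int(match.group("free"))
            continue
        match = ENTRY.match(line)
        if match:
            result["files"].append({
                "name": match.group("name"),
                "size": int(match.group("size")),
                "date": match.group("date"),
            })
    return result


# Captured text copied from the subordinate-device listing above.
SAMPLE = """\
Directory of chassis2#slot0#cfa0:/logfile/
   0     -rw-   5223211  May 15 2013 12:38:44   logfile1.log
   1     -rw-   2639526  May 15 2013 20:01:14   logfile2.log
   2     -rw-   5223207  May 15 2013 11:22:24   logfile3.log
1021808 KB total (773424 KB free)
File system type of chassis2#slot0#cfa0: FAT16
"""

listing = parse_dir(SAMPLE)
for entry in listing["files"]:
    print(listing["path"], entry["name"], entry["size"])
```

Because the directory path already carries the chassis and slot prefix, the parsed path could also be used to pick the local chassisXslotY folder for an exported file.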

Obtaining diag files

Execute the display diagnostic-information command, and enter "y" at the prompt to save the diag file to the CF card. If you enter "n", the information is displayed on the terminal instead, and not all of the diagnostic information can be saved to the CF card. The more cards the device has, the longer the saving operation takes. Do not execute any command while the file is being saved.

<Sysname>display diagnostic-information

Save or display diagnostic information (Y=save, N=display)? [Y/N]:y

Please input the file name(*.diag)[cfa0:/default.diag]:20130517.diag

Diagnostic information is outputting to cfa0:/20130517.diag.

Please wait...

Save successfully.

<Sysname>dir cfa0:/

Directory of cfa0:/

                                                                                                                                   

……

   17     -rw-   5151331  May 17 2013 17:38:32   20130517.diag

                                                                                                                                    

1020068 KB total (735536 KB free)

                                                                                                                                   

File system type of cfa0: FAT32

You can also view the diagnostic information by executing the following commands, but H3C recommends that you not use this method. The screen-length disable command is used to avoid interruption of information output.

<Sysname>screen-length disable

% Screen-length configuration is disabled for current user.

<Sysname>display diagnostic-information

Save or display diagnostic information (Y=save, N=display)? [Y/N]:n

=================================================================

  ===============running CPU usage information===============

=================================================================

===== Current CPU usage info =====

CPU Usage Stat. Cycle: 19 (Second)

CPU Usage            : 5%

CPU Usage Stat. Time : 2013-05-21  10:06:25

CPU Usage Stat. Tick : 0x19aa(CPU Tick High) 0xa57f44e1(CPU Tick Low)

Actual Stat. Cycle   : 0x0(CPU Tick High) 0x39fb1e03(CPU Tick Low)

……

Obtaining other information

You also need to obtain other operational information. The following lists some relevant information:

·     Problem symptom, time, topology, configuration information, measures, and results of the measures.

·     Operation logs, captured packet information, debug information, and information output from the console port during continual MPU and switching fabric card reboots.

·     Alarms of cards, power supply, and fans. For example, you can take pictures to record alarm information.

Troubleshooting procedure

When the switch has a problem, do the following:

1.     Obtain operation information.

2.     Use the troubleshooting flowchart provided in "Troubleshooting flowchart" to determine the problem type.

3.     Use the solution for the problem type to troubleshoot the switch.

If you cannot determine the problem, contact H3C Support.

Troubleshooting flowchart

Use the troubleshooting flowchart shown in Figure 1 to determine the problem type.

Figure 1 Troubleshooting flowchart

 

The following are commonly used troubleshooting methods:

·     Collecting packet statistics on ports.

·     Mirroring packets.

·     Capturing packets.

·     Configuring QoS policies to collect statistics.

·     Enabling debugging functions.

·     Replacing the suspicious hardware or installing the suspicious hardware in another slot.

For example, if a transceiver might have a problem, do one of the following:

○     Replace the transceiver with a transceiver that can operate correctly.

○     Install the transceiver in another port.

If the card in a slot might have a problem, do one of the following:

○     Replace the card with a card that can operate correctly.

○     Install the card in another slot.

Problem types

Card failure

A card failure might result in the following symptoms:

·     A card cannot start up.

·     A card reboots unexpectedly.

·     A card reboots again and again.

·     A card is not in the correct state.

To troubleshoot a card failure, see "Card failure."

Power failure

A power failure might result in the following symptoms:

·     Power LEDs are not in the correct states.

·     Power alarm messages are displayed continuously.

To troubleshoot a power failure, see "Power supply failure."

Fan failure

A fan failure might result in the following symptoms:

·     Fans do not operate.

·     Fan LEDs are not in the correct states.

·     Fan alarm messages are displayed continuously.

To troubleshoot a fan failure, see "Fan failure."

Temperature problem

If temperature alarm messages are displayed, the device might have a temperature problem. To troubleshoot a temperature problem, see "Temperature alarm."

Port failure

A port failure might result in the following symptoms:

·     A port cannot come up.

·     A port goes down and comes up frequently.

·     The counts of packet errors on the port are not zero.

To troubleshoot a port failure, see "Troubleshooting links and ports."

Hardware forwarding failure

If log messages such as "Forwarding fault" or "Board fault: chassis X slot Y, please check it" are displayed, the device might have a hardware forwarding failure.

To troubleshoot a hardware forwarding failure, see "Troubleshooting hardware forwarding."

Packet forwarding failure

A packet forwarding failure might result in the following symptoms:

·     Some ping packets are lost, or the ping operation fails.

·     Some tracert packets are lost, or the tracert operation fails.

·     Layer 2 frames are lost, or the Layer 2 link is down.

·     Layer 3 frames are lost, or the Layer 3 connection is down.

·     The MPLS service is not running correctly.

To troubleshoot a packet forwarding failure, see "Troubleshooting packet forwarding failure."

IRF failure

An IRF failure might result in the following symptoms:

·     The IRF fabric cannot be formed.

·     An IRF split occurs.

To troubleshoot an IRF failure, see "Troubleshooting IRF."

Overuse of CPU

If the CPU usage of the switch is too high, see "High CPU usage."

Insufficient resources

If the "No enough resource" message is displayed, see "Insufficient resources."

Problem locations and possible results

Figure 2 shows a typical network model and the possible problem locations. For higher availability and quick switchover and restoration in response to failures, the network uses two upstream links and two core switches. Table 2 shows the possible symptoms and results of different problem locations.

Figure 2 Typical network model and the possible problem locations

 

Table 2 Problem locations and possible symptoms and results

Problem location | Possible symptoms | Possible results
-----------------|-------------------|-----------------
1 (including transceivers) | A port is down. | A service switchover occurs.
1 (including transceivers) | Counts of packet errors are increased. | All services on the link are affected.
2 | A card fails. | A service switchover occurs.
2 | A chip on a card fails while the card is operating correctly. | Services on the chip are affected. If a switching fabric module failure occurs, the whole device is affected.
2 | A software error occurs. | The device reboots and a service switchover occurs. If a protocol module has a problem, the service is usually affected.
3 | Same as problem location 1. | Services on the access switch are affected. The scope of affected services is smaller than a problem at problem location 1.
4 | The device is down. | Services on the device are affected.
4 | A chip on a card fails. | Some ports or all services on the device are affected.
4 | A software error occurs. | The device reboots and all services on the device are affected. If a protocol module has a problem, the service is usually affected.
5 | Same as problem location 1. | Server services on the link are affected.
6 | The network is operating correctly but a service is not. | The service on the server is affected.

 

Common service recovering and troubleshooting methods

Table 3 Common service recovering and troubleshooting methods

Failure category | Service recovering methods | Troubleshooting methods
-----------------|----------------------------|------------------------
Hardware | Isolate the failed card. Isolate the failed device by adjusting service traffic forwarding paths. For example, adjust the preferences for routes so traffic is switched to other paths. | Complete required tests on the backup hardware, and replace the failed hardware.
Software | Reboot the protocols on the failed device. Isolate the failed device by adjusting service traffic forwarding paths. | Upgrade the software or install patches. Adjust the network topology, or modify the configuration to remove the failures.
Link | Isolate the failed link by adjusting service traffic forwarding paths. | Remove link errors.
Others | Correct configuration errors. Connect the ports of the devices correctly. Isolate the failed link by adjusting service traffic forwarding paths. | Correct configuration errors. Connect the ports of the devices correctly. Repair the power and air conditioner systems for the devices.

 

Troubleshooting hardware

Card failure

Symptom

·     A card runs into an abnormal state: Absent, Fault, Off, Offline, or Illegal.

·     A card fails to boot, or it reboots unexpectedly or repeatedly.

 

 

NOTE:

If the switch outputs log messages such as "Forwarding fault" or "Board fault: chassis X slot Y, please check it," see "Troubleshooting hardware forwarding."

 

How to identify a card state

A card can operate in Normal, Master, Slave, Absent, Fault, Off, Offline, or Illegal state:

·     Normal—The card is operating correctly.

·     Master—The card is an active MPU.

·     Slave—The card is a standby MPU.

·     If the card is in Fault, Off, Offline, or Illegal state, or the slot in which the card is installed is in Absent state, the card might be faulty. See "Solution" to rectify the fault.

You can execute the display device command and check the Brd Status field for the card states. The following is a sample command output.

<Sysname>display device

Slot No.   Brd Type        Brd Status   Software Version

1/0       LST1MRPNC1       Master       S12500-CMW520-R1728P02

1/1       LST1MRPNC1       Slave        S12500-CMW520-R1728P02

1/2       LST1XP16LEC1     Normal       S12500-CMW520-R1728P02

1/3       LST1XP16LEC1     Normal       S12500-CMW520-R1728P02

1/4       LST1XP16LEC1     Normal       S12500-CMW520-R1728P02

1/5       NONE             Absent       NONE

1/6       NONE             Absent       NONE

1/7       NONE             Absent       NONE

1/8       NONE             Absent       NONE

1/9       LST1GP48LEC1     Normal       S12500-CMW520-R1728P02

1/10      LST2SF08C1       Normal       S12500-CMW520-R1728P02

1/11      LST2SF08C1       Normal       S12500-CMW520-R1728P02

1/12      LST2SF08C1       Normal       S12500-CMW520-R1728P02

1/13      LST2SF08C1       Normal       S12500-CMW520-R1728P02

1/14      LST2SF08C1       Normal       S12500-CMW520-R1728P02

1/15      LST2SF08C1       Normal       S12500-CMW520-R1728P02

1/16      LST2SF08C1       Normal       S12500-CMW520-R1728P02

1/17      LST2SF08C1       Normal       S12500-CMW520-R1728P02

1/18      LST2SF08C1       Normal       S12500-CMW520-R1728P02
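On a chassis with many slots, the Brd Status column can be scanned programmatically. The following Python sketch is one way to do that, assuming the display device output has been captured as text in the column layout shown above. The function name cards_needing_attention, the expected_slots parameter, and the sample text (which contains a made-up Fault row) are illustrative assumptions, not H3C features.

```python
import re

# Slot numbers appear as chassis/slot (for example 1/2) in the sample output.
# A bare slot number is also accepted, as an assumption for standalone devices.
SLOT = re.compile(r"^\d+(/\d+)?$")

# States that this guide treats as possibly faulty for an installed card.
FAULT_STATES = {"Fault", "Off", "Offline", "Illegal"}


def cards_needing_attention(output, expected_slots=()):
    """Return (slot, state) pairs for cards in a possibly faulty state.

    Absent is reported only for slots listed in expected_slots, because an
    empty slot is also shown as Absent.
    """
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows: slot, board type, board status, software version.
        if len(fields) < 3 or not SLOT.match(fields[0]):
            continue
        slot, state = fields[0], fields[2]
        if state in FAULT_STATES:
            flagged.append((slot, state))
        elif state == "Absent" and slot in expected_slots:
            flagged.append((slot, state))
    return flagged


# Hypothetical capture: the Fault row is invented for illustration.
SAMPLE = """\
Slot No.   Brd Type        Brd Status   Software Version
1/0       LST1MRPNC1       Master       S12500-CMW520-R1728P02
1/1       LST1MRPNC1       Slave        S12500-CMW520-R1728P02
1/2       LST1XP16LEC1     Fault        S12500-CMW520-R1728P02
1/5       NONE             Absent       NONE
1/6       NONE             Absent       NONE
"""

print(cards_needing_attention(SAMPLE, expected_slots={"1/5"}))
```

Passing the slots where cards are known to be installed keeps empty slots from being reported as failures.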

How to confirm a card reboot

To confirm whether a card rebooted, use the display version command or check the card running time in the log files. If the uptime of a card is noticeably shorter than that of the other cards, the card rebooted. See "Solution" to resolve the problem.

<Sysname>display version

H3C Comware Platform Software

Comware Software, Version 5.20, Release 1825P01

Copyright (c) 2004-2013 Hangzhou H3C Tech. Co., Ltd. All rights reserved.

H3C S12504 uptime is 0 week, 0 day, 1 hour, 48 minutes

Last reboot reason : User reboot

 

LST1MRPNC1 1/0:  uptime is 0 week, 0 day, 1 hour, 48 minutes

Last reboot reason : User reboot

3456    Mbytes SDRAM

1024    Kbytes NVRAM Memory

Type     : LST1MRPNC1

BootRom  : 1.22

Software : S12500-CMW520-R1825P01

Patch    : NONE

PCB      : Ver.B

Board Cpu:

  Number of Cpld: 2

  Cpld 0:

    SoftWare  : 005

  Cpld 1:

    SoftWare  : 005

PowChipA  : 004

CpuCard

  Type      : LSR1CPA

  PCB       : Ver.C

  Number of Cpld: 1

  Cpld 0:

    SoftWare  : 001

  BootRom   : 1.13

Mbus card

  Type      : LSR1MBCB

  Software  : 115

  PCB       : Ver.B

……
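The uptime comparison described above can be sketched in Python as follows. The sketch assumes that every card reports its uptime in the same line format as the MPU line in the sample output; the second card line in SAMPLE, the function names, and the 10-minute tolerance are illustrative assumptions.

```python
import re

# Card lines look like:
# LST1MRPNC1 1/0:  uptime is 0 week, 0 day, 1 hour, 48 minutes
# The chassis line ("H3C S12504 uptime is ...") has no slot and is not matched.
CARD_UPTIME = re.compile(
    r"^(?P<board>\S+)\s+(?P<slot>\d+(?:/\d+)?):\s+uptime is\s+"
    r"(?P<weeks>\d+) weeks?, (?P<days>\d+) days?, "
    r"(?P<hours>\d+) hours?, (?P<minutes>\d+) minutes?",
    re.MULTILINE,
)


def card_uptimes(output):
    """Map each slot to its uptime in minutes."""
    uptimes = {}
    for match in CARD_UPTIME.finditer(output):
        days = int(match["weeks"]) * 7 + int(match["days"])
        minutes = (days * 24 + int(match["hours"])) * 60 + int(match["minutes"])
        uptimes[match["slot"]] = minutes
    return uptimes


def likely_rebooted(output, tolerance_minutes=10):
    """Return slots whose uptime trails the longest card uptime by more than the tolerance."""
    uptimes = card_uptimes(output)
    if not uptimes:
        return []
    longest = max(uptimes.values())
    return sorted(
        slot for slot, minutes in uptimes.items()
        if longest - minutes > tolerance_minutes
    )


# Hypothetical capture: the LPU line is invented for illustration.
SAMPLE = """\
H3C S12504 uptime is 0 week, 0 day, 1 hour, 48 minutes
LST1MRPNC1 1/0:  uptime is 0 week, 0 day, 1 hour, 48 minutes
LST1XP16LEC1 1/2:  uptime is 0 week, 0 day, 0 hour, 5 minutes
"""

print(likely_rebooted(SAMPLE))
```

This comparison cannot detect the case where all cards rebooted at the same time, because no card then stands out from the others.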

Solution

In Absent state

To resolve the problem:

1.     Reinstall the card to make sure the card is installed securely.

2.     Do the following:

○     Install this card into another slot.

○     Install another card that runs correctly on the chassis into this slot to determine whether the card is faulty.

3.     Verify that the LEDs on the card panel and inside the card do not indicate any fault.

4.     If the card is an MPU or switching fabric module, connect the card to a terminal through a serial cable to verify that the card can boot correctly.

5.     If the card is confirmed to be faulty, replace the card and contact H3C Support.

In Off state

Determine whether a user powered off the card by using the power-supply off command.

○     If so, power on the card by using the power-supply on command.

○     If not, the power supply of the card is faulty. Replace the card and contact H3C Support.

In Fault state

To resolve the problem:

1.     Wait a period of time and determine whether the card remains in Fault state or reboots after becoming Normal. If the card reboots after becoming Normal, contact H3C Support.

2.     Verify that the card boots correctly.

○     For an MPU or switching fabric module, connect the card to a terminal through a serial cable to verify that the card boots correctly. If a DRAM test fails and causes repeated reboots, as shown in the following output, verify that the DRAM is installed securely.

readed value is 55555555 , expected value is aaaaaaaa

DRAM test fails at: 080ffff8

DRAM test fails at: 080ffff8

Fatal error! Please reboot the board.

○     For an LPU, verify that the system working mode supports the card type.

Use the display system working mode command to display the system operating mode:

[Sysname]display system working mode

Current system working mode      : Routee

Working mode after system restart: Routee

Notice: Changing working mode will take effect only after system restart.

If the current system operating mode does not support the card, the switch generates related information as shown in the following example:

%Apr 18 10:08:11:525 2013 H3C SYSM/1/DRV_SYSM:

slot  2 is an EB type board, and it supports Standard working mode only.

%Apr 18 10:08:11:661 2013 H3C SYSM/1/DRV_SYSM:

ERROR!!! slot  2 doesn't support the current system working mode, board rebooting!

%Apr 18 10:08:11:802 2013 H3C SYSM/1/DRV_SYSM:

This is not hardware fault, please change mode by command 'system working mode' in system view.

The output shows that the EB card is not supported in Routee mode.

If you determine that the current system operating mode does not support the card, use the system working mode command to modify the system operating mode. Then save the configuration. The new operating mode takes effect after the switch reboots.

[Sysname]system working mode standard

Standard mode has been set. It needs to be saved and will take effect after system restart.

[Sysname]save

The current configuration will be written to the device. Are you sure? [Y/N]:y

Please input the file name(*.cfg)[flash:/config.cfg]

(To leave the existing filename unchanged, press the enter key):

flash:/config.cfg exists, overwrite? [Y/N]:y

Validating file. Please wait........................................

The current configuration is saved to the active main board successfully.

Configuration is saved to device successfully.

3.     Install the card into another slot to determine whether the card is faulty.

4.     If the card is confirmed to be faulty, replace the card and contact H3C Support.

In Offline state

To resolve the problem:

1.     Determine whether a user isolated the card from the system by using the board-offline command. If the card is isolated due to this operation, use the undo board-offline command to remove the configuration. A card is also isolated from the system when POST is performed.

2.     If an LPU is isolated from the system, a fault might be detected on the LPU by the online diagnostic module. You can execute the display hardware-failure-detection command and check for the records at the time when the card was isolated. If the LPU is faulty, replace the LPU and contact H3C Support.

<Sysname>display hardware-failure-detection

Current level:

chip       : isolate

board      : isolate

forwarding : isolate

----------------------------Slot  4 records:-------------------------------

Slot  0:

1. 2011-06-09, 04:34:14 rebooted by board detection.

Slot  4:

1. 2011-06-09, 11:16:39 rebooted by forwarding detection.

Slot  6:

1. 2011-06-09, 11:13:37 some auto-down ports on this slot are down by

forwarding detection.

2. 2010-06-09, 11:13:16 some auto-down ports on this slot are down by

forwarding detection.

3.     If switching fabric modules are isolated from the system, forwarding-plane failures might have been detected, and the system generates log messages such as "Forwarding fault" or "Board fault: chassis X slot Y, please check it." Verify whether the failure is removed after the switching fabric modules are isolated from the system. You can execute the display hardware-failure-detection command to display hardware failure detection and fix information.

○     If one switching fabric module is isolated and the forwarding-plane failure is removed after the isolation, the switching fabric module is faulty. Replace the switching fabric module and contact H3C Support.

○     If one switching fabric module is isolated but the forwarding-plane failure persists, the switching fabric module is not the cause, because an isolated switching fabric module does not participate in traffic forwarding. (The online diagnostic module can misjudge the faulty component when failures exist at multiple points.) Use the undo board-offline command to bring the switching fabric module back online. Then see "Troubleshooting hardware forwarding" to resolve the problem, and contact H3C Support.

○     If multiple switching fabric modules are isolated, the LPUs might be faulty. See "Troubleshooting hardware forwarding" to resolve the problem, and contact H3C Support.
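Checking the display hardware-failure-detection records against the time at which a card was isolated or rebooted can be sketched in Python as follows. The sketch assumes the command output has been captured as text in the layout shown in the sample output above, including records that wrap onto a second line. The function names and the one-hour window are illustrative assumptions.

```python
import re
from datetime import datetime

SLOT = re.compile(r"^Slot\s+(\d+):$")
# Record lines look like: 1. 2011-06-09, 11:16:39 rebooted by forwarding detection.
RECORD = re.compile(r"^\d+\.\s+(\d{4}-\d{2}-\d{2}), (\d{2}:\d{2}:\d{2})\s+(.*)$")


def parse_records(output):
    """Return a list of {'slot', 'time', 'event'} records from the command output."""
    records, slot, last = [], None, None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        slot_match = SLOT.match(line)
        record_match = RECORD.match(line)
        if slot_match:
            slot, last = int(slot_match.group(1)), None
        elif record_match and slot is not None:
            stamp = f"{record_match.group(1)} {record_match.group(2)}"
            last = {
                "slot": slot,
                "time": datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"),
                "event": record_match.group(3),
            }
            records.append(last)
        elif last is not None:
            # A wrapped record continues on the next line.
            last["event"] += " " + line
    return records


def records_between(records, start, end):
    """Keep only the records whose timestamp falls within [start, end]."""
    return [r for r in records if start <= r["time"] <= end]


# Captured text copied from the sample output in this section.
SAMPLE = """\
Current level:
chip       : isolate
board      : isolate
forwarding : isolate
----------------------------Slot  4 records:-------------------------------
Slot  0:
1. 2011-06-09, 04:34:14 rebooted by board detection.
Slot  4:
1. 2011-06-09, 11:16:39 rebooted by forwarding detection.
Slot  6:
1. 2011-06-09, 11:13:37 some auto-down ports on this slot are down by
forwarding detection.
2. 2010-06-09, 11:13:16 some auto-down ports on this slot are down by
forwarding detection.
"""

records = parse_records(SAMPLE)
window = records_between(
    records, datetime(2011, 6, 9, 11, 0, 0), datetime(2011, 6, 9, 12, 0, 0)
)
for record in window:
    print(record["slot"], record["time"], record["event"])
```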

In Illegal state

To resolve the problem:

1.     Verify that the switch supports the card.

2.     Verify that the switch software version supports the card. New cards cannot boot on an earlier software version. Upgrade the software version to support the new cards.

3.     Insert the card into another slot to determine whether the card is faulty.

4.     If the card remains in Illegal state in another slot, replace the card and contact H3C Support.

Unexpected reboot

Unexpected reboot means that a card has rebooted unexpectedly while its current state is Normal.

1.     View the log messages, or execute the display version command to determine the period during which the card rebooted. Then determine whether a user rebooted the card by using the reboot command or by powering off and then powering on the card during the period.

2.     On a switch running 18XX or a later version, the reason for the last reboot is displayed in the display version command output. You can check the Last reboot reason field for the event that caused the last reboot. As shown in the following example, the event that caused the last reboot was a user-initiated reboot.

<Sysname>display version

H3C Comware Platform Software

Comware Software, Version 5.20, Release 1825P01

Copyright (c) 2004-2013 Hangzhou H3C Tech. Co., Ltd. All rights reserved.

H3C S12504 uptime is 0 week, 0 day, 1 hour, 48 minutes

Last reboot reason : User reboot

 

LST1MRPNC1 1/0:  uptime is 0 week, 0 day, 1 hour, 48 minutes

Last reboot reason : User reboot

3456    Mbytes SDRAM

1024    Kbytes NVRAM Memory

Type     : LST1MRPNC1

……

3.     If all cards rebooted simultaneously, verify the following:

○     The power supplies operate correctly.

○     The power source is not powered off.

○     The power cables are connected securely.

4.     Verify that log messages such as "Slot X need to be rebooted automatically!" are not generated during the card reboot. If a message like that is displayed, replace the card and contact H3C Support.

5.     Verify that the message "Hardware error" is not displayed. If the message is displayed, view the error code:

○     If the error code is in the range of 0 to 31 or is 100 or greater, the power supply of the card is faulty. Replace the card and contact H3C Support.

○     For other error codes, contact H3C Support.

%@437307%May 15 22:03:02:122 2013 H3C DIAG/3/ERROR: Hardware error! chassis=1, slot=7, code=0

%@437308%May 15 22:03:02:122 2013 H3C DIAG/3/ERROR: Hardware error! chassis=1, slot=7, code=1

%@437309%May 15 22:03:02:122 2013 H3C DIAG/3/ERROR: Hardware error! chassis=1, slot=7, code=2

6.     Execute the display hardware-failure-detection command. Verify that there is no card reboot record in the determined reboot period in the command output. If there is a card reboot record in the determined period, contact H3C Support.

7.     If the problem persists, contact H3C Support.
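The error-code rule in step 5 of the preceding procedure can be applied to a saved log file with a short script. The following Python sketch assumes that "Hardware error" messages keep the chassis=, slot=, code= layout shown in the example messages. The function names are illustrative, and the last line of SAMPLE (code=50) is invented to exercise the second branch.

```python
import re

HARDWARE_ERROR = re.compile(
    r"Hardware error! chassis=(?P<chassis>\d+), slot=(?P<slot>\d+), code=(?P<code>\d+)"
)


def classify(code):
    """Apply the guide's rule: codes 0 through 31, or 100 and above, point to the card power supply."""
    if 0 <= code <= 31 or code >= 100:
        return "card power supply fault: replace the card and contact H3C Support"
    return "contact H3C Support"


def hardware_errors(log_text):
    """Return (chassis, slot, code, action) for each Hardware error message."""
    findings = []
    for match in HARDWARE_ERROR.finditer(log_text):
        code = int(match["code"])
        findings.append(
            (int(match["chassis"]), int(match["slot"]), code, classify(code))
        )
    return findings


# The first three lines are the example messages from step 5.
# The last line is hypothetical.
SAMPLE = """\
%@437307%May 15 22:03:02:122 2013 H3C DIAG/3/ERROR: Hardware error! chassis=1, slot=7, code=0
%@437308%May 15 22:03:02:122 2013 H3C DIAG/3/ERROR: Hardware error! chassis=1, slot=7, code=1
%@437309%May 15 22:03:02:122 2013 H3C DIAG/3/ERROR: Hardware error! chassis=1, slot=7, code=2
%@437310%May 15 22:03:02:122 2013 H3C DIAG/3/ERROR: Hardware error! chassis=1, slot=8, code=50
"""

findings = hardware_errors(SAMPLE)
for chassis, slot, code, action in findings:
    print(f"chassis {chassis} slot {slot} code {code}: {action}")
```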

Power supply failure

Symptom

The power LED on the switch indicates a failure. An alarm is generated, indicating that a power supply or power monitoring unit (PMU) is faulty, as shown in the following example:

%Sep 22 20:38:32:947 2009 H3C DEVD/3/PMU STATUS: Chassis 1: No.1 power monitor: absent.

%Sep 22 20:38:32:947 2009 H3C DEVD/4/PSU CHANGED: Chassis 1: PSU ID may be changed, please check it!

Solution

To resolve the problem:

1.     Verify that the power supply or PMU is securely installed and that the power supply or PMU LEDs do not indicate any failure. If LEDs of the power supply or PMU indicate any failure, remove and reinstall the power supply or PMU to make sure the module is installed securely. You can also determine whether the power supply or PMU is faulty by exchanging it with another one that runs correctly.

2.     Execute the display power-supply command to display the power supply information.

○     If the power supply and PMU are installed securely but the power supply status field is empty or Absent, a failure occurs. The fault cause is displayed following the status field:

-     If the cause is Under-vol, the power supply might not connect to the power cord, or the external power supply might have a bad contact.

-     For other causes, remove and reinstall the power supply to make sure the power supply is installed securely. You can also determine whether the power supply is faulty by exchanging it with another one that runs correctly.

○     Verify that the PMU information (System power monitoring unit in the command output) is displayed correctly. If the PMU information fails to be displayed, remove and reinstall the PMU, and determine whether the PMU is faulty by exchanging it with another one that runs correctly.

3.     Verify that the card power states are On. For a card that is installed securely in a slot, do one of the following, depending on the state of the card:

○     In Absent state: See "In Absent state" to remove the failure.

○     In Wait state: The system power is insufficient, and the card is waiting to be powered on. Verify that the power source and the power supplies run correctly.

○     In Off state: The card was powered off because of a user operation, over-temperature protection, or a power supply failure, and it will not power on automatically. See "In Off state" to resolve the problem.

4.     If a power supply or PMU is faulty, replace the module. If the problem persists, contact H3C Support.

The following is a sample output of the display power-supply command:

<Sysname>display power-supply

Power info on chassis 0:

PSU 1/1    state: Normal

PSU 1/2    state: Normal

PSU 1/3    state: Normal

PSU 1/4    state: Normal

PSU 1/5    state: Normal

PSU 1/6    state: Normal

PSU 2/1    state: Normal

PSU 2/2    state: Normal

PSU 2/3    state: Normal

PSU 2/4    state: Normal

PSU 2/5    state: Normal

PSU 2/6    state: Normal

 

<Sysname>display power-supply verbose

 

Power info on chassis 0:

System power-supply policy: enable

System power-module redundant(configured): 1

System power usable: 22000 Watts

System power redundant(actual): 2000 Watts

System power allocated: 7350 Watts

System power available: 14650 Watts

SYSTEM POWER USED(CURRENT): 4959.21 Watts

 

System power monitoring unit 1:

        Software version: 107

 

System power monitoring unit 2:

        Software version: 107

 

Type        In/Out  Rated-Vol(V)  Existing  Usable  Redundant(actual)

----------  ------  ------------  --------  ------  -----------------

PSE9000-A   AC/DC   220(default)  12        11      1

 

DC output voltage information:

Tray Value(V)  Upper-Threshold(V)  Lower-Threshold(V)  Status

---- --------  ------------------  ------------------  -------

  1  50.08     51.00               49.00               Normal

  2  50.10     51.00               49.00               Normal

 

DC output current information:

Total current(A): 99.00

Branch   Value(A)

------   --------

 1/1      9.20

 1/2      8.00

 1/3      8.40

 1/4      7.40

 1/5      9.00

 1/6      7.60

 2/1      7.60

 2/2      9.00

 2/3      7.60

 2/4      7.60

 2/5      9.00

 2/6      8.60

 

PSU Status:

ID  Status  Input-Err   Output-Err High-Temperature Fan-Err Closed Current-Limit

--- ------- ----------- ---------- ---------------- ------- ------ -------------

1/1 Normal

1/2 Normal

1/3 Normal

1/4 Normal

1/5 Normal

1/6 Normal

2/1 Normal

2/2 Normal

2/3 Normal

2/4 Normal

2/5 Normal

2/6 Normal

 

Line-card power status:

Slot  Board-Type       Watts  Status

----  ---------------  -----  ------

 2    LST1XP8LEB1      280    On

 3    LST1XP8LEB1      280    On

 4    LST1XP8LEB1      280    On

 5    LST1XP8LEB1      280    On

 6    LST1XP8LEB1      280    On

 7    LST1XP8LEB1      280    On

 8    LST1XP8LEB1      280    On

 9    LST1XP8LEB1      280    On

10    LST1XP8LEB1      280    On

11    LST1XP8LEB1      280    On

12    LST1XP8LEB1      240    On

13    LST1XP8LEB1      280    On

14    LST1XP8LEB1      240    On

15    LST1XP8LEB1      240    On

16    LST1XP8LEB1      280    On

17    LST1XP8LEB1      280    On

18    LST1XP8LEB1      280    On

19    LST1XP8LEB1      280    On
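The figures in the sample output above can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: the constant and function names are hypothetical, the values are copied by hand from the sample output, and the relation "available = usable - allocated" is an assumption inferred from that sample rather than a documented formula.

```python
# Values copied from the sample "display power-supply verbose" output.
USABLE_W = 22000
ALLOCATED_W = 7350
# tray -> (measured voltage, lower threshold, upper threshold)
TRAY_VOLTAGES = {1: (50.08, 49.00, 51.00), 2: (50.10, 49.00, 51.00)}
BRANCH_CURRENTS_A = {
    "1/1": 9.20, "1/2": 8.00, "1/3": 8.40, "1/4": 7.40, "1/5": 9.00, "1/6": 7.60,
    "2/1": 7.60, "2/2": 9.00, "2/3": 7.60, "2/4": 7.60, "2/5": 9.00, "2/6": 8.60,
}


def power_headroom(usable_w, allocated_w):
    """Power still available for allocation (assumed: usable minus allocated)."""
    return usable_w - allocated_w


def trays_out_of_range(trays):
    """Return the trays whose DC output voltage is outside its thresholds."""
    return sorted(t for t, (value, low, high) in trays.items()
                  if not low <= value <= high)


def total_current(branches):
    """Sum the per-branch currents, rounded to two decimals like the CLI."""
    return round(sum(branches.values()), 2)


if __name__ == "__main__":
    print("available watts:", power_headroom(USABLE_W, ALLOCATED_W))
    print("trays out of range:", trays_out_of_range(TRAY_VOLTAGES))
    print("total current (A):", total_current(BRANCH_CURRENTS_A))
```

If a computed value disagrees with the corresponding field in the real output, recheck the copied figures before suspecting the hardware.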

Fan failure

Symptom

The fan tray LEDs indicate a failure. A fan error message is displayed on the switch, as shown in the following example:

%Sep 22 20:38:32:947 2009 H3C DEVD/3/ FAN CHANGE: Chassis 1: Fan communication state changed: Fan 1 changed to fault.

Solution

To resolve the problem:

1.     Put your hand at the air outlet to verify that there is air being exhausted from the air outlet. If no air is being exhausted from the outlet, the fans are faulty.

2.     Verify that the airflow is not blocked at the air inlet and outlet.

3.     Verify that the fan tray is securely installed. You can remove and reinstall the fan tray to make sure that the fan tray is securely installed.

4.     Verify that the status of each fan is normal and that the speed difference between the fans does not exceed 50%. Execute the display fan verbose command to display detailed information about the fans. If there is an abnormality, verify that the fan tray is not faulty by exchanging it with another one that runs correctly.

5.     If the problem persists, replace the fan tray. If there is no new fan tray, power off the switch to avoid damage caused by high temperatures. The switch can be used temporarily if there are cooling measures to maintain the switch operating temperature below 50°C (122°F).

<Sysname>display fan verbose

Fan-tray verbose state on chassis 0:

Fan-tray 1:

Software version: 108

Hardware version: Ver.A

CPLD version: 002

Fan number: 12

Temperature: 27 C

High temperature alarm threshold: 60 C

Low speed alarm threshold: 1450 rpm

Fan  Status      Speed(rpm)

---  ----------  ----------

 1   normal      3780

 2   normal      3780

 3   normal      3720

 4   normal      3840

 5   normal      3900

 6   normal      3660

 7   normal      3780

 8   normal      3840

 9   normal      3660

10   normal      2940

11   normal      2940

12   normal      2880

 

Fan-tray 2:

Software version: 108

Hardware version: Ver.A

CPLD version: 002

Fan number: 12

Temperature: 21 C

High temperature alarm threshold: 60 C

Low speed alarm threshold: 1450 rpm

Fan  Status      Speed(rpm)

---  ----------  ----------

 1   normal      3720

 2   normal      3720

 3   normal      3780

 4   normal      3660

 5   normal      3660

 6   normal      3720

 7   normal      3660

 8   normal      3660

 9   normal      3660

10   normal      2820

11   normal      2820

12   normal      2760
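The check in step 4 can be scripted against the speeds shown in the sample output. The following Python sketch is a hedged illustration: the function name is hypothetical, and it assumes that "speed difference" means the gap between the fastest and slowest fan divided by the fastest speed, which this guide does not define precisely.

```python
# Fan speeds (rpm) copied from Fan-tray 1 in the sample output.
TRAY1_RPM = {1: 3780, 2: 3780, 3: 3720, 4: 3840, 5: 3900, 6: 3660,
             7: 3780, 8: 3840, 9: 3660, 10: 2940, 11: 2940, 12: 2880}


def check_fans(speeds_rpm, low_speed_rpm=1450, max_spread=0.5):
    """Return (spread, slow_fans, healthy) for one fan tray.

    spread    -- (fastest - slowest) / fastest, an assumed definition
    slow_fans -- fans running below the low speed alarm threshold
    healthy   -- True when the spread is within limits and no fan is slow
    """
    fastest = max(speeds_rpm.values())
    slowest = min(speeds_rpm.values())
    spread = (fastest - slowest) / fastest
    slow_fans = sorted(f for f, rpm in speeds_rpm.items() if rpm < low_speed_rpm)
    return spread, slow_fans, spread <= max_spread and not slow_fans


if __name__ == "__main__":
    spread, slow, healthy = check_fans(TRAY1_RPM)
    print(f"spread={spread:.1%} slow={slow} healthy={healthy}")
```

The default thresholds mirror the 50% limit in step 4 and the 1450 rpm low speed alarm threshold shown in the sample output.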

Temperature alarm

Symptom

A low-temperature or high-temperature alarm is generated on the switch, as shown in the following example:

%Sep 22 20:38:32:947 2009 H3C DEVM/4/BOARD_TEMPERATURE_TOOHIGH: Board temperature is too high on Chassis 1 Slot 5, type is LST1GP48LEB1.

Solution

To resolve the problem:

1.     Verify that the ambient temperature is in the compliant range. If the temperature is too high, find the cause. Possible causes include poor ventilation in the equipment room or faulty air conditioning.

2.     Verify that the current temperature of the switch does not exceed the upper and lower warning and alarm thresholds. The card might be damaged when operating continuously at a high temperature. You can feel the card by hand, or execute the display environment command to display temperature information.

·     If the temperature is too high, see "Fan failure" to determine whether a fan failure is causing the problem.

·     If the Temperature field displays error or an abnormal value, the switch might be failing to access the card temperature sensor through the I2C bus. The switch accesses the transceiver modules through the same I2C bus, so check whether the transceiver module information is displayed correctly. If the switch can access the transceiver modules, use the temperature-limit command to reconfigure the temperature thresholds. Then use the display environment command to verify that the setting takes effect.

[Sysname]temperature-limit chassis 2 slot 0 hotspot 1 -20 85 90

<Sysname>display environment

System temperature information (degree centigrade):

-------------------------------------------------------------------------------

Slot  Sensor    Temperature  LowerLimit  WarningLimit  AlarmLimit ShutdownLimit

2/0   inflow  1       35         -25           70           85          N/A

2/0   outflow 1       40         -20           80           85          N/A

2/0   hotspot 1       43         -20           85           90          N/A

2/2   inflow  1       39         -20           70           85          N/A

2/2   outflow 1       40         -10           80           90          N/A

2/2   hotspot 1       41         -10           80           90          N/A

2/3   inflow  1       41         -20           70           85          N/A

2/3   outflow 1       57          15           80           85          N/A

2/3   hotspot 1       41         -20           75           80          N/A

2/3   hotspot 2       50           0           75           80          N/A

2/4   inflow  1       43         -20           70           85          N/A

2/4   outflow 1       60          15           80           85          N/A

2/4   hotspot 1       43         -20           75           80          N/A

2/4   hotspot 2       54           0           75           80          N/A

3.     If the problem persists, contact H3C Support.
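The threshold comparison in step 2 can be automated over saved display environment output. The following Python sketch is illustrative: the function names are hypothetical, the row layout is assumed to match the sample output above, and treating a reading equal to a limit as having reached that limit is an assumption.

```python
import re

# slot, sensor name, sensor index, temperature, lower, warning, alarm, shutdown
ROW = re.compile(
    r"^\s*(\d+/\d+)\s+(\w+)\s+(\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+(-?\d+)\s+\S+\s*$"
)

SAMPLE = """\
Slot  Sensor    Temperature  LowerLimit  WarningLimit  AlarmLimit ShutdownLimit
2/0   inflow  1       35         -25           70           85          N/A
2/3   outflow 1       57          15           80           85          N/A
2/4   hotspot 2       54           0           75           80          N/A
"""


def classify(temp, lower, warning, alarm):
    """Map one reading to a status string (boundary handling is assumed)."""
    if temp < lower:
        return "too low"
    if temp >= alarm:
        return "alarm"
    if temp >= warning:
        return "warning"
    return "normal"


def check_environment(output):
    """Return (slot, sensor, status) for every sensor row found in the output."""
    results = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            slot, sensor, index = match.group(1), match.group(2), match.group(3)
            temp, lower, warning, alarm = (int(match.group(i)) for i in range(4, 8))
            results.append((slot, f"{sensor} {index}",
                            classify(temp, lower, warning, alarm)))
    return results


if __name__ == "__main__":
    for row in check_environment(SAMPLE):
        print(row)
```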

Related commands

This section lists the commands that you might use for troubleshooting hardware.

 

Command                             Description
----------------------------------  -----------
display device                      Displays device information, including the card states.
display environment                 Displays the temperature statistics of the device, including the current temperature and temperature thresholds.
display fan                         Displays the operating states of fans.
display hardware-failure-detection  Displays hardware failure detection and rectification information, including the rectification actions for each failure and historic information about the last ten fault rectifications on each card.
display power-supply                Displays power supply information:
                                    ·     Enabled/disabled status of the power supply management function.
                                    ·     Power supply type, rated input voltage, and rated output power.
                                    ·     Number of redundant power supplies and the available, redundant, used, and remaining power of each power supply.
                                    ·     Status of the installed power supplies.
                                    ·     Power supply status of the LPUs.
display system working mode         Displays the current system operating mode.
display version                     Displays system version information, card running time, and cause of the last reboot.
save                                Saves the running configuration to a specific configuration file.
system working mode                 Sets the system operating mode to modify the hardware resources allocation. The command takes effect after the configuration is saved and the device reboots.
temperature-limit                   Sets the temperature alarm thresholds for the device.

 

Troubleshooting links and ports

This section provides troubleshooting information for common problems with links and ports.

Error packets on a port

Symptom

The output of the display interface command shows a nonzero error packet count in the incoming or outgoing traffic statistics of a port. The following is a sample output of the command:

<Sysname>display interface ten-gigabitethernet 1/2/0/6

Ten-GigabitEthernet1/2/0/6 current state: UP

 IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 80f6-2ec3-ac04

 Description: SH-B15A-0202-J20-H5800-L-01-te1/0/49

 Loopback is not set

 Media type is optical fiber, Port hardware type is 10G_BASE_SR_SFP

 10Gbps-speed mode, full-duplex mode

 Link speed type is force link, link duplex type is force link

 Flow-control is not enabled

 The Maximum Frame Length is 8168

 Broadcast MAX-ratio: 100%

 Unicast MAX-ratio: 100%

 Multicast MAX-ratio: 100%

 Allow jumbo frame to pass

 PVID: 1

 Link delay is 2(sec)

 Ethernet port mode: LAN

 Port link-type: trunk

  VLAN passing  : 1(default vlan), 10-28, 91-93, 106-108, 121-123, 184, 401, 999

  VLAN permitted: 1(default vlan), 2-4094

  Trunk port encapsulation: IEEE 802.1q

 Port priority: 2

 Last clearing of counters:  Never

 Peak value of input: 10070 bytes/sec, at 2013-05-14 19:11:30

 Peak value of output: 315310 bytes/sec, at 2013-05-14 19:56:27

 Last 300 seconds input:  0 packets/sec 90 bytes/sec 0%

 Last 300 seconds output:  0 packets/sec 530 bytes/sec 0%

 Input (total):  1617091 packets, 131185047 bytes

     1144855 unicasts, 79482 broadcasts, 392754 multicasts, - pauses

 Input (normal):  1617091 packets, 131185047 bytes

     1144855 unicasts, 79482 broadcasts, 392754 multicasts, 0 pauses

 Input:  0 input errors, 0 runts, 0 giants, 0 throttles

     0 CRC, 0 frame, 0 overruns, - aborts

     - ignored, - parity errors

 Output (total): 7779022 packets, 862020306 bytes

     1138915 unicasts, 3567900 broadcasts, 3072207 multicasts, - pauses

 Output (normal): 7779022 packets, 862020306 bytes

     1138915 unicasts, 3567900 broadcasts, 3072207 multicasts, 0 pauses

 Output: 0 output errors, - underruns, - buffer failures

     0 aborts, 0 deferred, 0 collisions, 0 late collisions

     - lost carrier, - no carrier

Table 4 Error packet fields for incoming packets

Field         Description
------------  -----------
input errors  Number of incoming error packets.
runts         Number of incoming frames shorter than 64 bytes, in correct format, and containing valid CRCs.
giants        Number of incoming frames larger than the maximum frame length configured on the interface.
CRC           Number of incoming frames that contained CRC errors.
frame         Number of incoming frames that contained CRC errors and a non-integer number of bytes.

 

Table 5 Error packet fields for outgoing packets

Field            Description
---------------  -----------
output errors    Number of outgoing error packets.
aborts           Number of packets that failed to be transmitted.
deferred         Number of frames that the interface failed to transmit when the delay exceeded two times the maximum packet transmission time because the medium was busy.
collisions       Number of frames that the interface stopped transmitting because Ethernet collisions were detected during transmission.
late collisions  Number of frames that the interface deferred to transmit after transmitting their first 512 bits because of detected collisions.
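The error fields described in Table 4 and Table 5 can be extracted from saved display interface output for comparison across samples. The following Python sketch is illustrative: the function names are hypothetical, and the parser assumes the line layout of the sample output in this section. Counters shown as a hyphen are skipped.

```python
import re

# A counter value (number or "-") followed by its field name.
PAIR = re.compile(r"(\d+|-)\s+([A-Za-z ]+?)\s*(?:,|$)")

SAMPLE = """\
 Input (total):  1617091 packets, 131185047 bytes
     1144855 unicasts, 79482 broadcasts, 392754 multicasts, - pauses
 Input:  0 input errors, 0 runts, 0 giants, 0 throttles
     0 CRC, 0 frame, 0 overruns, - aborts
     - ignored, - parity errors
 Output (total): 7779022 packets, 862020306 bytes
     1138915 unicasts, 3567900 broadcasts, 3072207 multicasts, - pauses
 Output: 0 output errors, - underruns, - buffer failures
     0 aborts, 0 deferred, 0 collisions, 0 late collisions
     - lost carrier, - no carrier
"""


def error_counters(output):
    """Return {(direction, field): count} for the error counter lines."""
    counters = {}
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Input:"):
            section, line = "input", line[len("Input:"):]
        elif line.startswith("Output:"):
            section, line = "output", line[len("Output:"):]
        elif line.startswith(("Input (", "Output (")):
            section = None  # traffic totals, not error counters
            continue
        elif section is None:
            continue
        for value, name in PAIR.findall(line):
            if value != "-":
                counters[(section, name)] = int(value)
    return counters


def nonzero(counters):
    """Keep only the counters that indicate errors."""
    return {key: count for key, count in counters.items() if count}


if __name__ == "__main__":
    print(nonzero(error_counters(SAMPLE)))
```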

 

Solution

The number of incoming error packets of the CRC, frame, and throttle types keeps increasing on a port

To resolve the problem:

1.     Use a tester to test the link, and verify that the link quality or fiber signal attenuation of the link is normal. If a link failure exists, replace the network cable or fiber.

A weak link quality or serious fiber signal attenuation will cause packet transmission errors.

2.     Verify that the transceiver module is operating correctly if a transceiver module is used.

For more information, see "Transceiver module failures."

3.     Use the network cable or fiber and transceiver module of the port to connect to another port that is operating correctly.

·     If error packets do not appear on the new port but appear again after the network cable or fiber and transceiver module are reconnected to the original port, the original port is faulty. Use another port that is operating correctly, and contact H3C Support.

·     If error packets still appear on the new port, the peer device or the intermediate transmission links might be faulty. Examine the peer device and intermediate transmission links.

4.     Verify that the peer device and intermediate devices are operating correctly.

5.     If the problem persists, contact H3C Support.

The number of incoming error packets of the overrun type keeps increasing on a port

The number of overrun packets keeps increasing on a port because the input rate exceeds the processing capability of the port, which causes congestion.

To resolve the problem:

1.     Execute the display interface command multiple times when both of the following are true:

·     Only one port cannot correctly send and receive packets, or only the device attached to one port cannot transmit traffic.

·     The other ports on the same interface card are operating correctly.

2.     Perform one of the following tasks, depending on the error packet count trend:

·     If the number of input errors increases, but the number of overruns does not increase, examine the fiber, transceiver module, and the peer device.

·     If the number of input errors increases and the increment is the same as the increment of overruns, the interface card might be internally congested or blocked:

-     If the number of overruns increases, but the number of packets in the Input (normal) field does not increase, all incoming packets become overruns, which indicates that the port is blocked. To resolve the problem, contact H3C Support.

-     If the number of overruns increases, and the number of packets in the Input (normal) field also increases, some of the incoming packets become overruns, which indicates that the port is congested. To resolve the problem, contact H3C Support.

3.     If the problem persists, contact H3C Support.
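The decision logic in step 2 compares counter increments between two runs of the display interface command. The following Python sketch expresses that logic under stated assumptions: the function name and dictionary keys are hypothetical, and the counter values in the usage example are invented for illustration, not taken from a real device.

```python
def classify_overrun_trend(before, after):
    """Compare two counter samples and suggest the next action.

    Each sample is a dict with the keys "input_errors", "overruns",
    and "input_normal" (the packet count of the Input (normal) field).
    """
    d_errors = after["input_errors"] - before["input_errors"]
    d_overruns = after["overruns"] - before["overruns"]
    d_normal = after["input_normal"] - before["input_normal"]

    if d_errors <= 0:
        return "no new input errors"
    if d_overruns == 0:
        return "examine the fiber, transceiver module, and peer device"
    if d_errors == d_overruns:
        if d_normal == 0:
            return "port blocked: contact H3C Support"
        return "port congested: contact H3C Support"
    return "mixed error types: examine the link and contact H3C Support"


if __name__ == "__main__":
    # Hypothetical samples taken a few minutes apart.
    first = {"input_errors": 100, "overruns": 100, "input_normal": 5000}
    second = {"input_errors": 160, "overruns": 160, "input_normal": 5000}
    print(classify_overrun_trend(first, second))
```

The final branch covers a case that the procedure does not describe (overruns growing alongside other error types), so its wording is a suggestion only.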

The number of incoming error packets of the jumbo type keeps increasing on a port

To resolve the problem:

1.     Verify that the jumbo frame configurations are the same on both ends, including:

·     Whether jumbo frame support is enabled.

·     The default maximum jumbo frame size allowed.

·     The configured maximum jumbo frame size allowed.

2.     If the problem persists, contact H3C Support.

The number of outgoing error packets keeps increasing on a port

To resolve the problem:

1.     Examine the duplex mode of the port. Configure the port to operate in full duplex mode if the port is operating in half duplex mode.

2.     If the problem persists, contact H3C Support.

A port fails to go up

Symptom

A port cannot go up.

Solution

To resolve the problem:

1.     Verify that the network cable or fiber link between ports is correct.

2.     Verify that the Rx end and the Tx end are correctly connected.

3.     Verify that the intermediate transmission link is correct by performing one of the following tasks:

·     Replace the network cable or fiber between ports.

·     Connect other ports that are operating correctly by using the network cable or fiber.

4.     Verify that the configurations of the local port and the peer port are correct, including whether the port is shut down, and its speed, duplex mode, negotiation mode, and MDI.

[Sysname]display current-configuration interface ten-gigabitethernet 1/6/0/1

#

interface Ten-GigabitEthernet1/6/0/1

 port link-mode bridge

 port link-type trunk

 port trunk permit vlan 1 3102

 port link-aggregation group 1

#

return

Table 6 Support for duplex modes

Duplex mode  10 Gbps        1000 Mbps      100 Mbps       10 Mbps
-----------  -------------  -------------  -------------  -------------
Full         Supported      Supported      Supported      Supported
Half         Not supported  Not supported  Not supported  Not supported

 

5.     If the port has a transceiver module installed, verify that the transceiver modules at both ends of the link are consistent in the rate, wavelength, and single-mode or multi-mode status.

[Sysname]display transceiver interface ten-gigabitethernet 2/9/0/1

Ten-GigabitEthernet2/9/0/1 transceiver information:

  Transceiver Type              : 10G_BASE_LRM_SFP

  Connector Type                : LC

  Wavelength(nm)                : 1310

  Transfer Distance(m)          : 220(50um),220(62.5um),220(om3)

  Digital Diagnostic Monitoring : YES

  Vendor Name                   : H3C.

6.     Replace the transceiver module with one that is known to operate correctly to determine whether the original transceiver module is faulty.

For more information, see "Transceiver module failures."

7.     If the transceiver module fails, replace the transceiver module, and contact H3C Support.
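The consistency check in step 5 can be done by comparing the display transceiver interface output collected from both ends of the link. The following Python sketch is illustrative: the function names are hypothetical, the local values come from the sample output above, and the peer values are invented to show a mismatch.

```python
LOCAL_OUTPUT = """\
Ten-GigabitEthernet2/9/0/1 transceiver information:
  Transceiver Type              : 10G_BASE_LRM_SFP
  Connector Type                : LC
  Wavelength(nm)                : 1310
  Transfer Distance(m)          : 220(50um),220(62.5um),220(om3)
  Digital Diagnostic Monitoring : YES
  Vendor Name                   : H3C.
"""


def parse_transceiver(output):
    """Turn 'Field : value' lines into a dict; lines without a value are skipped."""
    info = {}
    for line in output.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            info[key.strip()] = value.strip()
    return info


def mismatches(local, peer, fields=("Transceiver Type", "Wavelength(nm)")):
    """Return the compared fields whose values differ between the two ends."""
    return [field for field in fields if local.get(field) != peer.get(field)]


if __name__ == "__main__":
    local = parse_transceiver(LOCAL_OUTPUT)
    # Hypothetical peer module, used only to demonstrate a mismatch.
    peer = {"Transceiver Type": "10G_BASE_SR_SFP", "Wavelength(nm)": "850"}
    print(mismatches(local, peer))
```

The transceiver type string is used here as a stand-in for rate and fiber mode, which is a simplification. Compare additional fields if your modules encode that information elsewhere.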

A port in up state goes down

Symptom

A port in up state goes down.

Solution

To resolve the problem:

1.     Examine the logs of the local device and the peer device, and verify that a shutdown operation has not been performed.

2.     Examine the status of ports at both ends. Determine whether the port is shut down because of the protocol failures or because of the failures detected by the online diagnosis module.

3.     Contact H3C Support if Protect DOWN appears in the output for a port, for example, Ten-GigabitEthernet 2/8/0/1.

Protect DOWN means that the port goes down because the isolate keyword is specified for the hardware-failure-detection command. When the online diagnosis module detects port failures, the port will be shut down and isolated, so that the traffic can be switched to the backup link.

[Sysname]display interface ten-gigabitethernet 2/8/0/1

Ten-GigabitEthernet2/8/0/1 current state: DOWN ( Protect DOWN )

 IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 80f6-2ec3-ac05

 Description: SH-B15A-0202-V03-H5800-L-01-te1/0/50

 Loopback is not set

 Media type is optical fiber, Port hardware type is 10G_BASE_SR_SFP

 10Gbps-speed mode, full-duplex mode

 Link speed type is force link, link duplex type is force link

 Flow-control is not enabled

 ……

4.     Verify that the configurations of ports at both ends, network cables, transceiver modules, and fiber links are correct.

For more information, see "A port fails to go up."

5.     If the problem persists, contact H3C Support.
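When many ports must be checked, the state line of saved display interface output can be scanned for the Protect DOWN marker described in step 3. The following Python sketch is illustrative: the function name is hypothetical, and the pattern assumes the state line format shown in the sample output above.

```python
import re

# Example: "Ten-GigabitEthernet2/8/0/1 current state: DOWN ( Protect DOWN )"
STATE = re.compile(r"^(\S+) current state:\s*(\S+)(?:\s*\(\s*(.*?)\s*\))?")


def port_state(first_line):
    """Return (port, state, reason) from the first line of display interface.

    reason is None when the line carries no parenthesized qualifier.
    Returns None when the line is not a state line.
    """
    match = STATE.match(first_line.strip())
    if not match:
        return None
    return match.group(1), match.group(2), match.group(3)


if __name__ == "__main__":
    line = "Ten-GigabitEthernet2/8/0/1 current state: DOWN ( Protect DOWN )"
    port, state, reason = port_state(line)
    if reason == "Protect DOWN":
        print(f"{port} was isolated by hardware failure detection")
```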

A port frequently goes up and down

Symptom

A port frequently goes up and down.

Solution

1.     For a fiber port, verify that the transceiver module is operating correctly.

For more information, see "Transceiver module failures."

2.     For a copper port, the port status might be unstable when the speed and duplex mode are autonegotiated. Manually configure the speed and duplex mode for the port.

3.     Verify that the link, peer device, and intermediate devices are operating correctly.

4.     If the problem persists, contact H3C Support.

Transceiver module failures

Symptom

The interface with a transceiver module installed cannot go up, and alarms are present.

Solution

To resolve the problem:

1.     Check the alarms on the transceiver module:

·     If TX faults exist in the alarms, the peer port, fiber, or intermediate transmission devices might be faulty.

·     If RX faults or current and voltage faults exist in the alarms, examine the local port.

<Sysname>display transceiver alarm interface GigabitEthernet 2/0/1

GigabitEthernet2/0/1 transceiver current alarm information:

  TX fault

  PCS receive local fault

  Laser temperature fault

Table 7 Alarms on transceiver modules

Field                                             Description
------------------------------------------------  -----------
Alarms on SFP/SFP+ transceiver modules:
RX loss of signal                                 Received signals are lost.
RX power high                                     The received optical power is high.
RX power low                                      The received optical power is low.
TX fault                                          Transmission error.
TX bias high                                      The transmitted bias current is high.
TX bias low                                       The transmitted bias current is low.
TX power high                                     The transmitted optical power is high.
TX power low                                      The transmitted optical power is low.
Temp high                                         The temperature is high.
Temp low                                          The temperature is low.
Voltage high                                      The voltage is high.
Voltage low                                       The voltage is low.
Transceiver info I/O error                        Transceiver information read/write error.
Transceiver info checksum error                   Transceiver information checksum error.
Transceiver type and port configuration mismatch  The type of the transceiver module does not match the port configuration.
Transceiver type not supported by port hardware   The port does not support this type of transceiver module.

Alarms on XFP transceiver modules:
RX loss of signal                                 Received signals are lost.
RX not ready                                      The receiving status is not ready.
RX CDR loss of lock                               Receiving CDR loss of lock.
RX power high                                     The received optical power is high.
RX power low                                      The received optical power is low.
TX not ready                                      The transmission status is not ready.
TX fault                                          Transmission error.
TX CDR loss of lock                               Transmission CDR loss of lock.
TX bias high                                      The transmitted bias current is high.
TX bias low                                       The transmitted bias current is low.
TX power high                                     The transmitted optical power is high.
TX power low                                      The transmitted optical power is low.
Module not ready                                  The module is not ready.
APD supply fault                                  Avalanche photo diode error.
TEC fault                                         Thermoelectric cooler error.
Wavelength unlocked                               Wavelength loss of lock.
Temp high                                         The temperature is high.
Temp low                                          The temperature is low.
Voltage high                                      The voltage is high.
Voltage low                                       The voltage is low.
Transceiver info I/O error                        Transceiver information read/write error.
Transceiver info checksum error                   Transceiver information checksum error.
Transceiver type and port configuration mismatch  The type of the transceiver module does not match the port configuration.
Transceiver type not supported by port hardware   The port does not support this type of transceiver module.

 

2.     Cross-verify the transceiver module that might fail:

a.     Install the transceiver module in another fiber port.

b.     Replace the current transceiver module with a transceiver module that is operating correctly.

3.     Determine whether the transceiver module fails or the neighboring devices and intermediate transmission links fail.

4.     If the transceiver module fails, use the display transceiver diagnosis command to display the digital diagnosis parameters on the transceiver module, and contact H3C Support.

You might fail to query the digital diagnosis parameters of a non-H3C transceiver module. H3C recommends that you use H3C transceiver modules. To query the vendor of a transceiver module, use the display transceiver manuinfo command. If the value of the Vendor Name field is H3C, the transceiver module is customized by H3C.
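The alarm triage in step 1 can be sketched as a small lookup. The following Python example is illustrative only: the function name is hypothetical, and mapping alarm names to "local" or "peer or link" by their prefixes is an assumption derived from the two rules in step 1. Bias current and voltage alarms are treated as current and voltage faults on the local side.

```python
def triage_alarms(alarms):
    """Group transceiver alarm names by where step 1 suggests looking first."""
    groups = {"peer or link": [], "local port": [], "other": []}
    for alarm in alarms:
        name = alarm.strip()
        lowered = name.lower()
        if "bias" in lowered or lowered.startswith("voltage"):
            groups["local port"].append(name)    # current and voltage faults
        elif name.startswith("RX"):
            groups["local port"].append(name)    # RX faults
        elif name.startswith("TX"):
            groups["peer or link"].append(name)  # TX faults
        else:
            groups["other"].append(name)
    return groups


if __name__ == "__main__":
    # Alarm names from the sample display transceiver alarm output.
    sample = ["TX fault", "PCS receive local fault", "Laser temperature fault"]
    for where, names in triage_alarms(sample).items():
        print(f"{where}: {names}")
```

Alarms that land in the "other" group are not covered by step 1 and still need the cross-verification in step 2.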

Related commands

This section lists the commands that you might use for troubleshooting ports and links.

 

Command                        Description
-----------------------------  -----------
display current-configuration  Displays the running configuration. With an interface specified, this command displays the running configuration of the interface.
display interface              Displays the incoming traffic statistics, outgoing traffic statistics, and status of a port. In the output from this command, you can view whether error packets exist and view the error packet statistics.
display transceiver alarm      Displays alarms present on transceiver modules.
display transceiver diagnosis  Displays the current values of the digital diagnosis parameters on transceiver modules.
display transceiver interface  Displays key parameters of the transceiver module in a specified interface to verify whether the transceiver modules at both ends are consistent in the rate, wavelength, and single-mode or multi-mode status.
display transceiver manuinfo   Displays the electronic label information of a transceiver module to query the vendor of the transceiver module.

 

Troubleshooting hardware forwarding

Forwarding path problem

Symptom

When data forwarding path failure detection is enabled (it is enabled by default), the switch periodically sends test packets between LPUs to examine whether the forwarding chips on the LPUs are operating correctly.

[Sysname]forward-path check enable

If a forwarding problem occurs, the switch displays "Forwarding fault" or "Board fault" messages. For example:

%May 12 11:51:30:664 2013 H3C DIAG/3/ERROR: -Slot=12; Forwarding fault: slot 18 to slot 12

%May 12 11:51:30:664 2013 H3C DIAG/3/ERROR: -Slot=14; Forwarding fault: slot 18 to slot 14

%May 12 11:51:30:665 2013 H3C DIAG/3/ERROR: -Slot=13; Forwarding fault: slot 18 to slot 13

%May 12 11:51:30:665 2013 H3C DIAG/3/ERROR: -Slot=16; Forwarding fault: slot 18 to slot 16

%May 12 11:51:31:494 2013 H3C DIAG/3/ERROR: Board fault: chassis 0 slot 18,please check it

%May 12 11:51:31:702 2013 H3C DIAG/3/ERROR: Board fault: chassis 0 slot 18,please check it

Solution

The switch has MPUs, LPUs, and switching fabric modules. LPUs and switching fabric modules perform service traffic forwarding. Traffic is load balanced among the switching fabric modules. MPUs perform control and management. MPUs do not participate in service traffic forwarding.

To resolve the forwarding path problem:

·     If "Forwarding fault" messages show forwarding problems between multiple LPUs, it is likely that a switching fabric module has a problem. To locate the problem source, isolate switching fabric modules one by one. (An isolated switching fabric module does not participate in traffic forwarding. Isolating a switching fabric module does not result in packet loss.)

For example, do the following on an H3C S12508 switch in which slots 10 through 18 hold switching fabric modules:

a.     Isolate the switching fabric module in slot 10.

[Sysname] board-offline slot 10

Caution: This command is only for diagnostic purpose which will cause board normal service unusable. Continue? [Y/N]:y

Config successfully

b.     Observe for a while to see whether the problem disappears.

c.     If the problem disappears, the switching fabric module is likely to be the problem source. H3C recommends that you replace the module or install the module into another switch that is operating correctly to determine whether the module is really the problem source.

d.     If the problem persists, cancel the isolation.

[Sysname]undo board-offline slot 10

This command will reboot the specified board. Continue? [Y/N]:y

Config successfully

e.     After the switching fabric module in slot 10 starts up and operates correctly (in Normal state), isolate the switching fabric module in the next slot. Repeat the previous steps until you locate the failed switching fabric module and verify that other switching fabric modules are operating correctly.

·     If "Forwarding fault" messages show forwarding problems from the same LPU to multiple other LPUs, the LPU is likely to have a problem. If you are not sure whether the LPU has a problem, H3C recommends that you do the following to locate the problem source:

a.     Isolate switching fabric modules one by one, and observe whether the problem disappears.

b.     If the problem persists during the whole isolation process, the LPU might be the source of the problem. H3C recommends that you switch the services on the LPU to other LPUs and replace or isolate the LPU. If the problem is solved, the LPU is the source of the problem.
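Before you start isolating modules, it can help to group the "Forwarding fault" messages by source slot. The following Python sketch is illustrative: the function names are hypothetical, the message pattern is assumed from the sample log in this section, and the returned suggestion only paraphrases the two rules above. It does not replace the isolation procedure.

```python
import re
from collections import defaultdict

FAULT = re.compile(r"Forwarding fault: slot (\d+) to slot (\d+)")

SAMPLE_LOG = """\
%May 12 11:51:30:664 2013 H3C DIAG/3/ERROR: -Slot=12; Forwarding fault: slot 18 to slot 12
%May 12 11:51:30:664 2013 H3C DIAG/3/ERROR: -Slot=14; Forwarding fault: slot 18 to slot 14
%May 12 11:51:30:665 2013 H3C DIAG/3/ERROR: -Slot=13; Forwarding fault: slot 18 to slot 13
%May 12 11:51:30:665 2013 H3C DIAG/3/ERROR: -Slot=16; Forwarding fault: slot 18 to slot 16
"""


def group_faults(log_text):
    """Return {source slot: sorted destination slots} from the log text."""
    by_source = defaultdict(set)
    for src, dst in FAULT.findall(log_text):
        by_source[int(src)].add(int(dst))
    return {src: sorted(dsts) for src, dsts in by_source.items()}


def first_suspect(by_source):
    """Suggest where to look first, following the two rules in this section."""
    if not by_source:
        return "no forwarding faults logged"
    if len(by_source) == 1:
        (src, dsts), = by_source.items()
        if len(dsts) > 1:
            return f"card in slot {src}"
        return "inconclusive: only one path reported"
    return "switching fabric module: isolate the modules one by one"


if __name__ == "__main__":
    faults = group_faults(SAMPLE_LOG)
    print(faults)
    print(first_suspect(faults))
```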

Online hardware diagnostic and failure protection

After you enable the hardware failure detection function, the switch automatically detects hardware failures on the following elements:

·     chip—Components.

·     board—Cards.

·     forwarding—Forwarding plane.

You can configure the switch to take the following actions in response to hardware failures:

·     off—Takes no action.

·     warning—Sends traps to notify you of the failures. (The default setting is warning.)

·     reset—Restarts the relevant cards to recover from failures.

·     isolate—Shuts down the relevant ports, prohibits loading software for the relevant cards, isolates the relevant cards, or powers off the relevant cards to reduce impact from the failures.

If there are backup links, H3C recommends that you configure the switch to take the isolate action. This action isolates the failed element and helps recover services quickly. The following shows the configuration commands:

[Sysname]hardware-failure-detection chip isolate

Config successfully

[Sysname]hardware-failure-detection board isolate

Config successfully

[Sysname]hardware-failure-detection forwarding isolate

Config successfully

To display hardware failure detection and fix information, use the following command:

<Sysname>display hardware-failure-detection

Current level:

    chip       : warning

    board      : warning

    forwarding : warning

---------------------Chassis 1, Slot  0 executed records:-------------------

                 There is no record.

 

---------------------Chassis 1, Slot  0 trapped records:--------------------

                 There is no record.

Related commands

This section lists the commands that you might use for troubleshooting hardware forwarding.

 

Command                             Description
----------------------------------  -----------
board-offline                       Isolates a card from the system.
display hardware-failure-detection  Displays hardware failure detection and fix information, including the following items:
                                    ·     Protection actions configured for hardware failures.
                                    ·     Most recent 10 fix records of each card.
forward-path check enable           Enables data forwarding path failure detection to examine whether data forwarding paths are operating correctly.
hardware-failure-detection          Configures hardware failure detection, and specifies the actions to be taken in response to hardware failures. The purpose is to enable the device to automatically detect hardware failures and recover services.

 

Troubleshooting packet forwarding failure

Ping failure or packet loss

Symptom

A ping operation fails or packets are lost, as shown in the following example:

<Sysname>ping 10.0.0.5

  PING 10.0.0.5: 56  data bytes, press CTRL_C to break

    Request time out

    Request time out

    Request time out

    Request time out

    Request time out

                                                                               

  --- 10.0.0.5 ping statistics ---

    5 packet(s) transmitted

    0 packet(s) received

    100.00% packet loss

Solution

Packet statistics collection

To resolve the problem, collect packet statistics by using packet capture tools or by configuring ACL rules. The following example uses an ACL rule.

1.     Create an IPv4 advanced ACL rule to permit IP packets destined for 1.1.1.1.

[Sysname] acl number 3000

[Sysname-acl-adv-3000] rule 1 permit ip destination 1.1.1.1 0

[Sysname-acl-adv-3000] quit

2.     Define a traffic class and a traffic behavior.

[Sysname] traffic classifier statistic_1

[Sysname-classifier-statistic_1] if-match acl 3000

[Sysname-classifier-statistic_1] quit

[Sysname] traffic behavior statistic_1

[Sysname-behavior-statistic_1] accounting packet

[Sysname-behavior-statistic_1] quit

3.     Create a QoS policy, and associate traffic class statistic_1 with traffic behavior statistic_1 in the QoS policy.

[Sysname] qos policy statistic_1

[Sysname-qospolicy-statistic_1] classifier statistic_1 behavior statistic_1

[Sysname-qospolicy-statistic_1] quit

4.     Apply the QoS policy to the incoming traffic of GigabitEthernet 8/0/1.

[Sysname] interface gigabitethernet 8/0/1

[Sysname-GigabitEthernet8/0/1] qos apply policy statistic_1 inbound

5.     Display information about the QoS policies applied to GigabitEthernet 8/0/1.

[Sysname] display qos policy interface g8/0/1

Interface: GigabitEthernet8/0/1

 

  Direction: Inbound

 

  Policy: statistic_1

   Classifier: statistic_1

     Operator: AND

     Rule(s) : If-match acl 3000

     Behavior: statistic_1

      Accounting Enable:

        1000 (Packets)

Packet count

If the device does not receive any ping packets, check the neighboring device on the uplink. If the number of ping packets sent by the device is correct, check the neighboring device on the downlink. If the number of ping packets sent is incorrect, see "Layer 2 forwarding failure," "Layer 3 forwarding failure," and "MPLS forwarding failure."
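The three outcomes above can be written as a short decision function. The following Python sketch is illustrative: the function name and parameters are hypothetical, and it assumes that you have read the inbound and outbound packet counts from the accounting statistics and that you know how many ping packets the source transmitted.

```python
def next_step(received, sent, expected):
    """Suggest where to continue troubleshooting based on packet counts.

    received -- ping packets counted on the inbound interface
    sent     -- ping packets counted on the outbound interface
    expected -- ping packets transmitted by the source
    """
    if received == 0:
        return "check the neighboring device on the uplink"
    if sent == expected:
        return "check the neighboring device on the downlink"
    return "examine Layer 2, Layer 3, and MPLS forwarding on this device"


if __name__ == "__main__":
    # Hypothetical counts: 1000 packets arrived, but only 400 were forwarded.
    print(next_step(received=1000, sent=400, expected=1000))
```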

Layer 2 forwarding failure

Symptom

Layer 2 packet loss or ping failure occurs between a switch and a device on the same network segment and in the same VLAN.

A switch can perform Layer 2 forwarding only when the destination MAC address of a packet is different from any MAC address of the switch. The switch has multiple MAC addresses in an address range. The following output shows the MAC addresses of a VLAN interface on a switch:

[Sysname]display interface vlan-interface 10

Vlan-interface10 current state: UP

Line protocol current state: UP

Description: Vlan-interface10 Interface

The Maximum Transmit Unit is 1500

Internet Address is 10.0.0.1/24 Primary

IP Packet Frame Type: PKTFMT_ETHNT_2,  Hardware Address: 00e0-fc00-6503

IPv6 Packet Frame Type: PKTFMT_ETHNT_2,  Hardware Address: 00e0-fc00-6503

Last clearing of counters:  Never

    Last 300 seconds input rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec

    Last 300 seconds output rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec

    0 packets input, 0 bytes, 0 drops

    0 packets output, 0 bytes, 0 drops
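When you compare captured frames against the switch MAC addresses, a short script can help. The following Python sketch extracts the Hardware Address values from the `display interface` output and tests whether a destination MAC address differs from them. The function names are hypothetical, and the sketch checks only the addresses shown for one interface, not the full MAC address range of the switch.

```python
import re

# Matches H3C-style MAC addresses such as 00e0-fc00-6503.
HW_ADDR = re.compile(
    r"Hardware Address:\s*([0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4})"
)


def interface_macs(output: str) -> set[str]:
    """Collect the hardware addresses printed by 'display interface'."""
    return {mac.lower() for mac in HW_ADDR.findall(output)}


def normalize_mac(mac: str) -> str:
    """Convert a MAC such as 00:E0:FC:00:65:03 to the form 00e0-fc00-6503."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return "-".join(digits[i:i + 4] for i in range(0, 12, 4))


def is_bridged(dest_mac: str, output: str) -> bool:
    """Return True if the frame is a candidate for Layer 2 forwarding,
    that is, its destination MAC is not one of the interface addresses."""
    return normalize_mac(dest_mac) not in interface_macs(output)
```

A frame for which `is_bridged` returns False is addressed to the switch itself, so the Layer 3 checks apply instead.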

Solution

To resolve the problem:

1.     Verify that the following Layer 2 configurations are correct:

·     VLAN and PVID.

·     Packet filtering.

·     Traffic redirection.

·     Traffic policing.

·     Generic traffic shaping (GTS).

·     Unknown unicast suppression/multicast suppression/broadcast suppression.

2.     Verify that the learned MAC addresses are correct. If they are not, determine whether loops occur. To quickly restore forwarding, you can configure static MAC address entries.

<Sysname>display mac-address

MAC ADDR        VLAN ID   STATE            PORT INDEX              AGING TIME(s)

000f-e259-79c0    25      Learned          GigabitEthernet2/15/0/1      AGING

00e0-fc12-3456    25      Learned          GigabitEthernet2/15/0/1      AGING

0023-8956-7b00  3102      Learned          Ten-GigabitEthernet2/4/0/1   AGING

0023-8956-7b00  3202      Learned          Ten-GigabitEthernet2/4/0/8   AGING

                                                                               

  ---  4 mac address(es) found  ---

3.     Verify traffic statistics:

·     Execute the qos traffic-counter inbound command to collect statistics about the inbound traffic.

[Sysname]qos traffic-counter inbound counter0 slot 3 interface Gigabitethernet 3/0/1

·     Execute the display qos traffic-counter inbound command multiple times to observe the discarded packet counts in the inbound direction. If a count continuously increases, verify the port configurations according to Table 8. If the reasons for packet loss still cannot be determined, contact H3C Support.

[Sysname]display qos traffic-counter inbound counter0 slot 3

Slot 3 inbound counter0 mode:

 Interface: GigabitEthernet3/0/1

 VLAN: all

 

Traffic-counter summary:

 Bridge in frames: 0 packets

 Bridge local discarded: 0 packets

 Bridge vlan ingress filter discarded: 0 packets

 Bridge security filter discarded: 0 packets

Table 8 Command output

Field

Description

Bridge in frames

Number of incoming packets.

Bridge local discarded

A packet might be dropped due to the following reasons:

·     Traffic suppression is performed.

·     The outgoing interface is the same as the incoming interface, according to the MAC address table lookup result.

·     STP sets the state of the interface to discarding.

Bridge vlan ingress filter discarded

A packet might be dropped due to the following reasons:

·     The VLAN of the packet is different from the VLAN of the interface.

·     The VLAN of the packet has not been created.

Bridge security filter discarded

A packet might be dropped due to the following reasons:

·     The packet matches a blackhole MAC address entry. To display blackhole MAC address entries, execute the display mac-address blackhole command.

·     The packet fails the MAC authentication. To display MAC authentication settings and statistics, execute the display mac-authentication interface command.

·     The source MAC address of the packet is a multicast MAC address or broadcast MAC address.

·     The source MAC address of the packet is unknown to the interface.

 

·     Execute the qos traffic-counter outbound command to collect statistics about the outbound traffic.

[Sysname]qos traffic-counter outbound counter0 slot 4 interface Gigabitethernet 4/0/1

·     Execute the display qos traffic-counter outbound command multiple times to observe the discarded packet counts in the outbound direction. If a count continuously increases, verify the port configurations according to Table 9. If the reasons for packet loss still cannot be determined, contact H3C Support.

[Sysname]display qos traffic-counter outbound counter0 slot 4

Slot 4 outbound counter0 mode:

 Interface: GigabitEthernet4/0/1

 VLAN: all

 Local precedence: all

 Drop priority: all

 

Traffic-counter summary:

 Unicast: 0 packets

 Multicast: 0 packets

 Broadcast: 0 packets

 Control packets: 0 packets

 Bridge egress filtered packets: 0 packets

 Tail drop packets: 0 packets

 Multicast Tail drop packets: 2 packets

 Forward restrictions packets: 0 packets

Table 9 Command output

Field

Description

Unicast/Multicast/Broadcast

Number of packets that are not dropped.

Control packets

Number of control packets sent by the CPU.

Bridge egress filtered packets

A packet might be dropped due to the following reasons:

·     The VLAN of the packet is different from the VLAN of the interface.

·     STP sets the state of the interface to discarding.

·     RRPP or Smart Link blocks the interface.

·     The outgoing interface is down.

Tail drop packets

A packet might be dropped due to the following reasons:

·     The transmit queue is congested.

·     Traffic shaping is performed.

Multicast Tail drop packets

A multicast or broadcast packet might be dropped due to the following reasons:

·     No outgoing interface is configured for the packet.

·     STP blocks the interface.

·     The outgoing interface is down.

Forward restrictions packets

Number of packets that are prevented from being forwarded.
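Because the procedure asks you to run the display command several times and watch for counters that keep increasing, the comparison can be automated. The following Python sketch parses two samples of the `display qos traffic-counter` output and reports the counters that grew. The function names are hypothetical, and the sketch assumes that each counter is printed as `<field>: <number> packets`, as in the sample outputs in this section.

```python
import re

# One counter per line, for example " Bridge local discarded: 0 packets".
COUNTER = re.compile(r"^[ \t]*(.+?):[ \t]*(\d+) packets[ \t]*$", re.MULTILINE)


def parse_counters(output: str) -> dict[str, int]:
    """Return a mapping of counter name to packet count."""
    return {name.strip(): int(value) for name, value in COUNTER.findall(output)}


def increased_counters(before: str, after: str) -> dict[str, int]:
    """Return the counters whose value grew between two samples,
    mapped to the size of the increase."""
    first, second = parse_counters(before), parse_counters(after)
    return {
        name: second[name] - first.get(name, 0)
        for name in second
        if second[name] > first.get(name, 0)
    }
```

Look up each counter name that the function returns in Table 8 or Table 9 to find the configuration to verify.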

 

Layer 3 forwarding failure

Symptom

IP service failures, ping or tracert operation failures, or ping or tracert packet loss occurs.

A switch performs Layer 3 forwarding by using the driver IP forwarding table instead of the routing table. The route management module selects optimal routes through various protocols, and puts them into the FIB table. The FIB table synchronizes the routes to the driver IP forwarding table, which guides packet forwarding.

Figure 3 Relationship between the routing table and forwarding table

 

Solution

To resolve the problem:

1.     Use the mirroring function or capture packets to verify that the destination MAC address of packets is the MAC address of the switch.

A switch can perform Layer 3 forwarding only when the destination MAC address of a packet is the MAC address of the switch. The switch has multiple MAC addresses in an address range. The following output shows the MAC addresses of VLAN interfaces on a switch:

[Sysname]display interface vlan-interface 10

Vlan-interface10 current state: UP

Line protocol current state: UP

Description: Vlan-interface10 Interface

The Maximum Transmit Unit is 1500

Internet Address is 10.0.0.1/24 Primary

IP Packet Frame Type: PKTFMT_ETHNT_2,  Hardware Address: 00e0-fc00-6503

IPv6 Packet Frame Type: PKTFMT_ETHNT_2,  Hardware Address: 00e0-fc00-6503

Last clearing of counters:  Never

    Last 300 seconds input rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec

    Last 300 seconds output rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec

    0 packets input, 0 bytes, 0 drops

    0 packets output, 0 bytes, 0 drops

2.     Verify that the route to the specific destination exists in the routing table. If it does not exist, examine the routing protocol configurations and protocol states.

[Sysname]display ip routing-table 1.1.1.0

Routing Table : Public

Summary Count : 1

                                                                               

Destination/Mask    Proto  Pre  Cost         NextHop         Interface

                                                                               

1.1.1.0/24           Static 60   0             20.0.0.2        Vlan20

3.     Verify that the route to the specific destination exists in the FIB table. If a route exists but cannot be used to guide the packet forwarding, contact H3C Support.

[Sysname]display fib 1.1.1.0

Destination count: 1    FIB entry count: 1

Flag:

  U:Useable G:Gateway H:Host B:Blackhole   D:Dynamic   S:Static

  R:Relay

Destination/Mask Nexthop   Flag  OutInterface  InnerLabel Token

1.1.1.0/24        20.0.0.2  USG    Vlan20         Null        Invalid

4.     Verify that the interfaces in the learned ARP entries are correct. If they are not, execute the reset arp command to clear ARP entries so that the device can learn the correct ARP entries. You can also configure static ARP entries. If the problem persists, contact H3C Support.

[Sysname]display arp 20.0.0.2

                Type: S-Static    D-Dynamic    A-Authorized    M-Multiport

IP Address       MAC Address     VLAN ID  Interface              Aging Type

20.0.0.2         0000-0000-0001  20        GE2/0/1                N/A    S
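Steps 2 through 4 check the same destination in three tables in turn. The following Python sketch models that chain with plain dictionaries so that the order of the checks is explicit. It is an illustration only: the function names and the dictionary layout are assumptions, the tables must be filled in from the command output by hand or by your own parser, and the sketch does not handle directly connected routes.

```python
import ipaddress


def best_match(table: dict[str, str], destination: str):
    """Return (network, next_hop) for the longest matching prefix, or None."""
    address = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in table.items()
        if address in ipaddress.ip_network(prefix)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0].prefixlen)


def diagnose(destination, routes, fib, arp, expected_port):
    """Walk routing table -> FIB -> ARP and name the first failing step.

    routes and fib map 'prefix/len' to a next hop; arp maps an IP address
    to the interface on which it was learned.
    """
    route = best_match(routes, destination)
    if route is None:
        return "no route: examine the routing protocol configuration and state"
    entry = best_match(fib, destination)
    if entry is None or entry != route:
        return "FIB entry missing or different from the routing table"
    next_hop = entry[1]
    if next_hop not in arp:
        return "no ARP entry for the next hop"
    if arp[next_hop] != expected_port:
        return "ARP entry on the wrong interface: clear it or configure a static entry"
    return "consistent"
```

With the sample outputs in this section, `diagnose("1.1.1.1", {"1.1.1.0/24": "20.0.0.2"}, {"1.1.1.0/24": "20.0.0.2"}, {"20.0.0.2": "GE2/0/1"}, "GE2/0/1")` returns `"consistent"`.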

MPLS forwarding failure

Symptom

You might experience the following problems with MPLS forwarding:

·     Unreachable destination.

·     No routes.

·     Error message printed.

·     Unstable tunnels.

·     Packet sending or receiving failure.

Solution

VLL, VPLS, and L3VPN are implemented based on LSPs.

To resolve the common problems with MPLS, verify the LSP and route configurations on the LSRs.

Figure 4 MPLS network diagram

 

Troubleshooting MPLS LSPs

Perform the following tasks on the ingress node (PE 1 in Figure 4):

1.     Execute the display mpls lsp command to display LSP information.

[PE1]display mpls lsp

-------------------------------------------------------------------------

LSP Information: LDP LSP

-------------------------------------------------------------------------

FEC                In/Out Label  In/Out IF                       Vrf Name

4.4.4.4/32         NULL/3        -/Vlan103

90.0.0.0/24        NULL/3        -/Vlan103

1.1.1.1/32         3/NULL        -/InLoop0

50.0.0.0/24        NULL/3        -/Vlan103

70.0.0.0/24        NULL/3        -/Vlan103

3.3.3.3/32         NULL/1025     -/Vlan103

If the configured LSP does not exist, verify the MPLS LSP configuration on each LSR.

2.     Execute the display mpls ldp peer command to display the LDP peer and session information.

[PE1]display mpls ldp peer

                                                                               

         LDP Peer Information in Public network

 Total number of peers: 1

-------------------------------------------------------------------------

 Peer-ID                Transport-Address  Discovery-Source

-------------------------------------------------------------------------

 4.4.4.4:0              4.4.4.4            Vlan-interface103

 ----------------------------------------------------------------

If the configured LSP is not up, verify the MPLS LSP configuration on each LSR.

3.     Execute the display mpls ldp session command to display LDP session information.

[PE1]display mpls ldp session

               LDP Session(s) in Public Network

 Total number of sessions: 1

-------------------------------------------------------------------------

 Peer-ID            Status        LAM  SsnRole  FT   MD5  KA-Sent/Rcv

-------------------------------------------------------------------------

 4.4.4.4:0    Non Existent        ---  Passive  Off  Off  0/0

-------------------------------------------------------------------------

 LAM : Label Advertisement Mode         FT  : Fault Tolerance

If the session status is not Operational, an error might occur. Go to steps 4 and 5 to further determine the problem. If the session status is Operational, go to step 6.

4.     Execute the display current-configuration configuration mpls-ldp command to display the MD5 password configuration for MPLS LDP.

<PE1>display current-configuration configuration mpls-ldp

#

mpls ldp

 md5-password cipher 2.2.2.2 GXA^DW>%V=_Q=^Q`MAF4<1!!

#

return

Verify that the local LSR and the peer LSR have the same MD5 password.

5.     Execute the display mpls ldp interface command to display LDP interface information.

[PE1]display mpls ldp interface

                                                                                

     LDP Interface Information in Public Network

-------------------------------------------------------------------------

 IF-Name         Status       LAM   Transport-Address   Hello-Sent/Rcv

-------------------------------------------------------------------------

 Vlan103         Active       DU    1.1.1.1             469/608

-------------------------------------------------------------------------

 LAM: Label Advertisement Mode         IF-Name: Interface name

If the label advertisement mode does not exist, verify the MPLS configuration on each LSR.

6.     Execute the display current-configuration command to display local LSR information.

<PE1>display current-configuration | include lsr-id

 mpls lsr-id 2.2.2.2

<PE1>display ip interface brief

*down: administratively down

(s): spoofing  (l): loopback

Interface          Physical Protocol IP Address      Description

Loop0              up       up(s)    100.100.100.100 --

Loop2              up       up(s)    100.100.100.102 --

M-E0/0/0           up       up       192.168.147.7   --

Vlan10             down     down     192.168.10.1    --

If the LSR ID is not the IP address of a loopback interface, H3C recommends that you configure the IP address of a loopback interface as the LSR ID by executing the mpls lsr-id command.

<PE1>system-view

[PE1]mpls lsr-id 100.100.100.100

7.     Verify that the VLAN interface is enabled with MPLS and MPLS LDP.

[PE1]interface vlan-interface 103

[PE1-Vlan-interface103]display this

#

interface Vlan-interface103

 ip address 1.1.1.2 255.255.255.0

 mpls

 mpls ldp

#

return
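The session check in step 3 is easy to script when you have many peers. The following Python sketch lists the LDP sessions whose status is not Operational. The function name is hypothetical, and the parser assumes the column layout of the sample `display mpls ldp session` output in this section, in which the Status column is separated from the next column by at least two spaces.

```python
import re

# Peer-ID (for example 4.4.4.4:0) followed by the Status column.
SESSION_ROW = re.compile(r"^\s*(\d+\.\d+\.\d+\.\d+:\d+)\s+(.+?)\s{2,}")


def non_operational_sessions(output: str) -> list[tuple[str, str]]:
    """Return (peer_id, status) for every session that is not Operational."""
    problems = []
    for line in output.splitlines():
        row = SESSION_ROW.match(line)
        if row and row.group(2) != "Operational":
            problems.append((row.group(1), row.group(2)))
    return problems
```

For each peer that the function returns, continue with the MD5 password and LDP interface checks in steps 4 and 5.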

Troubleshooting routes

Perform the following tasks on the ingress node (PE 1 in Figure 4):

1.     Execute the display ip routing-table command to display routing table information.

[PE1]display ip routing-table

Routing Tables: Public

         Destinations : 10       Routes : 10

 

Destination/Mask    Proto  Pre  Cost         NextHop         Interface

 

1.1.1.1/32          Direct 0    0            127.0.0.1       InLoop0

3.3.3.3/32          OSPF   10   2            103.0.0.4       Vlan103

4.4.4.4/32          OSPF   10   1            103.0.0.4       Vlan103

50.0.0.0/24         OSPF   10   2            103.0.0.4       Vlan103

70.0.0.0/24         OSPF   10   2            103.0.0.4       Vlan103

90.0.0.0/24         OSPF   10   2            103.0.0.4       Vlan103

103.0.0.0/24        Direct 0    0            103.0.0.1       Vlan103

103.0.0.1/32        Direct 0    0            127.0.0.1       InLoop0

127.0.0.0/8         Direct 0    0            127.0.0.1       InLoop0

127.0.0.1/32        Direct 0    0            127.0.0.1       InLoop0

Verify that the route entries include IP addresses of the loopback interfaces on PE 1, P, and PE 2, and the IP address of the remote device's VLAN interface. Otherwise, verify the routing protocol configuration on each LSR.

2.     Verify that the routing protocol operates correctly. If it does not, verify the routing protocol configuration on each LSR.

[PE1]display ospf peer

 

                   OSPF Process 1 with Router ID 1.1.1.1

                        Neighbor Brief Information

 

 Area: 0.0.0.0

 Router ID       Address         Pri Dead-Time Interface       State

 4.4.4.4         103.0.0.4       1   37        Vlan103         Full/BDR

3.     Verify that the loopback interface and the VLAN interface are advertised in the routing protocol.

[PE1-ospf-1]display this

#

ospf 1

 area 0.0.0.0

  network 103.0.0.0 0.0.0.255

  network 1.1.1.1 0.0.0.0

#

return

4.     Execute the debugging command to verify that routing protocol packets are sent and received correctly. If they are not, verify the routing protocol configurations on the local LSR and remote LSR.

<PE1>debugging ospf packet

*Mar  5 04:33:09:294 2022 H3C RM/6/RMDEBUG: OSPF 1: SEND Packet.

*Mar  5 04:33:09:365 2022 H3C RM/6/RMDEBUG: Source Address: 103.0.0.1

*Mar  5 04:33:09:446 2022 H3C RM/6/RMDEBUG: Destination Address: 224.0.0.5

*Mar  5 04:33:09:537 2022 H3C RM/6/RMDEBUG: Ver# 2, Type: 1, Length: 48.

*Mar  5 04:33:09:618 2022 H3C RM/6/RMDEBUG: Router: 1.1.1.1, Area: 0.0.0.0, Checksum: 9355.

*Mar  5 04:33:09:719 2022 H3C RM/6/RMDEBUG: AuType: 00, Key(ascii): 0 0 0 0 0 0 0 0.

*Mar  5 04:33:09:820 2022 H3C RM/6/RMDEBUG: Net Mask: 255.255.255.0, Hello Int: 10, Option: _E_.

*Mar  5 04:33:09:931 2022 H3C RM/6/RMDEBUG: Rtr Priority: 1, Dead Int: 40, DR: 103.0.0.1, BDR: 103.0.0.4.

*Mar  5 04:33:10:053 2022 H3C RM/6/RMDEBUG: Attached Neighbor: 4.4.4.4.

*Mar  5 04:33:10:437 2022 H3C RM/6/RMDEBUG: OSPF 1: RECV Packet.

*Mar  5 04:33:10:508 2022 H3C RM/6/RMDEBUG: Source Address: 103.0.0.4

*Mar  5 04:33:10:589 2022 H3C RM/6/RMDEBUG: Destination Address: 224.0.0.5

*Mar  5 04:33:10:680 2022 H3C RM/6/RMDEBUG: Ver# 2, Type: 1, Length: 48.

*Mar  5 04:33:10:761 2022 H3C RM/6/RMDEBUG: Router: 4.4.4.4, Area: 0.0.0.0, Checksum: 9355.

If the problem persists, contact H3C Support.
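The route check in step 1 amounts to confirming that a known set of prefixes appears in the routing table. The following Python sketch does that comparison. The function names are hypothetical, and the list of expected prefixes (loopback addresses and the remote VLAN interface network) is something you supply from your own network design.

```python
import re

# A route line starts with a destination such as 3.3.3.3/32.
ROUTE = re.compile(r"^[ \t]*(\d+\.\d+\.\d+\.\d+/\d+)[ \t]+\S+", re.MULTILINE)


def destinations(output: str) -> set[str]:
    """Return the destination prefixes listed by 'display ip routing-table'."""
    return set(ROUTE.findall(output))


def missing_routes(output: str, expected: list[str]) -> list[str]:
    """Return the expected prefixes that are absent from the routing table."""
    return sorted(set(expected) - destinations(output))
```

An empty result means that every expected prefix is present. Otherwise, verify the routing protocol configuration for the prefixes that are returned.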

Related commands

This section lists the commands that you might use for troubleshooting packet forwarding.

 

Command

Description

accounting packet

Configures a traffic accounting action in the traffic behavior database to count traffic in packets.

acl

Creates an ACL, and enters its view.

classifier behavior

Associates a traffic behavior with a traffic class in a QoS policy.

debugging ospf packet

Enables OSPF packet debugging to examine whether OSPF packets can be correctly sent and received.

display arp

Displays ARP entries to check whether output interfaces can be correctly learned through ARP.

display current-configuration | include lsr-id

Displays the current MPLS LSR ID.

display current-configuration configuration mpls-ldp

Displays information about MPLS LDP to verify the consistency of MD5 passwords.

display fib

Displays FIB entries to examine whether an entry matching a specific destination network exists in the FIB table.

display interface

Displays information about the specified interface.

display ip interface brief

Displays brief IP configuration information for the specified Layer 3 interface or all Layer 3 interfaces.

display ip routing-table

Displays brief information about active routes in the routing table to examine whether a route to the specified network exists in the routing table.

display mac-address

Displays MAC address entries to examine whether interfaces can be correctly learned.

display mpls ldp interface

Displays LDP interface information to examine whether the corresponding label advertisement mode exists.

display mpls ldp peer

Displays LDP peer information to examine whether the configured LSPs are up.

display mpls ldp session

Displays LDP session information.

display mpls lsp

Displays information about LSPs.

display ospf peer

Displays information about OSPF neighbors.

display qos policy interface

Displays information about the QoS policy or policies applied to an interface.

display qos traffic-counter

Displays the traffic statistics collected by the specified counter, and displays the configuration of the counter.

display this

Displays the running configuration in the current view.

interface

Enters interface view.

rule

Creates an ACL rule.

traffic behavior

Creates a traffic behavior and enters traffic behavior view.

traffic classifier

Creates a class and enters class view.

qos apply policy

Applies a QoS policy to a port.

qos policy

Creates a QoS policy and enters QoS policy view.

qos traffic-counter

Enables the traffic accounting function, and specifies the type of traffic.

mpls lsr-id

Configures an LSR ID for the local LSR.

ping

Examines whether the destination IP address is reachable, and displays related statistics.

 

Troubleshooting IRF

This section provides troubleshooting information for common problems with IRF.

IRF fabric establishment failure

Symptom

An H3C S12500 IRF fabric cannot be established.

Solution

To resolve the problem:

1.     Verify that all member chassis run the same software version and use the same type of MPUs:

a.     Execute the display device command. Check the Brd Type and Software Version fields for the MPU type and software version.

<Sysname> display device

Slot No.   Brd Type        Brd Status   Software Version

 1/0       LST1MRPNC1      Master       S12500-CMW520-R1728P02

 1/1       LST1MRPNC1      Slave        S12500-CMW520-R1728P02

 1/2       LST1XP16LEC1    Normal       S12500-CMW520-R1728P02

 1/3       LST1XP16LEC1    Normal       S12500-CMW520-R1728P02

 1/4       LST1XP16LEC1    Normal       S12500-CMW520-R1728P02

 1/5       NONE            Absent       NONE

 1/6       NONE            Absent       NONE

 1/7       NONE            Absent       NONE

 1/8       NONE            Absent       NONE

 1/9       LST1GP48LEC1    Normal       S12500-CMW520-R1728P02

 1/10      LST2SF08C1      Normal       S12500-CMW520-R1728P02

b.     If the member chassis run different software versions, upgrade the software to the same version. If they use different types of MPUs, replace MPUs.

2.     Verify that at least one IRF physical port is up for an IRF port:

 

 

NOTE:

An IRF port goes down only if all its physical ports are down.

 

a.     Execute the display interface command. Check the current state field for the status of an IRF physical port. For example:

<Sysname> display interface gigabitethernet 1/5/0/1

 GigabitEthernet1/5/0/1 current state: UP

 IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0023-8956-7a04

 Description: GigabitEthernet1/5/0/1 Interface

 Media type is twisted pair, Port hardware type is 1000_BASE_T

……

b.     If all physical ports bound to an IRF port are down, bring them up. H3C recommends binding multiple physical ports to an IRF port for redundancy.

3.     Verify that all IRF physical ports are connected correctly:

 

IMPORTANT:

When you connect two neighboring IRF members, you must connect the physical ports of IRF-port 1 on one member to the physical ports of IRF-port 2 on the other.

 

a.     Execute the display irf configuration command. Check the IRF-Port1 and IRF-Port2 fields for IRF port bindings.

<Sysname> display irf configuration

 MemberID  NewID  IRF-Port1                        IRF-Port2

  1         1     Ten-GigabitEthernet1/8/0/1       disable

                  Ten-GigabitEthernet1/8/0/2                              

  2         2     disable                          Ten-GigabitEthernet2/12/0/1

                                                   Ten-GigabitEthernet2/12/0/2

b.     Verify that the physical IRF connections are consistent with the IRF port bindings. In this example, Ten-GigabitEthernet 1/8/0/1 and Ten-GigabitEthernet 1/8/0/2 on member chassis 1 must be connected to Ten-GigabitEthernet 2/12/0/1 and Ten-GigabitEthernet 2/12/0/2 on member chassis 2.

c.     If connection errors exist, reconnect the IRF physical ports.

4.     Verify that all member chassis use the same system operating mode:

a.     Execute the display system working mode command on each member chassis. Check the command output for mode inconsistency.

[Sysname] display system working mode

Current system working mode      : Routee

Working mode after system restart: Routee

Notice: Changing working mode will take effect only after system restart.

b.     If mode inconsistency exists, execute the system working mode command to change the system operating mode. The command setting takes effect after a system reboot.

5.     Verify that the configuration is the same across all chassis:

 

IMPORTANT:

The settings for the following commands must be the same across all chassis: acl ipv6, acl mode, irf mode enhanced, portal-roaming enable, and vpn popgo.

 

a.     Execute the display this command. Check the configuration on each member chassis for configuration inconsistency.

[Sysname] display this

 ……

 acl ipv6 disable

 portal-roaming enable

 undo vpn popgo

 system working mode routee

……

b.     If configuration inconsistency exists, modify the configuration.

6.     If the problem persists, contact H3C Support.
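The software version and MPU type comparison in step 1 can be sketched in Python when you have saved the `display device` output from each chassis. The function names are hypothetical. The sketch assumes the four-column layout shown in the sample output, and it treats cards in Master or Slave state as MPUs, which is an assumption based on that sample.

```python
import re

# Slot, board type, board status, software version.
ROW = re.compile(r"^\s*(\d+(?:/\d+)?)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def parse_device(output: str) -> list[dict[str, str]]:
    """Return one record per installed card in 'display device' output."""
    cards = []
    for line in output.splitlines():
        row = ROW.match(line)
        if row and row.group(3) != "Absent":
            slot, board, status, version = row.groups()
            cards.append(
                {"slot": slot, "board": board, "status": status, "version": version}
            )
    return cards


def irf_mismatches(outputs: dict[str, str]) -> dict[str, list[str]]:
    """Compare all chassis; a non-empty list names the conflicting values."""
    versions, mpu_types = set(), set()
    for output in outputs.values():
        for card in parse_device(output):
            versions.add(card["version"])
            if card["status"] in ("Master", "Slave"):  # assumed to be MPUs
                mpu_types.add(card["board"])
    return {
        "software_versions": sorted(versions) if len(versions) > 1 else [],
        "mpu_types": sorted(mpu_types) if len(mpu_types) > 1 else [],
    }
```

Pass a dictionary that maps a label for each chassis to its saved output. Two empty lists in the result mean that no version or MPU type conflict was found.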

IRF split

Symptom

An IRF fabric splits.

Solution

To resolve the problem:

1.     Use the system log to identify the IRF split time.

You can use this information to search the system log for events that might cause the split.

%Jan 13 19:31:22:476 2010 H3C STM/4/LINK STATUS CHANGE:

 IRF port 1 is down because heartbeat timed out.

%Jan 13 19:31:22:689 2010 H3C STM/4/LINK STATUS CHANGE:

 IRF port 1 is down.

2.     Verify that all interface cards that have IRF physical ports are in Normal state:

a.     Execute the display device command. Check the Brd Status field for the card state.

<Sysname>display device

Slot No.   Brd Type        Brd Status   Software Version

 1/0       LST1MRPNC1      Master       S12500-CMW520-R1728P02

 1/1       LST1MRPNC1      Slave        S12500-CMW520-R1728P02

 1/2       LST1XP16LEC1    Normal       S12500-CMW520-R1728P02

 1/3       LST1XP16LEC1    Normal       S12500-CMW520-R1728P02

 1/4       LST1XP16LEC1    Normal       S12500-CMW520-R1728P02

 1/5       NONE            Absent       NONE

 1/6       NONE            Absent       NONE

 1/7       NONE            Absent       NONE

 1/8       NONE            Absent       NONE

 1/9       LST1GP48LEC1    Normal       S12500-CMW520-R1728P02

 1/10      LST2SF08C1      Normal       S12500-CMW520-R1728P02

 1/11      LST2SF08C1      Normal       S12500-CMW520-R1728P02

 1/12      LST2SF08C1      Normal       S12500-CMW520-R1728P02

 1/13      LST2SF08C1      Normal       S12500-CMW520-R1728P02

 1/14      LST2SF08C1      Normal       S12500-CMW520-R1728P02

 1/15      LST2SF08C1      Normal       S12500-CMW520-R1728P02

 1/16      LST2SF08C1      Normal       S12500-CMW520-R1728P02

 1/17      LST2SF08C1      Normal       S12500-CMW520-R1728P02

 1/18      LST2SF08C1      Normal       S12500-CMW520-R1728P02

b.     If an interface card is not in Normal state, use the methods described in "Card failure" to resolve the problem.

3.     Verify that each IRF port has at least one physical port in up state:

a.     Execute the display interface command. Check the current state field for the state of an IRF physical port. For example:

<Sysname> display interface gigabitethernet 1/5/0/1

 GigabitEthernet1/5/0/1 current state: UP

 IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0023-8956-7a04

 Description: GigabitEthernet1/5/0/1 Interface

 Media type is twisted pair, Port hardware type is 1000_BASE_T

……

b.     If any physical port bound to an IRF port is down, recover the link state and bring up the physical port (see "Troubleshooting links and ports").

4.     Remove hardware problems that might cause recurring IRF split events:

a.     Execute the display version command. Check the uptime of the member chassis, MPUs, and interface cards that have IRF links.

<Sysname> display version

H3C Comware Platform Software

Comware Software, Version 5.20, Release 1825P01

Copyright (c) 2004-2013 Hangzhou H3C Tech. Co., Ltd. All rights reserved.

H3C S12504 uptime is 0 week, 0 day, 1 hour, 48 minutes

Last reboot reason : User reboot

 

LST1MRPNC1 1/0:  uptime is 0 week, 0 day, 1 hour, 48 minutes

Last reboot reason : User reboot

3456    Mbytes SDRAM

1024    Kbytes NVRAM Memory

Type     : LST1MRPNC1

BootRom  : 1.22

Software : S12500-CMW520-R1825P01

Patch    : NONE

PCB      : Ver.B

……

b.     Compare the uptime of chassis, MPUs, and interface cards to determine whether a member chassis, MPU, or interface card rebooted before the IRF split.

c.     If the IRF split is caused by a chassis or card reboot, identify the reboot cause:

-     If the reboot occurred because of a hardware problem, replace the faulty component.

-     If the reboot occurred because of power failure, use the methods described in "Power supply failure" to remove the power supply problems.

5.     If the problem persists, contact H3C Support.
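The uptime comparison in step 4 can be sketched in Python. The sketch converts each uptime line in the `display version` output to a duration and flags the components that came back up at or after the split time, which suggests a reboot around the split. The function names are hypothetical, the split time comes from the system log as described in step 1, and `now` must be the time at which the output was collected.

```python
import re
from datetime import datetime, timedelta

# For example "LST1MRPNC1 1/0:  uptime is 0 week, 0 day, 1 hour, 48 minutes".
UPTIME = re.compile(
    r"^[ \t]*(.+?):?[ \t]+uptime is "
    r"(\d+) weeks?, (\d+) days?, (\d+) hours?, (\d+) minutes?",
    re.MULTILINE,
)


def parse_uptimes(output: str) -> dict[str, timedelta]:
    """Return a mapping of component name to uptime."""
    uptimes = {}
    for name, weeks, days, hours, minutes in UPTIME.findall(output):
        uptimes[name] = timedelta(
            weeks=int(weeks), days=int(days), hours=int(hours), minutes=int(minutes)
        )
    return uptimes


def restarted_since(output: str, split_time: datetime, now: datetime) -> list[str]:
    """Return the components whose boot time is at or after the split time."""
    return sorted(
        name
        for name, uptime in parse_uptimes(output).items()
        if now - uptime >= split_time
    )
```

A component that the function returns is a candidate for the reboot cause analysis in step c.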

Related commands

This section lists the commands that you might use for troubleshooting IRF.

 

Command

Description

display device

Displays device configuration.

Use this command to verify that all member chassis run the same software version and use the same type of MPUs.

display interface

Displays interface information.

Use this command to verify that each IRF port has at least one physical port in up state.

display irf configuration

Displays IRF configuration on each member chassis.

Use this command to identify physical ports bound to IRF-port 1 and IRF-port 2 on each member chassis before you check IRF physical connections.

display system working mode

Displays system operating mode.

Use this command to verify that all member chassis are operating in the same mode.

display this

Displays the running configuration in the current view.

In system view, verify that the settings for the following commands are the same across all chassis: acl ipv6, acl mode, irf mode enhanced, portal-roaming enable, and vpn popgo.

display version

Displays the system version and uptime as well as the uptime of each card.

Use this command to identify the uptime of each member chassis, MPU, and interface card that has IRF physical ports. Compare the uptime values to determine whether a member chassis, MPU, or interface card rebooted before an IRF split.

 

Troubleshooting system management

This section provides troubleshooting information for common problems with system management.

High CPU usage

Symptom

The CPU usage on a card stays above 60%.

<Sysname>display cpu-usage

Slot 0 CPU usage:

       0% in last 5 seconds

      61% in last 1 minute

       0% in last 5 minutes

 

Slot 0 CPU 1 CPU usage:

       0% in last 5 seconds

       0% in last 1 minute

       0% in last 5 minutes

Execute the display cpu-usage history command to display the CPU usage statistics within the last 60 minutes.

<Sysname>display cpu-usage history slot 0

100%|

 95%|

 90%|

 85%|

 80%|

 75%|

 70%|

 65%|

 60%|

 55%|

 50%|

 45%|

 40%|

 35%|                             #

 30%|                         #   #

 25%|                         #   #

 20%|           #             #   #                    #

 15%|          ##             #   #                   ##

 10%|          ##             #   #                   ##

  5%|############################################################

     ------------------------------------------------------------

              10        20        30        40        50        60  (minutes)

              cpu-usage last 60 minutes(SYSTEM)
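To spot the symptom across many cards, the `display cpu-usage` output can be checked by a script. The following Python sketch reports every CPU and time window whose usage exceeds a threshold. The function name is hypothetical, and the 60% default follows the symptom description in this section. A single sample does not prove that the condition persists, so run the check on several samples.

```python
import re

USAGE = re.compile(r"(\d+)% in last (.+)")


def high_cpu(output: str, threshold: int = 60) -> dict[str, list[tuple[str, int]]]:
    """Return {cpu_label: [(window, percent), ...]} for values above threshold."""
    results: dict[str, list[tuple[str, int]]] = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.endswith("CPU usage:"):
            # Header such as "Slot 0 CPU usage:" or "Slot 0 CPU 1 CPU usage:".
            current = line[: -len("CPU usage:")].strip()
            continue
        match = USAGE.match(line)
        if match and current is not None and int(match.group(1)) > threshold:
            results.setdefault(current, []).append(
                (match.group(2), int(match.group(1)))
            )
    return results
```

With the sample output in this section, the function reports the 61% value in the 1-minute window for slot 0.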

Solution

High CPU usage might occur because of the following issues:

·     Link loop.

·     Packet attack.

·     Route flapping.

·     Too many routing policies.

To resolve the problem:

1.     Execute the display cpu-usage number [verbose] [slot slot-number [cpu cpu-number] ] command to display tasks with high CPU usage.

<Sysname>display cpu-usage 5 verbose slot 0

===== CPU usage info (no:  0  idx: 31) =====

CPU Usage Stat. Cycle: 60 (Second)

CPU Usage            : 63%                        <--- CPU usage

CPU Usage Stat. Time : 2009-07-26  16:55:33       <--- Statistics collection time

CPU Usage Stat. Tick : 0x15(CPU Tick High) 0x429be6f6(CPU Tick Low)

Actual Stat. Cycle   : 0x0(CPU Tick High) 0xb2d2a975(CPU Tick Low)

 

TaskName        CPU        Runtime(CPU Tick High/CPU Tick Low)

VIDL            37%               0/77d02af4

TICK             0%               0/  469276

STMR             0%               0/   7d8c9

DIBC             0%               0/  3e1ecd

… …

 BFD             0%               0/   463ad

MFIB             0%               0/   ae8a6

IGSP             0%               0/     431

ROUT             0%               0/  30a6ed

TNLM             0%               0/   37a26

IFNT             0%               0/    833f

 co0            61%               0/39012f2b

The output shows that the CPU usage of the VIDL task and the co0 task is 37% and 61%, respectively. VIDL is the idle task, so a high value for VIDL indicates that the CPU is idle most of the time. In this example, the co0 task causes the high CPU usage. Identify the task that causes the high CPU usage on your device. For example, if the ROUT task causes the high CPU usage, the reason might be route flapping.

Table 10 Task description

Task    Description
VFS     Cross-card file system operation task.
VIDL    Idle task.
VMON    System monitoring task.
IPCB    IPC main task.
IPCD    IPC packet distribution task.
RPCQ    RPC packet sending timeout check task.
RPCD    RPC packet distribution task.
INFO    Information center task.
co0     Session task between a console user and the device.
au0     Session task between an AUX user and the device.
STM     STM main task.
STMH    STM hello packet sending task.
VLAN    VLAN task.
DDNS    Dynamic DNS task.
DNS     DNS task.
HTTP    HTTPD and HTTPS main task.
HDQx    Sub task for HTTP request processing.
MAC     MAC main task.
ARP     ARP task.
IP      IP task.
DHCP    DHCP task.
DHSE    DHCP snooping security entry processing task.
DHCC    DHCP client main task.
DHC6    DHCPv6 client task.
DHP6    DHCPv6 common task.
FIB6    IPv6 FIB task.
FIB     IPv4 FIB task.
ND      IPv6 ARP task.
LFIB    MPLS forwarding and FIB maintenance task.
L2V     MPLS L2 VPN task.
MACA    MAC authentication task.
ROUT    Routing management task.
BFD     Bidirectional forwarding detection task.
DLDP    DLDP task.
EOAM    Ethernet OAM task.
GARP    GVRP task.
LAGG    Link aggregation task.
LLDP    LLDP task.
LPDT    Loop detection task.
MAC     MAC address entry aging task.
MGRP    Port mirroring task.
MSTP    MSTP task.
MTLK    Monitor link task.
QINQ    QinQ task.
QOS     QoS task.
RRPP    RRPP task.
SMLK    Smart link.
DT1X    802.1X authentication task.
CF      CF card mounting/unmounting task.
L2AU    MAC AU message processing task.
L2HC    MAC check task.

 

2.     Execute the display route-policy command, and verify that the configured routing policies are reasonable and not excessive in number.

<Sysname> display route-policy

Route-policy : policy1

  permit : 10

        if-match ip-prefix abc

        apply cost 120

3.     Execute the display interface command, and check for link loops. An abnormally high traffic rate on a port might indicate a loop.

<Sysname>display interface ten-gigabitethernet 2/3/0/1

 Ten-GigabitEthernet2/3/0/1 current state: UP

 IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 000f-e20a-2005

 Description: Ten-GigabitEthernet2/3/0/1 Interface

……

 Last clearing of counters:  Never

 Peak value of input: 0 bytes/sec, at 2013-05-29 15:05:34

 Peak value of output: 1191343840 bytes/sec, at 2013-05-29 19:30:44

 Last 300 seconds input:  0 packets/sec 0 bytes/sec 0%

 Last 300 seconds output:  0 packets/sec 0 bytes/sec 0%

……

If a loop exists, verify the following:

·     The link connections and port configuration are correct.

·     STP is enabled, and the configuration is correct.

·     The STP status of the neighboring device is normal.

·     If all the previous configurations are correct, the reason might be:

-     An STP calculation error.

-     STP calculation is correct, but the driver does not block a port.

You can do the following:

·     Shut down the uplink port on the ring.

·     Remove and reinsert the transceiver module on the port to restart STP calculation.

·     Contact H3C Support.

4.     If the problem persists, contact H3C Support.
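The task triage described in step 1 can be sketched in Python as follows. This is a minimal sketch, assuming that the task rows in the display cpu-usage verbose output have the same layout as the sample in step 1. The function name busiest_tasks and the regular expression are illustrative and are not part of any H3C tool. The sketch skips the VIDL idle task and ranks the remaining tasks by CPU usage.

```python
import re

# Assumption: task rows look like "NAME  NN%  high/low", as in the sample output.
TASK_ROW = re.compile(r"^\s*(\S+)\s+(\d+)%\s+[0-9a-fA-F]+/\s*[0-9a-fA-F]+\s*$")

IDLE_TASK = "VIDL"  # idle task, per the task description table


def busiest_tasks(output, top=3):
    """Return (task, percent) pairs for non-idle tasks, busiest first."""
    tasks = []
    for line in output.splitlines():
        match = TASK_ROW.match(line)
        if match and match.group(1) != IDLE_TASK:
            tasks.append((match.group(1), int(match.group(2))))
    tasks.sort(key=lambda item: item[1], reverse=True)
    return tasks[:top]


# Excerpt of the sample output from step 1.
SAMPLE = """\
TaskName        CPU        Runtime(CPU Tick High/CPU Tick Low)
VIDL            37%               0/77d02af4
TICK             0%               0/  469276
ROUT             0%               0/  30a6ed
 co0            61%               0/39012f2b
"""

if __name__ == "__main__":
    for name, percent in busiest_tasks(SAMPLE):
        print(f"{name}: {percent}%")
```

For the sample excerpt, the first entry returned is the co0 task, which matches the analysis in step 1.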

Insufficient resources

Symptom

The system displays the following log and trap information when resources are insufficient:

%Oct 30 20:41:42:29 2011 LS-SHQ-9508 DRVL3/4/NO_RESOURCE:No enough resource: Insufficient system resources!

%Oct 30 20:41:42:29 2011 LS-SHQ-9508 DRVL3/4/NO_RESOURCE:No enough resource: Insufficient system resources!

%Oct 30 20:41:42:29 2011 LS-SHQ-9508 DRVL3/4/NO_RESOURCE:No enough resource: Insufficient system resources!

%Oct 30 20:41:42:29 2011 LS-SHQ-9508 DRVL3/4/NO_RESOURCE:No enough resource: Insufficient system resources!

%Oct 30 20:41:42:29 2011 LS-SHQ-9508 DRVL3/4/NO_RESOURCE:No enough resource: Insufficient system resources!

%Oct 30 20:41:42:29 2011 LS-SHQ-9508 DRVL3/4/NO_RESOURCE:No enough resource: Insufficient system resources!

 

[hntjjS12508]mirroring-group 2 monitor-port g4/0/34

Error: Local mirroring-group number exceeds hardware capability.
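When the same resource message repeats many times, it can help to count the messages per module before you choose which resource type to examine. The following Python sketch assumes that each log line has the layout shown in the sample above (a leading percent sign, a time stamp, the device name, and then MODULE/severity/MNEMONIC). The function name count_events is hypothetical.

```python
import re
from collections import Counter

# Assumed layout: "%<time stamp> <sysname> MODULE/severity/MNEMONIC:<text>"
LOG_LINE = re.compile(r"^%.*?\s(\w+)/(\d)/(\w+):")


def count_events(log_text):
    """Count log lines per (module, mnemonic) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(3))] += 1
    return counts


# Two of the sample log lines from this section.
SAMPLE_LOG = """\
%Oct 30 20:41:42:29 2011 LS-SHQ-9508 DRVL3/4/NO_RESOURCE:No enough resource: Insufficient system resources!
%Oct 30 20:41:42:29 2011 LS-SHQ-9508 DRVL3/4/NO_RESOURCE:No enough resource: Insufficient system resources!
"""

if __name__ == "__main__":
    for (module, mnemonic), total in count_events(SAMPLE_LOG).most_common():
        print(f"{module}/{mnemonic}: {total}")
```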

Solution

ACL resources

The following features use ACL resources:

·     QoS.

·     Packet filter.

·     Priority mapping and trust.

·     Mirror.

·     Protocol packet to CPU.

·     Selective QinQ and VLAN mapping.

·     Port binding, PORTAL, and EAD.

·     Broadcast suppression.

·     MAC-BASED-VLAN, VOICE VLAN, RSPAN, and UDP-Helper.

The system displays the following information for insufficient resources:

%Sep  9 13:56:24:871 2011 H3C DRVQACL/5/LOG_NOTICE: PCL resources are not enough.

To resolve the problem:

1.     Display the ACL usage on a card.

<Sysname>display acl resource chassis 2 slot 2

 Interface:

   GE2/2/0/1 to GE2/2/0/24

---------------------------------------------------------------------

 Type          Total       Reserved    Configured  Remaining   Usage

---------------------------------------------------------------------

 ACL rule      2048        0           89          1959        4%

 Inbound ACL   2048        0           3           1959        0%

 Outbound ACL  2048        0           86          1959        4%

                                                                                

 Interface:

   GE2/2/0/25 to GE2/2/0/48

---------------------------------------------------------------------

 Type          Total       Reserved    Configured  Remaining   Usage

---------------------------------------------------------------------

 ACL rule      2048        0           89          1959        4%

 Inbound ACL   2048        0           3           1959        0%

 Outbound ACL  2048        0           86          1959        4%

2.     If most ACL resources are allocated, optimize ACL configuration. For example, delete or combine ACL rules. If the configuration cannot be optimized, contact H3C Support.
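The check in step 2 can be sketched in Python as follows. The sketch assumes that the display acl resource output has the column layout shown in step 1. The function names and the 80% threshold are illustrative assumptions, not values defined by H3C.

```python
import re

# Assumed row layout: Type, Total, Reserved, Configured, Remaining, Usage%.
ROW = re.compile(r"^\s*(.+?)\s{2,}(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)%\s*$")


def parse_acl_resource(output):
    """Return one dict per resource row, tagged with its interface range."""
    rows = []
    port_range = None
    expect_range = False
    for line in output.splitlines():
        text = line.strip()
        if text == "Interface:":
            expect_range = True  # the next non-blank line names the ports
            continue
        if expect_range and text:
            port_range = text
            expect_range = False
            continue
        match = ROW.match(line)
        if match:
            rows.append({
                "ports": port_range,
                "type": match.group(1),
                "total": int(match.group(2)),
                "configured": int(match.group(4)),
                "remaining": int(match.group(5)),
                "usage": int(match.group(6)),
            })
    return rows


def nearly_full(rows, threshold=80):
    """Rows whose reported usage is at or above threshold (hypothetical value)."""
    return [row for row in rows if row["usage"] >= threshold]


SAMPLE = """\
 Interface:
   GE2/2/0/1 to GE2/2/0/24
---------------------------------------------------------------------
 Type          Total       Reserved    Configured  Remaining   Usage
---------------------------------------------------------------------
 ACL rule      2048        0           89          1959        4%
 Inbound ACL   2048        0           3           1959        0%
 Outbound ACL  2048        0           86          1959        4%
"""

if __name__ == "__main__":
    for row in nearly_full(parse_acl_resource(SAMPLE)):
        print(row["ports"], row["type"], f'{row["usage"]}%')
```

With the sample output, no row reaches the threshold, so the script prints nothing.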

Multicast resources

To resolve the problem:

1.     Display multicast statistics in the .diag file.

  ===============Display l3mc keyinfo slot 1===============

===============================================================

Resource Info:

 TCAM Resource: total 511 free 511

 Local DIT Resource: total 1003 free 871 usage list:

  L3MC: 132

  SUPERVLAN: 0

  VLL: 0

  VPLS: 0

  DIAG: 0

  BLG: 0

 Local VIDX Resource: total 2044 free 2040

……

2.     If most multicast resources are allocated, optimize multicast configuration. For example, delete multicast entries not in use. If the configuration cannot be optimized, contact H3C Support.
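To judge whether most multicast resources are allocated, you can compute the used share from the total and free values in the diagnostic output. The following Python sketch assumes the "Resource: total N free M" line layout shown in step 1. The function name resource_usage is hypothetical.

```python
import re

# Assumed line layout: "<name> Resource: total <N> free <M>".
RESOURCE = re.compile(r"^\s*(.+?) Resource:\s*total\s+(\d+)\s+free\s+(\d+)")


def resource_usage(diag_text):
    """Map each resource name to (total, free, used_percent)."""
    usage = {}
    for line in diag_text.splitlines():
        match = RESOURCE.match(line)
        if match:
            total, free = int(match.group(2)), int(match.group(3))
            used_percent = 100.0 * (total - free) / total if total else 0.0
            usage[match.group(1)] = (total, free, used_percent)
    return usage


# Resource lines taken from the sample output in step 1.
SAMPLE = """\
Resource Info:
 TCAM Resource: total 511 free 511
 Local DIT Resource: total 1003 free 871 usage list:
  L3MC: 132
 Local VIDX Resource: total 2044 free 2040
"""

if __name__ == "__main__":
    for name, (total, free, used_percent) in resource_usage(SAMPLE).items():
        print(f"{name}: {total - free}/{total} used ({used_percent:.1f}%)")
```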

ARP resources

To resolve the problem:

1.     Display ARP statistics in the .diag file.

  ===============Display arpnd index resourece slot 4===============

==============================================================================================================

Resource distribution on master board:

Total Index number is 12287, ARP alloced 2724, ND alloced 8274.

    (   0 -    7):      2     0     0    10  6000    64   148   280

    (   8 -   15):     81  1000  3048   820     a  4520     0    80

……

Total Index number indicates the maximum number of ARP/ND resources. ARP alloced indicates allocated ARP resources. ND alloced indicates allocated ND resources.

2.     If most ARP/ND resources are allocated, H3C recommends that you do one of the following:

·     Optimize the network by reducing the number of gateways.

·     Replace the EB card with the EC card, and change the operating mode of the system to Routee.

3.     If the problem persists, contact H3C Support.
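The allocation check in step 2 amounts to comparing the sum of the ARP and ND allocations with the total index number. The following Python sketch assumes the summary line layout shown in step 1. The function name arp_nd_utilization is hypothetical.

```python
import re

# Assumed summary line, copied from the sample diagnostic output.
SUMMARY = re.compile(
    r"Total Index number is (\d+), ARP alloced (\d+), ND alloced (\d+)"
)


def arp_nd_utilization(diag_text):
    """Return (total, arp, nd, fraction_allocated), or None if no summary line."""
    match = SUMMARY.search(diag_text)
    if not match:
        return None
    total, arp, nd = (int(group) for group in match.groups())
    fraction = (arp + nd) / total if total else 0.0
    return total, arp, nd, fraction


SAMPLE = "Total Index number is 12287, ARP alloced 2724, ND alloced 8274."

if __name__ == "__main__":
    total, arp, nd, fraction = arp_nd_utilization(SAMPLE)
    print(f"{arp + nd} of {total} indexes allocated ({fraction:.1%})")
```

For the sample line, 10998 of 12287 indexes are allocated, which is about 89.5%.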

FIB resources

To resolve the problem:

1.     Display FIB information in the .diag file.

===============Display L3 fib information slot 3===============

=====================================================================

 Ipv4 route prefix       : 12

 Ipv6 route prefix       : 1

 Allocated route entry   : 9

 Ipv4Uc allocated nexthop: 2   1    0   0   0   0   0  0

 Ipv6Uc allocated nexthop: 1   0    0   0   0   0   0  0

 Ipv4Mc allocated nexthop: 1

 Ipv6Mc allocated nexthop: 0

 Tunnel allocated nexthop: 0

 Max support vrf         : 4096

 Max support ipv4 prefix : 262144

 Max support ipv6 prefix : 131072

 Max support nexthop     : 65536

2.     If most FIB resources are allocated, contact H3C Support.
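The FIB fields can be read into a dictionary to compare allocated values with the supported maximums. The following Python sketch assumes the "name : value" layout shown in step 1. It also assumes that the Ipv4 route prefix count is measured against Max support ipv4 prefix, which the output suggests but this guide does not state. The function names are hypothetical.

```python
def parse_fib_info(diag_text):
    """Map each 'name : values' line to a list of integers."""
    info = {}
    for line in diag_text.splitlines():
        key, separator, value = line.partition(":")
        if not separator:
            continue  # banner and ruler lines contain no colon
        numbers = value.split()
        if numbers and all(number.isdigit() for number in numbers):
            info[key.strip()] = [int(number) for number in numbers]
    return info


def ipv4_prefix_usage(info):
    """Fraction of IPv4 prefixes in use (assumed pairing of the two fields)."""
    return info["Ipv4 route prefix"][0] / info["Max support ipv4 prefix"][0]


# Fields taken from the sample output in step 1.
SAMPLE = """\
===============Display L3 fib information slot 3===============
 Ipv4 route prefix       : 12
 Ipv6 route prefix       : 1
 Ipv4Uc allocated nexthop: 2   1    0   0   0   0   0  0
 Max support ipv4 prefix : 262144
 Max support nexthop     : 65536
"""

if __name__ == "__main__":
    info = parse_fib_info(SAMPLE)
    print(f"IPv4 prefix usage: {ipv4_prefix_usage(info):.4%}")
```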

MAC resources

MAC resources are most likely to run out in large Layer 2 networks, which contain a large number of MAC addresses. In such a network, the switch cannot learn new MAC addresses when the MAC address table is full and the old entries have not aged out.

To resolve the problem:

1.     Display MAC addresses that have been learned.

<Sysname>display mac-address count

 49 mac address(es) found

The output shows that the number of MAC addresses that have been learned is small.

2.     H3C recommends that you do the following:

·     Set a smaller MAC address aging time.

·     Create VLANs by service or by department, and connect VLANs at Layer 3.

MPLS LSP resources

The system displays the following information if the resources are insufficient:

%Jul 28 16:02:24:563 2011 H3C DRVMPLS/3/L3VPN_ERR: -Chassis=2-Slot=3; L3VPN ERR: No enough resource!

To resolve the problem:

1.     Display MPLS LSP statistics.

<Sysname>display mpls lsp statistics

Lsp Type       Total     Ingress   Transit   Egress

STATIC LSP     0         0         0         0

STATIC CRLSP   0         0         0         0

LDP LSP        3         1         0         2

CRLDP CRLSP    0         0         0         0

RSVP CRLSP     0         0         0         0

BGP LSP        0         0         0         0

ASBR LSP       0         0         0         0

BGP IPV6 LSP   0         0         0         0

-------------------------------------------------------------------------

LSP            3         1         0         2

CRLSP          0         0         0         0

2.     If MPLS LSP resources are insufficient, contact H3C Support.
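To see which LSP types consume resources, you can parse the display mpls lsp statistics output and list the types with a nonzero total. The following Python sketch assumes the column layout shown in step 1. The function names are hypothetical.

```python
import re

# Assumed row layout: LSP type, then Total, Ingress, Transit, and Egress counts.
ROW = re.compile(r"^\s*(.+?)\s{2,}(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

SUMMARY_ROWS = ("LSP", "CRLSP")  # the two summary rows below the ruler line


def parse_lsp_statistics(output):
    """Map each row name to (total, ingress, transit, egress)."""
    stats = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            stats[match.group(1)] = tuple(int(match.group(i)) for i in range(2, 6))
    return stats


def active_types(stats):
    """Names of per-type rows whose total is greater than zero."""
    return [name for name, counts in stats.items()
            if name not in SUMMARY_ROWS and counts[0] > 0]


# Excerpt of the sample output from step 1.
SAMPLE = """\
Lsp Type       Total     Ingress   Transit   Egress
STATIC LSP     0         0         0         0
LDP LSP        3         1         0         2
BGP IPV6 LSP   0         0         0         0
-------------------------------------------------------------------------
LSP            3         1         0         2
CRLSP          0         0         0         0
"""

if __name__ == "__main__":
    print(active_types(parse_lsp_statistics(SAMPLE)))
```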

Related commands

This section lists the commands that you might use for troubleshooting system management.

 

Command                        Remarks
display acl resource           Displays ACL resource information.
display cpu-usage              Displays CPU usage statistics and tasks with high CPU usage.
display cpu-usage history      Displays the historical CPU usage statistics in charts.
display interface              Displays information about a specific interface.
display mac-address            Displays MAC address entries.
display mpls lsp statistics    Displays MPLS LSP statistics.

 

 
