- Released At: 18-11-2019
H3C S6890 Switch Series
Troubleshooting Guide
Document version: 6W100-20190725
Copyright © 2019 New H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.
Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.
The information in this document is subject to change without notice.
Contents
Collecting log and operating information
Collecting common log messages
Collecting diagnostic log messages
Collecting operating statistics
Operating power module failure
Newly installed power module failure
Troubleshooting system management
10-Gigabit SFP+ fiber port fails to come up
100-GE QSFP28 fiber port fails to come up
Non-H3C transceiver module error message
Transceiver module does not support digital diagnosis
Error frames (for example, CRC errors) on a port
ACL application failure for unsupported ACL rules or insufficient resources
Introduction
This document provides information about troubleshooting common software and hardware issues with the S6890 Switch Series.
This document is not restricted to specific software or hardware versions.
General guidelines
IMPORTANT: To prevent an issue from causing loss of configuration, save the configuration each time you finish configuring a feature. For configuration recovery, regularly back up the configuration to a remote server.
When you troubleshoot S6890 switches, follow these general guidelines:
· To help identify the cause of the issue, collect system and configuration information, including:
¡ Symptom, time of failure, and configuration.
¡ Network topology information, including the network diagram, port connections, and points of failure.
¡ Log messages and diagnostic information. For more information about collecting this information, see "Collecting log and operating information."
¡ Physical evidence of failure:
- Photos of the hardware.
- Status of the LEDs.
¡ Steps you have taken, such as reconfiguration, cable swapping, and reboot.
¡ Output from the commands executed during the troubleshooting process.
· To ensure safety, wear an ESD wrist strap when you replace or maintain a hardware component.
Collecting log and operating information
IMPORTANT: By default, the information center is enabled. If the feature is disabled, you must use the info-center enable command to enable the feature for collecting log messages.
Table 1 shows the types of files that the system uses to store operating log and status information. You can export these files by using FTP, TFTP, or USB.
In an IRF system, these files are stored on the master device. Multiple MPUs will have log files if master/subordinate switchovers have occurred. You must collect log files from all these devices. To more easily locate log information, use a consistent rule to categorize and name files. For example, save log files to a separate folder for each member device, and include their slot numbers in the folder names.
Table 1 Log and operating information
Category | File name format | Content |
Common log | logfileX.log | Command execution and operational log messages. |
Diagnostic log | diagfileX.log | Diagnostic log messages about device operation, including the following items: · Parameter settings in effect when an error occurs. · Information about a device startup error. · Handshaking information between member devices when a communication error occurs. |
Operating statistics | file-basename.gz | Current operating statistics for feature modules, including the following items: · Device status. · CPU status. · Memory status. · Configuration status. · Software entries. · Hardware entries. |
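The naming practice described before Table 1 (one folder per member device, with the slot number in the folder name) can be sketched in Python as follows. This is a minimal illustration, not an H3C tool. The function name archive_path, the slot<N> folder pattern, and the s6890-logs root folder are assumptions chosen for the example.

```python
from pathlib import PurePosixPath


def archive_path(root: str, slot: int, filename: str) -> PurePosixPath:
    """Build <root>/slot<N>/<filename> for a file copied from one IRF member.

    The "slot<N>" folder pattern is an assumed convention, not an H3C requirement.
    """
    return PurePosixPath(root) / f"slot{slot}" / filename


# Example: files exported from two members (slots 1 and 2 are example values).
for slot, name in [(1, "logfile.log"), (2, "logfile.log"), (2, "diagfile.log")]:
    print(archive_path("s6890-logs", slot, name))
```

Keeping the slot number in the path rather than in the file name leaves the original file names intact, which makes it easier to match a collected file against the device's directory listing.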
Collecting common log messages
1. Save common log messages from the log buffer to a log file:
By default, the log file is saved in the logfile directory of the flash memory on each member device.
<Sysname> logfile save
The contents in the log file buffer have been saved to the file flash:/logfile/logfile.log
2. Identify the log file on each member device:
# Display the log file on the master device.
<Sysname> dir flash:/logfile/
Directory of flash:/logfile
0 -rw- 21863 Jul 11 2013 16:00:37 logfile.log
1048576 KB total (38812 KB free)
# Display the log file on each subordinate device:
<Sysname> dir slot2#flash:/logfile/
Directory of flash:/logfile
0 -rw- 21863 Jul 11 2013 16:00:37 logfile.log
1048576 KB total (38812 KB free)
3. Transfer the files to the desired destination by using FTP, TFTP, or USB. (Details not shown.)
Collecting diagnostic log messages
1. Save diagnostic log messages from the diagnostic log file buffer to a diagnostic log file:
By default, the diagnostic log file is saved in the diagfile directory of the flash memory on each member device.
<Sysname> diagnostic-logfile save
The contents in the diagnostic log file buffer have been saved to the file flash:/diagfile/diagfile.log
2. Identify the log file on each member device:
# Display the log file on the master device.
<Sysname> dir flash:/diagfile/
Directory of flash:/diagfile
0 -rw- 161321 Jul 11 2013 16:16:00 diagfile.log
1048576 KB total (38812 KB free)
# Display the log file on each subordinate device:
<Sysname> dir slot2#flash:/diagfile/
Directory of flash:/diagfile
0 -rw- 161321 Jul 11 2013 16:16:00 diagfile.log
1048576 KB total (38812 KB free)
3. Transfer the files to the desired destination by using FTP, TFTP, or USB. (Details not shown.)
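Before you transfer the files, it can help to confirm that each member device actually holds a non-empty log file. The following Python sketch parses captured dir output in the layout shown in the examples above. It is an assumption-laden illustration: the function name list_files and the regular expression are hypothetical, and the output layout on other software versions might differ.

```python
import re

# One file entry, for example:
# "0 -rw- 21863 Jul 11 2013 16:00:37 logfile.log"
DIR_ENTRY = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<attrs>\S+)\s+(?P<size>\d+)\s+"
    r"(?P<date>\w{3}\s+\d{1,2}\s+\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<name>\S+)\s*$"
)


def list_files(dir_output: str) -> dict[str, int]:
    """Map each file name in captured dir output to its size in bytes."""
    files = {}
    for line in dir_output.splitlines():
        entry = DIR_ENTRY.match(line)
        if entry:  # header and "KB total" lines do not match and are skipped
            files[entry.group("name")] = int(entry.group("size"))
    return files


sample = """Directory of flash:/logfile
0 -rw- 21863 Jul 11 2013 16:00:37 logfile.log
1048576 KB total (38812 KB free)"""

print(list_files(sample))
```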
Collecting operating statistics
You can collect operating statistics by saving the statistics to a file or displaying the statistics on the screen.
When you collect operating statistics, follow these guidelines:
· Log in to the device through a network or management port instead of the console port, if possible. Network and management ports are faster than the console port.
· Do not execute commands while operating statistics are being collected.
· As a best practice, save operating statistics to a file to retain the information.
To collect operating statistics:
1. Disable pausing between screens of output if you want to display operating statistics on the screen. Skip this step if you are saving statistics to a file.
<Sysname> screen-length disable
2. Collect operating statistics for multiple feature modules.
<Sysname> display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N] :
3. At the prompt, choose to save or display operating statistics:
# To save operating statistics, enter y at the prompt and then specify the destination file path.
Save or display diagnostic information (Y=save, N=display)? [Y/N] : Y
Please input the file name(*.tar.gz)[flash:/diag.tar.gz] :
Diagnostic information is outputting to flash:/diag.tar.gz.
Please wait...
Save successfully.
<Sysname> dir flash:/
Directory of flash:
…
6 -rw- 898180 Jun 26 2013 09:23:51 diag.tar.gz
1021808 KB total (259072 KB free)
# To display operating statistics on the monitor terminal, enter n at the prompt. The output from this command varies by software version.
Save or display diagnostic information (Y=save, N=display)? [Y/N] :n
===============================================
===============display clock===============
00:08:02.487 UTC Sat 07/06/2019
=================================================
===============display version===============
H3C Comware Software, Version 7.1.070, Release 2712
Copyright (c) 2004-2018 New H3C Technologies Co., Ltd. All rights reserved.
H3C S6890-54HF uptime is 0 weeks, 0 days, 0 hours, 8 minutes
Last reboot reason : User reboot
Boot image: flash:/S6890-CMW710-BOOT-R2712.bin
Boot image version: 7.1.070P2214, Release 2712
Compiled Jul 18 2018 14:00:00
System image: flash:/S6890-CMW710-SYSTEM-R2712.bin
System image version: 7.1.070, Release 2712
Compiled Jul 18 2018 14:00:00
……
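When operating statistics are displayed on the screen and captured to a text file, the output can be long. The following Python sketch splits such a capture into one section per command, based on the "=====display ...=====" header lines shown in the example above. The function name split_sections is hypothetical, and the sketch assumes that every section starts with a header line of that form. Output from other software versions might use a different layout.

```python
import re

# A header line wraps the command name in "=" characters.
HEADER = re.compile(r"^=+(?P<cmd>[^=].*?)=+$")
# A rule line consists of "=" characters only.
RULE = re.compile(r"^=+$")


def split_sections(text: str) -> dict[str, list[str]]:
    """Group captured output lines under the command that produced them."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current = header.group("cmd").strip()
            sections[current] = []
        elif current is not None and line and not RULE.match(line):
            sections[current].append(line)
    return sections


sample = """===============================================
===============display clock===============
00:08:02.487 UTC Sat 07/06/2019
=================================================
===============display version===============
H3C Comware Software, Version 7.1.070, Release 2712"""

for command, lines in split_sections(sample).items():
    print(command, "->", len(lines), "line(s)")
```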
Contacting technical support
When you contact H3C Support, be prepared to provide the following information:
· Information described in "General guidelines."
· Product serial numbers.
This information will help the support engineer assist you as quickly as possible.
The following is the contact information for H3C Support:
· Telephone number—400-810-0504.
· E-mail—service@h3c.com.
Removing deployment errors
Use the deployment checklist in Table 2 to eliminate issues that might be introduced at the deployment stage. Select items that are suitable for your site.
Table 2 Deployment checklist
Question | Command or method | Result | Remarks |
Environment and device hardware status | | | |
Is the sensor temperature between the low-temperature and high-temperature warning thresholds? | display environment | □OK □Not OK □Not related | Make sure the temperature of each sensor is between the low-temperature and high-temperature warning thresholds. |
Are the fan trays operating correctly? | display fan | □OK □Not OK □Not related | Make sure the fan trays are operating correctly. |
Are sufficient power modules installed and are they operating correctly? | display power | □OK □Not OK □Not related | Make sure the following conditions are met: · You have installed sufficient power modules to provide power redundancy. · The power modules are operating correctly. The display power command shows that their state is Normal. |
Are the LEDs all displaying correct statuses? | Visually check the status of LEDs on each device. | □OK □Not OK □Not related | The LED shows the status of the device: · Steady green: The switch is operating correctly. · Steady red: The switch has failed to pass POST or has problems such as fan failure. · Off: The switch is powered off or has failed to start up. |
CPU and memory usage | | | |
Does the CPU usage change rate exceed 10%? Does the sustained CPU usage exceed 60%? | display cpu-usage | □OK □Not OK □Not related | Execute the display cpu-usage command repeatedly. If the CPU sustains a usage level of over 60% or has a change rate higher than 10%, execute the debugging ip packet command to view the packets delivered to the CPU for analysis. |
Does the memory usage exceed 60%? | display memory | □OK □Not OK □Not related | If memory usage exceeds 60%, execute the display memory command to identify the module that is using the most memory. |
Ports | | | |
Is half duplex used in port negotiation? | display interface brief | □OK □Not OK □Not related | If the duplex mode of a port is half, verify that the peer port uses the same duplex mode. |
Is flow control unnecessarily enabled on ports? | Verify the port settings. | □OK □Not OK □Not related | Disable flow control on the ports. |
Are large numbers of error packets generated continuously in the outbound or inbound direction of the port? | display interface | □OK □Not OK □Not related | If an error counter displays a non-zero value and the value is increasing, check for the following errors: · Link and optical-electrical converter errors. · Port setting inconsistencies with the peer port. |
Does the port change between an up and down state frequently? | display logbuffer | □OK □Not OK □Not related | If the port state flaps, check for the following errors: · Link and optical-electrical converter errors. · Optical power threshold crossing events if the port is a fiber port. · Port setting inconsistencies with the peer port. |
Fiber ports | | | |
Do the ports at the two ends use the same port settings? | display current-configuration interface | □OK □Not OK □Not related | When you connect an H3C device to a device from another vendor, set the same port rate and duplex mode settings at the two ends as a best practice. |
Are CRC errors present on any fiber port? Is the number of CRC errors increasing? | display interface | □OK □Not OK □Not related | If CRC errors persist, replace the transceiver module or pigtail fiber, or clean the transceiver module connector. |
Trunk port configuration | | | |
Do the peer trunk ports use the same PVID? | display current-configuration interface | □OK □Not OK □Not related | Make sure the same PVID is configured on the trunk ports between two devices. |
Are the peer ports assigned to the same VLANs? | display current-configuration interface | □OK □Not OK □Not related | Make sure the trunk ports between two devices are assigned to the same VLANs. For example, if you assign a trunk port to all VLANs, also assign its peer port to all VLANs. |
Are the peer ports set to the same link type? | display current-configuration interface | □OK □Not OK □Not related | Make sure the ports between two devices use the same link type. |
Is a loop present in VLAN 1? | loopback-detection global enable vlan 1 | □OK □Not OK □Not related | Remove ports from VLAN 1 as needed. |
Spanning tree feature | | | |
Is the timeout factor correctly set? | display current-configuration | □OK □Not OK □Not related | As a best practice, set a timeout factor in the range of 5 to 7 on a stable network to avoid unnecessary recalculations. |
Are ports connected to end-user devices configured as edge ports? | display current-configuration interface | □OK □Not OK □Not related | Verify that the output from the display current-configuration interface command contains the "stp edged-port enable" string for ports connected to end-user devices. As a best practice, configure ports connected to end-user devices (PCs, for example) as edge ports, or disable the spanning tree feature on the ports. |
Is the spanning tree feature disabled on ports connected to devices that do not support spanning tree protocols? | display current-configuration interface | □OK □Not OK □Not related | Disable the spanning tree feature on ports connected to devices that do not support spanning tree protocols. Make sure the output from the display current-configuration interface command contains the "undo stp enable" string for these ports. |
Is the device running MSTP, STP, or RSTP, and working with a Cisco PVST+ device? | display stp | □OK □Not OK □Not related | As a best practice to avoid interoperability issues, set up a Layer 3 connection to the Cisco device. |
Do the topologies of MSTIs meet the design? Are there as few overlapping paths as possible among MSTIs? | display current-configuration interface | □OK □Not OK □Not related | If the topologies deviate from the design, reassign ports to VLANs and revise the VLAN and instance mappings. For optimal load balancing, plan VLANs and VLAN-to-instance mappings to minimize overlapping paths among different MSTIs. |
Does a TC attack exist to cause frequent STP status changes on any ports? | display stp tc, display stp history | □OK □Not OK □Not related | Examine the following items in the command output for TC attacks: · Incoming and outgoing TC/TCN BPDU statistics. · Historical port role calculation information. There is a risk of TC attack if frequent STP status changes occur on a stable network. Make sure you have configured the following settings: · Configure ports connected to end-user devices as edge ports, and enable BPDU guard. Alternatively, disable the spanning tree feature on the ports. · Disable the spanning tree feature on ports connected to devices that do not support spanning tree protocols. · Do not disable TC-BPDU guard. |
VRRP | | | |
Is the handshake interval correctly set? Are the handshake intervals of the two ends the same? | display vrrp | □OK □Not OK □Not related | Change the handshake interval to 3 seconds if the number of VRRP groups is less than five. If five or more VRRP groups exist, assign three or five VRRP groups into one group, and configure the handshake interval as 3 seconds, 5 seconds, and 7 seconds for each group. |
ARP | | | |
Are there ARP conflicts? | display logbuffer | □OK □Not OK □Not related | If the log contains ARP conflict records, verify that the hosts in conflict are legitimate, and remove the conflicts. |
OSPF | | | |
Is the router ID of the device unique on the network? | display ospf peer | □OK □Not OK □Not related | Change the router ID if it is not unique on the network. To restart route learning after you remove the router ID conflict, you must execute the reset ospf process command. |
Are there a lot of errors in the output from the display ospf statistics error command? | display ospf statistics error | □OK □Not OK □Not related | If a large number of OSPF errors has occurred and the number continues to increase, collect the error information for further analysis. |
Is there severe route flapping? | display ip routing-table statistics | □OK □Not OK □Not related | Examine the statistics for added and deleted routes during the system uptime. If route flapping occurs, locate the flapping route and the source device to analyze the cause. You can use the display ospf lsdb command multiple times to view the age of routes and locate the flapping route. |
Is the OSPF status stable? | display ospf peer | □OK □Not OK □Not related | View the up time of the OSPF neighbor. |
Routes | | | |
Is the default route correct? Are there any routing loops? | tracert, debug ip packet | □OK □Not OK □Not related | Use the tracert command to trace the path to a nonexistent network (1.1.1.1, for example) to check for routing loops. If a routing loop exists, check the configuration of the involved devices for errors. Adjust the route to remove the loop. Use the debug ip packet command to check for packets with TTL 0 or 1. If TTL exceeded packets are received, check for network route errors. |
CPU security | | | |
Are there packet attacks on the CPU? | debug rxtx softcar show | □OK □Not OK □Not related | Execute the debug rxtx softcar show command in probe view to view packet rate limit information for cards. The CPU is under attack if the number of packets of a type keeps increasing unusually. |
Records in the local log buffer | | | |
Does the local log buffer contain exception records? | · In standalone mode: · In IRF mode: | □OK □Not OK □Not related | Execute the local logbuffer display command in probe view. If the local log buffer contains exception records, contact H3C Support to troubleshoot the exceptions. Use the following commands in probe view to clear the history records after the exceptions are removed: · In standalone mode: · In IRF mode: |
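The CPU usage item in the checklist above asks you to run display cpu-usage repeatedly and compare the readings against two limits (60% sustained usage, 10% change rate). The following Python sketch shows one way to evaluate a series of readings that you have noted down. It rests on two assumptions that the checklist does not spell out: "change rate" is taken as the absolute difference between consecutive readings in percentage points, and "sustained" is taken to mean that every reading in the series exceeds the limit. The function name check_cpu is hypothetical.

```python
def check_cpu(samples: list[float],
              sustained_limit: float = 60.0,
              change_limit: float = 10.0) -> dict[str, bool]:
    """Evaluate CPU usage readings (in percent) against the checklist limits.

    Assumed interpretation: "sustained" means all readings exceed the limit,
    and "change rate" means the difference between consecutive readings.
    """
    if len(samples) < 2:
        raise ValueError("at least two readings are required")
    sustained_high = all(s > sustained_limit for s in samples)
    max_change = max(abs(b - a) for a, b in zip(samples, samples[1:]))
    return {"sustained_high": sustained_high, "unstable": max_change > change_limit}


# Example readings are made up for illustration.
print(check_cpu([6.0, 5.6, 5.7]))
print(check_cpu([62.0, 75.0, 71.0]))
```

If either flag is set, the checklist item is "Not OK" and the follow-up in the Remarks column applies.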
Troubleshooting hardware
This section provides troubleshooting information for common hardware issues.
NOTE: This section describes how to troubleshoot switch reboot failure, power module failure, and fan tray failure. To troubleshoot transceiver modules, ports, and temperature alarms, see "Troubleshooting ports" and "Troubleshooting system management."
Switch reboot failure
Symptom
The switch fails to reboot.
Troubleshooting flowchart
Figure 1 Troubleshooting switch reboot failure
Solution
To resolve the issue:
1. Verify that the system software image on the switch is correct.
a. Log in to the switch through the console port and restart the switch. If the system reports that a CRC error occurs or that no system software image is available during the BootWare loading process, reload the system software image.
b. Verify that the system software image in the flash memory is the same size as the one on the server. If no system software image is available in the flash memory, or if the image size is different from the one on the server, reload the system software image. Then set the reloaded system software image to the current system software image.
The system software image in the flash memory is automatically set to the current system software image during the BootWare loading process.
2. Verify that the memory is running correctly.
Reboot the switch, and immediately press CTRL+T to examine the memory. If a memory fault is detected, replace the switch.
3. Verify that no error is reported during the BootWare loading process.
If the memory is running correctly but there are still errors reported during the BootWare loading process, replace the switch.
4. If the issue persists, contact H3C Support.
Operating power module failure
Symptom
An operating power module fails.
Solution
To resolve the issue:
1. Identify the operating state of the power module.
¡ Execute the display power command to view the operating state of the power module.
<Sysname> display power
Input Power:132W
PowerID State InPower(W) Current(A) Voltage(V) OutPower(W) Type
1 Absent -- -- -- -- ---
2 Normal -- -- -- -- PSR300-A
¡ Execute the display alarm command to view alarm information about the power module.
<Sysname> display alarm
Slot CPU Level Info
1 0 INFO Chassis 1 power 1 is absent.
If the power module is in Absent state, go to step 2. If the power module is in Fault state, go to step 3.
2. Verify that the power module is installed securely.
Remove and reinstall the power module to ensure that the power module is installed securely. Then execute the display power command to verify that the power module has changed to Normal state. If the power module remains in Absent state, replace the power module.
3. Verify that the power module is operating correctly.
a. Verify that the power cord is connected to the power module securely.
<Sysname> display power
Input Power:132W
PowerID State InPower(W) Current(A) Voltage(V) OutPower(W) Type
1 Absent -- -- -- -- ---
2 Normal -- -- -- -- PSR300-A
If the voltage and current of the power module are 0 and the power module state is Fault, the power cord is disconnected. Connect the power cord securely to the power module. Then execute the display power command to verify that the power module has changed to Normal state.
b. Determine whether the power module is overheating. If dust accumulation on the power module is causing the high temperature, remove the dust. Then remove and reinstall the power module, and execute the display power command to verify that the power module has changed to Normal state.
c. Install the power module into an empty power module slot. Then execute the display power command to verify that the power module has changed to Normal state in the new slot. If the power module remains in Fault state, replace the power module.
4. If the issue persists, contact H3C Support.
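The branching in step 1 of the procedure above (Absent leads to step 2, Fault leads to step 3) can be sketched in Python against captured display power output. This is an illustrative sketch only: the function names power_states and next_steps are hypothetical, and the parser assumes the column layout shown in the example output, where the first field is the power ID and the second is the state.

```python
NEXT_STEP = {
    "Absent": "step 2: verify that the power module is installed securely",
    "Fault": "step 3: verify that the power module is operating correctly",
}


def power_states(output: str) -> dict[int, str]:
    """Map each power ID to its state, based on the assumed column layout."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric power ID; header rows do not.
        if len(fields) >= 2 and fields[0].isdigit():
            states[int(fields[0])] = fields[1]
    return states


def next_steps(output: str) -> dict[int, str]:
    """Return the suggested step for every power module that is not Normal."""
    return {pid: NEXT_STEP[state]
            for pid, state in power_states(output).items()
            if state in NEXT_STEP}


sample = """\
Input Power:132W
PowerID State InPower(W) Current(A) Voltage(V) OutPower(W) Type
1 Absent -- -- -- -- ---
2 Normal -- -- -- -- PSR300-A
"""

print(next_steps(sample))
```

An Absent entry can also correspond to a slot that is intentionally empty, so treat the result as a prompt to check the slot, not as proof of a failure.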
Newly installed power module failure
Symptom
A newly installed power module fails.
Solution
To resolve the issue:
1. Identify the operating state of the power module.
¡ Execute the display power command to view the operating state of the power module.
<Sysname> display power
Input Power:132W
PowerID State InPower(W) Current(A) Voltage(V) OutPower(W) Type
1 Absent -- -- -- -- ---
2 Normal -- -- -- -- PSR300-A
¡ Execute the display alarm command to view alarm information about the power module.
<Sysname> display alarm
Slot CPU Level Info
1 0 INFO Chassis 1 power 1 is absent.
If the power module is in Absent state, go to step 2. If the power module is in Fault state, go to step 3.
2. Verify that the power module is installed securely.
a. Remove and reinstall the power module to make sure the power module is installed securely. Then execute the display power command to verify that the power module has changed to Normal state.
b. If the power module remains in Absent state, remove the power module and install it in an empty power module slot. Then execute the display power command to verify that the power module has changed to Normal state in the new slot. If the power module remains in Absent state, go to step 4.
3. Verify that the power module is operating correctly.
a. Verify that the power module is connected to the power source correctly. If it is not, connect it to the power source correctly. Then execute the display power command to verify that the power module has changed to Normal state.
b. If the power module remains in Fault state, remove the power module and install it in an empty power module slot. Then execute the display power command to verify that the power module has changed to Normal state in the new slot. If the power module remains in Fault state, go to step 4.
4. If the issue persists, contact H3C Support.
Fan tray failure
Symptom
An operating fan tray or a newly installed fan tray fails.
Solution
To resolve the issue:
1. Identify the operating state of the fan tray.
¡ Execute the display fan command to view the operating state of the fan tray.
<Sysname> display fan
Fan-tray 1:
Status : Normal
Fan Type : LSWM1FANSA
Fan number: 2
Fan mode : Auto
Airflow Direction: Port-to-power
Fan Speed(rpm)
--- ----------
1 10692
2 9105
Fan-tray 2:
Status : Normal
Fan Type : LSWM1FANSA
Fan number: 2
Fan mode : Auto
Airflow Direction: Port-to-power
Fan Speed(rpm)
--- ----------
1 10702
2 9133
Fan-tray 3:
Status : Normal
Fan Type : LSWM1FANSA
Fan number: 2
Fan mode : Auto
Airflow Direction: Port-to-power
Fan Speed(rpm)
--- ----------
1 10692
2 9162
Fan-tray 4:
Status : Normal
Fan Type : LSWM1FANSA
Fan number: 2
Fan mode : Auto
Airflow Direction: Port-to-power
Fan Speed(rpm)
--- ----------
1 10731
2 9183
Fan-tray 5:
Status : Normal
Fan Type : LSWM1FANSA
Fan number: 2
Fan mode : Auto
Airflow Direction: Port-to-power
Fan Speed(rpm)
--- ----------
1 10672
2 9183
¡ Execute the display alarm command to view alarm information about the fan tray.
<Sysname> display alarm
Slot CPU Level Info
1 0 INFO Chassis 1 power 1 is absent.
If the fan tray is in Absent state, go to step 2. If the fan tray is in Fault state, go to step 3.
2. Verify that the fan tray is installed securely.
Remove and reinstall the fan tray to ensure that the fan tray is installed securely. Then execute the display fan command to verify that the fan tray has changed to Normal state. If the fan tray remains in Absent state, replace the fan tray.
3. Verify that the fan tray is operating correctly.
a. Identify whether the fan tray is faulty.
- Execute the display environment command to view temperature information.
If the temperature continues to rise, put your hand at the air outlet to feel if air is being expelled out of the air outlet. If no air is being expelled out of the air outlet, the fan tray is faulty.
- Execute the display fan command to view the fan speed information.
If the fan speed is less than 500 rpm, the fan tray is faulty.
b. If the fan tray is faulty, remove and reinstall the fan tray to make sure the fan tray is installed securely. Then execute the display fan command to verify that the fan tray has changed to Normal state.
c. If the fan tray remains in Fault state, replace the fan tray.
You must make sure the switch operating temperature is below 60°C (140°F) while you replace the fan tray. If a new fan tray is not readily available, power off the switch to avoid damage caused by high temperature.
4. If the issue persists, contact H3C Support.
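The fan speed check in step 3 above (a fan tray is considered faulty if the fan speed is less than 500 rpm) can be sketched in Python against captured display fan output. The function name slow_fans and the parsing rules are assumptions based on the example output layout in this section. The 320 rpm reading in the sample is made up to show a failing fan.

```python
import re

TRAY = re.compile(r"^Fan-tray\s+(\d+):$")   # for example "Fan-tray 1:"
SPEED = re.compile(r"^(\d+)\s+(\d+)$")      # for example "1 10692"
MIN_RPM = 500                               # threshold from the procedure above


def slow_fans(output: str, min_rpm: int = MIN_RPM) -> list[tuple[int, int, int]]:
    """Return (tray, fan, rpm) for every fan that spins below min_rpm."""
    slow = []
    tray = None
    for raw in output.splitlines():
        line = raw.strip()
        tray_match = TRAY.match(line)
        if tray_match:
            tray = int(tray_match.group(1))
            continue
        speed_match = SPEED.match(line)
        if speed_match and tray is not None:
            fan, rpm = int(speed_match.group(1)), int(speed_match.group(2))
            if rpm < min_rpm:
                slow.append((tray, fan, rpm))
    return slow


# Sample output; the 320 rpm value is hypothetical.
sample = """\
Fan-tray 1:
Status : Normal
Fan Type : LSWM1FANSA
Fan number: 2
Fan mode : Auto
Airflow Direction: Port-to-power
Fan Speed(rpm)
--- ----------
1 10692
2 320
Fan-tray 2:
Status : Normal
Fan Type : LSWM1FANSA
Fan number: 2
Fan mode : Auto
Airflow Direction: Port-to-power
Fan Speed(rpm)
--- ----------
1 10702
2 9133
"""

print(slow_fans(sample))
```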
Related commands
This section lists the commands that you might use for troubleshooting the hardware.
Command |
Description |
display alarm |
Displays alarm information. |
display environment |
Displays temperature information. |
display fan |
Displays the operating states of the fan trays. |
display power |
Displays power module information. |
Troubleshooting system management
This section provides troubleshooting information for common system management issues.
High CPU utilization
Symptom
The sustained CPU utilization on a device is noticeably higher than the CPU utilization on other devices.
Troubleshooting flowchart
Figure 2 Troubleshooting high CPU utilization
Solution
To resolve the issue:
1. Identify the job that has a high CPU utilization. For example:
<Sysname> system-view
[Sysname] probe
[Sysname-probe] display process cpu slot 1
CPU utilization in 5 secs: 6.0%; 1 min: 5.6%; 5 mins: 5.7%
JID 5Sec 1Min 5Min Name
1 0.0% 0.0% 0.0% scmd
2 0.0% 0.0% 0.0% [kthreadd]
3 0.0% 0.0% 0.0% [migration/0]
4 0.0% 0.0% 0.0% [ksoftirqd/0]
5 0.0% 0.0% 0.0% [watchdog/0]
6 0.0% 0.0% 0.0% [migration/1]
7 0.0% 0.0% 0.0% [ksoftirqd/1]
8 0.0% 0.0% 0.0% [watchdog/1]
9 0.0% 0.0% 0.0% [migration/2]
10 0.0% 0.0% 0.0% [ksoftirqd/2]
11 0.0% 0.0% 0.0% [watchdog/2]
12 0.0% 0.0% 0.0% [migration/3]
13 0.0% 0.0% 0.0% [ksoftirqd/3]
14 0.0% 0.0% 0.0% [watchdog/3]
15 0.0% 0.0% 0.0% [migration/4]
16 0.0% 0.0% 0.0% [ksoftirqd/4]
17 0.0% 0.0% 0.0% [watchdog/4]
18 0.0% 0.0% 0.0% [migration/5]
19 0.0% 0.0% 0.0% [ksoftirqd/5]
20 0.0% 0.0% 0.0% [watchdog/5]
21 0.0% 0.0% 0.0% [migration/6]
The output shows the average CPU usage values of jobs for the last 5 seconds, 1 minute, and 5 minutes. Typically, the average CPU usage of a job is less than 5%.
2. Display the job's stack. This example uses job ID 284.
[Sysname-probe] follow job 284 slot 1
Attaching to process 284 ([OPTK])
Iteration 1 of 5
------------------------------
Kernel stack:
[<ffffffff804ad9f0>] schedule+0x710/0x1050
[<ffffffff804ae5d8>] schedule_timeout+0x98/0xe0
[<ffffffff803187d0>] kepoll_wait+0x2d0/0x450
[<ffffffffc71b29d4>] DWARE_OPTMOD_TaskEntry+0xa4/0xd0 [system]
[<ffffffffc72e1894>] thread_boot+0x74/0x90 [system]
[<ffffffff80266470>] kthread+0x140/0x150
[<ffffffff8021d910>] kernel_thread_helper+0x10/0x20
Iteration 2 of 5
------------------------------
Kernel stack:
[<ffffffff804ad9f0>] schedule+0x710/0x1050
[<ffffffff804ae5d8>] schedule_timeout+0x98/0xe0
[<ffffffff803187d0>] kepoll_wait+0x2d0/0x450
[<ffffffffc71b29d4>] DWARE_OPTMOD_TaskEntry+0xa4/0xd0 [system]
[<ffffffffc72e1894>] thread_boot+0x74/0x90 [system]
[<ffffffff80266470>] kthread+0x140/0x150
[<ffffffff8021d910>] kernel_thread_helper+0x10/0x20
Iteration 3 of 5
------------------------------
Kernel stack:
[<ffffffff804ad9f0>] schedule+0x710/0x1050
[<ffffffff804ae5d8>] schedule_timeout+0x98/0xe0
[<ffffffff803187d0>] kepoll_wait+0x2d0/0x450
[<ffffffffc71b29d4>] DWARE_OPTMOD_TaskEntry+0xa4/0xd0 [system]
[<ffffffffc72e1894>] thread_boot+0x74/0x90 [system]
[<ffffffff80266470>] kthread+0x140/0x150
[<ffffffff8021d910>] kernel_thread_helper+0x10/0x20
Iteration 4 of 5
------------------------------
Kernel stack:
[<ffffffff804ad9f0>] schedule+0x710/0x1050
[<ffffffff804ae5d8>] schedule_timeout+0x98/0xe0
[<ffffffff803187d0>] kepoll_wait+0x2d0/0x450
[<ffffffffc71b29d4>] DWARE_OPTMOD_TaskEntry+0xa4/0xd0 [system]
[<ffffffffc72e1894>] thread_boot+0x74/0x90 [system]
[<ffffffff80266470>] kthread+0x140/0x150
[<ffffffff8021d910>] kernel_thread_helper+0x10/0x20
Iteration 5 of 5
------------------------------
Kernel stack:
[<ffffffff804ad9f0>] schedule+0x710/0x1050
[<ffffffff804ae5d8>] schedule_timeout+0x98/0xe0
[<ffffffff803187d0>] kepoll_wait+0x2d0/0x450
[<ffffffffc71b29d4>] DWARE_OPTMOD_TaskEntry+0xa4/0xd0 [system]
[<ffffffffc72e1894>] thread_boot+0x74/0x90 [system]
[<ffffffff80266470>] kthread+0x140/0x150
[<ffffffff8021d910>] kernel_thread_helper+0x10/0x20
3. Save the information displayed in the previous steps and use the display diagnostic-information command to collect diagnostic information.
4. Contact H3C Support.
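Step 1 of the procedure above relies on spotting jobs whose average CPU usage is above the typical level of less than 5%. The following Python sketch filters captured display process cpu output for such jobs. The function name busy_jobs and the regular expression are assumptions based on the example output layout. The usage figures in the sample are made up, although the job ID 284 and the name [OPTK] are taken from the example in step 2.

```python
import re

# One job row, for example: "284 12.5% 11.0% 10.2% [OPTK]"
JOB = re.compile(
    r"^\s*(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+([\d.]+)%\s+(\S.*?)\s*$"
)


def busy_jobs(output: str, threshold: float = 5.0) -> list[tuple[int, str]]:
    """Return (JID, name) for jobs whose 5Sec, 1Min, or 5Min average
    reaches the threshold (in percent)."""
    busy = []
    for line in output.splitlines():
        row = JOB.match(line)
        if not row:
            continue  # summary and header lines do not match
        averages = [float(row.group(i)) for i in (2, 3, 4)]
        if max(averages) >= threshold:
            busy.append((int(row.group(1)), row.group(5)))
    return busy


# Sample output; the percentages are hypothetical.
sample = """\
CPU utilization in 5 secs: 18.5%; 1 min: 17.9%; 5 mins: 17.2%
JID 5Sec 1Min 5Min Name
1 0.0% 0.0% 0.0% scmd
2 0.0% 0.0% 0.0% [kthreadd]
284 12.5% 11.0% 10.2% [OPTK]
"""

print(busy_jobs(sample))
```

The job IDs that the sketch returns are the candidates for the follow job command in step 2.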
High memory utilization
Symptom
The display memory command shows that the memory utilization of the device remains higher than 60% for a period of time (typically 30 minutes).
Troubleshooting flowchart
Figure 3 Troubleshooting high memory utilization
Solution
To resolve the issue:
1. Execute the display system internal kernel memory pool command multiple times to display kernel memory pool usage information. Identify the memory pool that shows a suspicious utilization increase.
[Sysname-probe] display system internal kernel memory pool slot 1
Active Number Size Align Slab Pg/Slab ASlabs NSlabs Name
9126 9248 64 8 32 1 289 289 kmalloc-64
105 112 16328 0 2 8 54 56 kmalloc-16328
14 14 2097096 0 1 512 14 14 kmalloc-2097096
147 225 2048 8 15 8 12 15 kmalloc-2048
7108 7232 192 8 32 2 226 226 kmalloc-192
22 22 524232 0 1 128 22 22 kmalloc-524232
1288 1344 128 8 21 1 64 64 kmalloc-128
0 0 67108808 0 1 16384 0 0 kmalloc-67108808
630 651 4096 8 7 8 93 93 kmalloc-4096
68 70 131016 0 1 32 68 70 kmalloc-131016
1718 2048 8 8 64 1 31 32 kmalloc-8
1 1 16777160 0 1 4096 1 1 kmalloc-16777160
2 15 2048 0 15 8 1 1 sgpool-64
0 0 40 0 42 1 0 0 inotify_event_cache
325 330 16328 8 2 8 165 165 kmalloc_dma-16328
0 0 72 0 30 1 0 0 LFIB_IlmEntryCache
0 0 1080 0 28 8 0 0 LFIB_IlmEntryCache
0 0 1464 0 21 8 0 0 MFW_FsCache
1 20 136 0 20 1 1 1 L2VFIB_Ac_cache
0 0 240 0 25 2 0 0 CCF_JOBDESC
0 0 88 0 26 1 0 0 NS4_Aggre_TosSrcPre
0 0 128 0 21 1 0 0 IPFS_CacheHash_cachep
---- More ----
Observe the following items:
¡ Changes of the values in the Number column. This column indicates the allocated numbers of memory objects.
¡ Change rates of the values in the Number column.
¡ Values in the Active column. This column indicates the memory usage.
If the value in the Number column for a memory pool increases continually or quickly, or if the values in the Number and Active columns look suspicious, the memory pool might have a memory leak.
A memory leak can develop slowly. You might need to observe the memory pool for a long period (for example, several weeks) to identify it.
2. Display the call information for the memory pool that might have memory leakage issues. This example uses the kmalloc-2048 memory pool.
[Sysname-probe] view /sys/kernel/slab/kmalloc-2048/alloc_calls
23 kque_create+0x58/0x260 age=4262117/4404939/4692659 pid=128-372 cpus=0,2-3
2 sys_init_module+0x1bdc/0x1e50 age=4746250/4748179/4750108 pid=109-128 cpus=9,12
4 __vmalloc_area_node+0x154/0x1b0 age=4652363/4677089/4747310 pid=128-166 cpus=0-1,12
16 percpu_populate+0x3c/0x60 age=4322758/4322758/4322758 pid=128 cpus=0
21 alloc_pipe_info+0x24/0x60 age=4/3888025/4320768 pid=1-564 cpus=0-4,9,11
29 alloc_pci_dev+0x18/0x40 age=4758366/4758366/4758368 pid=1 cpus=15
2 init_dev+0x1c0/0x870 age=510128/2630142/4750157 pid=1-542 cpus=0,2
1 init_dev+0x4dc/0x870 age=510128 pid=542 cpus=2
2 kobj_map_init+0x2c/0xd0 age=4758371/4758535/4758700 pid=0-1 cpus=0,15
2 usb_alloc_dev+0x38/0x200 age=4750540/4750605/4750671 pid=1 cpus=15
1 usb_create_hcd+0x34/0x120 age=4750540 pid=1 cpus=15
16 exception_notifier_init+0x298/0x4f8 age=4750380/4750380/4750381 pid=1 cpus=15
1 drv_port_module_varialbe_init+0x24/0x80 [system] age=4651959 pid=128 cpus=0
1 DRV_VLAN_BasicFunc_Init+0x1ec/0x700 [system] age=4651871 pid=128 cpus=0
1 drv_vlan_maccash_init+0x124/0x240 [system] age=4651869 pid=128 cpus=0
1 drv_ipmc_spec_init+0x54/0x840 [system] age=4650355 pid=128 cpus=0
1 drv_evb_add_broadcast_group+0x964/0xa50 [system] age=4264182 pid=312 cpus=1
2 DRV_EVB_MAP_AddRec+0x160/0x2a0 [system] age=4264142/4264175/4264209 pid=288 cpus=9
1 drv_evi_localmac_init+0x160/0x650 [system] age=4651896 pid=128 cpus=0
1 DRV_QINQ_Init+0x278/0x890 [system] age=4650270 pid=128 cpus=0
1 DRV_QINQ_Init+0x478/0x890 [system] age=4650270 pid=128 cpus=0
1 Drv_Qacl_InitAddUdfTemplate+0x68/0xb30 [system] age=4651968 pid=128 cpus=0
1 drv_qacl_sal_rsc_init+0xc8/0x210 [system] age=4651968 pid=128 cpus=0
---- More ----
The first field in the output shows the number of allocated memory blocks. The remaining fields show the call information.
3. Save the information displayed in the previous steps.
4. Contact H3C Support.
|
IMPORTANT: As a best practice to retain critical diagnostic information, do not reboot the device before you contact H3C Support. |
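The comparison described in step 1 can be done offline once several copies of the command output have been saved. The following Python sketch is illustrative only and is not an H3C tool. The function names parse_pool_snapshot and growing_pools are hypothetical, the parser assumes that every data row has the nine columns shown in the sample output, and the second snapshot in the usage example is invented for illustration.

```python
from typing import Dict, List, Tuple

# (pool name, object size). A name can appear more than once with different sizes.
PoolKey = Tuple[str, int]


def parse_pool_snapshot(text: str) -> Dict[PoolKey, int]:
    """Return the Number column for each pool in one saved command output."""
    pools: Dict[PoolKey, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumption: a data row has eight numeric columns followed by the pool name.
        # Header lines and "---- More ----" prompts do not match and are skipped.
        if len(fields) != 9 or not all(f.isdigit() for f in fields[:8]):
            continue
        number, size, name = int(fields[1]), int(fields[2]), fields[8]
        pools[(name, size)] = number
    return pools


def growing_pools(snapshots: List[Dict[PoolKey, int]],
                  min_growth: int = 1) -> Dict[PoolKey, int]:
    """Return pools whose Number value never drops and grows by at least min_growth."""
    suspects: Dict[PoolKey, int] = {}
    if len(snapshots) < 2:
        return suspects
    for key in snapshots[0]:
        if any(key not in snap for snap in snapshots):
            continue
        series = [snap[key] for snap in snapshots]
        never_drops = all(a <= b for a, b in zip(series, series[1:]))
        growth = series[-1] - series[0]
        if never_drops and growth >= min_growth:
            suspects[key] = growth
    return suspects


# Usage example. The first snapshot reuses two rows from the sample output;
# the second snapshot is hypothetical.
first = """Active Number Size Align Slab Pg/Slab ASlabs NSlabs Name
147 225 2048 8 15 8 12 15 kmalloc-2048
1288 1344 128 8 21 1 64 64 kmalloc-128"""
later = """Active Number Size Align Slab Pg/Slab ASlabs NSlabs Name
410 450 2048 8 15 8 30 30 kmalloc-2048
1288 1344 128 8 21 1 64 64 kmalloc-128"""

print(growing_pools([parse_pool_snapshot(first), parse_pool_snapshot(later)]))
```

With these two snapshots, the sketch reports only kmalloc-2048, with a growth of 225 objects. A pool flagged this way is only a candidate for further observation, not proof of a leak.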
Temperature alarms
Symptom
Temperature alarms occur.
Troubleshooting flowchart
Figure 4 Troubleshooting temperature alarms
Solution
To resolve the issue:
1. Check the ambient temperature. If the ambient temperature is higher than 22 °C (71.6°F), lower the temperature by adding air conditioners or taking other heat dissipation measures.
2. Check the device temperature. If the device temperature is higher than the high-temperature warning threshold, use the display fan command to verify that the fans are operating correctly. If the status of a fan is Fault, replace the fan.
3. Check whether the air filters are clean. If they are not, clean them.
4. If the issue persists, use the temperature-limit command to adjust the temperature alarm thresholds.
5. Use the display environment command to identify whether the temperature alarm thresholds are adjusted successfully. If the thresholds are not adjusted, temperature related components of the device might have failed. Replace the device.
6. If the thresholds are adjusted but the issue persists, collect diagnostic information and contact H3C Support.
Related commands
This section lists the commands that you might use for troubleshooting system management.
Command | Description |
display environment | Displays temperature information, including the current temperature and the temperature alarm thresholds. |
display fan | Displays the operating status of all fans on the switch. |
display memory | Displays memory usage statistics. |
display process cpu | Displays the CPU usage statistics for jobs. This command is available in probe view. |
display system internal kernel memory pool | Displays kernel memory pool usage information. This command is available in probe view. |
follow job job-id | Displays the stack of a job. This command is available in probe view. |
temperature-limit | Sets temperature alarm thresholds. |
view /sys/kernel/slab/<modulename>/alloc_calls | Displays the number of allocated memory blocks and the call information. This command is available in probe view. |
Troubleshooting ports
This section provides troubleshooting information for common port issues.
10-Gigabit SFP+ fiber port fails to come up
Symptom
A 10-Gigabit SFP+ fiber port fails to come up.
Troubleshooting flowchart
Figure 5 Troubleshooting link up failure on a port
Solution
To resolve the issue:
1. Verify that the speed and duplex mode of the local port match the speed and duplex mode of the peer port:
a. Execute the display interface brief command to examine whether the speed and duplex mode of the port match the speed and duplex mode of the peer port.
b. If they do not match, use the speed command and the duplex command to set the speed and duplex mode for the port.
2. Verify that the speed and duplex mode of the port match the speed and duplex mode of the transceiver module:
a. Execute the display interface brief command to examine whether the speed and duplex mode of the port match the speed and duplex mode of the transceiver module.
b. If they do not match, use the speed command and the duplex command to set the speed and duplex mode for the port.
3. Verify that the local and peer ports are operating correctly:
Use a 10-Gigabit SFP+ cable to connect the local port directly to another 10-Gigabit SFP+ fiber port on the same device.
¡ If the local port can come up, replace the peer port with a new one.
¡ If the local port cannot come up, replace the local port with a new one.
4. Verify that the transceiver module and cable are operating correctly:
a. Execute the display transceiver alarm interface command to check for alarms on the transceiver module.
- The device displays None if no error has occurred.
- The device displays alarms if the transceiver module has failed or if the type of the transceiver module does not match the port type.
b. Use an optical power meter to verify that the Tx power and Rx power of the transceiver module are stable and are within the correct range.
c. Execute the display transceiver interface command to verify that the local transceiver module has the same wavelength and transmission distance as the peer transceiver module.
d. If the transceiver module or cable is not operating correctly, replace it with a new H3C transceiver module or cable that matches the fiber port.
For more information about transceiver modules and cables, see the installation guide for the switch.
5. Verify that the fiber matches the transceiver module. If they do not match, replace the fiber with a new one that matches the transceiver module.
6. If the issue persists, collect diagnostic information and contact H3C Support.
To collect diagnostic information, execute the display diagnostic-information command to display or save running status data for multiple feature modules.
<Sysname> display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N]:Y
100-GE QSFP28 fiber port fails to come up
Symptom
A 100-GE QSFP28 fiber port fails to come up.
Troubleshooting flowchart
Figure 6 Troubleshooting link up failure on a port
Solution
To resolve the issue:
1. Verify that the transceiver module is operating correctly:
a. Execute the display transceiver alarm interface command to check for alarms on the transceiver module.
- The device displays None if no error has occurred.
- The device displays alarms if the transceiver module has failed or if the type of the transceiver module does not match the port type.
b. Use an optical power meter to verify that the Tx power and Rx power of the transceiver module are stable and are within the correct range.
c. Execute the display transceiver interface command to verify that the local transceiver module has the same wavelength and transmission distance as the peer transceiver module.
d. If the transceiver module is not operating correctly, replace it with a new H3C transceiver module that matches the fiber port.
For more information about transceiver modules, see the installation guide for the switch.
2. Verify that the fiber matches the transceiver module. If they do not match, replace the fiber with a new one that matches the transceiver module.
3. Verify that the local port is operating correctly:
Replace the local port with a new one to identify whether this issue has been resolved.
4. Verify that the peer port is operating correctly:
Replace the peer port with a new one to identify whether this issue has been resolved.
5. If the issue persists, collect diagnostic information and contact H3C Support.
To collect diagnostic information, execute the display diagnostic-information command to display or save running status data for multiple feature modules.
<Sysname> display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N]:Y
Non-H3C transceiver module error message
Symptom
The output from the display logbuffer command shows that the transceiver module is not an H3C transceiver module.
<Sysname> display logbuffer
HundredGigE1/0/25: This transceiver is NOT sold by H3C. H3C therefore shall NOT guarantee the normal function of the device or assume the maintenance responsibility thereof!
Troubleshooting flowchart
Figure 7 Troubleshooting non-H3C transceiver module error message
Solution
To resolve the issue:
1. Identify whether the transceiver module is an H3C transceiver module:
Execute the display transceiver interface command to view the vendor name of the transceiver module. If the vendor name field does not display H3C, replace the transceiver module with an H3C transceiver module.
2. If the vendor name field displays H3C, perform the following tasks:
a. Execute the debug port optical-eeprom command to save the transceiver module information.
[Sysname-probe] debug port optical-eeprom 1 0 25 0 160 0 128
The eeprom information of HundredGigE1/0/25:
=======================================================================
0x00: 00 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10: 00 00 00 00 00 00 1f 06 1f 06 81 58 00 00 00 00
0x20: 00 00 00 00 00 00 00 00 00 07 00 00 00 00 00 00
0x30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 81 28
0x70: 21 4d 00 00 00 00 00 00 00 00 00 00 00 00 00 01
b. Provide the information to H3C Support to verify that the transceiver module is an H3C transceiver module. If it is not, replace it with an H3C transceiver module.
3. If the issue persists, contact H3C Support.
Transceiver module does not support digital diagnosis
Symptom
The output from the display transceiver diagnosis interface command shows that the transceiver module does not support the digital diagnosis function.
<Sysname> display transceiver diagnosis interface HundredGigE1/0/25
Error: The transceiver does not support this function.
Troubleshooting flowchart
Figure 8 Troubleshooting digital diagnosis failure on a transceiver module
Solution
To resolve the issue:
1. Verify that the transceiver module is an H3C transceiver module:
Execute the display transceiver interface command to view the vendor name of the transceiver module.
¡ If the vendor name field does not display H3C, replace the transceiver module with an H3C transceiver module.
¡ If the vendor name field displays H3C, perform the following tasks:
- Execute the display transceiver register interface command in probe view to save the transceiver module information.
- Provide the information to H3C Support to verify that the transceiver module is an H3C transceiver module. If it is not, replace it with an H3C transceiver module.
2. Execute the display transceiver interface command to save the transceiver module information, and contact H3C Support to verify that the transceiver module supports the digital diagnosis function.
3. If the issue persists, collect diagnostic information and contact H3C Support.
To collect diagnostic information, execute the display diagnostic-information command to display or save running status data for multiple feature modules.
<Sysname> display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N]:Y
Error frames (for example, CRC errors) on a port
Symptom
In the output from the display interface command, error frames exist (for example, CRC error frames).
Troubleshooting flowchart
Figure 9 Troubleshooting error frames (for example, CRC errors) on a port
Solution
To resolve the issue:
1. Examine the error frame statistics and identify the error frame type:
a. (Optional.) Use the reset counter interface command in user view to clear the packet statistics of the port.
This command resets all packet counters to 0, so that you can view the statistics changes more clearly.
b. Use the display interface command to display the incoming packet statistics and outgoing packet statistics of the port.
c. Determine the type of error frames that are accumulating.
2. If the port is a fiber port, verify that the optical power of the transceiver module is within the correct range:
a. Use the display transceiver diagnosis interface command to view the present measured values of the digital diagnosis parameters for the transceiver module.
<Sysname> display transceiver diagnosis interface HundredGigE1/0/25
HundredGigE1/0/25 transceiver diagnostic information:
Current diagnostic parameters:
[module] Temp.(°C) Voltage(V)
23 3.33
[channel] Bias(mA) RX power(dBm) TX power(dBm)
1 33.82 0.99 0.36
2 34.59 1.15 0.35
3 35.56 0.85 0.10
4 33.78 -0.39 0.20
Alarm thresholds:
Temp.(°C) Voltage(V) Bias(mA) RX power(dBm) TX power(dBm)
High 78 3.63 105.00 3.30 5.30
Low -5 2.97 8.00 -17.70 -11.00
b. If the optical power of the transceiver module is not within the correct range, replace the transceiver module with a transceiver module of the same model that is operating correctly.
3. Verify that the port configurations are correct:
a. Execute the display interface brief command.
b. Determine whether the speed and duplex mode of the port match the speed and duplex mode of the peer port.
c. If they do not match, use the speed command and the duplex command to set the speed and duplex mode for the port.
4. Verify that the link medium connected to the port is operating correctly.
Plug the link medium into a new port that is operating correctly. If error frames still exist, replace the link medium.
5. Determine whether the port has received a large amount of flow control frames:
a. Use the display interface command to view the number of pause frames.
If the number of pause frames is accumulating, you can determine that the port has sent or received a large amount of flow control frames.
b. Verify that the incoming traffic and outgoing traffic have not exceeded the maximum traffic processing capability of the local device and the peer device.
6. If the issue persists, collect diagnostic information and contact H3C Support.
To collect diagnostic information, execute the display diagnostic-information command to display or save running status data for multiple feature modules.
<Sysname> display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N]:Y
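The range check in step 2 compares each measured value with the alarm thresholds printed in the same output. The following Python sketch shows that comparison using the values from the sample output above. It is illustrative only: the function name out_of_range and the dictionary keys are hypothetical, and the readings are entered by hand rather than parsed from the device.

```python
from typing import Dict, List, Tuple

# Alarm thresholds copied from the sample output, as (low, high) pairs.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "bias_ma": (8.00, 105.00),
    "rx_power_dbm": (-17.70, 3.30),
    "tx_power_dbm": (-11.00, 5.30),
}

# Per-channel readings copied from the sample output.
SAMPLE_CHANNELS: Dict[int, Dict[str, float]] = {
    1: {"bias_ma": 33.82, "rx_power_dbm": 0.99, "tx_power_dbm": 0.36},
    2: {"bias_ma": 34.59, "rx_power_dbm": 1.15, "tx_power_dbm": 0.35},
    3: {"bias_ma": 35.56, "rx_power_dbm": 0.85, "tx_power_dbm": 0.10},
    4: {"bias_ma": 33.78, "rx_power_dbm": -0.39, "tx_power_dbm": 0.20},
}


def out_of_range(channels: Dict[int, Dict[str, float]],
                 thresholds: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return one message for each reading outside its (low, high) thresholds."""
    findings: List[str] = []
    for channel in sorted(channels):
        for name, value in channels[channel].items():
            low, high = thresholds[name]
            if value < low:
                findings.append(
                    f"channel {channel}: {name} {value} is below the low threshold {low}")
            elif value > high:
                findings.append(
                    f"channel {channel}: {name} {value} is above the high threshold {high}")
    return findings


# All readings in the sample output are within range, so the list is empty.
print(out_of_range(SAMPLE_CHANNELS, THRESHOLDS))
```

A reading inside the alarm thresholds can still be unstable, so repeat the measurement several times before you rule out the transceiver module.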
Failure to receive packets
Symptom
A port is up, but it cannot receive packets.
Troubleshooting flowchart
Figure 10 Troubleshooting failure to receive packets
Solution
To resolve the issue:
1. Verify that the ports at both ends are up.
2. Examine the packet statistics of the port:
a. (Optional.) Use the reset counter interface command in user view to clear the packet statistics of the port.
This command resets all packet counters to 0, so that you can view the statistics changes more clearly.
b. Use the display interface command to verify that the number of incoming packets is accumulating.
c. Verify that the number of error frames is not accumulating.
3. Verify that the port configurations do not affect packet receiving:
a. Use the display interface brief command to verify that the port configurations are correct.
The port configurations include the duplex mode, speed, port type, and VLAN configurations of the ports at both ends of the link. If configuration errors exist, modify the port configurations. If the port still fails to receive packets, use the shutdown command and then the undo shutdown command to re-enable the port.
b. If the port is configured with the spanning tree feature, use the display stp brief command to verify that the port is not in the discarding state.
If the port is set to the discarding state by the spanning tree feature, examine and modify the spanning tree feature configurations to resolve the issue.
As a best practice, configure the port as an edge port or disable the spanning tree feature on the port if it is directly connected to a terminal.
c. If the port is in an aggregation group, use the display link-aggregation summary command to verify that the status of the port is Selected.
If the status of the port is Unselected, the port cannot send or receive data packets. Determine why the port became Unselected (for example, the attribute configurations of the port differ from those of the reference port). Modify the attribute configurations of the port so that the port becomes Selected.
4. Verify that the link medium connected to the port is operating correctly.
Plug the link medium into a new port that is operating correctly. If the new port cannot receive packets, replace the link medium.
5. If the issue persists, collect diagnostic information and contact H3C Support.
To collect diagnostic information, execute the display diagnostic-information command to display or save running status data for multiple feature modules.
<Sysname> display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N]:Y
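Step 2 relies on reading the counters twice and checking which ones have changed. The following Python sketch shows that comparison for two readings taken by hand from the display interface output. It is illustrative only: the function names and the counter names (input_packets, input_errors, crc) are hypothetical labels chosen for this example, not field names guaranteed by the command output.

```python
from typing import Dict, List, Sequence


def accumulating(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return the counters that increased between two readings, with their deltas."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] > before[name]
    }


def receive_path_findings(before: Dict[str, int],
                          after: Dict[str, int],
                          packet_counter: str = "input_packets",
                          error_counters: Sequence[str] = ("input_errors", "crc")) -> List[str]:
    """Apply the two checks from step 2: packets should grow, errors should not."""
    grew = accumulating(before, after)
    findings: List[str] = []
    if packet_counter not in grew:
        findings.append("incoming packet count is not accumulating")
    for name in error_counters:
        if name in grew:
            findings.append(f"{name} is accumulating (+{grew[name]})")
    return findings


# Hypothetical readings taken a few minutes apart.
first_reading = {"input_packets": 1000, "input_errors": 0, "crc": 0}
second_reading = {"input_packets": 1000, "input_errors": 0, "crc": 12}
print(receive_path_findings(first_reading, second_reading))
```

The same comparison applies to the sending direction if the outgoing packet and error counters are used instead.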
Failure to send packets
Symptom
A port is up, but it cannot send packets.
Troubleshooting flowchart
Figure 11 Troubleshooting failure to send packets
Solution
To resolve the issue:
1. Verify that the ports at both ends are up.
2. Examine the packet statistics of the port:
a. (Optional.) Use the reset counter interface command in user view to clear the packet statistics of the port.
This command resets all packet counters to 0, so that you can view the statistics changes more clearly.
b. Use the display interface command to verify that the number of outgoing packets is accumulating.
c. Verify that the number of error frames is not accumulating.
3. Verify that the port configurations do not affect packet sending:
a. Use the display interface brief command to verify that the port configurations are correct.
The port configurations include the duplex mode, speed, port type, and VLAN configurations of the ports at both ends of the link. If configuration errors exist, modify the port configurations. If the port still fails to send packets, use the shutdown command and then the undo shutdown command to re-enable the port.
b. If the port is configured with the spanning tree feature, use the display stp brief command to verify that the port is not in the discarding state.
If the port is set to the discarding state by the spanning tree feature, examine and modify the spanning tree feature configurations to resolve the issue.
As a best practice, configure the port as an edge port or disable the spanning tree feature on the port if it is directly connected to a terminal.
c. If the port is in an aggregation group, use the display link-aggregation summary command to verify that the status of the port is Selected.
If the status of the port is Unselected, the port cannot send or receive data packets. Determine why the port became Unselected (for example, the attribute configurations of the port differ from those of the reference port). Modify the attribute configurations of the port so that the port becomes Selected.
4. Verify that the link medium connected to the port is operating correctly.
Plug the link medium into a new port that is operating correctly. If the new port cannot send packets, replace the link medium.
5. If the issue persists, collect diagnostic information and contact H3C Support.
To collect diagnostic information, execute the display diagnostic-information command to display or save running status data for multiple feature modules.
<Sysname> display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N]:Y
Incorrect port information
Symptom
The port transceiver module type cannot be correctly identified, and the diagnosis information is incorrect. The port transceiver module is displayed as absent, or sometimes present and sometimes absent.
Troubleshooting flowchart
Figure 12 Troubleshooting incorrect port information
Solution
1. Execute the display transceiver interface command to view basic information of the transceiver modules and identify the incorrect information of the transceiver module.
2. See the installation guide for the device to identify whether the transceiver module is supported. If the transceiver module is not supported, replace it with an H3C transceiver module supported by the device.
3. Identify whether the I2C of the port is normal.
a. Identify whether the transceiver module type or diagnosis information of another port of the same PHY (typically, a neighboring port) is incorrect. If the transceiver module type or diagnosis information of another port is also incorrect, the I2C might be abnormal. Contact H3C Support.
b. If another port does not have this problem, replace the transceiver module. If the problem is resolved after replacement, contact H3C Support.
4. Execute commands in the following table to collect information and contact H3C Support.
Command | Description |
display transceiver diagnosis interface | Displays the current values of the digital diagnosis parameters on transceiver modules. |
display transceiver interface | Displays the key parameters of transceiver modules. |
display transceiver manuinfo interface | Displays electronic label information for transceiver modules. |
display transceiver information interface | Displays transmission information for transceiver modules. |
A port fails to come up
Symptom
A port fails to come up.
Troubleshooting flowchart
Figure 13 Troubleshooting the failure of a port to come up
Solution
1. Perform a basic replacement test to remove link, interface module, and peer port problems. For more information, see "10-Gigabit SFP+ fiber port fails to come up."
2. Identify whether the software configuration is correct.
a. Read the Tx and Rx optical powers of the port multiple times. If a problem exists, troubleshoot the link.
b. Execute the following command multiple times. If a problem occurs, contact H3C Support.
====debug port optical-module chassis 1 slot 0====
[Interface] [Exist?] [Tx_fault] [Rx_los] [Tx_disable]
===============================================================
XGE1/0/1 yes normal normal no
3. Execute commands in the following table to collect information and contact H3C Support.
Command | Description |
display transceiver diagnosis interface | Displays the current values of the digital diagnosis parameters on transceiver modules. |
display transceiver interface | Displays the key parameters of transceiver modules. |
display transceiver manuinfo interface | Displays electronic label information for transceiver modules. |
display transceiver information interface | Displays transmission information for transceiver modules. |
bcm slot 6 chip 0 phy/control/xl1/dump | Collects the port register information. (This operation will not cause flapping of ports on the same chip. To see the xl1 port name, execute the debug port mapping slot command.) |
A port flaps
Symptom
A port flaps.
Troubleshooting flowchart
Figure 14 Troubleshooting port flapping
Solution
1. Perform a basic replacement test to remove link, interface module, and port problems.
a. Remove link problems. For more information, see "10-Gigabit SFP+ fiber port fails to come up."
b. Execute the display transceiver diagnosis interface command multiple times to view the Tx and Rx optical powers of the port. If port flapping is caused by the Tx/Rx power instability, troubleshoot the link.
c. If port flapping is caused by CRC error packets on the port, remove the problem of CRC error packets. For more information, see "CRC error packets on a port."
2. See the known problem list in the S6890 switch series usage guidelines to identify whether port flapping is caused by known problems. If port flapping is caused by known problems, upgrade the software or install patches according to the problem description and identify whether the port flapping problem has been resolved.
3. Identify whether the problem is caused by MAC faults on the port.
MAC faults include the following types:
¡ local fault—A fault exists on the local end in the direction of receiving packets from the remote end. In this case, a local fault prompt message appears on the local device, and you must locate the problem.
¡ remote fault—When the local fault occurs on a remote port, the remote end sends a message to the local end. A remote fault prompt message appears on the local device, which indicates that no fault exists on the local device in the direction of receiving packets. In this case, assist in troubleshooting the remote end.
MAC faults are not directly related to MAC addresses. If the issue persists after you remove the link and module problems, contact H3C Support.
[probe] debug port link-diag slot 2 0 1 0
Chassis 0 Slot 2/port/tgid:0 operation record, need be translated with code and line num
[0]:PID=128 TName=FMCK Line=5161 03/24/2016 13:26:26:369
m0=92 m1=1 m2=1 drv_port_opticmod_check_port_in
[7]:PID=193 TName=kifup Line=8592 03/24/2016 13:26:25:104
m0=1 m1=0 m2=0 drv_port_platform_event_notify
Slot:2/0 link down reason.
[0]: CallLine=1314 Time=03/23/2016 14:03:55:610
linkinfo=0xfd down. pma. pcs. xgs. fault. localfault
[9]: CallLine=1314 Time=03/23/2016 05:05:14:565
linkinfo=0xbd down. pma. pcs. xgs. fault. rmtfault
4. Execute commands in the following table to collect information and contact H3C Support.
Command | Description |
display transceiver diagnosis interface | Displays the current values of the digital diagnosis parameters on transceiver modules. |
display transceiver interface | Displays the key parameters of transceiver modules. |
display transceiver manuinfo interface | Displays electronic label information for transceiver modules. |
display transceiver information interface | Displays transmission information for transceiver modules. |
bcm slot 6 chip 0 phy/control/xl1/dump | Collects the port register information. (This operation will not cause flapping of ports on the same chip. To see the xl1 port name, execute the debug port mapping slot command.) |
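In the sample debug port link-diag output in step 3, each link-down record ends with either localfault or rmtfault. When many records have been saved, counting the two markers gives a quick indication of which end to examine first. The following Python sketch is illustrative only: the function name count_mac_faults is hypothetical, and the parser assumes that saved output uses the same linkinfo line format as the sample.

```python
import re
from collections import Counter

# Markers as they appear at the end of the linkinfo lines in the sample output.
FAULT_PATTERN = re.compile(r"\b(localfault|rmtfault)\b")


def count_mac_faults(text: str) -> Counter:
    """Count local and remote fault markers in saved link-diag output."""
    counts: Counter = Counter()
    for line in text.splitlines():
        # Only the linkinfo lines carry the fault markers.
        if "linkinfo=" not in line:
            continue
        counts.update(FAULT_PATTERN.findall(line))
    return counts


# The link-down records copied from the sample output.
sample = """Slot:2/0 link down reason.
[0]: CallLine=1314 Time=03/23/2016 14:03:55:610
linkinfo=0xfd down. pma. pcs. xgs. fault. localfault
[9]: CallLine=1314 Time=03/23/2016 05:05:14:565
linkinfo=0xbd down. pma. pcs. xgs. fault. rmtfault"""

faults = count_mac_faults(sample)
print(faults["localfault"], faults["rmtfault"])
```

For the sample records, the sketch counts one local fault and one remote fault. A majority of remote faults suggests starting with the remote end, as described for the remote fault type.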
CRC error packets on a port
Symptom
A large number of CRC error packets exist on the local or remote port.
Troubleshooting flowchart
Figure 15 Troubleshooting CRC error packets on a port
Solution
1. Perform a basic replacement test to remove link, interface module, and port problems.
a. Remove link problems. For more information, see "10-Gigabit SFP+ fiber port fails to come up."
b. Execute the display transceiver diagnosis interface command multiple times to view the Tx and Rx optical powers of the port. If CRC error packets on the port are caused by the Tx/Rx power instability, troubleshoot the link.
2. Identify whether the problem is known.
a. See the known problem list in the S6890 switch series usage guidelines to identify whether CRC error packets are caused by known problems. If CRC error packets are caused by known problems, upgrade the software or install patches according to the problem description and identify whether the problem of CRC error packets has been resolved.
b. Troubleshoot the problems of port failure to come up, port flapping, and CRC error packets on a port together. The three symptoms are sometimes related, and it is hard to locate the root cause. When one of the problems occurs, as a best practice, troubleshoot the three problems together.
3. Execute commands in the following table to collect information and contact H3C Support.
Command | Description |
display transceiver diagnosis interface | Displays the current values of the digital diagnosis parameters on transceiver modules. |
display transceiver interface | Displays the key parameters of transceiver modules. |
display transceiver manuinfo interface | Displays electronic label information for transceiver modules. |
display transceiver information interface | Displays transmission information for transceiver modules. |
bcm slot 6 chip 0 phy/control/xl1/dump | Collects the port register information. (This operation will not cause flapping of interfaces on the same chip. To see the xl1 port name, execute the debug port mapping slot command.) |
Related commands
This section lists the commands that you might use for troubleshooting ports.
Command | Description |
display diagnostic-information | Displays or saves running status data for multiple feature modules. |
display interface | Displays Ethernet interface information. |
display interface brief | Displays brief interface information. |
display link-aggregation summary | Displays the summary information for all aggregation groups. |
display logbuffer | Displays the state of the log buffer and the log information in the log buffer. |
display stp brief | Displays brief spanning tree status and statistics. |
display transceiver alarm interface | Displays the current transceiver module alarms. |
display transceiver diagnosis interface | Displays the present measured values of the digital diagnosis parameters for transceiver modules. |
display transceiver interface | Displays the key parameters of transceiver modules. |
Troubleshooting IRF
This section provides troubleshooting information for common IRF issues.
IRF fabric setup failure
Symptom
A device cannot be added to an IRF fabric.
Troubleshooting flowchart
Figure 16 Troubleshooting IRF fabric setup failure
Solution
1. Execute the display irf command to verify that the number of member devices in the IRF fabric has not reached the upper limit.
If the upper limit is reached, you cannot add new member devices to the IRF fabric. The upper limit varies by software version. For information about the upper limit, see IRF configuration in the configuration guides for your software version.
2. Verify that the device is the same model as the member devices in the IRF fabric.
The H3C S6890 switches must be the same model to form an IRF fabric.
3. Verify that the member ID of the device is unique in the IRF fabric:
a. Execute the display irf command to view member IDs.
b. Execute the irf member renumber command to assign a new member ID to the device if necessary.
4. Verify that the physical IRF links are connected correctly:
|
IMPORTANT: When you connect two neighboring IRF members, you must connect the physical interfaces of IRF-port 1 on one member to the physical interfaces of IRF-port 2 on the other. |
a. Execute the display irf configuration command on each member device, and check the IRF-Port1 and IRF-Port2 fields for IRF port bindings.
b. Verify that the physical IRF connections are consistent with the IRF port bindings.
c. If there are inconsistencies, reconfigure the IRF port bindings or reconnect the IRF physical interfaces.
5. Verify that all IRF links are up for the device:
|
IMPORTANT: An IRF port is a logical interface that connects IRF member devices. To use an IRF port, you must bind a minimum of one physical interface to it. The physical interfaces assigned to an IRF port automatically form an aggregate IRF link. An IRF port goes down when all its IRF physical interfaces are down. |
a. Execute the display irf topology command, and then check the Link field.
b. If the Link field for an IRF port displays DOWN, execute the display irf link command.
c. Check the Status field for each physical interface bound to the IRF port. If the field displays ADM, execute the undo shutdown command to bring up the interface. If the field displays DOWN, check the transceiver modules or cables for connectivity issues. When you select transceiver modules and cables, follow these restrictions and guidelines:
- Use SFP+ transceiver modules and fibers to connect SFP+ ports for a long-distance connection, or use SFP+ DAC cables to connect SFP+ ports for a short-distance connection.
- Use QSFP28 or QSFP+ transceiver modules and fibers to connect QSFP28 ports for a long-distance connection, or use QSFP28 or QSFP+ DAC cables to connect QSFP28 ports for a short-distance connection.
- The transceiver modules at the two ends of an IRF link must be the same type.
|
NOTE: Supported transceiver modules and cables might vary depending on the software version and device model. For more information, see the most recent installation guide. |
6. Verify that the device is running the same software version as the IRF fabric:
a. Execute the display version command to identify the software version.
b. Upgrade the device to use the same software version as the IRF fabric.
|
NOTE: Typically, the software auto-update feature can automatically synchronize a member device with the software version of the master device. However, the synchronization might fail when the gap between the software versions is large. |
7. Verify that the device has a unique bridge MAC address:
a. Execute the display interface vlan-interface 1 command, and then check the Hardware Address field.
[Sysname] display interface vlan-interface 1
Vlan-interface1 current state: UP
Line protocol current state: UP
Description: Vlan-interface1 Interface
The Maximum Transmit Unit is 1500
Internet protocol processing : disabled
IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0023-8912-3d07
IPv6 Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0023-8912-3d07
b. If the device has the same bridge MAC address as the IRF fabric, remove the bridge MAC conflict.
8. If the issue persists, execute the display diagnostic-information command and collect the device diagnostic information, and then send the information to H3C Support.
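The bridge MAC comparison in step 7 can be scripted when you have saved the display interface vlan-interface 1 output from both the device and the IRF fabric. The following is a minimal sketch, not an H3C tool. It assumes the Hardware Address field is formatted as in the sample output above, and the function names are hypothetical.

```python
import re

# Matches the "Hardware Address" field as it appears in the sample output
# above (three groups of four hexadecimal digits separated by hyphens).
HW_ADDR = re.compile(
    r"Hardware Address:\s*([0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4})"
)


def bridge_macs(display_output):
    """Return the distinct hardware addresses found in saved command output."""
    return {mac.lower() for mac in HW_ADDR.findall(display_output)}


def has_mac_conflict(device_output, fabric_output):
    """Return True if the device and the fabric report a common address."""
    return bool(bridge_macs(device_output) & bridge_macs(fabric_output))


# Sample text taken from the output shown in step 7.
SAMPLE = """\
Vlan-interface1 current state: UP
Line protocol current state: UP
IP Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0023-8912-3d07
IPv6 Packet Frame Type: PKTFMT_ETHNT_2, Hardware Address: 0023-8912-3d07
"""

if __name__ == "__main__":
    print(bridge_macs(SAMPLE))
```

If has_mac_conflict returns True for the two saved outputs, remove the bridge MAC conflict as described in step 7.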
IRF split
Symptom
An IRF fabric splits.
Troubleshooting flowchart
Figure 17 Troubleshooting IRF split
Solution
To resolve the issue:
1. Verify that the IRF physical interfaces are operating correctly:
a. Execute the display logbuffer command or view system logs to check for physical IRF link down events that occurred around the split time.
b. If the split followed an IRF link down event, execute the display interface command to check port statistics for CRC errors.
c. If transceiver modules and fibers are used, execute the display transceiver diagnosis command. Make sure the transmit and receive power values are within the power specifications of the transceiver module.
2. Remove hardware issues that might cause recurring IRF split events:
a. Execute the display version command to identify the uptime of IRF member devices.
b. Compare the uptime of IRF member devices to determine whether a member device had rebooted before the IRF split.
c. If the IRF split is caused by a device reboot, perform one of the following tasks:
- If the split is caused by a device reboot, use the methods described in "Switch reboot failure" to resolve the issue.
- If the reboot is caused by power failures, use the methods described in "Operating power module failure" to resolve the issue.
3. If the issue persists, execute the display diagnostic-information command and collect the device diagnostic information, and then send the information to H3C Support.
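The uptime comparison in step 2 can be sketched as follows. This example is illustrative only. It assumes that the display version output contains an uptime line in the form "uptime is W weeks, D days, H hours, M minutes" (verify the exact wording on your software version), and the function names and the 10-minute margin are hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Assumed format of the uptime line in the display version output.
UPTIME = re.compile(
    r"uptime is (\d+) weeks?, (\d+) days?, (\d+) hours?, (\d+) minutes?"
)


def parse_uptime(text):
    """Convert the uptime line in saved command output to a timedelta."""
    match = UPTIME.search(text)
    if match is None:
        raise ValueError("no uptime line found")
    weeks, days, hours, minutes = (int(group) for group in match.groups())
    return timedelta(weeks=weeks, days=days, hours=hours, minutes=minutes)


def booted_after(text, collected_at, event_time, margin=timedelta(minutes=10)):
    """Return True if the last boot happened at or after the event time.

    collected_at is the time when the command output was collected.
    margin absorbs clock differences between log timestamps and uptime.
    """
    boot_time = collected_at - parse_uptime(text)
    return boot_time >= event_time - margin


if __name__ == "__main__":
    collected = datetime(2019, 7, 25, 12, 0)
    split = datetime(2019, 7, 25, 9, 55)
    line = "uptime is 0 weeks, 0 days, 2 hours, 0 minutes"
    print(booted_after(line, collected, split))
```

A member device for which booted_after returns True started up around or after the split, which suggests that its reboot is related to the split event.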
BFD MAD failure
Symptom
BFD MAD fails to detect an IRF split event. Two IRF fabrics are operating with the same Layer 3 configurations, including the same IP address.
Troubleshooting flowchart
Figure 18 Troubleshooting BFD MAD failure
Solution
To resolve the issue:
1. Verify that BFD MAD link connections are correct:
¡ If you do not use an intermediate device, verify that each pair of member devices has a dedicated BFD MAD link.
¡ If you use an intermediate device, verify that each member device has a dedicated BFD MAD link with the intermediate device.
As a best practice to conserve management Ethernet port resources, use an intermediate device if you use management Ethernet ports for BFD MAD.
2. Verify that all physical ports used for BFD MAD are up:
a. Execute the display interface command.
b. Check the Current state field in the command output:
- If the field displays Administratively DOWN for a port, execute the undo shutdown command to bring up the port.
- If the field displays DOWN for a port, check the physical link for a link failure.
3. Verify that the Layer 3 interface used for BFD MAD and the BFD MAD VLAN (if any) are configured correctly:
a. Execute the display mad verbose command.
b. Check the MAD BFD enabled interface field to identify the Layer 3 interface used for BFD MAD.
|
NOTE: The interface used for BFD MAD can be a VLAN interface, Layer 3 aggregate interface, or management Ethernet port, depending on the software version. For more information, see IRF configuration in the configuration guides for the software version of your device. |
c. Verify that the Layer 3 interface used for BFD MAD is configured as required depending on the interface type. Table 3, Table 4, and Table 5 list the restrictions and guidelines on configuring a VLAN interface, Layer 3 aggregate interface, and management Ethernet port, respectively.
Table 3 BFD MAD configuration restrictions and guidelines (VLAN interface)
Category |
Restrictions and guidelines |
BFD MAD VLAN |
· Do not enable BFD MAD on VLAN-interface 1.
· If you are using an intermediate device, perform the following tasks:
¡ On both the IRF fabric and intermediate device, create a VLAN for BFD MAD.
¡ On both the IRF fabric and intermediate device, assign the ports of BFD MAD links to the BFD MAD VLAN.
¡ On the IRF fabric, create a VLAN interface for the BFD MAD VLAN.
· Make sure the IRF fabrics on the network use different BFD MAD VLANs.
· Make sure the BFD MAD VLAN contains only ports on the BFD MAD links. Exclude a port from the BFD MAD VLAN if that port is not on a BFD MAD link. If you have assigned that port to all VLANs by using the port trunk permit vlan all command, use the undo port trunk permit command to exclude that port from the BFD MAD VLAN. |
BFD MAD VLAN and feature compatibility |
· Do not use the BFD MAD VLAN and its member ports for any purpose other than configuring BFD MAD.
· Use only the mad bfd enable and mad ip address commands on the BFD MAD-enabled VLAN interface. If you configure other features, both BFD MAD and other features on the interface might run incorrectly.
· Disable the spanning tree feature on any Layer 2 Ethernet ports in the BFD MAD VLAN. The MAD feature is mutually exclusive with the spanning tree feature. |
Table 4 BFD MAD configuration restrictions and guidelines (Layer 3 aggregate interface)
Category |
Restrictions and guidelines |
BFD MAD-enabled Layer 3 aggregate interface |
· Make sure the Layer 3 aggregate interface operates in static aggregation mode.
· Make sure the member ports in the aggregation group do not exceed the maximum number of Selected ports allowed for an aggregation group. If the number of member ports exceeds the maximum number of Selected ports, some member ports cannot become Selected. BFD MAD will be unable to work correctly and its state will change to Faulty. |
BFD MAD VLAN |
· On the intermediate device (if any), assign the ports on the BFD MAD links to the same VLAN. Do not assign the ports to an aggregate interface. If the ports are hybrid ports, make sure these ports are untagged members of their PVIDs.
· If the intermediate device acts as a BFD MAD intermediate device for multiple IRF fabrics, assign different BFD MAD VLANs to the IRF fabrics.
· Do not use the BFD MAD VLAN on the intermediate device for any purposes other than BFD MAD.
· Make sure the BFD MAD VLAN on the intermediate device contains only ports on the BFD MAD links. Exclude a port from the BFD MAD VLAN if that port is not on a BFD MAD link. If you have assigned that port to all VLANs by using the port trunk permit vlan all command, use the undo port trunk permit command to exclude that port from the BFD MAD VLAN. |
BFD MAD-enabled Layer 3 aggregate interface and feature compatibility |
Use only the mad bfd enable and mad ip address commands on the BFD MAD-enabled interface. If you configure other features, both BFD MAD and other features on the interface might run incorrectly. |
Table 5 BFD MAD configuration restrictions and guidelines (management Ethernet port)
Category |
Restrictions and guidelines |
Management Ethernet ports for BFD MAD |
Connect a management Ethernet port on each IRF member device to the common Ethernet ports on the intermediate device. |
BFD MAD VLAN |
· On the intermediate device, create a VLAN for BFD MAD, and assign the ports used for BFD MAD to the VLAN. On the IRF fabric, you do not need to assign the management Ethernet ports to the VLAN.
· Make sure the IRF fabrics on the network use different BFD MAD VLANs.
· Make sure the BFD MAD VLAN on the intermediate device contains only ports on the BFD MAD links. |
4. Verify that MAD IP addresses are configured correctly:
a. Execute the display mad verbose command.
b. Check the MAD IP address field to verify that all the MAD IP addresses are on the same subnet. In addition, verify that the MAD IP addresses are unique among all IP addresses on the IRF fabric.
c. Execute the display interface command to verify that the Layer 3 interface used for BFD MAD has only MAD IP addresses configured by using the mad ip address command. For example, make sure the interface does not have a VRRP virtual address or an IP address configured by using the ip address command.
5. Verify that the physical ports in the BFD MAD VLAN are always up:
a. Execute the display logbuffer command or use system logs to check for BFD MAD port-down events that occurred around the split time.
b. Identify the cause of the events, and remove the issue.
6. If the issue persists, execute the display diagnostic-information command and collect the device diagnostic information, and then send the information to H3C Support.
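The MAD IP address checks in step 4 (same subnet, no duplicates, and no overlap with other IP addresses on the IRF fabric) can be sketched with the Python standard ipaddress module. This is an illustration under the assumption that you have already collected the addresses from the display mad verbose and display interface output. The function name and the address values are hypothetical.

```python
import ipaddress


def check_mad_ips(mad_ips, other_ips=()):
    """Return a list of problems found in the MAD IP address plan.

    mad_ips:   MAD IP addresses with prefix length, for example "192.168.2.1/24".
    other_ips: other IP addresses configured on the IRF fabric (no prefix).
    """
    problems = []
    interfaces = [ipaddress.ip_interface(text) for text in mad_ips]

    # All MAD IP addresses must be on the same subnet.
    if len({interface.network for interface in interfaces}) > 1:
        problems.append("MAD IP addresses are on different subnets")

    # Each member device must use its own MAD IP address.
    addresses = [interface.ip for interface in interfaces]
    if len(set(addresses)) != len(addresses):
        problems.append("duplicate MAD IP address")

    # MAD IP addresses must not be used elsewhere on the IRF fabric.
    others = {ipaddress.ip_address(text) for text in other_ips}
    for address in addresses:
        if address in others:
            problems.append("MAD IP address %s is also used elsewhere" % address)
    return problems


if __name__ == "__main__":
    print(check_mad_ips(["192.168.2.1/24", "192.168.2.2/24"], ["10.0.0.1"]))
```

An empty list means that the collected addresses pass the three checks. The sketch does not verify that the addresses were configured by using the mad ip address command.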
LACP MAD failure
Symptom
LACP MAD fails to detect an IRF split event. Two IRF fabrics are operating with the same Layer 3 configurations, including the same IP address.
Troubleshooting flowchart
Figure 19 Troubleshooting LACP MAD failure
Solution
To resolve the issue:
1. Verify that the intermediate device is a device that supports extended LACPDUs for MAD.
If the intermediate device does not support extended LACPDUs for MAD, replace the intermediate device, or use BFD MAD for split detection.
2. Verify that each member device has a link in the link aggregation with the intermediate device.
3. Verify that the link aggregation is operating in dynamic mode.
To enable dynamic aggregation mode, use the link-aggregation mode dynamic command.
4. Verify that the aggregate interface and its member ports are up:
a. Execute the display interface command.
b. Check the Current state field of the aggregate interface:
- If the field displays Administratively DOWN, execute the undo shutdown command to bring up the interface.
- If the field displays DOWN, check the state of all its physical ports.
An aggregate interface goes down only if all its physical ports are down.
c. Check the Current state field of each member port:
- If the field displays Administratively DOWN, execute the undo shutdown command to bring up the port.
- If the field displays DOWN, check the physical link of the port for a link failure.
5. If the intermediate device is also an IRF fabric, verify that the two IRF fabrics use different IRF domain IDs:
|
CAUTION: The IRF member devices send extended LACPDUs with TLVs that convey the domain ID and the active ID (the member ID of the master) of the IRF fabric. To avoid split detection failure, make sure each IRF fabric has a unique domain ID. |
a. Execute the display irf command to identify the domain ID of each IRF fabric.
b. If the IRF fabrics use the same domain ID, execute the irf domain command to change the domain ID on one IRF fabric.
6. Verify that the physical ports in the link aggregation are always up:
a. Execute the display logbuffer command or use system logs to check for port-down events around the split time.
b. Identify the event cause and remove the issue.
7. If the issue persists, execute the display diagnostic-information command and collect the device diagnostic information, and then send the information to H3C Support.
Related commands
This section lists the commands that you might use for troubleshooting IRF:
Command |
Description |
display diagnostic-information |
Displays or saves the operating statistics for multiple feature modules. |
display interface |
Displays interface information. |
display interface brief |
Displays brief interface information. |
display irf |
Displays IRF fabric information, including the member ID, role, priority, bridge MAC address, and description of each IRF member. |
display irf configuration |
Displays the IRF configuration on each member device. |
display irf topology |
Displays the IRF topology. |
display mad verbose |
Displays detailed MAD configuration. |
display transceiver diagnosis |
Displays the present measured values of the digital diagnosis parameters for transceiver modules. |
display logbuffer |
Displays log data in the log buffer. |
display version |
Displays system version information. |
Troubleshooting QoS and ACL
This section provides troubleshooting information for common QoS and ACL issues.
ACL application failure for unsupported ACL rules or insufficient resources
Symptom
The system fails to apply a packet filter or an ACL-based QoS policy to the hardware. It displays an error message that an unsupported rule exists in the ACL or hardware resources are insufficient. The following are sample error messages:
· For unsupported rules:
· For insufficient hardware resources:
Error: Slot=2 Fail to apply or refresh packet filter policy 3001 rule 25 on interface Vlan-interface6 due to lack of resources.
Warning: Classifier-behavior test in policy test applied on vlan 4079 failed in slot 2. Reason: Not enough hardware resource.
Either of the preceding two messages indicates insufficient hardware resources.
Troubleshooting flowchart
Figure 20 Troubleshooting ACL application failure
Solution
To resolve the issue in the case of unsupported rules:
1. Identify the unsupported match criterion in the rule:
a. Split the rule into multiple rules such that each rule contains one match criterion.
b. Apply the packet filter or QoS policy again.
c. Identify the unsupported criterion according to the new error message.
If the rule contains only one criterion, the criterion is not supported.
2. If the issue persists, contact H3C Support.
To resolve the issue in the case of insufficient hardware resources:
1. Check the ACL, Counter, and Meter resource usage for insufficiency.
Execute the display qos-acl resource command to display the resource usage. If the number of ACLs to be applied is higher than the value of the Remaining field, resources are insufficient. If the ACLs are to be applied globally, check the resource usage on all cards.
[Sysname] display qos-acl resource
Interfaces: XGE1/0/1 to XGE1/0/24, HGE1/0/25 to HGE1/0/30
XGE1/0/31 to XGE1/0/54, MGE0/0/0 to MGE0/0/1 (slot 1)
---------------------------------------------------------------------
Type Total Reserved Configured Remaining Usage
---------------------------------------------------------------------
VFP ACL 41984 0 0 41984 0%
IFP ACL 50176 8192 0 41984 16%
IFP Meter 30720 129 0 30591 0%
IFP Counter 8175 140 0 8035 1%
EFP ACL 20992 0 0 20992 0%
EFP Counter 4094 0 0 4094 0%
¡ IFP—Resource usage for inbound traffic.
¡ EFP—Resource usage for outbound traffic.
2. Delete unnecessary ACLs to release resources if the failure is caused by ACL resource insufficiency. Delete unnecessary ACLs that use Meter or Counter resources if the failure is caused by Meter or Counter resource insufficiency. If the ACL resources are sufficient, go to step 4.
3. Apply the packet filter or QoS policy again.
4. If the issue persists, collect diagnostic information by using the following command and contact H3C Support.
<Sysname> display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N]:Y
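The resource check in step 1 can be automated against saved display qos-acl resource output. The following is a minimal sketch that assumes the column layout shown in the sample output above and a single slot block in the input. The function names are hypothetical.

```python
import re

# One table row: resource type, then Total, Reserved, Configured,
# Remaining, and Usage (a percentage), as in the sample output above.
ROW = re.compile(
    r"^\s*([A-Za-z]+(?: [A-Za-z]+)*)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)%\s*$"
)


def parse_resources(output):
    """Map each resource type to its counters for one slot block."""
    table = {}
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            total, reserved, configured, remaining = (
                int(value) for value in match.groups()[1:5]
            )
            table[match.group(1)] = {
                "total": total,
                "reserved": reserved,
                "configured": configured,
                "remaining": remaining,
            }
    return table


def insufficient(table, needed):
    """Return the resource types whose Remaining value is below the need."""
    return [name for name, count in needed.items()
            if table[name]["remaining"] < count]


# Rows copied from the sample output in step 1.
SAMPLE = """\
Type Total Reserved Configured Remaining Usage
VFP ACL 41984 0 0 41984 0%
IFP ACL 50176 8192 0 41984 16%
IFP Meter 30720 129 0 30591 0%
IFP Counter 8175 140 0 8035 1%
EFP ACL 20992 0 0 20992 0%
EFP Counter 4094 0 0 4094 0%
"""

if __name__ == "__main__":
    print(insufficient(parse_resources(SAMPLE), {"IFP ACL": 100}))
```

For ACLs applied globally, run the check once for each card, because the output lists resource usage separately for each card.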
ACL application failure without an error message
Symptom
The system applies a packet filter or an ACL-based QoS policy to the hardware without displaying an error message. However, the ACL does not take effect.
Troubleshooting flowchart
Figure 21 Troubleshooting ACL application failure
Solution
To resolve the issue:
1. Check the ACLs used by QoS policies and packet filters for overlapping rules:
a. Use the following commands to display ACLs that are used by QoS policies and packet filters:
- display packet-filter
- display qos policy user-defined
- display traffic classifier user-defined
b. Execute the display acl command to check for overlapping rules in the ACLs.
For example, the following sample output shows that rule 0 in ACL 3100 and rule 0 in ACL 3009 specify the same source address range and therefore match the same traffic.
ACL number 3100
rule 0 permit ip source 2.2.2.2 255.255.0.0
ACL number 3009
rule 0 permit ip source 2.2.2.2 255.255.0.0
2. Check the filters and policies that use overlapping ACLs for a behavior conflict.
If two behaviors conflict, the device performs the behavior that has higher priority, as shown in Table 6.
Table 6 Rules for selecting a higher priority behavior from conflicting behaviors
Conflicting behaviors |
Higher priority behavior |
· redirect · filter permit |
redirect |
· redirect · filter deny |
filter deny |
· filter permit · filter deny |
The behavior configured first. |
3. Revise ACLs, packet filters, or QoS policies to remove the behavior conflict.
4. If the issue persists, collect diagnostic information by using the following command and contact H3C Support.
<Sysname> display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N]:Y
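The overlap check in step 1 and the priority rules in Table 6 can be sketched as follows. This example is illustrative only. It assumes that the mask in a rule is a wildcard mask, in which a 0 bit must match and a 1 bit is ignored, and it compares only source addresses. The function names are hypothetical.

```python
import ipaddress


def _to_int(dotted):
    """Convert a dotted-decimal address or mask to an integer."""
    return int(ipaddress.IPv4Address(dotted))


def rules_overlap(addr_a, wild_a, addr_b, wild_b):
    """Return True if two source criteria can match the same address.

    Two criteria overlap when their addresses agree on every bit that
    both wildcard masks require to match (bits that are 0 in both masks).
    """
    care = ~(_to_int(wild_a) | _to_int(wild_b)) & 0xFFFFFFFF
    return ((_to_int(addr_a) ^ _to_int(addr_b)) & care) == 0


def winning_behavior(first, second):
    """Apply Table 6; first is the behavior that was configured first."""
    pair = {first, second}
    if pair == {"redirect", "filter permit"}:
        return "redirect"
    if pair == {"redirect", "filter deny"}:
        return "filter deny"
    if pair == {"filter permit", "filter deny"}:
        return first
    raise ValueError("behavior pair is not listed in Table 6")


if __name__ == "__main__":
    # The two rules from the sample output in step 1.
    print(rules_overlap("2.2.2.2", "255.255.0.0", "2.2.2.2", "255.255.0.0"))
    print(winning_behavior("filter permit", "redirect"))
```

Rules that also match on destination addresses, protocols, or ports overlap only if every criterion overlaps, so treat a True result from this sketch as a prompt for manual review.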
Related commands
This section lists the commands that you might use for troubleshooting QoS and ACLs.
Command |
Description |
display acl |
Displays configuration and match statistics for ACLs. |
display diagnostic-information |
Displays operating statistics for multiple feature modules in the system. |
display packet-filter |
Displays ACL application information for packet filtering. |
display qos-acl resource |
Displays ACL resource usage. |
display qos policy interface |
Displays QoS policies applied to interfaces. |
display qos policy user-defined |
Displays user-defined QoS policies. |
display traffic classifier user-defined |
Displays user-defined traffic classes. |