- Released At: 16-04-2025
H3C SR6600[SR6600-X] Router Series
Troubleshooting Guide
Document version: 6W100-20250416
Copyright © 2025 New H3C Technologies Co., Ltd. All rights reserved. No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd. Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners. The information in this document is subject to change without notice.
Collecting log and operating information
Troubleshooting hardware issues
Troubleshooting fundamental issues
Troubleshooting system management issues
Hardware resource management issues
Troubleshooting virtual technology issues
Troubleshooting interface issues
Troubleshooting Layer 2—LAN switching issues
Ethernet link aggregation issues
Troubleshooting Layer 2—WAN access issues
Troubleshooting Layer 3—IP services issues
Troubleshooting Layer 3 IP routing issues
Troubleshooting multicast issues
Troubleshooting MPLS L2VPN/VPLS
Troubleshooting MPLS L3VPN issues
Troubleshooting segment routing issues
Troubleshooting EVPN VPWS over SRv6
Troubleshooting ACL and QoS issues
Troubleshooting IP tunneling and security VPN issues
Troubleshooting user access and authentication issues
Troubleshooting security issues
Troubleshooting high availability issues
Troubleshooting system management issues
Troubleshooting network management and monitoring issues
Introduction
This document provides information about troubleshooting common software and hardware problems with H3C SR6600[SR6600-X] routers.
General guidelines
IMPORTANT: To prevent a problem from causing loss of configuration, save the configuration each time you finish configuring a feature. For configuration recovery, regularly back up the configuration to a remote server.
When you troubleshoot H3C SR6600[SR6600-X] routers, follow these general guidelines:
· To help identify the cause of the problem, collect system and configuration information, including:
¡ Symptom, time of failure, and configuration.
¡ Network topology information, including the network diagram, port connections, and points of failure.
¡ Log messages and diagnostic information. For more information about collecting this information, see "Collecting log and operating information."
¡ Physical evidence of failure:
- Photos of the hardware.
- Status of the card, power, and fan LEDs.
¡ Steps you have taken, such as reconfiguration, cable swapping, and rebooting.
¡ Output from the commands executed during the troubleshooting process.
· To ensure safety, wear an ESD-preventive wrist strap when you replace or maintain a hardware component.
· If hardware replacement is required, use the release notes to verify the hardware and software compatibility.
Collecting log and operating information
IMPORTANT: By default, the information center is enabled. If the feature has been disabled, you must use the info-center enable command to enable it before you can collect log messages.
Table 1 shows the types of files that the system uses to store operating log and status information. You can export these files by using FTP, TFTP, or USB. To more easily locate log information, use a consistent rule to categorize and name files. For example, save log information files to a separate folder for each MPU on a distributed device, and include their chassis and slot numbers in the folder names.
Table 1 Log and operating information
Category | File name format | Content
Common log | logfileX.log | Command execution and operational log messages.
Diagnostic log | diagfileX.log | Diagnostic log messages about device operation, including the following items: · Parameter settings in effect when an error occurs. · Information about a card startup error. · Handshaking information between the MPU and interface card when a communication error occurs.
Operating statistics | file-basename.gz | Collecting operating statistics decreases system performance. Current operating statistics for feature modules, including the following items: · Device status. · CPU status. · Memory status. · Configuration status. · Software entries. · Hardware entries.
NOTE: The system automatically compresses common and diagnostic log files into .gz files when they are full.
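The naming rule suggested before Table 1 (one folder per MPU, with chassis and slot numbers in the folder name) can be sketched as a small helper that runs on the server receiving the exported files. The folder pattern, the function names, and the device label SR6600-A are illustrative assumptions, not an H3C convention.

```python
from pathlib import PurePosixPath


def log_folder(root: str, device: str, chassis: int, slot: int) -> PurePosixPath:
    """Build one folder path per MPU, with chassis and slot numbers in the name."""
    # Hypothetical pattern: <root>/<device>/chassis<N>_slot<M>
    return PurePosixPath(root) / device / f"chassis{chassis}_slot{slot}"


def log_path(root: str, device: str, chassis: int, slot: int, file_name: str) -> PurePosixPath:
    """Return the full destination path for one exported log file."""
    return log_folder(root, device, chassis, slot) / file_name


if __name__ == "__main__":
    # logfile8.log is the file name from the example output in this guide;
    # the root directory and device label are made up for illustration.
    print(log_path("/var/h3c-logs", "SR6600-A", 1, 0, "logfile8.log"))
```

Keeping the chassis and slot in the folder name rather than in the file name leaves the original file names untouched, which makes it easier to match them against the device output.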
Collecting common log messages
# Save common log messages from the log buffer to a log file.
By default, the log file is saved in the logfile directory of the storage medium on the device.
<Sysname> logfile save
The contents in the log file buffer have been saved to the file cfa0:/logfile/logfile8.log
# Identify the log file on the active MPU of the master device.
<Sysname> dir cfa0:/logfile/
Directory of cfa0:/logfile
0 -rw- 21863 Jul 11 2013 16:00:37 logfile8.log
1021104 KB total (421552 KB free)
# Identify the log file on the standby MPU of the master device.
<Sysname> dir slot1#cfa0:/logfile/
Directory of slot1#cfa0:/logfile
0 -rw- 21863 Jul 11 2013 16:00:37 logfile8.log
1021104 KB total (421552 KB free)
# Transfer the files to the desired destination by using FTP, TFTP, or USB. (Details not shown.)
Collecting diagnostic log messages
# Save diagnostic log messages from the diagnostic log file buffer to a diagnostic log file.
By default, the diagnostic log file is saved in the diagfile directory of the storage medium on the device.
<Sysname> diagnostic-logfile save
The contents in the diagnostic log file buffer have been saved to the file cfa0:/diagfile/diagfile18.log
# Identify the diagnostic log file on the active MPU of the master device.
<Sysname> dir cfa0:/diagfile/
Directory of cfa0:/diagfile
0 -rw- 161321 Jul 11 2013 16:16:00 diagfile18.log
1021104 KB total (421416 KB free)
# Identify the diagnostic log file on the standby MPU of the master device.
<Sysname> dir slot1#cfa0:/diagfile/
Directory of slot1#cfa0:/diagfile
0 -rw- 161321 Jul 11 2013 16:16:00 diagfile18.log
1021104 KB total (421416 KB free)
# Transfer the files to the desired destination by using FTP, TFTP, or USB. (Details not shown.)
Collecting operating statistics
You can collect operating statistics by saving the statistics to a file or displaying the statistics on the screen.
When you collect operating statistics, follow these guidelines:
· Log in to the device through a network or management port instead of the console port, if possible. Network and management ports are faster than the console port.
· Do not execute commands while operating statistics are being collected.
· H3C recommends saving operating statistics to a file to retain the information.
NOTE: The amount of time required to collect statistics increases with the number of cards.
To collect operating statistics:
1. Disable pausing between screens of output if you want to display operating statistics on the screen. Skip this step if you are saving statistics to a file.
<Sysname> screen-length disable
2. Collect operating statistics for multiple feature modules.
<Sysname> display diagnostic-information
Save or display diagnostic information (Y=save, N=display)? [Y/N] :
3. At the prompt, choose to save or display operating statistics:
# To save operating statistics, enter y at the prompt and then specify the destination file path.
Save or display diagnostic information (Y=save, N=display)? [Y/N] :y
Please input the file name(*.tar.gz)[cfa0:/diag.tar.gz] :cfa0:/diag.tar.gz
Diagnostic information is outputting to cfa0:/diag.tar.gz.
Please wait...
Save successfully.
<Sysname> dir cfa0:/
Directory of cfa0:
…
6 -rw- 898180 Jun 26 2013 09:23:51 diag.tar.gz
1021808 KB total (259072 KB free)
# To display operating statistics on the monitor terminal, enter n at the prompt.
Save or display diagnostic information (Y=save, N=display)? [Y/N] :n
===========================================================
===============display alarm===============
No alarm information.
=========================================================
===============display boot-loader===============
Software images on slot 0:
Current software images:
cfa0:/SR6600X-CMW710-BOOT-R7328_mrpnc.bin
cfa0:/SR6600X-CMW710-SYSTEM-R7328_mrpnc.bin
Main startup software images:
cfa0:/SR6600X-CMW710-BOOT-R7328_mrpnc.bin
cfa0:/SR6600X-CMW710-SYSTEM-R7328_mrpnc.bin
Backup startup software images:
None
=========================================================
===============display counters inbound interface===============
Interface Total (pkts) Broadcast (pkts) Multicast (pkts) Err (pkts)
BAGG1 0 0 0 0
GE2/0/1 0 0 0 0
GE2/0/2 2 2 0 0
GE2/0/3 0 0 0 0
GE2/0/4 0 0 0 0
GE2/0/5 0 0 0 0
GE2/0/6 0 0 0 0
GE2/0/7 0 0 0 0
GE2/0/8 0 0 0 0
GE2/0/9 0 0 0 0
GE2/0/10 0 0 0 0
……
Contacting technical support
When you contact H3C Support, be prepared to provide the following information:
· Information described in "General guidelines."
· Product serial numbers.
This information will help the support engineer assist you as quickly as possible.
Contact H3C Support at [email protected].
Troubleshooting hardware issues
System issues
The terminal displays nothing or garbled characters
Symptom
When the device powers on, the configuration terminal displays nothing or garbled characters.
Common causes
The following are the common causes of this type of issue:
· The power supply is malfunctioning.
· The MPU is operating abnormally.
· The configuration cable is not connected to the MPU's console port.
· The terminal parameter settings are incorrect.
· The configuration cable is faulty.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 1.
Solution
1. Identify whether the power is functioning properly.
If the power supply status LED indicates an abnormal state, see the power supply troubleshooting section.
2. Identify whether the MPU operates normally.
If the MPU status LED indicates an abnormal state, see the MPU troubleshooting section.
3. Identify whether the configuration cable is connected to the MPU's console port.
4. Identify whether the configuration terminal is connected through the correct COM port. Make sure the serial port selected in the terminal software is the port that the cable is connected to, and that the serial port parameters are configured correctly.
The serial port parameters are as follows: baud rate 9600, 8 data bits, no parity check, 1 stop bit, and no flow control. Select VT100 for terminal emulation. The required parameters might vary by device configuration. Use the settings that apply to your device.
5. Replace the configuration cable.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
The device experiences an abnormal reboot
Symptom
The device experiences an abnormal restart during operation.
Common causes
Common causes of this type of failure include boot file issues.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 2.
Figure 2 Troubleshooting flowchart
Solution
1. Identify whether the device can enter command line mode after rebooting.
If the device can access command line mode, use the display diagnostic-information command to collect diagnostic information. After collecting, export the device information and send it to H3C technical support for assistance.
NOTE: When you execute the display diagnostic-information command, you can specify the key-info parameter to collect only essential diagnostic information, which reduces collection time.
2. Identify whether the startup file is functioning properly.
If the device cannot enter command line mode, connect to the device through the console port and restart it. If BootWare reports a CRC error or cannot find the boot file, use the BootWare menu to download the boot file again and make sure it is set as the current boot file. During the BootWare loading process, BootWare automatically sets this file as the current boot file.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Temperature anomaly alarm
Symptom
The system generates a temperature alarm and prints alarm messages that indicate high temperature, for example:
%Jun 26 10:13:46:233 2013 H3C DRVPLAT/4/DrvDebug: Temperature of the board is too high!
Common causes
The following are the common causes of this type of issue:
· Poor ventilation or air conditioning failures cause high ambient temperature.
· The device fan malfunctions or the air intake vent is blocked by foreign objects.
· The air filter on the device has accumulated too much dust.
· The software failed to retrieve temperature data and generated an error alarm.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 3.
Figure 3 Troubleshooting flowchart
Solution
1. Identify whether the ambient temperature is too high.
If the temperature is too high, increase cooling capacity or take other heat dissipation measures to lower the ambient temperature.
2. Identify whether the device temperature is too high.
Execute the display environment command to check the current temperature of the device. If the command displays 255, the software failed to obtain temperature data. Execute the display environment command repeatedly until the temperature data is displayed correctly. Then, identify whether the device temperature is too high.
If the device temperature is too high (exceeding the general high-temperature alarm threshold), verify that the device fans are operating correctly and identify whether the air intake vent is blocked by foreign objects.
3. Use the display fan command to identify whether the fan tray is operating correctly. If it is not, see the fan module troubleshooting section.
4. Identify whether the air filter is clean.
If the fan tray operates correctly, check the air filter. If the filter is dirty, clean it, and then identify whether the temperature returns to normal.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
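The handling of the 255 placeholder in step 2 can be sketched as follows. This is a minimal illustration that works on temperature values you have already read from repeated display environment executions. The function names and the threshold of 80 used in the example are assumptions, because the actual alarm threshold depends on the device.

```python
# Per this guide, a displayed value of 255 means the software failed to
# obtain temperature data, so it must not be treated as a real temperature.
INVALID_READING = 255


def first_valid(readings):
    """Return the first reading that is not the 255 placeholder, or None."""
    for value in readings:
        if value != INVALID_READING:
            return value
    return None


def is_too_hot(readings, alarm_threshold):
    """True only if a valid reading exists and exceeds the alarm threshold."""
    value = first_valid(readings)
    return value is not None and value > alarm_threshold


if __name__ == "__main__":
    # Hypothetical readings from three consecutive command executions.
    samples = [255, 255, 47]
    print(first_valid(samples))     # the first usable temperature value
    print(is_too_hot(samples, 80))  # 80 is an illustrative threshold only
```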
Related alarm and log messages
Alarm messages
N/A
Log messages
· TEMP_HIGH
· TEMP_LOW
· TEMP_NORMAL
· TEMPERATURE_ALARM
· TEMPERATURE_LOW
· TEMPERATURE_NORMAL
· TEMPERATURE_POWEROFF
· TEMPERATURE_SHUTDOWN
· TEMPERATURE_WARNING
Voltage abnormality alarm
Symptom
The system prints voltage anomaly alarm messages, for example:
DEV/4/VOLTAGE_HIGH: Voltage is greater than the high-voltage alarm threshold on chassis 1 slot 16 voltage sensor 1.
DEV/4/VOLTAGE_LOW: Voltage is less than the low-voltage alarm threshold on chassis 1 slot 16 voltage sensor 24.
Common causes
A common cause of this type of issue is a hardware malfunction.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 4.
Figure 4 Troubleshooting flowchart
Solution
Collect the device configuration file, log information, and alarm information, and contact Technical Support.
Related alarm and log messages
Alarm messages
N/A
Log messages
· VOLT_HIGH
· VOLT_LOW
· VOLT_NORMAL
Memory exception alarm
Symptom
The system prints memory exception alarm messages, such as:
DIAG/1/MEM_EXCEED_THRESHOLD: Memory minor threshold has been exceeded.
Common causes
The most common cause of this type of issue is a memory leak.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 5.
Figure 5 Troubleshooting flowchart
Solution
1. Determine the usage of each memory block.
Use the display system internal kernel memory pool command in probe view to check the memory usage of each block. Identify memory blocks with abnormal or continuously increasing usage.
<Sysname> system-view
[Sysname] probe
[Sysname-probe] display system internal kernel memory pool slot 1
Active Number Size Align Slab Pg/Slab ASlabs NSlabs Name
9126 9248 64 8 32 1 289 289 kmalloc-64
105 112 16328 0 2 8 54 56 kmalloc-16328
14 14 2097096 0 1 512 14 14 kmalloc-2097096
147 225 2048 8 15 8 12 15 kmalloc-2048
7108 7232 192 8 32 2 226 226 kmalloc-192
22 22 524232 0 1 128 22 22 kmalloc-524232
1288 1344 128 8 21 1 64 64 kmalloc-128
0 0 67108808 0 1 16384 0 0 kmalloc-67108808
630 651 4096 8 7 8 93 93 kmalloc-4096
68 70 131016 0 1 32 68 70 kmalloc-131016
1718 2048 8 8 64 1 31 32 kmalloc-8
1 1 16777160 0 1 4096 1 1 kmalloc-16777160
2 15 2048 0 15 8 1 1 sgpool-64
0 0 40 0 42 1 0 0 inotify_event_cache
325 330 16328 8 2 8 165 165 kmalloc_dma-16328
0 0 72 0 30 1 0 0 LFIB_IlmEntryCache
0 0 1080 0 28 8 0 0 LFIB_IlmEntryCache
0 0 1464 0 21 8 0 0 MFW_FsCache
1 20 136 0 20 1 1 1 L2VFIB_Ac_cache
0 0 240 0 25 2 0 0 CCF_JOBDESC
0 0 88 0 26 1 0 0 NS4_Aggre_TosSrcPre
0 0 128 0 21 1 0 0 IPFS_CacheHash_cachep
---- More ----
Focus on the statistics in the Number and Size columns. If a specific block shows continuous growth, the usage of that block keeps increasing. Follow these restrictions and guidelines:
¡ An increase in memory block usage can be normal, so determine whether the growth is truly abnormal. Number*Size represents the amount of memory used by a specific module. To determine whether memory usage is normal, observe the memory growth rate and analyze the amount of memory used over time.
¡ Some memory leaks occur slowly and require a longer observation period, possibly weeks, for comparison.
2. Collect information and contact Technical Support.
The preceding steps only narrow down the scope of the issue. Further information is required to identify the specific fault. Because the subsequent information collection is demanding, do not attempt it yourself. Contact H3C Support.
Do not restart the device. A restart might destroy fault information and make the fault harder to locate.
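The Number*Size comparison described in step 1 can be sketched offline against two saved copies of the command output. This is a minimal illustration that assumes the nine-column layout shown in the example. The function names are hypothetical, and the second snapshot contains made-up values for demonstration.

```python
def parse_pools(output: str) -> dict:
    """Map (name, size) to Number * Size for each memory block row."""
    usage = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have 9 columns and the first 8 are integers. This skips
        # the header row and the "---- More ----" pager line.
        if len(fields) != 9 or not all(f.isdigit() for f in fields[:8]):
            continue
        number, size, name = int(fields[1]), int(fields[2]), fields[8]
        # Some names appear with more than one size, so key on both.
        usage[(name, size)] = number * size
    return usage


def growth(earlier: str, later: str) -> list:
    """Return blocks whose Number * Size increased, largest increase first."""
    before, after = parse_pools(earlier), parse_pools(later)
    deltas = [(key, after[key] - before.get(key, 0)) for key in after]
    grown = [(key, delta) for key, delta in deltas if delta > 0]
    return sorted(grown, key=lambda item: item[1], reverse=True)


HEADER = "Active Number Size Align Slab Pg/Slab ASlabs NSlabs Name\n"
# Rows copied from the example output in this guide.
SNAPSHOT_1 = HEADER + (
    "9126 9248 64 8 32 1 289 289 kmalloc-64\n"
    "630 651 4096 8 7 8 93 93 kmalloc-4096\n"
)
# Hypothetical later snapshot: only kmalloc-4096 has grown.
SNAPSHOT_2 = HEADER + (
    "9126 9248 64 8 32 1 289 289 kmalloc-64\n"
    "700 930 4096 8 7 8 133 133 kmalloc-4096\n"
    "---- More ----\n"
)

if __name__ == "__main__":
    for (name, size), delta in growth(SNAPSHOT_1, SNAPSHOT_2):
        print(f"{name} (size {size}): Number*Size grew by {delta}")
```

A single pair of snapshots only shows that a block grew. As the guidelines note, the growth rate over a longer period is what distinguishes a leak from normal usage.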
Related alarm and log messages
Alarm messages
N/A
Log messages
· MEM_ALERT
· MEM_EXCEED_THRESHOLD
· MEM_BELOW_THRESHOLD
High CPU usage
Symptom
Use the display cpu-usage command to monitor CPU usage continuously. If the CPU usage remains above 80%, a task is likely consuming CPU resources for an extended period. Identify the specific cause of the high CPU usage.
<Sysname> display cpu-usage
Slot 1 CPU 0 CPU usage:
80% in last 5 seconds
80% in last 1 minute
80% in last 5 minutes
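The check described in the symptom can be sketched as a parser for saved display cpu-usage output. This is a minimal illustration for output that covers a single CPU, as in the example above. The function names are hypothetical, and treating a value at or above the threshold as high is an assumption made so that the sample output qualifies.

```python
import re

# Matches lines such as "80% in last 5 seconds" or "80% in last 1 minute".
USAGE_LINE = re.compile(r"(\d+)% in last (\d+) (seconds?|minutes?)")

SAMPLE = """\
Slot 1 CPU 0 CPU usage:
    80% in last 5 seconds
    80% in last 1 minute
    80% in last 5 minutes
"""


def cpu_usage(output: str) -> dict:
    """Map each reporting interval (for example '5 seconds') to its percentage."""
    return {f"{n} {unit}": int(pct) for pct, n, unit in USAGE_LINE.findall(output)}


def sustained_high(output: str, threshold: int = 80) -> bool:
    """True if every reported interval is at or above the threshold."""
    usage = cpu_usage(output)
    return bool(usage) and all(pct >= threshold for pct in usage.values())


if __name__ == "__main__":
    print(cpu_usage(SAMPLE))
    print(sustained_high(SAMPLE))
```

Requiring all three intervals to be high filters out short spikes, which show up in the 5-second value but not in the 5-minute value.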
Common causes
The following are the common causes of this type of issue:
· Route flapping
· Packet attack
· Link loop
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 6.
Figure 6 Troubleshooting flowchart
Solution
1. Check for route flapping.
Frequent changes to routing table entries might cause high CPU usage. If route flapping occurs, collect information and contact H3C Support.
Display the routing table for the first time.
[Sysname] display ip routing-table
Destinations : 9 Routes : 9
Destination/Mask Proto Pre Cost NextHop Interface
0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
10.1.1.0/24 OSPF 150 1 11.2.1.1 Vlan100
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0
127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
224.0.0.0/4 Direct 0 0 0.0.0.0 NULL0
224.0.0.0/24 Direct 0 0 0.0.0.0 NULL0
255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
Display the routing table again and compare the two outputs.
[Sysname] display ip routing-table
Destinations : 8 Routes : 8
Destination/Mask Proto Pre Cost NextHop Interface
0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0
127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
224.0.0.0/4 Direct 0 0 0.0.0.0 NULL0
224.0.0.0/24 Direct 0 0 0.0.0.0 NULL0
255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
2. Check for packet attacks.
Identify the attack source by capturing packets at the device port. Use a packet capture tool such as Sniffer, Wireshark, or WinNetCap to analyze packet characteristics and identify the attack source. Then configure attack protection against that source. For more information about attack prevention, see "Attack Detection and Prevention" in the "Security Configuration Guide."
3. Check for link loops.
A loop in the link might cause a broadcast storm and network flapping. A large number of protocol packets sent to the CPU increases CPU usage. Many device ports might carry heavy traffic, with port utilization exceeding 90%.
<Sysname> display interface gigabitethernet2/0/1
GigabitEthernet2/0/1
Current state: UP
Line protocol state: UP
Description: GigabitEthernet2/0/1 Interface
Bandwidth: 1000000 kbps
Maximum transmission unit: 1500
Internet address: 2.1.1.2/24 (primary)
IP packet frame type: Ethernet II, hardware address: 0000-fc00-9276
IPv6 packet frame type: Ethernet II, hardware address: 0000-fc00-9276
Loopback is not set
Media type is twisted pair, port hardware type is 1000_BASE_T
Port priority: 0
1000Mbps-speed mode, full-duplex mode
Link speed type is autonegotiation, link duplex type is autonegotiation
Flow-control is not enabled
Maximum frame length: 9216
Last clearing of counters: Never
Peak input rate: 8 bytes/sec, at 2016-03-19 09:20:48
Peak output rate: 1 bytes/sec, at 2016-03-19 09:16:16
Last 300 second input: 26560 packets/sec 123241940 bytes/sec 99%
Last 300 second output: 0 packets/sec 0 bytes/sec 0%
……
If a loop exists in the link:
¡ Check the link connections and make sure the port configuration is correct.
¡ For Layer 2 interfaces, enable STP and make sure the configuration is correct.
¡ For Layer 2 interfaces, identify whether the STP status of adjacent devices is normal.
¡ If the preceding configurations are correct, STP might have calculated incorrectly, or STP calculated correctly but the port driver layer did not block the port. To recover quickly, shut down the ports on the loop, or unplug and replug the cables, to trigger an STP recalculation.
4. Identify the CPU-intensive tasks.
If the above steps do not resolve the issue, use the display process cpu command to check which task is using the most CPU.
<Sysname> display process cpu slot 1
CPU utilization in 5 secs: 2.4%; 1 min: 2.5%; 5 mins: 2.4%
JID 5Sec 1Min 5Min Name
1 0.0% 0.0% 0.0% scmd
2 0.0% 0.0% 0.0% [kthreadd]
3 0.0% 0.0% 0.0% [migration/0]
4 0.0% 0.0% 0.0% [ksoftirqd/0]
5 0.0% 0.0% 0.0% [watchdog/0]
6 0.0% 0.0% 0.0% [migration/1]
7 0.0% 0.0% 0.0% [ksoftirqd/1]
8 0.0% 0.0% 0.0% [watchdog/1]
9 0.0% 0.0% 0.0% [migration/2]
10 0.0% 0.0% 0.0% [ksoftirqd/2]
11 0.0% 0.0% 0.0% [watchdog/2]
……
Each column represents the percentage of CPU usage for a task over 5 seconds, 1 minute, and 5 minutes, along with the task name. The higher the task utilization, the more CPU resources the corresponding task consumes. In normal conditions, task CPU usage is usually below 5%. Use this command to check tasks with significantly higher usage.
5. Check the call stack of the abnormal task.
Use the follow job job-id command in probe view to display the call stack of the abnormal task. Run the command more than five times and send the results to Technical Support for analysis. The call stacks help determine what processing keeps the CPU usage of the task high. This example shows the call stack for JID 145.
<Sysname> system-view
[Sysname] probe
[Sysname-probe] follow job 145 slot 1
Attaching to process 145 ([dGDB])
Iteration 1 of 5
------------------------------
Kernel stack:
[<ffffffff80355290>] schedule+0x570/0xde0
[<ffffffff80355da8>] schedule_timeout+0x98/0xe0
[<ffffffff802047e4>] ep_poll+0x4b4/0x5e0
[<ffffffffc05587a8>] DRV_Sal_EVENT_Read+0x1f8/0x290 [system]
[<ffffffffc07351e4>] drv_sysm_gdb_console+0xc4/0x2d0 [system]
[<ffffffffc1a04114>] thread_boot+0x84/0xa0 [system]
[<ffffffff8015c420>] kthread+0x130/0x140
[<ffffffff801183d0>] kernel_thread_helper+0x10/0x20
Iteration 2 of 5
------------------------------
Kernel stack:
[<ffffffff80355290>] schedule+0x570/0xde0
[<ffffffff80355da8>] schedule_timeout+0x98/0xe0
[<ffffffff802047e4>] ep_poll+0x4b4/0x5e0
[<ffffffffc05587a8>] DRV_Sal_EVENT_Read+0x1f8/0x290 [system]
[<ffffffffc07351e4>] drv_sysm_gdb_console+0xc4/0x2d0 [system]
[<ffffffffc1a04114>] thread_boot+0x84/0xa0 [system]
[<ffffffff8015c420>] kthread+0x130/0x140
[<ffffffff801183d0>] kernel_thread_helper+0x10/0x20
Iteration 3 of 5
------------------------------
Kernel stack:
[<ffffffff80355290>] schedule+0x570/0xde0
[<ffffffff80355da8>] schedule_timeout+0x98/0xe0
[<ffffffff802047e4>] ep_poll+0x4b4/0x5e0
[<ffffffffc05587a8>] DRV_Sal_EVENT_Read+0x1f8/0x290 [system]
[<ffffffffc07351e4>] drv_sysm_gdb_console+0xc4/0x2d0 [system]
[<ffffffffc1a04114>] thread_boot+0x84/0xa0 [system]
[<ffffffff8015c420>] kthread+0x130/0x140
[<ffffffff801183d0>] kernel_thread_helper+0x10/0x20
Iteration 4 of 5
------------------------------
Kernel stack:
[<ffffffff80355290>] schedule+0x570/0xde0
[<ffffffff80355da8>] schedule_timeout+0x98/0xe0
[<ffffffff802047e4>] ep_poll+0x4b4/0x5e0
[<ffffffffc05587a8>] DRV_Sal_EVENT_Read+0x1f8/0x290 [system]
[<ffffffffc07351e4>] drv_sysm_gdb_console+0xc4/0x2d0 [system]
[<ffffffffc1a04114>] thread_boot+0x84/0xa0 [system]
[<ffffffff8015c420>] kthread+0x130/0x140
[<ffffffff801183d0>] kernel_thread_helper+0x10/0x20
Iteration 5 of 5
------------------------------
Kernel stack:
[<ffffffff80355290>] schedule+0x570/0xde0
[<ffffffff80355da8>] schedule_timeout+0x98/0xe0
[<ffffffff802047e4>] ep_poll+0x4b4/0x5e0
[<ffffffffc05587a8>] DRV_Sal_EVENT_Read+0x1f8/0x290 [system]
[<ffffffffc07351e4>] drv_sysm_gdb_console+0xc4/0x2d0 [system]
[<ffffffffc1a04114>] thread_boot+0x84/0xa0 [system]
[<ffffffff8015c420>] kthread+0x130/0x140
[<ffffffff801183d0>] kernel_thread_helper+0x10/0x20
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
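Two of the checks in this procedure lend themselves to offline comparison of saved command output: the routing table comparison in step 1 and the per-task CPU check in step 4. The following is a minimal sketch, assuming the column layouts shown in the examples above and task names without spaces. The function names are hypothetical, and the CPU percentages for JID 145 are made-up values for demonstration.

```python
def route_keys(output: str) -> set:
    """Collect (destination/mask, protocol) pairs from routing table output."""
    routes = set()
    for line in output.splitlines():
        fields = line.split()
        # Route rows have 6 columns and start with an address such as 10.1.1.0/24.
        if len(fields) == 6 and "/" in fields[0] and fields[0][0].isdigit():
            routes.add((fields[0], fields[1]))
    return routes


def route_changes(first: str, second: str) -> tuple:
    """Return (withdrawn, added) routes between two routing table snapshots."""
    a, b = route_keys(first), route_keys(second)
    return sorted(a - b), sorted(b - a)


def busy_tasks(output: str, threshold: float = 5.0) -> list:
    """Return (JID, name, 5-minute %) for tasks above the threshold, busiest first."""
    tasks = []
    for line in output.splitlines():
        fields = line.split()
        # Task rows: JID, 5Sec, 1Min, 5Min, Name.
        if len(fields) == 5 and fields[0].isdigit() and fields[3].endswith("%"):
            five_min = float(fields[3].rstrip("%"))
            if five_min > threshold:
                tasks.append((int(fields[0]), fields[4], five_min))
    return sorted(tasks, key=lambda task: task[2], reverse=True)


# Rows taken from the two routing table examples in this guide.
FIRST = """\
Destination/Mask Proto Pre Cost NextHop Interface
10.1.1.0/24 OSPF 150 1 11.2.1.1 Vlan100
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
"""
SECOND = """\
Destination/Mask Proto Pre Cost NextHop Interface
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
"""
# Hypothetical task list: the values for JID 145 are illustrative only.
TASKS = """\
JID 5Sec 1Min 5Min Name
1 0.0% 0.0% 0.0% scmd
145 62.5% 60.1% 58.9% [dGDB]
"""

if __name__ == "__main__":
    withdrawn, added = route_changes(FIRST, SECOND)
    print("withdrawn:", withdrawn, "added:", added)
    print("busy tasks:", busy_tasks(TASKS))
```

The 5% default follows the statement in step 4 that task CPU usage is usually below 5% in normal conditions. A route that appears in the withdrawn list in one comparison and in the added list in the next is a sign of flapping.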
Related alarm and log messages
Alarm messages
N/A
Log messages
· CPU_STATE_NORMAL
· CPU_MINOR_RECOVERY
· CPU_MINOR_THRESHOLD
· CPU_SEVERE_RECOVERY
· CPU_SEVERE_THRESHOLD
Power supply issues
Power supply is abnormal
Symptom
The power supply unit status LED is abnormal, or the power supply reports a fault during operation.
Common causes
The following are the common causes of this type of issue:
· The power supply unit model does not match the host.
· The power supply unit is not installed properly.
· The power cord is not securely plugged in.
· The power supply unit temperature is too high.
· Power supply unit failure.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 7.
Figure 7 Troubleshooting flowchart
Solution
1. Check whether the power supply unit model matches the host model.
2. Check the power supply system of the device. Verify that the power supply system operates correctly and that the voltage is normal.
3. Use the LEDs on the power supply unit for an initial assessment of whether an output short circuit, output overcurrent, output overvoltage, input undervoltage, or overheating condition exists. The power LED states vary by host. For more information, see the hardware manual for the specific host.
4. Check the power supply unit status.
5. Use the display power command to display the power supply unit status. Check for any power supply units in Fault, Error, or Absent state.
<Sysname> display power
Power 0 State: Normal
Power 1 State: Absent
Power 2 State: Absent
Power 3 State: Absent
You can also use the display alarm command to view the alarm messages from the power supply unit.
<Sysname> display alarm
Slot CPU Level Info
- - INFO Power 1 is absent.
- - INFO Power 2 is absent.
- - INFO Power 3 is absent.
6. If the power supply unit status is Absent, follow these sub-steps for troubleshooting.
a. Remove the power supply unit and reinstall it. Before reinstalling, check the power connector for damage.
b. If the power supply unit does not return to Normal state after reinstallation, swap it with a functioning power supply unit from a different slot for cross-verification.
c. If the power supply unit is still in Absent state, replace it with a new power supply unit.
d. If the issue persists after you replace the power supply unit, go to step 8.
7. If the power supply unit status is Fault or Error, follow these sub-steps for troubleshooting.
a. Identify whether the power cord is securely connected.
b. If the power cord is securely connected, check the power cord for faults.
c. If the power cord is normal, high temperature might be causing the power supply unit to malfunction. Check the power supply unit for dust accumulation. If there is excessive dust, clean the power supply unit, and then remove and reinstall it.
d. If the power supply unit does not return to Normal state after reinstallation, swap it with a functioning power supply unit for cross-verification.
e. If the power supply unit is still in Fault state, replace the power supply unit.
f. If the issue persists after you replace the power supply unit, go to step 8.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
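The state check in this procedure can be sketched as a parser for saved display power output that maps each state to the branch of the procedure that applies. This is a minimal illustration, assuming the "Power N State: X" line format shown in the example. The function names and the wording of the returned hints are hypothetical.

```python
import re

# Matches lines such as "Power 0 State: Normal".
POWER_LINE = re.compile(r"Power\s+(\d+)\s+State:\s*(\w+)")

# Example output copied from this guide.
SAMPLE = """\
Power 0 State: Normal
Power 1 State: Absent
Power 2 State: Absent
Power 3 State: Absent
"""


def power_states(output: str) -> dict:
    """Map each power supply unit index to its reported state."""
    return {int(index): state for index, state in POWER_LINE.findall(output)}


def next_action(state: str) -> str:
    """Summarize which branch of the procedure applies to a state."""
    if state == "Normal":
        return "no action"
    if state == "Absent":
        return "reseat, cross-verify in another slot, then replace"
    if state in ("Fault", "Error"):
        return "check power cord, clean dust, reseat, cross-verify, then replace"
    return "unrecognized state, contact Technical Support"


if __name__ == "__main__":
    for index, state in sorted(power_states(SAMPLE).items()):
        print(f"Power {index}: {state} -> {next_action(state)}")
```

An Absent state is expected for slots that are intentionally empty, so compare the result with the number of power supply units actually installed before acting on it.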
Related alarm and log messages
Alarm messages
N/A
Log messages
· DEV/2/POWER_FAILED
· DEV/3/POWER_ABSENT
Fan issues
The fan module status is abnormal
Symptom
The fan module status LED is abnormal, or the fan frame reports a fault during operation.
Common causes
The following are the common causes of this type of issue:
· The fan is not securely plugged in.
· The chassis air intake vent and exhaust vent are blocked by foreign objects.
· Fan hardware failure.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 8.
Figure 8 Troubleshooting flowchart
Solution
1. Check whether the fan module LEDs indicate normal operation. The LED states vary by host. For more information, see the hardware manual for the specific host. If all LEDs are off, verify that the power supply unit is functioning properly and that the system power switch and wiring are not open. For more information about power supply unit anomalies, see "Power supply is abnormal."
2. Check the fan status.
Use the display fan command to check the fan frame status.
<Sysname> display fan
Fan Frame 0 State: Normal
Use the display alarm command to view fan frame alarm messages.
<Sysname> display alarm
Chassis Slot CPU Level Info
2 - - INFO fan 1 is absent.
3. Check that the fan frame is securely installed.
4. If the fan frame state is Absent, the fan frame is not in place or is not securely installed. If the fan frame is in place, remove and reinstall it. Before reinstalling, check that the fan connector is intact. Then, verify that the fan frame state is Normal. If the state is still Absent, replace the fan frame. If the new fan frame is also in Absent state, go to step 7.
5. Check the device's operating environment information.
6. If the fan frame state is Fault, the fan frame is malfunctioning and cannot dissipate heat. Use the following sub-steps for further identification.
a. Use the display environment command to identify whether the system temperature continues to rise. If it does, place your hand near the air outlet of the device to check for airflow. If the temperature continues to rise and there is no airflow from the outlet, the fan frame is abnormal.
b. Identify whether the chassis air intake and exhaust vents are blocked by foreign objects. Clear any foreign objects.
c. Identify whether the speed of each fan is normal.
Use the display fan command in any view to identify whether the speed of any fan differs from the normal speed by more than 50%. If it does, confirm the anomaly by removing and reinstalling the fan or by swapping it with another fan for cross-verification.
d. If you confirm a fan issue, remove and then reinstall the fan module. Before reinstalling, check the fan connector for damage. Use the display fan command to identify whether the fan returns to Normal state.
e. If the fan frame does not return to Normal state, replace the fan frame. If no spare fan frame is available on site and immediate replacement is not possible, power off the device to prevent overheating and circuit damage. If cooling measures keep the system temperature below 50 degrees Celsius, you can continue using the device.
f. If the state does not return to Normal after you replace the fan frame, go to step 7.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
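The 50% fan speed criterion and the 50 degrees Celsius limit described above can be expressed as two simple checks. The following Python sketch is illustrative only: the function names are hypothetical, and the nominal fan speed is an assumption that must be taken from the hardware manual for the fan frame.

```python
def fan_speed_abnormal(measured_rpm: float, nominal_rpm: float,
                       max_deviation: float = 0.5) -> bool:
    """Return True if the measured speed differs from nominal by more than 50%."""
    if nominal_rpm <= 0:
        raise ValueError("nominal_rpm must be positive")
    deviation = abs(measured_rpm - nominal_rpm) / nominal_rpm
    return deviation > max_deviation


def can_keep_running(system_temp_c: float, limit_c: float = 50.0) -> bool:
    """Return True if the system temperature is below the 50 degrees Celsius limit."""
    return system_temp_c < limit_c
```

A fan at exactly 50% deviation is not flagged, because the guide states "more than 50%".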
Related alarm and log messages
Alarm messages
N/A
Log messages
· DEV/2/FAN_FAILED
· DEV/3/FAN_ABSENT
Card issues
Abnormal card state
Symptom
· The card status is abnormal. For example, the output from the display device command shows the card status as Absent or Fault.
· The card experiences abnormal reboots, fails to start, or keeps rebooting.
Common causes
The following are the common causes of this type of issue:
· The card is not installed properly.
· The card is damaged.
· The LEDs on the card faceplate indicate an abnormal state.
· Power supply unit failure.
· The power supply unit output power is insufficient.
· The host software version does not support using this card.
· The MPU is in an abnormal operating state.
· The device identifier of the service module, standby MPU, or switching fabric module does not match the active MPU.
· The switching fabric module is not in place or is in an abnormal state before the service module starts.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 9.
Figure 9 Troubleshooting flowchart
Solution
Card status: Absent
1. Verify that the card is securely inserted. Check for any gap between the card and the chassis. You can also remove and reinsert the card. Before reinserting it, check the card connector for deformation or dirt.
2. Move the card to another slot, and install a known-good card from the chassis in this slot, to determine whether the fault lies with the card.
3. Check whether the LEDs on the card faceplate are lit.
4. Verify that the power supply units provide sufficient output power. For example, add a power supply unit and check whether the card status returns to normal.
5. Verify that the device software version supports this card.
a. Execute the display version command to view the software version of the device.
b. Contact Technical Support to confirm whether the current software version supports this card.
c. If the current software version does not support this card, upgrade to a version that does. Before upgrading, confirm that the new version is compatible with the other cards.
6. If the card is an MPU, connect the configuration cable to the console port. Use a pointed tool (such as a pen tip) to press the RESET button on the card, or reboot the card by using the reboot slot slotid force command. Check the boot information on the configuration terminal to see whether startup is normal (no output or garbled characters indicate a problem). Also verify that the status LED on the card returns to normal. During a normal startup, the terminal displays information similar to the following:
System is starting...
Press Ctrl+D to access BASIC-BOOTWARE MENU
Press Ctrl+T to start memory test
Booting Normal Extend BootWare........
****************************************************************************
* *
* H3C SR66 BootWare, Version 7.1.064 *
* *
****************************************************************************
Copyright (c) 2004-2017 New H3C Technologies Co., Ltd.
Compiled Date : Apr 6 2017
CPU Type : XLS408
CPU L1 Cache : 32KB
CPU Clock Speed : 1000MHz
Memory Type : DDR2 SDRAM
Memory Size : 2048MB
Memory Speed : 533MHz
BootWare Size : 1024KB
Flash Size : 4MB
cfa0 Size : 244MB
BASIC CPLD Version : 131.0
EXTEND CPLD Version : 133.0
PCB Version : Ver.B
BootWare Validating...
Press Ctrl+B to enter extended boot menu...
7. If the card is a service module, first ensure the MPU is in a normal operating state and check that the daughtercard connector is not deformed or dirty.
8. If you confirm a card failure, replace the card, collect the information below, and contact technical support personnel.
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Card status: Power-off
1. Check whether the device has overheated. Use the display power-supply verbose command to check for records of excessive ambient temperature and for cards that have been powered off. For example, if the State field for a card is Off, the card has been powered off, either by user action or by over-temperature protection.
<Sysname> display power-supply verbose
Power No. State Description
------------------------------------------------
1 Normal VAPEL-1200AC
2 Absent Unknown
Power supply information for chassis 0
------------------------------------------------
Total system power : 1200 watts
Redundant system power : 0 watts
Used system power : 0 watts
Available system power : 990 watts
Reserved system power : 210 watts
Slot Card type Used power(W) State
------------------------------------------------
0 RT-RSE-X3 50 On
1 N/A 0 Off
2 N/A 0 Off
3 N/A 0 Off
4 N/A 0 Off
5 N/A 0 Off
2. If you confirm that the card was powered off because of overheating, check whether all card slots are filled. If they are, use the display fan command to verify that the fans are operating. A fan state of Normal indicates correct operation. If the fan state is not Normal, or if you suspect a power issue with the card, collect the following information and contact Technical Support.
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
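When you review the display power-supply verbose output, it can help to extract the power budget fields and compare the available power against the power that a card needs. The following Python sketch assumes the field layout shown in the sample output above. The function names and the card power values in the usage note are hypothetical, and the actual output format can differ between software versions.

```python
import re

# Sample lines copied from the display power-supply verbose output above.
SAMPLE = """\
Total system power : 1200 watts
Redundant system power : 0 watts
Used system power : 0 watts
Available system power : 990 watts
Reserved system power : 210 watts
"""


def parse_power_budget(output: str) -> dict:
    """Map each '<name> system power' label (lowercased) to its value in watts."""
    pattern = re.compile(r"^\s*(\w+) system power\s*:\s*(\d+)\s*watts", re.M)
    return {name.lower(): int(watts) for name, watts in pattern.findall(output)}


def has_headroom(budget: dict, card_watts: int) -> bool:
    """Return True if the available power covers the card's requirement."""
    return budget["available"] >= card_watts


budget = parse_power_budget(SAMPLE)
```

For example, `has_headroom(budget, 50)` checks whether a card that needs 50 W fits within the 990 W reported as available.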
Card status: Fault
1. Check the overall system power consumption. If the available power is insufficient, the card enters the Fault state.
2. Wait about 10 minutes and check whether the card stays in the Fault state or enters the Normal state and then restarts again. If the card restarts automatically after entering the Normal state, collect the following information and contact Technical Support.
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
3. If the card is an MPU, connect the console cable and check the configuration terminal for normal or abnormal startup messages. If the MPU fails the memory read/write test during startup and keeps rebooting, as shown in the following output, check whether the memory module is securely seated.
readed value is 55555555 , expected value is aaaaaaaa
DRAM test fails at: 080ffff8
DRAM test fails at: 080ffff8
Fatal error! Please reboot the card.
4. Place the card in a different slot to further confirm whether the slot is faulty.
5. If you confirm a card failure, replace the card, collect the following information, and contact Technical Support.
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Abnormal card restart
This section applies when a card has restarted and its current status is Normal.
1. Analyze the logs or the uptime to identify when the restart occurred. Check whether a user executed the reboot command or power-cycled the card around that time.
2. Use the display version command to check the reason for the last reboot of the card. For example, "Last reboot reason : Cold reboot" in the following output indicates that the card last rebooted because the device was powered on.
<Sysname> display version
H3C Comware Software, Version 7.1.075, Release 7751P01
Copyright (c) 2004-2017 New H3C Technologies Co. Ltd. All rights reserved.
H3C SR6600-X uptime is 0 weeks, 0 days, 4 hours, 24 minutes
Last reboot reason : Cold reboot……
3. If all cards restart simultaneously, check whether the device's power supply units are operating correctly. Check for power outages and verify that the power input is securely connected.
4. Check the logs for warning messages similar to "Warning: Standby board on slot 1 is not compatible with master board." or "Warning: The LPU board on slot 1 is not compatible with MPU board." during the reboot. Such a message indicates that the device identifier of the service module, standby MPU, or switching fabric module does not match that of the active MPU. In this case, contact Technical Support for a replacement.
5. If you cannot determine the cause, collect the following information and contact Technical Support.
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
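The two text checks in this procedure (reading the last reboot reason and looking for incompatibility warnings) can be scripted against saved command output. The following Python sketch is a minimal illustration. The function names are hypothetical, and the patterns assume the message wording quoted in the steps above.

```python
import re


def last_reboot_reason(version_output: str):
    """Return the text after 'Last reboot reason :' or None if the field is absent."""
    match = re.search(r"^\s*Last reboot reason\s*:\s*(.+?)\s*$",
                      version_output, re.M)
    return match.group(1) if match else None


def incompatible_slots(log_text: str) -> list:
    """Return the sorted slot numbers named in 'not compatible' warnings."""
    pattern = re.compile(r"board on slot (\d+) is not compatible with", re.I)
    return sorted({int(slot) for slot in pattern.findall(log_text)})
```

An empty list from `incompatible_slots` means no warning of this form was found in the supplied text. It does not rule out other causes of the restart.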
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
The MPU cannot start
Symptom
The MPU fails to start.
Common causes
The following are the common causes of this type of issue:
· An MPU hardware failure prevents the MPU from powering on.
· The MPU BootWare basic segment is damaged.
· Memory or CPU hardware failure prevents BootWare from running.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 10.
Figure 10 Troubleshooting flowchart
Solution
1. Check whether the RUN LED on the MPU is on.
After BootWare initializes, it immediately sets the RUN LED to fast flashing. This is an important indicator of whether the system can boot.
The LEDs on different MPUs might vary slightly. For specific details, see the hardware description of the corresponding product.
2. If the RUN LED flashes quickly, the BootWare basic segment has started correctly. Proceed to step 4.
3. If the RUN LED is off, the device might not be powered on, or the BootWare basic segment might be corrupted.
a. First, check whether the MPU is powered on. Look into the MPU through the air intake vent at the front and check for any flashing or steady green LEDs inside the MPU. Alternatively, after the device has been on for some time, remove the MPU and feel whether the CPU heat sink fins are warm.
b. If the MPU has no power, check the power supply and power module. A hardware fault in the device can also prevent the MPU from powering on.
c. If the MPU powers on normally, the BootWare basic segment might be corrupted. Return the MPU to H3C for analysis and repair.
NOTE: "The RUN LED is off" means that the LED has never lit since power-up. It does not apply if the LED flashes for more than 5 seconds and then goes out. The RUN LED should not be steady on or flash slowly (at 1 Hz) immediately after power-on. Either behavior indicates a hardware failure.
4. Check whether BootWare runs successfully.
¡ Check the console output for the following information. If it is present, the basic segment has run successfully. Proceed to step 5.
System is starting...
Press Ctrl+D to access BASIC-BOOTWARE MENU
Press Ctrl+T to start memory test
Booting Normal Extend BootWare........
****************************************************************************
* *
* H3C SR66 BootWare, Version 7.1.064 *
* *
****************************************************************************
Copyright (c) 2004-2017 New H3C Technologies Co., Ltd.
Compiled Date : Apr 6 2017
CPU Type : XLS408
CPU L1 Cache : 32KB
CPU Clock Speed : 1000MHz
Memory Type : DDR2 SDRAM
Memory Size : 2048MB
Memory Speed : 533MHz
BootWare Size : 1024KB
Flash Size : 4MB
cfa0 Size : 244MB
BASIC CPLD Version : 131.0
EXTEND CPLD Version : 133.0
PCB Version : Ver.B
BootWare Validating...
¡ If there is no output, the memory or CPU may have issues. Proceed to step 5.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
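The decision logic in steps 1 through 4 can be summarized in a few lines. The following Python sketch is an illustration under stated assumptions: the function names and return strings are hypothetical, and the two marker lines used to recognize a successful basic segment start are an assumed subset of the sample console output above.

```python
# Marker lines taken from the sample BootWare output (assumed to be sufficient).
BASIC_MARKERS = ("System is starting...", "Booting Normal Extend BootWare")


def basic_segment_ran(console_output: str) -> bool:
    """Return True if every marker line appears in the captured console output."""
    return all(marker in console_output for marker in BASIC_MARKERS)


def next_action(run_led_flashing_fast: bool, console_output: str) -> str:
    """Suggest the next troubleshooting action for an MPU that fails to start."""
    if not run_led_flashing_fast:
        return "check power; if powered, suspect the BootWare basic segment"
    if basic_segment_ran(console_output):
        return "basic segment OK; collect information for Technical Support"
    return "no BootWare output; suspect memory or CPU"
```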
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
The new MPU cannot start
Symptom
The device originally had one MPU. A second MPU was installed as the standby MPU, but the new MPU fails to start.
Common causes
The following are the common causes of this type of issue:
· The standby MPU and the original MPU have different models.
· The software versions of the standby MPU and the original MPU do not match.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 11.
Figure 11 Troubleshooting flowchart
Solution
1. Check whether the new MPU is the same model as the original MPU.
The two MPUs in the same device must be the same model. If the models differ, replace one MPU so that both are the same model.
2. Check whether the software version of the new MPU matches that of the original MPU.
Connect to the console port of the standby MPU and check whether the system software version loaded during startup matches that of the active MPU. If the versions differ, upgrade the software on the standby MPU from the BootWare menu.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
The MPU restarts during use and fails to boot properly
Symptom
The MPU restarts during use and fails to boot normally.
Common causes
The following are the common causes of this type of issue:
· The startup file is corrupted.
· The MPU memory unit is damaged.
· The board is not fully inserted or is damaged, causing BootWare to run abnormally.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 12.
Figure 12 Troubleshooting flowchart
Solution
1. Check whether the startup files on the MPU are intact.
Log in to the faulty MPU through the console port and restart the device. If BootWare reports a CRC error or cannot find the boot file, reload the boot file. Also verify that the size of the file in flash matches the size of the file on the server. If the file does not exist in flash or the sizes differ, reload the boot file. After loading, make sure the file is set as the current startup file. BootWare sets the loaded file as the current startup file automatically during the loading process.
2. Test whether the MPU memory works correctly.
Verify that the size of the loaded file is correct and that the file is set as the current startup file. Then restart the board and immediately hold down Ctrl+T to run the memory test. If the test reports a memory error, replace the board.
3. Check whether BootWare still reports an error.
If the memory test passes but error messages still appear during BootWare startup, use the messages to make an initial identification of the faulty component. Check that the board is securely inserted. If the board is securely inserted and the errors persist, replace the board.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Active/standby switchover failure
Symptom
This type of failure commonly occurs in the following situations:
· When you use the reboot command to restart the active MPU, the standby MPU also restarts.
· An active/standby switchover does not complete correctly.
Common causes
The following are the common causes of this type of issue:
· The active MPU restarts before the standby MPU has finished starting up, and the standby MPU passively becomes the active MPU.
· The standby MPU did not receive messages from the active MPU and switched to the active role.
· The MPU itself has an anomaly that causes a reboot.
· The software versions of the active MPU and the standby MPU are inconsistent.
Troubleshooting flow
Figure 13 shows the troubleshooting flowchart for the case in which the standby MPU also restarts when you use the reboot command to restart the active MPU.
Figure 13 Troubleshooting flowchart
Solution
If the standby MPU also restarts when you use the reboot command to reboot the active MPU, perform the following steps:
1. After the active MPU starts successfully, use the ftp or tftp command to upload the latest log file from the logfile directory on the storage medium to a file server.
2. In the log file, locate the reboot command record (similar to "Command is reboot slot 0") and the startup record that precedes it (similar to "SYSLOG_RESTART:"). Check whether a message similar to "Batch backup of standby board in slot 1 has finished" appears between those two records.
a. If the message does not appear, the standby MPU had not finished starting up when the active MPU was rebooted, and it passively became the active MPU. In this case, the standby MPU restart is expected behavior and requires no action. Before the next reboot, make sure the standby MPU has completed the batch backup by looking for a log message similar to "Batch backup of standby board in slot 1 has finished." Then, use the reboot slot command to reboot the active MPU.
b. If the message appears, contact H3C Technical Support.
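The log check in step 2 can be sketched in Python as follows. This is a minimal illustration that assumes the log file has been read into a list of lines and that the message wording matches the examples quoted in the step. The function name is hypothetical.

```python
def backup_finished_before_reboot(log_lines):
    """Check for a batch-backup-finished message between the startup record
    that precedes the last reboot command and that reboot command.

    Returns True or False, or None if no reboot command record is found.
    """
    reboot_idx = None
    for i, line in enumerate(log_lines):
        if "Command is reboot" in line:
            reboot_idx = i  # keep the last occurrence
    if reboot_idx is None:
        return None

    # Walk backward to the most recent startup record before the reboot.
    start_idx = 0
    for i in range(reboot_idx - 1, -1, -1):
        if "SYSLOG_RESTART" in log_lines[i]:
            start_idx = i
            break

    window = log_lines[start_idx:reboot_idx]
    return any("Batch backup of standby board" in line and "has finished" in line
               for line in window)
```

A result of False corresponds to case a (the standby MPU had not finished starting up), and True corresponds to case b.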
If an active/standby switchover does not complete correctly, perform the following steps:
3. Use the display system stable state command to collect the status of the active and standby MPUs.
<H3C> display system stable state
System state : Stable
Redundancy state : Stable
Slot CPU Role State
0 0 Active Stable
1 0 Standby Stable
Check the displayed information.
a. Check the role (Active or Standby) of each MPU.
b. Check whether the active and standby MPUs are both in Stable state.
4. Use the display boot-loader command to collect software version information for the active and standby MPUs. Check whether the versions on the two MPUs match.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
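The role and state checks in step 3 can be automated against captured output. The following Python sketch assumes the column layout shown in the sample display system stable state output above. The function names are hypothetical, and only the Active and Standby role names from the sample are recognized.

```python
import re

# Sample copied from the display system stable state output above.
SAMPLE = """\
System state : Stable
Redundancy state : Stable
Slot CPU Role State
0 0 Active Stable
1 0 Standby Stable
"""


def parse_stable_state(output: str) -> list:
    """Return one dict per MPU row: slot, cpu, role, and state."""
    rows = re.findall(r"^\s*(\d+)\s+(\d+)\s+(Active|Standby)\s+(\S+)\s*$",
                      output, re.M)
    return [{"slot": int(s), "cpu": int(c), "role": r, "state": st}
            for s, c, r, st in rows]


def redundancy_stable(rows: list) -> bool:
    """True if there is exactly one Active and one Standby MPU, both Stable."""
    roles = sorted(row["role"] for row in rows)
    return roles == ["Active", "Standby"] and all(
        row["state"] == "Stable" for row in rows)


rows = parse_stable_state(SAMPLE)
```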
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Service card cannot start up
Symptom
A service card cannot start up.
Common causes
The following are the common causes of this type of issue:
· Abnormal operation of the switching fabric module.
· Power supply anomaly.
· The software version does not support this service card.
· The service card is not properly installed.
· Hardware failure of the service card.
· Hardware failure of the chassis slot.
Solution
1. Check whether the switching fabric module is operating correctly.
Verify that the switching fabric module is installed and that its status is Normal. If the status is abnormal, troubleshoot the switching fabric module first.
2. Check whether the service module is powered on.
Observe the RUN LED on the service module. If the LED is off, the service module might not be powered on. Perform steps 3 and 4 to troubleshoot the power supply. If the service module is powered on, proceed to step 5.
3. Check the power module indicators to determine whether the power module is functioning properly.
If the indicators are abnormal, see the power module state section for troubleshooting.
4. Calculate the total power consumption and check whether the remaining power capacity is sufficient.
If the power is insufficient, add additional power modules.
5. Check whether the software version supports the service module.
Execute the display version command in any view to check the device’s software version. Then verify whether the current software version supports the service module. If not, upgrade to a compatible version. Before upgrading, ensure that the new version is compatible with other boards.
6. Reseat the service module.
Remove the service module, inspect the connectors for any damage, and reinsert it securely to ensure proper installation.
7. Test the service module in another slot to see if it can boot.
¡ If it still fails to boot in another slot, the service module may be faulty. Replace it with a new one for testing.
¡ If it boots successfully in another slot, install another working service module into the original faulty slot. If it fails to boot, the chassis slot itself may be faulty.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
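The slot-swap test in step 7 is a small decision tree. The following Python sketch shows one way to record its outcome. The function name, parameters, and result strings are hypothetical and only restate the two cases described in the step.

```python
def diagnose_slot_swap(module_boots_in_other_slot: bool,
                       good_module_boots_in_original_slot=None) -> str:
    """Interpret the results of the slot-swap test.

    The second argument is None until a known-good module has been
    tested in the original slot.
    """
    if not module_boots_in_other_slot:
        return "service module suspected faulty"
    if good_module_boots_in_original_slot is None:
        return "test a working module in the original slot"
    if not good_module_boots_in_original_slot:
        return "chassis slot suspected faulty"
    return "inconclusive; collect information and contact Technical Support"
```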
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Service card restarts during operation and cannot start up
Symptom
A service card restarts during operation and cannot start up.
Common causes
The following are the common causes of this type of issue:
· Abnormal power supply.
· Abnormal startup file on the MPU.
· Service module hardware failure.
· Chassis slot hardware failure.
Solution
1. Check whether the power module is functioning properly.
Verify the status of the power module indicators and ensure the power capacity meets the operational requirements of the board. If any power module is malfunctioning, see the power module fault section for troubleshooting.
2. Check whether the boot files on the main control board are intact.
Execute the display boot-loader command in any view to check the next-startup software package for the board. In the user view, run the dir command to confirm whether the boot software package exists. If it is missing or corrupted, obtain the correct boot package again or configure another software package as the next-startup file for the board.
<Sysname> display boot-loader
Software images on slot 0:
Current software images:
cfa0:/SR6600X-CMW710-BOOT-F8149L19-RSE3.bin
cfa0:/SR6600X-CMW710-SYSTEM-F8149L19-RSE3.bin
Main startup software images:
cfa0:/SR6600X-CMW710-BOOT-F8149L19-RSE3.bin
cfa0:/SR6600X-CMW710-SYSTEM-F8149L19-RSE3.bin
Backup startup software images:
None
<Sysname> dir
Directory of cfa0: (VFAT)
0 -rw- 4944 Mar 06 2024 08:35:06 20210430.cfg
1 -rw- 94704 Mar 06 2024 08:35:06 20210430.mdb
2 -rw- 43518976 Mar 06 2024 08:17:58 SR6600X-CMW710-BOOT-F8149L19-RSE3.bin
3 -rw- 317644800 Mar 06 2024 08:26:24 SR6600X-CMW710-SYSTEM-F8149L19-RSE3.bin
4 -rw- 361170944 Mar 06 2024 08:12:42 SR6600X-RSE3.ipe
5 drw- - Apr 22 2021 23:32:50 diagfile
6 drw- - Oct 11 2022 18:49:54 dpi
7 -rw- 296 Mar 06 2024 08:35:06 ifindex.dat
8 drw- - Jan 11 2021 10:11:48 license
9 drw- - Oct 25 2022 02:09:00 logfile
10 drw- - Mar 06 2024 17:35:30 pki
11 drw- - Jan 11 2021 10:11:48 seclog
12 drw- - Mar 06 2024 07:24:46 tracefile
13 drw- - Apr 29 2021 17:39:16 versioninfo
1020068 KB total (304488 KB free)
3. Test whether a normally functioning service board can boot in the faulty slot.
If the boot files loaded on the service board are confirmed to be normal, insert another working service board into the problematic slot for testing (if conditions permit).
If the inserted working service board boots successfully, this rules out issues with the MPU or backplane. Proceed to step 4.
If the inserted working service board still fails to boot, replace the MPU.
4. Check for loading logs.
Execute the display logbuffer command in any view to check whether there are loading records for the board in the corresponding slot in the device's logbuffer.
<Sysname> display logbuffer
%Jan 12 19:13:49:513 2022 H3C DEV/4/BOARD_LOADING: -MDC=1; Board in slot 1 is loading software images.
%Jan 12 19:14:01:718 2022 H3C DEV/5/LOAD_FINISHED: -MDC=1; Board in slot 1 has finished loading software images.
If loading logs for the board in the corresponding slot exist, relocate the service board to another slot to see if it can boot normally.
If no loading logs are found for the board in the corresponding slot, proceed to step 5.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
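The loading-log check in step 4 can be scripted against saved display logbuffer output. The following Python sketch assumes the message format shown in the two sample log lines above. The function name is hypothetical.

```python
import re

# Matches the DEV/4/BOARD_LOADING and DEV/5/LOAD_FINISHED messages shown above.
LOG_RE = re.compile(r"DEV/\d/(BOARD_LOADING|LOAD_FINISHED):.*?Board in slot (\d+)")


def loading_events(logbuffer: str, slot: int) -> list:
    """Return the loading-related mnemonics logged for the given slot, in order."""
    return [name for name, s in LOG_RE.findall(logbuffer) if int(s) == slot]
```

An empty list means no loading record was found for the slot in the supplied text, which corresponds to the "no loading logs" branch of the step.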
Related alarm and log messages
Alarm messages
N/A
Log messages
· DEV/4/BOARD_LOADING
· DEV/5/LOAD_FINISHED
Port issues
The port experiences a CRC error
Symptom
The output from the display interface command shows CRC error packets on the port.
<Sysname> display interface gigabitethernet2/0/1
GigabitEthernet2/0/1
Current state: DOWN
Line protocol state: DOWN
Description: GigabitEthernet2/0/1 Interface
Bandwidth: 1000000 kbps
Maximum transmission unit: 1500
Allow jumbo frames to pass
Broadcast max-ratio: 100%
Multicast max-ratio: 100%
Unicast max-ratio: 100%
Internet address: 2.1.1.2/24 (primary)
IP packet frame type: Ethernet II, hardware address: 0000-fc00-9276
IPv6 packet frame type: Ethernet II, hardware address: 0000-fc00-9276
Loopback is not set
Media type is twisted pair, port hardware type is 1000_BASE_T
Promiscuous mode is not set
Port priority: 0
1000Mbps-speed mode, full-duplex mode
Link speed type is autonegotiation, link duplex type is autonegotiation
Flow-control is not enabled
Maximum frame length: 9216
Output queue - Urgent queuing: Size/Length/Discards 0/1024/0
Output queue - Protocol queuing: Size/Length/Discards 0/500/0
Output queue - FIFO queuing: Size/Length/Discards 0/1024/0
Last link flapping: 6 hours 39 minutes 28 seconds
Last hardware down reason: PHY line side is down
Last clearing of counters: Never
Current system time:2017-12-09 10:46:24
Last time when physical state changed to up:-
Last time when physical state changed to down:2017-12-09 10:25:30
Peak input rate: 8 bytes/sec, at 2019-03-19 09:20:48
Peak output rate: 1 bytes/sec, at 2019-03-19 09:16:16
Last 300 second input: 0 packets/sec 0 bytes/sec -%
Last 300 second output: 0 packets/sec 0 bytes/sec -%
Input (total): 2892 packets, 236676 bytes
24 unicasts, 2 broadcasts, 2866 multicasts, 0 pauses
Input (normal): 2892 packets, - bytes
24 unicasts, 2 broadcasts, 2866 multicasts, 0 pauses
Input: 0 input errors, 0 runts, 0 giants, 0 throttles
3 CRC, 0 frame, - overruns, 0 aborts
- ignored, - parity errors
Output (total): 29 packets, 1856 bytes
24 unicasts, 5 broadcasts, 0 multicasts, 0 pauses
Output (normal): 29 packets, - bytes
24 unicasts, 5 broadcasts, 0 multicasts, 0 pauses
Output: 0 output errors, - underruns, - buffer failures
0 aborts, 0 deferred, 0 collisions, 0 late collisions
0 lost carrier, - no carrier
The output shows that the port has received packets with CRC errors.
Common causes
· The cable connector is loosely seated in the port.
· The port is abnormal.
· The cable connector is damaged.
· The transceiver module or optical fiber is contaminated or poorly connected.
· Insufficient optical power.
· Intermediate link or device failure.
· Device or board hardware failure.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 14.
Figure 14 Troubleshooting flowchart
Solution
1. Perform an internal loopback test on the port.
Execute the loopback internal command on the port to enable internal loopback. Then, use the display interface command to check whether the CRC error count on the port increases. If the count increases, a device or hardware fault might exist. Contact Technical Support. If the count does not increase, the issue is not internal to the port.
2. Check for any abnormalities between the port and the cable connector.
a. Check the physical connection of the port and cable connector for any loose connections. If there is a loose connection, connect the port and cable connector properly.
b. Check the port for abnormalities, such as foreign objects, bent pins, or a deformed housing. If you find an issue, use another working port or replace the transceiver module.
c. Check the cable connector for any damage. If you notice any damage, replace the cable.
3. Check the transceiver module for any abnormalities.
a. Connect the Tx and Rx ends of the transceiver module on this port with an optical fiber. Then, use the display interface command to check whether the CRC error count on the port increases. If the count increases, the transceiver module might be faulty. If the count does not increase, the transceiver module is not the cause.
b. Use the display transceiver alarm command to check for Rx_Los or Tx_Fault alarms on the transceiver module. If any alarm is present, clean or replace the optical fiber or transceiver module.
c. Use the display transceiver diagnosis command to check whether the receive (Rx) and transmit (Tx) power of the transceiver module are within the specified minimum and maximum range. If the receive or transmit power is out of range, clean or replace the optical fiber or transceiver module.
4. Switch to another port that works correctly and test whether the issue clears.
If the errors disappear after the change and reappear when you switch back, the original port has a hardware fault. Use a different port and send the fault information to Technical Support for analysis. If the errors persist on other working ports, a fault in the transmission link is likely.
5. Check whether the transmission link is operating correctly.
Use a test instrument to test the intermediate link. Poor link quality or excessive signal attenuation can cause errors during packet transmission. Check whether the intermediate link devices (such as optical transceivers, patch panels, and transmission devices) are operating correctly. If the transmission link is faulty, replace or repair it.
6. Execute the shutdown command, and then execute the undo shutdown command to check whether the port recovers.
7. If the issue persists, it may be a device or board hardware failure. Collect information and contact technical support personnel.
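Several of the steps above depend on whether the CRC error count increases between two runs of the display interface command. The following Python sketch shows one way to compare two captured outputs. It assumes the "N CRC" field format shown in the sample output, and the function names are hypothetical.

```python
import re


def crc_count(interface_output: str):
    """Return the CRC error counter from display interface output, or None."""
    match = re.search(r"(\d+)\s+CRC\b", interface_output)
    return int(match.group(1)) if match else None


def crc_increased(before: str, after: str) -> bool:
    """Return True if the CRC counter grew between two captured outputs."""
    first, second = crc_count(before), crc_count(after)
    if first is None or second is None:
        raise ValueError("CRC counter not found in one of the outputs")
    return second > first
```

Capture the first output before you start a test (for example, the internal loopback) and the second after the test has run long enough to pass traffic.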
Related alarm and log messages
Alarm messages
N/A
Log messages
Number of CRC error packets exceeded the high threshold: Interface Name GigabitEthernet2/0/1, High threshold 1000, Number of CRC error packets 6611063, Interval 10s.
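If you collect this log message for analysis, its fields can be extracted with a regular expression. The following Python sketch assumes the exact wording of the sample message above. The function name and dictionary keys are hypothetical.

```python
import re

# Pattern built from the sample log message above.
CRC_ALARM_RE = re.compile(
    r"Interface Name (?P<ifname>[^,\s]+), High threshold (?P<threshold>\d+), "
    r"Number of CRC error packets (?P<count>\d+), Interval (?P<interval>\d+)s"
)


def parse_crc_alarm(message: str):
    """Return the interface name, threshold, error count, and interval (seconds)."""
    match = CRC_ALARM_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "ifname": fields["ifname"],
        "threshold": int(fields["threshold"]),
        "count": int(fields["count"]),
        "interval_s": int(fields["interval"]),
    }
```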
The port does not receive packets
Symptom
The port status is UP but does not receive packets or experiences packet loss.
The output from the display interface command shows that the inbound packet count on the local port is lower than the outbound packet count on the peer port.
Common causes
· The port has a CRC error.
· The configuration on the port affects packet reception.
· Device or board hardware failure.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 15.
Figure 15 Troubleshooting flowchart
Solution
1. Check whether the port has CRC errors.
If it does, see "The port experiences a CRC error" for troubleshooting.
2. Check whether the port configuration affects packet reception.
Perform the following steps:
a. Use the display interface brief command to check for anomalies in the port configuration, including the duplex mode at both ends, the port type, and the VLAN settings. If you find an issue, correct the port attribute configuration and check whether the faulty port recovers. If it does not, execute the shutdown command and then the undo shutdown command, and check whether the port recovers.
b. For a Layer 2 port with STP configured, use the display stp brief command to check whether the port is in discarding state. If STP has placed the port in discarding state, investigate further based on the STP configuration. For a port that connects to terminal equipment, configure the port as an edge port or disable STP on the port.
c. If the port is a member of an aggregation group, use the display link-aggregation summary command to check whether the port is in Selected state. A port in Unselected state cannot send or receive packets. Determine why the port is Unselected. For example, check whether the attribute configuration of the member port differs from that of the reference port, and resolve any inconsistency.
d. If ACL filtering is configured, investigate further based on the ACL settings.
e. If flow control is enabled on the interface, disable flow control and check whether the faulty port recovers.
3. Execute the shutdown command, and then execute the undo shutdown command to check whether the port recovers.
4. If the issue persists, it may indicate a device or hardware failure. Collect information and contact technical support personnel.
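The symptom check for this issue compares two counters: the packets sent by the peer port and the packets received by the local port. The following Python sketch illustrates that comparison. The function name is hypothetical, and the inputs are assumed to be counter values that you read manually from the display interface output on each end.

```python
def rx_shortfall(peer_tx_packets: int, local_rx_packets: int) -> int:
    """Return how many packets the peer sent that the local port did not count.

    Both arguments are counter values read by hand from the display interface
    output on each end (hypothetical inputs). A result of 0 means no shortfall.
    """
    if peer_tx_packets < 0 or local_rx_packets < 0:
        raise ValueError("packet counters cannot be negative")
    return max(peer_tx_packets - local_rx_packets, 0)


# Example: the peer reports 1000 packets sent, the local port counted 940.
missing = rx_shortfall(1000, 940)
print(f"Packets not received locally: {missing}")
```

Because the two counters are read at slightly different times, a small difference on a busy link does not necessarily indicate packet loss. Compare readings taken over a longer interval before drawing a conclusion.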
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
The port does not send packets
Symptom
The port status is UP, but it does not send packets.
Use the display interface command to verify that the outbound packet statistics on the local port do not increase.
Common causes
· Transceiver module malfunction.
· The configuration on the port affects packet transmission.
· Device or board hardware failure.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 16.
Figure 16 Troubleshooting flowchart
Solution
1. Perform internal loopback checks on the port.
Execute the loopback internal command in the interface view of the port to enable internal loopback. Then, use the display interface command to identify whether the outbound packet statistics increase. If they do not increase, a device or hardware failure might have occurred. Contact technical support personnel. If they increase, the issue is not internal to the port.
2. Identify whether the port configuration affects packet transmission.
Check the port configuration as follows:
a. For Layer 2 ports with STP configured, use the display stp brief command to identify whether the port is in the discarding state. If STP has placed the port in the discarding state, investigate further based on the relevant STP configuration. If the port connects to terminal equipment, configure it as an edge port or disable STP on it.
b. If the port belongs to an aggregation group, use the display link-aggregation summary command to identify whether the port is in the Selected state. A port in the Unselected state cannot send or receive packets. Identify why the port is Unselected: check whether the attribute configuration of the member port is inconsistent with that of the reference port, and resolve any inconsistency.
c. If ACL filtering is configured, investigate further based on the relevant ACL settings.
d. If flow control is enabled on the interface, disable it and identify whether the faulty port recovers.
3. Execute the shutdown command and then the undo shutdown command to identify whether the port recovers.
4. If the issue persists, it may be a device or hardware failure. Gather information and contact technical support personnel.
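Step 1 of this solution depends on whether the outbound packet counter grows while internal loopback is enabled. The following Python sketch shows one way to record that decision. The function name and return strings are hypothetical, and the two readings are assumed to be taken manually from the display interface output before and after a short wait.

```python
def loopback_verdict(tx_before: int, tx_after: int) -> str:
    """Interpret two readings of the outbound packet counter taken while
    internal loopback is enabled on the port (values entered by hand)."""
    if tx_after > tx_before:
        # The counter grew, so the port itself can send packets.
        return "not an internal port issue: continue with step 2"
    if tx_after == tx_before:
        # No growth points to the device or hardware, as described in step 1.
        return "possible device or hardware failure: contact technical support"
    # A lower second reading usually means the counters were cleared in between.
    return "inconclusive: take both readings again"


print(loopback_verdict(1200, 1200))
```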
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Copper port is not up
Symptom
A copper port fails to establish a link after a network cable is connected.
Common causes
The following are the common causes of this type of issue:
· Port configuration issue.
· The network cable has issues.
· There is an issue with this port or the remote port.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 17.
Figure 17 Troubleshooting flowchart
Solution
1. Identify whether the port settings (such as speed, duplex mode, and negotiation mode) are consistent on both ends of the network cable. Execute the display interface brief command to identify whether the speed and duplex settings on both ends match. If they do not match, configure the port's speed and duplex mode by using the speed and duplex commands.
<Sysname> display interface brief
Brief information on interfaces in route mode:
Link: ADM - administratively down; Stby - standby
Protocol: (s) – spoofing
Interface Link Protocol Primary IP Description
GE2/0/1 DOWN DOWN --
Loop0 UP UP(s) 2.2.2.9
NULL0 UP UP(s) --
Vlan1 UP UP --
Vlan999 UP UP 192.168.1.42
Brief information on interfaces in bridge mode:
Link: ADM - administratively down; Stby - standby
Speed: (a) - auto
Duplex: (a)/A - auto; H - half; F - full
Type: A - access; T - trunk; H - hybrid
Interface Link Speed Duplex Type PVID Description
GE2/0/2 DOWN auto A A 1 aaaaaaa
GE2/0/3 UP 1G(a) F(a) A 1 aaaaaaa
2. Use the display interface command to identify whether the port state is Administratively DOWN. If it is, bring up the Ethernet port by executing the undo shutdown command.
<Sysname> display interface gigabitethernet 2/0/1
GigabitEthernet2/0/1
Current state: Administratively DOWN
Line protocol state: DOWN
Description: GigabitEthernet2/0/1 Interface
Bandwidth: 1000000 kbps
Maximum transmission unit: 1500
Allow jumbo frames to pass
Broadcast max-ratio: 100%
Multicast max-ratio: 100%
Unicast max-ratio: 100%
Internet protocol processing: Disabled
...
3. Replace the network cable with one that is known to work, and identify whether the issue is resolved.
4. Switch to a different port on the local device and on the remote device, and identify whether the issue is resolved.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
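Step 1 asks you to compare the speed and duplex settings on both ends of the cable. The following Python sketch parses the bridge-mode table of the display interface brief output shown in step 1 and compares two ports. It is an illustration only: the helper names are hypothetical, the remote output line is invented for the example, and the parser assumes the column layout shown in this section.

```python
from typing import Dict, NamedTuple, Tuple


class PortBrief(NamedTuple):
    link: str
    speed: str
    duplex: str


def parse_bridge_brief(output: str) -> Dict[str, PortBrief]:
    """Collect Link/Speed/Duplex for each row of the bridge-mode table."""
    ports: Dict[str, PortBrief] = {}
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if line.startswith("Brief information"):
            in_table = False  # a new table starts; wait for its header row
        elif fields[:3] == ["Interface", "Link", "Speed"]:
            in_table = True   # header row of the bridge-mode table
        elif in_table and len(fields) >= 4:
            ports[fields[0]] = PortBrief(fields[1], fields[2], fields[3])
    return ports


def configured(port: PortBrief) -> Tuple[str, str]:
    """Reduce displayed values to 'auto' or the fixed setting, per the legend."""
    speed = "auto" if port.speed == "auto" or port.speed.endswith("(a)") else port.speed
    duplex = "auto" if port.duplex == "A" or port.duplex.endswith("(a)") else port.duplex
    return speed, duplex


def settings_match(local: PortBrief, remote: PortBrief) -> bool:
    return configured(local) == configured(remote)


LOCAL = """\
Brief information on interfaces in bridge mode:
Interface Link Speed Duplex Type PVID Description
GE2/0/2 DOWN auto A A 1 aaaaaaa
GE2/0/3 UP 1G(a) F(a) A 1 aaaaaaa
"""
# Hypothetical output from the remote device: speed and duplex are fixed.
REMOTE = """\
Brief information on interfaces in bridge mode:
Interface Link Speed Duplex Type PVID Description
GE1/0/1 DOWN 1G F A 1
"""

local = parse_bridge_brief(LOCAL)
remote = parse_bridge_brief(REMOTE)
print(settings_match(local["GE2/0/2"], remote["GE1/0/1"]))
```

In this example, the local port uses autonegotiation and the hypothetical remote port uses fixed settings, so the comparison reports a mismatch.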
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Frequent port up/down events
Symptom
After inserting the All-in-one cable or transceiver module into the card, the port frequently goes UP and DOWN.
Common causes
The following are the common causes of this type of issue:
· Transceiver module or All-in-one cable failure.
· Autonegotiation on the copper port is unstable.
· Clock configuration issues at both ends of the WAN port.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 18.
Figure 18 Troubleshooting flowchart
Solution
1. For a fiber port, identify whether the transceiver module is functioning abnormally. Check the transceiver module alarm information to troubleshoot the modules on both ends and the fiber between them. If the alarm indicates a receive issue, check the peer port, the fiber, or the transmission equipment. If the alarm indicates a transmit issue or abnormal current or voltage, check the local port.
<Sysname> display transceiver alarm interface gigabitethernet 2/0/1
GigabitEthernet2/0/1 transceiver current alarm information:
RX loss of signal
RX power low
2. Identify whether the transceiver module's receive (RX) and transmit (TX) optical power are normal, that is, within the high and low thresholds. If the transmit optical power is at a critical value, replace the fiber and the transceiver module for cross-verification. If the receive optical power is at a critical value, check the remote transceiver module and the intermediate fiber link.
<Sysname> display transceiver diagnosis interface gigabitethernet 2/0/1
GigabitEthernet2/0/1 transceiver diagnostic information:
Current diagnostic parameters:
Temp(°C) Voltage(V) Bias(mA) RX power(dBm) TX power(dBm)
36 3.31 6.13 -35.64 -5.19
Alarm thresholds:
Temp(°C) Voltage(V) Bias(mA) RX power(dBM) TX power(dBM)
High 50 3.55 1.44 -10.00 5.00
Low 30 3.01 1.01 -30.00 0.00
3. For copper ports, autonegotiation can be unstable. In this case, try setting a fixed speed and duplex mode.
4. For a WAN port, identify whether the clocks on both ends are configured. Set the side with the clock card on the MPU to Master and the other side to Slave.
5. If the fault persists, check the link, endpoint devices, and intermediate equipment.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
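Step 2 compares each measured value with its high and low alarm thresholds. The following Python sketch applies that comparison to the display transceiver diagnosis output shown in step 2. The helper names are hypothetical, and the parser assumes the row layout of that sample (one row of current values, one High row, and one Low row).

```python
from typing import Dict, List

# Parameter order as shown in the column header of the sample output.
PARAMETERS = ("Temp", "Voltage", "Bias", "RX power", "TX power")

SAMPLE = """\
GigabitEthernet2/0/1 transceiver diagnostic information:
Current diagnostic parameters:
Temp(°C) Voltage(V) Bias(mA) RX power(dBm) TX power(dBm)
36 3.31 6.13 -35.64 -5.19
Alarm thresholds:
Temp(°C) Voltage(V) Bias(mA) RX power(dBM) TX power(dBM)
High 50 3.55 1.44 -10.00 5.00
Low 30 3.01 1.01 -30.00 0.00
"""


def parse_diagnosis(output: str) -> Dict[str, List[float]]:
    """Return the 'Current', 'High', and 'Low' rows as lists of floats."""
    rows: Dict[str, List[float]] = {}
    for line in output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        label = tokens[0] if tokens[0] in ("High", "Low") else "Current"
        numbers = tokens[1:] if label != "Current" else tokens
        try:
            values = [float(t) for t in numbers]
        except ValueError:
            continue  # title or header line, not a row of numbers
        if len(values) == len(PARAMETERS):
            rows[label] = values
    missing = {"Current", "High", "Low"} - rows.keys()
    if missing:
        raise ValueError(f"rows not found: {sorted(missing)}")
    return rows


def out_of_range(rows: Dict[str, List[float]]) -> Dict[str, str]:
    """Map each parameter outside its thresholds to a short finding."""
    findings: Dict[str, str] = {}
    for name, value, high, low in zip(PARAMETERS, rows["Current"], rows["High"], rows["Low"]):
        if value > high:
            findings[name] = "above high threshold"
        elif value < low:
            findings[name] = "below low threshold"
    return findings


findings = out_of_range(parse_diagnosis(SAMPLE))
for name, finding in findings.items():
    print(f"{name}: {finding}")
```

For the sample values, the receive power of -35.64 dBm is below the low threshold of -30.00 dBm, which is consistent with the RX power low alarm shown in step 1.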
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Transceiver module issues
Fiber port is not up
Symptom
The fiber port is not up.
Common causes
· The current version of the device does not support this transceiver module.
· The fiber port has foreign objects, or the transceiver module's gold fingers are contaminated or damaged.
· The transceiver module does not match the interface rate.
· Fiber port failure.
· Transceiver module or All-in-one cable failure.
· The transceiver module does not match the fiber optic type.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 19.
Figure 19 Troubleshooting flowchart
Solution
1. Identify whether the device's current version supports the transceiver module.
Check the Installation Manual (IM) or Release Notes to see if the current software version supports this transceiver module. You can upgrade the software version if a new version supports the transceiver module.
2. Identify whether the transceiver module matches the interface rate and duplex mode.
Execute the display interface command to identify whether the speed and duplex settings of the port match those of the transceiver module. If they do not match, configure the port's speed and duplex mode by using the speed and duplex commands.
3. Identify whether the optical interface is faulty.
On this device, use a matching cable for short-reach (SR) connections to directly connect the fiber port to another fiber port of the same rate, and identify whether the port can come up. If it comes up, the remote port is abnormal. If it does not, the local port is abnormal. Switch to a different local or remote port as appropriate, and identify whether the issue is resolved.
4. Check for any issues with the transceiver module or All-in-one cable.
Check for abnormalities in the transceiver module or All-in-one cable as follows:
a. Use the display transceiver alarm interface command to view the transceiver module alarms on the port. If the output shows None, no fault exists. If alarms are present, use them to determine whether the issue lies with the transceiver module, the fiber, or the peer end. For example, if RX loss of signal and TX fault alarms occur, check for foreign objects in the fiber port or severe oxidation on the transceiver module's gold fingers.
b. Use the display transceiver interface command to identify whether the transceiver module types, wavelengths, and transmission distances match on both ends.
c. Use the display transceiver diagnosis interface command to identify whether the measured values of the transceiver module's diagnostic parameters are within the normal range. Common causes of abnormal parameters and their solutions are as follows:
- Secure the fiber optic connection with the transceiver module to resolve poor contact issues.
- Replace the fiber optic if its quality is poor or damaged.
- Adjust the optical attenuation devices based on actual usage when the transmission path adds intermediate optical attenuation devices.
- Replace the transceiver module with one that matches the actual transmission distance when there is a significant difference between the adapted and actual distances.
5. Check that the transceiver module type matches the fiber optic.
See the H3C transceiver module manual to identify whether the transceiver module type matches the fiber type. If they do not match, replace the fiber.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
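Step 4a asks you to decide from the alarm text whether to look at the transceiver module, the fiber, or the peer end. The following Python sketch applies the rule stated in "Frequent port up/down events": receive alarms point to the peer port, fiber, or transmission equipment, and transmit alarms or abnormal current or voltage point to the local port. The function name and the keyword matching are assumptions for illustration. Only "RX loss of signal" and "RX power low" appear in the sample output in this guide, so the other alarm strings are hypothetical.

```python
def side_to_check(alarm: str) -> str:
    """Suggest where to look first, based on the wording of one alarm line."""
    text = alarm.strip().upper()
    if text.startswith("RX"):
        return "peer port, fiber, or transmission equipment"
    if text.startswith("TX") or any(word in text for word in ("VOLTAGE", "BIAS", "CURRENT")):
        return "local port"
    return "unclassified"


for alarm in ("RX loss of signal", "RX power low", "TX fault", "Voltage high"):
    print(f"{alarm}: check {side_to_check(alarm)}")
```

Alarms that the sketch cannot classify, such as temperature alarms, still require manual review.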
Related alarm and log messages
Alarm messages
N/A
Log messages
· OPTMOD/3/CFG_ERR
· OPTMOD/5/CHKSUM_ERR
· OPTMOD/5/IO_ERR
· OPTMOD/4/FIBER_SFPMODULE_INVALID
· OPTMOD/4/FIBER_SFPMODULE_NOWINVALID
· OPTMOD/5/MOD_ALM_ON
· OPTMOD/5/RX_ALM_ON
· OPTMOD/5/RX_POW_HIGH
· OPTMOD/5/RX_POW_LOW
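When you review an exported log file, it can help to filter it for the OPTMOD messages listed above. The following Python sketch does this with a regular expression. It is a sketch under stated assumptions: the function name is hypothetical, the sample log lines are invented (including the timestamp layout and the OTHER/6/EXAMPLE entry), and only the MODULE/severity/MNEMONIC identifiers come from this guide.

```python
import re
from typing import List, Tuple

# Mnemonics taken from the log message list in this section.
OPTMOD_MNEMONICS = {
    "CFG_ERR", "CHKSUM_ERR", "IO_ERR", "FIBER_SFPMODULE_INVALID",
    "FIBER_SFPMODULE_NOWINVALID", "MOD_ALM_ON", "RX_ALM_ON",
    "RX_POW_HIGH", "RX_POW_LOW",
}
PATTERN = re.compile(r"OPTMOD/(\d)/([A-Z_]+)")


def find_optmod_events(log_text: str) -> List[Tuple[int, str]]:
    """Return (severity, mnemonic) for each line that carries a listed message."""
    events: List[Tuple[int, str]] = []
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match and match.group(2) in OPTMOD_MNEMONICS:
            events.append((int(match.group(1)), match.group(2)))
    return events


# Invented sample lines; real message text is omitted on purpose.
SAMPLE_LOG = """\
%Apr 16 10:01:02:003 2025 Sysname OPTMOD/5/RX_POW_LOW: (message text omitted)
%Apr 16 10:01:05:120 2025 Sysname OTHER/6/EXAMPLE: (message text omitted)
%Apr 16 10:02:00:000 2025 Sysname OPTMOD/4/FIBER_SFPMODULE_INVALID: (message text omitted)
"""

events = find_optmod_events(SAMPLE_LOG)
print(events)
```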
System logs contain information about non-H3C transceiver modules
Symptom
When you use the display logbuffer command to check the system logs, the logs contain messages about non-H3C transceiver modules. The log messages display the following information:
This transceiver is NOT sold by H3C. H3C therefore shall NOT guarantee the normal function of the device or assume the maintenance responsibility thereof!
Common causes
The transceiver module is either from third parties or a counterfeit H3C transceiver module.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 20.
Figure 20 Troubleshooting flowchart
Solution
1. Identify whether the transceiver module is an H3C model.
a. Determine if the transceiver module is H3C certified by checking the label on the module.
b. Use the display transceiver interface command to identify whether the Vendor Name field displays H3C. If it displays H3C, the module might be an H3C transceiver module without an electronic label, or it might not be an H3C transceiver module, so further verification is required. If the field displays any other value, the module is not an H3C transceiver module. Replace it with an H3C transceiver module and identify whether the issue is resolved.
c. Confirm with an H3C technical support engineer whether the module is an H3C transceiver module.
Use the display hardware internal transceiver register interface and display transceiver information interface commands in probe view to collect transceiver module information. Then send this information, together with the bar code on the transceiver module, to the H3C technical support engineer to trace the source of the module and confirm whether it is an H3C transceiver module. If the module is confirmed not to be an H3C transceiver module, replace it with an H3C transceiver module and identify whether the issue is resolved.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Device log and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
OPTMOD/4/PHONY_MODULE
The transceiver module does not support digital diagnostics
Symptom
When you use the display transceiver diagnosis interface command to view transceiver module diagnostic information, the system indicates that the transceiver module does not support digital diagnostics, as follows:
<Sysname> display transceiver diagnosis interface Twenty-FiveGiGE2/0/1
The transceiver does not support this function.
Common causes
· The transceiver module is a non-H3C transceiver module.
· The transceiver module does not support digital diagnostics.
· Transceiver module failure.
· Device/Fiber port failure.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 21.
Figure 21 Troubleshooting flowchart
Solution
1. Determine if it is an H3C transceiver module. For detailed steps, see "System logs contain information about non-H3C transceiver modules."
2. Use the display transceiver interface command to identify whether the Digital Diagnostic Monitoring field displays YES. If it does, the transceiver module supports digital diagnostics. Otherwise, it does not.
3. Insert a transceiver module of the same model into another functioning port on this device, or into another operational device that supports the module, and identify whether the message about unsupported digital diagnostics still appears.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Device alarm message.
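Steps 1 and 2 both read fields from the display transceiver interface output: Vendor Name and Digital Diagnostic Monitoring. The following Python sketch extracts those two fields. The helper names are hypothetical, and the sample output is an assumed "name : value" layout written for this example, because this guide does not show the output of that command.

```python
from typing import Dict


def parse_fields(output: str) -> Dict[str, str]:
    """Split 'name : value' lines into a dictionary, skipping title lines."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        key, separator, value = line.partition(":")
        if separator and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def supports_ddm(fields: Dict[str, str]) -> bool:
    """True only if the Digital Diagnostic Monitoring field displays YES."""
    return fields.get("Digital Diagnostic Monitoring", "").upper() == "YES"


# Assumed layout and invented values, for illustration only.
SAMPLE = """\
GigabitEthernet2/0/1 transceiver information:
  Digital Diagnostic Monitoring : NO
  Vendor Name                   : H3C
"""

fields = parse_fields(SAMPLE)
print("Vendor Name:", fields.get("Vendor Name", "not shown"))
print("Supports digital diagnostics:", supports_ddm(fields))
```

As noted in "System logs contain information about non-H3C transceiver modules," a Vendor Name of H3C alone does not prove that the module is an H3C transceiver module.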
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Lost transceiver module serial number
Symptom
The output of the display transceiver manuinfo interface command does not show the serial number of the transceiver module.
Common causes
· The transceiver module is not securely inserted.
· Transceiver module/device failure.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 22.
Figure 22 Troubleshooting flowchart
Solution
1. Identify whether the transceiver module is fully inserted into the fiber port.
If it is not, securely reinsert the transceiver module or use a different fiber port.
2. Identify whether the transceiver module is faulty.
To determine this, insert a transceiver module of the same model into the port on this device, or insert the module into another functioning device that supports it.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Device alarm message.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting fundamental issues
Login management issues
Forgetting the login password for the console port
Symptom
When local password authentication or AAA local authentication is used for console login, you cannot successfully log in to the device through the console port due to an incorrect password.
Common causes
The following are the common causes of this type of issue:
· You forget the login password for the console port or enter an incorrect password.
· The login account for the console port has expired.
Troubleshooting flow
Figure 23 shows the troubleshooting flowchart.
Solution
1. Verify that you can log in to the device through Telnet or Stelnet.
If you have a user account assigned the Telnet or Stelnet service and the network-admin or level-15 user role, you can use this account to log in to the device through Telnet or Stelnet and modify the settings related to console login. The procedure is as follows:
a. Use the account assigned the Telnet or Stelnet service to log in to the device and execute the display line command to view the authentication mode of the user line for the console port.
<Sysname> display line
Idx Type Tx/Rx Modem Auth Int Location
0 CON 0 9600 - P - 0/0
+ 81 VTY 0 - N - 0/0
...
If the value for the Auth field is P, the authentication mode is local password authentication. If the value for this field is A, the authentication mode is AAA (scheme) authentication.
b. Verify that the user account you use has the network-admin or level-15 user role.
If you log in to the device on a user line that uses local password authentication or does not require authentication, you can enter the view of that user line to identify whether the user line is assigned the network-admin or level-15 user role. If you log in to the device on a user line that uses scheme authentication, the user roles are assigned by AAA. You must check the authorization attributes assigned to your user account to identify whether the user account is assigned the network-admin or level-15 user role. For local authentication, the user account is configured on the device. For remote authentication, the user account is configured on a remote server.
<Sysname> system-view
[Sysname] line vty 0
[Sysname-line-vty0] display this
#
line con 0
authentication-mode password
user-role network-admin
#
line vty 0 63
authentication-mode none
user-role network-admin
#
return
If your user account is not assigned the network-admin or level-15 user role, it does not have permissions to change the settings related to console login. In this case, proceed to step 2. If your user account is assigned the network-admin or level-15 user role, handle the password forgotten issue according to the authentication mode used for console login.
c. If local password authentication is used for console login, change the authentication password for the console port.
Access the user line where the console port is located and set a new password for the user line. In this example, the password is 1234567890!. As a best practice, assign the network-admin or level-15 user role to the user line to ensure that the users who log in to the device through the console port have sufficient privileges.
[Sysname] line console 0
[Sysname-line-console0] set authentication password simple 1234567890!
[Sysname-line-console0] user-role network-admin
d. If AAA local authentication is used for console login, change the password of the local user account that can be used to log in to the device through the console port.
Enter the local user view of the account used to log in to the device through the console port, and change the password of the account. In this example, the username is admin, and the password is 1234567890!. As a best practice, assign the network-admin or level-15 user role to the account to ensure that the users who use this account to log in to the device through the console port have sufficient privileges.
[Sysname] local-user admin class manage
[Sysname-luser-manage-admin] password simple 1234567890!
[Sysname-luser-manage-admin] authorization-attribute user-role network-admin
e. If AAA remote authentication is used for console login, contact the administrator of the AAA server to obtain the login password.
f. To prevent configuration loss after a reboot, execute the save command to save the running configuration.
2. Reboot the device and access the extended BootWare menu.
IMPORTANT:
· Accessing the BootWare menu requires a device reboot, which causes service interruption. As a best practice, back up services as needed and reboot the device when the service traffic is light.
· For a distributed device, you must connect your configuration terminal to the console ports on both MPUs and then reboot the entire device. After you access the extended BootWare menu of each MPU, perform the operations in this step and subsequent steps first on the active MPU and then reboot the standby MPU.
During startup, if you do not promptly choose to enter the basic segment, the system runs the extended BootWare segment directly. When the message Press Ctrl+B to access EXTENDED-BOOTWARE MENU... appears, press Ctrl+B immediately. The system then displays one of the following prompts to indicate whether password recovery capability is enabled:
Password recovery capability is enabled.
Password recovery capability is disabled.
¡ When password recovery capability is enabled, you can choose to skip authentication for console login or skip the current system configuration. For more information about the troubleshooting procedure, see steps 3 and 4.
¡ When password recovery capability is disabled, you can choose to restore the factory defaults on the device. For more information about the troubleshooting procedure, see step 5.
3. Skip authentication for console login.
Press Enter to access the extended BootWare menu, and then follow the system prompt to select the option that skips authentication for console login (the menu option might vary by device model). After the system starts up, you can log in through the console port without entering a password, and the system loads all settings.
a. After the system starts up, you must change the password of the console port as soon as possible according to the authentication mode used by the console port.
# If local password authentication is used for console login, change the authentication password for the console port.
Access the user line where the console port is located and set a new password for the user line. In this example, the password is 1234567890!. As a best practice, assign the network-admin or level-15 user role to the user line to ensure that the users who log in to the device through the console port have sufficient privileges.
<Sysname> system-view
[Sysname] line console 0
[Sysname-line-console0] set authentication password simple 1234567890!
[Sysname-line-console0] user-role network-admin
# If AAA local authentication is used for console login, change the password of the local user account that can be used to log in to the device through the console port.
Enter the local user view of the account used to log in to the device through the console port, and change the password of the account. In this example, the username is admin, and the password is 1234567890!. As a best practice, assign the network-admin or level-15 user role to the account to ensure that the users who use this account to log in to the device through the console port have sufficient privileges.
<Sysname> system-view
[Sysname] local-user admin class manage
[Sysname-luser-manage-admin] password simple 1234567890!
[Sysname-luser-manage-admin] authorization-attribute user-role network-admin
b. To prevent configuration loss after a reboot, execute the save command to save the running configuration.
4. Skip the current system configuration.
Press Enter to access the extended BootWare menu, and then follow the system prompt to select the option that skips the current system configuration (the menu option might vary by device model). When the system starts, it ignores all settings in the next-startup configuration file and starts up with initial settings. This is a one-time operation and takes effect only for the first system boot or reboot after you choose this option. After the system starts up, you do not need to enter the password of the console port.
a. After the system starts up, you must export the settings in the original next-startup configuration file as soon as possible. Do not power off the device during this operation. You can use one of the following methods:
- Use FTP or TFTP to export the original next-startup configuration file to your local terminal.
- Execute the more command in user view to display the contents of the original next-startup configuration file, and then copy and paste all the displayed contents to a local configuration file.
b. Manually edit the settings related to console login in the local file, and then upload the edited file to the root directory of the storage medium on the device.
c. Specify the edited configuration file as the next-startup configuration file (in this example, the configuration file is startup.cfg).
<Sysname> startup saved-configuration startup.cfg
d. Reboot the device.
5. Restore the factory defaults.
CAUTION: In this operation, the system automatically deletes the main and backup next-startup configuration files upon startup, and then loads the factory defaults. Make sure this operation does not have a negative impact on services.
Press Enter to access the extended BootWare menu, and then follow the system prompt to select the option that restores the factory defaults (the menu option might vary by device model). After the system starts up, you do not need to enter the password of the console port.
a. After the system starts up, configure the login authentication mode for the console port as per your actual needs, as well as the relevant login password or account.
The authentication mode is none:
<Sysname> system-view
[Sysname] line console 0
[Sysname-line-console0] authentication-mode none
[Sysname-line-console0] user-role network-admin
You can log in to the device through this user line without providing any username or password. This authentication mode has security risks. Use it with caution.
The authentication mode is local password authentication:
<Sysname> system-view
[Sysname] line console 0
[Sysname-line-console0] authentication-mode password
[Sysname-line-console0] set authentication password simple 1234567890!
[Sysname-line-console0] user-role network-admin
The authentication mode is local AAA authentication:
<Sysname> system-view
[Sysname] line console 0
[Sysname-line-console0] authentication-mode scheme
[Sysname-line-console0] quit
[Sysname] local-user admin class manage
[Sysname-luser-manage-admin] service-type terminal
[Sysname-luser-manage-admin] password simple 1234567890!
[Sysname-luser-manage-admin] authorization-attribute user-role network-admin
The authentication mode is remote AAA authentication:
<Sysname> system-view
[Sysname] line console 0
[Sysname-line-console0] authentication-mode scheme
[Sysname-line-console0] quit
In addition, you must configure an authentication domain for login users and a RADIUS, HWTACACS, or LDAP scheme. For more information about the configuration, see AAA configuration in Security Configuration Guide.
b. To prevent configuration loss after a reboot, execute the save command to save the running configuration.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
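The choice among steps 3, 4, and 5 depends on the password recovery prompt that the system displays after you press Ctrl+B. The following Python sketch restates that decision. The function names are hypothetical, and the sketch only mirrors the two prompts and the step references given in this procedure. It does not interact with the BootWare menu.

```python
from typing import List


def recovery_enabled(prompt: str) -> bool:
    """Interpret the password recovery prompt shown by the system."""
    text = prompt.strip().lower()
    if text == "password recovery capability is enabled.":
        return True
    if text == "password recovery capability is disabled.":
        return False
    raise ValueError(f"unrecognized prompt: {prompt!r}")


def bootware_options(prompt: str) -> List[str]:
    """List the recovery paths that this procedure allows for the prompt."""
    if recovery_enabled(prompt):
        return [
            "skip authentication for console login (step 3)",
            "skip the current system configuration (step 4)",
        ]
    return ["restore the factory defaults (step 5)"]


for option in bootware_options("Password recovery capability is disabled."):
    print(option)
```

Restoring the factory defaults deletes the next-startup configuration files, so that path is offered only when password recovery capability is disabled.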
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Forgetting the password for Telnet login
Symptom
When local password authentication or AAA local authentication is used for Telnet login, you cannot successfully Telnet to the device due to an incorrect password.
Common causes
The following are the common causes of this type of issue:
· You forget the login password for the user account that you use to Telnet to the device or enter an incorrect password.
· The account that you use to Telnet to the device has expired.
Troubleshooting flow
Figure 24 shows the troubleshooting flowchart.
Solution
1. Verify that you can use another method to log in to the device.
If the Telnet login password is lost, you can log in to the device through another method (such as through the console port) and reconfigure a Telnet login password.
a. Log in to the device through a non-Telnet method, and then execute the display line command to display the authentication mode used by VTY lines.
<Sysname> display line
Idx Type Tx/Rx Modem Auth Int Location
+ 0 CON 0 9600 - P - 0/0
81 VTY 0 - P - 0/0
...
If the value for the Auth field is P, the authentication mode is local password authentication. If the value for this field is A, the authentication mode is AAA (scheme) authentication.
b. Based on the authentication mode used by the VTY lines, configure a new login password for Telnet login.
For local password authentication:
Set the authentication mode for VTY login users to local password authentication, and configure the login password and user role. For example, set the login password to 1234567890! and specify the network-admin user role for VTY login users.
<Sysname> system-view
[Sysname] line vty 0 63
[Sysname-line-vty0-63] authentication-mode password
[Sysname-line-vty0-63] set authentication password simple 1234567890!
[Sysname-line-vty0-63] user-role network-admin
For AAA local authentication:
Set the authentication mode for VTY login users to AAA authentication, and configure a new password for the account that you use to Telnet to the device and specify user roles for the account. In this example, the local account used for Telnet login is admin, the password is set to 1234567890!, and the network-admin user role is specified for the account.
<Sysname> system-view
[Sysname] line vty 0 63
[Sysname-line-vty0-63] authentication-mode scheme
[Sysname-line-vty0-63] quit
[Sysname] local-user admin class manage
[Sysname-luser-manage-admin] service-type telnet
[Sysname-luser-manage-admin] password simple 1234567890!
[Sysname-luser-manage-admin] authorization-attribute user-role network-admin
If you forget the original login account name, you can create a new local account by performing the operations in this step.
For AAA remote authentication:
Contact the administrator of the AAA server to obtain the login password.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
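Step 1 reads the Auth field of the display line output to choose the password reset method. The following Python sketch extracts that field from the sample output in step 1a. The function name is hypothetical, and the parser assumes that Auth is always the third column from the right, as in the sample. The meanings of P and A come from this guide. Treating N as "no authentication" is an assumption based on the console login example earlier in this chapter.

```python
from typing import Dict

AUTH_MODES = {
    "P": "local password authentication",
    "A": "AAA (scheme) authentication",
    "N": "no authentication (assumed meaning)",
}

SAMPLE = """\
Idx Type Tx/Rx Modem Auth Int Location
+ 0 CON 0 9600 - P - 0/0
81 VTY 0 - P - 0/0
...
"""


def parse_display_line(output: str) -> Dict[str, str]:
    """Map each user line (for example 'VTY 0') to its authentication mode."""
    modes: Dict[str, str] = {}
    for line in output.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == "+":
            tokens = tokens[1:]  # drop the marker in front of the index
        if len(tokens) < 6 or not tokens[0].isdigit():
            continue  # header, ellipsis, or prompt line
        name = f"{tokens[1]} {tokens[2]}"
        # Auth is the third column from the right: Auth, Int, Location.
        modes[name] = AUTH_MODES.get(tokens[-3], "unknown")
    return modes


modes = parse_display_line(SAMPLE)
for name, mode in modes.items():
    print(f"{name}: {mode}")
```

Counting columns from the right avoids a problem with the Tx/Rx column, which is empty for VTY lines in the sample output.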
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Telnet login failure
Symptom
When the device acts as a Telnet server, you fail to log in to the device through a Telnet client.
Common causes
The following are the common causes of this type of issue:
· The network connection between the Telnet client and the device is poor.
· The Telnet client feature is not enabled on the Telnet client.
· The Telnet service is not enabled on the device.
· VTY lines do not support the Telnet protocol.
· The login username or password is incorrect.
· The number of login users on the device has reached the upper limit.
· Access control for Telnet login has been configured on the device, and the Telnet client is not permitted by the rules in the ACL specified for filtering users.
· The authentication mode settings are not configured correctly.
· When both the Telnet client and Telnet server are H3C devices, you do not log in to the Telnet server from the source address or source interface specified on the Telnet client for outgoing Telnet packets.
Troubleshooting flow
Figure 25 shows the troubleshooting flowchart.
Figure 25 Flowchart for troubleshooting Telnet login failure
Solution
1. Verify that the client can successfully ping the device.
Execute the ping command on the Telnet client to check the network connection between the Telnet client and the device.
If the Telnet client cannot ping the IP address of the device, it cannot establish a Telnet connection with the device. The ping failure might be caused by ping being disabled on the Telnet client. To troubleshoot the ping failure, follow the procedure in "Ping and tracert issues."
2. Verify that the Telnet client feature is enabled on the client.
Typically, before you set up a new Telnet connection on a PC, you must enable the Telnet client feature in the Turn Windows features on or off window on the PC.
For information about enabling the Telnet client feature on other types of devices, such as mobile devices, see the user manuals for those devices.
3. Verify that the Telnet service is enabled on the device.
By default, the Telnet service is disabled. If the command output for the display this command in system view does not contain the telnet server enable command line, the Telnet service remains disabled. You can execute the telnet server enable command to enable the Telnet service to allow clients to Telnet to the device.
4. Verify that the VTY line through which the user Telnets to the device supports the Telnet protocol.
Execute the display this command in VTY line view or VTY line class view.
¡ If the command output contains a protocol inbound command line that specifies neither telnet nor all (for example, protocol inbound ssh), the VTY line does not support the Telnet protocol.
¡ In non-FIPS mode, the system supports all protocols by default. If the command output contains the undo protocol inbound command line or does not contain the protocol inbound command line, the system supports all protocols.
If the Telnet protocol is not supported on the user line, execute the protocol inbound telnet or protocol inbound all command on the user line to allow Telnet login.
<Sysname> system-view
[Sysname] line vty 0 63
[Sysname-line-vty0-63] authentication-mode scheme
[Sysname-line-vty0-63] protocol inbound all
A configuration change in user line view does not take effect on the current session. It takes effect on subsequent login sessions.
5. Verify that the username and password used by the client to Telnet to the device are correct.
If the device prompts an authentication failure when you initiate a Telnet connection and enter the username and password for Telnet login as instructed by the Telnet client, you can attempt to log in again by re-entering the username and password. If the login still fails, you can check the LOGIN/5/LOGIN_INVALID_USERNAME_PWD log. You have entered an invalid username or password if the log contains the following message:
LOGIN/5/LOGIN_INVALID_USERNAME_PWD: Invalid username or password from vty0.
If you forget the correct login username or password, you can change the authentication mode to none or reset the password, and then attempt to Telnet to the device again.
¡ In user line view or user line class view, execute the authentication-mode none command to disable authentication. The configuration indicates that when a user logs in to the device through the specified user line or user line class, no authentication is required. The user can use the user line or user line class to log in without having to enter a username or password. This mode brings security risks. Use it with caution.
<Sysname> system-view
[Sysname] line vty 0 63
[Sysname-line-vty0-63] authentication-mode none
¡ If the authentication mode is local password authentication, execute the set authentication password command in user line view or user line class view to configure an authentication password for local password authentication.
<Sysname> system-view
[Sysname] line vty 0 63
[Sysname-line-vty0-63] authentication-mode password
[Sysname-line-vty0-63] set authentication password simple hello12345&!
¡ If the authentication mode is AAA authentication, follow the procedure in "AAA and password control issues" to reset the password.
6. Identify whether the number of login users on the device has reached the upper limit.
Log in to the device through the console port and execute the display users command in any view to display the current number of Telnet users. By default, the device supports a maximum of 32 concurrent Telnet users.
Check the TELNETD/6/TELNETD_REACH_SESSION_LIMIT log. The number of Telnet users has reached the upper limit if the following log message is generated:
TELNETD/6/TELNETD_REACH_SESSION_LIMIT: Telnet client 1.1.1.1 failed to log in. The current number of Telnet sessions is 10. The maximum number allowed is (10).
If the number of Telnet users has reached the upper limit, you can first disconnect the connections of other idle Telnet users or execute the aaa session-limit telnet command to increase the maximum number of concurrent Telnet users. Then, initiate a Telnet connection to the device again.
7. Identify whether ACLs have been applied to control Telnet login on the device.
In system view, execute the display this command. If the command output contains settings related to the telnet server acl or telnet server ipv6 acl command, ACLs have been applied to control Telnet login.
¡ Verify that the rules in the ACLs permit the IP address, port number, and protocol number of the Telnet client. You can check the TELNETD_ACL_DENY log. The rules in the ACLs deny the IP address of the Telnet client if the following log message is generated:
TELNETD/5/TELNETD_ACL_DENY: The Telnet Connection 1.2.3.4(vpn1) request was denied according to ACL rules.
¡ Execute the undo telnet server acl or undo telnet server ipv6 acl command to remove ACL access restrictions for Telnet users.
8. Verify that the authentication mode settings are correctly configured on the device.
In any view, execute the display line command to check the Auth field to obtain the authentication mode used on the user line through which you Telnet to the device. The value of A indicates AAA authentication, the value of N indicates none authentication, and the value of P indicates local password authentication.
¡ If local password authentication is configured as the login authentication mode for the VTY line by using the authentication-mode password command, you must ensure that an authentication password has been configured for the VTY line.
¡ If AAA authentication is configured as the login authentication mode by using the authentication-mode scheme command, you must ensure that the user account used for Telnet login has been created. For more information about the troubleshooting procedure, see "AAA and password control issues."
9. When both the Telnet client and Telnet server are H3C devices, identify whether the Telnet client has configured a source address or a source interface for outgoing Telnet packets.
Execute the display this command in system view. If the command output contains the telnet client source command line, a source IPv4 address or source interface has been specified on the Telnet client for outgoing Telnet packets. In this case, you must ensure that you log in to the Telnet server from the specified source IPv4 address or source interface on the Telnet client. If the login fails, perform one of the following operations and attempt to log in to the Telnet server again:
¡ Execute the telnet client source command to reconfigure the source IPv4 address or source interface for the Telnet client to use for outgoing Telnet packets.
¡ Execute the undo telnet client source command to restore the default. In this case, no source IPv4 address or source interface is specified. The Telnet client uses the primary IPv4 address of the output interface for the route to the server as the source IPv4 address.
When you perform the operations in this step, follow these restrictions and guidelines:
¡ The source setting configured by using the telnet client source command has a lower precedence than the source setting specified by using the telnet command in user view.
¡ In an IPv6 network, you can execute the telnet ipv6 command in user view to specify a source interface or source IPv6 address for outgoing Telnet packets.
10. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
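The log checks in steps 5, 6, and 7 can be sketched as a small offline script that scans collected log text. This is a minimal sketch, not an H3C tool: the function name classify_telnet_log and the returned dictionary keys are hypothetical, and the regular expressions assume that the messages have exactly the wording quoted in the steps above.

```python
import re

# Patterns assume the exact message wording quoted in steps 5 to 7.
SESSION_LIMIT_RE = re.compile(
    r"TELNETD/6/TELNETD_REACH_SESSION_LIMIT: Telnet client (?P<client>\S+) "
    r"failed to log in\. The current number of Telnet sessions is "
    r"(?P<current>\d+)\. The maximum number allowed is \((?P<maximum>\d+)\)\."
)
ACL_DENY_RE = re.compile(
    r"TELNETD/5/TELNETD_ACL_DENY: The Telnet Connection (?P<client>[^\s(]+)"
    r"(?:\((?P<vpn>[^)]*)\))? request was denied according to ACL rules\."
)
INVALID_CRED_RE = re.compile(
    r"LOGIN/5/LOGIN_INVALID_USERNAME_PWD: Invalid username or password "
    r"from (?P<line>\S+)\."
)


def classify_telnet_log(text):
    """Return a dict describing the Telnet failure cause, or None if no match."""
    m = SESSION_LIMIT_RE.search(text)
    if m:
        return {
            "cause": "session_limit",
            "client": m.group("client"),
            "current": int(m.group("current")),
            "maximum": int(m.group("maximum")),
        }
    m = ACL_DENY_RE.search(text)
    if m:
        return {
            "cause": "acl_deny",
            "client": m.group("client"),
            "vpn": m.group("vpn"),  # None when no VPN instance is shown
        }
    m = INVALID_CRED_RE.search(text)
    if m:
        return {"cause": "invalid_credentials", "line": m.group("line")}
    return None
```

A line that matches none of the three patterns returns None. In that case, continue with the remaining steps instead of assuming that the login succeeded.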
Related alarm and log messages
Alarm messages
N/A
Log messages
· LOGIN/5/LOGIN_FAILED
· LOGIN/5/LOGIN_INVALID_USERNAME_PWD
· TELNETD/5/TELNETD_ACL_DENY
· TELNETD/6/TELNETD_REACH_SESSION_LIMIT
Software upgrade issues
Device startup failure
Symptom
The device fails to restart after loading the software images.
Common causes
The following are the common causes of this type of issue:
· The storage medium (CF card or USB disk) is not securely installed.
· The BootWare version does not match the version of the software images.
Troubleshooting flow
Figure 26 shows the troubleshooting flowchart.
Figure 26 Flowchart for troubleshooting device startup failure
Solution
1. Run a terminal emulation program on the PC connected to the console port, start up the device, and identify the BootWare version from the following information:
Booting Normal Extended BootWare
The Extended BootWare is self-decompressing.............Done.
****************************************************************************
* *
* BootWare, Version 1.50 *
* *
****************************************************************************
2. Press Ctrl+B within 3 seconds after the "Press Ctrl+B to access EXTENDED-BOOTWARE MENU..." prompt is displayed. The system enters the extended BootWare menu.
==========================<EXTENDED-BOOTWARE MENU>==========================
|<1> Boot System |
|<2> Enter Serial SubMenu |
|<3> Enter Ethernet SubMenu |
|<4> File Control |
|<5> Restore to Factory Default Configuration |
|<6> Skip Current System Configuration |
|<7> BootWare Operation Menu |
|<8> Skip Authentication for Console Login |
|<9> Storage Device Operation |
|<0> Reboot |
============================================================================
Ctrl+Z: Access EXTENDED ASSISTANT MENU
Ctrl+C: Display Copyright
Ctrl+F: Format File System
Enter your choice(0-9): 4
3. Enter 4 to access the file control submenu. Verify that the storage medium is securely installed (the CF card is used as an example).
¡ If the "Note:the operating device is cfa0" message appears and you can see the file information in the CF card after entering 1, the storage medium is securely installed. Proceed to step 4.
¡ If the "Note:the operating device is cfa0" message does not appear and you cannot see the file information in the CF card after entering 1, the storage medium is not securely installed. Contact Technical Support.
===============================<File CONTROL>===============================
|Note:the operating device is cfa0 |
|<1> Display All File(s) |
|<2> Set Image File type |
|<3> Set Bin File type |
|<4> Delete File |
|<5> Copy File |
|<0> Exit To Main Menu |
============================================================================
Enter your choice(0-5): 1
Display all file(s) in cfa0:
'M' = MAIN 'B' = BACKUP 'N/A' = NOT ASSIGNED
============================================================================
|NO. Size(B) Time Type Name |
|1 539432 Nov/18/2021 21:11:56 N/A cfa0:/info/info_3_0.bin |
|2 539432 Nov/18/2021 21:15:00 N/A cfa0:/info/info_3_1.bin |
|3 539432 Aug/28/2021 19:05:42 N/A cfa0:/info/info_2_0.bin |
============================================================================
4. Confirm with Technical Support whether the BootWare version is the latest.
¡ If yes, proceed to step 5.
¡ If no, download the latest BootWare version, and proceed to step 5.
5. Connect the PC to the Ethernet interface on the device, run the FTP or TFTP server software on the PC, specify the file path of the downloaded image, and proceed to step 6.
NOTE: No FTP or TFTP server software is provided with the device. You must obtain and install the software yourself.
6. Enter 0 to return to the extended BootWare menu. Enter 7 to access the BootWare operation menu, and proceed to step 7.
==========================<EXTENDED-BOOTWARE MENU>==========================
|<1> Boot System |
|<2> Enter Serial SubMenu |
|<3> Enter Ethernet SubMenu |
|<4> File Control |
|<5> Restore to Factory Default Configuration |
|<6> Skip Current System Configuration |
|<7> BootWare Operation Menu |
|<8> Skip Authentication for Console Login |
|<9> Storage Device Operation |
|<0> Reboot |
============================================================================
Ctrl+Z: Access EXTENDED ASSISTANT MENU
Ctrl+C: Display Copyright
Ctrl+F: Format File System
Enter your choice(0-9): 7
7. Enter 4 to update the BootWare through the Ethernet interface, and proceed to step 8.
=========================<BootWare Operation Menu>==========================
|Note:the operating device is flash |
|<1> Backup Full BootWare |
|<2> Restore Full BootWare |
|<3> Update BootWare By Serial |
|<4> Update BootWare By Ethernet |
|<0> Exit To Main Menu |
============================================================================
Enter your choice(0-4): 4
8. Enter 4 to configure Ethernet interface parameters, and proceed to step 9.
===================<BOOTWARE OPERATION ETHERNET SUB-MENU>===================
|<1> Update Full BootWare |
|<2> Update Extended BootWare |
|<3> Update Basic BootWare |
|<4> Modify Ethernet Parameter |
|<0> Exit To Main Menu |
============================================================================
Enter your choice(0-4): 4
9. Enter 1 to upload the BootWare image. Enter 0 to return to the BootWare operation menu. Enter 0 again to return to the extended BootWare menu. Enter 0 to reboot the device.
¡ If the device starts up successfully, the issue is resolved.
¡ If the device fails to start up, proceed to step 10.
10. Collect the results of each step and contact Technical Support.
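The version check in steps 1 and 4 can be sketched as a script that reads a saved console capture. This is a minimal sketch under stated assumptions: the function names are hypothetical, the banner is assumed to contain the "BootWare, Version x.y" text shown in step 1, and version strings are assumed to compare numerically component by component. The reference version must come from Technical Support. The value 1.52 in the usage note is a made-up placeholder.

```python
import re

# Matches the banner line shown in step 1, for example "BootWare, Version 1.50".
VERSION_RE = re.compile(r"BootWare,\s+Version\s+(\d+(?:\.\d+)*)")


def bootware_version(console_text):
    """Return the BootWare version string from a console capture, or None."""
    m = VERSION_RE.search(console_text)
    return m.group(1) if m else None


def version_key(version):
    """Turn "1.50" into (1, 50) so that versions compare numerically.

    Assumption: each dot-separated component is a plain integer.
    """
    return tuple(int(part) for part in version.split("."))


def is_older(current, reference):
    """Return True if the current version is lower than the reference version."""
    return version_key(current) < version_key(reference)
```

For example, bootware_version applied to the banner in step 1 returns "1.50", and is_older("1.50", "1.52") returns True, which would correspond to the "If no" branch in step 4.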
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Image loading failure
Symptom
The device fails to load software images.
Common causes
The common cause of this type of issue is that the software image file is corrupted.
Solution
1. Execute the md5sum command in user view to use the MD5 algorithm to calculate the digest of the software image file.
<Sysname> md5sum cfa0:/Comware-cmw710.ipe
MD5 digest:
f2054bc35cd13bf84038bd10fc7a3efd
2. Obtain the label of the software image file from the official website or Technical Support, and use an MD5 tool to calculate the digest of the label.
3. Compare the digest of the software image file with the digest of the label.
¡ If they are the same, the software image file is not corrupted. Proceed to the next step.
¡ If they are different, the software image file is corrupted. Contact Technical Support to obtain a new software image file.
4. Collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
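The same digest comparison can be run on the PC before the image file is transferred to the device. This is a minimal sketch that uses the Python standard hashlib module. The function names are hypothetical, and the sketch assumes that a copy of the image file is available locally and that the reference digest has been obtained as described in step 2.

```python
import hashlib


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path):
    """Return the hexadecimal MD5 digest of the file at the given local path."""
    with open(path, "rb") as handle:
        return md5_of_stream(handle)


def digests_match(computed, reference):
    """Compare two digests, ignoring letter case and surrounding whitespace."""
    return computed.strip().lower() == reference.strip().lower()
```

Reading the file in chunks keeps memory usage low for large image files. If digests_match returns False for a local copy, the file was probably corrupted before it reached the device, so downloading it again is more useful than transferring it again.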
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting system management issues
Hardware resource management issues
High CPU usage
Symptom
If one of the following conditions occurs, the CPU control core usage of the device is high, and you must identify the causes for the high CPU usage:
· During daily inspection of the device, execute the display cpu-usage command repeatedly to view the CPU usage. The CPU usage is significantly higher than the daily average.
# Execute the display cpu-usage summary command to view the average CPU usage during the most recent 5-second, 1-minute, or 5-minute interval.
<Sysname> display cpu-usage summary
Slot CPU Last 5 sec Last 1 min Last 5 min
1 0 5% 5% 4%
# Execute the display cpu-usage history command to view the CPU usage in graphical form for the last 60 samples. The data shows that the CPU usage rate continues to increase or is significantly higher than the daily average value.
· When you log in to the device through Telnet or SSH and execute commands, the device responds slowly or the session freezes.
· The device outputs log messages about high CPU usage on the device.
· Alarms on high CPU usage occur on the SNMP manager.
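The comparison against the daily average in the first symptom can be sketched as a script that parses saved display cpu-usage summary output. This is a minimal sketch: the function names, the baseline_pct value, and the margin_pct value are hypothetical and must be chosen from your own inspection records. The parser assumes the five-column layout shown in the sample output above.

```python
import re

# One data row: slot, CPU number, then three percentage columns.
ROW_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)%\s+(\d+)%\s+(\d+)%\s*$")


def parse_cpu_summary(output):
    """Parse display cpu-usage summary text into a list of row dicts."""
    rows = []
    for line in output.splitlines():
        m = ROW_RE.match(line)
        if m:
            slot, cpu, sec5, min1, min5 = map(int, m.groups())
            rows.append({
                "slot": slot,
                "cpu": cpu,
                "last_5_sec": sec5,
                "last_1_min": min1,
                "last_5_min": min5,
            })
    return rows


def above_baseline(rows, baseline_pct, margin_pct=20):
    """Return rows whose 5-minute average exceeds baseline plus margin.

    baseline_pct and margin_pct are site-specific assumptions.
    """
    limit = baseline_pct + margin_pct
    return [row for row in rows if row["last_5_min"] > limit]


SAMPLE = """Slot CPU Last 5 sec Last 1 min Last 5 min
1 0 5% 5% 4%"""
```

The 5-minute column is used here because it smooths out short bursts. To detect spikes instead, compare the last_5_sec value.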
Common causes
The following are the common causes of this type of issue:
· Network attacks.
· Protocol flapping, typically STP flapping and routing protocol flapping.
· Network loops.
· Flow sampling is configured on the device, and either the traffic to be processed is too heavy or the sampling frequency is too high. As a result, the sampling feature consumes a significant amount of CPU resources.
· The device generates a large number of log messages. Then, abundant resources are occupied for the generation and management of these log messages.
Troubleshooting flow
Figure 27 shows the troubleshooting flowchart.
Figure 27 Flowchart for troubleshooting high CPU usage
Solution
1. Identify whether a network attack occurs.
On a live network, the most common cause of high CPU usage is a network attack. Attackers initiate a large number of abnormal network interactions that hit the device. For example, the attackers send a large number of TCP connection requests or ICMP requests in a short period. The device is then busy processing these attack packets, which leads to high CPU usage and affects the normal operation of the device.
In probe view, execute the display system internal control-plane statistics command to view control plane statistics and check the number of dropped packets. If the current CPU usage is high and the Dropped field value is large, the device is probably under a packet attack. (Support for the display system internal control-plane statistics command depends on the device model.)
<Sysname> system-view
[Sysname] probe
[Sysname-probe] display system internal control-plane statistics slot 1
Control plane slot 1
Protocol: Default
Bandwidth: 15360 (pps)
Forwarded: 108926 (Packets), 29780155 (Bytes)
Dropped : 0 (Packets), 0 (Bytes)
Protocol: ARP
Bandwidth: 512 (pps)
Forwarded: 1489284 (Packets), 55318920 (Bytes)
Dropped : 122114 (Packets), 491421 (Bytes)
...
¡ If a network attack occurs, first resolve the network attack issue.
¡ If no network attack occurs, proceed to step 2.
2. Identify whether a protocol flapping occurs on the device.
Protocol flapping causes the device to continuously process protocol packets, recalculate the topology, and update entries, resulting in high CPU usage. In practice, the most common types are STP flapping and OSPF flapping.
¡ For STP protocol flappings, execute the stp port-log command in system view to enable outputting port state transition information. If the CLI of the device frequently outputs the following logs, an STP flapping occurs:
STP/6/STP_DETECTED_TC: Instance 0's port GigabitEthernet2/0/1 detected a topology change.
STP/6/STP_DISCARDING: Instance 0's port GigabitEthernet2/0/1 has been set to discarding state.
STP/6/STP_NOTIFIED_TC: Instance 0's port GigabitEthernet2/0/1 was notified a topology change.
- If an STP flapping occurs, first resolve the STP flapping issue.
- If no STP flapping occurs, proceed to the next step.
¡ For OSPF flappings, execute the display ip routing-table command to view routing information. If route entries for the same network segment are frequently and repeatedly created and deleted in the routing table, a route flapping occurs.
- If a route flapping occurs or the routes do not exist, troubleshoot link-related issues and IGP routing issues.
- If no route flapping occurs, proceed to step 3.
3. Identify whether a network loop occurs.
When an Ethernet interface operates in Layer 2 mode and a loop occurs on the link, broadcast storms and network flappings might occur. Then, a large number of protocol packets are sent to the CPU for processing, causing high CPU usage. When a network loop occurs, traffic on many ports of the device will increase significantly, with a large proportion of broadcast and multicast packets. To identify whether a network loop occurs on the device and whether broadcast, multicast, or unknown unicast packet storms occur, follow these steps:
a. Clear the Ethernet interface traffic statistics.
<Sysname> reset counters interface
b. Execute the display counters rate inbound interface command multiple times to identify whether the port usage has significantly increased.
<Sysname> display counters rate inbound interface
Usage: Bandwidth utilization in percentage
Interface Usage(%) Total(pps) Broadcast(pps) Multicast(pps)
GE2/0/1 0.01 7 -- --
GE2/0/2 0.01 1 -- --
GE2/0/3 0.01 5 -- --
GE2/0/4 0.05 60 -- --
GE2/0/5 0.04 52 -- --
Overflow: More than 14 digits.
--: Not supported.
c. If the port usage significantly increases, repeatedly execute the display counters inbound interface command to view the total number of packets received on interfaces and the number of broadcast and multicast packets, which correspond to the values for the Total(pkt), Broadcast(pkt), and Multicast(pkt) fields, respectively. If the proportion of broadcast and multicast packets in the total number of received packets on the interfaces is high, a broadcast or multicast storm might occur. If the number of broadcast and multicast packets has not significantly increased, but the number of the total packets received on interfaces has increased significantly, an unknown unicast packet storm might occur.
<Sysname> display counters inbound interface
Interface Total(pkt) Broadcast(pkt) Multicast(pkt) Err(pkt)
GE2/0/1 141 27 111 0
GE2/0/2 274866 47696 0 --
GE2/0/3 1063034 684808 2 --
GE2/0/4 11157797 7274558 50 0
GE2/0/5 9653898 5619640 52 0
Overflow: More than 14 digits (7 digits for column "Err").
--: Not supported.
¡ If a link loop occurs, perform the following operations:
- Troubleshoot the link connection to prevent the occurrence of loops in the physical topology.
- Execute the display stp command to identify whether STP is enabled and whether the configuration is correct. If the configuration is incorrect, correct the configuration.
- Execute the display stp brief and display stp abnormal-port commands to check the spanning tree status on neighboring devices. Locate and resolve STP anomalies according to the BlockReason field value in the output from the display stp abnormal-port command.
If the STP configuration is correct, an STP protocol miscalculation might occur, or the protocol calculation is correct but the port driver layer is not blocked as expected. To quickly restore STP and eliminate loops, execute the shutdown/undo shutdown command or unplug and plug the network cable on the interface where the loop occurs, allowing STP recalculation.
- In Ethernet interface view, execute broadcast-suppression to enable broadcast suppression on an interface, execute multicast-suppression to enable multicast storm suppression, and execute unicast-suppression to enable unknown unicast storm suppression. Alternatively, execute flow-control to configure flow control. (Support for the broadcast-suppression, multicast-suppression, unicast-suppression, and flow-control commands depends on the device model.)
- Apply QoS policies for rate limiting on multicast, broadcast and unknown unicast packets.
¡ If no loop occurs, proceed to step 4.
4. Identify whether flow statistics and sampling features are configured and whether the configured parameters are appropriate.
After network traffic monitoring features including NetStream are configured on the device, the device will statistically analyze network traffic. If network traffic is high, the CPU usage might be high. In this case, perform the following operations:
¡ Configure filter conditions to precisely filter the traffic, and only analyze the traffic that users care about.
¡ Configure the sampler and adjust the sampling ratio. Then, the statistics collected by NetStream can basically reflect the status of the entire network, and can prevent the excessive statistical messages from affecting the forwarding performance of the device.
5. Identify whether the device is generating a large number of log messages.
In certain abnormal situations, for example, when the device is under attack, errors occur during operation, or a port frequently goes up and down, the device continuously generates diagnostic or log information. In this case, the system software needs to frequently read from and write to the memory, which can increase the CPU usage.
Use the following methods to identify whether the device is generating a large number of log messages:
¡ Log in to the device via Telnet and execute the terminal monitor command to enable log output to the current terminal.
<Sysname> terminal monitor
The current terminal is enabled to display logs.
After you execute this command, if a large number of abnormal log messages or duplicated log messages are output to the CLI, the device is generating a large number of log messages.
¡ Repeatedly execute the display logbuffer summary command. If the total log volume increases obviously, execute the display logbuffer reverse command to view detailed log information to identify whether a large number of abnormal log messages occur or whether a particular log message is repeatedly appearing in large quantities.
<Sysname> display logbuffer summary
Slot EMERG ALERT CRIT ERROR WARN NOTIF INFO DEBUG
1 0 0 2 9 24 12 128 0
5 0 0 0 41 72 8 2 0
97 0 0 42 11 14 7 40 0
<Sysname> display logbuffer reverse
Log buffer: Enabled
Max buffer size: 1024
Actual buffer size: 512
Dropped messages: 0
Overwritten messages: 0
Current messages: 410
%Jan 15 08:17:24:259 2021 Sysname SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=192.168.2.108-User=**; Command is display logbuffer
%Jan 15 08:17:19:743 2021 Sysname SHELL/4/SHELL_CMD_MATCHFAIL: -User=**-IPAddr=192.168.2.108; Command display logfile in view shell failed to be matched.
...
If the device is generating a large number of log messages, use the following methods to reduce log generation:
¡ Disable the log output feature for some service modules.
¡ Execute the info-center logging suppress command to disable log output for a module.
¡ Execute the info-center logging suppress duplicates command to enable duplicate log suppression.
If the device has not generated a large number of log messages, proceed to step 6.
6. Collect CPU usage information, and identify the service modules where the CPU usage is high.
a. Identify the tasks that are consuming high CPU usage.
# Execute the display process cpu command to view the tasks that occupy the most CPU resources within a period. This example displays information about slot 1.
<Sysname> display process cpu slot 1
CPU utilization in 5 secs: 0.4%; 1 min: 0.2%; 5 mins: 0.2%
JID 5Sec 1Min 5Min Name
1 0.0% 0.0% 0.0% scmd
2 5.5% 5.1% 5.0% [kthreadd]
3 0.0% 0.0% 0.0% [ksoftirqd/0]
...
If a process has a CPU usage higher than 3% (for reference only), further investigation is required for that process.
# Execute the monitor process dumbtty command to view the real-time CPU usage of a process. This example displays information about CPU 0 for slot 1.
<Sysname> system-view
[Sysname] monitor process dumbtty slot 1 cpu 0
206 processes; 342 threads; 5134 fds
Thread states: 4 running, 338 sleeping, 0 stopped, 0 zombie
CPU0: 99.04% idle, 0.00% user, 0.96% kernel, 0.00% interrupt, 0.00% steal
CPU1: 98.06% idle, 0.00% user, 1.94% kernel, 0.00% interrupt, 0.00% steal
CPU2: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal
CPU3: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal
CPU4: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal
Memory: 7940M total, 5273M available, page size 4K
JID PID PRI State FDs MEM HH:MM:SS CPU Name
322 322 115 R 0 0K 01:48:03 20.02% [kdrvfwdd2]
323 323 115 R 0 0K 01:48:03 20.02% [kdrvfwdd3]
324 324 115 R 0 0K 01:48:03 20.02% [kdrvfwdd4]
376 376 120 S 22 159288K 00:00:07 0.37% diagd
1 1 120 S 18 30836K 00:00:02 0.18% scmd
379 379 120 S 22 173492K 00:00:11 0.18% devd
2 2 120 S 0 0K 00:00:00 0.00% [kthreadd]
3 3 120 S 0 0K 00:00:02 0.00% [ksoftirqd/0]
…
- In the output from the monitor process dumbtty command, find the JIDs of processes with CPU usage higher than 3% (for reference only). Then, execute the display process job command for these processes to collect detailed information about the processes, and identify whether the processes are running on the control core.
If the LAST_CPU field value in the output from the display process job command is the ID of a control core (for example, 0 or 1), the process is running on a CPU control core and further investigation is required. If the LAST_CPU field value is not the ID of a control core, the process is running on a CPU forwarding core. In this case, no action is required. Proceed to step 7. Take the pppd process as an example. The output shows that this process contains multiple threads, all of which are running on the control cores.
<Sysname> display process name pppd
Job ID: 515
PID: 515
Parent JID: 1
Parent PID: 1
Executable path: /sbin/pppd
Instance: 0
Respawn: ON
Respawn count: 1
Max. spawns per minute: 12
Last started: Wed Nov 3 09:52:00 2021
Process state: sleeping
Max. core: 1
ARGS: --MaxTotalLimit=2000000 --MaxIfLimit=65534 --CmdOption=0x01047fbf --bSaveRunDb --pppoechastenflag=1 --pppoechastennum=6 --pppoechastenperiod=60 --pppoechastenblocktime=300 --pppchastenflag=1 --pppchastennum=6 --pppchastenperiod=60 --pppchastenblocktime=300 --PppoeKChasten --bSoftRateLimit --RateLimitToken=2048
TID LAST_CPU Stack PRI State HH:MM:SS:MSEC Name
515 0 136K 115 S 0:0:0:90 pppd
549 0 136K 115 S 0:0:0:0 ppp_misc
557 0 136K 115 S 0:0:0:10 ppp_chasten
610 0 136K 115 S 0:0:0:0 ppp_work0
611 1 136K 115 S 0:0:0:0 ppp_work1
612 1 136K 115 S 0:0:0:0 ppp_work2
613 1 136K 115 S 0:0:0:0 mp_main
618 1 136K 115 S 0:0:0:110 pppoes_main
619 1 136K 115 S 0:0:0:100 pppoes_mesh
620 1 136K 115 S 0:0:0:120 l2tp_mesh
621 1 136K 115 S 0:0:0:20 l2tp_main
- For a process running on the control core with CPU usage higher than 5%, check the Name field value to identify whether the process is a user-mode process.
If the Name field for a process includes square brackets ([ ]), the process is a kernel thread, and you do not need to execute the monitor thread dumbtty command. If the Name field for a process does not include square brackets ([ ]), the process is a user process and might contain multiple threads. For user processes with multithreading, execute the monitor thread dumbtty command. If the LAST_CPU field of a thread in the output corresponds to the ID of a CPU control core, and the CPU field value is greater than 5%, this thread might cause high usage on the control core. Further investigation is then required.
<Sysname> system-view
[Sysname] monitor thread dumbtty slot 1 cpu 0
206 processes; 342 threads; 5134 fds
Thread states: 4 running, 338 sleeping, 0 stopped, 0 zombie
CPU0: 98.06% idle, 0.97% user, 0.97% kernel, 0.00% interrupt, 0.00% steal
CPU1: 97.12% idle, 0.96% user, 0.96% kernel, 0.96% interrupt, 0.00% steal
CPU2: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal
CPU3: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal
CPU4: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal
Memory: 7940M total, 5315M available, page size 4K
JID TID LAST_CPU PRI State HH:MM:SS MAX CPU Name
322 322 2 115 R 00:04:21 0 20.15% [kdrvfwdd2]
323 323 3 115 R 00:04:21 0 20.15% [kdrvfwdd3]
324 324 4 115 R 00:04:21 0 20.15% [kdrvfwdd4]
1 1 1 120 S 00:00:02 21 0.19% scmd
376 376 1 120 S 00:00:00 1 0.19% diagd
2 2 0 120 S 00:00:00 0 0.00% [kthreadd]
...
b. Identify the stacks of an abnormal task.
Execute the follow job command in probe view to identify the stacks of an abnormal task. The following takes the pppd process (with process ID 515) in slot 1 on the device as an example.
<Sysname> system-view
[Sysname] probe
[Sysname-probe] follow job 515 slot 1
Attaching to process 515 (pppd)
Iteration 1 of 5
------------------------------
Thread LWP 515:
Switches: 3205
User stack:
#0 0x00007fdc2a3aaa8c in epoll_wait+0x14/0x2e
#1 0x0000000000441745 in ppp_EpollSched+0x35/0x5c
#2 0x0000000000000004 in ??
Kernel stack:
[<ffffffff811f0573>] ep_poll+0x2f3/0x370
[<ffffffff811f06c0>] SyS_epoll_wait+0xd0/0xe0
[<ffffffff814aed79>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
Thread LWP 549:
Switches: 20
User stack:
#0 0x00007fdc2a3aaa8c in epoll_wait+0x14/0x2e
#1 0x00000000004435d4 in ppp_misc_EpollSched+0x44/0x6c
Kernel stack:
[<ffffffffffffffff>] 0xffffffffffffffff
...
c. Identify the task name based on steps a and b, and then find the corresponding service module according to the task name to locate and resolve issues in the service module. For example, if the CPU usage of the snmpd task is high, an SNMP attack might occur, or the NMS frequently accesses the device. Then, further troubleshooting is required for the SNMP service module. If the CPU usage of the nqad task is high, the NQA detection might be performed too frequently. Then, further troubleshooting is required for the NQA service module.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
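The thread check described above for the monitor thread dumbtty output can be sketched in Python as follows. This is a minimal sketch, not an H3C tool: it assumes the column layout shown in the sample output, the function name busy_threads is hypothetical, and the pppd row in the sample text is invented for illustration.

```python
import re

# One thread row in the sample layout:
# JID TID LAST_CPU PRI State HH:MM:SS MAX CPU Name
ROW = re.compile(
    r"^\s*(?P<jid>\d+)\s+(?P<tid>\d+)\s+(?P<last_cpu>\d+)\s+(?P<pri>\d+)\s+"
    r"(?P<state>\S+)\s+(?P<time>\d+:\d+:\d+)\s+(?P<max>\d+)\s+"
    r"(?P<cpu>\d+(?:\.\d+)?)%\s+(?P<name>\S+)\s*$"
)


def busy_threads(output, control_core, threshold=5.0):
    """Return (name, tid, cpu, kind) for threads that last ran on
    control_core and whose CPU usage is above threshold percent."""
    hits = []
    for line in output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header, summary, or "..." lines
        cpu = float(m["cpu"])
        if int(m["last_cpu"]) != control_core or cpu <= threshold:
            continue
        name = m["name"]
        # Names in square brackets are kernel threads; others are user threads.
        kind = "kernel" if name.startswith("[") and name.endswith("]") else "user"
        hits.append((name, int(m["tid"]), cpu, kind))
    return hits


SAMPLE = """\
JID TID LAST_CPU PRI State HH:MM:SS MAX CPU Name
322 322 2 115 R 00:04:21 0 20.15% [kdrvfwdd2]
515 515 1 120 R 00:10:02 5 37.50% pppd
1 1 1 120 S 00:00:02 21 0.19% scmd
2 2 0 120 S 00:00:00 0 0.00% [kthreadd]
"""
# The pppd row above is hypothetical and is not taken from a real device.

if __name__ == "__main__":
    for name, tid, cpu, kind in busy_threads(SAMPLE, control_core=1):
        print(f"{name} (TID {tid}) uses {cpu}% on the control core: {kind} thread")
```

A user thread reported by this sketch is a candidate for the follow job command. A kernel thread needs no further monitor thread dumbtty analysis.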
Related alarm and log messages
Alarm messages
· hh3cEntityExtCpuUsageThresholdNotfication
· hh3cEntityExtCpuUsageThresholdRecover
· hh3cCpuUsageSevereNotification
· hh3cCpuUsageSevereRecoverNotification
· hh3cCpuUsageMinorNotification
· hh3cCpuUsageMinorRecoverNotification
Log messages
· DIAG/5/CPU_MINOR_RECOVERY
· DIAG/4/CPU_MINOR_THRESHOLD
· DIAG/5/CPU_SEVERE_RECOVERY
· DIAG/3/CPU_SEVERE_THRESHOLD
Troubleshooting virtual technology issues
IRF issues
IRF setup failure
Symptom
Several devices cannot form an IRF fabric, or a new member device cannot join an existing IRF fabric.
Common causes
The following are the common causes of this type of issue:
· When you use member devices to set up a new IRF fabric, the total number of IRF member devices exceeds the upper limit. When you add a new member device to an existing IRF fabric, the number of existing IRF member devices has reached the upper limit in that IRF fabric.
· The device configuration does not meet the IRF setup requirements.
· The IRF physical interfaces, cables, and physical topology do not meet the IRF setup requirements. As a result, the IRF links cannot come up.
Troubleshooting flow
Figure 28 shows the troubleshooting flowchart.
Figure 28 Flowchart for troubleshooting IRF setup failure
Solution
IMPORTANT: This section only covers the routine requirements for setting up an IRF fabric. For more information about the requirements for setting up an IRF fabric, see IRF configuration in the configuration guides for the product. |
1. Identify whether the number of IRF member devices has reached the maximum value supported by the system.
Execute the display irf command to view the number of member devices in the current IRF fabric. If the number of IRF member devices has reached the maximum value supported by the system, you cannot add any member device to the IRF fabric.
The maximum number of member devices in an IRF fabric varies by device model.
2. Verify that all member devices run the same version of software.
Execute the display version command to display the current software version on each device. Only devices running the same software version can form an IRF fabric.
Typically, the IRF auto-update feature (enabled by default) can automatically synchronize the software version of a member device with the software version of the master device. However, the synchronization might fail when the gap between the software versions is large. In this case, you must manually upgrade the software of that member device.
If the member device has two MPUs, you must upgrade software for both the MPUs to ensure software consistency across them.
3. Verify that the IRF configuration on each member device meets the IRF setup requirements.
a. Verify that all member devices are operating in IRF mode.
Some products are shipped in IRF mode and do not support mode conversion. Some products are shipped in standalone mode and support mode conversion. If a device supports the display irf link or display irf topology command, the device is operating in IRF mode. If a device does not support either of the commands, the device is operating in standalone mode. To enable IRF mode for the device, execute the chassis convert mode irf command in system view.
<Sysname> display irf ?
> Redirect it to a file
>> Redirect it to a file in append mode
configuration IRF configuration that will be valid after reboot
link Display link status
topology Topology information
| Matching output
<cr>
b. Verify that the member ID of each member device is unique across the IRF fabric.
Execute the display irf command to display the member IDs of the member devices in the IRF fabric. Each member device in the IRF fabric must use a unique member ID. Devices that use the same member ID cannot establish an IRF fabric or join the same IRF fabric. The default member ID for a device is 1. In standalone mode, you can change the IRF member ID of a device by using the irf member command. In IRF mode, you can change the IRF member ID of a device by using the irf member renumber command. For the new member ID to take effect, you must save the configuration and reboot the device.
c. Verify that each member device is shipped with a unique bridge MAC address.
Member devices shipped with the same bridge MAC address cannot join the same IRF fabric. Typically, each device is shipped with a unique bridge MAC address across the network. If IRF setup fails and the Failed to stack because of the same bridge MAC addresses message is generated, two devices are shipped with the same bridge MAC address. In this case, use the irf mac-address command to change the bridge MAC address on one of the devices. (Support for the irf mac-address command depends on the device model.)
d. Verify that all member devices in the same IRF fabric use the same IRF domain ID.
The IRF domain ID does not affect IRF fabric setup and merge, but it affects multi-active detection (MAD). To ensure that MAD can operate correctly, make sure all member devices in the same IRF fabric use the same IRF domain ID. By default, the IRF domain ID is 0. To obtain the IRF domain ID of a device, execute the display irf command on that device and check the value in the Domain ID field of the command output. If the IRF domain ID of a device is different from that of the other devices, execute the irf domain command to change the IRF domain ID on the device.
4. Verify that the IRF ports are in up state.
An IRF port is a logical interface that connects IRF member devices. To use an IRF port, you must bind a minimum of one physical interface to it. To obtain the status of IRF ports, execute the display irf topology command and check the value in the Link field of the command output.
<Sysname> display irf topology
Topology Info
-------------------------------------------------------------------------
IRF-Port1 IRF-Port2
MemberID Link neighbor Link neighbor Belong To
2 DIS --- UP 1 5e40-08d9-0104
1 UP 2 DIS --- 5e40-08d9-0104
¡ If the value of the Link field is UP for an IRF port on a member device, the IRF port is correctly connected and no action is required.
¡ If the value of the Link field is DIS for an IRF port on a member device, no IRF physical interfaces have been bound to the IRF port. If binding IRF physical interfaces to the IRF port is required, execute the port group interface command in IRF port view to bind IRF physical interfaces to the IRF port.
¡ If the value of the Link field is DOWN for an IRF port on a member device, execute the display irf link command to examine whether the IRF physical interfaces bound to the IRF port are in UP state.
- If a minimum of one IRF physical interface is up when the IRF port is down, the configuration of the IRF port might not be activated. To activate the IRF port configuration, execute the irf-port-configuration active command in system view.
- If no IRF physical interfaces are in UP state, proceed to step 5 to troubleshoot the IRF physical interface issue.
¡ If the value of the Link field is TIMEOUT for an IRF port on a member device, the IRF hello packets have timed out and the IRF link has communication issues. Perform the following tasks to locate the timeout issue of IRF packets:
- Identify whether the IRF packet exchange failure is caused by an anomaly of the neighboring IRF port. For this purpose, log in to the neighboring device at the other end of the IRF link, execute the display irf topology and display irf link commands on the neighboring device, and then locate the issue based on the command output.
- Verify that no network loops exist on the IRF fabric, as they lead to packet loss. To identify whether a network loop exists, execute the display counters rate inbound interface command to display the packet rate statistics of the IRF physical interfaces and examine whether a packet storm has occurred on the IRF link. If a packet storm exists, check for a physical loop and examine whether the VLAN and STP settings are correct. If a physical loop exists or the settings are incorrect, remove the loop or correct the settings to resolve the packet storm issue.
- Execute the display device command to examine whether the switching fabric modules are operating correctly. If not, first troubleshoot the issue with the switching fabric module.
¡ If the value of the Link field is ISOLATE for an IRF port on a member device, the member device is isolated. In this case, execute the display logbuffer | include STM stackability check command, and then proceed according to the command output.
- If the command output includes the STM stackability check: Product series is inconsistency message, the model of the member device does not meet the IRF setup requirements. In this case, proceed to step 7.
- If the command output includes the STM stackability check: Product xxx is inconsistency message, where xxx might represent the system operating mode or other settings that require consistency across member devices, the current system parameter configuration does not meet the IRF setup requirements. In this case, proceed to step 8.
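The mapping from Link field values to follow-up actions can be summarized in a short Python sketch. It assumes the display irf topology layout shown in the sample output above. The names ACTIONS and triage are hypothetical, and the action strings only abbreviate the guidance in this step.

```python
import re

# Abbreviated follow-up actions for each Link field value described in this step.
ACTIONS = {
    "UP": "no action required",
    "DIS": "no physical interface bound; bind one if the IRF port is needed",
    "DOWN": "check the bound physical interfaces with 'display irf link'",
    "TIMEOUT": "IRF hello packets timed out; check the neighbor, loops, and fabric modules",
    "ISOLATE": "member isolated; check the STM stackability check log messages",
}

# MemberID, Link and neighbor of IRF-Port1, Link and neighbor of IRF-Port2, Belong To
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def triage(output):
    """Return (member_id, irf_port, link_state, action) for every IRF port."""
    findings = []
    for line in output.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # title, separator, and header lines
        member, link1, _nbr1, link2, _nbr2, _belong_to = m.groups()
        for port, link in ((1, link1), (2, link2)):
            findings.append((int(member), port, link, ACTIONS.get(link, "unknown state")))
    return findings


SAMPLE = """\
Topology Info
-------------------------------------------------------------------------
IRF-Port1 IRF-Port2
MemberID Link neighbor Link neighbor Belong To
2 DIS --- UP 1 5e40-08d9-0104
1 UP 2 DIS --- 5e40-08d9-0104
"""

if __name__ == "__main__":
    for member, port, link, action in triage(SAMPLE):
        print(f"member {member} IRF-port {port}: {link} -> {action}")
```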
5. Verify that the IRF physical interfaces are in up state.
Execute the display irf link command to check the state of IRF physical interfaces.
¡ If the value of the Interface field is disable for an IRF port, no IRF physical interfaces have been bound to the IRF port.
¡ If the value of the Interface field for an IRF port is one or multiple physical interface names, continue to check the Status field. The value and meaning of the Status field are as follows:
- UP—An IRF physical link is up. In this state, no action is required.
- DOWN—An IRF physical link is down. In this case, verify that the transceiver module and the fiber or cable of the IRF physical interface are operating correctly. You must use a physical interface that meets the product requirements as an IRF physical interface and use a connection medium that meets the product requirements to connect the IRF physical interface. If the transceiver module and the fiber or cable of the IRF physical interface are operating correctly, proceed to step 6.
- ADM—An IRF physical interface is shut down by using the shutdown command. In this state, the IRF physical interface is administratively down. To bring up the IRF physical interface, you must execute the undo shutdown command.
- ABSENT—An IRF physical interface does not exist. You can insert the card or expansion interface module that hosts the interface.
6. Verify that the IRF physical connections meet the IRF connection requirements.
Perform the following operations to locate an IRF physical connection issue:
a. On each member device, execute the display irf configuration command to view the binding relationship between IRF ports and IRF physical interfaces. Verify that the IRF physical interfaces bound to IRF ports are consistent with those on the IRF physical connections. If not, reconfigure the IRF port bindings or reconnect physical interfaces.
b. Verify that the IRF physical interfaces are correctly connected. Make sure the IRF physical interfaces of IRF-port 1 on one member device are connected to the IRF physical interfaces of IRF-port 2 on another member device. If the IRF fabric contains only two member devices, you must connect them in a daisy-chain topology rather than a ring topology.
7. Verify that the hardware of the member devices meets the IRF setup requirements.
You must use hardware that meets the IRF setup requirements to set up an IRF fabric. For example, the device model, MPUs, interface modules, and IRF physical interfaces must meet the IRF setup requirements. You can perform the following tasks to determine whether the device hardware meets the IRF setup requirements:
# Execute the display version command to check the device model.
<Sysname> display version
H3C Comware Software, Version 7.1.070, Alpha 704228
Copyright (c) 2004-2021 New H3C Technologies Co., Ltd. All rights reserved.
H3C S12508X-AF uptime is 0 weeks, 0 days, 2 hours, 31 minutes
Last reboot reason : Cold reboot
...
# Execute the display device command to check the models of the MPUs and interface modules.
...
# Execute the display interface command to check the rate and type of each IRF physical interface.
...
8. Verify that the system parameter settings meet the IRF setup requirements.
To set up an IRF fabric, all member devices must use the same system parameter settings, including the same system operating mode, VXLAN hardware resource mode, route hardware resource mode, and maximum number of ECMP routes. (The restrictions vary by device model.)
¡ To display the system operating mode on a device, use the display system-working-mode command. To change the system operating mode of the device, use the system-working-mode command. For the mode change to take effect, you must save the configuration and reboot the device.
¡ To display the hardware resource modes on a device, use the display hardware-resource command. To change the VXLAN and route hardware resource modes of the device, use the hardware-resource vxlan and hardware-resource routing-mode commands, respectively. For the mode changes to take effect, you must save the configuration and reboot the device.
¡ To display the maximum number of IPv4 ECMP routes and the maximum number of IPv6 ECMP routes supported by the system, use the display max-ecmp-num and display ipv6 max-ecmp-num commands, respectively. To change the maximum number of IPv4 ECMP routes and the maximum number of IPv6 ECMP routes, use the max-ecmp-num and ipv6 max-ecmp-num commands, respectively. For the changes to take effect, you must save the configuration and reboot the device.
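Once you have recorded the settings reported by each member device, the consistency requirement in this step can be checked with a small Python sketch. The function name inconsistent_settings, the dictionary keys, and the sample values are all hypothetical. The sketch does not collect anything from a device.

```python
def inconsistent_settings(members):
    """Compare per-member settings.

    members maps a member ID to a dict of {setting name: value}.
    Returns {setting name: {member ID: value}} for every setting
    whose value is not identical on all members.
    """
    names = set()
    for settings in members.values():
        names.update(settings)

    diffs = {}
    for name in sorted(names):
        # A setting missing on a member shows up as None and counts as a mismatch.
        values = {member_id: s.get(name) for member_id, s in members.items()}
        if len(set(values.values())) > 1:
            diffs[name] = values
    return diffs


# Hypothetical values copied by hand from the display commands on two members.
MEMBERS = {
    1: {"system-working-mode": "mode-a", "max-ecmp-num": 8},
    2: {"system-working-mode": "mode-a", "max-ecmp-num": 16},
}

if __name__ == "__main__":
    for name, values in inconsistent_settings(MEMBERS).items():
        print(f"{name} differs across members: {values}")
```

Any setting reported here must be aligned on all member devices. The changes take effect only after you save the configuration and reboot the device.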
9. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module name: HH3C-STACK-MIB
· hh3cStackPhysicalIntfLinkDown(1.3.6.1.4.1.25506.2.91.6.0.8)
· hh3cStackPhysicalIntfRxTimeout (1.3.6.1.4.1.25506.2.91.6.0.9)
Log messages
· STM/3/STM_LINK_DOWN
· STM/2/STM_LINK_TIMEOUT
· STM/6/STM_LINK_UP
· STM/4/STM_SAMEMAC
· STM/3/STM_SOMER_CHECK
Unexpected reboot of an IRF member device
Symptom
The master device or a subordinate device in an IRF fabric reboots unexpectedly. As a result, the IRF fabric splits.
Common causes
The following are the common causes of this type of issue:
· The subordinate device automatically reboots to load startup software images from the master device.
· IRF merge causes the subordinate device to reboot.
· A software or hardware fault causes the device to reboot unexpectedly in an attempt to fix the fault.
Troubleshooting flow
Figure 29 shows the troubleshooting flowchart.
Figure 29 Flowchart for troubleshooting unexpected reboot of an IRF member device
Solution
1. Identify whether the rebooted device is a subordinate device.
¡ If the device is a subordinate device, proceed to step 2.
¡ If the device is the master device, proceed to step 4.
2. Identify whether the reboot is caused by the software auto-update feature.
¡ If the reboot is caused by the software auto-update feature, no action is required.
¡ If the reboot is not caused by the software auto-update feature, proceed to step 3.
To identify whether the reboot of the subordinate device is caused by the software auto-update feature, execute the display system internal irf msg command in probe view. If the command output includes the Version is different, and the sender CPU MAC is xxxx-xxxx-xxxx (chassis xx slot xx). message, the reboot of the subordinate device with the CPU MAC of xxxx-xxxx-xxxx is caused by the software auto-update feature.
3. Identify whether the reboot is caused by an IRF merge.
¡ If the reboot is caused by an IRF merge, locate and resolve the causes of the IRF split and merge to prevent the same issue from causing an IRF split and merge again.
¡ If the reboot is not caused by an IRF merge, proceed to step 4.
To identify whether the reboot of the subordinate device is caused by an IRF merge:
¡ After the device reboots, execute the display kernel reboot command on the IRF fabric to obtain the reboot reason. If the value of the Reason field is 0x7, the device rebooted because of an IRF merge. The Slot field shows the number of the slot that triggered the reboot, and the Target Slot field shows the number of the slot that was rebooted.
<Sysname> display kernel reboot 1
--------------------- Reboot record 1 ---------------------
Recorded at : 2021-12-06 00:10:05.440616
Occurred at : 2021-12-06 00:10:05.440616
Reason : 0x7
Thread : STM_Main (TID: 232)
Context : thread context
Slot : 1
Target Slot : 2
Cpu : 0
VCPU ID : 2
Kernel module info : module name (system) module address (0xffffffffc0074000)
module name (addon) module address (0xffffffffc0008000)
¡ Execute the display system internal irf msg | include reboot command in probe view on the IRF fabric. If the master device has sent a reboot message, the reboot of the subordinate device is caused by an IRF merge.
19> Send reboot pkt, src_addr 5e40-08d9-0104 (chassis 1 slot 1), at 2022/1/5 15:42:48:386
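The Reason field check on a display kernel reboot record can be sketched in Python as follows. The sketch assumes the "field : value" layout of the sample record above. The function names parse_reboot_record and is_irf_merge are hypothetical.

```python
IRF_MERGE_REASON = 0x7  # Reason value that this section associates with an IRF merge


def parse_reboot_record(output):
    """Parse 'Field : value' lines of one reboot record into a dict."""
    fields = {}
    for line in output.splitlines():
        # Split at the first colon only, so timestamps and "(TID: 232)" stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_irf_merge(fields):
    """Return True if the record's Reason field equals 0x7."""
    return int(fields.get("Reason", "0"), 16) == IRF_MERGE_REASON


SAMPLE = """\
--------------------- Reboot record 1 ---------------------
Recorded at : 2021-12-06 00:10:05.440616
Occurred at : 2021-12-06 00:10:05.440616
Reason : 0x7
Thread : STM_Main (TID: 232)
Context : thread context
Slot : 1
Target Slot : 2
"""

if __name__ == "__main__":
    record = parse_reboot_record(SAMPLE)
    if is_irf_merge(record):
        print(f"slot {record['Slot']} triggered an IRF merge reboot "
              f"of slot {record['Target Slot']}")
```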
4. Examine whether the reboot is caused by a software or hardware fault.
Execute the display version command, check the Reboot Cause field for the reboot cause, and handle the reboot issue according to the reboot cause as shown in Table 2.
<Sysname> display version
...
Reboot Cause : ColdReboot
[SubSlot 0] 24GE+4SFP Plus+POE
Table 2 Device reboot causes and recommended actions
Value for the Reboot Cause field | Reboot cause description | Recommended actions
AutoUpdateReboot | The reboot was caused by an automatic software upgrade. | No action is required.
BootwareBackupReboot | Bootware backup area reboot. | Collect log messages and diagnostic messages, and then contact Technical Support for help.
ColdReboot | The reboot was caused by a power cycle. | Check the power supply environment of the device to ensure that the power supply module can provide power correctly to the device.
CryptographicModuleSelftestsFailedReboot | The reboot was caused by an algorithm library self-test failure. | Upgrade the software version as soon as possible.
CryptotestFailReboot | The reboot was caused by a cryptographic algorithm library self-check failure. | Upgrade the software version as soon as possible.
DeadLoopReboot | The reboot was caused by a kernel thread dead loop. | Collect log messages, diagnostic messages, and the command output from the display kernel deadloop 20 verbose command executed for the reboot slot, and then contact Technical Support for help.
DEVHandShakeReboot | The reboot was caused by a device management handshake failure. | Execute the display device command to identify whether the active MPU is in Normal state. If the state is not Normal, the MPU might fail. You must resolve the MPU issue first.
GoldMonReboot | The Generic OnLine Diagnostics (GOLD) module detected an exception. | Perform the following operations to locate the reboot cause: 1. Execute the display diagnostic content command, check the Correct-action field, and confirm that the corrective action is reboot. Then, obtain the time when the device was rebooted and troubleshoot issues that occurred around that time. 2. Execute the display diagnostic event-log command to display GOLD log entries. 3. Locate the reboot cause based on the command output and resolve the issue.
IRFMergeReboot | The reboot was caused by an IRF merge. | An IRF link failure can cause an IRF split. Once the IRF link is recovered, the IRF fabric will automatically merge. To prevent the same issue from causing an IRF split and merge again, locate and resolve the issue.
KernelAbnormalReboot | A CPU, host memory, or software issue led to a system kernel error. | Collect log messages, diagnostic messages, and the command output from the display kernel exception 10 verbose and display kernel reboot 20 verbose commands, and then contact Technical Support for help.
KeyReboot | The RESET key was pressed. | Avoid accidental operations.
LicenseTimeoutReboot | The license has expired. | Install a formal license as soon as possible.
MasterLostReboot | The master slot was rebooted while the current slot was performing a bulk backup operation. | Collect log messages and diagnostic messages, and then contact Technical Support for help.
MemoryexhaustReboot | The amount of free memory is lower than the threshold value. | Identify the cause of high memory usage and resolve the high memory usage fault accordingly. For example, too many ACL entries can cause high memory usage.
PdtReboot | The reboot was required by the driver. | Collect log messages and diagnostic messages, and then contact Technical Support for help.
SelfReboot | The current slot was reset. | Collect log messages and diagnostic messages, and then contact Technical Support for help.
StandbyCannotUpdateReboot | The standby MPU cannot be upgraded to the active MPU. | Collect log messages and diagnostic messages, and then contact Technical Support for help.
StandbySwitchReboot | The original active MPU was rebooted after an active/standby switchover. | Identify the cause of the active/standby switchover and resolve the fault that causes the active/standby switchover to prevent another unexpected active/standby switchover. For example, software upgrade can cause an active/standby switchover.
UserReboot | The reboot was caused by a manual operation through the CLI, the network manager, or the Web interface. | No action is required.
WarmReboot | The reboot might be caused by various reasons, for example, poor contact of board pins. | Collect log messages and diagnostic messages, and then contact Technical Support for help.
WatchDogReboot | The watchdog detected a system fault, for example, a CPU, memory, software, or hardware fault. | Use the display hardware-failure-detection command to locate the cause of the fault based on the command output, and troubleshoot the fault.
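The lookup in Table 2 can be sketched in Python as a first-pass triage of the Reboot Cause field in the display version output. The grouping into three categories is a simplification made for this sketch. The names reboot_causes and classify are hypothetical. Table 2 remains the authoritative list of actions.

```python
import re

# Causes for which Table 2 states that no action is required.
NO_ACTION = {"AutoUpdateReboot", "UserReboot"}

# Causes for which Table 2 recommends collecting logs and diagnostics
# and then contacting Technical Support.
CONTACT_SUPPORT = {
    "BootwareBackupReboot", "DeadLoopReboot", "KernelAbnormalReboot",
    "MasterLostReboot", "PdtReboot", "SelfReboot",
    "StandbyCannotUpdateReboot", "WarmReboot",
}


def reboot_causes(output):
    """Return every Reboot Cause value found in display version output."""
    return re.findall(r"^\s*Reboot Cause\s*:\s*(\S+)", output, flags=re.MULTILINE)


def classify(cause):
    """Map a Reboot Cause value to a coarse triage category."""
    if cause in NO_ACTION:
        return "no action required"
    if cause in CONTACT_SUPPORT:
        return "collect logs and diagnostics, then contact Technical Support"
    return "investigate locally as described in Table 2"


SAMPLE = """\
Reboot Cause : ColdReboot
[SubSlot 0] 24GE+4SFP Plus+POE
"""

if __name__ == "__main__":
    for cause in reboot_causes(SAMPLE):
        print(f"{cause}: {classify(cause)}")
```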
5. If the issue persists, collect the following information and contact Technical Support:
¡ The output of the following commands. In this example, the active MPU is in slot 16, the standby MPU is in slot 17, and the standby MPU has rebooted:
- Execute the following commands in any view:
display version
display device
display diagnostic-information
display kernel deadloop 20 verbose slot 16
display kernel exception 10 verbose slot 16
display kernel reboot 20 verbose slot 16
- Execute the following commands in probe view to collect information:
local logbuffer slot 17 display
local logbuffer slot 17 display from-highmemory
display reboot last-time slot 17
display system internal version
display diag-msg start-msg slot 17
NOTE: Support for these commands depends on the device model and software version. |
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
· DEV/1/AUTO_SWITCH_FAULT_REBOOT
· DEV/5/BOARD_REBOOT
· DEV/1/BOARD_RUNNING_FAULT_REBOOT
· DEV/5/CHASSIS_REBOOT
· DEV/5/SUBCARD_REBOOT
· DEV/5/SYSTEM_REBOOT
· STM/4/STM_MERGE
Troubleshooting interface issues
Tunnel interface issues
Tunnel interface instability
Symptom
After you configure a P2P tunnel (for example, a GRE, IPv4, or IPv6 tunnel), the local tunnel interface is in up state. You can ping the IP address of the remote tunnel interface from the local tunnel interface. However, the local tunnel interface is unstable. The following symptoms exist:
· The tunnel interface is repeatedly coming up and going down.
· The tunneled packet loss rate is high and the transmission rate is low.
This section uses a GRE/IPv4 tunnel as an example to describe the troubleshooting procedure.
Common causes
The following are the common causes of this type of issue:
· Routes destined for the tunnel destination address are flapping, which causes the tunnel to flap.
· The same source and destination addresses are configured on the device for two tunnels. As a result, only one tunnel can come up.
· Keepalive is enabled on the GRE tunnel interface. However, the device cannot correctly send or receive GRE keepalive packets. As a result, the device places the tunnel in down state.
· The device does not have sufficient resources to successfully issue the tunnel to the hardware. As a result, the tunnel is down on the physical layer.
· The configuration on the tunnel interface is inappropriate, leading to the loss of tunneled packets.
Troubleshooting flow
Figure 30 shows the troubleshooting flowchart.
Figure 30 Flowchart for troubleshooting tunnel interface instability
Solution
1. Examine whether routes are flapping.
Execute the debugging tunnel event command to enable tunneling event debugging. If the system continuously generates route refresh or deletion messages, routes are flapping. In this case, the tunnel is also flapping. The following information shows a sample command output:
<Sysname> debugging tunnel event
<Sysname> %Jun 16 12:49:55:497 2022 Sysname BGP/5/BGP_STATE_CHANGED: -MDC=1; BGP.: 4.4.4.4 state has changed from ESTABLISHED to IDLE for TCP_Connection_Failed event received.
//The system received a TCP connection failure event from the BGP peer at 4.4.4.4. The state of the BGP session has changed from Established to Idle.
%Jun 16 12:49:55:497 2022 Sysname BGP/5/BGP_STATE_CHANGED_REASON: -MDC=1; BGP.: 4.4.4.4 state has changed from ESTABLISHED to IDLE. (Reason: TCP connection failed(No route to host))
//The BGP session established for the BGP peer at 4.4.4.4 has changed from Established state to Idle state due to a TCP connection failure (no route to reach the host).
If routes are flapping, locate the cause of the route flapping based on the route refresh or deletion messages. For example, if the BGP session cannot stably enter the Established state, troubleshoot the issue according to the BGP troubleshooting manual.
If routes are not flapping, proceed to the next step.
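If you save the terminal output to a file, the count of BGP session drops per peer can be sketched in Python as follows. The sketch assumes the BGP_STATE_CHANGED message format shown in the sample above and covers only BGP. The function name bgp_drops is hypothetical. How many drops indicate flapping depends on your network, so the sketch does not set a threshold.

```python
import re
from collections import Counter

# Matches BGP_STATE_CHANGED messages that report a session leaving ESTABLISHED.
# The colon after the mnemonic keeps BGP_STATE_CHANGED_REASON from matching twice.
EVENT = re.compile(
    r"BGP/\d/BGP_STATE_CHANGED:.*?(\d+\.\d+\.\d+\.\d+) "
    r"state has changed from ESTABLISHED to (\w+)"
)


def bgp_drops(log_text):
    """Count, per peer address, how often a BGP session left ESTABLISHED."""
    drops = Counter()
    for line in log_text.splitlines():
        m = EVENT.search(line)
        if m:
            drops[m.group(1)] += 1
    return drops


SAMPLE = (
    "%Jun 16 12:49:55:497 2022 Sysname BGP/5/BGP_STATE_CHANGED: -MDC=1; BGP.: "
    "4.4.4.4 state has changed from ESTABLISHED to IDLE for "
    "TCP_Connection_Failed event received.\n"
    "%Jun 16 12:49:55:497 2022 Sysname BGP/5/BGP_STATE_CHANGED_REASON: -MDC=1; "
    "BGP.: 4.4.4.4 state has changed from ESTABLISHED to IDLE. "
    "(Reason: TCP connection failed(No route to host))\n"
)

if __name__ == "__main__":
    for peer, count in bgp_drops(SAMPLE).items():
        print(f"peer {peer}: {count} session drop(s)")
```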
2. Identify whether the device at each end has tunnels with the same source and destination addresses.
Execute the display interface tunnel command in any view on the device at each end of the tunnel and identify whether the same device has multiple tunnels that use the same source and destination addresses.
<Sysname> display interface Tunnel
Tunnel1
Current state: UP
Line protocol state: UP
Description: Tunnel1 Interface
Bandwidth: 64 kbps
Maximum transmission unit: 1464
Internet protocol processing: Disabled
Output queue - Urgent queuing: Size/Length/Discards 0/100/0
Output queue - Protocol queuing: Size/Length/Discards 0/500/0
Output queue - FIFO queuing: Size/Length/Discards 0/75/0
Last clearing of counters: 15:20:18 Mon 06/13/2022
Tunnel source 1.1.1.1, destination 2.2.2.2
...
If multiple tunnels use the same source and destination addresses on the same device, only one of them can come up. You can execute the undo interface tunnel command to delete unneeded tunnels. If the same device does not have multiple tunnels that use the same source and destination addresses, proceed to the next step.
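On a device with many tunnel interfaces, the duplicate check can be sketched in Python as follows. The sketch assumes the display interface tunnel layout shown in the sample above. The function name duplicate_endpoints is hypothetical, and the Tunnel2 entry in the sample text is invented for illustration.

```python
import re
from collections import defaultdict

NAME = re.compile(r"^(Tunnel\d+)\s*$")
ENDPOINTS = re.compile(r"^\s*Tunnel source (\S+), destination (\S+)\s*$")


def duplicate_endpoints(output):
    """Return {(source, destination): [tunnel names]} for every
    source/destination pair used by more than one tunnel interface."""
    current = None
    by_pair = defaultdict(list)
    for line in output.splitlines():
        m = NAME.match(line)
        if m:
            current = m.group(1)  # start of a new interface section
            continue
        m = ENDPOINTS.match(line)
        if m and current:
            by_pair[(m.group(1), m.group(2))].append(current)
    return {pair: names for pair, names in by_pair.items() if len(names) > 1}


SAMPLE = """\
Tunnel1
Current state: UP
Line protocol state: UP
Description: Tunnel1 Interface
Tunnel source 1.1.1.1, destination 2.2.2.2
Tunnel2
Current state: DOWN
Line protocol state: DOWN
Description: Tunnel2 Interface
Tunnel source 1.1.1.1, destination 2.2.2.2
"""
# The Tunnel2 section above is hypothetical.

if __name__ == "__main__":
    for (src, dst), names in duplicate_endpoints(SAMPLE).items():
        print(f"{', '.join(names)} share source {src} and destination {dst}")
```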
3. Identify whether GRE keepalive is configured and whether GRE keepalive packets can be sent and received correctly.
Execute the display current-configuration interface tunnel command in any view to display the keepalive configuration of the tunnel interface.
<Sysname> display current-configuration interface tunnel
#
interface Tunnel1 mode gre
ip address 10.1.1.2 255.255.255.0
source 12.1.1.4
destination 12.1.1.2
keepalive 3 3
#
On the local end, execute the debugging gre packet command to enable GRE packet debugging to identify whether the local end can correctly receive and send keepalive packets.
<Sysname> debugging gre packet
*Jun 16 12:46:50:350 2022 Sysname GRE/7/packet: -MDC=1;
Tunnel1 packet: Before encapsulation,
12.1.1.2->12.1.1.4 (length = 24)
*Jun 16 12:46:50:350 2022 Sysname GRE/7/packet: -MDC=1;
Tunnel1 packet: After encapsulation,
12.1.1.4->12.1.1.2 (length = 48)
*Jun 16 12:46:50:351 2022 Sysname GRE/7/packet: -MDC=1;
Tunnel1 packet: Before de-encapsulation according to fast-forwarding table,
12.1.1.2->12.1.1.4 (length = 24)
*Jun 16 12:46:50:351 2022 Sysname GRE/7/packet: -MDC=1;
Tunnel1 : Received a keepalive packet.
//Tunnel 1 received a keepalive packet.
On the remote end, enable GRE packet debugging. If the remote end has sent keepalive packets but the local end does not receive any of them, the GRE keepalive packets might fail to pass the local checksum check. As a result, the local tunnel interface goes down. You can execute the undo gre checksum command to disable GRE checksum on the local end or execute the undo keepalive command to disable GRE keepalive.
If the local end can successfully receive keepalive packets from the remote end, proceed to step 4.
4. Verify that the hop limit or TTL and DF bit settings of tunneled packets are properly configured.
Execute the display current-configuration interface tunnel command in any view to check the configuration of the hop limit or TTL and DF bit parameters.
#
interface Tunnel1 mode gre
ip address 10.1.1.2 255.255.255.0
source 12.1.1.4
destination 12.1.1.2
keepalive 3 3
tunnel ttl 1
tunnel dfbit enable
#
If the parameters are not properly configured, tunneled packets might be discarded.
If the hop limit or TTL value is too small, intermediate devices might discard tunneled packets because the TTL expires before the packets reach the tunnel destination. In this case, execute the tunnel ttl command in tunnel interface view to set a proper TTL value according to the actual network configuration.
If the DF bit is set for tunneled packets, intermediate devices might discard tunneled packets if the length of these packets exceeds the MTU of the interfaces on the forwarding path. In this case, set the MTU of each interface on the forwarding path to be greater than the length of tunneled packets. If you cannot ensure that the MTU of each interface on the forwarding path is greater than the length of tunneled packets, do not set the DF bit for tunneled packets.
If the issue persists, proceed to step 5.
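The DF bit constraint above comes down to simple arithmetic, sketched here in Python for a GRE/IPv4 tunnel. The sketch assumes a 20-byte outer IPv4 header without options and a 4-byte basic GRE header, with 4 more bytes each when the GRE checksum or GRE key is in use. The function names are hypothetical, and the path MTU of 1500 bytes is an example value.

```python
IPV4_HEADER = 20      # bytes, outer IPv4 header without options
GRE_BASE_HEADER = 4   # bytes, basic GRE header
GRE_OPTION = 4        # bytes added by each of the checksum and key fields


def gre_overhead(checksum=False, key=False):
    """Bytes added to a packet by GRE/IPv4 encapsulation."""
    return IPV4_HEADER + GRE_BASE_HEADER + GRE_OPTION * (int(checksum) + int(key))


def fits_with_df(inner_len, path_mtu, checksum=False, key=False):
    """Return True if the encapsulated packet fits the smallest MTU on the
    forwarding path, so it is not dropped when the DF bit is set."""
    return inner_len + gre_overhead(checksum, key) <= path_mtu


if __name__ == "__main__":
    path_mtu = 1500  # example value for the smallest MTU on the path
    print("overhead:", gre_overhead())
    print("largest inner packet:", path_mtu - gre_overhead())
    print("1476-byte packet fits:", fits_with_df(1476, path_mtu))
    print("1477-byte packet fits:", fits_with_df(1477, path_mtu))
```

Under these assumptions, the overhead is 24 bytes, which matches the GRE debugging sample in step 3, where a 24-byte packet is 48 bytes long after encapsulation.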
5. Identify whether the system fails to process the event for issuing the tunnel to hardware.
Enable tunneling event debugging, and observe whether the system has tunneled packets or events that have failed to be issued to the kernel or driver. The following information shows an example:
<Sysname> debugging tunnel all
*Jun 16 12:51:25:832 2022 Sysname TUNNEL/7/event: -MDC=1;
Tunnel1 notifies driver: Operation = 4.
TunnelIfIndex = 524, EvilinkIfIndex = 0
VRFIndex = 0, DstVRFIndex = 0
TunnelMode = IPv4 GRE, TransPro = 1
TunnelSrc = 12.1.1.4
TunnelDst = 12.1.1.2
TTL = 255, ToS = 0, DFBit = 0
MTU = 1476, IPv6Mtu = 1476
DrvContext[0] = 0xffffffffffffffff, DrvContext[1] = 0xffffffffffffffff
VNHandle = 0x20000040, ADJIndex = 0xfaf3889c
//Tunnel interface 1 notified the driver to execute operation 4.
*Jun 16 12:51:25:832 2022 Sysname TUNNEL/7/event: -MDC=1;
Processing result of operation 4 for Tunnel1: failed.
//The driver failed to process operation 4 issued by tunnel interface 1.
%Jun 16 12:51:25:832 2022 Sysname IFNET/3/PHY_UPDOWN: -MDC=1; Physical state on the interface Tunnel1 changed to down.
%Jun 16 12:51:25:832 2022 Sysname IFNET/5/LINK_UPDOWN: -MDC=1; Line protocol state on the interface Tunnel1 changed to down.
//Tunnel interface 1 went down.
*Jun 16 12:51:27:350 2022 Sysname TUNNEL/7/event: -MDC=1;
Tunnel1 can't come up because there is not enough hardware resource
//Tunnel 1 cannot come up because of insufficient hardware resources.
If the device generates the event or error messages in Table 3, a hardware fault causes tunnel instability. In this case, contact Technical Support.
Table 3 Debugging messages related to hardware
Field | Description
Tunnelnum can't come up because reason. | Reason why a tunnel interface cannot come up. The value for the reason variable is there is not enough hardware resource.
Failed to save 6RD prefix to DBM. | The system failed to save the IPv6 prefix of the 6RD tunnel to the database in memory (DBM).
Failed to save IPv4 prefix/suffix for 6RD tunnel to DBM. | The system failed to save the IPv4 prefix or suffix of the 6RD tunnel to the DBM.
Failed to save 6RD BR address to DBM. | The system failed to save the BR address of the 6RD tunnel to the DBM.
Failed to send 6RD prefix to kernel. | The system failed to send the 6RD prefix configuration message to the kernel for the tunnel.
Failed to send IPv4 prefix/suffix for 6RD tunnel to kernel. | The system failed to send the 6RD IPv4 configuration message to the kernel for the tunnel.
Failed to send 6RD BR address to kernel. | The system failed to send the 6RD BR address configuration message to the kernel for the tunnel.
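When the debugging output is long, the messages of interest can be filtered offline. The following Python example is a minimal sketch, assuming the debugging output has been saved as text. The patterns are derived from the sample output in step 5 and from Table 3. The function name find_hardware_failures is hypothetical.

```python
import re

# Patterns derived from the sample debugging output and the Table 3 messages.
HARDWARE_PATTERNS = [
    re.compile(r"Processing result of operation \d+ for \S+: failed"),
    re.compile(r"can't come up because"),
    re.compile(r"Failed to (?:save|send) .* to (?:DBM|kernel)"),
]


def find_hardware_failures(log_text):
    """Return the log lines that match any hardware-related pattern."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(p.search(line) for p in HARDWARE_PATTERNS)
    ]


sample = """\
Tunnel1 notifies driver: Operation = 4.
Processing result of operation 4 for Tunnel1: failed.
Tunnel1 can't come up because there is not enough hardware resource
"""
for hit in find_hardware_failures(sample):
    print(hit)
```

Lines returned by this filter are candidates to include when you contact Technical Support.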
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting Layer 2—LAN switching issues
Ethernet link aggregation issues
Down aggregate interface
Symptom
When two devices are connected through link aggregation, the output from the display interface command indicates that an aggregate interface is down.
Common causes
The following are the common causes for this type of issue:
· Incorrect configuration on the aggregate interface.
· Physical link fault on the member ports.
· Failure in sending and receiving LACP protocol packets.
Troubleshooting flow
To resolve this issue:
1. Use the display link-aggregation verbose command to check whether the member ports are in selected state. If a port is in unselected state, use the display interface command to check whether the physical status of the member port is up and eliminate physical faults on the port.
2. Check the local and peer aggregate interface configurations to eliminate configuration faults.
3. Use the debugging link-aggregation lacp packet command to check the LACP packet exchange on the member ports of a dynamic aggregation group.
Figure 31 shows the troubleshooting flowchart.
Figure 31 Flowchart for troubleshooting down aggregate interface
Solution
1. Check whether the physical connections are correct.
Verify that links are connected to the aggregate interface as planned.
If the physical connections are correct, proceed to step 2.
2. Check whether the aggregate interface is manually shut down.
Execute the display interface command to check the physical state of the aggregate interface. If it displays Administratively DOWN, the aggregate interface is manually shut down. Execute the undo shutdown command to enable the aggregate interface. If the aggregate interface has not been manually shut down, proceed to step 3.
3. Check whether the member ports in the aggregation group are up.
Execute the display interface command to identify whether the member ports in the aggregation group are up. If not, follow the troubleshooting procedure for the down interface issue.
If the member ports are up, proceed to step 4.
For example, member port GigabitEthernet 2/0/1 in Layer 2 aggregation group 1 is in unselected state. The output from the display interface command shows that the physical state of GigabitEthernet 2/0/1 is DOWN, which is why the member port is unselected.
<Sysname> display link-aggregation verbose
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected, I -- Individual
Port: A -- Auto port, M -- Management port, R -- Reference port
Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
D -- Synchronization, E -- Collecting, F -- Distributing,
G -- Defaulted, H -- Expired
Aggregate Interface: Bridge-Aggregation1
Aggregation Mode: Static
Loadsharing Type: Shar
Management VLANs: None
Port Status Priority Oper-Key
GE2/0/1 U 32768 1
<Sysname> display interface GigabitEthernet 2/0/1
GigabitEthernet2/0/1
Current state: DOWN
Line protocol state: DOWN
IP packet frame type: Ethernet II, hardware address: 2a41-21c1-0100
Description: GigabitEthernet2/0/1 Interface
Bandwidth: 1000000 kbps
Loopback is not set
Unknown-speed mode, full-duplex mode
Link speed type is autonegotiation, link duplex type is force link
Flow-control is not enabled
Maximum frame length: 9216
Allow jumbo frames to pass
Broadcast max-ratio: 100%
Multicast max-ratio: 100%
Unicast max-ratio: 100%
Known-unicast max-ratio: 100%
PVID: 1
MDI type: Automdix
Port link-type: Access
Tagged VLANs: None
Untagged VLANs: 1
Port priority: 2
Last link flapping: 0 hours 0 minutes 15 seconds
Last clearing of counters: Never
Current system time:2021-08-10 10:15:02
Last time when physical state changed to up:2021-08-09 18:31:43
Last time when physical state changed to down:2021-08-10 10:14:47
Peak input rate: 0 bytes/sec, at 00-00-00 00:00:00
Peak output rate: 0 bytes/sec, at 00-00-00 00:00:00
Last 300 seconds input: 5000 packets/sec 5000 bytes/sec -%
Last 300 seconds output: 5000 packets/sec 5000 bytes/sec -%
Input (total): 5000 packets, 5000 bytes
5000 unicasts, 5000 broadcasts, 5000 multicasts, 0 pauses
Input (normal): 0 packets, 0 bytes
0 unicasts, 0 broadcasts, 0 multicasts, 0 pauses
Input: 5000 input errors, 0 runts, 0 giants, 0 throttles
0 CRC, 0 frame, 0 overruns, 0 aborts
5000 ignored, 0 parity errors
Output (total): 5000 packets, 5000 bytes
5000 unicasts, 5000 broadcasts, 5000 multicasts, 0 pauses
Output (normal): 0 packets, 0 bytes
0 unicasts, 0 broadcasts, 0 multicasts, 0 pauses
Output: 5000 output errors, 0 underruns, 0 buffer failures
5000 aborts, 0 deferred, 0 collisions, 0 late collisions
0 lost carrier, 0 no carrier
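To check many member ports, the two state fields can be extracted from saved display interface output. The following Python example is a minimal sketch, assuming the output uses the Current state and Line protocol state field names shown above. The function name interface_states is hypothetical.

```python
import re


def interface_states(output):
    """Extract the physical state and line protocol state from display interface output."""
    physical = re.search(r"^\s*Current state:\s*(.+)$", output, re.MULTILINE)
    protocol = re.search(r"^\s*Line protocol state:\s*(.+)$", output, re.MULTILINE)
    return (
        physical.group(1).strip() if physical else None,
        protocol.group(1).strip() if protocol else None,
    )


sample = """\
GigabitEthernet2/0/1
Current state: DOWN
Line protocol state: DOWN
Bandwidth: 1000000 kbps
"""
print(interface_states(sample))  # ('DOWN', 'DOWN')
```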
4. Check whether the aggregate interface is in dynamic mode.
¡ If the aggregate interface is in dynamic mode, check whether the peer aggregate interface is also in dynamic mode. Execute the display link-aggregation verbose command in any view to check the aggregation mode of the aggregate interfaces at both ends of the link and ensure that the aggregation modes at both ends are the same.
Taking a Layer 2 aggregate interface as an example, if Aggregation Mode: Dynamic is displayed, the aggregate interface is in dynamic mode:
<Sysname> display link-aggregation verbose bridge-aggregation 10
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected, I -- Individual
Port: A -- Auto port, M -- Management port, R -- Reference port
Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
D -- Synchronization, E -- Collecting, F -- Distributing,
G -- Defaulted, H -- Expired
Aggregate Interface: Bridge-Aggregation10
Creation Mode: Manual
Aggregation Mode: Dynamic
Loadsharing Type: Shar
Management VLANs: None
System ID: 0x8000, 000f-e267-6c6a
Local:
Port Status Priority Index Oper-Key Flag
GE2/0/1 S 32768 61 2 {ACDEF}
GE2/0/2 S 32768 62 2 {ACDEF}
GE2/0/3 S 32768 63 2 {ACDEF}
Remote:
Actor Priority Index Oper-Key SystemID Flag
GE2/0/1(R) 32768 111 2 0x8000, 000f-e267-57ad {ACDEF}
GE2/0/2 32768 112 2 0x8000, 000f-e267-57ad {ACDEF}
GE2/0/3 32768 113 2 0x8000, 000f-e267-57ad {ACDEF}
If the configuration is incorrect, change the aggregate interface on the remote end to dynamic mode. If the configuration is correct, execute the debugging link-aggregation lacp packet command to identify whether LACP packets are sent and received correctly.
In the debugging output, compare the Actor field in the sent packets with the Partner field in the received packets for the member port. If the sys-mac, key, and port-index fields are inconsistent, LACP packet transmission is abnormal. Identify whether the receiving or sending fiber is disconnected. If the sys-mac, key, and port-index fields are consistent, LACP packet transmission is normal. Proceed to step 5.
For example, enable LACP packet debugging on aggregation member port GigabitEthernet 2/0/1 and observe the LACP packets sent and received on this port.
<Sysname> debugging link-aggregation lacp packet all interface gigabitethernet 2/0/1
*Nov 2 15:51:21:15 2007 Sysname LAGG/7/Packet: PACKET.GigabitEthernet2/0/1.send.
size=110, subtype =1, version=1
Actor: type=1, len=20, sys-pri=0x8000, sys-mac=00e0-fc02-0300, key=0x1, pri=0x8000, port-index=0x2, state=0xc5
Partner: type=2, len=20, sys-pri=0x0, sys-mac=0000-0000-0000, key=0x0, pri=0x0, port-index=0x0, state=0x32
Collector: type=3, len=16, col-max-delay=0x0
Terminator: type=0, len=0
*Nov 2 15:55:21:15 2007 Sysname LAGG/7/Packet: PACKET.GigabitEthernet2/0/1.receive.
size=110, subtype =1, version=1
Actor: type=1, len=20, sys-pri=0x8000, sys-mac=00e0-fc00-0000, key=0x1, pri=0x8000, port-index=0x6, state=0xd
Partner: type=2, len=20, sys-pri=0x8000, sys-mac=00e0-fc02-0300, key=0x1, pri=0x8000, port-index=0x2, state=0xc5
Collector: type=3, len=16, col-max-delay=0x0
Terminator: type=0, len=0
¡ If the aggregate interface is in static mode, proceed to step 5.
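The comparison of the Actor field in a sent packet with the Partner field in a received packet can be sketched as follows. This Python example is a minimal sketch, assuming the key=value layout shown in the debugging output above. The function names parse_tlv and lacp_mismatches are hypothetical.

```python
import re

FIELDS = ("sys-mac", "key", "port-index")


def parse_tlv(line):
    """Turn a line such as 'Actor: type=1, key=0x1' into a dict of field values."""
    return dict(re.findall(r"([\w-]+)=([^,\s]+)", line))


def lacp_mismatches(sent_actor, received_partner):
    """Return the fields where the peer's record of the local port differs from what was sent."""
    sent = parse_tlv(sent_actor)
    echoed = parse_tlv(received_partner)
    return [f for f in FIELDS if sent.get(f) != echoed.get(f)]


sent_actor = ("Actor: type=1, len=20, sys-pri=0x8000, sys-mac=00e0-fc02-0300, "
              "key=0x1, pri=0x8000, port-index=0x2, state=0xc5")
received_partner = ("Partner: type=2, len=20, sys-pri=0x8000, sys-mac=00e0-fc02-0300, "
                    "key=0x1, pri=0x8000, port-index=0x2, state=0xc5")
print(lacp_mismatches(sent_actor, received_partner))  # []
```

An empty list means the peer echoes back the same sys-mac, key, and port-index values that the local port advertised.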
5. Check whether the minimum number of selected ports for the aggregate interface affects the selection of member ports.
Execute the display this command in aggregate interface view. If the link-aggregation selected-port minimum command is configured, modify the minimum number of selected ports to meet the selection requirement. When the number of selectable member ports reaches or exceeds the minimum number of selected ports, these member ports become selected and the link state of the aggregate interface changes to up.
If the minimum number of selected ports for the aggregate interface does not affect the selection of the member ports, proceed to step 6.
For example, the minimum number of selected ports for Layer 2 aggregate interface 1 is 2. The aggregation group of Layer 2 aggregate interface 1 has only one member port, so this member port is in unselected state.
[Sysname-Bridge-Aggregation1] display this
#
interface Bridge-Aggregation1
link-aggregation selected-port minimum 2
#
return
[Sysname-Bridge-Aggregation1] display link-aggregation verbose
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected, I -- Individual
Port: A -- Auto port, M -- Management port, R -- Reference port
Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
D -- Synchronization, E -- Collecting, F -- Distributing,
G -- Defaulted, H -- Expired
Aggregate Interface: Bridge-Aggregation1
Aggregation Mode: Static
Loadsharing Type: Shar
Management VLANs: None
Port Status Priority Oper-Key
GE2/0/1 U 32768 1
6. Check whether selected member ports exist in the aggregation group.
If no selected member port exists in the aggregation group, see "Unselection of aggregation member ports." If selected member ports exist in the aggregation group, proceed to step 7.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Uneven traffic load sharing on an aggregate interface
Symptom
When two devices are connected through a link aggregation, output from the display counters rate command shows that some member ports have extremely low rates or a rate of 0 in the outbound direction.
Common causes
The common cause is the incorrect configuration of the aggregation load sharing method.
Troubleshooting flow
To resolve this issue, identify the characteristics of the packets forwarded by the aggregate interface and check whether the aggregate load sharing mode matches the packet characteristics.
Figure 32 shows the troubleshooting flowchart.
Figure 32 Flowchart for troubleshooting uneven traffic load sharing on an aggregate interface
Solution
1. Check whether the user service traffic is normal.
If the user service traffic is normal, wait for a while and then execute the display counters rate command to check the outbound traffic rate of the aggregation member ports. Check whether the traffic load sharing of the aggregation member ports has been restored.
¡ If load sharing has been restored, no action is required.
¡ If load sharing is not restored, proceed to step 2.
If the user service traffic is abnormal, proceed to step 2.
2. Check whether the aggregation load sharing mode matches the packet characteristics.
Execute the display link-aggregation load-sharing mode command to check the aggregation load sharing mode. If the mode does not match the packet characteristics, adjust it by using one of the following commands:
¡ Execute the link-aggregation global load-sharing mode command in system view to adjust the global load-sharing mode.
¡ Execute the link-aggregation load-sharing mode command in aggregate interface view to adjust the load sharing mode of the aggregate interface.
By default, the device performs load balancing based on source and destination IP addresses.
If the aggregation load sharing mode matches the characteristics of the packets, proceed to step 3.
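The effect of a load sharing mode that does not match the packet characteristics can be illustrated with a simple simulation. The following Python example is a minimal sketch, not the device's algorithm: it uses zlib.crc32 as a stand-in hash, and the field names src_ip, dst_ip, and src_port are hypothetical. The fields that a device can actually hash on depend on the options of its load sharing mode commands.

```python
import zlib


def pick_member(flow, fields, members):
    """Map a flow to a member port by hashing only the selected header fields."""
    key = "|".join(str(flow[f]) for f in fields).encode()
    return members[zlib.crc32(key) % len(members)]


def distribution(flows, fields, members):
    """Count how many flows land on each member port."""
    counts = {m: 0 for m in members}
    for flow in flows:
        counts[pick_member(flow, fields, members)] += 1
    return counts


members = ["GE2/0/1", "GE2/0/2", "GE2/0/3"]
# Many sessions between the same two hosts: only the source port varies.
flows = [
    {"src_ip": "10.1.1.1", "dst_ip": "10.1.1.2", "src_port": 1024 + i}
    for i in range(300)
]
print(distribution(flows, ("src_ip", "dst_ip"), members))
print(distribution(flows, ("src_ip", "dst_ip", "src_port"), members))
```

Because every flow in this sample has the same source and destination IP addresses, hashing on IP addresses alone places all flows on one member port. Including a field that varies between flows spreads them across the member ports.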
3. Check whether cross-module or cross-chassis aggregation has been deployed.
If cross-module or cross-chassis aggregation exists on an IRF fabric, execute the undo link-aggregation load-sharing mode local-first command in system view to disable the local-first forwarding feature. Disabling this feature increases cross-module or cross-chassis traffic, and excessive cross-module or cross-chassis traffic might affect the stability of the IRF fabric. Perform this operation according to the actual situation.
If cross-module or cross-chassis aggregation is not deployed, proceed to step 4.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Unselection of aggregation member ports
Symptom
When two devices are connected via link aggregation, the member ports of the aggregation group are in unselected state and the aggregation fails.
Common causes
The following are the common causes for this type of issue:
· Link connectivity fault.
· The operational key and attribute configurations are inconsistent between the local end and the peer end.
· The aggregation member port count is incorrect.
Troubleshooting flow
To resolve this issue:
1. Identify whether the member ports are up and eliminate physical faults on the port.
2. Use the debugging link-aggregation lacp packet command to view the LACP interaction on member ports of the dynamic aggregation group.
3. Check the local and peer aggregate interface configurations to eliminate configuration faults.
Figure 33 shows the troubleshooting flowchart.
Figure 33 Flowchart for troubleshooting unselection of aggregation member ports
Solution
1. Identify whether the physical connections are correct.
Perform a link check according to the network plan of the aggregate interface, and identify whether the physical connections are connected as planned.
If the physical connections are correct, proceed to step 2.
2. Check whether the member ports in the aggregation group are up.
Use the display interface command to identify whether the member ports in the aggregation group are up. If they are not up, follow the troubleshooting procedure for the down interface issue.
If the member ports are up, proceed to step 3.
3. Check whether the attribute configuration of the local member ports is the same as that of the aggregate interface.
a. Execute the display link-aggregation verbose command to view the unselected member ports on the local end.
Taking a Layer 2 aggregate interface as an example, when the Status field displays U, the member port is unselected.
<Sysname> display link-aggregation verbose
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected, I -- Individual
Port: A -- Auto port, M -- Management port, R -- Reference port
Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
D -- Synchronization, E -- Collecting, F -- Distributing,
G -- Defaulted, H -- Expired
Aggregate Interface: Bridge-Aggregation1
Creation Mode: Manual
Aggregation Mode: Dynamic
Loadsharing Type: Shar
Management VLANs: None
System ID: 0x8000, 2a41-21c1-0100
Local:
Port Status Priority Index Oper-Key Flag
GE2/0/1(R) S 32768 1 1 {ACDEF}
GE2/0/2 S 32768 2 1 {ACDEF}
GE2/0/3 U 32768 3 2 {AC}
Remote:
Actor Priority Index Oper-Key SystemID Flag
GE2/0/1 32768 1 1 0x8000, 36f6-c0aa-0200 {ACDEF}
GE2/0/2 32768 2 1 0x8000, 36f6-c0aa-0200 {ACDEF}
GE2/0/3 32768 3 1 0x8000, 36f6-c0aa-0200 {AC}
b. Execute the display current-configuration interface command to check whether the attribute configuration (such as VLAN) of the unselected member port on the local end is the same as that of the aggregate interface. If not, modify the attribute configuration to make it consistent.
For example, member port GigabitEthernet 2/0/3 is in unselected state and has a different attribute configuration from the reference port GigabitEthernet 2/0/1. This difference prevents the member port from being selected. You must modify the attribute configuration of member port GigabitEthernet 2/0/3.
<Sysname> display current-configuration interface gigabitethernet 2/0/1
#
interface GigabitEthernet2/0/1
port link-mode bridge
port link-type trunk
port trunk permit vlan 1 to 20
port link-aggregation group 1
#
return
<Sysname> display current-configuration interface bridge-aggregation 1
#
interface Bridge-Aggregation1
port link-type trunk
port trunk permit vlan 1 to 100
link-aggregation mode dynamic
#
return
If the attribute configuration of the local member port is the same as the aggregate interface, proceed to step 4.
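The attribute comparison in this step can be sketched as a diff of selected configuration lines. The following Python example is a minimal sketch, assuming the two outputs of the display current-configuration interface command have been saved as text. The ATTRIBUTE_PREFIXES tuple is an illustrative subset chosen from the sample output, not a complete list of attribute configurations, and the function names are hypothetical.

```python
# Illustrative subset of attribute commands, taken from the sample output above.
ATTRIBUTE_PREFIXES = ("port link-type", "port trunk permit vlan")


def attribute_lines(config_text):
    """Collect the attribute lines of interest from one interface's configuration."""
    lines = (line.strip() for line in config_text.splitlines())
    return {line for line in lines if line.startswith(ATTRIBUTE_PREFIXES)}


def attribute_differences(member_config, aggregate_config):
    """Return (only_on_member, only_on_aggregate) as sorted lists."""
    member = attribute_lines(member_config)
    aggregate = attribute_lines(aggregate_config)
    return sorted(member - aggregate), sorted(aggregate - member)


member_config = """\
interface GigabitEthernet2/0/1
 port link-mode bridge
 port link-type trunk
 port trunk permit vlan 1 to 20
 port link-aggregation group 1
"""
aggregate_config = """\
interface Bridge-Aggregation1
 port link-type trunk
 port trunk permit vlan 1 to 100
 link-aggregation mode dynamic
"""
print(attribute_differences(member_config, aggregate_config))
```

For the sample configuration, the permitted VLAN ranges differ between the two interfaces, so each side reports its own port trunk permit vlan line.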
4. Check whether the operational key of the member ports on the local end is the same as the reference port.
a. Execute the display link-aggregation verbose command to view the unselected member ports on the local end.
Taking the Layer 2 aggregate interface as an example, when the Status field displays U, the member port is unselected:
<Sysname> display link-aggregation verbose
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected, I -- Individual
Port: A -- Auto port, M -- Management port, R -- Reference port
Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
D -- Synchronization, E -- Collecting, F -- Distributing,
G -- Defaulted, H -- Expired
Aggregate Interface: Bridge-Aggregation11
Creation Mode: Manual
Aggregation Mode: Dynamic
Loadsharing Type: Shar
Management VLANs: None
System ID: 0x8000, 2a41-21c1-0100
Local:
Port Status Priority Index Oper-Key Flag
GE2/0/1(R) S 32768 1 1 {ACDEF}
GE2/0/2 S 32768 2 1 {ACDEF}
GE2/0/3 U 32768 3 2 {AC}
Remote:
Actor Priority Index Oper-Key SystemID Flag
GE2/0/1 32768 1 1 0x8000, 36f6-c0aa-0200 {ACDEF}
GE2/0/2 32768 2 1 0x8000, 36f6-c0aa-0200 {ACDEF}
GE2/0/3 32768 3 1 0x8000, 36f6-c0aa-0200 {AC}
b. Execute the display current-configuration interface command to check whether the operational key of the local member port in unselected state (including the port's speed and duplex mode) is the same as the reference port. If not, modify the configuration for consistency.
For example, the operational key of the member port GigabitEthernet 2/0/3 in unselected state is different from that of the reference port GigabitEthernet 2/0/1. As a result, the member port cannot be selected and the port rate configuration must be modified.
<Sysname> display current-configuration interface GigabitEthernet 2/0/1
#
interface GigabitEthernet2/0/1
port link-mode bridge
combo enable fiber
port link-aggregation group 11
#
return
<Sysname> display current-configuration interface GigabitEthernet 2/0/3
#
interface GigabitEthernet2/0/3
port link-mode bridge
combo enable fiber
speed 100
port link-aggregation group 11
#
return
If the operational key of the local member port is the same as the reference port, proceed to step 5.
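The operational key check in this step can be sketched as a parser for the Local section of the display link-aggregation verbose output. The following Python example is a minimal sketch, assuming the dynamic-mode column layout shown above (Port, Status, Priority, Index, Oper-Key, Flag) and a reference port marked with (R) in the Local section. The function names are hypothetical.

```python
import re

ROW = re.compile(r"^\s*(\S+?)(\(R\))?\s+([SUI])\s+\d+\s+\d+\s+(\d+)\s+\{\w*\}\s*$")


def local_ports(output):
    """Parse the rows between the 'Local:' and 'Remote:' markers."""
    section = output.split("Local:", 1)[-1].split("Remote:", 1)[0]
    ports = []
    for line in section.splitlines():
        m = ROW.match(line)
        if m:
            ports.append({
                "port": m.group(1),
                "reference": m.group(2) is not None,
                "status": m.group(3),
                "oper_key": int(m.group(4)),
            })
    return ports


def key_mismatches(output):
    """Return unselected local ports whose Oper-Key differs from the reference port."""
    ports = local_ports(output)
    ref = next((p for p in ports if p["reference"]), None)
    if ref is None:
        return []
    return [p["port"] for p in ports
            if p["status"] == "U" and p["oper_key"] != ref["oper_key"]]


sample = """\
Local:
Port Status Priority Index Oper-Key Flag
GE2/0/1(R) S 32768 1 1 {ACDEF}
GE2/0/2 S 32768 2 1 {ACDEF}
GE2/0/3 U 32768 3 2 {AC}
Remote:
Actor Priority Index Oper-Key SystemID Flag
GE2/0/1 32768 1 1 0x8000, 36f6-c0aa-0200 {ACDEF}
"""
print(key_mismatches(sample))  # ['GE2/0/3']
```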
5. Check whether the local aggregate interface is in dynamic mode.
If it is in dynamic mode, proceed to step 6. If it is in static mode, proceed to step 8.
6. Check whether LACP packets are sent and received correctly.
Execute the debugging link-aggregation lacp packet command to identify whether LACP packets are sent and received correctly. Compare the Actor field in the sent packets with the Partner field in the received packets for the member port. If the sys-mac, key, and port-index fields are inconsistent, LACP packet transmission is abnormal. Identify whether the receiving or sending fiber is disconnected. If the sys-mac, key, and port-index fields are consistent, LACP packet transmission is normal. Proceed to step 7.
For example, enable LACP packet debugging on aggregation member port GigabitEthernet 2/0/1 and observe the LACP packets sent and received on this port.
<Sysname> debugging link-aggregation lacp packet all interface gigabitethernet 2/0/1
*Nov 2 15:51:21:15 2021 Sysname LAGG/7/Packet: PACKET.GigabitEthernet2/0/1.send.
size=110, subtype =1, version=1
Actor: type=1, len=20, sys-pri=0x8000, sys-mac=00e0-fc02-0300, key=0x1, pri=0x8000, port-index=0x2, state=0xc5
Partner: type=2, len=20, sys-pri=0x0, sys-mac=0000-0000-0000, key=0x0, pri=0x0, port-index=0x0, state=0x32
Collector: type=3, len=16, col-max-delay=0x0
Terminator: type=0, len=0
*Nov 2 15:55:21:15 2021 Sysname LAGG/7/Packet: PACKET.GigabitEthernet2/0/1.receive.
size=110, subtype =1, version=1
Actor: type=1, len=20, sys-pri=0x8000, sys-mac=00e0-fc00-0000, key=0x1, pri=0x8000, port-index=0x6, state=0xd
Partner: type=2, len=20, sys-pri=0x8000, sys-mac=00e0-fc02-0300, key=0x1, pri=0x8000, port-index=0x2, state=0xc5
Collector: type=3, len=16, col-max-delay=0x0
Terminator: type=0, len=0
7. Check whether the operational key and attribute configuration of the peer port for the local member port are the same as the peer port for the reference port.
Execute the display current-configuration interface command on the peer device of the local unselected port. Identify whether the operational key and attribute configuration of the peer port for the unselected port are the same as those of the peer port for the reference port. If not, modify the configuration for consistency.
If the operational key and attribute configuration of the peer port for the local member port are the same as those of the peer port for the reference port, proceed to step 8.
8. Check whether the number of aggregation member ports reaches the upper limit.
¡ The number of aggregation member ports reaches the upper limit.
The link-aggregation selected-port maximum command in aggregate interface view sets the maximum number of selected ports in the aggregation group. Use the display link-aggregation verbose command to identify whether the number of member ports in the aggregation group reaches the upper limit. If yes, the excess ports are placed in unselected state. Selected ports are sorted in ascending order by port ID. Execute the undo port link-aggregation group command in member port view to remove undesired selected ports from the aggregation group so that the desired member ports can be selected.
¡ The number of aggregation member ports is below the lower limit.
The link-aggregation selected-port minimum command in aggregate interface view sets the minimum number of selected ports in the aggregation group. Execute the display link-aggregation verbose command to check whether the number of member ports in the aggregation group is below the lower limit. If it is, all member ports are in unselected state. Execute the link-aggregation selected-port minimum command to lower the minimum number of selected ports, or add member ports to the aggregation group so that the minimum selection requirement is met.
If the number of aggregation member ports has not reached the limit of the aggregation group, proceed to step 9.
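The effect of the upper and lower limits described in this step can be sketched as follows. This Python example is a simplified model based only on the rules stated above: ports are ordered by port ID in ascending order, ports beyond the maximum become unselected, and all ports become unselected when fewer ports than the minimum are available. The function name select_ports and its parameters are hypothetical, and the actual selection process on a device might consider additional factors.

```python
def select_ports(candidates, maximum=None, minimum=None):
    """Split candidate ports into (selected, unselected) under the configured limits.

    candidates: list of (port_id, name) tuples for ports that are otherwise eligible.
    """
    ordered = sorted(candidates)
    if minimum is not None and len(ordered) < minimum:
        # Below the lower limit: no member port is selected.
        return [], [name for _, name in ordered]
    limit = len(ordered) if maximum is None else maximum
    return ([name for _, name in ordered[:limit]],
            [name for _, name in ordered[limit:]])


ports = [(3, "GE2/0/3"), (1, "GE2/0/1"), (2, "GE2/0/2")]
print(select_ports(ports, maximum=2))             # (['GE2/0/1', 'GE2/0/2'], ['GE2/0/3'])
print(select_ports([(1, "GE2/0/1")], minimum=2))  # ([], ['GE2/0/1'])
```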
9. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Spanning tree issues
Service interruption caused by a loop
Symptom
Services are interrupted when multiple devices are connected into a loop through physical links.
Common causes
The following are the common causes of this type of issue:
· The physical state of the device interfaces is down.
· The spanning tree feature is disabled on the device.
Troubleshooting flow
Figure 34 shows the troubleshooting flowchart.
Figure 34 Flowchart for troubleshooting service interruption caused by a loop
Solution
1. Identify whether the state of the interfaces forwarding service traffic is up.
a. Identify whether the physical state of the interfaces is up.
Execute the display interface brief command to identify whether the physical state of the network interfaces is up by examining the Link field.
<Sysname> display interface brief
Brief information on interfaces in route mode:
Link: ADM - administratively down; Stby - standby
Protocol: (s) - spoofing
Interface Link Protocol Primary IP Description
InLoop0 UP UP(s) --
MGE0/0/0 DOWN DOWN --
NULL0 UP UP(s) --
REG0 UP -- --
Brief information on interfaces in bridge mode:
Link: ADM - administratively down; Stby - standby
Speed: (a) - auto
Duplex: (a)/A - auto; H - half; F - full
Type: A - access; T - trunk; H - hybrid
Interface Link Speed Duplex Type PVID Description
GE2/0/1 UP auto A A 1
GE2/0/2 DOWN auto A A 1
GE2/0/3 ADM auto A A 1
- If the state of the interfaces is up, proceed to step b.
- If the state of an interface is ADM, execute the undo shutdown command in interface view to activate this interface. If the state of the interface remains down, check the interface link and related configurations. If the state of the interface is up and the issue persists, proceed to step b.
- If the state of an interface is down, troubleshoot the interface link and related configurations. If the state of the interface is up and the issue persists, proceed to step b.
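On a device with many interfaces, the Link column can be filtered from saved display interface brief output. The following Python example is a minimal sketch, assuming the column layout shown above, in which the second column of each interface row is the Link state. The function name links_not_up is hypothetical.

```python
def links_not_up(output):
    """Return {interface: link_state} for rows whose Link column is not UP."""
    states = {"UP", "DOWN", "ADM", "Stby"}
    result = {}
    for line in output.splitlines():
        cols = line.split()
        # Skip legend lines such as 'Link: ADM - administratively down'.
        if len(cols) < 2 or cols[0].endswith(":"):
            continue
        if cols[1] in states and cols[1] != "UP":
            result[cols[0]] = cols[1]
    return result


sample = """\
Link: ADM - administratively down; Stby - standby
Interface Link Speed Duplex Type PVID Description
GE2/0/1 UP auto A A 1
GE2/0/2 DOWN auto A A 1
GE2/0/3 ADM auto A A 1
"""
print(links_not_up(sample))  # {'GE2/0/2': 'DOWN', 'GE2/0/3': 'ADM'}
```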
b. Identify whether the state of the data link layer (DDL) protocol on the interface is up. The interface with a down DDL protocol cannot participate in computing the spanning tree topology.
Execute the display interface command and check whether the DDL protocol state of the interface is up by examining the Line protocol state field.
<Sysname> display interface gigabitethernet 2/0/2
GigabitEthernet2/0/2
Current state: UP
Line protocol state: DOWN(LAGG)
...
DOWN(protocols) indicates that the DDL of the interface is shut down by one or more protocol modules. The protocols argument can be any combination of the following protocols:
- DLDP—The DDL of the interface is shut down because the DLDP module detects a unidirectional communication.
- OAM—The interface's data link layer was disabled because the Ethernet OAM module detected a remote link failure.
- LAGG—The DDL of the interface is shut down because there are no selected member ports for the aggregate interface.
- BFD—The DDL of the interface is shut down because the BFD module detects a link fault.
- VBP—The DDL of the interface is shut down because Layer 2 forwarding is configured.
If the DDL of the interface is shut down by the above protocols, review and adjust the configuration of these modules to restore the DDL protocol state of the interface to up. If the issue persists, proceed to step 2.
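The module names inside DOWN(protocols) can be mapped to the causes listed above. The following Python example is a minimal sketch. The separator used when several protocols appear together is not shown in this document, so the sketch assumes that module names are alphanumeric tokens separated by any other character. The function name blocking_protocols is hypothetical.

```python
import re

# Causes taken from the list above.
REASONS = {
    "DLDP": "DLDP detected unidirectional communication",
    "OAM": "Ethernet OAM detected a remote link failure",
    "LAGG": "the aggregate interface has no selected member ports",
    "BFD": "BFD detected a link fault",
    "VBP": "Layer 2 forwarding is configured",
}


def blocking_protocols(output):
    """Return (module, cause) pairs from a 'Line protocol state: DOWN(...)' line."""
    m = re.search(r"Line protocol state:\s*DOWN\(([^)]*)\)", output)
    if not m:
        return []
    return [(name, REASONS.get(name, "unknown module"))
            for name in re.findall(r"[A-Za-z0-9]+", m.group(1))]


sample = """\
GigabitEthernet2/0/2
Current state: UP
Line protocol state: DOWN(LAGG)
"""
print(blocking_protocols(sample))
```

For the sample output, the result names LAGG, which points to the aggregation group that the interface belongs to as the next item to check.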
2. Identify whether the spanning tree feature on the devices is enabled.
a. Check whether the global spanning tree feature is enabled on the devices.
Execute the display stp command.
- If the following output appears, the global spanning tree protocol is not enabled:
<Sysname> display stp
Protocol status : Disabled
Protocol Std. : IEEE 802.1s
Version : 3
Bridge-Prio. : 32768
MAC address : 2eae-3769-0200
Max age(s) : 20
Forward delay(s) : 15
Hello time(s) : 2
Max hops : 20
TC Snooping : Disabled
<Sysname> display stp
STP is not configured.
Execute the stp global enable command in system view to enable the global spanning tree feature.
- If the state and statistical information of the spanning tree appear as shown below, the global spanning tree feature is enabled. Proceed to step b.
<Sysname> display stp
-------[CIST Global Info][Mode MSTP]-------
Bridge ID : 32768.2eae-3769-0200
Bridge times : Hello 2s MaxAge 20s FwdDelay 15s MaxHops 20
Root ID/ERPC : 32768.2eae-3769-0200, 0
RegRoot ID/IRPC : 32768.2eae-3769-0200, 0
RootPort ID : 0.0
BPDU-Protection : Disabled
Bridge Config-
Digest-Snooping : Disabled
TC or TCN received : 0
Time since last TC : 0 days 2h:49m:11s
----[Port1(GigabitEthernet2/0/1)][DOWN]----
Port protocol : Enabled
Port role : Disabled Port
Port ID : 128.54
Port cost(Legacy) : Config=auto, Active=200000
Desg.bridge/port : 32768.2eae-3769-0200, 128.54
Port edged : Config=disabled, Active=disabled
Point-to-Point : Config=auto, Active=false
Transmit limit : 10 packets/hello-time
TC-Restriction : Disabled
Role-Restriction : Disabled
Protection type : Config=none, Active=none
MST BPDU format : Config=auto, Active=802.1s
Port Config-
Digest-Snooping : Disabled
Rapid transition : False
Num of VLANs mapped : 1
Port times : Hello 2s MaxAge 20s FwdDelay 15s MsgAge 0s RemHops 20
BPDU sent : 0
TCN: 0, Config: 0, RST: 0, MST: 0
BPDU received : 0
TCN: 0, Config: 0, RST: 0, MST: 0
b. Check whether the spanning tree feature is enabled for VLANs. (Only applicable when the spanning tree mode is PVST. Proceed to step c for any other spanning tree mode.)
Execute the display this command in system view to check whether the undo stp vlan enable command exists.
[Sysname] display this
...
#
undo stp vlan 2 enable
stp mode pvst
stp global enable
#
...
If the above configuration exists and the network requires enabling the spanning tree feature for the VLANs, execute the stp vlan enable command in system view to enable the spanning tree feature on the VLANs.
c. Identify whether the spanning tree feature is enabled on the interfaces.
Execute the display stp command to identify whether the spanning tree feature is disabled on any interfaces.
<Sysname> display stp
...
----[Port2(GigabitEthernet2/0/2)][DISABLED]----
Port protocol : Disabled
...
Execute the stp enable command in interface view to activate the spanning tree feature on the interfaces participating in the spanning tree calculations.
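To find every interface with the spanning tree feature disabled, the display stp output can be scanned for port sections whose Port protocol field is Disabled. The following Python example is a minimal sketch, assuming the port header and field layout shown above. The function name stp_disabled_ports is hypothetical.

```python
import re

PORT_HEADER = re.compile(r"-+\[Port\d+\(([^)]+)\)\]\[(\w+)\]-+")


def stp_disabled_ports(output):
    """Return interfaces whose 'Port protocol' field is Disabled in display stp output."""
    disabled, current = [], None
    for line in output.splitlines():
        header = PORT_HEADER.search(line)
        if header:
            current = header.group(1)
        elif current and re.match(r"\s*Port protocol\s*:\s*Disabled", line):
            disabled.append(current)
    return disabled


sample = """\
----[Port1(GigabitEthernet2/0/1)][FORWARDING]----
Port protocol : Enabled
----[Port2(GigabitEthernet2/0/2)][DISABLED]----
Port protocol : Disabled
"""
print(stp_disabled_ports(sample))  # ['GigabitEthernet2/0/2']
```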
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
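For the PVST check in step b of the solution, the display this output can be scanned for disabled VLANs. The following is a minimal sketch, assuming the output has been saved as text and that each line uses the single-VLAN form shown in the sample (undo stp vlan 2 enable). Other forms of the command are not handled. The function names are hypothetical.

```python
import re

# Matches only the single-VLAN form shown in the sample output.
UNDO_STP_VLAN = re.compile(r"^\s*undo stp vlan (\d+) enable\s*$")


def is_pvst_mode(display_this_output: str) -> bool:
    """Return True if the configuration contains 'stp mode pvst'."""
    return any(line.strip() == "stp mode pvst"
               for line in display_this_output.splitlines())


def vlans_with_stp_disabled(display_this_output: str) -> list[int]:
    """Return the VLAN IDs for which spanning tree is explicitly disabled."""
    vlans = []
    for line in display_this_output.splitlines():
        match = UNDO_STP_VLAN.match(line)
        if match:
            vlans.append(int(match.group(1)))
    return vlans


# Sample text copied from the display this output in step b.
SAMPLE = """\
#
 undo stp vlan 2 enable
 stp mode pvst
 stp global enable
#
"""

if is_pvst_mode(SAMPLE):
    print(vlans_with_stp_disabled(SAMPLE))
```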
User endpoint disconnection in the spanning tree network
Symptom
When a user endpoint is connected to the spanning tree network, transient disconnection occurs on the interface connecting the endpoint, causing persistent packet loss and endpoint disconnection.
Common causes
The interface connected to the user endpoint device is not configured as an edge port.
Troubleshooting flow
Figure 35 shows the troubleshooting flowchart.
Figure 35 Flowchart for troubleshooting user endpoint disconnection in the spanning tree network
Solution
1. Check whether the interfaces directly connected to the user endpoint are edge ports in the spanning tree network.
Execute the display stp command on the device directly connected to the user endpoint to identify whether the interface directly connected to the user endpoint is an edge port.
<Sysname> display stp
...
----[Port2(GigabitEthernet2/0/1)][FORWARDING]----
Port protocol : Enabled
Port role : Designated Port
Port ID : 128.2
Port cost(Legacy) : Config=auto, Active=20
Desg.bridge/port : 32768.2eae-3769-0200, 128.2
Port edged : Config=enabled, Active=enabled
Point-to-Point : Config=auto, Active=true
Transmit limit : 10 packets/hello-time
Protection type : Config=none, Active=none
Rapid transition : True
Port times : Hello 2s MaxAge 20s FwdDelay 15s MsgAge 0s
...
¡ If yes, proceed to step 2.
¡ If not, execute the stp edged-port command in interface view to configure this port as an edge port.
IMPORTANT: The edge port and loop guard features cannot both be configured on an interface. If the device displays the following error message when you execute the stp edged-port command, the interface has the loop guard feature configured. In this case, execute the undo stp loop-protection command to disable the loop guard feature before you execute the stp edged-port command.
Failed to enable edged-port on GigabitEthernet2/0/1, because loop-protection is enabled.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
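When many access interfaces must be checked, the edge port check in step 1 can be sketched as follows. This is a minimal illustration, assuming the display stp output has been saved as text and that the list of interfaces connected to user endpoints is known in advance. The second port in the sample and the function name non_edge_ports are hypothetical.

```python
import re

# Captures the interface name from a header such as
# "----[Port2(GigabitEthernet2/0/1)][FORWARDING]----".
PORT_HEADER = re.compile(r"\[Port\d+\(([^)]+)\)\]")
# Captures the configured and active edge-port states.
PORT_EDGED = re.compile(r"Port edged\s*:\s*Config=(\w+),\s*Active=(\w+)")


def non_edge_ports(display_stp_output: str, access_ports: list[str]) -> list[str]:
    """Return the access ports whose active edge-port state is not 'enabled'.

    A port that does not appear in the output is also reported.
    """
    active_state = {}
    current_port = None
    for line in display_stp_output.splitlines():
        header = PORT_HEADER.search(line)
        if header:
            current_port = header.group(1)
            continue
        edged = PORT_EDGED.search(line)
        if edged and current_port:
            active_state[current_port] = edged.group(2)
    return [port for port in access_ports if active_state.get(port) != "enabled"]


# The first port is taken from the sample output in step 1.
# The second port is hypothetical and added for illustration only.
SAMPLE = """\
----[Port2(GigabitEthernet2/0/1)][FORWARDING]----
 Port edged          : Config=enabled, Active=enabled
----[Port3(GigabitEthernet2/0/2)][FORWARDING]----
 Port edged          : Config=disabled, Active=disabled
"""

print(non_edge_ports(SAMPLE, ["GigabitEthernet2/0/1", "GigabitEthernet2/0/2"]))
```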
Related alarm and log messages
Alarm messages
N/A
Log messages
STP/6/STP_DETECTED_TC
Unchangeable master port in an MSTI other than MSTI 0
Symptom
In an MSTP network, for instances other than MSTI 0 on the device, ports that should not have the master role are calculated as master ports. The master port roles cannot be changed by adjusting parameters such as priority and cost values.
Common causes
In an MST region, devices have inconsistent MST region configurations.
Troubleshooting flow
If the MST region configurations of two devices are inconsistent, each device considers that the peer device is not in the same MST region as the local device. As a result, a port connected to the peer device can be calculated as the master port. To resolve this issue, check the MST region configuration of all devices that belong to the same MST region and make sure the configurations are consistent.
Figure 36 shows the troubleshooting flowchart.
Figure 36 Flowchart for troubleshooting unchangeable master port in an MSTI other than MSTI 0
Solution
1. Make sure that devices in the same MST region have the same region name, revision level, and VLAN mapping table configuration for the MST region.
Execute the display stp region-configuration command to view the effective MST region configuration of the devices.
<Sysname> display stp region-configuration
Oper Configuration
Format selector : 0
Region name : hello
Revision level : 0
Configuration digest : 0x5f762d9a46311effb7a488a3267fca9f
Instance VLANs Mapped
0 21 to 4094
1 1 to 10
2 11 to 20
¡ Region name—The region name of the MST region. Execute the stp region-configuration command in system view to enter MST region view, and configure the region name with the region-name command.
¡ Revision level—Revision level of the MST region. Execute the stp region-configuration command in system view to enter MST region view, and configure the revision level with the revision-level command.
¡ Instance VLANs Mapped—VLAN mapping relationships of the MST region. Execute the stp region-configuration command in system view to enter MST region view, and configure VLAN mapping relationships with the instance or the vlan-mapping modulo command.
If the above parameters are inconsistent on different devices within the same MST region, change the parameter configurations to be consistent. After configuring the parameters of the MST region, execute the active region-configuration command in MST region view to activate the user configuration of the MST region. Otherwise, the previous configuration remains in effect for the MST region.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
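The comparison in step 1 can be automated when several devices are involved. The following is a minimal sketch, assuming the display stp region-configuration output of each device has been saved as text and that each VLAN mapping fits on one line. The second device is a hypothetical copy of the sample with a different revision level, and the function names are hypothetical.

```python
def parse_region_config(output: str) -> dict:
    """Parse the region name, revision level, and instance-to-VLAN mapping."""
    config = {"instances": {}}
    in_table = False
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("Instance"):
            in_table = True  # Rows after this header are mapping rows.
            continue
        if in_table:
            instance, _, vlans = line.partition(" ")
            config["instances"][int(instance)] = " ".join(vlans.split())
        elif ":" in line:
            key, _, value = line.partition(":")
            config[key.strip()] = value.strip()
    return config


# The three parameters that must match within one MST region.
COMPARED_FIELDS = ("Region name", "Revision level", "instances")


def region_mismatches(first: dict, second: dict) -> list[str]:
    """Return the names of the compared fields that differ."""
    return [field for field in COMPARED_FIELDS if first.get(field) != second.get(field)]


# Device A: copied from the sample output in step 1.
DEVICE_A = """\
Oper Configuration
Format selector : 0
Region name : hello
Revision level : 0
Configuration digest : 0x5f762d9a46311effb7a488a3267fca9f
Instance VLANs Mapped
0 21 to 4094
1 1 to 10
2 11 to 20
"""
# Device B: hypothetical, identical except for the revision level.
DEVICE_B = DEVICE_A.replace("Revision level : 0", "Revision level : 1")

print(region_mismatches(parse_region_config(DEVICE_A), parse_region_config(DEVICE_B)))
```

An empty result means that the three parameters match on the two devices.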
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Spanning tree link flapping
Symptom
The network topology changes frequently because the spanning tree root bridge, port roles, and port states change constantly.
Common causes
The following are the common causes of this type of issue:
· Link flapping: The properties of a certain port's link, such as the state, rate, and duplex mode, change frequently.
· Node fault: The CPU usage of the devices on the network is high and the spanning tree packets cannot be processed in a timely manner. A device reboots repeatedly, causing the spanning tree to be constantly recalculated.
· Network failures:
¡ Packet congestion leads to BPDU loss.
¡ A device received a BPDU from another network unexpectedly, triggering a recalculation of the current network's spanning tree.
¡ Other features of the device cause BPDUs to be discarded incorrectly.
Troubleshooting flow
Figure 37 shows the troubleshooting flowchart.
Figure 37 Flowchart for troubleshooting spanning tree link flapping
Solution
1. Identify whether any device in the spanning tree network is experiencing high CPU usage, rebooting, or changes in the status of the interface links.
Based on the network deployment, use the controller, device management platform, and user interface to check whether the devices are experiencing high CPU usage, rebooting, or changes in the status of the interface links.
If both the device state and link state have returned to stability, but the issue persists, proceed to step 2.
2. Check whether the root bridge of the spanning tree network is changed.
In the spanning tree network, execute the display stp root command to view the root bridge in the current spanning tree network.
<Sysname> display stp root
MST ID Root Bridge ID ExtPathCost IntPathCost Root Port
0 32768.14e3-19d3-0100 0 40 GE2/0/2
10 0.14e3-19d3-0100 0 40 GE2/0/2
20 0.14e3-1f59-0200 0 0
The Root Bridge ID field indicates the ID of the root bridge in the spanning tree network. The format of the root bridge ID is priority.bridge MAC address. Use this field to determine whether the root bridge in the spanning tree network is the desired device. If the root bridge device is correct but the spanning tree network still keeps flapping, proceed to step 3. If the root bridge device is not the desired one, you can modify the root bridge as follows:
¡ Change the priority of the desired device. The priority of a device participates in the spanning tree calculation. The smaller the value, the higher the priority. Execute the stp priority command to set the priority of the desired device to 0 or to a value smaller than the priority of every other device, so that the specified device becomes the spanning tree root bridge.
¡ Execute the stp root primary command on the desired device to configure this device as the root bridge of the spanning tree.
After you configure the desired device as the root bridge, maintain the stability of the root bridge and network topology with the following functions:
¡ Enable root guard.
After you execute the stp root-protection command in interface view, the interface can act only as a designated port in all MSTIs. When the interface receives a higher-priority BPDU from an MSTI, it immediately transits to the listening state and stops forwarding packets (which is equivalent to disconnecting the link on this interface). If the interface receives no superior BPDU within twice the forward delay (15 seconds by default), it restores to its normal state. The root guard function prevents unauthorized changes in the spanning tree topology caused by misconfiguration or malicious attacks.
¡ Configure the edge port and BPDU guard.
For access layer devices, access ports are usually directly connected to user endpoints (such as PCs) or file servers. Access ports must be configured as edge ports for fast port migration. Under normal circumstances, an access port does not exchange STP BPDUs with user endpoints. If the access port receives BPDUs, network topology change and spanning tree network flapping might occur.
Spanning tree provides the BPDU guard feature to solve this issue. Execute the stp bpdu-protection command in system view or interface view. When an edge port receives BPDUs, the system shuts down the port and notifies the user that the port has been shut down by spanning tree. The port is reactivated after the time interval configured by using the shutdown-interval command.
¡ Enable loop guard.
A downstream device relies on continuous BPDUs sent by the upstream device to maintain the state of the root port and blocked ports. If link congestion or a unidirectional link fault occurs, these ports cannot receive BPDUs from the upstream device. In this case, the downstream device reselects the port roles, and the root port of the downstream device becomes a designated port. The blocked ports transit to the forwarding state, and a loop occurs in the switched network.
Execute the stp loop-protection command on the root port and alternate port of downstream devices to configure the loop guard feature to suppress the occurrence of the above loops. On a port with the loop guard feature enabled, the initial state of all MSTIs is discarding. If the port receives BPDUs, these MSTIs can perform normal state transitions. If the port does not receive BPDUs, these MSTIs will remain in the discarding state to avoid loops.
Do not configure the loop guard feature on a port connected to a user endpoint. Otherwise, the port will remain discarding and cannot forward user traffic.
¡ Enable TC-BPDU guard.
If TC-BPDUs are used to attack a device, the device will receive a large number of TC-BPDUs within a short period of time. Then, the device is busy with forwarding entry flushing. This affects network stability. You can enable TC-BPDU guard to prevent frequent flushing of forwarding entries. Execute the stp tc-protection command in system view to enable the TC-BPDU guard feature. Execute the stp tc-protection threshold number command in system view to configure the maximum number of forwarding entry flushes that the device can perform every 10 seconds.
With the TC-BPDU guard feature enabled, if the number of TC-BPDUs that the device receives within 10 seconds is greater than the specified number, the device refreshes the forwarding entries only the specified number of times during this period. For the excess TC-BPDUs, the device refreshes the forwarding entries together after this period expires.
If the issue persists, proceed to step 3.
3. Troubleshoot for BPDU timeout.
Check whether the device outputs the STP_BPDU_RECEIVE_EXPIRY log. This log indicates that the device did not receive any BPDUs within the BPDU timeout period, which triggered spanning tree recalculation. The BPDU timeout might be caused by congestion in BPDU forwarding on the network, or by other configurations on the device that cause BPDUs to be incorrectly discarded.
To locate the fault more accurately, proceed to step 4.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
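The root bridge check in step 2 relies on reading the Root Bridge ID field in the priority.bridge MAC address format. The following is a minimal sketch of that check, assuming the display stp root output has been saved as text and that the MAC address of the desired root bridge is known. The type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class RootEntry(NamedTuple):
    mst_id: int
    priority: int             # Smaller value means higher priority.
    mac: str                  # Bridge MAC address of the root bridge.
    root_port: Optional[str]  # None when the Root Port column is empty.


def parse_stp_root(output: str) -> list[RootEntry]:
    """Parse the data rows of the display stp root output."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID. The header row does not.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        priority, _, mac = fields[1].partition(".")
        root_port = fields[4] if len(fields) > 4 else None
        entries.append(RootEntry(int(fields[0]), int(priority), mac, root_port))
    return entries


def unexpected_roots(entries: list[RootEntry], expected_mac: str) -> list[int]:
    """Return the IDs of instances whose root bridge is not the desired device."""
    return [entry.mst_id for entry in entries if entry.mac != expected_mac]


# Sample text copied from the display stp root output in step 2.
SAMPLE = """\
MST ID Root Bridge ID ExtPathCost IntPathCost Root Port
0 32768.14e3-19d3-0100 0 40 GE2/0/2
10 0.14e3-19d3-0100 0 40 GE2/0/2
20 0.14e3-1f59-0200 0 0
"""

entries = parse_stp_root(SAMPLE)
# Assumption for illustration: 14e3-19d3-0100 is the desired root bridge.
print(unexpected_roots(entries, "14e3-19d3-0100"))
```

Running the same check on output captured at different times shows whether the root bridge of an instance keeps changing.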
Related alarm and log messages
Log messages
· STP/5/STP_BPDU_RECEIVE_EXPIRY
· STP/6/STP_DETECTED_TC
· STP/6/STP_NOTIFIED_TC
Alarm messages
N/A
Troubleshooting Layer 2—WAN access issues
PPP issues
PPP interface in protocol down state
Symptom
After the physical PPP interfaces of two devices are connected, the link layer protocol state of the interfaces is displayed as down.
Common causes
The following are the common causes of this type of issue:
· The physical layer state of the interface is not up.
· The PPP-related configuration is incorrect on the interfaces at both ends of the link.
· The PPP protocol packets are dropped.
· A loop exists on the link.
· The link latency is too high.
Troubleshooting flow
Figure 38 shows the troubleshooting flowchart.
Figure 38 Flowchart for troubleshooting PPP interfaces in protocol down state
Solution
1. Identify whether the interface is up on the physical layer.
Execute the display interface interface-type interface-number command in any view to check the physical state of the local interface:
¡ If the physical state of the local interface is Administratively DOWN, the local interface is shut down by using the shutdown command. In this case, bring up the interface by executing the undo shutdown command on the local interface.
¡ If the physical state of the local interface is DOWN, identify whether the peer interface is shut down by using the shutdown command. If yes, bring up the peer interface by executing the undo shutdown command on the peer interface.
¡ Identify whether the optical fibers and transceiver modules are firmly installed at both ends, and whether the Rx/Tx optical fibers are correctly plugged. Resolve the issue that the interface is physically down.
¡ If the interface state is up, proceed to the next step.
2. Identify whether the PPP configuration is correct at both ends of the link.
Execute the display this command on the interface where the PPP protocol is down to check the PPP-related configuration on the interface.
[Sysname-Serial3/0/5] display this
#
interface Serial3/0/5
ip address 12.1.1.1 255.255.255.0
#
return
¡ Verify that the link layer protocol is PPP on both interfaces of the link. More specifically: In any view on the devices at both ends, execute the display interface interface-type interface-number command to identify whether the value for the Link layer protocol field in the command output is PPP on both interfaces. If it is not PPP on an interface, execute the link-protocol ppp command on the interface to configure the link layer protocol as PPP.
¡ If PPP authentication has been configured, identify whether the authentication type and the authentication username/password of the authenticator are the same as those of the authenticatee. If they are different, modify the configuration as described in the PPP configuration guide.
¡ If interfaces on both ends are assigned to an MP group, identify whether the MP-group interface is shut down by using the shutdown command. If yes, bring up the MP-group interface by executing the undo shutdown command on the MP-group interface.
¡ If the interface on one end has the remote address command executed, make sure the interface on the other end has either the ip address ppp-negotiate command executed or the ip address command executed to manually configure the IP address specified by using the remote address command on the peer interface.
If PPP is configured correctly but the link layer protocol state is still down on the PPP interface, proceed to the next step.
3. Identify whether the protocol packets are received and sent normally on the interface.
Execute the display ppp packet statistics command in any view to view the statistics of PPP protocol packets and identify whether the packets are sent and received normally.
<Sysname> display ppp packet statistics slot 3
PPP packet statistics in slot 3:
-----------------------------------LCP--------------------------------------
SEND_LCP_CON_REQ : 4 RECV_LCP_CON_REQ : 5
SEND_LCP_CON_NAK : 0 RECV_LCP_CON_NAK : 0
SEND_LCP_CON_REJ : 0 RECV_LCP_CON_REJ : 0
SEND_LCP_CON_ACK : 4 RECV_LCP_CON_ACK : 4
SEND_LCP_CODE_REJ : 0 RECV_LCP_CODE_REJ : 0
SEND_LCP_PROT_REJ : 0 RECV_LCP_PROT_REJ : 0
SEND_LCP_TERM_REQ : 2 RECV_LCP_TERM_REQ : 1
SEND_LCP_TERM_ACK : 1 RECV_LCP_TERM_ACK : 0
SEND_LCP_ECHO_REQ : 25 RECV_LCP_ECHO_REQ : 0
SEND_LCP_ECHO_REP : 0 RECV_LCP_ECHO_REP : 25
SEND_LCP_FAIL : 0 SEND_LCP_CON_REQ_RETRAN : 0
-----------------------------------IPCP-------------------------------------
SEND_IPCP_CON_REQ : 38 RECV_IPCP_CON_REQ : 2
SEND_IPCP_CON_NAK : 0 RECV_IPCP_CON_NAK : 0
SEND_IPCP_CON_REJ : 0 RECV_IPCP_CON_REJ : 0
SEND_IPCP_CON_ACK : 2 RECV_IPCP_CON_ACK : 2
SEND_IPCP_CODE_REJ : 0 RECV_IPCP_CODE_REJ : 0
SEND_IPCP_PROT_REJ : 0 RECV_IPCP_PROT_REJ : 0
SEND_IPCP_TERM_REQ : 0 RECV_IPCP_TERM_REQ : 0
SEND_IPCP_TERM_ACK : 0 RECV_IPCP_TERM_ACK : 0
SEND_IPCP_FAIL : 0
-----------------------------------IPV6CP-----------------------------------
SEND_IPV6CP_CON_REQ : 0 RECV_IPV6CP_CON_REQ : 0
SEND_IPV6CP_CON_NAK : 0 RECV_IPV6CP_CON_NAK : 0
SEND_IPV6CP_CON_REJ : 0 RECV_IPV6CP_CON_REJ : 0
SEND_IPV6CP_CON_ACK : 0 RECV_IPV6CP_CON_ACK : 0
SEND_IPV6CP_CODE_REJ : 0 RECV_IPV6CP_CODE_REJ : 0
SEND_IPV6CP_PROT_REJ : 0 RECV_IPV6CP_PROT_REJ : 0
SEND_IPV6CP_TERM_REQ : 0 RECV_IPV6CP_TERM_REQ : 0
SEND_IPV6CP_TERM_ACK : 0 RECV_IPV6CP_TERM_ACK : 0
SEND_IPV6CP_FAIL : 0
-----------------------------------OSICP------------------------------------
SEND_OSICP_CON_REQ : 0 RECV_OSICP_CON_REQ : 0
SEND_OSICP_CON_NAK : 0 RECV_OSICP_CON_NAK : 0
SEND_OSICP_CON_REJ : 0 RECV_OSICP_CON_REJ : 0
SEND_OSICP_CON_ACK : 0 RECV_OSICP_CON_ACK : 0
SEND_OSICP_CODE_REJ : 0 RECV_OSICP_CODE_REJ : 0
SEND_OSICP_PROT_REJ : 0 RECV_OSICP_PROT_REJ : 0
SEND_OSICP_TERM_REQ : 0 RECV_OSICP_TERM_REQ : 0
SEND_OSICP_TERM_ACK : 0 RECV_OSICP_TERM_ACK : 0
SEND_OSICP_FAIL : 0
-----------------------------------MPLSCP-----------------------------------
SEND_MPLSCP_CON_REQ : 0 RECV_MPLSCP_CON_REQ : 0
SEND_MPLSCP_CON_NAK : 0 RECV_MPLSCP_CON_NAK : 0
SEND_MPLSCP_CON_REJ : 0 RECV_MPLSCP_CON_REJ : 0
SEND_MPLSCP_CON_ACK : 0 RECV_MPLSCP_CON_ACK : 0
SEND_MPLSCP_CODE_REJ : 0 RECV_MPLSCP_CODE_REJ : 0
SEND_MPLSCP_PROT_REJ : 0 RECV_MPLSCP_PROT_REJ : 0
SEND_MPLSCP_TERM_REQ : 0 RECV_MPLSCP_TERM_REQ : 0
SEND_MPLSCP_TERM_ACK : 0 RECV_MPLSCP_TERM_ACK : 0
SEND_MPLSCP_FAIL : 0
-----------------------------------AUTH-------------------------------------
SEND_PAP_AUTH_REQ : 0 RECV_PAP_AUTH_REQ : 0
SEND_PAP_AUTH_ACK : 0 RECV_PAP_AUTH_ACK : 0
SEND_PAP_AUTH_NAK : 0 RECV_PAP_AUTH_NAK : 0
SEND_CHAP_AUTH_CHALLENGE: 0 RECV_CHAP_AUTH_CHALLENGE: 0
SEND_CHAP_AUTH_RESPONSE : 0 RECV_CHAP_AUTH_RESPONSE : 0
SEND_CHAP_AUTH_ACK : 0 RECV_CHAP_AUTH_ACK : 0
SEND_CHAP_AUTH_NAK : 0 RECV_CHAP_AUTH_NAK : 0
SEND_PAP_AUTH_FAIL : 0 SEND_CHAP_AUTH_FAIL : 0
¡ If the number of received or sent packets is 0 or does not increase after you execute this command multiple times, it indicates that protocol packets are lost during transmission. Verify that the interfaces, optical fibers, and transceiver modules are operating correctly to resolve the packet loss issue. If the issue persists, proceed to step 6.
¡ If packets are received and sent normally, proceed to the next step.
4. Identify whether a loop exists on the link.
Execute the debugging ppp all interface interface-type interface-number command in user view on the local device to enable debugging for PPP packets. Identify whether the local end has sent and received identical packets (same packet type, packet ID, and magic number).
*Apr 7 19:38:04:384 2022 Sysname PPP/7/FSM_PACKET_0: -MDC=1-Slot=3;
PPP Packet:
Ser3/0/5(109) Output LCP(c021) Packet, PktLen 14
Current State reqsent, code ConfReq(01), id 0, len 10
MagicNumber(5), len 6, val c5 ae e7 03
*Apr 7 19:38:04:390 2022 Sysname PPP/7/FSM_PACKET_0: -MDC=1-Slot=3;
PPP Packet:
Ser3/0/5(109) Input LCP(c021) Packet, PktLen 14
Current State reqsent, code ConfReq(01), id 0, len 10
MagicNumber(5), len 6, val c5 ae e7 03
¡ If yes, a loop exists on the link. Check the cause of the loop (for example, an incorrect fiber connection), and remove the loop. If the issue persists, proceed to step 6.
¡ If not, no loop exists on the link. Proceed to the next step.
5. Identify whether the link latency is too high.
Execute the debugging ppp all interface interface-type interface-number command in user view on the local device to enable debugging for PPP packets. Determine the link latency by checking the time interval between the transmit timestamp and the receive timestamp of the PPP negotiation packets.
*Apr 7 19:38:04:384 2022 Sysname PPP/7/FSM_PACKET_0: -MDC=1-Slot=3;
PPP Packet:
Ser3/0/5(109) Output LCP(c021) Packet, PktLen 14
Current State reqsent, code ConfReq(01), id 0, len 10
MagicNumber(5), len 6, val c5 ae e7 03
*Apr 7 19:38:04:387 2022 Sysname PPP/7/FSM_PACKET_0: -MDC=1-Slot=3;
PPP Packet:
Ser3/0/5(109) Input LCP(c021) Packet, PktLen 14
Current State acksent, code ConfAck(02), id 0, len 10
MagicNumber(5), len 6, val c5 ae e7 03
Identify whether the link latency is longer than the negotiation timeout interval for PPP protocol packets configured on the current interface. The negotiation timeout interval for PPP protocol packets is configured by using the ppp timer negotiate command on the interface, and is 3 seconds by default.
¡ If the link latency is too high, execute the ppp timer negotiate command to appropriately increase the negotiation timeout interval. Alternatively, replace the corresponding device or link and retest the link latency until the link latency is less than the negotiation timeout interval for PPP protocol packets configured on the interface.
¡ If the link latency is small, proceed to the next step.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
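Step 3 asks whether the PPP counters are 0 or stop increasing across repeated executions of the display ppp packet statistics command. The following is a minimal sketch of that comparison, assuming two snapshots of the output have been saved as text. The second snapshot uses hypothetical values, and the function names are hypothetical.

```python
import re

# Counter names in the sample output start with SEND_ or RECV_.
COUNTER = re.compile(r"((?:SEND|RECV)_\w+)[ \t]*:[ \t]*(\d+)")


def parse_ppp_counters(output: str) -> dict[str, int]:
    """Map each counter name to its integer value."""
    return {name: int(value) for name, value in COUNTER.findall(output)}


def stalled_counters(before: dict, after: dict, names: list[str]) -> list[str]:
    """Return the watched counters that are 0 or did not increase."""
    return [
        name for name in names
        if after.get(name, 0) == 0 or after.get(name, 0) <= before.get(name, 0)
    ]


# First snapshot: values copied from the sample output in step 3.
FIRST = "SEND_LCP_CON_REQ : 4 RECV_LCP_CON_REQ : 5\n"
# Second snapshot: hypothetical values, invented for this illustration.
SECOND = "SEND_LCP_CON_REQ : 9 RECV_LCP_CON_REQ : 5\n"

watched = ["SEND_LCP_CON_REQ", "RECV_LCP_CON_REQ"]
print(stalled_counters(parse_ppp_counters(FIRST), parse_ppp_counters(SECOND), watched))
```

In this hypothetical case the local end keeps sending LCP configure requests but receives no new ones, which points to packet loss in the receive direction.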
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
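The loop check in step 4 and the latency check in step 5 of the solution both read the same PPP debugging output. The following is a minimal sketch of both checks, assuming the debugging output has been saved as text, that all records fall on the same day, and that only LCP packets carrying a magic number are of interest. The function names are hypothetical.

```python
import re

TIMESTAMP = re.compile(r"^\*\w+\s+\d+\s+(\d+):(\d+):(\d+):(\d+)\s+\d{4}")
DIRECTION = re.compile(r"\b(Input|Output) LCP")
CODE_ID = re.compile(r"code (\w+)\(\w+\), id (\d+)")
MAGIC = re.compile(r"MagicNumber\(5\), len 6, val ((?:[0-9a-f]{2} ?){4})")


def parse_debug(text: str) -> list[dict]:
    """Return one dict per debugging record (time in ms, direction, code, id, magic)."""
    packets, current = [], None
    for line in text.splitlines():
        stamp = TIMESTAMP.match(line.strip())
        if stamp:
            h, m, s, ms = map(int, stamp.groups())
            current = {"ms": ((h * 60 + m) * 60 + s) * 1000 + ms}
            packets.append(current)
            continue
        if current is None:
            continue
        if (d := DIRECTION.search(line)):
            current["direction"] = d.group(1)
        if (c := CODE_ID.search(line)):
            current["code"], current["id"] = c.group(1), int(c.group(2))
        if (g := MAGIC.search(line)):
            current["magic"] = g.group(1).strip()
    return packets


def _key(packet: dict) -> tuple:
    return packet.get("code"), packet.get("id"), packet.get("magic")


def looped_packets(packets: list[dict]) -> list[dict]:
    """Return received packets identical to a sent packet (type, ID, magic number)."""
    sent = {_key(p) for p in packets if p.get("direction") == "Output"}
    return [p for p in packets if p.get("direction") == "Input" and _key(p) in sent]


def request_ack_latency_ms(packets: list[dict]) -> list[int]:
    """Return the delay between each sent ConfReq and the received ConfAck with the same ID."""
    delays = []
    for req in packets:
        if req.get("direction") != "Output" or req.get("code") != "ConfReq":
            continue
        for ack in packets:
            if (ack.get("direction") == "Input" and ack.get("code") == "ConfAck"
                    and ack.get("id") == req.get("id") and ack["ms"] >= req["ms"]):
                delays.append(ack["ms"] - req["ms"])
                break
    return delays


# Records copied from the debugging output in step 4 (loop case).
LOOP_SAMPLE = """\
*Apr 7 19:38:04:384 2022 Sysname PPP/7/FSM_PACKET_0: -MDC=1-Slot=3;
Ser3/0/5(109) Output LCP(c021) Packet, PktLen 14
Current State reqsent, code ConfReq(01), id 0, len 10
MagicNumber(5), len 6, val c5 ae e7 03
*Apr 7 19:38:04:390 2022 Sysname PPP/7/FSM_PACKET_0: -MDC=1-Slot=3;
Ser3/0/5(109) Input LCP(c021) Packet, PktLen 14
Current State reqsent, code ConfReq(01), id 0, len 10
MagicNumber(5), len 6, val c5 ae e7 03
"""
# Records copied from the debugging output in step 5 (latency case).
ACK_SAMPLE = LOOP_SAMPLE.replace("04:390", "04:387").replace(
    "Current State reqsent, code ConfReq(01), id 0, len 10\nMagicNumber(5), len 6, val c5 ae e7 03\n",
    "Current State acksent, code ConfAck(02), id 0, len 10\nMagicNumber(5), len 6, val c5 ae e7 03\n",
).replace("code ConfAck(02)", "code ConfReq(01)", 1)

print(len(looped_packets(parse_debug(LOOP_SAMPLE))))
print(request_ack_latency_ms(parse_debug(ACK_SAMPLE)))
```

Compare each delay with the negotiation timeout interval configured by the ppp timer negotiate command (3 seconds by default, which is 3000 ms).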
Troubleshooting Layer 3—IP services issues
ARP issues
ARP learning failure
Symptom
The device cannot learn ARP entries, causing traffic forwarding failure.
Common causes
The following are the common causes of this type of issue:
· The memory is insufficient.
· The physical layer state of the interface is not up.
· The IP addresses of the local interface and the peer interface do not reside on the same network segment.
· ARP packets fail to be sent to the CPU.
· A card is faulty.
· ARP packets are dropped due to a busy CPU.
Troubleshooting flow
Figure 39 shows the troubleshooting flowchart.
Figure 39 Flowchart for troubleshooting ARP learning failure
Solution
1. Use the display memory-threshold command to identify whether the memory is insufficient.
<Sysname> display memory-threshold
Memory usage threshold: 100%
Free-memory thresholds:
Minor: 96M
Severe: 64M
Critical: 48M
Normal: 128M
Early-warning: 256M
Secure: 304M
Current free-memory state: Normal (secure)
¡ If the Current free-memory state field displays Normal or Normal (secure), go to the next step.
¡ If the Current free-memory state field displays Minor, Severe, Critical, or Normal (early-warning), check the device memory usage and troubleshoot the insufficient memory issue.
2. Check the network configuration and interface state.
a. Use the display interface command to identify whether the interface is up. If the interface is not up, troubleshoot the issue.
b. Use the display fib ip-address command to view FIB entries. ip-address represents the IP address in an ARP entry. If the corresponding FIB entry does not exist, the routing module might be faulty. For more information about troubleshooting routing module issues, see "Troubleshooting Layer 3—IP Routing." If the corresponding FIB entry exists but the next hop address is not the address of the direct next hop, check the connection between the device and its next hop.
c. Use the display ip interface command to view the IP address of the interface.
- Identify whether the IP address of the local interface resides on the same network segment as the peer interface. If the IP addresses reside on different network segments, execute the ip address command in interface view to edit the IP addresses.
- Identify whether the local interface IP address conflicts with the peer interface IP address. If a conflict has occurred, execute the ip address command in interface view to edit the IP addresses.
- Identify whether the peer interface is the one where the next hop resides.
d. Use the ping command to identify whether a link failure exists.
3. Identify whether ARP packets are sent and received correctly.
a. Use the debugging arp packet command to enable ARP packet debugging. Then, execute the ping command to identify whether the device sends and receives ARP packets correctly.
<Sysname> debugging arp packet
<Sysname> ping -c 1 1.1.1.2
Ping 1.1.1.2 (1.1.1.2): 56 data bytes, press CTRL+C to break
56 bytes from 1.1.1.2: icmp_seq=0 ttl=255 time=2.511 ms
--- Ping statistics for 1.1.1.2 ---
1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 2.511/2.511/2.511/nan ms
<Sysname>*Apr 18 17:28:22:879 2022 Sysname ARP/7/ARP_SEND: -MDC=1; Sent an ARP message, operation: 1, sender MAC: 68cb-978f-0106, sender IP: 1.1.1.1, target MAC: 0000-0000-0000, target IP: 1.1.1.2
The command output indicates that the device has successfully sent an ARP request (operation 1) in which the sender IP address is 1.1.1.1 and the target IP address is 1.1.1.2.
*Apr 18 17:28:22:881 2022 Sysname ARP/7/ARP_RCV: -MDC=1; Received an ARP message, operation: 2, sender MAC: 68cb-9c3f-0206, sender IP: 1.1.1.2, target MAC: 68cb-978f-0106, target IP: 1.1.1.1
The command output indicates that the device has successfully received an ARP reply (operation 2) in which the sender IP address is 1.1.1.2 and the target IP address is 1.1.1.1.
- If the device has sent and received ARP packets successfully, go to step 4.
- If the device failed to send or receive an ARP packet, go to the next step.
b. Use the debugging arp error command to enable ARP error debugging. Identify the ARP sending or receiving failure cause according to Table 4.
Field | Description
Packet discarded for the network state of receiving interface is down. | An ARP packet was discarded because the network layer state of the receiving interface was down.
Packet discarded for the ARP packet is too short. | An ARP packet was discarded because the packet was too short.
Packet discarded for the ARP packet is error. | An ARP packet was discarded because the packet was an error packet.
Packet discarded for the link state of the port is down. | An ARP packet was discarded because the link layer state of the receiving port went down.
Packet discarded for the sender IP is invalid. | An ARP packet was discarded because the sender IP address in the packet was invalid.
Packet discarded for the sender IP is a broadcast IP. | An ARP packet was discarded because the sender IP address in the packet was a broadcast IP address.
Packet discarded for the target IP is invaild. | An ARP packet was discarded because the target IP address in the packet was invalid.
Packet discarded for the target IP is a broadcast IP. | An ARP packet was discarded because the target IP address in the packet was a broadcast IP address.
Failed to get the source MAC of the ARP reply. | ARP failed to obtain the source MAC address of an ARP reply.
Packet discarded for the source MAC is a multicast address. | An ARP packet was discarded because the source MAC address in the packet was a multicast MAC address.
Packet discarded for the source MAC is a broadcast address. | An ARP packet was discarded because the source MAC address in the packet was a broadcast MAC address.
Packet discarded for the sender MAC address is the same as the receiving interface. | An ARP packet was discarded because the sender MAC address in the packet was the same as the MAC address of the receiving interface.
Packet discarded for the number of ARP entries reaches the limit. | An ARP packet was discarded because the maximum number of ARP entries was reached.
Packet discarded for the type of receiving interface is L2VE. | An ARP packet was discarded because the receiving interface of the packet was an L2VE interface.
Packet discarded for conflict with static entry. | An ARP packet was discarded because the ARP information in the packet conflicted with a static ARP entry.
Packet discarded for memory alarm notification. | An ARP packet was discarded because a memory alarm notification was received.
Packet discarded for insufficient resources. | An ARP packet was discarded because of insufficient resources.
4. Identify whether a card is faulty. The following uses the card in slot 1 as an example. Use the display system internal arp statistics command to view ARP statistics of the card.
<Sysname> system-view
[Sysname] probe
[Sysname-probe] display system internal arp statistics slot 1
Entry statistics:
Valid = 1 Dummy = 0
Long static = 0 Short resolved = 0
Multiport = 0 L3 short = 0
Packet = 1 OpenFlow = 0
Rule = 0 ARP input = 175
Resolved = 10
Static statistics:
Short static = 0 Long static = 0
Multiport = 0 Disabled = 0
Error statistics:
Memory = 0 Sync memory = 0
Packet = 10 Parameter = 0
IF = 0 Walk = 0
Add host route = 0 Del host route = 0
Local address = 0 Real time message = 0
Refresh rule = 0 Delete rule = 0
Smooth rule start = 0 Smooth rule end = 0
Running information:
Max ARP = 2048 Max multiport = 64
Default blackhole = 1 Max blackhole = 200
Timer queue = 0 Event queue = 0
Packet queue = 0 LIPC send queue = 0/0/0
a. If the value for the ARP input field is not 0, go to the next step. If the value for the ARP input field is 0, troubleshoot the card issue.
b. Collect the content of the Error statistics field and send it to H3C technical support staff.
5. Identify whether ARP packets are dropped due to a busy CPU. Use the view command to view information about ARP in the /proc/kque system directory and identify whether ARP packets are dropped and why.
[Sysname-probe] view /proc/kque | in ARP
0: dd0e0800 ARP_TIMER 128/0/13/0 (0x4b515545)
0: dd0e0900 ARP_SINGLEEVENT 1/0/0/0 (0x4b515545)
0: dd0e0a00 ARP_SEND 1024/0/0/0 (0x4b515545)
0: dd0e0b00 ARP_RULE 4096/0/0/0 (0x4b515545)
0: dd0e0c00 ARP_RULE_ENTRY 4096/0/0/0 (0x4b515545)
0: dd0e0d00 ARP_RBHASHNOTIFY 1/0/0/0 (0x4b515545)
0: dd0e0f00 ARP_DTC 2048/0/0/0 (0x4b515545)
0: dd0e6200 ARP_MICROSEGMENT 2048/0/0/0 (0x4b515545)
0: dd0e6300 ARP_MACNOTIFY 4096/0/0/0 (0x4b515545)
0: dd0e6400 ARP_UNKNOWNSMAC_EVENT 1/0/0/0 (0x4b515545)
0: d06e5900 ARPSNP_PKT 4096/0/0/0 (0x4b515545)
0: d06e5a00 ARP_VSISUP_PKT 4096/0/0/0 (0x4b515545)
0: d06e5b00 ARP_EVENT 8192/0/2/0 (0x4b515545)
0: d06e5c00 ARP_FREQEVENT 8192/0/1/0 (0x4b515545)
0: d06e5d00 ARP_MACNOTIFYEVENT 1/0/0/0 (0x4b515545)
0: d06e5e00 ARP_PKT 4096/0/2/0 (0x4b515545)
0: ca5f3400 FIBARPHRQ 1/0/0/0 (0x4b515545)
View the value for the ARP_PKT field in the command output, which is displayed in the W/X/Y/Z format.
¡ W represents the queue capacity, which is a fixed value.
¡ X represents the current queue length.
¡ Y represents the history maximum length of the queue.
¡ Z represents the number of dropped ARP packets in the queue.
If Z is not 0 and Y equals W, ARP packets are dropped due to a busy CPU. If Z is 0, go to the next step.
6. Collect detailed information about the ARP process. Execute the display mdc command to obtain the MDC number. Use the display process command to view the number of the ARP process corresponding to the MDC number. Based on the process number, use the view command to obtain detailed information about the ARP process and send it to H3C technical support staff.
[Sysname-probe] display process name karp/1
Job ID: 224
PID: 224
Parent JID: 2
Parent PID: 2
Executable path: -
Instance: 0
Respawn: OFF
Respawn count: 1
Max. spawns per minute: 0
Last started: Mon Apr 18 15:09:58 2022
Process state: sleeping
Max. core: 0
ARGS: -
TID LAST_CPU Stack PRI State HH:MM:SS:MSEC Name
224 0 0K 115 S 0:5:25:380 [karp/1]
1 in the karp/1 argument represents the MDC number. PID in the command output represents the number of the ARP process. Execute the view command to display detailed information about the ARP process numbered 224.
[Sysname-probe] view /proc/224/stack
[<c04c9cd4>] kepoll_wait+0x274/0x3c0
[<e1fb1372>] arp_Thread+0x42/0xd0 [system]
[<c043f1b4>] kthread+0xd4/0xe0
[<c0401daf>] kernel_thread_helper+0x7/0x10
[<ffffffff>] 0xffffffff
7. Collect the following information and contact H3C Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
ARP response failure
Symptom
The device does not reply to the ARP request sent from the peer device.
Common causes
The following are the common causes of this type of issue:
· The target IP address in the ARP request received by the interface is not the IP address of the local device.
· The ARP request sent by the peer device triggers source MAC-based ARP attack detection on the local device.
· The ARP request sent by the peer device triggers ARP attack detection on the local device.
Troubleshooting flow
Figure 40 shows the troubleshooting flowchart.
Figure 40 Flowchart for troubleshooting ARP response failure
Solution
1. View information about the ARP request sent from the peer device to identify whether it is sent to the CPU.
a. Use the debugging arp packet command to enable ARP packet debugging. Then, configure the peer device to send an ARP request to the local device.
<Sysname> debugging arp packet
<Sysname> *Apr 21 17:38:05:489 2022 Sysname ARP/7/ARP_RCV: -MDC=1; Received an ARP message, operation: 1, sender MAC: 68cb-9c3f-0206, sender IP: 1.1.1.2, target MAC: 0000-0000-0000, target IP: 1.1.1.1
- If the target IP address is not the local device IP address, check the routing table and FIB of the peer device.
- If the target IP address is the local device IP address, go to the next step.
b. Use the debugging arp error command to enable ARP error debugging. Identify the ARP response failure cause according to Table 5.
Field |
Description |
Packet discarded for the network state of receiving interface is down. |
An ARP packet was discarded because the network layer state of the receiving interface was down. |
Packet discarded for the ARP packet is too short. |
An ARP packet was discarded because the packet was too short. |
Packet discarded for the ARP packet is error. |
An ARP packet was discarded because the packet was an error packet. |
Packet discarded for the link state of the port is down. |
An ARP packet was discarded because the link layer state of the receiving port went down. |
Packet discarded for the sender IP is invalid. |
An ARP packet was discarded because the sender IP address in the packet was invalid. |
Packet discarded for the sender IP is a broadcast IP. |
An ARP packet was discarded because the sender IP address in the packet was a broadcast IP address. |
Packet discarded for the target IP is invaild. |
An ARP packet was discarded because the target IP address in the packet was invalid. |
Packet discarded for the target IP is a broadcast IP. |
An ARP packet was discarded because the target IP address in the packet was a broadcast IP address. |
Failed to get the source MAC of the ARP reply. |
ARP failed to obtain the source MAC address of an ARP reply. |
Packet discarded for the source MAC is a multicast address. |
An ARP packet was discarded because the source MAC address in the packet was a multicast MAC address. |
Packet discarded for the source MAC is a broadcast address. |
An ARP packet was discarded because the source MAC address in the packet was a broadcast MAC address. |
Packet discarded for the sender MAC address is the same as the receiving interface. |
An ARP packet was discarded because the sender MAC address in the packet was the same as the MAC address of the receiving interface. |
Packet discarded for the number of ARP entries reaches the limit. |
An ARP packet was discarded because the maximum number of ARP entries was reached. |
Packet discarded for the type of receiving interface is L2VE. |
An ARP packet was discarded because the receiving interface of the packet was an L2VE interface. |
Packet discarded for conflict with static entry. |
An ARP packet was discarded because the ARP information in the packet conflicted with a static ARP entry. |
Packet discarded for memory alarm notification. |
An ARP packet was discarded because a memory alarm notification was received. |
Packet discarded for insufficient resources. |
An ARP packet was discarded because of insufficient resources. |
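The target IP address check in step 1 can be automated when the debugging output is captured to a file. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, and the regular expression assumes the ARP_RCV message format shown in the sample above. Operation 1 identifies an ARP request.

```python
import re

# Assumed field layout of an ARP/7/ARP_RCV debugging message.
ARP_MSG = re.compile(
    r"operation: (\d+), sender MAC: ([0-9a-f-]+), sender IP: ([\d.]+), "
    r"target MAC: ([0-9a-f-]+), target IP: ([\d.]+)"
)


def parse_arp_debug(line):
    """Extract the ARP fields from one debugging line, or return None."""
    m = ARP_MSG.search(line)
    if m is None:
        return None
    return {
        "operation": int(m.group(1)),
        "sender_mac": m.group(2),
        "sender_ip": m.group(3),
        "target_mac": m.group(4),
        "target_ip": m.group(5),
    }


def request_targets_local(line, local_ip):
    """True if the line is an ARP request whose target IP is local_ip."""
    fields = parse_arp_debug(line)
    return (
        fields is not None
        and fields["operation"] == 1
        and fields["target_ip"] == local_ip
    )
```

If the function returns False for a request that the peer device sent, the target IP address is not the local address, and the routing table and FIB of the peer device are the next place to look.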
2. Identify whether the peer device MAC address is in a source MAC-based ARP attack entry. The following uses local interface GigabitEthernet2/0/1 as an example. Execute the display arp source-mac command to display ARP attack entries detected by source MAC-based ARP attack detection.
<Sysname> display arp source-mac interface gigabitethernet 2/0/1
Source-MAC VLAN/VSI name Interface Aging-time (sec)
23f3-1122-3344 4094 GE2/0/1 10
¡ If a source MAC-based ARP attack entry exists and its MAC address is the peer device MAC address, use the arp source-mac threshold command to adjust the threshold for source MAC-based ARP attack detection as required.
¡ If the peer device MAC address is not in any source MAC-based ARP attack entry, go to the next step.
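The lookup in step 2 amounts to checking whether the peer MAC address appears in the first column of the display arp source-mac output. The following Python sketch shows one way to do that on captured output. The function names are hypothetical, and the parser assumes that data rows start with a MAC address in xxxx-xxxx-xxxx notation, as in the sample above.

```python
import re

MAC = re.compile(r"^[0-9a-fA-F]{4}(-[0-9a-fA-F]{4}){2}$")


def attack_entry_macs(output):
    """Collect the source MAC addresses listed in the attack entries."""
    macs = set()
    for line in output.splitlines():
        fields = line.split()
        # Header and blank lines do not start with a MAC address.
        if fields and MAC.match(fields[0]):
            macs.add(fields[0].lower())
    return macs


def peer_in_attack_entries(output, peer_mac):
    """True if peer_mac is recorded as a source MAC-based attack source."""
    return peer_mac.lower() in attack_entry_macs(output)
```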
3. Identify whether the peer device triggers ARP attack detection. The following uses slot 1 as an example. Execute the display arp detection statistics attack-source command to display statistics for ARP attack sources.
<Sysname> display arp detection statistics attack-source slot 1
Interface VLAN MAC address IP address Number Time
GE2/0/1 1 0005-0001-0001 10.1.1.14 24 17:09:56
03-27-2017
¡ If an entry has the peer device MAC address, check the ARP attack detection configuration to identify whether inappropriate configuration causes the peer device to trigger ARP attack detection. If the configuration is inappropriate, edit it.
¡ If no entry has the peer device MAC address, go to the next step.
4. Use the display arp detection statistics packet-drop command to display statistics for packets dropped by ARP attack detection. Identify the reason why ARP attack detection is triggered according to the statistics.
<Sysname> display arp detection statistics packet-drop
State: U-Untrusted T-Trusted
ARP packets dropped by ARP inspect checking:
Interface/AC(State) IP Src-MAC Dst-MAC Inspect
GE2/0/1(U) 40 0 0 78
GE2/0/2(U) 0 0 0 0
GE2/0/3(T) 0 0 0 0
GE2/0/4(U) 0 0 30 0
GE2/0/5-srv1(U) 0 10 20 0
GE2/0/5-srv2(T) 10 0 20 22
Table 6 Command output
Field |
Description |
State |
State of an interface: · U—ARP untrusted interface or AC. · T—ARP trusted interface or AC. |
Interface/AC(State) |
Inbound interface or AC of ARP packets. State specifies the port or AC state, which is trusted or untrusted. |
IP |
Number of ARP packets discarded due to invalid sender and target IP addresses. |
Src-MAC |
Number of ARP packets discarded due to invalid source MAC address. |
Dst-MAC |
Number of ARP packets discarded due to invalid destination MAC address. |
Inspect |
Number of ARP packets that failed to pass user validity check. |
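To see quickly which check is discarding packets on which interface, the counters described in Table 6 can be filtered for nonzero values. The following Python sketch is illustrative: the function name is hypothetical, and the row pattern assumes the column order shown in the sample output (IP, Src-MAC, Dst-MAC, Inspect).

```python
import re

COLUMNS = ("IP", "Src-MAC", "Dst-MAC", "Inspect")

# Assumed row layout: "GE2/0/1(U) 40 0 0 78"
ROW = re.compile(r"^\s*(\S+)\(([UT])\)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def nonzero_drops(output):
    """Map each interface or AC to its nonzero drop counters."""
    result = {}
    for line in output.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header lines and the state legend are skipped
        counts = [int(v) for v in m.groups()[2:]]
        drops = {name: n for name, n in zip(COLUMNS, counts) if n}
        if drops:
            result[m.group(1)] = drops
    return result
```

For the sample output, GE2/0/1 is reported with nonzero IP and Inspect counters, which points to the IP address validity check and the user validity check as the checks to review on that interface.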
5. Use the display system internal arp statistics command to display ARP statistics on each card. Collect the content of the Error statistics field and send it to H3C technical support staff.
[Sysname-probe] display system internal arp statistics slot 1
Entry statistics:
Valid = 1 Dummy = 0
Long static = 0 Short resolved = 0
Multiport = 0 L3 short = 0
Packet = 1 OpenFlow = 0
Rule = 0 ARP input = 175
Resolved = 10
Static statistics:
Short static = 0 Long static = 0
Multiport = 0 Disabled = 0
Error statistics:
Memory = 0 Sync memory = 0
Packet = 10 Parameter = 0
IF = 0 Walk = 0
Add host route = 0 Del host route = 0
Local address = 0 Real time message = 0
Refresh rule = 0 Delete rule = 0
Smooth rule start = 0 Smooth rule end = 0
Running information:
Max ARP = 2048 Max multiport = 64
Default blackhole = 1 Max blackhole = 200
Timer queue = 0 Event queue = 0
Packet queue = 0 LIPC send queue = 0/0/0
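Before sending the Error statistics field to technical support, it can help to note which counters are nonzero. The following Python sketch extracts them from captured output. It is a sketch under assumptions: the function names are hypothetical, and the parser assumes the "name = value" layout and the section headings shown in the sample above.

```python
import re

# One "name = value" pair; names can contain spaces, such as "Add host route".
PAIR = re.compile(r"([A-Za-z][A-Za-z0-9 ]*?)\s*=\s*(\d+)")


def error_counters(output):
    """Return the counters listed under the 'Error statistics:' heading."""
    counters = {}
    in_errors = False
    for line in output.splitlines():
        stripped = line.strip()
        # A section heading ends with ':' and holds no 'name = value' pair.
        if stripped.endswith(":") and "=" not in stripped:
            in_errors = stripped == "Error statistics:"
            continue
        if in_errors:
            for name, value in PAIR.findall(stripped):
                counters[name.strip()] = int(value)
    return counters


def nonzero_errors(output):
    """Keep only the error counters that are not 0."""
    return {k: v for k, v in error_counters(output).items() if v}
```

For the sample output, the only nonzero error counter is Packet, with a value of 10.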
6. Use the debugging arp entry command to enable ARP entry debugging. View the ARP entry status, collect related logs, and send them to H3C technical support staff.
<Sysname> debugging arp entry
<Sysname> ping -c 1 192.168.111.188
PING 192.168.111.188 (192.168.111.188): 56 data bytes, press CTRL_C to break
56 bytes from 192.168.111.188: icmp_seq=0 ttl=128 time=1.000 ms
--- 192.168.111.188 ping statistics ---
1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 1.000/1.000/1.000/0.000 ms
*Dec 17 14:28:34:762 2012 Sysname ARP/7/ARP_ENTRY: -MDC=1; ARP entry status changed: MAC address: 000a-eb83-691e, IP address: 192.168.111.188, INITIALIZE -> NO_AGE
Table 7 Command output
Field |
Description |
ARP entry status changed |
The status of an ARP entry changed. |
MAC address |
MAC address in the ARP entry. |
IP address |
IP address in the ARP entry. |
state1->state2 |
The status of the ARP entry changed from state1 to state2. An ARP entry has the following status: · INITIALIZE—The ARP entry is not resolved. · NO_AGE—The ARP entry does not age out. · AGING—Aging probe for the ARP entry has started. · AGED—The ARP entry ages out and is to be deleted. |
7. Collect the following information and contact H3C Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Traffic forwarding failure based on the existing ARP entry
Symptom
The device has learned an ARP entry but cannot forward traffic correctly.
Common causes
The following are the common causes of this type of issue:
· An abnormal parameter exists in the learned ARP entry.
· The learned ARP entry failed to be deployed to the driver.
Troubleshooting flow
Figure 41 shows the troubleshooting flowchart.
Figure 41 Flowchart for troubleshooting traffic forwarding failure based on the existing ARP entry
Solution
1. Identify whether an abnormal parameter exists in the learned ARP entry. Use the display system internal adj4 entry command to view ARP entry information. The following uses interface GigabitEthernet2/0/1 and peer IP address 1.1.1.2 as an example.
<Sysname> system-view
[Sysname] probe
[Sysname-probe] display system internal adj4 entry 1.1.1.2 interface gigabitethernet 2/0/1
ADJ4 entry:
Entry attribute : 0x0
Service type : Ethernet
Link media type : Broadcast
Action type : Forwarding
Entry flag : 0x0
Forward type : 0x0
Slot : 0
MTU : 1500
Driver flag : 2
Sequence No : 17
Physical interface : GE2/0/1
Logical interface : N/A
Virtual circuit information : 65535
ADJ index : 0xdc731e70
Peer address : 0.0.0.0
Reference count : 0
Reference Sequence : 9
MicroSegmentID : 0
Nexthop driver[0] : 0xffffffff
Nexthop driver[1] : 0xffffffff
Driver context[0] : 0xffffffff
Driver context[1] : 0xffffffff
Driver context[2] : 0xffffffff
Driver context[3] : 0xffffffff
Driver context[4] : 0xffffffff
Driver context[5] : 0xffffffff
Link head information(IP) : 68cb9c3f020668cb978f01060800
Link head information(MPLS) : 68cb9c3f020668cb978f01068847
¡ If the Action type field displays Forwarding, the device forwards traffic from 1.1.1.2 correctly and the device is not faulty.
¡ If the Action type field displays Drop, the device fails to forward traffic from 1.1.1.2. An abnormal parameter exists in the learned ARP entry.
- If the Driver flag field displays 4, driver resources are insufficient. Check the driver usage.
- If the Driver flag field does not display 4, go to the next step.
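The decision in step 1 depends on two fields of the adj4 entry. The following Python sketch encodes that decision for captured output. The function names and the returned labels are hypothetical, and the parser assumes the "Field : value" layout shown in the sample above.

```python
def parse_fields(output):
    """Split 'Field : value' lines into a dictionary."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diagnose_adj4(fields):
    """Apply the Action type and Driver flag rules from this step."""
    action = fields.get("Action type")
    if action == "Forwarding":
        return "forwarding-normal"
    if action == "Drop":
        if fields.get("Driver flag") == "4":
            return "driver-resources-insufficient"
        return "go-to-next-step"
    # Any other value is outside the cases this step describes.
    return "unknown"
```

For the sample entry, Action type is Forwarding, so the function returns "forwarding-normal".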
2. Identify whether the ARP entry is successfully deployed to the driver. Use the debugging system internal adj4 command and specify the hardware keyword to enable IPv4 adjacency entry debugging. Use the reset arp command to clear ARP entries from the ARP table. Then, use the ping command to send a packet to the peer device to trigger ARP learning. View the state of ARP deployment to the driver.
<Sysname> system-view
[Sysname] probe
[Sysname-probe] debugging system internal adj4 hardware
[Sysname-probe] ping 1.1.1.2
Ping 1.1.1.2 (1.1.1.2): 56 data bytes, press CTRL+C to break
56 bytes from 1.1.1.2: icmp_seq=0 ttl=255 time=2.015 ms
*Apr 22 15:57:56:173 2022 Sysname ARP/7/ARP_SEND: -MDC=1; Sent an ARP message, operation: 1, sender MAC: 68cb-978f-0106, sender IP: 1.1.1.1, target MAC: 0000-0000-0000, target IP: 1.1.1.2
*Apr 22 15:57:56:173 2022 Sysname ARP/7/ARP_RCV: -MDC=1; Received an ARP message, operation: 2, sender MAC: 68cb-9c3f-0206, sender IP: 1.1.1.2, target MAC: 68cb-978f-0106, target IP: 1.1.1.1
*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_ENTRY: -MDC=1;
-------------ADJ4 Entry------------
IP address : 1.1.1.2
Route interface : GE2/0/1
Service type : Ethernet
Action type : Forwarding
Link media type : Broadcast
Physical interface : GE2/0/1
Logical interface : N/A
VSI Index : 4294967295
VPN Index : 0
MicroSegmentID : 0
MicSegOrigin : 5
Virtual Circuit information : 0xffff
Sequence : 1
Sequence for aging : 1
Slot : 0
MTU : 1500
*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_ENTRY: -MDC=1;
Add ADJ entry finished, Result : 0
*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_HARDWARE: -MDC=1;
====Start ADJLINK Add====
*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_HARDWARE: -MDC=1;
--------------- New Entry -------------
Service type : Ethernet
Link media type : Broadcast
Action type : Forwarding
EntryAttr : 0
IP address : 1.1.1.2
Route interface : GE2/0/1
Port interface : N/A
Slot : 0
MTU : 1500
VLAN ID : 65535
Second VLAN ID : 65535
Physical interface : GE2/0/1
Logical interface : N/A
VRF index : 0
VSI index : -1
VSI link ID : 65535
Usr ID : -1
MAC address : 68cb-9c3f-0206
Link head length(IP) : 14
Link head length(MPLS) : 14
Link head information(IP) : 68cb9c3f020668cb978f01060800
Link head information(MPLS) : 68cb9c3f020668cb978f01068847
*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_HARDWARE: -MDC=1;
----------- New Entry DrvContext ---------
Nexthop driver
[0]: 0xffffffff [1]: 0xffffffff
Driver context
[0]: 0xffffffff [1]: 0xffffffff [2]: 0xffffffff [3]: 0xffffffff [4]: 0xffffffff [5]: 0xffffffff
TRILL VN driver context
[0]: 0xffffffffffffffff [1]: 0xffffffffffffffff
*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_HARDWARE: -MDC=1;
====End ADJLINK Operate====
Result : 0x0, Reference flag : 0x0, Syn flag : 0x0
56 bytes from 1.1.1.2: icmp_seq=1 ttl=255 time=1.061 ms
56 bytes from 1.1.1.2: icmp_seq=2 ttl=255 time=0.908 ms
56 bytes from 1.1.1.2: icmp_seq=3 ttl=255 time=0.625 ms
56 bytes from 1.1.1.2: icmp_seq=4 ttl=255 time=0.580 ms
--- Ping statistics for 1.1.1.2 ---
5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.580/1.038/2.015/0.520 ms
[Sysname-probe]%Apr 22 15:57:56:986 2022 Sysname PING/6/PING_STATISTICS: -MDC=1; Ping statistics for 1.1.1.2: 5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss, round-trip min/avg/max/std-dev = 0.580/1.038/2.015/0.520 ms.
¡ If the Result field displays 0x0, the ARP entry has been successfully deployed to the driver. Go to the next step.
¡ If the Result field does not display 0x0, the ARP entry failed to be deployed to the driver. Check the hardware resource usage under the guidance of H3C technical support staff.
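The debugging output contains two different Result fields: one in the ADJ4_ENTRY message ("Add ADJ entry finished") and one after the "End ADJLINK Operate" marker. The sample suggests that the driver deployment result is the second one. The following Python sketch picks out only that value. The function names are hypothetical, and the sketch assumes that the Result line directly follows the marker line, as in the sample.

```python
import re

RESULT = re.compile(r"Result\s*:\s*(0x[0-9a-fA-F]+)")


def driver_results(log):
    """Return the Result values that follow 'End ADJLINK Operate' markers."""
    results = []
    expect_result = False
    for line in log.splitlines():
        if "End ADJLINK Operate" in line:
            expect_result = True
            continue
        if expect_result:
            m = RESULT.search(line)
            if m:
                results.append(int(m.group(1), 16))
            expect_result = False
    return results


def deployed_to_driver(log):
    """True if at least one result was found and every result is 0x0."""
    results = driver_results(log)
    return bool(results) and all(value == 0 for value in results)
```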
3. Execute the following commands, collect the command outputs, and send them to H3C technical support staff:
¡ debugging system internal adj4 (with the notify keyword specified)
¡ debugging system internal fib prefix
4. Collect the following information and contact H3C Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
DHCP issues
· This section describes how to troubleshoot attack protection issues on the DHCP server.
· For details about how the DHCP attack prevention features are implemented, see DHCP Attack Protection Technology White Paper.
DHCP starvation attack prevention issues
About DHCP starvation attack prevention
A DHCP starvation attack occurs when an attacker constantly sends forged DHCP requests using different MAC addresses in the chaddr field to a DHCP server. As a result, legitimate DHCP clients cannot obtain IP addresses, because the IP address resources of the DHCP server are exhausted. To resolve this issue, enable the DHCP starvation attack prevention feature for the DHCP server.
Symptom
· Although the DHCP starvation attack prevention feature is enabled, the DHCP server still frequently runs out of IP address resources.
· A legitimate user cannot obtain any IP address from the DHCP server, because its requests are regarded as attack packets.
Common causes
The following are the common causes of this type of issue:
· The DHCP starvation attack prevention feature is not enabled on the client-facing interfaces of the DHCP server.
· When multiple DHCP relay agents exist between a DHCP client and the DHCP server, the DHCP server or non-first-hop relay agents are enabled with the MAC address check feature.
· The maximum number of ARP entries or MAC addresses that a client-facing interface can learn is unreasonable.
Troubleshooting flow
Figure 42 shows the troubleshooting flowchart.
Figure 42 Flowchart for troubleshooting DHCP starvation attack prevention issues
Solution
1. Check whether the DHCP starvation attack prevention feature is enabled on the client-facing interfaces of the DHCP server.
NOTE: Take this step when DHCP clients are directly connected to the DHCP server. If DHCP clients are connected to a DHCP relay agent, proceed to step 2.
For better protection, configure the DHCP server to prevent both types of DHCP starvation attacks: attacks that use DHCP requests with different MAC addresses, and attacks that use DHCP requests with the same MAC address.
To achieve DHCP starvation attack prevention against DHCP requests with different MAC addresses:
¡ For a Layer 3 interface, use the arp max-learning-num command in Layer 3 interface view to set an ARP entry learning limit.
¡ For a Layer 2 interface, perform the following operations in Layer 2 interface view:
- Use the mac-address max-mac-count command to set a MAC learning limit.
- Use the undo mac-address max-mac-count enable-forwarding command to disable forwarding unknown frames received on the interface after the MAC learning limit on the interface is reached.
You can use the display this command to view the configuration of a client-facing interface on the DHCP server.
¡ Display Layer 3 interface configuration.
<Sysname> system-view
[Sysname] interface GigabitEthernet 2/0/1
[Sysname-GigabitEthernet2/0/1] display this
#
interface GigabitEthernet2/0/1
port link-mode route
arp max-learning-num 10
...
If no ARP entry limit is configured on the interface, use the arp max-learning-num command in Layer 3 interface view to set an ARP entry learning limit.
¡ Display Layer 2 interface configuration.
<Sysname> system-view
[Sysname] interface GigabitEthernet 2/0/1
[Sysname-GigabitEthernet2/0/1] display this
#
interface GigabitEthernet2/0/1
port link-mode bridge
mac-address max-mac-count 600
undo mac-address max-mac-count enable-forwarding
...
If the interface does not have any configuration about DHCP starvation attack prevention, perform the following operations in Layer 2 interface view:
- Use the mac-address max-mac-count command to set a MAC learning limit.
- Use the undo mac-address max-mac-count enable-forwarding command to disable forwarding unknown frames received on the interface after the MAC learning limit on the interface is reached.
To achieve DHCP starvation attack prevention against DHCP requests with the same MAC address, use the dhcp relay check mac-address command to enable MAC address check on all client-facing interfaces. The MAC address check feature enables the DHCP server to compare the chaddr field of a received DHCP request with the source MAC address in the frame header. If they are the same, the DHCP server considers the request valid and continues processing it. If they are not the same, the DHCP server discards the request.
You can use the display this command to check whether the MAC address check feature is enabled on a client-facing interface of the DHCP server.
<Sysname> system-view
[Sysname] interface GigabitEthernet 2/0/1
[Sysname-GigabitEthernet2/0/1] display this
#
interface GigabitEthernet2/0/1
port link-mode route
dhcp relay check mac-address
...
If the MAC address check feature is not enabled, use the dhcp relay check mac-address command to enable this feature on the interface.
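The MAC address check described in this step is a comparison of two values: the source MAC address in the frame header and the chaddr field in the DHCP request. The following Python sketch models only that comparison, to show which requests pass and which are discarded. It is not the device implementation, and the function names are hypothetical.

```python
HEX_DIGITS = set("0123456789abcdef")


def normalize_mac(mac):
    """Reduce a MAC address in any common notation to 12 hex digits."""
    digits = "".join(c for c in mac.lower() if c in HEX_DIGITS)
    if len(digits) != 12:
        raise ValueError("not a MAC address: %r" % mac)
    return digits


def passes_mac_check(frame_source_mac, chaddr):
    """True if the frame source MAC address equals the chaddr field."""
    return normalize_mac(frame_source_mac) == normalize_mac(chaddr)
```

A request sent directly by a client passes, because both values are the client MAC address. A forged request that keeps one source MAC address but varies chaddr fails. A request that a Layer 3 device has forwarded also fails, because the frame source MAC address is then the MAC address of the forwarding device, which is why the check must be enabled only where clients attach directly.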
2. Check whether the DHCP starvation attack prevention feature is configured correctly on the DHCP server or DHCP relay agent.
NOTE: Take this step when a DHCP client is connected to a DHCP relay agent for communication with the DHCP server. If no DHCP relay agent is deployed on the network, skip this step.
a. Check whether an ARP entry learning limit or MAC learning limit is configured on the client-facing interfaces of the DHCP relay agent or the DHCP server. The check process is similar to that in step 1.
b. Check whether the DHCP server or non-first-hop relay agents are enabled with the MAC address check feature.
When a Layer 3 device forwards a DHCP request to the DHCP server, the Layer 3 device replaces the source MAC address of the DHCP request with its own MAC address, so the source MAC address no longer matches the chaddr field. If the MAC address check feature is enabled on the DHCP server or on a non-first-hop DHCP relay agent, that device treats the forwarded request as an attack packet and discards it.
When multiple DHCP relay agents exist between a DHCP client and the DHCP server, follow these guidelines as a best practice:
- Disable the MAC address check feature on the client-facing interfaces of the DHCP server and non-first-hop DHCP relay agents.
To disable the MAC address check feature on a client-facing interface of the DHCP server or of a non-first-hop DHCP relay agent, use the undo dhcp relay check mac-address command.
- Enable the MAC address check feature only on the client-facing interfaces of the first-hop DHCP relay agent.
For more information about how to check whether the MAC address check feature is enabled on a DHCP relay agent, see step 1.
3. Check whether the maximum number of ARP entries or MAC addresses that a client-facing interface can learn is unreasonable.
You can use the display this command in any view of the DHCP server to view the ARP entry learning limit or MAC learning limit on a client-facing interface.
¡ Display Layer 3 interface configuration.
<Sysname> system-view
[Sysname] interface GigabitEthernet 2/0/1
[Sysname-GigabitEthernet2/0/1] display this
#
interface GigabitEthernet2/0/1
port link-mode route
arp max-learning-num 10
...
If no ARP entry limit is configured on the interface, use the arp max-learning-num command in Layer 3 interface view to set an ARP entry learning limit.
¡ Display Layer 2 interface configuration.
<Sysname> system-view
[Sysname] interface GigabitEthernet 2/0/1
[Sysname-GigabitEthernet2/0/1] display this
#
interface GigabitEthernet2/0/1
port link-mode bridge
mac-address max-mac-count 600
...
If the ARP entry learning limit or MAC learning limit is much greater than the number of assignable IP addresses on the DHCP server, the limit does not stop an attacker from exhausting the address pool, and many users will fail to obtain IP addresses from the DHCP server. If the ARP entry learning limit or MAC learning limit is too small, the DHCP server might discard DHCP requests from legitimate users.
To ensure successful IP address acquisition and correct communication between legitimate users and the DHCP server, set a reasonable ARP entry learning limit or MAC learning limit. As a best practice, use the default ARP entry learning limit or MAC learning limit. If the default one cannot meet the service requirement, you can use the arp max-learning-num command or the mac-address max-mac-count command in interface view to set a new learning limit.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ The debugging output collected after you execute the debugging dhcp server all command.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
ND issues
ND learning failure
Symptom
The device cannot learn ND entries, causing traffic forwarding failure.
Common causes
The following are the common causes of this type of issue:
· The memory is insufficient.
· The physical layer state of the interface is not up.
· The IPv6 addresses of the local interface and the peer interface do not reside on the same network segment.
· ND packets fail to be sent to the CPU.
· A card is faulty.
· ND packets are dropped due to a busy CPU.
Troubleshooting flow
Figure 43 shows the troubleshooting flowchart.
Figure 43 Flowchart for troubleshooting ND learning failure
Solution
1. Use the display memory-threshold command to identify whether the memory is insufficient.
<Sysname> display memory-threshold
Memory usage threshold: 100%
Free-memory thresholds:
Minor: 96M
Severe: 64M
Critical: 48M
Normal: 128M
Early-warning: 256M
Secure: 304M
Current free-memory state: Normal (secure)
¡ If the Current free-memory state field displays Normal or Normal (secure), go to the next step.
¡ If the Current free-memory state field displays Minor, Severe, Critical, or Normal (early-warning), check the device memory usage and troubleshoot the insufficient memory issue.
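The branch in step 1 depends only on the Current free-memory state field. The following Python sketch reads that field from captured display memory-threshold output and applies the two cases listed above. The function names are hypothetical, and any state other than the six named in this step is reported as an error rather than guessed at.

```python
SUFFICIENT = {"Normal", "Normal (secure)"}
INSUFFICIENT = {"Minor", "Severe", "Critical", "Normal (early-warning)"}


def free_memory_state(output):
    """Return the value of the 'Current free-memory state' field, or None."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Current free-memory state":
            return value.strip()
    return None


def memory_is_sufficient(output):
    """True to go to the next step, False to troubleshoot memory usage."""
    state = free_memory_state(output)
    if state in SUFFICIENT:
        return True
    if state in INSUFFICIENT:
        return False
    raise ValueError("unexpected free-memory state: %r" % state)
```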
2. Check the network configuration and interface state.
a. Use the display interface command to identify whether the interface is up. If the interface is not up, troubleshoot the issue.
b. Use the display ipv6 fib ipv6-address command to view IPv6 FIB entry information. ipv6-address specifies the IPv6 address in an ND entry. If the corresponding IPv6 FIB entry does not exist, the routing module might be faulty. For more information about troubleshooting routing module issues, see "Troubleshooting Layer 3—IP Routing." If the corresponding IPv6 FIB entry exists but the next hop address is not the address of the direct next hop, check the connection between the device and its next hop.
c. Use the display ipv6 interface command to view the IPv6 address of the interface.
- Identify whether the IPv6 address of the local interface resides on the same network segment as the peer interface. If the IPv6 addresses reside on different network segments, execute the ipv6 address command in interface view to edit the IPv6 addresses.
- Identify whether the local interface IPv6 address conflicts with the peer interface IPv6 address. If a conflict has occurred, execute the ipv6 address command in interface view to edit the IPv6 addresses.
- Identify whether the peer interface is the one where the next hop resides.
d. Use the ping ipv6 command to identify whether a link failure exists.
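The network segment and address conflict checks in step 2 can be verified offline with the Python standard library ipaddress module. The following sketch is illustrative: the function names are hypothetical, and the /64 prefix length used in the usage note is an assumption, because this guide does not state the prefix length of the example addresses.

```python
import ipaddress


def on_same_segment(local_interface, peer_address):
    """True if peer_address falls within the prefix of local_interface.

    local_interface is an address with prefix length, such as "1::1/64".
    """
    interface = ipaddress.IPv6Interface(local_interface)
    return ipaddress.IPv6Address(peer_address) in interface.network


def addresses_conflict(local_interface, peer_address):
    """True if the local and peer interfaces use the same IPv6 address."""
    local = ipaddress.IPv6Interface(local_interface).ip
    return local == ipaddress.IPv6Address(peer_address)
```

For example, with a local interface address of 1::1/64, a peer address of 1::2 is on the same segment and does not conflict, whereas a peer address of 2::2 is on a different segment.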
3. Identify whether IPv6 packets are sent and received correctly.
a. Use the debugging ipv6 packet command to enable IPv6 packet debugging. Then, execute the ping ipv6 command to identify whether the device sends and receives IPv6 packets correctly.
<Sysname> debugging ipv6 packet
<Sysname> ping ipv6 -c 1 1::2
Ping6(56 data bytes) 1::1 --> 1::2, press CTRL+C to break
*Apr 26 11:37:33:402 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
LocalSending, version = 6, traffic class = 0,
flow label = 0, payload length = 64, protocol = 58, hop limit = 64,
Src = 1::1, Dst = 1::2,
prompt: Output an IPv6 Packet.
*Apr 26 11:37:33:402 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
Sending, interface = GigabitEthernet2/0/1, version = 6, traffic class = 0,
flow label = 0, payload length = 64, protocol = 58, hop limit = 64,
Src = 1::1, Dst = 1::2,
prompt: Sending the packet from local interface GigabitEthernet2/0/1.
The command output indicates that the device has successfully sent an IPv6 packet on interface GigabitEthernet2/0/1.
*Apr 26 11:37:33:402 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
LocalSending, version = 6, traffic class = 224,
flow label = 0, payload length = 32, protocol = 58, hop limit = 255,
Src = 1::1, Dst = ff02::1:ff00:2,
prompt: Output an IPv6 Packet.
*Apr 26 11:37:33:402 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
Sending, interface = GigabitEthernet2/0/1, version = 6, traffic class = 224,
flow label = 0, payload length = 32, protocol = 58, hop limit = 255,
Src = 1::1, Dst = ff02::1:ff00:2,
prompt: Sending the packet from local interface GigabitEthernet2/0/1.
56 bytes from 1::2, icmp_seq=0 hlim=64 time=19.336 ms
--- Ping6 statistics for 1::2 ---
1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 19.336/19.336/19.336/0.000 ms
<Sysname>*Apr 26 11:37:33:421 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
Receiving, interface = GigabitEthernet2/0/1, version = 6, traffic class = 0,
flow label = 0, payload length = 64, protocol = 58, hop limit = 64,
Src = 1::2, Dst = 1::1,
prompt: Received an IPv6 packet.
The command output indicates that the device has received an IPv6 packet.
*Apr 26 11:37:33:421 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
Delivering, interface = GigabitEthernet2/0/1, version = 6, traffic class = 0,
flow label = 0, payload length = 64, protocol = 58, hop limit = 64,
Src = 1::2, Dst = 1::1,
prompt: Delivering the IPv6 packet to the upper layer.
The command output indicates that the device sent the received IPv6 packet to the CPU.
%Apr 26 11:37:33:422 2022 Sysname PING/6/PING_STATISTICS: -MDC=1; Ping6 statistics for 1::2: 1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss, round-trip min/avg/max/std-dev = 19.336/19.336/19.336/0.000 ms.
- If the device has sent and received IPv6 packets successfully, go to the next step.
- If the device failed to send or receive an IPv6 packet, go to substep b to identify the cause.
b. Use the debugging ipv6 error command to enable IPv6 packet error debugging. Identify the IPv6 packet sending or receiving failure cause according to Table 8.
Field |
Description |
Number of IPv6 fragments exceeded the threshold. |
Number of IPv6 fragments exceeded the threshold. |
Number of IPv6 reassembly queues exceeded the threshold. |
Number of IPv6 reassembly queues exceeded the threshold. |
Invalid IPv6 packet. |
The IPv6 packet was invalid. |
Failed to process the hop-by-hop extension header. |
The system failed to process the hop-by-hop extension header. |
Failed to process the hop-by-hop option. |
The system failed to process the hop-by-hop option in the packet. |
The packet was discarded by services. |
The packet was discarded by the service. |
The packet was administratively discarded. |
The IPv6 packet was administratively discarded. |
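In the debugging output of step 3, one pair of messages is sent to ff02::1:ff00:2 with hop limit 255 instead of to 1::2. This is consistent with a neighbor solicitation, which ND sends to the solicited-node multicast address of the target. That address is formed by appending the low-order 24 bits of the target address to the prefix ff02::1:ff00:0/104. The following Python sketch shows the derivation, which is useful for recognizing ND traffic for a given neighbor in captured output. The function name is hypothetical.

```python
import ipaddress

# Solicited-node multicast prefix ff02::1:ff00:0/104.
SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node_address(unicast_address):
    """Return the solicited-node multicast address for a unicast address."""
    low_24_bits = int(ipaddress.IPv6Address(unicast_address)) & 0xFFFFFF
    return ipaddress.IPv6Address(SOLICITED_NODE_BASE | low_24_bits)
```

For the peer address 1::2 used in this example, the function returns ff02::1:ff00:2, which matches the Dst field in the debugging output.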
4. Identify whether a card is faulty. The following uses the card in slot 1 as an example. Use the display system internal nd statistics command to view ND statistics of the card.
<Sysname> system-view
[Sysname] probe
[Sysname-probe] display system internal nd statistics slot 1
Entry statistics:
Valid : 1 Dummy : 0
Packet : 1 OpenFlow : 0
Long static : 0 Short static : 0
Temp node : 0 Rule : 0
Static statistics:
Short : 0 Long interface : 0
Long port : 0
Process statistics:
Input : 7 Resolving : 11
Error statistics:
Memory : 0 Sync : 0
Packet : 0 Parameter : 0
Anchor : 0 Get address : 0
Refresh FIB : 0 Delete FIB : 0
Realtime Sync : 0 Temp node : 0
Exceed limit : 0 Refresh rule : 0
Delete rule : 0 Smooth rule start : 0
Smooth rule end : 0 RA : 0
Origin : 0 Final RA : 0
a. If the value for the Input field is not 0, go to the next step. If the value for the Input field is 0, troubleshoot the card issue.
b. Collect the content of the Error statistics field and send it to H3C technical support staff.
5. Identify whether ND packets are dropped due to a busy CPU. Use the view command to display the ND queues in /proc/kque, and determine whether ND packets are being dropped and why.
[Sysname-probe] view /proc/kque | in ND
0: dd0e0a00 ARP_SEND 1024/0/0/0 (0x4b515545)
0: dd0e6d00 ND_TIMER 1024/0/5/0 (0x4b515545)
0: dd0e6e00 ND_SINGLEEVENT 1/0/0/0 (0x4b515545)
0: dd0e6f00 ND_MACNOTIFYEVENT 1/0/0/0 (0x4b515545)
0: dcec4000 ND_RULE 4096/0/0/0 (0x4b515545)
0: dcec4200 ND_MICROSEGMENT 2048/0/0/0 (0x4b515545)
0: dcec4300 ND_MACNOTIFY 2048/0/0/0 (0x4b515545)
0: dcec4400 ND_MAC_EVENT 1/0/0/0 (0x4b515545)
0: d2da7800 OVERLAY_VNDEL 1/0/0/0 (0x4b515545)
0: ca5f3800 FIB6NDHRQ 1/0/0/0 (0x4b515545)
0: ca3f7600 ND_VSISUP_PKT 4096/0/0/0 (0x4b515545)
0: ca3f7400 NDSNP_PKT 4096/0/0/0 (0x4b515545)
0: ca3f7700 NDRAPG_PKT 4096/0/0/0 (0x4b515545)
0: ca3f7800 ND_EVENT 8192/0/1/0 (0x4b515545)
0: ca3f7900 ND_PKT 4096/0/1/0 (0x4b515545)
View the value for the ND_PKT field in the command output, which is displayed in the W/X/Y/Z format.
¡ W represents the queue capacity, which is a fixed value.
¡ X represents the current queue size.
¡ Y represents the history maximum length of the queue.
¡ Z represents the number of dropped ND packets in the queue.
If Z is not 0 and Y equals W, ND packets are dropped due to a busy CPU. If Z is 0, go to the next step.
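The W/X/Y/Z rule above can be applied mechanically to saved output. The following Python sketch is an illustration under the assumption that each queue line keeps the layout shown in the sample output (index, address, queue name, W/X/Y/Z). The function names are hypothetical.

```python
import re

# index: address NAME W/X/Y/Z (magic)
KQUE_LINE = re.compile(r"^\s*\d+:\s+\S+\s+(\S+)\s+(\d+)/(\d+)/(\d+)/(\d+)")

def parse_kque(output):
    """Map each queue name to its (W, X, Y, Z) counters."""
    queues = {}
    for line in output.splitlines():
        match = KQUE_LINE.match(line)
        if match:
            queues[match.group(1)] = tuple(int(match.group(i)) for i in range(2, 6))
    return queues

def dropped_by_busy_cpu(queues, name="ND_PKT"):
    """Apply the rule from the text: Z is not 0 and Y equals W."""
    capacity, _current, history_max, dropped = queues[name]
    return dropped != 0 and history_max == capacity
```

For the sample line `ND_PKT 4096/0/1/0`, the check returns False, because no packets were dropped and the queue never filled.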
6. Collect specific information about the ND process. Execute the display mdc command to view MDC information and obtain the MDC number. Use the display process command to obtain the process ID of the ND process that corresponds to the MDC number. Then, use the view command with that process ID to display the stack information of the ND process, and send the information to H3C Technical Support.
[Sysname-probe] display process name knd/1
Job ID: 55763
PID: 55763
Parent JID: 2
Parent PID: 2
Executable path: -
Instance: 0
Respawn: OFF
Respawn count: 1
Max. spawns per minute: 0
Last started: Tue Apr 26 11:32:31 2022
Process state: sleeping
Max. core: 0
ARGS: -
TID LAST_CPU Stack PRI State HH:MM:SS:MSEC Name
55763 0 0K 115 S 0:0:13:490 [kND/1]
The "1" in "knd/1" indicates that the MDC number is 1. In the displayed information above, the "PID" value shows that the process ID of the ND process is 55763. Next, execute the view command to display detailed information about the ND process with process ID 55763.
[Sysname-probe] view /proc/55763/stack
[<c04c9cd4>] kepoll_wait+0x274/0x3c0
[<e2021612>] nd_Thread+0x62/0x100 [system]
[<c043f1b4>] kthread+0xd4/0xe0
[<c0401daf>] kernel_thread_helper+0x7/0x10
[<ffffffff>] 0xffffffff
7. Collect the following information and contact H3C Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
NS packet response failure
Symptom
The device does not reply to the NS packet sent from the peer device.
Common causes
The following are the common causes of this type of issue:
· The destination IPv6 address in the NS packet received by the interface is not the IPv6 address of the local device.
Troubleshooting flow
Figure 44 shows the troubleshooting flowchart.
Figure 44 Flowchart for troubleshooting NS packet response failure
Solution
1. View information about the ND packet sent from the peer device to identify whether it is sent to the CPU.
a. Use the debugging ipv6 packet command to enable IPv6 packet debugging. Then, configure the peer device to send an NS packet to the local device.
<Sysname> debugging ipv6 packet
*Apr 26 13:33:34:897 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
Receiving, interface = GigabitEthernet2/0/1, version = 6, traffic class = 0,
flow label = 0, payload length = 64, protocol = 58, hop limit = 64,
Src = 1::2, Dst = 1::1,
prompt: Received an IPv6 packet.
- If the destination IPv6 address is not the IPv6 address of the local device, check the routing table and FIB of the peer device.
- If the destination IPv6 address is the IPv6 address of the local device, go to step b.
b. Use the debugging ipv6 error command to enable IPv6 packet error debugging. Identify the NS packet response failure cause according to Table 9.
Table 9 Output from the debugging ipv6 error command
Field |
Description |
Number of IPv6 fragments exceeded the threshold. |
Number of IPv6 fragments exceeded the threshold. |
Number of IPv6 reassembly queues exceeded the threshold. |
Number of IPv6 reassembly queues exceeded the threshold. |
Invalid IPv6 packet. |
The IPv6 packet was invalid. |
Failed to process the hop-by-hop extension header. |
The system failed to process the hop-by-hop extension header. |
Failed to process the hop-by-hop option. |
The system failed to process the hop-by-hop option in the packet. |
The packet was discarded by services. |
The packet was discarded by the service. |
The packet was administratively discarded. |
The IPv6 packet was administratively discarded. |
2. Use the display system internal nd statistics command to display ND statistics on each card and identify whether a card is faulty. The following uses the card in slot 1 as an example.
<Sysname> system-view
[Sysname] probe
[Sysname-probe] display system internal nd statistics slot 1
Entry statistics:
Valid : 1 Dummy : 0
Packet : 1 OpenFlow : 0
Long static : 0 Short static : 0
Temp node : 0 Rule : 0
Static statistics:
Short : 0 Long interface : 0
Long port : 0
Process statistics:
Input : 7 Resolving : 11
Error statistics:
Memory : 0 Sync : 0
Packet : 0 Parameter : 0
Anchor : 0 Get address : 0
Refresh FIB : 0 Delete FIB : 0
Realtime Sync : 0 Temp node : 0
Exceed limit : 0 Refresh rule : 0
Delete rule : 0 Smooth rule start : 0
Smooth rule end : 0 RA : 0
Origin : 0 Final RA : 0
¡ Check the Input field to identify whether the card receives ND packets correctly.
¡ Collect the content of the Error statistics field and send it to H3C technical support staff.
3. Collect the following information and contact H3C Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Traffic forwarding failure based on the existing ND entry
Symptom
The device has learned an ND entry but cannot forward traffic correctly.
Common causes
The following are the common causes of this type of issue:
· An abnormal parameter exists in the learned ND entry.
· The learned ND entry failed to be deployed to the driver.
Troubleshooting flow
Figure 45 shows the troubleshooting flowchart.
Figure 45 Flowchart for troubleshooting traffic forwarding failure based on the existing ND entry
Solution
1. Use the display system internal adj6 entry command to identify whether an abnormal parameter exists in the learned ND entry. The following uses interface GigabitEthernet2/0/1 and peer IPv6 address 1::2 as an example.
<Sysname> system-view
[Sysname] probe
[Sysname-probe] display system internal adj6 entry 1::2 interface gigabitethernet 2/0/1
ADJ6 entry:
Entry attribute : 0x0
Service type : Ethernet
Link media type : Broadcast
Action type : Forwarding
Entry flag : 0x4
Forward type : 0x0
Slot : 0
MTU : 1500
Driver flag : 2
Sequence No : 17
Physical interface : GE2/0/1
Logical interface : N/A
Virtual circuit information : 65535
ADJ index : 0xdc780c38
Peer address : ::
Reference count : 0
Reference Sequence : 3
MicroSegmentID : 0
Nexthop driver[0] : 0xffffffff
Nexthop driver[1] : 0xffffffff
Driver context[0] : 0xffffffff
Driver context[1] : 0xffffffff
Driver context[2] : 0xffffffff
Driver context[3] : 0xffffffff
Driver context[4] : 0xffffffff
Driver context[5] : 0xffffffff
Link head information(IPv6) : 68cb9c3f020668cb978f010686dd
Link head information(MPLS) : 68cb9c3f020668cb978f01068847
¡ If the Action type field displays Forwarding, the device forwards traffic from 1::2 correctly and the device is not faulty.
¡ If the Action type field displays Drop, the device fails to forward traffic from 1::2. An abnormal parameter exists in the learned ND entry.
- If the Driver flag field displays 4, driver resources are insufficient. Check the driver usage.
- If the Driver flag field does not display 4, go to the next step.
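The decision in step 1 depends on only two fields, Action type and Driver flag. The following Python sketch shows one way to read them from captured output. It assumes the "Field : value" layout shown above. The function names and the returned labels are hypothetical and only mirror the branches described in this step.

```python
def parse_adj6_entry(output):
    """Collect "Field : value" lines from the ADJ6 entry output."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

def next_action(fields):
    """Return a label for the branch of step 1 that applies."""
    action = fields.get("Action type")
    if action == "Forwarding":
        return "entry-normal"
    if action == "Drop":
        if fields.get("Driver flag") == "4":
            # Driver flag 4: driver resources are insufficient.
            return "check-driver-resources"
        return "go-to-next-step"
    # Any other value is outside what this step describes.
    return "unknown"
```

A value such as `Peer address : ::` is still parsed correctly, because only the first colon on the line separates the field name from the value.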
2. Use the debugging system internal adj6 command and specify the hardware keyword to enable IPv6 adjacency entry debugging. Use the ping ipv6 command to trigger ND learning. Identify whether the ND entry is successfully deployed to the driver.
[Sysname-probe] debugging system internal adj6 hardware
[Sysname-probe] ping ipv6 -c 1 1::2
Ping6(56 data bytes) 1::1 --> 1::2, press CTRL+C to break
56 bytes from 1::2, icmp_seq=0 hlim=64 time=2.868 ms
--- Ping6 statistics for 1::2 ---
1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 2.868/2.868/2.868/0.000 ms
<Sysname>*Apr 26 16:06:42:412 2022 Sysname IP6PMTU/7/IP6PMTU_DBG: -MDC=1; Binding socket to PMTU succeeded
*Apr 26 16:06:42:412 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
LocalSending, version = 6, traffic class = 0,
flow label = 0, payload length = 64, protocol = 58, hop limit = 64,
Src = 1::1, Dst = 1::2,
prompt: Output an IPv6 Packet.
*Apr 26 16:06:42:412 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
Sending, interface = GigabitEthernet0/0/1, version = 6, traffic class = 0,
flow label = 0, payload length = 64, protocol = 58, hop limit = 64,
Src = 1::1, Dst = 1::2,
prompt: Sending the packet from local interface GigabitEthernet0/0/1.
*Apr 26 16:06:42:413 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
LocalSending, version = 6, traffic class = 224,
flow label = 0, payload length = 32, protocol = 58, hop limit = 255,
Src = 1::1, Dst = ff02::1:ff00:2,
prompt: Output an IPv6 Packet.
*Apr 26 16:06:42:413 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
Sending, interface = GigabitEthernet0/0/1, version = 6, traffic class = 224,
flow label = 0, payload length = 32, protocol = 58, hop limit = 255,
Src = 1::1, Dst = ff02::1:ff00:2,
prompt: Sending the packet from local interface GigabitEthernet0/0/1.
*Apr 26 16:06:42:414 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
Receiving, interface = GigabitEthernet0/0/1, version = 6, traffic class = 224,
flow label = 0, payload length = 32, protocol = 58, hop limit = 255,
Src = 1::2, Dst = 1::1,
prompt: Received an IPv6 packet.
*Apr 26 16:06:42:414 2022 Sysname ADJ6/7/ADJ6_HARDWARE: -MDC=1;
====Start ADJLINK Add====
*Apr 26 16:06:42:414 2022 Sysname ADJ6/7/ADJ6_HARDWARE: -MDC=1;
--------------New Entry-------------
Service type : Ethernet
Link media type : Broadcast
Action type : Forwarding
IPv6 address : 1::2
Route interface : GE0/0/1
Port interface : N/A
Slot : 0
MTU : 1500
VLAN id : 65535
Second VLAN id : 65535
Physical interface : GE0/0/1
Logical interface : N/A
Vrf index : 0
VSI index : -1
VSI link ID : 65535
Usr ID : -1
MAC address : 68cb-9c3f-0206
Link head length(IPv6) : 14
Link head length(MPLS) : 14
Link head information(IPv6) : 68cb9c3f020668cb978f010686dd
Link head information(MPLS) : 68cb9c3f020668cb978f01068847
Nexthop driver
[0]: 0xffffffff [1]: 0xffffffff
Driver context
[0]: 0xff
*Apr 26 16:06:42:414 2022 Sysname ADJ6/7/ADJ6_HARDWARE: -MDC=1;
====End ADJLINK Operate====
Result : 0x0, Reference flag : 0x0, Syn flag : 0x0
*Apr 26 16:06:42:415 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
Receiving, interface = GigabitEthernet0/0/1, version = 6, traffic class = 0,
flow label = 0, payload length = 64, protocol = 58, hop limit = 64,
Src = 1::2, Dst = 1::1,
prompt: Received an IPv6 packet.
*Apr 26 16:06:42:415 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;
Delivering, interface = GigabitEthernet0/0/1, version = 6, traffic class = 0,
flow label = 0, payload length = 64, protocol = 58, hop limit = 64,
Src = 1::2, Dst = 1::1,
prompt: Delivering the IPv6 packet to the upper layer.
%Apr 26 16:06:42:416 2022 Sysname PING/6/PING_STATISTICS: -MDC=1; Ping6 statistics for 1::2: 1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss, round-trip min/avg/max/std-dev = 2.868/2.868/2.868/0.000 ms.
*Apr 26 16:06:42:417 2022 Sysname IP6PMTU/7/IP6PMTU_DBG: -MDC=1; Unbinding PMTU from socket succeeded
¡ If the Result field displays 0x0, the ND entry has been successfully deployed to the driver. Go to the next step.
¡ If the Result field does not display 0x0, the ND entry failed to be deployed to the driver. Check the hardware resource usage.
3. Execute the following commands, collect the command outputs, and send them to H3C technical support staff.
¡ debugging system internal adj6 (with the notify keyword specified)
¡ debugging system internal ipv6 fib prefix
4. Collect the following information and contact H3C Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting Layer 3 IP routing issues
BGP issues
BGP session unable to enter Established state
Symptom
The session between the local router and a peer or peer group cannot transition to Established state.
Common causes
The following are the common causes of this type of issue:
· BGP packet forwarding is blocked.
· The packets used for establishing or maintaining the BGP TCP connection are filtered out by ACLs.
· A router ID conflict exists between the BGP peers within the autonomous system.
· The specified peer or peer group AS number is incorrect.
· The peer address specified for peer session establishment is the IP address of a loopback interface on the peer router. However, on the peer router, the peer connect-interface command is not executed, or the source IP address specified in the peer connect-interface command is not the specified loopback interface IP address.
· When the local router establishes a BGP TCP connection with the peer router, the TCP packets sent by both ends are too large. Consequently, TCP connection establishment fails, because those TCP packets are discarded by intermediate nodes that have a small output interface MTU and do not support packet fragmentation.
· The EBGP peer address specified on the local router is the IP address of a loopback interface on the EBGP peer router, but the peer router is not configured with the peer ebgp-max-hop command.
· MD5 authentication fails, because the two ends of the BGP session are not configured with the same key by using the peer password command.
· When the peer ttl-security command is executed to enable GTSM for the specified peer or peer group, the maximum hop count is incorrectly configured. Consequently, the peer or peer group cannot pass the GTSM check.
· The BGP session is terminated, because the number of BGP routes sent by the peer to the local router exceeds the upper limit set by using the peer route-limit command.
· The peer ignore, ignore all-peers, or shutdown process command is configured on either end of the BGP session.
· Although the local router and the peer router are enabled to exchange routing information, their respective configurations are not in the same address family view.
Analysis
Figure 46 shows the troubleshooting flowchart:
Figure 46 Troubleshooting flowchart
Solution
1. Identify whether the link to the BGP peer is operating correctly.
a. Identify whether the peer-facing interface is in UP state.
b. Use the ping command to test connectivity with the BGP peer. If the ping succeeds, the link between the local router and the BGP peer is operating correctly. In this case, proceed to step 2. If the ping fails, proceed to step c.
|
NOTE: As a best practice, use the ping -a source-ip -s packet-size or ping ipv6 -a source-ipv6 -s packet-size command to test connectivity with the BGP peer. The -a source-ip and -a source-ipv6 parameters specify the source IP address of ICMP echo requests. The -s packet-size parameter specifies the length of ICMP echo requests, which helps you monitor the transmission of long packets. The source IP address for the ping should be the local interface IP address used for BGP session establishment, and the destination IP address should be the peer interface IP address used for BGP session establishment. |
c. Repeat the ping -a source-ip -s packet-size command with a decreasing -s packet-size value. If the ping succeeds after the -s packet-size value is reduced to a certain size, the TCP packets sent for BGP TCP connection establishment are too long and are dropped by intermediate devices. To resolve this issue, perform either of the following tasks:
- Repeat the ping -a source-ip -s packet-size command and gradually reduce the -s packet-size value until the ping succeeds. As a best practice for forwarding efficiency, use the largest value at which the ping still succeeds. Then, set that value as the MTU of the output interfaces for BGP packets. To do so, execute the ip mtu mtu-size, ipv6 mtu mtu-size, or tcp mss value command on the related interfaces. Alternatively, execute the peer tcp-mss command in BGP instance view or BGP-VPN instance view. The ip mtu and ipv6 mtu commands specify the MTU of an interface, and the peer tcp-mss command specifies the TCP MSS. Use the following formula to calculate the TCP MSS: TCP MSS = MTU - IP header length - TCP header length
- Execute the tcp path-mtu-discovery command to enable TCP path MTU discovery in system view. Then, the device dynamically obtains the smallest MTU value along the path used for TCP connection establishment, and calculates an MSS accordingly. When the device attempts to establish a BGP TCP connection, it determines the length of TCP packets based on the calculated MSS.
If the ping always fails regardless of the -s packet-size value, troubleshoot this issue as described in Layer 3—IP Services Troubleshooting Guide.
d. If the issue persists, proceed to step 2.
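As a worked example of the formula TCP MSS = MTU - IP header length - TCP header length, the following Python sketch computes the MSS for a given MTU. The default header lengths (20 bytes for IPv4, 40 bytes for IPv6, and 20 bytes for TCP) assume that no IP options, IPv6 extension headers, or TCP options are present. The function name tcp_mss is hypothetical.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
IPV6_HEADER = 40  # bytes, fixed IPv6 header without extension headers
TCP_HEADER = 20   # bytes, TCP header without options

def tcp_mss(mtu, ipv6=False, ip_header=None, tcp_header=TCP_HEADER):
    """TCP MSS = MTU - IP header length - TCP header length."""
    if ip_header is None:
        ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    mss = mtu - ip_header - tcp_header
    if mss <= 0:
        raise ValueError("MTU is too small for the given header lengths")
    return mss

print(tcp_mss(1500))             # IPv4 path with a 1500-byte MTU
print(tcp_mss(1500, ipv6=True))  # IPv6 path with a 1500-byte MTU
```

With these defaults, a 1500-byte MTU gives an MSS of 1460 for IPv4 and 1440 for IPv6. If authentication or other TCP options are in use, pass a larger tcp_header value.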
2. Identify whether a BGP TCP connection has been established between the local router and the BGP peer.
Execute the display tcp command, and then identify whether the output displays the following TCP connection:
¡ Local address: IP address of the local router.
¡ Peer address: IP address of the related BGP peer.
¡ Peer port: 179.
¡ State: ESTABLISHED.
For example:
<Sysname> display tcp
*: TCP connection with authentication
Local Addr:port Foreign Addr:port State PCB
0.0.0.0:179 12.1.1.2:0 LISTEN 0xffffffffffffff9d
12.1.1.1:28160 12.1.1.2:179 ESTABLISHED 0xffffffffffffff9e
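The four criteria above (local address, peer address, peer port 179, and ESTABLISHED state) can be checked against saved display tcp output. The following Python sketch is illustrative only. It assumes the column layout shown in the example, and it strips a leading asterisk from the local address on the assumption that the authentication marker is attached there. The function name is hypothetical.

```python
def has_bgp_tcp_connection(output, local_ip, peer_ip, bgp_port=179):
    """Return True if an ESTABLISHED connection to peer_ip:bgp_port exists."""
    for line in output.splitlines():
        fields = line.split()
        # Skip the prompt, legend, and header lines.
        if len(fields) < 3 or ":" not in fields[0] or ":" not in fields[1]:
            continue
        # Split on the last colon so that the address part stays intact.
        local_addr, _, _local_port = fields[0].lstrip("*").rpartition(":")
        peer_addr, _, peer_port = fields[1].rpartition(":")
        if (local_addr == local_ip and peer_addr == peer_ip
                and peer_port == str(bgp_port) and fields[2] == "ESTABLISHED"):
            return True
    return False
```

For the example output, calling the function with local address 12.1.1.1 and peer address 12.1.1.2 returns True. The LISTEN entry on port 179 does not count as an established connection.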
If such a TCP connection exists, proceed to step 3. If not, perform the following checks:
¡ Execute the display ip routing-table or display ipv6 routing-table command, and then identify whether the routing table contains an IGP route to the IPv4 or IPv6 peer address used for BGP session establishment. If such a route does not exist, check for incorrect IGP routing settings. For more information about troubleshooting IGP issues, see OSPF, OSPFv3, or IS-IS troubleshooting guide in Layer 3—IP Routing Troubleshooting Guide.
¡ Execute the display acl all command to check for a rule that denies TCP packets whose source or destination port is bgp (port 179). For example:
<Sysname> display acl all
Advanced IPv4 ACL 3077, 2 rules,
ACL's step is 5
rule 1 deny tcp destination-port eq bgp
rule 2 deny tcp source-port eq bgp
If such a rule exists, execute the undo rule command to remove the rule.
¡ Execute the debugging tcp packet command to identify whether an authentication failure occurs upon TCP connection establishment. For example:
<Sysname> debugging tcp packet acl 3000
*Feb 5 20:03:39:289 2021 Sysname SOCKET/7/INET: -MDC=1;
TCP Input: Failed to check md5, drop the packet.
As shown in the command output, BGP failed to pass MD5 authentication when it attempted to initiate a TCP connection. In this situation, execute the peer password command to configure the same key at both ends of the BGP TCP connection.
<Sysname> debugging tcp packet acl 3000
*Feb 5 20:03:39:289 2021 Sysname SOCKET/7/INET: -MDC=1;
TCP Input: Failed to check keychain, drop the packet.
As shown in the command output, BGP failed to pass keychain authentication when it attempted to initiate a TCP connection. In this situation, execute the peer keychain command at both ends of the BGP TCP connection to ensure the following requirements are met:
- The keys used by the two ends at the same time must have the same ID.
- The keys with the same ID must use the same authentication algorithm and key string.
<Sysname> debugging tcp packet acl 3000
*Feb 5 20:03:39:289 2021 Sysname SOCKET/7/INET: -MDC=1;
TCP Input: Failed to get IPSEC profile, index 500, name profile1(inpcb profile2), return 0x3fff.
As shown in the command output, BGP failed to pass IPsec authentication when it attempted to initiate a TCP connection. In this situation, make sure the peer ipsec-profile command is executed at both ends of the BGP TCP connection.
If the issue persists, proceed to step 3.
3. Identify whether the local router has a router ID conflict with the peer or peer group, or whether the specified peer or peer group AS number is incorrect.
a. Execute the display bgp peer command, and then view the BGP local router ID field in the output to identify whether a router ID conflict exists. If a router ID conflict is found, execute the router-id command in the BGP instance or BGP-VPN instance that requires establishing a BGP session, to change the router ID of the BGP router.
<Sysname> display bgp peer ipv4 unicast
BGP local router ID: 12.1.1.1
Local AS number: 10
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
12.1.1.2 20 3 3 0 0 00:00:25 Established
b. Execute the display bgp peer command, and then view the AS field in the output to identify whether the AS number specified for the peer or peer group is incorrect. If the AS number is incorrect, execute the peer as-number command to correct the AS number. For example:
<Sysname> display bgp peer ipv4 unicast
BGP local router ID: 12.1.1.1
Local AS number: 10
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
12.1.1.2 20 3 3 0 0 00:00:25 Established
c. If the issue persists, proceed to step 4.
4. Execute the display this command in BGP instance view to check for configurations that affect BGP session establishment:
Table 10 Check items that affect BGP session establishment
Check Item |
Description |
peer { group-name | ipv4-address [ mask-length ] | ipv6-address [ prefix-length ] } connect-interface interface-type interface-number |
When this configuration exists on the local router, the BGP peer must also use a loopback interface address for BGP session establishment. To meet this requirement, you can use this command or the peer source-address command. |
peer ipv4-address [ mask-length ] source-address source-ipv4-address peer ipv6-address [ prefix-length ] source-address source-ipv6-address |
If this configuration exists on the local router, the BGP peer must also use a loopback interface address for BGP session establishment. To meet this requirement, you can use this command or the peer connect-interface command. |
peer { group-name | ipv4-address [ mask-length ] | ipv6-address [ prefix-length ] } ebgp-max-hop [ hop-count ] |
This command is required in one of the following situations: · Two indirectly-connected devices need to establish an EBGP session. · Two directly-connected devices need to establish an EBGP session through their loopback interfaces. To ensure successful EBGP session establishment, execute this command at both ends of the EBGP session. |
peer { group-name | ipv4-address [ mask-length ] | ipv6-address [ prefix-length ] } ttl-security hops hop-count |
If this configuration exists, the local router accepts BGP packets from the specified peer only when the TTLs of those BGP packets are within the valid TTL range, which is from (255 - hop-count + 1) to 255. If the number of hops between the local router and the specified peer exceeds the hop-count value, execute this command to adjust the hop-count value. |
peer { group-name | ipv4-address [ mask-length ] | ipv6-address [ prefix-length ] | link-local-address interface interface-type interface-number } route-limit prefix-number [ reconnect reconnect-time | percentage-value ] * |
If this configuration exists on the local router and the number of routes received from the specified peer or peer group exceeds the prefix-number value, the local router will disconnect from the peer or peer group. To avoid this issue, reduce the number of routes sent by the peer or peer group or increase the prefix-number value. |
peer { group-name | ipv4-address [ mask-length ] | ipv6-address [ prefix-length ] | link-local-address interface interface-type interface-number } ignore [ graceful graceful-time { community { community-number | aa:nn } | local-preference preference | med med } * ] |
If this configuration exists, the local router will not establish a BGP session with the specified peer or peer group. To resolve this issue, execute the undo peer ignore command with the peer or peer group specified. |
ignore all-peers [ graceful graceful-time { community { community-number | aa:nn } | local-preference preference | med med } * ] |
If this configuration exists, the local router cannot establish BGP sessions with any peers. In this situation, the local router might be undergoing a network upgrade or maintenance task, and the related BGP process is temporarily unavailable. As a best practice, execute the undo peer ignore or undo ignore all-peers command after the upgrade or maintenance task is completed. |
shutdown process |
If this configuration exists, the local router cannot establish BGP sessions with any peers. In this situation, the local router might be undergoing a network upgrade or maintenance task, and the related BGP process is temporarily unavailable. As a best practice, execute the undo shutdown process command after the upgrade or maintenance task is completed. |
The peer enable command in the related address family |
When two devices need to establish a BGP session, you must execute the peer enable command on each of them with the other specified. Make sure the peer enable command is executed in the same address family. If this configuration exists on the local router, verify that the peer is also configured with the peer enable command in the same address family. |
If the issue persists, proceed to step 5.
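The GTSM check in Table 10 is simple arithmetic: the valid TTL range runs from 255 - hop-count + 1 to 255. The following Python sketch works through that calculation. The accepted hop-count range of 1 to 254 is an assumption made for input validation, and the function names are hypothetical.

```python
def gtsm_valid_ttl_range(hop_count):
    """Return (lowest, highest) TTL accepted for the configured hop-count."""
    # Assumed bounds for illustration; check the command reference for the real range.
    if not 1 <= hop_count <= 254:
        raise ValueError("hop_count must be in the range of 1 to 254")
    return 255 - hop_count + 1, 255

def passes_gtsm(ttl, hop_count):
    """Return True if a received packet's TTL is inside the valid range."""
    lowest, highest = gtsm_valid_ttl_range(hop_count)
    return lowest <= ttl <= highest
```

For example, with a hop-count value of 1 only packets with TTL 255 are accepted, so a peer that is two hops away (packets arrive with TTL 254) fails the check until the hop-count value is raised to at least 2.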
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
After the snmp-agent trap enable bgp command is executed in system view, the router generates the following alarm message:
Module name: BGP4-MIB
· bgpBackwardTransition (1.3.6.1.2.1.15.7.2)
Log messages
N/A
BGP session down
Symptom
The device generates a BGP/5/BGP_STATE_CHANGED log message, which indicates that the state of a BGP session changed from Established to Idle.
Common causes
The following are the common causes of this type of issue:
· KEEPALIVE or UPDATE message sending/receiving timed out.
· TCP connection establishment failed.
· The local device has reached a memory threshold.
· An error occurred in parsing BGP messages.
Analysis
Figure 47 shows the troubleshooting flowchart:
Figure 47 Troubleshooting flowchart
Solution
Execute the display bgp peer log-info command to identify the cause of this issue. The common causes include:
· A BGP timer expired.
If the output of the display bgp peer log-info command is similar to the following:
<Sysname> display bgp peer ipv4 3.3.3.3 log-info
Peer: 3.3.3.3
Date Time State Notification
Error/SubError
17-Jan-2022 14:48:34 Down Receive notification with error 4/0
Hold Timer Expired/ErrSubCode Unspecified
Keepalive last triggered time: 14:48:31-2022.1.17
Keepalive last sent time : 14:48:31-2022.1.17
Update last sent time : 14:48:24-2022.1.17
EPOLLOUT last occurred time : 14:48:30-2022.1.17
The BGP session went down because the local router could not receive a KEEPALIVE or UPDATE message from the peer before the hold timer expired. After the hold timer expired, the local device actively terminated the BGP session and sent a NOTIFICATION message to the peer.
A timer timeout issue might occur in one of the following situations:
¡ The device sends a KEEPALIVE or UPDATE message to a peer normally, but the message fails to reach the peer or the peer does not process the message in time.
¡ The device fails to generate a KEEPALIVE or UPDATE message in time due to scheduling issues.
To resolve this issue, execute the display system internal bgp log command in probe view at both ends of the BGP session, collect the command output, and then contact Technical Support for further analysis.
· A TCP connection error occurred.
If the output of the display bgp peer log-info command is similar to the following:
<Sysname> display bgp peer ipv4 1.1.1.1 log-info
Peer: 1.1.1.1
Date Time State Notification
Error/SubError
17-Jan-2022 14:42:01 Down Receive TCP_Connection_Failed event
The BGP session went down due to a TCP connection error. BGP uses TCP as its transport layer protocol, so a TCP connection error between the two BGP peers terminates the related BGP session. If the output of the display bgp peer log-info command differs from the above example but contains a NOTIFICATION message with error code 5/0, the cause of this issue is also a TCP connection error.
After you confirm that the BGP session went down due to a TCP connection error, perform the following task:
a. Execute the view /proc/tcp/tcp_log slot x command in probe view at both ends of the BGP session (execute this command once for each card or member device).
b. Collect the command output.
c. Contact Technical Support for further analysis.
· The memory was insufficient.
If the output of the display bgp peer log-info command is similar to the following:
<Sysname> display bgp peer ipv4 1.1.1.1 log-info
Peer: 1.1.1.1
Date Time State Notification
Error/SubError
17-Jan-2022 15:38:53 Down Send notification with error 6/8
Entered severe memory state
17-Jan-2022 14:53:51 Down Send notification with error 6/8
No memory to process the attribute
The device did not have enough memory to run BGP-related functions, which caused the BGP session termination. The cause of this issue corresponds to error code 6/8 in the output of the display bgp peer log-info command.
In this case, perform the following task:
a. Execute the display memory-threshold command at both ends of the BGP session to obtain the memory alarm thresholds.
b. Collect the output of the display bgp peer log-info command.
c. Contact Technical Support for further analysis.
· An error occurred in parsing BGP messages.
If the two ends of a BGP session have different message parsing capabilities or have a version mismatch, they might not be able to parse the BGP packets received from each other and thus might be disconnected. This type of issue corresponds to error codes 1, 2, and 3 in the output of the display bgp peer log-info command (where the Error part in the Error/SubError field is 1, 2, or 3).
Execute the debugging bgp raw-packet, debugging bgp open, and debugging bgp update commands at both ends of the BGP session, collect the output of those commands and the display bgp peer log-info command, and then contact Technical Support for further analysis.
· If the cause of this issue displayed in the output of the display bgp peer log-info command is not any of the above, collect the following information and contact Technical Support:
¡ Output of the display bgp peer log-info command.
¡ Output of the display system internal bgp log command.
¡ Output of the view /proc/tcp/tcp_log slot x command (executed once for each card or member device).
¡ The configuration file, log messages, and alarm messages.
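The mapping from the Notification column of the display bgp peer log-info output to the causes described above can be sketched as follows. This Python example is illustrative only. It assumes that the notification text contains either the string TCP_Connection_Failed or an "error X/Y" code pair, as in the sample outputs. The function name and the returned labels are hypothetical.

```python
import re

ERROR_CODE = re.compile(r"error (\d+)/(\d+)")

def classify_session_down(notification):
    """Map one Notification entry to the cause category described in this section."""
    if "TCP_Connection_Failed" in notification:
        return "tcp-connection-error"
    match = ERROR_CODE.search(notification)
    if match is None:
        return "other"
    error, suberror = int(match.group(1)), int(match.group(2))
    if error == 4:
        return "hold-timer-expired"       # 4/0 in the sample output
    if (error, suberror) == (5, 0):
        return "tcp-connection-error"
    if (error, suberror) == (6, 8):
        return "insufficient-memory"
    if error in (1, 2, 3):
        return "message-parsing-error"
    return "other"
```

Each label corresponds to one branch of the procedure: which commands to run and which outputs to collect before contacting Technical Support. Any entry that falls into "other" follows the last bullet.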
Table 11 lists the detailed reasons for BGP peer disconnection and their corresponding error codes.
Table 11 Reasons for BGP peer disconnection
Error code/subcode | Reason for peer disconnection | Description |
1/1 | connection not synchronized | The two ends of the connection were not synchronized. In the current implementation, this error is reported when the first 16 bytes of the received message header are not all Fs. |
1/2 | bad message length | Invalid message length. |
1/3 | bad message type | Invalid message type. |
3/1 | the withdrawn length is too large | The length of the routing information to be withdrawn was too large. |
3/1 | the attribute length is too large | The attribute length was too large. |
3/1 | one attribute appears more than once | A path attribute appeared multiple times in an UPDATE message. |
3/1 | the attribute length is too small | The attribute length was less than two bytes. |
3/1 | exntended length field is less than two octets | The attribute length field was extended, but it was less than two bytes. |
3/1 | the length field is less than one octet | The attribute length field was not extended, but it was less than one byte. |
3/1 | link-state attribute error | The link-state attribute was malformed. |
3/2 | unrecognized well-known attribute | Unknown well-known attribute. |
3/3 | attribute-type attribute missed | The attribute-type attribute was missing. The values for the attribute-type argument include ORIGIN, AS_PATH, LOCAL_PREF, and NEXT_HOP. |
3/4 | attribute flags error | Incorrect attribute flags. |
3/5 | attribute-type attribute length error | The length of the attribute-type attribute was invalid. The values for the attribute-type argument include AS_PATH, AS4_PATH, CLUSTER_LIST, AGGREGATOR, AS4_AGGREGATOR, ORIGIN, NEXT_HOP, MED, LOCAL_PREF, ATOMIC_AGGREGATE, ORIGINATOR_ID, MP_REACH_NLRI, COMMUNITIES, and EXT-COMMUNITIES. |
3/5 | attribute length exceeds | The attribute length exceeded the limit. |
3/6 | invalid ORIGIN attribute | Invalid ORIGIN attribute. |
3/8 | invalid NEXT_HOP attribute | Invalid NEXT_HOP attribute. |
3/9 | invalid nexthop length in MP_REACH_NLRI (address-family) | The Nexthop length in the MP_REACH_NLRI attribute was invalid for the address-family address family. The values for the address-family argument include: 4u (IPv4 unicast address family), IPv4 Flowspec (IPv4 flowspec address family), MPLS (MPLS address family), VPNv4 (VPNv4 address family), 6u (IPv6 unicast address family), VPNv6 (VPNv6 address family), and L2VPN (L2VPN address family). |
3/9 | the length of MP_UNREACH_NLRI is too small | The length of the MP_UNREACH_NLRI attribute was less than three bytes. |
3/9 | the MP NLRI attribute length exceeds | The length of the MP_REACH_NLRI or MP_UNREACH_NLRI attribute exceeded the limit. |
3/9 | erroneous MP NLRI attribute end position | The reachable or unreachable prefix and the path attribute ended at different positions. |
3/10 | invalid network field | Invalid network field. |
3/11 | malformed AS_PATH | The AS_PATH attribute was malformed. |
4/0 | Keepalive last triggered time | Most recent time when KEEPALIVE message sending was triggered. |
4/0 | Keepalive last sent time | Most recent time when a KEEPALIVE message was sent. |
4/0 | Update last sent time | Most recent time when an UPDATE message was sent. |
4/0 | EPOLLOUT last occurred time | Most recent time when an EPOLLOUT event occurred. |
4/0 | Keepalive last received time | Most recent time when a KEEPALIVE message was received. |
4/0 | Update last received time | Most recent time when an UPDATE message was received. |
4/0 | EPOLLIN last occurred time | Most recent time when an EPOLLIN event occurred. |
5/0 | connection retry timer expires | The ConnectRetry timer expired. |
5/0 | TCP_CR_Acked event received | A TCP_CR_Acked event was received. |
5/0 | TCP_Connection_Confirmed event received | A TCP_Connection_Confirmed event was received. |
5/3 | open message received | An OPEN message was received. |
6/0 | manualstop event received | A manualstop event was received. |
6/0 | physical interface configuration changed | The physical configuration changed, such as interface settings. |
6/0 | session down event received from BFD | A BFD session down event was received. |
6/1 | maximum number of prefixes reached | The number of route prefixes has exceeded the upper limit specified by using the peer route-limit command. |
6/1 | maximum number of address-family prefixes reached | The number of route prefixes in the address-family address family has exceeded the upper limit specified by using the peer route-limit command. The values for the address-family argument include: IPv4 unicast (IPv4 unicast address family), IPv6 unicast (IPv6 unicast address family), VPNv4 (VPNv4 address family), and VPNv6 (VPNv6 address family). |
6/2 | configuration of peer ignore changed | The peer ignore command was configured. |
6/3 | address family deleted | An address family was deleted. |
6/3 | peer disabled | A peer was disabled. |
6/4 | administrative reset | The BGP session was reset because of the reset bgp command or configuration changes. |
6/5 | connection rejected | The connection request was rejected. |
6/6 | other configuration change | Other configurations changed. |
6/7 | connection collision resolution | A connection conflict occurred. |
6/7 | two connections exist and MD5 authentication is configured for the neighbor | Two connections existed and MD5 authentication was configured for one of them. |
6/8 | no memory to process the attribute | The memory was insufficient for attribute parsing. |
6/8 | no memory for the route | Failed to obtain memory resources for route or label block generation. |
6/8 | no memory to generate unreachable NLRI | Failed to obtain memory resources for MP_UNREACH_NLRI encapsulation. |
6/8 | no memory to generate a message | Failed to obtain memory resources for message encapsulation. |
6/8 | can't get the VPN RD | Failed to obtain RDs upon prefix parsing. |
6/8 | can't get the VPN routing table | Failed to obtain the VPN routing table upon prefix parsing. |
6/8 | can't get the attributes | Failed to obtain attributes upon prefix parsing. |
6/8 | entered severe memory state | A severe memory usage alarm was triggered. |
6/8 | entered critical memory state | A critical memory usage alarm was triggered. |
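When many peers flap, it can help to group log-info entries by the major error code before looking up the subcode in Table 11. The following Python sketch is illustrative only. The function name classify_notification is hypothetical, the input is assumed to be a line such as "Send notification with error 6/8" copied from the display bgp peer log-info output, and the category names are the NOTIFICATION error code names defined in RFC 4271, not strings printed by the device.

```python
import re

# Major NOTIFICATION error codes as named in RFC 4271 (not device output).
ERROR_CATEGORIES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

# Matches the "error <code>/<subcode>" part of a log-info line.
_ERROR_RE = re.compile(r"error\s+(\d+)/(\d+)")


def classify_notification(line):
    """Return (code, subcode, category) parsed from a log-info line, or None."""
    match = _ERROR_RE.search(line)
    if match is None:
        return None
    code, subcode = int(match.group(1)), int(match.group(2))
    return code, subcode, ERROR_CATEGORIES.get(code, "Unknown")


if __name__ == "__main__":
    sample = "17-Jan-2022 15:38:53 Down Send notification with error 6/8"
    print(classify_notification(sample))
```

The subcode is returned unchanged so that it can be looked up in Table 11, where one code/subcode pair can correspond to several reason strings.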
Related alarm and log messages
Alarm messages
N/A
Log messages
· BGP/5/BGP_STATE_CHANGED
· BGP/5/BGP_STATE_CHANGED_REASON
· BGP/6/BGP_PEER_STATE_CHG
BGP routing loop in a cross-AS data center interconnect scenario
Symptom
As shown in Figure 48, two data centers are interconnected across ASs through BGP. RR 1 learns BGP routes with the same prefix (for example, 10.110.0.0/16) from Border 3 and Border 4 in Data Center 2. The next hops for those routes are the loopback interface addresses of Border 3 and Border 4, respectively. RR 1 selects the route from Border 3 or Border 4 as the optimal route. Border 1 and Border 2 send default routes to RR 1 through BGP, with the next hops being IP addresses of the interfaces directly connected to RR 1. If Border 3 or Border 4 restarts, the devices in Data Center 1 cannot access network segment 10.110.0.0/16 during the restart. Packets destined for the network segment loop between RR 1 and Border 1 or RR 1 and Border 2.
Common causes
Before Border 3 or Border 4 restarts, the BGP routing table and IP routing table of RR 1 are similar to the following:
<RR1> display bgp routing-table ipv4
Total number of routes: 4
BGP local router ID is 9.9.9.9
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Network NextHop MED LocPrf PrefVal Path/Ogn
* >i 0.0.0.0/0 19.1.1.1 100 0 i
* i 29.1.1.2 100 0 i
* >e 10.110.0.0/16 3.3.3.3 0 0 20i
* e 4.4.4.4 0 0 20i
<RR1> display ip routing-table
Destinations : 25 Routes : 25
Destination/Mask Proto Pre Cost NextHop Interface
0.0.0.0/0 BGP 255 0 19.1.1.1 GE2/0/1
0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
1.1.1.1/32 O_INTRA 10 1 19.1.1.1 GE2/0/1
2.2.2.2/32 O_INTRA 10 1 29.1.1.2 GE2/0/2
3.3.3.3/32 O_INTRA 10 1 39.1.1.3 GE2/0/3
4.4.4.4/32 O_INTRA 10 1 49.1.1.4 GE2/0/4
9.9.9.9/32 Direct 0 0 127.0.0.1 InLoop0
10.10.10.10/32 BGP 255 0 1.1.1.1 GE2/0/1
19.1.1.0/24 Direct 0 0 19.1.1.9 GE2/0/1
19.1.1.0/32 Direct 0 0 19.1.1.9 GE2/0/1
19.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0
19.1.1.255/32 Direct 0 0 19.1.1.9 GE2/0/1
10.110.0.0/16 BGP 255 0 3.3.3.3 GE2/0/3
29.1.1.0/24 Direct 0 0 29.1.1.9 GE2/0/2
29.1.1.0/32 Direct 0 0 29.1.1.9 GE2/0/2
29.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0
29.1.1.255/32 Direct 0 0 29.1.1.9 GE2/0/2
39.1.1.0/24 Direct 0 0 39.1.1.9 GE2/0/3
39.1.1.0/32 Direct 0 0 39.1.1.9 GE2/0/3
39.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0
39.1.1.255/32 Direct 0 0 39.1.1.9 GE2/0/3
49.1.1.0/24 Direct 0 0 29.1.1.9 GE2/0/2
49.1.1.0/32 Direct 0 0 29.1.1.9 GE2/0/2
49.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0
49.1.1.255/32 Direct 0 0 29.1.1.9 GE2/0/2
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0
127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
According to the above command output, RR 1 learned routes destined for the loopback interfaces of Border 3 and Border 4 through IGP. BGP network route 10.110.0.0/16 was iterated to the learned loopback interface routes.
After Border 4 restarts, RR 1 does not disconnect from Border 4 until the session hold timer expires, and the routing table of RR 1 still retains network route 10.110.0.0/16 (received from Border 4). However, the network route can be iterated only to the default route (0.0.0.0/0), because the IGP route for next hop 4.4.4.4 has become invalid and RR 1 does not have other network routes that contain IP address 4.4.4.4.
In the routing table of RR 1, you can find the following information:
· The IGP metric value is 1 for the next hop of network route 10.110.0.0/16 received from Border 3, which corresponds to route entry 3.3.3.3/32 O_INTRA 10 1 39.1.1.3 GE2/0/3.
· The IGP metric value is 0 for the next hop of network route 10.110.0.0/16 received from Border 4, which corresponds to route entry 0.0.0.0/0 BGP 255 0 19.1.1.1 GE2/0/1.
According to the BGP route selection rules, RR 1 prefers the route whose next hop has the smaller IGP metric, so it chooses the route from Border 4 as the optimal route. In the forwarding table, the next hop for network segment 10.110.0.0/16 changes to 19.1.1.1. Consequently, RR 1 forwards packets destined for network segment 10.110.0.0/16 to Border 1. Then, Border 1 forwards those packets back to RR 1, because Border 1 learned network route 10.110.0.0/16 from RR 1. This causes a routing loop.
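The change in route selection can be reproduced with a small model. The following Python sketch is a simplification under stated assumptions: it performs only longest-prefix-match recursion of the BGP next hop and compares the cost of the resolving route, and it ignores every other BGP route selection rule. The names resolve and best_by_igp_metric are hypothetical. The route entries are copied from the RR 1 IP routing table shown earlier in this section.

```python
import ipaddress

# (prefix, cost, real next hop) taken from the RR 1 IP routing table.
BEFORE_RESTART = [
    ("0.0.0.0/0", 0, "19.1.1.1"),
    ("3.3.3.3/32", 1, "39.1.1.3"),
    ("4.4.4.4/32", 1, "49.1.1.4"),
]
# After Border 4 restarts, the IGP route to 4.4.4.4/32 is no longer present.
AFTER_RESTART = [r for r in BEFORE_RESTART if r[0] != "4.4.4.4/32"]


def resolve(next_hop, table):
    """Longest-prefix-match lookup; returns (prefix, cost, real_next_hop) or None."""
    addr = ipaddress.ip_address(next_hop)
    matches = [r for r in table if addr in ipaddress.ip_network(r[0])]
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen, default=None)


def best_by_igp_metric(bgp_next_hops, table):
    """Pick the BGP next hop whose resolving route has the lowest cost.

    On a tie the first candidate is kept. Only this one rule is modeled.
    """
    resolved = [(nh, resolve(nh, table)) for nh in bgp_next_hops]
    usable = [(nh, route) for nh, route in resolved if route is not None]
    return min(usable, key=lambda item: item[1][1])


if __name__ == "__main__":
    candidates = ["3.3.3.3", "4.4.4.4"]  # next hops of 10.110.0.0/16
    print(best_by_igp_metric(candidates, BEFORE_RESTART))
    print(best_by_igp_metric(candidates, AFTER_RESTART))
```

In this model, the second call selects next hop 4.4.4.4 resolved through 0.0.0.0/0 with real next hop 19.1.1.1, which matches the looping behavior described above.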
Analysis
Figure 49 shows the troubleshooting flowchart:
Figure 49 Troubleshooting flowchart
Solution
1. View the BGP routing table and IP routing table of RR 1. This example uses the network shown in Figure 48 for illustration.
a. After Border 4 restarts, if you execute the display bgp routing-table ipv4 command on RR 1 before RR 1 is disconnected from Border 4, you can find that network route 10.110.0.0/16 received from Border 4 is still active and is the optimal route.
<RR1> display bgp routing-table ipv4
Total number of routes: 5
BGP local router ID is 9.9.9.9
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Network NextHop MED LocPrf PrefVal Path/Ogn
* >i 0.0.0.0/0 19.1.1.1 100 0 i
* i 29.1.1.2 100 0 i
* >e 10.110.0.0/16 4.4.4.4 0 0 20i
* e 3.3.3.3 0 0 20i
b. After you execute the display ip routing-table verbose command on RR 1, you can find that the output interface and real next hop for network route 10.110.0.0/16 have changed to the interface directly connected to Border 1 and the interface’s IP address (19.1.1.1), respectively.
<RR1> display ip routing-table 10.110.0.0/16 verbose
Summary count : 1
Destination: 10.110.0.0/16
Protocol: BGP instance default
Process ID: 0
SubProtID: 0x6 Age: 00h00m19s
FlushedAge: 00h00m19s
Cost: 0 Preference: 255
IpPre: N/A QosLocalID: N/A
Tag: 0 State: Active Adv
OrigTblID: 0x0 OrigVrf: default-vrf
TableID: 0x2 OrigAs: 20
NibID: 0x16000002 LastAs: 20
AttrID: 0x2
BkAttrID: 0xffffffff Neighbor: 4.4.4.4
Flags: 0x10060 OrigNextHop: 4.4.4.4
Label: NULL RealNextHop: 19.1.1.1
BkLabel: NULL BkNextHop: N/A
SRLabel: NULL Interface: GigabitEthernet2/0/1
BkSRLabel: NULL BkInterface: N/A
Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/1
BkTunnel ID: Invalid BkIPInterface: N/A
InLabel: NULL ColorInterface: N/A
SIDIndex: NULL BkColorInterface: N/A
FtnIndex: 0x0 TunnelInterface: N/A
TrafficIndex: N/A BkTunnelInterface: N/A
Connector: N/A PathID: 0x0
UserID: 0x0 SRTunnelID: Invalid
SID Type: N/A NID: Invalid
FlushNID: Invalid BkNID: Invalid
BkFlushNID: Invalid StatFlags: 0x0
SID: N/A
BkSID: N/A
CommBlockLen: 0 Priority: Low
MemberPort: N/A
c. After you execute the display ip routing-table command, you can find the following information:
- The IP routing table does not contain other network routes that contain IP address 4.4.4.4.
- The output interface and next hop IP address for the default route are GE2/0/1 and 19.1.1.1, respectively.
This indicates that network route 10.110.0.0/16 received from Border 4 has been iterated to the default route.
<RR1> display ip routing-table
Destinations : 25 Routes : 25
Destination/Mask Proto Pre Cost NextHop Interface
0.0.0.0/0 BGP 255 0 19.1.1.1 GE2/0/1
0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
1.1.1.1/32 O_INTRA 10 1 19.1.1.1 GE2/0/1
2.2.2.2/32 O_INTRA 10 1 29.1.1.2 GE2/0/2
3.3.3.3/32 O_INTRA 10 1 39.1.1.3 GE2/0/3
9.9.9.9/32 Direct 0 0 127.0.0.1 InLoop0
10.10.10.10/32 BGP 255 0 1.1.1.1 GE2/0/1
19.1.1.0/24 Direct 0 0 19.1.1.9 GE2/0/1
19.1.1.0/32 Direct 0 0 19.1.1.9 GE2/0/1
19.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0
19.1.1.255/32 Direct 0 0 19.1.1.9 GE2/0/1
10.110.0.0/16 BGP 255 0 4.4.4.4 GE2/0/1
29.1.1.0/24 Direct 0 0 29.1.1.9 GE2/0/2
29.1.1.0/32 Direct 0 0 29.1.1.9 GE2/0/2
29.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0
29.1.1.255/32 Direct 0 0 29.1.1.9 GE2/0/2
39.1.1.0/24 Direct 0 0 39.1.1.9 GE2/0/3
39.1.1.0/32 Direct 0 0 39.1.1.9 GE2/0/3
39.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0
39.1.1.255/32 Direct 0 0 39.1.1.9 GE2/0/3
49.1.1.0/24 Direct 0 0 29.1.1.9 GE2/0/2
49.1.1.0/32 Direct 0 0 29.1.1.9 GE2/0/2
49.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0
49.1.1.255/32 Direct 0 0 29.1.1.9 GE2/0/2
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0
127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
If none of the above situations exists, contact Technical Support for help.
2. Use one of the following methods to remove the routing loop:
¡ Configure routing policies to filter recursive routes.
Execute the protocol bgp nexthop recursive-lookup route-policy route-policy-name command in RIB IPv4 address family view. This operation ensures that all BGP IPv4 network routes are iterated only to routes that can pass the routing policy specified by the route-policy-name argument.
Similarly, execute the protocol bgp4+ nexthop recursive-lookup route-policy route-policy-name command in RIB IPv6 address family view. This operation ensures that all BGP IPv6 network routes are iterated only to routes that can pass the routing policy specified by the route-policy-name argument.
In this scenario, create a routing policy on RR 1 that filters out the default route, and execute the protocol bgp nexthop recursive-lookup route-policy route-policy-name or protocol bgp4+ nexthop recursive-lookup route-policy route-policy-name command to apply the routing policy. This configuration eliminates the BGP routing loop by preventing BGP routes from being iterated to the default route.
¡ Enable BFD for BGP.
After BFD is enabled for BGP, RR 1 uses BFD sessions to monitor the links to Border 3 and Border 4. If Border 3 or Border 4 restarts, BFD will detect link failures immediately. In this case, RR 1 will promptly terminate the related BGP session and delete the routes learned from Border 3 or Border 4. To enable BFD for BGP, execute the peer bfd command. For more information about this task, see the command reference.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
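The check in step 1 (whether the next hop of a BGP route is covered only by the default route) can also be run against saved command output. The following Python sketch is an illustration, not a supported tool. It assumes that each route line of the display ip routing-table output has the six whitespace-separated columns shown in this section, and the function names covering_routes and only_default_route are hypothetical. The sample text is a shortened copy of the RR 1 output.

```python
import ipaddress

SAMPLE_OUTPUT = """\
Destination/Mask   Proto   Pre Cost NextHop   Interface
0.0.0.0/0          BGP     255 0    19.1.1.1  GE2/0/1
3.3.3.3/32         O_INTRA 10  1    39.1.1.3  GE2/0/3
19.1.1.0/24        Direct  0   0    19.1.1.9  GE2/0/1
10.110.0.0/16      BGP     255 0    4.4.4.4   GE2/0/1
"""


def covering_routes(output, next_hop):
    """Return the destination prefixes in the output that contain next_hop."""
    addr = ipaddress.ip_address(next_hop)
    covering = []
    for line in output.splitlines():
        fields = line.split()
        # Lines without six columns (for example, additional next hops of
        # the same destination) are skipped in this sketch.
        if len(fields) != 6 or "/" not in fields[0]:
            continue
        try:
            network = ipaddress.ip_network(fields[0])
        except ValueError:
            continue  # header line or other non-route text
        if addr in network:
            covering.append(str(network))
    return covering


def only_default_route(output, next_hop):
    """True if the default route is the sole route covering next_hop."""
    return covering_routes(output, next_hop) == ["0.0.0.0/0"]


if __name__ == "__main__":
    print(only_default_route(SAMPLE_OUTPUT, "4.4.4.4"))
    print(only_default_route(SAMPLE_OUTPUT, "3.3.3.3"))
```

A True result for the original next hop of a BGP route suggests that the route is being iterated to the default route, which is the condition that step 2 removes.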
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
BGP routing loop in a cross-AS Spine-Leaf interconnect scenario
Symptom
As shown in Figure 50, the spine devices and the leaf devices are in different ASs. The spine devices are fully meshed. Spine 1 and Spine 2 each establish EBGP connections with the leaf devices. Spine 2 is enabled with load balancing and can perform load balancing across EBGP and IBGP routes. When Spine 1 restarts, traffic is routed to the leaf devices via Spine 2, and half of the traffic is lost.
Common causes
Before Spine 1 restarts, the BGP routing table of Spine 2 is similar to the following:
<Spine2> display bgp routing-table ipv4
Total number of routes: 3
BGP local router ID is 2.2.2.2
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Network NextHop MED LocPrf PrefVal Path/Ogn
* >i 0.0.0.0/0 24.1.1.4 100 0 i
* >e 100.1.1.0/24 23.1.1.3 0 0 20i
* i 1.1.1.1 0 100 0 20i
Spine 2 receives network route 100.1.1.0/24 from both Leaf 1 (23.1.1.3) and Spine 1 (1.1.1.1). The next hop for the route received from Spine 1 is a loopback interface address of Spine 1.
The IP routing table of Spine 2 is similar to the following:
<Spine2> display ip routing-table
Destinations : 24 Routes : 25
Destination/Mask Proto Pre Cost NextHop Interface
0.0.0.0/0 BGP 255 0 24.1.1.4 GE2/0/1
0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
1.1.1.1/32 O_INTRA 10 1 12.1.1.1 GE2/0/2
2.2.2.2/32 Direct 0 0 127.0.0.1 InLoop0
4.4.4.4/32 O_INTRA 10 1 24.1.1.4 GE2/0/1
12.1.1.0/24 Direct 0 0 12.1.1.2 GE2/0/2
12.1.1.0/32 Direct 0 0 12.1.1.2 GE2/0/2
12.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0
12.1.1.255/32 Direct 0 0 12.1.1.2 GE2/0/2
14.1.1.0/24 O_INTRA 10 2 12.1.1.1 GE2/0/2
O_INTRA 10 2 24.1.1.4 GE2/0/1
23.1.1.0/24 Direct 0 0 23.1.1.2 GE2/0/3
23.1.1.0/32 Direct 0 0 23.1.1.2 GE2/0/3
23.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0
23.1.1.255/32 Direct 0 0 23.1.1.2 GE2/0/3
24.1.1.0/24 Direct 0 0 24.1.1.2 GE2/0/1
24.1.1.0/32 Direct 0 0 24.1.1.2 GE2/0/1
24.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0
24.1.1.255/32 Direct 0 0 24.1.1.2 GE2/0/1
100.1.1.0/24 BGP 255 0 23.1.1.3 GE2/0/3
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0
127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
For network route 100.1.1.0/24 received from Leaf 1, the IGP route to its next hop is 23.1.1.0/24, and the IGP metric is 0. For network route 100.1.1.0/24 received from Spine 1, the IGP route to its next hop is 1.1.1.1/32, and the IGP metric is 1. The two network routes 100.1.1.0/24 cannot establish a load balancing relationship in the BGP routing table, because their IGP metrics are different. This is desired by the network administrator: Spine 2 forwards traffic destined for network segment 100.1.1.0/24 to the leaf device rather than Spine 1.
Spine 3 advertises a default route to Spine 2 through BGP, and the next hop of the default route is the IP address of the Spine 3 interface directly connected to Spine 2. After Spine 1 restarts, Spine 2 retains the session to Spine 1 until the session hold timer expires, and the routing table of Spine 2 still retains network route 100.1.1.0/24 received from Spine 1. However, the network route can be iterated only to the default route (0.0.0.0/0), because the IGP route for next hop 1.1.1.1 has become invalid and Spine 2 does not have other network routes that contain IP address 1.1.1.1.
In the BGP routing table of Spine 2, the IGP metric value is 0 for the next hop of network route 100.1.1.0/24 from Spine 1, which corresponds to route entry 0.0.0.0/0 BGP 255 0 24.1.1.4 GE2/0/1. Network routes 100.1.1.0/24 from Spine 1 and the leaf device now have the same IGP metric value, so they can establish a load balancing relationship. After traffic destined for network segment 100.1.1.0/24 arrives at Spine 2, half of the traffic is distributed to Spine 3. Then, Spine 3 forwards the traffic back to Spine 2, because Spine 3 learned network route 100.1.1.0/24 from Spine 1 and Spine 2. This causes a routing loop and packet loss.
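The change in load balancing behavior can be illustrated with a small model. The following Python sketch makes strong simplifying assumptions: load balancing across EBGP and IBGP routes is taken to be enabled (as on Spine 2), routes are treated as load-balanced only when they share the lowest IGP metric, and every other selection rule is ignored. The function name ecmp_real_next_hops is hypothetical, and the values are taken from the Spine 2 routing tables in this section.

```python
def ecmp_real_next_hops(routes):
    """Return the real next hops of the routes that share the lowest IGP metric.

    Each route is (original next hop, real next hop, IGP metric).
    """
    lowest = min(metric for _, _, metric in routes)
    return [real for _, real, metric in routes if metric == lowest]


# Candidate routes for 100.1.1.0/24 on Spine 2.
BEFORE_RESTART = [
    ("23.1.1.3", "23.1.1.3", 0),  # from the leaf, resolved by 23.1.1.0/24
    ("1.1.1.1", "12.1.1.1", 1),   # from Spine 1, resolved by 1.1.1.1/32
]
AFTER_RESTART = [
    ("23.1.1.3", "23.1.1.3", 0),
    ("1.1.1.1", "24.1.1.4", 0),   # from Spine 1, resolved by 0.0.0.0/0
]

if __name__ == "__main__":
    print(ecmp_real_next_hops(BEFORE_RESTART))
    print(ecmp_real_next_hops(AFTER_RESTART))
```

In this model, the second call returns two real next hops, one of which (24.1.1.4) points to Spine 3. That share of the traffic is the half that loops.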
Analysis
Figure 51 shows the troubleshooting flowchart:
Figure 51 Troubleshooting flowchart
Solution
1. View the BGP routing table and IP routing table of Spine 2. This example uses the network shown in Figure 50 for illustration.
a. After Spine 1 restarts, if you execute the display bgp routing-table ipv4 command on Spine 2 before Spine 2 is disconnected from Spine 1, you can find that two network routes 100.1.1.0/24 received from different devices are simultaneously selected as optimal routes.
<Spine2> display bgp routing-table ipv4
Total number of routes: 3
BGP local router ID is 2.2.2.2
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Network NextHop MED LocPrf PrefVal Path/Ogn
* >i 0.0.0.0/0 24.1.1.4 100 0 i
* >e 100.1.1.0/24 23.1.1.3 0 0 20i
* >i 1.1.1.1 0 100 0 20i
b. After you execute the display ip routing-table verbose command on Spine 2, you can find the following information:
- The two network routes 100.1.1.0/24 have established a load balancing relationship.
- For one of the routes, the real next hop is interface IP address 24.1.1.4 of Spine 3, and the output interface is the interface that directly connects Spine 2 to Spine 3.
<Spine2> display ip routing-table 100.1.1.0/24 verbose
Summary count : 2
Destination: 100.1.1.0/24
Protocol: BGP instance default
Process ID: 0
SubProtID: 0x5 Age: 00h00m13s
FlushedAge: 00h00m13s
Cost: 0 Preference: 255
IpPre: N/A QosLocalID: N/A
Tag: 0 State: Active Adv
OrigTblID: 0x0 OrigVrf: default-vrf
TableID: 0x2 OrigAs: 20
NibID: 0x16000002 LastAs: 10
AttrID: 0x2
BkAttrID: 0xffffffff Neighbor: 1.1.1.1
Flags: 0x10060 OrigNextHop: 1.1.1.1
Label: NULL RealNextHop: 24.1.1.4
BkLabel: NULL BkNextHop: N/A
SRLabel: NULL Interface: GigabitEthernet2/0/1
BkSRLabel: NULL BkInterface: N/A
Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/1
BkTunnel ID: Invalid BkIPInterface: N/A
InLabel: NULL ColorInterface: N/A
SIDIndex: NULL BkColorInterface: N/A
FtnIndex: 0x0 TunnelInterface: N/A
TrafficIndex: N/A BkTunnelInterface: N/A
Connector: N/A PathID: 0x0
UserID: 0x0 SRTunnelID: Invalid
SID Type: N/A NID: Invalid
FlushNID: Invalid BkNID: Invalid
BkFlushNID: Invalid StatFlags: 0x0
SID: N/A
BkSID: N/A
CommBlockLen: 0 Priority: Low
MemberPort: N/A
Destination: 100.1.1.0/24
Protocol: BGP instance default
Process ID: 0
SubProtID: 0x6 Age: 01h18m22s
FlushedAge: 00h00m13s
Cost: 0 Preference: 255
IpPre: N/A QosLocalID: N/A
Tag: 0 State: Active Adv
OrigTblID: 0x0 OrigVrf: default-vrf
TableID: 0x2 OrigAs: 20
NibID: 0x16000000 LastAs: 20
AttrID: 0x0
BkAttrID: 0xffffffff Neighbor: 23.1.1.3
Flags: 0x10060 OrigNextHop: 23.1.1.3
Label: NULL RealNextHop: 23.1.1.3
BkLabel: NULL BkNextHop: N/A
SRLabel: NULL Interface: GigabitEthernet2/0/3
BkSRLabel: NULL BkInterface: N/A
Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/3
BkTunnel ID: Invalid BkIPInterface: N/A
InLabel: NULL ColorInterface: N/A
SIDIndex: NULL BkColorInterface: N/A
FtnIndex: 0x0 TunnelInterface: N/A
TrafficIndex: N/A BkTunnelInterface: N/A
Connector: N/A PathID: 0x0
UserID: 0x0 SRTunnelID: Invalid
SID Type: N/A NID: Invalid
FlushNID: Invalid BkNID: Invalid
BkFlushNID: Invalid StatFlags: 0x0
SID: N/A
BkSID: N/A
CommBlockLen: 0 Priority: Low
MemberPort: N/A
c. After you execute the display ip routing-table command, you can find the following information:
- The IP routing table does not contain other network routes that contain IP address 1.1.1.1.
- The output interface and next hop IP address for the default route are GE2/0/1 and 24.1.1.4, respectively.
This indicates that network route 100.1.1.0/24 received from Spine 1 has been iterated to the default route.
<Spine2> display ip routing-table
Destinations : 23 Routes : 24
Destination/Mask Proto Pre Cost NextHop Interface
0.0.0.0/0 BGP 255 0 24.1.1.4 GE2/0/1
0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
2.2.2.2/32 Direct 0 0 127.0.0.1 InLoop0
4.4.4.4/32 O_INTRA 10 1 24.1.1.4 GE2/0/1
12.1.1.0/24 Direct 0 0 12.1.1.2 GE2/0/2
12.1.1.0/32 Direct 0 0 12.1.1.2 GE2/0/2
12.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0
12.1.1.255/32 Direct 0 0 12.1.1.2 GE2/0/2
14.1.1.0/24 O_INTRA 10 2 24.1.1.4 GE2/0/1
23.1.1.0/24 Direct 0 0 23.1.1.2 GE2/0/3
23.1.1.0/32 Direct 0 0 23.1.1.2 GE2/0/3
23.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0
23.1.1.255/32 Direct 0 0 23.1.1.2 GE2/0/3
24.1.1.0/24 Direct 0 0 24.1.1.2 GE2/0/1
24.1.1.0/32 Direct 0 0 24.1.1.2 GE2/0/1
24.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0
24.1.1.255/32 Direct 0 0 24.1.1.2 GE2/0/1
100.1.1.0/24 BGP 255 0 1.1.1.1 GE2/0/1
BGP 255 0 23.1.1.3 GE2/0/3
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0
127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0
127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
If none of the above situations exists, contact Technical Support for help.
2. Use one of the following methods to remove the routing loop:
¡ Configure routing policies to filter recursive routes.
Execute the protocol bgp nexthop recursive-lookup route-policy route-policy-name command in RIB IPv4 address family view. This operation ensures that all BGP IPv4 network routes are iterated only to routes that can pass the routing policy specified by the route-policy-name argument.
Similarly, execute the protocol bgp4+ nexthop recursive-lookup route-policy route-policy-name command in RIB IPv6 address family view. This operation ensures that all BGP IPv6 network routes are iterated only to routes that can pass the routing policy specified by the route-policy-name argument.
In this scenario, create a routing policy on Spine 2 that filters out the default route, and execute the protocol bgp nexthop recursive-lookup route-policy route-policy-name or protocol bgp4+ nexthop recursive-lookup route-policy route-policy-name command to apply the routing policy. This configuration eliminates the BGP routing loop by preventing BGP routes from being iterated to the default route.
¡ Enable BFD for BGP.
After BFD is enabled for BGP, Spine 1 and Spine 2 use a BFD session to monitor their link. If Spine 1 restarts, BFD will detect a link failure immediately. In this case, Spine 2 will promptly terminate the related BGP session and delete the routes learned from Spine 1. To enable BFD for BGP, execute the peer bfd command. For more information about this task, see the command reference.
¡ Verify that EBGP and IBGP routes cannot establish a load balancing relationship.
In this example, the two routes for network segment 100.1.1.0/24 are learned from an IBGP peer and an EBGP peer, respectively. When you configure the balance command in the related BGP instance, do not specify the eibgp keyword. Without this keyword specified, Spine 2 selects only network route 100.1.1.0/24 received from the leaf device as the optimal route, according to the BGP route selection rules. This ensures that all traffic can be forwarded correctly.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
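The effect of the first method in step 2 (restricting next hop recursion with a routing policy) can be sketched as follows. This is a conceptual Python model, not device behavior. The routing policy is represented by a hypothetical predicate named deny_default, and a BGP route is treated as unusable when no permitted route covers its next hop. The prefixes are taken from the Spine 2 IP routing table after Spine 1 restarts.

```python
import ipaddress

# Spine 2 routes after Spine 1 restarts (1.1.1.1/32 is no longer present).
RIB = ["0.0.0.0/0", "4.4.4.4/32", "12.1.1.0/24", "23.1.1.0/24", "24.1.1.0/24"]


def deny_default(prefix):
    """Hypothetical recursion policy: permit every route except 0.0.0.0/0."""
    return prefix != "0.0.0.0/0"


def resolve(next_hop, rib, policy=lambda prefix: True):
    """Longest-prefix match over routes permitted by policy; None if unresolved."""
    addr = ipaddress.ip_address(next_hop)
    permitted = [ipaddress.ip_network(p) for p in rib if policy(p)]
    matches = [net for net in permitted if addr in net]
    if not matches:
        return None
    return str(max(matches, key=lambda net: net.prefixlen))


if __name__ == "__main__":
    print(resolve("1.1.1.1", RIB))                 # no policy applied
    print(resolve("1.1.1.1", RIB, deny_default))   # default route filtered
    print(resolve("23.1.1.3", RIB, deny_default))  # leaf next hop still resolves
```

With the policy applied, next hop 1.1.1.1 is unresolved in this model, so only the route received from the leaf device remains usable for 100.1.1.0/24.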
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Public traffic interrupted in BGP network
Symptom
Public traffic is interrupted when it is forwarded through BGP.
Common causes
The following are the common causes of this type of issue:
· The next hop of the related BGP public route is unreachable.
· The distribution or reception policy for BGP public routes is inappropriate.
· The related route is discarded, because the number of BGP public routes has exceeded the maximum number of routes that the device can receive.
Analysis
Figure 52 shows the troubleshooting flowchart:
Figure 52 Troubleshooting flowchart
Solution
1. Identify whether the required BGP public route exists and is valid.
Based on the next hop of the BGP route, the expected forwarding path for public network traffic, and the network topology plan, locate the sender of the BGP public route. On the sender, execute the display bgp routing-table ipv4 unicast or display bgp routing-table ipv6 unicast command to view BGP public route information.
a. If the required BGP public route does not exist, use the import-route or network command to generate the route. After the BGP route is generated or if the required BGP public route already exists, proceed to step b.
b. Identify whether the required BGP public route is valid. A BGP route is valid only if it has a reachable next hop. Take route 10.2.1.0/24 as an example. If this route is marked with an asterisk (*) in the command output, it is a valid route.
<Sysname> display bgp routing-table ipv4 unicast
Total number of routes: 4
BGP local router ID is 192.168.100.1
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Network NextHop MED LocPrf PrefVal Path/Ogn
* > 10.2.1.0/24 10.2.1.1 0 0 i
e 10.2.1.2 0 0 4294967295 i
View the command output to identify whether the required BGP public route is valid.
- If the BGP public route is invalid, the IP routing table does not have a route to the next hop of the BGP route. In this case, check for incorrect IP routing settings (IGP or static routing settings), and make sure the IP routing table contains a route to the next hop of the BGP route.
- If the BGP public route is valid, proceed to step 2.
2. Identify whether the distribution or reception policy for BGP public routes is inappropriate.
Based on the next hop of the BGP route, the expected forwarding path for public network traffic, and the network topology plan, locate the sender and receiver of the BGP public route. On both the sender and the receiver, execute the display current-configuration configuration bgp command to view the effective BGP settings.
As shown in the following command output, the commands that define BGP route distribution or reception include:
¡ peer prefix-list
¡ peer filter-policy
¡ peer as-path-acl
¡ filter-policy
¡ peer route-policy
<Sysname> display current-configuration configuration bgp
#
bgp 20
peer 12.1.1.1 as-number 10
peer 23.1.1.3 as-number 30
#
address-family ipv4 unicast
filter-policy 2088 export
network 9.9.9.9 255.255.255.255
peer 12.1.1.1 enable
peer 12.1.1.1 filter-policy 2077 export
peer 12.1.1.1 route-policy test export
peer 23.1.1.3 as-path-acl 2 export
peer 23.1.1.3 enable
peer 23.1.1.3 next-hop-local
peer 23.1.1.3 prefix-list abc export
#
return
For more information about these commands, see BGP commands in Layer 3—IP Routing Command Reference. After you find the effective BGP settings, identify whether the configured distribution or reception policy affects the distribution or reception of BGP public routes.
¡ If the distribution or reception of BGP public routes is abnormal, correct the distribution or reception policy.
¡ If the distribution or reception of BGP public routes is normal, proceed to step 3.
3. Identify whether the number of BGP routes has exceeded the maximum.
On the receiver of the BGP public route, execute the display current-configuration configuration bgp command to check for the peer route-limit command.
¡ If the peer route-limit command is configured and the receiver has generated the following log message:
BGP/4/BGP_EXCEED_ROUTE_LIMIT: BGP.: The number of routes from peer 1.1.1.1 (IPv4-UNC) exceeds the limit 100.
This log message indicates that the sender of the BGP public route has advertised too many BGP routes, which causes some BGP public routes to be discarded by the receiver. In this case, use the following methods to resolve the issue:
- On the sending device, execute the aggregate command with the detail-suppressed or suppress-policy keyword specified to create summary routes and suppress the advertisement of summarized routes.
- On the receiving device, execute the peer route-limit command to increase the maximum number of routes that the device can receive.
¡ Proceed to step 4 if one of the following conditions exists:
- The peer route-limit command is not configured.
- The peer route-limit command is configured, but the number of routes received by the receiving device is below the upper limit (no BGP/4/BGP_EXCEED_ROUTE_LIMIT log message is generated).
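When many log messages have been collected, the peer and limit can be extracted from BGP_EXCEED_ROUTE_LIMIT messages with a script. The following Python sketch is an assumption-laden example: the pattern is derived only from the single sample message shown in this step, and other software versions might format the message differently. The function name parse_route_limit_log is hypothetical.

```python
import re

# Pattern derived from the one sample message in this step (an assumption).
LOG_PATTERN = re.compile(
    r"BGP_EXCEED_ROUTE_LIMIT:.*?The number of routes from peer (\S+) "
    r"\((\S+)\) exceeds the limit (\d+)\."
)


def parse_route_limit_log(message):
    """Return peer, address family, and limit as a dict, or None if no match."""
    found = LOG_PATTERN.search(message)
    if found is None:
        return None
    return {
        "peer": found.group(1),
        "address_family": found.group(2),
        "limit": int(found.group(3)),
    }


sample = ("BGP/4/BGP_EXCEED_ROUTE_LIMIT: BGP.: The number of routes from "
          "peer 1.1.1.1 (IPv4-UNC) exceeds the limit 100.")
print(parse_route_limit_log(sample))
```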
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
· 1.3.6.1.4.1.25506.2.202.4.0.1 hh3cBgpPeerRouteNumThresholdExceed
· 1.3.6.1.4.1.25506.2.202.4.0.2 hh3cBgpPeerRouteNumThresholdCleard
· 1.3.6.1.4.1.25506.2.202.4.0.3 hh3cBgpPeerRouteExceed
· 1.3.6.1.4.1.25506.2.202.4.0.4 hh3cBgpPeerRouteExceedClear
· 1.3.6.1.4.1.25506.2.202.4.0.5 hh3cBgpPeerEstablished
· 1.3.6.1.4.1.25506.2.202.4.0.6 hh3cBgpPeerBackwardTransition
Log messages
· BGP/4/BGP_EXCEED_ROUTE_LIMIT
· BGP/5/BGP_REACHED_THRESHOLD
IS-IS issues
IS-IS neighbor establishment failure
Symptom
· The IS-IS neighbor is down.
· The IS-IS neighbor relationship flaps.
Common causes
The following are the common causes of this type of issue:
· IS-IS cannot send or receive hello packets normally, because a device has underlying faults or the link between the two devices fails.
· The devices at the ends of the link use the same system ID.
· The interfaces connected by the link use different MTU settings or the effective interface MTU is smaller than the transmitted hello packets.
· The IP addresses of the interfaces connected by the link are not in the same network segment.
· The interfaces connected by the link use different authentication modes.
· The two ends of the link are at mismatching IS-IS levels.
· When the two devices establish an IS-IS Level-1 neighbor relationship, they use different area addresses.
Troubleshooting flow
Figure 53 shows the troubleshooting flowchart.
Figure 53 Flowchart for troubleshooting IS-IS neighbor establishment failures
Solution
1. Identify whether the IS-IS interface is up at the physical layer.
Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the physical state of the IS-IS interface. If the interface is down, first resolve interface failures. If the interface is up, proceed to step 2.
2. Identify whether the link between the two devices fails.
¡ Execute the ping command to identify whether the link between the two devices fails (including whether the transport devices fail). If the link is operating properly, proceed to step 3.
To have IS-IS use BFD for link state detection, execute the isis bfd session-restrict-adj command to enable BFD session state-based control of adjacency establishment and maintenance. The IS-IS interfaces will advertise BFD-enabled TLVs in hello packets to each other. If the exchanged BFD-enabled TLVs carry the same information, BFD session state-based control of adjacency establishment and maintenance takes effect. After the BFD session goes down, the interfaces cannot establish an IS-IS adjacency.
¡ Execute the display bfd session command to view the state of the BFD session that monitors the IS-IS link. If the State field displays Down, first resolve link failures. If the State field displays Up, proceed to step 3.
3. Identify whether the CPU or memory usage is too high.
¡ Execute the display cpu-usage command to identify whether the MPU and interface modules on the failed device have high CPU usage. If the CPU usage is too high, IS-IS cannot normally send or receive packets, resulting in neighbor flapping. To resolve this issue, disable unnecessary features. If the CPU usage is not high, proceed to step 4.
¡ Execute the display memory-threshold command, and then view the Current free-memory state field in the command output. This field displays the current memory usage of the system. If this field displays Minor, Severe, or Critical, the free memory resources might be tight. As a result, the device might be unable to receive or send IS-IS packets, or might process IS-IS packets slowly. To resolve this issue, disable unnecessary features. If the current memory usage is normal, proceed to step 4.
4. Identify whether the state of the IS-IS interface is normal.
Execute the display isis interface command, and then view the IPv4 state or IPv6 state field to identify whether the IS-IS interface is in normal state.
¡ If the state of the IS-IS interface is Lnk:Up/IP:Dn, the interface is up at the link layer but is down at the network layer. You need to resolve interface faults at the network layer.
¡ If the state of the IS-IS interface is Up, proceed to step 5.
5. Identify whether IP addresses of the two IS-IS interfaces are in the same network segment.
¡ For IPv4 IS-IS, execute the display interface brief command to view the IPv4 address of each IS-IS interface.
- If the two IPv4 interface addresses are in different network segments, execute the ip address command on either of the interfaces to adjust its IPv4 address. This operation ensures that the IPv4 addresses of the two IS-IS interfaces are in the same network segment.
- If the two IPv4 interface addresses are in the same network segment, proceed to step 6.
¡ For IPv6 IS-IS, this check is not required.
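The same-segment comparison in this step can be verified with the Python standard library. The following is a minimal sketch. The addresses are made-up examples, and same_segment is a hypothetical helper name.

```python
import ipaddress


def same_segment(local, remote):
    """Return True if two 'address/prefix-length' strings share one IPv4 network."""
    local_if = ipaddress.ip_interface(local)
    remote_if = ipaddress.ip_interface(remote)
    return local_if.network == remote_if.network


# Example addresses (assumptions, not taken from a real device).
print(same_segment("10.1.1.1/24", "10.1.1.2/24"))
print(same_segment("10.1.1.1/24", "10.1.2.2/24"))
```

The first call reports that the two interfaces are in the same network segment, and the second call reports that they are not. Two addresses with different mask lengths are also reported as being in different segments.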
6. Identify whether the two IS-IS interfaces use the same MTU value.
Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view interface MTU information.
¡ If the two IS-IS interfaces use different MTU values, execute the mtu size command on either of the interfaces to adjust its MTU value. This operation ensures that the two IS-IS interfaces use the same MTU value.
¡ If the two IS-IS interfaces use the same MTU value, proceed to step 7.
7. Identify whether IS-IS can receive hello packets.
Execute the display isis packet hello by-interface verbose command to identify whether IS-IS can receive hello packets. If the device cannot receive hello packets, resolve the packet loss issue. If the issue persists, proceed to step 12.
If the device can receive hello packets, continue to perform the following checks:
¡ If the value for the Duplicate system ID field increases over time, a system ID conflict exists and you need to proceed to step 8.
¡ If the value for the Mismatched level (LAN) field increases over time, an IS level mismatch exists and you need to proceed to step 9.
¡ If the value for the Bad area address TLV field increases over time, an area address mismatch exists and you need to proceed to step 10.
¡ If the values for other fields increase over time, proceed to step 12.
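"Increases over time" means comparing at least two readings of the same counters. The following Python sketch shows one way to do that comparison. It assumes you have already copied the error counters from two runs of the display isis packet hello by-interface verbose command into dictionaries. The field names come from this step, the counter values are made up, and next_steps is a hypothetical helper name.

```python
# Mapping from the fields named in this step to the step to perform next.
NEXT_STEP = {
    "Duplicate system ID": 8,
    "Mismatched level (LAN)": 9,
    "Bad area address TLV": 10,
}


def next_steps(earlier, later):
    """Compare two snapshots of error counters and return the steps to perform."""
    steps = set()
    for field, value in later.items():
        if value > earlier.get(field, 0):
            # An increasing field that this step does not name maps to step 12.
            steps.add(NEXT_STEP.get(field, 12))
    return sorted(steps)


# Made-up counter values for illustration.
first = {"Duplicate system ID": 0, "Mismatched level (LAN)": 3, "Bad area address TLV": 0}
second = {"Duplicate system ID": 0, "Mismatched level (LAN)": 9, "Bad area address TLV": 0}
print(next_steps(first, second))
```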
8. Identify whether the devices connected by the link use the same system ID.
Execute the display current-configuration configuration isis command to view the system IDs of the devices.
¡ If their system IDs are identical, change the system ID of either device.
¡ If their system IDs are different, proceed to step 9.
9. Identify whether the devices connected by the link have an IS level mismatch.
Identify the IS level of each device and the circuit level of each IS-IS interface.
¡ Execute the display current-configuration | include is-level command to view the IS levels of the two devices connected by the link. If this command does not display the IS level of a device, the IS level of the device is Level-1-2, the default value.
¡ Execute the display current-configuration interface interface-type interface-number | include circuit-level command to view the circuit levels of IS-IS interfaces. If this command does not display the circuit level of an IS-IS interface, the circuit level of the interface is Level-1-2, the default value. The interface can establish both Level-1 and Level-2 adjacencies.
Two IS-IS interfaces can establish an IS-IS neighbor relationship only when their circuit levels meet one of the following requirements:
¡ If the circuit level of the local interface is Level-1, the circuit level of the remote interface must be Level-1 or Level-1-2.
¡ If the circuit level of the local interface is Level-2, the circuit level of the remote interface must be Level-2 or Level-1-2.
¡ If the circuit level of the local interface is Level-1-2, the circuit level of the remote interface can be Level-1, Level-2, or Level-1-2.
Perform one of the following troubleshooting operations accordingly:
¡ If the two devices have an IS level mismatch, execute the is-level command in IS-IS view for either of the devices to adjust its IS level. Alternatively, execute the isis circuit-level command in interface view for the desired interface to adjust its circuit level.
¡ If the IS levels of the two devices are matching, proceed to step 10.
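The three circuit-level requirements in this step amount to a set intersection: two interfaces can form a neighbor relationship only if they share at least one level. The following Python sketch models only those three rules. It does not model the IS level of the device, and common_levels is a hypothetical helper name.

```python
# Levels that each circuit-level setting allows on an interface.
LEVELS = {
    "Level-1": {1},
    "Level-2": {2},
    "Level-1-2": {1, 2},
}


def common_levels(local, remote):
    """Return the sorted list of levels that both circuit levels allow."""
    return sorted(LEVELS[local] & LEVELS[remote])


for pair in [("Level-1", "Level-2"), ("Level-1-2", "Level-2"), ("Level-1-2", "Level-1-2")]:
    shared = common_levels(*pair)
    verdict = "can establish" if shared else "cannot establish"
    print(pair, verdict, shared)
```

An empty result corresponds to a circuit-level mismatch that must be corrected with the is-level or isis circuit-level command.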
10. Identify whether the area addresses of the two devices connected by the link are matching.
Execute the display isis command, and then view the Network entity field in the command output to identify whether the area addresses of the devices are matching. The network entity title (NET) format is X…X.XXXX.XXXX.XXXX.00. The X…X segment represents the area address, the XXXX.XXXX.XXXX segment represents the system ID, and the 00 segment is the SEL.
¡ Two IS-IS devices can establish a Level-1 neighbor relationship only when they are in the same area. When they establish an IS-IS Level-2 neighbor relationship, area address check is not required.
When the two devices fail to establish a Level-1 neighbor relationship due to area address inconsistency, execute the network-entity command in IS-IS view for either of the devices to adjust its area address.
¡ If the area addresses of the two devices are matching, proceed to step 11.
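The NET layout described in this step can be split mechanically: the last segment is the SEL, the three segments before it form the system ID, and everything before that is the area address. The following Python sketch shows the split. The NET values are made-up examples, split_net and same_area are hypothetical helper names, and the sketch compares a single NET per device.

```python
def split_net(net):
    """Split a NET string into (area address, system ID, SEL)."""
    parts = net.strip().lower().split(".")
    if len(parts) < 5:
        raise ValueError(f"not a NET in X...X.XXXX.XXXX.XXXX.00 form: {net!r}")
    area_address = ".".join(parts[:-4])
    system_id = ".".join(parts[-4:-1])
    sel = parts[-1]
    return area_address, system_id, sel


def same_area(net_a, net_b):
    """Return True if two NETs carry the same area address."""
    return split_net(net_a)[0] == split_net(net_b)[0]


# Example NETs (assumptions for illustration).
print(split_net("10.0000.0000.0001.00"))
print(same_area("10.0000.0000.0001.00", "10.0000.0000.0002.00"))
print(same_area("10.0000.0000.0001.00", "20.0000.0000.0003.00"))
```

The same split also exposes the system ID, which can be used for the duplicate system ID check in step 8.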
11. Identify whether the two devices connected by the link are in the same IS-IS authentication mode.
Execute the display current-configuration interface interface-type interface-number | include isis command to view the authentication mode of the IS-IS interface on each device.
a. If the two IS-IS interfaces are in different authentication modes, execute the isis authentication-mode command on either of the IS-IS interfaces to adjust its authentication mode. This operation ensures that the two IS-IS interfaces are in the same authentication mode.
b. If the two IS-IS interfaces are in the same authentication mode and still fail to establish a neighbor relationship, verify that they use the same authentication password.
If the issue persists, proceed to step 12.
12. Collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module name: ISIS-MIB
isisAdjacencyChange (1.3.6.1.2.1.138.0.17)
Log messages
ISIS/3/ISIS_NBR_CHG
IS-IS route learning failure
Symptom
A device cannot learn an IS-IS route.
Common causes
The following are the common causes of this type of issue:
· Other routing protocols have advertised routes with the same destination address, and their protocol preferences are higher than that for IS-IS.
· The route is not selected as an optimal route, because it is redistributed into IS-IS and its preference is low.
· The route is not selected as an optimal route, because it is redistributed into IS-IS and is of a different cost type.
· The device and the advertisement source device are in different IS-IS cost styles.
· The device and the advertisement source device have not established a normal IS-IS neighbor relationship.
· The device and the advertisement source device are configured with the same system ID.
· LSP authentication fails.
· Some LSPs are lost because the device has underlying faults or the link between the two devices fails.
· The device cannot receive the LSPs from the advertisement source device, because the LSP length has exceeded the maximum length of LSPs that the device can receive.
Troubleshooting flow
Figure 54 shows the troubleshooting flowchart.
Figure 54 Flowchart for troubleshooting IS-IS route learning failures
Solution
1. Identify whether the IS-IS routing table contains the desired IS-IS route.
Execute the display isis route command to view the IS-IS routing table.
¡ If the IS-IS route exists in the IS-IS routing table, execute the display ip routing-table ip-address [ mask | mask-length ] verbose command to check for routes with protocol preferences higher than that for IS-IS routes.
- If such routes exist, adjust the configuration according to the network plan.
- If such routes do not exist, proceed to step 7.
¡ If the IS-IS route does not exist in the IS-IS routing table, proceed to step 2.
2. Identify whether the desired IS-IS route is advertised.
Execute the display isis lsdb verbose local command on the advertisement source device to identify whether the device has advertised LSPs that carry the IS-IS route.
¡ If no LSPs carry the IS-IS route, check for incorrect IS-IS configurations on the advertisement source device. For example, you can check whether the related interface is enabled with IS-IS. If the IS-IS route is an external route redistributed into IS-IS, execute the display ip routing-table protocol protocol verbose command, and then view the State field of the route. If this field contains Inactive, the external route is inactive. IS-IS does not advertise inactive routes. In this situation, adjust the configuration related to external routes so that the State field of the route contains Active and Adv.
¡ If an LSP that carries the IS-IS route exists, proceed to step 3.
3. Identify whether the desired IS-IS route is of the same cost type as other redistributed routes with the same destination address.
When multiple devices advertise routes to the same destination through route redistribution and these external routes need to form a load balancing relationship, make sure these routes are of the same cost type after redistribution by IS-IS. The cost value for a redistributed route varies by its cost type:
¡ If the cost type is external, the cost value equals the original cost value plus 64 when IS-IS advertises the route in LSPs.
¡ If the cost type is internal, the cost value equals the original cost value when IS-IS advertises the route in LSPs.
By default, the cost type is external for external routes redistributed by H3C devices. If the cost type of external routes redistributed by non-H3C devices is not external, the cost values for routes with the same destination address will be different. As a result, IS-IS neighbors will select the route with the lowest cost value as the optimal route. In this case, adjust the cost type of redistributed external routes to ensure that the external routes redistributed by devices from various vendors are all of the same cost type. To adjust the cost type of external routes redistributed by an H3C device:
a. On the device that advertises the desired IS-IS route, execute the display current-configuration configuration isis command to view the route redistribution configuration for IS-IS.
b. Execute the import-route command to adjust the cost type of external routes redistributed into IS-IS.
In situations other than those mentioned above, proceed to step 4.
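The effect of a cost type mismatch described in this step can be illustrated with a short calculation. The following Python sketch applies only the two rules stated above (external adds 64, internal adds nothing). The original cost of 10 is a made-up value, and advertised_cost is a hypothetical helper name.

```python
EXTERNAL_OFFSET = 64  # Added to the original cost when the cost type is external.


def advertised_cost(original_cost, cost_type):
    """Return the cost carried in LSPs for a redistributed route."""
    if cost_type == "external":
        return original_cost + EXTERNAL_OFFSET
    if cost_type == "internal":
        return original_cost
    raise ValueError(f"unknown cost type: {cost_type!r}")


# Two devices redistribute the same destination with the same original cost.
print(advertised_cost(10, "external"))
print(advertised_cost(10, "internal"))
```

With the same original cost, the two advertised costs differ, so neighbors prefer the route advertised with the internal cost type and the routes cannot load share.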
4. Identify whether the IS-IS database has been synchronized.
On the device that cannot learn the IS-IS route, execute the display isis lsdb command to identify whether the device has received an LSP that contains the IS-IS route from the advertisement source device.
¡ If the desired LSP does not exist in the LSDB, check for link failures. If no link failures are found, execute the display isis command, and then view the LSP length receive field to determine whether the related LSP is too long for the device to receive. When the value of this field exceeds the maximum LSP length supported by the device, use the lsp-length originate command on the advertisement source to change the maximum length of generated LSPs. Make sure the maximum length of LSPs generated by the advertisement source equals the minimum IS-IS interface MTU within the current area.
¡ If the desired LSP exists in the LSDB, but the following conditions exist, the system ID of the advertisement source device conflicts with that of another device:
- The value for the Seq Num field of the LSP keeps increasing.
- The value for the Seq Num field of the LSP is different from that on the advertisement source device. To view the value for the Seq Num field of the LSP on the advertisement source device, use the display isis lsdb local verbose command.
In this situation, find the device that uses the same system ID as the advertisement source device, and then change the system ID of either device.
¡ If the desired LSP exists in the LSDB, but the following conditions exist, LSPs might be discarded during transmission:
- The value for the Seq Num field of the LSP remains unchanged.
- The value for the Seq Num field of the LSP is different from that on the advertisement source device. To view the value for the Seq Num field of the LSP on the advertisement source device, use the display isis lsdb local verbose command.
In this situation, check for underlying faults on the device and identify whether the intermediate links between the device and the advertisement source device fail.
¡ If the following conditions exist, proceed to step 5:
- The desired LSP exists in the LSDB.
- The value for the Seq Num field of the LSP is the same as that on the advertisement source device. To view the value for the Seq Num field of the LSP on the advertisement source device, use the display isis lsdb local verbose command.
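The four cases in this step can be summarized as a small decision function. The following Python sketch is a simplified model under stated assumptions: the Seq Num values have already been read from the command output and converted to integers, the local values are listed in the order they were observed, and "keeps increasing" is approximated as "the last observed value is greater than the first". The function name classify_lsp is hypothetical.

```python
def classify_lsp(local_seq_samples, source_seq):
    """Classify the LSDB state from Seq Num observations.

    local_seq_samples: Seq Num values seen over time on the device that cannot
        learn the route (an empty list if the LSP is not in the LSDB).
    source_seq: Seq Num of the same LSP on the advertisement source device.
    """
    if not local_seq_samples:
        return "LSP missing: check links and the LSP length"
    latest = local_seq_samples[-1]
    if latest == source_seq:
        return "LSDB synchronized: proceed to step 5"
    if latest > local_seq_samples[0]:
        return "Seq Num keeps increasing: look for a system ID conflict"
    return "Seq Num unchanged: LSPs might be discarded in transit"


# Made-up observations for illustration.
print(classify_lsp([], 6))
print(classify_lsp([5, 7, 9], 6))
print(classify_lsp([3, 3, 3], 6))
print(classify_lsp([6, 6], 6))
```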
5. Identify whether the device and the advertisement source device use the same cost style.
Execute the display isis command on the device and the advertisement source device separately, and then view the value for the Cost style field to identify whether the two devices use the same cost style. They can learn routes from each other only if their cost styles are the same.
¡ If the two devices use different cost styles, execute the cost-style command in IS-IS view for either of the devices to change its cost style.
¡ If the two devices use the same cost style, proceed to step 6.
6. Identify whether all devices along the path between the device and the advertisement source device have established IS-IS neighbor relationships correctly.
Execute the display isis peer command on each device to check for abnormal IS-IS neighbor relationships.
¡ If some neighbor relationships are established incorrectly, resolve this issue as described in "IS-IS neighbor establishment failure."
¡ If all neighbor relationships are established correctly, proceed to step 7.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
IS-IS route flapping
Symptom
An IS-IS route is repeatedly added and deleted.
Common causes
The following are the common causes of this type of issue:
· The IS-IS neighbor flaps.
· The MPLS LSP tunnel flaps.
· Both the local and remote devices redistribute the same external route into IS-IS, and the external route takes precedence over the IS-IS route.
· The local and remote devices are configured with the same system ID.
Troubleshooting flow
Figure 55 shows the troubleshooting flowchart.
Figure 55 Flowchart for troubleshooting IS-IS route flapping
Solution
1. View the route flapping details.
Execute the display ip routing-table ip-address verbose command to view the route flapping details as follows:
¡ If the TunnelID field of the IS-IS route changes before and after route flapping, identify whether the MPLS LSP tunnel flaps.
Execute the display mpls lsp verbose command, and then check the Last Chg Time field, which shows when the state of the LDP LSP last changed. If that time is close to the time when you executed the display mpls lsp verbose command, the MPLS LSP tunnel is flapping.
In this situation, check for and troubleshoot LSP flapping issues. For more information, see the solutions to LDP LSP flapping issues or sudden TE tunnel state changes (from up to down).
¡ If the Cost or Interface field of the IS-IS route changes, check for IS-IS neighbor flapping along the route.
¡ If the route intermittently appears in and disappears from the routing table (the Age field is flapping), execute the display isis lsdb verbose command to find the LSP that carries the IS-IS route. Record the LSP ID of the LSP, and then execute the display isis lsdb verbose lsp-id command to view the update status of this LSP.
- If the LSP always carries the IS-IS route, check for IS-IS neighbor flapping along the route.
- If the Seq Num field of the LSP keeps increasing and a sharp content change exists before and after the LSP update, check for devices configured with the same system ID in the network.
- If the Seq Num field of the LSP keeps increasing and the IS-IS route is intermittently present before and after the LSP update, perform step 2 on the device that generated the LSP.
¡ If the Protocol field of the IS-IS route changes, proceed to step 2.
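The field comparison in this step can be sketched as a diff of two route snapshots. The following Python example assumes that you have copied the relevant fields from two runs of the display ip routing-table ip-address verbose command into dictionaries. The field names come from this step, the values and interface names are made up, and flapping_hints is a hypothetical helper name.

```python
def flapping_hints(before, after):
    """Compare two snapshots of one route and return the checks this step suggests.

    Each snapshot is a dict of field name to value, or None if the route was
    absent from the routing table at that moment.
    """
    if before is None or after is None:
        return ["find the LSP that carries the route and watch its Seq Num field"]
    hints = []
    if before.get("TunnelID") != after.get("TunnelID"):
        hints.append("check for MPLS LSP tunnel flapping")
    if (before.get("Cost") != after.get("Cost")
            or before.get("Interface") != after.get("Interface")):
        hints.append("check for IS-IS neighbor flapping along the route")
    if before.get("Protocol") != after.get("Protocol"):
        hints.append("check the route redistribution configuration (step 2)")
    return hints


# Made-up snapshots for illustration.
snap_a = {"Protocol": "IS_L1", "Cost": 20, "Interface": "GE1/0/1", "TunnelID": "Invalid"}
snap_b = {"Protocol": "IS_L1", "Cost": 30, "Interface": "GE1/0/2", "TunnelID": "Invalid"}
print(flapping_hints(snap_a, snap_b))
```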
2. Check the route redistribution configuration for IS-IS.
If the IS-IS route is an external route redistributed into IS-IS, execute the display ip routing-table ip-address verbose command on the device into which the route was redistributed. This command displays the route flapping details.
¡ If the IS-IS route is not active and another IS-IS route with the same destination address is in Active state in the routing table, it indicates that other IS-IS devices in the network have advertised the same route. To resolve this issue, perform one of the following operations:
- Adjust the route preference according to the network plan.
- Configure a route filtering policy on the IS-IS device that redistributes the external route to control the routes flushed to the IP routing table.
¡ In other situations, proceed to step 3.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting OSPFv3
OSPFv3 neighbor down
Symptom
· The OSPFv3 neighbor goes down.
· OSPFv3 neighbor flapping occurs.
Common causes
The following are the common causes for this type of issue:
· The BFD session is down, which indicates that BFD detects a link failure.
· The remote device fails.
· CPU usage or memory usage is excessively high.
· Link failures occur.
· The OSPFv3 interface is not up.
· The IP addresses of the two ends are not on the same network.
· The OSPFv3 settings of the two ends do not match.
¡ Router ID conflict occurs.
¡ Area types of the two ends are not consistent.
¡ OSPFv3 authentication modes of the two ends are not consistent.
¡ The timer settings of the two ends are not consistent.
¡ The network types of the OSPFv3 interfaces at the two ends do not match.
Troubleshooting flow
Figure 56 shows the troubleshooting flowchart.
Figure 56 Flowchart for troubleshooting OSPFv3 neighbor down
Solution
1. Identify the reason for the OSPFv3 neighbor down issue through the CLI.
Execute the display ospfv3 event-log peer command. The Reason field in the command output represents the reasons for neighbor state changes. Common options include:
¡ DeadExpired
The device does not receive any Hello packet before the dead timer expires, and the OSPFv3 neighbor state becomes Down. In this case, proceed to the next step.
¡ BFDDown
The BFD session goes down, causing the OSPFv3 neighbor state to become Down. In this case, proceed to the next step.
¡ 1-Way
The neighbor’s OSPFv3 state becomes Down. The neighbor sends 1-way Hello packets to the local device, causing the OSPFv3 neighbor state to become Init on the local device. In this case, troubleshoot faults on the neighbor.
¡ IntPhyChange
The interface goes down or its MTU changes, tearing down the neighbor relationship. In this case, execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the running state and related information about the interface, and troubleshoot the interface faults. For other situations, proceed to step 11.
2. Identify whether the physical layer state of the interface is Up.
Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the physical layer state of the OSPFv3 interface. If the physical layer state is Down, first troubleshoot the interface faults. If the physical layer state is Up, proceed to step 3.
3. Identify whether the link fails.
Execute the ping command to identify whether the link, including transmission devices, fails. If the link operates correctly, proceed to step 4.
4. Identify whether the CPU usage is excessively high.
Execute the display cpu-usage command to identify whether the CPU usage of the device's MPU and interface module is excessively high. High CPU usage prevents the normal transmission of OSPFv3 packets, causing neighbor flapping. To resolve this issue, disable unnecessary features. If the CPU usage is not high, proceed to the next step.
5. Identify whether the memory usage exceeds the memory usage threshold.
Execute the display memory-threshold command, and then view the Current free-memory state field, which represents the current memory usage of the system. If this field displays Minor, Severe, or Critical, the remaining free memory is low. In this case, the device might be unable to send or receive OSPFv3 packets, or might process OSPFv3 packets slowly. To resolve this issue, disable unnecessary features. If the Current free-memory state field displays Normal, proceed to step 6.
6. Identify whether each OSPFv3 interface is in a normal state.
Execute the display ospfv3 interface command to identify whether the OSPFv3 interface is in a normal state.
¡ If the OSPFv3 interface is in Down state, identify whether OSPFv3 is enabled on the interface. If OSPFv3 is enabled, troubleshoot the interface issue on the network layer.
¡ If the OSPFv3 interface is in a normal state, including DR, BDR, DROther, and P-2-P, proceed to the next step.
7. Identify whether the OSPFv3 interfaces have the same MTU value.
If the ospfv3 mtu-ignore command is not executed for the interfaces, the interfaces must have the same MTU value. If they have different MTU values, OSPFv3 neighbor relationships cannot be established. Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the MTU information of each interface.
¡ If the interfaces have different MTU values, execute the mtu size command in interface view to configure the same MTU value for the interfaces.
¡ If the interfaces have the same MTU value, proceed to step 8.
8. Identify whether the DR priority of each interface is not zero.
On a broadcast or NBMA network, to elect a DR correctly, make sure that a minimum of one OSPFv3 interface has a non-zero DR priority. If all OSPFv3 interfaces have the DR priority of zero, the neighbor states at both ends can only become 2-Way. Execute the display ospfv3 interface command to view OSPFv3 interface information. The Priority field in the command output displays the DR priority of the interface.
If one or multiple interfaces have non-zero DR priorities, proceed to the next step.
9. Identify whether a neighbor has been manually specified for the NBMA or P2MP unicast interface.
When the network type of an interface is NBMA or P2MP unicast, you must use the ospfv3 peer command to specify a neighbor by its link-local address. Execute the display this command in interface view to view the interface configuration. If the network type of the interface is NBMA or P2MP unicast and no neighbor is specified, execute the ospfv3 peer command to manually specify a neighbor by its link-local address.
If a neighbor has been manually specified for the NBMA or P2MP unicast interface, proceed to the next step.
10. Identify whether the OSPFv3 settings at the two ends are incorrect.
a. Execute the display ospfv3 command to view the OSPFv3 router IDs of the two ends. If the two ends are configured with the same router ID, edit the configuration to avoid the conflict. If the two ends are configured with different router IDs, proceed to the next step.
b. Execute the display ospfv3 interface command to view the area IDs of the two ends. If the two ends are configured with different area IDs, edit the configuration to ensure consistency. If the two ends are configured with the same area ID, proceed to the next step.
c. Execute the display ospfv3 interface command to view the network types of interfaces at the two ends. If the two interfaces are configured with different network types, edit the configuration to ensure consistency. If P2P is specified for one end and broadcast for the other, the neighbor relationship can enter Full state, but routing information cannot be calculated.
If the two ends are configured with the same network type, proceed to the next step.
d. Execute the display ospfv3 statistics error command every 10 seconds for 5 minutes to view OSPFv3 error statistics. Pay attention to the following fields:
- Authentication failure field. If the value of this field keeps increasing, it indicates that the two neighbors are configured with different OSPFv3 authentication modes. To resolve this issue, configure the same authentication mode for them.
- HELLO: Hello-time mismatch field. If the value of this field keeps increasing, it indicates that the two interfaces are configured with different hello intervals. To resolve this issue, configure the same hello interval for them.
- HELLO: Dead-time mismatch field. If the value of this field keeps increasing, it indicates that the two interfaces are configured with different dead intervals. To resolve this issue, configure the same dead interval for them.
- HELLO: Ebit option mismatch field. If the value of this field keeps increasing, it indicates that areas to which the two neighbors belong are of different types, for example, one is in a normal area and the other in a stub or NSSA area. To resolve this issue, configure the same area type for them.
If the issue persists, proceed to step 11.
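Sampling the error statistics every 10 seconds for 5 minutes produces about 30 readings per counter, which are easier to compare with a script. The following Python sketch assumes that each reading has been copied into a dictionary keyed by the field names listed in substep d. It approximates "keeps increasing" as "the last reading is greater than the first". The sample values are made up, and growing_fields is a hypothetical helper name.

```python
# Fields named in substep d and the corrective action the text gives for each.
FIXES = {
    "Authentication failure": "configure the same authentication mode",
    "HELLO: Hello-time mismatch": "configure the same hello interval",
    "HELLO: Dead-time mismatch": "configure the same dead interval",
    "HELLO: Ebit option mismatch": "configure the same area type",
}


def growing_fields(samples):
    """Return {field: action} for each listed counter that grew across samples."""
    first, last = samples[0], samples[-1]
    return {
        field: action
        for field, action in FIXES.items()
        if last.get(field, 0) > first.get(field, 0)
    }


# Three made-up readings; a real run would collect about 30.
readings = [
    {"Authentication failure": 0, "HELLO: Dead-time mismatch": 4},
    {"Authentication failure": 0, "HELLO: Dead-time mismatch": 7},
    {"Authentication failure": 0, "HELLO: Dead-time mismatch": 11},
]
print(growing_fields(readings))
```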
11. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module name: OSPFV3-MIB
· ospfv3VirtIfStateChange (1.3.6.1.2.1.191.0.1)
· ospfv3NbrStateChange (1.3.6.1.2.1.191.0.2)
· ospfv3VirtNbrStateChange (1.3.6.1.2.1.191.0.3)
Log messages
· OSPFV3/6/OSPFV3_LAST_NBR_DOWN
· OSPFV3/5/OSPFV3_NBR_CHG
OSPFv3 neighbor unable to enter Full state
Symptom
The OSPFv3 neighbor state machine involves neighbor states of Down, Init, 2-way, ExStart, Exchange, Loading, and Full. Among them, the stable states are Down, 2-way, and Full.
· Down—OSPFv3 is not enabled.
· 2-way—The neighbor relationship between DR Others.
· Full—The local device and the neighbor are fully adjacent.
In networks using OSPFv3 for route calculation and forwarding, only 2-way and Full are normal neighbor states. If the neighbor state is neither 2-way nor Full, it indicates an abnormal neighbor relationship.
Common causes
The following are the common causes for this type of issue:
· OSPFv3 packets are dropped due to link failures.
· The DR priority configuration for the interfaces is not appropriate.
· The two ends are configured with different MTU values.
Troubleshooting flow
Figure 57 shows the troubleshooting flowchart.
Figure 57 Flowchart for troubleshooting OSPFv3 neighbor unable to enter Full state
Solution
1. Execute the display ospfv3 peer command to view OSPFv3 neighbor information, and perform different tasks based on the neighbor state.
¡ If no neighbor information exists:
Identify whether a Router ID is configured for the OSPFv3 process. If no Router ID is configured, the OSPFv3 process cannot operate. If a Router ID is configured, it indicates that the OSPFv3 neighbor goes down or neighbor flapping occurs.
¡ If the neighbor state remains Init:
It indicates that the remote device cannot receive Hello packets from the local end. In this case, identify whether the link or the remote device fails.
¡ If the neighbor state remains 2-Way:
Execute the display ospfv3 interface verbose command to identify whether the DR priority of the neighbor-facing OSPFv3 interface is zero.
- If the DR priority of the OSPFv3 interface is zero, no action is required.
- If the DR priority of the OSPFv3 interface is not zero, proceed to step 2.
¡ If the neighbor state remains ExStart:
It indicates that the device is performing DD negotiation but cannot perform DD synchronization. The following are the common causes for this type of issue:
- The interface cannot send and receive oversized packets correctly.
Execute the ping -s packet-size neighbor-address command multiple times with the packet-size argument set to 1500 or greater, and then compare the numbers of transmitted and received packets. If the remote end cannot be pinged, resolve the link issue first.
- The two ends are configured with different MTU values.
If the OSPFv3 interface is not configured to ignore MTU check by using the ospfv3 mtu-ignore command, identify whether the two ends are configured with the same MTU value. If they are configured with different MTU values, configure the same MTU value for them.
If the issue persists, proceed to step 2.
¡ If the neighbor state remains Exchange:
It indicates that the device is performing DD packet exchange. Troubleshoot this issue in the same way as when the neighbor state remains ExStart.
If the issue persists, proceed to step 2.
¡ If the neighbor state remains Loading:
Execute the reset ospfv3 [ process-id ] process command to restart the OSPFv3 process.
If the issue persists, proceed to step 2.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
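The state-based branching in step 1 can be summarized as a lookup table. The following Python sketch is only an aid for reading the procedure. The names NEXT_ACTION and next_action are hypothetical, and the action strings paraphrase the steps above.

```python
EXSTART_ACTION = (
    "Test oversized packets with ping, then compare the MTU values of "
    "both ends unless ospfv3 mtu-ignore is configured."
)

# Keys are lowercase neighbor states; "none" means no neighbor information.
NEXT_ACTION = {
    "none": "Check whether a Router ID is configured for the OSPFv3 process.",
    "init": "Check the link and the remote device.",
    "2-way": "Check the DR priority with display ospfv3 interface verbose.",
    "exstart": EXSTART_ACTION,
    "exchange": EXSTART_ACTION,  # Handled the same way as ExStart.
    "loading": "Restart the OSPFv3 process with reset ospfv3 process.",
    "full": "No action is required.",
}


def next_action(state):
    """Map a neighbor state (or None for no neighbor) to the next action."""
    key = (state or "none").strip().lower()
    return NEXT_ACTION.get(
        key, "Proceed to step 2 and contact Technical Support."
    )
```

Calling next_action("Exchange") returns the same guidance as next_action("ExStart"), which mirrors the text above.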
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
OSPF issues
OSPF neighbor down
Symptom
· An OSPF neighbor goes down.
· OSPF neighbor flapping occurs.
Common causes
The following are the common causes of this type of issue:
· The BFD session is down, which indicates that BFD has detected a link failure.
· The remote device failed.
· The CPU usage is too high.
· Link failures occurred.
· The OSPF interface is not up.
· The IP addresses of the two ends are not on the same network segment.
· The OSPF settings of the two ends do not match.
¡ A Router ID conflict exists.
¡ The two ends are configured with different area types.
¡ The two ends are configured with different OSPF authentication modes.
¡ The neighboring OSPF interfaces use different timer settings.
¡ The neighboring OSPF interfaces are configured with different network types.
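One of the causes listed above, interface IP addresses on different network segments, can be checked offline once the addresses and masks of both ends are known. The following Python sketch uses the standard ipaddress module. The function name same_segment is hypothetical. The sketch also treats different mask lengths as a mismatch, which is an assumption that is stricter than necessary on point-to-point links.

```python
import ipaddress


def same_segment(local: str, remote: str) -> bool:
    """Compare two 'address/prefix-length' strings, e.g. '192.168.51.5/24'.

    Returns True only if both interfaces resolve to the same network,
    which requires the same network address and the same mask length.
    """
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(remote)
    return a.network == b.network
```

For example, same_segment("192.168.51.5/24", "192.168.51.1/24") returns True, and changing either mask or either network portion makes it return False.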
Troubleshooting flow
Figure 58 shows the troubleshooting flowchart.
Figure 58 Troubleshooting flowchart
Solution
1. Identify the reason for the OSPF neighbor down issue through the CLI.
Execute the display ospf event-log peer command. The Reason field in the command output displays the reason for the neighbor state change. Common options include:
¡ DeadExpired
The device had not received any Hello packet before the dead timer expired, and the OSPF neighbor state became Down. In this case, proceed to step 2.
¡ BFDDown
The OSPF neighbor state became Down, because the BFD session went down. In this case, proceed to step 2.
¡ IntVliChange or Virtual link was deleted or the route it relies on was deleted
The neighbor relationship became Down, because the virtual link or its dependency route was deleted. In this case, proceed to step 2.
¡ 1-Way
The OSPF neighbor state on the local device became Init, because the OSPF state on the remote device became Down and the remote device sent a 1-way Hello packet to the local device. In this case, perform troubleshooting on the remote device.
¡ IntPhyChange
The neighbor relationship became Down, because a related interface went down or its MTU changed. In this case, execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the running state and related information about the interface, and then troubleshoot the interface faults. For other situations, proceed to step 11.
2. Check for link failures.
Execute the ping command to check for link failures. If the related links operate correctly, proceed to step 3.
3. Identify whether the CPU usage is too high.
Execute the display cpu-usage command to identify whether the CPU usage of the device's MPU and interface modules is excessively high. High CPU usage prevents the normal transmission of OSPF packets, causing neighbor flapping. To resolve this issue, disable unnecessary functions. If the CPU usage is not high, proceed to step 4.
4. Identify whether the memory usage exceeds the memory usage threshold.
Execute the display memory-threshold command. If the Current free-memory state field in the output, which represents the current memory usage of the system, displays Minor, Severe, or Critical, it indicates that the remaining free memory is relatively insufficient. In this case, the device might be unable to send or receive OSPF packets, or might process OSPF packets slowly. To resolve this issue, disable unnecessary functions. If the Current free-memory state field displays Normal, proceed to step 5.
5. Identify whether the physical link state of each OSPF interface is UP.
Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to identify whether the physical link state of each OSPF interface is UP.
¡ If the physical link state of an OSPF interface is Down, you must recover that interface.
¡ If the physical link state of each OSPF interface is UP, execute the display ospf interface command to identify whether each OSPF interface is in a normal OSPF state.
- If an OSPF interface is in DOWN state, identify whether the network command was executed in the related OSPF process to advertise the network segment to which that interface belongs. If OSPF did not advertise the network segment, identify whether OSPF is enabled on the interface. If OSPF is enabled, troubleshoot the interface issue on the network layer.
- If the OSPF interfaces are in a normal state, including DR, BDR, DROther, and P-2-P, proceed to step 6.
6. Identify whether the IP addresses of the two ends are in the same network segment.
Execute the display interface brief command to view IP addresses of the two neighboring interfaces.
¡ If the two interface IP addresses are not in the same network segment, execute the ip address command on either of the interfaces to change its IP address. Make sure IP addresses of the two neighboring interfaces are in the same network segment.
¡ If the two interface IP addresses are in the same network segment, proceed to step 7.
7. Identify whether the related OSPF interfaces have the same MTU value.
If the ospf mtu-enable command was executed on the OSPF interfaces, those OSPF interfaces must add the same MTU value to DD packets. If this requirement is not met, the OSPF interfaces cannot establish an OSPF neighbor relationship. By default, the MTU value in DD packets is 0. Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the MTU information of each interface.
¡ If the interfaces have different MTU values, execute the mtu size command in interface view to configure the same MTU value for the interfaces.
¡ If the interfaces have the same MTU value, proceed to step 8.
8. Verify that the DR priority of each neighboring OSPF interface is not zero.
On a broadcast or NBMA network, to elect a DR correctly, make sure that a minimum of one OSPF interface has a non-zero DR priority. If the DR priority is 0 for both of the two neighboring OSPF interfaces, the highest neighbor states at both ends are 2-Way. Execute the display ospf interface command to view OSPF interface information. The Priority field in the command output displays the DR priority of an interface.
If the DR priority of each neighboring interface is not zero, proceed to step 9.
9. Identify whether an NBMA or P2MP unicast neighbor has been manually specified.
When the network type of an OSPF interface is NBMA or P2MP unicast, you must use the peer command to specify a neighbor by its IP address. Execute the display this command in interface view. If the network type of an interface is NBMA or P2MP unicast, execute the peer command to manually specify a neighbor by its IP address.
If an NBMA or P2MP unicast neighbor has been manually specified, proceed to step 10.
10. Identify whether the OSPF settings at the two ends are correct.
a. Execute the display ospf command to view the OSPF router IDs of the two ends. If the two ends use the same router ID, edit the configuration to avoid the conflict. If the two ends use different router IDs, proceed to the next step.
b. Execute the display ospf interface command to view the OSPF area IDs of the two ends. If the two ends use different area IDs, edit the configuration to ensure area ID consistency. If the two ends use the same area ID, proceed to the next step.
c. Execute the display ospf interface command to view the network types of interfaces at the two ends. If the two interfaces are configured with different network types, edit the configuration to ensure network type consistency. If the network type is PTP for one end and Broadcast for the other, the neighbor relationship can reach Full state, but routing information cannot be calculated.
If the two ends are configured with the same network type, proceed to the next step.
d. Execute the display ospf statistics error command every 10 seconds for 5 minutes to view OSPF error statistics. Pay attention to the following fields:
- Bad authentication type: If the value of this field keeps increasing, the two OSPF neighbors are configured with different OSPF authentication modes. To resolve this issue, configure the same authentication mode for them.
- Hello-time mismatch: If the value of this field keeps increasing, the two neighboring interfaces are configured with different hello intervals. To resolve this issue, configure the same hello interval for them.
- Dead-time mismatch: If the value of this field keeps increasing, the two neighboring interfaces are configured with different dead intervals. To resolve this issue, configure the same dead interval for them.
- Ebit option mismatch: If the value of this field keeps increasing, the two neighbors are of different OSPF area types, for example, one is in a normal area and the other in a stub area. To resolve this issue, configure the same area type for them.
If the issue persists, proceed to step 11.
11. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
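Step 10 d relies on spotting counters that keep increasing across repeated runs of display ospf statistics error. The following Python sketch shows one way to detect that trend. It assumes the counters from each run have already been collected into a dictionary that maps field names to integer values (parsing the command output is not shown). The function name growing_counters is hypothetical.

```python
# Field names taken from step 10 d.
WATCHED = (
    "Bad authentication type",
    "Hello-time mismatch",
    "Dead-time mismatch",
    "Ebit option mismatch",
)


def growing_counters(samples):
    """Return the watched fields whose value rose across the samples.

    samples: list of dicts (field name -> counter value), oldest first.
    A field is reported if it increased at least once and never dropped.
    """
    growing = []
    for field in WATCHED:
        values = [s.get(field, 0) for s in samples]
        pairs = list(zip(values, values[1:]))
        increased = any(later > earlier for earlier, later in pairs)
        never_dropped = all(later >= earlier for earlier, later in pairs)
        if increased and never_dropped:
            growing.append(field)
    return growing
```

With samples taken every 10 seconds, a result that contains "Hello-time mismatch" would point to different hello intervals on the two interfaces.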
Related alarm and log messages
Alarm messages
Module name: OSPF-TRAP-MIB
· ospfVirtIfStateChange (1.3.6.1.2.1.14.16.2.1)
· ospfNbrStateChange (1.3.6.1.2.1.14.16.2.2)
· ospfVirtNbrStateChange (1.3.6.1.2.1.14.16.2.3)
Log messages
· OSPF/5/OSPF_NBR_CHG
· OSPF/5/OSPF_NBR_CHG_REASON
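The Reason values described in step 1 of the solution map to a small number of follow-up actions. The following Python sketch captures that mapping for quick triage. The names REASON_TO_ACTION and triage are hypothetical, and the sketch assumes the Reason text matches the short keywords exactly, which might not hold for every software version.

```python
CHECK_LINK = "Proceed to step 2 (check for link failures)."

REASON_TO_ACTION = {
    "DeadExpired": CHECK_LINK,
    "BFDDown": CHECK_LINK,
    "IntVliChange": CHECK_LINK,
    "1-Way": "Troubleshoot the remote device.",
    "IntPhyChange": "Run display interface and troubleshoot the interface.",
}


def triage(reason: str) -> str:
    """Return the follow-up action for a neighbor state change reason."""
    return REASON_TO_ACTION.get(
        reason.strip(),
        "Proceed to step 11 (contact Technical Support).",
    )
```

Any reason that is not listed falls through to step 11, as the procedure states.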
OSPF neighbor unable to enter Full state
Symptom
The OSPF neighbor state machine involves neighbor states of Down, Init, 2-way, ExStart, Exchange, Loading, and Full. Among them, the stable states are Down, 2-way, and Full.
· Down: OSPF is not enabled.
· 2-way: The normal neighbor relationship between two DROther routers.
· Full: The local device and the neighbor are fully adjacent.
In networks using OSPF for route calculation and forwarding, only 2-way and Full are normal neighbor states. If the neighbor state is neither 2-way nor Full, it indicates an abnormal neighbor relationship.
Common causes
The following are the common causes of this type of issue:
· OSPF packets were dropped due to link failures.
· The DR priority of the neighboring interfaces is not appropriate.
· The two ends use different MTU values.
Troubleshooting flow
Figure 59 shows the troubleshooting flowchart.
Figure 59 Troubleshooting flowchart
Solution
1. Execute the display ospf peer command to view OSPF neighbor information, and perform different tasks based on the neighbor state.
¡ If no neighbor information exists:
The OSPF neighbor went down or flapped. See "OSPF neighbor down" to troubleshoot the issue.
¡ If the neighbor state remains Init:
The remote device cannot receive Hello packets from the local device. In this case, identify whether the link or the remote device has failed.
¡ If the neighbor state remains 2-Way:
Execute the display ospf interface verbose command to identify whether the DR priority of the neighbor-facing OSPF interface is zero.
- If the DR priority of the OSPF interface is zero, no action is required.
- If the DR priority of the OSPF interface is not zero, proceed to step 2.
¡ If the neighbor state remains ExStart:
The device is performing DD negotiation but cannot perform DD synchronization. The following are the common causes for this issue:
- The neighbor-facing interface cannot send and receive oversized packets correctly.
Execute the ping -s packet-size neighbor-address command multiple times with the packet-size argument set to 1500 or greater, and then check whether oversized packets are sent and received correctly. If the remote end cannot be pinged, troubleshoot the link issue first.
- The two ends use different MTU values.
If the ospf mtu-enable command was configured on the neighbor-facing OSPF interface, identify whether the two ends use the same MTU value. If they use different MTU values, configure the same MTU value for them.
If the issue persists, proceed to step 2.
¡ If the neighbor state remains Exchange:
The device is performing DD packet exchange. Troubleshoot this issue in the same way as when the neighbor state remains ExStart.
If the issue persists, proceed to step 2.
¡ If the neighbor state remains Loading:
Execute the reset ospf [ process-id ] process command to restart the OSPF process.
If the issue persists, proceed to step 2.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
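The MTU check for a neighbor that remains in ExStart can be modeled in a few lines. The following Python sketch is a simplified model, not the device's implementation. It assumes that an interface puts its own MTU in DD packets when ospf mtu-enable is configured and 0 otherwise, and it treats any difference between the two values as a mismatch. The function names are hypothetical.

```python
def dd_mtu_field(interface_mtu: int, mtu_enable: bool) -> int:
    """MTU value assumed to be carried in DD packets (0 by default)."""
    return interface_mtu if mtu_enable else 0


def dd_mtu_mismatch(local_mtu: int, local_enable: bool,
                    remote_mtu: int, remote_enable: bool) -> bool:
    """Return True if the two ends would advertise different MTU values."""
    local_field = dd_mtu_field(local_mtu, local_enable)
    remote_field = dd_mtu_field(remote_mtu, remote_enable)
    return local_field != remote_field
```

Under this model, two interfaces with different MTUs do not conflict while both use the default setting, because both advertise 0.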
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
OSPF device unable to learn partial OSPF routes
Symptom
An OSPF device fails to learn partial OSPF routes.
Common causes
The following are the common causes of this type of issue:
· The network type is P2P for one end and is Broadcast for the other end. Although the neighbor relationship is in Full state, the two ends cannot learn routes from each other.
· The OSPF process is configured with the filter-policy import command.
· The filter import command is configured in the local OSPF area.
· The filter export command is configured in other OSPF areas.
· The OSPF process is bound to a VPN instance. The tag of routes redistributed to the OSPF process is the same as that in the AS External LSA (Type-5 LSA) or NSSA External LSA (Type-7 LSA).
· The ABR is unreachable.
· The ABR does not take the Summary LSAs from non-backbone areas into account during route calculation.
· The ASBR is unreachable.
· In the AS External LSA or NSSA External LSA, the FA address is unreachable.
· The route to the FA address in the NSSA External LSA is not in the same area as the NSSA External LSA.
Troubleshooting flow
Figure 60 and Figure 61 show the troubleshooting flowcharts.
Figure 60 Troubleshooting flowchart 1
Figure 61 Troubleshooting flowchart 2
Solution
1. Identify whether the network type is P2P for one end and is Broadcast for the other end.
If yes, the neighbor relationship can reach Full state, but the two ends cannot learn routes from each other. To resolve this issue:
a. Execute the display ospf interface command to view the network types of the two neighboring OSPF interfaces.
<Sysname> display ospf interface
OSPF Process 1 with Router ID 5.5.5.5
Interfaces
Area: 0.0.0.1
IP Address Type State Cost Pri DR BDR
192.168.51.5 PTP P-2-P 1 1 0.0.0.0 0.0.0.0
b. If this issue exists, execute the ospf network-type command to configure the same network type for the two neighboring interfaces.
If this issue does not exist, proceed to step 2.
2. Check the OSPF routing table multiple times for OSPF route flapping.
Execute the display ip routing-table protocol ospf verbose command, and then identify flapping OSPF routes by the Age field in the command output.
¡ If the Age field of an OSPF route displays a small value, the OSPF route flaps, and you must troubleshoot the route flapping issue.
¡ If no route flapping issue is found, proceed to step 3.
<Sysname> display ip routing-table protocol ospf verbose
Summary count : 3
Destination: 192.168.12.0/24
Protocol: O_INTER
Process ID: 1
SubProtID: 0x2 Age: 12h53m09s
Cost: 2 Preference: 10
IpPre: N/A QosLocalID: N/A
Tag: 0 State: Active Adv
OrigTblID: 0x0 OrigVrf: default-vrf
TableID: 0x2 OrigAs: 0
NibID: 0x13000003 LastAs: 0
AttrID: 0xffffffff Neighbor: 0.0.0.0
Flags: 0x10041 OrigNextHop: 192.168.51.1
Label: NULL RealNextHop: 192.168.51.1
BkLabel: NULL BkNextHop: N/A
SRLabel: NULL Interface: GigabitEthernet2/0/2
BkSRLabel: NULL BkInterface: N/A
SIDIndex: NULL InLabel: NULL
Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/2
BkTunnel ID: Invalid BkIPInterface: N/A
FtnIndex: 0x0 ColorInterface: N/A
TrafficIndex: N/A BkColorInterface: N/A
Connector: 0.0.0.0 VpnPeerId: N/A
Dscp: N/A Exp: N/A
SRTunnelID: Invalid StatFlags: 0x0
SID Type: N/A SID: N/A
BkSID: N/A NID: Invalid
FlushNID: Invalid BkNID: Invalid
BkFlushNID: Invalid PathID: 0x0
CommBlockLen: 0
OrigLinkID: 0x0 RealLinkID: 0x0
Destination: 192.168.24.0/24
Protocol: O_INTER
Process ID: 1
SubProtID: 0x2 Age: 12h53m09s
Cost: 3 Preference: 10
IpPre: N/A QosLocalID: N/A
Tag: 0 State: Active Adv
OrigTblID: 0x0 OrigVrf: default-vrf
TableID: 0x2 OrigAs: 0
NibID: 0x13000003 LastAs: 0
AttrID: 0xffffffff Neighbor: 0.0.0.0
Flags: 0x10041 OrigNextHop: 192.168.51.1
Label: NULL RealNextHop: 192.168.51.1
BkLabel: NULL BkNextHop: N/A
SRLabel: NULL Interface: GigabitEthernet2/0/2
BkSRLabel: NULL BkInterface: N/A
SIDIndex: NULL InLabel: NULL
Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/2
BkTunnel ID: Invalid BkIPInterface: N/A
FtnIndex: 0x0 ColorInterface: N/A
TrafficIndex: N/A BkColorInterface: N/A
Connector: 0.0.0.0 VpnPeerId: N/A
Dscp: N/A Exp: N/A
SRTunnelID: Invalid StatFlags: 0x0
SID Type: N/A SID: N/A
BkSID: N/A NID: Invalid
FlushNID: Invalid BkNID: Invalid
BkFlushNID: Invalid PathID: 0x0
CommBlockLen: 0
OrigLinkID: 0x0 RealLinkID: 0x0
Destination: 192.168.51.0/24
Protocol: O_INTRA
Process ID: 1
SubProtID: 0x1 Age: 12h54m07s
Cost: 1 Preference: 10
IpPre: N/A QosLocalID: N/A
Tag: 0 State: Inactive Adv
OrigTblID: 0x0 OrigVrf: default-vrf
TableID: 0x2 OrigAs: 0
NibID: 0x13000001 LastAs: 0
AttrID: 0xffffffff Neighbor: 0.0.0.0
Flags: 0x10c1 OrigNextHop: 0.0.0.0
Label: NULL RealNextHop: 0.0.0.0
BkLabel: NULL BkNextHop: N/A
SRLabel: NULL Interface: GigabitEthernet2/0/2
BkSRLabel: NULL BkInterface: N/A
SIDIndex: NULL InLabel: NULL
Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/2
BkTunnel ID: Invalid BkIPInterface: N/A
FtnIndex: 0x0 ColorInterface: N/A
TrafficIndex: N/A BkColorInterface: N/A
Connector: 0.0.0.0 VpnPeerId: N/A
Dscp: N/A Exp: N/A
SRTunnelID: Invalid StatFlags: 0x0
SID Type: N/A SID: N/A
BkSID: N/A NID: Invalid
FlushNID: Invalid BkNID: Invalid
BkFlushNID: Invalid PathID: 0x0
CommBlockLen: 0
OrigLinkID: 0x0 RealLinkID: 0x0
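The Age check in step 2 can be automated when the command output is saved as text. The following Python sketch extracts each Destination and its Age and reports routes younger than a threshold. The Age format is inferred from the sample output above (for example, 12h53m09s), the optional day component is an assumption, and the 300-second threshold and the function name recent_routes are hypothetical.

```python
import re

_DEST = re.compile(r"Destination:\s*(\S+)")
_AGE = re.compile(r"Age:\s*(?:(\d+)d)?(?:(\d+)h)?(\d+)m(\d+)s")


def recent_routes(output: str, threshold_s: int = 300):
    """Return (destination, age_in_seconds) for routes below the threshold."""
    hits = []
    dest = None
    for line in output.splitlines():
        m = _DEST.search(line)
        if m:
            dest = m.group(1)  # Remember the route the next Age belongs to.
            continue
        m = _AGE.search(line)
        if m and dest is not None:
            d, h, mi, s = (int(g) if g else 0 for g in m.groups())
            age = ((d * 24 + h) * 60 + mi) * 60 + s
            if age < threshold_s:
                hits.append((dest, age))
    return hits
```

Running the function on several captures taken a few minutes apart helps distinguish a route that was learned recently from one that keeps flapping.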
3. Identify whether the filter-policy import command is configured in the OSPF process.
In scenarios where route filtering is configured, check for OSPF route filtering errors.
a. Execute the display this command in the related OSPF process on the local device to identify whether the filter-policy import command is configured in the OSPF process.
[Sysname-ospf-1] display this
#
ospf 1
import-route direct
filter-policy 2000 import
area 0.0.0.1
network 192.168.51.0 0.0.0.255
nssa
#
return
b. If the filter-policy import command is configured, identify whether the filtering rule specified by using this command is appropriate.
- If an ACL is used for route filtering, execute the display acl { acl-number | name acl-name } command to view its configuration details.
- If a prefix list is used for route filtering, execute the display ip prefix-list command to view its configuration details.
- If a routing policy is used for route filtering, execute the display route-policy command to view its configuration details.
If the desired routes are unexpectedly denied by the filtering rule, identify whether the filtering rule meets the requirements. If it is inappropriate, specify a new filtering rule by using the filter-policy import command.
c. If the desired routes are not denied by the filtering rule or the filter-policy import command is not configured in the OSPF process, proceed to step 4.
4. Identify whether the LSDB of the OSPF process contains LSAs that carry the OSPF routes that have not been learned.
Choose the appropriate troubleshooting method based on the type of OSPF routes that have not been learned in the OSPF process.
¡ Intra-area OSPF routes
If the OSPF process lacks intra-area routes, execute the display ospf [ process-id ] lsdb router command in user view to identify whether the LSDB of the OSPF process contains all the Router LSAs in the area.
<Sysname> display ospf 100 lsdb router
OSPF Process 100 with Router ID 5.5.5.5
Area: 0.0.0.1
Link State Database
Type : Router
LS ID : 5.5.5.5
Adv Rtr : 5.5.5.5
LS age : 7
Len : 36
Options : ASBR O NP
Seq# : 80000026
Checksum : 0x5f1f
Link Count: 1
Link ID: 192.168.51.1
Data : 192.168.51.5
Link Type: TransNet
Metric : 1
Type : Router
LS ID : 1.1.1.1
Adv Rtr : 1.1.1.1
LS age : 8
Len : 36
Options : ASBR ABR O NP
Seq# : 8000002a
Checksum : 0x534a
Link Count: 1
Link ID: 192.168.51.1
Data : 192.168.51.1
Link Type: TransNet
Metric : 1
- If the LSDB lacks some Router LSAs, proceed to step 7.
- If the LSDB contains all of the Router LSAs, but cannot calculate routing information, proceed to step 7.
¡ Inter-area OSPF routes
If the OSPF process lacks inter-area routes, execute the display ospf [ process-id ] lsdb summary command in user view to identify whether the LSDB of the OSPF process contains all the Network Summary LSAs from other areas.
<Sysname> display ospf lsdb summary
OSPF Process 1 with Router ID 5.5.5.5
Area: 0.0.0.1
Link State Database
Type : Sum-Net
LS ID : 192.168.24.0
Adv Rtr : 1.1.1.1
LS age : 576
Len : 28
Options : O NP
Seq# : 8000001f
Checksum : 0x4c25
Net Mask : 255.255.255.0
Tos 0 Metric: 2
Type : Sum-Net
LS ID : 192.168.12.0
Adv Rtr : 1.1.1.1
LS age : 576
Len : 28
Options : O NP
Seq# : 8000001f
Checksum : 0xc6b7
Net Mask : 255.255.255.0
Tos 0 Metric: 1
- If the LSDB lacks a Network Summary LSA, identify whether the filter import command is configured in the local OSPF area or the filter export command is configured in the OSPF area from which the missing Network Summary LSA was advertised. If the Network Summary LSA was unexpectedly filtered out by the filtering rule specified by using the filter import or filter export command, adjust the filtering rule to avoid this issue.
You can use the filter import and filter export commands to specify ACLs, prefix lists, or routing policies for route filtering. To view the configuration details of an ACL, prefix list, or routing policy, execute the display acl { acl-number | name acl-name }, display ip prefix-list, or display route-policy command as needed.
- If the LSDB contains all of the Network Summary LSAs, but cannot calculate routing information, proceed to step 7.
¡ O_ASE or O_NSSA routes
If the OSPF process lacks O_ASE routes, execute the display ospf [ process-id ] lsdb ase command in user view to identify whether the LSDB of the OSPF process contains AS External LSAs.
<Sysname> display ospf 100 lsdb ase
OSPF Process 100 with Router ID 1.1.1.1
Link State Database
Type : External
LS ID : 10.1.1.0
Adv Rtr : 1.1.1.1
LS age : 713
Len : 36
Options : O E
Seq# : 80000001
Checksum : 0x934b
Net Mask : 255.255.255.0
TOS 0 Metric: 1
E Type : 2
Forwarding Address : 192.168.51.5
Tag : 1
If the OSPF process lacks O_NSSA routes, execute the display ospf [ process-id ] lsdb nssa command in user view to identify whether the LSDB of the OSPF process contains NSSA External LSAs.
<Sysname> display ospf 100 lsdb nssa
OSPF Process 100 with Router ID 1.1.1.1
Area: 0.0.0.0
Link State Database
Area: 0.0.0.1
Link State Database
Type : NSSA
LS ID : 192.168.51.0
Adv Rtr : 5.5.5.5
LS age : 965
Len : 36
Options : O NP
Seq# : 8000001f
Checksum : 0x1dfa
Net Mask : 255.255.255.0
TOS 0 Metric: 1
E Type : 2
Forwarding Address : 192.168.51.5
Tag : 1
Type : NSSA
LS ID : 10.1.1.0
Adv Rtr : 5.5.5.5
LS age : 965
Len : 36
Options : O NP
Seq# : 8000001f
Checksum : 0x6840
Net Mask : 255.255.255.0
TOS 0 Metric: 1
E Type : 2
Forwarding Address : 192.168.51.5
Tag : 1
- If the LSDB lacks some AS External LSAs or NSSA External LSAs, proceed to step 7.
- If the LSDB contains all of the AS External LSAs or NSSA External LSAs, but cannot learn O_ASE or O_NSSA routes, proceed to step 7.
5. Identify whether the ABR is reachable.
Inter-area routes are advertised by the ABR. If the local device and the ABR cannot reach each other, the local device will not be able to learn inter-area routes.
a. Execute the display ospf [ process-id ] lsdb summary command on the local device, and then view the Adv Rtr field in the command output. This field displays the router ID of the ABR, which advertised the Network Summary LSA.
<Sysname> display ospf 100 lsdb summary
OSPF Process 100 with Router ID 5.5.5.5
Area: 0.0.0.1
Link State Database
Type : Sum-Net
LS ID : 192.168.12.0
Adv Rtr : 1.1.1.1
LS age : 913
Len : 28
Options : O E
Seq# : 80000001
Checksum : 0x5d45
Net Mask : 255.255.255.0
Tos 0 Metric: 1
b. Execute the display ospf abr-asbr command on the local device, and then view the Destination and RtType fields in the command output. If the RtType field displays ABR, the Destination field displays the router ID of the ABR. In this situation, the local device has a route to the ABR.
<Sysname> display ospf 100 abr-asbr
OSPF Process 100 with Router ID 5.5.5.5
Routing Table to ABR and ASBR
Type Destination Area Cost Nexthop RtType
Intra 1.1.1.1 0.0.0.1 1 192.168.51.1 ABR
c. If the output of the display ospf abr-asbr command does not include a route to the ABR, proceed to step 7.
d. If the output of the display ospf abr-asbr command includes a route to the ABR, and the local device is an ABR, identify whether the local OSPF area is a backbone area.
- If the OSPF area is not a backbone area (with a non-zero area ID), no action is required. According to RFC 2328, ABRs do not process the Network Summary LSAs received from non-backbone areas.
- If the OSPF area is a backbone area (with an area ID of 0), but it cannot learn inter-area routes, proceed to step 7.
e. If the output of the display ospf abr-asbr command includes a route to the ABR, and the OSPF process is bound to a VPN instance, identify whether the vpn-instance-capability simple command is configured in the OSPF process. If this command is configured, proceed to step 7.
If this command is not configured, troubleshoot this issue as described in Table 12.
Table 12 Troubleshooting methods
Condition | Troubleshooting method
The vpn-instance-capability simple command is not configured, and the Options field of the related Network Summary LSA contains the DN bit (the DN bit is set). | According to RFC 4576, private OSPF processes do not use Network Summary LSAs that contain the DN bit for route calculation. It is normal that the local device cannot learn inter-area routes.
The vpn-instance-capability simple command is not configured, and the Options field of the related Network Summary LSA does not contain the DN bit. | Proceed to step 7.
6. Identify whether the ASBR is reachable and whether loop prevention is enabled.
O_ASE routes and O_NSSA routes are advertised by the ASBR. If the local device and the ASBR cannot reach each other, the local device will not be able to learn routes from devices located in other ASs.
a. Execute the display ospf [ process-id ] lsdb [ ase | nssa ] command, and then view the Adv Rtr field in the command output. This field displays the router ID of the ASBR, which advertised the AS External LSA (Type-5) or NSSA External LSA (Type-7).
<Sysname> display ospf 100 lsdb ase
OSPF Process 100 with Router ID 1.1.1.1
Link State Database
Type : External
LS ID : 10.1.1.0
Adv Rtr : 1.1.1.1
LS age : 169
Len : 36
Options : O E
Seq# : 80000001
Checksum : 0x934b
Net Mask : 255.255.255.0
TOS 0 Metric: 1
E Type : 2
Forwarding Address : 192.168.51.5
Tag : 1
<Sysname> display ospf 100 lsdb nssa
OSPF Process 100 with Router ID 1.1.1.1
Area: 0.0.0.0
Link State Database
Area: 0.0.0.1
Link State Database
Type : NSSA
LS ID : 192.168.51.0
Adv Rtr : 5.5.5.5
LS age : 156
Len : 36
Options : O NP
Seq# : 80000001
Checksum : 0x59dc
Net Mask : 255.255.255.0
TOS 0 Metric: 1
E Type : 2
Forwarding Address : 192.168.51.5
Tag : 1
Type : NSSA
LS ID : 10.1.1.0
Adv Rtr : 5.5.5.5
LS age : 156
Len : 36
Options : O NP
Seq# : 80000001
Checksum : 0xa422
Net Mask : 255.255.255.0
TOS 0 Metric: 1
E Type : 2
Forwarding Address : 192.168.51.5
Tag : 1
b. Execute the display ospf abr-asbr command, and then view the Destination and RtType fields in the command output. If the RtType field displays ASBR, the Destination field displays the router ID of the ASBR. In this situation, the local device has a route to the ASBR.
<Sysname> display ospf 100 abr-asbr
OSPF Process 100 with Router ID 1.1.1.1
Routing Table to ABR and ASBR
Type Destination Area Cost Nexthop RtType
Intra 5.5.5.5 0.0.0.1 1 192.168.51.5 ASBR
c. If the output of the display ospf abr-asbr command does not include a route to the ASBR, proceed to step 7.
d. If the output of the display ospf abr-asbr command includes a route to the ASBR, and the Forwarding Address field of the LSA is not 0, check the reachability and route type of the forwarding address.
Execute the display ospf routing forwarding-address { mask-length | mask } command in user view to identify whether the local device has a route to the forwarding address.
<Sysname> display ospf 100 routing 192.168.51.5 24
OSPF Process 100 with Router ID 1.1.1.1
Routing Table
Routing for network
Destination Cost Type NextHop AdvRouter Area
192.168.51.0/24 1 Transit 0.0.0.0 5.5.5.5 0.0.0.1
Total nets: 1
Intra area: 1 Inter area: 0 ASE: 0 NSSA: 0
Troubleshoot this issue as described in Table 13.
Table 13 Troubleshooting methods
Whether the forwarding address is reachable | Troubleshooting method
Unreachable | If the display ospf routing forwarding-address { mask-length | mask } command does not display route information for the forwarding address, the forwarding address is unreachable. In this case, proceed to step 7.
Reachable | · If the missing external routes are advertised in an NSSA External LSA, compare the Area field in the command output with the area of that LSA. According to RFC 3101, the route to the forwarding address must belong to the same OSPF area as the NSSA External LSA. If the Area field displays a different area ID, the OSPF process does not use that NSSA External LSA for route calculation. It is normal that the OSPF process lacks the related external routes, and no action is required. · If the Type field displays Type1 or Type2 in the output of the display ospf routing forwarding-address { mask-length | mask } command, the route to the forwarding address is an external route. According to RFC 2328, if the route to a non-zero forwarding address is an external route, OSPF does not use the related LSA for route calculation. It is normal that the OSPF process lacks the related external routes, and no action is required.
e. If the output of the display ospf abr-asbr command includes a route to the ASBR, and the OSPF process is bound to a VPN instance, identify whether the vpn-instance-capability simple command is configured in the OSPF process.
If this command is configured, proceed to step 7.
If this command is not configured, troubleshoot this issue as described in Table 14.
Table 14 Troubleshooting methods
Whether the DN bit is set to 1 |
Troubleshooting method |
The vpn-instance-capability simple command is not configured, and the Option field of the related AS External LSA or NSSA External LSA contains the DN bit. |
According to RFC 4576, private OSPF processes do not use AS External LSAs or NSSA External LSAs that contain the DN bit for route calculation. It is normal that the local device cannot learn the related external routes. |
The vpn-instance-capability simple command is not configured, and the Option field of the related AS External LSA or NSSA External LSA does not contain the DN bit. |
Execute the display ospf command, and then view the Default ASE parameters field in the command output to identify whether the AS External LSA or NSSA External LSA has the same tag value as the private OSPF process: · If they use the same tag value, no action is required. According to RFC 4577, private OSPF processes do not use such LSAs for route calculation. Therefore, it is normal that the OSPF process does not have the related external routes. · If they use different tag values, proceed to step 7. |
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
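The decision rules in Table 13 and Table 14 can be summarized as a short Python sketch. It is an illustration only and is not part of the device software. The function names, parameter names, and returned strings are hypothetical, and the inputs are assumed to have been read manually from the display ospf routing and display ospf lsdb command outputs.

```python
from typing import Optional


def check_forwarding_address(route_type: Optional[str],
                             route_area: Optional[str],
                             lsa_is_nssa: bool,
                             lsa_area: Optional[str]) -> str:
    """Apply the Table 13 rules to a non-zero forwarding address.

    route_type:  Type field of the route to the forwarding address,
                 or None when the command displays no route.
    route_area:  Area field of that route.
    lsa_is_nssa: True when the missing route is carried in an NSSA External LSA.
    lsa_area:    Area of the NSSA External LSA (ignored for AS External LSAs).
    """
    if route_type is None:
        return "unreachable: proceed to step 7"
    if route_type in ("Type1", "Type2"):
        # The route to the forwarding address is itself an external route.
        return "expected: route to the forwarding address is external"
    if lsa_is_nssa and route_area != lsa_area:
        # The route and the NSSA External LSA belong to different areas.
        return "expected: forwarding address is in a different area"
    return "no Table 13 rule matched"


def check_vpn_lsa(simple_configured: bool, dn_bit_set: bool,
                  lsa_tag: int, process_tag: int) -> str:
    """Apply step e and Table 14 to an LSA in a VPN-bound OSPF process."""
    if simple_configured:
        return "proceed to step 7"
    if dn_bit_set:
        return "expected: DN bit set"
    if lsa_tag == process_tag:
        return "expected: same tag value"
    return "proceed to step 7"
```

A result that starts with "expected" corresponds to the table rows stating that no action is required.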
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Route flapping caused by an IP address conflict
Symptom
In an OSPF network, if different devices use the same interface IP address, OSPF route flapping occurs. When this issue occurs, the related devices typically show the following symptoms:
· The display cpu-usage command displays a high CPU usage.
· OSPF marks LSAs as stale frequently, and re-generates LSAs.
· The device refreshes routes frequently, and route calculations are incorrect.
Solution
In this troubleshooting example, the network diagram is as shown in Figure 62. The troubleshooting methods for other networks are similar to those for this network.
2. On each device in the OSPF network, execute the display ospf [ process-id ] lsdb command every second to view their OSPF LSDB information.
3. Check for abnormal LSA aging.
If abnormal LSA aging exists, you can find the following symptoms:
¡ On Device A, the Age field of a Network LSA (Type-2) remains at the minimum value, but its Sequence field increases rapidly. For example, in the following command output, the Age of Network LSA 172.168.0.1 (LinkStateID) does not naturally increase, and the Sequence field grows from 8000002D to 8000002F in a short time.
<Sysname> display ospf 100 lsdb
OSPF Process 100 with Router ID 10.1.1.1
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 3.3.3.3 3.3.3.3 797 48 80000009 0
Router 1.1.1.1 1.1.1.1 835 36 80000005 0
Router 4.4.4.4 4.4.4.4 798 36 80000004 0
Router 10.1.1.1 10.1.1.1 415 36 80000007 0
Router 2.2.2.2 2.2.2.2 415 48 80000015 0
Network 192.168.0.2 3.3.3.3 802 32 80000002 0
Network 172.168.0.3 4.4.4.4 791 32 80000002 0
Network 172.168.0.1 10.1.1.1 7 32 8000002D 0
<Sysname> display ospf 100 lsdb
OSPF Process 100 with Router ID 10.1.1.1
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 3.3.3.3 3.3.3.3 810 48 80000009 0
Router 1.1.1.1 1.1.1.1 848 36 80000005 0
Router 4.4.4.4 4.4.4.4 811 36 80000004 0
Router 10.1.1.1 10.1.1.1 428 36 80000007 0
Router 2.2.2.2 2.2.2.2 428 48 80000015 0
Network 192.168.0.2 3.3.3.3 815 32 80000002 0
Network 172.168.0.3 4.4.4.4 804 32 80000002 0
Network 172.168.0.1 10.1.1.1 4 32 8000002F 0
¡ On Device B, the Age field of the same Network LSA frequently switches between 3600 and other smaller values, and its Sequence field increases rapidly. For example, in the following command output, the Age of Network LSA 172.168.0.1 (LinkStateID) frequently switches between 3600 and other smaller values, and the Sequence field grows from 80000023 to 80000041 in a short time.
<Sysname> display ospf 100 lsdb
OSPF Process 100 with Router ID 2.2.2.2
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 3.3.3.3 3.3.3.3 708 48 80000009 0
Router 1.1.1.1 1.1.1.1 746 36 80000005 0
Router 4.4.4.4 4.4.4.4 709 36 80000004 0
Router 10.1.1.1 10.1.1.1 329 36 80000007 0
Router 2.2.2.2 2.2.2.2 327 48 80000015 0
Network 172.168.0.3 4.4.4.4 702 32 80000002 0
Network 192.168.0.2 3.3.3.3 713 32 80000002 0
Network 172.168.0.1 10.1.1.1 3600 32 80000023 0
<Sysname> display ospf 100 lsdb
OSPF Process 100 with Router ID 2.2.2.2
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 3.3.3.3 3.3.3.3 748 48 80000009 0
Router 1.1.1.1 1.1.1.1 786 36 80000005 0
Router 4.4.4.4 4.4.4.4 749 36 80000004 0
Router 10.1.1.1 10.1.1.1 369 36 80000007 0
Router 2.2.2.2 2.2.2.2 367 48 80000015 0
Network 172.168.0.3 4.4.4.4 742 32 80000002 0
Network 192.168.0.2 3.3.3.3 753 32 80000002 0
Network 172.168.0.1 10.1.1.1 7 32 80000041 0
¡ On Device C, the Age field of the same Network LSA remains at 3600 or the Network LSA occasionally disappears, and the Sequence field increases rapidly. For example, in the following command output, the Age of Network LSA 172.168.0.1 (LinkStateID) remains at 3600 or the Network LSA occasionally disappears. When the Network LSA exists, its Sequence field grows from 80000309 to 80000346 in a short time.
<Sysname> display ospf 100 lsdb
OSPF Process 100 with Router ID 3.3.3.3
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 3.3.3.3 3.3.3.3 740 48 8000000D 0
Router 4.4.4.4 4.4.4.4 759 36 80000008 0
Router 10.1.1.1 10.1.1.1 364 36 8000000B 0
Router 2.2.2.2 2.2.2.2 366 48 80000019 0
Network 172.168.0.3 4.4.4.4 755 32 80000006 0
Network 192.168.0.2 3.3.3.3 744 32 80000006 0
Network 172.168.0.1 10.1.1.1 3600 32 80000309 0
<Sysname> display ospf 100 lsdb
OSPF Process 100 with Router ID 3.3.3.3
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 3.3.3.3 3.3.3.3 745 48 8000000D 0
Router 4.4.4.4 4.4.4.4 764 36 80000008 0
Router 10.1.1.1 10.1.1.1 369 36 8000000B 0
Router 2.2.2.2 2.2.2.2 371 48 80000019 0
Network 172.168.0.3 4.4.4.4 760 32 80000006 0
Network 192.168.0.2 3.3.3.3 749 32 80000006 0
<Sysname> display ospf 100 lsdb
OSPF Process 100 with Router ID 3.3.3.3
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 3.3.3.3 3.3.3.3 1302 48 8000000D 0
Router 4.4.4.4 4.4.4.4 1321 36 80000008 0
Router 10.1.1.1 10.1.1.1 926 36 8000000B 0
Router 2.2.2.2 2.2.2.2 928 48 80000019 0
Network 172.168.0.3 4.4.4.4 1317 32 80000006 0
Network 192.168.0.2 3.3.3.3 1306 32 80000006 0
Network 172.168.0.1 10.1.1.1 3600 32 80000346 0
4. Check for OSPF route flapping.
On Device B, execute the display ospf [ process-id ] routing command every second to check for route flapping.
<Sysname> display ospf 100 routing
OSPF Process 100 with Router ID 2.2.2.2
Routing Table
Routing for network
Destination Cost Type NextHop AdvRouter Area
192.168.0.0/24 1 Transit 0.0.0.0 3.3.3.3 0.0.0.0
172.168.0.0/24 1 Transit 0.0.0.0 10.1.1.1 0.0.0.0
Total nets: 2
Intra area: 2 Inter area: 0 ASE: 0 NSSA: 0
<Sysname> display ospf 100 routing
OSPF Process 100 with Router ID 2.2.2.2
Routing Table
Routing for network
Destination Cost Type NextHop AdvRouter Area
192.168.0.0/24 1 Transit 0.0.0.0 3.3.3.3 0.0.0.0
172.168.0.0/24 2 Transit 192.168.0.2 4.4.4.4 0.0.0.0
Total nets: 2
Intra area: 2 Inter area: 0 ASE: 0 NSSA: 0
If OSPF route flapping occurs, and multiple executions of the display ospf peer command show that the neighbor relationship is not flapping, an IP address conflict exists in the OSPF network. Meanwhile, this indicates that one of the conflicting devices is a DR, because Network LSAs (Type-2) are generated by DRs.
If two Network LSAs with the same LinkState ID exist and they are aging abnormally on any device, both of the conflicting devices are DRs.
<Sysname> display ospf 100 lsdb
OSPF Process 100 with Router ID 10.1.1.1
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 3.3.3.3 3.3.3.3 367 48 80000021 0
Router 4.4.4.4 4.4.4.4 369 36 80000013 0
Router 10.1.1.1 10.1.1.1 477 36 80000012 0
Router 2.2.2.2 2.2.2.2 403 48 8000002B 0
Network 192.168.0.1 2.2.2.2 395 32 80000002 0
Network 172.168.0.1 3.3.3.3 3600 32 8000002B 0
Network 172.168.0.1 10.1.1.1 9 32 80000036 0
<Sysname> display ospf 100 lsdb
OSPF Process 100 with Router ID 10.1.1.1
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 3.3.3.3 3.3.3.3 460 48 80000021 0
Router 4.4.4.4 4.4.4.4 462 36 80000013 0
Router 10.1.1.1 10.1.1.1 570 36 80000012 0
Router 2.2.2.2 2.2.2.2 496 48 8000002B 0
Network 192.168.0.1 2.2.2.2 488 32 80000002 0
Network 172.168.0.1 3.3.3.3 3600 32 80000034 0
Network 172.168.0.1 10.1.1.1 6 32 80000041 0
5. Identify the conflicting devices.
You can use the output of the display ospf lsdb command to find the devices causing the IP address conflict.
If only one of the conflicting devices is a DR, perform the following task:
a. Check the AdvRouter field of the abnormal Network LSA to find the router ID of the advertising DR.
b. Check the LinkState ID field of the abnormal Network LSA to identify the interface that uses the conflicting IP address, and then find the IP address of the interface.
c. Based on the obtained interface address and the IP address plan, identify another conflicting device.
In this example, the DR with Router ID 10.1.1.1 has an interface IP address conflict with another device, and the conflicting IP address is 172.168.0.1. Based on the IP address plan, you can find the other device causing the conflict.
If both of the conflicting devices are DRs, perform the following task:
a. Check the AdvRouter field of each abnormal Network LSA to find the router IDs of the advertising DRs.
b. Check the LinkState ID field of each abnormal Network LSA to identify the interfaces causing the IP address conflict.
6. Change the IP address of a conflicting device, according to the network IP address plan.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
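The snapshot comparison in steps 3 through 5 can also be done offline on saved command output. The following Python sketch assumes that each LSA line of the display ospf lsdb output has the seven whitespace-separated fields shown in this example. The function names and the min_jump threshold of 2 are assumptions for illustration, not values defined by the device.

```python
from typing import Dict, List, Set, Tuple

LsaKey = Tuple[str, str, str]  # (Type, LinkState ID, AdvRouter)


def parse_lsdb(output: str) -> Dict[LsaKey, Tuple[int, int]]:
    """Map each Router/Network LSA to (Age, Sequence as a signed 32-bit int)."""
    entries = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[0] in ("Router", "Network"):
            lsa_type, ls_id, adv, age, _length, seq, _metric = fields
            seq_num = int(seq, 16)
            if seq_num >= 0x80000000:  # OSPF sequence numbers are signed
                seq_num -= 1 << 32
            entries[(lsa_type, ls_id, adv)] = (int(age), seq_num)
    return entries


def fast_growing(before, after, min_jump: int = 2) -> List[LsaKey]:
    """LSAs whose Sequence grew by min_jump or more between two snapshots."""
    return [key for key, (_age, seq) in after.items()
            if key in before and seq - before[key][1] >= min_jump]


def duplicate_network_ids(entries) -> Dict[str, Set[str]]:
    """Network LSA LinkState IDs that more than one router advertises."""
    owners: Dict[str, Set[str]] = {}
    for lsa_type, ls_id, adv in entries:
        if lsa_type == "Network":
            owners.setdefault(ls_id, set()).add(adv)
    return {ls_id: advs for ls_id, advs in owners.items() if len(advs) > 1}


# Lines copied from the Device A outputs in this example.
FIRST = """Router 10.1.1.1 10.1.1.1 415 36 80000007 0
Network 172.168.0.1 10.1.1.1 7 32 8000002D 0"""
SECOND = """Router 10.1.1.1 10.1.1.1 428 36 80000007 0
Network 172.168.0.1 10.1.1.1 4 32 8000002F 0"""
BOTH_DR = """Network 192.168.0.1 2.2.2.2 395 32 80000002 0
Network 172.168.0.1 3.3.3.3 3600 32 8000002B 0
Network 172.168.0.1 10.1.1.1 9 32 80000036 0"""

# Flags Network LSA 172.168.0.1 advertised by 10.1.1.1.
print(fast_growing(parse_lsdb(FIRST), parse_lsdb(SECOND)))
# Reports LinkState ID 172.168.0.1 with two advertising routers.
print(duplicate_network_ids(parse_lsdb(BOTH_DR)))
```

A flagged Network LSA gives the AdvRouter (the DR) and the LinkState ID (the conflicting interface address) used in step 5. The result is a hint only. Confirm it against the neighbor state and the IP address plan.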
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Equal-cost route issues
Some next hops of equal-cost routes do not participate in load sharing or the load sharing is uneven
Symptom
· Traffic is not distributed to one or multiple next hops of equal-cost routes. When you use the display counters rate outbound interface command to observe the packet transmission rate of related interfaces, you can find that the transmission rate is 0 on one or multiple output interfaces of the equal-cost routes.
· Traffic is load shared unevenly. When you use the display counters rate outbound interface command to observe the packet transmission rate of related interfaces, you can find that one or multiple output interfaces of the equal-cost routes have a noticeably lower transmission rate.
Common causes
The following are the common causes of this type of issue:
· The number of next hops exceeds the maximum number of next hops supported by the device.
· The routes that use the output interface are not configured or are not correctly flushed to the routing table.
· The physical link state and the data link layer state of the output interface are not up.
· The IP address of the output interface and that of the next hop interface are not in the same network segment.
· The device does not have an ARP or ND entry for the next hop.
· The load sharing mode is inappropriate.
· The hardware resources are insufficient.
· The last hop traversed by the traffic is configured with load sharing.
Troubleshooting flow
Figure 63 shows the troubleshooting flowchart.
Figure 63 Flowchart for troubleshooting equal-cost route issues
Solution
1. Identify whether the number of equal-cost routes with the same destination exceeds the maximum number of equal-cost routes supported by the device.
a. To view the maximum number of IPv4 equal-cost routes supported by the system, execute the display max-ecmp-num command. To view the maximum number of IPv6 equal-cost routes supported by the system, execute the display ipv6 max-ecmp-num command.
b. To view the number of IPv4 equal-cost routes destined for a specific address, execute the display ip routing-table ip-address longer-match command with the destination address specified. To view the number of IPv6 equal-cost routes destined for a specific address, execute the display ipv6 routing-table ipv6-address longer-match command with the destination address specified. In the command output, all routes with the same destination but different next hops are equal-cost routes. The number of those equal-cost routes equals Summary count minus count A. The Summary count argument represents the value of the Summary count field in the command output. The count A argument represents the number of routes whose mask length is different from that of the destination address.
- If the number of equal-cost routes with the same destination reaches the upper limit, the excess equal-cost routes will not be flushed to the routing table. To edit the next hop of an equal-cost route in the routing table, delete the equal-cost route, and then configure a new equal-cost route.
- If the number of equal-cost routes with the same destination is lower than the upper limit, proceed to step 2.
2. Identify whether the equal-cost routes have been correctly flushed to the routing table.
Execute one of the following commands as needed to view the routes with the related destination:
¡ display ip routing-table [ vpn-instance vpn-instance-name ] ip-address [ mask-length | mask ] [ longer-match ] verbose
¡ display ipv6 routing-table [ vpn-instance vpn-instance-name ] ipv6-address [ prefix-length ] [ longer-match ] [ verbose ]
If the routing table does not contain the equal-cost route with the desired next hop and output interface, check for route configuration errors. If the route configuration is correct, proceed to step 3.
3. Identify whether the physical link state and the data link layer state of the output interface are up.
Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] or display ipv6 interface [ interface-type [ interface-number ] ] [ brief ] command to view states of the output interface.
¡ If the interface is not up at the physical layer or data link layer, resolve the interface or link failure.
¡ If the physical link state and the data link layer state of the output interface are up, proceed to step 4.
4. Identify whether the IP address of the output interface and that of the next hop interface are in the same network segment.
Execute the display interface brief or display ipv6 interface brief command on both the local device and the next hop device to view IP addresses of the interfaces that connect the two devices.
¡ If IP addresses of the two interfaces are not in the same network segment, execute the ip address or ipv6 address command in interface view for either of the interfaces to adjust its IP address. This operation ensures that IP addresses of the two interfaces are in the same network segment.
¡ If IP addresses of the two interfaces are in the same network segment, proceed to step 5.
5. Identify whether an ARP or ND entry for the related next hop exists on the device.
To view ARP entries, execute the display arp command. To view ND entries, execute the display ipv6 neighbors command. If the device does not have an ARP or ND entry for the next hop, resolve this issue first. If the device has an ARP or ND entry for the next hop, proceed to step 6.
6. Identify whether the load sharing mode is appropriate.
¡ If the load sharing mode is inappropriate, determine the load sharing factors based on the packets to be load shared, and then execute the ip load-sharing mode command to adjust the load sharing mode. For example, if the packets with the same destination address carry different source IP addresses, IP protocol numbers, and destination port numbers, you can add these fields into the ip load-sharing mode command. If the issue persists after the load sharing mode is fully adjusted, proceed to step 7.
¡ If the load sharing mode is appropriate, proceed to step 7.
7. Check for hardware resource insufficiency.
¡ To view the IPv4 FIB entries that failed to be issued to the driver, use the display system internal fib prefix [ vpn-instance vpn-instance-name ] entry-status f command.
¡ To view the IPv6 FIB entries that failed to be issued to the driver, use the display system internal ipv6 fib prefix [ vpn-instance vpn-instance-name ] entry-status f command.
The device is suffering from hardware resource insufficiency as long as the command output displays FIB entry information. To resolve this issue, disable unnecessary features to lower the hardware resource usage. If the issue persists, proceed to step 8.
8. Identify whether the last hop traversed by the traffic is configured with load sharing.
A device configured with load sharing might forward traffic to the local device. In this situation, affected by the load sharing algorithm, the traffic might be unevenly distributed when the local device transmits the traffic to next hop devices. This is normal and no action is required. Identify whether the traffic from devices that are not configured with load sharing is also unevenly distributed by the local device. If such an issue exists, proceed to step 9.
9. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
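Two checks in this procedure are simple arithmetic: the equal-cost route count in step 1 (Summary count minus count A) and the same-network-segment test in step 4. The following Python sketch shows both. The function names and the sample addresses in the usage note are hypothetical, and the inputs are assumed to have been copied manually from the display command outputs.

```python
import ipaddress
from typing import List


def ecmp_route_count(destination: str, listed_routes: List[str]) -> int:
    """Return Summary count minus count A for one destination.

    destination:   prefix of interest, for example "10.1.1.0/24".
    listed_routes: Destination/Mask value of every route line in the
                   longer-match output, one item per route.
    """
    dest_len = ipaddress.ip_network(destination, strict=False).prefixlen
    summary_count = len(listed_routes)
    # count A: routes whose mask length differs from that of the destination.
    count_a = sum(
        1 for route in listed_routes
        if ipaddress.ip_network(route, strict=False).prefixlen != dest_len
    )
    return summary_count - count_a


def same_segment(local_if: str, peer_if: str) -> bool:
    """True when two interface addresses (address/length) share a network."""
    local_net = ipaddress.ip_interface(local_if).network
    peer_net = ipaddress.ip_interface(peer_if).network
    return local_net == peer_net
```

For example, ecmp_route_count("10.1.1.0/24", ["10.1.1.0/24", "10.1.1.0/24", "10.1.1.0/25"]) returns 2, because one of the three listed routes has a different mask length. Compare the result with the value shown by the display max-ecmp-num command.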
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting RIR
The highest-priority link not selected for traffic forwarding
Symptom
In priority-based link selection, RIR does not select the link with the highest priority (Tunnel 1) for service traffic. Instead, it selects the link with the second highest priority (Tunnel 2) for traffic forwarding.
Common causes
The following are the common causes of this type of issue:
· The route associated with the highest-priority link is unreachable.
· No ECMP routes are available to the destination IP address of service traffic.
· The bandwidth usage of the highest-priority link exceeds the specified lower threshold.
· The quality of the highest-priority link does not meet the requirements.
Analysis
Figure 64 shows the troubleshooting flowchart.
Solution
1. Identify whether the route associated with the highest-priority link (Tunnel 1) is reachable.
2. Execute the display tunnel flow-statistics command to view the path information (Interface) selected for the service traffic, namely Tunnel 2.
<Sysname> display tunnel flow-statistics flow 100
Flow 100:
Interface Out pps Out bps
Tunnel2 20 9600
3. Execute the display ip fast-forwarding cache command to view the 5-tuple information of the service traffic whose output interface is Tunnel 2, and obtain the destination IP address of the service traffic.
<Sysname> display ip fast-forwarding cache
Total number of fast-forwarding entries: 1
SIP SPort DIP DPort Pro Input_If Output_If Flg
7.0.0.13 68 8.0.0.1 67 17 GE2/0/3 Tunnel2 5
4. Execute the display fib command to identify whether ECMP routes are available to the destination IP address and whether the ECMP routes include Tunnel 1.
¡ If no ECMP routes are available, check and edit the route configuration to ensure that ECMP routes are available to the destination IP address and include Tunnel 1. Tunnel 1 can participate in RIR link selection only when these conditions are met.
¡ If the ECMP routes exist, proceed to step 5.
<Sysname> display fib
Route destination count: 5
Directly-connected host count: 0
Flag:
U:Useable G:Gateway H:Host B:Blackhole D:Dynamic S:Static
R:Relay F:FRR
Destination/Mask Nexthop Flag OutInterface/Token Label
8.0.0.1/32 127.0.0.1 UH Tunnel1 Null
8.0.0.1/32 127.0.0.1 UH Tunnel2 Null
5. Examine the bandwidth usage of the highest-priority link Tunnel 1.
6. Identify the bandwidth threshold. Identify whether the flow priority-based-schedule enable command is configured in RIR-SDWAN view. If the command is not configured, the lower bandwidth usage threshold is 80%. If the command is configured, the lower bandwidth usage threshold is specified by the flow priority-based-schedule bandwidth-threshold command (20% by default).
7. Execute the display rir sdwan bandwidth tunnel command to identify whether the bandwidth usage of Tunnel 1 exceeds the lower bandwidth threshold.
¡ If the lower bandwidth threshold is not exceeded, the bandwidth meets the link selection criteria, and you can proceed to step 8.
¡ If the lower bandwidth threshold is exceeded, the bandwidth does not meet the link selection criteria. Verify that the bandwidth of Tunnel 1 matches the bandwidth of the tunnel's physical output interface. If they do not match, edit the bandwidth of Tunnel 1 with the bandwidth command. If they match, it is normal for RIR to select Tunnel 2, because the bandwidth usage of Tunnel 1 does not meet the link selection criteria and the device schedules service traffic to the lower-priority link Tunnel 2.
<Sysname> display rir sdwan bandwidth tunnel 1
Tunnel bandwidth info:
Interface Total bandwidth Remaining bandwidth Bandwidth usage
Tunnel1 200 kbps 200 kbps 0 %
Output interface bandwidth info:
PeerTTE: SiteID=1 DeviceID=2 IfID=2
Interface Total bandwidth Remaining bandwidth Bandwidth usage
GE2/0/1 200 kbps 200 kbps 0 %
8. Examine the quality of the highest-priority link Tunnel 1.
9. Execute the display rir sdwan flow command to identify whether the CQI value for Tunnel 1 is 100.
¡ If the CQI value is 100, proceed to step 13.
¡ If the CQI value is below 100, proceed to step 10.
<Sysname> display rir sdwan flow 1
Flow ID: 1
Session expected bandwidth: 2000 kbps
Quality policy: Yes
Tunnels with different preference values:
Preference: 8
Tunnel1
Site ID Device ID Interface ID CQI
100 1 100 80
100 2 110 90
10. Execute the display rir sdwan link-quality command to view the packet loss ratio (PktLoss (per mill)), delay (Delay (msec)), and jitter (Jitter (msec)) of Tunnel 1.
<Sysname> display rir sdwan link-quality
Tunnel1
Interface ID=1
Peer TTE: Site ID=1 Device ID=2 Interface ID=3
Connectivity: Connected
PktLoss (per mill): 0
Delay (msec) : 0
Jitter (msec) : 0
11. Compare the packet loss ratio, delay, and jitter of Tunnel 1 with the packet loss threshold (configured with the packet-loss threshold command), delay threshold (configured with the delay threshold command), and jitter threshold (configured with the jitter threshold command) in RIR-SDWAN view. If any of the thresholds is exceeded, the quality of Tunnel 1 does not meet the link selection criteria, and it is normal for RIR to select Tunnel 2. The device then schedules service traffic to Tunnel 2, whose link quality meets the link selection criteria.
12. To quickly restore the CQI value of Tunnel 1 to 100, you can increase the packet loss threshold (packet-loss threshold command), delay threshold (delay threshold command), and jitter threshold (jitter threshold command) in RIR-SDWAN view so that the quality of Tunnel 1 meets the link selection criteria.
13. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
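The bandwidth and link quality comparisons in this procedure can be expressed as a short Python sketch. It only restates the rules in the text: the 80% and 20% values come from the bandwidth threshold step, while the class name, function names, and the sample threshold values in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkQuality:
    pkt_loss_per_mill: int  # PktLoss (per mill)
    delay_msec: int         # Delay (msec)
    jitter_msec: int        # Jitter (msec)


def lower_bandwidth_threshold(priority_schedule_enabled: bool,
                              configured_percent: int = 20) -> int:
    """Lower bandwidth usage threshold in percent, as described in the text."""
    return configured_percent if priority_schedule_enabled else 80


def bandwidth_ok(usage_percent: float, threshold_percent: float) -> bool:
    """True when the usage does not exceed the lower threshold."""
    return usage_percent <= threshold_percent


def exceeded_thresholds(measured: LinkQuality, limits: LinkQuality) -> List[str]:
    """Names of the quality metrics whose measured value exceeds its limit."""
    checks = [
        ("packet loss", measured.pkt_loss_per_mill, limits.pkt_loss_per_mill),
        ("delay", measured.delay_msec, limits.delay_msec),
        ("jitter", measured.jitter_msec, limits.jitter_msec),
    ]
    return [name for name, value, limit in checks if value > limit]
```

For example, with hypothetical limits LinkQuality(10, 50, 20), a measurement of LinkQuality(0, 80, 5) exceeds only the delay threshold, so it is normal for RIR to select a lower-priority link in that case.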
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting multicast issues
MSDP issues
(S, G) entry creation failure
Symptom
The receiver-side MSDP peer fails to create (S, G) entries.
Common causes
The following are the common causes of this type of issue:
· The receiver-side MSDP peer fails to establish an MSDP peer relationship with the source-side MSDP peer.
· The receiver-side MSDP peer is not enabled with the SA message cache mechanism.
· The receiver-side MSDP peer does not receive SA messages from the source-side MSDP peer.
· The source-side MSDP peer is not created on the RP.
· Configuration errors exist, such as incorrect SA incoming policy, SA outgoing policy, or SA message creation policy.
Troubleshooting flow
Figure 65 shows the troubleshooting flowchart.
Figure 65 Flowchart for troubleshooting (S, G) entry creation failure
Solution
1. Verify that the receiver-side MSDP peer has successfully established an MSDP peer relationship with the source-side MSDP peer.
Execute the display msdp brief command on the receiver-side MSDP peer, and check the State field. If the State field is Established, an MSDP peer relationship has been established successfully.
¡ If the State field is not Established, verify that the interface used to establish a TCP connection with the source-side MSDP peer is correct and that the MSDP peers can ping each other. If the MSDP peers cannot ping each other, troubleshoot the ping failure as described in "Ping failure."
¡ If an MSDP peer relationship has been established successfully, proceed to the next step.
2. Verify that the receiver-side MSDP peer is enabled with the SA message cache mechanism.
Execute the display this command in MSDP view on the receiver-side MSDP peer to identify whether the SA message cache mechanism is enabled.
¡ If no, execute the cache-sa-enable command.
¡ If yes, proceed to the next step.
3. Identify whether the receiver-side MSDP peer receives SA messages from the source-side MSDP peer.
Execute the display msdp sa-cache command to display (S, G) entries in the SA cache. Identify whether SA messages from the source-side MSDP peer are received by examining the (S, G) entries.
¡ If no, proceed to step 4.
¡ If yes, proceed to step 8.
4. Identify whether the source-side MSDP peer is configured with an SA outgoing policy.
Execute the display this command on the source-side MSDP peer to identify whether an SA outgoing policy is configured.
¡ If yes, perform one of the following tasks depending on whether an ACL is specified:
- If no ACL is specified, the source-side MSDP peer discards all SA messages. Use the undo peer sa-policy export command to delete the SA outgoing policy.
- If an ACL is specified, the source-side MSDP peer forwards only SA messages that the ACL permits. Verify that the SA messages are permitted by the specified ACL. If the SA messages are not permitted, use the undo peer sa-policy export command to delete the SA outgoing policy or modify the ACL.
¡ If no, proceed to the next step.
5. Identify whether the receiver-side MSDP peer is configured with an SA incoming policy.
Execute the display this command on the receiver-side MSDP peer to identify whether an SA incoming policy is configured.
¡ If yes, perform one of the following tasks depending on whether an ACL is specified:
- If no ACL is specified, the receiver-side MSDP peer discards all SA messages. Use the undo peer sa-policy import command to delete the SA incoming policy.
- If an ACL is specified, the receiver-side MSDP peer receives only SA messages that the ACL permits. Verify that the SA messages are permitted by the specified ACL. If the SA messages are not permitted, use the undo peer sa-policy import command to delete the SA incoming policy or modify the ACL.
¡ If no, proceed to the next step.
6. Verify that the source-side MSDP peer is the RP.
Execute the display pim routing-table command on the source-side MSDP peer, and check the Flag field. If the Flag field is 2MSDP, the source-side MSDP peer is the RP.
¡ If the source-side MSDP peer is not the RP, modify the RP configuration or modify the configuration on the receiver-side MSDP peer.
¡ If the source-side MSDP peer is the RP, proceed to the next step.
7. Identify whether the source-side MSDP peer is configured with an SA message creation policy.
Execute the display this command on the source-side MSDP peer to identify whether an SA message creation policy is configured.
¡ If yes, perform one of the following tasks depending on whether an ACL is specified:
- If no ACL is specified, the source-side MSDP peer does not advertise any (S, G) entries when creating SA messages. Use the undo import-source command to delete the SA message creation policy.
- If an ACL is specified, the source-side MSDP peer advertises only the (S, G) entries that the ACL permits. Verify that the (S, G) entries are permitted by the specified ACL. If the (S, G) entries are not permitted, use the undo import-source command to delete the SA message creation policy or modify the ACL.
¡ If no, proceed to the next step.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
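Steps 4, 5, and 7 apply the same rule to the SA outgoing policy, the SA incoming policy, and the SA message creation policy. The following Python sketch restates that rule. It is a simplification: the ACL is modeled as a plain set of permitted (S, G) pairs, which is an assumption for illustration and not how ACL rules are matched on the device. The function name and sample addresses are hypothetical.

```python
from typing import Optional, Set, Tuple

SG = Tuple[str, str]  # (source address, group address)


def sa_entry_passes(policy_configured: bool,
                    permitted: Optional[Set[SG]],
                    entry: SG) -> bool:
    """Whether an (S, G) entry passes one SA policy.

    policy_configured: True when the policy exists in MSDP view.
    permitted:         (S, G) pairs permitted by the ACL, or None when the
                       policy is configured without an ACL.
    """
    if not policy_configured:
        return True    # No policy: nothing is filtered.
    if permitted is None:
        return False   # Policy without an ACL: everything is filtered.
    return entry in permitted  # Policy with an ACL: only permitted entries pass.


entry = ("10.1.1.1", "225.1.1.1")
print(sa_entry_passes(False, None, entry))        # True
print(sa_entry_passes(True, None, entry))         # False
print(sa_entry_passes(True, {entry}, entry))      # True
```

An (S, G) entry reaches the receiver-side SA cache only if it passes all three policies in turn.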
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
MVPN issues
Default MDT establishment failure
Symptom
The PEs cannot establish a default MDT or establish PIM neighbor relationships in the same VPN.
Common causes
The following are the common causes of this type of issue:
· Incorrect MTI configuration. To establish a default MDT in a VPN instance, you must specify a default group and an MVPN source interface with a valid IP address for the MTI for that VPN instance on each PE.
· Incorrect default group configuration. You must specify the same default group for the same VPN instance across PEs. A default group uniquely identifies a default MDT. The PEs cannot establish a default MDT for a VPN instance if they have different default groups for that VPN instance.
· Incorrect PIM configuration. To correctly establish a default MDT for a VPN instance, you must enable the same PIM mode on all interfaces in that VPN instance across PEs and all interfaces on the P devices. PIM mode consistency ensures establishment of PIM neighbor relationships between PEs in the same VPN instance for a successful default MDT establishment.
· Absence of unicast routes or BGP peers. PIM can only obtain routing information correctly if both unicast routes and BGP peers are configured.
· PIM disabled on the MTI for the VPN instance. This prevents PEs from establishing PIM neighbor relationships in the same VPN instance. To enable PIM on the MTI for a VPN instance, you must enable PIM on a minimum of one interface in the VPN instance.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 66.
Figure 66 Troubleshooting flowchart
Solution
1. Execute the display interface command to check the state and address encapsulation information of MTIs.
2. Execute the display multicast-vpn default-group command to verify that different PEs have the same default group for the same VPN instance.
3. Execute the display pim interface verbose command on each device. Verify that PIM is enabled on a minimum of one interface in the VPN instance on each PE. Ensure that the same PIM mode is enabled on the interfaces in the same VPN instance across PEs and all interfaces on the P devices.
4. Execute the display ip routing-table command to verify that the local PE has unicast route entries to the remote PE in the same VPN instance.
5. Execute the display bgp peer command to verify that a BGP peer relationship has been established between the PEs.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
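The default group check in step 2 amounts to comparing the value collected from each PE per VPN instance. A minimal Python sketch of that comparison follows. The function name, PE names, VPN instance names, and group addresses are hypothetical, and the input is assumed to have been gathered manually from the display multicast-vpn default-group output on each PE.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def mismatched_default_groups(
        configs: Dict[Tuple[str, str], str]) -> Dict[str, Set[str]]:
    """Return the VPN instances whose PEs use different default groups.

    configs maps (PE name, VPN instance name) to the default group address.
    """
    groups: Dict[str, Set[str]] = defaultdict(set)
    for (_pe, vpn), group in configs.items():
        groups[vpn].add(group)
    # A VPN instance is consistent only when all PEs report one group.
    return {vpn: seen for vpn, seen in groups.items() if len(seen) > 1}


collected = {
    ("PE1", "vpna"): "239.1.1.1",
    ("PE2", "vpna"): "239.1.1.1",
    ("PE1", "vpnb"): "239.2.2.2",
    ("PE2", "vpnb"): "239.2.2.3",
}
print(mismatched_default_groups(collected))
```

In this sample, only vpnb is reported, because its two PEs use different default groups and therefore cannot establish a default MDT for that VPN instance.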
Multicast routing table incorrectly built for a VPN instance
Symptom
The device cannot correctly build a multicast routing table for a VPN instance.
Common causes
The following are the common causes of this type of issue:
· The VPN instance or public instance does not have bootstrap router (BSR) information. To build a multicast routing table correctly for a VPN instance enabled with PIM-SM, both the VPN instance and the public instance must have the BSR information for that VPN instance.
· The VPN instance or public instance lacks RP information. To build a multicast routing table correctly for a VPN instance enabled with PIM-SM, both the VPN instance and the public instance must have the RP information for the VPN instance and routes to that RP. In addition, the devices in the public instance and VPN instance must correctly establish PIM neighbor relationships.
· No active routes are available between the DRs and the RP for the private network. The DRs in the private network must have routes to their RP, and the VPN instance for the private network must have routes to the multicast source.
Troubleshooting flow
Figure 67 shows the troubleshooting flowchart.
Solution
1. Use the display pim bsr-info command to verify that the public and VPN instances have BSR information. If BSR information is absent, check for unicast routes to the BSR.
2. Use the display pim rp-info command to verify that the RP information is correct. If no RP information is displayed, verify the presence of a unicast route to the RP. Additionally, execute the display pim neighbor command to verify that PIM neighbor relationships have been correctly established on both public and private networks.
3. Use the ping command to check connectivity between the DR and RP on the private network, and between the receivers and the multicast source.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
PIM issues
PIM neighbor establishment failure
Symptom
The PIM neighbor relationship fails to be established.
Common causes
The following are the common causes of this type of issue:
· The physical state of the interface is down.
· The primary IP address is not configured on the interface.
· The PIM function on the interface does not take effect.
· The interface is not enabled with PIM.
· The PIM-related configuration on the interface is incorrect.
Troubleshooting flow
Figure 68 shows the troubleshooting flowchart.
Figure 68 Flowchart for troubleshooting PIM neighbor establishment failure
Solution
1. Verify that the physical state of the interface is up.
Execute the display interface interface-type interface-number command, and check the Current state field for the physical state of the interface.
¡ If the physical state is up, proceed to the next step.
¡ If the physical state is down, troubleshoot the interface down issue.
2. Verify that the interface is configured with a primary IP address.
Execute the display this command on the interface, and check for the primary IP address.
¡ If the primary IP address is not configured, use the ip address command to configure it.
¡ If the primary IP address is configured, proceed to the next step.
3. Verify that the interface is enabled with PIM.
Execute the display current-configuration interface command to identify whether PIM is enabled on the interface.
¡ If PIM is not enabled, execute the pim dm or pim sm command on the interface.
¡ If PIM is enabled, proceed to the next step.
4. Verify that PIM has taken effect on the interface.
Execute the display pim interface command. If PIM information exists for the interface, PIM has taken effect on the interface.
¡ If PIM has not taken effect, execute the display current-configuration | include multicast command to identify whether IP multicast routing has been enabled.
- If IP multicast routing has not been enabled, execute the multicast routing command to enable it.
- If IP multicast routing has been enabled, proceed to the next step.
¡ If PIM has taken effect, proceed to the next step.
5. Verify that the PIM-related configuration on the interface is correct.
The following are the common configuration errors that can cause the PIM neighbor relationship to fail to be established:
¡ The IP addresses of the directly connected interfaces are not on the same network segment.
¡ A PIM hello policy is configured on the interface by using the pim neighbor-policy command, but the neighbor’s IP address is not permitted by the specified ACL and PIM hello messages from the neighbor are dropped. Identify whether a PIM hello policy is required.
- If yes, modify the ACL so that the IP address of the PIM neighbor can be permitted by the ACL.
- If no, execute the undo pim neighbor-policy command to delete it.
¡ The pim require-genid command is executed on the interface to drop the hello messages without generation ID options, and hello messages from the neighbor do not carry generation ID options. Identify whether hello messages without generation ID options must be dropped.
- If yes, proceed to the next step.
- If no, execute the undo pim require-genid command.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
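The three configuration checks in step 5 can be modeled as a small decision function. This is a simplified sketch, not the device's implementation. It assumes the hello policy ACL can be represented as a plain list of permitted networks. The function name and parameters are hypothetical.

```python
import ipaddress


def hello_drop_reason(local_if, neighbor_ip, permitted_nets=None,
                      require_genid=False, hello_has_genid=True):
    """Return why a PIM hello would be ignored, or None if no listed cause applies.

    local_if: local interface address with mask, for example "10.1.1.1/24".
    neighbor_ip: source address of the neighbor's hello message.
    permitted_nets: networks permitted by the hello policy ACL (None = no policy).
    require_genid: True if hellos without a generation ID option are dropped.
    hello_has_genid: True if the neighbor's hello carries a generation ID option.
    """
    local = ipaddress.ip_interface(local_if)
    neighbor = ipaddress.ip_address(neighbor_ip)
    # Check 1: directly connected interfaces must share a network segment.
    if neighbor not in local.network:
        return "neighbor is not on the same network segment"
    # Check 2: a hello policy, if present, must permit the neighbor address.
    if permitted_nets is not None and not any(
        neighbor in ipaddress.ip_network(net) for net in permitted_nets
    ):
        return "neighbor is not permitted by the hello policy ACL"
    # Check 3: the generation ID requirement must be met.
    if require_genid and not hello_has_genid:
        return "hello lacks a generation ID option"
    return None


print(hello_drop_reason("10.1.1.1/24", "10.1.2.2"))
```

The checks run in the same order as the configuration errors listed in step 5, so the first reason returned indicates which item to correct first.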
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Layer 3 multicast traffic forwarding failure within a PIM domain
Symptom
After IP multicast routing is enabled, Layer 3 multicast traffic fails to be forwarded within the same PIM domain.
Common causes
The following are the common causes of this type of issue:
· An interface for forwarding multicast data is not enabled with PIM.
· The PIM function on an interface does not take effect.
· The PIM neighbor relationship fails to be established.
· The interface connected to the hosts is not enabled with IGMP.
· In a PIM-SM or BIDIR-PIM network, the RP is not configured or the RP information is incorrect.
· No RPF route to the RP or multicast source exists.
· An interface for forwarding multicast data has been configured with a multicast forwarding boundary.
· In a PIM-SM or BIDIR-PIM network, an incorrect multicast source policy is configured.
· No multicast entry is generated.
Troubleshooting flow
Figure 69 shows the troubleshooting flowchart.
Solution
1. Verify that the interface for forwarding multicast data is enabled with PIM.
Execute the display this command on the interface for forwarding multicast data, and identify whether the PIM-SM or PIM-DM configuration exists.
¡ If no, PIM is not enabled on the interface. Execute the pim sm or pim dm command on the interface. In the case of a BIDIR-PIM network, also execute the bidir-pim enable command to enable BIDIR-PIM.
¡ If yes, proceed to the next step.
2. Verify that PIM has taken effect on the interface.
Execute the display pim interface command. If PIM information exists for the interface, PIM has taken effect on the interface.
¡ If PIM has not taken effect, execute the display interface interface-type interface-number command, and check the Current state field for the physical state of the interface. If the physical state is down, troubleshoot the interface down issue.
¡ If PIM has taken effect, proceed to the next step.
3. Verify that the PIM neighbor relationship has been established successfully.
Execute the display pim neighbor command. If PIM neighbor information exists, the PIM neighbor relationship has been established successfully.
¡ If the PIM neighbor relationship fails to be established, see “PIM neighbor establishment failure” for troubleshooting.
¡ If the PIM neighbor relationship has been established successfully, proceed to the next step.
4. Verify that IGMP has taken effect on the interface connected to the subnet of hosts.
Execute the display igmp interface command. If IGMP information exists for the interface, IGMP has taken effect on the interface.
¡ If IGMP has not taken effect, execute the igmp enable command on the interface to enable IGMP.
¡ If IGMP has taken effect:
- For a PIM-SM or BIDIR-PIM network, proceed to step 5.
- For a PIM-DM network, proceed to step 7.
5. Verify that the RP information is correct in a PIM-SM or BIDIR-PIM network.
Execute the display pim rp-info command on each device in the network. Identify whether the RP information for the multicast group is the same on all devices.
¡ If the RP information is different and static RPs are used, execute the static-rp command on each device to configure the same static RP. To use dynamically elected RPs, proceed to step 6.
¡ If the RP information is the same on all devices, proceed to step 6.
6. Verify that an RPF route to the RP exists.
Execute the display multicast rpf-info command to check for the RPF route to the RP.
¡ If no RPF route to the RP exists, examine the unicast route configuration. Execute the ping command on both the device and the RP to identify whether they can ping each other successfully. If no, modify the unicast route configuration until they can ping each other successfully.
¡ If an RPF route to the RP exists, execute the display multicast rpf-info command, and check the Referenced route type field for the type of the referenced route.
- If the RPF route is a static multicast route, execute the display multicast routing-table static command to identify whether the static multicast route is correct.
- If the RPF route is a unicast route, execute the display ip routing-table command to identify whether the unicast route is the same as the RPF route.
¡ If an RPF route to the RP exists and is correct, proceed to the next step.
7. Verify that an RPF route to the multicast source exists.
Execute the display multicast rpf-info command to check for the RPF route to the multicast source.
¡ If no RPF route to the multicast source exists, examine the unicast route configuration. Execute the ping command on both the device and the multicast source to identify whether they can ping each other successfully. If no, modify the unicast route configuration until they can ping each other successfully.
¡ If an RPF route to the multicast source exists, execute the display multicast rpf-info command, and check the Referenced route type field for the type of the referenced route.
- If the RPF route is a static multicast route (the Referenced route type field is multicast static), execute the display multicast routing-table static command to identify whether the static multicast route is correct.
- If the RPF route is a unicast route (the Referenced route type field is igp, egp, unicast (direct), or unicast), execute the display ip routing-table command to identify whether the unicast route is the same as the RPF route.
¡ If an RPF route to the multicast source exists and is correct, proceed to the next step.
8. Verify that the RPF interface and the connected interface of the device's RPF neighbor have not been configured with a multicast forwarding boundary.
Execute the display multicast boundary command to check for the multicast forwarding boundary configuration.
¡ If an interface is configured with a multicast forwarding boundary, execute the undo multicast boundary command to delete it.
¡ If no interface is configured with a multicast forwarding boundary, proceed to the next step.
9. Verify that no multicast source policy is configured or multicast data is permitted by a multicast source policy.
Execute the display this command in PIM view to check for a multicast source policy (configured by using the source-policy command).
¡ If a multicast source policy has been configured, verify that it permits multicast data to be forwarded. If the multicast source policy denies multicast data, execute the undo source-policy command to delete it or modify the ACL so that it can permit the multicast data to pass through.
¡ If no multicast source policy is configured, proceed to the next step.
10. Verify that multicast entries have been generated.
¡ If multicast entries exist, collect entry information and proceed to step 11.
¡ If no multicast entries exist, proceed to step 11.
Use the following commands to collect entry information:
¡ Execute the display pim routing-table command to check for PIM routing entries.
¡ Execute the display igmp group command to check for IGMP multicast group entries.
¡ Execute the display multicast routing-table command to check for multicast routing entries.
¡ Execute the display multicast forwarding-table command to check for multicast forwarding entries.
11. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
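Steps 6 and 7 depend on the RPF route that leads back to the RP or the multicast source. The idea can be sketched as a longest-prefix lookup followed by an interface comparison. This is a simplified model that considers unicast routes only and ignores static multicast routes and route preference. The function names, prefixes, and interface names are hypothetical.

```python
import ipaddress


def rpf_interface(routes, source):
    """Pick the outgoing interface of the longest-prefix route to source.

    routes: iterable of (prefix, interface) pairs.
    Returns None if no route matches.
    """
    src = ipaddress.ip_address(source)
    best = None
    for prefix, interface in routes:
        net = ipaddress.ip_network(prefix)
        if src in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, interface)
    return best[1] if best else None


def passes_rpf_check(routes, source, incoming_interface):
    """A packet passes only if it arrives on the interface that leads back to source."""
    return rpf_interface(routes, source) == incoming_interface


# Hypothetical unicast routes.
routes = [
    ("0.0.0.0/0", "GE1/0/1"),
    ("10.1.0.0/16", "GE1/0/2"),
    ("10.1.1.0/24", "GE1/0/3"),
]
print(rpf_interface(routes, "10.1.1.5"))
print(passes_rpf_check(routes, "10.1.1.5", "GE1/0/2"))
```

If the lookup returns no interface, no RPF route exists. If it returns an interface other than the one on which multicast data arrives, the data fails the check. Both cases point to the unicast route configuration examined in steps 6 and 7.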
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
SPT forwarding failure in PIM-SM
Symptom
Multicast data fails to be forwarded through the SPT in a PIM-SM network. This section applies only to non-RP devices. If the faulty device is an RP, contact Technical Support.
Common causes
The following are the common causes of this type of issue:
· The interface connected to downstream devices does not receive PIM join messages.
· The interface is not enabled with PIM-SM.
· The RPF route to the multicast source is incorrect.
· Configuration errors exist, such as multicast forwarding boundary and multicast source policy.
Troubleshooting flow
Figure 70 shows the troubleshooting flowchart.
Figure 70 Flowchart for troubleshooting SPT forwarding failure in PIM-SM
Solution
1. Verify that a correct (S, G) entry exists in the PIM routing table.
Execute the display pim routing-table command to check for the correct (S, G) entry.
¡ If a correct (S, G) entry exists, execute the display multicast forwarding-table command at 15-second intervals, and check the Matched packets and Forwarded packets fields.
- If no (S, G) entry exists in the forwarding table or the values of the Matched packets and Forwarded packets fields do not increase, proceed to step 8.
- If a (S, G) entry exists in the forwarding table and the values of the Matched packets and Forwarded packets fields increase, also proceed to step 8.
¡ If no correct (S, G) entry exists, proceed to step 2.
2. Verify that the interface connected to the downstream device has received PIM join messages.
Under the guidance of Technical Support, use a packet capture tool such as Wireshark to capture packets on the interface to identify whether PIM join messages are received.
¡ If no PIM join messages are received, capture packets on the downstream device's interface that connects to this device to identify whether the downstream device has sent PIM join messages. If it has not, troubleshoot the downstream device. If it has, communication between the device and the downstream device is abnormal. Proceed to step 8.
¡ If the interface connected to the downstream device has received PIM join messages, proceed to step 3.
3. Verify that the interface is enabled with PIM-SM.
Execute the display pim interface verbose command to identify whether the RPF interface, RPF neighbor interface, and interface connected to the subnet of hosts (downstream interface on the receiver-side DR) have been enabled with PIM-SM.
¡ If any of the interfaces is not enabled with PIM-SM, execute the pim sm command on the interface. Verify that IP multicast routing has been enabled by using the multicast routing command and that the neighbor relationship has been established successfully (displayed by using the display pim neighbor command).
¡ If all of the interfaces are enabled with PIM-SM, proceed to step 4.
4. Verify that an RPF route to the multicast source exists.
Execute the display multicast rpf-info command to check for the RPF route to the multicast source.
¡ If no RPF route to the multicast source exists, examine the unicast route configuration. Execute the ping command on both the device and the multicast source to identify whether they can ping each other successfully. If no, modify the unicast route configuration until they can ping each other successfully.
¡ If an RPF route to the multicast source exists, execute the display multicast rpf-info command, and check the Referenced route type field for the type of the referenced route.
- If the RPF route is a static multicast route (the Referenced route type field is multicast static), execute the display multicast routing-table static command to identify whether the static multicast route is correct.
- If the RPF route is a unicast route (the Referenced route type field is igp, egp, unicast (direct), or unicast), execute the display ip routing-table command to identify whether the unicast route is the same as the RPF route.
¡ If an RPF route to the multicast source exists and is correct, proceed to step 5.
5. Verify that the DR corresponding to the interface for forwarding multicast data is the receiver-side DR.
Execute the display pim interface command, and check the DR-Address field. If (local) is displayed after the DR address, the DR is the receiver-side DR.
¡ If the DR is not the receiver-side DR, locate the device where the DR resides and perform step 6 on the device.
¡ If the DR is the receiver-side DR, perform step 6 on the current device.
6. Verify that the RPF interface and the connected interface of the device's RPF neighbor have not been configured with a multicast forwarding boundary.
Execute the display multicast boundary command to check for the multicast forwarding boundary configuration.
¡ If an interface is configured with a multicast forwarding boundary, execute the undo multicast boundary command to delete it.
¡ If no interface is configured with a multicast forwarding boundary, proceed to step 7.
7. Verify that no multicast source policy is configured or multicast data is not denied by a multicast source policy.
Execute the display this command in PIM view to check for a multicast source policy.
¡ If a multicast source policy has been configured, verify that it permits multicast data to be forwarded. If the multicast source policy denies multicast data, execute the undo source-policy command to delete it or modify the ACL so that it can permit the multicast data to pass through.
¡ If no multicast source policy is configured, proceed to step 8.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
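The counter comparison in step 1 can be sketched as follows. This is a minimal illustration. It assumes the Matched packets and Forwarded packets values for the (S, G) entry have been copied by hand from successive display multicast forwarding-table outputs taken at 15-second intervals. The function name and sample values are hypothetical.

```python
def counters_increasing(samples):
    """Return True if both counters strictly increase between consecutive samples.

    samples: list of (matched_packets, forwarded_packets) tuples in time order.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    return all(
        later[0] > earlier[0] and later[1] > earlier[1]
        for earlier, later in zip(samples, samples[1:])
    )


# Hypothetical readings: packets are matched, but forwarding has stalled.
print(counters_increasing([(100, 90), (250, 90), (400, 90)]))
```

A False result corresponds to the case in step 1 where the counter values do not increase.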
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
RPT forwarding failure in PIM-SM
Symptom
Multicast data fails to be forwarded through the RPT in a PIM-SM network. This section applies only to non-RP devices. If the faulty device is an RP, contact Technical Support.
Common causes
The following are the common causes of this type of issue:
· The route to the RP is unreachable.
· The RP address is not the same on all devices in the PIM-SM network.
· The interface connected to downstream devices does not receive PIM join messages.
· The interface is not enabled with PIM-SM.
· The RPF route to the RP is incorrect.
· Configuration errors exist, such as multicast forwarding boundary and multicast source policy.
Troubleshooting flow
Figure 71 shows the troubleshooting flowchart.
Figure 71 Flowchart for troubleshooting RPT forwarding failure in PIM-SM
Solution
1. Verify that a correct (S, G) entry exists in the PIM routing table.
Execute the display pim routing-table command to check for the correct (S, G) entry.
¡ If a correct (S, G) entry exists, execute the display multicast forwarding-table command at 15-second intervals, and identify whether the same (S, G) entry exists in the forwarding table and check the Matched packets and Forwarded packets fields.
- If no (S, G) entry exists in the forwarding table or the values of the Matched packets and Forwarded packets fields do not increase, proceed to step 9.
- If the same (S, G) entry exists in the forwarding table and the values of the Matched packets and Forwarded packets fields increase, also proceed to step 9.
¡ If no correct (S, G) entry exists, proceed to step 2.
2. Verify that the interface connected to the downstream device has received PIM join messages.
Under the guidance of Technical Support, use a packet capture tool such as Wireshark to capture packets on the interface to identify whether PIM join messages are received.
¡ If no PIM join messages are received, capture packets on the downstream device's interface that connects to this device to identify whether the downstream device has sent PIM join messages. If it has not, troubleshoot the downstream device. If it has, communication between the device and the downstream device is abnormal. Proceed to step 9.
¡ If the interface connected to the downstream device has received PIM join messages, proceed to step 3.
3. Verify that the interface is enabled with PIM-SM.
Execute the display pim interface verbose command to identify whether the RPF interface, RPF neighbor interface, and interface connected to the subnet of hosts (downstream interface on the receiver-side DR) have been enabled with PIM-SM.
¡ If any of the interfaces is not enabled with PIM-SM, execute the pim sm command on the interface. Verify that IP multicast routing has been enabled by using the multicast routing command and that the neighbor relationship has been established successfully (displayed by using the display pim neighbor command).
¡ If all of the interfaces are enabled with PIM-SM, proceed to step 4.
4. Verify that the RP information is correct.
Execute the display pim rp-info command to check the RP information, and identify whether all other devices in the PIM-SM domain have the same RP information.
¡ If the RP information is not the same and static RPs are used, execute the static-rp command on all devices to configure the same RP address. If dynamic RPs are used, proceed to step 9.
¡ If the RP information is the same, proceed to step 5.
5. Verify that an RPF route to the RP exists.
Execute the display multicast rpf-info command to check for the RPF route to the RP.
¡ If no route to the RP exists, examine unicast route configuration. Execute the ping command on both the device and the RP to identify whether they can ping each other successfully. If no, modify the unicast route configuration until they can ping each other successfully.
¡ If a route to the RP exists, execute the display multicast rpf-info command, and check the Referenced route type field for the type of the referenced route.
- If the RPF route is a static multicast route, execute the display multicast routing-table static command to identify whether the static multicast route is correct.
- If the RPF route is a unicast route, execute the display ip routing-table command to identify whether the unicast route is the same as the RPF route.
¡ If an RPF route to the RP exists and is correct, proceed to step 6.
6. Verify that the DR corresponding to the interface for forwarding multicast data is the receiver-side DR.
Execute the display pim interface command, and check the DR-Address field. If (local) is displayed after the DR address, the DR is the receiver-side DR.
¡ If the DR is not the receiver-side DR, locate the device where the DR resides and perform step 7 on the device.
¡ If the DR is the receiver-side DR, perform step 7 on the current device.
7. Verify that the RPF interface and the connected interface of the device's RPF neighbor have not been configured with a multicast forwarding boundary.
Execute the display multicast boundary command to check for the multicast forwarding boundary configuration.
¡ If an interface is configured with a multicast forwarding boundary, execute the undo multicast boundary command to delete it.
¡ If no interface is configured with a multicast forwarding boundary, proceed to step 8.
8. Verify that no multicast source policy is configured or multicast data is not denied by a multicast source policy.
Execute the display this command in PIM view to check for a multicast source policy.
¡ If a multicast source policy has been configured, verify that it permits multicast data to be forwarded. If the multicast source policy denies multicast data, execute the undo source-policy command to delete it or modify the ACL so that it can permit the multicast data to pass through.
¡ If no multicast source policy is configured, proceed to step 9.
9. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Layer 3 multicast issues
Layer 3 multicast traffic forwarding failure
Symptom
A device fails to forward Layer 3 multicast traffic.
Common causes
The following are the common causes of this type of issue:
· No unicast routes exist.
· The interface state is incorrect.
· The device does not generate PIM routing entries or generates an incorrect PIM routing entry.
· The device does not generate multicast forwarding entries or generates an incorrect multicast forwarding entry.
Troubleshooting flow
Figure 72 shows the troubleshooting flowchart.
Figure 72 Flowchart for troubleshooting Layer 3 multicast traffic forwarding failure
Solution
1. Verify that a unicast route to the multicast source exists.
Execute the display ip routing-table ip-address command, and check the unicast route to the multicast source. Specify the multicast source address for the ip-address argument.
¡ If no unicast route to the multicast source exists, configure one.
¡ If a unicast route to the multicast source exists, proceed to the next step.
2. Verify that the physical states of the input interface and output interface are up.
Execute the display interface command to check the physical states of the input interface and output interface.
¡ If the physical state of either interface is down, troubleshoot the interface down issue.
¡ If the physical states of both interfaces are up, proceed to the next step.
3. Verify that the device generates a PIM routing entry and the entry has the correct output interface.
Execute the display pim routing-table command to check for correct PIM routing entries.
¡ If the device has not generated a correct PIM routing entry, contact Technical Support.
¡ If the device has generated a correct PIM routing entry, proceed to the next step.
4. Verify that the device generates a multicast forwarding entry and the entry has the correct output interface.
Execute the display multicast forwarding-table command to check for correct multicast forwarding entries.
¡ If the device has not generated a correct multicast forwarding entry, collect the results of each step and the configuration file, and contact Technical Support.
¡ If the device has generated a correct multicast forwarding entry, also collect the results of each step and the configuration file, and contact Technical Support.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
IGMP or MLD entry establishment failure
Symptom
A device fails to establish IGMP or MLD entries.
Common causes
The following are the common causes of this type of issue:
· The device is not enabled with IP multicast routing.
· The physical state of the interface connected to the subnet of hosts is down.
· The interface connected to the subnet of hosts is not configured with a primary IP address.
· The interface connected to the subnet of hosts is not enabled with IGMP or MLD.
· The multicast group address is in the SSM group range, but the IGMP or MLD version is incorrect.
· An SSM group range is configured, but the multicast group address is not permitted by the ACL.
· An IGMP or MLD multicast group policy is configured, but the multicast group address is not permitted by the ACL.
Troubleshooting flow
Figure 73 shows the troubleshooting flowchart.
Figure 73 Flowchart for troubleshooting IGMP or MLD entry establishment failure
Solution
1. Verify that the device is enabled with IP multicast routing.
Execute the display current-configuration | include multicast command to identify whether IP multicast routing has been enabled.
¡ If IP multicast routing has not been enabled, execute the multicast routing command to enable it.
¡ If IP multicast routing has been enabled, proceed to the next step.
2. Verify that the physical state of the interface connected to the subnet of hosts is up.
Execute the display interface interface-type interface-number command, and check the Current state field for the physical state of the interface.
¡ If the physical state is up, proceed to the next step.
¡ If the physical state is down, troubleshoot the interface down issue.
3. Verify that the interface is configured with a primary IP address.
Execute the display this command on the interface connecting the device to hosts, and check for the primary IP address.
¡ If the primary IP address is not configured, use the ip address command to configure it.
¡ If the primary IP address is configured, proceed to the next step.
4. Verify that the interface connected to the subnet of hosts is enabled with IGMP or MLD.
Execute the display current-configuration interface command to identify whether IGMP or MLD is enabled on the interface.
¡ If no, enable IGMP or MLD on the interface.
¡ If yes, proceed to the next step.
5. Identify whether the multicast group address is in the default SSM group range.
¡ For IGMP, the default SSM group range is 232.0.0.0/8.
- If the multicast group address is in the default SSM group range, verify that the IGMP version is IGMPv3 and IGMPv3 packets are correct. If the issue persists, proceed to step 6.
- If the multicast group address is not in the default SSM group range, proceed to step 7.
¡ For MLD, the default IPv6 SSM group range is FF3x::/32.
- If the multicast group address is in the default IPv6 SSM group range, verify that the MLD version is MLDv2. If the issue persists, proceed to step 6.
- If the multicast group address is not in the default SSM group range, proceed to step 7.
6. Identify whether an SSM group range is configured on the interface.
Execute the display current-configuration configuration pim or display current-configuration configuration pim6 command to identify whether an SSM group range is configured.
¡ If an SSM group range is configured, identify whether the multicast group address is permitted by the ACL.
- If no, execute the undo ssm-policy command in PIM view or modify the ACL so that the multicast group address can be permitted.
- If yes, proceed to step 7.
¡ If an SSM group range is not configured, proceed to step 7.
7. Identify whether an IGMP or MLD multicast group policy is configured on the interface.
Execute the display current-configuration command to identify whether an IGMP or MLD multicast group policy is configured.
¡ If yes, identify whether the multicast group address is permitted by the ACL.
- If no, execute the undo igmp group-policy or undo mld group-policy command or modify the ACL so that the multicast group address can be permitted.
- If yes, proceed to step 8.
¡ If no, proceed to step 8.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
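The default SSM range test in step 5 can be sketched with the Python standard library. This is a minimal illustration that covers only the default ranges named above (232.0.0.0/8 for IGMP and FF3x::/32 for MLD), not a user-configured SSM group range. The function names are hypothetical.

```python
import ipaddress

IPV4_SSM_DEFAULT = ipaddress.ip_network("232.0.0.0/8")


def in_default_ssm_range(group):
    """Return True if group is in the default SSM range (232.0.0.0/8 or FF3x::/32)."""
    addr = ipaddress.ip_address(group)
    if addr.version == 4:
        return addr in IPV4_SSM_DEFAULT
    value = int(addr)
    top16 = value >> 112             # first 16 bits, for example 0xFF3E
    next16 = (value >> 96) & 0xFFFF  # bits 17 to 32, which must be zero for a /32 match
    # The "x" in FF3x is the scope nibble, so it is masked out of the comparison.
    return (top16 & 0xFFF0) == 0xFF30 and next16 == 0


def required_version(group):
    """Name the protocol version required for a group in the default SSM range."""
    if not in_default_ssm_range(group):
        return None
    return "IGMPv3" if ipaddress.ip_address(group).version == 4 else "MLDv2"


print(required_version("232.1.1.1"))
print(required_version("ff3e::1234"))
print(required_version("239.1.1.1"))
```

A result of None means the group is outside the default SSM range, which matches the branch in step 5 that proceeds to step 7.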
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Layer 2 multicast issues
Layer 2 multicast traffic forwarding failure
Symptom
A device fails to forward Layer 2 multicast traffic.
Common causes
The following are the common causes of this type of issue:
· The device does not generate Layer 2 multicast forwarding entries.
· The device does not receive Layer 2 multicast protocol packets.
· The IGMP protocol packet format is incorrect.
· The version in IGMP protocol packets is different from the IGMP snooping version configured on the device.
· Layer 3 multicast is configured.
Troubleshooting flow
Figure 74 shows the troubleshooting flowchart.
Figure 74 Flowchart for troubleshooting Layer 2 multicast traffic forwarding failure
Solution
1. Identify whether the device generates Layer 2 multicast forwarding entries.
Execute the display l2-multicast ip forwarding command to check for Layer 2 multicast forwarding entries.
¡ If the device has generated Layer 2 multicast forwarding entries, contact Technical Support.
¡ If the device has not generated Layer 2 multicast forwarding entries, proceed to step 2.
2. Identify whether the device receives IGMP reports.
Execute the debugging igmp-snooping packet command to enable IGMP snooping packet debugging. If the following information is printed, the device has received IGMP reports:
*Sep 15 11:47:41:455 2011 Sysname MCS/7/PACKET: -MDC=1; Receive IGMPv2 report packet from port GE2/0/1 on VLAN 2. (G162625)
¡ If the device has not received IGMP reports, troubleshoot the downstream device and hosts.
¡ If the device has received IGMP reports, proceed to step 3.
3. Verify that the IGMP protocol packet format is correct.
Configure mirroring, and use a packet capture tool (for example, Wireshark) to capture and analyze mirrored IGMP protocol packets under the guidance of Technical Support.
¡ If the IGMP protocol packet format is incorrect, modify IGMP protocol packets.
¡ If the IGMP protocol packet format is correct, proceed to step 4.
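4. Verify that the version in IGMP protocol packets is the same as the IGMP snooping version configured on the device.
Execute the display igmp-snooping command, and check the Version field for the IGMP snooping version.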
¡ If the version in IGMP protocol packets is different from the IGMP snooping version, perform one of the following tasks:
- Modify the IGMP versions on the upstream and downstream devices so that they can be the same as the IGMP snooping version on the device.
- Use the version command in IGMP-snooping view or the igmp-snooping version command in VLAN view to modify the IGMP snooping version so that it can be the same as the IGMP versions on the upstream and downstream devices.
¡ If the version in IGMP protocol packets is the same as the IGMP snooping version, proceed to step 5.
5. Verify that Layer 3 multicast is not configured.
¡ If Layer 3 multicast is configured, delete the Layer 3 multicast configuration.
¡ If Layer 3 multicast is not configured, proceed to step 6.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
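Steps 2 and 4 above can be partly automated when debugging output is saved to a file. The following Python sketch is illustrative only. It assumes that IGMP report debug lines keep the exact format shown in step 2, and the names REPORT_RE, parse_report, and version_mismatch are hypothetical helpers, not part of any H3C tool.

```python
import re

# Pattern for the IGMP snooping debug line shown in step 2.
# Assumption: the message format is stable across software versions.
REPORT_RE = re.compile(
    r"Receive IGMPv(?P<version>\d) report packet "
    r"from port (?P<port>\S+) on VLAN (?P<vlan>\d+)"
)


def parse_report(line):
    """Return (version, port, vlan) for an IGMP report debug line, else None."""
    m = REPORT_RE.search(line)
    if m is None:
        return None
    return int(m.group("version")), m.group("port"), int(m.group("vlan"))


def version_mismatch(line, snooping_version):
    """True if the line is a report whose IGMP version differs from the
    IGMP snooping version read from the Version field (step 4)."""
    parsed = parse_report(line)
    return parsed is not None and parsed[0] != snooping_version


sample = (
    "*Sep 15 11:47:41:455 2011 Sysname MCS/7/PACKET: -MDC=1; "
    "Receive IGMPv2 report packet from port GE2/0/1 on VLAN 2. (G162625)"
)
print(parse_report(sample))
print(version_mismatch(sample, 3))
```

A line that does not match the pattern returns None, which corresponds to the "has not received IGMP reports" branch of step 2 only if no line in the capture matches.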
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting MPLS issues
LDP issues
LDP session down
Symptom
The LDP session cannot go up.
Common causes
The following are the common causes of this type of issue:
· The interface establishing the session is in a Down state.
· The LSR ID has been configured incorrectly.
· Related configuration for the LDP session does not exist.
· The transport address configuration is incorrect.
· The LDP Hello-hold timer has timed out.
· The LDP Keepalive-hold timer has timed out.
· Security authentication configuration is incorrect.
Troubleshooting flow
Figure 75 shows the troubleshooting flowchart.
Figure 75 Flowchart for troubleshooting LDP session down
Solution
To resolve the issue:
1. Check whether the interface for establishing an LDP session is in the up state.
Execute the display interface command to identify whether the interface is in the up state:
¡ If the interface is not up, identify and eliminate any physical link faults to bring the interface to an up state.
¡ If the interface is in up state, proceed to step 2.
2. Check whether the LSR ID configuration is correct.
The LSR ID includes Local LSR ID, LDP LSR ID, and MPLS LSR ID. The priority of LSR ID from high to low is Local LSR ID, LDP LSR ID, and MPLS LSR ID. At least one type of LSR ID should be configured on the device and this LSR ID must be reachable at Layer 3.
Execute the display mpls ldp peer verbose command to identify whether the LSR ID is configured.
<Sysname> display mpls ldp peer verbose
VPN instance: public instance
Peer LDP ID : 100.100.100.20:0
Local LDP ID : 100.100.100.17:0
TCP Connection : 100.100.100.20:47515 -> 100.100.100.17:646
…
If no LSR ID is configured, configure the LSR ID as follows:
¡ Configure the MPLS LSR ID in system view.
Execute the mpls lsr-id command in system view.
¡ Configure the LDP LSR ID in LDP view.
Execute the lsr-id command in LDP view.
If at least one type of LSR ID is configured, proceed to step 3.
3. Check whether relevant configuration for the LDP session exists.
For a direct LDP session, execute the display this command in interface view to identify whether the interface has the configuration required for the LDP session.
¡ If the configuration does not include the mpls enable, mpls ldp enable, mpls ldp ipv6 enable, or mpls ldp transport-address commands, deploy the missing commands.
¡ If the related configuration for the LDP session exists, proceed to step 4.
For a remote LDP session, execute the display this command in LDP view to identify whether the configuration required for the LDP session exists.
¡ If the configuration does not include the targeted-peer or mpls ldp transport-address command, then deploy the missing commands.
¡ If the related configuration for the LDP session exists, proceed to step 4.
4. Check whether the transport address configuration is correct.
For an LDP IPv4 session, execute the display mpls ldp discovery verbose command to identify whether the transport address configuration is correct.
<Sysname> display mpls ldp discovery verbose
VPN instance: public instance
Link Hellos:
Interface GigabitEthernet2/0/2
Local LDP ID : 100.100.100.17:0
Hello Interval : 5000 ms Hello Sent/Rcvd : 83/160
Transport Address: 100.100.100.17
Peer LDP ID : 100.100.100.18:0
Source Address : 202.118.224.18 Transport Address: 100.100.100.18
Hello Hold Time: 15 sec (Local: 15 sec, Peer: 15 sec)
Peer LDP ID : 100.100.100.20:0
Source Address : 202.118.224.20 Transport Address: 100.100.100.20
Hello Hold Time: 15 sec (Local: 15 sec, Peer: 15 sec)
Targeted Hellos:
100.100.100.17 -> 100.100.100.18 (Active, Passive)
Local LDP ID : 100.100.100.17:0
Hello Interval : 15000 ms Hello Sent/Rcvd : 23/20
Transport Address: 100.100.100.17
Session Setup : Config/Tunnel
Peer LDP ID : 100.100.100.18:0
Source Address : 100.100.100.18 Transport Address: 100.100.100.18
Hello Hold Time: 45 sec (Local: 45 sec, Peer: 45 sec)
For an LDP IPv6 session, execute the display mpls ldp discovery ipv6 verbose command to identify whether the transport address configuration is correct.
<Sysname> display mpls ldp discovery ipv6 verbose
VPN instance: public instance
Link Hellos:
Interface GigabitEthernet2/0/2
Hello Interval : 5000 ms Hello Sent/Rcvd : 83/160
Transport Address: 2001::2
Peer LDP ID : 100.100.100.18:0
Source Address : FE80:130F:20C0:29FF:FEED:9E60:876A:130B
Transport Address: 2001::1
Hello Hold Time: 15 sec (Local: 15 sec, Peer: 15 sec)
Targeted Hellos:
2001:0000:130F::09C0:876A:130B ->
2005:130F::09C0:876A:130B(Active, Passive)
Hello Interval : 15000 ms Hello Sent/Rcvd : 23/22
Transport Address: 2001:0000:130F::09C0:876A:130B
Peer LDP ID : 100.100.100.18:0
Source Address : 2005:130F::09C0:876A:130B
Destination Address : 2001:0000:130F::09C0:876A:130B
Transport Address : 2005:130F::09C0:876A:130B
Hello Hold Time: 45 sec (Local: 45 sec, Peer: 45 sec)
If the transport address is incorrect, execute the mpls ldp transport-address command to configure the transport address in interface view or LDP peer view. By default, the transport address is the LSR ID of the local LSR.
If the transport address is correct, verify that the route is advertised. Execute the display ip routing-table command to identify whether a route to the session endpoint exists.
¡ If the route does not exist, configure the transport address as an IP address that exists on the local device to ensure the route can be properly advertised.
¡ If the route exists, proceed to step 5.
5. Check whether the LDP Hello-hold timer has timed out.
As a best practice, execute the display mpls ldp discovery command every 5 seconds to check the numbers of sent and received Hello messages and verify that Hello messages are exchanged correctly at both ends of the session. If the sent or received count does not change across several consecutive executions, Hello message exchange is abnormal and the Hello-hold timer has timed out.
¡ If the Hello-hold timer times out, clear link faults and check the device's CPU usage. If the CPU usage is too high, disable some unnecessary features; if the CPU usage is normal, proceed to step 6.
¡ If the Hello-hold timer does not time out, proceed to step 6.
6. Check whether the LDP Keepalive-hold timer has timed out.
As a best practice, execute the display mpls ldp peer command every 15 seconds to check the numbers of sent and received Keepalive messages and verify that Keepalive messages are exchanged correctly at both ends of the session. If the counts do not change across several consecutive executions, Keepalive message exchange is abnormal and the Keepalive-hold timer has timed out.
¡ If the Keepalive-hold timer times out, resolve any packet forwarding issues.
¡ If the Keepalive-hold timer does not time out, proceed to step 7.
7. Check whether the security authentication configuration is correct.
Execute the display mpls ldp peer command to check whether security authentication is configured on both ends of the LDP session, and whether the type of security authentication configured is consistent on both ends.
<Sysname> display mpls ldp peer
VPN instance: public instance
Total number of peers: 1
Peer LDP ID State Role GR Auth KA Sent/Rcvd
2.2.2.9:0 Operational Passive Off Keychain 39/39
¡ If the Auth field displays different values on both ends of the LDP session, then modify the security authentication on both ends of the LDP session to be consistent.
¡ If the Auth field displays the same value at both ends of the LDP session, then proceed to step 8.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
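The counter comparison in steps 5 and 6 can be sketched in Python as follows. This is a minimal illustration, not a vendor tool. It assumes that the collected output contains the Hello Sent/Rcvd field in the format shown in step 4, and it reads only the first counter pair in each sample, so output that lists several interfaces or peers must be split per entry first. The function names are hypothetical.

```python
import re

# Matches the "Hello Sent/Rcvd : 83/160" field from the discovery output.
COUNTER_RE = re.compile(r"Hello Sent/Rcvd\s*:\s*(\d+)/(\d+)")


def hello_counters(output):
    """Return the first (sent, received) Hello counter pair, or None."""
    m = COUNTER_RE.search(output)
    return (int(m.group(1)), int(m.group(2))) if m else None


def stalled_directions(samples):
    """Given counter pairs from successive command executions, return the
    directions ("sent", "received") whose counter never changed."""
    stalled = []
    if len(samples) < 2:
        return stalled  # One sample cannot show whether a counter advances.
    if all(s[0] == samples[0][0] for s in samples):
        stalled.append("sent")
    if all(s[1] == samples[0][1] for s in samples):
        stalled.append("received")
    return stalled


line = "Hello Interval : 5000 ms Hello Sent/Rcvd : 83/160"
print(hello_counters(line))
print(stalled_directions([(83, 160), (84, 160), (85, 160)]))
```

The same comparison applies to the KA Sent/Rcvd column in step 6 if the regular expression is adapted to that field.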
Related alarm and log messages
Alarm messages
Module Name: MPLS-LDP-STD-MIB
mplsLdpSessionDown (1.3.6.1.2.1.10.166.4.0.4)
Log messages
LDP/4/LDP_SESSION_CHG
LDP session flapping
Symptom
The LDP session state flaps frequently.
Common causes
The following are the common causes of this type of issue:
· Interface flapping.
· Route flapping.
· High CPU usage.
Troubleshooting flow
Figure 76 shows the troubleshooting flowchart.
Figure 76 Flowchart for troubleshooting LDP session flapping
Solution
To resolve the issue:
1. Identify whether the interface is flapping.
Execute the display interface brief command and observe the Physical and Protocol fields. If both fields display Up, the interface is up. Otherwise, the interface is down. If the interface keeps switching between the Up and Down states, the interface is flapping.
¡ If the interface is flapping, resolve the interface issue.
¡ If the interface is not flapping, proceed to step 2.
2. Identify whether the route is flapping.
Execute the display ip routing-table command to view route information. If the route information keeps switching between being displayed and not displayed, it indicates route flapping.
¡ If route flapping occurs, or the route has always been absent, resolve link issues and IGP route issues.
¡ If the route is not flapping, proceed to step 3.
3. Check whether the TCP packet is too large.
Execute the display tcp statistics command to view TCP connection traffic statistics. Check the data packets retransmitted field in the Sent packets information to determine whether the TCP packets are too large.
¡ If the number of retransmitted packets continuously increases, the TCP packets are too large. Execute the tcp mss command on the outgoing interface to adjust the TCP MSS value.
¡ If the number of retransmitted packets does not increase, the TCP packet size is normal. Proceed to step 4.
4. Identify whether the CPU usage is too high.
Execute the display cpu-usage command to view the statistical information of CPU usage.
¡ If the CPU usage is too high, disable some unnecessary features to lower the device's CPU usage.
¡ If the CPU usage is normal, proceed to step 5.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
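The flapping checks in steps 1 through 3 all rely on comparing repeated observations. The following Python sketch shows one way to evaluate such samples. It is an illustration under stated assumptions: the samples are collected manually or by a script of your own, the threshold of two transitions is an arbitrary example value, and the function names are hypothetical.

```python
def count_transitions(samples):
    """Count how many times consecutive samples differ."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def is_flapping(samples, min_transitions=2):
    """Treat a state as flapping if it changed at least min_transitions
    times across the samples (the threshold is an assumed example value)."""
    return count_transitions(samples) >= min_transitions


def retransmits_growing(counts):
    """True if the retransmitted-packet counter increased in every
    consecutive sample (step 3)."""
    return len(counts) >= 2 and all(b > a for a, b in zip(counts, counts[1:]))


states = ["Up", "Down", "Up", "Up", "Down"]
print(count_transitions(states))
print(is_flapping(states))
print(retransmits_growing([10, 12, 15]))
```

The same functions apply to interface states, route presence (for example, True or False per sample), and LDP session states.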
Related alarm and log messages
Alarm messages
Module Name: MPLS-LDP-STD-MIB
mplsLdpSessionDown (1.3.6.1.2.1.10.166.4.0.4)
Log messages
LDP/4/LDP_SESSION_CHG
LDP LSP down
Symptom
In the LDP network, an LDP LSP cannot come up.
Common causes
The following are the common causes of this type of issue:
· Route issue.
· LDP session down.
· Insufficient resources, for example, number of labels reaching the limit, or lack of memory.
· LSP generation policy, label acceptance policy, label advertisement policy, or a label mapping propagation policy is configured.
· The outgoing interface of the route is not the interface that establishes the LDP session.
Troubleshooting flow
Troubleshoot this type of issue in the following procedure:
1. Identify whether the route exists.
2. Identify whether the LDP session has been established properly.
3. Identify whether there are issues of insufficient resources, for example, labels reaching the upper limit, or lack of memory.
4. Identify whether an LSP generation policy has been configured.
5. Identify whether the outgoing interface of the route is the interface used to establish the LDP session.
Figure 77 shows the troubleshooting flowchart.
Figure 77 Flowchart for troubleshooting LDP LSP down
Solution
To resolve the issue:
1. Identify whether the route exists.
Execute the display ip routing-table ip-address mask verbose command to identify whether a route to the LSP destination address exists and is in active state (the State field value is Active Adv). If the route exists, the command displays the route information. If the route does not exist, the command displays no route information. For a public network BGP route, also identify whether the route carries a label. If the Label field is not NULL, the BGP route carries a label.
<Sysname> display ip routing-table 1.1.1.1 32 verbose
Summary count : 1
Destination: 1.1.1.1/32
Protocol: O_INTRA
Process ID: 1
SubProtID: 0x1 Age: 00h00m16s
FlushedAge: 00h00m16s
Cost: 1 Preference: 10
IpPre: N/A QosLocalID: N/A
Tag: 0 State: Active Adv
OrigTblID: 0x0 OrigVrf: default-vrf
…
¡ If the route does not exist, the route exists but is not in active state, or the BGP route does not carry a label, resolve the routing failure.
¡ If the route exists and is in active state, and also carries a label when it is a BGP route, proceed to step 2.
2. Identify whether the LDP session has been established properly.
Execute the display mpls ldp peer verbose command to identify whether the LDP session has been successfully established.
<Sysname> display mpls ldp peer verbose
VPN instance: public instance
Peer LDP ID : 1.1.1.1:0
Local LDP ID : 2.2.2.2:0
TCP Connection : 2.2.2.2:14080 -> 1.1.1.1:646
Session State : Operational Session Role : Active
Session Up Time : 0000:00:14 (DD:HH:MM)
…
¡ If the State field does not display Operational, the LDP session was not established correctly. Troubleshoot the LDP session issue as described in "LDP session down."
¡ If the State field displays Operational, it indicates that the LDP session has been established and come up. In this case, proceed to step 3.
3. Check whether an LSP acceptance or advertisement policy has been configured.
¡ In the LDP view, execute the display this command. If the following commands exist, you need to check whether the specified LSP has been filtered by an IP prefix list:
- lsp-trigger prefix-list
- accept-label peer prefix-list
- advertise-label prefix-list
If an IP prefix list filters out the specified LSP, modify the IP prefix list to allow the destination address of the specified LSP to pass. If the IP prefix list does not filter the specified LSP, proceed to step 4.
¡ If the previous commands are not configured in the LDP view, proceed to step 4.
4. Identify whether the outgoing interface of the route is the interface used to establish the LDP session.
Execute the display ip routing-table ip-address mask command to view the outgoing interface of the specified route.
<Sysname> display ip routing-table 1.1.1.1 32
Summary count : 1
Destination/Mask Proto Pre Cost NextHop Interface
1.1.1.1/32 O_INTRA 10 1 10.1.1.1 GE2/0/1
Execute the display mpls ldp peer peer-lsr-id verbose command to view the Discovery Sources information of the specified LDP peer.
<Sysname> display mpls ldp peer 1.1.1.1 verbose
VPN instance: public instance
Peer LDP ID : 1.1.1.1:0
Local LDP ID : 2.2.2.2:0
TCP Connection : 2.2.2.2:14080 -> 1.1.1.1:646
Session State : Operational Session Role : Active
Session Up Time : 0000:12:55 (DD:HH:MM)
Max PDU Length : 4096 bytes (Local: 4096 bytes, Peer: 4096 bytes)
Keepalive Time : 45 sec (Local: 45 sec, Peer: 45 sec)
Keepalive Interval : 15 sec
Msgs Sent/Rcvd : 229/228
KA Sent/Rcvd : 223/223
Label Adv Mode : DU Graceful Restart : Off
Reconnect Time : 0 sec Recovery Time : 0 sec
Loop Detection : Off Path Vector Limit: 0
mLDP P2MP : Off
Discovery Sources:
GigabitEthernet2/0/1
Hello Hold Time: 15 sec Hello Interval : 5000 ms
Addresses received from peer:
10.1.1.1 1.1.1.1
¡ If the interface information in Discovery Sources field does not include the outgoing interface of the specified route, check whether the corresponding LDP configuration on the outgoing interface of the specified route and on the corresponding interface of the downstream device is correct. If it is incorrect, modify the corresponding configuration; if it is correct, proceed to step 5.
¡ If the interface information in the Discovery Sources field includes the outgoing interface of the specified route, proceed to step 5.
5. Check for insufficient resources, such as number of LSPs reaching the upper limit or lack of memory.
¡ Identify whether the system memory is insufficient.
Execute the display memory-threshold command to identify whether the system is running out of memory. If the memory is insufficient, delete unnecessary LSPs.
¡ Check whether the number of labels has exceeded the upper limit.
Execute the display mpls summary command and identify whether the number of idle labels in the LDP label range is 0 (the Idle value in the Used/Idle/Total field is 0). If the idle label count is 0, all LDP label resources have been used up, and you must delete unnecessary LSPs.
<Sysname> display mpls summary
MPLS LSR ID : 2.2.2.2
Egress Label Type: Implicit-null
Entropy Label : Off
Labels:
Range Used/Idle/Total Owner
16-2047 0/2032/2032 StaticPW
Static
StaticCR
Static SR Adj
BSID
2048-599999 9129/588823/597952 LDP
RSVP
BGP
BGP SR EPE
OSPF SR Adj
ISIS SR Adj
¡ If the issue of insufficient resources does not exist, proceed to step 6.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
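The idle label check in step 5 can be sketched in Python as follows. This is a hedged illustration that assumes the Labels table of the display mpls summary output has the layout shown in step 5, with one Range and Used/Idle/Total entry followed by one owner per line. The function names are hypothetical.

```python
import re

# A range line looks like: "2048-599999 9129/588823/597952 LDP"
RANGE_RE = re.compile(r"^(\d+)-(\d+)\s+(\d+)/(\d+)/(\d+)\s+(.+)$")


def parse_label_ranges(output):
    """Parse the Labels table into a list of dicts, one per label range."""
    ranges = []
    in_table = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Range"):
            in_table = True  # Table header found; data lines follow.
            continue
        if not in_table or not line:
            continue
        m = RANGE_RE.match(line)
        if m:
            ranges.append({
                "range": (int(m.group(1)), int(m.group(2))),
                "used": int(m.group(3)),
                "idle": int(m.group(4)),
                "total": int(m.group(5)),
                "owners": [m.group(6).strip()],
            })
        elif ranges:
            # Continuation line: another owner of the current range.
            ranges[-1]["owners"].append(line)
    return ranges


def ldp_idle_labels(output):
    """Return the idle label count of the range owned by LDP, or None."""
    for entry in parse_label_ranges(output):
        if "LDP" in entry["owners"]:
            return entry["idle"]
    return None


sample = """MPLS LSR ID : 2.2.2.2
Labels:
Range Used/Idle/Total Owner
16-2047 0/2032/2032 StaticPW
Static
2048-599999 9129/588823/597952 LDP
RSVP
BGP
"""
print(ldp_idle_labels(sample))
```

A return value of 0 corresponds to the exhausted label condition described in step 5.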
Related alarm and log messages
Alarm messages
Module name: MPLS-LSR-STD-MIB
The node name (OID) is mplsXCDown (1.3.6.1.2.1.10.166.2.0.2).
Log messages
N/A
LDP LSP flapping
Symptom
In the LDP network, an LDP LSP flaps frequently.
Common causes
The following are the common causes of this type of issue:
· Route flapping.
· LDP session flapping.
Troubleshooting flow
Troubleshoot this type of issue in the following procedure:
1. Identify whether the route is flapping.
2. Identify whether the LDP session is flapping.
Figure 78 shows the troubleshooting flowchart.
Figure 78 Flowchart for troubleshooting LDP LSP flapping
Solution
To resolve the issue:
1. Identify whether the route is flapping.
As a best practice, execute the display ip routing-table command once per second for 5 to 10 consecutive times to check the route to the LSP destination address. If the route exists, the command displays the route information. If the route does not exist, the command displays no route information. If the route information alternates between being displayed and not being displayed, the route is flapping.
After viewing the route information, execute the display mpls ldp fec command to verify that the State field in the Downstream Info for the LSP established with the downstream peer has a value of Established.
<Sysname> display mpls ldp fec
VPN instance: public instance
FEC: 1.1.1.1/32
Flags: 0x112
In Label: 2175
Upstream Info:
Peer: 1.1.1.1:0 State: Established
Downstream Info:
Peer: 1.1.1.1:0
Out Label: 3 State: Established
Next Hops: 10.1.1.1 GE2/0/1
RIB Info:
Protocol : OSPF BGP As Num : 0
Label Proto ID : 1 NextHopCount : 1
VN ID : 0x313000003
Tunnel ID : -
¡ If the route is flapping or has never existed, troubleshoot the routing issue.
¡ If the route is not flapping, proceed to step 2.
2. Identify whether the LDP session is flapping.
As a best practice, execute the display mpls ldp peer command once per second for 5 to 10 consecutive times and check the State field in the command output. If the value of this field switches between Operational and other states, the LDP session is flapping.
<Sysname> display mpls ldp peer
VPN instance: public instance
Total number of peers: 1
Peer LDP ID State Role GR Auth KA Sent/Rcvd
1.1.1.1:0 Operational Active Off None 298/298
¡ If the LDP session is flapping, troubleshoot the flapping issue as described in "LDP session flapping."
¡ If the LDP session is not flapping, proceed to step 3.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
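The Downstream Info check in step 1 can be sketched in Python as follows. This is an illustration only. It assumes that the display mpls ldp fec output uses the section names and field layout shown in step 1, and the function name downstream_states is hypothetical.

```python
import re

STATE_RE = re.compile(r"State:\s*(\S+)")


def downstream_states(output):
    """Map each FEC to the list of State values found under Downstream Info."""
    states = {}
    fec = None
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("FEC:"):
            fec = line.split(":", 1)[1].strip()
            section = None
        elif line.startswith("Upstream Info"):
            section = "upstream"
        elif line.startswith("Downstream Info"):
            section = "downstream"
        elif line.startswith("RIB Info"):
            section = None
        elif section == "downstream" and fec is not None:
            m = STATE_RE.search(line)
            if m:
                states.setdefault(fec, []).append(m.group(1))
    return states


sample = """VPN instance: public instance
FEC: 1.1.1.1/32
Flags: 0x112
In Label: 2175
Upstream Info:
Peer: 1.1.1.1:0 State: Established
Downstream Info:
Peer: 1.1.1.1:0
Out Label: 3 State: Established
Next Hops: 10.1.1.1 GE2/0/1
RIB Info:
Protocol : OSPF BGP As Num : 0
"""
print(downstream_states(sample))
```

A FEC that is missing from the result, or whose list contains a value other than Established, is a candidate for the routing checks in step 1.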
Related alarm and log messages
Alarm messages
Module name: MPLS-LSR-STD-MIB
The node name (OID) is mplsXCDown (1.3.6.1.2.1.10.166.2.0.2).
Log messages
N/A
Troubleshooting MPLS L2VPN/VPLS
A PW failed to be pinged
Symptom
When you execute the ping mpls pw command to test PW connectivity, the remote end cannot be pinged.
Common causes
The following are the common causes of this type of issue:
· The PW being tested does not exist.
· The PW template configuration is incorrect.
· The PW has failed.
· The PW does not have a valid forwarding path on the public network.
Analysis
To troubleshoot this type of issue, execute the ping mpls pw command, and then identify the cause of the issue depending on the received error message.
· If you receive the Unknown PW error message, the issue occurs because the PW does not exist. You must modify the configuration to make sure the PW can be created correctly.
· If the error message is No suitable control channel for the PW, check for VCCV control channel type misconfiguration. Then, execute the vccv cc command to specify the correct VCCV control channel type in the PW template.
· If the error message is Please configure pseudowire control-word for control channel, execute the control-word enable command to enable the control word feature in the PW template.
· If the error message is Request time out, identify whether the local PW is up, and then execute the tracert mpls pw command to locate the faulty node.
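The mapping from error message to action in the list above can be expressed as a simple lookup. The following Python sketch is illustrative only. The message strings and actions are taken from this section, the exact surrounding text of real ping mpls pw output is an assumption, and the names REMEDIES and suggest_action are hypothetical.

```python
# Error messages and actions as listed in the Analysis section.
REMEDIES = {
    "Unknown PW":
        "Modify the configuration so that the PW can be created.",
    "No suitable control channel for the PW":
        "Use vccv cc to set the correct VCCV control channel type "
        "in the PW template.",
    "Please configure pseudowire control-word for control channel":
        "Use control-word enable in the PW template.",
    "Request time out":
        "Verify that the local PW is up, then use tracert mpls pw "
        "to locate the faulty node.",
}


def suggest_action(ping_output):
    """Return the action for the first known error message found in the
    command output, or None if no known message is present."""
    for message, action in REMEDIES.items():
        if message in ping_output:
            return action
    return None


# Assumed example text; real output may contain additional fields.
print(suggest_action("Request time out"))
```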
Figure 79 shows the troubleshooting flowchart.
Figure 79 Flowchart for troubleshooting ping failure
Solution
To resolve the issue:
If you receive the Unknown PW message, modify the configuration to ensure that the PW can be correctly created.
If you receive the No suitable control channel for the PW message, execute the vccv cc command to configure the same VCCV control channel type for both ends.
If you receive the Please configure pseudowire control-word for control channel message, execute the control-word enable command to enable the control word feature in the PW template.
If you receive the Request time out message, perform the following steps:
1. Execute the display l2vpn pw command to verify that the PW is up.
<Sysname> display l2vpn pw
Flags: M - main, B - backup, E - ecmp, BY - bypass, H - hub link, S - spoke link
N - no split horizon, A - administration, ABY - ac-bypass
PBY - pw-bypass
Total number of PWs: 2
2 up, 0 blocked, 0 down, 0 defect, 0 idle, 0 duplicate
Xconnect-group Name: ldp
Peer PWID/RmtSite/SrvID In/Out Label Proto Flag Link ID State
192.3.3.3 500 1299/1299 LDP M 0 Up
VSI Name: aaa
Peer PWID/RmtSite/SrvID In/Out Label Proto Flag Link ID State
2.2.2.9 2 1420/1419 BGP M 9 Up
¡ If the PW is in Down state, execute the display l2vpn pw verbose command to check for the failure reason and troubleshoot the issue.
<Sysname> display l2vpn pw verbose
VSI Name: aaa
Peer: 2.2.2.9 Remote Site: 2
Signaling Protocol : BGP
Link ID : 9 PW State : Down
In Label : 1420 Out Label: 1419
MTU : 1500
PW Attributes : Main
VCCV CC : -
VCCV BFD : -
Flow Label : Send
Control Word : Disabled
Tunnel Group ID : 0x800000960000000
Tunnel NHLFE IDs : 1038
Admin PW : -
E-Tree Mode : -
E-Tree Role : root
Root VLAN : -
Leaf VLAN : -
Down Reasons : Control word not match
The common causes of this type of issue are as follows:
- BFD session for PW down—The BFD session for PW detection is down. To resolve this issue, execute the display bfd session command to display BFD session information. Check and edit BFD configuration or check the physical link for link failure or link quality issues.
- BGP RD was deleted—The BGP RD has been deleted. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.
- BGP RD was empty—No BGP RD is configured. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.
- Control word not match—The control word configuration on the two ends of the PW is inconsistent. To resolve this issue, execute the control-word enable command to enable the control word feature on both ends.
- Encapsulation not match—The encapsulation types on the two ends of the PW are inconsistent. Execute the pw-type command to configure the same encapsulation type for the two ends.
- LDP interface parameter not match—The LDP negotiation parameters on the two ends of the PW are inconsistent. To resolve this issue, execute the vccv cc command to specify the same VCCV control channel (CC) type. Alternatively, specify the same CEM class for the CEM interfaces on both ends of the PW.
- Non-existent remote LDP PW—The remote device has deleted the LDP PW. To resolve the issue, reconfigure the LDP PW on the remote device.
- Local AC Down—The local AC is down. To resolve the issue, check and edit the configuration on the AC interface, or troubleshoot the interface where the AC is located and make sure the interface is in up state.
- Local AC was non-existent—The local AC did not exist. To resolve this issue, configure a local AC and associate it with a VSI.
- MTU not match—The MTU is not the same at the two ends of the PW. To resolve the issue, configure the same MTU at both ends of the PW or use the mtu-negotiate disable command to disable MTU negotiation.
- Remote AC Down—The remote AC is down. To resolve the issue, check and edit the configuration on the AC interface, or troubleshoot the interface where the AC is located and make sure the interface is in up state.
¡ If the PW is in up state, go to step 2.
2. Execute the display l2vpn forwarding pw verbose command to verify that the In Label, Out Label, and Tunnel NHLFE IDs related to the tunnel that carries the PW are valid.
<Sysname> display l2vpn forwarding pw verbose
Xconnect-group Name: xcg1
Connection Name: c1
Link ID: 0
PW Type : VLAN PW State : Up
In Label : 110126 Out Label: 130126
MTU : 1500
PW Attributes : Main
VCCV CC : Router-Alert
VCCV BFD : Fault Detection with BFD
Flow Label : -
Tunnel Group ID : 0x800000130000001
Tunnel NHLFE IDs : 3
VSI Name: aaa
Link ID: 8
PW Type : VLAN PW State : Up
In Label : 1272 Out Label: 1275
MTU : 1500
PW Attributes : Main
VCCV CC : -
VCCV BFD : Fault Detection with BFD
Flow Label : -
Tunnel Group ID : 0x960000000
Tunnel NHLFE IDs: 1034
¡ If the values for the incoming and outgoing labels are empty or a hyphen (-), first execute the display l2vpn pw verbose command to check for the protocol that established the PW and then edit the configuration as follows:
- If the protocol is BGP, check and edit BGP configuration.
- If the protocol is LDP, check and edit LDP configuration.
- If the protocol is Static, check and edit static PW configuration.
For more information about the protocols that are used to establish PWs, see MPLS L2VPN and VPLS in the MPLS Configuration Guide for your device.
¡ If no value is available for the Tunnel NHLFE IDs field, go to step 3.
¡ If the forwarding information for the PW is normal, go to step 4.
3. Execute the display mpls lsp command to check for the tunnel that carries the PW. The tunnel is an LSP with the FEC as the PW peer IP address. If it does not exist, establish the tunnel that carries the PW.
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
100.100.100.100/24 LDP -/1049 GE2/0/1
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Use the display diagnostic-information command to collect diagnostic information.
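The decision in step 2 of the Request time out procedure can be sketched in Python as follows. This is a minimal sketch that assumes the display l2vpn forwarding pw verbose output uses the field names shown in step 2. The function names and the returned strings are hypothetical.

```python
import re

LINK_RE = re.compile(r"^Link ID\s*:\s*(\d+)")
LABEL_RE = re.compile(r"In Label\s*:\s*(\S+)\s+Out Label\s*:\s*(\S+)")
NHLFE_RE = re.compile(r"^Tunnel NHLFE IDs\s*:\s*(.*)$")
MISSING = {None, "", "-"}


def parse_pw_entries(output):
    """Return one dict per PW forwarding entry (one per Link ID line)."""
    entries = []
    cur = None
    for raw in output.splitlines():
        line = raw.strip()
        m = LINK_RE.match(line)
        if m:
            cur = {"link_id": int(m.group(1)), "in_label": None,
                   "out_label": None, "nhlfe": None}
            entries.append(cur)
            continue
        if cur is None:
            continue
        m = LABEL_RE.search(line)
        if m:
            cur["in_label"], cur["out_label"] = m.group(1), m.group(2)
        m = NHLFE_RE.match(line)
        if m:
            cur["nhlfe"] = m.group(1).strip()
    return entries


def next_action(entry):
    """Apply the branching of step 2 to one parsed entry."""
    if entry["in_label"] in MISSING or entry["out_label"] in MISSING:
        return "check the signaling protocol configuration"
    if entry["nhlfe"] in MISSING:
        return "go to step 3"
    return "go to step 4"


sample = """Xconnect-group Name: xcg1
Connection Name: c1
Link ID: 0
In Label : 110126 Out Label: 130126
Tunnel NHLFE IDs : 3
VSI Name: aaa
Link ID: 8
In Label : 1272 Out Label: 1275
Tunnel NHLFE IDs:
"""
for e in parse_pw_entries(sample):
    print(e["link_id"], next_action(e))
```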
Related alarm and log messages
Alarm messages
None.
Log messages
· L2VPN/2/L2VPN_PWSTATE_CHANGE
· L2VPN/4/L2VPN_BGPVC_CONFLICT_LOCAL
· L2VPN/4/L2VPN_BGPVC_CONFLICT_REMOTE
· L2VPN/4/L2VPN_HARD_RESOURCE_NOENOUGH
· L2VPN/2/L2VPN_HARD_RESOURCE_RESTORE
· L2VPN/4/L2VPN_LABEL_DUPLICATE
Troubleshooting MPLS L3VPN issues
L3VPN traffic disrupted
Symptom
Private traffic forwarded through the MPLS L3VPN network gets disrupted.
Common causes
The following are the common causes of this type of issue:
· The next hop in the private network route is unreachable.
· Incorrect routing policy configuration prevents the route from being advertised and received.
· Private routes cannot be advertised because of insufficient label resources.
· The private network route does not point to a tunnel.
· Mismatch between export and import RTs prevents the device from learning routes into the private routing table.
· Incoming routes are discarded because the maximum number of routes has been reached.
Troubleshooting flow
Figure 80 shows the troubleshooting flowchart.
Figure 80 Troubleshooting flowchart for L3VPN traffic disruption
Solution
1. Verify that the route is the optimal one.
2. Execute the display bgp routing-table vpnv4 or display bgp routing-table vpnv6 command to verify that the BGP route to the VPNv4 or VPNv6 peer is optimal.
A route is optimal if it contains the greater than (>) symbol. The route to 100.1.2.0/24 in the following command output is an example of optimal routes.
<Sysname> display bgp routing-table vpnv4
BGP local router ID is 1.1.1.9
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
a – additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Total number of VPN routes: 8
Total number of routes from all PEs: 8
Route distinguisher: 100:1(vpn1)
Total number of routes: 6
Network NextHop MED LocPrf PrefVal Path/Ogn
* > 1.1.1.0/24 1.1.1.1 0 32768 ?
* 1.1.1.2/32 1.1.1.1 0 32768 ?
* > 100.1.2.0/24 100.1.1.1 0 100 0 400i
Take action depending on the command output.
¡ If the route is not optimal, use the display mpls lsp command to verify that the MPLS LFIB has an entry for the route of interest. If an LFIB entry is not available, enable MPLS and LDP on the public network interface towards the remote PE by executing the mpls enable and mpls ldp enable commands, respectively. This ensures that VPNv4 routes can be pointed to public LSPs. If the entry exists, proceed to step 2.
¡ If the route is optimal, proceed to step 2.
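The check for the greater-than sign can be scripted when many routes must be reviewed. The following Python snippet is a minimal sketch, not an H3C tool. It assumes IPv4 output laid out like the sample above, where the status codes precede the first IPv4 prefix on each row. The function name is_best_route is hypothetical.

```python
import re

# Matches the first IPv4 address or prefix on a line. In the sample output,
# everything before it on a route row is the status-code column.
_IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def is_best_route(line: str) -> bool:
    """Return True if the status codes before the prefix include '>'."""
    match = _IPV4.search(line)
    if match is None:
        return False  # not a route row (legend, blank line, and so on)
    status_codes = line[:match.start()]
    return ">" in status_codes


rows = [
    "* > 1.1.1.0/24 1.1.1.1 0 32768 ?",
    "* 1.1.1.2/32 1.1.1.1 0 32768 ?",
    "* > 100.1.2.0/24 100.1.1.1 0 100 0 400i",
]
for row in rows:
    print(is_best_route(row), row)
```

Rows for which the function returns False are the ones to investigate with the display mpls lsp command as described above.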
2. Verify the connectivity to the next hop in the private route.
Execute the display bgp routing-table vpnv4 ipv4-address [ mask | mask-length ] command on the local PE to check for the private route advertised by the remote PE. Specify the private route prefix for the ipv4-address argument.
¡ If the route does not exist, check for CE route advertisement issues. On the remote PE, execute the display bgp routing-table vpnv4 peer advertised-routes or display bgp routing-table vpnv6 peer advertised-routes command to verify that it has advertised private routes correctly to the local PE.
<Sysname> display bgp routing-table vpnv4 peer 22.22.22.22 advertised-routes
Total number of routes: 6
BGP local router ID is 11.11.11.11
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Route distinguisher: 1:1
Total number of routes: 3
Network NextHop MED LocPrf Path/Ogn
* >e 1.1.1.1/32 10.1.1.2 0 100 20i
* >e 7.7.7.7/32 10.1.1.2 0 100 20?
* >e 10.1.1.0/24 10.1.1.2 0 100 20?
If the private route does not exist, proceed to step 3.
¡ If the private route exists, verify that its next hop is reachable and its state is active.
Check the State field. If its value is valid, the route is active. Check the Original nexthop field. If it contains next hop information, the next hop in the route is reachable.
- If the private route is inactive, use the display ip routing-table vpn-instance vpn-instance-name ip-address command to check the IP routing table for a route to the BGP next hop in the Original nexthop field.
If such a route does not exist, the next hop in the private route is unreachable. Then, check the routing configuration for the public network between PEs.
If such a route exists, the BGP route next hop is reachable. Proceed to step 3.
- If the private route is active, proceed to step 3.
<sysname> display bgp routing-table vpnv4 6.0.0.9 32
BGP local router ID: 4.0.0.9
Local AS number: 200
Route distinguisher: 103:1
Total number of routes: 1
Paths: 1 available, 1 best
BGP routing table information of 6.0.0.9/32:
From : 3.0.0.9 (3.0.0.9)
Rely nexthop : 20.0.2.1
Original nexthop: 3.0.0.9
OutLabel : 24128
Ext-Community : <RT: 100:1>
RxPathID : 0x0
TxPathID : 0x0
AS-path : 300 103
Origin : igp
Attribute value : pref-val 0
State : valid, external, best
IP precedence : N/A
QoS local ID : N/A
Traffic index : N/A
Tunnel policy : tp1
Rely tunnel IDs : 2
3. Verify that the routing policy is correct.
Execute the display current-configuration configuration bgp command on both the route sender and receiver. Check the BGP configuration for import and export routing policies.
<sysname> display current-configuration configuration bgp
#
bgp 100
peer 1.1.1.1 as-number 100
peer 3.3.3.3 as-number 100
peer 3.3.3.3 connect-interface LoopBack1
#
address-family vpnv4
peer 3.3.3.3 enable
peer 3.3.3.3 route-policy in import
peer 3.3.3.3 route-policy out export
#
return
If the devices at both ends have import and export routing policies, check the policies for incorrect settings that filter out the private route.
If the devices do not have import or export routing policies, or if the routing policies do not filter out the private route, proceed to step 4.
4. Verify that the route can recurse to a tunnel.
On the remote PE (the route sender), execute the display bgp routing-table vpnv4 ipv4-address [ mask | mask-length ] command to verify that the VPNv4 route can recurse to the tunnel.
If the command output contains a value in the Rely tunnel IDs field, the route can recurse to the tunnel.
¡ If the route cannot recurse to the tunnel, see the troubleshooting procedure for the LDP LSP up failure issues.
¡ If the route recurses to a tunnel, proceed to step 5.
<sysname> display bgp routing-table vpnv4 6.0.0.9 32
BGP local router ID: 4.0.0.9
Local AS number: 200
Route distinguisher: 103:1
Total number of routes: 1
Paths: 1 available, 1 best
BGP routing table information of 6.0.0.9/32:
From : 3.0.0.9 (3.0.0.9)
Rely nexthop : 20.0.2.1
Original nexthop: 3.0.0.9
OutLabel : 24128
Ext-Community : <RT: 100:1>
RxPathID : 0x0
TxPathID : 0x0
AS-path : 300 103
Origin : igp
Attribute value : pref-val 0
State : valid, external, best
IP precedence : N/A
QoS local ID : N/A
Traffic index : N/A
Tunnel policy : tp1
Rely tunnel IDs : 2
5. Check for export RT and import RT mismatches. A mismatch between import and export RTs can prevent routes from being learned into private routing tables.
Execute the display bgp routing-table vpnv4 and display current-configuration configuration vpn-instance commands on the route sender (the local PE) and route receiver (the remote PE). Check for a mismatch between the export RT on the local PE and the import RT on the remote PE for the VPN instance. An RT mismatch can prevent the route from being learned into the remote VPN instance after it is sent to the remote PE.
Execute the display bgp routing-table vpnv4 and display ip extcommunity-list commands on the local PE to verify that the export RT for the VPN instance is not filtered out. If it is filtered out, the PE does not advertise the routes that match the export RT.
¡ If the export and import RTs do not match, execute the vpn-target command to reconfigure their settings for the VPN instance.
¡ If the routing policy filters out the export RT, execute the apply extcommunity rt command in routing policy view to add the export RT to the list of RT attributes set for the matching routes.
¡ If the export and import RTs match, or if the export RT is not filtered out by the routing policy, proceed to step 6.
Verify that the route carries the correct export RT attributes.
<sysname> display bgp routing-table vpnv4 6.0.0.9 32
BGP local router ID: 4.0.0.9
Local AS number: 200
Route distinguisher: 103:1
Total number of routes: 1
Paths: 1 available, 1 best
BGP routing table information of 6.0.0.9/32:
From : 3.0.0.9 (3.0.0.9)
Rely nexthop : 20.0.2.1
Original nexthop: 3.0.0.9
OutLabel : 24128
Ext-Community : <RT: 100:1>
RxPathID : 0x0
TxPathID : 0x0
AS-path : 300 103
Origin : igp
Attribute value : pref-val 0
State : valid, external, best
IP precedence : N/A
QoS local ID : N/A
Traffic index : N/A
Tunnel policy : tp1
Rely tunnel IDs : 2
Verify that the BGP extended community attribute list is correct.
<sysname> display ip extcommunity-list 1
Extended Community List Number 10
Deny rt: 100:1
Extended Community List Number 20
Permit rt: 200:1
Verify that the local device has correct import RT settings.
<sysname> display current-configuration configuration vpn-instance
#
ip vpn-instance vpn1
route-distinguisher 1:1
vpn-target 100:1 import-extcommunity
vpn-target 100:1 export-extcommunity
#
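The RT comparison in this step can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the input is the vpn-target lines copied from the display current-configuration configuration vpn-instance output of each PE, and the code assumes that a vpn-target line without a direction keyword applies to both import and export.

```python
DIRECTIONS = {"both", "export-extcommunity", "import-extcommunity"}


def parse_vpn_targets(config_lines):
    """Collect export and import RTs from the vpn-target lines of one VPN instance."""
    export_rts, import_rts = set(), set()
    for raw in config_lines:
        tokens = raw.split()
        if not tokens or tokens[0] != "vpn-target":
            continue
        values = tokens[1:]
        # Assumption: a missing direction keyword means "both".
        direction = "both"
        if values and values[-1] in DIRECTIONS:
            direction = values.pop()
        if direction in ("both", "export-extcommunity"):
            export_rts.update(values)
        if direction in ("both", "import-extcommunity"):
            import_rts.update(values)
    return export_rts, import_rts


def rts_match(sender_export, receiver_import):
    """Return True if at least one export RT of the sender is an import RT of the receiver."""
    return bool(set(sender_export) & set(receiver_import))


sender_cfg = [" vpn-target 100:1 export-extcommunity"]
receiver_cfg = [" vpn-target 100:1 import-extcommunity"]
sender_export, _ = parse_vpn_targets(sender_cfg)
_, receiver_import = parse_vpn_targets(receiver_cfg)
print(rts_match(sender_export, receiver_import))
```

A result of False corresponds to the mismatch case above, which is resolved with the vpn-target command.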
6. Check for insufficient MPLS label resources.
Execute the display mpls interface command on the route sender (the local PE) to verify that MPLS is enabled on the public network interface connected to the remote PE.
¡ If the command output contains the public interface connected to the remote PE, MPLS is enabled on the interface.
¡ If the command output does not contain the public network interface connected to the remote PE, execute the mpls enable command in interface view to enable MPLS on it.
<Sysname> display mpls interface
Interface Status MPLS MTU
GE2/0/1 Up 1500
GE2/0/2 Up 1500
Execute the display bgp routing-table vpnv4 advertise-info command to identify the state of label allocation for advertised routes.
¡ If the Inlabel field in the command output for a route is empty, the system might have failed to allocate a label to the route because of insufficient label resources. To conserve label resources:
- Execute the apply-label per-instance command to enable allocation of one label for the entire VPN instance.
- Use route summarization to reduce the number of routes.
¡ If the Inlabel field in the command output has a reasonable value, label resources are sufficient and a label has been allocated to the route. Proceed to step 7.
<Sysname> display bgp routing-table vpnv4 10.1.1.0 24 advertise-info
BGP local router ID: 1.1.1.9
Local AS number: 100
Route distinguisher: 100:1
Total number of routes: 1
Paths: 1 best
BGP routing table information of 10.1.1.0/24(TxPathID:0):
Advertised to VPN peers (1 in total):
3.3.3.9
Inlabel : 1279
7. Check for insufficient route entry resources.
Execute the display bgp peer vpnv4 log-info command to view the log for the BGP peer. If the command output contains the Cease/maximum number of VPNv4 prefixes reached message, the number of IPv4 VPN routes has reached the limit.
<Sysname> display bgp peer vpnv4 1.1.1.1 log-info
Peer : 1.1.1.1
Date Time State Notification
Error/SubError
06-Feb-2013 22:54:42 Down Send notification with error 6/1
Cease/maximum number of VPNv4 prefixes reached
In addition, if the number of routes has exceeded the limit, the device generates log messages similar to the following sample messages:
BGP/4/BGP_EXCEED_ROUTE_LIMIT: BGP default.vpn1: The number of routes (101) from peer 1.1.1.1 (IPv4-UNC) exceeds the limit 100.
BGP/4/BGP_REACHED_THRESHOLD: BGP default.vpn1: The ratio of the number of routes (3) received from peer 1.1.1.1 (IPv4-UNC) to the number of allowed routes (2) has reached the threshold (75%).
¡ If the number of routes exceeds the limit, execute the peer route-limit command in VPNv4 address family view or VPNv6 address family view on the route receiver to increase the maximum number of routes that the device can receive from its peers.
¡ If the number of routes does not exceed the limit, proceed to step 8.
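The relationship between the received route count, the limit, and the alarm threshold reported in the sample log messages can be illustrated with simple arithmetic. The following Python sketch is an assumption-based illustration, not the device's implementation. The function name and the returned strings are hypothetical, and the comparison rules (strictly greater than the limit, ratio at or above the threshold) are inferred from the wording of the sample messages.

```python
def route_limit_status(received: int, limit: int, threshold_percent: float) -> str:
    """Classify a peer's received route count against its limit and alarm threshold."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    if received > limit:
        # Corresponds to the BGP_EXCEED_ROUTE_LIMIT sample message.
        return "exceeded"
    ratio_percent = received * 100 / limit
    if ratio_percent >= threshold_percent:
        # Corresponds to the BGP_REACHED_THRESHOLD sample message.
        return "threshold reached"
    return "ok"


# 101 routes against a limit of 100, as in the first sample message.
print(route_limit_status(101, 100, 75))
```

If the status for a peer is not ok, consider raising the limit with the peer route-limit command as described above.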
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module name: BGP4-MIB
bgpBackwardTransition (1.3.6.1.2.1.15.7.2)
Log messages
· BGP_EXCEED_ROUTE_LIMIT
· BGP_REACHED_THRESHOLD
L3VPN private route flapping
Symptom
The private routes received from a remote PE flap on the local PE.
Common causes
The following are the common causes of this type of issue:
· Public route flapping.
· LDP LSP flapping.
· Interface flapping.
Troubleshooting flow
Figure 81 shows the troubleshooting flowchart.
Figure 81 Troubleshooting flowchart for L3VPN private route flapping issues
Solution
1. Check for public route flapping issues.
a. Identify the route type.
Execute the display ip routing-table command to identify the route type.
Take the following command output for example.
The Proto field displays IS_L1, indicating that the route type is IS-IS.
The Interface field displays Tun1, indicating that LDP over MPLS TE is deployed.
<Sysname> display ip routing-table 1.1.1.1
Summary count : 1
Destination/Mask Proto Pre Cost NextHop Interface
1.1.1.1/32 IS_L1 15 10 1.1.1.1 Tun1
b. Check for the route flapping issue.
Determine whether the route is flapping based on the route type. Take an IS-IS route for example. Execute the display ip routing-table protocol isis command to view route information. If the route continuously alternates between the visible and invisible states, route flapping has occurred.
- If the route is flapping, see the troubleshooting procedures for the OSPF neighbor down, OSPFv3 neighbor down, or IS-IS route flapping issue.
- If the route is not flapping, proceed to step 2.
2. Check for the LDP LSP flapping issue.
As a best practice, execute the display mpls ldp peer command once per second, 5 to 10 times. Examine the State field in the command output. If the value changes between Operational and other states, the LDP session is flapping, causing LDP LSP flapping.
¡ If the LDP LSP is flapping, see the procedure for troubleshooting the LDP LSP flapping issue.
¡ If the LDP LSP is not flapping, proceed to step 3.
<Sysname> display mpls ldp peer
VPN instance: public instance
Total number of peers: 1
Peer LDP ID State Role GR AUT KA Sent/Rcvd
1.1.1.1:0 Operational Active Off None 298/298
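The repeated sampling of the State field can be evaluated with a short script. The following Python sketch assumes that the State values for one peer have already been collected into a list, one value per execution of the display mpls ldp peer command. The function names and the min_changes parameter are hypothetical, and the non-Operational state name in the example is for illustration only.

```python
def count_state_changes(samples):
    """Count transitions into or out of the Operational state."""
    changes = 0
    for previous, current in zip(samples, samples[1:]):
        if previous != current and "Operational" in (previous, current):
            changes += 1
    return changes


def is_ldp_session_flapping(samples, min_changes=1):
    """Treat the session as flapping if the State value changed at least min_changes times."""
    return count_state_changes(samples) >= min_changes


# Ten samples taken one second apart (illustrative values).
samples = ["Operational"] * 4 + ["Initialized"] * 2 + ["Operational"] * 4
print(is_ldp_session_flapping(samples))
```

Raising min_changes to 2 avoids reporting a session that is merely coming up for the first time during the sampling window. That threshold is a design choice of this sketch, not a documented value.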
3. Check for the interface flapping issue.
Execute the display interface brief command, and then examine the Link and Protocol fields in the command output. If the values in both fields are Up, the interface is up. Otherwise, the interface is down. If the interface state continuously alternates between up and down, the interface is flapping.
¡ If the interface is flapping, see the troubleshooting procedure for the interface not up issue.
¡ If the interface is not flapping, proceed to step 4.
<Sysname> display interface gigabitethernet 2/0/1 brief
Brief information on interfaces in route mode:
Link: ADM - administratively down; Stby - standby
Protocol: (s) - spoofing
Interface Link Protocol Primary IP Description
GE2/0/1 UP UP --
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
VPN route exchange failure between PEs
Symptom
PEs cannot exchange VPNv4 or VPNv6 routes.
Common causes
The following are the common causes of this type of issue:
· Public IGP routes are not advertised.
· No public LSPs are available.
· BGP peer relationships are not established.
· VPNv4 or VPNv6 routes are not learned.
Troubleshooting flow
Figure 82 shows the troubleshooting flowchart.
Figure 82 Troubleshooting flowchart for private route exchange failure between PEs
Solution
1. Verify that an IGP route is available.
Execute the display ip routing-table command to verify that the local PE has a subnet route to the LSR ID (typically, the IP address of a Loopback interface) of the remote PE.
<Sysname> display ip routing-table 1.1.1.1
Summary count : 1
Destination/Mask Proto Pre Cost NextHop Interface
1.1.1.2/32 IS_L1 15 10 1.1.1.1 LoopBack1
¡ If such a route does not exist, make sure an IGP protocol is enabled on the Loopback interface and the public network interface on each PE. This ensures correct advertisement of subnet routes between them.
¡ If such a route exists, proceed to step 2.
2. Verify that a public LSP is available.
Execute the display mpls lsp command to check for a public LSP to the remote PE's Loopback interface.
¡ If such an LSP is not present, enable MPLS and MPLS LDP on the public network interface to ensure the establishment of a public LSP.
¡ If such an LSP exists, proceed to step 3.
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
1.1.1.2/32 LDP -/1049 GE2/0/1
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
1.1.1.1/24 LDP -/1051 GE2/0/1
Execute the display mpls ldp peer verbose command to verify that an LDP session has been successfully established.
¡ If the State field in the command output displays anything other than Operational, the LDP session has not been established. To resolve the issue, see the procedure that troubleshoots the failure of an LDP session to come up.
¡ If the State field in the command output displays Operational, the LDP session has been established. Proceed to step 3.
<Sysname> display mpls ldp peer verbose
VPN instance: public instance
Peer LDP ID : 1.1.1.1:0
Local LDP ID : 2.2.2.2:0
TCP Connection : 2.2.2.2:14080 -> 1.1.1.1:646
Session State : Operational Session Role : Active
Session Up Time : 0000:00:14 (DD:HH:MM)
…
3. Verify that a BGP peer relationship has been established.
Execute the display bgp peer vpnv4 command to view the BGP VPNv4 peer relationships between PEs, and execute the display bgp peer ipv4 vpn-instance command to view the BGP peer relationships between PEs and CEs.
¡ If a BGP peer relationship is not present, or if the State field does not display Established, the BGP peer relationship has not been established. See the procedure that troubleshoots BGP neighbor establishment failures to resolve the issue.
¡ If the State field displays Established, the BGP peer relationship has been established. Proceed to step 4.
<Sysname> display bgp peer vpnv4
BGP local router ID: 192.168.100.1
Local AS number: 100
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
1.1.1.2 200 13 16 0 0 00:10:34 Established
<Sysname> display bgp peer ipv4 vpn-instance vpn1
BGP local router ID: 1.1.1.1
Local AS number: 100
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
10.1.1.1 65410 5 4 0 1 00:01:19 Established
4. Verify that the private route is operating correctly.
Execute the display ip routing-table vpn-instance command to check for private route issues.
¡ If the mask for the private route is not 32 bits and the route was discovered by a protocol other than BGP, the IP addresses of the Loopback interfaces on the peer PEs are in the same subnet. The device will prefer the direct route over the private route. To resolve this issue, change the IP address of the Loopback interface on each PE and set the mask to 32 bits.
¡ If the private route has a 32-bit mask and was discovered by BGP, the route is correct. Proceed to step 5.
<Sysname> display ip routing-table vpn-instance vpn1
Summary count : 1
Destination/Mask Proto Pre Cost NextHop Interface
1.1.1.0/24 Direct 0 0 1.1.1.1 LoopBack1
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Communication failure between VPN instances with matching RTs
Symptom
The network shown in Figure 83 deploys MPLS L3VPN services. On this network, CE 1 and CE 3 belong to VPN 1, and CE 2 belongs to VPN 2. To enable connectivity between VPN 1 and VPN 2, matching RTs were configured on them.
Despite this, CE 2 cannot ping CE 3 at IP address 3.3.3.3 in a different VPN, even though CE 1 can successfully ping CE 3.
Common causes
In this scenario, CE 1 can ping CE 3 in the same VPN, indicating that the public tunnel for label forwarding in the MPLS backbone network functions correctly. The failure is most likely caused by the IP conflict between interfaces assigned to different VPN instances.
Troubleshooting flow
Figure 84 shows the troubleshooting flowchart.
Solution
1. Check for IP conflict between interfaces on the PE.
Execute the display ip interface brief command on PE 1 to view the IP addresses of interfaces on it.
<Sysname> display ip interface brief
*down: administratively down
(s): spoofing (l): loopback
Interface Physical Protocol IP Address/Mask VPN instance Description
...
GE2/0/1 up up 10.1.1.1/24 vpn1 --
GE2/0/2 up up 10.1.1.1/24 vpn2 --
...
If the interfaces in different VPN instances are assigned IP addresses from different subnets, proceed to step 2.
If two interfaces in different VPN instances on the PE have the same IP address or IP addresses from the same subnet, reassign an IP address to one of the interfaces. Make sure the two interfaces are on different subnets. Then, change the IP address of the CE interface connected to the PE interface whose address you changed, and reconfigure routing between the PE and the CE.
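On a PE with many interfaces, the subnet comparison can be done with the Python standard ipaddress module. The following snippet is a minimal sketch. It assumes that the interface name, IP address/mask, and VPN instance columns have already been extracted from the display ip interface brief output into tuples. The function name is hypothetical.

```python
import ipaddress
from itertools import combinations


def find_vpn_subnet_conflicts(interfaces):
    """Return pairs of interfaces in different VPN instances whose subnets overlap.

    interfaces: iterable of (name, "address/mask-length", vpn_instance) tuples.
    """
    conflicts = []
    for (name_a, addr_a, vpn_a), (name_b, addr_b, vpn_b) in combinations(interfaces, 2):
        if vpn_a == vpn_b:
            continue  # only interfaces in different VPN instances are compared
        net_a = ipaddress.ip_interface(addr_a).network
        net_b = ipaddress.ip_interface(addr_b).network
        # IPv4 and IPv6 networks cannot be compared with overlaps().
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            conflicts.append((name_a, name_b))
    return conflicts


# Values taken from the sample command output above.
interfaces = [
    ("GE2/0/1", "10.1.1.1/24", "vpn1"),
    ("GE2/0/2", "10.1.1.1/24", "vpn2"),
]
print(find_vpn_subnet_conflicts(interfaces))
```

An empty list means that no two interfaces in different VPN instances share a subnet.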
BGP redistributes routes between VPN instances whose RTs match. If interfaces in different VPN instances are assigned the same IP address, the BGP routing table contains two routes for the same destination address, and BGP selects the better of the two. See the following sample command output:
<Sysname> display bgp routing-table vpnv4
BGP local router ID is 11.11.11.11
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Total number of VPN routes: 11
Total number of routes from all PEs: 2
Route distinguisher: 1:1(vpn1)
Total number of routes: 6
Network NextHop MED LocPrf PrefVal Path/Ogn
* >e 1.1.1.1/32 10.1.1.2 0 0 20i
* >e 2.2.2.2/32 10.1.1.2 0 0 30i
* >i 3.3.3.3/32 22.22.22.22 0 100 0 40i
* >e 10.1.1.0/24 10.1.1.2 0 0 20?
* >i 30.1.1.0/24 22.22.22.22 0 100 0 40?
Route distinguisher: 2:2(vpn2)
Total number of routes: 5
Network NextHop MED LocPrf PrefVal Path/Ogn
* >e 1.1.1.1/32 10.1.1.2 0 0 20i
* >e 2.2.2.2/32 10.1.1.2 0 0 30i
* >i 3.3.3.3/32 22.22.22.22 0 100 0 40i
* >e 10.1.1.0/24 10.1.1.2 0 0 20?
* e 10.1.1.2 0 0 30?
* >i 30.1.1.0/24 22.22.22.22 0 100 0 40?
In the BGP routing table for VPN 2 (with an RD of 2:2), the optimal route selected based on the AS_PATH attribute for subnet 10.1.1.0 originates from VPN 1. Then, PE 1 will send the traffic intended to be sent from VPN 1 to VPN 2 out of interface GigabitEthernet2/0/1 in VPN 1 instead of interface GigabitEthernet2/0/2 in VPN 2. This will cause an inter-VPN communication failure.
To ensure correct traffic forwarding between two VPN instances, make sure the PE and CE interfaces in one VPN instance are on a different subnet than the PE and CE interfaces in another VPN instance. For example, PE 1 and its attached CE establish an EBGP session to exchange routes. Use the following procedure to change the IP address of the CE-attached PE interface on PE 1:
a. In system view, execute the interface command to enter the view of the interface associated with the target VPN instance.
b. Execute the ip address command to change the IP address of the target interface.
c. In system view, execute the bgp command to enter BGP instance view.
d. Execute the ip vpn-instance command to enter BGP-VPN instance view.
e. Execute the undo peer command to delete the BGP peer relationships established with the conflicting IP addresses.
f. Execute the peer as-number command to add the CE as an EBGP peer at its new IP address.
g. Execute the address-family ipv4 unicast command to enter BGP IPv4 unicast address family view.
h. Execute the peer enable command to enable BGP to exchange BGP IPv4 unicast routing information with the CE specified as a BGP peer.
Take IP reassignment only for interfaces in VPN 2 for example. On CE 2, perform the following steps:
a. In system view, execute the interface command to enter the view of the interface connected to PE 1.
b. Execute the ip address command to change the IP address of the target interface.
c. In system view, execute the bgp command to enter BGP instance view.
d. Execute the undo peer command to delete the BGP peer relationships established with the conflicting IP addresses.
e. Execute the peer as-number command to add the PE as an EBGP peer at its new IP address.
f. Execute the address-family ipv4 unicast command to enter BGP IPv4 unicast address family view.
g. Execute the peer enable command to enable BGP to exchange BGP IPv4 unicast routing information with the PE specified as a BGP peer.
h. Execute the import-route or network command to advertise routing information for the VPN instance.
If the issue persists, proceed to step 2.
2. Collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
PEs unable to learn routes because of VPN route target filtering on the route reflector (RR)
Symptom
The RR does not reflect the MVPN routes, VPNv4 or VPNv6 routes, BGP L2VPN information, VPN Flowspec routes, or EVPN routes announced by one PE to other PEs, as expected.
Common causes
By default, an RR filters MVPN routes, VPNv4 or VPNv6 routes, BGP L2VPN information, VPN Flowspec routes, and EVPN routes based on VPN route targets. The RR adds a route to the routing table only if one of the export RT attributes in the route matches a local import RT. If no match is found, the RR discards the route, without forwarding the route to remote PEs.
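The filtering rule described above can be expressed as a small decision function. The following Python sketch only models the behavior stated in this section. It is not the device's implementation, and the function and parameter names are hypothetical.

```python
def rr_keeps_route(route_rts, local_import_rts, rt_filtering=True):
    """Model whether an RR keeps (and can therefore reflect) a received VPN route.

    route_rts: export RT attributes carried in the route.
    local_import_rts: import RTs configured locally on the RR.
    rt_filtering: False models the effect of the undo policy vpn-target command.
    """
    if not rt_filtering:
        return True  # filtering disabled: routes are kept regardless of their RTs
    return bool(set(route_rts) & set(local_import_rts))


# An RR without a matching local import RT discards the route by default.
print(rr_keeps_route({"100:1"}, set()))
# With VPN route target filtering disabled, the same route is kept.
print(rr_keeps_route({"100:1"}, set(), rt_filtering=False))
```

Because an RR typically has no VPN instances that import the reflected routes, the default case returns False in this model, which matches the symptom.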
Troubleshooting flow
To resolve this issue, disable VPN route target filtering on the RR. Figure 85 shows the troubleshooting flowchart.
Solution
1. Check the configuration for the affected address family. Make sure route target filtering has been disabled by using the undo policy vpn-target command.
a. In BGP instance view, execute the display this command to check the configuration in each address family view for the undo policy vpn-target command. If the command does not exist, proceed to step b. If the command exists, proceed to step 2.
b. Enter the view of the affected address family and execute the undo policy vpn-target command to disable VPN route target filtering, allowing the RR to forward routes with mismatching RTs. If the issue persists, proceed to step 2.
2. Collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
A private IP routing table on a PE does not contain routes announced by a remote PE
Symptom
In an IPv4 or IPv6 MPLS L3VPN network, communication failure occurs between CEs because the IP routing table for a VPN instance on a PE lacks private routes to the site attached to a remote PE.
Common causes
The following are the common causes of this type of issue:
· The BGP session with the remote PE is not in Established state.
· The remote PE has not advertised private routes.
· A public tunnel has not been established.
· The local PE discards the private routes sent by the remote PE.
· The private routes advertised by the remote PE are in the local BGP routing table. However, they are not added to the IP routing table for the VPN instance.
Troubleshooting flow
Figure 86 shows the troubleshooting flowchart.
Solution
1. Verify that a BGP peer relationship has been established.
Execute the display bgp peer vpnv4 or display bgp peer vpnv6 command to verify that the local and remote PEs have established a BGP session and that the session is in Established state.
<Sysname> display bgp peer vpnv4
BGP local router ID: 11.11.11.11
Local AS number: 10
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
22.22.22.22 10 82 69 0 2 01:01:28 Established
¡ If the PEs have established a BGP peer relationship, proceed to step 2.
¡ If the PEs have not established a BGP peer relationship, see the BGP session establishment failure troubleshooting procedure in the part for troubleshooting IP routing issues. If the issue persists after the BGP session changes to the Established state, proceed to step 2.
2. Verify that the remote PE has advertised private routes to the local PE.
On the remote PE, execute the display bgp routing-table vpnv4 peer advertised-routes or display bgp routing-table vpnv6 peer advertised-routes command to verify that it has advertised private routes to the local PE.
<Sysname> display bgp routing-table vpnv4 peer 22.22.22.22 advertised-routes
Total number of routes: 6
BGP local router ID is 11.11.11.11
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Route distinguisher: 1:1
Total number of routes: 3
Network NextHop MED LocPrf Path/Ogn
* >e 1.1.1.1/32 10.1.1.2 0 100 20i
* >e 7.7.7.7/32 10.1.1.2 0 100 20?
* >e 10.1.1.0/24 10.1.1.2 0 100 20?
Route distinguisher: 2:2
Total number of routes: 3
Network NextHop MED LocPrf Path/Ogn
* >e 2.2.2.2/32 10.1.1.2 0 100 30i
* >e 7.7.7.7/32 10.1.1.2 0 100 30?
* >e 10.1.1.0/24 10.1.1.2 0 100 30?
If the information of interest exists, proceed to step 3. If the information of interest does not exist, proceed with the following checks:
a. Execute the display bgp routing-table vpnv4 or display bgp routing-table vpnv6 command on the remote PE to check for the private routes of interest.
- If the information of interest exists, proceed to step b.
- If the information of interest does not exist, see the IP routing troubleshooting part to check the routing configuration between the PE and the CE. Many routing protocols are available for PEs and CEs to exchange route information, including static routing, RIP, OSPF, OSPFv3, IS-IS, and BGP. Identify the troubleshooting procedure depending on the routing protocol used between the PE and the CE. If the issue persists after the private route has been injected into the BGP routing table on the remote PE, proceed to step b.
b. Execute the display this command in BGP VPNv4 or BGP VPNv6 address family view on the remote PE. Check for route filtering misconfigurations that might prevent the private routes from being advertised. The following are the commands for route filtering:
- peer prefix-list export
- peer filter-policy export
- peer as-path-acl export
- filter-policy export
- peer route-policy export
To prevent a route export filtering command from incorrectly filtering private routes to be advertised, execute the undo form of that command. To avoid unexpected impacts on network services, adjust the private route export filtering policy under technical support guidance.
If the issue persists, proceed to step 3.
3. Verify that a public tunnel has been established.
The public tunnel for MPLS L3VPN can be an LSP, MPLS TE, or GRE tunnel. For an LSP or MPLS TE tunnel, the outer tag is an MPLS label. For a GRE public tunnel, the outer tag is GRE encapsulation.
A public tunnel is typically a label forwarding path automatically established by using LDP. The following information uses this type of public tunnel for example to describe the troubleshooting procedure for public tunnel establishment. For tunnels established by using other methods, see their respective troubleshooting procedures or seek help from Technical Support.
Execute the display mpls ldp peer command on each device in the private route advertisement path in the backbone network. Verify that they have established sessions with their LDP peers.
<Sysname> display mpls ldp peer
VPN instance: public instance
Total number of peers: 2
Peer LDP ID State Role GR AUT KA Sent/Rcvd
22.22.22.22:0 Operational Passive Off None 1816/1816
11.11.11.11:0 Operational Passive Off None 1816/1816
If the sessions have been successfully established, proceed to step 4.
If an LDP session is not established, see the LDP session down troubleshooting procedure in the MPLS troubleshooting part.
If the issue persists after the public tunnel is established, proceed to step 4.
4. Check the BGP routing table on the local PE for private routes advertised by the remote PE.
Execute the display bgp routing-table vpnv4 or display bgp routing-table vpnv6 command on the local PE to check for private routes advertised by the remote PE.
If the information of interest does not exist, perform the following operations:
a. Execute the display ip vpn-instance instance-name command on both the local and remote PEs to check for import and export RT mismatches for the VPN.
<Sysname> display ip vpn-instance instance-name vpn1
VPN-Instance Name and Index : vpn1, 1
Route Distinguisher : 1:1
Interfaces : GigabitEthernet2/0/1
TTL mode: pipe
Address-family IPv4:
Export VPN Targets :
1:1
Import VPN Targets :
1:1
- If an import and export RT mismatch exists, execute the vpn-target command in VPN instance view to change the RT settings on the local or remote PE. If the BGP routing table on the local PE still lacks the private routes advertised by the remote PE, proceed to step b. If the issue persists even if the BGP routing table on the local PE already contains the private routes advertised by the remote PE, proceed to step 5.
- If the import and export RTs match, proceed to step b.
b. Execute the display this command in BGP instance view. Check for the import route filtering misconfiguration that prevents the private routes from being imported. The following are the commands for route filtering:
- peer prefix-list import
- peer filter-policy import
- peer as-path-acl import
- filter-policy import
- peer route-policy import
To prevent a route import filtering command from incorrectly filtering received private routes, execute the undo form of that command. To avoid unexpected impacts on network services, adjust the private route import filtering policy under technical support guidance.
If the issue persists, proceed to step 5.
5. Identify the reason that prevents the BGP routes from being added to the IP routing table for the VPN instance. Possible reasons include:
¡ The device is configured with the undo policy vpn-target command. This command enables the device to add VPNv4 or VPNv6 routes to the BGP routing table for the VPN instance and select them as optimal routes, even if they do not match the VPN instance’s RT attributes. However, these routes cannot be added to the IP routing table for the current VPN instance. To resolve this issue, execute the display this command in BGP instance view to identify the address families configured with the undo policy vpn-target command. If an address family is configured with that command, execute the policy vpn-target command in the view of that address family.
¡ The device is configured with the routing-table bgp-rib-only command, which prevents BGP routes from being injected into the IP routing table. To resolve this issue, execute the display this command in BGP instance view to identify the address families configured with the routing-table bgp-rib-only command. If an address family is configured with that command, execute the undo routing-table bgp-rib-only command to resolve the issue.
If the issue persists, proceed to step 6.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
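The RT comparison in step 4 can be pictured as a set intersection. The following Python sketch is illustrative only: the function name and the input lists are hypothetical, and the RT values mirror the sample display ip vpn-instance output. It assumes that a VPNv4 or VPNv6 route is accepted into a VPN instance only if at least one export RT set by the remote PE equals one import RT of the local VPN instance.

```python
def rt_mismatch(remote_export_rts, local_import_rts):
    """Return True if no export RT of the remote PE matches an import RT of the local PE."""
    return not (set(remote_export_rts) & set(local_import_rts))


# Hypothetical values, copied by hand from display ip vpn-instance on each PE.
local_import = ["1:1"]
print(rt_mismatch(["1:1"], local_import))  # False: the RTs match
print(rt_mismatch(["2:2"], local_import))  # True: change the RT settings with vpn-target
```

Run the comparison in both directions (remote export against local import, and local export against remote import), because routes must be exchanged both ways for inter-site traffic to work.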
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failure to forward large packets between sites
Symptom
On an IPv4 or IPv6 MPLS L3VPN network that contains devices from H3C and other vendors, inter-site access to resources in the same VPN might fail. For example, users in one site cannot access certain websites or download files through FTP from another site. A ping test fails when the ICMP payload exceeds 1464 bytes and succeeds when the ICMP payload is smaller than 1464 bytes.
Common causes
This type of failure typically occurs when a small MTU is set on one or multiple network interfaces in the traffic forwarding path.
Troubleshooting flow
Figure 87 shows the troubleshooting flowchart.
Figure 87 Troubleshooting flowchart for failures to forward large packets between sites
Procedure
1. Set the MTU on each network interface in the traffic forwarding path to 1508 bytes or higher.
¡ On an H3C device, execute the display interface command to view the MTUs of interfaces.
<Sysname> display interface gigabitethernet 2/0/1
GigabitEthernet2/0/1
Current state: Administratively UP
Line protocol state: UP
Description: GigabitEthernet2/0/1 Interface
Bandwidth: 1000000 kbps
Maximum transmission unit: 1500
...
To change the MTU of an interface, execute the ip mtu or ipv6 mtu command in interface view.
¡ For information about the commands used on a device from a third-party vendor, see the documentation for that device.
If the issue persists, proceed to step 2.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
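The 1464-byte threshold in the symptom and the 1508-byte MTU in step 1 are consistent with each other under some common assumptions. The following Python sketch shows the arithmetic. It assumes IPv4 without options, a two-label MPLS stack (one public tunnel label and one VPN label), and an interface MTU of 1500 bytes that also has to accommodate the labels. These assumptions and the function names are illustrative and are not taken from the device documentation.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header
MPLS_LABEL = 4    # bytes per MPLS label stack entry


def max_icmp_payload(mtu, labels=2):
    """Largest ICMP echo payload that fits in one labeled packet (IPv4)."""
    return mtu - labels * MPLS_LABEL - IPV4_HEADER - ICMP_HEADER


def required_mtu(ip_packet_size=1500, labels=2):
    """MTU needed to carry a full-size IP packet plus its label stack."""
    return ip_packet_size + labels * MPLS_LABEL


print(max_icmp_payload(1500))  # 1464
print(required_mtu(1500))      # 1508
```

If the forwarding path uses a deeper label stack, for example because of additional tunnel labels, the required MTU increases by 4 bytes per extra label.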
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failure to ping the subnet attached to a remote CE from a PE
Symptom
On the IPv6 MPLS L3VPN network shown in Figure 88, multiple interfaces on PE 1 are assigned to VPN instance VPN 1. Executing the ping ipv6 2001:db8:3::1 command on both CE 1 and CE 2 successfully pings the subnet attached to remote CE 3. However, executing the ping ipv6 -vpn-instance vpn1 2001:db8:3::1 command on PE 1 cannot ping the subnet attached to CE 3.
Common causes
This issue typically occurs when CE 3 lacks routes to some private IPv6 addresses on PE 1. To resolve this issue, CE 3 must have routes to the IPv6 addresses of all up interfaces on PE 1 that belong to the same VPN as CE 3.
Troubleshooting flow
Figure 89 shows the troubleshooting flowchart.
Procedure
1. Make sure CE 3 has routes to all private IPv6 addresses on PE 1.
When you ping a remote CE-attached subnet from PE 1 without specifying a source address, PE 1 sends ICMPv6 echo requests with a source address automatically selected from the IPv6 addresses on the packet outgoing interface. If CE 3 lacks a route to this IPv6 address, it cannot return ICMPv6 echo replies.
To resolve this issue:
¡ Configure PE 1 to advertise all its private routes. For example, execute the import-route direct command in BGP-VPN IPv6 unicast address family view.
¡ Execute the ping ipv6 -a source-ipv6 -vpn-instance vpn-instance-name host command to perform a ping operation with a source IP address specified. Make sure this address exists in the IPv6 routing table on CE 3.
If the issue persists, proceed to step 2.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
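The check in step 1 amounts to testing whether each private IPv6 address on PE 1 is covered by a prefix in the routing table on CE 3. The following Python sketch illustrates that test with the standard ipaddress module. The addresses, prefixes, and function name are hypothetical examples and must be replaced with values collected from the devices.

```python
import ipaddress


def uncovered_sources(pe_addresses, ce_routes):
    """Return the PE addresses that no CE route prefix covers."""
    networks = [ipaddress.ip_network(route) for route in ce_routes]
    missing = []
    for address in pe_addresses:
        ip = ipaddress.ip_address(address)
        if not any(ip in network for network in networks):
            missing.append(address)
    return missing


# Hypothetical data: interface addresses in vpn1 on PE 1, and prefixes known to CE 3.
pe1_addresses = ["2001:db8:1::1", "2001:db8:2::1", "2001:db8:10::1"]
ce3_routes = ["2001:db8:1::/64", "2001:db8:2::/64"]
print(uncovered_sources(pe1_addresses, ce3_routes))  # ['2001:db8:10::1']
```

Any address in the result is a source address that CE 3 cannot reply to. Either advertise the corresponding route or specify a covered address as the ping source.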
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
MPLS TE issues
MPLS TE tunnel down
Symptom
After an MPLS TE tunnel is created, the display interface tunnel command shows that the tunnel's current state is DOWN.
<Sysname> display interface tunnel 1
Tunnel1
Current state: DOWN
Line protocol state: DOWN
Description: Tunnel1 Interface
Bandwidth: 64kbps
Maximum transmission unit: 1496
Internet address: 7.1.1.1/24 (primary)
Tunnel source unknown, destination 4.4.4.9
Tunnel TTL 255
Tunnel protocol/transport CR_LSP
Last clearing of counters: Never
Last 300 seconds input rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec
Last 300 seconds output rate: 6 bytes/sec, 48 bits/sec, 0 packets/sec
Input: 0 packets, 0 bytes, 0 drops
Output: 177 packets, 11428 bytes, 0 drops
Common causes
The following are the common causes of this type of issue:
· The link where the MPLS TE tunnel is located is down.
· The configuration for the MPLS TE tunnel is incorrect.
· The destination address of the MPLS TE tunnel is referenced by a static route.
Analysis
Figure 90 shows the troubleshooting flowchart.
Figure 90 Flowchart for troubleshooting MPLS TE tunnel down
Solution
To resolve the issue:
1. Verify that the MPLS TE tunnel’s output interface on the device is in up state.
Execute the display interface command to view the state of the output interface for the MPLS TE tunnel. Make sure the output interface is in up state.
2. Verify that the MPLS TE configuration is correct.
Check the following settings in sequence:
a. Make sure the mpls te enable command is configured in the OSPF/IS-IS area and on the interfaces that the MPLS TE tunnel passes through.
b. Make sure the LSR ID and Router ID are the same Loopback interface address.
c. If the MPLS TE tunnel is established using RSVP-TE, make sure the device and interfaces are configured with the rsvp and rsvp enable commands.
d. If the mpls te bandwidth command is configured on the tunnel interface, make sure the device's output interface is configured with the mpls te max-link-bandwidth and mpls te max-reservable-bandwidth commands.
e. If the mpls te affinity-attribute command is configured on the tunnel interface, make sure the mpls te link-attribute command is configured properly on the output interface. To ensure a link can be used by a tunnel, the following requirements must be met:
- The link attribute bits corresponding to the 1 bits in the affinity mask are checked as follows: The link attribute bits corresponding to the 1 bits of the affinity attribute must have a minimum of one bit set to 1. The link attribute bits corresponding to the 0 bits of the affinity attribute must have no bit set to 1.
- The link attribute bits corresponding to the 0 bits in the affinity mask are not checked.
f. If the MPLS TE tunnel is established using Segment Routing, make sure segment routing related settings are configured in the IGP area of the device.
g. If the MPLS TE tunnel is established by using an explicit path specified with the mpls te path command, verify that the explicit path configuration is appropriate. To use the strict mode, you must specify the IP address of the incoming interface hop by hop. To use the loose mode, you must specify the node address of each device to be passed through.
3. Verify that the destination address of the MPLS TE tunnel is not used by a static route.
Execute the display current-configuration | include destination command to check whether the destination address of the MPLS TE tunnel is referenced by a static route. If it is referenced by a static route, modify the static route or change the destination address of the tunnel according to the actual network requirements.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file of the device.
¡ Diagnostic information collected using the display diagnostic-information command.
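The affinity rule in step 2 can be expressed as a few bit operations. The following Python sketch is a minimal illustration that assumes 32-bit attribute values. The function name is hypothetical. The sketch also assumes that when none of the checked affinity bits is 1, the "minimum of one bit" requirement is treated as satisfied, which the text does not state explicitly.

```python
def link_usable(affinity, mask, link_attr):
    """Apply the affinity rule to decide whether a tunnel can use a link."""
    checked_ones = affinity & mask                  # checked bits where affinity is 1
    checked_zeros = ~affinity & mask & 0xFFFFFFFF   # checked bits where affinity is 0
    # No link attribute bit may be 1 where the checked affinity bit is 0.
    if link_attr & checked_zeros:
        return False
    # At least one link attribute bit must be 1 where the checked affinity bit is 1.
    # Assumption: the requirement is skipped when there are no such bits.
    if checked_ones and not (link_attr & checked_ones):
        return False
    return True


# Hypothetical values: affinity 0xFFFFFFF0 with mask 0x0000FFFF.
print(link_usable(0xFFFFFFF0, 0x0000FFFF, 0x00000010))  # True
print(link_usable(0xFFFFFFF0, 0x0000FFFF, 0x00000011))  # False: a low bit is set
print(link_usable(0xFFFFFFF0, 0x0000FFFF, 0xFFFF0000))  # False: no checked bit is set
```

With a mask of 0, no bits are checked and every link attribute passes.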
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
MPLS TE tunnel state changing from up to down
Symptom
The state of an MPLS TE tunnel has changed from up to down.
Common causes
The following are the common causes of this type of issue:
· The link where the MPLS TE tunnel is located is down.
· The configuration of the MPLS TE tunnel has been deleted or is incorrect.
· RSVP message timeouts or errors have occurred.
· The physical link does not meet the required bandwidth for the MPLS TE tunnel.
· The BFD session is down on the MPLS TE tunnel interface or the physical interface where the tunnel is located.
Analysis
Figure 91 shows the troubleshooting flowchart.
Figure 91 Flowchart for troubleshooting MPLS TE tunnel state changing from up to down
Solution
To resolve the issue:
1. Verify that the MPLS TE tunnel's output interface is in up state.
Execute the display interface command to view the state of the output interface for the MPLS TE tunnel. Make sure the output interface is in up state.
2. Verify that the MPLS TE configuration is correct.
Check the following settings in sequence:
a. Verify that the mpls te enable command is configured in the OSPF/IS-IS area and on the interfaces that the MPLS TE tunnel passes through.
b. Verify that the LSR ID and Router ID are the same Loopback interface address.
c. If the MPLS TE tunnel is established using RSVP-TE, make sure the device and interfaces are configured with the rsvp and rsvp enable commands.
d. If the mpls te bandwidth command is configured on the tunnel interface, make sure the device's output interface is configured with the mpls te max-link-bandwidth and mpls te max-reservable-bandwidth commands.
e. If the mpls te affinity-attribute command is configured on the tunnel interface, make sure the mpls te link-attribute command is configured properly on the output interface. To ensure a link can be used by a tunnel, the following requirements must be met:
- The link attribute bits corresponding to the 1 bits in the affinity mask are checked as follows: The link attribute bits corresponding to the 1 bits of the affinity attribute must have a minimum of one bit set to 1. The link attribute bits corresponding to the 0 bits of the affinity attribute must have no bit set to 1.
- The link attribute bits corresponding to the 0 bits in the affinity mask are not checked.
f. If the MPLS TE tunnel is established using Segment Routing, make sure segment routing related settings are configured in the IGP area of the device.
g. If the MPLS TE tunnel is established by using an explicit path specified with the mpls te path command, verify that the explicit path configuration is appropriate. To use the strict mode, you must specify the IP address of the incoming interface hop by hop. To use the loose mode, you must specify the node address of each device to be passed through.
3. Verify that no RSVP message timeouts or errors exist.
Execute the display rsvp statistics command to identify RSVP message timeouts or errors. A timeout is indicated when the number of Path messages sent does not match the number of Resv messages received, or when the number of Path messages received does not match the number of Resv messages sent. An error is indicated when PathError or ResvError messages are received. If RSVP message timeouts or errors are found, capture the error information carried in the PathError or ResvError packets, and then resolve the issue according to the error codes by referring to RFC 2205 and RFC 3209.
<Sysname> display rsvp statistics
P2P statistics:
Object Added Deleted
PSB 3 1
RSB 3 1
LSP 3 1
P2MP statistics:
Object Added Deleted
PSB 0 0
RSB 0 0
LSP 0 0
Packet Received Sent
Path 5 5
Resv 5 5
PathError 0 0
ResvError 0 0
PathTear 0 0
ResvTear 0 0
ResvConf 0 0
Bundle 0 0
Ack 0 0
Srefresh 0 0
Hello 0 0
Challenge 0 0
Response 0 0
Error 0 0
4. Verify that the physical link meets the bandwidth required for the MPLS TE tunnel.
When an MPLS TE tunnel with a higher priority is established on the device, it might preempt the bandwidth of an MPLS TE tunnel with a lower priority, causing the lower-priority MPLS TE tunnel to go down. Execute the display mpls te link-management bandwidth-allocation command to check the remaining available bandwidth for each priority on the link, and make sure the remaining available bandwidth on the link is greater than the bandwidth required by the tunnel of that priority. If the remaining available bandwidth on the link cannot meet the requirements of the MPLS TE tunnel, modify the configuration, adjust the tunnel path, or provide more bandwidth for the link.
5. Verify that the BFD session for the MPLS TE tunnel interface or the tunnel's physical interface is not down.
Execute the display mpls bfd te tunnel tunnel-number command to view the BFD state of the MPLS TE tunnel. If the BFD state of the MPLS TE tunnel is down, execute the display bfd session command to identify the reason for the BFD session down state. Examine and modify the BFD configuration, and clear link faults or quality issues on the physical links.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
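The comparison of RSVP counters described in the procedure above can be automated. The following Python sketch parses the packet counters from display rsvp statistics output and reports the mismatch and error conditions named in the procedure. It assumes the whitespace-separated layout of the sample output shown above. The function names are hypothetical, and the output format on a specific software version might differ.

```python
def parse_packet_counters(output):
    """Return {packet_type: (received, sent)} from the 'Packet Received Sent' section."""
    counters = {}
    in_packets = False
    for line in output.splitlines():
        fields = line.split()
        if fields[:3] == ["Packet", "Received", "Sent"]:
            in_packets = True
            continue
        if in_packets and len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            counters[fields[0]] = (int(fields[1]), int(fields[2]))
    return counters


def rsvp_findings(counters):
    """List the timeout and error indications described in the procedure."""
    findings = []
    path_rx, path_tx = counters.get("Path", (0, 0))
    resv_rx, resv_tx = counters.get("Resv", (0, 0))
    if path_tx != resv_rx:
        findings.append("Path sent != Resv received")
    if path_rx != resv_tx:
        findings.append("Path received != Resv sent")
    for name in ("PathError", "ResvError"):
        if counters.get(name, (0, 0))[0] > 0:
            findings.append(name + " messages received")
    return findings


sample = """Packet Received Sent
Path 5 5
Resv 5 5
PathError 0 0
ResvError 0 0"""
print(rsvp_findings(parse_packet_counters(sample)))  # []
```

An empty list means the counters show neither of the timeout patterns nor received error messages. Counters are cumulative, so compare two snapshots taken some time apart if the tunnel has flapped before.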
Related alarm and log messages
Alarm messages
Module Name: MPLS-TE-STD-MIB
· mplsTunnelUp (1.3.6.1.2.1.10.166.3.0.1)
· mplsTunnelDown (1.3.6.1.2.1.10.166.3.0.2)
Log messages
· IFNET/5/LINK_UPDOWN
· IFNET/3/PHY_UPDOWN
Loop in an MPLS TE tunnel
Symptom
A loop exists in the forwarding path of the MPLS TE tunnel, preventing traffic from being forwarded to the destination address through the MPLS TE tunnel.
Common causes
The same IP address exists on different devices that the MPLS TE tunnel passes through.
Solution
To resolve the issue:
1. Identify whether the same IP address has been configured on different devices that the MPLS TE tunnel passes through. If yes, change the IP addresses to ensure that no identical IP addresses exist on the different devices that the MPLS TE tunnel travels through.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file of the device.
¡ Diagnostic information collected using the display diagnostic-information command.
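When the tunnel passes through many devices, the duplicate address check in step 1 is easier to perform on a collected address inventory. The following Python sketch assumes that you have already gathered the interface IP addresses of each device into a dictionary. The device names, addresses, and function name are hypothetical.

```python
from collections import defaultdict


def duplicate_addresses(device_addresses):
    """Return {address: [devices]} for addresses configured on more than one device."""
    owners = defaultdict(set)
    for device, addresses in device_addresses.items():
        for address in addresses:
            owners[address].add(device)
    return {address: sorted(devices) for address, devices in owners.items() if len(devices) > 1}


# Hypothetical inventory of the devices along the tunnel path.
inventory = {
    "PE1": ["10.0.1.1", "1.1.1.9"],
    "P1": ["10.0.1.2", "10.0.2.1", "2.2.2.9"],
    "PE2": ["10.0.2.2", "1.1.1.9"],  # same loopback address as PE1
}
print(duplicate_addresses(inventory))  # {'1.1.1.9': ['PE1', 'PE2']}
```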
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Tunnel path calculation failure
Symptom
The calculation of the MPLS TE tunnel path failed, causing the tunnel to be down.
Common causes
The following are the common causes of this type of issue:
· No IGP neighbors have been established.
· No MPLS TEDB information exists.
· The configuration for the MPLS TE tunnel is incorrect.
Analysis
Figure 92 shows the troubleshooting flowchart.
Figure 92 Flowchart for troubleshooting tunnel path calculation failure
Solution
To resolve the issue:
1. Verify that an IGP neighbor has been established.
Execute the display ospf peer or display isis peer command to identify whether an IGP neighbor has been established.
¡ If an IGP neighbor has been established, proceed to step 2.
¡ If no IGP neighbor has been established, complete the OSPF or IS-IS configuration to establish an IGP neighbor. For more information about OSPF and IS-IS, see OSPF configuration and IS-IS configuration respectively in the Layer 3—IP Routing Configuration Guide of the device.
2. Execute the display mpls te tedb command to view the information of MPLS TEDB.
If MPLS TEDB information exists, proceed to step 3.
If MPLS TEDB information does not exist, check the following configurations in order:
a. Verify that the mpls enable and mpls te enable commands are configured in the OSPF/IS-IS area and on the interfaces that the MPLS TE tunnel passes through.
b. Verify that the LSR ID and Router ID are the same Loopback interface address.
3. Verify that the MPLS TE configuration is correct.
a. If the MPLS TE tunnel is established using RSVP-TE, make sure the device and interfaces are configured with the rsvp and rsvp enable commands.
b. If the MPLS TE tunnel is established using Segment Routing, make sure the segment-routing mpls command is configured in the IGP area of the device.
c. If the mpls te bandwidth command is configured on the tunnel interface, make sure the device's output interface is configured with the mpls te max-link-bandwidth and mpls te max-reservable-bandwidth commands.
d. If the mpls te affinity-attribute command is configured on the tunnel interface, make sure the mpls te link-attribute command is configured properly on the output interface. To ensure a link can be used by a tunnel, the following requirements must be met:
- The link attribute bits corresponding to the 1 bits in the affinity mask are checked as follows: The link attribute bits corresponding to the 1 bits of the affinity attribute must have a minimum of one bit set to 1. The link attribute bits corresponding to the 0 bits of the affinity attribute must have no bit set to 1.
- The link attribute bits corresponding to the 0 bits in the affinity mask are not checked.
e. If the MPLS TE tunnel is established by using an explicit path specified with the mpls te path command, verify that the explicit path configuration is appropriate. To use the strict mode, you must specify the IP address of the incoming interface hop by hop. To use the loose mode, you must specify the node address of each device to be passed through.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file of the device.
¡ Diagnostic information collected using the display diagnostic-information command.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Hot-standby CRLSP establishment failure
Symptom
After the mpls te backup hot-standby command is configured for an MPLS TE tunnel, no hot-standby backup CRLSP is established as expected.
Common causes
The following are the common causes of this type of issue:
· The device has only one interface that is adjacent to the neighbor.
· The configuration for the MPLS TE tunnel is incorrect.
Analysis
Figure 93 shows the troubleshooting flowchart.
Figure 93 Flowchart for troubleshooting hot-standby CRLSP establishment failure
Solution
To resolve the issue:
1. According to the configured IGP protocol, execute the display ospf peer or display isis peer command to view information about the interfaces connected with the same neighbor (to the same System ID or Router ID).
# Display the summary information of IS-IS neighbors.
<Sysname> display isis peer
Peer information for IS-IS(1)
-----------------------------
System ID: 0000.0000.0001
Interface: GE2/0/1 Circuit Id: 0000.0000.0001.01
State: Up HoldTime: 27s Type: L1(L1L2) PRI: 64
System ID: 0000.0000.0001
Interface: GE2/0/2 Circuit Id: 0000.0000.0001.01
State: Up HoldTime: 27s Type: L2(L1L2) PRI: 64
# Display OSPF neighbor summary information.
<Sysname> display ospf peer
OSPF Process 1 with Router ID 1.1.1.1
Neighbor Brief Information
Area: 0.0.0.0
Router ID Address Pri Dead-Time State Interface
1.1.1.2 1.1.1.2 1 40 Full/DR GE2/0/1
¡ If the number of interfaces connected to the neighbor is greater than or equal to 2, proceed to the next step.
¡ If the number of interfaces connected to the neighbor is less than 2, increase the physical links between the device and the neighbor to ensure a path is available for establishing the backup CRLSP.
2. Verify that the MPLS TE configuration is correct.
Check the following settings in sequence:
a. Verify that the mpls te enable command is configured in the OSPF/IS-IS area and on the interfaces that the MPLS TE tunnel passes through.
b. Verify that the LSR ID and Router ID are the same Loopback interface address.
c. If the MPLS TE tunnel is established using RSVP-TE, make sure the device and interfaces are configured with the rsvp and rsvp enable commands.
d. If the mpls te bandwidth command is configured on the tunnel interface, make sure the device's output interface is configured with the mpls te max-link-bandwidth and mpls te max-reservable-bandwidth commands.
e. If the mpls te affinity-attribute command is configured on the tunnel interface, make sure the mpls te link-attribute command is configured properly on the output interface. To ensure a link can be used by a tunnel, the following requirements must be met:
- The link attribute bits corresponding to the 1 bits in the affinity mask are checked as follows: The link attribute bits corresponding to the 1 bits of the affinity attribute must have a minimum of one bit set to 1. The link attribute bits corresponding to the 0 bits of the affinity attribute must have no bit set to 1.
- The link attribute bits corresponding to the 0 bits in the affinity mask are not checked.
f. If the MPLS TE tunnel is established using Segment Routing, make sure the segment-routing mpls command is configured in the IGP area of the device.
g. If the MPLS TE tunnel is established by using an explicit path specified with the mpls te path command, verify that the explicit path configuration is appropriate. To use the strict mode, you must specify the IP address of the incoming interface hop by hop. To use the loose mode, you must specify the node address of each device to be passed through.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file of the device.
¡ Diagnostic information collected using the display diagnostic-information command.
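The interface count in step 1 can be derived from the display isis peer output by grouping interfaces by System ID. The following Python sketch assumes the line layout of the sample output shown in step 1. The function name is hypothetical. The sketch does not check the State field, so exclude neighbors that are not in Up state before relying on the result.

```python
import re
from collections import defaultdict


def interfaces_per_neighbor(output):
    """Return {system_id: set of local interfaces} from display isis peer output."""
    neighbors = defaultdict(set)
    current = None
    for line in output.splitlines():
        match = re.search(r"System ID:\s*(\S+)", line)
        if match:
            current = match.group(1)
            continue
        match = re.search(r"Interface:\s*(\S+)", line)
        if match and current:
            neighbors[current].add(match.group(1))
    return dict(neighbors)


sample = """System ID: 0000.0000.0001
Interface: GE2/0/1 Circuit Id: 0000.0000.0001.01
State: Up HoldTime: 27s Type: L1(L1L2) PRI: 64
System ID: 0000.0000.0001
Interface: GE2/0/2 Circuit Id: 0000.0000.0001.01
State: Up HoldTime: 27s Type: L2(L1L2) PRI: 64"""
links = interfaces_per_neighbor(sample)
print({neighbor: len(ifs) for neighbor, ifs in links.items()})  # {'0000.0000.0001': 2}
```

A neighbor with fewer than two interfaces cannot provide a separate path for the backup CRLSP over direct links.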
Related alarm and log messages
Alarm messages
N/A
Log messages
TE/5/TE_BACKUP_SWITCH
Issues of basic MPLS
Failure to forward packets through an LSP
Symptom
Packets sent by a host in the network cannot be forwarded through an LSP tunnel.
Common causes
The following are the common causes of this type of issue:
· The route does not exist.
· The LSP does not exist.
· The route has not been recursed to the LSP tunnel.
· The forwarding state of the LSP tunnel is not ACTIVE.
· The BFD session state is down.
· The CPU usage is too high.
Troubleshooting flow
Figure 94 shows the troubleshooting flowchart.
Figure 94 Flowchart for troubleshooting packet forwarding failure on LSP
Solution
To resolve the issue:
1. Identify whether the IGP route exists.
Execute the display ip routing-table command to identify whether there is a subnet route destined for the Loopback interface address of the LSP destination node.
<Sysname> display ip routing-table 1.1.1.1
Summary count : 1
Destination/Mask Proto Pre Cost NextHop Interface
1.1.1.2/32 IS_L1 15 10 1.1.1.1 LoopBack1
¡ If the route does not exist, enable the IGP protocol on the Loopback interface and the public network interfaces to ensure the advertisement of the corresponding subnet route.
¡ If the route exists, proceed to step 2.
2. Identify whether the LSP exists.
Execute the display mpls lsp command to identify if there is an LSP destined for the Loopback interface address of the destination node.
¡ If no such LSP exists, establish one of the specified type:
- To establish an LDP LSP, enable MPLS and MPLS LDP on interfaces.
- To establish an SRLSP, execute the segment-routing mpls command in IS-IS IPv4 unicast address family view, OSPF view, or BGP IPv4 unicast address family view to enable MPLS-based SR.
- To establish an SR-MPLS TE policy, create the SR-MPLS TE policy correctly in SR TE view.
¡ If the LSP exists, proceed to step 3.
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
1.1.1.2/32 LDP -/1049 GE2/0/1
3. Check whether the route has been recursed to the LSP tunnel.
Execute the display mpls tunnel all command to view information about all tunnels. Execute the display fib command to view the FIB entry for the specified next hop address. Find the FIB entry whose Nexthop field value is the same as the Destination field value in the tunnel information, and then identify whether the LSP index (value of the Token field) of the FIB entry is the same as the NHLFE ID of the tunnel.
¡ If they are different, it indicates that the route has not recursed to the LSP tunnel. Identify whether the tunnel type (Type field) of the specified FEC matches the tunnel type specified in the tunnel policy.
- If the tunnel types are different, modify the tunnel policy in tunnel policy view to make the tunnel policy configuration match with the specified FEC tunnel type.
- If the tunnel types are the same, proceed to step 7.
<Sysname> display tunnel-policy
Tunnel policy name: abc
Select-Seq: LSP
Load balance number : 1
Strict : No
¡ If the LSP index and the tunnel NHLFE ID are the same, it indicates that the route has recursed to the LSP tunnel. Proceed to step 4.
<Sysname> display mpls tunnel all
Destination Type Tunnel/NHLFE VPN Instance
2.2.2.9 LSP NHLFE3 -
3.3.3.9 SRLSP NHLFE2 -
4.4.4.9 SRPolicy NHLFE23068673 -
<Sysname> display fib
Destination count: 1 FIB entry count: 1
Flag:
U:Usable G:Gateway H:Host B:Blackhole D:Dynamic S:Static
R:Relay F:FRR
Destination/Mask Nexthop Flag OutInterface/Token Label
55.55.55.55/32 2.2.2.9 UGHR 3 Null
…
4. Identify whether the forwarding state of the LSP tunnel is normal.
Execute the display mpls forwarding nhlfe command to view information about NHLFE entries.
¡ If the forwarding flags do not contain flag A, the LSP tunnel is not usable. Proceed to step 5.
¡ If the forwarding flags contain flag A, the LSP tunnel is functioning correctly. Proceed to step 6.
<Sysname> display mpls forwarding nhlfe 3
Flags: T - Forwarded through a tunnel
N - Forwarded through the outgoing interface to the nexthop IP address
B - Backup forwarding information
A - Active forwarding information
M - P2MP forwarding information
S - Secondary backup path
NID Tnl-Type Flag OutLabel Forwarding Info
--------------------------------------------------------------------------------
3 LSP NA 1040127 GE2/0/3 10.0.3.2
5. Identify whether BFD is functioning properly.
Execute the display mpls bfd command or the display mpls sbfd command to view BFD/SBFD information for LSP tunnels.
¡ If the BFD/SBFD session state is Down, execute the mpls bfd enable command in system view to enable BFD/SBFD for MPLS, and make sure the BFD/SBFD session state is up.
¡ If the BFD/SBFD session state is Up, proceed to step 6.
<Sysname> display mpls bfd ipv4 22.22.2.2 32
Total number of sessions: 1, 1 up, 0 down, 0 init
FEC Type: LSP
FEC Info:
Destination: 22.22.2.2
Mask Length: 32
NHLFE ID: 1025
Local Discr: 513 Remote Discr: 513
Source IP: 11.11.1.1 Destination IP: 127.0.0.1
Session State: Up Session Role: Passive
Template Name: -
<Sysname> display mpls sbfd ipv4 22.22.2.2 32
Total number of sessions: 1, 1 up, 0 down, 0 init
FEC Type: LSP
FEC Info:
Destination: 22.22.2.2
Mask Length: 32
NHLFE ID: 1025
Local Discr: 513 Remote Discr: 513
Source IP: 11.11.1.1 Destination IP: 127.0.0.1
Session State: Up
Template Name: -
6. Identify whether CPU is functioning properly.
Execute the display cpu-usage command to view CPU usage statistics.
¡ If the CPU usage is too high, disable some unnecessary features to reduce the device's CPU usage.
¡ If the CPU usage is normal, proceed to step 7.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
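The comparisons in steps 3 and 4 reduce to two small checks: whether the Token of the FIB entry equals the NHLFE ID of the tunnel to the same next hop, and whether the NHLFE flags contain A. The following Python sketch illustrates both checks with the values from the sample outputs above. The function names and the tuple layout are hypothetical, and the values are assumed to be copied by hand from the command output.

```python
def recursed_to_tunnel(fib_nexthop, fib_token, tunnels):
    """Check whether a FIB entry points at the tunnel to its next hop.

    tunnels is a list of (destination, tunnel_type, nhlfe_name) tuples
    taken from display mpls tunnel all, for example ("2.2.2.9", "LSP", "NHLFE3").
    """
    for destination, _tunnel_type, nhlfe_name in tunnels:
        if destination == fib_nexthop and nhlfe_name.startswith("NHLFE"):
            if nhlfe_name[len("NHLFE"):] == str(fib_token):
                return True
    return False


def lsp_active(flag_field):
    """Return True if the Flag column of the NHLFE entry contains A (active)."""
    return "A" in flag_field


tunnels = [
    ("2.2.2.9", "LSP", "NHLFE3"),
    ("3.3.3.9", "SRLSP", "NHLFE2"),
    ("4.4.4.9", "SRPolicy", "NHLFE23068673"),
]
print(recursed_to_tunnel("2.2.2.9", 3, tunnels))  # True
print(lsp_active("NA"))                           # True
```

If the first check fails, compare the tunnel type with the tunnel policy as described in step 3. If the second check fails, continue with the BFD check in step 5.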
Related alarm and log messages
Alarm messages
N/A
Log messages
Troubleshooting VPLS
Only the VSI on one PE device on the two ends of a PW is in up state
Symptom
Only the VSI on one PE device on the two ends of a PW is in up state.
Common causes
A VSI is up when at least one PW and one AC are up, or when at least two ACs are up in the VSI.
The common causes of this type of issue are:
· On an up VSI on a PE, PWs are down but two up ACs exist.
· On a down VSI, PWs are down and no AC or only one AC is up.
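The VSI state rule stated above can be written as a single Boolean expression. The following Python sketch is only an illustration of that rule. The function name is hypothetical, and the inputs are the numbers of up PWs and up ACs counted in the display l2vpn vsi verbose output.

```python
def vsi_is_up(up_pws, up_acs):
    """A VSI is up with at least one up PW and one up AC, or with at least two up ACs."""
    return (up_pws >= 1 and up_acs >= 1) or up_acs >= 2


print(vsi_is_up(0, 2))  # True: PWs are down but two ACs are up
print(vsi_is_up(0, 1))  # False: PWs are down and only one AC is up
```

The two printed cases correspond to the two common causes: a local VSI can be up without any up PW, while the VSI on the remote PE stays down.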
Solution
To resolve the issue:
1. Execute the display l2vpn vsi command to check the state of the ACs and PWs on a VSI.
<Sysname> display l2vpn vsi verbose
VSI Name: vpls1
VSI Index : 0
VSI Description : vsi for vpls1
VSI State : Down
MTU : 1500
Bandwidth : -
Broadcast Restrain : -
Multicast Restrain : -
Unknown Unicast Restrain: -
MAC Learning : Enabled
MAC Table Limit : -
MAC Learning rate : -
Drop Unknown : -
PW Redundancy : Master
Flooding : Enabled
Statistics : Disabled
VXLAN ID : -
LDP PWs:
Peer PW ID Link ID State
192.3.3.3 1 8 Down
ACs:
AC Link ID State Type
GE2/0/3 srv1 1 Up Manual
2. Execute the display l2vpn pw verbose command to identify the reason why the PW is down.
<Sysname> display l2vpn pw verbose
VSI Name: aaa
Peer: 2.2.2.9 Remote Site: 2
Signaling Protocol : BGP
Link ID : 9 PW State : Down
In Label : 1420 Out Label: 1419
MTU : 1500
PW Attributes : Main
VCCV CC : -
VCCV BFD : -
Flow Label : Send
Control Word : Disabled
Tunnel Group ID : 0x800000960000000
Tunnel NHLFE IDs : 1038
Admin PW : -
E-Tree Mode : -
E-Tree Role : root
Root VLAN : -
Leaf VLAN : -
Down Reasons : Control word not match
The Down Reasons field shows why the PW is down. The following are common down reasons and their solutions:
¡ BFD session for PW down—The BFD session for PW detection is down. To resolve this issue, execute the display bfd session command to display BFD session information. Check and edit BFD configuration or check the physical link for link failure or link quality issues.
¡ BGP RD was deleted—The BGP RD has been deleted. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.
¡ BGP RD was empty—No BGP RD is configured. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.
¡ Control word not match—The control word configuration on the two ends of the PW is inconsistent. To resolve this issue, execute the control-word enable command to enable the control word feature on both ends.
¡ Encapsulation not match—The encapsulation types on the two ends of the PW are inconsistent. To resolve this issue, execute the pw-type command to configure the same encapsulation type for the two ends.
¡ LDP interface parameter not match—The LDP negotiation parameters on the two ends of the PW are inconsistent. To resolve this issue, execute the vccv cc command to specify the same VCCV control channel (CC) type. Alternatively, specify the same CEM class for the CEM interfaces on both ends of the PW.
¡ Non-existent remote LDP PW—The remote device has deleted the LDP PW. To resolve this issue, reconfigure the PW on the remote device.
¡ Local AC Down—The local AC is down. To resolve this issue, check and edit the configuration on the AC interface or troubleshoot the issue on the interface where the AC is located and make sure the interface is in up state.
¡ Local AC was non-existent—No local AC is configured. To resolve this issue, configure a local AC and associate it with a VSI.
¡ MTU not match—The MTU configuration on the two ends of the PW is inconsistent. To resolve this issue, configure the same MTU at both ends of the PW or use the mtu-negotiate disable command to disable MTU negotiation.
¡ Remote AC Down—The remote AC is down. To resolve this issue, check and edit the configuration on the remote AC interface or troubleshoot the issue on the interface where the AC is located and make sure the interface is in up state.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
· L2VPN/2/L2VPN_PWSTATE_CHANGE
· L2VPN/4/L2VPN_BGPVC_CONFLICT_LOCAL
· L2VPN/4/L2VPN_BGPVC_CONFLICT_REMOTE
· L2VPN/4/L2VPN_HARD_RESOURCE_NOENOUGH
· L2VPN/2/L2VPN_HARD_RESOURCE_RESTORE
· L2VPN/4/L2VPN_LABEL_DUPLICATE
VPLS traffic failed to be forwarded
Symptom
VPLS traffic failed to be forwarded.
Common causes
The following are the common causes of this type of issue:
· The AC is not up.
· The PW is not up.
· The PW did not generate forwarding information.
· No public tunnels are available for the PW.
· The public tunnel for the PW is abnormal.
Troubleshooting flowchart
Figure 95 shows the troubleshooting flowchart.
Figure 95 Flowchart for troubleshooting VPLS traffic forwarding failure
Solution
To resolve the issue:
1. Execute the display l2vpn vsi verbose command to check the state and number of ACs and PWs associated with the VSI.
<Sysname> display l2vpn vsi verbose
VSI Name: vpls1
VSI Index : 0
VSI Description : vsi for vpls1
VSI State : Up
MTU : 1500
Bandwidth : -
Broadcast Restrain : -
Multicast Restrain : -
Unknown Unicast Restrain: -
MAC Learning : Enabled
MAC Table Limit : -
MAC Learning rate : -
Drop Unknown : -
PW Redundancy : Master
Flooding : Enabled
Statistics : Disabled
VXLAN ID : -
LDP PWs:
Peer PW ID Link ID State
192.3.3.3 1 8 Down
ACs:
AC Link ID State Type
GE2/0/3 srv1 1 Up Manual
2. If the state of the AC is down, verify that the AC configuration is correct and the interface where the AC is located is up. If the AC configuration is incorrect or the interface where the AC is located is down, edit the AC configuration or troubleshoot the interface failure.
3. If the PW is down, execute the display l2vpn pw verbose command to check the reason why the PW is down.
<Sysname> display l2vpn pw verbose
VSI Name: aaa
Peer: 2.2.2.9 Remote Site: 2
Signaling Protocol : BGP
Link ID : 9 PW State : Down
In Label : 1420 Out Label: 1419
MTU : 1500
PW Attributes : Main
VCCV CC : -
VCCV BFD : -
Flow Label : Send
Control Word : Disabled
Tunnel Group ID : 0x800000960000000
Tunnel NHLFE IDs : 1038
Admin PW : -
E-Tree Mode : -
E-Tree Role : root
Root VLAN : -
Leaf VLAN : -
Down Reasons : Control word not match
The Down Reasons field shows why the PW is down. The common down reasons and their solutions are as follows:
¡ BFD session for PW down—The BFD session for PW detection is down. To resolve this issue, execute the display bfd session command to display BFD session information. Check and edit BFD configuration or check the physical link for link failure or link quality issues.
¡ BGP RD was deleted—The BGP RD has been deleted. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.
¡ BGP RD was empty—No BGP RD is configured. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.
¡ Control word not match—The control word configuration on the two ends of the PW is inconsistent. To resolve this issue, execute the control-word enable command to enable the control word feature on both ends.
¡ Encapsulation not match—The encapsulation types on the two ends of the PW are inconsistent. Execute the pw-type command to configure the same encapsulation type for the two ends.
¡ LDP interface parameter not match—The LDP negotiation parameters on the two ends of the PW are inconsistent. To resolve this issue, execute the vccv cc command to specify the same VCCV control channel (CC) type. Alternatively, specify the same CEM class for the CEM interfaces on both ends of the PW.
¡ Non-existent remote LDP PW—The remote device has deleted the LDP PW. To resolve the issue, reconfigure the LDP PW on the remote device.
¡ Local AC Down—The local AC is down. To resolve this issue, check and edit the configuration on the AC interface or troubleshoot the issue on the interface where the AC is located and make sure the interface is in up state.
¡ Local AC was non-existent—No local AC is configured. To resolve this issue, configure a local AC and associate it with a VSI.
¡ MTU not match—The MTU configuration on the two ends of the PW is inconsistent. To resolve this issue, configure the same MTU at both ends of the PW or use the mtu-negotiate disable command to disable MTU negotiation.
¡ Remote AC Down—The remote AC is down. To resolve this issue, check and edit the configuration on the remote AC interface or troubleshoot the issue on the interface where the AC is located and make sure the interface is in up state.
4. If both the AC and PW are up, execute the display l2vpn forwarding pw verbose command to identify whether PW forwarding information exists. If the information exists, the Tunnel NHLFE IDs field displays the NHLFE IDs of the public tunnels that carry the PW.
¡ If PW forwarding information exists, go to step 6.
¡ If no PW forwarding information exists, go to step 5.
<Sysname> display l2vpn forwarding pw verbose
VSI Name: aaa
Link ID: 8
PW Type : VLAN PW State : Up
In Label : 1272 Out Label: 1275
MTU : 1500
PW Attributes : Main
VCCV CC : Router-Alert
VCCV BFD : Fault Detection with BFD
Flow Label : Send
Tunnel Group ID : 0x960000000
Tunnel NHLFE IDs: 1034
MAC limit : maximum=2000 alarm=enabled action=discard
5. Execute the display mpls lsp command to check for the tunnel that carries the PW. The tunnel is an LSP with the FEC as the PW peer IP address. If it does not exist, establish the tunnel that carries the PW.
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
100.100.100.100/24 LDP -/1049 GE2/0/1
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
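When many PWs are configured, reading the verbose output by eye is slow. The following Python code is a minimal sketch, not an H3C tool, that extracts the peer, PW state, and down reason from text captured from the display l2vpn pw verbose command. The function names and the REMEDIES table are hypothetical. The table covers only three of the down reasons listed in step 3, and the parser assumes the field layout shown in the sample output of this section.

```python
import re

# Subset of the down reasons from step 3, mapped to the suggested fix.
REMEDIES = {
    "Control word not match": "Run control-word enable on both ends.",
    "Encapsulation not match": "Set the same pw-type on both ends.",
    "MTU not match": "Set the same MTU or run mtu-negotiate disable.",
}


def parse_pw_verbose(text):
    """Return one dict per PW found in 'display l2vpn pw verbose' style text."""
    pws = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Peer:"):
            # A "Peer:" line opens a new PW record.
            current = {"peer": line.split()[1], "state": None, "down_reason": None}
            pws.append(current)
        elif current is not None:
            state = re.search(r"PW State\s*:\s*(\S+)", line)
            if state:
                current["state"] = state.group(1)
            reason = re.match(r"Down Reasons\s*:\s*(.+)", line)
            if reason:
                current["down_reason"] = reason.group(1).strip()
    return pws


def suggest(pw):
    """Return the suggested fix for a parsed PW, or a pointer to step 3."""
    return REMEDIES.get(pw["down_reason"], "See the down reason list in step 3.")


SAMPLE = """\
VSI Name: aaa
Peer: 2.2.2.9 Remote Site: 2
Signaling Protocol : BGP
Link ID : 9 PW State : Down
Down Reasons : Control word not match
"""

for pw in parse_pw_verbose(SAMPLE):
    print(pw["peer"], pw["state"], pw["down_reason"], "->", suggest(pw))
```

Extend REMEDIES with the remaining down reasons as needed. The sketch only reads captured text and does not connect to the device.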
Related alarm and log messages
Alarm messages
N/A
Log messages
· L2VPN/2/L2VPN_PWSTATE_CHANGE
· L2VPN/4/L2VPN_BGPVC_CONFLICT_LOCAL
· L2VPN/4/L2VPN_BGPVC_CONFLICT_REMOTE
· L2VPN/4/L2VPN_HARD_RESOURCE_NOENOUGH
· L2VPN/2/L2VPN_HARD_RESOURCE_RESTORE
· L2VPN/4/L2VPN_LABEL_DUPLICATE
A PW in up state failed to forward packets between two PEs
Symptom
A PW in up state fails to forward packets between two PEs.
Common causes
The following are the common causes of this type of issue:
· The number of MAC addresses that a PW learned reached the upper limit, and the PW is configured to drop frames with unknown source MAC addresses when the maximum is reached.
· PW information has not been deployed to the forwarding module.
Troubleshooting flowchart
Figure 96 shows the troubleshooting flowchart.
Solution
To resolve the issue:
1. Execute the display l2vpn mac-address command to identify whether the corresponding MAC address entries exist and to view the total number of learned MAC address entries. You can specify an AC interface or PW to display the total number of MAC address entries learned from that AC interface or PW.
¡ Display MAC address table information for VSIs.
<Sysname> display l2vpn mac-address
* - The output interface is issued to another VSI
MAC Address State VSI Name Link ID/Name Aging
0000-0000-000a Dynamic vpn1 GE2/0/1 Aging
0000-0000-0009 Dynamic vpn1 GE2/0/1 Aging
--- 2 mac address(es) found ---
¡ Display the number of MAC address entries.
<Sysname> display l2vpn mac-address count
2 mac address(es) found
2. Check for the maximum number of MAC addresses allowed to be learned, and the action to be taken on frames with unknown source MAC addresses when the PW has learned the maximum number of MAC addresses.
¡ Execute the display this command in VSI view to identify whether the mac-table limit and mac-table limit drop-unknown commands are configured for the VSI. If these commands are configured and the number of learned MAC addresses has reached the upper limit, increase the maximum number of MAC addresses that the VSI can learn or delete the mac-table limit drop-unknown command.
¡ Execute the display this command in AC view and PW view to identify whether the mac-limit command is configured for the AC or PW. If this command is configured and the number of learned MAC addresses has reached the upper limit, increase the maximum number of MAC addresses that can be learned or remove the action discard option.
3. Execute the display l2vpn forwarding pw verbose command to identify whether PW forwarding information exists. If the information exists, the Tunnel NHLFE IDs field displays the NHLFE IDs of the public tunnels that carry the PW.
¡ If forwarding information exists, go to step 5.
¡ If no forwarding information exists, go to step 4.
<Sysname> display l2vpn forwarding pw verbose
VSI Name: aaa
Link ID: 8
PW Type : VLAN PW State : Up
In Label : 1272 Out Label: 1275
MTU : 1500
PW Attributes : Main
VCCV CC : Router-Alert
VCCV BFD : Fault Detection with BFD
Flow Label : Send
Tunnel Group ID : 0x960000000
Tunnel NHLFE IDs: 1034
MAC limit : maximum=2000 alarm=enabled action=discard
4. Execute the display mpls lsp command to check for the tunnel that carries the PW. The tunnel is an LSP with the FEC as the PW peer IP address. If it does not exist, establish the tunnel that carries the PW.
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
100.100.100.100/24 LDP -/1049 GE2/0/1
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
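The MAC limit check in steps 1 and 2 compares the number of learned MAC addresses against the configured maximum and the configured action. The following Python code is a minimal sketch of that comparison. It assumes the MAC limit line has the key=value layout shown in the sample display l2vpn forwarding pw verbose output. The function names are hypothetical.

```python
import re


def parse_mac_limit(line):
    """Parse 'MAC limit : maximum=2000 alarm=enabled action=discard' into a dict."""
    fields = dict(re.findall(r"(\w+)=(\w+)", line))
    return {
        "maximum": int(fields["maximum"]),
        "alarm": fields.get("alarm"),
        "action": fields.get("action"),
    }


def drops_new_sources(learned, limit):
    """True when the limit is reached and the configured action is discard."""
    return learned >= limit["maximum"] and limit["action"] == "discard"


limit = parse_mac_limit("MAC limit : maximum=2000 alarm=enabled action=discard")
print(drops_new_sources(2, limit))     # Limit not reached.
print(drops_new_sources(2000, limit))  # Limit reached with action discard.
```

Take the learned count from the display l2vpn mac-address count output for the same AC, PW, or VSI to which the limit applies.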
Related alarm and log messages
Alarm messages
N/A
Log messages
· L2VPN/4/L2VPN_MACLIMIT_MAX_AC
· L2VPN/4/L2VPN_MACLIMIT_MAX_PW
· L2VPN/4/L2VPN_MACLIMIT_MAX_VSI
An LDP PW cannot become up
Symptom
In a VPLS network, an LDP PW cannot become up.
Common causes
The following are the common causes of this type of issue:
· The encapsulation types at both ends of the PW are inconsistent.
· The MTU values at both ends of the PW are inconsistent.
· The LDP session state is not Up.
· No public tunnels are available for the PW.
· The AC interface is not up.
Solution
To resolve the issue:
1. Use the display l2vpn pw verbose command to check for the peer IP address of the PW and the reason why the PW is down.
<Sysname> display l2vpn pw verbose
VSI Name: aaa
Peer: 2.2.2.9 VPLS ID: 100:100
Signaling Protocol : LDP
Link ID : 8 PW State : Down
In Label : 1553 Out Label: 1553
MTU : 1500
PW Attributes : Main
VCCV CC : -
VCCV BFD : -
Flow Label : -
Tunnel Group ID : 0x800000960000000
Tunnel NHLFE IDs : 1038
Admin PW : -
E-Tree Mode : -
E-Tree Role : root
Root VLAN : -
Leaf VLAN : -
Down Reasons : Control word not match
Table 15 shows the common causes of this type of issue.
Table 15 Common causes and solutions
Down Reasons | Symptom | Solution |
BFD session for PW down | The BFD session for PW detection is down. | Execute the display bfd session command to display BFD session information. Check and edit BFD configuration or check the physical link for link failure or link quality issues. |
Control word not match | The control word configuration on the two ends of the PW is inconsistent. | Execute the control-word enable command to enable the control word feature on both ends. |
Encapsulation not match | The encapsulation types on the two ends of the PW are inconsistent. | Execute the pw-type command to configure the same encapsulation type for the two ends. |
LDP interface parameter not match | The LDP negotiation parameters on the two ends of the PW are inconsistent. | Execute the vccv cc command to specify the same VCCV control channel (CC) type or specify the same CEM class for the CEM interfaces on both ends of the PW. |
Non-existent remote LDP PW | The remote device has deleted the LDP PW. | Reconfigure the PW on the remote device. |
Local AC Down | The local AC is down. | Check and edit the configuration on the AC interface or troubleshoot the issue on the interface where the AC is located and make sure the interface is in up state. |
Local AC was non-existent | No local AC is configured. | Configure a local AC and associate it with a VSI. |
MTU not match | The MTU configuration on the two ends of the PW is inconsistent. | Configure the same MTU at both ends of the PW or use the mtu-negotiate disable command to disable MTU negotiation. |
Remote AC Down | The remote AC is down. | Check and edit the configuration on the remote AC interface or troubleshoot the issue on the interface where the AC is located and make sure the interface is in up state. |
Label not allocated | No label is allocated. | Contact Technical Support. |
Local VSI Down | The local VSI is down. | See "Only the VSI on one PE device on the two ends of a PW is in up state." |
Local and remote LDP PWs have different AII | The local SAII and remote TAII are different. | See "LDP session down" in LDP Troubleshooting Guide. |
Local LDP PW was not sent mapping message | The local end did not send the LDP mapping message. | See "LDP session down" in LDP Troubleshooting Guide. |
Local LDP PW Virtual Nexthop defect | The local LDP PW has virtual next hop defects. | See steps 2 and 3. |
Remote LDP PW Virtual Nexthop defect | The remote LDP PW has virtual next hop defects. | See steps 2 and 3. |
Tunnel Down | The tunnel that carries the PW is down. | See step 3. |
2. Execute the display l2vpn forwarding pw verbose command to identify whether PW forwarding information exists. If the information exists, the Tunnel NHLFE IDs field displays the NHLFE IDs of the public tunnels that carry the PW.
¡ If forwarding information exists, go to step 4.
¡ If no forwarding information exists, go to step 3.
<Sysname> display l2vpn forwarding pw verbose
VSI Name: aaa
Link ID: 8
PW Type : VLAN PW State : Up
In Label : 1272 Out Label: 1275
MTU : 1500
PW Attributes : Main
VCCV CC : Router-Alert
VCCV BFD : Fault Detection with BFD
Flow Label : Send
Tunnel Group ID : 0x960000000
Tunnel NHLFE IDs: 1034
MAC limit : maximum=2000 alarm=enabled action=discard
3. Execute the display mpls lsp command to check for the tunnel that carries the PW. The tunnel is an LSP whose FEC is the PW peer IP address obtained in step 1. If no such tunnel exists, create a public tunnel to carry the PW. Supported public tunnel types include LSP, MPLS TE, and GRE tunnels. For information about creating these tunnels, see "Configuring a static LSP," "Configuring LDP," and "Configuring MPLS-TE" in MPLS Configuration Guide, and "Configuring GRE" in Layer 3—IP Services Configuration Guide.
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
100.100.100.100/24 LDP -/1049 GE2/0/1
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
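Step 3 looks for an LSP whose FEC matches the PW peer IP address from step 1. The following Python code is a minimal sketch of that lookup against text captured from the display mpls lsp command. It assumes the FEC is the first column, as in the sample output. It treats an FEC as a match when it contains the peer address, which is a looser test than an exact host-route match. The function name and the sample rows are hypothetical.

```python
import ipaddress


def lsps_covering_peer(lsp_output, peer):
    """Return FEC strings from 'display mpls lsp' style text that contain peer."""
    peer_ip = ipaddress.ip_address(peer)
    matches = []
    for line in lsp_output.splitlines():
        tokens = line.split()
        if not tokens or "/" not in tokens[0]:
            continue  # Skip blank lines and the column header.
        try:
            fec = ipaddress.ip_network(tokens[0], strict=False)
        except ValueError:
            continue  # The first column is not an IP prefix.
        if peer_ip.version == fec.version and peer_ip in fec:
            matches.append(tokens[0])
    return matches


# Hypothetical capture: one LSP toward peer 2.2.2.9 and one unrelated LSP.
SAMPLE = """\
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
2.2.2.9/32 LDP -/1049 GE2/0/1
10.0.0.0/24 LDP -/1050 GE2/0/2
"""

print(lsps_covering_peer(SAMPLE, "2.2.2.9"))
print(lsps_covering_peer(SAMPLE, "3.3.3.9"))
```

An empty result suggests that no LSP carries the PW toward that peer, which corresponds to the Tunnel Down row in Table 15.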
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
A VSI cannot become up when VPLS uses LDP
Symptom
A VSI cannot become up when VPLS uses LDP.
Common causes
A VSI is in up state if any of the following conditions is met:
· At least one PW and one AC in the VSI are up.
· At least two ACs in the VSI are up.
· At least two PWs in the VSI are up (multi-segment PW networking).
The following are the common causes of this type of issue:
· The total number of ACs and PWs in up state in a VSI is less than 2.
· The shutdown command was executed in the VSI.
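The up conditions and the two common causes can be combined into one check. The following Python code is a minimal sketch of that logic, assuming the conditions listed in this section are the only ones that apply. The function name and parameters are hypothetical.

```python
def vsi_can_be_up(acs_up, pws_up, shutdown=False):
    """Return True if the counts of up ACs and PWs allow the VSI to be up."""
    if shutdown:
        # The shutdown command in VSI view keeps the VSI down.
        return False
    return (
        (acs_up >= 1 and pws_up >= 1)  # One AC and one PW are up.
        or acs_up >= 2                 # Two ACs are up.
        or pws_up >= 2                 # Two PWs are up (multi-segment PW).
    )


print(vsi_can_be_up(acs_up=1, pws_up=0))
print(vsi_can_be_up(acs_up=1, pws_up=1))
print(vsi_can_be_up(acs_up=1, pws_up=1, shutdown=True))
```

The three conditions together are equivalent to requiring at least two up ACs and PWs in total, which matches the first common cause.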
Solution
To resolve the issue:
1. Execute the display this command in VSI view to check for the shutdown command.
¡ If the shutdown command is configured, execute the undo shutdown command.
¡ If the shutdown command is not configured, go to the next step.
2. Execute the display l2vpn vsi verbose command to check the state and number of ACs and PWs associated with the VSI.
<Sysname> display l2vpn vsi verbose
VSI Name: vpls1
VSI Index : 0
VSI Description : vsi for vpls1
VSI State : Up
MTU : 1500
Bandwidth : -
Broadcast Restrain : -
Multicast Restrain : -
Unknown Unicast Restrain: -
MAC Learning : Enabled
MAC Table Limit : -
MAC Learning rate : -
Drop Unknown : -
PW Redundancy : Master
Flooding : Enabled
Statistics : Disabled
VXLAN ID : -
LDP PWs:
Peer PW ID Link ID State
192.3.3.3 1 8 Down
ACs:
AC Link ID State Type
GE2/0/3 srv1 1 Up Manual
¡ If the sum of ACs and PWs associated with the VSI is less than 2, create ACs and PWs first.
¡ If the state of the AC is down, verify that the AC configuration is correct and the interface where the AC is located is up. If the AC configuration is incorrect or the interface where the AC is located is down, edit the AC configuration or troubleshoot the interface failure.
¡ If the state of the PW is down, see "An LDP PW cannot become up" to troubleshoot the issue.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
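The count of up ACs and PWs used in step 2 can be taken from captured display l2vpn vsi verbose text. The following Python code is a minimal sketch that assumes the row layout shown in the sample output: PW rows end with the state, and AC rows end with the state followed by the type. The function name and the trimmed sample are hypothetical.

```python
def count_up_links(vsi_output):
    """Count up PWs and ACs in 'display l2vpn vsi verbose' style text."""
    section = None
    counts = {"pw": 0, "ac": 0}
    for raw in vsi_output.splitlines():
        line = raw.strip()
        if line.endswith("PWs:"):
            section = "pw"
        elif line == "ACs:":
            section = "ac"
        elif section and line:
            tokens = line.split()
            if tokens[0] in ("Peer", "AC") or len(tokens) < 2:
                continue  # Skip column header rows and malformed rows.
            # PW rows end with the state. AC rows end with "<state> <type>".
            state = tokens[-1] if section == "pw" else tokens[-2]
            if state == "Up":
                counts[section] += 1
    return counts


SAMPLE = """\
VSI Name: vpls1
LDP PWs:
Peer PW ID Link ID State
192.3.3.3 1 8 Down
ACs:
AC Link ID State Type
GE2/0/3 srv1 1 Up Manual
"""

counts = count_up_links(SAMPLE)
print(counts, "total up:", counts["pw"] + counts["ac"])
```

A total below 2 points to the first common cause of this issue.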
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting segment routing issues
EVPN L3VPN over SRv6 issues
EVPN L3VPN over SRv6 BE traffic forwarding failure
Symptom
On an EVPN L3VPN over SRv6 network shown in Figure 97, traffic forwarding failure occurs when the PEs use the SRv6 BE mode to forward the private service traffic in VPN 1 between CE 1 and CE 2.
The troubleshooting flow is the same for IPv4 and IPv6 private networks. The following information uses IPv4 as an example to describe the troubleshooting procedure for EVPN L3VPN over SRv6.
Table 16 shows the key network planning information for the EVPN L3VPN over SRv6 network.
Table 16 SRv6 locators and major addresses in the address plan
Device | Interface or locator | Address | Device | Interface or locator | Address |
PE 1 | SRv6 Locator | 100:1::/64 | PE 2 | SRv6 Locator | 300:1::/64 |
PE 1 | Loopback0 | 1::1/128 | PE 2 | Loopback0 | 3::3/128 |
CE 1 | Loopback0 | 10.10.10.10/32 | CE 2 | Loopback0 | 20.20.20.20/32 |
CE 1 | Loopback1 | 11.11.11.11/32 | CE 2 | Loopback1 | 22.22.22.22/32 |
P | SRv6 Locator | 200:1::/64 | | | |
P | Loopback0 | 2::2/128 | | | |
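An SRv6 SID belongs to the device whose locator prefix contains it. The following Python code is a minimal sketch that uses the standard ipaddress module and the locators in Table 16 to identify which device a SID belongs to. The LOCATORS dictionary and the function name are hypothetical.

```python
import ipaddress

# Locator prefixes from Table 16.
LOCATORS = {
    "PE 1": "100:1::/64",
    "P": "200:1::/64",
    "PE 2": "300:1::/64",
}


def locator_owner(sid):
    """Return the device whose locator contains the SID, or None."""
    addr = ipaddress.ip_address(sid)
    for device, prefix in LOCATORS.items():
        if addr in ipaddress.ip_network(prefix):
            return device
    return None


print(locator_owner("300:1::A"))
print(locator_owner("3::3"))
```

A SID that matches no locator, such as a loopback address, is not an SRv6 SID allocated from the address plan.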
Common causes
The following are the common causes of this type of issue:
· The PEs cannot learn VPN routes because of the failure to establish BGP EVPN peer relationships.
· EVPN L3VPN over SRv6 configuration is incomplete.
· The routes for the SRv6 SIDs are unreachable.
Troubleshooting flow
Figure 98 shows the troubleshooting flowchart.
Figure 98 Flowchart for troubleshooting EVPN L3VPN over SRv6 BE traffic forwarding failure
Solution
1. Ping the private IP on the remote PE from the local PE to check its connectivity. When you do that, specify the name of the VPN instance to which the private IP address belongs.
<Sysname> ping -vpn-instance vpn1 20.20.20.20
Ping 20.20.20.20 (20.20.20.20): 56 data bytes, press CTRL+C to break
56 bytes from 20.20.20.20: icmp_seq=0 ttl=254 time=2.000 ms
56 bytes from 20.20.20.20: icmp_seq=1 ttl=254 time=1.000 ms
56 bytes from 20.20.20.20: icmp_seq=2 ttl=254 time=1.000 ms
56 bytes from 20.20.20.20: icmp_seq=3 ttl=254 time=1.000 ms
56 bytes from 20.20.20.20: icmp_seq=4 ttl=254 time=2.000 ms
--- Ping statistics for 20.20.20.20 in VPN instance vpn1 ---
5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 1.000/1.400/2.000/0.490 ms
If the ping test fails, the private network service is unavailable. Proceed to the next step.
2. Perform the subsequent checks on both the local and remote PE devices. This document uses the local PE device as an example. Execute the display ip routing-table vpn-instance command on the local PE device to check the VPN routing table for routes to the private IP addresses.
<Sysname> display ip routing-table vpn-instance vpn1
Destinations : 10 Routes : 10
Destination/Mask Proto Pre Cost NextHop Interface
10.1.1.0/24 Direct 0 0 10.1.1.2
10.1.1.2/32 Direct 0 0 127.0.0.1
10.1.1.255/32 Direct 0 0 10.1.1.2
10.10.10.10/32 BGP 255 0 10.1.1.1
11.11.11.11/32 BGP 255 0 10.1.1.1
20.1.1.0/24 BGP 255 0 3::3
20.20.20.20/32 BGP 255 0 3::3
22.22.22.22/32 BGP 255 0 3::3
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
If the device has VPN routes to the private IP addresses, verify that the FIB contains entries for these addresses, with a U flag in the Flag field. The U flag indicates that the private network IP address is valid.
<Sysname> display fib vpn-instance vpn1
Route destination count: 10
Directly-connected host count: 1
Flag:
U:Usable G:Gateway H:Host B:Blackhole D:Dynamic S:Static
R:Relay F:FRR
Destination/Mask Nexthop Flag OutInterface/Token Label
11.11.11.11/32 10.1.1.1 UGHR Null
127.0.0.0/8 127.0.0.1 U InLoop0 Null
10.1.1.0/24 10.1.1.2 U Null
20.20.20.20/32 3::3 UGHR Null
10.1.1.2/32 127.0.0.1 UH Null
22.22.22.22/32 3::3 UGHR Null
10.1.1.255/32 10.1.1.2 UBH Null
255.255.255.255/32 127.0.0.1 UH InLoop0 Null
10.10.10.10/32 10.1.1.1 UGHR Null
10.1.1.1/32 10.1.1.1 UH Null
20.1.1.0/24 3::3 UGR Null
If a private IP address does not exist or is invalid in the VPN routing table or VPN FIB, proceed to check for BGP EVPN route learning issues between PEs and verify that a valid tunnel exists between them.
3. Execute the display bgp peer l2vpn evpn command on the local PE device to verify that it has established a BGP EVPN peer relationship with the remote PE device.
¡ If the PE devices have established a BGP EVPN peer relationship successfully, the State field in the command output displays Established. Proceed to step 4.
¡ If the PE devices have not established a BGP EVPN peer relationship, see the troubleshooting procedure for BGP peer establishment issues.
<Sysname> display bgp peer l2vpn evpn
BGP local router ID: 1.1.1.1
Local AS number: 100
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
3::3 100 13 10 0 2 00:00:05 Established
4. Execute the display bgp l2vpn evpn command on the local PE device to verify that it has learned the BGP EVPN routes from the remote PE device. Pay special attention to the PrefixSID field for each route in the command output. In this field, the End.DT4 SID represents the SRv6 SID assigned by the remote PE to the VPN private IP address. When forwarding traffic to that private address in SRv6 BE mode, the local PE device uses that End.DT4 SID as the destination address in the IPv6 packets.
<Sysname> display bgp l2vpn evpn [5][0][32][20.20.20.20]/80
BGP local router ID: 1.1.1.1
Local AS number: 100
Route distinguisher: 100:1(vpn1)
Total number of routes: 1
Paths: 1 available, 1 best
BGP routing table information of [5][0][32][20.20.20.20]/80:
From : 3::3 (3.3.3.3)
Rely nexthop : FE80::A2C3:E2FF:FEB5:306
Original nexthop: 3::3
Out interface :
Route age : 00h28m51s
OutLabel : 3
Ext-Community : <RT: 100:1>
RxPathID : 0x0
TxPathID : 0x0
PrefixSID : End.DT4 SID <300:1::A>
SRv6 Service TLV (37 bytes):
Type: SRV6 L3 Service TLV (5)
Length: 34 bytes, Reserved: 0x0
SRv6 Service Information Sub-TLV (33 bytes):
Type: 1 Length: 30, Rsvdl: 0x0
SID Flags: 0x0 Endpoint behavior: 0x13 Rsvd2: 0x0
SRv6 SID Sub-Sub-TLV:
Type: 1 Len: 6
BL: 64 NL: 0 FL: 64 AL: 0 TL: 0 TO: 0
AS-path : 300
Origin : incomplete
Attribute value : MED 0, localpref 100, pref-val 0
State : valid, internal, best
Source type : local
IP precedence : N/A
QoS local ID : N/A
Traffic index : N/A
EVPN route type : IP prefix advertisement route
ESI : 0000.0000.0000.0000.0000
Ethernet tag ID : 0
IP prefix : 20.20.20.20/32
Gateway address : 0.0.0.0
MPLS label : 3
Tunnel policy : NULL
Rely tunnel IDs : N/A
Re-orignination : Disable
If the local PE device has learned a BGP EVPN route to the remote destination address and its PrefixSID attribute is correct, execute the display bgp routing-table ipv4 vpn-instance command on the local PE device to verify that the route has been added to the BGP routing table of the VPN instance. Verify that the route is both valid and optimal. Only valid and optimal BGP routes can be added to the VPN routing table.
<Sysname> display bgp routing-table ipv4 vpn-instance vpn1
Total number of routes: 8
BGP local router ID is 1.1.1.1
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Network NextHop MED LocPrf PrefVal Path/Ogn
* > 10.1.1.0/24 10.1.1.2 0 32768 ?
* e 10.1.1.1 0 0 200?
* > 10.1.1.2/32 127.0.0.1 0 32768 ?
* >e 10.10.10.10/32 10.1.1.1 0 0 200?
* >e 11.11.11.11/32 10.1.1.1 0 0 200?
* >i 20.1.1.0/24 3::3 0 100 0 ?
* >i 20.20.20.20/32 3::3 0 100 0 300?
* >i 22.22.22.22/32 3::3 0 100 0 300?
If the local PE device has failed to learn a valid and optimal BGP EVPN route, or if the PrefixSID attribute is missing from the BGP EVPN route, proceed to check for incomplete EVPN L3VPN over SRv6 configuration. An incomplete configuration might result in failure to allocate SRv6 SIDs or to establish an SRv6 tunnel.
5. Verify that the configuration for EVPN L3VPN over SRv6 on both PE devices is complete. If the configuration is incomplete, add the missing configuration. If the configuration is complete, proceed to step 6.
Execute the display current-configuration command on both PE devices to check for the following configuration items. If they are missing, see EVPN L3VPN over SRv6 configuration in Segment Routing Configuration Guide to add the missing configuration items.
#
isis 1
cost-style wide-compatible
#
address-family ipv6 unicast
segment-routing ipv6 locator aaa //Enable IS-IS to advertise the specified locator and the SRv6 SIDs in the locator.
#
#
bgp 100
peer 3::3 as-number 100
#
address-family l2vpn evpn
peer 3::3 enable
peer 3::3 advertise encap-type srv6 //Advertise SRv6-encapsulated EVPN routes with the PrefixSID attribute to the peer or peer group.
#
ip vpn-instance vpn1
#
address-family ipv4 unicast
segment-routing ipv6 best-effort evpn //Steer route matching traffic to an SRv6 BE tunnel.
segment-routing ipv6 locator aaa evpn //Apply the locator to the BGP process so the device can use the locator to allocate SRv6 SIDs for the private network routes in the specified VPN instance.
#
segment-routing ipv6
encapsulation source-address 11::11 //Specify the source address for the outer IPv6 header of SRv6 VPN packets.
locator aaa ipv6-prefix 300:1:: 64 static 8 //Create a Locator segment.
#
If the SRv6 configuration is complete and correct, proceed to check for unreachable SRv6 SIDs on the forwarding path.
6. Execute the display ipv6 routing-table ipv6-address command on all devices along the forwarding path to check for routes to the SRv6 SIDs on both PEs.
<Sysname> display ipv6 routing-table 300:1::A
Summary count : 2
Destination: 300:1::/64 Protocol : IS_L1
NextHop : FE80::A2C3:E2FF:FEB5:306 Preference: 15
Interface : Cost : 20
Execute the ping ipv6 command on all devices to verify the connectivity to the SRv6 SIDs.
<Sysname> ping ipv6 300:1::A
Ping6(56 data bytes) 1001::1 --> 300:1::A, press CTRL+C to break
56 bytes from 300:1::A, icmp_seq=0 hlim=63 time=2.000 ms
56 bytes from 300:1::A, icmp_seq=1 hlim=63 time=1.000 ms
56 bytes from 300:1::A, icmp_seq=2 hlim=63 time=0.000 ms
56 bytes from 300:1::A, icmp_seq=3 hlim=63 time=1.000 ms
56 bytes from 300:1::A, icmp_seq=4 hlim=63 time=0.000 ms
--- Ping6 statistics for 300:1::A ---
5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.000/0.800/2.000/0.748 ms
If a route to the remote SRv6 SID exists and the ping test succeeds, proceed to step 7. If no route to the SRv6 SID exists or the ping test fails, the IGP on the PE devices might have failed to advertise the locator network segment that contains the SID to other devices in the domain. In this situation, use Layer 3—IP Routing Troubleshooting Guide to resolve the IGP route advertisement issue.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
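The configuration check in step 5 can be run against saved display current-configuration text. The following Python code is a minimal sketch that reports which of the configuration items listed in step 5 are absent. The REQUIRED table and the function name are hypothetical. The patterns match whole lines in the form shown in step 5 and do not verify the view in which each command is configured.

```python
import re

# One pattern per configuration item listed in step 5.
REQUIRED = {
    "IS-IS advertises the locator": r"segment-routing ipv6 locator \S+",
    "EVPN routes advertised with SRv6 encapsulation": r"peer \S+ advertise encap-type srv6",
    "SRv6 BE steering for the VPN instance": r"segment-routing ipv6 best-effort evpn",
    "Locator applied for EVPN SID allocation": r"segment-routing ipv6 locator \S+ evpn",
    "SRv6 encapsulation source address": r"encapsulation source-address \S+",
    "Locator definition": r"locator \S+ ipv6-prefix \S+ \d+.*",
}


def missing_items(config_text):
    """Return the names of required items that no configuration line matches."""
    lines = [line.strip() for line in config_text.splitlines()]
    return [
        name
        for name, pattern in REQUIRED.items()
        if not any(re.fullmatch(pattern, line) for line in lines)
    ]


# Hypothetical capture that lacks the advertise encap-type srv6 command.
SAMPLE = """\
isis 1
 address-family ipv6 unicast
  segment-routing ipv6 locator aaa
bgp 100
 address-family l2vpn evpn
  peer 3::3 enable
 ip vpn-instance vpn1
  address-family ipv4 unicast
   segment-routing ipv6 best-effort evpn
   segment-routing ipv6 locator aaa evpn
segment-routing ipv6
 encapsulation source-address 11::11
 locator aaa ipv6-prefix 300:1:: 64 static 8
"""

print(missing_items(SAMPLE))
```

Run the check on the configuration of both PE devices. An empty list means that all items from step 5 are present in some view.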
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
EVPN L3VPN over SRv6 TE traffic forwarding failure
Symptom
On an EVPN L3VPN over SRv6 network shown in Figure 99, traffic forwarding failure occurs when the PEs use the SRv6 TE mode to forward the private service traffic between CE 1 and CE 2 in VPN 1 through an SRv6 TE policy.
The troubleshooting flow is the same for IPv4 and IPv6 private networks. The following information uses IPv4 as an example to describe the troubleshooting procedure for EVPN L3VPN over SRv6.
Table 17 shows the key network planning information for the EVPN L3VPN over SRv6 network.
Table 17 SRv6 locators and major addresses in the address plan
Device | Interface or locator | Address | Device | Interface or locator | Address |
PE 1 | SRv6 Locator | 100:1::/64 | PE 2 | SRv6 Locator | 300:1::/64 |
PE 1 | Loopback0 | 1::1/128 | PE 2 | Loopback0 | 3::3/128 |
CE 1 | Loopback0 | 10.10.10.10/32 | CE 2 | Loopback0 | 20.20.20.20/32 |
CE 1 | Loopback1 | 11.11.11.11/32 | CE 2 | Loopback1 | 22.22.22.22/32 |
P | SRv6 Locator | 200:1::/64 | | | |
P | Loopback0 | 2::2/128 | | | |
Common causes
The following are the common causes of this type of issue:
· The PEs cannot learn VPN routes because of the failure to establish BGP EVPN peer relationships.
· Recursive routing is not performed in SRv6 TE mode.
· The SRv6 TE policy for the EVPN route is down.
· The routes for the SRv6 SIDs are unreachable.
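When a route fails to recurse to an SRv6 TE policy, the VPN routing table does not show the policy name in the Interface column for that route. The following Python code is a minimal sketch that lists such routes from captured display ip routing-table vpn-instance text. It assumes the column order Destination/Mask, Proto, Pre, Cost, NextHop, Interface. The function name and the sample rows are hypothetical.

```python
def routes_not_using_policy(routing_output, remote_pe, policy):
    """List destinations via remote_pe whose Interface column is not policy."""
    suspects = []
    for line in routing_output.splitlines():
        tokens = line.split()
        # Data rows: Destination/Mask Proto Pre Cost NextHop [Interface]
        if len(tokens) < 5 or "/" not in tokens[0] or tokens[4] != remote_pe:
            continue
        interface = tokens[5] if len(tokens) > 5 else None
        if interface != policy:
            suspects.append(tokens[0])
    return suspects


# Hypothetical capture: 20.20.20.20/32 did not recurse to policy p1.
SAMPLE = """\
Destinations : 3 Routes : 3
Destination/Mask Proto Pre Cost NextHop Interface
10.10.10.10/32 BGP 255 0 10.1.1.1
20.1.1.0/24 BGP 255 0 3::3 p1
20.20.20.20/32 BGP 255 0 3::3
"""

print(routes_not_using_policy(SAMPLE, remote_pe="3::3", policy="p1"))
```

Routes returned by the function are candidates for the recursion and policy state checks in this section.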
Troubleshooting flow
Figure 100 shows the troubleshooting flowchart.
Figure 100 Flowchart for troubleshooting EVPN L3VPN over SRv6 TE traffic forwarding failure
Solution
1. Ping the private IP on the remote PE from the local PE to check its connectivity. When you do that, specify the name of the VPN instance to which the private IP address belongs.
<Sysname> ping -vpn-instance vpn1 20.20.20.20
Ping 20.20.20.20 (20.20.20.20): 56 data bytes, press CTRL+C to break
56 bytes from 20.20.20.20: icmp_seq=0 ttl=254 time=2.000 ms
56 bytes from 20.20.20.20: icmp_seq=1 ttl=254 time=1.000 ms
56 bytes from 20.20.20.20: icmp_seq=2 ttl=254 time=1.000 ms
56 bytes from 20.20.20.20: icmp_seq=3 ttl=254 time=1.000 ms
56 bytes from 20.20.20.20: icmp_seq=4 ttl=254 time=2.000 ms
--- Ping statistics for 20.20.20.20 in VPN instance vpn1 ---
5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 1.000/1.400/2.000/0.490 ms
If the ping test fails, the private network service is unavailable. Proceed to the next step.
2. Perform the subsequent checks on both the local and remote PE devices. This document uses the local PE device as an example. Execute the display ip routing-table vpn-instance vpn1 command on the local PE device to check the VPN routing table for routes to the private IP addresses. Verify that the outgoing interface in the route to the remote private IP address is the name of the expected SRv6 TE policy.
<Sysname> display ip routing-table vpn-instance vpn1
Destinations : 10 Routes : 10
Destination/Mask Proto Pre Cost NextHop Interface
10.1.1.0/24 Direct 0 0 10.1.1.2
10.1.1.2/32 Direct 0 0 127.0.0.1
10.1.1.255/32 Direct 0 0 10.1.1.2
10.10.10.10/32 BGP 255 0 10.1.1.1
11.11.11.11/32 BGP 255 0 10.1.1.1
20.1.1.0/24 BGP 255 0 3::3 p1
20.20.20.20/32 BGP 255 0 3::3 p1
22.22.22.22/32 BGP 255 0 3::3 p1
127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0
255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
If the device has routes to the private IP addresses in the VPN instance, verify that the FIB contains entries for these addresses, with a U flag in the Flag field. The U flag in an entry indicates that the entry for the IP address is valid. Verify that the outgoing interface/token field in the route to the remote private IP address contains the forwarding index for the expected SRv6 TE policy.
<Sysname> display fib vpn-instance vpn1
Route destination count: 10
Directly-connected host count: 1
Flag:
U:Usable G:Gateway H:Host B:Blackhole D:Dynamic S:Static
R:Relay F:FRR
Destination/Mask Nexthop Flag OutInterface/Token Label
11.11.11.11/32 10.1.1.1 UGHR Null
127.0.0.0/8 127.0.0.1 U InLoop0 Null
10.1.1.0/24 10.1.1.2 U Null
20.20.20.20/32 3::3 UGHR 2150629377 Null
10.1.1.2/32 127.0.0.1 UH Null
22.22.22.22/32 3::3 UGHR 2150629377 Null
10.1.1.255/32 10.1.1.2 UBH Null
255.255.255.255/32 127.0.0.1 UH InLoop0 Null
10.10.10.10/32 10.1.1.1 UGHR Null
10.1.1.1/32 10.1.1.1 UH Null
20.1.1.0/24 3::3 UGR 2150629377 Null
Proceed to check for BGP EVPN route learning issues between PEs and verify that a valid SRv6 TE policy exists between them in either of the following situations:
· The VPN routing table or VPN FIB does not contain a valid entry for a private IP address.
· The entry does not contain the expected SRv6 TE policy as the outgoing interface.
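The FIB check in this step can be sketched in a few lines of Python. This is an illustrative helper under stated assumptions: the names `parse_fib` and `fib_entry_ok` are hypothetical, the input is the text of the display fib vpn-instance output, and columns are assumed to be separated by whitespace as in the sample.

```python
def parse_fib(output):
    """Map each destination prefix to its next hop, flags, and token."""
    entries = {}
    for line in output.splitlines():
        fields = line.split()
        # Entry rows start with a prefix such as 20.20.20.20/32.
        if len(fields) < 4 or not fields[0][0].isdigit() or "/" not in fields[0]:
            continue
        # The OutInterface/Token column is empty on some rows of the sample.
        token = fields[3] if len(fields) >= 5 else ""
        entries[fields[0]] = {"nexthop": fields[1], "flags": fields[2], "token": token}
    return entries

def fib_entry_ok(entries, dest, expected_token):
    """True if the entry exists, carries the U flag, and uses the expected token."""
    entry = entries.get(dest)
    return entry is not None and "U" in entry["flags"] and entry["token"] == expected_token

# Rows copied from the sample output above.
sample = """\
Destination/Mask Nexthop Flag OutInterface/Token Label
11.11.11.11/32 10.1.1.1 UGHR Null
127.0.0.0/8 127.0.0.1 U InLoop0 Null
20.20.20.20/32 3::3 UGHR 2150629377 Null
20.1.1.0/24 3::3 UGR 2150629377 Null
"""

entries = parse_fib(sample)
print(fib_entry_ok(entries, "20.20.20.20/32", "2150629377"))
```

The expected token is the forwarding index of the SRv6 TE policy, which is displayed by the display segment-routing ipv6 te policy command.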
3. Execute the display bgp peer l2vpn evpn command on the local PE device to verify that it has established a BGP EVPN peer relationship with the remote PE device.
¡ If the PE devices have established a BGP EVPN peer relationship successfully, the State field in the command output displays Established. Proceed to step 4.
¡ If the PE devices have not established a BGP EVPN peer relationship, see the troubleshooting procedure for BGP peer establishment issues.
<Sysname> display bgp peer l2vpn evpn
BGP local router ID: 1.1.1.1
Local AS number: 100
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
3::3 100 145 145 0 3 01:56:37 Established
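When many peers are configured, the State column can be checked programmatically. The sketch below assumes the eight-column peer table layout shown in the sample output. The function name `bgp_peer_states` is hypothetical.

```python
def bgp_peer_states(output):
    """Return a mapping of peer address to the value of the State column."""
    states = {}
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if fields[:2] == ["Peer", "AS"]:
            in_table = True  # rows after the header are peer entries
            continue
        # Assumption: each peer row has exactly the eight columns of the sample.
        if in_table and len(fields) == 8:
            states[fields[0]] = fields[7]
    return states

# Output copied from the sample above.
sample = """\
BGP local router ID: 1.1.1.1
Local AS number: 100
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
3::3 100 145 145 0 3 01:56:37 Established
"""

states = bgp_peer_states(sample)
print(states.get("3::3") == "Established")
```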
4. Execute the display bgp l2vpn evpn command on the local PE to verify that it has learned the BGP EVPN routes from the remote PE. Pay special attention to the PrefixSID field for each route in the command output. In this field, the End.DT4 SID represents the SRv6 SID assigned by the remote PE to the VPN private IP address. When forwarding traffic to that private address in SRv6 TE mode, the local PE encapsulates the SID list of the SRv6 TE policy in the SRH extension header, with that End.DT4 SID as the last SID.
<Sysname> display bgp l2vpn evpn [5][0][32][20.20.20.20]/80
BGP local router ID: 1.1.1.1
Local AS number: 100
Route distinguisher: 100:1(vpn1)
Total number of routes: 1
Paths: 1 available, 1 best
BGP routing table information of [5][0][32][20.20.20.20]/80:
From : 3::3 (3.3.3.3)
Rely nexthop : FE80::A2C3:E2FF:FEB5:306
Original nexthop: 3::3
Out interface :
Route age : 00h28m51s
OutLabel : 3
Ext-Community : <RT: 100:1>
RxPathID : 0x0
TxPathID : 0x0
PrefixSID : End.DT4 SID <300:1::A>
SRv6 Service TLV (37 bytes):
Type: SRV6 L3 Service TLV (5)
Length: 34 bytes, Reserved: 0x0
SRv6 Service Information Sub-TLV (33 bytes):
Type: 1 Length: 30, Rsvdl: 0x0
SID Flags: 0x0 Endpoint behavior: 0x13 Rsvd2: 0x0
SRv6 SID Sub-Sub-TLV:
Type: 1 Len: 6
BL: 64 NL: 0 FL: 64 AL: 0 TL: 0 TO: 0
AS-path : 300
Origin : incomplete
Attribute value : MED 0, localpref 100, pref-val 0
State : valid, internal, best
Source type : local
IP precedence : N/A
QoS local ID : N/A
Traffic index : N/A
EVPN route type : IP prefix advertisement route
ESI : 0000.0000.0000.0000.0000
Ethernet tag ID : 0
IP prefix : 20.20.20.20/32
Gateway address : 0.0.0.0
MPLS label : 3
Tunnel policy : NULL
Rely tunnel IDs : N/A
Re-orignination : Disable
If the local PE has learned a BGP EVPN route to the remote destination address and its PrefixSID attribute is correct, execute the display bgp routing-table ipv4 vpn-instance command on the local PE to verify that the route has been added to the BGP VPN routing table. Verify that the route is both valid and the best. Only valid and optimal BGP routes can be learned into a VPN routing table.
<Sysname> display bgp routing-table ipv4 vpn-instance vpn1 20.20.20.20
BGP local router ID: 1.1.1.1
Local AS number: 100
Paths: 1 available, 1 best
BGP routing table information of 20.20.20.20/32:
From : 3::3 (3.3.3.3)
Rely nexthop : FE80::A2C3:E2FF:FEB5:306
Original nexthop: 3::3
Out interface :
Route age : 02h03m22s
OutLabel : 3
Ext-Community : <RT: 100:1>
RxPathID : 0x0
TxPathID : 0x0
PrefixSID : End.DT4 SID <300:1::A>
SRv6 Service TLV (37 bytes):
Type: SRV6 L3 Service TLV (5)
Length: 34 bytes, Reserved: 0x0
SRv6 Service Information Sub-TLV (33 bytes):
Type: 1 Length: 30, Rsvdl: 0x0
SID Flags: 0x0 Endpoint behavior: 0x13 Rsvd2: 0x0
SRv6 SID Sub-Sub-TLV:
Type: 1 Len: 6
BL: 64 NL: 0 FL: 64 AL: 0 TL: 0 TO: 0
AS-path : 300
Origin : incomplete
Attribute value : MED 0, localpref 100, pref-val 0
State : valid, internal, best, remoteredist
Source type : evpn remote-import
IP precedence : N/A
QoS local ID : N/A
Traffic index : N/A
Tunnel policy : a
Rely tunnel IDs : 2150629377
If the local PE has failed to learn a valid and optimal BGP EVPN route, or if the PrefixSID attribute is missing from the route, proceed to check for the following issues:
· Incorrect recursive routing configuration for EVPN L3VPN over SRv6.
· SRv6 TE policy issues.
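The fields that matter in this step (State, PrefixSID, and Rely tunnel IDs) can be pulled out of the detailed route output with a short script. This is a sketch only: `route_summary` is a hypothetical name, and the field labels are taken from the sample output above.

```python
import re

def route_summary(output):
    """Extract the state flags, End.DT4 SID, and rely tunnel ID of a BGP route."""
    state = re.search(r"^[ \t]*State[ \t]*:[ \t]*(.+)$", output, re.MULTILINE)
    sid = re.search(r"^[ \t]*PrefixSID[ \t]*:[ \t]*End\.DT4 SID <([^>]+)>", output, re.MULTILINE)
    tunnel = re.search(r"^[ \t]*Rely tunnel IDs[ \t]*:[ \t]*(\S+)", output, re.MULTILINE)
    flags = [flag.strip() for flag in state.group(1).split(",")] if state else []
    return {
        "valid": "valid" in flags,
        "best": "best" in flags,
        "end_dt4_sid": sid.group(1) if sid else None,
        "rely_tunnel_ids": tunnel.group(1) if tunnel else None,
    }

# Lines copied from the display bgp routing-table ipv4 vpn-instance sample.
sample = """\
PrefixSID : End.DT4 SID <300:1::A>
State : valid, internal, best, remoteredist
Source type : evpn remote-import
Tunnel policy : a
Rely tunnel IDs : 2150629377
"""

summary = route_summary(sample)
print(summary)
```

A route that is not both valid and best, or that has no End.DT4 SID, points to the recursive routing or SRv6 TE policy checks in the following steps.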
5. In BGP-VPN IPv4 unicast address family view, execute the display this command to verify that the current configuration includes the segment-routing ipv6 traffic-engineering evpn or segment-routing ipv6 traffic-engineering best-effort evpn command. If neither of the commands is present, add the configuration. If either command is present, proceed to step 6.
<Sysname> system-view
[Sysname] bgp 100
[Sysname-bgp-default] ip vpn-instance vpn1
[Sysname-bgp-default-vpn1] address-family ipv4 unicast
[Sysname-bgp-default-ipv4-vpn1] display this
#
segment-routing ipv6 locator aaa evpn
segment-routing ipv6 traffic-engineering evpn
#
If the above recursive routing configuration is correct, see the SRv6 TE policy configuration in the segment routing configuration guide for the product to verify that the basic SRv6 configuration and the configuration for steering traffic to SRv6 TE policies are correct. If all settings are correct, proceed to the next step.
6. On each PE device, verify that the SRv6 TE policy in the route to the remote destination IP address is valid. Execute the display segment-routing ipv6 te policy command on each PE device. Identify the SRv6 TE policy that has a forwarding index value that is the same as the rely tunnel IDs value in the route to the destination IP address displayed by executing the display bgp routing-table command. Examine the Status field for the SRv6 TE policy to verify that it is up. If the policy is up, proceed to the next step. If the policy is down, see the SRv6 TE policy down issue troubleshooting procedure to resolve the issue.
<Sysname> display segment-routing ipv6 te policy
Name/ID: p1/0
Color: 10
Endpoint: 1000::1
Name from BGP:
BSID:
Mode: Dynamic Type: Type 2 Request state: Succeeded
Current BSID: 8000::1 Explicit BSID: - Dynamic BSID: 8000::1
Reference counts: 3
Flags: A/BS/NC
Status: Up
AdminStatus: Up
Up time: 2020-03-09 16:09:40
Down time: 2020-03-09 16:09:13
Hot backup: Enabled
Statistics: Enabled
Statistics by service class: Enabled
Path verification: Enabled
Drop-upon-invalid: Enabled
BFD trigger path-down: Enabled
SBFD: Enabled
Remote: 1000
SBFD template name: abc
SBFD backup template name: -
OAM SID: -
BFD Echo: Disabled
Forwarding index: 2150629377
…
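Matching the Rely tunnel IDs value of the route against the Forwarding index field of each policy can be sketched as follows. The helper names `parse_policies` and `find_policy` are hypothetical, and the parser assumes that each policy block starts with a Name/ID line as in the sample output.

```python
def parse_policies(output):
    """Collect the name, status, and forwarding index of each SRv6 TE policy."""
    policies = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Name/ID:"):
            name = line.split(":", 1)[1].strip().split("/")[0]
            current = {"name": name}
            policies.append(current)
        elif current is not None and line.startswith("Status:"):
            current["status"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Forwarding index:"):
            # Keep the first policy-level value only.
            current.setdefault("forwarding_index", line.split(":", 1)[1].strip())
    return policies

def find_policy(policies, tunnel_id):
    """Return the policy whose forwarding index equals the rely tunnel ID."""
    for policy in policies:
        if policy.get("forwarding_index") == tunnel_id:
            return policy
    return None

# Lines copied from the sample output above.
sample = """\
Name/ID: p1/0
Color: 10
Status: Up
AdminStatus: Up
Forwarding index: 2150629377
"""

policy = find_policy(parse_policies(sample), "2150629377")
print(policy)
```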
Execute the ping srv6-te policy command to verify that the SRv6 TE policy has a valid path to reach the destination IP address. If BFD or SBFD is not configured to monitor the connectivity of the SRv6 TE policy, the up state of the policy only indicates that the first hop of the policy is reachable. To verify that all SIDs in the forwarding path are reachable, you must perform this step.
<Sysname> ping srv6-te policy policy-name p1
Ping SRv6-TE policy (56 data bytes) , press CTRL+C to break
Segment list ID: 1
Preference=10, Path Type=Main, Protocol origin=Local, Originator=0,0.0.0.0, Discriminator=10, End.OP=none
56 bytes from 300:1::1, icmp_seq=0 ttl=63 time=2.000 ms
56 bytes from 300:1::1, icmp_seq=1 ttl=63 time=1.000 ms
56 bytes from 300:1::1, icmp_seq=2 ttl=63 time=0.000 ms
56 bytes from 300:1::1, icmp_seq=3 ttl=63 time=0.000 ms
56 bytes from 300:1::1, icmp_seq=4 ttl=63 time=0.000 ms
--- Ping6 SRv6-TE Policy statistics ---
5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.000/0.600/2.000/0.800 ms
If the ping srv6-te policy command output shows that the SRv6 TE policy is not reachable, proceed to the next step.
7. Execute the display segment-routing ipv6 te segment-list command to identify the SIDs in the SID list of the best candidate path in the SRv6 TE policy.
<Sysname> display segment-routing ipv6 te segment-list
Total Segment lists: 2
Name/ID: s1/1
Origin: CLI
Status: Up
Nodes : 2
Flags: None
Local BSID: -
Reverse BSID: -
Reference counts: 0
Index : 10 SID: 200:1::1
Status : Up TopoStatus: Nonexistent
Type : Type_2 Flags: None
Coc Type : - Common prefix length: 0
Function length : 0 Args length: 0
Endpoint Behavior : -
Index : 20 SID: 300:1::1
Status : - TopoStatus: -
Type : Type_2 Flags: None
Coc Type : - Common prefix length: 0
Function length : 0 Args length: 0
Endpoint Behavior : -
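To build the list of SIDs to test in the route and ping checks, the SIDs can be extracted per SID list from the command output. This is a minimal sketch. `sids_by_segment_list` is a hypothetical name, and the Index and SID line layout is assumed to match the sample.

```python
import re

def sids_by_segment_list(output):
    """Return a mapping of SID list name to its SIDs, in index order of appearance."""
    result = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Name/ID:"):
            current = line.split(":", 1)[1].strip().split("/")[0]
            result[current] = []
            continue
        match = re.match(r"Index\s*:\s*\d+\s+SID:\s*(\S+)", line)
        if match and current is not None:
            result[current].append(match.group(1))
    return result

# Lines copied from the sample output above.
sample = """\
Total Segment lists: 2
Name/ID: s1/1
Origin: CLI
Status: Up
Index : 10 SID: 200:1::1
Status : Up TopoStatus: Nonexistent
Index : 20 SID: 300:1::1
Status : - TopoStatus: -
"""

sids = sids_by_segment_list(sample)
print(sids)
```

The End.DT4 SID from the BGP route is not part of the SID list and must be added to the set of SIDs to test.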
Execute the display ipv6 routing-table ipv6-address command on each device in the forwarding path to verify that they have a valid route to each SRv6 SID, including the End.DT4 SID.
<Sysname> display ipv6 routing-table 300:1::A
Summary count : 2
Destination: 300:1::/64 Protocol : IS_L1
NextHop : FE80::A2C3:E2FF:FEB5:306 Preference: 15
Interface : Cost : 20
Execute the ping ipv6 command on each device in the forwarding path to verify the connectivity to the SRv6 SID.
<Sysname> ping ipv6 300:1::A
Ping6(56 data bytes) 1001::1 --> 300:1::A, press CTRL+C to break
56 bytes from 300:1::A, icmp_seq=0 hlim=63 time=2.000 ms
56 bytes from 300:1::A, icmp_seq=1 hlim=63 time=1.000 ms
56 bytes from 300:1::A, icmp_seq=2 hlim=63 time=0.000 ms
56 bytes from 300:1::A, icmp_seq=3 hlim=63 time=1.000 ms
56 bytes from 300:1::A, icmp_seq=4 hlim=63 time=0.000 ms
--- Ping6 statistics for 300:1::A ---
5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.000/0.800/2.000/0.748 ms
If a SID is unreachable, check the SID list configuration for the SRv6 TE policy on the source node for SID list orchestration errors. If the SID list is correct, check whether the IGP has failed to advertise the locators that contain the SIDs to other devices. For more information about troubleshooting IGP, see the IP routing troubleshooting procedures.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting EVPN VPWS over SRv6
Troubleshooting EVPN VPWS over SRv6 BE traffic forwarding failure
Symptom
In an EVPN VPWS over SRv6 network, traffic forwarding fails in SRv6 BE mode.
Common causes
The following are the common causes of this type of issue:
· The BGP EVPN peers are not successfully established.
· The EVPN instance configurations do not match on both ends.
· The AC interface state is not up, or the AC access methods configured on both ends are different.
· The EVPN route cannot be steered to the SRv6 BE tunnel.
Analysis
Figure 101 shows the troubleshooting flowchart.
Figure 101 Flowchart for troubleshooting EVPN VPWS over SRv6 BE traffic forwarding failure
Solution
To resolve the issue:
1. Verify that the BGP EVPN peers are successfully established.
Execute the display bgp peer l2vpn evpn command on the local PE to verify that the BGP EVPN peers have been successfully established.
¡ If the State field in the command output displays Established, the BGP EVPN peer relationship has been established between the PEs. Proceed to step 2.
¡ If the State field displays any other value, resolve the BGP EVPN peer establishment failure. For more information, see the analysis for locating the BGP peer establishment failure.
<PE1> display bgp peer l2vpn evpn
BGP local router ID: 1.1.1.1
Local AS number: 100
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
2::2 100 13 10 0 2 00:00:05 Established
2. Verify that the EVPN VPWS over SRv6 configurations on both PEs match.
In the EVPN VPWS over SRv6 network, the Route Target and Service ID settings on both ends must match. The encapsulation method used by EVPN must be SRv6. In addition, the MTU, control word, and SRv6 PW data encapsulation type settings must be the same.
Verify that the configurations on both PEs match by following these steps:
a. Execute the display this command in cross-connect group view of the two PEs to check the encapsulation method and Route Target of EVPN. If the encapsulation method is not SRv6, execute the evpn encapsulation srv6 command in cross-connect group view to modify the encapsulation method. If the Export RT value of one PE is not within the Import RT value range of the other PE, execute the vpn-target command in cross-connect group EVPN instance view to edit the RT value, so that the RTs of the two PEs match.
<PE1> system-view
[PE1] xconnect-group vpna
[PE1-xcg-vpna] display this
#
xconnect-group vpna
evpn encapsulation srv6
route-distinguisher 1:1
vpn-target 1:1 export-extcommunity
vpn-target 1:1 import-extcommunity
connection abc
segment-routing ipv6 locator aaa
evpn local-service-id 1 remote-service-id 2
ac interface GigabitEthernet 2/0/1
#
return
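The RT rule in this substep (an export RT of each PE must appear among the import RTs of the other PE) can be checked from the two display this outputs. This is a sketch under assumptions: the names `route_targets` and `rts_match` are hypothetical, and only the two vpn-target line forms shown in the sample are handled.

```python
def route_targets(config):
    """Return (import RTs, export RTs) found in vpn-target lines."""
    imports, exports = set(), set()
    for line in config.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "vpn-target":
            continue
        values, direction = fields[1:-1], fields[-1]
        if direction == "export-extcommunity":
            exports.update(values)
        elif direction == "import-extcommunity":
            imports.update(values)
    return imports, exports

def rts_match(local_cfg, remote_cfg):
    """True if each PE exports at least one RT that the other PE imports."""
    local_import, local_export = route_targets(local_cfg)
    remote_import, remote_export = route_targets(remote_cfg)
    return bool(local_export & remote_import) and bool(remote_export & local_import)

# PE1 lines are copied from the sample above. The PE2 text is illustrative.
pe1 = """\
xconnect-group vpna
evpn encapsulation srv6
vpn-target 1:1 export-extcommunity
vpn-target 1:1 import-extcommunity
"""
pe2_bad = pe1.replace("vpn-target 1:1 import-extcommunity", "vpn-target 2:2 import-extcommunity")

print(rts_match(pe1, pe1), rts_match(pe1, pe2_bad))
```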
b. Execute the display evpn route xconnect-group command on both PEs to view the service ID, MTU, SRv6 PW data encapsulation type, and control word information.
- Service ID: The local service ID on one PE must be the same as the remote service ID on the other PE. If they are different, the SRv6 PW cannot be established. If the service IDs do not match, you need to execute the evpn local-service-id remote-service-id command in cross-connect view of the PE to edit the local service ID or remote service ID for them to match each other.
- MTU: View the local MTU value through the Local MTU field. If the MTU values of both ends are different, you need to edit the MTU value by executing the mtu command in cross-connect view. If the MTU value on one PE is 0, it can match any MTU value on the remote PE, and you do not need to edit the MTU.
- SRv6 PW data encapsulation type: Check the local SRv6 PW data encapsulation type via the PW type field. If the data encapsulation types on both ends are different, you need to edit the data encapsulation type of the PW in the PW template specified for the SRv6 PW with the srv6-pw-type command.
- Control word: The control word settings on both PEs must be identical. If the Flags field value does not include C, the control word feature is not enabled. Otherwise, the control word feature is enabled. If the control word settings on both PEs are different, you need to modify the control word configuration in the PW template specified for the SRv6 PW with the control-word enable command.
<PE1> display evpn route xconnect-group
Ctrl Flags: P - Primary, B - Backup, C - Control word
Xconnect group name: vpna
Connection name: pw1
Encapsulation : SRv6
ESI : 0000.0000.0000.0000.0000
Local service ID : 1
Remote service ID : 2
In SID[DX2] : 100::1:0:2
In SID[DX2L] : -
Local MTU : 1500
AC State : Up
Tunnel policy : -
PW class : -
PW type : Ethernet
SRv6 Tunnel:
Next Hop : 2::2
ESI : 0000.0000.0000.0000.0000
Out SID : 200::1:0:2
Flags : P
MTU : 1500
State : Up
If the settings on both PEs match but the issue persists, proceed to the next step.
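The four matching rules in this substep can be expressed as a comparison of the display evpn route xconnect-group outputs from both PEs. The following is a sketch, not a definitive implementation: `field` and `vpws_mismatches` are hypothetical names, the PE2 text is invented for illustration, and the control word check assumes that the Flags value is a string of flag letters.

```python
import re

def field(output, name):
    """Return the first value of a 'name : value' line, or None if absent."""
    pattern = rf"^[ \t]*{re.escape(name)}[ \t]*:[ \t]*(\S+)"
    match = re.search(pattern, output, re.MULTILINE)
    return match.group(1) if match else None

def vpws_mismatches(local, remote):
    """List the settings that do not match between the two PEs."""
    problems = []
    # The local service ID on one PE must equal the remote service ID on the other.
    if (field(local, "Local service ID") != field(remote, "Remote service ID")
            or field(local, "Remote service ID") != field(remote, "Local service ID")):
        problems.append("service ID")
    # An MTU of 0 on one PE matches any MTU on the other PE.
    mtus = (field(local, "Local MTU"), field(remote, "Local MTU"))
    if mtus[0] != mtus[1] and "0" not in mtus:
        problems.append("MTU")
    if field(local, "PW type") != field(remote, "PW type"):
        problems.append("PW type")
    # Control word is enabled only when the Flags value includes C.
    if ("C" in (field(local, "Flags") or "")) != ("C" in (field(remote, "Flags") or "")):
        problems.append("control word")
    return problems

pe1 = """\
Ctrl Flags: P - Primary, B - Backup, C - Control word
Local service ID : 1
Remote service ID : 2
Local MTU : 1500
PW type : Ethernet
Flags : P
"""
# Illustrative peer output with the service IDs mirrored.
pe2 = """\
Ctrl Flags: P - Primary, B - Backup, C - Control word
Local service ID : 2
Remote service ID : 1
Local MTU : 1500
PW type : Ethernet
Flags : P
"""

print(vpws_mismatches(pe1, pe2))
```

An empty result means that the compared settings match and the check can move on to the AC state.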
3. Verify that the AC interface is up.
Execute the display evpn route xconnect-group command on the PE to view the state of the AC. If the AC is in down state, check the network connection and resolve the physical link down issue.
<PE1> display evpn route xconnect-group
Ctrl Flags: P - Primary, B - Backup, C - Control word
Xconnect group name: vpna
Connection name: pw1
Encapsulation : SRv6
ESI : 0000.0000.0000.0000.0000
Local service ID : 1
Remote service ID : 2
In SID[DX2] : 100::1:0:2
In SID[DX2L] : -
Local MTU : 1500
AC State : Up
SRv6 Tunnel:
Next Hop : 2::2
ESI : 0000.0000.0000.0000.0000
Out SID : 200::1:0:2
Flags : P
MTU : 1500
State : Up
4. Verify that the AC access modes on both PEs are consistent.
Execute the display l2vpn forwarding ac verbose command on both PEs to check the AC access mode. If the two ends use different AC access modes, traffic forwarding might fail. You need to modify the AC access mode through the access-mode keyword of the ac interface command in cross-connect view.
<PE1> display l2vpn forwarding ac verbose
Xconnect-group Name: vpws1
Connection Name: actopw
Interface:
Link ID : 1
Access Mode : Ethernet
Interface:
Link ID : 1
Access Mode : Ethernet
Reflector :
IP Address : 100.1.1.4
MAC Address : 8850-fc51-5cee
Src Port : 200
Dst Port : 201
5. Verify that the EVPN route is steered to the SRv6 BE tunnel.
Execute the display l2vpn peer srv6 verbose command on the PE to examine the SRv6 BE tunnel to which the EVPN route is steered.
<PE1> display l2vpn peer srv6 verbose
Xconnect-group Name: vpna
Connection Name: pw1
Peer: 2::2
Remote Service ID : 2
Signaling Protocol : EVPN
Link ID : 0x1
Sub Link ID : 0x0
SRv6 Tunnel State : Up
In SID : 100::1:0:2
Out SID : 200::1:0:2
MTU : 1500
SRv6 Tunnel Attributes : Main
Tunnel Group ID : 0x1000000030080000
Tunnel NHLFE IDs : -
Nexthop/Interface : FE80::7A6F:24FF:FE26:306 / GE2/0/2
Color : -
Color-Only : -
Recursion Mode : SID based
¡ If the Nexthop/Interface field has a value, the EVPN route is steered to the SRv6 BE tunnel. Execute the display ipv6 fib command to identify whether the forwarding information of the next hop address in the output information is accurate. If it is not accurate, contact technical support.
<PE1> display ipv6 fib FE80::7A6F:24FF:FE26:306
FIB entry count: 1
Flag:
U:Usable G:Gateway H:Host B:Blackhole D:Dynamic S:Static
R:Relay F:FRR
Destination: FE80:: Prefix length: 10
Nexthop : :: Flags: U
Time stamp : 0x1 Label: Null
Interface : InLoop0 Token: Invalid
¡ If the value of the Nexthop/Interface field is a hyphen (-), the EVPN route is not steered to the SRv6 BE tunnel. Execute the display ipv6 routing-table command to verify that a route is available to the SRv6 SID. If no such IPv6 route exists, resolve the IGP route learning issue. For more information, see the Layer 3 IP routing troubleshooting guide.
<PE1> display ipv6 routing-table 200::1:0:2
Summary count : 1
Destination: 200::/64 Protocol : O_INTRA
NextHop : FE80::7A6F:24FF:FE26:306 Preference: 10
Interface : GE2/0/2 Cost : 2
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting EVPN VPWS over SRv6 TE traffic forwarding failure
Symptom
In an EVPN VPWS over SRv6 network, traffic forwarding through an SRv6 TE policy fails.
Common causes
The following are the common causes of this type of issue:
· The BGP EVPN peers are not successfully established.
· The EVPN instance configurations do not match on both ends.
· The AC interface state is not up, or the AC access methods configured on both ends are different.
· Traffic steering in SRv6 TE mode is not configured.
· The EVPN route cannot be steered to the SRv6 TE policy.
Analysis
Figure 102 shows the troubleshooting flowchart.
Figure 102 Flowchart for troubleshooting EVPN VPWS over SRv6 TE policy traffic forwarding failure
Solution
To resolve the issue:
1. Verify that the BGP EVPN peers are successfully established.
Execute the display bgp peer l2vpn evpn command on the local PE to verify that the BGP EVPN peers have been successfully established.
¡ If the State field in the command output displays Established, the BGP EVPN peer relationship has been established between the PEs. Proceed to step 2.
¡ If the State field displays any other value, resolve the BGP EVPN peer establishment failure. For more information, see the analysis for locating the BGP peer establishment failure.
<PE1> display bgp peer l2vpn evpn
BGP local router ID: 1.1.1.1
Local AS number: 100
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
2::2 100 13 10 0 2 00:00:05 Established
2. Verify that the EVPN VPWS over SRv6 configurations on both PEs match.
In the EVPN VPWS over SRv6 network, the Route Target and Service ID settings on both ends must match. The encapsulation method used by EVPN must be SRv6. In addition, the MTU, control word, and PW data encapsulation type settings must be the same.
Verify that the configurations on both PEs match by following these steps:
a. Execute the display this command in cross-connect group view of the two PEs to check the encapsulation method and Route Target of EVPN. If the encapsulation method is not SRv6, execute the evpn encapsulation srv6 command in cross-connect group view to modify the encapsulation method. If the Export RT value of one PE is not within the Import RT value range of the other PE, execute the vpn-target command in cross-connect group EVPN instance view to edit the RT value, so that the RTs of the two PEs match.
<PE1> system-view
[PE1] xconnect-group vpna
[PE1-xcg-vpna] display this
#
xconnect-group vpna
evpn encapsulation srv6
route-distinguisher 1:1
vpn-target 1:1 export-extcommunity
vpn-target 1:1 import-extcommunity
connection abc
segment-routing ipv6 locator aaa
evpn local-service-id 1 remote-service-id 2
ac interface GigabitEthernet 2/0/1
#
return
b. Execute the display evpn route xconnect-group command on both PEs to view the service ID, MTU, SRv6 PW data encapsulation type, and control word information.
- Service ID: The local service ID on one PE must be the same as the remote service ID on the other PE. If they are different, the SRv6 PW cannot be established. If the service IDs do not match, you need to execute the evpn local-service-id remote-service-id command in cross-connect view of the PE to edit the local service ID or remote service ID for them to match each other.
- MTU: View the local MTU value through the Local MTU field. If the MTU values of both ends are different, you need to edit the MTU value by executing the mtu command in cross-connect view. If the MTU value on one PE is 0, it can match any MTU value on the remote PE, and you do not need to edit the MTU.
- SRv6 PW data encapsulation type: Check the local SRv6 PW data encapsulation type via the PW type field. If the data encapsulation types on both ends are different, you need to edit the data encapsulation type of the PW in the PW template specified for the SRv6 PW with the srv6-pw-type command.
- Control word: The control word settings on both PEs must be identical. If the Flags field value does not include C, the control word feature is not enabled. Otherwise, the control word feature is enabled. If the control word settings on both PEs are different, you need to modify the control word configuration in the PW template specified for the SRv6 PW with the control-word enable command.
<PE1> display evpn route xconnect-group
Ctrl Flags: P - Primary, B - Backup, C - Control word
Xconnect group name: vpna
Connection name: pw1
Encapsulation : SRv6
ESI : 0000.0000.0000.0000.0000
Local service ID : 1
Remote service ID : 2
In SID[DX2] : 100::1:0:2
In SID[DX2L] : -
Local MTU : 1500
AC State : Up
Tunnel policy : -
PW class : -
PW type : Ethernet
SRv6 Tunnel:
Next Hop : 2::2
ESI : 0000.0000.0000.0000.0000
Out SID : 200::1:0:2
Flags : P
MTU : 1500
State : Up
If the settings on both PEs match but the issue persists, proceed to the next step.
3. Verify that the AC interface is up.
Execute the display evpn route xconnect-group command on the PE to view the state of the AC. If the AC is in down state, check the network connection and resolve the physical link down issue.
<PE1> display evpn route xconnect-group
Ctrl Flags: P - Primary, B - Backup, C - Control word
Xconnect group name: vpna
Connection name: pw1
Encapsulation : SRv6
ESI : 0000.0000.0000.0000.0000
Local service ID : 1
Remote service ID : 2
In SID[DX2] : 100::1:0:2
In SID[DX2L] : -
Local MTU : 1500
AC State : Up
SRv6 Tunnel:
Next Hop : 2::2
ESI : 0000.0000.0000.0000.0000
Out SID : 200::1:0:2
Flags : P
MTU : 1500
State : Up
4. Verify that the AC access modes on both PEs are consistent.
Execute the display l2vpn forwarding ac verbose command on both PEs to check the AC access mode. If the two ends use different AC access modes, traffic forwarding might fail. You need to modify the AC access mode through the access-mode keyword of the ac interface command in cross-connect view.
<PE1> display l2vpn forwarding ac verbose
Xconnect-group Name: vpws1
Connection Name: actopw
Interface:
Link ID : 1
Access Mode : Ethernet
Interface:
Link ID : 1
Access Mode : Ethernet
Reflector :
IP Address : 100.1.1.4
MAC Address : 8850-fc51-5cee
Src Port : 200
Dst Port : 201
5. Verify that traffic steering in SRv6 TE mode is configured.
In cross-connect group EVPN instance view, execute the display this command to verify that the segment-routing ipv6 traffic-engineering or segment-routing ipv6 traffic-engineering best-effort command is configured. If neither command is configured, configure one of them. If either command is configured, proceed to step 6.
<PE1> system-view
[PE1] xconnect-group vpna
[PE1-xcg-vpna] evpn encapsulation srv6
[PE1-xcg-vpna-evpn-srv6] display this
#
evpn encapsulation srv6
route-distinguisher 1:1
vpn-target 1:1 export-extcommunity
vpn-target 1:1 import-extcommunity
segment-routing ipv6 traffic-engineering
#
return
6. Verify that the EVPN route is steered to the SRv6 TE policy.
Execute the display l2vpn peer srv6 verbose command on the PE to examine the SRv6 TE policy to which the EVPN route is steered.
<PE1> display l2vpn peer srv6 verbose
Xconnect-group Name: vpna
Connection Name: pw1
Peer: 2::2
Remote Service ID : 2
Signaling Protocol : EVPN
Link ID : 0x1
Sub Link ID : 0x0
SRv6 Tunnel State : Up
In SID : 100::1:0:2
Out SID : 200::1:0:2
MTU : 1500
SRv6 Tunnel Attributes : Main
Tunnel Group ID : 0x1000000230080001
Tunnel NHLFE IDs : 2150629377
Nexthop/Interface : -
Color : 10
Color-Only : 11
Recursion Mode : Nexthop based
¡ If the Tunnel NHLFE IDs field has a value, the EVPN route is steered to the SRv6 TE policy, and this value is the tunnel index of the SRv6 TE policy to which the EVPN route is steered. Execute the display segment-routing ipv6 te policy command on the PE. The SRv6 TE policy whose Forwarding index field value is the same as the Tunnel NHLFE IDs field value is the policy to which the EVPN route is steered. Then, execute the display l2vpn forwarding srv6 verbose command on the PE and check the SRv6 Tunnel State field. If the state is Down, contact Technical Support. If the state is Up, check for issues such as a mismatch between the SID list of the SRv6 TE policy and the planned packet forwarding path, or a faulty physical link on the packet forwarding path of the SRv6 TE policy. To resolve such issues, see the analysis for locating the issue that the SRv6 TE policy cannot take effect.
<PE1> display l2vpn forwarding srv6 verbose
Xconnect-group Name: vpna
Connection Name: pw1
Link ID : 0x1
SRv6 Tunnel Type : Ethernet
SRv6 Tunnel State : Up
In SID : 100::1:0:2
Out SID : 200::1:0:2
MTU : 1500
SRv6 Tunnel Attributes : Main
SRv6 Forwarding IDs : 2150629377
<PE1> display segment-routing ipv6 te policy
Name/ID: p1/0
Color: 10
End-point: 2::2
Name from BGP:
Name from PCE:
BSID:
Mode: Dynamic Type: Type_2 Request state: Succeeded
Current BSID: 100::1:0:1 Explicit BSID: - Dynamic BSID: 100::1:0:1
Reference counts: 5
Flags: A/BS/NC
Status: Up
AdminStatus: Up
Up time: 2022-05-13 18:53:48
Down time: 2022-05-13 18:49:56
Hot backup: Disabled
Statistics: Disabled
Statistics by service class: Disabled
Path verification: Not configured
Drop-upon-invalid: Disabled
BFD trigger path-down: Disabled
SBFD: Disabled
BFD Echo: Disabled
BFD no-bypass: Disabled
Forwarding index: 2150629377
Association ID: 1
Service-class: -
Rate-limit: -
PCE delegation: Disabled
PCE delegate report-only: Disabled
Reoptimization: Disabled
Encapsulation mode: -
Flapping suppression Remaining interval: -
Candidate paths state: Configured
Candidate paths statistics:
CLI paths: 1 BGP paths: 0 PCEP paths: 0 ODN paths: 0
Candidate paths:
Preference : 10
Network slice ID: -
CPathName:
ProtoOrigin: CLI Discriminator: 10
Instance ID: 0 Node address: 0.0.0.0
Originator: 0, ::
Optimal: Y Flags: V/A
Dynamic: Not configured
PCEP: Not configured
Explicit SID list:
ID: 1 Name: s1
Weight: 1 Forwarding index: 2149580802
State: Up State(-): -
Verification State: -
Path MTU: 1500 Path MTU Reserved: 0
Local BSID: -
Reverse BSID: -
¡ If the Tunnel NHLFE IDs field value is a hyphen (-), the EVPN route is not steered to the SRv6 TE policy. You need to identify whether the SRv6 TE policy configuration on the PE is correct, and resolve the issue that the SRv6 TE policy cannot come up. For more information, see the analysis for locating the issue that the SRv6 TE policy cannot take effect.
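The interpretation of the Tunnel NHLFE IDs and Nexthop/Interface fields in the display l2vpn peer srv6 verbose output can be summarized in code. This is a sketch with hypothetical names (`field`, `steering_result`). It reads only the first value of each field, which is an assumption when a field lists several IDs.

```python
import re

def field(output, name):
    """Return the first value of a 'name : value' line, or None if absent."""
    pattern = rf"^[ \t]*{re.escape(name)}[ \t]*:[ \t]*(\S+)"
    match = re.search(pattern, output, re.MULTILINE)
    return match.group(1) if match else None

def steering_result(peer_output):
    """Classify where the EVPN route is steered, based on the two fields."""
    nhlfe = field(peer_output, "Tunnel NHLFE IDs")
    nexthop = field(peer_output, "Nexthop/Interface")
    if nhlfe not in (None, "-"):
        return ("te-policy", nhlfe)      # value equals the policy forwarding index
    if nexthop not in (None, "-"):
        return ("best-effort", nexthop)  # steered to the SRv6 BE tunnel
    return ("not-steered", None)

# Field lines copied from the SRv6 TE and SRv6 BE sample outputs in this chapter.
te_sample = """\
Tunnel NHLFE IDs : 2150629377
Nexthop/Interface : -
"""
be_sample = """\
Tunnel NHLFE IDs : -
Nexthop/Interface : FE80::7A6F:24FF:FE26:306 / GE2/0/2
"""

print(steering_result(te_sample))
print(steering_result(be_sample))
```

For the te-policy result, compare the returned value with the Forwarding index field in the display segment-routing ipv6 te policy output to identify the policy.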
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting SR-MPLS
An SR-MPLS BE tunnel cannot be established
Symptom
The output from the display mpls lsp command on a node shows that the node does not have an outgoing label or the outgoing label is not SR-MPLS allocated when an SR-MPLS BE tunnel is established. For example, the FEC for the egress node is 5.5.5.5/32. The following output shows that no SR-MPLS outgoing label to 5.5.5.5/32 exists on this node, indicating that no SRLSP destined for 5.5.5.5/32 exists.
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
12.1.1.2 Local -/- GE2/0/1
Tunnel1 Local -/- NHLFE2
Tunnel10 Local -/- NHLFE1
1.1.1.1/32 ISIS 16010/- -
2.2.2.2/32 ISIS 16020/3 GE2/0/1
2.2.2.2/32 ISIS -/3 GE2/0/1
3.3.3.3/32 ISIS 16030/16030 GE2/0/1
3.3.3.3/32 ISIS -/16030 GE2/0/1
4.4.4.4/32 ISIS 16040/16040 GE2/0/1
4.4.4.4/32 ISIS -/16040 GE2/0/1
1.1.1.1/1/4122 SR-TE -/16030 GE2/0/1
16040
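The symptom check can be scripted by looking for an LSP row that matches the FEC and has an outgoing label. This is a minimal sketch with hypothetical names (`lsp_rows`, `has_outgoing_label`). It does not decide whether a label was allocated by SR-MPLS, because that depends on the SRGB configured on the device.

```python
def lsp_rows(output, fec):
    """Return the LSP rows whose FEC column equals the given FEC."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Assumption: a complete row has FEC, Proto, In/Out Label, and Out Inter.
        if len(fields) == 4 and fields[0] == fec and "/" in fields[2]:
            in_label, out_label = fields[2].split("/", 1)
            rows.append({"proto": fields[1], "in": in_label,
                         "out": out_label, "out_if": fields[3]})
    return rows

def has_outgoing_label(output, fec):
    """True if any LSP for the FEC has an outgoing label other than a hyphen."""
    return any(row["out"] != "-" for row in lsp_rows(output, fec))

# Rows copied from the sample output above.
sample = """\
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
12.1.1.2 Local -/- GE2/0/1
1.1.1.1/32 ISIS 16010/- -
4.4.4.4/32 ISIS 16040/16040 GE2/0/1
4.4.4.4/32 ISIS -/16040 GE2/0/1
1.1.1.1/1/4122 SR-TE -/16030 GE2/0/1
16040
"""

print(has_outgoing_label(sample, "5.5.5.5/32"))
print(has_outgoing_label(sample, "4.4.4.4/32"))
```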
Common causes
The following are the common causes of this type of issue:
· Physical link failure.
· SR-MPLS label advertisement failed because the IGP or BGP peer relationship was not established.
· The SR-MPLS configuration is missing or incorrect.
Troubleshooting flowchart
Figure 103 shows the troubleshooting flowchart.
Figure 103 Flowchart for troubleshooting SR-MPLS BE tunnel establishment failure
Solution
To resolve the issue:
1. On each node along the SRLSP, execute the display interface brief command to verify that both the physical link state and the data link layer state of each interface on the SRLSP are up.
2. Verify that an IGP/BGP peer relationship is established correctly and the IGP/BGP configuration is correct on each node that the SRLSP traverses. The troubleshooting procedure depends on the routing protocol used:
¡ When the IGP protocol is OSPF:
- Execute the display ospf command to check for the Opaque capable field in the command output. If this field exists, Opaque LSA advertisement and reception capability is enabled in OSPF. If this field does not exist, execute the opaque-capability enable command in OSPF view to enable opaque LSA advertisement and reception.
- Execute the display ospf peer command to check the value for the State field. If the value is Full, the neighboring routers are fully adjacent. If the value is not Full, see "OSPFv3 neighbor unable to enter Full state" in OSPFv3 Troubleshooting Guide to troubleshoot the issue.
- Execute the display mpls lsp command to check for the OSPF LSP. The SR prefix SID for each node is manually assigned to the loopback address. If no such LSP is available, verify that OSPF is enabled on each node by using the ospf area command in loopback interface view or the network command in OSPF area view.
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
1.1.1.9/32 OSPF 16010/- -
1.1.1.9/32 ISIS 16010/- -
2.2.2.9/32 OSPF 16020/17020 RAGG1.4
- If the output from the display mpls lsp command also contains the BGP LSP, SRLSP generation might fail because of prefix SID conflict. In this case, execute the peer route-policy command to filter out routes learned from the BGP peer.
¡ When the IGP protocol is IS-IS:
- Execute the display isis command to check the value for the Cost style field to identify whether the link cost style is wide, compatible, or wide-compatible. If the link cost style is none of these, execute the cost-style command to change the link cost style.
- Execute the display isis peer command to check the value of the State field. If the value is Up, the IS-IS neighbor relationship is normal. If it is not Up, see "IS-IS neighbor establishment failure" in IS-IS Troubleshooting Guide to troubleshoot the issue.
- Execute the display mpls lsp command to check for the IS-IS LSP. The SR prefix SID for each node is manually assigned to the loopback address. If no such LSP is available, check for the isis enable command in loopback interface view on each node.
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
1.1.1.9/32 OSPF 16010/- -
1.1.1.9/32 ISIS 16010/- -
2.2.2.9/32 ISIS 16020/17020 RAGG1.4
- If the output from the display mpls lsp command also contains the BGP LSP, SRLSP generation might fail because of prefix SID conflict. In this case, execute the peer route-policy command to filter out routes learned from the BGP peer.
¡ When the protocol is BGP:
- Execute the display bgp peer command to check the value for the State field. If the value is Established, the BGP session is normal. If the value is not Established, see "BGP peer establishment failure" in BGP Troubleshooting Guide to troubleshoot the issue.
- Execute the display mpls lsp command to check for the BGP LSP. If no such LSP is available, verify that the peer label-route-capability command is configured to enable BGP to exchange labeled routes with a peer or peer group.
<Sysname> display mpls lsp
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
1.1.1.9/32 OSPF 16010/- -
1.1.1.9/32 ISIS 16010/- -
2.2.2.9/32 BGP 16020/17020 RAGG1.4
3. Check the SR-MPLS configuration on each node that the SRLSP traverses:
a. Verify that SR-MPLS is enabled in IS-IS view, OSPF view, or BGP view. If it is not enabled, execute the segment-routing mpls command to enable SR-MPLS.
b. Verify that a prefix SID has been configured in loopback interface view. If it is not configured, execute the ospf prefix-sid or isis prefix-sid command in loopback interface view to configure a prefix SID.
c. Execute the display segment-routing label-block command to identify whether the prefix SID configured in loopback interface view is within the SRGB label range. If the prefix SID is not within the SRGB range, edit the configured prefix SID.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
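The LSP checks in step 2 can be scripted when many nodes are involved. The following Python sketch is illustrative only and is not an H3C tool. It assumes the display mpls lsp output has been captured as text in the column layout shown in the examples above. The function names lsp_protocols and possible_sid_conflicts and the BGP row in the sample are hypothetical. The sketch groups LSPs by FEC and flags any FEC that has a BGP LSP in addition to an IGP LSP, which the procedure above treats as a possible prefix SID conflict.

```python
from collections import defaultdict

# Captured "display mpls lsp" output. The last row (BGP) is a hypothetical
# addition used to illustrate a possible prefix SID conflict.
SAMPLE = """\
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
1.1.1.9/32 OSPF 16010/- -
1.1.1.9/32 ISIS 16010/- -
2.2.2.9/32 OSPF 16020/17020 RAGG1.4
2.2.2.9/32 BGP 16020/17020 RAGG1.4
"""


def lsp_protocols(output):
    """Map each FEC to the set of protocols that installed an LSP for it."""
    by_fec = defaultdict(set)
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with a prefix such as 1.1.1.9/32.
        # Header rows, blank lines, and wrapped label lines are skipped.
        if len(fields) < 3 or "/" not in fields[0] or not fields[0][0].isdigit():
            continue
        by_fec[fields[0]].add(fields[1])
    return dict(by_fec)


def possible_sid_conflicts(by_fec, igps=("OSPF", "ISIS")):
    """Return FECs that have a BGP LSP in addition to an IGP LSP."""
    return sorted(
        fec for fec, protos in by_fec.items()
        if "BGP" in protos and protos & set(igps)
    )


if __name__ == "__main__":
    table = lsp_protocols(SAMPLE)
    for fec, protos in sorted(table.items()):
        print(fec, sorted(protos))
    print("Check for prefix SID conflicts on:", possible_sid_conflicts(table))
```

A flagged FEC is only a hint. Confirm the conflict on the device before applying a routing policy with the peer route-policy command.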
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
The state of the SR-MPLS TE Tunnel is down
Symptom
The output from the display mpls te tunnel-interface command on the ingress node shows that the SR-MPLS TE tunnel is down.
<Sysname> display mpls te tunnel-interface
Tunnel Name : Tunnel 1
Tunnel Signalled Name : tunnel1
Tunnel State : Down (Main CRLSP Down. Backup CRLSP Down.)
...
Common causes
The following are the common causes of this type of issue:
· A physical link failure exists on the SRLSPs used by the SR-MPLS TE tunnel.
· The BFD session for detecting the SR-MPLS TE tunnel is down.
· The SR-MPLS configuration is missing or incorrect.
· The SR-MPLS TE tunnel configuration is incorrect.
Troubleshooting flowchart
Figure 104 shows the troubleshooting flowchart.
Figure 104 Flowchart for troubleshooting SR-MPLS TE tunnel down
Solution
To resolve the issue:
1. On each node along the SRLSP, execute the display interface brief command to verify that both the physical link state and the data link layer state of each interface on the SRLSP are up.
2. Identify whether the SR-MPLS TE tunnel is down because a BFD or SBFD session is down.
a. Execute the display this command in SR-MPLS TE tunnel interface view to check for the mpls bfd, mpls sbfd, mpls tunnel-bfd, or mpls tunnel-sbfd command. If none of these commands is configured, go to step 3. If any of them is configured, proceed to the next step.
b. Execute the display mpls bfd or display mpls sbfd command to check the BFD or SBFD session state.
c. If the session is down, execute the undo mpls bfd, undo mpls sbfd, undo mpls tunnel-bfd, or undo mpls tunnel-sbfd command to remove the BFD or SBFD configuration.
3. If the BFD/SBFD session is normal or no BFD/SBFD session exists, check the SR-MPLS configuration.
a. In IS-IS view or OSPF view, check the following configuration to verify that SR-MPLS is supported:
- When the IGP protocol is IS-IS, execute the display isis command to check the value for the Cost style field to identify whether the link cost style is wide, compatible, or wide-compatible. If the link cost style is none of these, execute the cost-style command to change the link cost style.
- When the IGP protocol is OSPF, execute the display ospf command to check for the Opaque capable field in the command output. If this field exists, Opaque LSA advertisement and reception capability is enabled in OSPF. If this field does not exist, execute the opaque-capability enable command in OSPF view to enable opaque LSA advertisement and reception.
b. Verify the SID configuration based on the SID type that you use for IP traffic forwarding over SRLSPs:
- Prefix SIDs: Identify whether a prefix SID has been configured in loopback interface view. If it is not configured, execute the ospf prefix-sid or isis prefix-sid command in loopback interface view to configure a prefix SID.
- Adjacency SIDs: Identify whether adjacency SID allocation is enabled in OSPF view or IS-IS view, or whether an adjacency SID has been configured on each interface on the SRLSP forwarding path. If neither is configured, execute the segment-routing adjacency enable command in OSPF view or IS-IS view to enable SR-MPLS adjacency SID allocation, or execute the isis adjacency-sid or ospf adjacency-sid command in interface view to assign an adjacency SID to an adjacency.
c. Execute the display segment-routing label-block command to identify whether the prefix SID configured in loopback interface view is within the SRGB label range, and whether the adjacency SID configured in interface view is within the SRLB label range. If the prefix SID is not within the SRGB range or the adjacency SID is not within the SRLB range, change the configured prefix SID or adjacency SID.
4. Check MPLS-TE tunnel configuration and perform the following tasks based on the establishment mode of the MPLS-TE tunnel:
¡ Over a static SRLSP—Execute the display mpls static-sr-mpls command on the ingress node of the SRLSP to verify that the ordered list of labels represented by the Out-Label field matches the labels allocated for the nodes that the static SRLSP traverses. If the label sequence in the outgoing label stack on the ingress node does not match the static labels configured on each node along the SRLSP, execute the static-sr-mpls lsp command to change the label sequence in the outgoing label stack on the ingress node.
¡ Over an explicit-path SRLSP—Execute the display explicit-path command on the ingress node of the SRLSP to verify that the IP addresses or SIDs match the IP addresses of the nodes along the SRLSP path or local SIDs. In addition, make sure the SID type specified by the nexthop command in explicit path view on the ingress node is consistent with the prefix SID or adjacency SID type configured in interface view on each node along the SRLSP. This means if a prefix SID is configured on the interface, the SID specified by the nexthop command must also be a prefix SID. If they are inconsistent, execute the nexthop command to change the IP address or SID.
¡ Over a PCE-calculated SRLSP—Check for the mpls te delegation command in MPLS-TE tunnel interface view and execute the display mpls te pce peer command to identify whether the PCC and PCE have established a PCEP session. Use packet capture to verify that the controller (PCE) has performed path updates and that the path is correct. In the captured packets, make sure the adjacency SID or next-hop address sent by the PCE uses strict mode, and the prefix SID or node address uses loose mode. If the PCC and PCE have not established a PCEP session or the captured packets do not meet these requirements, check the configuration on the controller.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
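The label range check in step 3 is simple arithmetic and can be sketched in Python as follows. This is a minimal illustration, not part of the product. The SRGB and SRLB values below are hypothetical placeholders. Read the real ranges from the display segment-routing label-block command output on your device. The sketch also assumes that an index-type prefix SID maps to an absolute label by adding the index to the SRGB base, which is the standard segment routing behavior.

```python
def absolute_label(srgb_min, index):
    """Convert an index-type prefix SID to an absolute label (SRGB base + index)."""
    return srgb_min + index


def in_block(label, block):
    """Return True if label lies within the inclusive (min, max) label block."""
    low, high = block
    return low <= label <= high


def check_sids(prefix_sid, adjacency_sids, srgb, srlb):
    """Return a list of problems; an empty list means all SIDs are in range."""
    problems = []
    if not in_block(prefix_sid, srgb):
        problems.append(
            f"prefix SID {prefix_sid} is outside SRGB {srgb[0]}-{srgb[1]}")
    for sid in adjacency_sids:
        if not in_block(sid, srlb):
            problems.append(
                f"adjacency SID {sid} is outside SRLB {srlb[0]}-{srlb[1]}")
    return problems


if __name__ == "__main__":
    # Hypothetical label blocks, for illustration only.
    SRGB = (16000, 24000)
    SRLB = (15000, 15999)

    # Absolute prefix SID 16020 and adjacency SID 15001 are both in range.
    print(check_sids(16020, [15001], SRGB, SRLB))

    # Index 9000 maps to label 25000, which is outside the SRGB above.
    print(check_sids(absolute_label(SRGB[0], 9000), [16001], SRGB, SRLB))
```

Run the check for every node along the SRLSP, because each node can have its own label blocks.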
Related alarm and log messages
Alarm messages
N/A
Log messages
TE/5/TE_BACKUP_SWITCH
SRv6 TE policy issues
SRv6 TE policy cannot take effect
Symptom
An SRv6 TE policy fails the connectivity verification performed by using the ping srv6-te policy command. The command output shows SRv6 TE policy anomalies, indicating that the SRv6 TE policy cannot forward packets properly. For example:
<Sysname> ping srv6-te policy policy-name p1
The SRv6-TE policy does not reference a SID list or the referenced SID list is down.
Common causes
The following are the common causes of this type of issue:
· The SRv6 TE policy is shut down administratively.
· The BSID configuration of the SRv6 TE policy is incorrect or has conflicts.
· Some configuration for the SRv6 TE policy is missing.
· The number of SRv6 TE policies has exceeded the limit.
· The number of SIDs in the segment list has exceeded the limit.
· The SID list of the SRv6 TE policy differs from the planned packet forwarding path.
· Physical link faults have occurred on the forwarding path of the SRv6 TE policy.
Troubleshooting flowchart
Figure 105 shows the troubleshooting flowchart.
Figure 105 Flowchart for troubleshooting SRv6 TE policy failure to take effect
Solution
1. On the source node of the SRv6 TE policy, execute the display segment-routing ipv6 te policy status command for a preliminary identification of the reasons why the SRv6 TE policy is not taking effect.
<Sysname> display segment-routing ipv6 te policy status
Name/ID: p1/0
Status: Down
Check admin status : Failed
Check for endpoint & color : Passed
Check for segment list : Passed
Check valid candidate paths : Failed
Check for BSIDs : -
If the Check admin status field shows Failed, it means the SRv6 TE policy has been administratively shut down. Execute the undo shutdown command in SRv6 TE policy view to bring the policy up.
After the SRv6 TE policy is administratively up, execute the display segment-routing ipv6 te policy status command again to identify other fields that display Failed or a hyphen (-). If the Check for segment list field displays Failed, proceed to the following step.
2. Verify that no conflict occurs with the BSID of the SRv6 TE policy.
Execute the display segment-routing ipv6 te policy command at the source node of the SRv6 TE policy. If the Request state field displays Failed, it indicates a BSID request failure. The statically specified BSID might not be within the locator range or it might be duplicated with the BSID of an existing SRv6 TE policy, causing the SRv6 TE policy to become invalid. As a best practice, execute the undo binding-sid command for the invalid SRv6 TE policy to delete the statically specified BSID. The system automatically allocates a BSID to prevent errors and conflicts.
<Sysname> display segment-routing ipv6 te policy
Name/ID: p1/0
Color: 10
Endpoint: 1000::1
Name from BGP:
BSID:
Mode: Dynamic Type: Type 2 Request state: Succeeded
Current BSID: 8000::1 Explicit BSID: - Dynamic BSID: 8000::1
Reference counts: 3
Flags: A/BS/NC
If the issue persists after the successful BSID allocation, proceed to the following step.
3. Verify that the SRv6 TE policy configuration is complete.
Assume that IS-IS is used to advertise SIDs. On the source node of the SRv6 TE policy, execute the display current-configuration command to view the current configuration of the SRv6 TE policy. Compare the configuration with the configuration in the following example. If any configuration item is missing, it indicates that the policy configuration is incomplete.
isis 1
address-family ipv6 unicast
segment-routing ipv6 locator a
segment-routing ipv6
locator a ipv6-prefix 1000:0:0:1:: 64 static 16
traffic-engineering
srv6-policy locator a
segment-list sl1
index 10 ipv6 1000::2:0:0:1:0
index 20 ipv6 1000::2:0:0:1:3
policy p1
color 100 end-point ipv6 4::4
candidate-paths
preference 100
explicit segment-list sl1
On each node of the SRv6 TE policy forwarding path, you must execute the segment-routing ipv6 locator command in the IGP view in order to advertise the locator. For example:
isis 1
address-family ipv6 unicast
segment-routing ipv6 locator b
If the configuration is incomplete, add the missing commands. If the configuration is complete but the issue persists, proceed to the following step.
4. Verify that the number of SRv6 TE policies and that of segment lists do not exceed the limit.
Execute the display segment-routing ipv6 te policy statistics command on the source node of the SRv6 TE policy to identify whether the number of resources used by SRv6 TE policies has reached the limit.
<Sysname> display segment-routing ipv6 te policy statistics
IPv6 TE Policy Database Statistics
…
SRv6-TE policy resource information:
Max resources: 1024
Used resources: 1
Upper threshold: 512 (50%)
Lower threshold: 102 (10%)
SID list resource information:
Max resources: 4096
Used resources: 1
Upper threshold: 3277 (80%)
Lower threshold: 1638 (40%)
…
¡ If the value of the Used resources field in SRv6-TE policy resource information is equal to the value of the Max resources field, the number of SRv6 TE policies has reached the limit. In this case, delete unnecessary SRv6 TE policies.
¡ If the value of the Used resources field in SID list resource information is equal to the value of the Max resources field, the number of segment lists has reached the limit. In this case, delete unnecessary segment lists.
¡ If the number of SRv6 TE policies and that of segment lists have not exceeded the limit, proceed to the following step.
5. Verify that the number of SIDs in the segment list does not exceed the limit.
Enter probe view on the source node of the SRv6 TE policy, and execute the display system internal segment-routing ipv6 te policy status command. In the command output, the MaxSIDs field value represents the maximum number of SIDs allowed in the segment list.
[Sysname-probe] display system internal segment-routing ipv6 te policy status
…
MaxGroupNidNum: 1024 MaxPolicyNidNum: 1024
MaxSeglistNidNum: 4096 MaxNexthopNidNum: 65535
MaxOutNum: 32 MaxEcmpNum: 16
MaxSIDs: 10
…
Execute the display segment-routing ipv6 te segment-list command. In the command output, the Nodes field indicates the number of SID nodes configured in the specified segment list.
<Sysname> display segment-routing ipv6 te segment-list
Total Segment lists: 1
Name/ID: A/1
Origin: CLI
Status: Up
Verification State: Down
Nodes: 11
…
If the number of SID nodes configured exceeds the maximum number of SIDs supported, delete unnecessary SID values in the segment list. If the number of SID nodes configured does not exceed the limit, proceed to the following step.
6. Verify that the configuration of the SID list is consistent with the planned forwarding path.
Execute the display segment-routing ipv6 te segment-list command on the source node of the SRv6 TE policy to display the SID list information. The SID values arranged from top to bottom represent nodes or links from near to far to the source node of the SRv6 TE policy. If the Status field value is Down, it indicates that the locator to which the SID belongs has not been learned correctly. In this case, troubleshoot this issue as described in the OSPFv3 or IS-IS troubleshooting manual.
[Sysname] display segment-routing ipv6 te segment-list
Total Segment lists: 1
Name/ID: s1/1
Origin: CLI
Status: Down
Verification State: Down
Nodes : 3
Index : 10 SID: 1::1
Status : UP TopoStatus: Nonexistent
Type : Type_2 Flags: None
Coc Type : - Common prefix length: 0
Index : 20 SID: 1::2
Status : Down TopoStatus: Nonexistent
Type : Type_2 Flags: None
Coc Type : - Common prefix length: 0
Index : 30 SID: 1::3
Status : Down TopoStatus: Nonexistent
Type : Type_2 Flags: None
Coc Type : - Common prefix length: 0
On each node along the SRv6 TE policy forwarding path, execute the display segment-routing ipv6 local-sid command in sequence to check whether the SID values are consistent with those in the SID list displayed by the display segment-routing ipv6 te segment-list command. The SID types are usually End SID and End.X SID. For example, for End SID, view the information of the SRv6 Local SID.
[Sysname] display segment-routing ipv6 local-sid end
Local SID forwarding table (End)
Total SIDs: 2
SID : 1000::2:0:0:1:0/64
Function type : End Flavor : PSP
Locator name : b Allocation type: Dynamic
Owner : IS-IS-1 State : Active
Create Time : Sep 04 16:32:03.443 2021
If the SID list does not match the SID values of the nodes on the forwarding path, execute the undo index index-number command to delete the incorrect SID, and then execute the index index-number ipv6 ipv6-address command to configure the correct SID. If the SID list is consistent with the plan, proceed to the following step.
7. On each node along the SRv6 TE policy forwarding path, execute the display interface brief command to check the physical link state. Make sure both the physical state and the data link layer protocol state of each interface on the forwarding path are up. If the links are normal, or if the issue persists after link faults are cleared, proceed to the following step.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
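The checks in steps 1, 4, and 5 can be sketched in Python as follows. This is an illustrative sketch, not an H3C utility. It assumes the display segment-routing ipv6 te policy status output has been captured as text in the layout shown in step 1. The helper names checks_not_passed, at_limit, and too_many_sids are hypothetical. The numeric inputs are the values that you read from the Used resources, Max resources, Nodes, and MaxSIDs fields.

```python
# Output copied from the example in step 1.
STATUS_OUTPUT = """\
Name/ID: p1/0
Status: Down
Check admin status : Failed
Check for endpoint & color : Passed
Check for segment list : Passed
Check valid candidate paths : Failed
Check for BSIDs : -
"""


def checks_not_passed(output):
    """Return (check name, value) pairs for every Check line that is not Passed."""
    result = []
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        if sep and name.startswith("Check") and value != "Passed":
            result.append((name, value))
    return result


def at_limit(used, maximum):
    """Step 4: resources are exhausted when Used resources reaches Max resources."""
    return used >= maximum


def too_many_sids(nodes, max_sids):
    """Step 5: the segment list is too long when Nodes exceeds MaxSIDs."""
    return nodes > max_sids


if __name__ == "__main__":
    for name, value in checks_not_passed(STATUS_OUTPUT):
        print(f"{name}: {value}")
    print("Policy resources exhausted:", at_limit(1, 1024))
    print("Segment list too long:", too_many_sids(11, 10))
```

With the example values from this section, the segment list with 11 nodes exceeds the MaxSIDs value of 10, so SIDs must be removed from the list.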
Related alarm and log messages
Alarm messages
N/A
Log messages
· SRPV6/2/SRPV6_BSID_CONFLICT
· SRPV6/2/SRPV6_BSID_CONFLICT_CLEAR
· SRPV6/5/SRPV6_PATH_STATE_DOWN
· SRPV6/4/SRPV6_POLICY_STATUS_CHG
· SRPV6/4/SRPV6_RESOURCE_EXCEED
· SRPV6/4/SRPV6_RESOURCE_EXCEED_CLEAR
· SRPV6/5/SRPV6_SEGLIST_STATE_DOWN
· SRPV6/2/SRPV6_STATE_DOWN
· SRPV6/2/SRPV6_STATE_DOWN_CLEAR
Troubleshooting VPN issues
Troubleshooting EVPN issues
Troubleshooting EVPN VPLS over SRv6 BE traffic forwarding failure
Symptom
As shown in Figure 106, EVPN VPLS uses SRv6 BE tunnels as public network tunnels, and CE 1 is multi-homed to PE 1 and PE 2. In this network, broadcast and unicast traffic forwarding fails between CE 1 and CE 2.
Common causes
The following are the common causes of this type of issue:
· The BGP EVPN peers are not established between the PEs.
· The PEs have not received Type 3 routes (IMET routes).
· The PEs have not received Type 2 routes (MAC/IP advertisement routes).
· The PEs have not received Type 1 routes (Ethernet auto-discovery routes).
· The Route Target attribute carried in the EVPN route does not match the locally configured Import Route Target attribute.
· The route to the SRv6 SID does not exist on the PE.
Troubleshooting flowchart
Figure 107 shows the troubleshooting flowchart.
Figure 107 Flowchart for troubleshooting EVPN VPLS over SRv6 BE traffic forwarding failure
Solution
1. Verify that the BGP EVPN peers are successfully established between the PEs.
a. Execute the display bgp peer l2vpn evpn command to verify that all BGP EVPN peers between the PEs are in Established state. If they are in Established state, proceed to step 2. If not, resolve the BGP EVPN peer establishment issue. For more information, see the troubleshooting solution for the issue that the BGP session cannot enter the Established state.
b. If the issue persists after the BGP peers are successfully established, proceed to step 2.
2. Verify that the PEs have received Type 3 routes.
a. Execute the display bgp l2vpn evpn route-type imet command to verify that the PE has received Type 3 routes from other PEs. If Type 3 routes are received, proceed to step 3. If not, troubleshoot the EVPN route synchronization issue. Possible causes include the following: the peer reflect-client command is not configured on the route reflector (RR), the undo policy vpn-target command is not configured on the RR, or an incorrect routing policy is specified for the BGP peer. Check for incorrect configuration and correct it.
b. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 7.
c. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.
3. Verify that the PEs have received Type 2 routes.
a. Identify the traffic type. For broadcast traffic failure, proceed to step 4. For unicast traffic failure, proceed to the next step.
b. Execute the display bgp l2vpn evpn route-type mac-ip command to verify that a Type 2 route matching the destination MAC address of unicast traffic exists, and the route comes from the correct BGP peer. If a Type 2 route exists and is correct, proceed to step 4. If not, resolve the Type 2 route synchronization issue as described in step 2.
c. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 7.
d. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.
4. Verify that the PEs have received Type 1 routes.
a. Execute the display bgp l2vpn evpn route-type mac-ip command to view detailed Type 2 route information. If the ESI carried in the route is 0.0.0.0.0.0, proceed to step 5. If not, proceed to the next step.
b. Execute the display bgp l2vpn evpn route-type auto-discovery command to verify that a Type 1 route matching the ESI in the Type 2 route exists. If such a route exists, proceed to step 5. If not, resolve the Type 1 route synchronization issue as described in step 2.
c. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 7.
d. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.
5. View the detailed route information to check for matching VPN Targets.
a. View the detailed Type 1, 2, and 3 route information. For example, for a Type 1 route, execute the display bgp l2vpn evpn route-type auto-discovery { evpn-route route-length | evpn-prefix } command to obtain the RTs carried in the extended community attribute of the route.
b. Enter VSI view and execute the display this command to obtain the vpn-target configured for the EVPN instance.
c. If at least one RT carried in the route is consistent with the Import RT of the EVPN instance, proceed to step 6. If not, proceed to the next step.
d. Appropriately plan the VPN-target configuration for the EVPN instance, and modify the VPN-target configuration for the EVPN instance to ensure that the RT carried in the route matches the Import RT of the EVPN instance.
e. If the issue persists, proceed to the next step.
6. Verify that a route is available to the SRv6 SID.
a. Execute the display l2vpn forwarding srv6 command to view the SRv6 SID allocated by the remote PE to the SRv6 PW, which is the value for the Out SID field.
b. Execute the display ipv6 routing-table ipv6-address command (where ipv6-address is the Out SID field value) to verify that a route is available to the SRv6 SID allocated by the remote PE to the SRv6 PW. If such a route exists, proceed to step 7. If not, resolve the IGP route learning issue. For more information, see the IP routing troubleshooting guide.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
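The route target comparison in step 5 is a set intersection: the route is accepted when at least one RT carried in the route also appears in the import RTs of the EVPN instance. The following Python sketch illustrates that rule under stated assumptions. The function names and the RT values are hypothetical, and the RT strings must be copied manually from the route details and from the VSI configuration.

```python
def matching_rts(route_rts, import_rts):
    """Return the RTs carried in the route that are also configured as import RTs."""
    return sorted(set(route_rts) & set(import_rts))


def route_is_imported(route_rts, import_rts):
    """A route matches when it shares at least one RT with the import RT list."""
    return bool(matching_rts(route_rts, import_rts))


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    route_rts = ["100:1", "200:1"]   # RTs in the route's extended community attribute
    import_rts = ["200:1", "300:1"]  # Import RTs configured for the EVPN instance

    print(matching_rts(route_rts, import_rts))
    print(route_is_imported(route_rts, import_rts))
    print(route_is_imported(["100:1"], ["300:1"]))
```

If no RT matches, adjust the vpn-target configuration as described in step 5.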
Troubleshooting EVPN VPLS over SRv6 TE policy traffic forwarding failure
Symptom
As shown in Figure 108, EVPN VPLS uses SRv6 TE policy tunnels as public network tunnels, and CE 1 is multi-homed to PE 1 and PE 2. In this network, broadcast and unicast traffic forwarding fails between CE 1 and CE 2.
Common causes
The following are the common causes of this type of issue:
· The BGP EVPN peers are not established between the PEs.
· The PEs have not received Type 3 routes (IMET routes).
· The PEs have not received Type 2 routes (MAC/IP advertisement routes).
· The PEs have not received Type 1 routes (Ethernet auto-discovery routes).
· The Route Target attribute carried in the EVPN route does not match the locally configured Import Route Target attribute.
· The color value carried in the EVPN route does not match the color value configured for the SRv6 TE policy locally.
· The color value of the local VSI instance does not match the color value of the local SRv6 TE policy.
· The SRv6 TE policy to which EVPN VPLS is steered does not take effect.
Troubleshooting flowchart
Figure 109 shows the troubleshooting flowchart.
Figure 109 Flowchart for troubleshooting EVPN VPLS over SRv6 TE policy traffic forwarding failure
Solution
To resolve the issue:
1. Verify that the BGP EVPN peers are successfully established between the PEs.
a. Execute the display bgp peer l2vpn evpn command to verify that all BGP EVPN peers between the PEs are in Established state. If they are in Established state, proceed to step 2. If not, resolve the BGP EVPN peer establishment issue. For more information, see the troubleshooting solution for the issue that the BGP session cannot enter the Established state.
b. If the issue persists after the BGP peers are successfully established, proceed to step 2.
2. Verify that the PEs have received Type 3 routes.
a. Execute the display bgp l2vpn evpn route-type imet command to verify that the PE has received Type 3 routes from other PEs. If Type 3 routes are received, proceed to step 3. If not, troubleshoot the EVPN route synchronization issue. Possible causes include the following: the peer reflect-client command is not configured on the route reflector (RR), the undo policy vpn-target command is not configured on the RR, or an incorrect routing policy is specified for the BGP peer. Check for incorrect configuration and correct it.
b. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 9.
c. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.
3. Verify that the PEs have received Type 2 routes.
a. Identify the traffic type. For broadcast traffic failure, proceed to step 4. For unicast traffic failure, proceed to the next step.
b. Execute the display bgp l2vpn evpn route-type mac-ip command to verify that a Type 2 route matching the destination MAC address of unicast traffic exists, and the route comes from the correct BGP peer. If a Type 2 route exists and is correct, proceed to step 4. If not, resolve the Type 2 route synchronization issue as described in step 2.
c. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 9.
d. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.
4. Verify that the PEs have received Type 1 routes.
a. Execute the display bgp l2vpn evpn route-type mac-ip command to view detailed Type 2 route information. If the ESI carried in the route is 0.0.0.0.0.0, proceed to step 5. If not, proceed to the next step.
b. Execute the display bgp l2vpn evpn route-type auto-discovery command to verify that a Type 1 route matching the ESI in the Type 2 route exists. If such a route exists, proceed to step 5. If not, resolve the Type 1 route synchronization issue as described in step 2.
c. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 9.
d. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.
5. View the detailed route information to check for matching VPN Targets.
a. View the detailed Type 1, 2, and 3 route information. For example, for a Type 1 route, execute the display bgp l2vpn evpn route-type auto-discovery { evpn-route route-length | evpn-prefix } command to obtain the RTs carried in the extended community attribute of the route.
b. Enter VSI view and execute the display this command to obtain the vpn-target configured for the EVPN instance.
c. If at least one RT carried in the route is consistent with the Import RT of the EVPN instance, proceed to step 6. If not, proceed to the next step.
d. Appropriately plan the VPN-target configuration for the EVPN instance, and modify the VPN-target configuration for the EVPN instance to ensure that the RT carried in the route matches the Import RT of the EVPN instance.
e. If the issue persists, proceed to the next step.
6. View detailed route information, and verify that the color in the route matches the color value configured locally for the SRv6 TE policy.
a. View the detailed Type 1, 2, and 3 route information. For example, for a Type 1 route, execute the display bgp l2vpn evpn route-type auto-discovery { evpn-route route-length | evpn-prefix } command to view the color value in the route. If no color exists in the route, proceed to step 7. Otherwise, proceed to the next step.
b. Execute the display segment-routing ipv6 te policy command to view the color value of the SRv6 TE policy to which EVPN VPLS is expected to be steered.
c. If the color in the route is the same as the color value in the SRv6 TE policy, proceed to step 7. If they are different, edit the color value of the SRv6 TE policy.
d. If the issue persists, proceed to the next step.
7. Verify that the color value of the local VSI instance matches the color value of the local SRv6 TE policy.
a. Execute the display l2vpn peer srv6 verbose command to view the default color value configured for the VSI instance, which is the value in the Color field.
b. Execute the display segment-routing ipv6 te policy command to view the color value of the SRv6 TE policy to which EVPN VPLS is expected to be steered.
c. If the color value of the VSI instance is the same as the color value of the SRv6 TE policy, proceed to step 8. If they are different, edit the color value of the local VSI instance or the SRv6 TE policy.
d. If the issue persists, proceed to the next step.
8. Verify that the SRv6 TE policy is effective.
a. Execute the display segment-routing ipv6 te policy command to check the value for the Status field. If the value is Up, the SRv6 TE policy is effective. Proceed to step 9. If the value is Down, the SRv6 TE policy is not effective. For more information about resolving this issue, see "SRv6 TE policy cannot take effect."
b. If the issue persists, proceed to the next step.
9. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
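The color comparisons in steps 6 and 7 can be summarized in a short Python sketch. It is illustrative only and rests on an assumption about how the two checks combine: a color is compared with the SRv6 TE policy color only when it is present, so a route without a color or a VSI without a default color is skipped. The function name color_mismatches and the sample values are hypothetical.

```python
def color_mismatches(policy_color, route_color=None, vsi_color=None):
    """Return which colors differ from the SRv6 TE policy color.

    route_color: color carried in the EVPN route, or None if the route has none.
    vsi_color:   default color of the local VSI, or None if not configured.
    """
    mismatches = []
    if route_color is not None and route_color != policy_color:
        mismatches.append("route")  # Step 6: route color differs from policy color.
    if vsi_color is not None and vsi_color != policy_color:
        mismatches.append("vsi")    # Step 7: VSI default color differs from policy color.
    return mismatches


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(color_mismatches(policy_color=10, route_color=10, vsi_color=10))
    print(color_mismatches(policy_color=10, route_color=20))
    print(color_mismatches(policy_color=10, vsi_color=30))
```

An empty result means the colors are consistent, and the next thing to verify is whether the SRv6 TE policy itself is effective.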
Troubleshooting VXLAN issues
Unreachable centralized VXLAN IP gateway
Symptom
As shown in Figure 110, a VXLAN tunnel is established between the VTEP and the centralized VXLAN IP gateway, and a VSI interface on the centralized VXLAN IP gateway acts as a gateway interface. When a ping operation is executed on the server connected to the VTEP, the centralized VXLAN IP gateway is unreachable.
Common causes
The following are the common causes of this type of issue:
· The status of the VXLAN tunnel is down.
· The source or destination IP address of the VXLAN tunnel is incorrect.
· The status of the VXLAN IP gateway interface is down.
· No ARP entry for the ping operation exists on the device.
Troubleshooting flow
Figure 111 shows the troubleshooting flowchart.
Figure 111 Flowchart for troubleshooting an unreachable centralized VXLAN IP gateway
Solution
1. View the VXLAN tunnel information of the VXLAN network to which the server belongs on the VTEP that is connected to the server.
a. Execute the display l2vpn vsi verbose command to check the VXLAN ID of the VXLAN network to which the server belongs, and the name of the VXLAN tunnel associated with the VXLAN network (Tunnel Name field).
<Sysname> display l2vpn vsi verbose
VSI Name: vpna
VSI Index : 0
VSI State : Up
MTU : 1500
Bandwidth : Unlimited
Broadcast Restrain : Unlimited
Multicast Restrain : Unlimited
Unknown Unicast Restrain: Unlimited
MAC Learning : Enabled
MAC Table Limit : -
MAC Learning rate : -
Drop Unknown : -
Flooding : Enabled
Statistics : Disabled
VXLAN ID : 10
Tunnels:
Tunnel Name Link ID State Type Flood proxy
Tunnel1 0x5000001 Up Manual Disabled
Tunnel2 0x5000002 Up Manual Disabled
ACs:
AC Link ID State Type
GE2/0/1 srv1000 0 Up Manual
b. Execute the display interface tunnel command based on the name of the VXLAN tunnel and examine the current state, source IP address, and destination IP address of the VXLAN tunnel.
<Sysname> display interface tunnel 2
Tunnel2
Current state: UP
Line protocol state: UP
Description: Tunnel2 Interface
Bandwidth: 64 kbps
Maximum transmission unit: 1464
Internet protocol processing: Disabled
Last clearing of counters: Never
Tunnel source 2.2.2.2, destination 1.1.1.1
Tunnel protocol/transport UDP_VXLAN/IP
Last 300 seconds input rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec
Last 300 seconds output rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec
Input: 0 packets, 0 bytes, 0 drops
Output: 0 packets, 0 bytes, 0 drops
- If the VXLAN tunnel is up, go to step 3.
- If the VXLAN tunnel is down, go to step 2.
2. On the VTEP, check whether the source IP address of the VXLAN tunnel is a local IP address and whether the destination IP address is reachable.
¡ Execute the display ip interface brief command to verify that the source IP address of the VXLAN tunnel is a local IP address. If not, use the source command to modify the source IP address of the VXLAN tunnel.
<Sysname> display ip interface brief
*down: administratively down
(s): spoofing (l): loopback
Interface Physical Protocol IP address VPN instance Description
Loop1 up up(s) 2.2.2.2 -- --
……
MTunnel0 down down -- aaa --
Vlan1 *down down -- -- --
¡ Execute the display fib command to identify whether an entry for the destination IP address of the VXLAN tunnel is in the FIB table. If not, modify the routing configuration to ensure Layer 3 connectivity to the destination IP address of the VXLAN tunnel.
<Sysname> display fib
Destination count: 4 FIB entry count: 4
Flag:
U:Useable G:Gateway H:Host B:Blackhole D:Dynamic S:Static
R:Relay F:FRR
Destination/Mask Nexthop Flag OutInterface/Token Label
0.0.0.0/32 127.0.0.1 UH InLoop0 Null
2.2.2.2/32 127.0.0.1 UH InLoop0 Null
1.1.1.1/32 127.0.0.1 UH InLoop0 Null
127.0.0.0/32 127.0.0.1 UH InLoop0 Null
3. Execute the display interface vsi-interface brief command on the VXLAN IP gateway to view information about the VXLAN IP gateway interface, including the gateway interface number (Interface field), gateway interface state (Link Protocol field), and the gateway address (Primary IP field).
<Sysname> display interface Vsi-interface brief
Brief information on interfaces in route mode:
Link: ADM - administratively down; Stby - standby
Protocol: (s) - spoofing
Interface Link Protocol Primary IP Description
Vsi1 DOWN DOWN 192.168.1.1
¡ If the VXLAN IP gateway interface is down, identify whether the shutdown command is configured on the VSI interface and whether the VSI bound to the VSI interface is down.
- If the shutdown command is configured on the VSI interface, execute the undo shutdown command.
- If the VSI bound to the VSI interface is down, execute the display l2vpn vsi command to check the AC status of the VSI. If the AC status is down, verify that the AC configuration is correct and the AC-attached interface is up. If the AC configuration is incorrect or the AC-attached interface is down, modify the AC configuration or troubleshoot the interface issue.
¡ If the VXLAN IP gateway interface is up, execute the display arp command to check whether the ARP information for the gateway IP address has been learned.
<Sysname> display arp
Type: S-Static D-Dynamic O-Openflow R-Rule M-Multiport I-Invalid
IP address MAC address VLAN/VSI Interface/Link ID Aging Type
10.1.1.1 0001-0001-0001 0 Tunnel2 17 D
10.1.1.11 0001-0001-0001 0 Tunnel2 20 D
20.1.1.1 0002-0002-0002 1 Tunnel3 17 D
20.1.1.12 0002-0002-0002 1 Tunnel3 20 D
- If yes, go to step 4.
- If not, execute the display arp count command to check whether the number of learned entries has reached the maximum number of dynamic ARP entries for the device or interface. If yes, execute the arp max-learning-num or arp max-learning-number command to increase the maximum number of dynamic ARP entries.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
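The tunnel state check in step 1 can be scripted when many VSIs must be examined. The following Python sketch is illustrative only. It assumes that the Tunnels table in the display l2vpn vsi verbose output has the column layout shown in the sample output above. The names tunnel_states and next_step are hypothetical helpers and are not part of any H3C tool.

```python
from __future__ import annotations

import re

# Matches one row of the "Tunnels:" table, for example:
#   Tunnel1 0x5000001 Up Manual Disabled
# The header row ("Tunnel Name ...") and AC rows do not match because the
# pattern requires "Tunnel" followed by a number and a hexadecimal link ID.
TUNNEL_ROW = re.compile(
    r"^\s*(Tunnel\d+)\s+(0x[0-9A-Fa-f]+)\s+(\S+)", re.MULTILINE
)


def tunnel_states(vsi_verbose_output: str) -> dict[str, str]:
    """Map each tunnel name to its State column, converted to lower case."""
    return {
        name: state.lower()
        for name, _link_id, state in TUNNEL_ROW.findall(vsi_verbose_output)
    }


def next_step(states: dict[str, str], tunnel: str) -> int:
    """Return the step to continue with: 3 if the tunnel is up, otherwise 2.

    A tunnel that is missing from the table is treated as down (assumption).
    """
    return 3 if states.get(tunnel) == "up" else 2
```

Pass the captured command output to tunnel_states, and then call next_step with the name of the tunnel of interest. The State column is compared in lower case because the sample outputs in this guide show both Up and UP.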
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Disconnection between VSI interfaces on two VTEPs
Symptom
As shown in Figure 112, a VXLAN tunnel is manually set up between the VTEPs, and VSI interfaces are configured as gateway interfaces on the VTEPs. Two VSI interfaces cannot ping each other.
NOTE: This section introduces the troubleshooting methods for the ADWAN scenario.
Figure 112 ADWAN network diagram
Common causes
The following are the common causes of this type of issue:
· A VSI interface has not been associated with a VSI.
· A VSI interface is down.
· The IP addresses of the VSI interfaces are not in the same subnet.
· The VXLAN tunnel is down.
· The source or destination IP address of the VXLAN tunnel is incorrect.
· The VSI is down.
Troubleshooting flow
Figure 113 shows the troubleshooting flowchart.
Figure 113 Flowchart for troubleshooting disconnection between VSI interfaces on two VTEPs
Solution
1. Execute the display ip interface brief command on the VTEPs to view brief information about the interfaces and IP addresses. For the unreachable gateway IP address, identify the name and state of the owner VSI interface.
[Sysname] display ip interface brief
*down: administratively down
(s): spoofing (l): loopback
Interface Physical Protocol IP address/Mask VPN instance Description
GE2/0/1 up up 192.168.1.114/24 -- --
GE2/0/3 down down -- -- --
RAGG1 down down -- -- --
Vsi1 down down 1.1.1.1/24 -- --
2. Execute the display l2vpn vsi verbose command on the VTEPs to view the gateway interface (Gateway Interface field) and the VXLAN tunnel (Tunnel Name field) associated with the VSI.
[Sysname] display l2vpn vsi verbose
VSI Name: aaa
VSI Index : 0
VSI State : Up
MTU : 1500
Bandwidth : -
Broadcast Restrain : 5120 kbps
Multicast Restrain : 5120 kbps
Unknown Unicast Restrain: 5120 kbps
MAC Learning : Enabled
MAC Table Limit : -
MAC Learning rate : Unlimited
Drop Unknown : Disabled
PW Redundancy Mode : Slave
Flooding : Enabled
Statistics : Disabled
Gateway Interface : VSI-interface 1
VXLAN ID : 100
Tunnel Statistics : Disabled
Tunnels:
Tunnel Name Link ID State Type Flood Proxy Split horizon
Tunnel1 0x5000001 UP Manual Disabled Enabled
3. Check the output from the display l2vpn vsi verbose command for the VSI associated with VSI-interface 1.
¡ If the VSI does not exist, use the gateway vsi-interface command to configure the VSI interface as the VSI's gateway interface.
¡ If the VSI exists, perform the following tasks for the VSI interface:
- Identify whether the shutdown command has been executed on the VSI interface. If yes, use the undo shutdown command to bring up the VSI interface.
- Verify that the IP addresses of the VSI interfaces on the two VTEPs are in the same subnet. If not, assign IP addresses from the same subnet to the VSI interfaces.
4. Check the output from the display l2vpn vsi verbose command for VXLAN tunnels of the VSI.
¡ If no VXLAN tunnel is associated, create a VXLAN tunnel and use the tunnel command to associate it with the VSI.
¡ If a VXLAN tunnel is associated, execute the display interface tunnel command to check the state, source IP address, and destination IP address of the VXLAN tunnel.
<Sysname> display interface tunnel 2
Tunnel2
Current state: UP
Line protocol state: UP
Description: Tunnel2 Interface
Bandwidth: 64 kbps
Maximum transmission unit: 1464
Internet protocol processing: Disabled
Last clearing of counters: Never
Tunnel source 2.2.2.2, destination 1.1.1.1
Tunnel protocol/transport UDP_VXLAN/IP
Last 300 seconds input rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec
Last 300 seconds output rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec
Input: 0 packets, 0 bytes, 0 drops
Output: 0 packets, 0 bytes, 0 drops
5. Check on the VTEPs whether the source IP address of the VXLAN tunnel is a local IP address, and whether the destination IP address is an address on the remote VTEP. Verify that the destination IP address is reachable.
¡ Execute the display ip interface brief command to identify whether the source IP address of the VXLAN tunnel is a local IP address. If not, modify the source IP address of the VXLAN tunnel by using the source command.
<Sysname> display ip interface brief
*down: administratively down
(s): spoofing (l): loopback
Interface Physical Protocol IP address VPN instance Description
Loop1 up up(s) 2.2.2.2 -- --
……
MTunnel0 down down -- aaa --
Vlan1 *down down -- -- --
¡ Execute the display fib command to identify whether an entry for the destination IP address of the VXLAN tunnel is in the FIB table, and use the ping command to verify connectivity between the source and destination IP addresses of the VXLAN tunnel. If no FIB entry is found, modify the routing configuration to ensure Layer 3 connectivity to the destination IP address of the VXLAN tunnel.
<Sysname> display fib
Destination count: 4 FIB entry count: 4
Flag:
U:Useable G:Gateway H:Host B:Blackhole D:Dynamic S:Static
R:Relay F:FRR
Destination/Mask Nexthop Flag OutInterface/Token Label
0.0.0.0/32 127.0.0.1 UH InLoop0 Null
2.2.2.2/32 127.0.0.1 UH InLoop0 Null
1.1.1.1/32 127.0.0.1 UH InLoop0 Null
127.0.0.0/32 127.0.0.1 UH InLoop0 Null
6. Execute the display l2vpn vsi verbose command on the VTEPs to identify whether the VSI is up.
¡ If the VSI is down, check whether the shutdown command has been configured on the VSI. If yes, execute the undo shutdown command.
¡ If the VSI is up, go to step 7.
7. Perform steps 1 through 6.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
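The same-subnet check in step 3 can be done with any IP calculator. As a minimal sketch, the following Python function uses the standard ipaddress module. The name same_subnet is a hypothetical helper, and the addresses are expected in the address/mask-length form shown in the IP address/Mask column of the display ip interface brief output.

```python
import ipaddress


def same_subnet(local: str, remote: str) -> bool:
    """Return True if two 'address/mask-length' strings belong to one subnet.

    Both the network address and the mask length must be identical.
    """
    local_if = ipaddress.ip_interface(local)
    remote_if = ipaddress.ip_interface(remote)
    return local_if.network == remote_if.network
```

For example, 1.1.1.1/24 and 1.1.1.2/24 are in the same subnet, whereas 1.1.1.1/24 and 1.1.2.1/24 are not. Two addresses with different mask lengths are reported as being in different subnets.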
Related alarm and log messages
Alarm messages
Module Name: HH3C-IF-EXT-MIB
· hh3cIfPortUp (1.3.6.1.4.1.25506.2.40.3.0.5)
Log messages
· IFNET/3/PHY_UPDOWN
· IFNET/5/LINK_UPDOWN
Troubleshooting EVPN issues
Troubleshooting EVPN VXLAN
Intra-VXLAN tunnel setup failure in an EVPN VXLAN network with distributed gateways
Symptom
In an EVPN network with distributed gateways, tunnels cannot be established between VTEPs in the same VXLAN.
Common causes
The following are the common causes of this type of issue:
· Type-2 EVPN routes (MAC/IP advertisement routes) and type-3 EVPN routes (IMET routes) have not been received.
· The RT configuration is incorrect on EVPN instances.
Troubleshooting flow
Troubleshoot the issue by using the following process:
1. Verify that type-2 routes have been received.
2. Verify that type-3 routes have been received.
3. Verify that the RT configuration for EVPN instances is correct.
Figure 114 shows the troubleshooting process.
Figure 114 Flowchart for troubleshooting intra-VXLAN tunnel setup failure
Solution
To resolve the issue:
1. Execute the display bgp l2vpn evpn command on the local end to identify whether the local end has advertised type-2 or type-3 routes to the peer end. For example, the following output indicates that the local end has advertised type-2 and type-3 routes to 4.4.4.4. If a route reflector (RR) exists in the network, specify the RR address when you execute the display bgp l2vpn evpn command. If not, specify the peer end's address.
<Sysname> display bgp l2vpn evpn peer 4.4.4.4 advertised-routes
Total number of routes: 2
BGP local router ID is 1.1.1.1
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external,
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Route distinguisher: 2:2
Total number of routes: 2
Network NextHop MED LocPrf Path/Ogn
* > [2][0][48][0e86-19b6-0308][0][0.0.0.0]/104
0.0.0.0 0 100 i
* > [3][0][32][1.1.1.1]/80
0.0.0.0 0 100 i
¡ If the local end has advertised type-2 or type-3 routes to the peer end, go to step 2.
¡ If the local end has not advertised type-2 and type-3 routes to the peer end, identify whether BGP is configured correctly for EVPN. For more information, see EVPN VXLAN configuration in EVPN Configuration Guide.
2. Execute the display bgp l2vpn evpn command on the peer end to identify whether the peer end has advertised type-2 or type-3 routes to the local end. For example, the following output indicates that the peer end has advertised type-2 and type-3 routes to 4.4.4.4. If an RR exists in the network, specify the RR address when you execute the display bgp l2vpn evpn command. If not, specify the peer end's address.
<Sysname> display bgp l2vpn evpn peer 4.4.4.4 advertised-routes
Total number of routes: 2
BGP local router ID is 3.3.3.3
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external,
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Route distinguisher: 1:1
Total number of routes: 2
Network NextHop MED LocPrf Path/Ogn
* > [2][0][48][0e86-23cf-0507][0][0.0.0.0]/104
0.0.0.0 0 100 i
* > [3][0][32][3.3.3.3]/80
0.0.0.0 0 100 i
¡ If the peer end has advertised type-2 or type-3 routes to the local end, go to step 3.
¡ If the peer end has not advertised type-2 and type-3 routes to the local end, identify whether BGP is configured correctly for EVPN. For more information, see EVPN VXLAN configuration in EVPN Configuration Guide.
3. Execute the display this command in VSI view to identify whether export targets and import targets on both ends are correct.
[Sysname-vsi-aaa] display this
#
vsi aaa
vxlan 10
evpn encapsulation vxlan
route-distinguisher 2:2
vpn-target 1:1 export-extcommunity
vpn-target 2:2 import-extcommunity
#
return
¡ If the route targets are inconsistent on the local and peer ends, execute the vpn-target command in VSI view to modify the incorrect route targets.
¡ If the route targets are consistent on the local and peer ends, go to step 4.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
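The route checks in steps 1 and 2 come down to finding the EVPN route type, which is the first bracketed number of each entry in the Network column. The following Python sketch counts the valid routes per route type in captured advertised-routes output. It is an illustration under the assumption that the output has the layout shown above. The names count_route_types and missing_route_types are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Matches route rows that carry the "*" (valid) status code, for example:
#   * > [2][0][48][0e86-19b6-0308][0][0.0.0.0]/104
# The first bracketed number is the EVPN route type.
ROUTE_ROW = re.compile(r"^\s*\*[^\[\n]*\[(\d+)\]", re.MULTILINE)


def count_route_types(advertised_routes_output: str) -> Counter:
    """Count valid routes per EVPN route type (2, 3, 5, and so on)."""
    return Counter(int(t) for t in ROUTE_ROW.findall(advertised_routes_output))


def missing_route_types(advertised_routes_output: str,
                        required: set[int]) -> set[int]:
    """Return the required route types that do not appear in the output."""
    return required - set(count_route_types(advertised_routes_output))
```

For intra-VXLAN tunnel setup, call missing_route_types with the set {2, 3}. An empty result indicates that both route types have been advertised.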
Related alarm and log messages
Alarm messages
None.
Log messages
None.
Inter-VXLAN tunnel setup failure in an EVPN VXLAN network with distributed gateways
Symptom
In an EVPN network with distributed gateways, tunnels cannot be established between VTEPs in different VXLANs.
Common causes
The following are the common causes of this type of issue:
· Type-2 EVPN routes and type-5 EVPN routes (IP prefix advertisement routes) have not been received.
· The RT configuration is incorrect on VPN instances.
Troubleshooting flow
Troubleshoot the issue by using the following process:
1. Verify that type-2 routes have been received.
2. Verify that type-5 routes have been received.
3. Verify that the RT configuration for VPN instances is correct.
Figure 115 shows the troubleshooting process.
Figure 115 Flowchart for troubleshooting inter-VXLAN tunnel setup failure
Solution
To resolve the issue:
1. Execute the display bgp l2vpn evpn command on the local end to identify whether the local end has advertised type-2 or type-5 routes to the peer end. For example, the following output indicates that the local end has advertised type-2 and type-5 routes to 4.4.4.4. If an RR exists in the network, specify the RR address when you execute the display bgp l2vpn evpn command. If not, specify the peer end's address.
<Sysname> display bgp l2vpn evpn peer 4.4.4.4 advertised-routes
Total number of routes: 3
BGP local router ID is 1.1.1.1
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external,
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Route distinguisher: 1:1
Total number of routes: 1
Network NextHop MED LocPrf Path/Ogn
* > [5][0][24][10.1.1.0]/80
0.0.0.0 0 100 i
Route distinguisher: 2:2
Total number of routes: 2
Network NextHop MED LocPrf Path/Ogn
* > [2][0][48][0e86-19b6-0308][0][0.0.0.0]/104
0.0.0.0 0 100 i
* > [3][0][32][1.1.1.1]/80
0.0.0.0 0 100 i
¡ If the local end has advertised type-2 or type-5 routes to the peer end, go to step 2.
¡ If the local end has not advertised type-2 and type-5 routes to the peer end, identify whether BGP is configured correctly for EVPN. For more information, see EVPN VXLAN configuration in EVPN Configuration Guide.
2. Execute the display bgp l2vpn evpn command on the peer end to identify whether the peer end has advertised type-2 or type-5 routes to the local end. For example, the following output indicates that the peer end has advertised type-2 and type-5 routes to 4.4.4.4. If an RR exists in the network, specify the RR address when you execute the display bgp l2vpn evpn command. If not, specify the peer end's address.
<Sysname> display bgp l2vpn evpn peer 4.4.4.4 advertised-routes
Total number of routes: 3
BGP local router ID is 3.3.3.3
Status codes: * - valid, > - best, d - dampened, h - history,
s - suppressed, S - stale, i - internal, e - external,
a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete
Route distinguisher: 1:1
Total number of routes: 2
Network NextHop MED LocPrf Path/Ogn
* > [2][0][48][0e86-23cf-0507][0][0.0.0.0]/104
0.0.0.0 0 100 i
* > [3][0][32][3.3.3.3]/80
0.0.0.0 0 100 i
Route distinguisher: 3:3
Total number of routes: 1
Network NextHop MED LocPrf Path/Ogn
* > [5][0][24][10.1.1.0]/80
0.0.0.0 0 100 i
¡ If the peer end has advertised type-2 or type-5 routes to the local end, go to step 3.
¡ If the peer end has not advertised type-2 and type-5 routes to the local end, identify whether BGP is configured correctly for EVPN. For more information, see EVPN VXLAN configuration in EVPN Configuration Guide.
3. Execute the display this command in L3VNI VPN instance view to identify whether export targets and import targets on both ends are correct.
[Sysname-vpn-instance-vpna] display this
#
ip vpn-instance vpna
route-distinguisher 1:1
#
address-family evpn
vpn-target 1:1 import-extcommunity
vpn-target 1:1 export-extcommunity
#
return
¡ If the route targets are inconsistent on the local and peer ends, execute the vpn-target command to modify the incorrect route targets.
¡ If the route targets are consistent on the local and peer ends, go to step 4.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
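The route target check in step 3 follows one rule: a route is accepted only when at least one export target of the advertising end is also an import target of the receiving end. The following Python sketch applies that rule to display this output captured from both ends. It is a sketch, not an H3C tool. The function names are hypothetical, and the parser assumes that each vpn-target line lists one or more route targets followed by an optional direction keyword, with a missing keyword treated as both.

```python
from __future__ import annotations

import re

# Captures everything after "vpn-target" on a configuration line.
VPN_TARGET = re.compile(r"^\s*vpn-target\s+(.+?)\s*$", re.MULTILINE)
DIRECTIONS = {"import-extcommunity", "export-extcommunity", "both"}


def parse_route_targets(config: str) -> tuple[set[str], set[str]]:
    """Return (import targets, export targets) found in 'display this' output."""
    imports: set[str] = set()
    exports: set[str] = set()
    for line in VPN_TARGET.findall(config):
        tokens = line.split()
        # Assumption: a line without a direction keyword means "both".
        direction = tokens[-1] if tokens[-1] in DIRECTIONS else "both"
        targets = [t for t in tokens if t not in DIRECTIONS]
        if direction in ("import-extcommunity", "both"):
            imports.update(targets)
        if direction in ("export-extcommunity", "both"):
            exports.update(targets)
    return imports, exports


def routes_accepted(sender_config: str, receiver_config: str) -> bool:
    """True if the receiver imports at least one target the sender exports."""
    _, sender_exports = parse_route_targets(sender_config)
    receiver_imports, _ = parse_route_targets(receiver_config)
    return bool(sender_exports & receiver_imports)
```

Run routes_accepted in both directions, once with the local end as the sender and once with the peer end as the sender, because tunnel setup requires each end to accept the routes of the other.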
Related alarm and log messages
Alarm messages
None.
Log messages
None.
Layer 2 VXLAN business traffic interruption
Symptom
A VXLAN network cannot forward Layer 2 VXLAN business traffic.
Common causes
The following are the common causes of this type of issue:
· ACs or VXLAN tunnels are not established.
· MAC addresses are not learned.
Troubleshooting flow
Figure 116 shows the troubleshooting flowchart.
Figure 116 Flowchart for troubleshooting Layer 2 VXLAN business traffic interruption
Solution
To resolve the issue:
1. Execute the display l2vpn vsi verbose command to view the VXLAN tunnels and ACs of the involved VSI.
<Sysname> display l2vpn vsi verbose
VSI Name: vpna
VSI Index : 0
VSI State : Up
MTU : 1500
Bandwidth : Unlimited
Broadcast Restrain : Unlimited
Multicast Restrain : Unlimited
Unknown Unicast Restrain: Unlimited
MAC Learning : Enabled
MAC Table Limit : -
MAC Learning rate : -
Drop Unknown : -
Flooding : Enabled
Statistics : Disabled
VXLAN ID : 10
Tunnels:
Tunnel Name Link ID State Type Flood proxy
Tunnel1 0x5000001 Up Manual Disabled
ACs:
AC Link ID State Type
GE2/0/1 srv1000 0 Up Manual
¡ If both the ACs and VXLAN tunnels are up, go to step 2.
¡ If an AC is down, modify the incorrect AC configuration.
¡ If a VXLAN tunnel is down, troubleshoot the issue as described in "Intra-VXLAN tunnel setup failure in an EVPN VXLAN network with distributed gateways."
2. Execute the display l2vpn mac-address command to check the VSI MAC address table for the MAC addresses of endpoints in the network and the total number of learned MAC address entries.
<Sysname> display l2vpn mac-address
* - The output interface is issued to another VSI
MAC Address State VSI Name Link ID/Name Aging
0001-0001-0001 Static aaa Tunnel1 NotAging
52f6-bc1e-0d06 Dynamic vpna GE2/0/1 Aging
--- 2 mac address(es) found ---
¡ If the endpoint MAC addresses have been learned, go to step 3.
¡ If the endpoint MAC addresses are not learned, execute the display this command in VSI view and check whether the mac-table limit and mac-table limit drop-unknown commands have been executed for the VSI. If the commands exist and the MAC address learning limit has been reached, increase the limit by using the mac-table limit command or remove the limit by using the undo mac-table limit command.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
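Endpoint MAC addresses are often recorded in colon-separated notation on servers, whereas the display l2vpn mac-address output uses the xxxx-xxxx-xxxx notation. The following Python sketch normalizes a MAC address and searches captured output for it, which supports the check in step 2. The names normalize_mac and find_mac are hypothetical, and the five-column row layout is assumed from the sample output above.

```python
from __future__ import annotations

import re


def normalize_mac(mac: str) -> str:
    """Convert a MAC address in a common notation to xxxx-xxxx-xxxx."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return "-".join(digits[i:i + 4] for i in range(0, 12, 4))


def find_mac(mac_table_output: str, mac: str) -> dict[str, str] | None:
    """Return the table row for the MAC address, or None if it is not learned."""
    wanted = normalize_mac(mac)
    for line in mac_table_output.splitlines():
        fields = line.split()
        # Data rows have five columns and start with the MAC address.
        if len(fields) >= 5 and fields[0].lower() == wanted:
            return dict(zip(("mac", "state", "vsi", "link", "aging"), fields))
    return None
```

A result of None corresponds to the case in which the endpoint MAC address has not been learned. Otherwise, the returned row shows the VSI and the AC or tunnel where the address was learned.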
Related alarm and log messages
Alarm messages
None.
Log messages
None.
Layer 3 VXLAN business traffic interruption
Symptom
A VXLAN network cannot forward Layer 3 VXLAN business traffic.
Common causes
The following are the common causes of this type of issue:
· ACs or VXLAN tunnels are not established.
· The device's router MAC address is incorrect.
Troubleshooting flow
Figure 117 shows the troubleshooting flowchart.
Figure 117 Flowchart for troubleshooting Layer 3 VXLAN business traffic interruption
Solution
To resolve the issue:
1. Execute the display l2vpn vsi verbose command to view the VXLAN tunnels and ACs of the involved VSI.
<Sysname> display l2vpn vsi verbose
VSI Name: vpna
VSI Index : 0
VSI State : Up
MTU : 1500
Bandwidth : Unlimited
Broadcast Restrain : Unlimited
Multicast Restrain : Unlimited
Unknown Unicast Restrain: Unlimited
MAC Learning : Enabled
MAC Table Limit : -
MAC Learning rate : -
Drop Unknown : -
Flooding : Enabled
Statistics : Disabled
VXLAN ID : 10
Tunnels:
Tunnel Name Link ID State Type Flood proxy
Tunnel1 0x5000001 Up Manual Disabled
ACs:
AC Link ID State Type
GE2/0/1 srv1000 0 Up Manual
¡ If both the ACs and VXLAN tunnels are up, go to step 2.
¡ If an AC is down, modify the incorrect AC configuration.
¡ If a VXLAN tunnel is down, troubleshoot the issue as described in "Inter-VXLAN tunnel setup failure in an EVPN VXLAN network with distributed gateways."
2. Execute the display evpn routing-table command, check the routing table of the L3VNI VPN instance, and record the nexthop address (Nexthop field) in the route for the target endpoint IP address (IP address field).
<Sysname> display evpn routing-table vpn-instance vpn1
Flags: E - with valid ESI A - A-D ready L - Local ES exists
VPN instance name: vpn1 Local L3VNI: 7
IP address Nexthop Outgoing interface NibID Flags
10.1.1.11 1.1.1.1 Vsi-interface3 0x18000000 EAL
3. Execute the display arp command to view the ARP information for the next hop.
<Sysname> display arp
Type: S-Static D-Dynamic O-Openflow R-Rule M-Multiport I-Invalid
IP address MAC address VLAN/VSI name Interface Aging Type
1.1.1.1 00e0-fe50-6503 vsi1 Tunnel1 960 D
¡ Execute the display interface vsi-interface command to view the MAC address of the L3VNI VSI interface, which is also the router MAC address. If the nexthop address is mapped to the router MAC address, go to step 4.
¡ If the nexthop address is not mapped to the router MAC address, restore consistency between the mapped MAC address and the MAC address of the L3VNI VSI interface. Alternatively, use the evpn global-mac command to configure an EVPN global MAC address.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
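Steps 2 and 3 chain two lookups: the endpoint IP address gives the nexthop address, and the nexthop address gives the MAC address that must equal the router MAC address. As a rough sketch, the following Python code performs both lookups on captured output. The function names are hypothetical, and the code assumes that the key is the first column and the wanted value is the second column of each table, as in the sample outputs above.

```python
from __future__ import annotations


def lookup(table_output: str, key: str, column: int = 1) -> str | None:
    """Return one column of the first row whose first column equals key."""
    for line in table_output.splitlines():
        fields = line.split()
        if fields and fields[0] == key and len(fields) > column:
            return fields[column]
    return None


def nexthop_uses_router_mac(routing_table: str, arp_table: str,
                            endpoint_ip: str, router_mac: str) -> bool:
    """Check that the nexthop for endpoint_ip is mapped to router_mac.

    routing_table: 'display evpn routing-table vpn-instance ...' output.
    arp_table: 'display arp' output.
    router_mac: MAC address of the L3VNI VSI interface on the nexthop VTEP.
    """
    nexthop = lookup(routing_table, endpoint_ip)
    if nexthop is None:
        return False  # no route for the endpoint
    mapped_mac = lookup(arp_table, nexthop)
    return mapped_mac is not None and mapped_mac.lower() == router_mac.lower()
```

A False result covers three cases: no route exists for the endpoint, no ARP entry exists for the nexthop, or the mapped MAC address differs from the router MAC address. Inspect the command output to tell them apart.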
Related alarm and log messages
Alarm messages
None.
Log messages
None.
Prolonged VM migration
Symptom
In an EVPN network, a VTEP does not learn the MAC address or ARP information of a VM immediately after the VM migrates to the VTEP.
Common causes
The following are the common causes of this type of issue:
· The VTEP has not learned the MAC address or ARP entry for the migrated VM.
· The VTEP has not learned the MAC address or ARP entry for the migrated VM through BGP EVPN route synchronization.
· The BGP EVPN routes synchronized between VTEPs are not optimal ones.
Troubleshooting flow
Figure 118 shows the troubleshooting flowchart.
Figure 118 Flowchart for troubleshooting prolonged VM migration
Solution
1. After the migration, check for the MAC address and ARP entry on the destination VTEP.
Execute the display l2vpn mac-address command and check the VSI MAC address table for the MAC address of the migrated VM.
<Sysname> display l2vpn mac-address
* - The output interface is issued to another VSI
MAC Address State VSI Name Link ID/Name Aging
52f6-bc1e-0d06 EVPN aaa Tunnel10 NotAging
0001-0001-0001 Dynamic vpna GE2/0/1 Aging
--- 2 mac address(es) found ---
Execute the display arp command and check the VSI ARP table for the ARP entry of the migrated VM.
<Sysname> display arp
Type: S-Static D-Dynamic O-Openflow R-Rule M-Multiport I-Invalid
IP address MAC address VLAN/VSI name Interface Aging Type
10.1.1.3 0001-0001-0001 vpna GE2/0/1 960 D
1.1.1.4 00e0-fe60-5000 vsi2 Tunnel1 -- M
¡ If a MAC address or ARP entry exists for the migrated VM, go to step 2.
¡ If no MAC address or ARP entry exists for the migrated VM, the VTEP has not learned the MAC address or ARP information of the migrated VM. Bring up the VM on the VTEP so that the VTEP can learn the MAC address and ARP information.
2. Before migration, check on the VTEP whether the MAC address or ARP entry of the migrated VM has been synchronized through BGP EVPN routes.
Execute the display evpn route mac command to verify that the MAC address of the migrated VM has been learned from synchronized BGP EVPN routes. The value B in the Flags field indicates that a MAC address entry is learned from BGP EVPN routes.
<Sysname> display evpn route mac
Flags: D - Dynamic B - BGP L - Local active
G - Gateway S - Static M - Mapping I - Invalid
VSI name: bbb
EVPN instance: -
MAC address Link ID/Name Flags Encap Next hop
0000-0000-000a 1 DL VXLAN -
0001-0001-0001 Tunnel1 B VXLAN 2.2.2.2
Execute the display evpn route arp command to verify that the ARP information of the migrated VM has been learned from synchronized BGP EVPN routes. The value B in the Flags field indicates that an ARP entry is learned from BGP EVPN routes.
<Sysname> display evpn route arp
Flags: D - Dynamic B - BGP L - Local active
G - Gateway S - Static M - Mapping I - Invalid
VPN instance: vpn1 Interface: Vsi-interface1
IP address MAC address Router MAC VSI index Flags
10.1.1.1 0001-0001-0001 a0ce-7e40-0400 0 B
10.1.1.11 0001-0001-0002 a0ce-7e40-0400 0 DL
10.1.1.101 0001-0011-0101 a0ce-7e40-0400 0 SL
10.1.1.102 0001-0011-0102 0011-9999-0000 0 BS
¡ If a MAC address entry or ARP entry has been learned through BGP EVPN route synchronization for the migrated VM, go to step 3.
¡ If no MAC address entry or ARP entry has been learned through BGP EVPN route synchronization for the migrated VM, execute the vpn-target command to modify the route targets of the local EVPN instance. Make sure the EVPN instance's route targets on the local and peer ends are consistent.
3. Execute the display bgp l2vpn evpn command and verify that the MAC/IP advertisement route carrying the MAC address and ARP information of the migrated VM is optimal. Verify that the values in the State field include best. In the following output, the MAC/IP advertisement route advertises MAC address 0001-0203-0405 and IP address 5.5.5.5/32, and the route state values include best.
<Sysname> display bgp l2vpn evpn route-distinguisher 1.1.1.1:100 [2][5][48][0001-0203-0405][32][5.5.5.5] 136
BGP local router ID: 172.16.250.133
Local AS number: 100
Route distinguisher: 1.1.1.1:100
Total number of routes: 1
Paths: 1 available, 1 best
BGP routing table information of [2][5][48][0001-0203-0405][32][5.5.5.5]/136:
From : 10.1.1.2 (192.168.56.17)
Rely nexthop : 10.1.1.2
Original nexthop: 10.1.1.2
OutLabel : NULL
Ext-Community : <RT: 1:2>, <RT: 1:3>, <RT: 1:4>, <RT: 1:5>, <RT: 1:6>, <RT: 1:7
>, <Encapsulation Type: VXLAN>, <Router's Mac: 0006-0708-0910
>, <MAC Mobility: Flag 0, SeqNum 2>, <Default GateWay>
RxPathID : 0x0
TxPathID : 0x0
AS-path : 200
Origin : igp
Attribute value : MED 0, pref-val 0
State : valid, external, best
IP precedence : N/A
QoS local ID : N/A
Traffic index : N/A
EVPN route type : MAC/IP advertisement route
ESI : 0001.0203.0405.0607.0809
Ethernet tag ID : 5
MAC address : 0001-0001-0001
IP address : 10.1.1.1/32
MPLS label1 : 10
MPLS label2 : 100
Re-origination : Enable
¡ If the MAC/IP advertisement route is an optimal route, go to step 4.
¡ If the MAC/IP advertisement route is not an optimal route, modify the routing configuration so that the route becomes an optimal one.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
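The check in step 3 reads two values from the detailed route output: the State field, which must include best, and the MAC mobility sequence number in the Ext-Community field. In RFC 7432, a MAC/IP advertisement route with a higher sequence number is preferred after a MAC move. The following Python sketch extracts both values. The function names are hypothetical, and the code assumes the field layout of the sample output above, including that the MAC Mobility community is not split across a wrapped line.

```python
from __future__ import annotations

import re

# Matches a line such as "State : valid, external, best".
STATE_LINE = re.compile(r"^\s*State\s*:\s*(.+?)\s*$", re.MULTILINE)
# Matches "<MAC Mobility: Flag 0, SeqNum 2>" inside the Ext-Community field.
MOBILITY = re.compile(r"MAC Mobility:\s*Flag\s+(\d+),\s*SeqNum\s+(\d+)")


def route_states(route_detail_output: str) -> list[str]:
    """Return the values of the State field, or an empty list if absent."""
    match = STATE_LINE.search(route_detail_output)
    return [s.strip() for s in match.group(1).split(",")] if match else []


def is_best(route_detail_output: str) -> bool:
    """True if the State field includes 'best'."""
    return "best" in route_states(route_detail_output)


def mobility_seq(route_detail_output: str) -> int | None:
    """Return the MAC mobility sequence number, or None if not present."""
    match = MOBILITY.search(route_detail_output)
    return int(match.group(2)) if match else None
```

Comparing the mobility_seq results on the source and destination VTEPs can help show which VTEP holds the most recent advertisement for the migrated VM.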
Related alarm and log messages
Alarm messages
None.
Log messages
None.
Unavailability to access a VM after MAC migration
Symptom
After MAC address migration occurs on a VTEP, endpoints cannot access the migrated VM.
Common causes
The common causes of this issue are traffic anomalies and an incorrect outgoing interface in the MAC address entry for the VM due to network attacks.
Troubleshooting flow
Troubleshoot the issue by using the following process:
1. View the MAC address migration information.
2. Verify that the outgoing interface in the MAC address entry for the VM is correct.
Solution
1. Execute the display evpn route mac-mobility command to view the MAC address migration information. In the following output, MAC address 1000-0000-0000 has migrated from GE2/0/1 to the local VTEP.
<Sysname> display evpn route mac-mobility
Flags: S - Suppressed, N - Not suppressed
Suppression threshold: 5
Detection cycle : 180s
Suppression time : Permanent
VSI name : vsia
EVPN instance : -
MAC address Move count Moved from Flags Suppressed at
1000-0000-0000 10 GE2/0/1 S 15:30:30 2018/03/30
2. Execute the display l2vpn mac-address command to verify that the outgoing interface in the MAC address entry for the VM is correct. The Link ID/Name field displays the name of the interface or tunnel interface where a MAC address is learned.
<Sysname> display l2vpn mac-address
* - The output interface is issued to another VSI
MAC Address State VSI Name Link ID/Name Aging
1000-0000-0000 EVPN aaa Tunnel10 NotAging
52f6-bc1e-0d06 Dynamic vpna GE2/0/1 Aging
--- 2 mac address(es) found ---
¡ If the outgoing interface is correct, go to step 3.
¡ If the outgoing interface is incorrect, bring up the VM on the destination VTEP for the VTEP to learn or update its forwarding entries.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Diagnostic information collected by using the display diagnostic-information command.
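The migration check in step 1 can also be run against captured command output. The following Python sketch is a minimal illustration, assuming that the output of the display evpn route mac-mobility command has been saved as text and that each entry line uses the column order shown in the example above. The function name parse_mac_mobility is hypothetical and is not part of any H3C tool.

```python
import re

# Matches a MAC address in the H-H-H notation used in the example output.
MAC_RE = re.compile(r"^[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}$")

def parse_mac_mobility(text):
    """Return one dict per MAC entry line found in the captured output."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # An entry line starts with a MAC address, followed by the move
        # count, the interface the MAC moved from, and the flag (S or N).
        if len(fields) >= 4 and MAC_RE.match(fields[0]):
            entries.append({
                "mac": fields[0],
                "move_count": int(fields[1]),
                "moved_from": fields[2],
                "suppressed": fields[3] == "S",
            })
    return entries

# Sample taken from the example output in step 1.
SAMPLE = """\
MAC address Move count Moved from Flags Suppressed at
1000-0000-0000 10 GE2/0/1 S 15:30:30 2018/03/30
"""

for entry in parse_mac_mobility(SAMPLE):
    print(entry)
```

An entry that is flagged as suppressed and has a high move count is a candidate for the outgoing interface check in step 2.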
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting ACL and QoS issues
QoS issues
Traffic failure to match a traffic class
Symptom
When you execute the display qos policy interface command to view the configuration information and running status of the QoS policy on an interface, you find that the current traffic on the interface does not match the traffic classes in the QoS policy.
· For a hardware forwarding product, if the Accounting enable field for traffic class 1 displays 0 (Bytes/Packets) 0 (bps) in the command output, no packets on the interface have matched the traffic class. To collect statistics for traffic matching a traffic class on a hardware forwarding product, you must first configure the traffic accounting action by executing the accounting command in the traffic behavior of the QoS policy.
· For a software forwarding product, if the Matched field for traffic class 1 displays 0 (Packets) 0 (Bytes) in the command output, no packets on the interface have matched the traffic class. The QoS policy of a software forwarding product contains a default traffic class named default-class. Traffic that does not match any other traffic class in the QoS policy matches default-class.
<Sysname> display qos policy interface gigabitethernet 2/0/1 inbound
Interface: GigabitEthernet2/0/1
Direction: Inbound
Policy: 1
Classifier: default-class
Matched : 213126 (Packets) 40928738 (Bytes)
5-minute statistics:
Forwarded: 20/4208 (pps/bps)
Dropped : 0/0 (pps/bps)
Operator: AND
Rule(s) :
If-match any
Behavior: be
-none-
Classifier: 1
Matched : 0 (Packets) 0 (Bytes)
5-minute statistics:
Forwarded: 0/0 (pps/bps)
Dropped : 0/0 (pps/bps)
Operator: AND
Rule(s) :
If-match acl 3000
Behavior: 1
Marking:
Remark dscp 3
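The symptom can be detected from captured output of the display qos policy interface command. The following Python sketch is an illustration only. It assumes the software forwarding output format shown above, where each traffic class has a Matched field, and the function name zero_match_classifiers is hypothetical.

```python
import re

def zero_match_classifiers(text):
    """Return the names of traffic classes whose Matched packet count is 0."""
    result = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Classifier:\s*(\S+)", line)
        if m:
            # Remember which traffic class the following counters belong to.
            current = m.group(1)
            continue
        m = re.match(r"Matched\s*:\s*(\d+)\s*\(Packets\)", line)
        if m and current is not None and int(m.group(1)) == 0:
            result.append(current)
    return result

# Sample reduced from the example output above.
SAMPLE = """\
Interface: GigabitEthernet2/0/1
Direction: Inbound
Policy: 1
Classifier: default-class
Matched : 213126 (Packets) 40928738 (Bytes)
Behavior: be
Classifier: 1
Matched : 0 (Packets) 0 (Bytes)
Behavior: 1
"""

print(zero_match_classifiers(SAMPLE))  # ['1']
```

In this sample, all traffic falls into default-class and traffic class 1 has matched nothing, which is the condition described in this section.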
Common causes
The following are the common causes of this type of issue:
· The interface that has the QoS policy applied is in down state and is not forwarding traffic.
· The configuration of a traffic class is incorrect, and it cannot match the forwarded traffic.
· A higher-priority policy is executed on the traffic matching the ACL in a traffic class of the QoS policy.
Troubleshooting flow
Figure 119 shows the troubleshooting flowchart.
Figure 119 Flowchart for troubleshooting traffic failure to match a traffic class
Solution
1. Identify whether the physical link state of the interface is normal.
Execute the display interface command on the device to check the interface status. For example:
<Sysname> display interface gigabitethernet 2/0/1
GigabitEthernet2/0/1
Interface index: 386
Current state: Administratively DOWN
Line protocol state: DOWN
…
a. If the Current state field displays Administratively DOWN, execute the undo shutdown command on the interface to bring up the interface.
b. If the Current state field displays DOWN, check the physical connection of the interface.
c. If the physical link of the interface is operating normally but the issue persists, proceed to the following steps.
2. Check the configuration of traffic classes in the QoS policy applied to the device interface.
Execute the display traffic classifier user-defined command on the device to check the configuration of user-defined traffic classes. For more information on the match criteria of the if-match command, see QoS commands in ACL and QoS Command Reference.
If the configuration of a traffic class is incorrect, execute the traffic classifier command to enter the view of the traffic class, and execute the if-match command to modify the match criteria of the traffic class. For example:
[Sysname-classifier-1] if-match dscp ef
[Sysname-classifier-1] display this
traffic classifier 1 operator or
if-match protocol ipv6
if-match dscp ef
Identify whether the logical relationship among the match criteria, which is displayed in the Operator field, is correct. AND means that a packet must match all criteria to belong to the class. OR means that a packet that matches any criterion belongs to the class. If the Rule(s) field contains more than one match criterion and the Operator field displays AND, but traffic that matches only some of the criteria must also belong to the class, execute the traffic classifier command and set the operator parameter to or.
<Sysname> display traffic classifier user-defined
User-defined classifier information:
Classifier: 1 (ID 101)
Operator: AND
Rule(s) :
If-match dscp ef
Classifier: 2 (ID 102)
Operator: AND
Rule(s) :
If-match dscp af21
Classifier: 3 (ID 103)
Operator: AND
Rule(s) :
If-match dscp af11
If the traffic class in the QoS policy is configured correctly but the issue persists, proceed to the following steps.
3. Identify whether a higher-priority behavior is executed on the traffic. When a traffic class references an ACL for traffic matching, the QoS policy configured by using the MQC method might not take effect because a higher-priority behavior has already been executed on the traffic matching the ACL. The priority order for different behaviors is as follows:
¡ In the outbound direction: Packet filtering > Global MQC QoS policy > MQC QoS policy applied to interface.
¡ In the inbound direction: Packet filtering > MQC QoS policy applied to interface > Global MQC QoS policy.
Execute the display current-configuration command to identify whether higher-priority policy behaviors exist in the current running configuration. If no such configurations exist but the issue persists, proceed to the following steps.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration data and log messages.
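The AND and OR operator semantics described in step 2 can be illustrated with a small model. The following Python sketch is a simplified illustration, not the device implementation. It assumes that a packet is represented as a dictionary of header fields and that each match criterion is an exact comparison on one field. The function name packet_matches and the field names are hypothetical.

```python
def packet_matches(packet, criteria, operator):
    """Evaluate match criteria against a packet under the AND or OR operator."""
    # Each criterion is a (field, value) pair compared for equality.
    results = [packet.get(field) == value for field, value in criteria]
    if operator.upper() == "AND":
        return all(results)   # every criterion must match
    if operator.upper() == "OR":
        return any(results)   # one matching criterion is enough
    raise ValueError(f"unknown operator: {operator}")

# Criteria modeled on the example: if-match protocol ipv6, if-match dscp ef.
criteria = [("protocol", "ipv6"), ("dscp", "ef")]
ipv4_ef = {"protocol": "ip", "dscp": "ef"}

print(packet_matches(ipv4_ef, criteria, "AND"))  # False
print(packet_matches(ipv4_ef, criteria, "OR"))   # True
```

An IPv4 packet marked with DSCP EF matches only one of the two criteria, so it belongs to the class under OR but not under AND. This is why a traffic class with several criteria and the AND operator can show zero matched packets.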
Related alarm and log messages
Alarm messages
N/A
Log messages
· QOS_POLICY_APPLYIF_CBFAIL
· QOS_POLICY_APPLYIF_FAIL
Ineffective QPPB policy
Symptom
As shown in Figure 120, in a typical QoS Policy Propagation Through the Border Gateway Protocol (QPPB) environment, Device A and Device B establish a BGP neighbor relationship. Device B sends BGP route 10.10.10.1/24 to Device A. Device A sets the IP precedence or local QoS ID value for BGP route 10.10.10.1/24 through a routing policy and adds it to the routing table of Device A. When Device A receives a packet destined to 10.10.10.1/24, it classifies the packet based on the IP precedence or local QoS ID value in the routing table of Device A and executes the corresponding action.
The QPPB policy does not take effect when Device A forwards packets. The packets are not effectively classified based on the IP precedence or local QoS ID value in the routing table of Device A, and the corresponding action is not executed.
The following section describes the troubleshooting flow in BGP IPv4 unicast address view. The flow is similar for other address family views.
Figure 120 Typical QPPB network
Common causes
The following are the common causes of this type of issue:
· The physical link between routers is not connected.
· The BGP route has failed to be advertised.
· The IP precedence or local QoS ID value fails to be issued to the routing table.
· The QPPB policy has not been applied to the forwarding interface.
· The configuration of the QPPB policy is incorrect.
Troubleshooting flow
To troubleshoot this type of fault:
· Identify whether the physical link between routers is operating normally.
· Identify whether the BGP neighbor relationship has been established normally.
· Identify whether BGP routes have been learned from the peer.
· Identify whether the IP precedence or local QoS ID value is properly issued to the routing table.
· Identify whether the QPPB policy configuration is correct.
· Identify whether the QPPB policy is applied to the forwarding interface.
Figure 121 shows the troubleshooting flowchart of this type of fault.
Figure 121 Flowchart for troubleshooting ineffective QPPB policy
Solution
1. Check the connectivity of the link between Device A and Device B.
Execute the display interface command on Device A, Device B, and the network devices between them to check the physical link state. The following example shows the information for the interconnecting interface on Device A.
<Sysname> display interface gigabitethernet 2/0/1
GigabitEthernet2/0/1
Interface index: 386
Current state: Administratively DOWN
Line protocol state: DOWN
…
a. If the Current state field displays Administratively DOWN, execute the undo shutdown command on the interface to bring up the interface.
b. If the Current state field displays DOWN, check the physical connection of the interface.
c. If the physical link of the interface is operating normally but the issue persists, proceed to the following steps.
2. Identify whether the BGP neighbor relationship between Device A and Device B is normal. Device A must be able to learn routes from its BGP peer normally.
a. Execute the display ip routing-table protocol command on Device A. Identify whether Device A has normally learned BGP route 10.10.10.1/24 from the BGP peer (Device B).
- If this BGP route appears in the command output, it means BGP routes are learned normally. In this case, proceed to step 3.
- If this BGP route does not appear in the command output, it means BGP routes are learned abnormally. In this case, proceed to step b.
<Sysname> display ip routing-table protocol bgp
…
Destination/Mask Proto Pre Cost NextHop Interface
192.168.80.0/24 bgp 255 10 192.168.80.10 GE2/0/1
10.10.10.1/24 bgp 255 10 2.2.2.2 GE2/0/1
…
b. Device A and Device B establish a BGP neighbor relationship through their respective Loopback 1 interfaces. By executing the display bgp peer command on Device A, you can identify whether the BGP neighbor relationship between Device A and Device B is normal.
- If the State field of the peer (Device B) displays Established, it means that the BGP neighbor relationship between Device A and Device B is normal.
- If not, see the BGP troubleshooting guide to troubleshoot BGP-related issues.
<Sysname> display bgp peer ipv4
BGP local router ID: 1.1.1.1
Local AS number: 100
Total number of peers: 1 Peers in established state: 1
* - Dynamically created peer
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
2.2.2.2 200 13 16 0 0 00:10:34 Established
c. If the BGP neighbor relationship is normal and Device A can learn the route from the BGP peer normally, proceed to the following steps.
3. Identify whether the IP precedence or local QoS ID value is properly issued to the routing table.
Execute the display ip routing-table ip-address verbose command on Device A to identify whether the BGP route learned from Device B is configured with the correct IP precedence or local QoS ID value. The command output is as follows:
<Sysname> display ip routing-table 10.10.10.1 verbose
…
Destination: 10.10.10.1/24
Protocol: BGP
Process ID: 0
SubProtID: 0x1 Age: 00h00m37s
FlushedAge: 15h28m49s
Cost: 0 Preference: 255
IpPre: N/A QosLocalID: 100
Tag: 0 State: Active Adv
In the command output, the IpPre field represents the IP precedence value, and the QosLocalID field represents the local QoS ID value. If both of these fields have a value of N/A, the BGP route learned from Device B has not been configured with an IP precedence or local QoS ID value. Execute the following commands on Device A to add the relevant configuration.
a. Execute the ip prefix-list command to configure an IPv4 prefix list or an item for the list and permit routes destined to subnet 10.10.10.0/24 and with a mask length of 24.
#
ip prefix-list 10 index 10 permit 10.10.10.0 24
#
b. Execute the route-policy command to create a routing policy. In the routing policy, use the if-match ip command to configure the criteria of matching the IPv4 prefix list created above. Then, execute either the apply qos-local-id or apply ip-precedence command to configure the local QoS ID or IP precedence value.
#
route-policy a permit node 10
if-match ip address prefix-list 10
apply ip-precedence 1
apply qos-local-id 100
#
c. Execute the peer route-policy command in BGP IPv4 unicast address family view to apply the routing policy to the routes from the peer device (Device B).
If the BGP route learned from Device B is configured with the correct IP precedence or local QoS ID value, proceed to the following steps.
4. Identify whether the QPPB policy configuration is correct.
Execute the display qos policy user-defined command on Device A to view the QPPB policy configuration.
<Sysname> display qos policy user-defined
User-defined QoS policy information:
Policy: aaa (ID 106)
Classifier: aaa (ID 0)
Behavior: aaa
Redirecting:
Redirect to next-hop 192.168.10.1
a. Execute the display traffic classifier command to check the traffic class configuration in the QPPB policy. In this example, the Classifier field displays aaa.
<Sysname> display traffic classifier user-defined aaa
User-defined classifier information:
Classifier: aaa (ID 100)
Operator: AND
Rule(s) :
If-match qos-local-id 100
- If the Rule(s) field displays the If-match qos-local-id or If-match ip-precedence rule, make sure the match criteria of the traffic class are consistent with the IP precedence or local QoS ID value configured in the previous routing policy.
- If the If-match qos-local-id or If-match ip-precedence match criterion does not exist, or if the traffic class configuration is inconsistent with the IP precedence or local QoS ID value configured in the routing policy, execute the undo if-match command in traffic class view to delete the original configuration, and then execute the if-match command again to configure a match criterion that matches the IP precedence or local QoS ID value. For detailed troubleshooting steps regarding QoS policy failure, see "Troubleshooting ACL and QoS issues."
If the configuration of the QPPB policy is correct, proceed to the following steps.
5. Identify whether the QPPB feature is configured on the packet forwarding interface and a QPPB policy is applied.
On Device A, configure QPPB on the outgoing interface or incoming interface, and apply the QPPB policy to the interface. On this interface, use the display this command to identify whether the configuration is complete. Take the configuration on the incoming interface as an example. The configuration is displayed as follows:
#
bgp-policy destination ip-prec-map ip-qos-map
qos apply policy aaa inbound
#
If either of the preceding configurations is missing, execute the bgp-policy or qos apply policy command on the interface to add the configuration. If the QPPB policy has already been applied to the packet forwarding interface and the QPPB feature has been configured, proceed to the following steps.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration data, log messages, and alarm messages.
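The consistency check between the route attributes and the traffic class rules in this procedure can be sketched in Python. This is a minimal illustration, assuming that the output of the display ip routing-table verbose command and the display traffic classifier user-defined command has been captured as text in the formats shown above, that each rule carries a single value, and that the traffic class uses the AND operator. All function names are hypothetical.

```python
import re

def route_qos_marks(verbose_output):
    """Extract the IpPre and QosLocalID values from verbose route output."""
    marks = {}
    for key, label in (("ip-precedence", "IpPre"), ("qos-local-id", "QosLocalID")):
        m = re.search(label + r":\s*(\S+)", verbose_output)
        # N/A means that the value has not been set on the route.
        marks[key] = None if m is None or m.group(1) == "N/A" else m.group(1)
    return marks

def classifier_rules(classifier_output):
    """Extract qos-local-id and ip-precedence match values from a traffic class."""
    pattern = r"If-match (qos-local-id|ip-precedence) (\S+)"
    return dict(re.findall(pattern, classifier_output))

def classifier_consistent(marks, rules):
    """True if a QPPB rule exists and every rule equals the value on the route."""
    return bool(rules) and all(marks.get(k) == v for k, v in rules.items())

# Samples reduced from the example output in this section.
ROUTE = "Cost: 0 Preference: 255\nIpPre: N/A QosLocalID: 100\n"
CLASSIFIER = "Classifier: aaa (ID 100)\nOperator: AND\nRule(s) :\nIf-match qos-local-id 100\n"

marks = route_qos_marks(ROUTE)
rules = classifier_rules(CLASSIFIER)
print(marks, rules, classifier_consistent(marks, rules))
```

With the sample values, the route carries local QoS ID 100 and the traffic class matches local QoS ID 100, so the two are consistent. A route that shows N/A in both fields, or a traffic class with a different value, makes the check fail.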
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Ineffective QoS rate-limiting policy for user group
Symptom
In the network shown in Figure 122, interface A of the UP acts as the remote interface for the vBRAS-CP to connect online users, while interface B of the UP connects to the public network. Service traffic is forwarded from interface B of the UP to users connected to interface A. Downlink service traffic for all users on this interface requires a rate limit of 300 Mbps.
On the vBRAS-CP, create a traffic class that uses an ACL to match any user group and configure a traffic behavior to implement rate limiting. After associating the traffic class with the traffic behavior in the QoS policy, apply the QoS policy to the remote interface (interface A) to rate-limit the traffic for all users on interface A. Although the rate limit is set to 300 Mbps, the peak rate of the downlink service traffic reaches 500 Mbps, which indicates that rate limiting does not take effect.
Figure 122 Network diagram for the QoS rate limiting policy for user groups
Common causes
The following are the common causes of this type of issue:
· The traffic class configuration of the QoS policy is incorrect on the vBRAS-CP.
· The traffic behavior configuration in the QoS policy is incorrect on the vBRAS-CP.
· The QoS policy on the vBRAS-CP is applied incorrectly.
Troubleshooting flow
Figure 123 shows the troubleshooting flowchart.
Figure 123 Flowchart for troubleshooting the ineffective QoS rate-limiting policy for user group
Solution
1. In the network as shown in Figure 122, check the traffic class configuration in the QoS policy on the vBRAS-CP. Perform all the following tasks on the CTRL-VM of the vBRAS-CP.
Execute the display traffic classifier user-defined command on the vBRAS-CP to check the configuration of the traffic class. For example, if the Rule(s) field displays If-match acl 3001, the rule matches the packets with an advanced ACL.
<Sysname> display traffic classifier user-defined
User-defined classifier information:
Classifier: aaa (ID 103)
Operator: AND
Rule(s) :
If-match acl 3001
According to the ACL number, execute the display acl command on the vBRAS-CP to further identify whether the parameters in the ACL numbered 3001 match any user groups. For example, the ACL configuration is as follows:
<Sysname> display acl 3001
Advanced IPv4 ACL 3001, 1 rule,
ACL's step is 5
rule 5 permit ip user-group-any
If the ACL in the traffic class has configuration errors, delete the incorrect configuration and reconfigure the ACL to match any user groups.
If other matching parameters exist in the ACL of the traffic class, execute the undo rule command to delete the interfering parameters from the ACL. Alternatively, you can execute the undo rule command, and then execute the rule command to reconfigure the ACL to match any user groups.
Identify whether the logical relationship among various criteria, which is displayed in the Operator field, is correct. AND means that the criteria in this traffic class are ANDed. In this case, a packet must match all criteria to belong to this class. OR means that the criteria in this traffic class are ORed. In this case, a packet that matches any criterion belongs to this class. Set the operator as needed in the traffic class. In this example, if more than one match criterion is in the Rule(s) field and the Operator field displays AND, delete the criteria that are not needed.
If the traffic class in the QoS policy is configured correctly but the issue persists, proceed to the following steps.
2. Check the traffic behavior configuration in the QoS policy on the vBRAS-CP.
On the vBRAS-CP, execute the display traffic behavior user-defined command to check the configuration of the traffic behavior. For example, if the Committed Access Rate field displays CIR 300000 (kbps) in the command output, it means the rate limit is 300 Mbps for packets matching the traffic class. If the traffic behavior has configuration errors, execute the traffic behavior command to enter the view of the traffic behavior and execute the car command to modify the forwarding action of the traffic behavior.
<Sysname> display traffic behavior user-defined aaa
User-defined behavior information:
Behavior: aaa (ID 104)
Committed Access Rate:
CIR 300000 (kbps), CBS 18750000 (Bytes), EBS 0 (Bytes)
Green action : pass
Yellow action : pass
Red action : discard
If the traffic behavior in the QoS policy is configured correctly but the issue persists, proceed to the following steps.
3. Check the configuration of the QoS policy on the vBRAS-CP.
a. Execute the display qos policy command on the vBRAS-CP to check the configuration of the QoS policy. The Classifier field and Behavior field in the command output must correspond to the traffic class and traffic behavior verified in the previous steps. If the class-behavior association is incorrect, execute the qos policy command to enter the view of the QoS policy, and execute the classifier behavior command to modify the class-behavior association of the QoS policy. If the configuration is correct, proceed to step b.
<Sysname> display qos policy user-defined aaa
User-defined QoS policy information:
Policy: aaa (ID 104)
Classifier: aaa (ID 1)
Behavior: aaa
Committed Access Rate:
CIR 300000 (kbps), CBS 18750000 (Bytes), EBS 0 (Bytes)
Green action : pass
Yellow action : pass
Red action : discard
b. Execute the display qos policy interface command on the vBRAS-CP to check the configuration of the QoS policy applied to the interface. If an incorrect QoS policy is applied to the remote interface or the QoS policy is not applied to the outbound direction, execute the undo qos apply policy command on remote interface Remote-GE 1024/1/3/0 to remove the incorrect configuration, and then execute the qos apply policy command to apply the correct QoS policy.
<Sysname> display qos policy interface Remote-GE 1024/1/3/0
Interface: Remote-GE 1024/1/3/0
Direction: Outbound
Policy: aaa
Classifier: aaa
Matched : 231231 (Packets) 69348888 (Bytes)
Operator: AND
Rule(s) :
If-match acl 3001
Behavior: aaa
Committed Access Rate:
CIR 300000 (kbps), CBS 18750000 (Bytes), EBS 0 (Bytes)
Green action : pass
Yellow action : pass
Red action : discard
Green packets : 231231 (Packets) 69348888 (Bytes)
Yellow packets: 0 (Packets) 0 (Bytes)
Red packets : 0 (Packets) 0 (Bytes)
c. If no QoS policy is applied to the interface, execute the display qos policy global command to check the configuration of the QoS policy applied to the outbound direction of the specified UP globally. If the QoS policy applied to the specified UP globally is incorrect or the QoS policy is not applied to the outbound direction, execute the undo qos apply policy global command on the vBRAS-CP to remove the incorrect QoS policy, and then execute the qos apply policy global command to apply the correct QoS policy.
<Sysname> display qos policy global up-id 1024
Direction: Outbound
Policy: aaa
Classifier: default-class
Matched : 0 (Packets) 0 (Bytes)
Operator: AND
Rule(s) :
If-match any
Behavior: be
-none-
Classifier: aaa
Matched : 14 (Packets) 2260 (Bytes)
Operator: AND
Rule(s) :
If-match acl 3001
Behavior: aaa
Committed Access Rate:
CIR 300000 (kbps), CBS 18750000 (Bytes), EBS 0 (Bytes)
Green action : pass
Yellow action : pass
Red action : discard
Green packets : 0 (Packets) 0 (Bytes)
Yellow packets: 0 (Packets) 0 (Bytes)
Red packets : 0 (Packets) 0 (Bytes)
If the configuration of the preceding QoS policy is correct and the QoS policy is applied normally but the issue persists, proceed to the following steps.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration data and related log messages.
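The CAR values checked in step 2 can be verified with simple arithmetic. The following Python sketch is an illustration only. It converts the CIR shown in the example output to Mbps, computes how long the CBS lasts at that rate, and compares an observed peak rate with the limit. The function names are hypothetical, and the 500 Mbps figure is the peak rate stated in the symptom description.

```python
def cir_kbps_to_mbps(cir_kbps):
    """Convert a CIR in kbps to Mbps."""
    return cir_kbps / 1000

def burst_duration_ms(cir_kbps, cbs_bytes):
    """Time in milliseconds needed to send CBS bytes at the CIR."""
    return cbs_bytes * 8 / (cir_kbps * 1000) * 1000

def exceeds_limit(observed_mbps, cir_kbps):
    """True if the observed rate is above the configured CIR."""
    return observed_mbps > cir_kbps_to_mbps(cir_kbps)

CIR_KBPS = 300000      # from the example: CIR 300000 (kbps)
CBS_BYTES = 18750000   # from the example: CBS 18750000 (Bytes)

print(cir_kbps_to_mbps(CIR_KBPS))              # 300.0
print(burst_duration_ms(CIR_KBPS, CBS_BYTES))  # 500.0
print(exceeds_limit(500, CIR_KBPS))            # True
```

In the example, the CBS equals the amount of traffic sent in 500 ms at the CIR. An observed peak of 500 Mbps against a CIR of 300 Mbps confirms that the traffic is not being limited as configured.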
Related alarm and log messages
Alarm messages
N/A
Log messages
· QOS_POLICY_APPLYIF_CBFAIL
· QOS_POLICY_APPLYIF_FAIL
· QOS_POLICY_APPLYGLOBAL_CBFAIL
· QOS_POLICY_APPLYGLOBAL_FAIL
Troubleshooting IP tunneling and security VPN issues
IPsec issues
IKE negotiation triggering failures
Symptom
As shown in Figure 124, an IKE-based IPsec tunnel needs to be established between Device A and Device B to protect the private network traffic between Host A and Host B. The encapsulation mode for the IPsec tunnel is the tunnel mode. After completing the configuration on Device A and Device B, traffic fails to be forwarded between Host A and Host B.
After you execute the display ike sa command on Device A to view IKE SAs, no information is displayed.
<DeviceA> display ike sa
Connection-ID Local Remote Flag DOI
---------------------------------------------------------------
Flags:
RD--READY RL--REPLACED FD-FADING RK-REKEY
When you execute the display ike statistics command on Device A to view IKE statistics, no noticeable error is found.
<DeviceA> display ike statistics
IKE statistics:
No matching proposal: 0
Invalid ID information: 0
Unavailable certificate: 0
Unsupported DOI: 0
Unsupported situation: 0
Invalid proposal syntax: 0
Invalid SPI: 0
Invalid protocol ID: 0
Invalid certificate: 0
Authentication failure: 0
…
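When the statistics list is long, the non-zero counters can be extracted from captured output. The following Python sketch is a minimal illustration, assuming that the output of the display ike statistics command has been saved as text with one "name: value" pair per line as shown above. The function name nonzero_ike_counters is hypothetical.

```python
def nonzero_ike_counters(text):
    """Return the IKE statistics counters whose value is greater than 0."""
    counters = {}
    for line in text.splitlines():
        # Split on the last colon so that the counter name stays intact.
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit() and int(value) > 0:
            counters[name.strip()] = int(value)
    return counters

# Sample reduced from the example output above.
SAMPLE = """\
IKE statistics:
No matching proposal: 0
Invalid ID information: 0
Authentication failure: 0
"""

print(nonzero_ike_counters(SAMPLE))  # {}
```

An empty result corresponds to the symptom in this section: no IKE SA exists and no error counter has increased, which suggests that IKE negotiation was never triggered rather than that it failed.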
Common causes
The following are the common causes of this type of issue:
· A host cannot reach the corresponding IPsec gateway, or the IPsec gateways cannot reach each other.
· The configuration of the route from an IPsec gateway to the subnet where the peer host resides is incorrect.
· The configurations of the security policies between security zones are incorrect.
· The IPsec policy configurations are incorrect.
· The configurations of IKE profiles and IKE proposals are incorrect.
· The configurations of the protected data flows configured on the IPsec gateways are incorrect.
Troubleshooting flow
Figure 125 shows the troubleshooting flowchart.
Figure 125 Flowchart for troubleshooting IKE negotiation triggering failures
Solution
1. Check whether Host A and Host B can ping their respective IPsec gateways, and whether the IPsec gateways can ping each other:
Execute the ping command to check the network connectivity.
a. If the ping is unsuccessful, continue troubleshooting according to the procedures for troubleshooting ping failures in "Troubleshooting network management and monitoring issues." Make sure Host A and Host B can ping their respective IPsec gateways, and the IPsec gateways can ping each other.
b. If the issue persists, go to step 2.
2. Check whether the configuration of the route from each IPsec gateway to the subnet where the peer host resides is correct:
a. On each IPsec gateway, execute the display ip routing-table command to view the route information. Make sure a route to the subnet where the peer host resides exists on each IPsec gateway.
For example, the route information on Device A is as follows:
<DeviceA> display ip routing-table
Destinations : 1 Routes : 1
Destination/Mask Proto Pre Cost NextHop Interface
10.1.2.0/24 Static 60 0 2.2.2.2 GE2/0/2
The route information on Device B is as follows:
<DeviceB> display ip routing-table
Destinations : 1 Routes : 1
Destination/Mask Proto Pre Cost NextHop Interface
10.1.1.0/24 Static 60 0 2.2.3.2 GE2/0/2
b. If the route information is incorrect, configure the routes on Device A and Device B correctly as below:
<DeviceA> system-view
[DeviceA] ip route-static 10.1.2.0 24 2.2.2.2
<DeviceB> system-view
[DeviceB] ip route-static 10.1.1.0 24 2.2.3.2
c. If the issue persists, go to step 3.
3. Check whether the configurations of the security policies between the security zones are correct:
Check the security zone and security policy configurations on Device A. Make sure rules permitting traffic between the security zones have been configured in the security policies. If not, configure the security policies as follows:
a. Configure rules to permit traffic between the Untrust and Local security zones, so that the devices can establish an IPsec tunnel:
# Configure a rule named ipseclocalout to allow Device A to send IPsec negotiation packets to Device B.
[DeviceA] security-policy ip
[DeviceA-security-policy-ip] rule name ipseclocalout
[DeviceA-security-policy-ip-0-ipseclocalout] source-zone local
[DeviceA-security-policy-ip-0-ipseclocalout] destination-zone untrust
[DeviceA-security-policy-ip-0-ipseclocalout] source-ip-host 2.2.2.1
[DeviceA-security-policy-ip-0-ipseclocalout] destination-ip-host 2.2.3.1
[DeviceA-security-policy-ip-0-ipseclocalout] action pass
[DeviceA-security-policy-ip-0-ipseclocalout] quit
# Configure a rule named ipseclocalin to allow Device A to receive the IPsec negotiation packets sent from Device B.
[DeviceA-security-policy-ip] rule name ipseclocalin
[DeviceA-security-policy-ip-1-ipseclocalin] source-zone untrust
[DeviceA-security-policy-ip-1-ipseclocalin] destination-zone local
[DeviceA-security-policy-ip-1-ipseclocalin] source-ip-host 2.2.3.1
[DeviceA-security-policy-ip-1-ipseclocalin] destination-ip-host 2.2.2.1
[DeviceA-security-policy-ip-1-ipseclocalin] action pass
[DeviceA-security-policy-ip-1-ipseclocalin] quit
b. Configure rules to permit the traffic between Host A and Host B:
# Configure a rule named trust-untrust to permit the packets from Host A to Host B.
[DeviceA-security-policy-ip] rule name trust-untrust
[DeviceA-security-policy-ip-2-trust-untrust] source-zone trust
[DeviceA-security-policy-ip-2-trust-untrust] destination-zone untrust
[DeviceA-security-policy-ip-2-trust-untrust] source-ip-subnet 10.1.1.0 24
[DeviceA-security-policy-ip-2-trust-untrust] destination-ip-subnet 10.1.2.0 24
[DeviceA-security-policy-ip-2-trust-untrust] action pass
[DeviceA-security-policy-ip-2-trust-untrust] quit
# Configure a rule named untrust-trust to permit the packets from Host B to Host A.
[DeviceA-security-policy-ip] rule name untrust-trust
[DeviceA-security-policy-ip-3-untrust-trust] source-zone untrust
[DeviceA-security-policy-ip-3-untrust-trust] destination-zone trust
[DeviceA-security-policy-ip-3-untrust-trust] source-ip-subnet 10.1.2.0 24
[DeviceA-security-policy-ip-3-untrust-trust] destination-ip-subnet 10.1.1.0 24
[DeviceA-security-policy-ip-3-untrust-trust] action pass
[DeviceA-security-policy-ip-3-untrust-trust] quit
[DeviceA-security-policy-ip] quit
Check the security zone and security policy configurations on Device B. Make sure rules permitting traffic between the security zones have been configured in the security policies. If not, configure the security policies as follows:
a. Configure rules to permit traffic between the Untrust and Local security zones, so that the devices can establish an IPsec tunnel:
# Configure a rule named ipseclocalout to allow Device B to send IPsec negotiation packets to Device A.
[DeviceB] security-policy ip
[DeviceB-security-policy-ip] rule name ipseclocalout
[DeviceB-security-policy-ip-0-ipseclocalout] source-zone local
[DeviceB-security-policy-ip-0-ipseclocalout] destination-zone untrust
[DeviceB-security-policy-ip-0-ipseclocalout] source-ip-host 2.2.3.1
[DeviceB-security-policy-ip-0-ipseclocalout] destination-ip-host 2.2.2.1
[DeviceB-security-policy-ip-0-ipseclocalout] action pass
[DeviceB-security-policy-ip-0-ipseclocalout] quit
# Configure a rule named ipseclocalin to allow Device B to receive the IPsec negotiation packets sent from Device A.
[DeviceB-security-policy-ip] rule name ipseclocalin
[DeviceB-security-policy-ip-1-ipseclocalin] source-zone untrust
[DeviceB-security-policy-ip-1-ipseclocalin] destination-zone local
[DeviceB-security-policy-ip-1-ipseclocalin] source-ip-host 2.2.2.1
[DeviceB-security-policy-ip-1-ipseclocalin] destination-ip-host 2.2.3.1
[DeviceB-security-policy-ip-1-ipseclocalin] action pass
[DeviceB-security-policy-ip-1-ipseclocalin] quit
b. Configure rules to permit traffic between Host B and Host A:
# Configure a rule named trust-untrust to permit the packets from Host B to Host A.
[DeviceB-security-policy-ip] rule name trust-untrust
[DeviceB-security-policy-ip-2-trust-untrust] source-zone trust
[DeviceB-security-policy-ip-2-trust-untrust] destination-zone untrust
[DeviceB-security-policy-ip-2-trust-untrust] source-ip-subnet 10.1.2.0 24
[DeviceB-security-policy-ip-2-trust-untrust] destination-ip-subnet 10.1.1.0 24
[DeviceB-security-policy-ip-2-trust-untrust] action pass
[DeviceB-security-policy-ip-2-trust-untrust] quit
# Configure a rule named untrust-trust to permit the packets from Host A to Host B.
[DeviceB-security-policy-ip] rule name untrust-trust
[DeviceB-security-policy-ip-3-untrust-trust] source-zone untrust
[DeviceB-security-policy-ip-3-untrust-trust] destination-zone trust
[DeviceB-security-policy-ip-3-untrust-trust] source-ip-subnet 10.1.1.0 24
[DeviceB-security-policy-ip-3-untrust-trust] destination-ip-subnet 10.1.2.0 24
[DeviceB-security-policy-ip-3-untrust-trust] action pass
[DeviceB-security-policy-ip-3-untrust-trust] quit
[DeviceB-security-policy-ip] quit
For more information, see security policy issues in "Troubleshooting security issues."
If the issue persists, go to step 4.
4. Check whether the IPsec policy configurations are correct:
a. Execute the display ipsec policy command on the local IPsec gateway Device A. View the peer address, displayed in the Remote address field, that has been configured in the corresponding IPsec policy.
[DeviceA] display ipsec policy
-----------------------------
IPsec Policy: mypolicy
-----------------------------
Sequence number: 2
Alias: hub1-spoke2
Mode: ISAKMP
-----------------------------
Description: This is my complete policy
Traffic Flow Confidentiality: Enabled
Security data flow: 3002
Selector mode: standard
Local address: 2.2.2.1
Remote address: 2.2.3.1
Remote address:
Remote address switchback mode: Enabled
Transform set: completetransform
b. Execute the display ipsec policy command on the peer IPsec gateway Device B. View the address displayed in the Local address field, which is either the local address configured in the corresponding IPsec policy, or the address of interface applying the IPsec policy (if no local address is configured).
[DeviceB] display ipsec policy
-----------------------------
IPsec Policy: mypolicy
-----------------------------
Sequence number: 2
Alias: hub1-spoke2
Mode: ISAKMP
-----------------------------
Description: This is my complete policy
Traffic Flow Confidentiality: Enabled
Security data flow: 3002
Selector mode: standard
Local address: 2.2.3.1
Remote address: 2.2.2.1
Remote address:
Remote address switchback mode: Enabled
Transform set: completetransform
c. Verify that the address in the Remote address field on Device A is the same as the address in the Local address field on Device B.
d. If the issue persists, go to step 5.
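The comparison in step 4 can be scripted when the outputs are saved as text. The following Python sketch is only an illustration: the function names are hypothetical, and it assumes the field labels appear exactly as in the sample output above. It takes the first non-empty Remote address on the local gateway and compares it with the Local address on the peer.

```python
import re

# Matches "Local address: x" or "Remote address: x". Lines with an empty value
# and the "Remote address switchback mode" line do not match.
_ADDR = re.compile(r"^\s*(Local address|Remote address):\s*(\S+)\s*$")

def policy_addresses(output):
    """Return the first non-empty Local/Remote address found in the output."""
    found = {}
    for line in output.splitlines():
        m = _ADDR.match(line)
        if m and m.group(1) not in found:
            found[m.group(1)] = m.group(2)
    return found

def remote_matches_peer_local(local_output, peer_output):
    """True if Remote address on the local gateway equals Local address on the peer."""
    remote = policy_addresses(local_output).get("Remote address")
    peer_local = policy_addresses(peer_output).get("Local address")
    return remote is not None and remote == peer_local

# Excerpts of the display ipsec policy output shown in this step.
DEVICE_A = """\
Local address: 2.2.2.1
Remote address: 2.2.3.1
Remote address:
Remote address switchback mode: Enabled
"""
DEVICE_B = """\
Local address: 2.2.3.1
Remote address: 2.2.2.1
Remote address:
Remote address switchback mode: Enabled
"""

print(remote_matches_peer_local(DEVICE_A, DEVICE_B))
```

If the function returns False, recheck the remote address in the IPsec policy on Device A and the local address (or the address of the interface applying the policy) on Device B.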
5. Check whether the configurations of IKE profiles and IKE proposals are correct:
a. Check the IKE profile configuration on each device. Verify that the local and peer IPsec gateway addresses are configured correctly. If preshared key authentication is used, the preshared keys configured (using the pre-shared-key command) on the local and peer ends must be the same. If RSA signature or digital envelope authentication is used, make sure the digital certificate is within the validity period (which can be viewed by the display pki certificate domain command).
For example, the IKE profile configuration on Device A is as follows:
[DeviceA] ike keychain keychain1
[DeviceA-ike-keychain-keychain1] pre-shared-key address 2.2.3.1 255.255.255.0 key simple 123456TESTplat&!
[DeviceA-ike-keychain-keychain1] quit
[DeviceA] ike profile profile1
[DeviceA-ike-profile-profile1] keychain keychain1
[DeviceA-ike-profile-profile1] local-identity address 2.2.2.1
[DeviceA-ike-profile-profile1] match remote identity address 2.2.3.1 255.255.255.0
[DeviceA-ike-profile-profile1] quit
The IKE profile configuration on Device B is as follows:
[DeviceB] ike keychain keychain1
[DeviceB-ike-keychain-keychain1] pre-shared-key address 2.2.2.1 255.255.255.0 key simple 123456TESTplat&!
[DeviceB-ike-keychain-keychain1] quit
[DeviceB] ike profile profile1
[DeviceB-ike-profile-profile1] keychain keychain1
[DeviceB-ike-profile-profile1] local-identity address 2.2.3.1
[DeviceB-ike-profile-profile1] match remote identity address 2.2.2.1 255.255.255.0
[DeviceB-ike-profile-profile1] quit
b. Execute the display ike proposal command on Device A and Device B, and verify that the IKE proposal parameters are consistent on both devices, as shown below:
[DeviceA] display ike proposal
Priority Authentication Authentication Encryption Diffie-Hellman Duration
method algorithm algorithm group (seconds)
----------------------------------------------------------------------------
default PRE-SHARED-KEY SHA1 DES-CBC Group 1 86400
[DeviceB] display ike proposal
Priority Authentication Authentication Encryption Diffie-Hellman Duration
method algorithm algorithm group (seconds)
----------------------------------------------------------------------------
default PRE-SHARED-KEY SHA1 DES-CBC Group 1 86400
c. If the issue persists, go to step 6.
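When many IKE proposals are configured, comparing the two outputs by eye is error-prone. The Python sketch below is a hedged illustration with hypothetical function names. It assumes each data row follows the column order of the sample output (priority, authentication method, authentication algorithm, encryption algorithm, DH group, duration). Duration is parsed but left out of the comparison; that choice is an assumption of this sketch, so consult the IKE configuration guide for the fields that must match on your software version.

```python
KEYS = ("auth_method", "auth_algorithm", "encryption", "dh_group")

def parse_ike_proposals(output):
    """Parse the rows below the dashed separator of display ike proposal output."""
    proposals = []
    in_rows = False
    for line in output.splitlines():
        if set(line.strip()) == {"-"}:
            in_rows = True
            continue
        tokens = line.split()
        if in_rows and len(tokens) >= 6:
            proposals.append({
                "priority": tokens[0],
                "auth_method": tokens[1],
                "auth_algorithm": tokens[2],
                "encryption": tokens[3],
                "dh_group": " ".join(tokens[4:-1]),  # for example "Group 1"
                "duration": int(tokens[-1]),
            })
    return proposals

def common_proposals(output_a, output_b):
    """Return the parameter combinations that appear on both devices."""
    a = {tuple(p[k] for k in KEYS) for p in parse_ike_proposals(output_a)}
    b = {tuple(p[k] for k in KEYS) for p in parse_ike_proposals(output_b)}
    return a & b

SAMPLE = """\
Priority Authentication Authentication Encryption Diffie-Hellman Duration
method algorithm algorithm group (seconds)
----------------------------------------------------------------------------
default PRE-SHARED-KEY SHA1 DES-CBC Group 1 86400
"""

print(common_proposals(SAMPLE, SAMPLE))
```

An empty result suggests that the two gateways share no proposal with the same parameter combination.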
6. Check whether the configuration of the data flow to be protected on each IPsec gateway is correct:
a. On Device A, execute the display ipsec policy command to view the ACL used by the IPsec policy (displayed in the Security data flow field).
[DeviceA] display ipsec policy
-----------------------------
IPsec Policy: mypolicy
-----------------------------
Sequence number: 2
Alias: hub1-spoke2
Mode: ISAKMP
-----------------------------
Description: This is my complete policy
Traffic Flow Confidentiality: Enabled
Security data flow: 3002
Then, on Device A, execute the display acl command to check whether the rule information of ACL 3002 is consistent with the scope of data flows to be protected.
[DeviceA] display acl 3002
Advanced IPv4 ACL 3002, 1 rule,
ACL's step is 5
rule 0 permit ip source 10.1.1.0 0.0.0.255 destination 10.1.2.0 0.0.0.255
If the configuration is incorrect, configure an IPv4 advanced ACL that correctly identifies the data flows from the subnet where Host A resides to the subnet where Host B resides.
[DeviceA] acl advanced 3002
[DeviceA-acl-ipv4-adv-3002] rule 0 permit ip source 10.1.1.0 0.0.0.255 destination 10.1.2.0 0.0.0.255
[DeviceA-acl-ipv4-adv-3002] quit
[DeviceA] ipsec policy policy2 1 isakmp
[DeviceA-ipsec-policy-isakmp-policy2-1] security acl 3002 aggregation
b. On Device B, execute the display ipsec policy command to view the ACL used by the IPsec policy (displayed in the Security data flow field).
[DeviceB] display ipsec policy
-----------------------------
IPsec Policy: mypolicy
-----------------------------
Sequence number: 2
Alias: hub1-spoke2
Mode: ISAKMP
-----------------------------
Description: This is my complete policy
Traffic Flow Confidentiality: Enabled
Security data flow: 3002
Then, on Device B, execute the display acl command to check whether the rule information of ACL 3002 is consistent with the scope of data flows to be protected.
[DeviceB] display acl 3002
Advanced IPv4 ACL 3002, 1 rule,
ACL's step is 5
rule 0 permit ip source 10.1.2.0 0.0.0.255 destination 10.1.1.0 0.0.0.255
If the configuration is incorrect, configure an IPv4 advanced ACL that correctly identifies the data flows from the subnet where Host B resides to the subnet where Host A resides.
[DeviceB] acl advanced 3002
[DeviceB-acl-ipv4-adv-3002] rule 0 permit ip source 10.1.2.0 0.0.0.255 destination 10.1.1.0 0.0.0.255
[DeviceB-acl-ipv4-adv-3002] quit
[DeviceB] ipsec policy policy2 1 isakmp
[DeviceB-ipsec-policy-isakmp-policy2-1] security acl 3002 aggregation
c. If the issue persists, go to step 7.
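In the examples in step 6, the ACL rule on Device B is the mirror image of the rule on Device A (source and destination swapped). The following Python sketch checks that relationship for two rule lines. It is a minimal illustration with hypothetical function names. It handles only permit ip rules written with contiguous wildcard masks, as in the examples above.

```python
import ipaddress
import re

_RULE = re.compile(
    r"rule\s+\d+\s+permit\s+ip\s+source\s+(\S+)\s+(\S+)\s+destination\s+(\S+)\s+(\S+)"
)

def parse_rule(rule):
    """Return (source network, destination network) for one ACL rule line."""
    m = _RULE.search(rule)
    if m is None:
        raise ValueError(f"unrecognized rule: {rule!r}")
    # ipaddress accepts a host (wildcard) mask such as 0.0.0.255 after the slash.
    src = ipaddress.ip_network(f"{m.group(1)}/{m.group(2)}")
    dst = ipaddress.ip_network(f"{m.group(3)}/{m.group(4)}")
    return src, dst

def rules_mirrored(rule_a, rule_b):
    """True if rule_b swaps the source and destination of rule_a."""
    src_a, dst_a = parse_rule(rule_a)
    src_b, dst_b = parse_rule(rule_b)
    return src_a == dst_b and dst_a == src_b

RULE_A = "rule 0 permit ip source 10.1.1.0 0.0.0.255 destination 10.1.2.0 0.0.0.255"
RULE_B = "rule 0 permit ip source 10.1.2.0 0.0.0.255 destination 10.1.1.0 0.0.0.255"

print(rules_mirrored(RULE_A, RULE_B))
```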
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failures in triggering IKE negotiations (using an IPsec profile)
Symptom
As shown in Figure 126, an IKE-based IPsec tunnel needs to be established between Device A and Device B to protect the private network traffic between Host A and Host B. The encapsulation mode for the IPsec tunnel is the tunnel mode. After completing the configuration on Device A and Device B, traffic fails to be forwarded between Host A and Host B.
If no information is displayed after you execute the display ike sa command on Device A, the phase-1 IKE negotiation was unsuccessful. If RD is displayed in the Flag field of the display ike sa command output but no information is displayed after you execute the display ipsec sa command, the phase-1 negotiation succeeded but the phase-2 IKE negotiation was unsuccessful.
<DeviceA> display ike sa
Connection-ID Local Remote Flag DOI
---------------------------------------------------------------
Flags:
RD--READY RL--REPLACED FD-FADING RK-REKEY
<DeviceA> display ipsec sa
<DeviceA>
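The way these two outputs are read together can be summarized in a short Python sketch. It is an illustration only: the function names and the returned strings are hypothetical, the populated SA row is made up for the example, and the parser assumes that SA rows appear between the dashed separator and the Flags legend.

```python
def ike_sa_rows(ike_sa_output):
    """Return the SA rows between the dashed separator and the Flags legend."""
    rows, in_rows = [], False
    for line in ike_sa_output.splitlines():
        stripped = line.strip()
        if stripped and set(stripped) == {"-"}:
            in_rows = True
        elif stripped.startswith("Flags:"):
            break
        elif in_rows and stripped:
            rows.append(stripped.split())
    return rows

def negotiation_state(ike_sa_output, ipsec_sa_output):
    """Classify the tunnel state from display ike sa and display ipsec sa output."""
    rows = ike_sa_rows(ike_sa_output)
    if not rows:
        return "phase-1 failed"
    if not any("RD" in row for row in rows):
        return "phase-1 incomplete"
    if not ipsec_sa_output.strip():
        return "phase-2 failed"
    return "SAs present"

EMPTY_IKE_SA = """\
Connection-ID Local Remote Flag DOI
---------------------------------------------------------------
Flags:
RD--READY RL--REPLACED FD-FADING RK-REKEY
"""

# Hypothetical populated output, invented for illustration only.
READY_IKE_SA = """\
Connection-ID Local Remote Flag DOI
---------------------------------------------------------------
1 2.2.2.1 2.2.3.1 RD IPsec
Flags:
RD--READY RL--REPLACED FD-FADING RK-REKEY
"""

print(negotiation_state(EMPTY_IKE_SA, ""))
print(negotiation_state(READY_IKE_SA, ""))
```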
When you execute the display ike statistics command on Device A to view IKE statistics, no noticeable error is found.
<DeviceA> display ike statistics
IKE statistics:
No matching proposal: 0
Invalid ID information: 0
Unavailable certificate: 0
Unsupported DOI: 0
Unsupported situation: 0
Invalid proposal syntax: 0
Invalid SPI: 0
Invalid protocol ID: 0
Invalid certificate: 0
Authentication failure: 0
…
After you execute the display ipsec statistics command on Device A to view IPsec statistics, no noticeable error is found.
<DeviceA> display ipsec statistics
IPsec packet statistics:
Received/sent packets: 0/0
Received/sent bytes: 0/0
Received/sent packet rate: 0/0 packets/sec
Received/sent byte rate: 0/0 bytes/sec
Dropped packets (received/sent): 0/0
Dropped packets statistics
No available SA: 0
Wrong SA: 0
Invalid length: 0
Authentication failure: 0
Encapsulation failure: 0
Decapsulation failure: 0
Replayed packets: 0
ACL check failure: 0
MTU check failure: 0
Loopback limit exceeded: 0
Crypto speed limit exceeded: 0
Common causes
The following are the common causes of this type of issue:
· The route between IPsec gateways is unreachable.
· The IPsec profile configuration is incorrect.
· The configurations of the IKE profiles and IKE proposals are incorrect.
Troubleshooting flow
Figure 127 shows the troubleshooting flowchart.
Solution
1. Check whether the IPsec gateways can ping each other:
Use the ping command to check the network connectivity.
a. If the ping is unsuccessful, continue troubleshooting according to the procedures for troubleshooting ping failures in network management and monitoring troubleshooting guide. Make sure Host A and Host B can ping their respective IPsec gateways, and the IPsec gateways can ping each other.
b. If the issue persists, go to step 2.
2. Check whether the IPsec profile configurations are correct:
a. Execute the display ipsec profile command to check whether the configurations on the local IPsec gateway Device A and the peer IPsec gateway Device B are complete. Verify that both a transform set and an IKE profile have been configured on each device. Then execute the display ipsec transform-set command and make sure the transform sets on the two devices use the same encryption algorithm, authentication algorithm, and PFS settings.
For example, the output on Device A is as follows:
[DeviceA] display ipsec profile
-------------------------------------------
IPsec profile: myprofile
Alias: ccc
Mode: isakmp
-------------------------------------------
Transform set: tran1
IKE profile: profile
SA duration(time based): 3600 seconds
SA duration(traffic based): 1843200 kilobytes
SA soft-duration buffer(time based): 1000 seconds
SA soft-duration buffer(traffic based): 43200 kilobytes
SA idle time: 100 seconds
[DeviceA] display ipsec transform-set
IPsec transform set: tran1
State: complete
Encapsulation mode: tunnel
ESN: Enabled
PFS:
Transform: AH-ESP
AH protocol:
Integrity: SHA1
ESP protocol:
Integrity: SHA1
Encryption: AES-CBC-128
The output on Device B is as follows:
[DeviceB] display ipsec profile
-------------------------------------------
IPsec profile: myprofile
Alias: ddd
Mode: isakmp
-------------------------------------------
Transform set: tran1
IKE profile: profile
SA duration(time based): 3600 seconds
SA duration(traffic based): 1843200 kilobytes
SA soft-duration buffer(time based): 1000 seconds
SA soft-duration buffer(traffic based): 43200 kilobytes
SA idle time: 100 seconds
[DeviceB] display ipsec transform-set
IPsec transform set: tran1
State: complete
Encapsulation mode: tunnel
ESN: Enabled
PFS:
Transform: AH-ESP
AH protocol:
Integrity: SHA1
ESP protocol:
Integrity: SHA1
Encryption: AES-CBC-128
b. If the issue persists, go to step 3.
3. Check whether the IPsec profiles are correctly configured on the tunnel interfaces.
a. Execute the interface tunnel command on the IPsec gateway Device A to enter tunnel interface Tunnel 1. Execute the display this command to check whether the local and peer addresses and the IPsec profile are configured correctly on the tunnel interface.
[DeviceA] interface tunnel 1
[DeviceA-Tunnel1] display this
#
interface Tunnel1 mode ipsec
ip address 3.3.3.1 255.255.255.0
source 2.2.2.1
destination 2.2.3.1
tunnel protection ipsec profile myprofile
[DeviceA-Tunnel1] quit
If configuration errors exist, modify the configuration as follows:
[DeviceA] interface tunnel 1 mode ipsec
[DeviceA-Tunnel1] ip address 3.3.3.1 255.255.255.0
[DeviceA-Tunnel1] source 2.2.2.1
[DeviceA-Tunnel1] destination 2.2.3.1
[DeviceA-Tunnel1] tunnel protection ipsec profile myprofile
[DeviceA-Tunnel1] quit
b. Execute the interface tunnel command on the IPsec gateway Device B to enter tunnel interface Tunnel 1. Execute the display this command to check whether the local and peer addresses and the IPsec profile are configured correctly on the tunnel interface.
[DeviceB] interface tunnel 1
[DeviceB-Tunnel1] display this
#
interface Tunnel1 mode ipsec
ip address 3.3.3.2 255.255.255.0
source 2.2.3.1
destination 2.2.2.1
tunnel protection ipsec profile myprofile
[DeviceB-Tunnel1] quit
If configuration errors exist, modify the configuration as follows:
[DeviceB] interface tunnel 1 mode ipsec
[DeviceB-Tunnel1] ip address 3.3.3.2 255.255.255.0
[DeviceB-Tunnel1] source 2.2.3.1
[DeviceB-Tunnel1] destination 2.2.2.1
[DeviceB-Tunnel1] tunnel protection ipsec profile myprofile
[DeviceB-Tunnel1] quit
c. If the issue persists, go to step 4.
4. Check whether the IKE profile and IKE proposal configurations are correct.
a. Check the IKE profile configuration on each device. Verify that the local and peer IPsec gateway addresses are configured correctly. If preshared key authentication is used, the preshared keys configured (using the pre-shared-key command) on the local and peer ends must be the same. If RSA signature or digital envelope authentication is used, make sure the digital certificate is within the validity period (displayed in the Validity field of the output for the display pki certificate domain command).
For example, the IKE profile configuration on Device A is as follows:
[DeviceA] ike keychain keychain1
[DeviceA-ike-keychain-keychain1] pre-shared-key address 2.2.3.1 255.255.255.0 key simple 123456TESTplat&!
[DeviceA-ike-keychain-keychain1] quit
[DeviceA] ike profile profile
[DeviceA-ike-profile-profile] keychain keychain1
[DeviceA-ike-profile-profile] local-identity address 2.2.2.1
[DeviceA-ike-profile-profile] match remote identity address 2.2.3.1 255.255.255.0
[DeviceA-ike-profile-profile] quit
The IKE profile configuration on Device B is as follows:
[DeviceB] ike keychain keychain1
[DeviceB-ike-keychain-keychain1] pre-shared-key address 2.2.2.1 255.255.255.0 key simple 123456TESTplat&!
[DeviceB-ike-keychain-keychain1] quit
[DeviceB] ike profile profile
[DeviceB-ike-profile-profile] keychain keychain1
[DeviceB-ike-profile-profile] local-identity address 2.2.3.1
[DeviceB-ike-profile-profile] match remote identity address 2.2.2.1 255.255.255.0
[DeviceB-ike-profile-profile] quit
b. Execute the display ike proposal command on IPsec gateways Device A and Device B respectively to view the IKE proposal configurations. Verify that the IKE proposal configurations are consistent.
[DeviceA] display ike proposal
Priority Authentication Authentication Encryption Diffie-Hellman Duration
method algorithm algorithm group (seconds)
----------------------------------------------------------------------------
default PRE-SHARED-KEY SHA1 DES-CBC Group 1 86400
[DeviceB] display ike proposal
Priority Authentication Authentication Encryption Diffie-Hellman Duration
method algorithm algorithm group (seconds)
----------------------------------------------------------------------------
default PRE-SHARED-KEY SHA1 DES-CBC Group 1 86400
c. If the issue persists, go to step 5.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Collected information related to establishment of the IPsec tunnel after you execute the debugging commands as follows.
<DeviceA> terminal debugging
The current terminal is enabled to display debugging logs.
<DeviceA> terminal monitor
The current terminal is enabled to display logs.
<DeviceA> debugging ike all
<DeviceA> debugging ipsec all
IP tunneling issues
Failure in pinging the IP address of the remote tunnel interface from the local tunnel interface for a P2P tunnel
Symptom
After you configure a P2P tunnel (for example, a GRE, IPv4, or IPv6 tunnel), you cannot ping the IP address of the remote tunnel interface from the IP address of the local tunnel interface.
This section uses a GRE/IPv4 tunnel to describe the troubleshooting procedure.
NOTE: The troubleshooting procedure in this section is not applicable to P2MP tunnels like DS-Lite and GRE P2MP tunnels.
Common causes
The following are the common causes of this type of issue:
· Configuration errors. For example, the tunnel modes at the two ends of the tunnel are inconsistent, or a tunnel interface has no source or destination address configured. Another example is that the source and destination addresses at one end are not the destination and source addresses at the other end, respectively.
· Physical link disconnectivity. The tunnel interface at each end cannot come up because no routes exist between the source and destination addresses of the tunnel. Another case is that the routes for the physical links that the tunnel relies on are all down. In this case, the intermediate devices drop tunneled packets even if the tunnel interfaces at both ends are up.
Troubleshooting flow
Figure 128 shows the troubleshooting flowchart.
Solution
1. Verify that the tunnel interface configuration is complete on both ends of the tunnel.
Execute the display current-configuration interface tunnel command on both ends of the tunnel to display the tunnel interface configuration. Make sure the tunnel source address, tunnel destination address, and IP address of the tunnel interface have all been configured on each end.
<Sysname> display current-configuration interface tunnel 1
#
interface Tunnel1 mode gre
ip address 10.1.1.1 255.255.255.0
source 1.1.1.1
destination 1.1.1.2
#
If the configuration of the tunnel interface on one end is incomplete, add the missing configuration. The following information provides an example of the tunnel interface configuration:
<Sysname> system-view
[Sysname] interface tunnel 1 mode gre
[Sysname-Tunnel1] ip address 10.1.1.1 255.255.255.0
[Sysname-Tunnel1] source 1.1.1.1
[Sysname-Tunnel1] destination 1.1.1.2
2. Verify that the encapsulation modes at both ends of the tunnel are the same.
On each end, execute the display current-configuration interface tunnel command to display the encapsulation mode of the tunnel interface.
<Sysname> display current-configuration interface tunnel 1
#
interface Tunnel1 mode gre
ip address 10.1.1.1 255.255.255.0
source 1.1.1.1
destination 1.1.1.2
#
If the encapsulation modes at both ends are inconsistent, you must first execute the undo interface tunnel command to delete the tunnel interface with an incorrect mode, and then execute the interface tunnel command to re-create the tunnel interface. Deleting a tunnel interface also deletes the configuration on that tunnel interface. You must reconfigure the tunnel source address, tunnel destination address, and IP address of the tunnel interface after the tunnel interface is re-created.
3. Verify that the source and destination addresses at one end of the tunnel are the destination and source addresses at the other end of the tunnel, respectively.
On each end, execute the display current-configuration interface tunnel command to display the tunnel interface configuration. Make sure the tunnel source address on the local end is the tunnel destination address on the remote end and the tunnel destination address on the local end is the tunnel source address on the remote end. In addition, the tunnel source address on each end must be a local address.
Local end:
<Sysname> display current-configuration interface tunnel 1
#
interface Tunnel1 mode gre
ip address 10.1.1.1 255.255.255.0
source 1.1.1.1
destination 1.1.1.2
#
Remote end:
<Sysname> display current-configuration interface tunnel 1
#
interface Tunnel1 mode gre
ip address 10.1.1.2 255.255.255.0
source 1.1.1.2
destination 1.1.1.1
#
If the tunnel source or destination address on one end is incorrectly configured, execute the source or destination command in tunnel interface view to reconfigure the tunnel source or destination address.
4. Verify that the GRE keys at both ends of the tunnel are identical.
You must either configure the same GRE key at both ends of a GRE tunnel or configure no GRE key at either end. To check the GRE key configuration, execute the display current-configuration interface tunnel command on both ends.
Local end:
#
interface Tunnel1 mode gre
ip address 10.1.1.1 255.255.255.0
source 1.1.1.1
destination 1.1.1.2
gre key 123
#
Remote end:
#
interface Tunnel1 mode gre
ip address 10.1.1.2 255.255.255.0
source 1.1.1.2
destination 1.1.1.1
gre key 123
#
If the GRE keys configured on both ends of the tunnel are different, execute the gre key command in tunnel interface view to configure the same GRE key on both ends.
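The checks in steps 1 through 4 (complete configuration, same mode, mirrored source and destination addresses, identical GRE keys) can be combined into one comparison of the two saved configurations. The Python sketch below is a minimal illustration; the function names and the wording of the returned messages are hypothetical, and it assumes the configuration lines look like the display current-configuration interface tunnel output shown above.

```python
def parse_tunnel(config):
    """Extract mode, source, destination, and GRE key from a tunnel interface config."""
    info = {"mode": None, "source": None, "destination": None, "gre_key": None}
    for line in config.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "interface" and "mode" in tokens:
            info["mode"] = " ".join(tokens[tokens.index("mode") + 1:])
        elif tokens[0] in ("source", "destination") and len(tokens) >= 2:
            info[tokens[0]] = tokens[1]
        elif tokens[:2] == ["gre", "key"] and len(tokens) >= 3:
            info["gre_key"] = tokens[2]
    return info

def tunnel_mismatches(local_config, remote_config):
    """Return a list of inconsistencies between the two tunnel ends."""
    local, remote = parse_tunnel(local_config), parse_tunnel(remote_config)
    problems = []
    for end, cfg in (("local", local), ("remote", remote)):
        for field in ("source", "destination"):
            if cfg[field] is None:
                problems.append(f"{end} end has no tunnel {field}")
    if local["mode"] != remote["mode"]:
        problems.append("tunnel modes differ")
    if (local["source"] != remote["destination"]
            or local["destination"] != remote["source"]):
        problems.append("source/destination addresses are not mirrored")
    if local["gre_key"] != remote["gre_key"]:
        problems.append("GRE keys differ")
    return problems

LOCAL = """\
interface Tunnel1 mode gre
ip address 10.1.1.1 255.255.255.0
source 1.1.1.1
destination 1.1.1.2
gre key 123
"""
REMOTE = """\
interface Tunnel1 mode gre
ip address 10.1.1.2 255.255.255.0
source 1.1.1.2
destination 1.1.1.1
gre key 123
"""

print(tunnel_mismatches(LOCAL, REMOTE))
```

An empty list means that none of the four checks found a mismatch. The sketch does not verify that the tunnel source address is a local address on each device.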
5. Verify that the tunnel interfaces at both ends are already up.
Execute the display interface tunnel command to display the tunnel interface state. If the tunnel interface on one end is still down after you perform steps 1 and 2, you can continue to use the procedure for troubleshooting tunnel interface instability.
<Sysname> display interface tunnel 1
Tunnel1
Current state: UP
Line protocol state: UP
Description: Tunnel1 Interface
Bandwidth: 64kbps
Maximum transmission unit: 1476
Internet address: 10.1.2.1/24 (primary)
Tunnel source 2002::1:1 (Vlan-interface10), destination 2001::2:1
Tunnel TOS 0xC8, Tunnel TTL 255
Tunnel protocol/transport GRE/IPv6
...
6. Verify that the source and destination IP addresses of the tunnel have routes to reach each other.
Execute the display current-configuration interface tunnel command to identify whether the IP addresses of the tunnel interfaces at both ends of the tunnel belong to the same subnet. If they belong to the same subnet, the two ends will generate subnet routes by default. In this case, no physical link disconnectivity issue exists. If they do not belong to the same subnet, execute the display fib command to identify whether the source and destination IP addresses of the tunnel have routes to reach each other. If no routes are available, you must configure static or dynamic routes to make sure the source and destination IP addresses of the tunnel have routes to reach each other. If the issue persists, proceed to step 7.
<Sysname> display fib
Route destination count: 4
Directly-connected host count: 0
Flag:
U:Useable G:Gateway H:Host B:Blackhole D:Dynamic S:Static
R:Relay F:FRR
Destination/Mask Nexthop Flag OutInterface/Token Label
0.0.0.0/32 127.0.0.1 UH InLoop0 Null
1.1.1.2/24 192.168.126.1 USGF M-GE0/0/0 Null
127.0.0.0/8 127.0.0.1 U InLoop0 Null
127.0.0.0/32 127.0.0.1 UH InLoop0 Null
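To find out which FIB entry, if any, covers a tunnel address, you can run a longest-prefix match over the saved display fib output. The following Python sketch is a hedged illustration with hypothetical function names. It assumes the column order of the sample output above (destination/mask, next hop, flag, output interface) and handles IPv4 entries only.

```python
import ipaddress

def parse_fib(output):
    """Return (network, next hop, output interface) tuples from display fib output."""
    entries = []
    for line in output.splitlines():
        tokens = line.split()
        if len(tokens) < 4 or "/" not in tokens[0]:
            continue
        try:
            # strict=False tolerates entries such as 1.1.1.2/24 with host bits set.
            network = ipaddress.ip_network(tokens[0], strict=False)
        except ValueError:
            continue  # header line or other non-route text
        entries.append((network, tokens[1], tokens[3]))
    return entries

def best_route(entries, address):
    """Return the longest-prefix entry covering the address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [entry for entry in entries if addr in entry[0]]
    return max(matches, key=lambda entry: entry[0].prefixlen, default=None)

FIB = """\
Destination/Mask Nexthop Flag OutInterface/Token Label
0.0.0.0/32 127.0.0.1 UH InLoop0 Null
1.1.1.2/24 192.168.126.1 USGF M-GE0/0/0 Null
127.0.0.0/8 127.0.0.1 U InLoop0 Null
127.0.0.0/32 127.0.0.1 UH InLoop0 Null
"""

entries = parse_fib(FIB)
print(best_route(entries, "1.1.1.2"))
print(best_route(entries, "9.9.9.9"))
```

A result of None for the tunnel destination address indicates that no route covers it in the saved output, which matches the case in step 6 where you must configure a static or dynamic route.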
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Command output from the debugging commands in Table 18.
Command | Description
debugging tunnel | Enable tunneling debugging.
debugging gre | Enable GRE debugging.
debugging ip packet [ acl acl-number ] | Enable IP packet debugging.
debugging ipv6 packet [ acl acl-number ] | Enable IPv6 packet debugging.
debugging ip error | Enable IP forwarding error debugging.
debugging ip info [ acl acl-number ] | Enable IP forwarding debugging.
debugging ipv6 info [ acl acl-number ] | Enable IPv6 forwarding debugging.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting user access and authentication issues
802.1X issues
802.1X user authentication failure
Symptom
A user fails 802.1X authentication or an exception occurs during 802.1X authentication.
Common causes
The following are the common causes of this type of issue:
· 802.1X is not enabled globally or on the interface that the user accesses.
· The 802.1X client cannot correctly send or receive authentication packets.
· The authentication method configured on the device is inconsistent with that on the RADIUS server.
· Incorrect settings exist in the authentication domain used by the 802.1X user or other authentication-related settings have errors.
· The RADIUS server does not respond.
· The RADIUS server rejects the authentication request of the user.
· Authorization attribute assignment fails.
· The MAC address of the 802.1X user is bound to an interface that is not the interface that the user accesses.
· The 802.1X user is in quiet state.
· The maximum number of online 802.1X users has already been reached.
Troubleshooting flow
Figure 129 shows the troubleshooting flowchart.
Figure 129 Flowchart for troubleshooting 802.1X user authentication failure
Solution
IMPORTANT:
· As a best practice, do not enable debugging when the device is running correctly. However, you can enable debugging when a fault occurs for troubleshooting purposes.
· Save the results of the steps in this section in a timely manner, so that you can quickly collect and provide feedback if the fault cannot be resolved.
1. Verify that 802.1X is enabled globally and on the interface that the user accesses.
Execute the display dot1x command on the device to identify whether 802.1X is enabled both globally and on the interface that the user accesses.
¡ If the message "802.1X is not configured" appears, 802.1X is not enabled globally. Execute the dot1x command in system view to globally enable 802.1X.
¡ If the output from the display dot1x command has global configuration information but does not have interface-specific configuration information, 802.1X is not enabled on the interface. Execute the dot1x command in interface view to enable 802.1X on the interface.
2. Verify that the 802.1X client can correctly send and receive authentication packets.
¡ Verify that the 802.1X client version is a version supported by both the device and the server.
¡ Verify that the link between the device and the 802.1X client is correctly connected.
¡ Capture packets to inspect whether the device can correctly exchange data packets with the client and analyze the captured packet file to locate and resolve the issue.
3. Verify that the authentication method is consistent on the device and the RADIUS server.
On the device, 802.1X supports EAP termination (PAP and CHAP authentication methods) and EAP relay (EAP authentication method). When you configure the authentication method, follow these restrictions and guidelines:
¡ Make sure the authentication method configured on the device and the RADIUS server is consistent and the client supports the authentication method.
¡ Local authentication only supports EAP termination.
Execute the display dot1x command on the device to check the current 802.1X authentication method.
<Sysname> display dot1x
Global 802.1X parameters:
802.1X authentication : Enabled
DR member configuration conflict : Unknown
EAP authentication : Enabled
...
If the authentication method is inconsistent with the server, you can execute the dot1x authentication-method command to change the authentication method.
4. Verify that the authentication domain and its related settings are correctly configured.
The device chooses an authentication domain for an 802.1X user in the following order: The mandatory 802.1X authentication domain specified on the interface that the user accesses -> The ISP domain specified in the username -> The default ISP domain in the system.
a. Execute the display dot1x command on the device to examine whether a mandatory 802.1X authentication domain has been specified on the interface that the user accesses.
<Sysname> display dot1x
…
GigabitEthernet2/0/1 is link-up
802.1X authentication : Enabled
…
Multicast trigger : Enabled
Mandatory auth domain : Not configured
…
If a mandatory 802.1X authentication domain has been specified, execute the display domain command to verify that authentication methods are correctly configured in the mandatory 802.1X authentication domain.
b. If no mandatory 802.1X authentication domain has been specified, check the 802.1X username for a domain name. If the 802.1X username includes a domain name, verify that the domain name delimiter is also supported by the RADIUS server, and then locate the domain specified by the username and verify that the settings in the domain are correct.
c. If the 802.1X username does not include a domain name, check the configuration of the default authentication domain.
d. If the default authentication domain does not exist, identify whether the domain if-unknown command has been executed. If the command has been executed, verify that authentication methods are correctly configured in the domain specified by using the command.
e. If none of the above mentioned authentication domains on the device are available for the user, the user cannot complete the authentication.
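The walk-through in step 4 can be read as a short decision procedure for choosing which domain to inspect. The Python sketch below follows the order of substeps a through e. It is not the device's own selection algorithm. The function name, the "@" delimiter, and the domain names in the example are all assumptions made for illustration.

```python
def domain_to_inspect(username, mandatory_domain=None, default_domain=None,
                      if_unknown_domain=None, delimiter="@"):
    """Return the domain whose settings should be checked, or None if there is none."""
    # Substep a: a mandatory 802.1X authentication domain on the interface comes first.
    if mandatory_domain:
        return mandatory_domain
    # Substep b: otherwise use the domain name carried in the username, if any.
    if delimiter in username:
        name = username.rsplit(delimiter, 1)[1]
        if name:
            return name
    # Substep c: otherwise fall back to the default authentication domain.
    if default_domain:
        return default_domain
    # Substeps d and e: finally the domain set by domain if-unknown, else nothing.
    return if_unknown_domain

print(domain_to_inspect("alice@corp", mandatory_domain="dot1x"))
print(domain_to_inspect("alice@corp"))
print(domain_to_inspect("alice", default_domain="system"))
print(domain_to_inspect("alice"))
```

A return value of None corresponds to substep e, where no authentication domain is available and the user cannot complete authentication.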
5. Verify that the RADIUS server can respond to the device.
For more information about the troubleshooting procedure, see the RADIUS server no-response issue in “Troubleshooting AAA.”
6. Identify whether the offline reason is authentication rejection.
a. Execute the debugging dot1x event command to enable 802.1X authentication event debugging.
- If the system generates debugging message Local authentication request was rejected, the user is rejected by local authentication. Causes for local authentication rejection include nonexistence of local user account, incorrect username or password, and incorrect user service type.
- If the system generates debugging message The RADIUS server rejected the authentication request, the user is rejected by the RADIUS server. Many reasons can cause RADIUS server authentication rejection. The most common ones include absence of username on the server, incorrect username or password, and no matching RADIUS authorization policy. You can execute the debugging radius error command to enable RADIUS error debugging and check the debugging messages in the command output. In addition, execute the test-aaa command on the device to initiate a RADIUS request test. After you locate the issue, adjust the settings of the server, device, and client accordingly.
b. Execute the display aaa online-fail-record command and check the Online failure reason field for the authentication failure reason.
7. Verify that authorization attributes are assigned successfully to the user.
a. Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates message Authorization failure, the authorization has failed.
b. Examine whether the device has been configured with the port-security authorization-fail offline command to enable the authorization-fail-offline feature.
- If the authorization-fail-offline feature is not enabled, users who fail authorization can still stay online. In this case, the authentication failure is not caused by an authorization failure, and you must continue to locate other fault reasons.
- If the authorization-fail-offline feature is enabled, execute the dot1x access-user log enable failed-login command to enable logging for 802.1X user login failures. In addition, use the DOT1X_LOGIN_FAILURE log to identify the failed authorization attributes, such as the authorization ACL and VLAN.
c. Verify that the authorization attributes, for example, the authorization ACL and VLAN, on the server are configured correctly, to ensure that the server assigns accurate authorization attributes to the user.
d. Execute the display acl and display vlan commands to verify that the corresponding authorization attributes exist on the device. If an authorization attribute does not exist, you must create it on the device to ensure that the user can obtain the authorization information.
8. Verify that the MAC address of the 802.1X user is not bound to an interface other than the one that the user accesses.
a. Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message MAC binding processing failure, the device fails to process the MAC address binding request for the 802.1X user.
b. Execute the dot1x access-user log enable failed-login command to enable logging for 802.1X user login failures. In addition, use the DOT1X_MACBINDING_EXIST log to determine that the reason for the user login failure is that the user's MAC address has been bound to another interface.
c. Use the undo dot1x mac-binding command on the device to delete the existing MAC address binding entry for the 802.1X user.
9. Examine whether the 802.1X user is in quiet state.
Execute the display dot1x command on the device and check the Quiet timer and Quiet period fields and the Auth state field in the Online 802.1X users area. If the quiet timer is enabled and the value for the Auth state is Unauthenticated for the 802.1X user, the 802.1X user is in quiet state.
The device cannot process any 802.1X authentication requests for the quiet 802.1X user until the quiet timer expires. You can wait until the quiet timer expires or execute the dot1x timer quiet-period command to shorten the quiet period. When the quiet timer expires, initiate 802.1X authentication for the user and verify that the user can pass 802.1X authentication.
10. Identify whether the number of online 802.1X users has reached the maximum value.
a. Execute the display dot1x interface command on the device to check the information on the interface that the user accesses. The Max online users field displays the maximum number of online users supported on the interface and the Online 802.1X users field displays the current number of online users on the interface. Compare the values for these two fields to determine whether the number of online 802.1X users has reached the maximum value.
b. If the number of online 802.1X users on the interface has reached the maximum value, you can execute the dot1x max-user command to increase the maximum number of online 802.1X users allowed on the interface.
c. If the number of online 802.1X users on the interface cannot be increased, you can wait for other users to go offline or connect the user to another interface.
11. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ The log messages collected by executing the dot1x access-user log enable command.
¡ The debugging messages collected by executing the debugging dot1x all and debugging radius all commands.
Related alarm and log messages
Alarm messages
N/A
Log messages
· DOT1X_CONFIG_NOTSUPPORT
· DOT1X_LOGIN_FAILURE
· DOT1X_MACBINDING_EXIST
802.1X user logoff
Symptom
An 802.1X user goes offline unexpectedly after passing authentication and coming online.
Common causes
The following are the common causes of this type of issue:
· Settings related to 802.1X authentication have changed on the device.
· The user fails online user handshake.
· Real-time accounting fails for the user.
· 802.1X reauthentication fails.
· The server forces the user to go offline.
· The user goes offline after offline detection is enabled.
· The session of the user times out.
Troubleshooting flow
Figure 130 shows the troubleshooting flowchart.
Figure 130 Flowchart for troubleshooting 802.1X user logoff
Solution
IMPORTANT:
· As a best practice, do not enable debugging when the device is running correctly. However, you can enable debugging when a fault occurs for troubleshooting purposes.
· Save the results of the steps in this section in a timely manner, so that you can quickly collect and provide feedback if the fault cannot be resolved.
1. Identify whether settings related to 802.1X authentication have changed on the device and verify that the changed settings are correct.
a. Execute the display dot1x command to examine whether the 802.1X authentication settings on the device have changed, and verify that the changed settings are correct.
b. Execute the display domain command to examine whether settings in the authentication domain used by the user have changed, and verify that the changed settings are correct.
2. Identify whether the user fails online user handshake and troubleshoot the cause of the failure.
a. Execute the display dot1x command to check the Handshake field to identify whether 802.1X online user handshake is enabled on the interface that the user accesses.
b. Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message Handshake interaction failure, the user has failed online user handshake. You can capture packets to identify whether the device and the client can correctly send and receive EAP data packets and analyze the captured packet file to locate and resolve the issue.
3. Examine whether real-time accounting fails for the user and troubleshoot the cause of the failure.
Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message Real-time accounting failure, real-time accounting has failed for the user. In this case, check the link between the device and the accounting server for connectivity issues, and identify whether settings related to accounting have changed on the device and the accounting server. Verify that the changed settings are correct.
4. Identify whether the user is logged off due to a reauthentication failure and troubleshoot the cause of the failure.
a. Execute the display dot1x command to check the Periodic reauth field for the enabling status of 802.1X periodic reauthentication on the interface that the user accesses.
b. Execute the dot1x access-user log enable abnormal-logoff command to enable logging for exceptional logoffs of 802.1X users. Then, use the DOT1X_LOGOFF_ABNORMAL log to verify that the reason for the user exceptional logoff is reauthentication failure.
c. To troubleshoot the cause of the reauthentication failure, use the method in "802.1X user authentication failure."
5. Identify whether the user is logged off by the RADIUS server if RADIUS remote authentication is used.
Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message The RADIUS server forcibly logged out the user, the user is logged off by the RADIUS server. You can contact the server administrator to identify the reason for the logoff.
6. Identify whether the user is logged off because the device has not received any traffic from the user before the offline detection timer expires, and troubleshoot the issue.
a. Execute the display dot1x command to check the Offline detection field for the enabling status of the 802.1X offline detection feature on the interface that the user accesses.
b. Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message Offline detect timer expired, it indicates that the device has not received any traffic from the user on the interface within the offline detection interval. As a result, the device cuts off the user's connection, causing the user to go offline.
c. Check the link between the client and the device for connectivity issues to troubleshoot the reason why the client did not send any packets.
7. Examine whether the session of the user has timed out.
a. Identify whether the session timeout time has been configured for the 802.1X user.
- If RADIUS remote authentication is used, execute the debugging radius packet command to enable RADIUS packet debugging and check the debugging messages to identify whether the response packets sent from the server contain the Session-Timeout attribute.
- If local authentication is used, execute the display local-user command to check for the existence of the Session-timeout field in the command output.
b. Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message User session timed out, the user goes offline due to session timeout.
c. It is normal for a user to go offline due to session timeout, and the user can come online again.
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ The offline reason displayed by executing the display aaa abnormal-offline-record or display aaa normal-offline-record command.
¡ The log messages collected by executing the dot1x access-user log enable command.
¡ The debugging messages collected by executing the debugging dot1x all and debugging radius all commands.
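The debugging messages named in steps 2 through 7 can be matched against captured debugging output to shortlist the likely logoff cause. The following Python sketch is illustrative only. It assumes the captured output is available as plain text and that each message appears verbatim in a line; the function name classify_logoff and the cause descriptions are hypothetical.

```python
# Debugging messages quoted in the steps above, mapped to the step that
# handles them. The surrounding line format is not assumed; only a
# substring match is performed.
LOGOFF_CAUSES = {
    "Handshake interaction failure": "online user handshake failed (step 2)",
    "Real-time accounting failure": "real-time accounting failed (step 3)",
    "The RADIUS server forcibly logged out the user": "logged off by the RADIUS server (step 5)",
    "Offline detect timer expired": "no traffic before the offline detection timer expired (step 6)",
    "User session timed out": "session timeout (step 7)",
}


def classify_logoff(debug_text):
    """Return the causes whose debugging message appears in debug_text."""
    return [cause for message, cause in LOGOFF_CAUSES.items() if message in debug_text]


if __name__ == "__main__":
    # Hypothetical captured text; a real capture carries device-specific prefixes.
    captured = "... Offline detect timer expired ..."
    for cause in classify_logoff(captured):
        print(cause)
```

A reauthentication failure (step 4) is identified from the DOT1X_LOGOFF_ABNORMAL log rather than from a debugging message, so it is not covered by this lookup.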
Related alarm and log messages
Alarm messages
N/A
Log messages
· DOT1X_LOGOFF
· DOT1X_LOGOFF_ABNORMAL
Troubleshooting AAA issues
Unable to execute some commands after logging into the device
Symptom
After logging in to the device, the administrator does not have permission to execute some commands, and the system displays the message Permission denied.
Common causes
The common cause of this type of issue is that the authorization given to the user role is too limited.
Troubleshooting flow
Figure 131 shows the troubleshooting flowchart.
Figure 131 Flowchart for troubleshooting inability to execute some commands after login
Solution
1. Check whether the user role is a custom user role.
Log in to the device as a super administrator (with a network-admin, mdc-admin, or level-15 user role), execute the display line command to view the authentication mode for the user line, and take different processing steps according to the authentication mode used.
<Sysname> display line
Idx Type Tx/Rx Modem Auth Int Location
0 CON 0 9600 - N - 0/0
+ 81 VTY 0 - N - 0/0
+ 82 VTY 1 - P - 0/0
+ 83 VTY 2 - A - 0/0
...
¡ For authentication mode none or password (Auth field value: N or P), check whether the user role in the corresponding user line view is a custom user role. If it is not a custom user role, use the user-role role-name command to set a system predefined role with higher privileges.
¡ For the scheme authentication mode (Auth field value: A), first check the authentication method configured in the authentication domain for the login user.
If the domain's authentication method is local, use the display local-user command to check whether the user role is a custom user role. If not a custom user role, use the authorization-attribute user-role role-name command to assign a system predefined role with higher permissions (for example, network-admin).
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test] authorization-attribute user-role network-admin
If the domain's authentication method is remote, contact the administrator of the remote authentication server to authorize a predefined system role with higher permissions.
2. Check whether the commands unable to execute are within the permissions allowed by the custom user role.
Execute the display role name role-name command to view the command rules associated with the custom user role.
If the commands executed by the user are outside the permissions of the command rule, add the permissions for these commands to the command rule for the custom user role through the rule command, or assign the user a predefined system role with higher privileges. Even if custom user roles are configured with higher permission rules, some commands are still unsupported. For details on these commands, see the RBAC configuration in Fundamentals Configuration Guide.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
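When many user lines exist, the Auth column of the display line output in step 1 can be read programmatically. The following Python sketch is a minimal example that assumes the column layout of the sample output in step 1, in which the Auth value is the third field from the right and an optional leading + marker precedes the index. The function name parse_display_line is hypothetical.

```python
# Auth field values and their authentication modes, as described in step 1.
AUTH_MODES = {"N": "none", "P": "password", "A": "scheme"}

SAMPLE = """\
Idx Type Tx/Rx Modem Auth Int Location
0 CON 0 9600 - N - 0/0
+ 81 VTY 0 - N - 0/0
+ 82 VTY 1 - P - 0/0
+ 83 VTY 2 - A - 0/0
..."""


def parse_display_line(output):
    """Map each user line (for example, 'VTY 2') to its authentication mode."""
    modes = {}
    for raw in output.splitlines():
        tokens = raw.split()
        if tokens and tokens[0] == "+":
            tokens = tokens[1:]          # drop the leading marker
        if len(tokens) < 5 or not tokens[0].isdigit():
            continue                     # skip the header and the "..." line
        line_name = f"{tokens[1]} {tokens[2]}"
        auth = tokens[-3]                # Auth is the third field from the right
        modes[line_name] = AUTH_MODES.get(auth, auth)
    return modes


if __name__ == "__main__":
    for name, mode in parse_display_line(SAMPLE).items():
        print(name, mode)
```

Counting fields from the right avoids depending on the Tx/Rx column, which is empty for VTY lines in the sample.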
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Unable to create or edit local users after logging into the device
Symptom
After logging in to the device, the administrator cannot create or edit local users, and the system displays the message Insufficient right to perform the operation.
Common causes
The common cause of this type of issue is that the user role is not authorized to configure the target local users.
Troubleshooting flow
Figure 132 shows the troubleshooting flowchart.
Solution
1. Check whether the role of the current logged-in user is a predefined super administrator role (network-admin, mdc-admin, or level-15).
Only the predefined super administrator roles have the permission to create local users. Other user roles can only access their own local user views. If the logged-in user does not have a super administrator role, assign one to the user.
Execute this step only if you lack the permission to create local users. If you cannot modify local users, execute step 2.
2. Compare the permission scope of the logged-in user with that of the target user.
Execute the display role name role-name command to view the roles and permissions of both the logged-in user and the target user, and compare their permissions. If the logged-in user has lower permissions than the target user, assign the logged-in user a role with higher permissions.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Administrator not assigned a user role
Symptom
The administrator cannot successfully log in to the device, and the device does not offer three login attempts. For instance, when users attempt to log in via Telnet and enter their username and password, the device's login interface neither displays a message indicating AAA authentication failure nor prompts them to re-enter their credentials.
Common causes
The common cause of this type of issue is that the user is not assigned a user role.
Troubleshooting flow
Figure 133 shows the troubleshooting flowchart.
Figure 133 Flowchart for troubleshooting the issue of administrator not assigned a user role
Solution
1. Check whether the user is assigned a user role.
Log in to the device as a super administrator (with a network-admin, mdc-admin, or level-15 user role), execute the display line command to view the authentication mode for the user line, and take different processing steps according to the authentication mode used.
<Sysname> display line
Idx Type Tx/Rx Modem Auth Int Location
0 CON 0 9600 - N - 0/0
+ 81 VTY 0 - N - 0/0
+ 82 VTY 1 - P - 0/0
+ 83 VTY 2 - A - 0/0
...
¡ For authentication mode none or password (Auth field value N or P), check whether the user role configuration exists in the corresponding user line view. If it does not, assign a user role (abc in this example) to the user line by using the user-role role-name command.
<Sysname> system-view
[Sysname] line vty 0 63
[Sysname-line-vty0-63] user-role abc
¡ For the scheme authentication mode (Auth field value: A), first check the authentication method configured in the authentication domain for the login user.
- If the domain's authentication method is local, use the display local-user command to view the authorized roles of the local user. If the User role list field is empty, it indicates that no user role is authorized for the user.
<Sysname> display local-user user-name test class manage
Total 1 local users matched.
Device management user test:
State: Active
Service type: Telnet
User group: system
Bind attributes:
Authorization attributes:
Work directory: flash:
User role list:
...
In this case, enter the local user view and execute the authorization-attribute user-role command to authorize the user role (abc in this example).
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test] authorization-attribute user-role abc
- If the domain's authentication method is remote, contact the administrator of the authentication server to check whether the user has been authorized with a user role. If not, add the user-role authorization attribute for the user. Using the FreeRADIUS server as an example, to add the user role network-admin in the users file, edit the file as follows:
user Cleartext-Password := "123456"
H3C-User-Roles ="shell:roles=\"network-admin\""
To add user roles on other RADIUS servers, follow the procedure for the server in use.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Invalid characters in login username
Symptom
The administrator failed to log in to the device, and the system printed the following log information:
Sysname LOGIN/5/LOGIN_INVALID_USERNAME_PWD: -MDC=1; Invalid username or password from xx.xx.xx.xx.
Common causes
The common cause of this type of issue is that the entered username contains invalid characters.
Troubleshooting flow
Figure 134 shows the troubleshooting flowchart.
Figure 134 Flowchart for troubleshooting the issue of username containing invalid characters
Solution
NOTE: This solution applies only to SSH and Telnet login users.
1. Check whether the username entered by the user contains invalid characters.
When a user logs in to the device, the system checks the validity of the entered username and domain name. Login is not allowed if the username contains any of the characters "\", "|", "/", ":", "*", "?", "<", ">", or "@", or if the domain name contains "@". In this case, the user can log in again with a valid username.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
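The character check described in step 1 can be reproduced off the device to screen usernames before they are distributed to users. The following Python sketch is illustrative only and is not the device's implementation. It assumes that the username and the domain name are supplied separately; the function name check_login_name is hypothetical.

```python
# Characters that step 1 lists as invalid in a username.
INVALID_USERNAME_CHARS = set('\\|/:*?<>@')


def check_login_name(username, domain=""):
    """Return the invalid characters found in the username and the domain name."""
    bad_in_username = sorted(c for c in set(username) if c in INVALID_USERNAME_CHARS)
    bad_in_domain = ["@"] if "@" in domain else []
    return {"username": bad_in_username, "domain": bad_in_domain}


if __name__ == "__main__":
    print(check_login_name("admin"))
    print(check_login_name("ops/night:shift"))
    print(check_login_name("admin", "system@backup"))
```

An empty list for both keys means that the name passes the character check described in this section. Other login checks are outside the scope of this sketch.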
Related alarm and log messages
Alarm messages
N/A
Log messages
LOGIN_INVALID_USERNAME_PWD
Incorrect username or password for local authentication
Symptom
The administrator failed to log into the device using local authentication. If the device is enabled with event debugging for the local server (by using the debugging local-server event command), the system will print the following debugging information:
*Aug 18 10:36:58:514 2021 Sysname LOCALSER/7/EVENT: -MDC=1;
Authentication failed, user password is wrong.
Or
*Aug 18 10:37:24:962 2021 Sysname LOCALSER/7/EVENT: -MDC=1;
Authentication failed, user "t4" doesn't exist.
Common causes
The following are the common causes of this type of issue:
· The entered password is incorrect.
· The local username does not exist.
Troubleshooting flow
Figure 135 shows the troubleshooting flowchart.
Figure 135 Flowchart for troubleshooting incorrect local username or password
Solution
1. Identify whether the local username exists.
Execute the display local-user command to identify whether a local user of the device management type exists with the same login username.
¡ If the local user does not exist, use the local-user command to create one (username test in this example) and notify the user to try logging in to the device again.
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test]
¡ If the local user exists, execute step 2.
2. Check whether the entered password for the local user is correct.
If the system prompts incorrect password during user login, enter the local user view and execute the password command to reset the password (123456TESTplat&! in this example), and then notify the user to try logging into the device again.
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test] password simple 123456TESTplat&!
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, alarm messages, and debugging information.
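The two debugging messages shown in the symptom map directly to steps 1 and 2. The following Python sketch shows one way to sort collected debugging output accordingly. It assumes the message wording shown in the symptom; the function name classify_local_auth_failure and the returned labels are hypothetical.

```python
import re

# Matches the "doesn't exist" message shown in the symptom and captures the username.
UNKNOWN_USER = re.compile(r'user "([^"]+)" doesn\'t exist')


def classify_local_auth_failure(debug_text):
    """Return (label, username) for a LOCALSER debugging message."""
    if "user password is wrong" in debug_text:
        return ("wrong_password", None)      # continue with step 2
    match = UNKNOWN_USER.search(debug_text)
    if match:
        return ("unknown_user", match.group(1))  # continue with step 1
    return ("unrecognized", None)


if __name__ == "__main__":
    sample = 'Authentication failed, user "t4" doesn\'t exist.'
    print(classify_local_auth_failure(sample))
```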
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Service type of local user mismatch
Symptom
The administrator failed to log into the device using local authentication. If the device is enabled with event debugging for the local server (by using the debugging local-server event command), the system will print the following debugging information:
*Aug 7 17:18:07:098 2021 Sysname LOCALSER/7/EVENT: -MDC=1; Authentication failed, unexpected user service type 64 (expected = 3072).
Common causes
The common cause of this type of issue is that the user's access type does not match the service type configured for the local user on the device, meaning the user's access type is not within the configured range of service types.
Troubleshooting flow
Figure 136 shows the troubleshooting flowchart.
Figure 136 Flowchart for troubleshooting service type of local user mismatch
Solution
1. Identify whether the user's access type falls within the range of service types configured for the local user.
a. Execute the display local-user command. The Service type field in the command output displays the service types the local user can use.
<Sysname> display local-user user-name test class manage
Total 1 local users matched.
Device management user test:
State: Active
Service type: Telnet
User group: system
Bind attributes:
Authorization attributes:
Work directory: flash:
User role list:
...
b. In local user view for this user, modify the service types that the user can use. Make sure the actually used access type (SSH in this example) is included.
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test] service-type ssh
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, alarm messages, and debugging information.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Denied access within a period due to excessive number of login failures
Symptom
After failing to log in to the device a specified number of times, an administrator is temporarily banned from attempting to log in again.
Common causes
The following are the common causes of this type of issue:
· The device has the login attack prevention feature enabled. After this feature is enabled, if a user fails to log in the specified number of times and their IP address gets blacklisted, the device will discard packets from that IP address. This prevents the user from logging in for a set duration.
· Users log in to the device using local authentication, and the device has the password control feature enabled. After a user login authentication fails, the system adds the user to the password management blacklist and restricts subsequent login attempts according to the measures configured. When a user login fails more times than the specified limit, the system will prohibit that user from logging in. After a period, the system allows the user to attempt to log in again.
Troubleshooting flow
Figure 137 shows the troubleshooting flowchart.
Figure 137 Flowchart for troubleshooting denied access within a period
Solution
1. Try to log in again after waiting for a certain period.
Incorrect password input might cause login prohibition. As a best practice, try to log in again after waiting for some time. If you encounter the same issue again when logging into the device with the correct username and password, switch to another administrator account that can access the device and continue with the following processing steps.
2. Check whether the user can initiate a login connection after being blocked.
¡ If the user is still able to initiate a login connection to the device after being blocked but fails to authenticate, execute the display password-control blacklist command in any view to identify whether the user has been added to the blacklist. If the user is on the blacklist and the Lock flag field displays lock, the user is locked out.
<Sysname> display password-control blacklist
Per-user blacklist limit: 100.
Blacklist items matched: 1.
Username IP address Login failures Lock flag
test 3.3.3.3 4 lock
For users added to the blacklist, use either of the following methods:
- Execute the undo password-control enable command in system view to disable the global password control feature.
<Sysname> system-view
[Sysname] undo password-control enable
- Execute the reset password-control blacklist command in user view to clear the user (user test in this example) from the password control blacklist.
<Sysname> reset password-control blacklist user-name test
¡ If the user is blocked and cannot initiate a login connection to the device, execute step 3.
3. Identify whether the login attack prevention feature is enabled.
If the current configuration contains commands starting with attack-defense login, you can disable the login attack prevention feature as needed or change the maximum number of consecutive login failures and the block duration after a login failure.
¡ Use the undo attack-defense login enable command to disable login user attack prevention, and use the undo blacklist global enable command to disable the global blacklist.
<Sysname> system-view
[Sysname] undo attack-defense login enable
[Sysname] undo blacklist global enable
¡ Execute the attack-defense login max-attempt command to increase the maximum number of consecutive login failures, allowing more user login attempts. This number is set to 5 in the following example:
<Sysname> system-view
[Sysname] attack-defense login max-attempt 5
¡ Execute the attack-defense login block-timeout command to reduce the blocking time, allowing users to log in again as soon as possible. The blocking time is set to 1 minute in the following example:
<Sysname> system-view
[Sysname] attack-defense login block-timeout 1
Executing the above actions may weaken the device's defense against login DoS attacks, so proceed with caution.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
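When the blacklist in step 2 contains many entries, the locked users can be extracted from the display password-control blacklist output. The following Python sketch is a minimal example that assumes the four-column layout of the sample output in step 2 and usernames without spaces. The function name locked_users is hypothetical.

```python
SAMPLE = """\
Per-user blacklist limit: 100.
Blacklist items matched: 1.
Username IP address Login failures Lock flag
test 3.3.3.3 4 lock"""


def locked_users(output):
    """Return (username, ip_address, login_failures) for entries flagged lock."""
    locked = []
    in_table = False
    for line in output.splitlines():
        tokens = line.split()
        if tokens[:1] == ["Username"]:
            in_table = True              # rows follow the column header
            continue
        if in_table and len(tokens) == 4 and tokens[3] == "lock":
            locked.append((tokens[0], tokens[1], int(tokens[2])))
    return locked


if __name__ == "__main__":
    for user, ip, failures in locked_users(SAMPLE):
        print(user, ip, failures)
```

Each returned username is a candidate for the reset password-control blacklist user-name command described in step 2.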
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Delayed reauthentication after login failure
Symptom
After an administrator fails to log in to a device, the console does not respond for a certain period, during which the administrator user cannot perform any operations.
Common causes
The common cause of this type of issue is that the device has the login reauthentication-delay feature enabled. After this feature is enabled, if a user login fails, the system will delay for a certain period before allowing the user to authenticate again.
Troubleshooting flow
Figure 138 shows the troubleshooting flowchart.
Figure 138 Flowchart for troubleshooting delayed reauthentication after login failure
Solution
1. Identify whether the login reauthentication delay feature is enabled.
If the current configuration contains the attack-defense login reauthentication-delay command, you can disable the login reauthentication delay feature or adjust the delay period as needed.
¡ Execute the undo attack-defense login reauthentication-delay command to disable the login reauthentication delay feature.
<Sysname> system-view
[Sysname] undo attack-defense login reauthentication-delay
¡ Execute the attack-defense login reauthentication-delay seconds command to reduce the wait time for reauthentication after a user login fails (for example, to 10 seconds).
<Sysname> system-view
[Sysname] attack-defense login reauthentication-delay 10
Executing the above actions may weaken the device's defense against login user dictionary attacks, so proceed with caution.
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Maximum concurrent logins with identical local username reached
Symptom
When a certain number of local authentication users access the device with the same username, subsequent attempts to log in to the device with that username will fail.
If the device is enabled with event debugging for the local server (by using the debugging local-server event command), the system will print the following debugging information:
*Aug 18 10:52:56:664 2021 Sysname LOCALSER/7/EVENT: -MDC=1;
Authentication failed, the maximum number of concurrent logins already reached for the local user.
Common causes
The common cause of this type of issue is that the maximum number of concurrent logins has been set for the current local user name.
Troubleshooting flow
Figure 139 shows the troubleshooting flowchart.
Solution
1. Identify whether you have set the maximum number of concurrent logins for users using the current local user name.
Execute the display local-user command to view the local user configuration for that user name. If the value for the Access limit field is Enabled, it indicates that the maximum number of concurrent users using the current local user name has been set (2 in this example).
<Sysname> display local-user user-name test class manage
Total 1 local users matched.
Device management user test:
Service type: SSH/Telnet
Access limit: Enabled Max access number: 2
Service type: Telnet
User group: system
Bind attributes:
Authorization attributes:
Work directory: flash:
User role list: test
...
You can change or remove this access limit in the local user view as needed.
¡ To remove this access limit, execute the undo access-limit command.
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test] undo access-limit
¡ To change the limit to a bigger value (10 in this example), execute the access-limit max-user-number command.
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test] access-limit 10
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
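The Access limit check in step 1 can be applied to saved display local-user output. The following Python sketch assumes the field layout of the sample output in step 1, where Access limit and Max access number appear on the same line. The value Disabled in the usage example is an assumption, and the function name access_limit is hypothetical.

```python
import re

# Field names are taken from the sample output in step 1.
ACCESS_LIMIT = re.compile(r"Access limit:\s*(\S+)(?:\s+Max access number:\s*(\d+))?")


def access_limit(output):
    """Return (enabled, max_access_number), or None if the field is absent."""
    match = ACCESS_LIMIT.search(output)
    if not match:
        return None
    enabled = match.group(1) == "Enabled"
    maximum = int(match.group(2)) if match.group(2) else None
    return (enabled, maximum)


if __name__ == "__main__":
    print(access_limit("Access limit: Enabled Max access number: 2"))
    print(access_limit("Access limit: Disabled"))  # assumed wording
```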
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Maximum concurrent users of the same access type reached
Symptom
When a certain number of users access the device using the same login method, subsequent user logins using that method will fail.
If event debugging is enabled for the related access module, the system prints the following debugging information:
%Aug 18 10:57:52:596 2021 Sysname TELNETD/6/TELNETD_REACH_SESSION_LIMIT: -MDC=1; Telnet client 1.1.1.1 failed to log in. The current number of Telnet sessions is 5. The maximum number allowed is (5).
Common causes
The common cause of this type of issue is that the maximum number of concurrent users is set for the specified login method.
Troubleshooting flow
Figure 140 shows the troubleshooting flowchart.
Solution
1. Identify whether you have set the maximum number of concurrent users for a specific login method.
If the aaa session-limit command exists in the current configuration, you can change the maximum number of users accessing the device using the current login method by executing the aaa session-limit { ftp | http | https | ssh | telnet } max-sessions command in system view. The following example changes this limit to 32.
<Sysname> system-view
[Sysname] aaa session-limit telnet 32
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
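The TELNETD_REACH_SESSION_LIMIT message in the symptom carries the client address, the current session count, and the configured maximum. The following Python sketch extracts these values from collected log messages so that you can confirm that the limit was reached. It assumes the message wording shown in the symptom; the function name parse_session_limit is hypothetical.

```python
import re

# Pattern built from the sample message in the symptom.
SESSION_LIMIT = re.compile(
    r"Telnet client (\S+) failed to log in\. "
    r"The current number of Telnet sessions is (\d+)\. "
    r"The maximum number allowed is \((\d+)\)\."
)

SAMPLE = (
    "%Aug 18 10:57:52:596 2021 Sysname TELNETD/6/TELNETD_REACH_SESSION_LIMIT: -MDC=1; "
    "Telnet client 1.1.1.1 failed to log in. The current number of Telnet sessions is 5. "
    "The maximum number allowed is (5)."
)


def parse_session_limit(log_line):
    """Return the client, current count, and maximum from the log line, or None."""
    match = SESSION_LIMIT.search(log_line)
    if not match:
        return None
    return {
        "client": match.group(1),
        "current": int(match.group(2)),
        "maximum": int(match.group(3)),
    }


if __name__ == "__main__":
    info = parse_session_limit(SAMPLE)
    if info and info["current"] >= info["maximum"]:
        print("Session limit reached for client", info["client"])
```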
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
RADIUS server not responding
Symptom
Authentication, authorization, and accounting through RADIUS failed because the RADIUS server is not responding. If the device has RADIUS event debugging enabled (by executing the debugging radius event command), the system will print the following debugging information:
*Aug 8 17:49:06:143 2021 Sysname RADIUS/7/EVENT: -MDC=1; Reached the maximum retries
Common causes
The following are the common causes of this type of issue:
· The shared keys configured on the RADIUS server do not match those configured on the access device.
· The IP address of the device is not added to the RADIUS server, or an incorrect IP address is added to the RADIUS server for the device.
· Network issues exist between the RADIUS server and the access device, such as when a firewall in the intermediate network blocks the port numbers (default authentication port number 1812, default accounting port number 1813) used by the RADIUS server to provide AAA services.
Troubleshooting flow
Figure 141 shows the troubleshooting flowchart.
Figure 141 Flowchart for troubleshooting a non-responsive RADIUS server
Solution
1. Identify whether the shared keys configured on the RADIUS server match those on the access device.
¡ If the shared keys do not match, then:
# On the access device, execute the key authentication and key accounting commands in RADIUS scheme view to reconfigure the shared keys for authentication and accounting. The following example sets the authentication key to 123 and the accounting key to 456:
<Sysname> system-view
[Sysname] radius scheme radius1
[Sysname-radius-radius1] key authentication simple 123
[Sysname-radius-radius1] key accounting simple 456
# On the RADIUS server, reconfigure the shared keys for RADIUS message interaction with the access device to ensure consistency with the shared key configuration on the access device.
¡ If the shared keys are consistent, execute step 2.
2. Identify whether the access device's IP address has been added to the RADIUS server or if the added IP address is correct.
The device IP address added on the RADIUS server must be the source IP address from which the access device sends RADIUS packets. You can configure this source IP address on the access device.
The access device selects the source IP address used to send RADIUS packets in the following order:
a. The NAS-IP address configured in RADIUS scheme view by using the nas-ip command.
b. The source NAS-IP address configured in system view by using the radius nas-ip command.
c. IP address of the outgoing interface sending the RADIUS packets.
3. Identify whether any network issues exist between the device and the server.
First, use methods like ping to verify network connectivity between the device and the server. Then, identify whether firewalls exist within the network. Typically, if a network contains a firewall that blocks packets destined for the UDP port numbers of the RADIUS server (the default RADIUS authentication port number is 1812 and the default RADIUS accounting port number is 1813), RADIUS packets will be discarded.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, alarm messages, and debugging information.
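The source IP address selection order described in step 2 can be sketched as follows. This Python snippet is only an illustration of the precedence, not the device implementation. The function and parameter names are hypothetical, and None stands for "not configured".

```python
from typing import Optional


def select_source_ip(
    scheme_nas_ip: Optional[str],
    global_nas_ip: Optional[str],
    outgoing_interface_ip: Optional[str],
) -> Optional[str]:
    """Return the first available address, following the order in step 2.

    1. nas-ip configured in RADIUS scheme view
    2. radius nas-ip configured in system view
    3. IP address of the outgoing interface
    """
    for candidate in (scheme_nas_ip, global_nas_ip, outgoing_interface_ip):
        if candidate:
            return candidate
    return None


# Hypothetical addresses: no scheme-level nas-ip, so the system-view setting wins.
print(select_source_ip(None, "10.2.2.2", "10.3.3.3"))
```

The address returned by this logic is the one that must be registered for the device on the RADIUS server.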
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
HWTACACS server not responding
Symptom
Authentication, authorization, and accounting failed using the HWTACACS server. If the device has HWTACACS event debugging enabled (by using the debugging hwtacacs event command), the system prints Connection timed out in the event debugging information.
Common causes
The following are the common causes of this type of issue:
· The shared keys configured on the HWTACACS server do not match those configured on the access device.
· The IP address of the device is not added to the HWTACACS server, or an incorrect IP address is added to the HWTACACS server for the device.
· Network issues exist between the HWTACACS server and the access device, such as when a firewall in the intermediate network blocks the port number (default authentication/authorization/accounting port number 49) used by the HWTACACS server to provide AAA services.
Troubleshooting flow
Figure 142 shows the troubleshooting flowchart.
Figure 142 Flowchart for troubleshooting non-responsive HWTACACS server
Solution
1. Identify whether the shared keys configured on the HWTACACS server match those on the access device.
¡ If the shared keys do not match, then:
# On the access device, execute the key authentication, key authorization, and key accounting commands in HWTACACS scheme view to reconfigure the shared keys for authentication, authorization, and accounting (in the example below, the authentication and authorization keys are 123, and the accounting key is 456).
<Sysname> system-view
[Sysname] hwtacacs scheme hwt1
[Sysname-hwtacacs-hwt1] key authentication simple 123
[Sysname-hwtacacs-hwt1] key authorization simple 123
[Sysname-hwtacacs-hwt1] key accounting simple 456
# On the HWTACACS server, reconfigure the shared key for HWTACACS messages interacting with the access device to ensure consistency with the configuration on the access device.
¡ If the shared keys are consistent, execute step 2.
2. Identify whether the access device's IP address has been added to the HWTACACS server or if the added IP address is correct.
The IP address added to the HWTACACS server must be the source IP address from which the access device sends HWTACACS packets. You can configure this source IP address on the access device.
The access device selects the source IP address used to send HWTACACS packets in the following order:
a. The source IP address configured in HWTACACS scheme view by using the nas-ip command.
b. The source IP address configured in system view by using the hwtacacs nas-ip command.
c. The IP address of the outgoing interface sending the HWTACACS packets.
3. Identify whether any network issues exist between the device and the server.
First, use methods like ping to verify network connectivity between the device and the server. Then, identify whether firewalls exist within the network. Typically, if a network contains a firewall that blocks packets destined for the TCP port number of the HWTACACS server (the default authentication/authorization/accounting port number is 49), HWTACACS packets will be discarded.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, alarm messages, and debugging information.
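Because HWTACACS runs over TCP, a plain TCP connection attempt from a management host can complement the ping test in step 3. The following Python snippet is a minimal sketch. The function name tcp_port_reachable and the address 192.0.2.10 are placeholders. A successful connection only shows that the TCP port accepts connections from the host running the script. It does not validate the shared key, and the path from the access device to the server might differ.

```python
import socket


def tcp_port_reachable(host: str, port: int = 49, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection performs the TCP handshake and raises OSError on
        # refusal, timeout, or an unreachable network.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address used here as a placeholder.
    print(tcp_port_reachable("192.0.2.10", 49, timeout=1.0))
```

If the connection fails while ping succeeds, a firewall that filters TCP port 49 is a likely cause.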
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Mismatched user access type and the Login-Service attribute value issued by the RADIUS server
Symptom
User authentication fails because the device does not support the Login-Service attribute value issued by the RADIUS server.
Use the debugging radius packet command to enable RADIUS packet debugging on the device. In the debugging information of the following form, you can see that the server issued a Login-Service attribute type not supported by the device.
*Aug 3 02:33:18:707 2021 Sysname RADIUS/7/PACKET:
Service-Type=Framed-User
Idle-Timeout=66666
Session-Timeout=6000
Login-Service=TCP-Clear
Common causes
The main reason for this class of faults is that the service type for user login does not match the service type specified by the Login-Service attribute issued by the server.
The Login-Service attribute is issued to the user by the RADIUS server to identify the type of service for authenticated users. The device currently supports the following Login-Service attribute values:
· 0: Telnet (standard attribute)
· 50: SSH (expansion attribute)
· 51: FTP (expansion attribute)
· 52: Terminal (expansion attribute)
· 53: HTTP (expansion attribute)
· 54: HTTPS (expansion attribute)
You can use the CLI to set the method in which the device inspects the value of the Login-Service attribute, controlling the consistency check method for user service types.
Troubleshooting flow
Figure 143 shows the troubleshooting flowchart.
Solution
1. Verify if the Login-Service attribute value issued by the RADIUS server matches the access type.
Execute the display radius scheme command on the access device to view the value of the Attribute 15 check-mode field for the RADIUS scheme.
¡ If the value is Loose, it indicates that the loose check mode is used and the device uses the standard value of the Login-Service attribute to check the user service type. SSH, FTP, and terminal users can pass authentication only when the Login-Service attribute value issued by the RADIUS server is 0, indicating the Telnet user type.
¡ If the value is Strict, it indicates that the strict check mode is used and the device uses both the standard value and expansion values of the Login-Service attribute to check the user service type. SSH, FTP, and terminal users can pass authentication only when the RADIUS server assigns the corresponding Login-Service expansion attribute value.
If the Login-Service attribute issued to a user by the RADIUS server is out of the range supported by the device, you can resolve this issue by using one of the following methods:
¡ On the RADIUS server, set the server to either not issue the Login-Service attribute or change the issued attribute value to a value supported by the access device.
¡ On the access device, enter the corresponding RADIUS scheme and use the attribute 15 check-mode command to change the check mode for the Login-Service attribute. In this example, the check mode is set to loose.
<Sysname> system-view
[Sysname] radius scheme radius1
[Sysname-radius-radius1] attribute 15 check-mode loose
2. If the issue persists, collect the following information and contact Technical Support:
¡ Execution results of the above steps.
¡ Device configuration file, log information, debugging information, and alarm messages.
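The loose and strict check modes described in step 1 can be summarized in a short Python sketch. This is an illustration of the documented behavior, not the device implementation. The function name is hypothetical. Two points are assumptions: Telnet users are expected to match value 0 in both modes, and a missing attribute is treated as acceptable because step 1 lists "not issuing the attribute" as a fix. HTTP and HTTPS are omitted because the check modes are described only for SSH, FTP, and terminal users.

```python
from typing import Optional

STANDARD_TELNET = 0
# Expansion values listed in the Common causes section.
EXPANSION_VALUES = {"ssh": 50, "ftp": 51, "terminal": 52}


def login_service_accepted(
    access_type: str, login_service: Optional[int], check_mode: str
) -> bool:
    """Return True if the issued Login-Service value fits the access type."""
    if check_mode not in ("loose", "strict"):
        raise ValueError("check_mode must be 'loose' or 'strict'")
    if login_service is None:
        # Assumption: no attribute issued means no mismatch.
        return True
    if access_type == "telnet":
        # Assumption: Telnet always uses the standard value.
        return login_service == STANDARD_TELNET
    if access_type not in EXPANSION_VALUES:
        raise ValueError("unsupported access type in this sketch")
    if check_mode == "loose":
        # Loose mode: SSH, FTP, and terminal users must receive value 0.
        return login_service == STANDARD_TELNET
    # Strict mode: the matching expansion value is required.
    return login_service == EXPANSION_VALUES[access_type]


# An SSH user who receives value 0 passes in loose mode but fails in strict mode.
print(login_service_accepted("ssh", 0, "loose"))
print(login_service_accepted("ssh", 0, "strict"))
```

In the debugging sample shown in the Symptom section, the server issues TCP-Clear, which is outside all of the values above, so the login fails in either mode.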
Related alarm and log messages
Alarm messages
None.
Log messages
None.
Local authentication login failure
Symptom
The administrator failed to log into the device using local authentication.
Common causes
The following are the common causes of this type of issue:
· The configuration of the authentication method for the user line is incorrect.
· The protocol type supported by the VTY user line is incorrect.
· The configured authentication, authorization, and accounting schemes for the ISP domain are incorrect.
· The local user does not exist, the password is incorrect, or the service type is incorrect.
· The number of local user accesses has reached the upper limit.
· The number of users logged into the device has reached the upper limit.
· The global password management function is enabled, and the local lauth.dat file on the device is abnormal.
Troubleshooting flow
Figure 144 shows the troubleshooting flowchart.
Figure 144 Flowchart for troubleshooting local authentication login failures
Solution
|
NOTE: For login issues with Web, NETCONF over SOAP, and FTP, inspection of the user line (class) configuration is not required. The other troubleshooting steps are the same. |
1. Check the user line configuration.
Execute the line vty first-number [ last-number ] command to enter the view of the specified VTY user line, and execute the display this command to view if the following configurations are correct:
¡ The authentication-mode is set to scheme.
¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.
¡ For SSH login, the protocol inbound is set to ssh or the default value is used.
2. Check the configuration in user line class view.
3. The configuration in user line view takes precedence over the configuration in user line class view. If the user line view does not contain any configuration, continue to check the settings in user line class view.
4. Execute the line class vty command to enter VTY user line class view, and use the display this command to verify if the following configurations are correct:
¡ The authentication-mode is set to scheme.
¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.
¡ For SSH login, the protocol inbound is set to ssh or the default value is used.
If the configurations in user line view and user line class view are incorrect, set the authentication mode to scheme as needed for the user line or user line class, and specify the supported protocol types for user login.
5. Identify whether the authentication, authorization, and accounting scheme configurations for the ISP domain are correct.
Execute the display domain command to view the configuration information.
¡ If a user login username includes the domain name (for example, test), verify if the value of the Login authentication scheme field for the domain is Local. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is Local.
<Sysname> display domain test
Domain: test
State: Active
Login authentication scheme: Local
Default authentication scheme: Local
Default authorization scheme: Local
Default accounting scheme: Local
Accounting start failure action: Online
Accounting update failure action: Online
Accounting quota out action: Offline
Service type: HSI
Session time: Exclude idle time
NAS-ID: N/A
DHCPv6-follow-IPv6CP timeout: 60 seconds
Authorization attributes:
Idle cut: Disabled
Session timeout: Disabled
IGMP access limit: 4
MLD access limit: 4
¡ If the user login username does not include the domain name, execute the display this command in system view to view the configuration of domain default enable isp-name. In this example, the default domain name is system.
#
domain default enable system
#
- If this configuration exists, execute the display domain command to verify if the value of the Login authentication scheme field for the ISP domain is Local. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is Local.
- If the configuration does not exist, execute the display domain command to verify if the value of the Login authentication scheme field for the system domain is Local. If the Login authentication scheme field is missing for the system domain, verify if the value of the Default authentication scheme field is Local.
The method for confirming the authorization and accounting configuration is similar. If the above configurations are incorrect, configure the local scheme for authentication, authorization, or accounting for login users in the relevant ISP domain.
6. Verify that the username and password are correct.
Execute the display local-user command to verify if the corresponding local user configuration exists.
¡ If a local user exists, execute the local-user username class manage command to enter local user view. Then, use the display this command to verify if a password is configured in the view and if the service-type configuration matches the required service type.
- If the user password is required, try resetting the password once. In this example, the password is set to 123456TESTplat&!.
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test] password simple 123456TESTplat&!
- If the service type is incorrect, configure the service type to match the login method. In this example, SSH is used.
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test] service-type ssh
¡ If a local user does not exist, execute the local-user username class manage command to create a device management local user and configure the password and service type. In this example, the username is test.
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test]
7. Verify if the number of users accessing with this local username has reached the upper limit.
Execute the display this command in local user view to verify if the access-limit configuration exists.
¡ If the access-limit configuration exists, execute the display local-user username class manage command to verify if the value of the Current access number field has reached the configured upper limit. If the upper limit is reached, take one of the following measures as needed:
- In local user view, execute the access-limit command to increase the user limit. In this example, the upper limit is changed to 20.
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test] access-limit 20
- Execute the free command in user view to force other online users offline. This example releases all connections established on VTY1.
<Sysname> free line vty 1
Are you sure to free line vty1? [Y/N]:y
[OK]
¡ If the access-limit configuration does not exist, or the number of users has not reached the upper limit, proceed to the next step.
8. Verify if the number of online users for the specified login type has reached the upper limit.
a. Execute the display this command in system view to verify if the aaa session-limit configuration exists. If the configuration is not found, it indicates that the default value 32 is used.
#
aaa session-limit ftp 33
domain default enable system
#
b. Execute the display users command to view the current user login status on the user lines and verify if the user quantity has reached the upper limit.
c. If the number of online users reaches the upper limit, take one of the following measures as needed:
- In system view, execute the aaa session-limit command to increase the user quantity upper limit.
- Execute the free command in user view to force other online users offline.
9. Verify if the local lauth.dat file is correct.
After you enable the global password management feature, the device automatically generates a lauth.dat file to record local users' authentication and login information. Manually deleting or modifying this file will cause an anomaly in local authentication. Therefore, first execute the display password-control command to verify if the global password management feature is enabled on the device.
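¡ If this feature is enabled, execute the dir command to check the lauth.dat file. If the file does not exist, is of size 0, or is very small (less than 20 bytes), contact Technical Support. If the issue is urgent, try disabling and then re-enabling the global password management feature to resolve the issue.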
<Sysname> dir
Directory of flash: (EXT4)
0 drw- - Aug 16 2021 11:45:37 core
1 drw- - Aug 16 2021 11:45:42 diagfile
2 drw- - Aug 16 2021 11:45:57 dlp
3 -rw- 713 Aug 16 2021 11:49:41 ifindex.dat
4 -rw- 12 Sep 01 2021 02:40:01 lauth.dat
...
<Sysname> system-view
[Sysname] undo password-control enable
[Sysname] password-control enable
¡ If this feature is not enabled, skip this step.
10. If the issue persists, collect the following information and contact Technical Support:
¡ Execution results of the above steps.
¡ Device configuration file, log information, alarm messages, and debugging information.
¡ Use the debugging local-server all command to enable debugging of the local server to collect the device debugging information.
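The domain and scheme lookup described in step 5 can be sketched in Python as follows. This is a hedged illustration rather than device logic. The function names are hypothetical, and the sketch assumes that the username carries the domain in the userid@isp-name form and that the display domain output uses the "Field: value" layout shown in the example.

```python
from typing import Optional


def resolve_domain(username: str, default_domain: Optional[str] = None) -> str:
    """Pick the ISP domain used to authenticate a login user."""
    if "@" in username:
        # Assumption: the domain follows the last "@" in the username.
        return username.rsplit("@", 1)[1]
    # Without "domain default enable", the system domain is used.
    return default_domain or "system"


def effective_login_authentication(display_domain_output: str) -> Optional[str]:
    """Return the Login authentication scheme, else the Default one."""
    fields = {}
    for line in display_domain_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields.get("Login authentication scheme") or fields.get(
        "Default authentication scheme"
    )


sample_output = """Domain: test
State: Active
Login authentication scheme: Local
Default authentication scheme: Local
"""

print(resolve_domain("admin@test"))
print(effective_login_authentication(sample_output))
```

For local authentication, the value returned by the second function is expected to be Local. The same lookup applies to the authorization and accounting fields.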
Related alarm and log messages
Alarm messages
Module: HH3C-UI-MAN-MIB
· hh3cLogInAuthenFailure (1.3.6.1.4.1.25506.2.2.1.1.3.0.3)
Module: HH3C-SSH-MIB
· hh3cSSHUserAuthFailure (1.3.6.1.4.1.25506.2.22.1.3.0.1)
Log messages
· LOGIN/5/LOGIN_FAILED
· SSHS/6/SSHS_AUTH_FAIL
RADIUS authentication login failure
Symptom
The administrator failed to log in to the device using RADIUS authentication.
Common causes
The following are the common causes of this type of issue:
· The configuration of the authentication method for the user line is incorrect.
· The protocol type supported by the VTY user line is incorrect.
· The configured authentication, authorization, and accounting schemes for the ISP domain are incorrect.
· Interaction with the RADIUS server failed.
· The value of the Login-Service attribute issued by the RADIUS server is incorrect.
· The RADIUS server failed to assign a user role.
Troubleshooting flow
Figure 145 shows the troubleshooting flowchart.
Figure 145 Flowchart for troubleshooting RADIUS authentication login failures
Solution
|
NOTE: For login issues with Web, NETCONF over SOAP, and FTP, inspection of the user line (class) configuration is not required. The other troubleshooting steps are the same. |
1. Check the user line configuration.
Execute the line vty first-number [ last-number ] command to enter the view of the specified VTY user line, and execute the display this command to view if the following configurations are correct:
¡ The authentication-mode is set to scheme.
¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.
¡ For SSH login, the protocol inbound is set to ssh or the default value is used.
2. Check the configuration in user line class view.
3. The configuration in user line view takes precedence over the configuration in user line class view. If the user line view does not contain any configuration, continue to check the settings in user line class view.
4. Execute the line class vty command to enter VTY user line class view, and use the display this command to verify if the following configurations are correct:
¡ The authentication-mode is set to scheme.
¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.
¡ For SSH login, the protocol inbound is set to ssh or the default value is used.
If the configurations in user line view and user line class view are incorrect, set the authentication mode to scheme as needed for the user line or user line class, and specify the supported protocol types for user login.
5. Identify whether the authentication, authorization, and accounting scheme configurations for the ISP domain are correct.
Execute the display domain command to view the configuration information.
¡ If a user login username includes the domain name (for example, test), verify if the value of the Login authentication scheme field for the domain is in the RADIUS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the RADIUS=xx format.
<Sysname> display domain test
Domain: test
State: Active
Login authentication scheme: RADIUS=rds
Default authentication scheme: Local
Default authorization scheme: Local
Default accounting scheme: Local
Accounting start failure action: Online
Accounting update failure action: Online
Accounting quota out action: Offline
Service type: HSI
Session time: Exclude idle time
NAS-ID: N/A
DHCPv6-follow-IPv6CP timeout: 60 seconds
Authorization attributes:
Idle cut: Disabled
Session timeout: Disabled
IGMP access limit: 4
MLD access limit: 4
¡ If the user login username does not include the domain name, execute the display this command in system view to view the configuration of domain default enable isp-name. In this example, the default domain name is system.
#
domain default enable system
#
- If this configuration exists, execute the display domain command to verify if the value of the Login authentication scheme field for the ISP domain is in the RADIUS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the RADIUS=xx format.
- If the configuration does not exist, execute the display domain command to verify if the value of the Login authentication scheme field for the system domain is in the RADIUS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the RADIUS=xx format.
The method for confirming the authorization and accounting configuration is similar. If the above configurations are incorrect, configure the RADIUS scheme for authentication, authorization, or accounting for login users in the relevant ISP domain. In this example, the specified RADIUS scheme is rd1.
<Sysname> system-view
[Sysname] domain test
[Sysname-isp-test] authentication login radius-scheme rd1
[Sysname-isp-test] authorization login radius-scheme rd1
[Sysname-isp-test] accounting login radius-scheme rd1
6. Use the RADIUS debugging information to troubleshoot the following faults:
¡ Execute the debugging radius packet command to enable RADIUS packet debugging. If the output debugging information shows Authentication reject, it indicates that the server has rejected the user's access request. In this case, continue to review the authentication logs recorded on the RADIUS server and contact the server administrator for appropriate processing based on the failure reasons described in the logs.
¡ Execute the debugging radius error command to enable RADIUS error debugging. If the output debugging information shows Invalid packet authenticator, it indicates that the shared key between the device and the server does not match. Try setting a matching shared key for the RADIUS scheme.
¡ Execute the debugging radius event command to enable RADIUS event debugging. If the output debugging information shows Response timed out, it indicates that the device received no response from the server, typically because the server is unreachable. Try troubleshooting the link connectivity issues between the device and the server.
7. Verify if the value of the Login-Service attribute issued by the RADIUS server matches the service type supported by the device.
Execute the debugging radius packet command to enable RADIUS packet debugging. Then, view the Login-Service attribute issued by the RADIUS server, and use the method described in "Mismatched user access type and the Login-Service attribute value issued by the RADIUS server" to resolve the issue.
8. Verify if the RADIUS server has assigned the correct user role.
Execute the debugging radius all command to enable all RADIUS debugging functions. If the connection disconnects immediately after the user enters the username and password, and no anomaly exists in the RADIUS event debugging or RADIUS error debugging output, it is possible that the RADIUS server failed to assign a user role or assigned an incorrect user role to the user. In this case, verify if the RADIUS packet debugging information includes the shell:roles=xx or Exec-Privilege=xx field.
¡ If not included, it means the RADIUS server did not assign a user role to the user. To solve this issue, use one of the following methods:
- On the device, use the role default-role enable [ role-name ] command to enable default user role authorization. This gives users a default user role when the server has not authorized any roles for them.
<Sysname> system-view
[Sysname] role default-role enable
- Contact the RADIUS server administrator to assign the appropriate user role to users.
¡ If included, but the specified user role does not exist on the device, contact the RADIUS server administrator to modify the user role settings or use the user-role role-name command to create the corresponding user role on the device.
9. If the issue persists, collect the following information and contact Technical Support:
¡ Execution results of the above steps.
¡ Device configuration file, log information, alarm messages, and debugging information.
¡ Use the debugging radius all command to enable all the RADIUS debugging functions to collect the device debugging information.
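When the collected debugging output is long, the keyword checks from steps 6 and 8 can be scripted. The following Python snippet is a minimal sketch that searches for the strings named in those steps. The function names and hint texts are illustrative, and the sketch assumes that the debugging output contains the strings exactly as written in this section.

```python
# Marker strings come from step 6; the hints paraphrase the suggested actions.
DEBUG_HINTS = [
    ("Authentication reject",
     "Server rejected the request: review the authentication logs on the server."),
    ("Invalid packet authenticator",
     "Shared key mismatch: set a matching shared key for the RADIUS scheme."),
    ("Response timed out",
     "No response from the server: check connectivity to the server."),
]


def triage_radius_debug(debug_text: str) -> list:
    """Return the hints whose marker string appears in the debugging output."""
    return [hint for marker, hint in DEBUG_HINTS if marker in debug_text]


def has_role_attribute(debug_text: str) -> bool:
    """Check whether the server issued a user role attribute (step 8)."""
    return "shell:roles=" in debug_text or "Exec-Privilege=" in debug_text


captured = "RADIUS/7/ERROR: Invalid packet authenticator"  # hypothetical capture
for hint in triage_radius_debug(captured):
    print(hint)
```

If triage_radius_debug returns no hints and has_role_attribute returns False for a session that disconnects right after login, the missing user role described in step 8 is the most likely cause.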
Related alarm and log messages
Alarm messages
Module: HH3C-UI-MAN-MIB
· hh3cLogInAuthenFailure (1.3.6.1.4.1.25506.2.2.1.1.3.0.3)
Module: HH3C-SSH-MIB
· hh3cSSHUserAuthFailure (1.3.6.1.4.1.25506.2.22.1.3.0.1)
Log messages
· LOGIN/5/LOGIN_AUTHENTICATION_FAILED
· LOGIN/5/LOGIN_FAILED
· SSHS/6/SSHS_AUTH_FAIL
HWTACACS authentication login failure
Symptom
The administrator failed to log in to the device using HWTACACS authentication.
Common causes
The following are the common causes of this type of issue:
· The configuration of the authentication method for the user line is incorrect.
· The protocol type supported by the VTY user line is incorrect.
· The configured authentication, authorization, and accounting schemes for the ISP domain are incorrect.
· Interaction with the HWTACACS server failed.
· The HWTACACS server failed to assign a user role.
Troubleshooting flow
Figure 146 shows the troubleshooting flowchart.
Figure 146 Flowchart for troubleshooting HWTACACS authentication login failures
Solution
|
NOTE: For login issues with Web, NETCONF over SOAP, and FTP, inspection of the user line (class) configuration is not required. The other troubleshooting steps are the same. |
1. Check the user line configuration.
Execute the line vty first-number [ last-number ] command to enter the view of the specified VTY user line, and execute the display this command to view if the following configurations are correct:
¡ The authentication-mode is set to scheme.
¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.
¡ For SSH login, the protocol inbound is set to ssh or the default value is used.
2. Check the configuration in user line class view.
3. The configuration in user line view takes precedence over the configuration in user line class view. If the user line view does not contain any configuration, continue to check the settings in user line class view.
4. Execute the line class vty command to enter VTY user line class view, and use the display this command to verify if the following configurations are correct:
¡ The authentication-mode is set to scheme.
¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.
¡ For SSH login, the protocol inbound is set to ssh or the default value is used.
If the configurations in user line view and user line class view are incorrect, set the authentication mode to scheme as needed for the user line or user line class, and specify the supported protocol types for user login.
5. Identify whether the authentication, authorization, and accounting scheme configurations for the ISP domain are correct.
Execute the display domain command to view the configuration information.
¡ If a user login username includes the domain name (for example, test), verify if the value of the Login authentication scheme field for the domain is in the HWTACACS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the HWTACACS=xx format.
<Sysname> display domain test
Domain: test
State: Active
Login authentication scheme: HWTACACS=hwt1
Default authentication scheme: Local
Default authorization scheme: Local
Default accounting scheme: Local
Accounting start failure action: Online
Accounting update failure action: Online
Accounting quota out action: Offline
Service type: HSI
Session time: Exclude idle time
NAS-ID: N/A
DHCPv6-follow-IPv6CP timeout: 60 seconds
Authorization attributes:
Idle cut: Disabled
Session timeout: Disabled
IGMP access limit: 4
MLD access limit: 4
¡ If the user login username does not include the domain name, execute the display this command in system view to view the configuration of domain default enable isp-name. In this example, the default domain name is system.
#
domain default enable system
#
- If this configuration exists, execute the display domain command to verify if the value of the Login authentication scheme field for the ISP domain is in the HWTACACS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the HWTACACS=xx format.
- If the configuration does not exist, execute the display domain command to verify if the value of the Login authentication scheme field for the system domain is in the HWTACACS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the HWTACACS=xx format.
The method for confirming the authorization and accounting configuration is similar. If the above configurations are incorrect, configure the HWTACACS scheme for authentication, authorization, or accounting for login users in the relevant ISP domain. In this example, the specified HWTACACS scheme is hwt1.
<Sysname> system-view
[Sysname] domain test
[Sysname-isp-test] authentication login hwtacacs-scheme hwt1
[Sysname-isp-test] authorization login hwtacacs-scheme hwt1
[Sysname-isp-test] accounting login hwtacacs-scheme hwt1
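The field check in this step can be sketched in Python as follows. This is a minimal sketch, assuming the display domain output has been captured as plain text. The function names effective_login_auth_scheme and uses_hwtacacs are hypothetical helpers for illustration and are not part of any H3C tool.

```python
from typing import Dict, Optional


def effective_login_auth_scheme(display_domain_output: str) -> Optional[str]:
    """Return the authentication scheme that applies to login users.

    The "Login authentication scheme" field is used when present.
    Otherwise the "Default authentication scheme" field is used.
    """
    fields: Dict[str, str] = {}
    for line in display_domain_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    if "Login authentication scheme" in fields:
        return fields["Login authentication scheme"]
    return fields.get("Default authentication scheme")


def uses_hwtacacs(display_domain_output: str) -> bool:
    """Check whether the effective value is in the HWTACACS=xx format."""
    scheme = effective_login_auth_scheme(display_domain_output)
    return scheme is not None and scheme.startswith("HWTACACS=")


# Sample taken from the display domain test output in this step.
sample = """\
Domain: test
State: Active
Login authentication scheme: HWTACACS=hwt1
Default authentication scheme: Local
"""
print(effective_login_auth_scheme(sample))
print(uses_hwtacacs(sample))
```

The same fallback applies to the authorization and accounting fields if the field names are changed accordingly.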
6. Use the HWTACACS debugging information to troubleshoot the following faults:
¡ Execute the debugging hwtacacs send-packet and debugging hwtacacs receive-packet commands to enable HWTACACS packet sending and receiving debugging. If the output debugging information shows status: STATUS_FAIL, the server rejected the user's access request. In this case, review the failure reason recorded in the HWTACACS authentication log and resolve the issue based on that reason.
¡ Execute the debugging hwtacacs error command to enable HWTACACS error debugging. If the output debugging information shows Failed to get available server, it indicates that the shared key between the device and the server does not match. Try setting a matching shared key for the HWTACACS scheme.
¡ Execute the debugging hwtacacs event command to enable HWTACACS event debugging. If the output debugging information shows Connection timed out, it indicates that the device is unreachable from the server. Try troubleshooting the link connectivity issues between the device and the server.
7. Verify if the HWTACACS server has assigned the correct user role.
Execute the debugging hwtacacs all command to enable all HWTACACS debugging functions. If the connection disconnects immediately after the user logs in, and no anomaly exists in the HWTACACS event debugging output or HWTACACS error debugging output, it is possible that the HWTACACS server failed to assign a user role to the user. In this case, verify if the HWTACACS packet debugging information includes the priv-lvl=xx or roles=xx field.
¡ If not included, it means the HWTACACS server did not assign a user role to the user. To solve this issue, use one of the following methods:
- On the device, use the role default-role enable rolename command to enable default user role authorization. This gives users a default user role when the server has not authorized any roles for them.
<Sysname> system-view
[Sysname] role default-role enable
- Contact the HWTACACS server administrator to assign the appropriate user role to users. The authorization role configuration on the HWTACACS server must use the format roles="name1 name2 namen", where name1, name2, and namen are the user roles to be authorized and issued to users. Multiple roles are allowed and must be separated by spaces.
¡ If included, but the specified user role does not exist on the device, contact the HWTACACS server administrator to modify the user role settings or use the user-role role-name command to create the corresponding user role on the device.
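The role check in this step can be sketched as follows. This is a minimal sketch, assuming the authorization string uses the roles="name1 name2 namen" format described above and that the set of user roles existing on the device has been collected separately. The names parse_roles and missing_roles are hypothetical.

```python
import re
from typing import Iterable, List


def parse_roles(attribute: str) -> List[str]:
    """Extract role names from a string such as roles="name1 name2"."""
    match = re.search(r'roles="([^"]*)"', attribute)
    return match.group(1).split() if match else []


def missing_roles(attribute: str, roles_on_device: Iterable[str]) -> List[str]:
    """Return the issued roles that do not exist on the device."""
    existing = set(roles_on_device)
    return [role for role in parse_roles(attribute) if role not in existing]


# Hypothetical example values for illustration only.
issued = 'roles="network-admin audit-role"'
print(parse_roles(issued))
print(missing_roles(issued, {"network-admin", "network-operator"}))
```

An empty result from parse_roles corresponds to the case where the server did not assign a user role. A non-empty result from missing_roles corresponds to the case where an assigned role does not exist on the device.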
8. If the issue persists, collect the following information and contact Technical Support:
¡ Execution results of the above steps.
¡ Device configuration file, log information, alarm messages, and debugging information.
¡ Use the debugging hwtacacs all command to enable all the HWTACACS debugging functions to collect the device debugging information.
Related alarm and log messages
Alarm messages
Module: HH3C-UI-MAN-MIB
· hh3cLogInAuthenFailure (1.3.6.1.4.1.25506.2.2.1.1.3.0.3)
Module: HH3C-SSH-MIB
· hh3cSSHUserAuthFailure (1.3.6.1.4.1.25506.2.22.1.3.0.1)
Log messages
· LOGIN/5/LOGIN_AUTHENTICATION_FAILED
· LOGIN/5/LOGIN_FAILED
· SSHS/6/SSHS_AUTH_FAIL
LDAP authentication login failure
Symptom
The administrator failed to log in to the device using LDAP authentication.
Common causes
The following are the common causes of this type of issue:
· The configuration of the authentication method for the user line is incorrect.
· The protocol type supported by the VTY user line is incorrect.
· The configured authentication, authorization, and accounting schemes for the ISP domain are incorrect.
· Interaction with the LDAP server failed.
Troubleshooting flow
Figure 147 shows the troubleshooting flowchart.
Figure 147 Flowchart for troubleshooting LDAP authentication login failures
Solution
NOTE: For login issues with Web, NETCONF over SOAP, and FTP, inspection of the user line (class) configuration is not required. The other troubleshooting steps are the same.
1. Check the user line configuration.
Execute the line vty first-number [ last-number ] command to enter the view of the specified VTY user line, and execute the display this command to view if the following configurations are correct:
¡ The authentication-mode is set to scheme.
¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.
¡ For SSH login, the protocol inbound is set to ssh or the default value is used.
2. Check the configuration in user line class view.
3. The configuration in user line view takes precedence over the configuration in user line class view. If the user line view does not contain any configuration, continue to check the settings in user line class view.
4. Execute the line class vty command to enter VTY user line class view, and use the display this command to verify if the following configurations are correct:
¡ The authentication-mode is set to scheme.
¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.
¡ For SSH login, the protocol inbound is set to ssh or the default value is used.
If the configurations in user line view and user line class view are incorrect, set the authentication mode to scheme for the user line or user line class as needed, and specify the protocol types supported for user login.
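The precedence rule between user line view and user line class view can be sketched as follows. This is a minimal sketch with several assumptions: the display this output is assumed to list settings one per line as authentication-mode and protocol inbound entries, the value all for protocol inbound is assumed to permit every protocol, and the helper names are hypothetical.

```python
from typing import Dict


def parse_line_settings(display_this_output: str) -> Dict[str, str]:
    """Collect the two settings checked in this procedure."""
    settings: Dict[str, str] = {}
    for raw in display_this_output.splitlines():
        line = raw.strip()
        if line.startswith("authentication-mode "):
            settings["authentication-mode"] = line.split()[1]
        elif line.startswith("protocol inbound "):
            settings["protocol inbound"] = line.split()[2]
    return settings


def effective_settings(line_view: str, class_view: str) -> Dict[str, str]:
    """User line view settings take precedence over class view settings."""
    merged = parse_line_settings(class_view)
    merged.update(parse_line_settings(line_view))
    return merged


def login_allowed(line_view: str, class_view: str, protocol: str) -> bool:
    """Check authentication-mode scheme and the inbound protocol."""
    settings = effective_settings(line_view, class_view)
    inbound = settings.get("protocol inbound")  # None: default value is used
    # Assumption: "all" permits every protocol.
    return (settings.get("authentication-mode") == "scheme"
            and inbound in (None, "all", protocol))


# Hypothetical captured output for illustration only.
line_view = "line vty 0 63\n authentication-mode scheme\n"
class_view = "line class vty\n protocol inbound ssh\n"
print(login_allowed(line_view, class_view, "ssh"))
print(login_allowed(line_view, class_view, "telnet"))
```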
5. Identify whether the authentication, authorization, and accounting scheme configurations for the ISP domain are accurate.
Execute the display domain command to view the configuration information.
¡ If a user login username includes the domain name (for example, test), verify if the value of the Login authentication scheme field for the domain is in the LDAP=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the LDAP=xx format.
<Sysname> display domain test
Domain: test
State: Active
Login authentication scheme: LDAP=ldp
Default authentication scheme: Local
Default authorization scheme: Local
Default accounting scheme: Local
Accounting start failure action: Online
Accounting update failure action: Online
Accounting quota out action: Offline
Service type: HSI
Session time: Exclude idle time
NAS-ID: N/A
DHCPv6-follow-IPv6CP timeout: 60 seconds
Authorization attributes:
Idle cut: Disabled
Session timeout: Disabled
IGMP access limit: 4
MLD access limit: 4
¡ If the user login username does not include the domain name, execute the display this command in system view to view the configuration of domain default enable isp-name. In this example, the default domain name is system.
#
domain default enable system
#
- If this configuration exists, execute the display domain command to verify if the value of the Login authentication scheme field for the ISP domain is in the LDAP=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the LDAP=xx format.
- If the configuration does not exist, execute the display domain command to verify if the value of the Login authentication scheme field for the system domain is in the LDAP=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the LDAP=xx format.
If the above configurations are incorrect, configure the LDAP authentication scheme for login users in the relevant ISP domain. LDAP servers generally act as authentication servers, and authorization and accounting are usually configured differently, such as local, RADIUS, or HWTACACS. In this example, authentication uses the LDAP scheme ccc, and local authorization and accounting are used.
<Sysname> system-view
[Sysname] domain test
[Sysname-isp-test] authentication login ldap-scheme ccc
[Sysname-isp-test] authorization login local
[Sysname-isp-test] accounting login local
6. Use the LDAP debugging information to troubleshoot the following faults:
Execute the debugging ldap error command to enable LDAP error debugging. Use the following debugging information printed by the system to identify the issue:
¡ If the output information shows Failed to perform binding operation as administrator, it indicates that the administrator DN configured in LDAP server view does not exist or the administrator password is incorrect. To address this issue, enter LDAP server view and execute the login-dn and login-password commands to modify the administrator DN and password configuration, respectively. In this example, the DN for a user with the administrator role is cn=administrator,cn=users,dc=ld, and the administrator password is admin!123456.
<Sysname> system-view
[Sysname] ldap server ldap1
[Sysname-ldap-server-ldap1] login-dn cn=administrator,cn=users,dc=ld
[Sysname-ldap-server-ldap1] login-password simple admin!123456
¡ If the output information shows Failed to get bind result.errno = 115, it indicates that the LDAP service is not enabled on the peer or the LDAP server is experiencing an anomaly. To address this issue, contact the administrator of the LDAP server.
¡ If the output information shows Bind operation failed, it indicates the device cannot reach the LDAP server. Try troubleshooting connectivity issues between the device and the server.
¡ If the output information shows Failed to perform binding operation as user, it indicates the password of the LDAP user is incorrect.
¡ If the output information shows Failed to bind user username for the result of searching DN is NULL, it indicates the LDAP user does not exist. To address this issue, contact the administrator of the LDAP server.
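The mapping from LDAP error debugging output to likely causes can be sketched as a lookup table. This is a minimal sketch that only restates the messages and causes listed in this step. The names LDAP_ERROR_HINTS and diagnose_ldap_error are hypothetical, and the exact surrounding text of a real debugging line might differ.

```python
from typing import Optional

# (message fragment, likely cause) pairs taken from the list above.
LDAP_ERROR_HINTS = [
    ("Failed to perform binding operation as administrator",
     "Administrator DN does not exist or the administrator password is incorrect"),
    ("Failed to get bind result",
     "LDAP service is not enabled on the peer or the LDAP server is abnormal"),
    ("Bind operation failed",
     "Device cannot reach the LDAP server"),
    ("Failed to perform binding operation as user",
     "Password of the LDAP user is incorrect"),
    ("Failed to bind user",
     "LDAP user does not exist"),
]


def diagnose_ldap_error(debug_line: str) -> Optional[str]:
    """Return the likely cause for a debugging line, or None if unknown."""
    for fragment, cause in LDAP_ERROR_HINTS:
        if fragment in debug_line:
            return cause
    return None


print(diagnose_ldap_error("Failed to get bind result.errno = 115"))
```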
7. If the issue persists, collect the following information and contact Technical Support:
¡ Execution results of the above steps.
¡ Device configuration file, log information, alarm messages, and debugging information.
¡ Use the debugging ldap all command to enable all the LDAP debugging functions to collect the device debugging information.
Related alarm and log messages
Alarm messages
Module: HH3C-UI-MAN-MIB
· hh3cLogInAuthenFailure (1.3.6.1.4.1.25506.2.2.1.1.3.0.3)
Module: HH3C-SSH-MIB
· hh3cSSHUserAuthFailure (1.3.6.1.4.1.25506.2.22.1.3.0.1)
Log messages
· LOGIN/5/LOGIN_AUTHENTICATION_FAILED
· LOGIN/5/LOGIN_FAILED
· SSHS/6/SSHS_AUTH_FAIL
Ineffective dynamic VLAN issued by the RADIUS authentication server
Symptom
When an 802.1X or MAC authentication user is online, the dynamically authorized VLAN attribute issued by the RADIUS authentication server does not take effect.
Common causes
The following are the common causes of this type of issue:
· The RADIUS DAE service is disabled.
· The content of the authorization attribute issued by RADIUS is incorrect.
· The user failed to obtain the dynamic VLAN.
· The interface type configuration for the dynamically authorized VLAN is incorrect.
· The dynamically authorized VLAN does not exist.
Troubleshooting flow
Figure 148 shows the troubleshooting flowchart.
Solution
1. Verify if the RADIUS DAE service is enabled.
In system view, execute the display current-configuration | include radius command to verify if the radius dynamic-author server configuration exists.
¡ If the configuration exists, execute the radius dynamic-author server command to enter RADIUS DAE server view and verify if the RADIUS DAE client and RADIUS DAE service port configurations are correct.
<Sysname> system-view
[Sysname] radius dynamic-author server
[Sysname-radius-da-server] display this
#
radius dynamic-author server
port 3790
client ip 3.3.3.3 key cipher $c$3$kiAORLht3S3rTCmFq0uWXPgV8PjI2Q==
#
¡ If the configuration does not exist, execute the radius dynamic-author server command to enable the RADIUS DAE service, and enter RADIUS DAE server view to configure the RADIUS DAE client and RADIUS DAE service port. In this example, the client IP address is 1.1.1.1, the shared key is 123456, and the service port is 3798.
<Sysname> system-view
[Sysname] radius dynamic-author server
[Sysname-radius-da-server] client ip 1.1.1.1 key simple 123456
[Sysname-radius-da-server] port 3798
2. Verify if the VLAN attributes issued by the RADIUS server are correct.
Execute the debugging radius packet command to enable RADIUS packet debugging, and configure the RADIUS server to issue the VLAN attributes again.
The RADIUS server must issue the following standard attributes at the same time to issue VLAN information:
¡ The Tunnel-Type attribute, number 64, is an Integer with a fixed value of 13, representing VLAN.
¡ The Tunnel-Medium-Type attribute, number 65, is an Integer with a fixed value of 6, representing IEEE 802.
¡ The Tunnel-Private-Group-Id attribute, number 81, is a String, representing the VLAN ID or VLAN name.
View the output RADIUS debugging information and verify that the COA request contains the three standard attributes, as shown in the following example.
*Aug 3 02:33:18:700 2021 Sysname RADIUS/7/PACKET:
Received a RADIUS packet
Server IP : 128.11.3.48
NAS-IP : 128.11.30.69
VPN instance : --(public)
Server port : 55805
Type : COA request
Length : 41
Packet ID : 34
User-Name="user"
Tunnel-Type:0=VLAN
Tunnel-Medium-Type:0=IEEE-802
Tunnel-Private-Group-Id:0="2"
If the output authorization attributes are incorrect, contact the administrator of the RADIUS server to modify the authorization VLAN configuration and try to re-issue the VLAN. If the output authorization attributes are correct, proceed to the next step.
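The attribute check in this step can be sketched as follows. This is a minimal sketch, assuming the RADIUS packet debugging output has been captured as text in the layout shown above. The names parse_attributes and vlan_authorization_problems are hypothetical.

```python
import re
from typing import Dict, List

# Matches lines such as Tunnel-Type:0=VLAN or User-Name="user".
ATTRIBUTE = re.compile(r'^\s*([A-Za-z0-9-]+)(?::\d+)?="?([^"]*)"?\s*$')


def parse_attributes(debug_output: str) -> Dict[str, str]:
    """Collect name=value attributes and ignore header lines."""
    attributes: Dict[str, str] = {}
    for line in debug_output.splitlines():
        match = ATTRIBUTE.match(line)
        if match:
            attributes[match.group(1)] = match.group(2).strip()
    return attributes


def vlan_authorization_problems(attributes: Dict[str, str]) -> List[str]:
    """List which of the three required attributes are missing or wrong."""
    problems = []
    if attributes.get("Tunnel-Type") != "VLAN":
        problems.append("Tunnel-Type must be VLAN (13)")
    if attributes.get("Tunnel-Medium-Type") != "IEEE-802":
        problems.append("Tunnel-Medium-Type must be IEEE-802 (6)")
    if not attributes.get("Tunnel-Private-Group-Id"):
        problems.append("Tunnel-Private-Group-Id must carry a VLAN ID or name")
    return problems


sample = """\
Received a RADIUS packet
Server IP : 128.11.3.48
Type : COA request
User-Name="user"
Tunnel-Type:0=VLAN
Tunnel-Medium-Type:0=IEEE-802
Tunnel-Private-Group-Id:0="2"
"""
print(vlan_authorization_problems(parse_attributes(sample)))
```

An empty list means all three attributes are present with the expected values.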
3. Verify if the user successfully received the assigned VLAN information.
Execute the display dot1x connection or display mac-authentication connection command to verify if the online user information includes dynamic VLAN authorization information issued by the server.
¡ If authorized VLAN information exists, it indicates successful VLAN distribution.
¡ If no authorization VLAN information exists, it means the VLAN was not successfully deployed. In this case, as a best practice, continue identifying the cause of the fault under the guidance of technical support based on the RADIUS debugging information.
4. Verify if the authorized VLAN exists.
Execute the display vlan brief command to verify if the dynamically issued VLAN exists. If the VLAN does not exist, execute the vlan vlan-id command in system view to create the VLAN.
5. Verify if the interface type for the VLAN is correct.
Different types of interfaces have different requirements for successfully joining the authorized VLAN. For specific configuration requirements, see configuring 802.1X authentication and configuring MAC authentication in Security Configuration Guide.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Execution results of the above steps.
¡ Device configuration file, debugging information, and diagnosis information.
Related alarm and log messages
Alarm messages
None.
Log messages
None.
Ineffective or partially effective Filter-Id attribute issued by the RADIUS server
Symptom
The RADIUS authentication server issues an ACL to the user through the Filter-Id attribute, but the user cannot access network resources normally after authentication and login.
Common causes
The following are the common causes of this type of issue:
· The content of the authorization attribute issued by RADIUS is incorrect.
· The access user failed to obtain the ACL.
· The authorized ACL does not exist.
Troubleshooting flow
Figure 149 shows the troubleshooting flowchart.
Solution
1. Verify if the Filter-ID attribute issued by the RADIUS server is correct.
Execute the debugging radius packet command to enable RADIUS packet debugging, and configure the RADIUS server to re-issue the Filter-ID attribute. View the output debugging information on the device.
¡ If the issued Filter-ID attribute is purely numeric, it indicates that an ACL number has been issued.
*Aug 18 16:54:49:670 2021 Sysname RADIUS/7/PACKET: -MDC=1;
Received a RADIUS packet
Server IP : 128.11.3.48
NAS-IP : 128.11.30.69
VPN instance : --(public)
Server port : 54175
Type : COA request
Length : 32
Packet ID : 200
User-Name="user"
Filter-Id="2001"
¡ If the Filter-ID attribute value is not entirely numeric and the next attribute delivered is H3c-ACL-Version (with an integer value in the range of 1 to 4), it indicates that an ACL name has been assigned.
*Aug 18 16:55:19:798 2021 Sysname RADIUS/7/PACKET: -MDC=1;
Received a RADIUS packet
Server IP : 128.11.3.48
NAS-IP : 128.11.30.69
VPN instance : --(public)
Server port : 54176
Type : COA request
Length : 48
Packet ID : 157
User-Name="user"
Filter-Id="aclname1"
H3c-ACL-Version=1
If the Filter-ID attribute is not issued as expected, or if the issued ACL type is not supported by the device, contact the administrator of the RADIUS server to modify the authorization ACL configuration and try to re-issue the Filter-ID. If the issue persists, proceed to the next step.
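The rule for interpreting the Filter-Id value can be sketched as follows. This is a minimal sketch of the two cases described in this step. The function name classify_filter_id and its return labels are hypothetical.

```python
from typing import Optional


def classify_filter_id(filter_id: str, acl_version: Optional[int] = None) -> str:
    """Classify a Filter-Id value.

    filter_id:   value of the Filter-Id attribute.
    acl_version: value of the H3c-ACL-Version attribute that follows
                 Filter-Id, or None if that attribute is absent.
    """
    if filter_id.isascii() and filter_id.isdigit():
        return "acl-number"
    if acl_version is not None and 1 <= acl_version <= 4:
        return "acl-name"
    return "unrecognized"


print(classify_filter_id("2001"))
print(classify_filter_id("aclname1", 1))
```

A result of unrecognized corresponds to the case where the Filter-Id attribute is not issued as expected and the RADIUS server configuration needs to be reviewed.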
2. Verify if the user successfully received the assigned ACL information.
Execute the display dot1x connection or display mac-authentication connection command to verify if the online user information includes ACL authorization information.
¡ If authorized ACL information exists, it indicates successful ACL distribution.
¡ If no authorization ACL information exists, it means the ACL was not successfully deployed. In this case, as a best practice, continue identifying the cause of the fault under the guidance of technical support based on the RADIUS debugging information.
3. Verify if the corresponding ACL has been created on the device.
Execute the display acl all command to verify if the issued ACL exists.
¡ If the ACL has not been created, execute the acl number acl-number [ name acl-name ] command in system view to create the ACL.
¡ If the ACL exists, verify if the ACL configuration is correct.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Execution results of the above steps.
¡ Device configuration file, debugging information, and diagnosis information.
Related alarm and log messages
Alarm messages
None.
Log messages
None.
RADIUS authentication fail-permit failed for 802.1X, MAC, or Web authentication users
Symptom
User fail-permit fails when the RADIUS server is unreachable during 802.1X, MAC, or Web authentication. The following conditions might occur:
· Fail-permit is not performed and new users cannot come online.
· Fail-permit succeeds, but users cannot visit critical resources.
· Fail-permit succeeds, but users are kicked off.
Common causes
The following are the common causes of this type of issue:
· Free VLAN is configured on the interface for port security. User traffic in the specified VLAN is not authenticated.
· The fail-permit policy is not configured as required.
· Not all RADIUS servers under the RADIUS authentication scheme are unreachable. Accessible RADIUS servers exist, and other reasons cause the user authentication to fail.
· A backup RADIUS authentication method (Local or None) is configured. The backup method is used when the RADIUS authentication server cannot be reached.
· The critical resources configured for 802.1X and MAC authentication fail-permit do not exist.
· Online user fail-permit is not enabled when 802.1X or MAC authentication offline detection is enabled on the interface. In this case, the device logs off users if no user traffic is detected during an offline detection timeout.
Troubleshooting flow
The troubleshooting flowchart is shown in Figure 150.
Figure 150 Troubleshooting flowchart
Solution
1. Verify if a free VLAN is configured on the port for port security.
If a free VLAN is configured for port security on the user access port, traffic from 802.1X and MAC authentication users in the VLAN will bypass authentication and be forwarded directly. As a result, these users will not trigger fail-permit. The free VLAN configuration example is as follows:
<Sysname> system-view
[Sysname] interface GigabitEthernet 2/0/1
[Sysname-GigabitEthernet2/0/1] port-security free-vlan 2 3
To disable direct forwarding of user traffic in a specific VLAN, delete the free VLAN configuration.
2. Verify if the configured fail-permit policy is correct.
The device supports the following types of fail-permit policies:
¡ ISP domain-based fail-permit (for 802.1X, MAC, or Web authentication users): When the device enters fail-permit state, newly connected users within the authentication domain will "escape" from the current domain and directly access the configured critical domain without authentication. The critical domain configuration example is as follows:
# In ISP domain abc, configure domain dm2 as the critical domain.
<Sysname> system-view
[Sysname] domain abc
[Sysname-isp-abc] authen-radius-unavailable online domain dm2
¡ Port-based fail-permit (for 802.1X and MAC authentication users): When the device enters fail-permit state, new users connecting to the port can directly access certain critical resources (such as critical VLAN, critical VSI, critical microsegmentation, or critical resources within a critical profile) bound to the current port without authentication. The critical resource configuration example is as follows:
# Specify VLAN 100 as the critical VLAN of GigabitEthernet 2/0/1.
<Sysname> system-view
[Sysname] interface GigabitEthernet 2/0/1
[Sysname-GigabitEthernet2/0/1] dot1x critical vlan 100
If both types of fail-permit policies are configured, the ISP domain-based fail-permit policy has a higher priority. That is, if the RADIUS server is unreachable, new users will directly enter the critical domain bound to the authentication domain and come online in the critical domain. However, the users cannot access the critical resources on the port.
You can execute the display domain command to verify if a critical domain is configured under the ISP domain for user authentication. For example, in the display information, the Authen-radius-unavailable field shows that the configured critical domain is dm2.
<Sysname> display domain abc
Domain: abc
State: Active
Login authorization scheme: RADIUS=bbb
LAN access authentication scheme: RADIUS=bbb
LAN access accounting scheme: RADIUS=bbb
Default authentication scheme: Local
Default authorization scheme: Local
Default accounting scheme: Local
Accounting start failure action: Online
Accounting update failure action: Online
Accounting quota out policy: Offline
Service type: HSI
Session time: Exclude idle time
Dual-stack accounting method: Merge
Authorization attributes:
Idle cut: Disabled
IGMP access limit: 4
MLD access limit: 4
Authen-fail action: Offline
Authen-radius-unavailable: Online domain dm2
Authen-radius-recover: Not configured
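The priority between the two fail-permit policies can be sketched as follows. This is a minimal sketch, assuming the display domain output has been captured as text and that the Authen-radius-unavailable field uses the Online domain name form shown above when a critical domain is configured. The helper names are hypothetical.

```python
from typing import Optional


def critical_domain(display_domain_output: str) -> Optional[str]:
    """Return the critical domain name, or None if none is shown."""
    for line in display_domain_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Authen-radius-unavailable":
            words = value.split()
            if len(words) == 3 and words[:2] == ["Online", "domain"]:
                return words[2]
    return None


def fail_permit_policy(display_domain_output: str,
                       port_has_critical_resource: bool) -> str:
    """The ISP domain-based policy has priority over the port-based one."""
    domain = critical_domain(display_domain_output)
    if domain is not None:
        return "domain:" + domain
    if port_has_critical_resource:
        return "port"
    return "none"


sample = """\
Domain: abc
Authen-fail action: Offline
Authen-radius-unavailable: Online domain dm2
Authen-radius-recover: Not configured
"""
print(fail_permit_policy(sample, port_has_critical_resource=True))
```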
3. Verify if a RADIUS scheme is configured for users.
Execute the display domain command to verify if a RADIUS scheme is configured for LAN access users. In the example, the LAN access authentication scheme field shows that an LDAP authentication scheme is configured and no RADIUS scheme is configured.
<Sysname> display domain abc
Domain: abc
State: Active
Login authorization scheme: RADIUS=bbb
LAN access authentication scheme: LDAP=ldp
LAN access authorization scheme : Local
LAN access accounting scheme: Local
Default authentication scheme: Local
Default authorization scheme: Local
Default accounting scheme: Local
Accounting start failure action: Online
Accounting update failure action: Online
Accounting quota out policy: Offline
Service type: HSI
Session time: Exclude idle time
Dual-stack accounting method: Merge
Authorization attributes:
Idle cut: Disabled
IGMP access limit: 4
MLD access limit: 4
Authen-fail action: Offline
Authen-radius-unavailable: Online domain dm2
Authen-radius-recover: Not configured
If no RADIUS scheme is configured in the authentication domain for LAN access users, configure a scheme as follows:
# Configure RADIUS scheme rd for LAN access users in ISP domain abc.
[Sysname] domain abc
[Sysname-isp-abc] authentication lan-access radius-scheme rd
4. Verify if all RADIUS servers are unreachable under the RADIUS authentication scheme used for user authentication.
The device enters fail-permit state only when all RADIUS servers in the RADIUS scheme used for user authentication are in Block state. Execute the display radius scheme command to view the state of the authentication servers under the RADIUS scheme. In the display information, the State fields of all the RADIUS authentication servers are Active, indicating that the servers are reachable. The fail-permit function will not be triggered.
<Sysname> display radius scheme rd
RADIUS scheme name: rd
Index: 0
Primary authentication server:
Host name: Not Configured
IP : 128.11.3.33 Port: 1812
VPN : Not configured
State: Active (duration: 0 weeks, 0 days, 0 hours, 43 minutes, 22 seconds)
Most recent state changes:
2022/03/30 15:15:59 Changed to active state
2022/03/30 15:11:05 Changed to blocked state
2022/03/30 15:09:55 Changed to active state
2022/03/30 15:05:01 Changed to blocked state
2022/03/30 08:58:59 Changed to active state
Test profile: Not configured
Weight: 0
Primary accounting server:
Host name: Not Configured
IP : 128.11.3.33 Port: 1813
VPN : Not configured
State: Blocked (mandatory)
Most recent state changes:
2022/03/30 08:59:11 Changed to blocked state
2022/03/29 19:15:04 Changed to active state
2022/03/29 19:10:06 Changed to blocked state
2022/03/29 19:03:08 Changed to active state
2022/03/29 18:58:15 Changed to blocked state
Weight: 0
Second authentication server:
Host name: Not Configured
IP : 1.12.3.4 Port: 1812
VPN : Not configured
State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 10 seconds)
Most recent state changes:
2022/03/30 15:59:11 Changed to active state
Test profile: Not configured
Weight: 0
Accounting-On function : Disabled
extended function : Disabled
retransmission times : 50
retransmission interval(seconds) : 3
Timeout Interval(seconds) : 3
Retransmission Times : 3
Retransmission Times for Accounting Update : 5
Server Quiet Period(minutes) : 5
Realtime Accounting Interval(seconds) : 720
Stop-accounting packets buffering : Enabled
Retransmission times : 500
NAS IP Address : Not configured
Local NAS IP Address : Not configured
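The condition for entering fail-permit state can be sketched as follows. This is a minimal sketch, assuming the display radius scheme output has been captured as text in the layout shown above, where each server section starts with a line ending in server: and contains a State: line. The helper names are hypothetical.

```python
from typing import List


def authentication_server_states(display_output: str) -> List[str]:
    """Return the State value of each authentication server section."""
    states: List[str] = []
    in_authentication_section = False
    for raw in display_output.splitlines():
        line = raw.strip()
        if line.endswith("server:"):
            # Accounting server sections are skipped.
            in_authentication_section = line.endswith("authentication server:")
        elif in_authentication_section and line.startswith("State:"):
            states.append(line.split(":", 1)[1].split()[0])
    return states


def fail_permit_can_trigger(display_output: str) -> bool:
    """True only when every authentication server is blocked."""
    states = authentication_server_states(display_output)
    return bool(states) and all(state == "Blocked" for state in states)


sample = """\
Primary authentication server:
IP : 128.11.3.33 Port: 1812
State: Active (duration: 0 weeks, 0 days, 0 hours, 43 minutes, 22 seconds)
Primary accounting server:
IP : 128.11.3.33 Port: 1813
State: Blocked (mandatory)
Second authentication server:
IP : 1.12.3.4 Port: 1812
State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 10 seconds)
"""
print(authentication_server_states(sample))
print(fail_permit_can_trigger(sample))
```

The blocked accounting server in the sample does not affect the result because only authentication servers are considered.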
5. Verify if the RADIUS scheme is the authentication method in use.
If a backup RADIUS authentication method (Local or None) is configured, the backup method is used when the RADIUS authentication server cannot be reached. Fail-permit will not be triggered.
Execute the display domain command to view the authentication method configured for LAN access users in the user authentication domain. In the example, the LAN access authentication scheme field shows that the preferred RADIUS authentication scheme is rd and local authentication can be used if the authentication scheme is unavailable.
<Sysname> display domain abc
Domain: abc
State: Active
Login authorization scheme: RADIUS=bbb
LAN access authentication scheme: RADIUS=rd, Local
LAN access authorization scheme: RADIUS=rd, Local
LAN access accounting scheme: RADIUS=rd, Local
Default authentication scheme: Local
Default authorization scheme: Local
Default accounting scheme: Local
Accounting start failure action: Online
Accounting update failure action: Online
Accounting quota out policy: Offline
Service type: HSI
Session time: Exclude idle time
Dual-stack accounting method: Merge
Authorization attributes:
Idle cut: Disabled
IGMP access limit: 4
MLD access limit: 4
Authen-fail action: Offline
Authen-radius-unavailable: Online domain dm2
Authen-radius-recover: Not configured
In this scenario, to trigger user fail-permit when the RADIUS server is unreachable, delete the configured backup authentication method, making RADIUS authentication the last method.
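The backup method check in this step can be sketched as follows. This is a minimal sketch, assuming the display domain output has been captured as text and that multiple methods are separated by commas as in the example above. The helper names are hypothetical.

```python
from typing import List

FIELD = "LAN access authentication scheme"


def lan_access_auth_methods(display_domain_output: str) -> List[str]:
    """Return the ordered authentication methods for LAN access users."""
    for line in display_domain_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == FIELD:
            return [method.strip() for method in value.split(",")]
    return []


def radius_has_backup_method(display_domain_output: str) -> bool:
    """True when a method such as Local or None follows the RADIUS scheme."""
    methods = lan_access_auth_methods(display_domain_output)
    return len(methods) > 1 and methods[0].startswith("RADIUS=")


sample = """\
Domain: abc
LAN access authentication scheme: RADIUS=rd, Local
LAN access authorization scheme: RADIUS=rd, Local
"""
print(lan_access_auth_methods(sample))
print(radius_has_backup_method(sample))
```

When radius_has_backup_method returns True, the backup method is used instead of fail-permit if the RADIUS server is unreachable.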
6. Verify if critical resources are configured on the port.
¡ 802.1X, MAC, and Web authentication users in the critical domain can access the authorized resources configured in the domain. First, execute the display domain command to view the Authorization attributes field in the critical domain, and then configure the corresponding authorization resources on the device.
¡ 802.1X and MAC authentication users that came online through the port-based fail-permit policy can access the critical resources configured on the port. First, view the critical resource configuration on the user authentication interface, and then create the corresponding authorization resources on the device.
[Sysname-GigabitEthernet2/0/24] display this
#
interface GigabitEthernet2/0/24
port link-mode bridge
dot1x critical vlan 24
#
7. Verify if offline detection is enabled on the user access port.
If offline detection is enabled, by default, when all RADIUS authentication servers in the authentication domain are unreachable, the device logs off users with no traffic within a detection period.
In this example, the command output shows that offline detection is enabled on the access port for MAC authentication users.
<Sysname> display mac-authentication
Global MAC authentication parameters:
MAC authentication : Enabled
Authentication method : PAP
DR member configuration conflict : Unknown
Username format : MAC address in lowercase(xxxxxxxxxxxx)
Username : mac
Password : Not configured
MAC range accounts : 2
MAC address Mask Username
2222-0000-0000 ffff-0000-0000 user1
4444-0000-0000 ffff-0000-0000 user1
Offline detect period : 300 s
Quiet period : 60 s
Server timeout : 100 s
Reauth period : 3600 s
User aging period for critical VLAN : 1000 s
User aging period for critical VSI : 1000 s
User aging period for guest VLAN : 1000 s
User aging period for guest VSI : 1000 s
User aging period for critical microsegment: 1000 s
Temporary user aging period : 60 s
Authentication domain : Not configured, use default domain
HTTP proxy port list : Total 10 ports
1-3, 5, 7, 9, 11-13, 15
HTTPS proxy port list : Not configured
Max number of silent MACs : 31236 (per slot)
Online MAC-auth wired users : 1
Online MAC-auth wireless users : 2
Silent MAC users:
MAC address VLAN ID From port Port index
0001-0000-0001 100 GE2/0/2 21
GigabitEthernet2/0/1 is link-up
MAC authentication : Enabled
Carry User-IP : Disabled
Authentication domain : Not configured
Auth-delay timer : Enabled
Auth-delay period : 60 s
Periodic reauth : Enabled
Reauth period : 120 s
Re-auth server-unreachable : Logoff
Guest VLAN : 100
Guest VLAN reauthentication : Enabled
Guest VLAN auth-period : 150 s
Critical VLAN : Not configured
Critical voice VLAN : Disabled
Host mode : Single VLAN
Offline detection : Enabled
Authentication order : Parallel
User aging : Enabled
Server-recovery online-user-sync : Enabled
...omit...
To use offline detection while the RADIUS authentication server is reachable and still allow users to stay online when the authentication server fails, enable online user fail-permit on the device.
For MAC authentication users, the configuration method to enable online user fail-permit on an interface is as follows:
<Sysname> system-view
[Sysname] interface GigabitEthernet 2/0/1
[Sysname-GigabitEthernet2/0/1] mac-authentication auth-server-unavailable escape
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Device configuration file, debugging information, and diagnosis information.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
IPoE user fail-permit failure during RADIUS authentication
Symptom
During IPoE user authentication, the RADIUS server is unreachable and the fail-permit function fails, preventing users from coming online.
Common causes
The following are the common causes of this type of issue:
· The fail-permit policy is not configured as required.
· Not all RADIUS servers under the RADIUS authentication scheme are unreachable. Accessible RADIUS servers exist, and other reasons cause the user authentication to fail.
· A backup RADIUS authentication method (Local or None) is configured. The backup method is used when the RADIUS authentication server cannot be reached.
Troubleshooting flow
Figure 151 shows the troubleshooting flowchart.
Figure 151 Flowchart for troubleshooting IPoE user fail-permit failure during RADIUS authentication
Solution
1. Verify if the configured fail-permit policy is correct.
IPoE users support fail-permit based on an ISP domain. In the user authentication domain, specify a critical domain (also known as fail-permit domain) to accommodate users that access the authentication domain when all RADIUS servers are unavailable.
Execute the display domain command to verify if a critical domain is configured in the ISP domain used for user authentication. In the following example, the Authen-radius-unavailable field shows that the configured critical domain is dm2.
<Sysname> display domain name abc
Domain: abc
Current state: Active
State configuration: Active
IPoE authentication scheme: RADIUS=rd
IPoE authorization scheme: RADIUS=rd
IPoE accounting scheme: RADIUS=rd
PPPoEA authentication scheme: None
PPPoEA authorization scheme: None
Default authentication scheme: Local
Default authorization scheme: Local
Default accounting scheme: Local
Accounting start failure action: Online
Accounting update failure action: Online
Accounting quota out policy: Offline
Send accounting update:Yes
Session time: Exclude idle time
Dual-stack accounting method: Merge
Authen-fail action: Offline
Service type: HSI
DHCPv6-follow-IPv6CP timeout: 60 seconds
IPv6CP interface ID assignment: Disabled
NAS-ID: N/A
Service rate-limit mode: Separate
Web server IPv4 URL : Not configured
Track : Not configured
Web server IPv6 URL : Not configured
Track : Not configured
Web server URL parameters : Not configured
Web server IPv4 address : Not configured
Web server secondary IPv4 address: Not configured
Web server IPv6 address : Not configured
Web server secondary IPv6 address: Not configured
Secondary Web server IPv4 URL : Not configured
Track : Not configured
Secondary Web server IPv6 URL : Not configured
Track : Not configured
Secondary Web server IPv4 address : Not configured
Secondary Web server secondary IPv4 address: Not configured
Secondary Web server IPv6 address : Not configured
Secondary Web server secondary IPv6 address: Not configured
Redirect active time : Not configured
Redirect server IPv4 address : Not configured
Temporary redirect : Disabled
Redirect server IPv6 address : Not configured
Access user auto-save : Enabled
Authorization attributes:
Idle cut: Disabled
IGMP access limit: 4
MLD access limit: 4
Access limit: Not configured
Access interface VPN instance strict check: Disabled
Dynamic authorization effective attributes: Not configured
Authen-radius-unavailable: Online on domain dm2
Authen-radius-recover: Not configured
IP resource usage warning thresholds:
High threshold: Not configured
Low threshold: Not configured
IPv6 resource usage warning thresholds:
High threshold: Not configured
Low threshold: Not configured
L2TP-user RADIUS-force: Disabled
IPv6 ND autoconfiguration:
Managed-address flag: Unset
Other flag : Unset
If the Authen-radius-unavailable field shows Not configured or does not show the expected domain name, reconfigure the critical domain as follows:
# In ISP domain abc, configure domain dm1 as the critical domain.
<Sysname> system-view
[Sysname] domain name abc
[Sysname-isp-abc] authen-radius-unavailable online domain dm1
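The check in step 1 can be scripted when the display domain output has been saved as text. The following is a minimal Python sketch, not an H3C tool: the function name critical_domain and the sample text are assumptions made for illustration, and only the Authen-radius-unavailable field shown above is parsed.

```python
import re


def critical_domain(display_domain_output):
    """Return the critical (fail-permit) domain name, or None if not configured."""
    for line in display_domain_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Authen-radius-unavailable":
            # Accepts both "Online on domain dm2" and "Online domain dm2".
            match = re.search(r"\bdomain\s+(\S+)", value)
            return match.group(1) if match else None
    return None


# Sample lines copied from the display domain output above.
sample = """\
Authen-fail action: Offline
Authen-radius-unavailable: Online on domain dm2
Authen-radius-recover: Not configured
"""

print(critical_domain(sample))  # dm2
```

A return value of None corresponds to the Not configured case, in which the critical domain must be configured as shown above.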
2. Verify if all RADIUS servers are unreachable under the RADIUS authentication scheme used for user authentication.
The device enters fail-permit state only when all RADIUS authentication servers in the RADIUS scheme used for user authentication are in Block state. Execute the display radius scheme command to view the state of the authentication servers in the RADIUS scheme. In the following example, the State field of the RADIUS authentication server is Active, which indicates that the server is reachable. In this case, fail-permit is not triggered.
<Sysname> display radius scheme rd
RADIUS scheme name: rd
Index: 0
Primary authentication server:
IP : 2.2.2.2 Port: 1812
VPN : Not configured
State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 19 seconds)
Most recent state changes:
2022/04/22 15:54:58 Changed to active state
Test profile: Not configured
Weight: 0
Primary accounting server:
IP : 2.2.2.2 Port: 1813
VPN : Not configured
State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 8 seconds)
Most recent state changes:
2022/04/22 15:55:10 Changed to active state
Weight: 0
...
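The rule in step 2 (fail-permit is triggered only when every authentication server is in Block state) can be sketched in Python as follows. This is an illustrative parser over saved display radius scheme output, not an H3C utility. The function names are hypothetical, and the sketch assumes that a blocked server is displayed with a State value that starts with "Block".

```python
import re


def auth_server_states(display_radius_output):
    """Return (server, state) pairs for authentication servers only."""
    states = []
    current = None
    for raw in display_radius_output.splitlines():
        line = raw.strip()
        header = re.match(r"(.+ server):$", line)
        if header:
            current = header.group(1)
            continue
        state = re.match(r"State:\s*(\w+)", line)
        if state and current and "authentication" in current.lower():
            states.append((current, state.group(1)))
    return states


def fail_permit_can_trigger(states):
    """True only if at least one server is listed and all are blocked."""
    return bool(states) and all(s.lower().startswith("block") for _, s in states)


# Abbreviated from the display radius scheme output above.
sample = """\
RADIUS scheme name: rd
Primary authentication server:
IP : 2.2.2.2 Port: 1812
State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 19 seconds)
Primary accounting server:
IP : 2.2.2.2 Port: 1813
State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 8 seconds)
"""

states = auth_server_states(sample)
print(states)
print(fail_permit_can_trigger(states))  # False
```

Accounting servers are deliberately ignored, because the condition in this step concerns authentication servers only.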
3. Verify if the RADIUS scheme is the authentication method in use.
If a backup RADIUS authentication method (Local or None) is configured, the backup method is used when the RADIUS authentication server cannot be reached. Fail-permit will not be triggered.
Execute the display domain command to view the authentication method configured for IPoE users in the user authentication domain. For IPoE users, check the IPoE authentication scheme field. The following example illustrates the format with the LAN access authentication scheme field: the preferred RADIUS authentication scheme is rd, and local authentication is used as the backup if the RADIUS scheme is unavailable.
<Sysname> display domain abc
Domain: abc
State: Active
Login authorization scheme: RADIUS=bbb
LAN access authentication scheme: RADIUS=rd, Local
LAN access authorization scheme: RADIUS=rd, Local
LAN access accounting scheme: RADIUS=rd, Local
Default authentication scheme: Local
Default authorization scheme: Local
Default accounting scheme: Local
Accounting start failure action: Online
Accounting update failure action: Online
Accounting quota out policy: Offline
Service type: HSI
Session time: Exclude idle time
Dual-stack accounting method: Merge
Authorization attributes:
Idle cut: Disabled
IGMP access limit: 4
MLD access limit: 4
Authen-fail action: Offline
Authen-radius-unavailable: Online domain dm2
Authen-radius-recover: Not configured
In this scenario, to trigger user fail-permit when the RADIUS server is unreachable, delete the backup authentication method so that RADIUS authentication is the last method in the authentication scheme.
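The backup-method check in step 3 can be sketched as follows. This Python fragment is an illustration only: the function names and the default field name argument are assumptions, and the sample lines are taken from the two display domain examples in this section.

```python
def auth_methods(display_domain_output, field="IPoE authentication scheme"):
    """Return the ordered list of methods configured in the given field."""
    for line in display_domain_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == field:
            return [m.strip() for m in value.split(",") if m.strip()]
    return []


def backup_blocks_fail_permit(methods):
    """True if RADIUS is preferred and a Local or None backup follows it."""
    if len(methods) < 2 or not methods[0].upper().startswith("RADIUS"):
        return False
    return any(m.lower() in ("local", "none") for m in methods[1:])


with_backup = "LAN access authentication scheme: RADIUS=rd, Local"
radius_only = "IPoE authentication scheme: RADIUS=rd"

print(backup_blocks_fail_permit(
    auth_methods(with_backup, field="LAN access authentication scheme")))  # True
print(backup_blocks_fail_permit(auth_methods(radius_only)))  # False
```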
4. If the issue persists, collect the following information and contact Technical Support:
¡ Execution results of the above steps.
¡ Device configuration file, debugging information, and diagnosis information.
Related alarm and log messages
Alarm messages
None.
Log messages
None.
ITA service does not take effect
Symptom
After a user comes online, the ITA service policy either fails to take effect or stops functioning. The system does not independently meter and rate-limit traffic to different destination addresses according to the expected accounting levels.
Common causes
The following are the common causes of this type of issue:
· The user access type does not support ITA service policies.
· No ITA service policy is configured on the device for the user.
· The RADIUS server has not authorized an ITA service policy for the user, and the ITA service policy to be used is not specified in the user authentication domain.
· The RADIUS server has authorized an EDSG service policy for the user, and an ITA service policy to be applied is specified in the user authentication domain.
· The accounting configuration in the ITA service policy is incorrect.
· The QoS configuration for marking ITA service traffic is incorrect.
· The user's ITA service traffic quota has been exhausted.
Troubleshooting flow
Figure 152 shows the troubleshooting flowchart.
Figure 152 Flowchart for troubleshooting ineffective ITA service
Solution
1. Verify that the user access type supports ITA service policies.
Currently, only the portal, IPoE, and PPP access types support applying ITA service policies.
You can execute the display access-user command and view the Access type field to identify the user access type.
¡ If the user access type is portal, IPoE, or PPP, proceed to step 2.
¡ If the user access type is any other type, no action is required.
2. Verify if the expected ITA service policy is configured on the device.
¡ Execute the display ita policy command to verify if an ITA policy is configured on the device.
¡ If the specified ITA policy does not exist, execute the ita policy command in system view to create the ITA service policy and configure the policy as needed. For more information, see Security Configuration Guide.
¡ If the specified ITA policy exists, proceed to step 3.
3. Verify if the ITA service policy is authorized for the user.
If the RADIUS server has authorized an ITA service policy for the user, the device will use the policy authorized by the RADIUS server. If no policy is authorized, the device uses the ITA service policy specified in the user authentication domain. Therefore, first verify if the RADIUS server has authorized an ITA service policy, and then check the configuration under the authentication domain as needed.
a. Execute the debugging radius packet command to enable RADIUS packet debugging. If the system prints H3C-Ita-Policy="XXX" when the user comes online, it indicates that an ITA service policy has been authorized for the user. In this case, proceed to step 4. If no ITA service policy is authorized, proceed to step b.
b. Execute the display domain command to view the user authentication domain configuration. If the command output includes ITA service policy: XXX (where XXX represents the policy name), it indicates that an ITA service policy is configured in the domain. In this case, proceed to step c. If no ITA service policy is specified, proceed to step d.
c. Verify if the RADIUS server has authorized an EDSG service policy for the user. If the RADIUS packet debugging information output when the user comes online includes H3C-AV-Pair := "edsg-policy:activelist=xxx" or Cisco-AVPair := "edsg-policy:username=[xxx]xxx", it indicates that an EDSG service policy has been authorized.
If an EDSG service policy has been authorized, the ITA service policy specified in the authentication domain does not take effect. In this case, first change the user authorization configuration on the RADIUS server to cancel EDSG policy issuance, and then proceed to step 4.
d. Authorize an ITA service policy for users by using either of the following methods:
- Based on user authorization: Configure an ITA service policy on the RADIUS authentication server, and make users go offline and come online again.
- Based on authentication domain: Specify an ITA service policy in authentication domain view used by the user to come online, and then make the user go offline and come online again.
For example, specify ITA service policy ita1 in ISP domain test.
<Sysname> system-view
[Sysname] domain name test
[Sysname-isp-test] ita-policy ita1
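The precedence described in step 3 can be summarized in a short Python sketch. This is a model of the rules stated above, not device code, and the function and parameter names are hypothetical.

```python
from typing import Optional


def effective_ita_policy(radius_ita: Optional[str],
                         domain_ita: Optional[str],
                         radius_edsg: bool) -> Optional[str]:
    """Return the ITA policy expected to take effect, or None.

    radius_ita:  ITA policy authorized by the RADIUS server, if any.
    domain_ita:  ITA policy specified in the authentication domain, if any.
    radius_edsg: whether the RADIUS server authorized an EDSG policy.
    """
    if radius_ita:
        # A server-authorized ITA policy is used first.
        return radius_ita
    if domain_ita and not radius_edsg:
        # The domain policy applies only if no EDSG policy was authorized.
        return domain_ita
    return None


print(effective_ita_policy(None, "ita1", radius_edsg=False))  # ita1
print(effective_ita_policy(None, "ita1", radius_edsg=True))   # None
```

A result of None corresponds to the cases handled in substeps c and d: either cancel EDSG policy issuance on the server, or authorize an ITA service policy.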
4. Verify if the accounting scheme used by the ITA service policy is available.
Execute the display ita policy command to display ITA service policy configuration, and view the Accounting method field to identify the accounting scheme used by the ITA service policy.
For example, view the configuration of ITA service policy ita1.
<Sysname> display ita policy ita1
Accounting method : RADIUS=Rd1, None
Accounting merge : Enabled
Accounting levels :
Level 1 IPv4
Inbound CAR: CIR 100 kbps PIR 200 kbps
Outbound CAR: CIR 100 kbps PIR 200 kbps
Level 2 IPv6
Inbound CAR: CIR 300 kbps PIR 400 kbps
Level 3 IPv4
Level 8 IPv6
Traffic separation : Enabled
Separated levels: 1, 2, 3, 4
Traffic quota-out action: Online
Send accounting update: No
¡ If the Accounting method field shows None, it indicates that no accounting method is specified for the ITA service policy. In this case, configure an accounting scheme in ITA service policy view, and make sure the specified accounting server is available.
For example, specify accounting scheme radius1 in ITA service policy ita1.
<Sysname> system-view
[Sysname] ita policy ita1
[Sysname-ita-policy-ita1] accounting-method radius-scheme radius1
¡ If the Accounting method field includes RADIUS=xxx, it indicates that the RADIUS accounting method is specified for the ITA service policy. In this case, make sure the RADIUS accounting server is available. If the accounting server is unavailable, see "RADIUS server not responding."
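The two cases in step 4 can be told apart from saved display ita policy output with a sketch like the following. It is written in Python for illustration, and the function names are assumptions rather than part of any H3C software.

```python
def accounting_methods(display_ita_policy_output):
    """Return the methods listed in the Accounting method field, in order."""
    for line in display_ita_policy_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Accounting method":
            return [m.strip() for m in value.split(",") if m.strip()]
    return []


def radius_schemes(methods):
    """Return the RADIUS scheme names whose accounting servers must be available."""
    return [m.split("=", 1)[1] for m in methods if m.upper().startswith("RADIUS=")]


# Line copied from the display ita policy ita1 output above.
sample = "Accounting method : RADIUS=Rd1, None"

methods = accounting_methods(sample)
print(methods)                  # ['RADIUS=Rd1', 'None']
print(radius_schemes(methods))  # ['Rd1']
```

An empty list from radius_schemes together with a methods list of ['None'] corresponds to the first case, in which an accounting scheme must be configured.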
5. Verify if the ITA service is being charged according to the traffic accounting level.
By defining different traffic levels based on the destination addresses of users' traffic, you can use ITA to separate the traffic accounting statistics of different levels for each user.
a. Execute the display ita policy command to display ITA service policy configuration, and view the Accounting levels field to verify if the accounting level information is correct under the ITA service policy.
For example, view the configuration of ITA service policy ita1.
<Sysname> display ita policy ita1
Accounting method : RADIUS=Rd1, None
Accounting merge : Enabled
Accounting levels :
Level 1 IPv4
Inbound CAR: CIR 100 kbps PIR 200 kbps
Outbound CAR: CIR 100 kbps PIR 200 kbps
Level 2 IPv6
Inbound CAR: CIR 300 kbps PIR 400 kbps
Traffic separation : Enabled
Separated levels: 1, 2, 3, 4
Traffic quota-out action: Online
Send accounting update: No
- If the Accounting levels field shows None, it indicates that no accounting level is specified. In this case, configure the accounting levels for users in the ITA service policy.
For example, in ITA service policy ita1, specify the accounting level for IPv4 traffic as level 2, and the accounting level for IPv6 traffic as level 5.
<Sysname> system-view
[Sysname] ita policy ita1
[Sysname-ita-policy-ita1] accounting-level 2 ipv4
[Sysname-ita-policy-ita1] accounting-level 5 ipv6
- If the Accounting levels field is not None, it indicates that an accounting level is specified. Make sure the accounting level is correct and then proceed to step b.
b. Verify if the QoS policy configuration for identifying user ITA service traffic is correct.
- To issue QoS policies to users based on authorized user profiles, verify that a QoS policy is applied to the user profile, and the traffic class and the traffic priority marking settings are correct in the QoS policy.
- To issue QoS policies to users based on interfaces, verify that a QoS policy is applied to the user access interface, and the traffic class and the traffic priority marking settings are correct in the QoS policy.
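To compare the configured accounting levels with the levels that the QoS policy is expected to mark, the Accounting levels entries can be extracted from saved display ita policy output. The following Python sketch is illustrative only, and the function name is hypothetical.

```python
import re


def accounting_levels(display_ita_policy_output):
    """Map each configured accounting level number to its IP version."""
    levels = {}
    for line in display_ita_policy_output.splitlines():
        match = re.match(r"\s*Level\s+(\d+)\s+(IPv4|IPv6)\b", line)
        if match:
            levels[int(match.group(1))] = match.group(2)
    return levels


# Lines copied from the display ita policy ita1 output in step 5.
sample = """\
Accounting levels :
Level 1 IPv4
Inbound CAR: CIR 100 kbps PIR 200 kbps
Outbound CAR: CIR 100 kbps PIR 200 kbps
Level 2 IPv6
Inbound CAR: CIR 300 kbps PIR 400 kbps
Traffic separation : Enabled
Separated levels: 1, 2, 3, 4
"""

print(accounting_levels(sample))  # {1: 'IPv4', 2: 'IPv6'}
```

An empty result corresponds to the case in which no accounting level is specified.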
6. Verify if the ITA service has stopped working.
a. Execute the display value-added-service user xxx verbose command to view detailed information about value-added service users. If a Level-X State field shows Offline, it indicates that the value-added service at that accounting level is offline.
b. If the service at the specified accounting level is offline, view the Offline reason field in the command output to identify why the service went offline. Possible values of the Offline reason field include:
- Authentication failed.
- Accounting failed.
- Accounting update failed.
- Failed to send accounting packets.
- Traffic quota exhausted.
- Session timed out.
- Cut by the AAA server.
- Logged out by the RADIUS proxy.
If the quota is exhausted, no action is required. If the service was forced to go offline, confirm the reason with the RADIUS server administrator or the device administrator. For other cases, see "RADIUS server not responding" to rule out a server fault, and then proceed to step 7.
For example, view detailed information about the value-added service user with IP address 1.1.1.1.
<Sysname> display value-added-service user ip-address 1.1.1.1 verbose
Slot 97:
Basic:
User ID : 0x1
User name : user1
IP address : 1.1.1.1
IPv6 address : -
Service type : ITA
ITA:
Policy name : ita1
Accounting merge : Disabled
Traffic quota-out action : Offline
Level-1 State : Offline
Offline reason : Session timed out
Inbound CAR : CIR 1000kbps PIR 2000kbps
CBS -
Outbound CAR : CIR 1000kbps PIR 2000kbps
CBS -
Uplink packets/bytes : 4/392
Downlink packets/bytes : 4/392
IPv6 uplink packets/bytes : 0/0
IPv6 downlink packets/bytes : 0/0
Accounting start time : 2022-08-27 01:23:41
Online time (hh:mm:ss) : 0:00:12
Accounting state : Stop
Session timeout : Unlimited
Time remained : Unlimited
Realtime accounting interval: -
Traffic separate : Disabled
Traffic quota : Unlimited
Traffic remained : Unlimited
The output shows that the Level-1 State field is Offline, which indicates that the ITA service at accounting level 1 is offline. The Offline reason field is Session timed out, which indicates that the service went offline because its session timed out.
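The checks in step 6 can be sketched in Python as follows. The parser reads saved display value-added-service user verbose output and reports each offline accounting level with its offline reason. The function names are hypothetical, and the grouping of "Cut by the AAA server" and "Logged out by the RADIUS proxy" as forced logoffs is an assumption made for this sketch.

```python
import re


def offline_levels(output):
    """Map each offline accounting level to its Offline reason (or None)."""
    result = {}
    level = None
    for raw in output.splitlines():
        key, sep, value = raw.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        match = re.fullmatch(r"Level-(\d+) State", key)
        if match:
            level = int(match.group(1)) if value == "Offline" else None
            if level is not None:
                result[level] = None
        elif key == "Offline reason" and level is not None:
            result[level] = value
    return result


def next_action(reason):
    """Suggest the follow-up described in this step for an offline reason."""
    if reason == "Traffic quota exhausted":
        return "no action required"
    if reason in ("Cut by the AAA server", "Logged out by the RADIUS proxy"):
        return "confirm with the RADIUS server or device administrator"
    return "rule out a RADIUS server fault"


# Lines copied from the example output above.
sample = """\
Policy name : ita1
Level-1 State : Offline
Offline reason : Session timed out
Accounting start time : 2022-08-27 01:23:41
Accounting state : Stop
"""

print(offline_levels(sample))  # {1: 'Session timed out'}
```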
7. If the issue persists, collect the following information and contact Technical Support:
¡ Execution results of the above steps.
¡ Device configuration file, log information, and alarm messages.
Related alarm and log messages
Alarm messages
None.
Log messages
None.
EDSG service does not take effect
Symptom
After a user comes online, the EDSG service policy does not take effect or stops taking effect. The user is not provided with independent accounting and dynamic rate limit services as expected based on the EDSG value-added service parameters.
Common causes
The following are the common causes of this type of issue:
· The user access type does not support EDSG service policies.
· No EDSG service policy is configured on the device for the user.
· The RADIUS server failed to authorize an EDSG service policy for the user.
· The RADIUS server has authorized both an EDSG service policy and an ITA service policy for the user.
· The EDSG service policy information (including EDSG policy name, username, and password) delivered by the RADIUS server is invalid and cannot be recognized by the device.
· The authentication or accounting scheme specified in the EDSG service policy is not available.
· The EDSG service policy has stopped working.
Troubleshooting flow
Figure 153 shows the troubleshooting flowchart.
Figure 153 Flowchart for troubleshooting ineffective EDSG service
Solution
1. Verify that the user access type supports EDSG service policies.
Currently, only the IPoE and PPP access types support applying EDSG service policies.
You can execute the display access-user command and view the Access type field to identify the user access type.
¡ If the user access type is IPoE or PPP, proceed to step 2.
¡ If the user access type is any other type, no action is required.
2. Verify if the expected EDSG service policy is configured on the device.
¡ Execute the display service policy command to verify if an EDSG policy is configured on the device.
¡ If the specified EDSG policy does not exist, execute the service policy command in system view to create the EDSG service policy and configure the policy as needed. For more information, see Security Configuration Guide.
¡ If the specified EDSG policy exists, proceed to step 3.
3. Verify if the RADIUS server has authorized an EDSG service policy for the user.
The device can recognize only EDSG service policy information (including EDSG policy name, username, and password) delivered by the RADIUS server through private attributes H3C-AV-Pair and Cisco-AVPair.
a. Execute the debugging radius packet command to enable RADIUS packet debugging. If the RADIUS packet debugging information output when the user comes online includes H3C-AV-Pair := "edsg-policy:activelist=xxx" or Cisco-AVPair := "edsg-policy:username=[xxx]xxx", it indicates that an EDSG service policy has been authorized. In this case, proceed to step 4. If no EDSG service policy has been authorized, proceed to step b.
b. Configure an EDSG service policy on the RADIUS authentication server, and make users go offline and come online again.
4. Verify if the RADIUS server has authorized both an ITA service policy and an EDSG service policy for the user.
If the RADIUS server issues both ITA and EDSG service policies for the same user, the EDSG service policy will not take effect. In this case, change the user authorization configuration on the RADIUS server. Make sure the server authorizes only an EDSG service policy for the user.
NOTE: When RADIUS packet debugging is enabled, if the RADIUS server authorizes an ITA service policy for a user, the debugging information output when the user comes online includes H3C-Ita-Policy="XXX".
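If the RADIUS packet debugging output has been captured to a text file, the conflict described in step 4 can be spotted with a simple search. The following Python sketch is illustrative only: the function name is hypothetical, the sample policy names ita1 and sp1 are made up for the example, and the match strings are the attribute markers quoted in this section.

```python
def authorized_services(debug_text):
    """Report which value-added service policies appear in RADIUS debug output."""
    lowered = debug_text.lower()
    return {
        "ita": "h3c-ita-policy" in lowered,
        "edsg": "edsg-policy:" in lowered,
    }


# Hypothetical capture; the policy names are placeholders.
capture = '''\
H3C-Ita-Policy="ita1"
H3C-AV-Pair := "edsg-policy:activelist=sp1"
'''

found = authorized_services(capture)
print(found)  # {'ita': True, 'edsg': True}
if found["ita"] and found["edsg"]:
    print("Both ITA and EDSG are authorized; the EDSG policy will not take effect.")
```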
5. Verify if the device can identify the EDSG service policy information issued by the RADIUS server.
The device can recognize only EDSG service policy information (policy name, username, and password) delivered through private attributes (H3C-AV-Pair or Cisco-AVPair). Confirm with the server administrator whether the username and password are issued through other, unsupported attributes.
¡ If both the username and password are issued simultaneously, confirm the RADIUS attribute name with the server administrator. Then, enable RADIUS attribute translation in the RADIUS scheme used for user authentication, and configure a RADIUS attribute translation rule to convert the attribute to H3C-AV-Pair or Cisco-AVPair.
For example, if RADIUS scheme rs1 is used for user authentication, enable RADIUS attribute translation in RADIUS scheme view of RADIUS scheme rs1, and configure the system to convert received H3c-Server-String attributes into H3c-AVPair attributes.
<Sysname> system-view
[Sysname] radius scheme rs1
[Sysname-radius-rs1] attribute translate
[Sysname-radius-rs1] attribute convert H3c-Server-String to H3c-AVPair received
¡ If username and password are not issued simultaneously, proceed to step 6.
6. Verify if the authentication and accounting schemes used by the EDSG service are available.
Execute the display service policy command to view EDSG service policy information. The Authentication method and Accounting method fields display the authentication scheme and accounting scheme used by the EDSG service policy, respectively.
For example, view the configuration of EDSG service policy sp1.
<Sysname> display service policy sp1
Service policy: sp1
Service ID : 10
Authentication method : RADIUS=Rd1, None
Accounting method : RADIUS=Rd1, None
Traffic statistics : Separate
Inbound CAR : CIR=222 kbps, PIR=2222 kbps, CBS=5678 bytes, EBS=5678 bytes
Outbound CAR : CIR=222 kbps, PIR=2222 kbps
Dual-stack rate limit mode : Merge
Service rate-limit mode : Separate
¡ If both the Authentication method and Accounting method fields are None, it indicates that the EDSG service for the user does not require separate authentication or accounting. In this case, proceed to step 7.
¡ If the Authentication method and Accounting method fields include the RADIUS=xxx string, it indicates that the EDSG service for the user requires separate authentication and accounting. In this case, make sure the RADIUS authentication server and accounting server are available and the corresponding authentication username and password are created on the server.
NOTE: If the EDSG service policy delivered by the server during user login includes a username and password, the device uses that username and password for EDSG service authentication. Otherwise, the device uses the login username and password for EDSG service authentication.
7. Verify if the EDSG service has stopped working.
a. Execute the display value-added-service user xxx verbose command to view detailed information about value-added service users. If the State field shows Offline, it indicates that the value-added service is offline.
b. If the service is offline, view the Offline reason field in the command output to identify why the service went offline. Possible values of the Offline reason field include:
- Authentication failed.
- Accounting failed.
- Accounting update failed.
- Failed to send accounting packets.
- Traffic quota exhausted.
- Session timed out.
- Cut by the AAA server.
- Logged out by the RADIUS proxy.
If the quota is exhausted, no action is required. If the service was forced to go offline, confirm the reason with the RADIUS server administrator or the device administrator. For other cases, see "RADIUS server not responding" to rule out a server fault, and then proceed to step 8.
For example, view detailed information about the value-added service user with IP address 1.1.1.1.
<Sysname> display value-added-service user ip-address 1.1.1.1 verbose
Slot 97:
Basic:
User ID : 0x80000033
User name : pp3
IP address : 1.1.1.1
IPv6 address : -
Service type : EDSG
Service policy:
Service ID : 8
Policy name : sp8
Policy username : pp3
State : Offline
Offline reason : Session timed out
Traffic statistics : Separate
Service rate-limit mode : Separate
Dual-stack rate limit mode : Merge
Traffic quota-out action : Offline
Inbound CAR : -
Outbound CAR : -
Uplink packets/bytes : 0/0
Downlink packets/bytes : 0/0
IPv6 uplink packets/bytes : 0/0
IPv6 downlink packets/bytes : 0/0
Accounting start time : 2022-08-27 05:03:49
Online time (hh:mm:ss) : 0:03:13
Accounting state : Stop
Session timeout : Unlimited
Time remained : Unlimited
Realtime accounting interval : 20 seconds
Traffic quota : Unlimited
Traffic remained : Unlimited
8. If the issue persists, collect the following information and contact Technical Support:
¡ Execution results of the above steps.
¡ Device configuration file, log information, and alarm messages.
Related alarm and log messages
Alarm messages
None.
Log messages
None.
MAC authentication issues
MAC authentication failures
Symptom
Failures or exceptions occur in MAC authentication for a user.
Common causes
The following are the common causes of this type of issue:
· The user has come online through other authentication methods.
· MAC authentication is not enabled globally or on the interface.
· The authentication method configured on the device is not consistent with that on the RADIUS server.
· The authentication domain used by the MAC authentication user and related settings are not configured correctly.
· No response from the RADIUS server.
· The local authentication or RADIUS authentication request is rejected.
· Failed to deploy the authorization attributes.
· The user's MAC address has been set as a silent MAC address.
· The number of concurrent online MAC authentication users on the interface has reached the maximum.
Troubleshooting flow
Figure 154 shows the troubleshooting flowchart.
Figure 154 Flowchart for troubleshooting MAC authentication failures
Solution
CAUTION:
· Do not enable the debugging commands when the device is operating normally. Enable the commands when you reproduce the issue after it has occurred.
· Save the execution results of the following steps promptly, so that information can be quickly collected and provided if the issue persists.
1. Check whether the user has come online through other authentication methods.
By default, the authentication order on a port is 802.1X authentication, MAC authentication, and then Web authentication.
Execute the display dot1x connection command to check whether the user has successfully passed 802.1X authentication and come online. If the user has come online, determine whether the user needs to come online again through MAC authentication. If MAC authentication is required, log out the user, disable 802.1X authentication, and then configure the user to perform MAC authentication.
2. Check whether MAC authentication is enabled globally or on the interface:
a. Execute the display mac-authentication command. If the system prompts MAC authentication is not configured., MAC authentication is disabled globally. To enable it, execute the mac-authentication command in system view.
b. Execute the display mac-authentication command. If the global MAC authentication configuration exists but no MAC authentication configuration exists on the user authentication interface, execute the mac-authentication command in the view of that interface.
3. Check whether the authentication method configured on the device is consistent with that on the RADIUS server.
The device supports using both CHAP and PAP authentication methods for MAC authentication.
Execute the display mac-authentication command to check whether the authentication method used for MAC authentication, displayed in the Authentication method field, is consistent with that configured on the RADIUS server. If they are different, execute the mac-authentication authentication-method command to modify the configuration on the device.
4. Check whether the authentication domain and related configurations are configured correctly.
MAC authentication users accessing through the port will select the authentication domain in the following order: the authentication domain specified on the port, the authentication domain specified in system view, and then the default authentication domain of the system.
a. Execute the display mac-authentication command on the device to check whether a MAC authentication domain for user authentication is configured on the system and authentication interface.
<Sysname> display mac-authentication
Global MAC authentication parameters:
MAC authentication : Enabled
Authentication method : PAP
Authentication domain : Not configured, use default domain
…
GigabitEthernet2/0/1 is link-up
MAC authentication : Enabled
Carry User-IP : Disabled
Authentication domain : Not configured
…
b. If an authentication domain used for MAC authentication users is configured on the authentication interface, execute the display domain command to check whether the authentication scheme in the authentication domain is configured correctly. If an authentication domain is not configured on the authentication interface, but is configured in system view, execute the display domain command to check whether the authentication scheme in the authentication domain is configured correctly.
c. If no authentication domain for MAC authentication users is configured on the authentication interface or in system view, check the configuration of the default authentication domain.
d. If no default authentication domain exists and a domain to accommodate users assigned to nonexistent domains has been configured by the domain if-unknown command, check whether the authentication scheme in the domain is configured correctly.
e. If none of the authentication domains mentioned above exists on the device, the user cannot perform authentication.
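The domain selection order described in step 4 can be modeled as follows. This Python sketch only illustrates the order stated above. The function and parameter names are hypothetical, and each argument is assumed to be None when the corresponding domain is not configured or does not exist on the device.

```python
def select_auth_domain(port_domain=None, global_domain=None,
                       default_domain=None, if_unknown_domain=None):
    """Return (source, domain) for the first usable domain, or None."""
    candidates = (
        ("interface", port_domain),          # domain specified on the port
        ("system view", global_domain),      # domain specified in system view
        ("default", default_domain),         # default authentication domain
        ("if-unknown", if_unknown_domain),   # set by the domain if-unknown command
    )
    for source, name in candidates:
        if name:
            return source, name
    return None  # no usable domain: the user cannot perform authentication


print(select_auth_domain(global_domain="dm1", default_domain="system"))
# ('system view', 'dm1')
```

The returned source indicates which domain's authentication scheme to inspect with the display domain command.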
5. Check whether the RADIUS server is responsive.
For more information, see "RADIUS server not responding" in the AAA troubleshooting procedures.
6. Check whether the authentication request is rejected:
a. Execute the debugging mac-authentication event command to enable debugging for MAC authentication events.
- If the system prompts Local authentication request was rejected., it indicates that the local authentication request is rejected. Causes for local authentication rejection include a nonexistent local user, an incorrect user password, and an incorrect service type.
- If the system prompts The RADIUS server rejected the authentication request., it indicates that the request is rejected by the RADIUS server. Common causes for server authentication rejection include a missing username on the server, inconsistent username formats, an incorrect password, and a RADIUS server policy mismatch.
Execute the debugging radius error command on the device to enable debugging for RADIUS errors. You can also execute the test-aaa command to perform a RADIUS request test on the device. After identifying the issue, adjust the server, device, and client configurations accordingly.
b. Execute the display aaa online-fail-record command and view the authentication failure reasons displayed in the Online failure reason field. For more information, see AAA troubleshooting procedures.
7. Check whether authorization attributes failed to be deployed.
Execute the debugging mac-authentication event command to enable debugging for MAC authentication events. If the device prompts Authorization failure., it indicates an authorization failure.
a. Check whether the authorization-fail-offline feature has been configured in system view using the authorization fail user offline command. If this feature is not configured, users can stay online after authorization failures by default. This indicates that the authentication failure is not caused by an authorization failure. Proceed with other steps.
b. If the authorization-fail-offline feature is configured, execute the mac-authentication access-user log enable failed-login command to enable logging for MAC authentication user login failures. You can identify the attributes (such as authorization ACL and VLAN) that failed to be deployed from the logs.
c. Check whether the authorization attribute settings on the server are correct. Make sure the authorization attributes deployed by the server are correct.
d. Execute commands such as display acl or display vlan to check whether the corresponding authorization attributes exist on the device. If the attributes do not exist, create relevant authorization attributes on the device, and make sure that the user can obtain the authorization information.
8. Check whether the user's MAC address has been set as a silent MAC address.
Execute the display mac-authentication command to view the information displayed in the Silent MAC users field. If the user's MAC address is a silent MAC address, wait for the quiet timer to age before performing MAC authentication again. You can reconfigure the quiet timer using the mac-authentication timer quiet command.
9. Check whether the number of concurrent online MAC authentication users on the interface has reached the maximum:
a. Execute the display mac-authentication command to view information on the authentication interface. View the maximum number of concurrent online users allowed on the interface displayed in the Max online users field and the number of current online users displayed in the Current online users field. Compare the two numbers to determine whether the number of concurrent online MAC authentication users on the interface has reached the maximum.
b. If the maximum number of concurrent online users has been reached, execute the mac-authentication max-user command to increase the maximum number of concurrent MAC authentication users on the interface.
c. If the maximum number of concurrent MAC authentication users cannot be increased, wait for other users to go offline or use a different port for user access.
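The comparison in step 9 can be scripted when you have saved the command output to a file. The following is a minimal Python sketch, not an H3C tool. The field names Max online users and Current online users come from this guide, but the exact line layout ("name : value") and the sample values are assumptions, so adjust the pattern to the output of your software version.

```python
import re


def online_user_headroom(display_output: str):
    """Return (current, maximum, is_full), or None if a field is missing.

    Assumes each field is printed as "<field name> : <number>".
    """
    def field(name: str):
        match = re.search(rf"{re.escape(name)}\s*:\s*(\d+)", display_output)
        return int(match.group(1)) if match else None

    maximum = field("Max online users")
    current = field("Current online users")
    if maximum is None or current is None:
        return None
    return current, maximum, current >= maximum


# Hypothetical excerpt of saved "display mac-authentication" output.
SAMPLE = """\
  Max online users           : 256
  Current online users       : 256
"""

if __name__ == "__main__":
    print(online_user_headroom(SAMPLE))
```

If the third value is True, the interface has reached its limit and steps 9b and 9c apply.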
10. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Log information collected after you execute the mac-authentication access-user log enable command.
¡ Debugging information collected after you execute the debugging mac-authentication all and debugging radius all commands.
Related alarm and log messages
Alarm messages
N/A
Log messages
· MACA_ENABLE_NOT_EFFECTIVE
· MACA_LOGIN_FAILURE
MAC authentication user disconnections
Symptom
A MAC authentication user is disconnected unexpectedly after passing authentication and coming online.
Common causes
The following are the common causes of this type of issue:
· The user has come online using 802.1X authentication.
· MAC authentication-related configurations on the device have changed.
· Real-time accounting of MAC authentication user's traffic has failed.
· The user failed MAC reauthentication.
· The server forces the user offline.
· The user goes offline after offline detection is enabled.
· The user session has timed out.
Troubleshooting flow
Figure 155 shows the troubleshooting flowchart.
Figure 155 Flowchart for troubleshooting MAC authentication user disconnections
Solution
CAUTION:
· Do not enable the debugging commands when the device is operating normally. Enable the commands when you reproduce the issue after it has occurred.
· Save the execution results of the following steps promptly, so that information can be quickly collected and provided if the issue persists.
1. Check whether disconnection occurs because the user has come online after passing 802.1X authentication.
By default, the authentication order on a port is 802.1X authentication, MAC authentication, and then Web authentication.
If the user first passes MAC authentication, Web authentication is terminated immediately, but 802.1X authentication will proceed. If the user also passes 802.1X authentication, the 802.1X authentication information will overwrite the MAC authentication information of the user.
Execute the display dot1x connection command to check whether the user has successfully passed 802.1X authentication and come online. If the user has come online, determine whether the user needs to come online again through MAC authentication. If MAC authentication is required, log off the user, disable 802.1X authentication, and then configure the user to perform MAC authentication.
2. Check whether MAC authentication-related configurations on the device have changed:
a. Execute the display mac-authentication command to check whether the configurations (such as feature enabling and authentication method) related to MAC authentication on the device have changed.
b. Execute the display domain command to check whether the configurations (such as authorization attributes) in the user authentication domain have changed.
3. Check whether real-time accounting failed.
Execute the debugging mac-authentication event command to enable debugging for MAC authentication events. If the system prompts Real-time accounting failure., it indicates that real-time accounting failed. Check the link state between the device and the accounting server, and check whether the related accounting configurations on the device and the accounting server have changed.
4. Check whether disconnection occurs because of a reauthentication failure:
a. Execute the display mac-authentication command and view the information displayed in the Periodic reauth field to check whether MAC reauthentication is enabled on the authentication interface.
b. Execute the mac-authentication access-user log enable logoff command to enable logging for MAC authentication user logoffs.
c. Identify the reasons for the reauthentication failure as described in "MAC authentication failures."
5. Check whether the RADIUS server forced the user offline.
Execute the debugging mac-authentication event command to enable debugging for MAC authentication events. If the system prompts The RADIUS server forcibly logged out the user., it indicates that the server forced the user offline. Contact the server administrator to identify the reasons for forcible logoff by the server.
6. Check whether no user packet is received before the offline detect timer expires:
a. Execute the display mac-authentication command and view the information displayed in the Offline detection field on the authentication interface to check whether the offline detection has been enabled.
b. Execute the debugging mac-authentication event command to enable debugging for MAC authentication events. If the system prompts Offline detect timer expired., it indicates that no packet was received from the online MAC authentication user on the interface before the offline detect timer expired. The device therefore logged off the user.
c. Check the link state between the user client and the device to identify the reasons for packet sending failures.
7. Check whether the user session has timed out:
a. Execute the debugging radius packet command to enable RADIUS packet debugging. Verify that the Session-Timeout attribute is carried in the responses from the server.
b. Execute the debugging mac-authentication event command to enable debugging for MAC authentication events. If the system prompts User session timed out., it indicates that the user goes offline because of user session timeout.
c. Disconnections caused by user session timeouts are normal. Users can reinitiate a request to come online.
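When you review a saved debugging log, the prompts named in steps 3, 5, 6, and 7 can be matched automatically. The following Python sketch maps each prompt quoted in this guide to the step that handles it. The log line used in the example is hypothetical (the module name and timestamp layout are assumptions); only the prompt strings are taken from this guide.

```python
# Prompt text (as quoted in this guide) -> (step number, cause).
DISCONNECT_PROMPTS = {
    "Real-time accounting failure.": (3, "real-time accounting failed"),
    "The RADIUS server forcibly logged out the user.": (5, "server forced the user offline"),
    "Offline detect timer expired.": (6, "no user packet before the offline detect timer expired"),
    "User session timed out.": (7, "user session timed out"),
}


def classify_disconnection(debug_line: str):
    """Return (step, cause) for a known prompt, or None if nothing matches."""
    for prompt, result in DISCONNECT_PROMPTS.items():
        if prompt in debug_line:
            return result
    return None


if __name__ == "__main__":
    # Hypothetical log line; the real prefix depends on the device.
    line = "*Jul 28 17:51:20:774 2021 Sysname MACA/7/EVENT: Offline detect timer expired."
    print(classify_disconnection(line))
```

A result of None means the log line does not contain any of the four prompts, and the causes in steps 1, 2, and 4 remain to be checked.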
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Offline reasons displayed after you execute the display aaa abnormal-offline-record or display aaa normal-offline-record command.
¡ Log information collected after you execute the mac-authentication access-user log enable command.
¡ Debugging information collected after you execute the debugging mac-authentication all and debugging radius all commands.
Related alarm and log messages
Alarm messages
N/A
Log messages
· MACA_LOGOFF
Password control issues
Password change required upon admin login
Symptom
When an administrator logs in to the device through local authentication, the system identifies that the password strength does not meet the requirements and prompts the administrator to change the current login password.
Common causes
The following are the common causes of this type of issue:
· The password control configured in local user view has a high password strength check.
· The password control configured in local user group view has a high password strength check.
· The password control configured in system view has a high password strength check.
Troubleshooting flow
Figure 156 shows the troubleshooting flowchart.
Figure 156 Flowchart for troubleshooting password change upon the login of an administrator
Solution
1. Identify whether to reduce the current password check strength.
With the global password control feature enabled, when a device management user who logs in through Telnet, SSH, HTTP, or HTTPS enters the login password, the system checks the password against the password restrictions. The password restrictions include the current password composition policy, minimum password length, and password complexity policy. If the password does not meet these restrictions, the system considers the password weak. For information about password control, see Security Configuration Guide.
By default, when a user logs in to the device with a weak password, the system will generate an alarm message. If the current password strength check is higher than the actual login control requirements, identify the scope of changes (local user, user group, or all local users). Then, perform the subsequent steps to reduce the password check strength in the corresponding view.
2. Reduce the password check strength of password control for the local user.
Execute the local-user command to enter local user view and perform the following operations:
¡ Execute the password-control composition command to configure the password composition policy. In this example, a password must contain a minimum of four character types and a minimum of five characters for each type.
¡ Execute the password-control length command to set the minimum password length. In this example, the minimum password length is 16 characters.
¡ Execute the password-control complexity command to configure the password complexity policy. In this example, the device will identify whether a password contains the username.
<Sysname> system-view
[Sysname] local-user test class manage
[Sysname-luser-manage-test] password-control composition type-number 4 type-length 5
[Sysname-luser-manage-test] password-control length 16
[Sysname-luser-manage-test] password-control complexity user-name check
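To estimate offline whether a candidate password would satisfy settings like those in the example above, you can approximate the three checks in a short script. The following Python sketch is an illustration, not the device's implementation. It assumes that the four character types are uppercase letters, lowercase letters, digits, and special characters, that type-length applies to each counted type, and that the username check also rejects the reversed username. Verify these assumptions against Security Configuration Guide.

```python
import string


def char_type_counts(password: str) -> dict:
    """Count characters per type (assumed types: upper, lower, digit, special)."""
    counts = {"uppercase": 0, "lowercase": 0, "digit": 0, "special": 0}
    for ch in password:
        if ch in string.ascii_uppercase:
            counts["uppercase"] += 1
        elif ch in string.ascii_lowercase:
            counts["lowercase"] += 1
        elif ch in string.digits:
            counts["digit"] += 1
        else:
            counts["special"] += 1
    return counts


def password_violations(password, username, type_number=4, type_length=5, min_length=16):
    """Return the list of restrictions the password fails (empty list = passes)."""
    problems = []
    if len(password) < min_length:
        problems.append("length")
    # Count the types that contribute at least type_length characters.
    satisfied = sum(1 for n in char_type_counts(password).values() if n >= type_length)
    if satisfied < type_number:
        problems.append("composition")
    # Assumed behavior: reject the username and its reverse.
    if username and (username in password or username[::-1] in password):
        problems.append("username")
    return problems


if __name__ == "__main__":
    print(password_violations("ABCDEabcde12345!@#$%", "test"))
    print(password_violations("test12345", "test"))
```

Lowering type_number, type_length, or min_length in the call shows which of the restrictions in steps 2 through 4 is responsible for a password being treated as weak.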
3. Reduce the password check strength of password control for the user group.
Execute the user-group command to enter user group view and perform the following operations:
¡ Execute the password-control composition command to configure the password composition policy for the user group.
¡ Execute the password-control length command to set the minimum password length.
¡ Execute the password-control complexity command to configure the password complexity policy.
4. Reduce the password check strength of password control for all local users.
In system view, perform the following operations:
¡ Execute the password-control composition command to configure the password composition policy.
¡ Execute the password-control length command to set the minimum password length.
¡ Execute the password-control complexity command to configure the password complexity policy.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, diagnostics information, and prompt messages of the device.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failure to create a local user or configure a user password
Symptom
When you fail to create a local user, the system generates the Add user failed message.
When you fail to configure the local user password, the system generates the Operation failed message.
Common causes
The following are the common causes of this type of issue:
· The memory usage of the device has reached the specified threshold.
· The local file system of the device is running out of storage space.
· An anomaly occurs on the local lauth.dat file of the device.
Troubleshooting flow
Figure 157 shows the troubleshooting flowchart.
Figure 157 Flowchart for troubleshooting failure to create a local user or configure a user password
Solution
1. Identify whether the amount of the free memory space of the device has reached the specified memory alarm threshold.
If you fail to change the local user's password, directly proceed to step 2.
Execute the display memory-threshold command to view memory alarm thresholds and statistics. The output shows the current free-memory state of the system. While the system memory is in the minor, severe, or critical alarm state, creating local users is not allowed.
<Sysname> display memory-threshold
Memory usage threshold: 100%
Free-memory thresholds:
Minor: 96M
Severe: 64M
Critical: 48M
Normal: 128M
Early-warning: 144M
Secure: 160M
Current free-memory state: Normal (secure)
...
You can execute the monitor process command to check the process statistics in any view. Enter m to locate the processes that are consuming excessive memory resources, sorted by memory usage. If necessary, clean up the memory space. After the memory alarm state is cleared, try again to create local users.
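The state check in this step can be automated against saved command output. The following Python sketch reads the Current free-memory state line shown above and reports whether local user creation would be blocked. The state labels Minor, Severe, and Critical are assumed to appear in that line with the same spelling as the threshold names.

```python
import re

# States in which this guide says creating local users is not allowed.
BLOCKING_STATES = {"minor", "severe", "critical"}


def free_memory_state(display_output: str):
    """Return the lowercased state from the 'Current free-memory state' line, or None."""
    match = re.search(r"Current free-memory state:\s*(\w+)", display_output)
    return match.group(1).lower() if match else None


def user_creation_blocked(display_output: str) -> bool:
    """True if the reported state is one of the alarm states."""
    return free_memory_state(display_output) in BLOCKING_STATES


# Excerpt of the sample output shown in this step.
SAMPLE = """\
Memory usage threshold: 100%
Free-memory thresholds:
Minor: 96M
Severe: 64M
Critical: 48M
Current free-memory state: Normal (secure)
"""

if __name__ == "__main__":
    print(free_memory_state(SAMPLE), user_creation_blocked(SAMPLE))
```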
2. Identify whether the storage space of the local file system on the device is insufficient.
If any of the following types of log messages are output on the device, a file system error causes this issue:
¡ PWDCTL/3/PWDCTL_FAILED_TO_OPENFILE: Failed to create or open the password file.
¡ PWDCTL/3/PWDCTL_FAILED_TO_WRITEPWD: Failed to write the password records to file.
¡ PWDCTL/3/PWDCTL_NOENOUGHSPACE: Not enough free space on the storage media where the file is located.
Execute the dir command in user view to check the remaining capacity of the local storage media (such as flash). If the remaining space is insufficient, delete unnecessary files.
3. Identify whether the local lauth.dat file is operating properly.
After the global password control feature is enabled, the device will automatically generate the lauth.dat file to record the local user's authentication and login information. If this file is manually deleted or edited, an anomaly occurs on local authentication. Execute the dir command in user view to check the presence of the lauth.dat file in the local storage media, such as flash.
<Sysname> dir
Directory of flash: (EXT4)
0 drw- - Aug 16 2021 11:45:37 core
1 drw- - Aug 16 2021 11:45:42 diagfile
2 drw- - Aug 16 2021 11:45:57 dlp
3 -rw- 713 Aug 16 2021 11:49:41 ifindex.dat
4 -rw- 12 Sep 01 2021 02:40:01 lauth.dat
...
If this file is absent, is 0 in size, or is very small (an anomaly might have occurred if the file is smaller than 20 bytes), contact Technical Support. If the current configuration is required urgently, you can try to resolve this issue by disabling and then re-enabling the global password control feature.
<Sysname> system-view
[Sysname] undo password-control enable
[Sysname] password-control enable
If this issue is resolved, you can try to re-create the local user or configure the user password.
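The lauth.dat check in this step can be applied to saved dir output with a few lines of script. The following Python sketch assumes the column layout of the sample output above (index, attributes, size, date, time, file name) and uses the 20-byte guideline from this step. It is an illustration only.

```python
def lauth_status(dir_output: str, min_bytes: int = 20) -> str:
    """Classify lauth.dat as 'absent', 'empty', 'suspicious', 'ok', or 'unexpected'."""
    for line in dir_output.splitlines():
        fields = line.split()
        # Expected: index, attributes, size, month, day, year, time, name.
        if len(fields) >= 8 and fields[-1] == "lauth.dat":
            size_text = fields[2]
            if not size_text.isdigit():
                return "unexpected"
            size = int(size_text)
            if size == 0:
                return "empty"
            return "suspicious" if size < min_bytes else "ok"
    return "absent"


# Sample output copied from this step.
SAMPLE = """\
Directory of flash: (EXT4)
0 drw- - Aug 16 2021 11:45:37 core
1 drw- - Aug 16 2021 11:45:42 diagfile
2 drw- - Aug 16 2021 11:45:57 dlp
3 -rw- 713 Aug 16 2021 11:49:41 ifindex.dat
4 -rw- 12 Sep 01 2021 02:40:01 lauth.dat
"""

if __name__ == "__main__":
    print(lauth_status(SAMPLE))
```

Any result other than ok corresponds to the condition in this step under which you contact Technical Support or re-enable password control.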
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, diagnostics information, and prompt messages of the device.
Related alarm and log messages
Alarm messages
N/A
Log messages
· PWDCTL/3/PWDCTL_FAILED_TO_WRITEPWD
· PWDCTL/3/PWDCTL_FAILED_TO_OPENFILE
· PWDCTL/3/PWDCTL_NOENOUGHSPACE
Admin login failure due to idle timeout
Symptom
When an administrator uses local authentication to log in to the device, the login might fail due to account idle timeout. The system generates the prompt message of Failed to login because the idle timer expired.
Common causes
The main reason for this issue is that when a user has not logged in successfully within the configured idle time since their last login, their account is immediately invalidated once the idle time expires. Then, the system no longer permits the user to log in using that account.
Troubleshooting flow
Figure 158 shows the troubleshooting flowchart.
Figure 158 Flowchart for troubleshooting login failure of an administrator due to idle timeout
Solution
1. Identify whether other administrators or methods can log in to the device.
¡ If other administrators or methods (such as console login) can log in to the device, only the target user is prevented from logging in to the device. Therefore, after other administrators log in, they can delete this local user and re-create it, or edit the idle time of the user account (by the password-control login idle-time command). If the idle time is set to 0, the system disables the account idle time restriction.
¡ If no other administrators or methods can log in to the device, proceed to step 2.
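The idle timeout rule described in this section can be expressed as a small check. The following Python sketch assumes that the idle time is configured in days and is measured from the last successful login. Both points are assumptions for illustration, so confirm the unit in the command reference. A value of 0 disables the restriction, as stated above.

```python
from datetime import datetime, timedelta


def idle_timer_expired(last_login: datetime, now: datetime, idle_days: int) -> bool:
    """True if the account has been idle longer than idle_days (0 = no restriction)."""
    if idle_days == 0:
        return False
    return now - last_login > timedelta(days=idle_days)


if __name__ == "__main__":
    last = datetime(2025, 1, 1)
    print(idle_timer_expired(last, datetime(2025, 4, 16), 90))  # idle for 105 days
    print(idle_timer_expired(last, datetime(2025, 4, 16), 0))   # restriction disabled
```

The same function shows why a system time earlier than the expiry point allows the login: with an earlier value for now, the comparison returns False.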
2. Identify whether the device is enabled with SNMP.
You can attempt to log in to the device through the network management system (NMS):
¡ If SNMP is enabled, use the MIB to change the system time to a point before the idle timer expired, and then log in to the device with this administrator account. The MIB node for changing system time is hh3cSysLocalClock (1.3.6.1.4.1.25506.2.3.1.1.1) in HH3C-SYS-MAN-MIB.
After a successful re-login by the administrator, restore the system time and disable the idle timeout check for user accounts.
¡ If SNMP is disabled, the MIB is not available. You can try to restart the device and enter the EXTENDED-BOOTWARE menu. Then, select either the option to bypass console authentication or the option to bypass the configuration file to access the system. As a best practice, contact Technical Support to perform this step.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, diagnostics information, and prompt messages of the device.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Portal issues
Portal authentication page pushing failures
Symptom
When a user accesses any webpage that is not the portal Web server page, or directly accesses the portal Web server page, no portal authentication page is pushed for the user.
Common causes
The following are the common causes of this type of issue:
· The host, server, and device cannot reach one another.
· HTTP proxy has been enabled on the browser.
· The webpage address entered by the user contains a non-standard TCP port number (neither 80 nor 443).
· Exceptions occur on the intermediate network or DNS server.
· HTTPS redirect on the device is not working properly.
· HTTP Strict Transport Security (HSTS) has been enabled on the HTTPS website the user accesses.
· The portal server cannot identify the escaped characters of special characters in the URL.
· Portal server configuration errors.
Troubleshooting flow
Figure 159 shows the troubleshooting flowchart.
Figure 159 Flowchart for troubleshooting portal authentication page pushing failures
Solution
1. Verify that the route configurations on the client and portal server are correct.
After disabling the firewall on the client, use the ping command to check whether the portal server is reachable. If the server cannot be pinged, first check whether the route configurations on the client and the portal server are correct. Then proceed with the following steps:
¡ Check whether the return route from the portal server to the client is configured correctly.
¡ Check whether multiple NICs are present on the client or the portal server.
If multiple NICs exist, some traffic between the client and the server might not pass through the network configured with portal authentication. Identify the NIC from which the user's Web access traffic is sent out. For example, if a Windows client is used, execute the route print command in the CMD window to view specific route information and identify the NIC.
Finally, use the ping command to test the connectivity for each pair of devices along the network path so as to locate the issue. First, ping the gateway from the client (for successful ping, you must disable authentication first), and then ping the server from the gateway.
2. Check whether HTTP proxy has been enabled on the browser of the client.
If HTTP proxy has been enabled on the browser, users might be unable to access the portal authentication page. You must disable HTTP proxy. For example, open the Windows Internet Explorer browser, click Tools, select Internet Options > Connections > LAN Settings, and then clear the Use a proxy server for your LAN option in the Proxy server area.
3. Check whether the entered address includes a non-standard TCP port number.
Non-standard TCP port numbers refer to port numbers other than 80 or 443. If the webpage address entered by the user includes a non-standard TCP port number (for example, http://10.1.1.1:18008), the portal authentication page might be prevented from popping up. For HTTP addresses, use port 80. For HTTPS addresses, use port 443.
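To check a list of user-reported addresses for this condition, the port test can be scripted. The following Python sketch uses the standard urllib.parse module and treats an address without an explicit port as standard, because the browser then uses 80 or 443 according to the scheme.

```python
from urllib.parse import urlsplit


def uses_nonstandard_port(url: str) -> bool:
    """True if the URL carries an explicit TCP port other than 80 or 443."""
    port = urlsplit(url).port  # None when no port is written in the URL
    return port is not None and port not in (80, 443)


if __name__ == "__main__":
    for address in ("http://10.1.1.1:18008", "http://10.1.1.1", "https://10.1.1.1:443/"):
        print(address, uses_nonstandard_port(address))
```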
4. Check whether exceptions have occurred on the intermediate network or DNS server:
a. Check whether the DNS server IP address is configured as a permitted address on the device.
b. Check the connectivity of the intermediate network and troubleshoot DNS server issues. On the gateway, collect traffic statistics on the downlink interface connecting the client and the uplink interface connecting the DNS server, or mirror and capture the client's packets accessing the DNS server. Confirm whether the gateway has sent out DNS requests, but has not received responses.
5. Check whether HTTPS redirect has been enabled:
a. Check whether the SSL server policy associated with the HTTPS redirect server exists. If not, complete the relevant configuration.
6. Check whether HSTS has been enabled on the HTTPS website.
With HSTS enabled, an HTTPS website requires browsers to access it using HTTPS and the certificate must be valid. When the device redirects the user's browser through HTTPS, the device uses a self-signed certificate (because it does not have the target website's certificate) to impersonate the target website and establish an SSL connection with the browser. If the browser detects the certificate as untrusted, HTTPS redirect will fail, preventing the portal authentication page from popping up. This issue is related to the specific HSTS protocol enforcement requirements set by the website, and cannot be resolved. In this case, try other websites as a best practice.
7. Check whether the portal server can identify the escaped characters of special characters in the URL.
In actual applications, some portal Web servers cannot identify the escaped characters of any combination of special characters $-_.+!*'();,/?:@, so they cannot correctly provide the Web authentication page to users. To resolve this issue, you can execute the portal url-unescape-chars command to unescape these special characters.
# Configure the unescaped special characters in redirect portal Web server URLs as ;().
<Sysname> system-view
[Sysname] portal url-unescape-chars ;()
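To see what unescaping means for a redirect URL, the following Python sketch restores percent-encoded sequences to literal characters, but only for the characters you list. It illustrates the effect on a URL string under the assumption that escaping refers to standard percent-encoding. It does not reproduce the device's implementation, and the URL in the example is hypothetical.

```python
import re


def unescape_chars(url: str, chars: str) -> str:
    """Replace %XX sequences with the literal character when it is in chars."""
    def restore(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in chars else match.group(0)

    return re.sub(r"%([0-9A-Fa-f]{2})", restore, url)


if __name__ == "__main__":
    # Hypothetical redirect URL containing escaped ; ( ) and ?
    escaped = "http://portal.example/login%3Buser%28a%29%3F"
    print(unescape_chars(escaped, ";()"))
```

With ;() as the character set, the sequences %3B, %28, and %29 become literal characters, and %3F stays escaped because ? is not in the set.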
8. Check whether the portal server configuration is correct:
¡ Check whether an IP address group is configured on the portal server and whether the device is associated with an IP address group.
¡ Check whether the client IP address is within the range of the IP address group configured on the portal server.
9. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Device configuration files, log information, and alarm messages.
¡ Screenshots of portal-related configurations on the server.
¡ Files containing the packets captured between the device and the server.
¡ Screenshots of the issue taken on the client's browser.
¡ Portal filtering rules for packet matching displayed after you execute the display portal rule command.
¡ If the issue persists, execute the debugging portal or debugging ip packet command to collect debugging information.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Portal authentication failures
Symptom
Failures or exceptions occur in portal authentication for a user.
Common causes
The following are the common causes of this type of issue:
· The shared key configured in portal authentication server view on the device is inconsistent with that configured on the portal authentication server.
· The address of the portal authentication server configured in portal authentication server view on the device does not exist.
· The portal packets are invalid.
· The authentication domain used by the portal user is configured incorrectly.
· The shared key configured in RADIUS scheme view on the device is inconsistent with that configured on the RADIUS server.
· Failed to obtain the physical information of the user.
· The RADIUS server has denied the authentication.
· The RADIUS server is unresponsive.
· Failed to deploy the authorization ACL or user profile.
Troubleshooting flow
Figure 160 shows the troubleshooting flowchart.
Figure 160 Flowchart for troubleshooting portal authentication failures
Solution
1. Check whether the shared key configured in portal authentication server view on the device is consistent with that configured on the portal authentication server.
If the IMC server is used, enter the username and password, click Log In, and then check whether a message is displayed indicating that the request to the device timed out. If so, the shared key configured in portal authentication server view on the device might be inconsistent with that configured on the server.
You can troubleshoot using the following methods:
¡ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, it indicates that the shared key configured on the device is inconsistent with that configured on the portal server.
*Jul 28 17:51:20:774 2021 Sysname PORTAL/7/ERROR: -MDC=1; Packet validity check failed due to invalid key.
¡ Execute the display portal auth-error-record command to check whether the following information is displayed in the Auth error reason field of the command output: Packet validity check failed due to invalid authenticator.
If the shared keys are inconsistent, modify the shared key configured in portal authentication server view on the device or on the portal authentication server to ensure consistency.
2. Check whether the address of the portal authentication server configured in portal authentication server view on the device exists.
When the device receives an authentication packet sent by the portal server, it validates whether the source IP address of the packet is in the list of portal authentication server addresses configured on the device. If not, the device considers the authentication packet to be invalid and drops it.
If the IMC server is used, enter the username and password, click Log In, and then check whether a message is displayed indicating that the request to the device timed out. If so, the authentication server address configured in portal authentication server view on the device might not exist.
You can troubleshoot using the following methods:
¡ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, it indicates that the IP address of the portal authentication server configured on the device is incorrect.
*Jul 28 19:15:10:665 2021 Sysname PORTAL/7/ERROR: -MDC=1;Packet source unknown. Server IP:192.168.161.188, VRF Index:0.
¡ Execute the display portal auth-error-record command to check whether the following information is displayed in the Auth error reason field of the command output: Packet source unknown. Server IP:X.X.X.X, VRF index:0.
If the address is incorrect, execute the ip command to modify the portal server's IP address in portal authentication server view on the device.
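When many debugging lines are collected, the rejected source address can be extracted automatically and compared with the addresses you have configured. The following Python sketch parses the Packet source unknown message shown in this step. The list of configured addresses passed to the function is a hypothetical input that you supply yourself.

```python
import ipaddress
import re

# Matches the message text shown in this step ("Index" or "index").
UNKNOWN_SOURCE = re.compile(
    r"Packet source unknown\. Server IP:\s*([0-9A-Fa-f:.]+), VRF [Ii]ndex:\s*(\d+)"
)


def unknown_portal_source(line: str):
    """Return (ip_address, vrf_index) from the message, or None if absent."""
    match = UNKNOWN_SOURCE.search(line)
    if not match:
        return None
    return ipaddress.ip_address(match.group(1)), int(match.group(2))


def is_configured(line: str, configured_servers) -> bool:
    """True if the rejected source is in the caller-supplied server list."""
    parsed = unknown_portal_source(line)
    if parsed is None:
        return False
    return parsed[0] in {ipaddress.ip_address(s) for s in configured_servers}


if __name__ == "__main__":
    log = ("*Jul 28 19:15:10:665 2021 Sysname PORTAL/7/ERROR: -MDC=1;"
           "Packet source unknown. Server IP:192.168.161.188, VRF Index:0.")
    print(unknown_portal_source(log))
    print(is_configured(log, ["192.168.161.18"]))  # hypothetical configured address
```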
3. Check whether the portal packets are invalid.
Upon receiving a portal packet sent by the portal server, the device performs a validity check on the packet. If the packet length is incorrect, or errors exist on the packet checksum, the packet will be considered as invalid and dropped.
You can check whether the portal packet is invalid using the following methods:
¡ Execute the display portal packet statistics command to check whether invalid packets exist and whether the number of invalid packets is increasing. If invalid packets exist, execute the debugging portal error command on the device to enable portal error debugging for troubleshooting.
¡ Execute the display portal auth-error-record command to check whether the following information is displayed in the Auth error reason field of the command output: Packet type invalid or Packet validity check failed because packet length and version don't match.
If the portal packets are invalid, identify the reason for invalidity and make modifications accordingly.
4. Check the authentication domain configuration used by the portal user:
The device selects the authentication domain for a portal user in this order:
a. ISP domain specified for the interface.
b. ISP domain carried in the username.
c. System default ISP domain.
If the chosen domain does not exist on the device, the device searches for the ISP domain configured to accommodate users assigned to nonexistent domains. If no such ISP domain is configured, user authentication fails.
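The selection order above can be summarized in a few lines of code. The following Python sketch is a simplified model for reasoning about which domain a user lands in. It assumes that the domain is carried in the username in the form user@domain. The parameter names, including the fallback domain for users assigned to nonexistent domains, are hypothetical.

```python
def select_auth_domain(interface_domain, username, default_domain,
                       existing_domains, fallback_domain=None):
    """Return the ISP domain used for authentication, or None if authentication fails."""
    if interface_domain:                      # a. domain specified for the interface
        chosen = interface_domain
    elif "@" in username:                     # b. domain carried in the username
        chosen = username.rsplit("@", 1)[1]
    else:                                     # c. system default domain
        chosen = default_domain

    if chosen in existing_domains:
        return chosen
    # The chosen domain does not exist: use the domain configured to
    # accommodate users assigned to nonexistent domains, if there is one.
    if fallback_domain in existing_domains:
        return fallback_domain
    return None


if __name__ == "__main__":
    domains = {"system", "portal-dm"}
    print(select_auth_domain(None, "alice@portal-dm", "system", domains))
    print(select_auth_domain(None, "bob@missing", "system", domains))
```

A result of None corresponds to the case in which user authentication fails because neither the chosen domain nor a fallback domain is available.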
Execute the display portal command to check whether an authentication domain is used on the authentication interface.
¡ If an authentication domain is used, check whether the authentication domain exists on the device, and whether the authentication, authorization, and accounting configurations in the domain are configured correctly.
¡ If no authentication domain is used, check whether the domain included in the username exists. If the domain does not exist, check whether the default authentication domain exists and whether the configuration in the default authentication domain is correct.
If the IMC server is used, enter the username and password, click Log In, and then check whether a message is displayed indicating that the request was rejected. If so, the authentication domain configuration on the device might be incorrect.
You can troubleshoot using the following methods:
¡ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, it indicates that the authentication domain is configured incorrectly on the device and further troubleshooting is required.
*Jul 28 19:49:12:725 2021 Sysname PORTAL/7/ERROR: -MDC=1; User-SM [21.0.0.21]: AAA processed authentication request and returned error.
¡ Execute the display portal auth-error-record command to check whether the following information is displayed in the Auth error reason field of the command output: AAA authentication failed or AAA returned an error.
If the authentication domain is configured incorrectly, execute the related command to configure a correct authentication domain used by the portal user.
5. Check whether the shared key configured in RADIUS scheme view on the device is consistent with that configured on the RADIUS server.
If the IMC server is used, enter the username and password, click Log In, and then check whether a message is displayed indicating that the request to the device timed out. If so, the shared key configured in RADIUS scheme view might be inconsistent with that configured on the server.
Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, it indicates that the shared key configured in RADIUS scheme view is inconsistent with that configured on the server.
*Jul 28 19:49:12:725 2021 Sysname RADIUS/7/ERROR: -MDC=1; The response packet has an invalid Response Authenticator value.
When the device initiates an authentication request to the RADIUS server, the server first validates the shared key used in the request. If the validation fails, the server notifies the device of the failure. If the shared key configuration is incorrect, make sure the shared key configured in the RADIUS scheme view is consistent with that configured on the server.
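The invalid Response Authenticator message in this step follows from how RADIUS replies are verified. The following minimal sketch is based on RFC 2865, not on the device's implementation: the Response Authenticator is an MD5 digest over the reply header, the Request Authenticator, the reply attributes, and the shared key, so a device that holds a different key computes a different digest and discards the reply. The function names and sample keys are hypothetical.

```python
import hashlib
import hmac
import struct

ACCESS_ACCEPT = 2  # RADIUS code for Access-Accept (RFC 2865)


def response_authenticator(code, identifier, request_auth, attributes, secret):
    """Return MD5(Code + ID + Length + RequestAuth + Attributes + Secret)."""
    length = 20 + len(attributes)  # 20-byte RADIUS header plus attribute bytes
    header = struct.pack("!BBH", code, identifier, length)
    return hashlib.md5(header + request_auth + attributes + secret).digest()


def reply_is_valid(received_auth, code, identifier, request_auth,
                   attributes, device_secret):
    """Verify a reply the way a RADIUS client does, using its own shared key."""
    expected = response_authenticator(
        code, identifier, request_auth, attributes, device_secret)
    return hmac.compare_digest(received_auth, expected)


# Hypothetical sample values for illustration only.
request_auth = bytes(range(16))
signed_by_server = response_authenticator(
    ACCESS_ACCEPT, 1, request_auth, b"", b"server-key")
```

Calling reply_is_valid with b"server-key" accepts the reply, while calling it with any other key rejects it, which corresponds to the debugging output shown in this step.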
6. Check whether the device failed to obtain physical information about the user.
During the user's onboarding process, portal searches for the user's physical information, and identifies information such as the interface through which the user accesses based on the corresponding physical information. If the search for physical information fails, the user will fail to come online.
You can troubleshoot using the following methods:
¡ Execute the debugging portal event command on the device to enable portal event debugging. If the following information is displayed on the device, it indicates that the device failed to obtain physical information about the user.
*Jul 28 19:49:12:725 2021 Sysname PORTAL/7/ERROR: -MDC=1; User-SM [21.0.0.21]: Failed to find physical info for ack_info.
¡ Execute the display portal auth-error-record or display portal auth-fail-record command to check whether the following information is displayed in the Auth error reason field of the command output: Failed to obtain user physical information or Failed to get physical information.
After you confirm that obtaining the user's physical information failed, check whether an entry for the authentication user exists on the device. If no entry exists, go to the next step.
7. Check whether the RADIUS server has rejected the authentication:
a. Many reasons might cause the RADIUS server to reject the authentication of a user. Most common ones include incorrect username or password, or failure in matching the RADIUS server's authorization policy. To resolve these issues, first check the authentication logs on the server, or enable RADIUS error debugging on the device by using the debugging radius error command to view the relevant debugging information. After identifying the root causes, adjust the configurations on the server, client, or device accordingly.
b. Execute the display portal auth-fail-record command to identify the portal authentication failure reason for the user displayed in the Auth error reason field of the command output.
8. Check whether the RADIUS server is unresponsive.
You can troubleshoot using the following methods:
¡ Execute the display radius scheme command and view the server's state displayed in the State field. If the state is Blocked, it indicates the server is unavailable.
¡ Check whether the device prints the following log:
RADIUS/4/RADIUS_AUTH_SERVER_DOWN: -MDC=1; RADIUS authentication server was blocked: server IP=192.168.161.188, port=1812, VPN instance=public.
¡ Execute the debugging radius event command on the device to enable debugging for RADIUS events. If the following information is printed on the device, it indicates that the RADIUS server is unresponsive.
*Jul 28 19:49:12:725 2021 Sysname RADIUS/7/EVENT: -MDC=1; Reached the maximum retries.
After confirming that the RADIUS server is unresponsive, proceed with the following steps:
a. Check whether the device's IP address has been added on the server.
- If not, add the device's IP address on the server.
- If yes, make sure the device IP address added on the server is the source IP address of the authentication request. By default, the source IP address of RADIUS packets sent to the RADIUS server is the IP address of the outgoing interface for these packets.
b. View packets on both the device and the server, and check whether exceptions have occurred on the intermediate links. For example, a firewall in the intermediate network might not allow RADIUS packets (default authentication port 1812) to pass through. If a large number of users cannot be authenticated and RADIUS server down records appear in the logs on the device, it is highly probable that exceptions have occurred on the server or in the intermediate network, and further checks are required.
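When many server-down records appear, it can help to extract the affected server from each log line before checking the intermediate network. The following sketch parses the RADIUS_AUTH_SERVER_DOWN message format shown in this step. The function name is hypothetical, and the pattern assumes that the VPN instance name contains no periods or spaces.

```python
import re

# Pattern derived from the sample RADIUS_AUTH_SERVER_DOWN log in this step.
SERVER_DOWN = re.compile(
    r"RADIUS_AUTH_SERVER_DOWN:.*?server IP=(?P<ip>[^,\s]+),\s*"
    r"port=(?P<port>\d+),\s*VPN instance=(?P<vpn>[^.\s]+)"
)


def parse_server_down(line):
    """Return the blocked server's IP, port, and VPN instance, or None."""
    match = SERVER_DOWN.search(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "port": int(match.group("port")),
        "vpn": match.group("vpn"),
    }


sample = ("RADIUS/4/RADIUS_AUTH_SERVER_DOWN: -MDC=1; RADIUS authentication "
          "server was blocked: server IP=192.168.161.188, port=1812, "
          "VPN instance=public.")
blocked = parse_server_down(sample)
```

The extracted IP address and port identify which server address to verify on the server side and which port an intermediate firewall must permit.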
9. Check whether the authorization ACL or user profile failed to be deployed.
With portal strict checking enabled, if the authorized ACL or user profile does not exist on the device or fails to be deployed, the device forces the portal user offline.
a. Execute the display portal command and view the Strict checking field to check whether strict checking is enabled on the device. Then determine whether you need to enable strict checking. If it is not required, disable it directly. If it is required, go to step b.
b. Execute the display acl or display user-profile command on the device to check whether the ACL or user profile authorized by the AAA server exists. If the ACL or user profile does not exist, determine whether authorization by the server is required. If yes, add the corresponding ACL or user profile configuration on the device.
10. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Information collected after you execute the display portal auth-error-record or display portal auth-fail-record command.
¡ Screenshots of portal-related configurations on the portal server.
¡ Files containing the packets captured between the device and the AAA server.
¡ Screenshots of the issue taken on the client's browser.
¡ Debugging information collected after you enable the debugging portal command.
Related alarm and log messages
Alarm messages
N/A
Log messages
RADIUS/4/RADIUS_AUTH_SERVER_DOWN
Portal authentication user disconnections
A portal user is disconnected after coming online for a period of time.
Common causes
The following are the common causes of this type of issue:
· The user session has timed out.
· User idle cut.
· Accounting update failures.
· User traffic reaches the threshold.
· The server forces the user offline.
· The user failed the online detection.
· The interface where the user accesses is down.
Troubleshooting flow
Figure 161 shows the troubleshooting flowchart.
Figure 161 Flow chart for troubleshooting portal authentication user disconnections
Solution
1. Execute the portal logout-record enable command to enable portal user offline recording.
2. Check whether the user session has timed out.
If the AAA server has deployed the session timeout time (single online duration) to the portal user, once the user's online duration exceeds the timeout time, the device logs out the user.
Use the following methods to check whether the portal user goes offline because of session timeout:
¡ View the user offline records on the AAA server.
¡ Execute the display portal logout-record command to view the user logout reason.
<Sysname> display portal logout-record all
Total logout records: 1
User name : gkt
User MAC : 0800-2700-94ad
Interface : Vlan-interface100
User IP address : 21.0.0.20
AP : N/A
SSID : N/A
User login time : 2021-07-29 11:05:58
User logout time : 2021-07-29 11:05:58
Logout reason : Session timeout
¡ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, it indicates that the portal user is logged out because of user session timeout.
*Jul 28 17:51:20:774 2021 Sysname PORTAL/7/ERROR: -MDC=1; Session timer timed out and the user will be logged off.
The user logout triggered by session timeout is a normal logout. The user can come online again.
3. Check whether the user goes offline because of user idle cut.
With the idle cut feature configured, if the device or the AAA server has authorized an idle timeout period for the user, the device periodically checks the user's traffic after the user comes online. If the user's traffic generated within the specified idle timeout period is less than the specified data volume, the user will be forced offline.
You can use the following methods to check whether the portal user goes offline because of idle cut:
¡ View the user logout records on the AAA server.
¡ Execute the display portal logout-record command to view the user offline records.
<Sysname> display portal logout-record all
Total logout records: 1
User name : gkt
User MAC : 0800-2700-94ad
Interface : Vlan-interface100
User IP address : 21.0.0.20
AP : N/A
SSID : N/A
User login time : 2021-07-29 11:05:58
User logout time : 2021-07-29 11:05:58
Logout reason : Idle timeout
¡ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, it indicates that the portal user goes offline because of idle cut timer timeout.
*Jul 28 17:51:20:774 2021 Sysname PORTAL/7/ERROR: -MDC=1; Idle-cut timer timed out and the user will be logged off.
The logout triggered by idle timeout is a normal logout. The user can come online again.
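The idle cut decision described in this step can be sketched as follows. This is an illustration only: it assumes that traffic is sampled once per minute and that the device evaluates consecutive, non-overlapping windows of the idle timeout length. The actual check interval on the device is not described in this guide, and the function name is hypothetical.

```python
def first_idle_cut(per_minute_bytes, idle_minutes, min_bytes):
    """Return the minute at which the user would be logged out, or None.

    per_minute_bytes: traffic generated by the user in each minute online.
    idle_minutes:     authorized idle timeout period, in minutes.
    min_bytes:        minimum traffic required within one idle timeout period.
    """
    for end in range(idle_minutes, len(per_minute_bytes) + 1, idle_minutes):
        window = per_minute_bytes[end - idle_minutes:end]
        if sum(window) < min_bytes:
            return end  # traffic in this window is below the threshold
    return None


# Hypothetical traffic samples: active for two minutes, then nearly idle.
logout_minute = first_idle_cut([5000, 4000, 100, 50, 0, 0], 2, 1024)
```

With these samples, the first window carries 9000 bytes and passes the check, and the second window carries 150 bytes, so the user is cut at the end of minute 4.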
4. Check whether the user goes offline because of accounting update failures.
When a remote portal authentication user comes online, the device periodically sends accounting-update packets to the AAA server. If the link between the device and the AAA server is disconnected or the server fails, the device fails to send accounting-update packets. If the packets still fail to be sent after the maximum number of retransmissions is reached and the accounting update failure policy has been configured on the device, the user is forced offline. The accounting update failure policy is configured by using the accounting update-fail offline command.
You can use the following methods to check whether the user goes offline because of accounting update failures:
¡ Execute the display portal logout-record command to view the user offline records.
<Sysname> display portal logout-record all
Total logout records: 1
User name : gkt
User MAC : 0800-2700-94ad
Interface : Vlan-interface100
User IP address : 21.0.0.20
AP : N/A
SSID : N/A
User login time : 2021-07-29 11:05:58
User logout time : 2021-07-29 11:05:58
Logout reason : Accounting update failure
¡ Execute the display interface command to check whether the state of the port that connects the device to the AAA server has changed, or check whether the AAA server has any exception records. Alternatively, execute the display radius scheme command to check whether Block is displayed in the State field (indicating the state of the server). If yes, the reason for the user logout might be accounting update failures.
¡ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, it indicates that the portal user goes offline because of accounting update failures.
*Jul 28 17:51:20:774 2021 Sysname PORTAL/7/ERROR: -MDC=1; Processed accounting-update failed and user logout.
If you confirm that the user goes offline because of accounting update failures, check the link state between the device and the server, and check whether the relevant accounting configurations on the device and the AAA server have changed.
5. Check whether the user's traffic has reached the threshold.
When a user comes online, if the AAA server deploys a traffic threshold, the device will force the user offline once the user's traffic exceeds the deployed threshold.
You can use the following methods to check whether the user goes offline because the traffic threshold is reached:
¡ Check the user offline records on the AAA server.
¡ Execute the display portal logout-record command to view the user offline records.
<Sysname> display portal logout-record all
Total logout records: 1
User name : gkt
User MAC : 0800-2700-94ad
Interface : Vlan-interface100
User IP address : 21.0.0.20
AP : N/A
SSID : N/A
User login time : 2021-07-29 11:05:58
User logout time : 2021-07-29 11:05:58
Logout reason : User traffic reached threshold
The user logout triggered by traffic threshold reaching is a normal logout. The user can come online again.
6. Check whether the AAA server actively kicks the user offline.
After the RADIUS session-control feature is enabled on the device (by using the radius session-control enable command), the device immediately forces a user offline upon receiving a disconnection request from the AAA server. If the feature is enabled, you can use the following methods to check whether the user is forced offline by the AAA server:
¡ View the user offline records on the AAA server.
¡ Execute the display portal logout-record command to view the user offline records.
<Sysname> display portal logout-record all
Total logout records: 1
User name : gkt
User MAC : 0800-2700-94ad
Interface : Vlan-interface100
User IP address : 21.0.0.20
AP : N/A
SSID : N/A
User login time : 2021-07-29 11:05:58
User logout time : 2021-07-29 11:05:58
Logout reason : Force logout by RADIUS server
For more information about the reasons for the forcible user logout, contact the server administrator.
7. Check whether the user goes offline because of online detection failures.
If the portal user online detection feature is enabled on the device (using the portal user-detect command), the device periodically sends detection packets to the user client. If the device has not received a response from the client after the specified maximum number of attempts, it will force the user offline.
Check whether the portal user online detection feature is enabled on the device. If the feature is enabled, you can use the following methods to check whether the user goes offline because of user online detection failures:
¡ View the user offline records on the AAA server.
¡ Execute the display portal logout-record command to view the user offline records.
<Sysname> display portal logout-record all
Total logout records: 1
User name : gkt
User MAC : 0800-2700-94ad
Interface : Vlan-interface100
User IP address : 21.0.0.20
AP : N/A
SSID : N/A
User login time : 2021-07-29 11:05:58
User logout time : 2021-07-29 11:05:58
Logout reason : User detection failure
After you confirm that the user goes offline because of user online detection failures, check the link state between the client and the device, and identify the reasons why the client does not respond to the detection packet.
8. Check whether the interface through which the portal user accesses is down.
If the interface used by the portal user goes down for a period of time, the device forces all portal users accessing through this interface offline.
You can use the following methods to check whether the user has gone offline because the interface is down:
¡ View the user logout records on the AAA server.
¡ Execute the display interface command to check whether the state of the interface changed. If the interface's state changed and the change time is close to the time when the user went offline, the reason for the user logout might be interface down.
¡ Execute the display portal logout-record command to view the user logout records.
<Sysname> display portal logout-record all
Total logout records: 1
User name : gkt
User MAC : 0800-2700-94ad
Interface : Vlan-interface100
User IP address : 21.0.0.20
AP : N/A
SSID : N/A
User login time : 2021-07-29 11:05:58
User logout time : 2021-07-29 11:05:58
Logout reason : Interface down
If you confirm that the user goes offline because the interface is down, identify the reason why the interface went down, such as a loosely connected network cable.
9. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
¡ Screenshots of portal-related configurations on the portal server.
¡ User logout records on the AAA server.
¡ Files containing the packets captured between the device and the AAA server.
¡ Screenshots of the issue taken on the client's browser.
¡ Debugging information collected after you enable the debugging portal command.
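The logout reasons covered in this procedure can be looked up programmatically when many records must be reviewed. The following sketch parses text in the display portal logout-record format shown above and maps each Logout reason value to the follow-up action given in the corresponding step. The function and variable names are hypothetical, and the parser assumes that each record starts with a User name line and uses the first colon on a line as the field separator.

```python
# Follow-up actions taken from the steps of this procedure.
NEXT_ACTION = {
    "Session timeout": "Normal logout. The user can come online again.",
    "Idle timeout": "Normal logout. The user can come online again.",
    "User traffic reached threshold":
        "Normal logout. The user can come online again.",
    "Accounting update failure":
        "Check the link to the AAA server and the accounting configuration.",
    "Force logout by RADIUS server": "Contact the server administrator.",
    "User detection failure":
        "Check the link between the client and the device.",
    "Interface down": "Identify why the access interface went down.",
}


def parse_logout_records(text):
    """Split command output into one dictionary per logout record."""
    records, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # prompt lines and blank lines carry no field
        key, value = key.strip(), value.strip()
        if key == "User name":
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


sample = """<Sysname> display portal logout-record all
Total logout records: 1
User name : gkt
User MAC : 0800-2700-94ad
Interface : Vlan-interface100
User IP address : 21.0.0.20
User login time : 2021-07-29 11:05:58
User logout time : 2021-07-29 11:05:58
Logout reason : Idle timeout
"""
records = parse_logout_records(sample)
advice = NEXT_ACTION.get(records[0]["Logout reason"], "Contact Technical Support.")
```

Reasons that are not listed in the table fall through to the last step of the procedure.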
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting security issues
Troubleshooting SSH
Failure to log in to the device from the SSH client
Symptom
The SSH client fails to log in to the device that acts as the SSH server.
Common causes
The following are the common causes of this type of issue:
· The SSH client cannot reach the device.
· The device is not enabled with the SSH server function.
· An SSH login control ACL is specified on the device, but the ACL does not permit the IP address of the SSH client.
· The service port specified by the client does not match the server port.
· The SSH version on the device is not compatible with the client.
· No local key pairs are generated on the device.
· The public key on the server is inconsistent with that cached on the client.
· The authentication method or access protocol for a user line is incorrectly configured.
· No SSH service is configured in local user view on the device.
· The service type or authentication method for the SSH user is incorrectly configured.
· The SSH2 algorithms on the device are not compatible with those on the client.
· The device does not have enough VTY user line resources.
· The number of SSH login users on the device reaches the upper limit.
Troubleshooting flow
Figure 162 shows the troubleshooting flowchart.
Figure 162 Flowchart for troubleshooting SSH login failure
Solution
1. Verify that the client can ping the device.
Execute the ping command to check network connectivity.
¡ If the ping fails, see the ping troubleshooting guide to locate the issue and make sure the SSH client can ping the SSH server.
¡ If the ping succeeds, proceed to step 2.
2. Verify that the SSH server function is enabled.
If the following log message occurs, the SSH server function is disabled:
SSHS/6/SSHS_SRV_UNAVAILABLE: The SCP server is disabled or the SCP service type is not supported.
Execute the display ssh server status command to identify whether the Stelnet server function, SFTP server function, NETCONF over SSH server function, and SCP server function are enabled as needed.
<Sysname> display ssh server status
Stelnet server: Disable
SSH version : 2.0
SSH authentication-timeout : 60 second(s)
SSH server key generating interval : 0 hour(s)
SSH authentication retries : 3 time(s)
SFTP server: Disable
SFTP Server Idle-Timeout: 10 minute(s)
NETCONF server: Disable
SCP server: Disable
¡ If the SSH server function is disabled, execute the following commands to enable related SSH server functions:
<Sysname> system-view
[Sysname] ssh server enable
[Sysname] sftp server enable
[Sysname] scp server enable
[Sysname] netconf ssh server enable
¡ If the SSH server function is enabled, proceed to step 3.
3. Identify whether an SSH login control ACL is configured.
Identify whether an SSH login control ACL is specified by the ssh server acl command.
¡ If an SSH login control ACL is configured, identify whether the specified ACL permits the IP address of the client.
If the following log message occurs, the specified ACL denies the IP address of the client:
SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.10 was denied by ACL rule (rule ID=20).
SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.11 was denied by ACL rule (default rule).
- If the specified ACL denies the IP address of the client, edit the SSH login control ACL to permit the IP address of the client. If no SSH clients require login control, remove the SSH login control settings.
- If the specified ACL already permits the IP address of the client, proceed to step 4.
¡ If no SSH login control ACL is configured, proceed to step 4.
4. Identify whether the SSH service port on the client matches that on the server.
If the SSH service port on the server changes, but the client still uses the default SSH service port, the SSH login will fail.
For example, when an H3C device acts as the client, the following error message occurs: Failed to connect to host 10.1.1.1 port 100.
¡ If the SSH service port on the client does not match that on the server, execute the display current-configuration | include ssh command to view the SSH service port on the server, and then change the SSH service port on the client to that on the server.
¡ If the SSH service port on the client matches that on the server, proceed to the next step.
5. Identify whether the SSH version on the server is compatible with that on the client.
If the following log message occurs, the SSH version on the device is not compatible with that on the client:
SSHS/6/SSHS_VERSION_MISMATCH: SSH client 192.168.30.117 failed to log in because of version mismatch.
If an SSH1 client logs in to the device, you can execute the display ssh server status command on the device to identify the SSH version from the SSH version field.
¡ If the SSH version field displays 1.99, the device is compatible with SSH1 clients. Then, proceed to the next step.
¡ If the SSH version field displays 2.0, execute the ssh server compatible-ssh1x enable command on the device to enable the device to support SSH1 clients.
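The port and version checks in the two preceding steps can also be performed from a management host. An SSH server announces its protocol version in the identification string that it sends when the TCP connection opens, in the form SSH-protoversion-softwareversion (RFC 4253), and a protocol version of 1.99 means that the server also accepts SSH1 clients. The following is a minimal sketch. The function names are hypothetical, and the software version in the test string is a placeholder.

```python
import socket


def read_banner(host, port=22, timeout=5.0):
    """Connect to host:port and return the SSH identification line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("rb")
        for _ in range(20):  # a server may send other lines first
            line = reader.readline().decode("ascii", "replace")
            if not line or line.startswith("SSH-"):
                return line
    return ""


def protocol_version(banner):
    """Extract protoversion from 'SSH-protoversion-softwareversion'."""
    parts = banner.strip().split("-", 2)
    if len(parts) < 3 or parts[0] != "SSH":
        raise ValueError("not an SSH identification string")
    return parts[1]


def accepts_ssh1(version):
    """A server that announces 1.99 or 1.x accepts SSH1 clients."""
    return version == "1.99" or version.startswith("1.")


def accepts_ssh2(version):
    """A server that announces 2.0 or 1.99 accepts SSH2 clients."""
    return version in ("2.0", "1.99")
```

If read_banner raises a connection error or times out on the port that the client uses, the client is probably using the wrong service port or the server function is disabled. If the announced version is 2.0 and the client supports only SSH1, enable SSH1 compatibility as described in this step.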
6. Identify whether the server generates a local key pair.
When the device acts as the SSH server, you must configure a local asymmetric key pair. A client uses only one of DSA, ECDSA, or RSA public key algorithms to authenticate the server, but different clients support different algorithms. To ensure successful client login, generate DSA, ECDSA, and RSA key pairs on the server as a best practice.
Execute the display public-key local public command on the device to view local public key information on the device.
¡ If no DSA, ECDSA, or RSA key pair exists, execute the public-key local create command to configure these key pairs in sequence.
¡ If these key pairs are configured, proceed to the next step.
7. Identify whether the public key on the server is consistent with that cached on the client.
If the client chooses to save the server's public key upon the first login, updating the server's local key pair will cause the client to fail to authenticate the server.
This example uses an H3C device as the client. If the following message occurs upon client login, the public key on the server is inconsistent with that cached on the client:
The server's host key does not match the local cached key. Either the server administrator has changed the host key, or you connected to another server pretending to be this server. Please remove the local cached key, before logging in!
¡ If the inconsistency occurs, execute the undo public-key peer command to delete the old server public key saved on the client.
¡ If the inconsistency does not exist, proceed to the next step.
8. Identify whether the authentication method and the access protocol for a VTY user line are configured correctly.
If the client is an Stelnet or NETCONF over SSH client, execute the display this command in VTY user line view to identify whether the authentication method is scheme and SSH is specified as an access protocol.
[Sysname] line vty 0 63
[Sysname-line-vty0-63] display this
#
line vty 0 63
authentication-mode scheme
user-role network-admin
idle-timeout 0 0
#
¡ If the authentication method or access protocol is configured incorrectly, change the authentication method to scheme and specify SSH as one of the access protocols.
¡ If the configuration is correct, proceed to step 9.
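The check in this step can be scripted against saved display this output. The following sketch reports a problem when the authentication method is not scheme or when a protocol inbound setting excludes SSH. It assumes that a user line without a protocol inbound setting accepts all protocols, which is consistent with the sample output above but should be verified for your software version. The function name is hypothetical.

```python
def check_vty_line(config):
    """Return a list of problems found in VTY user line configuration text."""
    auth_mode = None
    protocol = None  # None means no protocol inbound line was found
    for raw in config.splitlines():
        words = raw.split()
        if words[:1] == ["authentication-mode"] and len(words) > 1:
            auth_mode = words[1]
        elif words[:2] == ["protocol", "inbound"] and len(words) > 2:
            protocol = words[2]

    problems = []
    if auth_mode != "scheme":
        problems.append("authentication method is not scheme")
    # Assumption: a missing protocol inbound line means all protocols.
    if protocol is not None and protocol not in ("ssh", "all"):
        problems.append("SSH is not an allowed access protocol")
    return problems


sample = """#
line vty 0 63
 authentication-mode scheme
 user-role network-admin
 idle-timeout 0 0
#
"""
problems = check_vty_line(sample)
```

An empty list means that the user line passes both checks and you can continue with the next step.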
9. Identify whether the SSH service is authorized to the local user.
Execute the display this command in local user view to identify whether the SSH service is authorized to the local user.
[Sysname] local-user test
[Sysname-luser-manage-test] display this
#
local-user test class manage
service-type ssh
authorization-attribute user-role network-admin
authorization-attribute user-role network-operator
#
¡ If the SSH service is not authorized, execute the service-type command in local user view to specify the SSH service.
¡ If the SSH service is authorized, proceed to the next step.
If remote authentication is configured, locate issues as described in the AAA troubleshooting guide.
10. Identify whether an SSH user is configured and the correct service type and authentication method are specified for the SSH user.
SSH supports Stelnet, SFTP, NETCONF, and SCP service types.
First, identify whether the SSH user is created correctly based on the authentication method on the server.
¡ If the server uses the publickey authentication method, you must create an SSH user and a local user on the device. The two users must have the same username, so that the SSH user can be assigned the correct working directory and user role.
¡ If the server uses the password authentication method, you must perform one of the following tasks:
- For local authentication, configure a local user on the device.
- For remote authentication, configure an SSH user on a remote authentication server, for example, a RADIUS server. You do not need to create an SSH user on the device. However, if such an SSH user has been created, make sure you have specified the correct service type and authentication method.
¡ If the server uses the keyboard-interactive, password-publickey, or any authentication method, you must create an SSH user on the device and perform one of the following tasks:
- For local authentication, configure a local user on the device.
- For remote authentication, configure an SSH user on a remote authentication server, for example, a RADIUS server.
Then, perform the following operations based on the result of the previous step:
¡ If no SSH user is created and required, proceed to the next step. If no SSH user is created but an SSH user is required, execute the ssh user command to create an SSH user.
¡ If an SSH user has been created, check the service type and authentication method for the SSH user.
To avoid login failure, the service type of the SSH user must match the client type, which can be Stelnet, SFTP, SCP, or NETCONF over SSH. Identify whether the service type for the SSH user is correct.
Take the SCP client as an example. If the following log message occurs on the device, the service type does not match the client type:
SSHS/6/SSHS_SRV_UNAVAILABLE: The SCP server is disabled or the SCP service type is not supported.
Then, perform the following operations:
- Execute the ssh user command in system view on the device to edit the service type for the SSH user.
- Execute the display ssh user-information command on the device to view the authentication method used by the SSH server. Identify whether the SSH user on the device is configured correctly based on the authentication method.
11. Identify whether algorithms for SSH2 on the device match the client.
Execute the display ssh2 algorithm command to view algorithms used by SSH2 to identify whether these algorithms include those supported by the client. For example, if the device is configured to not use CBC-related encryption algorithms, but the SSH client supports only CBC-related algorithms, the client will be unable to log in to the server.
If the following log message occurs on the device, the SSH2 algorithms on the device do not match those on the client:
SSHS/6/SSHS_ALGORITHM_MISMATCH: SSH client 192.168.30.117 failed to log in because of encryption algorithm mismatch.
¡ If the algorithms on the client do not match those on the device, perform one of the following operations as needed:
- Execute the ssh2 algorithm cipher, ssh2 algorithm key-exchange, ssh2 algorithm mac, or ssh2 algorithm public-key command on the device to add algorithms supported by the client.
- Add algorithms supported by the server on the client.
¡ If the algorithms on the client match those on the device, proceed to the next step.
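The mismatch in this step can be reasoned about as a list intersection. Per RFC 4253, the encryption algorithm chosen for a session is the first algorithm in the client's preference list that the server also supports, and the connection fails if no such algorithm exists. The following sketch illustrates that rule for ciphers only. The function name and the algorithm lists are hypothetical examples, not the device defaults.

```python
def negotiate(client_prefs, server_algs):
    """Return the first client-preferred algorithm the server supports."""
    supported = set(server_algs)
    for name in client_prefs:
        if name in supported:
            return name
    return None  # no common algorithm: the login fails


# A client that supports only CBC ciphers against a server without CBC.
client_ciphers = ["aes128-cbc", "3des-cbc"]
server_ciphers = ["aes128-ctr", "aes256-ctr"]
before = negotiate(client_ciphers, server_ciphers)

# Adding one client-supported cipher on the server resolves the mismatch.
after = negotiate(client_ciphers, server_ciphers + ["aes128-cbc"])
```

Here before is None and after is aes128-cbc, which mirrors the two remedies in this step: add an algorithm on one side so that the two lists overlap.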
12. Identify whether the number of VTY users on the device reaches the upper limit.
Both SSH and Telnet users log in using VTY user lines, but VTY user lines are limited resources. If all VTY user lines are occupied, clients using Stelnet and NETCONF over SSH cannot log in. Clients using SFTP and SCP do not occupy user lines and can still log in.
The number of VTY users on the device reaches the upper limit if the following log message occurs:
SSHS/6/SSHS_REACH_USER_LIMIT: SSH client 192.168.30.117 failed to log in, because the number of users reached the upper limit.
Execute the display line command to identify whether VTY user lines are sufficient.
¡ If VTY user line resources are insufficient, change the authentication method for idle VTY lines with non-scheme authentication to scheme. If all VTY lines already use scheme authentication and are active, execute free line vty to forcibly release VTY lines. This allows new SSH users to come online.
¡ If VTY user line resources are sufficient, proceed to the next step.
13. Identify whether the number of online SSH users reaches the upper limit.
Execute the display ssh server session command to view session information on the server and the maximum number of SSH connections set by the aaa session-limit ssh command.
The number of online SSH users reaches the upper limit if the following log message occurs on the device:
SSHS/6/SSHS_REACH_SESSION_LIMIT: SSH client 192.168.30.117 failed to log in. The number of SSH sessions is 10, and exceeded the limit (10).
SSHS/6/SSHS_REACH_SESSION_LIMIT: SSH client 192.168.30.117 failed to log in. The current number of SSH sessions is 10. The maximum number allowed is 10.
¡ If the number of SSH sessions has reached the upper limit, execute aaa session-limit ssh to increase the upper limit. If the configured maximum number of user connections has reached the upper limit, disconnect idle SSH clients from the client side. This will allow new SSH users to come online.
¡ If the upper limit is not reached, proceed to the next step.
14. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module: HH3C-SSH-MIB
hh3cSSHVersionNegotiationFailure (1.3.6.1.4.1.25506.2.22.1.3.0.2)
Log messages
· SSHS/5/SSH_ACL_DENY
· SSHS/6/SSHS_ALGORITHM_MISMATCH
· SSHS/6/SSHS_REACH_SESSION_LIMIT
· SSHS/6/SSHS_REACH_USER_LIMIT
· SSHS/6/SSHS_SRV_UNAVAILABLE
· SSHS/6/SSHS_VERSION_MISMATCH
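When device logs are available, the log messages listed above point directly to a step in the Solution for this issue. The following sketch extracts the SSHS log mnemonic from a log line and returns the matching step numbers. The mapping is taken from the steps above, and the function name is hypothetical.

```python
import re

# Log mnemonic -> step numbers in the Solution for this issue.
STEPS_FOR_LOG = {
    "SSHS_SRV_UNAVAILABLE": [2, 10],   # server function or service type
    "SSH_ACL_DENY": [3],               # SSH login control ACL
    "SSHS_VERSION_MISMATCH": [5],      # SSH version compatibility
    "SSHS_ALGORITHM_MISMATCH": [11],   # SSH2 algorithms
    "SSHS_REACH_USER_LIMIT": [12],     # VTY user lines
    "SSHS_REACH_SESSION_LIMIT": [13],  # online SSH users
}

MNEMONIC = re.compile(r"SSHS/\d+/(?P<name>[A-Z0-9_]+):")


def steps_for(log_line):
    """Return the Solution steps that a log line points to."""
    match = MNEMONIC.search(log_line)
    if match is None:
        return []
    return STEPS_FOR_LOG.get(match.group("name"), [])


sample = ("SSHS/6/SSHS_VERSION_MISMATCH: SSH client 192.168.30.117 failed "
          "to log in because of version mismatch.")
steps = steps_for(sample)
```

A log line that returns an empty list is not covered by this procedure, and the steps should be followed in order.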
Failure to log in to the device as the SSH server through password authentication
Symptom
When the device acts as the SSH server, a user fails to log in to the device through password authentication.
Common causes
The following are the common causes of this type of issue:
· The SSH client cannot reach the device.
· The login password of the SSH client is incorrect.
· The device is not enabled with the SSH server function.
· The SSH user is not configured on the SSH server.
· An SSH login control ACL is specified on the device, but the ACL does not permit the IP address of the SSH client.
· The number of SSH login users on the device reaches the upper limit.
· The SSH version on the device is not compatible with the client.
· The service type or authentication method for SSH users is configured incorrectly.
· No local key pairs are generated on the device.
· The SCP or SFTP working directory is incorrect.
Troubleshooting flow
Figure 163 shows the troubleshooting flowchart.
Solution
1. Verify that the client can ping the device.
Execute the ping command to check network connectivity.
¡ If the ping fails, see the ping troubleshooting guide to locate the issue and make sure the SSH client can ping the SSH server.
¡ If the ping succeeds, proceed to the next step.
2. Verify that the login password is correct.
¡ If the server uses local authentication, identify whether the login password entered by the user is the same as the password set for the local device management user on the device.
- If the passwords differ, enter the correct login password again. If the login password is forgotten, enter the view of the local device management user and execute the password command to specify a new password, and then log in with that password. The local device management user has the same name as the current login user.
- If the passwords are the same, proceed to step 3.
¡ If the server uses remote authentication, make sure the password of the current login user is the same as that on the authentication server.
- If the passwords differ, enter the correct login password again. If the password is forgotten, set a new password on the device for the login user. Make sure the new password is consistent with that on the authentication server.
- If the passwords are the same, proceed to step 3.
3. Verify that the SSH server function is enabled.
If the following log message occurs, the SSH server function is disabled:
SSHS/6/SSHS_SRV_UNAVAILABLE: The SCP server is disabled or the SCP service type is not supported.
Execute the display ssh server status command to identify whether the Stelnet server function, SFTP server function, NETCONF over SSH server function, and SCP server function are enabled as needed.
<Sysname> display ssh server status
Stelnet server: Disable
SSH version : 2.0
SSH authentication-timeout : 60 second(s)
SSH server key generating interval : 0 hour(s)
SSH authentication retries : 3 time(s)
SFTP server: Disable
SFTP Server Idle-Timeout: 10 minute(s)
NETCONF server: Disable
SCP server: Disable
¡ If the SSH server function is disabled, execute the following commands to enable related SSH server functions:
<Sysname> system-view
[Sysname] ssh server enable
[Sysname] sftp server enable
[Sysname] scp server enable
[Sysname] netconf ssh server enable
¡ If the SSH server function is enabled, proceed to the next step.
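When the output of display ssh server status is collected from many devices, the check in this step can be scripted. The following Python sketch is only an illustration: it assumes the output has the same layout as the sample shown in this step, and the function name disabled_services is hypothetical.

```python
import re

# Sample taken from the display ssh server status output in this step.
SAMPLE_OUTPUT = """\
Stelnet server: Disable
SSH version : 2.0
SSH authentication-timeout : 60 second(s)
SSH server key generating interval : 0 hour(s)
SSH authentication retries : 3 time(s)
SFTP server: Disable
SFTP Server Idle-Timeout: 10 minute(s)
NETCONF server: Disable
SCP server: Disable
"""

# Matches only the four service state lines, for example "SFTP server: Disable".
# Other fields, such as "SFTP Server Idle-Timeout", do not match.
SERVICE_LINE = re.compile(r"^\s*(Stelnet|SFTP|NETCONF|SCP) server\s*:\s*(\w+)\s*$")


def disabled_services(status_text):
    """Return the names of the SSH services reported as disabled."""
    disabled = []
    for line in status_text.splitlines():
        match = SERVICE_LINE.match(line)
        if match and match.group(2).lower().startswith("disable"):
            disabled.append(match.group(1))
    return disabled


print(disabled_services(SAMPLE_OUTPUT))
```

Each name returned by the function corresponds to one of the enable commands listed in this step.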
4. Identify whether the SSH service port on the client matches that on the server.
If the SSH service port on the server changes, but the client still uses the default SSH service port, the SSH login will fail.
For example, if an H3C device acts as the client, the following error message appears: Failed to connect to host 10.1.1.1 port 22.
¡ If the SSH service port on the client does not match that on the server, execute the display current-configuration | include ssh command to view the SSH service port on the server, and then change the SSH service port on the client to that on the server.
¡ If the SSH service port on the client matches that on the server, proceed to the next step.
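The port comparison in this step can be sketched in Python as follows. The sketch assumes the client error message has exactly the format quoted in this step. The function names and the server port value 1025 are hypothetical and used only for illustration.

```python
import re

# Format of the client-side error message quoted in this step.
CONNECT_FAILURE = re.compile(r"Failed to connect to host (\S+) port (\d+)\.?")


def attempted_endpoint(message):
    """Return (host, port) that the client tried to reach, or None."""
    match = CONNECT_FAILURE.search(message)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def port_mismatch(message, server_port):
    """Return True if the client used a port other than the server's SSH port."""
    endpoint = attempted_endpoint(message)
    return endpoint is not None and endpoint[1] != server_port


message = "Failed to connect to host 10.1.1.1 port 22."
# 1025 is a hypothetical SSH service port configured on the server.
print(attempted_endpoint(message), port_mismatch(message, 1025))
```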
5. Identify whether an SSH login control ACL is configured.
Identify whether an SSH login control ACL is specified by the ssh server acl command.
¡ If an SSH login control ACL is specified, identify whether the client is permitted by the ACL. First, execute the ssh server acl-deny-log enable command to enable logging for SSH login attempts that are denied by the SSH login control ACL.
If the following log message occurs, the specified ACL denies the IP address of the client:
SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.10 was denied by ACL rule (rule ID=20).
SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.11 was denied by ACL rule (default rule).
- If the specified ACL denies the IP address of the client, edit the SSH login control ACL for the ACL to permit the IP address of the client. If no SSH clients require login control, remove SSH login control settings.
- If the specified ACL already permits the IP address of the client, proceed to the next step.
¡ If no SSH login control ACL is specified, proceed to the next step.
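To find out which clients are denied and by which rule, the SSH_ACL_DENY messages can be parsed from a saved log file. The following Python sketch assumes the message text matches the two samples shown in this step. The function name parse_acl_deny is hypothetical.

```python
import re

# Pattern built from the two SSH_ACL_DENY samples in this step.
ACL_DENY = re.compile(
    r"SSHS/5/SSH_ACL_DENY: The SSH connection request from (\S+) "
    r"was denied by ACL rule \((?:rule ID=(\d+)|default rule)\)\."
)


def parse_acl_deny(line):
    """Return (client_ip, rule), where rule is a rule ID or "default"."""
    match = ACL_DENY.search(line)
    if match is None:
        return None
    rule = int(match.group(2)) if match.group(2) else "default"
    return match.group(1), rule


logs = [
    "SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.10 "
    "was denied by ACL rule (rule ID=20).",
    "SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.11 "
    "was denied by ACL rule (default rule).",
]
for entry in logs:
    print(parse_acl_deny(entry))
```

A result of "default" means that no rule in the ACL matched the client, so a permit rule for that client address is likely missing.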
6. Identify whether the SSH version on the server is compatible with the client version.
If the following log message occurs, the SSH version on the device is not compatible with that on the client:
SSHS/6/SSHS_VERSION_MISMATCH: SSH client 192.168.30.117 failed to log in because of version mismatch.
If an SSH1 client logs in to the device, you can execute the display ssh server status command on the device to identify the SSH version from the SSH version field.
¡ If the SSH version field displays 1.99, the device is compatible with the SSH1 client. Then, proceed to the next step.
¡ If the SSH version field displays 2.0, execute the ssh server compatible-ssh1x enable command on the device to enable the device to support SSH1 clients.
7. Identify whether the authentication method and the access protocol for a VTY user line are configured incorrectly.
If the client is an Stelnet or NETCONF over SSH client, execute the display this command in VTY user line view to identify whether the authentication method is scheme and SSH is specified as an access protocol.
[Sysname] line vty 0 63
[Sysname-line-vty0-63] display this
#
line vty 0 63
authentication-mode scheme
user-role network-admin
idle-timeout 0 0
#
¡ If the authentication method or access protocol is configured incorrectly, execute authentication-mode scheme to change the authentication method to scheme and execute protocol inbound ssh to specify SSH as one of the access protocols.
¡ If the configuration is correct, proceed to the next step.
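The VTY line check in this step can be sketched in Python as follows. The sketch assumes the display this output has the layout of the sample in this step, and that an explicit access protocol setting appears as a protocol inbound line whose keywords include ssh or all. The function name check_vty_line is hypothetical. When no protocol inbound line is present, the sketch returns None for that item rather than guessing the default of the software version.

```python
# Sample taken from the display this output in this step.
SAMPLE_CONFIG = """\
#
line vty 0 63
 authentication-mode scheme
 user-role network-admin
 idle-timeout 0 0
#
"""


def check_vty_line(config_text):
    """Report whether scheme authentication and SSH access are configured."""
    auth_mode = None
    protocols = None  # None means no explicit "protocol inbound" line was found
    for raw in config_text.splitlines():
        words = raw.split()
        if words[:1] == ["authentication-mode"] and len(words) > 1:
            auth_mode = words[1]
        elif words[:2] == ["protocol", "inbound"]:
            protocols = words[2:]
    if protocols is None:
        ssh_allowed = None
    else:
        # Assumption: "ssh" or "all" in the keyword list permits SSH access.
        ssh_allowed = any(p in ("ssh", "all") for p in protocols)
    return {"scheme_auth": auth_mode == "scheme", "ssh_explicitly_allowed": ssh_allowed}


print(check_vty_line(SAMPLE_CONFIG))
```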
8. Identify whether the number of VTY users on the device reaches the upper limit.
Both SSH and Telnet users log in using VTY user lines, but VTY user lines are limited resources. If all VTY user lines are occupied, clients using Stelnet and NETCONF over SSH cannot log in. Clients using SFTP and SCP do not occupy user lines and can still log in.
The number of VTY users on the device reaches the upper limit if the following log message occurs:
SSHS/6/SSHS_REACH_USER_LIMIT: SSH client 192.168.30.117 failed to log in, because the number of users reached the upper limit.
Execute the display line command to identify whether VTY user lines are sufficient.
¡ If VTY user line resources are insufficient, change the authentication method for idle VTY lines with non-scheme authentication to scheme. If all VTY lines already use scheme authentication and are active, execute free line vty to forcibly release VTY lines. This allows new SSH users to come online.
¡ If VTY user line resources are sufficient, proceed to the next step.
9. Identify whether the number of online SSH users reaches the upper limit.
Execute the display ssh server session command to view session information on the server and the maximum number of SSH connections set by the aaa session-limit ssh command.
The number of online SSH users reaches the upper limit if the following log message occurs on the device:
SSHS/6/SSHS_REACH_SESSION_LIMIT: SSH client 192.168.30.117 failed to log in. The number of SSH sessions is 10, and exceeded the limit (10).
¡ If the number of SSH sessions has reached the upper limit, execute aaa session-limit ssh to increase the upper limit. If the configured maximum number of user connections has reached the upper limit, disconnect idle SSH clients from the client side. This will allow new SSH users to come online.
¡ If the upper limit is not reached, proceed to the next step.
10. Identify whether the server generates a local key pair.
To prevent fake server spoofing, the client first identifies whether the public key sent from the server matches the one stored locally when the client authenticates the server. After the client verifies the public key consistency, the client uses this public key to verify the server's digital signature. If the client has not saved the server's public key or the saved server’s public key is incorrect, server authentication will fail, preventing the client from logging in to the server. Therefore, before the client logs in to the server, create a key pair on the server and save the correct server’s public key on the client.
A client uses only one of DSA, ECDSA, or RSA public key algorithms to authenticate the server, but different clients support different algorithms. To ensure successful client login, generate DSA, ECDSA, and RSA key pairs on the server as a best practice.
Execute the display public-key local public command on the device to view local public key information on the device.
¡ If no DSA, ECDSA, or RSA key pair exists, execute the public-key local create command to configure these key pairs in sequence. Make sure the public key generated on the server is saved to the client.
¡ If these key pairs are configured, proceed to the next step.
11. Identify whether an SSH user is configured and the correct service type and authentication method are specified for the SSH user.
SSH supports Stelnet, SFTP, NETCONF, and SCP service types.
Identify whether the SSH user is created correctly based on the authentication method on the server.
If the server uses the password authentication method, you must perform one of the following tasks:
¡ For local authentication, configure a local user on the device.
¡ For remote authentication, configure an SSH user on a remote authentication server, for example, a RADIUS server.
For remote authentication, you do not need to create an SSH user. However, if such an SSH user has been created, make sure you have specified the correct service type and authentication method.
Perform the following operations based on the result of the previous step:
¡ If no SSH user is created and required, proceed to the next step. If no SSH user is created but an SSH user is required, execute the ssh user command to create an SSH user.
¡ If an SSH user has been created, check the service type and authentication method for the SSH user.
- Execute the display ssh user-information command on the device to view the service type and authentication method for the SSH user from the Service-type and Authentication-type fields, respectively. The service type of the SSH user must match the client type, which can be Stelnet, SFTP, SCP, or NETCONF over SSH, and the authentication method must be password.
- Execute the ssh user command in system view on the device to edit the service type and authentication method for the SSH user as required.
12. Identify whether the SCP or SFTP working directory is correct.
When the service type for an SSH user is SCP or SFTP, specify an authorized directory for the SSH user. If the specified authorized directory does not exist, the SCP or SFTP client will fail to connect to the SCP or SFTP server through that SSH user. For a password authentication user, identify whether the AAA-authorized working directory exists.
¡ If the working directory does not exist, execute the authorization-attribute work-directory directory-name command in local user view to edit the authorized working directory.
¡ If the working directory exists, proceed to the next step.
13. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module: HH3C-SSH-MIB
hh3cSSHVersionNegotiationFailure (1.3.6.1.4.1.25506.2.22.1.3.0.2)
Log messages
· SSHS/5/SSH_ACL_DENY
· SSHS/6/SSHS_ALGORITHM_MISMATCH
· SSHS/6/SSHS_REACH_SESSION_LIMIT
· SSHS/6/SSHS_REACH_USER_LIMIT
· SSHS/6/SSHS_SRV_UNAVAILABLE
· SSHS/6/SSHS_VERSION_MISMATCH
Failure to log in to the device as the SSH server through publickey authentication
Symptom
When the device acts as the SSH server, a user fails to log in to the device through publickey authentication.
Common causes
The following are the common causes of this type of issue:
· The SSH client cannot reach the device.
· The user's public key on the server is configured incorrectly.
· The device is not enabled with the SSH server function.
· The SSH user is not configured on the SSH server.
· An SSH login control ACL is specified on the device, but the ACL does not permit the IP address of the SSH client.
· The number of SSH login users on the device reaches the upper limit.
· The SSH version on the device is not compatible with the client.
· The service type or authentication method for SSH users is configured incorrectly.
· No local key pairs are generated on the device.
· The SCP or SFTP working directory is incorrect.
Troubleshooting flow
Figure 164 shows the troubleshooting flowchart.
Solution
1. Verify that the client can ping the device.
Execute the ping command to check network connectivity.
¡ If the ping fails, see the ping troubleshooting guide to locate the issue and make sure the SSH client can ping the device.
¡ If the ping succeeds, proceed to step 2.
2. Identify whether the user's public key configured on the server matches the private key used by the user.
The SSH client might support multiple public key algorithms, each corresponding to a different asymmetric key pair. User authentication succeeds only when the type of public key saved on the server matches the type of private key used by the user during login. For example, if the server specifies a DSA public key for a user, authentication succeeds when the user logs in with the matching DSA private key but fails when the user attempts to log in with an RSA private key. Execute the display public-key peer command on the device to view client public key information saved on the device. Identify whether the type of the client public key is consistent with the type of private key used by the login user.
¡ If the key types are inconsistent, execute the public-key local create command for the device to generate a key pair of the corresponding type.
¡ If the key types are consistent, proceed to step 3.
3. Verify that the SSH server function is enabled.
If the following log message occurs, the SSH server function is disabled:
SSHS/6/SSHS_SRV_UNAVAILABLE: The SCP server is disabled or the SCP service type is not supported.
Execute the display ssh server status command to identify whether the Stelnet server function, SFTP server function, NETCONF over SSH server function, and SCP server function are enabled as needed.
<Sysname> display ssh server status
Stelnet server: Disable
SSH version : 2.0
SSH authentication-timeout : 60 second(s)
SSH server key generating interval : 0 hour(s)
SSH authentication retries : 3 time(s)
SFTP server: Disable
SFTP Server Idle-Timeout: 10 minute(s)
NETCONF server: Disable
SCP server: Disable
¡ If the SSH server function is disabled, execute the following commands to enable related SSH server functions:
<Sysname> system-view
[Sysname] ssh server enable
[Sysname] sftp server enable
[Sysname] scp server enable
[Sysname] netconf ssh server enable
¡ If the SSH server function is enabled, proceed to step 4.
4. Identify whether the SSH service port on the client matches that on the server.
If the SSH service port on the server changes, but the client still uses the default SSH service port, the SSH login will fail.
For example, if an H3C device acts as the client, the following error message appears: Failed to connect to host 10.1.1.1 port 100.
¡ If the SSH service port on the client does not match that on the server, execute the display current-configuration | include ssh command to view the SSH service port on the server, and then change the SSH service port on the client to that on the server.
¡ If the SSH service port on the client matches that on the server, proceed to step 5.
5. Identify whether an SSH login control ACL is configured.
Identify whether an SSH login control ACL is specified by the ssh server acl command.
¡ If an SSH login control ACL is specified, identify whether the client is permitted by the ACL. First, execute the ssh server acl-deny-log enable command to enable logging for SSH login attempts that are denied by the SSH login control ACL.
If the following log message occurs, the specified ACL denies the IP address of the client:
SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.10 was denied by ACL rule (rule ID=20).
SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.11 was denied by ACL rule (default rule).
- If the specified ACL denies the IP address of the client, edit the SSH login control ACL for the ACL to permit the IP address of the client. If no SSH clients require login control, remove SSH login control settings.
- If the specified ACL already permits the IP address of the client, proceed to the next step.
¡ If no SSH login control ACL is configured, proceed to the next step.
6. Identify whether the SSH version on the server is compatible with the client version.
If the following log message occurs, the SSH version on the device is not compatible with that on the client:
SSHS/6/SSHS_VERSION_MISMATCH: SSH client 192.168.30.117 failed to log in because of version mismatch.
If an SSH1 client logs in to the device, you can execute the display ssh server status command on the device to identify the SSH version from the SSH version field.
¡ If the SSH version field displays 1.99, the device is compatible with the SSH1 client. Then, proceed to the next step.
¡ If the SSH version field displays 2.0, execute the ssh server compatible-ssh1x enable command on the device to enable the device to support SSH1 clients.
7. Identify whether the authentication method and the access protocol for a VTY user line are configured incorrectly.
If the client is an Stelnet or NETCONF over SSH client, execute the display this command in VTY user line view to identify whether the authentication method is scheme and SSH is specified as an access protocol.
[Sysname] line vty 0 63
[Sysname-line-vty0-63] display this
#
line vty 0 63
authentication-mode scheme
user-role network-admin
idle-timeout 0 0
#
¡ If the authentication method or access protocol is configured incorrectly, execute authentication-mode scheme to change the authentication method to scheme and execute protocol inbound ssh to specify SSH as one of the access protocols.
¡ If the configuration is correct, proceed to the next step.
8. Identify whether the number of VTY users on the device reaches the upper limit.
Both SSH and Telnet users log in using VTY user lines, but VTY user lines are limited resources. If all VTY user lines are occupied, clients using Stelnet and NETCONF over SSH cannot log in. Clients using SFTP and SCP do not occupy user lines and can still log in.
The number of VTY users on the device reaches the upper limit if the following log message occurs:
SSHS/6/SSHS_REACH_USER_LIMIT: SSH client 192.168.30.117 failed to log in, because the number of users reached the upper limit.
Execute the display line command to identify whether VTY user lines are sufficient.
¡ If VTY user line resources are insufficient, change the authentication method for idle VTY lines with non-scheme authentication to scheme. If all VTY lines already use scheme authentication and are active, execute free line vty to forcibly release VTY lines. This allows new SSH users to come online.
¡ If VTY user line resources are sufficient, proceed to the next step.
9. Identify whether the number of online SSH users reaches the upper limit.
Execute the display ssh server session command to view session information on the server and the maximum number of SSH connections set by the aaa session-limit ssh command.
The number of online SSH users reaches the upper limit if the following log message occurs on the device:
SSHS/6/SSHS_REACH_SESSION_LIMIT: SSH client 192.168.30.117 failed to log in. The number of SSH sessions is 10, and exceeded the limit (10).
¡ If the number of SSH sessions has reached the upper limit, execute aaa session-limit ssh to increase the upper limit. If the configured maximum number of user connections has reached the upper limit, disconnect idle SSH clients from the client side. This will allow new SSH users to come online.
¡ If the upper limit is not reached, proceed to the next step.
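Steps 8 and 9 describe two different limits that produce similar login failures. The following Python sketch separates them by log message. It assumes the messages match the samples shown in those steps. The function name classify_limit and the labels "vty-line" and "ssh-session" are hypothetical.

```python
import re

# Patterns built from the log samples in steps 8 and 9.
SESSION_LIMIT = re.compile(
    r"SSHS/6/SSHS_REACH_SESSION_LIMIT: SSH client (\S+) failed to log in\. "
    r"The number of SSH sessions is (\d+), and exceeded the limit \((\d+)\)\."
)
USER_LIMIT = re.compile(
    r"SSHS/6/SSHS_REACH_USER_LIMIT: SSH client (\S+) failed to log in, "
    r"because the number of users reached the upper limit\."
)


def classify_limit(line):
    """Tell whether a login failure was caused by the VTY or the session limit."""
    match = SESSION_LIMIT.search(line)
    if match:
        return {
            "limit": "ssh-session",
            "client": match.group(1),
            "sessions": int(match.group(2)),
            "max": int(match.group(3)),
        }
    match = USER_LIMIT.search(line)
    if match:
        return {"limit": "vty-line", "client": match.group(1)}
    return None


print(classify_limit(
    "SSHS/6/SSHS_REACH_SESSION_LIMIT: SSH client 192.168.30.117 failed to log in. "
    "The number of SSH sessions is 10, and exceeded the limit (10)."
))
```

A "vty-line" result points to the actions in step 8, and an "ssh-session" result points to the actions in step 9.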
10. Identify whether the server generates a local key pair.
To prevent fake server spoofing, the client first identifies whether the public key sent from the server matches the one stored locally when the client authenticates the server. After the client verifies the public key consistency, the client uses this public key to verify the server's digital signature. If the client has not saved the server's public key or the saved server's public key is incorrect, server authentication will fail, preventing the client from logging in to the server. Therefore, before the client logs in to the server, create a key pair on the server and save the correct server's public key on the client.
A client uses only one of DSA, ECDSA, or RSA public key algorithms to authenticate the server, but different clients support different algorithms. To ensure successful client login, generate DSA, ECDSA, and RSA key pairs on the server as a best practice.
Execute the display public-key local public command on the device to view local public key information on the device.
¡ If no DSA, ECDSA, or RSA key pair exists, execute the public-key local create command to configure these key pairs in sequence.
¡ If these key pairs are configured, proceed to the next step.
11. Identify whether an SSH user is configured and the correct service type and authentication method are specified for the SSH user.
SSH supports Stelnet, SFTP, NETCONF, and SCP service types.
Identify whether the SSH user is created correctly based on the authentication method on the server.
If the server uses the publickey authentication method, you must create an SSH user and a local user on the device. The two users must have the same username, so that the SSH user can be assigned the correct working directory and user role.
Perform the following operations based on the result of the previous step:
¡ If no SSH user is configured, execute the ssh user command to create an SSH user.
¡ If an SSH user has been created, check the service type and authentication method for the SSH user.
- Execute the display ssh user-information command on the device to view the service type and authentication method for the SSH user from the Service-type and Authentication-type fields, respectively. The service type of the SSH user must match the client type, which can be Stelnet, SFTP, SCP, or NETCONF over SSH, and the authentication method must be publickey.
- Execute the ssh user command in system view on the device to edit the service type and authentication method for the SSH user as required.
12. Identify whether the SCP or SFTP working directory is correct.
When the service type for an SSH user is SCP or SFTP, specify an authorized directory for the SSH user. If the specified authorized directory does not exist, the SCP or SFTP client will fail to connect to the SCP or SFTP server through that SSH user. For a publickey authentication user, identify whether the AAA-authorized working directory exists.
¡ If the working directory does not exist, execute the authorization-attribute work-directory directory-name command in local user view to edit the authorized working directory.
¡ If the working directory exists, proceed to step 13.
13. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module: HH3C-SSH-MIB
hh3cSSHVersionNegotiationFailure (1.3.6.1.4.1.25506.2.22.1.3.0.2)
Log messages
· SSHS/5/SSH_ACL_DENY
· SSHS/6/SSHS_ALGORITHM_MISMATCH
· SSHS/6/SSHS_REACH_SESSION_LIMIT
· SSHS/6/SSHS_REACH_USER_LIMIT
· SSHS/6/SSHS_SRV_UNAVAILABLE
· SSHS/6/SSHS_VERSION_MISMATCH
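The log messages listed above can be used to jump directly to the relevant step of this procedure. The following Python sketch shows one possible mapping. The step numbers are taken from the steps in which each message is quoted in this section, and the function name step_for_log is hypothetical. SSHS_ALGORITHM_MISMATCH is not quoted in any step, so the sketch returns None for it.

```python
import re

# Log mnemonic -> step of this procedure in which the message is quoted.
STEP_FOR_MNEMONIC = {
    "SSHS_SRV_UNAVAILABLE": 3,
    "SSH_ACL_DENY": 5,
    "SSHS_VERSION_MISMATCH": 6,
    "SSHS_REACH_USER_LIMIT": 8,
    "SSHS_REACH_SESSION_LIMIT": 9,
}

# Matches the module/severity/mnemonic header, for example "SSHS/6/SSHS_SRV_UNAVAILABLE".
LOG_HEADER = re.compile(r"\bSSHS/\d/(\w+)")


def step_for_log(line):
    """Return the step number to start from, or None if no step applies."""
    match = LOG_HEADER.search(line)
    if match is None:
        return None
    return STEP_FOR_MNEMONIC.get(match.group(1))


print(step_for_log(
    "SSHS/6/SSHS_VERSION_MISMATCH: SSH client 192.168.30.117 "
    "failed to log in because of version mismatch."
))
```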
Failure to log in to the SSH server through password authentication when the device acts as the SSH client
Symptom
A password authentication user fails to log in to the SSH server when the device acts as the SSH client.
Common causes
The following are the common causes of this type of issue:
· The SSH client cannot reach the server, causing TCP connection setup failure.
· The login password of the SSH client is incorrect.
· No local key pairs are generated on the server.
· The public key on the SSH server is inconsistent with that cached on the SSH client.
· The SSH version of the client is not compatible with the server.
Troubleshooting flow
Figure 165 shows the troubleshooting flowchart.
Solution
1. Verify that the client can ping the server.
Execute the ping command to check network connectivity.
¡ If the ping fails, see the ping troubleshooting guide to locate the issue and make sure the SSH client can ping the SSH server.
¡ If the ping succeeds, proceed to step 2.
2. Verify that the login password is correct.
¡ If the server uses local authentication, identify whether the login password entered by the user is the same as the password set for the local device management user on the server.
- If the passwords differ, enter the correct login password again. If the login password is forgotten, enter the view of the local device management user on the server and execute the password command to specify a new password, and then log in with that password. The local device management user has the same name as the current login user. This example uses an H3C device as the SSH server.
- If the passwords are the same, proceed to the next step.
¡ If the server uses remote authentication, make sure the password of the current login user is the same as that on the authentication server.
- If the passwords differ, enter the correct login password again. If the password is forgotten, set a new password on the device for the login user. Make sure the new password is consistent with that on the authentication server.
- If the passwords are the same, proceed to the next step.
3. Identify whether the server generates a local key pair.
To prevent fake server spoofing, the client first identifies whether the public key sent from the server matches the one stored locally when the client authenticates the server. After the client verifies the public key consistency, the client uses this public key to verify the server's digital signature. If the client has not saved the server's public key or the saved server's public key is incorrect, server authentication will fail, preventing the client from logging in to the server. Therefore, before the client logs in to the server, create a key pair on the server and save the correct server's public key on the client.
A client uses only one of DSA, ECDSA, or RSA public key algorithms to authenticate the server, but different clients support different algorithms. To ensure successful client login, generate DSA, ECDSA, and RSA key pairs on the server as a best practice.
This example uses an H3C device as an SSH server. Execute the display public-key local public command on the server to view local public key information on the server.
¡ If no DSA, ECDSA, or RSA key pair exists, execute the public-key local create command to configure these key pairs in sequence. Make sure the public key generated on the server is saved to the client.
¡ If these key pairs are configured, proceed to the next step.
4. Identify whether the SSH version of the server is compatible with the client version.
Verify that the SSH version of the server is compatible with the client version.
This example uses an H3C device as an SSH server. If an SSH1 client logs in to the server, you can execute the display ssh server status command on the server to identify the SSH version from the SSH version field.
¡ If the SSH version field displays 1.99, the server is compatible with the SSH1 client. Then, proceed to the next step.
¡ If the SSH version field displays 2.0, execute the ssh server compatible-ssh1x enable command on the server to enable the server to support the SSH1 client.
5. Identify whether the public key on the server is consistent with that cached on the client.
If the client chooses to save the server's public key upon the first login, updating the server's local key pair will cause the client to fail to authenticate the server. In this case, the following message appears on the client:
The server's host key does not match the local cached key. Either the server administrator has changed the host key, or you connected to another server pretending to be this server. Please remove the local cached key, before logging in!
¡ If the inconsistency occurs, execute the undo public-key peer command on the client to delete the old server public key saved on the client.
¡ If the inconsistency does not exist, proceed to step 6.
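The cached key comparison described in this step can be illustrated with a short Python sketch. It is a conceptual model only and does not reproduce how the device stores keys: it assumes the client caches a SHA-256 fingerprint per server address, and the names fingerprint and check_host_key as well as the key bytes are hypothetical.

```python
import hashlib


def fingerprint(public_key_bytes):
    """Return a SHA-256 fingerprint of a public key (illustrative format)."""
    return hashlib.sha256(public_key_bytes).hexdigest()


def check_host_key(cache, host, presented_key):
    """Compare the key presented by the server with the cached entry."""
    cached = cache.get(host)
    if cached is None:
        return "unknown"  # first login: the client has no saved key yet
    return "match" if cached == fingerprint(presented_key) else "mismatch"


# Placeholder key material, not real host keys.
old_key = b"example-host-key-1"
new_key = b"example-host-key-2"

cache = {"10.1.1.1": fingerprint(old_key)}  # saved on first login
print(check_host_key(cache, "10.1.1.1", old_key))  # key unchanged on the server
print(check_host_key(cache, "10.1.1.1", new_key))  # key pair regenerated on the server

del cache["10.1.1.1"]  # models deleting the old server public key on the client
print(check_host_key(cache, "10.1.1.1", new_key))
```

The "mismatch" result corresponds to the error message quoted in this step, and removing the cached entry corresponds to deleting the old server public key on the client so that the new key can be saved at the next login.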
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failure to log in to the SSH server through publickey authentication when the device acts as the SSH client
Symptom
When the device acts as an SSH client, a user fails to log in to the SSH server through publickey authentication.
Common causes
The following are the common causes of this type of issue:
· The SSH client cannot reach the server, causing TCP connection setup failure.
· No local key pairs are generated on the server.
· The user's public key on the server is configured incorrectly.
· The public key on the SSH server is inconsistent with that cached on the SSH client.
· The SSH version on the client is not compatible with the server.
Troubleshooting flow
Figure 166 shows the troubleshooting flowchart.
Solution
1. Verify that the client can ping the server.
Execute the ping command to check network connectivity.
¡ If the ping fails, see the ping troubleshooting guide to locate the issue and make sure the SSH client can ping the SSH server.
¡ If the ping succeeds, proceed to step 2.
2. Identify whether the user's public key configured on the server matches the private key used by the user.
The SSH client might support multiple public key algorithms, each corresponding to a different asymmetric key pair. User authentication succeeds only when the type of public key saved on the server matches the type of private key used by the user during login. For example, if the server specifies a DSA public key for a user, authentication succeeds when the user logs in with the matching DSA private key but fails when the user attempts to log in with an RSA private key.
This example uses an H3C device as an SSH server. Execute the display public-key peer command on the server to view client public key information saved on the server. Identify whether the type of the client public key is consistent with the type of private key used by the login user.
¡ If the key types are inconsistent, execute the public-key local create command on the server to generate a key pair of the corresponding type.
¡ If the key types are consistent, proceed to step 3.
3. Identify whether the server generates a local key pair.
To prevent fake server spoofing, the client first identifies whether the public key sent from the server matches the one stored locally when the client authenticates the server. After the client verifies the public key consistency, the client uses this public key to verify the server's digital signature. If the client has not saved the server's public key or the saved server’s public key is incorrect, server authentication will fail, preventing the client from logging in to the server. Therefore, before the client logs in to the server, create a key pair on the server and save the correct server’s public key on the client.
A client uses only one of DSA, ECDSA, or RSA public key algorithms to authenticate the server, but different clients support different algorithms. To ensure successful client login, generate DSA, ECDSA, and RSA key pairs on the server as a best practice.
This example uses an H3C device as an SSH server. Execute the display public-key local public command on the server to view local public key information on the server.
¡ If no DSA, ECDSA, or RSA key pair exists, execute the public-key local create command to configure these key pairs in sequence.
¡ If these key pairs are configured, proceed to step 4.
4. Identify whether the SSH version of the server is compatible with the client version.
Verify that the SSH version of the server is compatible with the client version.
This example uses an H3C device as an SSH server. If an SSH1 client logs in to the server, you can execute the display ssh server status command on the server to identify the SSH version from the SSH version field.
¡ If the SSH version field displays 1.99, the server is compatible with the SSH1 client. Then, proceed to step 5.
¡ If the SSH version field displays 2.0, execute the ssh server compatible-ssh1x enable command on the server to enable the server to support the SSH1 client.
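The decision in this step can be summarized in a short sketch. The function name ssh1_action and its return strings are hypothetical and serve only as an illustration. The two version values (1.99 and 2.0) and the command are taken from this step.

```python
def ssh1_action(ssh_version: str) -> str:
    """Map the SSH version field to the next action for an SSH1 client.

    Hypothetical helper: the version values and the command come from the
    step above. Everything else is an illustrative assumption.
    """
    version = ssh_version.strip()
    if version == "1.99":
        # 1.99 means the server accepts both SSH1 and SSH2 clients.
        return "proceed to step 5"
    if version == "2.0":
        # 2.0 means SSH2 only, so SSH1 compatibility must be enabled.
        return "ssh server compatible-ssh1x enable"
    return "unexpected value: check the command output manually"


print(ssh1_action("2.0"))  # prints ssh server compatible-ssh1x enable
```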
5. Identify whether the public key on the server is consistent with that cached on the client.
If the client chooses to save the server's public key upon the first login, updating the server's local key pair will cause the client to fail to authenticate the server. In this case, the client displays a message similar to the following:
The server's host key does not match the local cached key. Either the server administrator has changed the host key, or you connected to another server pretending to be this server. Please remove the local cached key, before logging in!
¡ If the keys are inconsistent, execute the undo public-key peer command on the client to delete the old server public key saved on the client.
¡ If the keys are consistent, proceed to step 6.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
SSL VPN issues
Failure to open the SSL VPN webpage on the browser
Symptom
When you enter the address of the SSL VPN gateway on a browser, the SSL VPN webpage fails to open.
Common causes
The following are the common causes of this type of issue:
· The client cannot reach the SSL VPN gateway.
· The security policies between security zones are not configured correctly.
· The SSL VPN gateway is not configured correctly.
· The SSL VPN context is not configured correctly.
· The address and port of the SSL VPN gateway are not correctly listened to.
Troubleshooting flow
Figure 167 shows the troubleshooting flowchart.
Figure 167 Flowchart for troubleshooting failure to open the SSL VPN webpage
Solution
1. Verify that the client can ping the SSL VPN gateway successfully.
Execute the ping command to check network connectivity.
a. If the ping fails, see the troubleshooting guides for Layer 3—IP services to locate the ping issue, and make sure the SSL VPN client can ping the SSL VPN gateway.
b. If the issue persists, proceed to step 2.
2. Verify that the security policies between security zones are configured correctly.
As shown in Figure 168, verify that the security policies between security zones meet the following requirements:
¡ Make sure the Local security zone on the device can communicate with the Untrust security zone for the user to ensure communication between the user and the SSL VPN gateway.
The configuration of security zones and security policies is as follows:
<Device> system-view
[Device] interface GigabitEthernet 2/0/1
[Device-GigabitEthernet2/0/1] ip address 1.1.1.2 255.255.255.0
[Device-GigabitEthernet2/0/1] quit
[Device] security-zone name untrust
[Device-security-zone-Untrust] import interface GigabitEthernet 2/0/1
[Device-security-zone-Untrust] quit
[Device] security-policy ip
[Device-security-policy-ip] rule name sslvpnlocalout1
[Device-security-policy-ip-0-sslvpnlocalout1] source-zone local
[Device-security-policy-ip-0-sslvpnlocalout1] destination-zone untrust
[Device-security-policy-ip-0-sslvpnlocalout1] source-ip-host 1.1.1.2
[Device-security-policy-ip-0-sslvpnlocalout1] destination-ip-host 40.1.1.1
[Device-security-policy-ip-0-sslvpnlocalout1] action pass
[Device-security-policy-ip-0-sslvpnlocalout1] quit
[Device-security-policy-ip] rule name sslvpnlocalin1
[Device-security-policy-ip-1-sslvpnlocalin1] source-zone untrust
[Device-security-policy-ip-1-sslvpnlocalin1] destination-zone local
[Device-security-policy-ip-1-sslvpnlocalin1] source-ip-host 40.1.1.1
[Device-security-policy-ip-1-sslvpnlocalin1] destination-ip-host 1.1.1.2
[Device-security-policy-ip-1-sslvpnlocalin1] action pass
[Device-security-policy-ip-1-sslvpnlocalin1] quit
[Device-security-policy-ip] quit
¡ Make sure the Local security zone on the device can communicate with the Trust security zone for the internal server to ensure communication between the internal server and the SSL VPN gateway.
The configuration of security zones and security policies is as follows:
[Device] interface GigabitEthernet 2/0/2
[Device-GigabitEthernet2/0/2] ip address 2.2.2.2 255.255.255.0
[Device-GigabitEthernet2/0/2] quit
[Device] security-zone name trust
[Device-security-zone-Trust] import interface GigabitEthernet 2/0/2
[Device-security-zone-Trust] quit
[Device] security-policy ip
[Device-security-policy-ip] rule name sslvpnlocalout2
[Device-security-policy-ip-2-sslvpnlocalout2] source-zone local
[Device-security-policy-ip-2-sslvpnlocalout2] destination-zone trust
[Device-security-policy-ip-2-sslvpnlocalout2] source-ip-host 2.2.2.2
[Device-security-policy-ip-2-sslvpnlocalout2] destination-ip-host 20.2.2.2
[Device-security-policy-ip-2-sslvpnlocalout2] action pass
[Device-security-policy-ip-2-sslvpnlocalout2] quit
[Device-security-policy-ip] rule name sslvpnlocalin2
[Device-security-policy-ip-3-sslvpnlocalin2] source-zone trust
[Device-security-policy-ip-3-sslvpnlocalin2] destination-zone local
[Device-security-policy-ip-3-sslvpnlocalin2] source-ip-host 20.2.2.2
[Device-security-policy-ip-3-sslvpnlocalin2] destination-ip-host 2.2.2.2
[Device-security-policy-ip-3-sslvpnlocalin2] action pass
[Device-security-policy-ip-3-sslvpnlocalin2] quit
[Device-security-policy-ip] quit
For more information about troubleshooting security policies, see the security policy troubleshooting guide.
If the issue persists, proceed to step 3.
3. Verify that the SSL VPN gateway is configured correctly.
View the SSL VPN gateway information to identify the status of the SSL VPN gateway.
¡ Identify whether the SSL VPN gateway is up. Execute the display sslvpn gateway command to view the Operation state field in the output.
¡ If the Operation state field displays Up, the SSL VPN gateway is up. If the Operation state field does not display Up, execute the service enable command in SSL VPN gateway view to enable the SSL VPN gateway. The following is an example:
[Device] sslvpn gateway gw1
[Device-sslvpn-gateway-gw1] service enable
The SSL VPN gateway information is as follows:
[Device] display sslvpn gateway
Gateway name: gw
Operation state: Up
IP: 1.1.1.2 Port: 2000
...
If the issue persists, proceed to step 4.
4. Verify that the SSL VPN context is configured correctly.
View the SSL VPN context information to identify the status of the SSL VPN context.
¡ Identify whether the SSL VPN context is up. Execute the display sslvpn context command and check the Operation state field in the output. If the Operation state field displays Up, the SSL VPN context is up. If the Operation state field does not display Up, execute the service enable command in SSL VPN context view to enable the SSL VPN context.
¡ Verify that the SSL VPN context is associated with the SSL VPN gateway. Identify the Associated SSL VPN gateway field in the output. If the Associated SSL VPN gateway field displays a gateway name, the SSL VPN gateway is successfully associated. If the Associated SSL VPN gateway field does not display any gateway name, execute the gateway command in SSL VPN context view to associate the SSL VPN context with the SSL VPN gateway. The following is an example:
[Device] sslvpn context ctx1
[Device-sslvpn-context-ctx1] gateway gw1
The SSL VPN context information is as follows:
[Device] display sslvpn context
Context name: ctx
Operation state: Up
Associated SSL VPN gateway: gw
...
If the issue persists, proceed to step 5.
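The two checks in step 4 can be automated against captured command output. The following is a minimal sketch, assuming the output uses the "Name: value" layout shown in the sample above. The helper names parse_fields and check_context are hypothetical and are not part of any H3C tool.

```python
def parse_fields(output: str) -> dict:
    """Parse 'Name: value' lines from captured command output into a dict."""
    fields = {}
    for line in output.splitlines():
        if ":" not in line:
            continue  # skip prompts and lines such as "..."
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def check_context(output: str) -> list:
    """Return the problems found in 'display sslvpn context' output."""
    fields = parse_fields(output)
    problems = []
    if fields.get("Operation state") != "Up":
        problems.append("context is not up: execute service enable in SSL VPN context view")
    if not fields.get("Associated SSL VPN gateway"):
        problems.append("no gateway associated: execute gateway in SSL VPN context view")
    return problems


sample = """Context name: ctx
Operation state: Up
Associated SSL VPN gateway: gw
..."""
print(check_context(sample))  # prints []
```

An empty list means both conditions in step 4 are met for the captured context.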
5. Verify that the address and port of the SSL VPN gateway are correctly listened to.
Execute the display tcp-proxy command to identify the listening status of the SSL VPN gateway address and port. Verify that the listening port on each service module is correctly enabled.
The TCP proxy information is as follows:
[Device] display tcp-proxy slot 1
Local Addr:port Foreign Addr:port State Service type
1.1.1.2:2000 0.0.0.0:0 LISTEN SSLVPN
Perform the following operations as needed if the port listening status is abnormal:
¡ If the port listening is abnormal due to memory insufficiency, execute the display memory-threshold command to obtain memory usage information. If a memory usage alarm threshold is reached, execute the undo service enable command to disable the SSL VPN gateway in SSL VPN gateway view after the memory space recovers to normal. Then, execute the service enable command to enable the SSL VPN gateway again.
The memory usage threshold information is as follows:
[Device] display memory-threshold
Memory usage threshold: 100%
Free-memory thresholds:
Minor: 256M
Severe: 128M
Critical: 64M
Normal: 304M
Early-warning: 320M
Secure: 368M
Current free-memory state: Minor
...
¡ If the port listening is abnormal due to port occupation, execute the display tcp-proxy port-info command to view port usage information. If the status of the port range to which a port belongs is RESERVED, the ports in the port range are all occupied. In this case, change the port of the SSL VPN gateway, and then execute the undo service enable and service enable commands in SSL VPN gateway view to re-enable the SSL VPN gateway.
The usage information of TCP proxy ports is as follows:
[Device] display tcp-proxy port-info
Index Range State
16 [1024, 1087] USABLE
17 [1088, 1151] USABLE
18 [1152, 1215] RESERVED
19 [1216, 1279] USABLE
20 [1280, 1343] USABLE
...
If the issue persists, proceed to step 6.
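To find the state of the range that contains a given port, the port-info table from step 5 can be scanned programmatically. The following is a minimal sketch, assuming rows have the "Index [low, high] State" layout shown in the sample above. The function name port_range_state is hypothetical.

```python
import re

# Matches rows such as "18 [1152, 1215] RESERVED".
ROW = re.compile(r"^\s*(\d+)\s+\[(\d+),\s*(\d+)\]\s+(\S+)\s*$")


def port_range_state(output: str, port: int):
    """Return the State of the range containing port, or None if not listed."""
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, prompt, or "..." line
        low, high, state = int(match.group(2)), int(match.group(3)), match.group(4)
        if low <= port <= high:
            return state
    return None


sample = """Index Range State
16 [1024, 1087] USABLE
17 [1088, 1151] USABLE
18 [1152, 1215] RESERVED
19 [1216, 1279] USABLE
20 [1280, 1343] USABLE
..."""
print(port_range_state(sample, 1200))  # prints RESERVED
print(port_range_state(sample, 1100))  # prints USABLE
```

A result of None means the port is outside the ranges in the captured excerpt, so capture the full output before drawing a conclusion.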
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failure to log in to the SSL VPN gateway on a browser
Symptom
The browser can open the SSL VPN webpage but cannot log in to the SSL VPN gateway.
Common causes
The following are the common causes of this type of issue:
· The SSL VPN user is not configured correctly.
· When certificate authentication is enabled for the client and the server, certificates are installed incorrectly.
Troubleshooting flow
Figure 169 shows the troubleshooting flowchart.
Figure 169 Flowchart for troubleshooting failure to log in to the SSL VPN gateway
Solution
1. Verify that the SSL VPN user configuration is correct.
Verify user configuration depending on the user type.
¡ For local users, execute the display local-user command to verify local user configuration as follows:
- Network access user is displayed before the name of the local user to indicate that the local user is a network access user.
- The Service type field displays SSL VPN.
- Verify that the SSL VPN policy group field has a value to ensure that a policy group is configured for the SSL VPN user.
<Sysname> display local-user
Network access user sslvpn:
State: Active
Service type: SSL VPN
User group: system
Authorization attributes:
Work directory: flash:
User role list: network-operator
SSL VPN policy group: pg
...
¡ For remote users, make sure a local user group is configured on the device, and its name must match the user group name on the remote authentication server. For example, if the user group name on the remote authentication server is sslvpn, the local user group name must also be sslvpn. In addition, make sure the local user group is associated with the SSL VPN policy group displayed in the SSL VPN policy group field.
<Sysname> display user-group all
Total 1 user groups matched.
User group: sslvpn
Authorization attributes:
Work directory: flash:/
SSL VPN policy group: policygroup1
...
If the issue persists, proceed to step 2.
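The three local-user conditions in step 1 can be checked against a captured display local-user excerpt. The following is a minimal sketch, assuming the excerpt covers a single user and follows the layout of the sample above. The function name check_local_user is hypothetical.

```python
def check_local_user(output: str) -> list:
    """Return the problems found in a 'display local-user' excerpt for one user."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    fields = {}
    for line in lines:
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    problems = []
    if not any(line.startswith("Network access user") for line in lines):
        problems.append("not a network access user")
    if "SSL VPN" not in fields.get("Service type", ""):
        problems.append("service type does not include SSL VPN")
    if not fields.get("SSL VPN policy group"):
        problems.append("no SSL VPN policy group is authorized")
    return problems


sample = """Network access user sslvpn:
 State: Active
 Service type: SSL VPN
 User group: system
 Authorization attributes:
 Work directory: flash:
 User role list: network-operator
 SSL VPN policy group: pg
..."""
print(check_local_user(sample))  # prints []
```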
2. Verify that certificates are installed correctly.
If certificate authentication is enabled for the client and the server, make sure certificates are installed correctly on both the client and the server.
¡ Client certificate authentication: Execute the display ssl client-policy command to verify the SSL client policy information, including the SSL version and associated PKI domain.
<Sysname> display ssl client-policy policy1
SSL client policy: policy1
SSL version: SSL 3.0
PKI domain: client-domain
Preferred ciphersuite:
RSA_AES_128_CBC_SHA
Server-verify: enabled
...
¡ Server certificate authentication: Execute the display ssl server-policy command to verify the SSL server policy information, including the SSL version and associated PKI domain.
<Sysname> display ssl server-policy policy1
SSL server policy: policy1
Version info:
SSL3.0: Disabled
TLS1.0: Enabled
TLS1.1: Disabled
TLS1.2: Enabled
TLS1.3: Enabled
GM-TLS1.1: Disabled
PKI domains: server-domain
...
If the issue persists, proceed to step 3.
3. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failure to obtain SSL VPN gateway information on the iNode client
Symptom
When you enter the SSL VPN gateway address on the iNode client, the iNode client prompts that it cannot obtain SSL VPN gateway information.
Common causes
The following are the common causes of this type of issue:
· The client cannot reach the SSL VPN gateway.
· The security policies between security zones are not configured correctly.
· The SSL VPN gateway is not configured correctly.
· The SSL VPN context is not configured correctly.
· The address and port of the SSL VPN gateway are not correctly listened to.
Troubleshooting flow
Figure 170 shows the troubleshooting flowchart.
Solution
1. Verify that the client can ping the device.
Execute the ping command to check network connectivity.
¡ If the ping fails, see the troubleshooting guides for Layer 3—IP services to locate the ping issue, and make sure the SSL VPN client can ping the SSL VPN gateway.
¡ If the issue persists, proceed to step 2.
2. Verify that the security policies between security zones are configured correctly.
As shown in Figure 171, verify that the security policies between security zones meet the following requirements:
¡ Make sure the Local security zone on the device can communicate with the Untrust security zone for the SSL VPN user to ensure communication between the SSL VPN user and the SSL VPN gateway.
The configuration of security zones and security policies is as follows:
<Device> system-view
[Device] interface GigabitEthernet 2/0/1
[Device-GigabitEthernet2/0/1] ip address 1.1.1.2 255.255.255.0
[Device-GigabitEthernet2/0/1] quit
[Device] interface sslvpn-ac 1
[Device-SSLVPN-AC1] ip address 10.1.1.100 24
[Device-SSLVPN-AC1] quit
[Device] security-zone name untrust
[Device-security-zone-Untrust] import interface GigabitEthernet 2/0/1
[Device-security-zone-Untrust] import interface sslvpn-ac 1
[Device-security-zone-Untrust] quit
[Device] security-policy ip
[Device-security-policy-ip] rule name sslvpnlocalout1
[Device-security-policy-ip-0-sslvpnlocalout1] source-zone local
[Device-security-policy-ip-0-sslvpnlocalout1] destination-zone untrust
[Device-security-policy-ip-0-sslvpnlocalout1] source-ip-host 1.1.1.2
[Device-security-policy-ip-0-sslvpnlocalout1] destination-ip-host 40.1.1.1
[Device-security-policy-ip-0-sslvpnlocalout1] action pass
[Device-security-policy-ip-0-sslvpnlocalout1] quit
[Device-security-policy-ip] rule name sslvpnlocalin1
[Device-security-policy-ip-1-sslvpnlocalin1] source-zone untrust
[Device-security-policy-ip-1-sslvpnlocalin1] destination-zone local
[Device-security-policy-ip-1-sslvpnlocalin1] source-ip-host 40.1.1.1
[Device-security-policy-ip-1-sslvpnlocalin1] destination-ip-host 1.1.1.2
[Device-security-policy-ip-1-sslvpnlocalin1] action pass
[Device-security-policy-ip-1-sslvpnlocalin1] quit
[Device-security-policy-ip] quit
¡ Make sure the Local security zone on the device can communicate with the Trust security zone for the internal server to ensure communication between the internal server and the SSL VPN gateway.
The configuration of security zones and security policies is as follows:
[Device] interface GigabitEthernet 2/0/2
[Device-GigabitEthernet2/0/2] ip address 2.2.2.2 255.255.255.0
[Device-GigabitEthernet2/0/2] quit
[Device] security-zone name trust
[Device-security-zone-Trust] import interface GigabitEthernet 2/0/2
[Device-security-zone-Trust] quit
[Device] security-policy ip
[Device-security-policy-ip] rule name sslvpnlocalout2
[Device-security-policy-ip-2-sslvpnlocalout2] source-zone local
[Device-security-policy-ip-2-sslvpnlocalout2] destination-zone trust
[Device-security-policy-ip-2-sslvpnlocalout2] source-ip-host 2.2.2.2
[Device-security-policy-ip-2-sslvpnlocalout2] destination-ip-host 20.2.2.2
[Device-security-policy-ip-2-sslvpnlocalout2] action pass
[Device-security-policy-ip-2-sslvpnlocalout2] quit
[Device-security-policy-ip] rule name sslvpnlocalin2
[Device-security-policy-ip-3-sslvpnlocalin2] source-zone trust
[Device-security-policy-ip-3-sslvpnlocalin2] destination-zone local
[Device-security-policy-ip-3-sslvpnlocalin2] source-ip-host 20.2.2.2
[Device-security-policy-ip-3-sslvpnlocalin2] destination-ip-host 2.2.2.2
[Device-security-policy-ip-3-sslvpnlocalin2] action pass
[Device-security-policy-ip-3-sslvpnlocalin2] quit
[Device-security-policy-ip] quit
¡ Make sure the Untrust security zone for the SSL VPN user can communicate with the Trust security zone for the internal server to ensure that the user can communicate with the server via the SSL VPN AC interface.
The configuration of security zones and security policies is as follows:
[Device] security-policy ip
[Device-security-policy-ip] rule name untrust-trust
[Device-security-policy-ip-4-untrust-trust] source-zone untrust
[Device-security-policy-ip-4-untrust-trust] destination-zone trust
[Device-security-policy-ip-4-untrust-trust] source-ip-subnet 10.1.1.0 24
[Device-security-policy-ip-4-untrust-trust] destination-ip-host 20.2.2.2
[Device-security-policy-ip-4-untrust-trust] action pass
[Device-security-policy-ip-4-untrust-trust] quit
[Device-security-policy-ip] rule name trust-untrust
[Device-security-policy-ip-5-trust-untrust] source-zone trust
[Device-security-policy-ip-5-trust-untrust] destination-zone untrust
[Device-security-policy-ip-5-trust-untrust] source-ip-host 20.2.2.2
[Device-security-policy-ip-5-trust-untrust] destination-ip-subnet 10.1.1.0 24
[Device-security-policy-ip-5-trust-untrust] action pass
[Device-security-policy-ip-5-trust-untrust] quit
[Device-security-policy-ip] quit
For more information about troubleshooting security policies, see the security policy troubleshooting guide.
If the issue persists, proceed to step 3.
3. Verify that the SSL VPN gateway is configured correctly.
View the SSL VPN gateway information to identify the status of the SSL VPN gateway.
¡ Identify whether the SSL VPN gateway is up. Execute the display sslvpn gateway command to view the Operation state field in the output.
The SSL VPN gateway information is as follows:
[Device] display sslvpn gateway
Gateway name: gw
Operation state: Up
IP: 1.1.1.2 Port: 2000
...
¡ If the Operation state field displays Up, the SSL VPN gateway is up. If the Operation state field does not display Up, execute the service enable command in SSL VPN gateway view to enable the SSL VPN gateway. The following is an example:
[Device] sslvpn gateway gw1
[Device-sslvpn-gateway-gw1] service enable
If the issue persists, proceed to step 4.
4. Verify that the SSL VPN context is configured correctly.
View the SSL VPN context information to identify the status of the SSL VPN context.
¡ Identify whether the SSL VPN context is up. Execute the display sslvpn context command and check the Operation state field in the output. If the Operation state field displays Up, the SSL VPN context is up. If the Operation state field does not display Up, execute the service enable command in SSL VPN context view to enable the SSL VPN context.
The SSL VPN context information is as follows:
[Device] display sslvpn context
Context name: ctx
Operation state: Up
Associated SSL VPN gateway: gw
...
¡ Verify that the SSL VPN context is associated with the SSL VPN gateway. Execute the display sslvpn context command to identify the Associated SSL VPN gateway field in the output. If the Associated SSL VPN gateway field displays a gateway name, the SSL VPN gateway is successfully associated. If the Associated SSL VPN gateway field does not display any gateway name, execute the gateway command in SSL VPN context view to associate the SSL VPN context with the SSL VPN gateway. The following is an example:
[Device] sslvpn context ctx1
[Device-sslvpn-context-ctx1] gateway gw1
If the issue persists, proceed to step 5.
5. Verify that the address and port of the SSL VPN gateway are correctly listened to.
Execute the display tcp-proxy command to identify the listening status of the SSL VPN gateway address and port. Verify that the listening port on each service module is correctly enabled.
The TCP proxy information is as follows:
[Device] display tcp-proxy slot 1
Local Addr:port Foreign Addr:port State Service type
1.1.1.2:2000 0.0.0.0:0 LISTEN SSLVPN
Perform the following operations as needed if the port listening status is abnormal:
¡ If the port listening is abnormal due to memory insufficiency, execute the display memory-threshold command to obtain memory usage information. If a memory usage alarm threshold is reached, execute the undo service enable command to disable the SSL VPN gateway in SSL VPN gateway view after the memory space recovers to normal. Then, execute the service enable command to enable the SSL VPN gateway again.
The memory usage threshold information is as follows:
[Device] display memory-threshold
Memory usage threshold: 100%
Free-memory thresholds:
Minor: 256M
Severe: 128M
Critical: 64M
Normal: 304M
Early-warning: 320M
Secure: 368M
Current free-memory state: Minor
...
¡ If the port listening is abnormal due to port occupation, execute the display tcp-proxy port-info command to view port usage information. If the status of the port range to which a port belongs is RESERVED, the ports in the port range are all occupied. In this case, change the port of the SSL VPN gateway, and then execute the undo service enable and service enable commands in SSL VPN gateway view to re-enable the SSL VPN gateway.
The usage information of TCP proxy ports is as follows:
[Device] display tcp-proxy port-info
Index Range State
16 [1024, 1087] USABLE
17 [1088, 1151] USABLE
18 [1152, 1215] RESERVED
19 [1216, 1279] USABLE
20 [1280, 1343] USABLE
...
If the issue persists, proceed to step 6.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failure to log in to the SSL VPN gateway from the iNode client
Symptom
When you enter the SSL VPN gateway address on the iNode client, the iNode client can obtain the SSL VPN gateway information but cannot log in to the SSL VPN gateway.
The display sslvpn session command shows no SSL VPN sessions. That is, the SSL VPN session failed to be set up.
Common causes
The following are the common causes of this type of issue:
· The SSL VPN AC interface is not configured correctly.
· The VNIC fails to be assigned an IP address.
· The SSL VPN user is not configured correctly.
· When certificate authentication is enabled for the client and the server, certificates are installed incorrectly.
· The version of the iNode client is not the latest one.
Troubleshooting flow
Figure 172 shows the troubleshooting flowchart.
Solution
1. Verify that the SSL VPN AC interface is configured correctly.
Make sure the SSL VPN AC interface is configured correctly. Assign an IP address to the SSL VPN AC interface and associate the SSL VPN AC interface with the SSL VPN context.
a. Execute the display interface command to verify that the SSL VPN AC interface is configured correctly. For example, the following output indicates that the SSL VPN AC interface is up and has an IP address:
<Device> system-view
[Device] display interface SSLVPN-AC 1 brief
Brief information on interfaces in route mode:
Link: ADM - administratively down; Stby - standby
Protocol: (s) - spoofing
Interface Link Protocol Primary IP Description
SSLVPN-AC1 UP UP 1.1.1.1
b. If the SSL VPN AC interface is not configured correctly, configure the interface and associate it with the SSL VPN context. The following is an example:
[Device] interface SSLVPN-AC 1
[Device-SSLVPN-AC1] ip address 1.1.1.1 24
[Device-SSLVPN-AC1] quit
[Device] sslvpn context ctx
[Device-sslvpn-context-ctx] ip-tunnel interface SSLVPN-AC 1
[Device-sslvpn-context-ctx] quit
If the issue persists, proceed to step 2.
2. If the VNIC fails to be assigned an IP address, perform the following operations:
a. Verify that the SSL VPN address pool is configured correctly.
Verify that the following requirements are met: An SSL VPN address pool is configured. The SSL VPN address pool is associated with the SSL VPN context or a user-authorized policy group. The SSL VPN address pool does not include the address of the SSL VPN gateway.
The following example configures an SSL VPN address pool and associates it with the SSL VPN context and the policy group:
[Device] sslvpn ip address-pool pool1 1.1.1.1 1.1.1.10
[Device] sslvpn context ctx
[Device-sslvpn-context-ctx] ip-tunnel address-pool pool1 mask 24
[Device-sslvpn-context-ctx] policy-group pg1
[Device-sslvpn-context-ctx-policy-group-pg1] ip-tunnel address-pool pool1 mask 24
[Device-sslvpn-context-ctx-policy-group-pg1] quit
[Device-sslvpn-context-ctx] quit
b. Identify whether the address pool has available addresses.
Review log information of the device to identify whether the following log message exists: SSLVPN_IPAC_ALLOC_ADDR_FAIL: Reason: No address is available in the address pool.
If the log message exists, the address pool has no available addresses. Then, wait for a period of time before you log in again, or execute the following commands to enlarge the address pool and associate it again:
[Device] sslvpn ip address-pool pool1 1.1.1.1 1.1.1.100
[Device] sslvpn context ctx
[Device-sslvpn-context-ctx] ip-tunnel address-pool pool1 mask 24
[Device-sslvpn-context-ctx] policy-group pg1
[Device-sslvpn-context-ctx-policy-group-pg1] ip-tunnel address-pool pool1 mask 24
[Device-sslvpn-context-ctx-policy-group-pg1] quit
[Device-sslvpn-context-ctx] quit
c. Identify whether the available addresses in the address pool are bound to other users.
Review log information of the device to identify whether the following log message exists: SSLVPN_IPAC_ALLOC_ADDR_FAIL: Reason: Available addresses in the address pool have been bound to other users.
If the log message exists, the available addresses in the address pool are bound to other users. Wait for a period of time before you log in again, or execute the following commands to enlarge the address pool and associate it again:
[Device] sslvpn ip address-pool pool1 1.1.1.1 1.1.1.100
[Device] sslvpn context ctx
[Device-sslvpn-context-ctx] ip-tunnel address-pool pool1 mask 24
[Device-sslvpn-context-ctx] policy-group pg1
[Device-sslvpn-context-ctx-policy-group-pg1] ip-tunnel address-pool pool1 mask 24
[Device-sslvpn-context-ctx-policy-group-pg1] quit
[Device-sslvpn-context-ctx] quit
If the issue persists, proceed to step 3.
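Step 2 requires that the SSL VPN address pool not include the address of the SSL VPN gateway. The following is a minimal sketch of that check, using the Python standard ipaddress module. The function name pool_contains and the addresses in the usage lines are hypothetical values chosen for illustration and are not taken from a real deployment.

```python
import ipaddress


def pool_contains(start: str, end: str, address: str) -> bool:
    """Return True if address lies within the inclusive range start..end."""
    low = ipaddress.ip_address(start)
    high = ipaddress.ip_address(end)
    return low <= ipaddress.ip_address(address) <= high


# Illustrative values only: a pool of 10.1.1.1 to 10.1.1.10.
print(pool_contains("10.1.1.1", "10.1.1.10", "10.1.1.5"))     # prints True
print(pool_contains("10.1.1.1", "10.1.1.10", "192.168.1.1"))  # prints False
```

If the function returns True for the gateway address, adjust the pool range so that the gateway address is excluded.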
3. Verify that the SSL VPN user configuration is correct.
Check user configuration depending on the user type. The name of the SSL VPN policy group for a user or user group specified by the authorization-attribute command must match that in the SSL VPN context specified by the policy-group command.
¡ For local users, execute the display local-user command to verify local user configuration as follows:
- Network access user is displayed before the name of the local user to indicate that the local user is a network access user.
- The Service type field displays SSL VPN.
- Verify that the SSL VPN policy group field has a value to ensure that a policy group is configured for the SSL VPN user.
<Sysname> display local-user
Network access user sslvpn:
State: Active
Service type: SSL VPN
User group: system
Authorization attributes:
Work directory: flash:
User role list: network-operator
SSL VPN policy group: pg
...
¡ For remote users, make sure a local user group is configured on the device, and its name must match the user group name on the remote authentication server. For example, if the user group name on the remote authentication server is sslvpn, the local user group name must also be sslvpn. In addition, make sure the local user group is associated with the SSL VPN policy group displayed in the SSL VPN policy group field.
<Sysname> display user-group all
Total 1 user groups matched.
User group: sslvpn
Authorization attributes:
Work directory: flash:/
SSL VPN policy group: policygroup1
...
If the issue persists, proceed to step 4.
4. Verify that certificates are installed correctly.
If certificate authentication is enabled for the client and the server, make sure certificates are installed correctly on both the client and the server.
¡ Client certificate authentication: Execute the display ssl client-policy command to verify the SSL client policy information, including the SSL version and associated PKI domain.
<Sysname> display ssl client-policy policy1
SSL client policy: policy1
SSL version: SSL 3.0
PKI domain: client-domain
Preferred ciphersuite:
RSA_AES_128_CBC_SHA
Server-verify: enabled
...
¡ Server certificate authentication: Execute the display ssl server-policy command to verify the SSL server policy information, including the SSL version and associated PKI domain.
<Sysname> display ssl server-policy policy1
SSL server policy: policy1
Version info:
SSL3.0: Disabled
TLS1.0: Enabled
TLS1.1: Disabled
TLS1.2: Enabled
TLS1.3: Enabled
GM-TLS1.1: Disabled
PKI domains: server-domain
...
If the issue persists, proceed to step 5.
5. Verify that the iNode client is of the latest version.
Access the official website to verify that the iNode client version is the latest. The following is an example:
Figure 173 iNode client
If the iNode client version is not the latest, download the iNode client of the latest version from the official website.
If the issue persists even when the iNode client version is the latest, proceed to step 6.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
SSLVPN_IPAC_ALLOC_ADDR_FAIL
SSLVPN_IPAC_ALLOC_ADDR_SUCCESS
Failure to age out an iNode user
Symptom
When some iNode users have not accessed internal resources for a long period, they do not go offline, thus occupying device license resources.
Common causes
The idle-cut traffic threshold is not set for SSL VPN sessions.
Troubleshooting flow
Figure 174 shows the troubleshooting flowchart.
Figure 174 Flowchart for troubleshooting failure to age out an iNode user
Solution
1. Verify that the idle-cut traffic threshold is set for SSL VPN sessions.
The iNode client periodically sends keepalive messages, preventing it from aging out and going offline. To age out idle iNode users, you can set the idle-cut traffic threshold for SSL VPN sessions. The following is an example:
<Device> system-view
[Device] sslvpn context ctx1
[Device-sslvpn-context-ctx1] idle-cut traffic-threshold 1000
If the issue persists, proceed to step 2.
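The effect of the threshold can be pictured with a small sketch. It assumes that a session counts as idle when the traffic it carried during the idle timeout window is below the configured threshold, so that small keepalive messages alone no longer keep the session online. The function name, the parameters, and the use of kilobytes as the unit are assumptions for illustration. See the command reference for the exact semantics.

```python
def is_idle(traffic_kb_in_window: float, threshold_kb: float) -> bool:
    """Return True if the session should be treated as idle.

    Assumption: traffic below the threshold during the idle timeout window
    marks the session as idle. Units (KB) are assumed for illustration.
    """
    return traffic_kb_in_window < threshold_kb


# Hypothetical traffic figures for one idle timeout window.
print(is_idle(5, 1000))     # keepalives only: prints True
print(is_idle(5000, 1000))  # real resource access: prints False
```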
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failure to log in to the SSL VPN gateway again after previous successful logins
Symptom
A user fails to log in to the SSL VPN gateway again after previous successful logins.
Common causes
The maximum number of concurrent logins for each user account is set in an SSL VPN context.
Troubleshooting flow
Figure 175 shows the troubleshooting flowchart.
Figure 175 Flowchart for troubleshooting failure to log in to the SSL VPN gateway again
Solution
1. Identify whether the maximum number of concurrent logins for each user account is set in an SSL VPN context.
a. If the maximum number of concurrent logins for each user account is set in an SSL VPN context, delete the configuration or increase the maximum number before another login attempt. The following is an example:
<Device> system-view
[Device] sslvpn context ctx
[Device-sslvpn-context-ctx] max-onlines 100
b. Enable the force logout feature. When a login is attempted but logins using the account reach the maximum, this feature logs out the user with the longest idle time to allow the new login.
[Device-sslvpn-context-ctx] force-logout max-onlines enable
If the issue persists, proceed to step 2.
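The interaction between the max-onlines limit and the force logout feature can be sketched as follows. This is a minimal Python illustration of the behavior described in step 1, not device code. The Session class, its fields, and the admit_login function are hypothetical names, and the sketch assumes a limit of at least 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    """One online login of a user account (hypothetical model)."""
    session_id: int
    idle_seconds: int


def admit_login(sessions: List[Session], max_onlines: int,
                force_logout: bool, new_id: int) -> Optional[Session]:
    """Try to admit a new login for one user account.

    Returns the session that was logged out to make room, or None if no
    session had to be logged out. Raises RuntimeError if the login is refused.
    Assumes max_onlines >= 1.
    """
    evicted = None
    if len(sessions) >= max_onlines:
        if not force_logout:
            # Limit reached and force logout is disabled: the login fails.
            raise RuntimeError("login refused: max-onlines reached")
        # Limit reached: log out the login with the longest idle time.
        evicted = max(sessions, key=lambda s: s.idle_seconds)
        sessions.remove(evicted)
    sessions.append(Session(new_id, 0))
    return evicted
```

With force logout disabled, the same call raises an error once the limit is reached, which corresponds to the login failure described in the symptom.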
2. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration files, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
SSLVPN_USER_LOGINFAILED
Troubleshooting high availability issues
Troubleshooting BFD
BFD session establishment failure
Symptom
Execute the display bfd session command on the device. If no session information is displayed or the State field value in the output is not Up, the BFD session cannot be established.
Common causes
The following are the common causes of this type of issue:
· The routing table has no routes to the destination address of the BFD session.
· The path detected by BFD fails, and BFD packets cannot be correctly exchanged as a result.
Troubleshooting flow
Figure 176 shows the troubleshooting flowchart.
Figure 176 Flowchart for troubleshooting BFD session establishment failure
Solution
BFD uses a session established between two network devices to detect the bidirectional forwarding path between the devices for the upper-layer application. BFD itself does not have a discovery mechanism. It relies on the notifications from the associated upper-layer protocol to establish a session. After the upper-layer protocol establishes a new neighbor relationship, it sends the neighbor's parameters and detection parameters (including the destination address and source address) to BFD. BFD then establishes a session based on the received parameters. Once the session is established, BFD packets are sent periodically at short intervals. If BFD packets are not received within the detection interval, the bidirectional forwarding path is considered faulty, and fault information is sent to the upper-layer application, which takes corresponding actions. To accurately troubleshoot a BFD session establishment failure, make sure the upper-layer protocol operates correctly.
To resolve the issue:
1. Execute the display bfd session command to identify whether BFD session information exists.
¡ If no BFD session information exists, proceed to step 2 and step 3.
¡ If BFD session information exists, but the State field value is Down, proceed to step 4.
2. Identify whether an upper-layer protocol is associated with BFD.
Execute the display current-configuration command to identify whether an upper-layer protocol is associated with BFD. For example, the following output shows that OSPF is associated with BFD:
interface GigabitEthernet2/0/1
ospf bfd enable
¡ If an upper-layer protocol is associated with BFD, proceed to the next step.
¡ If no upper-layer protocol is associated with BFD, execute the corresponding command to associate an upper-layer protocol with BFD and make sure the configuration is correct.
3. Identify whether the number of BFD sessions exceeds the upper limit of the device.
Execute the display bfd session command to view the value of the Total sessions field. If the value reaches the device upper limit, new BFD sessions cannot be created. To solve this issue, delete some unnecessary BFD sessions by disabling BFD for some upper-layer protocols.
If the number of BFD sessions does not exceed the device upper limit, proceed to the next step.
4. Identify whether the BFD routing or tunnel information is correct.
If BFD is used to detect the connectivity of an IP link, perform the following tasks to check the routing information:
a. Execute the display bfd session command to view the IPv4 or IPv6 address displayed in the DestAddr field.
b. Execute the display ip routing-table or display ipv6 routing-table command to identify whether a route is available to the address displayed in the DestAddr field.
c. If no routes exist, see the Layer 3—IP routing troubleshooting guide to troubleshoot the issue.
If a route exists but the BFD session cannot come up, proceed to the next step.
If BFD is used to detect the connectivity of an LSP, PW, VXLAN tunnel, MPLS TE tunnel, SRLSP, or SRv6 TE policy, see the troubleshooting guide for each module. If the tunnel state is abnormal, troubleshoot the tunnel issue. If the tunnel state is normal but the BFD session cannot come up, proceed to the next step.
5. Identify whether BFD packet sending is correct.
Execute the display bfd session verbose command multiple times to view the value of the Tx count field, which indicates the number of sent packets. If the value remains 0, no BFD packets are sent. Perform the following tasks to check BFD packet sending:
a. Execute the display current-configuration configuration bfd-static-session command to view the interface detected by the static BFD session. For example, the following output shows that GigabitEthernet 2/0/2 (after track-interface) is the interface detected by the static BFD session.
<Sysname> display current-configuration configuration bfd-static-session
#
bfd static chris peer-ipv6 1::2 source-ipv6 1::1 discriminator local 1000 remote 1010 track-interface GigabitEthernet2/0/2
#
b. Execute the display interface interface-type interface-number command to view the running status of the interface. If the value of the Current state or Line protocol state field is not UP, troubleshoot the interface issue. If the interface is running correctly, proceed to step c.
c. Execute the display bfd session command to view the value of the Init mode field, which indicates the BFD operating mode. A node operating in passive mode does not send BFD packets until it receives a BFD control packet from a node operating in active mode.
d. If the value of the Tx count field on the node operating in active mode keeps increasing, this node sends BFD packets correctly. In this case, proceed to step 6 to identify whether the node operating in passive mode can receive BFD packets correctly.
e. If the BFD packet sending failure is caused by reasons other than those mentioned above, proceed to step 8.
6. Identify whether BFD packet receiving is correct.
Execute the display bfd session verbose command multiple times on one end of the BFD session to view the value of the Rx count field, which indicates the number of packets received.
¡ If the value of the Rx count field remains 0, identify whether the peer end of the BFD session sends BFD packets correctly. If the peer end does not send BFD packets correctly, troubleshoot the packet sending failure on the peer end.
If the peer end sends BFD packets correctly, execute the display system internal bfd packet statistics command on the local end to view the data loss statistics in the The detailed discarded packet statistics area. If packet loss occurs, resolve the packet loss issue according to the reasons. If the issue cannot be resolved or no packet loss occurs, proceed to step 8.
¡ If the value of the Rx count field increases and the BFD session enters Init state, the local end can receive BFD packets. In this case, execute the display bfd session verbose command on the peer end of the BFD session to view the value of the Rx count field.
- If the value of the Rx count field on the peer end remains 0 but the local end sends BFD packets correctly, the peer end of the BFD session does not receive packets correctly. This issue will cause the peer end to continually send BFD control packets in Down state, preventing the BFD session from coming up on the local end. Execute the display system internal bfd packet statistics command on the peer end to view the data loss statistics in the The detailed discarded packet statistics area. If packet loss occurs, resolve the packet loss issue according to the reasons. If the issue cannot be resolved or no packet loss occurs, proceed to step 8.
- If the value of the Rx count field on the peer end remains 0 because the local end does not send BFD packets correctly, troubleshoot the packet sending failure on the local end.
If both ends send BFD packets correctly, but one end cannot receive BFD packets, proceed to step 7.
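The counter checks in steps 5 and 6 amount to sampling the Tx count and Rx count fields several times on both ends and checking whether the values increase. The following Python sketch models only that decision and does not communicate with a device. The function names and the returned strings are hypothetical, and the sample values must be collected manually from the display bfd session verbose output.

```python
from typing import Sequence


def is_increasing(samples: Sequence[int]) -> bool:
    """True if the counter grew between the first and the last sample."""
    return len(samples) >= 2 and samples[-1] > samples[0]


def triage(local_tx: Sequence[int], local_rx: Sequence[int],
           peer_tx: Sequence[int], peer_rx: Sequence[int]) -> str:
    """Suggest where to look next, based on repeated counter samples."""
    if not is_increasing(local_tx):
        return "local end is not sending BFD packets (step 5)"
    if not is_increasing(peer_tx):
        # A passive-mode peer stays silent until it receives a control packet.
        return "peer end is not sending BFD packets (step 5 on the peer)"
    if not is_increasing(local_rx):
        return "local end is not receiving: check discard statistics locally (step 6)"
    if not is_increasing(peer_rx):
        return "peer end is not receiving: check discard statistics on the peer (step 6)"
    return "both ends send and receive: collect information (step 8)"
```

For example, triage([1, 5], [0, 0], [2, 9], [0, 4]) reports that the local end is not receiving, because the local Rx count stays at 0 while both Tx counts grow.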
7. Identify whether the link detected by the BFD session can forward packets correctly.
Execute the ping command to identify whether the link detected by the BFD session can forward packets correctly. Use different ping commands by link type, as shown in Table 19.
Table 19 Ping commands for different types of links
Link type | Ping command
IP links | ping ip or ping ipv6
LSP tunnels | ping mpls ipv4
MPLS TE tunnels | ping mpls te
PWs | ping mpls pw
SRv6 TE policies | ping srv6-te policy
¡ If the other end cannot be pinged, see Ping and Tracert—Ping Failure Troubleshooting Guide, MPLS Troubleshooting Guide, and Segment Routing Troubleshooting Guide to troubleshoot tunnel issues.
¡ If the other end can be pinged, proceed to the next step.
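When the check in step 7 is scripted, the link-type table can be kept as a simple lookup. The following Python sketch only returns the base command names listed in Table 19. The dictionary and function names are hypothetical, and the command arguments (addresses, tunnel names, and so on) are intentionally left out.

```python
from typing import Dict, List

# Link type -> base ping commands, copied from the table in step 7.
PING_COMMANDS: Dict[str, List[str]] = {
    "IP link": ["ping ip", "ping ipv6"],
    "LSP tunnel": ["ping mpls ipv4"],
    "MPLS TE tunnel": ["ping mpls te"],
    "PW": ["ping mpls pw"],
    "SRv6 TE policy": ["ping srv6-te policy"],
}


def ping_commands_for(link_type: str) -> List[str]:
    """Return the base ping commands for a link type, or raise ValueError."""
    try:
        return PING_COMMANDS[link_type]
    except KeyError:
        raise ValueError(f"no ping command listed for link type: {link_type}")
```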
8. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
BFD session flapping
Symptom
When the link is unstable, the CLI might frequently output log messages about BFD sessions going down. For example:
%Jul 28 16:03:50:856 2022 H3C BFD/4/BFD_CHANGE_FSM: Sess[192.168.24.4/192.168.24.2, LD/RD:33793/33793, Interface:GE2/0/1, SessType:Ctrl, LinkType:INET], Ver:1, Sta: UP->DOWN, Diag: 7 (Administratively Down)
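When many such messages are collected from the log buffer, the state transition and the Diag value can be extracted with a script. The following Python sketch is based only on the sample messages shown in this section. The regular expression, the function name, and the dictionary keys are hypothetical, and the meaning of the two addresses in the Sess field is not interpreted here.

```python
import re
from typing import Dict, Optional, Union

# Pattern derived from the sample BFD_CHANGE_FSM messages in this section.
BFD_FSM_RE = re.compile(
    r"BFD_CHANGE_FSM:\s*Sess\[(?P<addr1>[^/\]]+)/(?P<addr2>[^,\]]+),\s*"
    r"LD/RD:(?P<ld>\d+)/(?P<rd>\d+),\s*Interface:(?P<interface>[^,]+),"
    r".*?Sta:\s*(?P<old>\w+)->(?P<new>\w+),\s*"
    r"Diag:\s*(?P<code>\d+)\s*\((?P<text>[^)]*)\)"
)


def parse_bfd_fsm(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Extract the session key, state change, and Diag value from one log line."""
    m = BFD_FSM_RE.search(line)
    if m is None:
        return None  # Not a BFD_CHANGE_FSM message in the expected format.
    return {
        "addr1": m.group("addr1"),
        "addr2": m.group("addr2"),
        "local_discr": int(m.group("ld")),
        "remote_discr": int(m.group("rd")),
        "interface": m.group("interface"),
        "old_state": m.group("old"),
        "new_state": m.group("new"),
        "diag_code": int(m.group("code")),
        "diag_text": m.group("text"),
    }
```

Counting the parsed results per session and per Diag value gives a quick view of which sessions flap and why, before the detailed steps are followed.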
Common causes
The following are the common causes of this type of issue:
· Physical link failures.
· Upper-layer protocol failures.
· Hardware failures.
Troubleshooting flow
The troubleshooting flow for this issue is as follows:
1. Determine the reasons according to the BFD logs.
2. Check the board hardware, physical link, upper-layer protocol state, route availability, and whether the tunnel is correctly established.
Figure 177 shows the troubleshooting flowchart.
Figure 177 Flowchart for troubleshooting BFD session flapping
Solution
CAUTION:
· Output of excessive debugging messages increases the CPU usage and affects the system operation. To guarantee system performance, enable debugging for only the specified modules.
· Save the execution results of the following steps. If the issue cannot be resolved, contact Technical Support.
BFD uses a session established between two network devices to detect the bidirectional forwarding path between the devices for the upper-layer application. BFD itself does not have a discovery mechanism. It relies on the notifications from the associated upper-layer protocol to establish a session. After the upper-layer protocol establishes a new neighbor relationship, it sends the neighbor's parameters and detection parameters (including the destination address and source address) to BFD. BFD then establishes a session based on the received parameters. Once the session is established, BFD packets are sent periodically at short intervals. If BFD packets are not received within the detection interval, the bidirectional forwarding path is considered faulty, and fault information is sent to the upper-layer application, which takes corresponding actions. To accurately troubleshoot BFD session flapping, make sure the upper-layer protocol operates correctly.
To resolve the issue:
1. Identify whether the BFD session state changes from Init to Down.
The following log message output by the CLI indicates that the BFD session state changes from Init to Down:
BFD/4/BFD_CHANGE_FSM: Sess[20.0.4.2/20.0.4.1,LD/RD:533/532, Interface:Vlan204, SessType:Ctrl, LinkType:INET], Ver.1, Sta: INIT->DOWN, Diag: 1 (Control Detection Time Expired).
a. Identify whether the link detected by the BFD session can forward packets correctly.
Execute the ping command to identify whether the link detected by the BFD session can forward packets correctly. Use different ping commands for different types of links, as shown in Table 20.
Table 20 Ping commands for different types of links
Link type | Ping command
IP links | ping ip or ping ipv6
LSP tunnels | ping mpls ipv4
MPLS TE tunnels | ping mpls te
PWs | ping mpls pw
SRv6 TE policies | ping srv6-te policy
- If the other end cannot be pinged successfully, see Ping and Tracert—Ping Failure Troubleshooting Guide, MPLS Troubleshooting Guide, and Segment Routing Troubleshooting Guide to troubleshoot tunnel issues.
- If the other end can be pinged successfully, proceed to the next step.
b. Check the BFD packet receiving on the local end.
Execute the debugging bfd packet receive command to enable debugging for received BFD packets.
- If the value of the Sta field in the debugging information is 1 or no debugging information is displayed, it indicates that the local end receives BFD packets in Down state or cannot receive BFD packets. In this case, proceed to the next step.
- If the value of the Sta field in the debugging information is 2 or 3, it indicates that the local end receives BFD packets in Init or Up state, but the BFD session cannot come up on the local end. In this case, proceed to the next step.
c. Check the BFD packet receiving on the peer end as described in "BFD session establishment failure."
- If the peer end cannot receive BFD packets, see "BFD session establishment failure" to troubleshoot the issue.
- If the peer end can receive BFD packets, execute the display system internal bfd packet statistics command on the local end to view the data loss reasons in the The detailed discarded packet statistics area and resolve the packet loss issue according to the reasons. If the issue cannot be resolved or no packet loss occurs, proceed to the next step.
2. Identify whether the BFD session state changes from Up to Down.
The following log message output by the CLI indicates that the BFD session state changes from Up to Down:
BFD/4/BFD_CHANGE_FSM: Sess[20.0.4.2/20.0.4.1,LD/RD:533/532, Interface:Vlan204, SessType:Ctrl, LinkType:INET], Ver.1, Sta: UP->DOWN, Diag: 1 (Control Detection Time Expired).
A state change from Up to Down for a BFD session is typically related to issues during the session negotiation phase. You can identify the reason based on the value of the Diag field in the BFD log message.
Table 21 Diagnostic information for different values of the Diag field
Value of the Diag field | Diagnostic information
1 (Control Detection Time Expired) | Local detection times out for a control-mode BFD session. The local end does not receive any packet from the peer end within the detection interval.
2 (Echo Function Failed) | Detection times out for an echo-mode BFD session. The local end does not receive any packet from the peer end within the detection interval.
3 (Neighbor Signaled Session Down) | The peer end notifies the local end of BFD session down.
If the value of the Diag field is 1, perform the following tasks:
a. Disable BFD for the upper-layer protocol, and then select the appropriate tool to check link connectivity based on the type of the link.
- If link flapping occurs, troubleshoot the link flapping issue.
- If the link is operating correctly and the BFD session still flaps after you re-enable BFD for the upper-layer protocol, proceed to the next step.
b. Check the BFD packet receiving on the local end.
Check the BFD packet receiving on the local end as described in "BFD session establishment failure."
- If the local end can receive BFD packets but packet loss occurs, execute the display system internal bfd packet statistics command to view the data loss reasons in the The detailed discarded packet statistics area and resolve the packet loss issue according to the reasons. If the issue cannot be resolved or no packet loss occurs, proceed to the next step.
- If the local end cannot receive BFD packets, proceed to the next step.
If the value of the Diag field is 2, perform the following tasks:
¡ If BFD is used to detect a single-hop IP link, use the ping command on the peer end to ping the source address of the echo-mode BFD session.
- If the IP address cannot be pinged, it indicates a link failure. Troubleshoot the link failure.
- If the IP address can be pinged, proceed to the next step.
¡ If BFD is used to detect an MPLS tunnel, the local end sends BFD echo packets through the MPLS tunnel and the peer end forwards the received BFD echo packets through an IP link. In this case, check the connectivity of the MPLS tunnel and the IP link.
- If the MPLS tunnel or IP link fails, troubleshoot the MPLS tunnel failure or IP link failure.
- If the MPLS tunnel and IP link operate correctly, proceed to the next step.
¡ If BFD is used to detect an SRv6 tunnel, the local end sends BFD echo packets through the SRv6 tunnel and the peer end forwards the received BFD echo packets through an IP link. In this case, check the connectivity of the SRv6 tunnel and the IP link.
- If the SRv6 tunnel or IP link fails, troubleshoot the SRv6 tunnel failure or IP link failure.
- If the SRv6 tunnel and IP link operate correctly, proceed to the next step.
¡ If uRPF is enabled for the device, it drops the echo packets forwarded back from the peer end. In this case, execute the display ip urpf command to identify whether an ACL is specified to permit the packets whose source IP address is the source IP address of the echo-mode BFD session. This configuration suppresses uRPF from dropping packets that match the ACL.
- If no ACL is specified for packet drop suppression, use the ip urpf command to specify an ACL that permits the packets whose source IP address is the source IP address of the echo-mode BFD session.
- If such an ACL is specified for packet drop suppression, proceed to the next step.
If the value of the Diag field is 3, perform the same tasks as those performed when the value of the Diag field is 1.
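The Diag-based branching in step 2 can be summarized as a lookup from the Diag value to the first checks to perform. The following Python sketch restates the guidance from this section. The DIAG_GUIDE table and the advise function are hypothetical names, the action texts are abbreviated summaries, and the entry for value 7 is taken from the Administratively Down handling described in this section.

```python
from typing import Dict, Tuple

# Diag value -> (name shown in the log message, first checks to perform).
DIAG_GUIDE: Dict[int, Tuple[str, str]] = {
    1: ("Control Detection Time Expired",
        "check link connectivity, then BFD packet receiving on the local end"),
    2: ("Echo Function Failed",
        "check the echo path (link or tunnel) and uRPF packet drop suppression"),
    7: ("Administratively Down",
        "check whether the upper-layer protocol is stable"),
}
# Diag 3 is handled with the same tasks as Diag 1.
DIAG_GUIDE[3] = ("Neighbor Signaled Session Down", DIAG_GUIDE[1][1])


def advise(diag: int) -> str:
    """Return a one-line hint for a Diag value from a BFD log message."""
    name, action = DIAG_GUIDE.get(
        diag, ("unknown", "collect information and contact Technical Support"))
    return f"Diag {diag} ({name}): {action}"
```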
3. Identify whether the BFD session state changes to Administratively Down.
The following log message output by the CLI indicates that the BFD session state changes to Administratively Down:
BFD/5/BFD_CHANGE_SESS: Sess[17.1.1.2/17.1.1.1, LD/RD:1537/1537, Interface:GE2/0/1, SessType:Ctrl, LinkType:INET], Ver:1, Sta: Deleted, Diag: 7 (Administratively Down)
The common cause of the issue is upper-layer protocol failure, which indirectly causes BFD flapping. First, disable BFD for the upper-layer protocol and identify whether the upper-layer protocol remains stable. If the upper-layer protocol flaps, see Layer 3—IP Routing Troubleshooting Guide, MPLS Troubleshooting Guide, and Segment Routing Troubleshooting Guide to troubleshoot the upper-layer protocol failure.
If the upper-layer protocol is stable but the BFD session cannot come up, proceed to the next step.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module name: HH3C-BFD-STD-MIB
· hh3cBfdSessStateUp (1.3.6.1.4.1.25506.2.72.0.3)
· hh3cBfdSessStateDown (1.3.6.1.4.1.25506.2.72.0.4)
Log messages
· BFD/4/BFD_CHANGE_FSM
· BFD/5/BFD_CHANGE_SESS
Troubleshooting SBFD
SBFD session establishment failure
Symptom
Execute the display sbfd session initiator command on the device. If no session information is displayed or the Session state field value in the command output is not Up, the SBFD session cannot come up.
Common causes
The following are the common causes of this type of issue:
· The upper-layer protocol has not issued an SBFD session creation command.
· The number of SBFD sessions has exceeded the device capacity.
· The SBFD local discriminator is not configured on the reflector.
· Route entry or related forwarding entry anomalies occur on the initiator and reflector of the SBFD session.
· The link detected by SBFD is faulty. As a result, SBFD packets cannot be exchanged correctly.
Analysis
Figure 178 shows the troubleshooting flowchart.
Figure 178 Flowchart for troubleshooting SBFD session establishment failure
Solution
To resolve the issue:
1. Execute the display sbfd session initiator command to verify that SBFD session information exists.
a. If no SBFD session information exists, proceed to step 2.
b. If SBFD session information exists, but the Session state field value is Down, proceed to step 4.
2. Verify that the configuration of collaboration between an upper-layer protocol and SBFD exists.
Execute the display current-configuration command to identify whether the configuration of collaboration between an upper-layer protocol and SBFD exists on the initiator.
For example, the configuration of collaboration between an SRv6 TE policy and SBFD is as follows:
segment-routing ipv6
traffic-engineering
policy 1
sbfd enable remote 1000
¡ If such a collaboration configuration exists, proceed to step 3.
¡ If no such collaboration exists, configure the settings for the collaboration between the upper-layer protocol and SBFD, and make sure the settings are correct.
3. Identify whether the number of SBFD sessions has exceeded the device capacity.
a. Execute the display system internal bfd capability command to view the maximum number of sessions supported by the device in the Max session count field.
b. Use the display sbfd session initiator command to view the number of SBFD sessions on the device. Use the display bfd session command to view the number of BFD sessions on the device.
c. Identify whether the total number of SBFD and BFD sessions has reached the maximum number of sessions supported by the device.
- If the total number of SBFD and BFD sessions has reached the maximum number of sessions supported by the device, new SBFD sessions cannot be created. To resolve this issue, you can delete unnecessary SBFD or BFD sessions. For example, if an unnecessary BFD session for OSPF exists, delete this session by using the undo ospf bfd enable command.
- If the total number of SBFD and BFD sessions has not reached the maximum number of sessions supported by the device, proceed to step 4.
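The capacity check in step 3 is simple arithmetic: add the SBFD and BFD session counts and compare the sum with the Max session count value. The following Python sketch shows the comparison. The function names are hypothetical, and the numbers used with them are illustrative values, not the limits of any specific device.

```python
def remaining_capacity(sbfd_sessions: int, bfd_sessions: int,
                       max_sessions: int) -> int:
    """Number of additional sessions that can still be created (never negative)."""
    return max(max_sessions - (sbfd_sessions + bfd_sessions), 0)


def can_create_sbfd_session(sbfd_sessions: int, bfd_sessions: int,
                            max_sessions: int) -> bool:
    """True if the combined SBFD and BFD session count is below the limit."""
    return remaining_capacity(sbfd_sessions, bfd_sessions, max_sessions) > 0
```

For example, with 300 SBFD sessions, 700 BFD sessions, and an assumed limit of 1024, remaining_capacity returns 24, so new SBFD sessions can still be created.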
4. Identify whether the discriminators on SBFD initiator and reflector match.
Execute the display sbfd session initiator command on the SBFD session initiator to view the value for the Remote discr field. Then, execute the display current-configuration command on the SBFD session reflector to view the value of the local discriminator.
¡ If the remote discriminator of the SBFD session initiator matches the local discriminator of the SBFD session reflector, proceed to step 5.
¡ If the remote discriminator of the SBFD session initiator does not match the local discriminator of the SBFD session reflector, take relevant actions based on the actual situation.
- If the SBFD session initiator does not have a remote discriminator specified, specify a remote discriminator by using the command for SBFD collaboration with the upper-layer protocol, or by using the sbfd destination ipv4 or sbfd destination ipv6 command.
- If the SBFD session reflector does not have a local discriminator specified, or the local discriminator does not match the remote discriminator specified for the initiator, execute the sbfd local-discriminator command to configure a local discriminator or edit the local discriminator value.
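The discriminator comparison in step 4 can be expressed as a small decision function. The following Python sketch is an illustration only. The function name and the returned strings are hypothetical, None stands for a discriminator that is not configured, and the sketch assumes a single local discriminator on the reflector.

```python
from typing import Optional


def discriminator_action(initiator_remote: Optional[int],
                         reflector_local: Optional[int]) -> str:
    """Return the action suggested by step 4 for a pair of discriminators."""
    if initiator_remote is None:
        return "specify a remote discriminator on the initiator"
    if reflector_local is None:
        return "configure a local discriminator on the reflector"
    if initiator_remote != reflector_local:
        return "edit the local discriminator on the reflector to match"
    return "discriminators match: proceed to step 5"
```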
5. Verify that the SBFD route and tunnel information are normal.
When the SBFD initiator or reflector forwards SBFD packets through the IP path, take the following steps to examine the route information:
a. Execute the display sbfd session initiator command on the SBFD session initiator to view the IPv4 or IPv6 address in the Destination IP field.
b. Execute the display ip routing-table or display ipv6 routing-table command on the SBFD session initiator to identify whether a route is available to the destination address indicated by the Destination IP field.
c. If no such route exists, troubleshoot the route issue. For more information, see the Layer 3 IP routing troubleshooting guide.
If such a route exists, but the SBFD session cannot come up, proceed to step 6.
When the SBFD initiator or reflector forwards SBFD packets through an LSP, PW, VXLAN tunnel, MPLS TE tunnel, SRLSP, or SRv6 TE policy, verify the tunnel status. For more information, see the troubleshooting guide for the associated module. If the tunnel status is abnormal, troubleshoot the tunnel failure. If the tunnel status is normal, but the SBFD session cannot come up, proceed to step 6.
6. Verify that the SBFD packets are sent correctly.
Repeatedly execute the display sbfd session initiator verbose command to view the changes in the value for the Tx count field. The Tx Count value represents the number of packets transmitted. If the value for this field is fixed at 0 or does not change, SBFD packet sending is abnormal. Follow these steps to examine the SBFD packet transmission status:
a. Execute the display interface interface-type interface-number command to view the interface running status. If the value for the Current state or Line protocol state field is not UP, troubleshoot the interface failure. If the interface is running correctly, proceed to step b.
b. Execute the debugging bfd error command to identify the packet sending failure reason based on the debugging information, and troubleshoot the failure according to the failure reason. For example:
<Sysname> debugging bfd error
*Feb 22 11:27:58:715 2023 Sysname BFD/7/DEBUG: Encap link head return:0x40010001
The information above indicates a link-layer header encapsulation failure for SBFD packets. If you cannot troubleshoot the failure, proceed to step 9.
c. If SBFD can send packets correctly after the previous operations, but the SBFD session cannot come up or the SBFD session flaps, proceed to step 7. If SBFD packet sending anomalies persist after the previous operations, proceed to step 8.
7. Verify that the SBFD packets are received correctly.
Repeatedly execute the display sbfd session initiator verbose command on the SBFD initiator to view the value for the Rx Count field. The Rx Count value represents the number of packets received.
a. If the value for this field is fixed at 0 or does not change, SBFD packet receiving is abnormal. Execute the display system internal bfd packet statistics command in probe view to check the packet loss counts under The detailed discarded packet statistics. If packet loss occurs, troubleshoot the fault according to the packet loss reason.
- If you cannot troubleshoot the fault, proceed to step 8.
- If no packet loss occurs, proceed to step 9.
b. If the value for the Rx Count field keeps increasing, but the SBFD session flaps, proceed to step 8.
8. Verify that the SBFD packets are forwarded correctly.
Use the ping tool to identify whether the link associated with the SBFD session can forward packets correctly. The ping tool used varies by link type. For more information, see Table 22.
Table 22 Link types and the associated ping tools
Link type | Ping tool
IP link | IP ping tool. Execute the ping ip or ping ipv6 command to verify the reachability of the specified IPv4 address or IPv6 address.
LSP tunnel | MPLS ping tool. Execute the ping mpls ipv4 command to verify the LSP tunnel connectivity.
MPLS TE tunnel | MPLS ping tool. Execute the ping mpls te command to verify the MPLS TE tunnel connectivity.
PW | MPLS ping tool. Execute the ping mpls pw command to verify the PW tunnel connectivity.
SRv6 TE policy tunnel | SRv6 TE policy ping tool. Execute the ping srv6-te policy command to verify the SRv6 forwarding path connectivity.
¡ If the ping operation fails, troubleshoot the link failure. For more information, see the ping, MPLS, and segment routing troubleshooting guides.
¡ If the ping operation succeeds, proceed to step 9.
9. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module name: HH3C-BFD-STD-MIB
· hh3cBfdSessNumberLimit (1.3.6.1.4.1.25506.2.72.1.1.4)
Log messages
· BFD_REACHED_UPPER_LIMIT
Troubleshooting VRRP
Master/backup state change for the VRRP group
Symptom
The master and backup states of the devices have changed in the VRRP group. Log in to the two devices in the VRRP group separately, and execute the display vrrp command:
· In the command output, if the State field value is Master on one device and is Backup on the other device, the VRRP group is operating correctly, and no action is required.
· If other conditions exist, take relevant actions as described in this document.
<Sysname> display vrrp
IPv4 Virtual Router Information:
Running Mode : Standard
Enhanced sending of gratuitous ARP packets : Disabled
Total number of virtual routers : 1
Interface VRID State Running Adver Auth Virtual
Pri Timer Type IP
---------------------------------------------------------------------
GE2/0/1 1 Master 150 100 Simple 1.1.1.1
Common causes
The following are the common causes of this type of issue:
· An interface event is received. The state of the interface configured with the VRRP group has changed.
· The virtual IP address of the VRRP group is deleted.
· The status of the track entry associated with the VRRP group has changed.
· The master receives a VRRP packet with a higher priority.
· The current device becomes the IP address owner.
· Upon expiration of the timer, the backup has not received any VRRP packet from the master.
· The backup receives a packet with a priority of 0.
· A preemption occurs.
· The VRRP group is associated with a master VRRP group. A state change in master VRRP group results in a state change in the subordinate VRRP group.
Analysis
Figure 179 shows the troubleshooting flowchart.
Figure 179 Flowchart for troubleshooting master/backup state change of the VRRP group
Solution
1. To resolve the issue:
2. Log in to the two devices in the VRRP group separately and execute the display logbuffer | include VRRP_STATUS_CHANGE command. Obtain the log with the VRRP_STATUS_CHANGE digest. This log carries the state of the device in the VRRP group and the reason for the state change.
¡ The states of a device in the VRRP group include:
- Master—Indicates that the device is the master in the VRRP group.
- Backup—Indicates that the device is the backup in the VRRP group.
- Initialize—Indicates that the VRRP group is disabled on the device.
- Inactive—Indicates that the VRRP group is in invalid state. The reason might be that the virtual IP address is not configured, or the VRRP group is associated with a nonexistent master VRRP group.
¡ The state change reasons for the VRRP group of the device include:
- Interface event received—Indicates that an interface event has been received, and the state of the interface configured with the VRRP group has changed (reason 1).
- IP address deleted—The virtual IP address of the VRRP group has been deleted (reason 2).
- The status of the tracked object changed—The status of the track entry associated with the VRRP group has changed (reason 3).
- VRRP packet received—The master received a VRRP packet with a higher priority (reason 4).
- Current device has changed to IP address owner—The current device has become the IP address owner (reason 5).
- Master-down-timer expired—Upon expiration of the master-down-timer, the backup has not received any VRRP packet from the master (reason 6).
- Zero priority packet received—The backup has received a packet with a priority of 0 (reason 7).
- Preempt—A preemption has occurred (reason 8).
For example, the following log indicates that the state of the VRRP group on interface GigabitEthernet2/0/1 has changed from Master to Initialize due to a state change of interface GigabitEthernet2/0/1.
<Sysname> display logbuffer | include VRRP_STATUS_CHANGE
%Mar 12 14:10:32:110 2023 Sysname VRRP4/6/VRRP_STATUS_CHANGE: The status of IPv4 virtual router 1 (configured on GigabitEthernet2/0/1) changed from Master to Initialize: Interface event received.
3. Take relevant actions based on the VRRP state in the log and the state change reason.
¡ For reason 1 (an interface event has been received, and the state of the interface configured with the VRRP group has changed):
Execute the display interface command on both the local and remote ends to view the state of the connected interfaces of the VRRP group. If the interface state is Down, locate and resolve the interface failure based on the output information.
¡ For reason 2 (the virtual IP address of the VRRP group has been deleted): Execute the vrrp [ ipv6 ] vrid command in interface view to configure a virtual IP address for the VRRP group.
¡ For reason 3 (state change of the track entry associated with the VRRP group), first execute the display vrrp [ ipv6 ] command to obtain the ID of the associated track entry, and then use the display track command to locate and resolve the track entry fault.
¡ For reason 4 (the master received a VRRP packet with a higher priority), no action is required.
¡ For reason 5 (the current device has become the IP address owner), as a best practice, take the following actions:
Identify whether the local device is required to be configured as the IP address owner of the VRRP group: Execute the display vrrp [ ipv6 ] command without parameters on the local device to view the virtual IP address of the VRRP group, and execute the display interface brief command on the local device to view the device interface IP address. If the IP address of an interface on the device in the VRRP group is the same as the virtual IP address of the VRRP group, the device is called the IP address owner. If the VRRP group contains an IP address owner, it acts as the master as long as it is operating correctly.
- If the device is required to be configured as the IP address owner, no action is required.
- If the device is not required to be configured as the IP address owner, execute the vrrp [ ipv6 ] vrid command in interface view to edit the virtual IP address of the VRRP group.
¡ For reason 6 (the backup has not received any VRRP packet from the master upon expiration of the timer):
- Identify whether the peer device is faulty. Execute the display vrrp [ ipv6 ] command on the peer device. If the value for the State field is Initialize, VRRP fails to operate on this device. Identify the failure reason and restore the peer device.
- Identify whether the connected interfaces of the VRRP group are faulty. Execute the display interface command on both the local and peer devices to view the connected interface state of the VRRP group. If the interface state is Down, locate and resolve the interface failure based on the output information.
- Identify whether the VRRP configuration is incorrect by executing the display current-configuration | include vrrp command on the local and peer devices. The VRRP settings on the local and peer devices must meet the following requirements:
# The VRRP group number and virtual IP address settings must be the same on the local and peer devices. If they are different, configure them again with the vrrp [ ipv6 ] vrid command.
# IPv4 VRRP requires version consistency. If the versions are inconsistent, execute the vrrp version command in interface view to edit the version number. IPv6 VRRP supports only VRRPv3, and the version is not configurable.
# IPv4 VRRP requires authentication mode consistency. In addition, make sure authentication keys, if configured, are consistent. If the previous settings are inconsistent, execute the vrrp vrid authentication-mode command in interface view to edit the settings. IPv6 VRRP does not support authentication.
¡ For reason 7 (the backup has received a packet with a priority of 0), as a best practice, take the following actions:
- Execute the display vrrp [ ipv6 ] verbose command on the local and peer devices to view the configured VRRP priority (Config pri field).
If the configuration is correct, no action is required.
If the configuration is incorrect, execute the vrrp [ ipv6 ] vrid priority command in interface view to edit the configuration.
- Execute the display vrrp [ ipv6 ] verbose command on the local and peer devices to view the configured VRRP priority (Config pri field) and effective VRRP priority (Running pri field). If the values are inconsistent, obtain the associated track entry number, and then use the display track command to locate and resolve the track entry fault.
¡ For reason 8 (a preemption has occurred):
- If the preemption is manually triggered by the administrator, no action is required.
- If the preemption is automatically performed, the monitored object is faulty, and you need to further locate the reason for the automatic preemption.
4. Identify whether the state change of the VRRP group is due to the state change of the associated master VRRP group.
5. Execute the display vrrp [ ipv6 ] verbose command on the local device to obtain the name of the associated master VRRP group from the Follow Name field.
¡ If the master VRRP group does not exist, execute the vrrp [ ipv6 ] vrid command to create the master VRRP group.
¡ If the master VRRP group already exists, further locate the reason for state change in the master VRRP group according to the reason field value provided in the master VRRP group log.
6. If the issue persists, collect configuration data, log messages, and alarm information, and then contact H3C Support for help.
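The log format and reason list in the solution above can be turned into a small parser for triaging many log lines at once. The following is a minimal sketch, not an H3C tool. The function name `parse_vrrp_status_change` and the regular expression are assumptions based only on the single sample log line in this section. The IPv6 log (VRRP6) is assumed to follow the same layout.

```python
import re

# Reason strings and their numbers, as listed in this section (reasons 1 to 8).
REASONS = {
    "Interface event received": 1,
    "IP address deleted": 2,
    "The status of the tracked object changed": 3,
    "VRRP packet received": 4,
    "Current device has changed to IP address owner": 5,
    "Master-down-timer expired": 6,
    "Zero priority packet received": 7,
    "Preempt": 8,
}

# Pattern derived from the sample log line only (assumption).
LOG_RE = re.compile(
    r"VRRP(?P<family>[46])/\d+/VRRP_STATUS_CHANGE: "
    r"The status of IPv[46] virtual router (?P<vrid>\d+) "
    r"\(configured on (?P<interface>[^)]+)\) "
    r"changed from (?P<old>\w+) to (?P<new>\w+): (?P<reason>.+?)\.?\s*$"
)


def parse_vrrp_status_change(line):
    """Return a dict describing the state change, or None if the line does not match."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["vrid"] = int(info["vrid"])
    # None means the reason text is not one of the eight reasons listed above.
    info["reason_number"] = REASONS.get(info["reason"])
    return info


sample = (
    "%Mar 12 14:10:32:110 2023 Sysname VRRP4/6/VRRP_STATUS_CHANGE: "
    "The status of IPv4 virtual router 1 (configured on GigabitEthernet2/0/1) "
    "changed from Master to Initialize: Interface event received."
)
result = parse_vrrp_status_change(sample)
```

The returned reason number indicates which branch of the procedure above to follow for that log entry.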
Related alarm and log messages
Alarm messages
Module name: VRRP-MIB
· vrrpTrapNewMaster (1.3.6.1.2.1.68.0.1)
Module name: HH3C-VRRP-EXT-MIB (supported only by V7B75)
· hh3cVrrpExtStateChange (1.3.6.1.4.1.25506.2.24.2.0.1)
Log messages
· VRRP4/6/VRRP_STATUS_CHANGE
· VRRP6/6/VRRP_STATUS_CHANGE
RBM issues
RBM channel establishment failure
Symptom
The RBM states on the two devices are both disconnected, and RBM channels cannot be established.
Common causes
The following are the common causes of this type of issue:
· The models and software versions of the devices are inconsistent.
· The RBM configurations of the two devices are incorrect.
· The RBM channel interfaces are in abnormal state.
Troubleshooting flow
Figure 180 shows the troubleshooting flowchart.
Figure 180 Troubleshooting flowchart for RBM channel establishment failure
Solution
1. Identify whether the models and software versions of the devices are consistent.
Use the display boot-loader command to identify whether the device models and versions for MPUs and service modules are consistent.
<Device> display boot-loader
Software images on slot 0:
Current software images:
cfa0:/SR6600X-CMW710-BOOT-F8149L19-RSE3.bin
cfa0:/SR6600X-CMW710-SYSTEM-F8149L19-RSE3.bin
Main startup software images:
cfa0:/SR6600X-CMW710-BOOT-F8149L19-RSE3.bin
cfa0:/SR6600X-CMW710-SYSTEM-F8149L19-RSE3.bin
Backup startup software images:
None
Software images on slot 1:
Current software images:
cfa0:/SR6600X-CMW710-BOOT-F8149L19-RSE3.bin
cfa0:/SR6600X-CMW710-SYSTEM-F8149L19-RSE3.bin
Main startup software images:
cfa0:/SR6600X-CMW710-BOOT-F8149L19-RSE3.bin
cfa0:/SR6600X-CMW710-SYSTEM-F8149L19-RSE3.bin
Backup startup software images:
None
2. Enter RBM view, and identify whether the RBM configurations are correct for the two devices.
Use the display remote-backup-group status command to identify whether the local IP and remote IP are correctly configured on the two ends, the remote-IP-associated ports are specified and consistent on the two ends, and the RBM roles are correctly configured on the two ends. (Make sure one device is assigned the primary role, and the other device is assigned the secondary role.)
Examine RBM status information:
<Device> display remote-backup-group status
Remote backup group information:
Backup mode: Dual-active
Device management role: Primary(Batch backup in progress)
Device running status: Active
Data channel interface: GigabitEthernet2/0/1
Local IP: 1.1.1.1
Remote IP: 1.1.1.2 Destination port: 1028
……
Edit the settings that are incorrect.
Configure the primary device:
<DeviceA> system-view
[DeviceA] remote-backup group
[DeviceA-remote-backup-group] remote-ip 1.1.1.2
[DeviceA-remote-backup-group] local-ip 1.1.1.1
[DeviceA-remote-backup-group] data-channel interface Route-Aggregation99
[DeviceA-remote-backup-group] device-role primary
Configure the secondary device:
<DeviceB> system-view
[DeviceB] remote-backup group
[DeviceB-remote-backup-group] remote-ip 1.1.1.1
[DeviceB-remote-backup-group] local-ip 1.1.1.2
[DeviceB-remote-backup-group] data-channel interface Route-Aggregation99
[DeviceB-remote-backup-group] device-role secondary
3. Examine the RBM channel interface state.
If the RBM channel interface is a physical interface, execute the display interface command to identify whether the physical and protocol states of the interface are both up. If the physical or protocol state is not up, troubleshoot the issue based on the reason displayed in the Cause field.
If the RBM channel interface is an aggregate interface, execute the display interface command to identify whether the aggregate interface is up. If the aggregate interface is not up, troubleshoot the issue. For more information, see the aggregate interface failure troubleshooting procedure for Ethernet link aggregation troubleshooting.
4. After completing the previous operations, execute the display remote-backup-group status command to view RBM status information. If the issue has been resolved, the control channel establishment state is Connected.
Examine RBM status information:
<Device> display remote-backup-group status
Remote backup group information:
Backup mode: Dual-active
Device management role: Primary(Batch backup in progress)
Device running status: Active
Data channel interface: GigabitEthernet2/0/1
Local IP: 1.1.1.1
Remote IP: 1.1.1.2 Destination port: 1028
Control channel status: Connected
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration data, log messages, and alarm information.
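The configuration checks in step 2 amount to verifying that the two ends mirror each other. The following is a minimal sketch of that comparison, assuming the settings have already been collected into plain dictionaries. The function name `check_rbm_pair` and the dictionary keys are hypothetical and do not correspond to any device API.

```python
def check_rbm_pair(a, b):
    """Return a list of problems found when comparing the RBM settings of two devices."""
    problems = []
    # The local IP on one end must be the remote IP on the other end.
    if a["local_ip"] != b["remote_ip"]:
        problems.append("local IP of A does not match remote IP of B")
    if a["remote_ip"] != b["local_ip"]:
        problems.append("remote IP of A does not match local IP of B")
    # The port associated with the remote IP must be consistent on both ends.
    if a.get("port") != b.get("port"):
        problems.append("destination ports differ")
    # Exactly one primary and one secondary role is required.
    if {a["role"], b["role"]} != {"primary", "secondary"}:
        problems.append("roles must be one primary and one secondary")
    return problems


# Values taken from the configuration example in this section.
device_a = {"local_ip": "1.1.1.1", "remote_ip": "1.1.1.2", "port": 1028, "role": "primary"}
device_b = {"local_ip": "1.1.1.2", "remote_ip": "1.1.1.1", "port": 1028, "role": "secondary"}
problems = check_rbm_pair(device_a, device_b)
```

An empty list means the settings covered by this sketch are consistent. It does not check the state of the channel interfaces, which step 3 covers.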
Related alarm and log messages
Alarm messages
Module name: RBM-MIB
· hh3cRbmKeepaliveNormal (1.3.6.1.4.1.25506.2.187.1.2.0.1)
· hh3cRbmKeepaliveFailure (1.3.6.1.4.1.25506.2.187.1.2.0.2)
Log messages
· RBM_CHANNEL
Inconsistent device configurations
Symptom
If the two devices have inconsistent configurations, when an active/standby switchover occurs, services might fail to be migrated smoothly or experience issues.
By default, RBM performs a configuration consistency check every 24 hours. Upon detecting inconsistent configurations, the system reports an inconsistency alarm that names the associated modules. See the example below:
RBM_P[Device]%Dec 17 14:25:43:191 2020 H3C RBM/6/RBM_CFG_COMPARE_START: Started configuration consistency check.
%Dec 17 14:25:44:775 2020 H3C RBM/6/RBM_CFG_COMPARE_RESULT: The following modules have inconsistent configuration: acl.
%Dec 17 14:25:44:775 2020 H3C RBM/6/RBM_CFG_COMPARE_FINISH: Finished configuration consistency check.
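To pull the affected module names out of a log buffer, the RBM_CFG_COMPARE_RESULT message can be matched with a regular expression. The following is a minimal sketch based only on the sample log above. The function name `inconsistent_modules` is hypothetical. The handling of several modules as a comma-separated list is an assumption, because the sample shows a single module.

```python
import re

# Pattern derived from the sample RBM_CFG_COMPARE_RESULT line (assumption).
RESULT_RE = re.compile(
    r"RBM_CFG_COMPARE_RESULT: The following modules have inconsistent "
    r"configuration: (?P<modules>.+?)\.?\s*$"
)


def inconsistent_modules(log_lines):
    """Return the module names reported by RBM_CFG_COMPARE_RESULT messages."""
    modules = []
    for line in log_lines:
        match = RESULT_RE.search(line)
        if match:
            # Assumption: several modules would be separated by commas.
            parts = match.group("modules").split(",")
            modules.extend(p.strip() for p in parts if p.strip())
    return modules


log_lines = [
    "RBM_P[Device]%Dec 17 14:25:43:191 2020 H3C RBM/6/RBM_CFG_COMPARE_START: "
    "Started configuration consistency check.",
    "%Dec 17 14:25:44:775 2020 H3C RBM/6/RBM_CFG_COMPARE_RESULT: "
    "The following modules have inconsistent configuration: acl.",
    "%Dec 17 14:25:44:775 2020 H3C RBM/6/RBM_CFG_COMPARE_FINISH: "
    "Finished configuration consistency check.",
]
modules = inconsistent_modules(log_lines)
```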
Common causes
The following are the common causes of this type of issue:
· The RBM channel is disconnected.
· The configuration has been changed individually on the secondary device.
· Automatic configuration synchronization is not enabled.
Troubleshooting flow
Figure 181 shows the troubleshooting flowchart.
Figure 181 Troubleshooting flowchart for inconsistent device configurations
Solution
1. Execute the display remote-backup-group status command to identify whether the control channel has been established. If the control channel has not been established, troubleshoot the issue. For more information, see the procedures for troubleshooting RBM channel establishment failure. You must first establish a control channel and ensure the RBM channel availability.
Examine RBM status information:
<Device> display remote-backup-group status
Remote backup group information:
Backup mode: Dual-active
Device management role: Primary(Batch backup in progress)
Device running status: Active
Data channel interface: GigabitEthernet2/0/1
Local IP: 1.1.1.1
Remote IP: 1.1.1.2 Destination port: 1028
Control channel status: Connected
……
2. In RBM view, execute the configuration manual-sync-check command to perform a one-off configuration consistency check, and then execute the display remote-backup-group sync-check command to display the configuration consistency check result for RBM.
Display the configuration consistency check result for RBM:
<Device> display remote-backup-group sync-check
Inconsistent configuration exists.
Configuration on secondary device:
#
security-policy ip
rule 0 name abc
source-zone trust
destination-zone untrust
#
Configuration on primary device:
#
security-policy ip
rule 0 name abc
source-zone dmz
destination-zone trust
#
…
3. Identify the configuration differences based on the output from the display remote-backup-group sync-check command.
For example, if the system detects differences in the ACL module, as a best practice, compare ACL configurations of the two devices as follows:
¡ The secondary device has ACL 3000, which does not exist on the primary device.
- If ACL 3000 is required, add it on the primary device, and then execute the configuration manual-sync command to manually synchronize the configuration of the primary device to the secondary device.
- If ACL 3000 is not necessary, execute the configuration manual-sync command to manually synchronize the configuration of the primary device to the secondary device. In this case, the configuration on the secondary device will be overwritten, and ACL 3000 will be removed.
¡ The primary device has ACL 3000, which does not exist on the secondary device.
- If ACL 3000 is required, execute the configuration manual-sync command to manually synchronize the configuration of the primary device to the secondary device.
- If ACL 3000 is not necessary, delete it from the primary device, and then execute the configuration manual-sync command to manually synchronize the configuration of the primary device to the secondary device.
4. Execute the display remote-backup-group status command to identify whether automatic configuration backup is enabled. If the feature is not enabled, execute the configuration auto-sync enable command in RBM view of the two devices to enable automatic configuration backup.
Examine RBM status information:
<Device> display remote-backup-group status
Remote backup group information:
Backup mode: Dual-active
Device management role: Primary(Batch backup in progress)
Device running status: Active
Data channel interface: GigabitEthernet2/0/1
Local IP: 1.1.1.1
Remote IP: 1.1.1.2 Destination port: 1028
Control channel status: Connected
Keepalive interval: 1s
Keepalive count: 10
Configuration consistency check interval: 24 hour
Configuration consistency check result: Consistent
Configuration backup status: Batch backup (Do not operate
the device at will, such as board insertion and removal.)
……
5. After completing the previous operations, initiate a configuration consistency check again. If the issue has been resolved, the configurations are consistent in the configuration consistency check result.
Display the configuration consistency check result for RBM:
<Device> display remote-backup-group sync-check
No inconsistent configuration exists.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration data, log messages, and alarm information.
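When the sync-check output in step 2 is long, the differing lines can be isolated programmatically. The following is a minimal sketch, assuming the output has the two section headings shown in the example. The function names are hypothetical. The comparison is flat: it ignores indentation and the nesting of configuration views, so treat its result as a hint, not a full configuration diff.

```python
SKIP = ("#", "…", "……", "...")


def split_sync_check(output):
    """Split the command output into the secondary and primary configuration lines."""
    sections = {"secondary": [], "primary": []}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if line == "Configuration on secondary device:":
            current = "secondary"
        elif line == "Configuration on primary device:":
            current = "primary"
        elif current and line and line not in SKIP:
            sections[current].append(line)
    return sections


def config_differences(output):
    """Return (lines only on the secondary device, lines only on the primary device)."""
    sections = split_sync_check(output)
    only_secondary = [l for l in sections["secondary"] if l not in sections["primary"]]
    only_primary = [l for l in sections["primary"] if l not in sections["secondary"]]
    return only_secondary, only_primary


# Sample output copied from step 2.
sample = """\
Inconsistent configuration exists.
Configuration on secondary device:
#
security-policy ip
rule 0 name abc
source-zone trust
destination-zone untrust
#
Configuration on primary device:
#
security-policy ip
rule 0 name abc
source-zone dmz
destination-zone trust
#
"""
only_secondary, only_primary = config_differences(sample)
```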
Related alarm and log messages
Alarm messages
Module name: RBM-MIB
· hh3cRbmKeepaliveNormal (1.3.6.1.4.1.25506.2.187.1.2.0.1)
· hh3cRbmKeepaliveFailure (1.3.6.1.4.1.25506.2.187.1.2.0.2)
· hh3cRbmCfgInconsistentTrap (1.3.6.1.4.1.25506.2.187.1.2.0.4)
Log messages
· CFG_BATCH_SYNC
· CFG_COMPARE
· RBM_CHANNEL
Troubleshooting system management issues
NETCONF issues
NETCONF over SOAP access failure
Symptom
The device acts as a NETCONF server. A user failed to log in to the device from a NETCONF over SOAP client.
Common causes
The following are the common causes of this type of issue:
· The SOAP client cannot communicate with the device. A TCP connection fails to be established.
· NETCONF over SOAP is disabled on the device.
· An ACL is configured on the server to control client access and the IP address of the client is not in an ACL-based access control permit rule.
· The local user is not authorized to use the HTTP or HTTPS service.
· The authentication method of the local user is not configured correctly.
· The number of HTTP/HTTPS login users has reached the upper limit.
Troubleshooting flow
Figure 182 shows the troubleshooting flowchart.
Figure 182 Flowchart for troubleshooting NETCONF over SOAP access failure
Solution
1. Identify whether a physical link failure exists.
Log in to the device through Telnet (user role name network-admin). Identify whether the device can ping the IP address of the NETCONF client. If the ping operation fails, execute either the display ip routing-table command or the display route-static routing-table command to obtain the output interface for the route to the client. Then, execute the display interface command to check the interface state.
<Sysname> display interface GigabitEthernet 2/0/1
GigabitEthernet2/0/1
Interface index: 386
Current state: Administratively DOWN
Line protocol state: DOWN
...
a. If the Current state field displays Administratively DOWN, execute the undo shutdown command on the interface to bring up the interface. If the Current state field displays DOWN, check the physical connection of the interface.
b. If other devices exist between the device and the client, check the interface state and restore physical connections on each device hop-by-hop as mentioned above.
2. Execute the display netconf service command to identify whether NETCONF over SOAP is enabled.
<Sysname> display netconf service
NETCONF over SOAP over HTTP: Disabled (port 80)
NETCONF over SOAP over HTTPS: Disabled (port 832)
NETCONF over SSH: Disabled (port 830)
NETCONF over Telnet: Enabled
NETCONF over Console: Enabled
...
If the NETCONF over SOAP over HTTP field displays Disabled, enable NETCONF over SOAP over HTTP by using the netconf soap http enable command in system view.
If the NETCONF over SOAP over HTTPS field displays Disabled, enable NETCONF over SOAP over HTTPS by using the netconf soap https enable command in system view.
3. Identify whether an ACL is configured to control client access.
<Sysname> display current-configuration | begin netconf
netconf soap http enable
netconf soap https enable
netconf soap http acl 2000
#
[Sysname] acl basic 2000
[Sysname-acl-ipv4-basic-2000] display this
#
acl basic 2000
rule 5 permit source 192.168.4.10 0
rule 10 permit source 192.168.4.15 0
...
If configuration about the netconf soap { http | https } acl command exists, use one of the following methods as required:
¡ Execute the rule command in the corresponding ACL view. Make sure the client IP address matches an ACL rule.
¡ Execute the undo netconf soap { http | https } acl command to disassociate NETCONF over SOAP from the ACL.
4. When local authentication is used, identify whether the local user corresponding to the client can use the HTTP and HTTPS services.
Enter local user view and execute the display this command. Make sure the service-type http https command is configured.
<Sysname> system-view
[Sysname] local-user test
[Sysname-luser-manage-test] display this
#
local-user test class manage
service-type http https
authorization-attribute user-role network-operator
5. When local authentication is used, execute the display domain command to check the authentication, authorization, and accounting configuration in the ISP domain.
<Sysname> display domain
Total 12 domains
Domain: system
Current state: Active
State configuration: Active
Default authentication scheme: Local
Default authorization scheme: Local
Default accounting scheme: Local
...
For example, in ISP domain system, execute the following commands to configure local authentication, authorization, and accounting for login users.
<Sysname> system-view
[Sysname] domain system
[Sysname-isp-system] authentication login local
[Sysname-isp-system] authorization login local
[Sysname-isp-system] accounting login local
6. Identify whether the number of users logging in to the device has reached the upper limit.
Execute the display netconf service command on the device to check the value for the Active Sessions field. This field indicates the number of active NETCONF sessions. If the value has reached the maximum number of concurrent HTTP or HTTPS users, use one of the following methods to resolve the issue:
¡ Execute the aaa session-limit { http | https } max-sessions command to set a larger upper limit for concurrent HTTP or HTTPS users that can log in to the device.
¡ Perform the <kill-session> operation to forcibly release some established NETCONF over SOAP sessions so that new users can come online.
# Execute the display netconf session command to view NETCONF session information.
<Sysname> display netconf session
Session ID: 1 Session type : SOAP
Username : yy
...
# The XML message for the <kill-session> operation is as follows:
<rpc message-id="100" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<kill-session>
<session-id>1</session-id>
</kill-session>
</rpc>
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
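The <kill-session> message shown in step 6 can be generated for any session ID instead of being edited by hand. The following is a minimal sketch that uses the Python standard library to build the same rpc payload. The function name `build_kill_session` is hypothetical. The sketch covers only the rpc element. Wrapping it in a SOAP envelope and sending it to the device are outside its scope.

```python
import xml.etree.ElementTree as ET

# Base NETCONF namespace, as used in the XML message in step 6.
NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"


def build_kill_session(session_id, message_id="100"):
    """Return the <kill-session> rpc for the given session ID as a string."""
    # Serialize the NETCONF namespace as the default namespace (no prefix).
    ET.register_namespace("", NETCONF_NS)
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": str(message_id)})
    kill = ET.SubElement(rpc, f"{{{NETCONF_NS}}}kill-session")
    sid = ET.SubElement(kill, f"{{{NETCONF_NS}}}session-id")
    sid.text = str(session_id)
    return ET.tostring(rpc, encoding="unicode")


# Session ID 1 is the SOAP session from the display netconf session example.
xml_text = build_kill_session(1)
```

Use the session ID of a SOAP-type session from the display netconf session output as the argument.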
Related alarm and log messages
Alarm messages
N/A
Log messages
· NETCONF/6/SOAP_XML_LOGIN
NETCONF over SSH access failure
Symptom
A configuration tool failed to log in to the device through SSH.
Solution
Troubleshoot the issue as shown in "Troubleshooting security."
Troubleshooting network management and monitoring issues
Troubleshooting NQA
ICMP echo operation failure
Symptom
1. Execute the display nqa { history | result | statistics } command on the device. The following conditions indicate an operation failure:
¡ If you execute the display nqa history command, the Status field value in the output is not Succeeded.
¡ If you execute the display nqa result or display nqa statistics command, the Extended results field value in the output is not 0.
2. If an application module (such as Track) is associated with a failed NQA operation, the application module takes corresponding actions. For example, the state of the track entry will change from Positive to Negative or NotReady.
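The check on the Extended results area can be automated when results are collected from many devices. The following is a minimal sketch, assuming the text layout of the display nqa result and display nqa statistics output shown in this guide. The function names `extended_results` and `operation_failed` are hypothetical.

```python
import re


def extended_results(output):
    """Collect the counters from the Extended results area of the command output."""
    values = {}
    loss = re.search(r"Packet loss ratio:\s*(\d+)%", output)
    if loss:
        values["packet loss ratio"] = int(loss.group(1))
    for match in re.finditer(r"Failures due to ([\w ]+?):\s*(\d+)", output):
        values[match.group(1)] = int(match.group(2))
    return values


def operation_failed(output):
    """Return True if any value in the Extended results area is not 0."""
    return any(value != 0 for value in extended_results(output).values())


# Extended results area copied from the display nqa result example in this guide.
sample = """\
Extended results:
Packet loss ratio: 0%
Failures due to timeout: 0
Failures due to disconnect: 0
Failures due to no connection: 0
Failures due to internal error: 0
Failures due to other errors: 0
"""
failed = operation_failed(sample)
```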
Common causes
1. If the Status field value in the operation result is Internal error or Unknown error, the following are the common causes of this type of issue:
¡ The device does not have a route or ARP entry destined for the destination address of the operation.
¡ Insufficient device memory.
¡ Other internal reasons.
2. If the Status field value in the operation result is Timeout, the following are the common causes of this type of issue:
¡ Network errors:
- Probe packets are mistakenly identified as attack packets and dropped by security devices.
- The network time frequently changes.
- Transmission of probe packets fails and cyclic redundancy check (CRC) errors occur on the interface.
- Probe packets are lost on the packet transmission path due to other reasons.
¡ Configuration errors:
- The network is complicated. Too many devices exist between the source and the destination of the operation. The default TTL value, which is 20, cannot meet the requirements.
- Probe packets are excessively large, causing too many fragments and processing timeout.
- The output interface and next hop are configured incorrectly.
- The source address is configured incorrectly.
- The probe timeout time is too small.
Troubleshooting flow
Figure 183 shows the troubleshooting flowchart.
Figure 183 Flowchart for troubleshooting ICMP echo operation failure
Solution
1. Execute the display nqa { history | result | statistics } command to view the NQA operation results. Identify the failed NQA operation, its execution time, and the failure type.
¡ If the value of the Status field in the output of the display nqa history command is not Succeeded, it indicates that the NQA operation fails.
<Sysname> display nqa history admin test
NQA entry (admin admin, tag test) history records:
Index Response Status Time
10 500 Timeout 2023-03-12 17:03:01.6
9 500 Timeout 2023-03-12 17:03:01.1
...
Available values for the Status field include:
- Succeeded.
- Internal error. (This state does not trigger the Track state change in NQA-Track collaboration.)
- Unknown error. (This state triggers the Track state change in NQA-Track collaboration.)
- Timeout. (This state triggers the Track state change in NQA-Track collaboration.)
¡ If the values in the Extended results area in the output of the display nqa result command are not 0, it indicates that the most recent NQA operation fails.
<Sysname> display nqa result admin test
NQA entry (admin admin, tag test) test results:
Send operation times: 1 Receive response times: 1
Min/Max/Average round trip time: 35/35/35
Square-Sum of round trip time: 1225
Last succeeded probe time: 2023-03-12 10:50:33.2
Extended results:
Packet loss ratio: 0%
Failures due to timeout: 0
Failures due to disconnect: 0
Failures due to no connection: 0
Failures due to internal error: 0
Failures due to other errors: 0
¡ If the values in the Extended results area in the output of the display nqa statistics command are not 0, it indicates that a performed NQA operation fails.
<Sysname> display nqa statistics admin test
NQA entry (admin admin, tag test) test statistics:
NO. : 1
Start time: 2023-03-12 09:30:20.0
Life time: 2 seconds
Send operation times: 1 Receive response times: 1
Min/Max/Average round trip time: 13/13/13
Square-Sum of round trip time: 169
Extended results:
Packet loss ratio: 0%
Failures due to timeout: 0
Failures due to disconnect: 0
Failures due to no connection: 0
Failures due to internal error: 0
Failures due to other errors: 0
¡ If the time displayed in the output of the display nqa { history | result | statistics } command is not as expected, the NQA operation you configured might not have started. Execute the nqa schedule command to start the NQA operation.
2. To resolve an NQA operation failure due to internal errors or unknown errors, perform the following tasks:
a. Identify the destination address of the NQA operation.
Execute the display current-configuration [ configuration nqa ] command to view NQA configuration. The destination ip or destination ipv6 field in the output displays the destination address of the NQA operation. If the destination address is incorrect, execute the undo nqa schedule command in system view to stop the NQA operation. Then, execute the destination ip or destination ipv6 command in NQA operation view to edit the destination address before you restart the operation.
b. Execute the ping command to ping the destination address of the NQA operation. If the address cannot be pinged, resolve the unreachability issue. If there is a data link to the destination address but the device has no routes to the address in the routing table, execute the out interface or next-hop ip command in ICMP echo operation view. With either command executed, NQA will skip the routing table lookup and directly encapsulate the NQA probe packets with the specified IP address.
c. Identify whether the NQA operation failure is caused by device memory insufficiency.
d. Execute the display memory-threshold command to view information about the memory alarm threshold. If the value of the Current free-memory state field is Minor (level-1 alarm threshold state), Severe (level-2 alarm threshold state), or Critical (level-3 alarm threshold state), it indicates that the device memory is insufficient. Resolve the device memory insufficiency.
3. To resolve an NQA operation failure due to timeout issues, perform the following tasks:
a. Identify the destination address of the NQA operation.
Execute the display current-configuration [ configuration nqa ] command to view NQA configuration. The destination ip or destination ipv6 field in the output displays the destination address of the NQA operation. If the destination address is incorrect, execute the undo nqa schedule command in system view to stop the NQA operation. Then, execute the destination ip or destination ipv6 command in NQA operation view to edit the destination address before you restart the operation.
b. Execute the ping command to ping the destination address of the NQA operation. If the IP address cannot be pinged, resolve the unreachability issue. If there is a data link to the destination address but the device has no routes to the address in the routing table, execute the out interface or next-hop ip command in ICMP echo operation view. With either command executed, NQA will skip the routing table lookup and directly encapsulate the NQA probe packets with the specified IP address.
c. If the destination address can be pinged but packet loss randomly occurs, execute the display nqa statistics command to view the value of the Max round trip time field. Identify whether the value is close to the value configured by the probe timeout command in ICMP echo operation view.
- If they are close, it indicates that the link delay is relatively high and the probe timeout time is too small. Execute the probe timeout command to set a probe timeout time that is greater than the maximum round-trip time.
- If they are not close, the random packet loss issue exists on the link. Identify whether CRC errors occur on the input interface of the probe responses by executing the display interface command. The value of the CRC field in the Input area of the command output represents the number of inbound packets that contained CRC errors. If this number continues to grow rapidly, a component on the transmission path might be faulty. Further troubleshoot the issue.
d. Execute the ping command to ping the destination address of the NQA operation with the same settings configured for the operation. Identify whether the NQA operation failed due to incorrect parameter configuration. The settings include the packet size, output interface, next hop, source address, and initial TTL value.
- If the destination address can be pinged, the security devices on the probed path might have filtered the NQA probe packets. Further troubleshoot the issue.
- If the destination address cannot be pinged, edit the settings for probe packets as described below. If the destination address can be pinged after the change, the NQA operation failure was likely caused by incorrect parameter configuration.
The default TTL value is 20 for ICMP echo operations. If more than 20 devices exist between the source and destination on the network, execute the ttl command in ICMP echo operation view to set a greater TTL value.
If the probe packets are too large, causing excessive fragments and processing timeout, execute the data-size command in ICMP echo operation view to set a smaller payload size for each probe packet.
If the output interface and next hop are configured incorrectly, execute the out interface and next-hop commands in ICMP echo operation view to edit the output interface and next hop.
If the source address is configured incorrectly, execute the source ip or source ipv6 command in ICMP echo operation view to edit the source address for probe packets. ICMP echo operations do not support source port configuration. For operations that support source port configuration, also execute the source port command to edit the source port.
If the probe timeout time is too small, execute the probe timeout command in ICMP echo operation view to set a greater probe timeout time.
- If the destination address still cannot be pinged, identify the cause of packet loss through methods such as packet capture, traffic measurement, and debugging.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module name: HH3C-NQA-MIB
· hh3cNqaProbeFailure (1.3.6.1.4.1.25506.8.3.3.3)
· hh3cNqaProbeTimeAboveThreshold (1.3.6.1.4.1.25506.8.3.3.10)
· hh3cNqaProbeTimeBelowThreshold (1.3.6.1.4.1.25506.8.3.3.11)
· hh3cNqaProbeFailAboveThreshold (1.3.6.1.4.1.25506.8.3.3.12)
· hh3cNqaProbeFailBelowThreshold (1.3.6.1.4.1.25506.8.3.3.13)
Log messages
· NQA/6/NQA_LOG_UNREACHABLE
· NQA/6/NQA_PACKET_OVERSIZE
· NQA/4/NQA_SCHEDULE_FAILURE
· NQA/4/NQA_SEVER_FAILURE
· NQA/6/NQA_START_FAILURE
TWAMP Light test failure
Symptom
The device, acting as the TWAMP Light sender, starts a TWAMP Light test to the destination end. The TWAMP Light test fails when one of the following conditions occurs:
· The status of the TWAMP Light test is abnormal.
Execute the display nqa twamp-light client command on the device. If the Status field value in the output is Inactive, the TWAMP Light test has not started and is considered a failure.
· The TWAMP Light test result contains an anomaly.
Execute the display nqa twamp-light client statistics two-way-loss test-session command on the device. If the Loss count field value is not 0, it indicates that TWAMP Light test packet loss occurs on the network. If the Error count field value is not 0, it indicates that the device receives error TWAMP Light test packets. When the number of lost packets or error packets exceeds the threshold allowed for user services, the TWAMP Light test is considered a failure.
Common causes
· The following are the common causes of test status anomaly:
¡ On a Layer 3 VPN network, the VPN is deleted.
¡ On a Layer 2 VPN network, the source AC state changes to down.
¡ An interface card is removed and the interface specified by the source interface command does not exist.
· The following are the common causes of test result anomaly:
¡ Packet loss occurs.
- Settings on the TWAMP Light sender and responder do not match.
- The device cannot communicate with the destination address of the test. The destination address cannot be pinged or packet loss occurs during the ping operation.
- CRC errors occur on the interface.
¡ Packet error occurs.
- The timeout time specified by the timeout keyword in the start command is too small. After you execute the start command to start the TWAMP Light test, the reflected packet reaches the device after the timeout timer expires. The device treats the reflected packet as an error packet.
- The content of the test packets contains fields that do not comply with the protocol requirements.
- Packet encapsulation fails.
Troubleshooting flow
Figure 184 shows the troubleshooting flowchart:
Figure 184 Flowchart for troubleshooting TWAMP Light test failure
Solution
1. Collect the status and results of the failed TWAMP Light test.
Execute the display nqa twamp-light client and display nqa twamp-light client statistics two-way-loss test-session commands on the device. Identify the failed TWAMP Light test and collect its status and results.
¡ Execute the display nqa twamp-light client command on the device. If the Status field value in the output is Inactive, it indicates that the status of the TWAMP Light test is abnormal.
<Sysname> display nqa twamp-light client
Brief information about all test sessions:
Total sessions: 1
Active sessions: 1
-----------------------------------------------------------------------------
ID Status Source IP/Port Destination IP/Port
1 Active 1.1.1.1/10000 1.1.1.2/20000
¡ Execute the display nqa twamp-light client statistics two-way-loss test-session command on the device. If the Loss count field value is not 0, it indicates that TWAMP Light test packet loss has occurred on the network. If the Error count field value is not 0, it indicates that the device receives error TWAMP Light test packets.
<Sysname> display nqa twamp-light client statistics two-way-loss test-session 1
Latest two-way loss statistics:
Index Loss count Loss ratio Error count Error ratio
1 200 100.0000% 0 0.0000%
2 200 100.0000% 0 0.0000%
3 200 100.0000% 0 0.0000%
4 200 100.0000% 0 0.0000%
5 200 100.0000% 0 0.0000%
--------------------------------------------------------------------------------
Average loss count : 200 Average loss ratio : 100.0000%
Maximum loss count : 200 Maximum loss ratio : 100.0000%
Minimum loss count : 200 Minimum loss ratio : 100.0000%
Average error count : 0 Average error ratio : 0.0000%
Maximum error count : 0 Maximum error ratio : 0.0000%
Minimum error count : 0 Minimum error ratio : 0.0000%
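The check described above can be scripted when the command output is collected as text. The following Python sketch is illustrative only: it assumes the per-index rows have the five whitespace-separated columns shown in the sample output, and the function name find_anomalies is hypothetical, not part of any H3C tool.

```python
import re

# Matches per-index rows such as "1 200 100.0000% 0 0.0000%".
# Assumption: the column order follows the sample output above.
ROW = re.compile(r"^\s*(\d+)\s+(\d+)\s+([\d.]+)%\s+(\d+)\s+([\d.]+)%\s*$")

def find_anomalies(output):
    """Return (index, loss_count, error_count) for rows with loss or errors."""
    anomalies = []
    for line in output.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip headers, separators, and summary lines
        index = int(match.group(1))
        loss = int(match.group(2))
        errors = int(match.group(4))
        if loss != 0 or errors != 0:
            anomalies.append((index, loss, errors))
    return anomalies

sample = """Latest two-way loss statistics:
Index Loss count Loss ratio Error count Error ratio
1 200 100.0000% 0 0.0000%
2 0 0.0000% 0 0.0000%
"""
print(find_anomalies(sample))
```

Whether a nonzero count is a failure still depends on the threshold allowed for user services, so the returned rows are only candidates for further investigation.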
2. To resolve the test status anomaly, perform the following tasks:
a. If the device has just completed startup or active/standby switchover, or the interface card where the interface (specified in the source interface command) is located has not completed startup, wait for the device state to become stable. Execute the display system stable state command. If the System state field value in the output is Stable, the device is already in a stable state. In this case, identify whether the test status changes to Active.
- If it changes to Active, no further action is required.
- If it does not change to Active, proceed to the next step.
b. If the device is operating stably, identify whether the configuration is complete.
- On a Layer 3 VPN network, execute the display nqa twamp-light client verbose command to view the VPN bound to the TWAMP Light test and execute the display ip vpn-instance command to identify whether the VPN exists. If the bound VPN does not exist, execute the ip vpn-instance command in system view to create a VPN instance.
- On a Layer 2 VPN network, execute the display nqa twamp-light client verbose command to view the source interface. If the Source interface field value is a hyphen (-), execute the source interface command in TWAMP Light client-session view to specify a source AC for test packets, and make sure the specified interface is up.
c. Identify whether the network connection is ready. If a source interface or source AC is specified for the TWAMP Light test, make sure the source interface or source AC is up.
- Execute the display l2vpn pw xconnect-group or display l2vpn forwarding ac command. If the State field (which represents AC state) value is Down, resolve the AC issue.
- Execute the display interface command. If the values for the Current state and Line protocol state fields (which indicate the interface state) are Down, bring up the interface.
3. To resolve the test packet loss issue, perform the following tasks:
a. Identify whether packet loss is due to configuration errors.
Execute the display nqa twamp-light client verbose command on the device and the display nqa twamp-light responder command on the TWAMP Light responder of the test. If the following parameters are specified, the settings on the TWAMP Light sender and responder must be consistent.
- Source IP address. You can edit this parameter on the TWAMP Light sender by using the source ip or source ipv6 command in TWAMP Light client-session view.
- Source port number. You can edit this parameter on the TWAMP Light sender by using the source port command in TWAMP Light client-session view.
- Destination IP address. You can edit this parameter on the TWAMP Light sender by using the destination ip or destination ipv6 command in TWAMP Light client-session view.
- Destination port number. You can edit this parameter on the TWAMP Light sender by using the destination port command in TWAMP Light client-session view.
- VPN instance name. You can edit this parameter on the TWAMP Light sender by using the vpn-instance command in TWAMP Light client-session view.
- VLAN ID. You can edit this parameter on the TWAMP Light sender by using the vlan command in TWAMP Light client-session view.
- Source MAC address. You can edit this parameter on the TWAMP Light sender by using the source mac command in TWAMP Light client-session view.
- Destination MAC address. You can edit this parameter on the TWAMP Light sender by using the destination mac command in TWAMP Light client-session view.
You can edit all the above parameters on the TWAMP Light responder by using the test-session command in TWAMP Light responder view.
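The consistency check can be sketched as a comparison of two parameter sets collected from the sender and the responder. This Python example is a minimal sketch: the dictionary keys are hypothetical names chosen for illustration, and it assumes that a parameter left unspecified on either end (None) does not need to be compared.

```python
def mismatched_parameters(sender, responder):
    """Return names of parameters specified on both ends whose values differ."""
    return sorted(
        name
        for name in sender.keys() & responder.keys()
        if sender[name] is not None
        and responder[name] is not None
        and sender[name] != responder[name]
    )

# Hypothetical values transcribed from the two display commands.
sender = {"source_ip": "1.1.1.1", "source_port": 10000,
          "destination_ip": "1.1.1.2", "destination_port": 20000,
          "vlan": None}
responder = {"source_ip": "1.1.1.1", "source_port": 10000,
             "destination_ip": "1.1.1.2", "destination_port": 20001,
             "vlan": 10}
print(mismatched_parameters(sender, responder))
```

Any name in the returned list points to a setting to correct on the sender or the responder.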
b. Identify whether packet loss is caused by network failures. Execute the ping command to ping the destination address of the test. If the destination address cannot be pinged or packet loss occurs, first resolve the network failures.
c. Identify whether the packet loss is caused by CRC errors.
Execute the display counters command. If the value of the Err (pkts) field in the command output keeps increasing as the test progresses, it indicates that packet sending failures occur on the link layer. Replace the associated interfaces or cables to resolve the issue.
4. To resolve the error packet issue, perform the following tasks:
a. Identify whether configuration errors exist and the device mistakenly considers TWAMP Light reflected packets that arrive late as error packets.
- Execute the ping command on the TWAMP Light sender to ping the responder and view the maximum delay between the two ends, which is the max value in the round-trip min/avg/max/std-dev field of the ping results, in milliseconds.
- Execute the display nqa twamp-light client verbose command on the device and view the value of the Timeout(sec) field, which represents the timeout time of the TWAMP Light reflected packets. The timeout time of TWAMP Light reflected packets must be greater than the maximum delay between the two ends. If it is not, execute the start command in TWAMP Light sender view to specify a larger timeout time with the time-out keyword.
b. Execute the terminal monitor, terminal debugging, debugging nqa error, and debugging nqa event commands sequentially in user view on the TWAMP Light sender to enable debugging for NQA. Then, execute the view /var/log/trace.log command in probe view to view NQA trace logs. Based on the logs, identify whether the packet content meets protocol requirements and whether the packet encapsulation is correct. If the packet content does not meet protocol requirements or the packet encapsulation is incorrect, re-configure the TWAMP Light test as described in TWAMP Light configuration.
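The timeout comparison in step a mixes units: the ping result is in milliseconds, and the Timeout(sec) field is in seconds. The following Python sketch shows the comparison, assuming the ping summary line has the form "round-trip min/avg/max/std-dev = a/b/c/d ms". The function name timeout_is_sufficient is hypothetical.

```python
import re

SUMMARY = re.compile(
    r"min/avg/max/std-dev\s*=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms"
)

def timeout_is_sufficient(ping_summary, timeout_sec):
    """Return True if the timeout (seconds) exceeds the maximum ping delay (ms)."""
    match = SUMMARY.search(ping_summary)
    if match is None:
        raise ValueError("no round-trip summary line found")
    max_delay_ms = float(match.group(3))  # third value is the max delay
    return timeout_sec * 1000 > max_delay_ms

line = "round-trip min/avg/max/std-dev = 1.000/2.500/4500.000/1.200 ms"
print(timeout_is_sufficient(line, 5))
print(timeout_is_sufficient(line, 3))
```

A False result suggests that a larger timeout time is needed for the test.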
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
· NQA/6/NQA_TWAMP_LIGHT_PACKET_INVALID
· NQA/6/NQA_TWAMP_LIGHT_REACTION
· NQAS/6/NQA_TWAMP_LIGHT_START_FAILURE
NTP issues
NTP clock synchronization failure
Symptom
The device, acting as an NTP client, fails to synchronize the clock with the NTP server. The output from the display ntp-service status command on the device shows that the value of the Clock status field is unsynchronized, indicating that the NTP clock has not been synchronized.
Common causes
The following are the common causes of this type of issue:
· NTP configuration error.
· NTP clock synchronization link disconnected.
· Link flapping and unstable latency.
· Time synchronization anomaly at the NTP server side.
Troubleshooting flow
Figure 185 shows the troubleshooting flowchart.
Figure 185 Flowchart for troubleshooting NTP clock synchronization failure
Solution
NTP can operate in the following association modes:
· Client/server mode
· Symmetric active/passive peer mode
· Broadcast mode
· Multicast mode
The client/server mode, broadcast mode, and multicast mode all derive from the client/server (C/S) model. The following information is the troubleshooting procedure for NTP time synchronization failure in the C/S model. If you are synchronizing local time with a peer using the symmetric active/passive mode, you can also use this procedure by treating the local end as the client in the C/S model.
To resolve the issue:
1. Execute the display ntp-service [ ipv6 ] sessions command on the device. If no information is output, NTP is not enabled on the device. Configure NTP settings by referring to the configuration guide for the device. If information is output from the command, locate the issue as follows:
a. Identify whether the value of the source field is the IP address of the intended NTP server. If the source field value is not that of the intended NTP server, proceed to step 2 to modify the NTP configuration.
b. The device, when acting as an NTP client, requires the clock stratum of the NTP server to be in the inclusive range of 0 to 14. If the clock stratum of the NTP server exceeds 14, the device will not synchronize with the NTP server's clock (a higher stratum value indicates lower clock accuracy).
- In IPv4 NTP, the stra field in the command output displays the clock stratum of the NTP server.
- In IPv6 NTP, the Clock stratum field in the command output displays the clock stratum of the server.
To change the clock stratum of the NTP server, log in to the NTP server and execute the ntp-service refclock-master command.
c. Identify the reachability of the client to the clock source. In IPv4 NTP, check the reach field. In IPv6, check the Reachabilities field. If the value of the field is 0, the client and server are not reachable to each other. Go to step 3.
Sample command output about IPv4 NTP sessions:
<Sysname> display ntp-service sessions
source reference stra reach poll now offset delay disper
********************************************************************************
[12345]LOCAL(0) LOCL 0 1 64 - 0.0000 0.0000 7937.9
[5]1.1.1.1 INIT 16 0 64 - 0.0000 0.0000 0.0000
Notes: 1 source(master), 2 source(peer), 3 selected, 4 candidate, 5 configured.
Total sessions: 1
Sample command output about IPv6 NTP sessions:
<Sysname> display ntp-service ipv6 sessions
Notes: 1 source(master), 2 source(peer), 3 selected, 4 candidate, 5 configured.
Source: [12345]3000::32
Reference: 127.127.1.0 Clock stratum: 2
Reachabilities: 1 Poll interval: 64
Last receive time: 6 Offset: -0.0
Roundtrip delay: 0.0 Dispersion: 0.0
Total sessions: 1
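The stratum and reachability checks in substeps b and c can be applied to the IPv4 session output programmatically. The following Python sketch assumes each session row starts with a bracketed flag list and that the stra and reach values are the third and fourth whitespace-separated fields, as in the IPv4 sample above. The function name check_ntp_sessions is hypothetical.

```python
def check_ntp_sessions(output):
    """Return (source, problems) for sessions that fail the stratum or reach check."""
    findings = []
    for line in output.splitlines():
        fields = line.split()
        # Session rows begin with flags in brackets, for example "[5]1.1.1.1".
        if len(fields) < 4 or not fields[0].startswith("["):
            continue
        source = fields[0].split("]", 1)[1]
        stratum, reach = int(fields[2]), int(fields[3])
        problems = []
        if stratum > 14:
            problems.append("stratum out of range")
        if reach == 0:
            problems.append("unreachable")
        if problems:
            findings.append((source, problems))
    return findings

sample = """source reference stra reach poll now offset delay disper
[12345]LOCAL(0) LOCL 0 1 64 - 0.0000 0.0000 7937.9
[5]1.1.1.1 INIT 16 0 64 - 0.0000 0.0000 0.0000
"""
print(check_ntp_sessions(sample))
```

In this sample, the session to 1.1.1.1 fails both checks, which matches the stratum 16 and reach 0 values in the output.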
2. Identify if the NTP-related configuration is correct.
Determine the NTP association mode used by the device according to the network plan. Based on the NTP association mode, execute the display current-configuration | include ntp-service command on the device to view the NTP-related configuration and identify if the current NTP configuration is correct. For example, when the local end acts as a client for time synchronization in client/server mode, the NTP configuration must meet the following requirements:
¡ The ntp-service [ ipv6 ] unicast-server command has been configured in system view to specify the NTP server IP address correctly.
¡ The clock protocol ntp command has been configured in system view to specify NTP for obtaining the system time.
¡ If NTP authentication is required, make sure the authentication keys configured on the device and the NTP server are the same, and the authentication key used on this device is trusted on the server. Verify that the ntp-service authentication-keyid (specify an authentication key) and ntp-service reliable authentication-keyid (configure the authentication key as a trusted key) commands have been configured in system view correctly.
3. Execute the ping command to verify the reachability between the device and the NTP server.
¡ If the ping is successful, the device and the NTP server are reachable to each other. Proceed to step 4.
¡ If the ping fails, see the ping failure troubleshooting procedure in "Troubleshooting network management and monitoring" to resolve the issue. Proceed to step 4 after the ping succeeds.
4. Execute the debugging ntp-service all command in user view to enable NTP debugging and view NTP debugging information. The local end will not synchronize with the NTP server if the following conditions exist.
¡ If "The packet from ip-address failed the validity tests result" is displayed in the debugging information, the packets from the specified NTP server failed the validity check on the device. The device will not synchronize with the clock of that NTP server.
¡ If delay in rdel: delay is greater than 16000, disper in rdsp: disper is greater than 16000, or delay/2+disper is greater than 16000 in the debugging information, the clock provided by the NTP server has an excessively large offset, and the device will not synchronize its clock with that NTP server.
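The three threshold conditions for the delay and dispersion values can be expressed as a single check. This Python sketch takes the delay and disper values read from the debugging output as numbers. The function name ntp_sample_rejected is hypothetical.

```python
def ntp_sample_rejected(delay, disper, limit=16000):
    """Return True if any of the three conditions from the procedure is met."""
    return (
        delay > limit
        or disper > limit
        or delay / 2 + disper > limit
    )

print(ntp_sample_rejected(100, 50))
print(ntp_sample_rejected(10000, 12000))
```

The second call illustrates the combined condition: neither value exceeds 16000 alone, but delay/2+disper does.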
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration file, log messages, and alarm messages.
NOTE: NTP message exchange is slow. After you execute the debugging ntp-service all command to enable NTP debugging, the debugging information might be displayed in 5 to 10 minutes.
Related alarm and log messages
Alarm messages
N/A
Log messages
· NTP/5/NTP_CLOCK_CHANGE
· NTP/5/NTP_LEAP_CHANGE
· NTP/4/NTP_SOURCE_LOST
· NTP/5/NTP_STRATUM_CHANGE
Ping and tracert failures
Ping failure
Symptom
No response is received from the destination within a period of time after a ping operation is performed at the source.
Common causes
The following are the common causes for this type of issue:
· The source did not send a request.
· The destination did not send a response.
· Packet loss or long transmission time issues occurred on the intermediate device.
These conditions are typically caused by the following:
· The transmission delay is long. Even though the source received the response from the destination, the wait timeout timer has expired, causing the ping operation to fail.
· Incorrect configuration. For example, the ping packet is larger than the MTU of the outgoing interface and the do-not-fragment feature is enabled.
· Missing entries in the FIB or ARP table.
· Attack protection configuration exists.
· Hardware failure.
Analysis
The troubleshooting flow for issues of this type is as follows:
1. Identify whether the parameters for the ping operation are correct and adjust the parameters as required.
2. Display ping statistics to identify the node where the issue occurred.
3. Check for ARP and FIB entries that reach the destination.
4. Identify whether the ping packets are dropped due to attack protection configuration.
Figure 186 shows the fault diagnosis flowchart.
Figure 186 Flowchart for troubleshooting ping failure
Solution
To resolve the issue:
1. Identify whether the parameters for the ping operation are correct and adjust the parameters as required.
a. Identify whether the transmission delay is too long.
Identify whether the ping -t timeout command has been executed. If so, increase the value of the -t option (1000 as a best practice) or remove it and perform the ping operation again. If the issue is resolved, the issue might be caused by large network delay. If the issue persists, go to the next step.
The -t option specifies the timeout time (in milliseconds) of an ICMP echo reply. The default is 2000. If the source does not receive an ICMP echo reply before the timeout timer expires, it determines that the ICMP echo reply has timed out.
b. Identify whether the ping packet is discarded because it is too large.
Identify whether the ping -f -s packet-size command has been executed. If it has and the MTU for an outgoing interface on the packet forwarding path is smaller than packet-size, the packet will be discarded, because it is too large and cannot be fragmented. To resolve the issue, reduce the packet length or remove the -f keyword.
The -f option sets the "do-not-fragment" bit in the IP header. The ICMP echo requests will not be fragmented.
The -s packet-size option specifies the length (in bytes) of ICMP echo requests (excluding the IP packet header and the ICMP packet header). The default is 56.
The default MTU for an Ethernet interface is 1500 bytes. To view the MTU for an interface, execute the display interface command.
<Sysname> display interface
Current state: UP
Line protocol state: UP
Description: Interface
Bandwidth: 1000000 kbps
Maximum transmission unit: 1500
...
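Because packet-size excludes the IP and ICMP headers, the on-wire packet is larger than the value of the -s option. The following Python sketch shows the arithmetic, assuming an IPv4 header without options (20 bytes) and an 8-byte ICMP header. The function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header

def fits_without_fragmentation(packet_size, mtu=1500):
    """Return True if a ping with -s packet_size fits within the MTU."""
    return packet_size + IPV4_HEADER + ICMP_HEADER <= mtu

def max_payload(mtu=1500):
    """Largest -s value that can pass with the do-not-fragment bit set."""
    return mtu - IPV4_HEADER - ICMP_HEADER

print(max_payload(1500))
print(fits_without_fragmentation(1473))
```

With the default Ethernet MTU of 1500 bytes, a payload of up to 1472 bytes fits, and a payload of 1473 bytes does not.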
c. Identify whether a wrong outgoing interface is specified.
Identify whether the ping -i interface-type interface-number command has been executed to specify the outgoing interface for ping packets. If an outgoing interface is specified, make sure the physical link between the interface and the destination is reachable. If the link is not reachable, specify another interface or remove the -i option.
The -i interface-type interface-number option specifies the source interface for ICMP echo requests. If you do not specify this option, the system uses the primary IP address of the matching route's egress interface as the source interface for ICMP echo requests.
d. Identify whether a source address is specified.
Identify whether the ping -a source-ip command has been executed to specify the source address for ping packets. If this command has been executed, make sure the intermediate device and the destination have a route to the source IP address.
The -a source-ip option specifies an IP address of the device as the source IP address for ICMP echo requests. If this option is not specified, the source IP address for ICMP echo requests is the primary IP address of the outgoing interface.
e. Identify whether a correct VPN is specified for the destination.
Based on network planning and deployment, determine whether the destination belongs to a specific VPN. If the destination belongs to a VPN, specify the -vpn-instance keyword in the ping command.
2. Check the packet statistics for the source, destination, and intermediate devices to determine the device where the ping failure occurred.
¡ Identify whether the source has sent an ICMP echo request and received an ICMP echo reply.
After you perform the ping operation at the source, execute the display icmp statistics command at both the source and destination to check the ICMP packet transmission status. You can determine the direction in which the ping failure occurred based on the number of packets in the Input and Output sections in the statistics:
- If the echo value in the Output section at the source is increasing normally, but the echo replies value in the Input section is not, it indicates that the source has sent a request but has not received any response. If the counts in both the Input and Output sections at the destination remain unchanged, it indicates that the destination neither received the request nor responded. Then, you can determine that a forwarding failure of ping packets has occurred from the source to the destination.
- If the echo value in the Output section at the source is increasing normally, but the echo replies value in the Input section is not, it indicates that the source has sent a request but has not received any response. If the counts in both the Input and Output sections at the destination are increasing normally, it indicates that the destination has received the request and has responded. Then, you can determine that a forwarding failure of ping packets has occurred from the destination to the source.
The output from the display icmp statistics command is shown as follows:
<Sysname> display icmp statistics
Input: bad formats 0 bad checksum 0
echo 1 destination unreachable 0
source quench 0 redirects 0
echo replies 0 parameter problem 0
timestamp 0 information requests 0
mask requests 0 mask replies 0
time exceeded 0 invalid type 0
router advert 0 router solicit 0
broadcast/multicast echo requests ignored 0
broadcast/multicast timestamp requests ignored 0
Output: echo 0 destination unreachable 0
source quench 0 redirects 0
echo replies 1 parameter problem 0
timestamp 0 information replies 0
mask requests 0 mask replies 0
time exceeded 0 bad address 0
packet error 0 router advert 0
...
IMPORTANT:
· If the destination is a modular device or an IRF member device, and the ICMP packet has reached the destination without being fragmented, execute the display icmp statistics command with the slot parameter at the destination to view the ICMP packet statistics. The slot parameter indicates the slot where the interface that received the ICMP packet is located.
· If the destination is a modular device or an IRF member device and the ICMP packet is fragmented before it reaches the destination, execute the display icmp statistics command at the destination to view ICMP packet statistics.
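The two cases described in this step can be summarized as a small decision function. The following Python sketch takes the increase of each counter observed while the ping runs. The function name diagnose_direction and the return strings are hypothetical, and any combination not covered by the two cases is reported as inconclusive.

```python
def diagnose_direction(src_echo_out, src_replies_in, dst_echo_in, dst_replies_out):
    """Classify a ping failure from counter increases seen during the ping.

    Arguments are the growth of these display icmp statistics counters:
    Output echo and Input echo replies at the source, and Input echo and
    Output echo replies at the destination.
    """
    if src_echo_out > 0 and src_replies_in == 0:
        if dst_echo_in == 0 and dst_replies_out == 0:
            return "source-to-destination"
        if dst_echo_in > 0 and dst_replies_out > 0:
            return "destination-to-source"
    return "inconclusive"

print(diagnose_direction(5, 0, 0, 0))
print(diagnose_direction(5, 0, 5, 5))
```

The result indicates the direction in which to continue troubleshooting.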
3. Identify the node where the failure occurred.
After you determine the direction in which the ping failure occurred, execute the tracert command to identify the location where the packet was discarded in that direction.
¡ If there are issues from the source to the destination, start troubleshooting from the source.
¡ If there are issues from the destination to the source, start troubleshooting from the destination.
As shown in the example below, execute the tracert command to view the path a packet traverses from the source to the destination and information about Layer 3 devices in the private network it traverses. The destination has an IP address of 1.1.3.2 and belongs to vpn1.
<Sysname> tracert -vpn-instance vpn1 -resolve-as vpn 1.1.3.2
traceroute to 1.1.3.2 (1.1.3.2), 30 hops at most, 40 bytes each packet, press CTRL+C to break
1 1.1.1.2 (1.1.1.2) 673 ms 425 ms 30 ms
2 1.1.2.2 (1.1.2.2) 580 ms 470 ms 80 ms
3 * * *
The output shows that the packet failed to be forwarded at the next hop of the device at 1.1.2.2 (the hop displayed as 3 * * *).
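Locating the first silent hop can be automated when the tracert output is saved as text. The following Python sketch assumes each hop line starts with the hop number, followed either by the responding address or by * * *, as in the sample output. The function name first_silent_hop is hypothetical.

```python
import re

HOP = re.compile(r"^\s*(\d+)\s+(.*)$")

def first_silent_hop(output):
    """Return (hop_number, last_responding_address) for the first '* * *' hop.

    Returns None if every hop responded.
    """
    last_address = None
    for line in output.splitlines():
        match = HOP.match(line)
        if not match:
            continue  # skip the "traceroute to ..." banner line
        hop, rest = int(match.group(1)), match.group(2).strip()
        if rest.replace(" ", "") == "***":
            return hop, last_address
        last_address = rest.split()[0]
    return None

sample = """traceroute to 1.1.3.2 (1.1.3.2), 30 hops at most, 40 bytes each packet, press CTRL+C to break
1 1.1.1.2 (1.1.1.2) 673 ms 425 ms 30 ms
2 1.1.2.2 (1.1.2.2) 580 ms 470 ms 80 ms
3 * * *
"""
print(first_silent_hop(sample))
```

The second element of the result is the last device that responded, which is a reasonable place to start checking the FIB and ARP entries.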
4. Check for FIB and ARP entries to both the destination and source.
Perform the following operations on the node where the failure occurred:
¡ Execute the display fib command to check for routes to the destination and source. If no routes exist, check the configuration of routing protocols such as OSPF, IS-IS, and BGP for errors.
¡ If the routes exist and the packets traverse over an Ethernet link, execute the display arp command to check for required ARP entries. If the ARP entries are missing, first troubleshoot ARP issues.
5. Identify whether ICMP attack prevention is configured on the node that has ping failures.
If the device is configured with ICMP attack prevention policies and an ICMP attack is detected, it will discard the ICMP packets, resulting in ping failure.
¡ Execute the display attack-defense icmp-flood statistics ip command to display flood attack detection and prevention statistics and identify whether the device is under ICMP attacks.
¡ Execute the display current-configuration | include icmp-flood and display current-configuration | include "signature detect" commands to identify whether attack prevention policies are configured.
If the device is under an ICMP attack, first locate and mitigate the ICMP attack.
6. Identify the location where the packets were discarded and the reason based on packet transmission statistics.
On the device along the ping packet transmission path:
a. Configure a QoS policy to use ACLs to filter ping packets, and then apply the QoS policy in both the inbound and outbound directions of the interface where the ping packets traverse.
b. Execute the display qos policy interface command to check the number of IP packets successfully matched by the QoS policy on the interface. If the number increases, it indicates that the device has received a ping packet. If there is no increase, the device has not received any ping packet. In this case, execute the debugging ip packet command to enable IP packet debugging and further troubleshoot and resolve the issue.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
None.
Log messages
None.
Tracert failure
Symptom
The output from the tracert command contains * * * lines, indicating that some nodes on the path are unreachable and the tracert operation has failed.
Common causes
The following are the common causes for this type of issue:
· No corresponding route or ARP entries exist.
· Sending ICMP timeout packets is not enabled on the intermediate device.
· Sending ICMP destination unreachable packets is not enabled at the destination.
Analysis
The troubleshooting flow for issues of this type is as follows:
1. Identify whether the intermediate device is enabled to send ICMP timeout packets.
2. Identify whether the destination is enabled to send ICMP destination unreachable packets.
3. Check for ARP and FIB entries that reach the destination.
Figure 187 shows the fault diagnosis flowchart.
Figure 187 Flowchart for troubleshooting tracert failure
Solution
To resolve the issue:
1. Identify whether the intermediate device is enabled to send ICMP timeout packets.
# View the path that the packets traverse from the source to the destination (assuming there are only two hops from source to destination, with the destination IP address as 1.1.2.2).
<Sysname> tracert 1.1.2.2
traceroute to 1.1.2.2 (1.1.2.2), 30 hops at most, 40 bytes each packet, press CTRL+C to break
1 * * *
2 1.1.2.2 (1.1.2.2) [AS 100] 580 ms 470 ms 80 ms
When the previous output is displayed, log in to the intermediate device and execute the ip ttl-expires enable command to enable sending ICMP timeout packets. If the issue persists, go to the next step.
2. Identify whether the destination is enabled to send ICMP destination unreachable packets.
# View the path that the packets traverse from the source to the destination (assuming there are only two hops from source to destination, with the destination IP address as 1.1.2.2).
<Sysname> tracert 1.1.2.2
traceroute to 1.1.2.2 (1.1.2.2), 30 hops at most, 40 bytes each packet, press CTRL+C to break
1 1.1.1.2 (1.1.1.2) [AS 99] 560 ms 430 ms 50 ms
2 * * *
When the previous output is displayed, log in to the destination and execute the ip unreachables enable command to enable sending ICMP destination unreachable packets. If the issue persists, continue with the following steps.
3. Check for the corresponding FIB and ARP entries on the node where the failure occurred.
Execute the display fib command on the device that has not responded to ICMP error packets (shown as * * * in the output from the tracert command) to check for the route to the destination.
¡ If no routes exist, check the configuration of routing protocols such as OSPF, IS-IS, and BGP for errors.
¡ If the routes exist and the packets traverse over an Ethernet link, execute the display arp command to check for required ARP entries. If the ARP entries are missing, first troubleshoot ARP issues.
4. Identify whether the source has received the ICMP error packets.
Execute the display icmp statistics command multiple times at the source to identify whether it has received ICMP error packets. The output from the command is as follows:
<Sysname> display icmp statistics
Input: bad formats 0 bad checksum 0
echo 0 destination unreachable 9
source quench 0 redirects 0
echo replies 7 parameter problem 0
timestamp 0 information requests 0
mask requests 0 mask replies 0
time exceeded 3 invalid type 0
router advert 0 router solicit 0
broadcast/multicast echo requests ignored 0
broadcast/multicast timestamp requests ignored 0
...
Compare the statistics across the executions and identify whether the increase in the time exceeded and destination unreachable values in the Input section matches the number of tracert packets sent. If the values do not match, the source did not receive the ICMP error packets.
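The comparison of counter growth against the number of tracert packets sent can be sketched as follows. This Python example assumes you have transcribed the two Input counters from the display icmp statistics output before and after the tracert operation. The function name received_all_errors and the dictionary layout are hypothetical.

```python
ERROR_COUNTERS = ("time exceeded", "destination unreachable")

def received_all_errors(before, after, probes_sent):
    """Return True if the ICMP error counters grew by the number of probes sent."""
    growth = sum(after[name] - before[name] for name in ERROR_COUNTERS)
    return growth == probes_sent

before = {"time exceeded": 0, "destination unreachable": 6}
after = {"time exceeded": 3, "destination unreachable": 9}
print(received_all_errors(before, after, 6))
print(received_all_errors(before, after, 9))
```

Other ICMP traffic received during the same interval also changes these counters, so run the comparison on a quiet device where possible.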
5. Identify the location where the packets were discarded and the reason based on packet transmission statistics.
On the device along the tracert packet transmission path:
a. Configure a QoS policy to use ACLs to filter tracert packets, and then apply the QoS policy in both the inbound and outbound directions of the interface that the tracert packets traverse.
b. Execute the display qos policy interface command to check the number of IP packets successfully matched by the QoS policy on the interface. If the number increases, it indicates that the device has received a tracert packet. If there is no increase, the device has not received any tracert packet. In this case, execute the debugging ip packet command to enable IP packet debugging and further troubleshoot and resolve the issue.
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
None.
Log messages
None.
IPv6 ping failure
Symptom
No response is received from the destination within a period of time when an IPv6 ping operation is performed at the source.
Common causes
The following are the common causes for this type of issue:
· The source did not send a request.
· The destination did not send a response.
· Packet loss or long transmission time issues occurred on the intermediate device.
More specifically, these conditions can result from the following:
· The transmission delay is long. Even though the source received the response from the destination, the wait timeout timer has expired, causing the IPv6 ping operation to fail.
· The specified parameter values for the ping operation do not match the network capabilities. For example, if an IPv6 ping packet is larger than the MTU of the outgoing interface on an intermediate device, the packet is discarded.
· Missing entries in the IPv6 FIB or ND table.
· Attack protection configuration exists.
· Hardware failure.
Analysis
The troubleshooting flow for issues of this type is as follows:
1. Identify whether the parameters for the IPv6 ping operation are correct and adjust the parameters as required.
2. Display IPv6 ping statistics to identify the node where the issue occurred.
3. Check for ND and IPv6 FIB entries that reach the destination.
4. Identify whether the IPv6 ping packets are dropped due to attack protection configuration.
Figure 188 shows the fault diagnosis flowchart.
Figure 188 Flowchart for troubleshooting IPv6 ping operation failure
Solution
To resolve the issue:
1. Identify whether the parameters for the IPv6 ping operation are correct and adjust the parameters as required.
a. Identify whether the transmission delay is too long.
Identify whether the ping ipv6 -t timeout command has been executed. If so, increase the value of the -t keyword (1000 as a best practice) or remove the -t option and perform the IPv6 ping operation again. If the issue is resolved, the issue might be caused by large network delay. If the issue persists, go to the next step.
The -t option specifies the timeout time (in milliseconds) of an ICMPv6 echo reply. The default is 2000. If the source does not receive an ICMPv6 echo reply before the timeout timer expires, it determines that the ICMPv6 echo reply timed out.
b. Identify whether the IPv6 ping packet is discarded because it is too large.
Identify whether the ping ipv6 -s packet-size command has been executed. If it has and the MTU for an outgoing interface on the packet forwarding path is smaller than packet-size, the packet will be discarded, because it is too large and cannot be fragmented. To resolve the issue, reduce the packet length.
The -s packet-size option specifies the length (in bytes) of ICMPv6 echo requests (excluding the IPv6 packet header and the ICMPv6 packet header). The default is 56.
The default MTU for an Ethernet interface is 1500 bytes. To view the MTU for an interface, execute the display interface command.
<Sysname> display interface
Current state: UP
Line protocol state: UP
Description: Interface
Bandwidth: 1000000 kbps
Maximum transmission unit: 1500
…
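The size check in this step can be sketched as simple arithmetic. The sketch assumes a fixed 40-byte IPv6 header and an 8-byte ICMPv6 echo header with no extension headers; the function names are hypothetical and only illustrate how the -s value relates to the interface MTU.

```python
IPV6_HEADER = 40         # fixed IPv6 header length in bytes
ICMPV6_ECHO_HEADER = 8   # ICMPv6 echo request header length in bytes


def ipv6_ping_packet_length(payload_size=56):
    """Total IPv6 packet length for a given -s packet-size value."""
    return IPV6_HEADER + ICMPV6_ECHO_HEADER + payload_size


def fits_mtu(payload_size, mtu=1500):
    """True if the echo request fits the MTU without fragmentation."""
    return ipv6_ping_packet_length(payload_size) <= mtu


def max_payload(mtu=1500):
    """Largest -s value that still fits the given MTU (assuming no extension headers)."""
    return mtu - IPV6_HEADER - ICMPV6_ECHO_HEADER
```

Under these assumptions, the largest payload that fits the default Ethernet MTU of 1500 bytes is 1452 bytes. Use the smallest MTU on the forwarding path when you choose the value.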
c. Identify whether a wrong outgoing interface is specified.
Identify whether the ping ipv6 -i interface-type interface-number command has been executed to specify the outgoing interface for IPv6 ping packets. If an outgoing interface is specified, make sure the physical link between the interface and the destination is reachable. If the link is not reachable, specify another interface or remove the -i option.
The -i interface-type interface-number option specifies the source interface for ICMPv6 echo requests. If you do not specify this option, the system uses the egress interface of the matching route as the source interface for ICMPv6 echo requests.
d. Identify whether a source address is specified.
Identify whether the ping ipv6 -a source-ip command has been executed to specify the source address for IPv6 ping packets. If this command has been executed, make sure the intermediate device and the destination have a route to the source IP address.
The -a source-ip option specifies an IPv6 address of the device as the source IPv6 address for ICMPv6 echo requests. If this option is not specified, the source IPv6 address for ICMPv6 echo requests is the primary IPv6 address of the outgoing interface.
e. Identify whether a correct VPN is specified for the destination.
Based on network planning and deployment, determine whether the destination belongs to a specific VPN. If the destination belongs to a VPN, specify the -vpn-instance keyword in the ping ipv6 command.
2. Check the packet statistics for the source, destination, and intermediate devices to determine the device where the IPv6 ping failure occurred.
¡ Identify whether the source has sent an ICMPv6 echo request and received an ICMPv6 echo reply.
After you perform the IPv6 ping operation at the source, execute the display ipv6 icmp statistics command at both the source and destination to check the ICMPv6 packet transmission status. You can determine the direction of the ping failure based on the packet counts in the Input and Output sections of the statistics:
- If the echo request value in the Output section at the source is increasing normally, but the echo reply value in the Input section is not, the source has sent requests but has not received any response. If the counts in both the Input and Output sections at the destination remain unchanged, the destination neither received the requests nor responded. In this case, the IPv6 ping packets failed to be forwarded from the source to the destination.
- If the echo request value in the Output section at the source is increasing normally, but the echo reply value in the Input section is not, the source has sent requests but has not received any response. If the counts in both the Input and Output sections at the destination are increasing normally, the destination has received the requests and has responded. In this case, the IPv6 ping packets failed to be forwarded from the destination to the source.
The output from the display ipv6 icmp statistics command is shown as follows:
<Sysname> display ipv6 icmp statistics
Input: bad code 0 too short 0
checksum error 0 bad length 0
path MTU changed 0 destination unreachable 0
too big 0 parameter problem 0
echo request 1 echo reply 0
neighbor solicit 0 neighbor advertisement 0
router solicit 0 router advertisement 0
redirect 0 router renumbering 0
Output: parameter problem 0 echo request 0
echo reply 1 unreachable no route 0
unreachable admin 0 unreachable beyond scope 0
unreachable address 0 unreachable no port 0
too big 0 time exceed transit 0
time exceed reassembly 0 redirect 0
ratelimited 0 other errors 0
IMPORTANT:
· If the destination is a modular device or an IRF member device, and the ICMPv6 packet has reached the destination without being fragmented, execute the display ipv6 icmp statistics command with the slot parameter at the destination to view the ICMPv6 packet statistics. The slot parameter indicates the slot where the interface that received the ICMPv6 packet is located.
· If the destination is a modular device or an IRF member device and the ICMPv6 packet is fragmented before it reaches the destination, execute the display ipv6 icmp statistics command at the destination to view ICMPv6 packet statistics.
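The decision logic of this step can be summarized in a short sketch. It assumes you have already extracted four counters from two captures of the display ipv6 icmp statistics output on each device; the dictionary keys and the function names are hypothetical and exist only for this illustration.

```python
def delta(before, after, key):
    """Increase of one counter between two captures."""
    return after[key] - before[key]


def classify_direction(src_before, src_after, dst_before, dst_after):
    """Classify where IPv6 ping packets are lost, from counter increases.

    Hypothetical keys: out_echo_request, in_echo_reply (source);
    in_echo_request, out_echo_reply (destination).
    """
    sent = delta(src_before, src_after, "out_echo_request")
    replies = delta(src_before, src_after, "in_echo_reply")
    received = delta(dst_before, dst_after, "in_echo_request")
    answered = delta(dst_before, dst_after, "out_echo_reply")

    if sent == 0:
        return "source did not send requests"
    if replies >= sent:
        return "no loss observed"
    if received == 0 and answered == 0:
        return "source-to-destination"
    if received > 0 and answered > 0:
        return "destination-to-source"
    if received > 0 and answered == 0:
        return "destination did not reply"
    return "inconclusive"
```

The two forwarding-failure results correspond to the two cases described in this step. The other results cover the remaining common causes, such as a source that sent no request or a destination that sent no response.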
3. Identify the node where the failure occurred.
After you determine the direction of the IPv6 ping failure, execute the tracert ipv6 command to identify the location where the packet was discarded in that direction.
¡ If there are issues from the source to the destination, start troubleshooting from the source.
¡ If there are issues from the destination to the source, start troubleshooting from the destination.
As shown in the example below, perform the IPv6 tracert operation to view the path a packet traverses from the source to the destination and information about Layer 3 devices it traverses. The destination has an IPv6 address of 2::2.
<Sysname> tracert ipv6 2::2
traceroute to 2::2 (2::2), 30 hops at most, 104 byte packets, press CTRL_C to break
1 1::2 1.000 ms 0.000 ms 1.000 ms
2 * * *
The output shows that the packet was forwarded by the device at 1::2 but experienced a forwarding failure at the next hop of that device (the node displayed as 2 * * *).
Before you perform the IPv6 tracert operation, make sure the following features are enabled:
¡ Execute the ipv6 hoplimit-expires enable command in system view of the intermediate device to enable sending ICMPv6 time exceeded messages.
¡ Execute the ipv6 unreachables enable command in system view of the destination device to enable sending ICMPv6 destination unreachable messages.
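Locating the failed node in a tracert ipv6 capture can be sketched as follows. The parser is a hypothetical helper that assumes the output layout shown in the example for this step (a hop number followed by either an address and round-trip times or three asterisks).

```python
import re

# A hop line starts with the hop number, for example " 1 1::2 1.000 ms ..." or " 2 * * *".
HOP = re.compile(r"^\s*(\d+)\s+(.*)$")


def first_silent_hop(output):
    """Return (hop_number, last_responding_address) for the first '* * *' hop, or None."""
    last_responder = None
    for line in output.splitlines():
        match = HOP.match(line)
        if not match:
            continue  # prompt line, header line, or blank line
        hop = int(match.group(1))
        fields = match.group(2).split()
        if fields and all(field == "*" for field in fields):
            return hop, last_responder
        if fields:
            last_responder = fields[0]
    return None
```

For the example output in this step, the function returns hop 2 together with 1::2, the last device that responded. Start troubleshooting on the next hop of that device.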
4. Check for IPv6 FIB and ND entries to the destination and source.
Perform the following operations on the node where the failure occurred:
¡ Execute the display ipv6 fib command to check for routes to the destination and source. If no routes exist, check the configuration of routing protocols such as IGP and BGP for errors.
¡ If the routes exist and the packets traverse over an Ethernet link, execute the display ipv6 neighbors command to check for required ND entries. If the ND entries are missing, first troubleshoot ND issues.
5. Identify whether ICMPv6 attack prevention is configured on the node that has ping failures.
If the device is configured with ICMPv6 attack prevention policies and an ICMPv6 attack is detected, it will discard the ICMPv6 packets, resulting in IPv6 ping failure.
¡ Execute the display attack-defense icmpv6-flood statistics ipv6 command to display flood attack detection and prevention statistics and determine if the device is under ICMPv6 attacks.
¡ Execute the display current-configuration | include icmpv6-flood and display current-configuration | include "signature detect" commands to identify whether attack prevention policies are configured.
If the device is under an ICMPv6 attack, first locate and mitigate the ICMPv6 attack.
6. Identify the packet loss location and reason based on packet transmission statistics.
On the devices along the IPv6 ping packet transmission path:
a. Configure a QoS policy to use IPv6 ACLs to filter IPv6 ping packets, and then apply the QoS policy in both the inbound and outbound directions of the interface where the IPv6 ping packets traverse.
b. Execute the display qos policy interface command to check the number of IPv6 packets successfully matched by the QoS policy on the interface. If the number increases, it indicates that the device has received an IPv6 ping packet. If there is no increase, the device has not received any IPv6 ping packet. In this case, execute the debugging ipv6 packet command to enable IPv6 packet debugging and further troubleshoot and resolve the issue.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
None.
Log messages
None.
RMON issues
Failure of the NMS to receive RMON alarm messages
Symptom
The network management system (NMS) fails to receive RMON alarm messages.
Common causes
The following are the common causes of this type of issue:
· The device and the NMS cannot reach each other.
· SNMP notification configuration error.
· The RMON statistics table has not been created.
· The RMON event table has not been created.
· The RMON alarm table has not been created.
· Alarm variable configuration error.
Troubleshooting flowchart
Figure 189 shows the troubleshooting flowchart.
Figure 189 Flowchart for troubleshooting failure of the NMS to receive RMON alarms
Solution
1. Execute the ping command to test the reachability between the device and NMS.
¡ If the ping is successful, the device and NMS are reachable to each other. Proceed to step 2.
¡ If the ping fails, see the ping failure troubleshooting procedure in ping and tracert troubleshooting guide to resolve the network connectivity issue. Proceed to step 2 after the ping succeeds.
2. Identify if the SNMP notification configuration is correct.
RMON is an SNMP-based network management protocol and sends RMON alarm messages over the SNMP notification channel. For the NMS to receive RMON alarm messages, configure SNMP notification on the device, and ensure that the NMS can receive SNMP notifications normally.
If the NMS can receive any of the following SNMP notifications, proceed to step 3. If the NMS cannot receive any of the following notifications, see the troubleshooting procedure for failure of the NMS to receive SNMP notifications in "Troubleshooting network management and monitoring issues" to identify and resolve the issue.
¡ SNMP alive traps. After SNMP is enabled on the device, the device sends Notification hh3cPeriodicalTrap(1.3.6.1.4.1.25506.2.38.1.6.3.0.1) to the NMS by default at intervals of 60 seconds.
¡ Login and logout notifications. Log in to or log out of the device through Telnet to trigger the device to generate and send the corresponding login and logout notifications. Then verify whether the NMS can receive the notifications sent from the device.
The login notification is as follows:
Notification hh3cLogIn(1.3.6.1.4.1.25506.2.2.1.1.3.0.1) with hh3cTerminalUserName(1.3.6.1.4.1.25506.2.2.1.1.2.1.0)=;hh3cTerminalSource(1.3.6.1.4.1.25506.2.2.1.1.2.2.0)=VTY.
The logout notification is as follows:
Notification hh3cLogOut(1.3.6.1.4.1.25506.2.2.1.1.3.0.2) with hh3cTerminalUserName(1.3.6.1.4.1.25506.2.2.1.1.2.1.0)=;hh3cTerminalSource(1.3.6.1.4.1.25506.2.2.1.1.2.2.0)=VTY.
¡ linkUp and linkDown notifications. Execute the shutdown and undo shutdown commands on an interface that is physically up to trigger the device to generate linkDown and linkUp notifications. Then verify whether the NMS can receive the notifications generated by the device.
The linkUp notification is as follows:
Notification linkUp(1.3.6.1.6.3.1.1.5.4) with ifIndex(1.3.6.1.2.1.2.2.1.1.961)=961;ifAdminStatus(1.3.6.1.2.1.2.2.1.7.961)=1;ifOperStatus(1.3.6.1.2.1.2.2.1.8.961)=1.
The linkDown notification is as follows:
Notification linkDown(1.3.6.1.6.3.1.1.5.3) with ifIndex(1.3.6.1.2.1.2.2.1.1.961)=961;ifAdminStatus(1.3.6.1.2.1.2.2.1.7.961)=2;ifOperStatus(1.3.6.1.2.1.2.2.1.8.961)=2.
3. Identify if the configuration of the RMON event table is correct.
Execute the display rmon event command on the device to identify whether an RMON event table has been created. If the event table is empty, use the rmon event command to create event entries. Make sure the actions of these entries include generation of alarm messages.
4. Identify if the configuration of the RMON alarm table or the RMON private alarm table is correct.
Execute the display rmon alarm command on the device to identify whether the RMON alarm table is configured and whether the monitored variables and trigger conditions are consistent with the network plan. If the alarm table is empty, or if the monitored variables and trigger conditions are not consistent with the network plan (for example, the monitored variable does not exist, the monitored variable is configured incorrectly, or the alarm trigger condition cannot be reached), create or modify the alarm table entries by using the rmon alarm command in system view.
Execute the display rmon prialarm command on the device to identify whether an RMON private alarm table has been configured and whether the monitored variables and trigger conditions are consistent with the network plan. If the private alarm table is empty, or if the monitored variables and trigger conditions are not consistent with the network plan (for example, the monitored variable does not exist, the monitored variable is configured incorrectly, or the alarm trigger condition cannot be reached), execute the rmon prialarm command to create or modify the private alarm entries.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
· risingAlarm (1.3.6.1.2.1.16.0.1)
· fallingAlarm (1.3.6.1.2.1.16.0.2)
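To see why a misconfigured threshold can prevent these two notifications from ever being generated, the following sketch models the threshold-crossing behavior of the standard RMON alarm group. It is a simplified model, not the device implementation: it ignores the startup alarm setting and assumes absolute sampling, and the function name is hypothetical.

```python
def rmon_alarm_events(samples, rising, falling):
    """Return (sample_index, alarm_name) tuples for a series of sampled values.

    Simplified model: a rising alarm fires when a sample reaches the rising
    threshold from below, and does not fire again until a falling alarm has
    fired (and the reverse for falling alarms).
    """
    events = []
    rising_armed = True
    falling_armed = True
    previous = None
    for index, value in enumerate(samples):
        if previous is not None:
            if value >= rising and previous < rising and rising_armed:
                events.append((index, "risingAlarm"))
                rising_armed, falling_armed = False, True
            elif value <= falling and previous > falling and falling_armed:
                events.append((index, "fallingAlarm"))
                falling_armed, rising_armed = False, True
        previous = value
    return events
```

In this model, a rising threshold that the monitored variable never reaches produces no events at all, which matches the "alarm trigger condition cannot be reached" case in the alarm table check.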
Log messages
N/A
SNMP issues
SNMP connection failure
Symptom
The Network Management System (NMS) cannot connect to the device successfully via SNMP.
Common causes
The following are the common causes of this type of issue:
· Network connection failure between the device and NMS.
· Authentication failure because of configuration errors.
· The device is under an SNMP packet attack and enters SNMP silence mode.
Troubleshooting flow
Figure 190 shows the troubleshooting flowchart.
Figure 190 Flowchart for troubleshooting SNMP connection failure
Solution
To resolve the issue:
1. Execute the ping command to identify if the device and NMS are reachable to each other.
¡ If the ping is successful, the device and NMS are reachable to each other. Proceed to step 2.
¡ If the ping fails, see the ping failure troubleshooting procedure described in ping and tracert troubleshooting guide to resolve the network connection issue. After the ping is successful, re-establish the SNMP connection. If the SNMP connection still cannot be established after that, proceed to step 2.
2. Identify if the SNMP configuration is correct.
a. Execute the display snmp-agent sys-info version command to identify the SNMP version used on the device. The SNMP version used on the device and the NMS must be the same. If they are different, use the snmp-agent sys-info version command to modify their SNMP versions to be the same.
b. If SNMPv1 or SNMPv2c is used, execute the display snmp-agent community command to view the community information configured on the device (including the community name and ACLs used). The community name used on the device and the NMS must be the same, and the ACLs configured on the device must permit access from the NMS. If these conditions are not met, execute the snmp-agent community and acl commands to modify the configuration.
c. If SNMPv3 is used, execute the display snmp-agent usm-user command to view SNMPv3 user information (including the username and ACLs used), and execute the display snmp-agent group command to view SNMP group information (including the authentication/encryption mode and ACLs used). The username and authentication/encryption parameters configured on the device and NMS must be consistent, and the ACLs configured on the device must permit access from the NMS. If these conditions are not met, use the snmp-agent group, snmp-agent usm-user v3, and acl commands to modify the configuration.
3. Identify if the device has entered SNMP silence state.
If 100 or more SNMP messages fail authentication on the device within a statistical period (1 minute), the system considers that the device is under an SNMP attack. The SNMP module then enters silence state (the device generates the log message "SNMP agent is now silent"), and the device does not respond to any received SNMP messages for approximately 4 to 5 minutes. After the connection is established, re-enable SNMP silence.
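The threshold described in this step can be modeled with a short sketch, for example to judge from NMS-side records whether a misconfigured poller could have pushed the device into silence state. The sketch assumes a sliding 60-second window; the device might instead count per fixed statistical period, so treat the result as an approximation. The function name is hypothetical.

```python
from collections import deque


def silence_trigger_time(failure_times, window=60.0, threshold=100):
    """Return the time of the failure that reaches the threshold, or None.

    failure_times: timestamps (in seconds) of failed SNMP authentications.
    """
    recent = deque()
    for moment in sorted(failure_times):
        recent.append(moment)
        # Drop failures that fall outside the window ending at this moment.
        while moment - recent[0] >= window:
            recent.popleft()
        if len(recent) >= threshold:
            return moment
    return None
```

For example, an NMS that retries with a wrong community name twice per second reaches 100 failures in under a minute, while one failure per second never reaches the threshold in this model.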
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module Name: SNMPv2-MIB
· authenticationFailure (1.3.6.1.6.3.1.1.5.5)
Log messages
· SNMP/3/SNMP_ACL_RESTRICTION
· SNMP/4/SNMP_AUTHENTICATION_FAILURE
· SNMP/4/SNMP_SILENT
SNMP operation timeout
Symptom
The NMS performs an SNMP Get or Set operation on the device, but the operation times out.
Common causes
The following are the common causes of this type of issue:
· Interruption of the SNMP connection, causing failure of the NMS to access the device.
· Packet loss on the network, causing failure of the device to receive the SNMP request.
· Insufficient storage space on the storage medium of the device, causing the device to be unable to process the SNMP request.
· The device is busy processing other tasks, which prevents it from processing the SNMP request.
· The SNMP process (acting as an SNMP agent) is busy processing other SNMP requests, preventing it from responding to the current SNMP request.
· An exception occurs while the SNMP process is handling the current SNMP request.
Troubleshooting flow
Figure 191 shows the troubleshooting flowchart.
Figure 191 Troubleshooting flowchart for SNMP operation timeout
Solution
To resolve the issue:
1. Locate and resolve the SNMP connection issue.
Check the SNMP connection on the NMS. If the connection timed out or failed, see the troubleshooting procedure for SNMP connection failure to locate and resolve the SNMP connection issue.
2. Identify if there is any packet loss on the network.
Execute the ping -c count host command on the NMS, for example, setting the count parameter to 100, and the host parameter to the IP address of the device. Check the packet loss field in the ping command result to determine if there is any packet loss on the network.
¡ If there is no packet loss, proceed to step 3.
¡ If there is packet loss, see the ping failure troubleshooting procedure in ping and tracert troubleshooting guide to resolve the network connection issue.
NOTE: -c count: Specifies the number of ICMP echo requests that are sent to the destination. The value range is 1 to 4294967295, and the default is 5.
3. Locate and resolve the issue of insufficient storage space on the storage media of the device.
Execute the display memory-threshold command in any view. If the Current free-memory state field value in the command output is normal, the storage space on the storage media of the device is sufficient. If the field value is not normal, the storage space on the storage media of the device is insufficient. Use the following methods to free up storage space.
¡ Use the reset recycle-bin command to remove files from the recycle bin. (The files in the recycle bin also occupy space on the storage media.)
¡ Use the delete /unreserved file command to delete unused files completely at once. If you do not specify the /unreserved parameter, the deleted files will be stored in the recycle bin.
NOTE: Depending on the device model, the storage medium supported by the device might be Flash or compact flash card (CF card).
4. Locate and resolve the device busyness issue.
a. Execute the display cpu-usage command several times in any view to identify if the CPU usage of the device remains at a high level.
b. Execute the monitor process command in any view to identify if there are processes with high CPU usage. If a service process has a high CPU usage, you can reduce the CPU usage by restarting the service as needed. Whether the process can be restarted depends on the device model.
5. Identify and resolve the issue with the SNMP process.
Execute the probe command in the system view to enter probe view, and then execute the display system internal snmp-agent operation in-progress command several times to view information about the SNMP operations that the device is processing.
¡ If the Request ID value in the outputs is constantly changing, the SNMP process is handling different requests, and the current SNMP process is busy. You need to reduce the frequency of SNMP operations from the NMS to the device.
¡ If the Request ID value in the outputs remains unchanged, the SNMP process is continuously handling the same request, and the SNMP process times out while processing the request. You can resolve the issue by using the following methods:
- Execute the undo snmp-agent command and snmp-agent command in sequence to restart the SNMP process.
- Execute the display system internal snmp-agent operation timed-out and display system internal snmp-agent packet timed-out commands to identify time-consuming SNMP operations and the MIB objects involved in the operation. Reduce or avoid execution of similar operations.
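The Request ID comparison in this step can be expressed as a small helper. It assumes you have already noted the Request ID value from each execution of the command; the function name and the returned strings are hypothetical.

```python
def classify_snmp_process(request_ids):
    """Classify the SNMP process state from Request ID values sampled over time."""
    ids = [value for value in request_ids if value is not None]
    if len(ids) < 2:
        return "insufficient samples"
    if len(set(ids)) == 1:
        # The same request is still in progress in every sample.
        return "stuck on one request"
    return "busy with different requests"
```

A result of "busy with different requests" points to reducing the SNMP operation frequency on the NMS, and "stuck on one request" points to the restart and timed-out checks listed above.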
6. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
NMS failure to manage the device
Symptom
The NMS performs an SNMP Set or Get operation on the device, but the device does not respond or a prompt is displayed indicating that the operation has failed.
Common causes
The following are the common causes of this type of issue:
· The NMS cannot connect to the device via SNMP.
· The SNMP version used by the NMS does not match that of the MIB object.
· The NMS does not have access permission to the device.
· The SNMP process on the device is busy, unable to respond to the current SNMP request.
Troubleshooting flow
Figure 192 shows the troubleshooting flowchart.
Figure 192 Troubleshooting flowchart for NMS failure to manage the device
Solution
To resolve the issue:
1. Identify if the NMS can connect to the device via SNMP.
If the NMS fails to connect to the device via SNMP, see the troubleshooting procedure for SNMP connection failure to resolve the issue.
2. Identify whether the MIB object can be accessed with the SNMP version used by the NMS.
For example, the snmpUsmMIB object allows access only via SNMPv3. Data types such as Integer32, Unsigned32, and Counter64 are supported only in SNMPv2c and SNMPv3. If the NMS connects to the device via SNMPv1, it cannot access MIB objects of the Integer32, Unsigned32, and Counter64 data types. For the data type of a MIB object, see the SYNTAX field of the MIB object in the MIB file.
hh3cDhcpServer2BadNum OBJECT-TYPE
SYNTAX Counter64
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"The total number of the bad packets received."
::= { hh3cDhcpServer2StatGroup 1 }
If the NMS cannot access the MIB object because of version incompatibility, specify SNMPv2c or SNMPv3 for the NMS. Then re-establish connection with the device, and perform SNMP Get and Set operations.
3. Identify if the MIB object supports the current access operation.
Access the MIB object based on the access type supported by it. For the access type of a MIB object, see the MAX-ACCESS field of the MIB object in the MIB file.
hh3cDhcpServer2BadNum OBJECT-TYPE
SYNTAX Counter64
MAX-ACCESS read-only
STATUS current
DESCRIPTION
"The total number of the bad packets received."
::= { hh3cDhcpServer2StatGroup 1 }
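The version check and the access type check in the two preceding steps can be combined in a short sketch that reads the SYNTAX and MAX-ACCESS fields from an object definition. The set of data types unavailable to SNMPv1 is taken from the text above, and the function names are hypothetical.

```python
import re

# Data types that the text above lists as supported only in SNMPv2c and SNMPv3.
V1_UNSUPPORTED_TYPES = {"Integer32", "Unsigned32", "Counter64"}


def parse_object(definition):
    """Return (syntax, max_access) found in a MIB object definition."""
    syntax = re.search(r"SYNTAX\s+(\S+)", definition)
    access = re.search(r"MAX-ACCESS\s+(\S+)", definition)
    return (syntax.group(1) if syntax else None,
            access.group(1) if access else None)


def can_access(definition, snmp_version, operation):
    """Check whether a 'get' or 'set' is expected to work for the object."""
    syntax, max_access = parse_object(definition)
    if snmp_version == "v1" and syntax in V1_UNSUPPORTED_TYPES:
        return False
    if operation == "get":
        return max_access in {"read-only", "read-write", "read-create"}
    if operation == "set":
        return max_access in {"read-write", "read-create"}
    raise ValueError("operation must be 'get' or 'set'")
```

For the hh3cDhcpServer2BadNum definition shown above, a Get over SNMPv2c or SNMPv3 is expected to succeed, while a Get over SNMPv1 or any Set operation is not.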
4. Identify the access permissions of the NMS. If the access permissions are insufficient, modify the corresponding configuration to assign the required permissions to the NMS.
SNMP uses the following modes to control access to MIB objects:
¡ View-based Access Control Model—VACM mode controls access to MIB objects by assigning MIB views to SNMP communities or users. You can view the MIB view related configuration by using the display current-configuration | include view command and view the detailed information of the MIB view by using the display snmp-agent mib-view command. If the configuration is incorrect, modify the relevant MIB configuration.
The device supports three types of MIB views.
- Read-view: The NMS can only read the value of the objects in this view.
- Write-view: The NMS can read and write the value of the objects in this view.
- Notify-view: When a notification object included in this view reaches the trigger condition, the NMS receives the corresponding trap or inform message.
¡ Role based access control—RBAC mode controls access to MIB objects by assigning user roles to SNMP communities or users. The users access and operate specific system functions and resource objects according to their roles. When creating an SNMPv3 user, you can assign a user role to it. The rules established for the user role will define the MIB objects that the user can access and access permissions to these MIB objects. If an error is found in the role permission configuration, execute the role name command to enter user role view and modify the user role rules.
- SNMP communities or users with the network-admin or level-15 predefined user role have read and write access to all MIB objects.
- SNMP communities or users with the network-operator predefined user role have read-only access to all MIB objects.
- SNMP communities or users with a user-defined user role have access rights to MIB objects as specified by the rule command.
NOTE: Only users with network-admin or level-15 user roles can configure SNMP communities, users, or groups after logging in to the device. For successful configuration, make sure the user has a network-admin or level-15 user role.
5. Determine if the SNMP process is busy.
If the device does not respond to the request from the NMS because the SNMP process on the device is busy, you can resolve the issue by referring to the troubleshooting procedure for the SNMP operation timeout issue.
6. Other recommendations
As a best practice, connect the NMS to a service port on the device for access, as the service port has better packet processing capabilities than the network management port. This ensures that SNMP packets can be processed as quickly as possible.
If multiple NMSs access a device simultaneously and the device responds slowly, reduce the access frequency to alleviate the device load. For example, set the access interval to be a minimum of 5 minutes.
7. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module Name: SNMPv2-MIB
· authenticationFailure (1.3.6.1.6.3.1.1.5.5)
Log messages
· SNMP/3/SNMP_ACL_RESTRICTION
· SNMP/4/SNMP_AUTHENTICATION_FAILURE
· SNMP/4/SNMP_SILENT
Failure of the NMS to receive SNMP notifications
Symptom
The NMS fails to receive SNMP notifications sent from the device.
Common causes
The following are the common causes of this type of issue:
· The device and NMS are unreachable to each other, or SNMP malfunctions, resulting in SNMP connection failure.
· Configuration error on the device side or NMS side.
· No notifications have been generated from the service modules on the device.
· Loss of notifications.
· The SNMP notification messages are too large, exceeding the notification message size that the SNMP module can process.
Troubleshooting flow
Figure 193 shows the troubleshooting flowchart.
Solution
To resolve the issue:
1. Execute the snmp-agent trap log command in system view to enable logging for SNMP notifications. When the device sends an SNMP notification to the NMS, a log will be generated on the device to record that notification.
2. Execute the display logbuffer | include SNMP_NOTIFY command to identify whether a notification has been generated on the device and to view detailed information about the generated notification.
¡ If a notification has been generated, proceed to step 3.
¡ If no notification has been generated, proceed to step 4.
a. Identify whether the device can establish an SNMP connection with the NMS. If the connection establishment fails, see the SNMP connection failure troubleshooting procedure to resolve the issue.
b. Execute the display current-configuration | include snmp command to identify whether the snmp-agent target-host trap command has been configured correctly. If any configuration error exists, modify the configuration to ensure that the target IP address (VPN parameters) and port number configured in that command are consistent with those used by the NMS to receive SNMP notifications. In addition, make sure the device and NMS are consistent in SNMP version and security word.
- In SNMPv1 or SNMPv2c, the security word is the community name. To create an SNMP community name, execute the snmp-agent community command.
- In SNMPv3, the security word is the username, and the device and NMS must have the same authentication and encryption levels. To create an SNMPv3 user, execute the snmp-agent group and snmp-agent usm-user v3 commands. The authentication and encryption modes, the authentication password, and the encryption password (if used) configured for the user must be consistent with those on the NMS side. Also, the authentication and encryption levels configured for the user must be higher than those specified in the snmp-agent target-host trap command. Security levels increase in the following order: unauthenticated and unencrypted, authenticated and unencrypted, authenticated and encrypted.
- The MIB view accessible to the community name or user must include the corresponding notification object. Otherwise, the device does not send the notifications to the NMS because of insufficient permissions.
c. Execute the debugging udp packet command to enable debugging for UDP packets and determine whether the notifications sent from the device are too large. If the data encapsulated by the service module is relatively large, the notification packets might exceed the maximum length of SNMP packets that the device can transmit and be discarded. You can adjust the maximum length of SNMP packets that the device can transmit by using the snmp-agent packet max-size command. Take into account the MTU value of the network and whether fragmentation is supported when making the adjustment.
*Dec 27 22:35:41:203 2021 Sysname SOCKET/7/UDP: -MDC=1;
UDP Output:
UDP Packet: vrf = 0, src = 192.168.56.121/30912, dst = 192.168.56.1/162
len = 79, checksum = 0xd98f
d. Identify whether a firewall on the network filters the notifications.
If a firewall on the network filters the notifications, use the following method to resolve the issue:
- If the firewall has filtered the notifications by source IP, use the snmp-agent trap source command to modify the source IP address of the notifications.
- Modify the firewall rules to permit the notifications.
e. Identify if the network is unstable and has packet loss.
If there is packet loss on the network, use the following methods to resolve the issue:
- Check the network and resolve the packet loss issue on the network.
- Configure the device to send informs instead of traps. Informs require acknowledgment from the NMS and are therefore more reliable than traps. Informs are supported only in SNMPv2c and SNMPv3.
4. Identify if the SNMP module has sent notifications to the NMS.
a. Use the display snmp-agent trap-list command to identify whether SNMP notification has been enabled for the service modules. If not enabled, use the snmp-agent trap enable command to enable SNMP notification for the modules.
b. Identify if the notification triggering conditions have been met. For example, the interface status change notification is generated when the status of an interface changes, and high CPU or memory usage notification is generated when the CPU or memory usage exceeds the threshold.
- If no notification triggering conditions have been met, it is normal that no notification is generated.
- If a notification triggering condition has been met but the device has not sent a notification, proceed to step c.
c. Use the display snmp-agent trap queue command to identify if the notification buffer is full. If the Message number is greater than the Queue size, the notification buffer might be full, and newly generated notifications might be discarded. In this case, you can use the snmp-agent trap queue-size and snmp-agent trap life commands in system view to adjust the performance parameters of the notification buffer.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration file, log messages, and alarm messages.
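The packet size check in step 3.c can be sketched as a small script. The following is a minimal sketch, assuming the debugging output has the same layout as the UDP Output sample in step 3.c. The function names and the 1500-byte limit are hypothetical values chosen for illustration. The len field is the value printed by the UDP debugging output, so treat the comparison as approximate.

```python
import re

# Field layout copied from the "UDP Output" debugging sample in step 3.c.
UDP_LINE = re.compile(
    r"src = (?P<src>[\d.]+)/(?P<sport>\d+), dst = (?P<dst>[\d.]+)/(?P<dport>\d+)"
)
LEN_LINE = re.compile(r"len = (?P<len>\d+)")


def parse_udp_debug(text):
    """Return one dict per UDP packet found in the debugging text."""
    packets = []
    current = None
    for line in text.splitlines():
        m = UDP_LINE.search(line)
        if m:
            current = {
                "src": m["src"],
                "sport": int(m["sport"]),
                "dst": m["dst"],
                "dport": int(m["dport"]),
            }
            continue
        m = LEN_LINE.search(line)
        if m and current is not None:
            current["len"] = int(m["len"])
            packets.append(current)
            current = None
    return packets


def oversized(packets, limit, port=162):
    """Return packets sent to the notification port that exceed limit bytes."""
    return [p for p in packets if p["dport"] == port and p["len"] > limit]


sample = """\
*Dec 27 22:35:41:203 2021 Sysname SOCKET/7/UDP: -MDC=1;
UDP Output:
UDP Packet: vrf = 0, src = 192.168.56.121/30912, dst = 192.168.56.1/162
len = 79, checksum = 0xd98f
"""

packets = parse_udp_debug(sample)
print(packets)
# The limit is a hypothetical value. Use the value configured with
# snmp-agent packet max-size on the device.
print(oversized(packets, limit=1500))
```

If the second list is not empty, compare the reported lengths with the configured maximum SNMP packet length and the path MTU before adjusting the setting.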
Related alarm and log messages
Alarm messages
N/A
Log messages
· SNMP/6/SNMP_NOTIFY
· SNMP/3/SNMP_INFORM_LOST
Mirroring issues
Failure to receive mirrored packets on monitoring device after flow mirroring configuration
Symptom
The monitoring device fails to receive mirrored packets after flow mirroring configuration.
Common causes
The following are the common causes of this type of issue:
· The link between the destination interface and the monitored network fails.
· The QoS policy has not been applied or the packets do not match the QoS policy.
· When you configure a traffic behavior, the specified flow mirroring interface is incorrect.
Troubleshooting flow
Figure 194 shows the troubleshooting flowchart.
Figure 194 Flowchart for troubleshooting failure to receive mirrored packets on monitoring device after flow mirroring configuration
Solution
1. Identify whether the mirroring source ports can successfully send and receive packets.
Execute the display interface interface-type interface-number command on the source device. Check the received and sent packet statistics values in the Input(total) and Output(total) fields in the command output for a mirroring source port.
¡ If the statistics of the packets received or sent by the mirroring source port are 0 or unchanged, the link between the device and the monitored network might fail (for example, the related interfaces are down). Resolve this issue.
¡ If the statistics of the packets received and sent by the mirroring source port are not zero and continuously change, proceed to the next step.
2. Identify whether the QoS policy has been correctly applied.
Identify whether the QoS policy that matches the packets to be mirrored has been applied and whether the applied QoS policy is correct.
Execute the display qos policy interface command on the source device to identify whether the QoS policy has been applied to the mirroring source port.
¡ If not, apply the QoS policy to the mirroring source port as needed.
¡ If yes, continue to identify whether the QoS policy is configured correctly. Execute the display qos policy command on the device to check the configuration of the QoS policy. In the command output, the Classifier fields and Behavior fields display the configured traffic classes and traffic behaviors, respectively.
- If class-behavior associations are incorrect, execute the qos policy command in system view to enter the view of the QoS policy. Then, execute the classifier behavior command to modify the class-behavior associations in the QoS policy. For how to modify a QoS policy, see "Ineffective MQC QoS policy."
- If class-behavior associations are correct, proceed to the next step.
3. Identify whether packets are sent out the destination interface.
Execute the display interface interface-type interface-number command on the destination device. Check the sent packet statistics values in the Output(total) field in the command output.
¡ If the statistics of packets sent out the destination interface are 0 or do not change, execute the display interface interface-type interface-number command on the device. Check the Current state field in the command output to identify whether the interface is physically up.
- If the interface is up, proceed to the next step.
- If the interface is down, resolve the issue that causes the interface to be physically down.
¡ If the statistics of the packets sent by the destination interface are not zero and continuously change, proceed to step 5.
4. In the traffic behavior of the applied QoS policy, identify whether the interface specified in the mirror-to interface command is the intended flow mirroring destination interface.
¡ If not, reconfigure the flow mirroring destination interface as the correct one by using the mirror-to interface command.
¡ If yes, proceed to step 5.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
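The counter checks in steps 1 and 3 compare two readings of the same packet counter taken a short time apart. The following is a minimal sketch of that decision. It assumes that you have already read the Input(total) or Output(total) packet counts from two runs of the display interface command. The function names and the counter values are hypothetical.

```python
def counter_state(first_sample, second_sample):
    """Classify a packet counter that was read twice from the same interface."""
    if second_sample == 0:
        return "zero"
    if second_sample == first_sample:
        return "unchanged"
    return "changing"


def next_action(state):
    """Map the counter state to the action described in the procedure."""
    if state in ("zero", "unchanged"):
        return "check the link and the interface state"
    return "proceed to the next step"


# Hypothetical counter values read a few seconds apart.
print(next_action(counter_state(1200, 1200)))
print(next_action(counter_state(1200, 1450)))
```

The same comparison applies to the port mirroring procedure, which uses the same counters.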
Related alarm and log messages
Alarm messages
N/A
Log messages
· QOS_POLICY_APPLYGLOBAL_CBFAIL
Failure to receive mirrored packets on monitoring device after port mirroring configuration
Symptom
The monitoring device fails to receive mirrored packets after port mirroring configuration.
Common causes
The following are the common causes of this type of issue:
· The link between the mirroring source ports and the monitored network fails.
· The configuration of the mirroring source ports or the monitor ports is incorrect.
Troubleshooting flow
Figure 195 shows the troubleshooting flowchart.
Solution
1. Identify whether the mirroring source ports can successfully send and receive packets.
Execute the display interface interface-type interface-number command on the source device. Check the received and sent packet statistics values in the Input(total) and Output(total) fields in the command output for a mirroring source port.
¡ If the statistics of the packets received or sent by the mirroring source port are 0 or unchanged, the link between the device and the monitored network might fail (for example, the related interfaces are down). Resolve this issue.
¡ If the statistics of the packets received and sent by the mirroring source port are not zero and continuously change, proceed to the next step.
2. Identify whether the port mirroring configuration is correct.
On the source device, execute the display mirroring-group command to check port mirroring configuration and identify whether the configured mirroring source ports and monitor ports are correct. In the command output, the Mirroring port field displays the mirroring source ports, and the Monitor port field displays the monitor ports.
¡ If the configuration is correct, proceed to the next step.
¡ If the configuration is incorrect, execute the mirroring-group mirroring-port and mirroring-group monitor-port commands in system view to correctly reconfigure the mirroring source ports and monitor ports.
3. Identify whether packets are sent out a destination port.
Execute the display interface interface-type interface-number command on the destination device. Check the Output(total) field in the command output to view the statistics of the packets sent by the port.
¡ If the statistics of packets sent out the destination port are 0 or do not change, execute the display interface interface-type interface-number command on the device. Check the Current state field in the command output to identify whether the interface is physically up.
- If the interface is up, proceed to the next step.
- If the interface is down, resolve the issue that causes the interface to be physically down.
¡ If the statistics of the packets sent by the destination port are not zero and continuously change, proceed to the next step.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
N/A
gRPC issues
Inappropriate gRPC sampling interval
Symptom
In the data packets sent to the collector in the gRPC dial-out mode, the sampling interval of some data sources is inconsistent with the specified sampling interval.
Common causes
The following are the common causes of this type of issue:
· Some sensor paths cannot achieve the specified sampling interval accuracy, and the gRPC module collects data at their own minimum sampling intervals.
· The device CPU is busy.
· The sensor path corresponding to the data sources is ifmgr/interfaces, a route type path, or a statistics type path. Because such a sensor path contains a large amount of data, the device cannot complete data collection within the specified sampling interval.
For example, if the sensor path is route/ipv4routes, the device cannot complete data collection within a small sampling interval when the number of route entries reaches 100000.
Troubleshooting flow
Figure 196 shows the troubleshooting flowchart.
Figure 196 Flowchart for troubleshooting the inappropriate gRPC sampling interval issue
Solution
1. Use the display system internal telemetry command to identify whether the sensor paths use their minimum sampling intervals.
In the following example, the sampling interval (100 milliseconds) specified for the route/ipv4routes sensor path is smaller than the effective sampling interval (5 seconds). This indicates that the sensor path actually uses the minimum sampling interval (5 seconds). To make sure the specified sampling interval is the same as the effective sampling interval, specify a sampling interval that is larger than the minimum sampling interval.
<Sysname> system-view
[Sysname] probe
[Sysname-probe] display system internal telemetry
Current-time: 2021-12-25T15:51:45.530
--------------------Subscription s----------------------
Subscription mode: non-gNMI
DSCP value: 0
Source address or interface: Not configured
Telemetry data model: 2-layer
Encoding: JSON
Protocol: GRPC
Sensor group: s
Sampling interval: 100 milliseconds
Sampling type Effective sampling interval Sensor path
Periodic 5 seconds route/ipv4routes
Destination group: d
...
[Sysname-probe] quit
2. Identify whether the device CPU is busy.
Use the display cpu-usage command to view the CPU usage.
[Sysname] display cpu-usage
Slot 0 CPU 0 CPU usage:
70% in last 5 seconds
62% in last 1 minute
60% in last 5 minutes
...
If the CPU usage of the master device or global active MPU exceeds 60%, the telemetry sampling efficiency will be affected. As a result, the device cannot complete data collection within the sampling interval. To resolve the issue, use one of the following methods as required:
¡ Wait for the CPU usage to drop below 60%.
¡ Reduce the number of sensor paths to reduce CPU usage.
3. Identify whether a sensor path reports a large amount of data.
Enter telemetry view and use the display this command to view the configuration.
[Sysname] telemetry
[Sysname-telemetry] display this
#
telemetry
sensor-group s
sensor path route/ipv4routes
destination-group d
ipv4-address 192.168.79.155 port 50051
subscription s
sensor-group s sample-interval 5
destination-group d
#
Identify whether the time difference between two adjacent data packets sent to the collector is a multiple of the specified sampling interval on the network management side when the following sensor paths exist:
¡ ifmgr/interfaces.
¡ Route type paths.
¡ Statistics type paths.
NOTE:
· A statistics type sensor path typically includes the statistics node, for example, ifmgr/statistics.
· A route type sensor path typically includes the route node, for example, route/ipv4routes.
For example, assume that the sampling interval specified for the route/ipv4routes sensor path is 5 seconds. The time difference between two data packets sent to the collector is the difference between the values of the two Timestamp fields (in milliseconds): (1641482427751 – 1641482417751)/1000 = 10 seconds, which is a multiple of 5 seconds.
Producer-Name: H3C
...
Sensor-Path: route/ipv4routes
Json-Data: {"Notification":{"Timestamp":"1641482417751",...
Producer-Name: H3C
...
Sensor-Path: route/ipv4routes
Json-Data: {"Notification":{"Timestamp":"1641482427751",...
The output shows that the sensor path collects so much data that one report takes multiple sampling intervals to complete. To make sure the specified sampling interval is the same as the effective sampling interval, specify a sampling interval that is larger than the time required for data reporting.
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
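The timestamp comparison in step 3 can be sketched as follows. This is a minimal sketch that uses the two Timestamp values and the 5-second sampling interval from the example in step 3. The function name is hypothetical.

```python
def intervals_spanned(ts_first_ms, ts_second_ms, sample_interval_s):
    """Return (gap in seconds, number of sampling intervals in the gap).

    The second element is None when the gap is not a whole multiple of the
    sampling interval. Timestamps are in milliseconds, as in the Json-Data
    Timestamp field.
    """
    gap_ms = int(ts_second_ms) - int(ts_first_ms)
    quotient, remainder = divmod(gap_ms, sample_interval_s * 1000)
    return gap_ms / 1000, (quotient if remainder == 0 else None)


# Timestamp values copied from the two sample packets in step 3.
gap_s, multiple = intervals_spanned("1641482417751", "1641482427751", 5)
print(gap_s, multiple)
```

A multiple greater than 1 indicates that one report spans several sampling intervals, which matches the situation described in step 3.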
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Troubleshooting DPI issues
URL filtering issues
Failed to connect to the cloud query server
Symptom
As shown in Figure 197, the device is connected to a LAN through security zone Trust and to the Internet through security zone Untrust. The device performs URL filtering when the host accesses the Internet. After URL filtering cloud query is configured, the device cannot connect to the cloud query server.
Common causes
The following are the common causes of this type of issue:
· A physical link error results in communication failure between the device and the cloud server.
· Packets from the device to the cloud query server are dropped by a security policy due to incorrect security policy configuration.
· The license for URL filtering has expired.
· The URL filtering cloud query configuration is incorrect.
Analysis
The troubleshooting flow for issues of this type is as follows:
1. Identify whether the device and the cloud query server can reach each other.
2. Inspect the security policy configuration on the device and identify whether it allows the device to send packets to the cloud query server.
3. Identify whether the license for URL filtering is valid on the device.
4. Identify whether the URL filtering cloud query configuration is correct.
Figure 198 shows the troubleshooting flowchart.
Figure 198 Flowchart for troubleshooting cloud query server connection failure
Solution
The following troubleshooting steps use the IP address and security zone information in the scenario shown in Figure 197.
1. The domain name of H3C’s cloud query server is sec.h3c.com. The server is deployed in a public network, and the ping service is disabled by default. You cannot test the connectivity to the server by pinging it from the device. You can ping other hosts in the public network, such as www.h3c.com, to verify that the device can access the Internet.
a. Use the ping command on the device to check its network connection with www.h3c.com.
<Sysname> ping www.h3c.com
C:\Users\usera>ping www.h3c.com
Pinging www.h3c.com (10.63.16.77) with 32 bytes of data:
Reply from 10.63.16.77: bytes=32 time=25ms TTL=122
Reply from 10.63.16.77: bytes=32 time=25ms TTL=122
Reply from 10.63.16.77: bytes=32 time=25ms TTL=122
Reply from 10.63.16.77: bytes=32 time=25ms TTL=122
Ping statistics for 10.63.16.77:
Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 25ms, Maximum = 25ms, Average = 25ms
- If the device can ping www.h3c.com successfully, proceed to step 2.
- If the device fails to ping www.h3c.com, proceed to step b.
b. Identify whether the DNS server configuration is correct.
<Sysname> display current-configuration | include dns
dns server 114.114.114.114
- If no DNS server is configured or the DNS server address is incorrect, configure a correct DNS server address. If the device still cannot ping www.h3c.com, proceed to step c.
- If a correct DNS server address has been configured, proceed to step c.
c. Troubleshoot the ping failure as described in the network management & monitoring troubleshooting procedures. If the issue persists after the device can ping www.h3c.com successfully, proceed to step 2.
2. Identify whether a security policy on the device allows packets to the cloud query server.
a. Execute the display security-policy ip command to display the configuration information for all security policies.
[Device] display security-policy ip
Security-policy ip
rule 0 name trust-untrust
action pass
source-zone trust
destination-zone untrust
source-ip-host 192.168.1.3
rule 1 name local-untrust
action pass
source-zone local
destination-zone untrust
b. Review the active security policy rules in top-down order to find the policy rule that matches packets from the device to the server.
- If the rule is not found, modify the existing security policy rule or add a new security policy rule to allow the device to access the server. The requirements for related security policy rules are as follows:
The source security zone used as a filter condition must be the Local security zone.
The destination security zone used as a filter condition must be the Untrust security zone.
The action of the security policy rule is set to pass.
The commands for creating a security policy are as follows:
[Device] security-policy ip
[Device-security-policy-ip] rule name local-untrust
[Device-security-policy-ip-2-local-untrust] source-zone local
[Device-security-policy-ip-2-local-untrust] destination-zone untrust
[Device-security-policy-ip-2-local-untrust] action pass
[Device-security-policy-ip-2-local-untrust] quit
[Device-security-policy-ip] quit
- If the rule is found, examine whether the rule action is pass. If the action is drop, change it to pass.
- If the issue persists after packets to the cloud query server are permitted, proceed to step 3.
3. Identify whether the license for URL filtering is valid.
a. Execute the display license feature command on the device and check the UFLT field for the license state.
<Sysname> display license feature
Total: 32 Usage: 2
Feature Licensed State
ACG N -
AV Y Trial
IPRPT N -
IPS N -
SLB N -
SSLVPN Y Pre-licensed
UFLT Y Formal
WAF N -
WEB-CACHE N -
Table 23 Command output
Field | Description
Total | Total number of licenses that can be installed.
Usage | Number of licenses that have been installed.
Feature | Feature that must be licensed before being used.
Licensed | Licensing state of the feature: · N: Not licensed. · Y: Licensed.
State | License type: · Formal: A formal license is installed for the feature and the license is valid. · Trial: A trial license is installed for the feature and the license is valid. · Pre-licensed: A license is pre-installed for the feature and the license is valid. If the feature is not licensed, this field displays a hyphen (-). To use the feature, you must install a valid license.
b. If the State field displays a hyphen (-), install a valid license. For more information about licensing, see H3C Security Products Licensing Guide. If the issue persists after you install a valid license, proceed to step 4.
c. If the State field displays a value other than a hyphen, proceed to step 4.
4. Identify whether the URL filtering cloud query configuration is correct.
a. Identify whether the default cloud query server is modified.
The default cloud query server is sec.h3c.com. As a best practice, do not change the default.
Execute the display current-configuration | include cloud-server command on the device.
[Device] display current-configuration | include cloud-server
- If the default cloud query server is modified, execute the undo inspect cloud-server command. If the issue persists after you execute the command, proceed to step b.
- If the default cloud query server is not modified, proceed to step b.
b. Identify whether cloud query is enabled in a URL filtering policy.
Execute the following command:
[Device] display current-configuration | include cloud
cloud-query enable
- If cloud query is not enabled, enable cloud query. If the issue persists, proceed to step 5.
- If cloud query is enabled, proceed to step 5.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration file.
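The top-down rule review in step 2.b can be sketched as a first-match lookup. The following is a simplified model, not the device's matching engine. It considers only the source zone, destination zone, and source host shown in the sample display security-policy ip output. The Rule class, the first_match function, and the device address 198.51.100.1 are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    action: str
    source_zone: str
    destination_zone: str
    source_ip_host: Optional[str] = None  # None means any source address


def first_match(rules, src_zone, dst_zone, src_ip):
    """Return the first rule, in configured order, that matches the packet."""
    for rule in rules:
        if rule.source_zone.lower() != src_zone.lower():
            continue
        if rule.destination_zone.lower() != dst_zone.lower():
            continue
        if rule.source_ip_host is not None and rule.source_ip_host != src_ip:
            continue
        return rule
    return None  # No rule matches, so a rule must be added or modified


# Rules copied from the sample output in step 2.a.
rules = [
    Rule("trust-untrust", "pass", "trust", "untrust", "192.168.1.3"),
    Rule("local-untrust", "pass", "local", "untrust"),
]

# Packets that the device itself sends to the cloud query server originate
# from the Local zone and leave through the Untrust zone.
match = first_match(rules, "local", "untrust", "198.51.100.1")
print(match.name if match else "no matching rule")
```

If the lookup returns no rule, or a rule whose action is not pass, the device cannot reach the cloud query server through the security policy.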
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failed to access an HTTP website
Symptom
As shown in Figure 199, the device is connected to a LAN through security zone Trust and to the Internet through security zone Untrust. The device performs URL filtering when the host accesses the Internet. After URL filtering is configured, some websites cannot be accessed.
Common causes
The following are the common causes of this type of issue:
· The URL filtering configuration is incorrect.
· The security policy configuration is incorrect.
Analysis
The troubleshooting flow for issues of this type is as follows:
1. View the error message on the browser page or the log message on the device, and take an action according to the error message or log message.
2. Identify whether the URL filtering configuration is correct.
3. Identify whether the security policy configuration is correct.
Figure 200 shows the troubleshooting flowchart.
Figure 200 Flowchart for troubleshooting HTTP website access failure
Solution
1. View the error message returned by the browser, and take an action accordingly.
URL filtering returns a page to the user's browser after blocking access to a website. This page provides the reason for the block and website information. The following is an example of the information on the page:
Web Access Blocked
Your access to this website was denied. To access this webpage, contact Technical Support.
Reason:The URL of the website hit the URL blacklist.
Category:
URL: http://192.22.2.61/wnm/frame/index.php
You can take an action according to the reason:
¡ If the Reason field displays The URL of the website hit the URL blacklist., execute the display this command in URL filtering policy view and determine the ID of the configured blacklist rule.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] display this
category test action drop logging
add blacklist 1 host text 192.22.2.61
Execute the undo add blacklist 1 command in URL filtering policy view to delete the blacklist rule. Execute the inspect activate command to make the configuration change take effect. If the issue persists, proceed to step 3.
¡ If the Reason field displays The URL of the website hit a user-defined URL category., check the Category field for the name of the user-defined URL category, and use either of the following methods to resolve the issue. In this example, the name of the user-defined URL category is test, and the URL is http://192.22.2.61/wnm/frame/index.php.
Delete the filtering rule that contains the URL as follows:
- Identify the filtering rule ID in user-defined URL category test.
[Sysname] url-filter category test
[Sysname-url-filter-category-test] display this
#
url-filter category test severity 2000
rule 1 host text 192.22.2.61
rule 2 host text *185*
#
Execute the undo rule 1 command in URL category view to delete the filtering rule. Return to system view, and then execute the inspect activate command to make the configuration change take effect. If the issue persists, proceed to step 3.
- Enter the view of the URL filtering policy, and change the action to permit. If the issue persists, proceed to step 3.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] category test action permit
¡ If the Reason field displays The URL of the website hit a predefined URL category., use either of the following methods to resolve the issue:
- In URL filtering policy view, add a whitelist rule to match the URL.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] add whitelist 1 host text 192.22.2.61
Return to system view, and then execute the inspect activate command to make the configuration change take effect. If the issue persists, proceed to step 3.
- Create a user-defined URL category named test in system view, create a URL filtering rule to match the URL, and specify the permit action for the URL category.
[Sysname] url-filter category test severity 2000
[Sysname-url-filter-category-test] rule 1 host text 192.22.2.61
[Sysname-url-filter-category-test] quit
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] category test action permit
Return to system view, and execute the inspect activate command to make the configuration change take effect. If the issue persists, proceed to step 3.
¡ If the Reason field displays The URL of the website did not match any accessible URL category., use any of the following methods to resolve the issue:
- In URL filtering policy view, add a whitelist rule to match the URL.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] add whitelist 1 host text 192.22.2.61
Return to system view, and then execute the inspect activate command to make the configuration change take effect. If the issue persists, proceed to step 3.
- Create a user-defined URL category named test in system view, create a URL filtering rule to match the URL, and specify the permit action for the URL category.
[Sysname] url-filter category test severity 2000
[Sysname-url-filter-category-test] rule 1 host text 192.22.2.61
[Sysname-url-filter-category-test] quit
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] category test action permit
Return to system view, and execute the inspect activate command to make the configuration change take effect. If the issue persists, proceed to step 3.
- In URL filtering policy view, specify the default action as permit. If the issue persists, proceed to step 3.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] default-action permit
¡ If the Reason field displays No matching whitelist entry was found for the website in whitelist mode., use the following method to resolve the issue:
In URL filtering policy view, add a whitelist rule to match the URL.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] add whitelist 1 host text 192.22.2.61
Return to system view, and execute the inspect activate command to make the configuration take effect. If the issue persists, proceed to step 3.
If the URL is a link on the webpage that matches the whitelist rule, use the following method to resolve the issue:
Execute the display this command in URL filtering policy view to identify whether referer whitelist is enabled.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] display this
#
url-filter policy p1
undo referer-whitelist enable
NOTE: The referer whitelist is useful when you want to allow users to access links on the webpages that match a whitelist rule.
In URL filtering policy view, execute the referer-whitelist enable command to enable referer whitelist. If the issue persists, proceed to step 3.
¡ If the Reason field displays The URL of the website hit the URL reputation signature library., use either of the following methods to resolve the issue:
- In URL filtering policy view, add a whitelist rule to match the URL.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] add whitelist 1 host text 192.22.2.61
Return to system view, and then execute the inspect activate command to make the configuration change take effect. If the issue persists, proceed to step 3.
- Create a user-defined URL category named test in system view, create a URL filtering rule to match the URL, and specify the permit action for the URL category.
[Sysname] url-filter category test severity 2000
[Sysname-url-filter-category-test] rule 1 host text 192.22.2.61
[Sysname-url-filter-category-test] quit
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] category test action permit
Return to system view, and execute the inspect activate command to make the configuration change take effect. If the issue persists, proceed to step 3.
2. View URL filtering logs.
If you enable URL filtering logging, the device generates a log message when a packet matches a URL filtering rule in a URL filtering policy or when the default action is executed. You can obtain information such as the name of the matching URL filtering policy, the blocked website URL, and the URL category from the log message. The following uses fast log messages as examples.
¡ UFLT_MATCH_IPV4_LOG
The following is an example of the log message:
UFLT/6/UFLT_MATCH_IPV4_LOG:Protocol(1001)=TCP;Application(1002)=SouhuNews;UserName(1113)=;SrcMacAddr(1021)=08-00-27-11-93-78;SrcIPAddr(1003)=112.1.1.2;SrcPort(1004)=3887;NATSrcIPAddr(1005)=112.1.1.2;NATSrcPort(1006)=3887;DstIPAddr(1007)=114.1.1.2;DstPort(1008)=80;NATDstIPAddr(1009)=114.1.1.2;NATDstPort(1010)=80;SrcZoneName(1025)=in;DstZoneName(1035)=out;PolicyName(1079)=p1;URLParentCategory(1128)=SearchEngines&Portals;URLCategory(1094)=SearchEngines&Portals;URL(1093)=news.sohu.com/upload/itoolbar/itoolbar.index.loader.20140923.js;VistTime(1114)=1480688515;Client(1110)=;Action(1053)=Drop;VlanID(1175)=400;VNI(1213)=--;SrcLocation(1209)=China Macao;DstLocation(1214)=SaintKittsandNevis;
You can take an action according to the value of the URLCategory(1094) field.
If the value of the URLCategory(1094) field is BlackList, it indicates that the website was blocked because it matched a URL blacklist. In this case, you can troubleshoot the issue by following the procedure in step 1 for the situation where the Reason field displays The URL of the website hit the URL blacklist.
If the value of the URLCategory(1094) field is a category name, it indicates that the website was blocked because it matched a user-defined or predefined URL category. In this case, you can execute the display url-filter category verbose command, and check the Type field to identify the type of the URL category.
<Sysname> display url-filter category verbose
URL category statistics:
Predefined categories: 53
Predefined rules: 2000
User-defined categories: 5
User-defined rules: 4
URL category details:
Name: category1
Type: User defined
Severity: 1001
Rules: 1
Description:
Name: Pre-AdvertisementsAndPop-Ups
Type: Predefined
Severity: 300
Rules: 32
Description: Sites that provide advertising graphics or other ad content files such as banners and pop-ups.
If the Type field displays User defined, you can troubleshoot the issue by following the procedure in step 1 for the situation where the Reason field displays The URL of the website hit a user-defined URL category.
If the Type field displays Predefined, you can troubleshoot the issue by following the procedure in step 1 for the situation where the Reason field displays The URL of the website hit a predefined URL category.
¡ UFLT_NOT_MATCH_IPV4_LOG
The device generates this log message when the following conditions exist:
- URL whitelist-only filtering is enabled, and the accessed URL is not added to the URL whitelist.
- The default action is taken on a packet because it does not match any URL filtering rule.
The following is an example of the log message:
UFLT/6/UFLT_NOT_MATCH_IPV4_LOG:Protocol(1001)=TCP;Application(1002)=SouhuNews;UserName(1113)=;SrcMacAddr(1021)=08-00-27-11-93-78;SrcIPAddr(1003)=112.1.1.2;SrcPort(1004)=3887;NATSrcIPAddr(1005)=112.1.1.2;NATSrcPort(1006)=3887;DstIPAddr(1007)=114.1.1.2;DstPort(1008)=80;NATDstIPAddr(1009)=114.1.1.2;NATDstPort(1010)=80;SrcZoneName(1025)=in;DstZoneName(1035)=out;PolicyName(1079)=p1;URLParentCategory(1128)=-;URLCategory(1094)=Unknown;URL(1093)=news.sohu.com/upload/itoolbar/index/toolbar_bg_130315.gif;VistTime(1114)=1480691551;Client(1110)=;Action(1053)=Drop;VlanID(1175)=400;VNI(1213)=--;SrcLocation(1209)=China Macao;DstLocation(1214)=SaintKittsandNevis;
In URL filtering policy view, execute the display this command to identify whether URL whitelist-only filtering is enabled.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] display this
#
url-filter policy p1
whitelist-only enable
#
- If URL whitelist-only filtering is enabled, it indicates that the URL was blocked because it did not match a whitelist rule. In this case, you can troubleshoot the issue by following the procedure in step 1 for the situation where the Reason field displays No matching whitelist entry was found for the website in whitelist mode.
- If URL whitelist-only filtering is not enabled, it indicates that the URL was blocked because it did not match a filtering rule and the default action was taken. In this case, you can troubleshoot the issue by following the procedure in step 1 for the situation where the Reason field displays The URL of the website did not match any accessible URL category.
3. Identify whether the URL filtering configuration is correct.
When a packet matches filtering rules in multiple URL categories, the device takes the action specified for the category with the highest severity level.
a. Execute the display this command in URL filtering policy view.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] display this
category test1 action permit logging
category test2 action drop logging
b. Enter the view of each URL category to identify whether the same URL filtering rule exists in multiple URL categories.
View the configuration of URL category test1.
[Sysname] url-filter category test1
[Sysname-url-filter-category-test1] display this
#
url-filter category test1 severity 2000
rule 1 host text 192.22.2.61
rule 2 host text *185*
#
View the configuration of URL category test2.
[Sysname] url-filter category test2
[Sysname-url-filter-category-test2] display this
#
url-filter category test2 severity 3000
rule 1 host text 192.22.2.61
#
In this example, two URL categories (test1 and test2) exist in the URL filtering policy, and each of them contains a filtering rule that matches 192.22.2.61.
Because the severity of URL category test2 is higher than that of URL category test1, the action (drop) for URL category test2 is taken.
Use any of the following methods to resolve the issue:
Modify the severity level of URL category test1 to be higher than that of URL category test2.
[Sysname] url-filter category test1 severity 3001
Modify the action for URL category test2 to permit.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] category test2 action permit
Delete the filtering rule from URL category test2.
[Sysname] url-filter category test2
[Sysname-url-filter-category-test2] undo rule 1
In URL filtering policy view, add a whitelist rule to match 192.22.2.61.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] add whitelist 1 host text 192.22.2.61
If the issue persists, proceed to step 4.
4. Identify whether the security policy uses a correct URL filtering policy.
Execute the display this command in security policy view to identify whether a DPI application profile is applied.
[Sysname] security-policy ip
[Sysname-security-policy-ip] display this
#
security-policy ip
rule 10 name abc
action pass
source-zone Trust
destination-zone Untrust
profile url-filter
#
return
In DPI application profile url-filter, identify whether the correct URL filtering policy is applied.
[Sysname] app-profile url-filter
[Sysname-app-profile-url-filter] display this
#
app-profile url-filter
url-filter apply policy p1
#
return
¡ If the applied URL filtering policy is not the correct one, execute the url-filter apply policy command in DPI application profile view to apply the correct one. If the issue persists, proceed to step 5.
¡ If the applied URL filtering policy is the correct one, proceed to step 5.
5. Collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration file.
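The severity rule described in step 3 can be illustrated with a short Python sketch. It is not the device's implementation: the `Category` class, the `effective_action` function, the `fnmatch`-style wildcard matching, and the `default_action` parameter are assumptions made for illustration only. The category names, severities, actions, and rules are taken from the example in step 3.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Category:
    """Illustrative model of a URL category referenced by a URL filtering policy."""
    name: str
    severity: int
    action: str                      # "permit" or "drop"
    host_rules: list = field(default_factory=list)


def effective_action(host, categories, default_action="permit"):
    """Return (action, category name) of the highest-severity category that matches host.

    Wildcard matching with fnmatchcase is a simplification, not the device's matcher.
    """
    matches = [c for c in categories
               if any(fnmatchcase(host, pattern) for pattern in c.host_rules)]
    if not matches:
        return default_action, None
    winner = max(matches, key=lambda c: c.severity)
    return winner.action, winner.name


test1 = Category("test1", 2000, "permit", ["192.22.2.61", "*185*"])
test2 = Category("test2", 3000, "drop", ["192.22.2.61"])

# Both categories match, and test2 has the higher severity.
print(effective_action("192.22.2.61", [test1, test2]))   # ('drop', 'test2')

# Raising the severity of test1 above that of test2 changes the result.
test1.severity = 3001
print(effective_action("192.22.2.61", [test1, test2]))   # ('permit', 'test1')
```

The sketch shows why any of the listed methods resolves the issue: each one changes either which category wins or what the winning category does.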
Related alarm and log messages
Alarm messages
N/A
Log messages
· UFLT/6/UFLT_MATCH_IPV4_LOG
· UFLT/6/UFLT_MATCH_IPV6_LOG
· UFLT/6/UFLT_NOT_MATCH_IPV4_LOG
· UFLT/6/UFLT_NOT_MATCH_IPV6_LOG
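When many of these log messages are collected on a log host, the checks in step 2 can be scripted. The following Python sketch splits a fast log message into its `Name(ID)=value;` fields and maps the message to the likely cause described in step 2. The function names `parse_uflt_log` and `likely_cause` are hypothetical, the sample message is a shortened copy of the example in step 2, and the `whitelist_only` flag must be supplied by the operator after checking the policy with display this.

```python
import re

# One field looks like: URLCategory(1094)=SearchEngines&Portals;
FIELD_RE = re.compile(r"(\w+)\((\d+)\)=([^;]*);")


def parse_uflt_log(line):
    """Return (mnemonic, {field name: value}) for a UFLT fast log message."""
    header, _, body = line.partition(":")
    mnemonic = header.rsplit("/", 1)[-1]
    fields = {name: value for name, _field_id, value in FIELD_RE.findall(body)}
    return mnemonic, fields


def likely_cause(mnemonic, fields, whitelist_only=False):
    """Map a parsed message to the cause described in step 2 (illustrative wording)."""
    if mnemonic.startswith("UFLT_NOT_MATCH"):
        if whitelist_only:
            return "no matching whitelist entry in whitelist-only mode"
        return "no rule matched; default action applied"
    category = fields.get("URLCategory", "")
    if category == "BlackList":
        return "URL blacklist hit"
    return f"URL category hit: {category}"


sample = ("UFLT/6/UFLT_MATCH_IPV4_LOG:Protocol(1001)=TCP;PolicyName(1079)=p1;"
          "URLCategory(1094)=SearchEngines&Portals;"
          "URL(1093)=news.sohu.com/upload/itoolbar/itoolbar.index.loader.20140923.js;"
          "Action(1053)=Drop;")
mnemonic, fields = parse_uflt_log(sample)
print(mnemonic, fields["PolicyName"], likely_cause(mnemonic, fields))
```

For a category hit, the Type field in the display url-filter category verbose output still has to be checked to tell a user-defined category from a predefined one.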
Failed to block an HTTP website
Symptom
As shown in Figure 201, the device is connected to a LAN through security zone Trust and to the Internet through security zone Untrust. The device performs URL filtering when the host accesses the Internet. After URL filtering is configured, some websites cannot be blocked.
Common causes
The following are the common causes of this type of issue:
· In a network with asymmetric forwarding of flows, support for HA dual-active mode is disabled.
· The status of the DPI engine is abnormal.
· The security policy configuration is incorrect.
· The URL filtering configuration is incorrect. For example, the website is added to a whitelist rule.
Analysis
The troubleshooting flow for issues of this type is as follows:
1. In a network with asymmetric forwarding of flows, identify whether support for HA dual-active mode is enabled.
2. Identify whether the status of the DPI engine is normal.
3. Identify whether the security policy configuration is correct.
4. Identify whether the URL filtering configuration is correct.
Figure 202 shows the troubleshooting flowchart.
Figure 202 Flowchart for troubleshooting HTTP website block failure
Solution
1. In a network with asymmetric forwarding of flows (for example, an HA dual-active network), identify whether support for HA dual-active mode is enabled.
Execute the display remote-backup-group status command on the device, and check the Backup mode field.
<Sysname> display remote-backup-group status
Remote backup group information:
Backup mode: Dual-active
Device management role: Primary
Device running status: Active
If the Backup mode field displays Dual-active, execute the inspect dual-active enable command in system view to enable support for HA dual-active mode. If the issue persists, proceed to step 2.
[Sysname] display current-configuration | include dual-active
inspect dual-active enable
If the Backup mode field displays another value, proceed to step 2.
2. Identify whether the status of the DPI engine is normal.
Execute the display inspect status command, and check the Running status field.
<Sysname> display inspect status
Running status: Normal
¡ If the Running status field displays Normal, proceed to step 3.
¡ If the Running status field displays DPI administratively disabled, it indicates that the administrator disabled the DPI engine. Execute the undo inspect bypass command in system view to enable the DPI engine. If the issue persists, proceed to step 3.
¡ If the Running status field displays DPI auto-bypass for protocol http, it indicates that the DPI engine automatically disabled inspection for the HTTP protocol. Execute the undo inspect bypass protocol command to enable inspection for the HTTP protocol. If the issue persists, proceed to step 3.
¡ If the Running status field displays DPI disabled due to high CPU usage, it indicates that the DPI engine was disabled due to high CPU usage. Reduce the CPU usage, and then execute the undo inspect bypass command to enable the DPI engine. If the issue persists, proceed to step 3.
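The status values above and their remedies can be kept in a small lookup table, for example in a health-check script. The following Python sketch is only an illustration: the `REMEDIATION` table and the `remediation_for` function are hypothetical names, and the status strings and commands are copied from the bullets above without any additional command arguments.

```python
# Running status value -> command named in this step (None means no action is needed).
REMEDIATION = {
    "Normal": None,
    "DPI administratively disabled": "undo inspect bypass",
    "DPI auto-bypass for protocol http": "undo inspect bypass protocol",
    # Reduce the CPU usage first, as described above.
    "DPI disabled due to high CPU usage": "undo inspect bypass",
}


def remediation_for(display_output):
    """Return (status, command) from the output of display inspect status."""
    for line in display_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Running status":
            status = value.strip()
            return status, REMEDIATION.get(status, "unrecognized status")
    raise ValueError("Running status field not found")


print(remediation_for("Running status: Normal"))
print(remediation_for("Running status: DPI administratively disabled"))
```

A status that is not in the table is reported as unrecognized so that it can be included in the information collected for Technical Support.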
3. Identify whether the security policy uses a correct URL filtering policy.
Execute the display this command in security policy view to identify whether a DPI application profile is applied.
[Sysname] security-policy ip
[Sysname-security-policy-ip] display this
#
security-policy ip
rule 10 name abc
action pass
source-zone Trust
destination-zone Untrust
profile url-filter
#
return
In DPI application profile url-filter, identify whether the correct URL filtering policy is applied.
[Sysname] app-profile url-filter
[Sysname-app-profile-url-filter] display this
#
app-profile url-filter
url-filter apply policy p1
#
return
¡ If the applied URL filtering policy is not the correct one, execute the url-filter apply policy command in DPI application profile view to apply the correct one. If the issue persists, proceed to step 4.
¡ If the applied URL filtering policy is the correct one, proceed to step 4.
4. Identify whether the URL filtering configuration is correct.
If whitelist rules, blacklist rules, and URL categories are all configured, the highest-priority match takes effect, which can cause unexpected filtering results. The priority order is whitelist rule > blacklist rule > user-defined URL category > predefined URL category.
a. Identify whether a whitelist rule is configured.
Execute the display this command in URL filtering policy view.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] display this
add whitelist 1 host text 192.22.2.61
Execute the undo add whitelist 1 command in URL filtering policy view to delete the whitelist rule.
- If the issue persists after the configuration is modified, proceed to step b.
- If no whitelist rule is configured, proceed to step b.
b. Identify whether multiple URL categories exist and whether they contain the same URL filtering rule.
When a packet matches filtering rules in multiple URL categories, the device takes the action specified for the category with the highest severity level.
Execute the display this command in URL filtering policy view to identify whether multiple URL categories exist.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] display this
category test1 action drop logging
category test2 action permit logging
Enter the view of each URL category to identify whether the same URL filtering rule exists in multiple URL categories.
View the configuration of URL category test1.
[Sysname] url-filter category test1
[Sysname-url-filter-category-test1] display this
#
url-filter category test1 severity 2000
rule 1 host text 192.22.2.61
rule 2 host text *185*
#
View the configuration of URL category test2.
[Sysname] url-filter category test2
[Sysname-url-filter-category-test2] display this
#
url-filter category test2 severity 3000
rule 1 host text 192.22.2.61
#
In this example, two URL categories (test1 and test2) exist in the URL filtering policy, and each of them contains a filtering rule that matches 192.22.2.61.
Because the severity of URL category test2 is higher than that of URL category test1, the action (permit) for URL category test2 is taken.
Use any of the following methods to resolve the issue:
Modify the severity level of URL category test1 to be higher than that of URL category test2.
[Sysname] url-filter category test1 severity 3001
Modify the action for URL category test2 to drop.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] category test2 action drop
Delete the filtering rule from URL category test2.
[Sysname] url-filter category test2
[Sysname-url-filter-category-test2] undo rule 1
In the URL filtering policy, add a blacklist rule to match 192.22.2.61.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] add blacklist 1 host text 192.22.2.61
- If the issue persists, proceed to step c.
- If the URL filtering policy does not contain multiple URL categories or the URL categories do not contain the same URL filtering rule, proceed to step c.
c. Identify whether the action for a URL category is permit.
Execute the display this command in URL filtering policy view.
[Sysname-url-filter-policy-p1] display this
category test2 action permit logging
View the configuration of URL category test2.
[Sysname] url-filter category test2
[Sysname-url-filter-category-test2] display this
#
url-filter category test2 severity 3000
rule 1 host text 192.22.2.61
If the URL category contains a rule that matches the website, execute the undo rule 1 command in URL category view to delete the filtering rule.
- If the issue persists, proceed to step 5.
- If no user-defined URL category exists or the action of the user-defined URL category is not permit, proceed to step 5.
5. Collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration file.
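The priority order stated in step 4 (whitelist rule > blacklist rule > user-defined URL category > predefined URL category) explains why a website can remain accessible even though a category with the drop action matches it. The following Python sketch models only that ordering. The `PRECEDENCE` list and the `deciding_match` function are hypothetical names, and the input dictionary is a simplified stand-in for the matches that a URL produces.

```python
# Match types in descending priority, as stated in step 4.
PRECEDENCE = [
    "whitelist rule",
    "blacklist rule",
    "user-defined URL category",
    "predefined URL category",
]


def deciding_match(matches):
    """Return the (match type, action) pair that takes effect, or None if nothing matched.

    matches maps a match type to the action configured for it.
    """
    for kind in PRECEDENCE:
        if kind in matches:
            return kind, matches[kind]
    return None


# A whitelist rule for 192.22.2.61 hides the drop action of a user-defined category.
print(deciding_match({
    "user-defined URL category": "drop",
    "whitelist rule": "permit",
}))   # ('whitelist rule', 'permit')
```

This is why step 4 checks for a whitelist rule first: removing it lets the lower-priority blacklist rule or URL category decide the result.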
Related alarm and log messages
Alarm messages
N/A
Log messages
· UFLT/6/UFLT_MATCH_IPV4_LOG
· UFLT/6/UFLT_MATCH_IPV6_LOG
· UFLT/6/UFLT_NOT_MATCH_IPV4_LOG
· UFLT/6/UFLT_NOT_MATCH_IPV6_LOG
Failed to access or block an HTTPS website
Symptom
As shown in Figure 203, the device is connected to a LAN through security zone Trust and to the Internet through security zone Untrust. The device performs URL filtering when the host accesses the Internet. After URL filtering is configured, an HTTPS website cannot be accessed or blocked.
Common causes
To enable URL filtering on HTTPS traffic, use either of the following methods:
· Enable HTTPS URL filtering. This feature performs URL filtering on undecrypted HTTPS traffic. The device directly detects the Client Hello message from the client, and extracts the server name from the Server Name Indication (SNI) extension to match the URL filtering policy.
· Configure SSL decryption. This feature decrypts the HTTPS traffic and then performs HTTP URL filtering on the decrypted traffic. For more information about SSL decryption, see proxy policy configuration in DPI Configuration Guide.
If both of them are configured, only SSL decryption takes effect.
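To make the first method concrete, the following Python sketch shows how a server name can be read from the SNI extension of a TLS Client Hello message without decrypting anything. It illustrates the general technique only and does not represent the device's implementation. The function names `extract_sni` and `build_client_hello` are hypothetical, `build_client_hello` exists only to produce test input, and the parser assumes that the whole Client Hello fits in a single TLS record.

```python
import struct


def build_client_hello(hostname):
    """Build a minimal TLS record with a Client Hello that carries only an SNI extension."""
    name = hostname.encode("ascii")
    entry = b"\x00" + struct.pack("!H", len(name)) + name        # name_type 0 = host_name
    sni_body = struct.pack("!H", len(entry)) + entry             # server_name_list
    ext = struct.pack("!HH", 0x0000, len(sni_body)) + sni_body   # extension type 0 = server_name
    body = (b"\x03\x03" + bytes(32)                              # version, random
            + b"\x00"                                            # empty session ID
            + struct.pack("!H", 2) + b"\x00\x2f"                 # one cipher suite
            + b"\x01\x00"                                        # null compression
            + struct.pack("!H", len(ext)) + ext)
    handshake = b"\x01" + len(body).to_bytes(3, "big") + body    # type 1 = Client Hello
    return b"\x16\x03\x01" + struct.pack("!H", len(handshake)) + handshake


def extract_sni(record):
    """Return the host name from the SNI extension, or None if it cannot be found."""
    try:
        if record[0] != 0x16 or record[5] != 0x01:   # handshake record, Client Hello
            return None
        pos = 5 + 4 + 2 + 32                         # headers, version, random
        pos += 1 + record[pos]                       # session ID
        pos += 2 + struct.unpack_from("!H", record, pos)[0]   # cipher suites
        pos += 1 + record[pos]                       # compression methods
        end = pos + 2 + struct.unpack_from("!H", record, pos)[0]
        pos += 2
        while pos + 4 <= min(end, len(record)):
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:                        # server_name extension
                if record[pos + 2] != 0:             # only host_name entries are handled
                    return None
                name_len = struct.unpack_from("!H", record, pos + 3)[0]
                return record[pos + 5:pos + 5 + name_len].decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None


print(extract_sni(build_client_hello("example1.com")))   # example1.com
```

Because only the host name is visible in the Client Hello, this method can match host-based rules but not the full URL path, which is one reason SSL decryption is offered as the second method.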
The following are the common causes of this type of issue:
· Neither HTTPS URL filtering nor SSL decryption is configured.
· The SSL decryption configuration is incorrect.
· In a client protection scenario, the trusted SSL decryption certificate is not installed on the client browser. In this case, SSL proxying will fail, and the HTTPS website cannot be accessed.
· The HTTPS website is not added to the user-defined SSL hostname whitelist, and the device cannot pass the verification of the client or server. In this case, SSL proxying will fail, and the HTTPS website cannot be accessed.
· The security policy configuration is incorrect.
Analysis
The troubleshooting flow for issues of this type is as follows:
1. Identify whether HTTPS URL filtering is enabled.
2. Identify whether the SSL decryption configuration is correct.
3. Identify whether the trusted SSL decryption certificate is installed on the client browser.
4. Identify whether the hostname of the website is added to the user-defined SSL hostname whitelist.
5. Identify whether the security policy configuration is correct.
Figure 204 shows the troubleshooting flowchart.
Figure 204 Flowchart for troubleshooting HTTPS website access or block failure
Solution
1. Identify whether HTTPS URL filtering is enabled.
In URL filtering policy view, execute the display this command.
[Sysname] url-filter policy p1
[Sysname-url-filter-policy-p1] display this
#
url-filter policy p1
https-filter enable
#
¡ If HTTPS URL filtering is enabled, proceed to step 2.
¡ If HTTPS URL filtering is not enabled, use the https-filter enable command to enable HTTPS URL filtering. If the issue persists, proceed to step 2.
2. Identify whether SSL decryption is configured and whether the configuration is correct.
Execute the display app-proxy-policy command to identify whether SSL decryption is configured.
<Sysname> display app-proxy-policy
Default action: no-proxy
Rule with ID 0 and name rule0:
Action: ssl-decrypt
Status:Enabled
Protect mode: client
Match criteria:
Source security zones: trust
Destination security zones: trust
Source IP address object groups: srcobj
Destination IP address object groups: destobj
Service object groups: serviceobj
Users: user1
User groups: usergroup1
Identify whether the configured match criteria are correct, such as source/destination security zones and source/destination IP addresses.
¡ If the configured match criteria are correct, proceed to step 3.
¡ If the configured match criteria are incorrect or SSL decryption is not configured, modify the match criteria or configure SSL decryption (see proxy policy configuration in DPI Configuration Guide). If the issue persists, proceed to step 3.
3. Identify whether the trusted SSL decryption certificate is installed on the client browser.
If the SSL decryption protection mode for a proxy policy rule is client protection, you must import the SSL decryption certificate that is marked as trusted on the device to the user's browser (under Trusted Root Certification Authorities). If this certificate is not imported, the device fails certificate verification during SSL proxy operations. The browser also displays warnings such as website security certificate problems or certificate errors, and some websites might become inaccessible.
Identify whether the trusted SSL decryption certificate is imported under Trusted Root Certification Authorities.
¡ If no, follow the browser's certificate import wizard instructions to import the certificate under Trusted Root Certification Authorities in the certificate list.
¡ If yes, proceed to step 4.
4. Identify whether the hostname of the website is added to the user-defined SSL hostname whitelist.
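In scenarios that require SSL client authentication or in-depth server certificate inspection, the device (acting as an SSL proxy) cannot pass SSL client or server verification. In this case, SSL proxying fails unless the hostname of the website is added to the user-defined SSL hostname whitelist.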
Execute the display app-proxy ssl whitelist hostname user-defined command.
<Sysname> display app-proxy ssl whitelist hostname user-defined
Hostname
example1.com
example2.com
¡ If the hostname of the website is added to the user-defined SSL hostname whitelist, proceed to step 5.
¡ If the hostname of the website is not added to the user-defined SSL hostname whitelist, execute the app-proxy ssl whitelist user-defined-hostname command in system view to add it. If the issue persists, proceed to step 5.
5. Identify whether the security policy configuration is correct.
When you configure the SSL decryption function, you must allow communication between the source security zone, destination security zone, and the Local zone in the security policy. This ensures that the device can act as an SSL proxy server/client, proxying the traffic between the client and the server.
Execute the display security-policy ip command to check the security policy configuration.
<Sysname> display security-policy ip
Security-policy ip
rule 3 name trust-local
action pass
source-zone Trust
source-zone Local
destination-zone Local
destination-zone Trust
rule 4 name untrust-local
action pass
source-zone Untrust
source-zone Local
destination-zone Local
destination-zone Untrust
---- More ----
¡ If the source security zone, destination security zone, and the Local zone can communicate with one another, proceed to step 6.
¡ If the source security zone, destination security zone, and the Local zone cannot communicate with one another, configure them to communicate with one another in a security policy rule. If the issue persists, proceed to step 6.
6. Collect the following information and contact Technical Support:
¡ Results of each step.
¡ Configuration file.
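The zone check in step 5 can be sketched as a set comparison. The following Python example is a simplification with several assumptions: rules are modeled as plain dictionaries, only the action and the source and destination zones are considered (address, service, and user criteria and rule order are ignored), and "communication between the zones" is interpreted as both directions between each zone and Local, as in the sample output of step 5. The function name `missing_zone_pairs` is hypothetical.

```python
def missing_zone_pairs(rules, client_zone, server_zone):
    """Return the (source, destination) zone pairs that no pass rule covers."""
    required = {
        (client_zone, "Local"), ("Local", client_zone),
        (server_zone, "Local"), ("Local", server_zone),
    }
    allowed = set()
    for rule in rules:
        if rule["action"] != "pass":
            continue
        for src in rule["source"]:
            for dst in rule["destination"]:
                allowed.add((src, dst))
    return sorted(required - allowed)


# Rules 3 and 4 from the sample output in step 5.
rules = [
    {"name": "trust-local", "action": "pass",
     "source": ["Trust", "Local"], "destination": ["Local", "Trust"]},
    {"name": "untrust-local", "action": "pass",
     "source": ["Untrust", "Local"], "destination": ["Local", "Untrust"]},
]
print(missing_zone_pairs(rules, "Trust", "Untrust"))       # []
print(missing_zone_pairs(rules[:1], "Trust", "Untrust"))   # Untrust/Local pairs are missing
```

An empty result corresponds to the first outcome of step 5. Any listed pair indicates a direction that a security policy rule must still permit.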
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Data analysis center issues
Failure to display certain DPI service logs and report data on the webpage
Symptom
After the user logs in to the device webpage, no log data is displayed on a certain DPI service log page, and no statistics are available for this type of service in the generated reports. For example, the threat log page does not display any data, and the generated summary reports do not include threat statistics or trend data.
Common causes
The following are the common causes of this type of issue:
· The log collection feature in the data analysis center module for the service has not been enabled.
· The service configuration is incorrect, resulting in the system not generating logs. For example, the IPS service is not configured with the logging action.
· The user traffic did not hit the service policy, so the system did not generate log messages.
· The system time of the device is configured incorrectly.
Troubleshooting flow
Troubleshoot the issue by using the following process:
1. Identify whether the log collection feature is enabled for the specified service in the data analysis center module.
2. Identify whether the logging action is enabled for the specified service.
3. Check the inspection rule hit statistics in the DPI engine, and identify whether user traffic hits the inspection rules of the specified service.
4. Identify whether the device's system time is correct.
Figure 205 shows the troubleshooting flowchart.
Solution
1. Identify whether the log collection feature is enabled for the specified service in the data analysis center module.
After a service module processes a packet, it sends the generated log message to the data analysis center. The data analysis center extracts relevant data for summarization and analysis, and displays the results on the log, trend, and statistics pages on the Monitor tab of the Web interface. The data analysis center extracts information from the log messages of a service and carries out subsequent processing only after you enable the log collection feature for that service in the data analysis center module.
You can execute the display dac log-collect command to check the enabling status of the log collection feature for each service.
<Sysname> system-view
[Sysname] display dac log-collect all
Service type Service Status
Slot 1:
dpi audit Disabled
dpi ffilter Disabled
dpi threat Enabled
dpi traffic Disabled
dpi uflt Disabled
Table 24 Command output
Field | Description
Service | Service name.
Status | Status of the log collection feature: Disabled or Enabled.
In this example, the dpi threat entry represents the log collection feature for threat services (including IPS and anti-virus services). The Status field indicates the state of the log collection feature, and you can perform the following operations based on this field value:
¡ If the field value is Enabled, the log collection feature of the threat service is enabled. Proceed to step 2.
¡ If the field value is Disabled, execute the dac log-collect enable command to enable the log collection feature for a service.
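If the command output is captured to a file or through a management script, the services that still need log collection enabled can be listed automatically. The following Python sketch parses output in the layout shown above. The function name `disabled_services` is hypothetical, and the parser assumes that each data row consists of exactly three whitespace-separated columns.

```python
def disabled_services(output):
    """Return 'type service' strings for rows whose Status column is Disabled."""
    disabled = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows look like: dpi threat Enabled
        if len(parts) == 3 and parts[2] in ("Enabled", "Disabled"):
            service_type, service, status = parts
            if status == "Disabled":
                disabled.append(f"{service_type} {service}")
    return disabled


sample = """Service type Service Status
Slot 1:
dpi audit Disabled
dpi ffilter Disabled
dpi threat Enabled
dpi traffic Disabled
dpi uflt Disabled"""
print(disabled_services(sample))
# ['dpi audit', 'dpi ffilter', 'dpi traffic', 'dpi uflt']
```

The header line and the slot line are skipped because they do not end with a status value in a three-column row.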
2. Identify whether the logging action is enabled for the specified service.
The system logs information for a service only if the logging action is enabled in the policy configuration of that service and a packet matches the policy. For example, for the IPS service, you can execute a command to identify whether the logging action is enabled in an IPS policy. For information about the logging action of other services, see the configuration guide for each service.
You can execute the display ips policy command to identify whether the actions specified for signatures in an IPS policy include logging.
<Sysname> display ips policy aa
Total signatures :10929 failed:0
Pre-defined signatures:10925 failed:0
Snort signatures :0 failed:0
User-config signatures:0 failed:0
Flag:
B: Block-Source D: Drop P: Permit Rs: Reset Rd: Redirect C: Capture L: Logging
Pre: predefined Snort: Snort User: user-config
Type RuleID Target SubTarget Severity Direction Category
SubCategory Status Action
Pre 1 OperationSystem LinuxUnix High Server Vulnerability
RemoteCodeExecu Enable RsL
Pre 2 OperationSystem LinuxUnix High Server Vulnerability
MemoryCorruptio Enable RsL
Pre 4 OfficeSoftware MicrosoftOffice High Any Vulnerability
Overflow Enable RsL
Pre 5 OfficeSoftware MicrosoftOffice High Any Vulnerability
MemoryCorruptio Enable RsL
Pre 6 Browser InternetExplore High Any Vulnerability
---- More ----
Table 25 Command output
Field | Description
Total signatures | Total number of IPS signatures.
Pre-defined signatures | Number of predefined IPS signatures.
User-config signatures | Number of user-defined signatures that are configured manually.
Snort signatures | Number of Snort signatures.
Type | Type of IPS signature: Pre (predefined signatures), User (user-defined signatures that are configured manually), or Snort (Snort signatures imported from Snort files).
RuleID | IPS signature ID.
Target | Attack target.
SubTarget | Attack subtarget.
Severity | Attack severity level of the signature. In ascending order of severity, the levels are Low, Medium, High, and Critical.
Direction | Direction attribute in an IPS signature: Any (both server-to-client and client-to-server directions), Client (server-to-client direction), or Server (client-to-server direction).
Category | Attack category of the IPS signature.
SubCategory | Attack subcategory of the IPS signature.
Status | Status of the IPS signature: Enabled or Disabled.
Action | Actions for matching packets: Block-source (drops matching packets and adds the sources of the packets to the IP blacklist), Drop (drops matching packets), Permit (permits matching packets to pass), Reset (closes the TCP or UDP connections for matching packets by sending TCP reset messages or ICMP port unreachable messages), Redirect (redirects matching packets to a webpage), Capture (captures matching packets), and Logging (logs matching packets).
The Action field shows the actions of an IPS signature, encoded with the flags listed in the Flag legend of the output. In this example, RsL indicates Reset and Logging.
¡ If this field value includes logging (L), the device performs the logging action when a packet matches the signature. Proceed to step 3.
¡ If this field value does not include logging (L), the device does not perform the logging action when a packet matches the signature. You can modify the actions of the signature through the following operations:
- In IPS policy view, execute the signature override logging command to specify the action for the signature in the IPS policy as logging.
- In IPS policy view, execute the signature override all logging command to specify the action for all the signatures in the IPS policy as logging.
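The Action column packs several flags into one string, and two of the flags (Rs and Rd) are two characters long. The following Python sketch decodes such a string by using the Flag legend from the command output and reports whether logging is included. The names `FLAGS`, `decode_actions`, and `has_logging` are hypothetical.

```python
import re

# Flag legend from the display ips policy output.
FLAGS = {
    "B": "Block-Source", "D": "Drop", "P": "Permit",
    "Rs": "Reset", "Rd": "Redirect", "C": "Capture", "L": "Logging",
}
# Two-character flags are listed first so that they are matched before single letters.
TOKEN_RE = re.compile(r"Rs|Rd|B|D|P|C|L")


def decode_actions(action_field):
    """Translate an Action value such as 'RsL' into a list of action names."""
    tokens = TOKEN_RE.findall(action_field)
    if "".join(tokens) != action_field:
        raise ValueError(f"unrecognized flag in {action_field!r}")
    return [FLAGS[token] for token in tokens]


def has_logging(action_field):
    """Return True if the Action value includes the logging flag (L)."""
    return "Logging" in decode_actions(action_field)


print(decode_actions("RsL"))   # ['Reset', 'Logging']
print(has_logging("D"))        # False
```

A signature whose Action value decodes without Logging is a candidate for the signature override commands described above.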
3. Check the statistics about inspection rules hits in the DPI engine, and identify whether the traffic has hit the specified services.
In probe view, execute the display system internal inspect hit-statistics command to display statistics for inspection rule matching of DPI services.
<Device> system-view
[Device] probe
[Device-probe] display system internal inspect hit-statistics
Slot 2:
Rule ID Module Rule hits AC hits PCRE try PCRE hits
2147483649 FFILTER 10 10 0 0
2147483679 FFILTER 5 50 0 0
31 APR 2 2 0 0
932 APR 0 8269 0 0
104 IPS 0 86 0 0
120 IPS 0 6 0 0
183 IPS 0 19 0 0
401 IPS 0 2 0 0
4817 APR 0 1 3 0
6503 APR 0 3 0 0
Table 26 Command output
Field | Description
Rule ID | ID of each service inspection rule.
Module | Service name.
Rule hits | Number of inspection rule hits.
AC hits | Number of AC inspection rule hits.
PCRE try | Number of regular expression match attempts.
PCRE hits | Number of regular expression hits.
In this example, the rows whose Module field is IPS represent the inspection rule hits of the IPS service. The non-zero values in the AC hits field indicate that traffic has hit the IPS service. You can check the inspection rule hit status for a service as needed:
¡ If the values for the Rule hits, AC hits, and PCRE hits fields are all zero, no traffic has hit service inspection rules. Identify whether the policy configuration for the specified service is correct. For more information about the service policy configuration, see the configuration guide for each service.
¡ If at least one of the Rule hits, AC hits, and PCRE hits fields has a non-zero value, traffic has hit service inspection rules. Then, proceed to step 4.
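The decision in step 3 can be applied per service by summarizing the command output. The following Python sketch reports, for each value in the Module column, whether any of the Rule hits, AC hits, or PCRE hits counters is non-zero. The function name `modules_with_hits` is hypothetical, the parser assumes six whitespace-separated columns per data row, and the last row of the sample (module UFLT) is invented to show a service without hits.

```python
from collections import defaultdict


def modules_with_hits(output):
    """Return {module: True/False}, True if any hit counter of the module is non-zero."""
    hit = defaultdict(bool)
    for line in output.splitlines():
        parts = line.split()
        # Data rows: Rule ID, Module, Rule hits, AC hits, PCRE try, PCRE hits
        if len(parts) == 6 and parts[0].isdigit() and all(p.isdigit() for p in parts[2:]):
            module = parts[1]
            rule_hits, ac_hits, _pcre_try, pcre_hits = map(int, parts[2:])
            hit[module] |= any((rule_hits, ac_hits, pcre_hits))
    return dict(hit)


sample = """Slot 2:
Rule ID Module Rule hits AC hits PCRE try PCRE hits
2147483649 FFILTER 10 10 0 0
104 IPS 0 86 0 0
120 IPS 0 6 0 0
9001 UFLT 0 0 2 0"""
print(modules_with_hits(sample))
# {'FFILTER': True, 'IPS': True, 'UFLT': False}
```

The PCRE try counter is ignored on purpose because step 3 evaluates only the Rule hits, AC hits, and PCRE hits fields.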
4. Identify whether the device's system time is correct.
Execute the display clock command to identify whether the device's system time is correct. If the system time is incorrect, the specified query time range will not match the logging time, preventing the viewing of log messages within a certain period.
<Device> display clock
19:34:01 beijing Thu 12/08/2022
Time Zone : beijing add 08:00:00
The first line of the output shows the system time. Identify whether the time is correct and perform the following operations as needed:
¡ If the system time is correct, proceed to step 5.
¡ If the system time is incorrect, execute the clock datetime command to modify the system time. After you modify the system time, the service generates new log messages only when new traffic hits the service, and the Web interface can then display those messages. Log data appears after a delay of about 10 to 20 seconds, and data on the trend and statistics pages appears after a delay of about five minutes, so wait a short period before you view the new data. If log aggregation is configured, a longer aggregation time results in a longer delay. To configure log aggregation, click the Log aggregation settings button on each log page on the Monitor tab of the Web interface, and then enable log aggregation and set the aggregation time.
The delays above are estimates under ideal conditions. Under heavier data collection load, the delays might increase.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Result of the execution of the display system internal dac statistics log-collect command in probe view.
¡ The configuration files of the device.
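The system time check in step 4 can be automated by comparing the first line of the display clock output with a trusted reference time. The following Python sketch assumes the layout shown in step 4 (time, time zone name without spaces, weekday, date) and the MM/DD/YYYY date order, which is consistent with the weekday in the sample (December 8, 2022 was a Thursday). The function names and the five-minute tolerance are hypothetical, and the reference time must be expressed in the same time zone as the device.

```python
from datetime import datetime, timedelta


def parse_display_clock(first_line):
    """Parse a line such as '19:34:01 beijing Thu 12/08/2022' into a datetime."""
    time_part, _zone, _weekday, date_part = first_line.split()
    return datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %H:%M:%S")


def clock_is_plausible(first_line, reference, tolerance=timedelta(minutes=5)):
    """Return True if the device time is within tolerance of the reference time."""
    return abs(parse_display_clock(first_line) - reference) <= tolerance


device_line = "19:34:01 beijing Thu 12/08/2022"
print(parse_display_clock(device_line))                                        # 2022-12-08 19:34:01
print(clock_is_plausible(device_line, datetime(2022, 12, 8, 19, 35, 0)))       # True
print(clock_is_plausible(device_line, datetime(2022, 12, 9, 19, 35, 0)))       # False
```

A device clock that is off by hours or days moves new log messages outside the time range that is queried on the Web interface, which matches the symptom described in step 4.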
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A
Failure to display recently generated logs of a certain DPI service on the webpage
Symptom
After the user logs in to the device webpage, the webpage displays the historical log messages of a certain DPI service, but cannot display the recently generated log messages of the DPI service.
Common causes
The following are the common causes for this type of issue:
· The configuration has changed, and the log collection feature for the service is disabled.
· The service log data has reached its storage upper limit, and the system discards new log messages because the action for reaching the upper limit is log-only.
· The system time of the device is configured incorrectly.
Troubleshooting flow
Troubleshoot the issue by using the following process:
1. Identify whether the log collection feature is disabled for the specified service in the data analysis center module.
2. Identify the storage space usage of the specified service, whether the upper limit is reached, and whether the action for reaching the upper limit is set to log-only.
3. Identify whether the device's system time is correct.
Figure 206 shows the troubleshooting flowchart.
Solution
1. Identify whether the log collection feature of the data analysis center module is disabled for the specified service.
After a service module processes a message, it sends the generated log message to the data analysis center. The data analysis center extracts relevant data for summarization and analysis, and displays the results on the log, trend, and statistics pages on the Monitor tab of the Web interface. The log collection feature of the data analysis center module is enabled on a per-service basis. The data analysis center extracts information from the log messages of a service and processes it further only when log collection is enabled for that service.
You can perform one of the following operations:
¡ Execute the display dac log-collect command to check the enabling status of the log collection feature for each service.
<Sysname> system-view
[Sysname] display dac log-collect all
Service type Service Status
Slot 1:
dpi audit Disabled
dpi ffilter Disabled
dpi threat Enabled
dpi traffic Disabled
dpi uflt Disabled
Table 27 Command output
Field | Description
Service | Service name.
Status | Status of the log collection feature: Disabled or Enabled.
The dpi threat row in the output shows the log collection feature for threat services (including IPS and anti-virus services). The Status field indicates the state of the log collection feature. Perform the following operations based on the field value:
- If the field value is Enabled, the log collection feature of the threat service is enabled. Proceed to step 2.
- If the field value is Disabled, execute the dac log-collect enable command to enable the log collection feature for the service.
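When many services are listed, the check in step 1 can be scripted. The following Python sketch is illustrative only: it assumes the command output has already been captured as text (for example, from a saved terminal session) and that it follows the layout of the sample above. The function names parse_log_collect and disabled_services are hypothetical and are not part of any H3C tool.

```python
import re

# Sample text copied from the command output shown in step 1.
SAMPLE_OUTPUT = """\
Service type Service Status
Slot 1:
dpi audit Disabled
dpi ffilter Disabled
dpi threat Enabled
dpi traffic Disabled
dpi uflt Disabled
"""

# A data row has exactly three fields and ends with the Status value.
ROW = re.compile(r"^(?P<stype>\S+)\s+(?P<service>\S+)\s+(?P<status>Enabled|Disabled)$")
SLOT = re.compile(r"^Slot\s+(\d+):$")


def parse_log_collect(output):
    """Return {(slot, service_type, service): enabled} parsed from the text."""
    states = {}
    slot = None
    for line in output.splitlines():
        line = line.strip()
        slot_match = SLOT.match(line)
        if slot_match:
            slot = int(slot_match.group(1))
            continue
        row = ROW.match(line)
        if row:  # the header line does not match and is skipped
            key = (slot, row["stype"], row["service"])
            states[key] = row["status"] == "Enabled"
    return states


def disabled_services(states):
    """List the services whose log collection feature is disabled."""
    return sorted(key for key, enabled in states.items() if not enabled)


states = parse_log_collect(SAMPLE_OUTPUT)
for slot, stype, service in disabled_services(states):
    print(f"slot {slot}: log collection is disabled for {stype} {service}")
```

Each service that the script reports corresponds to the Disabled case in step 1. If the output layout on your software version differs from the sample, adjust the regular expressions accordingly.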
2. Check the storage space usage of the specified service: whether the upper limit is reached and whether the action for reaching the upper limit is log-only.
Log in to the device webpage, and select System > Log Settings > Storage Space Settings. You can obtain the following information from the Max storage space, Action, and Used space columns for a service, as shown in Figure 207.
¡ Max storage space: Maximum storage space allocated to a service, expressed as the percentage of the total storage space that the service data can use.
¡ Action: Action taken on historical data when the stored data of a service reaches the time limit or space limit. The actions include delete and log-only. The delete action deletes the oldest log data to make room for new data. The log-only action generates a log message but does not delete old log data, so new data is not saved.
¡ Used space: Percentage of storage space currently occupied by a service.
Figure 207 Storage space usage
The log data storage states for threat services (including IPS and anti-virus services) are identified by the red rectangles. You can check the log data storage of the specified services as follows:
¡ If the value in the Used space column is well below the value in the Max storage space column, or if the Action column is delete, proceed to step 3.
¡ If the value in the Used space column is very close to the value in the Max storage space column and the Action column is log-only, the storage space for the service logs is full and recently generated service logs are discarded. In this case, click the edit icon for the service to open the service editing dialog box, and then increase the maximum storage space or change the action to delete.
Figure 208 Edit service information
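The decision in step 2 can be expressed as a small check. The following Python sketch is illustrative only: the StorageRow type and the values in the usage lines are hypothetical, not read from a device. The guide does not quantify "very close", so the margin parameter (one percentage point by default) is an assumption that you should adjust.

```python
from dataclasses import dataclass


@dataclass
class StorageRow:
    """One row of the Storage Space Settings page (values entered by hand)."""
    service: str
    max_percent: float   # Max storage space column
    used_percent: float  # Used space column
    action: str          # Action column: "delete" or "log-only"


def new_logs_discarded(row, margin=1.0):
    """Return True if the row matches the 'full and log-only' case of step 2.

    margin is an assumed threshold, in percentage points, for treating
    Used space as very close to Max storage space.
    """
    near_limit = row.max_percent - row.used_percent <= margin
    return near_limit and row.action == "log-only"


# Hypothetical values for illustration.
full_row = StorageRow("threat", max_percent=20.0, used_percent=19.6, action="log-only")
ok_row = StorageRow("threat", max_percent=20.0, used_percent=19.6, action="delete")

print(new_logs_discarded(full_row))  # storage full, new logs are discarded
print(new_logs_discarded(ok_row))    # oldest logs are deleted instead
```

When the function returns True, increase the maximum storage space or change the action to delete as described above. Otherwise, proceed to step 3.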
3. Identify whether the device's system time is correct.
Execute the display clock command to identify whether the device system time is correct. If the system time is incorrect, the specified query time range will not match the logging time, preventing the viewing of log messages within a certain period.
<Device> display clock
19:34:01 beijing Thu 12/08/2022
Time Zone : beijing add 08:00:00
The first line of the output shows the system time. Identify whether the time is correct and perform the following operations as needed:
¡ If the system time is correct, proceed to step 4.
¡ If the system time is incorrect, execute the clock datetime command to correct it. After you change the system time, new traffic must match the service before the service generates log messages that can be displayed on the Web interface. Log data appears with a delay of about 10 to 20 seconds, so wait briefly before you view new data. Data on the trend and statistics pages appears with a delay of about five minutes. If log aggregation is configured, a longer aggregation time results in a longer delay. To configure log aggregation, click the Log aggregation settings button on a log page on the Monitor tab of the Web interface, enable log aggregation, and then set the aggregation time.
These delays are estimates under ideal conditions. They might increase when the data collection load is higher.
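The clock comparison in step 3 can also be scripted. The following Python sketch is illustrative only: it assumes the first line of the display clock output has the layout of the sample above, with the date in month/day/year order (in the sample, Thu matches December 8, 2022). The function names and the 30-second tolerance are hypothetical, and the reference time is assumed to be a trusted local time in the same time zone as the device.

```python
from datetime import datetime, timedelta

# Sample text copied from the display clock output shown in step 3.
SAMPLE_CLOCK = """\
19:34:01 beijing Thu 12/08/2022
Time Zone : beijing add 08:00:00
"""


def parse_display_clock(output):
    """Parse the first output line into a naive datetime (device local time)."""
    fields = output.strip().splitlines()[0].split()
    time_part, date_part = fields[0], fields[-1]
    # Assumed layout: HH:MM:SS <zone> <weekday> MM/DD/YYYY
    return datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %H:%M:%S")


def clock_is_correct(device_time, reference_time, tolerance=timedelta(seconds=30)):
    """Return True if the device time is within the tolerance of the reference."""
    return abs(device_time - reference_time) <= tolerance


device_time = parse_display_clock(SAMPLE_CLOCK)
# Hypothetical reference time for illustration; in practice, use a trusted clock.
reference_time = datetime(2022, 12, 8, 19, 34, 10)
print(device_time, clock_is_correct(device_time, reference_time))
```

If the check fails, correct the system time with the clock datetime command as described above, and then allow for the display delays before you view new log data.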
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ Result of the execution of the display system internal dac statistics log-collect command in probe view.
¡ The configuration files of the device.
Related alarm and log messages
Alarm messages
N/A
Log messages
N/A