Released At: 16-04-2025

H3C SR6600[SR6600-X] Router Series Troubleshooting Guide(V7)-R8149-6W100

H3C SR6600[SR6600-X] Router Series

Troubleshooting Guide

Document version: 6W100-20250416

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.

Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.

The information in this document is subject to change without notice.

Contents

Introduction

General guidelines

Collecting log and operating information

Contacting technical support

Troubleshooting hardware issues

System issues

Power supply issues

Transceiver module issues

Troubleshooting fundamental issues

Software upgrade issues

Troubleshooting system management issues

Hardware resource management issues

Troubleshooting virtual technology issues

IRF issues

Troubleshooting interface issues

Tunnel interface issues

Troubleshooting Layer 2—LAN switching issues

Ethernet link aggregation issues

Spanning tree issues

Troubleshooting Layer 2—WAN access issues

PPP issues

Troubleshooting Layer 3—IP services issues

ARP issues

DHCP issues

ND issues

Troubleshooting Layer 3 IP routing issues

BGP issues

IS-IS issues

Troubleshooting OSPFv3

OSPF issues

Equal-cost route issues

Troubleshooting RIR

Troubleshooting multicast issues

MSDP issues

MVPN issues

PIM issues

Layer 3 multicast issues

Layer 2 multicast issues

Troubleshooting MPLS issues

LDP issues

Troubleshooting MPLS L2VPN/VPLS

Troubleshooting MPLS L3VPN issues

MPLS TE issues

Issues of basic MPLS

Troubleshooting VPLS

Troubleshooting segment routing issues

EVPN L3VPN over SRv6 issues

Troubleshooting EVPN VPWS over SRv6

Troubleshooting SR-MPLS

SRv6 TE policy issues

Troubleshooting VPN issues

Troubleshooting EVPN issues

Troubleshooting VXLAN issues

Troubleshooting EVPN issues

Troubleshooting EVPN VXLAN

Troubleshooting ACL and QoS issues

QoS issues

Troubleshooting IP tunneling and security VPN issues

IPsec issues

IP tunneling issues

Troubleshooting user access and authentication issues

802.1X issues

Troubleshooting AAA issues

MAC authentication issues

Password control issues

Portal issues

Troubleshooting security issues

Troubleshooting SSH

SSL VPN issues

Troubleshooting high availability issues

Troubleshooting BFD

Troubleshooting SBFD

Troubleshooting VRRP

RBM issues

Troubleshooting system management issues

NETCONF issues

Troubleshooting network management and monitoring issues

Troubleshooting NQA

NTP issues

Ping and tracert failures

RMON issues

SNMP issues

Mirroring issues

gRPC issues

Troubleshooting DPI issues

URL filtering issues

Data analysis center issues

Introduction

This document provides information about troubleshooting common software and hardware problems with H3C SR6600[SR6600-X] routers.

General guidelines

IMPORTANT:

To prevent a problem from causing loss of configuration, save the configuration each time you finish configuring a feature. For configuration recovery, regularly back up the configuration to a remote server.

When you troubleshoot H3C SR6600[SR6600-X] routers, follow these general guidelines:

· To help identify the cause of the problem, collect system and configuration information, including:

¡ Symptom, time of failure, and configuration.

¡ Network topology information, including the network diagram, port connections, and points of failure.

¡ Log messages and diagnostic information. For more information about collecting this information, see "Collecting log and operating information."

¡ Physical evidence of failure:

- Photos of the hardware.

- Status of the card, power, and fan LEDs.

¡ Steps you have taken, such as reconfiguration, cable swapping, and rebooting.

¡ Output from the commands executed during the troubleshooting process.

· To ensure safety, wear an ESD-preventive wrist strap when you replace or maintain a hardware component.

· If hardware replacement is required, use the release notes to verify the hardware and software compatibility.

Collecting log and operating information

IMPORTANT:

By default, the information center is enabled. If the feature has been disabled, you must use the info-center enable command to enable the feature for collecting log messages.

Table 1 shows the types of files that the system uses to store operating log and status information. You can export these files by using FTP, TFTP, or USB. To more easily locate log information, use a consistent rule to categorize and name files. For example, save log information files to a separate folder for each MPU on a distributed device, and include their chassis and slot numbers in the folder names.
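As a sketch of this naming rule, the following Python helper builds a destination path that places each MPU's exported files in its own folder. The function name and the chassis<N>_slot<M> folder pattern are illustrative assumptions, not an H3C convention. Adapt them to your own archive layout.

```python
from pathlib import PurePosixPath


def log_destination(base: str, chassis: int, slot: int, filename: str) -> PurePosixPath:
    """Build a destination path that keeps each MPU's files in a separate folder.

    The folder name embeds the chassis and slot numbers of the MPU the file came from.
    The "chassis<N>_slot<M>" pattern is an illustrative choice, not an H3C convention.
    """
    if chassis < 0 or slot < 0:
        raise ValueError("chassis and slot numbers must be non-negative")
    return PurePosixPath(base) / f"chassis{chassis}_slot{slot}" / filename


# Example: logfile8.log exported from the MPU in chassis 1, slot 0.
print(log_destination("/archive/sr6600", 1, 0, "logfile8.log"))
# prints /archive/sr6600/chassis1_slot0/logfile8.log
```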

Table 1 Log and operating information

Category	File name format	Content
Common log	logfileX.log	Command execution and operational log messages.
Diagnostic log	diagfileX.log	Diagnostic log messages about device operation, including: parameter settings in effect when an error occurs; information about a card startup error; handshaking information between the MPU and interface card when a communication error occurs.
Operating statistics	file-basename.tar.gz	Current operating statistics for feature modules, including device status, CPU status, memory status, configuration status, software entries, and hardware entries. IMPORTANT: Collecting operating statistics decreases system performance.

NOTE:

The system automatically compresses full common and diagnostic log files into .gz files.

Collecting common log messages

# Save common log messages from the log buffer to a log file.

By default, the log file is saved in the logfile directory of the storage medium on the device.

<Sysname> logfile save

The contents in the log file buffer have been saved to the file cfa0:/logfile/logfile8.log

# Identify the log file on the active MPU of the master device.

<Sysname> dir cfa0:/logfile/

Directory of cfa0:/logfile

0 -rw- 21863 Jul 11 2013 16:00:37 logfile8.log

1021104 KB total (421552 KB free)

# Identify the log file on the standby MPU of the master device.

<Sysname> dir slot1#cfa0:/logfile/

Directory of slot1#cfa0:/logfile

0 -rw- 21863 Jul 11 2013 16:00:37 logfile8.log

1021104 KB total (421552 KB free)

# Transfer the files to the desired destination by using FTP, TFTP, or USB. (Details not shown.)

Collecting diagnostic log messages

# Save diagnostic log messages from the diagnostic log file buffer to a diagnostic log file.

By default, the diagnostic log file is saved in the diagfile directory of the storage medium on the device.

<Sysname> diagnostic-logfile save

The contents in the diagnostic log file buffer have been saved to the file cfa0:/diagfile/diagfile18.log

# Identify the diagnostic log file on the active MPU of the master device.

<Sysname> dir cfa0:/diagfile/

Directory of cfa0:/diagfile

0 -rw- 161321 Jul 11 2013 16:16:00 diagfile18.log

1021104 KB total (421416 KB free)

# Identify the diagnostic log file on the standby MPU of the master device.

<Sysname> dir slot1#cfa0:/diagfile/

Directory of slot1#cfa0:/diagfile

0 -rw- 161321 Jul 11 2013 16:16:00 diagfile18.log

1021104 KB total (421416 KB free)

# Transfer the files to the desired destination by using FTP, TFTP, or USB. (Details not shown.)

Collecting operating statistics

You can collect operating statistics by saving the statistics to a file or displaying the statistics on the screen.

When you collect operating statistics, follow these guidelines:

· Log in to the device through a network or management port instead of the console port, if possible. Network and management ports are faster than the console port.

· Do not execute commands while operating statistics are being collected.

· H3C recommends saving operating statistics to a file to retain the information.

NOTE:

The amount of time to collect statistics increases along with the number of cards.

To collect operating statistics:

1. Disable pausing between screens of output if you want to display operating statistics on the screen. Skip this step if you are saving statistics to a file.

<Sysname> screen-length disable

2. Collect operating statistics for multiple feature modules.

<Sysname> display diagnostic-information

Save or display diagnostic information (Y=save, N=display)? [Y/N] :

3. At the prompt, choose to save or display operating statistics:

# To save operating statistics, enter y at the prompt and then specify the destination file path.

Save or display diagnostic information (Y=save, N=display)? [Y/N] :y

Please input the file name(*.tar.gz)[cfa0:/diag.tar.gz] :cfa0:/diag.tar.gz

Diagnostic information is outputting to cfa0:/diag.tar.gz.

Please wait...

Save successfully.

<Sysname> dir cfa0:/

Directory of cfa0:

…

6 -rw- 898180 Jun 26 2013 09:23:51 diag.tar.gz

1021808 KB total (259072 KB free)

# To display operating statistics on the monitor terminal, enter n at the prompt.

Save or display diagnostic information (Y=save, N=display)? [Y/N] :n

===========================================================

===============display alarm===============

No alarm information.

=========================================================

===============display boot-loader===============

Software images on slot 0:

Current software images:

cfa0:/SR6600X-CMW710-BOOT-R7328_mrpnc.bin

cfa0:/SR6600X-CMW710-SYSTEM-R7328_mrpnc.bin

Main startup software images:

cfa0:/SR6600X-CMW710-BOOT-R7328_mrpnc.bin

cfa0:/SR6600X-CMW710-SYSTEM-R7328_mrpnc.bin

Backup startup software images:

None

=========================================================

===============display counters inbound interface===============

Interface Total (pkts) Broadcast (pkts) Multicast (pkts) Err (pkts)

BAGG1 0 0 0 0

GE2/0/1 0 0 0 0

GE2/0/2 2 2 0 0

GE2/0/3 0 0 0 0

GE2/0/4 0 0 0 0

GE2/0/5 0 0 0 0

GE2/0/6 0 0 0 0

GE2/0/7 0 0 0 0

GE2/0/8 0 0 0 0

GE2/0/9 0 0 0 0

GE2/0/10 0 0 0 0

……

Contacting technical support

If you cannot resolve a problem by using the troubleshooting procedures in this document, contact H3C Support. When you contact an authorized H3C support representative, be prepared to provide the following information:

· Information described in "General guidelines."

· Product serial numbers.

This information will help the support engineer assist you as quickly as possible.

Contact H3C Support at [email protected].

Troubleshooting hardware issues

System issues

The terminal displays nothing or garbled characters

Symptom

When the device powers on, the configuration terminal displays nothing or garbled characters.

Common causes

The following are the common causes of this type of issue:

· The power supply is malfunctioning.

· The MPU is operating abnormally.

· The configuration cable is not connected to the MPU's console port.

· The terminal parameter settings are incorrect.

· The configuration cable is faulty.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 1.

Figure 1 Troubleshooting flow

Solution

1. Identify whether the power supply is operating correctly.

If the power supply unit status LED indicates an abnormal state, see "Power supply is abnormal."

2. Identify whether the MPU is operating correctly.

If the MPU status LED indicates an abnormal state, see the MPU troubleshooting section.

3. Identify whether the configuration cable is connected to the MPU's console port.

4. Identify whether the configuration terminal uses the correct COM port. Make sure the serial port selected on the terminal is the one connected to the configuration cable and that the serial port parameters are configured correctly.

The default serial port parameters are a baud rate of 9600, 8 data bits, no parity, 1 stop bit, and no flow control. Select VT100 for terminal emulation. If the serial port parameters on your device differ from these defaults, use the device's actual settings.

5. Replace the configuration cable.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
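Garbled characters are often caused by a mismatch between the terminal emulator settings and the console port settings. The following Python sketch records the default parameters from step 4 and lists which terminal settings differ from them. The SerialSettings class and mismatches function are hypothetical checklist helpers; they do not read settings from a real terminal or device.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SerialSettings:
    baud_rate: int
    data_bits: int
    parity: str        # "none", "odd", or "even"
    stop_bits: int
    flow_control: str  # "none", "xon/xoff", or "rts/cts"
    emulation: str


# Default console parameters from step 4. If the device's console port
# settings have been changed, build this object from the actual values instead.
CONSOLE_DEFAULTS = SerialSettings(
    baud_rate=9600, data_bits=8, parity="none", stop_bits=1, flow_control="none", emulation="VT100"
)


def mismatches(terminal: SerialSettings, expected: SerialSettings = CONSOLE_DEFAULTS) -> list[str]:
    """Return the names of terminal settings that differ from the expected console settings."""
    return [f.name for f in fields(expected) if getattr(terminal, f.name) != getattr(expected, f.name)]


# Example: a terminal left at 115200 baud.
print(mismatches(SerialSettings(115200, 8, "none", 1, "none", "VT100")))
# prints ['baud_rate']
```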

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

The device experiences an abnormal reboot

Symptom

The device reboots unexpectedly during operation.

Common causes

Common causes of this type of failure include boot file issues.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 2.

Figure 2 Troubleshooting flowchart

Solution

1. Identify whether the device can enter the CLI after rebooting.

If the device can enter the CLI, use the display diagnostic-information command to collect diagnostic information. Then, export the collected information and send it to H3C Support for assistance.

NOTE:

When you execute the display diagnostic-information command, specify the key-info parameter to collect only essential diagnostic information, reducing collection time.

2. Identify whether the startup file is functioning properly.

If the device cannot enter the CLI, connect to the device through the console port and restart it. If BootWare reports a CRC error or cannot find the boot file, use the BootWare menu to download the boot file again. BootWare automatically sets the downloaded file as the current boot file.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Temperature anomaly alarm

Symptom

The system generates a temperature alarm and prints log messages indicating a high temperature, for example:

%Jun 26 10:13:46:233 2013 H3C DRVPLAT/4/DrvDebug: Temperature of the board is too high!

Common causes

The following are the common causes of this type of issue:

· Poor ventilation or air conditioning failures cause high ambient temperature.

· The device fan malfunctions or the air intake vent is blocked by foreign objects.

· The air filter on the device has accumulated too much dust.

· The software failed to obtain temperature data and generated a false alarm.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 3.

Figure 3 Troubleshooting flowchart

Solution

1. Identify whether the ambient temperature is too high.

If the ambient temperature is too high, improve air conditioning or take other cooling measures to lower it.

2. Identify whether the device temperature is too high.

Execute the display environment command to check the current device temperature. A value of 255 indicates that the software failed to obtain temperature data. In this case, execute the display environment command repeatedly until a valid temperature is displayed, and then identify whether the device temperature is too high.

If the device temperature is too high (exceeds the high-temperature alarm threshold), verify that the device fans are operating correctly and identify whether the air intake vent is blocked by foreign objects.

3. Use the display fan command to identify whether the fan tray is operating correctly. If it is not, see "The fan module status is abnormal" to troubleshoot the fan issue.

4. Identify whether the air filter is clean.

If the fans operate correctly, check the air filter for dust. Clean the air filter, and then identify whether the temperature returns to normal.

5. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
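The checks in step 2 can be sketched as follows: discard 255 readings, which indicate that the software failed to obtain temperature data, and then compare a valid reading with the thresholds. This minimal Python sketch assumes you have already copied the temperature and the warning and alarm thresholds from the display environment output. The function names and the strict "exceeds" comparison are assumptions.

```python
INVALID_READING = 255  # Value displayed when the software fails to obtain temperature data.


def first_valid_reading(readings):
    """Return the first reading that is not the 255 marker, or None if every poll failed."""
    return next((value for value in readings if value != INVALID_READING), None)


def classify_temperature(temperature, warning_threshold, alarm_threshold):
    """Classify one temperature reading against thresholds taken from display environment."""
    if temperature == INVALID_READING:
        return "invalid"  # Execute display environment again until a real value appears.
    if temperature > alarm_threshold:
        return "alarm"
    if temperature > warning_threshold:
        return "warning"
    return "normal"


# Example with hypothetical values: two failed polls, then a reading of 82 degrees
# against a warning threshold of 70 and an alarm threshold of 80.
reading = first_valid_reading([255, 255, 82])
print(reading, classify_temperature(reading, warning_threshold=70, alarm_threshold=80))
# prints 82 alarm
```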

Related alarm and log messages

Alarm messages

N/A

Log messages

· TEMP_HIGH

· TEMP_LOW

· TEMP_NORMAL

· TEMPERATURE_ALARM

· TEMPERATURE_LOW

· TEMPERATURE_NORMAL

· TEMPERATURE_POWEROFF

· TEMPERATURE_SHUTDOWN

· TEMPERATURE_WARNING

Voltage abnormality alarm

Symptom

The system prints voltage anomaly alarm messages, for example:

DEV/4/VOLTAGE_HIGH: Voltage is greater than the high-voltage alarm threshold on chassis 1 slot 16 voltage sensor 1.

DEV/4/VOLTAGE_LOW: Voltage is less than the low-voltage alarm threshold on chassis 1 slot 16 voltage sensor 24.

Common causes

The common cause of this type of issue is a hardware failure.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 4.

Figure 4 Troubleshooting flowchart

Solution

Collect the device configuration file, log information, and alarm information, and contact Technical Support.

Related alarm and log messages

Alarm messages

N/A

Log messages

· VOLT_HIGH

· VOLT_LOW

· VOLT_NORMAL

Memory exception alarm

Symptom

The system prints memory exception alarm messages, such as:

DIAG/1/MEM_EXCEED_THRESHOLD: Memory minor threshold has been exceeded.

Common causes

This type of issue is typically caused by a memory leak.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 5.

Figure 5 Troubleshooting flowchart

Solution

1. Determine the usage of each memory block.

Use the display system internal kernel memory pool command in probe view to check the memory usage of each block. Identify blocks with abnormal or continuously increasing usage.

<Sysname> system-view

[Sysname] probe

[Sysname-probe] display system internal kernel memory pool slot 1

Active Number Size Align Slab Pg/Slab ASlabs NSlabs Name

9126 9248 64 8 32 1 289 289 kmalloc-64

105 112 16328 0 2 8 54 56 kmalloc-16328

14 14 2097096 0 1 512 14 14 kmalloc-2097096

147 225 2048 8 15 8 12 15 kmalloc-2048

7108 7232 192 8 32 2 226 226 kmalloc-192

22 22 524232 0 1 128 22 22 kmalloc-524232

1288 1344 128 8 21 1 64 64 kmalloc-128

0 0 67108808 0 1 16384 0 0 kmalloc-67108808

630 651 4096 8 7 8 93 93 kmalloc-4096

68 70 131016 0 1 32 68 70 kmalloc-131016

1718 2048 8 8 64 1 31 32 kmalloc-8

1 1 16777160 0 1 4096 1 1 kmalloc-16777160

2 15 2048 0 15 8 1 1 sgpool-64

0 0 40 0 42 1 0 0 inotify_event_cache

325 330 16328 8 2 8 165 165 kmalloc_dma-16328

0 0 72 0 30 1 0 0 LFIB_IlmEntryCache

0 0 1080 0 28 8 0 0 LFIB_IlmEntryCache

0 0 1464 0 21 8 0 0 MFW_FsCache

1 20 136 0 20 1 1 1 L2VFIB_Ac_cache

0 0 240 0 25 2 0 0 CCF_JOBDESC

0 0 88 0 26 1 0 0 NS4_Aggre_TosSrcPre

0 0 128 0 21 1 0 0 IPFS_CacheHash_cachep

---- More ----

Focus on the Number and Size columns. If the Number value of a block keeps growing, the block is being continuously allocated. When you analyze the output, follow these restrictions and guidelines:

¡ An increase in memory block usage is not necessarily abnormal, so determine whether the block is truly abnormal. Number × Size is the amount of memory used by the block. To determine whether usage is normal, observe the growth rate and the amount of memory used over time.

¡ Some memory leaks develop slowly and might require an observation period of several weeks.

2. Collect information and seek technical support.

The previous step only narrows down the scope of the issue. Locating the specific fault requires further, more demanding information collection, so do not perform any other operations on the device. Contact H3C Support.

Do not restart the device. A restart can clear the fault information and make the fault harder to locate.
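The Number × Size comparison in step 1 can be sketched in Python as follows. The sketch parses two saved outputs of the display system internal kernel memory pool command, computes Number × Size for each block, and lists blocks whose usage grew. It assumes the nine-column layout shown above; the second snapshot in the usage example is hypothetical. A single growth figure does not prove a leak, so repeat the comparison over a longer period as described in step 1.

```python
def parse_memory_pool(output):
    """Map (name, size) to bytes in use (Number * Size) from display system internal kernel memory pool output."""
    usage = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have eight numeric columns followed by the block name;
        # the header row and the "---- More ----" prompt are skipped.
        if len(fields) != 9 or not all(field.isdigit() for field in fields[:8]):
            continue
        number, size, name = int(fields[1]), int(fields[2]), fields[8]
        # Key on (name, size) because some names, such as LFIB_IlmEntryCache, appear twice.
        usage[(name, size)] = number * size
    return usage


def growth(before, after):
    """Return ((name, size), bytes grown) for blocks whose Number * Size increased, largest first."""
    grown = [(key, after[key] - before.get(key, 0)) for key in after if after[key] > before.get(key, 0)]
    return sorted(grown, key=lambda item: item[1], reverse=True)


# Usage: the first snapshot uses rows from the output above; the second is hypothetical.
first = parse_memory_pool("9126 9248 64 8 32 1 289 289 kmalloc-64\n147 225 2048 8 15 8 12 15 kmalloc-2048")
later = parse_memory_pool("9500 9600 64 8 32 1 300 300 kmalloc-64\n147 225 2048 8 15 8 12 15 kmalloc-2048")
print(growth(first, later))  # prints [(('kmalloc-64', 64), 22528)]
```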

Related alarm and log messages

Alarm messages

N/A

Log messages

· MEM_ALERT

· MEM_EXCEED_THRESHOLD

· MEM_BELOW_THRESHOLD

High CPU usage

Symptom

Use the display cpu-usage command to monitor CPU usage continuously. If CPU usage remains above 80%, a task is likely occupying the CPU for an extended period. Identify the specific cause of the high CPU usage.

<Sysname> display cpu-usage

Slot 1 CPU 0 CPU usage:

80% in last 5 seconds

80% in last 1 minute

80% in last 5 minutes
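To monitor CPU usage over time, you can save the output and check it with a script. The following Python sketch assumes the display cpu-usage output format shown above for a single CPU and reports whether all three averages are at or above the threshold. The threshold is treated as inclusive because the sample shows exactly 80%; adjust it to your own criteria.

```python
import re

USAGE_LINE = re.compile(r"(\d+)% in last ([^\r\n]+)")


def parse_cpu_usage(output):
    """Return {interval: percent} from display cpu-usage output for a single CPU."""
    return {m.group(2).strip(): int(m.group(1)) for m in USAGE_LINE.finditer(output)}


def is_sustained_high(usage, threshold=80):
    """True if every averaging interval is at or above the threshold."""
    return bool(usage) and all(value >= threshold for value in usage.values())


sample = """Slot 1 CPU 0 CPU usage:
 80% in last 5 seconds
 80% in last 1 minute
 80% in last 5 minutes"""
usage = parse_cpu_usage(sample)
print(usage, is_sustained_high(usage))
# prints {'5 seconds': 80, '1 minute': 80, '5 minutes': 80} True
```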

Common causes

The following are the common causes of this type of issue:

· Route flapping

· Packet attacks

· Network loops

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 6.

Figure 6 Troubleshooting flowchart

Solution

1. Check for route flapping.

Frequent changes to routing table entries can cause high CPU usage. If route flapping occurs, collect information and contact H3C Support.

Display the routing table.

[Sysname] display ip routing-table

Destinations : 9 Routes : 9

Destination/Mask Proto Pre Cost NextHop Interface

0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

10.1.1.0/24 OSPF 150 1 11.2.1.1 Vlan100

127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0

127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0

127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

224.0.0.0/4 Direct 0 0 0.0.0.0 NULL0

224.0.0.0/24 Direct 0 0 0.0.0.0 NULL0

255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

Display the routing table again. In the following output, the OSPF route to 10.1.1.0/24 is no longer present.

[Sysname] display ip routing-table

Destinations : 8 Routes : 8

Destination/Mask Proto Pre Cost NextHop Interface

0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0

127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0

127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

224.0.0.0/4 Direct 0 0 0.0.0.0 NULL0

224.0.0.0/24 Direct 0 0 0.0.0.0 NULL0

255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0
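Comparing successive snapshots makes flapping easier to spot: a destination that repeatedly appears and disappears, such as 10.1.1.0/24 in the two outputs above, is a flapping candidate. The following Python sketch assumes the six-column format shown above and reports destinations added or removed between two snapshots. The function names are illustrative.

```python
def parse_routes(output):
    """Return {destination/mask: (proto, next hop, interface)} from display ip routing-table output.

    Only single-line route entries are handled; continuation lines for
    equal-cost next hops are ignored in this sketch.
    """
    routes = {}
    for line in output.splitlines():
        fields = line.split()
        # Route rows have six columns: Destination/Mask Proto Pre Cost NextHop Interface.
        # The header row is skipped because its Pre column is not numeric.
        if len(fields) == 6 and "/" in fields[0] and fields[2].isdigit():
            routes[fields[0]] = (fields[1], fields[4], fields[5])
    return routes


def diff_routes(before, after):
    """Return (added, removed) destinations between two routing table snapshots."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    return added, removed


first = parse_routes("10.1.1.0/24 OSPF 150 1 11.2.1.1 Vlan100\n0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0")
second = parse_routes("0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0")
print(diff_routes(first, second))  # prints ([], ['10.1.1.0/24'])
```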

2. Check for packet attacks.

Capture packets on the device port and use a packet capture tool such as Sniffer, Wireshark, or WinNetCap to analyze the packet characteristics and identify the attack source. Then, configure attack protection against the attack source. For more information about attack detection and prevention, see "Attack Detection and Prevention" in the "Security Configuration Guide."
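After you export a capture, counting packets per source address is a quick first pass at locating the attack source. The following Python sketch uses only the standard library to read a classic libpcap file (Ethernet link type; pcapng and VLAN-tagged frames are not handled) and returns the busiest IPv4 sources. It is an illustrative helper, not an H3C tool. Wireshark's Statistics > Endpoints view provides similar information.

```python
import struct
from collections import Counter

ETHERTYPE_IPV4 = 0x0800
LINKTYPE_ETHERNET = 1


def top_ipv4_sources(pcap_bytes, n=5):
    """Count IPv4 source addresses in a classic libpcap capture and return the n busiest."""
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":      # little-endian, microsecond timestamps
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":    # big-endian, microsecond timestamps
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file (pcapng is not handled)")
    (linktype,) = struct.unpack(endian + "I", pcap_bytes[20:24])
    if linktype != LINKTYPE_ETHERNET:
        raise ValueError("only Ethernet captures are handled")

    counts = Counter()
    offset = 24  # Skip the 24-byte global header.
    while offset + 16 <= len(pcap_bytes):
        # Per-packet header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, _ = struct.unpack(endian + "IIII", pcap_bytes[offset:offset + 16])
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Untagged Ethernet II frames carrying IPv4 only; everything else is skipped.
        if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != ETHERTYPE_IPV4:
            continue
        counts[".".join(str(b) for b in frame[26:30])] += 1  # IPv4 source address
    return counts.most_common(n)


# Usage (hypothetical file name):
# with open("capture.pcap", "rb") as f:
#     print(top_ipv4_sources(f.read()))
```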

3. Check for loops.

A loop can cause broadcast storms and network flapping. The large number of protocol packets sent to the CPU increases CPU usage, and traffic on many device ports might be very high, with port utilization exceeding 90%. For example, the following output shows 99% inbound utilization on GigabitEthernet2/0/1 over the last 300 seconds.

<Sysname> display interface gigabitethernet2/0/1

GigabitEthernet2/0/1

Current state: UP

Line protocol state: UP

Description: GigabitEthernet2/0/1 Interface

Bandwidth: 1000000 kbps

Maximum transmission unit: 1500

Internet address: 2.1.1.2/24 (primary)

IP packet frame type: Ethernet II, hardware address: 0000-fc00-9276

IPv6 packet frame type: Ethernet II, hardware address: 0000-fc00-9276

Loopback is not set

Media type is twisted pair, port hardware type is 1000_BASE_T

Port priority: 0

1000Mbps-speed mode, full-duplex mode

Link speed type is autonegotiation, link duplex type is autonegotiation

Flow-control is not enabled

Maximum frame length: 9216

Last clearing of counters: Never

Peak input rate: 8 bytes/sec, at 2016-03-19 09:20:48

Peak output rate: 1 bytes/sec, at 2016-03-19 09:16:16

Last 300 second input: 26560 packets/sec 123241940 bytes/sec 99%

Last 300 second output: 0 packets/sec 0 bytes/sec 0%

……

If a loop exists:

¡ Check the link connections and verify that the port configuration is correct.

¡ For Layer 2 interfaces, enable STP and verify that the STP configuration is correct.

¡ For Layer 2 ports, identify whether STP is operating correctly on the adjacent devices.

¡ If the configuration is correct, STP might have calculated incorrectly, or STP calculated correctly but the port driver failed to block the port. For quick recovery, shut down the ports in the loop, or remove and reinsert the cables, to trigger STP recalculation.
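On a device with many ports, checking each display interface output by hand is slow. The following Python sketch scans saved display interface output for the Last 300 second input and output lines and returns the directions whose utilization exceeds 90%, the figure given in step 3. It assumes the line format shown in the sample output above.

```python
import re

# Matches lines such as "Last 300 second input: 26560 packets/sec 123241940 bytes/sec 99%".
RATE_LINE = re.compile(r"Last \d+ second (input|output):.*?(\d+)%")


def high_utilization(output, threshold=90):
    """Return (direction, percent) pairs whose utilization exceeds the threshold."""
    return [
        (m.group(1), int(m.group(2)))
        for m in RATE_LINE.finditer(output)
        if int(m.group(2)) > threshold
    ]


sample = """Last 300 second input: 26560 packets/sec 123241940 bytes/sec 99%
Last 300 second output: 0 packets/sec 0 bytes/sec 0%"""
print(high_utilization(sample))  # prints [('input', 99)]
```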

4. Identify the CPU-intensive tasks.

If the above steps do not resolve the issue, use the display process cpu command to check which task is using the most CPU.

<Sysname> display process cpu slot 1

CPU utilization in 5 secs: 2.4%; 1 min: 2.5%; 5 mins: 2.4%

JID 5Sec 1Min 5Min Name

1 0.0% 0.0% 0.0% scmd

2 0.0% 0.0% 0.0% [kthreadd]

3 0.0% 0.0% 0.0% [migration/0]

4 0.0% 0.0% 0.0% [ksoftirqd/0]

5 0.0% 0.0% 0.0% [watchdog/0]

6 0.0% 0.0% 0.0% [migration/1]

7 0.0% 0.0% 0.0% [ksoftirqd/1]

8 0.0% 0.0% 0.0% [watchdog/1]

9 0.0% 0.0% 0.0% [migration/2]

10 0.0% 0.0% 0.0% [ksoftirqd/2]

11 0.0% 0.0% 0.0% [watchdog/2]

……

The 5Sec, 1Min, and 5Min columns show the CPU usage of each task over the last 5 seconds, 1 minute, and 5 minutes, and the Name column shows the task name. A higher value means that the task consumes more CPU resources. Typically, the CPU usage of a task is below 5%. Look for tasks with significantly higher usage.
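The following Python sketch filters saved display process cpu output for tasks above the 5% guideline. It assumes the JID 5Sec 1Min 5Min Name layout shown above and uses the 5Min column by default, on the assumption that sustained usage is of most interest. The percentages for JID 145 in the usage example are hypothetical.

```python
COLUMNS = {"5Sec": 1, "1Min": 2, "5Min": 3}


def busy_tasks(output, threshold=5.0, column="5Min"):
    """Return (jid, name, percent) for tasks above threshold in the chosen column, busiest first."""
    index = COLUMNS[column]
    tasks = []
    for line in output.splitlines():
        fields = line.split(None, 4)
        # Task rows: JID 5Sec 1Min 5Min Name. Header and summary lines fail these checks.
        if len(fields) == 5 and fields[0].isdigit() and all(f.endswith("%") for f in fields[1:4]):
            percent = float(fields[index].rstrip("%"))
            if percent > threshold:
                tasks.append((int(fields[0]), fields[4], percent))
    return sorted(tasks, key=lambda task: task[2], reverse=True)


sample = """CPU utilization in 5 secs: 2.4%; 1 min: 2.5%; 5 mins: 2.4%
JID 5Sec 1Min 5Min Name
1 0.0% 0.0% 0.0% scmd
145 42.5% 40.0% 39.0% dGDB"""
print(busy_tasks(sample))  # prints [(145, 'dGDB', 39.0)]
```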

5. Identify the call stack of the abnormal task.

Use the follow job job-id command in probe view to display the call stack of the abnormal task. Execute the command more than five times and send the results to H3C Support for analysis. The call stacks help determine what the task is processing that keeps CPU usage high. This example displays the call stack of the task with JID 145.

<Sysname> system-view

[Sysname] probe

[Sysname-probe] follow job 145 slot 1

Attaching to process 145 ([dGDB])

Iteration 1 of 5

------------------------------

Kernel stack:

[<ffffffff80355290>] schedule+0x570/0xde0

[<ffffffff80355da8>] schedule_timeout+0x98/0xe0

[<ffffffff802047e4>] ep_poll+0x4b4/0x5e0

[<ffffffffc05587a8>] DRV_Sal_EVENT_Read+0x1f8/0x290 [system]

[<ffffffffc07351e4>] drv_sysm_gdb_console+0xc4/0x2d0 [system]

[<ffffffffc1a04114>] thread_boot+0x84/0xa0 [system]

[<ffffffff8015c420>] kthread+0x130/0x140

[<ffffffff801183d0>] kernel_thread_helper+0x10/0x20

Iteration 2 of 5

------------------------------

Kernel stack:

[<ffffffff80355290>] schedule+0x570/0xde0

[<ffffffff80355da8>] schedule_timeout+0x98/0xe0

[<ffffffff802047e4>] ep_poll+0x4b4/0x5e0

[<ffffffffc05587a8>] DRV_Sal_EVENT_Read+0x1f8/0x290 [system]

[<ffffffffc07351e4>] drv_sysm_gdb_console+0xc4/0x2d0 [system]

[<ffffffffc1a04114>] thread_boot+0x84/0xa0 [system]

[<ffffffff8015c420>] kthread+0x130/0x140

[<ffffffff801183d0>] kernel_thread_helper+0x10/0x20

Iteration 3 of 5

------------------------------

Kernel stack:

[<ffffffff80355290>] schedule+0x570/0xde0

[<ffffffff80355da8>] schedule_timeout+0x98/0xe0

[<ffffffff802047e4>] ep_poll+0x4b4/0x5e0

[<ffffffffc05587a8>] DRV_Sal_EVENT_Read+0x1f8/0x290 [system]

[<ffffffffc07351e4>] drv_sysm_gdb_console+0xc4/0x2d0 [system]

[<ffffffffc1a04114>] thread_boot+0x84/0xa0 [system]

[<ffffffff8015c420>] kthread+0x130/0x140

[<ffffffff801183d0>] kernel_thread_helper+0x10/0x20

Iteration 4 of 5

------------------------------

Kernel stack:

[<ffffffff80355290>] schedule+0x570/0xde0

[<ffffffff80355da8>] schedule_timeout+0x98/0xe0

[<ffffffff802047e4>] ep_poll+0x4b4/0x5e0

[<ffffffffc05587a8>] DRV_Sal_EVENT_Read+0x1f8/0x290 [system]

[<ffffffffc07351e4>] drv_sysm_gdb_console+0xc4/0x2d0 [system]

[<ffffffffc1a04114>] thread_boot+0x84/0xa0 [system]

[<ffffffff8015c420>] kthread+0x130/0x140

[<ffffffff801183d0>] kernel_thread_helper+0x10/0x20

Iteration 5 of 5

------------------------------

Kernel stack:

[<ffffffff80355290>] schedule+0x570/0xde0

[<ffffffff80355da8>] schedule_timeout+0x98/0xe0

[<ffffffff802047e4>] ep_poll+0x4b4/0x5e0

[<ffffffffc05587a8>] DRV_Sal_EVENT_Read+0x1f8/0x290 [system]

[<ffffffffc07351e4>] drv_sysm_gdb_console+0xc4/0x2d0 [system]

[<ffffffffc1a04114>] thread_boot+0x84/0xa0 [system]

[<ffffffff8015c420>] kthread+0x130/0x140

[<ffffffff801183d0>] kernel_thread_helper+0x10/0x20
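Before you send the follow job output to H3C Support, it can help to see whether the iterations show the same stack. The following Python sketch groups identical call stacks by function name, assuming the output format shown above. A stack that recurs in every iteration shows where the task was when it was sampled; interpreting it still requires analysis by H3C Support.

```python
import re
from collections import Counter

ITERATION = re.compile(r"Iteration \d+ of \d+")
FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+([A-Za-z0-9_.]+)\+0x")


def stack_signatures(output):
    """Count identical call stacks across the iterations in follow job output.

    Each signature is the tuple of function names from the innermost frame outward.
    """
    signatures = Counter()
    # Text before the first "Iteration" header (the "Attaching to process" line) is dropped.
    for chunk in ITERATION.split(output)[1:]:
        frames = tuple(FRAME.findall(chunk))
        if frames:
            signatures[frames] += 1
    return signatures


def report(output):
    """Print each distinct stack with how many iterations showed it."""
    for frames, count in stack_signatures(output).most_common():
        print(f"{count} iteration(s): " + " <- ".join(frames))
```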

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

· CPU_STATE_NORMAL

· CPU_MINOR_RECOVERY

· CPU_MINOR_THRESHOLD

· CPU_SEVERE_RECOVERY

· CPU_SEVERE_THRESHOLD

Power supply issues

Power supply is abnormal

Symptom

The power supply unit status LED is abnormal, or the power supply reports a fault during operation.

Common causes

The following are the common causes of this type of issue:

· The power supply unit model does not match the host.

· The power supply unit is not installed properly.

· The power cord is not securely plugged in.

· The power supply unit temperature is too high.

· The power supply unit has failed.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 7.

Figure 7 Troubleshooting flowchart

Solution

1. Check whether the power supply unit model matches the host model.

2. Check the power supply system of the device. Verify that it operates correctly and supplies a normal voltage.

3. Use the status LEDs on the power supply unit to make a preliminary check for issues such as output short circuit, output overcurrent, output overvoltage, input undervoltage, or overheating. LED states vary by chassis. For more information, see the hardware manual for your device.

4. Check the power supply unit status.

5. Use the display power command to show the power supply unit status. Check for any Fault, Error, or Absent states in the power modules.

<Sysname> display power

Power 0 State: Normal

Power 1 State: Absent

Power 2 State: Absent

Power 3 State: Absent

You can also use the display alarm command to view the alarm messages from the power supply unit.

<Sysname> display alarm

Slot CPU Level Info

- - INFO Power 1 is absent.

- - INFO Power 2 is absent.

- - INFO Power 3 is absent.

6. If the power supply unit status is Absent, follow these sub-steps for troubleshooting.

a. Remove and reinstall the power supply unit. Before reinstalling it, check the power connector for damage.

b. If the power supply unit does not return to Normal state after reinstallation, swap it with a known-good power supply unit, or install it in a different slot, for cross-verification.

c. If the power supply unit still shows as Absent, replace it with a new power supply unit.

d. If the issue persists after you replace the power supply unit, go to step 8.

7. If the power supply unit status shows Fault or Error, follow these steps for troubleshooting.

a. Identify whether the power cord is securely connected.

b. If the power cord is securely connected, check the power cord for faults.

c. If the power cord is normal, high temperature might be causing the power supply unit to malfunction. Check the power supply unit for dust. If there is excessive dust, clean it, and then remove and reinstall the power supply unit.

d. If the power supply unit status does not return to Normal after reinstallation, swap it with a known-good power supply unit for cross-verification.

e. If the power supply unit still shows a Fault status, replace the power supply unit.

f. If the issue persists after you replace the power supply unit, go to step 8.

8. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
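When you check many devices, a script can flag power supplies that are not in Normal state. The following Python sketch assumes the "Power N State: X" line format shown above and, as an additional assumption, also accepts the "Fan Frame N State: X" lines that the display fan command prints. In the procedure above, Absent units are handled in step 6 and Fault or Error units in step 7.

```python
import re

# Lines such as "Power 1 State: Absent" (display power) or "Fan Frame 0 State: Normal" (display fan).
STATE_LINE = re.compile(r"^\s*(Power|Fan Frame)\s+(\d+)\s+State:\s*(\S+)", re.MULTILINE)


def abnormal_units(output):
    """Return (unit type, number, state) for each unit whose state is not Normal."""
    return [
        (m.group(1), int(m.group(2)), m.group(3))
        for m in STATE_LINE.finditer(output)
        if m.group(3) != "Normal"
    ]


power_output = """Power 0 State: Normal
Power 1 State: Absent
Power 2 State: Absent
Power 3 State: Absent"""
print(abnormal_units(power_output))
# prints [('Power', 1, 'Absent'), ('Power', 2, 'Absent'), ('Power', 3, 'Absent')]
```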

Related alarm and log messages

Alarm messages

N/A

Log messages

· DEV/2/POWER_FAILED

· DEV/3/POWER_ABSENT

Fan issues

The fan module status is abnormal

Symptom

The fan module status LED is abnormal, or the fan frame reports a fault during operation.

Common causes

The following are the common causes of this type of issue:

· The fan is not securely plugged in.

· The chassis air intake vent and exhaust vent are blocked by foreign objects.

· Fan hardware failure.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 8.

Figure 8 Troubleshooting flowchart

Solution

1. Check whether the fan module status LEDs are normal. LED meanings vary by device model. For more information, see the hardware documentation for your device. If all the LEDs are off, check whether the power supply units are operating correctly and whether the system power switch is on. For more information about abnormal power supply status, see "Power supply is abnormal."

2. Check the fan status.

Use the display fan command to check the fan frame status.

<Sysname> display fan

Fan Frame 0 State: Normal

Use the display alarm command to view fan frame alarm messages.

<Sysname> display alarm

Chassis Slot CPU Level Info

2 - - INFO fan 1 is absent.

3. Check that the fan frame is securely installed.

4. If the fan frame state is Absent, the fan frame is not present or not securely installed. If the fan frame is present, remove and reinstall it. Before reinstalling it, check that the fan connector is intact. Then, verify that the fan frame state is Normal. If it still shows Absent, replace the fan frame. If the new fan frame also shows Absent, go to step 7.

5. Check the device's operating environment information.

6. If the fan frame state is Fault, the fan frame is malfunctioning and cannot dissipate heat. Troubleshoot further as follows:

a. Use the display environment command to check whether the system temperature keeps rising. If it does, place your hand at the air outlet of the device to check for airflow. If the temperature keeps rising and there is no airflow from the outlet, the fan frame is faulty.

b. Check whether the chassis air inlet and outlet vents are blocked by foreign objects. If they are, remove the objects.

c. Check whether the speed of each fan is normal.

Use the display fan command in any view to check whether the speed of any fan differs from its normal speed by more than 50%. If a fan speed is abnormal, confirm the issue by removing and reinstalling the fan or by swapping it with a known-good fan.

d. If you confirm a fan issue, remove and reinstall the fan frame. Before reinstalling it, check the fan connector for damage. Then, use the display fan command to check whether the fan frame returns to Normal state.

e. If the fan frame still does not return to Normal state, replace it. If no spare fan frame is available on site, power off the device to prevent overheating and circuit damage. You can continue using the device only if other cooling measures keep the system temperature below 50°C.

f. If the fan frame still does not return to Normal state after replacement, go to step 7.

7. If the issue persists, collect the following information and contact Technical Support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.
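The 50% criterion in step 6c reduces to a relative-deviation check. This Python sketch assumes you have already read each fan's current speed from the display fan output and know the nominal speed for your fan model from the hardware documentation. The readings and the 4000 RPM nominal speed below are hypothetical, and a deviation of exactly 50% is treated as normal.

```python
def speed_deviation(measured_rpm: float, nominal_rpm: float) -> float:
    """Relative deviation from the nominal speed (0.5 means 50%)."""
    if nominal_rpm <= 0:
        raise ValueError("nominal_rpm must be positive")
    return abs(measured_rpm - nominal_rpm) / nominal_rpm


def abnormal_fans(speeds: dict, nominal_rpm: float, limit: float = 0.5) -> list:
    """Return the IDs of fans whose speed deviates by more than `limit` (50% by default)."""
    return sorted(
        fan for fan, rpm in speeds.items()
        if speed_deviation(rpm, nominal_rpm) > limit
    )


if __name__ == "__main__":
    # Hypothetical readings; take real values from display fan and the
    # nominal speed from the hardware documentation for your fan model.
    readings = {1: 4000, 2: 1500, 3: 4200}
    print(abnormal_fans(readings, nominal_rpm=4000))  # [2]
```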

Related alarm and log messages

Alarm messages

N/A

Log messages

· DEV/2/FAN_FAILED

· DEV/3/FAN_ABSENT

Card issues

Abnormal card state

Symptom

· The card state is abnormal. For example, the display device command shows the card state as Absent or Fault.

· The card experiences abnormal reboots, fails to start, or keeps rebooting.

Common causes

The following are the common causes of this type of issue:

· The card is not installed properly.

· The card is damaged.

· The faceplate LEDs are abnormal.

· A power supply unit has failed.

· The power supply units provide insufficient output power.

· The host software version does not support using this card.

· The MPU is in an abnormal operating state.

· The device identifier of the service module, standby MPU, or switching fabric module does not match the active MPU.

· The switching fabric module is not in place or is in an abnormal state before the service module starts.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 9.

Figure 9 Troubleshooting flowchart

Solution

Card status: Absent

1. Verify that the card is securely inserted and that there is no gap between the card and the chassis. You can also remove and reinsert the card. Before reinserting it, check the card connector for deformation or dirt.

2. Install the card in another slot, and install a functioning card from the chassis in the original slot, to determine whether the issue is with the card.

3. Check whether the LEDs on the card faceplate are lit.

4. Verify that the power supply units provide sufficient output power. For example, add a power supply unit and check whether the card state returns to Normal.

5. Verify that the device software version supports the card.

a. Execute the display version command to view the software version of the device.

b. Contact technical support to confirm whether the current software version supports the card.

c. If the current software version does not support the card, upgrade to a version that does. Before upgrading, verify that the new version is compatible with the other cards.

6. If the card is an MPU, connect a configuration terminal to the console port. Press the RESET button on the card with a pointed tool (such as a pen tip), or reboot the card by using the reboot slot slot-number force command. Check the startup output on the configuration terminal. No output or garbled characters indicate an abnormal condition. Also verify that the card status LED returns to normal. Under normal conditions, the terminal displays output similar to the following after startup:

System is starting...

Press Ctrl+D to access BASIC-BOOTWARE MENU

Press Ctrl+T to start memory test

Booting Normal Extend BootWare........

****************************************************************************

* *

* H3C SR66 BootWare, Version 7.1.064 *

* *

****************************************************************************

Compiled Date : Apr 6 2017

CPU Type : XLS408

CPU L1 Cache : 32KB

CPU Clock Speed : 1000MHz

Memory Type : DDR2 SDRAM

Memory Size : 2048MB

Memory Speed : 533MHz

BootWare Size : 1024KB

Flash Size : 4MB

cfa0 Size : 244MB

BASIC CPLD Version : 131.0

EXTEND CPLD Version : 133.0

PCB Version : Ver.B

BootWare Validating...

Press Ctrl+B to enter extended boot menu...

7. If the card is a service module, first verify that the MPU is operating correctly, and check that the card connector is not deformed or dirty.

8. If you confirm a card failure, replace the card, collect the following information, and contact technical support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.

Card status: Power-off

1. Check whether the device has overheated. Use the display power-supply verbose command to check for powered-off cards. If the State field of an installed card displays Off, the card has been powered off by a user or by over-temperature protection.

<Sysname> display power-supply verbose

Power No. State Description

------------------------------------------------

1 Normal VAPEL-1200AC

2 Absent Unknown

Power supply information for chassis 0

------------------------------------------------

Total system power : 1200 watts

Redundant system power : 0 watts

Used system power : 0 watts

Available system power : 990 watts

Reserved system power : 210 watts

Slot Card type Used power(W) State

------------------------------------------------

0 RT-RSE-X3 50 On

1 N/A 0 Off

2 N/A 0 Off

3 N/A 0 Off

4 N/A 0 Off

5 N/A 0 Off

2. If the card was powered off because of overheating, check whether all card slots are filled. If they are, use the display fan command to verify fan operation. A fan state of Normal indicates that the fan is operating correctly. If the fan state is not Normal, or you suspect a power issue with the card, collect the following information and contact technical support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.
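To spot powered-off cards quickly in saved display power-supply verbose output, you can filter the slot table. The following Python sketch assumes the slot-row layout shown in the example above (slot, card type, used power, state) and treats N/A rows as empty slots, as that example suggests. It is a hypothetical helper, not an H3C tool.

```python
import re

# Slot rows in display power-supply verbose output, for example:
#   "0 RT-RSE-X3 50 On"  ->  slot, card type, used power (W), state
SLOT_ROW = re.compile(
    r"^[ \t]*(\d+)[ \t]+(\S+)[ \t]+(\d+)[ \t]+(On|Off)[ \t]*$", re.MULTILINE
)


def powered_off_cards(output: str) -> list:
    """Return (slot, card type) for installed cards whose State is Off.

    Rows whose card type is N/A are treated as empty slots and skipped.
    Power supply rows such as "1 Normal VAPEL-1200AC" do not match the pattern.
    """
    return [
        (int(slot), card)
        for slot, card, _used_watts, state in SLOT_ROW.findall(output)
        if state == "Off" and card != "N/A"
    ]
```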

Card status: Fault

1. Check the overall system power consumption. If the available power is insufficient, the card might enter Fault state.

2. Wait about 10 minutes and check whether the card stays in Fault state or reboots again after entering Normal state. If the card reboots automatically after entering Normal state, collect the following information and contact technical support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.

3. If the card is an MPU, connect a console cable and check the configuration terminal for normal or abnormal startup output. If the MPU fails the memory read/write test during startup and keeps rebooting with output similar to the following, check whether the memory module is securely seated:

readed value is 55555555 , expected value is aaaaaaaa

DRAM test fails at: 080ffff8

Fatal error! Please reboot the card.

4. Install the card in a different slot to determine whether the slot is faulty.

5. If you confirm that the card is faulty, replace the card, collect the following information, and contact technical support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.

Abnormal single card restart

In this section, a card reboot means that the card has rebooted and its current state is Normal.

1. Use the log messages or the uptime to determine when the card rebooted. Check whether a user executed the reboot command or power-cycled the card around that time.

2. Use the display version command to check the reason for the last reboot of the card, which is shown in the Last reboot reason field. For example, Cold reboot indicates that the card rebooted because the device was powered on.

<Sysname> display version

H3C Comware Software, Version 7.1.075, Release 7751P01

H3C SR6600-X uptime is 0 weeks, 0 days, 4 hours, 24 minutes

Last reboot reason : Cold reboot……

3. If all cards rebooted at the same time, check whether the power supply units are operating correctly. Check for power outages and verify that the power inputs are securely connected.

4. Check the log messages for warnings similar to "Warning: Standby board on slot 1 is not compatible with master board." or "Warning: The LPU board on slot 1 is not compatible with MPU board." generated during the reboot. Such a warning indicates that the device identifier of the service module, standby MPU, or switching fabric module does not match that of the active MPU. In this case, contact technical support to replace the card.

5. If you cannot determine the cause, collect the following information and contact technical support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.
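Step 1 asks you to work out when the card rebooted. One way is to subtract the uptime reported by display version from the device time at which you ran the command. The following Python sketch assumes the uptime and Last reboot reason line formats shown in the example above and is accurate only to the minute. The function names are hypothetical.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

UPTIME = re.compile(
    r"uptime is (\d+) weeks?, (\d+) days?, (\d+) hours?, (\d+) minutes?"
)
REASON = re.compile(r"Last reboot reason\s*:\s*(.+)")


def last_boot_time(version_output: str, now: datetime) -> datetime:
    """Estimate when the device last booted, to the nearest minute."""
    m = UPTIME.search(version_output)
    if m is None:
        raise ValueError("uptime line not found")
    weeks, days, hours, minutes = (int(g) for g in m.groups())
    return now - timedelta(weeks=weeks, days=days, hours=hours, minutes=minutes)


def last_reboot_reason(version_output: str) -> Optional[str]:
    """Return the text after 'Last reboot reason :', or None if absent."""
    m = REASON.search(version_output)
    return m.group(1).strip() if m else None


if __name__ == "__main__":
    out = (
        "H3C SR6600-X uptime is 0 weeks, 0 days, 4 hours, 24 minutes\n"
        "Last reboot reason : Cold reboot\n"
    )
    # "now" should be the device clock at the time the command was run.
    print(last_boot_time(out, datetime(2017, 12, 9, 10, 46)))  # 2017-12-09 06:22:00
    print(last_reboot_reason(out))  # Cold reboot
```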

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

The MPU cannot start

Symptom

The MPU fails to start.

Common causes

The following are the common causes of this type of issue:

· An MPU hardware failure prevents the MPU from powering on.

· The BootWare basic segment of the MPU is corrupted.

· Memory or CPU hardware failure prevents BootWare from running.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 10.

Figure 10 Troubleshooting flowchart

Solution

1. Check whether the MPU status LED (RUN LED) is on.

As soon as BootWare initializes, it sets the RUN LED to flash quickly. This is an important indicator of whether the system can boot.

The LEDs on different MPUs might vary slightly. For more information, see the hardware documentation for your device.

2. If the RUN LED flashes quickly, the BootWare basic segment has started correctly. Proceed to step 4.

3. If the RUN LED is off, the device might not be powered on, or the BootWare basic segment might be corrupted.

a. Check whether the device is powered on. Look into the MPU through the front air intake vent and check for flashing or steady green LEDs inside the MPU. After the device has been running for a while, remove the MPU and feel whether the CPU heat sink is warm.

b. If the MPU is not powered, check the power source and the power supply units. A hardware fault in the device might also prevent the board from powering on.

c. If the MPU powers on correctly, the BootWare basic segment might be corrupted. Return the MPU for research and development (R&D) analysis.

NOTE:

In this section, "the RUN LED is off" means that the LED has never lit since power-on. It does not apply if the LED flashes for more than 5 seconds and then goes off.

The RUN LED should never be steadily on or flash slowly (1 Hz) immediately after power-on. If it does, a hardware failure has occurred.

4. Check whether BootWare runs successfully.

◦ Check the console output for the following information. If it is present, the basic segment has run successfully. Proceed to step 5.

System is starting...

Press Ctrl+D to access BASIC-BOOTWARE MENU

Press Ctrl+T to start memory test

Booting Normal Extend BootWare........

****************************************************************************

* *

* H3C SR66 BootWare, Version 7.1.064 *

* *

****************************************************************************

Compiled Date : Apr 6 2017

CPU Type : XLS408

CPU L1 Cache : 32KB

CPU Clock Speed : 1000MHz

Memory Type : DDR2 SDRAM

Memory Size : 2048MB

Memory Speed : 533MHz

BootWare Size : 1024KB

Flash Size : 4MB

cfa0 Size : 244MB

BASIC CPLD Version : 131.0

EXTEND CPLD Version : 133.0

PCB Version : Ver.B

BootWare Validating...

◦ If there is no output, the memory or CPU might be faulty. Proceed to step 5.

5. If the issue persists, collect the following information and contact Technical Support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.
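The decision logic in steps 1 through 4 can be summarized as a small function, for example to record field observations consistently. This Python sketch is illustrative: the LED state names are hypothetical labels for what you observe, and treating the first two lines of the BootWare output as proof that the basic segment ran is an assumption based on the example above.

```python
# Lines that, per the BootWare output example, indicate that the basic
# segment has run. Which lines to require is an assumption.
BASIC_SEGMENT_MARKERS = (
    "System is starting...",
    "Press Ctrl+D to access BASIC-BOOTWARE MENU",
)


def basic_segment_ran(console_output: str) -> bool:
    """True if every basic-segment marker appears in the console output."""
    return all(marker in console_output for marker in BASIC_SEGMENT_MARKERS)


def triage_mpu(run_led: str, console_output: str = "") -> str:
    """Suggest the next action for an MPU that cannot start.

    run_led is the RUN LED behavior observed right after power-on:
    "off", "fast" (fast flashing), "slow" (1 Hz flashing), or "steady".
    """
    if run_led in ("slow", "steady"):
        return "hardware failure"
    if run_led == "off":
        return "check power; BootWare basic segment might be corrupted"
    if run_led == "fast":
        if basic_segment_ran(console_output):
            return "basic segment ran; go to step 5"
        return "possible memory or CPU fault; go to step 5"
    raise ValueError(f"unknown RUN LED state: {run_led!r}")
```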

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

The new MPU cannot start

Symptom

The device originally had one MPU. A new MPU was installed as the standby MPU, but the new MPU fails to start.

Common causes

The following are the common causes of this type of issue:

· The standby MPU and the original MPU are different models.

· The software versions of the standby MPU and the original MPU do not match.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 11.

Figure 11 Troubleshooting flowchart

Solution

1. Check whether the new MPU is the same model as the original MPU.

The two MPUs in a device must be the same model. If the models differ, replace the new MPU with an MPU of the same model as the original MPU.

2. Check whether the software version of the new MPU matches that of the original MPU.

Connect to the console port of the standby MPU and check whether the system software version loaded during startup matches that of the active MPU. If the versions differ, upgrade the standby MPU from the BootWare menu.

3. If the issue persists, collect the following information and contact Technical Support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

The MPU restarts during use and fails to boot properly

Symptom

The MPU restarts during use and fails to boot normally.

Common causes

The following are the common causes of this type of issue:

· The startup file is corrupted.

· The MPU memory unit is damaged.

· The MPU is not fully inserted or is damaged, which causes BootWare to run abnormally.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 12.

Figure 12 Troubleshooting flowchart

Solution

1. Check whether the startup files on the MPU are intact.

Log in to the faulty MPU through the console port and reboot the device. If BootWare reports a CRC error or cannot find the startup file, reload the startup file. Verify that the file in the flash memory is the same size as the file on the server. If the file does not exist or the sizes differ, reload the file. After loading, make sure the file is set as the current startup file. BootWare automatically sets the loaded file as the current startup file during the loading process.

2. Test whether the MPU memory operates correctly.

If the loaded file size is correct and the file is set as the current startup file, reboot the MPU and immediately press Ctrl+T to start the memory test. If a memory error is reported, replace the MPU.

3. Check whether BootWare still reports errors.

If the memory test passes but BootWare still reports errors during startup, use the error messages to locate the faulty component. Verify that the MPU is securely inserted. If it is, replace the MPU.

4. If the issue persists, collect the following information and contact Technical Support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.
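Step 1 relies on comparing the file size in the flash memory with the size of the file on the server. The following Python sketch parses saved dir output, assuming the row layout shown in the dir example in "Service card restarts during operation and cannot start up". The helper names are hypothetical. A size match does not prove that the file is intact, so treat it as a first check only.

```python
import re

# Regular-file rows in dir output, for example:
#   "2 -rw- 43518976 Mar 06 2024 08:17:58 SR6600X-CMW710-BOOT-F8149L19-RSE3.bin"
DIR_ROW = re.compile(
    r"^[ \t]*\d+[ \t]+-rw-[ \t]+(\d+)[ \t]+\w{3}[ \t]+\d{2}[ \t]+\d{4}"
    r"[ \t]+[\d:]+[ \t]+(\S+)[ \t]*$",
    re.MULTILINE,
)


def flash_file_sizes(dir_output: str) -> dict:
    """Map file name -> size in bytes for regular files (directories are skipped)."""
    return {name: int(size) for size, name in DIR_ROW.findall(dir_output)}


def size_matches(dir_output: str, file_name: str, server_size: int) -> bool:
    """True if file_name exists in flash and has the same size as on the server."""
    return flash_file_sizes(dir_output).get(file_name) == server_size
```

On the file server, you could obtain server_size with Python's os.path.getsize() for the image file.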

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Active/standby switchover failure

Symptom

This type of failure commonly occurs in the following situations:

· When you use the reboot command to reboot the active MPU, the standby MPU also reboots.

· An active/standby switchover fails or behaves abnormally.

Common causes

The following are the common causes of this type of issue:

· The active MPU is rebooted before the standby MPU finishes starting up.

· The standby MPU stops receiving messages from the active MPU and takes over the active role.

· An MPU fault causes the MPU to reboot.

· The active and standby MPUs run different software versions.

Troubleshooting flow

For the issue where the standby MPU also reboots when you use the reboot command to reboot the active MPU, see the troubleshooting flowchart in Figure 13.

Figure 13 Troubleshooting flowchart

Solution

If the standby MPU also reboots when you use the reboot command to reboot the active MPU, troubleshoot as follows:

1. After the active MPU starts up, use the ftp or tftp command to upload the latest log file in the logfile directory on the storage medium to a file server.

2. In the log file, locate the log for the reboot command (similar to "Command is reboot slot 0") and the log for the preceding startup (similar to "SYSLOG_RESTART:"). Check whether a message similar to "Batch backup of standby board in slot 1 has finished" appears between them.

a. If the message does not appear, the standby MPU had not finished starting up when the active MPU was rebooted, so the standby MPU could not take over and rebooted as well. This reboot is expected and requires no action. Before the next reboot, wait until the standby MPU completes batch backup, as indicated by a log similar to "Batch backup of standby board in slot 1 has finished." Then, use the reboot slot command to reboot the active MPU.

b. If the message appears, contact H3C technical support.

If an active/standby switchover fails or behaves abnormally, troubleshoot as follows:

3. Use the display system stable state command to collect the state information of the active and standby MPUs.

<H3C> display system stable state

System state : Stable

Redundancy state : Stable

Slot CPU Role State

0 0 Active Stable

1 0 Standby Stable

Check the displayed information.

a. Check whether each MPU has the expected role (Active or Standby).

b. Check whether the states of both MPUs are Stable.

4. Use the display boot-loader command to view the startup software images of the active and standby MPUs, and check whether their versions match.

5. If the issue persists, collect the following information and contact Technical Support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.
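Steps 2 and 3 can be partially automated on saved output. The following Python sketch assumes the display system stable state layout shown above (one CPU per MPU) and the log message texts quoted in step 2. The function names are hypothetical, and the log check only looks between the last reboot command and the SYSLOG_RESTART entry that precedes it.

```python
import re
from typing import List, Optional

# Slot rows in display system stable state output, for example "1 0 Standby Stable".
STATE_ROW = re.compile(
    r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(Active|Standby)[ \t]+(\S+)[ \t]*$", re.MULTILINE
)


def redundancy_ok(stable_state_output: str) -> bool:
    """True if exactly one Active and one Standby MPU are listed and both are Stable."""
    rows = STATE_ROW.findall(stable_state_output)
    roles = sorted(role for _slot, _cpu, role, _state in rows)
    return roles == ["Active", "Standby"] and all(
        state == "Stable" for _slot, _cpu, _role, state in rows
    )


def backup_finished_before_reboot(log_lines: List[str]) -> Optional[bool]:
    """Check whether batch backup finished before the last logged reboot command.

    Only the window between that reboot command and the SYSLOG_RESTART entry
    preceding it is searched. Returns None if no reboot command is logged.
    """
    reboot_idx = None
    for i, line in enumerate(log_lines):
        if "Command is reboot" in line:
            reboot_idx = i
    if reboot_idx is None:
        return None
    start = 0
    for i in range(reboot_idx - 1, -1, -1):
        if "SYSLOG_RESTART" in log_lines[i]:
            start = i
            break
    return any(
        "Batch backup of standby board" in line and "has finished" in line
        for line in log_lines[start:reboot_idx]
    )
```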

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Service card cannot start up

Symptom

A service card cannot start up.

Common causes

The following are the common causes of this type of issue:

· Abnormal operation of the switching fabric module.

· Power supply anomaly.

· The software version does not support this service card.

· The service card is not properly installed.

· Hardware failure of the service card.

· Hardware failure of the chassis slot.

Solution

1. Check whether the switching fabric module is operating correctly.

Make sure the switching fabric module is present and its state is Normal. If its state is abnormal, troubleshoot the switching fabric module first.

2. Check whether the service module is powered on.

Check the RUN LED on the service module. If the LED is off, the service module might not be powered on. In this case, troubleshoot the power supply as described in steps 3 and 4. If the service module is powered on, go to step 5.

3. Check the power supply unit LEDs to determine whether the power supply units are operating correctly.

If the LEDs are abnormal, troubleshoot as described in "Power supply is abnormal."

4. Calculate the total power consumption and check whether the remaining power capacity is sufficient.

If the power is insufficient, add power supply units.

5. Check whether the software version supports the service module.

Execute the display version command in any view to check the device’s software version. Then verify whether the current software version supports the service module. If not, upgrade to a compatible version. Before upgrading, ensure that the new version is compatible with other boards.

6. Reseat the service module.

Remove the service module, inspect the connectors for any damage, and reinsert it securely to ensure proper installation.

7. Test the service module in another slot to see if it can boot.

◦ If the service module still fails to boot in another slot, it might be faulty. Replace it with a new one for testing.

◦ If it boots successfully in another slot, install another working service module in the original slot. If that module also fails to boot, the chassis slot might be faulty.

8. If the issue persists, collect the following information and contact Technical Support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.
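For step 4, a rough budget check compares the power that new cards need with the power still available. This Python sketch assumes that the Available system power field of display power-supply verbose (see the example under "Card status: Power-off") represents the capacity left for additional cards. The per-card wattages are hypothetical inputs that you would take from the hardware documentation.

```python
import re


def available_power_watts(power_supply_output: str) -> int:
    """Read 'Available system power : N watts' from display power-supply verbose output."""
    m = re.search(r"Available system power\s*:\s*(\d+)\s*watts", power_supply_output)
    if m is None:
        raise ValueError("Available system power line not found")
    return int(m.group(1))


def power_shortfall(power_supply_output: str, new_card_watts: list) -> int:
    """Watts missing for the new cards (0 if the available power is sufficient)."""
    needed = sum(new_card_watts)
    return max(0, needed - available_power_watts(power_supply_output))
```

A nonzero result suggests adding power supply units before installing the cards.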

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Service card restarts during operation and cannot start up

Symptom

A service card restarts during operation and cannot start up.

Common causes

The following are the common causes of this type of issue:

· Abnormal power supply.

· Abnormal startup file on the MPU.

· Service module hardware failure.

· Chassis slot hardware failure.

Solution

1. Check whether the power supply units are operating correctly.

Check the power supply unit LEDs and verify that the power capacity meets the operating requirements of the card. If any power supply unit is malfunctioning, troubleshoot as described in "Power supply is abnormal."

2. Check whether the startup files on the MPU are intact.

Execute the display boot-loader command in any view to check the next-startup software package for the board. In the user view, run the dir command to confirm whether the boot software package exists. If it is missing or corrupted, obtain the correct boot package again or configure another software package as the next-startup file for the board.

<Sysname> display boot-loader

Software images on slot 0:

Current software images:

cfa0:/SR6600X-CMW710-BOOT-F8149L19-RSE3.bin

cfa0:/SR6600X-CMW710-SYSTEM-F8149L19-RSE3.bin

Main startup software images:

cfa0:/SR6600X-CMW710-BOOT-F8149L19-RSE3.bin

cfa0:/SR6600X-CMW710-SYSTEM-F8149L19-RSE3.bin

Backup startup software images:

None

<Sysname> dir

Directory of cfa0: (VFAT)

0 -rw- 4944 Mar 06 2024 08:35:06 20210430.cfg

1 -rw- 94704 Mar 06 2024 08:35:06 20210430.mdb

2 -rw- 43518976 Mar 06 2024 08:17:58 SR6600X-CMW710-BOOT-F8149L19-RSE3.bin

3 -rw- 317644800 Mar 06 2024 08:26:24 SR6600X-CMW710-SYSTEM-F8149L19-RSE3.bin

4 -rw- 361170944 Mar 06 2024 08:12:42 SR6600X-RSE3.ipe

5 drw- - Apr 22 2021 23:32:50 diagfile

6 drw- - Oct 11 2022 18:49:54 dpi

7 -rw- 296 Mar 06 2024 08:35:06 ifindex.dat

8 drw- - Jan 11 2021 10:11:48 license

9 drw- - Oct 25 2022 02:09:00 logfile

10 drw- - Mar 06 2024 17:35:30 pki

11 drw- - Jan 11 2021 10:11:48 seclog

12 drw- - Mar 06 2024 07:24:46 tracefile

13 drw- - Apr 29 2021 17:39:16 versioninfo

1020068 KB total (304488 KB free)

3. Test whether a normally functioning service board can boot in the faulty slot.

If the boot files loaded on the service board are confirmed to be normal, insert another working service board into the problematic slot for testing (if conditions permit).

If the inserted working service board boots successfully, this rules out issues with the MPU or backplane. Proceed to step 4.

If the inserted working service board still fails to boot, replace the MPU.

4. Check for loading logs.

Execute the display logbuffer command in any view to check the log buffer for software loading logs of the card in the corresponding slot.

<Sysname> display logbuffer

%Jan 12 19:13:49:513 2022 H3C DEV/4/BOARD_LOADING: -MDC=1; Board in slot 1 is loading software images.

%Jan 12 19:14:01:718 2022 H3C DEV/5/LOAD_FINISHED: -MDC=1; Board in slot 1 has finished loading software images.

If loading logs for the board in the corresponding slot exist, relocate the service board to another slot to see if it can boot normally.

If no loading logs are found for the board in the corresponding slot, proceed to step 5.

5. If the issue persists, collect the following information and contact Technical Support:

◦ Results of each step.

◦ The configuration file, log messages, and alarm messages.
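Steps 2 and 4 involve cross-checking saved command output. The following Python sketch assumes the display boot-loader, dir, and display logbuffer formats shown in this section. It treats the last field of each dir row as the file name, so long names that dir wraps across lines would need extra handling. The helper names are hypothetical.

```python
import re


def main_startup_images(bootloader_output: str) -> list:
    """File names listed under 'Main startup software images:' (storage prefix removed)."""
    names, in_main = [], False
    for raw in bootloader_output.splitlines():
        line = raw.strip()
        if line.startswith("Main startup software images"):
            in_main = True
        elif line.endswith("images:"):
            in_main = False  # next section, for example Backup startup software images
        elif in_main and line:
            names.append(line.split("/")[-1])  # "cfa0:/X.bin" -> "X.bin"
    return names


def missing_images(bootloader_output: str, dir_output: str) -> list:
    """Main startup images that do not appear as the last field of any dir row."""
    present = {line.split()[-1] for line in dir_output.splitlines() if line.split()}
    return [n for n in main_startup_images(bootloader_output) if n not in present]


LOAD_EVENT = re.compile(r"DEV/\d+/(BOARD_LOADING|LOAD_FINISHED):.*Board in slot (\d+) ")


def loading_events(logbuffer: str, slot: int) -> list:
    """Loading log mnemonics for the given slot, in log order."""
    events = []
    for line in logbuffer.splitlines():
        m = LOAD_EVENT.search(line)
        if m and int(m.group(2)) == slot:
            events.append(m.group(1))
    return events
```

An empty result from loading_events() for the slot corresponds to the "no loading logs" case in step 4.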

Related alarm and log messages

Alarm messages

N/A

Log messages

· DEV/4/BOARD_LOADING

· DEV/5/LOAD_FINISHED

Port issues

The port experiences a CRC error

Symptom

The display interface command output shows CRC errors on the port.

<Sysname> display interface gigabitethernet2/0/1

GigabitEthernet2/0/1

Current state: DOWN

Line protocol state: DOWN

Description: GigabitEthernet2/0/1 Interface

Bandwidth: 1000000 kbps

Maximum transmission unit: 1500

Allow jumbo frames to pass

Broadcast max-ratio: 100%

Multicast max-ratio: 100%

Unicast max-ratio: 100%

Internet address: 2.1.1.2/24 (primary)

IP packet frame type: Ethernet II, hardware address: 0000-fc00-9276

IPv6 packet frame type: Ethernet II, hardware address: 0000-fc00-9276

Loopback is not set

Media type is twisted pair, port hardware type is 1000_BASE_T

Promiscuous mode is not set

Port priority: 0

1000Mbps-speed mode, full-duplex mode

Link speed type is autonegotiation, link duplex type is autonegotiation

Flow-control is not enabled

Maximum frame length: 9216

Output queue - Urgent queuing: Size/Length/Discards 0/1024/0

Output queue - Protocol queuing: Size/Length/Discards 0/500/0

Output queue - FIFO queuing: Size/Length/Discards 0/1024/0

Last link flapping: 6 hours 39 minutes 28 seconds

Last hardware down reason: PHY line side is down

Last clearing of counters: Never

Current system time:2017-12-09 10:46:24

Last time when physical state changed to up:-

Last time when physical state changed to down:2017-12-09 10:25:30

Peak input rate: 8 bytes/sec, at 2019-03-19 09:20:48

Peak output rate: 1 bytes/sec, at 2019-03-19 09:16:16

Last 300 second input: 0 packets/sec 0 bytes/sec -%

Last 300 second output: 0 packets/sec 0 bytes/sec -%

Input (total): 2892 packets, 236676 bytes

24 unicasts, 2 broadcasts, 2866 multicasts, 0 pauses

Input (normal): 2892 packets, - bytes

24 unicasts, 2 broadcasts, 2866 multicasts, 0 pauses

Input: 0 input errors, 0 runts, 0 giants, 0 throttles

3 CRC, 0 frame, - overruns, 0 aborts

- ignored, - parity errors

Output (total): 29 packets, 1856 bytes

24 unicasts, 5 broadcasts, 0 multicasts, 0 pauses

Output (normal): 29 packets, - bytes

24 unicasts, 5 broadcasts, 0 multicasts, 0 pauses

Output: 0 output errors, - underruns, - buffer failures

0 aborts, 0 deferred, 0 collisions, 0 late collisions

0 lost carrier, - no carrier

The output shows that the port has received 3 packets with CRC errors.

Common causes

· The port and the cable connector are poorly connected.

· The port is faulty.

· The cable connector is damaged.

· The transceiver module or optical fiber is dirty or poorly connected.

· Insufficient optical power.

· Intermediate link or device failure.

· Device or board hardware failure.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 14.

Figure 14 Troubleshooting flowchart

Solution

1. Use the port to perform an internal loopback check.

Configure the loopback internal command on the port to enable the internal loopback function. Then, use the display interface command to Identify whether the port's CRC error packet statistics increase. If growth occurs, a device or hardware fault may exist. Please contact technical support personnel. If there is no growth, then it is not an internal port issue.

2. Check for any abnormalities between the port and the cable connector.

a. Check the physical connection of the port and cable connector for any loose connections. If there is a loose connection, connect the port and cable connector properly.

b. Check the port for abnormalities, such as foreign objects, bent pins, or deformed housings. If there is an issue, replace it with another functioning port or transceiver module.

c. Check the cable connector for any damage. If you notice any damage, replace the cable.

3. Check the transceiver module for any abnormalities.

a. Connect the Tx and Rx ends of the transceiver module for this port using fiber optic. Then, use the display interface command to Identify whether the port's CRC error packet statistics increase. If there is growth, the issue may be with the transceiver module. If there is no growth, the issue does not lie with the transceiver module.

b. Use the display transceiver alarm command to check for Rx_Los or Tx_Fault alarm messages in the transceiver module. If you find any alarm messages, clean or replace the fiber optic or transceiver module.

c. Use the display transceiver diagnosis command to Identify whether the transceiver module's receive (Rx) and transmit power are within the specified maximum and minimum range. If the receive or transmit power exceeds the range, clean or replace the fiber optic or transceiver module.

4. Replace the normal port to test if recovery is possible.

Test by replacing with another normal port. If the packet loss disappears after the change and reappears when switching back, replace the port due to hardware fault and send the fault information to technical support personnel for analysis. If packet loss persists on other normal ports, a link fault in the transmission link is likely.

5. Identify whether the transmission link is functioning properly.

Use a test instrument to test the intermediate link. Poor link quality or excessive signal degradation can cause transmission errors. Verify that the intermediate devices on the link (optical transceivers, patch panels, transmission devices, and so on) are working correctly. If the transmission link is faulty, replace or repair it.

6. Execute the shutdown command and then the undo shutdown command on the port, and identify whether the port recovers.

7. If the issue persists, the device or board hardware might be faulty. Collect information and contact Technical Support.
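The isolation order in the steps above can be captured as a small decision helper. This is a hypothetical Python sketch, not an H3C tool: the CrcChecks fields are illustrative names for observations you record by hand (for example, two CRC counter readings copied from display interface output before and after a test), and the mapping simply mirrors steps 1 through 5.

```python
from dataclasses import dataclass


@dataclass
class CrcChecks:
    """Observations recorded while following the CRC troubleshooting steps.

    Field names are illustrative; set each one from your own test results.
    """
    loopback_crc_grew: bool   # step 1: CRC count grew with loopback internal
    connector_fault: bool     # step 2: loose, dirty, or damaged port/connector
    selfloop_crc_grew: bool   # step 3a: CRC count grew with Tx fibered to Rx
    module_alarm: bool        # step 3b: Rx_Los or Tx_Fault alarm present
    power_out_of_range: bool  # step 3c: Rx or Tx power outside thresholds
    loss_follows_port: bool   # step 4: loss returns only on the original port


def counter_grew(before: int, after: int) -> bool:
    """Return True if a counter increased between two readings."""
    return after > before


def suspected_fault(c: CrcChecks) -> str:
    """Return the component to investigate first, in the order of steps 1-5."""
    if c.loopback_crc_grew:
        return "device or board hardware"
    if c.connector_fault:
        return "port or cable connector"
    if c.selfloop_crc_grew or c.module_alarm or c.power_out_of_range:
        return "transceiver module or fiber"
    if c.loss_follows_port:
        return "original port hardware"
    return "transmission link"


# Example: CRC errors grow only when the module's Tx is looped to its Rx.
checks = CrcChecks(
    loopback_crc_grew=counter_grew(1200, 1200),
    connector_fault=False,
    selfloop_crc_grew=counter_grew(1200, 1850),
    module_alarm=False,
    power_out_of_range=False,
    loss_follows_port=False,
)
print(suspected_fault(checks))  # transceiver module or fiber
```

If every check is negative, the sketch points at the transmission link, which matches step 5. Steps 6 and 7 still apply afterward.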

Related alarm and log messages

Alarm messages

N/A

Log messages

Number of CRC error packets exceeded the high threshold: Interface Name GigabitEthernet2/0/1, High threshold 1000, Number of CRC error packets 6611063, Interval 10s.
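To track this log across many interfaces, you can extract its fields programmatically. The following Python sketch assumes the message layout matches the single sample above exactly. Other software versions might word it differently, so treat the regular expression as an assumption. The per-second rate also assumes the reported count was accumulated within the reported interval.

```python
import re
from typing import Optional

# Field layout copied from the sample log above; other versions may differ.
CRC_LOG = re.compile(
    r"Interface Name (?P<ifname>[^,\s]+), "
    r"High threshold (?P<threshold>\d+), "
    r"Number of CRC error packets (?P<count>\d+), "
    r"Interval (?P<interval>\d+)s"
)


def parse_crc_log(line: str) -> Optional[dict]:
    """Extract the interface, threshold, count, and interval from a CRC log line."""
    m = CRC_LOG.search(line)
    if m is None:
        return None  # not a CRC threshold message
    count = int(m["count"])
    interval = int(m["interval"])
    return {
        "interface": m["ifname"],
        "threshold": int(m["threshold"]),
        "count": count,
        "interval_s": interval,
        # Average rate, assuming the count covers only this interval.
        "errors_per_s": count / interval,
    }


sample = ("Number of CRC error packets exceeded the high threshold: "
          "Interface Name GigabitEthernet2/0/1, High threshold 1000, "
          "Number of CRC error packets 6611063, Interval 10s.")
print(parse_crc_log(sample)["errors_per_s"])  # 661106.3
```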

The port does not receive packets

Symptom

The port is up but does not receive packets or drops packets.

The display interface output shows that the number of packets received by the local port is smaller than the number of packets sent by the peer port.
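One way to quantify this symptom is to read the local inbound counter and the peer's outbound counter at about the same moments, twice, and compare the deltas. The Python functions below are a hypothetical sketch. They assume a direct point-to-point link and that neither counter is cleared between readings.

```python
from typing import Tuple


def missing_packets(local_rx: Tuple[int, int], peer_tx: Tuple[int, int]) -> int:
    """Return how many packets the peer sent in the window but the local port did not count.

    Each argument is a (first_reading, second_reading) pair of packet counters,
    taken from display interface output on each end at roughly the same moments.
    """
    received = local_rx[1] - local_rx[0]
    sent = peer_tx[1] - peer_tx[0]
    if received < 0 or sent < 0:
        raise ValueError("a counter decreased; was it cleared between readings?")
    return sent - received


def loss_ratio(local_rx: Tuple[int, int], peer_tx: Tuple[int, int]) -> float:
    """Fraction of the peer's packets not counted locally (0.0 if none were sent)."""
    sent = peer_tx[1] - peer_tx[0]
    return missing_packets(local_rx, peer_tx) / sent if sent else 0.0


# The peer sent 10,000 packets in the window, but only 9,200 were counted locally.
print(missing_packets((50_000, 59_200), (80_000, 90_000)))  # 800
print(loss_ratio((50_000, 59_200), (80_000, 90_000)))       # 0.08
```

Because the two devices are read separately, small differences can come from reading skew. Repeat the comparison before concluding that packets are lost.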

Common causes

· The port has CRC errors.

· The configuration on the port affects packet reception.

· Device or board hardware failure.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 15.

Figure 15 Troubleshooting flowchart

Solution

1. Identify whether the port has CRC errors.

For the troubleshooting procedure, see "Port CRC Errors."

2. Identify whether the port configuration affects packet reception.

Perform the following steps to identify whether the port configuration affects packet reception:

a. Use the display interface brief command to check the port configuration for anomalies, including the duplex mode on both ends, the port type, and the VLAN settings. If you find an issue, modify the port attribute configuration and identify whether the port recovers. If the port does not recover, execute the shutdown command and then the undo shutdown command on the port, and identify whether the port recovers.

b. For a Layer 2 port with STP enabled, use the display stp brief command to identify whether the port is in discarding state. If STP has placed the port in discarding state, investigate further based on the STP configuration. For a port connected to terminal equipment (TE), configure the port as an edge port or disable STP on the port.

c. If the port is a member of an aggregation group, use the display link-aggregation summary command to identify whether the port is in Selected state. An Unselected port cannot send or receive data. Identify why the port is Unselected by checking whether the attribute configuration of the member port is inconsistent with that of the reference port, and resolve the inconsistency.

d. If ACL filtering is configured, investigate further based on the ACL settings.

e. If traffic control is enabled on the interface, disable it and identify whether the port recovers.

3. Execute the shutdown command and then the undo shutdown command on the port, and identify whether the port recovers.

4. If the issue persists, the device or hardware might be faulty. Collect information and contact Technical Support.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

The port does not send packets

Symptom

The port is up but does not send packets.

The display interface output shows that the outbound packet statistics of the local port do not increase.

Common causes

· Transceiver module failure.

· The configuration on the port affects packet transmission.
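To confirm the symptom rather than rely on a single reading, take several readings of the outbound packet counter from display interface output over a short period. This Python sketch (the function name is hypothetical) reports a stall only if the counter never increases across the readings.

```python
from typing import Sequence


def tx_stalled(readings: Sequence[int]) -> bool:
    """Return True if the outbound packet counter never increased across the readings.

    `readings` are successive values of the port's output packet counter,
    for example copied from display interface output a few seconds apart.
    """
    if len(readings) < 2:
        raise ValueError("take at least two readings")
    return all(later <= earlier for earlier, later in zip(readings, readings[1:]))


print(tx_stalled([4_810, 4_810, 4_810]))  # True: symptom confirmed
print(tx_stalled([4_810, 4_810, 4_902]))  # False: the port is sending
```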

· Device or board hardware failure.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 16.

Figure 16 Troubleshooting flowchart

Solution

1. Perform internal loopback checks on the port.

Execute the loopback internal command on the port to enable internal loopback. Then, use the display interface command to identify whether the outbound packet statistics increase. If they do not increase, the device or hardware might be faulty. Contact Technical Support. If they increase, the issue is not internal to the port.

2. Identify whether the port configuration affects packet transmission.

Perform the following steps to identify whether the port configuration affects packet transmission:

a. For a Layer 2 port with STP enabled, use the display stp brief command to identify whether the port is in discarding state. If STP has placed the port in discarding state, investigate further based on the STP configuration. For a port connected to terminal equipment, configure the port as an edge port or disable STP on the port.

b. If the port is a member of an aggregation group, use the display link-aggregation summary command to identify whether the port is in Selected state. An Unselected port cannot send or receive data. Identify why the port is Unselected by checking whether the attribute configuration of the member port is inconsistent with that of the reference port, and resolve the inconsistency.

c. If ACL filtering is configured, investigate further based on the ACL settings.

d. If traffic control is enabled on the interface, disable it and identify whether the port recovers.

3. Execute the shutdown command and then the undo shutdown command on the port, and identify whether the port recovers.

4. If the issue persists, the device or hardware might be faulty. Collect information and contact Technical Support.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Copper port is not up

Symptom

A copper port cannot come up after a network cable is connected to it.

Common causes

The following are the common causes of this type of issue:

· Port configuration issue.

· The network cable is faulty.

· The local port or the remote port is faulty.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 17.

Figure 17 Troubleshooting flowchart

Solution

1. Identify whether the port settings (speed, duplex, negotiation mode, and so on) are consistent on both ends of the network cable. Execute the display interface brief command to compare the speed and duplex settings of both ends. If they do not match, use the speed and duplex commands to configure the speed and duplex mode of the port.

<Sysname> display interface brief

Brief information on interfaces in route mode:

Link: ADM - administratively down; Stby - standby

Protocol: (s) - spoofing

Interface Link Protocol Primary IP Description

GE2/0/1 DOWN DOWN --

Loop0 UP UP(s) 2.2.2.9

NULL0 UP UP(s) --

Vlan1 UP UP --

Vlan999 UP UP 192.168.1.42

Brief information on interfaces in bridge mode:

Link: ADM - administratively down; Stby - standby

Speed: (a) - auto

Duplex: (a)/A - auto; H - half; F - full

Type: A - access; T - trunk; H - hybrid

Interface Link Speed Duplex Type PVID Description

GE2/0/2 DOWN auto A A 1 aaaaaaa

GE2/0/3 UP 1G(a) F(a) A 1 aaaaaaa

2. Use the display interface command to identify whether the port state is Administratively DOWN. If it is, execute the undo shutdown command to bring up the Ethernet port.

<Sysname> display interface gigabitethernet 2/0/1

GigabitEthernet2/0/1

Current state: Administratively DOWN

Line protocol state: DOWN

Description: GigabitEthernet2/0/1 Interface

Bandwidth: 1000000 kbps

Maximum transmission unit: 1500

Allow jumbo frames to pass

Broadcast max-ratio: 100%

Multicast max-ratio: 100%

Unicast max-ratio: 100%

Internet protocol processing: Disabled

...

3. Replace the network cable with one that is known to work, and identify whether the issue is resolved.

4. Connect the cable to other ports on the local and remote devices, and identify whether the issue is resolved.

5. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
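Steps 1 and 2 can also be checked against saved command output. The Python sketch below is illustrative only: it assumes the bridge-mode row layout and the Current state line shown in the sample output in those steps, and the function names and the peer row are hypothetical. It treats a value such as 1G(a) as equal to 1G, because (a) only marks an auto-negotiated result.

```python
from typing import List, NamedTuple


class BridgeRow(NamedTuple):
    """One bridge-mode row of display interface brief output."""
    interface: str
    link: str
    speed: str
    duplex: str
    port_type: str
    pvid: str
    description: str


def parse_bridge_row(line: str) -> BridgeRow:
    """Split a row laid out as: Interface Link Speed Duplex Type PVID Description."""
    parts = line.split()
    if len(parts) < 6:
        raise ValueError(f"unexpected row: {line!r}")
    # The description may contain spaces, so join whatever follows PVID.
    return BridgeRow(*parts[:6], " ".join(parts[6:]))


def _value(field: str) -> str:
    # "(a)" only marks an auto-negotiated result, so "1G(a)" and "1G" match.
    return field.replace("(a)", "")


def settings_mismatch(local: BridgeRow, remote: BridgeRow) -> List[str]:
    """List speed or duplex differences between the two ends of a cable."""
    problems = []
    if _value(local.speed) != _value(remote.speed):
        problems.append(f"speed {local.speed} vs {remote.speed}")
    if _value(local.duplex) != _value(remote.duplex):
        problems.append(f"duplex {local.duplex} vs {remote.duplex}")
    return problems


def admin_down(display_interface: str) -> bool:
    """Return True if display interface output reports Administratively DOWN."""
    for line in display_interface.splitlines():
        if line.strip().startswith("Current state:"):
            return "Administratively DOWN" in line
    return False


local = parse_bridge_row("GE2/0/3 UP 1G(a) F(a) A 1 aaaaaaa")
remote = parse_bridge_row("GE1/0/8 UP 1G H A 1")  # hypothetical peer row
print(settings_mismatch(local, remote))  # ['duplex F(a) vs H']
print(admin_down("GigabitEthernet2/0/1\nCurrent state: Administratively DOWN"))  # True
```

A duplex mismatch like the one in the example typically arises when one end auto-negotiates and the other end is fixed, which is why step 1 compares both ends.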

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Frequent port up/down events

Symptom

After an All-in-one cable or transceiver module is installed on the card, the port frequently goes up and down.

Common causes

The following are the common causes of this type of issue:

· Transceiver module or All-in-one cable failure.

· Auto-negotiation on the copper port is unstable.

· The clocks on the two ends of the WAN port are incorrectly configured.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 18.

Figure 18 Troubleshooting flowchart

Solution

1. For a fiber port, identify whether the transceiver module is working correctly. Check the alarms of the transceiver modules to troubleshoot the modules on both ends and the fiber between them. If an alarm indicates a receive issue, check the peer port, the fiber, or the transmission equipment. If an alarm indicates a transmit issue or abnormal current or voltage, check the local port.

<Sysname> display transceiver alarm interface gigabitethernet 2/0/1

GigabitEthernet2/0/1 transceiver current alarm information:

RX loss of signal

RX power low

2. Check whether the receive (Rx) and transmit (Tx) optical power of the transceiver module are normal, that is, within the upper and lower thresholds. If the Tx power is at or near a threshold, replace the fiber and the transceiver module for cross-verification. If the Rx power is at or near a threshold, check the remote transceiver module and the intermediate fiber link.

<Sysname> display transceiver diagnosis interface gigabitethernet 2/0/1

GigabitEthernet2/0/1 transceiver diagnostic information:

Current diagnostic parameters:

Temp(°C) Voltage(V) Bias(mA) RX power(dBm) TX power(dBm)

36 3.31 6.13 -35.64 -5.19

Alarm thresholds:

Temp(°C) Voltage(V) Bias(mA) RX power(dBm) TX power(dBm)

High 50 3.55 1.44 -10.00 5.00

Low 30 3.01 1.01 -30.00 0.00

3. For a copper port, auto-negotiation might be unstable. In this case, configure a fixed speed and duplex mode on both ends.

4. For a WAN port, identify whether the clocks on both ends are configured. Configure the end whose MPU has a clock card as the master and the other end as the slave.

5. If the issue persists, check the link, the devices at both ends, and the intermediate equipment.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
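The threshold comparison in step 2 can be expressed as a short Python sketch. It assumes the five-column layout of the sample display transceiver diagnosis output (temperature, voltage, bias, Rx power, Tx power). The optional margin parameter is a hypothetical way to flag values close to a threshold, which step 2 describes as a critical value. It applies one margin in each parameter's own units, so choose a value that suits the column you care about.

```python
from typing import Dict

FIELDS = ("temp_c", "voltage_v", "bias_ma", "rx_power_dbm", "tx_power_dbm")


def parse_row(line: str) -> Dict[str, float]:
    """Parse one row of five numbers, optionally prefixed with High or Low."""
    tokens = line.split()
    if tokens and tokens[0] in ("High", "Low"):
        tokens = tokens[1:]
    values = [float(t) for t in tokens]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def classify(current: Dict[str, float], high: Dict[str, float],
             low: Dict[str, float], margin: float = 0.0) -> Dict[str, str]:
    """Label each parameter high, low, marginal (within `margin` of a threshold), or normal."""
    result = {}
    for name in FIELDS:
        value = current[name]
        if value > high[name]:
            result[name] = "high"
        elif value < low[name]:
            result[name] = "low"
        elif high[name] - value < margin or value - low[name] < margin:
            result[name] = "marginal"
        else:
            result[name] = "normal"
    return result


# Rows copied from the sample output in step 2.
current = parse_row("36 3.31 6.13 -35.64 -5.19")
high = parse_row("High 50 3.55 1.44 -10.00 5.00")
low = parse_row("Low 30 3.01 1.01 -30.00 0.00")
print(classify(current, high, low))
```

Applied to the sample output, the sketch reports Rx power and Tx power as low and bias current as high, with temperature and voltage normal. The low Rx power is consistent with the RX power low alarm shown in step 1.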

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Transceiver module issues

Fiber port is not up

Symptom

The fiber port is not up.

Common causes

· The current software version of the device does not support the transceiver module.

· Foreign objects exist in the fiber port, or the gold fingers of the transceiver module are contaminated or damaged.

· The transceiver module does not match the interface rate.

· Fiber port failure.

· Transceiver module or All-in-one cable failure.

· The transceiver module does not match the fiber type.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 19.

Figure 19 Troubleshooting flowchart

Solution

1. Identify whether the current software version of the device supports the transceiver module.

Check the Installation Manual (IM) or Release Notes to identify whether the current software version supports the transceiver module. If a later version supports the module, upgrade the software.

2. Identify whether the transceiver module matches the interface rate and duplex mode.

Execute the display interface command to identify whether the rate and duplex settings of the port match those of the transceiver module. If they do not match, use the speed and duplex commands to configure the speed and duplex mode of the port.

3. Identify whether the fiber port is faulty.

On this device, use a short reach (SR) All-in-one cable of the matching rate to directly connect the fiber port to another fiber port of the same rate, and identify whether the port comes up. If it comes up, the remote port is abnormal. If it does not, the local port is abnormal. You can also swap the local and remote ports to identify whether the issue is resolved.

4. Check for any issues with the transceiver module or all-in-one cable.

Check for abnormalities in the transceiver module or All-in-one cable using the following steps:

a. Use the display transceiver alarm interface command to view the alarms of the transceiver module on the port. If None is displayed, no fault exists. If alarms exist, use them to determine whether the issue lies with the transceiver module, the fiber, or the peer end. For example, if RX loss of signal and TX fault alarms exist, check the fiber port for foreign objects and check the gold fingers of the transceiver module for severe oxidation.

b. Use the display transceiver interface command to identify whether the transceiver module types, wavelengths, and transfer distances on both ends match.

c. Use the display transceiver diagnosis interface command to identify whether the current values of the diagnostic parameters of the transceiver module are within the normal range. Common causes of abnormal parameter values and their solutions are as follows:

- If the fiber is in poor contact with the transceiver module, secure the connection.

- If the fiber is of poor quality or damaged, replace it.

- If optical attenuators are added to the transmission path, adjust them based on actual usage.

- If the transmission distance of the transceiver module differs significantly from the actual distance, replace the module with one that matches the actual distance.

5. Verify that the transceiver module type matches the fiber type.

Use the H3C transceiver module manual to identify whether the transceiver module type matches the fiber type. If they do not match, replace the fiber.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
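Step 4b compares the transceiver modules on both ends. The following Python sketch assumes you have copied the type, wavelength, and transfer distance of each module from display transceiver interface output into dictionaries. The key names and example values are hypothetical. Exact equality is a simplification: BiDi modules, for example, are deployed in pairs with different wavelengths, so use the ignore parameter for fields that are expected to differ.

```python
from typing import Dict, Iterable, List

# Hypothetical keys; fill the values in from display transceiver interface
# output on each end of the link.
KEYS = ("type", "wavelength_nm", "transfer_distance")


def module_mismatches(local: Dict[str, str], remote: Dict[str, str],
                      ignore: Iterable[str] = ()) -> List[str]:
    """Return the keys whose values differ between the two ends."""
    skip = set(ignore)
    return [k for k in KEYS if k not in skip and local.get(k) != remote.get(k)]


# Hypothetical values for a link with mismatched modules.
local = {"type": "10G_BASE_SR_SFP", "wavelength_nm": "850", "transfer_distance": "300m"}
remote = {"type": "10G_BASE_LR_SFP", "wavelength_nm": "1310", "transfer_distance": "10km"}
print(module_mismatches(local, remote))  # ['type', 'wavelength_nm', 'transfer_distance']
```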

Related alarm and log messages

Alarm messages

N/A

Log messages

· OPTMOD/3/CFG_ERR

· OPTMOD/5/CHKSUM_ERR

· OPTMOD/5/IO_ERR

· OPTMOD/4/FIBER_SFPMODULE_INVALID

· OPTMOD/4/FIBER_SFPMODULE_NOWINVALID

· OPTMOD/5/MOD_ALM_ON

· OPTMOD/5/RX_ALM_ON

· OPTMOD/5/RX_POW_HIGH

· OPTMOD/5/RX_POW_LOW

System logs contain information about non-H3C transceiver modules

Symptom

Use the display logbuffer command to view the system logs. You might find messages indicating that a transceiver module is not an H3C module. The log messages display the following information:

This transceiver is NOT sold by H3C. H3C therefore shall NOT guarantee the normal function of the device or assume the maintenance responsibility thereof!

Common causes

The transceiver module is a third-party module or a counterfeit H3C module.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 20.

Figure 20 Troubleshooting flowchart

Solution

1. Identify whether the transceiver module is an H3C model.

a. Check the label on the transceiver module to determine whether it is an H3C module.

b. Use the display transceiver interface command to identify whether the Vendor Name field is H3C. If it is H3C, the module might be an H3C transceiver module without an electronic label, or it might not be an H3C module, and further verification is required. If another vendor name is displayed, the module is not an H3C module. Replace it with an H3C transceiver module and identify whether the issue is resolved.

c. Confirm with H3C Technical Support whether the module is an H3C transceiver module.

In probe view, use the display hardware internal transceiver register interface and display transceiver information interface commands to collect transceiver module information. Then, provide the collected information and the bar code on the transceiver module to H3C Technical Support to confirm the source of the module. If the module is confirmed not to be an H3C module, replace it with an H3C transceiver module and identify whether the issue is resolved.

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ Device log and alarm messages.
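Step 1b can be automated when you review saved output. The Python sketch below assumes that display transceiver interface prints one "Field : value" pair per line and includes a Vendor Name field, as the step describes. The helper names and the sample text are hypothetical. As in the step, a vendor name of H3C is not proof of origin, so the sketch only narrows down the next action.

```python
from typing import Dict


def parse_fields(output: str) -> Dict[str, str]:
    """Turn 'Field : value' lines into a dict, splitting at the first colon."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def next_action(output: str) -> str:
    """Suggest the next step of the procedure from the Vendor Name field."""
    vendor = parse_fields(output).get("Vendor Name")
    if vendor is None:
        return "Vendor Name not found: check the output format"
    if vendor.upper() == "H3C":
        # Per step 1b, this alone does not prove the module is genuine.
        return "verify with Technical Support (register dump and bar code)"
    return "not an H3C module: replace it with an H3C module"


# Hypothetical output layout and vendor name, for illustration only.
sample = """GigabitEthernet2/0/1 transceiver information:
  Transceiver Type              : 1000_BASE_SX_SFP
  Vendor Name                   : ACME
"""
print(next_action(sample))  # not an H3C module: replace it with an H3C module
```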

Related alarm and log messages

Alarm messages

N/A

Log messages

OPTMOD/4/PHONY_MODULE

The transceiver module does not support digital diagnostics

Symptom

When you use the display transceiver diagnosis interface command to view the diagnostic information of a transceiver module, the system indicates that the module does not support digital diagnostics. The output is as follows:

<Sysname> display transceiver diagnosis interface Twenty-FiveGiGE2/0/1

The transceiver does not support this function.

Common causes

· The transceiver module is a non-H3C transceiver module.

· The transceiver module does not support digital diagnostics.

· Transceiver module failure.

· Device or fiber port failure.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 21.

Figure 21 Troubleshooting flowchart

Solution

1. Determine if it is an H3C transceiver module. For detailed steps, see "System logs contain information about non-H3C transceiver modules."

2. Use the display transceiver interface command to identify whether the Digital Diagnostic Monitoring field is YES. If it is YES, the transceiver module supports digital diagnostics. Otherwise, it does not.

3. Insert a transceiver module of the same model into another functioning port on this device or into another working device that supports the module. Identify whether the message indicating that digital diagnostics are not supported still appears.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ Device alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Lost transceiver module serial number

Symptom

The display transceiver manuinfo interface command output shows that the transceiver module serial number is missing.

Common causes

· The transceiver module is not securely inserted.

· Transceiver module or device failure.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 22.

Figure 22 Troubleshooting flowchart

Solution

1. Identify whether the transceiver module is fully inserted into the fiber port.

If it is not, reinsert the transceiver module securely or use another fiber port.

2. Identify whether the transceiver module is faulty.

To do so, insert a transceiver module of the same model into this port, or insert this transceiver module into another functioning device that supports it, and check whether the serial number is displayed.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ Device alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting fundamental issues

Login management issues

Forgetting the login password for the console port

Symptom

When local password authentication or AAA local authentication is used for console login, you cannot successfully log in to the device through the console port due to an incorrect password.

Common causes

The following are the common causes of this type of issue:

· You forget the login password for the console port or enter an incorrect password.

· The login account for the console port has expired.

Troubleshooting flow

Figure 23 shows the troubleshooting flowchart.

Figure 23 Flowchart for troubleshooting the issue of forgetting the login password for the console port

Solution

1. Verify that you can log in to the device through Telnet or Stelnet.

If you have a user account assigned the Telnet or Stelnet service and the network-admin or level-15 user role, you can use this account to log in to the device through Telnet or Stelnet and modify the settings related to console login. The procedure is as follows:

a. Use the account assigned the Telnet or Stelnet service to log in to the device and execute the display line command to view the authentication mode of the user line for the console port.

<Sysname> display line

Idx Type Tx/Rx Modem Auth Int Location

0 CON 0 9600 - P - 0/0

+ 81 VTY 0 - N - 0/0

...

If the value for the Auth field is P, the authentication mode is local password authentication. If the value for this field is A, the authentication mode is AAA (scheme) authentication.

b. Verify that the user account you use has the network-admin or level-15 user role.

If you log in to the device on a user line that uses local password authentication or does not require authentication, you can enter the view of that user line to identify whether the user line is assigned the network-admin or level-15 user role. If you log in to the device on a user line that uses scheme authentication, the user roles are assigned by AAA. You must check the authorization attributes assigned to your user account to identify whether the user account is assigned the network-admin or level-15 user role. For local authentication, the user account is configured on the device. For remote authentication, the user account is configured on a remote server.

<Sysname> system-view

[Sysname] line vty 0

[Sysname-line-vty0] display this

line con 0

authentication-mode password

user-role network-admin

line vty 0 63

authentication-mode none

user-role network-admin

return

If your user account is not assigned the network-admin or level-15 user role, it does not have permissions to change the settings related to console login. In this case, proceed to step 2. If your user account is assigned the network-admin or level-15 user role, resolve the issue based on the authentication mode used for console login.

c. If local password authentication is used for console login, change the authentication password for the console port.

Enter the view of the console user line and set a new password for it. In this example, the password is 1234567890!. As a best practice, assign the network-admin or level-15 user role to the user line to ensure that the users who log in to the device through the console port have sufficient privileges.

[Sysname] line console 0

[Sysname-line-console0] set authentication password simple 1234567890!

[Sysname-line-console0] user-role network-admin

d. If AAA local authentication is used for console login, change the password of the local user account that can be used to log in to the device through the console port.

Enter the local user view of the account used to log in to the device through the console port, and change the password of the account. In this example, the username is admin, and the password is 1234567890!. As a best practice, assign the network-admin or level-15 user role to the account to ensure that the users who use this account to log in to the device through the console port have sufficient privileges.

[Sysname] local-user admin class manage

[Sysname-luser-manage-admin] password simple 1234567890!

[Sysname-luser-manage-admin] authorization-attribute user-role network-admin

e. If AAA remote authentication is used for console login, contact the administrator of the AAA server to obtain the login password.

f. To prevent configuration loss after a reboot, execute the save command to save the running configuration.

2. Connect your configuration terminal to the console port of the device, and then power cycle the device to access the BootWare menu.

IMPORTANT:

· Accessing the BootWare menu requires a device reboot, which causes service interruption. As a best practice, back up services as needed and reboot the device when the service traffic is light.

· For a distributed device, you must connect your configuration terminal to the console ports on both MPUs and then reboot the entire device. After you access the extended BootWare menu of each MPU, perform the operations in this step and subsequent steps first on the active MPU and then reboot the standby MPU.

During startup, if you do not access the basic segment of the BootWare in time, the system runs the extended segment. When the Press Ctrl+B to access EXTENDED-BOOTWARE MENU... message appears, press Ctrl+B immediately. The system then displays whether password recovery capability is enabled:

Password recovery capability is enabled.

Password recovery capability is disabled.

¡ When password recovery capability is enabled, you can choose to skip authentication for console login or skip the current system configuration. For the troubleshooting procedure, see steps 3 and 4.

¡ When password recovery capability is disabled, you can choose to restore the factory defaults. For the troubleshooting procedure, see step 5.

3. Skip authentication for console login through the extended BootWare menu, and change the password of the console port after you log in to the system.

Press Enter to access the extended BootWare menu, and then follow the system prompt to select the option that skips authentication for console login (the menu option might vary by device model). After the system starts up, you can log in through the console port without entering a password, and the system loads all settings.

a. After the system starts up, you must change the password of the console port as soon as possible according to the authentication mode used by the console port.

# If local password authentication is used for console login, change the authentication password for the console port.

<Sysname> system-view

[Sysname] line console 0

[Sysname-line-console0] set authentication password simple 1234567890!

[Sysname-line-console0] user-role network-admin

# If AAA local authentication is used for console login, change the password of the local user account that can be used to log in to the device through the console port.

<Sysname> system-view

[Sysname] local-user admin class manage

[Sysname-luser-manage-admin] password simple 1234567890!

[Sysname-luser-manage-admin] authorization-attribute user-role network-admin

b. To prevent configuration loss after a reboot, execute the save command to save the running configuration.

4. Skip the current system configuration through the extended BootWare menu and configure a new password for the console port after login.

Press Enter to access the extended BootWare menu, and then follow the system prompt to select the option that skips the current system configuration (the menu option might differ by device model). When the system starts, it ignores all settings in the next-startup configuration file and starts up with initial settings. This is a one-time operation and takes effect only for the first system boot or reboot after you choose this option. After the system starts up, you do not need to enter the password of the console port.

a. After the system starts up, you must export the settings in the original next-startup configuration file as soon as possible. Do not power off the device during this operation. You can use one of the following methods:

- Use FTP or TFTP to export the original next-startup configuration file to your local terminal.

- Execute the more command in user view to display the contents of the original next-startup configuration file, and then copy and paste all the displayed contents to a local configuration file.

b. Manually edit the settings related to console login in the local file, and then upload the edited file to the root directory of the storage medium on the device.

c. Specify the edited configuration file as the next-startup configuration file (in this example, the configuration file is startup.cfg).

<Sysname> startup saved-configuration startup.cfg

d. Reboot the device.

5. Restore the factory defaults through the extended BootWare menu and configure a new password for the console port after login.

CAUTION:

When you perform this operation, the system automatically deletes the main and backup next-startup configuration files at startup and then loads the factory defaults. Make sure this operation does not negatively affect services.

Press Enter to access the extended BootWare menu, and then follow the system prompt to select the option that restores the factory defaults. The menu option might differ by device model. After the system starts up, you do not need to enter the password of the console port.

a. After the system starts up, configure the login authentication mode for the console port and the related login password or account as needed.

The authentication mode is none:

<Sysname> system-view

[Sysname] line console 0

[Sysname-line-console0] authentication-mode none

[Sysname-line-console0] user-role network-admin

You can log in to the device through this user line without providing any username or password. This authentication mode has security risks. Use it with caution.

The authentication mode is local password authentication:

<Sysname> system-view

[Sysname] line console 0

[Sysname-line-console0] authentication-mode password

[Sysname-line-console0] set authentication password simple 1234567890!

[Sysname-line-console0] user-role network-admin

The authentication mode is local AAA authentication:

<Sysname> system-view

[Sysname] line console 0

[Sysname-line-console0] authentication-mode scheme

[Sysname-line-console0] quit

[Sysname] local-user admin class manage

[Sysname-luser-manage-admin] service-type terminal

[Sysname-luser-manage-admin] password simple 1234567890!

[Sysname-luser-manage-admin] authorization-attribute user-role network-admin

The authentication mode is remote AAA authentication:

<Sysname> system-view

[Sysname] line console 0

[Sysname-line-console0] authentication-mode scheme

[Sysname-line-console0] quit

In addition, you must configure an authentication domain for login users and a RADIUS, HWTACACS, or LDAP scheme. For more information about the configuration, see AAA configuration in Security Configuration Guide.

b. To prevent configuration loss after a reboot, execute the save command to save the running configuration.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
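When you review saved display line output from several devices, a small parser can flag the authentication mode of each user line. This Python sketch assumes the column layout shown in the sample output in step 1a. Because the VTY row in that sample has one field fewer than the CON row, the sketch reads Auth as the third field from the end of each row. Treating N as no authentication and + as the user line currently in use are assumptions not stated in this guide.

```python
from typing import Dict, List, Union

# P and A are documented in step 1a; N is assumed to mean no authentication.
AUTH_MODES = {"P": "password", "A": "scheme", "N": "none"}


def parse_display_line(output: str) -> List[Dict[str, Union[str, bool]]]:
    """Extract index, type, authentication mode, and location from display line output."""
    rows = []
    for raw in output.splitlines():
        tokens = raw.split()
        in_use = bool(tokens) and tokens[0] == "+"  # assumed: "+" marks your own line
        if in_use:
            tokens = tokens[1:]
        # Data rows start with a numeric index; skip headers, "..." and blank lines.
        if len(tokens) < 5 or not tokens[0].isdigit():
            continue
        rows.append({
            "idx": tokens[0],
            "type": f"{tokens[1]} {tokens[2]}",              # for example "CON 0"
            "auth": AUTH_MODES.get(tokens[-3], tokens[-3]),  # Auth is third from the end
            "location": tokens[-1],
            "in_use": in_use,
        })
    return rows


sample = """Idx Type Tx/Rx Modem Auth Int Location
0 CON 0 9600 - P - 0/0
+ 81 VTY 0 - N - 0/0
..."""
for row in parse_display_line(sample):
    print(row["type"], row["auth"], row["in_use"])
```

For the sample, the loop prints CON 0 password False and then VTY 0 none True, which tells you that the console line uses local password authentication.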

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Forgetting the password for Telnet login

Symptom

When local password authentication or AAA local authentication is used for Telnet login, you cannot successfully Telnet to the device due to an incorrect password.

Common causes

The following are the common causes of this type of issue:

· You forget the login password for the user account that you use to Telnet to the device or enter an incorrect password.

· The account that you use to Telnet to the device has expired.

Troubleshooting flow

Figure 24 shows the troubleshooting flowchart.

Figure 24 Flowchart for troubleshooting the issue of forgetting the password of a user account used for Telnet login

Solution

1. Verify that you can use another method to log in to the device.

If the Telnet login password is lost, you can log in to the device through another method (such as through the console port) and reconfigure a Telnet login password.

a. Log in to the device through a non-Telnet method, and then execute the display line command to display the authentication mode used by VTY lines.

<Sysname> display line

Idx Type Tx/Rx Modem Auth Int Location

+ 0 CON 0 9600 - P - 0/0

81 VTY 0 - P - 0/0

...

If the value for the Auth field is P, the authentication mode is local password authentication. If the value for this field is A, the authentication mode is AAA (scheme) authentication.

b. Based on the authentication mode used by the VTY lines, configure a new login password for Telnet login.

For local password authentication:

Set the authentication mode for VTY login users to local password authentication, and configure the login password and user role. For example, set the login password to 1234567890! and specify the network-admin user role for VTY login users.

<Sysname> system-view

[Sysname] line vty 0 63

[Sysname-line-vty0-63] authentication-mode password

[Sysname-line-vty0-63] set authentication password simple 1234567890!

[Sysname-line-vty0-63] user-role network-admin

For AAA local authentication:

Set the authentication mode for VTY login users to AAA authentication, and configure a new password for the account that you use to Telnet to the device and specify user roles for the account. In this example, the local account used for Telnet login is admin, the password is set to 1234567890!, and the network-admin user role is specified for the account.

<Sysname> system-view

[Sysname] line vty 0 63

[Sysname-line-vty0-63] authentication-mode scheme

[Sysname-line-vty0-63] quit

[Sysname] local-user admin class manage

[Sysname-luser-manage-admin] service-type telnet

[Sysname-luser-manage-admin] password simple 1234567890!

[Sysname-luser-manage-admin] authorization-attribute user-role network-admin

If you forget the original login account name, you can create a new local account by performing the operations in this step.

For AAA remote authentication:

Contact the administrator of the AAA server to obtain the login password.
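The Auth field check in step 1 can be scripted if the display line output is captured as text. This Python sketch assumes the column layout shown in step 1. The names parse_line_auth and AUTH_MODES are hypothetical and are not part of the device software. The Tx/Rx column is blank for VTY lines, so the sketch reads the Auth column from the end of each row, where the last three columns are Auth, Int, and Location.

```python
# Auth field codes in "display line" output, as described in this guide.
AUTH_MODES = {
    "A": "AAA (scheme) authentication",
    "N": "none authentication",
    "P": "local password authentication",
}


def parse_line_auth(output: str) -> dict:
    """Return {(line_type, line_number): auth_code} for each user line row."""
    result = {}
    for raw in output.splitlines():
        fields = raw.split()
        if fields and fields[0] == "+":
            fields = fields[1:]  # a leading "+" marks a line in use
        # Data rows start with a numeric Idx; skip headers and "..." rows.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        # The Tx/Rx column is blank for VTY lines, so read Auth from the
        # end: the last three columns are Auth, Int, and Location.
        result[(fields[1], int(fields[2]))] = fields[-3]
    return result
```

For the sample output in step 1, the function returns {("CON", 0): "P", ("VTY", 0): "P"}. That result means both lines use local password authentication.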

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Telnet login failure

Symptom

When the device acts as a Telnet server, you fail to log in to the device through a Telnet client.

Common causes

The following are the common causes of this type of issue:

· The network connection between the Telnet client and the device is faulty.

· The Telnet client feature is not enabled on the Telnet client.

· The Telnet service is not enabled on the device.

· VTY lines do not support the Telnet protocol.

· The login username or password is incorrect.

· The number of login users on the device has reached the upper limit.

· Access control for Telnet login has been configured on the device, and the Telnet client is not permitted by the rules in the ACL specified for filtering users.

· The authentication mode settings are not configured correctly.

· When both the Telnet client and Telnet server are H3C devices, you do not log in to the Telnet server from the source address or source interface specified on the Telnet client for outgoing Telnet packets.

Troubleshooting flow

Figure 25 shows the troubleshooting flowchart.

Figure 25 Flowchart for troubleshooting Telnet login failure

Solution

1. Verify that the client can successfully ping the device.

Execute the ping command on the Telnet client to check the network connection between the Telnet client and the device.

If the Telnet client cannot ping the IP address of the device, the network connection might be faulty, and the client cannot Telnet to the device. The ping failure might also occur because ping is disabled on the Telnet client. To troubleshoot the ping failure, follow the procedure in "Ping and tracert issues."

2. Verify that the Telnet client feature is enabled on the client.

On a Windows PC, you typically must enable the Telnet client feature in the Turn Windows features on or off window before you can set up a Telnet connection.

For information about enabling the Telnet client feature on other types of devices, such as mobile devices, see the user manuals for those devices.

3. Verify that the Telnet service is enabled on the device.

By default, the Telnet service is disabled. If the output from the display this command in system view does not contain the telnet server enable command line, the Telnet service is disabled. To allow clients to Telnet to the device, execute the telnet server enable command.

4. Verify that the VTY line through which the user Telnets to the device supports the Telnet protocol.

Execute the display this command in VTY line view or VTY line class view.

¡ In non-FIPS mode, VTY lines support all protocols by default. If the command output does not contain a protocol inbound command line, or contains the undo protocol inbound command line, the VTY line supports all protocols, including Telnet.

¡ If the command output contains a protocol inbound command line other than protocol inbound telnet or protocol inbound all (for example, protocol inbound ssh), the VTY line does not support the Telnet protocol.

If the Telnet protocol is not supported on the user line, execute the protocol inbound telnet or protocol inbound all command on the user line to allow Telnet login.

<Sysname> system-view

[Sysname] line vty 0 63

[Sysname-line-vty0-63] authentication-mode scheme

[Sysname-line-vty0-63] protocol inbound all

A configuration change in user line view does not take effect on the current session. It takes effect on subsequent login sessions.
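The check in step 4 can be written as a small decision function. This Python sketch is for illustration only. It assumes that the display this output of the VTY line view has been captured as text, it models non-FIPS mode only, and its function name is hypothetical.

```python
def vty_allows_telnet(display_this: str) -> bool:
    """Return True if a VTY line accepts Telnet, based on "display this" output.

    Non-FIPS mode only: without an explicit "protocol inbound" setting
    (including after "undo protocol inbound"), all protocols are allowed.
    """
    for raw in display_this.splitlines():
        words = raw.split()
        # "undo protocol inbound" starts with "undo", so it is not matched here.
        if words[:2] == ["protocol", "inbound"]:
            # An explicit setting allows Telnet only if it is telnet or all.
            return len(words) > 2 and words[2] in ("telnet", "all")
    return True  # non-FIPS default: all protocols are supported
```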

5. Verify that the username and password used by the client to Telnet to the device are correct.

If the device reports an authentication failure after you initiate a Telnet connection and enter the username and password as prompted by the Telnet client, re-enter the username and password and try again. If the login still fails, check for the LOGIN/5/LOGIN_INVALID_USERNAME_PWD log message. If this message is generated, the username or password that you entered is invalid.

If you forget the correct login username or password, you can change the authentication mode to none or reset the password, and then attempt to Telnet to the device again.

¡ In user line view or user line class view, execute the authentication-mode none command to disable authentication. Users can then log in through the specified user line or user line class without entering a username or password. This mode poses security risks. Use it with caution.

<Sysname> system-view

[Sysname] line vty 0 63

[Sysname-line-vty0-63] authentication-mode none

¡ If the authentication mode is local password authentication, execute the set authentication password command in user line view or user line class view to configure an authentication password for local password authentication.

<Sysname> system-view

[Sysname] line vty 0 63

[Sysname-line-vty0-63] authentication-mode password

[Sysname-line-vty0-63] set authentication password simple hello12345&!

¡ If the authentication mode is AAA authentication, follow the procedure in "AAA and password control issues" to reset the password.

6. Identify whether the number of login users on the device has reached the upper limit.

Log in to the device through the console port and execute the display users command in any view to display the current number of Telnet users. By default, the device supports a maximum of 32 concurrent Telnet users.

Check the TELNETD/6/TELNETD_REACH_SESSION_LIMIT log. The number of Telnet users has reached the upper limit if the following log message is generated:

TELNETD/6/TELNETD_REACH_SESSION_LIMIT: Telnet client 1.1.1.1 failed to log in. The current number of Telnet sessions is 10. The maximum number allowed is (10).

If the number of Telnet users has reached the upper limit, disconnect idle Telnet users or execute the aaa session-limit telnet command to increase the maximum number of concurrent Telnet users. Then, initiate a Telnet connection to the device again.
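To confirm from collected logs that the session limit was reached, you can extract the counts from TELNETD_REACH_SESSION_LIMIT messages. This Python sketch assumes that the message wording exactly matches the sample above, although other software versions might format it differently. The helper name session_limit_hits is hypothetical.

```python
import re

# Built from the sample message in this guide; other versions may differ.
LIMIT_RE = re.compile(
    r"TELNETD_REACH_SESSION_LIMIT: Telnet client (?P<client>\S+) failed to log in\. "
    r"The current number of Telnet sessions is (?P<current>\d+)\. "
    r"The maximum number allowed is \((?P<maximum>\d+)\)\."
)


def session_limit_hits(log_text: str) -> list:
    """Return (client, current_sessions, max_sessions) for each matching message."""
    return [
        (m.group("client"), int(m.group("current")), int(m.group("maximum")))
        for m in LIMIT_RE.finditer(log_text)
    ]
```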

7. Identify whether ACLs have been applied to control Telnet login on the device.

In system view, execute the display this command. If the command output contains settings related to the telnet server acl or telnet server ipv6 acl command, ACLs have been applied to control Telnet login.

¡ Verify that the rules in the ACLs permit the IP address, port number, and protocol number of the Telnet client. You can check the TELNETD_ACL_DENY log. The rules in the ACLs deny the IP address of the Telnet client if the following log message is generated:

TELNETD/5/TELNETD_ACL_DENY: The Telnet Connection 1.2.3.4(vpn1) request was denied according to ACL rules.

¡ Execute the undo telnet server acl or undo telnet server ipv6 acl command to remove ACL access restrictions for Telnet users.
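A similar sketch can count TELNETD_ACL_DENY messages per client address, which shows which clients the ACL rules deny. The regular expression is built from the single sample message above. The optional "(vpn)" suffix reflects an assumption that the suffix is omitted when no VPN instance applies. The helper name denied_clients is hypothetical.

```python
import re
from collections import Counter

# Built from the sample message in this guide; the "(vpn)" suffix is treated
# as optional on the assumption that it is absent for the public network.
DENY_RE = re.compile(
    r"TELNETD_ACL_DENY: The Telnet Connection (?P<ip>[^\s(]+)(?:\((?P<vpn>[^)]*)\))? "
    r"request was denied according to ACL rules\."
)


def denied_clients(log_text: str) -> Counter:
    """Count ACL denials per (client address, VPN instance or None)."""
    return Counter(
        (m.group("ip"), m.group("vpn")) for m in DENY_RE.finditer(log_text)
    )
```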

8. Verify that the authentication mode settings are correctly configured on the device.

In any view, execute the display line command to check the Auth field to obtain the authentication mode used on the user line through which you Telnet to the device. The value of A indicates AAA authentication, the value of N indicates none authentication, and the value of P indicates local password authentication.

¡ If local password authentication is configured as the login authentication mode for the VTY line by using the authentication-mode password command, you must ensure that an authentication password has been configured for the VTY line.

¡ If AAA authentication is configured as the login authentication mode by using the authentication-mode scheme command, you must ensure that the user account used for Telnet login has been created. For more information about the troubleshooting procedure, see "AAA and password control issues."

9. When both the Telnet client and Telnet server are H3C devices, identify whether the Telnet client has configured a source address or a source interface for outgoing Telnet packets.

Execute the display this command in system view. If the command output contains the telnet client source command line, a source IPv4 address or source interface has been specified on the Telnet client for outgoing Telnet packets. In this case, you must ensure that you log in to the Telnet server from the specified source IPv4 address or source interface on the Telnet client. If the login fails, perform one of the following operations and attempt to log in to the Telnet server again:

¡ Execute the telnet client source command to reconfigure the source IPv4 address or source interface for the Telnet client to use for outgoing Telnet packets.

¡ Execute the undo telnet client source command to restore the default. In this case, no source IPv4 address or source interface is specified. The Telnet client uses the primary IPv4 address of the output interface for the route to the server as the source IPv4 address.

When you perform the operations in this step, follow these restrictions and guidelines:

¡ The source setting configured by using the telnet client source command has a lower precedence than the source setting specified by using the telnet command in user view.

¡ In an IPv6 network, you can execute the telnet ipv6 command in user view to specify a source interface or source IPv6 address for outgoing Telnet packets.
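The source selection rules in step 9 reduce to a simple precedence order. This Python sketch models only IPv4 addresses given as strings and does not resolve source interfaces to addresses. The function telnet_source and all addresses in the examples are hypothetical.

```python
from typing import Optional


def telnet_source(cli_source: Optional[str],
                  client_source: Optional[str],
                  out_if_primary_ip: str) -> str:
    """Pick the source IPv4 address for outgoing Telnet packets (sketch).

    Precedence, as described in step 9:
    1. A source specified in the telnet command in user view.
    2. A source configured with the telnet client source command.
    3. The primary IPv4 address of the output interface for the route
       to the Telnet server (the default).
    """
    if cli_source:
        return cli_source
    if client_source:
        return client_source
    return out_if_primary_ip
```

For example, telnet_source(None, "192.168.1.1", "10.0.0.1") returns "192.168.1.1", because the telnet command did not specify a source.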

10. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

· LOGIN/5/LOGIN_FAILED

· LOGIN/5/LOGIN_INVALID_USERNAME_PWD

· TELNETD/5/TELNETD_ACL_DENY

· TELNETD/6/TELNETD_REACH_SESSION_LIMIT

Software upgrade issues

Device startup failure

Symptom

The device fails to start up after loading the software images.

Common causes

The following are the common causes of this type of issue:

· The storage medium (CF card or USB disk) is not securely installed.

· The BootWare version does not match the version of the software images.

Troubleshooting flow

Figure 26 shows the troubleshooting flowchart.

Figure 26 Flowchart for troubleshooting device startup failure

Solution

1. Run a terminal emulation program on the PC connected to the console port, start up the device, and identify the BootWare version from the following output:

Booting Normal Extended BootWare

The Extended BootWare is self-decompressing.............Done.

****************************************************************************

* *

* BootWare, Version 1.50 *

* *

****************************************************************************
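If the terminal emulation program saves the console output to a log file, a script can extract the BootWare version from the banner text. You can then compare it with the version that Technical Support confirms in step 4. This Python sketch assumes the banner line has the form shown above (BootWare, Version 1.50). The helper name bootware_version is hypothetical.

```python
import re
from typing import Optional

# Matches the banner line, for example "* BootWare, Version 1.50 *".
VERSION_RE = re.compile(r"BootWare, Version (\S+)")


def bootware_version(console_text: str) -> Optional[str]:
    """Return the BootWare version shown in the startup banner, if any."""
    match = VERSION_RE.search(console_text)
    return match.group(1) if match else None
```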

2. Press Ctrl+B within 3 seconds after the Press Ctrl+B to access EXTENDED-BOOTWARE MENU... message is displayed. The system enters the extended BootWare menu.

==========================<EXTENDED-BOOTWARE MENU>==========================

|<1> Boot System |

|<2> Enter Serial SubMenu |

|<3> Enter Ethernet SubMenu |

|<4> File Control |

|<5> Restore to Factory Default Configuration |

|<6> Skip Current System Configuration |

|<7> BootWare Operation Menu |

|<8> Skip Authentication for Console Login |

|<9> Storage Device Operation |

|<0> Reboot |

============================================================================

Ctrl+Z: Access EXTENDED ASSISTANT MENU

Ctrl+C: Display Copyright

Ctrl+F: Format File System

Enter your choice(0-9): 4

3. Enter 4 to access the file control submenu. Verify that the storage medium is securely installed. This example uses a CF card.

¡ If the Note:the operating device is cfa0 message appears and you can see the file information in the CF card after entering 1, the storage medium is securely installed. Proceed to step 4.

¡ If the Note:the operating device is cfa0 message does not appear and you cannot see the file information in the CF card after entering 1, the storage medium is not securely installed. Contact Technical Support.

===============================<File CONTROL>===============================

|Note:the operating device is cfa0 |

|<1> Display All File(s) |

|<2> Set Image File type |

|<3> Set Bin File type |

|<4> Delete File |

|<5> Copy File |

|<0> Exit To Main Menu |

============================================================================

Enter your choice(0-5): 1

Display all file(s) in cfa0:

'M' = MAIN 'B' = BACKUP 'N/A' = NOT ASSIGNED

============================================================================

|NO. Size(B) Time Type Name |

|1 539432 Nov/18/2021 21:11:56 N/A cfa0:/info/info_3_0.bin |

|2 539432 Nov/18/2021 21:15:00 N/A cfa0:/info/info_3_1.bin |

|3 539432 Aug/28/2021 19:05:42 N/A cfa0:/info/info_2_0.bin |

============================================================================

4. Confirm with Technical Support whether the BootWare version is the latest version.

¡ If yes, proceed to step 5.

¡ If no, download the latest BootWare version, and proceed to step 5.

5. Connect the PC to the Ethernet interface on the device, run the FTP or TFTP server software on the PC, specify the file path of the downloaded image, and proceed to step 6.

NOTE:

No FTP or TFTP server software is provided with the device. You must prepare the server software yourself.

6. Enter 0 to return to the extended BootWare menu. Enter 7 to access the BootWare operation menu, and proceed to step 7.

==========================<EXTENDED-BOOTWARE MENU>==========================

|<1> Boot System |

|<2> Enter Serial SubMenu |

|<3> Enter Ethernet SubMenu |

|<4> File Control |

|<5> Restore to Factory Default Configuration |

|<6> Skip Current System Configuration |

|<7> BootWare Operation Menu |

|<8> Skip Authentication for Console Login |

|<9> Storage Device Operation |

|<0> Reboot |

============================================================================

Ctrl+Z: Access EXTENDED ASSISTANT MENU

Ctrl+C: Display Copyright

Ctrl+F: Format File System

Enter your choice(0-9): 7

7. Enter 4 to update the BootWare through the Ethernet interface, and proceed to step 8.

=========================<BootWare Operation Menu>==========================

|Note:the operating device is flash |

|<1> Backup Full BootWare |

|<2> Restore Full BootWare |

|<3> Update BootWare By Serial |

|<4> Update BootWare By Ethernet |

|<0> Exit To Main Menu |

============================================================================

Enter your choice(0-4): 4

8. Enter 4 to configure Ethernet interface parameters, and proceed to step 9.

===================<BOOTWARE OPERATION ETHERNET SUB-MENU>===================

|<1> Update Full BootWare |

|<2> Update Extended BootWare |

|<3> Update Basic BootWare |

|<4> Modify Ethernet Parameter |

|<0> Exit To Main Menu |

============================================================================

Enter your choice(0-4): 4

9. Enter 1 to update the full BootWare. Then, enter 0 to return to the BootWare operation menu, enter 0 again to return to the extended BootWare menu, and enter 0 to reboot the device.

¡ If the device starts up successfully, the issue is resolved.

¡ If the device fails to start up, proceed to step 10.

10. Collect the results of each step and contact Technical Support.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Image loading failure

Symptom

The device fails to load software images.

Common causes

The common cause of this type of issue is that the software image file is corrupted.

Solution

1. Execute the md5sum command in user view to use the MD5 algorithm to calculate the digest of the software image file.

<Sysname> md5sum cfa0:/Comware-cmw710.ipe

MD5 digest:

f2054bc35cd13bf84038bd10fc7a3efd

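2. Obtain the label of the software image file from the official website or Technical Support. The label provides the reference MD5 digest of the original software image file. If you obtain the original software image file instead, use an MD5 tool to calculate its digest.

3. Compare the digest of the software image file on the device with the reference digest.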

¡ If they are the same, the software image file is not corrupted. Proceed to the next step.

¡ If they are different, the software image file is corrupted. Contact Technical Support to obtain a new software image file.
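You can also check the digest on a PC before you copy the image to the device. This Python sketch uses the standard hashlib module. The helper names are hypothetical, and the reference digest must still come from the official website or Technical Support.

```python
import hashlib
from typing import Iterable


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Feed byte chunks into MD5 and return the lowercase hex digest."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 digest without loading the whole image into memory."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def image_intact(computed: str, reference: str) -> bool:
    """Compare two hex digests, ignoring letter case and surrounding whitespace."""
    return computed.strip().lower() == reference.strip().lower()
```

For example, image_intact(md5_of_file("Comware-cmw710.ipe"), "f2054bc35cd13bf84038bd10fc7a3efd") returns True only if the local file has that digest.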

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting system management issues

Hardware resource management issues

High CPU usage

Symptom

If any of the following conditions occurs, the usage of the CPU control cores on the device is high, and you must identify the cause:

· During daily inspection, you execute the display cpu-usage command repeatedly and find that the CPU usage is significantly higher than the daily average.

# Execute the display cpu-usage summary command to view the average CPU usage during the most recent 5-second, 1-minute, or 5-minute interval.

<Sysname> display cpu-usage summary

Slot CPU Last 5 sec Last 1 min Last 5 min

1 0 5% 5% 4%

# Execute the display cpu-usage history command to view the CPU usage for the last 60 samples in graphical form. The CPU usage keeps increasing or is significantly higher than the daily average.

· When you log in to the device through Telnet or SSH and execute commands, the device responds slowly or stalls.

· The device outputs log messages about high CPU usage on the device.

· Alarms on high CPU usage occur on the SNMP manager.
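The comparison with the daily average can be scripted. This Python sketch parses display cpu-usage summary output in the format shown above and flags CPUs whose 5-minute average exceeds a multiple of a baseline. The baseline values and the default factor of 2 are hypothetical. Take them from your own inspection records.

```python
def parse_cpu_summary(output: str) -> dict:
    """Return {(slot, cpu): (last_5s, last_1min, last_5min)} in percent."""
    usage = {}
    for raw in output.splitlines():
        fields = raw.split()
        # Data rows look like "1 0 5% 5% 4%"; the header row is skipped.
        if len(fields) == 5 and fields[0].isdigit() and fields[1].isdigit():
            usage[(int(fields[0]), int(fields[1]))] = tuple(
                float(v.rstrip("%")) for v in fields[2:]
            )
    return usage


def above_baseline(usage: dict, baseline: dict, factor: float = 2.0) -> list:
    """Return (slot, cpu) keys whose 5-minute average exceeds factor x baseline."""
    return [
        key
        for key, (_, _, last_5min) in usage.items()
        if key in baseline and last_5min > factor * baseline[key]
    ]
```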

Common causes

The following are the common causes of this type of issue:

· Network attacks.

· Protocol flappings, typically STP flappings and routing protocol flappings.

· Network loops.

· Flow sampling is configured on the device, and the volume of traffic to be processed is too large or the sampling frequency is too high. As a result, the sampling feature occupies a significant amount of CPU resources.

· The device generates a large number of log messages, and generating and managing these log messages consumes a large amount of resources.

Troubleshooting flow

Figure 27 shows the troubleshooting flowchart.

Figure 27 Flowchart for troubleshooting high CPU usage

Solution

1. Identify whether a network attack occurs.

On a live network, the most common cause of high CPU usage is a network attack. Attackers send a large number of abnormal packets to the device within a short period, for example, TCP connection requests or ICMP request packets. The device becomes busy processing these attack packets. This results in high CPU usage and affects the normal operation of the device.

In probe view, execute the display system internal control-plane statistics command to view control plane statistics, including the number of dropped packets. If the current CPU usage is high and the Dropped field value is large, a packet attack is probably occurring on the device. (Support for the display system internal control-plane statistics command depends on the device model.)

<Sysname> system-view

[Sysname] probe

[Sysname-probe] display system internal control-plane statistics slot 1

Control plane slot 1

Protocol: Default

Bandwidth: 15360 (pps)

Forwarded: 108926 (Packets), 29780155 (Bytes)

Dropped : 0 (Packets), 0 (Bytes)

Protocol: ARP

Bandwidth: 512 (pps)

Forwarded: 1489284 (Packets), 55318920 (Bytes)

Dropped : 122114 (Packets), 491421 (Bytes)

...

¡ If a network attack occurs, first resolve the network attack issue.

¡ If no network attack occurs, proceed to step 2.
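This guide does not define how large the Dropped value must be to indicate an attack. One hypothetical way to compare protocols is to compute the share of dropped packets for each protocol. The following Python sketch parses control plane statistics in the exact layout of the sample output above and computes that share.

```python
import re

PACKETS_RE = re.compile(r"(\d+) \(Packets\)")


def control_plane_drops(output: str) -> dict:
    """Return {protocol: (forwarded_packets, dropped_packets)}."""
    stats, proto = {}, None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Protocol:"):
            proto = line.split(":", 1)[1].strip()
            stats[proto] = [0, 0]
        elif proto is not None and line.startswith(("Forwarded", "Dropped")):
            match = PACKETS_RE.search(line)
            if match:
                index = 0 if line.startswith("Forwarded") else 1
                stats[proto][index] = int(match.group(1))
    return {name: tuple(counts) for name, counts in stats.items()}


def drop_share(forwarded: int, dropped: int) -> float:
    """Fraction of packets for one protocol that the control plane dropped."""
    total = forwarded + dropped
    return dropped / total if total else 0.0
```

In the sample output, the control plane dropped about 7.6% of ARP packets (122114 of 1611398 packets).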

2. Identify whether a protocol flapping occurs on the device.

A protocol flapping can cause continuous processing of protocol messages, topology calculations, and entry updates by the device, resulting in high CPU usage. In practical applications, the most common protocol flappings are STP protocol flappings and OSPF protocol flappings.

¡ For STP flappings, execute the stp port-log command in system view to enable output of port state transition information. If the device frequently outputs the following log messages, an STP flapping occurs:

STP/6/STP_DETECTED_TC: Instance 0's port GigabitEthernet2/0/1 detected a topology change.

STP/6/STP_DISCARDING: Instance 0's port GigabitEthernet2/0/1 has been set to discarding state.

STP/6/STP_NOTIFIED_TC: Instance 0's port GigabitEthernet2/0/1 was notified a topology change.

- If an STP flapping occurs, first resolve the STP flapping issue.

- If no STP flapping occurs, proceed to the next step.

¡ For OSPF flappings, execute the display ip routing-table command to view routing information. If route entries for the same network segment are frequently and repeatedly created and deleted in the routing table, a route flapping occurs.

- If a route flapping occurs or the routes do not exist, troubleshoot link-related issues and IGP routing issues.

- If no route flapping occurs, proceed to step 3.
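Deciding when STP messages appear frequently is a judgment call. The following Python sketch counts the STP messages shown above for each instance and port in a captured log, which makes the busiest ports easy to spot. The threshold is hypothetical, and the pattern assumes the message format of the samples.

```python
import re
from collections import Counter

# Matches the STP messages shown in this step.
STP_RE = re.compile(
    r"STP/6/(?:STP_DETECTED_TC|STP_DISCARDING|STP_NOTIFIED_TC): "
    r"Instance (\d+)'s port (\S+)"
)


def stp_events(log_text: str) -> Counter:
    """Count STP topology change and state messages per (instance, port)."""
    return Counter(
        (int(m.group(1)), m.group(2)) for m in STP_RE.finditer(log_text)
    )


def suspect_ports(counts: Counter, threshold: int) -> list:
    """Return (instance, port) pairs with at least threshold messages."""
    return sorted(key for key, n in counts.items() if n >= threshold)
```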

3. Identify whether a network loop occurs.

When an Ethernet interface operates in Layer 2 mode and a loop occurs on the link, broadcast storms and network flappings might occur. Then, a large number of protocol packets are sent to the CPU for processing, causing high CPU usage. When a network loop occurs, traffic on many ports of the device will increase significantly, with a large proportion of broadcast and multicast packets. To identify whether a network loop occurs on the device and whether broadcast, multicast, or unknown unicast packet storms occur, follow these steps:

a. Clear the Ethernet interface traffic statistics.

<Sysname> reset counters interface

b. Execute the display counters rate inbound interface command multiple times to identify whether the port usage has significantly increased.

<Sysname> display counters rate inbound interface

Usage: Bandwidth utilization in percentage

Interface Usage(%) Total(pps) Broadcast(pps) Multicast(pps)

GE2/0/1 0.01 7 -- --

GE2/0/2 0.01 1 -- --

GE2/0/3 0.01 5 -- --

GE2/0/4 0.05 60 -- --

GE2/0/5 0.04 52 -- --

Overflow: More than 14 digits.

--: Not supported.

c. If the port usage significantly increases, repeatedly execute the display counters inbound interface command to view the total number of packets, the number of broadcast packets, and the number of multicast packets received on each interface. These values correspond to the Total(pkt), Broadcast(pkt), and Multicast(pkt) fields, respectively. If broadcast and multicast packets make up a large proportion of the total received packets, a broadcast or multicast storm might occur. If the number of broadcast and multicast packets does not increase significantly but the total number of received packets does, an unknown unicast packet storm might occur.

<Sysname> display counters inbound interface

Interface Total(pkt) Broadcast(pkt) Multicast(pkt) Err(pkt)

GE2/0/1 141 27 111 0

GE2/0/2 274866 47696 0 --

GE2/0/3 1063034 684808 2 --

GE2/0/4 11157797 7274558 50 0

GE2/0/5 9653898 5619640 52 0

Overflow: More than 14 digits (7 digits for column "Err").

--: Not supported.

¡ If a link loop occurs, perform the following operations:

- Troubleshoot the link connection to prevent the occurrence of loops in the physical topology.

- Execute the display stp command to identify whether STP is enabled and whether the configuration is correct. If the configuration is incorrect, correct the configuration.

- Execute the display stp brief and display stp abnormal-port commands to check the spanning tree status on neighboring devices. Locate and resolve STP anomalies according to the BlockReason field value in the output from the display stp abnormal-port command.

If the STP configuration is correct, STP might have miscalculated, or the calculation is correct but the port is not blocked at the driver layer as expected. To quickly restore STP and eliminate the loop, trigger STP recalculation on the interface where the loop occurs. To do so, execute the shutdown command and then the undo shutdown command, or unplug and replug the network cable.

- In Ethernet interface view, execute broadcast-suppression to enable broadcast suppression on an interface, execute multicast-suppression to enable multicast storm suppression, and execute unicast-suppression to enable unknown unicast storm suppression. Alternatively, execute flow-control to configure flow control. (Support for the broadcast-suppression, multicast-suppression, unicast-suppression, and flow-control commands depends on the device model.)

- Apply QoS policies to rate limit multicast, broadcast, and unknown unicast packets.

¡ If no loop occurs, proceed to step 4.
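The proportion check in step 3c can be computed from two captures of the display counters inbound interface output. This Python sketch assumes the column layout shown in step 3c (Interface, Total, Broadcast, Multicast, Err). The helper names are hypothetical, and the sketch does not decide what proportion counts as high.

```python
def parse_inbound(output: str) -> dict:
    """Return {interface: (total, broadcast, multicast)} packet counts."""
    counters = {}
    for raw in output.splitlines():
        fields = raw.split()
        # Data rows: Interface Total(pkt) Broadcast(pkt) Multicast(pkt) Err(pkt)
        if len(fields) == 5 and all(v.isdigit() for v in fields[1:4]):
            counters[fields[0]] = tuple(int(v) for v in fields[1:4])
    return counters


def bcast_mcast_share(total: int, broadcast: int, multicast: int) -> float:
    """Share of broadcast plus multicast packets among all received packets."""
    return (broadcast + multicast) / total if total else 0.0


def deltas(before: dict, after: dict) -> dict:
    """Per-interface increase in (total, broadcast, multicast) between captures."""
    return {
        name: tuple(a - b for a, b in zip(after[name], before[name]))
        for name in after
        if name in before
    }
```

For GE2/0/4 in the sample output, broadcast and multicast packets make up about 65% of the received packets. If the total count grows much faster between captures than the broadcast and multicast counts, an unknown unicast storm is more likely, as described in step 3c.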

4. Identify whether flow statistics and sampling features are configured and whether the configured parameters are appropriate.

After network traffic monitoring features such as NetStream are configured, the device collects and analyzes traffic statistics. If the traffic volume is large, the CPU usage might be high. In this case, perform the following operations:

¡ Configure filtering conditions so that the device analyzes only the traffic of interest.

¡ Configure a sampler and adjust the sampling ratio. With an appropriate sampling ratio, the statistics collected by NetStream still reflect the overall network status, and excessive statistics messages do not affect the forwarding performance of the device.
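The effect of the sampling ratio on the statistics workload is simple arithmetic. This sketch assumes a sampler that selects one packet out of every N packets. Check the sampler configuration on your device to see how the ratio is actually specified. The traffic figures in the example are hypothetical.

```python
def sampled_pps(traffic_pps: float, one_in_n: int) -> float:
    """Packets per second selected for statistics when 1 of every N is sampled."""
    if one_in_n < 1:
        raise ValueError("N must be at least 1")
    return traffic_pps / one_in_n
```

For example, at 1000000 pps with N set to 1000, about 1000 pps are sampled instead of all 1000000.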

5. Identify whether the device is generating a large number of log messages.

In certain abnormal situations, for example, when the device is under attack, errors occur during operation, or a port frequently goes up and down, the device continuously generates diagnostic or log information. In this case, the system software frequently reads from and writes to memory, which increases CPU usage.

Use the following methods to identify whether the device is generating a large number of log messages:

¡ Log in to the device via Telnet and execute the terminal monitor command to enable log output to the current terminal.

<Sysname> terminal monitor

The current terminal is enabled to display logs.

After you execute this command, if a large number of abnormal log messages or duplicated log messages are output to the CLI, the device is generating a large number of log messages.

¡ Repeatedly execute the display logbuffer summary command. If the total number of log messages increases significantly, execute the display logbuffer reverse command to view detailed log information. Then, identify whether a large number of abnormal log messages exist or whether a particular log message appears repeatedly in large quantities.

<Sysname> display logbuffer summary

Slot EMERG ALERT CRIT ERROR WARN NOTIF INFO DEBUG

1 0 0 2 9 24 12 128 0

5 0 0 0 41 72 8 2 0

97 0 0 42 11 14 7 40 0

<Sysname> display logbuffer reverse

Log buffer: Enabled

Max buffer size: 1024

Actual buffer size: 512

Dropped messages: 0

Overwritten messages: 0

Current messages: 410

%Jan 15 08:17:24:259 2021 Sysname SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=192.168.2.108-User=**; Command is display logbuffer

%Jan 15 08:17:19:743 2021 Sysname SHELL/4/SHELL_CMD_MATCHFAIL: -User=**-IPAddr=192.168.2.108; Command display logfile in view shell failed to be matched.

...

If the device is generating a large number of log messages, use the following methods to reduce log generation:

¡ Disable the log output feature for some service modules.

¡ Execute the info-center logging suppress command to disable log output for a module.

¡ Execute the info-center logging suppress duplicates command to enable duplicate log suppression.

If the device has not generated a large number of log messages, proceed to step 6.
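A script can compare the repeated display logbuffer summary captures in step 5. This Python sketch parses two captures in the format shown in step 5 and reports how many buffered messages each slot gained. The helper names are hypothetical.

```python
LEVELS = ("EMERG", "ALERT", "CRIT", "ERROR", "WARN", "NOTIF", "INFO", "DEBUG")


def parse_logbuffer_summary(output: str) -> dict:
    """Return {slot: {level: count}} from display logbuffer summary output."""
    result = {}
    for raw in output.splitlines():
        fields = raw.split()
        # Data rows: a slot number followed by one count per severity level.
        if len(fields) == 1 + len(LEVELS) and all(f.isdigit() for f in fields):
            result[int(fields[0])] = dict(zip(LEVELS, (int(f) for f in fields[1:])))
    return result


def log_growth(before: dict, after: dict) -> dict:
    """Return the increase in the total number of buffered messages per slot."""
    return {
        slot: sum(after[slot].values()) - sum(before[slot].values())
        for slot in after
        if slot in before
    }
```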

6. Collect CPU usage information, and identify the service modules where the CPU usage is high.

a. Identify the tasks that consume a large amount of CPU resources.

# Execute the display process cpu command to view the tasks that occupy the most CPU resources within a period. This example displays information about slot 1.

<Sysname> display process cpu slot 1

CPU utilization in 5 secs: 0.4%; 1 min: 0.2%; 5 mins: 0.2%

JID 5Sec 1Min 5Min Name

1 0.0% 0.0% 0.0% scmd

2 5.5% 5.1% 5.0% [kthreadd]

3 0.0% 0.0% 0.0% [ksoftirqd/0]

...

If the CPU usage of a process is higher than 3% (for reference only), investigate that process further.
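A short script can apply the 3% reference threshold to captured display process cpu output. This Python sketch assumes the column layout shown above (JID, 5Sec, 1Min, 5Min, Name) and checks the 5Sec column by default. The function name and the column choice are assumptions.

```python
def busy_processes(output: str, threshold: float = 3.0, column: int = 1) -> list:
    """Return (jid, name, usage) for rows whose usage exceeds threshold percent.

    column selects the interval: 1 = 5Sec, 2 = 1Min, 3 = 5Min.
    """
    busy = []
    for raw in output.splitlines():
        fields = raw.split()
        # Data rows: JID 5Sec 1Min 5Min Name
        if len(fields) >= 5 and fields[0].isdigit() and fields[column].endswith("%"):
            usage = float(fields[column].rstrip("%"))
            if usage > threshold:
                busy.append((int(fields[0]), " ".join(fields[4:]), usage))
    return busy
```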

# Execute the monitor process dumbtty command to view the real-time CPU usage of a process. This example displays information about CPU 0 for slot 1.

<Sysname> system-view

[Sysname] monitor process dumbtty slot 1 cpu 0

206 processes; 342 threads; 5134 fds

Thread states: 4 running, 338 sleeping, 0 stopped, 0 zombie

CPU0: 99.04% idle, 0.00% user, 0.96% kernel, 0.00% interrupt, 0.00% steal

CPU1: 98.06% idle, 0.00% user, 1.94% kernel, 0.00% interrupt, 0.00% steal

CPU2: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal

CPU3: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal

CPU4: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal

Memory: 7940M total, 5273M available, page size 4K

JID PID PRI State FDs MEM HH:MM:SS CPU Name

322 322 115 R 0 0K 01:48:03 20.02% [kdrvfwdd2]

323 323 115 R 0 0K 01:48:03 20.02% [kdrvfwdd3]

324 324 115 R 0 0K 01:48:03 20.02% [kdrvfwdd4]

376 376 120 S 22 159288K 00:00:07 0.37% diagd

1 1 120 S 18 30836K 00:00:02 0.18% scmd

379 379 120 S 22 173492K 00:00:11 0.18% devd

2 2 120 S 0 0K 00:00:00 0.00% [kthreadd]

3 3 120 S 0 0K 00:00:02 0.00% [ksoftirqd/0]

…

- In the output from the monitor process dumbtty command, find the JIDs of processes with CPU usage higher than 3% (for reference only). Then, execute the display process job command for these processes to collect detailed information about the processes, and identify whether the processes are running on the control core.

If the LAST_CPU field value in the output from the display process job command is the ID of a control core (for example, 0 or 1), the process is running on a CPU control core and further analysis is required. If the LAST_CPU field value is not the ID of a control core, the process is running on a CPU forwarding core. In this case, no action is required; proceed to step 7. The following example uses the pppd process. The output shows that this process contains multiple threads, all of which are running on the control cores (0 and 1).

<Sysname> display process name pppd

Job ID: 515

PID: 515

Parent JID: 1

Parent PID: 1

Executable path: /sbin/pppd

Instance: 0

Respawn: ON

Respawn count: 1

Max. spawns per minute: 12

Last started: Wed Nov 3 09:52:00 2021

Process state: sleeping

Max. core: 1

ARGS: --MaxTotalLimit=2000000 --MaxIfLimit=65534 --CmdOption=0x01047fbf --bSaveRunDb --pppoechastenflag=1 --pppoechastennum=6 --pppoechastenperiod=60 --pppoechastenblocktime=300 --pppchastenflag=1 --pppchastennum=6 --pppchastenperiod=60 --pppchastenblocktime=300 --PppoeKChasten --bSoftRateLimit --RateLimitToken=2048

TID LAST_CPU Stack PRI State HH:MM:SS:MSEC Name

515 0 136K 115 S 0:0:0:90 pppd

549 0 136K 115 S 0:0:0:0 ppp_misc

557 0 136K 115 S 0:0:0:10 ppp_chasten

610 0 136K 115 S 0:0:0:0 ppp_work0

611 1 136K 115 S 0:0:0:0 ppp_work1

612 1 136K 115 S 0:0:0:0 ppp_work2

613 1 136K 115 S 0:0:0:0 mp_main

618 1 136K 115 S 0:0:0:110 pppoes_main

619 1 136K 115 S 0:0:0:100 pppoes_mesh

620 1 136K 115 S 0:0:0:120 l2tp_mesh

621 1 136K 115 S 0:0:0:20 l2tp_main

- For a process running on the control core with CPU usage higher than 5%, check the Name field value to identify whether the process is a user-mode process.

If the Name field of a process includes square brackets ([ ]), the process is a kernel thread, and you do not need to execute the monitor thread dumbtty command. If the Name field does not include square brackets, the process is a user process and might contain multiple threads. For a multithreaded user process, execute the monitor thread dumbtty command. If a thread in the output has a LAST_CPU field value that is the ID of a control core and a CPU field value greater than 5%, this thread might be causing the high control core usage and requires further analysis.

<Sysname> system-view

[Sysname] monitor thread dumbtty slot 1 cpu 0

206 processes; 342 threads; 5134 fds

Thread states: 4 running, 338 sleeping, 0 stopped, 0 zombie

CPU0: 98.06% idle, 0.97% user, 0.97% kernel, 0.00% interrupt, 0.00% steal

CPU1: 97.12% idle, 0.96% user, 0.96% kernel, 0.96% interrupt, 0.00% steal

CPU2: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal

CPU3: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal

CPU4: 0.00% idle, 0.00% user, 100.00% kernel, 0.00% interrupt, 0.00% steal

Memory: 7940M total, 5315M available, page size 4K

JID TID LAST_CPU PRI State HH:MM:SS MAX CPU Name

322 322 2 115 R 00:04:21 0 20.15% [kdrvfwdd2]

323 323 3 115 R 00:04:21 0 20.15% [kdrvfwdd3]

324 324 4 115 R 00:04:21 0 20.15% [kdrvfwdd4]

1 1 1 120 S 00:00:02 21 0.19% scmd

376 376 1 120 S 00:00:00 1 0.19% diagd

2 2 0 120 S 00:00:00 0 0.00% [kthreadd]

...
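The control-core and thread checks above can be combined into a single filter. The following Python sketch parses rows of monitor thread dumbtty output in the column order shown above, treats names in square brackets as kernel threads, and flags threads on a control core above 5%. The control core set {0, 1} is taken from the example values in this guide and is an assumption; confirm the control cores for your hardware.

```python
import re

# Assumed control core IDs (the guide's example values). Verify for your card.
CONTROL_CORES = {0, 1}

# Columns of "monitor thread dumbtty": JID TID LAST_CPU PRI State HH:MM:SS MAX CPU Name
THREAD_ROW = re.compile(
    r"^(\d+)\s+(\d+)\s+(\d+)\s+\d+\s+[A-Z]\s+[\d:]+\s+\d+\s+([\d.]+)%\s+(\S+)$"
)

SAMPLE = """\
JID TID LAST_CPU PRI State HH:MM:SS MAX CPU Name
322 322 2 115 R 00:04:21 0 20.15% [kdrvfwdd2]
1 1 1 120 S 00:00:02 21 0.19% scmd
376 376 1 120 S 00:00:00 1 0.19% diagd
2 2 0 120 S 00:00:00 0 0.00% [kthreadd]
"""


def is_kernel_thread(name):
    """Names enclosed in square brackets are kernel threads."""
    return name.startswith("[") and name.endswith("]")


def suspect_threads(output, threshold=5.0):
    """Return (JID, TID, name) for threads on a control core above the threshold."""
    suspects = []
    for line in output.splitlines():
        m = THREAD_ROW.match(line.strip())
        if not m:
            continue  # header or "..." line
        jid, tid, last_cpu, cpu, name = m.groups()
        if int(last_cpu) in CONTROL_CORES and float(cpu) > threshold:
            suspects.append((int(jid), int(tid), name))
    return suspects


if __name__ == "__main__":
    print(suspect_threads(SAMPLE))  # the busy kdrvfwdd threads run on cores 2-4
```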

b. Identify the stacks of an abnormal task.

Execute the follow job command in probe view to identify the stacks of an abnormal task. The following example uses the pppd process (job ID 515) in slot 1.

<Sysname> system-view

[Sysname] probe

[Sysname-probe] follow job 515 slot 1

Attaching to process 515 (pppd)

Iteration 1 of 5

------------------------------

Thread LWP 515:

Switches: 3205

User stack:

#0 0x00007fdc2a3aaa8c in epoll_wait+0x14/0x2e

#1 0x0000000000441745 in ppp_EpollSched+0x35/0x5c

#2 0x0000000000000004 in ??

Kernel stack:

[<ffffffff811f0573>] ep_poll+0x2f3/0x370

[<ffffffff811f06c0>] SyS_epoll_wait+0xd0/0xe0

[<ffffffff814aed79>] system_call_fastpath+0x16/0x1b

[<ffffffffffffffff>] 0xffffffffffffffff

Thread LWP 549:

Switches: 20

User stack:

#0 0x00007fdc2a3aaa8c in epoll_wait+0x14/0x2e

#1 0x00000000004435d4 in ppp_misc_EpollSched+0x44/0x6c

Kernel stack:

[<ffffffffffffffff>] 0xffffffffffffffff

...

c. Based on the results of steps a and b, identify the task name, find the service module that corresponds to the task, and then locate and resolve the issue in that module. For example, if the CPU usage of the snmpd task is high, the device might be under an SNMP attack, or the NMS might be accessing the device too frequently. In this case, troubleshoot the SNMP service module. If the CPU usage of the nqad task is high, NQA operations might be performed too frequently. In this case, troubleshoot the NQA service module.
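When follow job output is long, extracting the user-stack frames of each thread makes it easier to see where each thread spends its time. In the sample in step b, both threads sit in epoll_wait, which is a blocking wait for events, so they were waiting rather than computing when sampled. The following Python sketch assumes the frame format shown in step b. MODULE_HINTS contains only the two task-to-module examples from step c and is a hypothetical starting point, not a complete mapping.

```python
import re

# Excerpt of "follow job 515 slot 1" output from step b.
SAMPLE = """\
Thread LWP 515:
Switches: 3205
User stack:
#0 0x00007fdc2a3aaa8c in epoll_wait+0x14/0x2e
#1 0x0000000000441745 in ppp_EpollSched+0x35/0x5c
#2 0x0000000000000004 in ??
Kernel stack:
[<ffffffff811f0573>] ep_poll+0x2f3/0x370
Thread LWP 549:
Switches: 20
User stack:
#0 0x00007fdc2a3aaa8c in epoll_wait+0x14/0x2e
#1 0x00000000004435d4 in ppp_misc_EpollSched+0x44/0x6c
"""

THREAD = re.compile(r"^Thread LWP (\d+):")
# "#N 0xADDR in function+0xOFF/0xLEN"; the offset suffix is optional ("??" has none).
FRAME = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\S+?)(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?$")

# Hypothetical task-to-module hints, seeded with the examples in step c.
MODULE_HINTS = {"snmpd": "SNMP", "nqad": "NQA"}


def user_frames(output):
    """Map each thread (LWP) to its user-stack function names, innermost first."""
    frames = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        thread = THREAD.match(line)
        if thread:
            current = int(thread.group(1))
            frames[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            frames[current].append(frame.group(2))
    return frames


def module_for(task):
    """Suggest a service module for a busy task, or None if unknown."""
    return MODULE_HINTS.get(task)
```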

7. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration files, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

· hh3cEntityExtCpuUsageThresholdNotfication

· hh3cEntityExtCpuUsageThresholdRecover

· hh3cCpuUsageSevereNotification

· hh3cCpuUsageSevereRecoverNotification

· hh3cCpuUsageMinorNotification

· hh3cCpuUsageMinorRecoverNotification

Log messages

· DIAG/5/CPU_MINOR_RECOVERY

· DIAG/4/CPU_MINOR_THRESHOLD

· DIAG/5/CPU_SEVERE_RECOVERY

· DIAG/3/CPU_SEVERE_THRESHOLD

Troubleshooting virtual technology issues

IRF issues

IRF setup failure

Symptom

Several devices cannot form an IRF fabric, or a new member device cannot join an existing IRF fabric.

Common causes

The following are the common causes of this type of issue:

· When you use member devices to set up a new IRF fabric, the total number of IRF member devices exceeds the upper limit. When you add a new member device to an existing IRF fabric, the number of existing IRF member devices has reached the upper limit in that IRF fabric.

· The device configuration does not meet the IRF setup requirements.

· The IRF physical interfaces, cables, and physical topology do not meet the IRF setup requirements. As a result, the IRF links cannot come up.

Troubleshooting flow

Figure 28 shows the troubleshooting flowchart.

Figure 28 Flowchart for troubleshooting IRF setup failure

Solution

IMPORTANT:

This section only covers the routine requirements for setting up an IRF fabric. For more information about the requirements for setting up an IRF fabric, see IRF configuration in the configuration guides for the product.

1. Identify whether the number of IRF member devices has reached the maximum value supported by the system.

Execute the display irf command to view the number of member devices in the current IRF fabric. If the number of IRF member devices has reached the maximum value supported by the system, you cannot add any member device to the IRF fabric.

The maximum number of member devices in an IRF fabric varies by device model.

2. Verify that all member devices run the same version of software.

Execute the display version command to display the current software version on each device. Only devices running the same software version can form an IRF fabric.

Typically, the IRF auto-update feature (enabled by default) can automatically synchronize the software version of a member device with the software version of the master device. However, the synchronization might fail when the gap between the software versions is large. In this case, you must manually upgrade the software of that member device.

If the member device has two MPUs, you must upgrade the software on both MPUs to ensure software consistency between them.

3. Verify that the IRF configuration on each member device meets the IRF setup requirements.

a. Verify that all member devices are operating in IRF mode.

Some products are shipped in IRF mode and do not support mode conversion. Some products are shipped in standalone mode and support mode conversion. If a device supports the display irf link or display irf topology command, the device is operating in IRF mode. If a device does not support either of the commands, the device is operating in standalone mode. To enable IRF mode for the device, execute the chassis convert mode irf command in system view.

<Sysname> display irf ?

> Redirect it to a file

>> Redirect it to a file in append mode

configuration IRF configuration that will be valid after reboot

link Display link status

topology Topology information

| Matching output

<cr>

b. Verify that the member ID of each member device is unique across the IRF fabric.

Execute the display irf command to display the member IDs of the member devices in the IRF fabric. Each member device in the IRF fabric must use a unique member ID. Devices that use the same member ID cannot establish an IRF fabric or join the same IRF fabric. The default member ID for a device is 1. In standalone mode, you can change the IRF member ID of a device by using the irf member command. In IRF mode, you can change the IRF member ID of a device by using the irf member renumber command. For the new member ID to take effect, you must save the configuration and reboot the device.

c. Verify that each member device is shipped with a unique bridge MAC address.

Member devices shipped with the same bridge MAC address cannot join the same IRF fabric. Typically, each device is shipped with a unique bridge MAC address across the network. If IRF setup fails and the Failed to stack because of the same bridge MAC addresses message is generated, two devices are shipped with the same bridge MAC address. In this case, use the irf mac-address command to change the bridge MAC address on one of the devices. (Support for the irf mac-address command depends on the device model.)

d. Verify that all member devices in the same IRF fabric use the same IRF domain ID.

The IRF domain ID does not affect IRF fabric setup and merge, but it affects multi-active detection (MAD). To ensure that MAD can operate correctly, make sure all member devices in the same IRF fabric use the same IRF domain ID. By default, the IRF domain ID is 0. To obtain the IRF domain ID of a device, execute the display irf command on that device and check the value in the Domain ID field of the command output. If the IRF domain ID of a device is different from that of the other devices, execute the irf domain command to change the IRF domain ID on the device.
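The uniqueness and consistency checks in substeps b through d, plus the software version check in step 2, can be run over values collected from each device. In the following Python sketch, the Member record and its fields are hypothetical. Fill them in from display irf, display version, or whatever source you use for each device's bridge MAC address.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical per-device record, filled in by hand or by another script."""
    member_id: int
    bridge_mac: str
    domain_id: int
    version: str


def duplicates(values):
    """Return the values that occur more than once, sorted."""
    return sorted(v for v, n in Counter(values).items() if n > 1)


def check_members(members):
    """Return human-readable problems that block IRF setup or affect MAD."""
    problems = []
    for mid in duplicates(m.member_id for m in members):
        problems.append(f"member ID {mid} is used by more than one device")
    for mac in duplicates(m.bridge_mac.lower() for m in members):
        problems.append(f"bridge MAC {mac} is used by more than one device")
    if len({m.version for m in members}) > 1:
        problems.append("software versions differ across devices")
    if len({m.domain_id for m in members}) > 1:
        problems.append("IRF domain IDs differ (affects MAD, not setup)")
    return problems
```

For example, two devices that both keep the default member ID 1 produce "member ID 1 is used by more than one device", which corresponds to substep b.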

4. Verify that the IRF ports are in up state.

An IRF port is a logical interface that connects IRF member devices. To use an IRF port, you must bind a minimum of one physical interface to it. To obtain the status of IRF ports, execute the display irf topology command and check the value in the Link field of the command output.

<Sysname> display irf topology

Topology Info

-------------------------------------------------------------------------

           IRF-Port1       IRF-Port2

 MemberID  Link  neighbor  Link  neighbor  Belong To

 2         DIS   ---       UP    1         5e40-08d9-0104

 1         UP    2         DIS   ---       5e40-08d9-0104

○ If the value of the Link field is UP for an IRF port on a member device, the IRF port is correctly connected and no action is required.

○ If the value of the Link field is DIS for an IRF port on a member device, no IRF physical interfaces have been bound to the IRF port. If the IRF port needs bindings, execute the port group interface command in IRF port view to bind IRF physical interfaces to it.

○ If the value of the Link field is DOWN for an IRF port on a member device, execute the display irf link command to examine whether the IRF physical interfaces bound to the IRF port are in UP state.

- If at least one IRF physical interface is up but the IRF port is down, the IRF port configuration might not have been activated. To activate it, execute the irf-port-configuration active command in system view.

- If no IRF physical interfaces are in UP state, proceed to step 5 to troubleshoot the IRF physical interfaces.

○ If the value of the Link field is TIMEOUT for an IRF port on a member device, the IRF hello packets have timed out and the IRF link has communication issues. Perform the following tasks to locate the cause of the timeout:

- Identify whether the IRF packet exchange failure is caused by an anomaly of the neighboring IRF port. For this purpose, log in to the neighboring device at the other end of the IRF link, execute the display irf topology and display irf link commands on the neighboring device, and then locate the issue based on the command output.

- Verify that no network loops exist on the IRF fabric, as they lead to packet loss. To identify whether a network loop exists, execute the display counters rate inbound interface command to display the packet rate statistics of the IRF physical interfaces and examine whether a packet storm has occurred on the IRF link. If a packet storm exists, check for a physical loop and examine whether the VLAN and STP settings are correct. If a physical loop exists or the settings are incorrect, remove the loop or correct the settings to resolve the packet storm issue.

- Execute the display device command to examine whether the switching fabric modules are operating correctly. If not, first troubleshoot the issue with the switching fabric module.

○ If the value of the Link field is ISOLATE for an IRF port on a member device, the member device is isolated. In this case, execute the display logbuffer | include STM stackability check command, and then proceed according to the command output.

- If the command output includes the STM stackability check: Product series is inconsistency message, the model of the member device does not meet the IRF setup requirements. In this case, proceed to step 7.

- If the command output includes the STM stackability check: Product xxx is inconsistency message, where xxx might represent the system operating mode or other settings that require consistency across member devices, the current system parameter configuration does not meet the IRF setup requirements. In this case, proceed to step 8.
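The Link-field decisions in this step amount to a lookup table. The following Python sketch parses the member rows of display irf topology in the column order shown above (member ID, then Link and neighbor for IRF-Port1 and IRF-Port2, then Belong To) and returns a next step for each IRF port. The action strings paraphrase this step and are not device output.

```python
import re

# Member rows of "display irf topology" from the example above.
SAMPLE = """\
MemberID Link neighbor Link neighbor Belong To
2 DIS --- UP 1 5e40-08d9-0104
1 UP 2 DIS --- 5e40-08d9-0104
"""

# MemberID, IRF-Port1 Link/neighbor, IRF-Port2 Link/neighbor, Belong To
ROW = re.compile(r"^(\d+)\s+([A-Z]+)\s+(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)$")

NEXT_STEP = {
    "UP": "no action required",
    "DIS": "bind physical interfaces with port group interface",
    "DOWN": "check member interfaces with display irf link",
    "TIMEOUT": "check the neighbor, loops, and switching fabric modules",
    "ISOLATE": "check display logbuffer for STM stackability check messages",
}


def irf_port_actions(output):
    """Return {(member_id, irf_port): next step} for every IRF port listed."""
    actions = {}
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # title, separator, or header line
        member = int(m.group(1))
        for port, state in ((1, m.group(2)), (2, m.group(4))):
            actions[(member, port)] = NEXT_STEP.get(state, f"unknown state {state}")
    return actions
```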

5. Check the state of IRF physical interfaces and verify that a minimum of one IRF physical interface is up for each IRF port.

Execute the display irf link command to check the state of IRF physical interfaces.

○ If the value of the Interface field is disable for an IRF port, no IRF physical interfaces have been bound to the IRF port.

○ If the value of the Interface field for an IRF port is one or multiple physical interface names, check the Status field. The values of the Status field have the following meanings:

- UP—An IRF physical link is up. In this state, no action is required.

- DOWN—An IRF physical link is down. In this case, verify that the transceiver module and the fiber or cable of the IRF physical interface are operating correctly. You must use a physical interface that meets the product requirements as an IRF physical interface, and connect it with a connection medium that meets the product requirements. If the transceiver module and the fiber or cable are operating correctly, proceed to step 6.

- ADM—An IRF physical interface is shut down by using the shutdown command. In this state, the IRF physical interface is administratively down. To bring up the IRF physical interface, execute the undo shutdown command.

- ABSENT—An IRF physical interface does not exist. Install the card or expansion interface module that hosts the interface.
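The requirement that each IRF port have at least one UP physical interface can be checked as follows. This Python sketch does not parse display irf link output. It assumes you have already transcribed each IRF port's bound interfaces and their Status values into a dictionary. The interface names in the usage example are hypothetical.

```python
# Paraphrased actions for each Status value in step 5.
STATUS_ACTION = {
    "UP": "no action required",
    "DOWN": "check the transceiver module and fiber or cable, then go to step 6",
    "ADM": "run undo shutdown on the interface",
    "ABSENT": "install the card or expansion interface module that hosts it",
}


def check_irf_port(members):
    """Check one IRF port.

    members maps physical interface names to Status values. An empty mapping
    stands for an Interface field of "disable" (nothing bound).
    Returns (port_usable, {interface: action}) for interfaces that are not UP.
    """
    if not members:
        return False, {}
    actions = {
        ifname: STATUS_ACTION.get(status, "unknown status")
        for ifname, status in members.items()
        if status != "UP"
    }
    usable = any(status == "UP" for status in members.values())
    return usable, actions


if __name__ == "__main__":
    print(check_irf_port({"HGE1/0/1": "UP", "HGE1/0/2": "ADM"}))
```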

6. Verify that the IRF physical connections meet the IRF connection requirements.

Perform the following operations to locate an IRF physical connection issue:

a. On each member device, execute the display irf configuration command to view the binding relationship between IRF ports and IRF physical interfaces. Verify that the IRF physical interfaces bound to each IRF port are the interfaces that are actually cabled for the IRF connections. If not, reconfigure the IRF port bindings or reconnect the physical interfaces.

b. Verify that the IRF physical interfaces are correctly connected. Make sure the IRF physical interfaces of IRF-port 1 on one member device are connected to the IRF physical interfaces of IRF-port 2 on another member device. If the IRF fabric contains only two member devices, you must connect them in a daisy-chain topology rather than a ring topology.
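The cabling rules in substep b can be checked against a list of IRF port connections. In the following Python sketch, each link is written as ((member, IRF port), (member, IRF port)); this representation is an assumption made for illustration. The connection from member 1 IRF-port 1 to member 2 IRF-port 2 matches the display irf topology sample in step 4.

```python
def bad_links(links):
    """Return links that do not join IRF-port 1 on one member to IRF-port 2 on another.

    Each link is ((member_a, port_a), (member_b, port_b)), where port is 1 or 2.
    """
    bad = []
    for (member_a, port_a), (member_b, port_b) in links:
        if member_a == member_b or {port_a, port_b} != {1, 2}:
            bad.append(((member_a, port_a), (member_b, port_b)))
    return bad


def is_two_member_ring(links):
    """True if exactly two members are cabled to each other on both IRF ports."""
    members = {member for link in links for member, _ in link}
    used_ports = {end for link in links for end in link}
    return len(members) == 2 and len(used_ports) == 4
```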

7. Verify that the hardware of the member devices meets the IRF setup requirements.

You must use hardware that meets the IRF setup requirements to set up an IRF fabric. For example, the device model, MPUs, interface modules, and IRF physical interfaces must meet the IRF setup requirements. You can perform the following tasks to determine whether the device hardware meets the IRF setup requirements:

# Execute the display version command to check the device model.

<Sysname> display version

H3C Comware Software, Version 7.1.070, Alpha 704228

H3C S12508X-AF uptime is 0 weeks, 0 days, 2 hours, 31 minutes

Last reboot reason : Cold reboot

...

# Execute the display device command to check the models of the MPUs and interface modules.

...

# Execute the display interface command to check the rate and type of each IRF physical interface.

...

8. Verify that the system parameter settings meet the IRF setup requirements.

To set up an IRF fabric, all member devices must use the same system parameter settings, including the same system operating mode, VXLAN hardware resource mode, route hardware resource mode, and maximum number of ECMP routes. (The restrictions vary by device model.)

○ To display the system operating mode on a device, use the display system-working-mode command. To change the system operating mode of the device, use the system-working-mode command. For the mode change to take effect, you must save the configuration and reboot the device.

○ To display the hardware resource modes on a device, use the display hardware-resource command. To change the VXLAN and route hardware resource modes of the device, use the hardware-resource vxlan and hardware-resource routing-mode commands, respectively. For the mode changes to take effect, you must save the configuration and reboot the device.

○ To display the maximum number of IPv4 ECMP routes and the maximum number of IPv6 ECMP routes supported by the system, use the display max-ecmp-num and display ipv6 max-ecmp-num commands, respectively. To change the maximum number of IPv4 ECMP routes and the maximum number of IPv6 ECMP routes, use the max-ecmp-num and ipv6 max-ecmp-num commands, respectively. For the changes to take effect, you must save the configuration and reboot the device.
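Comparing these settings across members is a simple diff. The following Python sketch takes one dictionary of parameter values per member, however you collect them, and reports each parameter whose value differs. The parameter names and values in the usage example are placeholders, not actual command output.

```python
def mismatched_parameters(settings):
    """Return {parameter: {member_id: value}} for parameters that differ.

    settings maps member ID -> {parameter name: value}. A parameter missing
    on some member is reported as None for that member. Values must be hashable.
    """
    names = set()
    for params in settings.values():
        names.update(params)
    mismatches = {}
    for name in sorted(names):
        values = {member: params.get(name) for member, params in settings.items()}
        if len(set(values.values())) > 1:
            mismatches[name] = values
    return mismatches


if __name__ == "__main__":
    example = {
        1: {"system-working-mode": "mode-a", "max-ecmp-num": 32},
        2: {"system-working-mode": "mode-a", "max-ecmp-num": 16},
    }
    print(mismatched_parameters(example))
```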

9. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

Module name: HH3C-STACK-MIB

· hh3cStackPhysicalIntfLinkDown (1.3.6.1.4.1.25506.2.91.6.0.8)

· hh3cStackPhysicalIntfRxTimeout (1.3.6.1.4.1.25506.2.91.6.0.9)

Log messages

· STM/3/STM_LINK_DOWN

· STM/2/STM_LINK_TIMEOUT

· STM/6/STM_LINK_UP

· STM/4/STM_SAMEMAC

· STM/3/STM_SOMER_CHECK

Unexpected reboot of an IRF member device

Symptom

The master device or a subordinate device in an IRF fabric reboots unexpectedly. As a result, the IRF fabric splits.

Common causes

The following are the common causes of this type of issue:

· The subordinate device automatically reboots to load startup software images from the master device.

· IRF merge causes the subordinate device to reboot.

· A software or hardware fault causes the device to reboot unexpectedly in an attempt to fix the fault.

Troubleshooting flow

Figure 29 shows the troubleshooting flowchart.

Figure 29 Flowchart for troubleshooting unexpected reboot of an IRF member device

Solution

1. Identify whether the rebooted device is a subordinate device.

○ If the device is a subordinate device, proceed to step 2.

○ If the device is the master device, proceed to step 4.

2. Identify whether the reboot is caused by the software auto-update feature.

○ If the reboot is caused by the software auto-update feature, no action is required.

○ If the reboot is not caused by the software auto-update feature, proceed to step 3.

To identify whether the reboot of the subordinate device is caused by the software auto-update feature, execute the display system internal irf msg command in probe view. If the command output includes the Version is different, and the sender CPU MAC is xxxx-xxxx-xxxx (chassis xx slot xx). message, the reboot of the subordinate device with the CPU MAC of xxxx-xxxx-xxxx is caused by the software auto-update feature.
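When the message output is long, a regular expression can extract the CPU MAC address, chassis, and slot from each auto-update message. The following Python sketch matches the message text quoted above. The sample values in the usage example are hypothetical.

```python
import re

# Message text from this step; the MAC, chassis, and slot values vary.
AUTO_UPDATE = re.compile(
    r"Version is different, and the sender CPU MAC is "
    r"([0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}) \(chassis (\d+) slot (\d+)\)"
)


def auto_update_reboots(output):
    """Return (cpu_mac, chassis, slot) for each auto-update message found."""
    return [
        (mac.lower(), int(chassis), int(slot))
        for mac, chassis, slot in AUTO_UPDATE.findall(output)
    ]


if __name__ == "__main__":
    line = "Version is different, and the sender CPU MAC is 5e40-08d9-0200 (chassis 2 slot 1)."
    print(auto_update_reboots(line))
```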

3. Identify whether the reboot is caused by an IRF merge.

○ If the reboot is caused by an IRF merge, locate the causes of the IRF split and merge, and eliminate the underlying risks to prevent the same issue from causing another IRF split and merge.

○ If the reboot is not caused by an IRF merge, proceed to step 4.

To identify whether the reboot of the subordinate device is caused by an IRF merge:

○ Execute the display kernel reboot command on the IRF fabric to obtain the device reboot reason after the device reboots. If the value for the Reason field is 0x7, the device reboots due to an IRF merge. The value for the Slot field represents the number of the slot that triggers the reboot, and the value for the Target Slot field represents the number of the slot that has been rebooted.

<Sysname> display kernel reboot 1

--------------------- Reboot record 1 ---------------------

Recorded at : 2021-12-06 00:10:05.440616

Occurred at : 2021-12-06 00:10:05.440616

Reason : 0x7

Thread : STM_Main (TID: 232)

Context : thread context

Slot : 1

Target Slot : 2

Cpu : 0

VCPU ID : 2

Kernel module info : module name (system) module address (0xffffffffc0074000)

module name (addon) module address (0xffffffffc0008000)

○ Execute the display system internal irf msg | include reboot command in probe view on the IRF fabric. If the master device has sent a reboot message, the reboot of the subordinate device is caused by an IRF merge.

19> Send reboot pkt, src_addr 5e40-08d9-0104 (chassis 1 slot 1), at 2022/1/5 15:42:48:386
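The Reason check can also be scripted against saved display kernel reboot output. The following Python sketch parses the "Key : Value" lines of one reboot record, as in the sample above, and reports the triggering and rebooted slots when the reason is 0x7. It assumes one record per input string.

```python
# Excerpt of "display kernel reboot 1" output from the example above.
SAMPLE = """\
Recorded at : 2021-12-06 00:10:05.440616
Reason : 0x7
Thread : STM_Main (TID: 232)
Slot : 1
Target Slot : 2
"""

IRF_MERGE_REASON = 0x7


def parse_record(text):
    """Parse "Key : Value" lines into a dict, splitting on the first colon."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip()] = value.strip()
    return record


def merge_reboot(record):
    """Return (trigger_slot, rebooted_slot) for an IRF merge reboot, else None."""
    if int(record.get("Reason", "0"), 16) != IRF_MERGE_REASON:
        return None
    return int(record["Slot"]), int(record["Target Slot"])
```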

4. Examine whether the reboot is caused by a software or hardware fault.

Execute the display version command, check the Reboot Cause field for the reboot cause, and handle the reboot issue according to the reboot cause as shown in Table 2.

<Sysname> display version

...

Reboot Cause : ColdReboot

[SubSlot 0] 24GE+4SFP Plus+POE

Table 2 Device reboot causes and recommended actions

Value for the Reboot Cause field	Reboot cause description	Recommended actions
AutoUpdateReboot	The reboot was caused by an automatic software upgrade.	No action is required.
BootwareBackupReboot	Bootware backup area reboot.	Collect log messages and diagnostic messages, and then contact Technical Support for help.
ColdReboot	The reboot was caused by a power cycle.	Check the power supply environment of the device to ensure that the power supply module can provide power correctly to the device.
CryptographicModuleSelftestsFailedReboot	The reboot was caused by an algorithm library self-test failure.	Upgrade the software version as soon as possible.
CryptotestFailReboot	The reboot was caused by a cryptographic algorithm library self-check failure.	Upgrade the software version as soon as possible.
DeadLoopReboot	The reboot was caused by a kernel thread dead loop.	Collect log messages, diagnostic messages, and the command output from the display kernel deadloop 20 verbose command executed for the reboot slot, and then contact Technical Support for help.
DEVHandShakeReboot	The reboot was caused by a device management handshake failure.	Execute the display device command to identify whether the active MPU is in Normal state. If the state is not Normal, the MPU might fail. You must resolve the MPU issue first.
GoldMonReboot	The Generic OnLine Diagnostics (GOLD) module detected an exception.	Perform the following operations to locate the reboot cause: 1. Execute the display diagnostic content command, check the Correct-action field, and verify whether the corrective action is reboot. If it is, obtain the time when the device was rebooted and troubleshoot issues that occurred around that time. 2. Execute the display diagnostic event-log command to display GOLD log entries. 3. Locate the reboot cause based on the command output and resolve the issue.
IRFMergeReboot	The reboot was caused by an IRF merge.	An IRF link failure can cause an IRF split. Once the IRF link is recovered, the IRF fabric will automatically merge. To prevent the same issue from causing an IRF split and merge again, locate and resolve the issue.
KernelAbnormalReboot	A CPU, host memory, or software issue led to a system kernel error.	Collect log messages, diagnostic messages, and the command output from the display kernel exception 10 verbose and display kernel reboot 20 verbose commands, and then contact Technical Support for help.
KeyReboot	The RESET key was pressed.	Avoid accidental operations.
LicenseTimeoutReboot	The license has expired.	Install a formal license as soon as possible.
MasterLostReboot	The master slot was rebooted while the current slot was performing a bulk backup operation.	Collect log messages and diagnostic messages, and then contact Technical Support for help.
MemoryexhaustReboot	The amount of free memory is lower than the threshold value.	Identify the cause of high memory usage and resolve the high memory usage fault accordingly. For example, too many ACL entries can cause high memory usage.
PdtReboot	The reboot was required by the driver.	Collect log messages and diagnostic messages, and then contact Technical Support for help.
SelfReboot	The current slot was reset.	Collect log messages and diagnostic messages, and then contact Technical Support for help.
StandbyCannotUpdateReboot	The standby MPU cannot be upgraded to the active MPU.	Collect log messages and diagnostic messages, and then contact Technical Support for help.
StandbySwitchReboot	The original active MPU was rebooted after an active/standby switchover.	Identify the cause of the active/standby switchover and resolve the fault that causes the active/standby switchover to prevent another unexpected active/standby switchover. For example, software upgrade can cause an active/standby switchover.
UserReboot	The reboot was caused by a manual operation through the CLI, the network manager, or the Web interface.	No action is required.
WarmReboot	The reboot might be caused by various reasons, for example, poor contact of board pins.	Collect log messages and diagnostic messages, and then contact Technical Support for help.
WatchDogReboot	The watchdog detected a system fault, for example, a CPU, memory, software, or hardware fault.	Use the display hardware-failure-detection command to locate the cause of the fault based on the command output, and troubleshoot the fault.

5. If the issue persists, collect the following information and contact Technical Support:

○ Output from the following commands. For example, if the active MPU is in slot 16 and the standby MPU in slot 17 reboots, collect the output from these commands:

- Execute the following commands in any view:

display version

display device

display diagnostic-information

display kernel deadloop 20 verbose slot 16

display kernel exception 10 verbose slot 16

display kernel reboot 20 verbose slot 16

- Execute the following commands in probe view to collect information:

local logbuffer slot 17 display

local logbuffer slot 17 display from-highmemory

display reboot last-time slot 17

display system internal version

display diag-msg start-msg slot 17

NOTE:

Support for these commands depends on the device model and software version.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

· DEV/1/AUTO_SWITCH_FAULT_REBOOT

· DEV/5/BOARD_REBOOT

· DEV/1/BOARD_RUNNING_FAULT_REBOOT

· DEV/5/CHASSIS_REBOOT

· DEV/5/SUBCARD_REBOOT

· DEV/5/SYSTEM_REBOOT

· STM/4/STM_MERGE

Troubleshooting interface issues

Tunnel interface issues

Tunnel interface instability

Symptom

After you configure a P2P tunnel (for example, a GRE, IPv4, or IPv6 tunnel), the local tunnel interface is in up state. You can ping the IP address of the remote tunnel interface from the local tunnel interface. However, the local tunnel interface is unstable. The following symptoms exist:

· The tunnel interface is repeatedly coming up and going down.

· The tunneled packet loss rate is high and the transmission rate is low.

This section uses a GRE/IPv4 tunnel as an example to describe the troubleshooting procedure.

Common causes

The following are the common causes of this type of issue:

· Routes destined for the tunnel destination address are flapping, which causes the tunnel to flap.

· The same source and destination addresses are configured on the device for two tunnels. As a result, only one tunnel can come up.

· Keepalive is enabled on the GRE tunnel interface. However, the device cannot correctly send or receive GRE keepalive packets. As a result, the device places the tunnel in down state.

· The device does not have sufficient resources to successfully issue the tunnel to the hardware. As a result, the tunnel is down on the physical layer.

· The configuration on the tunnel interface is inappropriate, leading to the loss of tunneled packets.

Troubleshooting flow

Figure 30 shows the troubleshooting flowchart.

Figure 30 Flowchart for troubleshooting tunnel interface instability

Solution

1. Examine whether routes are flapping.

Execute the debugging tunnel event command to enable tunneling event debugging. If the system continuously generates route refresh or deletion messages, routes are flapping. In this case, the tunnel is also flapping. The following information shows a sample command output:

<Sysname> debugging tunnel event

<Sysname> %Jun 16 12:49:55:497 2022 Sysname BGP/5/BGP_STATE_CHANGED: -MDC=1; BGP.: 4.4.4.4 state has changed from ESTABLISHED to IDLE for TCP_Connection_Failed event received.

//The system received a TCP connection failure event from the BGP peer at 4.4.4.4. The state of the BGP session has changed from Established to Idle.

%Jun 16 12:49:55:497 2022 Sysname BGP/5/BGP_STATE_CHANGED_REASON: -MDC=1; BGP.: 4.4.4.4 state has changed from ESTABLISHED to IDLE. (Reason: TCP connection failed(No route to host))

//The BGP session established for the BGP peer at 4.4.4.4 has changed from Established state to Idle state due to a TCP connection failure (no route to reach the host).

If routes are flapping, locate the cause of the route flapping based on the route refresh or deletion messages. For example, if the BGP session cannot remain in Established state, troubleshoot the issue as described in BGP issues in this guide.

If routes are not flapping, proceed to the next step.
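Flapping shows up as the same BGP peer repeatedly leaving the Established state. The following Python sketch counts such transitions per peer in saved log text that uses the BGP_STATE_CHANGED format shown above. The default of two drops is an arbitrary illustration, and the sketch ignores timestamps, so apply it to a log window of your choosing.

```python
import re
from collections import Counter

# Matches BGP_STATE_CHANGED (but not BGP_STATE_CHANGED_REASON) entries.
STATE_CHANGE = re.compile(
    r"BGP_STATE_CHANGED:.*? (\S+) state has changed from (\w+) to (\w+)"
)


def flapping_peers(log_text, min_drops=2):
    """Return peers that left the ESTABLISHED state at least min_drops times."""
    drops = Counter()
    for peer, old, new in STATE_CHANGE.findall(log_text):
        if old == "ESTABLISHED" and new != "ESTABLISHED":
            drops[peer] += 1
    return sorted(peer for peer, count in drops.items() if count >= min_drops)
```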

2. Identify whether the device at each end has tunnels with the same source and destination addresses.

Execute the display interface tunnel command in any view on the device at each end of the tunnel and identify whether the same device has multiple tunnels that use the same source and destination addresses.

<Sysname> display interface Tunnel

Tunnel1

Current state: UP

Line protocol state: UP

Description: Tunnel1 Interface

Bandwidth: 64 kbps

Maximum transmission unit: 1464

Internet protocol processing: Disabled

Output queue - Urgent queuing: Size/Length/Discards 0/100/0

Output queue - Protocol queuing: Size/Length/Discards 0/500/0

Output queue - FIFO queuing: Size/Length/Discards 0/75/0

Last clearing of counters: 15:20:18 Mon 06/13/2022

Tunnel source 1.1.1.1, destination 2.2.2.2

...

If multiple tunnels use the same source and destination addresses on the same device, only one of them can come up. You can execute the undo interface tunnel command to delete unneeded tunnels. If the same device does not have multiple tunnels that use the same source and destination addresses, proceed to the next step.
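On a device with many tunnels, duplicates are easier to find by grouping tunnel interfaces by their endpoints. The following Python sketch assumes the "TunnelN" name line and the "Tunnel source X, destination Y" line format shown in the sample above. Tunnel2 in the sample is hypothetical, and tunnels whose source is displayed in another form are not matched.

```python
import re
from collections import defaultdict

# Excerpt of "display interface Tunnel" output; Tunnel2 is a hypothetical duplicate.
SAMPLE = """\
Tunnel1
Current state: UP
Description: Tunnel1 Interface
Tunnel source 1.1.1.1, destination 2.2.2.2
Tunnel2
Current state: DOWN
Tunnel source 1.1.1.1, destination 2.2.2.2
"""

NAME = re.compile(r"^(Tunnel\d+)$")
ENDPOINTS = re.compile(r"^Tunnel source (\S+), destination (\S+)$")


def duplicate_endpoints(output):
    """Return {(source, destination): [tunnels]} for endpoint pairs used more than once."""
    groups = defaultdict(list)
    current = None
    for line in output.splitlines():
        line = line.strip()
        name = NAME.match(line)
        if name:
            current = name.group(1)
            continue
        ends = ENDPOINTS.match(line)
        if ends and current:
            groups[ends.groups()].append(current)
    return {pair: names for pair, names in groups.items() if len(names) > 1}
```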

3. Identify whether GRE keepalive is configured and whether GRE keepalive packets can be sent and received correctly.

Execute the display current-configuration interface tunnel command in any view to display the keepalive configuration of the tunnel interface.

<Sysname> display current-configuration interface tunnel

interface Tunnel1 mode gre

ip address 10.1.1.2 255.255.255.0

source 12.1.1.4

destination 12.1.1.2

keepalive 3 3

On the local end, execute the debugging gre packet command to enable GRE packet debugging to identify whether the local end can correctly receive and send keepalive packets.

<Sysname> debugging gre packet

*Jun 16 12:46:50:350 2022 Sysname GRE/7/packet: -MDC=1;

Tunnel1 packet: Before encapsulation,

12.1.1.2->12.1.1.4 (length = 24)

*Jun 16 12:46:50:350 2022 Sysname GRE/7/packet: -MDC=1;

Tunnel1 packet: After encapsulation,

12.1.1.4->12.1.1.2 (length = 48)

*Jun 16 12:46:50:351 2022 Sysname GRE/7/packet: -MDC=1;

Tunnel1 packet: Before de-encapsulation according to fast-forwarding table,

12.1.1.2->12.1.1.4 (length = 24)

*Jun 16 12:46:50:351 2022 Sysname GRE/7/packet: -MDC=1;

Tunnel1 : Received a keepalive packet.

//Tunnel 1 received a keepalive packet.

On the remote end, enable GRE packet debugging as well. If the remote end has sent keepalive packets but the local end has not received any of them, the keepalive packets might have failed the GRE checksum check on the local end, which causes the local tunnel interface to go down. In this case, execute the undo gre checksum command to disable GRE checksum on the local end, or execute the undo keepalive command to disable GRE keepalive.

If the local end can successfully receive keepalive packets from the remote end, proceed to step 4.
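The following Python sketch models why missing keepalives bring the tunnel down. It assumes that in keepalive 3 3 the first value is the keepalive interval in seconds and the second value is the number of consecutive unanswered keepalive periods tolerated before the tunnel interface goes down. Check the command reference for the exact semantics on your software version. The model is simplified and hypothetical.

```python
def keepalive_outcome(replies, interval=3, retries=3):
    """Simplified GRE keepalive model (assumption, not the device's state machine).

    replies: one boolean per keepalive period, True if a keepalive was
    received from the peer during that period.
    Returns ("DOWN", seconds_elapsed) once `retries` consecutive periods pass
    without a reply, otherwise ("UP", None).
    """
    missed = 0
    for period, got_reply in enumerate(replies, start=1):
        # Any received keepalive resets the miss counter.
        missed = 0 if got_reply else missed + 1
        if missed >= retries:
            return ("DOWN", period * interval)
    return ("UP", None)
```

Under these assumptions, a peer that stops answering is detected after roughly interval × retries seconds (9 seconds for keepalive 3 3).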

4. Verify that the hop limit or TTL and DF bit settings of tunneled packets are properly configured.

Execute the display current-configuration interface tunnel command in any view to check the configuration of the hop limit or TTL and DF bit parameters.

interface Tunnel1 mode gre

ip address 10.1.1.2 255.255.255.0

source 12.1.1.4

destination 12.1.1.2

keepalive 3 3

tunnel ttl 1

tunnel dfbit enable

If the parameters are not properly configured, tunneled packets might be discarded.

If the hop limit or TTL value is too small, tunneled packets might be discarded on intermediate devices when the TTL expires. In this case, execute the tunnel ttl command in tunnel interface view to set a TTL value appropriate for the actual network.

If the DF bit is set for tunneled packets, intermediate devices discard tunneled packets whose length exceeds the MTU of the outgoing interface. In this case, make sure the MTU of each interface on the forwarding path is no smaller than the length of tunneled packets. If you cannot guarantee this, do not set the DF bit for tunneled packets.

If the issue persists, proceed to step 5.
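You can reason about the TTL and DF checks in this step with simple arithmetic. In the debugging output for step 3, encapsulation grows a keepalive packet from 24 to 48 bytes. This matches a 20-byte outer IPv4 header plus a 4-byte GRE header. The MTU = 1476 value in the debugging output for step 5 is consistent with 1500 minus the same 24 bytes.

The Python sketch below assumes this basic 24-byte overhead (no GRE checksum or key) and describes the path only by its intermediate router count and link MTUs. It is an illustration and does not model any particular device's forwarding behavior.

```python
# 20-byte outer IPv4 header + 4-byte basic GRE header (assumed: no checksum
# or key option, each of which would add 4 more bytes).
GRE_IPV4_OVERHEAD = 24


def tunneled_packet_fate(inner_len, ttl, intermediate_hops, link_mtus, df_bit):
    """Predict what happens to one GRE-over-IPv4 packet on a simplified path.

    intermediate_hops: routers between the tunnel endpoints. Each one
    decrements the outer TTL and drops the packet if the TTL would reach 0.
    link_mtus: MTU of every outgoing interface along the path.
    """
    outer_len = inner_len + GRE_IPV4_OVERHEAD
    if ttl <= intermediate_hops:
        return "dropped: TTL expired"
    if any(outer_len > mtu for mtu in link_mtus):
        return "dropped: DF set and packet exceeds MTU" if df_bit else "fragmented"
    return "delivered"
```

For example, with tunnel ttl 1 the packet survives only if the tunnel endpoints are directly connected. With the DF bit set, a 1480-byte inner packet becomes 1504 bytes and is dropped on a 1500-byte link.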

5. Identify whether the system failed to issue the tunnel to hardware.

Enable tunnel debugging and check whether any tunnel packets or events failed to be issued to the kernel or driver. The following is an example:

<Sysname> debugging tunnel all

*Jun 16 12:51:25:832 2022 Sysname TUNNEL/7/event: -MDC=1;

Tunnel1 notifies driver: Operation = 4.

TunnelIfIndex = 524, EvilinkIfIndex = 0

VRFIndex = 0, DstVRFIndex = 0

TunnelMode = IPv4 GRE, TransPro = 1

TunnelSrc = 12.1.1.4

TunnelDst = 12.1.1.2

TTL = 255, ToS = 0, DFBit = 0

MTU = 1476, IPv6Mtu = 1476

DrvContext[0] = 0xffffffffffffffff, DrvContext[1] = 0xffffffffffffffff

VNHandle = 0x20000040, ADJIndex = 0xfaf3889c

//Tunnel interface 1 notified the driver to execute operation 4.

*Jun 16 12:51:25:832 2022 Sysname TUNNEL/7/event: -MDC=1;

Processing result of operation 4 for Tunnel1: failed.

//The driver failed to process operation 4 issued by tunnel interface 1.

%Jun 16 12:51:25:832 2022 Sysname IFNET/3/PHY_UPDOWN: -MDC=1; Physical state on the interface Tunnel1 changed to down.

%Jun 16 12:51:25:832 2022 Sysname IFNET/5/LINK_UPDOWN: -MDC=1; Line protocol state on the interface Tunnel1 changed to down.

//Tunnel interface 1 went down.

*Jun 16 12:51:27:350 2022 Sysname TUNNEL/7/event: -MDC=1;

Tunnel1 can't come up because there is not enough hardware resource

//Tunnel 1 cannot come up because of insufficient hardware resources.

If the device generates any of the event or error messages in Table 3, the tunnel instability is caused by a hardware-related issue. In this case, contact Technical Support.

Table 3 Debugging messages related to hardware

Field	Description
Tunnelnum can't come up because reason.	Reason why a tunnel interface cannot come up. The value for the reason variable is there is not enough hardware resource.
Failed to save 6RD prefix to DBM.	The system failed to save the IPv6 prefix of the 6RD tunnel to the database in memory (DBM).
Failed to save IPv4 prefix/suffix for 6RD tunnel to DBM.	The system failed to save the IPv4 prefix or suffix of the 6RD tunnel to the DBM.
Failed to save 6RD BR address to DBM.	The system failed to save the BR address of the 6RD tunnel to the DBM.
Failed to send 6RD prefix to kernel.	The system failed to send the 6RD prefix configuration message to the kernel for the tunnel.
Failed to send IPv4 prefix/suffix for 6RD tunnel to kernel.	The system failed to send the 6RD IPv4 configuration message to the kernel for the tunnel.
Failed to send 6RD BR address to kernel.	The system failed to send the 6RD BR address configuration message to the kernel for the tunnel.
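To check a saved debugging capture against Table 3, you can match the messages with regular expressions. In this Python sketch the patterns are copied from Table 3, with the tunnel number generalized. The list and function names are hypothetical.

```python
import re

# Message patterns from Table 3; "Tunnel<num>" is generalized to Tunnel\d+.
HARDWARE_MESSAGES = [
    r"Tunnel\d+ can't come up because .+",
    r"Failed to save 6RD prefix to DBM",
    r"Failed to save IPv4 prefix/suffix for 6RD tunnel to DBM",
    r"Failed to save 6RD BR address to DBM",
    r"Failed to send 6RD prefix to kernel",
    r"Failed to send IPv4 prefix/suffix for 6RD tunnel to kernel",
    r"Failed to send 6RD BR address to kernel",
]
HARDWARE_RE = re.compile("|".join(f"(?:{p})" for p in HARDWARE_MESSAGES))


def hardware_related_lines(debug_output):
    """Return the debugging lines that match a Table 3 message."""
    return [line.strip() for line in debug_output.splitlines()
            if HARDWARE_RE.search(line)]
```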

6. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting Layer 2—LAN switching issues

Ethernet link aggregation issues

Down aggregate interface

Symptom

When two devices are connected through link aggregation, the output from the display interface command indicates that an aggregate interface is down.

Common causes

The following are the common causes for this type of issue:

· Incorrect configuration on the aggregate interface.

· Physical link fault on the member ports.

· Failure in exchanging LACP packets.

Troubleshooting flow

To resolve this issue:

1. Use the display link-aggregation verbose command to check whether the member ports are in selected state. If a port is in unselected state, use the display interface command to check whether the physical status of the member port is up and eliminate physical faults on the port.

2. Check the local and peer aggregate interface configurations to eliminate configuration faults.

3. Use the debugging link-aggregation lacp packet command to view the LACP packet exchange on the member ports of a dynamic aggregation group.

Figure 31 shows the troubleshooting flowchart.

Figure 31 Flowchart for troubleshooting down aggregate interface

Solution

1. Check whether the physical connections are correct.

Verify that links are connected to the aggregate interface as planned.

If the physical connections are correct, proceed to step 2.

2. Check whether the aggregate interface has been manually shut down.

Execute the display interface command to check the physical state of the aggregate interface. If the state is Administratively DOWN, the aggregate interface has been manually shut down. Execute the undo shutdown command in aggregate interface view to bring it up. If the aggregate interface has not been manually shut down, proceed to step 3.

3. Check whether the member ports in the aggregation group are up.

Execute the display interface command to identify whether the member ports in the aggregation group are up. If they are not, follow the troubleshooting procedure for a down interface.

If the member ports are up, proceed to step 4.

For example, the member port GigabitEthernet 2/0/1 in Layer 2 aggregation group 1 is in unselected state. The output from the display interface command shows that the physical state of GigabitEthernet2/0/1 is DOWN, which makes the member port unselected.

<Sysname> display link-aggregation verbose

Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing

Port Status: S -- Selected, U -- Unselected, I -- Individual

Port: A -- Auto port, M -- Management port, R -- Reference port

Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,

D -- Synchronization, E -- Collecting, F -- Distributing,

G -- Defaulted, H -- Expired

Aggregate Interface: Bridge-Aggregation1

Aggregation Mode: Static

Loadsharing Type: Shar

Management VLANs: None

Port Status Priority Oper-Key

GE2/0/1 U 32768 1

<Sysname> display interface GigabitEthernet 2/0/1

GigabitEthernet2/0/1

Current state: DOWN

Line protocol state: DOWN

IP packet frame type: Ethernet II, hardware address: 2a41-21c1-0100

Description: GigabitEthernet2/0/1 Interface

Bandwidth: 1000000 kbps

Loopback is not set

Unknown-speed mode, full-duplex mode

Link speed type is autonegotiation, link duplex type is force link

Flow-control is not enabled

Maximum frame length: 9216

Allow jumbo frames to pass

Broadcast max-ratio: 100%

Multicast max-ratio: 100%

Unicast max-ratio: 100%

Known-unicast max-ratio: 100%

PVID: 1

MDI type: Automdix

Port link-type: Access

Tagged VLANs: None

Untagged VLANs: 1

Port priority: 2

Last link flapping: 0 hours 0 minutes 15 seconds

Last clearing of counters: Never

Current system time:2021-08-10 10:15:02

Last time when physical state changed to up:2021-08-09 18:31:43

Last time when physical state changed to down:2021-08-10 10:14:47

Peak input rate: 0 bytes/sec, at 00-00-00 00:00:00

Peak output rate: 0 bytes/sec, at 00-00-00 00:00:00

Last 300 seconds input: 5000 packets/sec 5000 bytes/sec -%

Last 300 seconds output: 5000 packets/sec 5000 bytes/sec -%

Input (total): 5000 packets, 5000 bytes

5000 unicasts, 5000 broadcasts, 5000 multicasts, 0 pauses

Input (normal): 0 packets, 0 bytes

0 unicasts, 0 broadcasts, 0 multicasts, 0 pauses

Input: 5000 input errors, 0 runts, 0 giants, 0 throttles

0 CRC, 0 frame, 0 overruns, 0 aborts

5000 ignored, 0 parity errors

Output (total): 5000 packets, 5000 bytes

5000 unicasts, 5000 broadcasts, 5000 multicasts, 0 pauses

Output (normal): 0 packets, 0 bytes

0 unicasts, 0 broadcasts, 0 multicasts, 0 pauses

Output: 5000 output errors, 0 underruns, 0 buffer failures

5000 aborts, 0 deferred, 0 collisions, 0 late collisions

0 lost carrier, 0 no carrier

4. Check whether the aggregate interface is in dynamic mode.

○ If the aggregate interface is in dynamic mode, check whether the peer aggregate interface is also in dynamic mode. On both ends of the link, execute the display link-aggregation verbose command in any view to check the aggregation mode of the aggregate interface. Make sure both ends use the same aggregation mode.

Using a Layer 2 aggregate interface as an example, Aggregation Mode: Dynamic in the output indicates that the aggregate interface is in dynamic mode:

<Sysname> display link-aggregation verbose bridge-aggregation 10

Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing

Port Status: S -- Selected, U -- Unselected, I -- Individual

Port: A -- Auto port, M -- Management port, R -- Reference port

Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,

D -- Synchronization, E -- Collecting, F -- Distributing,

G -- Defaulted, H -- Expired

Aggregate Interface: Bridge-Aggregation10

Creation Mode: Manual

Aggregation Mode: Dynamic

Loadsharing Type: Shar

Management VLANs: None

System ID: 0x8000, 000f-e267-6c6a

Local:

Port Status Priority Index Oper-Key Flag

GE2/0/1 S 32768 61 2 {ACDEF}

GE2/0/2 S 32768 62 2 {ACDEF}

GE2/0/3 S 32768 63 2 {ACDEF}

Remote:

Actor Priority Index Oper-Key SystemID Flag

GE2/0/1(R) 32768 111 2 0x8000, 000f-e267-57ad {ACDEF}

GE2/0/2 32768 112 2 0x8000, 000f-e267-57ad {ACDEF}

GE2/0/3 32768 113 2 0x8000, 000f-e267-57ad {ACDEF}

If the aggregation modes are different, change the remote aggregate interface to dynamic mode. If the aggregation modes are the same, execute the debugging link-aggregation lacp packet command to identify whether LACP packets are sent and received correctly.

In the debugging output, compare the Actor field of the packets sent by the member port with the Partner field of the packets received by the member port. If the sys-mac, key, and port-index values are different, LACP packet exchange is abnormal. In this case, check whether the receive or transmit fiber is disconnected. If these values are the same, LACP packet exchange is normal. Proceed to step 5.

For example, enable LACP packet debugging for member port GigabitEthernet 2/0/1 and observe the LACP packets sent and received on this port.

<Sysname> debugging link-aggregation lacp packet all interface gigabitethernet 2/0/1

*Nov 2 15:51:21:15 2007 Sysname LAGG/7/Packet: PACKET.GigabitEthernet2/0/1.send.

size=110, subtype =1, version=1

Actor: type=1, len=20, sys-pri=0x8000, sys-mac=00e0-fc02-0300, key=0x1, pri=0x8000, port-index=0x2, state=0xc5

Partner: type=2, len=20, sys-pri=0x0, sys-mac=0000-0000-0000, key=0x0, pri=0x0, port-index=0x0, state=0x32

Collector: type=3, len=16, col-max-delay=0x0

Terminator: type=0, len=0

*Nov 2 15:55:21:15 2007 Sysname LAGG/7/Packet: PACKET.GigabitEthernet2/0/1.receive.

size=110, subtype =1, version=1

Actor: type=1, len=20, sys-pri=0x8000, sys-mac=00e0-fc00-0000, key=0x1, pri=0x8000, port-index=0x6, state=0xd

Partner: type=2, len=20, sys-pri=0x8000, sys-mac=00e0-fc02-0300, key=0x1, pri=0x8000, port-index=0x2, state=0xc5

Collector: type=3, len=16, col-max-delay=0x0

Terminator: type=0, len=0

○ If the aggregate interface is in static mode, proceed to step 5.
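You can express the Actor/Partner comparison in step 4 as a small check. The Python sketch below assumes the debugging line format shown above. It treats the exchange as consistent when the Partner field received from the peer echoes the sys-mac, key, and port-index values that the local port sent in its Actor field. The function names are hypothetical.

```python
import re

# Identity fields that the peer should echo back in its Partner field.
IDENTITY_FIELDS = ("sys-mac", "key", "port-index")
FIELD_RE = re.compile(r"\b(sys-mac|key|port-index)=([0-9A-Fa-fx-]+)")


def identity(line):
    """Extract sys-mac, key, and port-index from an Actor or Partner debug line."""
    return dict(FIELD_RE.findall(line))


def lacp_exchange_consistent(sent_actor_line, received_partner_line):
    """Return True if the received Partner field matches the Actor field this port sent."""
    sent = identity(sent_actor_line)
    echoed = identity(received_partner_line)
    return all(field in sent and sent[field] == echoed.get(field)
               for field in IDENTITY_FIELDS)
```

In the example output above, the received Partner field (sys-mac=00e0-fc02-0300, key=0x1, port-index=0x2) matches the sent Actor field, so the check passes.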

5. Check whether the minimum number of selected ports for the aggregate interface affects the selection of member ports.

Execute the display this command in aggregate interface view. If the link-aggregation selected-port minimum command is configured, lower the minimum number of selected ports or add member ports to the aggregation group so that the minimum can be met. When the number of selectable member ports reaches or exceeds the configured minimum, these member ports become selected and the aggregate interface comes up.

If the minimum number of selected ports for the aggregation interface does not affect the selection of the member ports, proceed to step 6.

For example, the minimum number of selected ports for Layer 2 aggregate interface 1 is 2. The aggregation group of Layer 2 aggregate interface 1 has only one member port, so this member port is in unselected state.

[Sysname-Bridge-Aggregation1] display this

interface Bridge-Aggregation1

link-aggregation selected-port minimum 2

return

[Sysname-Bridge-Aggregation1] display link-aggregation verbose

Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing

Port Status: S -- Selected, U -- Unselected, I -- Individual

Port: A -- Auto port, M -- Management port, R -- Reference port

Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,

D -- Synchronization, E -- Collecting, F -- Distributing,

G -- Defaulted, H -- Expired

Aggregate Interface: Bridge-Aggregation1

Aggregation Mode: Static

Loadsharing Type: Shar

Management VLANs: None

Port Status Priority Oper-Key

GE2/0/1 U 32768 1
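A simplified model illustrates the effect of the selected-port limits. The Python sketch below follows this guide's description. Ports are taken in ascending port-ID order up to the maximum. If fewer ports than the minimum remain, none are selected and the aggregate interface stays down. The sketch ignores other selection criteria, such as port priority, that the device might apply, and the function names are hypothetical.

```python
import re


def port_sort_key(name):
    """Sort GE2/0/10 after GE2/0/2 by comparing the numeric parts of the name."""
    return tuple(int(n) for n in re.findall(r"\d+", name))


def select_ports(eligible, minimum=None, maximum=None):
    """Simplified selected-port model (assumption, not the device algorithm).

    eligible: member ports that are up and otherwise selectable.
    Returns (selected_ports, aggregate_interface_up).
    """
    ordered = sorted(eligible, key=port_sort_key)
    if maximum is not None:
        ordered = ordered[:maximum]  # excess ports stay unselected
    if minimum is not None and len(ordered) < minimum:
        ordered = []  # below the minimum: every member port stays unselected
    return ordered, bool(ordered)
```

With the configuration above (minimum 2, one member port), select_ports(["GE2/0/1"], minimum=2) selects nothing, which matches the U status of GE2/0/1.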

6. Check whether selected member ports exist in the aggregation group.

If no selected member port exists in the aggregation group, see "Unselection of aggregation member ports." If selected member ports exist in the aggregation group, proceed to step 7.

7. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Uneven traffic load sharing on an aggregate interface

Symptom

When two devices are connected through a link aggregation, output from the display counters rate command shows that some member ports have extremely low rates or a rate of 0 in the outbound direction.

Common causes

The common cause is an aggregation load sharing mode that does not match the traffic.

Troubleshooting flow

To resolve this issue, identify the characteristics of the packets forwarded by the aggregate interface and check whether the aggregation load sharing mode matches them.

Figure 32 shows the troubleshooting flowchart.

Figure 32 Flowchart for troubleshooting uneven traffic load sharing on an aggregate interface

Solution

1. Check whether the user service traffic is normal.

If the user service traffic is normal, wait for a while and then execute the display counters rate command to check the outbound traffic rate of the aggregation member ports. Check whether the traffic load sharing of the aggregation member ports has been restored.

○ If load sharing has been restored, no action is required.

○ If load sharing is not restored, proceed to step 2.

If the user service traffic is abnormal, proceed to step 2.

2. Check whether the aggregation load sharing mode matches the packet characteristics.

Execute the display link-aggregation load-sharing mode command to check the aggregation load sharing mode. If it does not match the packet characteristics, adjust it by using one of the following commands:

○ Execute the link-aggregation global load-sharing mode command in system view to adjust the global load sharing mode.

○ Execute the link-aggregation load-sharing mode command in aggregate interface view to adjust the load sharing mode of the aggregate interface.

By default, the device performs load balancing based on source and destination IP addresses.

If the aggregation load sharing mode matches the characteristics of the packets, proceed to step 3.
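Uneven load sharing usually occurs because traffic is hashed per flow. All packets that share the same values in the hashed fields go to the same member port. The Python sketch below uses CRC32 only as a stand-in for the device's hash algorithm, which this guide does not document. The flow field names are hypothetical.

The sketch shows what happens when all traffic is between one pair of IP addresses. Hashing on source and destination IP addresses (the default mode described above) places every flow on one member port. Adding port numbers to the hash spreads the flows.

```python
import zlib
from collections import Counter


def pick_member(flow, fields, members):
    """Map a flow to one member port by hashing the selected header fields.

    CRC32 is a stand-in for the device's real hash function (assumption).
    """
    key = "|".join(str(flow[field]) for field in fields).encode()
    return members[zlib.crc32(key) % len(members)]


def load_distribution(flows, fields, members):
    """Count how many flows land on each member port."""
    return Counter(pick_member(flow, fields, members) for flow in flows)


members = ["GE2/0/1", "GE2/0/2", "GE2/0/3"]
# 100 hypothetical flows between the same two hosts, differing only by source port.
flows = [{"src_ip": "10.1.1.1", "dst_ip": "10.2.2.2",
          "src_port": 1024 + i, "dst_port": 80} for i in range(100)]
ip_only = load_distribution(flows, ("src_ip", "dst_ip"), members)
with_ports = load_distribution(flows, ("src_ip", "dst_ip", "src_port", "dst_port"), members)
```

Here ip_only contains a single member port carrying all 100 flows, which is the symptom described in this section.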

3. Check whether cross-module or cross-chassis aggregation has been deployed.

If cross-module or cross-chassis aggregation exists on an IRF fabric, you can execute the undo link-aggregation load-sharing mode local-first command in system view to disable local-first load sharing. Traffic is then distributed across all member ports. However, disabling local-first load sharing increases cross-module and cross-chassis traffic, and excessive cross-module or cross-chassis traffic might affect the stability of the IRF fabric. Perform this operation only as the actual situation permits.

If cross-module or cross-chassis aggregation is not deployed, proceed to step 4.

4. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Unselection of aggregation member ports

Symptom

When two devices are connected via link aggregation, the member ports of the aggregation group are in unselected state and the aggregation fails.

Common causes

The following are the common causes for this type of issue:

· Link connectivity fault.

· The operational key and attribute configurations are inconsistent between the local end and the peer end.

· The number of member ports does not meet the minimum or maximum number of selected ports configured for the aggregation group.

Troubleshooting flow

To resolve this issue:

1. Identify whether the member ports are up and eliminate physical faults on the port.

2. Use the debugging link-aggregation lacp packet command to view the LACP interaction on member ports of the dynamic aggregation group.

3. Check the local and peer aggregate interface configurations to eliminate configuration faults.

Figure 33 shows the troubleshooting flowchart.

Figure 33 Flowchart for troubleshooting unselection of aggregation member ports

Solution

1. Identify whether the physical connections are correct.

Check the links against the network plan for the aggregate interface, and verify that the physical connections are as planned.

If the physical connections are correct, proceed to step 2.

2. Check whether the member ports in the aggregation group are up.

Use the display interface command to identify whether the member ports in the aggregation group are up. If they are not up, follow the troubleshooting procedure for a down interface.

If the member ports are up, proceed to step 3.

3. Check whether the attribute configuration of the local member ports is the same as that of the aggregate interface.

a. Execute the display link-aggregation verbose command to view the unselected member ports on the local end.

Using a Layer 2 aggregate interface as an example, U in the Status field indicates that the member port is unselected.

<Sysname> display link-aggregation verbose

Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing

Port Status: S -- Selected, U -- Unselected, I -- Individual

Port: A -- Auto port, M -- Management port, R -- Reference port

Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,

D -- Synchronization, E -- Collecting, F -- Distributing,

G -- Defaulted, H -- Expired

Aggregate Interface: Bridge-Aggregation1

Creation Mode: Manual

Aggregation Mode: Dynamic

Loadsharing Type: Shar

Management VLANs: None

System ID: 0x8000, 2a41-21c1-0100

Local:

Port Status Priority Index Oper-Key Flag

GE2/0/1(R) S 32768 1 1 {ACDEF}

GE2/0/2 S 32768 2 1 {ACDEF}

GE2/0/3 U 32768 3 2 {AC}

Remote:

Actor Priority Index Oper-Key SystemID Flag

GE2/0/1 32768 1 1 0x8000, 36f6-c0aa-0200 {ACDEF}

GE2/0/2 32768 2 1 0x8000, 36f6-c0aa-0200 {ACDEF}

GE2/0/3 32768 3 1 0x8000, 36f6-c0aa-0200 {AC}

b. Execute the display current-configuration interface command to check whether the attribute configuration (such as VLAN settings) of the unselected local member port is the same as that of the aggregate interface. If not, modify the attribute configuration to make them consistent.

For example, the member port GigabitEthernet 2/0/3 is in unselected state, and its attribute configuration is different from that of the reference port GigabitEthernet2/0/1. This difference prevents the member port from being selected. You must modify the attribute configuration of GigabitEthernet 2/0/3.

<Sysname> display current-configuration interface gigabitethernet 2/0/1

interface GigabitEthernet2/0/1

port link-mode bridge

port link-type trunk

port trunk permit vlan 1 to 20

port link-aggregation group 1

return

<Sysname> display current-configuration interface bridge-aggregation 1

interface Bridge-Aggregation1

port link-type trunk

port trunk permit vlan 1 to 100

link-aggregation mode dynamic

return

If the attribute configuration of the local member port is the same as the aggregate interface, proceed to step 4.
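You can do the VLAN comparison in step 3b mechanically. This Python sketch handles only the port link-type line and numeric port trunk permit vlan lists (such as 1 to 20), which covers the example above. It is a hypothetical helper, not a complete parser of the device configuration syntax.

```python
def trunk_attributes(config):
    """Extract link type and permitted VLANs from display current-configuration output.

    Handles only "port link-type <type>" and numeric "port trunk permit vlan"
    lists such as "1 to 20" or "5 7 10 to 12"; other keywords are skipped.
    """
    link_type = None
    vlans = set()
    for raw in config.splitlines():
        tokens = raw.split()
        if tokens[:2] == ["port", "link-type"] and len(tokens) == 3:
            link_type = tokens[2]
        elif tokens[:4] == ["port", "trunk", "permit", "vlan"]:
            spec = tokens[4:]
            i = 0
            while i < len(spec):
                if not spec[i].isdigit():
                    i += 1  # keywords such as "all" are not handled in this sketch
                    continue
                start = int(spec[i])
                if i + 2 < len(spec) and spec[i + 1] == "to":
                    vlans.update(range(start, int(spec[i + 2]) + 1))
                    i += 3
                else:
                    vlans.add(start)
                    i += 1
    return {"link-type": link_type, "permitted-vlans": vlans}


def attribute_differences(member_config, aggregate_config):
    """Return the names of attributes that differ between a member port and the aggregate interface."""
    member = trunk_attributes(member_config)
    aggregate = trunk_attributes(aggregate_config)
    return sorted(name for name in member if member[name] != aggregate[name])
```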

4. Check whether the operational key of the member ports on the local end is the same as the reference port.

a. Execute the display link-aggregation verbose command to view the unselected member ports on the local end.

Using a Layer 2 aggregate interface as an example, U in the Status field indicates that the member port is unselected:

<Sysname> display link-aggregation verbose

Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing

Port Status: S -- Selected, U -- Unselected, I -- Individual

Port: A -- Auto port, M -- Management port, R -- Reference port

Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,

D -- Synchronization, E -- Collecting, F -- Distributing,

G -- Defaulted, H -- Expired

Aggregate Interface: Bridge-Aggregation11

Creation Mode: Manual

Aggregation Mode: Dynamic

Loadsharing Type: Shar

Management VLANs: None

System ID: 0x8000, 2a41-21c1-0100

Local:

Port Status Priority Index Oper-Key Flag

GE2/0/1(R) S 32768 1 1 {ACDEF}

GE2/0/2 S 32768 2 1 {ACDEF}

GE2/0/3 U 32768 3 2 {AC}

Remote:

Actor Priority Index Oper-Key SystemID Flag

GE2/0/1 32768 1 1 0x8000, 36f6-c0aa-0200 {ACDEF}

GE2/0/2 32768 2 1 0x8000, 36f6-c0aa-0200 {ACDEF}

GE2/0/3 32768 3 1 0x8000, 36f6-c0aa-0200 {AC}

b. Execute the display current-configuration interface command to check whether the operational key of the unselected local member port is the same as that of the reference port. The operational key depends on settings that include the port speed and duplex mode. If the keys differ, modify the configuration to make them consistent.

For example, the operational key of the unselected member port GigabitEthernet 2/0/3 is different from that of the reference port GigabitEthernet 2/0/1 because GigabitEthernet 2/0/3 is configured with speed 100. As a result, the member port cannot be selected. Modify the speed configuration of GigabitEthernet 2/0/3.

<Sysname> display current-configuration interface GigabitEthernet 2/0/1

interface GigabitEthernet2/0/1

port link-mode bridge

combo enable fiber

port link-aggregation group 11

return

<Sysname> display current-configuration interface GigabitEthernet 2/0/3

interface GigabitEthernet2/0/3

port link-mode bridge

combo enable fiber

speed 100

port link-aggregation group 11

return

If the operational key of the local member port is the same as the reference port, proceed to step 5.
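You can automate the Oper-Key comparison in step 4a by reading the Local table of display link-aggregation verbose output. The Python sketch below assumes the dynamic-mode column layout shown above (Port, Status, Priority, Index, Oper-Key, Flag) and a reference port marked with (R). The function name is hypothetical.

```python
import re

# "Local:" rows: Port[(R)] Status Priority Index Oper-Key {Flags}
LOCAL_ROW_RE = re.compile(
    r"^(?P<port>\S+?)(?P<ref>\(R\))?\s+(?P<status>[SUI])\s+\d+\s+\d+\s+(?P<key>\d+)\s+\{"
)


def oper_key_mismatches(verbose_output):
    """Return local member ports whose Oper-Key differs from the reference port (R)."""
    rows = []
    for raw in verbose_output.splitlines():
        line = raw.strip()
        if line.startswith("Remote:"):
            break  # only the local table is relevant here
        match = LOCAL_ROW_RE.match(line)
        if match:
            rows.append(match.groupdict())
    reference = next((row for row in rows if row["ref"]), None)
    if reference is None:
        return []  # no reference port found in the output
    return [row["port"] for row in rows if row["key"] != reference["key"]]
```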

5. Check whether the local aggregate interface is in dynamic mode.

If it is in dynamic mode, proceed to step 6. If it is in static mode, proceed to step 8.

6. Check whether LACP packets are sent and received correctly.

Execute the debugging link-aggregation lacp packet command to identify whether LACP packets are sent and received correctly. Compare the Actor field of the packets sent by the member port with the Partner field of the packets received by the member port. If the sys-mac, key, and port-index values are different, LACP packet exchange is abnormal. In this case, check whether the receive or transmit fiber is disconnected. If these values are the same, LACP packet exchange is normal. Proceed to step 7.

For example, enable LACP packet debugging for member port GigabitEthernet 2/0/1 and observe the LACP packets sent and received on this port.

<Sysname> debugging link-aggregation lacp packet all interface gigabitethernet 2/0/1

*Nov 2 15:51:21:15 2021 Sysname LAGG/7/Packet: PACKET.GigabitEthernet2/0/1.send.

size=110, subtype =1, version=1

Actor: type=1, len=20, sys-pri=0x8000, sys-mac=00e0-fc02-0300, key=0x1, pri=0x8000, port-index=0x2, state=0xc5

Partner: type=2, len=20, sys-pri=0x0, sys-mac=0000-0000-0000, key=0x0, pri=0x0, port-index=0x0, state=0x32

Collector: type=3, len=16, col-max-delay=0x0

Terminator: type=0, len=0

*Nov 2 15:55:21:15 2021 Sysname LAGG/7/Packet: PACKET.GigabitEthernet2/0/1.receive.

size=110, subtype =1, version=1

Actor: type=1, len=20, sys-pri=0x8000, sys-mac=00e0-fc00-0000, key=0x1, pri=0x8000, port-index=0x6, state=0xd

Partner: type=2, len=20, sys-pri=0x8000, sys-mac=00e0-fc02-0300, key=0x1, pri=0x8000, port-index=0x2, state=0xc5

Collector: type=3, len=16, col-max-delay=0x0

Terminator: type=0, len=0
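The state values in the debugging output are bit fields. The Python sketch below decodes them into the A through H flag letters. It assumes the standard IEEE 802.1AX bit order, which matches the flag legend of display link-aggregation verbose. For example, the {ACDEF} flags of a selected port correspond to state=0x3d. The function name is hypothetical.

```python
# Bit positions of the LACP actor/partner state octet (IEEE 802.1AX order),
# mapped to the flag letters in the display link-aggregation verbose legend.
STATE_FLAGS = [
    ("A", "LACP_Activity"),
    ("B", "LACP_Timeout"),
    ("C", "Aggregation"),
    ("D", "Synchronization"),
    ("E", "Collecting"),
    ("F", "Distributing"),
    ("G", "Defaulted"),
    ("H", "Expired"),
]


def decode_lacp_state(value):
    """Turn a debug value such as 'state=0xc5' or '0xc5' into flag letters."""
    bits = int(value.split("=")[-1], 16)
    return "".join(letter for bit, (letter, _name) in enumerate(STATE_FLAGS)
                   if bits & (1 << bit))
```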

7. Check whether the operational key and attribute configuration of the peer port for the local member port are the same as the peer port for the reference port.

On the peer device, execute the display current-configuration interface command for the port connected to the local unselected port. Identify whether the operational key and attribute configuration of this peer port are the same as those of the peer port connected to the local reference port. If not, modify the configuration to make them consistent.

If the operational key and attribute configuration of the peer port for the local member port are the same as those of the peer port for the reference port, proceed to step 8.

8. Check whether the number of aggregation member ports reaches the upper limit.

○ The number of member ports exceeds the maximum number of selected ports.

The link-aggregation selected-port maximum command in aggregate interface view sets the maximum number of selected ports in the aggregation group. Execute the display link-aggregation verbose command to identify whether the number of member ports in the aggregation group exceeds this limit. If it does, the excess ports are placed in unselected state. Ports are selected in ascending order of port ID. To have a desired member port selected, execute the undo port link-aggregation group command in member port view to remove unneeded selected ports from the aggregation group.

○ The number of member ports is below the minimum number of selected ports.

The link-aggregation selected-port minimum command in aggregate interface view sets the minimum number of selected ports in the aggregation group. Execute the display link-aggregation verbose command to check whether the number of member ports in the aggregation group is below this limit. If it is, all member ports are in unselected state. Execute the link-aggregation selected-port minimum command to lower the minimum, or add member ports to the aggregation group so that the minimum is met.

If the number of aggregation member ports has not reached the limit of the aggregation group, proceed to step 9.

9. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Spanning tree issues

Service interruption caused by a loop

Symptom

Services are interrupted when multiple devices are connected into a loop through physical links.

Common causes

The following are the common causes of this type of issue:

· The physical state of the device interfaces is down.

· The spanning tree feature is disabled on the device.

Troubleshooting flow

Figure 34 shows the troubleshooting flowchart.

Figure 34 Flowchart for troubleshooting service interruption caused by a loop

Solution

1. Identify whether the state of the interfaces forwarding service traffic is up.

a. Identify whether the physical state of the interfaces is up.

Execute the display interface brief command and check the Link field to identify whether the physical state of the interfaces is up.

<Sysname> display interface brief

Brief information on interfaces in route mode:

Link: ADM - administratively down; Stby - standby

Protocol: (s) - spoofing

Interface Link Protocol Primary IP Description

InLoop0 UP UP(s) --

MGE0/0/0 DOWN DOWN --

NULL0 UP UP(s) --

REG0 UP -- --

Brief information on interfaces in bridge mode:

Link: ADM - administratively down; Stby - standby

Speed: (a) - auto

Duplex: (a)/A - auto; H - half; F - full

Type: A - access; T - trunk; H - hybrid

Interface Link Speed Duplex Type PVID Description

GE2/0/1 UP auto A A 1

GE2/0/2 DOWN auto A A 1

GE2/0/3 ADM auto A A 1

- If the state of the interfaces is up, proceed to step b.

- If the state of an interface is ADM, execute the undo shutdown command in interface view to activate this interface. If the state of the interface remains down, check the interface link and related configurations. If the state of the interface is up and the issue persists, proceed to step b.

- If the state of an interface is down, troubleshoot the interface link and related configurations. If the state of the interface is up and the issue persists, proceed to step b.

b. Identify whether the data link layer (DDL) protocol state of the interface is up. An interface whose DDL protocol is down cannot participate in spanning tree topology calculation.

Execute the display interface command and check whether the DDL protocol state of the interface is up by examining the Line protocol state field.

<Sysname> display interface gigabitethernet 2/0/2

GigabitEthernet2/0/2

Current state: UP

Line protocol state: DOWN(LAGG)

...

DOWN(protocols) indicates that the DDL of the interface is shut down by one or more protocol modules. The protocols argument can be any combination of the following protocols:

- DLDP—The DDL of the interface is shut down because the DLDP module detects a unidirectional communication.

- OAM—The interface's data link layer was disabled because the Ethernet OAM module detected a remote link failure.

- LAGG—The DDL of the interface is shut down because there are no selected member ports for the aggregate interface.

- BFD—The DDL of the interface is shut down because the BFD module detects a link fault.

- VBP—The DDL of the interface is shut down because Layer 2 forwarding is configured.

If the DDL of the interface is shut down by the above protocols, review and adjust the configuration of these modules to restore the DDL protocol state of the interface to up. If the issue persists, proceed to step 2.
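The protocol modules named in the Line protocol state field can be extracted and explained with a small parser. The following Python sketch is a hypothetical helper. The cause descriptions come from the list in this step. How several modules are separated inside DOWN(...) is an assumption, so verify it against real device output.

```python
import re

# Causes as listed in this step for "Line protocol state: DOWN(protocols)".
MODULE_CAUSES = {
    "DLDP": "DLDP detected a unidirectional link",
    "OAM": "Ethernet OAM detected a remote link failure",
    "LAGG": "the aggregate interface has no selected member ports",
    "BFD": "BFD detected a link fault",
    "VBP": "Layer 2 forwarding is configured",
}

LINE_PROTOCOL = re.compile(r"Line protocol state:\s*(\w+)(?:\(([^)]*)\))?")


def shutdown_modules(display_interface_output: str) -> list[str]:
    """Return the modules named in DOWN(protocols); [] if none are named."""
    m = LINE_PROTOCOL.search(display_interface_output)
    if m is None or m.group(1) != "DOWN" or not m.group(2):
        return []
    # Assumption: several modules are separated by commas or other
    # non-alphanumeric characters; check this against real device output.
    return [p for p in re.split(r"[^A-Za-z0-9]+", m.group(2)) if p]


def explain(modules: list[str]) -> list[str]:
    return [f"{p}: {MODULE_CAUSES.get(p, 'unknown module')}" for p in modules]
```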

2. Identify whether the spanning tree feature on the devices is enabled.

a. Check whether the global spanning tree feature is enabled on the devices.

Execute the display stp command.

- If either of the following outputs appears, the spanning tree feature is not enabled globally:

<Sysname> display stp

Protocol status : Disabled

Protocol Std. : IEEE 802.1s

Version : 3

Bridge-Prio. : 32768

MAC address : 2eae-3769-0200

Max age(s) : 20

Forward delay(s) : 15

Hello time(s) : 2

Max hops : 20

TC Snooping : Disabled

<Sysname> display stp

STP is not configured.

Execute the stp global enable command in system view to enable the spanning tree feature globally.

- If the state and statistical information of the spanning tree appear as shown below, the global spanning tree feature is enabled. Proceed to step b.

<Sysname> display stp

-------[CIST Global Info][Mode MSTP]-------

Bridge ID : 32768.2eae-3769-0200

Bridge times : Hello 2s MaxAge 20s FwdDelay 15s MaxHops 20

Root ID/ERPC : 32768.2eae-3769-0200, 0

RegRoot ID/IRPC : 32768.2eae-3769-0200, 0

RootPort ID : 0.0

BPDU-Protection : Disabled

Bridge Config-

Digest-Snooping : Disabled

TC or TCN received : 0

Time since last TC : 0 days 2h:49m:11s

----[Port1(GigabitEthernet2/0/1)][DOWN]----

Port protocol : Enabled

Port role : Disabled Port

Port ID : 128.54

Port cost(Legacy) : Config=auto, Active=200000

Desg.bridge/port : 32768.2eae-3769-0200, 128.54

Port edged : Config=disabled, Active=disabled

Point-to-Point : Config=auto, Active=false

Transmit limit : 10 packets/hello-time

TC-Restriction : Disabled

Role-Restriction : Disabled

Protection type : Config=none, Active=none

MST BPDU format : Config=auto, Active=802.1s

Port Config-

Digest-Snooping : Disabled

Rapid transition : False

Num of VLANs mapped : 1

Port times : Hello 2s MaxAge 20s FwdDelay 15s MsgAge 0s RemHops 20

BPDU sent : 0

TCN: 0, Config: 0, RST: 0, MST: 0

BPDU received : 0

TCN: 0, Config: 0, RST: 0, MST: 0

b. Check whether the spanning tree feature is enabled for VLANs. (Only applicable when the spanning tree mode is PVST. Proceed to step c for any other spanning tree mode.)

Execute the display this command in system view to check whether the undo stp vlan enable command exists.

[Sysname] display this

...

undo stp vlan 2 enable

stp mode pvst

stp global enable

...

If the above configuration exists and the network requires enabling the spanning tree feature for the VLANs, execute the stp vlan enable command in system view to enable the spanning tree feature on the VLANs.
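To check this step, the system-view configuration can be scanned for undo stp vlan lines. The following Python sketch uses hypothetical helpers that parse the display this output shown above. It produces re-enable commands only in PVST mode. Whether a VLAN should actually be re-enabled remains a network design decision.

```python
import re

# Hypothetical helper: find VLANs whose spanning tree was disabled with
# "undo stp vlan ... enable" in the "display this" output of system view.
UNDO_STP_VLAN = re.compile(r"^\s*undo stp vlan (.+?) enable\s*$")


def is_pvst(config: str) -> bool:
    return any(line.strip() == "stp mode pvst" for line in config.splitlines())


def disabled_pvst_vlans(config: str) -> list[str]:
    """Return the VLAN list text of every 'undo stp vlan ... enable' line."""
    return [m.group(1) for line in config.splitlines()
            if (m := UNDO_STP_VLAN.match(line))]


def reenable_commands(config: str) -> list[str]:
    """System-view commands that re-enable spanning tree on those VLANs."""
    if not is_pvst(config):
        return []  # This check applies only in PVST mode.
    return [f"stp vlan {vlans} enable" for vlans in disabled_pvst_vlans(config)]
```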

c. Identify whether the spanning tree feature is enabled on the interfaces.

Execute the display stp command to identify whether the spanning tree feature is disabled on any interface. A Port protocol value of Disabled indicates that the feature is disabled on that port.

<Sysname> display stp

...

----[Port2(GigabitEthernet2/0/2)][DISABLED]----

Port protocol : Disabled

...

Execute the stp enable command in interface view to activate the spanning tree feature on the interfaces participating in the spanning tree calculations.
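Steps 2a and 2c both read the display stp output, so they can be combined in one check. The following Python sketch is hypothetical. It assumes the port header format ----[PortN(name)][state]---- and the field names shown in the sample outputs above. It lists the commands from this procedure that would restore spanning tree, starting in system view.

```python
import re

# Hypothetical helper: check "display stp" output for the two conditions in
# steps 2a and 2c (global spanning tree state and per-port Port protocol).
PORT_HEADER = re.compile(r"^-+\[Port\d+\(([^)]+)\)\]\[[^\]]+\]-+$")


def stp_globally_enabled(output: str) -> bool:
    if "STP is not configured" in output:
        return False
    m = re.search(r"Protocol status\s*:\s*(\w+)", output)
    return not (m and m.group(1) == "Disabled")


def ports_with_stp_disabled(output: str) -> list[str]:
    disabled, current = [], None
    for raw in output.splitlines():
        line = raw.strip()
        header = PORT_HEADER.match(line)
        if header:
            current = header.group(1)
        elif current and line.startswith("Port protocol"):
            if line.split(":", 1)[1].strip() == "Disabled":
                disabled.append(current)
            current = None  # Only the first Port protocol line per port matters.
    return disabled


def fix_commands(output: str) -> list[str]:
    """Commands from this procedure, starting in system view."""
    commands = [] if stp_globally_enabled(output) else ["stp global enable"]
    for port in ports_with_stp_disabled(output):
        commands += [f"interface {port}", "stp enable", "quit"]
    return commands
```

After spanning tree is enabled globally, run display stp again because per-port information appears only when the feature is enabled.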

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

User endpoint disconnection in the spanning tree network

Symptom

When a user endpoint is connected to the spanning tree network, the interface connecting the endpoint experiences transient disconnections, causing continuous packet loss and endpoint disconnection.

Common causes

The interface connected to the user endpoint device is not configured as an edge port.

Troubleshooting flow

Figure 35 shows the troubleshooting flowchart.

Figure 35 Flowchart for troubleshooting user endpoint disconnection in the spanning tree network

Solution

1. Check whether the interfaces directly connected to the user endpoint are edge ports in the spanning tree network.

Execute the display stp command on the device directly connected to the user endpoint and check the Port edged field to identify whether the interface connected to the endpoint is an edge port. Active=enabled indicates that the port is operating as an edge port.

<Sysname> display stp

...

----[Port2(GigabitEthernet2/0/1)][FORWARDING]----

Port protocol : Enabled

Port role : Designated Port

Port ID : 128.2

Port cost(Legacy) : Config=auto, Active=20

Desg.bridge/port : 32768.2eae-3769-0200, 128.2

Port edged : Config=enabled, Active=enabled

Point-to-Point : Config=auto, Active=true

Transmit limit : 10 packets/hello-time

Protection type : Config=none, Active=none

Rapid transition : True

Port times : Hello 2s MaxAge 20s FwdDelay 15s MsgAge 0s

...

¡ If yes, proceed to step 2.

¡ If not, execute the stp edged-port command in interface view to configure this port as an edge port.

IMPORTANT:

The edge port and loop guard features cannot be configured simultaneously on an interface. If the device outputs the following error prompt when you execute the stp edged-port command, the interface has the loop guard feature configured. In this case, you must execute the undo stp loop-protection command to disable the loop guard feature before you execute the stp edged-port command.

Failed to enable edged-port on GigabitEthernet2/0/1, because loop-protection is enabled.
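The edge port check and the required command order can be sketched together. In the following Python sketch, the helpers are hypothetical and the Port edged pattern assumes the field layout of the sample output above. When loop guard is configured, the sketch removes loop guard before it configures the edge port, as the IMPORTANT note requires.

```python
import re

# Hypothetical helpers for step 1. PORT_EDGED assumes the field layout of the
# sample "display stp" output above.
PORT_EDGED = re.compile(r"Port edged\s*:\s*Config=(\w+),\s*Active=(\w+)")


def edge_port_active(port_block: str) -> bool:
    """True if the Active value of the Port edged field is enabled."""
    m = PORT_EDGED.search(port_block)
    if m is None:
        raise ValueError("no 'Port edged' field found")
    return m.group(2) == "enabled"


def edge_port_commands(interface: str, loop_protection_enabled: bool) -> list[str]:
    """Commands, entered from system view, that make interface an edge port."""
    commands = [f"interface {interface}"]
    if loop_protection_enabled:
        # Edge port and loop guard are mutually exclusive on one interface,
        # so loop guard must be removed first.
        commands.append("undo stp loop-protection")
    commands.append("stp edged-port")
    return commands
```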

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

STP/6/STP_DETECTED_TC

Unchangeable master port in an MSTI other than MSTI 0

Symptom

In an MSTP network, for instances other than MSTI 0 on the device, ports that should not have the master role are calculated as master ports. The master port roles cannot be changed by adjusting parameters such as priority and cost values.

Common causes

In an MST region, devices have inconsistent MST region configurations.

Troubleshooting flow

If two devices have inconsistent MST region configurations, each device considers the other to be in a different MST region. As a result, the port connecting the two devices becomes a region boundary port and might be calculated as the master port. To resolve this issue, make sure all devices in the same MST region have consistent MST region configurations.

Figure 36 shows the troubleshooting flowchart.

Figure 36 Flowchart for troubleshooting unchangeable master port in an MSTI other than MSTI 0

Solution

1. Make sure that devices in the same MST region have the same region name, revision level, and VLAN mapping table configuration for the MST region.

Execute the display stp region-configuration command to view the effective MST region configuration of the devices.

<Sysname> display stp region-configuration

Oper Configuration

Format selector : 0

Region name : hello

Revision level : 0

Configuration digest : 0x5f762d9a46311effb7a488a3267fca9f

Instance VLANs Mapped

0 21 to 4094

1 1 to 10

2 11 to 20

¡ Region name—The region name of the MST region. Execute the stp region-configuration command in system view to enter MST region view, and configure the region name with the region-name command.

¡ Revision level—Revision level of the MST region. Execute the stp region-configuration command in system view to enter MST region view, and configure the revision level with the revision-level command.

¡ Instance VLANs Mapped—VLAN mapping relationships of the MST region. Execute the stp region-configuration command in system view to enter MST region view, and configure VLAN mapping relationships with the instance command or the vlan-mapping modulo command.

If these parameters are inconsistent on devices within the same MST region, modify them to be consistent. After configuring the MST region parameters, execute the active region-configuration command in MST region view to activate the new MST region configuration. Otherwise, the previous configuration remains in effect.
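The three elements compared in this step determine region membership. The following Python sketch models that comparison. It assumes the device follows the IEEE 802.1Q rules for the MST configuration identifier. Under those rules, the VLAN mapping table is reduced to a 16-byte HMAC-MD5 digest, which corresponds to the Configuration digest field shown above, so a single differing VLAN mapping is enough to split the region. The helper names are hypothetical. The digest key is the one defined in IEEE 802.1Q.

```python
import hashlib
import hmac
from dataclasses import dataclass, field

# Fixed HMAC-MD5 key for the MST configuration digest (IEEE 802.1Q, Table 13-1).
MST_DIGEST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")


def mst_config_digest(vlan_to_msti: dict[int, int]) -> bytes:
    """HMAC-MD5 over 4096 two-octet MSTIDs (VID 0..4095, big-endian).

    VLANs absent from the mapping belong to MSTI 0; the elements for
    VID 0 and VID 4095 are always encoded as 0.
    """
    table = bytearray()
    for vid in range(4096):
        msti = vlan_to_msti.get(vid, 0) if 1 <= vid <= 4094 else 0
        table += msti.to_bytes(2, "big")
    return hmac.new(MST_DIGEST_KEY, bytes(table), hashlib.md5).digest()


def map_vlans(*ranges: tuple[int, int, int]) -> dict[int, int]:
    """Build a mapping from (first_vlan, last_vlan, msti) ranges."""
    mapping = {}
    for first, last, msti in ranges:
        for vid in range(first, last + 1):
            mapping[vid] = msti
    return mapping


@dataclass
class RegionConfig:
    name: str
    revision: int
    vlan_to_msti: dict[int, int] = field(default_factory=dict)


def same_region(a: RegionConfig, b: RegionConfig) -> bool:
    """Region name, revision level, and mapping digest must all match."""
    return (a.name == b.name
            and a.revision == b.revision
            and mst_config_digest(a.vlan_to_msti) == mst_config_digest(b.vlan_to_msti))
```

For example, RegionConfig("hello", 0, map_vlans((1, 10, 1), (11, 20, 2))) models the sample region above. Printing "0x" + mst_config_digest(...).hex() gives a value in the same form as the Configuration digest field, which you can compare across devices.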

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Spanning tree link flapping

Symptom

The network topology changes frequently because the spanning tree root bridge, port roles, and port states change constantly.

Common causes

The following are the common causes of this type of issue:

· Link flapping: The properties of a certain port's link, such as the state, rate, and duplex mode, change frequently.

· Node fault: A device on the network has high CPU usage and cannot process spanning tree packets in a timely manner, or a device reboots repeatedly, causing the spanning tree to be recalculated constantly.

· Network failures:

¡ Packet congestion leads to BPDU loss.

¡ A device unexpectedly receives BPDUs from another network, triggering a recalculation of the spanning tree in the current network.

¡ Other features of the device cause BPDUs to be discarded incorrectly.

Troubleshooting flow

Figure 37 shows the troubleshooting flowchart.

Figure 37 Flowchart for troubleshooting spanning tree link flapping

Solution

1. Identify whether any device in the spanning tree network is experiencing high CPU usage, repeated reboots, or interface link state changes.

Based on the network deployment, use the controller, device management platform, or user interface to check whether the devices are experiencing high CPU usage, repeated reboots, or interface link state changes.

If the devices and links are stable but the issue persists, proceed to step 2.

2. Check whether the root bridge of the spanning tree network has changed.

In the spanning tree network, execute the display stp root command to view the root bridge in the current spanning tree network.

<Sysname> display stp root

MST ID Root Bridge ID ExtPathCost IntPathCost Root Port

0 32768.14e3-19d3-0100 0 40 GE2/0/2

10 0.14e3-19d3-0100 0 40 GE2/0/2

20 0.14e3-1f59-0200 0 0

The Root Bridge ID field indicates the ID of the root bridge in the spanning tree network. The format of the root bridge ID is priority.bridge MAC address. Use this field to determine whether the root bridge in the spanning tree network is the desired device. If the root bridge device is correct but the spanning tree network still keeps flapping, proceed to step 3. If the root bridge device is not the desired one, you can modify the root bridge as follows:

¡ Change the priority of the desired device. The device priority participates in the spanning tree calculation, and a smaller value indicates a higher priority. Execute the stp priority command to set the priority of the desired device to a value smaller than that of any other device (for example, 0), so that the device becomes the spanning tree root bridge.

¡ Execute the stp root primary command on the desired device to configure this device as the root bridge of the spanning tree.
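Root bridge selection follows the standard spanning tree rule: the bridge ID with the lowest priority wins, and the lowest MAC address breaks a tie. The following Python sketch applies that rule to bridge IDs in the priority.MAC format shown by display stp root. It also compares two snapshots of that output to list the MSTIs whose root bridge changed. The helper names are hypothetical, and the parser assumes the column layout in the sample output above.

```python
import re

BRIDGE_ID = re.compile(r"^(\d+)\.([0-9a-fA-F]{4})-([0-9a-fA-F]{4})-([0-9a-fA-F]{4})$")


def parse_bridge_id(bridge_id: str) -> tuple[int, int]:
    """Split 'priority.MAC' into (priority, MAC as an integer) for comparison."""
    m = BRIDGE_ID.match(bridge_id)
    if m is None:
        raise ValueError(f"not a bridge ID: {bridge_id!r}")
    return int(m.group(1)), int("".join(m.group(2, 3, 4)), 16)


def elect_root(bridge_ids: list[str]) -> str:
    """Lowest priority wins; the lowest MAC address breaks ties."""
    return min(bridge_ids, key=parse_bridge_id)


def parse_stp_root(output: str) -> dict[int, str]:
    """Map MSTI -> root bridge ID from 'display stp root' output."""
    roots = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit():  # Skips the header line.
            roots[int(fields[0])] = fields[1]
    return roots


def root_changes(before: str, after: str) -> dict[int, tuple]:
    """MSTIs whose root bridge differs between two snapshots: (old, new)."""
    old, new = parse_stp_root(before), parse_stp_root(after)
    return {msti: (old.get(msti), new.get(msti))
            for msti in sorted(old.keys() | new.keys())
            if old.get(msti) != new.get(msti)}
```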

After you configure the desired device as the root bridge, use the following features to keep the root bridge and network topology stable:

¡ Enable root guard.

After you execute the stp root-protection command in interface view, the interface can act only as a designated port in all MSTIs. When the interface receives a superior BPDU for an MSTI, it immediately transitions to the listening state and stops forwarding packets, which is equivalent to disconnecting the interface. If the interface does not receive a superior BPDU within twice the forward delay (the forward delay is 15 seconds by default), it returns to its normal state. Root guard prevents unexpected spanning tree topology changes caused by misconfiguration or malicious attacks.

¡ Configure the edge port and BPDU guard.

For access layer devices, access ports are usually directly connected to user endpoints (such as PCs) or file servers. These access ports must be configured as edge ports so that they can transition rapidly to the forwarding state. Under normal circumstances, an access port does not exchange BPDUs with user endpoints. If an access port receives BPDUs, network topology changes and spanning tree flapping might occur.

Spanning tree provides the BPDU guard feature to solve this issue. Execute the stp bpdu-protection command in system view or interface view. When an edge port receives BPDUs, the system shuts down the port and notifies the user that the port has been shut down by the spanning tree protocol. The shutdown port is reactivated after the interval configured by using the shutdown-interval command.

¡ Enable loop guard.

A downstream device relies on BPDUs continuously sent by the upstream device to maintain the state of its root port and blocked ports. If link congestion or a unidirectional link failure occurs, these ports stop receiving BPDUs from the upstream device. The downstream device then recalculates port roles. The root port becomes a designated port, the blocked ports transition to the forwarding state, and a loop occurs in the switched network.

Execute the stp loop-protection command on the root ports and alternate ports of downstream devices to enable loop guard and prevent such loops. On a port with loop guard enabled, all MSTIs start in the discarding state. If the port receives BPDUs, these MSTIs transition between states normally. If the port does not receive BPDUs, these MSTIs remain in the discarding state to avoid loops.

Do not configure the loop guard feature on a port connected to a user endpoint. Otherwise, the port will remain discarding and cannot forward user traffic.

¡ Enable TC-BPDU guard.

If TC-BPDUs are used to attack a device, the device receives a large number of TC-BPDUs within a short period of time and is kept busy flushing forwarding entries, which affects network stability. To prevent frequent flushing of forwarding entries, execute the stp tc-protection command in system view to enable TC-BPDU guard. Execute the stp tc-protection threshold number command in system view to configure the maximum number of forwarding entry flushes that the device can perform every 10 seconds.

With TC-BPDU guard enabled, if the device receives more TC-BPDUs within 10 seconds than the specified number, it flushes its forwarding entries only the specified number of times during this period. For the excess TC-BPDUs, the device performs a single forwarding entry flush when the period expires.
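The throttling rule for TC-BPDU guard can be illustrated with a short model. The following Python sketch is a simplified, hypothetical model, not the device implementation. It assumes fixed 10-second windows starting at time 0 and treats each flush as instantaneous. The threshold parameter plays the role of the number argument of stp tc-protection threshold.

```python
def flush_times(tc_arrivals: list[float], threshold: int, window: float = 10.0) -> list[float]:
    """Return the times at which forwarding entries are flushed.

    Simplified model (assumption): windows are fixed [k*window, (k+1)*window)
    intervals starting at time 0, and each flush is instantaneous.
    """
    flushes: list[float] = []
    received: dict[int, int] = {}
    deferred: set[int] = set()
    for t in sorted(tc_arrivals):
        w = int(t // window)
        received[w] = received.get(w, 0) + 1
        if received[w] <= threshold:
            flushes.append(t)  # Within the limit: flush immediately.
        else:
            deferred.add(w)    # Excess TC-BPDU: handled once when the window ends.
    flushes.extend((w + 1) * window for w in deferred)
    return sorted(flushes)
```

For example, in this model five TC-BPDUs within the first 2.5 seconds with a threshold of 2 produce flushes at 0.5 s and 1.0 s, followed by one deferred flush at 10 s.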

If the issue persists, proceed to step 3.

3. Troubleshoot for BPDU timeout.

Check whether the device outputs the STP_BPDU_RECEIVE_EXPIRY log. This log indicates that the device did not receive any BPDUs within the BPDU timeout period, which triggered spanning tree recalculation. BPDU timeout might be caused by BPDU forwarding congestion on the network or by other configurations on the device that cause BPDUs to be incorrectly discarded.

To locate the fault more accurately, proceed to step 4.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Log messages

· STP/5/STP_BPDU_RECEIVE_EXPIRY

· STP/6/STP_DETECTED_TC

· STP/6/STP_NOTIFIED_TC

Alarm messages

N/A

Troubleshooting Layer 2—WAN access issues

PPP issues

PPP interface in protocol down state

Symptom

After the physical PPP interfaces of two devices are connected, the link layer protocol state of the interfaces is displayed as down.

Common causes

The following are the common causes of this type of issue:

· The physical layer state of the interface is not up.

· The PPP-related configuration is incorrect on the interfaces at both ends of the link.

· The PPP protocol packets are dropped.

· A loop exists on the link.

· The link latency is too high.

Troubleshooting flow

Figure 38 shows the troubleshooting flowchart.

Figure 38 Flowchart for troubleshooting PPP interfaces in protocol down state

Solution

1. Identify whether the interface is up on the physical layer.

Execute the display interface interface-type interface-number command in any view to check the physical state of the local interface:

¡ If the physical state of the local interface is Administratively DOWN, the local interface is shut down by using the shutdown command. In this case, bring up the interface by executing the undo shutdown command on the local interface.

¡ If the physical state of the local interface is DOWN, identify whether the peer interface is shut down by using the shutdown command. If yes, bring up the peer interface by executing the undo shutdown command on the peer interface.

¡ Identify whether the optical fibers and transceiver modules are firmly installed at both ends and whether the Rx and Tx optical fibers are correctly connected. Resolve any issue that keeps the interface physically down.

¡ If the interface state is up, proceed to the next step.

2. Identify whether the PPP configuration is correct at both ends of the link.

Execute the display this command on the interface where the PPP protocol is down to check the PPP-related configuration on the interface.

[Sysname-Serial3/0/5] display this

interface Serial3/0/5

ip address 12.1.1.1 255.255.255.0

return

¡ Verify that the link layer protocol is PPP on the interfaces at both ends of the link. To do so, execute the display interface interface-type interface-number command in any view on each device and check whether the Link layer protocol field displays PPP. If it does not, execute the link-protocol ppp command on the interface to set the link layer protocol to PPP.

¡ If PPP authentication has been configured, identify whether the authentication type and the authentication username/password of the authenticator are the same as those of the authenticatee. If they are different, modify the configuration as described in the PPP configuration guide.

¡ If interfaces on both ends are assigned to an MP group, identify whether the MP-group interface is shut down by using the shutdown command. If yes, bring up the MP-group interface by executing the undo shutdown command on the MP-group interface.

¡ If the remote address command is configured on the interface at one end, make sure the interface at the other end is configured with either the ip address ppp-negotiate command or the ip address command specifying the IP address assigned by the remote address command.

If PPP is configured correctly but the link layer protocol state is still down on the PPP interface, proceed to the next step.

3. Identify whether the protocol packets are received and sent normally on the interface.

Execute the display ppp packet statistics command in any view to view the statistics of PPP protocol packets and identify whether the packets are sent and received normally.

<Sysname> display ppp packet statistics slot 3

PPP packet statistics in slot 3:

-----------------------------------LCP--------------------------------------

SEND_LCP_CON_REQ : 4 RECV_LCP_CON_REQ : 5

SEND_LCP_CON_NAK : 0 RECV_LCP_CON_NAK : 0

SEND_LCP_CON_REJ : 0 RECV_LCP_CON_REJ : 0

SEND_LCP_CON_ACK : 4 RECV_LCP_CON_ACK : 4

SEND_LCP_CODE_REJ : 0 RECV_LCP_CODE_REJ : 0

SEND_LCP_PROT_REJ : 0 RECV_LCP_PROT_REJ : 0

SEND_LCP_TERM_REQ : 2 RECV_LCP_TERM_REQ : 1

SEND_LCP_TERM_ACK : 1 RECV_LCP_TERM_ACK : 0

SEND_LCP_ECHO_REQ : 25 RECV_LCP_ECHO_REQ : 0

SEND_LCP_ECHO_REP : 0 RECV_LCP_ECHO_REP : 25

SEND_LCP_FAIL : 0 SEND_LCP_CON_REQ_RETRAN : 0

-----------------------------------IPCP-------------------------------------

SEND_IPCP_CON_REQ : 38 RECV_IPCP_CON_REQ : 2

SEND_IPCP_CON_NAK : 0 RECV_IPCP_CON_NAK : 0

SEND_IPCP_CON_REJ : 0 RECV_IPCP_CON_REJ : 0

SEND_IPCP_CON_ACK : 2 RECV_IPCP_CON_ACK : 2

SEND_IPCP_CODE_REJ : 0 RECV_IPCP_CODE_REJ : 0

SEND_IPCP_PROT_REJ : 0 RECV_IPCP_PROT_REJ : 0

SEND_IPCP_TERM_REQ : 0 RECV_IPCP_TERM_REQ : 0

SEND_IPCP_TERM_ACK : 0 RECV_IPCP_TERM_ACK : 0

SEND_IPCP_FAIL : 0

-----------------------------------IPV6CP-----------------------------------

SEND_IPV6CP_CON_REQ : 0 RECV_IPV6CP_CON_REQ : 0

SEND_IPV6CP_CON_NAK : 0 RECV_IPV6CP_CON_NAK : 0

SEND_IPV6CP_CON_REJ : 0 RECV_IPV6CP_CON_REJ : 0

SEND_IPV6CP_CON_ACK : 0 RECV_IPV6CP_CON_ACK : 0

SEND_IPV6CP_CODE_REJ : 0 RECV_IPV6CP_CODE_REJ : 0

SEND_IPV6CP_PROT_REJ : 0 RECV_IPV6CP_PROT_REJ : 0

SEND_IPV6CP_TERM_REQ : 0 RECV_IPV6CP_TERM_REQ : 0

SEND_IPV6CP_TERM_ACK : 0 RECV_IPV6CP_TERM_ACK : 0

SEND_IPV6CP_FAIL : 0

-----------------------------------OSICP------------------------------------

SEND_OSICP_CON_REQ : 0 RECV_OSICP_CON_REQ : 0

SEND_OSICP_CON_NAK : 0 RECV_OSICP_CON_NAK : 0

SEND_OSICP_CON_REJ : 0 RECV_OSICP_CON_REJ : 0

SEND_OSICP_CON_ACK : 0 RECV_OSICP_CON_ACK : 0

SEND_OSICP_CODE_REJ : 0 RECV_OSICP_CODE_REJ : 0

SEND_OSICP_PROT_REJ : 0 RECV_OSICP_PROT_REJ : 0

SEND_OSICP_TERM_REQ : 0 RECV_OSICP_TERM_REQ : 0

SEND_OSICP_TERM_ACK : 0 RECV_OSICP_TERM_ACK : 0

SEND_OSICP_FAIL : 0

-----------------------------------MPLSCP-----------------------------------

SEND_MPLSCP_CON_REQ : 0 RECV_MPLSCP_CON_REQ : 0

SEND_MPLSCP_CON_NAK : 0 RECV_MPLSCP_CON_NAK : 0

SEND_MPLSCP_CON_REJ : 0 RECV_MPLSCP_CON_REJ : 0

SEND_MPLSCP_CON_ACK : 0 RECV_MPLSCP_CON_ACK : 0

SEND_MPLSCP_CODE_REJ : 0 RECV_MPLSCP_CODE_REJ : 0

SEND_MPLSCP_PROT_REJ : 0 RECV_MPLSCP_PROT_REJ : 0

SEND_MPLSCP_TERM_REQ : 0 RECV_MPLSCP_TERM_REQ : 0

SEND_MPLSCP_TERM_ACK : 0 RECV_MPLSCP_TERM_ACK : 0

SEND_MPLSCP_FAIL : 0

-----------------------------------AUTH-------------------------------------

SEND_PAP_AUTH_REQ : 0 RECV_PAP_AUTH_REQ : 0

SEND_PAP_AUTH_ACK : 0 RECV_PAP_AUTH_ACK : 0

SEND_PAP_AUTH_NAK : 0 RECV_PAP_AUTH_NAK : 0

SEND_CHAP_AUTH_CHALLENGE: 0 RECV_CHAP_AUTH_CHALLENGE: 0

SEND_CHAP_AUTH_RESPONSE : 0 RECV_CHAP_AUTH_RESPONSE : 0

SEND_CHAP_AUTH_ACK : 0 RECV_CHAP_AUTH_ACK : 0

SEND_CHAP_AUTH_NAK : 0 RECV_CHAP_AUTH_NAK : 0

SEND_PAP_AUTH_FAIL : 0 SEND_CHAP_AUTH_FAIL : 0

¡ If the received or sent packet counts are 0 or do not increase after you execute this command multiple times, protocol packets are being lost during transmission. Verify that the interfaces, optical fibers, and transceiver modules are operating correctly to resolve the packet loss issue. If the issue persists, proceed to step 6.

¡ If packets are received and sent normally, proceed to the next step.
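Comparing two snapshots by eye is error-prone because the output has many counters. The following Python sketch is a hypothetical helper. It parses the NAME : value pairs from two runs of display ppp packet statistics and lists the counters that are 0 or did not increase. It assumes that counter names are uppercase with underscores, as in the sample output.

```python
import re

# Hypothetical helper: compare two snapshots of "display ppp packet statistics".
COUNTER = re.compile(r"\b([A-Z][A-Z0-9_]*)[ \t]*:[ \t]*(\d+)")


def parse_ppp_counters(output: str) -> dict[str, int]:
    return {name: int(value) for name, value in COUNTER.findall(output)}


def stalled_counters(before: str, after: str, names: list[str]) -> list[str]:
    """Counters in names that are 0 or did not increase between the snapshots."""
    old, new = parse_ppp_counters(before), parse_ppp_counters(after)
    return [n for n in names if new.get(n, 0) == 0 or new.get(n, 0) <= old.get(n, 0)]
```

Choose the names argument carefully. Some counters are legitimately 0 in a healthy link, such as RECV_LCP_ECHO_REQ in the sample output when the peer does not send echo requests.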

4. Identify whether a loop exists on the link.

Execute the debugging ppp all interface interface-type interface-number command in user view on the local device to enable debugging for PPP packets. Identify whether the local end receives packets identical to those it sends (same packet type, packet ID, and magic number).

*Apr 7 19:38:04:384 2022 Sysname PPP/7/FSM_PACKET_0: -MDC=1-Slot=3;

PPP Packet:

Ser3/0/5(109) Output LCP(c021) Packet, PktLen 14

Current State reqsent, code ConfReq(01), id 0, len 10

MagicNumber(5), len 6, val c5 ae e7 03

*Apr 7 19:38:04:390 2022 Sysname PPP/7/FSM_PACKET_0: -MDC=1-Slot=3;

PPP Packet:

Ser3/0/5(109) Input LCP(c021) Packet, PktLen 14

Current State reqsent, code ConfReq(01), id 0, len 10

MagicNumber(5), len 6, val c5 ae e7 03

¡ If yes, a loop exists on the link. Check the cause of the loop (for example, an incorrect fiber connection), and remove the loop. If the issue persists, proceed to step 6.

¡ If not, no loop exists on the link. Proceed to the next step.
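The loop check above can be automated for long debugging captures. The following Python sketch is a hypothetical helper. It assumes the three-line packet layout of the sample debugging output (direction line, code/ID line, magic number line). It reports received packets that match a sent packet in protocol, code, ID, and magic number. A normal ConfAck from the peer echoes the ID and magic number but has a different code, so it is not reported.

```python
import re

# Hypothetical helper: detect a looped PPP link from "debugging ppp all" output.
# Assumes the three-line packet layout of the sample debugging output above.
DIRECTION = re.compile(r"(Input|Output)\s+(\w+)\([0-9a-fA-F]+\)\s+Packet")
CODE = re.compile(r"code\s+(\w+)\([0-9a-fA-F]+\),\s*id\s+(\d+)")
MAGIC = re.compile(r"MagicNumber\(5\),.*val\s+([0-9a-fA-F ]+?)\s*$")


def parse_ppp_debug(text: str) -> list[dict]:
    packets, current = [], None
    for line in text.splitlines():
        if m := DIRECTION.search(line):
            current = {"dir": m.group(1), "proto": m.group(2)}
            packets.append(current)
        elif current is not None and (m := CODE.search(line)):
            current["code"], current["id"] = m.group(1), int(m.group(2))
        elif current is not None and (m := MAGIC.search(line)):
            current["magic"] = m.group(1)
    return packets


def _key(packet: dict) -> tuple:
    return (packet["proto"], packet.get("code"), packet.get("id"), packet.get("magic"))


def looped_packets(packets: list[dict]) -> list[dict]:
    """Received packets identical to a sent one (type, ID, and magic number)."""
    sent = {_key(p) for p in packets if p["dir"] == "Output" and p.get("magic")}
    return [p for p in packets
            if p["dir"] == "Input" and p.get("magic") and _key(p) in sent]
```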

5. Identify whether the link latency is too high.

Execute the debugging ppp all interface interface-type interface-number command in user view on the local device to enable debugging for PPP packets. Determine the link latency by checking the time interval between the transmit timestamp and the receive timestamp of the PPP negotiation packets.

*Apr 7 19:38:04:384 2022 Sysname PPP/7/FSM_PACKET_0: -MDC=1-Slot=3;

PPP Packet:

Ser3/0/5(109) Output LCP(c021) Packet, PktLen 14

Current State reqsent, code ConfReq(01), id 0, len 10

MagicNumber(5), len 6, val c5 ae e7 03

*Apr 7 19:38:04:387 2022 Sysname PPP/7/FSM_PACKET_0: -MDC=1-Slot=3;

PPP Packet:

Ser3/0/5(109) Input LCP(c021) Packet, PktLen 14

Current State acksent, code ConfAck(02), id 0, len 10

MagicNumber(5), len 6, val c5 ae e7 03

Identify whether the link latency is longer than the negotiation timeout interval for PPP protocol packets configured on the current interface. The negotiation timeout interval for PPP protocol packets is configured by using the ppp timer negotiate command on the interface, and is 3 seconds by default.

¡ If the link latency is too high, execute the ppp timer negotiate command to appropriately increase the negotiation timeout interval. Alternatively, replace the corresponding device or link and retest the link latency until the link latency is less than the negotiation timeout interval for PPP protocol packets configured on the interface.

¡ If the link latency is shorter than the negotiation timeout interval, proceed to the next step.
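The latency check above can be sketched as follows. The Python helper parses the timestamp prefix of two debugging lines, assumed to follow the *Mon D HH:MM:SS:mmm YYYY format of the sample output. It then compares the interval with the negotiation timeout, which is 3 seconds by default according to the text above. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Hypothetical helper: measure PPP negotiation latency from two debugging lines.
# Assumes timestamps look like "*Apr 7 19:38:04:384 2022" (milliseconds after
# the last colon), as in the sample output above.
STAMP = re.compile(r"^\*?([A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}:\d{3}\s+\d{4})")


def debug_time(line: str) -> datetime:
    m = STAMP.match(line.strip())
    if m is None:
        raise ValueError(f"no debugging timestamp in {line!r}")
    # %f accepts the 3-digit millisecond field and pads it to microseconds.
    return datetime.strptime(m.group(1), "%b %d %H:%M:%S:%f %Y")


def latency_ms(sent_line: str, received_line: str) -> float:
    return (debug_time(received_line) - debug_time(sent_line)) / timedelta(milliseconds=1)


def within_negotiate_timeout(latency: float, timeout_s: float = 3) -> bool:
    """timeout_s mirrors the ppp timer negotiate value (3 seconds by default)."""
    return latency < timeout_s * 1000
```

With the two sample lines above (19:38:04:384 and 19:38:04:387), latency_ms returns 3.0, well within the default timeout.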

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting Layer 3—IP services issues

ARP issues

ARP learning failure

Symptom

The device cannot learn ARP entries, causing traffic forwarding failure.

Common causes

The following are the common causes of this type of issue:

· The memory is insufficient.

· The physical layer state of the interface is not up.

· The IP addresses of the local interface and the peer interface do not reside on the same network segment.

· ARP packets fail to be sent to the CPU.

· A card is faulty.

· ARP packets are dropped due to a busy CPU.

Troubleshooting flow

Figure 39 shows the troubleshooting flowchart.

Figure 39 Flowchart for troubleshooting ARP learning failure

Solution

1. Use the display memory-threshold command to identify whether the memory is insufficient.

<Sysname> display memory-threshold

Memory usage threshold: 100%

Free-memory thresholds:

Minor: 96M

Severe: 64M

Critical: 48M

Normal: 128M

Early-warning: 256M

Secure: 304M

Current free-memory state: Normal (secure)

¡ If the Current free-memory state field displays Normal or Normal (secure), go to the next step.

¡ If the Current free-memory state field displays Minor, Severe, Critical, or Normal (early-warning), check the device memory usage and troubleshoot the insufficient memory issue.
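The decision in this step depends only on the Current free-memory state field. The following Python sketch is a hypothetical helper that extracts that field from display memory-threshold output and applies the rule above. It assumes the field label shown in the sample output.

```python
import re

# Hypothetical helper for step 1: classify "display memory-threshold" output.
OK_STATES = {"Normal", "Normal (secure)"}
STATE = re.compile(r"^\s*Current free-memory state:\s*(.+?)\s*$", re.MULTILINE)


def free_memory_state(output: str) -> str:
    m = STATE.search(output)
    if m is None:
        raise ValueError("no 'Current free-memory state' field found")
    return m.group(1)


def memory_ok(output: str) -> bool:
    """True for Normal or Normal (secure); other states need memory troubleshooting."""
    return free_memory_state(output) in OK_STATES
```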

2. Check the network configuration and interface state.

a. Use the display interface command to identify whether the interface is up. If the interface is not up, troubleshoot the issue.

b. Use the display fib ip-address command to view FIB entries, where ip-address is the IP address in the ARP entry. If the corresponding FIB entry does not exist, the routing module might be faulty. For more information about troubleshooting routing module issues, see "Troubleshooting Layer 3 IP routing issues." If the corresponding FIB entry exists but its next hop is not the directly connected next hop, check the connection between the device and its next hop.

c. Use the display ip interface command to view the IP address of the interface.

- Identify whether the IP address of the local interface resides on the same network segment as the peer interface. If the IP addresses reside on different network segments, execute the ip address command in interface view to edit the IP addresses.

- Identify whether the local interface IP address conflicts with the peer interface IP address. If a conflict has occurred, execute the ip address command in interface view to edit the IP addresses.

- Identify whether the peer interface is the one where the next hop resides.
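The first two address checks in step c can be expressed with Python's standard ipaddress module. This sketch takes the local address with its mask, in the same form as configured with the ip address command, together with the peer address. It flags the two problems described above. The function name is hypothetical. The sketch does not check whether the peer is the intended next hop.

```python
import ipaddress


def check_peer_address(local: str, peer: str) -> list[str]:
    """local is 'address/mask' (dotted mask or prefix length); peer is an address."""
    local_if = ipaddress.ip_interface(local)
    peer_ip = ipaddress.ip_address(peer)
    problems = []
    if peer_ip not in local_if.network:
        problems.append("different network segment")
    if peer_ip == local_if.ip:
        problems.append("address conflict")
    return problems
```

For example, check_peer_address("12.1.1.1/255.255.255.0", "12.1.1.2") returns an empty list, which means neither problem was found.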

d. Use the ping command to identify whether a link failure exists.

3. Identify whether ARP packets are sent and received correctly.

a. Use the debugging arp packet command to enable ARP packet debugging. Then, execute the ping command to identify whether the device sends and receives ARP packets correctly.

<Sysname> debugging arp packet

<Sysname> ping -c 1 1.1.1.2

Ping 1.1.1.2 (1.1.1.2): 56 data bytes, press CTRL+C to break

56 bytes from 1.1.1.2: icmp_seq=0 ttl=255 time=2.511 ms

--- Ping statistics for 1.1.1.2 ---

1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss

round-trip min/avg/max/std-dev = 2.511/2.511/2.511/nan ms

<Sysname>*Apr 18 17:28:22:879 2022 Sysname ARP/7/ARP_SEND: -MDC=1; Sent an ARP message, operation: 1, sender MAC: 68cb-978f-0106, sender IP: 1.1.1.1, target MAC: 0000-0000-0000, target IP: 1.1.1.2

The command output indicates that the device has successfully sent an ARP request in which the destination IP address is 1.1.1.2 and the source IP address is 1.1.1.1.

*Apr 18 17:28:22:881 2022 Sysname ARP/7/ARP_RCV: -MDC=1; Received an ARP message, operation: 2, sender MAC: 68cb-9c3f-0206, sender IP: 1.1.1.2, target MAC: 68cb-978f-0106, target IP: 1.1.1.1

The command output indicates that the device has successfully received an ARP reply in which the destination IP address is 1.1.1.1 and the source IP address is 1.1.1.2.

- If the device has sent and received ARP packets successfully, go to step 4.

- If the device failed to send or receive an ARP packet, go to the next step.
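When the debugging output is long, requests and replies can be matched mechanically. In ARP, operation 1 is a request and operation 2 is a reply, as the sample output above shows. The following Python sketch assumes the ARP_SEND and ARP_RCV message layout of the sample lines. It reports whether a request for a given target IP address was sent and which MAC address replied. The helper name is hypothetical.

```python
import re
from typing import Optional

# Hypothetical helper: match ARP requests and replies in "debugging arp packet"
# output. Assumes the ARP_SEND/ARP_RCV message layout of the sample lines above.
ARP_DEBUG = re.compile(
    r"ARP_(SEND|RCV):.*?operation:\s*(\d+),\s*sender MAC:\s*([0-9A-Fa-f-]+),"
    r"\s*sender IP:\s*([\d.]+),\s*target MAC:\s*([0-9A-Fa-f-]+),"
    r"\s*target IP:\s*([\d.]+)")

REQUEST, REPLY = "1", "2"


def arp_exchange(debug_text: str, target_ip: str) -> tuple[bool, Optional[str]]:
    """Return (request_sent, replying_mac) for target_ip."""
    request_sent, reply_mac = False, None
    for m in ARP_DEBUG.finditer(debug_text):
        kind, op, sender_mac, sender_ip, _target_mac, target = m.groups()
        if kind == "SEND" and op == REQUEST and target == target_ip:
            request_sent = True
        elif kind == "RCV" and op == REPLY and sender_ip == target_ip:
            reply_mac = sender_mac
    return request_sent, reply_mac
```

A result of (True, None) means the request was sent but no reply was received, which points to step b.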

b. Use the debugging arp error command to enable ARP error debugging. Identify the ARP sending or receiving failure cause according to Table 4.

Table 4 Command output

Field	Description
Packet discarded for the network state of receiving interface is down.	An ARP packet was discarded because the network layer state of the receiving interface was down.
Packet discarded for the ARP packet is too short.	An ARP packet was discarded because the packet was too short.
Packet discarded for the ARP packet is error.	An ARP packet was discarded because the packet was an error packet.
Packet discarded for the link state of the port is down.	An ARP packet was discarded because the link layer state of the receiving port went down.
Packet discarded for the sender IP is invalid.	An ARP packet was discarded because the sender IP address in the packet was invalid.
Packet discarded for the sender IP is a broadcast IP.	An ARP packet was discarded because the sender IP address in the packet was a broadcast IP address.
Packet discarded for the target IP is invaild.	An ARP packet was discarded because the target IP address in the packet was invalid.
Packet discarded for the target IP is a broadcast IP.	An ARP packet was discarded because the target IP address in the packet was a broadcast IP address.
Failed to get the source MAC of the ARP reply.	ARP failed to obtain the source MAC address of an ARP reply.
Packet discarded for the source MAC is a multicast address.	An ARP packet was discarded because the source MAC address in the packet was a multicast MAC address.
Packet discarded for the source MAC is a broadcast address.	An ARP packet was discarded because the source MAC address in the packet was a broadcast MAC address.
Packet discarded for the sender MAC address is the same as the receiving interface.	An ARP packet was discarded because the sender MAC address in the packet is the same as the MAC address of the receiving interface.
Packet discarded for the number of ARP entries reaches the limit.	An ARP packet was discarded because the maximum number of ARP entries was reached.
Packet discarded for the type of receiving interface is L2VE.	An ARP packet was discarded because the receiving interface of the packet was an L2VE interface.
Packet discarded for conflict with static entry.	An ARP packet was discarded because the ARP information in the packet conflicted with a static ARP entry.
Packet discarded for memory alarm notification.	An ARP packet was discarded because a memory alarm notification was received.
Packet discarded for insufficient resources.	An ARP packet was discarded because of insufficient resources.
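If the debugging output has been saved to a text file, the Table 4 lookup can be scripted. The following Python sketch is a hypothetical helper. The name classify_arp_errors and the shortened cause labels are illustrative. It matches substrings of the messages listed in Table 4 and includes only some of them. It assumes the message text appears as shown in the table, although prefixes and wording might differ between software versions.

```python
# Hypothetical helper: map `debugging arp error` messages to causes from Table 4.
# Only a subset of the table is listed; extend ARP_DROP_CAUSES as needed.
ARP_DROP_CAUSES = {
    "network state of receiving interface is down": "Receiving interface network layer is down",
    "ARP packet is too short": "Packet is too short",
    "link state of the port is down": "Receiving port link layer is down",
    "sender IP is a broadcast IP": "Sender IP address is a broadcast address",
    "source MAC is a multicast address": "Source MAC address is a multicast address",
    "number of ARP entries reaches the limit": "Maximum number of ARP entries reached",
    "conflict with static entry": "Conflicts with a static ARP entry",
    "insufficient resources": "Insufficient resources",
}


def classify_arp_errors(debug_lines):
    """Return (line, cause) pairs for lines that contain a known discard message."""
    results = []
    for line in debug_lines:
        for fragment, cause in ARP_DROP_CAUSES.items():
            if fragment in line:
                results.append((line.strip(), cause))
                break  # one cause per line is enough
    return results


# Usage with two messages copied from Table 4 and one unrelated line.
captured = [
    "Packet discarded for the ARP packet is too short.",
    "some unrelated debug line",
    "Packet discarded for conflict with static entry.",
]
for line, cause in classify_arp_errors(captured):
    print(f"{cause}: {line}")
```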

4. Identify whether a card is faulty. The following uses the card in slot 1 as an example. Use the display system internal arp statistics command to view ARP statistics of the card.

<Sysname> system-view

[Sysname] probe

[Sysname-probe] display system internal arp statistics slot 1

Entry statistics:

Valid = 1 Dummy = 0

Long static = 0 Short resolved = 0

Multiport = 0 L3 short = 0

Packet = 1 OpenFlow = 0

Rule = 0 ARP input = 175

Resolved = 10

Static statistics:

Short static = 0 Long static = 0

Multiport = 0 Disabled = 0

Error statistics:

Memory = 0 Sync memory = 0

Packet = 10 Parameter = 0

IF = 0 Walk = 0

Add host route = 0 Del host route = 0

Local address = 0 Real time message = 0

Refresh rule = 0 Delete rule = 0

Smooth rule start = 0 Smooth rule end = 0

Running information:

Max ARP = 2048 Max multiport = 64

Default blackhole = 1 Max blackhole = 200

Timer queue = 0 Event queue = 0

Packet queue = 0 LIPC send queue = 0/0/0

a. If the value for the ARP input field is not 0, go to the next step. If the value for the ARP input field is 0, troubleshoot the card issue.

b. Collect the content of the Error statistics field and send it to H3C technical support staff.
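The decision in step 4 can be sketched in Python for saved output of the display system internal arp statistics command. This is a hypothetical helper, not an H3C tool. It assumes the Key = value layout shown in the example output, with up to two pairs per line and section titles that end in a colon.

```python
import re

# One "Key = value" pair; a line may hold two, e.g. "Rule = 0 ARP input = 175".
PAIR_RE = re.compile(r"([A-Za-z][A-Za-z ]*?)\s*=\s*(\S+)")


def parse_arp_statistics(text):
    """Parse saved command output into {section title: {field: value string}}."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(":") and "=" not in line:  # e.g. "Error statistics:"
            current = line[:-1]
            sections[current] = {}
        elif current is not None:
            for key, value in PAIR_RE.findall(line):
                sections[current][key] = value
    return sections


def assess_card(sections):
    """Apply step 4: ARP input of 0 points to the card; otherwise report error counters."""
    if int(sections["Entry statistics"]["ARP input"]) == 0:
        return "ARP input is 0: troubleshoot the card", {}
    errors = {k: v for k, v in sections.get("Error statistics", {}).items() if v != "0"}
    return "Send the Error statistics to H3C technical support", errors


sample = """Entry statistics:
Valid = 1 Dummy = 0
Rule = 0 ARP input = 175
Error statistics:
Memory = 0 Sync memory = 0
Packet = 10 Parameter = 0
"""
print(assess_card(parse_arp_statistics(sample)))
```

Sections are kept separate because some field names, such as Packet and Multiport, appear in more than one section.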

5. Identify whether ARP packets are dropped due to a busy CPU. Use the view command to display the ARP-related queues in /proc/kque, and identify whether and why ARP packets are dropped.

[Sysname-probe] view /proc/kque | in ARP

0: dd0e0800 ARP_TIMER 128/0/13/0 (0x4b515545)

0: dd0e0900 ARP_SINGLEEVENT 1/0/0/0 (0x4b515545)

0: dd0e0a00 ARP_SEND 1024/0/0/0 (0x4b515545)

0: dd0e0b00 ARP_RULE 4096/0/0/0 (0x4b515545)

0: dd0e0c00 ARP_RULE_ENTRY 4096/0/0/0 (0x4b515545)

0: dd0e0d00 ARP_RBHASHNOTIFY 1/0/0/0 (0x4b515545)

0: dd0e0f00 ARP_DTC 2048/0/0/0 (0x4b515545)

0: dd0e6200 ARP_MICROSEGMENT 2048/0/0/0 (0x4b515545)

0: dd0e6300 ARP_MACNOTIFY 4096/0/0/0 (0x4b515545)

0: dd0e6400 ARP_UNKNOWNSMAC_EVENT 1/0/0/0 (0x4b515545)

0: d06e5900 ARPSNP_PKT 4096/0/0/0 (0x4b515545)

0: d06e5a00 ARP_VSISUP_PKT 4096/0/0/0 (0x4b515545)

0: d06e5b00 ARP_EVENT 8192/0/2/0 (0x4b515545)

0: d06e5c00 ARP_FREQEVENT 8192/0/1/0 (0x4b515545)

0: d06e5d00 ARP_MACNOTIFYEVENT 1/0/0/0 (0x4b515545)

0: d06e5e00 ARP_PKT 4096/0/2/0 (0x4b515545)

0: ca5f3400 FIBARPHRQ 1/0/0/0 (0x4b515545)

View the value for the ARP_PKT field in the command output, which is displayed in the W/X/Y/Z format.

¡ W represents the queue capacity, which is a fixed value.

¡ X represents the current queue length.

¡ Y represents the historical maximum length of the queue.

¡ Z represents the number of dropped ARP packets in the queue.

If Z is not 0 and Y equals W, ARP packets are dropped due to a busy CPU. If Z is 0, go to the next step.
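The W/X/Y/Z rule in step 5 can be expressed as a short Python check over saved /proc/kque output. The function name and return strings are hypothetical. The guide does not describe the case where Z is not 0 but Y has not reached W, so the sketch reports that case separately instead of guessing a cause.

```python
import re

# A /proc/kque line such as "0: d06e5e00 ARP_PKT 4096/0/2/0 (0x4b515545)".
KQUE_RE = re.compile(r"^\s*\d+:\s+\S+\s+(\S+)\s+(\d+)/(\d+)/(\d+)/(\d+)")


def check_arp_pkt_queue(kque_output):
    """Apply the step 5 rule to the ARP_PKT queue (W/X/Y/Z = capacity/current/max/dropped)."""
    for line in kque_output.splitlines():
        m = KQUE_RE.match(line)
        if m and m.group(1) == "ARP_PKT":  # exact match skips ARPSNP_PKT, ARP_VSISUP_PKT
            w, x, y, z = (int(g) for g in m.groups()[1:])
            if z != 0 and y == w:
                return "dropped: busy CPU"
            if z == 0:
                return "no drops: go to the next step"
            return "drops recorded but queue never full: not covered by this guide"
    return "ARP_PKT queue not found"


print(check_arp_pkt_queue("0: d06e5e00 ARP_PKT 4096/0/2/0 (0x4b515545)"))
```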

6. Collect detailed information about the ARP process. Execute the display mdc command to obtain the MDC number. Use the display process command to view the ID of the ARP process for that MDC. Then, use the view command with the process ID to obtain detailed information about the ARP process and send it to H3C technical support staff.

[Sysname-probe] display process name karp/1

Job ID: 224

PID: 224

Parent JID: 2

Parent PID: 2

Executable path: -

Instance: 0

Respawn: OFF

Respawn count: 1

Max. spawns per minute: 0

Last started: Mon Apr 18 15:09:58 2022

Process state: sleeping

Max. core: 0

ARGS: -

TID LAST_CPU Stack PRI State HH:MM:SS:MSEC Name

224 0 0K 115 S 0:5:25:380 [karp/1]

In the karp/1 argument, 1 is the MDC number. The PID field in the command output is the ID of the ARP process. Execute the view command to display the call stack of the ARP process whose PID is 224.

[Sysname-probe] view /proc/224/stack

[<c04c9cd4>] kepoll_wait+0x274/0x3c0

[<e1fb1372>] arp_Thread+0x42/0xd0 [system]

[<c043f1b4>] kthread+0xd4/0xe0

[<c0401daf>] kernel_thread_helper+0x7/0x10

[<ffffffff>] 0xffffffff
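Step 6 maps the process name karp/<MDC number> to a PID, and then to the /proc/<PID>/stack file. That mapping can be sketched in Python. The helper name is hypothetical. It only builds the command string to run in probe view and does not execute anything on the device.

```python
import re


def arp_stack_command(process_name, display_process_output):
    """Return (MDC number, probe-view command) for an ARP process such as karp/1."""
    mdc = process_name.split("/", 1)[1]  # "karp/1" -> "1", the MDC number
    # Match only a line that starts with "PID:", so "Parent PID:" is ignored.
    m = re.search(r"^\s*PID:\s*(\d+)", display_process_output, re.MULTILINE)
    if m is None:
        raise ValueError("PID field not found")
    return mdc, f"view /proc/{m.group(1)}/stack"


saved = "Job ID: 224\nPID: 224\nParent JID: 2\nParent PID: 2\n"
print(arp_stack_command("karp/1", saved))
```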

7. Collect the following information and contact H3C Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

ARP response failure

Symptom

The device does not reply to the ARP request sent from the peer device.

Common causes

The following are the common causes of this type of issue:

· The target IP address in the ARP request received by the interface is not the IP address of the local device.

· The ARP request sent by the peer device triggers source MAC-based ARP attack detection on the local device.

· The ARP request sent by the peer device triggers ARP attack detection on the local device.

Troubleshooting flow

Figure 40 shows the troubleshooting flowchart.

Figure 40 Flowchart for troubleshooting ARP response failure

Solution

1. View information about the ARP request sent from the peer device to identify whether it is sent to the CPU.

a. Use the debugging arp packet command to enable ARP packet debugging. Then, configure the peer device to send an ARP request to the local device.

<Sysname> debugging arp packet

<Sysname> *Apr 21 17:38:05:489 2022 Sysname ARP/7/ARP_RCV: -MDC=1; Received an ARP message, operation: 1, sender MAC: 68cb-9c3f-0206, sender IP: 1.1.1.2, target MAC: 0000-0000-0000, target IP: 1.1.1.1

- If the target IP address is not the local device IP address, check the routing table and FIB of the peer device.

- If the target IP address is the local device IP address, go to the next step.

b. Use the debugging arp error command to enable ARP error debugging. Identify the ARP response failure cause according to Table 5.

Table 5 Command output

Field	Description
Packet discarded for the network state of receiving interface is down.	An ARP packet was discarded because the network layer state of the receiving interface was down.
Packet discarded for the ARP packet is too short.	An ARP packet was discarded because the packet was too short.
Packet discarded for the ARP packet is error.	An ARP packet was discarded because the packet was an error packet.
Packet discarded for the link state of the port is down.	An ARP packet was discarded because the link layer state of the receiving port went down.
Packet discarded for the sender IP is invalid.	An ARP packet was discarded because the sender IP address in the packet was invalid.
Packet discarded for the sender IP is a broadcast IP.	An ARP packet was discarded because the sender IP address in the packet was a broadcast IP address.
Packet discarded for the target IP is invaild.	An ARP packet was discarded because the target IP address in the packet was invalid.
Packet discarded for the target IP is a broadcast IP.	An ARP packet was discarded because the target IP address in the packet was a broadcast IP address.
Failed to get the source MAC of the ARP reply.	ARP failed to obtain the source MAC address of an ARP reply.
Packet discarded for the source MAC is a multicast address.	An ARP packet was discarded because the source MAC address in the packet was a multicast MAC address.
Packet discarded for the source MAC is a broadcast address.	An ARP packet was discarded because the source MAC address in the packet was a broadcast MAC address.
Packet discarded for the sender MAC address is the same as the receiving interface.	An ARP packet was discarded because the sender MAC address in the packet is the same as the MAC address of the receiving interface.
Packet discarded for the number of ARP entries reaches the limit.	An ARP packet was discarded because the maximum number of ARP entries was reached.
Packet discarded for the type of receiving interface is L2VE.	An ARP packet was discarded because the receiving interface of the packet was an L2VE interface.
Packet discarded for conflict with static entry.	An ARP packet was discarded because the ARP information in the packet conflicted with a static ARP entry.
Packet discarded for memory alarm notification.	An ARP packet was discarded because a memory alarm notification was received.
Packet discarded for insufficient resources.	An ARP packet was discarded because of insufficient resources.
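The check in step 1a can be sketched in Python for a saved ARP_RCV debug message. The helper and its return strings are hypothetical. It assumes the field layout shown in the example debug message and uses the standard ARP operation codes, where 1 is a request and 2 is a reply.

```python
import re

# Fields as they appear in the ARP_RCV debug message, e.g. "target IP: 1.1.1.1".
FIELD_RE = re.compile(r"(operation|sender MAC|sender IP|target MAC|target IP): ([^,\s]+)")


def check_arp_request(debug_line, local_ips):
    """Decide the step 1a branch for a received ARP message."""
    fields = dict(FIELD_RE.findall(debug_line))
    if fields.get("operation") != "1":  # 1 = ARP request
        return "not an ARP request"
    if fields.get("target IP") in local_ips:
        return "target is local: go to the next step"
    return "target is not local: check the peer's routing table and FIB"


line = ("*Apr 21 17:38:05:489 2022 Sysname ARP/7/ARP_RCV: -MDC=1; Received an ARP message, "
        "operation: 1, sender MAC: 68cb-9c3f-0206, sender IP: 1.1.1.2, "
        "target MAC: 0000-0000-0000, target IP: 1.1.1.1")
print(check_arp_request(line, {"1.1.1.1"}))
```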

2. Identify whether the peer device MAC address is in a source MAC-based ARP attack entry. The following uses local interface GigabitEthernet2/0/1 as an example. Execute the display arp source-mac command to display ARP attack entries detected by source MAC-based ARP attack detection.

<Sysname> display arp source-mac interface gigabitethernet 2/0/1

Source-MAC VLAN/VSI name Interface Aging-time (sec)

23f3-1122-3344 4094 GE2/0/1 10

¡ If a source MAC-based ARP attack entry exists for the peer device MAC address, use the arp source-mac threshold command to adjust the threshold for source MAC-based ARP attack detection as required.

¡ If the peer device MAC address is not in any source MAC-based ARP attack entry, go to the next step.
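In step 2, MAC addresses might be written in different notations. For example, the device shows 23f3-1122-3344, while another tool might show 23:f3:11:22:33:44. A comparison should therefore normalize the addresses first. The following Python sketch is hypothetical and assumes that the source MAC address is the first column of each entry, as in the example output.

```python
def normalize_mac(mac):
    """Reduce a MAC address to 12 lowercase hex digits, e.g. 23F3-1122-3344 -> 23f311223344."""
    digits = "".join(c for c in mac.lower() if c in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def peer_in_source_mac_entries(display_output, peer_mac):
    """True if peer_mac appears in the Source-MAC column of saved command output."""
    peer = normalize_mac(peer_mac)
    for line in display_output.splitlines():
        cols = line.split()
        if not cols:
            continue
        try:
            if normalize_mac(cols[0]) == peer:
                return True
        except ValueError:  # header or non-entry line
            continue
    return False


saved = "Source-MAC VLAN/VSI name Interface Aging-time (sec)\n23f3-1122-3344 4094 GE2/0/1 10\n"
print(peer_in_source_mac_entries(saved, "23:f3:11:22:33:44"))
```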

3. Identify whether the peer device triggers ARP attack detection. The following uses slot 1 as an example. Execute the display arp detection statistics attack-source command to display statistics for ARP attack sources.

<Sysname> display arp detection statistics attack-source slot 1

Interface VLAN MAC address IP address Number Time

GE2/0/1 1 0005-0001-0001 10.1.1.14 24 17:09:56

03-27-2017

¡ If an entry contains the peer device MAC address, check the ARP attack detection configuration to identify whether an inappropriate configuration causes the peer device to trigger ARP attack detection. If so, correct the configuration.

¡ If no entry has the peer device MAC address, go to the next step.

4. Use the display arp detection statistics packet-drop command to display statistics for packets dropped by ARP attack detection. Identify the reason why ARP attack detection is triggered according to the statistics.

<Sysname> display arp detection statistics packet-drop

State: U-Untrusted T-Trusted

ARP packets dropped by ARP inspect checking:

Interface/AC(State) IP Src-MAC Dst-MAC Inspect

GE2/0/1(U) 40 0 0 78

GE2/0/2(U) 0 0 0 0

GE2/0/3(T) 0 0 0 0

GE2/0/4(U) 0 0 30 0

GE2/0/5-srv1(U) 0 10 20 0

GE2/0/5-srv2(T) 10 0 20 22

Table 6 Command output

Field	Description
State	State of an interface: · U—ARP untrusted interface or AC. · T—ARP trusted interface or AC.
Interface/AC(State)	Inbound interface or AC of ARP packets. State specifies the port or AC state, which is trusted or untrusted.
IP	Number of ARP packets discarded due to invalid sender and target IP addresses.
Src-MAC	Number of ARP packets discarded due to invalid source MAC address.
Dst-MAC	Number of ARP packets discarded due to invalid destination MAC address.
Inspect	Number of ARP packets that failed to pass user validity check.
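In step 4, you can filter the packet-drop table for nonzero counters to find the interfaces or ACs that actually dropped packets. This Python sketch is hypothetical. It assumes the five-column layout shown above: a name followed by the IP, Src-MAC, Dst-MAC, and Inspect counters.

```python
COUNTERS = ["IP", "Src-MAC", "Dst-MAC", "Inspect"]


def nonzero_drops(display_output):
    """Return {interface/AC: {counter: count}} for rows with at least one nonzero counter."""
    result = {}
    for line in display_output.splitlines():
        cols = line.split()
        # Data rows look like "GE2/0/1(U) 40 0 0 78": a name plus four integers.
        if len(cols) != 5 or not all(c.isdigit() for c in cols[1:]):
            continue
        counts = {name: int(v) for name, v in zip(COUNTERS, cols[1:]) if int(v)}
        if counts:
            result[cols[0]] = counts
    return result


saved = """Interface/AC(State) IP Src-MAC Dst-MAC Inspect
GE2/0/1(U) 40 0 0 78
GE2/0/2(U) 0 0 0 0
GE2/0/4(U) 0 0 30 0
"""
print(nonzero_drops(saved))
```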

5. Use the display system internal arp statistics command to display ARP statistics on each card. Collect the content of the Error statistics field and send it to H3C technical support staff.

[Sysname-probe] display system internal arp statistics slot 1

Entry statistics:

Valid = 1 Dummy = 0

Long static = 0 Short resolved = 0

Multiport = 0 L3 short = 0

Packet = 1 OpenFlow = 0

Rule = 0 ARP input = 175

Resolved = 10

Static statistics:

Short static = 0 Long static = 0

Multiport = 0 Disabled = 0

Error statistics:

Memory = 0 Sync memory = 0

Packet = 10 Parameter = 0

IF = 0 Walk = 0

Add host route = 0 Del host route = 0

Local address = 0 Real time message = 0

Refresh rule = 0 Delete rule = 0

Smooth rule start = 0 Smooth rule end = 0

Running information:

Max ARP = 2048 Max multiport = 64

Default blackhole = 1 Max blackhole = 200

Timer queue = 0 Event queue = 0

Packet queue = 0 LIPC send queue = 0/0/0

6. Use the debugging arp entry command to enable ARP entry debugging. View the ARP entry status, collect related logs, and send them to H3C technical support staff.

<Sysname> debugging arp entry

<Sysname> ping -c 1 192.168.111.188

PING 192.168.111.188 (192.168.111.188): 56 data bytes, press CTRL_C to break

56 bytes from 192.168.111.188: icmp_seq=0 ttl=128 time=1.000 ms

--- 192.168.111.188 ping statistics ---

1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss

round-trip min/avg/max/std-dev = 1.000/1.000/1.000/0.000 ms

*Dec 17 14:28:34:762 2012 Sysname ARP/7/ARP_ENTRY: -MDC=1; ARP entry status changed: MAC address: 000a-eb83-691e, IP address: 192.168.111.188, INITIALIZE -> NO_AGE

Table 7 Command output

Field	Description
ARP entry status changed	The status of an ARP entry changed.
MAC address	MAC address in the ARP entry.
IP address	IP address in the ARP entry.
state1->state2	The status of the ARP entry changed from state1 to state2. An ARP entry can be in one of the following states: · INITIALIZE—The ARP entry is not resolved. · NO_AGE—The ARP entry does not age out. · AGING—Aging probe for the ARP entry has started. · AGED—The ARP entry has aged out and is to be deleted.
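Captured ARP_ENTRY debug messages can wrap across lines at a fixed width and split words such as changed and NO_AGE. The following hypothetical Python sketch joins wrapped lines before it extracts the MAC address, IP address, and state transition described in Table 7. It assumes that each wrap inserts only a line break, so joining the lines without adding characters restores the original message.

```python
import re

ENTRY_RE = re.compile(
    r"ARP entry status changed: MAC address: (\S+), IP address: (\S+), (\w+) -> (\w+)")


def parse_arp_entry_change(log_text):
    """Return (MAC, IP, old state, new state) from an ARP_ENTRY message, or None."""
    joined = "".join(log_text.splitlines())  # undo fixed-width wrapping
    m = ENTRY_RE.search(joined)
    return m.groups() if m else None


wrapped = ("*Dec 17 14:28:34:762 2012 Sysname ARP/7/ARP_ENTRY: -MDC=1; ARP entry status ch\n"
           "anged: MAC address: 000a-eb83-691e, IP address: 192.168.111.188, INITIALIZE -> N\n"
           "O_AGE")
print(parse_arp_entry_change(wrapped))
```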

7. Collect the following information and contact H3C Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Traffic forwarding failure based on the existing ARP entry

Symptom

The device has learned an ARP entry but cannot forward traffic correctly.

Common causes

The following are the common causes of this type of issue:

· An abnormal parameter exists in the learned ARP entry.

· The learned ARP entry failed to be deployed to the driver.

Troubleshooting flow

Figure 41 shows the troubleshooting flowchart.

Figure 41 Flowchart for troubleshooting traffic forwarding failure based on the existing ARP entry

Solution

1. Identify whether an abnormal parameter exists in the learned ARP entry. Use the display system internal adj4 entry command to view ARP entry information. The following uses interface GigabitEthernet2/0/1 and peer IP address 1.1.1.2 as an example.

<Sysname> system-view

[Sysname] probe

[Sysname-probe] display system internal adj4 entry 1.1.1.2 interface gigabitethernet 2/0/1

ADJ4 entry:

Entry attribute : 0x0

Service type : Ethernet

Link media type : Broadcast

Action type : Forwarding

Entry flag : 0x0

Forward type : 0x0

Slot : 0

MTU : 1500

Driver flag : 2

Sequence No : 17

Physical interface : GE2/0/1

Logical interface : N/A

Virtual circuit information : 65535

ADJ index : 0xdc731e70

Peer address : 0.0.0.0

Reference count : 0

Reference Sequence : 9

MicroSegmentID : 0

Nexthop driver[0] : 0xffffffff

Nexthop driver[1] : 0xffffffff

Driver context[0] : 0xffffffff

Driver context[1] : 0xffffffff

Driver context[2] : 0xffffffff

Driver context[3] : 0xffffffff

Driver context[4] : 0xffffffff

Driver context[5] : 0xffffffff

Link head information(IP) : 68cb9c3f020668cb978f01060800

Link head information(MPLS) : 68cb9c3f020668cb978f01068847

¡ If the Action type field displays Forwarding, the ARP entry is normal and the device can forward traffic destined for 1.1.1.2 correctly.

¡ If the Action type field displays Drop, the device fails to forward traffic destined for 1.1.1.2 because an abnormal parameter exists in the learned ARP entry.

- If the Driver flag field displays 4, driver resources are insufficient. Check the driver usage.

- If the Driver flag field does not display 4, go to the next step.
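The decision in step 1 can be sketched in Python for saved output of the display system internal adj4 entry command. The function and its return strings are hypothetical. It assumes the Key : value layout shown in the example and reads only the Action type and Driver flag fields.

```python
def assess_adj4_entry(display_output):
    """Apply step 1 to saved adj4 entry output."""
    fields = {}
    for line in display_output.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    action = fields.get("Action type")
    if action == "Forwarding":
        return "entry normal"
    if action == "Drop":
        if fields.get("Driver flag") == "4":
            return "insufficient driver resources: check driver usage"
        return "go to step 2"
    return f"unexpected Action type: {action}"


print(assess_adj4_entry("Action type : Drop\nDriver flag : 4\n"))
```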

2. Identify whether the ARP entry is successfully deployed to the driver. Use the debugging system internal adj4 command and specify the hardware keyword to enable IPv4 adjacency entry debugging. Use the reset arp command to clear ARP entries from the ARP table. Then, use the ping command to send packets to the peer device to trigger ARP learning. In the debugging output, check whether the ARP entry is deployed to the driver.

<Sysname> system-view

[Sysname] probe

[Sysname-probe] debugging system internal adj4 hardware

[Sysname-probe] ping 1.1.1.2

Ping 1.1.1.2 (1.1.1.2): 56 data bytes, press CTRL+C to break

56 bytes from 1.1.1.2: icmp_seq=0 ttl=255 time=2.015 ms

*Apr 22 15:57:56:173 2022 Sysname ARP/7/ARP_SEND: -MDC=1; Sent an ARP message, operation: 1, sender MAC: 68cb-978f-0106, sender IP: 1.1.1.1, target MAC: 0000-0000-0000, target IP: 1.1.1.2

*Apr 22 15:57:56:173 2022 Sysname ARP/7/ARP_RCV: -MDC=1; Received an ARP message, operation: 2, sender MAC: 68cb-9c3f-0206, sender IP: 1.1.1.2, target MAC: 68cb-978f-0106, target IP: 1.1.1.1

*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_ENTRY: -MDC=1;

-------------ADJ4 Entry------------

IP address : 1.1.1.2

Route interface : GE2/0/1

Service type : Ethernet

Action type : Forwarding

Link media type : Broadcast

Physical interface : GE2/0/1

Logical interface : N/A

VSI Index : 4294967295

VPN Index : 0

MicroSegmentID : 0

MicSegOrigin : 5

Virtual Circuit information : 0xffff

Sequence : 1

Sequence for aging : 1

Slot : 0

MTU : 1500

*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_ENTRY: -MDC=1;

Add ADJ entry finished, Result : 0

*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_HARDWARE: -MDC=1;

====Start ADJLINK Add====

*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_HARDWARE: -MDC=1;

--------------- New Entry -------------

Service type : Ethernet

Link media type : Broadcast

Action type : Forwarding

EntryAttr : 0

IP address : 1.1.1.2

Route interface : GE2/0/1

Port interface : N/A

Slot : 0

MTU : 1500

VLAN ID : 65535

Second VLAN ID : 65535

Physical interface : GE2/0/1

Logical interface : N/A

VRF index : 0

VSI index : -1

VSI link ID : 65535

Usr ID : -1

MAC address : 68cb-9c3f-0206

Link head length(IP) : 14

Link head length(MPLS) : 14

Link head information(IP) : 68cb9c3f020668cb978f01060800

Link head information(MPLS) : 68cb9c3f020668cb978f01068847

*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_HARDWARE: -MDC=1;

----------- New Entry DrvContext ---------

Nexthop driver

[0]: 0xffffffff [1]: 0xffffffff

Driver context

[0]: 0xffffffff [1]: 0xffffffff [2]: 0xffffffff [3]: 0xffffffff [4]: 0xffffffff [5]: 0xffffffff

TRILL VN driver context

[0]: 0xffffffffffffffff [1]: 0xffffffffffffffff

*Apr 22 15:57:56:174 2022 Sysname ADJ4/7/ADJ4_HARDWARE: -MDC=1;

====End ADJLINK Operate====

Result : 0x0, Reference flag : 0x0, Syn flag : 0x0

56 bytes from 1.1.1.2: icmp_seq=1 ttl=255 time=1.061 ms

56 bytes from 1.1.1.2: icmp_seq=2 ttl=255 time=0.908 ms

56 bytes from 1.1.1.2: icmp_seq=3 ttl=255 time=0.625 ms

56 bytes from 1.1.1.2: icmp_seq=4 ttl=255 time=0.580 ms

--- Ping statistics for 1.1.1.2 ---

5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss

round-trip min/avg/max/std-dev = 0.580/1.038/2.015/0.520 ms

[Sysname-probe]%Apr 22 15:57:56:986 2022 Sysname PING/6/PING_STATISTICS: -MDC=1; Ping statistics for 1.1.1.2: 5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss, round-trip min/avg/max/std-dev = 0.580/1.038/2.015/0.520 ms.

¡ If the Result field displays 0x0, the ARP entry has been successfully deployed to the driver. Go to the next step.

¡ If the Result field does not display 0x0, the ARP entry failed to be deployed to the driver. Check the hardware resource usage under the guidance of H3C technical support staff.
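The Result check in step 2 can be sketched as follows. This hypothetical Python helper assumes the ADJLINK summary format shown in the example (Result : 0x..., Reference flag : ...). The regular expression requires the 0x prefix, so it intentionally skips the separate "Add ADJ entry finished, Result : 0" line.

```python
import re

RESULT_RE = re.compile(r"Result\s*:\s*(0x[0-9a-fA-F]+)")


def adjlink_results(debug_output):
    """Return the hexadecimal Result values from ADJLINK operate summaries."""
    return [int(v, 16) for v in RESULT_RE.findall(debug_output)]


def deployed_to_driver(debug_output):
    """True only if at least one Result was found and every Result is 0x0."""
    results = adjlink_results(debug_output)
    return bool(results) and all(r == 0 for r in results)


saved = ("Add ADJ entry finished, Result : 0\n"
         "====End ADJLINK Operate====\n"
         "Result : 0x0, Reference flag : 0x0, Syn flag : 0x0\n")
print(deployed_to_driver(saved))
```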

3. Execute the following commands, collect the command outputs, and send them to H3C technical support staff:

¡ debugging system internal adj4 (with the notify keyword specified)

¡ debugging system internal fib prefix

4. Collect the following information and contact H3C Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

DHCP issues

· This section describes how to troubleshoot attack protection issues on the DHCP server.

· For details about how the DHCP attack protection features are implemented, see DHCP Attack Protection Technology White Paper.

DHCP starvation attack prevention issues

About DHCP starvation attack prevention

A DHCP starvation attack occurs when an attacker constantly sends forged DHCP requests using different MAC addresses in the chaddr field to a DHCP server. As a result, legitimate DHCP clients cannot obtain IP addresses, because the IP address resources of the DHCP server are exhausted. To resolve this issue, enable the DHCP starvation attack prevention feature for the DHCP server.

Symptom

· Although the DHCP starvation attack prevention feature is enabled, the DHCP server still frequently runs out of IP address resources.

· A legitimate user cannot obtain any IP address from the DHCP server, because its requests are regarded as attack packets.

Common causes

The following are the common causes of this type of issue:

· The DHCP starvation attack prevention feature is not enabled on the client-facing interfaces of the DHCP server.

· When multiple DHCP relay agents exist between a DHCP client and the DHCP server, the DHCP server or non-first-hop relay agents are enabled with the MAC address check feature.

· The maximum number of ARP entries or MAC addresses that a client-facing interface can learn is set inappropriately.

Troubleshooting flow

Figure 42 shows the troubleshooting flowchart.

Figure 42 Flowchart for troubleshooting DHCP starvation attack prevention issues

Solution

1. Check whether the DHCP starvation attack prevention feature is enabled on the client-facing interfaces of the DHCP server.

NOTE:

Take this step when DHCP clients are directly connected to the DHCP server. If DHCP clients are connected to a DHCP relay agent, proceed to step 2.

For better protection, configure the DHCP server to defend against starvation attacks that use DHCP requests with different MAC addresses and attacks that use DHCP requests with the same MAC address.

To achieve DHCP starvation attack prevention against DHCP requests with different MAC addresses:

¡ For a Layer 3 interface, use the arp max-learning-num command in Layer 3 interface view to set an ARP entry learning limit.

¡ For a Layer 2 interface, perform the following operations in Layer 2 interface view:

- Use the mac-address max-mac-count command to set a MAC learning limit.

- Use the undo mac-address max-mac-count enable-forwarding command to prevent the interface from forwarding frames with unknown source MAC addresses after its MAC learning limit is reached.

You can use the display this command to view the configuration of a client-facing interface on the DHCP server.

¡ Display Layer 3 interface configuration.

<Sysname> system-view

[Sysname] interface GigabitEthernet 2/0/1

[Sysname-GigabitEthernet2/0/1] display this

interface GigabitEthernet2/0/1

port link-mode route

arp max-learning-num 10

...

If no ARP entry limit is configured on the interface, use the arp max-learning-num command in Layer 3 interface view to set an ARP entry learning limit.

¡ Display Layer 2 interface configuration.

<Sysname> system-view

[Sysname] interface GigabitEthernet 2/0/1

[Sysname-GigabitEthernet2/0/1] display this

interface GigabitEthernet2/0/1

port link-mode bridge

mac-address max-mac-count 600

undo mac-address max-mac-count enable-forwarding

...

If the interface has no DHCP starvation attack prevention configuration, perform the following operations in Layer 2 interface view:

- Use the mac-address max-mac-count command to set a MAC learning limit.

- Use the undo mac-address max-mac-count enable-forwarding command to prevent the interface from forwarding frames with unknown source MAC addresses after its MAC learning limit is reached.

To defend against DHCP starvation attacks that use DHCP requests with the same source MAC address but different chaddr values, use the dhcp relay check mac-address command to enable MAC address check on all client-facing interfaces. With the MAC address check feature, the DHCP server compares the chaddr field of a received DHCP request with the source MAC address in the frame header. If they are the same, the DHCP server considers the request legitimate and continues processing it. If they are different, the DHCP server discards the request.

You can use the display this command to check whether the MAC address check feature is enabled on a client-facing interface of the DHCP server.

<Sysname> system-view

[Sysname] interface GigabitEthernet 2/0/1

[Sysname-GigabitEthernet2/0/1] display this

interface GigabitEthernet2/0/1

port link-mode route

dhcp relay check mac-address

...

If the MAC address check feature is not enabled, use the dhcp relay check mac-address command to enable this feature on the interface.
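The following Python sketch illustrates the comparison that the MAC address check performs, as described in step 1. It extracts the Ethernet source MAC address and the chaddr field from a raw DHCP frame and compares them. It models the logic only and is not the device implementation. It assumes an untagged Ethernet II frame that carries IPv4 and UDP. build_frame is a hypothetical helper that creates a minimal frame for testing.

```python
def mac_check_passes(frame: bytes) -> bool:
    """Return True if the Ethernet source MAC equals the first hlen bytes of chaddr."""
    src_mac = frame[6:12]               # Ethernet II: dst(6), src(6), type(2)
    ip_start = 14
    ihl = (frame[ip_start] & 0x0F) * 4  # IPv4 header length in bytes
    dhcp = frame[ip_start + ihl + 8:]   # skip the 8-byte UDP header
    hlen = dhcp[2]                      # hardware address length (6 for Ethernet)
    chaddr = dhcp[28:28 + hlen]         # chaddr starts at offset 28 of the DHCP message
    return chaddr == src_mac


def build_frame(src_mac: bytes, chaddr: bytes) -> bytes:
    """Hypothetical test helper: a minimal frame; unused header fields are zero."""
    eth = b"\xff" * 6 + src_mac + b"\x08\x00"
    ip = bytes([0x45]) + bytes(19)      # version 4, IHL 5 (20 bytes), rest zeroed
    udp = bytes(8)
    # op=1 (request), htype=1 (Ethernet), hlen=6, hops=0, then 24 bytes up to chaddr.
    dhcp = bytes([1, 1, 6, 0]) + bytes(24) + chaddr + bytes(16 - len(chaddr))
    return eth + ip + udp + dhcp


client = bytes.fromhex("68cb9c3f0206")
relay = bytes.fromhex("68cb978f0106")
print(mac_check_passes(build_frame(client, client)))  # source MAC matches chaddr
print(mac_check_passes(build_frame(relay, client)))   # source MAC rewritten by a relay
```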

2. Check whether the DHCP starvation attack prevention feature is configured correctly on the DHCP server or DHCP relay agent.

NOTE:

Take this step when a DHCP client is connected to a DHCP relay agent for communication with the DHCP server. If no DHCP relay agent is deployed on the network, skip this step.

a. Check whether an ARP entry learning limit or MAC learning limit is configured on the client-facing interfaces of the DHCP relay agent or the DHCP server. The check process is similar to step 1.

b. Check whether the DHCP server or non-first-hop relay agents are enabled with the MAC address check feature.

When a Layer 3 device forwards a DHCP request toward the DHCP server, it replaces the source MAC address of the request with its own MAC address, while the chaddr field still carries the client MAC address. As a result, if the MAC address check feature is enabled on the DHCP server or on a non-first-hop DHCP relay agent, that device considers the request an attack packet and discards it.

When multiple DHCP relay agents exist between a DHCP client and the DHCP server, follow these guidelines as a best practice:

- Disable the MAC address check feature on the client-facing interfaces of the DHCP server and non-first-hop DHCP relay agents.

To disable the MAC address check feature on a client-facing interface of the DHCP server or a non-first-hop DHCP relay agent, use the undo dhcp relay check mac-address command in the interface view.

- Enable the MAC address check feature only on the client-facing interfaces of the first-hop DHCP relay agent.

For more information about how to check whether the MAC address check feature is enabled on a DHCP relay agent, see step 1.
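The following Python sketch models why the MAC address check should be enabled only on the first-hop relay agent. It is a simplified simulation with hypothetical device names and placeholder MAC values. Each device on the path applies the check and then forwards the request with its own MAC address as the source. The chaddr field keeps the client MAC address throughout.

```python
def simulate_mac_check(client_mac, path):
    """Return {device: passes_check} for each device on the path, in order.

    path lists devices from the first-hop relay agent to the DHCP server.
    chaddr always carries client_mac; the frame source MAC changes per hop.
    """
    results = {}
    src_mac = client_mac  # the client sends the first frame itself
    for device in path:
        results[device] = (src_mac == client_mac)
        src_mac = f"mac-of-{device}"  # placeholder: device forwards with its own MAC
    return results


print(simulate_mac_check("68cb-9c3f-0206", ["relay1", "relay2", "server"]))
```

On a two-relay path, only relay1 (the first hop) sees matching addresses. If the check were enabled on relay2 or the server, they would discard the request.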

3. Check whether the maximum number of ARP entries or MAC addresses that a client-facing interface can learn is set appropriately.

Use the display this command in the view of a client-facing interface on the DHCP server to view the ARP entry learning limit or MAC learning limit on that interface.

¡ Display Layer 3 interface configuration.

<Sysname> system-view

[Sysname] interface GigabitEthernet 2/0/1

[Sysname-GigabitEthernet2/0/1] display this

interface GigabitEthernet2/0/1

port link-mode route

arp max-learning-num 10

...

If no ARP entry limit is configured on the interface, use the arp max-learning-num command in Layer 3 interface view to set an ARP entry learning limit.

¡ Display Layer 2 interface configuration.

<Sysname> system-view

[Sysname] interface GigabitEthernet 2/0/1

[Sysname-GigabitEthernet2/0/1] display this

interface GigabitEthernet2/0/1

port link-mode bridge

mac-address max-mac-count 600

...

If the ARP entry learning limit or MAC learning limit is much greater than the number of assignable IP addresses on the DHCP server, an attacker can exhaust the address pool before the limit takes effect, and numerous users will fail to obtain IP addresses. If the limit is too small, the DHCP server might discard DHCP requests from legitimate users.

To ensure successful IP address acquisition and correct communication between legitimate users and the DHCP server, set a reasonable ARP entry learning limit or MAC learning limit. As a best practice, use the default ARP entry learning limit or MAC learning limit. If the default one cannot meet the service requirement, you can use the arp max-learning-num command or the mac-address max-mac-count command in interface view to set a new learning limit.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ The debugging output collected after you execute the debugging dhcp server all command.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

ND issues

ND learning failure

Symptom

The device cannot learn ND entries, causing traffic forwarding failure.

Common causes

The following are the common causes of this type of issue:

· The memory is insufficient.

· The physical layer state of the interface is not up.

· The IPv6 addresses of the local interface and the peer interface do not reside on the same network segment.

· ND packets fail to be sent to the CPU.

· A card is faulty.

· ND packets are dropped due to a busy CPU.

Troubleshooting flow

Figure 43 shows the troubleshooting flowchart.

Figure 43 Flowchart for troubleshooting ND learning failure

Solution

1. Use the display memory-threshold command to identify whether the memory is insufficient.

<Sysname> display memory-threshold

Memory usage threshold: 100%

Free-memory thresholds:

Minor: 96M

Severe: 64M

Critical: 48M

Normal: 128M

Early-warning: 256M

Secure: 304M

Current free-memory state: Normal (secure)

¡ If the Current free-memory state field displays Normal or Normal (secure), go to the next step.

¡ If the Current free-memory state field displays Minor, Severe, Critical, or Normal (early-warning), check the device memory usage and troubleshoot the insufficient memory issue.
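The decision in step 1 can be sketched in Python for saved output of the display memory-threshold command. The helper name is hypothetical. Following the two bullets above, it treats only Normal and Normal (secure) as acceptable states.

```python
def memory_ok(display_output):
    """Return (acceptable, state) from the Current free-memory state field."""
    for line in display_output.splitlines():
        if line.strip().startswith("Current free-memory state:"):
            state = line.split(":", 1)[1].strip()
            return state in ("Normal", "Normal (secure)"), state
    raise ValueError("Current free-memory state field not found")


print(memory_ok("Memory usage threshold: 100%\nCurrent free-memory state: Normal (secure)\n"))
```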

2. Check the network configuration and interface state.

a. Use the display interface command to identify whether the interface is up. If the interface is not up, troubleshoot the issue.

b. Use the display ipv6 fib ipv6-address command to view IPv6 FIB entry information, where the ipv6-address argument specifies the IPv6 address in the ND entry. If the corresponding IPv6 FIB entry does not exist, the routing module might be faulty. For more information about troubleshooting routing module issues, see "Troubleshooting Layer 3—IP Routing." If the corresponding IPv6 FIB entry exists but its next hop is not the directly connected next hop, check the connection between the device and its next hop.

c. Use the display ipv6 interface command to view the IPv6 address of the interface.

- Identify whether the IPv6 address of the local interface resides on the same network segment as the peer interface. If the IPv6 addresses reside on different network segments, execute the ipv6 address command in interface view to edit the IPv6 addresses.

- Identify whether the local interface IPv6 address conflicts with the peer interface IPv6 address. If a conflict has occurred, execute the ipv6 address command in interface view to edit the IPv6 addresses.

- Identify whether the peer interface is the one where the next hop resides.

d. Use the ping ipv6 command to identify whether a link failure exists.
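The address checks in step 2c can be sketched with the standard Python ipaddress module. The helper name is hypothetical. The sketch assumes that the local prefix length, such as /64, is known from the display ipv6 interface output. It does not check the prefix length configured on the peer.

```python
import ipaddress


def check_ipv6_pair(local_if, peer_addr):
    """local_if: local address with prefix length, e.g. '1::1/64'; peer_addr: e.g. '1::2'."""
    local = ipaddress.ip_interface(local_if)
    peer = ipaddress.ip_address(peer_addr)
    if local.ip == peer:
        return "address conflict"
    if peer not in local.network:
        return "different network segments"
    return "same network segment"


print(check_ipv6_pair("1::1/64", "1::2"))
```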

3. Identify whether IPv6 packets are sent and received correctly.

a. Use the debugging ipv6 packet command to enable IPv6 packet debugging. Then, execute the ping ipv6 command to identify whether the device sends and receives IPv6 packets correctly.

<Sysname> debugging ipv6 packet

<Sysname> ping ipv6 -c 1 1::2

Ping6(56 data bytes) 1::1 --> 1::2, press CTRL+C to break

*Apr 26 11:37:33:402 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

LocalSending, version = 6, traffic class = 0,

flow label = 0, payload length = 64, protocol = 58, hop limit = 64,

Src = 1::1, Dst = 1::2,

prompt: Output an IPv6 Packet.

*Apr 26 11:37:33:402 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

Sending, interface = GigabitEthernet2/0/1, version = 6, traffic class = 0,

flow label = 0, payload length = 64, protocol = 58, hop limit = 64,

Src = 1::1, Dst = 1::2,

prompt: Sending the packet from local interface GigabitEthernet2/0/1.

The command output indicates that the device has successfully sent an IPv6 packet on interface GigabitEthernet2/0/1.

*Apr 26 11:37:33:402 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

LocalSending, version = 6, traffic class = 224,

flow label = 0, payload length = 32, protocol = 58, hop limit = 255,

Src = 1::1, Dst = ff02::1:ff00:2,

prompt: Output an IPv6 Packet.

*Apr 26 11:37:33:402 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

Sending, interface = GigabitEthernet2/0/1, version = 6, traffic class = 224,

flow label = 0, payload length = 32, protocol = 58, hop limit = 255,

Src = 1::1, Dst = ff02::1:ff00:2,

prompt: Sending the packet from local interface GigabitEthernet2/0/1.

56 bytes from 1::2, icmp_seq=0 hlim=64 time=19.336 ms

--- Ping6 statistics for 1::2 ---

1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss

round-trip min/avg/max/std-dev = 19.336/19.336/19.336/0.000 ms

<Sysname>*Apr 26 11:37:33:421 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

Receiving, interface = GigabitEthernet2/0/1, version = 6, traffic class = 0,

flow label = 0, payload length = 64, protocol = 58, hop limit = 64,

Src = 1::2, Dst = 1::1,

prompt: Received an IPv6 packet.

The command output indicates that the device has received an IPv6 packet.

*Apr 26 11:37:33:421 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

Delivering, interface = GigabitEthernet2/0/1, version = 6, traffic class = 0,

flow label = 0, payload length = 64, protocol = 58, hop limit = 64,

Src = 1::2, Dst = 1::1,

prompt: Delivering the IPv6 packet to the upper layer.

The command output indicates that the device sent the received IPv6 packet to the CPU.

%Apr 26 11:37:33:422 2022 Sysname PING/6/PING_STATISTICS: -MDC=1; Ping6 statistics for 1::2: 1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss, round-trip min/avg/max/std-dev = 19.336/19.336/19.336/0.000 ms.

- If the device has sent and received IPv6 packets successfully, go to step 4.

- If the device failed to send or receive an IPv6 packet, go to step b.

b. Use the debugging ipv6 error command to enable IPv6 packet error debugging. Identify the IPv6 packet sending or receiving failure cause according to Table 8.

Table 8 Output from the debugging ipv6 error command

Field	Description
Number of IPv6 fragments exceeded the threshold.	Number of IPv6 fragments exceeded the threshold.
Number of IPv6 reassembly queues exceeded the threshold.	Number of IPv6 reassembly queues exceeded the threshold.
Invalid IPv6 packet.	The IPv6 packet was invalid.
Failed to process the hop-by-hop extension header.	The system failed to process the hop-by-hop extension header.
Failed to process the hop-by-hop option.	The system failed to process the hop-by-hop option in the packet.
The packet was discarded by services.	The packet was discarded by the service.
The packet was administratively discarded.	The IPv6 packet was administratively discarded.
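If the debugging output from step 3a has been captured as text (for example, from a terminal log), the send and receive checks can be done offline. The following Python sketch is a hypothetical helper, not an H3C tool. It relies only on the record layout shown in the sample output above: an IP6FW_PACKET header, an action keyword such as Sending or Delivering, and the Src and Dst fields.

```python
from __future__ import annotations

import re

# One IP6FW_PACKET record: the action keyword, then (possibly several lines
# later) the Src and Dst fields. "LocalSending" is listed first so that it is
# not reported as "Sending".
RECORD_RE = re.compile(
    r"(?P<action>LocalSending|Sending|Receiving|Delivering),.*?"
    r"Src = (?P<src>[0-9A-Fa-f:]+), Dst = (?P<dst>[0-9A-Fa-f:]+)",
    re.S,
)


def packet_actions(debug_text: str) -> list[tuple[str, str, str]]:
    """Return (action, src, dst) for each IP6FW_PACKET record in debug_text."""
    actions = []
    # Everything after each "IP6FW_PACKET:" marker belongs to one record.
    for record in debug_text.split("IP6FW_PACKET:")[1:]:
        match = RECORD_RE.search(record)
        if match:
            actions.append((match["action"], match["src"], match["dst"]))
    return actions


def round_trip_ok(debug_text: str, local: str, peer: str) -> bool:
    """True if a packet was sent out to peer and a reply was delivered upward."""
    actions = set(packet_actions(debug_text))
    sent = ("Sending", local, peer) in actions
    delivered = ("Delivering", peer, local) in actions
    return sent and delivered
```

For the ping in step 3a, round_trip_ok(text, "1::1", "1::2") is True only when both the Sending record for 1::1 to 1::2 and the Delivering record for 1::2 to 1::1 are present.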

4. Identify whether a card is faulty. Use the display system internal nd statistics command in probe view to view the ND statistics of each card. The following uses the card in slot 1 as an example.

<Sysname> system-view

[Sysname] probe

[Sysname-probe] display system internal nd statistics slot 1

Entry statistics:

Valid : 1 Dummy : 0

Packet : 1 OpenFlow : 0

Long static : 0 Short static : 0

Temp node : 0 Rule : 0

Static statistics:

Short : 0 Long interface : 0

Long port : 0

Process statistics:

Input : 7 Resolving : 11

Error statistics:

Memory : 0 Sync : 0

Packet : 0 Parameter : 0

Anchor : 0 Get address : 0

Refresh FIB : 0 Delete FIB : 0

Realtime Sync : 0 Temp node : 0

Exceed limit : 0 Refresh rule : 0

Delete rule : 0 Smooth rule start : 0

Smooth rule end : 0 RA : 0

Origin : 0 Final RA : 0

a. If the value for the Input field is not 0, go to the next step. If the value for the Input field is 0, troubleshoot the card issue.

b. Collect the content of the Error statistics field and send it to H3C technical support staff.
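The step 4 checks on display system internal nd statistics output can be sketched in Python as follows. This is a hypothetical offline helper that assumes the "Name : value" layout and the "... statistics:" section headers shown in the sample output. Section tracking is needed because names such as Packet and Temp node appear in more than one section.

```python
from __future__ import annotations

import re

# A section header such as "Error statistics:".
SECTION_RE = re.compile(r"^\s*(\w+) statistics:\s*$")
# A "Name : value" pair; names may contain spaces ("Smooth rule end").
FIELD_RE = re.compile(r"([A-Za-z][A-Za-z ]*?)\s*:\s*(\d+)")


def parse_nd_statistics(text: str) -> dict[str, dict[str, int]]:
    """Group the counters of the ND statistics output by section name."""
    sections: dict[str, dict[str, int]] = {}
    current = None
    for line in text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            current = sections.setdefault(header.group(1), {})
        elif current is not None:
            for name, value in FIELD_RE.findall(line):
                current[name] = int(value)
    return sections


def check_card(text: str) -> tuple[bool, dict[str, int]]:
    """Return (card receives ND packets, non-zero Error statistics counters)."""
    sections = parse_nd_statistics(text)
    receives = sections.get("Process", {}).get("Input", 0) != 0
    errors = {name: n for name, n in sections.get("Error", {}).items() if n}
    return receives, errors
```

A False first element corresponds to step 4a (troubleshoot the card); the dictionary of non-zero counters is what step 4b asks you to send to technical support.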

5. Identify whether ND packets are dropped because the CPU is busy. Use the view command to display the ND-related queues in /proc/kque, and identify whether and why ND packets are dropped.

[Sysname-probe] view /proc/kque | in ND

0: dd0e0a00 ARP_SEND 1024/0/0/0 (0x4b515545)

0: dd0e6d00 ND_TIMER 1024/0/5/0 (0x4b515545)

0: dd0e6e00 ND_SINGLEEVENT 1/0/0/0 (0x4b515545)

0: dd0e6f00 ND_MACNOTIFYEVENT 1/0/0/0 (0x4b515545)

0: dcec4000 ND_RULE 4096/0/0/0 (0x4b515545)

0: dcec4200 ND_MICROSEGMENT 2048/0/0/0 (0x4b515545)

0: dcec4300 ND_MACNOTIFY 2048/0/0/0 (0x4b515545)

0: dcec4400 ND_MAC_EVENT 1/0/0/0 (0x4b515545)

0: d2da7800 OVERLAY_VNDEL 1/0/0/0 (0x4b515545)

0: ca5f3800 FIB6NDHRQ 1/0/0/0 (0x4b515545)

0: ca3f7600 ND_VSISUP_PKT 4096/0/0/0 (0x4b515545)

0: ca3f7400 NDSNP_PKT 4096/0/0/0 (0x4b515545)

0: ca3f7700 NDRAPG_PKT 4096/0/0/0 (0x4b515545)

0: ca3f7800 ND_EVENT 8192/0/1/0 (0x4b515545)

0: ca3f7900 ND_PKT 4096/0/1/0 (0x4b515545)

View the value for the ND_PKT field in the command output, which is displayed in the W/X/Y/Z format.

¡ W represents the queue capacity, which is a fixed value.

¡ X represents the current queue size.

¡ Y represents the historical maximum length of the queue.

¡ Z represents the number of dropped ND packets in the queue.

If Z is not 0 and Y equals W, ND packets are dropped due to a busy CPU. If Z is 0, go to the next step.
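The W/X/Y/Z rule in step 5 can be expressed as a short Python sketch. The regular expression assumes the line layout shown in the sample /proc/kque output, and the helper names are hypothetical. The sketch encodes only the rule stated above; the guide does not define a verdict for a non-zero Z when Y is less than W.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# "<n>: <address> <queue name> W/X/Y/Z (<magic>)", as in the /proc/kque output.
KQUE_RE = re.compile(r"^\s*\d+:\s+[0-9A-Fa-f]+\s+(\S+)\s+(\d+)/(\d+)/(\d+)/(\d+)")


class QueueStats(NamedTuple):
    capacity: int     # W: fixed queue capacity
    current: int      # X: current queue size
    history_max: int  # Y: historical maximum queue length
    dropped: int      # Z: packets dropped from the queue


def queue_stats(kque_text: str, name: str = "ND_PKT") -> QueueStats | None:
    """Return the W/X/Y/Z counters of the named queue, or None if absent."""
    for line in kque_text.splitlines():
        match = KQUE_RE.match(line)
        if match and match.group(1) == name:
            return QueueStats(*(int(g) for g in match.groups()[1:]))
    return None


def dropped_due_to_busy_cpu(stats: QueueStats) -> bool:
    """Rule from step 5: Z is not 0 and Y equals W."""
    return stats.dropped != 0 and stats.history_max == stats.capacity
```

For the sample line "ND_PKT 4096/0/1/0", Z is 0, so the rule is not met and you go to the next step.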

6. Collect specific information about the ND process. Execute the display mdc command to obtain the MDC number, and use the display process command to obtain the process ID of the ND process for that MDC. Then, use the view command to display information about the ND process based on the process ID, and send the information to H3C Technical Support.

[Sysname-probe] display process name knd/1

Job ID: 55763

PID: 55763

Parent JID: 2

Parent PID: 2

Executable path: -

Instance: 0

Respawn: OFF

Respawn count: 1

Max. spawns per minute: 0

Last started: Tue Apr 26 11:32:31 2022

Process state: sleeping

Max. core: 0

ARGS: -

TID LAST_CPU Stack PRI State HH:MM:SS:MSEC Name

55763 0 0K 115 S 0:0:13:490 [kND/1]

The "1" in "knd/1" indicates that the MDC number is 1. In the displayed information above, the "PID" value shows that the process ID of the ND process is 55763. Next, execute the view command to display detailed information about the ND process with process ID 55763.

[Sysname-probe] view /proc/55763/stack

[<c04c9cd4>] kepoll_wait+0x274/0x3c0

[<e2021612>] nd_Thread+0x62/0x100 [system]

[<c043f1b4>] kthread+0xd4/0xe0

[<c0401daf>] kernel_thread_helper+0x7/0x10

[<ffffffff>] 0xffffffff
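Deriving the view command from the display process output in step 6 can be sketched in Python as follows. The helper names are hypothetical. The sketch assumes the "PID:" line layout and the "knd/<MDC number>" process name shown above, and deliberately ignores the "Parent PID:" line.

```python
import re

# Matches the "PID:" line but not "Parent PID:".
PID_RE = re.compile(r"^\s*PID:\s*(\d+)\s*$", re.M)


def mdc_from_process_name(name: str) -> int:
    """'knd/1' -> 1: the number after the slash is the MDC number."""
    return int(name.rsplit("/", 1)[1])


def stack_view_command(display_process_output: str) -> str:
    """Build the probe-view command that shows the ND process stack."""
    match = PID_RE.search(display_process_output)
    if match is None:
        raise ValueError("no 'PID:' line in the display process output")
    return f"view /proc/{match.group(1)}/stack"
```

With the output shown above, stack_view_command returns "view /proc/55763/stack", which is the command used in this step.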

7. Collect the following information and contact H3C Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

NS packet response failure

Symptom

The device does not reply to the NS packet sent from the peer device.

Common causes

The following are the common causes of this type of issue:

· The destination IPv6 address in the NS packet received by the interface is not the IPv6 address of the local device.

Troubleshooting flow

Figure 44 shows the troubleshooting flowchart.

Figure 44 Flowchart for troubleshooting NS packet response failure

Solution

1. View information about the NS packet sent from the peer device to identify whether it is sent to the CPU.

a. Use the debugging ipv6 packet command to enable IPv6 packet debugging. Then, configure the peer device to send an NS packet to the local device.

<Sysname> debugging ipv6 packet

*Apr 26 13:33:34:897 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

Receiving, interface = GigabitEthernet2/0/1, version = 6, traffic class = 0,

flow label = 0, payload length = 64, protocol = 58, hop limit = 64,

Src = 1::2, Dst = 1::1,

prompt: Received an IPv6 packet.

- If the destination IPv6 address is not the local device IPv6 address, check the routing table and FIB of the peer device.

- If the destination IPv6 address is the local device IPv6 address, go to step b.

b. Use the debugging ipv6 error command to enable IPv6 packet error debugging. Identify the NS packet response failure cause according to Table 9.

Table 9 Output from the debugging ipv6 error command

Field	Description
Number of IPv6 fragments exceeded the threshold.	Number of IPv6 fragments exceeded the threshold.
Number of IPv6 reassembly queues exceeded the threshold.	Number of IPv6 reassembly queues exceeded the threshold.
Invalid IPv6 packet.	The IPv6 packet was invalid.
Failed to process the hop-by-hop extension header.	The system failed to process the hop-by-hop extension header.
Failed to process the hop-by-hop option.	The system failed to process the hop-by-hop option in the packet.
The packet was discarded by services.	The packet was discarded by the service.
The packet was administratively discarded.	The IPv6 packet was administratively discarded.

2. Identify whether a card is faulty. Use the display system internal nd statistics command in probe view to display ND statistics on each card. The following uses the card in slot 1 as an example.

<Sysname> system-view

[Sysname] probe

[Sysname-probe] display system internal nd statistics slot 1

Entry statistics:

Valid : 1 Dummy : 0

Packet : 1 OpenFlow : 0

Long static : 0 Short static : 0

Temp node : 0 Rule : 0

Static statistics:

Short : 0 Long interface : 0

Long port : 0

Process statistics:

Input : 7 Resolving : 11

Error statistics:

Memory : 0 Sync : 0

Packet : 0 Parameter : 0

Anchor : 0 Get address : 0

Refresh FIB : 0 Delete FIB : 0

Realtime Sync : 0 Temp node : 0

Exceed limit : 0 Refresh rule : 0

Delete rule : 0 Smooth rule start : 0

Smooth rule end : 0 RA : 0

Origin : 0 Final RA : 0

¡ Check the Input field to identify whether the card receives ND packets correctly.

¡ Collect the content of the Error statistics field and send it to H3C technical support staff.

3. Collect the following information and contact H3C Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Traffic forwarding failure based on the existing ND entry

Symptom

The device has learned an ND entry but cannot forward traffic correctly.

Common causes

The following are the common causes of this type of issue:

· An abnormal parameter exists in the learned ND entry.

· The learned ND entry failed to be deployed to the driver.

Troubleshooting flow

Figure 45 shows the troubleshooting flowchart.

Figure 45 Flowchart for troubleshooting traffic forwarding failure based on the existing ND entry

Solution

1. Use the display system internal adj6 entry command to identify whether an abnormal parameter exists in the learned ND entry. The following uses interface GigabitEthernet2/0/1 and peer IPv6 address 1::2 as an example.

<Sysname> system-view

[Sysname] probe

[Sysname-probe] display system internal adj6 entry 1::2 interface gigabitethernet 2/0/1

ADJ6 entry:

Entry attribute : 0x0

Service type : Ethernet

Link media type : Broadcast

Action type : Forwarding

Entry flag : 0x4

Forward type : 0x0

Slot : 0

MTU : 1500

Driver flag : 2

Sequence No : 17

Physical interface : GE2/0/1

Logical interface : N/A

Virtual circuit information : 65535

ADJ index : 0xdc780c38

Peer address : ::

Reference count : 0

Reference Sequence : 3

MicroSegmentID : 0

Nexthop driver[0] : 0xffffffff

Nexthop driver[1] : 0xffffffff

Driver context[0] : 0xffffffff

Driver context[1] : 0xffffffff

Driver context[2] : 0xffffffff

Driver context[3] : 0xffffffff

Driver context[4] : 0xffffffff

Driver context[5] : 0xffffffff

Link head information(IPv6) : 68cb9c3f020668cb978f010686dd

Link head information(MPLS) : 68cb9c3f020668cb978f01068847

¡ If the Action type field displays Forwarding, the device can forward traffic to 1::2 correctly and the ND entry is not faulty.

¡ If the Action type field displays Drop, the device fails to forward traffic to 1::2 because an abnormal parameter exists in the learned ND entry.

- If the Driver flag field displays 4, driver resources are insufficient. Check the driver usage.

- If the Driver flag field does not display 4, go to the next step.
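The decision in step 1 can be sketched as a Python helper that reads the "Field : value" lines of display system internal adj6 entry output. The helper and its return strings are hypothetical. Splitting each line at the first colon keeps values such as "::" in the Peer address field intact.

```python
from __future__ import annotations


def parse_adj6_entry(text: str) -> dict[str, str]:
    """Map each 'Field : value' line to its value (split at the first colon)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def diagnose_adj6(text: str) -> str:
    """Apply the step 1 checks on the Action type and Driver flag fields."""
    fields = parse_adj6_entry(text)
    action = fields.get("Action type")
    if action == "Forwarding":
        return "ND entry is normal"
    if action == "Drop":
        if fields.get("Driver flag") == "4":
            return "insufficient driver resources: check the driver usage"
        return "go to step 2"
    return f"unexpected Action type: {action!r}"
```

For the sample entry above (Action type Forwarding, Driver flag 2), diagnose_adj6 reports that the ND entry is normal.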

2. Use the debugging system internal adj6 command and specify the hardware keyword to enable IPv6 adjacency entry debugging. Use the ping ipv6 command to trigger ND learning. Identify whether the ND entry is successfully deployed to the driver.

[Sysname-probe] debugging system internal adj6 hardware

[Sysname-probe] ping ipv6 -c 1 1::2

Ping6(56 data bytes) 1::1 --> 1::2, press CTRL+C to break

56 bytes from 1::2, icmp_seq=0 hlim=64 time=2.868 ms

--- Ping6 statistics for 1::2 ---

1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss

round-trip min/avg/max/std-dev = 2.868/2.868/2.868/0.000 ms

<Sysname>*Apr 26 16:06:42:412 2022 Sysname IP6PMTU/7/IP6PMTU_DBG: -MDC=1; Binding socket to PMTU succeeded

*Apr 26 16:06:42:412 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

LocalSending, version = 6, traffic class = 0,

flow label = 0, payload length = 64, protocol = 58, hop limit = 64,

Src = 1::1, Dst = 1::2,

prompt: Output an IPv6 Packet.

*Apr 26 16:06:42:412 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

Sending, interface = GigabitEthernet0/0/1, version = 6, traffic class = 0,

flow label = 0, payload length = 64, protocol = 58, hop limit = 64,

Src = 1::1, Dst = 1::2,

prompt: Sending the packet from local interface GigabitEthernet0/0/1.

*Apr 26 16:06:42:413 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

LocalSending, version = 6, traffic class = 224,

flow label = 0, payload length = 32, protocol = 58, hop limit = 255,

Src = 1::1, Dst = ff02::1:ff00:2,

prompt: Output an IPv6 Packet.

*Apr 26 16:06:42:413 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

Sending, interface = GigabitEthernet0/0/1, version = 6, traffic class = 224,

flow label = 0, payload length = 32, protocol = 58, hop limit = 255,

Src = 1::1, Dst = ff02::1:ff00:2,

prompt: Sending the packet from local interface GigabitEthernet0/0/1.

*Apr 26 16:06:42:414 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

Receiving, interface = GigabitEthernet0/0/1, version = 6, traffic class = 224,

flow label = 0, payload length = 32, protocol = 58, hop limit = 255,

Src = 1::2, Dst = 1::1,

prompt: Received an IPv6 packet.

*Apr 26 16:06:42:414 2022 Sysname ADJ6/7/ADJ6_HARDWARE: -MDC=1;

====Start ADJLINK Add====

*Apr 26 16:06:42:414 2022 Sysname ADJ6/7/ADJ6_HARDWARE: -MDC=1;

--------------New Entry-------------

Service type : Ethernet

Link media type : Broadcast

Action type : Forwarding

IPv6 address : 1::2

Route interface : GE0/0/1

Port interface : N/A

Slot : 0

MTU : 1500

VLAN id : 65535

Second VLAN id : 65535

Physical interface : GE0/0/1

Logical interface : N/A

Vrf index : 0

VSI index : -1

VSI link ID : 65535

Usr ID : -1

MAC address : 68cb-9c3f-0206

Link head length(IPv6) : 14

Link head length(MPLS) : 14

Link head information(IPv6) : 68cb9c3f020668cb978f010686dd

Link head information(MPLS) : 68cb9c3f020668cb978f01068847

Nexthop driver

[0]: 0xffffffff [1]: 0xffffffff

Driver context

[0]: 0xff

*Apr 26 16:06:42:414 2022 Sysname ADJ6/7/ADJ6_HARDWARE: -MDC=1;

====End ADJLINK Operate====

Result : 0x0, Reference flag : 0x0, Syn flag : 0x0

*Apr 26 16:06:42:415 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

Receiving, interface = GigabitEthernet0/0/1, version = 6, traffic class = 0,

flow label = 0, payload length = 64, protocol = 58, hop limit = 64,

Src = 1::2, Dst = 1::1,

prompt: Received an IPv6 packet.

*Apr 26 16:06:42:415 2022 Sysname IP6FW/7/IP6FW_PACKET: -MDC=1;

Delivering, interface = GigabitEthernet0/0/1, version = 6, traffic class = 0,

flow label = 0, payload length = 64, protocol = 58, hop limit = 64,

Src = 1::2, Dst = 1::1,

prompt: Delivering the IPv6 packet to the upper layer.

%Apr 26 16:06:42:416 2022 Sysname PING/6/PING_STATISTICS: -MDC=1; Ping6 statistics for 1::2: 1 packet(s) transmitted, 1 packet(s) received, 0.0% packet loss, round-trip min/avg/max/std-dev = 2.868/2.868/2.868/0.000 ms.

*Apr 26 16:06:42:417 2022 Sysname IP6PMTU/7/IP6PMTU_DBG: -MDC=1; Unbinding PMTU from socket succeeded

¡ If the Result field displays 0x0, the ND entry has been successfully deployed to the driver. Go to the next step.

¡ If the Result field does not display 0x0, the ND entry failed to be deployed to the driver. Check the hardware resource usage.
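The Result check in step 2 can be sketched in Python as follows. The helper name is hypothetical, and the sketch assumes the "Result : 0x..." layout shown in the ADJ6_HARDWARE debugging output above. It returns None when no Result field is found, for example, when the captured text does not include the ADJ6_HARDWARE messages.

```python
from __future__ import annotations

import re

# "Result : 0x0, Reference flag : 0x0, Syn flag : 0x0"
RESULT_RE = re.compile(r"\bResult\s*:\s*(0x[0-9A-Fa-f]+)")


def deployed_to_driver(debug_text: str) -> bool | None:
    """True if every Result value is 0x0, False if any is not, None if none found."""
    results = [int(value, 16) for value in RESULT_RE.findall(debug_text)]
    if not results:
        return None
    return all(result == 0 for result in results)
```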

3. Execute the following commands, collect the command outputs, and send them to H3C technical support staff.

¡ debugging system internal adj6 (with the notify keyword specified)

¡ debugging system internal ipv6 fib prefix

4. Collect the following information and contact H3C Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting Layer 3 IP routing issues

BGP issues

BGP session unable to enter Established state

Symptom

The session between the local router and a peer or peer group cannot transition to Established state.

Common causes

The following are the common causes of this type of issue:

· BGP packet forwarding is blocked.

· The packets used for establishing or maintaining the BGP TCP connection are filtered out by ACLs.

· A router ID conflict exists between the BGP peers within the autonomous system.

· The specified peer or peer group AS number is incorrect.

· The peer address specified for peer session establishment is the IP address of a loopback interface on the peer router. However, the peer connect-interface command is not executed on the peer router, or the source interface specified in that command is not the loopback interface.

· When the local router establishes a BGP TCP connection with the peer router, the TCP packets sent by both ends are too large. TCP connection establishment fails because these packets are discarded by intermediate nodes that have a small output interface MTU and cannot fragment the packets.

· The EBGP peer address specified on the local router is the IP address of a loopback interface on the EBGP peer router, but the peer router is not configured with the peer ebgp-max-hop command.

· MD5 authentication fails because the two ends of the BGP session are not configured with the same key by using the peer password command.

· GTSM is enabled for the specified peer or peer group by using the peer ttl-security command, but the maximum hop count is incorrectly configured. Consequently, BGP packets from the peer or peer group cannot pass the GTSM check.

· The BGP session is terminated, because the number of BGP routes sent by the peer to the local router exceeds the upper limit set by using the peer route-limit command.

· The peer ignore, ignore all-peers, or shutdown process command is configured on either end of the BGP session.

· Although the local router and the peer router are enabled to exchange routing information, their respective configurations are not in the same address family view.

Analysis

Figure 46 shows the troubleshooting flowchart:

Figure 46 Troubleshooting flowchart

Solution

1. Identify whether the link to the BGP peer is operating correctly.

a. Identify whether the peer-facing interface is in UP state.

b. Use the ping command to test connectivity with the BGP peer. If the ping succeeds, the link between the local router and the BGP peer is operating correctly. In this case, proceed to step 2. If the ping fails, proceed to step c.

NOTE:

As a best practice, use the ping -a source-ip -s packet-size or ping ipv6 -a source-ipv6 -s packet-size command to test connectivity with the BGP peer.

The -a source-ip and -a source-ipv6 parameters specify the source IP address of ICMP echo requests. The -s packet-size parameter specifies the length of ICMP echo requests, which helps you verify whether long packets can be transmitted.

Specify the local IP address used for BGP session establishment as the source IP address of the ping, and the peer IP address used for BGP session establishment as the destination IP address.

c. Repeat the ping -a source-ip -s packet-size command with a decreasing packet-size value. If the ping succeeds after the packet-size value is decreased to a certain value, the cause of this issue is that the TCP packets sent for BGP TCP connection establishment are too long and are dropped by intermediate devices. To resolve this issue, perform either of the following tasks:

- Repeat the ping -a source-ip -s packet-size command and gradually reduce the packet-size value until the ping succeeds. To ensure optimal forwarding efficiency, use the largest value with which the ping succeeds as the final value, and set the MTU of the output interfaces for BGP packets based on it. Because the packet-size value does not include the IP and ICMP headers, the corresponding MTU is the final value plus the IP and ICMP header lengths. To achieve this goal, execute the ip mtu, ipv6 mtu, or tcp mss command on the related interfaces. Alternatively, execute the peer tcp-mss command in BGP instance view or BGP-VPN instance view. The ip mtu and ipv6 mtu commands specify the MTU of an interface, and the tcp mss and peer tcp-mss commands specify the TCP MSS. Use the following formula to calculate the TCP MSS: TCP MSS = MTU - IP header length - TCP header length.

- Execute the tcp path-mtu-discovery command to enable TCP path MTU discovery in system view. Then, the device dynamically obtains the smallest MTU value along the path used for TCP connection establishment, and calculates an MSS accordingly. When the device attempts to establish a BGP TCP connection, it determines the length of TCP packets based on the calculated MSS.

If the ping always fails no matter how you adjust the -s packet-size value, troubleshoot this issue as described in Layer 3—IP Services Troubleshooting Guide.

d. If the issue persists, proceed to step 2.
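The MTU and MSS arithmetic in step 1c can be checked with a short Python sketch. It assumes that the -s packet-size value excludes the IP and ICMP headers (verify this in the ping command reference for your software version), and it uses header lengths without options: IPv4 20 bytes, IPv6 40 bytes, ICMP/ICMPv6 echo header 8 bytes, TCP 20 bytes. The function names are hypothetical.

```python
IP_HEADER = {4: 20, 6: 40}  # bytes, without IPv4 options or IPv6 extension headers
ICMP_ECHO_HEADER = 8        # bytes, same for ICMP and ICMPv6 echo messages
TCP_HEADER = 20             # bytes, without TCP options


def mtu_from_ping_size(packet_size: int, ip_version: int = 4) -> int:
    """MTU implied by the largest -s value with which the ping still succeeds."""
    return packet_size + ICMP_ECHO_HEADER + IP_HEADER[ip_version]


def tcp_mss(mtu: int, ip_version: int = 4) -> int:
    """TCP MSS = MTU - IP header length - TCP header length."""
    return mtu - IP_HEADER[ip_version] - TCP_HEADER
```

For example, if the largest -s value that still succeeds over IPv4 is 1372, the implied MTU is 1400 bytes and the corresponding TCP MSS is 1360 bytes.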

2. Identify whether a BGP TCP connection has been established between the local router and the BGP peer.

Execute the display tcp command, and then identify whether the output displays the following TCP connection:

¡ Local address: IP address of the local router.

¡ Peer address: IP address of the related BGP peer.

¡ Peer port: 179.

¡ State: ESTABLISHED.

For example:

<Sysname> display tcp

*: TCP connection with authentication

Local Addr:port Foreign Addr:port State PCB

0.0.0.0:179 12.1.1.2:0 LISTEN 0xffffffffffffff9d

12.1.1.1:28160 12.1.1.2:179 ESTABLISHED 0xffffffffffffff9e
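The TCP connection check in step 2 can be sketched in Python as follows. The helper name is hypothetical, and it parses only the IPv4 display tcp layout shown above. Stripping a leading asterisk is an assumption based on the legend line ("*: TCP connection with authentication"); the exact marking of authenticated connections is not shown in this example.

```python
def bgp_tcp_established(display_tcp_output: str, local_ip: str, peer_ip: str) -> bool:
    """Check local address, peer address, peer port 179, and ESTABLISHED state."""
    for line in display_tcp_output.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        # Assumption: a leading "*" marks a connection with authentication.
        local_addr, _, _ = parts[0].lstrip("*").rpartition(":")
        peer_addr, _, peer_port = parts[1].rpartition(":")
        state = parts[2]
        if (local_addr, peer_addr, peer_port, state) == (
            local_ip, peer_ip, "179", "ESTABLISHED"
        ):
            return True
    return False
```

With the output above, bgp_tcp_established(output, "12.1.1.1", "12.1.1.2") is True because of the 12.1.1.1:28160 to 12.1.1.2:179 connection; the LISTEN entry on port 179 does not count.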

If such a TCP connection exists, proceed to step 3. If not, perform the following checks:

¡ Execute the display ip routing-table or display ipv6 routing-table command, and then identify whether the routing table contains an IGP route to the IPv4 or IPv6 peer address used for BGP session establishment. If such a route does not exist, check for incorrect IGP routing settings. For more information about troubleshooting IGP issues, see the OSPF, OSPFv3, or IS-IS troubleshooting information in Layer 3—IP Routing Troubleshooting Guide.

¡ Execute the display acl all command to check for rules that deny TCP packets to or from port bgp (port 179). For example:

<Sysname> display acl all

Advanced IPv4 ACL 3077, 2 rules,

ACL's step is 5

rule 1 deny tcp destination-port eq bgp

rule 2 deny tcp source-port eq bgp

If such a rule exists, execute the undo rule command to remove the rule.

¡ Execute the debugging tcp packet command to identify whether an authentication failure occurs upon TCP connection establishment. For example:

<Sysname> debugging tcp packet acl 3000

*Feb 5 20:03:39:289 2021 Sysname SOCKET/7/INET: -MDC=1;

TCP Input: Failed to check md5, drop the packet.

As shown in the command output, BGP failed to pass MD5 authentication when it attempted to initiate a TCP connection. In this situation, execute the peer password command to configure the same key at both ends of the BGP TCP connection.

<Sysname> debugging tcp packet acl 3000

*Feb 5 20:03:39:289 2021 Sysname SOCKET/7/INET: -MDC=1;

TCP Input: Failed to check keychain, drop the packet.

As shown in the command output, BGP failed to pass keychain authentication when it attempted to initiate a TCP connection. In this situation, execute the peer keychain command at both ends of the BGP TCP connection to ensure the following requirements are met:

- The keys used by the two ends at the same time must have the same ID.

- The keys with the same ID must use the same authentication algorithm and key string.

<Sysname> debugging tcp packet acl 3000

*Feb 5 20:03:39:289 2021 Sysname SOCKET/7/INET: -MDC=1;

TCP Input: Failed to get IPSEC profile, index 500, name profile1(inpcb profile2), return 0x3fff.

As shown in the command output, BGP failed to pass IPsec authentication when it attempted to initiate a TCP connection. In this situation, make sure the peer ipsec-profile command is executed at both ends of the BGP TCP connection.

If the issue persists, proceed to step 3.

3. Identify whether the local router has a router ID conflict with the peer or peer group, or whether the specified peer or peer group AS number is incorrect.

a. Execute the display bgp peer command on both the local router and the peer, and compare the values of the BGP local router ID field to identify whether a router ID conflict exists. If a router ID conflict is found, execute the router-id command in the view of the BGP instance or BGP-VPN instance where the BGP session is to be established, to change the router ID of the BGP router.

<Sysname> display bgp peer ipv4 unicast

BGP local router ID: 12.1.1.1

Local AS number: 10

Total number of peers: 1 Peers in established state: 1

* - Dynamically created peer

Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State

12.1.1.2 20 3 3 0 0 00:00:25 Established

b. Execute the display bgp peer command, and then view the AS field in the output to identify whether the AS number specified for the peer or peer group is incorrect. If the AS number is incorrect, execute the peer as-number command to correct the AS number. For example:

<Sysname> display bgp peer ipv4 unicast

BGP local router ID: 12.1.1.1

Local AS number: 10

Total number of peers: 1 Peers in established state: 1

* - Dynamically created peer

Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State

12.1.1.2 20 3 3 0 0 00:00:25 Established

c. If the issue persists, proceed to step 4.
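The comparisons in step 3 can be sketched in Python as follows. The helper names and the expected_as mapping (peer address to the AS number you expect) are hypothetical inputs. peer_router_id is the BGP local router ID value read from the display bgp peer output on the peer. The row pattern assumes the IPv4 unicast column layout shown above.

```python
from __future__ import annotations

import re

ROUTER_ID_RE = re.compile(r"BGP local router ID:\s*(\S+)")
# Columns: Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
PEER_ROW_RE = re.compile(r"^\s*(\S+)\s+(\d+)\s+(?:\d+\s+){4}(\S+)\s+(\S+)\s*$")


def parse_bgp_peer(text: str) -> tuple[str | None, dict[str, tuple[int, str]]]:
    """Return (local router ID, {peer: (AS number, state)})."""
    rid = ROUTER_ID_RE.search(text)
    peers = {}
    for line in text.splitlines():
        row = PEER_ROW_RE.match(line)
        if row:
            peers[row.group(1)] = (int(row.group(2)), row.group(4))
    return (rid.group(1) if rid else None), peers


def session_problems(local_output: str, peer_router_id: str,
                     expected_as: dict[str, int]) -> list[str]:
    """Flag a router ID conflict and peers configured with an unexpected AS."""
    router_id, peers = parse_bgp_peer(local_output)
    problems = []
    if router_id is not None and router_id == peer_router_id:
        problems.append(f"router ID conflict: {router_id}")
    for peer, as_number in expected_as.items():
        if peer not in peers:
            problems.append(f"{peer}: not listed")
        elif peers[peer][0] != as_number:
            problems.append(f"{peer}: AS {peers[peer][0]} configured, "
                            f"{as_number} expected")
    return problems
```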

4. Execute the display this command in BGP instance view to check for the following configurations that affect BGP session establishment:

Table 10 Check items that affect BGP session establishment

Check Item	Description
peer { group-name | ipv4-address [ mask-length ] | ipv6-address [ prefix-length ] } connect-interface interface-type interface-number	When this configuration exists on the local router, the BGP peer must also use a loopback interface address for BGP session establishment. To meet this requirement, you can use this command or the peer source-address command.
peer ipv4-address [ mask-length ] source-address source-ipv4-address or peer ipv6-address [ prefix-length ] source-address source-ipv6-address	If this configuration exists on the local router, the BGP peer must also use a loopback interface address for BGP session establishment. To meet this requirement, you can use this command or the peer connect-interface command.
peer { group-name | ipv4-address [ mask-length ] | ipv6-address [ prefix-length ] } ebgp-max-hop [ hop-count ]	This command is required in either of the following situations: · Two indirectly-connected devices need to establish an EBGP session. · Two directly-connected devices need to establish an EBGP session through their loopback interfaces. To ensure successful EBGP session establishment, execute this command at both ends of the EBGP session.
peer { group-name | ipv4-address [ mask-length ] | ipv6-address [ prefix-length ] } ttl-security hops hop-count	If this configuration exists, the local router accepts BGP packets from the specified peer only when the TTLs of those BGP packets are within the valid TTL range, which is from 255 - hop-count + 1 to 255. If the number of hops between the local router and the specified peer exceeds the hop-count value, execute this command again to increase the hop-count value.
peer { group-name | ipv4-address [ mask-length ] | ipv6-address [ prefix-length ] | link-local-address interface interface-type interface-number } route-limit prefix-number [ reconnect reconnect-time | percentage-value ] *	If this configuration exists on the local router and the number of routes received from the specified peer or peer group exceeds the prefix-number value, the local router disconnects from the peer or peer group. To avoid this issue, reduce the number of routes sent by the peer or peer group, or increase the prefix-number value.
peer { group-name | ipv4-address [ mask-length ] | ipv6-address [ prefix-length ] | link-local-address interface interface-type interface-number } ignore [ graceful graceful-time { community { community-number | aa:nn } | local-preference preference | med med } * ]	If this configuration exists, the local router does not establish a BGP session with the specified peer or peer group. To resolve this issue, execute the undo peer ignore command with the peer or peer group specified.
ignore all-peers [ graceful graceful-time { community { community-number | aa:nn } | local-preference preference | med med } * ]	If this configuration exists, the local router cannot establish BGP sessions with any peers. In this situation, the local router might be undergoing a network upgrade or maintenance task, and the related BGP process is temporarily unavailable. As a best practice, execute the undo ignore all-peers command after the upgrade or maintenance task is completed.
shutdown process	If this configuration exists, the local router cannot establish BGP sessions with any peers. In this situation, the local router might be undergoing a network upgrade or maintenance task, and the related BGP process is temporarily unavailable. As a best practice, execute the undo shutdown process command after the upgrade or maintenance task is completed.
The peer enable command in the related address family	When two devices need to establish a BGP session, you must execute the peer enable command on each of them with the other specified, in the same address family. If this configuration exists on the local router, verify that the peer is also configured with the peer enable command in the same address family.
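The TTL range in the ttl-security row of Table 10 can be checked with a short Python sketch. It assumes that the peer sends BGP packets with TTL 255 and that each intermediate router decrements the TTL by 1, so a directly connected peer is 1 hop away and its packets arrive with TTL 255. The function names are hypothetical.

```python
MAX_TTL = 255


def gtsm_valid_ttls(hop_count: int) -> range:
    """Valid TTLs for 'ttl-security hops hop-count': 255 - hop_count + 1 to 255."""
    if hop_count < 1:
        raise ValueError("hop_count must be at least 1")
    return range(MAX_TTL - hop_count + 1, MAX_TTL + 1)


def ttl_on_arrival(hops_to_peer: int) -> int:
    """TTL on arrival, assuming the peer sends with TTL 255 and every router
    between the two ends decrements it by 1 (1 hop = directly connected)."""
    return MAX_TTL - (hops_to_peer - 1)


def passes_gtsm(hops_to_peer: int, hop_count: int) -> bool:
    """True if packets from a peer hops_to_peer hops away pass the GTSM check."""
    return ttl_on_arrival(hops_to_peer) in gtsm_valid_ttls(hop_count)
```

For example, with hop-count 3 the valid TTLs are 253 to 255, so a peer 4 hops away (arriving TTL 252) fails the check until hop-count is increased to at least 4.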

If the issue persists, proceed to step 5.

5. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

After the snmp-agent trap enable bgp command is executed in system view, the router generates the following alarm message:

Module name: BGP4-MIB

· bgpBackwardTransition (1.3.6.1.2.1.15.7.2)

Log messages

N/A

BGP session down

Symptom

The device generates a BGP/5/BGP_STATE_CHANGED log message, indicating that the state of a BGP session transitioned from Established to Idle.

Common causes

The following are the common causes of this type of issue:

· KEEPALIVE or UPDATE message sending/receiving timed out.

· TCP connection establishment failed.

· The local device has reached a memory threshold.

· An error occurred in parsing BGP messages.

Analysis

Figure 47 shows the troubleshooting flowchart:

Figure 47 Troubleshooting flowchart

Solution

Execute the display bgp peer log-info command to identify the cause of this issue. The common causes include:

· A BGP timer expired.

If the output of the display bgp peer log-info command is similar to the following:

<Sysname> display bgp peer ipv4 3.3.3.3 log-info

Peer: 3.3.3.3

Date Time State Notification

Error/SubError

17-Jan-2022 14:48:34 Down Receive notification with error 4/0

Hold Timer Expired/ErrSubCode Unspecified

Keepalive last triggered time: 14:48:31-2022.1.17

Keepalive last sent time : 14:48:31-2022.1.17

Update last sent time : 14:48:24-2022.1.17

EPOLLOUT last occurred time : 14:48:30-2022.1.17

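The BGP session went down because the hold timer expired: one end of the session did not receive a KEEPALIVE or UPDATE message from the other end before its hold timer expired. The end whose hold timer expired actively terminated the BGP session and sent a NOTIFICATION message with error code 4/0 to the other end. In this example, the local router received that NOTIFICATION message, which indicates that the hold timer expired on the peer.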

A timer timeout issue might occur in one of the following situations:

¡ The device sends a KEEPALIVE or UPDATE message to a peer normally, but the message fails to reach the peer or the peer does not process the message in time.

¡ The device fails to generate a KEEPALIVE or UPDATE message in time due to scheduling issues.

To resolve this issue, execute the display system internal bgp log command in probe view at both ends of the BGP session, collect the command output, and then contact Technical Support for further analysis.

· A TCP connection error occurred.

If the output of the display bgp peer log-info command is similar to the following:

<Sysname> display bgp peer ipv4 1.1.1.1 log-info

Peer: 1.1.1.1

Date Time State Notification

Error/SubError

17-Jan-2022 14:42:01 Down Receive TCP_Connection_Failed event

The BGP session went down due to a TCP connection error. BGP uses TCP as its transport layer protocol, so a TCP connection error between two BGP peers terminates the related BGP session. If the output of the display bgp peer log-info command differs from the above example but contains a NOTIFICATION message with error code 5/0, the cause is also a TCP connection error.

After you confirm that the BGP session went down due to a TCP connection error, perform the following tasks:

a. Execute the view /proc/tcp/tcp_log slot x command in probe view at both ends of the BGP session (execute this command once for each card or member device).

b. Collect the command output.

c. Contact Technical Support for further analysis.

· The memory was insufficient.

If the output of the display bgp peer log-info command is similar to the following:

<Sysname> display bgp peer ipv4 1.1.1.1 log-info

Peer: 1.1.1.1

Date Time State Notification

Error/SubError

17-Jan-2022 15:38:53 Down Send notification with error 6/8

Entered severe memory state

17-Jan-2022 14:53:51 Down Send notification with error 6/8

No memory to process the attribute

The device did not have enough memory to run BGP-related functions, which caused the BGP session to be terminated. This cause corresponds to error code 6/8 in the output of the display bgp peer log-info command.

In this case, perform the following tasks:

a. Execute the display memory-threshold command at both ends of the BGP session to obtain the memory alarm thresholds.

b. Collect the output of the display bgp peer log-info command.

c. Contact Technical Support for further analysis.

· An error occurred in parsing BGP messages.

If the two ends of a BGP session have different message parsing capabilities or run mismatched versions, they might fail to parse the BGP messages received from each other, and the session might go down as a result. This type of issue corresponds to error codes 1, 2, and 3 in the output of the display bgp peer log-info command (the Error part of the Error/SubError field is 1, 2, or 3).

Execute the debugging bgp raw-packet, debugging bgp open, and debugging bgp update commands at both ends of the BGP session, collect the output of those commands and the display bgp peer log-info command, and then contact Technical Support for further analysis.

· If the output of the display bgp peer log-info command shows a cause other than those described above, collect the following information and contact Technical Support:

¡ Output of the display bgp peer log-info command.

¡ Output of the display system internal bgp log command.

¡ Output of the view /proc/tcp/tcp_log slot x command (executed once for each card or member device).

¡ The configuration file, log messages, and alarm messages.
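The decision logic above (hold timer for 4/x, TCP for TCP_Connection_Failed or 5/0, memory for 6/8, message parsing for 1/x, 2/x, and 3/x) can be sketched as a small classifier. This is a hypothetical helper for illustration only, not an H3C tool; the function name, the category strings, and the regular expression are assumptions based on the sample log-info lines shown in this section.

```python
import re

# Hypothetical helper (not an H3C tool): classifies one line of
# "display bgp peer ... log-info" output into the cause categories of this
# section. The pattern assumes the line formats shown in the examples above.
NOTIFICATION_RE = re.compile(r"(?:Receive|Send) notification with error (\d+)/(\d+)")


def classify_log_line(line: str) -> str:
    """Return the cause category suggested by a single log-info line."""
    if "TCP_Connection_Failed" in line:
        return "tcp-connection-error"
    match = NOTIFICATION_RE.search(line)
    if match is None:
        return "other"
    error, subcode = int(match.group(1)), int(match.group(2))
    if error in (1, 2, 3):
        return "message-parsing-error"  # header, OPEN, or UPDATE message errors
    if error == 4:
        return "hold-timer-expired"
    if (error, subcode) == (5, 0):
        return "tcp-connection-error"   # 5/0 is treated as a TCP error in this section
    if (error, subcode) == (6, 8):
        return "insufficient-memory"
    return "other"                      # collect information for Technical Support


if __name__ == "__main__":
    samples = [
        "17-Jan-2022 14:48:34 Down Receive notification with error 4/0",
        "17-Jan-2022 14:42:01 Down Receive TCP_Connection_Failed event",
        "17-Jan-2022 15:38:53 Down Send notification with error 6/8",
    ]
    for sample in samples:
        print(classify_log_line(sample))
```

Any line that falls into "other" corresponds to the last bullet above: collect the listed information and contact Technical Support.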

Table 11 lists the detailed reasons for BGP peer disconnection and their corresponding error codes.

Table 11 Reasons for BGP peer disconnection

Error code/subcode	Reason for peer disconnection	Description
1/1	connection not synchronized	The two ends of the connection were not synchronized. In the current implementation, this means that the 16-byte Marker field at the start of the received message header did not consist entirely of Fs (0xFF bytes).
1/2	bad message length	Invalid message length.
1/3	bad message type	Invalid message type.
3/1	the withdrawn length is too large	The Withdrawn Routes Length value was too large.
	the attribute length is too large	The attribute length was too long.
	one attribute appears more than once	A path attribute appeared multiple times in an UPDATE message.
	the attribute length is too small	The attribute length was less than two bytes.
	exntended length field is less than two octets	The Extended Length flag was set, but the attribute length field was shorter than two bytes.
	the length field is less than one octet	The Extended Length flag was not set, but the attribute length field was shorter than one byte.
	link-state attribute error	The link-state attribute was in incorrect form.
3/2	unrecognized well-known attribute	Unknown well-known attribute.
3/3	attribute-type attribute missed	The attribute-type attribute was missing. The values for the attribute-type argument include: · ORIGIN · AS_PATH · LOCAL_PREF · NEXT_HOP
3/4	attribute flags error	Incorrect attribute flags.
3/5	attribute-type attribute length error	The length of the attribute-type attribute was invalid. The values for the attribute-type argument include: · AS_PATH · AS4_PATH · CLUSTER_LIST · AGGREGATOR · AS4_AGGREGATOR · ORIGIN · NEXT_HOP · MED · LOCAL_PREF · ATOMIC_AGGREGATE · ORIGINATOR_ID · MP_REACH_NLRI · COMMUNITIES · EXT-COMMUNITIES
3/5	attribute length exceeds	The attribute length exceeded the limit.
3/6	invalid ORIGIN attribute	Invalid ORIGIN attribute.
3/8	invalid NEXT_HOP attribute	Invalid NEXT_HOP attribute.
3/9	invalid nexthop length in MP_REACH_NLRI (address-family)	The next hop length in the MP_REACH_NLRI attribute was invalid for the address-family address family. The values for the address-family argument include: · 4u—IPv4 unicast address family. · IPv4 Flowspec—IPv4 flowspec address family. · MPLS—MPLS address family. · VPNv4—VPNv4 address family. · 6u—IPv6 unicast address family. · VPNv6—VPNv6 address family. · L2VPN—L2VPN address family.
	the length of MP_UNREACH_NLRI is too small	The length of the MP_UNREACH_NLRI attribute was less than three bytes.
	the MP NLRI attribute length exceeds	The length of the MP_REACH_NLRI or MP_UNREACH_NLRI attribute exceeded the limit.
	erroneous MP NLRI attribute end position	The reachable or unreachable prefix and the path attribute ended at different positions.
3/10	invalid network field	Invalid network field.
3/11	malformed AS_PATH	The AS_PATH attribute was malformed.
4/0	Keepalive last triggered time	Most recent time when KEEPALIVE message sending was triggered.
	Keepalive last sent time	Most recent time when a KEEPALIVE message was sent.
	Update last sent time	Most recent time when an UPDATE message was sent.
	EPOLLOUT last occurred time	Most recent time when an EPOLLOUT event occurred.
	Keepalive last received time	Most recent time when a KEEPALIVE message was received.
	Update last received time	Most recent time when an UPDATE message was received.
	EPOLLIN last occurred time	Most recent time when an EPOLLIN event occurred.
5/0	connection retry timer expires	The ConnectRetry timer expired.
	TCP_CR_Acked event received	A TCP_CR_Acked event was received.
	TCP_Connection_Confirmed event received	A TCP_Connection_Confirmed event was received.
5/3	open message received	An OPEN message was received.
6/0	manualstop event received	A ManualStop event was received.
	physical interface configuration changed	Physical configurations changed, such as interface settings.
	session down event received from BFD	A BFD session down event was received.
6/1	maximum number of prefixes reached	The number of route prefixes has exceeded the upper limit specified by using the peer route-limit command.
6/1	maximum number of address-family prefixes reached	The number of route prefixes in the address-family address family has exceeded the upper limit specified by using the peer route-limit command. The values for the address-family argument include: · IPv4 unicast—IPv4 unicast address family. · IPv6 unicast—IPv6 unicast address family. · VPNv4—VPNv4 address family. · VPNv6—VPNv6 address family.
6/2	configuration of peer ignore changed	The peer ignore command was configured.
6/3	address family deleted	An address family was deleted.
6/3	peer disabled	A peer was disabled.
6/4	administrative reset	The BGP session was reset because of the reset bgp command or configuration changes.
6/5	connection rejected	The connection request was rejected.
6/6	other configuration change	Other configurations changed.
6/7	connection collision resolution	A connection collision occurred, and one of the two connections was closed.
6/7	two connections exist and MD5 authentication is configured for the neighbor	Two connections existed and MD5 authentication was configured for one of them.
6/8	no memory to process the attribute	The memory was insufficient for attribute parsing.
	no memory for the route	Failed to obtain memory resources for route or label block generation.
	no memory to generate unreachable NLRI	Failed to obtain memory resources for MP_UNREACH_NLRI encapsulation.
	no memory to generate a message	Failed to obtain memory resources for message encapsulation.
	can't get the VPN RD	Failed to obtain RDs upon prefix parsing.
	can't get the VPN routing table	Failed to obtain the VPN routing table upon prefix parsing.
	can't get the attributes	Failed to obtain attributes upon prefix parsing.
	entered severe memory state	A severe memory usage alarm was triggered.
	entered critical memory state	A critical memory usage alarm was triggered.
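The numeric codes in Table 11 appear to follow the standard BGP NOTIFICATION error codes and subcodes (RFC 4271, with Cease subcodes from RFC 4486, FSM error subcodes from RFC 6608, and the Unsupported Capability subcode from RFC 5492). The following sketch translates an Error/SubError pair into those standard names, which can help when comparing logs with a peer from another vendor. The dictionary and function names are hypothetical, and subcode 0 is reported as "Unspecific", as RFC 4271 defines.

```python
# Standard BGP NOTIFICATION error codes and subcodes, per RFC 4271
# (plus RFC 4486 Cease, RFC 6608 FSM, and RFC 5492 capability subcodes).
ERROR_CODES = {
    1: "Message Header Error",
    2: "OPEN Message Error",
    3: "UPDATE Message Error",
    4: "Hold Timer Expired",
    5: "Finite State Machine Error",
    6: "Cease",
}

SUBCODES = {
    1: {1: "Connection Not Synchronized", 2: "Bad Message Length", 3: "Bad Message Type"},
    2: {1: "Unsupported Version Number", 2: "Bad Peer AS", 3: "Bad BGP Identifier",
        4: "Unsupported Optional Parameter", 6: "Unacceptable Hold Time",
        7: "Unsupported Capability"},
    3: {1: "Malformed Attribute List", 2: "Unrecognized Well-known Attribute",
        3: "Missing Well-known Attribute", 4: "Attribute Flags Error",
        5: "Attribute Length Error", 6: "Invalid ORIGIN Attribute",
        8: "Invalid NEXT_HOP Attribute", 9: "Optional Attribute Error",
        10: "Invalid Network Field", 11: "Malformed AS_PATH"},
    5: {1: "Receive Unexpected Message in OpenSent State",
        2: "Receive Unexpected Message in OpenConfirm State",
        3: "Receive Unexpected Message in Established State"},
    6: {1: "Maximum Number of Prefixes Reached", 2: "Administrative Shutdown",
        3: "Peer De-configured", 4: "Administrative Reset",
        5: "Connection Rejected", 6: "Other Configuration Change",
        7: "Connection Collision Resolution", 8: "Out of Resources"},
}


def describe(code: str) -> str:
    """Translate an "error/subcode" string such as "6/8" into standard names."""
    error_text, _, sub_text = code.partition("/")
    error, subcode = int(error_text), int(sub_text or "0")
    name = ERROR_CODES.get(error, f"Unknown error code {error}")
    if subcode == 0:
        return f"{name} / Unspecific"
    sub_name = SUBCODES.get(error, {}).get(subcode, f"Unknown subcode {subcode}")
    return f"{name} / {sub_name}"


if __name__ == "__main__":
    for code in ("4/0", "5/3", "6/8"):
        print(code, "->", describe(code))
```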

Related alarm and log messages

Alarm messages

N/A

Log messages

· BGP/5/BGP_STATE_CHANGED

· BGP/5/BGP_STATE_CHANGED_REASON

· BGP/6/BGP_PEER_STATE_CHG

BGP routing loop in a cross-AS data center interconnect scenario

Symptom

As shown in Figure 48, two data centers are interconnected across ASs through BGP. RR 1 learns BGP routes with the same prefix (for example, 10.110.0.0/16) from Border 3 and Border 4 in Data Center 2. The next hops for those routes are the loopback interface addresses of Border 3 and Border 4, respectively. RR 1 selects the route from Border 3 or Border 4 as the optimal route. Border 1 and Border 2 send default routes to RR 1 through BGP, with the next hops being IP addresses of the interfaces directly connected to RR 1. If Border 3 or Border 4 restarts, the devices in Data Center 1 cannot access network segment 10.110.0.0/16 during the restart. Packets destined for the network segment loop between RR 1 and Border 1 or RR 1 and Border 2.

Figure 48 Network diagram

Common causes

Before Border 3 or Border 4 restarts, the BGP routing table and IP routing table of RR 1 are similar to the following:

<RR1> display bgp routing-table ipv4

Total number of routes: 4

BGP local router ID is 9.9.9.9

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Network NextHop MED LocPrf PrefVal Path/Ogn

* >i 0.0.0.0/0 19.1.1.1 100 0 i

* i 29.1.1.2 100 0 i

* >e 10.110.0.0/16 3.3.3.3 0 0 20i

* e 4.4.4.4 0 0 20i

<RR1> display ip routing-table

Destinations : 25 Routes : 25

Destination/Mask Proto Pre Cost NextHop Interface

0.0.0.0/0 BGP 255 0 19.1.1.1 GE2/0/1

0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

1.1.1.1/32 O_INTRA 10 1 19.1.1.1 GE2/0/1

2.2.2.2/32 O_INTRA 10 1 29.1.1.2 GE2/0/2

3.3.3.3/32 O_INTRA 10 1 39.1.1.3 GE2/0/3

4.4.4.4/32 O_INTRA 10 1 49.1.1.4 GE2/0/4

9.9.9.9/32 Direct 0 0 127.0.0.1 InLoop0

10.10.10.10/32 BGP 255 0 1.1.1.1 GE2/0/1

19.1.1.0/24 Direct 0 0 19.1.1.9 GE2/0/1

19.1.1.0/32 Direct 0 0 19.1.1.9 GE2/0/1

19.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0

19.1.1.255/32 Direct 0 0 19.1.1.9 GE2/0/1

10.110.0.0/16 BGP 255 0 3.3.3.3 GE2/0/3

29.1.1.0/24 Direct 0 0 29.1.1.9 GE2/0/2

29.1.1.0/32 Direct 0 0 29.1.1.9 GE2/0/2

29.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0

29.1.1.255/32 Direct 0 0 29.1.1.9 GE2/0/2

39.1.1.0/24 Direct 0 0 39.1.1.9 GE2/0/3

39.1.1.0/32 Direct 0 0 39.1.1.9 GE2/0/3

39.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0

39.1.1.255/32 Direct 0 0 39.1.1.9 GE2/0/3

49.1.1.0/24 Direct 0 0 29.1.1.9 GE2/0/2

49.1.1.0/32 Direct 0 0 29.1.1.9 GE2/0/2

49.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0

49.1.1.255/32 Direct 0 0 29.1.1.9 GE2/0/2

127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0

127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0

127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

According to the above command output, RR 1 learned routes destined for the loopback interfaces of Border 3 and Border 4 through IGP. BGP network route 10.110.0.0/16 was iterated to the learned loopback interface routes.

After Border 4 restarts, RR 1 keeps the BGP session with Border 4 until the session hold timer expires, and the routing table of RR 1 still retains network route 10.110.0.0/16 received from Border 4. However, this network route can be iterated only to the default route (0.0.0.0/0), because the IGP route to next hop 4.4.4.4 has become invalid and RR 1 has no other route that covers IP address 4.4.4.4.

In the routing table of RR 1, you can find the following information:

· The IGP metric value is 1 for the next hop of network route 10.110.0.0/16 received from Border 3, which corresponds to route entry 3.3.3.3/32 O_INTRA 10 1 39.1.1.3 GE2/0/3.

· The IGP metric value is 0 for the next hop of network route 10.110.0.0/16 received from Border 4, which corresponds to route entry 0.0.0.0/0 BGP 255 0 19.1.1.1 GE2/0/1.

According to the BGP route selection rules, RR 1 chooses the route from Border 4 as the optimal route, because the IGP metric of its next hop is lower. In the forwarding table, the next hop for network segment 10.110.0.0/16 changes to 19.1.1.1, the IP address of the Border 1 interface directly connected to RR 1. Consequently, RR 1 forwards packets destined for network segment 10.110.0.0/16 to Border 1. Border 1 then forwards those packets back to RR 1, because Border 1 learned network route 10.110.0.0/16 from RR 1. This causes a routing loop.
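The selection change described above can be reproduced with a small model of next-hop iteration. This is a simplified sketch, not the device's algorithm: it assumes a RIB that holds only the three relevant entries from the RR 1 output, resolves each BGP next hop by longest-prefix match, and compares only the IGP cost of the resolved route, because all earlier BGP selection criteria tie in this example. The names RIB_BEFORE, RIB_AFTER, resolve, and select_best are hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Simplified excerpt of the RR 1 IP routing table: prefix -> (cost, interface).
Rib = Dict[str, Tuple[int, str]]

RIB_BEFORE: Rib = {
    "0.0.0.0/0": (0, "GE2/0/1"),   # BGP default route from Border 1
    "3.3.3.3/32": (1, "GE2/0/3"),  # OSPF route to the Border 3 loopback
    "4.4.4.4/32": (1, "GE2/0/4"),  # OSPF route to the Border 4 loopback
}
# While Border 4 restarts, the route to its loopback disappears.
RIB_AFTER: Rib = {k: v for k, v in RIB_BEFORE.items() if k != "4.4.4.4/32"}


def resolve(next_hop: str, rib: Rib, allow_default: bool = True) -> Optional[Tuple[int, str]]:
    """Longest-prefix match of a BGP next hop against the RIB."""
    addr = ipaddress.ip_address(next_hop)
    best_net, best_entry = None, None
    for prefix, entry in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr not in net or (net.prefixlen == 0 and not allow_default):
            continue
        if best_net is None or net.prefixlen > best_net.prefixlen:
            best_net, best_entry = net, entry
    return best_entry


def select_best(paths: List[Tuple[str, str]], rib: Rib, allow_default: bool = True):
    """Return (cost, peer, interface) of the path with the lowest IGP cost.

    Ties are broken by peer name only to keep the sketch deterministic.
    """
    candidates = []
    for peer, next_hop in paths:
        resolved = resolve(next_hop, rib, allow_default)
        if resolved is not None:
            cost, iface = resolved
            candidates.append((cost, peer, iface))
    return min(candidates) if candidates else None


PATHS = [("Border 3", "3.3.3.3"), ("Border 4", "4.4.4.4")]
```

With these assumptions, select_best(PATHS, RIB_AFTER) returns (0, "Border 4", "GE2/0/1"), which reproduces the loop: the stale Border 4 path wins because its next hop was iterated to the default route. Passing allow_default=False mimics a recursion policy that excludes the default route and returns the Border 3 path instead.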

Analysis

Figure 49 shows the troubleshooting flowchart:

Figure 49 Troubleshooting flowchart

Solution

1. View the BGP routing table and IP routing table of RR 1. This example uses the network shown in Figure 48 for illustration.

a. After Border 4 restarts, if you execute the display bgp routing-table ipv4 command on RR 1 before RR 1 is disconnected from Border 4, you can find that network route 10.110.0.0/16 received from Border 4 is still active and is the optimal route.

<RR1> display bgp routing-table ipv4

Total number of routes: 5

BGP local router ID is 9.9.9.9

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Network NextHop MED LocPrf PrefVal Path/Ogn

* >i 0.0.0.0/0 19.1.1.1 100 0 i

* i 29.1.1.2 100 0 i

* >e 10.110.0.0/16 4.4.4.4 0 0 20i

* e 3.3.3.3 0 0 20i

b. After you execute the display ip routing-table verbose command on RR 1, you can find that the output interface and real next hop for network route 10.110.0.0/16 have changed to the interface directly connected to Border 1 and the interface’s IP address (19.1.1.1), respectively.

<RR1> display ip routing-table 10.110.0.0/16 verbose

Summary count : 1

Destination: 10.110.0.0/16

Protocol: BGP instance default

Process ID: 0

SubProtID: 0x6 Age: 00h00m19s

FlushedAge: 00h00m19s

Cost: 0 Preference: 255

IpPre: N/A QosLocalID: N/A

Tag: 0 State: Active Adv

OrigTblID: 0x0 OrigVrf: default-vrf

TableID: 0x2 OrigAs: 20

NibID: 0x16000002 LastAs: 20

AttrID: 0x2

BkAttrID: 0xffffffff Neighbor: 4.4.4.4

Flags: 0x10060 OrigNextHop: 4.4.4.4

Label: NULL RealNextHop: 19.1.1.1

BkLabel: NULL BkNextHop: N/A

SRLabel: NULL Interface: GigabitEthernet2/0/1

BkSRLabel: NULL BkInterface: N/A

Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/1

BkTunnel ID: Invalid BkIPInterface: N/A

InLabel: NULL ColorInterface: N/A

SIDIndex: NULL BkColorInterface: N/A

FtnIndex: 0x0 TunnelInterface: N/A

TrafficIndex: N/A BkTunnelInterface: N/A

Connector: N/A PathID: 0x0

UserID: 0x0 SRTunnelID: Invalid

SID Type: N/A NID: Invalid

FlushNID: Invalid BkNID: Invalid

BkFlushNID: Invalid StatFlags: 0x0

SID: N/A

BkSID: N/A

CommBlockLen: 0 Priority: Low

MemberPort: N/A

c. After you execute the display ip routing-table command, you can find the following information:

- The IP routing table no longer contains any route that covers IP address 4.4.4.4 other than the default route.

- The output interface and next hop for the default route are GE2/0/1 and 19.1.1.1, respectively.

This indicates that network route 10.110.0.0/16 received from Border 4 has been iterated to the default route.

<RR1> display ip routing-table

Destinations : 25 Routes : 25

Destination/Mask Proto Pre Cost NextHop Interface

0.0.0.0/0 BGP 255 0 19.1.1.1 GE2/0/1

0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

1.1.1.1/32 O_INTRA 10 1 19.1.1.1 GE2/0/1

2.2.2.2/32 O_INTRA 10 1 29.1.1.2 GE2/0/2

3.3.3.3/32 O_INTRA 10 1 39.1.1.3 GE2/0/3

9.9.9.9/32 Direct 0 0 127.0.0.1 InLoop0

10.10.10.10/32 BGP 255 0 1.1.1.1 GE2/0/1

19.1.1.0/24 Direct 0 0 19.1.1.9 GE2/0/1

19.1.1.0/32 Direct 0 0 19.1.1.9 GE2/0/1

19.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0

19.1.1.255/32 Direct 0 0 19.1.1.9 GE2/0/1

10.110.0.0/16 BGP 255 0 4.4.4.4 GE2/0/1

29.1.1.0/24 Direct 0 0 29.1.1.9 GE2/0/2

29.1.1.0/32 Direct 0 0 29.1.1.9 GE2/0/2

29.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0

29.1.1.255/32 Direct 0 0 29.1.1.9 GE2/0/2

39.1.1.0/24 Direct 0 0 39.1.1.9 GE2/0/3

39.1.1.0/32 Direct 0 0 39.1.1.9 GE2/0/3

39.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0

39.1.1.255/32 Direct 0 0 39.1.1.9 GE2/0/3

49.1.1.0/24 Direct 0 0 29.1.1.9 GE2/0/2

49.1.1.0/32 Direct 0 0 29.1.1.9 GE2/0/2

49.1.1.9/32 Direct 0 0 127.0.0.1 InLoop0

49.1.1.255/32 Direct 0 0 29.1.1.9 GE2/0/2

127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0

127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0

127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

If none of the above situations exists, contact Technical Support for help.

2. Use one of the following methods to remove the routing loop:

¡ Configure routing policies to filter recursive routes.

Execute the protocol bgp nexthop recursive-lookup route-policy route-policy-name command in RIB IPv4 address family view. This operation ensures that all BGP IPv4 network routes are iterated only to routes that can pass the routing policy specified by the route-policy-name argument.

Similarly, execute the protocol bgp4+ nexthop recursive-lookup route-policy route-policy-name command in RIB IPv6 address family view. This operation ensures that all BGP IPv6 network routes are iterated only to routes that can pass the routing policy specified by the route-policy-name argument.

In this scenario, create a routing policy on RR 1 that filters out the default route, and apply it by executing the protocol bgp nexthop recursive-lookup route-policy route-policy-name command (IPv4) or the protocol bgp4+ nexthop recursive-lookup route-policy route-policy-name command (IPv6). This configuration eliminates the BGP routing loop by preventing BGP routes from being iterated to the default route.

¡ Enable BFD for BGP.

After BFD is enabled for BGP, RR 1 uses BFD sessions to monitor the links to Border 3 and Border 4. If Border 3 or Border 4 restarts, BFD detects the failure quickly, typically much sooner than the BGP hold timer expires. RR 1 then promptly terminates the related BGP session and deletes the routes learned from Border 3 or Border 4. To enable BFD for BGP, execute the peer bfd command. For more information about this task, see the command reference.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
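The BFD method in step 2 works because BFD usually detects a failure far sooner than the BGP hold timer expires, which shortens the window in which the stale route can be iterated to the default route. The following sketch computes the asynchronous-mode BFD detection time as described in RFC 5880 (section 6.8.4). The interval, multiplier, and 180-second hold time in the usage example are assumed values for illustration, not device defaults; check the values configured on your devices.

```python
def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int) -> int:
    """Asynchronous-mode BFD detection time (RFC 5880, section 6.8.4).

    The agreed interval is the larger of the local Required Min RX Interval
    and the remote Desired Min TX Interval. The detection time is that
    interval multiplied by the Detect Mult value received from the remote end.
    """
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


if __name__ == "__main__":
    # Assumed example values: 100 ms intervals, multiplier 3, 180 s hold time.
    bfd_ms = bfd_detection_time_ms(3, 100, 100)
    hold_ms = 180 * 1000
    print(f"BFD detects the failure after about {bfd_ms} ms")
    print(f"Without BFD, the stale route can persist for up to {hold_ms} ms")
```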

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

BGP routing loop in a cross-AS Spine-Leaf interconnect scenario

Symptom

As shown in Figure 50, the spine devices and the leaf devices are in different ASs. The spine devices are fully meshed. Spine 1 and Spine 2 each establish EBGP connections with the leaf devices. Load balancing is enabled on Spine 2, and Spine 2 can load-balance traffic across EBGP and IBGP routes. When Spine 1 restarts, traffic destined for the leaf devices is forwarded through Spine 2, and half of the traffic is lost.

Figure 50 Network diagram

Common causes

Before Spine 1 restarts, the BGP routing table of Spine 2 is similar to the following:

<Spine2> display bgp routing-table ipv4

Total number of routes: 3

BGP local router ID is 2.2.2.2

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Network NextHop MED LocPrf PrefVal Path/Ogn

* >i 0.0.0.0/0 24.1.1.4 100 0 i

* >e 100.1.1.0/24 23.1.1.3 0 0 20i

* i 1.1.1.1 0 100 0 20i

Spine 2 receives network route 100.1.1.0/24 from both Leaf 1 (23.1.1.3) and Spine 1 (1.1.1.1). The next hop for the route received from Spine 1 is a loopback interface address of Spine 1.

The IP routing table of Spine 2 is similar to the following:

<Spine2> display ip routing-table

Destinations : 24 Routes : 25

Destination/Mask Proto Pre Cost NextHop Interface

0.0.0.0/0 BGP 255 0 24.1.1.4 GE2/0/1

0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

1.1.1.1/32 O_INTRA 10 1 12.1.1.1 GE2/0/2

2.2.2.2/32 Direct 0 0 127.0.0.1 InLoop0

4.4.4.4/32 O_INTRA 10 1 24.1.1.4 GE2/0/1

12.1.1.0/24 Direct 0 0 12.1.1.2 GE2/0/2

12.1.1.0/32 Direct 0 0 12.1.1.2 GE2/0/2

12.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0

12.1.1.255/32 Direct 0 0 12.1.1.2 GE2/0/2

14.1.1.0/24 O_INTRA 10 2 12.1.1.1 GE2/0/2

O_INTRA 10 2 24.1.1.4 GE2/0/1

23.1.1.0/24 Direct 0 0 23.1.1.2 GE2/0/3

23.1.1.0/32 Direct 0 0 23.1.1.2 GE2/0/3

23.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0

23.1.1.255/32 Direct 0 0 23.1.1.2 GE2/0/3

24.1.1.0/24 Direct 0 0 24.1.1.2 GE2/0/1

24.1.1.0/32 Direct 0 0 24.1.1.2 GE2/0/1

24.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0

24.1.1.255/32 Direct 0 0 24.1.1.2 GE2/0/1

100.1.1.0/24 BGP 255 0 23.1.1.3 GE2/0/3

127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0

127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0

127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

For network route 100.1.1.0/24 received from Leaf 1, the route to its next hop is direct route 23.1.1.0/24, and the IGP metric is 0. For network route 100.1.1.0/24 received from Spine 1, the IGP route to its next hop is 1.1.1.1/32, and the IGP metric is 1. Because their IGP metrics differ, the two network routes 100.1.1.0/24 cannot establish a load balancing relationship in the BGP routing table. This is the behavior the network administrator expects: Spine 2 forwards traffic destined for network segment 100.1.1.0/24 to the leaf device rather than to Spine 1.

Spine 3 advertises a default route to Spine 2 through BGP, and the next hop of the default route is the IP address of the Spine 3 interface directly connected to Spine 2. After Spine 1 restarts, Spine 2 keeps the session with Spine 1 until the session hold timer expires, and the routing table of Spine 2 still retains network route 100.1.1.0/24 received from Spine 1. However, this network route can be iterated only to the default route (0.0.0.0/0), because the IGP route to next hop 1.1.1.1 has become invalid and Spine 2 has no other route that covers IP address 1.1.1.1.

In the BGP routing table of Spine 2, the IGP metric is now 0 for the next hop of network route 100.1.1.0/24 from Spine 1, which corresponds to route entry 0.0.0.0/0 BGP 255 0 24.1.1.4 GE2/0/1. Network routes 100.1.1.0/24 from Spine 1 and the leaf device now have the same IGP metric, so they can establish a load balancing relationship. After traffic destined for network segment 100.1.1.0/24 arrives at Spine 2, half of the traffic is distributed to Spine 3. Spine 3 then forwards that traffic back to Spine 2, because Spine 3 learned network route 100.1.1.0/24 from Spine 1 and Spine 2. This causes a routing loop and traffic loss.
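The load-balancing condition described above can be illustrated with a simplified model. This sketch is an assumption-laden illustration, not the device's selection algorithm: it models only the two criteria discussed in this section, namely the IGP metric of the iterated next hop and, when the eibgp keyword is not configured, the preference for EBGP paths over IBGP paths. The Path type and the load_balanced_peers function are hypothetical.

```python
from typing import List, NamedTuple


class Path(NamedTuple):
    peer: str
    session: str     # "ebgp" or "ibgp"
    igp_metric: int  # cost of the route that the next hop is iterated to


def load_balanced_peers(paths: List[Path], eibgp: bool) -> List[str]:
    """Return the peers whose paths would share traffic (simplified model)."""
    if not eibgp:
        # Without eibgp, EBGP paths are preferred and IBGP paths cannot join them.
        ebgp_paths = [p for p in paths if p.session == "ebgp"]
        paths = ebgp_paths or paths
    lowest = min(p.igp_metric for p in paths)
    return [p.peer for p in paths if p.igp_metric == lowest]


# Before Spine 1 restarts: next hop 1.1.1.1 resolves through OSPF (metric 1).
BEFORE = [Path("Leaf 1", "ebgp", 0), Path("Spine 1", "ibgp", 1)]
# After Spine 1 restarts: 1.1.1.1 is iterated to the default route (metric 0).
AFTER = [Path("Leaf 1", "ebgp", 0), Path("Spine 1", "ibgp", 0)]
```

In this model, load_balanced_peers(AFTER, eibgp=True) returns ["Leaf 1", "Spine 1"], which corresponds to half of the traffic being sent toward Spine 3, while load_balanced_peers(AFTER, eibgp=False) returns only ["Leaf 1"].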

Analysis

Figure 51 shows the troubleshooting flowchart:

Figure 51 Troubleshooting flowchart

Solution

1. View the BGP routing table and IP routing table of Spine 2. This example uses the network shown in Figure 50 for illustration.

a. After Spine 1 restarts, if you execute the display bgp routing-table ipv4 command on Spine 2 before Spine 2 is disconnected from Spine 1, you can find that the two network routes 100.1.1.0/24 received from different devices are both selected as optimal routes.

<Spine2> display bgp routing-table ipv4

Total number of routes: 3

BGP local router ID is 2.2.2.2

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Network NextHop MED LocPrf PrefVal Path/Ogn

* >i 0.0.0.0/0 24.1.1.4 100 0 i

* >e 100.1.1.0/24 23.1.1.3 0 0 20i

* >i 1.1.1.1 0 100 0 20i

b. After you execute the display ip routing-table verbose command on Spine 2, you can find the following information:

- The two network routes 100.1.1.0/24 have established a load balancing relationship.

- For one of the routes, the real next hop is interface IP address 24.1.1.4 of Spine 3, and the output interface is the interface that directly connects Spine 2 to Spine 3.

<Spine2> display ip routing-table 100.1.1.0/24 verbose

Summary count : 2

Destination: 100.1.1.0/24

Protocol: BGP instance default

Process ID: 0

SubProtID: 0x5 Age: 00h00m13s

FlushedAge: 00h00m13s

Cost: 0 Preference: 255

IpPre: N/A QosLocalID: N/A

Tag: 0 State: Active Adv

OrigTblID: 0x0 OrigVrf: default-vrf

TableID: 0x2 OrigAs: 20

NibID: 0x16000002 LastAs: 10

AttrID: 0x2

BkAttrID: 0xffffffff Neighbor: 1.1.1.1

Flags: 0x10060 OrigNextHop: 1.1.1.1

Label: NULL RealNextHop: 24.1.1.4

BkLabel: NULL BkNextHop: N/A

SRLabel: NULL Interface: GigabitEthernet2/0/1

BkSRLabel: NULL BkInterface: N/A

Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/1

BkTunnel ID: Invalid BkIPInterface: N/A

InLabel: NULL ColorInterface: N/A

SIDIndex: NULL BkColorInterface: N/A

FtnIndex: 0x0 TunnelInterface: N/A

TrafficIndex: N/A BkTunnelInterface: N/A

Connector: N/A PathID: 0x0

UserID: 0x0 SRTunnelID: Invalid

SID Type: N/A NID: Invalid

FlushNID: Invalid BkNID: Invalid

BkFlushNID: Invalid StatFlags: 0x0

SID: N/A

BkSID: N/A

CommBlockLen: 0 Priority: Low

MemberPort: N/A

Destination: 100.1.1.0/24

Protocol: BGP instance default

Process ID: 0

SubProtID: 0x6 Age: 01h18m22s

FlushedAge: 00h00m13s

Cost: 0 Preference: 255

IpPre: N/A QosLocalID: N/A

Tag: 0 State: Active Adv

OrigTblID: 0x0 OrigVrf: default-vrf

TableID: 0x2 OrigAs: 20

NibID: 0x16000000 LastAs: 20

AttrID: 0x0

BkAttrID: 0xffffffff Neighbor: 23.1.1.3

Flags: 0x10060 OrigNextHop: 23.1.1.3

Label: NULL RealNextHop: 23.1.1.3

BkLabel: NULL BkNextHop: N/A

SRLabel: NULL Interface: GigabitEthernet2/0/3

BkSRLabel: NULL BkInterface: N/A

Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/3

BkTunnel ID: Invalid BkIPInterface: N/A

InLabel: NULL ColorInterface: N/A

SIDIndex: NULL BkColorInterface: N/A

FtnIndex: 0x0 TunnelInterface: N/A

TrafficIndex: N/A BkTunnelInterface: N/A

Connector: N/A PathID: 0x0

UserID: 0x0 SRTunnelID: Invalid

SID Type: N/A NID: Invalid

FlushNID: Invalid BkNID: Invalid

BkFlushNID: Invalid StatFlags: 0x0

SID: N/A

BkSID: N/A

CommBlockLen: 0 Priority: Low

MemberPort: N/A

c. After you execute the display ip routing-table command, you can find the following information:

- The IP routing table no longer contains any route that covers IP address 1.1.1.1 other than the default route.

- The output interface and next hop for the default route are GE2/0/1 and 24.1.1.4, respectively.

This indicates that network route 100.1.1.0/24 received from Spine 1 has been iterated to the default route.

<Spine2> display ip routing-table

Destinations : 23 Routes : 24

Destination/Mask Proto Pre Cost NextHop Interface

0.0.0.0/0 BGP 255 0 24.1.1.4 GE2/0/1

0.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

2.2.2.2/32 Direct 0 0 127.0.0.1 InLoop0

4.4.4.4/32 O_INTRA 10 1 24.1.1.4 GE2/0/1

12.1.1.0/24 Direct 0 0 12.1.1.2 GE2/0/2

12.1.1.0/32 Direct 0 0 12.1.1.2 GE2/0/2

12.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0

12.1.1.255/32 Direct 0 0 12.1.1.2 GE2/0/2

14.1.1.0/24 O_INTRA 10 2 24.1.1.4 GE2/0/1

23.1.1.0/24 Direct 0 0 23.1.1.2 GE2/0/3

23.1.1.0/32 Direct 0 0 23.1.1.2 GE2/0/3

23.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0

23.1.1.255/32 Direct 0 0 23.1.1.2 GE2/0/3

24.1.1.0/24 Direct 0 0 24.1.1.2 GE2/0/1

24.1.1.0/32 Direct 0 0 24.1.1.2 GE2/0/1

24.1.1.2/32 Direct 0 0 127.0.0.1 InLoop0

24.1.1.255/32 Direct 0 0 24.1.1.2 GE2/0/1

100.1.1.0/24 BGP 255 0 1.1.1.1 GE2/0/1

BGP 255 0 23.1.1.3 GE2/0/3

127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0

127.0.0.0/32 Direct 0 0 127.0.0.1 InLoop0

127.0.0.1/32 Direct 0 0 127.0.0.1 InLoop0

127.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

If none of the above situations exists, contact Technical Support for help.

2. Use one of the following methods to remove the routing loop:

¡ Configure routing policies to filter recursive routes.

In this scenario, create a routing policy on Spine 2 that filters out the default route, and apply it by executing the protocol bgp nexthop recursive-lookup route-policy route-policy-name command (IPv4) or the protocol bgp4+ nexthop recursive-lookup route-policy route-policy-name command (IPv6). This configuration eliminates the BGP routing loop by preventing BGP routes from being iterated to the default route.

¡ Enable BFD for BGP.

After BFD is enabled for BGP, Spine 1 and Spine 2 use a BFD session to monitor their link. If Spine 1 restarts, BFD detects the failure quickly, typically much sooner than the BGP hold timer expires. Spine 2 then promptly terminates the related BGP session and deletes the routes learned from Spine 1. To enable BFD for BGP, execute the peer bfd command. For more information about this task, see the command reference.

¡ Make sure EBGP and IBGP routes cannot establish a load balancing relationship.

In this example, the two routes for network segment 100.1.1.0/24 are learned from an IBGP peer and an EBGP peer, respectively. When you configure the balance command in the related BGP instance, do not specify the eibgp keyword. Without this keyword specified, Spine 2 selects only network route 100.1.1.0/24 received from the leaf device as the optimal route, according to the BGP route selection rules. This ensures that all traffic can be forwarded correctly.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
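One common way to build the routing policy mentioned in step 2 is to match routes with an IPv4 prefix list that denies exactly 0.0.0.0/0 and permits every other prefix. The following sketch models first-match prefix-list evaluation with an implicit deny at the end, to show why such a policy rejects only the default route. It is a hypothetical model for illustration; the entry layout and function names are assumptions, greater-equal matching is not modeled, and the device configuration commands themselves are not shown here.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class PrefixEntry(NamedTuple):
    action: str                       # "permit" or "deny"
    prefix: str                       # for example "0.0.0.0/0"
    less_equal: Optional[int] = None  # None: match the entry's own mask length only


def prefix_list_permits(entries: List[PrefixEntry], route: str) -> bool:
    """Evaluate entries in order; the first matching entry decides."""
    net = ipaddress.ip_network(route)
    for entry in entries:
        base = ipaddress.ip_network(entry.prefix)
        max_len = base.prefixlen if entry.less_equal is None else entry.less_equal
        # subnet_of() also guarantees net.prefixlen >= base.prefixlen.
        if net.subnet_of(base) and net.prefixlen <= max_len:
            return entry.action == "permit"
    return False  # no entry matched: implicit deny


# Hypothetical policy: deny exactly 0.0.0.0/0, permit every other IPv4 prefix.
NO_DEFAULT_ROUTE = [
    PrefixEntry("deny", "0.0.0.0/0"),
    PrefixEntry("permit", "0.0.0.0/0", less_equal=32),
]
```

With this model, the default route is rejected while routes such as 1.1.1.1/32 and 24.1.1.0/24 remain usable for next-hop iteration.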

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Public traffic interrupted in BGP network

Symptom

Public traffic is interrupted when it is forwarded through BGP.

Common causes

The following are the common causes of this type of issue:

· The next hop of the related BGP public route is unreachable.

· The distribution or reception policy for BGP public routes is inappropriate.

· The related route is discarded, because the number of BGP public routes has exceeded the maximum number of routes that the device can receive.

Analysis

Figure 52 shows the troubleshooting flowchart:

Figure 52 Troubleshooting flowchart

Solution

1. Identify whether the required BGP public route exists and is valid.

Based on the next hop of the BGP route, the expected forwarding path for public network traffic, and the network topology plan, locate the sender of the BGP public route. On the sender, execute the display bgp routing-table ipv4 unicast or display bgp routing-table ipv6 unicast command to view BGP public route information.

a. If the required BGP public route does not exist, use the import-route or network command to generate the route. After the BGP route is generated or if the required BGP public route already exists, proceed to step b.

b. Identify whether the required BGP public route is valid. A BGP route is valid only if it has a reachable next hop. Take route 10.2.1.0/24 as an example. If this route is marked with an asterisk (*) in the command output, it is a valid route.

<Sysname> display bgp routing-table ipv4 unicast

Total number of routes: 4

BGP local router ID is 192.168.100.1
Status codes: * - valid, > - best, d - dampened, h - history
              s - suppressed, S - stale, i - internal, e - external
              a - additional-path
Origin: i - IGP, e - EGP, ? - incomplete

Network NextHop MED LocPrf PrefVal Path/Ogn
* > 10.2.1.0/24 10.2.1.1 0 0 i
e 10.2.1.2 0 0 4294967295 i

View the command output to identify whether the required BGP public route is valid.

- If the BGP public route is invalid, the IP routing table does not have a route to the next hop of the BGP route. In this case, check for incorrect IP routing settings (IGP or static routing settings), and make sure the IP routing table contains a route to the next hop of the BGP route.

- If the BGP public route is valid, proceed to step 2.
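When many rows must be checked, the validity flag can be read programmatically. The following Python sketch is an illustration only, not an H3C tool. It assumes rows in the simplified single-space layout of the sample output above (real output is column-aligned), handles IPv4 only, and treats every legend character before the first IPv4 address on a row as a status code.

```python
from __future__ import annotations

import re

# Matches the first dotted-quad IPv4 token on a row (network or next hop).
_IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")
# Status codes taken from the legend in the sample output.
_STATUS_CODES = set("*>dhsSiea")


def status_flags(row: str) -> set[str]:
    """Return the status codes printed before the first IPv4 address on a route row."""
    match = _IPV4.search(row)
    prefix = row[: match.start()] if match else ""
    return {ch for ch in prefix if ch in _STATUS_CODES}


def is_valid(row: str) -> bool:
    """A BGP route is valid (its next hop is reachable) only if it carries '*'."""
    return "*" in status_flags(row)


if __name__ == "__main__":
    rows = [
        "* > 10.2.1.0/24 10.2.1.1 0 0 i",
        "e 10.2.1.2 0 0 4294967295 i",
    ]
    for row in rows:
        print(sorted(status_flags(row)), "valid" if is_valid(row) else "invalid")
```

Applied to the two sample rows, it reports the route through 10.2.1.1 as valid and best and the route through 10.2.1.2 as invalid.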

2. Identify whether the distribution or reception policy for BGP public routes is inappropriate.

Based on the next hop of the BGP route, the expected forwarding path for public network traffic, and the network topology plan, locate the sender and receiver of the BGP public route. On both the sender and the receiver, execute the display current-configuration configuration bgp command to view the effective BGP settings.

As shown in the following command output, the commands that define BGP route distribution or reception include:

¡ peer prefix-list

¡ peer filter-policy

¡ peer as-path-acl

¡ filter-policy

¡ peer route-policy

<Sysname> display current-configuration configuration bgp

bgp 20
 peer 12.1.1.1 as-number 10
 peer 23.1.1.3 as-number 30
 address-family ipv4 unicast
  filter-policy 2088 export
  network 9.9.9.9 255.255.255.255
  peer 12.1.1.1 enable
  peer 12.1.1.1 filter-policy 2077 export
  peer 12.1.1.1 route-policy test export
  peer 23.1.1.3 as-path-acl 2 export
  peer 23.1.1.3 enable
  peer 23.1.1.3 next-hop-local
  peer 23.1.1.3 prefix-list abc export
return

For more information about these commands, see BGP commands in Layer 3—IP Routing Command Reference. After you find the effective BGP settings, identify whether the configured distribution or reception policy affects the distribution or reception of BGP public routes.

¡ If the distribution or reception of BGP public routes is abnormal, correct the distribution or reception policy.

¡ If the distribution or reception of BGP public routes is normal, proceed to step 3.
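On a long configuration, the policy commands listed in this step can be pulled out with a simple filter before you review them. This Python sketch is a hypothetical helper, not part of the device software. It matches only the five command forms named above and assumes the configuration text has been copied from the display current-configuration configuration bgp output.

```python
import re

# The five policy commands listed in step 2. "\S+" stands for a peer address or group name.
POLICY_PATTERNS = (
    re.compile(r"^peer \S+ prefix-list\b"),
    re.compile(r"^peer \S+ filter-policy\b"),
    re.compile(r"^peer \S+ as-path-acl\b"),
    re.compile(r"^filter-policy\b"),
    re.compile(r"^peer \S+ route-policy\b"),
)


def policy_lines(config: str) -> list:
    """Return configuration lines that can filter BGP route distribution or reception."""
    hits = []
    for raw in config.splitlines():
        line = raw.strip()  # ignore the indentation used inside views
        if any(pattern.match(line) for pattern in POLICY_PATTERNS):
            hits.append(line)
    return hits


SAMPLE = """\
bgp 20
 peer 12.1.1.1 as-number 10
 peer 23.1.1.3 as-number 30
 address-family ipv4 unicast
  filter-policy 2088 export
  network 9.9.9.9 255.255.255.255
  peer 12.1.1.1 enable
  peer 12.1.1.1 filter-policy 2077 export
  peer 12.1.1.1 route-policy test export
  peer 23.1.1.3 as-path-acl 2 export
  peer 23.1.1.3 enable
  peer 23.1.1.3 next-hop-local
  peer 23.1.1.3 prefix-list abc export
return
"""

if __name__ == "__main__":
    for line in policy_lines(SAMPLE):
        print(line)
```

On the sample configuration, it returns the five policy lines and skips settings such as peer 23.1.1.3 next-hop-local.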

3. Identify whether the number of BGP routes has exceeded the maximum.

On the receiver of the BGP public route, execute the display current-configuration configuration bgp command to check for the peer route-limit command.

¡ If the peer route-limit command is configured and the receiver has generated the following log message:

BGP/4/BGP_EXCEED_ROUTE_LIMIT: BGP.: The number of routes from peer 1.1.1.1 (IPv4-UNC) exceeds the limit 100.

The sender of the BGP public route has advertised too many BGP routes, which causes some BGP public routes to be discarded by the receiver. In this case, use the following methods to resolve the issue:

- On the sending device, execute the aggregate command with the detail-suppressed or suppress-policy keyword specified to create summary routes and suppress advertisement of the more specific routes that they cover.

- On the receiving device, execute the peer route-limit command to increase the maximum number of routes that the device can receive.

¡ Proceed to step 4 if one of the following conditions exists:

- The peer route-limit command is not configured.

- The peer route-limit command is configured, but the number of routes received by the receiving device is below the upper limit (no BGP/4/BGP_EXCEED_ROUTE_LIMIT log message is generated).
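The decision in step 3 depends on the log message, so the peer and limit can be extracted from collected logs. This Python sketch is an assumption-laden illustration. The regular expression is written against the single sample message shown above, and other software versions might word the message differently. The function names are hypothetical.

```python
import re

# Written against the sample message in step 3; wording may differ in other software versions.
_LIMIT_LOG = re.compile(
    r"BGP_EXCEED_ROUTE_LIMIT:.*?routes from peer (?P<peer>\S+) "
    r"\((?P<af>[^)]+)\) exceeds the limit (?P<limit>\d+)"
)


def parse_limit_log(message: str):
    """Return (peer, address family, limit) from the log, or None if it does not match."""
    match = _LIMIT_LOG.search(message)
    if match is None:
        return None
    return match.group("peer"), match.group("af"), int(match.group("limit"))


def step3_decision(route_limit_configured: bool, limit_log_seen: bool) -> str:
    """Encode the branch described in step 3."""
    if route_limit_configured and limit_log_seen:
        return "aggregate routes on the sender or raise peer route-limit on the receiver"
    return "proceed to step 4"


if __name__ == "__main__":
    log = ("BGP/4/BGP_EXCEED_ROUTE_LIMIT: BGP.: The number of routes from peer "
           "1.1.1.1 (IPv4-UNC) exceeds the limit 100.")
    print(parse_limit_log(log))
    print(step3_decision(True, parse_limit_log(log) is not None))
```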

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

· 1.3.6.1.4.1.25506.2.202.4.0.1 hh3cBgpPeerRouteNumThresholdExceed

· 1.3.6.1.4.1.25506.2.202.4.0.2 hh3cBgpPeerRouteNumThresholdCleard

· 1.3.6.1.4.1.25506.2.202.4.0.3 hh3cBgpPeerRouteExceed

· 1.3.6.1.4.1.25506.2.202.4.0.4 hh3cBgpPeerRouteExceedClear

· 1.3.6.1.4.1.25506.2.202.4.0.5 hh3cBgpPeerEstablished

· 1.3.6.1.4.1.25506.2.202.4.0.6 hh3cBgpPeerBackwardTransition

Log messages

· BGP/4/BGP_EXCEED_ROUTE_LIMIT

· BGP/5/BGP_REACHED_THRESHOLD

IS-IS issues

IS-IS neighbor establishment failure

Symptom

· The IS-IS neighbor is down.

· The IS-IS neighbor relationship flaps.

Common causes

The following are the common causes of this type of issue:

· IS-IS cannot send or receive hello packets normally, because a device has underlying faults or the link between the two devices fails.

· The devices at the ends of the link use the same system ID.

· The interfaces connected by the link use different MTU settings, or the effective interface MTU is smaller than the size of the transmitted hello packets.

· The IP addresses of the interfaces connected by the link are not in the same network segment.

· The interfaces connected by the link use different authentication modes.

· The two ends of the link are at mismatching IS-IS levels.

· When the two devices establish an IS-IS Level-1 neighbor relationship, they use different area addresses.

Troubleshooting flow

Figure 53 shows the troubleshooting flowchart.

Figure 53 Flowchart for troubleshooting IS-IS neighbor establishment failures

Solution

1. Identify whether the IS-IS interface is up at the physical layer.

Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the physical state of the IS-IS interface. If the interface is down, first resolve interface failures. If the interface is up, proceed to step 2.

2. Check for link failures.

¡ Execute the ping command to identify whether the link between the two devices fails (including whether the transport devices fail). If the link is operating properly, proceed to step 3.

To have IS-IS use BFD for link state detection, execute the isis bfd session-restrict-adj command to enable BFD session state-based control of adjacency establishment and maintenance. The IS-IS interfaces will advertise BFD-enabled TLVs in hello packets to each other. If the exchanged BFD-enabled TLVs carry the same information, BFD session state-based control of adjacency establishment and maintenance takes effect. After the BFD session goes down, the interfaces cannot establish an IS-IS adjacency.

¡ Execute the display bfd session command to view the state of the BFD session that monitors the IS-IS link. If the State field displays Down, first resolve link failures. If the State field displays Up, proceed to step 3.

3. Identify whether the CPU or memory usage is too high.

¡ Execute the display cpu-usage command to identify whether the MPU and interface modules on the failed device have high CPU usage. If the CPU usage is too high, IS-IS cannot send or receive packets normally, which results in neighbor flapping. To resolve this issue, disable unnecessary features. If the CPU usage is not high, proceed to step 4.

¡ Execute the display memory-threshold command, and then view the Current free-memory state field in the command output. This field indicates the current free-memory state of the system. If this field displays Minor, Severe, or Critical, free memory is running low. As a result, the device might be unable to receive or send IS-IS packets, or might process IS-IS packets slowly. To resolve this issue, disable unnecessary features. If the current memory usage is normal, proceed to step 4.

4. Identify whether the state of the IS-IS interface is normal.

Execute the display isis interface command, and then view the IPv4 state or IPv6 state field to identify whether the IS-IS interface is in normal state.

¡ If the state of the IS-IS interface is Lnk:Up/IP:Dn, the interface is up at the link layer but is down at the network layer. You need to resolve interface faults at the network layer.

¡ If the state of the IS-IS interface is Up, proceed to step 5.

5. Identify whether IP addresses of the two IS-IS interfaces are in the same network segment.

¡ For IPv4 IS-IS, execute the display interface brief command to view the IPv4 address of each IS-IS interface.

- If the two IPv4 interface addresses are in different network segments, execute the ip address command on either of the interfaces to adjust its IPv4 address. This operation ensures that the IPv4 addresses of the two IS-IS interfaces are in the same network segment.

- If the two IPv4 interface addresses are in the same network segment, proceed to step 6.

¡ For IPv6 IS-IS, this check is not required.
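The same-segment check can be expressed with Python's standard ipaddress module. This sketch assumes that you know each interface address together with its prefix length. The addresses in the example are hypothetical. Different mask lengths produce different networks, and this sketch treats that case as a mismatch.

```python
import ipaddress


def same_segment(addr_a: str, addr_b: str) -> bool:
    """True if two 'address/prefix-length' interface addresses belong to the same network."""
    net_a = ipaddress.ip_interface(addr_a).network
    net_b = ipaddress.ip_interface(addr_b).network
    # Different mask lengths yield different networks, which this sketch treats as a mismatch.
    return net_a == net_b


if __name__ == "__main__":
    print(same_segment("10.1.1.1/24", "10.1.1.2/24"))  # True
    print(same_segment("10.1.1.1/24", "10.1.2.2/24"))  # False
```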

6. Identify whether the two IS-IS interfaces use the same MTU value.

Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view interface MTU information.

¡ If the two IS-IS interfaces use different MTU values, execute the mtu size command on either of the interfaces to adjust its MTU value. This operation ensures that the two IS-IS interfaces use the same MTU value.

¡ If the two IS-IS interfaces use the same MTU value, proceed to step 7.

7. Identify whether IS-IS can receive hello packets.

Execute the display isis packet hello by-interface verbose command to identify whether IS-IS can receive hello packets. If the device cannot receive hello packets, resolve the packet loss issue. If the issue persists, proceed to step 12.

If the device can receive hello packets, continue to perform the following checks:

¡ If the value for the Duplicate system ID field increases over time, a system ID conflict exists and you need to proceed to step 8.

¡ If the value for the Mismatched level (LAN) field increases over time, an IS level mismatch exists and you need to proceed to step 9.

¡ If the value for the Bad area address TLV field increases over time, an area address mismatch exists and you need to proceed to step 10.

¡ If the values for other fields increase over time, proceed to step 12.
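Because step 7 is about counters that increase over time, you need two readings of the display isis packet hello by-interface verbose output. The following Python sketch compares two hand-copied readings and returns the steps to perform. The field names are taken from the text above. The counter values, and any field name not listed in step 7, are hypothetical.

```python
from __future__ import annotations

# Field names come from step 7; any other increasing counter leads to step 12.
FIELD_TO_STEP = {
    "Duplicate system ID": 8,
    "Mismatched level (LAN)": 9,
    "Bad area address TLV": 10,
}


def steps_to_check(before: dict[str, int], after: dict[str, int]) -> list[int]:
    """Compare two readings of the hello error counters and return the steps to perform."""
    steps = set()
    for field, value in after.items():
        if value > before.get(field, 0):
            steps.add(FIELD_TO_STEP.get(field, 12))
    return sorted(steps)


if __name__ == "__main__":
    first = {"Duplicate system ID": 0, "Mismatched level (LAN)": 3, "Bad area address TLV": 0}
    second = {"Duplicate system ID": 0, "Mismatched level (LAN)": 7, "Bad area address TLV": 0}
    print(steps_to_check(first, second))  # [9]
```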

8. Identify whether the devices connected by the link use the same system ID.

Execute the display current-configuration isis command to view the system IDs of the devices.

¡ If their system IDs are identical, change the system ID of either device.

¡ If their system IDs are different, proceed to step 9.

9. Identify whether the devices connected by the link have an IS level mismatch.

Identify the IS level of each device and the circuit level of each IS-IS interface.

¡ Execute the display current-configuration | include is-level command to view the IS levels of the two devices connected by the link. If this command does not display the IS level of a device, the IS level of the device is Level-1-2, the default value.

¡ Execute the display current-configuration interface interface-type interface-number | include circuit-level command to view the circuit levels of IS-IS interfaces. If this command does not display the circuit level of an IS-IS interface, the circuit level of the interface is Level-1-2, the default value. The interface can establish both Level-1 and Level-2 adjacencies.

Two IS-IS interfaces can establish an IS-IS neighbor relationship only when their circuit levels meet one of the following requirements:

¡ If the circuit level of the local interface is Level-1, the circuit level of the remote interface must be Level-1 or Level-1-2.

¡ If the circuit level of the local interface is Level-2, the circuit level of the remote interface must be Level-2 or Level-1-2.

¡ If the circuit level of the local interface is Level-1-2, the circuit level of the remote interface can be Level-1, Level-2, or Level-1-2.

Perform one of the following troubleshooting operations accordingly:

¡ If the two devices have an IS level mismatch, execute the is-level command in IS-IS view for either of the devices to adjust its IS level. Alternatively, execute the isis circuit-level command in interface view for the desired interface to adjust its circuit level.

¡ If the IS levels of the two devices are matching, proceed to step 10.
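The compatibility rules above reduce to a set intersection. In the following Python sketch, each level setting is modeled as the set of levels it allows. The sketch assumes that the levels an interface can actually use are the intersection of the device IS level and the interface circuit level. This is a simplification for illustration, not a statement of the device's internal logic.

```python
from __future__ import annotations

# Levels each configured value allows. Level-1-2 is the default for both settings.
LEVELS = {"level-1": {1}, "level-2": {2}, "level-1-2": {1, 2}}


def effective_levels(is_level: str, circuit_level: str) -> set[int]:
    """Levels an interface can run, assumed to be limited by both the IS level and the circuit level."""
    return LEVELS[is_level] & LEVELS[circuit_level]


def adjacency_levels(local: set[int], remote: set[int]) -> set[int]:
    """Levels at which two interfaces can form an adjacency; an empty set means a level mismatch."""
    return local & remote


if __name__ == "__main__":
    a = effective_levels("level-1-2", "level-1")  # {1}
    b = effective_levels("level-2", "level-1-2")  # {2}
    print(adjacency_levels(a, b))                 # set(), so the levels mismatch
```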

10. Identify whether the area addresses of the two devices connected by the link are matching.

Execute the display isis command, and then view the Network entity field in the command output to identify whether the area addresses of the devices are matching. The network entity title (NET) format is X…X.XXXX.XXXX.XXXX.00. The X…X segment represents the area address, the XXXX.XXXX.XXXX segment represents the system ID, and the 00 segment is the SEL.

¡ Two IS-IS devices can establish a Level-1 neighbor relationship only when they are in the same area. When they establish an IS-IS Level-2 neighbor relationship, area address check is not required.

When the two devices fail to establish a Level-1 neighbor relationship due to area address inconsistency, execute the network-entity command in IS-IS view for either of the devices to adjust its area address.

¡ If the area addresses of the two devices are matching, proceed to step 11.
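The NET layout described above (area address, then a three-group system ID, then the SEL) can be split mechanically. This Python sketch uses that layout to compare area addresses for step 10 and system IDs for step 8. It assumes one NET per device, and the NETs in the example are hypothetical.

```python
from __future__ import annotations


def parse_net(net: str) -> tuple[str, str, str]:
    """Split a NET of the form X...X.XXXX.XXXX.XXXX.00 into (area, system ID, SEL)."""
    parts = net.split(".")
    if len(parts) < 5:
        raise ValueError(f"not a NET: {net!r}")
    area = ".".join(parts[:-4])         # everything before the system ID
    system_id = ".".join(parts[-4:-1])  # three 4-digit groups
    sel = parts[-1]                     # "00" for a NET
    return area, system_id, sel


def level1_area_ok(net_a: str, net_b: str) -> bool:
    """Level-1 neighbors must be in the same area (step 10); Level-2 skips this check."""
    return parse_net(net_a)[0] == parse_net(net_b)[0]


def system_id_conflict(net_a: str, net_b: str) -> bool:
    """Two devices must not share a system ID (step 8)."""
    return parse_net(net_a)[1] == parse_net(net_b)[1]


if __name__ == "__main__":
    a, b = "10.0000.0000.0001.00", "20.0000.0000.0002.00"
    print(parse_net(a))          # ('10', '0000.0000.0001', '00')
    print(level1_area_ok(a, b))  # False
```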

11. Identify whether the two devices connected by the link are in the same IS-IS authentication mode.

Execute the display current-configuration interface interface-type interface-number | include isis command to view the authentication mode of the IS-IS interface on each device.

a. If the two IS-IS interfaces are in different authentication modes, execute the isis authentication-mode command on either of the IS-IS interfaces to adjust its authentication mode. This operation ensures that the two IS-IS interfaces are in the same authentication mode.

b. If the two IS-IS interfaces are in the same authentication mode and still fail to establish a neighbor relationship, verify that they use the same authentication password.

If the issue persists, proceed to step 12.

12. Collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

Module name: ISIS-MIB

isisAdjacencyChange (1.3.6.1.2.1.138.0.17)

Log messages

ISIS/3/ISIS_NBR_CHG

IS-IS route learning failure

Symptom

A device cannot learn an IS-IS route.

Common causes

The following are the common causes of this type of issue:

· Other routing protocols have advertised routes with the same destination address, and their protocol preferences are higher than that for IS-IS.

· The route is not selected as an optimal route, because it is redistributed into IS-IS and its preference is low.

· The route is not selected as an optimal route, because it is redistributed into IS-IS and is of a different cost type.

· The device and the advertisement source device are in different IS-IS cost styles.

· The device and the advertisement source device do not establish a normal IS-IS neighbor relationship.

· The device and the advertisement source device are configured with the same system ID.

· LSP authentication fails.

· Some LSPs are lost because the device has underlying faults or the link between the two devices fails.

· The device cannot receive the LSPs from the advertisement source device, because the LSP length has exceeded the maximum length of LSPs that the device can receive.

Troubleshooting flow

Figure 54 shows the troubleshooting flowchart.

Figure 54 Flowchart for troubleshooting IS-IS route learning failures

Solution

1. Identify whether the IS-IS routing table contains the desired IS-IS route.

Execute the display isis route command to view the IS-IS routing table.

¡ If the IS-IS route exists in the IS-IS routing table, execute the display ip routing-table ip-address [ mask | mask-length ] verbose command to check for routes with protocol preferences higher than that for IS-IS routes.

- If such routes exist, adjust the configuration according to the network plan.

- If such routes do not exist, proceed to step 7.

¡ If the IS-IS route does not exist in the IS-IS routing table, proceed to step 2.

2. Identify whether the desired IS-IS route is advertised.

Execute the display isis lsdb verbose local command on the advertisement source device to identify whether the device has advertised LSPs that carry the IS-IS route.

¡ If no LSPs carry the IS-IS route, check for incorrect IS-IS configurations on the advertisement source device. For example, check whether IS-IS is enabled on the related interface. If the IS-IS route is an external route redistributed into IS-IS, execute the display ip routing-table protocol protocol verbose command, and then view the State field of the route. If this field contains Inactive, the external route is inactive, and IS-IS does not advertise inactive routes. In this situation, adjust the configuration related to external routes so that the State field of the route contains Active and Adv.

¡ If an LSP that carries the IS-IS route exists, proceed to step 3.

3. Identify whether the desired IS-IS route is of the same cost type as other redistributed routes with the same destination address.

When multiple devices advertise routes to the same destination through route redistribution and these external routes need to form a load balancing relationship, make sure these routes are of the same cost type after redistribution by IS-IS. The cost value for a redistributed route varies by its cost type:

¡ If the cost type is external, the cost value equals the original cost value plus 64 when IS-IS advertises the route in LSPs.

¡ If the cost type is internal, the cost value equals the original cost value when IS-IS advertises the route in LSPs.

By default, the cost type is external for external routes redistributed by H3C devices. If the cost type of external routes redistributed by non-H3C devices is not external, the cost values for routes with the same destination address will be different. As a result, IS-IS neighbors will select the route with the lowest cost value as the optimal route. In this case, adjust the cost type of redistributed external routes to ensure that the external routes redistributed by devices from various vendors are all of the same cost type. To adjust the cost type of external routes redistributed by an H3C device:

a. On the device that advertises the desired IS-IS route, execute the display current-configuration configuration isis command to view the route redistribution configuration for IS-IS.

b. Execute the import-route command to adjust the cost type of external routes redistributed into IS-IS.

In situations other than those mentioned above, proceed to step 4.
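The cost rule above explains why mixed cost types break load balancing. This Python sketch applies that rule (external adds 64, internal keeps the original cost). It assumes that the path cost from the receiving device to each advertising device is equal, so only the advertised cost decides. The costs in the example are hypothetical.

```python
EXTERNAL_OFFSET = 64  # per step 3: the external cost type adds 64 to the original cost


def advertised_cost(original_cost: int, cost_type: str) -> int:
    """Cost that IS-IS advertises in LSPs for a redistributed route, as described in step 3."""
    if cost_type == "external":
        return original_cost + EXTERNAL_OFFSET
    if cost_type == "internal":
        return original_cost
    raise ValueError(f"unknown cost type: {cost_type!r}")


def can_load_balance(routes: list) -> bool:
    """routes: (original_cost, cost_type) per advertising device; equal LSP costs are required."""
    return len({advertised_cost(cost, ctype) for cost, ctype in routes}) == 1


if __name__ == "__main__":
    print(can_load_balance([(10, "external"), (10, "internal")]))  # False: 74 vs 10
    print(can_load_balance([(10, "external"), (10, "external")]))  # True
```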

4. Identify whether the IS-IS database has been synchronized.

On the device that cannot learn the IS-IS route, execute the display isis lsdb command to identify whether the device has received an LSP that contains the IS-IS route from the advertisement source device.

¡ If the desired LSP does not exist in the LSDB, check for link failures. If no link failures are found, execute the display isis command, and then view the LSP length receive field, which shows the maximum length of LSPs that the device can receive. If the related LSP is longer than this value, the device cannot receive it. In this case, use the lsp-length originate command on the advertisement source to change the maximum length of generated LSPs. Make sure the maximum length of LSPs generated by the advertisement source equals the minimum IS-IS interface MTU within the current area.

¡ If the desired LSP exists in the LSDB, but the following conditions exist, the system ID of the advertisement source device conflicts with that of another device:

- The value for the Seq Num field of the LSP keeps increasing.

- The value for the Seq Num field of the LSP is different from that on the advertisement source device. To view the value for the Seq Num field of the LSP on the advertisement source device, use the display isis lsdb local verbose command.

In this situation, find the device that uses the same system ID as the advertisement source device, and then change the system ID of either device.

¡ If the desired LSP exists in the LSDB, but the following conditions exist, LSPs might be discarded during transmission:

- The value for the Seq Num field of the LSP remains unchanged.

In this situation, check for underlying faults on the device and identify whether the intermediate links between the device and the advertisement source device fail.

¡ If the following conditions exist, proceed to step 5:

- The desired LSP exists in the LSDB.

- The value for the Seq Num field of the LSP is the same as that on the advertisement source device. To view the value for the Seq Num field of the LSP on the advertisement source device, use the display isis lsdb local verbose command.
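The branches in step 4 depend on how the Seq Num value behaves over time, so you need several readings. This Python sketch is one simplified reading of those branches. It assumes you record the local Seq Num at successive times and read the source's value once. Treating a rising, unmatched value as a system ID conflict and an unchanged, unmatched value as a transit loss follows the text above, but real cases might need more judgment.

```python
from __future__ import annotations

from typing import Optional


def classify_lsp(local_seqs: list[int], source_seq: Optional[int]) -> str:
    """Classify step 4 observations.

    local_seqs: Seq Num values of the LSP read from the local LSDB at successive
    times (empty if the LSP is absent). source_seq: the value shown on the
    advertisement source by display isis lsdb local verbose.
    """
    if not local_seqs:
        return "missing: check links and LSP length receive"
    latest = local_seqs[-1]
    if source_seq is not None and latest == source_seq:
        return "synchronized: proceed to step 5"
    if latest > local_seqs[0]:
        return "system ID conflict suspected"
    return "LSPs possibly discarded in transit"


if __name__ == "__main__":
    # If Seq Num is shown in hexadecimal, int(value, 16) converts it.
    print(classify_lsp([int("0x1a", 16), int("0x1c", 16)], int("0x0f", 16)))
```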

5. Identify whether the device and the advertisement source device use the same cost style.

Execute the display isis command on the device and the advertisement source device separately, and then view the value for the Cost style field to identify whether the two devices use the same cost style. They can learn routes from each other only if their cost styles are the same.

¡ If the two devices use different cost styles, execute the cost-style command in IS-IS view for either of the devices to change its cost style.

¡ If the two devices use the same cost style, proceed to step 6.

6. Identify whether all devices along the path between the device and the advertisement source device have established IS-IS neighbor relationships correctly.

Execute the display isis peer command on each device to check for abnormal IS-IS neighbor relationships.

¡ If some neighbor relationships are established incorrectly, resolve this issue as described in "IS-IS neighbor establishment failure."

¡ If all neighbor relationships are established correctly, proceed to step 7.

7. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

IS-IS route flapping

Symptom

An IS-IS route is repeatedly added and deleted.

Common causes

The following are the common causes of this type of issue:

· The IS-IS neighbor flaps.

· The MPLS LSP tunnel flaps.

· The local and remote devices both redistribute the same external route into IS-IS, and the external route takes precedence over the IS-IS route.

· The local and remote devices are configured with the same system ID.

Troubleshooting flow

Figure 55 shows the troubleshooting flowchart.

Figure 55 Flowchart for troubleshooting IS-IS route flapping

Solution

1. View the route flapping details.

Execute the display ip routing-table ip-address verbose command to view the route flapping details as follows:

¡ If the TunnelID field of the IS-IS route changes before and after route flapping, identify whether the MPLS LSP tunnel flaps.

Execute the display mpls lsp verbose command, and then view the Last Chg Time field, which shows when the state of the LDP LSP last changed. If this time is close to the time when you executed the command, the MPLS LSP tunnel is flapping.

In this situation, check for and troubleshoot LSP flapping issues. For more information, see the solutions for LDP LSP flapping and for TE tunnels that suddenly change from up to down.

¡ If the Cost or Interface field of the IS-IS route changes, check for IS-IS neighbor flapping along the route.

¡ If the route sometimes appears and sometimes disappears in the routing table (the Age field is flapping), execute the display isis lsdb verbose command to find the LSP that carries the IS-IS route. Record the LSP ID, and then execute the display isis lsdb verbose lsp-id command to view the update status of this LSP.

- If the LSP always carries the IS-IS route, check for IS-IS neighbor flapping along the route.

- If the Seq Num field of the LSP keeps increasing and the LSP content changes significantly between updates, check for devices configured with the same system ID in the network.

- If the Seq Num field of the LSP keeps increasing and the IS-IS route is intermittently present before and after the LSP update, perform step 2 on the device that generated the LSP.

¡ If the Protocol field of the IS-IS route changes, proceed to step 2.
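When you compare two display ip routing-table ip-address verbose snapshots, the changed fields point to the check to perform. This Python sketch encodes the TunnelID, Cost/Interface, and Protocol branches of step 1. It does not model the Age-flapping branch, which requires inspecting the LSP itself. The snapshot values in the example are hypothetical.

```python
from __future__ import annotations


def flap_actions(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Map fields that changed between two verbose route snapshots to the step 1 checks."""
    changed = {k for k in before.keys() | after.keys() if before.get(k) != after.get(k)}
    actions = []
    if "TunnelID" in changed:
        actions.append("check MPLS LSP tunnel flapping")
    if changed & {"Cost", "Interface"}:
        actions.append("check IS-IS neighbor flapping along the route")
    if "Protocol" in changed:
        actions.append("check the route redistribution configuration (step 2)")
    return actions


if __name__ == "__main__":
    # Hypothetical snapshot values, for illustration only.
    t0 = {"Protocol": "IS_L1", "Cost": "20", "Interface": "GE2/0/1", "TunnelID": "0x0"}
    t1 = {**t0, "Cost": "30"}
    print(flap_actions(t0, t1))
```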

2. Check the route redistribution configuration for IS-IS.

If the IS-IS route is an external route redistributed into IS-IS, execute the display ip routing-table ip-address verbose command on the device into which the route was redistributed. This command displays the route flapping details.

¡ If the IS-IS route is not active and another IS-IS route with the same destination address is in Active state in the routing table, it indicates that other IS-IS devices in the network have advertised the same route. To resolve this issue, perform one of the following operations:

- Adjust the route preference according to the network plan.

- Configure a route filtering policy on the IS-IS device that redistributes the external route to control which routes are installed in the IP routing table.

¡ In other situations, proceed to step 3.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting OSPFv3

OSPFv3 neighbor down

Symptom

· The OSPFv3 neighbor goes down.

· OSPFv3 neighbor flapping occurs.

Common causes

The following are the common causes for this type of issue:

· The BFD session is down, which indicates that BFD detects a link failure.

· The remote device fails.

· CPU usage or memory usage is excessively high.

· Link failures occur.

· The OSPFv3 interface is not up.

· The IP addresses of the two ends are not on the same network.

· The OSPFv3 settings of the two ends do not match.

¡ Router ID conflict occurs.

¡ Area types of the two ends are not consistent.

¡ OSPFv3 authentication modes of the two ends are not consistent.

¡ The timer settings of the two ends are not consistent.

¡ The network types of the OSPFv3 interfaces at the two ends do not match.

Troubleshooting flow

Figure 56 shows the troubleshooting flowchart.

Figure 56 Flowchart for troubleshooting OSPFv3 neighbor down

Solution

1. Identify the reason for the OSPFv3 neighbor down issue through the CLI.

Execute the display ospfv3 event-log peer command. The Reason field in the command output represents the reasons for neighbor state changes. Common options include:

¡ DeadExpired

The device does not receive any Hello packet before the dead timer expires, and the OSPFv3 neighbor state becomes Down. In this case, proceed to the next step.

¡ BFDDown

The BFD session goes down, causing the OSPFv3 neighbor state to become Down. In this case, proceed to the next step.

¡ 1-Way

The neighbor’s OSPFv3 state becomes Down, and the neighbor sends 1-way Hello packets to the local device, which causes the OSPFv3 neighbor state on the local device to become Init. In this case, troubleshoot faults on the neighbor.

¡ IntPhyChange

The interface goes down or its MTU changes, tearing down the neighbor relationship. In this case, execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the running state and related information about the interface, and troubleshoot the interface faults. For other situations, proceed to step 11.
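The Reason values above map directly to the next action. This Python lookup sketch paraphrases that mapping for use in a checklist or script that parses collected display ospfv3 event-log peer output. The parsing itself is not shown. The action strings are paraphrases of the text above, not device output.

```python
# Reason values and actions paraphrased from step 1.
REASON_ACTIONS = {
    "DeadExpired": "no Hello before the dead timer expired: continue with step 2",
    "BFDDown": "BFD session went down: continue with step 2",
    "1-Way": "neighbor went Down and sent 1-way Hello packets: troubleshoot the neighbor",
    "IntPhyChange": "interface went down or its MTU changed: check display interface output",
}


def action_for(reason: str) -> str:
    """Return the next action for a Reason value from display ospfv3 event-log peer."""
    return REASON_ACTIONS.get(reason, "other reason: proceed to step 11")


if __name__ == "__main__":
    for reason in ("DeadExpired", "1-Way", "SomethingElse"):
        print(reason, "->", action_for(reason))
```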

2. Identify whether the physical layer state of the interface is Up.

Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the physical layer state of the OSPFv3 interface. If the physical layer state is Down, first troubleshoot the interface faults. If the physical layer state is Up, proceed to step 3.

3. Identify whether the link fails.

Execute the ping command to identify whether the link, including transmission devices, fails. If the link operates correctly, proceed to step 4.

4. Identify whether the CPU usage is excessively high.

Execute the display cpu-usage command to identify whether the CPU usage of the device's MPU and interface modules is excessively high. High CPU usage prevents OSPFv3 packets from being sent and received normally, causing neighbor flapping. To resolve this issue, disable unnecessary features. If the CPU usage is not high, proceed to the next step.

5. Identify whether the memory usage exceeds the memory usage threshold.

Execute the display memory-threshold command. If the Current free-memory state field in the output, which represents the current memory usage of the system, displays Minor, Severe, or Critical, the remaining free memory is low. In this case, the device might be unable to send or receive OSPFv3 packets, or might process OSPFv3 packets slowly. To resolve this issue, disable unnecessary features. If the Current free-memory state field displays Normal, proceed to step 6.

6. Identify whether each OSPFv3 interface is in a normal state.

Execute the display ospfv3 interface command to identify whether the OSPFv3 interface is in a normal state.

¡ If the OSPFv3 interface is in Down state, identify whether OSPFv3 is enabled on the interface. If OSPFv3 is enabled, troubleshoot the interface issue on the network layer.

¡ If the OSPFv3 interface is in a normal state, including DR, BDR, DROther, and P-2-P, proceed to the next step.

7. Identify whether the OSPFv3 interfaces have the same MTU value.

If the ospfv3 mtu-ignore command is not executed for the interfaces, the interfaces must have the same MTU value. If they have different MTU values, OSPFv3 neighbor relationships cannot be established. Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the MTU information of each interface.

¡ If the interfaces have different MTU values, execute the mtu size command in interface view to configure the same MTU value for the interfaces.

¡ If the interfaces have the same MTU value, proceed to step 8.

8. Identify whether at least one interface has a non-zero DR priority.

On a broadcast or NBMA network, to elect a DR correctly, make sure that a minimum of one OSPFv3 interface has a non-zero DR priority. If all OSPFv3 interfaces have the DR priority of zero, the neighbor states at both ends can only become 2-Way. Execute the display ospfv3 interface command to view OSPFv3 interface information. The Priority field in the command output displays the DR priority of the interface.

If one or multiple interfaces have non-zero DR priorities, proceed to the next step.
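The DR priority rule in this step can be sketched in Python as follows. This is a minimal illustration, not device code. dr_election_possible is a hypothetical helper, and the priorities are assumed to have been copied by hand from the Priority field of display ospfv3 interface on every router attached to the segment.

```python
from __future__ import annotations


def dr_election_possible(priorities: list[int]) -> bool:
    """Return True if at least one interface on the segment can become DR.

    An interface with DR priority 0 is never elected DR or BDR. If every
    priority is 0, no DR exists and neighbors can reach only 2-Way.
    """
    return any(p > 0 for p in priorities)


# Example: two routers on one broadcast segment.
print(dr_election_possible([0, 0]))  # False: neighbors stop at 2-Way
print(dr_election_possible([0, 1]))  # True: a DR can be elected
```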

9. Identify whether a neighbor has been manually specified for the NBMA or P2MP unicast interface.

When the network type of an interface is NBMA or P2MP unicast, you must use the ospfv3 peer command to specify a neighbor by its link-local address. Execute the display this command in interface view to view the network type and neighbor settings of the interface. If the network type is NBMA or P2MP unicast and no neighbor is specified, execute the ospfv3 peer command to specify the neighbor by its link-local address.

If a neighbor has been manually specified for the NBMA or P2MP unicast interface, proceed to the next step.
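A hypothetical Python check for this step is shown below. It assumes the network type is spelled as in this guide ("NBMA" or "P2MP unicast") and that the peer addresses were copied from the ospfv3 peer settings. It also flags any peer that is not an IPv6 link-local address (fe80::/10), because this step requires link-local addresses.

```python
from __future__ import annotations

import ipaddress

# Network types that, per this step, require manually specified neighbors.
NEEDS_MANUAL_PEER = {"NBMA", "P2MP unicast"}


def peer_config_problems(network_type: str, peers: list[str]) -> list[str]:
    """Return a list of problems with the manual OSPFv3 neighbor settings."""
    problems = []
    if network_type in NEEDS_MANUAL_PEER and not peers:
        problems.append("no ospfv3 peer configured")
    for peer in peers:
        # Each manually specified neighbor must be a link-local address.
        if not ipaddress.IPv6Address(peer).is_link_local:
            problems.append(f"{peer} is not a link-local address")
    return problems


print(peer_config_problems("NBMA", ["FE80::2"]))  # []
```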

10. Identify whether the OSPFv3 settings at the two ends match.

a. Execute the display ospfv3 command to view the OSPFv3 router IDs of the two ends. If the two ends are configured with the same router ID, edit the configuration to avoid the conflict. If the two ends are configured with different router IDs, proceed to the next step.

b. Execute the display ospfv3 interface command to view the area IDs of the two ends. If the two ends are configured with different area IDs, edit the configuration to ensure consistency. If the two ends are configured with the same area ID, proceed to the next step.

c. Execute the display ospfv3 interface command to view the network types of interfaces at the two ends. If the two interfaces are configured with different network types, edit the configuration to ensure consistency. If PTP is specified for one end and broadcast for the other, the neighbor relationship can enter Full state, but routing information cannot be calculated.

If the two ends are configured with the same network type, proceed to the next step.

d. Execute the display ospfv3 statistics error command every 10 seconds for 5 minutes to view OSPFv3 error statistics. Pay attention to the following fields:

- Authentication failure: If the value of this field keeps increasing, the two neighbors are configured with different OSPFv3 authentication modes. To resolve this issue, configure the same authentication mode for them.

- HELLO: Hello-time mismatch: If the value of this field keeps increasing, the two interfaces are configured with different hello intervals. To resolve this issue, configure the same hello interval for them.

- HELLO: Dead-time mismatch: If the value of this field keeps increasing, the two interfaces are configured with different dead intervals. To resolve this issue, configure the same dead interval for them.

- HELLO: Ebit option mismatch: If the value of this field keeps increasing, the areas to which the two neighbors belong are of different types. For example, one is a normal area and the other is a stub or NSSA area. To resolve this issue, configure the same area type for them.

If the issue persists, proceed to step 11.
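The polling in step 10d can be sketched as follows. This is an illustration only. It assumes that you recorded the watched counters of display ospfv3 statistics error as one dictionary per 10-second poll (about 30 polls in 5 minutes). It also treats "keeps increasing" as "grew between the first and last poll", which is a simplification chosen for this sketch. The field names are copied from this guide, and rising_counters is a hypothetical helper.

```python
from __future__ import annotations

# Counters watched in step 10d and the likely cause of a rising value.
LIKELY_CAUSE = {
    "Authentication failure": "different authentication modes",
    "HELLO: Hello-time mismatch": "different hello intervals",
    "HELLO: Dead-time mismatch": "different dead intervals",
    "HELLO: Ebit option mismatch": "different area types",
}


def rising_counters(samples: list[dict[str, int]]) -> dict[str, str]:
    """Return {counter: likely cause} for watched counters that grew.

    samples holds one {counter: value} snapshot per poll, oldest first.
    """
    if len(samples) < 2:
        return {}
    first, last = samples[0], samples[-1]
    return {
        name: cause
        for name, cause in LIKELY_CAUSE.items()
        if last.get(name, 0) > first.get(name, 0)
    }


# Example: three polls in which only the hello-interval counter grows.
polls = [
    {"HELLO: Hello-time mismatch": 4, "Authentication failure": 2},
    {"HELLO: Hello-time mismatch": 6, "Authentication failure": 2},
    {"HELLO: Hello-time mismatch": 8, "Authentication failure": 2},
]
print(rising_counters(polls))  # {'HELLO: Hello-time mismatch': 'different hello intervals'}
```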

11. If the issue persists, collect the following information and contact Technical Support:

o Results of each step.

o The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

Module name: OSPFV3-MIB

· ospfv3VirtIfStateChange (1.3.6.1.2.1.191.0.1)

· ospfv3NbrStateChange (1.3.6.1.2.1.191.0.2)

· ospfv3VirtNbrStateChange (1.3.6.1.2.1.191.0.3)

Log messages

· OSPFV3/6/OSPFV3_LAST_NBR_DOWN

· OSPFV3/5/OSPFV3_NBR_CHG

OSPFv3 neighbor unable to enter Full state

Symptom

The OSPFv3 neighbor state machine involves neighbor states of Down, Init, 2-way, ExStart, Exchange, Loading, and Full. Among them, the stable states are Down, 2-way, and Full.

· Down—OSPFv3 is not enabled.

· 2-way—The neighbor relationship between DR Others.

· Full—The local device and the neighbor are fully adjacent.

In networks using OSPFv3 for route calculation and forwarding, only 2-way and Full are normal neighbor states. If the neighbor state is neither 2-way nor Full, it indicates an abnormal neighbor relationship.

Common causes

The following are the common causes for this type of issue:

· OSPFv3 packets are dropped due to link failures.

· The DR priority configuration for the interfaces is not appropriate.

· The two ends are configured with different MTU values.

Troubleshooting flow

Figure 57 shows the troubleshooting flowchart.

Figure 57 Flowchart for troubleshooting OSPFv3 neighbor unable to enter Full state

Solution

1. Execute the display ospfv3 peer command to view OSPFv3 neighbor information, and perform different tasks based on the neighbor state.

o If no neighbor information exists:

Identify whether a Router ID is configured for the OSPFv3 process. If no Router ID is configured, the OSPFv3 process cannot run. If a Router ID is configured, the OSPFv3 neighbor has gone down or is flapping.

o If the neighbor state remains Init:

The remote device cannot receive Hello packets from the local device. In this case, identify whether the link or the remote device has failed.

o If the neighbor state remains 2-Way:

Execute the display ospfv3 interface verbose command to identify whether the DR priority of the neighbor-facing OSPFv3 interface is zero.

If the DR priority of the OSPFv3 interface is zero, no action is required.

If the DR priority of the OSPFv3 interface is not zero, proceed to step 2.

o If the neighbor state remains ExStart:

It indicates that the device is performing DD negotiation but cannot perform DD synchronization. The following are the common causes for this type of issue:

- The interface cannot send and receive oversized packets correctly.

Execute the ping ipv6 -s packet-size neighbor-address command multiple times, with the packet-size argument set to 1500 or greater, and view the numbers of transmitted and received packets. If the remote end cannot be pinged, resolve the link issue first.

- The two ends are configured with different MTU values.

If the OSPFv3 interface is not configured to ignore MTU check by using the ospfv3 mtu-ignore command, identify whether the two ends are configured with the same MTU value. If they are configured with different MTU values, configure the same MTU value for them.

If the issue persists, proceed to step 2.

o If the neighbor state remains Exchange:

It indicates that the device is performing DD packet exchange. Troubleshoot this issue in the same way as when the neighbor state remains ExStart.

If the issue persists, proceed to step 2.

o If the neighbor state remains Loading:

Execute the reset ospfv3 [ process-id ] process command to restart the OSPFv3 process.

If the issue persists, proceed to step 2.
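The decision logic of step 1 can be written as a small lookup table. The following Python is a hypothetical summary of this section, not a device feature. The state names are spelled as in this guide, and None stands for "no neighbor information in display ospfv3 peer".

```python
from typing import Optional

# States this guide treats as normal for an OSPFv3 neighbor.
NORMAL_STATES = {"2-Way", "Full"}

# Action from step 1 for a neighbor that stays in a given state.
STUCK_STATE_ACTION = {
    None: "Check that a Router ID is configured; if it is, the neighbor went down or flapped.",
    "Init": "The remote end does not receive local Hellos; check the link and the remote device.",
    "2-Way": "Normal if the local DR priority is 0; otherwise go to step 2.",
    "ExStart": "Test oversized pings and compare MTUs (unless ospfv3 mtu-ignore is set).",
    "Exchange": "Troubleshoot as for ExStart.",
    "Loading": "Restart the process with reset ospfv3 [ process-id ] process.",
}


def is_normal(state: Optional[str]) -> bool:
    """Return True for the end states this guide considers normal."""
    return state in NORMAL_STATES


def action_for(state: Optional[str]) -> str:
    """Return the step-1 action for a neighbor stuck in the given state."""
    if state == "Full":
        return "No action is required."
    return STUCK_STATE_ACTION.get(state, "State not covered by step 1; go to step 2.")


print(action_for("Exchange"))  # Troubleshoot as for ExStart.
```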

2. If the issue persists, collect the following information and contact Technical Support:

o Results of each step.

o The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

OSPF issues

OSPF neighbor down

Symptom

· The OSPF neighbor goes down.

· OSPF neighbor flapping occurs.

Common causes

The following are the common causes of this type of issue:

· The BFD session is down, which indicates that BFD has detected a link failure.

· The remote device failed.

· The CPU usage is too high.

· Link failures occurred.

· The OSPF interface is not up.

· The IP addresses of the two ends are not on the same network segment.

· The OSPF settings of the two ends do not match.

o A Router ID conflict exists.

o The two ends are configured with different area types.

o The two ends are configured with different OSPF authentication modes.

o The neighboring OSPF interfaces use different timer settings.

o The neighboring OSPF interfaces are configured with different network types.

Troubleshooting flow

Figure 58 shows the troubleshooting flowchart.

Figure 58 Troubleshooting flowchart

Solution

1. Identify the reason for the OSPF neighbor down issue through the CLI.

Execute the display ospf event-log peer command. The Reason field in the command output displays the reason for the neighbor state change. Common values include:

o DeadExpired

The device had not received any Hello packet before the dead timer expired, and the OSPF neighbor state became Down. In this case, proceed to step 2.

o BFDDown

The OSPF neighbor state became Down because the BFD session went down. In this case, proceed to step 2.

o IntVliChange or Virtual link was deleted or the route it relies on was deleted

The neighbor relationship became Down, because the virtual link or its dependency route was deleted. In this case, proceed to step 2.

o 1-Way

The neighbor state on the local device became Init because the neighbor state on the remote device became Down, and the remote device sent a 1-way Hello packet (a Hello packet that does not list the local router ID) to the local device. In this case, troubleshoot the remote device.

o IntPhyChange

The neighbor relationship became Down because a related interface went down or its MTU changed. In this case, execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the running state and related information about the interface, and then troubleshoot the interface faults.

For any other reason, proceed to step 11.
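The Reason values listed in step 1 map directly to where the procedure goes next. The following Python dictionary is a hypothetical summary of that mapping. The reason strings are copied from this guide, and the exact strings printed by a given software version might differ.

```python
# Hypothetical mapping from the Reason field of "display ospf event-log peer"
# to the next action described in step 1.
REASON_TO_NEXT = {
    "DeadExpired": "step 2",
    "BFDDown": "step 2",
    "IntVliChange": "step 2",
    "1-Way": "troubleshoot the remote device",
    "IntPhyChange": "check the interface with display interface",
}


def next_action(reason: str) -> str:
    """Return the next action for a Reason value; other reasons go to step 11."""
    reason = reason.strip()
    # The virtual-link reason can also appear as a full sentence.
    if reason.startswith("Virtual link was deleted"):
        return "step 2"
    return REASON_TO_NEXT.get(reason, "step 11")


print(next_action("DeadExpired"))  # step 2
```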

2. Check for link failures.

Execute the ping command to check for link failures. If a link has failed, resolve the link issue first. If the related links operate correctly, proceed to step 3.

3. Identify whether the CPU usage is too high.

Execute the display cpu-usage command to identify whether the CPU usage of the device's MPU and interface modules is excessively high. High CPU usage can prevent OSPF packets from being sent and received in time, causing neighbor flapping. To resolve this issue, disable unnecessary functions. If the CPU usage is not high, proceed to step 4.

4. Identify whether the memory usage exceeds the memory usage threshold.

Execute the display memory-threshold command. If the Current free-memory state field in the output, which represents the current memory usage of the system, displays Minor, Severe, or Critical, the remaining free memory is low. In this case, the device might be unable to send or receive OSPF packets, or might process OSPF packets slowly. To resolve this issue, disable unnecessary functions. If the Current free-memory state field displays Normal, proceed to step 5.

5. Identify whether the physical link state of each OSPF interface is UP.

Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to identify whether the physical link state of each OSPF interface is UP.

o If the physical link state of an OSPF interface is Down, you must recover that interface.

o If the physical link state of each OSPF interface is UP, execute the display ospf interface command to identify whether each OSPF interface is in a normal OSPF state.

- If an OSPF interface is in DOWN state, identify whether the network command was executed in the related OSPF process to advertise the network segment of that interface. If the network segment is not advertised, identify whether OSPF is enabled on the interface. If OSPF is enabled, troubleshoot the interface issue on the network layer.

- If the OSPF interfaces are in a normal state, including DR, BDR, DROther, and P-2-P, proceed to step 6.

6. Identify whether the IP addresses of the two ends are in the same network segment.

Execute the display interface brief command to view IP addresses of the two neighboring interfaces.

o If the two interface IP addresses are not in the same network segment, execute the ip address command on either of the interfaces to change its IP address. Make sure IP addresses of the two neighboring interfaces are in the same network segment.

o If the two interface IP addresses are in the same network segment, proceed to step 7.
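The segment comparison in step 6 can be expressed with Python's standard ipaddress module. The sketch below assumes that you have each interface's address and mask, for example from its ip address configuration. same_segment is a hypothetical helper. It also treats different masks as a mismatch, which reflects the Hello network-mask check on broadcast and NBMA networks but is stricter than needed on P2P links.

```python
import ipaddress


def same_segment(local: str, peer: str) -> bool:
    """Return True if both interfaces are on the same subnet with the same mask.

    Each argument is "address/prefix-length" or "address/dotted-mask".
    """
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(peer)
    # IPv4Network equality compares both the network address and the mask.
    return a.network == b.network


print(same_segment("192.168.51.5/255.255.255.0", "192.168.51.1/24"))  # True
print(same_segment("192.168.51.5/24", "192.168.52.1/24"))             # False
```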

7. Identify whether the related OSPF interfaces have the same MTU value.

If the ospf mtu-enable command is configured on the OSPF interfaces, the interfaces add their actual MTU values to DD packets, and those values must be the same. Otherwise, the interfaces cannot establish an OSPF neighbor relationship. By default, the MTU value in DD packets is 0. Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] command to view the MTU information of each interface.

o If the interfaces have different MTU values, execute the mtu size command in interface view to configure the same MTU value for the interfaces.

o If the interfaces have the same MTU value, proceed to step 8.
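For reference, RFC 2328 (section 10.6) rejects a DD packet whose Interface MTU field is larger than the MTU of the receiving interface. The Python sketch below models that standard rule together with this guide's statement that the DD MTU field is 0 unless ospf mtu-enable is configured. The helper names are hypothetical, and the device might enforce a stricter check. Configuring identical MTUs, as this step recommends, satisfies the rule in both directions.

```python
def dd_mtu_field(interface_mtu: int, mtu_enable: bool) -> int:
    """MTU value written into outgoing DD packets (0 unless ospf mtu-enable is set)."""
    return interface_mtu if mtu_enable else 0


def accepts_dd(receiving_mtu: int, dd_mtu: int) -> bool:
    """RFC 2328 section 10.6: reject a DD packet whose Interface MTU
    field exceeds the MTU of the receiving interface."""
    return dd_mtu <= receiving_mtu


def mtu_allows_adjacency(mtu_a: int, enable_a: bool, mtu_b: int, enable_b: bool) -> bool:
    """Both directions of the DD exchange must pass the check."""
    return accepts_dd(mtu_a, dd_mtu_field(mtu_b, enable_b)) and accepts_dd(
        mtu_b, dd_mtu_field(mtu_a, enable_a)
    )


print(mtu_allows_adjacency(1500, True, 1400, True))    # False
print(mtu_allows_adjacency(1500, False, 1400, False))  # True
```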

8. Identify whether the DR priorities of the neighboring OSPF interfaces are all zero.

On a broadcast or NBMA network, at least one OSPF interface must have a non-zero DR priority for a DR to be elected. If the DR priority is 0 on both neighboring OSPF interfaces, the neighbor states at both ends can reach only 2-Way. Execute the display ospf interface command to view OSPF interface information. The Priority field in the command output displays the DR priority of an interface.

If at least one neighboring interface has a non-zero DR priority, proceed to step 9.

9. Identify whether an NBMA or P2MP unicast neighbor has been manually specified.

When the network type of an OSPF interface is NBMA or P2MP unicast, you must use the peer command to specify a neighbor by its IP address. Execute the display this command in interface view to view the network type of the interface. If the network type is NBMA or P2MP unicast and no neighbor is specified, execute the peer command to specify the neighbor by its IP address.

If an NBMA or P2MP unicast neighbor has been manually specified, proceed to step 10.

10. Identify whether the OSPF settings at the two ends are correct.

a. Execute the display ospf command to view the OSPF router IDs of the two ends. If the two ends use the same router ID, edit the configuration to avoid the conflict. If the two ends use different router IDs, proceed to the next step.

b. Execute the display ospf interface command to view the OSPF area IDs of the two ends. If the two ends use different area IDs, edit the configuration to ensure area ID consistency. If the two ends use the same area ID, proceed to the next step.

c. Execute the display ospf interface command to view the network types of interfaces at the two ends. If the two interfaces are configured with different network types, edit the configuration to ensure network type consistency. If the network type is PTP for one end and Broadcast for the other, the neighbor relationship can reach Full state, but routing information cannot be calculated.

If the two ends are configured with the same network type, proceed to the next step.

d. Execute the display ospf statistics error command every 10 seconds for 5 minutes to view OSPF error statistics. Pay attention to the following fields:

- Bad authentication type: If the value of this field keeps increasing, the two OSPF neighbors are configured with different OSPF authentication modes. To resolve this issue, configure the same authentication mode for them.

- Hello-time mismatch: If the value of this field keeps increasing, the two neighboring interfaces are configured with different hello intervals. To resolve this issue, configure the same hello interval for them.

- Dead-time mismatch: If the value of this field keeps increasing, the two neighboring interfaces are configured with different dead intervals. To resolve this issue, configure the same dead interval for them.

- Ebit option mismatch: If the value of this field keeps increasing, the two neighbors are of different OSPF area types, for example, one is in a normal area and the other in a stub area. To resolve this issue, configure the same area type for them.

If the issue persists, proceed to step 11.
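The comparisons in step 10 amount to checking that two sets of settings match, except the router ID, which must differ. The following Python sketch is a hypothetical model. You fill in OspfEnd by hand from the display ospf and display ospf interface output of each end. Comparing only the authentication mode is a simplification of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OspfEnd:
    """Settings of one end of an OSPF neighbor relationship (hypothetical model)."""
    router_id: str
    area_id: str
    network_type: str   # for example "Broadcast" or "PTP"
    area_type: str      # for example "normal", "stub", or "nssa"
    auth_mode: str
    hello_interval: int
    dead_interval: int


def mismatches(a: OspfEnd, b: OspfEnd) -> list[str]:
    """List settings that break step 10: equal router IDs or any other difference."""
    problems = []
    if a.router_id == b.router_id:
        problems.append("router_id conflict")
    for f in fields(OspfEnd):
        if f.name != "router_id" and getattr(a, f.name) != getattr(b, f.name):
            problems.append(f.name)
    return problems


local = OspfEnd("1.1.1.1", "0.0.0.1", "Broadcast", "normal", "md5", 10, 40)
remote = OspfEnd("5.5.5.5", "0.0.0.1", "PTP", "normal", "md5", 10, 40)
print(mismatches(local, remote))  # ['network_type']
```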

11. If the issue persists, collect the following information and contact Technical Support:

o Results of each step.

o The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

Module name: OSPF-TRAP-MIB

· ospfVirtIfStateChange (1.3.6.1.2.1.14.16.2.1)

· ospfNbrStateChange (1.3.6.1.2.1.14.16.2.2)

· ospfVirtNbrStateChange (1.3.6.1.2.1.14.16.2.3)

Log messages

· OSPF/5/OSPF_NBR_CHG

· OSPF/5/OSPF_NBR_CHG_REASON

OSPF neighbor unable to enter Full state

Symptom

The OSPF neighbor state machine involves neighbor states of Down, Init, 2-way, ExStart, Exchange, Loading, and Full. Among them, the stable states are Down, 2-way, and Full.

· Down—OSPF is not enabled.

· 2-way—The neighbor relationship between DR Others.

· Full—The local device and the neighbor are fully adjacent.

In networks using OSPF for route calculation and forwarding, only 2-way and Full are normal neighbor states. If the neighbor state is neither 2-way nor Full, it indicates an abnormal neighbor relationship.

Common causes

The following are the common causes of this type of issue:

· OSPF packets were dropped due to link failures.

· The DR priority of the neighboring interfaces is not appropriate.

· The two ends use different MTU values.

Troubleshooting flow

Figure 59 shows the troubleshooting flowchart.

Figure 59 Troubleshooting flowchart

Solution

1. Execute the display ospf peer command to view OSPF neighbor information, and perform different tasks based on the neighbor state.

o If no neighbor information exists:

The OSPF neighbor went down or flapped. See "OSPF neighbor down" to troubleshoot the issue.

o If the neighbor state remains Init:

The remote device cannot receive Hello packets from the local device. In this case, identify whether the link or the remote device has failed.

o If the neighbor state remains 2-Way:

Execute the display ospf interface verbose command to identify whether the DR priority of the neighbor-facing OSPF interface is zero.

- If the DR priority of the OSPF interface is zero, no action is required.

- If the DR priority of the OSPF interface is not zero, proceed to step 2.

o If the neighbor state remains ExStart:

The device is performing DD negotiation but cannot perform DD synchronization. The following are the common causes for this issue:

- The neighbor-facing interface cannot send and receive oversized packets correctly.

Repeat the ping -s packet-size neighbor-address command, set the value for the packet-size argument to 1500 or greater, and then view the transmission and reception of oversized packets. If the remote end still cannot be pinged, troubleshoot the link issue first.

- The two ends use different MTU values.

If the ospf mtu-enable command was configured on the neighbor-facing OSPF interface, identify whether the two ends use the same MTU value. If they use different MTU values, configure the same MTU value for them.

If the issue persists, proceed to step 2.

o If the neighbor state remains Exchange:

The device is performing DD packet exchange. Troubleshoot this issue in the same way as when the neighbor state remains ExStart.

If the issue persists, proceed to step 2.

o If the neighbor state remains Loading:

Execute the reset ospf [ process-id ] process command to restart the OSPF process.

If the issue persists, proceed to step 2.

2. If the issue persists, collect the following information and contact Technical Support:

o Results of each step.

o The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

OSPF device unable to learn partial OSPF routes

Symptom

An OSPF device fails to learn some OSPF routes.

Common causes

The following are the common causes of this type of issue:

· The network type is P2P at one end and Broadcast at the other. Although the neighbor relationship reaches Full state, the two ends cannot learn routes from each other.

· The OSPF process is configured with the filter-policy import command.

· The filter import command is configured in the local OSPF area.

· The filter export command is configured in other OSPF areas.

· The OSPF process is bound to a VPN instance, and the tag in a received AS External LSA (Type-5 LSA) or NSSA External LSA (Type-7 LSA) is the same as the tag of routes redistributed into the local OSPF process. In this case, the LSA is ignored to prevent routing loops.

· The ABR is unreachable.

· The ABR does not take the Summary LSAs from non-backbone areas into account during route calculation.

· The ASBR is unreachable.

· The forwarding address (FA) in the AS External LSA or NSSA External LSA is unreachable.

· The route to the FA in the NSSA External LSA is not in the same area as the NSSA External LSA.

Troubleshooting flow

Figure 60 and Figure 61 show the troubleshooting flowcharts.

Figure 60 Troubleshooting flowchart 1

Figure 61 Troubleshooting flowchart 2

Solution

1. Identify whether the network type is P2P at one end and Broadcast at the other.

If yes, the neighbor relationship can reach Full state, but the two ends cannot learn routes from each other. To resolve this issue:

a. Execute the display ospf interface command to view the network types of the two neighboring OSPF interfaces.

<Sysname> display ospf interface

OSPF Process 1 with Router ID 5.5.5.5

Interfaces

Area: 0.0.0.1

IP Address Type State Cost Pri DR BDR

192.168.51.5 PTP P-2-P 1 1 0.0.0.0 0.0.0.0

b. If the network types differ, execute the ospf network-type command to configure the same network type for the two neighboring interfaces.

If the network types are the same, proceed to step 2.

2. Check the OSPF routing table multiple times for OSPF route flapping.

Execute the display ip routing-table protocol ospf verbose command, and then identify flapping OSPF routes by the Age field in the command output.

o If the Age field of an OSPF route shows a small value that is reset between checks, the route is flapping. Troubleshoot the route flapping issue first.

o If no route flapping is found, proceed to step 3.

<Sysname> display ip routing-table protocol ospf verbose

Summary count : 3

Destination: 192.168.12.0/24

Protocol: O_INTER

Process ID: 1

SubProtID: 0x2 Age: 12h53m09s

Cost: 2 Preference: 10

IpPre: N/A QosLocalID: N/A

Tag: 0 State: Active Adv

OrigTblID: 0x0 OrigVrf: default-vrf

TableID: 0x2 OrigAs: 0

NibID: 0x13000003 LastAs: 0

AttrID: 0xffffffff Neighbor: 0.0.0.0

Flags: 0x10041 OrigNextHop: 192.168.51.1

Label: NULL RealNextHop: 192.168.51.1

BkLabel: NULL BkNextHop: N/A

SRLabel: NULL Interface: GigabitEthernet2/0/2

BkSRLabel: NULL BkInterface: N/A

SIDIndex: NULL InLabel: NULL

Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/2

BkTunnel ID: Invalid BkIPInterface: N/A

FtnIndex: 0x0 ColorInterface: N/A

TrafficIndex: N/A BkColorInterface: N/A

Connector: 0.0.0.0 VpnPeerId: N/A

Dscp: N/A Exp: N/A

SRTunnelID: Invalid StatFlags: 0x0

SID Type: N/A SID: N/A

BkSID: N/A NID: Invalid

FlushNID: Invalid BkNID: Invalid

BkFlushNID: Invalid PathID: 0x0

CommBlockLen: 0

OrigLinkID: 0x0 RealLinkID: 0x0

Destination: 192.168.24.0/24

Protocol: O_INTER

Process ID: 1

SubProtID: 0x2 Age: 12h53m09s

Cost: 3 Preference: 10

IpPre: N/A QosLocalID: N/A

Tag: 0 State: Active Adv

OrigTblID: 0x0 OrigVrf: default-vrf

TableID: 0x2 OrigAs: 0

NibID: 0x13000003 LastAs: 0

AttrID: 0xffffffff Neighbor: 0.0.0.0

Flags: 0x10041 OrigNextHop: 192.168.51.1

Label: NULL RealNextHop: 192.168.51.1

BkLabel: NULL BkNextHop: N/A

SRLabel: NULL Interface: GigabitEthernet2/0/2

BkSRLabel: NULL BkInterface: N/A

SIDIndex: NULL InLabel: NULL

Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/2

BkTunnel ID: Invalid BkIPInterface: N/A

FtnIndex: 0x0 ColorInterface: N/A

TrafficIndex: N/A BkColorInterface: N/A

Connector: 0.0.0.0 VpnPeerId: N/A

Dscp: N/A Exp: N/A

SRTunnelID: Invalid StatFlags: 0x0

SID Type: N/A SID: N/A

BkSID: N/A NID: Invalid

FlushNID: Invalid BkNID: Invalid

BkFlushNID: Invalid PathID: 0x0

CommBlockLen: 0

OrigLinkID: 0x0 RealLinkID: 0x0

Destination: 192.168.51.0/24

Protocol: O_INTRA

Process ID: 1

SubProtID: 0x1 Age: 12h54m07s

Cost: 1 Preference: 10

IpPre: N/A QosLocalID: N/A

Tag: 0 State: Inactive Adv

OrigTblID: 0x0 OrigVrf: default-vrf

TableID: 0x2 OrigAs: 0

NibID: 0x13000001 LastAs: 0

AttrID: 0xffffffff Neighbor: 0.0.0.0

Flags: 0x10c1 OrigNextHop: 0.0.0.0

Label: NULL RealNextHop: 0.0.0.0

BkLabel: NULL BkNextHop: N/A

SRLabel: NULL Interface: GigabitEthernet2/0/2

BkSRLabel: NULL BkInterface: N/A

SIDIndex: NULL InLabel: NULL

Tunnel ID: Invalid IPInterface: GigabitEthernet2/0/2

BkTunnel ID: Invalid BkIPInterface: N/A

FtnIndex: 0x0 ColorInterface: N/A

TrafficIndex: N/A BkColorInterface: N/A

Connector: 0.0.0.0 VpnPeerId: N/A

Dscp: N/A Exp: N/A

SRTunnelID: Invalid StatFlags: 0x0

SID Type: N/A SID: N/A

BkSID: N/A NID: Invalid

FlushNID: Invalid BkNID: Invalid

BkFlushNID: Invalid PathID: 0x0

CommBlockLen: 0

OrigLinkID: 0x0 RealLinkID: 0x0
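Comparing the Age field across two checks can be automated as follows. This is a sketch under stated assumptions. The Age values are assumed to use the format shown above (for example, 12h53m09s), and the optional leading day component (for example, 1d) is an unverified assumption. A stable route's Age should grow by about the polling interval between checks, so a smaller value indicates that the route was refreshed, which means it flapped. The helper names and the 5-second tolerance are hypothetical.

```python
from __future__ import annotations

import re

# Assumed Age format: optional days, then hours, minutes, and seconds.
_AGE_RE = re.compile(r"^(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def age_to_seconds(age: str) -> int:
    """Convert an Age value such as "12h53m09s" to seconds."""
    m = _AGE_RE.match(age.strip())
    if not m or not any(m.groups()):
        raise ValueError(f"unrecognized Age value: {age!r}")
    d, h, mi, s = (int(g) if g else 0 for g in m.groups())
    return ((d * 24 + h) * 60 + mi) * 60 + s


def flapping_routes(first: dict[str, str], second: dict[str, str],
                    interval_s: int, tolerance_s: int = 5) -> list[str]:
    """Destinations whose Age did not grow by about interval_s between checks."""
    flapped = []
    for dest, age in first.items():
        later = second.get(dest)
        if later is None or age_to_seconds(later) < age_to_seconds(age) + interval_s - tolerance_s:
            flapped.append(dest)
    return sorted(flapped)


check1 = {"192.168.12.0/24": "12h53m09s", "192.168.24.0/24": "12h53m09s"}
check2 = {"192.168.12.0/24": "12h54m09s", "192.168.24.0/24": "20s"}  # 60 s later
print(flapping_routes(check1, check2, interval_s=60))  # ['192.168.24.0/24']
```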

3. Identify whether the filter-policy import command is configured in the OSPF process.

In scenarios where route filtering is configured, check for OSPF route filtering errors.

a. Execute the display this command in the related OSPF process on the local device to identify whether the filter-policy import command is configured in the OSPF process.

[Sysname-ospf-1] display this

ospf 1

import-route direct

filter-policy 2000 import

area 0.0.0.1

network 192.168.51.0 0.0.0.255

nssa

return

b. If the filter-policy import command is configured, identify whether the filtering rule it specifies is appropriate.

- If an ACL is used for route filtering, execute the display acl { acl-number | name acl-name } command to view its configuration details.

- If a prefix list is used for route filtering, execute the display ip prefix-list command to view its configuration details.

- If a routing policy is used for route filtering, execute the display route-policy command to view its configuration details.

If the desired routes are unexpectedly denied by the filtering rule, identify whether the filtering rule meets the requirements. If it is inappropriate, specify a new filtering rule by using the filter-policy import command.

c. If the desired routes are not denied by the filtering rule or the filter-policy import command is not configured in the OSPF process, proceed to step 4.

4. Identify whether the LSDB of the OSPF process contains the LSAs that carry the missing OSPF routes.

Choose the appropriate troubleshooting method based on the type of OSPF routes that have not been learned in the OSPF process.

o Intra-area OSPF routes

If the OSPF process lacks intra-area routes, execute the display ospf [ process-id ] lsdb router command in user view to identify whether the LSDB of the OSPF process contains all the Router LSAs in the area.

<Sysname> display ospf 100 lsdb router

OSPF Process 100 with Router ID 5.5.5.5

Area: 0.0.0.1

Link State Database

Type : Router

LS ID : 5.5.5.5

Adv Rtr : 5.5.5.5

LS age : 7

Len : 36

Options : ASBR O NP

Seq# : 80000026

Checksum : 0x5f1f

Link Count: 1

Link ID: 192.168.51.1

Data : 192.168.51.5

Link Type: TransNet

Metric : 1

Type : Router

LS ID : 1.1.1.1

Adv Rtr : 1.1.1.1

LS age : 8

Len : 36

Options : ASBR ABR O NP

Seq# : 8000002a

Checksum : 0x534a

Link Count: 1

Link ID: 192.168.51.1

Data : 192.168.51.1

Link Type: TransNet

Metric : 1

- If the LSDB lacks some Router LSAs, proceed to step 7.

- If the LSDB contains all the Router LSAs but routing information still cannot be calculated, proceed to step 7.

o Inter-area OSPF routes

If the OSPF process lacks inter-area routes, execute the display ospf [ process-id ] lsdb summary command in user view to identify whether the LSDB of the OSPF process contains all the Network Summary LSAs from other areas.

<Sysname> display ospf lsdb summary

OSPF Process 1 with Router ID 5.5.5.5

Area: 0.0.0.1

Link State Database

Type : Sum-Net

LS ID : 192.168.24.0

Adv Rtr : 1.1.1.1

LS age : 576

Len : 28

Options : O NP

Seq# : 8000001f

Checksum : 0x4c25

Net Mask : 255.255.255.0

Tos 0 Metric: 2

Type : Sum-Net

LS ID : 192.168.12.0

Adv Rtr : 1.1.1.1

LS age : 576

Len : 28

Options : O NP

Seq# : 8000001f

Checksum : 0xc6b7

Net Mask : 255.255.255.0

Tos 0 Metric: 1

- If the LSDB lacks a Network Summary LSA, identify whether the filter import command is configured in the local OSPF area or the filter export command is configured in the OSPF area from which the missing Network Summary LSA was advertised. If the Network Summary LSA was unexpectedly filtered out by the filtering rule specified by using the filter import or filter export command, adjust the filtering rule.

You can use the filter import and filter export commands to specify ACLs, prefix lists, or routing policies for route filtering. To view the configuration details of an ACL, prefix list, or routing policy, execute the display acl { acl-number | name acl-name }, display ip prefix-list, or display route-policy command as needed.

- If the LSDB contains all the Network Summary LSAs but routing information still cannot be calculated, proceed to step 7.

o O_ASE or O_NSSA routes

If the OSPF process lacks O_ASE routes, execute the display ospf [ process-id ] lsdb ase command in user view to identify whether the LSDB of the OSPF process contains AS External LSAs.

<Sysname> display ospf 100 lsdb ase

OSPF Process 100 with Router ID 1.1.1.1

Link State Database

Type : External

LS ID : 10.1.1.0

Adv Rtr : 1.1.1.1

LS age : 713

Len : 36

Options : O E

Seq# : 80000001

Checksum : 0x934b

Net Mask : 255.255.255.0

TOS 0 Metric: 1

E Type : 2

Forwarding Address : 192.168.51.5

Tag : 1

If the OSPF process lacks O_NSSA routes, execute the display ospf [ process-id ] lsdb nssa command in user view to identify whether the LSDB of the OSPF process contains NSSA External LSAs.

<Sysname> display ospf 100 lsdb nssa

OSPF Process 100 with Router ID 1.1.1.1

Area: 0.0.0.0

Link State Database

Area: 0.0.0.1

Link State Database

Type : NSSA

LS ID : 192.168.51.0

Adv Rtr : 5.5.5.5

LS age : 965

Len : 36

Options : O NP

Seq# : 8000001f

Checksum : 0x1dfa

Net Mask : 255.255.255.0

TOS 0 Metric: 1

E Type : 2

Forwarding Address : 192.168.51.5

Tag : 1

Type : NSSA

LS ID : 10.1.1.0

Adv Rtr : 5.5.5.5

LS age : 965

Len : 36

Options : O NP

Seq# : 8000001f

Checksum : 0x6840

Net Mask : 255.255.255.0

TOS 0 Metric: 1

E Type : 2

Forwarding Address : 192.168.51.5

Tag : 1

- If the LSDB lacks some AS External LSAs or NSSA External LSAs, proceed to step 7.

- If the LSDB contains all the AS External LSAs or NSSA External LSAs but the O_ASE or O_NSSA routes are still not learned, proceed to step 7.
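Checking whether the LSDB is complete is a set comparison. An LSA is identified by its LS type, LS ID, and advertising router. The following Python sketch parses the "Type / LS ID / Adv Rtr" lines of display ospf lsdb output in the layout shown above and reports which expected LSAs are missing. The expected set must come from your own network design. parse_lsdb and missing_lsas are hypothetical helpers, and the parser assumes the field layout shown in this guide.

```python
from __future__ import annotations


def parse_lsdb(text: str) -> set[tuple[str, str, str]]:
    """Extract (Type, LS ID, Adv Rtr) keys from display ospf lsdb output."""
    keys: set[tuple[str, str, str]] = set()
    current: dict[str, str] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip(), value.strip()
        if field == "Type":          # "Link Type" and "E Type" do not match
            current = {"Type": value}
        elif field in ("LS ID", "Adv Rtr") and current:
            current[field] = value
            if "LS ID" in current and "Adv Rtr" in current:
                keys.add((current["Type"], current["LS ID"], current["Adv Rtr"]))
    return keys


def missing_lsas(expected: set[tuple[str, str, str]], text: str) -> list[tuple[str, str, str]]:
    """Expected LSAs that do not appear in the LSDB output."""
    return sorted(expected - parse_lsdb(text))


output = """Type : Sum-Net
LS ID : 192.168.24.0
Adv Rtr : 1.1.1.1
Type : Sum-Net
LS ID : 192.168.12.0
Adv Rtr : 1.1.1.1"""
wanted = {("Sum-Net", "192.168.24.0", "1.1.1.1"), ("Sum-Net", "192.168.99.0", "1.1.1.1")}
print(missing_lsas(wanted, output))  # [('Sum-Net', '192.168.99.0', '1.1.1.1')]
```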

5. Identify whether the ABR is reachable.

Inter-area routes are advertised by the ABR. If the local device and the ABR cannot reach each other, the local device will not be able to learn inter-area routes.

a. Execute the display ospf [ process-id ] lsdb summary command on the local device, and then view the Adv Rtr field in the command output. This field displays the router ID of the ABR that advertised the Network Summary LSA.

<Sysname> display ospf 100 lsdb summary

OSPF Process 100 with Router ID 5.5.5.5

Area: 0.0.0.1

Link State Database

Type : Sum-Net

LS ID : 192.168.12.0

Adv Rtr : 1.1.1.1

LS age : 913

Len : 28

Options : O E

Seq# : 80000001

Checksum : 0x5d45

Net Mask : 255.255.255.0

Tos 0 Metric: 1

b. Execute the display ospf abr-asbr command on the local device, and then view the Destination and RtType fields in the command output. If the RtType field displays ABR, the Destination field displays the router ID of the ABR. In this situation, the local device has a route to the ABR.

<Sysname> display ospf 100 abr-asbr

OSPF Process 100 with Router ID 5.5.5.5

Routing Table to ABR and ASBR

Type Destination Area Cost Nexthop RtType

Intra 1.1.1.1 0.0.0.1 1 192.168.51.1 ABR

c. If the output of the display ospf abr-asbr command does not include a route to the ABR, proceed to step 7.

d. If the output of the display ospf abr-asbr command includes a route to the ABR, and the local device is an ABR, identify whether the local OSPF area is a backbone area.

- If the OSPF area is not a backbone area (with a non-zero area ID), no action is required. According to RFC 2328, ABRs do not process the Network Summary LSAs received from non-backbone areas.

- If the OSPF area is a backbone area (with an area ID of 0), but it cannot learn inter-area routes, proceed to step 7.

e. If the output of the display ospf abr-asbr command includes a route to the ABR, and the OSPF process is bound to a VPN instance, identify whether the vpn-instance-capability simple command is configured in the OSPF process. If this command is configured, proceed to step 7.

If this command is not configured, troubleshoot this issue as described in Table 12.

Table 12 Troubleshooting methods

Whether the DN bit is set to 1	Troubleshooting method
The vpn-instance-capability simple command is not configured, and the Options field of the related Network Summary LSA contains the DN bit (the DN bit is set to 1).	According to RFC 4577, private OSPF processes do not use Network Summary LSAs that have the DN bit set for route calculation. It is normal that the local device cannot learn the related inter-area routes.
The vpn-instance-capability simple command is not configured, and the Options field of the related Network Summary LSA does not contain the DN bit.	Proceed to step 7.
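The check in substeps a through c compares the Adv Rtr field of each Network Summary LSA with the ABR entries in the display ospf abr-asbr output. The following Python sketch illustrates that comparison on saved command output. It is a minimal example, not an H3C tool: the helper names are hypothetical, and the parsing assumes the field layouts shown in the sample output in this step.

```python
import re

def summary_lsa_adv_routers(lsdb_text):
    """Return the Adv Rtr values of Sum-Net LSAs in 'display ospf lsdb summary' output."""
    routers = set()
    lsa_type = None
    for line in lsdb_text.splitlines():
        type_match = re.match(r"\s*Type\s*:\s*(\S+)", line)
        if type_match:
            lsa_type = type_match.group(1)
            continue
        adv_match = re.match(r"\s*Adv Rtr\s*:\s*(\S+)", line)
        if adv_match and lsa_type == "Sum-Net":
            routers.add(adv_match.group(1))
    return routers

def reachable_abrs(abr_asbr_text):
    """Return the Destination values of rows whose RtType is ABR in 'display ospf abr-asbr' output."""
    abrs = set()
    for line in abr_asbr_text.splitlines():
        fields = line.split()
        # Data rows have six columns: Type Destination Area Cost Nexthop RtType
        if len(fields) == 6 and fields[5] == "ABR":
            abrs.add(fields[1])
    return abrs

def unreachable_abrs(lsdb_text, abr_asbr_text):
    """ABRs that advertised Network Summary LSAs but have no entry in the ABR/ASBR table."""
    return summary_lsa_adv_routers(lsdb_text) - reachable_abrs(abr_asbr_text)

# Excerpts from the sample output in this step
LSDB_SAMPLE = "Type : Sum-Net\nLS ID : 192.168.12.0\nAdv Rtr : 1.1.1.1\n"
ABR_SAMPLE = ("Type Destination Area Cost Nexthop RtType\n"
              "Intra 1.1.1.1 0.0.0.1 1 192.168.51.1 ABR\n")

print(unreachable_abrs(LSDB_SAMPLE, ABR_SAMPLE))  # set()
```

An empty result means every advertising ABR is reachable; any router ID returned corresponds to the "proceed to step 7" branch of substep c.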

6. Identify whether the ASBR is reachable and whether loop prevention is enabled.

O_ASE routes and O_NSSA routes are advertised by the ASBR. If the local device and the ASBR cannot reach each other, the local device will not be able to learn routes from devices located in other ASs.

a. Execute the display ospf [ process-id ] lsdb [ ase | nssa ] command, and then view the Adv Rtr field in the command output. This field displays the router ID of the ASBR, which advertised the AS External LSA (Type-5) or NSSA External LSA (Type-7).

<Sysname> display ospf 100 lsdb ase

OSPF Process 100 with Router ID 1.1.1.1

Link State Database

Type : External

LS ID : 10.1.1.0

Adv Rtr : 1.1.1.1

LS age : 169

Len : 36

Options : O E

Seq# : 80000001

Checksum : 0x934b

Net Mask : 255.255.255.0

TOS 0 Metric: 1

E Type : 2

Forwarding Address : 192.168.51.5

Tag : 1

<Sysname> display ospf 100 lsdb nssa

OSPF Process 100 with Router ID 1.1.1.1

Area: 0.0.0.0

Link State Database

Area: 0.0.0.1

Link State Database

Type : NSSA

LS ID : 192.168.51.0

Adv Rtr : 5.5.5.5

LS age : 156

Len : 36

Options : O NP

Seq# : 80000001

Checksum : 0x59dc

Net Mask : 255.255.255.0

TOS 0 Metric: 1

E Type : 2

Forwarding Address : 192.168.51.5

Tag : 1

Type : NSSA

LS ID : 10.1.1.0

Adv Rtr : 5.5.5.5

LS age : 156

Len : 36

Options : O NP

Seq# : 80000001

Checksum : 0xa422

Net Mask : 255.255.255.0

TOS 0 Metric: 1

E Type : 2

Forwarding Address : 192.168.51.5

Tag : 1

b. Execute the display ospf abr-asbr command, and then view the Destination and RtType fields in the command output. If an entry has ASBR in the RtType field and the router ID of the ASBR in the Destination field, the local device has a route to the ASBR.

<Sysname> display ospf 100 abr-asbr

OSPF Process 100 with Router ID 1.1.1.1

Routing Table to ABR and ASBR

Type Destination Area Cost Nexthop RtType

Intra 5.5.5.5 0.0.0.1 1 192.168.51.5 ASBR

c. If the output of the display ospf abr-asbr command does not include a route to the ASBR, proceed to step 7.

d. If the output of the display ospf abr-asbr command includes a route to the ASBR, and the Forwarding Address field of the LSA is not 0, check the reachability and route type of the forwarding address.

Execute the display ospf routing forwarding-address { mask-length | mask } command in user view to identify whether the local device has a route to the forwarding address.

<Sysname> display ospf 100 routing 192.168.51.5 24

OSPF Process 100 with Router ID 1.1.1.1

Routing Table

Routing for network

Destination Cost Type NextHop AdvRouter Area

192.168.51.0/24 1 Transit 0.0.0.0 5.5.5.5 0.0.0.1

Total nets: 1

Intra area: 1 Inter area: 0 ASE: 0 NSSA: 0

Troubleshoot this issue as described in Table 13.

Table 13 Troubleshooting methods

Whether the forwarding address is reachable	Troubleshooting method
Unreachable	If the display ospf routing forwarding-address { mask-length | mask } command does not display route information for the forwarding address, the forwarding address is unreachable. In this case, proceed to step 7.
Reachable	If the missing external routes are advertised in an NSSA External LSA and the Area field displays an area ID different from that of the NSSA External LSA, no action is required. According to RFC 3101, the route to the forwarding address must belong to the same OSPF area as the NSSA External LSA. Otherwise, the OSPF process does not use that NSSA External LSA for route calculation, so it is normal that the related external routes are missing.
Reachable	If the Type field displays Type1 or Type2 in the output of the display ospf routing forwarding-address { mask-length | mask } command, the route to the forwarding address is an external route. According to RFC 2328, OSPF does not use an LSA for route calculation if the route to its non-zero forwarding address is an external route. Therefore, it is normal that the OSPF process lacks the related external routes.
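The decision in Table 13 can be expressed as a small Python function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, route types are the strings shown in the Type field of display ospf routing, and only the Type1 and Type2 values named in the table are treated as external routes.

```python
from typing import Optional, Tuple

EXTERNAL_ROUTE_TYPES = {"Type1", "Type2"}  # values named in Table 13

def external_lsa_usable(forwarding_address: str,
                        lsa_kind: str,                  # "ASE" (Type-5) or "NSSA" (Type-7)
                        lsa_area: Optional[str],        # area of the NSSA External LSA
                        fa_route_type: Optional[str],   # Type field of the FA route, None if absent
                        fa_route_area: Optional[str]) -> Tuple[bool, str]:
    """Return (usable, reason) for an external LSA, following the checks in Table 13."""
    if forwarding_address == "0.0.0.0":
        return True, "forwarding address is 0; the forwarding address checks do not apply"
    if fa_route_type is None:
        return False, "forwarding address unreachable: proceed to step 7"
    if fa_route_type in EXTERNAL_ROUTE_TYPES:
        return False, "forwarding address resolved by an external route (RFC 2328): normal"
    if lsa_kind == "NSSA" and fa_route_area != lsa_area:
        return False, "forwarding address route in a different area (RFC 3101): normal"
    return True, "forwarding address check passed"

# Values from the sample output: FA 192.168.51.5 resolved by a Transit route in area 0.0.0.1
print(external_lsa_usable("192.168.51.5", "NSSA", "0.0.0.1", "Transit", "0.0.0.1"))
# (True, 'forwarding address check passed')
```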

e. If the output of the display ospf abr-asbr command includes a route to the ASBR, and the OSPF process is bound to a VPN instance, identify whether the vpn-instance-capability simple command is configured in the OSPF process.

If this command is configured, proceed to step 7.

If this command is not configured, troubleshoot this issue as described in Table 14.

Table 14 Troubleshooting methods

Whether the DN bit is set to 1	Troubleshooting method
The vpn-instance-capability simple command is not configured, and the Options field of the related AS External LSA or NSSA External LSA contains the DN bit.	According to RFC 4577, private OSPF processes do not use AS External LSAs or NSSA External LSAs that have the DN bit set for route calculation. It is normal that the local device cannot learn the related external routes.
The vpn-instance-capability simple command is not configured, and the Options field of the related AS External LSA or NSSA External LSA does not contain the DN bit.	Execute the display ospf command, and then view the Default ASE parameters field in the command output to identify whether the AS External LSA or NSSA External LSA has the same tag value as the private OSPF process. If they use the same tag value, no action is required. According to RFC 4577, private OSPF processes do not use such LSAs for route calculation, so it is normal that the OSPF process does not have the related external routes. If they use different tag values, proceed to step 7.
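Tables 12 and 14 together describe a short decision sequence for OSPF processes bound to a VPN instance. The following Python sketch encodes it. Assumptions: the function name is hypothetical, the Options field is passed as the space-separated flags shown by display ospf lsdb, and the DN bit is assumed to appear as the flag DN (the sample outputs in this section do not show an LSA with the DN bit set).

```python
from typing import Optional

def vpn_lsa_decision(simple_configured: bool, options: str, is_external: bool,
                     lsa_tag: Optional[int] = None,
                     process_tag: Optional[int] = None) -> str:
    """Return 'normal' if the missing routes are expected, otherwise 'step 7'."""
    if simple_configured:
        # Steps 5e and 6e: with vpn-instance-capability simple configured, go to step 7
        return "step 7"
    if "DN" in options.split():
        # Assumed display form of the DN bit; such LSAs are not used by private OSPF processes
        return "normal"
    if is_external and lsa_tag is not None and lsa_tag == process_tag:
        # Table 14: external LSA carries the same tag as the private OSPF process
        return "normal"
    return "step 7"

print(vpn_lsa_decision(False, "O E", is_external=True, lsa_tag=1, process_tag=1))  # normal
```

Pass is_external=False for a Network Summary LSA (Table 12), where only the DN bit is checked.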

7. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Route flapping caused by an IP address conflict

Symptom

In an OSPF network, if different devices use the same interface IP address, OSPF route flapping occurs. When this issue occurs, the affected devices typically show the following symptoms:

· The display cpu-usage command displays a high CPU usage.

· OSPF frequently marks LSAs as stale and regenerates them.

· The device refreshes routes frequently, and route calculations are incorrect.

Solution

In this troubleshooting example, the network is as shown in Figure 62. The troubleshooting methods for other networks are similar to those for this network.

Figure 62 Network diagram

2. On each device in the OSPF network, execute the display ospf [ process-id ] lsdb command every second to view their OSPF LSDB information.

3. Check for abnormal LSA aging.

If abnormal LSA aging exists, you can find the following symptoms:

○ On Device A, the Age field of a Network LSA (Type-2) remains close to the minimum value, but its Sequence field increases rapidly. For example, in the following command output, the Age of Network LSA 172.168.0.1 (LinkState ID) does not increase naturally, and the Sequence field grows from 8000002D to 8000002F in a short time.

<Sysname> display ospf 100 lsdb

OSPF Process 100 with Router ID 10.1.1.1

Link State Database

Area: 0.0.0.0

Type LinkState ID AdvRouter Age Len Sequence Metric

Router 3.3.3.3 3.3.3.3 797 48 80000009 0

Router 1.1.1.1 1.1.1.1 835 36 80000005 0

Router 4.4.4.4 4.4.4.4 798 36 80000004 0

Router 10.1.1.1 10.1.1.1 415 36 80000007 0

Router 2.2.2.2 2.2.2.2 415 48 80000015 0

Network 192.168.0.2 3.3.3.3 802 32 80000002 0

Network 172.168.0.3 4.4.4.4 791 32 80000002 0

Network 172.168.0.1 10.1.1.1 7 32 8000002D 0

<Sysname> display ospf 100 lsdb

OSPF Process 100 with Router ID 10.1.1.1

Link State Database

Area: 0.0.0.0

Type LinkState ID AdvRouter Age Len Sequence Metric

Router 3.3.3.3 3.3.3.3 810 48 80000009 0

Router 1.1.1.1 1.1.1.1 848 36 80000005 0

Router 4.4.4.4 4.4.4.4 811 36 80000004 0

Router 10.1.1.1 10.1.1.1 428 36 80000007 0

Router 2.2.2.2 2.2.2.2 428 48 80000015 0

Network 192.168.0.2 3.3.3.3 815 32 80000002 0

Network 172.168.0.3 4.4.4.4 804 32 80000002 0

Network 172.168.0.1 10.1.1.1 4 32 8000002F 0

○ On Device B, the Age field of the same Network LSA frequently switches between 3600 and smaller values, and its Sequence field increases rapidly. For example, in the following command output, the Age of Network LSA 172.168.0.1 (LinkState ID) frequently switches between 3600 and smaller values, and the Sequence field grows from 80000023 to 80000041 in a short time.

<Sysname> display ospf 100 lsdb

OSPF Process 100 with Router ID 2.2.2.2

Link State Database

Area: 0.0.0.0

Type LinkState ID AdvRouter Age Len Sequence Metric

Router 3.3.3.3 3.3.3.3 708 48 80000009 0

Router 1.1.1.1 1.1.1.1 746 36 80000005 0

Router 4.4.4.4 4.4.4.4 709 36 80000004 0

Router 10.1.1.1 10.1.1.1 329 36 80000007 0

Router 2.2.2.2 2.2.2.2 327 48 80000015 0

Network 172.168.0.3 4.4.4.4 702 32 80000002 0

Network 192.168.0.2 3.3.3.3 713 32 80000002 0

Network 172.168.0.1 10.1.1.1 3600 32 80000023 0

<Sysname> display ospf 100 lsdb

OSPF Process 100 with Router ID 2.2.2.2

Link State Database

Area: 0.0.0.0

Type LinkState ID AdvRouter Age Len Sequence Metric

Router 3.3.3.3 3.3.3.3 748 48 80000009 0

Router 1.1.1.1 1.1.1.1 786 36 80000005 0

Router 4.4.4.4 4.4.4.4 749 36 80000004 0

Router 10.1.1.1 10.1.1.1 369 36 80000007 0

Router 2.2.2.2 2.2.2.2 367 48 80000015 0

Network 172.168.0.3 4.4.4.4 742 32 80000002 0

Network 192.168.0.2 3.3.3.3 753 32 80000002 0

Network 172.168.0.1 10.1.1.1 7 32 80000041 0

○ On Device C, the Age field of the same Network LSA remains at 3600 or the Network LSA occasionally disappears, and the Sequence field increases rapidly. For example, in the following command output, the Age of Network LSA 172.168.0.1 (LinkState ID) remains at 3600 or the Network LSA occasionally disappears. When the Network LSA exists, its Sequence field grows from 80000309 to 80000346 in a short time.

<Sysname> display ospf 100 lsdb

OSPF Process 100 with Router ID 3.3.3.3

Link State Database

Area: 0.0.0.0

Type LinkState ID AdvRouter Age Len Sequence Metric

Router 3.3.3.3 3.3.3.3 740 48 8000000D 0

Router 4.4.4.4 4.4.4.4 759 36 80000008 0

Router 10.1.1.1 10.1.1.1 364 36 8000000B 0

Router 2.2.2.2 2.2.2.2 366 48 80000019 0

Network 172.168.0.3 4.4.4.4 755 32 80000006 0

Network 192.168.0.2 3.3.3.3 744 32 80000006 0

Network 172.168.0.1 10.1.1.1 3600 32 80000309 0

<Sysname> display ospf 100 lsdb

OSPF Process 100 with Router ID 3.3.3.3

Link State Database

Area: 0.0.0.0

Type LinkState ID AdvRouter Age Len Sequence Metric

Router 3.3.3.3 3.3.3.3 745 48 8000000D 0

Router 4.4.4.4 4.4.4.4 764 36 80000008 0

Router 10.1.1.1 10.1.1.1 369 36 8000000B 0

Router 2.2.2.2 2.2.2.2 371 48 80000019 0

Network 172.168.0.3 4.4.4.4 760 32 80000006 0

Network 192.168.0.2 3.3.3.3 749 32 80000006 0

<Sysname> display ospf 100 lsdb

OSPF Process 100 with Router ID 3.3.3.3

Link State Database

Area: 0.0.0.0

Type LinkState ID AdvRouter Age Len Sequence Metric

Router 3.3.3.3 3.3.3.3 1302 48 8000000D 0

Router 4.4.4.4 4.4.4.4 1321 36 80000008 0

Router 10.1.1.1 10.1.1.1 926 36 8000000B 0

Router 2.2.2.2 2.2.2.2 928 48 80000019 0

Network 172.168.0.3 4.4.4.4 1317 32 80000006 0

Network 192.168.0.2 3.3.3.3 1306 32 80000006 0

Network 172.168.0.1 10.1.1.1 3600 32 80000346 0
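Rather than comparing successive outputs by eye, you can diff two saved snapshots of display ospf lsdb. The following Python sketch flags LSAs whose Sequence field jumps between snapshots, whose Age has reached MaxAge (3600 seconds), or that disappear. It is a heuristic under assumptions: the snapshots are taken a few seconds apart, the jump threshold of 2 is a hypothetical value you should tune, and a single legitimate LSA refresh or flush can also trigger a flag.

```python
MAX_AGE = 3600  # OSPF MaxAge in seconds (RFC 2328)

def parse_lsdb(text):
    """Map (Type, LinkState ID, AdvRouter) to (age, sequence) for each LSDB row."""
    entries = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows look like: Network 172.168.0.1 10.1.1.1 7 32 8000002D 0
        if len(fields) == 7 and fields[3].isdigit():
            key = (fields[0], fields[1], fields[2])
            entries[key] = (int(fields[3]), int(fields[5], 16))
    return entries

def abnormal_lsas(before, after, min_seq_jump=2):
    """Return the keys of LSAs that look abnormal between two snapshots."""
    old, new = parse_lsdb(before), parse_lsdb(after)
    flagged = set()
    for key, (age, seq) in new.items():
        if age >= MAX_AGE:
            flagged.add(key)
        if key in old and seq - old[key][1] >= min_seq_jump:
            flagged.add(key)
    flagged |= old.keys() - new.keys()  # LSAs that vanished between snapshots
    return flagged

# Rows taken from the Device A output above
SNAP_1 = "Network 192.168.0.2 3.3.3.3 802 32 80000002 0\nNetwork 172.168.0.1 10.1.1.1 7 32 8000002D 0"
SNAP_2 = "Network 192.168.0.2 3.3.3.3 815 32 80000002 0\nNetwork 172.168.0.1 10.1.1.1 4 32 8000002F 0"
print(abnormal_lsas(SNAP_1, SNAP_2))  # {('Network', '172.168.0.1', '10.1.1.1')}
```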

4. Check for OSPF route flapping.

On Device B, execute the display ospf [ process-id ] routing command every second to check for route flapping.

<Sysname> display ospf 100 routing

OSPF Process 100 with Router ID 2.2.2.2

Routing Table

Routing for network

Destination Cost Type NextHop AdvRouter Area

192.168.0.0/24 1 Transit 0.0.0.0 3.3.3.3 0.0.0.0

172.168.0.0/24 1 Transit 0.0.0.0 10.1.1.1 0.0.0.0

Total nets: 2

Intra area: 2 Inter area: 0 ASE: 0 NSSA: 0

<Sysname> display ospf 100 routing

OSPF Process 100 with Router ID 2.2.2.2

Routing Table

Routing for network

Destination Cost Type NextHop AdvRouter Area

192.168.0.0/24 1 Transit 0.0.0.0 3.3.3.3 0.0.0.0

172.168.0.0/24 2 Transit 192.168.0.2 4.4.4.4 0.0.0.0

Total nets: 2

Intra area: 2 Inter area: 0 ASE: 0 NSSA: 0

If OSPF route flapping occurs while multiple executions of the display ospf peer command show that the neighbor relationships are not flapping, an IP address conflict exists in the OSPF network. In addition, because Network LSAs (Type-2) are generated by DRs, at least one of the conflicting devices is a DR.
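The route comparison in step 4 can also be scripted. The following Python sketch diffs successive snapshots of display ospf routing output and reports destinations whose cost, next hop, or advertising router changed. It assumes the six-column layout shown above and is only a helper: you still need to confirm with the display ospf peer command that the neighbor relationships are not flapping.

```python
def parse_routes(text):
    """Map Destination to (Cost, NextHop, AdvRouter) for display ospf routing rows."""
    routes = {}
    for line in text.splitlines():
        fields = line.split()
        # Rows look like: 172.168.0.0/24 1 Transit 0.0.0.0 10.1.1.1 0.0.0.0
        if len(fields) == 6 and "/" in fields[0]:
            routes[fields[0]] = (int(fields[1]), fields[3], fields[4])
    return routes

def flapping_destinations(snapshots):
    """Return destinations whose route attributes differ between consecutive snapshots."""
    parsed = [parse_routes(s) for s in snapshots]
    changed = set()
    for prev, cur in zip(parsed, parsed[1:]):
        for dest in prev.keys() | cur.keys():
            if prev.get(dest) != cur.get(dest):
                changed.add(dest)
    return changed

# Rows taken from the two Device B outputs above
ROUTES_1 = "192.168.0.0/24 1 Transit 0.0.0.0 3.3.3.3 0.0.0.0\n172.168.0.0/24 1 Transit 0.0.0.0 10.1.1.1 0.0.0.0"
ROUTES_2 = "192.168.0.0/24 1 Transit 0.0.0.0 3.3.3.3 0.0.0.0\n172.168.0.0/24 2 Transit 192.168.0.2 4.4.4.4 0.0.0.0"
print(flapping_destinations([ROUTES_1, ROUTES_2]))  # {'172.168.0.0/24'}
```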

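If a device has two Network LSAs with the same LinkState ID but different advertising routers (AdvRouter field), and both LSAs are aging abnormally, both of the conflicting devices are DRs. For example, in the following command output, Network LSA 172.168.0.1 is advertised by both 3.3.3.3 and 10.1.1.1.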

<Sysname> display ospf 100 lsdb

OSPF Process 100 with Router ID 10.1.1.1

Link State Database

Area: 0.0.0.0

Type LinkState ID AdvRouter Age Len Sequence Metric

Router 3.3.3.3 3.3.3.3 367 48 80000021 0

Router 4.4.4.4 4.4.4.4 369 36 80000013 0

Router 10.1.1.1 10.1.1.1 477 36 80000012 0

Router 2.2.2.2 2.2.2.2 403 48 8000002B 0

Network 192.168.0.1 2.2.2.2 395 32 80000002 0

Network 172.168.0.1 3.3.3.3 3600 32 8000002B 0

Network 172.168.0.1 10.1.1.1 9 32 80000036 0

<Sysname> display ospf 100 lsdb

OSPF Process 100 with Router ID 10.1.1.1

Link State Database

Area: 0.0.0.0

Type LinkState ID AdvRouter Age Len Sequence Metric

Router 3.3.3.3 3.3.3.3 460 48 80000021 0

Router 4.4.4.4 4.4.4.4 462 36 80000013 0

Router 10.1.1.1 10.1.1.1 570 36 80000012 0

Router 2.2.2.2 2.2.2.2 496 48 8000002B 0

Network 192.168.0.1 2.2.2.2 488 32 80000002 0

Network 172.168.0.1 3.3.3.3 3600 32 80000034 0

Network 172.168.0.1 10.1.1.1 6 32 80000041 0
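Spotting two Network LSAs with the same LinkState ID is easy to automate. The following Python sketch groups the Network LSAs in a saved display ospf lsdb snapshot by LinkState ID and reports any ID advertised by more than one router. It is a sketch that assumes the row layout shown above; combine its result with the abnormal aging checks in step 3 before concluding that both devices are DRs.

```python
from collections import defaultdict

def duplicate_network_lsas(lsdb_text):
    """Return {LinkState ID: {AdvRouter, ...}} for Network LSAs with more than one advertiser."""
    advertisers = defaultdict(set)
    for line in lsdb_text.splitlines():
        fields = line.split()
        # Rows look like: Network 172.168.0.1 3.3.3.3 3600 32 8000002B 0
        if len(fields) == 7 and fields[0] == "Network":
            advertisers[fields[1]].add(fields[2])
    return {lsid: routers for lsid, routers in advertisers.items() if len(routers) > 1}

# Network LSA rows taken from the first output above
SNAPSHOT = "\n".join([
    "Network 192.168.0.1 2.2.2.2 395 32 80000002 0",
    "Network 172.168.0.1 3.3.3.3 3600 32 8000002B 0",
    "Network 172.168.0.1 10.1.1.1 9 32 80000036 0",
])
print(duplicate_network_lsas(SNAPSHOT))
```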

5. Identify the conflicting devices.

You can use the output of the display ospf lsdb command to find the devices causing the IP address conflict.

If only one of the conflicting devices is a DR, perform the following tasks:

a. Check the AdvRouter field of the abnormal Network LSA to find the router ID of the advertising DR.

b. Check the LinkState ID field of the abnormal Network LSA. The LinkState ID of a Network LSA is the IP address of the DR's interface on that network, so this field shows the conflicting IP address and identifies the DR interface that uses it.

c. Based on the conflicting IP address and the IP address plan, identify the other conflicting device.

In this example, the DR with Router ID 10.1.1.1 has an interface IP address conflict with another device, and the conflicting IP address is 172.168.0.1. Based on the IP address plan, you can find the other device causing the conflict.

If both of the conflicting devices are DRs, perform the following tasks:

a. Check the AdvRouter field of each abnormal Network LSA to find the router IDs of the advertising DRs.

b. Check the LinkState ID field of each abnormal Network LSA to identify the interfaces that use the conflicting IP address.

6. Change the IP address of one of the conflicting devices according to the IP address plan.

7. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Equal-cost route issues

Some next hops of equal-cost routes do not participate in load sharing or the load sharing is uneven

Symptom

· Traffic is not distributed to one or more next hops of equal-cost routes. When you use the display counters rate outbound interface command to observe the packet transmission rate of the related interfaces, the transmission rate is 0 on one or more output interfaces of the equal-cost routes.

· Traffic is unevenly load shared. When you use the display counters rate outbound interface command to observe the packet transmission rate of the related interfaces, one or more output interfaces of the equal-cost routes have a noticeably lower transmission rate.

Common causes

The following are the common causes of this type of issue:

· The number of next hops exceeds the maximum number of next hops supported by the device.

· The routes that use the output interface have not been configured or have not been correctly installed.

· The output interface is not up at the physical layer or the data link layer.

· The IP address of the output interface and that of the next hop interface are not in the same network segment.

· The device does not have an ARP or ND entry for the next hop.

· The load sharing mode is inappropriate.

· The hardware resources are insufficient.

· The last hop traversed by the traffic (the upstream device that forwards the traffic to the local device) is configured with load sharing.

Troubleshooting flow

Figure 63 shows the troubleshooting flowchart.

Figure 63 Flowchart for troubleshooting equal-cost route issues

Solution

1. Identify whether the number of equal-cost routes with the same destination exceeds the maximum number of equal-cost routes supported by the device.

a. To view the maximum number of IPv4 equal-cost routes supported by the system, execute the display max-ecmp-num command. To view the maximum number of IPv6 equal-cost routes supported by the system, execute the display ipv6 max-ecmp-num command.

b. To view the number of IPv4 equal-cost routes destined for a specific address, execute the display ip routing-table ip-address longer-match command with the destination address specified. To view the number of IPv6 equal-cost routes destined for a specific address, execute the display ipv6 routing-table ipv6-address longer-match command with the destination address specified. In the command output, all routes with the same destination but different next hops are equal-cost routes. The number of equal-cost routes equals the value of the Summary count field minus the number of routes whose mask length is different from that of the destination address.

- If the number of equal-cost routes with the same destination reaches the upper limit, excess equal-cost routes are not added to the routing table. To change the next hop of an equal-cost route in the routing table, delete that equal-cost route, and then configure a new equal-cost route with the desired next hop.

- If the number of equal-cost routes with the same destination is lower than the upper limit, proceed to step 2.
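The calculation in substep b is simple arithmetic. The following Python sketch applies it to a list of destination/mask values read from the routing table output and compares the result with the limit reported by the display max-ecmp-num command. The function names and the sample values are hypothetical, and the sketch assumes the Summary count equals the number of route entries listed.

```python
def ecmp_route_count(summary_count, route_prefixes, queried_mask_len):
    """Summary count minus the routes whose mask length differs from the queried one."""
    other_mask = sum(1 for prefix in route_prefixes
                     if int(prefix.split("/")[1]) != queried_mask_len)
    return summary_count - other_mask

def at_ecmp_limit(ecmp_count, max_ecmp_num):
    """True if no more equal-cost routes can be added for this destination."""
    return ecmp_count >= max_ecmp_num

# Hypothetical output: four /24 routes with different next hops plus one /16 route
prefixes = ["10.1.1.0/24"] * 4 + ["10.1.0.0/16"]
count = ecmp_route_count(len(prefixes), prefixes, 24)
print(count, at_ecmp_limit(count, max_ecmp_num=8))  # 4 False
```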

2. Identify whether the equal-cost routes have been correctly added to the routing table.

Execute one of the following commands as needed to view the routes with the related destination:

○ display ip routing-table [ vpn-instance vpn-instance-name ] ip-address [ mask-length | mask ] [ longer-match ] verbose

○ display ipv6 routing-table [ vpn-instance vpn-instance-name ] ipv6-address [ prefix-length ] [ longer-match ] [ verbose ]

If the routing table does not contain the equal-cost route with the desired next hop and output interface, check for route configuration errors. If the route configuration is correct, proceed to step 3.

3. Identify whether the physical link state and the data link layer state of the output interface are up.

Execute the display interface [ interface-type [ interface-number | interface-number.subnumber ] ] or display ipv6 interface [ interface-type [ interface-number ] ] [ brief ] command to view states of the output interface.

○ If the interface is not up at the physical layer or data link layer, resolve the interface or link failure.

○ If the physical link state and the data link layer state of the output interface are up, proceed to step 4.

4. Identify whether the IP address of the output interface and that of the next hop interface are in the same network segment.

Execute the display interface brief or display ipv6 interface brief command on both the local device and the next hop device to view IP addresses of the interfaces that connect the two devices.

○ If IP addresses of the two interfaces are not in the same network segment, execute the ip address or ipv6 address command in interface view on either interface to change its IP address so that both interfaces are in the same network segment.

○ If IP addresses of the two interfaces are in the same network segment, proceed to step 5.
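The same-segment check in step 4 can be done with Python's standard ipaddress module, which handles both IPv4 and IPv6. The interface addresses below are hypothetical; in practice you would take them from the display interface brief or display ipv6 interface brief output on each device. The sketch treats mismatched mask lengths as different segments, which would need correcting anyway.

```python
import ipaddress

def same_segment(local_interface: str, peer_interface: str) -> bool:
    """True if both address/prefix strings belong to the same network."""
    local = ipaddress.ip_interface(local_interface)
    peer = ipaddress.ip_interface(peer_interface)
    return local.network == peer.network

print(same_segment("192.168.1.1/24", "192.168.1.2/24"))   # True
print(same_segment("192.168.1.1/24", "192.168.2.1/24"))   # False
print(same_segment("2001:db8::1/64", "2001:db8::2/64"))   # True
```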

5. Identify whether an ARP or ND entry for the related next hop exists on the device.

To view ARP entries, execute the display arp command. To view ND entries, execute the display ipv6 neighbors command. If the device does not have an ARP or ND entry for the next hop, resolve this issue first. If the device has an ARP or ND entry for the next hop, proceed to step 6.

6. Identify whether the load sharing mode is appropriate.

○ If the load sharing mode is inappropriate, determine the load sharing factors based on the packets to be load shared, and then execute the ip load-sharing mode command to adjust the load sharing mode. For example, if packets with the same destination address carry different source IP addresses, IP protocol numbers, and destination port numbers, you can include these fields in the ip load-sharing mode command. If the issue persists after the load sharing mode is fully adjusted, proceed to step 7.

○ If the load sharing mode is appropriate, proceed to step 7.
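Why the choice of load sharing fields matters can be shown with a toy hash. The following Python sketch is not the device's algorithm, which this guide does not document: it uses CRC-32 as a stand-in, hypothetical field names rather than ip load-sharing mode keywords, and 64 hypothetical flows to one destination that differ in source IP address, protocol, and destination port. Hashing on the destination address alone puts every flow on one link, while adding the fields that vary spreads the flows.

```python
import zlib

def pick_link(flow, fields, n_links):
    """Choose an output link by hashing the selected packet fields (CRC-32 stand-in)."""
    key = "|".join(str(flow[name]) for name in fields).encode()
    return zlib.crc32(key) % n_links

# Hypothetical flows: same destination, different sources, protocols, and destination ports
flows = [{"sip": f"10.0.0.{i}", "dip": "8.0.0.1", "proto": 6 if i % 2 else 17,
          "dport": 1000 + i} for i in range(64)]

def links_used(fields, n_links=4):
    """Return the set of links that carry at least one flow."""
    return {pick_link(flow, fields, n_links) for flow in flows}

print(len(links_used(("dip",))))                           # 1
print(len(links_used(("sip", "dip", "proto", "dport"))))
```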

7. Check for hardware resource insufficiency.

○ To view the IPv4 FIB entries that failed to be issued to the driver, use the display system internal fib prefix [ vpn-instance vpn-instance-name ] entry-status f command.

○ To view the IPv6 FIB entries that failed to be issued to the driver, use the display system internal ipv6 fib prefix [ vpn-instance vpn-instance-name ] entry-status f command.

If the command output displays any FIB entries, the device has insufficient hardware resources. To resolve this issue, disable unnecessary features to reduce hardware resource usage. If the issue persists, proceed to step 8.

8. Identify whether the last hop traversed by the traffic (the upstream device that forwards the traffic to the local device) is configured with load sharing.

If an upstream device configured with load sharing forwards traffic to the local device, that traffic has already been distributed by the upstream load sharing algorithm. As a result, the local device might distribute it unevenly among its next hops. This is normal and no action is required. Identify whether traffic from devices that are not configured with load sharing is also distributed unevenly by the local device. If it is, proceed to step 9.
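One common way this unevenness arises is hash polarization: if the upstream device and the local device apply the same hash to the same fields, every flow that the upstream device sends to the local device produces the same hash result again. The following Python sketch illustrates the general mechanism with a CRC-32 stand-in and hypothetical flows. It is not a model of the device's actual algorithm, and the salted variant only shows why decorrelating the two hashes helps; whether the device offers such an option is outside the scope of this guide.

```python
import zlib

def pick_link(flow, n_links, salt=""):
    """Choose a link from a CRC-32 hash of the flow tuple plus an optional per-device salt."""
    return zlib.crc32((salt + repr(flow)).encode()) % n_links

# Hypothetical 5-tuples: (sip, dip, proto, sport, dport)
flows = [(f"10.0.0.{i}", "8.0.0.1", 6, 1024 + i, 80) for i in range(256)]

# Upstream device: two links; only flows hashed to link 0 reach the local device.
to_local = [flow for flow in flows if pick_link(flow, 2) == 0]

# Local device with the same hash and no salt: every received flow maps to link 0 again.
same_hash = {pick_link(flow, 2) for flow in to_local}

# Local device with a different salt: the received flows spread over both links.
salted = {pick_link(flow, 2, salt="local") for flow in to_local}

print(same_hash, salted)
```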

9. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting RIR

The highest-priority link not selected for traffic forwarding

Symptom

In priority-based link selection, RIR does not select the link with the highest priority (Tunnel 1) for service traffic. Instead, it selects the link with the second-highest priority (Tunnel 2) for traffic forwarding.

Common causes

The following are the common causes of this type of issue:

· The route associated with the highest-priority link is unreachable.

· No ECMP routes are available to the destination IP address of service traffic.

· The bandwidth usage of the highest-priority link exceeds the specified lower threshold.

· The quality of the highest-priority link does not meet the requirements.

Analysis

Figure 64 shows the troubleshooting flowchart.

Figure 64 Flowchart for troubleshooting failure to select the highest-priority link for traffic forwarding

Solution

1. Identify whether the route associated with the highest-priority link (Tunnel 1) is reachable.

2. Execute the display tunnel flow-statistics command to view the link selected for the service traffic (Interface field). In this example, the selected link is Tunnel 2.

<Sysname> display tunnel flow-statistics flow 100

Flow 100:

Interface Out pps Out bps

Tunnel2 20 9600

3. Execute the display ip fast-forwarding cache command to view the 5-tuple information of the service traffic (the entry whose output interface is Tunnel 2), and obtain the destination IP address of the service traffic.

<Sysname> display ip fast-forwarding cache

Total number of fast-forwarding entries: 1

SIP SPort DIP DPort Pro Input_If Output_If Flg

7.0.0.13 68 8.0.0.1 67 17 GE2/0/3 Tunnel2 5

4. Execute the display fib command to identify whether ECMP routes are available to the destination IP address and whether they include Tunnel 1.

○ If no ECMP routes are available or the ECMP routes do not include Tunnel 1, check and edit the route configuration so that ECMP routes to the destination IP address exist and include Tunnel 1. Tunnel 1 can participate in RIR link selection only when these conditions are met.

○ If ECMP routes that include Tunnel 1 exist, proceed to step 5.

<Sysname> display fib

Route destination count: 5

Directly-connected host count: 0

Flag:

U:Useable G:Gateway H:Host B:Blackhole D:Dynamic S:Static

R:Relay F:FRR

Destination/Mask Nexthop Flag OutInterface/Token Label

8.0.0.1/32 127.0.0.1 UH Tunnel1 Null

8.0.0.1/32 127.0.0.1 UH Tunnel2 Null

5. Examine the bandwidth usage of the highest-priority link Tunnel 1.

6. Identify the bandwidth threshold. Check whether the flow priority-based-schedule enable command is configured in RIR-SDWAN view. If the command is not configured, the lower bandwidth usage threshold is 80%. If the command is configured, the lower bandwidth usage threshold is the value specified by the flow priority-based-schedule bandwidth-threshold command (20% by default).

7. Execute the display rir sdwan bandwidth tunnel command to identify whether the bandwidth usage of Tunnel 1 exceeds the lower bandwidth threshold.

○ If the lower bandwidth threshold is not exceeded, Tunnel 1 meets the bandwidth criteria for link selection. Proceed to step 8.

○ If the lower bandwidth threshold is exceeded, Tunnel 1 does not meet the link selection criteria. Verify that the bandwidth of Tunnel 1 matches the bandwidth of the tunnel's physical output interface. If they do not match, use the bandwidth command to correct the bandwidth of Tunnel 1. If they match, it is normal for RIR to schedule service traffic to the lower-priority link Tunnel 2, because the bandwidth usage of Tunnel 1 does not meet the link selection criteria.

<Sysname> display rir sdwan bandwidth tunnel 1

Tunnel bandwidth info:

Interface Total bandwidth Remaining bandwidth Bandwidth usage

Tunnel1 200 kbps 200 kbps 0 %

Output interface bandwidth info:

PeerTTE: SiteID=1 DeviceID=2 IfID=2

Interface Total bandwidth Remaining bandwidth Bandwidth usage

GE2/0/1 200 kbps 200 kbps 0 %
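The threshold logic in steps 6 and 7 can be summarized in a few lines of Python. This is a sketch based only on the values stated above (80% when flow priority-based-schedule enable is not configured, otherwise the flow priority-based-schedule bandwidth-threshold value, 20% by default); the function names are hypothetical.

```python
from typing import Optional

def lower_bandwidth_threshold(priority_schedule_enabled: bool,
                              configured_threshold: Optional[int] = None) -> int:
    """Lower bandwidth usage threshold, in percent."""
    if not priority_schedule_enabled:
        return 80
    return 20 if configured_threshold is None else configured_threshold

def tunnel_meets_bandwidth_criteria(usage_percent: float, priority_schedule_enabled: bool,
                                    configured_threshold: Optional[int] = None) -> bool:
    """A tunnel fails the check only when its usage exceeds the lower threshold."""
    threshold = lower_bandwidth_threshold(priority_schedule_enabled, configured_threshold)
    return usage_percent <= threshold

# Tunnel1 in the sample output shows 0 % bandwidth usage
print(tunnel_meets_bandwidth_criteria(0, priority_schedule_enabled=False))  # True
```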

8. Examine the quality of the highest-priority link Tunnel 1.

9. Execute the display rir sdwan flow command to identify whether the CQI value of Tunnel 1 is 100.

○ If the CQI value is 100, proceed to step 13.

○ If the CQI value is below 100, proceed to step 10.

<Sysname> display rir sdwan flow 1

Flow ID: 1

Session expected bandwidth: 2000 kbps

Quality policy: Yes

Tunnels with different preference values:

Preference: 8

Tunnel1

Site ID Device ID Interface ID CQI

100 1 100 80

100 2 110 90

10. Execute the display rir sdwan link-quality command to view the packet loss ratio (PktLoss (per mill)), delay (Delay (msec)), and jitter (Jitter (msec)) of Tunnel 1.

<Sysname> display rir sdwan link-quality

Tunnel1

Interface ID=1

Peer TTE: Site ID=1 Device ID=2 Interface ID=3

Connectivity: Connected

PktLoss (per mill): 0

Delay (msec) : 0

Jitter (msec) : 0

11. Compare the packet loss ratio, delay, and jitter of Tunnel 1 with the packet loss threshold, delay threshold, and jitter threshold configured in RIR-SDWAN view (by using the packet-loss threshold, delay threshold, and jitter threshold commands, respectively). If any threshold is exceeded, the quality of Tunnel 1 does not meet the link selection criteria, and it is normal for RIR to select Tunnel 2. The device then schedules service traffic to Tunnel 2, whose link quality meets the link selection criteria.
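The comparison in step 11 can be sketched as follows. Assumptions: the threshold values are hypothetical placeholders for the values configured with the packet-loss threshold, delay threshold, and jitter threshold commands, and they are expressed in the same units as the display rir sdwan link-quality output (per mill and milliseconds).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LinkQuality:
    pkt_loss_per_mill: float
    delay_ms: float
    jitter_ms: float

def exceeded_thresholds(measured: LinkQuality, limits: LinkQuality) -> List[str]:
    """Return the metrics whose measured value exceeds the configured threshold."""
    exceeded = []
    if measured.pkt_loss_per_mill > limits.pkt_loss_per_mill:
        exceeded.append("packet loss")
    if measured.delay_ms > limits.delay_ms:
        exceeded.append("delay")
    if measured.jitter_ms > limits.jitter_ms:
        exceeded.append("jitter")
    return exceeded

limits = LinkQuality(pkt_loss_per_mill=10, delay_ms=150, jitter_ms=30)  # hypothetical values
print(exceeded_thresholds(LinkQuality(0, 0, 0), limits))     # []
print(exceeded_thresholds(LinkQuality(5, 200, 40), limits))  # ['delay', 'jitter']
```

An empty list means Tunnel 1 meets the quality criteria (as in the sample output, where all three values are 0).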

12. To quickly restore the CQI value of Tunnel 1 to 100, you can increase the packet loss threshold, delay threshold, and jitter threshold in RIR-SDWAN view by using the packet-loss threshold, delay threshold, and jitter threshold commands. This allows the quality of Tunnel 1 to meet the link selection criteria.

13. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting multicast issues

MSDP issues

(S, G) entry creation failure

Symptom

The receiver-side MSDP peer fails to create (S, G) entries.

Common causes

The following are the common causes of this type of issue:

· The receiver-side MSDP peer fails to establish an MSDP peer relationship with the source-side MSDP peer.

· The receiver-side MSDP peer is not enabled with the SA message cache mechanism.

· The receiver-side MSDP peer does not receive SA messages from the source-side MSDP peer.

· The source-side MSDP peer is not the RP.

· Configuration errors exist, such as incorrect SA incoming policy, SA outgoing policy, or SA message creation policy.

Troubleshooting flow

Figure 65 shows the troubleshooting flowchart.

Figure 65 Flowchart for troubleshooting (S, G) entry creation failure

Solution

1. Verify that the receiver-side MSDP peer has successfully established an MSDP peer relationship with the source-side MSDP peer.

Execute the display msdp brief command on the receiver-side MSDP peer, and check the State field. If the State field is Established, an MSDP peer relationship has been established successfully.

○ If the State field is not Established, verify that the interface used to establish a TCP connection with the source-side MSDP peer is correct and that the MSDP peers can ping each other. If the MSDP peers cannot ping each other, troubleshoot the ping failure as described in "Ping failure."

○ If an MSDP peer relationship has been established successfully, proceed to the next step.

2. Verify that the receiver-side MSDP peer is enabled with the SA message cache mechanism.

Execute the display this command in MSDP view on the receiver-side MSDP peer to identify whether the SA message cache mechanism is enabled.

○ If no, execute the cache-sa-enable command.

○ If yes, proceed to the next step.

3. Identify whether the receiver-side MSDP peer receives SA messages from the source-side MSDP peer.

Execute the display msdp sa-cache command on the receiver-side MSDP peer to display the (S, G) entries in the SA cache. Identify whether SA messages from the source-side MSDP peer have been received by examining the (S, G) entries.

○ If no, proceed to step 4.

○ If yes, proceed to step 8.

4. Identify whether the source-side MSDP peer is configured with an SA outgoing policy.

Execute the display this command on the source-side MSDP peer to identify whether an SA outgoing policy is configured.

¡ If yes, perform one of the following tasks depending on whether an ACL is specified:

- If no ACL is specified, the source-side MSDP peer discards all SA messages. Use the undo peer sa-policy export command to delete the SA outgoing policy.

- If an ACL is specified, the source-side MSDP peer forwards only SA messages that the ACL permits. Verify that the SA messages are permitted by the specified ACL. If the SA messages are not permitted, use the undo peer sa-policy export command to delete the SA outgoing policy or modify the ACL.

¡ If no, proceed to the next step.

5. Identify whether the receiver-side MSDP peer is configured with an SA incoming policy.

Execute the display this command on the receiver-side MSDP peer to identify whether an SA incoming policy is configured.

¡ If yes, perform one of the following tasks depending on whether an ACL is specified:

- If no ACL is specified, the receiver-side MSDP peer discards all SA messages. Use the undo peer sa-policy import command to delete the SA incoming policy.

- If an ACL is specified, the receiver-side MSDP peer receives only SA messages that the ACL permits. Verify that the SA messages are permitted by the specified ACL. If the SA messages are not permitted, use the undo peer sa-policy import command to delete the SA incoming policy or modify the ACL.

¡ If no, proceed to the next step.

6. Verify that the source-side MSDP peer is the RP.

Execute the display pim routing-table command on the source-side MSDP peer, and check the Flag field. If the Flag field is 2MSDP, the source-side MSDP peer is the RP.

¡ If the source-side MSDP peer is not the RP, modify the RP configuration or modify the configuration on the receiver-side MSDP peer.

¡ If the source-side MSDP peer is the RP, proceed to the next step.

7. Identify whether the source-side MSDP peer is configured with an SA message creation policy.

Execute the display this command on the source-side MSDP peer to identify whether an SA message creation policy is configured.

¡ If yes, perform one of the following tasks depending on whether an ACL is specified:

- If no ACL is specified, the source-side MSDP peer does not advertise any (S, G) entries when creating SA messages. Use the undo import-source command to delete the SA message creation policy.

- If an ACL is specified, the source-side MSDP peer advertises only the (S, G) entries that the ACL permits. Verify that the (S, G) entries are permitted by the specified ACL. If the (S, G) entries are not permitted, use the undo import-source command to delete the SA message creation policy or modify the ACL.

¡ If no, proceed to the next step.

8. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
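Steps 4, 5, and 7 check three policies that an (S, G) entry must pass on its way to the receiver-side SA cache: the SA message creation policy (import-source) on the source-side RP, the SA outgoing policy on the source-side MSDP peer, and the SA incoming policy on the receiver-side MSDP peer. The following Python sketch models that path offline to help reason about which policy drops a given (S, G) entry. The SaPolicy and Rule types, first-match rule evaluation, and treating unmatched entries as denied are simplifying assumptions for illustration, not the Comware ACL implementation.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import List, Optional, Tuple

# Hypothetical simplified ACL rule: (permit, source prefix, group prefix).
Rule = Tuple[bool, IPv4Network, IPv4Network]


@dataclass
class SaPolicy:
    """An SA-related policy. acl=None models a policy configured without an ACL."""
    acl: Optional[List[Rule]] = None


def sa_allowed(policy: Optional[SaPolicy], source: str, group: str) -> bool:
    """Return True if the (S, G) entry passes the given policy."""
    if policy is None:  # No policy configured: nothing is filtered.
        return True
    if policy.acl is None:  # Policy without an ACL: every (S, G) is filtered.
        return False
    s, g = IPv4Address(source), IPv4Address(group)
    for permit, src_net, grp_net in policy.acl:
        if s in src_net and g in grp_net:
            return permit  # Assumption: the first matching rule decides.
    return False  # Assumption: an (S, G) matching no rule is treated as denied.


def blocking_policy(source: str, group: str,
                    import_source: Optional[SaPolicy] = None,
                    export: Optional[SaPolicy] = None,
                    import_: Optional[SaPolicy] = None) -> Optional[str]:
    """Walk the SA path from the source-side RP to the receiver-side peer and
    return the first policy that drops the (S, G), or None if none does."""
    path = [
        ("import-source on the source-side MSDP peer (step 7)", import_source),
        ("peer sa-policy export on the source-side MSDP peer (step 4)", export),
        ("peer sa-policy import on the receiver-side MSDP peer (step 5)", import_),
    ]
    for name, policy in path:
        if not sa_allowed(policy, source, group):
            return name
    return None


if __name__ == "__main__":
    acl = [(True, IPv4Network("10.1.1.0/24"), IPv4Network("225.1.1.0/24"))]
    # Export policy permits the entry: prints None.
    print(blocking_policy("10.1.1.5", "225.1.1.1", export=SaPolicy(acl)))
    # Incoming policy without an ACL drops everything: prints the step 5 policy name.
    print(blocking_policy("10.1.1.5", "225.1.1.1", import_=SaPolicy()))
```

The policies are evaluated in the order an SA message meets them, so the first name returned points to the step to revisit first.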

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

MVPN issues

Default MDT establishment failure

Symptom

The PEs cannot establish a default MDT or establish PIM neighbor relationships in the same VPN.

Common causes

The following are the common causes of this type of issue:

· Incorrect MTI configuration. To establish a default MDT in a VPN instance, you must specify a default group and an MVPN source interface with a valid IP address for the MTI for that VPN instance on each PE.

· Incorrect default group configuration. You must specify the same default group for the same VPN instance across PEs. A default group uniquely identifies a default MDT. The PEs cannot establish a default MDT for a VPN instance if they have different default groups for that VPN instance.

· Incorrect PIM configuration. To correctly establish a default MDT for a VPN instance, you must enable the same PIM mode on all interfaces in that VPN instance across PEs and all interfaces on the P devices. PIM mode consistency ensures establishment of PIM neighbor relationships between PEs in the same VPN instance for a successful default MDT establishment.

· Absence of unicast routes or BGP peers. PIM can only obtain routing information correctly if both unicast routes and BGP peers are configured.

· PIM disabled on the MTI for the VPN instance. This prevents PEs from establishing PIM neighbor relationships in the same VPN instance. To enable PIM on the MTI for a VPN instance, you must enable PIM on at least one interface in the VPN instance.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 66.

Figure 66 Troubleshooting flowchart

Solution

1. Execute the display interface command to check the state, IP address, and encapsulation information of MTIs.

2. Execute the display multicast-vpn default-group command to verify that different PEs have the same default group for the same VPN instance.

3. Execute the display pim interface verbose command on each device. Verify that PIM is enabled on at least one interface in the VPN instance on each PE. Ensure that the same PIM mode is enabled on the interfaces in the same VPN instance across PEs and on all interfaces of the P devices.

4. Execute the display ip routing-table command to verify that the local PE has unicast route entries to the remote PE in the same VPN instance.

5. Execute the display bgp peer command to verify that a BGP peer relationship has been established between the PEs.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
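Step 2 requires every PE to use the same default group for a given VPN instance. If you transcribe the default groups reported by the display multicast-vpn default-group command on each PE, a short Python sketch such as the following can flag VPN instances whose default groups disagree or are not multicast addresses. The input dictionary format and the function name are hypothetical conventions for illustration; the sketch does not parse device output.

```python
from ipaddress import IPv4Address
from typing import Dict

# {PE name: {VPN instance name: default group}}, transcribed by hand from the
# display multicast-vpn default-group output on each PE (hypothetical format).
PeGroups = Dict[str, Dict[str, str]]


def default_group_problems(pe_groups: PeGroups) -> Dict[str, Dict[str, str]]:
    """Return {VPN instance: {PE: default group}} for every VPN instance whose
    default group differs across the PEs that carry it, or is not multicast."""
    problems: Dict[str, Dict[str, str]] = {}
    vpns = sorted({vpn for groups in pe_groups.values() for vpn in groups})
    for vpn in vpns:
        # Compare only the PEs on which this VPN instance has a default group.
        per_pe = {pe: groups[vpn] for pe, groups in pe_groups.items() if vpn in groups}
        distinct = set(per_pe.values())
        not_multicast = any(not IPv4Address(g).is_multicast for g in distinct)
        if len(distinct) > 1 or not_multicast:
            problems[vpn] = per_pe
    return problems


if __name__ == "__main__":
    observed = {
        "PE1": {"vpna": "239.1.1.1", "vpnb": "239.2.2.2"},
        "PE2": {"vpna": "239.1.1.1", "vpnb": "239.2.2.3"},
    }
    # Prints {'vpnb': {'PE1': '239.2.2.2', 'PE2': '239.2.2.3'}}
    print(default_group_problems(observed))
```

A PE that has no entry for a VPN instance is simply skipped, so confirm separately that each PE carrying the VPN instance has a default group configured.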

Multicast routing table incorrectly built for a VPN instance

Symptom

The device cannot correctly build a multicast routing table for a VPN instance.

Common causes

The following are the common causes of this type of issue:

· The VPN instance or public instance does not have bootstrap router (BSR) information. To build a multicast routing table correctly for a VPN instance enabled with PIM-SM, both the VPN instance and the public instance must have the BSR information for that VPN instance.

· The VPN instance or public instance lacks RP information. To build a multicast routing table correctly for a VPN instance enabled with PIM-SM, both the VPN instance and the public instance must have the RP information for the VPN instance and routes to that RP. In addition, the devices in the public instance and VPN instance must correctly establish PIM neighbor relationships.

· No active routes are available between the DRs and the RP for the private network. The DRs in the private network must have routes to their RP, and the VPN instance for the private network must have routes to the multicast source.

Troubleshooting flow

Figure 67 shows the troubleshooting flowchart.

Figure 67 Troubleshooting flowchart for issues with building the multicast routing table for a VPN instance

Solution

1. Use the display pim bsr-info command to verify that the public and VPN instances have BSR information. If BSR information is absent, check for unicast routes to the BSR.

2. Use the display pim rp-info command to verify that the RP information is correct. If no RP information is displayed, verify the presence of a unicast route to the RP. Additionally, execute the display pim neighbor command to verify that PIM neighbor relationships have been correctly established on both public and private networks.

3. Use the ping command to check connectivity between the DR and RP on the private network, and between the receivers and the multicast source.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

PIM issues

PIM neighbor establishment failure

Symptom

The PIM neighbor relationship fails to be established.

Common causes

The following are the common causes of this type of issue:

· The physical state of the interface is down.

· The primary IP address is not configured on the interface.

· The PIM function on the interface does not take effect.

· The interface is not enabled with PIM.

· The PIM-related configuration on the interface is incorrect.

Troubleshooting flow

Figure 68 shows the troubleshooting flowchart.

Figure 68 Flowchart for troubleshooting PIM neighbor establishment failure

Solution

1. Verify that the physical state of the interface is up.

Execute the display interface interface-type interface-number command, and check the Current state field for the physical state of the interface.

¡ If the physical state is up, proceed to the next step.

¡ If the physical state is down, troubleshoot the interface down issue.

2. Verify that the interface is configured with a primary IP address.

Execute the display this command on the interface, and check for the primary IP address.

¡ If the primary IP address is not configured, use the ip address command to configure it.

¡ If the primary IP address is configured, proceed to the next step.

3. Verify that the interface is enabled with PIM.

Execute the display current-configuration interface command to identify whether PIM is enabled on the interface.

¡ If PIM is not enabled, execute the pim dm or pim sm command on the interface.

¡ If PIM is enabled, proceed to the next step.

4. Verify that PIM has taken effect on the interface.

Execute the display pim interface command. If PIM information exists for the interface, PIM has taken effect on the interface.

¡ If PIM has not taken effect, execute the display current-configuration | include multicast command to identify whether IP multicast routing has been enabled.

- If IP multicast routing has not been enabled, execute the multicast routing command to enable it.

- If IP multicast routing has been enabled, proceed to the next step.

¡ If PIM has taken effect, proceed to the next step.

5. Verify that the PIM-related configuration on the interface is correct.

The following are the common configuration errors that can cause the PIM neighbor relationship to fail to be established:

¡ The IP addresses of the directly connected interfaces are not on the same network segment.

¡ A PIM hello policy is configured on the interface by using the pim neighbor-policy command, but the neighbor’s IP address is not permitted by the specified ACL and PIM hello messages from the neighbor are dropped. Identify whether a PIM hello policy is required.

- If yes, modify the ACL so that the IP address of the PIM neighbor can be permitted by the ACL.

- If no, execute the undo pim neighbor-policy command to delete it.

¡ The pim require-genid command is configured on the interface, so the interface drops hello messages that do not carry the generation ID option, and hello messages from the neighbor do not carry this option. Identify whether hello messages without the generation ID option must be dropped.

- If yes, proceed to the next step.

- If no, execute the undo pim require-genid command.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
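The three configuration errors listed in step 5 can be checked offline once you have the primary IP addresses of both directly connected interfaces and the relevant settings. The following Python sketch is a simplified model under stated assumptions: neighbor_policy stands in for the pim neighbor-policy ACL as a plain list of permitted prefixes, and the function name and parameters are hypothetical.

```python
from ipaddress import ip_interface, ip_network
from typing import List, Optional


def pim_hello_problems(local_if: str, neighbor_if: str,
                       neighbor_policy: Optional[List[str]] = None,
                       require_genid: bool = False,
                       hello_has_genid: bool = True) -> List[str]:
    """Return the step 5 configuration errors that would stop the local
    interface from accepting the neighbor's hello messages.

    local_if / neighbor_if: primary addresses with prefix length, such as "10.1.1.1/24".
    neighbor_policy: permitted neighbor prefixes, a simplified stand-in for the
    ACL used by pim neighbor-policy (None means no hello policy is configured).
    """
    problems = []
    local, neighbor = ip_interface(local_if), ip_interface(neighbor_if)
    # Directly connected interfaces must be on the same network segment.
    if neighbor.ip not in local.network or local.ip not in neighbor.network:
        problems.append("addresses are not on the same network segment")
    # A hello policy drops hellos from neighbors that its ACL does not permit.
    if neighbor_policy is not None and not any(
            neighbor.ip in ip_network(prefix) for prefix in neighbor_policy):
        problems.append("neighbor address is not permitted by pim neighbor-policy")
    # pim require-genid drops hellos that carry no generation ID option.
    if require_genid and not hello_has_genid:
        problems.append("pim require-genid drops hellos without a generation ID")
    return problems
```

For example, pim_hello_problems("10.1.1.1/24", "10.1.2.2/24") reports only that the addresses are not on the same network segment. An empty list means none of the modeled errors applies, and the next step is step 6.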

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Layer 3 multicast traffic forwarding failure within a PIM domain

Symptom

After IP multicast routing is enabled, Layer 3 multicast traffic fails to be forwarded within the same PIM domain.

Common causes

The following are the common causes of this type of issue:

· An interface for forwarding multicast data is not enabled with PIM.

· The PIM function on an interface does not take effect.

· The PIM neighbor relationship fails to be established.

· The interface connected to the hosts is not enabled with IGMP.

· In a PIM-SM or BIDIR-PIM network, the RP is not configured or the RP information is incorrect.

· No RPF route to the RP or multicast source exists.

· An interface for forwarding multicast data has been configured with a multicast forwarding boundary.

· In a PIM-SM or BIDIR-PIM network, an incorrect multicast source policy is configured.

· No multicast entry is generated.

Troubleshooting flow

Figure 69 shows the troubleshooting flowchart.

Figure 69 Flowchart for troubleshooting Layer 3 multicast traffic forwarding failure within a PIM domain

Solution

1. Verify that the interface for forwarding multicast data is enabled with PIM.

Execute the display this command on the interface for forwarding multicast data, and identify whether the PIM-SM or PIM-DM configuration exists.

¡ If no, PIM is not enabled on the interface. Execute the pim sm or pim dm command on the interface. In the case of a BIDIR-PIM network, also execute the bidir-pim enable command to enable BIDIR-PIM.

¡ If yes, proceed to the next step.

2. Verify that PIM has taken effect on the interface.

Execute the display pim interface command. If PIM information exists for the interface, PIM has taken effect on the interface.

¡ If PIM has not taken effect, execute the display interface interface-type interface-number command, and check the Current state field for the physical state of the interface. If the physical state is down, troubleshoot the interface down issue.

¡ If PIM has taken effect, proceed to the next step.

3. Verify that the PIM neighbor relationship has been established successfully.

Execute the display pim neighbor command. If PIM neighbor information exists, the PIM neighbor relationship has been established successfully.

¡ If the PIM neighbor relationship fails to be established, see “PIM neighbor establishment failure” for troubleshooting.

¡ If the PIM neighbor relationship has been established successfully, proceed to the next step.

4. Verify that IGMP has taken effect on the interface connected to the subnet of hosts.

Execute the display igmp interface command. If IGMP information exists for the interface, IGMP has taken effect on the interface.

¡ If IGMP has not taken effect, execute the igmp enable command on the interface to enable IGMP.

¡ If IGMP has taken effect:

- For a PIM-SM or BIDIR-PIM network, proceed to step 5.

- For a PIM-DM network, proceed to step 7.

5. Verify that the RP information is correct in a PIM-SM or BIDIR-PIM network.

Execute the display pim rp-info command on each device in the network. Identify whether the RP information for the multicast group is the same on all devices.

¡ If the RP information is different and static RPs are used, execute the static-rp command on each device to configure the same static RP. If dynamically elected RPs are used, proceed to step 6.

¡ If the RP information is the same on all devices, proceed to step 6.

6. Verify that an RPF route to the RP exists.

Execute the display multicast rpf-info command to check for the RPF route to the RP.

¡ If no RPF route to the RP exists, examine the unicast route configuration. Execute the ping command on both the device and the RP to identify whether they can ping each other successfully. If no, modify the unicast route configuration until they can ping each other successfully.

¡ If an RPF route to the RP exists, execute the display multicast rpf-info command, and check the Referenced route type field for the type of the referenced route.

- If the RPF route is a static multicast route, execute the display multicast routing-table static command to identify whether the static multicast route is correct.

- If the RPF route is a unicast route, execute the display ip routing-table command to identify whether the unicast route is the same as the RPF route.

¡ If an RPF route to the RP exists and is correct, proceed to the next step.

7. Verify that an RPF route to the multicast source exists.

Execute the display multicast rpf-info command to check for the RPF route to the multicast source.

¡ If no RPF route to the multicast source exists, examine the unicast route configuration. Execute the ping command on both the device and the multicast source to identify whether they can ping each other successfully. If no, modify the unicast route configuration until they can ping each other successfully.

¡ If an RPF route to the multicast source exists, execute the display multicast rpf-info command, and check the Referenced route type field for the type of the referenced route.

- If the RPF route is a static multicast route (the Referenced route type field is multicast static), execute the display multicast routing-table static command to identify whether the static multicast route is correct.

- If the RPF route is a unicast route (the Referenced route type field is igp, egp, unicast (direct), or unicast), execute the display ip routing-table command to identify whether the unicast route is the same as the RPF route.

¡ If an RPF route to the multicast source exists and is correct, proceed to the next step.

8. Verify that the RPF interface and the connected interface of the device's RPF neighbor have not been configured with a multicast forwarding boundary.

Execute the display multicast boundary command to check for the multicast forwarding boundary configuration.

¡ If an interface is configured with a multicast forwarding boundary, execute the undo multicast boundary command to delete it.

¡ If no interface is configured with a multicast forwarding boundary, proceed to the next step.

9. Verify that no multicast source policy is configured, or that the configured multicast source policy permits the multicast data.

Execute the display this command in PIM view to check for a multicast source policy (configured by using the source-policy command).

¡ If a multicast source policy has been configured, verify that it permits multicast data to be forwarded. If the multicast source policy denies multicast data, execute the undo source-policy command to delete it or modify the ACL so that it can permit the multicast data to pass through.

¡ If no multicast source policy is configured, proceed to the next step.

10. Verify that multicast entries have been generated.

Use the following commands to check for multicast entries and collect entry information:

¡ Execute the display pim routing-table command to check for PIM routing entries.

¡ Execute the display igmp group command to check for IGMP multicast group entries.

¡ Execute the display multicast routing-table command to check for multicast routing entries.

¡ Execute the display multicast forwarding-table command to check for multicast forwarding entries.

Regardless of whether multicast entries exist, record the command output and proceed to step 11.

11. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
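In steps 6 and 7, the Referenced route type field in the display multicast rpf-info output determines which command verifies the RPF route. The following Python sketch encodes the mapping given in step 7: multicast static indicates a static multicast route, and igp, egp, unicast (direct), or unicast indicates a unicast route. The function name is hypothetical, and any value outside that list is reported rather than guessed.

```python
# Referenced route type values as listed in step 7; matching is case-insensitive.
STATIC_MULTICAST = {"multicast static"}
UNICAST = {"igp", "egp", "unicast (direct)", "unicast"}


def rpf_verification_command(referenced_route_type: str) -> str:
    """Map the Referenced route type field of display multicast rpf-info
    to the command used to verify the RPF route."""
    # Normalize case and internal whitespace before matching.
    value = " ".join(referenced_route_type.split()).lower()
    if value in STATIC_MULTICAST:
        return "display multicast routing-table static"
    if value in UNICAST:
        return "display ip routing-table"
    # Types not listed in this guide: collect the output for Technical Support.
    raise ValueError(f"unrecognized Referenced route type: {referenced_route_type!r}")
```

Raising an error for unlisted types keeps the sketch from silently choosing a verification command that this guide does not describe.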

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

SPT forwarding failure in PIM-SM

Symptom

Multicast data fails to be forwarded through the SPT in a PIM-SM network. This section applies only to non-RP devices. If the faulty device is an RP, contact Technical Support.

Common causes

The following are the common causes of this type of issue:

· The interface connected to downstream devices does not receive PIM join messages.

· The interface is not enabled with PIM-SM.

· The RPF route to the multicast source is incorrect.

· Configuration errors exist, such as multicast forwarding boundary and multicast source policy.

Troubleshooting flow

Figure 70 shows the troubleshooting flowchart.

Figure 70 Flowchart for troubleshooting SPT forwarding failure in PIM-SM

Solution

1. Verify that a correct (S, G) entry exists in the PIM routing table.

Execute the display pim routing-table command to check for the correct (S, G) entry.

¡ If a correct (S, G) entry exists, execute the display multicast forwarding-table command at 15-second intervals, and check the Matched packets and Forwarded packets fields.

- If no (S, G) entry exists in the forwarding table or the values of the Matched packets and Forwarded packets fields do not increase, proceed to step 8.

- If an (S, G) entry exists in the forwarding table and the values of the Matched packets and Forwarded packets fields increase, also proceed to step 8.

¡ If no correct (S, G) entry exists, proceed to step 2.

2. Verify that the interface connected to the downstream device has received PIM join messages.

Under the guidance of Technical Support, use a packet capture tool such as Wireshark to capture packets on the interface to identify whether PIM join messages are received.

¡ If not, use the packet capture tool to capture packets on the downstream device's interface connected to the device, and identify whether the downstream device has sent PIM join messages. If not, troubleshoot the downstream device. If yes, communication between the device and the downstream device is abnormal. Proceed to step 8.

¡ If the interface connected to the downstream device has received PIM join messages, proceed to step 3.

3. Verify that the interface is enabled with PIM-SM.

Execute the display pim interface verbose command to identify whether the RPF interface, RPF neighbor interface, and interface connected to the subnet of hosts (downstream interface on the receiver-side DR) have been enabled with PIM-SM.

¡ If any of the interfaces is not enabled with PIM-SM, execute the pim sm command on the interface. Verify that IP multicast routing has been enabled by using the multicast routing command and that the neighbor relationship has been established successfully (displayed by using the display pim neighbor command).

¡ If all of the interfaces are enabled with PIM-SM, proceed to step 4.

4. Verify that an RPF route to the multicast source exists.

Execute the display multicast rpf-info command to check for the RPF route to the multicast source.

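¡ If no RPF route to the multicast source exists, examine the unicast route configuration. Execute the ping command on both the device and the multicast source to identify whether they can ping each other successfully. If no, modify the unicast route configuration until they can ping each other successfully.

¡ If an RPF route to the multicast source exists, execute the display multicast rpf-info command, and check the Referenced route type field for the type of the referenced route.

- If the RPF route is a static multicast route, execute the display multicast routing-table static command to identify whether the static multicast route is correct.

- If the RPF route is a unicast route, execute the display ip routing-table command to identify whether the unicast route is the same as the RPF route.

¡ If an RPF route to the multicast source exists and is correct, proceed to step 5.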

5. Verify that the DR corresponding to the interface for forwarding multicast data is the receiver-side DR.

Execute the display pim interface command, and check the DR-Address field. If (local) is displayed after the DR address, the DR is the receiver-side DR.

¡ If the DR is not the receiver-side DR, locate the device where the DR resides and perform step 6 on the device.

¡ If the DR is the receiver-side DR, perform step 6 on the current device.

6. Verify that the RPF interface and the connected interface of the device's RPF neighbor have not been configured with a multicast forwarding boundary.

Execute the display multicast boundary command to check for the multicast forwarding boundary configuration.

¡ If an interface is configured with a multicast forwarding boundary, execute the undo multicast boundary command to delete it.

¡ If no interface is configured with a multicast forwarding boundary, proceed to step 7.

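7. Verify that no multicast source policy is configured, or that the configured multicast source policy does not deny the multicast data.

Execute the display this command in PIM view to check for a multicast source policy (configured by using the source-policy command).

¡ If a multicast source policy denies the multicast data, execute the undo source-policy command to delete it, or modify the ACL so that it permits the multicast data.

¡ If no multicast source policy is configured, or the policy permits the multicast data, proceed to step 8.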

8. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
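Step 1 compares the Matched packets and Forwarded packets fields of the (S, G) entry across two runs of the display multicast forwarding-table command taken about 15 seconds apart. The following Python sketch records how to interpret the two samples so that the result can be included in the information collected in step 8. The ForwardingSample type and the function name are hypothetical, and the counter values are assumed to be transcribed by hand from the command output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForwardingSample:
    """Counters of one (S, G) entry from one display multicast forwarding-table run."""
    matched_packets: int
    forwarded_packets: int


def interpret_samples(first: Optional[ForwardingSample],
                      second: Optional[ForwardingSample]) -> str:
    """Interpret two samples taken about 15 seconds apart.

    None means the (S, G) entry was absent from the forwarding table in that run.
    """
    if first is None or second is None:
        return "entry missing from forwarding table"
    # Step 1 distinguishes entries whose Matched and Forwarded counters both grow.
    if (second.matched_packets > first.matched_packets
            and second.forwarded_packets > first.forwarded_packets):
        return "matched and forwarded counters increasing"
    return "counters not increasing"
```

In this guide every outcome of step 1 with a correct (S, G) entry leads to step 8; the sketch only helps record which outcome you observed.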

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

RPT forwarding failure in PIM-SM

Symptom

Multicast data fails to be forwarded through the RPT in a PIM-SM network. This section applies only to non-RP devices. If the faulty device is an RP, contact Technical Support.

Common causes

The following are the common causes of this type of issue:

· The route to the RP is unreachable.

· The RP address is not the same on all devices in the PIM-SM network.

· The interface connected to downstream devices does not receive PIM join messages.

· The interface is not enabled with PIM-SM.

· The RPF route to the RP is incorrect.

· Configuration errors exist, such as multicast forwarding boundary and multicast source policy.

Troubleshooting flow

Figure 71 shows the troubleshooting flowchart.

Figure 71 Flowchart for troubleshooting RPT forwarding failure in PIM-SM

Solution

1. Verify that a correct (S, G) entry exists in the PIM routing table.

Execute the display pim routing-table command to check for the correct (S, G) entry.

¡ If a correct (S, G) entry exists, execute the display multicast forwarding-table command at 15-second intervals, and identify whether the same (S, G) entry exists in the forwarding table and check the Matched packets and Forwarded packets fields.

- If no (S, G) entry exists in the forwarding table or the values of the Matched packets and Forwarded packets fields do not increase, proceed to step 9.

- If the same (S, G) entry exists in the forwarding table and the values of the Matched packets and Forwarded packets fields increase, also proceed to step 9.

¡ If no correct (S, G) entry exists, proceed to step 2.

2. Verify that the interface connected to the downstream device has received PIM join messages.

Under the guidance of Technical Support, use a packet capture tool such as Wireshark to capture packets on the interface to identify whether PIM join messages are received.

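¡ If not, use the packet capture tool to capture packets on the downstream device's interface connected to the device, and identify whether the downstream device has sent PIM join messages. If not, troubleshoot the downstream device. If yes, communication between the device and the downstream device is abnormal. Proceed to step 9.

¡ If the interface connected to the downstream device has received PIM join messages, proceed to step 3.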

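3. Verify that the interface is enabled with PIM-SM.

Execute the display pim interface verbose command to identify whether the RPF interface, RPF neighbor interface, and interface connected to the subnet of hosts (downstream interface on the receiver-side DR) have been enabled with PIM-SM.

¡ If any of the interfaces is not enabled with PIM-SM, execute the pim sm command on the interface. Verify that IP multicast routing has been enabled by using the multicast routing command and that the neighbor relationship has been established successfully (displayed by using the display pim neighbor command).

¡ If all of the interfaces are enabled with PIM-SM, proceed to step 4.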

4. Verify that the RP information is correct.

Execute the display pim rp-info command to check for the RP information, and identify whether all other devices in the PIM-SM domain have the same RP information.

¡ If the RP information is not the same and static RPs are used, execute the static-rp command on all devices to configure the same RP address. If dynamic RPs are used, proceed to step 9.

¡ If the RP information is the same, proceed to step 5.

5. Verify that an RPF route to the RP exists.

Execute the display multicast rpf-info command to check for the RPF route to the RP.

¡ If no RPF route to the RP exists, examine the unicast route configuration. Execute the ping command on both the device and the RP to identify whether they can ping each other successfully. If no, modify the unicast route configuration until they can ping each other successfully.

¡ If an RPF route to the RP exists, execute the display multicast rpf-info command, and check the Referenced route type field for the type of the referenced route.

- If the RPF route is a static multicast route, execute the display multicast routing-table static command to identify whether the static multicast route is correct.

- If the RPF route is a unicast route, execute the display ip routing-table command to identify whether the unicast route is the same as the RPF route.

¡ If an RPF route to the RP exists and is correct, proceed to step 6.

6. Verify that the DR corresponding to the interface for forwarding multicast data is the receiver-side DR.

Execute the display pim interface command, and check the DR-Address field. If (local) is displayed after the DR address, the DR is the receiver-side DR.

¡ If the DR is not the receiver-side DR, locate the device where the DR resides and perform step 7 on the device.

¡ If the DR is the receiver-side DR, perform step 7 on the current device.

7. Verify that the RPF interface and the connected interface of the device's RPF neighbor have not been configured with a multicast forwarding boundary.

Execute the display multicast boundary command to check for the multicast forwarding boundary configuration.

¡ If an interface is configured with a multicast forwarding boundary, execute the undo multicast boundary command to delete it.

¡ If no interface is configured with a multicast forwarding boundary, proceed to step 8.

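8. Verify that no multicast source policy is configured, or that the configured multicast source policy does not deny the multicast data.

Execute the display this command in PIM view to check for a multicast source policy (configured by using the source-policy command).

¡ If a multicast source policy denies the multicast data, execute the undo source-policy command to delete it, or modify the ACL so that it permits the multicast data.

¡ If no multicast source policy is configured, or the policy permits the multicast data, proceed to step 9.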

9. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
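Step 4 requires every device in the PIM-SM domain to report the same RP for the multicast group. After collecting the RP address reported by the display pim rp-info command on each device, you can group the devices by RP with a Python sketch such as the following. The input format and function names are hypothetical, and the sketch only compares addresses; it does not check whether the RP is reachable.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def devices_by_rp(rp_by_device: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group devices by the RP address they report for the multicast group.
    None means the device reported no RP for the group."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for device, rp in rp_by_device.items():
        groups[rp if rp is not None else "<no RP>"].append(device)
    return {rp: sorted(devices) for rp, devices in sorted(groups.items())}


def rp_consistent(rp_by_device: Dict[str, Optional[str]]) -> bool:
    """True if every device reports the same, non-empty RP address."""
    return len(devices_by_rp(rp_by_device)) == 1 and None not in rp_by_device.values()
```

If rp_consistent returns False, the grouping shows which devices need the same static-rp configuration, or, with dynamic RPs, which output to hand to Technical Support.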

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Layer 3 multicast issues

Layer 3 multicast traffic forwarding failure

Symptom

A device fails to forward Layer 3 multicast traffic.

Common causes

The following are the common causes of this type of issue:

· No unicast routes exist.

· The physical state of the input or output interface is down.

· The device does not generate PIM routing entries or generates an incorrect PIM routing entry.

· The device does not generate multicast forwarding entries or generates an incorrect multicast forwarding entry.

Troubleshooting flow

Figure 72 shows the troubleshooting flowchart.

Figure 72 Flowchart for troubleshooting Layer 3 multicast traffic forwarding failure

Solution

1. Verify that a unicast route to the multicast source exists.

Execute the display ip routing-table ip-address command, and check the unicast route to the multicast source. Specify the multicast source address for the ip-address argument.

¡ If no unicast route to the multicast source exists, configure one.

¡ If a unicast route to the multicast source exists, proceed to the next step.

2. Verify that the physical states of the input interface and output interface are up.

Execute the display interface command to check the physical states of the input interface and output interface.

¡ If the physical state of either interface is down, troubleshoot the interface down issue.

¡ If the physical states of both interfaces are up, proceed to the next step.

3. Verify that the device generates a PIM routing entry and the entry has the correct output interface.

Execute the display pim routing-table command to check for correct PIM routing entries.

¡ If the device has not generated a correct PIM routing entry, contact Technical Support.

¡ If the device has generated a correct PIM routing entry, proceed to the next step.

4. Verify that the device generates a multicast forwarding entry and the entry has the correct output interface.

Execute the display multicast forwarding-table command to check for correct multicast forwarding entries.

¡ If the device has not generated a correct multicast forwarding entry, collect the results of each step and the configuration file, and contact Technical Support.

¡ If the device has generated a correct multicast forwarding entry but the issue persists, also collect the results of each step and the configuration file, and contact Technical Support.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

IGMP or MLD entry establishment failure

Symptom

A device fails to establish IGMP or MLD entries.

Common causes

The following are the common causes of this type of issue:

· The device is not enabled with IP multicast routing.

· The physical state of the interface connected to the subnet of hosts is down.

· The interface connected to the subnet of hosts is not configured with a primary IP address.

· The interface connected to the subnet of hosts is not enabled with IGMP or MLD.

· The multicast group address is in the SSM group range, but the IGMP or MLD version is incorrect.

· An SSM group range is configured, but the multicast group address is not permitted by the ACL.

· An IGMP or MLD multicast group policy is configured, but the multicast group address is not permitted by the ACL.

Troubleshooting flow

Figure 73 shows the troubleshooting flowchart.

Figure 73 Flowchart for troubleshooting IGMP or MLD entry establishment failure

Solution

1. Verify that the device is enabled with IP multicast routing.

Execute the display current-configuration | include multicast command to identify whether IP multicast routing has been enabled.

¡ If IP multicast routing has not been enabled, execute the multicast routing command to enable it.

¡ If IP multicast routing has been enabled, proceed to the next step.

2. Verify that the physical state of the interface connected to the subnet of hosts is up.

Execute the display interface interface-type interface-number command, and check the Current state field for the physical state of the interface.

¡ If the physical state is up, proceed to the next step.

¡ If the physical state is down, troubleshoot the interface down issue.

3. Verify that the interface is configured with a primary IP address.

Execute the display this command in the view of the interface connected to the subnet of hosts, and check for the primary IP address.

¡ If the primary IP address is not configured, use the ip address command to configure it.

¡ If the primary IP address is configured, proceed to the next step.

4. Verify that the interface connected to the subnet of hosts is enabled with IGMP or MLD.

Execute the display current-configuration interface command to identify whether IGMP or MLD is enabled on the interface.

¡ If IGMP or MLD is not enabled, execute the igmp enable or mld enable command in the interface view.

¡ If IGMP or MLD is enabled, proceed to the next step.

5. Identify whether the multicast group address is in the default SSM group range.

¡ For IGMP, the default SSM group range is 232.0.0.0/8.

- If the multicast group address is in the default SSM group range, verify that the IGMP version is IGMPv3 and IGMPv3 packets are correct. If the issue persists, proceed to step 6.

- If the multicast group address is not in the default SSM group range, proceed to step 7.

¡ For MLD, the default IPv6 SSM group range is FF3x::/32.

- If the multicast group address is in the default IPv6 SSM group range, verify that the MLD version is MLDv2. If the issue persists, proceed to step 6.

- If the multicast group address is not in the default IPv6 SSM group range, proceed to step 7.

6. Identify whether an SSM group range is configured on the device.

Execute the display current-configuration configuration pim or display current-configuration configuration pim6 command to identify whether an SSM group range is configured.

¡ If an SSM group range is configured, identify whether the multicast group address is permitted by the ACL specified in the ssm-policy command.

- If no, execute the undo ssm-policy command in PIM view (IPv6 PIM view for MLD) or modify the ACL so that it permits the multicast group address.

- If yes, proceed to step 7.

¡ If an SSM group range is not configured, proceed to step 7.

7. Identify whether an IGMP or MLD multicast group policy is configured on the interface.

Execute the display current-configuration command to identify whether an IGMP or MLD multicast group policy is configured.

¡ If yes, identify whether the multicast group address is permitted by the ACL.

- If no, execute the undo igmp group-policy or undo mld group-policy command, or modify the ACL so that it permits the multicast group address.

- If yes, proceed to step 8.

¡ If no, proceed to step 8.

8. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
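The default SSM range check in step 5 can be sketched in Python as follows. This is a minimal illustration, not an H3C tool: the function names are hypothetical, and the code only encodes the two default ranges stated in step 5 (232.0.0.0/8 for IGMP and FF3x::/32 for MLD). It does not account for a custom SSM group range configured with the ssm-policy command.

```python
import ipaddress
from typing import Optional

# Default IPv4 SSM group range from step 5 (hypothetical helper, not a device API).
IPV4_SSM = ipaddress.IPv4Network("232.0.0.0/8")


def in_default_ssm_range(group: str) -> bool:
    """Return True if the multicast group is in the default SSM group range."""
    addr = ipaddress.ip_address(group)
    if addr.version == 4:
        return addr in IPV4_SSM
    value = int(addr)
    # FF3x::/32: the top 12 bits are 0xFF3 (x, the scope, can be any value)
    # and the second 16-bit group must be zero.
    return (value >> 116) == 0xFF3 and ((value >> 96) & 0xFFFF) == 0


def required_version(group: str) -> Optional[str]:
    """Return the protocol version that step 5 expects, or None if not SSM."""
    if not in_default_ssm_range(group):
        return None  # Not in the default SSM range: go to step 7.
    return "IGMPv3" if ipaddress.ip_address(group).version == 4 else "MLDv2"
```

For example, required_version("FF3E::1234") returns "MLDv2", and required_version("225.1.1.1") returns None, so the second group skips the version check and goes directly to step 7.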

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Layer 2 multicast issues

Layer 2 multicast traffic forwarding failure

Symptom

A device fails to forward Layer 2 multicast traffic.

Common causes

The following are the common causes of this type of issue:

· The device does not generate Layer 2 multicast forwarding entries.

· The device does not receive Layer 2 multicast protocol packets (IGMP reports).

· The IGMP protocol packet format is incorrect.

· The version in IGMP protocol packets is different from the IGMP snooping version configured on the device.

· Layer 3 multicast is configured.

Troubleshooting flow

Figure 74 shows the troubleshooting flowchart.

Figure 74 Flowchart for troubleshooting Layer 2 multicast traffic forwarding failure

Solution

1. Identify whether the device generates Layer 2 multicast forwarding entries.

Execute the display l2-multicast ip forwarding command to check for Layer 2 multicast forwarding entries.

¡ If the device has generated Layer 2 multicast forwarding entries, contact Technical Support.

¡ If the device has not generated Layer 2 multicast forwarding entries, proceed to step 2.

2. Identify whether the device receives IGMP reports.

Execute the debugging igmp-snooping packet command to enable IGMP snooping packet debugging. If the following information is printed, the device has received IGMP reports:

*Sep 15 11:47:41:455 2011 Sysname MCS/7/PACKET: -MDC=1; Receive IGMPv2 report packet from port GE2/0/1 on VLAN 2. (G162625)

¡ If the device has not received IGMP reports, troubleshoot the downstream device and hosts.

¡ If the device has received IGMP reports, proceed to step 3.

3. Verify that the IGMP protocol packet format is correct.

Configure mirroring, and use a packet capture tool (for example, Wireshark) to capture and analyze mirrored IGMP protocol packets under the guidance of Technical Support.

¡ If the IGMP protocol packet format is incorrect, troubleshoot the device or host that sends the malformed IGMP packets.

¡ If the IGMP protocol packet format is correct, proceed to step 4.

4. Verify that the version in IGMP protocol packets is the same as the IGMP snooping version configured on the device.

Execute the display igmp-snooping command, and check the Version field for the IGMP snooping version.

¡ If the version in IGMP protocol packets is different from the IGMP snooping version, perform one of the following tasks:

- Modify the IGMP versions on the upstream and downstream devices to match the IGMP snooping version on the device.

- Use the version command in IGMP-snooping view or the igmp-snooping version command in VLAN view to change the IGMP snooping version to match the IGMP versions on the upstream and downstream devices.

¡ If the version in IGMP protocol packets is the same as the IGMP snooping version, proceed to step 5.

5. Verify that Layer 3 multicast is not configured.

¡ If Layer 3 multicast is configured, delete the Layer 3 multicast configuration.

¡ If Layer 3 multicast is not configured, proceed to the next step.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
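Step 4 compares the IGMP version carried in received reports with the configured IGMP snooping version. The following Python sketch performs that comparison on saved debug output. It assumes the debug messages follow the format of the sample message in step 2 (other message formats are not handled), and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches the report message shown in step 2, for example:
# "Receive IGMPv2 report packet from port GE2/0/1 on VLAN 2."
REPORT_RE = re.compile(
    r"Receive IGMPv(?P<ver>\d) report packet from port (?P<port>\S+) on VLAN (?P<vlan>\d+)"
)


def find_version_mismatches(
    debug_lines: Iterable[str], snooping_version: int
) -> List[Tuple[str, int, int]]:
    """Return (port, VLAN, report version) for every received report whose
    IGMP version differs from the IGMP snooping version (the Version field
    of the display igmp-snooping command)."""
    mismatches = []
    for line in debug_lines:
        match = REPORT_RE.search(line)
        if match and int(match.group("ver")) != snooping_version:
            mismatches.append(
                (match.group("port"), int(match.group("vlan")), int(match.group("ver")))
            )
    return mismatches
```

With the sample message from step 2 and an IGMP snooping version of 3, the function returns [("GE2/0/1", 2, 2)], which indicates a version mismatch. An empty list means the versions match and you can proceed to step 5.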

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting MPLS issues

LDP issues

LDP session down

Symptom

The LDP session cannot go up.

Common causes

The following are the common causes of this type of issue:

· The interface used to establish the session is down.

· The LSR ID has been configured incorrectly.

· The configuration required for the LDP session is missing.

· The transport address configuration is incorrect.

· The LDP Hello-hold timer has timed out.

· The LDP Keepalive-hold timer has timed out.

· Security authentication configuration is incorrect.

Troubleshooting flow

Figure 75 shows the troubleshooting flowchart.

Figure 75 Flowchart for troubleshooting LDP session down

Solution

To resolve the issue:

1. Check whether the interface for establishing an LDP session is in the up state.

Execute the display interface command to identify whether the interface is up.

¡ If the interface is not up, identify and eliminate any physical link faults to bring the interface up.

¡ If the interface is up, proceed to step 2.

2. Check whether the LSR ID configuration is correct.

Three types of LSR ID are available: Local LSR ID, LDP LSR ID, and MPLS LSR ID. In descending order of priority, they are Local LSR ID, LDP LSR ID, and MPLS LSR ID. At least one type of LSR ID must be configured on the device, and the LSR ID in use must be reachable at Layer 3.

Execute the display mpls ldp peer verbose command to identify whether the LSR ID is configured.

<Sysname> display mpls ldp peer verbose

VPN instance: public instance

Peer LDP ID : 100.100.100.20:0

Local LDP ID : 100.100.100.17:0

TCP Connection : 100.100.100.20:47515 -> 100.100.100.17:646

…

If no LSR ID is configured, configure the LSR ID as follows:

¡ Configure the MPLS LSR ID in system view.

Execute the mpls lsr-id command in system view.

¡ Configure the LDP LSR ID in LDP view.

Execute the lsr-id command in LDP view.

If at least one type of LSR ID is configured, proceed to step 3.

3. Check whether the required configuration for the LDP session exists.

For a direct session, execute the display this command in interface view to identify whether the interface has the required LDP session configuration.

¡ If the configuration does not include the mpls enable, mpls ldp enable, mpls ldp ipv6 enable, or mpls ldp transport-address command, configure the missing commands.

¡ If the required configuration exists, proceed to step 4.

For a remote LDP session, execute the display this command in LDP view to identify whether the required LDP session configuration exists.

¡ If the configuration does not include the targeted-peer or mpls ldp transport-address command, configure the missing commands.

¡ If the required configuration exists, proceed to step 4.

4. Check whether the transport address configuration is correct.

For an IPv4 LDP session, execute the display mpls ldp discovery verbose command to identify whether the transport address configuration is correct.

<Sysname> display mpls ldp discovery verbose

VPN instance: public instance

Link Hellos:

Interface GigabitEthernet2/0/2

Local LDP ID : 100.100.100.17:0

Hello Interval : 5000 ms Hello Sent/Rcvd : 83/160

Transport Address: 100.100.100.17

Peer LDP ID : 100.100.100.18:0

Source Address : 202.118.224.18 Transport Address: 100.100.100.18

Hello Hold Time: 15 sec (Local: 15 sec, Peer: 15 sec)

Peer LDP ID : 100.100.100.20:0

Source Address : 202.118.224.20 Transport Address: 100.100.100.20

Hello Hold Time: 15 sec (Local: 15 sec, Peer: 15 sec)

Targeted Hellos:

100.100.100.17 -> 100.100.100.18 (Active, Passive)

Local LDP ID : 100.100.100.17:0

Hello Interval : 15000 ms Hello Sent/Rcvd : 23/20

Transport Address: 100.100.100.17

Session Setup : Config/Tunnel

Peer LDP ID : 100.100.100.18:0

Source Address : 100.100.100.18 Transport Address: 100.100.100.18

Hello Hold Time: 45 sec (Local: 45 sec, Peer: 45 sec)

For an IPv6 LDP session, execute the display mpls ldp discovery ipv6 verbose command to check whether the transport address configuration is correct.

<Sysname> display mpls ldp discovery ipv6 verbose

VPN instance: public instance

Link Hellos:

Interface GigabitEthernet2/0/2

Hello Interval : 5000 ms Hello Sent/Rcvd : 83/160

Transport Address: 2001::2

Peer LDP ID : 100.100.100.18:0

Source Address : FE80:130F:20C0:29FF:FEED:9E60:876A:130B

Transport Address: 2001::1

Hello Hold Time: 15 sec (Local: 15 sec, Peer: 15 sec)

Targeted Hellos:

2001:0000:130F::09C0:876A:130B ->

2005:130F::09C0:876A:130B(Active, Passive)

Hello Interval : 15000 ms Hello Sent/Rcvd : 23/22

Transport Address: 2001:0000:130F::09C0:876A:130B

Peer LDP ID : 100.100.100.18:0

Source Address : 2005:130F::09C0:876A:130B

Destination Address : 2001:0000:130F::09C0:876A:130B

Transport Address : 2005:130F::09C0:876A:130B

Hello Hold Time: 45 sec (Local: 45 sec, Peer: 45 sec)

If the transport address is incorrect, execute the mpls ldp transport-address command to configure the transport address in interface view or LDP peer view. By default, the transport address is the LSR ID of the local LSR.

If the transport address is correct, verify that the route to the peer's transport address is advertised. Execute the display ip routing-table command to identify whether a route to the session endpoint exists.

¡ If the route does not exist, use an IP address that exists on the local device as the transport address so that the route to it can be advertised properly.

¡ If the route exists, proceed to step 5.

5. Check whether the LDP Hello-hold timer has timed out.

Execute the display mpls ldp discovery command every 5 seconds to check the counts of sent and received Hello messages and verify that both ends of the session are exchanging Hello messages normally. If the sent or received count stays the same over several consecutive executions, Hello messages are not being exchanged normally and the Hello-hold timer has timed out.

¡ If the Hello-hold timer has timed out, clear link faults and check the CPU usage of the device. If the CPU usage is too high, disable unnecessary features. If the CPU usage is normal, proceed to step 6.

¡ If the Hello-hold timer does not time out, proceed to step 6.

6. Check whether the LDP Keepalive-hold timer has timed out.

Execute the display mpls ldp peer command every 15 seconds to check the counts of sent and received Keepalive messages and verify that both ends of the session are exchanging Keepalive messages normally. If the counts stay the same over several consecutive executions, Keepalive messages are not being exchanged normally and the Keepalive-hold timer has timed out.

¡ If the Keepalive-hold timer has timed out, resolve any packet forwarding issues.

¡ If the Keepalive-hold timer has not timed out, proceed to step 7.

7. Check whether the security authentication configuration is correct.

Execute the display mpls ldp peer command to check whether security authentication is configured on both ends of the LDP session, and whether the type of security authentication configured is consistent on both ends.

<Sysname> display mpls ldp peer

VPN instance: public instance

Total number of peers: 1

Peer LDP ID State Role GR Auth KA Sent/Rcvd

2.2.2.9:0 Operational Passive Off Keychain 39/39

¡ If the Auth field values differ between the two ends of the LDP session, modify the security authentication configuration so that both ends use the same authentication type.

¡ If the Auth field displays the same value at both ends of the LDP session, proceed to step 8.

8. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
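Steps 5 and 6 rely on comparing Hello and Keepalive counters across repeated command executions. The following Python sketch shows one way to do that comparison on saved outputs. It assumes the counters appear in the verbose form shown in this section, such as "Hello Sent/Rcvd : 83/160" (display mpls ldp discovery verbose) or "KA Sent/Rcvd : 223/223" (display mpls ldp peer verbose). It reads only the first matching counter in each output, and the helper names are hypothetical.

```python
import re
from typing import List, Tuple


def parse_counter(output: str, label: str) -> Tuple[int, int]:
    """Extract (sent, received) from a line such as
    "Hello Sent/Rcvd : 83/160" (label "Hello") or "KA Sent/Rcvd : 223/223"
    (label "KA"). Only the first match in the output is used."""
    match = re.search(re.escape(label) + r"\s+Sent/Rcvd\s*:\s*(\d+)/(\d+)", output)
    if match is None:
        raise ValueError(label + " Sent/Rcvd counter not found")
    return int(match.group(1)), int(match.group(2))


def stalled_directions(samples: List[Tuple[int, int]]) -> List[str]:
    """Given (sent, received) pairs from consecutive executions, list the
    counters that never changed. An empty list means both are advancing."""
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    stalled = []
    if all(sent == samples[0][0] for sent, _ in samples):
        stalled.append("sent")
    if all(rcvd == samples[0][1] for _, rcvd in samples):
        stalled.append("rcvd")
    return stalled
```

For example, stalled_directions([(83, 160), (84, 160), (85, 160)]) returns ["rcvd"]: the local end keeps sending Hello messages but receives none, which matches the timeout condition described in step 5.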

Related alarm and log messages

Alarm messages

Module Name: MPLS-LDP-STD-MIB

mplsLdpSessionDown (1.3.6.1.2.1.10.166.4.0.4)

Log messages

LDP/4/LDP_SESSION_CHG

LDP session flapping

Symptom

The LDP session state flaps frequently.

Common causes

The following are the common causes of this type of issue:

· Interface flapping.

· Route flapping.

· Excessively large TCP packets, which cause TCP retransmissions.

· High CPU usage.

Troubleshooting flow

Figure 76 shows the troubleshooting flowchart.

Figure 76 Flowchart for troubleshooting LDP session flapping

Solution

To resolve the issue:

1. Identify whether the interface is flapping.

Execute the display interface brief command and observe the Physical and Protocol fields. If both fields display Up, the interface is up. Otherwise, the interface is down. If the interface repeatedly switches between the up and down states, the interface is flapping.

¡ If the interface is flapping, resolve the interface issue.

¡ If the interface is not flapping, proceed to step 2.

2. Identify whether the route is flapping.

Execute the display ip routing-table command repeatedly to view the route information. If the route repeatedly appears and disappears from the output, the route is flapping.

¡ If the route is flapping or has never existed, resolve the link and IGP routing issues.

¡ If the route is not flapping, proceed to step 3.

3. Check whether TCP packets are too large.

Execute the display tcp statistics command to view TCP traffic statistics. In the Sent packets section, check the value of the data packets retransmitted field to determine whether TCP packets are too large.

¡ If the number of retransmitted packets keeps increasing, TCP packets are too large. Execute the tcp mss command on the outgoing interface to adjust the TCP MSS value.

¡ If the number of retransmitted packets does not increase, the TCP packet size is normal. Proceed to step 4.

4. Identify whether the CPU usage is too high.

Execute the display cpu-usage command to view the statistical information of CPU usage.

¡ If the CPU usage is too high, disable some unnecessary features to lower the device's CPU usage.

¡ If the CPU usage is normal, proceed to step 5.

5. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
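Steps 1 through 3 all look for a pattern across repeated samples: a state that keeps changing, or a counter that keeps growing. The following minimal Python sketch implements both checks. It assumes the sample values were collected manually from display interface brief, display ip routing-table, or display tcp statistics. The threshold of two state changes is an arbitrary illustrative choice, not a value from this guide.

```python
from typing import Sequence


def count_transitions(states: Sequence[str]) -> int:
    """Count state changes between consecutive samples, for example the
    Physical/Protocol state of an interface, or "present"/"absent" for a route."""
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


def is_flapping(states: Sequence[str], threshold: int = 2) -> bool:
    """Treat the object as flapping if it changed state at least `threshold`
    times within the sampling window (illustrative threshold)."""
    return count_transitions(states) >= threshold


def retransmissions_increasing(counts: Sequence[int]) -> bool:
    """True if the data packets retransmitted counter from display tcp
    statistics grew between every pair of consecutive samples."""
    return len(counts) >= 2 and all(cur > prev for prev, cur in zip(counts, counts[1:]))
```

The flap check counts transitions instead of looking at a single snapshot, because one "down" sample alone cannot distinguish a flapping interface from one that went down once and stayed down.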

Related alarm and log messages

Alarm messages

Module Name: MPLS-LDP-STD-MIB

mplsLdpSessionDown (1.3.6.1.2.1.10.166.4.0.4)

Log messages

LDP/4/LDP_SESSION_CHG

LDP LSP down

Symptom

In the LDP network, an LDP LSP cannot come up.

Common causes

The following are the common causes of this type of issue:

· Route issue.

· LDP session down.

· Insufficient resources, such as label exhaustion or insufficient memory.

· An LSP generation policy, label acceptance policy, label advertisement policy, or label mapping propagation policy is configured and filters the LSP.

· The outgoing interface of the route is not the interface that establishes the LDP session.

Troubleshooting flow

Troubleshoot this type of issue in the following procedure:

1. Identify whether the route exists.

2. Identify whether the LDP session has been established properly.

3. Identify whether an LSP generation, label acceptance, or label advertisement policy has been configured.

4. Identify whether the outgoing interface of the route is the interface used to establish the LDP session.

5. Identify whether resources are insufficient, for example, labels reaching the upper limit or lack of memory.

Figure 77 shows the troubleshooting flowchart.

Figure 77 Flowchart for troubleshooting LDP LSP down

Solution

To resolve the issue:

1. Identify whether the route exists.

Execute the display ip routing-table ip-address mask verbose command to identify whether a route to the LSP destination address exists and is active (the State field displays Active Adv). For a public network BGP route, also identify whether the route carries a label. If the Label field is not NULL, the BGP route carries a label. If the route does not exist, the command does not display any route information.

<Sysname> display ip routing-table 1.1.1.1 32 verbose

Summary count : 1

Destination: 1.1.1.1/32

Protocol: O_INTRA

Process ID: 1

SubProtID: 0x1 Age: 00h00m16s

FlushedAge: 00h00m16s

Cost: 1 Preference: 10

IpPre: N/A QosLocalID: N/A

Tag: 0 State: Active Adv

OrigTblID: 0x0 OrigVrf: default-vrf

…

¡ If the route does not exist, is not active, or (for a BGP route) does not carry a label, resolve the routing issue.

¡ If the route exists, is active, and (for a BGP route) carries a label, proceed to step 2.

2. Identify whether the LDP session has been established properly.

Execute the display mpls ldp peer verbose command to identify whether the LDP session has been established.

<Sysname> display mpls ldp peer verbose

VPN instance: public instance

Peer LDP ID : 1.1.1.1:0

Local LDP ID : 2.2.2.2:0

TCP Connection : 2.2.2.2:14080 -> 1.1.1.1:646

Session State : Operational Session Role : Active

Session Up Time : 0000:00:14 (DD:HH:MM)

…

¡ If the Session State field does not display Operational, the LDP session has not been established. Troubleshoot the issue as described in "LDP session down."

¡ If the Session State field displays Operational, the LDP session has been established and is up. Proceed to step 3.

3. Check whether an LSP generation, label acceptance, or label advertisement policy has been configured.

¡ In LDP view, execute the display this command. If any of the following commands is configured, check whether the IP prefix list it references filters out the specified LSP:

- lsp-trigger prefix-list

- accept-label peer prefix-list

- advertise-label prefix-list

If an IP prefix list filters out the specified LSP, modify the IP prefix list to allow the destination address of the specified LSP to pass. If the IP prefix list does not filter the specified LSP, proceed to step 4.

¡ If none of the previous commands is configured in LDP view, proceed to step 4.

4. Identify whether the outgoing interface of the route is the interface used to establish the LDP session.

Execute the display ip routing-table ip-address mask command to view the outgoing interface of the specified route.

<Sysname> display ip routing-table 1.1.1.1 32

Summary count : 1

Destination/Mask Proto Pre Cost NextHop Interface

1.1.1.1/32 O_INTRA 10 1 10.1.1.1 GE2/0/1

Execute the display mpls ldp peer peer-lsr-id verbose command to view the Discovery Sources information of the specified LDP peer.

<Sysname> display mpls ldp peer 1.1.1.1 verbose

VPN instance: public instance

Peer LDP ID : 1.1.1.1:0

Local LDP ID : 2.2.2.2:0

TCP Connection : 2.2.2.2:14080 -> 1.1.1.1:646

Session State : Operational Session Role : Active

Session Up Time : 0000:12:55 (DD:HH:MM)

Max PDU Length : 4096 bytes (Local: 4096 bytes, Peer: 4096 bytes)

Keepalive Time : 45 sec (Local: 45 sec, Peer: 45 sec)

Keepalive Interval : 15 sec

Msgs Sent/Rcvd : 229/228

KA Sent/Rcvd : 223/223

Label Adv Mode : DU Graceful Restart : Off

Reconnect Time : 0 sec Recovery Time : 0 sec

Loop Detection : Off Path Vector Limit: 0

mLDP P2MP : Off

Discovery Sources:

GigabitEthernet2/0/1

Hello Hold Time: 15 sec Hello Interval : 5000 ms

Addresses received from peer:

10.1.1.1 1.1.1.1

¡ If the Discovery Sources field does not include the outgoing interface of the specified route, check the LDP configuration on that interface and on the connected interface of the downstream device. If the configuration is incorrect, correct it. If it is correct, proceed to step 5.

¡ If the interface information in the Discovery Sources field includes the outgoing interface of the specified route, proceed to step 5.

5. Check for insufficient resources, such as exhausted labels or lack of memory.

¡ Identify whether the system memory is insufficient.

Execute the display memory-threshold command to identify whether the system is running out of memory. If the memory is insufficient, delete unnecessary LSPs.

¡ Check whether the number of labels has reached the upper limit.

Execute the display mpls summary command and check the Idle value (in the Used/Idle/Total column) of the label range whose owners include LDP. If the Idle value is 0, all LDP label resources have been used up. Delete unnecessary LSPs.

<Sysname> display mpls summary

MPLS LSR ID : 2.2.2.2

Egress Label Type: Implicit-null

Entropy Label : Off

Labels:

Range Used/Idle/Total Owner

16-2047 0/2032/2032 StaticPW

Static

StaticCR

Static SR Adj

BSID

2048-599999 9129/588823/597952 LDP

RSVP

BGP

BGP SR EPE

OSPF SR Adj

ISIS SR Adj

¡ If resources are sufficient, proceed to step 6.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
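The label check in step 5 can be automated with a small parser. The following Python sketch assumes the display mpls summary layout shown in step 5: each range row reads Range Used/Idle/Total Owner, and additional owners of the same range follow on their own lines. Stopping at the next section header (a line ending in a colon) is an assumption about output not shown in this guide, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# A range row such as "2048-599999 9129/588823/597952 LDP".
ROW_RE = re.compile(r"^(\d+)-(\d+)\s+(\d+)/(\d+)/(\d+)\s+(.+?)\s*$")

LabelRow = Tuple[Tuple[int, int, int], List[str]]


def parse_label_ranges(output: str) -> List[LabelRow]:
    """Return ((used, idle, total), owners) for each label range."""
    rows: List[LabelRow] = []
    in_labels = False
    for raw in output.splitlines():
        text = raw.strip()
        if text == "Labels:":
            in_labels = True
            continue
        if not in_labels or not text or text.startswith("Range"):
            continue
        if text.endswith(":"):
            break  # Assumed start of the next section.
        match = ROW_RE.match(text)
        if match:
            used, idle, total = map(int, match.group(3, 4, 5))
            rows.append(((used, idle, total), [match.group(6)]))
        elif rows:
            rows[-1][1].append(text)  # Additional owner of the previous range.
    return rows


def ldp_labels_exhausted(output: str) -> bool:
    """True if the label range owned by LDP has no idle labels left."""
    for (_used, idle, _total), owners in parse_label_ranges(output):
        if "LDP" in owners:
            return idle == 0
    raise ValueError("no label range owned by LDP found")
```

For the sample output in step 5, the LDP range shows 588823 idle labels, so ldp_labels_exhausted returns False and label exhaustion is ruled out.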

Related alarm and log messages

Alarm messages

Module Name: MPLS-LSR-STD-MIB

mplsXCDown (1.3.6.1.2.1.10.166.2.0.2)

Log messages

N/A

LDP LSP flapping

Symptom

In the LDP network, an LDP LSP flaps frequently.

Common causes

The following are the common causes of this type of issue:

· Route flapping.

· LDP session flapping.

Troubleshooting flow

Troubleshoot this type of issue in the following procedure:

1. Identify whether the route is flapping.

2. Identify whether the LDP session is flapping.

Figure 78 shows the troubleshooting flowchart.

Figure 78 Flowchart for troubleshooting LDP LSP flapping

Solution

To resolve the issue:

1. Identify whether the route is flapping.

Execute the display ip routing-table command once per second, 5 to 10 times in a row, to check the route to the LSP destination address. If the route repeatedly appears and disappears from the output, the route is flapping.

After checking the route, execute the display mpls ldp fec command to verify that the State field in the Downstream Info section for the downstream peer displays Established.

<Sysname> display mpls ldp fec

VPN instance: public instance

FEC: 1.1.1.1/32

Flags: 0x112

In Label: 2175

Upstream Info:

Peer: 1.1.1.1:0 State: Established

Downstream Info:

Peer: 1.1.1.1:0

Out Label: 3 State: Established

Next Hops: 10.1.1.1 GE2/0/1

RIB Info:

Protocol : OSPF BGP As Num : 0

Label Proto ID : 1 NextHopCount : 1

VN ID : 0x313000003

Tunnel ID : -

¡ If the route is flapping or does not exist, troubleshoot the routing issue.

¡ If the route is not flapping, proceed to step 2.

2. Identify whether the LDP session is flapping.

Execute the display mpls ldp peer command once per second, 5 to 10 times in a row, and check the State field in the output. If the value of this field switches between Operational and other states, the LDP session is flapping.

<Sysname> display mpls ldp peer

VPN instance: public instance

Total number of peers: 1

Peer LDP ID State Role GR AUT KA Sent/Rcvd

1.1.1.1:0 Operational Active Off None 298/298

¡ If the LDP session is flapping, troubleshoot the flapping issue as described in "LDP session flapping."

¡ If the LDP session is not flapping, proceed to step 3.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
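Step 2 asks you to watch the State field across 5 to 10 executions of the display mpls ldp peer command. The following Python sketch compares saved outputs of that command. It assumes each peer row starts with the Peer LDP ID followed by the State value, as in the sample output in step 2, and it treats a peer that disappears from some outputs as having left the Operational state. Both are assumptions, and the helper names are hypothetical.

```python
import re
from typing import Dict, List

# A peer row such as "1.1.1.1:0 Operational Active Off None 298/298".
PEER_ROW_RE = re.compile(r"^(\d+\.\d+\.\d+\.\d+:\d+)\s+(\S+)")


def peer_states(output: str) -> Dict[str, str]:
    """Map each Peer LDP ID in one display mpls ldp peer output to its State."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        match = PEER_ROW_RE.match(line.strip())
        if match:
            states[match.group(1)] = match.group(2)
    return states


def flapping_peers(samples: List[str]) -> List[str]:
    """Return peers that were Operational in some samples but in another
    state, or missing (assumed to mean down), in others."""
    parsed = [peer_states(sample) for sample in samples]
    all_peers = set().union(*parsed)
    flapping = []
    for peer in sorted(all_peers):
        seen = {states.get(peer, "absent") for states in parsed}
        if "Operational" in seen and len(seen) > 1:
            flapping.append(peer)
    return flapping
```

A peer that stays Operational in every sample yields a single-element set and is not reported. Any peer this sketch reports should then be investigated as described in "LDP session flapping."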

Related alarm and log messages

Alarm messages

Module Name: MPLS-LSR-STD-MIB

mplsXCDown (1.3.6.1.2.1.10.166.2.0.2)

Log messages

N/A

Troubleshooting MPLS L2VPN/VPLS

PW ping failure

Symptom

When the ping mpls pw command is executed to test PW connectivity, the remote end of the PW cannot be pinged.

Common causes

The following are the common causes of this type of issue:

· The PW being tested does not exist.

· The PW template configuration is incorrect.

· The PW is down.

· The PW does not have a valid forwarding path on the public network.

Analysis

To troubleshoot this type of issue, execute the ping mpls pw command, and then identify the cause of the issue depending on the received error message.

· If you receive the Unknown PW error message, the PW does not exist. Modify the configuration to make sure the PW can be created correctly.

· If the error message is No suitable control channel for the PW, the VCCV control channel type is misconfigured. Execute the vccv cc command to specify the correct VCCV control channel type in the PW template.

· If the error message is Please configure pseudowire control-word for control channel, execute the control-word enable command to enable the control word feature in the PW template.

· If the error message is Request time out, identify whether the local PW is up, and then execute the tracert mpls pw command to locate the faulty node.
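The mapping from ping mpls pw error messages to next actions in the list above can be expressed as a lookup table. This Python sketch only searches saved ping output for the message strings quoted in this guide. It makes no assumption about the rest of the ping mpls pw output format, and the function name is hypothetical.

```python
from typing import Dict, Optional

# Error messages and next actions as listed in the Analysis section.
ACTIONS: Dict[str, str] = {
    "Unknown PW": "The PW does not exist. Correct the configuration so the PW is created.",
    "No suitable control channel for the PW": (
        "Specify the correct VCCV control channel type with vccv cc in the PW template."
    ),
    "Please configure pseudowire control-word for control channel": (
        "Execute control-word enable in the PW template."
    ),
    "Request time out": (
        "Verify that the local PW is up, then execute tracert mpls pw to locate the faulty node."
    ),
}


def next_action(ping_output: str) -> Optional[str]:
    """Return the action for the first known error message found in the output,
    or None if no known message appears."""
    for message, action in ACTIONS.items():
        if message in ping_output:
            return action
    return None
```

A None result means the output contains none of the error messages covered by this section.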

Figure 79 shows the troubleshooting flowchart.

Figure 79 Flowchart for troubleshooting PW ping failure

Solution

To resolve the issue:

If you receive the Unknown PW message, modify the configuration to ensure that the PW can be correctly created.

If you receive the No suitable control channel for the PW message, execute the vccv cc command to configure the same VCCV control channel type for both ends.

If you receive the Please configure pseudowire control-word for control channel message, execute the control-word enable command to enable the control word feature in the PW template.

If you receive the Request time out message, perform the following steps:

1. Execute the display l2vpn pw command to verify that the PW is up.

<Sysname> display l2vpn pw

Flags: M - main, B - backup, E - ecmp, BY - bypass, H - hub link, S - spoke link

N - no split horizon, A - administration, ABY - ac-bypass

PBY - pw-bypass

Total number of PWs: 2

2 up, 0 blocked, 0 down, 0 defect, 0 idle, 0 duplicate

Xconnect-group Name: ldp

Peer PWID/RmtSite/SrvID In/Out Label Proto Flag Link ID State

192.3.3.3 500 1299/1299 LDP M 0 Up

VSI Name: aaa

Peer PWID/RmtSite/SrvID In/Out Label Proto Flag Link ID State

2.2.2.9 2 1420/1419 BGP M 9 Up

¡ If the PW is down, execute the display l2vpn pw verbose command, check the Down Reasons field for the failure reason, and troubleshoot the issue.

<Sysname> display l2vpn pw verbose

VSI Name: aaa

Peer: 2.2.2.9 Remote Site: 2

Signaling Protocol : BGP

Link ID : 9 PW State : Down

In Label : 1420 Out Label: 1419

MTU : 1500

PW Attributes : Main

VCCV CC : -

VCCV BFD : -

Flow Label : Send

Control Word : Disabled

Tunnel Group ID : 0x800000960000000

Tunnel NHLFE IDs : 1038

Admin PW : -

E-Tree Mode : -

E-Tree Role : root

Root VLAN : -

Leaf VLAN : -

Down Reasons : Control word not match

The following are the common values of the Down Reasons field and how to resolve each:

- BFD session for PW down—The BFD session for PW detection is down. To resolve this issue, execute the display bfd session command to display BFD session information. Check and edit BFD configuration or check the physical link for link failure or link quality issues.

- BGP RD was deleted—The BGP RD has been deleted. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.

- BGP RD was empty—No BGP RD is configured. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.

- Control word not match—The control word configuration on the two ends of the PW is inconsistent. To resolve this issue, execute the control-word enable command to enable the control word feature on both ends.

- Encapsulation not match—The encapsulation types on the two ends of the PW are inconsistent. Execute the pw-type command to configure the same encapsulation type for the two ends.

- LDP interface parameter not match—The LDP negotiation parameters on the two ends of the PW are inconsistent. To resolve this issue, execute the vccv cc command to specify the same VCCV control channel (CC) type. Alternatively, specify the same CEM class for the CEM interfaces on both ends of the PW.

- Non-existent remote LDP PW—The remote device has deleted the LDP PW. To resolve the issue, reconfigure the LDP PW on the remote device.

- Local AC Down—The local AC is down. To resolve the issue, check and edit the configuration on the AC interface, or troubleshoot the interface where the AC is located and make sure the interface is up.

- Local AC was non-existent—The local AC did not exist. To resolve this issue, configure a local AC and associate it with a VSI.

- MTU not match—The MTU is not the same at the two ends of the PW. To resolve the issue, configure the same MTU at both ends of the PW or use the mtu-negotiate disable command to disable MTU negotiation.

- Remote AC Down—The remote AC is down. To resolve the issue, check and edit the configuration on the AC interface of the remote device, or troubleshoot the interface where the AC is located, and make sure the interface is in up state.

¡ If the PW is in up state, go to step 2.

2. Execute the display l2vpn forwarding pw verbose command to verify that the In Label, Out Label, and Tunnel NHLFE IDs related to the tunnel that carries the PW are valid.

<Sysname> display l2vpn forwarding pw verbose

Xconnect-group Name: xcg1

Connection Name: c1

Link ID: 0

PW Type : VLAN PW State : Up

In Label : 110126 Out Label: 130126

MTU : 1500

PW Attributes : Main

VCCV CC : Router-Alert

VCCV BFD : Fault Detection with BFD

Flow Label : -

Tunnel Group ID : 0x800000130000001

Tunnel NHLFE IDs : 3

VSI Name: aaa

Link ID: 8

PW Type : VLAN PW State : Up

In Label : 1272 Out Label: 1275

MTU : 1500

PW Attributes : Main

VCCV CC : -

VCCV BFD : Fault Detection with BFD

Flow Label : -

Tunnel Group ID : 0x960000000

Tunnel NHLFE IDs: 1034

¡ If the values for the incoming and outgoing labels are empty or a hyphen (-), first execute the display l2vpn pw verbose command to check for the protocol that established the PW and then edit the configuration as follows:

- If the protocol is BGP, check and edit BGP configuration.

- If the protocol is LDP, check and edit LDP configuration.

- If the protocol is Static, check and edit static PW configuration.

For more information about the protocols used to establish PWs, see MPLS L2VPN and VPLS in the MPLS Configuration Guide for your device.

¡ If no value is available for the Tunnel NHLFE IDs field, go to step 3.

¡ If the forwarding information for the PW is normal, go to step 4.
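If many PWs need checking, you can apply the decision logic of step 2 to saved command output. The following Python sketch is for illustration only. It assumes the output of the display l2vpn forwarding pw verbose command has been saved as text in the field layout shown in the sample above. The names PwEntry, parse_pw_forwarding, and next_step are hypothetical helpers and are not part of any device software.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PwEntry:
    name: str                        # value of Connection Name or VSI Name
    in_label: Optional[str] = None   # None when the field is empty or "-"
    out_label: Optional[str] = None
    nhlfe_ids: Optional[str] = None  # None when Tunnel NHLFE IDs has no value


def parse_pw_forwarding(text: str) -> List[PwEntry]:
    """Parse saved 'display l2vpn forwarding pw verbose' output."""
    entries: List[PwEntry] = []
    current: Optional[PwEntry] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(("Connection Name:", "VSI Name:")):
            current = PwEntry(name=line.split(":", 1)[1].strip())
            entries.append(current)
        elif current is None:
            continue  # lines before the first PW block
        elif line.startswith("In Label"):
            # Both labels share one line; only numeric values count as valid.
            m_in = re.search(r"In Label\s*:\s*(\d+)", line)
            m_out = re.search(r"Out Label\s*:\s*(\d+)", line)
            current.in_label = m_in.group(1) if m_in else None
            current.out_label = m_out.group(1) if m_out else None
        elif line.startswith("Tunnel NHLFE IDs"):
            value = line.split(":", 1)[1].strip()
            current.nhlfe_ids = value if value not in ("", "-") else None
    return entries


def next_step(entry: PwEntry) -> str:
    """Map one PW entry to the branch of step 2 that applies."""
    if entry.in_label is None or entry.out_label is None:
        return "labels missing: check the protocol that established the PW"
    if entry.nhlfe_ids is None:
        return "no Tunnel NHLFE IDs: go to step 3"
    return "forwarding information normal: go to step 4"
```

For example, parse_pw_forwarding(saved_text) returns one entry for each Connection Name or VSI Name block, and next_step() reports which branch of step 2 applies. If a PW has several links, the sketch keeps the values of its last link. This is a simplification.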

3. Execute the display mpls lsp command to check for the tunnel that carries the PW. This tunnel is an LSP whose FEC is the IP address of the PW peer. If no such LSP exists, establish the tunnel that carries the PW.

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

100.100.100.100/24 LDP -/1049 GE2/0/1
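Step 3 looks for an LSP whose FEC is the PW peer address. The following Python sketch illustrates this check by scanning saved display mpls lsp output for such a row. It assumes the four-column layout shown in the sample and does not interpret the NHLFE or LSINDEX variants of the last column. The name find_lsp_for_peer is hypothetical.

```python
import ipaddress
from typing import Dict, Optional


def find_lsp_for_peer(lsp_output: str, peer_ip: str) -> Optional[Dict[str, str]]:
    """Return the LSP row whose FEC address equals the PW peer address, if any."""
    peer = ipaddress.ip_address(peer_ip)
    for line in lsp_output.splitlines():
        fields = line.split()
        # Data rows look like: FEC  Proto  In/Out-Label  Out-Interface
        if len(fields) < 4 or "/" not in fields[0]:
            continue
        try:
            fec = ipaddress.ip_interface(fields[0])
        except ValueError:
            continue  # header row or a non-IP FEC
        if fec.ip == peer:
            return {"fec": fields[0], "proto": fields[1],
                    "labels": fields[2], "out": fields[3]}
    return None
```

The sketch compares only the address part of the FEC. A FEC written as 100.100.100.100/24, as in the sample, still matches peer 100.100.100.100.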

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Diagnostic information collected by using the display diagnostic-information command.

Related alarm and log messages

Alarm messages

None.

Log messages

· L2VPN/2/L2VPN_PWSTATE_CHANGE

· L2VPN/4/L2VPN_BGPVC_CONFLICT_LOCAL

· L2VPN/4/L2VPN_BGPVC_CONFLICT_REMOTE

· L2VPN/4/L2VPN_HARD_RESOURCE_NOENOUGH

· L2VPN/2/L2VPN_HARD_RESOURCE_RESTORE

· L2VPN/4/L2VPN_LABEL_DUPLICATE

Troubleshooting MPLS L3VPN issues

L3VPN traffic disrupted

Symptom

Private traffic forwarded through the MPLS L3VPN network gets disrupted.

Common causes

The following are the common causes of this type of issue:

· The next hop in the private network route is unreachable.

· Incorrect routing policy configuration prevents the route from being advertised and received.

· Private routes cannot be advertised because of insufficient label resources.

· The private network route does not point to a tunnel.

· Mismatch between export and import RTs prevents the device from learning routes into the private routing table.

· Incoming routes are discarded because the maximum number of routes has been reached.

Troubleshooting flow

Figure 80 shows the troubleshooting flowchart.

Figure 80 Troubleshooting flowchart for L3VPN traffic disruption

Solution

1. Verify that the route is the optimal one.

2. Execute the display bgp routing-table vpnv4 or display bgp routing-table vpnv6 command to verify that the BGP route to the VPNv4 or VPNv6 peer is optimal.

A route is optimal if its entry contains the greater-than sign (>). For example, the route to 100.1.2.0/24 in the following command output is optimal.

<Sysname> display bgp routing-table vpnv4

BGP local router ID is 1.1.1.9

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Total number of VPN routes: 8

Total number of routes from all PEs: 8

Route distinguisher: 100:1(vpn1)

Total number of routes: 6

Network NextHop MED LocPrf PrefVal Path/Ogn

* > 1.1.1.0/24 1.1.1.1 0 32768 ?

* 1.1.1.2/32 1.1.1.1 0 32768 ?

* > 100.1.2.0/24 100.1.1.1 0 100 0 400i

Take action depending on the command output.

¡ If the route is not optimal, use the display mpls lsp command to verify that the MPLS LFIB has an entry for the route of interest. If no LFIB entry exists, enable MPLS and LDP on the public network interface toward the remote PE by executing the mpls enable and mpls ldp enable commands, respectively. This ensures that VPNv4 routes can recurse to public LSPs. If the entry exists, proceed to step 3.

¡ If the route is optimal, proceed to step 3.
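You can automate the optimal-route check in this step for saved display bgp routing-table vpnv4 output. The following Python sketch assumes the column layout of the sample above. Status codes come first. A line that starts with status codes but has no prefix (an additional path) belongs to the preceding prefix. The sketch does not separate routes by route distinguisher. best_routes is a hypothetical helper name.

```python
import ipaddress
from typing import Dict, Optional

# Characters defined in the "Status codes" legend of the sample output.
STATUS_CHARS = set("*>dhsSiea")


def _is_prefix(token: str) -> bool:
    """True if the token looks like an address/mask-length prefix."""
    if "/" not in token:
        return False
    try:
        ipaddress.ip_network(token, strict=False)
        return True
    except ValueError:
        return False


def best_routes(table_text: str) -> Dict[str, bool]:
    """Map each prefix to True if at least one of its paths is marked best (>)."""
    result: Dict[str, bool] = {}
    last_prefix: Optional[str] = None
    for line in table_text.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].startswith("*"):
            continue  # skip legends, headers, and summary lines
        i = 0
        while i < len(tokens) and set(tokens[i]) <= STATUS_CHARS:
            i += 1  # consume status-code tokens such as "*", ">", ">e"
        status = "".join(tokens[:i])
        if i < len(tokens) and _is_prefix(tokens[i]):
            last_prefix = tokens[i]  # a new prefix starts on this line
        if last_prefix is None:
            continue
        result[last_prefix] = result.get(last_prefix, False) or (">" in status)
    return result
```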

3. Verify the connectivity to the next hop in the private route.

Execute the display bgp routing-table vpnv4 ipv4-address [ mask | mask-length ] command on the local PE to check for the private route advertised by the remote PE. For the ipv4-address argument, specify the prefix of the private route.

¡ If the route does not exist, check for CE route advertisement issues. On the remote PE, execute the display bgp routing-table vpnv4 peer advertised-routes or display bgp routing-table vpnv6 peer advertised-routes command to verify that it has advertised private routes correctly to the local PE.

<Sysname> display bgp routing-table vpnv4 peer 22.22.22.22 advertised-routes

Total number of routes: 6

BGP local router ID is 11.11.11.11

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Route distinguisher: 1:1

Total number of routes: 3

Network NextHop MED LocPrf Path/Ogn

* >e 1.1.1.1/32 10.1.1.2 0 100 20i

* >e 7.7.7.7/32 10.1.1.2 0 100 20?

* >e 10.1.1.0/24 10.1.1.2 0 100 20?

If the private route does not exist, proceed to step 4.

¡ If the private route exists, verify that its next hop is reachable and its state is active.

Check the State field. If it contains valid, the route is active. Then, check the Original nexthop field. If it contains a next hop address, the next hop in the route is reachable.

- If the private route is inactive, use the display ip routing-table vpn-instance vpn-instance-name ip-address command to check the IP routing table for a route to the BGP next hop shown in the Original nexthop field.
If no such route exists, the next hop in the private route is unreachable. In this case, check the routing configuration for the public network between the PEs.
If such a route exists, the BGP route next hop is reachable. Proceed to step 4.

- If the private route is active, proceed to step 4.

<sysname> display bgp routing-table vpnv4 6.0.0.9 32

BGP local router ID: 4.0.0.9

Local AS number: 200

Route distinguisher: 103:1

Total number of routes: 1

Paths: 1 available, 1 best

BGP routing table information of 6.0.0.9/32:

From : 3.0.0.9 (3.0.0.9)

Rely nexthop : 20.0.2.1

Original nexthop: 3.0.0.9

OutLabel : 24128

Ext-Community : <RT: 100:1>

RxPathID : 0x0

TxPathID : 0x0

AS-path : 300 103

Origin : igp

Attribute value : pref-val 0

State : valid, external, best

IP precedence : N/A

QoS local ID : N/A

Traffic index : N/A

Tunnel policy : tp1

Rely tunnel IDs : 2
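You can script the two checks in this step against saved output: State contains valid, and Original nexthop has a value. The following minimal Python sketch assumes that each field appears on its own Key : value line, as in the sample above. The function names are hypothetical. A True result reflects only the rule stated in this step. It does not replace checking the routing table for a route to the next hop.

```python
from typing import Dict, Tuple


def parse_fields(text: str) -> Dict[str, str]:
    """Collect 'Key : value' lines from verbose BGP route output."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def route_state(detail_text: str) -> Tuple[bool, str]:
    """Return (active, original_next_hop) according to the rule in this step."""
    fields = parse_fields(detail_text)
    # State lists comma-separated flags such as "valid, external, best".
    states = [s.strip() for s in fields.get("State", "").split(",")]
    active = "valid" in states
    return active, fields.get("Original nexthop", "")
```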

4. Verify that the routing policy is correct.

Execute the display current-configuration configuration bgp command on both the route sender and the route receiver. Check the BGP configuration for import and export routing policies.

<sysname> display current-configuration configuration bgp

bgp 100

peer 1.1.1.1 as-number 100

peer 3.3.3.3 as-number 100

peer 3.3.3.3 connect-interface LoopBack1

address-family vpnv4

peer 3.3.3.3 enable

peer 3.3.3.3 route-policy in import

peer 3.3.3.3 route-policy out export

return

If the devices at both ends have import and export routing policies, check the policies for incorrect settings that filter out the private route.

If the devices do not have import or export routing policies, or if the routing policies do not filter out the private route, proceed to step 5.

5. Verify that the route can recurse to a tunnel.

On the remote PE (the route sender), execute the display bgp routing-table vpnv4 ipv4-address [ mask | mask-length ] command to verify that the VPNv4 route can recurse to the tunnel.

If the Rely tunnel IDs field in the command output contains a tunnel ID, the route can recurse to the tunnel.

¡ If the route cannot recurse to the tunnel, see the troubleshooting procedure for LDP LSP up failures.

¡ If the route recurses to a tunnel, proceed to step 6.

<sysname> display bgp routing-table vpnv4 6.0.0.9 32

BGP local router ID: 4.0.0.9

Local AS number: 200

Route distinguisher: 103:1

Total number of routes: 1

Paths: 1 available, 1 best

BGP routing table information of 6.0.0.9/32:

From : 3.0.0.9 (3.0.0.9)

Rely nexthop : 20.0.2.1

Original nexthop: 3.0.0.9

OutLabel : 24128

Ext-Community : <RT: 100:1>

RxPathID : 0x0

TxPathID : 0x0

AS-path : 300 103

Origin : igp

Attribute value : pref-val 0

State : valid, external, best

IP precedence : N/A

QoS local ID : N/A

Traffic index : N/A

Tunnel policy : tp1

Rely tunnel IDs : 2
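The following minimal Python sketch implements the recursion check. It assumes that the Rely tunnel IDs field appears on its own line, as in the sample output. Splitting several IDs on spaces or commas is an assumption, and rely_tunnel_ids is a hypothetical name. An empty list means the route cannot recurse to a tunnel.

```python
import re
from typing import List


def rely_tunnel_ids(detail_text: str) -> List[str]:
    """Return the IDs in the Rely tunnel IDs field (empty list if absent or empty)."""
    for line in detail_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Rely tunnel IDs":
            # Assumption: several IDs would be separated by spaces or commas.
            return [t for t in re.split(r"[,\s]+", value.strip()) if t]
    return []
```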

6. Check for export RT and import RT mismatches. A mismatch between import and export RTs can prevent routes from being learned into private routing tables.

Execute the display bgp routing-table vpnv4 and display current-configuration configuration vpn-instance commands on the route sender (the local PE) and route receiver (the remote PE). Check for a mismatch between the export RT on the local PE and the import RT on the remote PE for the VPN instance. An RT mismatch can prevent the route from being learned into the remote VPN instance after it is sent to the remote PE.

Execute the display bgp routing-table vpnv4 and display ip extcommunity-list commands on the local PE to verify that the export RT for the VPN instance is not filtered out. If it is filtered out, the PE does not advertise the routes that match the export RT.

¡ If the export and import RTs do not match, execute the vpn-target command to reconfigure their settings for the VPN instance.

¡ If the routing policy filters out the export RT, execute the apply extcommunity rt command in routing policy view to add the export RT to the RT attributes set for the matching routes.

¡ If the export and import RTs match, or if the export RT is not filtered out by the routing policy, proceed to step 7.

Verify that the route carries the correct export RT attributes.

<sysname> display bgp routing-table vpnv4 6.0.0.9 32

BGP local router ID: 4.0.0.9

Local AS number: 200

Route distinguisher: 103:1

Total number of routes: 1

Paths: 1 available, 1 best

BGP routing table information of 6.0.0.9/32:

From : 3.0.0.9 (3.0.0.9)

Rely nexthop : 20.0.2.1

Original nexthop: 3.0.0.9

OutLabel : 24128

Ext-Community : <RT: 100:1>

RxPathID : 0x0

TxPathID : 0x0

AS-path : 300 103

Origin : igp

Attribute value : pref-val 0

State : valid, external, best

IP precedence : N/A

QoS local ID : N/A

Traffic index : N/A

Tunnel policy : tp1

Rely tunnel IDs : 2

Verify that the BGP extended community attribute list is correct.

<sysname> display ip extcommunity-list 1

Extended Community List Number 10

Deny rt: 100:1

Extended Community List Number 20

Permit rt: 200:1

Verify that the local device has correct import RT settings.

<sysname> display current-configuration configuration vpn-instance

ip vpn-instance vpn1

route-distinguisher 1:1

vpn-target 100:1 import-extcommunity

vpn-target 100:1 export-extcommunity
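The RT checks in this step compare three sets:

- The RTs carried by the route.
- The import RTs of the receiving VPN instance.
- The RTs that an extended community list denies.

The following Python sketch extracts those sets from saved output in the formats shown above. The helper names are hypothetical. The sketch treats a vpn-target line that ends in both, or that has no direction keyword, as applying to both directions. This is an assumption to verify against the command reference for your device. Whether a Deny entry actually filters a route depends on how the routing policy references the list, which the sketch does not model.

```python
import re
from typing import Dict, Optional, Set


def route_rts(detail_text: str) -> Set[str]:
    """Extract RT values from an 'Ext-Community : <RT: x:y>' line."""
    for line in detail_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Ext-Community":
            return set(re.findall(r"RT:\s*([\w.:]+)", value))
    return set()


def instance_rts(config_text: str) -> Dict[str, Dict[str, Set[str]]]:
    """Collect import and export RTs per VPN instance from saved configuration."""
    result: Dict[str, Dict[str, Set[str]]] = {}
    current: Optional[Dict[str, Set[str]]] = None
    for line in config_text.splitlines():
        tokens = line.split()
        if tokens[:2] == ["ip", "vpn-instance"] and len(tokens) >= 3:
            current = result.setdefault(tokens[2], {"import": set(), "export": set()})
        elif current is not None and tokens and tokens[0] == "vpn-target":
            keyword = tokens[-1]
            has_keyword = keyword.endswith("extcommunity") or keyword == "both"
            rts = tokens[1:-1] if has_keyword else tokens[1:]
            # Assumption: "both" or no keyword applies the RTs in both directions.
            if keyword != "export-extcommunity":
                current["import"].update(rts)
            if keyword != "import-extcommunity":
                current["export"].update(rts)
    return result


def denied_rts(extcommunity_list_text: str) -> Set[str]:
    """Return the RTs on 'Deny rt:' lines of display ip extcommunity-list output."""
    return set(re.findall(r"Deny\s+rt:\s*([\w.:]+)", extcommunity_list_text))
```

For example, if route_rts(detail) & instance_rts(config)["vpn1"]["import"] is not empty, at least one RT carried by the route matches a local import RT.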

7. Check for insufficient MPLS label resources.

Execute the display mpls interface command on the route sender (the local PE) to verify that MPLS is enabled on the public network interface connected to the remote PE.

¡ If the command output contains the public interface connected to the remote PE, MPLS is enabled on the interface.

¡ If the command output does not contain the public network interface connected to the remote PE, execute the mpls enable command in interface view to enable MPLS on it.

<Sysname> display mpls interface

Interface Status MPLS MTU

GE2/0/1 Up 1500

GE2/0/2 Up 1500

Execute the display bgp routing-table vpnv4 advertise-info command to identify the state of label allocation for advertised routes.

¡ If the Inlabel field in the command output for a route is empty, the system might have failed to allocate a label to the route because of insufficient label resources. To conserve label resources:

- Execute the apply-label per-instance command to enable allocation of one label for the entire VPN instance.

- Use route summarization to reduce the number of routes.

¡ If the Inlabel field in the command output contains a label value, a label has been allocated to the route and label resources are sufficient. Proceed to step 8.

<Sysname> display bgp routing-table vpnv4 10.1.1.0 24 advertise-info

BGP local router ID: 1.1.1.9

Local AS number: 100

Route distinguisher: 100:1

Total number of routes: 1

Paths: 1 best

BGP routing table information of 10.1.1.0/24(TxPathID:0):

Advertised to VPN peers (1 in total):

3.3.3.9

Inlabel : 1279
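You can script the two checks in this step as follows. This Python sketch assumes two things from the samples above. The display mpls interface output has the interface name in the first column. The advertise-info output has an Inlabel : value line. The function names are hypothetical. Write interface names exactly as the command abbreviates them, for example GE2/0/1.

```python
import re
from typing import Optional, Set


def mpls_interfaces(display_mpls_interface: str) -> Set[str]:
    """Return the interface names listed by display mpls interface (header skipped)."""
    names: Set[str] = set()
    for line in display_mpls_interface.splitlines():
        tokens = line.split()
        if tokens and tokens[0] != "Interface":
            names.add(tokens[0])
    return names


def advertised_inlabel(advertise_info: str) -> Optional[int]:
    """Return the Inlabel value from advertise-info output, or None if it is empty."""
    m = re.search(r"Inlabel\s*:\s*(\d+)", advertise_info)
    return int(m.group(1)) if m else None
```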

8. Check for insufficient route entry resources.

Execute the display bgp peer vpnv4 log-info command to view the log for the BGP peer. If the command output contains the Cease/maximum number of VPNv4 prefixes reached message, the number of IPv4 VPN routes has reached the limit.

<Sysname> display bgp peer vpnv4 1.1.1.1 log-info

Peer : 1.1.1.1

Date Time State Notification

Error/SubError

06-Feb-2013 22:54:42 Down Send notification with error 6/1

Cease/maximum number of VPNv4 prefixes reached

In addition, when the number of routes reaches the alarm threshold or exceeds the limit, the device generates log messages similar to the following:

BGP/4/BGP_EXCEED_ROUTE_LIMIT: BGP default.vpn1: The number of routes (101) from peer 1.1.1.1 (IPv4-UNC) exceeds the limit 100.

BGP/4/BGP_REACHED_THRESHOLD: BGP default.vpn1: The ratio of the number of routes (3) received from peer 1.1.1.1 (IPv4-UNC) to the number of allowed routes (2) has reached the threshold (75%).

¡ If the number of routes exceeds the limit, execute the peer route-limit command in VPNv4 address family view or VPNv6 address family view on the route receiver to increase the maximum number of routes that can be received from the peer.

¡ If the number of routes does not exceed the limit, proceed to step 9.
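The following Python sketch can help you interpret the route limit messages. It classifies a received route count against a configured limit and alarm threshold, and it detects the Cease notification in saved log-info output. The comparison rules are assumptions made for illustration, not the documented device algorithm. The sketch treats a count greater than the limit as exceeded, and a count at or above the threshold percentage as threshold reached. Integer arithmetic avoids floating-point rounding at the threshold boundary.

```python
def route_limit_status(received: int, limit: int, threshold_pct: int) -> str:
    """Classify a peer's route count against its limit (assumed semantics)."""
    if received > limit:
        return "exceeded"           # compare with BGP_EXCEED_ROUTE_LIMIT
    if received * 100 >= limit * threshold_pct:
        return "threshold reached"  # compare with BGP_REACHED_THRESHOLD
    return "ok"


def cease_on_prefix_limit(log_info: str) -> bool:
    """True if saved log-info output shows the prefix-limit Cease notification."""
    return "maximum number of VPNv4 prefixes reached" in log_info
```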

9. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

Module name: BGP4-MIB

bgpBackwardTransition (1.3.6.1.2.1.15.7.2)

Log messages

· BGP_EXCEED_ROUTE_LIMIT

· BGP_REACHED_THRESHOLD

L3VPN private route flapping

Symptom

The private routes received from a remote PE flap on the local PE.

Common causes

The following are the common causes of this type of issue:

· Public route flapping.

· LDP LSP flapping.

· Interface flapping.

Troubleshooting flow

Figure 81 shows the troubleshooting flowchart.

Figure 81 Troubleshooting flowchart for L3VPN private route flapping issues

Solution

1. Check for public route flapping issues.

2. Identify the route type.

a. Execute the display ip routing-table command to identify the route type.

In the following command output, for example:
The Proto field displays IS_L1, indicating that the route is an IS-IS route.
The Interface field displays Tun1, indicating that LDP over MPLS TE is deployed.

<Sysname> display ip routing-table 1.1.1.1

Summary count : 1

Destination/Mask Proto Pre Cost NextHop Interface

1.1.1.1/32 IS_L1 15 10 1.1.1.1 Tun1

b. Check for the route flapping issue.

Determine whether the route is flapping based on the route type. For an IS-IS route, for example, execute the display ip routing-table protocol isis command repeatedly to view route information. If the route repeatedly appears and disappears, the route is flapping.

- If the route is flapping, see the troubleshooting procedures for the OSPF neighbor down, OSPFv3 neighbor down, or IS-IS route flapping issue.

- If the route is not flapping, proceed to step 3.

3. Check for the LDP LSP flapping issue.

As a best practice, execute the display mpls ldp peer command once per second, 5 to 10 times in a row, and examine the State field in the command output. If the value changes between Operational and other states, the LDP session is flapping, which causes LDP LSP flapping.

¡ If the LDP LSP is flapping, see the procedure for troubleshooting the LDP LSP flapping issue.

¡ If the LDP LSP is not flapping, proceed to step 4.

<Sysname> display mpls ldp peer

VPN instance: public instance

Total number of peers: 1

Peer LDP ID State Role GR AUT KA Sent/Rcvd

1.1.1.1:0 Operational Active Off None 298/298
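After you save the outputs of 5 to 10 executions, a script can evaluate the sampling procedure in this step. This Python sketch assumes the column layout of the display mpls ldp peer sample, with Peer LDP ID in the first column and State in the second. It reports a peer as flapping if its State differs between captures or if the peer is missing from some captures. The function names are hypothetical.

```python
from typing import Dict, List


def ldp_peer_states(output: str) -> Dict[str, str]:
    """Map Peer LDP ID to State for one display mpls ldp peer capture."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        tokens = line.split()
        # Data rows start with an LDP ID such as 1.1.1.1:0.
        if len(tokens) >= 2 and tokens[0].count(".") == 3 and ":" in tokens[0]:
            states[tokens[0]] = tokens[1]
    return states


def flapping_peers(captures: List[str]) -> List[str]:
    """Peers whose State changed between captures or that vanished from one."""
    snapshots = [ldp_peer_states(c) for c in captures]
    peers = set().union(*snapshots)  # every peer seen in any capture
    flapping = []
    for peer in sorted(peers):
        seen = {snap.get(peer, "absent") for snap in snapshots}
        if len(seen) > 1:
            flapping.append(peer)
    return flapping
```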

4. Check for the interface flapping issue.

Execute the display interface brief command, and then examine the Link and Protocol fields in the command output. If both fields display Up, the interface is up. Otherwise, the interface is down. If the interface state repeatedly alternates between up and down, the interface is flapping.

¡ If the interface is flapping, see the troubleshooting procedure for the interface not up issue.

¡ If the interface is not flapping, proceed to step 5.

<Sysname> display interface gigabitethernet 2/0/1 brief

Brief information on interfaces in route mode:

Link: ADM - administratively down; Stby - standby

Protocol: (s) - spoofing

Interface Link Protocol Primary IP Description

GE2/0/1 UP UP --
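You can also judge interface flapping from repeated display interface brief captures. The following Python sketch assumes that the Interface, Link, and Protocol columns come first, as in the sample. It strips the (s) spoofing mark from the Protocol value. interface_up and state_changes are hypothetical helper names. Several transitions within a short sampling window indicate that the interface is flapping.

```python
from typing import List, Optional


def interface_up(brief_output: str, interface: str) -> Optional[bool]:
    """True if Link and Protocol are both UP; None if the interface row is missing."""
    for line in brief_output.splitlines():
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0] == interface:
            link = tokens[1].upper()
            protocol = tokens[2].upper().replace("(S)", "")  # drop the spoofing mark
            return link == "UP" and protocol == "UP"
    return None


def state_changes(samples: List[bool]) -> int:
    """Count up/down transitions between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)
```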

5. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

VPN route exchange failure between PEs

Symptom

PEs cannot exchange VPNv4 or VPNv6 routes.

Common causes

The following are the common causes of this type of issue:

· Public IGP routes are not advertised.

· No public LSPs are available.

· BGP peer relationships are not established.

· VPNv4 or VPNv6 routes are not learned.

Troubleshooting flow

Figure 82 shows the troubleshooting flowchart.

Figure 82 Troubleshooting flowchart for private route exchange failure between PEs

Solution

1. Verify that an IGP route is available.

Execute the display ip routing-table command to verify that the local PE has a subnet route to the LSR ID (typically, the IP address of a Loopback interface) of the remote PE.

<Sysname> display ip routing-table 1.1.1.1

Summary count : 1

Destination/Mask Proto Pre Cost NextHop Interface

1.1.1.2/32 IS_L1 15 10 1.1.1.1 LoopBack1

¡ If such a route does not exist, make sure an IGP is enabled on the Loopback interface and the public network interface of each PE so that the PEs can correctly advertise these subnet routes to each other.

¡ If such a route exists, proceed to step 2.

2. Verify that a public LSP is available.

Execute the display mpls lsp command to check for a public LSP to the remote PE's Loopback interface.

¡ If such an LSP is not present, enable MPLS and MPLS LDP on the public network interface to ensure the establishment of a public LSP.

¡ If such an LSP exists, proceed to step 3.

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

1.1.1.2/32 LDP -/1049 GE2/0/1

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

1.1.1.1/24 LDP -/1051 GE2/0/1

3. Execute the display mpls ldp peer verbose command to verify that an LDP session has been successfully established.

¡ If the Session State field in the command output displays anything other than Operational, the LDP session has not been established. To resolve the issue, see the procedure that troubleshoots the failure of an LDP session to come up.

¡ If the Session State field in the command output displays Operational, the LDP session has been established. Proceed to step 4.

<Sysname> display mpls ldp peer verbose

VPN instance: public instance

Peer LDP ID : 1.1.1.1:0

Local LDP ID : 2.2.2.2:0

TCP Connection : 2.2.2.2:14080 -> 1.1.1.1:646

Session State : Operational Session Role : Active

Session Up Time : 0000:00:14 (DD:HH:MM)

…

4. Verify that a BGP peer relationship has been established.

Execute the display bgp peer vpnv4 command to view the BGP VPNv4 peer relationships between PEs, and execute the display bgp peer ipv4 vpn-instance command to view the BGP peer relationships between PEs and CEs.

¡ If a BGP peer relationship is not present, or if the State field does not display Established, the BGP peer relationship has not been established. See the procedure that troubleshoots BGP neighbor establishment failures to resolve the issue.

¡ If the State field displays Established, the BGP peer relationship has been established. Proceed to step 5.

<Sysname> display bgp peer vpnv4

BGP local router ID: 192.168.100.1

Local AS number: 100

Total number of peers: 1 Peers in established state: 1

* - Dynamically created peer

Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State

1.1.1.2 200 13 16 0 0 00:10:34 Established

<Sysname> display bgp peer ipv4 vpn-instance vpn1

BGP local router ID: 1.1.1.1

Local AS number: 100

Total number of peers: 1 Peers in established state: 1

* - Dynamically created peer

Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State

10.1.1.1 65410 5 4 0 1 00:01:19 Established
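For PEs with many peers, the following Python sketch lists the peers whose State column is not Established in saved display bgp peer vpnv4 or display bgp peer ipv4 vpn-instance output. It assumes that the data rows follow the header that begins with Peer AS and that State is the last column, as in the samples above. The helper names are hypothetical.

```python
from typing import Dict, List


def bgp_peer_states(output: str) -> Dict[str, str]:
    """Map each peer address to the last (State) column of display bgp peer output."""
    states: Dict[str, str] = {}
    in_table = False
    for line in output.splitlines():
        tokens = line.split()
        if tokens[:2] == ["Peer", "AS"]:
            in_table = True  # data rows follow the column header
            continue
        if in_table and len(tokens) >= 2:
            states[tokens[0]] = tokens[-1]
    return states


def not_established(output: str) -> List[str]:
    """Peers whose State is anything other than Established."""
    return [peer for peer, state in bgp_peer_states(output).items()
            if state != "Established"]
```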

5. Verify that the private route is operating correctly.

Execute the display ip routing-table vpn-instance command to check for private route issues.

¡ If the mask of the private route is not 32 bits and the route was learned by a protocol other than BGP, the IP addresses of the Loopback interfaces on the peer PEs are in the same subnet, and the device prefers the direct route to the private route. To resolve this issue, change the IP address of the Loopback interface on each PE and set its mask to 32 bits.

¡ If the private route has a 32-bit mask and was learned by BGP, the route is correct. Proceed to step 6.

<Sysname> display ip routing-table vpn-instance vpn1

Summary count : 1

Destination/Mask Proto Pre Cost NextHop Interface

1.1.1.0/24 Direct 0 0 1.1.1.1 LoopBack1
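You can express the rule in this step as a small check. This Python sketch looks in saved display ip routing-table vpn-instance output for the longest-prefix route that covers a given remote PE Loopback address. It flags that route if its mask is not 32 bits and its protocol is not BGP. It assumes IPv4 and the column order shown in the sample. check_peer_loopback is a hypothetical name.

```python
import ipaddress
from typing import Optional


def check_peer_loopback(vpn_table: str, loopback_ip: str) -> Optional[str]:
    """Apply the rule in this step to the route that covers a remote PE Loopback address."""
    addr = ipaddress.ip_address(loopback_ip)
    best = None  # (network, prefix text, protocol) of the longest matching prefix
    for line in vpn_table.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or "/" not in tokens[0]:
            continue
        try:
            net = ipaddress.ip_network(tokens[0], strict=False)
        except ValueError:
            continue  # header row such as Destination/Mask
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, tokens[0], tokens[1])
    if best is None:
        return None
    net, prefix, proto = best
    if net.prefixlen != 32 and proto.upper() != "BGP":
        return f"{prefix} ({proto}): Loopback addresses probably share a subnet"
    return f"{prefix} ({proto}): route looks correct"
```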

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Communication failure between VPN instances with matching RTs

Symptom

The network shown in Figure 83 deploys MPLS L3VPN services. On this network, CE 1 and CE 3 belong to VPN 1, and CE 2 belongs to VPN 2. To enable connectivity between VPN 1 and VPN 2, matching RTs were configured on them.

Despite this, CE 2 cannot ping CE 3 at IP address 3.3.3.3 in a different VPN, even though CE 1 can successfully ping CE 3.

Figure 83 Network diagram

Common causes

In this scenario, CE 1 can ping CE 3 in the same VPN, indicating that the public tunnel for label forwarding in the MPLS backbone network functions correctly. The failure is most likely caused by an IP address conflict between interfaces assigned to different VPN instances.

Troubleshooting flow

Figure 84 shows the troubleshooting flowchart.

Figure 84 Flowchart for troubleshooting the communication failure between VPN instances with matching RTs

Solution

1. Check for IP conflict between interfaces on the PE.

Execute the display ip interface brief command on PE 1 to view the IP addresses of interfaces on it.

<Sysname> display ip interface brief

*down: administratively down

(s): spoofing (l): loopback

Interface Physical Protocol IP Address/Mask VPN instance Description

...

GE2/0/1 up up 10.1.1.1/24 vpn1 --

GE2/0/2 up up 10.1.1.1/24 vpn2 --

...
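You can apply the conflict check in this step to saved display ip interface brief output. The following Python sketch reports pairs of interfaces that belong to different VPN instances and have overlapping subnets. It assumes the column order of the sample above, with the IP address/mask in the fourth column and the VPN instance in the fifth. As an additional assumption, it treats -- in the VPN instance column as an interface that is not bound to any VPN instance. vpn_subnet_conflicts is a hypothetical name.

```python
import ipaddress
from itertools import combinations
from typing import List, Tuple


def vpn_subnet_conflicts(brief_output: str) -> List[Tuple[str, str]]:
    """Interface pairs in different VPN instances whose subnets overlap."""
    rows = []
    for line in brief_output.splitlines():
        tokens = line.split()
        # Expected row: Interface Physical Protocol IP/Mask VPN-instance Description
        if len(tokens) < 5 or "/" not in tokens[3] or tokens[4] == "--":
            continue
        try:
            iface = ipaddress.ip_interface(tokens[3])
        except ValueError:
            continue  # header row ("Address/Mask") or no usable address
        rows.append((tokens[0], iface.network, tokens[4]))
    conflicts = []
    for (if_a, net_a, vpn_a), (if_b, net_b, vpn_b) in combinations(rows, 2):
        if vpn_a != vpn_b and net_a.overlaps(net_b):
            conflicts.append((if_a, if_b))
    return conflicts
```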

If the interfaces in different VPN instances are assigned IP addresses from different subnets, proceed to step 2.

If two interfaces in different VPN instances on the PE have the same IP address or IP addresses on the same subnet, reassign an IP address to one of the interfaces so that the two interfaces are on different subnets. Then, change the IP address of the CE interface connected to the readdressed PE interface and reconfigure routing between the PE and the CE.

BGP redistributes routes with matching RTs between the VPN instances. If interfaces in different VPN instances have the same IP address, the BGP routing table contains two routes to the same destination, and BGP selects the better of the two. For example, see the following command output:

<Sysname> display bgp routing-table vpnv4

BGP local router ID is 11.11.11.11

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Total number of VPN routes: 11

Total number of routes from all PEs: 2

Route distinguisher: 1:1(vpn1)

Total number of routes: 6

Network NextHop MED LocPrf PrefVal Path/Ogn

* >e 1.1.1.1/32 10.1.1.2 0 0 20i

* >e 2.2.2.2/32 10.1.1.2 0 0 30i

* >i 3.3.3.3/32 22.22.22.22 0 100 0 40i

* >e 10.1.1.0/24 10.1.1.2 0 0 20?

* >i 30.1.1.0/24 22.22.22.22 0 100 0 40?

Route distinguisher: 2:2(vpn2)

Total number of routes: 5

Network NextHop MED LocPrf PrefVal Path/Ogn

* >e 1.1.1.1/32 10.1.1.2 0 0 20i

* >e 2.2.2.2/32 10.1.1.2 0 0 30i

* >i 3.3.3.3/32 22.22.22.22 0 100 0 40i

* >e 10.1.1.0/24 10.1.1.2 0 0 20?

* e 10.1.1.2 0 0 30?

* >i 30.1.1.0/24 22.22.22.22 0 100 0 40?

In the BGP routing table for VPN 2 (with an RD of 2:2), the optimal route selected based on the AS_PATH attribute for subnet 10.1.1.0 originates from VPN 1. Then, PE 1 will send the traffic intended to be sent from VPN 1 to VPN 2 out of interface GigabitEthernet2/0/1 in VPN 1 instead of interface GigabitEthernet2/0/2 in VPN 2. This will cause an inter-VPN communication failure.

To ensure correct traffic forwarding between two VPN instances, make sure the PE and CE interfaces in one VPN instance are on a different subnet than the PE and CE interfaces in the other VPN instance. For example, if PE 1 and its attached CE establish an EBGP session to exchange routes, use the following procedure to change the IP address of the CE-attached PE interface on PE 1:

a. In system view, execute the interface command to enter the view of the interface associated with the target VPN instance.

b. Execute the ip address command to change the IP address of the target interface.

c. In system view, execute the bgp command to enter BGP instance view.

d. Execute the ip vpn-instance command to enter BGP-VPN instance view.

e. Execute the undo peer command to delete the BGP peer relationships established with the conflicting IP addresses.

f. Execute the peer as-number command to add the CE as an EBGP peer at its new IP address.

g. Execute the address-family ipv4 unicast command to enter BGP IPv4 unicast address family view.

h. Execute the peer enable command to enable BGP to exchange BGP IPv4 unicast routing information with the CE specified as a BGP peer.

The following example reassigns IP addresses only for the interfaces in VPN 2. On CE 2, perform the following steps:

a. In system view, execute the interface command to enter the view of the interface connected to PE 1.

b. Execute the ip address command to change the IP address of the target interface.

c. In system view, execute the bgp command to enter BGP instance view.

d. Execute the undo peer command to delete the BGP peer relationships established with the conflicting IP addresses.

e. Execute the peer as-number command to add the PE as an EBGP peer at its new IP address.

f. Execute the address-family ipv4 unicast command to enter BGP IPv4 unicast address family view.

g. Execute the peer enable command to enable BGP to exchange BGP IPv4 unicast routing information with the PE specified as a BGP peer.

h. Execute the import-route or network command to advertise routing information for the VPN instance.

If the issue persists, proceed to step 2.

2. Collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

PEs unable to learn routes because of VPN route target filtering on the route reflector (RR)

Symptom

The RR does not reflect the MVPN routes, VPNv4 or VPNv6 routes, BGP L2VPN information, VPN Flowspec routes, or EVPN routes announced by one PE to other PEs, as expected.

Common causes

By default, an RR filters MVPN routes, VPNv4 or VPNv6 routes, BGP L2VPN information, VPN Flowspec routes, and EVPN routes based on VPN route targets. The RR adds a route to the routing table only if one of the export RT attributes in the route matches a local import RT. If no match is found, the RR discards the route and does not forward it to remote PEs.
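The filtering rule described above reduces to a set-membership test. The following Python model illustrates only that rule and is not device code. The function and parameter names are hypothetical. For example, consider an RR that has no local import RT matching the PEs' export RTs. Every route fails the test until filtering is disabled.

```python
from typing import Iterable, Set


def rr_keeps_route(route_rts: Iterable[str], local_import_rts: Set[str],
                   vpn_target_filtering: bool = True) -> bool:
    """Model of RT-based filtering on an RR as described in this section."""
    if not vpn_target_filtering:  # corresponds to 'undo policy vpn-target'
        return True
    # Keep the route only if one of its export RTs matches a local import RT.
    return any(rt in local_import_rts for rt in route_rts)
```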

Troubleshooting flow

To resolve this issue, disable VPN route target filtering on the RR. Figure 85 shows the troubleshooting flowchart.

Figure 85 Flowchart for troubleshooting failure of PEs to learn routes due to VPN route target filtering on the RR

Solution

1. Check the configuration for the affected address family and make sure VPN route target filtering has been disabled by using the undo policy vpn-target command.

a. In BGP instance view, execute the display this command to check the configuration in each address family view for the undo policy vpn-target command. If the command does not exist, proceed to step b. If the command exists, proceed to step 2.

b. Enter the view of the affected address family and execute the undo policy vpn-target command to disable VPN route target filtering, allowing the RR to forward routes with mismatching RTs. If the issue persists, proceed to step 2.

2. Collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

A private IP routing table on a PE does not contain routes announced by a remote PE

Symptom

In an IPv4 or IPv6 MPLS L3VPN network, communication failure occurs between CEs because the IP routing table for a VPN instance on a PE lacks private routes to the site attached to a remote PE.

Common causes

The following are the common causes of this type of issue:

· The BGP session with the remote PE is not in Established state.

· The remote PE has not advertised private routes.

· A public tunnel has not been established.

· The local PE discards the private routes sent by the remote PE.

· The private routes advertised by the remote PE are in the local BGP routing table. However, they are not added to the IP routing table for the VPN instance.

Troubleshooting flow

Figure 86 shows the troubleshooting flowchart.

Figure 86 Troubleshooting flowchart for missing routes from remote PEs in a PE's VPN IP routing table

Solution

1. Verify that a BGP peer relationship has been established.

Execute the display bgp peer vpnv4 or display bgp peer vpnv6 command to verify that the local and remote PEs have established a BGP session in Established state.

<Sysname> display bgp peer vpnv4

BGP local router ID: 11.11.11.11

Local AS number: 10

Total number of peers: 1 Peers in established state: 1

* - Dynamically created peer

Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State

22.22.22.22 10 82 69 0 2 01:01:28 Established

¡ If the PEs have established a BGP peer relationship, proceed to step 2.

¡ If the PEs have not established a BGP peer relationship, see the BGP session establishment failure troubleshooting procedure in the part for troubleshooting IP routing issues. If the issue persists after the BGP session changes to the Established state, proceed to step 2.

2. Verify that the remote PE has advertised private routes to the local PE.

On the remote PE, execute the display bgp routing-table vpnv4 peer advertised-routes or display bgp routing-table vpnv6 peer advertised-routes command to verify that it has advertised private routes to the local PE.

<Sysname> display bgp routing-table vpnv4 peer 22.22.22.22 advertised-routes

Total number of routes: 6

BGP local router ID is 11.11.11.11

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Route distinguisher: 1:1

Total number of routes: 3

Network NextHop MED LocPrf Path/Ogn

* >e 1.1.1.1/32 10.1.1.2 0 100 20i

* >e 7.7.7.7/32 10.1.1.2 0 100 20?

* >e 10.1.1.0/24 10.1.1.2 0 100 20?

Route distinguisher: 2:2

Total number of routes: 3

Network NextHop MED LocPrf Path/Ogn

* >e 2.2.2.2/32 10.1.1.2 0 100 30i

* >e 7.7.7.7/32 10.1.1.2 0 100 30?

* >e 10.1.1.0/24 10.1.1.2 0 100 30?

If the information of interest exists, proceed to step 3. If the information of interest does not exist, proceed with the following checks:

a. Execute the display bgp routing-table vpnv4 or display bgp routing-table vpnv6 command on the remote PE to check for the private routes of interest.

- If the information of interest exists, proceed to step b.

- If the information of interest does not exist, see the IP routing troubleshooting part to check the routing configuration between the PE and the CE. PEs and CEs can exchange route information through many routing protocols, including static routing, RIP, OSPF, OSPFv3, IS-IS, and BGP. Use the troubleshooting procedure for the routing protocol running between the PE and the CE. If the issue persists after the private routes have been injected into the BGP routing table on the remote PE, proceed to step b.

b. Execute the display this command in BGP VPNv4 or BGP VPNv6 address family view on the remote PE. Check for route filtering misconfigurations that might prevent the private routes from being advertised. The following commands perform route filtering:

- peer prefix-list export

- peer filter-policy export

- peer as-path-acl export

- filter-policy export

- peer route-policy export

If a route export filtering command incorrectly filters out private routes that should be advertised, execute the undo form of that command. To avoid unexpected impacts on network services, adjust the private route export filtering policy under the guidance of Technical Support.

If the issue persists, proceed to step 3.

3. Verify that a public tunnel has been established.

The public tunnel for MPLS L3VPN can be an LSP, MPLS TE, or GRE tunnel. For an LSP or MPLS TE tunnel, the outer encapsulation is an MPLS label. For a GRE public tunnel, the outer encapsulation is GRE.

A public tunnel is typically a label forwarding path automatically established by LDP. The following information uses this type of public tunnel as an example to describe the troubleshooting procedure for public tunnel establishment. For tunnels established by other methods, see their respective troubleshooting procedures or contact Technical Support.

Execute the display mpls ldp peer command on each device in the private route advertisement path in the backbone network. Verify that they have established sessions with their LDP peers.

<Sysname> display mpls ldp peer

VPN instance: public instance

Total number of peers: 2

Peer LDP ID State Role GR AUT KA Sent/Rcvd

22.22.22.22:0 Operational Passive Off None 1816/1816

11.11.11.11:0 Operational Passive Off None 1816/1816

If the sessions have been successfully established, proceed to step 4.

If an LDP session is not established, see the LDP session down troubleshooting procedure in the MPLS troubleshooting part.

If the issue persists after the public tunnel is established, proceed to step 4.

4. Check the BGP routing table on the local PE for private routes advertised by the remote PE.

Execute the display bgp routing-table vpnv4 or display bgp routing-table vpnv6 command on the local PE to check for private routes advertised by the remote PE.

If the information of interest does not exist, perform the following operations:

a. Execute the display ip vpn-instance instance-name command on both the local and remote PEs to check for import and export RT mismatches for the VPN.

<Sysname> display ip vpn-instance instance-name vpn1

VPN-Instance Name and Index : vpn1, 1

Route Distinguisher : 1:1

Interfaces : GigabitEthernet2/0/1

TTL mode: pipe

Address-family IPv4:

Export VPN Targets :

1:1

Import VPN Targets :

1:1

- If an import and export RT mismatch exists, execute the vpn-target command in VPN instance view to change the RT settings on the local or remote PE. If the BGP routing table on the local PE still lacks the private routes advertised by the remote PE, proceed to step b. If the BGP routing table on the local PE now contains these routes but the issue persists, proceed to step 5.

- If the import and export RTs match, proceed to step b.

b. Execute the display this command in BGP instance view. Check for import route filtering misconfigurations that might prevent the private routes from being imported. The following commands perform route filtering:

- peer prefix-list import

- peer filter-policy import

- peer as-path-acl import

- filter-policy import

- peer route-policy import

If a route import filtering command incorrectly filters out received private routes, execute the undo form of that command. To avoid unexpected impacts on network services, adjust the private route import filtering policy under the guidance of Technical Support.

If the issue persists, proceed to step 5.

5. Identify the reason that prevents the BGP routes from being added to the IP routing table for the VPN instance. Possible reasons include the following:

¡ The device is configured with the undo policy vpn-target command. This command enables the device to add VPNv4 or VPNv6 routes to the BGP routing table for the VPN instance and select them as optimal routes, even if they do not match the VPN instance's RT attributes. However, these routes cannot be added to the IP routing table for the VPN instance. To resolve this issue, execute the display this command in BGP instance view to identify the address families configured with the undo policy vpn-target command. For each such address family, execute the policy vpn-target command in the view of that address family.

¡ The device is configured with the routing-table bgp-rib-only command, which prevents BGP routes from being injected into the IP routing table. To resolve this issue, execute the display this command in BGP instance view to identify the address families configured with the routing-table bgp-rib-only command. If an address family is configured with that command, execute the undo routing-table bgp-rib-only command to resolve the issue.

If the issue persists, proceed to step 6.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
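The RT comparison in step 4a follows the usual MPLS L3VPN rule. A route received from the remote PE is accepted into a local VPN instance only when at least one of the route's export RTs matches one of the instance's import RTs. The following Python sketch illustrates that check. The RT values are hypothetical and use the format shown by display ip vpn-instance. The functions illustrate the matching rule only and are not the device's implementation.

```python
def rt_match(route_export_rts, local_import_rts):
    """Return the RTs that appear in both the remote export list and the local import list.

    A non-empty result means routes carrying these export RTs can be accepted
    into the local VPN instance. An empty result indicates an RT mismatch.
    """
    return set(route_export_rts) & set(local_import_rts)


def check_vpn_targets(remote_export_rts, local_import_rts):
    """Print a verdict for one pair of RT lists and return True on a match."""
    shared = rt_match(remote_export_rts, local_import_rts)
    if shared:
        print("RT match:", ", ".join(sorted(shared)))
    else:
        print("RT mismatch: no remote export RT is a local import RT")
    return bool(shared)


if __name__ == "__main__":
    # Hypothetical RT lists, written as in display ip vpn-instance output.
    check_vpn_targets(["1:1"], ["1:1"])         # prints: RT match: 1:1
    check_vpn_targets(["1:1"], ["2:2", "3:3"])  # prints the RT mismatch message
```

For two-way traffic, run the check in both directions: remote export against local import, and local export against remote import.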

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Failure to forward large packets between sites

Symptom

On an IPv4 or IPv6 MPLS L3VPN network that contains devices from both H3C and other vendors, inter-site access to resources in the same VPN might fail. For example, users at one site cannot access certain websites or download files through FTP from another site. Ping tests fail when the ICMP payload exceeds 1464 bytes and succeed when the payload is 1464 bytes or less.

Common causes

This type of failure typically occurs when a small MTU is set on one or more network interfaces in the traffic forwarding path.

Troubleshooting flow

Figure 87 shows the troubleshooting flowchart.

Figure 87 Troubleshooting flowchart for failures to forward large packets between sites

Procedure

1. Set the MTU on each network interface in the traffic forwarding path to 1508 bytes or higher.

¡ On an H3C device, execute the display interface command to view the MTUs of interfaces.

<Sysname> display interface gigabitethernet 2/0/1

GigabitEthernet2/0/1

Current state: Administratively UP

Line protocol state: UP

Description: GigabitEthernet2/0/1 Interface

Bandwidth: 1000000 kbps

Maximum transmission unit: 1500

...

To change the MTU of an interface, execute the ip mtu or ipv6 mtu command in interface view.

¡ For information about the commands used on a device from a third-party vendor, see the documentation for that device.

If the issue persists, proceed to step 2.

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Failure to ping the subnet attached to a remote CE from a PE

Symptom

On the IPv6 MPLS L3VPN network shown in Figure 88, multiple interfaces on PE 1 are assigned to VPN instance VPN 1. Executing the ping ipv6 2001:db8:3::1 command on CE 1 or CE 2 successfully pings the subnet attached to remote CE 3. However, executing the ping ipv6 -vpn-instance vpn1 2001:db8:3::1 command on PE 1 fails to ping the subnet attached to CE 3.

Figure 88 Network diagram

Common causes

This issue typically occurs when CE 3 lacks routes to some private IPv6 addresses on PE 1. For the ping to succeed, CE 3 must have routes to the IPv6 addresses of all up interfaces on PE 1 that belong to the same VPN as CE 3.
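This requirement can be checked offline. Compare the IPv6 addresses of the up VPN interfaces on PE 1 against the destination prefixes in CE 3's IPv6 routing table. The following Python sketch does this with the standard ipaddress module. All addresses and prefixes are hypothetical examples, not values from Figure 88.

```python
import ipaddress


def uncovered_sources(pe_addresses, ce_prefixes):
    """Return the PE addresses that no prefix in the CE routing table covers.

    pe_addresses: IPv6 addresses of the up interfaces on PE 1 in the VPN instance.
    ce_prefixes:  destination prefixes present in CE 3's IPv6 routing table.
    """
    networks = [ipaddress.ip_network(p) for p in ce_prefixes]
    missing = []
    for addr in pe_addresses:
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in networks):
            missing.append(addr)
    return missing


if __name__ == "__main__":
    # Hypothetical values: PE 1 has two VPN interfaces, CE 3 learned only one of their subnets.
    pe1 = ["2001:db8:1::2", "2001:db8:2::2"]
    ce3_routes = ["2001:db8:1::/64", "2001:db8:3::/64"]
    print(uncovered_sources(pe1, ce3_routes))  # ['2001:db8:2::2']
```

Any address this check returns is a source address for which CE 3 cannot send echo replies.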

Troubleshooting flow

Figure 89 shows the troubleshooting flowchart.

Figure 89 Troubleshooting flowchart for failures to ping the subnet attached to a remote CE from a PE

Procedure

1. Make sure CE 3 has routes to all private IPv6 addresses on PE 1.

When you ping a remote CE-attached subnet from PE 1 without specifying a source address, PE 1 sends ICMPv6 requests with a source address automatically selected from the IPv6 addresses on the packet outgoing interface. If CE 3 lacks a route to this IPv6 address, it cannot return ICMPv6 echo replies.

To resolve this issue:

¡ Configure PE 1 to advertise all its private routes. For example, execute the import-route direct command in BGP-VPN IPv6 unicast address family view.

¡ Execute the ping ipv6 -a source-ipv6 -vpn-instance vpn-instance-name host command to ping with a specified source IPv6 address. Make sure the IPv6 routing table on CE 3 contains a route to this address.

If the issue persists, proceed to step 2.

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

MPLS TE issues

MPLS TE tunnel down

Symptom

After an MPLS TE tunnel is created, the display interface tunnel command shows that the tunnel's current state is DOWN.

<Sysname> display interface tunnel 1

Tunnel1

Current state: DOWN

Line protocol state: DOWN

Description: Tunnel1 Interface

Bandwidth: 64kbps

Maximum transmission unit: 1496

Internet address: 7.1.1.1/24 (primary)

Tunnel source unknown, destination 4.4.4.9

Tunnel TTL 255

Tunnel protocol/transport CR_LSP

Last clearing of counters: Never

Last 300 seconds input rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec

Last 300 seconds output rate: 6 bytes/sec, 48 bits/sec, 0 packets/sec

Input: 0 packets, 0 bytes, 0 drops

Output: 177 packets, 11428 bytes, 0 drops

Common causes

The following are the common causes of this type of issue:

· The link where the MPLS TE tunnel is located is down.

· The configuration for the MPLS TE tunnel is incorrect.

· The destination address of the MPLS TE tunnel is referenced by a static route.

Analysis

Figure 90 shows the troubleshooting flowchart.

Figure 90 Flowchart for troubleshooting MPLS TE tunnel down

Solution

To resolve the issue:

1. Verify that the MPLS TE tunnel’s output interface on the device is in up state.

Execute the display interface command to view the state of the output interface for the MPLS TE tunnel. Make sure the output interface is in up state.

2. Verify that the MPLS TE configuration is correct.

Check the following settings in sequence:

a. Make sure the mpls te enable command is configured in the OSPF/IS-IS area and on the interfaces that the MPLS TE tunnel passes through.

b. Make sure the LSR ID and Router ID are the same Loopback interface address.

c. If the MPLS TE tunnel is established using RSVP-TE, make sure the device and interfaces are configured with the rsvp and rsvp enable commands.

d. If the mpls te bandwidth command is configured on the tunnel interface, make sure the device's output interface is configured with the mpls te max-link-bandwidth and mpls te max-reservable-bandwidth commands.

e. If the mpls te affinity-attribute command is configured on the tunnel interface, make sure the mpls te link-attribute command is configured properly on the output interface. To ensure a link can be used by a tunnel, the following requirements must be met:

- The link attribute bits corresponding to the 1 bits in the affinity mask are checked as follows: The link attribute bits corresponding to the 1 bits of the affinity attribute must have at least one bit set to 1. The link attribute bits corresponding to the 0 bits of the affinity attribute must have no bit set to 1.

- The link attribute bits corresponding to the 0 bits in the affinity mask are not checked.

f. If the MPLS TE tunnel is established using Segment Routing, make sure segment routing related settings are configured in the IGP area of the device.

g. If the MPLS TE tunnel is established by using an explicit path specified with the mpls te path command, verify that the explicit path configuration is appropriate. In strict mode, you must specify the IP address of the incoming interface hop by hop. In loose mode, you must specify the node address of each device to be passed through.

3. Verify that the destination address of the MPLS TE tunnel is not referenced by a static route.

Execute the display current-configuration | include destination command to check whether the destination address of the MPLS TE tunnel is referenced by a static route. If it is, modify the static route or change the destination address of the tunnel as required by the network.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file of the device.

¡ Diagnostic information collected using the display diagnostic-information command.
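The affinity rule in step 2e can be expressed as bit operations. The following Python sketch makes two assumptions:

- The affinity attribute, mask, and link attribute are 32-bit values.
- The "at least one bit set to 1" requirement is treated as satisfied when the affinity has no 1 bits inside the mask.

It illustrates the rule as described above and is not the device's implementation. The sample values are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def link_usable(link_attr, affinity, mask):
    """Apply the affinity/link attribute rule to one link (32-bit values assumed)."""
    # Checked bits where the affinity attribute is 1: at least one must be set on the link.
    need_one = affinity & mask & MASK32
    # Checked bits where the affinity attribute is 0: none may be set on the link.
    need_zero = ~affinity & mask & MASK32
    if link_attr & need_zero:
        return False
    # Assumption: if no checked affinity bit is 1, the "at least one" test is skipped.
    if need_one and not (link_attr & need_one):
        return False
    # Bits where the mask is 0 are never examined.
    return True


if __name__ == "__main__":
    affinity, mask = 0xFFFFFFF0, 0x0000FFFF
    print(link_usable(0x00000010, affinity, mask))  # True
    print(link_usable(0x00000011, affinity, mask))  # False: a checked 0-affinity bit is set
    print(link_usable(0xFFFF0000, affinity, mask))  # False: no required bit is set
```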

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

MPLS TE tunnel state changing from up to down

Symptom

The state of an MPLS TE tunnel changes from up to down.

Common causes

The following are the common causes of this type of issue:

· The link where the MPLS TE tunnel is located is down.

· The configuration of the MPLS TE tunnel has been deleted or incorrectly configured.

· RSVP message timeouts or errors have occurred.

· The physical link does not meet the required bandwidth for the MPLS TE tunnel.

· The BFD session is down on the MPLS TE tunnel interface or the physical interface where the tunnel is located.

Analysis

Figure 91 shows the troubleshooting flowchart.

Figure 91 Flowchart for troubleshooting MPLS TE tunnel state changing from up to down

Solution

To resolve the issue:

1. Verify that the MPLS TE tunnel's output interface is in up state.

Execute the display interface command to view the state of the output interface for the MPLS TE tunnel. Make sure the output interface is in up state.

2. Verify that the MPLS TE configuration is correct.

Check the following settings in sequence:

a. Verify that the mpls te enable command is configured in the OSPF/IS-IS area and on the interfaces that the MPLS TE tunnel passes through.

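b. Verify that the LSR ID and Router ID are the same Loopback interface address.

c. If the MPLS TE tunnel is established using RSVP-TE, make sure the device and interfaces are configured with the rsvp and rsvp enable commands.

d. If the mpls te bandwidth command is configured on the tunnel interface, make sure the device's output interface is configured with the mpls te max-link-bandwidth and mpls te max-reservable-bandwidth commands.

e. If the mpls te affinity-attribute command is configured on the tunnel interface, make sure the mpls te link-attribute command is configured properly on the output interface. To ensure a link can be used by a tunnel, the following requirements must be met:

- The link attribute bits corresponding to the 1 bits in the affinity mask are checked as follows: The link attribute bits corresponding to the 1 bits of the affinity attribute must have at least one bit set to 1. The link attribute bits corresponding to the 0 bits of the affinity attribute must have no bit set to 1.

- The link attribute bits corresponding to the 0 bits in the affinity mask are not checked.

f. If the MPLS TE tunnel is established using Segment Routing, make sure segment routing related settings are configured in the IGP area of the device.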

3. Verify that no RSVP message timeouts or errors exist.

Use the display rsvp statistics command to identify RSVP message timeouts or errors. A timeout is indicated when the number of Path messages sent does not match the number of Resv messages received, or when the number of Path messages received does not match the number of Resv messages sent. An error is indicated when PathError or ResvError messages are received. If RSVP message timeouts or errors are found, capture the error information carried in the PathError or ResvError messages, and then resolve the issue based on the error codes defined in RFC 2205 and RFC 3209.

<Sysname> display rsvp statistics

P2P statistics:

Object Added Deleted

PSB 3 1

RSB 3 1

LSP 3 1

P2MP statistics:

Object Added Deleted

PSB 0 0

RSB 0 0

LSP 0 0

Packet Received Sent

Path 5 5

Resv 5 5

PathError 0 0

ResvError 0 0

PathTear 0 0

ResvTear 0 0

ResvConf 0 0

Bundle 0 0

Ack 0 0

Srefresh 0 0

Hello 0 0

Challenge 0 0

Response 0 0

Error 0 0

4. Verify that the physical link meets the bandwidth required for the MPLS TE tunnel.

When an MPLS TE tunnel with a higher priority is established on the device, it might preempt the bandwidth of an MPLS TE tunnel with a lower priority, causing the lower-priority tunnel to go down. Use the display mpls te link-management bandwidth-allocation command to check the remaining available bandwidth for each priority on the link. Make sure the remaining available bandwidth is greater than the bandwidth required by the tunnel at that priority. If the link cannot meet the bandwidth requirements of the MPLS TE tunnel, modify the configuration, adjust the tunnel path, or increase the link bandwidth.

5. Verify that the BFD session for the MPLS TE tunnel interface or the tunnel's physical interface is not down.

Use the display mpls bfd te tunnel tunnel-number command to view the BFD state of the MPLS TE tunnel. If the BFD state is down, use the display bfd session command to identify why the BFD session is down. Examine and correct the BFD configuration, and clear any link faults or link quality issues on the physical links.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
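The counter comparisons in step 3 can be scripted against captured display rsvp statistics output. The following Python sketch parses the "Packet Received Sent" table rows. The row layout is assumed from the sample output in step 3. The mismatch rules mirror the ones described in that step. Treat the result only as a pointer for further checks, and confirm the cause from the PathError or ResvError contents.

```python
def parse_packet_counters(text):
    """Return {message_type: (received, sent)} from the 'Packet Received Sent' table."""
    counters = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        if fields[:3] == ["Packet", "Received", "Sent"]:
            in_table = True
            continue
        if in_table and len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            counters[fields[0]] = (int(fields[1]), int(fields[2]))
    return counters


def rsvp_findings(counters):
    """Flag possible timeouts and errors using the rules from step 3."""
    findings = []
    path_rcvd, path_sent = counters.get("Path", (0, 0))
    resv_rcvd, resv_sent = counters.get("Resv", (0, 0))
    if path_sent != resv_rcvd:
        findings.append("Path sent != Resv received")
    if path_rcvd != resv_sent:
        findings.append("Path received != Resv sent")
    for err in ("PathError", "ResvError"):
        if counters.get(err, (0, 0))[0] > 0:
            findings.append(err + " messages received")
    return findings


# Excerpt of the sample output shown in step 3.
RSVP_SAMPLE = """Packet Received Sent
Path 5 5
Resv 5 5
PathError 0 0
ResvError 0 0
"""

if __name__ == "__main__":
    print(rsvp_findings(parse_packet_counters(RSVP_SAMPLE)))  # []
```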

Related alarm and log messages

Alarm messages

Module Name: MPLS-TE-STD-MIB

· mplsTunnelUp (1.3.6.1.2.1.10.166.3.0.1)

· mplsTunnelDown (1.3.6.1.2.1.10.166.3.0.2)

Log messages

· IFNET/5/LINK_UPDOWN

· IFNET/3/PHY_UPDOWN

Loop in an MPLS TE tunnel

Symptom

A loop exists in the forwarding path of the MPLS TE tunnel, preventing traffic from being forwarded to the destination address through the MPLS TE tunnel.

Common causes

The same IP address exists on different devices that the MPLS TE tunnel passes through.
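When configurations from all devices on the tunnel path are available, duplicate addresses can be found by mapping each address to the devices that own it. The following Python sketch assumes the addresses have already been extracted per device, for example from each device's configuration file. The device names and addresses are hypothetical.

```python
from collections import defaultdict


def duplicate_addresses(device_addresses):
    """Return {address: [devices]} for addresses configured on more than one device."""
    owners = defaultdict(set)
    for device, addresses in device_addresses.items():
        for addr in addresses:
            owners[addr].add(device)
    return {addr: sorted(devs) for addr, devs in owners.items() if len(devs) > 1}


if __name__ == "__main__":
    # Hypothetical per-device address lists for the devices on the tunnel path.
    path = {
        "PE1": ["1.1.1.9", "10.1.1.1"],
        "P": ["2.2.2.9", "10.1.1.2", "10.2.1.1"],
        "PE2": ["3.3.3.9", "10.2.1.1"],
    }
    print(duplicate_addresses(path))  # {'10.2.1.1': ['P', 'PE2']}
```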

Solution

To resolve the issue:

1. Identify whether the same IP address has been configured on different devices that the MPLS TE tunnel passes through. If yes, change the IP addresses to ensure that no identical IP addresses exist on the different devices that the MPLS TE tunnel travels through.

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file of the device.

¡ Diagnostic information collected using the display diagnostic-information command.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Tunnel path calculation failure

Symptom

The calculation of the MPLS TE tunnel path failed, causing the tunnel to be down.

Common causes

The following are the common causes of this type of issue:

· No IGP neighbors have been established.

· No MPLS TEDB information exists.

· The configuration for the MPLS TE tunnel is incorrect.

Analysis

Figure 92 shows the troubleshooting flowchart.

Figure 92 Flowchart for troubleshooting tunnel path calculation failure

Solution

To resolve the issue:

1. Verify that an IGP neighbor has been established.

Execute the display ospf peer or display isis peer command to identify whether an IGP neighbor has been established.

¡ If an IGP neighbor has been established, proceed to step 2.

¡ If no IGP neighbor has been established, complete the OSPF or IS-IS configuration to establish an IGP neighbor. For more information about OSPF and IS-IS, see OSPF configuration and IS-IS configuration respectively in the Layer 3—IP Routing Configuration Guide of the device.

2. Execute the display mpls te tedb command to view MPLS TEDB information.

If MPLS TEDB information exists, proceed to step 3.

If MPLS TEDB information does not exist, check the following configurations in order:

a. Verify that the mpls enable and mpls te enable commands are configured in the OSPF/IS-IS area and on the interfaces that the MPLS TE tunnel passes through.

b. Verify that the LSR ID and Router ID are the same Loopback interface address.

3. Verify that the MPLS TE configuration is correct.

a. If the MPLS TE tunnel is established using RSVP-TE, make sure the device and interfaces are configured with the rsvp and rsvp enable commands.

b. If the MPLS TE tunnel is established using Segment Routing, make sure the segment-routing mpls command is configured in the IGP area of the device.

c. If the mpls te bandwidth command is configured on the tunnel interface, make sure the device's output interface is configured with the mpls te max-link-bandwidth and mpls te max-reservable-bandwidth commands.

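d. If the mpls te affinity-attribute command is configured on the tunnel interface, make sure the mpls te link-attribute command is configured properly on the output interface. To ensure a link can be used by a tunnel, the following requirements must be met:

- The link attribute bits corresponding to the 1 bits in the affinity mask are checked as follows: The link attribute bits corresponding to the 1 bits of the affinity attribute must have at least one bit set to 1. The link attribute bits corresponding to the 0 bits of the affinity attribute must have no bit set to 1.

- The link attribute bits corresponding to the 0 bits in the affinity mask are not checked.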

e. If the MPLS TE tunnel is established by using an explicit path specified with the mpls te path command, verify that the explicit path configuration is appropriate. In strict mode, you must specify the IP address of the incoming interface hop by hop. In loose mode, you must specify the node address of each device to be passed through.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file of the device.

¡ Diagnostic information collected using the display diagnostic-information command.
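Path calculation for an MPLS TE tunnel generally follows the constrained shortest path first (CSPF) idea. Links that do not satisfy the tunnel constraints are excluded, and a shortest path is computed over the remaining links. If no remaining path connects the ingress to the egress, calculation fails.

The following Python sketch shows this prune-then-Dijkstra approach using only a bandwidth constraint. It is a generic illustration, not the device's algorithm. A real computation also applies other constraints, such as the affinity and explicit path settings checked in step 3. The topology, metrics, and bandwidth values are hypothetical.

```python
import heapq

# Hypothetical unidirectional links: (from, to, metric, reservable bandwidth in kbps).
SAMPLE_TOPOLOGY = [
    ("PE1", "P1", 10, 100000), ("P1", "PE2", 10, 20000),
    ("PE1", "P2", 20, 100000), ("P2", "PE2", 20, 100000),
]


def cspf(links, src, dst, bandwidth):
    """Shortest path after pruning links whose reservable bandwidth is too small.

    Returns (total_metric, [nodes]) or None when no feasible path exists.
    """
    graph = {}
    for a, b, metric, bw in links:
        if bw >= bandwidth:  # prune links that cannot carry the tunnel
            graph.setdefault(a, []).append((b, metric))
    queue = [(0, src, [src])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, metric in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + metric, nxt, path + [nxt]))
    return None


if __name__ == "__main__":
    print(cspf(SAMPLE_TOPOLOGY, "PE1", "PE2", 50000))   # (40, ['PE1', 'P2', 'PE2'])
    print(cspf(SAMPLE_TOPOLOGY, "PE1", "PE2", 200000))  # None: every link is pruned
```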

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Hot-standby CRLSP establishment failure

Symptom

After the mpls te backup hot-standby command is configured for an MPLS TE tunnel, no hot-standby backup CRLSP is established as expected.

Common causes

The following are the common causes of this type of issue:

· The device is connected to the neighbor through only one interface.

· The configuration for the MPLS TE tunnel is incorrect.

Analysis

Figure 93 shows the troubleshooting flowchart.

Figure 93 Flowchart for troubleshooting hot-standby CRLSP establishment failure.

Solution

To resolve the issue:

1. Depending on the IGP in use, execute the display ospf peer or display isis peer command to view the interfaces connected to the same neighbor (with the same System ID or Router ID).

# Display the summary information of IS-IS neighbors.

<Sysname> display isis peer

Peer information for IS-IS(1)

-----------------------------

System ID: 0000.0000.0001

Interface: GE2/0/1 Circuit Id: 0000.0000.0001.01

State: Up HoldTime: 27s Type: L1(L1L2) PRI: 64

System ID: 0000.0000.0001

Interface: GE2/0/2 Circuit Id: 0000.0000.0001.01

State: Up HoldTime: 27s Type: L2(L1L2) PRI: 64

# Display OSPF neighbor summary information.

<Sysname> display ospf peer

OSPF Process 1 with Router ID 1.1.1.1

Neighbor Brief Information

Area: 0.0.0.0

Router ID Address Pri Dead-Time State Interface

1.1.1.2 1.1.1.2 1 40 Full/DR GE2/0/1

¡ If two or more interfaces are connected to the neighbor, proceed to the next step.

¡ If only one interface is connected to the neighbor, add physical links between the device and the neighbor so that a path is available for establishing the backup CRLSP.

2. Verify that the MPLS TE configuration is correct.

Check the following settings in sequence:

a. Verify that the mpls te enable command is configured in the OSPF/IS-IS area and on the interfaces that the MPLS TE tunnel passes through.

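b. Verify that the LSR ID and Router ID are the same Loopback interface address.

c. If the MPLS TE tunnel is established using RSVP-TE, make sure the device and interfaces are configured with the rsvp and rsvp enable commands.

d. If the mpls te bandwidth command is configured on the tunnel interface, make sure the device's output interface is configured with the mpls te max-link-bandwidth and mpls te max-reservable-bandwidth commands.

e. If the mpls te affinity-attribute command is configured on the tunnel interface, make sure the mpls te link-attribute command is configured properly on the output interface. To ensure a link can be used by a tunnel, the following requirements must be met:

- The link attribute bits corresponding to the 1 bits in the affinity mask are checked as follows: The link attribute bits corresponding to the 1 bits of the affinity attribute must have at least one bit set to 1. The link attribute bits corresponding to the 0 bits of the affinity attribute must have no bit set to 1.

- The link attribute bits corresponding to the 0 bits in the affinity mask are not checked.

f. If the MPLS TE tunnel is established using Segment Routing, make sure the segment-routing mpls command is configured in the IGP area of the device.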

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file of the device.

¡ Diagnostic information collected using the display diagnostic-information command.
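Counting the interfaces per neighbor in step 1 can be scripted against captured display isis peer output. The following Python sketch assumes the field layout of the example output: a "System ID:" line followed by an "Interface:" line for each adjacency. It is a parsing aid under that assumption, not an H3C tool.

```python
from collections import defaultdict


def interfaces_per_neighbor(isis_peer_output):
    """Map each IS-IS System ID to the set of local interfaces with an adjacency to it."""
    result = defaultdict(set)
    system_id = None
    for line in isis_peer_output.splitlines():
        fields = line.split()
        if fields[:2] == ["System", "ID:"] and len(fields) >= 3:
            system_id = fields[2]
        elif fields[:1] == ["Interface:"] and system_id and len(fields) >= 2:
            result[system_id].add(fields[1])
    return dict(result)


# Excerpt of the sample display isis peer output from step 1.
ISIS_SAMPLE = """System ID: 0000.0000.0001
Interface: GE2/0/1 Circuit Id: 0000.0000.0001.01
State: Up HoldTime: 27s Type: L1(L1L2) PRI: 64
System ID: 0000.0000.0001
Interface: GE2/0/2 Circuit Id: 0000.0000.0001.01
State: Up HoldTime: 27s Type: L2(L1L2) PRI: 64
"""

if __name__ == "__main__":
    for sysid, ifaces in interfaces_per_neighbor(ISIS_SAMPLE).items():
        verdict = "OK" if len(ifaces) >= 2 else "add a link for the backup CRLSP"
        print(sysid, sorted(ifaces), verdict)
```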

Related alarm and log messages

Alarm messages

N/A

Log messages

TE/5/TE_BACKUP_SWITCH

Basic MPLS issues

Failure to forward packets through an LSP

Symptom

Packets sent by a host in the network cannot be forwarded through an LSP tunnel.

Common causes

The following are the common causes of this type of issue:

· The route does not exist.

· The LSP does not exist.

· The route has not been recursed to the LSP tunnel.

· The forwarding state of the LSP tunnel is not ACTIVE.

· The BFD session state is down.

· The CPU usage is too high.

Troubleshooting flow

Figure 94 shows the troubleshooting flowchart.

Figure 94 Flowchart for troubleshooting packet forwarding failure on LSP

Solution

To resolve the issue:

1. Identify whether the IGP route exists.

Execute the display ip routing-table command to identify whether there is a subnet route destined for the Loopback interface address of the LSP destination node.

<Sysname> display ip routing-table 1.1.1.1

Summary count : 1

Destination/Mask Proto Pre Cost NextHop Interface

1.1.1.2/32 IS_L1 15 10 1.1.1.1 LoopBack1

¡ If the route does not exist, enable the IGP protocol on the Loopback interface and the public network interfaces to ensure the advertisement of the corresponding subnet route.

¡ If the route exists, proceed to step 2.

2. Identify whether the LSP exists.

Execute the display mpls lsp command to identify if there is an LSP destined for the Loopback interface address of the destination node.

¡ If no such LSP exists, establish one of the specified type:

- To establish an LDP LSP, enable MPLS and MPLS LDP on interfaces.

- To establish an SRLSP, execute the segment-routing mpls command in IS-IS IPv4 unicast address family view, OSPF view, or BGP IPv4 unicast address family view to enable MPLS-based SR.

- To establish an SR-MPLS TE policy, create the SR-MPLS TE policy correctly in SR TE view.

¡ If the LSP exists, proceed to step 3.

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

1.1.1.2/32 LDP -/1049 GE2/0/1

3. Check whether the route has been recursed to the LSP tunnel.

Execute the display mpls tunnel all command to view information about all tunnels. Execute the display fib command to view the FIB entry for the specified next hop address. Find the FIB entry whose Nexthop field value is the same as the Destination field value in the tunnel information. Then, check whether the LSP index (the Token field value) of that FIB entry is the same as the NHLFE ID of the tunnel.

¡ If they are different, the route has not been recursed to the LSP tunnel. Check whether the tunnel type (Type field) of the specified FEC matches the tunnel type specified in the tunnel policy.

- If the tunnel types are different, modify the tunnel policy in tunnel policy view to make the tunnel policy configuration match with the specified FEC tunnel type.

- If the tunnel types are the same, proceed to step 7.

<Sysname> display tunnel-policy

Tunnel policy name: abc

Select-Seq: LSP

Load balance number : 1

Strict : No

¡ If the LSP index and the tunnel NHLFE ID are the same, it indicates that the route has recursed to the LSP tunnel. Proceed to step 4.

<Sysname> display mpls tunnel all

Destination Type Tunnel/NHLFE VPN Instance

2.2.2.9 LSP NHLFE3 -

3.3.3.9 SRLSP NHLFE2 -

4.4.4.9 SRPolicy NHLFE23068673 -

<Sysname> display fib

Destination count: 1 FIB entry count: 1

Flag:

U:Usable G:Gateway H:Host B:Blackhole D:Dynamic S:Static

R:Relay F:FRR

Destination/Mask Nexthop Flag OutInterface/Token Label

55.55.55.55/32 2.2.2.9 UGHR 3 Null

…

4. Identify whether the forwarding state of the LSP tunnel is normal.

Execute the display mpls forwarding nhlfe command to view information about NHLFE entries.

¡ If the Flag field does not contain flag A, the LSP tunnel is not usable. Proceed to step 5.

¡ If the Flag field contains flag A, the LSP tunnel is functioning normally. Proceed to step 6.

<Sysname> display mpls forwarding nhlfe 3

Flags: T - Forwarded through a tunnel

N - Forwarded through the outgoing interface to the nexthop IP address

B - Backup forwarding information

A - Active forwarding information

M - P2MP forwarding information

S - Secondary backup path

NID Tnl-Type Flag OutLabel Forwarding Info

--------------------------------------------------------------------------------

3 LSP NA 1040127 GE2/0/3 10.0.3.2

5. Identify whether BFD is functioning properly.

Execute the display mpls bfd command or the display mpls sbfd command to view BFD/SBFD information for LSP tunnels.

¡ If the BFD/SBFD session state is Down, execute the mpls bfd enable command in system view to enable BFD/SBFD for MPLS, and make sure the BFD/SBFD session state is up.

¡ If the BFD/SBFD session state is Up, proceed with step 6.

<Sysname> display mpls bfd ipv4 22.22.2.2 32

Total number of sessions: 1, 1 up, 0 down, 0 init

FEC Type: LSP

FEC Info:

Destination: 22.22.2.2

Mask Length: 32

NHLFE ID: 1025

Local Discr: 513 Remote Discr: 513

Source IP: 11.11.1.1 Destination IP: 127.0.0.1

Session State: Up Session Role: Passive

Template Name: -

<Sysname> display mpls sbfd ipv4 22.22.2.2 32

Total number of sessions: 1, 1 up, 0 down, 0 init

FEC Type: LSP

FEC Info:

Destination: 22.22.2.2

Mask Length: 32

NHLFE ID: 1025

Local Discr: 513 Remote Discr: 513

Source IP: 11.11.1.1 Destination IP: 127.0.0.1

Session State: Up

Template Name: -

6. Identify whether CPU is functioning properly.

Execute the display cpu-usage command to view CPU usage statistics.

¡ If the CPU usage is too high, disable some unnecessary features to reduce the device's CPU usage.

¡ If the CPU usage is normal, proceed to step 7.

7. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
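Steps 3 and 4 compare values across several display outputs. The following Python sketch parses the sample display mpls tunnel all and display fib rows shown in step 3. It checks whether the FIB Token for a next hop equals the tunnel's NHLFE ID, and whether a display mpls forwarding nhlfe row carries flag A. Column positions are assumed from the sample output only and might differ on other software versions.

```python
def tunnel_nhlfe_ids(tunnel_output):
    """Map tunnel Destination to NHLFE ID from 'display mpls tunnel all' rows."""
    ids = {}
    for line in tunnel_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2].startswith("NHLFE"):
            ids[fields[0]] = int(fields[2][len("NHLFE"):])
    return ids


def fib_tokens(fib_output):
    """Map Nexthop to Token from 'display fib' rows (assumed five-column layout)."""
    tokens = {}
    for line in fib_output.splitlines():
        fields = line.split()
        # Destination/Mask Nexthop Flag OutInterface/Token Label
        if len(fields) == 5 and "/" in fields[0] and fields[3].isdigit():
            tokens[fields[1]] = int(fields[3])
    return tokens


def recursed_to_lsp(nexthop, tunnels, fib):
    """True if the FIB Token for the next hop equals the tunnel NHLFE ID."""
    return nexthop in tunnels and fib.get(nexthop) == tunnels[nexthop]


def nhlfe_active(nhlfe_row):
    """True if the Flag column (third field) of an NHLFE row contains A."""
    fields = nhlfe_row.split()
    return len(fields) >= 3 and "A" in fields[2]


TUNNELS = """2.2.2.9 LSP NHLFE3 -
3.3.3.9 SRLSP NHLFE2 -
4.4.4.9 SRPolicy NHLFE23068673 -"""
FIB = "55.55.55.55/32 2.2.2.9 UGHR 3 Null"

if __name__ == "__main__":
    tunnels, fib = tunnel_nhlfe_ids(TUNNELS), fib_tokens(FIB)
    print(recursed_to_lsp("2.2.2.9", tunnels, fib))           # True
    print(nhlfe_active("3 LSP NA 1040127 GE2/0/3 10.0.3.2"))  # True
```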

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting VPLS

Only the VSI on one PE device on the two ends of a PW is in up state

Symptom

Of the two PE devices at the ends of a PW, the VSI is up on only one.

Common causes

A VSI is up when it has at least one up PW and one up AC, or at least two up ACs.

The common causes of this type of issue are:

· On the PE with the up VSI, the PWs are down, but at least two ACs are up.

· On the PE with the down VSI, the PWs are down and at most one AC is up.

Solution

To resolve the issue:

1. Execute the display l2vpn vsi verbose command on both PE devices to check the states of the ACs and PWs in the VSI.

<Sysname> display l2vpn vsi verbose

VSI Name: vpls1

VSI Index : 0

VSI Description : vsi for vpls1

VSI State : Down

MTU : 1500

Bandwidth : -

Broadcast Restrain : -

Multicast Restrain : -

Unknown Unicast Restrain: -

MAC Learning : Enabled

MAC Table Limit : -

MAC Learning rate : -

Drop Unknown : -

PW Redundancy : Master

Flooding : Enabled

Statistics : Disabled

VXLAN ID : -

LDP PWs:

Peer PW ID Link ID State

192.3.3.3 1 8 Down

ACs:

AC Link ID State Type

GE2/0/3 srv1 1 Up Manual
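To compare the two PEs quickly, you can extract the PW and AC states from the captured output. The following is a minimal Python sketch. It assumes the section headers and column layout shown in the sample above, which might differ in other software versions. The helper name vsi_member_states is hypothetical.

```python
from __future__ import annotations


def vsi_member_states(output: str) -> dict[str, list[str]]:
    """Collect PW and AC states from 'display l2vpn vsi verbose' output."""
    states: dict[str, list[str]] = {"PW": [], "AC": []}
    section = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.endswith("PWs:"):      # section header such as "LDP PWs:"
            section = "PW"
            continue
        if line == "ACs:":
            section = "AC"
            continue
        if section is None:            # VSI fields printed before the tables
            continue
        # Take the first Up/Down token in a table row; header rows contain none.
        state = next((tok for tok in line.split() if tok in ("Up", "Down")), None)
        if state is not None:
            states[section].append(state)
    return states


sample = """\
VSI Name: vpls1
VSI State : Down
LDP PWs:
Peer PW ID Link ID State
192.3.3.3 1 8 Down
ACs:
AC Link ID State Type
GE2/0/3 srv1 1 Up Manual
"""
states = vsi_member_states(sample)
print({kind: values.count("Up") for kind, values in states.items()})  # {'PW': 0, 'AC': 1}
```

With no PW up and only one AC up, this PE matches the "down VSI" cause described above.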

2. Execute the display l2vpn pw verbose command to identify the reason why the PW is down.

<Sysname> display l2vpn pw verbose

VSI Name: aaa

Peer: 2.2.2.9 Remote Site: 2

Signaling Protocol : BGP

Link ID : 9 PW State : Down

In Label : 1420 Out Label: 1419

MTU : 1500

PW Attributes : Main

VCCV CC : -

VCCV BFD : -

Flow Label : Send

Control Word : Disabled

Tunnel Group ID : 0x800000960000000

Tunnel NHLFE IDs : 1038

Admin PW : -

E-Tree Mode : -

E-Tree Role : root

Root VLAN : -

Leaf VLAN : -

Down Reasons : Control word not match

The Down Reasons field shows why the PW is down. The common reasons and their solutions are as follows:

¡ BFD session for PW down—The BFD session for PW detection is down. To resolve this issue, execute the display bfd session command to display BFD session information. Check and edit BFD configuration or check the physical link for link failure or link quality issues.

¡ BGP RD was deleted—The BGP RD has been deleted. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.

¡ BGP RD was empty—No BGP RD is configured. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.

¡ Control word not match—The control word configuration on the two ends of the PW is inconsistent. To resolve this issue, execute the control-word enable command to enable the control word feature on both ends.

¡ Encapsulation not match—The encapsulation types on the two ends of the PW are inconsistent. To resolve this issue, execute the pw-type command to configure the same encapsulation type for the two ends.

¡ LDP interface parameter not match—The LDP negotiation parameters on the two ends of the PW are inconsistent. To resolve this issue, execute the vccv cc command to specify the same VCCV control channel (CC) type. Alternatively, specify the same CEM class for the CEM interfaces on both ends of the PW.

¡ Non-existent remote LDP PW—The remote device has deleted the LDP PW. To resolve this issue, reconfigure the PW on the remote device.

¡ Local AC Down—The local AC is down. To resolve this issue, check and edit the configuration on the AC interface or troubleshoot the issue on the interface where the AC is located and make sure the interface is in up state.

¡ Local AC was non-existent—No local AC is configured. To resolve this issue, configure a local AC and associate it with a VSI.

¡ MTU not match—The MTU configuration on the two ends of the PW is inconsistent. To resolve this issue, configure the same MTU at both ends of the PW or use the mtu-negotiate disable command to disable MTU negotiation.

¡ Remote AC Down—The remote AC is down. To resolve this issue, check and edit the configuration on the remote AC interface or troubleshoot the issue on the interface where the AC is located and make sure the interface is in up state.
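The list above is essentially a lookup from the Down Reasons string to a remediation. The following is a minimal Python sketch, assuming that display l2vpn pw verbose output was saved as text. The REMEDIES table abridges the list above, and the names REMEDIES, down_reasons, and advise are hypothetical.

```python
from __future__ import annotations

import re

# Abridged remediation hints keyed by the Down Reasons strings listed above.
REMEDIES = {
    "BFD session for PW down": "Check BFD (display bfd session) and the physical link.",
    "BGP RD was deleted": "Configure route-distinguisher in auto-discovery VSI view.",
    "BGP RD was empty": "Configure route-distinguisher in auto-discovery VSI view.",
    "Control word not match": "Run control-word enable on both ends.",
    "Encapsulation not match": "Configure the same pw-type on both ends.",
    "LDP interface parameter not match": "Match the vccv cc type or CEM class on both ends.",
    "Non-existent remote LDP PW": "Reconfigure the PW on the remote device.",
    "Local AC Down": "Fix the local AC configuration or bring its interface up.",
    "Local AC was non-existent": "Configure a local AC and associate it with the VSI.",
    "MTU not match": "Configure the same MTU or run mtu-negotiate disable.",
    "Remote AC Down": "Fix the remote AC configuration or bring its interface up.",
}


def down_reasons(output: str) -> list[str]:
    """Return the Down Reasons value of each PW in 'display l2vpn pw verbose' output."""
    return [r.strip() for r in re.findall(r"Down Reasons\s*:\s*(.+)", output)]


def advise(output: str) -> list[str]:
    """Pair each Down Reasons value with a hint, or a fallback for unlisted reasons."""
    fallback = "Not in this list; collect information and contact Technical Support."
    return [f"{r}: {REMEDIES.get(r, fallback)}" for r in down_reasons(output)]


print(advise("Link ID : 9 PW State : Down\nDown Reasons : Control word not match\n"))
# ['Control word not match: Run control-word enable on both ends.']
```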

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

· L2VPN/2/L2VPN_PWSTATE_CHANGE

· L2VPN/4/L2VPN_BGPVC_CONFLICT_LOCAL

· L2VPN/4/L2VPN_BGPVC_CONFLICT_REMOTE

· L2VPN/4/L2VPN_HARD_RESOURCE_NOENOUGH

· L2VPN/2/L2VPN_HARD_RESOURCE_RESTORE

· L2VPN/4/L2VPN_LABEL_DUPLICATE

VPLS traffic failed to be forwarded

Symptom

VPLS traffic failed to be forwarded.

Common causes

The following are the common causes of this type of issue:

· The AC is not up.

· The PW is not up.

· The PW did not generate forwarding information.

· No public tunnels are available for the PW.

· The public tunnel for the PW is abnormal.

Troubleshooting flowchart

Figure 95 shows the troubleshooting flowchart.

Figure 95 Flowchart for troubleshooting VPLS traffic forwarding failure

Solution

To resolve the issue:

1. Execute the display l2vpn vsi verbose command to check the states and number of the ACs and PWs associated with the VSI.

<Sysname> display l2vpn vsi verbose

VSI Name: vpls1

VSI Index : 0

VSI Description : vsi for vpls1

VSI State : Up

MTU : 1500

Bandwidth : -

Broadcast Restrain : -

Multicast Restrain : -

Unknown Unicast Restrain: -

MAC Learning : Enabled

MAC Table Limit : -

MAC Learning rate : -

Drop Unknown : -

PW Redundancy : Master

Flooding : Enabled

Statistics : Disabled

VXLAN ID : -

LDP PWs:

Peer PW ID Link ID State

192.3.3.3 1 8 Down

ACs:

AC Link ID State Type

GE2/0/3 srv1 1 Up Manual

2. If the state of the AC is down, verify that the AC configuration is correct and the interface where the AC is located is up. If the AC configuration is incorrect or the interface where the AC is located is down, edit the AC configuration or troubleshoot the interface failure.

3. If the PW is down, execute the display l2vpn pw verbose command to check the reason why the PW is down.

<Sysname> display l2vpn pw verbose

VSI Name: aaa

Peer: 2.2.2.9 Remote Site: 2

Signaling Protocol : BGP

Link ID : 9 PW State : Down

In Label : 1420 Out Label: 1419

MTU : 1500

PW Attributes : Main

VCCV CC : -

VCCV BFD : -

Flow Label : Send

Control Word : Disabled

Tunnel Group ID : 0x800000960000000

Tunnel NHLFE IDs : 1038

Admin PW : -

E-Tree Mode : -

E-Tree Role : root

Root VLAN : -

Leaf VLAN : -

Down Reasons : Control word not match

The Down Reasons field shows why the PW is down. The common reasons and their solutions are as follows:

¡ BGP RD was deleted—The BGP RD has been deleted. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.

¡ BGP RD was empty—No BGP RD is configured. To resolve this issue, execute the route-distinguisher route-distinguisher command in auto-discovery VSI view.

¡ Encapsulation not match—The encapsulation types on the two ends of the PW are inconsistent. To resolve this issue, execute the pw-type command to configure the same encapsulation type for the two ends.

¡ Non-existent remote LDP PW—The remote device has deleted the LDP PW. To resolve the issue, reconfigure the LDP PW on the remote device.

¡ Local AC was non-existent—No local AC is configured. To resolve this issue, configure a local AC and associate it with a VSI.

4. If both the AC and PW are up, execute the display l2vpn forwarding pw verbose command to identify whether PW forwarding information exists. If the information exists, the Tunnel NHLFE IDs field displays the NHLFE IDs of the public tunnels that carry the PW.

¡ If PW forwarding information exists, go to step 6.

¡ If no PW forwarding information exists, go to step 5.

<Sysname> display l2vpn forwarding pw verbose

VSI Name: aaa

Link ID: 8

PW Type : VLAN PW State : Up

In Label : 1272 Out Label: 1275

MTU : 1500

PW Attributes : Main

VCCV CC : Router-Alert

VCCV BFD : Fault Detection with BFD

Flow Label : Send

Tunnel Group ID : 0x960000000

Tunnel NHLFE IDs: 1034

MAC limit : maximum=2000 alarm=enabled action=discard
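The step 4 branch depends only on whether the Tunnel NHLFE IDs field holds any IDs. The following is a minimal Python sketch, assuming the field layout shown above. It also assumes that multiple IDs, if present, are separated by spaces or commas. The helper names tunnel_nhlfe_ids and next_step are hypothetical.

```python
from __future__ import annotations

import re


def tunnel_nhlfe_ids(output: str) -> list[int]:
    """Return the Tunnel NHLFE IDs from 'display l2vpn forwarding pw verbose' output."""
    match = re.search(r"Tunnel NHLFE IDs\s*:\s*([\d ,]+)", output)
    if match is None:
        return []
    # Assumption: several IDs, if present, are separated by spaces or commas.
    return [int(tok) for tok in re.split(r"[ ,]+", match.group(1).strip()) if tok]


def next_step(output: str) -> int:
    """Step to take next in this procedure: 6 if forwarding info exists, else 5."""
    return 6 if tunnel_nhlfe_ids(output) else 5


sample = "Tunnel Group ID : 0x960000000\nTunnel NHLFE IDs: 1034\nMAC limit : maximum=2000"
print(tunnel_nhlfe_ids(sample), next_step(sample))  # [1034] 6
```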

5. Execute the display mpls lsp command to check for the tunnel that carries the PW. The tunnel is an LSP whose FEC is the PW peer IP address. If no such LSP exists, establish a tunnel to carry the PW.

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

100.100.100.100/24 LDP -/1049 GE2/0/1

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Use the display diagnostic-information command to collect diagnostic information.

Related alarm and log messages

Alarm messages

N/A

Log messages

· L2VPN/2/L2VPN_PWSTATE_CHANGE

· L2VPN/4/L2VPN_BGPVC_CONFLICT_LOCAL

· L2VPN/4/L2VPN_BGPVC_CONFLICT_REMOTE

· L2VPN/4/L2VPN_HARD_RESOURCE_NOENOUGH

· L2VPN/2/L2VPN_HARD_RESOURCE_RESTORE

· L2VPN/4/L2VPN_LABEL_DUPLICATE

A PW in up state failed to forward packets between two PEs

Symptom

The PW is up, but it fails to forward packets between the two PEs.

Common causes

The following are the common causes of this type of issue:

· The number of MAC addresses that a PW learned reached the upper limit, and the PW is configured to drop frames with unknown source MAC addresses when the maximum is reached.

· PW information has not been deployed to the forwarding module.

Troubleshooting flowchart

Figure 96 shows the troubleshooting flowchart.

Figure 96 Flowchart for troubleshooting packet forwarding failure between two PEs when the PW is in up state

Solution

To resolve the issue:

1. Execute the display l2vpn mac-address command to identify whether corresponding MAC address entries exist and the total number of learned MAC address entries. You can specify an AC interface or PW to display the total number of MAC address entries learned from that AC interface or PW.

¡ Display MAC address table information for VSIs.

<Sysname> display l2vpn mac-address

* - The output interface is issued to another VSI

MAC Address State VSI Name Link ID/Name Aging

0000-0000-000a Dynamic vpn1 GE2/0/1 Aging

0000-0000-0009 Dynamic vpn1 GE2/0/1 Aging

--- 2 mac address(es) found ---

¡ Display the number of MAC address entries.

<Sysname> display l2vpn mac-address count

2 mac address(es) found

2. Check the maximum number of MAC addresses allowed to be learned and the action taken on frames with unknown source MAC addresses after the maximum is reached.

¡ Execute the display this command in VSI view to identify whether the mac-table limit and mac-table limit drop-unknown commands are configured for the VSI. If these commands are configured and the number of learned MAC addresses has reached the upper limit, increase the maximum number of MAC addresses that the VSI can learn or delete the mac-table limit drop-unknown command.

¡ Execute the display this command in AC view and PW view to check whether the mac-limit command is configured for the AC or PW. If it is configured and the number of learned MAC addresses has reached the upper limit, increase the maximum number of MAC addresses that can be learned or delete the action discard option.
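The PW-level check compares the learned-entry count with the MAC limit field shown in display l2vpn forwarding pw verbose output. The following is a minimal Python sketch. It assumes that the count output was collected for the same PW, by specifying the PW in the display l2vpn mac-address command as described in step 1. The helper names are hypothetical.

```python
import re
from typing import Optional, Tuple


def parse_mac_limit(pw_output: str) -> Optional[Tuple[int, str]]:
    """Read a line such as 'MAC limit : maximum=2000 alarm=enabled action=discard'."""
    match = re.search(r"MAC limit\s*:\s*maximum=(\d+).*?action=(\w+)", pw_output)
    return (int(match.group(1)), match.group(2)) if match else None


def parse_mac_count(count_output: str) -> int:
    """Read '2 mac address(es) found' from 'display l2vpn mac-address count' output."""
    match = re.search(r"(\d+) mac address\(es\) found", count_output)
    return int(match.group(1)) if match else 0


def learning_blocked(pw_output: str, count_output: str) -> bool:
    """True if the limit is reached and frames from new source MACs are discarded."""
    limit = parse_mac_limit(pw_output)
    if limit is None:
        return False
    maximum, action = limit
    return action == "discard" and parse_mac_count(count_output) >= maximum


print(learning_blocked("MAC limit : maximum=2000 alarm=enabled action=discard",
                       "2 mac address(es) found"))  # False
```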

3. Execute the display l2vpn forwarding pw verbose command to identify whether PW forwarding information exists. If the information exists, the Tunnel NHLFE IDs field displays the NHLFE IDs of the public tunnels that carry the PW.

¡ If forwarding information exists, go to step 5.

¡ If no forwarding information exists, go to step 4.

<Sysname> display l2vpn forwarding pw verbose

VSI Name: aaa

Link ID: 8

PW Type : VLAN PW State : Up

In Label : 1272 Out Label: 1275

MTU : 1500

PW Attributes : Main

VCCV CC : Router-Alert

VCCV BFD : Fault Detection with BFD

Flow Label : Send

Tunnel Group ID : 0x960000000

Tunnel NHLFE IDs: 1034

MAC limit : maximum=2000 alarm=enabled action=discard

4. Execute the display mpls lsp command to check for the tunnel that carries the PW. The tunnel is an LSP whose FEC is the PW peer IP address. If no such LSP exists, establish a tunnel to carry the PW.

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

100.100.100.100/24 LDP -/1049 GE2/0/1

5. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Use the display diagnostic-information command to collect diagnostic information.

Related alarm and log messages

Alarm messages

N/A

Log messages

· L2VPN/4/L2VPN_MACLIMIT_MAX_AC

· L2VPN/4/L2VPN_MACLIMIT_MAX_PW

· L2VPN/4/L2VPN_MACLIMIT_MAX_VSI

An LDP PW cannot become up

Symptom

In a VPLS network, an LDP PW cannot become up.

Common causes

The following are the common causes of this type of issue:

· The encapsulation types at both ends of the PW are inconsistent.

· The MTU values at both ends of the PW are inconsistent.

· The LDP session state is not Up.

· No public tunnels are available for the PW.

· The AC interface is not up.

Solution

To resolve the issue:

1. Execute the display l2vpn pw verbose command to view the peer IP address of the PW and the reason why the PW is down.

<Sysname> display l2vpn pw verbose

VSI Name: aaa

Peer: 2.2.2.9 VPLS ID: 100:100

Signaling Protocol : LDP

Link ID : 8 PW State : Down

In Label : 1553 Out Label: 1553

MTU : 1500

PW Attributes : Main

VCCV CC : -

VCCV BFD : -

Flow Label : -

Tunnel Group ID : 0x800000960000000

Tunnel NHLFE IDs : 1038

Admin PW : -

E-Tree Mode : -

E-Tree Role : root

Root VLAN : -

Leaf VLAN : -

Down Reasons : Control word not match

Table 15 shows the common causes of this type of issue.

Table 15 Common causes and solutions

Down Reasons	Symptom	Solution
BFD session for PW down	The BFD session for PW detection is down.	Execute the display bfd session command to display BFD session information. Check and edit BFD configuration or check the physical link for link failure or link quality issues.
Control word not match	The control word configuration on the two ends of the PW is inconsistent.	Execute the control-word enable command to enable the control word feature on both ends.
Encapsulation not match	The encapsulation types on the two ends of the PW are inconsistent.	Execute the pw-type command to configure the same encapsulation type for the two ends.
LDP interface parameter not match	The LDP negotiation parameters on the two ends of the PW are inconsistent.	Execute the vccv cc command to specify the same VCCV control channel (CC) type or specify the same CEM class for the CEM interfaces on both ends of the PW.
Non-existent remote LDP PW	The remote device has deleted the LDP PW.	Reconfigure the PW on the remote device.
Local AC Down	The local AC is down.	Check and edit the configuration on the AC interface or troubleshoot the issue on the interface where the AC is located and make sure the interface is in up state.
Local AC was non-existent	No local AC is configured.	Configure a local AC and associate it with a VSI.
MTU not match	The MTU configuration on the two ends of the PW is inconsistent.	Configure the same MTU at both ends of the PW or use the mtu-negotiate disable command to disable MTU negotiation.
Remote AC Down	The remote AC is down.	Check and edit the configuration on the remote AC interface or troubleshoot the issue on the interface where the AC is located and make sure the interface is in up state.
Label not allocated	No label is allocated.	Contact Technical Support.
Local VSI Down	The local VSI is down.	See "Only the VSI on one PE device on the two ends of a PW is in up state."
Local and remote LDP PWs have different AII	The local SAII and remote TAII are different.	See "LDP session down" in LDP Troubleshooting Guide.
Local LDP PW was not sent mapping message	The local end did not send the LDP mapping message.	See "LDP session down" in LDP Troubleshooting Guide.
Local LDP PW Virtual Nexthop defect	The local LDP PW has virtual next hop defects.	See steps 2 and 3.
Remote LDP PW Virtual Nexthop defect	The remote LDP PW has virtual next hop defects.	See steps 2 and 3.
Tunnel Down	The tunnel that carries the PW is down.	See step 3.

2. Execute the display l2vpn forwarding pw verbose command to identify whether PW forwarding information exists. If the information exists, the Tunnel NHLFE IDs field displays the NHLFE IDs of the public tunnels that carry the PW.

¡ If forwarding information exists, go to step 4.

¡ If no forwarding information exists, go to step 3.

<Sysname> display l2vpn forwarding pw verbose

VSI Name: aaa

Link ID: 8

PW Type : VLAN PW State : Up

In Label : 1272 Out Label: 1275

MTU : 1500

PW Attributes : Main

VCCV CC : Router-Alert

VCCV BFD : Fault Detection with BFD

Flow Label : Send

Tunnel Group ID : 0x960000000

Tunnel NHLFE IDs: 1034

MAC limit : maximum=2000 alarm=enabled action=discard

3. Execute the display mpls lsp command to check for the tunnel that carries the PW. The tunnel is an LSP whose FEC is the PW peer IP address obtained in step 1. If no such tunnel exists, create a tunnel to carry the PW. Supported public tunnel types include LSP, MPLS TE, and GRE tunnels. For information about creating LSP tunnels, see "Configuring a static LSP" and "Configuring LDP" in MPLS Configuration Guide. For MPLS TE tunnels, see "Configuring MPLS-TE" in MPLS Configuration Guide. For GRE tunnels, see "Configuring GRE" in Layer 3—IP Services Configuration Guide.

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

100.100.100.100/24 LDP -/1049 GE2/0/1
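The step 3 check pairs the peer address from step 1 with the FEC column of the LSP table. The following is a minimal Python sketch, assuming IPv4 peers and the row layout shown above. It covers LSP tunnels only, not MPLS TE or GRE tunnels. All helper names are hypothetical.

```python
from __future__ import annotations

import ipaddress
import re

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"


def pw_peers(pw_output: str) -> set[ipaddress.IPv4Address]:
    """Peer addresses from 'display l2vpn pw verbose' lines such as 'Peer: 2.2.2.9 ...'."""
    return {ipaddress.IPv4Address(ip) for ip in re.findall(rf"Peer:\s*({IPV4})", pw_output)}


def lsp_fecs(lsp_output: str) -> set[ipaddress.IPv4Address]:
    """FEC addresses from 'display mpls lsp' rows such as '100.100.100.100/24 LDP ...'."""
    found = re.findall(rf"^\s*({IPV4})/\d+\s", lsp_output, flags=re.MULTILINE)
    return {ipaddress.IPv4Address(ip) for ip in found}


def peers_without_lsp(pw_output: str, lsp_output: str) -> set[ipaddress.IPv4Address]:
    """PW peers for which no LSP uses the peer address as its FEC."""
    return pw_peers(pw_output) - lsp_fecs(lsp_output)


lsp_output = "FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX\n100.100.100.100/24 LDP -/1049 GE2/0/1\n"
print(peers_without_lsp("Peer: 2.2.2.9 VPLS ID: 100:100", lsp_output))
# {IPv4Address('2.2.2.9')}
```

In this sample, no LSP has 2.2.2.9 as its FEC, so a tunnel to that peer must be created.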

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Use the display diagnostic-information command to collect diagnostic information.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

A VSI cannot become up when VPLS uses LDP

Symptom

A VSI cannot become up when VPLS uses LDP.

Common causes

The VSI is up if any of the following conditions is met:

· At least one PW and one AC in the VSI are up.

· At least two ACs in the VSI are up.

· At least two PWs in the VSI are up (multi-segment PW networking).

The following are the common causes of this type of issue:

· The total number of up ACs and PWs in the VSI is less than 2.

· The shutdown command was executed in the VSI.
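The up conditions and the two common causes above can be written as a single predicate. The following is a minimal Python sketch. The function name vsi_should_be_up and its shutdown parameter are hypothetical; the parameter models the second common cause. The check in the sketch shows that the three conditions reduce to "at least two up members in total", which matches the first common cause.

```python
def vsi_should_be_up(up_acs: int, up_pws: int, shutdown: bool = False) -> bool:
    """Apply the VSI up conditions listed above to counts of up ACs and PWs."""
    if shutdown:  # the shutdown command in VSI view keeps the VSI down
        return False
    return (
        (up_pws >= 1 and up_acs >= 1)  # at least one PW and one AC up
        or up_acs >= 2                 # at least two ACs up
        or up_pws >= 2                 # at least two PWs up (multi-segment PW)
    )


# The three conditions are equivalent to "at least two up members in total".
assert all(
    vsi_should_be_up(a, p) == (a + p >= 2) for a in range(4) for p in range(4)
)
print(vsi_should_be_up(up_acs=1, up_pws=0))  # False
```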

Solution

To resolve the issue:

1. Execute the display this command in VSI view to check for the shutdown command.

¡ If the shutdown command is configured, execute the undo shutdown command.

¡ If the shutdown command is not configured, go to the next step.

2. Execute the display l2vpn vsi verbose command to check the states and number of the ACs and PWs associated with the VSI.

<Sysname> display l2vpn vsi verbose

VSI Name: vpls1

VSI Index : 0

VSI Description : vsi for vpls1

VSI State : Up

MTU : 1500

Bandwidth : -

Broadcast Restrain : -

Multicast Restrain : -

Unknown Unicast Restrain: -

MAC Learning : Enabled

MAC Table Limit : -

MAC Learning rate : -

Drop Unknown : -

PW Redundancy : Master

Flooding : Enabled

Statistics : Disabled

VXLAN ID : -

LDP PWs:

Peer PW ID Link ID State

192.3.3.3 1 8 Down

ACs:

AC Link ID State Type

GE2/0/3 srv1 1 Up Manual

¡ If fewer than two ACs and PWs in total are associated with the VSI, create the required ACs or PWs first.

¡ If the state of the AC is down, verify that the AC configuration is correct and the interface where the AC is located is up. If the AC configuration is incorrect or the interface where the AC is located is down, edit the AC configuration or troubleshoot the interface failure.

¡ If the state of the PW is down, see "An LDP PW cannot become up" to troubleshoot the issue.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting segment routing issues

EVPN L3VPN over SRv6 issues

EVPN L3VPN over SRv6 BE traffic forwarding failure

Symptom

On an EVPN L3VPN over SRv6 network shown in Figure 97, traffic forwarding failure occurs when the PEs use the SRv6 BE mode to forward the private service traffic in VPN 1 between CE 1 and CE 2.

The troubleshooting flow is the same for IPv4 and IPv6 private networks. The following information uses an IPv4 private network as an example to describe the troubleshooting procedure for EVPN L3VPN over SRv6.

Figure 97 Network diagram

Table 16 shows the key network planning information for the EVPN L3VPN over SRv6 network.

Table 16 SRv6 locators and major addresses in the address plan

Device	Interface or locator	Address	Device	Interface or locator	Address
PE 1	SRv6 Locator	100:1::/64	PE 2	SRv6 Locator	300:1::/64
	Loopback0	1::1/128		Loopback0	3::3/128
CE 1	Loopback0	10.10.10.10/32	CE 2	Loopback0	20.20.20.20/32
	Loopback1	11.11.11.11/32		Loopback1	22.22.22.22/32
P	SRv6 Locator	200:1::/64
	Loopback0	2::2/128

Common causes

The following are the common causes of this type of issue:

· The PEs cannot learn VPN routes because they fail to establish BGP EVPN peer relationships.

· EVPN L3VPN over SRv6 configuration is incomplete.

· The routes for the SRv6 SIDs are unreachable.

Troubleshooting flow

Figure 98 shows the troubleshooting flowchart.

Figure 98 Flowchart for troubleshooting EVPN L3VPN over SRv6 BE traffic forwarding failure

Solution

1. From the local PE, ping the remote private IP address to check connectivity. Specify the name of the VPN instance to which the private IP address belongs.

<Sysname> ping -vpn-instance vpn1 20.20.20.20

Ping 20.20.20.20 (20.20.20.20): 56 data bytes, press CTRL+C to break

56 bytes from 20.20.20.20: icmp_seq=0 ttl=254 time=2.000 ms

56 bytes from 20.20.20.20: icmp_seq=1 ttl=254 time=1.000 ms

56 bytes from 20.20.20.20: icmp_seq=2 ttl=254 time=1.000 ms

56 bytes from 20.20.20.20: icmp_seq=3 ttl=254 time=1.000 ms

56 bytes from 20.20.20.20: icmp_seq=4 ttl=254 time=2.000 ms

--- Ping statistics for 20.20.20.20 in VPN instance vpn1 ---

5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss

round-trip min/avg/max/std-dev = 1.000/1.400/2.000/0.490 ms

If the ping test fails, the private network service is unavailable. Proceed to the next step.
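Pass or fail can be read from the statistics line, which has the same form for ping and ping ipv6 output. The following is a minimal Python sketch. As an assumption of this sketch, a missing statistics line or any packet loss counts as a failure; adjust the rule to local practice. The helper names are hypothetical.

```python
import re
from typing import Tuple

STATS = re.compile(
    r"(\d+) packet\(s\) transmitted, (\d+) packet\(s\) received, ([\d.]+)% packet loss"
)


def ping_stats(ping_output: str) -> Tuple[int, int, float]:
    """Return (transmitted, received, loss percent) from the ping statistics line."""
    match = STATS.search(ping_output)
    if match is None:
        raise ValueError("no ping statistics line found")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def ping_failed(ping_output: str) -> bool:
    """Treat a missing statistics line or any packet loss as a failed test."""
    try:
        _, _, loss = ping_stats(ping_output)
    except ValueError:
        return True
    return loss > 0.0


line = "5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss"
print(ping_stats(line), ping_failed(line))  # (5, 5, 0.0) False
```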

2. Perform the subsequent checks on both the local and remote PE devices. This section uses the local PE device as an example. Execute the display ip routing-table vpn-instance command on the local PE device to check the VPN routing table for routes to the private IP addresses.

<Sysname> display ip routing-table vpn-instance vpn1

Destinations : 10 Routes : 10

Destination/Mask Proto Pre Cost NextHop Interface

10.1.1.0/24 Direct 0 0 10.1.1.2

10.1.1.2/32 Direct 0 0 127.0.0.1

10.1.1.255/32 Direct 0 0 10.1.1.2

10.10.10.10/32 BGP 255 0 10.1.1.1

11.11.11.11/32 BGP 255 0 10.1.1.1

20.1.1.0/24 BGP 255 0 3::3

20.20.20.20/32 BGP 255 0 3::3

22.22.22.22/32 BGP 255 0 3::3

127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0

255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

If the device has VPN routes to the private IP addresses, verify that the FIB contains entries for these addresses with the U flag in the Flag field. The U (Usable) flag indicates that the entry for the private network IP address is valid.

<Sysname> display fib vpn-instance vpn1

Route destination count: 10

Directly-connected host count: 1

Flag:

U:Usable G:Gateway H:Host B:Blackhole D:Dynamic S:Static

R:Relay F:FRR

Destination/Mask Nexthop Flag OutInterface/Token Label

11.11.11.11/32 10.1.1.1 UGHR Null

127.0.0.0/8 127.0.0.1 U InLoop0 Null

10.1.1.0/24 10.1.1.2 U Null

20.20.20.20/32 3::3 UGHR Null

10.1.1.2/32 127.0.0.1 UH Null

22.22.22.22/32 3::3 UGHR Null

10.1.1.255/32 10.1.1.2 UBH Null

255.255.255.255/32 127.0.0.1 UH InLoop0 Null

10.10.10.10/32 10.1.1.1 UGHR Null

10.1.1.1/32 10.1.1.1 UH Null

20.1.1.0/24 3::3 UGR Null
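You can automate the U flag check for the remote private prefixes. The following is a minimal Python sketch. It assumes that data rows start with the destination prefix and that Flag is the third column, as in the sample above. The helper name fib_usable is hypothetical.

```python
from __future__ import annotations


def fib_usable(fib_output: str) -> dict[str, bool]:
    """Map each FIB destination to whether its Flag field contains U (Usable)."""
    entries: dict[str, bool] = {}
    for line in fib_output.splitlines():
        tokens = line.split()
        # Data rows start with a prefix such as 20.20.20.20/32; Flag is the third column.
        if len(tokens) >= 3 and "/" in tokens[0] and tokens[0][0].isdigit():
            entries[tokens[0]] = "U" in tokens[2]
    return entries


sample = """\
Destination/Mask Nexthop Flag OutInterface/Token Label
20.20.20.20/32 3::3 UGHR Null
22.22.22.22/32 3::3 UGHR Null
"""
fib = fib_usable(sample)
missing = [d for d in ("20.20.20.20/32", "22.22.22.22/32") if not fib.get(d)]
print(missing)  # []
```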

If a private IP address does not exist or is invalid in the VPN routing table or VPN FIB, proceed to check for BGP EVPN route learning issues between PEs and verify that a valid tunnel exists between them.

3. Execute the display bgp peer l2vpn evpn command on the local PE device to verify that it has established a BGP EVPN peer relationship with the remote PE device.

¡ If the PE devices have established a BGP EVPN peer relationship successfully, the State field in the command output displays Established. Proceed to step 4.

¡ If the PE devices have not established a BGP EVPN peer relationship, see the troubleshooting procedure for BGP peer establishment issues.

<Sysname> display bgp peer l2vpn evpn

BGP local router ID: 1.1.1.1

Local AS number: 100

Total number of peers: 1 Peers in established state: 1

* - Dynamically created peer

Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State

3::3 100 13 10 0 2 00:00:05 Established
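The State column can be checked per peer once the header row has been seen. The following is a minimal Python sketch, assuming the eight-column row layout shown above. The helper name bgp_peer_states is hypothetical.

```python
from __future__ import annotations


def bgp_peer_states(output: str) -> dict[str, str]:
    """Map each peer to its State column in 'display bgp peer l2vpn evpn' output."""
    states: dict[str, str] = {}
    in_table = False
    for line in output.splitlines():
        tokens = line.split()
        if tokens[:2] == ["Peer", "AS"]:   # column header row
            in_table = True
            continue
        if in_table and len(tokens) >= 8:  # Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
            states[tokens[0]] = tokens[-1]
    return states


sample = """\
Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State
3::3 100 13 10 0 2 00:00:05 Established
"""
print(bgp_peer_states(sample).get("3::3") == "Established")  # True
```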

4. Execute the display bgp l2vpn evpn command on the local PE device to verify that it has learned the BGP EVPN routes from the remote PE device. Pay special attention to the PrefixSID field for each route in the command output. In this field, the End.DT4 SID represents the SRv6 SID assigned by the remote PE to the VPN private IP address. When forwarding traffic to that private address in SRv6 BE mode, the local PE device uses that End.DT4 SID as the destination address in the IPv6 packets.

<Sysname> display bgp l2vpn evpn [5][0][32][20.20.20.20]/80

BGP local router ID: 1.1.1.1

Local AS number: 100

Route distinguisher: 100:1(vpn1)

Total number of routes: 1

Paths: 1 available, 1 best

BGP routing table information of [5][0][32][20.20.20.20]/80:

From : 3::3 (3.3.3.3)

Rely nexthop : FE80::A2C3:E2FF:FEB5:306

Original nexthop: 3::3

Out interface :

Route age : 00h28m51s

OutLabel : 3

Ext-Community : <RT: 100:1>

RxPathID : 0x0

TxPathID : 0x0

PrefixSID : End.DT4 SID <300:1::A>

SRv6 Service TLV (37 bytes):

Type: SRV6 L3 Service TLV (5)

Length: 34 bytes, Reserved: 0x0

SRv6 Service Information Sub-TLV (33 bytes):

Type: 1 Length: 30, Rsvd1: 0x0

SID Flags: 0x0 Endpoint behavior: 0x13 Rsvd2: 0x0

SRv6 SID Sub-Sub-TLV:

Type: 1 Len: 6

BL: 64 NL: 0 FL: 64 AL: 0 TL: 0 TO: 0

AS-path : 300

Origin : incomplete

Attribute value : MED 0, localpref 100, pref-val 0

State : valid, internal, best

Source type : local

IP precedence : N/A

QoS local ID : N/A

Traffic index : N/A

EVPN route type : IP prefix advertisement route

ESI : 0000.0000.0000.0000.0000

Ethernet tag ID : 0

IP prefix : 20.20.20.20/32

Gateway address : 0.0.0.0

MPLS label : 3

Tunnel policy : NULL

Rely tunnel IDs : N/A

Re-orignination : Disable
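One sanity check on the PrefixSID attribute is to confirm that the End.DT4 SID falls inside the remote PE's locator, which is 300:1::/64 for PE 2 in Table 16. The following is a minimal Python sketch using the standard ipaddress module. The helper names are hypothetical.

```python
import ipaddress
import re
from typing import Optional


def end_dt4_sid(route_output: str) -> Optional[ipaddress.IPv6Address]:
    """Extract the SID from a line such as 'PrefixSID : End.DT4 SID <300:1::A>'."""
    match = re.search(r"End\.DT4 SID\s*<([0-9A-Fa-f:]+)>", route_output)
    return ipaddress.IPv6Address(match.group(1)) if match else None


def sid_in_locator(sid: ipaddress.IPv6Address, locator: str) -> bool:
    """True if the SID falls inside the given locator prefix, for example 300:1::/64."""
    return sid in ipaddress.IPv6Network(locator)


sid = end_dt4_sid("PrefixSID : End.DT4 SID <300:1::A>")
print(sid, sid is not None and sid_in_locator(sid, "300:1::/64"))  # 300:1::a True
```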

If the local PE device has learned a BGP EVPN route to the remote destination address and the route carries the correct PrefixSID attribute, execute the display bgp routing-table ipv4 vpn-instance command on the local PE device. Verify that the route has been added to the BGP routing table of the VPN instance and that the route is both valid and the best. Only valid, best BGP routes can be installed in a VPN routing table.

<Sysname> display bgp routing-table ipv4 vpn-instance vpn1

Total number of routes: 8

BGP local router ID is 1.1.1.1

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Network NextHop MED LocPrf PrefVal Path/Ogn

* > 10.1.1.0/24 10.1.1.2 0 32768 ?

* e 10.1.1.1 0 0 200?

* > 10.1.1.2/32 127.0.0.1 0 32768 ?

* >e 10.10.10.10/32 10.1.1.1 0 0 200?

* >e 11.11.11.11/32 10.1.1.1 0 0 200?

* >i 20.1.1.0/24 3::3 0 100 0 ?

* >i 20.20.20.20/32 3::3 0 100 0 300?

* >i 22.22.22.22/32 3::3 0 100 0 300?
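The valid-and-best check can be applied to the status codes column. The following is a minimal Python sketch. It assumes that rows without a prefix are additional paths for the prefix on the preceding row, as with the "* e 10.1.1.1" row above. The helper name best_prefixes is hypothetical.

```python
from __future__ import annotations

import re

# A valid row starts with "*"; the status codes (">" = best) sit before the first field.
ROW = re.compile(r"^\*\s*([>a-zA-Z]*)\s+(\S+)")
PREFIX = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}/\d{1,2}$")


def best_prefixes(output: str) -> set[str]:
    """Prefixes that have a valid (*) and best (>) path in the BGP table output."""
    best: set[str] = set()
    current = None
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        codes, first = match.groups()
        if PREFIX.match(first):
            current = first  # a new prefix starts on this row
        # Rows without a prefix are further paths for the current prefix.
        if ">" in codes and current is not None:
            best.add(current)
    return best


sample = """\
* > 10.1.1.0/24 10.1.1.2 0 32768 ?
* e 10.1.1.1 0 0 200?
* >i 20.20.20.20/32 3::3 0 100 0 300?
* i 22.22.22.22/32 3::3 0 100 0 300?
"""
print(sorted({"20.20.20.20/32", "22.22.22.22/32"} - best_prefixes(sample)))
# ['22.22.22.22/32']
```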

If the local PE device has failed to learn a valid and optimal BGP EVPN route, or if the PrefixSID attribute is missing from the BGP EVPN route, proceed to check for incomplete EVPN L3VPN over SRv6 configuration. An incomplete configuration might result in failure to allocate SRv6 SIDs or to establish an SRv6 tunnel.

5. Verify that the configuration for EVPN L3VPN over SRv6 on both PE devices is complete. If the configuration is incomplete, add the missing configuration. If the configuration is complete, proceed to step 6.

Execute the display current-configuration command on both PE devices to check for the following configuration items. If they are missing, see EVPN L3VPN over SRv6 configuration in Segment Routing Configuration Guide to add the missing configuration items.

isis 1

cost-style wide-compatible

address-family ipv6 unicast

segment-routing ipv6 locator aaa //Enable IS-IS to advertise the specified locator and the SRv6 SIDs in the locator.

bgp 100

peer 3::3 as-number 100

address-family l2vpn evpn

peer 3::3 enable

peer 3::3 advertise encap-type srv6 //Advertise SRv6-encapsulated EVPN routes with the PrefixSID attribute to the peer or peer group.

ip vpn-instance vpn1

address-family ipv4 unicast

segment-routing ipv6 best-effort evpn //Steer route matching traffic to an SRv6 BE tunnel.

segment-routing ipv6 locator aaa evpn //Apply the locator to the BGP process so the device can use the locator to allocate SRv6 SIDs for the private network routes in the specified VPN instance.

segment-routing ipv6

encapsulation source-address 11::11 //Specify the source address for the outer IPv6 header of SRv6 VPN packets.

locator aaa ipv6-prefix 300:1:: 64 static 8 //Create a Locator segment.
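A first-pass presence check of these items can be run against saved display current-configuration output. The following is a minimal Python sketch. The patterns assume each command sits on its own line, as in the listing above. The check confirms only that the commands are present; it does not verify that they are in the correct view or have consistent values. CHECKS and missing_items are hypothetical names.

```python
from __future__ import annotations

import re

# Each entry: (description, pattern that must match a line of the configuration).
CHECKS = [
    ("IS-IS advertises the SRv6 locator",
     r"^[ \t]*segment-routing ipv6 locator \S+[ \t]*$"),
    ("BGP EVPN peer advertises SRv6-encapsulated routes",
     r"^[ \t]*peer \S+ advertise encap-type srv6\b"),
    ("VPN traffic is steered to SRv6 BE",
     r"^[ \t]*segment-routing ipv6 best-effort evpn\b"),
    ("Locator is applied to the VPN instance for EVPN",
     r"^[ \t]*segment-routing ipv6 locator \S+ evpn\b"),
    ("Source address for the outer IPv6 header",
     r"^[ \t]*encapsulation source-address \S+"),
    ("Locator is defined",
     r"^[ \t]*locator \S+ ipv6-prefix \S+ \d+"),
]


def missing_items(config: str) -> list[str]:
    """Return descriptions of the checked commands that do not appear in the config."""
    return [desc for desc, pattern in CHECKS if not re.search(pattern, config, re.MULTILINE)]


config = """\
isis 1
 cost-style wide-compatible
 address-family ipv6 unicast
  segment-routing ipv6 locator aaa
bgp 100
 peer 3::3 as-number 100
 address-family l2vpn evpn
  peer 3::3 enable
  peer 3::3 advertise encap-type srv6
 ip vpn-instance vpn1
  address-family ipv4 unicast
   segment-routing ipv6 best-effort evpn
   segment-routing ipv6 locator aaa evpn
segment-routing ipv6
 encapsulation source-address 11::11
 locator aaa ipv6-prefix 300:1:: 64 static 8
"""
print(missing_items(config))  # []
```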

If the SRv6 configuration is complete and correct, proceed to check for unreachable SRv6 SIDs on the forwarding path.

6. Execute the display ipv6 routing-table ipv6-address command on all devices along the forwarding path to check for routes to the SRv6 SIDs on both PEs.

<Sysname> display ipv6 routing-table 300:1::A

Summary count : 2

Destination: 300:1::/64 Protocol : IS_L1

NextHop : FE80::A2C3:E2FF:FEB5:306 Preference: 15

Interface : Cost : 20

Execute the ping ipv6 command on all devices to verify the connectivity to the SRv6 SIDs.

<Sysname> ping ipv6 300:1::A

Ping6(56 data bytes) 1001::1 --> 300:1::A, press CTRL+C to break

56 bytes from 300:1::A, icmp_seq=0 hlim=63 time=2.000 ms

56 bytes from 300:1::A, icmp_seq=1 hlim=63 time=1.000 ms

56 bytes from 300:1::A, icmp_seq=2 hlim=63 time=0.000 ms

56 bytes from 300:1::A, icmp_seq=3 hlim=63 time=1.000 ms

56 bytes from 300:1::A, icmp_seq=4 hlim=63 time=0.000 ms

--- Ping6 statistics for 300:1::A ---

5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss

round-trip min/avg/max/std-dev = 0.000/0.800/2.000/0.748 ms

If a route to the remote SRv6 SID exists and the ping test succeeds, proceed to step 7. If no route to the SRv6 SID exists or the ping test fails, the IGP on the PE devices might have failed to advertise the locator prefix that contains the SID to other devices in the domain. In this case, see Layer 3—IP Routing Troubleshooting Guide to resolve the IGP route advertisement issue.

7. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

EVPN L3VPN over SRv6 TE traffic forwarding failure

Symptom

On an EVPN L3VPN over SRv6 network shown in Figure 99, traffic forwarding failure occurs when the PEs use the SRv6 TE mode to forward the private service traffic between CE 1 and CE 2 in VPN 1 through an SRv6 TE policy.

The troubleshooting flow is the same for IPv4 and IPv6 private networks. The following information uses an IPv4 private network as an example to describe the troubleshooting procedure for EVPN L3VPN over SRv6.

Figure 99 Network diagram


Table 17 shows the key network planning information for the EVPN L3VPN over SRv6 network.

Table 17 SRv6 locators and major addresses in the address plan

Device	Interface or locator	Address
PE 1	SRv6 locator	100:1::/64
	Loopback0	1::1/128
P	SRv6 locator	200:1::/64
	Loopback0	2::2/128
PE 2	SRv6 locator	300:1::/64
	Loopback0	3::3/128
CE 1	Loopback0	10.10.10.10/32
	Loopback1	11.11.11.11/32
CE 2	Loopback0	20.20.20.20/32
	Loopback1	22.22.22.22/32

Common causes

The following are the common causes of this type of issue:

· The PEs cannot learn VPN routes because the BGP EVPN peer relationship fails to be established.

· VPN routes are not recursed in SRv6 TE mode.

· The SRv6 TE policy for the EVPN route is down.

· The routes for the SRv6 SIDs are unreachable.

Troubleshooting flow

Figure 100 shows the troubleshooting flowchart.

Figure 100 Flowchart for troubleshooting EVPN L3VPN over SRv6 TE traffic forwarding failure

Solution

1. From the local PE, ping the private IP address on the remote PE to check connectivity. Specify the name of the VPN instance to which the private IP address belongs.

<Sysname> ping -vpn-instance vpn1 20.20.20.20

Ping 20.20.20.20 (20.20.20.20): 56 data bytes, press CTRL+C to break

56 bytes from 20.20.20.20: icmp_seq=0 ttl=254 time=2.000 ms

56 bytes from 20.20.20.20: icmp_seq=1 ttl=254 time=1.000 ms

56 bytes from 20.20.20.20: icmp_seq=2 ttl=254 time=1.000 ms

56 bytes from 20.20.20.20: icmp_seq=3 ttl=254 time=1.000 ms

56 bytes from 20.20.20.20: icmp_seq=4 ttl=254 time=2.000 ms

--- Ping statistics for 20.20.20.20 in VPN instance vpn1 ---

5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss

round-trip min/avg/max/std-dev = 1.000/1.400/2.000/0.490 ms

If the ping test fails, the private network service is unavailable. Proceed to the next step.

2. Perform the following checks on both the local and remote PE devices. This document uses the local PE as an example. Execute the display ip routing-table vpn-instance vpn1 command on the local PE to check the VPN routing table for routes to the private IP addresses. Verify that the outgoing interface of the route to the remote private IP address is the expected SRv6 TE policy.

<Sysname> display ip routing-table vpn-instance vpn1

Destinations : 10 Routes : 10

Destination/Mask Proto Pre Cost NextHop Interface

10.1.1.0/24 Direct 0 0 10.1.1.2

10.1.1.2/32 Direct 0 0 127.0.0.1

10.1.1.255/32 Direct 0 0 10.1.1.2

10.10.10.10/32 BGP 255 0 10.1.1.1

11.11.11.11/32 BGP 255 0 10.1.1.1

20.1.1.0/24 BGP 255 0 3::3 p1

20.20.20.20/32 BGP 255 0 3::3 p1

22.22.22.22/32 BGP 255 0 3::3 p1

127.0.0.0/8 Direct 0 0 127.0.0.1 InLoop0

255.255.255.255/32 Direct 0 0 127.0.0.1 InLoop0

If the device has routes to the private IP addresses in the VPN instance, verify that the FIB contains entries for these addresses with the U flag in the Flag field. The U flag indicates that the entry is valid (usable). Verify that the OutInterface/Token field of the entry for the remote private IP address contains the forwarding index of the expected SRv6 TE policy.

<Sysname> display fib vpn-instance vpn1

Route destination count: 10

Directly-connected host count: 1

Flag:

U:Usable G:Gateway H:Host B:Blackhole D:Dynamic S:Static

R:Relay F:FRR

Destination/Mask Nexthop Flag OutInterface/Token Label

11.11.11.11/32 10.1.1.1 UGHR Null

127.0.0.0/8 127.0.0.1 U InLoop0 Null

10.1.1.0/24 10.1.1.2 U Null

20.20.20.20/32 3::3 UGHR 2150629377 Null

10.1.1.2/32 127.0.0.1 UH Null

22.22.22.22/32 3::3 UGHR 2150629377 Null

10.1.1.255/32 10.1.1.2 UBH Null

255.255.255.255/32 127.0.0.1 UH InLoop0 Null

10.10.10.10/32 10.1.1.1 UGHR Null

10.1.1.1/32 10.1.1.1 UH Null

20.1.1.0/24 3::3 UGR 2150629377 Null

If either of the following conditions exists, check the PEs for BGP EVPN route learning issues and verify that a valid SRv6 TE policy exists between them:

¡ The VPN routing table or VPN FIB does not contain a valid entry for a private IP address.

¡ The entry does not use the expected SRv6 TE policy as the outgoing interface.
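The two FIB checks in this step (U flag present, expected forwarding index in the OutInterface/Token field) can also be applied to saved command output. The following Python sketch is a hypothetical helper, not an H3C tool. It assumes the column layout of the display fib vpn-instance sample output in this step and treats a row with only four columns as having an empty OutInterface/Token field. The expected token, 2150629377, is the forwarding index of SRv6 TE policy p1 in this example.

```python
import ipaddress

# Excerpt of "display fib vpn-instance vpn1" output from this example.
SAMPLE_FIB = """\
Destination/Mask Nexthop Flag OutInterface/Token Label
11.11.11.11/32 10.1.1.1 UGHR Null
20.20.20.20/32 3::3 UGHR 2150629377 Null
20.1.1.0/24 3::3 UGR 2150629377 Null
"""


def parse_fib(output: str) -> dict:
    """Map each destination prefix to (flags, OutInterface/Token or None)."""
    entries = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        try:
            ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # header, flag legend, or summary line
        # A row with only four columns has no OutInterface/Token value.
        token = fields[3] if len(fields) >= 5 else None
        entries[fields[0]] = (fields[2], token)
    return entries


def check_fib_entry(entries: dict, destination: str, expected_token: str) -> str:
    """Apply the U-flag and forwarding-index checks to one destination."""
    if destination not in entries:
        return "no FIB entry"
    flags, token = entries[destination]
    if "U" not in flags:
        return "entry not usable (no U flag)"
    if token != expected_token:
        return "outgoing token is not the expected SRv6 TE policy"
    return "ok"


fib = parse_fib(SAMPLE_FIB)
print(check_fib_entry(fib, "20.20.20.20/32", "2150629377"))  # ok
```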

3. Execute the display bgp peer l2vpn evpn command on the local PE device to verify that it has established a BGP EVPN peer relationship with the remote PE device.

¡ If the PE devices have established a BGP EVPN peer relationship successfully, the State field in the command output displays Established. Proceed to step 4.

¡ If the PE devices have not established a BGP EVPN peer relationship, see the troubleshooting procedure for BGP peer establishment issues.

<Sysname> display bgp peer l2vpn evpn

BGP local router ID: 1.1.1.1

Local AS number: 100

Total number of peers: 1 Peers in established state: 1

* - Dynamically created peer

Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State

3::3 100 145 145 0 3 01:56:37 Established

4. Execute the display bgp l2vpn evpn command on the local PE to verify that it has learned the BGP EVPN routes from the remote PE. Pay special attention to the PrefixSID field of each route in the command output. The End.DT4 SID in this field is the SRv6 SID that the remote PE assigned to the private network route. When forwarding traffic to that private address in SRv6 TE mode, the local PE encapsulates the SID list of the SRv6 TE policy in the SRH extension header and adds the End.DT4 SID as the last SID.

<Sysname> display bgp l2vpn evpn [5][0][32][20.20.20.20]/80

BGP local router ID: 1.1.1.1

Local AS number: 100

Route distinguisher: 100:1(vpn1)

Total number of routes: 1

Paths: 1 available, 1 best

BGP routing table information of [5][0][32][20.20.20.20]/80:

From : 3::3 (3.3.3.3)

Rely nexthop : FE80::A2C3:E2FF:FEB5:306

Original nexthop: 3::3

Out interface :

Route age : 00h28m51s

OutLabel : 3

Ext-Community : <RT: 100:1>

RxPathID : 0x0

TxPathID : 0x0

PrefixSID : End.DT4 SID <300:1::A>

SRv6 Service TLV (37 bytes):

Type: SRV6 L3 Service TLV (5)

Length: 34 bytes, Reserved: 0x0

SRv6 Service Information Sub-TLV (33 bytes):

Type: 1 Length: 30, Rsvd1: 0x0

SID Flags: 0x0 Endpoint behavior: 0x13 Rsvd2: 0x0

SRv6 SID Sub-Sub-TLV:

Type: 1 Len: 6

BL: 64 NL: 0 FL: 64 AL: 0 TL: 0 TO: 0

AS-path : 300

Origin : incomplete

Attribute value : MED 0, localpref 100, pref-val 0

State : valid, internal, best

Source type : local

IP precedence : N/A

QoS local ID : N/A

Traffic index : N/A

EVPN route type : IP prefix advertisement route

ESI : 0000.0000.0000.0000.0000

Ethernet tag ID : 0

IP prefix : 20.20.20.20/32

Gateway address : 0.0.0.0

MPLS label : 3

Tunnel policy : NULL

Rely tunnel IDs : N/A

Re-orignination : Disable
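The SID ordering described in this step can be illustrated with a short Python sketch. It assumes standard (non-reduced) SRH encoding as defined in RFC 8754: the segment list is stored in reverse order, Segment List[0] holds the final segment, and Segments Left indexes the active segment, which is also the outer IPv6 destination address. Whether a device omits the first SID (reduced mode) depends on its configuration and is not modeled. The SID list 200:1::1, 300:1::1 is taken from the segment list s1 output in step 7, 300:1::A is the End.DT4 SID shown in the PrefixSID field of this output, and the function name is hypothetical.

```python
import ipaddress


def srh_encapsulation(te_sid_list, vpn_sid):
    """Return (outer IPv6 DA, Segments Left, SRH Segment List).

    Assumes non-reduced SRH encoding: the VPN SID (End.DT4) is appended
    after the SRv6 TE policy SID list as the final segment.
    """
    path = [ipaddress.IPv6Address(s) for s in te_sid_list]
    path.append(ipaddress.IPv6Address(vpn_sid))
    segment_list = path[::-1]               # SRH stores segments in reverse order
    segments_left = len(segment_list) - 1   # index of the first segment to visit
    outer_da = segment_list[segments_left]
    return outer_da, segments_left, segment_list


da, left, segs = srh_encapsulation(["200:1::1", "300:1::1"], "300:1::A")
print(da, left, [str(s) for s in segs])
# 200:1::1 2 ['300:1::a', '300:1::1', '200:1::1']
```

The packet is first sent to 200:1::1 (the P device), and the End.DT4 SID is processed last, on the remote PE.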

If the local PE has learned a BGP EVPN route to the remote destination address with a correct PrefixSID attribute, execute the display bgp routing-table ipv4 vpn-instance command on the local PE to verify that the route has been added to the BGP VPN routing table. Verify that the route is both valid and best. Only valid, best BGP routes are added to a VPN routing table.

<Sysname> display bgp routing-table ipv4 vpn-instance vpn1 20.20.20.20

BGP local router ID: 1.1.1.1

Local AS number: 100

Paths: 1 available, 1 best

BGP routing table information of 20.20.20.20/32:

From : 3::3 (3.3.3.3)

Rely nexthop : FE80::A2C3:E2FF:FEB5:306

Original nexthop: 3::3

Out interface :

Route age : 02h03m22s

OutLabel : 3

Ext-Community : <RT: 100:1>

RxPathID : 0x0

TxPathID : 0x0

PrefixSID : End.DT4 SID <300:1::A>

SRv6 Service TLV (37 bytes):

Type: SRV6 L3 Service TLV (5)

Length: 34 bytes, Reserved: 0x0

SRv6 Service Information Sub-TLV (33 bytes):

Type: 1 Length: 30, Rsvd1: 0x0

SID Flags: 0x0 Endpoint behavior: 0x13 Rsvd2: 0x0

SRv6 SID Sub-Sub-TLV:

Type: 1 Len: 6

BL: 64 NL: 0 FL: 64 AL: 0 TL: 0 TO: 0

AS-path : 300

Origin : incomplete

Attribute value : MED 0, localpref 100, pref-val 0

State : valid, internal, best, remoteredist

Source type : evpn remote-import

IP precedence : N/A

QoS local ID : N/A

Traffic index : N/A

Tunnel policy : a

Rely tunnel IDs : 2150629377

If the local PE has failed to learn a valid, best BGP EVPN route, or the PrefixSID attribute is missing from the route, check for the following issues:

¡ Incorrect recursive routing configuration for EVPN L3VPN over SRv6.

¡ SRv6 TE policy issues.

5. In BGP-VPN IPv4 unicast address family view, execute the display this command to verify that the current configuration includes the segment-routing ipv6 traffic-engineering evpn or segment-routing ipv6 traffic-engineering best-effort evpn command. If neither command is present, configure one of them. If either command is present, proceed to step 6.

<Sysname> system-view

[Sysname] bgp 100

[Sysname-bgp-default] ip vpn-instance vpn1

[Sysname-bgp-default-vpn1] address-family ipv4 unicast

[Sysname-bgp-default-ipv4-vpn1] display this

segment-routing ipv6 locator aaa evpn

segment-routing ipv6 traffic-engineering evpn

If the recursive routing configuration is correct, verify that the basic SRv6 configuration and the configuration for steering traffic to SRv6 TE policies are correct. For more information, see the SRv6 TE policy configuration in the segment routing configuration guide for the product. If all settings are correct, proceed to the next step.

6. On each PE, verify that the SRv6 TE policy used by the route to the remote destination IP address is valid. Execute the display segment-routing ipv6 te policy command and identify the SRv6 TE policy whose Forwarding index value equals the Rely tunnel IDs value displayed for the route by the display bgp routing-table ipv4 vpn-instance command. Check the Status field of that policy. If the policy is up, proceed to the next step. If the policy is down, see the SRv6 TE policy down troubleshooting procedure to resolve the issue.

<Sysname> display segment-routing ipv6 te policy

Name/ID: p1/0

Color: 10

Endpoint: 1000::1

Name from BGP:

BSID:

Mode: Dynamic Type: Type 2 Request state: Succeeded

Current BSID: 8000::1 Explicit BSID: - Dynamic BSID: 8000::1

Reference counts: 3

Flags: A/BS/NC

Status: Up

AdminStatus: Up

Up time: 2020-03-09 16:09:40

Down time: 2020-03-09 16:09:13

Hot backup: Enabled

Statistics: Enabled

Statistics by service class: Enabled

Path verification: Enabled

Drop-upon-invalid: Enabled

BFD trigger path-down: Enabled

SBFD: Enabled

Remote: 1000

SBFD template name: abc

SBFD backup template name: -

OAM SID: -

BFD Echo: Disabled

Forwarding index: 2150629377

…
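Matching the Rely tunnel IDs value of a route to an SRv6 TE policy can be scripted against saved output. The following Python sketch is hypothetical. It assumes one "Key: value" pair at the start of each line, as in the sample output, and it ignores the Forwarding index values of individual SID lists, which follow a Weight field on the same line. Policy p2 in the sample string is invented for illustration.

```python
# Abbreviated "display segment-routing ipv6 te policy" output.
# Policy p1 is from this example; p2 is hypothetical.
SAMPLE_POLICIES = """\
Name/ID: p1/0
Color: 10
Status: Up
AdminStatus: Up
Forwarding index: 2150629377
Name/ID: p2/1
Color: 20
Status: Down
AdminStatus: Up
Forwarding index: 2150629378
"""


def policies_by_forwarding_index(output: str) -> dict:
    """Map each policy forwarding index to {'name': ..., 'status': ...}."""
    policies, current = {}, None
    for raw in output.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        value = value.strip()
        if key == "Name/ID":
            current = {"name": value.split("/")[0], "status": None}
        elif current is not None and key == "Status":  # not AdminStatus
            current["status"] = value
        elif current is not None and key == "Forwarding index" and value.isdigit():
            policies[int(value)] = current
    return policies


def policy_for_route(output: str, rely_tunnel_id: int):
    """Return (policy name, Status) for a route's Rely tunnel IDs value."""
    policy = policies_by_forwarding_index(output).get(rely_tunnel_id)
    if policy is None:
        return None, None
    return policy["name"], policy["status"]


print(policy_for_route(SAMPLE_POLICIES, 2150629377))  # ('p1', 'Up')
```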

Execute the ping srv6-te policy command to verify that the SRv6 TE policy has a valid path to the destination IP address. If BFD or SBFD is not configured to monitor the connectivity of the SRv6 TE policy, the up state of the policy indicates only that the first hop of the policy is reachable. Perform this step to verify that all SIDs in the forwarding path are reachable.

<Sysname> ping srv6-te policy policy-name p1

Ping SRv6-TE policy (56 data bytes) , press CTRL+C to break

Segment list ID: 1

Preference=10, Path Type=Main, Protocol origin=Local, Originator=0,0.0.0.0, Discriminator=10, End.OP=none

56 bytes from 300:1::1, icmp_seq=0 ttl=63 time=2.000 ms

56 bytes from 300:1::1, icmp_seq=1 ttl=63 time=1.000 ms

56 bytes from 300:1::1, icmp_seq=2 ttl=63 time=0.000 ms

56 bytes from 300:1::1, icmp_seq=3 ttl=63 time=0.000 ms

56 bytes from 300:1::1, icmp_seq=4 ttl=63 time=0.000 ms

--- Ping6 SRv6-TE Policy statistics ---

5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss

round-trip min/avg/max/std-dev = 0.000/0.600/2.000/0.800 ms

If the ping srv6-te policy command output shows that the SRv6 TE policy is not reachable, proceed to the next step.

7. Execute the display segment-routing ipv6 te segment-list command to identify the SIDs in the SID list of the best candidate path in the SRv6 TE policy.

<Sysname> display segment-routing ipv6 te segment-list

Total Segment lists: 2

Name/ID: s1/1

Origin: CLI

Status: Up

Nodes : 2

Flags: None

Local BSID: -

Reverse BSID: -

Reference counts: 0

Index : 10 SID: 200:1::1

Status : Up TopoStatus: Nonexistent

Type : Type_2 Flags: None

Coc Type : - Common prefix length: 0

Function length : 0 Args length: 0

Endpoint Behavior : -

Index : 20 SID: 300:1::1

Status : - TopoStatus: -

Type : Type_2 Flags: None

Coc Type : - Common prefix length: 0

Function length : 0 Args length: 0

Endpoint Behavior : -

Execute the display ipv6 routing-table ipv6-address command on each device in the forwarding path to verify that they have a valid route to each SRv6 SID, including the End.DT4 SID.

<Sysname> display ipv6 routing-table 300:1::A

Summary count : 2

Destination: 300:1::/64 Protocol : IS_L1

NextHop : FE80::A2C3:E2FF:FEB5:306 Preference: 15

Interface : Cost : 20

Execute the ping ipv6 command on each device in the forwarding path to verify the connectivity to the SRv6 SID.

<Sysname> ping ipv6 300:1::A

Ping6(56 data bytes) 1001::1 --> 300:1::A, press CTRL+C to break

56 bytes from 300:1::A, icmp_seq=0 hlim=63 time=2.000 ms

56 bytes from 300:1::A, icmp_seq=1 hlim=63 time=1.000 ms

56 bytes from 300:1::A, icmp_seq=2 hlim=63 time=0.000 ms

56 bytes from 300:1::A, icmp_seq=3 hlim=63 time=1.000 ms

56 bytes from 300:1::A, icmp_seq=4 hlim=63 time=0.000 ms

--- Ping6 statistics for 300:1::A ---

5 packet(s) transmitted, 5 packet(s) received, 0.0% packet loss

round-trip min/avg/max/std-dev = 0.000/0.800/2.000/0.748 ms

If a SID is unreachable, check the SID list configuration of the SRv6 TE policy on the source node for SID list orchestration errors. If the SID list is correct, check whether the IGP fails to advertise the locators that contain the SIDs to other devices. For more information about troubleshooting IGP issues, see the IP routing troubleshooting procedures.
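A quick first-pass filter for this check is to test whether each SID falls inside a locator prefix that appears in the IPv6 routing table. The following Python sketch is hypothetical. It assumes the list of advertised locators has been collected from display ipv6 routing-table output on the device being checked. A SID outside every advertised locator points to an IGP advertisement problem; a SID inside an advertised locator can still be unreachable, so the ping test remains necessary.

```python
import ipaddress


def sids_outside_locators(sids, advertised_locators):
    """Return the SIDs not covered by any advertised locator prefix."""
    networks = [ipaddress.ip_network(p) for p in advertised_locators]
    return [
        sid for sid in sids
        if not any(ipaddress.ip_address(sid) in net for net in networks)
    ]


# SIDs from this example: segment list s1 plus the End.DT4 SID.
sids = ["200:1::1", "300:1::1", "300:1::A"]
# Locators from Table 17.
print(sids_outside_locators(sids, ["100:1::/64", "200:1::/64", "300:1::/64"]))  # []
print(sids_outside_locators(sids, ["100:1::/64", "200:1::/64"]))  # ['300:1::1', '300:1::A']
```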

8. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting EVPN VPWS over SRv6

Troubleshooting EVPN VPWS over SRv6 BE traffic forwarding failure

Symptom

In an EVPN VPWS over SRv6 network, traffic forwarding fails in SRv6 BE mode.

Common causes

The following are the common causes of this type of issue:

· The BGP EVPN peers are not successfully established.

· The EVPN instance configurations do not match on both ends.

· The AC interface state is not up, or the AC access modes configured on both ends are different.

· The EVPN route cannot be steered to the SRv6 BE tunnel.

Analysis

Figure 101 shows the troubleshooting flowchart.

Figure 101 Flowchart for troubleshooting EVPN VPWS over SRv6 BE traffic forwarding failure

Solution

To resolve the issue:

1. Verify that the BGP EVPN peers are successfully established.

Execute the display bgp peer l2vpn evpn command on the local PE to verify that the BGP EVPN peers have been successfully established.

¡ If the State field displays Established, the BGP EVPN peer relationship between the PEs has been established. Proceed to step 2.

¡ If the State field does not display Established, resolve the BGP EVPN peer establishment failure. For more information, see the troubleshooting procedure for BGP peer establishment failure.

<PE1> display bgp peer l2vpn evpn

BGP local router ID: 1.1.1.1

Local AS number: 100

Total number of peers: 1 Peers in established state: 1

* - Dynamically created peer

Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State

2::2 100 13 10 0 2 00:00:05 Established

2. Verify that the EVPN VPWS over SRv6 configurations on both PEs match.

In the EVPN VPWS over SRv6 network, the Route Target and Service ID settings on both ends must match. The encapsulation method used by EVPN must be SRv6. In addition, the MTU, control word, and SRv6 PW data encapsulation type settings must be the same.

Verify that the configurations on both PEs match by following these steps:

a. Execute the display this command in cross-connect group view on both PEs to check the EVPN encapsulation method and RTs. If the encapsulation method is not SRv6, execute the evpn encapsulation srv6 command in cross-connect group view to change it. If no export RT of one PE matches an import RT of the other PE, execute the vpn-target command in cross-connect group EVPN instance view to edit the RTs so that they match.

<PE1> system-view

[PE1] xconnect-group vpna

[PE1-xcg-vpna] display this

xconnect-group vpna

evpn encapsulation srv6

route-distinguisher 1:1

vpn-target 1:1 export-extcommunity

vpn-target 1:1 import-extcommunity

connection abc

segment-routing ipv6 locator aaa

evpn local-service-id 1 remote-service-id 2

ac interface GigabitEthernet 2/0/1

return

b. Execute the display evpn route xconnect-group command on both PEs to view the service IDs, MTU, SRv6 PW data encapsulation type, and control word settings.

- Service ID: The local service ID on one PE must be the same as the remote service ID on the other PE. Otherwise, the SRv6 PW cannot be established. To correct a mismatch, execute the evpn local-service-id remote-service-id command in cross-connect view on the PE to edit the local or remote service ID.

- MTU: Check the local MTU in the Local MTU field. If the MTUs on the two ends differ, execute the mtu command in cross-connect view to edit the MTU. An MTU of 0 on one PE matches any MTU on the remote PE, so no change is needed in that case.

- SRv6 PW data encapsulation type: Check the local SRv6 PW data encapsulation type in the PW type field. If the types on the two ends differ, use the srv6-pw-type command in the PW template specified for the SRv6 PW to edit the data encapsulation type.

- Control word: The control word settings on both PEs must be identical. If the Flags field does not include C, the control word feature is disabled. Otherwise, it is enabled. If the settings on the two PEs differ, use the control-word enable command in the PW template specified for the SRv6 PW to modify the control word configuration.

<PE1> display evpn route xconnect-group

Ctrl Flags: P - Primary, B - Backup, C - Control word

Xconnect group name: vpna

Connection name: pw1

Encapsulation : SRv6

ESI : 0000.0000.0000.0000.0000

Local service ID : 1

Remote service ID : 2

In SID[DX2] : 100::1:0:2

In SID[DX2L] : -

Local MTU : 1500

AC State : Up

Tunnel policy : -

PW class : -

PW type : Ethernet

SRv6 Tunnel:

Next Hop : 2::2

ESI : 0000.0000.0000.0000.0000

Out SID : 200::1:0:2

Flags : P

MTU : 1500

State : Up
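The matching rules in this step can be summarized as a comparison of two sets of settings. The following Python sketch is hypothetical and works on values you have already read from display this and display evpn route xconnect-group; it does not query the devices. The PE2 values are assumptions that mirror PE1, because this guide shows only the PE1 output. The RT rule is modeled as "at least one export RT of each PE is an import RT of the other PE".

```python
from dataclasses import dataclass


@dataclass
class PeSettings:
    """Settings read from display this and display evpn route xconnect-group."""
    name: str
    encapsulation: str
    export_rts: frozenset
    import_rts: frozenset
    local_service_id: int
    remote_service_id: int
    mtu: int            # Local MTU field; 0 matches any remote MTU
    pw_type: str        # PW type field
    control_word: bool  # True if the Flags field includes C


def mismatches(a: PeSettings, b: PeSettings) -> list:
    """List the matching rules from this step that a and b violate."""
    problems = []
    for pe in (a, b):
        if pe.encapsulation != "SRv6":
            problems.append(f"{pe.name}: encapsulation is {pe.encapsulation}")
    for src, dst in ((a, b), (b, a)):
        if not src.export_rts & dst.import_rts:
            problems.append(f"{dst.name} imports no export RT of {src.name}")
        if src.local_service_id != dst.remote_service_id:
            problems.append(f"{src.name} local service ID != {dst.name} remote service ID")
    if a.mtu and b.mtu and a.mtu != b.mtu:
        problems.append("MTU mismatch")
    if a.pw_type != b.pw_type:
        problems.append("PW type mismatch")
    if a.control_word != b.control_word:
        problems.append("control word mismatch")
    return problems


pe1 = PeSettings("PE1", "SRv6", frozenset({"1:1"}), frozenset({"1:1"}), 1, 2, 1500, "Ethernet", False)
pe2 = PeSettings("PE2", "SRv6", frozenset({"1:1"}), frozenset({"1:1"}), 2, 1, 0, "Ethernet", False)
print(mismatches(pe1, pe2))  # []
```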

If the settings on both PEs match but the issue persists, proceed to the next step.

3. Verify that the AC interface is up.

Execute the display evpn route xconnect-group command on the PE to view the state of the AC. If the AC is in down state, check the network connection and resolve the physical link down issue.

<PE1> display evpn route xconnect-group

Ctrl Flags: P - Primary, B - Backup, C - Control word

Xconnect group name: vpna

Connection name: pw1

Encapsulation : SRv6

ESI : 0000.0000.0000.0000.0000

Local service ID : 1

Remote service ID : 2

In SID[DX2] : 100::1:0:2

In SID[DX2L] : -

Local MTU : 1500

AC State : Up

SRv6 Tunnel:

Next Hop : 2::2

ESI : 0000.0000.0000.0000.0000

Out SID : 200::1:0:2

Flags : P

MTU : 1500

State : Up

4. Verify that the AC access modes on both PEs are consistent.

Execute the display l2vpn forwarding ac verbose command on both PEs to check the AC access mode. If the two ends use different AC access modes, traffic forwarding might fail. To change the AC access mode, use the access-mode keyword of the ac interface command in cross-connect view.

<PE1> display l2vpn forwarding ac verbose

Xconnect-group Name: vpws1

Connection Name: actopw

Interface:

Link ID : 1

Access Mode : Ethernet

Interface:

Link ID : 1

Access Mode : Ethernet

Reflector :

IP Address : 100.1.1.4

MAC Address : 8850-fc51-5cee

Src Port : 200

Dst Port : 201

5. Verify that the EVPN route is steered to the SRv6 BE tunnel.

Execute the display l2vpn peer srv6 verbose command on the PE to examine the SRv6 BE tunnel to which the EVPN route is steered.

<PE1> display l2vpn peer srv6 verbose

Xconnect-group Name: vpna

Connection Name: pw1

Peer: 2::2

Remote Service ID : 2

Signaling Protocol : EVPN

Link ID : 0x1

Sub Link ID : 0x0

SRv6 Tunnel State : Up

In SID : 100::1:0:2

Out SID : 200::1:0:2

MTU : 1500

SRv6 Tunnel Attributes : Main

Tunnel Group ID : 0x1000000030080000

Tunnel NHLFE IDs : -

Nexthop/Interface : FE80::7A6F:24FF:FE26:306 / GE2/0/2

Color : -

Color-Only : -

Recursion Mode : SID based

¡ If the Nexthop/Interface field has a value, the EVPN route is steered to the SRv6 BE tunnel. Execute the display ipv6 fib command to verify that the forwarding information for the next hop address is correct. If it is not correct, contact Technical Support.

<PE1> display ipv6 fib FE80::7A6F:24FF:FE26:306

FIB entry count: 1

Flag:

U:Usable G:Gateway H:Host B:Blackhole D:Dynamic S:Static

R:Relay F:FRR

Destination: FE80:: Prefix length: 10

Nexthop : :: Flags: U

Time stamp : 0x1 Label: Null

Interface : InLoop0 Token: Invalid

¡ If the value of the Nexthop/Interface field is a hyphen (-), the EVPN route is not steered to the SRv6 BE tunnel. Execute the display ipv6 routing-table command to verify that a route to the SRv6 SID exists. If no such IPv6 route exists, resolve the IGP route learning issue. For more information, see the Layer 3 IP routing troubleshooting guide.

<PE1> display ipv6 routing-table 200::1:0:2

Summary count : 1

Destination: 200::/64 Protocol : O_INTRA

NextHop : FE80::7A6F:24FF:FE26:306 Preference: 10

Interface : GE2/0/2 Cost : 2
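The decision in this step depends only on the Nexthop/Interface field, so it can be sketched in a few lines. The following Python helper is hypothetical. It splits each line of saved display l2vpn peer srv6 verbose output at the first colon and assumes the output covers a single connection; with several connections, later values overwrite earlier ones. It returns the next command to run: display ipv6 fib for the next hop when the route is steered, or display ipv6 routing-table for the Out SID when it is not.

```python
# Abbreviated "display l2vpn peer srv6 verbose" output from this example.
SAMPLE_PEER = """\
Xconnect-group Name: vpna
Connection Name: pw1
Peer: 2::2
Out SID : 200::1:0:2
Tunnel NHLFE IDs : -
Nexthop/Interface : FE80::7A6F:24FF:FE26:306 / GE2/0/2
"""


def parse_fields(output: str) -> dict:
    """Split each 'Key : value' line at its first colon."""
    fields = {}
    for raw in output.splitlines():
        key, sep, value = raw.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def be_next_command(output: str) -> str:
    """Suggest the next check based on the Nexthop/Interface field."""
    fields = parse_fields(output)
    nexthop_if = fields.get("Nexthop/Interface", "-")
    if nexthop_if != "-":
        nexthop = nexthop_if.split(" / ")[0]
        return f"display ipv6 fib {nexthop}"
    return f"display ipv6 routing-table {fields.get('Out SID', '')}"


print(be_next_command(SAMPLE_PEER))  # display ipv6 fib FE80::7A6F:24FF:FE26:306
```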

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting EVPN VPWS over SRv6 TE traffic forwarding failure

Symptom

In the EVPN VPWS over SRv6 network, traffic forwarding through an SRv6 TE policy fails in SRv6 TE mode.

Common causes

The following are the common causes of this type of issue:

· The BGP EVPN peers are not successfully established.

· The EVPN instance configurations do not match on both ends.

· The AC interface state is not up, or the AC access modes configured on both ends are different.

· Traffic steering in SRv6 TE mode is not configured.

· The EVPN route cannot be steered to the SRv6 TE policy.

Analysis

Figure 102 shows the troubleshooting flowchart.

Figure 102 Flowchart for troubleshooting EVPN VPWS over SRv6 TE policy traffic forwarding failure

Solution

To resolve the issue:

1. Verify that the BGP EVPN peers are successfully established.

Execute the display bgp peer l2vpn evpn command on the local PE to verify that the BGP EVPN peers have been successfully established.

¡ If the State field displays Established, the BGP EVPN peer relationship between the PEs has been established. Proceed to step 2.

¡ If the State field does not display Established, resolve the BGP EVPN peer establishment failure. For more information, see the troubleshooting procedure for BGP peer establishment failure.

<PE1> display bgp peer l2vpn evpn

BGP local router ID: 1.1.1.1

Local AS number: 100

Total number of peers: 1 Peers in established state: 1

* - Dynamically created peer

Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State

2::2 100 13 10 0 2 00:00:05 Established

2. Verify that the EVPN VPWS over SRv6 configurations on both PEs match.

In the EVPN VPWS over SRv6 network, the Route Target and Service ID settings on both ends must match. The encapsulation method used by EVPN must be SRv6. In addition, the MTU, control word, and PW data encapsulation type settings must be the same.

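Verify that the configurations on both PEs match by following these steps:

a. Execute the display this command in cross-connect group view on both PEs to check the EVPN encapsulation method and RTs. If the encapsulation method is not SRv6, execute the evpn encapsulation srv6 command in cross-connect group view to change it. If no export RT of one PE matches an import RT of the other PE, execute the vpn-target command in cross-connect group EVPN instance view to edit the RTs so that they match.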

<PE1> system-view

[PE1] xconnect-group vpna

[PE1-xcg-vpna] display this

xconnect-group vpna

evpn encapsulation srv6

route-distinguisher 1:1

vpn-target 1:1 export-extcommunity

vpn-target 1:1 import-extcommunity

connection abc

segment-routing ipv6 locator aaa

evpn local-service-id 1 remote-service-id 2

ac interface GigabitEthernet 2/0/1

return

b. Execute the display evpn route xconnect-group command on both PEs to view the service IDs, MTU, PW data encapsulation type, and control word settings.

<PE1> display evpn route xconnect-group

Ctrl Flags: P - Primary, B - Backup, C - Control word

Xconnect group name: vpna

Connection name: pw1

Encapsulation : SRv6

ESI : 0000.0000.0000.0000.0000

Local service ID : 1

Remote service ID : 2

In SID[DX2] : 100::1:0:2

In SID[DX2L] : -

Local MTU : 1500

AC State : Up

Tunnel policy : -

PW class : -

PW type : Ethernet

SRv6 Tunnel:

Next Hop : 2::2

ESI : 0000.0000.0000.0000.0000

Out SID : 200::1:0:2

Flags : P

MTU : 1500

State : Up

If the settings on both PEs match but the issue persists, proceed to the next step.

3. Verify that the AC interface is up.

Execute the display evpn route xconnect-group command on the PE to view the state of the AC. If the AC is in down state, check the network connection and resolve the physical link down issue.

<PE1> display evpn route xconnect-group

Ctrl Flags: P - Primary, B - Backup, C - Control word

Xconnect group name: vpna

Connection name: pw1

Encapsulation : SRv6

ESI : 0000.0000.0000.0000.0000

Local service ID : 1

Remote service ID : 2

In SID[DX2] : 100::1:0:2

In SID[DX2L] : -

Local MTU : 1500

AC State : Up

SRv6 Tunnel:

Next Hop : 2::2

ESI : 0000.0000.0000.0000.0000

Out SID : 200::1:0:2

Flags : P

MTU : 1500

State : Up

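4. Verify that the AC access modes on both PEs are consistent.

Execute the display l2vpn forwarding ac verbose command on both PEs to check the AC access mode. If the two ends use different AC access modes, traffic forwarding might fail. To change the AC access mode, use the access-mode keyword of the ac interface command in cross-connect view.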

<PE1> display l2vpn forwarding ac verbose

Xconnect-group Name: vpws1

Connection Name: actopw

Interface:

Link ID : 1

Access Mode : Ethernet

Interface:

Link ID : 1

Access Mode : Ethernet

Reflector :

IP Address : 100.1.1.4

MAC Address : 8850-fc51-5cee

Src Port : 200

Dst Port : 201

5. Verify that traffic steering in SRv6 TE mode is configured.

In cross-connect group EVPN instance view, execute the display this command to verify that the segment-routing ipv6 traffic-engineering or segment-routing ipv6 traffic-engineering best-effort command is configured. If neither command is present, configure one of them. If either command is present, proceed to step 6.

<PE1> system-view

[PE1] xconnect-group vpna

[PE1-xcg-vpna] evpn encapsulation srv6

[PE1-xcg-vpna-evpn-srv6] display this

evpn encapsulation srv6

route-distinguisher 1:1

vpn-target 1:1 export-extcommunity

vpn-target 1:1 import-extcommunity

segment-routing ipv6 traffic-engineering

return

6. Verify that the EVPN route is steered to the SRv6 TE policy.

Execute the display l2vpn peer srv6 verbose command on the PE to examine the SRv6 TE policy to which the EVPN route is steered.

<PE1> display l2vpn peer srv6 verbose

Xconnect-group Name: vpna

Connection Name: pw1

Peer: 2::2

Remote Service ID : 2

Signaling Protocol : EVPN

Link ID : 0x1

Sub Link ID : 0x0

SRv6 Tunnel State : Up

In SID : 100::1:0:2

Out SID : 200::1:0:2

MTU : 1500

SRv6 Tunnel Attributes : Main

Tunnel Group ID : 0x1000000230080001

Tunnel NHLFE IDs : 2150629377

Nexthop/Interface : -

Color : 10

Color-Only : 11

Recursion Mode : Nexthop based

¡ If the Tunnel NHLFE IDs field has a value, the EVPN route is steered to an SRv6 TE policy, and the value is the tunnel index of that policy. Execute the display segment-routing ipv6 te policy command on the PE. The SRv6 TE policy whose Forwarding index value equals the Tunnel NHLFE IDs value is the policy to which the EVPN route is steered. Then execute the display l2vpn forwarding srv6 verbose command on the PE and check the SRv6 Tunnel State field. If the state is Down, contact Technical Support. If the state is Up, check for issues such as a SID list in the SRv6 TE policy that does not match the planned packet forwarding path, or a faulty physical link on the forwarding path of the SRv6 TE policy. To resolve such issues, see the troubleshooting procedure for an SRv6 TE policy that cannot take effect.

<PE1> display l2vpn forwarding srv6 verbose

Xconnect-group Name: vpna

Connection Name: pw1

Link ID : 0x1

SRv6 Tunnel Type : Ethernet

SRv6 Tunnel State : Up

In SID : 100::1:0:2

Out SID : 200::1:0:2

MTU : 1500

SRv6 Tunnel Attributes : Main

SRv6 Forwarding IDs : 2150629377

<PE1> display segment-routing ipv6 te policy

Name/ID: p1/0

Color: 10

End-point: 2::2

Name from BGP:

Name from PCE:

BSID:

Mode: Dynamic Type: Type_2 Request state: Succeeded

Current BSID: 100::1:0:1 Explicit BSID: - Dynamic BSID: 100::1:0:1

Reference counts: 5

Flags: A/BS/NC

Status: Up

AdminStatus: Up

Up time: 2022-05-13 18:53:48

Down time: 2022-05-13 18:49:56

Hot backup: Disabled

Statistics: Disabled

Statistics by service class: Disabled

Path verification: Not configured

Drop-upon-invalid: Disabled

BFD trigger path-down: Disabled

SBFD: Disabled

BFD Echo: Disabled

BFD no-bypass: Disabled

Forwarding index: 2150629377

Association ID: 1

Service-class: -

Rate-limit: -

PCE delegation: Disabled

PCE delegate report-only: Disabled

Reoptimization: Disabled

Encapsulation mode: -

Flapping suppression Remaining interval: -

Candidate paths state: Configured

Candidate paths statistics:

CLI paths: 1 BGP paths: 0 PCEP paths: 0 ODN paths: 0

Candidate paths:

Preference : 10

Network slice ID: -

CPathName:

ProtoOrigin: CLI Discriminator: 10

Instance ID: 0 Node address: 0.0.0.0

Originator: 0, ::

Optimal: Y Flags: V/A

Dynamic: Not configured

PCEP: Not configured

Explicit SID list:

ID: 1 Name: s1

Weight: 1 Forwarding index: 2149580802

State: Up State(-): -

Verification State: -

Path MTU: 1500 Path MTU Reserved: 0

Local BSID: -

Reverse BSID: -

¡ If the Tunnel NHLFE IDs field value is a hyphen (-), the EVPN route is not steered to an SRv6 TE policy. Verify that the SRv6 TE policy configuration on the PE is correct, and resolve the issue that prevents the SRv6 TE policy from coming up. For more information, see the troubleshooting procedure for an SRv6 TE policy that cannot take effect.
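The decisions in this step can be chained in a short Python sketch. The helper below is hypothetical. It splits each output line at its first colon, and it assumes that multiple values in the Tunnel NHLFE IDs field, if any, are separated by spaces or commas; this guide shows only a single ID. The policy forwarding index is passed in after you read it from the display segment-routing ipv6 te policy output.

```python
def parse_fields(output: str) -> dict:
    """Split each 'Key : value' line at its first colon."""
    fields = {}
    for raw in output.splitlines():
        key, sep, value = raw.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def te_next_action(peer_output: str, forwarding_output: str, policy_index: int) -> str:
    """Follow the checks in this step for one connection."""
    nhlfe = parse_fields(peer_output).get("Tunnel NHLFE IDs", "-")
    if nhlfe == "-":
        return "not steered: check the SRv6 TE policy configuration"
    ids = {int(x) for x in nhlfe.replace(",", " ").split()}
    if policy_index not in ids:
        return "steered to a different SRv6 TE policy"
    state = parse_fields(forwarding_output).get("SRv6 Tunnel State")
    if state != "Up":
        return "SRv6 tunnel not up: contact Technical Support"
    return "SRv6 tunnel up: check the SID list and forwarding path"


# Excerpts of display l2vpn peer srv6 verbose and
# display l2vpn forwarding srv6 verbose output from this example.
PEER_TE = "Tunnel NHLFE IDs : 2150629377\nNexthop/Interface : -\n"
FWD = "SRv6 Tunnel State : Up\nSRv6 Forwarding IDs : 2150629377\n"
print(te_next_action(PEER_TE, FWD, 2150629377))
```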

7. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting SR-MPLS

An SR-MPLS BE tunnel cannot be established

Symptom

During SR-MPLS BE tunnel establishment, the output from the display mpls lsp command on a node shows that the node has no outgoing label, or that the outgoing label was not allocated by SR-MPLS. For example, the FEC for the egress node is 5.5.5.5/32. The following output shows that the node has no SR-MPLS outgoing label for 5.5.5.5/32, which indicates that no SRLSP destined for 5.5.5.5/32 exists.

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

12.1.1.2 Local -/- GE2/0/1

Tunnel1 Local -/- NHLFE2

Tunnel10 Local -/- NHLFE1

1.1.1.1/32 ISIS 16010/- -

2.2.2.2/32 ISIS 16020/3 GE2/0/1

2.2.2.2/32 ISIS -/3 GE2/0/1

3.3.3.3/32 ISIS 16030/16030 GE2/0/1

3.3.3.3/32 ISIS -/16030 GE2/0/1

4.4.4.4/32 ISIS 16040/16040 GE2/0/1

4.4.4.4/32 ISIS -/16040 GE2/0/1

1.1.1.1/1/4122 SR-TE -/16030 GE2/0/1

16040
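The symptom check above (does any IGP-generated LSP for the FEC have an outgoing label) can be scripted against saved display mpls lsp output. The following Python sketch is hypothetical. It assumes that rows whose Proto column is ISIS or OSPF are SR-MPLS LSPs, as in the samples in this section, and it counts an outgoing label of 3 (implicit null, used toward the penultimate hop) as present. Rows such as the SR-TE entry and its wrapped label stack are skipped.

```python
# Excerpt of "display mpls lsp" output from this symptom.
SAMPLE_LSP = """\
FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX
1.1.1.1/32 ISIS 16010/- -
2.2.2.2/32 ISIS 16020/3 GE2/0/1
4.4.4.4/32 ISIS 16040/16040 GE2/0/1
4.4.4.4/32 ISIS -/16040 GE2/0/1
1.1.1.1/1/4122 SR-TE -/16030 GE2/0/1
16040
"""


def sr_out_labels(output: str, fec: str, protocols=("ISIS", "OSPF")) -> list:
    """Collect the outgoing labels of IGP-generated LSPs for one FEC."""
    labels = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != fec or fields[1] not in protocols:
            continue
        _in_label, _, out_label = fields[2].partition("/")
        if out_label and out_label != "-":
            labels.append(int(out_label))
    return labels


print(sr_out_labels(SAMPLE_LSP, "5.5.5.5/32"))  # [] means no SRLSP to 5.5.5.5/32
```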

Common causes

The following are the common causes of this type of issue:

· Physical link failure.

· SR-MPLS labels fail to be advertised because the IGP or BGP peer relationship is not established.

· The SR-MPLS configuration is missing or incorrect.

Troubleshooting flowchart

Figure 103 shows the troubleshooting flowchart.

Figure 103 Flowchart for troubleshooting SR-MPLS BE tunnel establishment failure

Solution

To resolve the issue:

1. On each node along the SRLSP, execute the display interface brief command to verify that both the physical link state and the data link layer state of each interface on the SRLSP are up.

2. On each node that the SRLSP traverses, verify that the IGP or BGP peer relationship is established and that the IGP or BGP configuration is correct. The troubleshooting procedure depends on the routing protocol used:

¡ When the IGP protocol is OSPF:

- Execute the display ospf command to check for the Opaque capable field in the command output. If this field exists, Opaque LSA advertisement and reception capability is enabled in OSPF. If this field does not exist, execute the opaque-capability enable command in OSPF view to enable opaque LSA advertisement and reception.

- Execute the display ospf peer command to check the value of the State field. If the value is Full, the neighboring routers are fully adjacent. If the value is not Full, troubleshoot the OSPF neighbor issue as described in "OSPF issues."

- Execute the display mpls lsp command to check for the OSPF LSP destined for the loopback address of each node, to which the SR prefix SID is manually assigned. If no such LSP exists, verify that OSPF is enabled on the loopback interface of each node, either by the ospf area command in loopback interface view or by the network command in OSPF area view.

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

1.1.1.9/32 OSPF 16010/- -

1.1.1.9/32 ISIS 16010/- -

2.2.2.9/32 OSPF 16020/17020 RAGG1.4

- If the output from the display mpls lsp command also contains a BGP LSP, SRLSP generation might fail because of a prefix SID conflict. In this case, execute the peer route-policy command to filter out the routes learned from the BGP peer.

¡ When the IGP protocol is IS-IS:

- Execute the display isis command and check the Cost style field to identify whether the link cost style is wide, compatible, or wide-compatible. If the link cost style is any other value, execute the cost-style command to change it.

- Execute the display isis peer command to check the value of the State field. If the value is Up, the IS-IS neighbor relationship is normal. If it is not Up, see "IS-IS neighbor establishment failure" in IS-IS Troubleshooting Guide to troubleshoot the issue.

- Execute the display mpls lsp command to check for the IS-IS LSP destined for the loopback address of each node, to which the SR prefix SID is manually assigned. If no such LSP exists, verify that the isis enable command is configured in loopback interface view on each node.

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

1.1.1.9/32 OSPF 16010/- -

1.1.1.9/32 ISIS 16010/- -

2.2.2.9/32 ISIS 16020/17020 RAGG1.4

¡ When the routing protocol is BGP:

- Execute the display bgp peer command to check the value for the State field. If the value is Established, the BGP session is normal. If the value is not Established, see "BGP peer establishment failure" in BGP troubleshooting Guide to troubleshoot the issue.

- Execute the display mpls lsp command to check for the BGP LSP. If no such LSP is available, verify that the peer label-route-capability command is configured to enable BGP to exchange labeled routes with a peer or peer group.

<Sysname> display mpls lsp

FEC Proto In/Out Label Out Inter/NHLFE/LSINDEX

1.1.1.9/32 OSPF 16010/- -

1.1.1.9/32 ISIS 16010/- -

2.2.2.9/32 BGP 16020/17020 RAGG1.4

3. Check the SR-MPLS configuration on each node that the SRLSP traverses:

a. Verify that SR-MPLS is enabled in IS-IS view, OSPF view, or BGP view. If it is not enabled, execute the segment-routing mpls command to enable SR-MPLS.

b. Verify that a prefix SID has been configured in loopback interface view. If it is not configured, execute the ospf prefix-sid command (for OSPF) or the isis prefix-sid command (for IS-IS) in loopback interface view to configure a prefix SID.

c. Execute the display segment-routing label-block command to identify whether the prefix SID configured in loopback interface view is within the SRGB label range. If the prefix SID is not within the SRGB range, edit the configured prefix SID.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
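Step 3c is simple arithmetic when a prefix SID is configured as an index: in SR-MPLS, the label advertised for an index is the SRGB lower bound plus the index, and that label must lie inside the SRGB. The Python sketch below assumes a hypothetical SRGB of 16000 to 23999 and hypothetical indexes 10 and 20, which would yield the labels 16010 and 16020 seen in the outputs above. Read the actual range from the display segment-routing label-block output on your device. The LabelBlock class and label_from_index function are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelBlock:
    """Inclusive label range, such as an SRGB shown by display segment-routing label-block."""
    lower: int
    upper: int

    def contains(self, label: int) -> bool:
        return self.lower <= label <= self.upper


def label_from_index(srgb: LabelBlock, index: int) -> int:
    """Map a prefix SID index to its MPLS label (SRGB lower bound + index)."""
    if index < 0:
        raise ValueError(f"prefix SID index {index} is negative")
    label = srgb.lower + index
    if not srgb.contains(label):
        raise ValueError(
            f"index {index} maps to label {label}, outside SRGB {srgb.lower}-{srgb.upper}"
        )
    return label


# Hypothetical SRGB and prefix SID indexes, for illustration only.
SRGB = LabelBlock(16000, 23999)
indexes = {"1.1.1.1/32": 10, "2.2.2.2/32": 20, "5.5.5.5/32": 9000}

for fec, index in indexes.items():
    try:
        print(fec, "->", label_from_index(SRGB, index))
    except ValueError as err:
        print(fec, "-> misconfigured:", err)
```

For a prefix SID configured as an absolute value, the check reduces to SRGB.contains(value).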

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

The SR-MPLS TE tunnel is down

Symptom

The output from the display mpls te tunnel-interface command on the ingress node shows that the SR-MPLS TE tunnel is down.

<Sysname> display mpls te tunnel-interface

Tunnel Name : Tunnel 1

Tunnel Signalled Name : tunnel1

Tunnel State : Down (Main CRLSP Down. Backup CRLSP Down.)

...

Common causes

The following are the common causes of this type of issue:

· A physical link failure exists on the SRLSPs used by the SR-MPLS TE tunnel.

· The BFD session for detecting the SR-MPLS TE tunnel is down.

· The SR-MPLS configuration is missing or incorrect.

· The SR-MPLS TE tunnel configuration is incorrect.

Troubleshooting flowchart

Figure 104 shows the troubleshooting flowchart.

Figure 104 Flowchart for troubleshooting SR-MPLS TE tunnel down

Solution

To resolve the issue:

1. On each node along the SRLSP, execute the display interface brief command to verify that both the physical link state and the data link layer state of each interface on the SRLSP are up.

2. Identify whether the SR-MPLS TE tunnel is down because a BFD session is down.

a. Execute the display this command in SR-MPLS TE tunnel interface view to check for the mpls bfd, mpls sbfd, mpls tunnel-bfd, or mpls tunnel-sbfd command. If none of these commands is configured, go to step 3.

b. Execute the display mpls bfd or display mpls sbfd command to check the BFD or SBFD session state.

c. If the status is down, execute the undo mpls bfd, undo mpls sbfd, undo mpls tunnel-bfd, or undo mpls tunnel-sbfd command to delete BFD/SBFD related commands.

3. If the BFD/SBFD session is normal or no BFD/SBFD session exists, check the SR-MPLS configuration.

a. In IS-IS view or OSPF view, check the following configuration to verify that SR-MPLS is supported:

- When the IGP protocol is IS-IS, execute the display isis command and check the Cost style field to identify whether the link cost style is wide, compatible, or wide-compatible. If the link cost style is any other value, execute the cost-style command to change it.

- When the IGP protocol is OSPF, execute the display ospf command to check for the Opaque capable field in the command output. If this field exists, Opaque LSA advertisement and reception capability is enabled in OSPF. If this field does not exist, execute the opaque-capability enable command in OSPF view to enable opaque LSA advertisement and reception.

b. If you use prefix SIDs for IP traffic forwarding over SRLSPs, verify that a prefix SID has been configured in loopback interface view. If it is not configured, execute the ospf prefix-sid command (for OSPF) or the isis prefix-sid command (for IS-IS) in loopback interface view to configure one. If you use adjacency SIDs, verify that adjacency SID allocation is enabled in OSPF view or IS-IS view, or that an adjacency SID has been configured on each interface on the SRLSP forwarding path. If neither is the case, execute the segment-routing adjacency enable command in OSPF view or IS-IS view to enable SR-MPLS adjacency SID allocation, or execute the isis adjacency-sid or ospf adjacency-sid command in interface view to assign an adjacency SID to an adjacency.

c. Execute the display segment-routing label-block command to identify whether the prefix SID configured in loopback interface view is within the SRGB label range, and whether the adjacency SID configured in interface view is within the SRLB label range. If the prefix SID is outside the SRGB range or the adjacency SID is outside the SRLB range, change the SID that is out of range.

4. Check the SR-MPLS TE tunnel configuration and perform the following tasks based on how the tunnel is established:

¡ Over a static SRLSP: Execute the display mpls static-sr-mpls command on the ingress node of the SRLSP to verify that the ordered list of labels represented by the Out-Label field matches the labels allocated for the nodes that the static SRLSP traverses. If the label sequence in the outgoing label stack on the ingress node does not match the static labels configured on each node along the SRLSP, execute the static-sr-mpls lsp command to change the label sequence in the outgoing label stack on the ingress node.

¡ Over an explicit-path SRLSP: Execute the display explicit-path command on the ingress node of the SRLSP to verify that the IP addresses or SIDs in the explicit path match the IP addresses or local SIDs of the nodes along the SRLSP. In addition, make sure the SID type specified by the nexthop command in explicit path view on the ingress node matches the SID type (prefix SID or adjacency SID) configured in interface view on each node along the SRLSP. For example, if a prefix SID is configured on the interface, the SID specified by the nexthop command must also be a prefix SID. If they are inconsistent, execute the nexthop command to change the IP address or SID.

¡ Over a PCE-calculated SRLSP: Verify that the mpls te delegation command is configured in MPLS-TE tunnel interface view, and execute the display mpls te pce peer command to identify whether the PCC and PCE have established a PCEP session. Use packet capture to verify that the controller (PCE) has updated the path and that the path is correct. In the captured packets, make sure adjacency SIDs or next-hop addresses sent by the PCE use strict mode, and prefix SIDs or node addresses use loose mode. If the PCC and PCE have not established a PCEP session, or the captured packets do not meet these requirements, check the configuration on the controller.

5. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
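For a PCE-calculated SRLSP, step 4 requires adjacency SIDs and next-hop addresses to use strict mode, and prefix SIDs and node addresses to use loose mode. The Python sketch below expresses that rule as a check over hops decoded from a captured path update. The Hop and SidType types, the label 24001, and the way hops are obtained are hypothetical; they model what you would read from a packet capture, not a PCEP or H3C data format.

```python
from dataclasses import dataclass
from enum import Enum


class SidType(Enum):
    PREFIX = "prefix"        # prefix SID or node address
    ADJACENCY = "adjacency"  # adjacency SID or next-hop address


@dataclass(frozen=True)
class Hop:
    """One hop decoded from a captured PCE path update (hypothetical model)."""
    sid_type: SidType
    label: int
    strict: bool  # True = strict mode, False = loose mode


def mode_problems(hops: list[Hop]) -> list[str]:
    """Flag hops whose strict/loose mode contradicts the rule in step 4."""
    problems = []
    for position, hop in enumerate(hops, start=1):
        must_be_strict = hop.sid_type is SidType.ADJACENCY
        if hop.strict != must_be_strict:
            expected = "strict" if must_be_strict else "loose"
            problems.append(
                f"hop {position}: {hop.sid_type.value} SID {hop.label} should use {expected} mode"
            )
    return problems


# Hypothetical path in which the adjacency hop was sent in loose mode by mistake.
path = [
    Hop(SidType.PREFIX, 16030, strict=False),
    Hop(SidType.ADJACENCY, 24001, strict=False),
]
for problem in mode_problems(path):
    print(problem)  # hop 2: adjacency SID 24001 should use strict mode
```

Any reported problem points to the controller configuration, as described in step 4.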

Related alarm and log messages

Alarm messages

N/A

Log messages

TE/5/TE_BACKUP_SWITCH

SRv6 TE policy issues

SRv6 TE policy cannot take effect

Symptom

An SRv6 TE policy fails the connectivity verification performed by using the ping srv6-te policy command. The command output shows SRv6 TE policy anomalies, indicating that the SRv6 TE policy cannot forward packets properly. For example:

<Sysname> ping srv6-te policy policy-name p1

The SRv6-TE policy does not reference a SID list or the referenced SID list is down.

Common causes

The following are the common causes of this type of issue:

· The SRv6 TE policy is shut down administratively.

· The BSID configuration of the SRv6 TE policy is incorrect or has conflicts.

· Some configuration for the SRv6 TE policy is missing.

· The number of SRv6 TE policies has exceeded the limit.

· The number of SIDs in the segment list has exceeded the limit.

· The SID list of the SRv6 TE policy differs from the planned packet forwarding path.

· Physical link faults have occurred on the forwarding path of the SRv6 TE policy.

Troubleshooting flowchart

Figure 105 shows the troubleshooting flowchart.

Figure 105 Flowchart for troubleshooting SRv6 TE policy failure to take effect

Solution

1. On the source node of the SRv6 TE policy, execute the display segment-routing ipv6 te policy status command to make a preliminary identification of why the SRv6 TE policy does not take effect.

<Sysname> display segment-routing ipv6 te policy status

Name/ID: p1/0

Status: Down

Check admin status : Failed

Check for endpoint & color : Passed

Check for segment list : Passed

Check valid candidate paths : Failed

Check for BSIDs : -

If the Check admin status field shows Failed, it means the SRv6 TE policy has been administratively shut down. Execute the undo shutdown command in SRv6 TE policy view to bring the policy up.

After the SRv6 TE policy is administratively up, execute the display segment-routing ipv6 te policy status command again and check for other fields that display Failed or a hyphen (-). If any field still displays Failed or a hyphen (-), proceed to the following step.

2. Verify that the BSID of the SRv6 TE policy does not conflict with other BSIDs.

Execute the display segment-routing ipv6 te policy command on the source node of the SRv6 TE policy. If the Request state field displays Failed, the BSID request has failed. The statically specified BSID might be outside the locator range, or might be the same as the BSID of an existing SRv6 TE policy, which makes the SRv6 TE policy invalid. As a best practice, execute the undo binding-sid command for the invalid SRv6 TE policy to delete the statically specified BSID. The system then automatically allocates a BSID, which prevents such errors and conflicts.

<Sysname> display segment-routing ipv6 te policy

Name/ID: p1/0

Color: 10

Endpoint: 1000::1

Name from BGP:

BSID:

Mode: Dynamic Type: Type 2 Request state: Succeeded

Current BSID: 8000::1 Explicit BSID: - Dynamic BSID: 8000::1

Reference counts: 3

Flags: A/BS/NC

If the issue persists after the successful BSID allocation, proceed to the following step.

3. Verify that the SRv6 TE policy configuration is complete.

Assume that IS-IS is used to advertise SIDs. On the source node of the SRv6 TE policy, execute the display current-configuration command to view the current configuration of the SRv6 TE policy. Compare the configuration with the configuration in the following example. If any configuration item is missing, it indicates that the policy configuration is incomplete.

isis 1

address-family ipv6 unicast

segment-routing ipv6 locator a

segment-routing ipv6

locator a ipv6-prefix 1000:0:0:1:: 64 static 16

traffic-engineering

srv6-policy locator a

segment-list sl1

index 10 ipv6 1000::2:0:0:1:0

index 20 ipv6 1000::2:0:0:1:3

policy p1

color 100 end-point ipv6 4::4

candidate-paths

preference 100

explicit segment-list sl1

On each node of the SRv6 TE policy forwarding path, you must execute the segment-routing ipv6 locator command in the IGP view in order to advertise the locator. For example:

isis 1

address-family ipv6 unicast

segment-routing ipv6 locator b

If the configuration is incomplete, supplement the missing parts. If the configuration is fully completed but the problem persists, proceed to the following step.

4. Verify that neither the number of SRv6 TE policies nor the number of segment lists exceeds its limit.

Execute the display segment-routing ipv6 te policy statistics command on the source node of the SRv6 TE policy to identify whether the resources used by SRv6 TE policies have reached the limit.

<Sysname> display segment-routing ipv6 te policy statistics

IPv6 TE Policy Database Statistics

…

SRv6-TE policy resource information:

Max resources: 1024

Used resources: 1

Upper threshold: 512 (50%)

Lower threshold: 102 (10%)

SID list resource information:

Max resources: 4096

Used resources: 1

Upper threshold: 3277 (80%)

Lower threshold: 1638 (40%)

…

¡ If the value of the Used resources field in the SRv6-TE policy resource information is equal to the value of the Max resources field, the number of SRv6 TE policies has reached the limit. In this case, delete unnecessary SRv6 TE policies.

¡ If the value of the Used resources field in the SID list resource information is equal to the value of the Max resources field, the number of segment lists has reached the limit. In this case, delete unnecessary segment lists.

¡ If the number of SRv6 TE policies and that of segment lists have not exceeded the limit, proceed to the following step.

5. Verify that the number of SIDs in the segment list does not exceed the limit.

Enter probe view on the source node of the SRv6 TE policy, and execute the display system internal segment-routing ipv6 te policy status command. In the command output, the MaxSIDs field value represents the maximum number of SIDs allowed in the segment list.

[Sysname-probe] display system internal segment-routing ipv6 te policy status

…

MaxGroupNidNum: 1024 MaxPolicyNidNum: 1024

MaxSeglistNidNum: 4096 MaxNexthopNidNum: 65535

MaxOutNum: 32 MaxEcmpNum: 16

MaxSIDs: 10

…

Execute the display segment-routing ipv6 te segment-list command. In the command output, the Nodes field indicates the number of SID nodes configured in the specified segment list.

<Sysname> display segment-routing ipv6 te segment-list

Total Segment lists: 1

Name/ID: A/1

Origin: CLI

Status: Up

Verification State: Down

Nodes: 11

…

If the number of SID nodes configured exceeds the maximum number of SIDs supported, delete unnecessary SID values in the segment list. If the number of SID nodes configured does not exceed the limit, proceed to the following step.

6. Verify that the configuration of the SID list is consistent with the planned forwarding path.

Execute the display segment-routing ipv6 te segment-list command on the source node of the SRv6 TE policy to display the SID list information. The SIDs, listed from top to bottom, represent nodes or links ordered from nearest to farthest from the source node of the SRv6 TE policy. If the Status field of a SID displays Down, the locator to which the SID belongs has not been learned correctly. In this case, troubleshoot the issue as described in "Troubleshooting OSPFv3" or "IS-IS issues."

[Sysname] display segment-routing ipv6 te segment-list

Total Segment lists: 1

Name/ID: s1/1

Origin: CLI

Status: Down

Verification State: Down

Nodes : 3

Index : 10 SID: 1::1

Status : UP TopoStatus: Nonexistent

Type : Type_2 Flags: None

Coc Type : - Common prefix length: 0

Index : 20 SID: 1::2

Status : Down TopoStatus: Nonexistent

Type : Type_2 Flags: None

Coc Type : - Common prefix length: 0

Index : 30 SID: 1::3

Status : Down TopoStatus: Nonexistent

Type : Type_2 Flags: None

Coc Type : - Common prefix length: 0

On each node along the SRv6 TE policy forwarding path, execute the display segment-routing ipv6 local-sid command and verify that the local SIDs are consistent with the SIDs in the list displayed by the display segment-routing ipv6 te segment-list command. The SIDs are usually End SIDs or End.X SIDs. For example, to view End SIDs, execute the display segment-routing ipv6 local-sid end command.

[Sysname] display segment-routing ipv6 local-sid end

Local SID forwarding table (End)

Total SIDs: 2

SID : 1000::2:0:0:1:0/64

Function type : End Flavor : PSP

Locator name : b Allocation type: Dynamic

Owner : IS-IS-1 State : Active

Create Time : Sep 04 16:32:03.443 2021

If the SID list does not match the SID values of the nodes on the forwarding path, execute the undo index index-number command to delete the incorrect SID, and then run the index index-number ipv6 ipv6-address command to reconfigure the correct SID. If the SID list is consistent with the plan, proceed to the following step.

7. On each node along the SRv6 TE policy forwarding path, execute the display interface brief command to check the physical link state. Make sure both the physical state and the data link layer protocol state of each interface on the forwarding path are up. If the links are normal, or if the issue persists after link faults are cleared, proceed to the following step.

8. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
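Steps 2, 4, and 5 compare displayed values against simple rules: a static BSID must fall within the locator and must not duplicate the BSID of another policy, Used resources must stay below Max resources, and the Nodes value of a segment list must not exceed MaxSIDs. The Python sketch below encodes those rules with the standard ipaddress module. The resource and SID counts mirror the sample outputs above, while the BSIDs are hypothetical, and the function names are assumptions made for illustration.

```python
import ipaddress


def bsid_problems(bsid: str, locator: str, bsids_in_use: set[str]) -> list[str]:
    """Step 2: a static BSID must fall within the locator and must not be reused."""
    problems = []
    addr = ipaddress.IPv6Address(bsid)
    if addr not in ipaddress.IPv6Network(locator):
        problems.append(f"BSID {addr} is outside locator {locator}")
    if addr in {ipaddress.IPv6Address(other) for other in bsids_in_use}:
        problems.append(f"BSID {addr} is already used by another SRv6 TE policy")
    return problems


def resource_problems(name: str, used: int, maximum: int) -> list[str]:
    """Step 4: Used resources equal to Max resources means the limit is reached."""
    return [f"{name}: {used} of {maximum} used, limit reached"] if used >= maximum else []


def sid_count_problems(segment_list: str, nodes: int, max_sids: int) -> list[str]:
    """Step 5: the Nodes value of a segment list must not exceed MaxSIDs."""
    if nodes > max_sids:
        return [f"segment list {segment_list}: {nodes} SIDs exceed MaxSIDs {max_sids}"]
    return []


# Counts mirror the sample outputs above; the BSIDs are hypothetical.
checks = (
    bsid_problems("1000:0:0:1::100", "1000:0:0:1::/64", {"1000:0:0:1::200"})
    + resource_problems("SRv6-TE policy", used=1, maximum=1024)
    + resource_problems("SID list", used=1, maximum=4096)
    + sid_count_problems("A", nodes=11, max_sids=10)
)
for problem in checks:
    print(problem)  # segment list A: 11 SIDs exceed MaxSIDs 10
```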

Related alarm and log messages

Alarm messages

N/A

Log messages

· SRPV6/2/SRPV6_BSID_CONFLICT

· SRPV6/2/SRPV6_BSID_CONFLICT_CLEAR

· SRPV6/5/SRPV6_PATH_STATE_DOWN

· SRPV6/4/SRPV6_POLICY_STATUS_CHG

· SRPV6/4/SRPV6_RESOURCE_EXCEED

· SRPV6/4/SRPV6_RESOURCE_EXCEED_CLEAR

· SRPV6/5/SRPV6_SEGLIST_STATE_DOWN

· SRPV6/2/SRPV6_STATE_DOWN

· SRPV6/2/SRPV6_STATE_DOWN_CLEAR

Troubleshooting VPN issues

Troubleshooting EVPN issues

Troubleshooting EVPN VPLS over SRv6 BE traffic forwarding failure

Symptom

As shown in Figure 106, EVPN VPLS uses SRv6 BE tunnels as public network tunnels, and CE 1 is multi-homed to PE 1 and PE 2. In this network, broadcast and unicast traffic forwarding fails between CE 1 and CE 2.

Figure 106 Network diagram

Common causes

The following are the common causes of this type of issue:

· The BGP EVPN peers are not established between the PEs.

· The PEs have not received Type 3 routes (IMET routes).

· The PEs have not received Type 2 routes (MAC/IP advertisement routes).

· The PEs have not received Type 1 routes (Ethernet auto-discovery routes).

· The Route Target attribute carried in the EVPN route does not match the locally configured Import Route Target attribute.

· The route to the SRv6 SID does not exist on the PE.

Troubleshooting flowchart

Figure 107 shows the troubleshooting flowchart.

Figure 107 Flowchart for troubleshooting EVPN VPLS over SRv6 BE traffic forwarding failure

Solution

1. Verify that the BGP EVPN peers are successfully established between the PEs.

a. Execute the display bgp peer l2vpn evpn command to verify that all BGP EVPN peers between the PEs are in Established state. If they are in Established state, proceed to step 2. If not, resolve the BGP EVPN peer establishment issue. For more information, see the troubleshooting solution for the issue that the BGP session cannot enter the Established state.

b. If the issue persists after the BGP peers are successfully established, proceed to step 2.

2. Verify that the PEs have received Type 3 routes.

a. Execute the display bgp l2vpn evpn route-type imet command to verify that the PE has received Type 3 routes from other PEs. If Type 3 routes are received, proceed to step 3. If not, troubleshoot the EVPN route synchronization issue. Possible causes include the following: the route reflector (RR) is not configured with the peer reflect-client command, the RR is not configured with the undo policy vpn-target command, or an incorrect routing policy is specified for the BGP peer. Check for and correct any such configuration errors.

b. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 7.

c. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.

3. Verify that the PEs have received Type 2 routes.

a. Identify the traffic type. For broadcast traffic failure, proceed to step 4. For unicast traffic failure, proceed to the next step.

b. Execute the display bgp l2vpn evpn route-type mac-ip command to verify that a Type 2 route matching the destination MAC address of unicast traffic exists, and the route comes from the correct BGP peer. If a Type 2 route exists and is correct, proceed to step 4. If not, resolve the Type 2 route synchronization issue as described in step 2.

c. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 7.

d. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.

4. Verify that the PEs have received Type 1 routes.

a. Execute the display bgp l2vpn evpn route-type mac-ip command to view detailed Type 2 route information. If the ESI carried in the route is 0.0.0.0.0.0 (all zeros), the route comes from a single-homed site and no Type 1 route is required; proceed to step 5. If not, proceed to the next step.

b. Execute the display bgp l2vpn evpn route-type auto-discovery command to verify that a Type 1 route matching the ESI in the Type 2 route exists. If such a route exists, proceed to step 5. If not, resolve the Type 1 route synchronization issue as described in step 2.

c. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 7.

d. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.

5. View the detailed route information to check for matching VPN Targets.

a. View the detailed Type 1, 2, and 3 route information. Using a Type 1 route as an example, execute the display bgp l2vpn evpn route-type auto-discovery { evpn-route route-length | evpn-prefix } command to obtain the RTs carried in the extended community attribute of the route.

b. Enter VSI view and execute the display this command to obtain the vpn-target configured for the EVPN instance.

c. If at least one RT carried in the route matches an import RT of the EVPN instance, proceed to step 6. If not, proceed to the next step.

d. Plan the VPN target configuration of the EVPN instance and modify it so that at least one RT carried in the route matches an import RT of the EVPN instance.

e. If the issue persists, proceed to the next step.

6. Verify that a route is available to the SRv6 SID.

a. Execute the display l2vpn forwarding srv6 command to view the SRv6 SID allocated by the remote PE to the SRv6 PW, which is the value for the Out SID field.

b. Execute the display ipv6 routing-table ipv6-address command, where ipv6-address is the Out SID field value, to verify that a route is available to the SRv6 SID allocated by the remote PE to the SRv6 PW. If such a route exists, proceed to step 7. If not, resolve the IGP route learning issue. For more information, see the IP routing troubleshooting guide.

7. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
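The decisions in steps 4 and 5 can be sketched in Python as follows. An all-zero ESI marks a route from a single-homed site (as defined for EVPN in RFC 7432), so no matching Type 1 route is expected, and the EVPN instance accepts a route only if its RT set intersects the import RTs. The RT and ESI values and the function names are hypothetical; in practice, read them from the display bgp l2vpn evpn and display this output.

```python
def esi_is_zero(esi: str) -> bool:
    """True if every digit of the ESI is 0, whatever separator format is displayed."""
    digits = [ch for ch in esi if ch not in ".:-"]
    return bool(digits) and all(ch == "0" for ch in digits)


def matching_rts(route_rts, import_rts) -> set[str]:
    """Step 5: the route is imported only if at least one of its RTs is an import RT."""
    return set(route_rts) & set(import_rts)


# Hypothetical values read from display bgp l2vpn evpn and display this in VSI view.
type2_esi = "0.0.0.0.0.0"
route_rts = ["100:1", "200:1"]
vsi_import_rts = ["300:1"]

if esi_is_zero(type2_esi):
    print("single-homed source: skip the Type 1 route check, go to step 5")
common = matching_rts(route_rts, vsi_import_rts)
print("RT match:", sorted(common) if common else "none, fix the VPN target configuration")
```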

Troubleshooting EVPN VPLS over SRv6 TE policy traffic forwarding failure

Symptom

As shown in Figure 108, EVPN VPLS uses SRv6 TE policy tunnels as public network tunnels, and CE 1 is multi-homed to PE 1 and PE 2. In this network, broadcast and unicast traffic forwarding fails between CE 1 and CE 2.

Figure 108 Network diagram

Common causes

The following are the common causes of this type of issue:

· The BGP EVPN peers are not established between the PEs.

· The PEs have not received Type 3 routes (IMET routes).

· The PEs have not received Type 2 routes (MAC/IP advertisement routes).

· The PEs have not received Type 1 routes (Ethernet auto-discovery routes).

· The Route Target attribute carried in the EVPN route does not match the locally configured Import Route Target attribute.

· The color value carried in the EVPN route does not match the color value configured for the SRv6 TE policy locally.

· The color value of the local VSI instance does not match the color value of the local SRv6 TE policy.

· The SRv6 TE policy to which EVPN VPLS is steered does not take effect.

Troubleshooting flowchart

Figure 109 shows the troubleshooting flowchart.

Figure 109 Flowchart for troubleshooting EVPN VPLS over SRv6 TE policy traffic forwarding failure

Solution

To resolve the issue:

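1. Verify that the BGP EVPN peers are successfully established between the PEs.

a. Execute the display bgp peer l2vpn evpn command to verify that all BGP EVPN peers between the PEs are in Established state. If they are, proceed to step 2. If not, resolve the BGP EVPN peer establishment issue. For more information, see the troubleshooting solution for the issue that the BGP session cannot enter the Established state.

b. If the issue persists after the BGP peers are successfully established, proceed to step 2.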

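2. Verify that the PEs have received Type 3 routes.

a. Execute the display bgp l2vpn evpn route-type imet command to verify that the PE has received Type 3 routes from other PEs. If Type 3 routes are received, proceed to step 3. If not, troubleshoot the EVPN route synchronization issue. Possible causes include the following: the route reflector (RR) is not configured with the peer reflect-client command, the RR is not configured with the undo policy vpn-target command, or an incorrect routing policy is specified for the BGP peer. Check for and correct any such configuration errors.

b. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 9.

c. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.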

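3. Verify that the PEs have received Type 2 routes.

a. Identify the traffic type. For broadcast traffic failure, proceed to step 4. For unicast traffic failure, proceed to the next step.

b. Execute the display bgp l2vpn evpn route-type mac-ip command to verify that a Type 2 route matching the destination MAC address of unicast traffic exists, and that the route comes from the correct BGP peer. If a correct Type 2 route exists, proceed to step 4. If not, resolve the Type 2 route synchronization issue as described in step 2.

c. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 9.

d. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.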

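4. Verify that the PEs have received Type 1 routes.

a. Execute the display bgp l2vpn evpn route-type mac-ip command to view detailed Type 2 route information. If the ESI carried in the route is 0.0.0.0.0.0 (all zeros), the route comes from a single-homed site and no Type 1 route is required; proceed to step 5. If not, proceed to the next step.

b. Execute the display bgp l2vpn evpn route-type auto-discovery command to verify that a Type 1 route matching the ESI in the Type 2 route exists. If such a route exists, proceed to step 5. If not, resolve the Type 1 route synchronization issue as described in step 2.

c. If the EVPN route synchronization issue persists after you perform the previous operation, proceed to step 9.

d. If the issue persists after the EVPN route synchronization failure is resolved, proceed to the next step.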

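5. View the detailed route information to check for matching VPN targets.

a. View the detailed Type 1, 2, and 3 route information. Using a Type 1 route as an example, execute the display bgp l2vpn evpn route-type auto-discovery { evpn-route route-length | evpn-prefix } command to obtain the RTs carried in the extended community attribute of the route.

b. Enter VSI view and execute the display this command to obtain the VPN targets configured for the EVPN instance.

c. If at least one RT carried in the route matches an import RT of the EVPN instance, proceed to step 6. If not, proceed to the next step.

d. Plan the VPN target configuration of the EVPN instance and modify it so that at least one RT carried in the route matches an import RT of the EVPN instance.

e. If the issue persists, proceed to the next step.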

6. View detailed route information, and verify that the color in the route matches the color value configured locally for the SRv6 TE policy.

a. View the detailed Type 1, 2, and 3 route information. Using a Type 1 route as an example, execute the display bgp l2vpn evpn route-type auto-discovery { evpn-route route-length | evpn-prefix } command to view the color value in the route. If the route carries no color, proceed to step 7. Otherwise, proceed to the next step.

b. Execute the display segment-routing ipv6 te policy command to view the color value of the SRv6 TE policy to which EVPN VPLS is expected to be steered.

c. If the color in the route is the same as the color value in the SRv6 TE policy, proceed to step 7. If they are different, you need to edit the color value of the SRv6 TE policy.

d. If the issue persists, proceed to the next step.

7. Verify that the color value of the local VSI instance matches the color value of the local SRv6 TE policy.

a. Execute the display l2vpn peer srv6 verbose command to view the default color value configured for the VSI instance, which is the value in the Color field.

b. Execute the display segment-routing ipv6 te policy command to view the color value of the SRv6 TE policy to which EVPN VPLS is expected to be steered.

c. If the color value of the VSI instance is the same as the color value of the SRv6 TE policy, proceed to step 8. If they are different, edit the color value of the local VSI instance or the SRv6 TE policy.

d. If the issue persists, proceed to the next step.

8. Verify that the SRv6 TE policy is effective.

a. Execute the display segment-routing ipv6 te policy command to check the value of the Status field. If the value is Up, the SRv6 TE policy is effective; proceed to step 9. If the value is Down, the SRv6 TE policy is not effective. To resolve this issue, see "SRv6 TE policy cannot take effect."

b. If the issue persists, proceed to the next step.

9. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
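Steps 6 through 8 amount to a lookup: take the color carried in the EVPN route, fall back to the default color of the VSI when the route carries none, and steer traffic only to an SRv6 TE policy with that color that is Up. The order of precedence is inferred from the order of steps 6 and 7 and is an assumption, as is matching the endpoint as well (an SR policy is identified by its color and endpoint). The Policy class and select_policy function are hypothetical names used for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Fields as shown by display segment-routing ipv6 te policy (hypothetical model)."""
    name: str
    color: int
    endpoint: str
    status: str  # "Up" or "Down"


def select_policy(policies, route_color: Optional[int],
                  vsi_default_color: Optional[int], endpoint: str) -> Optional[Policy]:
    """Pick the Up policy whose color and endpoint match (route color first, then VSI default)."""
    color = route_color if route_color is not None else vsi_default_color
    if color is None:
        return None
    wanted = ipaddress.IPv6Address(endpoint)
    for policy in policies:
        if (policy.color == color
                and ipaddress.IPv6Address(policy.endpoint) == wanted
                and policy.status == "Up"):
            return policy
    return None


# Hypothetical policies for illustration.
policies = [Policy("p1", 100, "4::4", "Up"), Policy("p2", 200, "4::4", "Down")]
print(select_policy(policies, 100, None, "4::4"))  # p1: route color matches (step 6)
print(select_policy(policies, None, 200, "4::4"))  # None: p2 has the VSI color but is Down (step 8)
print(select_policy(policies, None, 300, "4::4"))  # None: no policy has color 300 (step 7)
```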

Troubleshooting VXLAN issues

Unreachable centralized VXLAN IP gateway

Symptom

As shown in Figure 110, a VXLAN tunnel is established between the VTEP and the centralized VXLAN IP gateway, and a VSI interface on the centralized VXLAN IP gateway acts as the gateway interface. The server connected to the VTEP fails to ping the centralized VXLAN IP gateway.

Figure 110 Network diagram

Common causes

The following are the common causes of this type of issue:

· The status of the VXLAN tunnel is down.

· The source or destination IP address of the VXLAN tunnel is incorrect.

· The status of the VXLAN IP gateway interface is down.

· No ARP entry for the ping operation exists on the device.

Troubleshooting flowchart

Figure 111 shows the troubleshooting flowchart.

Figure 111 Flowchart for troubleshooting an unreachable centralized VXLAN IP gateway

Solution

1. On the VTEP connected to the server, view information about the VXLAN tunnels of the VXLAN to which the server belongs.

a. Execute the display l2vpn vsi verbose command to check the VXLAN ID of the VXLAN network to which the server belongs, and the name of the VXLAN tunnel associated with the VXLAN network (Tunnel Name field).

<Sysname> display l2vpn vsi verbose

VSI Name: vpna

VSI Index : 0

VSI State : Up

MTU : 1500

Bandwidth : Unlimited

Broadcast Restrain : Unlimited

Multicast Restrain : Unlimited

Unknown Unicast Restrain: Unlimited

MAC Learning : Enabled

MAC Table Limit : -

MAC Learning rate : -

Drop Unknown : -

Flooding : Enabled

Statistics : Disabled

VXLAN ID : 10

Tunnels:

Tunnel Name Link ID State Type Flood proxy

Tunnel1 0x5000001 Up Manual Disabled

Tunnel2 0x5000002 Up Manual Disabled

ACs:

AC Link ID State Type

GE2/0/1 srv1000 0 Up Manual

b. Execute the display interface tunnel command for the VXLAN tunnel identified in step a, and examine the current state, source IP address, and destination IP address of the tunnel.

<Sysname> display interface tunnel 2

Tunnel2

Current state: UP

Line protocol state: UP

Description: Tunnel2 Interface

Bandwidth: 64 kbps

Maximum transmission unit: 1464

Internet protocol processing: Disabled

Last clearing of counters: Never

Tunnel source 2.2.2.2, destination 1.1.1.1

Tunnel protocol/transport UDP_VXLAN/IP

Last 300 seconds input rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec

Last 300 seconds output rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec

Input: 0 packets, 0 bytes, 0 drops

Output: 0 packets, 0 bytes, 0 drops

- If the VXLAN tunnel is up, go to step 3.

- If the VXLAN tunnel is down, go to step 2.

2. On the VTEP, verify that the source IP address of the VXLAN tunnel is a local IP address and that the destination IP address is reachable.

¡ Execute the display ip interface brief command to verify that the source IP address of the VXLAN tunnel is a local IP address. If not, use the source command to modify the source IP address of the VXLAN tunnel.

<Sysname> display ip interface brief

*down: administratively down

(s): spoofing (l): loopback

Interface Physical Protocol IP address VPN instance Description

Loop1 up up(s) 2.2.2.2 -- --

……

MTunnel0 down down -- aaa --

Vlan1 *down down -- -- --

¡ Execute the display fib command to identify whether the FIB table contains an entry for the destination IP address of the VXLAN tunnel. If it does not, modify the routing configuration to ensure Layer 3 connectivity to that address.

<Sysname> display fib

Destination count: 4 FIB entry count: 4

Flag:

U:Useable G:Gateway H:Host B:Blackhole D:Dynamic S:Static

R:Relay F:FRR

Destination/Mask Nexthop Flag OutInterface/Token Label

0.0.0.0/32 127.0.0.1 UH InLoop0 Null

2.2.2.2/32 127.0.0.1 UH InLoop0 Null

1.1.1.1/32 127.0.0.1 UH InLoop0 Null

127.0.0.0/32 127.0.0.1 UH InLoop0 Null

3. Execute the display interface vsi-interface brief command on the VXLAN IP gateway to view information about the VXLAN IP gateway interface, including the gateway interface number (Interface field), gateway interface state (Link and Protocol fields), and gateway address (Primary IP field).

<Sysname> display interface Vsi-interface brief

Brief information on interfaces in route mode:

Link: ADM - administratively down; Stby - standby

Protocol: (s) - spoofing

Interface Link Protocol Primary IP Description

Vsi1 DOWN DOWN 192.168.1.1

¡ If the VXLAN IP gateway interface is down, check whether the shutdown command is configured for the VSI interface or whether the VSI bound to the VSI interface is up.

- If the shutdown command is configured for the VSI interface, execute the undo shutdown command.

- If the VSI bound to the VSI interface is down, execute the display l2vpn vsi command to check the AC status of the VSI. If an AC is down, verify that the AC configuration is correct and the AC-attached interface is up. If the AC configuration is incorrect or the AC-attached interface is down, modify the AC configuration or troubleshoot the interface issue.

¡ If the VXLAN IP gateway interface is up, execute the display arp command to check whether the ARP information for the gateway IP address has been learned.

<Sysname> display arp

Type: S-Static D-Dynamic O-Openflow R-Rule M-Multiport I-Invalid

IP address MAC address VLAN/VSI Interface/Link ID Aging Type

10.1.1.1 0001-0001-0001 0 Tunnel2 17 D

10.1.1.11 0001-0001-0001 0 Tunnel2 20 D

20.1.1.1 0002-0002-0002 1 Tunnel3 17 D

20.1.1.12 0002-0002-0002 1 Tunnel3 20 D

- If yes, go to step 4.

- If not, execute the display arp count command to check whether the number of learned entries has reached the maximum number of dynamic ARP entries for the device or interface. If yes, execute the arp max-learning-num or arp max-learning-number command to increase the maximum number of dynamic ARP entries.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Diagnostic information collected by using the display diagnostic-information command.
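The decisions in steps 1 and 2 can be scripted offline against command output copied from the VTEP. The following Python sketch is a hypothetical helper, not an H3C tool: it assumes the text layouts shown in the display interface tunnel and display ip interface brief examples in this section, and the names parse_tunnel, local_ipv4_addresses, and next_step are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import Optional, Set

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"


@dataclass
class TunnelInfo:
    name: str
    state: str                  # "Current state" field, for example "UP"
    source: Optional[str]       # tunnel source address
    destination: Optional[str]  # tunnel destination address


def parse_tunnel(output: str) -> TunnelInfo:
    """Parse text copied from 'display interface tunnel <number>'."""
    name_m = re.search(r"^\s*(Tunnel\d+)\s*$", output, re.M)
    state_m = re.search(r"Current state:\s*(\S+)", output)
    ends_m = re.search(rf"Tunnel source ({IPV4}), destination ({IPV4})", output)
    return TunnelInfo(
        name=name_m.group(1) if name_m else "unknown",
        state=state_m.group(1).upper() if state_m else "UNKNOWN",
        source=ends_m.group(1) if ends_m else None,
        destination=ends_m.group(2) if ends_m else None,
    )


def local_ipv4_addresses(brief_output: str) -> Set[str]:
    """Collect IPv4 addresses (mask stripped) from 'display ip interface brief'."""
    return set(re.findall(rf"\b({IPV4})(?:/\d+)?\b", brief_output))


def next_step(tunnel: TunnelInfo, local_ips: Set[str]) -> str:
    """Map the tunnel state and source address to the next step of the procedure."""
    if tunnel.state == "UP":
        return "step 3: check the VXLAN IP gateway interface"
    if tunnel.source not in local_ips:
        return "step 2: source is not a local address; correct it with the source command"
    return "step 2: source is local; check the FIB for the destination address"
```

The helper only reads text. It does not replace checking the device itself, and field names might differ between software versions.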

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Disconnection between VSI interfaces on two VTEPs

Symptom

As shown in Figure 112, a VXLAN tunnel is manually set up between the VTEPs, and VSI interfaces are configured as gateway interfaces on the VTEPs. The VSI interfaces on the two VTEPs cannot ping each other.

NOTE:

This section introduces the troubleshooting methods for the ADWAN scenario.

Figure 112 ADWAN network diagram

Common causes

The following are the common causes of this type of issue:

· A VSI interface has not been associated with a VSI.

· A VSI interface is down.

· The IP addresses of the VSI interfaces are not in the same subnet.

· The VXLAN tunnel is down.

· The source or destination IP address of the VXLAN tunnel is incorrect.

· The VSI is down.

Troubleshooting flow

Figure 113 shows the troubleshooting flowchart.

Figure 113 Flowchart for troubleshooting disconnection between VSI interfaces on two VTEPs

Solution

1. Execute the display ip interface brief command on the VTEPs to view brief information about interfaces and their IP addresses. For the unreachable gateway IP address, identify the name and state of the VSI interface that owns the address.

[Sysname] display ip interface brief

*down: administratively down

(s): spoofing (l): loopback

Interface Physical Protocol IP address/Mask VPN instance Description

GE2/0/1 up up 192.168.1.114/24 -- --

GE2/0/3 down down -- -- --

RAGG1 down down -- -- --

Vsi1 down down 1.1.1.1/24 -- --

2. Execute the display l2vpn vsi verbose command on the VTEPs to view the gateway interface (Gateway Interface field) and VXLAN tunnels (Tunnel Name field) associated with the VSI.

[Sysname] display l2vpn vsi verbose

VSI Name: aaa

VSI Index : 0

VSI State : Up

MTU : 1500

Bandwidth : -

Broadcast Restrain : 5120 kbps

Multicast Restrain : 5120 kbps

Unknown Unicast Restrain: 5120 kbps

MAC Learning : Enabled

MAC Table Limit : -

MAC Learning rate : Unlimited

Drop Unknown : Disabled

PW Redundancy Mode : Slave

Flooding : Enabled

Statistics : Disabled

Gateway Interface : VSI-interface 1

VXLAN ID : 100

Tunnel Statistics : Disabled

Tunnels:

Tunnel Name Link ID State Type Flood Proxy Split horizon

Tunnel1 0x5000001 UP Manual Disabled Enabled

3. Check the output from the display l2vpn vsi verbose command for the VSI associated with VSI-interface 1.

¡ If no VSI is associated with the VSI interface, use the gateway vsi-interface command to configure the VSI interface as the gateway interface of the VSI.

¡ If a VSI is associated with the VSI interface, perform the following tasks for the VSI interface:

- Identify whether the shutdown command has been executed on the VSI interface. If yes, use the undo shutdown command to bring up the VSI interface.

- Verify that the IP addresses of the VSI interfaces on the two VTEPs are in the same subnet. If not, assign IP addresses from the same subnet to the VSI interfaces.

4. Check the output from the display l2vpn vsi verbose command for VXLAN tunnels of the VSI.

¡ If no VXLAN tunnel is associated, create a VXLAN tunnel and use the tunnel command to associate it with the VSI.

¡ If a VXLAN tunnel is associated, execute the display interface tunnel command for the tunnel identified in step 2 to check its state, source IP address, and destination IP address.

<Sysname> display interface tunnel 2

Tunnel2

Current state: UP

Line protocol state: UP

Description: Tunnel2 Interface

Bandwidth: 64 kbps

Maximum transmission unit: 1464

Internet protocol processing: Disabled

Last clearing of counters: Never

Tunnel source 2.2.2.2, destination 1.1.1.1

Tunnel protocol/transport UDP_VXLAN/IP

Last 300 seconds input rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec

Last 300 seconds output rate: 0 bytes/sec, 0 bits/sec, 0 packets/sec

Input: 0 packets, 0 bytes, 0 drops

Output: 0 packets, 0 bytes, 0 drops

5. On the VTEPs, verify that the source IP address of the VXLAN tunnel is a local IP address, that the destination IP address is an address on the remote VTEP, and that the destination IP address is reachable.

¡ Execute the display ip interface brief command to identify whether the source IP address of the VXLAN tunnel is a local IP address. If not, use the source command to modify the source IP address of the VXLAN tunnel.

<Sysname> display ip interface brief

*down: administratively down

(s): spoofing (l): loopback

Interface Physical Protocol IP address VPN instance Description

Loop1 up up(s) 2.2.2.2 -- --

……

MTunnel0 down down -- aaa --

Vlan1 *down down -- -- --

¡ Execute the display fib command to identify whether the FIB table contains an entry for the destination IP address of the VXLAN tunnel, and use the ping command to verify connectivity between the source and destination IP addresses of the tunnel. If no FIB entry is found, modify the routing configuration to ensure Layer 3 connectivity to the destination IP address.

<Sysname> display fib

Destination count: 4 FIB entry count: 4

Flag:

U:Useable G:Gateway H:Host B:Blackhole D:Dynamic S:Static

R:Relay F:FRR

Destination/Mask Nexthop Flag OutInterface/Token Label

0.0.0.0/32 127.0.0.1 UH InLoop0 Null

2.2.2.2/32 127.0.0.1 UH InLoop0 Null

1.1.1.1/32 127.0.0.1 UH InLoop0 Null

127.0.0.0/32 127.0.0.1 UH InLoop0 Null

6. Execute the display l2vpn vsi verbose command on the VTEPs to identify whether the VSI is up.

¡ If the VSI is down, check whether the shutdown command has been configured on the VSI. If yes, execute the undo shutdown command.

¡ If the VSI is up, go to step 7.

7. Perform steps 1 through 6.

8. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Diagnostic information collected by using the display diagnostic-information command.
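Steps 3 and 5 include two checks that are easy to get wrong by eye: whether the two VSI interface addresses share a subnet, and whether any FIB prefix covers the tunnel destination. The following Python sketch performs both with the standard ipaddress module. It is a hypothetical offline aid: the prefixes are assumed to be copied from the Destination/Mask column of the display fib output, and the device's own forwarding decision, confirmed with ping, remains authoritative.

```python
import ipaddress
from typing import Iterable, Optional


def same_subnet(local_if: str, remote_if: str) -> bool:
    """True if two interface addresses in a.b.c.d/len form are in the same subnet.

    Both the network address and the prefix length must match, so a /24 on one
    VTEP and a /25 on the other is reported as a mismatch.
    """
    a = ipaddress.ip_interface(local_if)
    b = ipaddress.ip_interface(remote_if)
    return a.network == b.network


def fib_lookup(destination: str, fib_prefixes: Iterable[str]) -> Optional[str]:
    """Longest-prefix match of an address against Destination/Mask strings."""
    dst = ipaddress.ip_address(destination)
    best = None
    for prefix in fib_prefixes:
        net = ipaddress.ip_network(prefix, strict=False)
        if dst in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return str(best) if best is not None else None
```

With the sample FIB in step 5, fib_lookup("1.1.1.1", ["0.0.0.0/32", "2.2.2.2/32", "1.1.1.1/32", "127.0.0.0/32"]) returns "1.1.1.1/32", while a destination with no covering prefix returns None.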

Related alarm and log messages

Alarm messages

Module Name: HH3C-IF-EXT-MIB

· hh3cIfPortUp (1.3.6.1.4.1.25506.2.40.3.0.5)

Log messages

· IFNET/3/PHY_UPDOWN

· IFNET/5/LINK_UPDOWN

Troubleshooting EVPN issues

Troubleshooting EVPN VXLAN

Intra-VXLAN tunnel setup failure in an EVPN VXLAN network with distributed gateways

Symptom

In an EVPN network with distributed gateways, tunnels cannot be established between VTEPs in the same VXLAN.

Common causes

The following are the common causes of this type of issue:

· Type-2 EVPN routes (MAC/IP advertisement routes) and type-3 EVPN routes (IMET routes) have not been received.

· The RT configuration is incorrect on EVPN instances.

Troubleshooting flow

Troubleshoot the issue by using the following process:

1. Verify that type-2 routes have been received.

2. Verify that type-3 routes have been received.

3. Verify that the RT configuration for EVPN instances is correct.

Figure 114 shows the troubleshooting process.

Figure 114 Flowchart for troubleshooting intra-VXLAN tunnel setup failure

Solution

To resolve the issue:

1. Execute the display bgp l2vpn evpn command on the local end to identify whether the local end has advertised type-2 or type-3 routes to the peer end. If a route reflector (RR) exists in the network, specify the RR address in the command. If not, specify the peer end's address. For example, the following output indicates that the local end has advertised type-2 and type-3 routes to 4.4.4.4.

<Sysname> display bgp l2vpn evpn peer 4.4.4.4 advertised-routes

Total number of routes: 2

BGP local router ID is 1.1.1.1

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external,

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Route distinguisher: 2:2

Total number of routes: 2

Network NextHop MED LocPrf Path/Ogn

* > [2][0][48][0e86-19b6-0308][0][0.0.0.0]/104

0.0.0.0 0 100 i

* > [3][0][32][1.1.1.1]/80

0.0.0.0 0 100 i

¡ If the local end has advertised type-2 or type-3 routes to the peer end, go to step 2.

¡ If the local end has advertised neither type-2 nor type-3 routes to the peer end, verify that BGP is correctly configured for EVPN. For more information, see EVPN VXLAN configuration in EVPN Configuration Guide.

2. Execute the display bgp l2vpn evpn command on the peer end to identify whether the peer end has advertised type-2 or type-3 routes to the local end. If an RR exists in the network, specify the RR address in the command. If not, specify the local end's address. For example, the following output indicates that the peer end has advertised type-2 and type-3 routes to 4.4.4.4.

<Sysname> display bgp l2vpn evpn peer 4.4.4.4 advertised-routes

Total number of routes: 2

BGP local router ID is 3.3.3.3

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external,

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Route distinguisher: 1:1

Total number of routes: 2

Network NextHop MED LocPrf Path/Ogn

* > [2][0][48][0e86-23cf-0507][0][0.0.0.0]/104

0.0.0.0 0 100 i

* > [3][0][32][3.3.3.3]/80

0.0.0.0 0 100 i

¡ If the peer end has advertised type-2 or type-3 routes to the local end, go to step 3.

¡ If the peer end has advertised neither type-2 nor type-3 routes to the local end, verify that BGP is correctly configured for EVPN. For more information, see EVPN VXLAN configuration in EVPN Configuration Guide.

3. Execute the display this command in VSI view on both ends to identify whether the export and import route targets are correct.

[Sysname-vsi-aaa] display this

vsi aaa

vxlan 10

evpn encapsulation vxlan

route-distinguisher 2:2

vpn-target 1:1 export-extcommunity

vpn-target 2:2 import-extcommunity

return

¡ If the route targets on the local and peer ends do not match, that is, none of the export targets on one end matches an import target on the other end, execute the vpn-target command in VSI view to correct the route targets.

¡ If the route targets on the local and peer ends match, go to step 4.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Diagnostic information collected by using the display diagnostic-information command.
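Step 3 compares route targets by eye. An EVPN route is imported only if one of its route targets matches an import target on the receiving end, so each end's export targets must overlap the other end's import targets. The following Python sketch checks this using display this output from VSI view. It assumes the Comware vpn-target syntax in which an optional direction keyword (import-extcommunity, export-extcommunity, or both) follows the targets and omitting the keyword applies the targets in both directions; parse_vpn_targets and rt_mismatches are hypothetical names.

```python
from typing import Dict, List, Set

# Direction keyword that can end a vpn-target line. With no keyword, this
# sketch assumes the targets apply in both directions.
DIRECTIONS = {
    "import-extcommunity": ("import",),
    "export-extcommunity": ("export",),
    "both": ("import", "export"),
}


def parse_vpn_targets(config: str) -> Dict[str, Set[str]]:
    """Collect import and export route targets from 'display this' output."""
    targets: Dict[str, Set[str]] = {"import": set(), "export": set()}
    for line in config.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "vpn-target":
            continue
        rts, directions = tokens[1:], ("import", "export")
        if rts and rts[-1] in DIRECTIONS:
            directions, rts = DIRECTIONS[rts[-1]], rts[:-1]
        for d in directions:
            targets[d].update(rts)
    return targets


def rt_mismatches(local: Dict[str, Set[str]], peer: Dict[str, Set[str]]) -> List[str]:
    """Describe each direction in which no exported target is imported."""
    problems = []
    if not local["export"] & peer["import"]:
        problems.append("no local export target is imported by the peer")
    if not peer["export"] & local["import"]:
        problems.append("no peer export target is imported locally")
    return problems
```

With the local configuration in step 3 (export 1:1, import 2:2), the peer must export 2:2 and import 1:1 for rt_mismatches to return an empty list.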

Related alarm and log messages

Alarm messages

None.

Log messages

None.

Inter-VXLAN tunnel setup failure in an EVPN VXLAN network with distributed gateways

Symptom

In an EVPN network with distributed gateways, tunnels cannot be established between VTEPs in different VXLANs.

Common causes

The following are the common causes of this type of issue:

· Type-2 EVPN routes and type-5 EVPN routes (IP prefix advertisement routes) have not been received.

· The RT configuration is incorrect on VPN instances.

Troubleshooting flow

Troubleshoot the issue by using the following process:

1. Verify that type-2 routes have been received.

2. Verify that type-5 routes have been received.

3. Verify that the RT configuration for VPN instances is correct.

Figure 115 shows the troubleshooting process.

Figure 115 Flowchart for troubleshooting inter-VXLAN tunnel setup failure

Solution

To resolve the issue:

1. Execute the display bgp l2vpn evpn command on the local end to identify whether the local end has advertised type-2 or type-5 routes to the peer end. If an RR exists in the network, specify the RR address in the command. If not, specify the peer end's address. For example, the following output indicates that the local end has advertised type-2 and type-5 routes to 4.4.4.4.

<Sysname> display bgp l2vpn evpn peer 4.4.4.4 advertised-routes

Total number of routes: 3

BGP local router ID is 1.1.1.1

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external,

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Route distinguisher: 1:1

Total number of routes: 1

Network NextHop MED LocPrf Path/Ogn

* > [5][0][24][10.1.1.0]/80

0.0.0.0 0 100 i

Route distinguisher: 2:2

Total number of routes: 2

Network NextHop MED LocPrf Path/Ogn

* > [2][0][48][0e86-19b6-0308][0][0.0.0.0]/104

0.0.0.0 0 100 i

* > [3][0][32][1.1.1.1]/80

0.0.0.0 0 100 i

¡ If the local end has advertised type-2 or type-5 routes to the peer end, go to step 2.

¡ If the local end has advertised neither type-2 nor type-5 routes to the peer end, verify that BGP is correctly configured for EVPN. For more information, see EVPN VXLAN configuration in EVPN Configuration Guide.

2. Execute the display bgp l2vpn evpn command on the peer end to identify whether the peer end has advertised type-2 or type-5 routes to the local end. If an RR exists in the network, specify the RR address in the command. If not, specify the local end's address. For example, the following output indicates that the peer end has advertised type-2 and type-5 routes to 4.4.4.4.

<Sysname> display bgp l2vpn evpn peer 4.4.4.4 advertised-routes

Total number of routes: 3

BGP local router ID is 3.3.3.3

Status codes: * - valid, > - best, d - dampened, h - history,

s - suppressed, S - stale, i - internal, e - external,

a - additional-path

Origin: i - IGP, e - EGP, ? - incomplete

Route distinguisher: 1:1

Total number of routes: 2

Network NextHop MED LocPrf Path/Ogn

* > [2][0][48][0e86-23cf-0507][0][0.0.0.0]/104

0.0.0.0 0 100 i

* > [3][0][32][3.3.3.3]/80

0.0.0.0 0 100 i

Route distinguisher: 3:3

Total number of routes: 1

Network NextHop MED LocPrf Path/Ogn

* > [5][0][24][10.1.1.0]/80

0.0.0.0 0 100 i

¡ If the peer end has advertised type-2 or type-5 routes to the local end, go to step 3.

¡ If the peer end has advertised neither type-2 nor type-5 routes to the local end, verify that BGP is correctly configured for EVPN. For more information, see EVPN VXLAN configuration in EVPN Configuration Guide.

3. Execute the display this command in L3VNI VPN instance view on both ends to identify whether the export and import route targets are correct.

[Sysname-vpn-instance-vpna] display this

ip vpn-instance vpna

route-distinguisher 1:1

address-family evpn

vpn-target 1:1 import-extcommunity

vpn-target 1:1 export-extcommunity

return

¡ If the route targets on the local and peer ends do not match, that is, none of the export targets on one end matches an import target on the other end, execute the vpn-target command to correct the route targets.

¡ If the route targets on the local and peer ends match, go to step 4.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Diagnostic information collected by using the display diagnostic-information command.
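When the output of steps 1 and 2 contains many routes, you can count the advertised routes by EVPN route type (the first bracketed field of each prefix) for each route distinguisher. The following Python sketch is a hypothetical helper that assumes the advertised-routes layout shown in this section, where status codes precede each prefix. It reports which of the required route types, type-2 and type-5 by default, are missing.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional

RD_LINE = re.compile(r"^\s*Route distinguisher:\s*(\S+)")
# Status codes (*, >, i, e, and so on) precede the prefix; its first field is the route type.
ROUTE_LINE = re.compile(r"^[\s*>a-zA-Z]*\[(\d+)\]\[")


def route_types_by_rd(output: str) -> Dict[str, Counter]:
    """Count routes per route type under each 'Route distinguisher' heading."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    rd: Optional[str] = None
    for line in output.splitlines():
        rd_m = RD_LINE.match(line)
        if rd_m:
            rd = rd_m.group(1)
            continue
        route_m = ROUTE_LINE.match(line)
        if route_m and rd is not None:
            counts[rd][int(route_m.group(1))] += 1
    return dict(counts)


def missing_types(output: str, required: Iterable[int] = (2, 5)) -> List[int]:
    """Route types in 'required' that do not appear under any route distinguisher."""
    seen = set()
    for counter in route_types_by_rd(output).values():
        seen.update(counter)
    return [t for t in required if t not in seen]
```

Comparing the per-RD counts with the "Total number of routes" lines is also a quick way to spot output that was truncated when copied.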

Related alarm and log messages

Alarm messages

None.

Log messages

None.

Layer 2 VXLAN business traffic interruption

Symptom

A VXLAN network cannot forward Layer 2 VXLAN business traffic.

Common causes

The following are the common causes of this type of issue:

· ACs or VXLAN tunnels are not established.

· MAC addresses are not learned.

Troubleshooting flow

Figure 116 shows the troubleshooting flowchart.

Figure 116 Flowchart for troubleshooting Layer 2 VXLAN business traffic interruption

Solution

To resolve the issue:

1. Execute the display l2vpn vsi verbose command to view the VXLAN tunnels and ACs of the involved VSI.

<Sysname> display l2vpn vsi verbose

VSI Name: vpna

VSI Index : 0

VSI State : Up

MTU : 1500

Bandwidth : Unlimited

Broadcast Restrain : Unlimited

Multicast Restrain : Unlimited

Unknown Unicast Restrain: Unlimited

MAC Learning : Enabled

MAC Table Limit : -

MAC Learning rate : -

Drop Unknown : -

Flooding : Enabled

Statistics : Disabled

VXLAN ID : 10

Tunnels:

Tunnel Name Link ID State Type Flood proxy

Tunnel1 0x5000001 Up Manual Disabled

ACs:

AC Link ID State Type

GE2/0/1 srv1000 0 Up Manual

¡ If both the ACs and VXLAN tunnels are up, go to step 2.

¡ If an AC is down, modify the incorrect AC configuration.

¡ If a VXLAN tunnel is down, troubleshoot the issue as described in "Intra-VXLAN tunnel setup failure in an EVPN VXLAN network with distributed gateways."

2. Execute the display l2vpn mac-address command to check the VSI MAC address table for the MAC addresses of endpoints in the network and the total number of learned MAC address entries.

<Sysname> display l2vpn mac-address

* - The output interface is issued to another VSI

MAC Address State VSI Name Link ID/Name Aging

0001-0001-0001 Static aaa Tunnel1 NotAging

52f6-bc1e-0d06 Dynamic vpna GE2/0/1 Aging

--- 2 mac address(es) found ---

¡ If the endpoint MAC addresses have been learned, go to step 3.

¡ If the endpoint MAC addresses are not learned, execute the display this command in VSI view to check whether the mac-table limit and mac-table limit drop-unknown commands are configured for the VSI. If they are and the MAC address learning limit has been reached, use the mac-table limit command to increase the MAC address learning limit of the VSI, or remove the limit.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Diagnostic information collected by using the display diagnostic-information command.
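The checks in step 2 can be sketched in Python against text copied from the display l2vpn mac-address command. The sketch assumes the column order shown in the sample output (MAC Address, State, VSI Name, Link ID/Name, Aging). The expected endpoint MAC list and the limit value are hypothetical inputs that you would take from your own inventory and from the mac-table limit setting in the VSI configuration. Counting rows per VSI only approximates how the device counts entries against its limit.

```python
import re
from typing import Dict, Iterable, List

MAC_ROW = re.compile(
    r"^\s*([0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4})\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
)


def parse_l2vpn_macs(output: str) -> List[Dict[str, str]]:
    """Parse the rows of 'display l2vpn mac-address' output."""
    rows = []
    for line in output.splitlines():
        m = MAC_ROW.match(line)
        if m:
            mac, state, vsi, link, aging = m.groups()
            rows.append({"mac": mac.lower(), "state": state, "vsi": vsi,
                         "link": link, "aging": aging})
    return rows


def missing_endpoints(rows: List[Dict[str, str]], vsi: str,
                      expected: Iterable[str]) -> List[str]:
    """Expected endpoint MACs (hypothetical inventory) not learned in the VSI."""
    learned = {r["mac"] for r in rows if r["vsi"] == vsi}
    return sorted({m.lower() for m in expected} - learned)


def limit_reached(rows: List[Dict[str, str]], vsi: str, limit: int) -> bool:
    """Rough check of learned rows in the VSI against a mac-table limit value."""
    return sum(1 for r in rows if r["vsi"] == vsi) >= limit
```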

Related alarm and log messages

Alarm messages

None.

Log messages

None.

Layer 3 VXLAN business traffic interruption

Symptom

A VXLAN network cannot forward Layer 3 VXLAN business traffic.

Common causes

The following are the common causes of this type of issue:

· ACs or VXLAN tunnels are not established.

· The device's router MAC address is incorrect.

Troubleshooting flow

Figure 117 shows the troubleshooting flowchart.

Figure 117 Flowchart for troubleshooting Layer 3 VXLAN business traffic interruption

Solution

To resolve the issue:

1. Execute the display l2vpn vsi verbose command to view the VXLAN tunnels and ACs of the involved VSI.

<Sysname> display l2vpn vsi verbose

VSI Name: vpna

VSI Index : 0

VSI State : Up

MTU : 1500

Bandwidth : Unlimited

Broadcast Restrain : Unlimited

Multicast Restrain : Unlimited

Unknown Unicast Restrain: Unlimited

MAC Learning : Enabled

MAC Table Limit : -

MAC Learning rate : -

Drop Unknown : -

Flooding : Enabled

Statistics : Disabled

VXLAN ID : 10

Tunnels:

Tunnel Name Link ID State Type Flood proxy

Tunnel1 0x5000001 Up Manual Disabled

ACs:

AC Link ID State Type

GE2/0/1 srv1000 0 Up Manual

¡ If both the ACs and VXLAN tunnels are up, go to step 2.

¡ If an AC is down, modify the incorrect AC configuration.

¡ If a VXLAN tunnel is down, troubleshoot the issue as described in "Inter-VXLAN tunnel setup failure in an EVPN VXLAN network with distributed gateways."

2. Execute the display evpn routing-table command to view the routing table of the L3VNI VPN instance. Record the next hop (Nexthop field) of the route to the target endpoint's IP address (IP address field).

<Sysname> display evpn routing-table vpn-instance vpn1

Flags: E - with valid ESI A - A-D ready L - Local ES exists

VPN instance name: vpn1 Local L3VNI: 7

IP address Nexthop Outgoing interface NibID Flags

10.1.1.11 1.1.1.1 Vsi-interface3 0x18000000 EAL

3. Execute the display arp command to view the ARP information for the next hop.

<Sysname> display arp

Type: S-Static D-Dynamic O-Openflow R-Rule M-Multiport I-Invalid

IP address MAC address VLAN/VSI name Interface Aging Type

1.1.1.1 00e0-fe50-6503 vsi1 Tunnel1 960 D

¡ If the MAC address mapped to the next hop is the router MAC address, go to step 4. The router MAC address is the MAC address of the L3VNI VSI interface, which you can view by executing the display interface vsi-interface command.

¡ If the MAC address mapped to the next hop is not the router MAC address, make the mapped MAC address consistent with the MAC address of the L3VNI VSI interface. Alternatively, use the evpn global-mac command to configure an EVPN global MAC address.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Diagnostic information collected by using the display diagnostic-information command.
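Steps 2 and 3 chain two lookups: the next hop of the route to the target endpoint, and then the MAC address in the ARP entry for that next hop, which should equal the router MAC address. The following Python sketch performs the same chain on copied output. It assumes the column layouts shown in the display evpn routing-table and display arp examples, and that you supply the router MAC address read from the L3VNI VSI interface. The function names are hypothetical.

```python
import re
from typing import Optional


def normalize_mac(mac: str) -> str:
    """Reduce H-H-H, colon, or dotted MAC notation to 12 lowercase hex digits."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


def column_after(ip: str, output: str) -> Optional[str]:
    """Return the field that follows 'ip' in the first row that starts with it."""
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == ip:
            return fields[1]
    return None


def check_router_mac(endpoint_ip: str, evpn_table: str, arp_table: str,
                     router_mac: str) -> str:
    next_hop = column_after(endpoint_ip, evpn_table)  # Nexthop column
    if next_hop is None:
        return "no EVPN route to the endpoint"
    arp_mac = column_after(next_hop, arp_table)       # MAC address column
    if arp_mac is None:
        return f"no ARP entry for next hop {next_hop}"
    if normalize_mac(arp_mac) == normalize_mac(router_mac):
        return "next hop maps to the router MAC; go to step 4"
    return f"next hop {next_hop} maps to {arp_mac}, not the router MAC"
```

Normalizing both MAC strings first avoids false mismatches caused only by notation or letter case.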

Related alarm and log messages

Alarm messages

None.

Log messages

None.

Prolonged VM migration

Symptom

In an EVPN network, a VTEP does not learn the MAC address or ARP information of a VM immediately after the VM migrates to the VTEP.

Common causes

The following are the common causes of this type of issue:

· The VTEP has not learned the MAC address or ARP entry for the migrated VM.

· The VTEP has not learned the MAC address or ARP entry for the migrated VM through BGP EVPN route synchronization.

· The BGP EVPN routes synchronized between VTEPs are not optimal.

Troubleshooting flow

Figure 118 shows the troubleshooting flowchart.

Figure 118 Flowchart for troubleshooting prolonged VM migration

Solution

1. After the migration, check for the MAC address and ARP entry on the destination VTEP.

Execute the display l2vpn mac-address command and check the VSI MAC address table for the MAC address of the migrated VM.

<Sysname> display l2vpn mac-address

* - The output interface is issued to another VSI

MAC Address State VSI Name Link ID/Name Aging

52f6-bc1e-0d06 EVPN aaa Tunnel10 NotAging

0001-0001-0001 Dynamic vpna GE2/0/1 Aging

--- 2 mac address(es) found ---

Execute the display arp command and check the VSI ARP table for the ARP entry of the migrated VM.

<Sysname> display arp

Type: S-Static D-Dynamic O-Openflow R-Rule M-Multiport I-Invalid

IP address MAC address VLAN/VSI name Interface Aging Type

10.1.1.3 0001-0001-0001 vpna GE2/0/1 960 D

1.1.1.4 00e0-fe60-5000 vsi2 Tunnel1 -- M

¡ If a MAC address or ARP entry exists for the migrated VM, go to step 2.

¡ If no MAC address or ARP entry exists for the migrated VM, the VTEP has not learned the MAC address or ARP information of the VM. Bring up the VM on the VTEP so that the VTEP can learn its MAC address and ARP information.

2. Before the migration, check on the VTEP whether the MAC address or ARP information of the VM to be migrated has been synchronized through BGP EVPN routes.

Execute the display evpn route mac command to verify that the MAC address of the migrated VM has been learned from synchronized BGP EVPN routes. The value B in the Flags field indicates that a MAC address entry is learned from BGP EVPN routes.

<Sysname> display evpn route mac

Flags: D - Dynamic B - BGP L - Local active

G - Gateway S - Static M - Mapping I - Invalid

VSI name: bbb

EVPN instance: -

MAC address Link ID/Name Flags Encap Next hop

0000-0000-000a 1 DL VXLAN -

0001-0001-0001 Tunnel1 B VXLAN 2.2.2.2

Execute the display evpn route arp command to verify that the ARP information of the migrated VM has been learned from synchronized BGP EVPN routes. The value B in the Flags field indicates that an ARP entry is learned from BGP EVPN routes.

<Sysname> display evpn route arp

Flags: D - Dynamic B - BGP L - Local active

G - Gateway S - Static M - Mapping I - Invalid

VPN instance: vpn1 Interface: Vsi-interface1

IP address MAC address Router MAC VSI index Flags

10.1.1.1 0001-0001-0001 a0ce-7e40-0400 0 B

10.1.1.11 0001-0001-0002 a0ce-7e40-0400 0 DL

10.1.1.101 0001-0011-0101 a0ce-7e40-0400 0 SL

10.1.1.102 0001-0011-0102 0011-9999-0000 0 BS

¡ If a MAC address entry or ARP entry has been learned through BGP EVPN route synchronization for the migrated VM, go to step 3.

¡ If no MAC address entry or ARP entry has been learned through BGP EVPN route synchronization for the migrated VM, execute the vpn-target command to modify the route targets of the local EVPN instance. Make sure the EVPN instance's route targets on the local and peer ends are consistent.

3. Execute the display bgp l2vpn evpn command and verify that the MAC/IP advertisement route carrying the MAC address and ARP information of the migrated VM is the optimal route, that is, the State field includes best. In the following output, the MAC/IP advertisement route for MAC address 0001-0203-0405 and IP address 5.5.5.5/32 has a State field that includes best.

<Sysname> display bgp l2vpn evpn route-distinguisher 1.1.1.1:100 [2][5][48][0001-0203-0405][32][5.5.5.5] 136

BGP local router ID: 172.16.250.133

Local AS number: 100

Route distinguisher: 1.1.1.1:100

Total number of routes: 1

Paths: 1 available, 1 best

BGP routing table information of [2][5][48][0001-0203-0405][32][5.5.5.5]/136:

From : 10.1.1.2 (192.168.56.17)

Rely nexthop : 10.1.1.2

Original nexthop: 10.1.1.2

OutLabel : NULL

Ext-Community : <RT: 1:2>, <RT: 1:3>, <RT: 1:4>, <RT: 1:5>, <RT: 1:6>, <RT: 1:7>, <Encapsulation Type: VXLAN>, <Router's Mac: 0006-0708-0910>, <MAC Mobility: Flag 0, SeqNum 2>, <Default GateWay>

RxPathID : 0x0

TxPathID : 0x0

AS-path : 200

Origin : igp

Attribute value : MED 0, pref-val 0

State : valid, external, best

IP precedence : N/A

QoS local ID : N/A

Traffic index : N/A

EVPN route type : MAC/IP advertisement route

ESI : 0001.0203.0405.0607.0809

Ethernet tag ID : 5

MAC address : 0001-0001-0001

IP address : 10.1.1.1/32

MPLS label1 : 10

MPLS label2 : 100

Re-origination : Enable

¡ If the MAC/IP advertisement route is an optimal route, go to step 4.

¡ If the MAC/IP advertisement route is not an optimal route, modify the routing configuration so that the route becomes the optimal route.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Diagnostic information collected by using the display diagnostic-information command.
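Step 3 comes down to reading two values from the route details: whether the State field includes best, and the MAC Mobility sequence number in the Ext-Community field. Under the EVPN MAC mobility procedure (RFC 7432), a VTEP that learns a moved MAC address advertises it with a higher sequence number, and the route with the higher sequence number is preferred. The following Python sketch extracts both values from copied output. It assumes the field layout shown in the route detail example. The function names are hypothetical, and newer_route treats a missing MAC Mobility attribute as sequence number 0, an assumption of this sketch.

```python
import re
from typing import Optional


def route_is_best(detail: str) -> bool:
    """True if the State field of a BGP route detail output includes 'best'."""
    m = re.search(r"^\s*State\s*:\s*(.+)$", detail, re.M)
    if m is None:
        return False
    states = [s.strip() for s in m.group(1).split(",")]
    return "best" in states


def mac_mobility_seq(detail: str) -> Optional[int]:
    """Return the MAC Mobility sequence number, or None if the attribute is absent."""
    m = re.search(r"MAC Mobility:\s*Flag\s*\d+,\s*SeqNum\s*(\d+)", detail)
    return int(m.group(1)) if m else None


def newer_route(seq_a: Optional[int], seq_b: Optional[int]) -> str:
    """Compare two routes on sequence number alone ('a', 'b', or 'tie')."""
    a, b = seq_a or 0, seq_b or 0  # missing attribute treated as 0
    if a == b:
        return "tie"
    return "a" if a > b else "b"
```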

Related alarm and log messages

Alarm messages

None.

Log messages

None.

Unavailability to access a VM after MAC migration

Symptom

After MAC address migration occurs on a VTEP, endpoints cannot access the migrated VM.

Common causes

Common causes of this issue include traffic anomalies and an incorrect outgoing interface in the MAC address entry for the VM, both of which typically result from network attacks.

Troubleshooting flow

Troubleshoot the issue by using the following process:

1. View the MAC address migration information.

2. Verify that the outgoing interface in the MAC address entry for the VM is correct.

Solution

1. Execute the display evpn route mac-mobility command to view MAC address migration information. In the following output, MAC address 1000-0000-0000 has migrated from GE2/0/1 to the local VTEP. Its move count is 10, and the S flag indicates that the MAC address has been suppressed.

<Sysname> display evpn route mac-mobility

Flags: S - Suppressed, N - Not suppressed

Suppression threshold: 5

Detection cycle : 180s

Suppression time : Permanent

VSI name : vsia

EVPN instance : -

MAC address Move count Moved from Flags Suppressed at

1000-0000-0000 10 GE2/0/1 S 15:30:30 2018/03/30

2. Execute the display l2vpn mac-address command to verify that the outgoing interface in the MAC address entry for the VM is correct. The Link ID/Name field displays the name of the interface or tunnel interface where a MAC address is learned.

<Sysname> display l2vpn mac-address

* - The output interface is issued to another VSI

MAC Address State VSI Name Link ID/Name Aging

1000-0000-0000 EVPN aaa Tunnel10 NotAging

52f6-bc1e-0d06 Dynamic vpna GE2/0/1 Aging

--- 2 mac address(es) found ---

¡ If the outgoing interface is correct, go to step 3.

¡ If the outgoing interface is incorrect, bring up the VM on the destination VTEP so that the VTEP can learn or update its forwarding entries.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Diagnostic information collected by using the display diagnostic-information command.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting ACL and QoS issues

QoS issues

Traffic failure to match a traffic class

Symptom

When you execute the display qos policy interface command to view the configuration and running status of the QoS policy on an interface, you find that the traffic on the interface does not match the traffic classes in the QoS policy.

· For a hardware forwarding product, if the Accounting enable field for traffic class 1 displays 0 (Bytes/Packets) 0 (bps), no packets on the interface have matched the traffic class. To view statistics for traffic matching a traffic class on a hardware forwarding product, you must first configure the traffic accounting action by executing the accounting command in the traffic behavior of the QoS policy.

· For a software forwarding product, if the Matched field for traffic class 1 displays 0 (Packets) 0 (Bytes), no packets on the interface have matched the traffic class. A QoS policy on a software forwarding product contains a default traffic class named default-class, which matches all traffic that does not match any other traffic class in the policy. In the following example, all traffic on the interface is matched by default-class and none by traffic class 1.

<Sysname> display qos policy interface gigabitethernet 2/0/1 inbound

Interface: GigabitEthernet2/0/1

Direction: Inbound

Policy: 1

Classifier: default-class

Matched : 213126 (Packets) 40928738 (Bytes)

5-minute statistics:

Forwarded: 20/4208 (pps/bps)

Dropped : 0/0 (pps/bps)

Operator: AND

Rule(s) :

If-match any

Behavior: be

-none-

Classifier: 1

Matched : 0 (Packets) 0 (Bytes)

5-minute statistics:

Forwarded: 0/0 (pps/bps)

Dropped : 0/0 (pps/bps)

Operator: AND

Rule(s) :

If-match acl 3000

Behavior: 1

Marking:

Remark dscp 3
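For a software forwarding product, you can scan saved display qos policy interface output for traffic classes that have no matched packets. The following Python sketch is a hypothetical helper that assumes the output layout shown above, in which each Classifier line is followed by one Matched line. It does not handle the Accounting enable field displayed by hardware forwarding products.

```python
import re

CLASSIFIER_RE = re.compile(r"Classifier:\s*(\S+)")
MATCHED_RE = re.compile(r"Matched\s*:\s*(\d+)\s*\(Packets\)")


def zero_match_classifiers(output: str) -> list:
    """Return the names of traffic classes whose Matched packet count is 0."""
    zero = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        found = CLASSIFIER_RE.match(line)
        if found:
            current = found.group(1)
            continue
        found = MATCHED_RE.match(line)
        if found and current is not None:
            if int(found.group(1)) == 0:
                zero.append(current)
            current = None  # each classifier has one Matched line
    return zero


sample = """Classifier: default-class
 Matched : 213126 (Packets) 40928738 (Bytes)
Classifier: 1
 Matched : 0 (Packets) 0 (Bytes)"""
print(zero_match_classifiers(sample))  # ['1']
```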

Common causes

The following are the common causes of this type of issue:

· The interface that has the QoS policy applied is down and is not forwarding traffic.

· A traffic class is configured incorrectly and cannot match the forwarded traffic.

· A higher-priority policy has already been applied to the traffic that matches the ACL in a traffic class of the QoS policy.

Troubleshooting flow

Figure 119 shows the troubleshooting flowchart.

Figure 119 Flowchart for troubleshooting traffic failure to match a traffic class

Solution

1. Identify whether the physical link state of the interface is normal.

Execute the display interface command on the device to check the interface status. For example:

<Sysname> display interface gigabitethernet 2/0/1

GigabitEthernet2/0/1

Interface index: 386

Current state: Administratively DOWN

Line protocol state: DOWN

…

a. If the Current state field displays Administratively DOWN, execute the undo shutdown command on the interface to bring up the interface.

b. If the Current state field displays DOWN, check the physical connection of the interface.

c. If the physical link of the interface is operating normally but the issue persists, proceed to the following steps.

2. Check the configuration of traffic classes in the QoS policy applied to the device interface.

Execute the display traffic classifier user-defined command on the device to check the configuration of user-defined traffic classes. For more information on the match criteria of the if-match command, see QoS commands in ACL and QoS Command Reference.

If the configuration of a traffic class is incorrect, execute the traffic classifier command to enter the view of the traffic class, and execute the if-match command to modify the match criteria of the traffic class. For example:

[Sysname-classifier-1] if-match dscp ef

[Sysname-classifier-1] display this

traffic classifier 1 operator or

if-match protocol ipv6

if-match dscp ef

Verify that the logical operator among the match criteria, displayed in the Operator field, is correct. With AND, a packet belongs to the traffic class only if it matches all criteria. With OR, a packet belongs to the traffic class if it matches any criterion. If the Rule(s) field contains more than one match criterion, the Operator field displays AND, and packets need to match only one of the criteria, execute the traffic classifier command with the operator parameter set to or.

<Sysname> display traffic classifier user-defined

User-defined classifier information:

Classifier: 1 (ID 101)

Operator: AND

Rule(s) :

If-match dscp ef

Classifier: 2 (ID 102)

Operator: AND

Rule(s) :

If-match dscp af21

Classifier: 3 (ID 103)

Operator: AND

Rule(s) :

If-match dscp af11
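The effect of the Operator field can be illustrated with a small model. The following Python sketch is illustrative only: packets are plain dictionaries, and the two criteria are hypothetical stand-ins for if-match protocol ipv6 and if-match dscp ef (DSCP EF is the decimal value 46).

```python
from typing import Callable, Iterable

Packet = dict
Criterion = Callable[[Packet], bool]


def match_protocol_ipv6(packet: Packet) -> bool:
    """Stand-in for 'if-match protocol ipv6'."""
    return packet.get("protocol") == "ipv6"


def match_dscp_ef(packet: Packet) -> bool:
    """Stand-in for 'if-match dscp ef' (EF is DSCP 46)."""
    return packet.get("dscp") == 46


def in_class(packet: Packet, criteria: Iterable[Criterion], operator: str) -> bool:
    """Return True if the packet belongs to the traffic class."""
    results = [criterion(packet) for criterion in criteria]
    if operator == "and":
        return all(results)   # AND: every criterion must match
    if operator == "or":
        return any(results)   # OR: one matching criterion is enough
    raise ValueError(f"unknown operator: {operator}")


# An IPv4 packet marked EF matches the class only when the operator is OR.
ipv4_ef = {"protocol": "ipv4", "dscp": 46}
print(in_class(ipv4_ef, [match_protocol_ipv6, match_dscp_ef], "and"))  # False
print(in_class(ipv4_ef, [match_protocol_ipv6, match_dscp_ef], "or"))   # True
```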

If the traffic class in the QoS policy is configured correctly but the issue persists, proceed to the following steps.

3. When a traffic class uses an ACL to match traffic, the QoS policy configured through MQC might not take effect because a higher-priority action has already been applied to the traffic matching the ACL. The priority order of these actions, from highest to lowest, is as follows:

¡ In the outbound direction: Packet filtering > Global MQC QoS policy > MQC QoS policy applied to interface.

¡ In the inbound direction: Packet filtering > MQC QoS policy applied to interface > Global MQC QoS policy.
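The priority order above can be expressed as a lookup table. The following Python sketch is a simplified model, not a description of device behavior in every case: it assumes that the listed features all match the same traffic with conflicting actions, so only the highest-priority one takes effect. The feature labels are hypothetical names for the three items in the list.

```python
from typing import Optional

# Highest priority first, per direction, as listed in step 3.
PRIORITY = {
    "outbound": ["packet-filter", "global-mqc-policy", "interface-mqc-policy"],
    "inbound": ["packet-filter", "interface-mqc-policy", "global-mqc-policy"],
}


def effective_feature(direction: str, configured: set) -> Optional[str]:
    """Return the highest-priority configured feature that acts on the traffic."""
    for feature in PRIORITY[direction]:
        if feature in configured:
            return feature
    return None


# With both a global and an interface MQC policy, the winner depends on direction.
both = {"global-mqc-policy", "interface-mqc-policy"}
print(effective_feature("inbound", both))   # interface-mqc-policy
print(effective_feature("outbound", both))  # global-mqc-policy
```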

Execute the display current-configuration command to identify whether higher-priority policy behaviors exist in the current running configuration. If no such configurations exist but the issue persists, proceed to the following steps.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ Configuration data and log messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

· QOS_POLICY_APPLYIF_CBFAIL

· QOS_POLICY_APPLYIF_FAIL

Ineffective QPPB policy

Symptom

Figure 120 shows a typical QoS Policy Propagation Through the Border Gateway Protocol (QPPB) network, in which Device A and Device B establish a BGP neighbor relationship. Device B advertises BGP route 10.10.10.1/24 to Device A. Device A uses a routing policy to set the IP precedence or local QoS ID value for the route, and adds the route to its routing table. When Device A receives a packet destined for 10.10.10.1/24, it classifies the packet based on the IP precedence or local QoS ID value of the matching route in its routing table and executes the corresponding action.

However, the QPPB policy does not take effect when Device A forwards packets: the packets are not classified based on the IP precedence or local QoS ID value in the routing table of Device A, and the corresponding action is not executed.

The following troubleshooting flow uses BGP IPv4 unicast address family view as an example. The flow is similar for other address families.

Figure 120 Typical QPPB network
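The QPPB workflow described above can be sketched as a route lookup followed by a classifier check. The following Python sketch is illustrative only: the routing table contents are hypothetical (modeled on the QosLocalID value 100 used later in this section), the route is written as network 10.10.10.0/24, and real devices perform this lookup in the forwarding plane.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: prefix -> QoS values set by the routing policy.
RIB = {
    ipaddress.ip_network("10.10.10.0/24"): {"ip_precedence": None, "qos_local_id": 100},
    ipaddress.ip_network("192.168.80.0/24"): {"ip_precedence": None, "qos_local_id": None},
}


def lookup(destination: str) -> Optional[dict]:
    """Longest-prefix match on the packet's destination address."""
    address = ipaddress.ip_address(destination)
    candidates = [net for net in RIB if address in net]
    if not candidates:
        return None
    return RIB[max(candidates, key=lambda net: net.prefixlen)]


def matches_qos_local_id(destination: str, local_id: int) -> bool:
    """Model of 'if-match qos-local-id <local_id>' applied through QPPB."""
    route = lookup(destination)
    return route is not None and route["qos_local_id"] == local_id


print(matches_qos_local_id("10.10.10.20", 100))   # True
print(matches_qos_local_id("192.168.80.5", 100))  # False
```

If the route has no local QoS ID (as for 192.168.80.0/24 here), the packet cannot match the traffic class, which is the symptom this section troubleshoots.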

Common causes

The following are the common causes of this type of issue:

· The physical link between the routers is down.

· The BGP route is not advertised to Device A.

· The IP precedence or local QoS ID value is not issued to the routing table.

· The QPPB policy has not been applied to the forwarding interface.

· The configuration of the QPPB policy is incorrect.

Troubleshooting flow

To troubleshoot this type of fault:

· Identify whether the physical link between routers is operating normally.

· Identify whether the BGP neighbor relationship has been established normally.

· Identify whether BGP routes have been learned from the peer.

· Identify whether the IP precedence or local QoS ID value is properly issued to the routing table.

· Identify whether the QPPB policy configuration is correct.

· Identify whether the QPPB policy is applied to the forwarding interface.

Figure 121 shows the troubleshooting flowchart of this type of fault.

Figure 121 Flowchart for troubleshooting ineffective QPPB policy

Solution

1. Check the connectivity of the link between Device A and Device B.

Execute the display interface command on Device A, Device B, and the network devices between them to check the physical link state. The following example displays information about the interconnect interface on Device A:

<Sysname> display interface gigabitethernet 2/0/1

GigabitEthernet2/0/1

Interface index: 386

Current state: Administratively DOWN

Line protocol state: DOWN

…

a. If the Current state field displays Administratively DOWN, execute the undo shutdown command on the interface to bring up the interface.

b. If the Current state field displays DOWN, check the physical connection of the interface.

c. If the physical link of the interface is operating normally but the issue persists, proceed to the following steps.

2. Identify whether the BGP neighbor relationship between Device A and Device B is normal. Device A must be able to learn routes from its BGP peer normally.

a. Execute the display ip routing-table protocol bgp command on Device A to verify that Device A has learned BGP route 10.10.10.1/24 from its BGP peer (Device B).

- If this BGP route appears in the command output, it means BGP routes are learned normally. In this case, proceed to step 3.

- If this BGP route does not appear in the command output, it means BGP routes are learned abnormally. In this case, proceed to step b.

<Sysname> display ip routing-table protocol bgp

…

Destination/Mask Proto Pre Cost NextHop Interface

192.168.80.0/24 bgp 255 10 192.168.80.10 GE2/0/1

10.10.10.1/24 bgp 255 10 2.2.2.2 GE2/0/1

…

b. Device A and Device B establish a BGP neighbor relationship through their respective Loopback 1 interfaces. Execute the display bgp peer ipv4 command on Device A to verify that the BGP neighbor relationship between Device A and Device B is normal.

- If the State field of the peer (Device B) displays Established, it means that the BGP neighbor relationship between Device A and Device B is normal.

- If not, see BGP issues in Troubleshooting Layer 3 IP routing issues to troubleshoot the BGP issue.

<Sysname> display bgp peer ipv4

BGP local router ID: 1.1.1.1

Local AS number: 100

Total number of peers: 1 Peers in established state: 1

* - Dynamically created peer

Peer AS MsgRcvd MsgSent OutQ PrefRcv Up/Down State

2.2.2.2 200 13 16 0 0 00:10:34 Established

c. If the BGP neighbor relationship is normal and Device A can learn the route from the BGP peer normally, proceed to the following steps.

3. Identify whether the IP precedence or local QoS ID value in the routing table of Device A is correct.

Execute the display ip routing-table ip-address verbose command on Device A to identify whether the BGP route learned from Device B is configured with the correct IP precedence or local QoS ID value. The command output is as follows:

<Sysname> display ip routing-table 10.10.10.1 verbose

…

Destination: 10.10.10.1/24

Protocol: BGP

Process ID: 0

SubProtID: 0x1 Age: 00h00m37s

FlushedAge: 15h28m49s

Cost: 0 Preference: 255

IpPre: N/A QosLocalID: 100

Tag: 0 State: Active Adv

In the command output, the IpPre field displays the IP precedence value, and the QosLocalID field displays the local QoS ID value. If both fields display N/A, or if the field that the QPPB traffic class matches on displays N/A, the BGP route learned from Device B has not been assigned the required IP precedence or local QoS ID value. Execute the following commands on Device A to add the configuration.

a. Execute the ip prefix-list command to create an IPv4 prefix list entry that permits routes destined for subnet 10.10.10.0 with a mask length of 24.

ip prefix-list 10 index 10 permit 10.10.10.0 24

b. Execute the route-policy command to create a routing policy. In the routing policy, execute the if-match ip command to match the IPv4 prefix list created in the previous step. Then, execute the apply ip-precedence command, the apply qos-local-id command, or both, to set the IP precedence or local QoS ID value.

route-policy a permit node 10

if-match ip address prefix-list 10

apply ip-precedence 1

apply qos-local-id 100

c. Execute the peer route-policy command in BGP IPv4 unicast address family view to apply the routing policy to routes received from the peer (Device B) in the import direction. For example:

peer 2.2.2.2 route-policy a import
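If the route still does not receive the expected value, check how the prefix list entry in step a matches routes. The following Python sketch assumes that an entry without greater-equal or less-equal matches only routes whose prefix falls inside the configured subnet and whose mask length equals the configured mask length, and that a route matching no entry is denied. The class and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class PrefixListEntry:
    index: int
    action: str                     # "permit" or "deny"
    network: ipaddress.IPv4Network  # for example 10.10.10.0/24

    def matches(self, route: ipaddress.IPv4Network) -> bool:
        # Exact mask-length match (greater-equal/less-equal not modeled).
        return (route.subnet_of(self.network)
                and route.prefixlen == self.network.prefixlen)


def evaluate(entries: list, route: str) -> str:
    """Return the action of the first matching entry, in ascending index order."""
    candidate = ipaddress.ip_network(route)
    for entry in sorted(entries, key=lambda e: e.index):
        if entry.matches(candidate):
            return entry.action
    return "deny"  # assumed: routes that match no entry are denied


# ip prefix-list 10 index 10 permit 10.10.10.0 24
prefix_list_10 = [PrefixListEntry(10, "permit", ipaddress.ip_network("10.10.10.0/24"))]
print(evaluate(prefix_list_10, "10.10.10.0/24"))    # permit
print(evaluate(prefix_list_10, "10.10.10.128/25"))  # deny
```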

If the BGP route learned from Device B is configured with the correct IP precedence or local QoS ID value, proceed to the following steps.

4. Identify whether the QPPB policy configuration is correct.

Execute the display qos policy user-defined command on Device A to view the QPPB policy configuration.

<Sysname> display qos policy user-defined

User-defined QoS policy information:

Policy: aaa (ID 106)

Classifier: aaa (ID 0)

Behavior: aaa

Redirecting:

Redirect to next-hop 192.168.10.1

a. Execute the display traffic classifier user-defined command to check the configuration of the traffic class in the QPPB policy. In this example, the Classifier field displays aaa.

<Sysname> display traffic classifier user-defined aaa

User-defined classifier information:

Classifier: aaa (ID 100)

Operator: AND

Rule(s) :

If-match qos-local-id 100

- If the Rule(s) field displays the If-match qos-local-id or If-match ip-precedence rule, make sure the match criteria of the traffic class are consistent with the IP precedence or local QoS ID value configured in the previous routing policy.

- If the If-match qos-local-id or If-match ip-precedence match criterion does not exist, or if the traffic class configuration is inconsistent with the IP precedence or local QoS ID value configured in the routing policy, execute the undo if-match command in traffic class view to delete the original configuration. Then, execute the if-match command to configure a match criterion that matches the IP precedence or local QoS ID value. For more information about troubleshooting traffic class matching, see Traffic failure to match a traffic class.

If the configuration of the QPPB policy is correct, proceed to the following steps.

5. Identify whether the QPPB feature is configured on the packet forwarding interface and a QPPB policy is applied.

On Device A, configure QPPB on the outgoing interface or incoming interface, and apply the QPPB policy to the interface. Then, execute the display this command in interface view to verify that the configuration is complete. The following example shows the configuration on an incoming interface:

bgp-policy destination ip-prec-map ip-qos-map

qos apply policy aaa inbound
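When many interfaces are involved, the check in step 5 can be scripted against saved display this output. The following Python sketch is a hypothetical helper: it only verifies that a line starting with bgp-policy and the expected qos apply policy line are present, and it does not validate the bgp-policy keywords. The interface name in the sample is illustrative.

```python
def missing_qppb_lines(display_this: str, policy: str, direction: str = "inbound") -> list:
    """Return the QPPB-related commands missing from an interface configuration."""
    lines = {line.strip() for line in display_this.splitlines() if line.strip()}
    missing = []
    if not any(line.startswith("bgp-policy ") for line in lines):
        missing.append("bgp-policy")
    if f"qos apply policy {policy} {direction}" not in lines:
        missing.append("qos apply policy")
    return missing


config = """interface GigabitEthernet2/0/1
 bgp-policy destination ip-prec-map ip-qos-map
 qos apply policy aaa inbound
#"""
print(missing_qppb_lines(config, "aaa"))  # []
```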

If either configuration is missing, execute the bgp-policy or qos apply policy command on the interface to add it. If the QPPB feature is configured and the QPPB policy is applied to the packet forwarding interface but the issue persists, proceed to the following steps.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ Configuration data, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Ineffective QoS rate-limiting policy for user group

Symptom

In the network shown in Figure 122, interface A of the UP acts as the remote interface for the vBRAS-CP to connect online users, while interface B of the UP connects to the public network. Service traffic is forwarded from interface B of the UP to users connected to interface A. Downlink service traffic for all users on this interface requires a rate limit of 300 Mbps.

On the vBRAS-CP, a traffic class uses an ACL to match any user group, and a traffic behavior implements rate limiting. The traffic class and traffic behavior are associated in a QoS policy, which is applied to the remote interface (interface A) to limit the traffic of all users on interface A to 300 Mbps. However, the downlink service traffic peaks at 500 Mbps, which indicates that rate limiting does not take effect.

Figure 122 Network diagram for the QoS rate limiting policy for user groups

Common causes

The following are the common causes of this type of issue:

· The traffic class configuration of the QoS policy is incorrect on the vBRAS-CP.

· The traffic behavior configuration in the QoS policy is incorrect on the vBRAS-CP.

· The QoS policy on the vBRAS-CP is applied incorrectly.

Troubleshooting flow

Figure 123 shows the troubleshooting flowchart.

Figure 123 Flowchart for troubleshooting the ineffective QoS rate-limiting policy for user group

Solution

1. In the network as shown in Figure 122, check the traffic class configuration in the QoS policy on the vBRAS-CP. Perform all the following tasks on the CTRL-VM of the vBRAS-CP.

Execute the display traffic classifier user-defined command on the vBRAS-CP to check the configuration of the traffic class. For example, if the Rule(s) field displays If-match acl 3001, the rule matches the packets with an advanced ACL.

<Sysname> display traffic classifier user-defined

User-defined classifier information:

Classifier: aaa (ID 103)

Operator: AND

Rule(s) :

If-match acl 3001

Then, execute the display acl command with the ACL number (3001 in this example) on the vBRAS-CP to verify that the rules in the ACL match any user group. For example:

<Sysname> display acl 3001

Advanced IPv4 ACL 3001, 1 rule,

ACL's step is 5

rule 5 permit ip user-group-any

If the ACL in the traffic class has configuration errors, delete the incorrect configuration and reconfigure the ACL to match any user groups.

If the ACL rule contains other match parameters, those parameters narrow the match. Execute the undo rule command to delete the interfering parameters from the rule. Alternatively, execute the undo rule command to delete the rule, and then execute the rule command to reconfigure it to match any user group.

Verify that the logical operator among the match criteria, displayed in the Operator field, is correct. With AND, a packet belongs to the traffic class only if it matches all criteria. With OR, a packet belongs to the traffic class if it matches any criterion. Set the operator in the traffic class as needed. In this example, if the Rule(s) field contains more than one match criterion and the Operator field displays AND, delete the criteria that are not needed.

If the traffic class in the QoS policy is configured correctly but the issue persists, proceed to the following steps.

2. Check the traffic behavior configuration in the QoS policy on the vBRAS-CP.

On the vBRAS-CP, execute the display traffic behavior user-defined command to check the configuration of the traffic behavior. For example, if the Committed Access Rate field displays CIR 300000 (kbps), packets matching the traffic class are rate-limited to 300 Mbps. If the traffic behavior is configured incorrectly, execute the traffic behavior command to enter traffic behavior view, and then execute the car command to modify the CAR settings.

<Sysname> display traffic behavior user-defined aaa

User-defined behavior information:

Behavior: aaa (ID 104)

Committed Access Rate:

CIR 300000 (kbps), CBS 18750000 (Bytes), EBS 0 (Bytes)

Green action : pass

Yellow action : pass

Red action : discard
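The CIR and CBS values above follow token bucket arithmetic. At CIR 300000 kbps, tokens accumulate at 300,000,000 / 8 = 37,500,000 bytes per second, so the CBS of 18,750,000 bytes allows a burst equal to 0.5 seconds of traffic at the committed rate. The following Python sketch models a color-blind single-rate three-color marker as described in RFC 2697. It is an assumption that the device's CAR behaves like this model; treat it as an illustration of why traffic above the CIR is colored red and discarded, not as the device implementation.

```python
from dataclasses import dataclass


@dataclass
class SingleRateMarker:
    """Color-blind single-rate three-color marker (RFC 2697), simplified."""
    cir_kbps: int
    cbs_bytes: int
    ebs_bytes: int = 0

    def __post_init__(self) -> None:
        self.tc = float(self.cbs_bytes)  # committed bucket starts full
        self.te = float(self.ebs_bytes)  # excess bucket starts full
        self.last_s = 0.0

    def _refill(self, now_s: float) -> None:
        tokens = (now_s - self.last_s) * self.cir_kbps * 1000 / 8  # bytes
        self.last_s = now_s
        to_committed = min(tokens, self.cbs_bytes - self.tc)
        self.tc += to_committed
        # Tokens that overflow the committed bucket go to the excess bucket.
        self.te = min(float(self.ebs_bytes), self.te + tokens - to_committed)

    def color(self, size_bytes: int, now_s: float) -> str:
        self._refill(now_s)
        if self.tc >= size_bytes:
            self.tc -= size_bytes
            return "green"   # Green action : pass
        if self.te >= size_bytes:
            self.te -= size_bytes
            return "yellow"  # Yellow action : pass
        return "red"         # Red action : discard


# Offer 500 Mbps of 1500-byte packets for about 2 seconds to a 300 Mbps CAR.
car = SingleRateMarker(cir_kbps=300000, cbs_bytes=18750000, ebs_bytes=0)
interval = 1500 * 8 / 500e6            # seconds between packets at 500 Mbps
colors = [car.color(1500, i * interval) for i in range(83333)]
print(colors.count("yellow"))          # 0, because EBS is 0
```

After the initial burst drains the committed bucket, only the traffic covered by the 300 Mbps token rate stays green; the rest is red.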

If the traffic behavior in the QoS policy is configured correctly but the issue persists, proceed to the following steps.

3. Check the configuration of the QoS policy on the vBRAS-CP.

a. Execute the display qos policy user-defined command on the vBRAS-CP to check the configuration of the QoS policy. The Classifier and Behavior fields must display the traffic class and traffic behavior verified in the previous steps. If the class-behavior association is incorrect, execute the qos policy command to enter QoS policy view, and then execute the classifier behavior command to modify the association. If the configuration is correct, proceed to step b.

<Sysname> display qos policy user-defined aaa

User-defined QoS policy information:

Policy: aaa (ID 104)

Classifier: aaa (ID 1)

Behavior: aaa

Committed Access Rate:

CIR 300000 (kbps), CBS 18750000 (Bytes), EBS 0 (Bytes)

Green action : pass

Yellow action : pass

Red action : discard

b. Execute the display qos policy interface command on the vBRAS-CP to check the configuration of the QoS policy applied to the interface. If an incorrect QoS policy is applied to the remote interface or the QoS policy is not applied to the outbound direction, execute the undo qos apply policy command on remote interface Remote-GE 1024/1/3/0 to remove the incorrect configuration, and then execute the qos apply policy command to apply the correct QoS policy.

<Sysname> display qos policy interface Remote-GE 1024/1/3/0

Interface: Remote-GE 1024/1/3/0

Direction: Outbound

Policy: aaa

Classifier: aaa

Matched : 231231 (Packets) 69348888 (Bytes)

Operator: AND

Rule(s) :

If-match acl 3001

Behavior: aaa

Committed Access Rate:

CIR 300000 (kbps), CBS 18750000 (Bytes), EBS 0 (Bytes)

Green action : pass

Yellow action : pass

Red action : discard

Green packets : 231231 (Packets) 69348888 (Bytes)

Yellow packets: 0 (Packets) 0 (Bytes)

Red packets : 0 (Packets) 0 (Bytes)

c. If no QoS policy is applied to the interface, execute the display qos policy global command to check the QoS policy applied globally in the outbound direction for the specified UP. If the global QoS policy is incorrect or is not applied in the outbound direction, execute the undo qos apply policy global command on the vBRAS-CP to remove the incorrect QoS policy, and then execute the qos apply policy global command to apply the correct one.

<Sysname> display qos policy global up-id 1024

Direction: Outbound

Policy: aaa

Classifier: default-class

Matched : 0 (Packets) 0 (Bytes)

Operator: AND

Rule(s) :

If-match any

Behavior: be

-none-

Classifier: aaa

Matched : 14 (Packets) 2260 (Bytes)

Operator: AND

Rule(s) :

If-match acl 3001

Behavior: aaa

Committed Access Rate:

CIR 300000 (kbps), CBS 18750000 (Bytes), EBS 0 (Bytes)

Green action : pass

Yellow action : pass

Red action : discard

Green packets : 0 (Packets) 0 (Bytes)

Yellow packets: 0 (Packets) 0 (Bytes)

Red packets : 0 (Packets) 0 (Bytes)
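The Green, Yellow, and Red counters are cumulative, so a single snapshot does not show the current rate. One way to check whether CAR is limiting traffic is to record the same counters twice, a known number of seconds apart, and compute the average passed rate. The following Python sketch assumes you record the Green and Yellow byte counters yourself; the function name, snapshot format, and sample values are hypothetical.

```python
def passed_rate_mbps(first: dict, second: dict, interval_s: float) -> float:
    """Average rate of green plus yellow (passed) traffic between two snapshots."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    passed_bytes = ((second["green"] - first["green"])
                    + (second["yellow"] - first["yellow"]))
    return passed_bytes * 8 / interval_s / 1_000_000


# Hypothetical snapshots taken 10 seconds apart.
before = {"green": 0, "yellow": 0}
after = {"green": 375_000_000, "yellow": 0}
print(passed_rate_mbps(before, after, 10))  # 300.0
```

A passed rate well above the configured CIR while the Red counter stays at 0 suggests that the policy is not acting on the traffic.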

If the configuration of the preceding QoS policy is correct and the QoS policy is applied normally but the issue persists, proceed to the following steps.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ Configuration data and related log messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

· QOS_POLICY_APPLYIF_CBFAIL

· QOS_POLICY_APPLYIF_FAIL

· QOS_POLICY_APPLYGLOBAL_CBFAIL

· QOS_POLICY_APPLYGLOBAL_FAIL

Troubleshooting IP tunneling and security VPN issues

IPsec issues

IKE negotiation triggering failures

Symptom

As shown in Figure 124, an IKE-based IPsec tunnel in tunnel encapsulation mode needs to be established between Device A and Device B to protect the private network traffic between Host A and Host B. After Device A and Device B are configured, traffic cannot be forwarded between Host A and Host B.

When you execute the display ike sa command on Device A to view IKE SAs, no SA information is displayed.

<DeviceA> display ike sa

Connection-ID Local Remote Flag DOI

---------------------------------------------------------------

Flags:

RD--READY RL--REPLACED FD-FADING RK-REKEY

When you execute the display ike statistics command on Device A to view IKE statistics, no noticeable error is found.

<DeviceA> display ike statistics

IKE statistics:

No matching proposal: 0

Invalid ID information: 0

Unavailable certificate: 0

Unsupported DOI: 0

Unsupported situation: 0

Invalid proposal syntax: 0

Invalid SPI: 0

Invalid protocol ID: 0

Invalid certificate: 0

Authentication failure: 0

…

Figure 124 Network diagram

Common causes

The following are the common causes of this type of issue:

· A host cannot reach the corresponding IPsec gateway, or the IPsec gateways cannot reach each other.

· The configuration of the route from an IPsec gateway to the subnet where the peer host resides is incorrect.

· The configurations of the security policies between security zones are incorrect.

· The IPsec policy configurations are incorrect.

· The configurations of IKE profiles and IKE proposals are incorrect.

· The configurations of the protected data flows configured on the IPsec gateways are incorrect.

Troubleshooting flow

Figure 125 shows the troubleshooting flowchart.

Figure 125 Flowchart for troubleshooting IKE negotiation triggering failures

Solution

1. Check whether Host A and Host B can ping their respective IPsec gateways, and whether the IPsec gateways can ping each other:

Execute the ping command to check the network connectivity.

a. If the ping fails, troubleshoot the issue as described in the procedures for troubleshooting ping failures in the network management and monitoring troubleshooting guide. Make sure Host A and Host B can ping their respective IPsec gateways, and the IPsec gateways can ping each other.

b. If the issue persists, go to step 2.

2. Check whether the configuration of the route from each IPsec gateway to the subnet where the peer host resides is correct:

a. On each IPsec gateway, execute the display ip routing-table command to view the route information. Make sure a route to the subnet where the peer host resides exists on each IPsec gateway.

For example, the route information on Device A is as follows:

<DeviceA> display ip routing-table

Destinations : 1 Routes : 1

Destination/Mask Proto Pre Cost NextHop Interface

10.1.2.0/24 Static 60 0 2.2.2.2 GE2/0/2

The route information on Device B is as follows:

<DeviceB> display ip routing-table

Destinations : 1 Routes : 1

Destination/Mask Proto Pre Cost NextHop Interface

10.1.1.0/24 Static 60 0 2.2.3.2 GE2/0/2

b. If the route information is incorrect, configure the routes on Device A and Device B as follows:

<DeviceA> system-view

[DeviceA] ip route-static 10.1.2.0 24 2.2.2.2

<DeviceB> system-view

[DeviceB] ip route-static 10.1.1.0 24 2.2.3.2

c. If the issue persists, go to step 3.
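The route check in step 2 amounts to a longest-prefix-match lookup for the peer host's subnet on each gateway. The following Python sketch uses the routes from the sample output; the helper name is hypothetical, and the host addresses 10.1.2.10 and 10.1.1.10 are assumed examples within the host subnets used in this section.

```python
import ipaddress
from typing import Optional, Tuple

# Static routes from the sample routing tables (prefix -> next hop).
ROUTES_A = {"10.1.2.0/24": "2.2.2.2"}
ROUTES_B = {"10.1.1.0/24": "2.2.3.2"}


def best_route(routes: dict, destination: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, next hop) of the longest matching route, or None."""
    address = ipaddress.ip_address(destination)
    matches = [ipaddress.ip_network(prefix) for prefix in routes
               if address in ipaddress.ip_network(prefix)]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return str(best), routes[str(best)]


print(best_route(ROUTES_A, "10.1.2.10"))  # ('10.1.2.0/24', '2.2.2.2')
print(best_route(ROUTES_B, "10.1.1.10"))  # ('10.1.1.0/24', '2.2.3.2')
```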

3. Check whether the configurations of the security policies between the security zones are correct:

Check the security zone and security policy configurations on Device A. Make sure rules permitting traffic between the security zones have been configured in the security policies. If not, configure the security policies as follows:

a. Configure rules to permit traffic between the Untrust and Local security zones, so that the devices can establish an IPsec tunnel:

# Configure a rule named ipseclocalout to allow Device A to send IPsec negotiation packets to Device B.

[DeviceA] security-policy ip

[DeviceA-security-policy-ip] rule name ipseclocalout

[DeviceA-security-policy-ip-0-ipseclocalout] source-zone local

[DeviceA-security-policy-ip-0-ipseclocalout] destination-zone untrust

[DeviceA-security-policy-ip-0-ipseclocalout] source-ip-host 2.2.2.1

[DeviceA-security-policy-ip-0-ipseclocalout] destination-ip-host 2.2.3.1

[DeviceA-security-policy-ip-0-ipseclocalout] action pass

[DeviceA-security-policy-ip-0-ipseclocalout] quit

# Configure a rule named ipseclocalin to allow Device A to receive the IPsec negotiation packets sent from Device B.

[DeviceA-security-policy-ip] rule name ipseclocalin

[DeviceA-security-policy-ip-1-ipseclocalin] source-zone untrust

[DeviceA-security-policy-ip-1-ipseclocalin] destination-zone local

[DeviceA-security-policy-ip-1-ipseclocalin] source-ip-host 2.2.3.1

[DeviceA-security-policy-ip-1-ipseclocalin] destination-ip-host 2.2.2.1

[DeviceA-security-policy-ip-1-ipseclocalin] action pass

[DeviceA-security-policy-ip-1-ipseclocalin] quit

b. Configure rules to permit the traffic between Host A and Host B:

# Configure a rule named trust-untrust to permit the packets from Host A to Host B.

[DeviceA-security-policy-ip] rule name trust-untrust

[DeviceA-security-policy-ip-2-trust-untrust] source-zone trust

[DeviceA-security-policy-ip-2-trust-untrust] destination-zone untrust

[DeviceA-security-policy-ip-2-trust-untrust] source-ip-subnet 10.1.1.0 24

[DeviceA-security-policy-ip-2-trust-untrust] destination-ip-subnet 10.1.2.0 24

[DeviceA-security-policy-ip-2-trust-untrust] action pass

[DeviceA-security-policy-ip-2-trust-untrust] quit

# Configure a rule named untrust-trust to permit the packets from Host B to Host A.

[DeviceA-security-policy-ip] rule name untrust-trust

[DeviceA-security-policy-ip-3-untrust-trust] source-zone untrust

[DeviceA-security-policy-ip-3-untrust-trust] destination-zone trust

[DeviceA-security-policy-ip-3-untrust-trust] source-ip-subnet 10.1.2.0 24

[DeviceA-security-policy-ip-3-untrust-trust] destination-ip-subnet 10.1.1.0 24

[DeviceA-security-policy-ip-3-untrust-trust] action pass

[DeviceA-security-policy-ip-3-untrust-trust] quit

[DeviceA-security-policy-ip] quit

Check the security zone and security policy configurations on Device B. Make sure rules permitting traffic between the security zones have been configured in the security policies. If not, configure the security policies as follows:

a. Configure rules to permit traffic between the Untrust and Local security zones, so that the devices can establish an IPsec tunnel:

# Configure a rule named ipseclocalout to allow Device B to send IPsec negotiation packets to Device A.

[DeviceB] security-policy ip

[DeviceB-security-policy-ip] rule name ipseclocalout

[DeviceB-security-policy-ip-0-ipseclocalout] source-zone local

[DeviceB-security-policy-ip-0-ipseclocalout] destination-zone untrust

[DeviceB-security-policy-ip-0-ipseclocalout] source-ip-host 2.2.3.1

[DeviceB-security-policy-ip-0-ipseclocalout] destination-ip-host 2.2.2.1

[DeviceB-security-policy-ip-0-ipseclocalout] action pass

[DeviceB-security-policy-ip-0-ipseclocalout] quit

# Configure a rule named ipseclocalin to allow Device B to receive the IPsec negotiation packets sent from Device A.

[DeviceB-security-policy-ip] rule name ipseclocalin

[DeviceB-security-policy-ip-1-ipseclocalin] source-zone untrust

[DeviceB-security-policy-ip-1-ipseclocalin] destination-zone local

[DeviceB-security-policy-ip-1-ipseclocalin] source-ip-host 2.2.2.1

[DeviceB-security-policy-ip-1-ipseclocalin] destination-ip-host 2.2.3.1

[DeviceB-security-policy-ip-1-ipseclocalin] action pass

[DeviceB-security-policy-ip-1-ipseclocalin] quit

b. Configure rules to permit traffic between Host B and Host A:

# Configure a rule named trust-untrust to permit the packets from Host B to Host A.

[DeviceB-security-policy-ip] rule name trust-untrust

[DeviceB-security-policy-ip-2-trust-untrust] source-zone trust

[DeviceB-security-policy-ip-2-trust-untrust] destination-zone untrust

[DeviceB-security-policy-ip-2-trust-untrust] source-ip-subnet 10.1.2.0 24

[DeviceB-security-policy-ip-2-trust-untrust] destination-ip-subnet 10.1.1.0 24

[DeviceB-security-policy-ip-2-trust-untrust] action pass

[DeviceB-security-policy-ip-2-trust-untrust] quit

# Configure a rule named untrust-trust to permit the packets from Host A to Host B.

[DeviceB-security-policy-ip] rule name untrust-trust

[DeviceB-security-policy-ip-3-untrust-trust] source-zone untrust

[DeviceB-security-policy-ip-3-untrust-trust] destination-zone trust

[DeviceB-security-policy-ip-3-untrust-trust] source-ip-subnet 10.1.1.0 24

[DeviceB-security-policy-ip-3-untrust-trust] destination-ip-subnet 10.1.2.0 24

[DeviceB-security-policy-ip-3-untrust-trust] action pass

[DeviceB-security-policy-ip-3-untrust-trust] quit

[DeviceB-security-policy-ip] quit

For more information, see security policy issues in Troubleshooting Security.

If the issue persists, go to step 4.
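The following Python sketch shows how the four Device B rules above could decide whether a packet is permitted. It is a simplified model built on three assumptions. First, rules are evaluated in configuration order and the first match wins. Second, only zones and addresses are matched, so services and applications are ignored. Third, traffic that matches no rule is denied. The Rule class and evaluate() function are hypothetical helpers and are not part of the device software.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    src_zone: str
    dst_zone: str
    src_net: str  # host (/32) or subnet, e.g. "10.1.2.0/24"
    dst_net: str
    action: str   # "pass" or "drop"

    def matches(self, src_zone, dst_zone, src_ip, dst_ip):
        """Match zones and addresses only; services are not modeled."""
        return (self.src_zone == src_zone
                and self.dst_zone == dst_zone
                and ipaddress.ip_address(src_ip) in ipaddress.ip_network(self.src_net)
                and ipaddress.ip_address(dst_ip) in ipaddress.ip_network(self.dst_net))


# Rules transcribed from the Device B configuration in this step.
RULES_B = [
    Rule("ipseclocalout", "local", "untrust", "2.2.3.1/32", "2.2.2.1/32", "pass"),
    Rule("ipseclocalin", "untrust", "local", "2.2.2.1/32", "2.2.3.1/32", "pass"),
    Rule("trust-untrust", "trust", "untrust", "10.1.2.0/24", "10.1.1.0/24", "pass"),
    Rule("untrust-trust", "untrust", "trust", "10.1.1.0/24", "10.1.2.0/24", "pass"),
]


def evaluate(rules, src_zone, dst_zone, src_ip, dst_ip):
    """Return (rule name, action) for the first matching rule."""
    for rule in rules:
        if rule.matches(src_zone, dst_zone, src_ip, dst_ip):
            return rule.name, rule.action
    return None, "drop"  # assumption: unmatched traffic is denied


# IKE packets from Device A (2.2.2.1) to Device B (2.2.3.1):
print(evaluate(RULES_B, "untrust", "local", "2.2.2.1", "2.2.3.1"))  # prints ('ipseclocalin', 'pass')
```

If a packet you expect to pass returns (None, 'drop') in a model like this, the matching rule is probably missing or has the wrong zone or address.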

4. Check whether the IPsec policy configurations are correct:

a. Execute the display ipsec policy command on the local IPsec gateway Device A. Check the peer address configured in the IPsec policy, which is displayed in the Remote address field.

[DeviceA] display ipsec policy

-----------------------------

IPsec Policy: mypolicy

-----------------------------

Sequence number: 2

Alias: hub1-spoke2

Mode: ISAKMP

-----------------------------

Description: This is my complete policy

Traffic Flow Confidentiality: Enabled

Security data flow: 3002

Selector mode: standard

Local address: 2.2.2.1

Remote address: 2.2.3.1

Remote address:

Remote address switchback mode: Enabled

Transform set: completetransform

b. Execute the display ipsec policy command on the peer IPsec gateway Device B. Check the address displayed in the Local address field. This field shows the local address configured in the IPsec policy. If no local address is configured, it shows the address of the interface to which the IPsec policy is applied.

[DeviceB] display ipsec policy

-----------------------------

IPsec Policy: mypolicy

-----------------------------

Sequence number: 2

Alias: hub1-spoke2

Mode: ISAKMP

-----------------------------

Description: This is my complete policy

Traffic Flow Confidentiality: Enabled

Security data flow: 3002

Selector mode: standard

Local address: 2.2.3.1

Remote address: 2.2.2.1

Remote address:

Remote address switchback mode: Enabled

Transform set: completetransform

c. Verify that the Remote address field on Device A displays the same address as the Local address field on Device B.

d. If the issue persists, go to step 5.
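Substeps a through c can be illustrated with the following Python sketch. It reads the Local address and Remote address fields from saved display ipsec policy output for both gateways and checks that the two devices mirror each other. The sketch assumes each field appears as "Field: value" on its own line and uses the first non-empty value of each field. The function names are hypothetical.

```python
def parse_policy_addresses(output: str) -> dict:
    """Return the first non-empty Local address and Remote address values."""
    fields = {}
    for line in output.splitlines():
        # Split at the first colon only, so IPv6 addresses stay intact.
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if sep and value and key in ("Local address", "Remote address"):
            fields.setdefault(key, value)
    return fields


def addresses_mirror(output_a: str, output_b: str) -> bool:
    """True if A's Remote address is B's Local address and vice versa."""
    a = parse_policy_addresses(output_a)
    b = parse_policy_addresses(output_b)
    pairs = [(a.get("Remote address"), b.get("Local address")),
             (a.get("Local address"), b.get("Remote address"))]
    return all(x is not None and x == y for x, y in pairs)


device_a = "Local address:2.2.2.1\nRemote address: 2.2.3.1\nRemote address:\n"
device_b = "Local address: 2.2.3.1\nRemote address: 2.2.2.1\nRemote address:\n"
print(addresses_mirror(device_a, device_b))  # prints True
```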

5. Check whether the configurations of IKE profiles and IKE proposals are correct:

a. Check the IKE profile configuration on each device. Verify that the local and peer IPsec gateway addresses are configured correctly. If preshared key authentication is used, the preshared keys configured with the pre-shared-key command on the local and peer ends must be the same. If RSA signature or digital envelope authentication is used, make sure the digital certificate is within its validity period. The validity period is displayed in the Validity field of the display pki certificate domain command output.

For example, the IKE profile configuration on Device A is as follows:

[DeviceA] ike keychain keychain1

[DeviceA-ike-keychain-keychain1] pre-shared-key address 2.2.3.1 255.255.255.0 key simple 123456TESTplat&!

[DeviceA-ike-keychain-keychain1] quit

[DeviceA] ike profile profile1

[DeviceA-ike-profile-profile1] keychain keychain1

[DeviceA-ike-profile-profile1] local-identity address 2.2.2.1

[DeviceA-ike-profile-profile1] match remote identity address 2.2.3.1 255.255.255.0

[DeviceA-ike-profile-profile1] quit

The IKE profile configuration on Device B is as follows:

[DeviceB] ike keychain keychain1

[DeviceB-ike-keychain-keychain1] pre-shared-key address 2.2.2.1 255.255.255.0 key simple 123456TESTplat&!

[DeviceB-ike-keychain-keychain1] quit

[DeviceB] ike profile profile1

[DeviceB-ike-profile-profile1] keychain keychain1

[DeviceB-ike-profile-profile1] local-identity address 2.2.3.1

[DeviceB-ike-profile-profile1] match remote identity address 2.2.2.1 255.255.255.0

[DeviceB-ike-profile-profile1] quit

b. Execute the display ike proposal command on Device A and Device B to verify that their IKE proposal configurations are consistent, as shown in the following output:

[DeviceA] display ike proposal

Priority Authentication Authentication Encryption Diffie-Hellman Duration

method algorithm algorithm group (seconds)

----------------------------------------------------------------------------

default PRE-SHARED-KEY SHA1 DES-CBC Group 1 86400

[DeviceB] display ike proposal

Priority Authentication Authentication Encryption Diffie-Hellman Duration

method algorithm algorithm group (seconds)

----------------------------------------------------------------------------

default PRE-SHARED-KEY SHA1 DES-CBC Group 1 86400

c. If the issue persists, go to step 6.
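The following Python sketch parses saved display ike proposal output from both devices. It returns the proposals whose authentication method, authentication algorithm, encryption algorithm, Diffie-Hellman group, and duration are identical on both ends. The Priority column is ignored because it only orders proposals locally. The sketch also assumes that every data row ends with a numeric duration. Both of these are simplifications for illustration.

```python
def parse_proposals(output: str) -> set:
    """Return (auth method, auth algorithm, encryption, DH group, duration) tuples."""
    proposals = set()
    for line in output.splitlines():
        tokens = line.split()
        # Data rows have at least 6 columns and end with a numeric duration;
        # the header and separator lines do not.
        if len(tokens) >= 6 and tokens[-1].isdigit():
            method, auth_alg, enc_alg = tokens[1:4]
            dh_group = " ".join(tokens[4:-1])  # e.g. "Group 1"
            proposals.add((method, auth_alg, enc_alg, dh_group, int(tokens[-1])))
    return proposals


def common_proposals(output_a: str, output_b: str) -> set:
    """Proposals configured identically on both devices (Priority ignored)."""
    return parse_proposals(output_a) & parse_proposals(output_b)


sample = """Priority Authentication Authentication Encryption Diffie-Hellman Duration
method algorithm algorithm group (seconds)
----------------------------------------------------------------------------
default PRE-SHARED-KEY SHA1 DES-CBC Group 1 86400"""
print(common_proposals(sample, sample))
```

An empty result means that no proposal is configured identically on both gateways.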

6. Check whether the configuration of the data flow to be protected on each IPsec gateway is correct:

a. On Device A, execute the display ipsec policy command to view the ACL used by the IPsec policy (displayed in the Security data flow field).

[DeviceA] display ipsec policy

-----------------------------

IPsec Policy: mypolicy

-----------------------------

Sequence number: 2

Alias: hub1-spoke2

Mode: ISAKMP

-----------------------------

Description: This is my complete policy

Traffic Flow Confidentiality: Enabled

Security data flow: 3002

Then, on Device A, execute the display acl command to check whether the rule information of ACL 3002 is consistent with the scope of data flows to be protected.

[DeviceA] display acl 3002

Advanced IPv4 ACL 3002, 1 rule,

ACL's step is 5

rule 0 permit ip source 10.1.1.0 0.0.0.255 destination 10.1.2.0 0.0.0.255

If the configuration is incorrect, configure an IPv4 advanced ACL that correctly identifies data flows from the subnet where Host A resides to the subnet where Host B resides.

[DeviceA] acl advanced 3002

[DeviceA-acl-ipv4-adv-3002] rule 0 permit ip source 10.1.1.0 0.0.0.255 destination 10.1.2.0 0.0.0.255

[DeviceA-acl-ipv4-adv-3002] quit

[DeviceA] ipsec policy policy2 1 isakmp

[DeviceA-ipsec-policy-isakmp-policy2-1] security acl 3002 aggregation

b. On Device B, execute the display ipsec policy command to view the ACL used by the IPsec policy (displayed in the Security data flow field).

[DeviceB] display ipsec policy

-----------------------------

IPsec Policy: mypolicy

-----------------------------

Sequence number: 2

Alias: hub1-spoke2

Mode: ISAKMP

-----------------------------

Description: This is my complete policy

Traffic Flow Confidentiality: Enabled

Security data flow: 3002

Then, on Device B, execute the display acl command to check whether the rule information of ACL 3002 is consistent with the scope of data flows to be protected.

[DeviceB] display acl 3002

Advanced IPv4 ACL 3002, 1 rule,

ACL's step is 5

rule 0 permit ip source 10.1.2.0 0.0.0.255 destination 10.1.1.0 0.0.0.255

If the configuration is incorrect, configure an IPv4 advanced ACL that correctly identifies data flows from the subnet where Host B resides to the subnet where Host A resides.

[DeviceB] acl advanced 3002

[DeviceB-acl-ipv4-adv-3002] rule 0 permit ip source 10.1.2.0 0.0.0.255 destination 10.1.1.0 0.0.0.255

[DeviceB-acl-ipv4-adv-3002] quit

[DeviceB] ipsec policy policy2 1 isakmp

[DeviceB-ipsec-policy-isakmp-policy2-1] security acl 3002 aggregation

c. If the issue persists, go to step 7.
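The ACLs on the two gateways should describe mirror-image flows. Each source subnet on Device A is a destination subnet on Device B, and each destination subnet on Device A is a source subnet on Device B. The following Python sketch checks this against saved display acl output. It assumes rules of the form "rule N permit ip source ADDR WILDCARD destination ADDR WILDCARD" with contiguous, non-zero wildcard masks. Python's ipaddress module accepts a wildcard as a host mask, so 10.1.1.0/0.0.0.255 is read as 10.1.1.0/24. Host rules (wildcard 0), deny rules, protocols, and ports are not handled.

```python
import ipaddress
import re

# Matches rules such as:
# rule 0 permit ip source 10.1.1.0 0.0.0.255 destination 10.1.2.0 0.0.0.255
RULE_RE = re.compile(
    r"rule \d+ permit ip source (\S+) (\S+) destination (\S+) (\S+)")


def parse_flows(acl_output: str) -> set:
    """Return (source network, destination network) pairs from permit rules."""
    flows = set()
    for m in RULE_RE.finditer(acl_output):
        # ipaddress accepts a wildcard (host mask) after the slash.
        src = ipaddress.ip_network(f"{m.group(1)}/{m.group(2)}")
        dst = ipaddress.ip_network(f"{m.group(3)}/{m.group(4)}")
        flows.add((src, dst))
    return flows


def acls_mirror(acl_a: str, acl_b: str) -> bool:
    """True if ACL B permits exactly the reverse of every flow in ACL A."""
    flows_a, flows_b = parse_flows(acl_a), parse_flows(acl_b)
    return bool(flows_a) and {(dst, src) for src, dst in flows_a} == flows_b


acl_a = "rule 0 permit ip source 10.1.1.0 0.0.0.255 destination 10.1.2.0 0.0.0.255"
acl_b = "rule 0 permit ip source 10.1.2.0 0.0.0.255 destination 10.1.1.0 0.0.0.255"
print(acls_mirror(acl_a, acl_b))  # prints True
```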

7. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Failures in triggering IKE negotiations (using an IPsec profile)

Symptom

As shown in Figure 126, an IKE-based IPsec tunnel needs to be established between Device A and Device B to protect the private network traffic between Host A and Host B. The IPsec tunnel uses tunnel encapsulation mode. After the configuration is completed on Device A and Device B, traffic between Host A and Host B fails to be forwarded.

If the display ike sa command on Device A displays no SA entries, phase-1 IKE negotiation has failed. If RD is displayed in the Flag field of the display ike sa output but the display ipsec sa command displays no information, phase-1 negotiation has succeeded and phase-2 IKE negotiation has failed. In the following output, neither an IKE SA nor an IPsec SA is displayed.

<DeviceA> display ike sa

Connection-ID Local Remote Flag DOI

---------------------------------------------------------------

Flags:

RD--READY RL--REPLACED FD-FADING RK-REKEY

<DeviceA> display ipsec sa

When you execute the display ike statistics command on Device A to view IKE statistics, no noticeable error is found.

<DeviceA> display ike statistics

IKE statistics:

No matching proposal: 0

Invalid ID information: 0

Unavailable certificate: 0

Unsupported DOI: 0

Unsupported situation: 0

Invalid proposal syntax: 0

Invalid SPI: 0

Invalid protocol ID: 0

Invalid certificate: 0

Authentication failure: 0

…

After you execute the display ipsec statistics command on Device A to view IPsec statistics, no noticeable error is found.

<DeviceA> display ipsec statistics

IPsec packet statistics:

Received/sent packets: 0/0

Received/sent bytes: 0/0

Received/sent packet rate: 0/0 packets/sec

Received/sent byte rate: 0/0 bytes/sec

Dropped packets (received/sent): 0/0

Dropped packets statistics

No available SA: 0

Wrong SA: 0

Invalid length: 0

Authentication failure: 0

Encapsulation failure: 0

Decapsulation failure: 0

Replayed packets: 0

ACL check failure: 0

MTU check failure: 0

Loopback limit exceeded: 0

Crypto speed limit exceeded: 0

Figure 126 Network diagram

Common causes

The following are the common causes of this type of issue:

· The IPsec gateways have no route to each other.

· The IPsec profile configuration is incorrect.

· The configurations of the IKE profiles and IKE proposals are incorrect.

Troubleshooting flow

Figure 127 shows the troubleshooting flowchart.

Figure 127 Flowchart for troubleshooting failures in triggering IKE negotiations (using an IPsec profile)

Solution

1. Check whether the IPsec gateways can ping each other:

a. On each IPsec gateway, use the ping command to ping the other gateway and check the network connectivity between them.

b. If the issue persists, go to step 2.

2. Check whether the IPsec profile configurations are correct:

a. Execute the display ipsec profile and display ipsec transform-set commands on the local IPsec gateway Device A and the peer IPsec gateway Device B to check whether the configurations are complete. Verify that both a transform set and an IKE profile are configured on each device. Make sure the transform sets on the two devices use the same security protocol, encryption algorithm, authentication algorithm, and PFS setting.

For example, the output on Device A is as follows:

[DeviceA] display ipsec profile

-------------------------------------------

IPsec profile: myprofile

Alias: ccc

Mode: isakmp

-------------------------------------------

Transform set: tran1

IKE profile: profile

SA duration(time based): 3600 seconds

SA duration(traffic based): 1843200 kilobytes

SA soft-duration buffer(time based): 1000 seconds

SA soft-duration buffer(traffic based): 43200 kilobytes

SA idle time: 100 seconds

[DeviceA] display ipsec transform-set

IPsec transform set: tran1

State: complete

Encapsulation mode: tunnel

ESN: Enabled

PFS:

Transform: AH-ESP

AH protocol:

Integrity: SHA1

ESP protocol:

Integrity: SHA1

Encryption: AES-CBC-128

The output on Device B is as follows:

[DeviceB] display ipsec profile

-------------------------------------------

IPsec profile: myprofile

Alias: ddd

Mode: isakmp

-------------------------------------------

Transform set: tran1

IKE profile: profile

SA duration(time based): 3600 seconds

SA duration(traffic based): 1843200 kilobytes

SA soft-duration buffer(time based): 1000 seconds

SA soft-duration buffer(traffic based): 43200 kilobytes

SA idle time: 100 seconds

[DeviceB] display ipsec transform-set

IPsec transform set: tran1

State: complete

Encapsulation mode: tunnel

ESN: Enabled

PFS:

Transform: AH-ESP

AH protocol:

Integrity: SHA1

ESP protocol:

Integrity: SHA1

Encryption: AES-CBC-128

b. If the issue persists, go to step 3.
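The following Python sketch compares the transform sets of the two devices and reports the fields that differ, so you do not have to read the output line by line. It takes saved display ipsec transform-set output and compares the encapsulation mode, PFS, transform, and the integrity and encryption algorithms for each protocol. It deliberately ignores the transform set name and state. The field names come from the sample output above. The parsing rules are assumptions.

```python
COMPARED = ("Encapsulation mode", "PFS", "Transform")


def parse_transform_set(output: str) -> dict:
    """Map field names to values; algorithms are keyed by protocol section."""
    fields, section = {}, ""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not sep:
            continue
        if key.endswith("protocol") and not value:
            section = key  # "AH protocol" or "ESP protocol" header line
        elif key in ("Integrity", "Encryption"):
            fields[f"{section}/{key}"] = value
        elif key in COMPARED:
            fields[key] = value
    return fields


def transform_set_differences(output_a: str, output_b: str) -> dict:
    """Return {field: (value on A, value on B)} for every field that differs."""
    a, b = parse_transform_set(output_a), parse_transform_set(output_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}


sample = """IPsec transform set: tran1
State: complete
Encapsulation mode: tunnel
ESN: Enabled
PFS:
Transform: AH-ESP
AH protocol:
Integrity: SHA1
ESP protocol:
Integrity: SHA1
Encryption: AES-CBC-128"""
print(transform_set_differences(sample, sample))  # prints {}
```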

3. Check whether the IPsec profiles are correctly configured on the tunnel interfaces.

a. On IPsec gateway Device A, execute the interface tunnel 1 command to enter Tunnel 1 interface view. Then, execute the display this command to check whether the local and peer addresses and the IPsec profile are correctly configured on the tunnel interface.

[DeviceA] interface tunnel 1

[DeviceA-Tunnel1] display this

interface Tunnel1 mode ipsec

ip address 3.3.3.1 255.255.255.0

source 2.2.2.1

destination 2.2.3.1

tunnel protection ipsec profile myprofile

[DeviceA-Tunnel1] quit

If configuration errors exist, modify the configuration as follows:

[DeviceA] interface tunnel 1 mode ipsec

[DeviceA-Tunnel1] ip address 3.3.3.1 255.255.255.0

[DeviceA-Tunnel1] source 2.2.2.1

[DeviceA-Tunnel1] destination 2.2.3.1

[DeviceA-Tunnel1] tunnel protection ipsec profile myprofile

[DeviceA-Tunnel1] quit

b. On IPsec gateway Device B, execute the interface tunnel 1 command to enter Tunnel 1 interface view. Then, execute the display this command to check whether the local and peer addresses and the IPsec profile are correctly configured on the tunnel interface.

[DeviceB] interface tunnel 1

[DeviceB-Tunnel1] display this

interface Tunnel1 mode ipsec

ip address 3.3.3.2 255.255.255.0

source 2.2.3.1

destination 2.2.2.1

tunnel protection ipsec profile myprofile

[DeviceB-Tunnel1] quit

If configuration errors exist, modify the configuration as follows:

[DeviceB] interface tunnel 1 mode ipsec

[DeviceB-Tunnel1] ip address 3.3.3.2 255.255.255.0

[DeviceB-Tunnel1] source 2.2.3.1

[DeviceB-Tunnel1] destination 2.2.2.1

[DeviceB-Tunnel1] tunnel protection ipsec profile myprofile

[DeviceB-Tunnel1] quit

c. If the issue persists, go to step 4.

4. Check whether the IKE profile and IKE proposal configurations are correct.

a. Check the IKE profile configuration on each device. Verify that the local and peer IPsec gateway addresses are configured correctly. If preshared key authentication is used, the preshared keys configured (using the pre-shared-key command) on the local and peer ends must be the same. If RSA signature or digital envelope authentication is used, make sure the digital certificate is within the validity period (displayed in the Validity field of the output for the display pki certificate domain command).

For example, the IKE profile configuration on Device A is as follows:

[DeviceA] ike keychain keychain1

[DeviceA-ike-keychain-keychain1] pre-shared-key address 2.2.3.1 255.255.255.0 key simple 123456TESTplat&!

[DeviceA-ike-keychain-keychain1] quit

[DeviceA] ike profile profile

[DeviceA-ike-profile-profile] keychain keychain1

[DeviceA-ike-profile-profile] local-identity address 2.2.2.1

[DeviceA-ike-profile-profile] match remote identity address 2.2.3.1 255.255.255.0

[DeviceA-ike-profile-profile] quit

The IKE profile configuration on Device B is as follows:

[DeviceB] ike keychain keychain1

[DeviceB-ike-keychain-keychain1] pre-shared-key address 2.2.2.1 255.255.255.0 key simple 123456TESTplat&!

[DeviceB-ike-keychain-keychain1] quit

[DeviceB] ike profile profile

[DeviceB-ike-profile-profile] keychain keychain1

[DeviceB-ike-profile-profile] local-identity address 2.2.3.1

[DeviceB-ike-profile-profile] match remote identity address 2.2.2.1 255.255.255.0

[DeviceB-ike-profile-profile] quit

b. Execute the display ike proposal command on IPsec gateways Device A and Device B respectively to view the IKE proposal configurations. Verify that the IKE proposal configurations are consistent.

[DeviceA] display ike proposal

Priority Authentication Authentication Encryption Diffie-Hellman Duration

method algorithm algorithm group (seconds)

----------------------------------------------------------------------------

default PRE-SHARED-KEY SHA1 DES-CBC Group 1 86400

[DeviceB] display ike proposal

Priority Authentication Authentication Encryption Diffie-Hellman Duration

method algorithm algorithm group (seconds)

----------------------------------------------------------------------------

default PRE-SHARED-KEY SHA1 DES-CBC Group 1 86400

c. If the issue persists, go to step 5.

5. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

○ Debugging information about IPsec tunnel establishment, collected after you execute the following debugging commands:

<DeviceA> terminal debugging

The current terminal is enabled to display debugging logs.

<DeviceA> terminal monitor

The current terminal is enabled to display logs.

<DeviceA> debugging ike all

<DeviceA> debugging ipsec all

IP tunneling issues

Failure in pinging the IP address of the remote tunnel interface from the local tunnel interface for a P2P tunnel

Symptom

After you configure a P2P tunnel (for example, a GRE, IPv4, or IPv6 tunnel), you cannot ping the IP address of the remote tunnel interface from the IP address of the local tunnel interface.

This section uses a GRE/IPv4 tunnel to describe the troubleshooting procedure.

NOTE:

The troubleshooting procedure in this section is not applicable to P2MP tunnels like DS-Lite and GRE P2MP tunnels.

Common causes

The following are the common causes of this type of issue:

· Configuration errors. For example, the tunnel modes at the two ends of the tunnel are inconsistent, or the source or destination address is not configured on one of the tunnel interfaces. Another example is that the source and destination addresses at one end are not the destination and source addresses at the other end, respectively.

· Physical link disconnectivity. The tunnel interfaces cannot come up because no route exists between the tunnel source and destination addresses. Alternatively, the physical links that the tunnel relies on are all down. In that case, intermediate devices drop tunneled packets even if the tunnel interfaces at both ends are up.

Troubleshooting flow

Figure 128 shows the troubleshooting flowchart.

Figure 128 Flowchart for troubleshooting the failure in pinging the IP address of the remote tunnel interface

Solution

1. Verify that the tunnel interface configuration is complete on both ends of the tunnel.

Execute the display current-configuration interface tunnel command on both ends of the tunnel to display the tunnel interface configuration. Make sure the tunnel source address, tunnel destination address, and IP address of the tunnel interface have all been configured on each end.

<Sysname> display current-configuration interface tunnel 1

interface Tunnel1 mode gre

ip address 10.1.1.1 255.255.255.0

source 1.1.1.1

destination 1.1.1.2

If the tunnel interface configuration on one end is incomplete, add the missing configuration. The following is an example tunnel interface configuration:

<Sysname> system-view

[Sysname] interface tunnel 1 mode gre

[Sysname-Tunnel1] ip address 10.1.1.1 255.255.255.0

[Sysname-Tunnel1] source 1.1.1.1

[Sysname-Tunnel1] destination 1.1.1.2

2. Verify that the encapsulation modes at both ends of the tunnel are the same.

On each end, execute the display current-configuration interface tunnel command to display the encapsulation mode of the tunnel interface.

<Sysname> display current-configuration interface tunnel 1

interface Tunnel1 mode gre

ip address 10.1.1.1 255.255.255.0

source 1.1.1.1

destination 1.1.1.2

If the encapsulation modes at both ends are inconsistent, you must first execute the undo interface tunnel command to delete the tunnel interface with an incorrect mode, and then execute the interface tunnel command to re-create the tunnel interface. Deleting a tunnel interface also deletes the configuration on that tunnel interface. You must reconfigure the tunnel source address, tunnel destination address, and IP address of the tunnel interface after the tunnel interface is re-created.

3. Verify that the source and destination addresses at one end of the tunnel are the destination and source addresses at the other end of the tunnel, respectively.

On each end, execute the display current-configuration interface tunnel command to display the tunnel interface configuration. Make sure the tunnel source address on the local end is the tunnel destination address on the remote end and the tunnel destination address on the local end is the tunnel source address on the remote end. In addition, the tunnel source address on each end must be a local address.

Local end:

<Sysname> display current-configuration interface tunnel 1

interface Tunnel1 mode gre

ip address 10.1.1.1 255.255.255.0

source 1.1.1.1

destination 1.1.1.2

Remote end:

<Sysname> display current-configuration interface tunnel 1

interface Tunnel1 mode gre

ip address 10.1.1.2 255.255.255.0

source 1.1.1.2

destination 1.1.1.1

If the tunnel source or destination address on one end is incorrectly configured, execute the source or destination command in tunnel interface view to reconfigure the tunnel source or destination address.

4. Verify that the GRE keys at both ends of the tunnel are identical.

Either configure the same GRE key at both ends of a GRE tunnel, or configure no GRE key at either end. To check the GRE key configuration, execute the display current-configuration interface tunnel command on both ends.

Local end:

interface Tunnel1 mode gre

ip address 10.1.1.1 255.255.255.0

source 1.1.1.1

destination 1.1.1.2

gre key 123

Remote end:

interface Tunnel1 mode gre

ip address 10.1.1.2 255.255.255.0

source 1.1.1.2

destination 1.1.1.1

gre key 123

If the GRE keys configured on both ends of the tunnel are different, execute the gre key command in tunnel interface view to configure the same GRE key on both ends.
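Steps 1 through 4 all compare the same few lines of tunnel interface configuration on the two ends. The following Python sketch parses saved display current-configuration interface tunnel output from each end and lists the problems those steps look for:

· A missing mode, IP address, source, or destination.

· Different encapsulation modes.

· Source and destination addresses that are not mirrored.

· Different GRE keys.

The sketch assumes the source and destination are configured as IP addresses, not interfaces. The helper names are hypothetical.

```python
def parse_tunnel(config: str) -> dict:
    """Extract mode, IP address, source, destination, and GRE key."""
    info = {"mode": None, "ip address": None, "source": None,
            "destination": None, "gre_key": None}
    for line in config.splitlines():
        words = line.split()
        if words[:1] == ["interface"] and "mode" in words:
            # Keep everything after "mode", for example "gre" or "gre ipv6".
            info["mode"] = " ".join(words[words.index("mode") + 1:])
        elif words[:2] == ["ip", "address"]:
            info["ip address"] = " ".join(words[2:])
        elif words[:1] == ["source"] and len(words) == 2:
            info["source"] = words[1]
        elif words[:1] == ["destination"] and len(words) == 2:
            info["destination"] = words[1]
        elif words[:2] == ["gre", "key"] and len(words) == 3:
            info["gre_key"] = words[2]
    return info


def tunnel_problems(local_cfg: str, remote_cfg: str) -> list:
    """List the configuration problems checked in steps 1 through 4."""
    local, remote = parse_tunnel(local_cfg), parse_tunnel(remote_cfg)
    problems = []
    for side, cfg in (("local", local), ("remote", remote)):
        for key in ("mode", "ip address", "source", "destination"):
            if not cfg[key]:
                problems.append(f"{side}: {key} not configured")
    if local["mode"] != remote["mode"]:
        problems.append("encapsulation modes differ")
    if (local["source"] != remote["destination"]
            or local["destination"] != remote["source"]):
        problems.append("source/destination addresses are not mirrored")
    if local["gre_key"] != remote["gre_key"]:
        problems.append("GRE keys differ or are configured on one end only")
    return problems


local_cfg = """interface Tunnel1 mode gre
 ip address 10.1.1.1 255.255.255.0
 source 1.1.1.1
 destination 1.1.1.2
 gre key 123"""
remote_cfg = """interface Tunnel1 mode gre
 ip address 10.1.1.2 255.255.255.0
 source 1.1.1.2
 destination 1.1.1.1
 gre key 123"""
print(tunnel_problems(local_cfg, remote_cfg))  # prints []
```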

5. Verify that the tunnel interfaces at both ends are already up.

Execute the display interface tunnel command to display the tunnel interface state. If the tunnel interface on one end is still down after you perform the preceding steps, continue with the procedure for troubleshooting tunnel interface instability.

<Sysname> display interface tunnel 1

Tunnel1

Current state: UP

Line protocol state: UP

Description: Tunnel1 Interface

Bandwidth: 64kbps

Maximum transmission unit: 1476

Internet address: 10.1.2.1/24 (primary)

Tunnel source 2002::1:1 (Vlan-interface10), destination 2001::2:1

Tunnel TOS 0xC8, Tunnel TTL 255

Tunnel protocol/transport GRE/IPv6

...

6. Verify that the source and destination IP addresses of the tunnel have routes to reach each other.

Execute the display current-configuration interface tunnel command to identify whether the IP addresses of the tunnel interfaces at both ends of the tunnel belong to the same subnet. If they belong to the same subnet, the two ends will generate subnet routes by default. In this case, no physical link disconnectivity issue exists. If they do not belong to the same subnet, execute the display fib command to identify whether the source and destination IP addresses of the tunnel have routes to reach each other. If no routes are available, you must configure static or dynamic routes to make sure the source and destination IP addresses of the tunnel have routes to reach each other. If the issue persists, proceed to step 7.

<Sysname> display fib

Route destination count: 4

Directly-connected host count: 0

Flag:

U:Useable G:Gateway H:Host B:Blackhole D:Dynamic S:Static

R:Relay F:FRR

Destination/Mask Nexthop Flag OutInterface/Token Label

0.0.0.0/32 127.0.0.1 UH InLoop0 Null

1.1.1.2/24 192.168.126.1 USGF M-GE0/0/0 Null

127.0.0.0/8 127.0.0.1 U InLoop0 Null

127.0.0.0/32 127.0.0.1 UH InLoop0 Null
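The two checks in step 6 can be modeled with the following Python sketch, assuming you paste in the relevant configuration and FIB lines yourself. The same_subnet() function tests whether two tunnel interface addresses, given in "ADDRESS MASK" form, belong to the same subnet. The fib_lookup() function performs a longest-prefix match of a tunnel destination against saved display fib output. The sketch only models the lookup logic. It does not replace checking the FIB on the device itself.

```python
import ipaddress


def same_subnet(local_if: str, remote_if: str) -> bool:
    """Compare 'ADDRESS MASK' strings, as in 'ip address 10.1.1.1 255.255.255.0'."""
    local = ipaddress.ip_interface("/".join(local_if.split()))
    remote = ipaddress.ip_interface("/".join(remote_if.split()))
    return local.network == remote.network


def fib_lookup(fib_output: str, destination: str):
    """Longest-prefix match against 'Destination/Mask Nexthop Flag OutInterface' rows.

    Returns (network, next hop, outgoing interface), or None if no entry matches.
    """
    dest = ipaddress.ip_address(destination)
    best = None
    for line in fib_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or "/" not in fields[0]:
            continue
        try:
            # strict=False tolerates host bits, as in "1.1.1.2/24".
            net = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # header row such as "Destination/Mask"
        if dest in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, fields[1], fields[3])
    return best


fib = """Destination/Mask Nexthop Flag OutInterface/Token Label
0.0.0.0/32 127.0.0.1 UH InLoop0 Null
1.1.1.2/24 192.168.126.1 USGF M-GE0/0/0 Null
127.0.0.0/8 127.0.0.1 U InLoop0 Null
127.0.0.0/32 127.0.0.1 UH InLoop0 Null"""
print(same_subnet("10.1.1.1 255.255.255.0", "10.1.1.2 255.255.255.0"))  # prints True
print(fib_lookup(fib, "1.1.1.1"))
```

A None result for the tunnel destination means the FIB has no route to it. In that case, configure static or dynamic routes as described in this step.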

7. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

○ Command output from the debugging commands in Table 18.

Table 18 Debugging commands

Command	Description
debugging tunnel	Enable tunneling debugging.
debugging gre	Enable GRE debugging.
debugging ip packet [ acl acl-number ]	Enable IP packet debugging.
debugging ipv6 packet [ acl acl-number ]	Enable IPv6 packet debugging.
debugging ip error	Enable IP forwarding error debugging.
debugging ip info [ acl acl-number ]	Enable IP forwarding debugging.
debugging ipv6 info [ acl acl-number ]	Enable IPv6 forwarding debugging.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting user access and authentication issues

802.1X issues

802.1X user authentication failure

Symptom

A user fails 802.1X authentication or an exception occurs during 802.1X authentication.

Common causes

The following are the common causes of this type of issue:

· 802.1X is not enabled globally or on the interface that the user accesses.

· The 802.1X client cannot correctly send or receive authentication packets.

· The authentication method configured on the device is inconsistent with that on the RADIUS server.

· The authentication domain used by the 802.1X user or other authentication-related settings are incorrectly configured.

· The RADIUS server does not respond.

· The RADIUS server rejects the authentication request of the user.

· Authorization attribute assignment fails.

· The MAC address of the 802.1X user is bound to an interface other than the one that the user accesses.

· The 802.1X user is in quiet state.

· The maximum number of online 802.1X users has been reached.

Troubleshooting flow

Figure 129 shows the troubleshooting flowchart.

Figure 129 Flowchart for troubleshooting 802.1X user authentication failure

Solution

IMPORTANT:

· As a best practice, enable debugging only to troubleshoot a fault. Do not enable debugging while the device is operating correctly.

· Save the result of each step in this section promptly, so that you can quickly collect and provide the information if the fault cannot be resolved.

1. Verify that 802.1X is enabled globally and on the interface that the user accesses.

Execute the display dot1x command on the device to identify whether 802.1X is enabled both globally and on the interface that the user accesses.

○ If the message 802.1X is not configured appears, 802.1X is not enabled globally. Execute the dot1x command in system view to enable 802.1X globally.

○ If the display dot1x output contains global configuration information but no configuration information for the interface, 802.1X is not enabled on the interface. Execute the dot1x command in the view of that interface to enable 802.1X on it.

2. Verify that the 802.1X client can correctly send and receive authentication packets.

○ Verify that the 802.1X client version is a version supported by both the device and the server.

○ Verify that the link between the device and the 802.1X client is correctly connected.

○ Capture packets to inspect whether the device can correctly exchange data packets with the client and analyze the captured packet file to locate and resolve the issue.
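When you analyze a packet capture, classifying each 802.1X frame can make the exchange easier to follow. The following Python sketch decodes the EAPOL header (EtherType 0x888E). For EAP packets, it also decodes the EAP code, identifier, and type. It assumes untagged Ethernet frames given as raw bytes, because a VLAN tag would shift the offsets. The client MAC address in the usage example is hypothetical.

```python
import struct

EAPOL_ETHERTYPE = 0x888E
EAPOL_TYPES = {0: "EAP-Packet", 1: "EAPOL-Start", 2: "EAPOL-Logoff", 3: "EAPOL-Key"}
EAP_CODES = {1: "Request", 2: "Response", 3: "Success", 4: "Failure"}


def describe_eapol(frame: bytes) -> str:
    """Summarize an untagged Ethernet frame carrying EAPOL."""
    if len(frame) < 18:  # 14-byte Ethernet header + 4-byte EAPOL header
        return "too short"
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != EAPOL_ETHERTYPE:
        return "not EAPOL"
    _version, packet_type, _body_len = struct.unpack("!BBH", frame[14:18])
    if packet_type != 0 or len(frame) < 22:
        return EAPOL_TYPES.get(packet_type, f"EAPOL type {packet_type}")
    # EAP header: code, identifier, length; Request/Response add a type byte.
    code, identifier, _eap_len = struct.unpack("!BBH", frame[18:22])
    summary = f"EAP {EAP_CODES.get(code, code)} id={identifier}"
    if code in (1, 2) and len(frame) > 22:
        summary += f" type={frame[22]}"  # 1 = Identity, 4 = MD5-Challenge
    return summary


pae_group = bytes.fromhex("0180c2000003")   # 802.1X PAE group address
client_mac = bytes.fromhex("000102030405")  # hypothetical client MAC
start = pae_group + client_mac + b"\x88\x8e" + bytes([1, 1, 0, 0])
print(describe_eapol(start))  # prints EAPOL-Start
```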

3. Verify that the authentication method is consistent on the device and the RADIUS server.

On the device, 802.1X supports EAP termination (PAP and CHAP authentication methods) and EAP relay (EAP authentication method). When you configure the authentication method, follow these restrictions and guidelines:

○ Make sure the device and the RADIUS server use the same authentication method and that the client supports it.

○ Local authentication supports only EAP termination.

Execute the display dot1x command on the device to check the current 802.1X authentication method.

<Sysname> display dot1x

Global 802.1X parameters:

802.1X authentication : Enabled

DR member configuration conflict : Unknown

EAP authentication : Enabled

...

If the authentication method is inconsistent with the server, you can execute the dot1x authentication-method command to change the authentication method.

4. Verify that the authentication domain and its related settings are correctly configured.

The device chooses an authentication domain for an 802.1X user in the following order: The mandatory 802.1X authentication domain specified on the interface that the user accesses -> The ISP domain specified in the username -> The default ISP domain in the system.

a. Execute the display dot1x command on the device to examine whether a mandatory 802.1X authentication domain has been specified on the interface that the user accesses.

<Sysname> display dot1x

…

GigabitEthernet2/0/1 is link-up

802.1X authentication : Enabled

…

Multicast trigger : Enabled

Mandatory auth domain : Not configured

…

If a mandatory 802.1X authentication domain has been specified, execute the display domain command to verify that authentication methods are correctly configured in the mandatory 802.1X authentication domain.

b. If no mandatory 802.1X authentication domain has been specified, check the 802.1X username for a domain name. If the 802.1X username includes a domain name, verify that the domain name delimiter is also supported by the RADIUS server, and then locate the domain specified by the username and verify that the settings in the domain are correct.

c. If the 802.1X username does not include a domain name, check the configuration of the default authentication domain.

d. If the default authentication domain does not exist, identify whether the domain if-unknown command has been executed. If the command has been executed, verify that authentication methods are correctly configured in the domain specified by using the command.

e. If none of these authentication domains is available on the device for the user, the user cannot complete authentication.
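The domain selection order in step 4 can be sketched in Python as follows. This is an illustration and not the device's implementation. It assumes that @ is the only domain delimiter (username@domain). Whenever the selected domain does not exist, it falls back to the domain specified by the domain if-unknown command, which is a generalization of substep d. All names are hypothetical.

```python
from typing import Optional, Set


def select_domain(username: str, configured_domains: Set[str],
                  mandatory_domain: Optional[str] = None,
                  default_domain: Optional[str] = None,
                  if_unknown_domain: Optional[str] = None) -> Optional[str]:
    """Return the ISP domain used for an 802.1X user, or None if none is usable."""
    if mandatory_domain:                    # 1. mandatory domain on the interface
        candidate = mandatory_domain
    elif "@" in username:                   # 2. domain carried in the username
        candidate = username.rsplit("@", 1)[1]
    else:                                   # 3. default ISP domain in the system
        candidate = default_domain
    if candidate in configured_domains:
        return candidate
    if if_unknown_domain in configured_domains:  # assumed if-unknown fallback
        return if_unknown_domain
    return None                             # authentication cannot proceed


domains = {"system", "dot1x-dom", "guest"}  # hypothetical configured domains
print(select_domain("alice@dot1x-dom", domains, default_domain="system"))  # prints dot1x-dom
```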

5. Verify that the RADIUS server can respond to the device.

For more information about the troubleshooting procedure, see the issue of RADIUS server no response in “Troubleshooting AAA.”

6. Identify whether the authentication failure is caused by authentication rejection.

a. Execute the debugging dot1x event command to enable 802.1X authentication event debugging.

- If the system generates debugging message Local authentication request was rejected, the user is rejected by local authentication. Causes for local authentication rejection include nonexistence of local user account, incorrect username or password, and incorrect user service type.

- If the system generates debugging message The RADIUS server rejected the authentication request, the user is rejected by the RADIUS server. Many reasons can cause RADIUS server authentication rejection. The most common ones include absence of username on the server, incorrect username or password, and no matching RADIUS authorization policy. You can execute the debugging radius error command to enable RADIUS error debugging and check the debugging messages in the command output. In addition, execute the test-aaa command on the device to initiate a RADIUS request test. After you locate the issue, adjust the settings of the server, device, and client accordingly.

b. Execute the display aaa online-fail-record command and check the Online failure reason field for the authentication failure reason.

7. Verify that authorization attributes are assigned successfully to the user.

a. Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates message Authorization failure, the authorization has failed.

b. Examine whether the device has been configured with the port-security authorization-fail offline command to enable the authorization-fail-offline feature.

- If the authorization-fail-offline feature is not enabled, users who fail authorization can still stay online. In this case, the authentication failure is not caused by an authorization failure, and you must continue to locate other fault reasons.

- If the authorization-fail-offline feature is enabled, execute the dot1x access-user log enable failed-login command to enable logging for 802.1X user login failures. In addition, use the DOT1X_LOGIN_FAILURE log to identify the failed authorization attributes, such as the authorization ACL and VLAN.

c. Verify that the authorization attributes, for example, the authorization ACL and VLAN, on the server are configured correctly, to ensure that the server assigns accurate authorization attributes to the user.

d. Execute the display acl and display vlan commands to verify that the corresponding authorization attributes exist on the device. If an authorization attribute does not exist, you must create it on the device to ensure that the user can obtain the authorization information.

8. Verify that the MAC address of the 802.1X user is not bound to an interface other than the one the user accesses.

a. Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message MAC binding processing failure, the device fails to process the MAC address binding request for the 802.1X user.

b. Execute the dot1x access-user log enable failed-login command to enable logging for 802.1X user login failures. Then, use the DOT1X_MACBINDING_EXIST log to determine whether the login failed because the user's MAC address is bound to another interface.

c. Use the undo dot1x mac-binding command on the device to delete the existing MAC address binding entry for the 802.1X user.

9. Examine whether the 802.1X user is in quiet state.

Execute the display dot1x command on the device and check the Quiet timer and Quiet period fields and the Auth state field in the Online 802.1X users area. If the quiet timer is enabled and the value for the Auth state is Unauthenticated for the 802.1X user, the 802.1X user is in quiet state.

The device cannot process any 802.1X authentication requests for the quiet 802.1X user until the quiet timer expires. You can wait until the quiet timer expires or execute the dot1x timer quiet-period command to shorten the quiet period. When the quiet timer expires, initiate 802.1X authentication for the user and verify that the user can pass 802.1X authentication.

10. Identify whether the number of online 802.1X users has reached the maximum value.

a. Execute the display dot1x interface command on the device to check the information on the interface that the user accesses. The Max online users field displays the maximum number of online users supported on the interface and the Online 802.1X users field displays the current number of online users on the interface. Compare the values for these two fields to determine whether the number of online 802.1X users has reached the maximum value.

b. If the number of online 802.1X users on the interface has reached the maximum value, you can execute the dot1x max-user command to increase the maximum number of online 802.1X users allowed on the interface.

c. If the number of online 802.1X users on the interface cannot be increased, you can wait for other users to go offline or connect the user to another interface.

11. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ The log messages collected by executing the dot1x access-user log enable command.

¡ The debugging messages collected by executing the debugging dot1x all and debugging radius all commands.
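A long debugging capture is easier to review if you search it for the debugging messages quoted in steps 6 through 8. The following Python sketch is a hypothetical helper and not an H3C tool. The message strings come from this section. The `FAILURE_HINTS` table, its hint wording, and the `triage()` function are illustrative names. The sketch assumes that each debugging message appears as plain text somewhere in a captured line.

```python
# Hypothetical triage helper for 802.1X authentication failures.
# Message texts are quoted from this guide; the hint wording is illustrative.
FAILURE_HINTS = {
    "Local authentication request was rejected":
        "Step 6: check the local user account, password, and service type",
    "The RADIUS server rejected the authentication request":
        "Step 6: check the username, password, and authorization policy on the RADIUS server",
    "Authorization failure":
        "Step 7: check the authorization ACL and VLAN on the server and the device",
    "MAC binding processing failure":
        "Step 8: delete the conflicting binding with undo dot1x mac-binding",
}


def triage(debug_lines):
    """Return the hints whose debugging message text appears in the captured lines."""
    hints = []
    for line in debug_lines:
        for message, hint in FAILURE_HINTS.items():
            if message in line and hint not in hints:
                hints.append(hint)  # keep each hint once, in first-seen order
    return hints


if __name__ == "__main__":
    capture = ["802.1X event: Authorization failure"]  # hypothetical capture line
    for hint in triage(capture):
        print(hint)
```

Simple substring matching keeps the helper independent of the exact debugging line format. That format is not specified in this section.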

Related alarm and log messages

Alarm messages

N/A

Log messages

· DOT1X_CONFIG_NOTSUPPORT

· DOT1X_LOGIN_FAILURE

· DOT1X_MACBINDING_EXIST

802.1X user logoff

Symptom

An 802.1X user goes offline unexpectedly after it passes authentication successfully to come online.

Common causes

The following are the common causes of this type of issue:

· Settings related to 802.1X authentication have been changed on the device.

· The user fails the online user handshake.

· Real-time accounting fails for the user.

· 802.1X reauthentication fails.

· The server forces the user to go offline.

· Offline detection is enabled, and the device receives no traffic from the user before the offline detection timer expires.

· The session of the user times out.

· The session of the user times out.

Troubleshooting flow

Figure 130 shows the troubleshooting flowchart.

Figure 130 Flowchart for troubleshooting 802.1X user logoff

Solution

IMPORTANT:

· As a best practice, do not enable debugging when the device is running correctly. However, you can enable debugging when a fault occurs for troubleshooting purposes.

· Save the results of the steps in this section in a timely manner, so that you can quickly collect and provide feedback if the fault cannot be resolved.

1. Identify whether settings related to 802.1X authentication have been changed on the device, and verify that any changed settings are correct.

a. Execute the display dot1x command to examine whether the 802.1X authentication settings on the device have been changed, and verify that any changed settings are correct.

b. Execute the display domain command to examine whether settings in the authentication domain used by the user have been changed, and verify that any changed settings are correct.

2. Identify whether the user fails the online user handshake, and troubleshoot the cause of the failure.

a. Execute the display dot1x command to check the Handshake field to identify whether 802.1X online user handshake is enabled on the interface that the user accesses.

b. Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message Handshake interaction failure, the user has failed the online user handshake. Capture packets to check whether the device and the client correctly send and receive EAP packets, and analyze the captured packet file to locate and resolve the issue.

3. Examine whether real-time accounting fails for the user and troubleshoot the cause of the failure.

Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message Real-time accounting failure, real-time accounting has failed for the user. In this case, check the link between the device and the accounting server for connectivity issues. Then, identify whether accounting settings have been changed on the device or the accounting server, and verify that any changed settings are correct.

4. Identify whether the user is logged off due to a reauthentication failure and troubleshoot the cause of the failure.

a. Execute the display dot1x command to check the Periodic reauth field for the enabling status of 802.1X periodic reauthentication on the interface that the user accesses.

b. Execute the dot1x access-user log enable abnormal-logoff command to enable logging for abnormal logoffs of 802.1X users. Then, use the DOT1X_LOGOFF_ABNORMAL log to verify whether the abnormal logoff was caused by a reauthentication failure.

c. To troubleshoot the cause of the reauthentication failure, use the method in "802.1X user authentication failure."

5. Identify whether the user is logged off by the RADIUS server if RADIUS remote authentication is used.

Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message The RADIUS server forcibly logged out the user, the user is logged off by the RADIUS server. You can contact the server administrator to identify the reason for the logoff.

6. Identify whether the user is logged off because the device has not received any traffic from the user before the offline detection timer expires, and troubleshoot the issue.

a. Execute the display dot1x command to check the Offline detection field for the enabling status of the 802.1X offline detection feature on the interface that the user accesses.

b. Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message Offline detect timer expired, it indicates that the device has not received any traffic from the user on the interface within the offline detection interval. As a result, the device cuts off the user's connection, causing the user to go offline.

c. Check the link between the client and the device for connectivity issues to determine why the device received no packets from the client.

7. Examine whether the session of the user has timed out.

a. Identify whether the session timeout time has been configured for the 802.1X user.

- If RADIUS remote authentication is used, execute the debugging radius packet command to enable RADIUS packet debugging and check the debugging messages to identify whether the response packets sent from the server contain the Session-Timeout attribute.

- If local authentication is used, execute the display local-user command to check for the existence of the Session-timeout field in the command output.

b. Execute the debugging dot1x event command to enable 802.1X authentication event debugging. If the system generates debugging message User session timed out, the user goes offline due to session timeout.

c. Going offline because of a session timeout is expected behavior. The user can initiate authentication again to come back online.

8. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ The offline reason displayed by executing the display aaa abnormal-offline-record or display aaa normal-offline-record command.

¡ The log messages collected by executing the dot1x access-user log enable command.

¡ The debugging messages collected by executing the debugging dot1x all and debugging radius all commands.
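Steps 6 and 7 involve two timers. Offline detection logs the user off when no traffic arrives from the user within the detection interval. The session timeout comes from the RADIUS Session-Timeout attribute or from the local user configuration. The Python sketch below is a simplified model for reasoning about which timer ended a session. It is not the device's implementation. The `UserTimers` class, its field names, the inclusive comparisons, and the order in which the two checks run are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserTimers:
    """Hypothetical per-user state; all times are in seconds."""
    login_time: float                         # when the user came online
    last_traffic_time: float                  # last packet received from the user
    offline_detect_interval: Optional[float]  # None if offline detection is disabled
    session_timeout: Optional[float]          # None if no session timeout is assigned


def logoff_reason(t: UserTimers, now: float) -> Optional[str]:
    """Return which timer (if any) would have logged the user off at time `now`."""
    if t.session_timeout is not None and now - t.login_time >= t.session_timeout:
        return "session timeout"
    if (t.offline_detect_interval is not None
            and now - t.last_traffic_time >= t.offline_detect_interval):
        return "offline detection timer expired"
    return None


if __name__ == "__main__":
    # Offline detection enabled with a 300-second interval; last packet at t=100.
    user = UserTimers(login_time=0, last_traffic_time=100,
                      offline_detect_interval=300, session_timeout=None)
    print(logoff_reason(user, now=400))
```

If the model points to offline detection, step 6c applies. If it points to a session timeout, the logoff is expected, as described in step 7c.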

Related alarm and log messages

Alarm messages

N/A

Log messages

· DOT1X_LOGOFF

· DOT1X_LOGOFF_ABNORMAL

Troubleshooting AAA issues

Unable to execute some commands after logging into the device

Symptom

After logging in to the device, the administrator cannot execute some commands, and the system displays the Permission denied message.

Common causes

The common cause of this type of issue is that the user role assigned to the administrator does not have sufficient permissions.

Troubleshooting flow

Figure 131 shows the troubleshooting flowchart.

Figure 131 Flowchart for troubleshooting the failure to execute some commands after login

Solution

1. Check whether the user role is a custom user role.

Log in to the device as a super administrator (with the network-admin, mdc-admin, or level-15 user role). Execute the display line command to view the authentication mode of each user line, and proceed according to the authentication mode in use.

<Sysname> display line

Idx Type Tx/Rx Modem Auth Int Location

0 CON 0 9600 - N - 0/0

+ 81 VTY 0 - N - 0/0

+ 82 VTY 1 - P - 0/0

+ 83 VTY 2 - A - 0/0

...

¡ For authentication mode none or password (Auth field value: N or P), check whether the user role in the corresponding user line view is a custom user role. If it is not, use the user-role role-name command to assign a predefined system role with higher privileges. If it is, go to step 2.

¡ For the scheme authentication mode (Auth field value: A), first check the authentication method configured in the authentication domain for the login user.

If the domain's authentication method is local, use the display local-user command to check whether the user role is a custom user role. If it is not, use the authorization-attribute user-role role-name command to assign a predefined system role with higher privileges (for example, network-admin). If it is, go to step 2.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] authorization-attribute user-role network-admin

If the domain's authentication method is remote, contact the administrator of the remote authentication server to authorize a predefined system role with higher permissions.

2. Check whether the commands that fail to execute are within the permissions of the custom user role.

Execute the display role name role-name command to view the rules of the custom user role.

If the commands executed by the user are outside the permissions granted by these rules, use the rule command to add permissions for the commands to the custom user role. Alternatively, assign the user a predefined system role with higher privileges. Note that some commands remain unavailable to custom user roles regardless of their rules. For details on these commands, see the RBAC configuration in Fundamentals Configuration Guide.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
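When many user lines are configured, reading the Auth column of the display line output in step 1 by eye is error-prone. The following Python sketch parses that output and maps the Auth letter to the authentication mode described in step 1 (N for none, P for password, A for scheme). It is a hypothetical helper. It assumes the column layout shown in the sample, where the Tx/Rx column is blank for VTY lines. For that reason, it reads Auth as the third field from the right, before Int and Location.

```python
# Auth letters from the display line output, as described in step 1.
AUTH_MODES = {"N": "none", "P": "password", "A": "scheme"}


def parse_display_line(output):
    """Return (line name, authentication mode) pairs from display line output."""
    result = []
    for raw in output.splitlines():
        tokens = raw.split()
        if tokens and tokens[0] == "+":  # leading + marks an active line
            tokens = tokens[1:]
        # Skip the command echo, the header row, "...", and blank lines.
        if len(tokens) < 6 or not tokens[0].isdigit():
            continue
        line_name = f"{tokens[1]} {tokens[2]}"  # for example, "VTY 1"
        auth = tokens[-3]                       # columns end with Auth, Int, Location
        result.append((line_name, AUTH_MODES.get(auth, auth)))
    return result


if __name__ == "__main__":
    sample = """Idx Type Tx/Rx Modem Auth Int Location
0 CON 0 9600 - N - 0/0
+ 82 VTY 1 - P - 0/0
+ 83 VTY 2 - A - 0/0"""
    for name, mode in parse_display_line(sample):
        print(name, mode)
```

Lines in password or none mode take their role from the user line view. Lines in scheme mode take it from the authentication domain, so the two groups follow different branches of step 1.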

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Unable to create or edit local users after logging into the device

Symptom

After logging in to the device, the administrator cannot create or edit local users, and the system displays the Insufficient right to perform the operation message.

Common causes

The common cause of this type of issue is that the user role is not authorized to configure the target local users.

Troubleshooting flow

Figure 132 shows the troubleshooting flowchart.

Figure 132 Flowchart for troubleshooting the failure to create or edit local users after login

Solution

1. Check whether the role of the current logged-in user is a predefined super administrator role (network-admin, mdc-admin, or level-15).

Only the predefined super administrator roles have permission to create local users. Other user roles can access only their own local user views. If the logged-in user does not have a super administrator role, assign one to the user.

Perform this step only if you cannot create local users. If you cannot edit existing local users, go to step 2.

2. Compare the permission scope of the logged-in user with that of the target user.

Execute the display role name role-name command to view the roles and permissions of both the logged-in user and the target user, and compare their permissions. If the logged-in user has lower permissions than the target user, assign the logged-in user a role with higher permissions.

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
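The check in step 1 reduces to a membership test on the roles of the logged-in user. The Python sketch below encodes that rule as stated in this section. The function name is hypothetical. The role names are the predefined super administrator roles listed in step 1.

```python
# Predefined super administrator roles named in step 1.
SUPER_ADMIN_ROLES = frozenset({"network-admin", "mdc-admin", "level-15"})


def can_create_local_users(user_roles):
    """Return True if any role of the logged-in user may create local users."""
    return any(role in SUPER_ADMIN_ROLES for role in user_roles)


if __name__ == "__main__":
    print(can_create_local_users(["network-operator"]))
    print(can_create_local_users(["network-operator", "level-15"]))
```

This sketch covers only creation. Editing an existing local user also depends on the permission comparison in step 2, which the sketch does not model.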

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Administrator not assigned a user role

Symptom

The administrator cannot log in to the device, and the device does not allow the usual three login attempts. For example, after a user attempts to log in through Telnet and enters the username and password, the login interface neither displays an AAA authentication failure message nor prompts the user to re-enter the credentials.

Common causes

The common cause of this type of issue is that the user is not assigned a user role.

Troubleshooting flow

Figure 133 shows the troubleshooting flowchart.

Figure 133 Flowchart for troubleshooting the issue of administrator not assigned a user role

Solution

1. Check whether the user is assigned a user role. Execute the display line command to view the authentication mode of the user line.

<Sysname> display line

Idx Type Tx/Rx Modem Auth Int Location

0 CON 0 9600 - N - 0/0

+ 81 VTY 0 - N - 0/0

+ 82 VTY 1 - P - 0/0

+ 83 VTY 2 - A - 0/0

...

¡ For authentication mode none or password (Auth field value N or P), check whether the user role configuration exists in the corresponding user line view. If it does not, assign a user role (abc in this example) to the user line by using the user-role role-name command.

<Sysname> system-view

[Sysname] line vty 0 63

[Sysname-line-vty0-63] user-role abc

¡ For the scheme authentication mode (Auth field value: A), first check the authentication method configured in the authentication domain for the login user.

- If the domain's authentication method is local, use the display local-user command to view the authorized roles of the local user. If the User role list field is empty, it indicates that no user role is authorized for the user.

<Sysname> display local-user user-name test class manage

Total 1 local users matched.

Device management user test:

State: Active

Service type: Telnet

User group: system

Bind attributes:

Authorization attributes:

Work directory: flash:

User role list:

...

In this case, enter the local user view and execute the authorization-attribute user-role command to authorize the user role (abc in this example).

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] authorization-attribute user-role abc

- If the domain's authentication method is remote, contact the administrator of the authentication server to check whether the user has been authorized with a user role. If not, add the user role authorization attribute for the user. For example, on a FreeRADIUS server, add an entry like the following to the users file to assign the network-admin user role. The reply attribute must be on the line after the username and must be indented:

user Cleartext-Password := "123456"
    H3C-User-Roles = "shell:roles=\"network-admin\""

To add user roles on other RADIUS servers, see the documentation for the specific server.

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
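For local users, step 1 comes down to whether the User role list field in the display local-user output is empty. The following Python sketch extracts that field. It is a hypothetical helper. It assumes that multiple roles appear on the same line separated by commas or spaces, which the sample output in this section does not show.

```python
def user_roles(display_output):
    """Return the roles in the User role list field of display local-user output.

    An empty list means the field is empty or absent, that is, no user role
    is authorized for the user. Assumption: multiple roles are separated by
    commas or spaces on the same line.
    """
    prefix = "User role list:"
    for line in display_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            value = line[len(prefix):]
            return value.replace(",", " ").split()
    return []


if __name__ == "__main__":
    output = """Device management user test:
State: Active
Service type: Telnet
User role list:"""
    if not user_roles(output):
        print("No user role authorized: use authorization-attribute user-role")
```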

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Invalid characters in login username

Symptom

The administrator failed to log in to the device, and the system printed the following log information:

Sysname LOGIN/5/LOGIN_INVALID_USERNAME_PWD: -MDC=1; Invalid username or password from xx.xx.xx.xx.

Common causes

The common cause of this type of issue is that the entered username contains invalid characters.

Troubleshooting flow

Figure 134 shows the troubleshooting flowchart.

Figure 134 Flowchart for troubleshooting the issue of username containing invalid characters

Solution

NOTE:

This solution applies only to SSH and Telnet login users.

1. Check whether the username entered by the user contains invalid characters.

When a user logs in to the device, the system checks the validity of the entered username and domain name. Login is not allowed if the username contains any of the characters "\", "|", "/", ":", "*", "?", "<", ">", and "@", or if the domain name contains "@". In this case, the user must log in again with a valid username.

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
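The validity check in step 1 can be reproduced before a user retries, to confirm that a name will pass. The following Python sketch applies the rule exactly as stated in step 1. The function name is hypothetical. The sketch also assumes that the username and domain name have already been separated, because this section does not describe how the device splits them.

```python
# Characters that step 1 says are not allowed in a login username.
FORBIDDEN_IN_USERNAME = frozenset('\\|/:*?<>@')


def login_name_allowed(username, domain=""):
    """Apply the documented check to a username and an optional domain name.

    Assumption: the caller has already separated the username from the
    domain name.
    """
    if any(ch in FORBIDDEN_IN_USERNAME for ch in username):
        return False
    return "@" not in domain


if __name__ == "__main__":
    for name in ("admin", "ad:min", "ops*1"):
        print(name, login_name_allowed(name))
```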

Related alarm and log messages

Alarm messages

N/A

Log messages

LOGIN_INVALID_USERNAME_PWD

Incorrect username or password for local authentication

Symptom

The administrator failed to log in to the device using local authentication. If event debugging is enabled for the local server (by using the debugging local-server event command), the system prints the following debugging information:

*Aug 18 10:36:58:514 2021 Sysname LOCALSER/7/EVENT: -MDC=1;

Authentication failed, user password is wrong.

*Aug 18 10:37:24:962 2021 Sysname LOCALSER/7/EVENT: -MDC=1;

Authentication failed, user "t4" doesn't exist.

Common causes

The following are the common causes of this type of issue:

· The entered password is incorrect.

· The local username does not exist.
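Each of the two debugging messages shown in the Symptom maps to one of these causes. The following Python sketch classifies captured LOCALSER event text and returns the matching solution step. It is a hypothetical helper. It matches only the two message texts quoted above and reports anything else as unrecognized.

```python
import re

# Debugging texts quoted in the Symptom of this section.
_WRONG_PASSWORD = "user password is wrong"
_NO_SUCH_USER = re.compile(r'user "([^"]+)" doesn\'t exist')


def local_auth_cause(debug_text):
    """Map LOCALSER event debugging text to (cause, solution step number)."""
    if _WRONG_PASSWORD in debug_text:
        return ("incorrect password", 2)
    match = _NO_SUCH_USER.search(debug_text)
    if match:
        return (f"local user {match.group(1)} does not exist", 1)
    return ("unrecognized message", None)


if __name__ == "__main__":
    print(local_auth_cause('Authentication failed, user "t4" doesn\'t exist.'))
```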

Troubleshooting flow

Figure 135 shows the troubleshooting flowchart.

Figure 135 Flowchart for troubleshooting incorrect local username or password

Solution

1. Identify whether the local username exists.

Execute the display local-user command to identify whether a device management local user exists with the same username as the login username.

¡ If the local user does not exist, use the local-user command to create one (username test in this example) and notify the user to try logging in to the device again.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test]

¡ If the local user exists, execute step 2.

2. Check whether the entered password for the local user is correct.

If the system reports an incorrect password during user login, enter the local user view and execute the password command to reset the password (123456TESTplat&! in this example). Then, notify the user to try logging in to the device again.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] password simple 123456TESTplat&!

3. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration files, log messages, alarm messages, and debugging information.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Service type of local user mismatch

Symptom

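The administrator failed to log in to the device using local authentication. If event debugging is enabled for the local server (by using the debugging local-server event command), the system prints debugging information similar to the following:

*Aug 7 17:18:07:098 2021 Sysname LOCALSER/7/EVENT: -MDC=1; Authentication failed, unexpected user service type 64 (expected = 3072).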

Common causes

The common cause of this type of issue is that the user's access type is not among the service types configured for the local user on the device.

Troubleshooting flow

Figure 136 shows the troubleshooting flowchart.

Figure 136 Flowchart for troubleshooting service type of local user mismatch

Solution

1. Identify whether the user's access type falls within the range of service types configured for the local user.

a. Execute the display local-user command. The Service type field in the command output displays the service types the local user can use.

<Sysname> display local-user user-name test class manage

Total 1 local users matched.

Device management user test:

State: Active

Service type: Telnet

User group: system

Bind attributes:

Authorization attributes:

Work directory: flash:

User role list:

...

b. In the local user view of this user, modify the service types the user can use. Make sure the access type the user actually uses (SSH in this example) is included.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] service-type ssh

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration files, log messages, alarm messages, and debugging information.
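The debugging message in the Symptom reports two numbers: the service type of the login attempt and the service types the local user expects. This guide does not document how these numbers map to service names. However, 3072 equals 1024 + 2048, which suggests a bit-flag encoding. The following Python sketch extracts both numbers and splits them into bits under that assumption. The function names are hypothetical, and the bit-flag reading is an assumption that the display local-user output in step 1 should confirm.

```python
import re

_MISMATCH = re.compile(r"unexpected user service type (\d+) \(expected = (\d+)\)")


def bit_values(value):
    """Split a non-negative integer into the powers of two it contains."""
    return [1 << i for i in range(value.bit_length()) if (value >> i) & 1]


def parse_service_mismatch(message):
    """Extract the two numbers from the LOCALSER debugging message.

    Assumption: both numbers are bit flags, one bit per service type.
    The mapping from bits to service names is not documented here.
    """
    match = _MISMATCH.search(message)
    if match is None:
        return None
    actual, expected = int(match.group(1)), int(match.group(2))
    return {
        "actual_bits": bit_values(actual),
        "expected_bits": bit_values(expected),
        "shares_a_bit": (actual & expected) != 0,
    }


if __name__ == "__main__":
    print(parse_service_mismatch(
        "Authentication failed, unexpected user service type 64 (expected = 3072)."))
```

If no bit is shared, the access type in use is not among the configured service types, which matches the cause described in this section.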

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Denied access within a period due to excessive number of login failures

Symptom

After failing to log in to the device a specified number of times, an administrator is temporarily banned from attempting to log in again.

Common causes

The following are the common causes of this type of issue:

· The device has the login attack prevention feature enabled. With this feature enabled, if a user fails to log in for the specified number of consecutive times, the device adds the user's IP address to the blacklist and discards packets from that IP address. As a result, the user cannot log in for the block period.

· Users log in to the device using local authentication, and the device has the password control feature enabled. After a user login authentication fails, the system adds the user to the password management blacklist and restricts subsequent login attempts according to the measures configured. When a user login fails more times than the specified limit, the system will prohibit that user from logging in. After a period, the system allows the user to attempt to log in again.
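Both causes follow the same pattern: count consecutive failures, then block further attempts for a period. The following Python sketch is a simplified model of that pattern and is not the device's implementation. The `LoginLockout` class is hypothetical. Its `max_attempts` and `block_seconds` parameters loosely correspond to the attack-defense login max-attempt and attack-defense login block-timeout settings in step 3, although block-timeout is configured in minutes. Login attack prevention tracks source IP addresses, while password control tracks users. The model uses a generic `source` key for either.

```python
class LoginLockout:
    """Simplified model: block a source after max_attempts consecutive failures."""

    def __init__(self, max_attempts, block_seconds):
        self.max_attempts = max_attempts
        self.block_seconds = block_seconds
        self.failures = {}       # source -> consecutive failure count
        self.blocked_until = {}  # source -> time at which the block ends

    def is_blocked(self, source, now):
        return now < self.blocked_until.get(source, 0.0)

    def record(self, source, success, now):
        """Record one login result for a source at time `now` (seconds)."""
        if success:
            self.failures.pop(source, None)  # a success resets the count
            return
        count = self.failures.get(source, 0) + 1
        if count >= self.max_attempts:
            self.blocked_until[source] = now + self.block_seconds
            count = 0                        # start counting again after the block
        self.failures[source] = count


if __name__ == "__main__":
    lock = LoginLockout(max_attempts=5, block_seconds=60)
    for _ in range(5):
        lock.record("3.3.3.3", success=False, now=0.0)
    print(lock.is_blocked("3.3.3.3", now=30.0), lock.is_blocked("3.3.3.3", now=60.0))
```

The model shows why waiting (step 1) works, and why raising the failure threshold or shortening the block period (step 3) lets users retry sooner.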

Troubleshooting flow

Figure 137 shows the troubleshooting flowchart.

Figure 137 Flowchart for troubleshooting denied access within a period

Solution

1. Try to log in again after waiting for a certain period.

Entering incorrect passwords can trigger a login ban. As a best practice, wait for some time and then try to log in again. If the issue recurs when you log in with the correct username and password, switch to another administrator account that can access the device and continue with the following steps.

2. Check whether the user can initiate a login connection after being blocked.

¡ If the user can still initiate a login connection to the device after being blocked but fails authentication, execute the display password-control blacklist command in any view to identify whether the user has been added to the blacklist. If the user is on the blacklist and the Lock flag field displays lock, the user is locked out.

<Sysname> display password-control blacklist

Per-user blacklist limit: 100.

Blacklist items matched: 1.

Username IP address Login failures Lock flag

test 3.3.3.3 4 lock

For users on the blacklist, use either of the following methods:

- Execute the undo password-control enable command in system view to disable the global password control feature.

<Sysname> system-view

[Sysname] undo password-control enable

- Execute the reset password-control blacklist command in user view to clear the user (user test in this example) from the password control blacklist.

<Sysname> reset password-control blacklist user-name test

¡ If the user is blocked and cannot initiate a login connection to the device, execute step 3.

3. Identify whether the login attack prevention feature is enabled.

If the current configuration contains commands starting with attack-defense login, you can disable the login attack prevention feature as needed or change the maximum number of consecutive login failures and the block duration after a login failure.

¡ Use the undo attack-defense login enable command to disable login user attack prevention, and use the undo blacklist global enable command to disable the global blacklist.

<Sysname> system-view

[Sysname] undo attack-defense login enable

[Sysname] undo blacklist global enable

¡ Execute the attack-defense login max-attempt command to increase the maximum number of consecutive login failures, allowing more user login attempts. This number is set to 5 in the following example:

<Sysname> system-view

[Sysname] attack-defense login max-attempt 5

¡ Execute the attack-defense login block-timeout command to reduce the blocking time, allowing users to log in again as soon as possible. The blocking time is set to 1 minute in the following example:

<Sysname> system-view

[Sysname] attack-defense login block-timeout 1

The preceding actions might weaken the device's defense against login DoS attacks. Proceed with caution.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
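When the password control blacklist holds many entries, you can extract the locked users from the display password-control blacklist output in step 2 and prepare the matching reset commands. The following Python sketch does this. It is a hypothetical helper. It assumes that each data row has exactly four fields (Username, IP address, Login failures, Lock flag), as in the sample output. Generate the commands only for users you intend to unlock, and review them before you enter them in user view.

```python
def locked_users(output):
    """Return (username, ip, failures) for blacklist rows whose Lock flag is lock."""
    entries = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows: Username, IP address, Login failures, Lock flag.
        if len(fields) == 4 and fields[2].isdigit() and fields[3] == "lock":
            entries.append((fields[0], fields[1], int(fields[2])))
    return entries


def reset_commands(entries):
    """Build the user-view reset command from step 2 for each locked user."""
    return [f"reset password-control blacklist user-name {user}" for user, _, _ in entries]


if __name__ == "__main__":
    sample = """Per-user blacklist limit: 100.
Blacklist items matched: 1.
Username IP address Login failures Lock flag
test 3.3.3.3 4 lock"""
    for command in reset_commands(locked_users(sample)):
        print(command)
```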

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Delayed reauthentication after login failure

Symptom

After an administrator fails to log in to a device, the console does not respond for a certain period, during which the administrator user cannot perform any operations.

Common causes

The common cause of this type of issue is that the device has the login reauthentication-delay feature enabled. After this feature is enabled, if a user login fails, the system will delay for a certain period before allowing the user to authenticate again.

Troubleshooting flow

Figure 138 shows the troubleshooting flowchart.

Figure 138 Flowchart for troubleshooting delayed reauthentication after login failure

Solution

1. Identify whether the login reauthentication delay feature is enabled.

If the current configuration contains the attack-defense login reauthentication-delay command, you can disable the login reauthentication delay feature or adjust the delay period as needed.

¡ Execute the undo attack-defense login reauthentication-delay command to disable the login reauthentication delay feature.

<Sysname> system-view

[Sysname] undo attack-defense login reauthentication-delay

¡ Execute the attack-defense login reauthentication-delay seconds command to reduce the wait time for reauthentication after a user login fails (for example, to 10 seconds).

<Sysname> system-view

[Sysname] attack-defense login reauthentication-delay 10

The preceding actions might weaken the device's defense against login dictionary attacks. Proceed with caution.

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Maximum concurrent logins with identical local username reached

Symptom

When a certain number of local authentication users access the device with the same username, subsequent attempts to log in to the device with that username will fail.

If event debugging is enabled for the local server (by using the debugging local-server event command), the system prints the following debugging information:

*Aug 18 10:52:56:664 2021 Sysname LOCALSER/7/EVENT: -MDC=1;

Authentication failed, the maximum number of concurrent logins already reached for the local user.

Common causes

The common cause of this type of issue is that a maximum number of concurrent logins has been set for the local username, and that limit has been reached.

Troubleshooting flow

Figure 139 shows the troubleshooting flowchart.

Figure 139 Flowchart for troubleshooting the issue of reaching the maximum number of concurrent logins with identical local username

Solution

1. Identify whether you have set the maximum number of concurrent logins for users using the current local user name.

Execute the display local-user command to view the local user configuration for that user name. If the value for the Access limit field is Enabled, it indicates that the maximum number of concurrent users using the current local user name has been set (2 in this example).

<Sysname> display local-user user-name test class manage

Total 1 local users matched.

Device management user test:

Service type: SSH/Telnet

Access limit: Enabled Max access number: 2

User group: system

Bind attributes:

Authorization attributes:

Work directory: flash:

User role list: test

...

You can change or remove this access limit in the local user view as needed.

¡ To remove this access limit, execute the undo access-limit command.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] undo access-limit

¡ To change the limit to a larger value (10 in this example), execute the access-limit max-user-number command.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] access-limit 10

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.
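The check in step 1 compares the Max access number in the display local-user output with the number of current logins that use the same username. The following Python sketch reads the limit from that output and applies the comparison. The function names are hypothetical. The current login count must be obtained separately, because the sample output in step 1 does not show it. The sketch also assumes that the Access limit and Max access number fields appear on one line, as in the sample.

```python
import re

_ACCESS_LIMIT = re.compile(r"Access limit:\s*Enabled\s+Max access number:\s*(\d+)")


def max_access_number(display_output):
    """Return the Max access number when Access limit is Enabled, otherwise None."""
    match = _ACCESS_LIMIT.search(display_output)
    return int(match.group(1)) if match else None


def login_allowed(current_logins, limit):
    """A new login with this username fits only while the count is below the limit."""
    return limit is None or current_logins < limit


if __name__ == "__main__":
    limit = max_access_number("Access limit: Enabled Max access number: 2")
    print(limit, login_allowed(2, limit))
```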

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Maximum concurrent users of the same access type reached

Symptom

When a certain number of users access the device using the same login method, subsequent user logins using that method will fail.

If event debugging is enabled for the related access module, the system prints information similar to the following (Telnet in this example):

%Aug 18 10:57:52:596 2021 Sysname TELNETD/6/TELNETD_REACH_SESSION_LIMIT: -MDC=1; Telnet client 1.1.1.1 failed to log in. The current number of Telnet sessions is 5. The maximum number allowed is (5).

Common causes

The common cause of this type of issue is that a maximum number of concurrent users has been set for the login method, and that limit has been reached.

Troubleshooting flow

Figure 140 shows the troubleshooting flowchart.

Figure 140 Flowchart for troubleshooting the issue of reaching the maximum number of concurrent users of the same login method

Solution

1. Identify whether you have set the maximum number of concurrent users for a specific login method.

If the aaa session-limit command exists in the current configuration, you can change the maximum number of users accessing the device using the current login method by executing the aaa session-limit { ftp | http | https | ssh | telnet } max-sessions command in system view. The following example changes this limit to 32.

<Sysname> system-view

[Sysname] aaa session-limit telnet 32

2. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

RADIUS server not responding

Symptom

Authentication, authorization, and accounting through RADIUS fail because the RADIUS server does not respond. If RADIUS event debugging is enabled on the device (by using the debugging radius event command), the system prints the following debugging information:

*Aug 8 17:49:06:143 2021 Sysname RADIUS/7/EVENT: -MDC=1; Reached the maximum retries

Common causes

The following are the common causes of this type of issue:

· The shared keys configured on the RADIUS server do not match those configured on the access device.

· The device's IP address is not added to the RADIUS server, or an incorrect IP address is added for the device.

· Network issues exist between the RADIUS server and the access device, such as when a firewall in the intermediate network blocks the port numbers (default authentication port number 1812, default accounting port number 1813) used by the RADIUS server to provide AAA services.

Troubleshooting flow

Figure 141 shows the troubleshooting flowchart.

Figure 141 Flowchart for troubleshooting a non-responsive RADIUS server

Solution

1. Identify whether the shared keys configured on the RADIUS server match those on the access device.

¡ If the shared keys do not match, then:

# On the access device, execute the key authentication and key accounting commands in RADIUS scheme view to reconfigure the shared keys for authentication and accounting. The following example sets the authentication key to 123 and the accounting key to 456:

<Sysname> system-view

[Sysname] radius scheme radius1

[Sysname-radius-radius1] key authentication simple 123

[Sysname-radius-radius1] key accounting simple 456

# On the RADIUS server, reconfigure the shared keys used for RADIUS message exchange with the access device so that they match the shared keys configured on the access device.

¡ If the shared keys are consistent, execute step 2.

2. Identify whether the access device's IP address has been added to the RADIUS server and whether the added IP address is correct.

The device IP address added on the RADIUS server must be the source IP address that the access device uses to send RADIUS packets. You can use commands to specify this source IP address.

The access device selects the source IP address used to send RADIUS packets in the following order:

a. The NAS-IP address configured in RADIUS scheme view by using the nas-ip command.

b. The source NAS-IP address configured in system view by using the radius nas-ip command.

c. IP address of the outgoing interface sending the RADIUS packets.
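The selection order above is a simple first-match rule. The following Python sketch models it to make the precedence explicit; the function and parameter names are hypothetical and only illustrate the logic, they are not device commands. According to this guide, HWTACACS uses the same order (nas-ip in scheme view, then hwtacacs nas-ip in system view, then the outgoing interface).

```python
from typing import Optional


def select_source_ip(
    scheme_nas_ip: Optional[str],
    global_nas_ip: Optional[str],
    outgoing_interface_ip: str,
) -> str:
    """Return the source IP address chosen by the documented precedence."""
    # a. nas-ip configured in RADIUS (or HWTACACS) scheme view wins if present.
    if scheme_nas_ip:
        return scheme_nas_ip
    # b. Otherwise, radius nas-ip (or hwtacacs nas-ip) configured in system view.
    if global_nas_ip:
        return global_nas_ip
    # c. Otherwise, the IP address of the outgoing interface.
    return outgoing_interface_ip


if __name__ == "__main__":
    print(select_source_ip(None, "10.1.1.1", "192.168.1.1"))
```

The address this rule returns is the one that must be registered for the device on the RADIUS server.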

3. Identify whether any network issues exist between the device and the server.

First, use tools such as ping to verify network connectivity between the device and the server. Then, identify whether any firewalls exist in the network. Typically, if a firewall in the network blocks packets destined for the UDP port numbers of the RADIUS server (the default RADIUS authentication port number is 1812 and the default RADIUS accounting port number is 1813), RADIUS packets will be discarded.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration files, log messages, alarm messages, and debugging information.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

HWTACACS server not responding

Symptom

Authentication, authorization, and accounting through the HWTACACS server fail. If HWTACACS event debugging is enabled on the device (by using the debugging hwtacacs event command), the system prints Connection timed out in the event debugging information.

Common causes

The following are the common causes of this type of issue:

· The shared keys configured on the HWTACACS server do not match those configured on the access device.

· The device's IP address is not added to the HWTACACS server, or an incorrect IP address is added for the device.

· Network issues exist between the HWTACACS server and the access device, such as when a firewall in the intermediate network blocks the port number (default authentication/authorization/accounting port number 49) used by the HWTACACS server to provide AAA services.

Troubleshooting flow

Figure 142 shows the troubleshooting flowchart.

Figure 142 Flowchart for troubleshooting a non-responsive HWTACACS server

Solution

1. Identify whether the shared keys configured on the HWTACACS server match those on the access device.

¡ If the shared keys do not match, then:

# On the access device, execute the key authentication, key authorization, and key accounting commands in HWTACACS scheme view to reconfigure the shared keys for authentication, authorization, and accounting (in the example below, the authentication and authorization keys are 123, and the accounting key is 456).

<Sysname> system-view

[Sysname] hwtacacs scheme hwt1

[Sysname-hwtacacs-hwt1] key authentication simple 123

[Sysname-hwtacacs-hwt1] key authorization simple 123

[Sysname-hwtacacs-hwt1] key accounting simple 456

# On the HWTACACS server, reconfigure the shared key for HWTACACS messages interacting with the access device to ensure consistency with the configuration on the access device.

¡ If the shared keys are consistent, execute step 2.

2. Identify whether the access device's IP address has been added to the HWTACACS server and whether the added IP address is correct.

The IP address added to the HWTACACS server must be the source IP address that the access device uses to send HWTACACS packets. You can use commands to specify this source IP address.

The access device selects the source IP address used to send HWTACACS packets in the following order:

a. The source IP address configured in HWTACACS scheme view by using the nas-ip command.

b. The source IP address configured in system view by using the hwtacacs nas-ip command.

c. The IP address of the outgoing interface sending the HWTACACS packets.

3. Identify whether any network issues exist between the device and the server.

First, use tools such as ping to verify network connectivity between the device and the server. Then, identify whether any firewalls exist in the network. Typically, if a firewall in the network blocks packets destined for the TCP port number of the HWTACACS server (the default authentication/authorization/accounting port number is 49), HWTACACS packets will be discarded.
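Because HWTACACS runs over TCP, a plain TCP connection attempt to port 49 is a quick way to check whether a firewall is blocking the path. The following Python sketch is a hypothetical helper (tcp_port_open is not a device command) that you would run from a host on the same network path as the access device, which is an assumption of this check. A successful connection shows only that TCP port 49 is reachable from that host; it does not validate shared keys, and the approach does not apply to RADIUS, which uses UDP.

```python
import socket


def tcp_port_open(host, port=49, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the full TCP handshake; the context
        # manager closes the socket immediately afterwards.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, tcp_port_open("192.0.2.10") (a placeholder server address) returning False while ping succeeds suggests that TCP port 49 is filtered somewhere along the path.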

4. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration files, log messages, alarm messages, and debugging information.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Mismatched user access type and the Login-Service attribute value issued by the RADIUS server

Symptom

User authentication fails because the device does not support the Login-Service attribute value issued by the RADIUS server.

Use the debugging radius packet command to enable RADIUS packet debugging on the device. Debugging information of the following form shows that the server issued a Login-Service attribute value that the device does not support.

*Aug 3 02:33:18:707 2021 Sysname RADIUS/7/PACKET:

Service-Type=Framed-User

Idle-Timeout=66666

Session-Timeout=6000

Common causes

This issue typically occurs because the service type of the user login does not match the service type specified by the Login-Service attribute issued by the server.

The Login-Service attribute is issued to the user by the RADIUS server to identify the type of service for authenticated users. The device currently supports the following Login-Service attribute values:

· 0: Telnet (standard attribute)

· 50: SSH (expansion attribute)

· 51: FTP (expansion attribute)

· 52: Terminal (expansion attribute)

· 53: HTTP (expansion attribute)

· 54: HTTPS (expansion attribute)

You can use the CLI to set the check mode that the device uses to inspect the Login-Service attribute value, which controls how the device verifies that the user service type is consistent with the attribute.
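The following Python sketch models the loose and strict check modes as step 1 of the Solution describes them, using the attribute values listed above. It is an illustration under stated assumptions, not the device implementation: the guide does not say how loose mode treats Telnet, HTTP, and HTTPS users, so the sketch assumes they follow the same value-0 rule, and it assumes that no check applies when the server does not issue the attribute.

```python
from typing import Optional

# Login-Service values listed in this guide.
LOGIN_SERVICE_VALUES = {
    "telnet": 0,    # standard value
    "ssh": 50,      # expansion values
    "ftp": 51,
    "terminal": 52,
    "http": 53,
    "https": 54,
}


def login_service_accepted(service: str, issued: Optional[int], mode: str) -> bool:
    """Model whether a user of the given service type passes the check."""
    if issued is None:
        # Assumption: no Login-Service attribute means nothing to check.
        return True
    if mode == "loose":
        # Loose mode checks only the standard value, so users pass only with 0.
        return issued == 0
    if mode == "strict":
        # Strict mode requires the value that matches the user's service type.
        return issued == LOGIN_SERVICE_VALUES[service]
    raise ValueError(f"unknown check mode: {mode}")


if __name__ == "__main__":
    print(login_service_accepted("ssh", 50, "loose"))   # SSH user, value 50, loose
    print(login_service_accepted("ssh", 50, "strict"))  # SSH user, value 50, strict
```

Under these rules, an SSH user who receives value 50 is rejected in loose mode but accepted in strict mode, which is why changing either the server value or the device check mode can resolve the mismatch.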

Troubleshooting flow

Figure 143 shows the troubleshooting flowchart.

Figure 143 Flowchart for troubleshooting mismatched user access type and the Login-Service attribute value

Solution

1. Verify if the Login-Service attribute value issued by the RADIUS server matches the access type.

Execute the display radius scheme command on the access device to view the value of the Attribute 15 check-mode field for the RADIUS scheme.

¡ If the value is Loose, it indicates that the loose check mode is used and the device uses the standard value of the Login-Service attribute to check the user service type. SSH, FTP, and terminal users can pass authentication only when the Login-Service attribute value issued by the RADIUS server is 0, indicating the Telnet user type.

¡ If the value is Strict, it indicates that the strict check mode is used and the device uses both the standard value and expansion values of the Login-Service attribute to check the user service type. SSH, FTP, and terminal users can pass authentication only when the RADIUS server assigns the corresponding Login-Service expansion attribute value.

If the Login-Service attribute issued to a user by the RADIUS server is out of the range supported by the device, you can resolve this issue by using one of the following methods:

¡ On the RADIUS server, configure the server either to not issue the Login-Service attribute or to issue a value supported by the access device.

¡ On the access device, enter the corresponding RADIUS scheme and use the attribute 15 check-mode command to change the check mode for the Login-Service attribute. In this example, the check mode is set to loose.

<Sysname> system-view

[Sysname] radius scheme radius1

[Sysname-radius-radius1] attribute 15 check-mode loose

2. If the issue persists, collect the following information and contact Technical Support:

¡ Execution results of the above steps.

¡ Device configuration file, log information, debugging information, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Local authentication login failure

Symptom

The administrator failed to log in to the device using local authentication.

Common causes

The following are the common causes of this type of issue:

· The configuration of the authentication method for the user line is incorrect.

· The protocol type supported by the VTY user line is incorrect.

· The configured authentication, authorization, and accounting schemes for the ISP domain are incorrect.

· The local user does not exist, the password is incorrect, or the service type is incorrect.

· The number of local user accesses has reached the upper limit.

· The number of users logged into the device has reached the upper limit.

· The global password management function is enabled, and the local lauth.dat file on the device is abnormal.

Troubleshooting flow

Figure 144 shows the troubleshooting flowchart.

Figure 144 Flowchart for troubleshooting local authentication login failures

Solution

NOTE:

For login issues with Web, NETCONF over SOAP, and FTP, inspection of the user line (class) configuration is not required. The other troubleshooting steps are the same.

1. Check the user line configuration.

Execute the line vty first-number [ last-number ] command to enter the view of the specified VTY user lines, and execute the display this command to verify that the following settings are correct:

¡ The authentication-mode is set to scheme.

¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.

¡ For SSH login, the protocol inbound is set to ssh or the default value is used.

2. Identify which view's configuration takes effect.

The configuration in user line view takes precedence over the configuration in user line class view. If user line view does not contain the relevant settings, the settings in user line class view take effect.

3. Check the configuration in user line class view.

Execute the line class vty command to enter VTY user line class view, and execute the display this command to verify that the following settings are correct:

¡ The authentication-mode is set to scheme.

¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.

¡ For SSH login, the protocol inbound is set to ssh or the default value is used.

4. Correct the user line or user line class configuration as needed.

If the settings in user line view or user line class view are incorrect, set the authentication mode to scheme for the user line or user line class as needed, and specify the protocols supported for user login.
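The precedence between user line view and user line class view can be sketched as a per-setting lookup. The following Python sketch assumes that precedence is applied setting by setting, so a setting that is missing from user line view falls back to user line class view; the dictionaries and key names are hypothetical stand-ins for display this output, not a device API.

```python
from typing import Dict, Optional


def effective_setting(
    name: str,
    line_view: Dict[str, str],
    class_view: Dict[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the value of one user line setting that takes effect."""
    if name in line_view:
        # A setting in user line view overrides user line class view.
        return line_view[name]
    if name in class_view:
        # Otherwise, the user line class view setting applies.
        return class_view[name]
    # Neither view configures the setting, so the caller-supplied default applies.
    return default


if __name__ == "__main__":
    line_vty_0 = {"protocol inbound": "ssh"}
    class_vty = {"authentication-mode": "scheme", "protocol inbound": "telnet"}
    print(effective_setting("authentication-mode", line_vty_0, class_vty))
    print(effective_setting("protocol inbound", line_vty_0, class_vty))
```

In this example, the authentication mode comes from user line class view, while the protocol setting in user line view overrides the class setting.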

5. Identify whether the authentication, authorization, and accounting scheme configurations for the ISP domain are correct.

Execute the display domain command to view the configuration information.

¡ If a user login username includes the domain name (for example, test), verify if the value of the Login authentication scheme field for the domain is Local. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is Local.

<Sysname> display domain test

Domain: test

State: Active

Default authentication scheme: Local

Default authorization scheme: Local

Default accounting scheme: Local

Accounting start failure action: Online

Accounting update failure action: Online

Accounting quota out action: Offline

Service type: HSI

Session time: Exclude idle time

NAS-ID: N/A

DHCPv6-follow-IPv6CP timeout: 60 seconds

Authorization attributes:

Idle cut: Disabled

Session timeout: Disabled

IGMP access limit: 4

MLD access limit: 4

¡ If the user login username does not include the domain name, execute the display this command in system view to view the configuration of domain default enable isp-name. In this example, the default domain name is system.

domain default enable system

- If this configuration exists, execute the display domain command to verify if the value of the Login authentication scheme field for the ISP domain is Local. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is Local.

- If the configuration does not exist, execute the display domain command to verify if the value of the Login authentication scheme field for the system domain is Local. If the Login authentication scheme field is missing for the system domain, verify if the value of the Default authentication scheme field is Local.

The method for confirming the authorization and accounting configuration is similar. If the above configurations are incorrect, configure the local scheme for authentication, authorization, or accounting for login users in the relevant ISP domain.
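The domain and scheme lookup in this step can be summarized as: pick the ISP domain (from the username, else the configured default domain, else system), then use its Login authentication scheme, falling back to its Default authentication scheme. The following Python sketch models that lookup. It assumes the username@domain format for usernames that include a domain name, and the dictionaries stand in for display domain output; both are assumptions for illustration.

```python
from typing import Dict, Optional


def login_authentication_scheme(
    username: str,
    domains: Dict[str, Dict[str, str]],
    default_domain: Optional[str] = None,
) -> Optional[str]:
    """Return the authentication scheme that applies to a login user."""
    if "@" in username:
        # Assumption: the domain name follows the last "@" in the username.
        domain = username.rsplit("@", 1)[1]
    else:
        # domain default enable isp-name, or the system domain if not configured.
        domain = default_domain or "system"
    config = domains.get(domain, {})
    # The Login authentication scheme takes precedence over the Default one.
    return config.get("login", config.get("default"))


if __name__ == "__main__":
    domains = {
        "test": {"default": "Local"},
        "system": {"login": "RADIUS=rd1", "default": "Local"},
    }
    print(login_authentication_scheme("admin@test", domains))
    print(login_authentication_scheme("admin", domains))
```

For local authentication, the returned value should be Local; the same lookup applies to authorization and accounting schemes.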

6. Verify that the username and password are correct.

Execute the display local-user command to verify if the corresponding local user configuration exists.

¡ If a local user exists, execute the local-user username class manage command to enter local user view. Then, use the display this command to verify if a password is configured in the view and if the service-type configuration matches the required service type.

- If a password is required, try resetting it. In this example, the password is set to 123456TESTplat&!.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] password simple 123456TESTplat&!

- If the service type is incorrect, configure the service type to match the login method. In this example, SSH is used.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] service-type ssh

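¡ If the local user does not exist, execute the local-user username class manage command to create a device management local user, and then configure a password and service type for the user. In this example, the username is test, the password is 123456TESTplat&!, and the service type is SSH.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] password simple 123456TESTplat&!

[Sysname-luser-manage-test] service-type ssh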

7. Verify if the number of users accessing with this local username has reached the upper limit.

Execute the display this command in local user view to verify if the access-limit configuration exists.

¡ If the access-limit configuration exists, execute the display local-user username class manage command to verify if the value of the Current access number field has reached the configured upper limit. If the upper limit is reached, take one of the following measures as needed:

- In local user view, execute the access-limit command to increase the user limit. In this example, the upper limit is changed to 20.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] access-limit 20

- Execute the free command in user view to force other online users offline. This example releases all connections established on VTY1.

<Sysname> free line vty 1

Are you sure to free line vty1? [Y/N]:y

[OK]

¡ If the access-limit configuration does not exist, or the number of users has not reached the upper limit, proceed to the next step.

8. Verify if the number of online users for the specified login type has reached the upper limit.

a. Execute the display this command in system view to verify if the aaa session-limit configuration exists. If the configuration is not found, it indicates that the default value 32 is used.

aaa session-limit ftp 33

domain default enable system

b. Execute the display users command to view the users currently logged in on the user lines, and verify whether the number of users has reached the upper limit.

c. If the number of online users reaches the upper limit, take one of the following measures as needed:

- In system view, execute the aaa session-limit command to increase the maximum number of concurrent users.

- Execute the free command in user view to force other online users offline.
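Steps 7 and 8 check two independent limits: the optional per-user access-limit and the per-login-type aaa session-limit, which defaults to 32 according to this guide. The following Python sketch combines both checks to show which limit would block a new login. The function and parameter names are hypothetical, and the counts would come from display local-user and display users output.

```python
from typing import Optional

# Default used when no aaa session-limit command is configured (per this guide).
DEFAULT_SESSION_LIMIT = 32


def blocking_limit(
    user_access_number: int,
    access_limit: Optional[int],
    online_users_for_type: int,
    session_limit: Optional[int] = None,
) -> Optional[str]:
    """Return the name of the limit that blocks a new login, or None."""
    # Step 7: per-local-user limit, checked only if access-limit is configured.
    if access_limit is not None and user_access_number >= access_limit:
        return "access-limit"
    # Step 8: per-login-type limit from aaa session-limit, or the default.
    limit = DEFAULT_SESSION_LIMIT if session_limit is None else session_limit
    if online_users_for_type >= limit:
        return "aaa session-limit"
    return None


if __name__ == "__main__":
    print(blocking_limit(2, 2, 0))         # local user already at its access-limit
    print(blocking_limit(1, None, 5, 33))  # neither limit reached
```

A login is blocked as soon as a count equals its limit, which matches the log message earlier in this section (current 5, maximum 5).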

9. Verify if the local lauth.dat file is correct.

After you enable the global password management feature, the device automatically generates a lauth.dat file to record local users' authentication and login information. Manually deleting or modifying this file will cause an anomaly in local authentication. Therefore, first execute the display password-control command to verify if the global password management feature is enabled on the device.

¡ If the feature is enabled, execute the dir command to check the lauth.dat file. If the file does not exist, is 0 bytes, or is very small (less than 20 bytes), contact Technical Support. If the issue is urgent, try disabling and then re-enabling the global password management feature.

<Sysname> dir

Directory of flash: (EXT4)

0 drw- - Aug 16 2021 11:45:37 core

1 drw- - Aug 16 2021 11:45:42 diagfile

2 drw- - Aug 16 2021 11:45:57 dlp

3 -rw- 713 Aug 16 2021 11:49:41 ifindex.dat

4 -rw- 12 Sep 01 2021 02:40:01 lauth.dat

...

<Sysname> system-view

[Sysname] undo password-control enable

[Sysname] password-control enable

¡ If this feature is not enabled, skip this step.
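If you collect dir output from many devices, the lauth.dat check in this step can be automated off-box. The following Python sketch parses dir output in the format shown above and flags a missing file or a file smaller than the 20-byte threshold given in this guide. The column layout (index, permissions, size, date, time, name) is assumed from the sample output, and the check is meaningful only when global password management is enabled.

```python
from typing import Optional

MIN_EXPECTED_SIZE = 20  # bytes; threshold taken from this guide


def lauth_file_size(dir_output: str) -> Optional[int]:
    """Return the size of lauth.dat from dir output, or None if it is absent."""
    for line in dir_output.splitlines():
        fields = line.split()
        # Assumed file entry layout: index, permissions, size, month, day, year, time, name.
        if len(fields) >= 8 and fields[-1] == "lauth.dat" and fields[1].startswith("-"):
            return int(fields[2].replace(",", ""))
    return None


def lauth_file_suspect(dir_output: str) -> bool:
    """Return True if lauth.dat is missing, empty, or smaller than the threshold."""
    size = lauth_file_size(dir_output)
    return size is None or size < MIN_EXPECTED_SIZE


if __name__ == "__main__":
    sample = (
        "Directory of flash: (EXT4)\n"
        "3 -rw- 713 Aug 16 2021 11:49:41 ifindex.dat\n"
        "4 -rw- 12 Sep 01 2021 02:40:01 lauth.dat\n"
    )
    print(lauth_file_size(sample), lauth_file_suspect(sample))
```

For the sample output in this step, the file is 12 bytes, so the sketch flags it as suspect.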

10. If the issue persists, collect the following information and contact Technical Support:

¡ Execution results of the above steps.

¡ Device configuration file, log information, alarm messages, and debugging information.

¡ To collect the debugging information, execute the debugging local-server all command to enable all local server debugging.

Related alarm and log messages

Alarm messages

Module: HH3C-UI-MAN-MIB

· hh3cLogInAuthenFailure (1.3.6.1.4.1.25506.2.2.1.1.3.0.3)

Module: HH3C-SSH-MIB

· hh3cSSHUserAuthFailure (1.3.6.1.4.1.25506.2.22.1.3.0.1)

Log messages

· LOGIN/5/LOGIN_FAILED

· SSHS/6/SSHS_AUTH_FAIL

RADIUS authentication login failure

Symptom

The administrator failed to log in to the device using RADIUS authentication.

Common causes

The following are the common causes of this type of issue:

· The configuration of the authentication method for the user line is incorrect.

· The protocol type supported by the VTY user line is incorrect.

· The configured authentication, authorization, and accounting schemes for the ISP domain are incorrect.

· Interaction with the RADIUS server failed.

· The value of the Login-Service attribute issued by the RADIUS server is incorrect.

· The RADIUS server failed to assign a user role.

Troubleshooting flow

Figure 145 shows the troubleshooting flowchart.

Figure 145 Flowchart for troubleshooting RADIUS authentication login failures

Solution

NOTE:

For login issues with Web, NETCONF over SOAP, and FTP, inspection of the user line (class) configuration is not required. The other troubleshooting steps are the same.

1. Check the user line configuration.

Execute the line vty first-number [ last-number ] command to enter the view of the specified VTY user lines, and execute the display this command to verify that the following settings are correct:

¡ The authentication-mode is set to scheme.

¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.

¡ For SSH login, the protocol inbound is set to ssh or the default value is used.

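2. Identify which view's configuration takes effect.

The configuration in user line view takes precedence over the configuration in user line class view. If user line view does not contain the relevant settings, the settings in user line class view take effect.

3. Check the configuration in user line class view.

Execute the line class vty command to enter VTY user line class view, and execute the display this command to verify that the following settings are correct:

¡ The authentication-mode is set to scheme.

¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.

¡ For SSH login, the protocol inbound is set to ssh or the default value is used.

4. Correct the user line or user line class configuration as needed.

If the settings in user line view or user line class view are incorrect, set the authentication mode to scheme for the user line or user line class as needed, and specify the protocols supported for user login.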

5. Identify whether the authentication, authorization, and accounting scheme configurations for the ISP domain are correct.

Execute the display domain command to view the configuration information.

¡ If a user login username includes the domain name (for example, test), verify if the value of the Login authentication scheme field for the domain is in the RADIUS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the RADIUS=xx format.

<Sysname> display domain test

Domain: test

State: Active

Default authentication scheme: Local

Default authorization scheme: Local

Default accounting scheme: Local

Accounting start failure action: Online

Accounting update failure action: Online

Accounting quota out action: Offline

Service type: HSI

Session time: Exclude idle time

NAS-ID: N/A

DHCPv6-follow-IPv6CP timeout: 60 seconds

Authorization attributes:

Idle cut: Disabled

Session timeout: Disabled

IGMP access limit: 4

MLD access limit: 4

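¡ If the user login username does not include the domain name, execute the display this command in system view to view the domain default enable isp-name configuration. In this example, the default domain name is system.

domain default enable system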

- If this configuration exists, execute the display domain command to verify if the value of the Login authentication scheme field for the ISP domain is in the RADIUS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the RADIUS=xx format.

- If the configuration does not exist, execute the display domain command to verify if the value of the Login authentication scheme field for the system domain is in the RADIUS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the RADIUS=xx format.

The method for confirming the authorization and accounting configuration is similar. If the above configurations are incorrect, configure the RADIUS scheme for authentication, authorization, or accounting for login users in the relevant ISP domain. In this example, the specified RADIUS scheme is rd1.

<Sysname> system-view

[Sysname] domain test

[Sysname-isp-test] authentication login radius-scheme rd1

[Sysname-isp-test] authorization login radius-scheme rd1

[Sysname-isp-test] accounting login radius-scheme rd1

6. Use the RADIUS debugging information to troubleshoot the following faults:

¡ Execute the debugging radius packet command to enable RADIUS packet debugging. If the output debugging information shows Authentication reject, it indicates that the server has rejected the user's access request. In this case, continue to review the authentication logs recorded on the RADIUS server and contact the server administrator for appropriate processing based on the failure reasons described in the logs.

¡ Execute the debugging radius error command to enable RADIUS error debugging. If the output debugging information shows Invalid packet authenticator, it indicates that the shared key between the device and the server does not match. Try setting a matching shared key for the RADIUS scheme.

¡ Execute the debugging radius event command to enable RADIUS event debugging. If the output debugging information shows Response timed out, the server did not respond to the device. Check the link connectivity between the device and the server.
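When debugging output is long, a simple keyword scan can point to the matching case above. The following Python sketch maps the three keywords named in this step to their meanings. It is a hypothetical off-box helper; plain substring matching is a simplification and can miss messages worded differently in other software versions.

```python
from typing import List, Tuple

# Keyword and meaning pairs taken from this guide.
RADIUS_DEBUG_HINTS: List[Tuple[str, str]] = [
    ("Authentication reject",
     "The server rejected the request; review the RADIUS server's authentication logs."),
    ("Invalid packet authenticator",
     "Shared key mismatch; set a matching shared key in the RADIUS scheme."),
    ("Response timed out",
     "The server did not respond; check connectivity between the device and the server."),
]


def diagnose_radius_debug(debug_text: str) -> List[str]:
    """Return the hints whose keyword appears in the collected debugging text."""
    return [hint for keyword, hint in RADIUS_DEBUG_HINTS if keyword in debug_text]


if __name__ == "__main__":
    collected = "... Invalid packet authenticator ..."
    for hint in diagnose_radius_debug(collected):
        print(hint)
```

An empty result means none of the three documented cases matched, so continue with the next steps.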

7. Verify if the value of the Login-Service attribute issued by the RADIUS server matches the service type supported by the device.

Execute the debugging radius packet command to enable RADIUS packet debugging. Then, view the Login-Service attribute issued by the RADIUS server, and use the method described in "Mismatched user access type and the Login-Service attribute value issued by the RADIUS server" to resolve the issue.

8. Verify if the RADIUS server has assigned the correct user role.

Execute the debugging radius all command to enable all RADIUS debugging functions. If the connection disconnects immediately after the user enters the username and password, and no anomaly exists in the RADIUS event debugging or RADIUS error debugging output, it is possible that the RADIUS server failed to assign a user role or assigned an incorrect user role to the user. In this case, verify if the RADIUS packet debugging information includes the shell:roles=xx or Exec-Privilege=xx field.

¡ If not included, it means the RADIUS server did not assign a user role to the user. To solve this issue, use one of the following methods:

- On the device, use the role default-role enable [ role-name ] command to enable default user role authorization. This gives users a default user role when the server has not authorized any roles for them.

<Sysname> system-view

[Sysname] role default-role enable

- Contact the RADIUS server administrator to assign the appropriate user role to users.

¡ If included, but the specified user role does not exist on the device, contact the RADIUS server administrator to modify the user role settings, or use the role name role-name command to create the corresponding user role on the device.

9. If the issue persists, collect the following information and contact Technical Support:

¡ Execution results of the above steps.

¡ Device configuration file, log information, alarm messages, and debugging information.

¡ To collect the debugging information, execute the debugging radius all command to enable all RADIUS debugging.

Related alarm and log messages

Alarm messages

Module: HH3C-UI-MAN-MIB

· hh3cLogInAuthenFailure (1.3.6.1.4.1.25506.2.2.1.1.3.0.3)

Module: HH3C-SSH-MIB

· hh3cSSHUserAuthFailure (1.3.6.1.4.1.25506.2.22.1.3.0.1)

Log messages

· LOGIN/5/LOGIN_AUTHENTICATION_FAILED

· LOGIN/5/LOGIN_FAILED

· SSHS/6/SSHS_AUTH_FAIL

HWTACACS authentication login failure

Symptom

The administrator failed to log in to the device using HWTACACS authentication.

Common causes

The following are the common causes of this type of issue:

· The configuration of the authentication method for the user line is incorrect.

· The protocol type supported by the VTY user line is incorrect.

· The configured authentication, authorization, and accounting schemes for the ISP domain are incorrect.

· Interaction with the HWTACACS server failed.

· The HWTACACS server failed to assign a user role.

Troubleshooting flow

Figure 146 shows the troubleshooting flowchart.

Figure 146 Flowchart for troubleshooting HWTACACS authentication login failures

Solution

NOTE:

For login issues with Web, NETCONF over SOAP, and FTP, inspection of the user line (class) configuration is not required. The other troubleshooting steps are the same.

1. Check the user line configuration.

Execute the line vty first-number [ last-number ] command to enter the view of the specified VTY user lines, and execute the display this command to verify that the following settings are correct:

¡ The authentication-mode is set to scheme.

¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.

¡ For SSH login, the protocol inbound is set to ssh or the default value is used.

2. Identify which view's configuration takes effect.

The configuration in user line view takes precedence over the configuration in user line class view. If user line view does not contain the relevant settings, the settings in user line class view take effect.

3. Check the configuration in user line class view.

Execute the line class vty command to enter VTY user line class view, and execute the display this command to verify that the following settings are correct:

¡ The authentication-mode is set to scheme.

¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.

¡ For SSH login, the protocol inbound is set to ssh or the default value is used.

4. Correct the user line or user line class configuration as needed.

If the settings in user line view or user line class view are incorrect, set the authentication mode to scheme for the user line or user line class as needed, and specify the protocols supported for user login.

5. Identify whether the authentication, authorization, and accounting scheme configurations for the ISP domain are correct.

Execute the display domain command to view the configuration information.

¡ If a user login username includes the domain name (for example, test), verify if the value of the Login authentication scheme field for the domain is in the HWTACACS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the HWTACACS=xx format.

<Sysname> display domain test

Domain: test

State: Active

Default authentication scheme: Local

Default authorization scheme: Local

Default accounting scheme: Local

Accounting start failure action: Online

Accounting update failure action: Online

Accounting quota out action: Offline

Service type: HSI

Session time: Exclude idle time

NAS-ID: N/A

DHCPv6-follow-IPv6CP timeout: 60 seconds

Authorization attributes:

Idle cut: Disabled

Session timeout: Disabled

IGMP access limit: 4

MLD access limit: 4

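¡ If the user login username does not include the domain name, execute the display this command in system view to view the domain default enable isp-name configuration. In this example, the default domain name is system.

domain default enable system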

- If this configuration exists, execute the display domain command to verify if the value of the Login authentication scheme field for the ISP domain is in the HWTACACS=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the HWTACACS=xx format.

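- If the configuration does not exist, execute the display domain command to verify if the value of the Login authentication scheme field for the system domain is in the HWTACACS=xx format. If the Login authentication scheme field is missing for the system domain, verify if the value of the Default authentication scheme field is in the HWTACACS=xx format.

The method for confirming the authorization and accounting configuration is similar. If the above configurations are incorrect, configure the HWTACACS scheme for authentication, authorization, or accounting for login users in the relevant ISP domain. In this example, the specified HWTACACS scheme is hwt1.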

<Sysname> system-view

[Sysname] domain test

[Sysname-isp-test] authentication login hwtacacs-scheme hwt1

[Sysname-isp-test] authorization login hwtacacs-scheme hwt1

[Sysname-isp-test] accounting login hwtacacs-scheme hwt1

6. Use the HWTACACS debugging information to troubleshoot the following faults:

¡ Execute the debugging hwtacacs send-packet and debugging hwtacacs receive-packet commands to enable debugging for sent and received HWTACACS packets. If the output debugging information shows status: STATUS_FAIL, the server rejected the user's access request. In this case, review the authentication logs on the HWTACACS server and resolve the issue based on the failure reasons recorded there.

¡ Execute the debugging hwtacacs error command to enable HWTACACS error debugging. If the output debugging information shows Failed to get available server, it indicates that the shared key between the device and the server does not match. Try setting a matching shared key for the HWTACACS scheme.

¡ Execute the debugging hwtacacs event command to enable HWTACACS event debugging. If the debugging output shows Connection timed out, the device cannot reach the server. Troubleshoot the link connectivity between the device and the server.

7. Verify if the HWTACACS server has assigned the correct user role.

Execute the debugging hwtacacs all command to enable all HWTACACS debugging functions. If the connection is closed immediately after the user logs in and the HWTACACS event and error debugging output shows no anomaly, the HWTACACS server might have failed to assign a user role to the user. In this case, verify whether the HWTACACS packet debugging information includes the priv-lvl=xx or roles=xx field.

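¡ If neither field is included, the HWTACACS server did not assign a user role to the user. To resolve this issue, use one of the following methods:

- Enable the default user role feature on the device, so that a user who is not assigned a user role by the server obtains the default user role: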

<Sysname> system-view

[Sysname] role default-role enable

- Contact the HWTACACS server administrator to assign the appropriate user role to users. The authorization role configuration on the HWTACACS server must use the format roles="name1 name2 namen", where name1, name2, and namen are the user roles to be assigned to users. To assign multiple roles, separate them with spaces.
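As an illustration of the check in this step, the following Python sketch scans saved HWTACACS packet debugging text for the priv-lvl and roles fields and splits a roles value into individual role names. The exact layout of the debugging output is not shown in this guide, so the sketch assumes these fields appear as key=value tokens; role_fields is a hypothetical helper name.

```python
import re

# Matches priv-lvl=15, roles=network-admin, or roles="name1 name2" (assumed token form).
ROLE_FIELD = re.compile(r'\b(priv-lvl|roles)=("([^"]*)"|\S+)')


def role_fields(debug_text: str) -> dict:
    """Return the user role fields found in HWTACACS packet debugging text."""
    found = {}
    for match in ROLE_FIELD.finditer(debug_text):
        key, raw, quoted = match.group(1), match.group(2), match.group(3)
        value = quoted if quoted is not None else raw
        # roles carries a space-separated list of role names; priv-lvl carries one level.
        found[key] = value.split() if key == "roles" else value
    return found
```

An empty result corresponds to the case described above in which the server assigned no user role.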

8. If the issue persists, collect the following information and contact Technical Support:

¡ Execution results of the above steps.

¡ Device configuration file, log information, alarm messages, and debugging information.

¡ Debugging information collected after you enable all HWTACACS debugging functions with the debugging hwtacacs all command.

Related alarm and log messages

Alarm messages

Module: HH3C-UI-MAN-MIB

· hh3cLogInAuthenFailure (1.3.6.1.4.1.25506.2.2.1.1.3.0.3)

Module: HH3C-SSH-MIB

· hh3cSSHUserAuthFailure (1.3.6.1.4.1.25506.2.22.1.3.0.1)

Log messages

· LOGIN/5/LOGIN_AUTHENTICATION_FAILED

· LOGIN/5/LOGIN_FAILED

· SSHS/6/SSHS_AUTH_FAIL

LDAP authentication login failure

Symptom

The administrator failed to log in to the device using LDAP authentication.

Common causes

The following are the common causes of this type of issue:

· The authentication mode configured for the user line is incorrect.

· The protocol type supported by the VTY user line is incorrect.

· The configured authentication, authorization, and accounting schemes for the ISP domain are incorrect.

· Interaction with the LDAP server failed.

Troubleshooting flow

Figure 147 shows the troubleshooting flowchart.

Figure 147 Flowchart for troubleshooting LDAP authentication login failures

Solution

NOTE:

For login issues with Web, NETCONF over SOAP, and FTP, inspection of the user line (class) configuration is not required. The other troubleshooting steps are the same.

1. Check the user line configuration.

Execute the line vty first-number [ last-number ] command to enter the view of the specified VTY user lines, and execute the display this command to verify whether the following configurations are correct:

¡ The authentication-mode is set to scheme.

¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.

¡ For SSH login, the protocol inbound is set to ssh or the default value is used.

2. Check the configuration in user line class view.

Execute the line class vty command to enter VTY user line class view, and execute the display this command to verify whether the following configurations are correct:

¡ The authentication-mode is set to scheme.

¡ For Telnet login, the protocol inbound is set to telnet or the default value is used.

¡ For SSH login, the protocol inbound is set to ssh or the default value is used.

If the configuration in user line view or user line class view is incorrect, set the authentication mode to scheme for the user line or user line class as needed, and specify the protocols allowed for user login.

5. Verify whether the authentication, authorization, and accounting scheme configurations for the ISP domain are correct.

Execute the display domain command to view the configuration information.

¡ If a user login username includes the domain name (for example, test), verify if the value of the Login authentication scheme field for the domain is in the LDAP=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the LDAP=xx format.

<Sysname> display domain test

Domain: test

State: Active

Default authentication scheme: Local

Default authorization scheme: Local

Default accounting scheme: Local

Accounting start failure action: Online

Accounting update failure action: Online

Accounting quota out action: Offline

Service type: HSI

Session time: Exclude idle time

NAS-ID: N/A

DHCPv6-follow-IPv6CP timeout: 60 seconds

Authorization attributes:

Idle cut: Disabled

Session timeout: Disabled

IGMP access limit: 4

MLD access limit: 4

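¡ If the user login username does not include a domain name, verify whether the following default ISP domain configuration exists on the device:

domain default enable system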

- If this configuration exists, execute the display domain command to verify whether the value of the Login authentication scheme field for the specified default ISP domain is in the LDAP=xx format. If the Login authentication scheme field is missing for the domain, verify whether the value of the Default authentication scheme field is in the LDAP=xx format.

- If the configuration does not exist, execute the display domain command to verify if the value of the Login authentication scheme field for the system domain is in the LDAP=xx format. If the Login authentication scheme field is missing for the domain, verify if the value of the Default authentication scheme field is in the LDAP=xx format.

If the above configurations are incorrect, configure the LDAP authentication scheme for login users in the relevant ISP domain. An LDAP server typically performs only authentication, so authorization and accounting are usually performed by another method, such as local, RADIUS, or HWTACACS. In this example, LDAP scheme ccc is used for authentication, and local authorization and accounting are used.

<Sysname> system-view

[Sysname] domain test

[Sysname-isp-test] authentication login ldap-scheme ccc

[Sysname-isp-test] authorization login local

[Sysname-isp-test] accounting login local

6. Use the LDAP debugging information to troubleshoot the following faults:

Execute the debugging ldap error command to enable LDAP error debugging. Use the following debugging information printed by the system to identify the issue:

¡ If the output information shows Failed to perform binding operation as administrator, it indicates that the administrator DN configured in LDAP server view does not exist or the administrator password is incorrect. To address this issue, enter LDAP server view and execute the login-dn and login-password commands to modify the administrator DN and password configuration, respectively. In this example, the DN for a user with the administrator role is cn=administrator,cn=users,dc=ld, and the administrator password is admin!123456.

<Sysname> system-view

[Sysname] ldap server ldap1

[Sysname-ldap-server-ldap1] login-dn cn=administrator,cn=users,dc=ld

[Sysname-ldap-server-ldap1] login-password simple admin!123456

¡ If the output information shows Failed to get bind result.errno = 115, the LDAP service is not enabled on the server or the LDAP server is operating abnormally. To address this issue, contact the administrator of the LDAP server.

¡ If the output information shows Bind operation failed, the device cannot reach the LDAP server. Troubleshoot the connectivity between the device and the server.

¡ If the output information shows Failed to perform binding operation as user, it indicates the password of the LDAP user is incorrect.

¡ If the output information shows Failed to bind user username for the result of searching DN is NULL, it indicates the LDAP user does not exist. To address this issue, contact the administrator of the LDAP server.
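The message-to-cause mapping above can be kept as a lookup table when reviewing saved debugging output. This Python sketch matches the message strings quoted in this step by substring; because the username in the last message varies, only its fixed suffix is matched. The table and the diagnose_ldap function are illustrative and cover only the messages listed here.

```python
from __future__ import annotations

# Message fragments quoted in this step, mapped to the likely cause and action.
LDAP_ERROR_CAUSES = [
    ("Failed to perform binding operation as administrator",
     "Administrator DN does not exist or administrator password is wrong: "
     "correct login-dn and login-password in LDAP server view"),
    ("Failed to get bind result.errno = 115",
     "LDAP service not enabled on the server or server anomaly: "
     "contact the LDAP server administrator"),
    ("Bind operation failed",
     "Device cannot reach the LDAP server: check connectivity"),
    ("Failed to perform binding operation as user",
     "Password of the LDAP user is incorrect"),
    # The username in this message varies, so only the fixed suffix is matched.
    ("for the result of searching DN is NULL",
     "LDAP user does not exist: contact the LDAP server administrator"),
]


def diagnose_ldap(debug_text: str) -> list[str]:
    """Return the causes whose message fragment appears in the debugging text."""
    return [cause for fragment, cause in LDAP_ERROR_CAUSES if fragment in debug_text]
```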

7. If the issue persists, collect the following information and contact Technical Support:

¡ Execution results of the above steps.

¡ Device configuration file, log information, alarm messages, and debugging information.

¡ Debugging information collected after you enable all LDAP debugging functions with the debugging ldap all command.

Related alarm and log messages

Alarm messages

Module: HH3C-UI-MAN-MIB

· hh3cLogInAuthenFailure (1.3.6.1.4.1.25506.2.2.1.1.3.0.3)

Module: HH3C-SSH-MIB

· hh3cSSHUserAuthFailure (1.3.6.1.4.1.25506.2.22.1.3.0.1)

Log messages

· LOGIN/5/LOGIN_AUTHENTICATION_FAILED

· LOGIN/5/LOGIN_FAILED

· SSHS/6/SSHS_AUTH_FAIL

Ineffective dynamic VLAN issued by the RADIUS authentication server

Symptom

When an 802.1X or MAC authentication user is online, the dynamically authorized VLAN attribute issued by the RADIUS authentication server does not take effect.

Common causes

The following are the common causes of this type of issue:

· The RADIUS DAE service is disabled.

· The content of the authorization attribute issued by RADIUS is incorrect.

· The user failed to obtain the dynamic VLAN.

· The interface type configuration for the dynamically authorized VLAN is incorrect.

· The dynamically authorized VLAN does not exist.

Troubleshooting flow

Figure 148 shows the troubleshooting flowchart.

Figure 148 Flowchart for troubleshooting ineffective dynamic VLAN issued by the RADIUS authentication server

Solution

1. Verify if the RADIUS DAE service is enabled.

In system view, execute the display current-configuration | include radius command to verify if the radius dynamic-author server configuration exists.

¡ If the configuration exists, execute the radius dynamic-author server command to enter RADIUS DAE server view and verify if the RADIUS DAE client and RADIUS DAE service port configurations are correct.

<Sysname> system-view

[Sysname] radius dynamic-author server

[Sysname-radius-da-server] display this

radius dynamic-author server

port 3790

client ip 3.3.3.3 key cipher $c$3$kiAORLht3S3rTCmFq0uWXPgV8PjI2Q==

¡ If the configuration does not exist, execute the radius dynamic-author server command to enable the RADIUS DAE service, and enter RADIUS DAE server view to configure the RADIUS DAE client and RADIUS DAE service port. In this example, the client IP address is 1.1.1.1, the shared key is 123456, and the service port is 3798.

<Sysname> system-view

[Sysname] radius dynamic-author server

[Sysname-radius-da-server] client ip 1.1.1.1 key simple 123456

[Sysname-radius-da-server] port 3798

2. Verify if the VLAN attributes issued by the RADIUS server are correct.

Execute the debugging radius packet command to enable RADIUS packet debugging, and configure the RADIUS server to issue the VLAN attributes again.

To assign a VLAN, the RADIUS server must issue all of the following standard attributes together:

¡ The Tunnel-Type attribute, number 64, is an Integer with a fixed value of 13, representing VLAN.

¡ The Tunnel-Medium-Type attribute, number 65, is an Integer with a fixed value of 6, representing IEEE 802.

¡ The Tunnel-Private-Group-Id attribute, number 81, is a String, representing the VLAN ID or VLAN name.

In the RADIUS debugging output, verify that the COA request contains all three standard attributes, as shown in the following example.

*Aug 3 02:33:18:700 2021 Sysname RADIUS/7/PACKET:

Received a RADIUS packet

Server IP : 128.11.3.48

NAS-IP : 128.11.30.69

VPN instance : --(public)

Server port : 55805

Type : COA request

Length : 41

Packet ID : 34

User-Name="user"

Tunnel-Type:0=VLAN

Tunnel-Medium-Type:0=IEEE-802

Tunnel-Private-Group-Id:0="2"

If the output authorization attributes are incorrect, contact the administrator of the RADIUS server to modify the authorization VLAN configuration and try to re-issue the VLAN. If the output authorization attributes are correct, proceed to the next step.
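The attribute check above can be sketched in Python. The sketch reads attribute lines in the format shown in the debugging example (name, optional :tag, =value) and accepts either the symbolic values printed by the device (VLAN, IEEE-802) or the numeric values 13 and 6. The function names are hypothetical, and other debugging formats might need a different pattern.

```python
from __future__ import annotations

import re

# Attribute line as printed in the debugging example: Name[:tag]=value
ATTR_LINE = re.compile(
    r"^\s*(Tunnel-Type|Tunnel-Medium-Type|Tunnel-Private-Group-Id)(?::\d+)?=(.+?)\s*$"
)


def vlan_attributes(debug_text: str) -> dict[str, str]:
    """Collect the three tunnel attributes from RADIUS packet debugging text."""
    attrs: dict[str, str] = {}
    for line in debug_text.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            attrs[match.group(1)] = match.group(2).strip('"')
    return attrs


def vlan_attribute_problems(debug_text: str) -> list[str]:
    """List what is missing or wrong; an empty list means all three are correct."""
    attrs = vlan_attributes(debug_text)
    problems = []
    if attrs.get("Tunnel-Type") not in ("VLAN", "13"):
        problems.append("Tunnel-Type (64) must be 13 (VLAN)")
    if attrs.get("Tunnel-Medium-Type") not in ("IEEE-802", "6"):
        problems.append("Tunnel-Medium-Type (65) must be 6 (IEEE 802)")
    if not attrs.get("Tunnel-Private-Group-Id"):
        problems.append("Tunnel-Private-Group-Id (81) must carry a VLAN ID or name")
    return problems
```

For the COA request in the example above, vlan_attribute_problems returns an empty list and the assigned VLAN is "2".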

3. Verify if the user successfully received the assigned VLAN information.

Execute the display dot1x connection or display mac-authentication connection command to verify if the online user information includes dynamic VLAN authorization information issued by the server.

¡ If authorized VLAN information exists, the VLAN was assigned successfully.

¡ If no authorized VLAN information exists, the VLAN was not assigned. In this case, as a best practice, contact Technical Support to identify the cause based on the RADIUS debugging information.

4. Verify if the authorized VLAN exists.

Execute the display vlan brief command to verify if the dynamically issued VLAN exists. If the VLAN does not exist, execute the vlan vlan-id command in system view to create the VLAN.

5. Verify if the interface type for the VLAN is correct.

Different types of interfaces have different requirements for joining the authorized VLAN. For specific configuration requirements, see the 802.1X and MAC authentication configuration in Security Configuration Guide.

6. If the issue persists, collect the following information and contact Technical Support:

¡ Execution results of the above steps.

¡ Device configuration file, debugging information, and diagnosis information.

Related alarm and log messages

Alarm messages

None.

Log messages

None.

Ineffective or partially effective Filter-Id attribute issued by the RADIUS server

Symptom

The RADIUS authentication server issues an ACL to the user through the Filter-Id attribute, but the user cannot access network resources normally after authentication and login.

Common causes

The following are the common causes of this type of issue:

· The content of the authorization attribute issued by RADIUS is incorrect.

· The access user failed to obtain the ACL.

· The authorized ACL does not exist.

Troubleshooting flow

Figure 149 shows the troubleshooting flowchart.

Figure 149 Flowchart for troubleshooting ineffective or partially effective Filter-Id attribute issued by the RADIUS server

Solution

1. Verify if the Filter-Id attribute issued by the RADIUS server is correct.

Execute the debugging radius packet command to enable RADIUS packet debugging, and configure the RADIUS server to re-issue the Filter-Id attribute. View the output debugging information on the device.

¡ If the value of the issued Filter-Id attribute is purely numeric, an ACL number has been assigned.

*Aug 18 16:54:49:670 2021 Sysname RADIUS/7/PACKET: -MDC=1;

Received a RADIUS packet

Server IP : 128.11.3.48

NAS-IP : 128.11.30.69

VPN instance : --(public)

Server port : 54175

Type : COA request

Length : 32

Packet ID : 200

User-Name="user"

Filter-Id="2001"

¡ If the Filter-Id attribute value is not purely numeric and the next attribute issued is H3c-ACL-Version (with an integer value in the range of 1 to 4), an ACL name has been assigned.

*Aug 18 16:55:19:798 2021 Sysname RADIUS/7/PACKET: -MDC=1;

Received a RADIUS packet

Server IP : 128.11.3.48

NAS-IP : 128.11.30.69

VPN instance : --(public)

Server port : 54176

Type : COA request

Length : 48

Packet ID : 157

User-Name="user"

Filter-Id="aclname1"

H3c-ACL-Version=1

If the Filter-Id attribute is not issued as expected, or if the issued ACL type is not supported by the device, contact the administrator of the RADIUS server to modify the authorization ACL configuration and re-issue the Filter-Id attribute. If the Filter-Id attribute is correct but the issue persists, proceed to the next step.
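The two cases in this step can be distinguished mechanically. The following Python sketch, a hypothetical helper, classifies the Filter-Id attribute in saved debugging text: a purely numeric value is an ACL number, and a non-numeric value immediately followed by H3c-ACL-Version with a value from 1 to 4 is an ACL name. It assumes one attribute per line, as in the debugging examples.

```python
def classify_filter_id(debug_text: str) -> str:
    """Return 'acl-number', 'acl-name', 'unrecognized', or 'missing'."""
    # One attribute per line; blank lines are ignored, as in the debugging examples.
    lines = [ln.strip() for ln in debug_text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        if not line.startswith("Filter-Id="):
            continue
        value = line.split("=", 1)[1].strip('"')
        if value.isdigit():
            return "acl-number"
        # An ACL name must be followed directly by H3c-ACL-Version (1 to 4).
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if following.startswith("H3c-ACL-Version="):
            version = following.split("=", 1)[1]
            if version.isdigit() and 1 <= int(version) <= 4:
                return "acl-name"
        return "unrecognized"
    return "missing"
```

A result of unrecognized or missing corresponds to the case in which the Filter-Id attribute is not issued as expected.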

2. Verify if the user successfully received the assigned ACL information.

Execute the display dot1x connection or display mac-authentication connection command to verify if the online user information includes ACL authorization information.

¡ If authorized ACL information exists, the ACL was assigned successfully.

¡ If no authorized ACL information exists, the ACL was not assigned. In this case, as a best practice, contact Technical Support to identify the cause based on the RADIUS debugging information.

3. Verify if the corresponding ACL has been created on the device.

Execute the display acl all command to verify if the issued ACL exists.

¡ If the ACL has not been created, execute the acl number acl-number [ name acl-name ] command in system view to create the ACL.

¡ If the ACL exists, verify if the ACL configuration is correct.

4. If the issue persists, collect the following information and contact Technical Support:

¡ Execution results of the above steps.

¡ Device configuration file, debugging information, and diagnosis information.

Related alarm and log messages

Alarm messages

None.

Log messages

None.

RADIUS authentication fail-permit failed for 802.1X, MAC, or Web authentication users

Symptom

When the RADIUS server is unreachable during 802.1X, MAC, or Web authentication, user fail-permit does not work as expected. One of the following conditions might occur:

· Fail-permit is not performed and new users cannot come online.

· Fail-permit succeeds, but users cannot access critical resources.

· Fail-permit succeeds, but users are logged off.

Common causes

The following are the common causes of this type of issue:

· A free VLAN is configured for port security on the interface. User traffic in the free VLAN is not authenticated.

· The fail-permit policy is not configured as required.

· Not all RADIUS servers in the RADIUS authentication scheme are unreachable. At least one RADIUS server is reachable, and user authentication fails for another reason.

· A backup authentication method (Local or None) is configured after the RADIUS scheme. The backup method is used when the RADIUS authentication servers cannot be reached.

· The critical resources configured for 802.1X and MAC authentication fail-permit do not exist.

· Offline detection for 802.1X or MAC authentication is enabled on the interface, but online user fail-permit is not enabled. In this case, the device logs off users if no traffic from them is detected within an offline detection period.

Troubleshooting flow

The troubleshooting flowchart is shown in Figure 150.

Figure 150 Troubleshooting flowchart

Solution

1. Verify if a free VLAN is configured on the port for port security.

If a free VLAN is configured for port security on the user access port, traffic from 802.1X and MAC authentication users in the VLAN will bypass authentication and be forwarded directly. As a result, these users will not trigger fail-permit. The free VLAN configuration example is as follows:

<Sysname> system-view

[Sysname] interface GigabitEthernet 2/0/1

[Sysname-GigabitEthernet2/0/1] port-security free-vlan 2 3

To stop user traffic in a specific VLAN from being forwarded without authentication, delete the free VLAN configuration.

2. Verify if the configured fail-permit policy is correct.

The device supports the following types of fail-permit policies:

¡ ISP domain-based fail-permit (for 802.1X, MAC, or Web authentication users): When the device enters fail-permit state, newly connected users within the authentication domain will "escape" from the current domain and directly access the configured critical domain without authentication. The critical domain configuration example is as follows:

# In ISP domain abc, configure domain dm2 as the critical domain.

<Sysname> system-view

[Sysname] domain abc

[Sysname-isp-abc] authen-radius-unavailable online domain dm2

¡ Port-based fail-permit (for 802.1X and MAC authentication users): When the device enters fail-permit state, new users connecting to the port can directly access certain critical resources (such as critical VLAN, critical VSI, critical microsegmentation, or critical resources within a critical profile) bound to the current port without authentication. The critical resource configuration example is as follows:

# Specify VLAN 100 as the critical VLAN of GigabitEthernet 2/0/1.

<Sysname> system-view

[Sysname] interface GigabitEthernet 2/0/1

[Sysname-GigabitEthernet2/0/1] dot1x critical vlan 100

If both types of fail-permit policies are configured, the ISP domain-based fail-permit policy has a higher priority. That is, if the RADIUS servers are unreachable, new users directly come online in the critical domain bound to the authentication domain, and they cannot access the critical resources configured on the port.

You can execute the display domain command to verify if a critical domain is configured under the ISP domain for user authentication. For example, in the display information, the Authen-radius-unavailable field shows that the configured critical domain is dm2.

<Sysname> display domain abc

Domain: abc

State: Active

LAN access authentication scheme: RADIUS=bbb

LAN access accounting scheme: RADIUS=bbb

Default authentication scheme: Local

Default authorization scheme: Local

Default accounting scheme: Local

Accounting start failure action: Online

Accounting update failure action: Online

Accounting quota out policy: Offline

Service type: HSI

Session time: Exclude idle time

Dual-stack accounting method: Merge

Authorization attributes:

Idle cut: Disabled

IGMP access limit: 4

MLD access limit: 4

Authen-fail action: Offline

Authen-radius-unavailable: Online domain dm2

Authen-radius-recover: Not configured
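The priority between the two fail-permit policies described in this step can be summarized as a small decision function. This Python sketch is illustrative only: the auth_type labels and the returned strings are assumptions, and it reflects only the rules stated here (the domain-based policy wins, and port-based critical resources apply only to 802.1X and MAC authentication users).

```python
from __future__ import annotations


def fail_permit_result(auth_type: str, critical_domain: str | None,
                       port_critical: list[str]) -> str:
    """Describe where a new user lands when all RADIUS servers are unreachable.

    auth_type: "dot1x", "mac", or "web" (labels assumed for this sketch).
    critical_domain: critical domain of the authentication domain, if any.
    port_critical: critical resources configured on the access port.
    """
    # ISP domain-based fail-permit has priority and covers all three user types.
    if critical_domain:
        return f"critical domain {critical_domain}"
    # Port-based fail-permit applies only to 802.1X and MAC authentication users.
    if auth_type in ("dot1x", "mac") and port_critical:
        return "port critical resources: " + ", ".join(port_critical)
    return "no fail-permit"
```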

3. Verify if a RADIUS scheme is configured for users.

Execute the display domain command to verify if a RADIUS scheme is configured for LAN access users. In the example, the LAN access authentication scheme field shows that an LDAP authentication scheme is configured and no RADIUS scheme is configured.

<Sysname> display domain abc

Domain: abc

State: Active

LAN access authentication scheme: LDAP=ldp

LAN access authorization scheme : Local

LAN access accounting scheme: Local

Default authentication scheme: Local

Default authorization scheme: Local

Default accounting scheme: Local

Accounting start failure action: Online

Accounting update failure action: Online

Accounting quota out policy: Offline

Service type: HSI

Session time: Exclude idle time

Dual-stack accounting method: Merge

Authorization attributes:

Idle cut: Disabled

IGMP access limit: 4

MLD access limit: 4

Authen-fail action: Offline

Authen-radius-unavailable: Online domain dm2

Authen-radius-recover: Not configured

If no RADIUS scheme is configured in the authentication domain for LAN access users, configure a scheme as follows:

# Configure RADIUS scheme rd for LAN access users in ISP domain abc.

<Sysname> system-view

[Sysname] domain abc

[Sysname-isp-abc] authentication lan-access radius-scheme rd

4. Verify if all RADIUS servers are unreachable under the RADIUS authentication scheme used for user authentication.

The device enters fail-permit state only when all RADIUS servers in the RADIUS scheme used for user authentication are in blocked state. Execute the display radius scheme command to view the state of the authentication servers in the RADIUS scheme. In the following output, the State field of each RADIUS authentication server is Active, indicating that the servers are reachable. Therefore, fail-permit is not triggered.

<Sysname> display radius scheme rd

RADIUS scheme name: rd

Index: 0

Primary authentication server:

Host name: Not Configured

IP : 128.11.3.33 Port: 1812

VPN : Not configured

State: Active (duration: 0 weeks, 0 days, 0 hours, 43 minutes, 22 seconds)

Most recent state changes:

2022/03/30 15:15:59 Changed to active state

2022/03/30 15:11:05 Changed to blocked state

2022/03/30 15:09:55 Changed to active state

2022/03/30 15:05:01 Changed to blocked state

2022/03/30 08:58:59 Changed to active state

Test profile: Not configured

Weight: 0

Primary accounting server:

Host name: Not Configured

IP : 128.11.3.33 Port: 1813

VPN : Not configured

State: Blocked (mandatory)

Most recent state changes:

2022/03/30 08:59:11 Changed to blocked state

2022/03/29 19:15:04 Changed to active state

2022/03/29 19:10:06 Changed to blocked state

2022/03/29 19:03:08 Changed to active state

2022/03/29 18:58:15 Changed to blocked state

Weight: 0

Second authentication server:

Host name: Not Configured

IP : 1.12.3.4 Port: 1812

VPN : Not configured

State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 10 seconds)

Most recent state changes:

2022/03/30 15:59:11 Changed to active state

Test profile: Not configured

Weight: 0

Accounting-On function : Disabled

extended function : Disabled

retransmission times : 50

retransmission interval(seconds) : 3

Timeout Interval(seconds) : 3

Retransmission Times : 3

Retransmission Times for Accounting Update : 5

Server Quiet Period(minutes) : 5

Realtime Accounting Interval(seconds) : 720

Stop-accounting packets buffering : Enabled

Retransmission times : 500

NAS IP Address : Not configured

Local NAS IP Address : Not configured
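The trigger condition in this step (every authentication server in the scheme is blocked) can be checked against saved display radius scheme output. This Python sketch assumes the layout of the example: each server section starts with a line ending in server:, and its first State: line gives the state. Accounting server sections are skipped. The function names are hypothetical.

```python
from __future__ import annotations


def auth_server_states(output: str) -> list[str]:
    """Return the state word (Active, Blocked, ...) of each authentication server."""
    states: list[str] = []
    in_auth_section = False
    for line in output.splitlines():
        text = line.strip()
        if text.endswith("server:"):
            # Track only authentication server sections; accounting ones are ignored.
            in_auth_section = "authentication server" in text
        elif in_auth_section and text.startswith("State:"):
            words = text.split(":", 1)[1].split()
            states.append(words[0] if words else "")
            in_auth_section = False
    return states


def fail_permit_can_trigger(output: str) -> bool:
    """Fail-permit requires every authentication server to be blocked."""
    states = auth_server_states(output)
    return bool(states) and all(s.startswith("Block") for s in states)
```

Applied to the display radius scheme rd output above, both authentication servers report Active, so fail_permit_can_trigger returns False.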

5. Verify if a backup authentication method is configured after the RADIUS scheme.

If a backup authentication method (Local or None) is configured after the RADIUS scheme, the device uses the backup method when the RADIUS authentication servers cannot be reached, and fail-permit is not triggered.

Execute the display domain command to view the authentication methods configured for LAN access users in the user authentication domain. In this example, the LAN access authentication scheme field shows that the preferred authentication method is RADIUS scheme rd and local authentication is used if the RADIUS scheme is unavailable.

<Sysname> display domain abc

Domain: abc

State: Active

LAN access authentication scheme: RADIUS=rd, Local

LAN access authorization scheme: RADIUS=rd, Local

LAN access accounting scheme: RADIUS=rd, Local

Default authentication scheme: Local

Default authorization scheme: Local

Default accounting scheme: Local

Accounting start failure action: Online

Accounting update failure action: Online

Accounting quota out policy: Offline

Service type: HSI

Session time: Exclude idle time

Dual-stack accounting method: Merge

Authorization attributes:

Idle cut: Disabled

IGMP access limit: 4

MLD access limit: 4

Authen-fail action: Offline

Authen-radius-unavailable: Online domain dm2

Authen-radius-recover: Not configured

In this scenario, to trigger user fail-permit when the RADIUS server is unreachable, delete the configured backup authentication method, making RADIUS authentication the last method.
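The rule in this step reduces to checking whether RADIUS is the last method in the LAN access authentication scheme field. This Python sketch assumes the field value lists methods in preference order separated by commas, as in RADIUS=rd, Local; the helper names are illustrative.

```python
from __future__ import annotations


def lan_access_auth_scheme(output: str) -> str | None:
    """Return the value of the LAN access authentication scheme field, if present."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "LAN access authentication scheme":
            return value.strip()
    return None


def radius_is_last_method(scheme: str) -> bool:
    """True only if the last configured method is a RADIUS scheme."""
    methods = [m.strip() for m in scheme.split(",") if m.strip()]
    return bool(methods) and methods[-1].upper().startswith("RADIUS=")
```

With the display domain abc output in this step, the field value is RADIUS=rd, Local, so radius_is_last_method returns False.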

6. Verify if critical resources are configured on the port.

¡ 802.1X, MAC, and Web authentication users in the critical domain can access the authorization resources configured in the domain. Execute the display domain command to view the Authorization attributes field of the critical domain, and then configure the corresponding authorization resources on the device.

¡ 802.1X and MAC authentication users that come online through port-based fail-permit can access the critical resources configured on the port. View the critical resource configuration on the user access interface, and then create the corresponding resources on the device. For example, the following output shows that critical VLAN 24 is configured on the interface, so VLAN 24 must exist on the device.

[Sysname-GigabitEthernet2/0/24] display this

interface GigabitEthernet2/0/24

port link-mode bridge

dot1x critical vlan 24

7. Verify if offline detection is enabled on the user access port.

By default, if offline detection is enabled and all RADIUS authentication servers in the authentication domain are unreachable, the device logs off users that send no traffic within an offline detection period.

In this example, the command output shows that offline detection is enabled on the access port for MAC authentication users.

<Sysname> display mac-authentication

Global MAC authentication parameters:

MAC authentication : Enabled

Authentication method : PAP

DR member configuration conflict : Unknown

Username format : MAC address in lowercase(xxxxxxxxxxxx)

Username : mac

Password : Not configured

MAC range accounts : 2

MAC address Mask Username

2222-0000-0000 ffff-0000-0000 user1

4444-0000-0000 ffff-0000-0000 user1

Offline detect period : 300 s

Quiet period : 60 s

Server timeout : 100 s

Reauth period : 3600 s

User aging period for critical VLAN : 1000 s

User aging period for critical VSI : 1000 s

User aging period for guest VLAN : 1000 s

User aging period for guest VSI : 1000 s

User aging period for critical microsegment: 1000 s

Temporary user aging period : 60 s

Authentication domain : Not configured, use default domain

HTTP proxy port list : Total 10 ports

1-3, 5, 7, 9, 11-13, 15

HTTPS proxy port list : Not configured

Max number of silent MACs : 31236 (per slot)

Online MAC-auth wired users : 1

Online MAC-auth wireless users : 2

Silent MAC users:

MAC address VLAN ID From port Port index

0001-0000-0001 100 GE2/0/2 21

GigabitEthernet2/0/1 is link-up

MAC authentication : Enabled

Carry User-IP : Disabled

Authentication domain : Not configured

Auth-delay timer : Enabled

Auth-delay period : 60 s

Periodic reauth : Enabled

Reauth period : 120 s

Re-auth server-unreachable : Logoff

Guest VLAN : 100

Guest VLAN reauthentication : Enabled

Guest VLAN auth-period : 150 s

Critical VLAN : Not configured

Critical voice VLAN : Disabled

Host mode : Single VLAN

Offline detection : Enabled

Authentication order : Parallel

User aging : Enabled

Server-recovery online-user-sync : Enabled

...omit...

To keep offline detection enabled while allowing online users to stay online when the RADIUS authentication servers become unreachable, enable online user fail-permit on the access interface.

For MAC authentication users, the configuration method to enable online user fail-permit on an interface is as follows:

<Sysname> system-view

[Sysname] interface GigabitEthernet 2/0/1

[Sysname-GigabitEthernet2/0/1] mac-authentication auth-server-unavailable escape
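To find the interfaces where offline detection might log off users, the per-interface Offline detection field can be extracted from saved display mac-authentication output. This Python sketch assumes each interface section starts with a line such as GigabitEthernet2/0/1 is link-up, as in the example in step 7; the function name is hypothetical.

```python
from __future__ import annotations

import re

# Start of a per-interface section, for example "GigabitEthernet2/0/1 is link-up".
INTERFACE_LINE = re.compile(r"^(\S+) is link-(?:up|down)$")


def offline_detection_by_interface(output: str) -> dict[str, bool]:
    """Map each interface to True if its Offline detection field is Enabled."""
    result: dict[str, bool] = {}
    current = None
    for line in output.splitlines():
        text = line.strip()
        match = INTERFACE_LINE.match(text)
        if match:
            current = match.group(1)
        elif current and text.startswith("Offline detection"):
            # current stays None until the first interface line, so global fields are skipped.
            result[current] = text.split(":", 1)[1].strip() == "Enabled"
    return result
```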

8. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ Device configuration file, debugging information, and diagnosis information.

Related alarm and log messages

Alarm messages

None.

Log messages

None.

IPoE user fail-permit failure during RADIUS authentication

Symptom

During IPoE user authentication, the RADIUS server is unreachable and fail-permit does not take effect. As a result, users cannot come online.

Common causes

The following are the common causes of this type of issue:

· The fail-permit policy is not configured as required.

· Not all RADIUS servers in the RADIUS authentication scheme are unreachable. At least one RADIUS server is reachable, and user authentication fails for another reason.

· A backup authentication method (Local or None) is configured after the RADIUS scheme. The backup method is used when the RADIUS authentication servers cannot be reached.

Troubleshooting flow

Figure 151 shows the troubleshooting flowchart.

Figure 151 Flowchart for troubleshooting IPoE user fail-permit failure during RADIUS authentication

Solution

1. Verify if the configured fail-permit policy is correct.

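IPoE users support ISP domain-based fail-permit. In the user authentication domain, specify a critical domain (also known as a fail-permit domain) to accommodate users that access the authentication domain when all RADIUS servers are unavailable. Execute the display domain command to verify whether a critical domain is configured. The Authen-radius-unavailable field shows the configured critical domain.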

<Sysname> display domain name abc

Domain: abc

Current state: Active

State configuration: Active

IPoE authentication scheme: RADIUS=rd

IPoE authorization scheme: RADIUS=rd

IPoE accounting scheme: RADIUS=rd

PPPoEA authentication scheme: None

PPPoEA authorization scheme: None

Default authentication scheme: Local

Default authorization scheme: Local

Default accounting scheme: Local

Accounting start failure action: Online

Accounting update failure action: Online

Accounting quota out policy: Offline

Send accounting update:Yes

Session time: Exclude idle time

Dual-stack accounting method: Merge

Authen-fail action: Offline

Service type: HSI

DHCPv6-follow-IPv6CP timeout: 60 seconds

IPv6CP interface ID assignment: Disabled

NAS-ID: N/A

Service rate-limit mode: Separate

Web server IPv4 URL : Not configured

Track : Not configured

Web server IPv6 URL : Not configured

Track : Not configured

Web server URL parameters : Not configured

Web server IPv4 address : Not configured

Web server secondary IPv4 address: Not configured

Web server IPv6 address : Not configured

Web server secondary IPv6 address: Not configured

Secondary Web server IPv4 URL : Not configured

Track : Not configured

Secondary Web server IPv6 URL : Not configured

Track : Not configured

Secondary Web server IPv4 address : Not configured

Secondary Web server secondary IPv4 address: Not configured

Secondary Web server IPv6 address : Not configured

Secondary Web server secondary IPv6 address: Not configured

Redirect active time : Not configured

Redirect server IPv4 address : Not configured

Temporary redirect : Disabled

Redirect server IPv6 address : Not configured

Access user auto-save : Enabled

Authorization attributes:

Idle cut: Disabled

IGMP access limit: 4

MLD access limit: 4

Access limit: Not configured

Access interface VPN instance strict check: Disabled

Dynamic authorization effective attributes: Not configured

Authen-radius-unavailable: Online on domain dm2

Authen-radius-recover: Not configured

IP resource usage warning thresholds:

High threshold: Not configured

Low threshold: Not configured

IPv6 resource usage warning thresholds:

High threshold: Not configured

Low threshold: Not configured

L2TP-user RADIUS-force: Disabled

IPv6 ND autoconfiguration:

Managed-address flag: Unset

Other flag : Unset

If the Authen-radius-unavailable field shows Not configured or does not show the expected domain name, reconfigure the critical domain as follows:

# In ISP domain abc, configure domain dm1 as the critical domain.

<Sysname> system-view

[Sysname] domain name abc

[Sysname-isp-abc] authen-radius-unavailable online domain dm1

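2. Verify that all RADIUS servers in the RADIUS authentication scheme used for user authentication are unreachable.

Execute the display radius scheme command and check the State field of each authentication server. Fail-permit is triggered only when all RADIUS servers in the scheme are unreachable. In the following example, the primary authentication server is in Active state. This means a reachable RADIUS server exists and the authentication failure has another cause.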

<Sysname> display radius scheme rd

RADIUS scheme name: rd

Index: 0

Primary authentication server:

IP : 2.2.2.2 Port: 1812

VPN : Not configured

State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 19 seconds)

Most recent state changes:

2022/04/22 15:54:58 Changed to active state

Test profile: Not configured

Weight: 0

Primary accounting server:

IP : 2.2.2.2 Port: 1813

VPN : Not configured

State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 8 seconds)

Most recent state changes:

2022/04/22 15:55:10 Changed to active state

Weight: 0

...
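Checking State fields by hand is error-prone when a scheme has several servers. The following Python sketch is an illustration only. It scans saved display radius scheme output and reports whether every authentication server is unreachable. It makes three assumptions:

* The output follows the layout shown above, where a line ending in "server:" introduces each server.
* Any state other than Active counts as unreachable.
* The function names are hypothetical and are not part of the device software.

```python
def auth_server_states(output: str) -> list[str]:
    """Collect the State value of every authentication server in the output."""
    states = []
    in_auth_server = False
    for raw in output.splitlines():
        line = raw.strip()
        if line.endswith("server:"):
            # For example "Primary authentication server:" or "Primary accounting server:"
            in_auth_server = "authentication server" in line
        elif in_auth_server and line.startswith("State:"):
            # "State: Active (duration: ...)" -> "Active"
            states.append(line.split(":", 1)[1].split("(")[0].strip())
    return states


def all_auth_servers_unreachable(output: str) -> bool:
    """True only if at least one authentication server was found and none is Active."""
    states = auth_server_states(output)
    return bool(states) and all(state != "Active" for state in states)


sample = """RADIUS scheme name: rd
Primary authentication server:
  IP : 2.2.2.2 Port: 1812
  State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 19 seconds)
Primary accounting server:
  IP : 2.2.2.2 Port: 1813
  State: Active (duration: 0 weeks, 0 days, 0 hours, 0 minutes, 8 seconds)
"""
print(auth_server_states(sample))            # ['Active']
print(all_auth_servers_unreachable(sample))  # False
```

A False result, as in this example, means a reachable authentication server exists. Fail-permit is therefore not expected to trigger.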

3. Verify that RADIUS is the last authentication method configured for IPoE users.

If a backup authentication method (Local or None) is configured after RADIUS, the device uses the backup method when the RADIUS authentication server cannot be reached. Fail-permit is not triggered.

Execute the display domain command to view the authentication methods configured for IPoE users in the user authentication domain. In the following example output, RADIUS=rd, Local indicates that RADIUS scheme rd is the preferred authentication method. Local authentication is used if the RADIUS scheme is unavailable.

<Sysname> display domain abc

Domain: abc

State: Active

LAN access authentication scheme: RADIUS=rd, Local

LAN access authorization scheme: RADIUS=rd, Local

LAN access accounting scheme: RADIUS=rd, Local

Default authentication scheme: Local

Default authorization scheme: Local

Default accounting scheme: Local

Accounting start failure action: Online

Accounting update failure action: Online

Accounting quota out policy: Offline

Service type: HSI

Session time: Exclude idle time

Dual-stack accounting method: Merge

Authorization attributes:

Idle cut: Disabled

IGMP access limit: 4

MLD access limit: 4

Authen-fail action: Offline

Authen-radius-unavailable: Online domain dm2

Authen-radius-recover: Not configured

In this scenario, to trigger fail-permit when the RADIUS server is unreachable, remove the backup authentication method so that RADIUS is the last authentication method.
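The interaction between backup methods and fail-permit described in steps 2 and 3 can be summarized as a small decision model. This Python sketch is a simplified illustration and does not reproduce the device's implementation. It makes these assumptions:

* The authentication method list is an ordered list such as ["radius", "local"].
* Server reachability is a boolean for each server.
* RadiusServer and select_access_path are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RadiusServer:
    ip: str
    reachable: bool  # Assumption: Active state in display radius scheme means reachable


def select_access_path(methods: Sequence[str],
                       servers: Sequence[RadiusServer],
                       critical_domain: Optional[str]) -> str:
    """Return how an IPoE user is handled (simplified, hypothetical model)."""
    for index, method in enumerate(methods):
        if method != "radius":
            return method  # "local" or "none" handles the request
        if any(server.reachable for server in servers):
            return "radius"  # A reachable server answers; fail-permit is not used
        if index < len(methods) - 1:
            continue  # A backup method follows RADIUS, so it is tried instead
        # RADIUS is the last method and every server is unreachable
        return f"critical-domain:{critical_domain}" if critical_domain else "offline"
    return "offline"


down = [RadiusServer("2.2.2.2", reachable=False)]
print(select_access_path(["radius", "local"], down, "dm2"))  # local
print(select_access_path(["radius"], down, "dm2"))           # critical-domain:dm2
```

The first call shows why a backup Local method prevents fail-permit. The second call shows the expected result after the backup method is removed.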

4. If the issue persists, collect the following information and contact Technical Support:

¡ Execution results of the above steps.

¡ Device configuration file, debugging information, and diagnosis information.

Related alarm and log messages

Alarm messages

None.

Log messages

None.

ITA service does not take effect

Symptom

After a user comes online, the ITA service policy does not take effect or stops taking effect. The device does not separately meter and rate-limit traffic destined for different addresses at the expected accounting levels.

Common causes

The following are the common causes of this type of issue:

· The user access type does not support ITA service policies.

· No ITA service policy is configured on the device for the user.

· The RADIUS server has not authorized an ITA service policy for the user, and the ITA service policy to be used is not specified in the user authentication domain.

· The RADIUS server has authorized an EDSG service policy for the user, and an ITA service policy is also specified in the user authentication domain.

· The accounting configuration in the ITA service policy is incorrect.

· The QoS configuration for marking ITA service traffic is incorrect.

· The user's ITA service traffic quota has been exhausted.

Troubleshooting flow

Figure 152 shows the troubleshooting flowchart.

Figure 152 Flowchart for troubleshooting ineffective ITA service

Solution

1. Verify that the user access type supports ITA service policies.

Currently, only the portal, IPoE, and PPP access types support applying ITA service policies.

You can execute the display access-user command and view the Access type field to identify the user access type.

¡ If the user access type is portal, IPoE, or PPP, proceed to step 2.

¡ If the user access type is any other type, no action is required.

2. Verify if the expected ITA service policy is configured on the device.

¡ Execute the display ita policy command to verify if an ITA policy is configured on the device.

¡ If the specified ITA policy does not exist, execute the ita policy command in system view to create the ITA service policy and configure the policy as needed. For more information, see Security Configuration Guide.

¡ If the specified ITA policy exists, proceed to step 3.

3. Verify if the ITA service policy is authorized for the user.

If the RADIUS server has authorized an ITA service policy for the user, the device will use the policy authorized by the RADIUS server. If no policy is authorized, the device uses the ITA service policy specified in the user authentication domain. Therefore, first verify if the RADIUS server has authorized an ITA service policy, and then check the configuration under the authentication domain as needed.

a. Execute the debugging radius packet command to enable RADIUS packet debugging. If the system prints H3C-Ita-Policy="XXX" when the user comes online, it indicates that an ITA service policy has been authorized for the user. In this case, proceed to step 4. If no ITA service policy is authorized, proceed to step b.

b. Execute the display domain command to view the user authentication domain configuration. If the command output includes ITA service policy: XXX (where XXX represents the policy name), it indicates that an ITA service policy is configured in the domain. In this case, proceed to step c. If no ITA service policy is specified, proceed to step d.

c. Verify if the RADIUS server has authorized an EDSG service policy for the user. An EDSG service policy has been authorized if the RADIUS packet debugging output when the user comes online includes either of the following:

- H3C-AV-Pair := "edsg-policy:activelist=xxx"
- Cisco-AVPair := "edsg-policy:username=[xxx]xxx"

If an EDSG service policy has been authorized, the ITA service policy specified in the authentication domain does not take effect. In this case, change the user authorization configuration on the RADIUS server so that it no longer authorizes the EDSG service policy, and then proceed to step 4. If no EDSG service policy has been authorized, proceed to step 4.

d. You can authorize an ITA service policy for users by using either of the following methods:

- Based on user authorization: Configure an ITA service policy on the RADIUS authentication server, and make users go offline and come online again.

- Based on authentication domain: Specify an ITA service policy in the view of the authentication domain that the user uses to come online, and then make the user go offline and come online again.

For example, specify ITA service policy ita1 in ISP domain test.

<Sysname> system-view

[Sysname] domain name test

[Sysname-isp-test] ita-policy ita1
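The precedence rules in step 3 can be expressed as a short function. This Python sketch is illustrative only. It assumes the attributes from the RADIUS packet debugging output have already been collected as (name, value) pairs. Attribute names match the debugging strings quoted above, and the function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Attr = Tuple[str, str]  # (attribute name, value) as seen in RADIUS packet debugging


def edsg_authorized(attrs: Iterable[Attr]) -> bool:
    """True if an H3C-AV-Pair or Cisco-AVPair value carries EDSG policy information."""
    return any(name in ("H3C-AV-Pair", "Cisco-AVPair") and value.startswith("edsg-policy:")
               for name, value in attrs)


def effective_ita_policy(attrs: List[Attr], domain_policy: Optional[str]) -> Optional[str]:
    """Return the ITA policy expected to apply, or None if no ITA policy applies."""
    for name, value in attrs:
        if name == "H3C-Ita-Policy":
            return value  # A RADIUS-authorized ITA policy takes precedence
    if edsg_authorized(attrs):
        return None  # The domain ITA policy does not take effect when EDSG is authorized
    return domain_policy  # Fall back to the policy specified in the authentication domain


print(effective_ita_policy([], "ita1"))  # ita1
print(effective_ita_policy([("H3C-AV-Pair", "edsg-policy:activelist=sp1")], "ita1"))  # None
```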

4. Verify if the accounting scheme used by the ITA service policy is available.

Execute the display ita policy command to display ITA service policy configuration, and view the Accounting method field to identify the accounting scheme used by the ITA service policy.

For example, view the configuration of ITA service policy ita1.

<Sysname> display ita policy ita1

Accounting method : RADIUS=Rd1, None

Accounting merge : Enabled

Accounting levels :

Level 1 IPv4

Inbound CAR: CIR 100 kbps PIR 200 kbps

Outbound CAR: CIR 100 kbps PIR 200 kbps

Level 2 IPv6

Inbound CAR: CIR 300 kbps PIR 400 kbps

Level 3 IPv4

Level 8 IPv6

Traffic separation : Enabled

Separated levels: 1, 2, 3, 4

Traffic quota-out action: Online

Send accounting update: No

¡ If the Accounting method field shows only None, no accounting method is specified for the ITA service policy. In this case, configure an accounting scheme in ITA service policy view, and make sure the specified accounting server is available.

For example, specify accounting scheme radius1 in ITA service policy ita1.

<Sysname> system-view

[Sysname] ita policy ita1

[Sysname-ita-policy-ita1] accounting-method radius-scheme radius1

¡ If the Accounting method field includes RADIUS=xxx, it indicates that the RADIUS accounting method is specified for the ITA service policy. In this case, make sure the RADIUS accounting server is available. If the accounting server is unavailable, see "RADIUS server not responding."

5. Verify that ITA service traffic is accounted according to the expected accounting levels.

By defining different traffic levels based on the destination addresses of users' traffic, you can use ITA to separate the traffic accounting statistics of different levels for each user.

a. Execute the display ita policy command to display ITA service policy configuration, and view the Accounting levels field to verify if the accounting level information is correct under the ITA service policy.

For example, view the configuration of ITA service policy ita1.

<Sysname> display ita policy ita1

Accounting method : RADIUS=Rd1, None

Accounting merge : Enabled

Accounting levels :

Level 1 IPv4

Inbound CAR: CIR 100 kbps PIR 200 kbps

Outbound CAR: CIR 100 kbps PIR 200 kbps

Level 2 IPv6

Inbound CAR: CIR 300 kbps PIR 400 kbps

Traffic separation : Enabled

Separated levels: 1, 2, 3, 4

Traffic quota-out action: Online

Send accounting update: No

- If the Accounting levels field shows None, no accounting level is specified. In this case, configure accounting levels for users in the ITA service policy.

For example, in ITA service policy ita1, set the accounting level to 2 for IPv4 traffic and to 5 for IPv6 traffic.

<Sysname> system-view

[Sysname] ita policy ita1

[Sysname-ita-policy-ita1] accounting-level 2 ipv4

[Sysname-ita-policy-ita1] accounting-level 5 ipv6

- If the Accounting levels field is not None, it indicates that an accounting level is specified. Make sure the accounting level is correct and then proceed to step b.

b. Verify if the QoS policy configuration for identifying user ITA service traffic is correct.

- To issue QoS policies to users based on authorized user profiles, verify that a QoS policy is applied to the user profile, and the traffic class and the traffic priority marking settings are correct in the QoS policy.

- To issue QoS policies to users based on interfaces, verify that a QoS policy is applied to the user access interface, and the traffic class and the traffic priority marking settings are correct in the QoS policy.
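To illustrate separate accounting per level, the following Python sketch classifies flows by destination address and accumulates bytes for each accounting level. It is a model based on assumptions and does not describe device behavior:

* On the device, the QoS policy described in step b marks the traffic level. Here, a hypothetical LEVEL_TABLE of destination prefixes stands in for that classification.
* Traffic that matches no entry is not counted.

```python
import ipaddress
from collections import defaultdict
from typing import Iterable, Optional, Tuple

# Hypothetical classification table standing in for the device's QoS marking:
# destination prefix -> ITA accounting level.
LEVEL_TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"), 1),
    (ipaddress.ip_network("2001:db8::/32"), 2),
]


def classify(destination: str) -> Optional[int]:
    """Return the accounting level for a destination address, or None if unmatched."""
    address = ipaddress.ip_address(destination)
    for network, level in LEVEL_TABLE:
        if address.version == network.version and address in network:
            return level
    return None


def tally(flows: Iterable[Tuple[str, int]]) -> dict:
    """Accumulate bytes separately for each accounting level."""
    per_level = defaultdict(int)
    for destination, byte_count in flows:
        level = classify(destination)
        if level is not None:  # Unmatched traffic is not counted in this model
            per_level[level] += byte_count
    return dict(per_level)


print(tally([("10.1.1.1", 100), ("10.2.2.2", 50), ("2001:db8::1", 30), ("8.8.8.8", 999)]))
# {1: 150, 2: 30}
```

If traffic expected at one level is counted at another, or is not counted at all, review the traffic class and priority marking settings in the QoS policy.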

6. Verify if the ITA service has stopped working.

a. Execute the display value-added-service user xxx verbose command to view detailed information about value-added service users. If a Level-X State field shows Offline, the value-added service at that accounting level is offline.

b. If the service at an accounting level is offline, view the Offline reason field in the command output to identify why the service went offline. Possible values of the Offline reason field include:

- Authentication failed.

- Accounting failed.

- Accounting update failed.

- Failed to send accounting packets.

- Traffic quota exhausted.

- Session timed out.

- Cut by the AAA server.

- Logged out by the RADIUS proxy.

If the quota is exhausted, no action is required. If the service was forced offline, confirm the cause with the RADIUS server administrator or device administrator. In other cases, see "RADIUS server not responding" to rule out a server fault, and then proceed to step 7.

For example, view detailed information about the value-added service user with IP address 1.1.1.1.

<Sysname> display value-added-service user ip-address 1.1.1.1 verbose

Slot 97:

Basic:

User ID : 0x1

User name : user1

IP address : 1.1.1.1

IPv6 address : -

Service type : ITA

ITA:

Policy name : ita1

Accounting merge : Disabled

Traffic quota-out action : Offline

Level-1 State : Offline

Offline reason : Session timed out

Inbound CAR : CIR 1000kbps PIR 2000kbps

CBS -

Outbound CAR : CIR 1000kbps PIR 2000kbps

CBS -

Uplink packets/bytes : 4/392

Downlink packets/bytes : 4/392

IPv6 uplink packets/bytes : 0/0

IPv6 downlink packets/bytes : 0/0

Accounting start time : 2022-08-27 01:23:41

Online time (hh:mm:ss) : 0:00:12

Accounting state : Stop

Session timeout : Unlimited

Time remained : Unlimited

Realtime accounting interval: -

Traffic separate : Disabled

Traffic quota : Unlimited

Traffic remained : Unlimited

The above output shows that the Level-1 State field is Offline, which means the ITA service at accounting level 1 is offline. The Offline reason field shows Session timed out, which means the session time quota for the ITA service at accounting level 1 has been exhausted.
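When many users are affected, you can triage saved output of the display value-added-service user command (with the verbose keyword) with a script. This Python sketch makes the following assumptions:

* The output uses the field layout shown above.
* Offline reasons are grouped according to the guidance in step 6. Session timed out is treated as an exhausted quota, as in the example.
* The function names and action labels are hypothetical.

```python
import re

QUOTA_REASONS = {"Traffic quota exhausted", "Session timed out"}
FORCED_REASONS = {"Cut by the AAA server", "Logged out by the RADIUS proxy"}


def offline_levels(output: str) -> dict:
    """Map each offline ITA accounting level to its Offline reason."""
    result = {}
    pending_level = None
    for line in output.splitlines():
        state = re.match(r"\s*Level-(\d+) State\s*:\s*(\S+)", line)
        if state:
            # Remember the level only if it is offline; its reason follows on a later line
            pending_level = int(state.group(1)) if state.group(2) == "Offline" else None
            continue
        reason = re.match(r"\s*Offline reason\s*:\s*(.+?)\s*$", line)
        if reason and pending_level is not None:
            result[pending_level] = reason.group(1)
            pending_level = None
    return result


def next_action(reason: str) -> str:
    """Suggest the follow-up from step 6 for an Offline reason value."""
    reason = reason.rstrip(".")
    if reason in QUOTA_REASONS:
        return "no action required"
    if reason in FORCED_REASONS:
        return "confirm with the RADIUS server or device administrator"
    return 'see "RADIUS server not responding"'


sample = "Level-1 State : Offline\nOffline reason : Session timed out\n"
for level, why in offline_levels(sample).items():
    print(level, why, "->", next_action(why))  # 1 Session timed out -> no action required
```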

7. If the issue persists, collect the following information and contact Technical Support:

¡ Execution results of the above steps.

¡ Device configuration file, log information, and alarm messages.

Related alarm and log messages

Alarm messages

None.

Log messages

None.

EDSG service does not take effect

Symptom

After a user comes online, the EDSG service policy does not take effect or stops taking effect. The user does not receive the separate accounting and dynamic rate limiting expected from the EDSG value-added service parameters.

Common causes

The following are the common causes of this type of issue:

· The user access type does not support EDSG service policies.

· No EDSG service policy is configured on the device for the user.

· The RADIUS server did not authorize an EDSG service policy for the user.

· The RADIUS server has authorized both an EDSG service policy and an ITA service policy for the user.

· The EDSG service policy information (including EDSG policy name, username, and password) delivered by the RADIUS server is invalid and cannot be recognized by the device.

· The authentication or accounting scheme specified in the EDSG service policy is not available.

· The EDSG service policy has stopped working.

Troubleshooting flow

Figure 153 shows the troubleshooting flowchart.

Figure 153 Flowchart for troubleshooting ineffective EDSG service

Solution

1. Verify that the user access type supports EDSG service policies.

Currently, only the IPoE and PPP access types support applying EDSG service policies.

You can execute the display access-user command and view the Access type field to identify the user access type.

¡ If the user access type is IPoE or PPP, proceed to step 2.

¡ If the user access type is any other type, no action is required.

2. Verify if the expected EDSG service policy is configured on the device.

¡ Execute the display service policy command to verify if an EDSG policy is configured on the device.

¡ If the specified EDSG policy does not exist, execute the service policy command in system view to create the EDSG service policy and configure the policy as needed. For more information, see Security Configuration Guide.

¡ If the specified EDSG policy exists, proceed to step 3.

3. Verify if the RADIUS server has authorized an EDSG service policy for the user.

The device can recognize only EDSG service policy information (including EDSG policy name, username, and password) delivered by the RADIUS server through private attributes H3c-AV-Pair and Cisco-AVPair.

a. Execute the debugging radius packet command to enable RADIUS packet debugging. An EDSG service policy has been authorized if the debugging output when the user comes online includes H3C-AV-Pair := "edsg-policy:activelist=xxx" or Cisco-AVPair := "edsg-policy:username=[xxx]xxx".

- If an EDSG service policy has been authorized, proceed to step 4.
- If no EDSG service policy has been authorized, proceed to step b.

b. Configure an EDSG service policy on the RADIUS authentication server, and make users go offline and come online again.

4. Verify if the RADIUS server has authorized both an ITA service policy and an EDSG service policy for the user.

If the RADIUS server issues both ITA and EDSG service policies for the same user, the EDSG service policy will not take effect. In this case, change the user authorization configuration on the RADIUS server. Make sure the server authorizes only an EDSG service policy for the user.

NOTE:

When RADIUS packet debugging is enabled and the RADIUS server authorizes an ITA service policy for a user, the debugging output displayed when the user comes online includes H3C-Ita-Policy="XXX".
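Steps 3 and 4 together decide whether the RADIUS authorization allows the EDSG policy to take effect. This Python sketch models that check on attributes collected from the RADIUS packet debugging output as (name, value) pairs. The attribute names come from the debugging strings quoted above. The function name and result labels are hypothetical.

```python
from typing import Iterable, Tuple


def edsg_authorization_verdict(attrs: Iterable[Tuple[str, str]]) -> str:
    """Classify RADIUS authorization for EDSG (simplified, hypothetical model)."""
    attrs = list(attrs)
    has_edsg = any(name in ("H3C-AV-Pair", "Cisco-AVPair") and value.startswith("edsg-policy:")
                   for name, value in attrs)
    has_ita = any(name == "H3C-Ita-Policy" for name, _ in attrs)
    if not has_edsg:
        return "no EDSG policy authorized: configure one on the RADIUS server (step 3b)"
    if has_ita:
        return "ITA also authorized: EDSG policy will not take effect (step 4)"
    return "EDSG policy authorized: continue with step 5"


print(edsg_authorization_verdict([("H3C-AV-Pair", "edsg-policy:activelist=sp1")]))
# EDSG policy authorized: continue with step 5
```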

5. Verify if the device can identify the EDSG service policy information issued by the RADIUS server.

The device can recognize EDSG service policy information (policy name, username, and password) only when it is delivered through private attribute H3c-AV-Pair or Cisco-AVPair. Confirm with the server administrator whether the username and password are issued through other, unsupported attributes.

¡ If the username and password are issued through an unsupported attribute, confirm the RADIUS attribute name with the server administrator. Then, enable RADIUS attribute translation in the RADIUS scheme used for user authentication. Configure an attribute conversion rule to convert that attribute to H3C-AV-Pair or Cisco-AVPair.

For example, if RADIUS scheme rs1 is used for user authentication, enable RADIUS attribute translation in the view of RADIUS scheme rs1. Then, configure the device to convert received H3c-Server-String attributes into H3c-AVPair attributes.

<Sysname> system-view

[Sysname] radius scheme rs1

[Sysname-radius-rs1] attribute translate

[Sysname-radius-rs1] attribute convert H3c-Server-String to H3c-AVPair received

¡ If the username and password are issued through H3c-AV-Pair or Cisco-AVPair, proceed to step 6.
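You can model the attribute translation rule in the example above as a renaming of received attributes before the device looks for EDSG information. This Python sketch is a simplified illustration with these assumptions:

* The guide spells the vendor attribute several ways (H3C-AV-Pair, H3c-AV-Pair, H3c-AVPair). The sketch therefore compares names case-insensitively and ignores hyphens. This is a modeling assumption, not documented device behavior.
* The sample attribute value is made up.
* The function names are hypothetical.

```python
def normalize(name: str) -> str:
    """Compare attribute names case-insensitively and without hyphens (model assumption)."""
    return name.replace("-", "").lower()


RECOGNIZED = {normalize("H3C-AV-Pair"), normalize("Cisco-AVPair")}


def translate_received(attrs, rules, translate_enabled=True):
    """Apply received-direction conversion rules such as {"H3c-Server-String": "H3c-AVPair"}."""
    if not translate_enabled:
        return list(attrs)  # Without attribute translation, names are left unchanged
    lookup = {normalize(src): dst for src, dst in rules.items()}
    return [(lookup.get(normalize(name), name), value) for name, value in attrs]


def carries_edsg_info(attrs) -> bool:
    """True if any recognized attribute carries an edsg-policy value."""
    return any(normalize(name) in RECOGNIZED and value.startswith("edsg-policy:")
               for name, value in attrs)


received = [("H3c-Server-String", "edsg-policy:activelist=sp1")]  # Hypothetical value
rules = {"H3c-Server-String": "H3c-AVPair"}
print(carries_edsg_info(received))                             # False
print(carries_edsg_info(translate_received(received, rules)))  # True
```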

6. Verify if the authentication and accounting schemes used by the EDSG service are available.

Execute the display service policy command to view EDSG service policy information. The Authentication method and Accounting method fields display the authentication scheme and accounting scheme used by the EDSG service policy, respectively.

For example, view the configuration of EDSG service policy sp1.

<Sysname> display service policy sp1

Service policy: sp1

Service ID : 10

Authentication method : RADIUS=Rd1, None

Accounting method : RADIUS=Rd1, None

Traffic statistics : Separate

Inbound CAR : CIR=222 kbps, PIR=2222 kbps, CBS=5678 bytes, EBS=5678 bytes

Outbound CAR : CIR=222 kbps, PIR=2222 kbps

Dual-stack rate limit mode : Merge

Service rate-limit mode : Separate

¡ If both the Authentication method and Accounting method fields are None, it indicates that the EDSG service for the user does not require separate authentication or accounting. In this case, proceed to step 7.

¡ If the Authentication method and Accounting method fields include the RADIUS=xxx string, the EDSG service for the user requires separate authentication and accounting. In this case, make sure the RADIUS authentication server and accounting server are available. Also make sure the corresponding authentication username and password exist on the server.

NOTE:

If the EDSG service policy delivered by the server during user login includes a username and password, the device uses the username and password for EDSG authentication. Otherwise, the device uses the login username and password for EDSG service validation.
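Step 6 and the note above involve two small decisions:

* Whether the policy needs separate RADIUS authentication or accounting.
* Which credentials the device uses for EDSG authentication.

This Python sketch parses method strings in the format shown in the display service policy output (for example, RADIUS=Rd1, None) and applies the credential rule from the note. The function names are hypothetical, and the parsing assumes comma-separated NAME=scheme entries.

```python
from typing import List, Optional, Tuple


def parse_methods(field: str) -> List[Tuple[str, Optional[str]]]:
    """Parse a method field such as "RADIUS=Rd1, None" into (method, scheme) pairs."""
    methods = []
    for item in field.split(","):
        name, _, scheme = item.strip().partition("=")
        methods.append((name, scheme or None))
    return methods


def radius_schemes_to_check(field: str) -> List[str]:
    """RADIUS schemes whose servers must be reachable; empty if the field is only None."""
    return [scheme for name, scheme in parse_methods(field) if name == "RADIUS" and scheme]


def edsg_credentials(policy_user: Optional[str], policy_password: Optional[str],
                     login_user: str, login_password: str) -> Tuple[str, str]:
    """Use the username and password delivered with the EDSG policy, else the login ones."""
    if policy_user and policy_password:
        return policy_user, policy_password
    return login_user, login_password


print(parse_methods("RADIUS=Rd1, None"))              # [('RADIUS', 'Rd1'), ('None', None)]
print(radius_schemes_to_check("None"))                # []
print(edsg_credentials(None, None, "pp3", "secret"))  # ('pp3', 'secret')
```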

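7. Verify if the EDSG service has stopped working.

Execute the display value-added-service user command with the verbose keyword to view detailed information about value-added service users. If the State field in the Service policy section shows Offline, the EDSG service is offline. View the Offline reason field to identify why the service went offline. Possible values of the Offline reason field include: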

- Authentication failed.

- Accounting failed.

- Accounting update failed.

- Failed to send accounting packets.

- Traffic quota exhausted.

- Session timed out.

- Cut by the AAA server.

- Logged out by the RADIUS proxy.

For example, view detailed information about the value-added service user with IP address 1.1.1.1.

<Sysname> display value-added-service user ip-address 1.1.1.1 verbose

Slot 97:

Basic:

User ID : 0x80000033

User name : pp3

IP address : 1.1.1.1

IPv6 address : -

Service type : EDSG

Service policy:

Service ID : 8

Policy name : sp8

Policy username : pp3

State : Offline

Offline reason : Session timed out

Traffic statistics : Separate

Service rate-limit mode : Separate

Dual-stack rate limit mode : Merge

Traffic quota-out action : Offline

Inbound CAR : -

Outbound CAR : -

Uplink packets/bytes : 0/0

Downlink packets/bytes : 0/0

IPv6 uplink packets/bytes : 0/0

IPv6 downlink packets/bytes : 0/0

Accounting start time : 2022-08-27 05:03:49

Online time (hh:mm:ss) : 0:03:13

Accounting state : Stop

Session timeout : Unlimited

Time remained : Unlimited

Realtime accounting interval : 20 seconds

Traffic quota : Unlimited

Traffic remained : Unlimited

8. If the issue persists, collect the following information and contact Technical Support:

¡ Execution results of the above steps.

¡ Device configuration file, log information, and alarm messages.

Related alarm and log messages

Alarm messages

None.

Log messages

None.

MAC authentication issues

MAC authentication failures

Symptom

MAC authentication fails or operates abnormally for a user.

Common causes

The following are the common causes of this type of issue:

· The user has come online through other authentication methods.

· MAC authentication is not enabled globally or on the interface.

· The authentication method configured on the device is not consistent with that on the RADIUS server.

· The authentication domain used by the MAC authentication user and related settings are not configured correctly.

· No response from the RADIUS server.

· The local authentication or RADIUS authentication request is rejected.

· Failed to deploy the authorization attributes.

· The user's MAC address has been set as a silent MAC address.

· The number of concurrent online MAC authentication users on the interface has reached the maximum.

Troubleshooting flow

Figure 154 shows the troubleshooting flowchart.

Figure 154 Flowchart for troubleshooting MAC authentication failures

Solution

CAUTION:

· Do not enable the debugging commands when the device is operating normally. Enable the commands when you reproduce the issue after it has occurred.

· Save the execution results of the following steps promptly, so that information can be quickly collected and provided if the issue persists.

1. Check whether the user has come online through other authentication methods.

By default, the authentication order on a port is 802.1X authentication, MAC authentication, and then Web authentication.

Execute the display dot1x connection command to check whether the user has successfully passed 802.1X authentication and come online. If the user has come online, determine whether the user needs to come online again through MAC authentication. If MAC authentication is required, log out the user, disable 802.1X authentication, and then configure the user to perform MAC authentication.

2. Check whether MAC authentication is enabled globally or on the interface:

a. Execute the display mac-authentication command. If the system displays MAC authentication is not configured., MAC authentication is disabled globally. To enable it, execute the mac-authentication command in system view.

b. Execute the display mac-authentication command. If global MAC authentication configuration exists, but MAC authentication configuration on the user authentication interface does not exist, execute the mac-authentication command in the view of the user authentication interface.

3. Check whether the authentication method configured on the device is consistent with that on the RADIUS server.

The device supports using both CHAP and PAP authentication methods for MAC authentication.

Execute the display mac-authentication command. Check whether the method in the Authentication method field matches the authentication method configured on the RADIUS server. If they are different, execute the mac-authentication authentication-method command to modify the configuration on the device.

4. Check whether the authentication domain and related configurations are configured correctly.

MAC authentication users accessing through a port select an authentication domain in the following order: the authentication domain specified on the port, the authentication domain specified in system view, and then the default authentication domain of the system.

a. Execute the display mac-authentication command on the device to check whether a MAC authentication domain for user authentication is configured on the system and authentication interface.

<Sysname> display mac-authentication

Global MAC authentication parameters:

MAC authentication : Enabled

Authentication method : PAP

Authentication domain : Not configured, use default domain

…

GigabitEthernet2/0/1 is link-up

MAC authentication : Enabled

Carry User-IP : Disabled

Authentication domain : Not configured

…

b. If an authentication domain for MAC authentication users is configured on the authentication interface, execute the display domain command to check whether the authentication scheme in that domain is configured correctly. If no authentication domain is configured on the interface but one is configured in system view, use the display domain command to check the authentication scheme in the system-view domain.

c. If no authentication domain for MAC authentication users is configured on the authentication interface or in system view, check the configuration of the default authentication domain.

d. If no default authentication domain exists and a domain to accommodate users assigned to nonexistent domains has been configured by the domain if-unknown command, check whether the authentication scheme in the domain is configured correctly.

e. If none of the authentication domains mentioned above exists on the device, the user cannot perform authentication.
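The domain selection order in step 4 can be sketched as follows. This Python function is a simplified model with these assumptions:

* Each argument is the configured domain name, or None.
* default_domain is None when no default domain exists.
* The function name and the domain names in the example are hypothetical.

The model does not cover the case where a configured domain name does not exist on the device.

```python
from typing import Optional


def select_mac_auth_domain(port_domain: Optional[str],
                           system_domain: Optional[str],
                           default_domain: Optional[str],
                           if_unknown_domain: Optional[str]) -> Optional[str]:
    """Return the domain whose authentication scheme should be checked, or None."""
    for candidate in (port_domain,        # Domain specified on the port
                      system_domain,      # Domain specified in system view
                      default_domain,     # Default authentication domain of the system
                      if_unknown_domain): # Domain configured by domain if-unknown
        if candidate:
            return candidate
    return None  # No usable domain: the user cannot perform authentication


print(select_mac_auth_domain(None, "d2", "dom-default", None))  # d2
```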

5. Check whether the RADIUS server is responsive.

For more information, see "RADIUS server not responding" in the AAA troubleshooting procedures.

6. Check whether the authentication request is rejected:

a. Execute the debugging mac-authentication event command to enable debugging for MAC authentication events.

- If the system displays Local authentication request was rejected., the local authentication request was rejected. Causes of local authentication rejection include a nonexistent local user, an incorrect user password, and an incorrect service type.

- If the system displays The RADIUS server rejected the authentication request., the RADIUS server rejected the request. Common causes of rejection by the server include a missing username on the server, inconsistent username formats, an incorrect password, and a RADIUS server policy mismatch.

Execute the debugging radius error command on the device to enable debugging for RADIUS errors. You can also execute the test-aaa command to perform a RADIUS request test on the device. After identifying the issue, adjust the server, device, and client configurations accordingly.

b. Execute the display aaa online-fail-record command and view the authentication failure reasons displayed in the Online failure reason field. For more information, see AAA troubleshooting procedures.

7. Check whether authorization attributes failed to be deployed.

Execute the debugging mac-authentication event command to enable debugging for MAC authentication events. If the device displays Authorization failure., authorization has failed.

a. Check whether the authorization-fail-offline feature has been configured in system view by using the authorization fail user offline command. By default, users stay online when authorization fails. If this feature is not configured, the authentication failure is not caused by an authorization failure. Proceed with the other steps.

b. If the authorization-fail-offline feature is configured, execute the mac-authentication access-user log enable failed-login command to enable logging for MAC authentication user login failures. You can identify the attributes (such as authorization ACL and VLAN) that failed to be deployed from the logs.

c. Check the authorization attribute settings on the server, and make sure the authorization attributes deployed by the server are correct.

d. Execute commands such as display acl or display vlan to check whether the corresponding authorization attributes exist on the device. If the attributes do not exist, create relevant authorization attributes on the device, and make sure that the user can obtain the authorization information.

8. Check whether the user's MAC address has been set as a silent MAC address.

Execute the display mac-authentication command and view the Silent MAC users field. If the user's MAC address is a silent MAC address, wait for the quiet timer to expire before performing MAC authentication again. You can change the quiet timer by using the mac-authentication timer quiet command.

9. Check whether the number of concurrent online MAC authentication users on the interface has reached the maximum:

a. Execute the display mac-authentication command to view information on the authentication interface. View the maximum number of concurrent online users allowed on the interface displayed in the Max online users field and the number of current online users displayed in the Current online users field. Compare the two numbers to determine whether the number of concurrent online MAC authentication users on the interface has reached the maximum.

b. If the maximum number of concurrent online users has been reached, execute the mac-authentication max-user command to increase the maximum number of concurrent MAC authentication users on the interface.

c. If the maximum number of concurrent MAC authentication users cannot be increased, wait for other users to go offline or use a different port for user access.

10. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

○ Log information collected after you execute the mac-authentication access-user log enable command.

○ Debugging information collected after you execute the debugging mac-authentication all and debugging radius all commands.

Related alarm and log messages

Alarm messages

N/A

Log messages

· MACA_ENABLE_NOT_EFFECTIVE

· MACA_LOGIN_FAILURE

MAC authentication user disconnections

Symptom

A MAC authentication user is disconnected unexpectedly after passing authentication and coming online.

Common causes

The following are the common causes of this type of issue:

· The user has come online using 802.1X authentication.

· MAC authentication-related configurations on the device have changed.

· Real-time accounting for the MAC authentication user has failed.

· The user failed MAC reauthentication.

· The server forced the user offline.

· Offline detection logged the user off.

· The user session has timed out.

Troubleshooting flow

Figure 155 shows the troubleshooting flowchart.

Figure 155 Flowchart for troubleshooting MAC authentication user disconnections

Solution

CAUTION:

· Do not enable the debugging commands when the device is operating normally. Enable the commands when you reproduce the issue after it has occurred.

· Save the execution results of the following steps promptly, so that information can be quickly collected and provided if the issue persists.

1. Check whether disconnection occurs because the user has come online after passing 802.1X authentication.

By default, the authentication order on a port is 802.1X authentication, MAC authentication, and then Web authentication.

If the user first passes MAC authentication, Web authentication is terminated immediately, but 802.1X authentication will proceed. If the user also passes 802.1X authentication, the 802.1X authentication information will overwrite the MAC authentication information of the user.

Execute the display dot1x connection command to check whether the user has successfully passed 802.1X authentication and come online. If the user has come online, determine whether the user needs to come online again through MAC authentication. If MAC authentication is required, log off the user, disable 802.1X authentication, and then configure the user to perform MAC authentication.

2. Check whether MAC authentication-related configurations on the device have changed:

a. Execute the display mac-authentication command to check whether the configurations (such as feature enabling and authentication method) related to MAC authentication on the device have changed.

b. Execute the display domain command to check whether the configurations (such as authorization attributes) in the user authentication domain have changed.

3. Check whether real-time accounting failed.

Execute the debugging mac-authentication event command to enable debugging for MAC authentication events. If the system prompts Real-time accounting failure., real-time accounting has failed. Check the link state between the device and the accounting server, and check whether the accounting configurations on the device or the accounting server have changed.

4. Check whether disconnection occurs because of a reauthentication failure:

a. Execute the display mac-authentication command and view the information displayed in the Periodic reauth field to check whether MAC reauthentication is enabled on the authentication interface.

b. Execute the mac-authentication access-user log enable logoff command to enable logging for MAC authentication user logoffs.

c. Identify the reasons for the reauthentication failure as described in "MAC authentication failures."

5. Check whether the RADIUS server forced the user offline.

Execute the debugging mac-authentication event command to enable debugging for MAC authentication events. If the system prompts The RADIUS server forcibly logged out the user., the server has forced the user offline. Contact the server administrator to identify why the server logged off the user.

6. Check whether the user went offline because no packet was received before the offline detect timer expired:

a. Execute the display mac-authentication command and view the information displayed in the Offline detection field on the authentication interface to check whether the offline detection has been enabled.

b. Execute the debugging mac-authentication event command to enable debugging for MAC authentication events. If the system prompts Offline detect timer expired., no packet was received from the online MAC authentication user on the interface before the offline detect timer expired, so the device logged the user off.

c. Check the link state between the user client and the device to identify why the client's packets did not reach the device.

7. Check whether the user session has timed out:

a. Execute the debugging radius packet command to enable RADIUS packet debugging. Check whether the responses from the server carry the Session-Timeout attribute.

b. Execute the debugging mac-authentication event command to enable debugging for MAC authentication events. If the system prompts User session timed out., the user went offline because the user session timed out.

c. Disconnections caused by user session timeouts are normal. Users can reinitiate a request to come online.
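Step 7 looks for the Session-Timeout attribute in the RADIUS packet debugging output. As a rough illustration of what that attribute looks like in a response, the following Python sketch walks the attribute section of a RADIUS packet and returns the Session-Timeout value. It assumes the standard RFC 2865 encoding (attribute type 27, a 4-byte unsigned integer in seconds). The helper name session_timeout and the sample bytes are hypothetical and were not captured from a device.

```python
import struct
from typing import Optional

SESSION_TIMEOUT = 27  # RFC 2865 attribute type for Session-Timeout


def session_timeout(attributes: bytes) -> Optional[int]:
    """Return the Session-Timeout (seconds) from a RADIUS attribute section, or None."""
    offset = 0
    while offset + 2 <= len(attributes):
        attr_type, attr_len = attributes[offset], attributes[offset + 1]
        if attr_len < 2 or offset + attr_len > len(attributes):
            raise ValueError("malformed attribute at offset %d" % offset)
        if attr_type == SESSION_TIMEOUT and attr_len == 6:
            # The value is a 4-byte unsigned integer in network byte order.
            return struct.unpack("!I", attributes[offset + 2:offset + 6])[0]
        offset += attr_len  # move to the next type-length-value entry
    return None


# Hypothetical attribute section: User-Name "alice" (type 1), then Session-Timeout = 3600.
sample = bytes([1, 7]) + b"alice" + bytes([27, 6]) + struct.pack("!I", 3600)
print(session_timeout(sample))  # 3600
```

If the function returns None for a captured response, the server did not send Session-Timeout, and a session timeout is unlikely to be the cause of the disconnection.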

8. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, log messages, and alarm messages.

○ Offline reasons displayed after you execute the display aaa abnormal-offline-record or display aaa normal-offline-record command.

○ Log information collected after you execute the mac-authentication access-user log enable command.

○ Debugging information collected after you execute the debugging mac-authentication all and debugging radius all commands.

Related alarm and log messages

Alarm messages

N/A

Log messages

· MACA_LOGOFF

Password control issues

Password change required upon admin login

Symptom

When an administrator logs in to the device through local authentication, the system identifies that the password strength does not meet the requirements and prompts the administrator to change the current login password.

Common causes

The following are the common causes of this type of issue:

· The password control settings configured in local user view enforce a strict password strength check.

· The password control settings configured in local user group view enforce a strict password strength check.

· The password control settings configured in system view enforce a strict password strength check.

Troubleshooting flow

Figure 156 shows the troubleshooting flowchart.

Figure 156 Flowchart for troubleshooting password change upon the login of an administrator

Solution

1. Identify whether to reduce the current password check strength.

With the global password control feature enabled, when a device management user who logs in through Telnet, SSH, HTTP, or HTTPS enters the login password, the system checks the password against the password restrictions: the password composition policy, the minimum password length, and the password complexity policy. If the password does not meet these restrictions, the system considers it weak. For information about password control, see Security Configuration Guide.

By default, when a user logs in to the device with a weak password, the system generates an alarm message. If the current password check is stricter than your login control requirements, identify the scope of the change (a local user, a user group, or all local users). Then, perform the corresponding step below to reduce the password check strength in that view.

2. Reduce the password check strength of password control for the local user.

Execute the local-user command to enter local user view and perform the following operations:

○ Execute the password-control composition command to configure the password composition policy. In this example, a password must contain at least four character types, with at least five characters of each type.

○ Execute the password-control length command to set the minimum password length. In this example, the minimum password length is 16 characters.

○ Execute the password-control complexity command to configure the password complexity policy. In this example, the device checks whether a password contains the username.

<Sysname> system-view

[Sysname] local-user test class manage

[Sysname-luser-manage-test] password-control composition type-number 4 type-length 5

[Sysname-luser-manage-test] password-control length 16

[Sysname-luser-manage-test] password-control complexity user-name check
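The three settings in the example above can be approximated by the following Python sketch, which also shows how lowering type-number, type-length, or the minimum length relaxes the check. It is a simplification, not the device's algorithm. It assumes the four character types are uppercase letters, lowercase letters, digits, and other (special) characters, and it treats the username check as a plain case-sensitive substring test. The function names are hypothetical.

```python
import string


def char_type_counts(password: str) -> dict:
    """Count the characters of each assumed type in the password."""
    counts = {"upper": 0, "lower": 0, "digit": 0, "special": 0}
    for ch in password:
        if ch in string.ascii_uppercase:
            counts["upper"] += 1
        elif ch in string.ascii_lowercase:
            counts["lower"] += 1
        elif ch in string.digits:
            counts["digit"] += 1
        else:
            counts["special"] += 1
    return counts


def meets_policy(password: str, username: str, type_number: int = 4,
                 type_length: int = 5, min_length: int = 16,
                 check_username: bool = True) -> bool:
    """Approximate composition, length, and username-complexity checks."""
    if len(password) < min_length:
        return False
    # Only character types with at least type_length characters count toward type_number.
    qualified = [n for n in char_type_counts(password).values() if n >= type_length]
    if len(qualified) < type_number:
        return False
    if check_username and username and username in password:
        return False
    return True


print(meets_policy("ABCDEabcde12345!@#$%", "test"))  # True
print(meets_policy("ABCDEabcde12345!@#$", "test"))   # False: only four special characters
```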

3. Reduce the password check strength of password control for the user group.

Execute the user-group command to enter user group view and perform the following operations:

○ Execute the password-control composition command to configure the password composition policy for the user group.

○ Execute the password-control length command to set the minimum password length.

○ Execute the password-control complexity command to configure the password complexity policy.

4. Reduce the password check strength of password control for all local users.

In system view, perform the following operations:

○ Execute the password-control composition command to configure the password composition policy.

○ Execute the password-control length command to set the minimum password length.

○ Execute the password-control complexity command to configure the password complexity policy.

5. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration files, diagnostics information, and prompt messages of the device.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Failure to create a local user or configure a user password

Symptom

When you fail to create a local user, the system generates the Add user failed message.

When you fail to configure the local user password, the system generates the Operation failed message.

Common causes

The following are the common causes of this type of issue:

· The memory usage of the device has reached the specified threshold.

· The local file system of the device is running out of storage space.

· The local lauth.dat file on the device is abnormal.

Troubleshooting flow

Figure 157 shows the troubleshooting flowchart.

Figure 157 Flowchart for troubleshooting failure to create a local user or configure a user password

Solution

1. Check whether the free memory on the device has dropped to a memory alarm threshold.

If the failure occurs when you configure the local user's password, proceed directly to step 2.

Execute the display memory-threshold command to view the memory alarm thresholds and statistics, including the current free-memory state of the system. While the system memory is in the minor, severe, or critical alarm state, local users cannot be created.

<Sysname> display memory-threshold

Memory usage threshold: 100%

Free-memory thresholds:

Minor: 96M

Severe: 64M

Critical: 48M

Normal: 128M

Early-warning: 144M

Secure: 160M

Current free-memory state: Normal (secure)

...

You can execute the monitor process command to check the process statistics in any view. Enter m to locate the processes that are consuming excessive memory resources, sorted by memory usage. If necessary, clean up the memory space. After the memory alarm state is cleared, try again to create local users.
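The rule in step 1 can be sketched in Python as follows, using the Minor, Severe, and Critical thresholds from the sample output above. This is a simplification under stated assumptions: it treats free memory at or below a threshold as being in that state, and it ignores the other values in the output (Normal, Early-warning, and Secure) and how the device recovers from an alarm state. The function names are hypothetical.

```python
# Free-memory alarm thresholds (MB) taken from the sample display memory-threshold output,
# ordered from most to least severe.
THRESHOLDS_MB = [("critical", 48), ("severe", 64), ("minor", 96)]
BLOCKING_STATES = {"minor", "severe", "critical"}


def free_memory_state(free_mb: int) -> str:
    """Return the most severe state whose threshold the free memory has reached (simplified)."""
    for state, limit in THRESHOLDS_MB:
        if free_mb <= limit:
            return state
    return "normal"


def can_create_local_user(free_mb: int) -> bool:
    """Local users cannot be created in the minor, severe, or critical state."""
    return free_memory_state(free_mb) not in BLOCKING_STATES


print(free_memory_state(200), can_create_local_user(200))  # normal True
print(free_memory_state(60), can_create_local_user(60))    # severe False
```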

2. Identify whether the storage space of the local file system on the device is insufficient.

If any of the following log messages are output on the device, the issue is caused by a file system error:

○ PWDCTL/3/PWDCTL_FAILED_TO_OPENFILE: Failed to create or open the password file.

○ PWDCTL/3/PWDCTL_FAILED_TO_WRITEPWD: Failed to write the password records to file.

○ PWDCTL/3/PWDCTL_NOENOUGHSPACE: Not enough free space on the storage media where the file is located.

Execute the dir command in user view to check the remaining capacity of the local storage media (such as the flash). If there is not enough free space, delete unnecessary files.

3. Check whether the local lauth.dat file is normal.

After the global password control feature is enabled, the device automatically generates the lauth.dat file to record the authentication and login information of local users. If this file is manually deleted or edited, local authentication becomes abnormal. Execute the dir command in user view to check whether the lauth.dat file exists in the local storage media, such as the flash.

<Sysname> dir

Directory of flash: (EXT4)

0 drw- - Aug 16 2021 11:45:37 core

1 drw- - Aug 16 2021 11:45:42 diagfile

2 drw- - Aug 16 2021 11:45:57 dlp

3 -rw- 713 Aug 16 2021 11:49:41 ifindex.dat

4 -rw- 12 Sep 01 2021 02:40:01 lauth.dat

...

If this file is absent, is 0 bytes in size, or is very small (smaller than 20 bytes, which might indicate an anomaly), contact Technical Support. If the current configuration is urgently required, you can try to resolve this issue by disabling and then re-enabling the global password control feature.

<Sysname> system-view

[Sysname] undo password-control enable

[Sysname] password-control enable

If this issue is resolved, you can try to re-create the local user or configure the user password.
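When reviewing collected information, the check in step 3 can be scripted against saved dir output. The following Python sketch finds lauth.dat in a dir listing and applies the 20-byte guideline literally. The line pattern is inferred from the sample output above, and the function names are hypothetical. Adjust the pattern if your output differs.

```python
import re
from typing import Optional

# A file line in dir output, for example: "4 -rw- 12 Sep 01 2021 02:40:01 lauth.dat".
# Directory lines start with "d" in the permission field and do not match.
FILE_LINE = re.compile(r"^\s*\d+\s+-\S*\s+(\d+)\s+.*\s(\S+)\s*$")
MIN_EXPECTED_BYTES = 20  # size guideline from the text above


def lauth_size(dir_output: str) -> Optional[int]:
    """Return the size of lauth.dat in bytes, or None if it is not listed."""
    for line in dir_output.splitlines():
        match = FILE_LINE.match(line)
        if match and match.group(2) == "lauth.dat":
            return int(match.group(1))
    return None


def lauth_suspicious(dir_output: str) -> bool:
    """True if lauth.dat is missing, empty, or smaller than the guideline size."""
    size = lauth_size(dir_output)
    return size is None or size < MIN_EXPECTED_BYTES


listing = """Directory of flash: (EXT4)
   0 drw-           - Aug 16 2021 11:45:37   core
   3 -rw-         713 Aug 16 2021 11:49:41   ifindex.dat
   4 -rw-          12 Sep 01 2021 02:40:01   lauth.dat"""
print(lauth_size(listing), lauth_suspicious(listing))  # 12 True
```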

4. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration files, diagnostics information, and prompt messages of the device.

Related alarm and log messages

Alarm messages

N/A

Log messages

· PWDCTL/3/PWDCTL_FAILED_TO_WRITEPWD

· PWDCTL/3/PWDCTL_FAILED_TO_OPENFILE

· PWDCTL/3/PWDCTL_NOENOUGHSPACE

Admin login failure due to idle timeout

Symptom

When an administrator uses local authentication to log in to the device, the login might fail because the account idle time has expired. The system displays the message Failed to login because the idle timer expired.

Common causes

This issue is caused by the account idle time restriction. If a user has not logged in successfully within the configured idle time since the last login, the account becomes invalid as soon as the idle time expires, and the system no longer permits the user to log in with that account.
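The idle-time rule can be summarized by the following Python sketch. It assumes the idle time is configured in days and that an account is blocked once more than that many days have passed since the last successful login; the exact boundary handling on the device might differ. The behavior of the value 0 (restriction disabled) comes from step 1 of the solution. The function name and dates are hypothetical.

```python
from datetime import datetime, timedelta


def blocked_by_idle_time(last_login: datetime, now: datetime, idle_days: int) -> bool:
    """Return True if the account is invalid because its idle time has expired (simplified)."""
    if idle_days == 0:
        return False  # an idle time of 0 disables the restriction
    return now - last_login > timedelta(days=idle_days)


last = datetime(2024, 1, 1)
print(blocked_by_idle_time(last, datetime(2024, 2, 1), 90))   # False
print(blocked_by_idle_time(last, datetime(2024, 6, 1), 90))   # True
# Setting the system clock back (the SNMP workaround in step 2) shrinks now - last_login.
print(blocked_by_idle_time(last, datetime(2024, 1, 15), 90))  # False
```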

Troubleshooting flow

Figure 158 shows the troubleshooting flowchart.

Figure 158 Flowchart for troubleshooting login failure of an administrator due to idle timeout

Solution

1. Check whether other administrators or login methods can access the device.

○ If other administrators or methods (such as console login) can access the device, only the target user is blocked. After logging in, another administrator can delete and re-create this local user, or change the idle time of the user account by using the password-control login idle-time command. If the idle time is set to 0, the account idle time restriction is disabled.

○ If no other administrators or methods can access the device, proceed to step 2.

2. Check whether SNMP is enabled on the device.

You can attempt to log in to the device through the network management system (NMS):

○ If SNMP is enabled, use the MIB to set the system time back to a point at which the account's idle time has not yet expired, and then log in to the device with this administrator account. The MIB node for changing the system time is hh3cSysLocalClock (1.3.6.1.4.1.25506.2.3.1.1.1) in HH3C-SYS-MAN-MIB.

After the administrator logs in successfully, restore the system time and disable the idle timeout check for user accounts.

○ If SNMP is disabled, the MIB is not available. You can try to restart the device and enter the EXTENDED-BOOTWARE menu. Then, select the option to bypass console authentication or the option to bypass the configuration file to access the system. As a best practice, contact Technical Support to perform this step.

3. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ The configuration file, diagnostics information, and prompt messages of the device.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Portal issues

Portal authentication page pushing failures

Symptom

When a user accesses any webpage that is not the portal Web server page, or directly accesses the portal Web server page, no portal authentication page is pushed for the user.

Common causes

The following are the common causes of this type of issue:

· The host, server, and device cannot reach one another.

· HTTP proxy has been enabled on the browser.

· The webpage address entered by the user contains a non-standard TCP port number (neither 80 nor 443).

· Exceptions occur on the intermediate network or DNS server.

· HTTPS redirect on the device is not working properly.

· HTTP Strict Transport Security (HSTS) has been enabled on the HTTPS website the user accesses.

· The portal server cannot identify the escaped characters of special characters in the URL.

· The portal server is configured incorrectly.

Troubleshooting flow

Figure 159 shows the troubleshooting flowchart.

Figure 159 Flowchart for troubleshooting portal authentication page pushing failures

Solution

1. Check whether the route configurations on the client and portal server are correct.

After disabling the firewall on the client, use the ping command to check whether the portal server is reachable. If the server cannot be pinged, check whether the route configurations on the client and the portal server are correct, as follows:

○ Check whether the return route from the portal server to the client is configured correctly.

○ Check whether multiple NICs are present on the client or the portal server.

If multiple NICs exist, some traffic between the client and the server might not pass through the network configured with portal authentication. Identify the NIC from which the user's Web access traffic is sent out. For example, if a Windows client is used, execute the route print command in the CMD window to view specific route information and identify the NIC.

Finally, use the ping command to test the connectivity between each pair of devices along the network path to locate the issue. First, ping the gateway from the client (authentication must be disabled for this ping to succeed), and then ping the server from the gateway.

2. Check whether HTTP proxy has been enabled on the browser of the client.

If HTTP proxy is enabled on the browser, users might be unable to access the portal authentication page, and you must disable HTTP proxy. For example, open the Windows Internet Explorer browser, click Tools, select Internet Options > Connections > LAN Settings, and then clear the Use a proxy server for your LAN option in the Proxy server area.

3. Check whether the entered address includes a non-standard TCP port number.

Non-standard TCP port numbers are port numbers other than 80 and 443. If the webpage address entered by the user includes a non-standard TCP port number, such as http://10.1.1.1:18008, the portal authentication page might fail to pop up. Use port 80 for HTTP addresses and port 443 for HTTPS addresses.
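The port rule above can be checked mechanically. The following Python sketch uses urllib.parse.urlsplit from the standard library to flag URLs whose explicit port differs from the default port of their scheme (80 for HTTP, 443 for HTTPS). The function name is hypothetical, and the sketch only inspects the URL text.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def uses_standard_port(url: str) -> bool:
    """Return True if the URL has no explicit port or uses its scheme's default port."""
    parts = urlsplit(url)
    port = parts.port  # None when the URL does not specify a port
    return port is None or port == DEFAULT_PORTS.get(parts.scheme)


print(uses_standard_port("http://10.1.1.1:18008"))  # False
print(uses_standard_port("http://10.1.1.1"))        # True
print(uses_standard_port("https://10.1.1.1:443/"))  # True
```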

4. Check whether exceptions have occurred on the intermediate network or DNS server:

a. Check whether the DNS server IP address is configured as a permitted address on the device.

b. Check the connectivity of the intermediate network and troubleshoot DNS server issues. On the gateway, collect traffic statistics on the downlink interface connecting the client and the uplink interface connecting the DNS server, or mirror and capture the client's packets destined for the DNS server. Determine whether the gateway has sent DNS requests but received no responses.

5. Check whether HTTPS redirect is working properly:

a. Check whether the SSL server policy associated with the HTTPS redirect server exists. If not, complete the relevant configuration.

6. Check whether HSTS has been enabled on the HTTPS website.

With HSTS enabled, an HTTPS website requires browsers to access it through HTTPS with a valid certificate. When the device redirects the user's browser over HTTPS, it does not have the target website's certificate, so it uses a self-signed certificate to impersonate the target website and establish an SSL connection with the browser. If the browser considers the certificate untrusted, HTTPS redirect fails and the portal authentication page does not pop up. This behavior results from the HSTS enforcement requirements of the website and cannot be resolved on the device. In this case, as a best practice, try accessing other websites.

7. Check whether the portal server can identify escaped special characters in the URL.

In actual applications, some portal Web servers cannot identify the escaped forms of the special characters $-_.+!*'();,/?:@ (in any combination), so they cannot correctly provide the Web authentication page to users. To resolve this issue, execute the portal url-unescape-chars command to specify the special characters to be left unescaped.

# Configure the unescaped special characters in redirect portal Web server URLs as ;().

<Sysname> system-view

[Sysname] portal url-unescape-chars ;()
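To picture what this setting changes, the following Python sketch percent-encodes a hypothetical user URL with urllib.parse.quote and then restores only the characters ;, (, and ) from the example. It assumes the device embeds the original URL in the redirect URL in percent-encoded form; the exact encoding the device applies is not described here, and the helper name unescape_chars is hypothetical.

```python
from urllib.parse import quote


def unescape_chars(encoded_url: str, chars: str) -> str:
    """Turn the percent-encoded forms of the given ASCII characters back into literals."""
    for ch in chars:
        token = "%{:02X}".format(ord(ch))  # for example, ';' -> '%3B'
        # Accept both uppercase and lowercase hex digits.
        encoded_url = encoded_url.replace(token, ch).replace(token.lower(), ch)
    return encoded_url


original = "http://www.example.com/a(1);b"  # hypothetical URL visited by the user
escaped = quote(original, safe="")
print(escaped)                          # http%3A%2F%2Fwww.example.com%2Fa%281%29%3Bb
print(unescape_chars(escaped, ";()"))   # http%3A%2F%2Fwww.example.com%2Fa(1);b
```

Characters not listed (here :, and /) stay escaped, which mirrors configuring only ;() in the command above.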

8. Check whether the portal server configuration is correct:

○ Check whether an IP address group is configured on the portal server and whether the device is associated with an IP address group.

○ Check whether the client IP address is within the range of the IP address group configured on the portal server.
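The address group check in step 8 is an inclusive range test. The following Python sketch uses the standard ipaddress module. The start and end addresses stand in for an IP address group on the portal server and are hypothetical.

```python
from ipaddress import ip_address


def in_ip_group(client_ip: str, start_ip: str, end_ip: str) -> bool:
    """Return True if client_ip falls within the inclusive range start_ip..end_ip."""
    return ip_address(start_ip) <= ip_address(client_ip) <= ip_address(end_ip)


# Hypothetical IP address group 192.168.10.1 to 192.168.10.254.
print(in_ip_group("192.168.10.25", "192.168.10.1", "192.168.10.254"))  # True
print(in_ip_group("192.168.20.25", "192.168.10.1", "192.168.10.254"))  # False
```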

9. If the issue persists, collect the following information and contact Technical Support:

○ Results of each step.

○ Device configuration files, log information, and alarm messages.

○ Screenshots of portal-related configurations on the server.

○ Files containing the packets captured between the device and the server.

○ Screenshots of the issue taken on the client's browser.

○ Portal filtering rules for packet matching displayed after you execute the display portal rule command.

○ Debugging information collected after you execute the debugging portal or debugging ip packet command.

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Portal authentication failures

Symptom

Failures or exceptions occur in portal authentication for a user.

Common causes

The following are the common causes of this type of issue:

· The shared key configured in portal authentication server view on the device is inconsistent with that configured on the portal authentication server.

· The portal authentication server address configured in portal authentication server view on the device is incorrect.

· The portal packets are invalid.

· The authentication domain used by the portal user is configured incorrectly.

· The shared key configured in RADIUS scheme view on the device is inconsistent with that configured on the RADIUS server.

· The device failed to obtain the physical information of the user.

· The RADIUS server has denied the authentication.

· The RADIUS server is unresponsive.

· The authorization ACL or user profile failed to be deployed.

Troubleshooting flow

Figure 160 shows the troubleshooting flowchart.

Figure 160 Flowchart for troubleshooting portal authentication failures

Solution

1. Check whether the shared key configured in portal authentication server view on the device is consistent with that configured on the portal authentication server.

If the IMC server is used, enter the username and password, click Log In, and check whether a message indicating that the request to the device timed out is displayed. If so, the shared key configured in portal authentication server view on the device might be inconsistent with that configured on the server.

You can troubleshoot using the following methods:

○ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, the shared key configured on the device is inconsistent with that configured on the portal server.

*Jul 28 17:51:20:774 2021 Sysname PORTAL/7/ERROR: -MDC=1; Packet validity check failed due to invalid key.

○ Execute the display portal auth-error-record command to check whether the following information is displayed in the Auth error reason field of the command output: Packet validity check failed due to invalid authenticator.

If the shared keys are inconsistent, modify the shared key configured in portal authentication server view on the device or on the portal authentication server to ensure consistency.

2. Check whether the address of the portal authentication server configured in portal authentication server view on the device exists.

When the device receives an authentication packet sent by the portal server, it validates whether the source IP address of the packet is in the list of portal authentication server addresses configured on the device. If not, the device considers the authentication packet to be invalid and drops it.

If the IMC server is used, enter the username and password, click Log In, and check whether a message indicating that the request to the device timed out is displayed. If so, the authentication server address configured in portal authentication server view on the device might not exist.

You can troubleshoot using the following methods:

○ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, the IP address of the portal authentication server configured on the device is incorrect.

*Jul 28 19:15:10:665 2021 Sysname PORTAL/7/ERROR: -MDC=1;Packet source unknown. Server IP:192.168.161.188, VRF Index:0.

○ Execute the display portal auth-error-record command to check whether the following information is displayed in the Auth error reason field of the command output: Packet source unknown. Server IP:X.X.X.X, VRF index:0.

If the address is incorrect, execute the ip command in portal authentication server view on the device to modify the portal server's IP address.

3. Check whether the portal packets are invalid.

Upon receiving a portal packet from the portal server, the device performs a validity check on the packet. If the packet length is incorrect or the packet checksum contains errors, the packet is considered invalid and dropped.

You can check whether the portal packets are invalid by using the following methods:

○ Execute the display portal packet statistics command to check whether invalid packets exist and whether the number of invalid packets is increasing. If invalid packets exist, execute the debugging portal error command on the device to enable portal error debugging for troubleshooting.

○ Execute the display portal auth-error-record command to check whether the following information is displayed in the Auth error reason field of the command output: Packet type invalid or Packet validity check failed because packet length and version don't match.

If the portal packets are invalid, identify the reason for invalidity and make modifications accordingly.

4. Check the authentication domain configuration used by the portal user:

The device selects the authentication domain for a portal user in this order:

a. ISP domain specified for the interface.

b. ISP domain carried in the username.

c. System default ISP domain.

If the chosen domain does not exist on the device, the device searches for the ISP domain configured to accommodate users assigned to nonexistent domains. If no such ISP domain is configured, user authentication fails.
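The selection order above can be expressed as the following Python sketch. It assumes usernames carry the domain in the user@domain form, and it represents the domain for users assigned to nonexistent domains as an optional parameter. The function and parameter names are hypothetical.

```python
from typing import Optional, Set


def select_domain(interface_domain: Optional[str], username: str, default_domain: str,
                  configured: Set[str], nonexistent_fallback: Optional[str] = None) -> Optional[str]:
    """Return the ISP domain used to authenticate a portal user, or None if authentication fails."""
    _, sep, name_domain = username.partition("@")  # assumes "user@domain" usernames
    if interface_domain:
        chosen = interface_domain  # a. ISP domain specified for the interface
    elif sep and name_domain:
        chosen = name_domain       # b. ISP domain carried in the username
    else:
        chosen = default_domain    # c. system default ISP domain
    if chosen in configured:
        return chosen
    # The chosen domain does not exist: use the domain for nonexistent-domain users, if any.
    return nonexistent_fallback


domains = {"portal", "corp", "system"}
print(select_domain("portal", "alice@corp", "system", domains))  # portal
print(select_domain(None, "alice@corp", "system", domains))      # corp
print(select_domain(None, "bob@ghost", "system", domains))       # None
```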

Execute the display portal command to check whether an authentication domain is used on the authentication interface.

○ If an authentication domain is used, check whether the authentication domain exists on the device, and whether the authentication, authorization, and accounting settings in the domain are configured correctly.

○ If no authentication domain is used, check whether the domain included in the username exists. If the domain does not exist, check whether the default authentication domain exists and whether the configuration in the default authentication domain is correct.

If the IMC server is used, enter the username and password, click Log In, and check whether a message indicating that the request was rejected is displayed. If so, the authentication domain configuration on the device might be incorrect.

You can troubleshoot using the following method:

○ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, the authentication domain is configured incorrectly on the device and further troubleshooting is required.

*Jul 28 19:49:12:725 2021 Sysname PORTAL/7/ERROR: -MDC=1; User-SM [21.0.0.21]: AAA processed authentication request and returned error.

○ Execute the display portal auth-error-record command to check whether the following information is displayed in the Auth error reason field of the command output: AAA authentication failed or AAA returned an error.

If the authentication domain is configured incorrectly, correct the domain configuration so that the portal user uses the intended authentication domain.

5. Check whether the shared key configured in RADIUS scheme view on the device is consistent with that configured on the RADIUS server.

If the IMC server is used, enter the username and password, click Log In, and check whether a message indicating that the request to the device timed out is displayed. If so, the shared key configured in RADIUS scheme view might be inconsistent with that configured on the server.

Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, it indicates that the shared key configured in RADIUS scheme view is inconsistent with that configured on the server.

*Jul 28 19:49:12:725 2021 Sysname RADIUS/7/ERROR: -MDC=1; The response packet has an invalid Response Authenticator value.

When the device initiates an authentication request to the RADIUS server, the server first validates the shared key used in the request. If the validation fails, the server notifies the device of the failure. If the shared keys are inconsistent, modify the configuration so that the shared key configured in RADIUS scheme view is consistent with that configured on the server.
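The debug message in this step refers to the Response Authenticator. Per RFC 2865, the server computes it as an MD5 hash over the response packet, the Request Authenticator of the original request, and the shared key, and the receiver recomputes it with its own configured key, so any key mismatch makes the two values differ. The following Python sketch illustrates that check. The keys, identifier, and packet are hypothetical, and the sketch is not the device's implementation.

```python
import hashlib
import hmac
import struct


def response_authenticator(code: int, identifier: int, attributes: bytes,
                           request_authenticator: bytes, secret: bytes) -> bytes:
    """RFC 2865: MD5(Code + Identifier + Length + Request Authenticator + Attributes + Secret)."""
    length = 20 + len(attributes)  # 20-byte header plus attributes
    header = struct.pack("!BBH", code, identifier, length)
    return hashlib.md5(header + request_authenticator + attributes + secret).digest()


def response_is_valid(packet: bytes, request_authenticator: bytes, secret: bytes) -> bool:
    """Recompute the Response Authenticator with the local shared key and compare."""
    code, identifier, length = struct.unpack("!BBH", packet[:4])
    received = packet[4:20]
    attributes = packet[20:length]
    expected = response_authenticator(code, identifier, attributes, request_authenticator, secret)
    return hmac.compare_digest(received, expected)


# Hypothetical exchange: the server signs an Access-Accept (code 2) with its own key.
request_auth = bytes(16)  # real Request Authenticators are random; zeros keep the demo simple
attrs = b""
server_auth = response_authenticator(2, 7, attrs, request_auth, b"server-key")
packet = struct.pack("!BBH", 2, 7, 20 + len(attrs)) + server_auth + attrs

print(response_is_valid(packet, request_auth, b"server-key"))  # True
print(response_is_valid(packet, request_auth, b"device-key"))  # False
```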

6. Check whether the device failed to obtain physical information about the user.

When a user comes online, the portal module searches for the user's physical information and uses it to identify information such as the interface through which the user accesses the network. If the search for physical information fails, the user fails to come online.

You can troubleshoot using the following method:

○ Execute the debugging portal event command on the device to enable portal event debugging. If the following information is displayed on the device, the device failed to obtain physical information about the user.

*Jul 28 19:49:12:725 2021 Sysname PORTAL/7/ERROR: -MDC=1; User-SM [21.0.0.21]: Failed to find physical info for ack_info.

○ Execute the display portal auth-error-record or display portal auth-fail-record command to check whether the following information is displayed in the Auth error reason field of the command output: Failed to obtain user physical information or Failed to get physical information.

After you confirm that the device failed to obtain the user's physical information, check whether an entry for the authentication user exists on the device. If no entry exists, go to the next step.

7. Check whether the RADIUS server has rejected the authentication:

a. Many reasons might cause the RADIUS server to reject the authentication of a user. The most common ones are an incorrect username or password and a failure to match the RADIUS server's authorization policy. To resolve these issues, first check the authentication logs on the server, or enable RADIUS error debugging on the device by using the debugging radius error command and view the relevant debugging information. After identifying the root causes, adjust the configurations on the server, client, or device accordingly.

b. Execute the display portal auth-fail-record command to identify the portal authentication failure reason for the user displayed in the Auth error reason field of the command output.

8. Check whether the RADIUS server is unresponsive.

You can troubleshoot using the following methods:

¡ Execute the display radius scheme command and view the server's state displayed in the State field. If the state is Blocked, it indicates the server is unavailable.

¡ Check whether the device prints the following log:

RADIUS/4/RADIUS_AUTH_SERVER_DOWN: -MDC=1; RADIUS authentication server was blocked: server IP=192.168.161.188, port=1812, VPN instance=public.

¡ Execute the debugging radius event command on the device to enable debugging for RADIUS events. If the following information is printed on the device, it indicates that the RADIUS server is unresponsive.

*Jul 28 19:49:12:725 2021 Sysname RADIUS/7/evnet: -MDC=1; Reached the maximum retries.

After confirming that the RADIUS server is unresponsive, proceed with the following steps:

a. Check whether the device's IP address has been added on the server.

- If not, add the device's IP address on the server.

- If yes, make sure the device IP address added on the server is the same as the source IP address of the authentication requests. By default, the source IP address of RADIUS packets sent to the RADIUS server is the IP address of the outgoing interface for these packets.

b. Capture packets on both the device and the server, and check whether faults have occurred on the intermediate links. For example, a firewall in the intermediate network might block RADIUS packets (default authentication port 1812). If a large number of users cannot be authenticated and RADIUS server down records appear in the logs on the device, faults have most likely occurred on the server or in the intermediate network, and further investigation is required.
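When many log lines have been collected, the two indicators from this step can be extracted programmatically. The following is a minimal sketch that assumes the log text has been exported to a list of strings, with each RADIUS_AUTH_SERVER_DOWN message on a single line as it appears in the device log. The function names are hypothetical.

```python
import re

# Matches the RADIUS_AUTH_SERVER_DOWN log message when it is on one line.
SERVER_DOWN = re.compile(
    r"RADIUS_AUTH_SERVER_DOWN:.*?server IP=(?P<ip>[0-9.]+), "
    r"port=(?P<port>\d+), VPN instance=(?P<vpn>\S+?)\.?\s*$"
)
# Text printed by RADIUS event debugging when retransmissions are exhausted.
RETRIES = "Reached the maximum retries"


def blocked_servers(log_lines):
    """Return unique (ip, port, vpn) tuples for RADIUS servers reported as blocked."""
    found = []
    for line in log_lines:
        match = SERVER_DOWN.search(line)
        if match:
            entry = (match["ip"], int(match["port"]), match["vpn"])
            if entry not in found:
                found.append(entry)
    return found


def retry_exhaustions(log_lines):
    """Count debugging lines showing that requests were retried up to the limit."""
    return sum(RETRIES in line for line in log_lines)
```

A non-empty result from blocked_servers() gives the server address and port to verify in sub-steps a and b.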

9. Check whether the authorization ACL or user profile failed to be deployed.

With portal strict checking enabled, if the authorized ACL or user profile does not exist on the device or fails to be deployed on the device, the device forces the portal user offline.

a. Execute the display portal command and view the Strict checking field to check whether strict checking is enabled on the device. Then determine whether strict checking is required. If it is not required, disable it. If it is required, go to step b.

b. Execute the display acl or display user-profile command on the device to check whether the ACL or user profile authorized by the AAA server exists. If the ACL or user profile does not exist, determine whether authorization by the server is required. If yes, add the corresponding ACL or user profile configuration on the device.

10. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Information collected after you execute the display portal auth-error-record or display portal auth-fail-record command.

¡ Screenshots of portal-related configurations on the portal server.

¡ Files containing the packets captured between the device and the AAA server.

¡ Screenshots of the issue taken on the client's browser.

¡ Debugging information collected after you enable the debugging portal command.

Related alarm and log messages

Alarm messages

N/A

Log messages

RADIUS/4/RADIUS_AUTH_SERVER_DOWN

Portal authentication user disconnections

Symptom

A portal user is disconnected after being online for a period of time.

Common causes

The following are the common causes of this type of issue:

· The user session has timed out.

· The user has been logged off by the idle cut feature.

· Accounting updates have failed.

· The user's traffic has reached the threshold.

· The server has forced the user offline.

· The user has failed online detection.

· The interface through which the user accesses the network has gone down.

Troubleshooting flow

Figure 161 shows the troubleshooting flowchart.

Figure 161 Flow chart for troubleshooting portal authentication user disconnections

Solution

1. Execute the portal logout-record enable command to enable portal user offline recording.

2. Check whether the user session has timed out.

If the AAA server has assigned a session timeout (the maximum duration of a single online session) to the portal user, the device logs out the user when the user's online duration exceeds the timeout.

Use the following methods to check whether the portal user goes offline because of session timeout:

¡ View the user offline records on the AAA server.

¡ Execute the display portal logout-record command to view the user logout reason.

<Sysname> display portal logout-record all

Total logout records: 1

User name : gkt

User MAC : 0800-2700-94ad

Interface : Vlan-interface100

User IP address : 21.0.0.20

AP : N/A

SSID : N/A

User login time : 2021-07-29 11:05:58

User logout time : 2021-07-29 11:05:58

Logout reason : Session timeout

¡ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, it indicates that the portal user is logged out because of user session timeout.

*Jul 28 17:51:20:774 2021 Sysname PORTAL/7/ERROR: -MDC=1; Session timer timed out and the user will be logged off.

The user logout triggered by session timeout is a normal logout. The user can come online again.

3. Check whether the user goes offline because of user idle cut.

With the idle cut feature configured, if the device or the AAA server has authorized an idle timeout period for the user, the device periodically checks the user's traffic after the user comes online. If the user's traffic generated within the specified idle timeout period is less than the specified data volume, the user will be forced offline.

You can use the following methods to check whether the portal user goes offline because of idle cut:

¡ View the user logout records on the AAA server.

¡ Execute the display portal logout-record command to view the user offline records.

<Sysname> display portal logout-record all

Total logout records: 1

User name : gkt

User MAC : 0800-2700-94ad

Interface : Vlan-interface100

User IP address : 21.0.0.20

AP : N/A

SSID : N/A

User login time : 2021-07-29 11:05:58

User logout time : 2021-07-29 11:05:58

Logout reason : Idle timeout

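¡ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, the portal user is logged out because of idle cut.

*Jul 28 17:51:20:774 2021 Sysname PORTAL/7/ERROR: -MDC=1; Idle-cut timer timed out and the user will be logged off.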

The logout triggered by idle timeout is a normal logout. The user can come online again.
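The idle cut rule described in this step can be modeled as a comparison between the traffic generated during the most recent idle period and the required data volume. The following is a simplified sketch under stated assumptions: traffic is sampled as cumulative byte counters with timestamps, and the function name and parameters are hypothetical rather than the device's actual implementation.

```python
def idle_cut_due(samples, idle_period_s, min_bytes):
    """Decide whether the idle cut rule would log the user off.

    samples: (timestamp_s, cumulative_bytes) counter readings, oldest first.
    Returns True if the traffic generated during the most recent idle period
    is less than min_bytes.
    """
    if not samples:
        return False
    now, latest_total = samples[-1]
    window_start = now - idle_period_s
    baseline = None
    for timestamp, total in samples:
        if timestamp > window_start:
            break
        baseline = total  # last reading taken at or before the window start
    if baseline is None:
        return False  # online for less than one idle period; nothing to judge yet
    return latest_total - baseline < min_bytes
```

For example, with readings [(0, 0), (300, 1000), (600, 1100)], a 300-second idle period, and a 1024-byte minimum, only 100 bytes flowed in the last period, so the model predicts an idle cut.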

4. Check whether the user goes offline because of accounting update failures.

After a remote portal authentication user comes online, the device periodically sends accounting-update packets to the AAA server. If the link between the device and the AAA server is disconnected or the server fails, the device cannot deliver the accounting-update packets. If the transmission still fails after the maximum number of retransmissions is reached and the accounting update failure policy is configured on the device, the device forces the user offline. The accounting update failure policy is configured by using the accounting update-fail offline command.

You can use the following methods to check whether the user goes offline because of accounting update failures:

¡ Execute the display portal logout-record command to view the user offline records.

<Sysname> display portal logout-record all

Total logout records: 1

User name : gkt

User MAC : 0800-2700-94ad

Interface : Vlan-interface100

User IP address : 21.0.0.20

AP : N/A

SSID : N/A

User login time : 2021-07-29 11:05:58

User logout time : 2021-07-29 11:05:58

Logout reason : Accounting update failure

¡ Execute the display interface command to check whether the state of the device port connected to the AAA server has changed, or check whether the AAA server has any exception records. Alternatively, execute the display radius scheme command to check whether Blocked is displayed in the State field, which indicates the server state. If yes, the user might have gone offline because of accounting update failures.

¡ Execute the debugging portal error command on the device to enable portal error debugging. If the following information is displayed on the device, the user is logged out because of accounting update failures.

*Jul 28 17:51:20:774 2021 Sysname PORTAL/7/ERROR: -MDC=1; Processed accounting-update failed and user logout.

If you confirm that the user goes offline because of accounting update failures, check the link state between the device and the server, and check whether the relevant accounting configurations on the device and the AAA server have changed.
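The logout in this step requires two conditions at the same time: every permitted transmission of the accounting-update packet fails, and the accounting update failure policy is configured. The following simplified sketch models only that decision. The function name and the max_attempts parameter are hypothetical, and the sketch does not reflect how the device counts retransmissions internally.

```python
def accounting_update_outcome(attempt_results, max_attempts, update_fail_offline):
    """Simplified model of one accounting-update cycle.

    attempt_results: one bool per transmission (True = the AAA server responded).
    max_attempts: hypothetical cap on transmissions of the same update packet.
    update_fail_offline: True if 'accounting update-fail offline' is configured.
    """
    for responded in attempt_results[:max_attempts]:
        if responded:
            return "online"  # the update was accepted; the user stays online
    # Every permitted transmission failed.
    return "offline" if update_fail_offline else "online"
```

The model shows why the same server outage logs users off on one device but not on another: the difference is whether the failure policy is configured.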

5. Check whether the user's traffic has reached the threshold.

If the AAA server assigns a traffic threshold to a user when the user comes online, the device forces the user offline once the user's traffic reaches the threshold.

You can use the following methods to check whether the user goes offline because the user's traffic has reached the threshold:

¡ Check the user offline records on the AAA server.

¡ Execute the display portal logout-record command to view the user offline records.

<Sysname> display portal logout-record all

Total logout records: 1

User name : gkt

User MAC : 0800-2700-94ad

Interface : Vlan-interface100

User IP address : 21.0.0.20

AP : N/A

SSID : N/A

User login time : 2021-07-29 11:05:58

User logout time : 2021-07-29 11:05:58

Logout reason : User traffic reached threshold

The user logout triggered by the traffic threshold is a normal logout. The user can come online again.

6. Check whether the AAA server forces the user offline.

After the RADIUS session-control feature is enabled on the device (by using the radius session-control enable command), the device immediately forces a user offline upon receiving a disconnection request from the AAA server. If the feature is enabled, you can use the following methods to check whether the user is forced offline by the AAA server:

¡ View the user offline records on the AAA server.

¡ Execute the display portal logout-record command to view the user offline records.

<Sysname> display portal logout-record all

Total logout records: 1

User name : gkt

User MAC : 0800-2700-94ad

Interface : Vlan-interface100

User IP address : 21.0.0.20

AP : N/A

SSID : N/A

User login time : 2021-07-29 11:05:58

User logout time : 2021-07-29 11:05:58

Logout reason : Force logout by RADIUS server

For more information about the reasons for the forcible user logout, contact the server administrator.

7. Check whether the user goes offline because of online detection failures.

If the portal user online detection feature is enabled on the device (using the portal user-detect command), the device periodically sends detection packets to the user client. If the device has not received a response from the client after the specified maximum number of attempts, it will force the user offline.

Check whether the portal user online detection feature is enabled on the device. If the feature is enabled, you can use the following methods to check whether the user goes offline because of user online detection failures:

¡ View the user offline records on the AAA server.

¡ Execute the display portal logout-record command to view the user offline records.

<Sysname> display portal logout-record all

Total logout records: 1

User name : gkt

User MAC : 0800-2700-94ad

Interface : Vlan-interface100

User IP address : 21.0.0.20

AP : N/A

SSID : N/A

User login time : 2021-07-29 11:05:58

User logout time : 2021-07-29 11:05:58

Logout reason : User detection failure

After you confirm that the user goes offline because of user online detection failures, check the link state between the client and the device, and identify the reasons why the client does not respond to the detection packet.
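The detection rule in this step can be modeled as a counter of consecutive unanswered detection rounds. The following is a simplified sketch that assumes a reply from the client resets the counter; that reset behavior, the function name, and its parameters are assumptions for illustration only.

```python
def detection_logout_round(replies, max_attempts):
    """Return the index of the detection round at which the user is forced
    offline, or None if the user stays online.

    replies: one bool per detection round (True = the client replied).
    Assumption: a reply resets the count of consecutive missed rounds.
    """
    misses = 0
    for round_index, replied in enumerate(replies):
        misses = 0 if replied else misses + 1
        if misses >= max_attempts:
            return round_index
    return None
```

With replies [True, False, False, True, False, False, False] and a maximum of 3 attempts, the model forces the user offline at round 6, because only the last three rounds are consecutive misses.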

8. Check whether the interface through which the portal user accesses is down.

If the interface used by the portal user goes down for a period of time, the device forces all portal users accessing through this interface offline.

You can use the following methods to check whether the user has gone offline because the interface went down:

¡ View the user logout records on the AAA server.

¡ Execute the display interface command to check whether the state of the interface changed. If the interface's state changed and the change time is close to the time when the user went offline, the reason for the user logout might be interface down.

¡ Execute the display portal logout-record command to view the user logout records.

<Sysname> display portal logout-record all

Total logout records: 1

User name : gkt

User MAC : 0800-2700-94ad

Interface : Vlan-interface100

User IP address : 21.0.0.20

AP : N/A

SSID : N/A

User login time : 2021-07-29 11:05:58

User logout time : 2021-07-29 11:05:58

Logout reason : Interface down

If you confirm that the user goes offline because the interface went down, identify why the interface went down, for example, because of a loosely connected network cable.
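The comparison in this step, checking whether an interface state change happened close to the user logout time, can be sketched as follows. The sketch assumes both timestamps have already been converted to the "YYYY-MM-DD HH:MM:SS" format used in the logout record, and the 60-second window is a hypothetical value, not a device default.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # format of the "User logout time" field


def logout_near_interface_change(logout_time, change_times, window_s=60):
    """True if any interface state change lies within window_s seconds of the logout.

    logout_time: the "User logout time" value from the logout record.
    change_times: interface state change times, assumed to use the same format.
    window_s: hypothetical tolerance in seconds.
    """
    logout = datetime.strptime(logout_time, TIME_FORMAT)
    window = timedelta(seconds=window_s)
    return any(abs(datetime.strptime(t, TIME_FORMAT) - logout) <= window
               for t in change_times)
```

A match does not prove the cause, but it narrows the investigation to the access interface.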

9. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration file, log messages, and alarm messages.

¡ Screenshots of portal-related configurations on the portal server.

¡ User logout records on the AAA server.

¡ Files containing the packets captured between the device and the AAA server.

¡ Screenshots of the issue taken on the client's browser.

¡ Debugging information collected after you enable the debugging portal command.
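Because every check in the steps above starts from the Logout reason field of the display portal logout-record output, that output can be triaged in bulk. The following is a minimal sketch that assumes the command output has been saved as text in the "Field : value" layout shown in the examples. The function names and action wording are hypothetical, and the mapping follows only the reasons listed in this section.

```python
# Logout reasons from the "Logout reason" field, mapped to the follow-up
# action described in the troubleshooting steps.
ACTIONS = {
    "Session timeout": "Normal logout; the user can come online again.",
    "Idle timeout": "Normal logout; the user can come online again.",
    "User traffic reached threshold": "Normal logout; the user can come online again.",
    "Accounting update failure": "Check the device-to-AAA-server link and accounting settings.",
    "Force logout by RADIUS server": "Ask the server administrator why the user was forced offline.",
    "User detection failure": "Check the client-to-device link and why the client does not reply.",
    "Interface down": "Find out why the access interface went down.",
}


def parse_logout_records(text):
    """Split saved 'display portal logout-record' output into one dict per record."""
    records, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip lines without a colon (blank lines, the command echo) and the summary line.
        if not sep or key.startswith("Total"):
            continue
        if key == "User name" and current:
            records.append(current)  # a "User name" line starts a new record
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records


def next_action(record):
    """Return the suggested follow-up action for one parsed record."""
    reason = record.get("Logout reason", "")
    return ACTIONS.get(reason, "Unlisted reason; collect information and contact Technical Support.")
```

Records whose action is not a normal logout are the ones worth including when you contact Technical Support.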

Related alarm and log messages

Alarm messages

N/A

Log messages

N/A

Troubleshooting security issues

Troubleshooting SSH

Failure to log in to the device from the SSH client

Symptom

The SSH client fails to log in to the device, which acts as the SSH server.

Common causes

The following are the common causes of this type of issue:

· The SSH client cannot reach the device.

· The SSH server function is not enabled on the device.

· An SSH login control ACL is specified on the device, but the ACL does not permit the IP address of the SSH client.

· The service port specified by the client does not match the server port.

· The SSH version on the device is not compatible with the client.

· No local key pairs are generated on the device.

· The public key on the server is inconsistent with that cached on the client.

· The authentication method or access protocol for a user line is incorrectly configured.

· No SSH service is configured in local user view on the device.

· The service type or authentication method for the SSH user is incorrectly configured.

· The SSH2 algorithms on the device are not compatible with those on the client.

· The device does not have enough VTY user line resources.

· The number of SSH login users on the device reaches the upper limit.

Troubleshooting flow

Figure 162 shows the troubleshooting flowchart.

Figure 162 Flowchart for troubleshooting SSH login failure

Solution

1. Verify that the client can ping the device.

Execute the ping command to check network connectivity.

¡ If the ping fails, locate the issue as described in the ping troubleshooting guide to make sure the SSH client can ping the device.

¡ If the ping succeeds, proceed to step 2.

2. Verify that the SSH server function is enabled.

If the following log message occurs, the SSH server function is disabled:

SSHS/6/SSHS_SRV_UNAVAILABLE: The SCP server is disabled or the SCP service type is not supported.

Execute the display ssh server status command to identify whether the Stelnet server function, SFTP server function, NETCONF over SSH server function, and SCP server function are enabled as needed.

<Sysname> display ssh server status

Stelnet server: Disable

SSH version : 2.0

SSH authentication-timeout : 60 second(s)

SSH server key generating interval : 0 hour(s)

SSH authentication retries : 3 time(s)

SFTP server: Disable

SFTP Server Idle-Timeout: 10 minute(s)

NETCONF server: Disable

SCP server: Disable

¡ If the SSH server function is disabled, execute the following commands to enable related SSH server functions:

<Sysname> system-view

[Sysname] ssh server enable

[Sysname] sftp server enable

[Sysname] scp server enable

[Sysname] netconf ssh server enable

¡ If the SSH server function is enabled, proceed to step 3.

3. Identify whether an SSH login control ACL is configured.

Identify whether an SSH login control ACL is specified by the ssh server acl command.

¡ If an SSH login control ACL is configured, identify whether the specified ACL permits the IP address of the client.

If the following log message occurs, the specified ACL denies the IP address of the client:

SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.10 was denied by ACL rule (rule ID=20).

SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.11 was denied by ACL rule (default rule).

- If the specified ACL denies the IP address of the client, edit the SSH login control ACL to permit the IP address of the client. If no SSH clients require login control, remove the SSH login control settings.

- If the specified ACL already permits the IP address of the client, proceed to step 4.

¡ If no SSH login control ACL is configured, proceed to step 4.
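If many SSH_ACL_DENY messages have been logged, the denied client addresses and the matching rules can be extracted before you edit the ACL. The following is a minimal sketch that assumes the log messages have been exported as text lines in the format shown above. The function name is hypothetical.

```python
import re

ACL_DENY = re.compile(
    r"SSH_ACL_DENY: The SSH connection request from (?P<ip>\S+) "
    r"was denied by ACL rule \((?P<rule>[^)]*)\)"
)


def denied_clients(log_lines):
    """Map each denied client IP address to the ACL rule text reported in the log."""
    denied = {}
    for line in log_lines:
        match = ACL_DENY.search(line)
        if match:
            denied[match["ip"]] = match["rule"]  # for example "rule ID=20" or "default rule"
    return denied
```

Applied to the two example messages above, the result maps 181.1.1.10 to "rule ID=20" and 181.1.1.11 to "default rule", which tells you which rule to edit for each client.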

4. Identify whether the SSH service port on the client matches that on the server.

If the SSH service port on the server changes, but the client still uses the default SSH service port, the SSH login will fail.

For example, if an H3C device acts as the client, the following error message is displayed: Failed to connect to host 10.1.1.1 port 100.

¡ If the SSH service port on the client does not match that on the server, execute the display current-configuration | include ssh command to view the SSH service port on the server, and then change the SSH service port on the client to that on the server.

¡ If the SSH service port on the client matches that on the server, proceed to the next step.
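A successful ping does not show that the SSH service port is reachable. From a management host, a plain TCP connection attempt to the candidate port can confirm which port the server actually listens on. The following is a minimal sketch using the Python standard library. The function name is hypothetical, and an intermediate firewall can produce the same failure as a wrong port.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, unreachable, or timed out
        return False
```

For example, if tcp_port_open("10.1.1.1", 22) returns False but the port configured on the server accepts connections, the client is still using the default port.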

5. Identify whether the SSH version on the server is compatible with that on the client.

If the following log message occurs, the SSH version on the device is not compatible with that on the client:

SSHS/6/SSHS_VERSION_MISMATCH: SSH client 192.168.30.117 failed to log in because of version mismatch.

If an SSH1 client logs in to the device, you can execute the display ssh server status command on the device to identify the SSH version from the SSH version field.

¡ If the SSH version field displays 1.99, the device is compatible with SSH1 clients. Then, proceed to the next step.

¡ If the SSH version field displays 2.0, execute the ssh server compatible-ssh1x enable command on the device to enable the device to support SSH1 clients.
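The 1.99 and 2.0 values follow the SSH identification string convention in RFC 4253, section 5.1: a server that also accepts old SSH1 clients identifies its protocol version as "1.99". The SSH version field presumably reflects this value. The following minimal sketch parses an identification string such as "SSH-2.0-ExampleServer" (a hypothetical banner) and decides whether a client of a given major version can use it.

```python
def protoversion(identification):
    """Extract protoversion from 'SSH-protoversion-softwareversion [comments]'."""
    if not identification.startswith("SSH-"):
        raise ValueError("not an SSH identification string")
    return identification[4:].split("-", 1)[0]


def accepts_client(server_identification, client_major):
    """Return True if a client that speaks SSH<client_major> can use this server."""
    version = protoversion(server_identification)
    if version == "1.99":
        return client_major in (1, 2)  # compatibility mode: SSH1 and SSH2 clients
    return int(version.split(".")[0]) == client_major
```

With this rule, an SSH1 client is accepted by "SSH-1.99-ExampleServer" but not by "SSH-2.0-ExampleServer", which matches the need to run ssh server compatible-ssh1x enable.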

6. Identify whether the server generates a local key pair.

When the device acts as the SSH server, you must configure a local asymmetric key pair. A client uses only one of DSA, ECDSA, or RSA public key algorithms to authenticate the server, but different clients support different algorithms. To ensure successful client login, generate DSA, ECDSA, and RSA key pairs on the server as a best practice.

Execute the display public-key local public command on the device to view local public key information on the device.

¡ If a DSA, ECDSA, or RSA key pair does not exist, execute the public-key local create command to create the missing key pairs one by one.

¡ If these key pairs are configured, proceed to the next step.

7. Identify whether the public key on the server is consistent with that cached on the client.

If the client chooses to save the server's public key upon the first login, updating the server's local key pair will cause the client to fail to authenticate the server.

This example uses an H3C device as the client. If the following message occurs upon client login, the public key on the server is inconsistent with that cached on the client:

The server's host key does not match the local cached key. Either the server administrator has changed the host key, or you connected to another server pretending to be this server. Please remove the local cached key, before logging in!

¡ If the inconsistency occurs, execute the undo public-key peer command to delete the old server public key saved on the client.

¡ If the inconsistency does not exist, proceed to the next step.
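A quick way to confirm that the server's host key really changed is to compare fingerprints of the cached key and the current key. The following minimal sketch computes an OpenSSH-style SHA256 fingerprint ("SHA256:" followed by the unpadded Base64 digest of the key blob). It assumes both keys are available as OpenSSH-format public key lines ("type base64-blob [comment]"), which is an assumption because H3C devices display public keys in a different format. The function names are hypothetical.

```python
import base64
import hashlib


def sha256_fingerprint(openssh_pubkey_line):
    """OpenSSH-style fingerprint of a 'type base64-blob [comment]' public key line."""
    blob = base64.b64decode(openssh_pubkey_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def host_key_changed(cached_line, current_line):
    """True if the cached host key and the server's current host key differ."""
    return sha256_fingerprint(cached_line) != sha256_fingerprint(current_line)
```

Comments after the key blob are ignored, so only a real key change makes host_key_changed() return True.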

8. Identify whether the authentication method and the access protocol for a VTY user line are configured correctly.

If the client is an Stelnet or NETCONF over SSH client, execute the display this command in VTY user line view to identify whether the authentication method is scheme and SSH is specified as an access protocol.

[Sysname] line vty 0 63

[Sysname-line-vty0-63] display this

line vty 0 63

authentication-mode scheme

user-role network-admin

idle-timeout 0 0

¡ If the authentication method or access protocol is configured incorrectly, change the authentication method to scheme and specify SSH as one of the access protocols.

¡ If the configuration is correct, proceed to step 9.

9. Identify whether the SSH service is authorized to local users. This step applies only to local authentication.

Execute the display this command in local user view to identify whether the SSH service is authorized to the local user.

[Sysname] local-user test

[Sysname-luser-manage-test] display this

local-user test class manage

service-type ssh

authorization-attribute user-role network-admin

authorization-attribute user-role network-operator

¡ If the SSH service is not authorized, execute the service-type command in local user view to specify the SSH service.

¡ If the SSH service is authorized, proceed to the next step.

If remote authentication is configured, locate issues as described in the AAA troubleshooting guide.

10. Identify whether an SSH user is configured and the correct service type and authentication method are specified for the SSH user.

SSH supports Stelnet, SFTP, NETCONF, and SCP service types.

First, identify whether the SSH user is created correctly based on the authentication method on the server.

¡ If the server uses the publickey authentication method, you must create an SSH user and a local user on the device. The two users must have the same username, so that the SSH user can be assigned the correct working directory and user role.

¡ If the server uses the password authentication method, you must perform one of the following tasks:

- For local authentication, configure a local user on the device.

- For remote authentication, configure an SSH user on a remote authentication server, for example, a RADIUS server. You do not need to create an SSH user. However, if such an SSH user has been created, make sure you have specified the correct service type and authentication method.

¡ If the server uses the keyboard-interactive, password-publickey, or any authentication method, you must create an SSH user on the device and perform one of the following tasks:

- For local authentication, configure a local user on the device.

- For remote authentication, configure an SSH user on a remote authentication server, for example, a RADIUS server.

Then, perform the following operations based on the result of the previous step:

¡ If no SSH user has been created and none is required, proceed to the next step. If no SSH user has been created but one is required, execute the ssh user command to create an SSH user.

¡ If an SSH user has been created, check the service type and authentication method for the SSH user.

To avoid login failure, the service type of the SSH user must match the client type, which can be Stelnet, SFTP, SCP, or NETCONF over SSH. Identify whether the service type for the SSH user is correct.

Take an SCP client as an example. If the following log message occurs on the device, the service type does not match the client type:

SSHS/6/SSHS_SRV_UNAVAILABLE: The SCP server is disabled or the SCP service type is not supported.

Then, perform the following operations:

- Execute the ssh user command in system view on the device to edit the service type for the SSH user.

- Execute the display ssh user-information command on the device to view the authentication method used by the SSH server. Identify whether the SSH user on the device is configured correctly based on the authentication method.
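The user-creation rules in this step can be summarized as a lookup from the server's authentication method and the authentication mode to the accounts that must exist. The following sketch encodes only the rules listed above. The function name and the labels it returns are hypothetical.

```python
def required_accounts(auth_method, authentication):
    """Accounts that must exist for an SSH login, per the rules in this step.

    auth_method: publickey, password, keyboard-interactive, password-publickey, or any.
    authentication: "local" or "remote".
    """
    method = auth_method.lower()
    account = "local user" if authentication == "local" else "user on remote server"
    if method == "publickey":
        # The SSH user and the local user must share the same username.
        return {"SSH user", "local user"}
    if method == "password":
        return {account}  # an SSH user is optional
    if method in {"keyboard-interactive", "password-publickey", "any"}:
        return {"SSH user", account}
    raise ValueError(f"unknown authentication method: {auth_method}")
```

For example, password authentication against a RADIUS server needs only the account on that server, while publickey authentication always needs both an SSH user and a local user on the device.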

11. Identify whether the SSH2 algorithms on the device match those on the client.

Execute the display ssh2 algorithm command to view algorithms used by SSH2 to identify whether these algorithms include those supported by the client. For example, if the device is configured to not use CBC-related encryption algorithms, but the SSH client supports only CBC-related algorithms, the client will be unable to log in to the server.

If the following log message occurs on the device, the SSH2 algorithms on the device do not match those on the client:

SSHS/6/SSHS_ALGORITHM_MISMATCH: SSH client 192.168.30.117 failed to log in because of encryption algorithm mismatch.

¡ If the algorithms on the client do not match those on the device, perform one of the following operations as needed:

- Execute the ssh2 algorithm cipher, ssh2 algorithm key-exchange, ssh2 algorithm mac, or ssh2 algorithm public-key command on the device to add algorithms supported by the client.

- Add algorithms supported by the server on the client.

¡ If the algorithms on the client match those on the device, proceed to the next step.
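Why a single missing algorithm blocks login follows from the negotiation rule in RFC 4253, section 7.1: for encryption and MAC algorithms, the chosen algorithm is the first one on the client's list that also appears on the server's list. If no such algorithm exists in any category, negotiation fails. The following minimal sketch applies that rule per category (key exchange has additional rules that the sketch ignores). The function names and category labels are hypothetical.

```python
def negotiate(client_list, server_list):
    """RFC 4253 rule for cipher and MAC lists: first client entry the server also supports."""
    offered = set(server_list)
    for name in client_list:
        if name in offered:
            return name
    return None


def mismatched_categories(client, server):
    """Return the categories (for example "cipher") for which negotiation fails."""
    return [category for category, names in client.items()
            if negotiate(names, server.get(category, [])) is None]
```

In the CBC example above, a client offering only aes128-cbc and 3des-cbc against a server offering aes128-ctr yields a failed "cipher" category, so login fails even though the MAC algorithms match.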

12. Identify whether the number of VTY users on the device reaches the upper limit.

Both SSH and Telnet users log in using VTY user lines, but VTY user lines are limited resources. If all VTY user lines are occupied, clients using Stelnet and NETCONF over SSH cannot log in. Clients using SFTP and SCP do not occupy user lines and can still log in.

The number of VTY users on the device reaches the upper limit if the following log message occurs:

SSHS/6/SSHS_REACH_USER_LIMIT: SSH client 192.168.30.117 failed to log in, because the number of users reached the upper limit.

Execute the display line command to identify whether VTY user lines are sufficient.

¡ If VTY user line resources are insufficient, change the authentication method of idle VTY lines that do not use scheme authentication to scheme. If all VTY lines already use scheme authentication and are in use, execute the free line vty command to forcibly release VTY lines. This allows new SSH users to come online.

¡ If VTY user line resources are sufficient, proceed to the next step.

13. Identify whether the number of online SSH users reaches the upper limit.

Execute the display ssh server session command to view session information on the server and the maximum number of SSH connections set by the aaa session-limit ssh command.

The number of online SSH users reaches the upper limit if the following log message occurs on the device:

SSHS/6/SSHS_REACH_SESSION_LIMIT: SSH client 192.168.30.117 failed to log in. The number of SSH sessions is 10, and exceeded the limit (10).

SSHS/6/SSHS_REACH_SESSION_LIMIT: SSH client 192.168.30.117 failed to log in. The current number of SSH sessions is 10. The maximum number allowed is 10.

¡ If the number of SSH sessions has reached the upper limit, execute the aaa session-limit ssh command to increase the limit. If the limit is already set to the maximum value, disconnect idle SSH clients from the client side. This allows new SSH users to come online.

¡ If the upper limit is not reached, proceed to the next step.

14. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration files, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

Module: HH3C-SSH-MIB

hh3cSSHVersionNegotiationFailure (1.3.6.1.4.1.25506.2.22.1.3.0.2)

Log messages

· SSHS/5/SSH_ACL_DENY

· SSHS/6/SSHS_ALGORITHM_MISMATCH

· SSHS/6/SSHS_REACH_SESSION_LIMIT

· SSHS/6/SSHS_REACH_USER_LIMIT

· SSHS/6/SSHS_SRV_UNAVAILABLE

· SSHS/6/SSHS_VERSION_MISMATCH
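Each log message listed above points to a specific step of the solution. The following minimal sketch maps log mnemonics found in exported log lines to the likely cause and step. It assumes the logs use the "SSHS/level/MNEMONIC:" format shown in this section. The mapping reflects only this section, and the function name is hypothetical.

```python
import re

CAUSES = {
    "SSH_ACL_DENY": "SSH login control ACL denies the client (step 3)",
    "SSHS_SRV_UNAVAILABLE": "SSH server function disabled or service type not supported (steps 2 and 10)",
    "SSHS_VERSION_MISMATCH": "SSH version incompatible with the client (step 5)",
    "SSHS_ALGORITHM_MISMATCH": "SSH2 algorithms do not match the client (step 11)",
    "SSHS_REACH_USER_LIMIT": "No free VTY user lines (step 12)",
    "SSHS_REACH_SESSION_LIMIT": "SSH session limit reached (step 13)",
}
MNEMONIC = re.compile(r"SSHS/\d+/(?P<mnemonic>[A-Z_]+):")


def likely_causes(log_lines):
    """Return the distinct likely causes, in the order they first appear in the log."""
    causes = []
    for line in log_lines:
        match = MNEMONIC.search(line)
        if match:
            cause = CAUSES.get(match["mnemonic"])
            if cause and cause not in causes:
                causes.append(cause)
    return causes
```

An empty result means none of the listed messages was logged, so start from step 1 of the solution.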

Failure to log in to the device as the SSH server through password authentication

Symptom

When the device acts as the SSH server, a user fails to log in to the device through password authentication.

Common causes

The following are the common causes of this type of issue:

· The SSH client cannot reach the device.

· The login password of the SSH client is incorrect.

· The SSH server function is not enabled on the device.

· The SSH user is not configured on the SSH server.

· An SSH login control ACL is specified on the device, but the ACL does not permit the IP address of the SSH client.

· The number of SSH login users on the device reaches the upper limit.

· The SSH version on the device is not compatible with the client.

· The service type or authentication method for SSH users is configured incorrectly.

· No local key pairs are generated on the device.

· The SCP or SFTP working directory is incorrect.

Troubleshooting flow

Figure 163 shows the troubleshooting flowchart.

Figure 163 Flowchart for troubleshooting failure to log in to the device as the server through password authentication

Solution

1. Verify that the client can ping the device.

Execute the ping command to check network connectivity.

¡ If the ping fails, locate the issue as described in the ping troubleshooting guide to make sure the SSH client can ping the device.

¡ If the ping succeeds, proceed to the next step.

2. Verify that the login password is correct.

¡ If the server uses local authentication, identify whether the login password of the user is consistent with that set for the local device management user on the device.

- If the passwords are inconsistent, enter the correct login password again. If the login password is forgotten, enter the view of the local device management user and execute the password command to set a new password, and then log in with the new password. The name of the local device management user is the same as that of the current login user.

- If the inconsistency does not exist, proceed to step 3.

¡ If the server uses remote authentication, make sure the password of the current login user is consistent with that on the authentication server.

- If the inconsistency occurs, enter the correct login password again. If the password is forgotten, set a new password on the device for the login user. Make sure the set password is consistent with that on the authentication server.

- If the inconsistency does not exist, proceed to step 3.

3. Verify that the SSH server function is enabled.

If the following log message occurs, the SSH server function is disabled:

SSHS/6/SSHS_SRV_UNAVAILABLE: The SCP server is disabled or the SCP service type is not supported.

Execute the display ssh server status command to identify whether the Stelnet server function, SFTP server function, NETCONF over SSH server function, and SCP server function are enabled as needed.

<Sysname> display ssh server status
Stelnet server: Disable
SSH version : 2.0
SSH authentication-timeout : 60 second(s)
SSH server key generating interval : 0 hour(s)
SSH authentication retries : 3 time(s)
SFTP server: Disable
SFTP Server Idle-Timeout: 10 minute(s)
NETCONF server: Disable
SCP server: Disable

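¡ If a required SSH server function is disabled, enable it. The following commands enable the Stelnet, SFTP, SCP, and NETCONF over SSH server functions, respectively. Execute only the commands for the services you need: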

<Sysname> system-view
[Sysname] ssh server enable
[Sysname] sftp server enable
[Sysname] scp server enable
[Sysname] netconf ssh server enable

¡ If the SSH server function is enabled, proceed to the next step.
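When many devices must be checked, the status output can be scanned mechanically. The following Python sketch is illustrative only: it assumes you have captured the display ssh server status output as text, assumes enabled services are shown as "Enable", and maps each disabled service to the enable command listed in this step. The function name disabled_services and the sample text are hypothetical.

```python
# Enable command for each service line in "display ssh server status" output,
# as listed in this guide.
ENABLE_COMMANDS = {
    "Stelnet server": "ssh server enable",
    "SFTP server": "sftp server enable",
    "SCP server": "scp server enable",
    "NETCONF server": "netconf ssh server enable",
}


def disabled_services(status_output: str) -> dict:
    """Return {service field: enable command} for each service shown as Disable."""
    disabled = {}
    for line in status_output.splitlines():
        field, sep, value = line.partition(":")
        if not sep:
            continue  # Skip lines without a "field: value" layout.
        field, value = field.strip(), value.strip()
        if field in ENABLE_COMMANDS and value.lower() == "disable":
            disabled[field] = ENABLE_COMMANDS[field]
    return disabled


# Hypothetical sample: same layout as the output above, but with SFTP enabled.
sample_status = """Stelnet server: Disable
SSH version : 2.0
SFTP server: Enable
SFTP Server Idle-Timeout: 10 minute(s)
NETCONF server: Disable
SCP server: Disable"""

for service, command in disabled_services(sample_status).items():
    print(f"{service} is disabled; enable it with: {command}")
```

Lines such as "SFTP Server Idle-Timeout" are ignored because only the four service fields are looked up.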

4. Identify whether the SSH service port on the client matches that on the server.

If the SSH service port on the server changes, but the client still uses the default SSH service port, the SSH login will fail.

For example, if the client is an H3C device, the following error message is displayed: Failed to connect to host 10.1.1.1 port 22.

¡ If the ports do not match, configure the client to use the SSH service port configured on the server.

¡ If the SSH service port on the client matches that on the server, proceed to the next step.
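To confirm which TCP port the server is actually listening on, you can probe candidate ports from a management host. The following Python sketch is a minimal illustration, not an H3C tool: the function name ssh_port_reachable is hypothetical, and it only checks whether a TCP connection opens, without performing an SSH handshake.

```python
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only checks that something is listening on the port; it does not
    perform an SSH handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeout, and unreachable host.
        return False
```

For example, if ssh_port_reachable("10.1.1.1", 22) returns False while the same call with the port configured on the server returns True, the client is most likely using the wrong SSH service port.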

5. Identify whether an SSH login control ACL is configured.

Identify whether an SSH login control ACL is specified by the ssh server acl command.

¡ If an SSH login control ACL is specified, identify whether the ACL permits the client. First, execute the ssh server acl-deny-log enable command to enable logging for SSH login attempts denied by the SSH login control ACL.

- If log messages similar to the following are generated, the specified ACL denies the IP address of the client. Modify the ACL to permit the IP address of the client.

SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.10 was denied by ACL rule (rule ID=20).

SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.11 was denied by ACL rule (default rule).

- If the specified ACL already permits the IP address of the client, proceed to the next step.

¡ If no SSH login control ACL is specified, proceed to the next step.
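If the log buffer contains many messages, the denied clients and matching rules can be extracted with a regular expression. This Python sketch assumes the message text has exactly the format of the SSH_ACL_DENY examples above and that client addresses are IPv4; the function name denied_clients is hypothetical.

```python
import re

# Pattern for the SSHS/5/SSH_ACL_DENY messages shown in this step (IPv4 only).
ACL_DENY = re.compile(
    r"SSH_ACL_DENY: The SSH connection request from (?P<ip>[\d.]+) "
    r"was denied by ACL rule \((?P<rule>[^)]*)\)"
)


def denied_clients(log_text: str) -> list:
    """Return (client IP, rule description) for every ACL denial in log_text."""
    return [(m.group("ip"), m.group("rule")) for m in ACL_DENY.finditer(log_text)]


logs = (
    "SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.10 "
    "was denied by ACL rule (rule ID=20).\n"
    "SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.11 "
    "was denied by ACL rule (default rule)."
)

for ip, rule in denied_clients(logs):
    print(f"{ip} denied by {rule}")
```

The rule text tells you whether a specific rule (rule ID=20) or the default rule denied the client, which indicates where the ACL must be modified.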

6. Identify whether the SSH version on the server is compatible with the client version.

If the following log message occurs, the SSH version on the device is not compatible with that on the client:

SSHS/6/SSHS_VERSION_MISMATCH: SSH client 192.168.30.117 failed to log in because of version mismatch.

If an SSH1 client logs in to the device, execute the display ssh server status command on the device and check the SSH version field.

¡ If the SSH version field displays 1.99, the device is compatible with SSH1 clients. Proceed to the next step.

¡ If the SSH version field displays 2.0, execute the ssh server compatible-ssh1x enable command on the device to enable support for SSH1 clients.
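The decision in this step depends only on the SSH version field, so it can be expressed as a small function. This Python sketch is a hypothetical helper (ssh1_action is not an H3C tool); it assumes the status output is available as text and returns the action described above for an SSH1 client.

```python
def ssh1_action(status_output: str) -> str:
    """Return the action for an SSH1 client based on the 'SSH version' field."""
    for line in status_output.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip() == "SSH version":
            version = value.strip()
            if version == "1.99":
                return "compatible: proceed to the next step"
            if version == "2.0":
                return "execute: ssh server compatible-ssh1x enable"
            return f"unexpected SSH version value: {version}"
    return "SSH version field not found"


print(ssh1_action("Stelnet server: Disable\nSSH version : 2.0"))
```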

7. Verify that the authentication mode and access protocols for the VTY user lines are configured correctly.

[Sysname] line vty 0 63
[Sysname-line-vty0-63] display this
line vty 0 63
 authentication-mode scheme
 user-role network-admin
 idle-timeout 0 0

¡ If the authentication mode or access protocol is configured incorrectly, execute the authentication-mode scheme command to set the authentication mode to scheme, and execute the protocol inbound ssh command to allow SSH access.

¡ If the configuration is correct, proceed to the next step.
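The two checks in this step can be applied to the display this output with a short script. This Python sketch is illustrative: vty_findings is a hypothetical name, it treats a missing authentication-mode line as "not scheme", it assumes protocol inbound accepts ssh or all as values that permit SSH, and it does not flag a missing protocol inbound line (the default setting is assumed to apply). Verify these assumptions against your software version.

```python
def vty_findings(config: str) -> list:
    """Return configuration problems found in 'display this' output of VTY lines."""
    lines = [line.strip() for line in config.splitlines() if line.strip()]
    findings = []

    # A missing authentication-mode line is treated as "not scheme".
    auth_modes = [l.split()[-1] for l in lines if l.startswith("authentication-mode ")]
    if "scheme" not in auth_modes:
        findings.append("authentication mode is not scheme: execute authentication-mode scheme")

    # Assumption: when no "protocol inbound" line is displayed, the default
    # setting applies and is not flagged here.
    protocol_lines = [l.split()[2:] for l in lines if l.startswith("protocol inbound")]
    for protocols in protocol_lines:
        if "ssh" not in protocols and "all" not in protocols:
            findings.append("SSH is not allowed inbound: execute protocol inbound ssh")
    return findings


# VTY configuration from the example above.
sample_vty = """line vty 0 63
 authentication-mode scheme
 user-role network-admin
 idle-timeout 0 0"""

print(vty_findings(sample_vty))
```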

8. Identify whether the number of VTY users on the device has reached the upper limit.

A log message similar to the following indicates that the number of VTY users on the device has reached the upper limit:

SSHS/6/SSHS_REACH_USER_LIMIT: SSH client 192.168.30.117 failed to log in, because the number of users reached the upper limit.

Execute the display line command to identify whether idle VTY user lines are available.

¡ If all VTY user lines are occupied, release idle VTY user lines and try again.

¡ If VTY user line resources are sufficient, proceed to the next step.

9. Identify whether the number of online SSH users has reached the upper limit.

Execute the display ssh server session command to view the SSH sessions on the server, and compare the number of sessions with the maximum number of SSH connections set by the aaa session-limit ssh command.

A log message similar to the following indicates that the number of online SSH users has reached the upper limit:

SSHS/6/SSHS_REACH_SESSION_LIMIT: SSH client 192.168.30.117 failed to log in. The number of SSH sessions is 10, and exceeded the limit (10).

¡ If the upper limit is reached, wait for existing SSH sessions to end, or increase the limit by using the aaa session-limit ssh command.

¡ If the upper limit is not reached, proceed to the next step.
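To see how often and from which clients the session limit is hit, the SSHS_REACH_SESSION_LIMIT messages can be parsed. This Python sketch assumes the exact message format shown in this step; session_limit_hits is a hypothetical helper name.

```python
import re

# Pattern for the SSHS/6/SSHS_REACH_SESSION_LIMIT message shown in this step.
SESSION_LIMIT = re.compile(
    r"SSHS_REACH_SESSION_LIMIT: SSH client (?P<ip>\S+) failed to log in\. "
    r"The number of SSH sessions is (?P<count>\d+), "
    r"and exceeded the limit \((?P<limit>\d+)\)"
)


def session_limit_hits(log_text: str) -> list:
    """Return (client IP, session count, configured limit) for each message."""
    return [
        (m.group("ip"), int(m.group("count")), int(m.group("limit")))
        for m in SESSION_LIMIT.finditer(log_text)
    ]


log_line = (
    "SSHS/6/SSHS_REACH_SESSION_LIMIT: SSH client 192.168.30.117 failed to log in. "
    "The number of SSH sessions is 10, and exceeded the limit (10)."
)
print(session_limit_hits(log_line))
```

If the reported limit is lower than the number of concurrent SSH users you need, adjust it with the aaa session-limit ssh command.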

10. Verify that local key pairs have been generated on the server.

To prevent server spoofing, the client authenticates the server during login. The client first checks whether the public key sent by the server matches the server public key saved locally, and then uses this public key to verify the server's digital signature. If the client has not saved the server's public key, or the saved public key is incorrect, server authentication fails and the client cannot log in to the server. Therefore, before the client logs in, create key pairs on the server and save the correct server public key on the client.

A client uses only one of the DSA, ECDSA, and RSA public key algorithms to authenticate the server, and different clients support different algorithms. To ensure successful client login, generate DSA, ECDSA, and RSA key pairs on the server as a best practice.

Execute the display public-key local public command on the device to view the local public key information.

¡ If any of the DSA, ECDSA, and RSA key pairs does not exist, execute the public-key local create command to create the missing key pairs one by one. Make sure the public key generated on the server is saved on the client.

¡ If these key pairs exist, proceed to the next step.
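Given the key pair types you find in the display public-key local public output, the missing ones can be listed as commands. This Python sketch assumes you record the existing types by hand (the output format is not parsed here) and assumes the basic command forms public-key local create dsa, public-key local create ecdsa, and public-key local create rsa with default parameters; ECDSA might accept additional curve options depending on the software version. The helper name is hypothetical.

```python
# Key pair types recommended by this guide for the SSH server.
RECOMMENDED_KEY_TYPES = ("dsa", "ecdsa", "rsa")


def missing_key_commands(existing_types) -> list:
    """Return a create command for each recommended key pair type not yet present.

    existing_types: key pair types (for example "RSA") read manually from the
    display public-key local public output. Case does not matter.
    """
    present = {key_type.strip().lower() for key_type in existing_types}
    return [
        f"public-key local create {key_type}"
        for key_type in RECOMMENDED_KEY_TYPES
        if key_type not in present
    ]


print(missing_key_commands(["RSA"]))
```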

11. Identify whether an SSH user is configured and whether the correct service type and authentication method are specified for the SSH user.

SSH supports Stelnet, SFTP, NETCONF, and SCP service types.

Identify whether the SSH user is created correctly based on the authentication method on the server.

If the server uses password authentication, perform one of the following tasks:

¡ For local authentication, configure a local user on the device.

¡ For remote authentication, configure an SSH user on a remote authentication server, for example, a RADIUS server.

For remote authentication, you do not need to create an SSH user on the device. However, if such an SSH user has been created, make sure the correct service type and authentication method are specified for it.

Perform the following operations based on the check results:

¡ If no SSH user is created and none is required, proceed to the next step. If an SSH user is required but has not been created, execute the ssh user command to create one.

¡ If an SSH user has been created, check the service type and authentication method of the SSH user.

- Execute the display ssh user-information command on the device and check the Service-type and Authentication-type fields of the SSH user. The service type must match the client type (Stelnet, SFTP, SCP, or NETCONF over SSH), and the authentication method must be password.

- If necessary, execute the ssh user command in system view to modify the service type and authentication method of the SSH user.
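The two rules in this step (service type matches the client type, authentication method is password) can be encoded as a check. This Python sketch is hypothetical: the SshUserInfo fields hold values you read manually from the Service-type and Authentication-type fields, the client type is given with the same names as the service type (Stelnet, SFTP, SCP, or NETCONF), and the comparison is case-insensitive because the exact display format is not shown here.

```python
from dataclasses import dataclass


@dataclass
class SshUserInfo:
    """Values read from the Service-type and Authentication-type fields."""
    username: str
    service_type: str
    authentication_type: str


def ssh_user_problems(user: SshUserInfo, client_type: str) -> list:
    """Apply the password-authentication rules from this step to one SSH user."""
    problems = []
    if user.service_type.strip().lower() != client_type.strip().lower():
        problems.append(
            f"service type {user.service_type!r} does not match client type {client_type!r}"
        )
    if user.authentication_type.strip().lower() != "password":
        problems.append(
            f"authentication method {user.authentication_type!r} is not password"
        )
    return problems


# Hypothetical user: configured for Stelnet, but the login uses an SFTP client.
user = SshUserInfo("admin1", "Stelnet", "password")
print(ssh_user_problems(user, "SFTP"))
```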

12. Verify that the SCP or SFTP working directory is correct.

When the service type of an SSH user is SCP or SFTP, an authorized working directory must be specified for the SSH user. If the specified directory does not exist, the SCP or SFTP client fails to connect to the SCP or SFTP server through that SSH user. For a password authentication user, verify that the working directory authorized by AAA exists.

¡ If the working directory does not exist, execute the authorization-attribute work-directory directory-name command in local user view to specify an existing directory as the authorized working directory.

¡ If the working directory exists, proceed to the next step.

13. If the issue persists, collect the following information and contact Technical Support:

¡ Results of each step.

¡ The configuration files, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

Module: HH3C-SSH-MIB

hh3cSSHVersionNegotiationFailure (1.3.6.1.4.1.25506.2.22.1.3.0.2)

Log messages

· SSHS/5/SSH_ACL_DENY

· SSHS/6/SSHS_ALGORITHM_MISMATCH

· SSHS/6/SSHS_REACH_SESSION_LIMIT

· SSHS/6/SSHS_REACH_USER_LIMIT

· SSHS/6/SSHS_SRV_UNAVAILABLE

· SSHS/6/SSHS_VERSION_MISMATCH

Failure to log in to the device as the SSH server through publickey authentication

Symptom

When the device acts as the SSH server, a user fails to log in to the device through publickey authentication.

Common causes

The following are the common causes of this type of issue:

· The SSH client cannot reach the device.

· The user's public key on the server is configured incorrectly.

· The device is not enabled with the SSH server function.

· The SSH user is not configured on the SSH server.

· An SSH login control ACL is specified on the device, but the ACL does not permit the IP address of the SSH client.

· The number of SSH login users on the device reaches the upper limit.

· The SSH version on the device is not compatible with the client.

· The service type or authentication method for SSH users is configured incorrectly.

· No local key pairs are generated on the device.

· The SCP or SFTP working directory is incorrect.

Troubleshooting flow

Figure 164 shows the troubleshooting flowchart.

Figure 164 Flowchart for troubleshooting failure to log in to the device as the server through publickey authentication

Solution

1. Verify that the client can ping the device.

Execute the ping command to check network connectivity.

¡ If the ping fails, locate and resolve the connectivity issue as described in the ping troubleshooting guide, and make sure the SSH client can ping the device.

¡ If the ping succeeds, proceed to step 2.

2. Identify whether the user's public key configured on the server matches the private key used by the user.

The SSH client might support multiple public key algorithms, each of which uses a different asymmetric key pair. User authentication succeeds only when the type of the user public key saved on the server matches the type of the private key that the user uses to log in. For example, if the server saves a DSA public key for a user, authentication succeeds when the user logs in with the matching DSA private key, but fails when the user logs in with an RSA private key.

Execute the display public-key peer command on the device to view the client public keys saved on the device, and identify whether the type of the client public key matches the type of the private key used by the login user.

¡ If the types do not match, log in with the private key that matches the saved client public key. Alternatively, generate a key pair of the required type on the client (for example, by executing the public-key local create command if the client is an H3C device) and make sure the matching client public key is saved on the server.

¡ If the types match, proceed to step 3.
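If the client uses OpenSSH-format key files, the key type can be read from the algorithm name at the start of the user's public key line, which has the same type as the private key. This Python sketch compares that type with the client key type saved on the server. It assumes you read the saved type (DSA, ECDSA, or RSA) manually from the display public-key peer output, whose format is not parsed here; the helper names and the truncated key strings are hypothetical placeholders.

```python
# Algorithm names at the start of an OpenSSH-format public key line.
OPENSSH_KEY_FAMILIES = {
    "ssh-dss": "DSA",
    "ssh-rsa": "RSA",
    "ecdsa-sha2-nistp256": "ECDSA",
    "ecdsa-sha2-nistp384": "ECDSA",
    "ecdsa-sha2-nistp521": "ECDSA",
}


def key_family(openssh_public_key: str) -> str:
    """Return DSA, RSA, ECDSA, or UNKNOWN for an OpenSSH public key line."""
    parts = openssh_public_key.split()
    if not parts:
        return "UNKNOWN"
    return OPENSSH_KEY_FAMILIES.get(parts[0], "UNKNOWN")


def key_types_match(saved_type_on_server: str, user_public_key: str) -> bool:
    """True if the key type saved on the server matches the user's key type."""
    return saved_type_on_server.strip().upper() == key_family(user_public_key)


# Server saved a DSA key, but the user logs in with an RSA key pair.
print(key_types_match("DSA", "ssh-rsa AAAAB3NzaC1yc2E user@client"))
```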

3. Verify that the SSH server function is enabled.

If a log message similar to the following is generated, the corresponding SSH server function (the SCP server function in this example) is disabled:

SSHS/6/SSHS_SRV_UNAVAILABLE: The SCP server is disabled or the SCP service type is not supported.

Execute the display ssh server status command to identify whether the Stelnet server function, SFTP server function, NETCONF over SSH server function, and SCP server function are enabled as needed.

<Sysname> display ssh server status
Stelnet server: Disable
SSH version : 2.0
SSH authentication-timeout : 60 second(s)
SSH server key generating interval : 0 hour(s)
SSH authentication retries : 3 time(s)
SFTP server: Disable
SFTP Server Idle-Timeout: 10 minute(s)
NETCONF server: Disable
SCP server: Disable

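¡ If a required SSH server function is disabled, enable it. The following commands enable the Stelnet, SFTP, SCP, and NETCONF over SSH server functions, respectively. Execute only the commands for the services you need: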

<Sysname> system-view
[Sysname] ssh server enable
[Sysname] sftp server enable
[Sysname] scp server enable
[Sysname] netconf ssh server enable

¡ If the SSH server function is enabled, proceed to step 4.

4. Identify whether the SSH service port on the client matches that on the server.

If the SSH service port on the server changes, but the client still uses the default SSH service port, the SSH login will fail.

For example, if the client is an H3C device, the following error message is displayed: Failed to connect to host 10.1.1.1 port 100.

¡ If the ports do not match, configure the client to use the SSH service port configured on the server.

¡ If the SSH service port on the client matches that on the server, proceed to step 5.

5. Identify whether an SSH login control ACL is configured.

Identify whether an SSH login control ACL is specified by the ssh server acl command.

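¡ If an SSH login control ACL is specified, identify whether the ACL permits the client. First, execute the ssh server acl-deny-log enable command to enable logging for SSH login attempts denied by the SSH login control ACL.

- If log messages similar to the following are generated, the specified ACL denies the IP address of the client. Modify the ACL to permit the IP address of the client.

SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.10 was denied by ACL rule (rule ID=20).

SSHS/5/SSH_ACL_DENY: The SSH connection request from 181.1.1.11 was denied by ACL rule (default rule).

- If the specified ACL already permits the IP address of the client, proceed to the next step.

¡ If no SSH login control ACL is configured, proceed to the next step.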