02-Virtual Technologies


Troubleshooting virtual technologies

IRF issues

IRF setup failure

Symptom

Several devices cannot form an IRF fabric, or a new member device cannot join an existing IRF fabric.

Common causes

The following are the common causes of this type of issue:

·     When you use member devices to set up a new IRF fabric, the total number of IRF member devices exceeds the upper limit. When you add a new member device to an existing IRF fabric, the number of existing IRF member devices has reached the upper limit in that IRF fabric.

·     The device configuration does not meet the IRF setup requirements.

·     The IRF physical interfaces, cables, and physical topology do not meet the IRF setup requirements. As a result, the IRF links cannot come up.

Troubleshooting flow

Figure 1 shows the troubleshooting flowchart.

Figure 1 Flowchart for troubleshooting IRF setup failure

 

Solution

IMPORTANT:

This section only covers the routine requirements for setting up an IRF fabric. For more information about the requirements for setting up an IRF fabric, see IRF configuration in the configuration guides for the product.

 

1.     Identify whether the number of IRF member devices has reached the maximum value supported by the system.

Execute the display irf command to view the number of member devices in the current IRF fabric. If the number of IRF member devices has reached the maximum value supported by the system, you cannot add any member device to the IRF fabric.

The maximum number of member devices in an IRF fabric varies by device model. For example, an S12500-AF IRF fabric can contain a maximum of four member devices.

2.     Verify that all member devices run the same version of software.

Execute the display version command to display the current software version on each device. Only devices running the same software version can form an IRF fabric.

Typically, the IRF auto-update feature (enabled by default) can automatically synchronize the software version of a member device with the software version of the master device. However, the synchronization might fail when the gap between the software versions is large. In this case, you must manually upgrade the software of that member device.

If the member device has two MPUs, you must upgrade software for both the MPUs to ensure software consistency across them.

3.     Verify that the IRF configuration on each member device meets the IRF setup requirements.

a.     Verify that all member devices are operating in IRF mode.

Some products are shipped in IRF mode and do not support mode conversion. Some products are shipped in standalone mode and support mode conversion. If a device supports the display irf link or display irf topology command, the device is operating in IRF mode. If a device does not support either of the commands, the device is operating in standalone mode. To enable IRF mode for the device, execute the chassis convert mode irf command in system view.

<Sysname> display irf ?

  >              Redirect it to a file

  >>             Redirect it to a file in append mode

  configuration  IRF configuration that will be valid after reboot

  link           Display link status

  topology       Topology information

  |              Matching output

  <cr>

b.     Verify that the member ID of each member device is unique across the IRF fabric.

Execute the display irf command to display the member IDs of the member devices in the IRF fabric. Each member device in the IRF fabric must use a unique member ID. Devices that use the same member ID cannot establish an IRF fabric or join the same IRF fabric. The default member ID for a device is 1. In standalone mode, you can change the IRF member ID of a device by using the irf member command. In IRF mode, you can change the IRF member ID of a device by using the irf member renumber command. For the new member ID to take effect, you must save the configuration and reboot the device.

c.     Verify that each member device is shipped with a unique bridge MAC address.

Member devices shipped with the same bridge MAC address cannot join the same IRF fabric. Typically, each device is shipped with a unique bridge MAC address across the network. If IRF setup fails and the Failed to stack because of the same bridge MAC addresses message is generated, two devices are shipped with the same bridge MAC address. In this case, use the irf mac-address command to change the bridge MAC address on one of the devices. (Support for the irf mac-address command depends on the device model.)

d.     Verify that all member devices in the same IRF fabric use the same IRF domain ID.

The IRF domain ID does not affect IRF fabric setup and merge, but it affects multi-active detection (MAD). To ensure that MAD can operate correctly, make sure all member devices in the same IRF fabric use the same IRF domain ID. By default, the IRF domain ID is 0. To obtain the IRF domain ID of a device, execute the display irf command on that device and check the value in the Domain ID field of the command output. If the IRF domain ID of a device is different from that of the other devices, execute the irf domain command to change the IRF domain ID on the device.
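The consistency rules in steps 2 and 3 can be cross-checked offline after the values have been copied by hand from the display version and display irf output of each device. The following Python sketch is illustrative only. The MemberInfo record, its field names, and the find_setup_conflicts function are hypothetical and are not part of any H3C tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class MemberInfo:
    """Values copied by hand from one device (hypothetical record layout)."""
    name: str
    software_version: str
    member_id: int
    bridge_mac: str
    domain_id: int


def find_setup_conflicts(members):
    """Return human-readable problems that would block or impair IRF setup."""
    problems = []

    # Step 2: only devices running the same software version can form a fabric.
    versions = {m.software_version for m in members}
    if len(versions) > 1:
        problems.append(f"software versions differ: {sorted(versions)}")

    # Step 3b: member IDs must be unique across the fabric.
    id_counts = Counter(m.member_id for m in members)
    for member_id, count in sorted(id_counts.items()):
        if count > 1:
            problems.append(f"member ID {member_id} is used by {count} devices")

    # Step 3c: bridge MAC addresses must be unique (compared case-insensitively).
    mac_counts = Counter(m.bridge_mac.lower() for m in members)
    for mac, count in sorted(mac_counts.items()):
        if count > 1:
            problems.append(f"bridge MAC {mac} is used by {count} devices")

    # Step 3d: the domain ID should match so that MAD operates correctly.
    domains = {m.domain_id for m in members}
    if len(domains) > 1:
        problems.append(f"IRF domain IDs differ: {sorted(domains)}")

    return problems
```

An empty list means that none of these four checks found a conflict. It does not prove that the fabric can form, because the remaining steps still apply.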

4.     Verify that the IRF ports are in up state.

An IRF port is a logical interface that connects IRF member devices. To use an IRF port, you must bind a minimum of one physical interface to it. To obtain the status of IRF ports, execute the display irf topology command and check the value in the Link field of the command output.

<Sysname> display irf topology

                              Topology Info

 -------------------------------------------------------------------------

               IRF-Port1                IRF-Port2

 MemberID    Link       neighbor      Link       neighbor    Belong To

 2           DIS        ---           UP         1           5e40-08d9-0104

 1           UP         2             DIS        ---         5e40-08d9-0104

¡     If the value of the Link field is UP for an IRF port on a member device, the IRF port is correctly connected and no action is required.

¡     If the value of the Link field is DIS for an IRF port on a member device, no IRF physical interfaces have been bound to the IRF port. If binding IRF physical interfaces to the IRF port is required, execute the port group interface command in IRF port view to bind IRF physical interfaces to the IRF port.

¡     If the value of the Link field is DOWN for an IRF port on a member device, execute the display irf link command to examine whether the IRF physical interfaces bound to the IRF port are in UP state.

-     If a minimum of one IRF physical interface is up when the IRF port is down, the configuration of the IRF port might not be activated. To activate the IRF port configuration, execute the irf-port-configuration active command in system view.

-     If no IRF physical interfaces are in UP state, proceed to step 5 to troubleshoot the IRF physical interface issue.

¡     If the value of the Link field is TIMEOUT for an IRF port on a member device, the IRF hello packets have timed out and the IRF link has communication issues. Perform the following tasks to locate the timeout issue of IRF packets:

-     Identify whether the IRF packet exchange failure is caused by an anomaly of the neighboring IRF port. For this purpose, log in to the neighboring device at the other end of the IRF link, execute the display irf topology and display irf link commands on the neighboring device, and then locate the issue based on the command output.

-     Verify that no network loops exist on the IRF fabric, as they lead to packet loss. To identify whether a network loop exists, execute the display counters rate inbound interface command to display the packet rate statistics of the IRF physical interfaces and examine whether a packet storm has occurred on the IRF link. If a packet storm exists, check for a physical loop and examine whether the VLAN and STP settings are correct. If a physical loop exists or the settings are incorrect, remove the loop or correct the settings to resolve the packet storm issue.

-     Execute the display device command to examine whether the switching fabric modules are operating correctly. If not, first troubleshoot the issue with the switching fabric module.

¡     If the value of the Link field is ISOLATE for an IRF port on a member device, the member device is isolated. In this case, execute the display logbuffer | include STM stackability check command, and then proceed according to the command output.

-     If the command output includes the STM stackability check: Product series is inconsistency message, the model of the member device does not meet the IRF setup requirements. In this case, proceed to step 7.

-     If the command output includes the STM stackability check: Product xxx is inconsistency message, where xxx might represent the system operating mode or other settings that require consistency across member devices, the current system parameter configuration does not meet the IRF setup requirements. In this case, proceed to step 8.
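When the display irf topology output has been saved to a text file, the Link values described in step 4 can be extracted and paired with the follow-up for each state. The Python sketch below assumes the six-column row layout shown in the sample output of this step. Other software versions might print the table differently, so treat the parser as an assumption rather than a supported tool. The function names and the wording of the hints are illustrative.

```python
# Follow-up hints, summarized from the Link field values described in step 4.
NEXT_ACTION = {
    "UP": "no action required",
    "DIS": "no physical interface bound; bind one if the port is needed",
    "DOWN": "check the bound physical interfaces with display irf link",
    "TIMEOUT": "hello packets timed out; check the neighbor, loops, and fabric modules",
    "ISOLATE": "member isolated; check the STM stackability check log messages",
}


def parse_irf_topology(output):
    """Return (member_id, irf_port, link_state) tuples from the command output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns and start with a numeric member ID.
        # The title, separator, and header lines do not match this shape.
        if len(fields) == 6 and fields[0].isdigit():
            member_id = int(fields[0])
            rows.append((member_id, 1, fields[1]))
            rows.append((member_id, 2, fields[3]))
    return rows


def summarize(output):
    """Pair every IRF port with the follow-up suggested for its Link value."""
    return [
        (member, port, state, NEXT_ACTION.get(state, "unknown state"))
        for member, port, state in parse_irf_topology(output)
    ]
```

For the sample output in this step, the parser reports IRF-port 1 of member 2 and IRF-port 2 of member 1 as DIS, and the other two ports as UP.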

5.     Check the state of IRF physical interfaces and verify that a minimum of one IRF physical interface is up for each IRF port.

Execute the display irf link command to check the state of IRF physical interfaces.

¡     If the value of the Interface field is disable for an IRF port, no IRF physical interfaces have been bound to the IRF port.

¡     If the value of the Interface field for an IRF port is one or multiple physical interface names, continue to check the Status field. The value and meaning of the Status field are as follows:

-     UP—An IRF physical link is up. In this state, no action is required.

-     DOWN—An IRF physical link is down. In this case, verify that the transceiver module and the fiber or cable of the IRF physical interface are operating correctly. You must use a physical interface that meets the product requirements as an IRF physical interface, and use a connection medium that meets the product requirements to connect the IRF physical interface. If the transceiver module and the fiber or cable are operating correctly, proceed to step 6.

-     ADM—An IRF physical interface is shut down by using the shutdown command. In this state, the IRF physical interface is administratively down. To bring up the IRF physical interface, you must execute the undo shutdown command.

-     ABSENT—An IRF physical interface does not exist. You can insert the card or expansion interface module that hosts the interface.

6.     Verify that the IRF physical connections meet the IRF connection requirements.

Perform the following operations to locate an IRF physical connection issue:

a.     On each member device, execute the display irf configuration command to view the binding relationship between IRF ports and IRF physical interfaces. Verify that the IRF physical interfaces bound to IRF ports are consistent with those on the IRF physical connections. If not, reconfigure the IRF port bindings or reconnect physical interfaces.

b.     Verify that the IRF physical interfaces are correctly connected. Make sure the IRF physical interfaces of IRF-port 1 on one member device are connected to the IRF physical interfaces of IRF-port 2 on another member device. If the IRF fabric contains only two member devices, you must connect them in a daisy-chain topology rather than a ring topology.
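The cabling rules in step 6 can be checked against a hand-recorded list of links. The following Python sketch is a minimal illustration. The input format (pairs of member ID and IRF port number) and the check_irf_cabling function are hypothetical and are not produced by any device command.

```python
def check_irf_cabling(links):
    """Check hand-recorded IRF cabling against the port-pairing rules.

    Each link is ((member_a, irf_port_a), (member_b, irf_port_b)).
    """
    problems = []
    for (member_a, port_a), (member_b, port_b) in links:
        if member_a == member_b:
            problems.append(f"member {member_a} is cabled to itself")
            continue
        # IRF-port 1 on one member must face IRF-port 2 on the other member.
        if {port_a, port_b} != {1, 2}:
            problems.append(
                f"member {member_a} IRF-port {port_a} faces "
                f"member {member_b} IRF-port {port_b}"
            )

    # A two-member fabric must be a daisy chain: each member uses one IRF port.
    members = {member for link in links for (member, _) in link}
    if len(members) == 2:
        used_ports = {(member, port) for link in links for (member, port) in link}
        if len(used_ports) > 2:
            problems.append("two-member fabric uses both IRF ports on a member (ring)")

    return problems
```

Several physical links between the same pair of IRF ports are accepted, because more than one physical interface can be bound to an IRF port.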

7.     Verify that the hardware of the member devices meets the IRF setup requirements.

You must use hardware that meets the IRF setup requirements to set up an IRF fabric. For example, the device model, MPUs, interface modules, and IRF physical interfaces must meet the IRF setup requirements. You can perform the following tasks to determine whether the device hardware meets the IRF setup requirements:

# Execute the display version command to check the device model.

<Sysname> display version

H3C Comware Software, Version 7.1.070, Alpha 704228

Copyright (c) 2004-2021 New H3C Technologies Co., Ltd. All rights reserved.

H3C S12508X-AF uptime is 0 weeks, 0 days, 2 hours, 31 minutes

Last reboot reason : Cold reboot

...

# Execute the display device command to check the models of the MPUs and interface modules.

<Sysname> display device

Slot   Type                State    Subslot  Soft Ver             Patch Ver

1/0    LSXM1SUPB1          Master   0        S12508X-AF-704228    None

1/1    NONE                Absent   0        NONE                 None

1/2    NONE                Absent   0        NONE                 None

1/3    LSXM1CGQ18QGHB1     Normal   0        S12508X-AF-704228    None

...

# Execute the display interface command to check the rate and type of each IRF physical interface.

<Sysname> display interface ten-gigabitethernet 0/0/6

Ten-GigabitEthernet0/0/6

Current state: UP

IP packet frame type: Ethernet II, hardware address: 4077-a9ee-ce85

Description: Ten-GigabitEthernet0/0/6 Interface

Bandwidth: 10000000 kbps

Loopback is not set

Media type is optical fiber, port is 10G_BASE_SR_SFP

10Gbps-speed mode, full-duplex mode

Link speed type is autonegotiation, link duplex type is autonegotiation

...

8.     Verify that the system parameter settings meet the IRF setup requirements.

To set up an IRF fabric, all member devices must use the same system parameter settings, including the same system operating mode, VXLAN hardware resource mode, route hardware resource mode, and maximum number of ECMP routes. (The restrictions vary by device model.)

¡     To display the system operating mode on a device, use the display system-working-mode command. To change the system operating mode of the device, use the system-working-mode command. For the mode change to take effect, you must save the configuration and reboot the device.

¡     To display the hardware resource modes on a device, use the display hardware-resource command. To change the VXLAN and route hardware resource modes of the device, use the hardware-resource vxlan and hardware-resource routing-mode commands, respectively. For the mode changes to take effect, you must save the configuration and reboot the device.

¡     To display the maximum number of IPv4 ECMP routes and the maximum number of IPv6 ECMP routes supported by the system, use the display max-ecmp-num and display ipv6 max-ecmp-num commands, respectively. To change the maximum number of IPv4 ECMP routes and the maximum number of IPv6 ECMP routes, use the max-ecmp-num and ipv6 max-ecmp-num commands, respectively. For the changes to take effect, you must save the configuration and reboot the device.
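After the display commands in step 8 have been run on every member device, the collected values can be compared in one pass. The Python sketch below assumes the values were recorded by hand in a dictionary keyed by member ID. The parameter labels in the usage example are placeholders chosen for illustration, not device output.

```python
def find_inconsistent_parameters(settings_by_member):
    """Map each parameter to its per-member values when members disagree."""
    # Collect every parameter name recorded for any member.
    parameters = set()
    for settings in settings_by_member.values():
        parameters.update(settings)

    mismatches = {}
    for parameter in sorted(parameters):
        # A parameter missing on a member is recorded as None, which also
        # counts as a mismatch.
        values = {
            member: settings.get(parameter)
            for member, settings in settings_by_member.items()
        }
        if len(set(values.values())) > 1:
            mismatches[parameter] = values
    return mismatches


# Usage example with placeholder values.
recorded = {
    1: {"system operating mode": "mode-a", "max IPv4 ECMP routes": 8},
    2: {"system operating mode": "mode-b", "max IPv4 ECMP routes": 8},
}
print(find_inconsistent_parameters(recorded))
```

Every parameter returned by the function must be aligned across the member devices, and each change takes effect only after the configuration is saved and the device is rebooted.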

9.     If the issue persists, collect the following information and contact Technical Support:

¡     Results of each step.

¡     The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

Module name: HH3C-STACK-MIB

·     hh3cStackPhysicalIntfLinkDown(1.3.6.1.4.1.25506.2.91.6.0.8)

·     hh3cStackPhysicalIntfRxTimeout (1.3.6.1.4.1.25506.2.91.6.0.9)

Log messages

·     STM/3/STM_LINK_DOWN

·     STM/2/STM_LINK_TIMEOUT

·     STM/6/STM_LINK_UP

·     STM/4/STM_SAMEMAC

·     STM/3/STM_SOMER_CHECK

Unexpected reboot of an IRF member device

Symptom

The master device or a subordinate device in an IRF fabric reboots unexpectedly. As a result, the IRF fabric splits.

Common causes

The following are the common causes of this type of issue:

·     The subordinate device automatically reboots to load startup software images from the master device.

·     IRF merge causes the subordinate device to reboot.

·     A software or hardware fault causes the device to reboot unexpectedly in an attempt to fix the fault.

Troubleshooting flow

Figure 2 shows the troubleshooting flowchart.

Figure 2 Flowchart for troubleshooting unexpected reboot of an IRF member device

 

 

Solution

1.     Identify whether the rebooted device is a subordinate device.

¡     If the device is a subordinate device, proceed to step 2.

¡     If the device is the master device, proceed to step 4.

2.     Identify whether the reboot is caused by the software auto-update feature.

¡     If the reboot is caused by the software auto-update feature, no action is required.

¡     If the reboot is not caused by the software auto-update feature, proceed to step 3.

To identify whether the reboot of the subordinate device is caused by the software auto-update feature, execute the display system internal irf msg command in probe view. If the command output includes the Version is different, and the sender CPU MAC is xxxx-xxxx-xxxx (chassis xx slot xx). message, the reboot of the subordinate device with the CPU MAC of xxxx-xxxx-xxxx is caused by the software auto-update feature.
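If the probe-view output has been saved to a file, the auto-update message can be located with a regular expression. The Python sketch below matches the message text quoted in this step. The exact spacing and any prefix on real output lines are assumptions, so adjust the pattern if the saved output differs.

```python
import re

# Message text as quoted in step 2, with the MAC, chassis, and slot captured.
AUTO_UPDATE_PATTERN = re.compile(
    r"Version is different, and the sender CPU MAC is "
    r"([0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}) "
    r"\(chassis (\d+) slot (\d+)\)"
)


def find_auto_update_senders(output):
    """Return (cpu_mac, chassis, slot) for each matching message."""
    return [
        (mac.lower(), int(chassis), int(slot))
        for mac, chassis, slot in AUTO_UPDATE_PATTERN.findall(output)
    ]
```

A non-empty result identifies the subordinate devices whose reboot was caused by the software auto-update feature. An empty result means that the message was not found and step 3 applies.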

3.     Identify whether the reboot is caused by an IRF merge.

¡     If the reboot is caused by an IRF merge, locate the causes of the IRF split and merge, and resolve the underlying issues so that they do not trigger another IRF split and merge.

¡     If the reboot is not caused by an IRF merge, proceed to step 4.

To identify whether the reboot of the subordinate device is caused by an IRF merge:

¡     Execute the display kernel reboot command on the IRF fabric to obtain the device reboot reason after the device reboots. If the value for the Reason field is 0x7, the device reboots due to an IRF merge. The value for the Slot field represents the number of the slot that triggers the reboot, and the value for the Target Slot field represents the number of the slot that has been rebooted.

<Sysname> display kernel reboot 1

--------------------- Reboot record 1 ---------------------

Recorded at           : 2021-12-06  00:10:05.440616

Occurred at           : 2021-12-06  00:10:05.440616

Reason                : 0x7

Thread                : STM_Main (TID: 232)

Context               : thread context

Slot                  : 1

Target Slot           : 2

Cpu                   : 0

VCPU ID               : 2

Kernel module info    : module name (system) module address (0xffffffffc0074000)

                        module name (addon) module address (0xffffffffc0008000)

¡     Execute the display system internal irf msg | include reboot command in probe view on the IRF fabric. If the master device has sent a reboot message, the reboot of the subordinate device is caused by an IRF merge.

19> Send reboot pkt, src_addr 5e40-08d9-0104 (chassis 1 slot 1), at 2022/1/5 15:42:48:386
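A saved reboot record can be checked for the IRF merge reason code programmatically. The Python sketch below assumes the "field : value" layout shown in the display kernel reboot sample in this step. The function names are illustrative.

```python
def parse_reboot_record(output):
    """Return the 'field : value' pairs of one reboot record as a dictionary."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so timestamps keep their own colons.
        key, separator, value = line.partition(":")
        if separator and key.strip():
            fields[key.strip()] = value.strip()
    return fields


def is_irf_merge_reboot(fields):
    """Reason 0x7 indicates that the device rebooted because of an IRF merge."""
    return fields.get("Reason", "").lower() == "0x7"
```

For the sample record in this step, the Reason field is 0x7, the Slot field is 1 (the slot that triggered the reboot), and the Target Slot field is 2 (the slot that was rebooted). Lines without a colon, such as the record header and the continuation line of the Kernel module info field, are ignored.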

4.     Examine whether the reboot is caused by a software or hardware fault.

Execute the display version command, check the Reboot Cause field for the reboot cause, and handle the reboot issue according to the reboot cause as shown in Table 1.

<Sysname> display version

...

Reboot Cause  :     ColdReboot

[SubSlot 0] 24GE+4SFP Plus+POE

Table 1 Device reboot causes and recommended actions

| Value for the Reboot Cause field | Reboot cause description | Recommended actions |
|---|---|---|
| AutoUpdateReboot | The reboot was caused by an automatic software upgrade. | No action is required. |
| BootwareBackupReboot | Bootware backup area reboot. | Collect log messages and diagnostic messages, and then contact Technical Support for help. |
| ColdReboot | The reboot was caused by a power cycle. | Check the power supply environment of the device to ensure that the power supply module can provide power correctly to the device. |
| CryptographicModuleSelftestsFailedReboot | The reboot was caused by an algorithm library self-test failure. | Upgrade the software version as soon as possible. |
| CryptotestFailReboot | The reboot was caused by a cryptographic algorithm library self-check failure. | Upgrade the software version as soon as possible. |
| DeadLoopReboot | The reboot was caused by a kernel thread dead loop. | Collect log messages, diagnostic messages, and the command output from the display kernel deadloop 20 verbose command executed for the reboot slot, and then contact Technical Support for help. |
| DEVHandShakeReboot | The reboot was caused by a device management handshake failure. | Execute the display device command to identify whether the active MPU is in Normal state. If the state is not Normal, the MPU might have failed. You must resolve the MPU issue first. |
| GoldMonReboot | The Generic OnLine Diagnostics (GOLD) module detected an exception. | Perform the following operations to locate the reboot cause: 1. Execute the display diagnostic content command and check the Correct-action field to confirm that the corrective action is reboot. Then, obtain the time when the device was rebooted and troubleshoot issues that occurred around that time. 2. Execute the display diagnostic event-log command to display GOLD log entries. 3. Locate the reboot cause based on the command output and resolve the issue. |
| IRFMergeReboot | The reboot was caused by an IRF merge. | An IRF link failure can cause an IRF split. Once the IRF link is recovered, the IRF fabric will automatically merge. To prevent the same issue from causing an IRF split and merge again, locate and resolve the issue. |
| KernelAbnormalReboot | A CPU, host memory, or software issue led to a system kernel error. | Collect log messages, diagnostic messages, and the command output from the display kernel exception 10 verbose and display kernel reboot 20 verbose commands, and then contact Technical Support for help. |
| KeyReboot | The RESET key was pressed. | Avoid accidental operations. |
| LicenseTimeoutReboot | The license has expired. | Install a formal license as soon as possible. |
| MasterLostReboot | The master slot was rebooted while the current slot was performing a bulk backup operation. | Collect log messages and diagnostic messages, and then contact Technical Support for help. |
| MemoryexhaustReboot | The amount of free memory is lower than the threshold value. | Identify the cause of high memory usage and resolve the high memory usage fault accordingly. For example, too many ACL entries can cause high memory usage. |
| PdtReboot | The reboot was required by the driver. | Collect log messages and diagnostic messages, and then contact Technical Support for help. |
| SelfReboot | The current slot was reset. | Collect log messages and diagnostic messages, and then contact Technical Support for help. |
| StandbyCannotUpdateReboot | The standby MPU cannot be upgraded to the active MPU. | Collect log messages and diagnostic messages, and then contact Technical Support for help. |
| StandbySwitchReboot | The original active MPU was rebooted after an active/standby switchover. | Identify the cause of the active/standby switchover and resolve the fault that caused it to prevent another unexpected active/standby switchover. For example, a software upgrade can cause an active/standby switchover. |
| UserReboot | The reboot was caused by a manual operation through the CLI, the network manager, or the Web interface. | No action is required. |
| WarmReboot | The reboot might be caused by various reasons, for example, poor contact of board pins. | Collect log messages and diagnostic messages, and then contact Technical Support for help. |
| WatchDogReboot | The watchdog detected a system fault, for example, a CPU, memory, software, or hardware fault. | Use the display hardware-failure-detection command to locate the cause of the fault based on the command output, and troubleshoot the fault. |
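The lookup in Table 1 can be partly automated for saved display version output. The Python sketch below extracts the Reboot Cause value and sorts it into three coarse groups summarized from Table 1. The grouping, the function names, and the assumed "Reboot Cause : value" line format (taken from the sample output in this step) are illustrative, and Table 1 remains the reference for the detailed actions.

```python
import re

# Causes for which Table 1 states that no action is required.
NO_ACTION = {"AutoUpdateReboot", "UserReboot"}

# Causes for which Table 1 recommends collecting information and
# contacting Technical Support.
CONTACT_SUPPORT = {
    "BootwareBackupReboot", "DeadLoopReboot", "KernelAbnormalReboot",
    "MasterLostReboot", "PdtReboot", "SelfReboot",
    "StandbyCannotUpdateReboot", "WarmReboot",
}


def extract_reboot_cause(output):
    """Return the value of the Reboot Cause field, or None if it is absent."""
    match = re.search(r"^\s*Reboot Cause\s*:\s*(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def triage(cause):
    """Return a coarse next step for a Reboot Cause value."""
    if cause is None:
        return "Reboot Cause field not found in the output"
    if cause in NO_ACTION:
        return "no action required"
    if cause in CONTACT_SUPPORT:
        return "collect log and diagnostic messages, then contact Technical Support"
    # Every other value needs local investigation as described in Table 1.
    return "follow the recommended actions for this cause in Table 1"
```

For the sample output in this step, extract_reboot_cause returns ColdReboot, which falls into the last group, so the power supply check in Table 1 applies.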

 

5.     If the issue persists, collect the following information and contact Technical Support:

¡     The output of the following commands. In this example, the active MPU is in slot 16, the standby MPU is in slot 17, and the standby MPU is the one that rebooted:

-     Execute the following commands in any view:

display version

display device

display diagnostic-information

display kernel deadloop 20 verbose slot 16

display kernel exception 10 verbose slot 16

display kernel reboot 20 verbose slot 16

-     Execute the following commands in probe view to collect information:

local logbuffer slot 17 display

local logbuffer slot 17 display from-highmemory

display reboot last-time slot 17

display system internal version

display diag-msg start-msg slot 17

 

 

NOTE:

Support for these commands depends on the device model and software version.

 

¡     The configuration file, log messages, and alarm messages.

Related alarm and log messages

Alarm messages

N/A

Log messages

·     DEV/1/AUTO_SWITCH_FAULT_REBOOT

·     DEV/5/BOARD_REBOOT

·     DEV/1/BOARD_RUNNING_FAULT_REBOOT

·     DEV/5/CHASSIS_REBOOT

·     DEV/5/SUBCARD_REBOOT

·     DEV/5/SYSTEM_REBOOT

·     STM/4/STM_MERGE
