H3C SeerEngine-Campus Troubleshooting Guide-E65xx-5W100


 

H3C SeerEngine-Campus

Troubleshooting Guide

Document version: 5W100-20220725

 

Copyright © 2022 New H3C Technologies Co., Ltd. All rights reserved.

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.

Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.

The information in this document is subject to change without notice.



Introduction

This document provides information about troubleshooting common software and hardware issues with H3C SeerEngine-Campus controllers.

General guidelines

To help identify the cause of the issue, collect system and configuration information, including:

·     Versions of the SeerEngine-Campus controller, Linux operating system, Unified Platform, EIA, and DHCP server.

·     Symptom, time of failure, and configuration.

·     Network topology information, including the network diagram, port connections, and points of failure.

·     Log messages and diagnostic information. For more information, see "Collecting diagnosis log messages."

·     Steps you have taken and their effects.

Collecting diagnosis log messages

1.     Enter the URL of Unified Platform in the address bar of a browser (for example, Chrome) to enter the Unified Platform login page.

The URL is in the format of https://unifiedplatform_ip_address:30000/portal/.

2.     On the login page, enter the username and password, and click Log in.

3.     Click System on the top navigation bar, and then select Log Management from the navigation pane. The page that opens displays operation, system, and diagnosis logs. You can click a tab to view the corresponding log entries.

¡     To filter log entries by component or time, specify the filtering criteria and click Query.

¡     To filter log entries by username, user IP, host name, service name, module name, operation result, operation description, or failure reason, click Advanced Search.

¡     To export the filtered log entries, click Export.

Figure 1 Log information page
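
For step 1, the login URL can be assembled from the Unified Platform address. The following is a minimal sketch, assuming the port (30000) and path (/portal/) given in the URL format above. The function name unified_platform_url is hypothetical and is not part of any H3C tool.

```python
import ipaddress


def unified_platform_url(host: str, port: int = 30000) -> str:
    """Build the Unified Platform login URL in the format shown in step 1."""
    try:
        # IPv6 literals must be wrapped in brackets inside a URL.
        if ipaddress.ip_address(host).version == 6:
            host = f"[{host}]"
    except ValueError:
        pass  # Not an IP literal; treat the value as a host name.
    return f"https://{host}:{port}/portal/"


# 192.0.2.10 is a documentation address used only for illustration.
print(unified_platform_url("192.0.2.10"))
```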

 

Contacting technical support

If you cannot resolve an issue after using the troubleshooting procedures in this document, contact H3C Support.

The following is the contact information for H3C Support:

·     Telephone number: 400-810-0504.

·     E-mail: service@h3c.com.


Troubleshooting product licensing

This section provides troubleshooting information for common product licensing issues.

Cannot get licensing data from the license server after a controller team reboot

Symptom

After a controller team restarts, it reestablishes a connection with the license server, and the license for the controller team is installed on the license server. However, the controller team cannot get the licensing data from the license server.

Solution

This symptom might occur if the controller team reconnects to the license server before the license server aging timer for the last connection expires. The license server does not reclaim the licensing data from an unexpectedly disconnected license client (the controller team) until the aging timer expires.

To resolve the issue:

1.     Log in to the license server, and kick off the license clients that were disconnected from the license server unexpectedly.

2.     Log in to Unified Platform, and then perform the following tasks:

a.     Click System on the top navigation bar and then select License Management from the navigation pane.

b.     On the page that appears, disconnect the controller from the license server, and then reconnect it to the license server.

3.     If the issue persists, contact H3C Support.
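
The aging-timer behavior described in this solution can be modeled with a short sketch. It only illustrates the timing condition and does not query a real license server. The function name and the 15-minute timer value are hypothetical, because this guide does not state the actual aging time.

```python
from datetime import datetime, timedelta


def can_get_licensing_data(disconnected_at: datetime,
                           reconnected_at: datetime,
                           aging_timer: timedelta) -> bool:
    """Return True if the server has already reclaimed the old client's data.

    The license server keeps the licensing data bound to the unexpectedly
    disconnected client until the aging timer expires, so a reconnection
    before that moment finds no licensing data available.
    """
    return reconnected_at - disconnected_at >= aging_timer


# Hypothetical values for illustration only.
lost = datetime(2022, 7, 25, 10, 0, 0)
aging = timedelta(minutes=15)
print(can_get_licensing_data(lost, lost + timedelta(minutes=5), aging))
print(can_get_licensing_data(lost, lost + timedelta(minutes=20), aging))
```

In the first case the team reconnects too early, which matches the symptom. Kicking off the stale license client in step 1 releases the data without waiting for the timer.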


Troubleshooting teams

This section provides troubleshooting information for common team issues.

Node hardware failure

Symptom

A node in the Matrix cluster cannot operate correctly because of hardware failure and needs to be replaced.

Solution

To resolve the issue:

1.     Execute the following command on the master node to release the IP addresses used by the faulty node (node matrix02 in this example):

[root@matrix01 ~]# sh /opt/matrix/k8s/disaster-recovery/recovery.sh matrix02

2.     Replace the faulty node with a new server. Make sure the new server has the same IP settings, username, and password as the faulty node.

3.     Copy folder /opt/matrix/app/install from the master node to the corresponding directory of the new server.

4.     Install the Matrix platform on the new server. For more information, see H3C Matrix Containerized Application Deployment Platform Installation Guide.

5.     Log in to the Matrix platform. Click DEPLOY on the top navigation bar and then select Cluster from the navigation pane.

6.     Disable and then enable the faulty node. To disable or enable a node, click the icon for the node and then select Disable or Enable.
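
The command-line parts of this procedure (steps 1 and 3) can be sketched as follows. The recovery script path and the install folder come from the steps above. Using scp for the copy, the root user, and the address 192.0.2.12 are assumptions made for illustration. The sketch only prints the commands so that they can be reviewed before being run on the master node.

```python
import shlex
from typing import List

RECOVERY_SCRIPT = "/opt/matrix/k8s/disaster-recovery/recovery.sh"  # from step 1
INSTALL_DIR = "/opt/matrix/app/install"                            # from step 3


def release_node_command(faulty_node: str) -> List[str]:
    """Command that releases the IP addresses used by the faulty node."""
    return ["sh", RECOVERY_SCRIPT, faulty_node]


def copy_install_dir_command(new_server: str, user: str = "root") -> List[str]:
    """Command that copies the install folder to the new server.

    Assumption: scp is available and /opt/matrix/app/ already exists on
    the new server, so the folder lands at /opt/matrix/app/install.
    """
    return ["scp", "-r", INSTALL_DIR, f"{user}@{new_server}:/opt/matrix/app/"]


# Print the commands instead of executing them.
for command in (release_node_command("matrix02"),
                copy_install_dir_command("192.0.2.12")):
    print(shlex.join(command))
```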

 


Troubleshooting OpenFlow

This section provides troubleshooting information for common OpenFlow issues.

OpenFlow connection failure

Symptom

No device information is displayed for a correctly configured OpenFlow device (spine or leaf device) after you access the Assurance > Controller Info page on the controller's GUI and click the region link for the controller.

Solution

To resolve the issue:

1.     Log in to the OpenFlow device and verify that the controller IP address specified for the OpenFlow device is correct. If the controller IP address is incorrect, specify the correct controller IP address on the OpenFlow device as shown in Figure 2.

Figure 2 Specifying the controller IP address

 

2.     Verify that the controller IP address is reachable. If the controller IP address is unreachable, troubleshoot the network.

3.     If the issue persists, contact H3C Support.
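
The reachability check in step 2 can be approximated from a host on the same management network as the OpenFlow device. The following is a minimal sketch that tests whether a TCP connection to the controller can be opened. Port 6633 is assumed to be the OpenFlow listening port because it is the port used in the netstat command later in this section. The function name tcp_reachable is hypothetical.

```python
import socket


def tcp_reachable(host: str, port: int = 6633, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, tcp_reachable("192.0.2.10") returns False when the controller address is wrong or a firewall blocks the port. A successful result shows only that the port is open from the test host, not that the device itself can reach the controller.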

Unstable OpenFlow connection

Symptom

The OpenFlow connection established between the controller and the OpenFlow device is unstable.

Solution

To resolve the issue:

1.     Verify that the network is connected. If the network is disconnected, troubleshoot the network.

2.     Verify that traffic congestion does not occur in the region.

If traffic congestion occurs in the region, OpenFlow echo messages cannot be exchanged correctly. Execute the netstat -anp | grep 6633 command as a root user to identify whether the TCP channel for the OpenFlow connection is congested. As shown in Figure 3, if the values in the first and second numeric columns (the Recv-Q and Send-Q queue sizes in netstat output) are in the range of 200000 to 250000, the traffic in the region is heavy. To reduce the load, disconnect the OpenFlow connections of some OpenFlow devices and then connect these devices to controllers in other regions.

Figure 3 TCP channel status

 

3.     If the issue persists, contact H3C Support.
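
The queue check in step 2 can be automated by parsing saved netstat output. The following is a sketch under stated assumptions: the two columns are taken to be Recv-Q and Send-Q, the standard second and third fields of Linux netstat output, and a connection is flagged when either queue falls in the 200000 to 250000 range, which is a looser reading of the text above. The sample lines are invented for illustration.

```python
from typing import List, Tuple

HEAVY_MIN = 200_000  # lower bound of the range given in step 2
HEAVY_MAX = 250_000  # upper bound of the range given in step 2


def heavy_openflow_connections(netstat_output: str,
                               port: int = 6633) -> List[Tuple[str, int, int]]:
    """Return (foreign address, Recv-Q, Send-Q) for heavily loaded connections."""
    heavy = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected fields: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 5 or not fields[0].startswith("tcp"):
            continue
        try:
            recv_q, send_q = int(fields[1]), int(fields[2])
        except ValueError:
            continue
        local, foreign = fields[3], fields[4]
        if not (local.endswith(f":{port}") or foreign.endswith(f":{port}")):
            continue
        if any(HEAVY_MIN <= q <= HEAVY_MAX for q in (recv_q, send_q)):
            heavy.append((foreign, recv_q, send_q))
    return heavy


# Hypothetical output of "netstat -anp | grep 6633".
SAMPLE = """\
tcp6       0 213456 192.0.2.10:6633   192.0.2.21:51234   ESTABLISHED 1234/java
tcp6       0      0 192.0.2.10:6633   192.0.2.22:40112   ESTABLISHED 1234/java
"""
print(heavy_openflow_connections(SAMPLE))
```

The devices listed in the result are candidates for being moved to controllers in other regions.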

Network device information display failure

Symptom

When you access the Provision > Inventory > Devices page on the controller's GUI, the Physical Devices tab cannot display device summary and port information.

Solution

To resolve the issue:

1.     Log in to the OpenFlow device, and execute the display openflow instance instance-id controller command to verify that the controller role is correctly assigned to the OpenFlow device.

This example uses OpenFlow instance 1. If the controller role is Equal, create a region on the controller or connect the OpenFlow device to a controller in a region.

Figure 4 Controller role assigned to the OpenFlow device

 

2.     Verify that the region to which the OpenFlow device is connected is configured correctly.

a.     Access the PROVISION > Network Design > Fabrics > Fabric [xx] > Switching Device [xx] > Switching Device Details page on the controller's GUI to identify whether region information is displayed for the OpenFlow device.

If the region information is not displayed, export the diagnosis log messages for the controller. For more information, see "Collecting diagnosis log messages."

b.     Identify whether the MAC address of the OpenFlow device exists in the Global Master Cache field in the exported RegionInfo log file.

If the MAC address of the OpenFlow device does exist in the Global Master Cache field, disconnect the OpenFlow device from the controller and reconnect the device to the controller.

As a best practice, do not disconnect and reconnect the OpenFlow device if the service traffic can be processed correctly when the symptom appears.

3.     If the issue persists, contact H3C Support.
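
The lookup in step 2b can be scripted once the RegionInfo log has been exported. This guide does not document the log layout, so the following sketch assumes that the field name Global Master Cache and its MAC addresses appear on the same line. Adjust the parsing if the exported file is structured differently. The function names and the sample line are hypothetical.

```python
import re

# Matches colon- or hyphen-separated octets (00:11:22:33:44:55) and
# four-digit groups separated by hyphens or dots (0011-2233-4455).
MAC_PATTERN = re.compile(
    r"(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}"
    r"|(?:[0-9A-Fa-f]{4}[-.]){2}[0-9A-Fa-f]{4}"
)


def normalize_mac(mac: str) -> str:
    """Reduce a MAC address to 12 lowercase hex digits."""
    return re.sub(r"[^0-9a-f]", "", mac.lower())


def mac_in_global_master_cache(log_text: str, device_mac: str) -> bool:
    """Return True if device_mac appears on a Global Master Cache line."""
    wanted = normalize_mac(device_mac)
    for line in log_text.splitlines():
        if "Global Master Cache" not in line:
            continue
        found = {normalize_mac(m) for m in MAC_PATTERN.findall(line)}
        if wanted in found:
            return True
    return False


# Hypothetical log line; the real file format may differ.
SAMPLE_LOG = "Global Master Cache: [00:11:22:33:44:55, 0a1b-2c3d-4e5f]"
print(mac_in_global_master_cache(SAMPLE_LOG, "0A:1B:2C:3D:4E:5F"))
```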

 

 


Troubleshooting NETCONF

This section provides troubleshooting information for common NETCONF issues.

NETCONF communication failure

Symptom

The controller fails to use SOAP to issue NETCONF configuration. For example, after a network element is added, its state is inactive and the system displays either of the following error messages:

·     OpenFlow connection is down.

·     NETCONF connection fails due to network congestion.

Solution

To resolve the issue:

1.     Verify that the network device and the controller are physically connected:

a.     Log in to the controller, and examine the cable connection status and link status.

b.     Log in to the network device, and examine the cable connection status and link status.

2.     Verify that the NETCONF settings are consistent on the network device and the controller:

a.     Make sure NETCONF over SOAP over HTTPS is enabled on the network device.

b.     Make sure the network device and the controller are configured with the same username and password.

If any inconsistency occurs, modify the NETCONF settings on the network device or the controller.

3.     Verify that a NETCONF session can be established between the network device and the controller.

There is a limit on the number of NETCONF sessions that can be established on the network device. If the upper limit has been reached, the network device cannot establish a NETCONF session with the controller. In this case, delete the existing NETCONF sessions or increase the NETCONF session limit to ensure that a NETCONF session can be established between the network device and the controller.

4.     If the issue persists, contact H3C Support.
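
The checks in steps 2 and 3 can be summarized as a small precheck over settings that you have collected by hand from the device and the controller. The following is a sketch only: the dictionary keys (soap_https_enabled, username, password, active_sessions, session_limit) are hypothetical names chosen for this example and do not correspond to any device or controller API.

```python
from typing import Dict, List


def netconf_precheck(device: Dict, controller: Dict) -> List[str]:
    """Return a list of problems found in manually collected NETCONF settings.

    The device dictionary must contain active_sessions and session_limit.
    """
    problems = []
    # Step 2a: NETCONF over SOAP over HTTPS must be enabled on the device.
    if not device.get("soap_https_enabled", False):
        problems.append("NETCONF over SOAP over HTTPS is not enabled on the device")
    # Step 2b: the credentials must be the same on both sides.
    for key in ("username", "password"):
        if device.get(key) != controller.get(key):
            problems.append(f"{key} differs between the device and the controller")
    # Step 3: a new session is impossible once the limit has been reached.
    if device["active_sessions"] >= device["session_limit"]:
        problems.append("NETCONF session limit has been reached on the device")
    return problems


# Hypothetical values for illustration only.
device = {"soap_https_enabled": True, "username": "admin", "password": "secret",
          "active_sessions": 32, "session_limit": 32}
controller = {"username": "admin", "password": "secret"}
print(netconf_precheck(device, controller))
```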


Troubleshooting SNMP

This section provides troubleshooting information for common SNMP issues.

SNMP communication failure

Symptom

Configuration cannot be deployed to a newly added access device. The device is in inactive state, and the system displays a message indicating that configuration deployment is not complete.

Solution

IMPORTANT:

Perform steps 1 and 2 on both the access device and its leaf device.

 

To resolve the issue:

1.     Verify that the device and the controller can reach each other. You can log in to the controller and the device to verify the network connection and link state.

2.     Verify that SNMP is enabled on both the device and the controller and that their SNMP settings are consistent, for example, the read/write community strings.

3.     If the issue persists, contact H3C Support.


Troubleshooting LLDP

This section provides troubleshooting information for common LLDP issues.

LLDP communication failure

Symptom

Configuration cannot be deployed to a newly added access device. The device is in inactive state, and the system displays a message indicating that configuration deployment is not complete.

Solution

IMPORTANT:

Perform steps 1 and 2 on both the access device and its leaf device.

 

To resolve the issue:

1.     Verify that the device and the controller can reach each other. You can log in to the controller and the device to verify the network connection and link state.

2.     Verify that the device and its directly connected leaf or access device are configured with LLDP correctly. For example, verify that LLDP is enabled on these devices globally and on related interfaces.

3.     If the issue persists, contact H3C Support.


Troubleshooting carrier networks

This section provides troubleshooting information for common carrier network issues.

Physical network element activation failure

Symptom

A physical network element remains in inactive state after it is created.

Solution

To resolve the issue:

1.     Verify the number of physical network elements managed by the controller. If the number exceeds the limit allowed by the licenses, purchase new licenses.

2.     Verify that the physical network element and the controller can ping each other by using the management IP of the physical network element. If the ping operation fails, troubleshoot the network connection issue.

3.     Verify that the actual role of each device is the same as its device role on the controller.

4.     Verify that NETCONF communication between the physical network element and the controller succeeds. If NETCONF communication fails, troubleshoot NETCONF. For more information, see "Troubleshooting NETCONF."

5.     Verify that the controller and the physical network elements can reach each other through SNMP operations. For more information, see "SNMP communication failure."

6.     Verify that the physical network elements can reach each other through LLDP. For more information, see "LLDP communication failure."

7.     Verify that a region is automatically selected for the physical network element if the controller operates in team mode:

a.     Click Provision on the top navigation bar, and then select Network Design > Fabrics from the left navigation pane.

b.     Select a fabric and view the Selected Region field. If --- is displayed, troubleshoot automatic region configuration failure. For more information, see "Automatic region configuration failure."

8.     If the issue persists, contact H3C Support.

Automatic region configuration failure

Symptom

A network element fails to automatically select a region when the controller operates in team mode.

Solution

To resolve the issue:

1.     Verify that a region is configured for the team:

a.     Click the gear icon at the upper right corner of the page, and select Controller from the left navigation pane.

b.     Click Region to identify whether a region is configured. If no region is configured, configure a region for the team.

2.     Verify that the management IP address of the network element belongs to the managed subnets of the configured region:

a.     On the Region page, view the Managed Subnets field.

If the management IP address of the network element does not belong to the managed subnets, perform either of the following tasks:

-     Create a new region without any managed subnets.

-     Click the Edit icon in the Actions field for the region to add the network segment of the management IP address.

3.     If the issue persists, contact H3C Support.
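
The membership test in step 2 can be checked offline with the Python standard library. The following is a minimal sketch, assuming the managed subnets are copied from the Managed Subnets field in CIDR notation. The function name and the addresses are hypothetical. The sketch does not model a region that has no managed subnets.

```python
import ipaddress
from typing import Iterable


def in_managed_subnets(mgmt_ip: str, managed_subnets: Iterable[str]) -> bool:
    """Return True if mgmt_ip belongs to at least one managed subnet."""
    ip = ipaddress.ip_address(mgmt_ip)
    # strict=False accepts entries such as 192.168.10.1/24 with host bits set.
    return any(ip in ipaddress.ip_network(subnet, strict=False)
               for subnet in managed_subnets)


# Hypothetical region configuration and management addresses.
subnets = ["192.168.10.0/24", "172.16.0.0/16"]
print(in_managed_subnets("192.168.10.5", subnets))
print(in_managed_subnets("10.1.1.1", subnets))
```

If the function returns False for the management IP address of the network element, add the corresponding network segment to the region as described above.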