13-Network Management and Monitoring Configuration Guide

31-KPI data collection configuration

Configuring KPI data collection

About KPI data collection

The key performance indicators (KPIs) of the device are a set of performance values that indicate the device's running status at a given moment. During operation, the device automatically collects KPI data and stores it in flash memory.

The KPI data collection feature periodically collects various types of KPI data and records the data in real time. Based on the collected KPI data, you can determine the device running status, the time a service failed, the failure type, and the possible failure causes, and troubleshoot issues quickly.

Basic concepts

The KPI data collection feature can collect a large quantity and variety of data. For example, the CPU usage collected for a card is a card-specific performance parameter that belongs to the Device-resource class and the DEV-RES module, and its value might be 50%. To describe, categorize, and retrieve all types of data easily, KPI data is defined along the following dimensions:

·     Indicator—Performance parameters and state collected by the KPI data collection feature, such as the CPU usage, memory usage, FIB table usage, ARP table usage, card failures, power supply failures, and abnormal card temperature.

·     Object—Physical entities to which the indicators belong, such as devices, cards, and subcards. As the KPI data collection feature can collect more and more indicators, the object types will also become more diverse. The value for the object varies by object type. Available values include:

¡     device—Specifies a device. Indicators for this object describe the overall condition of the device.

¡     chassis.x/slot.y—Specifies a card. Indicators for this object describe the performance of the card. In standalone mode, x is 0 and y is the slot number of the card. In IRF mode, x is the member ID of the IRF member device and y is the slot number of the card.

¡     chassis.x/slot.y/subslot.z—Specifies a subcard. Indicators for this object describe the performance of the subcard. In standalone mode, x is 0, y is the slot number of the card, and z is the subcard ID. In IRF mode, x is the member ID of the IRF member device, y is the slot number of the card, and z is the subcard ID.

¡     interface-type interface-number—Specifies an interface by its type and number. Indicators for this object describe the running status of the physical interface.

·     Module—Service module to which an indicator belongs. For example, the CPU usage and memory usage belong to the device resource (DEV-RES) module. The FIB table usage and ARP table usage belong to the forwarding resource (FWD-RES) module.

·     Class—A collection of indicators of a certain type. Indicators that together reflect the running status of one aspect of the device are grouped into a class. The system predefines some classes, such as the network performance (Net-performance) class and the port state (Port-state) class.
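The object naming scheme above can be sketched as a small parser. This is only an illustration: the function and type names (parse_kpi_object, KpiObject) are hypothetical, and the sketch assumes object names appear literally in the forms listed above (device, chassis.x/slot.y, chassis.x/slot.y/subslot.z), with anything else treated as an interface name.

```python
import re
from typing import NamedTuple, Optional


class KpiObject(NamedTuple):
    """Hypothetical container for a parsed KPI object name."""
    kind: str                        # "device", "card", "subcard", or "interface"
    chassis: Optional[int] = None    # 0 in standalone mode, IRF member ID in IRF mode
    slot: Optional[int] = None
    subslot: Optional[int] = None
    interface: Optional[str] = None


# Matches chassis.x/slot.y with an optional /subslot.z suffix.
_CARD_RE = re.compile(r"^chassis\.(\d+)/slot\.(\d+)(?:/subslot\.(\d+))?$")


def parse_kpi_object(name: str) -> KpiObject:
    """Classify an object name according to the forms listed above."""
    if name == "device":
        return KpiObject("device")
    match = _CARD_RE.match(name)
    if match:
        chassis, slot, subslot = match.groups()
        if subslot is None:
            return KpiObject("card", int(chassis), int(slot))
        return KpiObject("subcard", int(chassis), int(slot), int(subslot))
    # Assumption: any other name is an interface (type followed by number).
    return KpiObject("interface", interface=name)
```

For example, parse_kpi_object("chassis.0/slot.3") yields a card object for slot 3 of a standalone device.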

Operating mechanism

Figure 1 Operating mechanism

 

A device with KPI data collection enabled works as follows:

1.     Collect KPI data. When KPI data collection is enabled for a service module, the KPI process collects KPI data for that module at intervals and temporarily saves the data in device memory. By default, the KPI data collection interval is 300 seconds. You can change the interval as required.

2.     Store KPI data. The KPI process saves the collected KPI data to flash at intervals. When the free space on the storage media is insufficient or the total KPI file size exceeds the threshold, the KPI process automatically deletes the earliest KPI files to release space.

Available data for KPI data collection

Table 1 Available data for KPI data collection

| Class | Module | Object | Indicator | Indicator description |
|---|---|---|---|---|
| Device-state | DEV | Card | Device_restarts | Number of device reboots. |
| Device-state | IRF | Device | IRF_splits | Number of IRF splits. |
| Device-state | IRF | Device | IRF_dual-active_count | Number of dual-master IRF fabrics. |
| Device-state | DEV | Card | LPU_failures | Number of LPU failures. |
| Device-state | DEV | Card | MPU_state | State of the MPU: 0 (the MPU is not present), 1 (the MPU is operating correctly), 2 (the MPU has failed). |
| Device-state | DEV | Card | MPU_failures | Number of MPU failures. |
| Device-state | DEV | Card | SFU_state | State of the SFU: 0 (the SFU is not present), 1 (the SFU is operating correctly), 2 (the SFU has failed). |
| Device-state | DEV | Card | SFU_failures | Number of SFU failures. |
| Device-state | DEV | Subcard | Subslot_failure | Number of subcard failures. |
| Device-state | FWD | Device | Inc_H_S_entries | Inconsistent hardware and software entries. |
| Device-state | FAN | Device | Fan_state | Fan tray state: 0 (normal), 1 (faulty). |
| Device-state | POWER | Device | Power_state | Power supply state: 0 (normal), 1 (faulty). |
| Device-state | POE | Device | PoE_state | PoE power supply state: 0 (normal), 1 (faulty). |
| Device-state | TEMP | Device | Card_temperature | Card temperature state: 0 (normal), 1 (faulty). |
| Device-state | FS | Device | File_exceptions | Number of save operation failures due to file system errors. |
| Device-state | DEV | Device | Process_abnormal_reboot | Number of process reboot failures. |
| Device-state | DEV | Device | Process_normal_reboot | Number of successful process reboots. |
| Device-resource | FWD-RES | Card | ARP_entry_usage | Ratio of the real-time ARP entry count to the upper ARP entry count limit. |
| Device-resource | FWD-RES | Card | ARP_threshold_ratio | Ratio of the real-time ARP entry count to the ARP table usage threshold. |
| Device-resource | FWD-RES | Card | MAC_entry_usage | Ratio of the real-time MAC entry count to the upper MAC entry count limit. |
| Device-resource | FWD-RES | Card | MAC_threshold_ratio | Ratio of the real-time MAC entry count to the MAC table usage threshold. |
| Device-resource | FWD-RES | Card | FIB_entry_usage | Ratio of the real-time FIB entry count to the upper FIB entry count limit. |
| Device-resource | FWD-RES | Card | FIB_threshold_ratio | Ratio of the real-time FIB entry count to the FIB table usage threshold. |
| Device-resource | FWD-RES | Card | ND_entry_usage | Ratio of the real-time ND entry count to the upper ND entry count limit. |
| Device-resource | FWD-RES | Card | ND_threshold_ratio | Ratio of the real-time ND entry count to the ND table usage threshold. |
| Device-resource | FWD-RES | Card | IPv4L2multicast_usage | Ratio of the real-time IPv4 Layer 2 multicast entry count to the upper IPv4 Layer 2 multicast entry count limit. |
| Device-resource | FWD-RES | Card | IPv4L2multicast_ratio | Ratio of the real-time IPv4 Layer 2 multicast entry count to the IPv4 Layer 2 multicast entry count threshold. |
| Device-resource | FWD-RES | Card | IPv6L2multicast_usage | Ratio of the real-time IPv6 Layer 2 multicast entry count to the upper IPv6 Layer 2 multicast entry count limit. |
| Device-resource | FWD-RES | Card | IPv6L2multicast_ratio | Ratio of the real-time IPv6 Layer 2 multicast entry count to the IPv6 Layer 2 multicast entry count threshold. |
| Device-resource | FWD-RES | Card | IPv4L3multicast_usage | Ratio of the real-time IPv4 Layer 3 multicast entry count to the upper IPv4 Layer 3 multicast entry count limit. |
| Device-resource | FWD-RES | Card | IPv4L3multicast_ratio | Ratio of the real-time IPv4 Layer 3 multicast entry count to the IPv4 Layer 3 multicast entry count threshold. |
| Device-resource | FWD-RES | Card | IPv6L3multicast_usage | Ratio of the real-time IPv6 Layer 3 multicast entry count to the upper IPv6 Layer 3 multicast entry count limit. |
| Device-resource | FWD-RES | Card | IPv6L3multicast_ratio | Ratio of the real-time IPv6 Layer 3 multicast entry count to the IPv6 Layer 3 multicast entry count threshold. |
| Device-resource | ACL-RES | Card | ACL_usage | Ratio of the real-time ACL entry count to the upper ACL entry count limit. |
| Device-resource | ACL-RES | Card | ACL_threshold_ratio | Ratio of the real-time ACL entry count to the ACL entry count threshold. |
| Device-resource | STOR-RES | Card | Storage_usage | Ratio of the used storage space to the total storage space. |
| Device-resource | STOR-RES | Card | Storage_threshold_ratio | Ratio of the used storage space to the storage space usage threshold. |
| Device-resource | DEV-RES | Card | CPU_usage | Ratio of the used CPU capacity to the total CPU capacity. |
| Device-resource | DEV-RES | Card | CPU_threshold_ratio | Ratio of the used CPU capacity to the CPU usage threshold. |
| Device-resource | DEV-RES | Card | Memory_usage | Ratio of the used memory to the total memory. |
| Device-resource | DEV-RES | Card | Memory_threshold_ratio | Ratio of the used memory to the memory usage threshold. |
| Net-performance | LOOP-DCT | Device | L2 loop_state | Layer 2 loop state: 0 (the Layer 2 loop is operating correctly), 1 (the Layer 2 loop has failed). |
| Net-performance | IF-CI | Interface | Port_congestion | Number of packets dropped due to traffic congestion. |
| Net-performance | IF-ERROR | Interface | Port_error | Number of packets dropped due to packet errors. |
| Net-performance | CPCAR | Device | CPCAR_loss | Number of packets dropped due to traffic policing configured on the control plane. |
| Net-performance | STP-SWT | Device | STP_switchovers | Number of STP switchovers. |
| Net-performance | LACP-SWT | Device | LACP_switchovers | Number of link aggregation switchovers. |
| Net-performance | IRF-SWT | Device | IRF_switchovers | Number of IRF switchovers. |
| Net-performance | M-LAG-SWT | Device | M-LAG_switchovers | Number of M-LAG switchovers. |
| Net-performance | RRPP-SWT | Device | RRPP_switchovers | Number of RRPP switchovers. |
| Net-performance | VRRP-SWT | Device | VRRP_switchovers | Number of VRRP switchovers. |
| Net-performance | IF-USAGE | Device | Port_BW_usage | Bandwidth usage for all ports. |
| Port-state | PORT-ST | Device | Down_ports | Number of physical interfaces in down state. |
| Port-state | PORT-ST | Device | Port_flappings | Number of port flappings. |
| Port-state | TRAN-ST | Device | Opti-module_health | Transceiver module health. This indicator is not supported in the current software version. |
| Net-connection | RPNCS | Device | ISIS_peer_status | IS-IS neighbor connection state: 0 (operating correctly), 1 (failed). |
| Net-connection | RPNCS | Device | OSPF_peer_status | OSPF neighbor connection state: 0 (operating correctly), 1 (failed). |
| Net-connection | RPNCS | Device | OSPv3_peer_status | OSPFv3 neighbor connection state: 0 (operating correctly), 1 (failed). |
| Net-connection | RPNCS | Device | BGP_peer_status | BGP neighbor connection state: 0 (operating correctly), 1 (failed). |
| Net-connection | MCRCS | Device | Multicast_connection_status | Multicast route connection state: 0 (operating correctly), 1 (failed). |
| Net-connection | DHCPCS | Device | DHCPv4_server_state | Statistics about DHCPv4 server address allocation failures. |
| Net-connection | DHCPCS | Device | DHCPv6_server_state | Statistics about DHCPv6 server address allocation failures. |
| Net-connection | DHCPCS | Device | DHCPv4_server_switching | Number of DHCPv4 server switchovers. |
| Net-connection | DHCPCS | Device | DHCPv6_server_switching | Number of DHCPv6 server switchovers. |
| Net-connection | DHCPCS | Device | DHCPv4_entry failures | Number of DHCPv4 entry establishment failures. |
| Net-connection | DHCPCS | Device | DHCPv6_entry failures | Number of DHCPv6 entry establishment failures. |
| Net-security | AAA | Device | 1X_AuthN_status | State of 802.1X authentication: 0 (authentication succeeded), 1 (authentication failed; an attack might exist). |
| Net-security | AAA | Device | 1X_Usr&Pwd_status | State of the username and password for 802.1X authentication: 0 (correct), 1 (incorrect). |
| Net-security | AAA | Device | MAC_AuthN_status | State of MAC authentication: 0 (authentication succeeded), 1 (authentication failed; an attack might exist). |
| Net-security | AAA | Device | MAC_Usr&Pwd_status | State of the username and password for MAC authentication: 0 (correct), 1 (incorrect). |
| Net-security | AAA | Device | Portsec_AuthN_status | State of port security authentication: 0 (authentication succeeded), 1 (authentication failed; an attack might exist). |
| Net-security | AAA | Device | Portsec_Usr&Pwd_status | State of the port security access username and password: 0 (correct), 1 (incorrect). |
| Net-security | AAA | Device | StaticUser_AuthN_status | State of static user authentication: 0 (authentication succeeded), 1 (authentication failed; an attack might exist). |
| Net-security | AAA | Device | StaticUser_Usr&Pwd_status | State of the static username and password: 0 (correct), 1 (incorrect). |
| Net-security | ATTACK | Device | All-type_attacks | Number of all types of attacks. |
| Net-security | TCP | Device | TCP_attacks | Number of TCP attacks. |
| Net-security | ARP-ATK | Device | ARP_attacks | Number of ARP attacks. |
| Net-security | ND-ATK | Device | ND_attacks | Number of ND attacks. |
| Net-security | AAA | Device | Illegal_user_detections | Number of illegal user detections. |
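Each resource in the Device-resource class has two related indicators: a usage indicator measured against the upper limit, and a threshold ratio measured against the configured threshold. The following sketch shows the difference with made-up numbers. The function name is hypothetical, and the sketch assumes that the threshold is expressed as an entry count and that both indicators are plain quotients. The device might present them as percentages.

```python
def ratio(current: float, reference: float) -> float:
    """Return current / reference as a fraction (0.75 means 75%)."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return current / reference


# Illustrative values only, not taken from any device.
arp_entries = 60_000        # real-time ARP entry count
arp_upper_limit = 100_000   # upper ARP entry count limit
arp_threshold = 80_000      # ARP table usage threshold (assumed to be a count)

arp_entry_usage = ratio(arp_entries, arp_upper_limit)      # 0.6
arp_threshold_ratio = ratio(arp_entries, arp_threshold)    # 0.75
```

Under these assumptions, a threshold ratio that approaches 1 means the resource is close to its alarm threshold even though the usage against the upper limit is still lower.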

Restrictions and guidelines

By default, KPI data collection is enabled for all service modules that support this feature on the device.

To prevent large volumes of collected data from affecting normal services, KPI data collection is suppressed when the device memory usage or CPU usage reaches the alarm threshold, and the KPI process stops collecting data. As a best practice, disable KPI data collection for modules other than the DEV-RES module. For more information about the alarm thresholds for memory usage and CPU usage, see device management configuration in Fundamentals Configuration Guide.

KPI data collection tasks at a glance

To configure KPI data collection, perform the following tasks:

·     (Optional.) Configuring KPI data storage

·     (Optional.) Configuring KPI file aging

·     (Optional.) Copying the KPI data on the standby MPU to the active MPU

·     (Optional.) Disabling KPI data collection for service modules

·     (Optional.) Specifying the KPI data collection interval for service modules

Configuring KPI data storage

About this task

The KPI files held in memory are saved to the storage media at intervals. Use this task to change the KPI file directory and the interval for saving KPI files to the storage media.

Procedure

1.     Enter system view.

system-view

2.     Specify the interval for saving KPI files to the storage media.

kpi file save-interval interval

By default, KPI files are saved to the storage media at an interval of 1440 minutes.

3.     Specify the KPI file directory.

kpi file directory dir-name

By default, KPI files are saved in the flash:/kpi directory.
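The save interval and the collection interval together determine how many collection cycles accumulate in memory between two saves. The following sketch works through that arithmetic for the defaults stated in this guide (a 1440-minute save interval and a 300-second collection interval). The function name is hypothetical, and the calculation assumes that collection runs continuously and is never suppressed.

```python
def collections_per_save(save_interval_min: int = 1440,
                         collect_interval_s: int = 300) -> int:
    """Number of complete collection cycles that fit in one save interval."""
    if save_interval_min <= 0 or collect_interval_s <= 0:
        raise ValueError("intervals must be positive")
    return (save_interval_min * 60) // collect_interval_s


# Defaults: 1440 min * 60 s / 300 s = 288 collection cycles per module.
default_cycles = collections_per_save()

# Illustrative non-default values: save every 60 minutes, collect every 60 seconds.
hourly_cycles = collections_per_save(save_interval_min=60, collect_interval_s=60)
```

With the defaults, each enabled module completes 288 collection cycles between two saves.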

Configuring KPI file aging

About this task

When the free storage media space is insufficient or the total KPI file size exceeds the threshold, the KPI process automatically deletes the earliest KPI files to release some space. Use this feature to edit the free storage media capacity threshold and the KPI file size threshold for triggering KPI file aging.

Procedure

1.     Enter system view.

system-view

2.     Specify the free storage media capacity threshold for triggering KPI file aging.

kpi file aging threshold remain-disk-size size

By default, the free storage media capacity threshold for triggering KPI file aging is 128 MB.

3.     Specify the KPI file size threshold for triggering KPI file aging.

kpi file aging threshold total-file-size size

By default, the KPI file size threshold for triggering KPI file aging is 128 MB.
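The two aging triggers can be sketched as follows. This is a simplified model, not the device's implementation: the function name and the (name, size, timestamp) tuples are hypothetical, "insufficient" free space is assumed to mean less than the remain-disk-size threshold, and the sketch assumes that the oldest files are deleted one at a time until neither trigger applies. The guide does not state how many files are removed per aging pass.

```python
from typing import List, Tuple

MB = 1024 * 1024


def files_to_age(files: List[Tuple[str, int, float]],
                 free_bytes: int,
                 remain_disk_threshold: int = 128 * MB,
                 total_size_threshold: int = 128 * MB) -> List[str]:
    """Return the names of KPI files to delete, oldest first.

    files holds (name, size_in_bytes, creation_timestamp) tuples.
    """
    remaining = sorted(files, key=lambda f: f[2])   # oldest file first
    total = sum(size for _, size, _ in remaining)
    deleted: List[str] = []
    while remaining and (free_bytes < remain_disk_threshold
                         or total > total_size_threshold):
        name, size, _ = remaining.pop(0)
        deleted.append(name)
        free_bytes += size   # deleting a file frees its space
        total -= size
    return deleted
```

For example, with three 60 MB files and ample free space, only the oldest file is selected, because removing it brings the total down to 120 MB, which is below the 128 MB default.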

Copying the KPI data on the standby MPU to the active MPU

About this task

After an MPU active/standby switchover on the device, the new active MPU cannot automatically obtain the KPI data from the old active MPU (current standby MPU). To ensure service continuity, you must use this feature to copy the KPI data on the old active MPU to the new active MPU.

 

IMPORTANT:

If the administrator changes the KPI file directory by using the kpi file directory command before the active/standby switchover, the original active MPU has two KPI file directories. After the switchover, this feature copies only the KPI data stored in the new directory on the old active MPU to the same directory on the new active MPU. The KPI data files in the old directory cannot be copied to the new active MPU.

 

Procedure

1.     Enter system view.

system-view

2.     Copy the KPI data on the standby MPU to the active MPU.

kpi copy-file to active-mpu

Disabling KPI data collection for service modules

About this task

To prevent data collection from affecting normal services due to a large amount of data, use this feature to disable KPI data collection for some service modules when the device memory usage or CPU usage is high.

Procedure

1.     Enter system view.

system-view

2.     Enter probe view.

probe

3.     Disable KPI data collection for service modules.

undo kpi system internal collect module [ module-name ] enable

By default, KPI data collection is enabled for all service modules that support this feature on the device.

Specifying the KPI data collection interval for service modules

About this task

You can use this feature to edit the KPI data collection interval for service modules.

Procedure

1.     Enter system view.

system-view

2.     Enter probe view.

probe

3.     Specify the KPI data collection interval for service modules.

kpi system internal module module-name collect-interval collect-interval

By default, the KPI data collection interval is 300 seconds.

Display and maintenance commands for KPI data collection

Execute display commands in any view.

 

| Task | Command |
|---|---|
| Display the KPI data of service modules and objects for the remote device. | display external-kpi data [ device-ip ip-address [ module module-name [ object object-name ] ] ] |
| Display KPI data collection information for service modules. | display kpi module-info [ module-name ] [ verbose ] |
| Display the KPI data for service modules and objects within a time range on the storage media. | display kpi data module module-name object object-name from time1 date1 to time2 date2 [ file file-path ] |

 


Configuring EAI

About EAI

Embedded Artificial Intelligence (EAI) is a KPI monitoring and prediction technology based on intelligent algorithms. EAI monitors and predicts indicator values in real time based on the historical indicator values collected by the KPI data collection feature. It helps the administrator analyze the trend of key indicators on the device and proactively prevent potential failures.

EAI monitoring

Based on the historical indicator values collected by the KPI data collection feature, the device dynamically generates alarm thresholds and recovery thresholds for the indicators in Table 2.

·     When an indicator value is out of the alarm threshold range, the device logs the threshold violation event and reports it to an NMS through SNMP.

·     When the indicator value returns to within the alarm threshold range, the device logs the recovery event and reports it to an NMS through SNMP.
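The alarm and recovery behavior can be pictured as a two-state machine. The sketch below is illustrative only: the function name is hypothetical, the thresholds are passed in as fixed numbers (this guide does not describe how the device derives them), and the sketch assumes a single upper alarm threshold with a recovery threshold at or below it.

```python
from typing import Iterable, List, Tuple


def threshold_events(samples: Iterable[float],
                     alarm: float,
                     recovery: float) -> List[Tuple[int, str]]:
    """Return (sample_index, event) pairs for alarm and recovery transitions."""
    if recovery > alarm:
        raise ValueError("recovery threshold must not exceed alarm threshold")
    events: List[Tuple[int, str]] = []
    in_alarm = False
    for index, value in enumerate(samples):
        if not in_alarm and value > alarm:
            in_alarm = True
            events.append((index, "alarm"))       # threshold violation event
        elif in_alarm and value <= recovery:
            in_alarm = False
            events.append((index, "recovery"))    # recovery event
    return events


# Illustrative CPU usage samples in percent.
cpu_events = threshold_events([50, 85, 90, 78, 60, 82], alarm=80, recovery=70)
```

Keeping the recovery threshold below the alarm threshold avoids a burst of alternating events when a value hovers around a single limit. In this example the value 78 does not end the alarm, because it is still above the recovery threshold of 70.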

EAI prediction

With this feature enabled, the device dynamically calculates and predicts the indicator values 30 days ahead based on the historical KPI data.

·     When the predicted indicator value is out of the alarm threshold range, the device logs the threshold violation event and reports it to an NMS through SNMP.

·     When the predicted indicator value returns to within the alarm threshold range, the device logs the recovery event and reports it to an NMS through SNMP.
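This guide does not describe the prediction algorithm. To make the idea of a 30-day forecast concrete, the sketch below fits an ordinary least-squares line to daily samples and extrapolates it. The straight-line fit is a stand-in chosen for illustration and is not the device's algorithm. The function name and the sample data are hypothetical.

```python
from typing import Sequence


def predict_linear(days: Sequence[float],
                   values: Sequence[float],
                   horizon_days: float = 30) -> float:
    """Extrapolate a least-squares line horizon_days past the last sample."""
    if len(days) != len(values) or len(days) < 2:
        raise ValueError("need at least two (day, value) pairs")
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("days must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * (days[-1] + horizon_days)


# Illustrative history: memory usage grows by one percentage point per day.
predicted = predict_linear([0, 1, 2, 3, 4], [40, 41, 42, 43, 44])
```

In this example the predicted value 30 days after the last sample is about 74, which would then be compared against the alarm threshold in the same way as a measured value.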

Available indicators for EAI

Table 2 Available indicators for EAI

| Class | Module | Object | Indicator | Indicator description |
|---|---|---|---|---|
| Device-resource | FWD-RES | Card | ARP_entry_usage | Ratio of the real-time ARP entry count to the upper ARP entry count limit. |
| Device-resource | FWD-RES | Card | MAC_entry_usage | Ratio of the real-time MAC entry count to the upper MAC entry count limit. |
| Device-resource | FWD-RES | Card | FIB_entry_usage | Ratio of the real-time FIB entry count to the upper FIB entry count limit. |
| Device-resource | FWD-RES | Card | ND_entry_usage | Ratio of the real-time ND entry count to the upper ND entry count limit. |
| Device-resource | FWD-RES | Card | IPv4L2multicast_usage | Ratio of the real-time IPv4 Layer 2 multicast entry count to the upper IPv4 Layer 2 multicast entry count limit. |
| Device-resource | FWD-RES | Card | IPv6L2multicast_usage | Ratio of the real-time IPv6 Layer 2 multicast entry count to the upper IPv6 Layer 2 multicast entry count limit. |
| Device-resource | FWD-RES | Card | IPv4L3multicast_usage | Ratio of the real-time IPv4 Layer 3 multicast entry count to the upper IPv4 Layer 3 multicast entry count limit. |
| Device-resource | FWD-RES | Card | IPv6L3multicast_usage | Ratio of the real-time IPv6 Layer 3 multicast entry count to the upper IPv6 Layer 3 multicast entry count limit. |
| Device-resource | ACL-RES | Card | ACL_usage | Ratio of the real-time ACL entry count to the upper ACL entry count limit. |
| Device-resource | STOR-RES | Card | Storage_usage | Ratio of the used storage space to the total storage space. |
| Device-resource | DEV-RES | Card | CPU_usage | Ratio of the used CPU capacity to the total CPU capacity. |
| Device-resource | DEV-RES | Card | Memory_usage | Ratio of the used memory to the total memory. |

Prerequisites for EAI

Make sure the KPI data collection feature is enabled for service modules in Table 2.

EAI tasks at a glance

To configure EAI, perform the following tasks:

·     Enabling EAI monitoring

·     Enabling EAI prediction

Enabling EAI monitoring

1.     Enter system view.

system-view

2.     Enter EAI view.

eai artificial intelligence

3.     Enable EAI monitoring.

eai monitoring enable

By default, EAI monitoring is disabled.

Enabling EAI prediction

1.     Enter system view.

system-view

2.     Enter EAI view.

eai artificial intelligence

3.     Enable EAI prediction.

eai prediction enable

By default, EAI prediction is disabled.

Display and maintenance commands for EAI

Execute display commands in any view.

 

| Task | Command |
|---|---|
| Display EAI monitoring information. | display eai monitoring |
| Display EAI prediction information and historical KPI data. | display eai prediction |

 
