
31-KPI data collection configuration


Configuring KPI data collection

About KPI data collection

The key performance indicators (KPIs) of the device are a set of performance values that indicate the running status of the device at a given moment. During operation, the device automatically collects KPI data and stores it in the flash memory.

The KPI data collection feature periodically collects various types of KPI data and records them in real time. From the collected KPI data, you can learn the device running status, when service failures occurred, the failure types, and their possible causes, which helps you troubleshoot issues quickly.

Basic concepts

The KPI data collection feature can collect a large quantity and variety of data. For example, the collected CPU usage of a card is a card-specific performance parameter that belongs to the Device-resource class and the DEV-RES module, with a value of 50%. To make all types of data easy to describe, categorize, and retrieve, KPI data is defined along the following dimensions:

· Indicator—Performance parameters and states collected by the KPI data collection feature, such as CPU usage, memory usage, FIB table usage, ARP table usage, card failures, power supply failures, and abnormal card temperature.

· Object—Physical entity to which an indicator belongs, such as a device, card, subcard, or interface. As the KPI data collection feature supports more indicators, more object types will be added. The object value format varies by object type. Available values include:

○ device—Specifies a device. Indicators for this object describe the overall condition of the device.

○ chassis.x/slot.y—Specifies a card. Indicators for this object describe the performance of the card. In standalone mode, x is 0 and y is the slot number of the card. In IRF mode, x is the member ID of the IRF member device and y is the slot number of the card.

○ chassis.x/slot.y/subslot.z—Specifies a subcard. Indicators for this object describe the performance of the subcard. In standalone mode, x is 0, y is the slot number of the card, and z is the subcard ID. In IRF mode, x is the member ID of the IRF member device, y is the slot number of the card, and z is the subcard ID.

○ interface-type interface-number—Specifies an interface by its type and number. Indicators for this object describe the running status of the physical interface.

· Module—Service module to which an indicator belongs. For example, the CPU usage and memory usage belong to the device resource (DEV-RES) module. The FIB table usage and ARP table usage belong to the forwarding resource (FWD-RES) module.

· Class—A group of indicators that together reflect the running status of one aspect of the device. The system provides some predefined classes, such as the network performance (Net-performance) class and the port state (Port-state) class.
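The four dimensions can be pictured as fields of a single record. The following Python sketch is illustrative only: the KpiRecord class and the parse_object helper are hypothetical names, not part of the device software. The object string formats follow the list above, and the slot number and interface name in the example are made up.

```python
import re
from dataclasses import dataclass


@dataclass
class KpiRecord:
    """One KPI data point described along the four dimensions (hypothetical model)."""
    kpi_class: str   # for example, "Device-resource"
    module: str      # for example, "DEV-RES"
    obj: str         # for example, "chassis.0/slot.2"
    indicator: str   # for example, "CPU_usage"
    value: float     # for example, 50.0 (percent)


# Card and subcard object strings; the "/subslot.z" part is optional.
_CARD_RE = re.compile(r"^chassis\.(\d+)/slot\.(\d+)(?:/subslot\.(\d+))?$")


def parse_object(obj: str) -> dict:
    """Split an object string into its type and numeric fields."""
    if obj == "device":
        return {"type": "device"}
    m = _CARD_RE.match(obj)
    if m:
        chassis, slot, subslot = m.groups()
        if subslot is None:
            return {"type": "card", "chassis": int(chassis), "slot": int(slot)}
        return {"type": "subcard", "chassis": int(chassis),
                "slot": int(slot), "subslot": int(subslot)}
    # Anything else is assumed to be an interface name, e.g. "GigabitEthernet1/0/1".
    return {"type": "interface", "name": obj}


record = KpiRecord("Device-resource", "DEV-RES", "chassis.0/slot.2", "CPU_usage", 50.0)
print(parse_object(record.obj))  # {'type': 'card', 'chassis': 0, 'slot': 2}
```

In standalone mode the chassis field is always 0, so a parsed value other than 0 indicates an IRF member ID.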

Operating mechanism

Figure 1 Operating mechanism

The device enabled with KPI data collection works as follows:

1. Collect KPI data. When KPI data collection is enabled for a service module, the KPI process collects KPI data for the module at regular intervals and temporarily saves the data in the device memory. The default collection interval is 300 seconds, and you can change it as required.

2. Store KPI data. The KPI process saves the collected KPI data to the flash memory at regular intervals. When the free space on the storage media is insufficient or the total KPI file size exceeds the threshold, the KPI process automatically deletes the earliest KPI files to free up space.
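With the default settings, a quick calculation shows roughly how many collection rounds occur between two saves to storage. This sketch assumes a single module and uses the defaults documented in this chapter: a 300-second collection interval (see "Specifying the KPI data collection interval for service modules") and a 1440-minute save interval (see "Configuring KPI data storage"). The function name is hypothetical.

```python
COLLECT_INTERVAL_S = 300   # default KPI data collection interval, in seconds
SAVE_INTERVAL_MIN = 1440   # default interval for saving KPI files, in minutes


def samples_per_save(collect_interval_s: int, save_interval_min: int) -> int:
    """Number of complete collection rounds between two saves to storage."""
    if collect_interval_s <= 0 or save_interval_min <= 0:
        raise ValueError("intervals must be positive")
    return (save_interval_min * 60) // collect_interval_s


print(samples_per_save(COLLECT_INTERVAL_S, SAVE_INTERVAL_MIN))  # 288
```

With the defaults, each enabled module therefore produces about 288 samples per indicator between saves.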

Available data for KPI data collection

Table 1 Available data for KPI data collection

Class	Module	Object	Indicator	Indicator description
Device-state	DEV	Card	Device_restarts	Number of device reboots.
	IRF	Device	IRF_splits	Number of IRF splits.
	IRF	Device	IRF_dual-active_count	Number of dual-master IRF fabrics.
	DEV	Card	LPU_failures	Number of LPU failures.
	DEV	Card	MPU_state	State of the MPU: · 0—The MPU is not present. · 1—The MPU is operating correctly. · 2—The MPU has failed.
	DEV	Card	MPU_failures	Number of MPU failures.
	DEV	Card	SFU_state	State of the SFU: · 0—The SFU is not present. · 1—The SFU is operating correctly. · 2—The SFU has failed.
	DEV	Card	SFU_failures	Number of SFU failures.
	DEV	Subcard	Subslot_failure	Number of subcard failures.
	FWD	Device	Inc_H_S_entries	Inconsistent hardware and software entries.
	FAN	Device	Fan_state	Fan tray state: · 0—Normal. · 1—Faulty.
	POWER	Device	Power_state	Power supply state: · 0—Normal. · 1—Faulty.
	POE	Device	PoE_state	PoE power supply state: · 0—Normal. · 1—Faulty.
	TEMP	Device	Card_temperature	Card temperature: · 0—Normal. · 1—Faulty.
	FS	Device	File_exceptions	Number of save operation failures due to file system error.
	DEV	Device	Process_abnormal_reboot	Number of process reboot failures.
	DEV	Device	Process_normal_reboot	Number of successful process reboots.
Device-resource	FWD-RES	Card	ARP_entry_usage	Ratio of the real-time ARP entry count to the upper ARP entry count limit.
	FWD-RES	Card	ARP_threshold_ratio	Ratio of the real-time ARP entry count to the ARP table usage threshold.
	FWD-RES	Card	MAC_entry_usage	Ratio of the real-time MAC entry count to the upper MAC entry count limit.
	FWD-RES	Card	MAC_threshold_ratio	Ratio of the real-time MAC entry count to the MAC table usage threshold.
	FWD-RES	Card	FIB_entry_usage	Ratio of the real-time FIB entry count to the upper FIB entry count limit.
	FWD-RES	Card	FIB_threshold_ratio	Ratio of the real-time FIB entry count to the FIB table usage threshold.
	FWD-RES	Card	ND_entry_usage	Ratio of the real-time ND entry count to the upper ND entry count limit.
	FWD-RES	Card	ND_threshold_ratio	Ratio of the real-time ND entry count to the ND table usage threshold.
	FWD-RES	Card	IPv4L2multicast_usage	Ratio of the real-time IPv4 Layer 2 multicast entry count to the upper IPv4 Layer 2 multicast entry count limit.
	FWD-RES	Card	IPv4L2multicast_ratio	Ratio of the real-time IPv4 Layer 2 multicast entry count to the IPv4 Layer 2 multicast entry count threshold.
	FWD-RES	Card	IPv6L2multicast_usage	Ratio of the real-time IPv6 Layer 2 multicast entry count to the upper IPv6 Layer 2 multicast entry count limit.
	FWD-RES	Card	IPv6L2multicast_ratio	Ratio of the real-time IPv6 Layer 2 multicast entry count to the IPv6 Layer 2 multicast entry count threshold.
	FWD-RES	Card	IPv4L3multicast_usage	Ratio of the real-time IPv4 Layer 3 multicast entry count to the upper IPv4 Layer 3 multicast entry count limit.
	FWD-RES	Card	IPv4L3multicast_ratio	Ratio of the real-time IPv4 Layer 3 multicast entry count to the IPv4 Layer 3 multicast entry count threshold.
	FWD-RES	Card	IPv6L3multicast_usage	Ratio of the real-time IPv6 Layer 3 multicast entry count to the upper IPv6 Layer 3 multicast entry count limit.
	FWD-RES	Card	IPv6L3multicast_ratio	Ratio of the real-time IPv6 Layer 3 multicast entry count to the IPv6 Layer 3 multicast entry count threshold.
	ACL-RES	Card	ACL_usage	Ratio of the real-time ACL entry count to the upper ACL entry count limit.
	ACL-RES	Card	ACL_threshold_ratio	Ratio of the real-time ACL entry count to the ACL entry count threshold.
	STOR-RES	Card	Storage_usage	Ratio of the used storage space to the total storage space.
	STOR-RES	Card	Storage_threshold_ratio	Ratio of the used storage space to the storage space usage threshold.
	DEV-RES	Card	CPU_usage	Ratio of the used CPU capacity to the total CPU capacity.
	DEV-RES	Card	CPU_threshold_ratio	Ratio of the used CPU capacity to the CPU usage threshold.
	DEV-RES	Card	Memory_usage	Ratio of the used memory to the total memory.
	DEV-RES	Card	Memory_threshold_ratio	Ratio of the used memory to the memory usage threshold.
Net-performance	LOOP-DCT	Device	L2 loop_state	Layer 2 loop state: · 0—The Layer 2 loop is operating correctly. · 1—Layer 2 loop has failed.
	IF-CI	Interface	Port_congestion	Number of packets dropped due to traffic congestion.
	IF-ERROR	Interface	Port_error	Number of packets dropped due to error packets.
	CPCAR	Device	CPCAR_loss	Number of packets dropped due to traffic policing configured on the control plane.
	STP-SWT	Device	STP_switchovers	Number of STP switchovers.
	LACP-SWT	Device	LACP_switchovers	Number of link aggregation switchovers.
	IRF-SWT	Device	IRF_switchovers	Number of IRF switchovers.
	M-LAG-SWT	Device	M-LAG_switchovers	Number of M-LAG switchovers.
	RRPP-SWT	Device	RRPP_switchovers	Number of RRPP switchovers.
	VRRP-SWT	Device	VRRP_switchovers	Number of VRRP switchovers.
	IF-USAGE	Device	Port_BW_usage	Bandwidth usage for all ports.
Port-state	PORT-ST	Device	Down_ports	Number of physical interfaces in down state.
	PORT-ST	Device	Port_flappings	Number of port flappings.
	TRAN-ST	Device	Opti-module_health	Transceiver module health. This indicator is not supported in the current software version.
Net-connection	RPNCS	Device	ISIS_peer_status	IS-IS neighbor connection state: · 0—The IS-IS neighbor connection is operating correctly. · 1—The IS-IS neighbor connection has failed.
	RPNCS	Device	OSPF_peer_status	OSPF neighbor connection state: · 0—The OSPF neighbor connection is operating correctly. · 1—The OSPF neighbor connection has failed.
	RPNCS	Device	OSPv3_peer_status	OSPFv3 neighbor connection state: · 0—The OSPFv3 neighbor connection is operating correctly. · 1—The OSPFv3 neighbor connection has failed.
	RPNCS	Device	BGP_peer_status	BGP neighbor connection state: · 0—The BGP neighbor connection is operating correctly. · 1—The BGP neighbor connection has failed.
	MCRCS	Device	Multicast_connection_status	Multicast route connection state: · 0—The multicast route connection is operating correctly. · 1—The multicast route connection has failed.
	DHCPCS	Device	DHCPv4_server_state	Statistics about DHCPv4 server address allocation failures.
	DHCPCS	Device	DHCPv6_server_state	Statistics about DHCPv6 server address allocation failures.
	DHCPCS	Device	DHCPv4_server_switching	Number of DHCPv4 server switchovers.
	DHCPCS	Device	DHCPv6_server_switching	Number of DHCPv6 server switchovers.
	DHCPCS	Device	DHCPv4_entry_failures	Number of DHCPv4 entry establishment failures.
	DHCPCS	Device	DHCPv6_entry_failures	Number of DHCPv6 entry establishment failures.
Net-security	AAA	Device	1X_AuthN_status	State of 802.1X authentication: · 0—802.1X authentication succeeded. · 1—802.1X authentication failed. An attack might exist.
	AAA	Device	1X_Usr&Pwd_status	State of the username and password for 802.1X authentication: · 0—The username and password are correct. · 1—The username and password are incorrect.
	AAA	Device	MAC_AuthN_status	State of MAC authentication: · 0—MAC authentication succeeded. · 1—MAC authentication failed. An attack might exist.
	AAA	Device	MAC_Usr&Pwd_status	State of the username and password for MAC authentication: · 0—The username and password are correct. · 1—The username and password are incorrect.
	AAA	Device	Portsec_AuthN_status	State of the port security authentication: · 0—The authentication succeeded. · 1—The authentication failed. An attack might exist.
	AAA	Device	Portsec_Usr&Pwd_status	State of the port security access username and password: · 0—The username and password are correct. · 1—The username and password are incorrect.
	AAA	Device	StaticUser_AuthN_status	State of the static user authentication: · 0—The authentication succeeded. · 1—The authentication failed. An attack might exist.
	AAA	Device	StaticUser_Usr&Pwd_status	State of the static username and password: · 0—The username and password are correct. · 1—The username and password are incorrect.
	ATTACK	Device	All-type_attacks	Number of all types of attacks.
	TCP	Device	TCP_attacks	Number of TCP attacks.
	ARP-ATK	Device	ARP_attacks	Number of ARP attacks.
	ND-ATK	Device	ND_attacks	Number of ND attacks.
	AAA	Device	Illegal_user_detections	Number of illegal user detections.
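Several indicators in Table 1 report a coded state rather than a count. When post-processing collected values, a lookup table such as the hypothetical STATE_CODES mapping below can turn the codes into readable text. The codes and meanings are copied from Table 1, and only a subset of indicators is shown.

```python
# Code-to-meaning mappings copied from Table 1 (subset; hypothetical helper).
STATE_CODES = {
    "MPU_state": {0: "not present", 1: "operating correctly", 2: "failed"},
    "SFU_state": {0: "not present", 1: "operating correctly", 2: "failed"},
    "Fan_state": {0: "normal", 1: "faulty"},
    "Power_state": {0: "normal", 1: "faulty"},
    "BGP_peer_status": {0: "operating correctly", 1: "failed"},
}


def decode_state(indicator: str, value: int) -> str:
    """Return the meaning of a state code, or a fallback for codes not in the table."""
    meanings = STATE_CODES.get(indicator)
    if meanings is None:
        raise KeyError(f"{indicator} is not a coded-state indicator in this mapping")
    return meanings.get(value, f"unknown code {value}")


print(decode_state("MPU_state", 2))  # failed
```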


Restrictions and guidelines

By default, KPI data collection is enabled for all service modules that support this feature on the device.

To prevent large volumes of collected data from affecting normal services, KPI data collection is suppressed when the device memory usage or CPU usage reaches its alarm threshold, and the KPI process stops collecting data. As a best practice, disable KPI data collection for modules other than the DEV-RES module. For more information about the memory and CPU usage alarm thresholds, see device management configuration in the Fundamentals Configuration Guide.
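The suppression rule can be expressed as a simple check. In this sketch, the threshold values are hypothetical placeholders, and both CPU and memory are expressed as usage percentages for simplicity; the device may define its memory alarm thresholds differently, so refer to the device management configuration for the actual values.

```python
def collection_allowed(cpu_usage: float, mem_usage: float,
                       cpu_alarm: float, mem_alarm: float) -> bool:
    """Return False when CPU or memory usage has reached its alarm threshold.

    All values are percentages. The thresholds are placeholders, not the
    device's configured alarm thresholds.
    """
    return cpu_usage < cpu_alarm and mem_usage < mem_alarm


print(collection_allowed(45.0, 60.0, cpu_alarm=90.0, mem_alarm=80.0))  # True
print(collection_allowed(45.0, 80.0, cpu_alarm=90.0, mem_alarm=80.0))  # False
```

The comparison uses "reaches" literally: a value equal to the threshold already suppresses collection.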

KPI data collection tasks at a glance

To configure KPI data collection, perform the following tasks:

· (Optional.) Configuring KPI data storage

· (Optional.) Configuring KPI file aging

· (Optional.) Copying the KPI data on the standby MPU to the active MPU

· (Optional.) Disabling KPI data collection for service modules

· (Optional.) Specifying the KPI data collection interval for service modules

Configuring KPI data storage

About this task

The KPI data in memory is saved to KPI files on the storage media at regular intervals. Use this feature to change the KPI file directory and the interval for saving KPI files to the storage media.

Procedure

1. Enter system view.

system-view

2. Specify the interval for saving KPI files to the storage media.

kpi file save-interval interval

By default, KPI files are saved to the storage media at an interval of 1440 minutes.

3. Specify the KPI file directory.

kpi file directory dir-name

By default, KPI files are saved in the flash:/kpi directory.
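For scripted deployments, the storage commands can be generated from parameters. This Python helper is hypothetical: it only assembles the command text documented in the procedure above and does not check values against the ranges the device supports, which this section does not list.

```python
def storage_commands(save_interval_min: int, dir_name: str) -> list[str]:
    """Assemble the KPI storage commands in the order shown in the procedure."""
    if save_interval_min <= 0:
        raise ValueError("save interval must be a positive number of minutes")
    return [
        "system-view",
        f"kpi file save-interval {save_interval_min}",  # interval in minutes
        f"kpi file directory {dir_name}",
    ]


for line in storage_commands(1440, "flash:/kpi"):
    print(line)
```

The values in the example are the documented defaults.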

Configuring KPI file aging

About this task

When the free space on the storage media is insufficient or the total KPI file size exceeds the threshold, the KPI process automatically deletes the earliest KPI files to free up space. Use this feature to change the free storage space threshold and the total KPI file size threshold that trigger KPI file aging.

Procedure

1. Enter system view.

system-view

2. Specify the free storage media capacity threshold for triggering KPI file aging.

kpi file aging threshold remain-disk-size size

By default, the free storage media capacity threshold for triggering KPI file aging is 128 MB.

3. Specify the KPI file size threshold for triggering KPI file aging.

kpi file aging threshold total-file-size size

By default, the KPI file size threshold for triggering KPI file aging is 128 MB.
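The aging behavior amounts to an oldest-first cleanup loop. The following sketch models it under stated assumptions: each KPI file is represented by a hypothetical KpiFile record with a name, a creation timestamp, and a size in MB, and deletion continues until both thresholds are satisfied. How the device orders and removes files internally is not documented here.

```python
from dataclasses import dataclass


@dataclass
class KpiFile:
    name: str
    created: int     # creation time, for example a Unix timestamp
    size_mb: float


def age_files(files, free_mb, remain_disk_size=128.0, total_file_size=128.0):
    """Delete the earliest KPI files until free space is at least
    remain_disk_size and the total KPI file size is at most total_file_size.

    Defaults mirror the documented 128 MB thresholds.
    Returns (kept_files, deleted_names); the input list is not modified.
    """
    kept = sorted(files, key=lambda f: f.created)  # earliest first
    deleted = []
    total = sum(f.size_mb for f in kept)
    while kept and (free_mb < remain_disk_size or total > total_file_size):
        oldest = kept.pop(0)
        deleted.append(oldest.name)
        total -= oldest.size_mb
        free_mb += oldest.size_mb  # deleting a file frees its space
    return kept, deleted


files = [KpiFile("kpi_3", 300, 50), KpiFile("kpi_1", 100, 50), KpiFile("kpi_2", 200, 50)]
kept, deleted = age_files(files, free_mb=500)
print(deleted)  # ['kpi_1']
```

Here the total size (150 MB) exceeds the 128 MB threshold, so only the earliest file is removed; with little free space, more files would be deleted.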

Copying the KPI data on the standby MPU to the active MPU

About this task

After an MPU active/standby switchover, the new active MPU cannot automatically obtain the KPI data from the old active MPU (now the standby MPU). To keep the KPI data continuous, use this feature to copy the KPI data on the old active MPU to the new active MPU.

IMPORTANT:

If the administrator changes the KPI file directory by using the kpi file directory command before the active/standby switchover, the original active MPU has two KPI file directories. After the switchover, this feature copies only the KPI data in the new directory on the old active MPU to the same directory on the new active MPU. KPI data files in the old directory are not copied.

Procedure

1. Enter system view.

system-view

2. Copy the KPI data on the standby MPU to the active MPU.

kpi copy-file to active-mpu
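The directory restriction in the IMPORTANT note can be illustrated by filtering file paths: only files under the directory currently set by kpi file directory are copied. The file names and the second directory below are hypothetical, and treating subdirectories as included is an assumption of this sketch.

```python
def files_to_copy(file_paths, current_dir):
    """Keep only files stored under the currently configured KPI directory."""
    prefix = current_dir.rstrip("/") + "/"
    return [p for p in file_paths if p.startswith(prefix)]


# Old active MPU after the directory was changed from flash:/kpi to flash:/kpi2.
on_old_active = ["flash:/kpi/kpi_1.dat", "flash:/kpi2/kpi_2.dat"]
print(files_to_copy(on_old_active, "flash:/kpi2"))  # ['flash:/kpi2/kpi_2.dat']
```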

Disabling KPI data collection for service modules

About this task

To prevent a large amount of collected data from affecting normal services, use this feature to disable KPI data collection for some service modules when the device memory usage or CPU usage is high.

Procedure

1. Enter system view.

system-view

2. Enter probe view.

probe

3. Disable KPI data collection for service modules.

undo kpi system internal collect module [ module-name ] enable

By default, KPI data collection is enabled for all service modules that support this feature on the device.
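Following the best practice in "Restrictions and guidelines", you might disable collection for every module except DEV-RES. The helper below is a hypothetical sketch that emits one undo kpi system internal collect module module-name enable line per module. The example module names come from Table 1; check them against the modules your device actually supports, for example with display kpi module-info.

```python
def disable_commands(modules, keep=("DEV-RES",)):
    """Build commands that disable KPI collection for all modules not in keep."""
    lines = ["system-view", "probe"]  # enter system view, then probe view
    for name in modules:
        if name not in keep:
            lines.append(f"undo kpi system internal collect module {name} enable")
    return lines


for line in disable_commands(["DEV-RES", "FWD-RES", "ACL-RES"]):
    print(line)
```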

Specifying the KPI data collection interval for service modules

About this task

Use this feature to change the KPI data collection interval for service modules.

Procedure

1. Enter system view.

system-view

2. Enter probe view.

probe

3. Specify the KPI data collection interval for service modules.

kpi system internal module module-name collect-interval collect-interval

By default, the KPI data collection interval is 300 seconds.
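A similar hypothetical helper can produce the interval command and report how many collection rounds per day a given interval implies, which helps weigh data granularity against load. The interval is assumed to be in seconds, matching the 300-second default. The valid range is not given in this section, so the helper only checks that the value is positive.

```python
def interval_command(module_name: str, seconds: int) -> tuple[str, int]:
    """Return the collect-interval command and the resulting collections per day."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    cmd = f"kpi system internal module {module_name} collect-interval {seconds}"
    per_day = 86400 // seconds  # whole collection rounds in 24 hours
    return cmd, per_day


cmd, per_day = interval_command("DEV-RES", 600)
print(cmd)      # kpi system internal module DEV-RES collect-interval 600
print(per_day)  # 144
```

Run the generated command in probe view, as shown in the procedure.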

Display and maintenance commands for KPI data collection

Execute display commands in any view.

Task	Command
Display the KPI data of service modules and objects for a remote device.	display external-kpi data [ device-ip ip-address [ module module-name [ object object-name ] ] ]
Display KPI data collection information for service modules.	display kpi module-info [ module-name ] [ verbose ]
Display the KPI data for service modules and objects within a time range on the storage media.	display kpi data module module-name object object-name from time1 date1 to time2 date2 [ file file-path ]

Configuring EAI

About EAI

Embedded Artificial Intelligence (EAI) is a KPI monitoring and prediction technology based on intelligent algorithms. Using the historical indicator values collected by the KPI data collection feature, EAI monitors and predicts indicator values in real time. It helps the administrator analyze the trends of key indicators on the device and proactively prevent potential failures.

EAI monitoring

Based on the historical indicator values collected by the KPI data collection feature, the device dynamically generates alarm thresholds and recovery thresholds for the indicators in Table 2.

· When an indicator value falls outside the alarm threshold range, the device logs the threshold violation event and reports it to an NMS through SNMP.

· When the indicator value returns to within the alarm threshold range, the device logs the recovery event and reports it to an NMS through SNMP.
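EAI's threshold algorithm is not described in this guide. To illustrate only the alarm and recovery event flow, the following sketch substitutes a simple statistical rule, the historical mean plus or minus k population standard deviations, for the dynamic thresholds. The rule, the value of k, the sample data, and all function names are assumptions, not the EAI method.

```python
import statistics


def threshold_range(history, k=3.0):
    """Stand-in for EAI's dynamic thresholds: mean +/- k * population std dev."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    return mean - k * std, mean + k * std


def monitor(values, low, high):
    """Yield ("alarm", i, v) when a value leaves [low, high] and
    ("recovery", i, v) when it returns, mimicking the events described above."""
    in_alarm = False
    for i, v in enumerate(values):
        outside = v < low or v > high
        if outside and not in_alarm:
            in_alarm = True
            yield ("alarm", i, v)
        elif not outside and in_alarm:
            in_alarm = False
            yield ("recovery", i, v)


history = [40, 42, 38, 41, 39, 40]   # hypothetical CPU_usage samples, in percent
low, high = threshold_range(history)
print(list(monitor([41, 70, 72, 40], low, high)))  # [('alarm', 1, 70), ('recovery', 3, 40)]
```

Only the transitions produce events, so consecutive out-of-range samples (70 and 72) raise a single alarm.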

EAI prediction

When EAI prediction is enabled, the device dynamically calculates the predicted indicator values for 30 days later based on historical KPI data.

· When a predicted indicator value falls outside the alarm threshold range, the device logs the threshold violation event and reports it to an NMS through SNMP.

· When the predicted indicator value returns to within the alarm threshold range, the device logs the recovery event and reports it to an NMS through SNMP.
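EAI's prediction model is likewise not documented here. As an illustration of forecasting a value 30 days ahead from history, this sketch fits an ordinary least-squares line to daily samples and extrapolates it. Linear regression is a swapped-in technique chosen for simplicity, and the sample data and alarm threshold are hypothetical.

```python
def predict_ahead(daily_values, days_ahead=30):
    """Fit y = a + b*x by least squares (x = day index) and extrapolate
    days_ahead days past the last sample."""
    n = len(daily_values)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(daily_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * ((n - 1) + days_ahead)


usage = [50, 51, 52, 53, 54, 55, 56]  # hypothetical daily Storage_usage, in percent
ALARM_HIGH = 80.0                     # hypothetical alarm threshold
predicted = predict_ahead(usage)
print(predicted)  # 86.0
if predicted > ALARM_HIGH:
    print("predicted threshold violation")
```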

Available indicators for EAI

Table 2 Available indicators for EAI

Class	Module	Object	Indicator	Indicator description
Device-resource	FWD-RES	Card	ARP_entry_usage	Ratio of the real-time ARP entry count to the upper ARP entry count limit.
	FWD-RES	Card	MAC_entry_usage	Ratio of the real-time MAC entry count to the upper MAC entry count limit.
	FWD-RES	Card	FIB_entry_usage	Ratio of the real-time FIB entry count to the upper FIB entry count limit.
	FWD-RES	Card	ND_entry_usage	Ratio of the real-time ND entry count to the upper ND entry count limit.
	FWD-RES	Card	IPv4L2multicast_usage	Ratio of the real-time IPv4 Layer 2 multicast entry count to the upper IPv4 Layer 2 multicast entry count limit.
	FWD-RES	Card	IPv6L2multicast_usage	Ratio of the real-time IPv6 Layer 2 multicast entry count to the upper IPv6 Layer 2 multicast entry count limit.
	FWD-RES	Card	IPv4L3multicast_usage	Ratio of the real-time IPv4 Layer 3 multicast entry count to the upper IPv4 Layer 3 multicast entry count limit.
	FWD-RES	Card	IPv6L3multicast_usage	Ratio of the real-time IPv6 Layer 3 multicast entry count to the upper IPv6 Layer 3 multicast entry count limit.
	ACL-RES	Card	ACL_usage	Ratio of the real-time ACL entry count to the upper ACL entry count limit.
	STOR-RES	Card	Storage_usage	Ratio of the used storage space to the total storage space.
	DEV-RES	Card	CPU_usage	Ratio of the used CPU capacity to the total CPU capacity.
	DEV-RES	Card	Memory_usage	Ratio of the used memory to the total memory.


Prerequisites for EAI

Make sure KPI data collection is enabled for the service modules listed in Table 2.

EAI tasks at a glance

To configure EAI, perform the following tasks:

· Enabling EAI monitoring

· Enabling EAI prediction

Enabling EAI monitoring

1. Enter system view.

system-view

2. Enter EAI view.

eai artificial intelligence

3. Enable EAI monitoring.

eai monitoring enable

By default, EAI monitoring is disabled.

Enabling EAI prediction

1. Enter system view.

system-view

2. Enter EAI view.

eai artificial intelligence

3. Enable EAI prediction.

eai prediction enable

By default, EAI prediction is disabled.

Display and maintenance commands for EAI

Execute display commands in any view.

Task	Command
Display EAI monitoring information.	display eai monitoring
Display EAI prediction information and historical KPI data.	display eai prediction

13-Network Management and Monitoring Configuration Guide
