09-KPI data collection configuration

Configuring KPI data collection

About KPI data collection

The key performance indicators (KPIs) of the device are a set of performance values that indicate the device's running status at a certain moment. During operation, the device automatically collects KPI data and stores the KPI data in the flash.

The KPI data collection feature periodically collects various types of KPI data and records the KPI data in real time. Based on the collected KPI data, you can understand the device running status, service failure time, service failure type, and possible failure causes and quickly troubleshoot the issues.

Basic concepts

The KPI data collection feature can collect a vast quantity and variety of data. For example, the collected CPU usage for a card is a performance parameter belonging to the Device-resource class. The CPU usage is a card-specific parameter belonging to the DEV-RES module, and its value is 50%. To easily describe, categorize, and retrieve all types of data, KPI data is defined from the following dimensions:

· Indicator—Performance parameters and state collected by the KPI data collection feature, such as the CPU usage, memory usage, FIB table usage, ARP table usage, card failures, power supply failures, and abnormal card temperature.

· Object—A physical or logical entity to which an indicator belongs, such as device, card, and subcard. As the KPI data collection feature can collect more and more indicators, the object types will also become more diverse. The object name varies by object type. For external indicators, the object type and object name vary by module and device model. For more information about object names, see "Available external indicators for KPI data collection." Object name examples are as follows:

¡ device—Specifies a device. Indicators for this object describe the overall condition of the device.

¡ chassis.x/slot.y—Specifies a card. Indicators for this object describe the performance and state of the card. The value for the x argument is 0, and the y argument represents the slot number of the card.

¡ chassis.x/slot.y/subslot.z—Specifies a subcard. Indicators for this object describe the performance and state of the subcard. The value for the x argument is 0, the y argument represents the slot number of the card, and the z argument represents the subcard ID.

¡ interface-type interface-number—Specifies an interface by its type and number. Indicators for this object describe the running status of the physical interface.

· Module—Module to which an indicator belongs. For example, the total number of BFD sessions belongs to the BFD module (module name BFD).

· Class—A collection of a certain type of indicators. Some indicators can indicate the running status of a certain aspect of the device. Such indicators can be divided into a class. The system has predefined some classes, such as the network performance (Net-performance) class and port state (Port-state) class.
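The object name formats above can be illustrated with a small parser. This is a minimal sketch, not part of the device software: the function name `parse_object_name` and the returned field names are hypothetical, and only the device, card, and subcard formats listed above are handled.

```python
import re

# Pattern for the card and subcard formats listed above. Other object types
# (interfaces, queues, pools, and so on) are deliberately not handled here.
_OBJECT_RE = re.compile(
    r"^chassis\.(?P<chassis>\d+)/slot\.(?P<slot>\d+)"
    r"(?:/subslot\.(?P<subslot>\d+))?$"
)


def parse_object_name(name):
    """Return (object_type, fields) for a device, card, or subcard name."""
    if name == "device":
        return "device", {}
    match = _OBJECT_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized object name: {name!r}")
    # Keep only the parts that are present, converted to integers.
    fields = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    kind = "subcard" if "subslot" in fields else "card"
    return kind, fields
```

For example, `parse_object_name("chassis.0/slot.3")` yields the object type `card` with chassis 0 and slot 3.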

Operating mechanism

Figure 1 Operating mechanism

The device enabled with KPI data collection works as follows:

1. Collects KPI data. With KPI data collection enabled, modules push the collected data for indicators to the KPI process at intervals. By default, the KPI data collection intervals for external indicators are customized for modules.

The KPI process temporarily saves the obtained KPI data in the device memory.

2. Stores KPI data. The KPI process stores the obtained KPI data in the flash at intervals. When the remaining storage media space is insufficient or the total KPI file size exceeds the threshold, the KPI process automatically deletes the earliest KPI files to release some space.
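The collect-then-store flow can be pictured with a toy model. The sketch below is illustrative only: the class `KpiBuffer`, its fields, and the use of plain integer timestamps are assumptions made for the example and do not describe the actual KPI process.

```python
from dataclasses import dataclass, field


@dataclass
class KpiBuffer:
    """Toy model: samples wait in memory until the next save to storage."""
    save_interval: int                              # time units between saves (hypothetical)
    pending: list = field(default_factory=list)     # samples held in memory
    stored: list = field(default_factory=list)      # stands in for files on the flash
    last_save: int = 0

    def push(self, now, module, indicator, value):
        # A module pushes one collected value to the KPI process.
        self.pending.append((now, module, indicator, value))
        if now - self.last_save >= self.save_interval:
            self.flush(now)

    def flush(self, now):
        # Move everything collected so far from memory to storage.
        self.stored.extend(self.pending)
        self.pending.clear()
        self.last_save = now
```

In this model, values pushed between two saves stay in `pending`, and the first push at or after the save interval moves them all to `stored`.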

Available indicators for KPI data collection

Available external indicators for KPI data collection

Table 1 lists some available external indicators for KPI data collection.

Table 1 Available external indicators for KPI data collection

Class	Module	Object	Indicator	Indicator description
Net-performance	IF-USAGE-EX	Interface. The value is interface-type interface-number.	Port_TXBW_Usage	Output bandwidth usage of the interface, in the range of 0% to 100%.
Net-performance	IF-USAGE-EX	Interface. The value is interface-type interface-number.	Port_RXBW_Usage	Input bandwidth usage of the interface, in the range of 0% to 100%.
Device-resource	ARP-MSG-QUEUE	Custom message queue. The value is message-queue.	ARP_PKTQ_Health	ARP packet queue health, which is the ratio of the number of messages in the queue to the total queue size. The value range is 0% to 100%.
	ARP-MSG-QUEUE	Custom message queue. The value is message-queue.	ARP_EVTQ_Health	ARP event queue health, which is the ratio of the number of messages in the queue to the total queue size. The value range is 0% to 100%.
	ND-MSG-QUEUE	Custom message queue. The value is message-queue.	ND_PKTQ_Health	ND packet queue health, which is the ratio of the number of messages in the queue to the total queue size. The value range is 0% to 100%.
	ND-MSG-QUEUE	Custom message queue. The value is message-queue.	ND_EVTQ_Health	ND event queue health, which is the ratio of the number of messages in the queue to the total queue size. The value range is 0% to 100%.
	MAC-USAGE	Card. The value is chassis.x/slot.y.	MAC_Addr_Useage	MAC address usage of the card, in the range of 0% to 100%.
	AGG-USAGE	Device. The value is device.	AGG_ID_Useage	Aggregate interface ID resource usage on the device, in the range of 0% to 100%.
	ACL_USE	Chip. The value is chassis.x/slot.y/chip.z.	ACL_USE_IPV4_RATIO	IPv4 ACL entry resource usage, in the range of 0% to 100%.
	ACL_USE	Chip. The value is chassis.x/slot.y/chip.z.	ACL_USE_IPV6_RATIO	IPv6 ACL entry resource usage, in the range of 0% to 100%.
	CGN	Card. The value is chassis.x/slot.y.	CGN_SESSION_USAGE	CGN session resource usage, in the range of 0% to 100%.
	CGN	Card. The value is chassis.x/slot.y.	CGN_FWD_RX_USAGE	Input bandwidth usage of the CGN card, in the range of 0% to 100%.
	CGN	Card. The value is chassis.x/slot.y.	CGN_FWD_TX_USAGE	Output bandwidth usage of the CGN card, in the range of 0% to 100%.
	NAT_ADDRGRP_RES	NAT address group usage. The value is group-name.	NAT_ADDRGRP_CUR_RES_USAGE	Resource usage of the address group, in the range of 0% to 100%.
	NAT_IPPOOL_RES	Address pool. The value is pool-name.	NAT_IPPOOL_CUR_RES_USAGE	Address pool resource usage, in the range of 0% to 100%.
	NAT_QUEUE	Queue. The value is Nat_cgn_send_queue.	NAT_QUEUE_LEN	Queue length.
	CGN_QUEUE	Queue. The value is CGN_SEND_QUEUE.	CGN_QUEUE_LEN	Queue length.
	SOFTQUEINFO	Custom message queue. The value is chassis.x/slot.y/RQ.u.	SOFTQUE_PACKET	Number of packets in the software queue.
	SOFTQUEINFO	Custom message queue. The value is chassis.x/slot.y/RQ.u.	SOFTQUE_DROP	Number of dropped packets in the software queue.
Device-state	BFD	Device. The value is device.	BFD_TOTAL_NUMBER	Total number of BFD sessions.
	VRRP-V4	Device. The value is device.	VRRP_FAIL_STATE_RATIO_V4	Ratio of abnormal VRRPv4 sessions to total VRRPv4 sessions. The value is a decimal in the range of 0 to 1.
	VRRP-V6	Device. The value is device.	VRRP_FAIL_STATE_RATIO_V6	Ratio of abnormal VRRPv6 sessions to total VRRPv6 sessions. The value is a decimal in the range of 0 to 1.
	VRRP-V4	Device. The value is device.	VRRP_STATE_CONVER_V4	Number of master/backup switchovers in the VRRPv4 group.
	VRRP-V6	Device. The value is device.	VRRP_STATE_CONVER_V6	Number of master/backup switchovers in the VRRPv6 group.
	STRUNK	Subcard. The value is chassis.x/slot.y/subslot.z.	STRUNK_FAIL_STATE_RATIO	Ratio of abnormal S-Trunk sessions to total S-Trunk sessions. The value is a decimal in the range of 0 to 1.
	STRUNK	Device. The value is device.	STRUNK_GROUP_STATE_CONVER	Number of S-Trunk group state changes.
	STRUNK	Device. The value is device.	STRUNK_MEMBER_STATE_CONVER	Number of S-Trunk member state changes.
	ACL_STATE	Card. The value is chassis.x/slot.y.	ACL_COPP_IPV4_USE_STATE	State of IPv4 ACL entry operation in a COPP service cycle: · 0—Normal. · 1—Abnormal.
	ACL_STATE	Card. The value is chassis.x/slot.y.	ACL_COPP_IPV6_USE_STATE	State of IPv6 ACL entry operation in a COPP service cycle: · 0—Normal. · 1—Abnormal.
	ACL_STATE	Card. The value is chassis.x/slot.y.	ACL_ATTACK_USE_STATE	State of ACL entry operation in an anti-attack service cycle: · 0—Normal. · 1—Abnormal.
	ARP_ND	Card. The value is chassis.x/slot.y.	ARP_USE_NUM	Number of ARP entry deployments within a collection interval.
	FIB	Card. The value is chassis.x/slot.y.	FIB_USE_NUM	Number of FIB entry deployments within a collection interval.
	IBC_CHAN	Device. The value is device.	IBC_GE_STATE	GE channel link state: · 0—Normal. · 1—Abnormal.
	IBC_CHAN	Device. The value is device.	IBC_FE_STATE	FE channel link state: · 0—Normal. · 1—Abnormal.
	ARP_ND	Card. The value is chassis.x/slot.y.	ND_USE_NUM	Number of ND entry deployments within a collection interval.
	ARP_ND	Card. The value is chassis.x/slot.y.	ARP_USE_STATE	Whether ARP entry deployment to the driver is abnormal within a collection interval.
	ARP_ND	Card. The value is chassis.x/slot.y.	ND_USE_STATE	Whether ND entry deployment to the driver is abnormal within a collection interval.
	FIB	Card. The value is chassis.x/slot.y.	FIB6_USE_NUM	Number of IPv6 FIB entry deployments within a collection interval.
	FIB	Card. The value is chassis.x/slot.y.	FIB4_USE_STATE	Whether IPv4 FIB entry deployment to the driver is abnormal within a collection interval.
	FIB	Card. The value is chassis.x/slot.y.	FIB6_USE_STATE	Whether IPv6 FIB entry deployment to the driver is abnormal within a collection interval.
	VSRP_INSTANCE	VSRP instance. The value is instance-name.	VSRP_INSTANCE_STATE	VSRP instance state: · Master. · Backup. · Down.
	VSRP_PEER	VSRP peer. The value is peer-name.	VSRP_PEER_STATE	VSRP peer state: · 0—Connected. · 1—Not connected.
	BFD	Device. The value is device.	BFD_NORMAL_NUMBER	Number of normal BFD sessions.
	PROTORATE	Protocol type: · 1588_FD. · 1588_FE. · 1588_FF. · 8021x_B.	PROTO_RATE	Rate of protocol packets sent to the operating system, in pps.
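Table 1 mixes two value conventions: usage and queue health indicators are percentages from 0% to 100%, while the fail-state ratios are decimals from 0 to 1. The sketch below shows both calculations as described in the table. The function names are hypothetical, and returning 0 when there are no sessions is an assumption, not documented behavior.

```python
def queue_health_percent(messages_in_queue, queue_size):
    """Queue health as described in Table 1: queued messages / total queue size."""
    if queue_size <= 0:
        raise ValueError("queue_size must be positive")
    return 100.0 * messages_in_queue / queue_size


def fail_state_ratio(abnormal_sessions, total_sessions):
    """Ratio of abnormal sessions to total sessions, as a decimal from 0 to 1."""
    if total_sessions == 0:
        return 0.0  # assumption: with no sessions, nothing is abnormal
    return abnormal_sessions / total_sessions
```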

Restrictions and guidelines

By default, KPI data collection is enabled for all modules that support this feature on the device.

To prevent data collection from affecting normal services due to a large amount of data, the KPI data collection feature is suppressed when the device memory or CPU usage reaches the alarm threshold. At the same time, the KPI process stops collecting data. For detailed information about the alarm thresholds for the device memory and CPU usage, see device management configuration in Fundamentals Configuration Guide.

KPI data collection tasks at a glance

To configure KPI data collection, perform the following tasks:

· (Optional.) Configuring KPI data storage

· (Optional.) Configuring KPI file aging

· (Optional.) Copying the KPI data on the standby MPU to the active MPU

· (Optional.) Disabling KPI data collection for a module

· (Optional.) Specifying the KPI data collection interval for a module

· (Optional.) Enabling alarms for a KPI data collection indicator

· (Optional.) Configuring alarm thresholds for an indicator

· (Optional.) Specifying a log and alarm report mode for an indicator

· (Optional.) Enabling SNMP notifications for KPI

Configuring KPI data storage

About this task

The KPI files in the memory are saved to the storage media at intervals. Use this feature to edit the KPI file directory and the interval for saving KPI files to the storage media.

Procedure

1. Enter system view.

system-view

2. Specify the interval for saving KPI files to the storage media.

kpi file save-interval interval

By default, KPI files are saved to the storage media at an interval of 1440 minutes.

3. Specify the KPI file directory.

kpi file directory dir-name

By default, KPI files are saved in the flash:/kpi directory.
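As a worked example of the procedure, the helper below assembles the command sequence for a given save interval and directory. It is a sketch only: the function name is hypothetical, the interval is assumed to be in minutes (matching the unit of the default), and the valid value range on a specific device is not checked.

```python
def kpi_storage_commands(save_interval_minutes, directory):
    """Build the command sequence from the procedure above."""
    if save_interval_minutes <= 0:
        raise ValueError("interval must be positive")
    return [
        "system-view",
        # Assumption: the interval argument is expressed in minutes.
        f"kpi file save-interval {save_interval_minutes}",
        f"kpi file directory {directory}",
    ]
```

Calling `kpi_storage_commands(60, "flash:/kpi-data")` returns the three commands in order. Both argument values are hypothetical; verify the supported range and directory format on the device before use.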

Configuring KPI file aging

About this task

When the free storage media space is insufficient or the total KPI file size exceeds the threshold, the KPI process automatically deletes the earliest KPI files to release some space. Use this feature to edit the free storage media capacity threshold and the KPI file size threshold for triggering KPI file aging.

Procedure

1. Enter system view.

system-view

2. Specify the free storage media capacity threshold for triggering KPI file aging.

kpi file aging threshold remain-disk-size size

By default, the free storage media capacity threshold for triggering KPI file aging is 128 MB.

3. Specify the KPI file size threshold for triggering KPI file aging.

kpi file aging threshold total-file-size size

By default, the KPI file size threshold for triggering KPI file aging is 128 MB.
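The aging rule can be sketched as a loop that removes the earliest files until both conditions are satisfied. This is an illustrative model under stated assumptions: the function name, the oldest-first list representation, and sizes in MB are hypothetical, and the actual deletion granularity on the device might differ.

```python
def age_kpi_files(files, free_mb, min_free_mb=128, max_total_mb=128):
    """Delete the earliest files until both thresholds are satisfied.

    files: list of (name, size_mb) tuples, ordered oldest first.
    Returns (kept_files, deleted_names, free_mb_after).
    """
    kept = list(files)
    deleted = []
    total = sum(size for _, size in kept)
    # Aging is triggered by too little free space or too much KPI data.
    while kept and (free_mb < min_free_mb or total > max_total_mb):
        name, size = kept.pop(0)   # the earliest file goes first
        deleted.append(name)
        total -= size
        free_mb += size            # deleting a file releases its space
    return kept, deleted, free_mb
```

The default arguments mirror the 128 MB defaults stated in the procedure.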

Copying the KPI data on the standby MPU to the active MPU

About this task

After an MPU active/standby switchover on the device, the new active MPU cannot automatically obtain the KPI data from the old active MPU (current standby MPU). To ensure service continuity, you must use this feature to copy the KPI data on the old active MPU to the new active MPU.

IMPORTANT:

If the administrator edits the KPI file directory by using the kpi file directory command before the active/standby switchover, the original active MPU will have two KPI file directories. After the switchover, this feature enables the system to copy only the KPI data stored in the new directory on the old active MPU to the same directory on the new active MPU. The KPI data files in the old directory cannot be copied to the new active MPU.

Procedure

1. Enter system view.

system-view

2. Copy the KPI data on the standby MPU to the active MPU.

kpi copy-file to active-mpu

The slot slot-number option is supported only on CTRL-VMs.

Disabling KPI data collection for a module

About this task

To prevent data collection from affecting normal services due to a large amount of data, use this feature to disable KPI data collection for some modules when the device memory usage or CPU usage is high.

Procedure

1. Enter system view.

system-view

2. Disable KPI data collection for a module.

undo kpi collect module [ module-name ] enable

By default, KPI data collection is enabled for all modules that support this feature on the device.

Specifying the KPI data collection interval for a module

About this task

You can use this feature to edit the KPI data collection interval for a module.

Procedure

1. Enter system view.

system-view

2. Specify the KPI data collection interval for a module.

kpi module module-name collect-interval collect-interval

By default, the KPI data collection intervals for modules to which external indicators belong are customized.

To view the KPI data collection intervals for modules, execute the display kpi module-info command.

Enabling alarms for a KPI data collection indicator

About this task

After you enable alarms for a KPI data collection indicator, the device reports the collected data to the NMS through SNMP and generates a log for it.

For more information about alarms for indicators, execute the display system internal kpi register-status command.

Procedure

1. Enter system view.

system-view

2. Enable alarms for a KPI data collection indicator.

kpi collect indicator indicator-name alarm-enable

By default, the enabling status of alarms varies by module and device model. To obtain the enabling status of alarms, execute the display system internal kpi register-status command.

Configuring alarm thresholds for an indicator

About this task

The device generates an alarm in the following conditions:

· The device generates a level-1 alarm if the indicator value collected by KPI exceeds the level-1 alarm threshold.

· The device generates a level-2 alarm if the indicator value collected by KPI exceeds the level-2 alarm threshold.

· The device generates a level-3 alarm if the indicator value collected by KPI exceeds the level-3 alarm threshold.

· The device generates a low value alarm if the indicator value collected by KPI drops below the low alarm threshold.

The device generates an alarm clearance notification in the following conditions:

· The device generates a level-3 alarm clearance notification if the indicator value collected by KPI drops below the level-3 alarm threshold.

· The device generates a level-2 alarm clearance notification if the indicator value collected by KPI drops below the level-2 alarm threshold.

· The device generates a level-1 alarm clearance notification if the indicator value collected by KPI drops below the level-1 alarm threshold.

· The device generates a low value alarm clearance notification if the indicator value collected by KPI exceeds the low alarm threshold but does not exceed the level-1 alarm threshold.
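The conditions above can be summarized as a classification of one collected value. The sketch below assumes that the thresholds are ordered low < level-1 < level-2 < level-3 and that only the highest crossed level is returned. Both points are assumptions for illustration, and the function name is hypothetical.

```python
def alarm_level(value, low, l1, l2, l3):
    """Return the alarm state for one collected indicator value.

    Assumes low < l1 < l2 < l3 (not confirmed by the source).
    """
    # "Exceeds" is modeled as strictly greater than the threshold.
    if value > l3:
        return "level-3"
    if value > l2:
        return "level-2"
    if value > l1:
        return "level-1"
    # "Drops below" is modeled as strictly less than the low threshold.
    if value < low:
        return "low"
    return "normal"
```

Under these assumptions, an alarm clearance corresponds to the returned state moving to a lower level (or from `low` back to `normal`) between two consecutive collected values.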

Procedure

1. Enter system view.

system-view

2. Configure alarm thresholds for an indicator.

kpi collect [ percentage ] indicator indicator-name threshold low low-threshold normal normal-threshold l1warning l1-threshold l2warning l2-threshold l3warning l3-threshold

By default, the alarm thresholds for an indicator are customized for each module. To view the alarm thresholds, execute the display system internal kpi register-status command.

Specifying a log and alarm report mode for an indicator

About this task

Whether a trap and a log are generated and reported for an indicator varies by module and report mode. The following report modes are available:

· Always—Reports an alarm and a log every time an indicator value is collected.

· Change—Reports an alarm and a log only if the collected indicator value is different from that collected previously.

· Threshold—Reports an alarm and a log only if the collected indicator value exceeds a threshold.
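The three report modes can be expressed as a single decision function. This is a minimal sketch: the function name and parameters are hypothetical, a single threshold is used for simplicity, and treating the first collected value as a change is an assumption.

```python
def should_report(mode, value, previous=None, threshold=None):
    """Decide whether one collected value triggers an alarm and a log."""
    if mode == "always":
        return True
    if mode == "change":
        # Assumption: with no previous value, the first sample counts as a change.
        return value != previous
    if mode == "threshold":
        if threshold is None:
            raise ValueError("threshold mode requires a threshold")
        return value > threshold
    raise ValueError(f"unknown report mode: {mode!r}")
```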

Procedure

1. Enter system view.

system-view

2. Specify a log and alarm report mode for an indicator.

kpi collect indicator indicator-name collect-type { always | change | threshold }

By default, the log and alarm report mode is customized for each module. To view the report mode for an indicator, execute the display system internal kpi register-status command.

Enabling SNMP notifications for KPI

About this task

With SNMP notifications enabled for KPI, the device sends related information to the SNMP module when an indicator value falls outside or within the alarm threshold range.

For the notifications to be sent correctly, you must also configure SNMP on the device. For more information about SNMP notifications, see "Configuring SNMP."

Procedure

1. Enter system view.

system-view

2. Enable SNMP notifications for KPI.

snmp-agent trap enable kpi

By default, SNMP notifications are enabled for KPI.

Display and maintenance commands for KPI data collection

Execute display commands in any view.

Task	Command
Display the KPI data for an object of a module on the remote device.	display external-kpi data [ device-ip ip-address [ module module-name [ object object-name ] ] ]
Display KPI data collection information for a module.	display kpi module-info [ module-name ] [ verbose ]
Display the KPI data for an object of a module within a time range on the storage media.	display kpi data module module-name object object-name from time1 date1 to time2 date2 [ file file-path ]
Display the running status and registration information of KPI data collection for a module.	display system internal kpi register-status [ module module-name ]
Display the running status of KPI data collection for a module.	display system internal kpi status [ module module-name ]
