Configuring KPI data collection
About KPI data collection
The key performance indicators (KPIs) of the device are a set of performance values that indicate the device's running status at a certain moment. During operation, the device automatically collects KPI data and stores the KPI data in the flash.
The KPI data collection feature periodically collects various types of KPI data and records the KPI data in real time. Based on the collected KPI data, you can understand the device running status, service failure time, service failure type, and possible failure causes and quickly troubleshoot the issues.
Basic concepts
The KPI data collection feature can collect a vast quantity and variety of data. For example, the collected CPU usage for a card is a performance parameter belonging to the Device-resource class. The CPU usage is a card-specific parameter belonging to the DEV-RES module, and its value is, for example, 50%. To make all types of data easy to describe, categorize, and retrieve, KPI data is defined along the following dimensions:
· Indicator—Performance parameters and state collected by the KPI data collection feature, such as the CPU usage, memory usage, FIB table usage, ARP table usage, card failures, power supply failures, and abnormal card temperature.
· Object—A physical or logical entity to which an indicator belongs, such as device, card, and subcard. As the KPI data collection feature can collect more and more indicators, the object types will also become more diverse. The object name varies by object type. For external indicators, the object type and object name vary by module and device model. For more information about object names, see "Available external indicators for KPI data collection." Object name examples are as follows:
¡ device—Specifies a device. Indicators for this object describe the overall condition of the device.
¡ chassis.x/slot.y—Specifies a card. Indicators for this object describe the performance and state of the card. The value for the x argument is 0, and the y argument represents the slot number of the card.
¡ chassis.x/slot.y/subslot.z—Specifies a subcard. Indicators for this object describe the performance and state of the subcard. The value for the x argument is 0, the y argument represents the slot number of the card, and the z argument represents the subcard ID.
¡ interface-type interface-number—Specifies an interface by its type and number. Indicators for this object describe the running status of the physical interface.
· Module—Module to which an indicator belongs. For example, the total number of BFD sessions belongs to the BFD module (module name BFD).
· Class—A collection of indicators of a certain type. Indicators that reflect the running status of a particular aspect of the device can be grouped into a class. The system has predefined some classes, such as the network performance (Net-performance) class and port state (Port-state) class.
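The four dimensions can be pictured as the fields of a single record. The following Python sketch is illustrative only: the `KpiSample` class and the `parse_object_name` helper are hypothetical names, not part of the device software, and the parser handles only the device, card, and subcard name forms listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KpiSample:
    """One collected value, described by the four KPI dimensions."""
    kpi_class: str   # for example, "Device-resource"
    module: str      # for example, "DEV-RES"
    obj: str         # for example, "chassis.0/slot.3"
    indicator: str   # for example, "CPU usage"
    value: float


def parse_object_name(name: str) -> dict:
    """Split an object name such as 'chassis.0/slot.3/subslot.1' into parts.

    Returns {'type': 'device'} for the whole device. Otherwise returns the
    numeric chassis, slot, and (if present) subslot identifiers plus a
    'type' key. Other name forms raise ValueError.
    """
    if name == "device":
        return {"type": "device"}
    parts = {}
    for segment in name.split("/"):
        key, _, number = segment.partition(".")
        parts[key] = int(number)  # raises ValueError for non-numeric parts
    if "subslot" in parts:
        parts["type"] = "subcard"
    elif "slot" in parts:
        parts["type"] = "card"
    else:
        raise ValueError(f"unsupported object name: {name!r}")
    return parts
```

Keeping the object name as a plain string in the record and parsing it only when needed mirrors the way object names vary by object type.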
Operating mechanism
The device enabled with KPI data collection works as follows:
1. Collects KPI data. With KPI data collection enabled, modules push the collected data for indicators to the KPI process at intervals. By default, the KPI data collection intervals for external indicators are customized for modules.
The KPI process temporarily saves the obtained KPI data in the device memory.
2. Stores KPI data. The KPI process stores the obtained KPI data in the flash at intervals. When the remaining storage media space is insufficient or the total KPI file size exceeds the threshold, the KPI process automatically deletes the earliest KPI files to release some space.
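The collect-then-store flow above can be sketched as a small in-memory buffer that is flushed on a timer. This is a simplified Python model, not the device implementation: the `KpiProcess` class, its method names, and the list standing in for flash storage are all assumptions made for illustration.

```python
class KpiProcess:
    """Toy model: modules push samples, the process buffers then stores them."""

    def __init__(self, save_interval_min: int = 1440):
        self.save_interval_min = save_interval_min  # minutes between saves
        self.buffer = []    # stands in for device memory
        self.stored = []    # stands in for KPI files on the storage media
        self.last_save = 0  # time of the last save, in minutes

    def push(self, module: str, indicator: str, value: float, now: int) -> None:
        """Called by a module at its own collection interval."""
        self.buffer.append((now, module, indicator, value))
        if now - self.last_save >= self.save_interval_min:
            self.flush(now)

    def flush(self, now: int) -> None:
        """Move everything buffered in memory to storage."""
        self.stored.extend(self.buffer)
        self.buffer.clear()
        self.last_save = now
```

In this model a save is triggered by the first push that arrives after the save interval has elapsed. A real process would more likely use its own timer.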
Available indicators for KPI data collection
Available external indicators for KPI data collection
Table 1 lists some available external indicators for KPI data collection.
Table 1 Available external indicators for KPI data collection
Class | Module | Object | Indicator | Indicator description |
---|---|---|---|---|
Net-performance | IF-USAGE-EX | Interface. The value is interface-type interface-number. | Port_TXBW_Usage | Output bandwidth usage of the interface, in the range of 0% to 100%. |
Net-performance | IF-USAGE-EX | Interface. The value is interface-type interface-number. | Port_RXBW_Usage | Input bandwidth usage of the interface, in the range of 0% to 100%. |
Device-resource | ARP-MSG-QUEUE | Custom message queue. The value is message-queue. | ARP_PKTQ_Health | ARP packet queue health, which is the ratio of the number of messages in the queue to the total queue size. The value range is 0% to 100%. |
Device-resource | ARP-MSG-QUEUE | Custom message queue. The value is message-queue. | ARP_EVTQ_Health | ARP event queue health, which is the ratio of the number of messages in the queue to the total queue size. The value range is 0% to 100%. |
Device-resource | ND-MSG-QUEUE | Custom message queue. The value is message-queue. | ND_PKTQ_Health | ND packet queue health, which is the ratio of the number of messages in the queue to the total queue size. The value range is 0% to 100%. |
Device-resource | ND-MSG-QUEUE | Custom message queue. The value is message-queue. | ND_EVTQ_Health | ND event queue health, which is the ratio of the number of messages in the queue to the total queue size. The value range is 0% to 100%. |
Device-resource | MAC-USAGE | Card. The value is chassis.x/slot.y. | MAC_Addr_Useage | MAC address usage of the card, in the range of 0% to 100%. |
Device-resource | AGG-USAGE | Device. The value is device. | AGG_ID_Useage | Aggregate interface ID resource usage on the device, in the range of 0% to 100%. |
Device-resource | ACL_USE | Chip. The value is chassis.x/slot.y/chip.z. | ACL_USE_IPV4_RATIO | IPv4 ACL entry resource usage, in the range of 0% to 100%. |
Device-resource | ACL_USE | Chip. The value is chassis.x/slot.y/chip.z. | ACL_USE_IPV6_RATIO | IPv6 ACL entry resource usage, in the range of 0% to 100%. |
Device-resource | CGN | Card. The value is chassis.x/slot.y. | CGN_SESSION_USAGE | CGN session resource usage, in the range of 0% to 100%. |
Device-resource | CGN | Card. The value is chassis.x/slot.y. | CGN_FWD_RX_USAGE | Input bandwidth usage of the CGN card, in the range of 0% to 100%. |
Device-resource | CGN | Card. The value is chassis.x/slot.y. | CGN_FWD_TX_USAGE | Output bandwidth usage of the CGN card, in the range of 0% to 100%. |
Device-resource | NAT_ADDRGRP_RES | NAT address group usage. The value is group-name. | NAT_ADDRGRP_CUR_RES_USAGE | Resource usage of the address group, in the range of 0% to 100%. |
Device-resource | NAT_IPPOOL_RES | Address pool. The value is pool-name. | NAT_IPPOOL_CUR_RES_USAGE | Address pool resource usage, in the range of 0% to 100%. |
Device-resource | NAT_QUEUE | Queue. The value is Nat_cgn_send_queue. | NAT_QUEUE_LEN | Queue length. |
Device-resource | CGN_QUEUE | Queue. The value is CGN_SEND_QUEUE. | CGN_QUEUE_LEN | Queue length. |
Device-resource | SOFTQUEINFO | Custom message queue. The value is chassis.x/slot.y/RQ.u. | SOFTQUE_PACKET | Number of packets in the software queue. |
Device-resource | SOFTQUEINFO | Custom message queue. The value is chassis.x/slot.y/RQ.u. | SOFTQUE_DROP | Number of dropped packets in the software queue. |
Device-state | BFD | Device. The value is device. | BFD_TOTAL_NUMBER | Total number of BFD sessions. |
Device-state | VRRP-V4 | Device. The value is device. | VRRP_FAIL_STATE_RATIO_V4 | Ratio of abnormal VRRPv4 sessions to total VRRPv4 sessions. The value is a decimal in the range of 0 to 1. |
Device-state | VRRP-V6 | Device. The value is device. | VRRP_FAIL_STATE_RATIO_V6 | Ratio of abnormal VRRPv6 sessions to total VRRPv6 sessions. The value is a decimal in the range of 0 to 1. |
Device-state | VRRP-V4 | Device. The value is device. | VRRP_STATE_CONVER_V4 | Number of master/backup switchovers in the VRRPv4 group. |
Device-state | VRRP-V6 | Device. The value is device. | VRRP_STATE_CONVER_V6 | Number of master/backup switchovers in the VRRPv6 group. |
Device-state | STRUNK | Subcard. The value is chassis.x/slot.y/subslot.z. | STRUNK_FAIL_STATE_RATIO | Ratio of abnormal S-Trunk sessions to total S-Trunk sessions. The value is a decimal in the range of 0 to 1. |
Device-state | STRUNK | Device. The value is device. | STRUNK_GROUP_STATE_CONVER | Number of S-Trunk group state changes. |
Device-state | STRUNK | Device. The value is device. | STRUNK_MEMBER_STATE_CONVER | Number of S-Trunk member state changes. |
Device-state | ACL_STATE | Card. The value is chassis.x/slot.y. | ACL_COPP_IPV4_USE_STATE | State of IPv4 ACL entry operation in a COPP service cycle: · 0—Normal. · 1—Abnormal. |
Device-state | ACL_STATE | Card. The value is chassis.x/slot.y. | ACL_COPP_IPV6_USE_STATE | State of IPv6 ACL entry operation in a COPP service cycle: · 0—Normal. · 1—Abnormal. |
Device-state | ACL_STATE | Card. The value is chassis.x/slot.y. | ACL_ATTACK_USE_STATE | State of ACL entry operation in an anti-attack service cycle: · 0—Normal. · 1—Abnormal. |
Device-state | ARP_ND | Card. The value is chassis.x/slot.y. | ARP_USE_NUM | Number of ARP entry deployments within a collection interval. |
Device-state | FIB | Card. The value is chassis.x/slot.y. | FIB_USE_NUM | Number of FIB entry deployments within a collection interval. |
Device-state | IBC_CHAN | Device. The value is device. | IBC_GE_STATE | GE channel link state: · 0—Normal. · 1—Abnormal. |
Device-state | IBC_CHAN | Device. The value is device. | IBC_FE_STATE | FE channel link state: · 0—Normal. · 1—Abnormal. |
Device-state | ARP_ND | Card. The value is chassis.x/slot.y. | ND_USE_NUM | Number of ND entry deployments within a collection interval. |
Device-state | ARP_ND | Card. The value is chassis.x/slot.y. | ARP_USE_STATE | Whether ARP entry deployment to the driver is abnormal within a collection interval. |
Device-state | ARP_ND | Card. The value is chassis.x/slot.y. | ND_USE_STATE | Whether ND entry deployment to the driver is abnormal within a collection interval. |
Device-state | FIB | Card. The value is chassis.x/slot.y. | FIB6_USE_NUM | Number of IPv6 FIB entry deployments within a collection interval. |
Device-state | FIB | Card. The value is chassis.x/slot.y. | FIB4_USE_STATE | Whether IPv4 FIB entry deployment to the driver is abnormal within a collection interval. |
Device-state | FIB | Card. The value is chassis.x/slot.y. | FIB6_USE_STATE | Whether IPv6 FIB entry deployment to the driver is abnormal within a collection interval. |
Device-state | VSRP_INSTANCE | VSRP instance. The value is instance-name. | VSRP_INSTANCE_STATE | VSRP instance state: · Master. · Backup. · Down. |
Device-state | VSRP_PEER | VSRP peer. The value is peer-name. | VSRP_PEER_STATE | VSRP peer state: · 0—Connected. · 1—Not connected. |
Device-state | BFD | Device. The value is device. | BFD_NORMAL_NUMBER | Number of normal BFD sessions. |
Device-state | PROTORATE | Protocol type: · 1588_FD. · 1588_FE. · 1588_FF. · 8021x_B. | PROTO_RATE | Rate of protocol packets sent to the operating system, in pps. |
Restrictions and guidelines
By default, KPI data collection is enabled for all modules that support this feature on the device.
To prevent a large amount of collected data from affecting normal services, the KPI data collection feature is suppressed when the device memory usage or CPU usage reaches the alarm threshold. While the feature is suppressed, the KPI process stops collecting data. For detailed information about the alarm thresholds for memory usage and CPU usage, see device management configuration in Fundamentals Configuration Guide.
KPI data collection tasks at a glance
To configure KPI data collection, perform the following tasks:
· (Optional.) Configuring KPI data storage
· (Optional.) Configuring KPI file aging
· (Optional.) Copying the KPI data on the standby MPU to the active MPU
· (Optional.) Disabling KPI data collection for a module
· (Optional.) Specifying the KPI data collection interval for a module
· (Optional.) Enabling alarms for a KPI data collection indicator
· (Optional.) Configuring alarm thresholds for an indicator
· (Optional.) Specifying a log and alarm report mode for an indicator
· (Optional.) Enabling SNMP notifications for KPI
Configuring KPI data storage
About this task
The KPI files in the memory are saved to the storage media at intervals. Use this feature to edit the KPI file directory and the interval for saving KPI files to the storage media.
Procedure
1. Enter system view.
system-view
2. Specify the interval for saving KPI files to the storage media.
kpi file save-interval interval
By default, KPI files are saved to the storage media at an interval of 1440 minutes.
3. Specify the KPI file directory.
kpi file directory dir-name
By default, KPI files are saved in the flash:/kpi directory.
Configuring KPI file aging
About this task
When the free storage media space is insufficient or the total KPI file size exceeds the threshold, the KPI process automatically deletes the earliest KPI files to release some space. Use this feature to edit the free storage media capacity threshold and the KPI file size threshold for triggering KPI file aging.
Procedure
1. Enter system view.
system-view
2. Specify the free storage media capacity threshold for triggering KPI file aging.
kpi file aging threshold remain-disk-size size
By default, the free storage media capacity threshold for triggering KPI file aging is 128 MB.
3. Specify the KPI file size threshold for triggering KPI file aging.
kpi file aging threshold total-file-size size
By default, the KPI file size threshold for triggering KPI file aging is 128 MB.
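The two aging triggers can be expressed as one check that runs oldest file first. The Python sketch below is an illustration under stated assumptions: the `files_to_age` function and its tuple layout are hypothetical, and deleting files until both conditions are satisfied is an assumption, because the text above says only that the earliest files are deleted to release some space. The 128 MB defaults are taken from the procedure.

```python
def files_to_age(files, free_mb, min_free_mb=128, max_total_mb=128):
    """Return the names of the earliest KPI files to delete.

    files: list of (name, created_timestamp, size_mb) tuples.
    Deletion proceeds oldest first and stops as soon as free space is at
    least min_free_mb and the total KPI file size is at most max_total_mb.
    """
    total_mb = sum(size for _, _, size in files)
    doomed = []
    for name, _, size in sorted(files, key=lambda f: f[1]):
        if free_mb >= min_free_mb and total_mb <= max_total_mb:
            break
        doomed.append(name)
        free_mb += size    # deleting a file frees its space
        total_mb -= size   # and shrinks the KPI file total
    return doomed
```

For example, with three 50 MB files and ample free space, only the oldest file is selected, because removing it brings the total from 150 MB down to 100 MB.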
Copying the KPI data on the standby MPU to the active MPU
About this task
After an MPU active/standby switchover on the device, the new active MPU cannot automatically obtain the KPI data from the old active MPU (current standby MPU). To ensure service continuity, you must use this feature to copy the KPI data on the old active MPU to the new active MPU.
IMPORTANT: If the administrator edits the KPI file directory by using the kpi file directory command before the active/standby switchover, the original active MPU will have two KPI file directories. After the switchover, this feature enables the system to copy only the KPI data stored in the new directory on the old active MPU to the same directory on the new active MPU. The KPI data files in the old directory cannot be copied to the new active MPU.
Procedure
1. Enter system view.
system-view
2. Copy the KPI data on the standby MPU to the active MPU.
kpi copy-file to active-mpu
The slot slot-number option is supported only on CTRL-VMs.
Disabling KPI data collection for a module
About this task
To prevent data collection from affecting normal services due to a large amount of data, use this feature to disable KPI data collection for some modules when the device memory usage or CPU usage is high.
Procedure
1. Enter system view.
system-view
2. Disable KPI data collection for a module.
undo kpi collect module [ module-name ] enable
By default, KPI data collection is enabled for all modules that support this feature on the device.
Specifying the KPI data collection interval for a module
About this task
You can use this feature to edit the KPI data collection interval for a module.
Procedure
1. Enter system view.
system-view
2. Specify the KPI data collection interval for a module.
kpi module module-name collect-interval collect-interval
By default, the KPI data collection intervals for modules to which external indicators belong are customized.
To view the KPI data collection intervals for modules, execute the display kpi module-info command.
Enabling alarms for a KPI data collection indicator
About this task
After you enable alarms for a KPI data collection indicator, the device reports the collected data to the NMS through SNMP and generates a log for it.
For more information about alarms for indicators, execute the display system internal kpi register-status command.
Procedure
1. Enter system view.
system-view
2. Enable alarms for a KPI data collection indicator.
kpi collect indicator indicator-name alarm-enable
By default, the enabling status of alarms varies by module and device model. To obtain the enabling status of alarms, execute the display system internal kpi register-status command.
Configuring alarm thresholds for an indicator
About this task
The device generates an alarm in the following conditions:
· The device generates a level-1 alarm if the indicator value collected by KPI exceeds the level-1 alarm threshold.
· The device generates a level-2 alarm if the indicator value collected by KPI exceeds the level-2 alarm threshold.
· The device generates a level-3 alarm if the indicator value collected by KPI exceeds the level-3 alarm threshold.
· The device generates a low value alarm if the indicator value collected by KPI drops below the low alarm threshold.
The device generates an alarm clearance notification in the following conditions:
· The device generates a level-3 alarm clearance notification if the indicator value collected by KPI drops below the level-3 alarm threshold.
· The device generates a level-2 alarm clearance notification if the indicator value collected by KPI drops below the level-2 alarm threshold.
· The device generates a level-1 alarm clearance notification if the indicator value collected by KPI drops below the level-1 alarm threshold.
· The device generates a low value alarm clearance notification if the indicator value collected by KPI exceeds the low alarm threshold but does not exceed the level-1 alarm threshold.
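The alarm conditions above amount to comparing each collected value against an ordered set of thresholds. The following Python sketch is a reading of those rules, not the device logic: the `alarm_level` function is hypothetical, it assumes low < level-1 < level-2 < level-3, and it treats a value equal to a threshold as not exceeding it. The normal threshold accepted by the command is not described in this section, so it is not modeled.

```python
def alarm_level(value, low, l1, l2, l3):
    """Map a collected indicator value to an alarm state.

    Assumes low < l1 < l2 < l3. Returns 'low', 'normal', 'level-1',
    'level-2', or 'level-3'; the highest exceeded threshold wins.
    """
    if value > l3:
        return "level-3"
    if value > l2:
        return "level-2"
    if value > l1:
        return "level-1"
    if value < low:
        return "low"
    return "normal"
```

Comparing the state of two consecutive samples then indicates whether an alarm or a clearance notification applies. For example, a move from 'level-2' to 'level-1' corresponds to a level-2 alarm clearance.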
Procedure
1. Enter system view.
system-view
2. Configure alarm thresholds for an indicator.
kpi collect [ percentage ] indicator indicator-name threshold low low-threshold normal normal-threshold l1warning l1-threshold l2warning l2-threshold l3warning l3-threshold
By default, the alarm thresholds for an indicator are customized for each module. To view the alarm thresholds, execute the display system internal kpi register-status command.
Specifying a log and alarm report mode for an indicator
About this task
Whether a trap and a log are generated and reported for an indicator varies by module and report mode. The following report modes are available:
· Always—Reports an alarm and a log every time an indicator value is collected.
· Change—Reports an alarm and a log only if the collected indicator value is different from that collected previously.
· Threshold—Reports an alarm and a log only if the collected indicator value exceeds a threshold.
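The three report modes differ only in the condition that triggers a report. The Python sketch below restates them as one predicate. The `should_report` function and its parameters are hypothetical. Treating the first sample in change mode as a change is an assumption, as is comparing against a single threshold in threshold mode.

```python
def should_report(mode, current, previous=None, threshold=None):
    """Decide whether a collected value triggers an alarm and a log.

    mode: 'always', 'change', or 'threshold'.
    previous: the value collected last time, or None for the first sample.
    threshold: the alarm threshold used in 'threshold' mode.
    """
    if mode == "always":
        return True
    if mode == "change":
        return current != previous
    if mode == "threshold":
        return threshold is not None and current > threshold
    raise ValueError(f"unknown report mode: {mode!r}")
```

The always mode produces the most traffic toward the NMS, so the change and threshold modes are the natural choice for indicators that are collected frequently.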
Procedure
1. Enter system view.
system-view
2. Specify a log and alarm report mode for an indicator.
kpi collect indicator indicator-name collect-type { always | change | threshold }
By default, the log and alarm report mode is customized for each module. To view the report mode for an indicator, execute the display system internal kpi register-status command.
Enabling SNMP notifications for KPI
About this task
With SNMP notifications enabled for KPI, the device sends related information to the SNMP module when an indicator value falls outside or within the alarm threshold range.
For the notifications to be sent correctly, you must also configure SNMP on the device. For more information about SNMP notifications, see "Configuring SNMP."
Procedure
1. Enter system view.
system-view
2. Enable SNMP notifications for KPI.
snmp-agent trap enable kpi
By default, SNMP notifications are enabled for KPI.
Display and maintenance commands for KPI data collection
Execute display commands in any view.
Task | Command |
---|---|
Display the KPI data for an object of a module on the remote device. | display external-kpi data [ device-ip ip-address [ module module-name [ object object-name ] ] ] |
Display KPI data collection information for a module. | display kpi module-info [ module-name ] [ verbose ] |
Display the KPI data for an object of a module within a time range on the storage media. | display kpi data module module-name object object-name from time1 date1 to time2 date2 [ file file-path ] |
Display the running status and registration information of KPI data collection for a module. | display system internal kpi register-status [ module module-name ] |
Display the running status of KPI data collection for a module. | display system internal kpi status [ module module-name ] |