Troubleshooting network management and monitoring
Troubleshooting NQA
ICMP echo operation failure
Symptom
1. Execute the display nqa { history | result | statistics } command on the device. The following conditions indicate an operation failure:
¡ If you execute the display nqa history command, the Status field value in the output is not Succeeded.
¡ If you execute the display nqa result or display nqa statistics command, the Extended results field value in the output is not 0.
2. If an application module (such as Track) is associated with a failed NQA operation, the application module takes corresponding actions. For example, the state of the track entry will change from Positive to Negative or NotReady.
Common causes
1. If the Status field value in the operation result is Internal error or Unknown error, the common causes are as follows:
¡ The device does not have a route or ARP entry destined for the destination address of the operation.
¡ Insufficient device memory.
¡ Other internal reasons.
2. If the Status field value in the operation result is Timeout, the common causes are as follows:
¡ Network errors:
- Probe packets are mistakenly identified as attack packets and dropped by security devices.
- The network time frequently changes.
- Transmission of probe packets fails and cyclic redundancy check (CRC) errors occur on the interface.
- Probe packets are lost on the packet transmission path due to other reasons.
¡ Configuration errors:
- The network is complicated. Too many devices exist between the source and the destination of the operation. The default TTL value, which is 20, cannot meet the requirements.
- Probe packets are excessively large, causing too many fragments and processing timeout.
- The output interface and next hop are configured incorrectly.
- The source address is configured incorrectly.
- The probe timeout time is too small.
Troubleshooting flow
Figure 1 shows the troubleshooting flowchart.
Figure 1 Flowchart for troubleshooting ICMP echo operation failure
Solution
1. Execute the display nqa { history | result | statistics } command to view the NQA operation results. Identify the failed NQA operation, its execution time, and the failure type.
¡ If the value of the Status field in the output of the display nqa history command is not Succeeded, the NQA operation failed.
<Sysname> display nqa history admin test
NQA entry (admin admin, tag test) history records:
Index Response Status Time
10 500 Timeout 2023-03-12 17:03:01.6
9 500 Timeout 2023-03-12 17:03:01.1
...
Available values for the Status field include:
- Succeeded.
- Internal error. (This state does not trigger the Track state change in NQA-Track collaboration.)
- Unknown error. (This state triggers the Track state change in NQA-Track collaboration.)
- Timeout. (This state triggers the Track state change in NQA-Track collaboration.)
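The status check above can also be scripted when many history records must be reviewed. The following Python snippet is a minimal sketch, not a vendor tool: it assumes that the history records follow the four-column layout shown in the sample output, and the names parse_history and TRACK_TRIGGERING are hypothetical.

```python
import re

# Failure statuses that, per the list above, trigger a Track state change.
# "Internal error" is a failure but does not trigger one.
TRACK_TRIGGERING = {"Unknown error", "Timeout"}

# One history record: index, response time, status text, timestamp.
# Assumption: the layout matches the sample output in this section.
RECORD = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(.+?)\s+"
    r"(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d+)\s*$"
)


def parse_history(output):
    """Return one dict per history record found in the command output."""
    records = []
    for line in output.splitlines():
        match = RECORD.match(line)
        if not match:
            continue  # skip the banner and the column header
        index, response, status, time = match.groups()
        records.append({
            "index": int(index),
            "response": int(response),
            "status": status,
            "time": time,
            "failed": status != "Succeeded",
            "triggers_track": status in TRACK_TRIGGERING,
        })
    return records


SAMPLE = """\
NQA entry (admin admin, tag test) history records:
Index Response Status Time
10 500 Timeout 2023-03-12 17:03:01.6
9 500 Timeout 2023-03-12 17:03:01.1
"""

if __name__ == "__main__":
    for rec in parse_history(SAMPLE):
        print(rec["index"], rec["status"], rec["failed"], rec["triggers_track"])
```

Each returned record carries both the failure flag and the Track flag, so a script can separate failures that change the Track state from those that do not.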
¡ If the values in the Extended results area in the output of the display nqa result command are not 0, the most recent NQA operation failed.
<Sysname> display nqa result admin test
NQA entry (admin admin, tag test) test results:
Send operation times: 1 Receive response times: 1
Min/Max/Average round trip time: 35/35/35
Square-Sum of round trip time: 1225
Last succeeded probe time: 2023-03-12 10:50:33.2
Extended results:
Packet loss ratio: 0%
Failures due to timeout: 0
Failures due to disconnect: 0
Failures due to no connection: 0
Failures due to internal error: 0
Failures due to other errors: 0
¡ If the values in the Extended results area in the output of the display nqa statistics command are not 0, a performed NQA operation failed.
<Sysname> display nqa statistics admin test
NQA entry (admin admin, tag test) test statistics:
NO. : 1
Start time: 2023-03-12 09:30:20.0
Life time: 2 seconds
Send operation times: 1 Receive response times: 1
Min/Max/Average round trip time: 13/13/13
Square-Sum of round trip time: 169
Extended results:
Packet loss ratio: 0%
Failures due to timeout: 0
Failures due to disconnect: 0
Failures due to no connection: 0
Failures due to internal error: 0
Failures due to other errors: 0
¡ If the time displayed in the output of the display nqa { history | result | statistics } command is not as expected, the NQA operation you configured might not have started. Execute the nqa schedule command to start the NQA operation.
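The Extended results check in this step can be sketched in Python as follows. This is an illustrative helper only. It assumes that the counters appear one per line as in the sample output and that the input is the output for a single result or a single statistics group. The name nonzero_extended_results is hypothetical.

```python
import re

# Counter lines in the Extended results area, as shown in the sample output.
COUNTER = re.compile(
    r"^\s*(Packet loss ratio|Failures due to [^:]+):\s*(\d+)%?\s*$"
)


def nonzero_extended_results(output):
    """Return {counter name: value} for every counter that is not 0."""
    nonzero = {}
    for line in output.splitlines():
        match = COUNTER.match(line)
        if match and int(match.group(2)) != 0:
            nonzero[match.group(1)] = int(match.group(2))
    return nonzero


SAMPLE = """\
Extended results:
Packet loss ratio: 0%
Failures due to timeout: 0
Failures due to disconnect: 0
Failures due to no connection: 0
Failures due to internal error: 0
Failures due to other errors: 0
"""

if __name__ == "__main__":
    failures = nonzero_extended_results(SAMPLE)
    print("operation failed" if failures else "no failure recorded", failures)
```

An empty dictionary means that every counter is 0. Any other return value names the failure types to investigate.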
2. To resolve an NQA operation failure due to internal errors or unknown errors, perform the following tasks:
a. Identify the destination address of the NQA operation.
Execute the display current-configuration [ configuration nqa ] command to view NQA configuration. The destination ip or destination ipv6 field in the output displays the destination address of the NQA operation. If the destination address is incorrect, execute the undo nqa schedule command in system view to stop the NQA operation. Then, execute the destination ip or destination ipv6 command in NQA operation view to edit the destination address before you restart the operation.
b. Execute the ping command to ping the destination address of the NQA operation. If the address cannot be pinged, resolve the unreachability issue. If there is a data link to the destination address but the device has no routes to the address in the routing table, execute the out interface or next-hop ip command in ICMP echo operation view. With either command executed, NQA will skip the routing table lookup and directly encapsulate the NQA probe packets with the specified IP address.
c. Identify whether the NQA operation failure is caused by device memory insufficiency.
Execute the display memory-threshold command to view information about the memory alarm threshold. If the value of the Current free-memory state field is Minor (level-1 alarm threshold state), Severe (level-2 alarm threshold state), or Critical (level-3 alarm threshold state), it indicates that the device memory is insufficient. Resolve the device memory insufficiency.
3. To resolve an NQA operation failure due to timeout issues, perform the following tasks:
a. Identify the destination address of the NQA operation.
Execute the display current-configuration [ configuration nqa ] command to view NQA configuration. The destination ip or destination ipv6 field in the output displays the destination address of the NQA operation. If the destination address is incorrect, execute the undo nqa schedule command in system view to stop the NQA operation. Then, execute the destination ip or destination ipv6 command in NQA operation view to edit the destination address before you restart the operation.
b. Execute the ping command to ping the destination address of the NQA operation. If the IP address cannot be pinged, resolve the unreachability issue. If there is a data link to the destination address but the device has no routes to the address in the routing table, execute the out interface or next-hop ip command in ICMP echo operation view. With either command executed, NQA will skip the routing table lookup and directly encapsulate the NQA probe packets with the specified IP address.
c. If the destination address can be pinged but packet loss randomly occurs, execute the display nqa statistics command to view the value of the Max round trip time field. Identify whether the value is close to the value configured by the probe timeout command in ICMP echo operation view.
- If they are close, it indicates that the link delay is relatively high and the probe timeout time is too small. Execute the probe timeout command to set a probe timeout time that is greater than the maximum round-trip time.
- If they are not close, the random packet loss issue exists on the link. Identify whether CRC errors occur on the input interface of the probe responses by executing the display interface command. The value of the CRC field in the Input area of the command output represents the number of inbound packets that contained CRC errors. If this number continues to grow rapidly, a component on the transmission path might be faulty. Further troubleshoot the issue.
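The comparison in this step can be sketched as a small decision helper. The guide does not define how close the two values must be, so the closeness ratio of 0.8 is an assumption chosen for illustration, and the function name timeout_diagnosis is hypothetical.

```python
def timeout_diagnosis(max_rtt_ms, probe_timeout_ms, closeness=0.8):
    """Suggest the next action for random probe loss.

    max_rtt_ms:       Max round trip time from display nqa statistics.
    probe_timeout_ms: value configured by the probe timeout command.
    closeness:        assumed ratio above which the two values count as
                      "close" (not defined by the guide).
    """
    if probe_timeout_ms <= 0:
        raise ValueError("probe_timeout_ms must be positive")
    if max_rtt_ms >= closeness * probe_timeout_ms:
        # High link delay: the timeout is too small for this path.
        return "raise-probe-timeout"
    # Delay is well below the timeout: look for loss on the link.
    return "check-link-for-crc-errors"


if __name__ == "__main__":
    print(timeout_diagnosis(2900, 3000))
    print(timeout_diagnosis(35, 3000))
```

Tune the closeness ratio to the jitter of the probed path. A path with large delay variation needs more headroom between the maximum round-trip time and the probe timeout time.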
d. Execute the ping command to ping the destination address of the NQA operation with the same settings configured for the operation. Identify whether the NQA operation failed due to incorrect parameter configuration. The settings include the packet size, output interface, next hop, source address, and initial TTL value.
- If the destination address can be pinged with these settings, the parameter configuration is not the cause. The security devices on the probed path might have filtered the NQA probe packets. Further troubleshoot the issue.
- If the destination address cannot be pinged with these settings, edit the settings for probe packets as follows. If the destination address can be pinged after the change, the NQA operation failure was caused by incorrect parameter configuration.
The default TTL value is 20 for ICMP echo operations. If more than 20 devices exist between the source and destination on the network, execute the ttl command in ICMP echo operation view to set a greater TTL value.
If the probe packets are too large, causing excessive fragments and processing timeout, execute the data-size command in ICMP echo operation view to set a smaller payload size for each probe packet.
If the output interface and next hop are configured incorrectly, execute the out interface and next-hop ip commands in ICMP echo operation view to edit the output interface and next hop.
If the source address is configured incorrectly, execute the source ip or source ipv6 command in ICMP echo operation view to edit the source address for probe packets. ICMP echo operations do not support source port configuration. For operations that support source port configuration, also execute the source port command to edit the source port.
If the probe timeout time is too small, execute the probe timeout command in ICMP echo operation view to set a greater probe timeout time.
- If the destination address still cannot be pinged, identify the cause of packet loss through methods such as packet capture, traffic measurement, and debugging.
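Two of the parameter checks in this step, TTL and packet size, can be estimated before you edit the configuration. The following Python sketch assumes IPv4 probes without IP options, a path MTU of 1500 bytes, and that the data-size value is the ICMP payload size. The function name check_probe_settings and its parameters are hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def check_probe_settings(ttl, hops, data_size, mtu=1500):
    """Estimate whether the TTL and payload size suit the probed path.

    ttl:       value set by the ttl command.
    hops:      number of devices between source and destination.
    data_size: payload size set by the data-size command, in bytes.
    mtu:       assumed path MTU in bytes.
    """
    # Each IPv4 fragment carries a multiple of 8 bytes of data.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    fragments = math.ceil((data_size + ICMP_HEADER) / per_fragment)
    return {
        "ttl_too_small": hops > ttl,
        "fragments": fragments,
        "fragmented": fragments > 1,
    }


if __name__ == "__main__":
    print(check_probe_settings(ttl=20, hops=25, data_size=100))
    print(check_probe_settings(ttl=20, hops=5, data_size=8000))
```

A large fragment count suggests reducing the payload size. A True value for ttl_too_small suggests raising the TTL value.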
4. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module name: HH3C-NQA-MIB
· hh3cNqaProbeFailure (1.3.6.1.4.1.25506.8.3.3.3)
· hh3cNqaProbeTimeAboveThreshold (1.3.6.1.4.1.25506.8.3.3.10)
· hh3cNqaProbeTimeBelowThreshold (1.3.6.1.4.1.25506.8.3.3.11)
· hh3cNqaProbeFailAboveThreshold (1.3.6.1.4.1.25506.8.3.3.12)
· hh3cNqaProbeFailBelowThreshold (1.3.6.1.4.1.25506.8.3.3.13)
· hh3cNqaTestFailure (1.3.6.1.4.1.25506.8.3.3.16)
Log messages
· NQA/6/NQA_LOG_UNREACHABLE
· NQA/6/NQA_PACKET_OVERSIZE
· NQA/4/NQA_SCHEDULE_FAILURE
· NQA/4/NQA_SEVER_FAILURE
· NQA/6/NQA_START_FAILURE
TWAMP Light test failure
Symptom
The device, acting as the TWAMP Light sender, starts a TWAMP Light test to the destination end. The TWAMP Light test fails when one of the following conditions occurs:
· The status of the TWAMP Light test is abnormal.
Execute the display nqa twamp-light client command on the device. If the Status field value in the output is Inactive, the TWAMP Light test has not started and is considered a failure.
· The TWAMP Light test result contains an anomaly.
Execute the display nqa twamp-light client statistics two-way-loss test-session command on the device. If the Loss count field value is not 0, it indicates that TWAMP Light test packet loss occurs on the network. If the Error count field value is not 0, it indicates that the device receives error TWAMP Light test packets. When the number of lost packets or error packets exceeds the threshold allowed for user services, the TWAMP Light test is considered a failure.
Common causes
· The following are the common causes of test status anomaly:
¡ On a Layer 3 VPN network, the VPN is deleted.
¡ On a Layer 2 VPN network, the source AC state changes to down.
¡ An interface card is removed and the interface specified by the source interface command does not exist.
· The following are the common causes of test result anomaly:
¡ Packet loss occurs.
- Settings on the TWAMP Light sender and responder do not match.
- The device cannot communicate with the destination address of the test. The destination address cannot be pinged or packet loss occurs during the ping operation.
- CRC errors occur on the interface.
¡ Packet error occurs.
- The timeout time specified by the timeout keyword in the start command is too small. After you execute the start command to start the TWAMP Light test, the reflected packet reaches the device after the timeout timer expires. The device takes the reflected packet as an error packet.
- The content of the test packets contains fields that do not comply with the protocol requirements.
- Packet encapsulation fails.
Troubleshooting flow
Figure 2 shows the troubleshooting flowchart.
Figure 2 Flowchart for troubleshooting TWAMP Light test failure
Solution
1. Collect the status and results of the failed TWAMP Light test.
Execute the display nqa twamp-light client and display nqa twamp-light client statistics two-way-loss test-session commands on the device. Identify the failed TWAMP Light test and collect its status and results.
¡ Execute the display nqa twamp-light client command on the device. If the Status field value in the output is Inactive, it indicates that the status of the TWAMP Light test is abnormal.
<Sysname> display nqa twamp-light client
Brief information about all test sessions:
Total sessions: 1
Active sessions: 1
-----------------------------------------------------------------------------
ID Status Source IP/Port Destination IP/Port
1 Active 1.1.1.1/10000 1.1.1.2/20000
¡ Execute the display nqa twamp-light client statistics two-way-loss test-session command on the device. If the Loss count field value is not 0, it indicates that TWAMP Light test packet loss has occurred on the network. If the Error count field value is not 0, it indicates that the device receives error TWAMP Light test packets.
<Sysname> display nqa twamp-light client statistics two-way-loss test-session 1
Latest two-way loss statistics:
Index Loss count Loss ratio Error count Error ratio
1 200 100.0000% 0 0.0000%
2 200 100.0000% 0 0.0000%
3 200 100.0000% 0 0.0000%
4 200 100.0000% 0 0.0000%
5 200 100.0000% 0 0.0000%
--------------------------------------------------------------------------------
Average loss count : 200 Average loss ratio : 100.0000%
Maximum loss count : 200 Maximum loss ratio : 100.0000%
Minimum loss count : 200 Minimum loss ratio : 100.0000%
Average error count : 0 Average error ratio : 0.0000%
Maximum error count : 0 Maximum error ratio : 0.0000%
Minimum error count : 0 Minimum error ratio : 0.0000%
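The loss and error check on this output can be sketched in Python as follows. The sketch assumes the five-column row layout shown in the sample output. The thresholds stand for the limits allowed for user services, which the guide does not specify, so their default of 0 is an assumption. The names parse_loss_rows and exceeds_thresholds are hypothetical.

```python
import re

# One statistics row: index, loss count, loss ratio, error count, error ratio.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([\d.]+)%\s+(\d+)\s+([\d.]+)%\s*$"
)


def parse_loss_rows(output):
    """Return one dict per statistics row in the two-way loss output."""
    rows = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match:
            rows.append({
                "index": int(match.group(1)),
                "loss_count": int(match.group(2)),
                "loss_ratio": float(match.group(3)),
                "error_count": int(match.group(4)),
                "error_ratio": float(match.group(5)),
            })
    return rows


def exceeds_thresholds(rows, max_loss_ratio=0.0, max_error_ratio=0.0):
    """Return True if any row exceeds the assumed service thresholds (%)."""
    return any(
        row["loss_ratio"] > max_loss_ratio
        or row["error_ratio"] > max_error_ratio
        for row in rows
    )


SAMPLE = """\
Latest two-way loss statistics:
Index Loss count Loss ratio Error count Error ratio
1 200 100.0000% 0 0.0000%
2 200 100.0000% 0 0.0000%
3 200 100.0000% 0 0.0000%
4 200 100.0000% 0 0.0000%
5 200 100.0000% 0 0.0000%
Average loss count : 200 Average loss ratio : 100.0000%
"""

if __name__ == "__main__":
    rows = parse_loss_rows(SAMPLE)
    print(len(rows), "rows, test failed:", exceeds_thresholds(rows))
```

For the sample output, every row shows a loss ratio of 100%, so the test is treated as failed under any reasonable threshold.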
2. To resolve the test status anomaly, perform the following tasks:
a. If the device has just completed startup or active/standby switchover, or the interface card where the interface (specified in the source interface command) is located has not completed startup, wait for the device state to become stable. Execute the display system stable state command. If the System state field value in the output is Stable, the device is already in a stable state. In this case, identify whether the test status changes to Active.
- If it changes to Active, no further action is required.
- If it does not change to Active, proceed to the next step.
b. If the device is operating stably, identify whether the configuration is complete.
- On a Layer 3 VPN network, execute the display nqa twamp-light client verbose command to view the VPN bound to the TWAMP Light test and execute the display ip vpn-instance command to identify whether the VPN exists. If the bound VPN does not exist, execute the ip vpn-instance command in system view to create a VPN instance.
- On a Layer 2 VPN network, execute the display nqa twamp-light client verbose command to view the source interface. If the Source interface field value is a hyphen (-), execute the source interface command in TWAMP Light client-session view to specify a source AC for test packets, and make sure the specified interface is up.
c. Identify whether the network connection is ready. If a source interface or source AC is specified for the TWAMP Light test, make sure the source interface or source AC is up.
- Execute the display l2vpn pw xconnect-group or display l2vpn forwarding ac command. If the State field (which represents AC state) value is Down, resolve the AC issue.
- Execute the display interface command. If the values for the Current state and Line protocol state fields (which indicate the interface state) are Down, bring up the interface.
3. To resolve the test packet loss issue, perform the following tasks:
a. Identify whether packet loss is due to configuration errors.
Execute the display nqa twamp-light client verbose command on the device and the display nqa twamp-light responder command on the TWAMP Light responder of the test. If the following parameters are specified, the settings on the TWAMP Light sender and responder must be consistent.
- Source IP address. You can edit this parameter on the TWAMP Light sender by using the source ip or source ipv6 command in TWAMP Light client-session view.
- Source port number. You can edit this parameter on the TWAMP Light sender by using the source port command in TWAMP Light client-session view.
- Destination IP address. You can edit this parameter on the TWAMP Light sender by using the destination ip or destination ipv6 command in TWAMP Light client-session view.
- Destination port number. You can edit this parameter on the TWAMP Light sender by using the destination port command in TWAMP Light client-session view.
- VPN instance name. You can edit this parameter on the TWAMP Light sender by using the vpn-instance command in TWAMP Light client-session view.
- VLAN ID. You can edit this parameter on the TWAMP Light sender by using the vlan command in TWAMP Light client-session view.
- Source MAC address. You can edit this parameter on the TWAMP Light sender by using the source mac command in TWAMP Light client-session view.
- Destination MAC address. You can edit this parameter on the TWAMP Light sender by using the destination mac command in TWAMP Light client-session view.
You can edit all the above parameters on the TWAMP Light responder by using the test-session command in TWAMP Light responder view.
If the timestamp format specified on the TWAMP Light sender is NTP and the test packet sending interval is 10 or 100 milliseconds, the device takes it as a configuration conflict. In this case, the TWAMP Light test fails. Execute the start command in TWAMP Light sender view to edit the test packet sending interval, or execute the timestamp-format command in TWAMP Light client-session view to edit the timestamp format.
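The consistency check and the timestamp conflict check in this step can be sketched in Python as follows. The sketch assumes that you have already copied the settings from the command output of both ends into dictionaries with the same field names. It treats a parameter as mismatched if either end specifies it and the values differ, which is one reading of the rule above. All field and function names are hypothetical.

```python
# Parameters that must be consistent on the sender and the responder.
MATCH_FIELDS = (
    "source_ip", "source_port", "destination_ip", "destination_port",
    "vpn_instance", "vlan", "source_mac", "destination_mac",
)


def mismatched_fields(sender, responder):
    """Return {field: (sender value, responder value)} for each mismatch.

    A field is compared if at least one end specifies it (assumption).
    """
    diffs = {}
    for field in MATCH_FIELDS:
        s_value, r_value = sender.get(field), responder.get(field)
        if (s_value is not None or r_value is not None) and s_value != r_value:
            diffs[field] = (s_value, r_value)
    return diffs


def has_timestamp_conflict(timestamp_format, tx_interval_ms):
    """NTP timestamps conflict with a 10 ms or 100 ms sending interval."""
    return timestamp_format.lower() == "ntp" and tx_interval_ms in (10, 100)


if __name__ == "__main__":
    sender = {"source_ip": "1.1.1.1", "source_port": 10000,
              "destination_ip": "1.1.1.2", "destination_port": 20000}
    responder = dict(sender, destination_port=20001)
    print(mismatched_fields(sender, responder))
    print(has_timestamp_conflict("NTP", 10))
```

Any field that the first function returns points to the command listed above for that parameter.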
b. Execute the terminal monitor, terminal debugging, debugging nqa error, and debugging nqa event commands sequentially in user view on the TWAMP Light sender to enable debugging for NQA. Then, execute the view /var/log/trace.log command in probe view to view NQA trace logs. Based on the logs, identify whether the device correctly sends TWAMP Light test packets and receives reflected packets of the TWAMP Light test, and whether the timestamps in the test results are correct.
- If the TWAMP Light sender does not send test packets correctly, identify the cause of the packet sending issue based on the NQA debugging information and trace logs. Edit the TWAMP Light test configuration on the sender according to the failure reason, and then restart the TWAMP Light test. If the packet sending issue cannot be resolved, execute the following commands, collect command output information, and then proceed to step 5.
display ip statistics
display ipv6 statistics
display ethernet statistics
- If the TWAMP Light sender does not receive reflected packets of the test correctly, execute the nqa agent enable command in system view of the TWAMP Light responder to enable the NQA client. Then, return to user view and execute the terminal monitor, terminal debugging, and debugging nqa packet commands sequentially to enable NQA packet debugging. Identify whether the TWAMP Light responder receives NQA packets and whether the NQA packet configuration is correct. If the responder does not receive any NQA packet, the network might fail. Proceed to the next step to troubleshoot the network issue. If the NQA packet configuration is incorrect, edit the NQA configuration as described in step 3.a and restart the test. The following example illustrates the NQA packet debugging information:
- Identify whether the following timestamp conditions are met in the test results:
CSendTime ≤ CRecvTime.
SRecvTime ≤ SSendTime.
SSendTime – SRecvTime, which is the processing time of the NQA server, is relatively small.
If these conditions are not met, collect the timestamp values and execute the display device command to collect device card information, and then proceed to step 5.
The following examples illustrate trace logs:
*May 6 00:36:24:900 2023 Sysname NQA/7/KDIAG: send packt, session 1, ucSampler 187.
// The output shows that the device sends a TWAMP Light test packet.
*May 6 00:36:24:901 2023 Sysname NQA/7/KDIAG: Twmap Recv Pakcet ucSampler=187
// The output shows that the device receives a TWAMP Light reflected packet.
*May 6 00:36:24:901 2023 Sysname NQA/7/KDIAG: cSendSec is 1683304584, cSendFrac is 900923500, sRecvSec is 1683304584, sRecvFrac is 835000000,cRecvSec is 1683304584, cRecvFrac is 901923500, sSendSec is 1683304584, sSendFrac is 835000000
*May 6 00:36:24:901 2023 Sysname NQA/7/KDIAG: nqa entry (twamplight?session-1) Sampler(187) client time:
CSendTime=1683304584900923 CRecvTime=1683304584901923 SRecvTime=1683304584835000 SSendTime=1683304584835000
// The output shows the timestamps within a TWAMP Light test packet.
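The three timestamp conditions can be checked directly against the CSendTime line of the trace log. The following Python sketch assumes that the four values are microseconds since the epoch, which the sample values suggest but the guide does not state. The limit used for a relatively small processing time is likewise an assumption, and the function names are hypothetical.

```python
import re

FIELDS = ("CSendTime", "CRecvTime", "SRecvTime", "SSendTime")


def parse_timestamps(line):
    """Extract the four timestamps from one trace log line."""
    found = re.findall(r"\b(%s)=(\d+)" % "|".join(FIELDS), line)
    return {name: int(value) for name, value in found}


def check_timestamps(ts, max_processing_us=1000):
    """Evaluate the three conditions listed in this step.

    max_processing_us: assumed upper limit for the responder processing
    time, in microseconds.
    """
    processing = ts["SSendTime"] - ts["SRecvTime"]
    return {
        "client_order_ok": ts["CSendTime"] <= ts["CRecvTime"],
        "server_order_ok": ts["SRecvTime"] <= ts["SSendTime"],
        "processing_small": 0 <= processing <= max_processing_us,
    }


SAMPLE = ("CSendTime=1683304584900923 CRecvTime=1683304584901923 "
          "SRecvTime=1683304584835000 SSendTime=1683304584835000")

if __name__ == "__main__":
    ts = parse_timestamps(SAMPLE)
    print(check_timestamps(ts))
    print("round trip (us):", ts["CRecvTime"] - ts["CSendTime"])
```

For the sample log line, all three conditions hold. A False value in any field identifies the timestamp pair to collect before you proceed to step 5.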
c. Identify whether packet loss is caused by network failures. Execute the ping command to ping the destination address of the test. If the destination address cannot be pinged or packet loss occurs, first resolve the network failures.
d. Identify whether the packet loss is caused by CRC errors.
Execute the display counters command. If the value of the Err (pkts) field in the command output keeps increasing as the test progresses, it indicates that packet sending failures occur at the link layer. Replace the associated interfaces or cables to resolve the issue.
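To decide whether the Err (pkts) value keeps increasing, record it several times while the test runs and compare the readings. The following Python helper is a minimal sketch. It assumes that you collect the readings yourself in chronological order, and the function name keeps_increasing is hypothetical.

```python
def keeps_increasing(samples):
    """Return True if every reading is greater than the previous one.

    samples: Err (pkts) values read at successive times during the test.
    At least two readings are required to detect growth.
    """
    return len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )


if __name__ == "__main__":
    print(keeps_increasing([12, 30, 57, 91]))
    print(keeps_increasing([12, 12, 12, 12]))
```

A counter that is nonzero but stable reflects past errors only and does not by itself point to a current link-layer fault.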
4. To resolve the error packet issue, perform the following tasks:
a. Identify whether configuration errors exist and the device mistakenly considers TWAMP Light reflected packets that arrive late as error packets.
- Execute the ping command on the TWAMP Light sender to ping the responder and view the maximum delay between the two ends, which is the max value in the round-trip min/avg/max/std-dev field of the ping results, in milliseconds.
- Execute the display nqa twamp-light client verbose command on the device and view the value of the Timeout(sec) field, which represents the timeout time of the TWAMP Light reflected packets. The timeout time of TWAMP Light reflected packets must be greater than the maximum delay between the two ends. If it is not, execute the start command in TWAMP Light sender view to specify a larger timeout time with the time-out keyword.
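The comparison in this step mixes units: the ping result is in milliseconds and the Timeout(sec) field is in seconds. The following Python sketch converts both to milliseconds before comparing them. The sample ping summary line is illustrative only, and the function names are hypothetical.

```python
import re

# Summary line of the ping output, for example:
# round-trip min/avg/max/std-dev = 0.274/0.424/0.688/0.153 ms
SUMMARY = re.compile(
    r"min/avg/max/std-dev\s*=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms"
)


def max_rtt_ms(ping_output):
    """Return the max round-trip time in milliseconds, or None if absent."""
    match = SUMMARY.search(ping_output)
    return float(match.group(3)) if match else None


def timeout_is_sufficient(timeout_sec, rtt_ms):
    """Return True if the reflected-packet timeout exceeds the max delay."""
    return timeout_sec * 1000 > rtt_ms


if __name__ == "__main__":
    line = "round-trip min/avg/max/std-dev = 0.274/0.424/0.688/0.153 ms"
    rtt = max_rtt_ms(line)
    print(rtt, timeout_is_sufficient(5, rtt))
```

If the second function returns False, specify a larger timeout time as described in this step.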
b. Execute the terminal monitor, terminal debugging, debugging nqa error, and debugging nqa event commands sequentially in user view on the TWAMP Light sender to enable debugging for NQA. Then, execute the view /var/log/trace.log command in probe view to view NQA trace logs. Based on the logs, identify whether the packet content meets protocol requirements and whether the packet encapsulation is correct. If the packet content does not meet protocol requirements or the packet encapsulation is incorrect, re-configure the TWAMP Light test as described in TWAMP Light configuration.
5. If the issue persists, collect the following information and contact Technical Support:
¡ Results of each step.
¡ The configuration file, log messages, and alarm messages.
Related alarm and log messages
Alarm messages
Module name: HH3C-TWAMP-MIB
· hh3cTwampTwoWayLossExceed (1.3.6.1.4.1.25506.2.184.1.0.1)
· hh3cTwampTwoWayLossRecover (1.3.6.1.4.1.25506.2.184.1.0.2)
· hh3cTwampTwoWayDelayExceed (1.3.6.1.4.1.25506.2.184.1.0.3)
· hh3cTwampTwoWayDelayRecover (1.3.6.1.4.1.25506.2.184.1.0.4)
· hh3cTwampTwoWayJitterExceed (1.3.6.1.4.1.25506.2.184.1.0.5)
· hh3cTwampTwoWayJitterRecover (1.3.6.1.4.1.25506.2.184.1.0.6)
· hh3cTwampSenderStartFailure (1.3.6.1.4.1.25506.2.184.1.0.9)
· hh3cTwampStatisticsAbnormal (1.3.6.1.4.1.25506.2.184.1.0.11)
Log messages
· NQA/6/NQA_TWAMP_LIGHT_PACKET_INVALID
· NQA/6/NQA_TWAMP_LIGHT_REACTION
· NQA/6/NQA_TWAMP_LIGHT_SENDER_START_FAILURE
· NQAS/6/NQA_TWAMP_LIGHT_START_FAILURE