18-AIOps Configuration Guide

Configuring AIOps

AIOps overview

AIOps stands for artificial intelligence for IT operations. ICT devices increasingly use AI technology to improve operating efficiency and enhance traditional maintenance methods, which helps users reduce costs. The three core elements of AI are algorithms, computing power, and data. A device applies AI algorithms to large amounts of sample data, using the computing power of its onboard chips, to implement different AI functions.

AIOps functions

AI ECN

AI for Explicit Congestion Notification (AI ECN) uses AI algorithms and data models to dynamically predict the optimal ECN threshold for a queue. Devices apply ECN marks to packets based on the optimized threshold to reduce network congestion. This ensures low latency and high throughput in complex network environments. For more information about AI ECN, see "Configuring AI ECN."

 


Configuring AI ECN

About AI ECN

AI ECN is a dynamic congestion notification technology implemented with AI algorithms. AI ECN can be used in an intelligent lossless network to provide congestion avoidance for RDMA over Converged Ethernet version 2 (RoCEv2) traffic.

Basic concepts

ECN uses the DS field in the IP packet header to mark the congestion status on the transmission path. An ECN-capable endpoint can determine congestion on the transmission path through ECN marks in packets, and adjust transmission rates to prevent worsening congestion.

RFC 2481 defines the last two bits in the DS field of the IP packet header as the ECN field:

·     Bit 6 is the ECN-Capable Transport (ECT) bit. It identifies whether the sending device supports ECN.

·     Bit 7 is the Congestion Experienced (CE) bit. It identifies whether a packet has encountered congestion on its transmission path.

Figure 1 ECN field in IPv4 packet header

 

As shown in Figure 1, RFC 3168 defines the values of the ECN field for an IPv4 packet as follows:

·     If the value of the ECN field is 00, the sending device does not support ECN.

·     If the value of the ECN field is 10 or 01, the sending device supports ECN. The values are marked as ECT(0) and ECT(1), respectively.

·     If the value of the ECN field is 11, congestion has occurred on the forwarding path of the packet. The value is marked as CE.

RFC 3168 defines the last two bits of the Traffic Class field in the IPv6 header as the ECN field.
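The ECN field values above can be illustrated with a minimal Python sketch. It is not part of the device software. The function names are hypothetical, and the input is assumed to be the full 8-bit IPv4 ToS octet or IPv6 Traffic Class octet, whose low two bits carry the ECN field as defined in RFC 3168.

```python
# ECN codepoints as defined in RFC 3168 (low two bits of the ToS / Traffic Class octet).
ECN_NAMES = {
    0b00: "Not-ECT",  # sender does not support ECN
    0b01: "ECT(1)",   # sender supports ECN
    0b10: "ECT(0)",   # sender supports ECN
    0b11: "CE",       # congestion experienced on the path
}


def ecn_codepoint(tos_octet: int) -> str:
    """Return the ECN codepoint name for an IPv4 ToS or IPv6 Traffic Class octet."""
    if not 0 <= tos_octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    return ECN_NAMES[tos_octet & 0b11]


def mark_ce(tos_octet: int) -> int:
    """Set CE on an ECN-capable packet; leave Not-ECT packets unchanged."""
    if not 0 <= tos_octet <= 0xFF:
        raise ValueError("octet must be in the range 0..255")
    if tos_octet & 0b11 == 0b00:
        # The sender is not ECN-capable, so the packet cannot be ECN-marked.
        return tos_octet
    return tos_octet | 0b11


# Example: DSCP 26 with ECT(0) is 0x6A; marking CE keeps the DSCP and yields 0x6B.
print(ecn_codepoint(0x6A), "->", ecn_codepoint(mark_ce(0x6A)))
```

The sketch leaves the upper six bits (the DSCP value) untouched, because ECN marking changes only the last two bits of the octet.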

Static ECN

Definition of static ECN

Static ECN refers to ECN that works with WRED. You manually configure WRED parameters for a queue (including the upper threshold and lower threshold for the average queue length) and then enable ECN for the queue. For more information about static ECN, see QoS in ACL and QoS Configuration Guide.
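As a rough illustration of how the two WRED thresholds drive ECN marking, the following Python sketch assumes the commonly described WRED behavior: no marking below the lower threshold, a linearly increasing marking probability between the thresholds, and marking of every ECN-capable packet at or above the upper threshold. The function name, the max_prob parameter, and the example values are assumptions for illustration and are not taken from this guide. The exact behavior of a device depends on its WRED implementation.

```python
def marking_probability(avg_queue_len: float, low: float, high: float,
                        max_prob: float) -> float:
    """Return the probability of ECN-marking an ECN-capable packet.

    avg_queue_len: average queue length (same unit as the thresholds)
    low, high:     lower and upper WRED thresholds
    max_prob:      marking probability reached just below the upper threshold
                   (hypothetical parameter, 0.0..1.0)
    """
    if not 0 <= low < high:
        raise ValueError("thresholds must satisfy 0 <= low < high")
    if not 0.0 <= max_prob <= 1.0:
        raise ValueError("max_prob must be in the range 0.0..1.0")
    if avg_queue_len < low:
        return 0.0   # queue is short: no marking
    if avg_queue_len >= high:
        return 1.0   # queue is long: mark every ECN-capable packet
    # Between the thresholds the probability rises linearly from 0 to max_prob.
    return max_prob * (avg_queue_len - low) / (high - low)


# Example with illustrative thresholds of 100 and 300 and max_prob of 0.2.
for length in (50, 200, 300):
    print(length, marking_probability(length, 100, 300, 0.2))
```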

Advantages and disadvantages of static ECN

Static ECN has the following advantages:

·     Setting a proper lower threshold allows devices to detect congestion on the path in advance and have the receiving end notify the transmitting end to slow down the transmission rate.

·     The forwarding device sets the ECN field to 11 for packets that exceed the lower threshold instead of dropping them. This avoids packet loss and retransmission in the network and reduces network delay.

·     When congestion occurs in the network, the sender gradually lowers the packet transmission rate within a certain time. After congestion disappears, the sender gradually increases the rate to avoid rapid changes of network throughput before and after congestion.

However, the traffic passing through each queue might change dynamically over time, and statically configured ECN thresholds cannot adapt to real-time traffic changes:

·     If the ECN threshold is set too high, the forwarding device uses longer queues and more buffers to ensure the rate of traffic transmission, meeting the bandwidth requirements of large flows. However, during congestion in the queue, packets waiting in the buffer can cause significant queue delay, which is unacceptable for small, latency-sensitive flows.

·     If the ECN threshold is set too low, the forwarding device uses shorter queues and fewer buffers to reduce the queuing delay, meeting the latency requirements of small flows. However, a low ECN threshold can reduce network throughput, limiting transmission of large flows.

For these reasons, AI ECN is introduced to intelligently control the ECN thresholds in real time.

AI ECN

As shown in Figure 2, AI ECN uses the AI service component on the device or on an analyzer to dynamically optimize ECN thresholds according to specific rules. The AI service component, which is built into the network device or analyzer, is the core of dynamic ECN optimization. It consists of the following three layers:

·     The data collection and analysis layer provides interfaces for collecting massive amounts of data, and preprocesses and analyzes the collected data.

·     The model management layer manages model files and infers the AI ECN threshold based on the AI function models loaded by users.

·     The algorithm layer invokes the data collection interface to obtain real-time data, and uses the fixed-step search algorithm to calculate the AI ECN threshold.

Figure 2 AI ECN

 

As shown in Figure 2, AI ECN is implemented as follows:

1.     The forwarding chip inside the device collects traffic pattern information, such as queue buffer occupancy, traffic throughput, and the ratio of large flows to small flows. Then, it passes the real-time traffic characteristics through telemetry to the data collection and analysis layer of the AI service component.

2.     After receiving the traffic pattern information, the AI service component analyzes the current traffic pattern through the data collection and analysis layer and determines whether it matches a traffic model in the model management layer.

¡     If a match is found, the AI service component infers the optimal real-time ECN threshold based on the matching traffic model. This method of generating AI ECN thresholds is called model reasoning and uses the neural network algorithm.

¡     If the traffic pattern does not match a traffic model, the AI service component modifies the ECN thresholds by a fixed step based on the current network state while ensuring high bandwidth and low latency. The updated ECN thresholds are then deployed to the forwarding chip. After the new ECN thresholds take effect, the AI service component continues to adjust them based on newly collected traffic patterns until the optimal ECN thresholds are obtained. This method of generating AI ECN thresholds is called heuristic reasoning.

3.     After AI ECN is enabled on the device, the forwarding chip automatically receives the ECN data pushed by the AI service component and sets the ECN threshold to the optimal value issued by the AI service component.

4.     The linkage between the AI service component and the forwarding chip dynamically adjusts the ECN threshold to match real-time traffic:

¡     When a queue has a high proportion of small flows, the ECN triggering threshold is reduced to ensure low latency for small flows.

¡     When a queue has a high proportion of large flows, the ECN triggering threshold is increased to ensure high throughput for large flows.
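The fixed-step adjustment used in heuristic reasoning can be pictured with the following Python sketch. This guide does not describe the actual search algorithm, so everything here is an assumption for illustration: the function name, the step size, the threshold bounds, the unit, and the two flow-ratio cutoffs are hypothetical. The sketch only reproduces the direction of adjustment stated above: a lower threshold when small flows dominate, and a higher threshold when large flows dominate.

```python
def step_threshold(current: int, small_flow_ratio: float, *,
                   step: int = 10,
                   lower_bound: int = 20,
                   upper_bound: int = 400,
                   small_flow_cutoff: float = 0.6,
                   large_flow_cutoff: float = 0.4) -> int:
    """Move the ECN triggering threshold by one fixed step.

    current:          current threshold (hypothetical unit, for example KB)
    small_flow_ratio: fraction of small flows in the queue, 0.0..1.0
    All keyword defaults are illustrative values, not device defaults.
    """
    if not 0.0 <= small_flow_ratio <= 1.0:
        raise ValueError("small_flow_ratio must be in the range 0.0..1.0")
    if small_flow_ratio >= small_flow_cutoff:
        candidate = current - step   # small flows dominate: favor low latency
    elif small_flow_ratio <= large_flow_cutoff:
        candidate = current + step   # large flows dominate: favor throughput
    else:
        candidate = current          # mixed traffic: keep the threshold
    # Keep the result inside the allowed threshold range.
    return max(lower_bound, min(upper_bound, candidate))


# Repeated samples dominated by small flows walk the threshold down step by step.
threshold = 100
for _ in range(3):
    threshold = step_threshold(threshold, small_flow_ratio=0.8)
print(threshold)
```

In a real deployment, each new threshold would be deployed to the forwarding chip and then re-evaluated against freshly collected traffic characteristics, as described in step 2.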

Licensing requirements

The AI ECN feature requires a license. To use AI ECN, first install the corresponding license. For more information, see Fundamentals Configuration Guide.

Enabling AI ECN for a queue

About this task

This function enables the device to collect and send traffic characteristics to the AI service component on an analyzer or the local AI service component by using NetAnalysis. The AI service component dynamically sets the optimal ECN triggering threshold for a queue to achieve low delay and high throughput. For more information about NetAnalysis, see Network Management and Monitoring Configuration Guide.

The following AI ECN modes are supported based on the chip and hardware capabilities:

·     Centralized mode—The analyzer calculates the ECN triggering threshold and communicates it to devices. This mode does not require high hardware capabilities for devices.

·     Distributed mode—The device itself intelligently sets the optimal ECN triggering threshold for queues. This mode requires high hardware capabilities for devices and consumes device CPU resources.

·     Neural mode—The neural network algorithm of the device intelligently sets the optimal ECN triggering threshold for queues. This mode requires the device chip to support the neural network algorithm.

Restrictions and guidelines

This feature is mutually exclusive with the following settings:

·     Applying a WRED table to an interface.

·     Configuring WRED parameters for a queue.

·     Setting the WRED exponent for average queue size calculation.

AI ECN requires a license to work. Install a valid license before using this feature. For more information about licensing, see license management in Fundamentals Configuration Guide.

Queue-based outbound traffic statistics can be displayed only if AI ECN is enabled.

IRF ports do not support ECN.

Prerequisites

To configure AI ECN in an intelligent lossless network, first configure NetAnalysis for RoCEv2 traffic:

·     Use the netanalysis rocev2 mode command to set the mode of RoCEv2 traffic analysis.

·     Use the netanalysis rocev2 statistics command to enable RoCEv2 traffic statistics collection.

·     Use the netanalysis rocev2 ai-ecn enable command to enable AI ECN for RoCEv2 traffic statistics collection.

Procedure

1.     Enter system view.

system-view

2.     Enter AI service view.

ai-service

3.     Enable AI ECN and set the AI ECN mode.

ai ai-ecn enable mode { centralized | distributed | neural }

4.     Enter AI ECN view.

ai-ecn

5.     Enable AI ECN for a queue.

queue queue-id enable

By default, AI ECN is disabled for a queue.
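When the procedure is applied to many devices or queues, the command sequence can be generated by a script. The following Python sketch only assembles the commands listed in the procedure above. The function name is hypothetical, the sketch assumes that one queue command is entered per queue, and it does not validate queue IDs because the valid range is not stated in this section. How the commands are delivered to the device (for example, through an SSH session) is outside the scope of the sketch.

```python
VALID_MODES = ("centralized", "distributed", "neural")


def ai_ecn_commands(mode: str, queue_ids: list[int]) -> list[str]:
    """Build the CLI command sequence that enables AI ECN for the given queues."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    if not queue_ids:
        raise ValueError("at least one queue ID is required")
    commands = [
        "system-view",                     # step 1: enter system view
        "ai-service",                      # step 2: enter AI service view
        f"ai ai-ecn enable mode {mode}",   # step 3: enable AI ECN and set the mode
        "ai-ecn",                          # step 4: enter AI ECN view
    ]
    # Step 5: enable AI ECN for each queue (assumes one command per queue).
    commands += [f"queue {queue_id} enable" for queue_id in queue_ids]
    return commands


# Example: distributed mode for two illustrative queue IDs.
for line in ai_ecn_commands("distributed", [5, 6]):
    print(line)
```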

 
