H3C S6805 & S6825 & S6850 & S9850 & S9820 RDMA Technical Reference-6W100


 

RDMA Technical Reference

S6805 Switch Series Release 66xx

 

S6825 Switch Series Release 66xx

 

S6850 Switch Series Release 655x, Release 66xx

 

S9850 Switch Series Release 655x, Release 66xx

 

S9820-64H Switch Release 655x, Release 66xx

 

S9820-8C Switch Release 66xx

Copyright © 2023 New H3C Technologies Co., Ltd. All rights reserved.

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.

Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.

The information in this document is subject to change without notice.


Contents

Overview
Technical background
Benefits
RDMA technique types
RDMA technique overview
IB
Introduction
Benefits
iWARP
Introduction
Benefits
RoCE
Introduction
Benefits
Building a lossless Ethernet network to support RoCE
Managing and monitoring data buffers
About data buffers
Data buffer types
Cell resources
Fixed area and shared area
Managing data buffers
About managing data buffers
Restrictions and guidelines
Procedure
Configuring data buffer monitoring
Setting the per-interface buffer usage threshold
Configuring alarm thresholds for the ingress or egress buffer
Configuring alarm thresholds for the headroom buffer
Configuring packet-drop alarms
Displaying data buffer information at the CLI and reporting data buffer information by using gRPC
Displaying data buffer information at the CLI
Using gRPC to report buffer usage information
See also
PFC
About PFC
Mechanism
PFC pause frame generation mechanism
Packet priority-to-queue mappings
Configuring PFC
Restrictions and guidelines
Configuring PFC in system view
Configuring PFC in Ethernet interface view
Setting PFC thresholds
About PFC thresholds
Restrictions and guidelines
Procedure
Configuring PFC deadlock detection
About PFC deadlock detection
Restrictions and guidelines
PFC deadlock detection tasks at a glance
Configuring the PFC deadlock detection interval
Configuring the delay timer for PFC deadlock detection automatic recovery
Configuring the PFC deadlock recovery mode
Enabling PFC deadlock detection
PFC deadlock detection logs
Configuring the early warning thresholds for PFC packets
About this task
Procedure
Displaying PFC information at the CLI and reporting PFC information by using gRPC
Displaying PFC information at the CLI
Reporting PFC information by using gRPC
See also
ECN
About ECN
Mechanism
Restrictions and guidelines
Configuring ECN
Displaying ECN information at the CLI and reporting ECN information by using gRPC
Displaying ECN information at the CLI
Reporting ECN information by using gRPC
See also
DCBX
About DCBX
Mechanism
Configuring DCBX
See also
ETS
About ETS
Mechanism
Configuring ETS
Configure an 802.1p-to-local priority mapping
Configuring SP+WRR queuing
See also
Example: Configuring RDMA
Example: Configuring PFC deadlock detection
Example: Configuring PFC thresholds
Appendixes
Default priority map
Developing a gRPC collector-side application
Prerequisites
Generating the C++ code for the proto definition file
Developing the collector-side application

 


Overview

Technical background

High-performance computing (HPC), big data analysis, artificial intelligence (AI), and the Internet of Things (IoT) are developing rapidly, and centralized and distributed storage and cloud databases are widely used. As a result, applications need to obtain ever more data from the network, which places higher requirements on the switching speed and performance of data center networks.

In traditional TCP/IP software and hardware architectures, network transmission and data processing delays are long, data is copied multiple times and interrupts are frequent, and TCP/IP protocol processing is complicated. Remote Direct Memory Access (RDMA) reduces the data processing delay on servers during network transmission. RDMA writes user application data directly into the storage space of the servers and quickly moves the data over the network from the local system into the storage of the remote system. RDMA eliminates the repeated data copying and context switching operations of the traditional model and reduces the CPU load.

Figure 1 Traditional TCP/IP data transmission process

 

Figure 2 RDMA data transmission process

 

Benefits

During network transmission, RDMA transfers data directly between the data buffers of two nodes. Data on the local node is written directly into the memory of the remote node, bypassing the multiple data copies that operating systems normally perform in the CPU. Compared with traditional network transmission technologies, RDMA does not involve the operating system or the TCP/IP protocol stack, so it easily achieves low-latency data processing and high throughput. Because RDMA does not consume CPU resources on the remote node, it frees up resources for data migration and processing.

RDMA technique types

RDMA techniques include the following types:

·     IB—InfiniBand, an RDMA technique based on the InfiniBand architecture, proposed by the InfiniBand Trade Association (IBTA). To build an IB-based RDMA network, you need dedicated IB NICs and IB switches.

·     iWARP—Internet Wide Area RDMA Protocol, an RDMA technique based on TCP/IP, proposed by the IETF. iWARP supports RDMA on standard Ethernet infrastructures. However, the servers must use iWARP-capable NICs.

·     RoCE—RDMA over Converged Ethernet, an RDMA technique based on Ethernet, also proposed by the IBTA. RoCE supports RDMA on standard Ethernet infrastructures. However, the Ethernet switches must support lossless Ethernet and the servers must use RoCE NICs.

H3C Ethernet switches support iWARP. Some H3C switches support lossless Ethernet and thus support RoCE. For devices that support lossless Ethernet, consult the marketing staff or see the product documentation.

RDMA technique overview

IB

Introduction

IB is an RDMA technique based on the InfiniBand architecture. IB provides a channel-based point-to-point message queuing forwarding model. Each application can directly obtain its own data messages through its own virtual channel, without involving the operating system or other protocol stacks. At the application layer, IB uses RDMA to provide direct read/write access to remote nodes and completely offloads the CPU. At the network layer, IB provides high-bandwidth transmission. At the link layer, IB provides a dedicated retransmission mechanism to guarantee QoS and does not need to buffer data.

IB must run in an IB network using IB switches and IB NICs.

Figure 3 IB architecture

 

Benefits

IB delivers the following benefits:

·     Uses RDMA on the application layer to reduce the data processing delay on the host side.

·     Controls message forwarding by using the subnet manager, without complicated protocol interactions and calculations in Ethernet.

·     Uses a link-layer retransmission mechanism to guarantee QoS without buffering packets, and achieves zero packet loss.

·     Achieves low latency, high bandwidth, and low overhead.

iWARP

Introduction

iWARP is an RDMA technique based on Ethernet, and can run on a standard Ethernet infrastructure.

iWARP contains the following layers:

·     RDMAP—RDMA protocol, which performs RDMA read/write operations, translates RDMA messages, and sends them to the DDP layer.

·     DDP—Direct data placement, which segments long RDMA messages, encapsulates the segments in DDP PDUs, and sends the packets to the MPA layer.

·     MPA—Marker PDU aligned framing, which adds markers at fixed octet intervals, data packet length fields, and CRC fields to DDP PDUs to form MPA PDUs and sends them to TCP for transmission.

Benefits

iWARP reduces the network load on the host side in the following aspects:

·     Offloads TCP/IP processing from the CPU to the RDMA NIC, reducing the CPU load.

·     Eliminates memory copying. An application can directly transmit its data to the memory of an application on the remote end, sharply reducing the CPU load.

·     Reduces context switching for application programs. An application can bypass the operating system and directly issue commands to the RDMA NIC in the user space, reducing the overhead and delay caused by application context switching.

Because TCP provides flow control and congestion management, iWARP does not require lossless Ethernet and can be implemented with common Ethernet switches and iWARP-capable NICs. Therefore, iWARP can be used in WANs and expanded easily.

RoCE

Introduction

RoCE carries IB over Ethernet to implement RDMA over Ethernet. RoCE and IB are identical at the application layer and transport layer, and differ only at the network layer and Ethernet link layer.

Figure 4 RoCE architecture

 

RoCE has the following versions:

·     RoCEv1—Carries RDMA over Ethernet. RoCEv1 can be deployed only on Layer 2 networks. RoCEv1 adds Layer 2 Ethernet headers to IB packets and identifies RoCE packets by using Ethertype 0x8915.

·     RoCEv2—Carries RDMA over UDP/IP. RoCEv2 can be deployed on Layer 3 networks. RoCEv2 adds UDP headers, IP headers, and Layer 2 Ethernet headers to IB packets, and identifies RoCE packets by using destination UDP port number 4791. RoCEv2 supports hashing based on source port numbers and uses ECMP to implement load sharing, improving network efficiency.

Benefits

RoCE delivers the following benefits for data transmission over Ethernet:

·     High throughput.

·     Low latency.

·     Low CPU load.

RoCE can be implemented through common Ethernet switches. However, the servers must use RoCE NICs and the network must support lossless Ethernet because loss of any packet will cause a large number of retransmissions in IB and seriously affect the data transmission performance.

Building a lossless Ethernet network to support RoCE

In an RoCE network, you must build a lossless Ethernet network to ensure zero packet loss. Lossless Ethernet must support the following key features:

·     Data buffer management and monitoring—Adjusts the buffer space that each interface or queue can use according to traffic characteristics, and reports buffer usage conditions at the CLI or through gRPC.

Adjust the data buffer size under the guidance of professionals. As a best practice, configure data buffer monitoring.

·     (Required.) PFC—Priority-based Flow Control. PFC provides per-hop priority-based flow control for multiple types of traffic separately.

·     (Required.) ECN—Explicit Congestion Notification. When the device is congested, ECN marks the ECN field in the IP header of a packet. The receiver sends congestion notification packets (CNPs) to notify the sender to slow down the sending speed. ECN implements end-to-end congestion management and reduces the spread and deterioration of congestion.

·     (Recommended.) DCBX—Data Center Bridging Capability Exchange Protocol. DCBX uses LLDP to autonegotiate DCB capabilities, including PFC and ETS capabilities. Typically, DCBX is used on the interface connecting the switch to the server, and negotiates capabilities with the server NIC.

·     (Optional.) ETS—Enhanced Transmission Selection. ETS classifies traffic by service type, provides minimum guaranteed bandwidth for different traffic types, and improves the link utilization. ETS must be configured hop by hop.

Figure 5 Key features for building a lossless Ethernet

 

In an RoCE network, PFC must be used together with ECN to guarantee both zero packet loss and bandwidth. Table 1 compares PFC and ECN.

Table 1 PFC vs ECN

Item | PFC | ECN
Network location | Layer 2 | Network layer and transport layer
Effective scope | Point-to-point | End-to-end
Needs network-wide support | Yes | No
Controlled objects | Previous node in the network (if the server NIC supports PFC, PFC also takes effect on the NIC) | Sender host
Packet buffer location | Intermediate nodes and sender | Sender
Affected traffic | All traffic in one of the eight queues on the device | Congested connection
Response speed | Fast | Slow

 

Managing and monitoring data buffers

About data buffers

Data buffer types

Data buffers temporarily store packets to avoid packet loss.

The following data buffers are available:

·     Ingress buffer—Stores incoming packets when the CPU is busy.

·     Egress buffer—Stores outgoing packets when network congestion occurs.

·     Headroom buffer—Stores packets when the ingress buffer or egress buffer is used up.

Figure 6 shows the structure of ingress and egress buffers.

Figure 6 Data buffer structure

 

Cell resources

A buffer uses cell resources to store packets based on packet sizes. Suppose a cell resource provides 208 bytes. The buffer allocates one cell resource to a 128-byte packet and two cell resources to a 300-byte packet.

On the S9820-64H switch, a cell resource is 208 bytes. On the S9820-8C switch, a cell resource is 254 bytes. On the S6805&S6825&S6850&S9850 switch series, a cell resource is 256 bytes.

Fixed area and shared area

The cell resources have a fixed area and a shared area.

·     Fixed area—Partitioned into queues, each of which is divided equally among all the interfaces on a card, as shown in Figure 7. When congestion occurs or the CPU is busy, the following rules apply:

a.     An interface first uses the relevant queues of the fixed area to store packets.

b.     When a queue is full, the interface uses the corresponding queue of the shared area.

c.     When the queue in the shared area is also full, the interface discards subsequent packets.

The system allocates the fixed area among queues as specified by the user. Even if a queue is not full, other queues cannot preempt its space. Similarly, the share of a queue for an interface cannot be preempted by other interfaces even if it is not full.

·     Shared area—Partitioned into queues, each of which is not divided equally among the interfaces, as shown in Figure 7. The system determines the actual shared-area space for each queue according to user configuration and the number of packets actually received and sent. If a queue is not full, other queues can preempt its space.

The system puts packets received or sent on all interfaces into a queue in the order they arrive. When the queue is full, subsequent packets are dropped.

The shared area is also divided into service pools based on application services. You can map a queue to a service pool, and this queue can only use the resources of that service pool. By default, all of the shared area belongs to service pool 0.

·     Headroom area—When PFC is in effect and the back pressure frame triggering threshold is reached, the device sends PFC pause frames to the peer device. The headroom area is used to store the packets that the peer device has sent before receiving PFC pause frames.

For the headroom area, the S6805&S6825&S6850&S9850 switch series, S9820-64H switch, and S9820-8C switch support only service pool 0.

Figure 7 Fixed area and shared area

 

Managing data buffers

About managing data buffers

By default, all queues have an equal share of the shared area and fixed area. You can change the shared-area or fixed-area size for a queue. The unconfigured queues use the default setting.

For more information about data buffers, see ACL and QoS Configuration Guide and ACL and QoS Command Reference.

Restrictions and guidelines

The data buffer setting in interface view has higher priority than the data buffer setting in system view. If it is configured in both views, the setting in interface view takes effect.

Procedure

1.     Enter system view.

system-view

2.     Set the fixed-area ratio for a queue. Choose the options to configure as needed:

¡     Configure the global fixed-area ratio for a queue.

buffer egress [ slot slot-number ] cell queue queue-id guaranteed ratio ratio

The default setting is 13%.

¡     Execute the following commands in sequence to configure the fixed-area ratio for a queue on an interface:

interface interface-type interface-number

buffer egress cell queue queue-id guaranteed ratio ratio

quit

By default, the global fixed-area ratio is used.

3.     Set the maximum shared-area ratio for a queue. Choose the options to configure as needed:

¡     Configure the global maximum shared-area ratio for a queue.

buffer egress [ slot slot-number ] cell queue queue-id shared { ratio ratio | size }

The default setting is 20%.

The S6805&S6825&S6850&S9850 switch series, S9820-64H switch, and S9820-8C switch support the size argument in Release 6616 and later versions.

¡     Execute the following commands in sequence to configure the maximum shared-area ratio for a queue on an interface:

interface interface-type interface-number

buffer egress cell queue queue-id shared { ratio ratio | size }

quit

By default, the global maximum shared-area ratio is used.

The S6805&S6825&S6850&S9850 switch series, S9820-64H switch, and S9820-8C switch support the size argument in Release 6616 and later versions.

4.     Set the maximum shared-area ratio for a service pool.

buffer { egress | ingress } slot slot-number cell service-pool sp-id shared ratio ratio

By default, all of the shared area is reserved for service pool 0.

5.     Apply buffer assignment rules.

buffer apply

This command is required only for buffer settings configured in system view; executing it makes those settings take effect.

6.     Enter Ethernet interface view.

interface interface-type interface-number

7.     Map a queue to a service pool.

buffer { egress | ingress } queue queue-id map-to service-pool sp-id

By default, all queues are mapped to service pool 0.

8.     Return to system view.

quit

9.     Set the maximum number of cell resources in the headroom area.

priority-flow-control poolID pool-number headroom headroom-number

By default, the maximum number of cell resources in the headroom area is 12288 on the S6805&S6825&S6850&S9850 switch series and the S9820-8C switch, and is 28672 on the S9820-64H switch.

The S6805&S6825&S6850&S9850 switch series, S9820-64H switch, and S9820-8C switch support only service pool 0.

For more information about this command, see Ethernet interface in Layer 2—LAN Switching Configuration Guide and Layer 2—LAN Switching Command Reference.
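The following is a minimal configuration sketch that strings the preceding commands together. The interface name, queue ID, ratios, and headroom value are hypothetical examples only; choose values for your own traffic model and, as noted above, adjust buffer sizes only under the guidance of professionals.

# Set egress buffer ratios for queue 5 in system view (example values).
<Sysname> system-view
[Sysname] buffer egress cell queue 5 guaranteed ratio 20
[Sysname] buffer egress cell queue 5 shared ratio 50
# Apply the buffer assignment rules configured in system view.
[Sysname] buffer apply
# Map queue 5 to service pool 0 on a hypothetical interface.
[Sysname] interface twenty-fivegige 1/0/1
[Sysname-Twenty-FiveGigE1/0/1] buffer egress queue 5 map-to service-pool 0
[Sysname-Twenty-FiveGigE1/0/1] quit
# Set the maximum number of headroom cell resources for service pool 0.
[Sysname] priority-flow-control poolID 0 headroom 20000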

Configuring data buffer monitoring

Setting the per-interface buffer usage threshold

About the per-interface buffer usage threshold

This feature allows you to identify the interfaces that use an excessive amount of data buffer space. The switch automatically records buffer usage for each interface. When a queue on an interface uses more buffer space than the set threshold, the system counts one threshold violation for the queue.

Procedure

1.     Enter system view.

system-view

2.     Set the per-interface buffer usage threshold.

buffer usage threshold slot slot-number ratio ratio

The default setting is 100%.

Configuring alarm thresholds for the ingress or egress buffer

1.     Enter system view.

system-view

2.     Configure the alarm thresholds. Choose the options to configure as needed:

¡     Configure the global alarm threshold for a queue.

buffer { egress | ingress } usage threshold slot slot-number queue queue-id ratio ratio

The default setting is 100%.

¡     Execute the following commands in sequence to configure the alarm threshold for a queue on an interface:

interface interface-type interface-number

buffer { egress | ingress } usage threshold queue queue-id ratio ratio

quit

By default, the global alarm threshold is used.

3.     Set the alarm threshold for a service pool.

buffer { egress | ingress } usage threshold service-pool sp-id slot slot-number ratio ratio

The default setting is 100%.

4.     (Optional.) Set the interval for sending threshold-crossing alarms.

buffer threshold alarm { egress | ingress } interval interval

The default setting is 5 seconds.

5.     Enable threshold-crossing alarms.

buffer threshold alarm { egress | ingress } enable

By default, threshold-crossing alarms are disabled.
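A brief sketch of the preceding steps is shown below. The slot number, queue ID, threshold ratios, and alarm interval are hypothetical; adjust them to your own environment.

# Set egress buffer alarm thresholds and enable threshold-crossing alarms (example values).
<Sysname> system-view
[Sysname] buffer egress usage threshold slot 1 queue 5 ratio 70
[Sysname] buffer egress usage threshold service-pool 0 slot 1 ratio 80
[Sysname] buffer threshold alarm egress interval 10
[Sysname] buffer threshold alarm egress enable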

Configuring alarm thresholds for the headroom buffer

1.     Enter system view.

system-view

2.     Configure the alarm thresholds. Choose the options to configure as needed:

¡     Configure the global per-queue alarm threshold.

buffer usage threshold headroom slot slot-number ratio ratio

The default setting is 100%.

¡     Execute the following commands in sequence to configure the alarm threshold for a queue on an interface:

interface interface-type interface-number

buffer usage threshold headroom queue queue-id ratio ratio

quit

By default, the global per-queue alarm threshold is used.

3.     (Optional.) Set the interval for sending threshold-crossing alarms.

buffer threshold alarm headroom interval interval

The default setting is 5 seconds.

4.     Enable threshold-crossing alarms.

buffer threshold alarm headroom enable

By default, threshold-crossing alarms are disabled.

Configuring packet-drop alarms

1.     Enter system view.

system-view

2.     (Optional.) Set the interval for sending packet-drop alarms.

buffer packet-drop alarm interval interval

The default setting is 5 seconds.

3.     Enable packet-drop alarms.

buffer packet-drop alarm enable

By default, packet-drop alarms are disabled.
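For example, the following sketch enables packet-drop alarms with a hypothetical 10-second alarm interval.

<Sysname> system-view
[Sysname] buffer packet-drop alarm interval 10
[Sysname] buffer packet-drop alarm enable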

Displaying data buffer information at the CLI and reporting data buffer information by using gRPC

Displaying data buffer information at the CLI

Overview

Task | Command
Display the buffer usage threshold for queues and the number of threshold violations for each queue. | display buffer usage interface [ interface-type [ interface-number ] ]
Display the used buffer sizes, available buffer sizes, current buffer usage, and peak buffer usage for both egress and ingress buffers. | display buffer usage interface [ interface-type [ interface-number ] ] verbose
Display the number of cell resources used and the peak used size of the headroom area in the ingress buffer, and the available headroom area in the egress buffer and ingress buffer. | display buffer usage interface [ interface-type [ interface-number ] ] verbose
Display the numbers of packets and bytes forwarded and dropped in the outbound direction. | display qos queue-statistics interface [ interface-type interface-number ] outbound

 

display buffer usage interface command output

# Display brief buffer usage statistics for Twenty-FiveGigE 1/0/1.

<Sysname> display buffer usage interface twenty-fivegige 1/0/1

Interface              QueueID Total       Used        Threshold(%) Violations

--------------------------------------------------------------------------------

WGE1/0/1               0        6692352     0           70             0

                         1        6692352     0           70             0

                         2        6692352     0           70             0

                         3        6692352     0           70             0

                         4        6692352     0           70             0

                         5        6692352     0           70             0

                         6        6692352     0           70             0

                         7        6692352     0           70             0

Table 2 Command output

Field

Description

Total

Data buffer size in bytes allowed for a queue.

Used

Data buffer size in bytes that has been used by a queue.

Threshold(%)

Buffer usage threshold for a queue. The threshold value is the same as the per-interface threshold value.

Violations

Number of threshold violations for a queue.

The value of this field is reset upon a switch reboot.

 

# Display detailed buffer usage statistics for Twenty-FiveGigE 1/0/1.

<Sysname> display buffer usage interface twenty-fivegige 1/0/1 verbose

Twenty-FiveGigE1/0/1

  Ingress:

    QueueID: 0

      Total: 127974            Used: 0                  Threshold(%): 70

      Violations: 0            Shared: 0                Headroom: 0

      XoffThres: 127968        IsDynamic: 0

      Used(%): 0               Free: 127968             UsedPeak: 0

      HeadroomUsed(%): 0       HeadroomFree: 0          HeadroomPeak: 0

...

  Egress:

    QueueID: 0

      Total: 116070            Used: 0                  Threshold(%): 70

      Violations: 0            TailDropThres: 116070    IsDynamic: 1

      DeadlockCount: 0         DeadlockRecover: 0

      Used(%): 0               Free: 116070             UsedPeak: 0

...

Table 3 Command output

Field

Description

Ingress

Usage statistics for the ingress buffer.

Egress

Usage statistics for the egress buffer.

Total

Data buffer size allowed for a queue, in number of cell resources.

Used

Data buffer size that has been used by a queue, in number of cell resources.

Threshold(%)

Buffer usage threshold for a queue. The threshold value is the same as the per-interface threshold value.

Violations

Number of threshold violations for a queue.

The value of this field is reset upon a switch reboot.

Shared

Number of cell resources in the shared area used by a queue.

Headroom

Number of cell resources in the headroom area used by a queue.

XoffThres

Back pressure frame triggering threshold in number of cell resources.

IsDynamic

For the inbound direction, this field can be one of the following values:

·     0—Indicates a static back pressure frame triggering threshold.

·     1—Indicates a dynamic back pressure frame triggering threshold.

For the outbound direction, this field can only be 1, which indicates a dynamic tail drop threshold.

Used(%)

Buffer usage in percentage.

Free

Free buffer in number of cell resources.

UsedPeak

Peak used buffer in number of cell resources during the time between two executions of the display buffer usage interface command.

HeadroomUsed(%)

Headroom area usage in percentage for the ingress buffer.

HeadroomFree

Free headroom area in number of cell resources for the ingress buffer.

HeadroomPeak

Peak used headroom area in number of cell resources during the time between two executions of the display buffer usage interface command.

DeadlockCount

Number of times the device entered the PFC deadlock state in the egress buffer.

DeadlockRecover

Number of times the device released the PFC deadlock state in the egress buffer.

 

display qos queue-statistics interface outbound command output

# Display queue-based outgoing traffic statistics for Twenty-FiveGigE 1/0/1.

<Sysname> display qos queue-statistics interface twenty-fivegige 1/0/1 outbound

Interface: Twenty-FiveGigE1/0/1

 Direction: outbound

 Forwarded: 0 packets, 0 bytes

 Dropped: 0 packets, 0 bytes

 Queue 0

  Forwarded: 0 packets, 0 bytes, 0 pps, 0 bps

  Dropped: 0 packets, 0 bytes

  Current queue length: 0 packets

...

Table 4 Command output

Field

Description

Interface

Interface for which queue-based traffic statistics are displayed.

Direction

Direction of traffic for which statistics are collected.

Forwarded

Counts forwarded traffic both in packets and in bytes.

Dropped

Counts dropped traffic both in packets and in bytes.

Current queue length

Number of packets in the queue.

 

Using gRPC to report buffer usage information

About gRPC

gRPC is an open source remote procedure call (RPC) framework initially developed at Google. It uses HTTP 2.0 for transport and provides network device configuration and management methods that support multiple programming languages.

gRPC protocol stack layers

Table 5 describes the gRPC protocol stack layers.

Table 5 gRPC protocol stack layers

Layer

Description

Content layer

Defines the data of the service module.

Two peers must notify each other of the data models that they are using.

Protocol buffer encoding layer

Encodes data by using the protocol buffer code format.

gRPC layer

Defines the protocol interaction format for remote procedure calls.

HTTP 2.0 layer

Carries gRPC.

TCP layer

Provides connection-oriented reliable data links.

 

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. Protocol buffers are like XML and JSON but smaller, faster, and simpler.

Protocol buffer code uses .proto files to describe data structures. From .proto files, you can use a utility such as protoc to generate code in a variety of programming languages, for example, Java, C++, and Python. Based on the generated code, you can develop a collector-side application to receive data from a gRPC client.

gRPC network architecture

As shown in Figure 8, the gRPC network uses the client/server model. It uses HTTP 2.0 for packet transport.

Figure 8 gRPC network architecture

 

The gRPC network uses the following mechanism:

1.     The gRPC server listens to connection requests from clients at the gRPC service port.

2.     A user runs the gRPC client application to log in to the gRPC server, and uses methods provided in the .proto file to send requests.

3.     The gRPC server responds to requests from the gRPC client.

The device can act as the gRPC server or client. In RDMA application scenarios, the device acts as a gRPC client to report RDMA-related events to collectors. This section describes only the configuration procedures for you to configure the device as a gRPC client.

Basic concepts

When the device acts as a gRPC client, the following concepts are involved:

·     Sensor—The device uses sensors to sample data. A sensor path indicates a data source. The device supports the following data sampling types:

¡     Event-triggered sampling—Sensors in a sensor group sample data when certain events occur. For sensor paths of this data sampling type, see NETCONF XML API Event Reference for the module.

¡     Periodic sampling—Sensors in a sensor group sample data at intervals. For sensor paths of this data sampling type, see the NETCONF XML API references for the module except for NETCONF XML API Event Reference.

·     Collector—Collectors are used to receive sampled data from network devices. For the device to communicate with collectors, you must create a destination group and add collectors to the destination group.

·     Subscription—A subscription binds sensor groups to destination groups. Then, the device pushes data from the specified sensors to the collectors.

Restrictions and guidelines

To view the sensor paths supported by the buffer monitor, enter sensor path buffermonitor/? at the CLI. The following are some of the sensor paths:

·     buffermonitor/bufferusages—Reports buffer usage information.

·     buffermonitor/headroomusages—Reports headroom buffer usage information.

·     buffermonitor/portqueoverrunevent—Reports buffer usage alarms for queues on interfaces, including the inbound buffers, outbound buffers, and headroom buffers.

·     buffermonitor/portquedropevent—Reports drop statistics for a queue on an interface in the inbound or outbound direction.

For information about the buffer monitor, see Comware BufferMonitor NETCONF XML API Event Reference and Comware V7 BufferMonitor GRPC API Reference. These references are released together with software versions. Obtain the references for your software version.

Before configuring gRPC to report event-triggered information, you must enable the monitoring and alarm function for the corresponding functions. For the monitoring and alarm functions supported by the data buffer, see "Configuring data buffer monitoring."

Configuring the device as a gRPC client

1.     Enable the gRPC service:

a.     Enter system view.

system-view

b.     Enable the gRPC service.

grpc enable

By default, the gRPC service is disabled.

2.     Configure sensors:

a.     Enter system view.

system-view

b.     Enter telemetry view.

telemetry

c.     Create a sensor group and enter sensor group view.

sensor-group group-name

d.     Specify a sensor path.

sensor path path

To specify multiple sensor paths, execute this command multiple times.

3.     Configure collectors:

a.     Enter system view.

system-view

b.     Enter telemetry view.

telemetry

c.     Create a destination group and enter destination group view.

destination-group group-name

d.     Configure a collector.

IPv4:

ipv4-address ipv4-address [ port port-number ] [ vpn-instance vpn-instance-name ]

IPv6:

ipv6-address ipv6-address [ port port-number ] [ vpn-instance vpn-instance-name ]

To specify multiple collectors, execute this command multiple times.

4.     Configure a subscription:

a.     Enter system view.

system-view

b.     Enter telemetry view.

telemetry

c.     Create a subscription and enter subscription view.

subscription subscription-name

d.     Specify a sensor group.

sensor-group group-name [ sample-interval interval ]

Specify the sample-interval interval option for periodic sensor paths and only for periodic sensor paths.

-     If you specify the option for event-triggered sensor paths, the sensor paths do not take effect.

-     If you do not specify the option for periodic sensor paths, the device does not sample or push data.

e.     Specify a destination group.

destination-group group-name
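The following is a minimal sketch of the preceding procedure. The group names, collector address and port, sample interval, and view prompts are hypothetical, and the example assumes that buffermonitor/bufferusages is a periodic sensor path, so a sample interval is specified.

# Enable gRPC and subscribe a collector to buffer usage data (example values).
<Sysname> system-view
[Sysname] grpc enable
[Sysname] telemetry
[Sysname-telemetry] sensor-group buffer-sensors
[Sysname-telemetry-sensor-group-buffer-sensors] sensor path buffermonitor/bufferusages
[Sysname-telemetry-sensor-group-buffer-sensors] quit
[Sysname-telemetry] destination-group collectors
[Sysname-telemetry-destination-group-collectors] ipv4-address 192.168.10.100 port 50051
[Sysname-telemetry-destination-group-collectors] quit
[Sysname-telemetry] subscription buffer-sub
[Sysname-telemetry-subscription-buffer-sub] sensor-group buffer-sensors sample-interval 10
[Sysname-telemetry-subscription-buffer-sub] destination-group collectors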

Developing a collector-side application to receive and display buffer usage information

For more information, see "Developing a gRPC collector-side application."

See also

For more information about data buffers, see ACL and QoS Configuration Guide and ACL and QoS Command Reference.

For more information about gRPC, see Telemetry Configuration Guide and Telemetry Command Reference.

PFC

About PFC

PFC is required for building a lossless Ethernet network. PFC provides per-hop priority-based flow control. When the device forwards packets, it assigns packets to queues for scheduling and forwarding by looking up packet priorities in priority mapping tables. When the sending rate of packets carrying an 802.1p priority exceeds the receiving rate and the data buffer space on the receiver is insufficient, the receiver sends PFC pause frames to the sender. On receiving the PFC pause frames, the sender stops sending packets with the specified 802.1p priority until it receives PFC XON frames or the aging timer expires. With PFC configured, congestion of one type of packets does not affect normal forwarding of the other types of packets, and different types of packets on a link are forwarded independently.

Mechanism

PFC pause frame generation mechanism

Figure 9 How PFC pause frames are generated

 

As shown in Figure 9, PFC pause frames are generated in the following process when congestion occurs on Port 1 of Device B:

1.     When Port 1 of Device B receives packets from Device A, the memory management unit (MMU) of Device B allocates cell resources to the packets. If PFC is enabled on Device B, Device B counts the cell resources occupied by packets with each 802.1p priority.

 

 

NOTE:

Cell resources are used to store packets. An interface allocates cell resources to packets based on packet sizes. Suppose a cell resource provides 208 bytes. An interface allocates one cell resource to a 128-byte packet and two cell resources to a 300-byte packet.

 

2.     When the cell resources used by packets carrying a certain 802.1p priority exceed the set threshold on Port 1 of Device B, Port 1 of Device B sends PFC pause frames for the 802.1p priority to Device A.

3.     When Device A receives the PFC pause frames for the 802.1p priority, Device A stops sending out packets carrying the 802.1p priority and buffers these packets. If the buffer threshold for the 802.1p priority is reached, Device A sends PFC pause frames for the 802.1p priority to its upstream device, as shown in Figure 10.

Figure 10 PFC pause frame processing between multiple devices

 

Packet priority-to-queue mappings

When a device forwards packets, packets with different priority values are assigned to different queues for scheduling and forwarding. The packet priority-to-queue mappings depend on the priority mapping method configured. The device supports the following priority mapping methods:

·     Configuring priority trust mode—In this method, you can configure an interface to trust the specified type of priority carried in packets. Then, the device looks up the trusted priority type in incoming packets in the priority maps and modifies the priority values in packets based on the priority maps. Packets are scheduled within the device based on the priorities. Available priority trust modes include:

¡     dot1p—Trusts the 802.1p priority carried in packets and uses the 802.1p priority for priority mapping.

¡     dscp—Trusts the DSCP priority carried in IP packets and uses the DSCP priority for priority mapping.

·     Changing port priority—If no priority trust mode is configured for an incoming interface, the port priority of the incoming interface is used for priority mapping. By changing the port priority of an interface, you can change the priority of incoming packets on the interface. Then, packets received on different incoming interfaces can be assigned to the corresponding queues and scheduled with differentiated treatment.

When configuring PFC on an interface, you must configure the interface to trust the 802.1p or DSCP priority carried in packets. When the interface receives Ethernet packets, the interface marks local precedence values for packets according to the priority trust mode and the 802.1Q tagging status of packets. Then, the packets are scheduled based on their local precedence values. Figure 11 shows the detailed process.

 

 

NOTE:

This document describes only the packet priority to local precedence mappings when the interface trusts the 802.1p or DSCP priority carried in packets. For information about the port priority configuration and the drop precedence (used as a reference for dropping packets), see the configuration guide for your device.

 

Figure 11 Packet priority to queue mappings

 

Configuring PFC

Restrictions and guidelines

For PFC to work properly on an interface, you must configure the interface to trust the 802.1p or DSCP priorities carried in packets by using the qos trust { dot1p | dscp } command, and make sure the 802.1p-to-local priority map and DSCP-to-802.1p priority map are the same on all interfaces along the transmission path.

You can configure PFC in both system view and Ethernet interface view. If you configure PFC in system view and Ethernet interface view multiple times, the most recent configuration takes effect.

For IRF and other protocols to operate correctly, as a best practice, do not enable PFC for 802.1p priority 0, 6, or 7.

To perform PFC on an IRF port, configure PFC on the IRF port and the IRF physical interfaces that are bound to the IRF port. For information about IRF, see Virtual Technologies Configuration Guide.

To perform PFC in an overlay network, execute the qos trust tunnel-dot1p command. For information about the overlay network, see VXLAN Configuration Guide. For information about the qos trust tunnel-dot1p command, see ACL and QoS Command Reference.

To avoid packet loss, apply the same PFC configuration to all interfaces that the packets pass through.

Configuring PFC in system view

Restrictions and guidelines

When an emergency failure occurs with the PFC feature on the device, you do not need to disable PFC on interfaces one by one. Instead, you can disable PFC on all interfaces with a single command at the CLI. After the failure is removed, you can re-enable PFC on all interfaces in the same way.

Procedure

1.     Enter system view.

system-view

2.     Enable PFC on all Ethernet interfaces.

priority-flow-control enable { receive | send }

By default, PFC is disabled on all Ethernet interfaces.

3.     Enable PFC for 802.1p priorities on all Ethernet interfaces.

priority-flow-control no-drop dot1p dot1p-list

By default, PFC is disabled for all 802.1p priorities on all Ethernet interfaces.
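For example, the following sketch enables PFC in both directions and for 802.1p priority 5 on all Ethernet interfaces. Priority 5 is an illustrative choice only; avoid priorities 0, 6, and 7 as noted in the restrictions.

<Sysname> system-view
[Sysname] priority-flow-control enable receive
[Sysname] priority-flow-control enable send
[Sysname] priority-flow-control no-drop dot1p 5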

Configuring PFC in Ethernet interface view

1.     Enter system view.

system-view

2.     Enter Ethernet interface view.

interface interface-type interface-number

3.     Configure the interface to trust 802.1p or DSCP priorities carried in packets.

qos trust { dot1p | dscp }

4.     Enable PFC on the Ethernet interface.

priority-flow-control enable { receive | send }

By default, PFC is disabled on an Ethernet interface.

5.     Enable PFC for 802.1p priorities.

priority-flow-control no-drop dot1p dot1p-list

By default, PFC is disabled for all 802.1p priorities.

6.     (Optional.) Set the pause time in PFC pause frames.

priority-flow-control pause-time time-value

By default, the pause time in PFC pause frames is 65535. The unit is the time needed for transmitting 512-bit data on the current interface.
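A minimal per-interface sketch follows. The interface name and 802.1p priority are hypothetical, and the pause time is left at its default.

<Sysname> system-view
[Sysname] interface twenty-fivegige 1/0/1
[Sysname-Twenty-FiveGigE1/0/1] qos trust dot1p
[Sysname-Twenty-FiveGigE1/0/1] priority-flow-control enable receive
[Sysname-Twenty-FiveGigE1/0/1] priority-flow-control enable send
[Sysname-Twenty-FiveGigE1/0/1] priority-flow-control no-drop dot1p 5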

Setting PFC thresholds

About PFC thresholds

By configuring the PFC buffer thresholds, you can avoid tail drop in the sending data buffer when the buffer space is insufficient for incoming packets.

Figure 12 PFC buffer thresholds

 

The device supports the following PFC thresholds:

·     Headroom buffer threshold—Maximum number of cell resources that can be used by packets with a specific 802.1p priority value in a headroom storage space, as indicated by headroom in Figure 12. An interface drops received packets once this threshold is reached.

·     Back pressure frame triggering threshold—Maximum number of cell resources that can be used by packets with a specific 802.1p priority value in a shared storage space, as indicated by XOFF in Figure 12. PFC is triggered to send PFC pause frames when this threshold is reached. The back pressure frame triggering threshold includes the following types:

¡     Dynamic back pressure frame triggering threshold—Maximum cell resources set in percentage.

¡     Static back pressure frame triggering threshold—Maximum cell resources set in an absolute value.

The XoffThres and IsDynamic fields in the display buffer usage interface verbose command output represent the back pressure frame triggering threshold and its type.

·     Offset between the back pressure frame stopping threshold and triggering threshold—As indicated by ingress-threshold-offset in Figure 12. When the number of cell resources used by packets with a specific 802.1p priority value decreases by this offset after PFC is triggered, PFC will be stopped.

·     PFC reserved threshold—As indicated by reserved-buffer in Figure 12. Number of cell resources reserved for packets with a specific 802.1p priority value in a guaranteed storage space.

·     Maximum number of cell resources in a headroom storage space—Number of cell resources allocated to the headroom storage space in a storage pool (only pool 0 is supported in the current software version).

Restrictions and guidelines

After PFC is enabled for 802.1p priorities, the PFC thresholds use the default values, which are adequate in typical network environments. As a best practice, change the PFC thresholds only when necessary. If the network environment or traffic is complicated, consult the professionals to adjust the PFC thresholds.

Headroom buffer threshold

The recommended headroom buffer threshold for an interface depends on the transmission speed and transmission distance of the interface.

·     For a 100-GE interface:

¡     When the transmission distance is 300 meters, you only need to execute the priority-flow-control no-drop dot1p command on the interface. This command automatically sets the default headroom buffer threshold, which is adequate for implementing zero packet loss for the transmission distance.

¡     When the transmission distance is 10000 meters, you must use the priority-flow-control dot1p headroom command on the interface to set the headroom buffer threshold. As a best practice, set the headroom buffer threshold to 9984. You can adjust the threshold as needed to ensure zero packet loss for the transmission distance.

¡     For transmission distances longer than 10000 meters, the device series does not support transceiver modules in the current software version.

·     For a non-100-GE interface:

¡     When the transmission distance is 300, 10000, or 20000 meters, you only need to execute the priority-flow-control no-drop dot1p command on the interface. This command automatically sets the default headroom buffer threshold, which is adequate for implementing zero packet loss for the transmission distance.

¡     When the transmission distance is 40000 meters, you must use the priority-flow-control dot1p headroom command on the interface to set the headroom buffer threshold. As a best practice, set the headroom buffer threshold to 9984. You can adjust the threshold as needed to ensure zero packet loss for the transmission distance.

To set a more exact headroom buffer threshold, calculate the threshold as follows:

1.     Calculate the size of in-transit traffic (traffic transmitted between the time the receiver sends PFC pause frames and the time the sender receives PFC pause frames and actually stops sending packets). The formula is in-transit traffic (bytes) = MTUR + MTUs + Response + 2*link_delay, where the parameters are defined as follows:

¡     MTUR—Length of a large packet if the chip needs to send a large packet between the time the back pressure frame trigger threshold is triggered and the time the device sends PFC pause frames. Because the large packet might come from any queue, the packet length must be the maximum length of frames allowed to pass through, 9216 bytes.

¡     MTUs—Length of the packet that the upstream device sends immediately before stopping sending packets. Because only packets with the specified 802.1p priority affect the threshold on the receiver, the MTUs is the length of a packet with the 802.1p priority specified by PFC.

¡     Response—Length of data generated within the response time between the time the upstream device receives PFC pause frames and the time the device stops sending packets. The value is fixed at 3840 bytes.

¡     2*link_delay—Length of the forward and backward buffered packets on the link, as shown in the following table:

 

Interface speed | Buffered packets (bytes) per 100 meters of cable
10 Gbps | 1300
25 Gbps | 3250
40 Gbps | 5200
100 Gbps | 13000

 

In conclusion, in-transit traffic (bytes) = 9216 + MTUs + 3840 + N*packets buffered (bytes) in each 100 meter of cable, where N is the cable length (in 100 meters).

2.     Calculate the number of cell resources needed by the in-transit traffic, taking into account the influence of packet length on cell resource allocation. In the extreme case where every packet is 64 bytes long, each packet occupies a whole cell resource, so the most cell resources are consumed. A worked calculation is given after the following list of cell sizes.

¡     On the S6805&S6825&S6850&S9850 switch series, each cell resource provides 256 bytes.

¡     On the S9820-64H switch, each cell resource provides 208 bytes.

¡     On the S9820-8C switch, each cell resource provides 254 bytes.
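The following worked calculation is a hypothetical example only. Assume a 25-Gbps interface on an S6850 switch (256 bytes per cell resource), a 2000-meter cable (N = 20), and an MTUs of 1500 bytes:

In-transit traffic = 9216 + 1500 + 3840 + 20 * 3250 = 79556 bytes.
For full-size packets, roughly 79556 / 256 ≈ 311 cell resources are needed.
In the extreme case of 64-byte packets (one cell resource each), roughly 79556 / 64 ≈ 1244 cell resources are needed, so set the headroom buffer threshold with this worst case in mind.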

Maximum number of available cell resources in the headroom

The headroom resources are shared among, and can be preempted by, all ports and all queues on the device. Consider the headroom space needed by each port and the number of ports, and then choose the smallest value that meets the requirements.

Back pressure frame triggering threshold

When determining the back pressure frame triggering threshold, analyze the network traffic model, and make sure the sum of back pressure frame triggering thresholds on all incoming interfaces does not exceed the tail drop threshold of the outgoing interface. The tail drop threshold of the outgoing interface is configured by using the buffer egress cell queue shared ratio command, and is 20% by default. In a lossless Ethernet network, the value can be set to 100%. Calculate the proportion of traffic on each incoming interface to the total buffer. Then, obtain the dynamic back pressure frame triggering threshold (in percentage) based on Table 6. Finally, adjust the threshold under actual traffic conditions to obtain the optimal value, and configure that value on the incoming interface.

Table 6 Dynamic back pressure frame triggering thresholds

Percentage (dynamic back pressure frame triggering threshold) | Alpha value (ratio of cells in a queue to available cells, configured by the chip register) | Proportion of total buffer (%)
0 | 1/128 | 0.77
1 | 1/64 | 1.53
2 to 3 | 1/32 | 3.03
4 to 5 | 1/16 | 5.88
6 to 11 | 1/8 | 11.11
12 to 20 | 1/4 | 20.00
21 to 33 | 1/2 | 33.33
34 to 50 | 1 | 50.00
51 to 66 | 2 | 66.66
67 to 80 | 4 | 80.00
81 to 100 | 8 | 88.88

 

Additionally, when PFC is used together with ECN, for ECN to take effect first, make sure the back pressure frame triggering threshold (the number of cell resources) is greater than the high-limit value configured in the queue queue-id [ drop-level drop-level ] low-limit low-limit high-limit high-limit [ discard-probability discard-prob ] command.

Calculate the number of cell resources for the back pressure frame triggering threshold by using the formula Used = Total*Alpha/(1 + N*Alpha), where the parameters are defined as follows (a worked example follows the parameter list):

·     Used—Buffer size used by a single flow when multiple flows are congested.

·     Total—Total buffer of the device, which is 128K cell resources for the S6805&S6825&S6850&S9850 switch series, 204K cell resources for the S9820-64H switch, and 258K cell resources for the S9820-8C switch.

·     Alpha—Ratio of cells in a queue to available cells.

·     N—Number of congested flows that can be processed by the network.
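For example, assume an S6850 switch (Total = 128K = 131072 cell resources), an Alpha value of 1/8 (corresponding to a dynamic threshold percentage of 6 to 11 in Table 6), and N = 8 congested flows. These values are hypothetical:

Used = 131072 * (1/8) / (1 + 8 * 1/8) = 16384 / 2 = 8192 cell resources per congested flow.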

Offset between the back pressure frame stopping threshold and triggering threshold

Configure the offset as a value slightly greater than the maximum length of a packet with the specified priority. Too large an offset will slow down the back pressure frame stopping speed and affect the traffic sending speed.

PFC reserved threshold

As a best practice, configure the PFC reserved threshold as a value slightly greater than the maximum length of a packet with the specified priority. You can calculate the PFC reserved threshold value as follows: (maximum length of a packet carrying the specified priority + packet header length (64 bytes) + bytes of one cell resource)/(bytes of one cell resource).
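For example, assuming a hypothetical maximum packet length of 1500 bytes for the specified priority on a switch with 256-byte cell resources:

PFC reserved threshold = (1500 + 64 + 256) / 256 ≈ 7.1, so configure a value of 8 cell resources.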

Procedure

1.     Enter system view.

system-view

2.     Set the maximum number of cell resources in a headroom storage space.

priority-flow-control poolID pool-number headroom headroom-number

By default, the maximum number of cell resources in the headroom area is 12288 on the S6805&S6825&S6850&S9850 switch series and the S9820-8C switch, and is 28672 on the S9820-64H switch.

3.     Enter Ethernet interface view.

interface interface-type interface-number

4.     Set the headroom buffer threshold.

priority-flow-control dot1p dot1p-list headroom headroom-number

For the default settings, see Table 7 and Table 8.

5.     Set the back pressure frame triggering threshold.

¡     Set the dynamic back pressure frame triggering threshold.

priority-flow-control dot1p dot1p-list ingress-buffer dynamic ratio

For the default settings, see Table 7 and Table 8.

¡     Set the static back pressure frame triggering threshold.

priority-flow-control dot1p dot1p-list ingress-buffer static threshold

For the S6850&S9850 switch series, the static back pressure frame triggering threshold is 512 in versions earlier than R6616. For the S6850&S9850 switch series, the static back pressure frame triggering threshold is not configured in R6616 or later.

For the S6805, S6825, S9820-64H, and S9820-8C switches, the static back pressure frame triggering threshold is not configured by default.

6.     Set the offset between the back pressure frame stopping threshold and triggering threshold.

priority-flow-control dot1p dot1p-list ingress-threshold-offset offset-number

For the default settings, see Table 7 and Table 8.

When the number of cell resources used by packets with a specific 802.1p priority value decreases by this offset after PFC is triggered, the interface stops sending PFC pause frames and the peer device resumes packet sending.

7.     Set the PFC reserved threshold.

priority-flow-control dot1p dot1p-list reserved-buffer reserved-number

For the default settings, see Table 7 and Table 8.
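The following sketch shows the command sequence on a hypothetical 100-GE interface. The values simply restate the R6616 defaults for the S6805&S6825&S6850&S9850 series from Table 7 for illustration; change the thresholds only when necessary and under professional guidance, as noted above.

<Sysname> system-view
[Sysname] priority-flow-control poolID 0 headroom 12288
[Sysname] interface hundredgige 1/0/1
[Sysname-HundredGigE1/0/1] priority-flow-control dot1p 5 headroom 491
[Sysname-HundredGigE1/0/1] priority-flow-control dot1p 5 ingress-buffer dynamic 5
[Sysname-HundredGigE1/0/1] priority-flow-control dot1p 5 ingress-threshold-offset 12
[Sysname-HundredGigE1/0/1] priority-flow-control dot1p 5 reserved-buffer 17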

Table 7 Default PFC threshold settings (R6616 and later)

Product series | Interface type | Headroom buffer threshold | Dynamic back pressure frame triggering threshold | Offset between the back pressure frame stopping threshold and triggering threshold | PFC reserved threshold
S6805&S6825&S6850&S9850 | 1GE/10GE | 100 | 5 | 12 | 17
S6805&S6825&S6850&S9850 | 25GE | 125 | 5 | 12 | 17
S6805&S6825&S6850&S9850 | 40GE | 200 | 5 | 12 | 17
S6805&S6825&S6850&S9850 | 100GE | 491 | 5 | 12 | 17
S9820-8C | 100GE | 491 | 5 | 12 | 20
S9820-8C | 400GE | 1000 | 5 | 12 | 20
S9820-64H | 25GE | 125 | 5 | 12 | 20
S9820-64H | 100GE | 491 | 5 | 12 | 20

 

Table 8 Default PFC thresholds (earlier than R6616)

Product series | Interface type | Headroom buffer threshold | Dynamic back pressure frame triggering threshold | Offset between the back pressure frame stopping threshold and triggering threshold | PFC reserved threshold
S6805&S6825&S6850&S9850&S9820-8C | All interfaces | 8192 | Unconfigured | 48 | 6
S9820-64H | All interfaces | 9984 | Unconfigured | 48 | 8

 

Configuring PFC deadlock detection

About PFC deadlock detection

Typically, when an interface is congested and the PFC back pressure frame triggering threshold is reached, the device sends PFC pause frames to the upstream device. Upon receiving the PFC pause frames, the upstream device stops sending packets. If the back pressure frame triggering threshold is reached on the upstream device, the upstream device sends PFC pause frames to its own upstream device, and so on. This process is repeated until the server that sends the packets receives the PFC pause frames. The server then stops sending packets for the PFC pause time specified in the PFC pause frames. This process reduces the packet sending rate at the source end and eliminates packet loss caused by congestion on network nodes.

In some special conditions, for example, when a link or device fails, a temporary loop might appear during the route convergence period. In this case, devices repeatedly send and receive PFC pause frames, and packets cannot be forwarded. As a result, the cell resources in the interface buffer cannot be released, and the device enters the PFC deadlock state.

As shown in Figure 13, the original path for traffic from Server 1 to Server 2 is Server 1—>Device C—>Device A—>Device D—>Server 2. Suppose the link between Device D and Server 2 fails. During the route convergence period, a forwarding path Server 1—>Device C—>Device A—>Device D—>Device B—>Device C—>Server 2 appears. Therefore, a loop Device C—>Device A—>Device D—>Device B—>Device C exists. If the interface connecting Device C to Server 2 is congested, Device C sends PFC pause frames to the upstream device. Then, these PFC pause frames will be forwarded in the loop mentioned above, and each device along the loop stops sending packets and waits for the peer to release cell resources. In this case, the devices along the loop enter the PFC deadlock state.

Figure 13 How a PFC deadlock is generated

 

To remove the PFC deadlock on these devices, disable PFC or ignore the received PFC pause frames (which notify the devices to stop sending packets) so that packet sending resumes on the devices, and forward or drop the packets in the buffer. The PFC deadlock detection interval can be accurate to the millisecond level, which effectively reduces the impact of PFC deadlocks.

Restrictions and guidelines

The specified CoS value must be within the 802.1p priority list specified by using the priority-flow-control no-drop dot1p command. To view the 802.1p priority for each CoS value, execute the display qos map-table dot1p-lp command.

PFC deadlock detection tasks at a glance

To configure PFC deadlock detection, perform the following tasks:

1.     Configuring the PFC deadlock detection interval

2.     Configuring the delay timer for PFC deadlock detection automatic recovery

When the PFC deadlock detection recovery mode is set to auto, the device ignores the received PFC pause frames and disables PFC deadlock detection within the delay timer. Then, packets can be forwarded properly.

3.     Configuring the PFC deadlock recovery mode

After the PFC deadlock is removed, you must recover the PFC deadlock detection feature, which recovers PFC at the same time.

4.     Enabling PFC deadlock detection

Configuring the PFC deadlock detection interval

About this task

The PFC deadlock detection interval for a CoS value is the product of the interval argument configured by using the priority-flow-control deadlock cos interval command and the precision configured by using the priority-flow-control deadlock precision command. For example, if you execute the priority-flow-control deadlock cos 5 interval 10 command to set the interval argument to 10 for CoS priority 5 and execute the priority-flow-control deadlock precision high command to set the precision to high (which represents 10 milliseconds), the PFC deadlock detection interval for CoS priority 5 is 10 × 10 = 100 milliseconds.

Procedure

1.     Enter system view.

system-view

2.     Set the precision for the PFC deadlock detection timer.

priority-flow-control deadlock precision { high | normal | low }

By default, the PFC deadlock detection timer uses normal precision.

high: Specifies the high precision for the PFC deadlock detection timer. The high precision represents 10 ms.

normal: Specifies the normal precision for the PFC deadlock detection timer. The normal precision represents 100 ms.

low: Specifies the low precision for the PFC deadlock detection timer. This keyword is not supported in the current software version.

3.     Set the PFC deadlock detection interval for the specified CoS value.

priority-flow-control deadlock cos cos-value interval interval [ pause-recover ]

By default, the PFC deadlock detection interval is not set.

interval interval: Specifies the PFC deadlock detection interval in the range of 1 to 15.

pause-recover: Automatically recovers the PFC feature and PFC deadlock detection feature based on whether an interface receives PFC pause frames.

¡     If an interface is in PFC deadlock state and can still receive PFC pause frames when the detection interval expires, the interface is considered not recovered and stays in PFC deadlock state.

¡     If an interface is in PFC deadlock state and receives no PFC pause frames when the detection interval expires, the interface is considered recovered, and the PFC feature and PFC deadlock detection feature will be automatically recovered on the interface.

If you do not specify this keyword, the PFC feature and PFC deadlock detection feature are automatically recovered on an interface when the detection interval expires, whether or not the interface receives PFC pause frames. The pause-recover keyword is supported only in Release 6616 and later.
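
The following is a minimal sketch that combines the two commands (the device name Sysname and the choice of CoS 5 are placeholders) to produce the 100-millisecond interval from the example above:

# Set the detection timer precision to high (10 ms) and set the interval argument to 10 for CoS 5, which yields a 10 × 10 = 100 ms detection interval.

<Sysname> system-view

[Sysname] priority-flow-control deadlock precision high

[Sysname] priority-flow-control deadlock cos 5 interval 10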

Configuring the delay timer for PFC deadlock detection automatic recovery

About this task

When PFC deadlock is detected on a device, the device ignores received PFC XOFF frames and disables PFC deadlock detection during the delay timer for PFC deadlock detection automatic recovery. Then, packets can be forwarded properly.

Restrictions and guidelines

The delay timer for PFC deadlock detection automatic recovery for the specified priority is equal to 100 milliseconds + delay-interval × 100 milliseconds. For example, if you execute the priority-flow-control deadlock auto-recover cos 5 delay 10 command, the delay timer for CoS priority 5 is 100 milliseconds + 10 × 100 milliseconds = 1100 milliseconds.

The delay timer takes effect only when the recovery mode is set to auto. In manual mode, you must execute the priority-flow-control deadlock recover command to recover the PFC deadlock detection feature.

The action configured by using the priority-flow-control deadlock auto-recover action command takes effect in both automatic recovery mode and manual recovery mode.

Procedure

1.     Enter system view.

system-view

2.     Configure the delay timer for PFC deadlock detection automatic recovery.

priority-flow-control deadlock auto-recover cos cos-value delay delay-interval

By default, the delay timer for PFC deadlock detection automatic recovery is not configured.

delay delay-interval: Specifies the delay timer for PFC deadlock detection automatic recovery, in the range of 1 to 15.

3.     Configure the action to take on packets during the delay timer period for PFC deadlock automatic recovery.

priority-flow-control deadlock auto-recover action { discard | forwarding }

By default, the device forwards received data packets during the delay timer period for PFC deadlock detection automatic recovery.
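
As a minimal sketch of this procedure (Sysname is a placeholder; the delay value 10 matches the calculation in the restrictions above), the following commands set a 1100-millisecond recovery delay for CoS 5 and keep the default forwarding action:

<Sysname> system-view

[Sysname] priority-flow-control deadlock auto-recover cos 5 delay 10

[Sysname] priority-flow-control deadlock auto-recover action forwarding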

Configuring the PFC deadlock recovery mode

About this task

After the PFC deadlock is removed on a device, PFC deadlock detection must be recovered, and PFC is recovered at the same time. PFC deadlock detection can be recovered in automatic mode or manual mode.

Typically, use the automatic recovery mode when no serious failures occur. When a packet loop cannot be removed and the device enters the PFC deadlock state frequently, manually recover PFC deadlock detection on an interface as follows:

1.     Set the PFC deadlock detection recovery mode to manual on the interface.

2.     Troubleshoot the device. After the failures are solved, execute the priority-flow-control deadlock recover command to recover the PFC deadlock detection and PFC features on the interface.

After the upper threshold for PFC deadlock times during the specified period is configured, if the PFC deadlock times within the specified period exceed the upper threshold on an interface, the device disables PFC for the corresponding priority on the interface. To recover the PFC feature on the interface in this case, troubleshoot the device and execute the undo priority-flow-control deadlock threshold command after the failures are solved.

Procedure

1.     Enter system view.

system-view

2.     Configure the upper threshold for PFC deadlock times during the specified period.

priority-flow-control deadlock threshold cos cos-value period period count count

By default, the upper threshold for PFC deadlock times during the specified period is not configured.

period period: Specifies the period for detecting PFC deadlock times, in the range of 1 to 60 seconds.

count count: Specifies the upper threshold for PFC deadlock times within the specified period, in the range of 1 to 500.

The detection period specified in this command must be longer than the PFC deadlock detection interval (product of the interval argument configured by using the priority-flow-control deadlock cos interval command and the precision configured by using the priority-flow-control deadlock precision command), so that you can determine whether the device frequently enters the PFC deadlock state.

3.     Enter Ethernet interface view.

interface interface-type interface-number

4.     Set the recovery mode for PFC deadlock detection on the Ethernet interface.

priority-flow-control deadlock recover-mode { auto | manual }

By default, PFC deadlock detection recovers in automatic mode.

5.     (Optional.) Recover PFC deadlock detection on the Ethernet interface.

priority-flow-control deadlock recover

If you set the recovery mode to manual on the Ethernet interface, this command is the only way to recover PFC deadlock detection on the interface.

Enabling PFC deadlock detection

1.     Enter system view.

system-view

2.     Enter Ethernet interface view.

interface interface-type interface-number

3.     Enable PFC deadlock detection on the Ethernet interface.

priority-flow-control deadlock enable

By default, PFC deadlock detection is disabled.

PFC deadlock detection logs

Message text

PFC Deadlock Recovery Event Begin.Port is [STRING].

Variable fields

$1: Ethernet interface number.

Severity level

4

Example

DRVPLAT/4/DrvDebug:

PFC Deadlock Recovery Event Begin.Port is HGE1/1/3.

Explanation

The device detected a PFC deadlock event on an Ethernet interface.

Recommended action

No action is required.

 

Message text

PFC Deadlock Recovery Event End.Port is [STRING].

Variable fields

$1: Ethernet interface number.

Severity level

4

Example

DRVPLAT/4/DrvDebug

PFC Deadlock Recovery Event End.Port is HGE1/1/3.

Explanation

PFC deadlock was removed from an Ethernet interface.

Recommended action

No action is required.

 

Message text

Please recover PFC deadlock.Port is [STRING].

Variable fields

$1: Ethernet interface number.

Severity level

4

Example

DRVPLAT/4/DrvDebug

Please recover PFC deadlock.Port is HGE1/1/3

Explanation

PFC deadlock detection recovery was not completed on an Ethernet interface. In this case, other commands cannot be executed to modify the PFC deadlock detection status.

Recommended action

No action is required.

 

Message text

PFC deadlock limit has been checked, Ifname: [STRING], Cos: [UINT32], times: [UINT32].

Variable fields

$1: Ethernet interface number.

$2: CoS priority. To see the priority mapping between CoS priority and 802.1p priority, execute the display qos map-table dot1p-lp command.

$3: PFC deadlock times.

Severity level

4

Example

DRVPLAT/4/DrvDebug

PFC deadlock limit has been checked, Ifname: HGE1/1/3, Cos: 5, times: 10.

Explanation

The number of PFC deadlock times for priority 5 has exceeded the upper limit within the specified detection period.

Recommended action

1.     Troubleshoot the network to solve the failures. The device will disable PFC for the corresponding priority on the corresponding interface.

2.     After the failures are solved, execute the undo priority-flow-control deadlock threshold command to recover PFC for the corresponding priority on the corresponding interface.

 

Configuring the early warning thresholds for PFC packets

About this task

You can configure early warning thresholds for incoming and outgoing PFC packets on an interface as needed. An early warning threshold identifies a PFC packet rate that is not yet critical but deserves attention.

When the rate of PFC packets that an interface sends or receives reaches the early warning threshold, the system generates traps and logs to notify the user. According to the traps and logs, the user can discover some exceptions in the network, for example:

·     The NIC of the peer device fails and continuously sends PFC packets at a high speed. In this case, you can set the early warning threshold for incoming PFC packets.

·     The device fails and continuously sends PFC pause frames. In this case, you can set the early warning threshold for outgoing PFC packets.

To monitor bidirectional PFC packets, you can set the early warning thresholds for incoming packets and outgoing packets separately.

Procedure

1.     Enter system view.

system-view

2.     Enter Ethernet interface view.

interface interface-type interface-number

3.     Configure the early warning threshold for incoming PFC packets.

priority-flow-control early-warning dot1p dot1p-list inpps pps-value

By default, no early warning threshold is configured for incoming PFC packets.

4.     Configure the early warning threshold for outgoing PFC packets.

priority-flow-control early-warning dot1p dot1p-list outpps pps-value

By default, no early warning threshold is configured for outgoing PFC packets.
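
The following is a minimal sketch of this procedure. The interface name and the threshold values (1000 pps in each direction) are examples only; choose values that fit the PFC packet rates expected in your network.

# Set the early warning thresholds for incoming and outgoing PFC packets of 802.1p priority 5 to 1000 pps.

<Sysname> system-view

[Sysname] interface twenty-fivegige 1/0/1

[Sysname-Twenty-FiveGigE1/0/1] priority-flow-control early-warning dot1p 5 inpps 1000

[Sysname-Twenty-FiveGigE1/0/1] priority-flow-control early-warning dot1p 5 outpps 1000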

Displaying PFC information at the CLI and reporting PFC information by using gRPC

Displaying PFC information at the CLI

Overview

Task

Command

Display the PFC configuration on each interface and the number and rate of incoming or outgoing PFC pause frames for each interface or queue

display priority-flow-control

Display the total number of dropped packets and the number of packets dropped on each interface, for both received and sent packets

display packet-drop

 

display priority-flow-control command output

# Display the PFC information for all Ethernet interfaces.

<Sysname> display priority-flow-control interface

Conf -- Configured mode   Ne -- Negotiated mode   P -- Priority

Interface     Conf Ne  Dot1pList   P Recv       Sent       Inpps      Outpps

WGE1/0/1      Auto On  0,2-3,5-6   0 178        43         12         15

Table 9 Command output

Field

Description

Conf -- Configured mode

Locally configured PFC status.

Ne -- Negotiated mode

Negotiated PFC status.

P -- Priority

802.1p priority value for which PFC is enabled.

Interface

Abbreviated name of the interface.

Conf

Locally configured PFC status:

·     Auto—The interface is configured to autonegotiate the PFC status with the remote end.

·     Off—PFC is disabled for the interface.

·     On—PFC is enabled for the interface.

Ne

Negotiated PFC status:

·     Off—PFC is disabled.

·     On—PFC is enabled.

Dot1pList

802.1p priorities that are enabled with PFC. 802.1p priority values 0 through 7 are available.

P

An 802.1p priority is displayed only when the 802.1p priority is enabled with PFC and the interface has received or sent packets with the 802.1p priority.

Recv

Number of received PFC pause frames.

Sent

Number of sent PFC pause frames.

Inpps

Incoming PFC pause frame rate in pps for the 802.1p priority.

Outpps

Outgoing PFC pause frame rate in pps for the 802.1p priority.

 

display packet-drop command output

# Display information about dropped packets on Twenty-FiveGigE 1/0/1.

<Sysname> display packet-drop interface twenty-fivegige 1/0/1

Twenty-FiveGigE1/0/1:

  Packets dropped due to Fast Filter Processor (FFP): 261

  Packets dropped due to STP non-forwarding state: 0

  Packets dropped due to insufficient data buffer. Input dropped: 0 Output dropped:0

  Packets of ECN marked: 0

  Packets of WRED dropped: 0

# Display the summary of dropped packets on the interfaces that support this command.

<Sysname> display packet-drop summary

All interfaces:

  Packets dropped due to Fast Filter Processor (FFP): 261

  Packets dropped due to STP non-forwarding state: 0

  Packets dropped due to insufficient data buffer. Input dropped: 0 Output dropped:0

  Packets of ECN marked: 0

  Packets of WRED dropped: 0

Table 10 Command output

Field

Description

Packets dropped due to Fast Filter Processor (FFP)

Packets that are dropped due to FFP in the inbound direction.

Packets dropped due to STP non-forwarding state

Packets that are dropped because STP is in the non-forwarding state.

Packets dropped due to insufficient data buffer. Input dropped: 0 Output dropped:0

Inbound and outbound packets that are dropped due to insufficient data buffer.

Packets of ECN marked

Packets with the ECN field set to 11 because WRED queue thresholds are reached. For more information about WRED and ECN, see ACL and QoS Configuration Guide.

Packets of WRED dropped

Packets that are dropped because the WRED queue thresholds are reached.

 

Reporting PFC information by using gRPC

For more information about gRPC and data buffer monitoring information, see "Using gRPC to report buffer usage information."

To use gRPC to report the total number of PFC pause frames and the rate of PFC pause frames, specify the periodic sensor paths buffermonitor/pfcstatistics and buffermonitor/pfcspeeds. For more information about the two sensor paths, see Comware V7 BufferMonitor GRPC API Reference.
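
The following telemetry sketch subscribes a collector to the two PFC sensor paths. It is patterned on the data buffer monitoring configuration in this document; the group and subscription names, the collector address 192.168.2.1, port 50050, and the 10-second sampling interval are example values.

# Enable gRPC, add the PFC sensor paths to a sensor group, and bind the sensor group and a collector to a subscription.

<Sysname> system-view

[Sysname] grpc enable

[Sysname] telemetry

[Sysname-telemetry] sensor-group pfc-stats

[Sysname-telemetry-sensor-group-pfc-stats] sensor path buffermonitor/pfcstatistics

[Sysname-telemetry-sensor-group-pfc-stats] sensor path buffermonitor/pfcspeeds

[Sysname-telemetry-sensor-group-pfc-stats] quit

[Sysname-telemetry] destination-group collector1

[Sysname-telemetry-destination-group-collector1] ipv4-address 192.168.2.1 port 50050

[Sysname-telemetry-destination-group-collector1] quit

[Sysname-telemetry] subscription pfc

[Sysname-telemetry-subscription-pfc] sensor-group pfc-stats sample-interval 10

[Sysname-telemetry-subscription-pfc] destination-group collector1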

See also

For more information about PFC configuration and commands, see Ethernet interface configuration in Layer 2—LAN Switching Configuration Guide and Ethernet interface commands in Layer 2—LAN Switching Command Reference.

ECN

About ECN

ECN is required for building a lossless Ethernet network. ECN defines a traffic control and end-to-end congestion notification mechanism based on the IP layer and transport layer. ECN uses the DS field in the IP header to mark the congestion status along the packet transmission path. An ECN-capable terminal can determine whether congestion occurs on the transmission path according to the packet contents. Then, the terminal adjusts the packet sending speed to avoid deteriorating congestion.

Mechanism

ECN defines the last two bits (ECN field) in the DS field of the IP header as follows:

·     Bit 6 indicates whether the sending terminal device supports ECN, and is called the ECN-Capable Transport (ECT) bit.

·     Bit 7 indicates whether the packet has experienced congestion along the transmission path, and is called the Congestion Experienced (CE) bit.

 

 

NOTE:

In actual applications, the following packets are considered as packets that an ECN-capable terminal sends:

·     Packets with ECT set to 1 and CE set to 0.

·     Packets with ECT set to 0 and CE set to 1.

 

Figure 14 DS field location

 

Figure 15 ECN field location

 

ECN works in the following flow:

1.     The sender sets the ECN field to 10, and notifies the devices along the transmission path and the receiver that the sender supports ECN.

2.     When congestion occurs on an intermediate device, the congested device sets the ECN field to 11 for congested packets, and normally forwards the packets.

3.     When the receiver receives packets with the ECN field set to 11, the transport layer sends congestion notification packets (CNPs) to the sender.

4.     When receiving the CNPs, the sender slows down the sending rate of packets with the specified priority.

5.     After a configurable period of time elapses or a specified number of packets have been sent, the sender resumes the original sending rate.

Figure 16 ECN working mechanism

 

An ECN-enabled forwarding device recognizes and processes received data packets in the following manner:

·     When packets of the forwarding device are enqueued in the outbound direction and the queue length is smaller than the lower threshold (ECN threshold), the forwarding device forwards the packets out of the outgoing interface directly without any processing.

·     When packets of the forwarding device are enqueued in the outbound direction and the queue length is greater than the lower threshold but smaller than the upper threshold, the following rules apply:

¡     If the ECN field value of the received packets is 00, which indicates that the sender does not support ECN, the forwarding device calculates the probability of dropping the packets.

¡     If the ECN field value of the received packets is 01 or 10, which indicates that the sender supports ECN, the forwarding device modifies the ECN field of some incoming packets to 11 based on the drop probability and continues to forward the modified packets. None of the incoming packets are dropped.

¡     If the ECN field value of the received packets is 11, which indicates that the packets have experienced congestion on a previous forwarding device, the forwarding device does not process the packets and forwards them out of the outgoing interface directly.

·     When packets of the forwarding device are enqueued in the outbound direction and the queue length is greater than the upper threshold, the following rules apply:

¡     If the ECN field value of the received packets is 00, which indicates that the sender does not support ECN, the forwarding device drops the received packets.

¡     If the ECN field value of the received packets is 01 or 10, which indicates that the sender supports ECN, the forwarding device modifies the ECN field of some incoming packets to 11 based on the drop probability and continues to forward the modified packets. None of the incoming packets are dropped.

¡     If the ECN field value of the received packets is 11, which indicates that the packets have experienced congestion on a previous forwarding device, the forwarding device does not process the packets and forwards them out of the outgoing interface directly.

Restrictions and guidelines

For ECN to take effect before PFC when they are both used, make sure the static back pressure frame triggering threshold (in number of cell resources) is greater than the ECN high limit value.

Configuring ECN

1.     Enter system view.

system-view

2.     Create a WRED table and enter its view.

qos wred queue table table-name

3.     (Optional.) Set the WRED exponent for average queue size calculation.

queue queue-id weighting-constant exponent

The default setting is 9.

4.     (Optional.) Configure the other WRED parameters.

queue queue-id [ drop-level drop-level ] low-limit low-limit high-limit high-limit [ discard-probability discard-prob ]

By default, the low limit is 100, the high limit is 1000, and the drop probability is 10%.

5.     Enable ECN for a queue.

queue queue-id ecn

By default, ECN is disabled for a queue.

6.     Return to system view.

quit

7.     Enter interface view.

interface interface-type interface-number

8.     Apply the WRED table to the interface.

qos wred apply [ table-name ]

By default, no WRED table is applied to an interface, and tail drop is used on an interface.

One WRED table can be applied to multiple interfaces. You can modify the parameters of a WRED table applied to an interface, but you cannot delete the WRED table.

Displaying ECN information at the CLI and reporting ECN information by using gRPC

Displaying ECN information at the CLI

Execute the display packet-drop command. The Packets of ECN marked and Packets of WRED dropped fields are about ECN information.

Reporting ECN information by using gRPC

To report ECN information by using gRPC, configure the buffermonitor/ecnandwredstatistics table. For more information about this table, see Comware V7 BufferMonitor GRPC API Reference.
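
The following is a brief sketch that uses the same pattern as the PFC telemetry configuration earlier in this document. The names and the sampling interval are illustrative, and destination group collector1 is assumed to have been configured already.

[Sysname] telemetry

[Sysname-telemetry] sensor-group ecn-stats

[Sysname-telemetry-sensor-group-ecn-stats] sensor path buffermonitor/ecnandwredstatistics

[Sysname-telemetry-sensor-group-ecn-stats] quit

[Sysname-telemetry] subscription ecn

[Sysname-telemetry-subscription-ecn] sensor-group ecn-stats sample-interval 10

[Sysname-telemetry-subscription-ecn] destination-group collector1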

See also

For more information about ECN configuration and commands, see ACL and QoS Configuration Guide and ACL and QoS Command Reference.

DCBX

About DCBX

DCBX is a key feature for building a lossless Ethernet network. Data center Ethernet (DCE) uses Data Center Bridging Exchange Protocol (DCBX) to negotiate and remotely configure the bridging capabilities of network elements. With DCBX, DCB parameters can be negotiated and automatically configured between switches or switches and NICs. DCBX simplifies the network configuration and guarantees configuration consistency.

Mechanism

Figure 17 DCBX configuration in a network

 

DCBX uses Link Layer Discovery Protocol (LLDP) to exchange configuration information between two ends of a link.

To enable DCBX on an interface, first enable LLDP globally and on the interface, and configure the interface to advertise DCBX TLVs. Then, you can configure DCBX to advertise Application Protocol (APP), Enhanced Transmission Selection (ETS), and PFC parameters on the interface as needed. In this document, DCBX is used to advertise ETS parameters.

When configuring DCBX, you must configure the DCBX version, which can be manually configured or autonegotiated. For DCBX to work properly, make sure the DCBX version is the same on the local and peer devices.

Configuring DCBX

1.     Enter system view.

system-view

2.     Enable LLDP globally.

lldp global enable

By default:

¡     If the device is started with the software default settings, LLDP is disabled globally.

¡     If the device is started with the factory default settings, LLDP is enabled globally.

3.     Enter Layer 2 Ethernet interface view.

interface interface-type interface-number

4.     Enable LLDP on the interface.

lldp enable

By default, LLDP is enabled on an interface.

5.     Enable the interface to advertise DCBX TLVs.

lldp tlv-enable dot1-tlv dcbx

By default, DCBX TLV advertisement is disabled on an interface.

6.     Set the DCBX version.

dcbx version { rev100 | rev101 | standard }

By default, the DCBX version is not configured. It is autonegotiated by the local port and peer port.

See also

For more information about DCBX, see LLDP configuration and commands in Layer 2—LAN Switching Configuration Guide and Layer 2—LAN Switching Command Reference.

ETS

About ETS

ETS allocates bandwidth based on priority groups and provides committed bandwidth. To avoid packet loss caused by congestion, the device performs the following operations:

1.     Uses ETS parameters to negotiate with the peer device.

2.     Controls the peer device's transmission speed of the specified type of traffic.

3.     Guarantees that the transmission speed is within the committed bandwidth of the interface.

Mechanism

ETS classifies the priorities of traffic in the network into multiple priority groups and allocates certain bandwidth to each priority group. If the bandwidth allocated to a priority group is not used, the other priority groups can use the unused bandwidth. ETS guarantees bandwidth for important traffic during the transmission procedure.

To configure ETS parameters, perform the following tasks:

1.     Configure the 802.1p-to-local priority mapping by using either of the following methods:

¡     MQC method.

¡     Priority mapping table method.

If you configure the 802.1p-to-local priority mapping in both methods, the configuration made in the MQC method applies. For the QoS policy and priority mapping table configuration commands, see the manual for your device.

2.     Configure group-based WRR queuing to allocate bandwidth. For information about WRR configuration commands, see the manual for your device.

WRR queuing schedules all the queues in turn to ensure that every queue is served for a certain time. Assume an interface provides eight output queues. WRR assigns each queue a weight value (represented by w7, w6, w5, w4, w3, w2, w1, or w0). The weight value of a queue decides the proportion of resources assigned to the queue. On a 100 Mbps interface, you can set the weight values to 50, 50, 30, 30, 10, 10, 10, and 10 for w7 through w0. In this way, the queue with the lowest priority can get a minimum of 5 Mbps of bandwidth.

Another advantage of WRR queuing is that when the queues are scheduled in turn, the service time for each queue is not fixed. If a queue is empty, the next queue will be scheduled immediately. This improves bandwidth resource use efficiency.

WRR queuing includes the following types:

¡     Basic WRR queuing—Contains multiple queues. You can set the weight for each queue, and WRR schedules these queues based on the user-defined parameters in a round robin manner.

¡     Group-based WRR queuing—All the queues are scheduled by WRR. You can divide output queues into WRR priority queue group 1 and WRR priority queue group 2. Round robin queue scheduling is performed for group 1 first. If group 1 is empty, round robin queue scheduling is performed for group 2. Only WRR priority queue group 1 is supported in the current software version.

On an interface enabled with group-based WRR queuing, you can assign queues to the SP group. Queues in the SP group are scheduled with SP. The SP group has higher scheduling priority than the WRR groups.

To configure ETS to guarantee bandwidth for important traffic, you can configure group-based WRR queuing in one of the following methods:

·     Configure a higher weight value for the queue of important traffic in WRR priority queue group 1.

qos wrr queue-id group 1 byte-count schedule-value

·     Assign the queue of important traffic to the SP group.

qos wrr queue-id group sp

Configuring ETS

Configure an 802.1p-to-local priority mapping

About this task

Perform this task to put packets with a specific 802.1p priority into the specified queue.

Restrictions and guidelines

You can perform this task by using either of the following methods:

·     MQC method.

·     Priority mapping table method.

If you perform this task in both methods, the configuration made in the MQC method applies.

Configure an 802.1p-to-local priority mapping by using a QoS policy

1.     Enter system view.

system-view

2.     Create a traffic class and enter traffic class view.

traffic classifier classifier-name operator or

3.     (Optional.) Configure a description for the traffic class.

description text

By default, no description is configured for a traffic class.

4.     Configure an 802.1p priority match criterion.

if-match service-dot1p dot1p-value&<1-8>

By default, no match criterion is configured.

5.     Return to system view.

quit

6.     Create a traffic behavior and enter traffic behavior view.

traffic behavior behavior-name

7.     Configure a local precedence marking action in the traffic behavior.

remark local-precedence local-precedence-value

By default, no marking action is configured for a traffic behavior.

8.     Return to system view.

quit

9.     Create a QoS policy and enter QoS policy view.

qos [ remarking ] policy policy-name

10.     Associate a traffic class with a traffic behavior in the QoS policy.

classifier classifier-name behavior behavior-name mode dcbx [ insert-before before-classifier-name ]

11.     Return to system view.

quit

12.     Enter Ethernet interface view.

interface interface-type interface-number

13.     Apply the QoS policy to the interface.

qos apply [ remarking ] policy policy-name inbound [ share-mode ]

By default, no QoS policy is applied to an interface.
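
The following is a minimal sketch of the MQC method that puts packets with 802.1p priority 5 into local precedence (queue) 5. The class, behavior, and policy names, the interface, and the view prompts are illustrative.

<Sysname> system-view

[Sysname] traffic classifier dot1p5 operator or

[Sysname-classifier-dot1p5] if-match service-dot1p 5

[Sysname-classifier-dot1p5] quit

[Sysname] traffic behavior lp5

[Sysname-behavior-lp5] remark local-precedence 5

[Sysname-behavior-lp5] quit

[Sysname] qos policy rdma

[Sysname-qospolicy-rdma] classifier dot1p5 behavior lp5 mode dcbx

[Sysname-qospolicy-rdma] quit

[Sysname] interface twenty-fivegige 1/0/1

[Sysname-Twenty-FiveGigE1/0/1] qos apply policy rdma inbound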

Configure an 802.1p-to-local priority mapping by using a mapping table

1.     Enter system view.

system-view

2.     Enter Ethernet interface view.

interface interface-type interface-number

3.     Configure the priority trust mode as dot1p.

qos trust dot1p

By default, an interface does not trust any packet priority and uses the port priority as the 802.1p priority for mapping.

4.     Return to system view.

quit

5.     Enter the view of the dot1p-lp mapping table.

qos map-table dot1p-lp

6.     Configure an 802.1p-to-local priority mapping for the mapping table.

import import-value-list export export-value

By default, the default priority maps are used. For more information, see "Default priority map."
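
The following is a minimal sketch of the mapping-table method. The interface and the view prompt are illustrative; the mapping of 802.1p priority 5 to local precedence 5 shown here is also the default mapping.

<Sysname> system-view

[Sysname] interface twenty-fivegige 1/0/1

[Sysname-Twenty-FiveGigE1/0/1] qos trust dot1p

[Sysname-Twenty-FiveGigE1/0/1] quit

[Sysname] qos map-table dot1p-lp

[Sysname-maptbl-dot1p-lp] import 5 export 5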

Configuring SP+WRR queuing

1.     Enter system view.

system-view

2.     Enter Ethernet interface view.

interface interface-type interface-number

3.     Enable byte-count WRR queuing.

qos wrr byte-count

By default, byte-count WRR queuing is enabled.

4.     Assign a queue to the SP group.

qos wrr queue-id group sp

By default, all queues on a WRR-enabled interface are in WRR group 1.

5.     Assign a queue to the WRR group, and configure a scheduling weight for the queue.

qos wrr queue-id group 1 byte-count schedule-value

By default, all queues on a WRR-enabled interface are in WRR group 1, and queues 0 through 7 have a weight of 1, 2, 3, 4, 5, 9, 13, and 15, respectively.
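
For example, the following sketch assigns queue 5 (important traffic) to the SP group and gives queue 6 a larger byte-count weight in WRR group 1. The interface and the weight value 10 are illustrative.

<Sysname> system-view

[Sysname] interface twenty-fivegige 1/0/1

[Sysname-Twenty-FiveGigE1/0/1] qos wrr byte-count

[Sysname-Twenty-FiveGigE1/0/1] qos wrr 5 group sp

[Sysname-Twenty-FiveGigE1/0/1] qos wrr 6 group 1 byte-count 10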

See also

For more information about the ETS function, see LLDP in Layer 2—LAN Switching Configuration Guide.

For more information about the commands, see ACL and QoS Command Reference.

Example: Configuring RDMA

Network configuration

As shown in Figure 18, Server 1, Server 2, and Server 3 all have RoCE NICs installed. Server 1 and Server 2 are connected to Server 3 through S6850 switches Device A and Device B. Device B is connected to the gRPC server.

Configure the network as a lossless Ethernet network to support RoCE as follows:

·     Enable PFC on all interfaces along the packet transmission paths. This example enables lossless transmission for packets with 802.1p priority 5.

·     Enable DCBX on the interfaces connecting switches to servers, so that the switches and server NICs can negotiate ETS and PFC parameters.

·     Configure ETS to guarantee bandwidth for packets with 802.1p priority 5 on Twenty-FiveGigE 1/0/3 of Device A and Twenty-FiveGigE 1/0/2 of Device B.

 

 

NOTE:

In this example, suppose the traffic from Server 1 and Server 2 to Server 3 is more than the reverse traffic. Therefore, ETS is configured only on the interfaces mentioned above. If traffic is unpredictable in actual conditions, you can configure ETS on all interfaces in the network.

 

·     Configure ECN on Twenty-FiveGigE 1/0/3 of Device A, so that Device A can mark packets with an ECN flag and notify the senders to adjust the packet sending rate when congestion occurs on Device A.

 

 

NOTE:

In this example, congestion might occur on Twenty-FiveGigE 1/0/3 of Device A, so ECN is configured only on the interface. If the congestion positions are unpredictable in actual conditions, you can configure ECN on all interfaces in the network.

 

·     Configure data buffer management and monitoring on Device A and Device B to ensure that packets with 802.1p priority 5 are not dropped because of insufficient buffer and monitor the buffer usages of queues.

Figure 18 Network diagram

 

Procedures

1.     Make sure the devices and collector have been configured with IP addresses and can reach each other at Layer 3.

2.     Configure Device A:

# Configure Twenty-FiveGigE 1/0/1, Twenty-FiveGigE 1/0/2, and Twenty-FiveGigE 1/0/3 to trust 802.1p priorities carried in packets. Enable PFC and enable PFC for 802.1p priority 5 on these interfaces.

<DeviceA> system-view

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/3

[DeviceA-if-range] qos trust dot1p

[DeviceA-if-range] priority-flow-control enable

[DeviceA-if-range] priority-flow-control no-drop dot1p 5

[DeviceA-if-range] quit

# Enable LLDP globally.

[DeviceA] lldp global enable

# Enable LLDP on Twenty-FiveGigE 1/0/1 and Twenty-FiveGigE 1/0/2. Enable these interfaces to advertise DCBX TLVs. Set the DCBX version to Rev. 1.01 on these interfaces.

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2

[DeviceA-if-range] lldp enable

[DeviceA-if-range] lldp tlv-enable dot1-tlv dcbx

[DeviceA-if-range] dcbx version rev101

[DeviceA-if-range] quit

# Enable byte-count WRR on Twenty-FiveGigE 1/0/3. Assign queue 5 (802.1p priority 5 is mapped to local precedence 5 by default) to the SP group.

[DeviceA] interface twenty-fivegige 1/0/3

[DeviceA-Twenty-FiveGigE1/0/3] qos wrr byte-count

[DeviceA-Twenty-FiveGigE1/0/3] qos wrr 5 group sp

[DeviceA-Twenty-FiveGigE1/0/3] quit

# Create WRED table queue-table5. In the WRED table, set the exponent for WRED to calculate the average queue size and WRED parameters, and enable ECN for queue 5. Apply WRED table queue-table5 to Twenty-FiveGigE 1/0/3.

[DeviceA] qos wred queue table queue-table5

[DeviceA-wred-table-queue-table5] queue 5 weighting-constant 12

[DeviceA-wred-table-queue-table5] queue 5 drop-level 0 low-limit 10 high-limit 20 discard-probability 30

[DeviceA-wred-table-queue-table5] queue 5 ecn

[DeviceA-wred-table-queue-table5] quit

[DeviceA] interface twenty-fivegige 1/0/3

[DeviceA-Twenty-FiveGigE1/0/3] qos wred apply queue-table5

# On Twenty-FiveGigE 1/0/3, configure queue 5 to use 15% fixed-area space and 25% shared-area space of cell resources in the egress buffer.

[DeviceA-Twenty-FiveGigE1/0/3] buffer egress cell queue 5 guaranteed ratio 15

[DeviceA-Twenty-FiveGigE1/0/3] buffer egress cell queue 5 shared ratio 25

[DeviceA-Twenty-FiveGigE1/0/3] quit

# Configure service pool 1 to use up to 20% shared-area space of cell resources in the egress buffer on slot 1.

[DeviceA] buffer egress slot 1 cell service-pool 1 shared ratio 20

# Apply manually configured data buffer settings.

[DeviceA] buffer apply

# Map queue 5 to service pool 1 on Twenty-FiveGigE 1/0/3.

[DeviceA] interface twenty-fivegige 1/0/3

[DeviceA-Twenty-FiveGigE1/0/3] buffer egress queue 5 map-to service-pool 1

[DeviceA-Twenty-FiveGigE1/0/3] quit

# Set the per-interface buffer usage threshold to 90% for slot 1. When a queue on an interface uses more buffer space than the set threshold, the system counts one threshold violation for the queue.

[DeviceA] buffer usage threshold slot 1 ratio 90

# Set the ingress buffer alarm threshold to 90% for queue 5 on Twenty-FiveGigE 1/0/1 and Twenty-FiveGigE 1/0/2. When queue 5 exceeds the alarm threshold for the ingress buffer, the device generates and reports a threshold-crossing alarm.

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2

[DeviceA-if-range] buffer ingress usage threshold queue 5 ratio 90

[DeviceA-if-range] quit

# Set the egress buffer alarm threshold to 90% for queue 5 on Twenty-FiveGigE 1/0/3. When queue 5 exceeds the alarm threshold for the egress buffer, the device generates and reports a threshold-crossing alarm.

[DeviceA] interface twenty-fivegige 1/0/3

[DeviceA-Twenty-FiveGigE1/0/3] buffer egress usage threshold queue 5 ratio 90

[DeviceA-Twenty-FiveGigE1/0/3] quit

# Set the alarm threshold to 90% for service pool 1 on slot 1.

[DeviceA] buffer egress usage threshold service-pool 1 slot 1 ratio 90

# Enable threshold-crossing alarms.

[DeviceA] buffer threshold alarm egress enable

# Set the headroom buffer alarm threshold to 90% for queue 5 on Twenty-FiveGigE 1/0/1, Twenty-FiveGigE1/0/2, and Twenty-FiveGigE 1/0/3. When a queue exceeds the alarm threshold, the device generates and reports a threshold-crossing alarm.

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/3

[DeviceA-if-range] buffer usage threshold headroom queue 5 ratio 90

[DeviceA-if-range] quit

# Enable threshold-crossing alarms for the headroom buffer.

[DeviceA] buffer threshold alarm headroom enable

# Enable packet-drop alarms.

[DeviceA] buffer packet-drop alarm enable

# Enable gRPC.

[DeviceA] grpc enable

# Create sensor group test1, and add sensor paths bufferusages and headroomusages.

[DeviceA] telemetry

[DeviceA-telemetry] sensor-group test1

[DeviceA-telemetry-sensor-group-test1] sensor path buffermonitor/bufferusages

[DeviceA-telemetry-sensor-group-test1] sensor path buffermonitor/headroomusages

[DeviceA-telemetry-sensor-group-test1] quit

# Create sensor group test2, and add sensor paths portqueoverrunevent and portquedropevent.

[DeviceA-telemetry] sensor-group test2

[DeviceA-telemetry-sensor-group-test2] sensor path buffermonitor/portqueoverrunevent

[DeviceA-telemetry-sensor-group-test2] sensor path buffermonitor/portquedropevent

[DeviceA-telemetry-sensor-group-test2] quit

# Create destination group collector1, and add a collector that uses IPv4 address 192.168.2.1 and port number 50050 to the destination group.

[DeviceA-telemetry] destination-group collector1

[DeviceA-telemetry-destination-group-collector1] ipv4-address 192.168.2.1 port 50050

[DeviceA-telemetry-destination-group-collector1] quit

# Create subscription A, specify sensor group test1 for the subscription with the data sampling interval 10 seconds, and specify sensor group test2 and destination group collector1 for the subscription.

[DeviceA-telemetry] subscription A

[DeviceA-telemetry-subscription-A] sensor-group test1 sample-interval 10

[DeviceA-telemetry-subscription-A] sensor-group test2

[DeviceA-telemetry-subscription-A] destination-group collector1

[DeviceA-telemetry-subscription-A] quit

3.     Configure Device B:

# Configure Twenty-FiveGigE 1/0/1 and Twenty-FiveGigE 1/0/2 to trust 802.1p priorities carried in packets. Enable PFC and enable PFC for 802.1p priority 5 on these interfaces.

<DeviceB> system-view

[DeviceB] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2

[DeviceB-if-range] qos trust dot1p

[DeviceB-if-range] priority-flow-control enable

[DeviceB-if-range] priority-flow-control no-drop dot1p 5

[DeviceB-if-range] quit

# Enable LLDP globally.

[DeviceB] lldp global enable

# Enable LLDP on Twenty-FiveGigE 1/0/2. Enable the interface to advertise DCBX TLVs. Set the DCBX version to Rev. 1.01 on the interface.

[DeviceB]interface twenty-fivegige 1/0/2

[DeviceB-Twenty-FiveGigE1/0/2] lldp enable

[DeviceB-Twenty-FiveGigE1/0/2] lldp tlv-enable dot1-tlv dcbx

[DeviceB-Twenty-FiveGigE1/0/2] dcbx version rev101

[DeviceB-Twenty-FiveGigE1/0/2] quit

# Enable byte-count WRR on Twenty-FiveGigE 1/0/2. Assign queue 5 (802.1p priority 5 is mapped to local precedence 5 by default) to the SP group.

[DeviceB] interface twenty-fivegige 1/0/2

[DeviceB-Twenty-FiveGigE1/0/2] qos wrr byte-count

[DeviceB-Twenty-FiveGigE1/0/2] qos wrr 5 group sp

# On Twenty-FiveGigE 1/0/2, configure queue 5 to use 15% fixed-area space and 25% shared-area space of cell resources in the egress buffer.

[DeviceB-Twenty-FiveGigE1/0/2] buffer egress cell queue 5 guaranteed ratio 15

[DeviceB-Twenty-FiveGigE1/0/2] buffer egress cell queue 5 shared ratio 25

[DeviceB-Twenty-FiveGigE1/0/2] quit

# Configure service pool 1 to use up to 20% shared-area space of cell resources in the egress buffer on slot 1.

[DeviceB] buffer egress slot 1 cell service-pool 1 shared ratio 20

# Apply manually configured data buffer settings.

[DeviceB] buffer apply

# Set the per-interface buffer usage threshold to 90%. When a queue on an interface uses more buffer space than the set threshold, the system counts one threshold violation for the queue.

[DeviceB] buffer usage threshold slot 1 ratio 90

# Map queue 5 to service pool 1 on Twenty-FiveGigE 1/0/2.

[DeviceB] interface twenty-fivegige 1/0/2

[DeviceB-Twenty-FiveGigE1/0/2] buffer egress queue 5 map-to service-pool 1

[DeviceB-Twenty-FiveGigE1/0/2] quit

# Set the ingress buffer alarm threshold to 90% for queue 5 on Twenty-FiveGigE 1/0/1. When queue 5 exceeds the alarm threshold for the ingress buffer, the device generates and reports a threshold-crossing alarm.

[DeviceB] interface twenty-fivegige 1/0/1

[DeviceB-Twenty-FiveGigE1/0/1] buffer ingress usage threshold queue 5 ratio 90

[DeviceB-Twenty-FiveGigE1/0/1] quit

# Set the egress buffer alarm threshold to 90% for queue 5 on Twenty-FiveGigE 1/0/2. When queue 5 exceeds the alarm threshold for the egress buffer, the device generates and reports a threshold-crossing alarm.

[DeviceB] interface twenty-fivegige 1/0/2

[DeviceB-Twenty-FiveGigE1/0/2] buffer egress usage threshold queue 5 ratio 90

[DeviceB-Twenty-FiveGigE1/0/2] quit

# Set the alarm threshold to 90% for service pool 1 on slot 1.

[DeviceB] buffer egress usage threshold service-pool 1 slot 1 ratio 90

# Enable threshold-crossing alarms.

[DeviceB] buffer threshold alarm egress enable

# Set the headroom buffer alarm threshold to 90% for queue 5 on Twenty-FiveGigE 1/0/1 and Twenty-FiveGigE1/0/2. When a queue exceeds the alarm threshold, the device generates and reports a threshold-crossing alarm.

[DeviceB] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2

[DeviceB-if-range] buffer usage threshold headroom queue 5 ratio 90

[DeviceB-if-range] quit

# Enable threshold-crossing alarms for the headroom buffer.

[DeviceB] buffer threshold alarm headroom enable

# Enable packet-drop alarms.

[DeviceB] buffer packet-drop alarm enable

# Enable gRPC.

[DeviceB] grpc enable

# Create sensor group test1, and add sensor paths bufferusages and headroomusages.

[DeviceB] telemetry

[DeviceB-telemetry] sensor-group test1

[DeviceB-telemetry-sensor-group-test1] sensor path buffermonitor/bufferusages

[DeviceB-telemetry-sensor-group-test1] sensor path buffermonitor/headroomusages

[DeviceB-telemetry-sensor-group-test1] quit

# Create sensor group test2, and add sensor paths portqueoverrunevent and portquedropevent.

[DeviceB-telemetry] sensor-group test2

[DeviceB-telemetry-sensor-group-test2] sensor path buffermonitor/portqueoverrunevent

[DeviceB-telemetry-sensor-group-test2] sensor path buffermonitor/portquedropevent

[DeviceB-telemetry-sensor-group-test2] quit

# Create destination group collector1, and add a collector that uses IPv4 address 192.168.2.1 and port number 50050 to the destination group.

[DeviceB-telemetry] destination-group collector1

[DeviceB-telemetry-destination-group-collector1] ipv4-address 192.168.2.1 port 50050

[DeviceB-telemetry-destination-group-collector1] quit

# Create subscription A, specify sensor group test1 for the subscription with the data sampling interval 10 seconds, and specify sensor group test2 and destination group collector1 for the subscription.

[DeviceB-telemetry] subscription A

[DeviceB-telemetry-subscription-A] sensor-group test1 sample-interval 10

[DeviceB-telemetry-subscription-A] sensor-group test2

[DeviceB-telemetry-subscription-A] destination-group collector1

[DeviceB-telemetry-subscription-A] quit

Developing and executing a gRPC collector-side application

For more information, see "Developing a gRPC collector-side application."

Verifying the configuration

# Display information about packets dropped on Device B.

<DeviceB> display packet-drop summary

All interfaces:

  Packets dropped due to Fast Filter Processor (FFP): 0

  Packets dropped due to STP non-forwarding state: 0

  Packets dropped due to insufficient data buffer. Input dropped: 0 Output dropped: 0

  Packets of ECN marked: 1622267130

  Packets of WRED dropped: 0

The output shows that zero packets are dropped on Device B.

# Display the bandwidth usage of Twenty-FiveGigE 1/0/2 on Device B.

<DeviceB> display counters rate outbound interface Twenty-FiveGigE 1/0/2

Usage: Bandwidth utilization in percentage

Interface            Usage (%)   Total (pps)   Broadcast (pps)   Multicast (pps)

WGE1/0/2                    100        2825427                  --                  --

 

 Overflow: More than 14 digits.

       --: Not supported.

The output shows that the bandwidth usage of Twenty-FiveGigE 1/0/2 is 100%.

Example: Configuring PFC deadlock detection

Network configuration

As shown in Figure 19:

·     Enable PFC for packets with 802.1p priority 5 on S6850 switches Device A, Device B, Device C, and Device D.

·     Connect Server 1 and Server 2 to devices through Simple Multichassis Link Aggregation (S-MLAG) to provide device-level redundancy and load sharing.

When link switchover occurs due to aggregate link or device failure, a temporary loop might appear to cause PFC deadlock and affect packet forwarding. Configure PFC deadlock detection on Device A, Device B, Device C, and Device D with the PFC deadlock detection interval as 50 ms and delay timer for PFC deadlock detection automatic recovery as 700 ms.

Figure 19 Network diagram

 

Restrictions and guidelines

The CoS value specified must be within the 802.1p priority list specified in the priority-flow-control no-drop dot1p command. Different CoS values correspond to different 802.1p priority values. For the mappings, execute the display qos map-table dot1p-lp command. In this example, the default 802.1p-to-local priority mapping table is used, where 802.1p priority 5 is mapped to CoS value 5.

Procedures

1.     Configure S-MLAG. (Details not shown.)

2.     Configure Device A:

# On Twenty-FiveGigE 1/0/1 and Twenty-FiveGigE 1/0/2, configure the interfaces to trust 802.1p priorities carried in packets, enable PFC, and enable PFC for packets with 802.1p priority 5.

<DeviceA> system-view

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2

[DeviceA-if-range] qos trust dot1p

[DeviceA-if-range] priority-flow-control enable

[DeviceA-if-range] priority-flow-control no-drop dot1p 5

[DeviceA-if-range] quit

# Set the precision for the PFC deadlock detection timer to high, which represents 10 ms.

[DeviceA] priority-flow-control deadlock precision high

# Set the PFC deadlock detection interval to 5 for CoS value 5.

[DeviceA] priority-flow-control deadlock cos 5 interval 5

# Set the delay timer for PFC deadlock detection automatic recovery to 6 for CoS value 5.

[DeviceA] priority-flow-control deadlock auto-recover cos 5 delay 6

# Configure the action to take on packets during the delay timer period for PFC deadlock automatic recovery as forwarding. (The default is forwarding. You can skip this step if the device uses the default settings.)

[DeviceA] priority-flow-control deadlock auto-recover action forwarding

# Configure the upper threshold for PFC deadlock times within a 1-second period as 10.

[DeviceA] priority-flow-control deadlock threshold cos 5 period 1 count 10

# Set the recovery mode for PFC deadlock detection to auto on Twenty-FiveGigE 1/0/1 and Twenty-FiveGigE 1/0/2. (The default is auto. You can skip this step if the device uses the default settings.)

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2

[DeviceA-if-range] priority-flow-control deadlock recover-mode auto

# Enable PFC deadlock detection on Twenty-FiveGigE 1/0/1 and Twenty-FiveGigE 1/0/2.

[DeviceA-if-range] priority-flow-control deadlock enable

[DeviceA-if-range] quit

3.     Configure Device B, Device C, and Device D in the same way Device A is configured.

Verifying the configuration

# Send a large amount of traffic continuously from Server 1 to Server 2. Disconnect the connection between Device D and Server 2. In this case, a temporary loop might appear because of aggregate link switchover, and PFC deadlock occurs if the interface connecting Device C to Server 2 is congested.

When PFC deadlock occurs or is removed on a device, a log message is generated, for example:

%Feb 24 15:04:29:663 2019 DeviceA DRVPLAT/4/DrvDebug: PFC Deadlock Recovery Event Begin.Port is WGE1/0/2.

%Feb 24 15:04:29:800 2019 DeviceA DRVPLAT/4/DrvDebug: PFC Deadlock Recovery Event End.Port is WGE1/0/2.

# If a network failure (for example, frequent aggregate link switchover) causes 10 or more PFC deadlock events within one second, PFC is disabled on the device.

%Feb 24 15:08:24:650 2019 DeviceA DRVPLAT/4/DrvDebug: PFC deadlock limit has been checked, Ifname: WGE1/0/2, Cos: 5, times: 10.

# In this case, set the PFC deadlock detection recovery mode to manual on Twenty-FiveGigE 1/0/1 and Twenty-FiveGigE 1/0/2.

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2

[DeviceA-if-range] priority-flow-control deadlock recover-mode manual

[DeviceA-if-range] quit

# Troubleshoot the network as soon as possible, and then cancel the upper threshold configuration for PFC deadlock times within the specified period. Manually recover PFC deadlock detection on Twenty-FiveGigE 1/0/1 and Twenty-FiveGigE 1/0/2.

[DeviceA] undo priority-flow-control deadlock threshold cos 5

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2

[DeviceA-if-range] priority-flow-control deadlock recover

# Identify whether PFC deadlock occurs on the device. If no PFC deadlock is reported within a period of time, set the PFC deadlock detection recovery mode to auto.

[DeviceA-if-range] priority-flow-control deadlock recover-mode auto

Example: Configuring PFC thresholds

Network configuration

As shown in Figure 20:

·     S6850 switches Device A and Device B are connected through 100-GE interfaces, and the cable length is about 30 meters.

·     Device A is connected to Server 1 and Server 2 through 25-GE interfaces. Device B is connected to Server 3 through a 25-GE interface. The length of each cable is 5 meters.

·     A large number of RDMAv2 packets are exchanged among Server 1, Server 2, and Server 3. The maximum packet size is 1536 bytes, and the packets carry 802.1p priority 5.

Configure PFC to ensure that packets with 802.1p priority 5 are not lost, network resources are well utilized, and the total network throughput is not affected.

Figure 20 Network diagram

 

Dynamic back pressure frame triggering threshold configuration requirement analysis

In this example, queue 5 is configured to use 100% shared-area space of cell resources in the egress buffer by using the buffer egress cell queue 5 shared ratio 100 command on Device A and Device B. Then, queue 5 on each interface of these devices can use 100% buffer of the interface.

In this network, three flows might be congested: the flow from Server 1 or Server 2 to Server 3, the flow from Server 2 or Server 3 to Server 1, and the flow from Server 1 or Server 3 to Server 2.

·     For the flow from Server 1 or Server 2 to Server 3:

¡     On Device A: WGE 1/0/1 of Device B will generate PFC pause frames to limit the rate to 25 Gbps for HGE 1/0/25 of Device A. Therefore, the total thresholds for the ingress buffer of WGE 1/0/1 and WGE 1/0/2 cannot exceed the threshold for the egress buffer on HGE 1/0/25. WGE 1/0/1 and WGE 1/0/2 can each use up to 50% of the ingress buffer. To further ensure traffic security, configure the dynamic back pressure frame triggering threshold as 33% for WGE 1/0/1 and WGE 1/0/2.

¡     On Device B, you do not need to set the dynamic back pressure frame triggering threshold.

·     For the flow from Server 2 or Server 3 to Server 1:

¡     On Device A: The total thresholds for the ingress buffer of WGE 1/0/2 and HGE 1/0/25 cannot exceed the threshold for the egress buffer of WGE 1/0/1. WGE 1/0/2 and HGE 1/0/25 can each use up to 50% of the ingress buffer. To further ensure traffic security, configure the dynamic back pressure frame triggering threshold as 33% for WGE 1/0/2 and HGE 1/0/25.

¡     On Device B, you do not need to set the dynamic back pressure frame triggering threshold.

·     For the flow from Server 1 or Server 3 to Server 2:

¡     On Device A: The total thresholds for the ingress buffer of WGE 1/0/1 and HGE 1/0/25 cannot exceed the threshold for the egress buffer of WGE 1/0/2. WGE 1/0/1 and HGE 1/0/25 can each use up to 50% of the ingress buffer. To further ensure traffic security, configure the dynamic back pressure frame triggering threshold as 33% for WGE 1/0/1 and HGE 1/0/25.

¡     On Device B, you do not need to set the dynamic back pressure frame triggering threshold.

In summary, configure the dynamic back pressure frame triggering threshold as 33 on WGE 1/0/1, WGE 1/0/2, and HGE 1/0/25 of Device A. You do not need to set the dynamic back pressure frame triggering threshold on Device B.

Procedures

IMPORTANT:

For how PFC thresholds are calculated, see "Restrictions and guidelines."

 

1.     Configure Device A:

# Configure queue 5 to use 100% shared-area space of cell resources in the egress buffer.

<DeviceA> system-view

[DeviceA] buffer egress cell queue 5 shared ratio 100

[DeviceA] buffer apply

# On Twenty-FiveGigE 1/0/1, Twenty-FiveGigE 1/0/2, and HundredGigE 1/0/25, configure the interfaces to trust 802.1p priorities carried in packets, enable PFC, and enable PFC for packets with 802.1p priority 5.

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2 hundredgige 1/0/25

[DeviceA-if-range] qos trust dot1p

[DeviceA-if-range] priority-flow-control enable

[DeviceA-if-range] priority-flow-control no-drop dot1p 5

[DeviceA-if-range] quit

# On Twenty-FiveGigE 1/0/1 and Twenty-FiveGigE 1/0/2, set the headroom buffer threshold to 234 for 802.1p priority 5. The cable length is 5 meters; take it as 10 meters for margin, and calculate the in-transit traffic (bytes) = 9216 + 1536 + 3840 + 325 = 14917 bytes. In extreme conditions, each packet is 64 bytes, and 14917 bytes require 234 (14917/64 = 233.078) cell resources.

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2

[DeviceA-if-range] priority-flow-control dot1p 5 headroom 234

[DeviceA-if-range] quit

# On HundredGigE 1/0/25, set the headroom buffer threshold to 432 for 802.1p priority 5. The cable length is about 30 meters; take it as 100 meters for margin, and calculate the in-transit traffic (bytes) = 9216 + 1536 + 3840 + 13000 = 27592 bytes. In extreme conditions, each packet is 64 bytes, and 27592 bytes require 432 (27592/64 = 431.125) cell resources.

[DeviceA] interface hundredgige 1/0/25

[DeviceA-HundredGigE1/0/25] priority-flow-control dot1p 5 headroom 432

[DeviceA-HundredGigE1/0/25] quit

# On Twenty-FiveGigE 1/0/1, Twenty-FiveGigE 1/0/2, and HundredGigE 1/0/25, set the dynamic back pressure frame triggering threshold to 33 for 802.1p priority 5.

[DeviceA] interface range twenty-fivegige 1/0/1 to twenty-fivegige 1/0/2 hundredgige 1/0/25

[DeviceA-if-range] priority-flow-control dot1p 5 ingress-buffer dynamic 33

# On Twenty-FiveGigE 1/0/1, Twenty-FiveGigE 1/0/2, and HundredGigE 1/0/25, configure the offset between the back pressure frame stopping threshold and triggering threshold as 7 for 802.1p priority 5. (The calculated offset is 1536/256 = 6; set it to 7 for a small margin.)

[DeviceA-if-range] priority-flow-control dot1p 5 ingress-threshold-offset 7

# On Twenty-FiveGigE 1/0/1, Twenty-FiveGigE 1/0/2, and HundredGigE 1/0/25, set the PFC reserved threshold to 8. (The minimum value required is (1536+64+256)/256=7.25.)

[DeviceA-if-range] priority-flow-control dot1p 5 reserved-buffer 8

[DeviceA-if-range] quit

2.     Configure Device B:

# Configure queue 5 to use 100% shared-area space of cell resources in the egress buffer.

<DeviceB> system-view

[DeviceB] buffer egress cell queue 5 shared ratio 100

[DeviceB] buffer apply

# On Twenty-FiveGigE 1/0/1 and HundredGigE 1/0/25, configure the interfaces to trust 802.1p priorities carried in packets, enable PFC, and enable PFC for packets with 802.1p priority 5.

[DeviceB] interface range twenty-fivegige 1/0/1 hundredgige 1/0/25

[DeviceB-if-range] qos trust dot1p

[DeviceB-if-range] priority-flow-control enable

[DeviceB-if-range] priority-flow-control no-drop dot1p 5

[DeviceB-if-range] quit

# On Twenty-FiveGigE 1/0/1, set the headroom buffer threshold to 234 for 802.1p priority 5. With a cable length of 10 meters, the in-transit traffic is 9216 + 1536 + 3840 + 325 = 14917 bytes. In extreme conditions, each packet is 64 bytes, so 14917 bytes require 234 cell resources (14917/64 = 233.078, rounded up).

[DeviceB] interface twenty-fivegige 1/0/1

[DeviceB-Twenty-FiveGigE1/0/1] priority-flow-control dot1p 5 headroom 234

[DeviceB-Twenty-FiveGigE1/0/1] quit

# On HundredGigE 1/0/25, set the headroom buffer threshold to 432 for 802.1p priority 5. The cable length is less than 100 meters, so use 100 meters for the calculation. The in-transit traffic is 9216 + 1536 + 3840 + 13000 = 27592 bytes. In extreme conditions, each packet is 64 bytes, so 27592 bytes require 432 cell resources (27592/64 = 431.125, rounded up).

[DeviceB] interface hundredgige 1/0/25

[DeviceB-HundredGigE1/0/25] priority-flow-control dot1p 5 headroom 432

[DeviceB-HundredGigE1/0/25] quit

# On Twenty-FiveGigE 1/0/1 and HundredGigE 1/0/25, set the offset between the back pressure frame stopping threshold and triggering threshold to 7 for 802.1p priority 5. (The calculated offset is 1536/256 = 6; set it to 7 to allow a small margin.)

[DeviceB] interface range twenty-fivegige 1/0/1 hundredgige 1/0/25

[DeviceB-if-range] priority-flow-control dot1p 5 ingress-threshold-offset 7

# On Twenty-FiveGigE 1/0/1 and HundredGigE 1/0/25, set the PFC reserved threshold to 8. (The minimum value required is (1536+64+256)/256 = 7.25, rounded up to 8.)

[DeviceB-if-range] priority-flow-control dot1p 5 reserved-buffer 8

[DeviceB-if-range] quit

Appendixes

Default priority map

Table 11 Default dot1p-lp priority map

Input priority value (dot1p)     Local precedence (lp)
0                                2
1                                0
2                                1
3                                3
4                                4
5                                5
6                                6
7                                7
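Expressed as a lookup table, the default map is as follows. Note that 802.1p priority 5 maps to local precedence 5, which is why the PFC configuration examples in this document operate on queue 5. (This C++ sketch only restates Table 11.)

#include <cstdio>

// Default dot1p-to-local-precedence map (Table 11): index = dot1p, value = lp.
static const int kDot1pToLp[8] = {2, 0, 1, 3, 4, 5, 6, 7};

int main()
{
    for (int dot1p = 0; dot1p < 8; ++dot1p)
        std::printf("dot1p %d -> lp %d\n", dot1p, kDot1pToLp[dot1p]);
    return 0;
}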

 

Developing a gRPC collector-side application

This section uses an example to show how to develop a gRPC collector-side application to enable a collector to collect device data. In this example, the operating system is Linux. The programming language is C++.

Prerequisites

1.     Install the C++ execution environment. For more information, see the relevant user guides.

2.     Contact H3C Support to obtain file grpc_dialout.proto.

3.     Download the proto file processing utility protoc from the following website: https://github.com/google/protobuf/releases.

4.     Obtain the protobuf plug-in for C++ (protobuf-cpp) from the following website: https://github.com/google/protobuf/releases.

Generating the C++ code for the proto definition file

# Copy proto definition file grpc_dialout.proto to the current directory, and then generate the C++ code. (The command assumes that the grpc_cpp_plugin executable is also in the current directory.)

$ protoc --plugin=protoc-gen-grpc=./grpc_cpp_plugin --grpc_out=. --cpp_out=. grpc_dialout.proto
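This command generates grpc_dialout.pb.h, grpc_dialout.pb.cc, grpc_dialout.grpc.pb.h, and grpc_dialout.grpc.pb.cc in the current directory. The source files and makefile in the next section reference these generated files.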

Developing the collector-side application

1.     Create file main.cc.

#include <iostream>

#include <grpc++/grpc++.h>

#include <grpc/grpc.h>

#include <grpc++/server.h>

#include <grpc++/server_builder.h>

#include <grpc++/server_context.h>

#include "grpc_dialout.grpc.pb.h"

#include "my_server.h"

using namespace std;

int main()

{

    // Listen on all local addresses; the device dials out to this TCP port.

    string server_address("0.0.0.0:50050");

    DialoutTest dialout_test;

    ServerBuilder builder;

    cout << "running on " << server_address << endl;

    builder.AddListeningPort(server_address, InsecureServerCredentials());

    builder.RegisterService(&dialout_test);

    unique_ptr<Server> server(builder.BuildAndStart());

    server->Wait();

    return 0;

}

2.     Create file my_server.cc.

#include <iostream>

#include "my_server.h"

 

Status DialoutTest::Dialout(ServerContext* context, ServerReader<DialoutMsg>* reader, DialoutResponse* response)

{

    DialoutMsg msg;

    while (reader->Read(&msg))

    {

        const DeviceInfo &device_msg = msg.devicemsg();

        std::cout<< "MSG INDEX: " << this->i++ << std::endl;

        std::cout << "peer info: " << context->peer() << std::endl;

        std::cout<< "Producer-Name: " << device_msg.producername() << std::endl;

        std::cout<< "Device-Name: " << device_msg.devicename() << std::endl;

        std::cout<< "Device-Model: " << device_msg.devicemodel() << std::endl;

        std::cout<<"Sensor-Path: " << msg.sensorpath()<<std::endl;

        std::cout<<"Json-Data: " << msg.jsondata()<<std::endl;

        std::cout<<"--------------------" << std::endl << std::endl;

    }

    response->set_response("test");

    return Status::OK;

}

3.     Create file my_server.h.

#pragma once

// Service and message types generated from grpc_dialout.proto.
#include "grpc_dialout.grpc.pb.h"

using namespace grpc;

using namespace grpc_dialout;

 

class DialoutTest final : public GRPCDialout::Service

{

    long long i = 0;    // Count of received messages, printed as MSG INDEX.

    Status Dialout(ServerContext *context, ServerReader<DialoutMsg> *reader, DialoutResponse *response) override;

};

4.     Create the makefile.

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig

SOURCES = $(wildcard $(subdir)*.cc)

SRCOBJS = $(patsubst %.cc, %.o, $(SOURCES))

CC = g++

 

%.o:%.cc

         $(CC) -std=c++11 -I/usr/local/include -pthread -c $< -o $@

 

all: server

 

server: grpc_dialout.grpc.pb.o grpc_dialout.pb.o my_server.o main.o

         $(CC) $^ -L/usr/local/lib `pkg-config --libs grpc++ grpc` -Wl,--no-as-needed -lgrpc++_reflection -Wl,--as-needed -lprotobuf -lpthread -ldl -lssl -o $@

 

#chmod 777 $@

 

clean:

         rm *.o

5.     Execute the make command in the directory.
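After the build succeeds, the collector binary server is created in the same directory. Running it (./server) prints running on 0.0.0.0:50050 and then blocks in server->Wait(), receiving dial-out messages from the device. Port 50050 is the value hard-coded in main.cc; the device-side gRPC dial-out configuration must point to the collector's IP address and this port (see the sections on reporting information by using gRPC earlier in this document).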

 
