Telemetry Technology White Paper-6W101


Telemetry Technology White Paper
Copyright © 2025 New H3C Technologies Co., Ltd. All rights reserved.

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.

Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.

This document provides generic technical information, some of which might not be applicable to your products.


Contents

Telemetry
    Technical background
    Technical benefits
    Telemetry network model
    Telemetry implementation methods and their differences
    Telemetry application scenarios
gRPC-based telemetry
    gRPC overview
        gRPC protocol stack layers
        gRPC network architecture
    Telemetry modes
    Data that gRPC can collect
    Application scenarios
INT-based telemetry
    INT overview
        Background
        Benefits
        Standardization
    INT network model
    INT packet formats
        INT packet header
        Inherent header format
        Metadata format
    Operating mechanism
        Common INT
        Flexible INT
        INT operating mechanisms in different networks
    Metadata that INT can collect
Telemetry streaming
    Telemetry streaming overview
        Technical background
        Technical benefits
    Telemetry streaming packet format
    Telemetry streaming operating mechanism
ERSPAN-based telemetry
    ERSPAN overview
    Packet encapsulation formats
        ERSPANv2 packet encapsulation format
        ERSPANv3 packet encapsulation format
    Port mirroring ERSPAN
        Network architecture
        Operating mechanism
    Flow mirroring ERSPAN
        About flow mirroring ERSPAN
        Operating mechanisms
    Application scenarios
Cloud platform-based telemetry
    Cloud platform-based telemetry overview
    Network architecture
    Operating mechanism
        Data aggregation and report by the AC
        Direct data report by APs
    Typical networking

 


Telemetry

Technical background

Driven by the popularity of networks and the emergence of new technologies, networks have grown radically in scale and deployment complexity. Users also have increasingly high requirements for service quality. To meet user requirements, network operations must be more precise and intelligent. Network operations are facing the following challenges:

·     Ultra large-scale—The number of managed devices and the amount of monitored information are very large.

·     Fast fault location—Second-level or even subsecond-level fault location is required in complex networks.

·     Granular monitoring—A wider variety of monitored data types and finer monitoring granularity are required to completely and accurately reflect the network status. This enables potential fault prediction and provides strong data support for network optimization. The network operations system must monitor the following information: traffic statistics on interfaces, packet loss on each flow, CPU and memory usage, each flow's delay and jitter, transmission delay of each packet, and buffer usage on each device.

Traditional monitoring methods (SNMP, CLI, and logging) cannot meet these requirements.

·     SNMP and CLI use the pull mode to request data from a device, which limits the number of monitored devices and the data acquisition speed.

·     Although SNMP traps and logging use the push mode in which devices actively report data to monitoring devices, the data is limited to events and alarms and cannot accurately reflect the network status.

Telemetry is a remote data collection technology that monitors device performance and faults. It uses the push mode to obtain rich monitoring data in a timely manner, which helps rapidly locate faults and address network operations issues.

Technical benefits

Telemetry provides the following benefits:

·     Multiple implementation methods, including Google Remote Procedure Call (gRPC), In-band Network Telemetry (INT), and Encapsulated Remote SPAN (ERSPAN), that can meet different user requirements.

·     Finer granularity and rich types of collected data, which can fully reflect the network status.

·     One subscription for continuous reporting. Compared with traditional network monitoring technologies, Telemetry allows one configuration for continuous data reporting, which alleviates request processing pressure on devices.

·     More rapid and precise fault location.

Telemetry network model

As shown in Figure 1, a telemetry network contains the following components:

·     Network device—Monitored device. It samples monitoring data and sends the sampled data to the collector at intervals through gRPC, INT, telemetry streaming, or ERSPAN.

·     Collector—Receives and saves the monitoring data reported by the network device.

·     Analyzer—Analyzes and processes monitoring data received by the collector and displays the analysis results in a graphical interface.

·     Controller—Manages the network device by deploying configuration to it through NETCONF or other methods. It can deploy configuration or adjust the forwarding behavior of the device based on the analysis data. It can also control the data to be sampled and reported by the device.

Figure 1 Telemetry network model

 

Telemetry implementation methods and their differences

Based on the data reporting method, telemetry can be implemented in the following methods:

·     gRPC-based telemetry.

gRPC-based telemetry can collect traffic statistics on interfaces, CPU usage, and alarms of devices. It encodes data in the Protocol Buffers format and reports it to the collector in real time.

·     INT-based telemetry.

Proposed by Barefoot, Arista, Dell, Intel, and VMware, INT is a network monitoring technology designed to collect data from devices. A device actively sends data to a collector in real time for device performance and network monitoring.

INT collects per-packet data plane information such as path and delay and can implement comprehensive and real-time network monitoring.

·     Telemetry streaming.

Telemetry streaming is a traffic monitoring technology based on packet sampling, which is mainly used for precisely monitoring traffic transmission paths and transmission delay.

Telemetry streaming can collect the input and output interface information of each device that the traffic passes through and add corresponding timestamp information. It can also calculate the delay introduced when traffic passes through any of the devices.

·     ERSPAN-based telemetry.

ERSPAN, a port mirroring technology, mirrors packets passing through a port, encapsulates the mirrored packets into GRE packets with protocol number 0x88BE, and sends them to a remote monitoring device.

You can define the packets to be mirrored as required. For example, you can mirror TCP three-way handshake packets to monitor TCP connection establishment or mirror Remote Direct Memory Access (RDMA) signaling packets to monitor the RDMA session state.

·     Cloud platform-based telemetry

Cloud platform-based telemetry can collect data such as interface traffic statistics, CPU, alarms, and wireless client running status from ACs and APs. The collected data is reported in real time to the collector for reception and storage. This method provides lightweight and centralized operations support for large-scale wireless networks.

The data reported by gRPC comes from service modules of devices. The data reported by INT, telemetry streaming, and ERSPAN comes from packets received from other network nodes. The data reported by the cloud platform mainly comes from running status information collected by devices locally. Table 1 shows the differences among the telemetry implementation methods.

Table 1 Differences among the telemetry implementation methods

Measurement object:

·     gRPC—Xpath (sensor path).

·     INT—TCP/UDP packets.

·     Telemetry streaming—Various packets.

·     ERSPAN—Various packets.

·     Cloud platform—Wireless devices and wireless clients.

Measurement object selection rule:

·     gRPC—Configuring subscriptions to specify Xpaths for sampling.

·     INT—Using QoS policies or ACLs to filter packets.

·     Telemetry streaming—Using ACLs to filter packets.

·     ERSPAN—Mirroring packets of source ports, source VLANs, and source CPUs, or flow mirroring.

·     Cloud platform—Devices automatically report data to the cloud platform upon connection.

Packet sampling method:

·     gRPC—Periodically sampling data and reporting it, or reporting event-triggered data in real time.

·     INT—Sampling packets based on the sampling rate (copying packets), and then inserting INT headers to obtain INT packets.

·     Telemetry streaming—Sampling packets based on the sampling rate (copying packets).

·     ERSPAN—Mirroring packets. Proportional sampling is supported.

·     Cloud platform—Periodically sampling data and reporting it, or reporting event-triggered data in real time.

Measurement data:

·     gRPC—Configuration data and operating status data (such as interface status and statistics) of devices.

·     INT—Device, interface, queue, timestamp, and forwarding path information of each device on the forwarding path.

·     Telemetry streaming—Device ID, traffic input interface and its timestamp, and traffic output interface and its timestamp.

·     ERSPAN—Timestamp.

·     Cloud platform—Operating status data (such as interface status and statistics) of devices, running data of wireless clients (such as signal strength and traffic), and air interface data (such as channel usage and radar).

Timestamp granularity:

·     gRPC—Millisecond.

·     INT—Nanosecond.

·     Telemetry streaming—Nanosecond.

·     ERSPAN—Nanosecond.

·     Cloud platform—Second.

Method in which data is sent to a collector:

·     gRPC—Data is reported by each node. A node encodes the sampled data into subscription messages through the gRPC protocol stack and sends them to the associated collector.

·     INT—Data is reported by the exit node. The exit node encapsulates INT packets in UDP packets and sends them to the collector based on the IP forwarding table.

·     Telemetry streaming—Data is reported by each node. A node adds telemetry streaming headers to the mirrored packets and encapsulates the packets with UDP headers (including collector address information), Layer 2 headers, and Layer 3 headers. Then, the node adds timestamp information and forwards the packets to the collector based on the IP forwarding table.

·     ERSPAN—Data is reported by each node. A node adds ERSPANv2 or ERSPANv3 headers to the mirrored packets, recalculates CRCs, adds GRE and IPv4 headers to the packets, and then routes the packets to the data monitoring device over the IP network.

·     Cloud platform—Data is aggregated and reported by the AC or directly reported by APs.

 

Telemetry application scenarios

You can deploy multiple telemetry technologies in the network to implement comprehensive and multi-perspective monitoring. Alternatively, you can deploy a single telemetry technology to monitor a specific aspect in real time.

As shown in Figure 2, after a telemetry technology sends collected data to the collector, the analyzer analyzes and displays the data in a graphical interface. This helps the administrator to better understand the network state and rapidly locate network faults. The administrator can also identify potential issues and optimize the network in time.

Figure 2 Telemetry application scenario

 

 

NOTE:

The collector and the analyzer can be two separate devices or a single device.

 


gRPC-based telemetry

gRPC-based telemetry enables the device to read various statistics (for example, CPU, memory, and interface statistics) and push data to collectors as subscribed. Compared with traditional monitoring methods, gRPC-based telemetry features real-time and high-efficiency data collection.

gRPC overview

gRPC protocol stack layers

Table 2 describes the gRPC protocol stack layers.

Table 2 gRPC protocol stack layers

·     Content layer—Carries encoded service data. This layer supports the following encoding formats:

¡     Google Protocol Buffer (GPB)—A highly efficient binary encoding format. This format uses proto definition files to describe the structure of data to be serialized. GPB is more efficient in data transmission than formats such as JavaScript Object Notation.

¡     JavaScript Object Notation (JSON)—A lightweight data exchange format. It uses a language-neutral text format to store and represent data, which is easy to read and compile. If service data is in JSON format, you can use the public proto files to decode the data without having to use the proto file specific to the service module.

Make sure the device and collectors use compatible proto files.

·     gRPC layer—Defines the interaction format for RPC calls. Public proto definition files such as the grpc_dialout.proto file define the public RPC methods.

·     HTTP 2.0 layer—Carries gRPC. HTTP 2.0 provides enhanced features such as header field compression, multiplexing requests on a single TCP connection, and flow control.

·     Transport Layer Security (TLS) layer—(Optional.) Provides channel encryption and mutual certificate authentication.

·     Transport layer—TCP provides connection-oriented and reliable data links.

 

gRPC network architecture

As shown in Figure 3, a gRPC network uses the client/server model.

Figure 3 gRPC network architecture

 

When the device acts as the server, the gRPC mechanism is as follows:

1.     The gRPC server listens for connection requests from clients on the gRPC service port.

2.     A user or system runs a gRPC client application to connect to the gRPC server.

3.     The gRPC client calls methods provided in the .proto file to send requests.

4.     The gRPC server responds to the requests from the gRPC client.

 

 

NOTE:

The device can act as the gRPC server or client, which depends on the telemetry mode that the gRPC client and server use to establish a connection. For more information, see "Telemetry modes."

 

Telemetry modes

As shown in Figure 3, the network device and the collector establish a gRPC connection to transmit data. The device supports the following telemetry modes:

·     Dial-in mode—The device acts as a gRPC server and the collectors act as gRPC clients. A collector initiates a gRPC connection to the device to subscribe to device data.

Dial-in mode supports the following types of operations:

¡     Get—Obtains device status and settings and subscribes to events.

¡     gNMI—Includes the following subtypes of operations (a client sketch follows the note below):

-     gNMI Capabilities—Obtains the capabilities of the device.

-     gNMI Get—Obtains the status and settings of the device.

-     gNMI Set—Deploys settings to the device.

-     gNMI Subscribe—Subscribes to data push services provided by the device. The data might be collected periodically or upon occurrence of an event.

¡     CLI—Executes commands on the device.

·     Dial-out mode—The device acts as a gRPC client and the collectors act as gRPC servers. The device initiates gRPC connections to the collectors and pushes device data to the collectors as configured.

 

 

NOTE:

gRPC Network Management Interface (gNMI) is a gRPC-based protocol for network device management. It defines a series of RPC methods to obtain or configure the states of devices. gNMI supports common data models. No proto files specific to service modules are needed.
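
In dial-in mode, any gNMI-capable collector can exercise the operations listed above. The following minimal sketch uses the open-source pygnmi library to issue gNMI Capabilities and gNMI Get requests. The target address, credentials, and path are placeholder assumptions; the paths a device accepts depend on the data models it supports.

from pygnmi.client import gNMIclient

# Placeholder device address and credentials for illustration only.
TARGET = ("192.0.2.1", 50051)

with gNMIclient(target=TARGET, username="admin", password="admin",
                insecure=True) as client:
    # gNMI Capabilities: discover supported models and encodings.
    capabilities = client.capabilities()
    print(capabilities)

    # gNMI Get: read operational state under an example path.
    state = client.get(path=["/interfaces/interface"], encoding="json")
    print(state)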

 

Data that gRPC can collect

gRPC can collect the following data:

·     Device status—For example, the status of the CPU and memory.

·     Physical interface status—For example, the transceiver module status and the bandwidth usage of interfaces.

·     Packet/queue statistics for interfaces—For example, packet loss and error statistics of interfaces, packet loss statistics of queues, and buffer resource usage status of queues.

·     Entry/resource statistics—For example, the usage of forwarding entry resources, ACL flow entry resources, and virtual interface resources.

Application scenarios

As shown in Figure 4, devices (SW A and SW B) act as gRPC clients and actively report event data to the collector and analyzer in dial-out mode. The collector and analyzer (acting as the gRPC server) detect the service status of the devices from the reported events, analyze the network status and health, and display the results. A minimal collector sketch follows the figure.

Figure 4 Application scenario
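
To make the dial-out flow in this scenario concrete, the following minimal sketch implements a generic collector in Python with the grpcio package. It accepts any client-streaming RPC and prints the size of each received message; a real collector would decode the messages with the proto files published for the device (for example, grpc_dialout.proto). The port and the byte-counting behavior are illustrative assumptions.

import grpc
from concurrent import futures

class DialoutCollector(grpc.GenericRpcHandler):
    """Accept any client-streaming dial-out RPC, proto-agnostically."""

    def service(self, handler_call_details):
        print(f"incoming RPC: {handler_call_details.method}")

        def collect(request_iterator, context):
            for message in request_iterator:
                print(f"received telemetry message of {len(message)} bytes")
            return b""  # empty reply once the device closes the stream

        # Treat requests and responses as raw bytes so this sketch
        # needs no generated proto stubs.
        return grpc.stream_unary_rpc_method_handler(
            collect,
            request_deserializer=lambda raw: raw,
            response_serializer=lambda raw: raw,
        )

server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
server.add_generic_rpc_handlers((DialoutCollector(),))
server.add_insecure_port("0.0.0.0:50051")  # example port
server.start()
server.wait_for_termination()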

 


INT-based telemetry

INT overview

Background

In traditional networks, radar detection is typically used to discover packet forwarding paths. However, radar detection requires the intervention of controller software, its design is relatively complex without hardware support, and it cannot fully simulate actual packet forwarding.

Ping and tracert can monitor the network delay and path, but they cannot accurately determine on which interface a packet is longest delayed in a delay-sensitive network. As an important part of network visualization, INT is the first and the most important step in the journey towards automated operations.

INT allows you to obtain the following information on the forwarding path:

·     Device information.

·     Ingress port, egress port, and queue information of packets on each device.

·     Timestamp information.

Benefits

INT provides visibility into the following information on the forwarding path:

·     Ingress port, egress port, and queue information of each device.

·     Ingress timestamp and egress timestamp.

·     Queue congestion information.

On the last hop in the forwarding path, INT encapsulates collected packets into UDP packets and sends the packets to the collector. Then, the NMS software deployed on the collector analyzes the monitored data and extracts useful information.

INT provides the following benefits:

·     A device uses a dedicated INT processor to complete INT processing.

·     The administrator deploys INT configuration to a device once, and the device continuously reports collected data to the collector.

·     You can configure the sampling rate for INT collection.

·     You can configure QoS policies or ACLs to flexibly match original packets that require path detection.

·     INT can directly encapsulate and send packets to the collector at the last hop of path detection.

·     INT can collect information about devices, interfaces, queues, timestamps, and forwarding paths.

Standardization

INT is defined in the IETF's Internet Draft Inband Flow Analyzer draft-kumar-ippm-ifa-02. This draft describes the formats of the header inherent to INT and metadata (monitoring information) in detail. Theoretically, network devices that support this draft can implement INT packet analysis and processing functions.

INT network model

Based on the configuration method and operating mechanism, INT can be classified into common INT and flexible INT.

·     Common INT—Each node must be configured with an INT role on its input interface. Traffic flows are defined on the entry node by using a QoS policy. INT flows are automatically identified on the transit node and exit node and processed according to configured actions. On each input interface in the path, you can perform INT processing only on the flows defined on the entry node.

·     Flexible INT—No device role needs to be configured on each node. On each node, an ACL can be used to define a flow and an action used to take on the defined flow. For example, configure the actions of mirroring packets and adding collected data on the entry node. For the same flow, the original packets are matched on the entry node, and the INT packets are matched on the transit node and exit node. You can define multiple flows on an interface and take different actions on different flows.

Figure 5 INT network model

 

INT packet formats

INT packet header

An INT packet header contains the following parts: INT Probe HDR (header inherent to INT) and MD #1-N (inserted metadata). Figure 6 shows the INT packet header encapsulation positions. Based on the transport layer protocol, INT packets can be classified into the following types:

·     INT over TCP—An original TCP packet is called an INT-over-TCP packet after it is mirrored and inserted with an INT header.

·     INT over UDP—An original UDP packet is called an INT-over-UDP packet after it is mirrored and inserted with an INT header.

Figure 6 INT packet formats

 

Inherent header format

Figure 7 Inherent header format

 

The meanings of each field in an inherent header are as follows (a parsing sketch follows the list):

·     Probe Marker—Used by the device to identify INT packets. The value is fixed at 0xaaaaaaaabbbbbbbb.

·     Version—Currently fixed at 0x01.

·     Message Type—Currently fixed at 0x01.

·     Flags—Reserved field, currently fixed at 0x0000.

·     Telemetry Request Vector—Currently fixed at 0xffffffff.

·     Hop Limit—Maximum number of allowed hops.

·     Hop Count—Number of nodes the packet has traversed.

·     Must Be Zero—Currently fixed at all 0s.

·     Maximum Length—Maximum length of collected data, in bytes.

·     Current Length—Current length of collected data, in bytes.

·     Sender's Handle—Set by the entry node for the collector to uniquely identify an INT flow.

·     Sequence Number—Sequence number of a packet in an INT flow, which uniquely identifies the packet in the flow.
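
As a rough illustration of how a collector might recognize and parse this header, the following Python sketch assumes one plausible set of field widths: an 8-byte Probe Marker; 1 byte each for Version, Message Type, Hop Limit, and Hop Count; 2 bytes each for Flags, Must Be Zero, Maximum Length, and Current Length; and 4 bytes each for Telemetry Request Vector, Sender's Handle, and Sequence Number. The authoritative widths are defined in draft-kumar-ippm-ifa-02 and may differ.

import struct

# Assumed 32-byte layout for illustration only.
INHERENT_HEADER = struct.Struct("!QBBHIBBHHHII")
PROBE_MARKER = 0xAAAAAAAABBBBBBBB

def parse_inherent_header(buf: bytes) -> dict:
    (marker, version, msg_type, flags, request_vector, hop_limit,
     hop_count, must_be_zero, max_length, cur_length, handle,
     sequence) = INHERENT_HEADER.unpack_from(buf)
    if marker != PROBE_MARKER:
        raise ValueError("probe marker mismatch: not an INT packet")
    return {
        "version": version,            # currently fixed at 0x01
        "hop_limit": hop_limit,        # maximum number of allowed hops
        "hop_count": hop_count,        # nodes the packet has traversed
        "current_length": cur_length,  # length of collected data in bytes
        "flow_handle": handle,         # set by the entry node per INT flow
        "sequence": sequence,          # per-flow packet sequence number
    }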

Metadata format

The INT packet format varies by product.

Figure 8 Metadata format

 

The meanings of each field in metadata are as follows:

·     Device-ID—ID of the device.

·     Template-Id—Reserved field, currently fixed at 000.

·     Congestion—Indicates the congestion state. The three high bits are fixed at 000, and the two low bits indicate the ECN field.

·     Egress Port Drop Pkt Byte Cnt Upper—Drop count in bytes for the egress port, currently fixed at 0x00.

·     IP_TTL—TTL value of the packet.

·     Queue-Id—Egress port queue ID, currently fixed at 0x00.

·     Rx Timestamp Seconds Upper/Rx Timestamp Seconds—Ingress timestamp in seconds.

·     Rx Timestamp Nano-Seconds Upper—Ingress timestamp in nanoseconds.

·     Tx Timestamp Nano-Seconds Upper—Egress timestamp in nanoseconds.

·     Egress Port Utilization [%]—Egress port usage in percentage, currently fixed at 0x0000.

·     Ingress Port [module, port]—Ingress port identifier.

·     Egress Port [module, port]—Egress port identifier.

·     Egress Port Drop Pkt Byte Cnt—Drop count in bytes for the egress port, currently fixed at 0x00000000.

Operating mechanism

Common INT

The nodes in common INT perform the following functions:

·     Entry node.

On the entry node, the ingress port uses a QoS policy to sample matching packets and mirrors the sampled packets to the INT processor. The INT processor adds an INT header to each mirrored packet to form an INT packet and loops it back to the ingress port. The ingress port identifies the looped-back INT packet according to the INT mark, adds collected data to it, and forwards it to the egress port. The egress port adds collected data to the INT packet and sends it to the transit node.

·     Transit node.

On the transit node, the ingress port identifies the INT packet according to the INT mark, adds collected data to it, and forwards it to the egress port. The egress port adds collected data to the INT packet, and sends it to the exit node.

·     Exit node.

On the exit node, the ingress port identifies the INT packet according to the INT mark, adds collected data to the INT packet, and sends the INT packet to the INT processor. The INT processor encapsulates the INT packet into a new UDP packet, which is then forwarded to the egress port. The egress port sends the packet to the collector.

Flexible INT

The nodes in flexible INT perform the following functions:

·     Entry node.

On the entry node, the ingress port uses an ACL to sample matching packets and mirrors the sampled packets to the INT processor. The INT processor adds an INT header to each mirrored packet to form an INT packet and loops it back to the ingress port. The ingress port identifies the looped-back INT packet according to an ACL, adds collected data to it, and forwards it to the egress port. The egress port adds collected data to the INT packet and sends it to the transit node.

·     Transit node.

On the transit node, the ingress port uses an ACL to identify INT packets, adds collected data to INT packets, and forwards them to the egress port. The egress port adds collected data to the INT packets, and sends them to the exit node.

·     Exit node.

On the exit node, the ingress port uses an ACL to identify INT packets and mirrors them to the INT processor. The INT processor encapsulates the INT packets in UDP packets, which are then forwarded to the egress port. The egress port sends the packets to the collector.

INT operating mechanisms in different networks

Figure 9 and Figure 10 show the INT operating mechanisms in a common network and an EVPN/VXLAN network, respectively.

Figure 9 INT operating mechanism in a common network

 

Figure 10 INT operating mechanism in an EVPN/VXLAN network

 

Metadata that INT can collect

INT can collect and monitor the following metadata:

·     Device ID—The device ID of each device on the packet forwarding path, which is the device ID specified when you configure INT.

·     Ingress port ID—Logical input interface of packets on each node in the INT network.

·     Ingress timestamp—The local time on the device when a packet enters the ingress port. For the entry node, it is the time when an INT packet enters the loopback interface.

·     Egress port ID—Logical output interface of packets on each node in the INT network.

·     Egress timestamp—The local time on the device when a packet leaves the egress port.

·     Cache information—ID of the queue that caches original packets and ECN information.


Telemetry streaming

Telemetry streaming overview

Technical background

In a network with high real-time requirements, it is necessary to accurately determine on which interface a packet is longest delayed. By using telemetry streaming, you can obtain information about the devices that traffic passes through and the time when traffic passes the input and output interfaces. This helps calculate the transmission delay when traffic passes through one or multiple devices, allowing you to optimize the network architecture accordingly and reduce network latency.

Telemetry streaming can monitor the following information: device ID, traffic input interface and its timestamp, and traffic output interface and its timestamp. The device ID is the one specified when you configure telemetry streaming, which uniquely identifies a device on the packet transmission path.

Technical benefits

Telemetry streaming provides the following benefits:

·     Simple configuration.

·     The administrator deploys telemetry streaming configuration to a device once, and the device continuously reports collected data to the collector.

·     You can use ACLs to flexibly match original packets that require path detection.

·     You can edit the sampler to flexibly adjust the sampling granularity.

Telemetry streaming packet format

The telemetry streaming packet format varies by product. This section uses the S12500G-AF switch as an example.

As shown in Figure 11, telemetry streaming adds timestamps, a telemetry streaming header, a UDP header, an IP header, and an Ethernet header to each sampled packet. Table 3 and Table 4 show the meanings of each field in a timestamp and a telemetry streaming header, respectively.

Figure 11 Telemetry streaming packet encapsulation format

 

Table 3 Meanings of each field in a timestamp

·     Time (48 bits)—Obtained from the PTP module. It includes 16 bits that indicate the seconds and 32 bits that indicate the nanoseconds.

·     Reserved (8 bits)—Reserved field.

·     Origin ID (23 bits)—Source device information of the packet that contains the timestamp. The device ID is split into two 16-bit parts that are stored in the first 16 bits of this field in the input interface timestamp and the output interface timestamp, respectively.

·     Rx_Tx (1 bit)—Direction identifier. Values include:

¡     0—Receive direction, which indicates the input interface timestamp.

¡     1—Transmit direction, which indicates the output interface timestamp.

·     FCS (32 bits)—Frame Check Sequence (FCS).

 

Table 4 Meanings of each field in a telemetry streaming header

·     Version (32 bits)—Telemetry streaming version. The value is currently fixed at 1.

·     Src MID (8 bits)—Source module ID of the original traffic. Src MID and Src Port together uniquely identify the input interface of the original traffic.

·     Src Port (8 bits)—Source port of the original traffic.

·     Dst MID (8 bits)—Destination module ID of the original traffic. Dst MID and Dst Port together uniquely identify the output interface of the original traffic.

·     Dst Port (8 bits)—Destination port of the original traffic.

·     Flags (9 bits)—1 indicates yes and 0 indicates no. The meanings of the bits from left to right are as follows:

¡     Source_sample (1 bit)—Indicates whether telemetry streaming sampling is based on input interfaces.

¡     Dest_sample (1 bit)—The value is fixed at 0.

¡     Flex_sample (1 bit)—Indicates whether telemetry streaming sampling is based on flows.

¡     Mcast_sample (1 bit)—Indicates whether the sampling is multicast packet sampling.

¡     Discarded (1 bit)—Indicates whether the sampled packet is discarded when it is sent to the local CPU for processing.

¡     Truncated (1 bit)—The value is fixed at 0 (not truncated). Currently, sampling copies the original packets for UDP encapsulation.

¡     Dest_port_encoding (3 bits)—Values include:

-     000—Control frame for communication between CPUs.

-     001—Layer 2 or Layer 3 unicast packet in which the destination address has been resolved.

-     010—Multicast packet, unknown unicast packet, or unknown multicast packet, which is sent to all ports in the VLAN.

-     011—Layer 2 multicast packet, which is sent to all ports in the multicast group.

-     100—IP multicast packet, which is sent to all ports in the multicast group.

-     101, 110, 111—Reserved values.

·     Reserved (7 bits)—Reserved field.

·     User metadata (16 bits)—Customizable user information.

·     Sequence number (32 bits)—Sequence number.
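
The following sketch decodes one 14-byte timestamp block as laid out in Table 3 (48-bit Time, 8-bit Reserved, 23-bit Origin ID, 1-bit Rx_Tx, and 32-bit FCS). It is a minimal illustration of the bit layout rather than a full packet parser.

def parse_timestamp_block(buf: bytes) -> dict:
    """Decode a 14-byte telemetry streaming timestamp (see Table 3)."""
    value = int.from_bytes(buf[:14], "big")
    return {
        "seconds": value >> 96,                     # 16-bit PTP seconds
        "nanoseconds": (value >> 64) & 0xFFFFFFFF,  # 32-bit PTP nanoseconds
        "reserved": (value >> 56) & 0xFF,
        "origin_id": (value >> 33) & 0x7FFFFF,      # source device information
        "rx_tx": (value >> 32) & 0x1,               # 0 = receive, 1 = transmit
        "fcs": value & 0xFFFFFFFF,
    }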

 

Telemetry streaming operating mechanism

Take Device B in Figure 12 as an example. The operating mechanism of telemetry streaming is as follows:

1.     All devices involved in measurement use PTP to achieve nanosecond-level time synchronization.

2.     The device uses an ACL to filter original packets on the input interface and copies matching packets based on the specified sampling rate.

3.     The device encapsulates the packets with the following types of headers:

¡     Telemetry streaming header (records the input and output interfaces of original packets).

¡     UDP header, Layer 2 header, and Layer 3 header (record the port number and MAC/IP address of the collector).

¡     Input interface timestamp (Rx Timestamp).

¡     Output interface timestamp (Tx Timestamp).

4.     The device sends the sampled packets to the collector. The input and output interface timestamps of the sampled packets include the information of the devices they belong to (device IDs).

Figure 12 Telemetry streaming operating mechanism

 

The collector can calculate the path and delay information from the data collected from multiple nodes, as the sketch after the following calculations illustrates.

·     Transmission delay of traffic passing through the specified device = Tx timestamp of the device – Rx timestamp of the device.

·     Transmission delay of traffic passing through multiple devices = Tx timestamp for the device on which the output interface resides – Rx timestamp for the device on which the input interface resides.
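
A minimal sketch of these two calculations, assuming the timestamps have already been decoded and converted to nanoseconds:

def to_nanoseconds(seconds: int, nanoseconds: int) -> int:
    """Combine the seconds and nanoseconds parts of a PTP timestamp."""
    return seconds * 1_000_000_000 + nanoseconds

def device_delay(rx_ns: int, tx_ns: int) -> int:
    """Delay through one device: its Tx timestamp minus its Rx timestamp."""
    return tx_ns - rx_ns

def path_delay(first_rx_ns: int, last_tx_ns: int) -> int:
    """Delay across several devices: Tx timestamp of the device on which the
    output interface resides minus Rx timestamp of the device on which the
    input interface resides."""
    return last_tx_ns - first_rx_ns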


ERSPAN-based telemetry

ERSPAN overview

ERSPAN is a Layer 3 remote mirroring technology that copies packets passing through a port, VLAN, or CPU and routes the packets to the remote monitoring device through a GRE tunnel for monitoring and troubleshooting.

ERSPAN supports port mirroring and flow mirroring.

Packet encapsulation formats

ERSPANv2 packet encapsulation format

As shown in Figure 13, ERSPANv2 encapsulates mirrored packets in GRE packets with a protocol number of 0x88BE.

Figure 13 ERSPANv2 packet encapsulation format

 

ERSPANv2 adds ERSPANv2 headers to the mirrored packets, recalculates CRCs, and then adds GRE and IPv4 headers to the packets. The meanings of key fields in a GRE header and an ERSPANv2 header are as follows (a packing sketch follows the list):

·     GRE header:

¡     Flags—The S bit is 1, which indicates that a packet can be determined as an in-order or out-of-order packet through the sequence number. Other bits are all 0s.

¡     Ver—Version number, which is fixed at 0.

¡     Protocol type (0x88BE)—The passenger protocol of GRE is ERSPAN type II.

¡     Sequence number—The sequence number increases by 1 when a new packet is received.

·     ERSPANv2 header:

¡     Ver—ERSPAN version number. The version number for ERSPAN type II is 1.

¡     VLAN—Original VLAN of the mirrored packet.

¡     CoS—Original class of service (CoS) of the mirrored packet.

¡     En—Data frame encapsulation type of the ERSPAN traffic source port. Values include:

-     00—Encapsulation without VLAN tags.

-     01—ISL encapsulation.

-     10—802.1Q encapsulation.

-     11—Encapsulation with VLAN tags.

¡     T—A value of 1 indicates that the mirrored packet was fragmented during ERSPAN encapsulation because it exceeded the interface MTU.

¡     Session ID—ERSPAN session ID. This ID must be unique for the same source device and destination device.

¡     Reserved—Reserved field.

¡     Index—Index for the source port and mirroring direction.
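
The following sketch packs these two headers, assuming the common ERSPAN type II bit layout (4-bit Ver, 12-bit VLAN, 3-bit CoS, 2-bit En, 1-bit T, and 10-bit Session ID in the first 32-bit word; 12-bit Reserved and 20-bit Index in the second). Field widths that the list above does not state are assumptions.

import struct

GRE_PROTOCOL_ERSPAN2 = 0x88BE

def gre_header(sequence: int) -> bytes:
    # First 16 bits: only the S (sequence number present) bit set; Ver = 0.
    flags_and_ver = 0x1000
    return struct.pack("!HHI", flags_and_ver, GRE_PROTOCOL_ERSPAN2, sequence)

def erspan2_header(vlan: int, cos: int, en: int, t: int,
                   session_id: int, index: int) -> bytes:
    word1 = ((1 << 28)                 # Ver = 1 for ERSPAN type II
             | ((vlan & 0xFFF) << 16)  # original VLAN
             | ((cos & 0x7) << 13)     # original class of service
             | ((en & 0x3) << 11)      # encapsulation type
             | ((t & 0x1) << 10)       # fragmentation flag
             | (session_id & 0x3FF))   # ERSPAN session ID
    word2 = index & 0xFFFFF            # Reserved bits left at zero
    return struct.pack("!II", word1, word2)

# Example: the 16 bytes inserted between the IPv4 header and the mirrored frame.
encapsulation = gre_header(1) + erspan2_header(
    vlan=10, cos=0, en=0, t=0, session_id=1, index=0)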

ERSPANv3 packet encapsulation format

Compared with ERSPANv2, ERSPANv3 introduces a bigger and more flexible composite header to meet the requirements in increasingly complex and diverse network monitoring scenarios (for example, network management, intrusion detection, and performance and latency analysis). In these scenarios, all parameters of the original mirrored packets must be known, including those parameters that do not exist in the original mirrored packets.

As shown in Figure 14, ERSPANv3 encapsulates mirrored packets in GRE packets with a protocol number of 0x22EB.

Figure 14 ERSPANv3 packet encapsulation format

 

ERSPANv3 adds ERSPANv3 headers to the mirrored packets, recalculates CRCs, and then adds GRE and IPv4 headers to the packets. The meanings of key fields in a GRE header and an ERSPANv3 header are as follows (a parsing sketch follows Figure 15):

·     GRE header:

¡     Flags—The S bit is 1, which indicates that a packet can be determined as an in-order or out-of-order packet through the sequence number. Other bits are all 0s.

¡     Ver—Version number, which is fixed at 0.

¡     Protocol type (0x22EB)—The passenger protocol of GRE is ERSPAN type III.

¡     Sequence number—The sequence number increases by 1 when a new packet is received.

·     ERSPANv3 header:

¡     Ver—ERSPAN version number. The version number for ERSPAN type III is 2.

¡     VLAN—Original VLAN of the mirrored packet.

¡     CoS—Original CoS of the mirrored packet.

¡     BSO—Payload integrity of the data frame carried by ERSPAN. Values include:

-     00—Complete data frame.

-     11—Incomplete data frame.

-     01—Short data frame.

-     10—Oversized data frame.

¡     Session ID—ERSPAN session ID. This ID must be unique for the same source device and destination device.

¡     Timestamp—Derived from a hardware clock synchronized to the system time. This 32-bit field must support at least a timestamp granularity of 100 microseconds. For more information about the timestamp granularity, see the Gra field.

¡     SGT—Security group tag of the mirrored packet, which marks source identity information of the mirrored packet.

¡     P—Protocol tag, which indicates whether the data frame carried by ERSPAN is an Ethernet frame. 1 indicates yes and 0 indicates no.

¡     FT—Indicates whether the mirrored packet is an Ethernet frame or IP packet. Values include:

-     0—Ethernet frame.

-     2—IP packet.

¡     HW ID—Unique identifier of an ERSPAN engine in a system.

¡     D—Direction of the mirrored packet. Values include:

-     0—Inbound direction.

-     1—Outbound direction.

¡     Gra—Timestamp granularity. Values include:

-     00b—100 microseconds.

-     01b—100 nanoseconds.

-     10b—IEEE 1588.

-     11b—User-defined.

¡     O—Indicates whether the platform-specific subheader is carried. 1 indicates yes and 0 indicates no.

¡     Platf ID—Platform-specific subheader ID. Different IDs correspond to different platform-specific subheader encapsulation formats. Currently, the value can only be 0x5.

¡     Platform Specific SubHeader—Platform-specific subheader. Figure 15 shows the detailed platform-specific subheader format.

-     Switch ID—Identifies the source device of the mirrored packet.

-     Port ID/Index—Identifies the destination port on the source device.

-     Timestamp—In this encapsulation format, the Timestamp field in the ERSPANv3 header represents IEEE 1588 nanoseconds, the value for the Gra field is 10b, and this field represents IEEE 1588 seconds.

Figure 15 Platform-specific subheader format
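
For symmetry with the ERSPANv2 sketch earlier, the following parses the fixed 12-byte part of an ERSPANv3 header using the widths implied above (4-bit Ver, 12-bit VLAN, 3-bit CoS, 2-bit BSO, 1-bit T, and 10-bit Session ID; a 32-bit Timestamp; then 16-bit SGT, 1-bit P, 5-bit FT, 6-bit HW ID, 1-bit D, 2-bit Gra, and 1-bit O). Treat widths the list above does not state as assumptions.

import struct

def parse_erspan3_header(buf: bytes) -> dict:
    """Decode the 12-byte fixed part of an ERSPANv3 header (sketch)."""
    word1, timestamp, word3 = struct.unpack("!III", buf[:12])
    return {
        "ver": word1 >> 28,            # 2 for ERSPAN type III
        "vlan": (word1 >> 16) & 0xFFF,
        "cos": (word1 >> 13) & 0x7,
        "bso": (word1 >> 11) & 0x3,    # payload integrity
        "t": (word1 >> 10) & 0x1,
        "session_id": word1 & 0x3FF,
        "timestamp": timestamp,
        "sgt": word3 >> 16,            # security group tag
        "p": (word3 >> 15) & 0x1,      # Ethernet frame indicator
        "ft": (word3 >> 10) & 0x1F,    # frame type
        "hw_id": (word3 >> 4) & 0x3F,  # ERSPAN engine ID
        "d": (word3 >> 3) & 0x1,       # direction
        "gra": (word3 >> 1) & 0x3,     # timestamp granularity
        "o": word3 & 0x1,              # platform-specific subheader flag
    }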

 

Port mirroring ERSPAN

Network architecture

Port mirroring ERSPAN contains the following components:

·     Mirroring source—The mirroring sources can be one or more monitored ports (called source ports), VLANs (called source VLANs), or CPUs (called source CPUs). Packets passing through mirroring sources are copied and sent to a data monitoring device for monitoring and analysis.

·     Source device—The device where the mirroring sources reside.

·     Mirroring destination—The mirroring destination connects to a data monitoring device and is the destination port (also known as the monitor port) of mirrored packets. Mirrored packets are sent out of the monitor port to the data monitoring device.

·     Destination device—The device where the monitor port resides.

·     Data monitoring device—The device that receives and analyzes the mirrored packets.

Operating mechanism

Port mirroring ERSPAN can be implemented in tunnel mode and encapsulation parameter mode.

Tunnel mode

Configure the mirroring sources and destination for the local mirroring groups on the source device and destination device as follows:

·     On the source device:

¡     Configure the ports to be monitored as source ports.

¡     Configure the VLANs to be monitored as source VLANs.

¡     Configure the CPUs to be monitored as source CPUs.

¡     Configure the tunnel interface through which mirrored packets are forwarded to the destination device as the monitor port.

·     On the destination device:

¡     Configure the physical port corresponding to the tunnel interface as the source port.

¡     Configure the VLAN of the physical port corresponding to the tunnel interface as the source VLAN.

¡     Configure the port that connects to the data monitoring device as the monitor port.

As shown in Figure 16, Layer 3 remote port mirroring in tunnel mode works as follows:

1.     The source device sends one copy of a packet in the inbound, outbound, or bidirectional direction to the tunnel interface. The tunnel interface acts as the monitor port in the local mirroring group created on the source device.

2.     The tunnel interface on the source device forwards the mirrored packet to the tunnel interface on the destination device through the GRE tunnel.

3.     The destination device receives the mirrored packet from the physical interface of the tunnel interface. The tunnel interface acts as the source port in the local mirroring group created on the destination device.

4.     The physical interface of the tunnel interface sends one copy of the packet to the monitor port (Port B).

5.     The monitor port (Port B) forwards the packet to the data monitoring device.

Figure 16 Layer 3 remote port mirroring in tunnel mode

Encapsulation parameter mode

To implement Layer 3 remote port mirroring in encapsulation parameter mode, perform the following tasks:

1.     On the source device, create a local mirroring group and configure the mirroring sources, the monitor port, and the encapsulation parameters for mirrored packets.

2.     On all devices from source to destination, configure a unicast routing protocol to ensure Layer 3 reachability between the devices.

Create a local mirroring group on the source device, and specify the source ports and monitor port for the local mirroring group. When you configure the monitor port, specify the following encapsulation parameters for mirrored packets:

·     Monitoring device IP address as destination IP address.

·     Monitor port IP address as source IP address.

As shown in Figure 17, Layer 3 remote port mirroring in encapsulation parameter mode works as follows:

1.     The source device copies a packet passing through a source port in the inbound, outbound, or bidirectional direction.

2.     The source device encapsulates the copied packet with the specified encapsulation parameters, using the monitoring device IP address as the destination IP address and the monitor port IP address as the source IP address.

3.     The encapsulated packet is routed to the monitoring device through the IP network.

4.     The monitoring device decapsulates the packet and analyzes the packet contents.

The packet sent to the monitoring device through Layer 3 remote port mirroring in encapsulation parameter mode is encapsulated. In this mode, make sure the monitoring device supports decapsulating packets.

Figure 17 Layer 3 remote port mirroring in encapsulation parameter mode

 

Flow mirroring ERSPAN

About flow mirroring ERSPAN

Flow mirroring copies packets matching a class to a destination for packet analyzing and monitoring. It is implemented through QoS.

To implement flow mirroring through QoS, perform the following tasks:

1.     Define traffic classes and configure match criteria to classify packets to be mirrored. Flow mirroring allows you to flexibly classify packets to be analyzed by defining match criteria.

2.     Configure traffic behaviors to mirror the matching packets to the specified destination.

When the flow mirroring destination is an interface, you can use flow mirroring to implement ERSPAN.

Operating mechanisms

Flow mirroring ERSPAN can be implemented in the following modes:

·     Loopback mode.

·     Encapsulation parameter mode.

·     Monitoring group mode.

Loopback mode

As shown in Figure 18, configure flow mirroring ERSPAN in loopback mode as follows:

1.     On the source device, apply a QoS policy to the source interface as follows:

a.     Configure a traffic class to match packets.

b.     Configure a traffic behavior to mirror packets to Port B and specify the loopback keyword.

c.     Create a QoS policy, and associate the traffic class with the traffic behavior.

d.     Apply the QoS policy to the source interface.

2.     On the source device, apply a QoS policy to Port B as follows:

a.     Configure a traffic class to match packets.

b.     Configure a traffic behavior to redirect packets to a tunnel interface.

c.     Create a QoS policy, and associate the traffic class with the traffic behavior.

d.     Apply the QoS policy to Port B.

3.     The destination device receives mirrored packets on the tunnel interface and decapsulates the packets. Then, the destination device forwards the packets based on the destination IP address of the original packets. Make sure the destination device has the route and ARP entry to the destination IP address.

Figure 18 Flow mirroring ERSPAN in loopback mode

 

Encapsulation parameter mode

In this mode, configure a QoS policy on the source device. Configure the QoS policy as follows:

1.     Configure a traffic class to match packets.

2.     Configure a traffic behavior to flow-mirror traffic to an interface.

3.     Associate the traffic class with the traffic behavior.

You can configure flow-mirroring traffic to an interface in one of the following modes:

·     Directly specifying an outgoing interface—In this mode, specify both the outgoing interface and encapsulation parameters. The device encapsulates packets with the specified parameters and then forwards packets out of the specified interface.

·     Specifying an outgoing interface through route lookup—In this mode, specify only encapsulation parameters without specifying an outgoing interface. The device looks up a route for the encapsulated mirrored packets based on the source IP address and destination IP address of the encapsulated packets. The outgoing interface of the route is a destination interface of the mirrored packets.

In this mode, you can use the load sharing function of a routing protocol to forward mirrored packets to multiple destination interfaces.

As shown in Figure 19, flow mirroring ERSPAN in encapsulation parameter mode works as follows:

1.     The source device copies a matching packet.

2.     The source device encapsulates the packet with the specified ERSPAN encapsulation parameters.

3.     The source device forwards the packet in either of the following methods:

¡     Forwards the mirrored packets out of the specified outgoing interface.

¡     Looks up a route for the encapsulated mirrored packet based on the source IP address and destination IP address of the encapsulated packet.

4.     The encapsulated packet is routed to the monitoring device through the IP network.

5.     The monitoring device decapsulates the packet and analyzes the packet contents.

The packet sent to the monitoring device through flow mirroring in this mode is encapsulated. In this mode, make sure the monitoring device supports decapsulating packets.

Figure 19 Flow mirroring ERSPAN in encapsulation parameter mode

 

Monitoring group mode

As shown in Figure 20, flow mirroring ERSPAN in monitoring group mode works as follows:

1.     On the source device, configure a monitoring group, add member interfaces to the monitoring group, and configure the encapsulation parameters for the member interfaces.

2.     On the source device, apply a QoS policy as follows:

a.     Configure a traffic class to match packets.

b.     Configure a traffic behavior to mirror traffic to the monitoring group.

c.     Create a QoS policy, and associate the traffic class with the traffic behavior in the QoS policy.

d.     Apply the QoS policy.

3.     The source device copies a matching packet and mirrors the packet to the monitoring group. The member interfaces of the monitoring group encapsulate the packet with the specified encapsulation parameters.

4.     The source device forwards the packet in either of the following methods:

¡     Forwards the mirrored packet out of the specified outgoing interface.

¡     Looks up a route for the encapsulated mirrored packet based on the source IP address and destination IP address of the encapsulated packet.

5.     The encapsulated packet is routed to the monitoring device through the IP network.

6.     The monitoring device decapsulates the packet and analyzes the packet contents.

The packet sent to the monitoring device through flow mirroring in this mode is encapsulated. In this mode, make sure the monitoring device supports decapsulating packets.

Figure 20 Flow mirroring ERSPAN in monitoring group mode

 

 

Application scenarios

ERSPAN allows you to mirror packets of interest to a remote analyzer for analysis and monitoring. For example:

·     You can monitor TCP connection establishment if the mirrored packets include TCP three-way handshake packets.

·     You can monitor the RDMA session state if the mirrored packets include RDMA signaling packets.

Figure 21 ERSPAN-based telemetry application scenario

 

As shown in Figure 21, the administrator monitors the status of the TCP connection between the source and destination by using ERSPAN-based telemetry. The monitoring process is as follows:

1.     The source initiates a TCP connection establishment request by sending a TCP SYN packet to the destination.

2.     The switch along the path captures the TCP SYN packet, encapsulates it into an ERSPAN mirrored packet, and sends the mirrored packet to the remote analyzer through a GRE tunnel.

3.     The analyzer decapsulates and analyzes the ERSPAN mirrored packet.

4.     The switch along the path captures subsequent TCP control packets (SYN/FIN/RST packets) and sends them to the analyzer.

5.     The analyzer obtains the packet forwarding path based on the received ERSPAN mirrored packets. Working with telemetry that reports interface queue information in real time, it implements application experience analysis.


Cloud platform-based telemetry

Cloud platform-based telemetry overview

Cloud platform-based telemetry is a wireless network monitoring solution designed for cloud-based network management scenarios. It is applicable to AD-Campus and Cloudnet. This technology uses the cloud management tunnel protocol between the AC and the cloud platform to automatically collect and report running data for APs and clients, providing lightweight and centralized operations support for large-scale wireless networks.

 

 

NOTE:

This document uses Cloudnet as an example.

 

Network architecture

Cloud platform-based telemetry uses a layered distributed architecture to uniformly collect and manage wireless network data. The detailed functions of each component are as follows:

·     AP—Manages client access, collects radio status in real time, and reports raw running data.

·     AC—Processes data across the entire link. The aggregation node collects multi-dimensional AP data (such as client association, traffic statistics, and channel interference), the edge compute node preprocesses data and extracts characteristics, and then the protocol conversion gateway generates standardized and timestamped cloud management tunnel protocol packets.

·     Cloudnet platform—Integrates the data lake and the analysis engine core modules. It uses the data lake to provide efficient storage and retrieval of massive device running data, and the analysis engine to provide intelligent capabilities such as wireless network health scoring and root cause diagnostics.

·     Third-party system—Connects to Cloudnet via standard RESTful APIs and supports secure and controllable data exchange and function integration.

Figure 22 Network architecture

 

Operating mechanism

Cloud platform-based telemetry supports the following data report modes:

·     Data aggregation and report by the AC—Default mode. This mode is automatically used after the AC connects to Cloudnet. In this mode, the AC aggregates data from multiple APs and centrally reports it to Cloudnet.

·     Direct data report by APs—Applicable to scenarios that require high-precision data collection. The administrator can use the AC to deploy configuration to enable this mode. In this mode, each AP directly reports running data to Cloudnet, which improves the timeliness of data collection.

You can flexibly combine the two modes. You can configure only the mode in which data is aggregated and reported by the AC, or additionally configure the other mode to meet requirements of different scenarios. This ensures efficient collection and management of wireless network data.

Figure 23 Flow for data aggregation and report by the AC

 

Figure 24 Flow for direct data report by APs

 

Data aggregation and report by the AC

This mode uses layered data collection and centralized reporting of aggregated data.

1.     The administrator enables APs to report wireless client statistics and radio statistics via the AC, and specifies the report interval.

2.     APs sample data periodically and report metrics to the AC, including information about all online clients, AP information, channel usage, and retransmission rate. AP information refers to running status information about APs, for example, CPU and memory status. Online client information includes information about RSSIs and traffic.

3.     The AC monitors key events such as client roaming and radio frequency interference based on the predefined policy and performs event-triggered data collection.

4.     The AC periodically reports both its own information and the collected information to Cloudnet at the default granularity (5 minutes). AC information is its running status information, for example, CPU and memory status.

5.     To ensure efficient data transmission, this mode uses the WebSocket protocol and encapsulates data in JSON format (see the sketch after Figure 25).

6.     The encapsulated data is uploaded to Cloudnet and displayed on the Client Connection Info page under the Smart O&M tab.

Figure 25 Viewing client connection information
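
As a rough illustration of step 5 above, the following sketch sends one aggregated JSON report over WebSocket using the third-party websockets package. The endpoint URL and JSON field names are hypothetical; the actual schema is defined by the cloud management tunnel protocol.

import asyncio
import json
import time

import websockets  # third-party package: pip install websockets

# Hypothetical endpoint and payload schema for illustration only.
CLOUD_ENDPOINT = "wss://cloudnet.example.com/telemetry"

async def report_aggregated_data(ap_records: list) -> None:
    payload = json.dumps({
        "reporter": "AC",
        "timestamp": int(time.time()),
        "aps": ap_records,  # per-AP status aggregated by the AC
    })
    async with websockets.connect(CLOUD_ENDPOINT) as connection:
        await connection.send(payload)

asyncio.run(report_aggregated_data(
    [{"ap": "ap1", "cpu": 23, "memory": 41, "clients": 12}]
))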

 

Direct data report by APs

This mode is applicable to scenarios that require high-precision data collection.

1.     The administrator flexibly configures the address of the server to which APs' operations data is reported and the report interval.

2.     APs sample data periodically and report information about all online clients to Cloudnet. Online client information includes information about RSSIs and traffic.

3.     To ensure efficient data transmission, this mode uses the UDP protocol and encapsulates data in JSON format (see the sketch after Figure 26).

4.     The encapsulated data is uploaded to Cloudnet and displayed on the Client Connection Info page under the Smart O&M tab. If you use this mode, turn on the Collect At 1 Min feature on the Client Connection Info page.

Figure 26 Viewing client connection information (when the Collect At 1 Min feature is turned on)
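
Similarly, a minimal sketch of the direct report in step 3 above, sending one JSON payload over UDP. The collector address and field names are hypothetical:

import json
import socket
import time

# Hypothetical collector address and payload schema for illustration only.
COLLECTOR = ("203.0.113.10", 9020)

def report_client_stats(clients: list) -> None:
    payload = json.dumps({
        "reporter": "AP",
        "timestamp": int(time.time()),
        "clients": clients,  # per-client RSSI and traffic data
    }).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, COLLECTOR)

report_client_stats([{"mac": "0001-0203-0405", "rssi": -52, "rx_bytes": 10240}])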

 

Typical networking

Cloud platform-based telemetry is applicable to the AC+AP architecture. In this architecture, a cloud management tunnel is established between the cloud platform and the AC to ensure secure and reliable data transmission. The AC establishes CAPWAP tunnels to APs to realize efficient data exchange and management. APs collect data in real time and either report it to the AC or directly report it to the cloud platform. The AC processes data from multiple APs and sends collected data to the cloud platform for comprehensive network monitoring and management.

Figure 27 Typical networking for cloud platform-based telemetry
