H3C AMD G6 Servers RAS Technology White Paper-6W101

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.

Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.

The information in this document is subject to change without notice.

Contents

Overview
Benefits
Applicable products
Using this document
RAS system architecture
RAS operating mechanism
Error detection
Error reporting
Error processing
Memory CE funnel and threshold mechanism
Memory CE funnel mechanism
Memory CE threshold
SMI interrupt control
Memory CE report method
Reliability
Memory RAS
On-chip ECC/Parity
DRAM ECC
DRAM Error Check and Scrub (ECS)
DRAM Writeback Suppression on X4 Writes
DRAM UECC Retry
DRAM Address/Command Parity with Replay
DRAM Write Data CRC with Replay
DRAM Read Data CRC with Replay
DRAM Patrol Scrubber
DRAM Redirect Scrubber
GMI/xGMI RAS
On-Package Link Errors
Off-Package Link Errors
PCIe RAS
PCIe® End-to-End CRC
Availability
Data Poisoning
Fatal Error Recovery
Machine Overflow Recovery
CPU Watchdog Timers
Decoding Last I/O Addresses Table
DRAM Thermal Throttling
CPU Thermal Throttling
PCIe speed reduction and bandwidth reduction detection
PCIe Downstream Port Containment and Error Disconnect Recovery
PCIe Orderly and Surprise Hotplug
DRAM Memory Tester
Automatic Boot-time DIMM MapOut
Automatic Boot-time Core Disable
Serviceability
In-band Error Reporting
Machine Check Architecture
Platform First Error Handling
PCIe® Advanced Error Reporting (AER)
CXL™ Error Handling
CXL™ Protocol Error Reporting
CXL™ Component Error Reporting
PMIC Error Handling
Machine Check Alert
Fatal Error Alert
Thermal Throttle Alert
MCA Mailbox
Legacy SB-RMI MCA Interface
Reset Reason Mailbox
ACPI Platform Error Interface (APEI)
APEI Boot Error Record Table (BERT)
APEI Hardware Error Status Table (HEST)
APEI Error Injection Table (EINJ)
Out of Band Error Monitoring
DRAM Corrected Error Leaky Bucket Counters
Boot Status Indicators
DRAM Post-Package Repair (CPU)
MCA Address Translation
MCA FruText and DIMM FRU Identification
Platform RAS features
ABL RAS
Error Injection
IPMI command classification for RAS reporting
SMI storm suppression
SWITCH RAS

Overview

The server is one of the key components of any modern data center infrastructure. It includes various components such as processors, storage devices, PCIe devices, power supplies, and fans. To ensure service continuity, correct server operation based on data integrity is critical to a modern data center. In other words, data corruption must be avoided whether data is stored in a server component (memory, cache, or processor registers) or transmitted over a platform link (xGMI or PCI Express).

When a server component fails, the set of reliability, availability and serviceability (RAS) features can meet the above requirements by maximizing service availability and maintaining data integrity.

Table 1 RAS definition based on H3C G6 servers

Item	Definition
Reliability	Probability that the system produces the correct output within a given time T, as measured by the mean time between failures (MTBF) metric. It can be enhanced by avoiding, detecting, and repairing hardware failures. A reliable system does not provide incorrect data and calculation results, but can detect and correct data corruption.
Availability	Probability that the system is running correctly at a given time, expressed as the percentage of the actual running time of the server to the expected running time.
Serviceability	How easy and fast the system can be repaired or maintained. If the time for the system to repair errors increases, the serviceability decreases. Serviceability can be improved by simplifying system issue diagnosis and providing clear and intelligent advance warnings of failures to avoid system failures.
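
The reliability and availability definitions above can be tied together with a small calculation. The sketch below uses the standard steady-state relationship availability = MTBF / (MTBF + MTTR), where MTTR is the mean time to repair. This formula and the function names are assumptions added for illustration and are not taken from H3C or AMD documentation.

```python
HOURS_PER_YEAR = 8760.0  # 365 days, ignoring leap years


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: share of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail):
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    a = availability(mtbf_hours=999.0, mttr_hours=1.0)
    print(f"availability = {a:.4f}")
    print(f"downtime per year = {annual_downtime_hours(a):.2f} h")
```

The example shows why both sides matter: raising MTBF (reliability) or lowering MTTR (serviceability) each increases availability.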

Errors are classified, from high to low severity, into uncorrectable errors, delayed errors, correctable errors, and transparent errors.

Table 2 Error categories

Item	Definition
Uncorrectable errors	Errors that cannot be corrected by hardware are reported to software through Machine Check Exception (MCE).
Delayed errors	Errors that cannot be corrected by hardware but do not result in immediate interruption of program progress, loss of data integrity, or corruption of processor state. A delayed error indicates data that is corrupted but has not yet been used, such as poisoned data.
Correctable errors	Correctable errors can be fixed by hardware and do not lead to data loss or processor state corruption.
Transparent errors	Errors that have been corrected by the hardware and do not result in data loss or processor state corruption, where the hardware remains capable of correcting the next error in the same device. For example, a single-bit error in a cache with double-bit error correction capability.

Benefits

RAS can provide the following benefits:

· Increased system uptime—Increases system reliability so that the system stays up longer, as measured by the Mean Time Between Failures (MTBF), Annual Crash Rate (ACR), or Annual Service Rate (ASR) metric.

· Reduced duration of unexpected downtime—Uses AMD EPYC processors, which support synchronized hardware and firmware logging, to help users identify and isolate errors and take preventive or proactive maintenance measures. This enables quick system restoration, reduces the cost of repairs, and mitigates the consequences of the outage to the business.

Outages are inevitable even with the best plans and processes. When an unplanned outage happens, a maintainable system can come back online quickly, as measured by the Mean Time to Repair (MTTR) metric.

· Enhanced data integrity—RAS provides several mechanisms to prevent data corruption or correct poisoned data, so that data corruption is contained once detected.

Applicable products

This document is applicable to the following H3C UniServer servers:

· H3C UniServer R3950 G6

· H3C UniServer R4950 G6

· H3C UniServer R5350 G6

· H3C UniServer R5500 G6 AMD

Using this document

The information in this document is subject to change over time.

The information in this document might differ from your product if it contains custom configuration options or features.

RAS system architecture

Based on the AMD RAS architecture, H3C provides a complete fault management system that combines hardware, the BIOS, HDM, and OS error processing mechanisms. The system provides functions such as error diagnosis, error location, error correction, information collection, and error reporting. Because the core of the system runs on the BIOS and HDM, it does not rely on the OS. It can monitor the system at all times and take corresponding measures once an error occurs.

As shown in Figure 1, the fault management system contains the hardware layer, CPLD, processor platform, HDM (out-of-band management), the BIOS, and OS.

· HDM—Core of the error location system. It is responsible for error information collection and analysis and can display error information as event logs or alarms from the Web interface.

· Processor platform—Supported by AMD EPYC processors, which provide more powerful management of errors that occur on processors, memory modules, and PCIe devices.

· CPLD—Connects downlink hardware modules, including power supplies, fans, and other underlying hardware (except processors, memory modules, drives, and standard PCIe modules), captures hardware exceptions, connects to HDM at the uplink, and transmits error information.

· BIOS—Collects and locates errors that occur on processors, memory modules, PCIe devices, and storage devices, provides error location results to HDM, and provides OS-level error management interfaces, such as APEI, to the OS.

· Web interface—Web interface provided by management tools, such as HDM, for users to maintain the server locally or remotely. Users can use the Web interface together with LEDs of specific server components to manage the server.

· Involved protocols—Protocols used by the fault management system include eSPI, SPI, PCIe, UART, I2C, SMBus, and LocalBus.

Figure 1 H3C fault management system architecture

RAS operating mechanism

The basic error processing schemes of RAS are as follows:

· For transparent errors, no further action is required at the upper layers, as the hardware has already corrected the errors.

· For correctable errors, RAS marks the error location and quickly repairs the corresponding module. Users are not aware that such errors have occurred.

· For delayed errors, the location of the poisoned data is marked immediately, and limiting the spread of poisoned data prevents delayed errors from evolving into more serious problems.

· For uncorrectable errors, RAS isolates the errors by isolating bad memory blocks or degrading the bus to maintain system operation. If severe errors occur and result in system outage, you must use HDM to restore or restart the system.

RAS technology for AMD G6 servers is realized based on the following mechanisms:

· Maintain reliable operation through robust components and maximum error detection.

· Avoid system failures by reducing global errors, correcting errors, and continuing operation despite uncorrectable errors.

· Avoid system downtime (mainly for planned downtime) through the following methods:

¡ First-time error diagnosis: Try to capture enough data to complete the error diagnosis the first time the error occurs.

¡ Internal redundancy design and online maintenance capabilities: Allows for continuous operation without replacing parts or problem-solving without shutting down. For more information about redundancy design and online maintenance, see the detailed introduction.

· Flexibility: The choice of whether to enable a function is primarily implemented in the firmware, not in the hardware.

The new and improved RAS features in the AMD Genoa/G6 are as follows:

· Reliability improvement:

¡ Support Advanced Memory Device Correction (AMDC).

¡ Support DRAM Error Check and Scrub (ECS).

¡ Support DRAM read and write data CRC.

¡ Support PCIe/CXL LCRC + replay.

¡ Support PCIe/CXL.io ECRC.

¡ Support PCIe/CXL.io uncorrected error detection.

· Availability improvement:

¡ Support on-chip watchdog timers.

¡ Support PCIe system firmware intermediary.

· Serviceability improvement (diagnosis and repair):

¡ Support in-band error reporting.

¡ Support memory error reporting.

¡ Support out-of-band error monitoring.

¡ Support DRAM runtime post-package repair.

¡ Support error injection.

¡ Support error injection on secure silicon.

· Added features:

¡ DRAM Corrected Error Leaky Bucket Counters.

¡ Advanced Platform Management Link interface (APML).

Figure 2 RAS schemes

The following introduces the key points of AMD RAS technology according to the processes of error detection, reporting, and processing.

Error detection

Error detection on AMD G6 series servers is mainly achieved through ECC, parity, and CRC in key components and areas. Errors fall into three main types: data errors, bus errors, and logic function errors. The common detection methods for each type are as follows:

· Data modules generally use ECC and parity for detection.

· Bus interface modules use CRC for detection.

· Logic function modules use timeouts for detection.

Error correction is generally achieved through ECC and Retry. Table 3 shows the error detection and correction capabilities of the main data and bus modules.

Table 3 Error detection methods for different modules

Module	Detection method
CPU Core	Parity
L1 Data Cache	ECC
L1 Data Tag	ECC
L1 Instruction TLB	Parity + Retry
L1 Instruction Cache	Parity + Retry
L1 Instruction Tag	Parity + Retry
L2/L3 Cache	DEC-TED ECC
L2/L3 Cache Tag	SEC-DED ECC
DRAM Address/Command	Parity + Replay
DRAM Write Data	CRC + Replay
Memory Controller	SEC-DED ECC
Memory Controller DF	Parity
NBIO, PCIE and NBIF	Parity + ECC
SATA	Parity
USB	ECC
FCH A-Link	Parity
On-Chip Data Bus	Parity
Off-Package Link Packet	Parity + Retry
On-package and off-package PHY Controller	ECC
System Probe Filter	ECC
System Management Network	Parity
SMN Off-Package Link Packet	CRC + Retry
SMN On-Package Link Packet	CRC + Retry
SMU	Parity + ECC
MP5	Parity + ECC
PSP	Parity + ECC
Parameter Block	ECC

The most error-prone part of the CPU is its data storage, that is, the caches at all levels. The AMD G6 server series offers comprehensive cache protection mechanisms, including ECC for the L1 data cache, parity check for the L1 instruction cache, and double-error correction, triple-error detection (DEC-TED) ECC for the L2 and L3 caches. Compared with traditional single-error correction, double-error detection (SEC-DED) ECC, DEC-TED adds an additional level of correction. DEC-TED reduces the impact of multi-bit transient errors by correcting double-bit errors. It also reduces the impact of single-bit hard faults, because it can still correct a second single-bit error in the affected part of the cache.
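
To make the SEC-DED behavior concrete, the following is a toy sketch of an extended Hamming (8,4) code: 4 data bits, 3 Hamming check bits, and 1 overall parity bit. It corrects any single-bit error and detects any double-bit error. This is an illustration only. The codes actually used in the processor caches are much wider, and DEC-TED requires a stronger code than the one shown here. The function names and bit layout are assumptions made for the example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SEC-DED codeword (list of bits)."""
    code = [0] * 8
    # Data bits sit at positions 3, 5, 6, 7 (classic Hamming layout).
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    # Check bits at positions 1, 2, 4 cover positions sharing that index bit.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Position 0 holds the overall parity, making the total parity even.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); nibble is None if the error is uncorrectable."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of set positions is 0 when valid
    parity_odd = sum(code) % 2 == 1  # True when an odd number of bits flipped
    fixed = list(code)
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        fixed[syndrome] ^= 1         # single-bit error; syndrome 0 is bit 0
        status = "corrected"
    else:
        return "uncorrectable", None  # even number of flips: detect only
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return status, nibble
```

Flipping one bit of an encoded word yields `"corrected"` with the original data, while flipping two bits yields `"uncorrectable"`, which is the case that DEC-TED would still be able to correct.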

In addition to ECC, the memory also supports Patrol Scrubber, Redirect Scrubber, and Poison Scrubber. Some memory locations might not be accessed for a long time. The Patrol Scrubber periodically traverses all memory regions and corrects any correctable errors it finds. It is an effective, low-cost method for checking memory and enhancing data integrity. The Redirect Scrubber corrects a correctable error found when a CPU instruction reads memory data and writes the corrected data back to memory. The AMD G6 series server also supports Poison Scrubber, which marks the location of poisoned data to prevent unnecessary delayed errors from being reported.

Error reporting

The system reports detected errors and generates error log messages. The AMD G6 series server supports four error reporting methods: MCA, MCAX, AER, and APML.

MCA records processor and system hardware errors and reports them to the operating system. When an error is detected, MCA saves information about the error in specific registers and reports the error through the interrupt type specified for it. For errors that hardware corrects, software logs the error information for subsequent diagnosis and repair. An error that hardware cannot correct is classified as an uncorrectable error. When an uncorrectable error occurs, the system immediately takes steps to correct the error and resume the interrupted program. If the error cannot be corrected by software, MCA determines the extent of the impact of the uncorrected error on the execution instruction flow and the architectural state of the processor or system, and then contains the failure by terminating the affected software process.

For errors that are not corrected but have no direct impact on the system, the processor core, or the architectural state of any currently executing thread, the hardware may classify them as delayed errors. Information about delayed errors is logged but not reported through MCE. Instead, the hardware continues to monitor these errors and upgrades them to uncorrectable errors when an error condition is about to affect the execution of the instruction flow or corrupt the processor core or system architectural state. At that point, if reporting for the error source is enabled, an MCE is raised to report the error. If the error can be corrected at the system level, it may be possible to restore the affected programs. If the error cannot be corrected, the operating system can terminate the affected program without shutting down the entire system.

The processor also supports Machine Check Architecture Extensions (MCAX), AMD's x86-64 extension of MCA. MCAX provides more MCA banks than traditional MCA and assigns bank ownership down to a single core. Because each bank is owned by a single core, its MCA registers can be accessed only from the owning thread, which helps ensure that interrupts from a machine check bank are routed to the appropriate thread. The MCAX architecture supports up to 256 MCA banks, each with 16 registers. The expanded MCA architecture holds more information about system health, which can be queried to speed up troubleshooting.

Advanced Error Reporting (AER) is mainly used to report and handle PCIe errors. AER can locate the error source in the PCIe architecture, provides a standardized control mechanism for error messages received by PCIe Root through interrupt reporting, and can distinguish the severity of various uncorrectable errors.

Advanced Platform Management Link (APML), also known as the Sideband Interface (SBI), follows the SMBus protocol. APML provides an interface for out-of-band access to MCA registers. HDM, the server's out-of-band management system, can use the APML interface to obtain error information directly.

Error processing

If an error is detected but cannot be corrected, its impact is minimized through data poisoning or fault tolerance on the link.

Data poisoning is a mechanism that enables machine check recovery by transforming global uncorrected errors into local uncorrected errors, which reduces the frequency of system interruptions. Data poisoning involves detecting, marking, and tracking uncorrectable data errors. As poisoned data moves through the system, the poisoned state is retained in memory, caches, and links so that the poisoned data is not used by the system. Without data poisoning, an uncorrectable data error can lead to a fatal condition for the whole system. With data poisoning, an uncorrectable data error can be contained within the context of a process. When the system terminates the affected process, other processes and the system itself are not affected, and the system remains available despite the data error. Poison consumption occurs when the CPU attempts to use data marked as poisoned. Poison consumption raises a Machine Check Exception (MCE) in the context of the consuming process or task, which enables MCA to identify the process that consumed the data and proceed with recovery or termination. When the processor detects poisoned data, it logs a delayed error in the MCA bank to identify the location of the uncorrectable data error. This helps determine whether a hardware failure has occurred and allows the system to be reconfigured to avoid the faulty hardware.

In terms of memory, AMD G6 series servers support the JEDEC-defined Post Package Repair (PPR) feature, which supports configuring spare DRAM rows to replace faulty ones. The combination of hardware and firmware support from the processor enables both soft (reconfigurable) and hard (permanent) repair, ensuring that DIMMs with problematic DRAM rows can be reconfigured and maintain the same level of reliability as before the issue occurred.

In terms of PCIe, the PCIe bus is protected by AER and EDPC, which can help isolate and recover from errors on the PCIe link. When an uncorrectable error occurs on a PCIe root port, the EDPC feature can recover by disconnecting and reconnecting the affected PCIe link. To ensure uninterrupted operation of devices under the OS, the PCIe root port saves the context of the PCIe endpoint device before disconnecting the link. After the link is restored, the context information of the PCIe endpoint is accurately recovered to ensure the system operates unaffected.

Memory CE funnel and threshold mechanism

Memory CE funnel mechanism

The memory CE funnel mechanism manages and controls Correctable Error (CE) events on the AMD G6 platform. The funnel decrements the CE count by one at a set interval, that is, it discards one counted CE per interval, which controls the overall frequency of error event reporting. The AMD Genoa platform provides the following features:

· Funnel enablement register: Used to enable or disable the funnel feature. The disabled state is equivalent to the faucet in Figure 3 being closed.

· Speed register: Used to set the leak rate, which is equivalent to how far the faucet in Figure 3 is opened.

Figure 3 Memory CE funnel mechanism

Selecting the LeakMode or NoLeakMode mode

Set the DRAM Corrected Error Counter Enable value from the BIOS Setup Utility. Options include:

· LeakMode—Enable the funnel feature.

· NoLeakMode—Disable the funnel feature.

Figure 4 Selecting the LeakMode or NoLeakMode mode

Setting the Leak Rate

Set the DRAM Corrected Error Counter Leak Rate value from the BIOS Setup Utility.

Figure 5 Setting the Leak Rate

Figure 6 shows available values of the leak rate. The default value is 07h, which leaks one memory CE every 10.24 µs.

Based on experience, discarding about one CE per second is generally recommended. The closest available option is 18h, which discards one CE every 1.34 seconds.

Figure 6 Available values of the leak rate
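
The two leak rate values quoted above (07h for 10.24 µs and 18h for 1.34 s) are consistent with a period that doubles with each step of the setting, starting from a base of 80 ns. The sketch below encodes that relationship. It is inferred only from those two data points, so treat both the doubling rule and the 80 ns base as assumptions and use Figure 6 or the BIOS Setup Utility as the authority.

```python
BASE_PERIOD_NS = 80  # assumed: 10.24 us divided by 2**7


def leak_period_seconds(setting):
    """Estimated time to leak one CE for a given Leak Rate setting."""
    if setting < 0:
        raise ValueError("setting must be non-negative")
    return BASE_PERIOD_NS * (1 << setting) / 1e9


if __name__ == "__main__":
    for setting in (0x07, 0x17, 0x18):
        print(f"{setting:02X}h -> {leak_period_seconds(setting):.6g} s")
```

Under this assumption, 17h would leak one CE about every 0.67 s and 18h about every 1.34 s, so 18h is the first setting that leaks no faster than one CE per second.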

Memory CE threshold

The memory CEs are accumulated by a counter. The counter starts at the Start Count value and increases by 1 every time a CE occurs. When the counter value reaches 0xFFFF, the counter overflows, which triggers an SMI interrupt. The BIOS starts to execute the interrupt.

Figure 7 Memory CE threshold
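
The interaction between the funnel and the counter can be sketched as a small simulation. The class below is a hypothetical software model, not the hardware implementation. It assumes that the counter never leaks below its Start Count and that it is reloaded with Start Count after it reaches the limit. Neither assumption is confirmed by the text above.

```python
class LeakyCeCounter:
    """Toy model of a CE counter with a start value, a limit, and a leak."""

    def __init__(self, start_count, leak_period_us, limit=0xFFFF):
        self.start_count = start_count
        self.leak_period_us = leak_period_us
        self.limit = limit
        self.count = start_count
        self._since_leak_us = 0

    def advance(self, elapsed_us):
        """Let time pass and leak one count per elapsed leak period."""
        self._since_leak_us += elapsed_us
        leaks, self._since_leak_us = divmod(self._since_leak_us,
                                            self.leak_period_us)
        # Assumption: the counter does not drop below its start value.
        self.count = max(self.start_count, self.count - leaks)

    def record_ce(self):
        """Count one CE; return True if the limit is hit (SMI would fire)."""
        self.count += 1
        if self.count >= self.limit:
            self.count = self.start_count  # assumed reload after the SMI
            return True
        return False
```

With a threshold of three and a one-second leak period, two CEs followed by two quiet seconds leave the counter back at its start value, so three further CEs are needed before the model signals an SMI.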

Setting the Start Count for the memory CE counter

Set the DRAM Corrected Error Counter Start Count value from the BIOS Setup Utility to specify the start value of the counter.

Figure 8 Setting the Start Count value for the memory CE counter

When the counter value reaches 0xFFFF, it triggers an SMI interrupt, which the BIOS then handles.

The memory CE threshold is therefore the difference between 0xFFFF and Start Count. The threshold is calculated as follows:

Threshold = 0xFFFF – Start Count, that is, Start Count = 0xFFFF – Threshold

For example, if the target threshold is 5000:

Start Count = 0xFFFF – 5000 = 0xEC77

Therefore, set the BIOS Setup option value to 0xEC77.
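
The same arithmetic can be checked with a short helper. The function names are illustrative. The only facts taken from the text are the 0xFFFF limit and the worked example for a threshold of 5000.

```python
COUNTER_LIMIT = 0xFFFF  # counter value that triggers the SMI


def start_count_for_threshold(threshold):
    """Start Count to configure so that `threshold` CEs trigger an SMI."""
    if not 0 < threshold <= COUNTER_LIMIT:
        raise ValueError("threshold must be between 1 and 0xFFFF")
    return COUNTER_LIMIT - threshold


def threshold_for_start_count(start_count):
    """Number of CEs needed to reach the limit from `start_count`."""
    if not 0 <= start_count < COUNTER_LIMIT:
        raise ValueError("start count must be between 0 and 0xFFFE")
    return COUNTER_LIMIT - start_count


if __name__ == "__main__":
    print(f"0x{start_count_for_threshold(5000):04X}")
```

Converting in both directions is useful when reading back a configured Start Count to confirm the effective threshold.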

Setting the Memory CE Report Count Threshold

You can choose to report errors based on the frequency of persistent memory CEs. When the set threshold is met, SMI interrupts are masked to prevent memory CE reporting, enhancing system reliability and accuracy.

The default value of the Memory CE Report Count Threshold option on the BIOS Setup Utility is 1.

SMI interrupt control

Setting the MCA error threshold

You can set the threshold for correctable errors in the Machine Check Architecture (MCA). When the count reaches 0xFFF, it will trigger an SMI interrupt.

Configuring SMI storm suppression

To prevent performance impacts from excessive SMI interrupts, the AMD platform offers an SMI storm suppression mechanism. When Machine Check Exception (MCE) frequency exceeds 5 times per second, the system enters SMI Polling mode, handling MCE events at a rate of 5 times per second. Once the frequency drops below 5 times per second, the system reverts to SMI Interrupt mode.
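
The mode switching described above can be sketched as a two-state machine driven by the number of events seen in each one-second window. This is a simplified model with hypothetical names. In particular, the text does not say what happens at exactly 5 events per second, so the sketch assumes the current mode is kept in that case.

```python
STORM_LIMIT_PER_SECOND = 5


def next_mode(current_mode, events_last_second):
    """Return the handling mode for the next one-second window."""
    if events_last_second > STORM_LIMIT_PER_SECOND:
        return "polling"
    if events_last_second < STORM_LIMIT_PER_SECOND:
        return "interrupt"
    return current_mode  # assumed: exactly at the limit keeps the mode


def trace_modes(events_per_second, initial_mode="interrupt"):
    """Apply next_mode over a sequence of per-second event counts."""
    modes = []
    mode = initial_mode
    for count in events_per_second:
        mode = next_mode(mode, count)
        modes.append(mode)
    return modes


if __name__ == "__main__":
    print(trace_modes([1, 8, 5, 5, 3, 7]))
```

In polling mode the handler runs at a fixed rate regardless of how many events arrive, which bounds the time the system spends in SMM during an error storm.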

Memory CE report method

Memory CE reporting to BMC

The BIOS reports memory CE information through the IPMI interface, which includes three formats: SEL log, SDS log, and OEM IPMI commands.

Memory CE reporting to OS

The BIOS reports error records to the ACPI Table using APEI-HEST-GHES. You can view these records in the OS using the dmesg command.
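
On Linux, records delivered through GHES are typically printed to the kernel log with a `[Hardware Error]` tag. The sketch below filters dmesg output for that tag. The helper names are hypothetical, and the exact wording of the log lines depends on the kernel version, so treat the filter string as an assumption to verify on the target system.

```python
import subprocess

HW_ERR_TAG = "[Hardware Error]"


def hardware_error_lines(dmesg_text):
    """Return the log lines that carry the kernel hardware-error tag."""
    return [line for line in dmesg_text.splitlines() if HW_ERR_TAG in line]


def read_dmesg():
    """Run dmesg and return its output; this usually needs root privileges."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True,
                            check=True)
    return result.stdout


if __name__ == "__main__":
    for line in hardware_error_lines(read_dmesg()):
        print(line)
```

Keeping the filter separate from the command invocation makes it possible to run the same check against a saved log file.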

Reliability

Memory RAS

On-chip ECC/Parity

Feature name	On-chip ECC/Parity
Description	The processor protects critical on-chip SRAM, flip-flop, and latch arrays using ECC or parity check. The arrays use interleaving to protect against multi-bit errors.
Purpose	Improve system reliability and accuracy.
Configuration	Disabled by default and can be enabled from the BIOS.
Remarks	N/A

DRAM ECC

Feature name	DRAM ECC
Description	The memory controller of the processor supports two different error correction codes. Both DRAM error correction codes are symbol-based. x4 ECC uses 36 4-bit symbols to form a 144-bit codeword containing 128 data bits and 16 check bits. x16 ECC uses 18 16-bit symbols to form a 288-bit codeword containing 256 data bits and 32 check bits. When x4 DRAM devices are used on an 80-bit channel, x16 ECC with Advanced Memory Device Correction (AMDC) can correct all errors caused by a single failed DRAM device. x16 ECC is compatible with the DDR5 bounded fault specification, which provides better correction when x8 DRAM devices are used on an 80-bit channel or x4 DRAM devices are used on a 72-bit channel.
Purpose	Improve system reliability and accuracy.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

DRAM Error Check and Scrub (ECS)

Feature name	DRAM Error Check and Scrub (ECS)
Description	Single-bit errors detected by the on-die ECC within the DRAM chip are corrected periodically. This feature must be enabled in the memory controller. When a DRAM device exceeds a specific error threshold, the MCA of the memory controller records the channel, device, bank, and row where the errors occurred.
Purpose	Improve system reliability and accuracy.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

DRAM Writeback Suppression on X4 Writes

Feature name	DRAM Writeback Suppression on X4 Writes
Description	DRAM provides a method to suppress data write-back during a read-modify-write (RMW) operation when writing to x4 devices.
Purpose	Suppress error-correction write-back on x4 devices.
Configuration	Disabled by default and can be configured from the BIOS.
Remarks	N/A

DRAM UECC Retry

Feature name	DRAM UECC Retry
Description	If an uncorrected ECC error occurs, the processor can retry the command, which provides recovery from transient errors on the data bus. If a retry returns data with no ECC error or with a corrected error, the data is forwarded and a corrected error is recorded. If the maximum retry count is reached and the uncorrected ECC error persists, the data is poisoned and a delayed error is recorded.
Purpose	Improve system reliability and accuracy.
Configuration	Disabled by default and can be configured from the BIOS.
Remarks	N/A
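
The retry flow described for DRAM UECC Retry can be sketched as follows. This is a software illustration of the decision logic only. The `read_once` callback, the status strings, and the result dictionary are hypothetical and do not correspond to any hardware or firmware interface.

```python
def read_with_uecc_retry(read_once, max_retries):
    """Read data, retrying on uncorrected ECC errors up to max_retries times.

    read_once() must return (data, status), where status is one of
    "ok", "corrected", or "uncorrected".
    """
    data, status = read_once()
    if status != "uncorrected":
        # Normal path: no retry needed.
        logged = "corrected" if status == "corrected" else None
        return {"data": data, "poisoned": False, "logged": logged}

    for _ in range(max_retries):
        data, status = read_once()
        if status != "uncorrected":
            # The retry recovered the data: forward it, log a corrected error.
            return {"data": data, "poisoned": False, "logged": "corrected"}

    # Retries exhausted: poison the data and log a delayed error.
    return {"data": data, "poisoned": True, "logged": "delayed"}
```

A transient bus error clears on a later attempt and is logged as corrected, while a persistent error ends in poisoned data that is handled through the data poisoning mechanism.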

DRAM Address/Command Parity with Replay

Feature name	DRAM Address/Command Parity with Replay
Description	Based on the JEDEC DDR5 standard, the processor implements RCD parity check on the DDR5 address or command bus. If a parity check error occurs, the command can be replayed. This feature provides detection and recovery for transient errors on the bus.
Purpose	Improve system reliability and accuracy.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

DRAM Write Data CRC with Replay

Feature name	DRAM Write Data CRC with Replay
Description	According to the JEDEC DDR5 standard, the processor performs cyclic redundancy check (CRC) on the write data packets. If a CRC error occurs, the command can be replayed. This function provides detection and recovery for transient errors on the bus.
Purpose	Improve system reliability and accuracy.
Configuration	Disabled by default and can be configured from the BIOS.
Remarks	N/A

DRAM Read Data CRC with Replay

Feature name	DRAM Read Data CRC with Replay
Description	According to the JEDEC DDR5 standard, the processor performs cyclic redundancy check (CRC) on the read data packets. If a CRC error occurs, the command can be replayed. This function provides detection and recovery for transient errors on the bus. Errors not corrected through replay will result in data poisoning and record delayed errors.
Purpose	Improve system reliability and accuracy.
Configuration	Disabled by default and can be configured from the BIOS.
Remarks	N/A

DRAM Patrol Scrubber

Feature name	DRAM Patrol Scrubber
Description	The patrol scrubber periodically performs read-modify-write operations on memory locations to detect and correct latent errors. It wakes up once per scrub period and checks the next sequential location in memory. The processor can configure the patrol scrubber to scrub multiple address ranges simultaneously, which improves scrub speed on systems with large memory footprints.
Purpose	Improve system reliability and accuracy.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

DRAM Redirect Scrubber

Feature name	DRAM Redirect Scrubber
Description	When a read request encounters a correctable ECC error, the DRAM redirect scrubber is invoked. It immediately writes the corrected data back to memory.
Purpose	Improve system reliability and accuracy.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

GMI/xGMI RAS

On-Package Link Errors

Feature name	On-Package Link Errors
Description	On-package GMI/xGMI links might encounter CRC errors. When correctable CRC errors exceed a threshold, they are recorded in MCA to indicate system performance degradation. Uncorrectable link errors are also logged in MCA and result in a system fatal error event.
Purpose	Improve the reliability of GMI/xGMI link data transmission.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

Off-Package Link Errors

Feature name	Off-Package Link Errors
Description	Off-package GMI/xGMI links might encounter CRC errors. When correctable CRC errors exceed a threshold, they are recorded in MCA to indicate system performance degradation. Uncorrectable link errors are also logged in MCA and result in a system fatal error event.
Purpose	Improve the reliability of GMI/xGMI link data transmission.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

PCIe RAS

PCIe® End-to-End CRC

Feature name	PCIe® End-to-End CRC
Description	The processor supports PCIe ECRC generation and checking on all PCIe root ports. ECRC is not preserved for peer-to-peer requests that pass through the processor. Typically, ECRC generation should be enabled in the root ports and endpoints only when there are switches between them.
Purpose	Achieve higher data integrity when exchanging data between two PCIe endpoints.
Configuration	Enabled by default.
Remarks	N/A

Availability

Data Poisoning

Feature name	Data Poisoning
Description	When the system detects an uncorrectable data error, it marks the data as poisoned. When a process attempts to use the poisoned data, poison consumption occurs, and the error is reported through MCA in the context of the consuming process.
Purpose	Improve system reliability.
Configuration	Automatically takes effect and cannot be disabled.
Remarks	N/A

Fatal Error Recovery

Feature name	Fatal Error Recovery
Description	When the CPU detects an unmarked, unrecoverable error, it acts to contain data corruption, issuing a system-level fatal error event to the internal Data Fabric module. This event freezes the Data Fabric's data transmission queues to prevent bad data propagation to non-volatile storage. In the event of a system fatal error, the CPU supports system recovery through a warm reset and the option for an out-of-band MCA information collection in case of a system hang during the warm reset. If warm reset fails, the CPU initiates a cold restart, also supporting the triggering of a system hang before the cold restart. To facilitate debugging, the processor logs all fatal error events in the NBIO MCA bank. AMD recommends platforms to shield fatal errors in the NBIO MCA, as fatal errors will still be recorded in the original block's MCA (or MCAX).
Purpose	Improve system reliability.
Configuration	The Reset after Sync-Flood menu item controls whether a warm reset is executed when a fatal error occurs. When this item is set to TRUE, a fatal error triggers a warm reset. When it is set to FALSE, a fatal error does not trigger a warm reset.
Remarks	N/A

Machine Overflow Recovery

Feature name	Machine Overflow Recovery
Description	Machine Overflow Recovery allows the system to recover from overflow situations. Recovery is supported when the error recorded with MCA_STATUS[Overflow] set is non-fatal. If a system-critical error occurs, Machine Overflow Recovery is not allowed. When Machine Overflow Recovery is supported, software identifies all system-critical situations through MCA_STATUS[PCC]. If Machine Overflow Recovery is not supported, errors recorded with MCA_STATUS[Overflow] set cannot be handled as recoverable errors.
Purpose	Improve system reliability.
Configuration	Automatically takes effect and cannot be disabled.
Remarks	N/A
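
The role of the Overflow and PCC bits described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's actual handler: the bit positions (Val = 63, Overflow = 62, PCC = 57) follow the standard x86 MCA status register layout, while the function names and the recoverability rule are assumptions made for this example.

```python
# Bit positions in the 64-bit MCA_STATUS register (standard x86 MCA layout).
VAL_BIT = 63       # Val: the bank holds a valid error record
OVERFLOW_BIT = 62  # Overflow: an error was detected while Val was already set
PCC_BIT = 57       # PCC: processor context corrupt (system-critical)


def decode_mca_status(status: int) -> dict:
    """Extract the Val, Overflow, and PCC flags from a raw MCA_STATUS value."""
    return {
        "valid": bool((status >> VAL_BIT) & 1),
        "overflow": bool((status >> OVERFLOW_BIT) & 1),
        "pcc": bool((status >> PCC_BIT) & 1),
    }


def overflow_is_recoverable(status: int) -> bool:
    """Hypothetical policy: an overflow is survivable only when PCC is clear."""
    flags = decode_mca_status(status)
    return flags["valid"] and flags["overflow"] and not flags["pcc"]
```

In this sketch, a status value with Val and Overflow set but PCC clear is treated as recoverable, and the same value with PCC set is not.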

CPU Watchdog Timers

Feature name	CPU Watchdog Timers
Description	The CPU watchdog timer (WDT) detects and recovers from situations where an x86 core cannot make forward progress. It is a configurable counter that is reset each time an instruction operation completes. If no operation completes within the specified time, a system fatal error event is generated.
Purpose	Use a watchdog timer to track execution progress and detect when the processor is unable to continue forward execution.
Configuration	Enabled by default.
Remarks	N/A

Decoding Last I/O Addresses Table

Feature name	Decoding Last I/O Addresses Table
Description	When a fatal error occurs, diagnostic software records the last I/O address in DF::OrigWdtAddrLogLo and DF::OrigWdtAddrLogHi. The address range 0 to FFFFFFFC_FFFFFFFF corresponds to DRAM or MMIO, and the specific physical DRAM location can be determined further through the memory map of the OS. The range FFFFF_10000000 to FFFFF_1FFFFFFF corresponds to a PCIe device, where bits 27-20 represent the bus, bits 19-15 represent the device, and bits 14-12 represent the function. These fields identify the location of the specific PCIe device. The range FFFFFFD_FC000000 to FFFFFFD_FC00FFFF corresponds to PCIe I/O.
Purpose	Identify whether the fatal error is caused by an external component through the decoded address.
Configuration	Enabled by default and collected through ADDC.
Remarks	N/A
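
The bus, device, and function fields quoted above can be extracted with simple shifts and masks. The following Python sketch assumes the caller has already confirmed that the logged address falls in the PCIe device range. The function names are hypothetical and are not part of any H3C or AMD tool.

```python
def decode_pcie_bdf(address: int) -> tuple:
    """Return (bus, device, function) from the bit fields of a logged address."""
    bus = (address >> 20) & 0xFF      # bits 27-20
    device = (address >> 15) & 0x1F   # bits 19-15
    function = (address >> 12) & 0x7  # bits 14-12
    return bus, device, function


def format_bdf(address: int) -> str:
    """Format the decoded fields in the common bus:device.function notation."""
    bus, device, function = decode_pcie_bdf(address)
    return f"{bus:02x}:{device:02x}.{function}"
```

For example, an address whose low 32 bits are 0x14119000 decodes to bus 0x41, device 3, function 1.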

DRAM Thermal Throttling

Feature name	DRAM Thermal Throttling
Description	When approaching DIMM temperature limits, the processor supports thermal throttling. It increases refresh rates to maintain data integrity and imposes bandwidth limits on the command bus. The processor also supports firmware-driven bandwidth limits, enabling a response to platform-detected events using APML.
Purpose	Increase availability by avoiding downtime situations due to exceeding temperature limits.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

CPU Thermal Throttling

Feature name	CPU Thermal Throttling
Description	The processor supports thermal throttling when approaching temperature limits. This avoids downtime caused by exceeding temperature limits.
Purpose	Improve system availability and maintainability.
Configuration	Enabled by default and partially configurable.
Remarks	N/A

PCIe speed reduction and bandwidth reduction detection

Feature name	PCIe speed reduction and bandwidth reduction detection
Description	The system detects when a PCIe link operates at a reduced speed or with reduced bandwidth.
Purpose	Improve system availability and maintainability.
Configuration	Enabled by default and partially configurable.
Remarks	N/A

PCIe Downstream Port Containment and Error Disconnect Recovery

Feature name	PCIe Downstream Port Containment and Error Disconnect Recovery
Description	The processor supports the DPC functionality described in the PCIe specification, including the RP PIO extensions and DL_Active ERR_COR signaling. Platform firmware and operating systems coordinate by using the downstream port containment (DPC) and Surprise Down functionality of the processor to support PCIe link error recovery.
Purpose	Avoid propagation of potentially bad data through error isolation and recovery on PCIe links.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

PCIe Orderly and Surprise Hotplug

Feature name	PCIe Orderly and Surprise Hotplug
Description	Supports hot swapping of PCIe devices, allowing users to replace faulty devices and restore the server to normal operation without powering off the system.
Purpose	Improve system availability.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

DRAM Memory Tester

Feature name	DRAM Memory Tester
Description	The processor can test installed memory to check for damaged DIMMs. During the reset sequence, the memory tester writes data to and reads data from a set of addresses on each installed channel and DIMM in the system. If an uncorrectable ECC error is detected while accessing a DIMM, the DIMM is marked as damaged. When a DIMM is marked as damaged, the processor disables it and performs a system warm reset to reinitialize the memory map without the DIMM.
Purpose	The memory tester's purpose is not to exhaustively search for all DIMM errors, but to test for completely damaged DIMMs.
Configuration	Enabled by default.
Remarks	N/A

Automatic Boot-time DIMM MapOut

Feature name	Automatic Boot-time DIMM MapOut
Description	If either of the two memory tests (PMU training or the AGESA memory test) fails, the failed memory is mapped out and disabled.
Purpose	Improve system memory reliability and stability.
Configuration	Enabled by default.
Remarks	N/A

Automatic Boot-time Core Disable

Feature name	Automatic Boot-time Core Disable
Description	If a core fails the Built-In Self-Test (BIST), the processor can exclude it from the active configuration. If a core or cache fails the BIST, the processor reports the faulty core complex and attempts to boot with the minimum number of core complexes specified in the processor programming reference.
Purpose	Automatically disable cores that have not passed the Built-In Self-Test.
Configuration	Disabled by default.
Remarks	N/A

Serviceability

In-band Error Reporting

Machine Check Architecture

Feature name	Legacy x86 Machine Check Architecture (MCA)
Description	The processor implements the x86 Machine Check Architecture (MCA). MCA defines a way to record processor and system hardware errors and report them to system software, allowing system software to play a role in the recovery and diagnosis of hardware errors.
Purpose	Ensure component-level reliability.
Configuration	Automatically takes effect and cannot be disabled.
Remarks	N/A

Feature name	AMD Machine Check Architecture Extensions (MCAX)
Description	MCAX, an extension of the Machine Check Architecture, provides a richer feature set than the traditional x86 Machine Check Architecture. The extended features include: · Expanded number of MCA banks: Increases the number of MCA banks supported by AMD processors to provide comprehensive error logging for many blocks in the processor. · Expanded MCA bank size: Records extended information in MCA banks for enhanced error handling, improved diagnostics, and finer configuration. · Single-core ownership of MCA banks: Each MCA register set is visible to only one core, so no software synchronization is required when accessing MCA registers.
Purpose	Enhance BIOS control over faults.
Configuration	Automatically takes effect and cannot be disabled.
Remarks	N/A

Feature name	Machine Check Architecture (MCA) Thresholding
Description	Error thresholding counts the number of errors. The processor implements x86 MCA thresholding through the MCA_MISCx registers. An error threshold can be set through a saturating counter, and an SMI is sent when the counter overflows.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

Platform First Error Handling

Feature name	Platform First Error Handling (PFEH)
Description	The processor implements platform-first error handling, enabling all errors logged in the MCA to be reported first to the platform firmware rather than to the operating system or hypervisor. This feature allows the platform firmware to take platform-specific action (for example, repair or log) for each error before notifying the operating system or hypervisor. For example, platform firmware can implement predictive failure analysis to reduce service costs or future downtime.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

PCIe® Advanced Error Reporting (AER)

Feature name	PCIe AER
Description	Each PCIe root port on the processor supports AER, enabling advanced error handling, diagnostics and recovery capabilities for PCIe devices. This feature works on physical root ports that connect external devices and internal root ports that connect internal PCIe devices. The processor supports OS-first and firmware-first reporting of PCIe AER errors.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

CXL™ Error Handling

Feature name	CXL Error Handling
Description	The processor supports CXL protocol version 1.1. For CXL support, the processor provides up to 4 RCs per Package, and each RC supports up to 4 CXL ports. That is, each Package supports up to 16 CXL memory devices. CXL error handling mainly includes device error handling and protocol layer error handling.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

CXL™ Protocol Error Reporting

Feature name	CXL Protocol Error Reporting
Description	CXL protocol error reporting is performed through the PCIe AER registers and the CXL RAS capability structure. Therefore, the policy that controls CXL protocol error reporting is affected by the PCIe AER control policy. If CXL protocol error reporting is set to firmware-first, it follows the firmware-first principle. If it is set to PCIe AER reporting, it follows the PCIe AER error reporting policy.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

CXL™ Component Error Reporting

Feature name	CXL Component Error Handling
Description	CXL component errors are reported by CXL devices that support the Mailbox interface defined in the CXL memory device registers. For firmware-first error reporting, each CXL device that supports component error reporting provides GHES/CPER structures for reporting errors in the system.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

PMIC Error Handling

Feature name	PMIC Error Handling
Description	On DDR5, the PMIC supplies voltage to the DDR core/IO and VPP. According to the JEDEC PMIC specification, the PMIC can report faults. The PMIC fault recovery capability depends on the strategy of the PMIC manufacturer. In the memory initialization phase, the PMIC registers are read to collect fault information. In the firmware execution phase, PMIC faults are reported to the BMC and recorded in the BERT table.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

Machine Check Alert

Feature name	Machine Check Alert
Description	When an MCA error occurs, the CPU sends an alert to APML. The CPU will record this alert in the SB-RMI RasStatus register, which can be accessed through APML.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default.
Remarks	N/A

Fatal Error Alert

Feature name	Fatal Error Alert
Description	When a fatal error occurs, the CPU sends an alert to APML and records the alert in the SB-RMI RasStatus register, which can be accessed through APML. The CPU also supports disabling the warm reset action after a fatal error, which allows MCA information to be collected after the fatal error occurs.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default.
Remarks	N/A

Thermal Throttle Alert

Feature name	Thermal Throttle Alert
Description	The processor supports thermal throttling event alerts. SB-TSI has high/low thresholds that trigger the ALERT_L signal to the BMC.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default.
Remarks	N/A

MCA Mailbox

Feature name	MCA Mailbox
Description	The processor supports two BMC Mailbox commands for querying and retrieving any valid MCA: BMC_RAS_MCA_VALIDITY_CHECK and BMC_RAS_MCA_MSR_DUMP. If warm restart response to fatal errors is disabled, these Mailbox commands can be used at runtime or after a fatal error is generated.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default.
Remarks	N/A

Legacy SB-RMI MCA Interface

Feature name	Legacy SB-RMI MCA Interface
Description	The processor supports the Legacy SB-RMI MCA interface for polling MCA registers. Access to the MCA state through SB-RMI does not conflict with in-band access through MSRs. In the event of a severe error, MCA registers cannot be accessed before a reset. As a best practice, use the MCA mailbox method to access MCA instead of this legacy interface.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default.
Remarks	N/A

Reset Reason Mailbox

Feature name	Reset Reason Mailbox
Description	FCH provides a register that holds the reason for previous restarts. The S5_Reset_Status register of the FCH contains the cause of the last reset, including hardware-induced fatal errors, x86 triple-fault shutdown events, and push-button and software-triggered resets.
Purpose	Enhance BIOS fault control.
Configuration	Enabled by default.
Remarks	N/A

ACPI Platform Error Interface (APEI)

Feature name	APEI
Description	Supports the industry-standard ACPI Platform Error Interface (APEI) defined in Advanced Configuration and Power Interface (ACPI) version 6.2. AMD supports the Hardware Error Source Table (HEST), Boot Error Record Table (BERT), and Error Injection Table (EINJ).
Purpose	Improve system availability and maintainability.
Configuration	Enabled by default and cannot be configured.
Remarks	N/A

APEI Boot Error Record Table (BERT)

Feature name	APEI Boot Error Record Table (BERT)
Description	All errors logged at startup create an entry in BERT and are divided into general error data entries, memory error entries, memory parity error entries, PCIe error entries and processor error entries according to the error type.
Purpose	Record startup errors and report them to the operating system.
Configuration	Enabled by default.
Remarks	N/A

APEI Hardware Error Source Table (HEST)

Feature name	APEI Hardware Error Source Table (HEST)
Description	Supports the ACPI standard HEST table to define hardware-related error sources and error types. Errors are reported through the standard MCA and MCAX interfaces, which support MCE, CMCI, and DMC (MCAX) error sources, or through the PFEH interface.
Purpose	Implement standardized software and hardware error interfaces.
Configuration	Enabled by default.
Remarks	N/A

APEI Error Injection Table (EINJ)

Feature name	APEI Error Injection Table (EINJ)
Description	AMD supports error injection for type 3, 4, and 5 errors through the APEI EINJ table. The firmware allows each error type to be injected and triggered separately. · Type 3 error injection injects correctable DRAM ECC errors at the specified address. · Type 4 error injection injects uncorrectable DRAM ECC errors at the specified address, which are reported and recorded as deferred errors. · Type 5 error injection injects uncorrectable DRAM ECC errors at the specified address. The trigger action of a Type 5 injection causes the operating system to read the error location, so that a deferred error is logged and reported, and an uncorrectable poison consumption error is logged and reported through MCA.
Purpose	Support memory ECC error injection and verify related functions.
Configuration	Enabled by default.
Remarks	N/A
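
The type numbers above appear to match the bit positions of the error type bitmask defined for EINJ in the ACPI specification, where bit 3 is Memory Correctable, bit 4 is Memory Uncorrectable non-fatal, and bit 5 is Memory Uncorrectable fatal. Under that assumption, which this document does not state explicitly, the following Python sketch converts a type number into the corresponding mask value. The dictionary and function names are hypothetical.

```python
# ACPI EINJ error type bits for memory errors (bit number -> ACPI name).
# Assumption: "type N" in the text refers to bit N of this bitmask.
EINJ_MEMORY_ERROR_BITS = {
    3: "Memory Correctable",
    4: "Memory Uncorrectable non-fatal",
    5: "Memory Uncorrectable fatal",
}


def einj_error_type_mask(type_number: int) -> int:
    """Return the bitmask value for a supported memory error type number."""
    if type_number not in EINJ_MEMORY_ERROR_BITS:
        raise ValueError(f"unsupported EINJ memory error type: {type_number}")
    return 1 << type_number
```

With this mapping, types 3, 4, and 5 correspond to the mask values 0x8, 0x10, and 0x20.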

Out of Band Error Monitoring

Feature name	Out of Band Error Monitoring
Description	The processor supports out-of-band error monitoring through the APML interface. This can be used by a BMC or similar functional entity to monitor errors occurring on the CPU.
Purpose	Enhance system control over failures.
Configuration	Enabled by default.
Remarks	N/A

DRAM Corrected Error Leaky Bucket Counters

Feature name	DRAM Corrected Error Leaky Bucket Counters
Description	The processor provides a 16-bit DRAM corrected error counter for each chip select, which is incremented when a corrected ECC error is detected. For DIMMs that support rank multiplication, the memory controller provides a counter for each rank multiplier per chip select. When a counter saturates, an SMI or an APIC-based LVT interrupt can optionally be generated.
Purpose	Improve memory reliability and reduce the likelihood of data corruption or system crashes.
Configuration	Enabled by default.
Remarks	N/A
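
The counting behavior described above can be modeled with a small Python class. This is a toy model under stated assumptions: the 16-bit width comes from the description, but the start count, the leak amount, and the method names are illustrative, and the real hardware semantics might differ.

```python
class LeakyBucketCounter:
    """Toy model of a 16-bit corrected-error counter with a periodic leak."""

    MAX_COUNT = 0xFFFF  # a 16-bit counter saturates at this value

    def __init__(self, start_count: int = 0, leak_per_tick: int = 1):
        self.count = start_count
        self.leak_per_tick = leak_per_tick

    def record_error(self) -> bool:
        """Count one corrected error; return True when the counter is saturated."""
        if self.count < self.MAX_COUNT:
            self.count += 1
        return self.count == self.MAX_COUNT

    def tick(self) -> None:
        """Leak: lower the count once per (hypothetical) leak interval."""
        self.count = max(0, self.count - self.leak_per_tick)
```

In this model, a higher start count makes the counter saturate after fewer errors, and the periodic leak prevents sparse, isolated errors from ever reaching saturation.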

Boot Status Indicators

Feature name	Boot Status Indicators
Description	Boot Status Indicators indicate the status of the boot process. If issues occur during boot, the indicators pinpoint the problem, enabling fault detection.
Purpose	Improve system availability and maintainability.
Configuration	Enabled by default.
Remarks	N/A

DRAM Post-Package Repair (CPU)

Feature name	DRAM Post-Package Repair (CPU)
Description	The JEDEC-defined post-package repair (PPR) feature allows a spare DRAM row to be configured to replace a faulty or failed row. The processor provides hardware and firmware support for both soft (reconfigurable) and hard (permanent) repairs.
Purpose	Using DDR5 DRAM PPR, DIMMs with problematic DRAM rows can be reconfigured and maintain the same level of reliability as before the problem occurred.
Configuration	Enabled by default.
Remarks	N/A

MCA Address Translation

Feature name	MCA Address Translation
Description	When a DRAM ECC error occurs, the ECC error address that the memory controller records in the MCA register is a truncated version (relative to the normalized address). The truncated address must be converted into a system physical address through a specific interface before the platform firmware or operating system can use it directly. Similarly, the truncated address must be converted into a DRAM physical address through a specific interface before the platform firmware can use it to obtain the DRAM chip select, row, bank, column, and other information.
Purpose	Used to report and record DRAM ECC errors.
Configuration	The DXE stage uses AMD_RAS_SERVICE_DXE_PROTOCOL to perform MCA address translation. The runtime stage uses AMD_RAS_SERVICE_SMM_PROTOCOL to perform MCA address translation.
Remarks	The platform firmware must translate the address through the provided interface before performing any processing on the DRAM address recorded in the MCA register.

MCA FruText and DIMM FRU Identification

Feature name	MCA FruText and DIMM FRU Identification
Description	The memory controller records the characteristic information of each error in the MCA_SYND1_UMC and MCA_SYND2_UMC registers. The CPU supports writing platform-related FruText information, which is included in the MCA error log reported to the operating system, into these registers. MCA FruText encoding example: Perr Sx:Txx:Bxx, where: · Perr: Processor Error · S: Socket# · T: Thread# · B: Bank# · x: hexadecimal digit. For example, Perr S0:T0a:B1f.
Purpose	Used to quickly analyze the meaning of MCA Error.
Configuration	Set PcdAmdMcaFruTextEnable to TRUE.
Remarks	· For platform firmware: ○ The platform firmware needs to implement the DIMM FRU identification protocol (DXE_RAS_OEM_DIMMMAP_PROTOCOL) to provide the platform-related memory map table. ○ The platform firmware needs to populate the FruText field values into the HEST and BERT tables. · For the operating system (Linux): ○ When MCA_CONFIG[FruTextInSynd] is 1, print MCA_SYND1 and MCA_SYND2 as ASCII characters in the MCA error log. ○ When MCA_CONFIG[FruTextInSynd] is 0, print MCA_SYND1 and MCA_SYND2 as hexadecimal numbers in the MCA error log.
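
A FruText string in the Perr format shown above can be split into its fields with a regular expression. The following Python sketch is only an illustration of the encoding. The function name and the returned dictionary layout are assumptions and do not correspond to any existing tool.

```python
import re

# Matches the "Perr Sx:Txx:Bxx" encoding, where each x is a hexadecimal digit.
FRUTEXT_PATTERN = re.compile(
    r"^Perr S([0-9a-fA-F]+):T([0-9a-fA-F]+):B([0-9a-fA-F]+)$"
)


def parse_frutext(text: str) -> dict:
    """Return the socket, thread, and bank numbers encoded in a FruText string."""
    match = FRUTEXT_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a Perr FruText string: {text!r}")
    socket, thread, bank = (int(group, 16) for group in match.groups())
    return {"socket": socket, "thread": thread, "bank": bank}
```

Applied to the example string Perr S0:T0a:B1f, the sketch yields socket 0, thread 10 (0x0a), and bank 31 (0x1f).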

Platform RAS features

ABL RAS

Feature name	ABL RAS
Description	During the ABL stage, the processor performs a series of error detection and reporting processes, including: · Executing memory initialization. · Performing DDR training and recording training results. · Conducting MBIST testing and recording test results. · Collecting MBIST data in APOB and storing it in memory. · Retrieving MBIST results and DDR training results through the driver. · Transmitting error information through error codes.
Purpose	Implement error detection and reporting during the ABL stage.
Configuration	Enabled by default and can be configured from the BIOS.
Remarks	N/A

Error Injection

Feature name	Error Injection
Description	To assist with software testing and debugging, the processor supports hardware interfaces for fault injection. Supported error types for injection include: · On-chip ECC/parity check (NBIO, SMU/PSP/MPIO, UMC, and PCIe). · DRAM ECC. · DRAM UECC retry. · DRAM address/command parity check and replay. · On-Package Link Errors. · Off-Package Link Errors. · PCIe LCRC. · PCIe End-to-End CRC. · USB ECC. · SATA parity check.
Purpose	Convenient for server testing and debugging.
Configuration	Enabled by default.
Remarks	N/A

IPMI command classification for RAS reporting

Index	Error type	NETFUN	BMC_LUN	CMD	SubCmd	Description	Reporting stage
1	McaErr	0x36	0x00	0x2a	0x03	Report the MCAerr register (runtime stage, POST stage)	DXE, SMM
2	MemTestErr	0x36	0x00	0x2a	0x06	Report MemTestErr (POST stage)	DXE
3	PcieErr	0x36	0x00	0x2a	0x04	Report PcieErr (runtime stage, POST stage)	DXE, SMM
4	SmnErr	0x36	0x00	0x2a	0x05	Report SmnErr (POST stage)	DXE
5	NbioErr (Not Implemented)
6	ReportAbsentMemTestErrToBMC (absent DIMM)	0x36	0x00	0x2a	0x08	POST stage	DXE
7	PMIC Error	0x36	0x00	0x2a	0x09	POST stage	DXE
8	SATA Error (Not Implemented)
9	USB Error (Not Implemented)
10	NbioSyncFloodFromPin	0x36	0x00	0x2a	0x0a	POST stage	DXE
11	DRAM ECC Error	0x36	0x00	0x2a	0x0b	Runtime stage	SMM
12	McaErr threshold	0x36	0x00	0x04	0x39	Report the MCA CPU error register (runtime stage)	SMM
13	PCIeErr threshold	0x36	0x00	0x04	0x3a	Report the PCIe AER error register (runtime stage)	SMM
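
For log analysis on the BMC side, the implemented rows of the table can be represented as a simple lookup keyed by CMD and SubCmd. The following Python sketch copies its values from the table. The names RAS_IPMI_REPORTS and classify_ras_report are hypothetical, and the sketch makes no assumption about the payload that follows the SubCmd byte.

```python
RAS_IPMI_NETFN = 0x36  # NETFUN value shared by all rows of the table

# (CMD, SubCmd) -> error type, copied from the implemented rows of the table.
RAS_IPMI_REPORTS = {
    (0x2A, 0x03): "McaErr",
    (0x2A, 0x04): "PcieErr",
    (0x2A, 0x05): "SmnErr",
    (0x2A, 0x06): "MemTestErr",
    (0x2A, 0x08): "ReportAbsentMemTestErrToBMC",
    (0x2A, 0x09): "PMIC Error",
    (0x2A, 0x0A): "NbioSyncFloodFromPin",
    (0x2A, 0x0B): "DRAM ECC Error",
    (0x04, 0x39): "McaErr threshold",
    (0x04, 0x3A): "PCIeErr threshold",
}


def classify_ras_report(netfn: int, cmd: int, subcmd: int) -> str:
    """Map a (NETFUN, CMD, SubCmd) triple to the error type named in the table."""
    if netfn != RAS_IPMI_NETFN:
        return "unknown"
    return RAS_IPMI_REPORTS.get((cmd, subcmd), "unknown")
```

Keying the lookup on both CMD and SubCmd keeps the threshold reports (CMD 0x04) separate from the other reports (CMD 0x2a).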

SMI storm suppression

Feature name	SMI storm suppression
Description	AMD supports SMI storm suppression through polling. When the number of CEs reaches the threshold, an SMI is triggered to report the error. When the number of triggered SMIs reaches a threshold within a certain period of time, an SMI storm is considered to have occurred. The system then enters polling mode to suppress the storm and no longer triggers SMIs.
Purpose	When an SMI storm occurs, suppress it to avoid system downtime.
Configuration	RAS Periodic SMI Control is enabled by default.
Remarks	N/A
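
The storm detection logic described above can be sketched as a sliding-window counter. The following Python model is a simplified illustration under stated assumptions: the threshold, the window length, and the class and method names are hypothetical, and the model does not describe how or when the firmware leaves polling mode.

```python
from collections import deque


class SmiStormDetector:
    """Toy model: enter polling mode when too many SMIs occur within a window."""

    def __init__(self, smi_threshold: int, window_seconds: float):
        self.smi_threshold = smi_threshold
        self.window_seconds = window_seconds
        self.timestamps = deque()
        self.polling_mode = False

    def on_smi(self, now: float) -> bool:
        """Record one SMI at time `now`; return True if polling mode is active."""
        if self.polling_mode:
            return True
        self.timestamps.append(now)
        # Drop SMIs that fall outside the observation window.
        while self.timestamps and now - self.timestamps[0] > self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.smi_threshold:
            self.polling_mode = True
        return self.polling_mode
```

With a threshold of 3 and a 10-second window, three SMIs at 0, 1, and 2 seconds switch the model to polling mode, whereas SMIs at 0, 20, and 21 seconds do not.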

SWITCH RAS

Feature name	SWITCH RAS
Description	A SWITCH also has slot numbers and can record and report errors in the same way as an ordinary PCIe device.
Purpose	Ensure that a SWITCH can trigger RAS-related mechanisms normally.
Configuration	Consistent with ordinary PCIe devices.
Remarks	N/A
