H3C G7 Servers HDM3 System Log Messages Reference-6W100

  • Released At: 05-06-2025

H3C G7 Servers HDM3

System Log Messages Reference

Copyright © 2025 New H3C Technologies Co., Ltd. All rights reserved.

No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.

Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.

The information in this document is subject to change without notice.


Contents

Introduction
Use cases
Obtaining system log messages
System log severity level
Using this document
Applicable products
Event log messages
Temperature
Dropped below the lower minor threshold
Dropped below the lower major threshold
Dropped below the lower critical threshold
Exceeded the upper minor threshold
Exceeded the upper major threshold
Exceeded the upper critical threshold
Voltage
Dropped below the lower minor threshold
Dropped below the lower major threshold
Dropped below the lower major threshold
Dropped below the lower critical threshold
Exceeded the upper minor threshold
Exceeded the upper major threshold
Exceeded the upper major threshold
Exceeded the upper critical threshold
Transition to Non-Critical from OK
Transition to Non-Critical from OK
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Current
Transition to Critical from less severe
Exceeded the upper minor threshold
Exceeded the upper major threshold
Exceeded the upper major threshold
Exceeded the upper critical threshold
Fan
Predictive Failure deasserted
Predictive Failure asserted
Transition to Running
Fully Redundant
Non-redundant:Sufficient Resources from Redundant
Non-redundant:Insufficient Resources
Physical Security
General Chassis Intrusion
LAN Leash Lost
Processor
Thermal Trip
FRB1/BIST failure
FRB2/Hang in POST failure
FRB3/Processor Startup/Initialization failure
Configuration Error
Processor Presence detected
Processor Automatically Throttled
Processor Automatically Throttled
Processor Automatically Throttled
Machine Check Exception
Triggered a uncorrectable error
Machine Check Exception
Triggered a correctable error
Correctable Machine Check Error
Correctable Machine Check Error
Correctable Machine Check Error
Power Supply
Presence detected
Power Supply Failure detected
Power Supply Predictive Failure
Power Supply input lost (AC/DC)
Power Supply input lost or out-of-range
Power Supply input out-of-range - but present
Configuration error ---Vendor mismatch
Configuration error---Power Supply rating mismatch
Configuration error---Power supply rating mismatch
Power Supply Inactive/standby state
PSU failure detected by CPLD
Redundancy Lost
Power Unit
Power limit is exceeded over correction time limit
Cooling Device
Transition to OK
Transition to Non-recoverable---Liquid leakage occurred
Transition to Non-recoverable from less severe
Transition to Non-Critical from OK--- Liquid leakage detection cable is disconnected
Other Units-based Sensor
Exceeded the upper minor threshold
Memory
Correctable ECC or other correctable memory error
Correctable ECC or other correctable memory error
Correctable ECC or other correctable memory error
CPU triggered a correctable error
Uncorrectable ECC or other uncorrectable memory error
Uncorrectable ECC or other uncorrectable memory error
Uncorrectable ECC or other uncorrectable memory error
Uncorrectable ECC or other uncorrectable memory error
Triggered an uncorrectable error
Parity
Parity
Parity
Parity---Memory training error
Parity---CmdPiGroup: No Eye width
Memory Device Disabled
Memory Device Disabled---The DIMM is disabled
Memory Device Disabled---The DIMM is disabled because of inconsistency with POR restrictions
Memory Device Disabled---Buck Regulator Output Over or Under Voltage Lockout
Correctable ECC or other correctable memory error logging limit reached
Correctable ECC or other memory error limit reached
Presence detected
Memory patrol scrub CE occurred
Memory patrol scrub UCE occurred and degraded to CE
Memory patrol scrub CE occured
Memory patrol scrub UCE occurred
Configuration Error---DIMM speed is less than the minimum POR DIMM speed
Drive Slot
Drive Presence
Drive Fault
Drive Fault
Drive Fault---The disk is present, but its details cannot be obtained
Drive Fault---The disk is present, but its details cannot be obtained
Drive Fault
Predictive Failure
Predictive Failure
Predictive Failure
In Critical Array
In Failed Array
Rebuild/Remap in progress
The disk triggered an media error
The disk triggered an uncorrectable error
The disk is missing
System Firmware Progress
System Firmware Error (POST Error)---CPU PPL initialization failed
System Firmware Error (POST Error)---No memory found
System Firmware Error (POST Error)---CPU matching failure---CPU stepping is detected
System Firmware Error (POST Error)---CPU matching failure---CPU frequency is detected
System Firmware Error (POST Error)---CPU matching failure---CPU Microcode is detected
System Firmware Error (POST Error)---CPU matching failure---UPI Topology is detected
System Firmware Error (POST Error)---CPU matching failure
System Firmware Error(POST Error)---Unrecoverable video controller failure
System Firmware Hang
System Firmware Hang---C2C initialization failed
System Firmware Hang---C2C initialization cannot obtain parameter table
System software triggered an uncorrectable error
System software triggered a correctable error
System Firmware Progress---Memory initialization---The system is unable to find memory parameter table
System Firmware Progress---Secondary processor(s) initialization---Detection unsuccessful
System Firmware Progress---PCI resource configuration---PCIe controller initialization failed
System Firmware Progress---PCI resource configuration---PCIe controller initialization cannot find parameter table
System Firmware Progress---Video initialization---Detection unsuccessful
Event Logging Disabled
Log Area Reset/Cleared
SEL Full
SEL Almost Full
System Event
System Reconfigured---BIOS load default. CMOS cleared
Oem system boot event---LPC Reset occurred
Limit Exceeded---CPU usage exceeds the threshold
Limit Exceeded---Mem usage exceeds the threshold
Limit Exceeded---Network usage exceeds the threshold
Limit Exceeded---Hard disk usage exceeds the threshold
Timestamp clock synch
Timestamp clock synch---BMC Time SYNC succeed
Critical Interrupt
Transition to Non-Critical from OK
Bus Correctable Error
GPU Device Correctable Error
GPU PCIe Bus Correctable Error
GPU Vedio Memory Correctable Error
Bus Uncorrectable Error
GPU Device Uncorrectable Error
GPU PCIe Bus Uncorrectable Error
GPU Vedio Memory Uncorrectable Error
Bus Fatal Error
Bus Degraded
$1 triggered an uncorrectable error
$1 triggered a correctable error
Button / Switch
Power Button pressed---Physical button---Button pressed
Reset Button pressed
Module / Board
Transition to Non-Critical from OK($1)
Transition to Critical from less severe
Transition to Non- Recoverable from less severe
System board triggered a correctable error
System board triggered an uncorrectable error
Add-in Card
Transition to OK
Transition to Critical from less severe
Transition to Critical from less severe
Transition to Non-recoverable from less severe
ChipSet
Transition to Critical from less severe
Cable/Interconnect
Configuration Error - Incorrect cable connected / Incorrect interconnection
Configuration Error - Incorrect cable connected / Incorrect interconnection
Configuration Error - Incorrect cable connected / Incorrect interconnection
Configuration Error - Incorrect cable connected / Incorrect interconnection
System Boot / Restart Initiated
Initiated by power up
Initiated by hard reset
Initiated by warm reset
System restart
Boot Error
No bootable media
OS_BOOT
C: boot completed
Boot completed - boot device not specified
OS Stop / Shutdown
Run-time Critical Stop
OS Graceful Stop
OS Graceful Shutdown
Slot / Connector
Device Disabled: PCIe module information not obtained
Fault Status asserted
Transition to Non-Critical from OK
System ACPI Power State
S0 / G0 "working"
S0 / G0 "working"
S5 / G2 "soft-off"
S5 / G2 "soft-off"
S4 / S5 soft-off, particular S4 / S5 state cannot be determined
Watchdog2
Watchdog overflowAction:Timer expired
Watchdog overflowAction:Hard Reset
Watchdog overflowAction:Power Down
Watchdog overflowAction:Power Cycle
Entity Presence
Entity Present---License is about to expire
Entity Disabled---License has expired
Management Subsystem Health
Controller access degraded or unavailable
Controller access degraded or unavailable
Battery
Battery low (predictive failure)
Battery failed
Battery presence detected
Version Change
Hardware incompatibility detected with associated Entity---Memory is not certified
 


Introduction

This document describes the log messages that HDM3 generates when sensors in the server detect a system exception or detect that an exception has cleared. Use this document to look up message details and recommended actions for server maintenance.

HDM3 is an upgraded version of HDM. For convenience, the term "HDM" refers to HDM3 in this document.

Use cases

When a device fails or the system enters an abnormal working state, the system generates alarms for the faults in the affected modules and records event log information. After you obtain a log message, use its fields, such as the event code and message text, to find the corresponding entry in this document. The entry explains what the message means and recommends actions for restoring normal server operation.

Obtaining system log messages

You can obtain system log messages through the following methods:

·     HDM Web interface—Access the HDM Web interface and click Remote O&M > Log > Log Download. On the Log Download tab, select to download the entire log or log entries for a period.

·     Alert emails—Complete alert email settings to obtain log messages.

·     Third-party platform—Complete SNMP, SMTP, and SYSLOG settings to connect HDM to a third-party management platform, and obtain log messages from the platform.

·     Redfish event subscription—If a remote subscription server is configured, Redfish uploads received log messages to the remote subscription server.

·     IPMI commands—Use IPMItool commands to access the IPMI interface for BMC and enter commands to obtain event log messages.
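
As a minimal sketch of the IPMI method, the following Python helper assembles an IPMItool command line that lists the system event log over the network. The helper name `build_sel_command`, the host address, and the user name are placeholders for illustration. The sketch also assumes that the BMC accepts the `lanplus` interface, which this document does not state. It only builds the argument list and does not run it.

```python
from typing import List


def build_sel_command(host: str, user: str) -> List[str]:
    """Build (but do not run) an IPMItool command that lists the system event log."""
    return [
        "ipmitool",
        "-I", "lanplus",  # IPMI v2.0 over LAN (assumed to be enabled on the BMC)
        "-H", host,       # BMC address (placeholder)
        "-U", user,       # BMC user name (placeholder)
        "-E",             # read the password from the IPMI_PASSWORD environment variable
        "sel", "elist",   # extended listing of system event log entries
    ]


if __name__ == "__main__":
    # Print the command so it can be reviewed before anyone runs it.
    print(" ".join(build_sel_command("192.0.2.10", "admin")))
```

Passing the list to `subprocess.run` would execute it. Reading the password from the environment keeps it out of the process argument list.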

System log severity level

Table 1 System log message severity levels

Severity

Description

Critical

One or more of the following conditions are present: a severe decrease in the processing power of the system processing unit, a significant reduction in available system resources, a severe decrease in service processing capabilities, widespread interruption of service modules, or unavailable storage devices. These conditions might cause server failure, system crashes, or service data loss. Immediate action is required.

Major

The event has significantly affected the system and might interrupt the normal operation of the system or its service modules (computing, storage, communication, and user data security), which can lead to service interruption.

Minor

The event has not significantly affected the system but indicates a potential risk. Monitor related events and take action when needed to prevent the fault from escalating.

Info

Event logs generated during the normal operation of the server. Such events do not affect the normal operation of the server and do not require any action.
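
The four levels in Table 1 form an ordered scale, which a log-processing script can use to filter messages. The sketch below is one possible encoding. The numeric ranks and the `needs_attention` helper are assumptions made for illustration and are not part of HDM.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels from Table 1; the numeric ranks are illustrative only."""
    INFO = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


def needs_attention(level: str, minimum: Severity = Severity.MINOR) -> bool:
    """Return True if the named severity is at or above the chosen minimum."""
    return Severity[level.strip().upper()] >= minimum
```

For example, `needs_attention("Info")` is false because Info events require no action, while `needs_attention("Major", Severity.MAJOR)` is true.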

 

Using this document

This document explains messages in tables. Table 2 describes information provided in these tables.

Table 2 Message explanation table contents

Item: Event code

Description: A hexadecimal code that uniquely identifies a log message. The parity of the last hexadecimal digit in the event code indicates the alarm type:

·     Even: An alarm was generated.

·     Odd: An alarm was removed.

Example: 0x02900002

Item: Message text

Description: Presents the message description. The same message description might be reported by different types of sensors.

Example: Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2

Item: Variable fields

Description: Briefly describes the variable fields in the order that they appear in the message text. The variable fields are numbered in the "$Number" form to help you identify their location in the message text.

Example:

·     $1: Current reading of the voltage sensor.

·     $2: Major overvoltage threshold of the voltage sensor.

Item: Severity level

Description: Provides the severity level of the message.

Example: Major

Item: Example

Description: Provides a log example.

Example: Exceeded the upper major threshold---Current reading:2.58---Threshold reading:2.56

Item: Impact

Description: Explains the impact of the alarm event on the system.

Example: Performance degradation and unstable operation might occur on the device components if the voltage is too high.

Item: Cause

Description: Explains why the log was generated.

Example: Abnormal board voltage.

Item: Recommended action

Description: Provides recommended actions. If the issue persists after you take the recommended actions, contact Technical Support.

Example:

1.     Verify that the external power supply is operating correctly.

2.     Access the HDM Web interface and verify that the power supply is operating correctly.

3.     If the issue persists, contact Technical Support.
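
The parity rule for event codes and the `---` separated layout of message text can be sketched in Python as follows. The function names are hypothetical. The parser assumes that variable fields appear as `Name:value` segments, as they do in the threshold messages, and it ignores segments that contain no colon.

```python
from typing import Dict, Tuple


def is_alarm_generated(event_code: str) -> bool:
    """True if the last hex digit is even (alarm generated), False if odd (alarm removed)."""
    last_digit = int(event_code.strip()[-1], 16)
    return last_digit % 2 == 0


def split_message(message: str) -> Tuple[str, Dict[str, str]]:
    """Split 'Description---Name:value---Name:value' into a description and its fields."""
    parts = message.split("---")
    description = parts[0].strip()
    fields: Dict[str, str] = {}
    for part in parts[1:]:
        name, sep, value = part.partition(":")
        if sep:  # skip segments that are not Name:value pairs
            fields[name.strip()] = value.strip()
    return description, fields
```

With the example values from Table 2, `is_alarm_generated("0x02900002")` reports a generated alarm, and `split_message` returns the description together with the `Current reading` and `Threshold reading` values as strings.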

 

Applicable products

This document applies to the following product models:

·     H3C UniServer R3350 G7

·     H3C UniServer R3950 G7

·     H3C UniServer R4700 G7

·     H3C UniServer R4900 G7

·     H3C UniServer R4930 G7

·     H3C UniServer R4950 G7

·     H3C UniServer R4970 G7

·     H3C UniServer R5330 G7

·     H3C UniServer R5500 G7


Event log messages

Temperature

Dropped below the lower minor threshold

Event code

0x01000002

Message text

Dropped below the lower minor threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the temperature sensor

$2: Threshold (in Celsius) for triggering a minor low-temperature notification.

Severity level

Minor

Example

Dropped below the lower minor threshold---Current reading:8---Threshold reading:10

Impact

Performance degradation and unstable operation might occur on the device components if the temperature is too low.

If the temperature does not rise, it might continue to drop and trigger major-level alarms. Identify the cause of the low-temperature alarm as early as possible to prevent escalation.

Cause

The temperature is too low.

Recommended action

1.     Adjust the temperature of the equipment room.

2.     If the issue persists, contact Technical Support.

 

Dropped below the lower major threshold

Event code

0x01200002

Message text

Dropped below the lower major threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the temperature sensor

$2: Threshold (in Celsius) for triggering a major low-temperature notification.

Severity level

Major

Example

Dropped below the lower major threshold---Current reading:4---Threshold reading:5

Impact

Performance degradation and unstable operation might occur on the device components if the temperature is too low.

If the temperature does not rise, it might continue to drop and trigger critical-level alarms. Identify the cause of the low-temperature alarm as early as possible to prevent escalation.

Cause

The temperature is too low.

Recommended action

1.     Adjust the temperature of the equipment room.

2.     If the issue persists, contact Technical Support.

 

Dropped below the lower critical threshold

Event code

0x01400002

Message text

Dropped below the lower critical threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the temperature sensor

$2: Threshold (in Celsius) for triggering a critical low-temperature notification.

Severity level

Critical

Example

Dropped below the lower critical threshold---Current reading:0---Threshold reading:1

Impact

Operating devices in ultra-low temperature environments can reduce device performance, impact device lifespan, disrupt business operations, and lead to system downtime.

Cause

The temperature is too low.

Recommended action

1.     Adjust the temperature of the equipment room.

2.     If the issue persists, contact Technical Support.

 

Exceeded the upper minor threshold

Event code

0x01700002

Message text

Exceeded the upper minor threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the temperature sensor

$2: Threshold (in Celsius) for triggering a minor high-temperature notification.

Severity level

Minor

Example

Exceeded the upper minor threshold---Current reading:85---Threshold reading:80

Impact

Performance degradation and unstable operation might occur on the device components if the temperature is too high.

If the temperature does not decrease, it might continue to rise and trigger major-level alarms. Identify the cause of the high-temperature alarm as early as possible to prevent escalation.

Cause

High ambient temperature, a blocked air inlet or outlet, or low fan speed.

Recommended action

1.     Adjust the temperature of the equipment room.

2.     Verify that the air inlet and outlet are not blocked.

3.     Log in to HDM, and verify that the fans are running correctly. If abnormal fans exist, replace them.

4.     Log in to HDM, access the fan management page, and verify that the fan speed is appropriate.

5.     If the issue persists, contact Technical Support.

 

Exceeded the upper major threshold

Event code

0x01900002

Message text

Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the temperature sensor

$2: Threshold (in Celsius) for triggering a major high-temperature notification.

Severity level

Major

Example

Exceeded the upper major threshold---Current reading:90---Threshold reading:88

Impact

Performance degradation and unstable operation might occur on the device components if the temperature is too high.

If the temperature does not decrease, it might continue to rise and trigger critical-level alarms. Identify the cause of the high-temperature alarm as early as possible to prevent escalation.

Cause

High ambient temperature, a blocked air inlet or outlet, or low fan speed.

Recommended action

1.     Adjust the temperature of the equipment room.

2.     Verify that the air inlet and outlet are not blocked.

3.     Log in to HDM, and verify that the fans are running correctly. If abnormal fans exist, replace them.

4.     Log in to HDM, access the fan management page, and verify that the fan speed is appropriate.

5.     If the issue persists, contact Technical Support.

 

Exceeded the upper critical threshold

Event code

0x01b00002

Message text

Exceeded the upper critical threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the temperature sensor

$2: Threshold (in Celsius) for triggering a critical high-temperature notification.

Severity level

Critical

Example

Exceeded the upper critical threshold---Current reading:95---Threshold reading:90

Impact

Operating devices in high-temperature environments can reduce device performance, impact device lifespan, increase energy consumption, disrupt business operations, and cause system crashes.

Cause

High ambient temperature, a blocked air inlet or outlet, or low fan speed.

Recommended action

1.     Adjust the temperature of the equipment room.

2.     Verify that the air inlet and outlet are not blocked.

3.     Log in to HDM, and verify that the fans are running correctly. If abnormal fans exist, replace them.

4.     Log in to HDM, access the fan management page, and verify that the fan speed is appropriate.

5.     If the issue persists, contact Technical Support.
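
The six temperature messages correspond to three lower and three upper thresholds. The following sketch shows how a reading could be matched to the most severe applicable message. The `Thresholds` class and `classify` function are hypothetical, the strict comparisons are an assumption, and the threshold values in the usage note are taken from the examples in this section rather than from a real sensor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thresholds:
    lower_critical: float
    lower_major: float
    lower_minor: float
    upper_minor: float
    upper_major: float
    upper_critical: float


def classify(reading: float, t: Thresholds) -> Optional[str]:
    """Return the most severe threshold message that applies, or None if the reading is in range."""
    if reading < t.lower_critical:
        return "Dropped below the lower critical threshold"
    if reading < t.lower_major:
        return "Dropped below the lower major threshold"
    if reading < t.lower_minor:
        return "Dropped below the lower minor threshold"
    if reading > t.upper_critical:
        return "Exceeded the upper critical threshold"
    if reading > t.upper_major:
        return "Exceeded the upper major threshold"
    if reading > t.upper_minor:
        return "Exceeded the upper minor threshold"
    return None
```

With `Thresholds(1, 5, 10, 80, 88, 90)`, a reading of 8 maps to the lower minor message and a reading of 95 maps to the upper critical message. The sketch reports only the most severe level that applies, which may differ from how HDM logs successive threshold crossings.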

 

Voltage

Dropped below the lower minor threshold

Event code

0x02000002

Message text

Dropped below the lower minor threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the voltage sensor.

$2: Threshold for triggering a minor low-voltage notification.

Severity level

Minor

Example

Dropped below the lower minor threshold---Current reading:8---Threshold reading:10

Impact

Performance degradation and unstable operation might occur on the device components if the voltage is too low.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the log was generated during device power-on or power-off. If it was, no action is required.

2.     If the device was running correctly when the log was generated, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Dropped below the lower major threshold

Event code

0x02200002

Message text

Dropped below the lower major threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the voltage sensor.

$2: Threshold for triggering a major low-voltage notification.

Severity level

Major

Example

Dropped below the lower major threshold---Current reading:4---Threshold reading:5

Impact

Performance degradation and unstable operation might occur on the device components if the voltage is too low.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the log was generated during device power-on or power-off. If it was, no action is required.

2.     If the device was running correctly when the log was generated, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Dropped below the lower major threshold

Event code

0x02220002

Message text

Dropped below the lower major threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the voltage sensor.

$2: Threshold for triggering a major low-voltage notification.

Severity level

Major

Example

Dropped below the lower major threshold---Current reading:1---Threshold reading:2

Impact

Memory and system performance degradation might occur.

Cause

This alarm is generated when the PMIC voltage reading of the memory is lower than the low voltage major alarm threshold.

Recommended action

1.     Check whether the log was generated during device power-on or power-off. If it was, no action is required.

2.     If the device was running correctly when the log was generated, replace the DIMM.

3.     If the issue persists, contact Technical Support.

 

Dropped below the lower critical threshold

Event code

0x02400002

Message text

Dropped below the lower critical threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the voltage sensor.

$2: Threshold for triggering a critical low-voltage notification.

Severity level

Critical

Example

Dropped below the lower critical threshold---Current reading:0---Threshold reading:1

Impact

The device is running at an extremely low voltage. This can affect the system power supply or cause a board to power off, which results in a system crash.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the log was generated during device power-on or power-off. If it was, no action is required.

2.     If the device was running correctly when the log was generated, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Exceeded the upper minor threshold

Event code

0x02700002

Message text

Exceeded the upper minor threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the voltage sensor.

$2: Threshold for triggering a minor high-voltage notification.

Severity level

Minor

Example

Exceeded the upper minor threshold---Current reading:85---Threshold reading:80

Impact

Performance degradation and unstable operation might occur on the device components if the voltage is too high.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the log was generated during device power-on or power-off. If it was, no action is required.

2.     If the device was running correctly when the log was generated, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Exceeded the upper major threshold

Event code

0x02900002

Message text

Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the voltage sensor.

$2: Threshold for triggering a major high-voltage notification.

Severity level

Major

Example

Exceeded the upper major threshold---Current reading:90---Threshold reading:88

Impact

Performance degradation and unstable operation might occur on the device components if the voltage is too high.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the log was generated during device power-on or power-off. If it was, no action is required.

2.     If the device was running correctly when the log was generated, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Exceeded the upper major threshold

Event code

0x02920002

Message text

Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the voltage sensor.

$2: Threshold for triggering a major high-voltage notification.

Severity level

Major

Example

Exceeded the upper major threshold---Current reading:10---Threshold reading:1

Impact

Memory and system performance degradation might occur.

Cause

This alarm is generated when the PMIC voltage of the memory is higher than the current major voltage alarm threshold.

Recommended action

1.     Verify whether the log was generated during device power-on or power-off. If it was, no action is required.

2.     If the device was running correctly when the log was generated, replace the DIMM.

3.     If the issue persists, contact Technical Support.

 

Exceeded the upper critical threshold

Event code

0x02b00002

Message text

Exceeded the upper critical threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the voltage sensor.

$2: Threshold for triggering a critical high-voltage notification.

Severity level

Critical

Example

Exceeded the upper critical threshold---Current reading:95---Threshold reading:90

Impact

The device is operating at an extremely high voltage, which can affect the system power supply or cause a board to power off, resulting in a system crash.

Cause

Abnormal board voltage.

Recommended action

1.     Verify whether the log was generated during device power-on or power-off. If it was, no action is required.

2.     If the device was running correctly when the log was generated, replace the system board.

3.     If the issue persists, contact Technical Support.
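
The four upper-threshold voltage events above share the same message layout and differ only in event code and severity. A lookup table is one simple way to route them in a script. This is a sketch, not an HDM3 interface: the names UPPER_VOLTAGE_SEVERITY and severity_for are hypothetical, and only the codes and severity levels come from the entries above.

```python
# Event codes and severity levels copied from the upper-threshold voltage
# entries above. The dictionary and function names are hypothetical.
UPPER_VOLTAGE_SEVERITY = {
    0x02700002: "Minor",
    0x02900002: "Major",
    0x02920002: "Major",     # memory PMIC voltage
    0x02B00002: "Critical",
}


def severity_for(event_code):
    """Accept an int or a hex string such as '0x02b00002'."""
    if isinstance(event_code, str):
        event_code = int(event_code, 16)
    return UPPER_VOLTAGE_SEVERITY.get(event_code, "Unknown")


print(severity_for("0x02b00002"))
```

Codes that are not in the table return "Unknown", so the caller can decide how to treat events outside this group.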

 

Transition to Non-Critical from OK

Event code

0x0218d00e

Message text

Transition to Non-Critical from OK($1)

Variable fields

$1: VGA_REAR, USB_REAR_UP, USB_REAR_DOWN, EAR_VGA2, EAR_LCD, L_EAR_USB, INNER_USB, R_EAR_USB, L_EAR_VGA, L_EAR_TYPEC

Severity level

Minor

Example

Transition to Non-Critical from OK(VGA_REAR)

Impact

The system will be powered off.

Cause

In-board voltage error.

Recommended action

1.     Verify that the power supply is correct and stable.

2.     Check whether the interface that raised the alarm is connected to a device. If it is, disconnect the device to prevent overcurrent caused by a short circuit in the device.

3.     Power off and restart the server.

4.     If the issue persists, replace the system board.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-Critical from OK

Event code

0x0219000e

Message text

Transition to Non-Critical from OK ($1)

Variable fields

$1: USB_U41, USB_U40

Severity level

Minor

Example

Transition to Non-Critical from OK (USB_U41)

Impact

The system will be powered off.

Cause

In-board voltage error.

Recommended action

1.     ‍Verify that the power supply is correct and stable.

2.     Power off and restart the server.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.
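
The two "Transition to Non-Critical from OK" events carry the same text and differ only in the interface name in parentheses, with or without a space before the opening parenthesis. The sketch below shows one way to tell them apart. The constant and function names are hypothetical, and the interface lists are copied from the Variable fields of the two entries above.

```python
import re

# Interface names copied from the two entries above. The names of the
# constants and of the function are hypothetical.
INTERFACES_BY_EVENT = {
    "0x0218d00e": {
        "VGA_REAR", "USB_REAR_UP", "USB_REAR_DOWN", "EAR_VGA2", "EAR_LCD",
        "L_EAR_USB", "INNER_USB", "R_EAR_USB", "L_EAR_VGA", "L_EAR_TYPEC",
    },
    "0x0219000e": {"USB_U41", "USB_U40"},
}

# Tolerates optional whitespace inside "Non-Critical" and before "(".
_NON_CRITICAL_RE = re.compile(
    r"^Transition to Non-\s*Critical from OK\s*\((?P<name>[^)]+)\)$"
)


def classify_non_critical(text):
    """Return (event_code, interface), (None, interface), or None."""
    match = _NON_CRITICAL_RE.match(text.strip())
    if match is None:
        return None
    name = match.group("name").strip()
    for code, names in INTERFACES_BY_EVENT.items():
        if name in names:
            return code, name
    return None, name


print(classify_non_critical("Transition to Non-Critical from OK (USB_U41)"))
```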

 

Transition to Non-recoverable from less severe

Event code

0x1530200e

Message text

Transition to Non-recoverable from less severe

Variable fields

N/A

Severity level

Critical

Example

Transition to Non-recoverable from less severe

Impact

The HDD Bay does not operate correctly, which impacts the reliability of the system.

Cause

The HDD Bay voltage is abnormal.

Recommended action

1.     Reconnect the HDD Bay node. Make sure the node is completely disconnected from the AC power source before you power it on again.

2.     If the issue persists, replace the HDD Bay component.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x1530300e

Message text

Transition to Non-recoverable from less severe($1)

Variable fields

$1: Faulty power supply position on the GPU riser card.

Severity level

Critical

Example

Transition to Non-recoverable from less severe(P3V3_A)

Impact

The system is powered off.

Cause

A power supply error occurred on the GPU riser card.

Recommended action

1.     Gently press the physical power button to power off the server. Then, press the physical power button again to power on the server.

2.     If the server fails to power on, reconnect all power cords, and then power on the server.

3.     If the issue persists, replace the GPU riser card.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x1530400e

Message text

Transition to Non-recoverable from less severe($1)

Variable fields

$1: Faulty power supply position on the GPU riser card.

Severity level

Critical

Example

Transition to Non-recoverable from less severe(P12V_0)

Impact

The system is powered off.

Cause

A power supply error occurred on the GPU riser card.

Recommended action

1.     Gently press the physical power button to power off the server. Then, press the physical power button again to power on the server.

2.     If the server fails to power on, reconnect all power cords, and then power on the server.

3.     If the issue persists, replace the GPU riser card.

4.     If the issue persists, reconnect or replace the power cord of the GPU power adapter board.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0230010e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P1V8_PCH_STBY, P1V05_PCH_STBY, PVNN_PCH_STBY, PVCCIN_CPU0_VR_HOT, PVCCD_HV_CPU0_VR_HOT, PVCCINFAON_CPU0_VR_HOT.

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P1V8_PCH_STBY)

Impact

The system will be powered off.

Cause

The current or voltage on the system board is abnormal.

Recommended action

1.     Gently press the physical power button to power off the host, and then press the power button to power on the host again.

2.     If the host fails to power on, disconnect all power cords so that the host is completely powered off, reconnect the power cords, and then power on the host.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0230a00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure ($1)

Variable fields

$1: AC lost

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure (AC lost)

Impact

The AC power supply is removed from the device.

Cause

The CPLD detected an ACFAIL signal from all PSUs.

Recommended action

1.     Check the power supply network of the device for abnormalities, such as power grid fluctuations, a PDU fault, or poor contact of the power cord.

2.     Examine the PSUs for errors. If problems are detected, replace the PSUs.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0230c00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: RISER1, RISER2, RISER3, RISER4, RISER5

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(RISER1)

Impact

The system will be powered off.

Cause

·     A power error occurs on RISER P3V3.

·     Overcurrent is detected on RISER INA3221.

Recommended action

1.     Gently press the physical power button to power off the host, and then press the power button to power on the host again.

2.     If the host fails to power on, disconnect all power cords so that the host is completely powered off, reconnect the power cords, and then power on the host.

3.     If the issue persists, replace the faulty riser card.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0230d00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: Mezz1, Mezz2, Mezz3, NIC_MEZZ.

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(NIC_MEZZ)

Impact

The system will be powered off.

Cause

The Mezz card power supply is abnormal.

Recommended action

1.     ‍Reconnect the power cords. Verify that the server can be powered on correctly. If the server cannot be powered on, replace the corresponding MEZZ module.

2.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231110e

Message text

Transition to Non-recoverable from less severe($1)

Variable fields

$1: CPU1_DIMM_AB_PMIC_ERROR.

Severity level

Critical

Example

Transition to Non-recoverable from less severe(CPU1_DIMM_AB_PMIC_ERROR)

Impact

The system will be powered off.

Cause

The power supply is abnormal.

Recommended action

1.     A memory power failure occurred in channel 1 and channel 2 of processor 1. Reinstall the memory modules in the alarming channels.

2.     Replace the memory modules in the alarming channels.

3.     Remove the corresponding processor and verify that the processor socket does not have bent pins or foreign objects.

4.     Re-install the processor and make sure it is properly secured.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231120e

Message text

Transition to Non-recoverable from less severe($1)

Variable fields

$1: CPU1_DIMM_CD_PMIC_ERROR.

Severity level

Critical

Example

Transition to Non-recoverable from less severe(CPU1_DIMM_CD_PMIC_ERROR)

Impact

The system will be powered off.

Cause

The power supply is abnormal.

Recommended action

1.     A memory power failure occurred in channel 3 and channel 4 of processor 1. Reinstall the memory modules in the alarming channels.

2.     Replace the memory modules in the alarming channels.

3.     Remove the corresponding processor and verify that the processor socket does not have bent pins or foreign objects.

4.     Re-install the processor and make sure it is properly secured.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231130e

Message text

Transition to Non-recoverable from less severe($1)

Variable fields

$1: CPU1_DIMM_EF_PMIC_ERROR.

Severity level

Critical

Example

Transition to Non-recoverable from less severe(CPU1_DIMM_EF_PMIC_ERROR)

Impact

The system will be powered off.

Cause

The power supply is abnormal.

Recommended action

1.     A memory power failure occurred in channel 5 and channel 6 of processor 1. Reinstall the memory modules in the alarming channels.

2.     Replace the memory modules in the alarming channels.

3.     Remove the corresponding processor and verify that the processor socket does not have bent pins or foreign objects.

4.     Re-install the processor and make sure it is properly secured.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231140e

Message text

Transition to Non-recoverable from less severe($1)

Variable fields

$1: CPU1_DIMM_GH_PMIC_ERROR.

Severity level

Critical

Example

Transition to Non-recoverable from less severe(CPU1_DIMM_GH_PMIC_ERROR)

Impact

The system will be powered off.

Cause

The power supply is abnormal.

Recommended action

1.     A memory power failure occurred in channel 7 and channel 8 of processor 1. Reinstall the memory modules in the alarming channels.

2.     Replace the memory modules in the alarming channels.

3.     Remove the corresponding processor and verify that the processor socket does not have bent pins or foreign objects.

4.     Re-install the processor and make sure it is properly secured.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231150e

Message text

Transition to Non-recoverable from less severe($1)

Variable fields

$1: CPU2_DIMM_AB_PMIC_ERROR.

Severity level

Critical

Example

Transition to Non-recoverable from less severe(CPU2_DIMM_AB_PMIC_ERROR)

Impact

The system will be powered off.

Cause

The power supply is abnormal.

Recommended action

1.     A memory power failure occurred in channel 1 and channel 2 of processor 2. Reinstall the memory modules in the alarming channels.

2.     Replace the memory modules in the alarming channels.

3.     Remove the corresponding processor and verify that the processor socket does not have bent pins or foreign objects.

4.     Re-install the processor and make sure it is properly secured.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231160e

Message text

Transition to Non-recoverable from less severe($1)

Variable fields

$1: CPU2_DIMM_CD_PMIC_ERROR.

Severity level

Critical

Example

Transition to Non-recoverable from less severe(CPU2_DIMM_CD_PMIC_ERROR)

Impact

The system will be powered off.

Cause

The power supply is abnormal.

Recommended action

1.     A memory power failure occurred in channel 3 and channel 4 of processor 2. Reinstall the memory modules in the alarming channels.

2.     Replace the memory modules in the alarming channels.

3.     Remove the corresponding processor and verify that the processor socket does not have bent pins or foreign objects.

4.     Re-install the processor and make sure it is properly secured.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231170e

Message text

Transition to Non-recoverable from less severe($1)

Variable fields

$1: CPU2_DIMM_EF_PMIC_ERROR.

Severity level

Critical

Example

Transition to Non-recoverable from less severe(CPU2_DIMM_EF_PMIC_ERROR)

Impact

The system will be powered off.

Cause

The power supply is abnormal.

Recommended action

1.     A memory power failure occurred in channel 5 and channel 6 of processor 2. Reinstall the memory modules in the alarming channels.

2.     Replace the memory modules in the alarming channels.

3.     Remove the corresponding processor and verify that the processor socket does not have bent pins or foreign objects.

4.     Re-install the processor and make sure it is properly secured.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231180e

Message text

Transition to Non-recoverable from less severe($1)

Variable fields

$1: CPU2_DIMM_GH_PMIC_ERROR.

Severity level

Critical

Example

Transition to Non-recoverable from less severe(CPU2_DIMM_GH_PMIC_ERROR)

Impact

The system will be powered off.

Cause

The power supply is abnormal.

Recommended action

1.     A memory power failure occurred in channel 7 and channel 8 of processor 2. Reinstall the memory modules in the alarming channels.

2.     Replace the memory modules in the alarming channels.

3.     Remove the corresponding processor and verify that the processor socket does not have bent pins or foreign objects.

4.     Re-install the processor and make sure it is properly secured.

5.     If the issue persists, contact Technical Support.
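
The eight PMIC error events above encode the processor number and the channel pair in the variable field (AB for channels 1 and 2, CD for 3 and 4, EF for 5 and 6, GH for 7 and 8, according to the recommended actions). A script can decode the field rather than matching each event separately. This is a minimal sketch with a hypothetical function name, and it is not part of HDM3.

```python
import re

# Channel pairs follow the recommended actions of the entries above.
_CHANNELS = {"AB": (1, 2), "CD": (3, 4), "EF": (5, 6), "GH": (7, 8)}
_PMIC_RE = re.compile(r"CPU(?P<cpu>\d+)_DIMM_(?P<pair>AB|CD|EF|GH)_PMIC_ERROR")


def decode_pmic_error(text):
    """Return (processor_number, (channel, channel)), or None if no PMIC field is found."""
    match = _PMIC_RE.search(text)
    if match is None:
        return None
    return int(match.group("cpu")), _CHANNELS[match.group("pair")]


print(decode_pmic_error(
    "Transition to Non-recoverable from less severe(CPU2_DIMM_EF_PMIC_ERROR)"
))
```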

 

Transition to Non-recoverable from less severe

Event code

0x0230200e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure on $1

Variable fields

$1: CPU power supply failure type. Options include PVCCD_HV_CPU3, PVPP_HBM_CPU3, PVCCFA_EHV_CPU3, PVCCFA_EHV_FIVRA_CPU3, PVCCINFAON_CPU3, PVNN_MAIN_CPU3, PVCCIN_CPU3, PVCCD_HV_CPU4, PVPP_HBM_CPU4, PVCCFA_EHV_CPU4, PVCCFA_EHV_FIVRA_CPU4, PVCCINFAON_CPU4, PVNN_MAIN_CPU4, and PVCCIN_CPU4.

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure (PVCCD_HV_CPU3)

Impact

The device immediately shuts down and enters the power failure state.

The LEDs on the chassis ear are rapidly flashing (the power LED flashes red, the UID LED flashes blue, the NIC LED flashes green, and the health LED flashes red), and the status is no longer controllable. The LED status recovers after the issue is resolved and the device is re-powered on.

Cause

The internal power supply of the CPU experiences faults such as overcurrent, overvoltage, or undervoltage in the VR chip on the processor mezzanine board that corresponds to the CPU power supply.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the processor mezzanine board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0230300e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure on the backplane($1)

Variable fields

$1: REAR_BACKPLANE, REAR_BACKPLANE1, REAR_BACKPLANE2, REAR_BACKPLANE3, FRONT_BACKPLANE1, FRONT_BACKPLANE2, FRONT_BACKPLANE3, MID_BACKPLANE1, MID_BACKPLANE2.

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure on the backplane(FRONT_BACKPLANE1)

Impact

The system will be powered off.

Cause

The board voltage is abnormal.

Recommended action

1.     ‍Verify that the drive backplane is operating correctly.

2.     Reconnect the power cord to verify that the server can be powered on correctly. If the server cannot be powered on correctly, replace the drive backplane.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231d00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure on $1

Variable fields

$1: RAID card

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure on RAID card

Impact

The device immediately shuts down and enters the power failure state.

The LEDs on the chassis ear are rapidly flashing (the power LED flashes red, the UID LED flashes blue, the NIC LED flashes green, and the health LED flashes red), and the status is no longer controllable. The LED status recovers after the issue is resolved and the device is re-powered on.

Cause

The CPLD detected an abnormal PGD signal from the RAID controller.

Recommended action

1.     ‍Reconnect the power cords. Verify that the server can be powered on correctly. If the server cannot be powered on, replace the corresponding RAID controller.

2.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231400e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: M.2 adapter card

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(M.2 adapter card)

Impact

The system will be powered off.

Cause

The power supply of the M.2 adapter card has failed.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the corresponding M.2 adapter card.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0231500e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: OCP1 network card, OCP2 network card, or OCP3 network card

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(OCP1 network card)

Impact

The system will be powered off.

Cause

The power supply of the OCP network adapter is abnormal.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the OCP network adapter.

3.     If the issue persists, replace the adapter.

4.     If the issue persists, replace the system board.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0232200e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: Mezzanine storage controller

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(Mezzanine storage controller)

Impact

The system will be powered off.

Cause

The power supply of the Mezzanine storage controller has failed.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the corresponding Mezzanine storage controller.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0239000e

Message text

Transition to Non-recoverable from less severe---System detected GPU power supply failure($1)

Variable fields

$1: P12V_SLOT_1_2, P12V_SLOT_3_4, P12V_SLOT_5_6, P12V_SLOT_7_8.

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected GPU power supply failure(P12V_SLOT_1_2)

Impact

The system might be powered off.

Cause

The power supply to the system board is abnormal.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, remove the GPU from the corresponding slot and then power on the server. If the server is powered on, contact Technical Support. If the server cannot be powered on, proceed to the next step.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0233000e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: BMC_network_PHY_P1V0, BMC_network_PHY_P1V8, RGM_3V3

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(BMC_network_PHY_P1V0)

Impact

The system will be powered off.

Cause

The power supply of the BMC card is abnormal.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the BMC card.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0233500e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure on GPUSWITCH($1)

Variable fields

$1: P3V3_A, P3V3_B, P3V3_C, P3V3_D, P12V_0, P12V_1, P1V8, P1V8_A, P1V8_B, P0V9_2_3, P0V9_0_1, P12V_SLOT1, P12V_SLOT3, P12V2, or P12V3.

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure on GPUSWITCH(P1V8)

Impact

The system might be powered off.

Cause

·     The power cables of the GPUSWITCH board are either not properly assembled or not fully connected.

·     The AUX cable from the GPUSWITCH board to the system board is not properly assembled.

·     The power of the GPUSWITCH board is abnormal.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server cannot be powered on, verify that the cables of the GPUSWITCH board are connected correctly.

3.     If the issue persists, replace the GPUSWITCH board.

4.     If the issue persists, contact Technical Support.
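
The power supply failure messages in this section appear in several layouts: failure($1), failure ($1), failure on $1, failure on the backplane($1), GPU power supply failure($1), and failure on GPUSWITCH($1). The Python sketch below shows one way to pull the failing component or rail out of any of them. The function name and the returned tuple shape are assumptions made for illustration, and only the message layouts come from the entries above.

```python
import re

# Hypothetical helper. Matches the documented message layouts and tolerates
# optional whitespace after the "---" separator.
_PREFIX_RE = re.compile(
    r"^Transition to Non-recoverable from less severe---\s*"
    r"System detected (?:a |GPU )?power supply failure(?P<rest>.*)$"
)
_PAREN_RE = re.compile(r"^(?:on (?P<where>.*?))?\s*\((?P<name>[^)]*)\)$")


def extract_failure_source(text):
    """Return (location, name); location is None when the message names none."""
    match = _PREFIX_RE.match(text.strip())
    if match is None:
        return None
    rest = match.group("rest").strip()
    paren = _PAREN_RE.match(rest)
    if paren:
        return paren.group("where"), paren.group("name").strip()
    if rest.startswith("on "):
        return None, rest[3:].strip()
    return None


print(extract_failure_source(
    "Transition to Non-recoverable from less severe---"
    "System detected a power supply failure on GPUSWITCH(P1V8)"
))
```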

 

Transition to Non-recoverable from less severe

Event code

0x0233a00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: DSD card

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(DSD card)

Impact

The system will be powered off.

Cause

Abnormal DSD card voltage.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the DSD card.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0233d00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V)

Impact

The system will be powered off.

Cause

Abnormal P12V voltage.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     Sequentially check the PSU, fans, RISER, drive backplane, and system board.

3.     Replace the faulty component.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0233e00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P5V

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P5V)

Impact

The system will be powered off.

Cause

Abnormal P5V voltage.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the BMC card.

4.     If the issue persists, replace the rear backplane.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0233f00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P5V_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P5V_STBY)

Impact

The system will be powered off.

Cause

Abnormal P5V_STBY voltage.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234000e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V_STBY)

Impact

The system will be powered off.

Cause

Abnormal P12V_STBY voltage.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the OCP3 module.

4.     If the issue persists, replace the fan module.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234100e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V Overcurrent

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V Overcurrent)

Impact

The system will be powered off.

Cause

Abnormal P12V signal current.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the fan module.

4.     If the issue persists, replace the memory module.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234200e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCD_HV_CPU1, CPU1_VDD_Core, PVCCD_HV0_CPU1, PVCCD_HV1_CPU1

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCD_HV_CPU1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234300e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVPP_HBM_CPU1, CPU1_VDDQ

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVPP_HBM_CPU1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234400e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCFA_EHV_CPU1, CPU1_PCIe_P1V8

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCFA_EHV_CPU1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     ‍Identify whether the server is powered off. If the server is powered off, re-connect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234500e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCFA_EHV_FIVRA_CPU1, CPU1_PCIe_P0V9

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCFA_EHV_FIVRA_CPU1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234600e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCINFAON_CPU1, CPU1_DDR_VDD, PVCCINF_CPU1

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCINFAON_CPU1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.
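Because one event code can carry several $1 values, a small lookup table can help check a received message against this reference. The following Python sketch is illustrative only. The table covers just the four entries from 0x0234300e to 0x0234600e, and the names DOCUMENTED_FIELDS and is_documented are hypothetical.

```python
# Event code -> documented $1 values, transcribed from the entries above.
DOCUMENTED_FIELDS = {
    0x0234300E: {"PVPP_HBM_CPU1", "CPU1_VDDQ"},
    0x0234400E: {"PVCCFA_EHV_CPU1", "CPU1_PCIe_P1V8"},
    0x0234500E: {"PVCCFA_EHV_FIVRA_CPU1", "CPU1_PCIe_P0V9"},
    0x0234600E: {"PVCCINFAON_CPU1", "CPU1_DDR_VDD", "PVCCINF_CPU1"},
}

def is_documented(event_code: str, field: str) -> bool:
    """Check an (event code, $1) pair; the code string is parsed as hexadecimal."""
    return field in DOCUMENTED_FIELDS.get(int(event_code, 16), set())

print(is_documented("0x0234600e", "CPU1_DDR_VDD"))  # True
print(is_documented("0x0234600E", "CPU1_VDDQ"))     # False
```

Parsing the code as an integer makes the lookup insensitive to the letter case of the hexadecimal digits.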

 

Transition to Non-recoverable from less severe

Event code

0x0234700e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVNN_MAIN_CPU1, CPU1_PLL_P1V8

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVNN_MAIN_CPU1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234800e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCIN_CPU1, CPU1_P1V8

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCIN_CPU1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234900e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCD_HV_CPU2, CPU2_VDD_Core, PVCCD_HV0_CPU2, PVCCD_HV1_CPU2

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCD_HV_CPU2)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234a00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVPP_HBM_CPU2, CPU2_VDDQ

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVPP_HBM_CPU2)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234b00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCFA_EHV_CPU2, CPU2_PCIe_P1V8

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCFA_EHV_CPU2)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234c00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCFA_EHV_FIVRA_CPU2, CPU2_PCIe_P0V9

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCFA_EHV_FIVRA_CPU2)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234d00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCINFAON_CPU2, CPU2_DDR_VDD, PVCCINF_CPU2

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCINFAON_CPU2)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234e00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVNN_MAIN_CPU2, CPU2_PLL_P1V8

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVNN_MAIN_CPU2)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0234f00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCIN_CPU2, CPU2_P1V8

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCIN_CPU2)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the CPU.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235000e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P3V3_STBY_A, P3V3

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P3V3_STBY_A)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235100e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P5V_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P5V_STBY)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the BMC card.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235200e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V_STBY)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the PSUs.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235300e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the PSUs.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235400e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU1_1V8_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU1_1V8_STBY)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU1.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235500e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU1_3V3_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU1_3V3_STBY)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU1.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235600e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU2_1V8_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU2_1V8_STBY)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU2.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235700e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU2_3V3_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU2_3V3_STBY)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU2.

4.     If the issue persists, contact Technical Support.
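The standby-rail entries above name a specific processor (CPU1 or CPU2) in the replacement step, matching the CPU token in the $1 value. The following Python sketch extracts that token from a rail name. It is illustrative only: the helper name cpu_index is hypothetical, and it assumes that a CPU token followed by a number identifies the affected processor, which this document shows only by example.

```python
import re
from typing import Optional

# Assumption: a "CPU<n>" token, delimited by "_" or the string boundary,
# identifies the affected processor.
_CPU_TOKEN = re.compile(r"(?:^|_)CPU(\d+)(?:_|$)")

def cpu_index(field: str) -> Optional[int]:
    """Return the processor number found in a $1 value, or None if absent."""
    match = _CPU_TOKEN.search(field)
    return int(match.group(1)) if match else None

for name in ("CPU1_1V8_STBY", "CPU2_3V3_STBY", "PVCCIN_CPU2", "P12V_STBY"):
    print(name, cpu_index(name))
```

Names without a numbered CPU token, such as P12V_STBY or CPU_DIMM_P12V_OCP, return None, so the caller must fall back to the Recommended action text.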

 

Transition to Non-recoverable from less severe

Event code

0x0235800e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU1_VDDCR1

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU1_VDDCR1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU1.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235900e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU1_VDDCR0

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU1_VDDCR0)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU1.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235a00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU1_VDDCR_SOC

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU1_VDDCR_SOC)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU1.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235b00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU1_VDDIO

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU1_VDDIO)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU1.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235c00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU1_1V1

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU1_1V1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU1.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235d00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU2_VDDCR1

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU2_VDDCR1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU2.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235e00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU2_VDDCR0

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU2_VDDCR0)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU2.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0235f00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU2_VDDCR_SOC

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU2_VDDCR_SOC)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU2.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236000e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU2_VDDIO

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU2_VDDIO)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU2.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236100e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU2_1V1

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU2_1V1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU2.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236200e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: OCP1 card

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(OCP1 card)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the OCP1 card.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236300e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: OCP2 card

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(OCP2 card)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the OCP2 card.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236400e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: OCP3 card

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(OCP3 card)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the OCP3 card.

3.     If the issue persists, replace the adapter board if available.

4.     If the issue persists, replace the system board.

5.     If the issue persists, contact Technical Support.
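The OCP card entries differ only in the order in which components are replaced. The following Python sketch encodes that order as data. It is illustrative only: the names REPLACEMENT_ORDER and next_step are hypothetical, and the sketch does not model step 1 (the power cord check).

```python
# Replacement order per event code, transcribed from the
# Recommended action lists of the three OCP card entries above.
REPLACEMENT_ORDER = {
    "0x0236200e": ["OCP1 card", "system board"],
    "0x0236300e": ["OCP2 card", "system board"],
    "0x0236400e": ["OCP3 card", "adapter board (if available)", "system board"],
}

def next_step(event_code: str, already_replaced: int) -> str:
    """Return the next action after `already_replaced` components were swapped."""
    order = REPLACEMENT_ORDER[event_code.lower()]
    if already_replaced < len(order):
        return f"Replace the {order[already_replaced]}."
    return "Contact Technical Support."

print(next_step("0x0236400e", 1))  # Replace the adapter board (if available).
```

Keeping the order in a table rather than in branching code makes it easier to compare against this reference when the document is revised.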

 

Transition to Non-recoverable from less severe

Event code

0x0236500e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: AC lost

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(AC lost)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the power supply.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236600e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V_STBY)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Verify that the power supply is correct and stable.

2.     Power off and restart the server.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236700e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236900e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: RISER_P12V_OCP, P12V_CPU_RISER_OCP

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(RISER_P12V_OCP)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Verify that the power supply is correct and stable.

2.     Power off and restart the server.

3.     Reinstall or replace the riser card, and make sure the riser card supplies power correctly.

4.     If the issue persists, replace the system board.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236a00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU_DIMM_P12V_OCP, P12V_STBY_DIMM_OCP

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU_DIMM_P12V_OCP)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the power supply has not been removed from the server, check whether any DIMM, processor, or system board alarms are present. If an alarm indicates a faulty component, replace that component.

3.     If no component alarms are present, replace the DIMM.

4.     If the issue persists, replace the CPU.

5.     If the issue persists, replace the system board.

6.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236b00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V_BP_FRONT, P12V_Front_BP_eFuse

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V_BP_FRONT)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the front backplane.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236c00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V_BP_REAR, P12V_Rear_BP_eFuse

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V_BP_REAR)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Check whether the server is powered off. If it is, reconnect the power cord and check whether the server powers on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the rear backplane.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236d00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P5V_BP, P3V3_STBY_BP

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P5V_BP)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Verify that the power supply is correct and stable.

2.     Power off and restart the server.

3.     Reconnect the backplane power supply or replace the backplane, and make sure the backplane is powered correctly.

4.     If the issue persists, replace the system board.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0236e00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V Overcurrent

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V Overcurrent)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the fan.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237100e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the riser card.

3.     If the issue persists, replace the backplane.

4.     If the issue persists, replace the system board.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237200e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU1_THERMTRIP, CPU2_THERMTRIP

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU1_THERMTRIP)

Impact

The device gets stuck before this message is generated. Then, the device is powered off and enters the standby mode.

Cause

The CPU actively lowers its frequency when its actual temperature exceeds the upper limit. If the CPU continues to overheat even after the frequency is lowered, the Thermtrip signal is triggered and the CPU stops running.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the CPU and CPU heatsink.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237300e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: REAR_4SFF_EFUSE, P12V_BP_REAR

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(REAR_4SFF_EFUSE)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the 4SFF backplane.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237400e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: RISER2_GPU_EFUSE, P12V_SLOT_2_3

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(RISER2_GPU_EFUSE)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace riser card 2.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237500e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: RISER1_GPU_EFUSE, P12V_SLOT_0_1

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(RISER1_GPU_EFUSE)

Impact

The system will be powered off.

Cause

·     The EFUSE chip is faulty on the expander module.

·     The P12V_EB_F power supply provided by the expander module for the riser card and GPU is abnormal.

·     Riser card 1 or the GPU P12V power supply is abnormal.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the expander module.

3.     If the issue persists, replace riser card 1.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237600e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1) ---SW CpldReg 0x30:$2, 0x31:$3

Variable fields

$1: SW

$2: Value of register 0x30 in SW.

$3: Value of register 0x31 in SW.

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(SW) ---SW CpldReg 0x30:0x01, 0x31:0x40

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the switch card.

3.     If the issue persists, contact Technical Support.
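The $2 and $3 fields of event 0x0237600e carry the two switch CPLD register values, which are worth capturing when the log is forwarded to Technical Support. The following is a minimal sketch, assuming the message arrives as a plain string in the format shown in the example above and that the register values are hexadecimal. The pattern and the function name parse_sw_cpld_registers are illustrative and are not part of HDM.

```python
import re

# Pattern derived from the message text of event 0x0237600e.
# Assumption: register values are printed as hexadecimal strings such as 0x01.
SW_PATTERN = re.compile(
    r"System detected a power supply failure\((?P<source>[^)]+)\)\s*"
    r"---SW CpldReg 0x30:(?P<reg30>0x[0-9a-fA-F]+), 0x31:(?P<reg31>0x[0-9a-fA-F]+)"
)


def parse_sw_cpld_registers(message):
    """Return (source, reg_0x30, reg_0x31), or None if the message does not match."""
    match = SW_PATTERN.search(message)
    if match is None:
        return None
    return (match["source"], int(match["reg30"], 16), int(match["reg31"], 16))
```

The function returns None for any other power supply failure message, so it can be applied to a mixed log stream without pre-filtering.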

 

Transition to Non-recoverable from less severe

Event code

0x0237900e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: UART_ERROR

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(UART_ERROR)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237c00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: SWCPLD_ERROR

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(SWCPLD_ERROR)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the switch card.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237d00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P5V

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P5V)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the BMC card.

4.     If the issue persists, replace the rear backplane.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237a00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V_STBY)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the issue persists, replace the power supply.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237b00e

Message text

Transition to Non-recoverable from less severe---System detected a failure on the BMC board($1)

Variable fields

$1: BMCCPLD_ERROR

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a failure on the BMC board(BMCCPLD_ERROR)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the BMC card.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237e00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: RISER_P12V_PWR

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(RISER_P12V_PWR)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace the riser card.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0237f00e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCD_HV_CPU1, PVPP_HBM_CPU1, PVCCFA_EHV_CPU1, PVCCFA_EHV_FIVRA_CPU1, PVCCINFAON_CPU1, PVNN_MAIN_CPU1, PVCCIN_CPU1

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCD_HV_CPU1)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU1.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0238000e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: PVCCD_HV_CPU2, PVPP_HBM_CPU2, PVCCFA_EHV_CPU2, PVCCFA_EHV_FIVRA_CPU2, PVCCINFAON_CPU2, PVNN_MAIN_CPU2, PVCCIN_CPU2

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(PVCCD_HV_CPU2)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, replace CPU2.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0238100e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V)

Impact

The system will be powered off.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0238400e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: FAN_P12V, P12V_FAN_VIN_1, P12V_FAN_VIN_2

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(FAN_P12V)

Impact

The system will be powered off immediately.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the fan board.

3.     If the issue persists, replace the power board.

4.     If the issue persists, replace the fan.

5.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0238500e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: USB_HUB_P1V2_STBY

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(USB_HUB_P1V2_STBY)

Impact

The system will be powered off immediately.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the upper and lower USB ports of the BMC card.

3.     If the issue persists, replace the iFIST module.

4.     If the issue persists, replace the system board.

5.     If the issue persists, replace the internal USB.

6.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0238600e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: DIMM_P12V

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(DIMM_P12V)

Impact

The system will be powered off immediately.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the DIMM.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0238700e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: CPU_P12V

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(CPU_P12V)

Impact

The system will be powered off immediately.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the issue persists, replace the CPU.

3.     If the issue persists, replace the system board.

4.     If the issue persists, contact Technical Support.

 

Transition to Non-recoverable from less severe

Event code

0x0238800e

Message text

Transition to Non-recoverable from less severe---System detected a power supply failure($1)

Variable fields

$1: P12V_BP_REAR_GPU

Severity level

Critical

Example

Transition to Non-recoverable from less severe---System detected a power supply failure(P12V_BP_REAR_GPU)

Impact

The system will be powered off immediately.

Cause

Abnormal board voltage.

Recommended action

1.     Identify whether the server is powered off. If the server is powered off, reconnect the power cord and identify whether the server can be powered on properly.

2.     If the server is not powered off, replace the power board.

3.     If the issue persists, replace the rear GPU.

4.     If the issue persists, contact Technical Support.
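Several of the voltage entries above share the same message text and differ only in the $1 value, and some $1 values (for example P12V and P12V_BP_REAR) are listed under more than one event code. The following sketch shows one way to look up the candidate event codes from a message string. It covers only a subset of the entries above, and the table and function names are illustrative assumptions, not part of HDM.

```python
import re
from collections import defaultdict

# Subset of the entries in this section: event code -> documented $1 values.
EVENT_SOURCES = {
    "0x0236c00e": ["P12V_BP_REAR", "P12V_Rear_BP_eFuse"],
    "0x0236d00e": ["P5V_BP", "P3V3_STBY_BP"],
    "0x0236e00e": ["P12V Overcurrent"],
    "0x0237100e": ["P12V"],
    "0x0237300e": ["REAR_4SFF_EFUSE", "P12V_BP_REAR"],
    "0x0238100e": ["P12V"],
}

# Inverted index: $1 value -> every event code that lists it.
SOURCE_TO_CODES = defaultdict(list)
for code, sources in EVENT_SOURCES.items():
    for source in sources:
        SOURCE_TO_CODES[source].append(code)

FAILURE_PATTERN = re.compile(r"System detected a power supply failure\(([^)]+)\)")


def candidate_event_codes(message):
    """Return the event codes whose $1 list contains the source named in the message."""
    match = FAILURE_PATTERN.search(message)
    if match is None:
        return []
    return list(SOURCE_TO_CODES.get(match.group(1), []))
```

When more than one code is returned, the message text alone is not enough to pick the recommended action, so use the event code reported with the log entry.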

 

Current

Transition to Critical from less severe

Event code

0x0320000e

Message text

Transition to Critical from less severe

Variable fields

N/A

Severity level

Major

Example

Transition to Critical from less severe

Impact

Powering off a module affects system operations.

Cause

The current of the corresponding component is abnormal.

Recommended action

1.     Check for any abnormal alarms on the power supply and the system board through the HDM Web alarm page.

2.     Make sure the power supply system is functioning properly and the voltage is stable.

3.     If the issue persists, contact Technical Support.

 

Exceeded the upper minor threshold

Event code

0x03700002

Message text

Exceeded the upper minor threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the current sensor.

$2: Threshold for triggering a minor current notification.

Severity level

Minor

Example

Exceeded the upper minor threshold---Current reading:85---Threshold reading:80

Impact

Performance degradation and unstable operation might occur on the device components if the current is too high.

Cause

The current of the corresponding component is abnormal.

Recommended action

1.     Replace the component.

2.     If the issue persists, contact Technical Support.

 

Exceeded the upper major threshold

Event code

0x03900002

Message text

Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the current sensor.

$2: Threshold for triggering a major current notification.

Severity level

Major

Example

Exceeded the upper major threshold---Current reading:90---Threshold reading:88

Impact

Performance degradation and unstable operation might occur on the device components if the current is too high.

Cause

Abnormal board current.

Recommended action

1.     Replace the component.

2.     If the issue persists, contact Technical Support.

 

Exceeded the upper major threshold

Event code

0x03920002

Message text

Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the current sensor.

$2: Threshold for triggering a major current notification.

Severity level

Major

Example

Exceeded the upper major threshold---Current reading:0.50---Threshold reading:0.20

Impact

Memory and system performance degradation might occur.

Cause

This alarm is triggered when the current reading of the PMIC for the memory exceeds the major alarm threshold.

Recommended action

1.     Replace the DIMM.

2.     If the issue persists, contact Technical Support.

 

Exceeded the upper critical threshold

Event code

0x03b00002

Message text

Exceeded the upper critical threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current reading of the current sensor.

$2: Threshold for triggering a critical current notification.

Severity level

Critical

Example

Exceeded the upper critical threshold---Current reading:95---Threshold reading:90

Impact

This could potentially cause component damage, leading to a system crash.

Cause

Abnormal board current.

Recommended action

1.     Replace the component.

2.     If the issue persists, contact Technical Support.
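The upper-threshold messages in this section share one layout, with the sensor reading in $1 and the threshold in $2. The following is a minimal sketch that extracts both values and computes how far the reading exceeds the threshold. It assumes the message is available as a plain string and that both values are decimal numbers, as in the examples above. The function name and the returned keys are illustrative.

```python
import re

# Pattern derived from the message texts of the upper-threshold events.
# Assumption: $1 and $2 are decimal numbers (integers or fractions).
THRESHOLD_PATTERN = re.compile(
    r"^Exceeded the upper (?P<level>minor|major|critical) threshold"
    r"---Current reading:(?P<reading>-?\d+(?:\.\d+)?)"
    r"---Threshold reading:(?P<threshold>-?\d+(?:\.\d+)?)$"
)


def parse_threshold_event(message):
    """Return the level, reading, threshold, and excess, or None if no match."""
    match = THRESHOLD_PATTERN.match(message.strip())
    if match is None:
        return None
    reading = float(match["reading"])
    threshold = float(match["threshold"])
    return {
        "level": match["level"],
        "reading": reading,
        "threshold": threshold,
        "excess": reading - threshold,
    }
```

The sketch does not interpret units, because the message text does not include them.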

 

Fan

Predictive Failure deasserted

Event code

0x04100008

Message text

Predictive Failure deasserted

Variable fields

N/A

Severity level

Info

Example

Predictive Failure deasserted

Impact

No negative impact.

Cause

The status of the power supply fan has returned to normal.

Recommended action

No action is required.

 

Predictive Failure asserted

Event code

0x04000008

Message text

Predictive Failure asserted

Variable fields

N/A

Severity level

Minor

Example

Predictive Failure asserted

Impact

Predictive failure.

Cause

The state of the power supply fan is abnormal.

Recommended action

1.     If the power supply fan stops due to foreign objects in the power supply, remove the foreign objects.

2.     If the issue persists, reinstall the power supplies.

3.     If the issue persists, replace the faulty power supply.

4.     If the issue persists, contact Technical Support.

 

Transition to Running

Event code

0x04000014

Message text

Transition to Running

Variable fields

N/A

Severity level

Info

Example

Transition to Running

Impact

No negative impact.

Cause

The fan is operating correctly.

Recommended action

No action is required.

 

Fully Redundant

Event code

0x04000016

Message text

Fully Redundant

Variable fields

N/A

Severity level

Info

Example

Fully Redundant

Impact

No negative impact.

Cause

All fan slots are equipped with fans.

Recommended action

No action is required.

 

Non-redundant:Sufficient Resources from Redundant

Event code

0x04300016

Message text

Non-redundant:Sufficient Resources from Redundant

Variable fields

N/A

Severity level

Major

Example

Non-redundant:Sufficient Resources from Redundant

Impact

This issue does not affect system heat dissipation.

Cause

The fan is faulty or absent.

Recommended action

1.     If the fan has been removed, reinstall the fan as a best practice.

2.     Reinsert or reattach the fan to ensure proper contact.

3.     If the fan status sensor reports a malfunction, it means that the fan has failed. Replace the fan.

4.     If the issue persists, contact Technical Support.

 

Non-redundant:Insufficient Resources

Event code

0x04500016

Message text

Non-redundant:Insufficient Resources

Variable fields

N/A

Severity level

Critical

Example

Non-redundant:Insufficient Resources

Impact

This affects system heat dissipation, causing the system to overheat and automatically shut down.

Cause

The fan is faulty or absent.

Recommended action

1.     If the fan has been removed, reinstall the fan as a best practice.

2.     Reinsert or reattach the fan to ensure proper contact.

3.     If the fan status sensor reports a malfunction, it means that the fan has failed. Replace the fan.

4.     If the issue persists, contact Technical Support.
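When several fan events are logged together, it helps to handle the most severe one first. The following sketch tabulates the fan events listed above and picks the most severe one from a list of event codes. The severity ordering (Info, Minor, Major, Critical, from lowest to highest) is an assumption based on the usual meaning of these levels, and the names are illustrative.

```python
# Assumed ordering of the severity levels, lowest to highest.
SEVERITY_RANK = {"Info": 0, "Minor": 1, "Major": 2, "Critical": 3}

# Fan events from this section: event code -> (message text, severity level).
FAN_EVENTS = {
    0x04100008: ("Predictive Failure deasserted", "Info"),
    0x04000008: ("Predictive Failure asserted", "Minor"),
    0x04000014: ("Transition to Running", "Info"),
    0x04000016: ("Fully Redundant", "Info"),
    0x04300016: ("Non-redundant:Sufficient Resources from Redundant", "Major"),
    0x04500016: ("Non-redundant:Insufficient Resources", "Critical"),
}


def most_severe(event_codes):
    """Return (code, message, severity) for the most severe known fan event, or None."""
    known = [code for code in event_codes if code in FAN_EVENTS]
    if not known:
        return None
    worst = max(known, key=lambda code: SEVERITY_RANK[FAN_EVENTS[code][1]])
    return (worst, *FAN_EVENTS[worst])
```

Codes that are not in the table are ignored rather than treated as errors, so the function can be given event codes from other sections.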

 

Physical Security

General Chassis Intrusion

Event code

0x050000de

Message text

General Chassis Intrusion

Variable fields

N/A

Severity level

Minor

Example

General Chassis Intrusion

Impact

No negative impact.

Cause

The chassis access panel is removed.

Recommended action

1.     Check if the access panel was removed manually.

2.     Check if the access panel is installed properly. If necessary, open the access panel and then close it to see if the error log is cleared.

3.     Check if the connection between the access-open alarm module and the chassis ear is normal.

4.     If the issue persists, contact Technical Support.

 

LAN Leash Lost

Event code

0x054000de

Message text

LAN Leash Lost

Variable fields

N/A

Severity level

Info

Example

LAN Leash Lost

Impact

No negative impact.

Cause

The BMC's NCSI channel detected a physical network disconnection.

Recommended action

1.     Check if the network adapter is disabled in the operating system. If it is disabled, no action is required.

2.     If the system reports this log during the power on/off phase, it can be ignored.

3.     Check if the shared network port cable is properly connected.

4.     If the shared network port is not needed, disable it.

5.     If the issue persists, contact Technical Support.
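In the entries of this document's Voltage, Current, Fan, Physical Security, and Processor sections, the most significant byte of the event code is 0x02, 0x03, 0x04, 0x05, and 0x07 respectively. The following sketch uses that observed pattern to guess the category of an event code. The mapping is inferred from the listed codes only and is not a documented rule, so treat the result as a hint. The names are illustrative.

```python
# Observed in the event codes of this document, not a documented encoding.
CATEGORY_BY_PREFIX = {
    0x02: "Voltage",
    0x03: "Current",
    0x04: "Fan",
    0x05: "Physical Security",
    0x07: "Processor",
}


def guess_category(event_code):
    """Guess the section of an event code string such as '0x0236c00e'."""
    value = int(event_code, 16)
    # The top byte of the 32-bit code selects the category.
    return CATEGORY_BY_PREFIX.get(value >> 24, "Unknown")
```

For example, guess_category("0x054000de") returns "Physical Security", which matches the LAN Leash Lost entry above.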

 

Processor

Thermal Trip

Event code

0x071000de

Message text

Thermal Trip

Variable fields

N/A

Severity level

Critical

Example

Thermal Trip

Impact

The host might be powered off.

Cause

This event is triggered when the CPU overheats, which might result in shutdown and power-off.

Recommended action

1.     Log in to HDM, and verify that the fan is in normal state.

2.     Reinstall or replace any fan module that reports a speed alarm.

3.     Identify whether the ambient temperature is too high. Keep the server operating within its normal temperature range.

4.     Check for any blockages at the air inlet/outlet and remove any obstructions.

5.     Power off the server, check for poor contact of the CPU heatsink, reapply the thermal grease, reinstall the heatsink, and power on the server again.

6.     For a liquid-cooled server model, identify whether liquid-cooled component-related alarms have occurred.

7.     If the issue persists, contact Technical Support.

 

FRB1/BIST failure

Event code

0x072000de

Message text

FRB1/BIST failure.

Variable fields

N/A

Severity level

Minor

Example

FRB1/BIST failure

Impact

The operating system might fail to start up and hardware downsizing applies.

Cause

This alarm is generated when the CPU self-check detects an error during system startup.

Recommended action

1.     Power cycle the device.

2.     If the issue persists, replace the CPU.

3.     If the issue persists, contact Technical Support.

 

FRB2/Hang in POST failure

Event code

0x073000de

Message text

FRB2/Hang in POST failure

Variable fields

N/A

Severity level

Major

Example

FRB2/Hang in POST failure

Impact

The operating system might fail to start up.

Cause

The BIOS startup timed out.

Recommended action

1.     Upgrade the BIOS.

2.     If the issue persists, contact Technical Support.

 

FRB3/Processor Startup/Initialization failure

Event code

0x074000de

Message text

FRB3/Processor Startup/Initialization failure

Variable fields

N/A

Severity level

Minor

Example

FRB3/Processor Startup/Initialization failure

Impact

The operating system might fail to start up.

Cause

The BIOS startup timed out.

Recommended action

1.     Upgrade the BIOS.

2.     If the issue persists, contact Technical Support.

 

Configuration Error

Event code

0x075000de

Message text

Configuration Error---$1, ErrorType: $2,Severity: $3, Component: $4, IIO Stack: $5, Location: Socket: $6 or Configuration Error--- ErrorType: $2,Severity: $3, Failed Core: $7, Location: Socket: $6

Variable fields

$1: Time at which the error occurred. It can be Current Boot Error or Last Boot Error.

$2: Fault type. It can be IIO Internal Error or Spare core Error.

$3: Fault severity.

$4: Faulty component.

$5: I/O number.

$6: CPU number.

$7: Core number.

Severity level

Minor

Example

Configuration Error---Current Boot Error, ErrorType: IIO Internal Error,Severity:Correctable, Component:VTD, IIO Stack: 1, Location: Socket: 1

Impact

The operating system might fail to start up.

Cause

The CPU detected a correctable internal error during operation.

Recommended action

This log is generated when correctable internal errors are detected during server operation, such as IIO internal errors or CPU core errors. No action is required for correctable internal errors.
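The Configuration Error message has two layouts, one with the Component and IIO Stack fields and one with the Failed Core field. The following is a minimal sketch that tries both layouts in turn. Whitespace after the colons is treated as optional because the message text and the example above differ in spacing. The function name and the returned keys are illustrative assumptions.

```python
import re

# Layout with $1, $2, $3, $4, $5, and $6 (IIO internal errors).
IIO_PATTERN = re.compile(
    r"^Configuration Error---(?P<boot>[^,]+),\s*ErrorType:\s*(?P<error_type>[^,]+),"
    r"\s*Severity:\s*(?P<severity>[^,]+),\s*Component:\s*(?P<component>[^,]+),"
    r"\s*IIO Stack:\s*(?P<iio_stack>[^,]+),\s*Location:\s*Socket:\s*(?P<socket>\S+)$"
)

# Layout with $2, $3, $7, and $6 (core errors).
CORE_PATTERN = re.compile(
    r"^Configuration Error---\s*ErrorType:\s*(?P<error_type>[^,]+),"
    r"\s*Severity:\s*(?P<severity>[^,]+),\s*Failed Core:\s*(?P<failed_core>[^,]+),"
    r"\s*Location:\s*Socket:\s*(?P<socket>\S+)$"
)


def parse_configuration_error(message):
    """Return the named fields of a Configuration Error message, or None."""
    text = message.strip()
    for pattern in (IIO_PATTERN, CORE_PATTERN):
        match = pattern.match(text)
        if match:
            return {key: value.strip() for key, value in match.groupdict().items()}
    return None
```

The presence of the failed_core key in the result tells the caller which of the two layouts matched.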

 

Processor Presence detected

Event code

0x077000df

Message text

Processor Presence detected

Variable fields

N/A

Severity level

Info/Critical

Example

Processor Presence detected

Impact

If the primary CPU is not in place, it may result in system startup failure.

Cause

This event log is triggered when the primary CPU is not in place or installed incorrectly.

Recommended action

1.     Verify that the primary CPU is installed correctly.

2.     If the primary CPU fails, replace the CPU.

3.     If the issue persists, contact Technical Support.

 

Processor Automatically Throttled

Event code

0x07a000de

Message text

Processor Automatically Throttled---$1

Variable fields

$1: CPU throttling reason. Possible values: due to sensor reading exceeds the threshold; due to insufficient fan redundancy; due to sensor reading exceeds the threshold and insufficient fan redundancy.

Severity level

Minor

Example

Processor Automatically Throttled---due to sensor reading exceeds the threshold

Impact

System performance decreases due to CPU throttling.

Cause

The CPU throttles due to fan failure.

Recommended action

1.     Verify that the fan speed mode matches the current service model.

2.     Identify whether the ambient temperature is too high or whether air inlet and air outlet are blocked.

3.     Identify whether the fan is blocked by any foreign objects or whether the fan is faulty.

4.     Identify the fan status. If the fan is faulty, replace the faulty fan.

5.     If the issue persists, contact Technical Support.

 

Processor Automatically Throttled

Event code

0x07a010de

Message text

Processor Automatically Throttled---prochot

Variable fields

N/A

Severity level

Minor

Example

Processor Automatically Throttled---prochot

Impact

System performance decreases due to CPU throttling.

Cause

CPU throttling might occur due to CPU overheating.

Recommended action

1.     Log in to HDM, and verify that the fan is in normal state.

2.     Reinstall or replace any fan module that reports a speed alarm.

3.     Identify whether the ambient temperature is too high. Keep the server operating within its normal temperature range.

4.     Check for any blockages at the air inlet/outlet and remove any obstructions.

5.     Power off the server, check for poor contact of the CPU heatsink, reapply the thermal grease, reinstall the heatsink, and power on the server again.

6.     For a liquid-cooled server model, identify whether liquid-cooled component-related alarms have occurred.

7.     If the issue persists, contact Technical Support.

 

Processor Automatically Throttled

Event code

0x07a020de

Message text

Processor Automatically Throttled---memhot

Variable fields

N/A

Severity level

Minor

Example

Processor Automatically Throttled---memhot

Impact

System performance decreases due to CPU throttling.

Cause

CPU throttling may occur due to memory overheating.

Recommended action

1.     Log in to HDM, and verify that the fan is in normal state.

2.     Reinstall or replace any fan module that reports a speed alarm.

3.     Identify whether the ambient temperature is too high. Keep the server operating within its normal temperature range.

4.     Check for any blockages at the air inlet/outlet and remove any obstructions.

5.     For a liquid-cooled server model, identify whether liquid-cooled component-related alarms have occurred.

6.     If the issue persists, contact Technical Support.
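The three Processor Automatically Throttled events differ only in the text after the --- separator. The following sketch maps that suffix to the cause stated in the entries above. For event 0x07a000de, the suffix is itself the throttling reason and is returned unchanged. The table and function names are illustrative.

```python
# Suffix -> cause, as stated in the prochot and memhot entries above.
THROTTLE_CAUSES = {
    "prochot": "CPU overheating",
    "memhot": "memory overheating",
}

PREFIX = "Processor Automatically Throttled---"


def throttle_cause(message):
    """Return the throttling cause for a throttle message, or None for other messages."""
    if not message.startswith(PREFIX):
        return None
    reason = message[len(PREFIX):].strip()
    # Free-text reasons (event 0x07a000de) are passed through as-is.
    return THROTTLE_CAUSES.get(reason, reason)
```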

 

Machine Check Exception

Event code

0x07b050de

Message text

Machine Check Exception---$1---$2---Location: Socket:$3

Explanation

An uncorrectable error occurred on a CPU.

Parameters

$1: Uncorrectable error type, including fatal and non-fatal.

$2: CPU error type, including Cache Error, TLB Error, Bus Error, and Micro-architectural Error.

$3: CPU number.

Severity

Critical

Example

Machine Check Exception---fatal---Cache Error---Location: Socket:1

System impact

It might cause the system to stop responding.

Possible reasons

This event is generated when an uncorrectable error occurs on a CPU.

Recommended action

1.     Check whether a corresponding fault is reported in the OS, and resolve any software issues.

2.     Check the CPU microcode and evaluate whether to upgrade the microcode.

3.     Update the BIOS and HDM to the latest version.

4.     If the issue persists, contact Technical Support.
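
The message text of this event has a fixed layout, so its variable fields can be extracted from a log line mechanically. The following is a minimal Python sketch, not an HDM tool: the function name `parse_mce` and the returned key names are assumptions chosen for illustration, and the pattern is derived only from the message text and example shown for event code 0x07b050de.

```python
import re
from typing import Dict, Optional, Union

# Pattern mirrors the documented message text:
#   Machine Check Exception---$1---$2---Location: Socket:$3
# The group names (error_class, error_type, socket) are illustrative only.
MCE_PATTERN = re.compile(
    r"^Machine Check Exception---(?P<error_class>.+?)---"
    r"(?P<error_type>.+?)---Location: Socket:\s*(?P<socket>\d+)$"
)


def parse_mce(message: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the three variable fields, or None if the line has another layout."""
    match = MCE_PATTERN.match(message.strip())
    if match is None:
        return None
    fields: Dict[str, Union[str, int]] = dict(match.groupdict())
    fields["socket"] = int(match.group("socket"))  # $3: CPU number
    return fields


if __name__ == "__main__":
    print(parse_mce("Machine Check Exception---fatal---Cache Error---Location: Socket:1"))
```

A line that does not follow this layout, such as the bank-style Machine Check Exception message, yields `None` rather than a partial result.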

 

Triggered a uncorrectable error

Event code

0x07b201de

Message text

CPU $1 triggered a uncorrectable error.

Variable fields

$1: CPU number.

Severity level

Critical

Example

CPU 1 triggered a uncorrectable error.

Impact

The system might stop responding.

Cause

An IERR or MCERR error was triggered, and HDM diagnosed it as a CPU uncorrectable error.

Recommended action

1.     Upgrade the BIOS and HDM firmware to the latest versions.

2.     Safely power off the server and replace the CPU to identify whether the alarm disappears.

3.     If the issue persists, contact Technical Support.

 

Machine Check Exception

Event code

0x07b001de

Message text

Machine Check Exception---$1, Bank: $2,Severity:$3, Error Info:$4, Location: Socket: $5

Variable fields

$1: Specifies if the error occurred during the current boot or the previous boot.

$2: Fault bank.

$3: Fault severity.

$4: Fault information.

$5: CPU number.

Severity level

Critical

Example

Machine Check Exception---Current Boot Error, Bank: IFU,Severity:FATAL, Error Info:Cache, Location: Socket: 1

Impact

The system might stop responding.

Cause

This event occurs when there is an internal fault in the CPU.

Recommended action

1.     Check whether any corresponding faults are reported in the operating system.

2.     Check the CPU microcode and upgrade the BIOS and HDM to the latest versions.

3.     If the issue persists, use the bank location to narrow down the scope of the fault, and check whether any other alarm logs have been generated.

4.     Power off the server safely and replace the CPU or peripheral with a known-good one, and then check whether the alarm is cleared.

5.     If the issue persists, replace the system board.

 

Triggered a correctable error

Event code

0x07c201de

Message text

CPU $1 triggered a correctable error.

Variable fields

$1: CPU number.

Severity level

Minor

Example

CPU 1 triggered a correctable error.

Impact

No negative impact.

Cause

An IERR or MCERR error was triggered, and HDM diagnosed it as a CPU correctable error.

Recommended action

No action is required.

 

Correctable Machine Check Error

Event code

0x07c050de

Message text

Correctable Machine Check Error---$1---Location: Socket:$2

Explanation

A correctable error occurred on a CPU.

Parameters

$1: CPU error type, including Cache Error, TLB Error, Bus Error, and Micro-architectural Error.

$2: CPU number.

Severity

Minor

Example

Correctable Machine Check Error---Cache Error---Location: Socket:1

System impact

No negative impact on the system.

Possible reasons

This event is generated when a correctable error occurs on a CPU.

Recommended action

No action is required.

 

Correctable Machine Check Error

Event code

0x07c100de

Message text

Correctable Machine Check Error---HBM error---Location: Socket:$1

Variable fields

$1: CPU number.

Severity level

Minor

Example

Correctable Machine Check Error---HBM error---Location: Socket:1

Impact

No negative impact.

Cause

A correctable error was detected on HBM.

Recommended action

No action is required.

 

Correctable Machine Check Error

Event code

0x07c001de

Message text

Correctable Machine Check Error---$1, Bank: $2,Severity:$3, Error Info:$4, Location: Socket: $5

Variable fields

$1: Specifies if the error occurred during the current boot or the previous boot.

$2: Fault bank.

$3: Fault severity.

$4: Fault information.

$5: CPU number.

Severity level

Minor

Example

Correctable Machine Check Error---Current Boot Error, Bank: IFU,Severity:Corrected, Error Info:Cache, Location: Socket: 1

Impact

No negative impact.

Cause

An internal correctable error occurred in the CPU.

Recommended action

No action is required.
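
This message shares its comma-separated field layout with the Machine Check Exception message that has event code 0x07b001de. As a rough illustration, the sketch below splits either message into its five variable fields. The function name `parse_bank_mce` and the dictionary keys are hypothetical, and the code assumes that no field value contains a comma, which the source does not state.

```python
from typing import Dict, Optional

# Event names that use the "Bank / Severity / Error Info / Socket" layout.
PREFIXES = ("Machine Check Exception", "Correctable Machine Check Error")

# (label as printed in the log text, key in the returned dictionary).
# The key names are illustrative and are not defined by HDM.
LABELS = (
    ("Bank", "bank"),
    ("Severity", "severity"),
    ("Error Info", "error_info"),
    ("Location: Socket", "socket"),
)


def parse_bank_mce(message: str) -> Optional[Dict[str, str]]:
    """Split a bank-style machine check message into its variable fields."""
    head, sep, rest = message.strip().partition("---")
    if not sep or head not in PREFIXES:
        return None
    parts = [part.strip() for part in rest.split(",")]
    if len(parts) != len(LABELS) + 1:
        return None  # unexpected layout, for example a comma inside a field
    fields = {"event": head, "boot": parts[0]}  # $1: current or previous boot
    for part, (label, key) in zip(parts[1:], LABELS):
        name, colon, value = part.rpartition(":")
        if not colon or name.strip() != label:
            return None
        fields[key] = value.strip()
    return fields


if __name__ == "__main__":
    print(parse_bank_mce(
        "Correctable Machine Check Error---Current Boot Error, Bank: IFU,"
        "Severity:Corrected, Error Info:Cache, Location: Socket: 1"
    ))
```

Splitting on commas rather than using one large regular expression keeps the sketch tolerant of the uneven spacing around the labels in the documented message text.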

 

Power Supply

Presence detected

Event code

0x080000de

Message text

Presence detected

Variable fields

N/A

Severity level

Info

Example

Presence detected

Impact

No negative impact.

Cause

This event is triggered when a power supply is detected as installed, which indicates a transition from absent to present.

This event is cleared when the power supply is detected as removed, which indicates a transition from present to absent.

Recommended action

If the power supply is removed, install the power supply again.

 

Power Supply Failure detected

Event code

0x081000de

Message text

Power Supply Failure detected

Variable fields

N/A

Severity level

Major

Example

Power Supply Failure detected

Impact

It affects system power supply and may result in abnormal system power-off.

Cause

A power supply fault was detected.

Recommended action

1.     Re-install the power supply.

2.     If the issue persists, replace the power supply.

3.     If the issue persists, contact Technical Support.

 

Power Supply Predictive Failure

Event code

0x082000de

Message text

Power Supply Predictive Failure

Variable fields

N/A

Severity level

Major

Example

Power Supply Predictive Failure

Impact

The power supply may have malfunctions that affect system power supply.

Cause

A power supply fault was detected.

Recommended action

1.     Identify whether any foreign objects have obstructed and stopped the power supply fan. If yes, remove the foreign objects.

2.     If the issue persists, re-install the power supply.

3.     If the issue persists, replace the power supply.

4.     If the issue persists, contact Technical Support.

 

Power Supply input lost (AC/DC)

Event code

0x083000de

Message text

Power Supply input lost (AC/DC)

Variable fields

N/A

Severity level

Major

Example

Power Supply input lost (AC/DC)

Impact

It may cause the server to power off abnormally.

Cause

The AC power cable of the power supply is unplugged or there is an abnormal AC input.

Recommended action

1.     Verify that the power input is normal.

2.     Verify that all power cables are undamaged and properly connected.

3.     Verify that all power supplies are correctly installed.

4.     If the issue persists, contact Technical Support.

 

Power Supply input lost or out-of-range

Event code

0x084000de

Message text

Power Supply input lost or out-of-range

Variable fields

N/A

Severity level

Major

Example

Power Supply input lost or out-of-range

Impact

This may cause the server to power off abnormally.

Cause

The input voltage of the power supply exceeded the rated range.

Recommended action

1.     Verify that the power input is normal.

2.     Verify that all power cables are undamaged and properly connected.

3.     Verify that all power supplies are correctly installed.

4.     If the issue persists, contact Technical Support.

 

Power Supply input out-of-range - but present

Event code

0x085000de

Message text

Power Supply input out-of-range - but present

Variable fields

N/A

Severity level

Major

Example

Power Supply input out-of-range - but present

Impact

Abnormal power input beyond the supported range may cause the server to power off.

Cause

The input voltage of the power supply is too high.

Recommended action

1.     Check if the input voltage of the power supply is normal.

2.     Verify that the power cables and power supplies are installed correctly.

3.     Unplug and re-plug the power supply to ensure a good power connection.

4.     Check if the fans of the power supply are spinning.

5.     If the issue persists, contact Technical Support.

 

Configuration error ---Vendor mismatch

Event code

0x086000de

Message text

Configuration error ---Vendor mismatch

Variable fields

N/A

Severity level

Minor

Example

Configuration error ---Vendor mismatch

Impact

An unknown risk exists due to non-original certified components.

Cause

Non-original certified power supplies are installed.

Recommended action

Install original certified power supplies.

 

Configuration error---Power Supply rating mismatch

Event code

0x086030de

Message text

Configuration error --- Power Supply rating mismatch

Variable fields

N/A

Severity level

Minor

Example

Configuration error --- Power Supply rating mismatch

Impact

This may result in unstable power supply and abnormal system shutdown.

Cause

Original certified power supplies are installed, but the models of the two power supplies do not match.

Recommended action

1.     Make sure all the power supplies are of the same model.

2.     If the issue persists, contact Technical Support.

 

Configuration error---Power supply rating mismatch

Event code

0x086200de

Message text

Configuration error---Power supply rating mismatch:PSU$1,POUT:$2W

Variable fields

$1: PSU ID, which can be 1 or 2.

$2: Output power of the power supply.

Severity level

Minor

Example

Configuration error---Power supply rating mismatch:PSU1,POUT:2000W

Impact

This may result in unstable power supply and abnormal system shutdown.

Cause

The rated power of the installed power supplies may be inconsistent.

Recommended action

1.     Make sure all the power supplies are of the same model.

2.     If the issue persists, contact Technical Support.
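
Because this message carries the PSU ID and its output power, a script can confirm which ratings disagree before anyone opens the chassis. The following Python sketch is a hedged example: the helper names `collect_psu_ratings` and `ratings_differ` are hypothetical, and it assumes that one message is logged per affected PSU, which the source does not confirm. The PSU2 line in the usage part is made-up sample data.

```python
import re
from typing import Dict, Iterable

# Pattern mirrors the documented message text:
#   Configuration error---Power supply rating mismatch:PSU$1,POUT:$2W
RATING_PATTERN = re.compile(
    r"^Configuration error---Power supply rating mismatch:"
    r"PSU(?P<psu>\d+),POUT:(?P<pout>\d+)W$"
)


def collect_psu_ratings(messages: Iterable[str]) -> Dict[int, int]:
    """Map PSU ID ($1) to the reported output power in watts ($2)."""
    ratings: Dict[int, int] = {}
    for message in messages:
        match = RATING_PATTERN.match(message.strip())
        if match:
            ratings[int(match.group("psu"))] = int(match.group("pout"))
    return ratings


def ratings_differ(ratings: Dict[int, int]) -> bool:
    """Return True when at least two PSUs report different output power."""
    return len(set(ratings.values())) > 1


if __name__ == "__main__":
    sample = [
        "Configuration error---Power supply rating mismatch:PSU1,POUT:2000W",
        # Hypothetical second line, invented for this example.
        "Configuration error---Power supply rating mismatch:PSU2,POUT:1600W",
    ]
    ratings = collect_psu_ratings(sample)
    print(ratings, ratings_differ(ratings))
```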

 

Power Supply Inactive/standby state

Event code

0x087000de

Message text

Power Supply Inactive/standby state

Variable fields

N/A

Severity level

Info

Example

Power Supply Inactive/standby state

Impact

No negative impact.

Cause

The power supply exits cold standby mode. When the standby power supply function is enabled and the device is running at high power, the standby power supply automatically exits cold standby mode and supplies power to the device.

Recommended action

No action is required.

 

PSU failure detected by CPLD

Event code

0x088000de

Message text

PSU failure detected by CPLD

Variable fields

N/A

Severity level

Critical

Example

PSU failure detected by CPLD

Impact

This may result in unstable power supply and abnormal system shutdown.

Cause

The server has experienced an AC power failure.

Recommended action

1.     Check for environmental issues such as a high temperature or an abnormal power supply fan.

2.     Remove and re-install the power supply, and check whether the alarm is cleared.

3.     If the issue persists, replace the power supply.

 

Redundancy Lost

Event code

0x08100016

Message text

Redundancy Lost

Variable fields

N/A

Severity level

Major

Example

Redundancy Lost

Impact

Power redundancy failure reduces the reliability of device power supply.

Cause

Power redundancy was lost.

Recommended action

1.     Check if the power supply environment is normal.

2.     Check if any power supply has been removed.

3.     Check for poor contact between power supplies and power cables.

4.     Check for power-related fault alarm logs to determine if it is a power failure.

5.     If the issue persists, contact Technical Support.

 

Power Unit

Power limit is exceeded over correction time limit

Event code

0x095010de

Message text

Power limit is exceeded over correction time limit---$1 Current Power: $2W.

Variable fields

$1: Power capping target. It is GPU for GPU power capping and is not displayed for chassis power consumption.

$2: Current power value.

Severity level

Minor

Example

GPU: Power limit is exceeded over correction time limit---GPU Current Power: 2000W

Chassis: Power limit is exceeded over correction time limit---Current Power: 2000W

Impact

Power capping failed and the corresponding policy will be executed.

Cause

This alarm is generated when the power stays above the power capping limit for longer than the correction time limit.

Recommended action

1.     Adjust the power capping threshold or adjust the GPU workload.

2.     If the issue persists, contact Technical Support.
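
The $1 field of this message is optional, so a parser has to accept both the GPU form and the chassis form. The sketch below shows one way to do that in Python. The function name `parse_power_limit` and the label `"chassis"` used for a missing $1 are assumptions made for this example. The pattern also tolerates the trailing period that appears in the message text but not in the examples.

```python
import re
from typing import Optional, Tuple

# Pattern mirrors the documented message text:
#   Power limit is exceeded over correction time limit---$1 Current Power: $2W.
POWER_LIMIT_PATTERN = re.compile(
    r"^Power limit is exceeded over correction time limit---\s*"
    r"(?:(?P<target>GPU)\s+)?Current Power:\s*(?P<watts>\d+(?:\.\d+)?)\s*W\.?$"
)


def parse_power_limit(message: str) -> Optional[Tuple[str, float]]:
    """Return (target, watts); target is "GPU" or the illustrative label "chassis"."""
    match = POWER_LIMIT_PATTERN.match(message.strip())
    if match is None:
        return None
    target = match.group("target") or "chassis"  # $1 is absent for chassis power
    return target, float(match.group("watts"))


if __name__ == "__main__":
    for line in (
        "Power limit is exceeded over correction time limit---GPU Current Power: 2000W",
        "Power limit is exceeded over correction time limit---Current Power: 2000W",
    ):
        print(parse_power_limit(line))
```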

 

Cooling Device

Transition to OK

Event code

0x0a00000e

Message text

Transition to OK

Variable fields

N/A

Severity level

Info

Example

Transition to OK

Impact

No negative impact.

Cause

The liquid-cooled module is in place and free of faults.

Recommended action

No action is required.

 

Transition to Non-recoverable---Liquid leakage occurred

Event code

0x0a60000e

Message text

Transition to Non-recoverable---Liquid leakage occurred

Variable fields

N/A

Severity level

Critical

Example

Transition to Non-recoverable---Liquid leakage occurred

Impact

On server models that support only a processor liquid-cooled module, processor heat dissipation is affected. On server models that support both a processor liquid-cooled module and a GPU liquid-cooled module, processor or GPU heat dissipation is affected.

Cause

This message is generated when liquid leakage occurs.

Recommended action

1.     Check whether the liquid cooling device is functioning properly and whether any liquid has leaked.

2.     Replace the liquid-cooled module.

 

Transition to Non-recoverable from less severe

Event code

0x0a30000e

Message text

Transition to Non-recoverable from less severe--- Liquid Cooler not present

Variable fields

N/A

Severity level

Minor

Example

Transition to Non-recoverable from less severe--- Liquid Cooler not present

Impact

Heat dissipation is affected for the components cooled by the liquid-cooled device that reports the alarm.

Cause

In a server that supports multiple liquid-cooled devices, one liquid-cooled device is not present (or the corresponding liquid leakage detection cable is not plugged in).

Recommended action

1.     Check whether the liquid leakage detection cable of the liquid-cooled device is loose. If it is, disconnect AC power and reconnect the liquid leakage detection cable.

2.     If the issue persists, contact Technical Support.

 

Transition to Non-Critical from OK--- Liquid leakage detection cable is disconnected

Event code

0x0a10000e

Message text

Transition to Non-Critical from OK--- Liquid leakage detection cable is disconnected

Variable fields

N/A

Severity level

Major

Example

Transition to Non-Critical from OK--- Liquid leakage detection cable is disconnected

Impact

Coolant leakage cannot be detected.

Cause

The liquid leakage sensor cannot be detected.

Recommended action

1.     Check if the liquid cooling device is present.

2.     Check if the liquid leakage sensor is installed correctly.

3.     Replace the liquid-cooled module.

 

Other Units-based Sensor

Exceeded the upper minor threshold

Event code

0x0b700002

Message text

Exceeded the upper minor threshold---Current reading:$1---Threshold reading:$2

Variable fields

$1: Current power value.

$2: Threshold for triggering a minor power notification.

Severity level

Minor

Example

Exceeded the upper minor threshold---Current reading:20---Threshold reading:18

Impact

Exceeding the maximum power limit will cause the system to shut down.

Cause

The power exceeds the limit.

Recommended action

1.     Log in to HDM, and verify that the threshold value is appropriate.

2.     On the HDM web page, check whether the total power consumption of the server is too high.

3.     Check if the total power consumption of the power supply meets the service requirements.

4.     If the issue persists, contact Technical Support.
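
The current reading and the threshold are both carried in the message, so the size of the overshoot can be computed directly from a log line. The following Python sketch is illustrative only: the function name `threshold_overshoot` is hypothetical, and the code assumes both readings are plain decimal numbers without units, as in the example above.

```python
import re
from typing import Optional, Tuple

NUMBER = r"-?\d+(?:\.\d+)?"

# Pattern mirrors the documented message text:
#   Exceeded the upper minor threshold---Current reading:$1---Threshold reading:$2
THRESHOLD_PATTERN = re.compile(
    r"^Exceeded the upper minor threshold---"
    rf"Current reading:\s*(?P<current>{NUMBER})---"
    rf"Threshold reading:\s*(?P<threshold>{NUMBER})$"
)


def threshold_overshoot(message: str) -> Optional[Tuple[float, float, float]]:
    """Return (current, threshold, current - threshold), or None if unparsable."""
    match = THRESHOLD_PATTERN.match(message.strip())
    if match is None:
        return None
    current = float(match.group("current"))      # $1: current power value
    threshold = float(match.group("threshold"))  # $2: minor threshold
    return current, threshold, current - threshold


if __name__ == "__main__":
    print(threshold_overshoot(
        "Exceeded the upper minor threshold---Current reading:20---Threshold reading:18"
    ))
```

A small overshoot suggests checking whether the threshold is set appropriately, while a large one points toward the power consumption checks in the recommended action.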

 

Memory

Correctable ECC or other correctable memory error

Event code

0x0c0000de

Message text

Correctable ECC or other correctable memory error--$1-Location:CPU:$2 CH:$3 DIMM:$4 $5

Variable fields

$1: Specifies if the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error.

$2: CPU number.

$3: Channel number.

$4: DIMM number.

$5: DIMM mark.

Severity level

Minor

Example

Correctable ECC or other correctable memory error---Current Boot Error-Location:CPU:1 CH:1 DIMM:0 A1

Impact

No negative impact.

Cause

A correctable error occurred on the memory.

Recommended action

No action is required.

 

Correctable ECC or other correctable memory error

Event code

0x0c0050de

Message text

Correctable ECC or other correctable memory error---$1---Location:CPU:$2 CH:$3 DIMM:$4

Explanation

A correctable error occurred on the memory.

Parameters

$1: Memory error type, including Unknown, No error, Single-bit ECC, Multi-bit ECC, Single-symbol ChipKill ECC, Multi-symbol ChipKill ECC, Master abort, Target abort, Watchdog timeout, Invalid address, Mirror Broken, Memory Sparing, and Physical Memory Map-out event.

$2: CPU number.

$3: Channel number.

$4: Memory number.

Severity

Minor

Example

Correctable ECC or other correctable memory error---Unknown---Location:CPU:1 CH:1 DIMM:0

System impact

No negative impact on the system.

Possible reasons

A correctable error occurred on the memory.

Recommended action

No action is required.

 

Correctable ECC or other correctable memory error

Event code

0x0c0600de

Message text

Correctable ECC or other correctable memory error---$1---$2---Location:CPU$3 CH:$4 DIMM:$5 $6

Variable fields

$1: Fault type, which can be ECC, Parity, or CRC.

$2: Specifies if the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error.

$3: CPU number.

$4: Channel number.

$5: DIMM number.

$6: DIMM mark.

Severity level

Minor

Example

Correctable ECC or other correctable memory error---ECC---Current Boot Error---Location:CPU1 CH:8 DIMM:0 A0

Impact

No negative impact.

Cause

A correctable error occurred on the memory.

Recommended action

No action is required.
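
Although a single correctable error needs no action, it can be useful to see whether the errors cluster on one DIMM. The sketch below tallies messages of this form by CPU number and DIMM mark. It is a minimal example under stated assumptions: the function name `count_ce_by_dimm` is hypothetical, the pattern covers only the layout documented for event code 0x0c0600de, and the CPU2 line in the usage part is invented sample data.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern mirrors the documented message text:
#   Correctable ECC or other correctable memory error---$1---$2---
#   Location:CPU$3 CH:$4 DIMM:$5 $6
CE_PATTERN = re.compile(
    r"^Correctable ECC or other correctable memory error---"
    r"(?P<fault>ECC|Parity|CRC)---"
    r"(?P<boot>Current Boot Error|Last Boot Error)---"
    r"Location:CPU(?P<cpu>\d+) CH:(?P<channel>\d+) DIMM:(?P<dimm>\d+) (?P<mark>\S+)$"
)


def count_ce_by_dimm(messages: Iterable[str]) -> Counter:
    """Count matching messages per (CPU number, DIMM mark); other lines are skipped."""
    counts: Counter = Counter()
    for message in messages:
        match = CE_PATTERN.match(message.strip())
        if match:
            counts[(int(match.group("cpu")), match.group("mark"))] += 1
    return counts


if __name__ == "__main__":
    documented = (
        "Correctable ECC or other correctable memory error---ECC---"
        "Current Boot Error---Location:CPU1 CH:8 DIMM:0 A0"
    )
    # Hypothetical line, invented for this example.
    invented = (
        "Correctable ECC or other correctable memory error---CRC---"
        "Last Boot Error---Location:CPU2 CH:1 DIMM:0 B0"
    )
    print(count_ce_by_dimm([documented, documented, invented]))
```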

 

CPU triggered a correctable error

Event code

0x0c0500de

Message text

CPU $1 $2 triggered a correctable error

Variable fields

$1: CPU number.

$2: DIMM mark.

Severity level

Minor

Example

CPU 1 A0 triggered a correctable error

Impact

No negative impact.

Cause

An IERR or MCERR error was triggered, and HDM diagnosed it as a correctable memory error.

Recommended action

No action is required.

 

Uncorrectable ECC or other uncorrectable memory error

Event code

0x0c1000de

Message text

Uncorrectable ECC or other uncorrectable memory error--$1-Location:CPU:$2 CH:$3 DIMM:$4 $5

Variable fields

$1: Specifies if the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error.

$2: CPU number.

$3: Channel number.

$4: DIMM number.

$5: DIMM mark.

Severity level

Major

Example

Uncorrectable ECC or other uncorrectable memory error---Current Boot Error-Location:CPU:1 MEM CTRL:1 CH:1 DIMM:0 A1

Impact

The system might stop responding unless the memory is operating in a RAS mode such as mirroring or MCA recovery.

Cause

An uncorrectable (multi-bit flip) ECC error has occurred.

Recommended action

1.     Verify that the temperature and humidity are appropriate.

2.     Clean the memory slots and memory contacts, ensuring that there are no foreign objects in the memory slots and the contacts are not contaminated. Then, reinstall the corresponding DIMM.

3.     Replace the DIMM.

4.     If the issue persists, contact Technical Support.

 

Uncorrectable ECC or other uncorrectable memory error

Event code

0x0c1020de

Message text

Uncorrectable ECC or other uncorrectable memory error--$1-Location:CPU:$2 CH:$3 DIMM:$4 $5

Variable fields

$1: Specifies if the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error.

$2: CPU number.

$3: Channel number.

$4: DIMM number.

$5: DIMM mark.

Severity level

Major

Example

Uncorrectable ECC or other uncorrectable memory error---Current Boot Error-Location:CPU:1 MEM CTRL:1 CH:1 DIMM:0 A0

Impact

The system might stop responding.

Cause

An uncorrectable (multi-bit flip) ECC error has occurred.

Recommended action

1.     Verify that the temperature and humidity are appropriate.

2.     Clean the memory slots and memory contacts, ensuring that there are no foreign objects in the memory slots and the contacts are not contaminated. Then, reinstall the corresponding DIMM.

3.     Replace the DIMM.

4.     If the issue persists, contact Technical Support.

 

Uncorrectable ECC or other uncorrectable memory error

Event code

0x0c1050de

Message text

Uncorrectable ECC or other uncorrectable memory error---$1---$2---Location:CPU:$3 CH:$4 DIMM:$5

Variable fields

$1: Uncorrectable error type, including fatal and non-fatal.

$2: Memory error type, including Unknown, No error, Single-bit ECC, Multi-bit ECC, Single-symbol ChipKill ECC, Multi-symbol ChipKill ECC, Master abort, Target abort, Watchdog timeout, Invalid address, Mirror Broken, Memory Sparing, and Physical Memory Map-out event.

$3: CPU number.

$4: Channel number.

$5: Memory number.

Severity level

Major

Example

Uncorrectable ECC or other uncorrectable memory error---fatal---Single-bit ECC---Location: CPU:1 CH:1 DIMM:0

Impact

It might cause the system to stop responding.

Cause

This message is generated when an uncorrectable error occurs on memory.

Recommended action

1.     Verify that the ambient temperature or humidity is within the normal range.

2.     Clean the slot and the gold contacts of the memory module. Make sure no foreign objects exist in the slots and the contacts are not contaminated. Then, reinstall the corresponding memory module.

3.     If the issue persists, replace the memory module.

4.     If the issue persists, contact Technical Support.

 

Uncorrectable ECC or other uncorrectable memory error

Event code

0x0c1600de

Message text

Uncorrectable ECC or other uncorrectable memory error---$1---$2---Location:CPU$3 CH:$4 DIMM:$5 $6

Variable fields

$1: Fault type, which can be ECC, Parity, or CRC.

$2: Specifies if the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error.

$3: CPU number.

$4: Channel number.

$5: DIMM number.

$6: DIMM mark.

Severity level

Major

Example

Uncorrectable ECC or other uncorrectable memory error---ECC---Last Boot Error---Location:CPU1 CH:8 DIMM:0 A0

Impact

The system might restart or stop responding.

Cause

An uncorrectable ECC error or another uncorrectable memory error occurred.

Recommended action

1.     Verify that the temperature and humidity are appropriate.

2.     Clean the memory slots and memory contacts, ensuring that there are no foreign objects in the memory slots and the contacts are not contaminated. Then, reinstall the corresponding DIMM.

3.     If the issue persists, replace the DIMM.

4.     If the issue persists, contact Technical Support.

 

Triggered an uncorrectable error

Event code

0x0c1500de

Message text

CPU$1 $2 triggered an uncorrectable error

Variable fields

$1: CPU number.

$2: DIMM mark.

Severity level

Major

Example

CPU1 A0 triggered an uncorrectable error

Impact

The system might restart or stop responding.

Cause

An IERR or MCERR error was triggered, and HDM diagnosed it as an uncorrectable memory error.

Recommended action

1.     Verify that the temperature and humidity are appropriate.

2.     Clean the memory slots and memory contacts, ensuring that there are no foreign objects in the memory slots and the contacts are not contaminated. Then, reinstall the corresponding DIMM.

3.     If the issue persists, replace the DIMM.

4.     If the issue persists, contact Technical Support.

 

Parity

Event code

0x0c2000de

Message text

Parity ---$1---Location: Location:CPU:$2 CH:$3 DIMM:$4 $5

Variable fields

$1: Specifies if the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error.

$2: CPU number.

$3: Channel number.

$4: DIMM number.

$5: DIMM mark.

Severity level

Minor

Example

Parity---Current Boot Error-Location:CPU:1 CH:1 DIMM:0 A0

Impact

No negative impact.

Cause

This error message is generated when there is a failure in data parity on the command/address lines while reading the memory cell data, resulting in abnormal data access to the memory.

Recommended action

No action is required.

 

Parity

Event code

0x0c2020de

Message text

Parity---Location:CPU:$1 CH:$2 DIMM:$3 $4

Variable fields

$1: CPU number.

$2: Channel number.

$3: DIMM number.

$4: DIMM mark.

Severity level

Minor

Example

Parity---Location:CPU:1 CH:1 DIMM:0 A0

Impact

No negative impact.

Cause

This message is generated when there is a failure in data parity on the command/address lines while reading the memory cell data, resulting in abnormal data access to the memory. The SEL records the command/address parity error and logs the accessed DIMM.

Recommended action

No action is required.

 

Parity

Event code

0x0c2050de

Message text

Parity---Location: CPU:$1 CH:$2 DIMM:$3

Explanation

Memory data parity check fails.

Variable fields

$1: CPU number.

$2: Channel number.

$3: Memory number.

Severity

Minor

Example

Parity---Location: CPU:1 CH:1 DIMM:0

Impact

No negative impact on the system.

Cause

A data parity check failed when the system read memory cell data, causing a memory data access error. The SEL records the command/address parity error and the accessed DIMM.

Recommended action

No action is required.

 

Parity---Memory training error

Event code

0x0c20c294

Message text

Parity---Memory training error---Location:CPU:$1 CH:$2 DIMM:$3 $4

Variable fields

$1: CPU number.

$2: Channel number.

$3: DIMM number.

$4: DIMM mark.

Severity level

Minor

Example

Parity---Memory training error---Location:CPU:1 CH:1 DIMM:0 A0

Impact

The system performance might be degraded.

Cause

Memory training failed during memory initialization.

Recommended action

1.     Verify that the temperature and humidity are appropriate.

2.     Clean the memory slots and memory contacts, ensuring that the memory slots do not have any foreign objects and the contacts are not contaminated. Then, re-install the corresponding memory module.

3.     If the issue persists, check for any bent pins on the corresponding memory slot. If bent pins are present, replace the system board.

4.     If the issue persists, replace the memory module.

5.     If the issue persists, contact Technical Support.

 

Parity---CmdPiGroup: No Eye width

Event code

0x0c226024

Message text

Parity---CmdPiGroup: No Eye width---Location:CPU:$1 CH:$2 DIMM:$3 $4 Rank:$5

Variable fields

$1: CPU number.

$2: Channel number.

$3: DIMM number.

$4: DIMM mark.

$5: Rank number.

Severity level

Minor

Example

Parity---CmdPiGroup: No Eye width---Location:CPU:1 CH:2 DIMM:0 A0 Rank:0

Impact

System performance degradation might occur.

Cause

CMD eye width does not exist.

Recommended action

1.     Identify the memory slot from the alarm information.

2.     Check the gold contacts of the DIMM and the memory slot for foreign objects, and clean them if necessary.

3.     If the issue persists, replace the DIMM.

4.     If the issue persists, contact Technical Support.

 

Memory Device Disabled

Event code

0x0c4000de

Message text

Memory Device Disabled--$1---Location:CPU:$2 CH:$3 DIMM:$4 $5

Variable fields

$1: Specifies if the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error.

$2: CPU number.

$3: Channel number.

$4: DIMM number.

$5: DIMM mark.

Severity level

Major

Example

Memory Device Disabled---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A1

Impact

The DIMM is disabled. System performance degradation might occur.

Cause

A memory fault is detected during the system startup process.

Recommended action

1.     Replace the DIMM.

2.     If multiple DIMMs in the channels of one CPU are disabled at the same time (excluding DIMMs disabled due to POR rules), install the CPU in another socket. If the issue recurs on this CPU, replace the CPU.

3.     If multiple DIMMs are disabled for two CPUs at the same time (excluding DIMMs disabled due to POR rules), replace the system board.

4.     If the issue persists, contact Technical Support.

 

Memory Device Disabled---The DIMM is disabled

Event code

0x0c40a044

Message text

Memory Device Disabled---The DIMM is disabled---Location:CPU:$1 CH:$2 DIMM:$3 $4 Rank:$5

Variable fields

$1: CPU number.

$2: Channel number.

$3: DIMM number.

$4: DIMM mark.

$5: Rank number.

Severity level

Major

Example

Memory Device Disabled---The DIMM is disabled---Location:CPU:1 CH:1 DIMM:0 A0 Rank:0

Impact

System performance degradation might occur.

Cause

The DIMM is disabled.

Recommended action

1.     Replace the DIMM.

2.     If multiple DIMMs in the channels of one CPU are disabled at the same time (excluding DIMMs disabled due to POR rules), install the CPU in another socket. If the issue recurs on this CPU, replace the CPU.

3.     If multiple DIMMs are disabled for two CPUs at the same time (excluding DIMMs disabled due to POR rules), replace the system board.

4.     If the issue persists, contact Technical Support.

 

Memory Device Disabled---The DIMM is disabled because of inconsistency with POR restrictions

Event code

0x0c417504

Message text

Memory Device Disabled---The DIMM is disabled because of inconsistency with POR restrictions---Location:CPU:$1 CH:$2 DIMM:$3

Variable fields

$1: CPU number.

$2: Channel number.

$3: DIMM mark.

Severity level

Major

Example

Memory Device Disabled--- The DIMM is disabled because of inconsistency with POR restrictions---Location:CPU:1 CH:1 DIMM: A0

Impact

The system performance might degrade.

Cause

The DIMM is disabled due to incorrect DIMM installation.

Recommended action

1.     Re-install DIMMs based on the DIMM population guidelines in the user guide for the server.

2.     If the issue persists, contact Technical Support.

 

Memory Device Disabled---Buck Regulator Output Over or Under Voltage Lockout

Event code

0x0c446fb4

Message text

Memory Device Disabled---Buck Regulator Output Over or Under Voltage Lockout---Location:CPU:$1 CH:$2 DIMM:$3 $4

Variable fields

$1: CPU number.

$2: Channel number.

$3: DIMM number.

$4: DIMM mark.

Severity level

Major

Example

Memory Device Disabled---Buck Regulator Output Over or Under Voltage Lockout---Location:CPU:1 CH:3 DIMM:0 C0

Impact

The DIMM is disabled. System performance degradation might occur.

Cause

A memory fault is detected.

Recommended action

1.     Replace the DIMM.

2.     If multiple DIMMs in the channels of one CPU are disabled at the same time (excluding DIMMs disabled due to POR rules), install the CPU in another socket. If the issue recurs on this CPU, replace the CPU.

3.     If multiple DIMMs are disabled for two CPUs at the same time (excluding DIMMs disabled due to POR rules), replace the system board.

4.     If the issue persists, contact Technical Support.

 

Correctable ECC or other correctable memory error logging limit reached

Event code

0x0c5000de

Message text

Correctable ECC or other correctable memory error logging limit reached--$1---Location:CPU:$2 CH:$3 DIMM:$4 $5 $6

Variable fields

$1: Specifies if the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error.

$2: CPU number.

$3: Channel number.

$4: DIMM number.

$5: DIMM mark.

$6: Optional description. The value is ---Predictive Failure assert. This field is not displayed when the CE limit is reported normally.

Severity level

Minor

Example

Correctable ECC or other correctable memory error logging limit reached---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A1---Predictive Failure assert

Impact

The system might restart or stop responding.

Cause

The DIMM might be installed incorrectly, or an internal memory failure might exist. The number of correctable memory errors has reached the set threshold. If the corresponding memory RAS mode is enabled, the RAS features are executed and the system does not crash. The errors still exceed the threshold even in memory repair mode.

Recommended action

1.     Reinstall the corresponding DIMM. Make sure it is installed correctly, the gold contacts are clean, no foreign objects exist in the memory slot, and the environmental temperature and humidity are normal.

2.     Check the memory funnel threshold in the BIOS. If it is too low, adjust the funnel threshold value in the BIOS.

3.     If the issue persists, contact Technical Support.
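
When these messages are filtered automatically, the boot field and the optional predictive failure suffix are usually the fields of interest. The following is a minimal sketch based on the Example line above. The name parse_ce_limit and the dictionary keys are hypothetical, and the pattern tolerates two or three hyphens before the boot field as a precaution.

```python
import re
from typing import Optional

# Pattern derived from the Example line of event code 0x0c5000de.
CE_LIMIT_RE = re.compile(
    r"limit reached-{2,3}(?P<boot>Current Boot Error|Last Boot Error)"
    r"-{3}Location:CPU:(?P<cpu>\d+) CH:(?P<channel>\d+) DIMM:(?P<dimm>\d+) (?P<mark>\w+)"
    r"\s*(?P<predictive>-{3}Predictive Failure assert)?"
)


def parse_ce_limit(message: str) -> Optional[dict]:
    """Extract the boot field, DIMM location, and predictive failure flag."""
    match = CE_LIMIT_RE.search(message)
    if match is None:
        return None
    return {
        "boot": match.group("boot"),
        "cpu": int(match.group("cpu")),
        "channel": int(match.group("channel")),
        "dimm": int(match.group("dimm")),
        "mark": match.group("mark"),
        # True only when the optional description field is present.
        "predictive_failure": match.group("predictive") is not None,
    }


example = (
    "Correctable ECC or other correctable memory error logging limit reached"
    "---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A1"
    "---Predictive Failure assert"
)
print(parse_ce_limit(example))
```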

 

Correctable ECC or other memory error limit reached

Event code

0x0c5020de

Message text

Correctable ECC or other correctable memory error logging limit reached---$1 $2:$3---Location:CPU:$4 CH:$5 DIMM:$6 $7 $8

Variable fields

$1: MCA or UMC. This field is available only in case of CE Count Overflow.

$2: Limit type. It can be CE Count Overflow, Memory CE Storm Threshold, or Memory CE Accumulation Threshold.

$3: Threshold.

$4: CPU number.

$5: Channel number.

$6: DIMM number.

$7: DIMM mark.

$8: Description. The value is ---Predictive Failure assert. This field is not displayed when the CE limit is reported normally.

Severity level

Minor

Example

Correctable ECC or other correctable memory error logging limit reached---MCA CE Count Overflow:8769---Location:CPU:1 CH:5 DIMM:0 A0---Predictive Failure assert

Impact

The system might restart or stop responding.

Cause

The DIMM might not be installed correctly, or it might have an internal failure. The number of correctable memory errors has reached the set threshold, which does not cause a system crash. The errors still exceed the threshold even in memory repair mode.

Recommended action

1.     ‍Reinstall the corresponding DIMM. Make sure it is installed correctly, the gold contacts are not contaminated, no foreign objects exist in the memory slot, and the environmental temperature and humidity are normal.

2.     Check whether the memory funnel threshold in the BIOS is too low. If so, adjust the funnel threshold value in the BIOS.

3.     If the issue persists, contact Technical Support.
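
The limit type and threshold fields can be separated from the rest of the message as follows. This is a minimal sketch, assuming that the threshold is a decimal number as in the example. The name parse_ce_threshold is hypothetical, and the second sample message is constructed for illustration only, not taken from a real log.

```python
import re
from typing import Optional

# $1 (MCA or UMC) is optional; $2 is one of the three limit types; $3 is the threshold.
LIMIT_RE = re.compile(
    r"limit reached-{3}\s*(?:(?P<source>MCA|UMC) )?"
    r"(?P<kind>CE Count Overflow|Memory CE Storm Threshold|Memory CE Accumulation Threshold)"
    r":(?P<value>\d+)-{3}Location:"
)


def parse_ce_threshold(message: str) -> Optional[dict]:
    """Return the optional source, the limit type, and the threshold value."""
    match = LIMIT_RE.search(message)
    if match is None:
        return None
    return {
        "source": match.group("source"),  # None when $1 is not present
        "kind": match.group("kind"),
        "threshold": int(match.group("value")),
    }


documented = (
    "Correctable ECC or other correctable memory error logging limit reached"
    "---MCA CE Count Overflow:8769---Location:CPU:1 CH:5 DIMM:0 A0"
    "---Predictive Failure assert"
)
# Hypothetical message without the $1 field, for illustration only.
illustrative = (
    "Correctable ECC or other correctable memory error logging limit reached"
    "---Memory CE Storm Threshold:10---Location:CPU:1 CH:5 DIMM:0 A0"
)
print(parse_ce_threshold(documented))
print(parse_ce_threshold(illustrative))
```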

 

Presence detected

Event code

0x0c6000de

Message text

Presence detected

Variable fields

N/A

Severity level

Info

Example

Presence detected

Impact

No negative impact.

Cause

A DIMM is detected as present.

Recommended action

No action is required.

 

Memory patrol scrub CE occurred

Event code

0x0c3010de

Message text

Memory patrol scrub CE occurred---$1---Location:CPU:$2 CH:$3 DIMM:$4 $5

Variable fields

$1: Specifies if the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error.

$2: CPU number.

$3: Channel number.

$4: DIMM number.

$5: DIMM mark.

Severity level

Minor

Example

Memory patrol scrub CE occurred---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A0

Impact

A check failed when memory data was read. No negative impact.

Cause

CE Inspection.

This message indicates that a data parity error occurred while a memory cell was being read. The error occurred on the command/address lines and caused abnormal data retrieval from the memory. The SEL records the error together with the DIMM that was being accessed.

Recommended action

No action is required.

 

Memory patrol scrub UCE occurred and degraded to CE

Event code

0x0c3020de

Message text

Memory patrol scrub UCE occurred and degraded to CE---$1---Location:CPU:$2 CH:$3 DIMM:$4 $5

Variable fields

$1: Specifies if the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error.

$2: CPU number.

$3: Channel number.

$4: DIMM number.

$5: DIMM mark.

Severity level

Minor

Example

Memory patrol scrub UCE occurred and degraded to CE---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A0

Impact

A check failed when memory data was read. No negative impact.

Cause

UCE Inspection: Degraded CE.

This message indicates that a data parity error occurred while a memory cell was being read. The error occurred on the command/address lines and caused abnormal data retrieval from the memory. The SEL records the error together with the DIMM that was being accessed.

Recommended action

No action is required.

 

Memory patrol scrub CE occurred

Event code

0x0c3050de

Message text

Memory scrub Failed---$1---Location: CPU:$2 CH:$3 DIMM:$4

Explanation

A correctable error was detected during memory patrol.

Variable fields

$1: Memory error type, including Scrub corrected error.

$2: CPU number.

$3: Channel number.

$4: Memory number.

Severity level

Minor

Example

Memory scrub Failed ---Scrub corrected error---Location: CPU:1 CH:1 DIMM:0

Impact

No negative impact on the system.

Cause

A memory correctable error was detected during memory patrol.

Recommended action

No action is required.

 

Memory patrol scrub UCE occurred

Event code

0x0c3150de

Message text

Memory scrub Failed ---$1---Location: CPU:$2 CH:$3 DIMM:$4

Explanation

An uncorrectable error was detected during memory patrol.

Variable fields

$1: Memory error type, including Scrub uncorrected error.

$2: CPU number.

$3: Channel number.

$4: Memory number.

Severity level

Major

Example

Memory scrub Failed ---Scrub uncorrected error---Location: CPU:1 CH:1 DIMM:0

Impact

The system might stop responding.

Cause

An uncorrectable error was detected during memory patrol.

Recommended action

1.     Verify that the ambient temperature and humidity are within the normal range.

2.     Clean the slot and the gold contacts of the memory module. Make sure no foreign objects exist in the slots and the contacts are not contaminated. Then, reinstall the corresponding memory module.

3.     If the issue persists, replace the memory module.

4.     If the issue persists, contact Technical Support.
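
Of the four patrol scrub event codes described above, only the uncorrected error entry recommends corrective action. A monitoring script could encode this as a lookup table. The following is a minimal sketch. The names PATROL_SCRUB_EVENTS and triage are hypothetical, and the table contains only values stated in the entries above.

```python
from typing import Optional, Tuple

# event code -> (severity level, whether the entry recommends corrective action)
PATROL_SCRUB_EVENTS = {
    0x0C3010DE: ("Minor", False),  # Memory patrol scrub CE occurred
    0x0C3020DE: ("Minor", False),  # Memory patrol scrub UCE occurred and degraded to CE
    0x0C3050DE: ("Minor", False),  # Memory scrub Failed, Scrub corrected error
    0x0C3150DE: ("Major", True),   # Memory scrub Failed, Scrub uncorrected error
}


def triage(event_code: int) -> Optional[Tuple[str, bool]]:
    """Return (severity, action_required), or None for other event codes."""
    return PATROL_SCRUB_EVENTS.get(event_code)


for code in (0x0C3010DE, 0x0C3150DE):
    print(f"{code:#010x}", triage(code))
```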

 

Configuration Error---DIMM speed is less than the minimum POR DIMM speed

Event code

0x0c707274

Message text

Configuration Error---DIMM speed is less than the minimum POR DIMM speed---Location:CPU:$1 CH:$2 DIMM:$3 $4 Rank:$5

Variable fields

$1: CPU number.

$2: Channel number.

$3: DIMM number.

$4: DIMM mark.

$5: Rank number.

Severity level

Minor

Example

Configuration Error---DIMM speed is less than the minimum POR DIMM speed---Location:CPU:1 CH:1 DIMM:0 A0 Rank:0

Impact

The system might restart or stop responding.

Cause

The DIMM speed is lower than the minimum POR DIMM speed.

Recommended action

1.     ‍See the user guide on the official website to select the matching memory or CPUs.

2.     If the issue persists, contact Technical Support.

 

Drive Slot

Drive Presence

Event code

0x0d0000de

Message text

Drive Presence

Variable fields

N/A

Severity level

Info

Example

Drive Presence

Impact

The drive presence changed.

Cause

The drive presence changed.

Recommended action

No action is required.

 

Drive Fault

Event code

0x0d1000de

Message text

Drive Fault

HDDBay upper drive: Drive Fault --- Bay Slot: $1, HDD Slot: $2

Variable fields

$1: Bay slot number.

$2: HDD slot number.

Severity level

Major

Example

Drive Fault

HDDBay upper drive: Drive Fault --- Bay Slot: 1, HDD Slot: 2

Impact

The drive is faulty, which might cause data loss.

Cause

The drive is faulty.

Recommended action

1.     ‍Verify that the status of the drive is Unconfigured Good.

2.     Verify that drive LEDs are normal, and the drive can be identified and is accessible in the OS. If a drive LED is orange, the drive is faulty. Replace the faulty components, if any.

3.     Verify that the storage controller is in normal state.

4.     If the issue persists, contact Technical Support.

 

Drive Fault

Event code

0x0d1050de

Message text

Drive Fault---Percentage drive life used is $1%---Exceed the major threshold $2% $3.

Variable fields

$1: Used life percentage of the drive.

$2: Major threshold.

$3: PCIe mark for the AIC card.

Severity level

Major

Example

·     Non-NVMe AIC card: Drive Fault---Percentage drive life used is 100%---Exceed the major threshold 95%.

·     NVMe AIC card: Drive Fault---Percentage drive life used is 100%---Exceed the major threshold 95% ---PCIe slot 7.

Impact

The remaining drive life is significantly lower than expected, which might cause data loss.

Cause

The used SSD life reaches the major alarm threshold.

Recommended action

1.     ‍If the used drive life reaches the major alarm threshold, replace the drive as soon as possible.

2.     If the issue persists, contact Technical Support.

 

Drive Fault---The disk is present, but its details cannot be obtained

Event code

0x0d1500de

Message text

Drive Fault---The disk is present, but its details cannot be obtained

HDDBay drive: Drive Fault---The disk is present but its details cannot be obtained

Variable fields

N/A

Severity level

Major

Example

Drive Fault---The disk is present, but its details cannot be obtained

HDDBay drive: Drive Fault---The disk is present but its details cannot be obtained

Impact

The stability of the storage system is impacted.

Cause

A drive cannot be identified by the storage system, because the drive is faulty or the cables are connected incorrectly.

Recommended action

1.     Log in to HDM, and verify that the drive can be identified successfully.

2.     If the issue persists, verify that the drive data, power, and signal cables are connected correctly.

3.     If the issue persists, replace the drive.

4.     If the issue persists, contact Technical Support.

 

Drive Fault---The disk is present, but its details cannot be obtained

Event code

0x0d1520de

Message text

Drive Fault---The disk is present, but its details cannot be obtained

HDDBay drive: Drive Fault---The disk is present but its details cannot be obtained---Bay slot:$1---HDD slot:$2

Variable fields

$1: Bay slot number.

$2: HDD slot number.

Severity level

Major

Example

Drive Fault---The disk is present, but its details cannot be obtained

HDDBay drive: Drive Fault---The disk is present but its details cannot be obtained---Bay slot:10---HDD slot:34

Impact

The stability of the storage system is impacted.

Cause

A drive cannot be identified by the storage system, because the drive is faulty or the cables are connected incorrectly.

Recommended action

1.     Log in to HDM, and verify that the drive can be identified successfully.

2.     If the issue persists, verify that the drive data, power, and signal cables are connected correctly.

3.     If the issue persists, replace the drive.

4.     If the issue persists, contact Technical Support.
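
The HDDBay messages in this section write the bay and HDD slot numbers in two layouts: "Bay Slot: $1, HDD Slot: $2" and "Bay slot:$1---HDD slot:$2". The following is a minimal sketch of a parser that accepts both layouts, assuming that the slot numbers are decimal digits. The name parse_bay_location is hypothetical.

```python
import re
from typing import Optional, Tuple

# Accepts "Bay Slot: 1, HDD Slot: 2" and "Bay slot:10---HDD slot:34".
BAY_RE = re.compile(
    r"Bay slot:\s*(?P<bay>\d+)(?:-{3}|,\s*)HDD slot:\s*(?P<hdd>\d+)",
    re.IGNORECASE,
)


def parse_bay_location(message: str) -> Optional[Tuple[int, int]]:
    """Return (bay slot, HDD slot), or None if the message has no slot fields."""
    match = BAY_RE.search(message)
    if match is None:
        return None
    return int(match.group("bay")), int(match.group("hdd"))


samples = [
    "HDDBay upper drive: Drive Fault --- Bay Slot: 1, HDD Slot: 2",
    "HDDBay drive: Drive Fault---The disk is present but its details "
    "cannot be obtained---Bay slot:10---HDD slot:34",
]
for sample in samples:
    print(parse_bay_location(sample))
```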

 

Drive Fault

Event code

0x0d1900de

Message text

Drive Fault--- $1---$2

Variable fields

$1: Fault description.

$2: PCIe silkscreen mark for the AIC card. This field is available only when an AIC card is connected.

Severity level

Major

Example

Drive Fault---The NVM subsystem reliability has been degraded---PCIe Slot 2

Impact

The drive health status is abnormal, which might cause data loss.

Cause

The SMART status of drives is abnormal.

Recommended action

1.     ‍Run a full SMART test on drives to check the drive health status.

2.     If the issue persists, back up the drive data promptly and replace the faulty drive.

3.     If the issue persists, contact Technical Support.

 

Predictive Failure

Event code

0x0d2000de

Message text

Predictive Failure

Variable fields

N/A

Severity level

Minor

Example

Predictive Failure

Impact

The drive reliability decreases, which might impact the OS storage performance and service operation.

Cause

The RAID controller or NVMe SSD reports a predictive failure, which can be a storage medium reserved block alarm, drive lifetime alarm, Prefail alarm, or bad sector alarm.

Recommended action

1.     ‍Replace the drive.

2.     If the issue persists, contact Technical Support.

 

Predictive Failure

Event code

0x0d2050de

Message text

Predictive Failure---Percentage drive life used is $1%---Exceed the minor threshold $2% $3.

Variable fields

$1: Used life percentage of the drive.

$2: Minor threshold.

$3: PCIe mark for the AIC card.

Severity level

Minor

Example

·     Non-NVMe AIC card: Predictive Failure---Percentage drive life used is 93%---Exceed the minor threshold 90%.

·     NVMe AIC card: Predictive Failure---Percentage drive life used is 93%---Exceed the minor threshold 90%---PCIe slot 7.

Impact

The remaining drive life is significantly lower than expected. The probability of drive damage is increasing, which affects data security.

Cause

The used SSD life reaches the minor alarm threshold.

Recommended action

1.     ‍If the used drive life reaches the minor alarm threshold, replace the drive as soon as possible.

2.     If the issue persists, contact Technical Support.
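
The drive life messages for event codes 0x0d1050de and 0x0d2050de share the same layout and differ only in the event name and threshold level. The following is a minimal sketch of one parser for both, assuming that the PCIe slot is a decimal number as in the examples. The name parse_drive_life and the dictionary keys are hypothetical.

```python
import re
from typing import Optional

LIFE_RE = re.compile(
    r"(?P<event>Drive Fault|Predictive Failure)-{3}"
    r"Percentage drive life used is (?P<used>\d+)%-{3}"
    r"Exceed the (?P<level>major|minor) threshold (?P<threshold>\d+)%"
    r"(?:\s*-{3}PCIe slot (?P<slot>\d+))?"  # present in the NVMe AIC card examples
)


def parse_drive_life(message: str) -> Optional[dict]:
    """Extract used life, threshold, and the optional PCIe slot."""
    match = LIFE_RE.search(message)
    if match is None:
        return None
    slot = match.group("slot")
    return {
        "event": match.group("event"),
        "level": match.group("level"),
        "used_percent": int(match.group("used")),
        "threshold_percent": int(match.group("threshold")),
        "pcie_slot": int(slot) if slot is not None else None,
    }


major = "Drive Fault---Percentage drive life used is 100%---Exceed the major threshold 95% ---PCIe slot 7."
minor = "Predictive Failure---Percentage drive life used is 93%---Exceed the minor threshold 90%."
print(parse_drive_life(major))
print(parse_drive_life(minor))
```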

 

Predictive Failure

Event code

0x0d2900de

Message text

Predictive Failure--- $1 $2

Variable fields

$1: Fault description.

$2: PCIe silkscreen mark for the AIC card. This field is available only when an AIC card is connected.

Severity level

Minor

Example

Predictive Failure --- The available spare capacity has fallen below the threshold---PCIe Slot 2

Impact

The drive health status is abnormal, which might cause data loss.

Cause

The SMART status of drives is abnormal.

Recommended action

1.     ‍Run a full SMART test on drives to check the drive health status.

2.     If the issue persists, back up the drive data promptly and replace the faulty drive.

3.     If the issue persists, contact Technical Support.

 

In Critical Array

Event code

0x0d5000de

Message text

In Critical Array---PCIe slot:$1---LDDevno:$2

Variable fields

$1: PCIe slot where the logical drive resides.

$2: Logical drive number.

Severity level

Major

Example

In Critical Array---PCIe slot:1---LDDevno:1

Impact

The logical drive is degraded, which might impact data reliability.

Cause

A drive in a logical drive was removed or failed, and the logical drive became degraded.

Recommended action

1.     ‍Verify that no drive is removed. If a drive is removed, re-install the drive and recreate the RAID array.

2.     Log in to HDM, view drive information from the storage page, and verify that all drives in the logical drive are identified correctly. If a drive cannot be identified, re-install the drive. If the drive cannot be identified after re-installation, replace the drive.

3.     Log in to HDM, view drive information, and verify that the status of the drive is Unconfigured Good.

4.     After the drive is identified correctly, recreate the RAID array.

5.     If the issue persists, contact Technical Support.

 

In Failed Array

Event code

0x0d6000de

Message text

In Failed Array---PCIe slot:$1---LDDevno:$2

Variable fields

$1: PCIe slot where the logical drive resides.

$2: Logical drive number.

Severity level

Major

Example

In Failed Array---PCIe slot:1---LDDevno:1

Impact

The RAID array becomes invalid and goes offline, which causes data loss.

Cause

A drive in a logical drive was removed or failed, and the logical drive was completely corrupted.

Recommended action

1.     ‍Verify that no drive is removed. If a drive is removed, re-install the drive.

2.     If the drive is installed correctly, log in to HDM. View drive information from the storage page, and verify that the drive can be identified correctly. If the drive cannot be identified, re-install the drive. If the drive cannot be identified after re-installation, replace the drive.

3.     If the drive is installed correctly, log in to HDM, view drive information from the storage page, and verify that the status of the drive is Unconfigured Good.

4.     After the drive is identified correctly, verify that the RAID array is normal. If the RAID array is faulty, recreate the RAID array.

5.     If the issue persists, contact Technical Support.
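
The In Critical Array and In Failed Array messages carry the same two fields, so one parser can identify the affected logical drive for both. The following is a minimal sketch, assuming that the PCIe slot and logical drive number are decimal digits as in the examples. The names ArrayEvent and parse_array_event are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

ARRAY_RE = re.compile(
    r"In (?P<state>Critical|Failed) Array-{3}"
    r"PCIe slot:(?P<slot>\d+)-{3}LDDevno:(?P<ld>\d+)"
)


@dataclass(frozen=True)
class ArrayEvent:
    state: str          # "Critical" (degraded) or "Failed"
    pcie_slot: int      # PCIe slot where the logical drive resides
    logical_drive: int  # logical drive number


def parse_array_event(message: str) -> Optional[ArrayEvent]:
    """Return the array state and logical drive location, or None."""
    match = ARRAY_RE.search(message)
    if match is None:
        return None
    return ArrayEvent(
        state=match.group("state"),
        pcie_slot=int(match.group("slot")),
        logical_drive=int(match.group("ld")),
    )


print(parse_array_event("In Critical Array---PCIe slot:1---LDDevno:1"))
print(parse_array_event("In Failed Array---PCIe slot:1---LDDevno:1"))
```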

 

Rebuild/Remap in progress

Event code

0x0d7000de

Message text

Rebuild/Remap in progress

Variable fields

N/A

Severity level

Info

Example

Rebuild/Remap in progress

Impact

No negative impact.

Cause

This message is generated during RAID rebuilding after a drive is installed.

Recommended action

No action is required.

 

The disk triggered an media error

Event code

0x0da000de

Message text

The disk triggered an media error--$1

Variable fields

$1: Drive location.

Severity level

Info

Example

The disk triggered an media error--Front 1

Impact

A media error on the storage media might cause data loss.

Cause

The number of media errors exceeded the threshold.

Recommended action

1.     ‍Update the drive firmware.

2.     Replace the drive.

3.     If the issue persists, contact Technical Support.

 

The disk triggered an uncorrectable error

Event code

0x0db000de

Message text

The disk triggered an uncorrectable error--$1

Variable fields

$1: Drive location.

Severity level

Minor

Example

The disk triggered an uncorrectable error--Front 1

Impact

An uncorrectable error on the storage media might cause data loss.

Cause

The number of uncorrectable errors exceeded the threshold.

Recommended action

1.     ‍Update the drive firmware.

2.     Replace the drive.

3.     If the issue persists, contact Technical Support.

 

The disk is missing

Event code

0x0dc000de

Message text

The disk is missing

Variable fields

N/A

Severity level

Major

Example

The disk is missing

Impact

The drive is removed or not installed correctly, which impacts the stability of the storage system.

Cause

The drive cannot be identified by the storage controller or drive cables are connected incorrectly.

Recommended action

1.     ‍Log in to HDM, and verify that the drive can be identified successfully.

2.     Verify that the drive data cables, power cords, and signal cables are connected correctly.

3.     Re-install the drive.

4.     Replace the drive.

5.     If the issue persists, contact Technical Support.

 

System Firmware Progress

System Firmware Error (POST Error)---CPU PPL initialization failed

Event code

0x0f00e0de

Message text

System Firmware Error (POST Error)---CPU PPL initialization failed

Variable fields

N/A

Severity level

Critical

Example

System Firmware Error (POST Error)---CPU PPL initialization failed

Impact

System startup failure occurs.

Cause

CPU PLL initialization failed. The startup process hung.

Recommended action

1.     Restart the server and check whether the log is generated again.

2.     Replace the BIOS firmware.

3.     Power off the server and replace the CPU.

4.     If the issue persists, contact Technical Support.

 

System Firmware Error (POST Error)---No memory found

Event code

0x0f0e8014

Message text

System Firmware Error (POST Error)---No memory found

Variable fields

N/A

Severity level

Major

Example

System Firmware Error (POST Error)---No memory found

Impact

The system cannot start up correctly.

Cause

No DIMMs are available.

Recommended action

Verify that the DIMMs are available in the system.

 

System Firmware Error (POST Error)---CPU matching failure---CPU stepping is detected

Event code

0x0f0d00de

Message text

System Firmware Error (POST Error)---CPU matching failure---CPU stepping is detected

Variable fields

N/A

Severity level

Major

Example

System Firmware Error (POST Error)---CPU matching failure---CPU stepping is detected

Impact

System startup failure occurs.

Cause

A CPU stepping mismatch error occurred at POST.

Recommended action

1.     ‍Verify that the CPU has the same model as the primary CPU.

2.     Verify that the stepping of the CPU matches that of the primary CPU.

 

System Firmware Error (POST Error)---CPU matching failure---CPU frequency is detected

Event code

0x0f0d10de

Message text

System Firmware Error (POST Error)---CPU matching failure---CPU frequency is detected

Variable fields

N/A

Severity level

Major

Example

System Firmware Error (POST Error)---CPU matching failure---CPU frequency is detected

Impact

System startup failure might occur.

Cause

A CPU frequency mismatch error occurred at POST.

Recommended action

Verify that the CPU has the same model as the primary CPU.

 

System Firmware Error (POST Error)---CPU matching failure---CPU Microcode is detected

Event code

0x0f0d20de

Message text

System Firmware Error (POST Error)---CPU matching failure---CPU Microcode is detected

Variable fields

N/A

Severity level

Major

Example

System Firmware Error (POST Error)---CPU matching failure---CPU Microcode is detected

Impact

System startup failure might occur.

Cause

A CPU microcode mismatch error occurred at POST.

Recommended action

Verify that the CPU has the same model as the primary CPU.

 

System Firmware Error (POST Error)---CPU matching failure---UPI Topology is detected

Event code

0x0f0d30de

Message text

System Firmware Error (POST Error)---CPU matching failure---UPI Topology is detected

Variable fields

N/A

Severity level

Major

Example

System Firmware Error (POST Error)---CPU matching failure---UPI Topology is detected

Impact

System startup failure might occur.

Cause

A CPU UPI mismatch error occurred at POST.

Recommended action

Verify that the CPU has the same model as the primary CPU.

 

System Firmware Error (POST Error)---CPU matching failure

Event code

0x0f0d40de

Message text

System Firmware Error (POST Error)---CPU matching failure

Variable fields

N/A

Severity level

Major

Example

System Firmware Error (POST Error)---CPU matching failure

Impact

System startup failure might occur.

Cause

A CPU model mismatch error occurred at POST.

Recommended action

Verify that the CPU has the same model as the primary CPU.
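
The five CPU matching failure event codes above differ only in the attribute that mismatched. A script that summarizes POST errors could map each code to that attribute. The following is a minimal sketch. The names CPU_MISMATCH_ATTRIBUTES and describe_cpu_mismatch are hypothetical, and the table contains only the codes listed in this section.

```python
# event code -> attribute reported as mismatched at POST
CPU_MISMATCH_ATTRIBUTES = {
    0x0F0D00DE: "CPU stepping",
    0x0F0D10DE: "CPU frequency",
    0x0F0D20DE: "CPU microcode",
    0x0F0D30DE: "UPI topology",
    0x0F0D40DE: "CPU model",
}


def describe_cpu_mismatch(event_code: int) -> str:
    """Return a short description for a CPU matching failure event code."""
    attribute = CPU_MISMATCH_ATTRIBUTES.get(event_code)
    if attribute is None:
        return "not a CPU matching failure event code"
    return f"{attribute} mismatch detected at POST"


# Event codes copied from a log are text, so convert them with base 16 first.
print(describe_cpu_mismatch(int("0x0f0d10de", 16)))
```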

 

System Firmware Error(POST Error)---Unrecoverable video controller failure

Event code

0x0f0090de

Message text

System Firmware Error(POST Error)---Unrecoverable video controller failure

Variable fields

N/A

Severity level

Minor

Example

System Firmware Error(POST Error)---Unrecoverable video controller failure

Impact

KVM video display is abnormal.

Cause

Two VGA screen captures are the same during the host startup process.

Recommended action

1.     ‍Replace the BMC card.

2.     If the issue persists, contact Technical Support.

 

System Firmware Hang

Event code

0x0f1000de

Message text

System Firmware Hang

Variable fields

N/A

Severity level

Critical

Example

System Firmware Hang

Impact

System operation failure might occur.

Cause

The BIOS hangs during startup.

Recommended action

1.     ‍Resolve the issue based on other event logs reported simultaneously for the component.

2.     If the issue persists, contact Technical Support.

 

System Firmware Hang---C2C initialization failed

Event code

0x0f1800de

Message text

System Firmware Hang---C2C initialization failed

Variable fields

N/A

Severity level

Critical

Example

System Firmware Hang---C2C initialization failed

Impact

The system fails to start up properly.

Cause

The processor interconnect link C2C failed to be initialized.

Recommended action

1.     ‍Restart the server.

2.     Update the BIOS firmware.

3.     If the issue persists, replace the processor.

4.     If the issue persists, replace the system board.

5.     If the issue persists, contact Technical Support.

 

System Firmware Hang---C2C initialization cannot obtain parameter table

Event code

0x0f1801de

Message text

System Firmware Hang---C2C initialization cannot obtain parameter table

Variable fields

N/A

Severity level

Critical

Example

System Firmware Hang---C2C initialization cannot obtain parameter table

Impact

The system cannot operate correctly.

Cause

Initialization of the processor interconnect link C2C could not continue because the parameter table could not be obtained.

Recommended action

1.     ‍Restart the server.

2.     Update the BIOS firmware.

3.     If the issue persists, replace the processor.

4.     If the issue persists, replace the system board.

5.     If the issue persists, contact Technical Support.

 

System software triggered an uncorrectable error

Event code

0x0f1a00de

Message text

System software triggered an uncorrectable error

Variable fields

N/A

Severity level

Major

Example

System software triggered an uncorrectable error

Impact

An IERR or MCERR error is triggered, which might cause service unavailability.

Cause

An IERR or MCERR error is triggered, and the HDM diagnosis result shows that a system software uncorrectable error occurred.

Recommended action

Usually an IERR or MCERR error is triggered by an abnormality in the system or system software. Contact Technical Support.

 

System software triggered a correctable error

Event code

0x0f0a00de

Message text

System software triggered a correctable error

Variable fields

N/A

Severity level

Minor

Example

System software triggered a correctable error

Impact

An IERR or MCERR error is triggered, which might cause service unavailability.

Cause

An IERR or MCERR error is triggered, and the HDM diagnosis result shows that a system software correctable error occurred.

Recommended action

Usually an IERR or MCERR error is triggered by an abnormality in the system or system software. Contact Technical Support.

 

System Firmware Progress---Memory initialization---The system is unable to find memory parameter table

Event code

0x0f2011de

Message text

System Firmware Progress---Memory initialization---The system is unable to find memory parameter table

Variable fields

N/A

Severity level

Critical

Example

System Firmware Progress---Memory initialization---The system is unable to find memory parameter table

Impact

The system fails to start up properly.

Cause

The system failed to obtain the memory parameter table during memory initialization.

Recommended action

1.     ‍Restart the server.

2.     Update the BIOS firmware.

3.     If the issue persists, replace the memory module.

4.     If the issue persists, replace the system board.

5.     If the issue persists, contact Technical Support.

 

System Firmware Progress---Secondary processor(s) initialization---Detection unsuccessful

Event code

0x0f2030de

Message text

System Firmware Progress---Secondary processor(s) initialization---Detection unsuccessful

Variable fields

N/A

Severity level

Minor

Example

System Firmware Progress---Secondary processor(s) initialization---Detection unsuccessful

Impact

No negative impact.

Cause

The TPM/TCM self-test signal is lost or a device access failure occurs.

Recommended action

Contact Technical Support.

 

System Firmware Progress---PCI resource configuration---PCIe controller initialization failed

Event code

0x0f2070de

Message text

System Firmware Progress---PCI resource configuration---PCIe controller initialization failed

Variable fields

N/A

Severity level

Critical

Example

System Firmware Progress---PCI resource configuration---PCIe controller initialization failed

Impact

The system cannot start up correctly.

Cause

The PCIe controller fails to initialize and hangs during startup.

Recommended action

1.     ‍Check the BIOS settings on the server. Make sure the related PCIe controller settings are configured correctly and not disabled.

2.     Restart the server.

3.     Update the BIOS firmware.

4.     If the issue persists, replace the CPU or system board where the PCIe controller resides.

5.     If the issue persists, contact Technical Support.

 

System Firmware Progress---PCI resource configuration---PCIe controller initialization cannot find parameter table

Event code

0x0f2071de

Message text

System Firmware Progress---PCI resource configuration---PCIe controller initialization cannot find parameter table

Variable fields

N/A

Severity level

Critical

Example

System Firmware Progress---PCI resource configuration---PCIe controller initialization cannot find parameter table

Impact

The system cannot start up correctly.

Cause

The PCIe controller fails to initialize because the parameter table cannot be found.

Recommended action

1.     ‍Check the BIOS settings on the server. Make sure the related PCIe controller settings are configured correctly and not disabled.

2.     Restart the server.

3.     Update the BIOS firmware.

4.     If the issue persists, replace the CPU or system board where the PCIe controller resides.

5.     If the issue persists, contact Technical Support.

 

System Firmware Progress---Video initialization---Detection unsuccessful

Event code

0x0f2090de

Message text

System Firmware Progress---Video initialization---Detection unsuccessful

Variable fields

N/A

Severity level

Minor

Example

System Firmware Progress---Video initialization---Detection unsuccessful

Impact

No negative impact.

Cause

A video controller check failed.

Recommended action

Contact Technical Support.

 

Event Logging Disabled

Log Area Reset/Cleared

Event code

0x102000de

Message text

Log Area Reset/Cleared

Variable fields

N/A

Severity level

Info

Example

Log Area Reset/Cleared

Impact

No negative impact.

Cause

This message is generated when all event log entries are cleared.

Recommended action

No action is required.

 

SEL Full

Event code

0x104000de

Message text

SEL Full

Variable fields

N/A

Severity level

Minor

Example

SEL Full

Impact

The system stops logging new events.

Cause

This message is generated when one of the following occurs:

·     The event log reaches its maximum size, and the system stops logging new events.

·     A user disables event logging.

Recommended action

Log in to HDM, enter the Event Log page, and clear all event logs.

 

SEL Almost Full

Event code

0x105000de

Message text

SEL Almost Full

Variable fields

N/A

Severity level

Minor

Example

SEL Almost Full

Impact

No negative impact.

Cause

The event log is approaching its maximum size, and the logging policy is configured to stop logging when storage is full.

Recommended action

Log in to HDM, enter the Event Log page, and clear all event logs.

 

System Event

System Reconfigured---BIOS load default. CMOS cleared

Event code

0x120000de

Message text

System Reconfigured---BIOS load default. CMOS cleared

Variable fields

N/A

Severity level

Info

Example

System Reconfigured---BIOS load default. CMOS cleared

Impact

The BIOS loads the default settings, and the user-configured settings are lost.

Cause

The system board battery is abnormal.

Recommended action

1.     ‍Verify that the BIOS boot mode meets the requirements of secure boot. If not, change the boot mode to UEFI.

2.     Verify that the BIOS firmware is upgraded successfully.

3.     Upgrade the BIOS with the factory defaults (if any) or the default BIOS settings restored.

4.     If the issue persists, contact Technical Support.

 

Oem system boot event---LPC Reset occurred

Event code

0x121000de

Message text

Oem system boot event---LPC Reset occurred

Variable fields

N/A

Severity level

Info

Example

Oem system boot event---LPC Reset occurred

Impact

The host system reboots or powers off.

Cause

·     A command to reboot or power off the host system was executed.

·     The host system has an anomaly.

Recommended action

No action is required.

 

Limit Exceeded---CPU usage exceeds the threshold

Event code

0x1210100a

Message text

Limit Exceeded---CPU usage exceeds the threshold---Current usage $1, Threshold $2

Variable fields

$1: Current CPU usage.

$2: CPU usage threshold.

Severity level

Info

Example

Limit Exceeded---Cpu usage exceeds the threshold---Current usage 82%, Threshold 80%

Impact

System performance degradation might occur.

Cause

The CPU usage exceeds the threshold.

Recommended action

No action is required.

 

Limit Exceeded---Mem usage exceeds the threshold

Event code

0x1210200a

Message text

Limit Exceeded---Mem usage exceeds the threshold---Current usage $1, Threshold $2

Variable fields

$1: Current memory usage.

$2: Memory usage threshold.

Severity level

Info

Example

Limit Exceeded---Mem usage exceeds the threshold---Current usage 81%, Threshold 80%

Impact

System performance degradation might occur.

Cause

The memory usage exceeds the threshold.

Recommended action

No action is required.

 

Limit Exceeded---Network usage exceeds the threshold

Event code

0x1210300a

Message text

Limit Exceeded---Network usage exceeds the threshold---Current usage $1, Threshold $2

Variable fields

$1: Current network usage.

$2: Network usage threshold.

Severity level

Info

Example

Limit Exceeded---Network usage exceeds the threshold---Current usage 81%, Threshold 80%

Impact

Network connectivity might be lost.

Cause

The network usage exceeds the threshold.

Recommended action

This message is triggered by FIST SMS according to the system resource usage.

 

Limit Exceeded---Hard disk usage exceeds the threshold

Event code

0x1210400a

Message text

Limit Exceeded---Hard disk usage exceeds the threshold---OS:Linux/Unix,See disk details about Logical disk name,Current usage $1, Threshold $2

Variable fields

$1: Current drive usage.

$2: Drive usage threshold.

Severity level

Info

Example

Limit Exceeded---Hard disk usage exceeds the threshold---OS:Linux/Unix,See disk details about Logical disk name,Current usage 81%, Threshold 80%

Impact

The drive reliability decreases, which might impact the storage performance and service operation of the OS.

Cause

The drive usage exceeds the threshold.

Recommended action

This message is triggered by FIST SMS according to the system resource usage.
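The four Limit Exceeded messages share one layout, so a single pattern can extract the resource name and both percentages. The following is a minimal sketch, assuming the usage values are always whole-number percentages as in the examples. The function name parse_limit_exceeded is illustrative and is not part of HDM or FIST SMS.

```python
import re

# Pattern derived from the documented message texts. Optional text, such as the
# OS and logical disk details in the hard disk message, is skipped by ".*?".
LIMIT_EXCEEDED = re.compile(
    r"Limit Exceeded---(?P<resource>.+?) usage exceeds the threshold---"
    r".*?Current usage (?P<current>\d+)%, Threshold (?P<threshold>\d+)%"
)


def parse_limit_exceeded(message):
    """Return (resource, current, threshold), or None if the text does not match."""
    match = LIMIT_EXCEEDED.fullmatch(message.strip())
    if match is None:
        return None
    return match["resource"], int(match["current"]), int(match["threshold"])


sample = (
    "Limit Exceeded---Hard disk usage exceeds the threshold---OS:Linux/Unix,"
    "See disk details about Logical disk name,Current usage 81%, Threshold 80%"
)
print(parse_limit_exceeded(sample))
```

A caller could compare the two returned numbers to see by how much the threshold was exceeded.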

 

Timestamp clock synch

Event code

0x125000de

Message text

Timestamp Clock Synch---event is $1 of pair---SEL Timestamp Clock updated

Variable fields

$1: In the format of first/second, where first represents the event before time synchronization and second represents the event after time synchronization.

Severity level

Info

Example

Timestamp Clock Synch---event is first of pair---SEL Timestamp Clock updated

Impact

No negative impact.

Cause

HDM synchronizes time with the server when the server is powered on. The first event is triggered before time synchronization and the second event is triggered after time synchronization.

Recommended action

No action is required.
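Because the first and second events bracket the time synchronization, the difference between their timestamps approximates the size of the clock step. The following is a minimal sketch, assuming that each log entry is available as a (timestamp, message text) pair and that the two events of a pair are logged back to back. The function name and the sample timestamps are illustrative only.

```python
from datetime import datetime, timedelta

PREFIX = "Timestamp Clock Synch---event is "
SUFFIX = " of pair---SEL Timestamp Clock updated"


def clock_adjustments(entries):
    """Yield (first_ts, second_ts, delta) for each first/second event pair."""
    pending = None
    for timestamp, text in entries:
        if not (text.startswith(PREFIX) and text.endswith(SUFFIX)):
            continue
        which = text[len(PREFIX):-len(SUFFIX)]
        if which == "first":
            pending = timestamp
        elif which == "second" and pending is not None:
            yield pending, timestamp, timestamp - pending
            pending = None


# Hypothetical entries: the clock jumps forward by about eight hours.
entries = [
    (datetime(2025, 1, 1, 0, 0, 5), PREFIX + "first" + SUFFIX),
    (datetime(2025, 1, 1, 8, 0, 6), PREFIX + "second" + SUFFIX),
]
for first_ts, second_ts, delta in clock_adjustments(entries):
    print(first_ts, second_ts, delta)
```

A second event without a preceding first event is ignored in this sketch.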

 

Timestamp clock synch---BMC Time SYNC succeed

Event code

0x125800de

Message text

Timestamp Clock Synch---BMC Time SYNC succeed.

Variable fields

N/A

Severity level

Info

Example

Timestamp Clock Synch---BMC Time SYNC succeed.

Impact

No negative impact.

Cause

The BMC successfully synchronized the ME clock.

Recommended action

No action is required.

 

Critical Interrupt

Transition to Non-Critical from OK

Event code

0x1310000e

Message text

Transition to Non-Critical from OK--- Single-bit ECC error---PCIe slot:$1

Variable fields

$1: Slot number.

Severity level

Major

Example

Transition to Non-Critical from OK--- Single-bit ECC error---PCIe slot: 2

Impact

An error occurred during the access to a PCIe module. This has no negative impact on the system operation.

Cause

The PCIe module in the slot is faulty.

Recommended action

This message is generated when an error is detected by PCIe hardware check. Review the related event log messages and replace the faulty PCIe module or contact Technical Support.

 

Bus Correctable Error

Event code

0x137000de

Message text

Bus Correctable Error ---Slot $1---PCIE Name:$2

Variable fields

$1: PCIe slot number.

$2: PCIe module name, or IEH if the error is an IEH error.

Severity level

Minor

Example

·     Bus Correctable Error---Slot 3---PCIE Name: RAID-LSI-9361-8i

·     Bus Correctable Error---PCIE Name: IEH

Impact

If this message is generated occasionally, no negative impact occurs on the system. If this message is generated frequently, the PCIe module performance might be affected.

Cause

An internal correctable error occurred on the PCIe module.

Recommended action

1.     If this alarm is generated only occasionally while PCIe devices are being accessed, you can generally ignore it.

2.     If the alarm is generated continuously, determine the faulty PCIe device based on the error information.

3.     Restart the server and check whether the message is still generated.

4.     Update the BIOS, PCIe device firmware, and drivers.

5.     If an IEH error occurred, replace the system board or CPUs.

6.     If a module error occurred, verify that the module is installed correctly, or install the module in another slot to identify the cause of the fault.

7.     If the issue persists, replace the faulty component with a spare part.
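Whether this message is occasional or frequent can be judged by counting occurrences per source. The following is a minimal sketch, assuming slot numbers are decimal integers and that the slot field is omitted for IEH errors, as in the second example. The function name is illustrative, and no frequency threshold is implied because this document does not define one.

```python
import re
from collections import Counter

# The slot part is optional because the IEH example carries no slot number.
BUS_CE = re.compile(
    r"Bus Correctable Error\s*---(?:Slot\s*(?P<slot>\d+)---)?"
    r"PCIE Name:\s*(?P<name>.+)"
)


def count_bus_correctable(messages):
    """Count Bus Correctable Error messages per (slot, module name) source."""
    counts = Counter()
    for text in messages:
        match = BUS_CE.fullmatch(text.strip())
        if match:
            slot = int(match["slot"]) if match["slot"] else None
            counts[(slot, match["name"].strip())] += 1
    return counts


log = [
    "Bus Correctable Error---Slot 3---PCIE Name: RAID-LSI-9361-8i",
    "Bus Correctable Error---PCIE Name: IEH",
    "Bus Correctable Error---Slot 3---PCIE Name: RAID-LSI-9361-8i",
]
print(count_bus_correctable(log))
```

Messages that do not match the pattern are skipped rather than reported as errors.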

GPU Device Correctable Error

Event code

0x137100de

Message text

GPU Device Correctable Error---Slot:$1

Variable fields

$1: Slot number of the faulty PCIe module.

Severity level

Minor

Example

GPU Device Correctable Error---Slot:9

Impact

The GPU can operate correctly, and system operation is not affected.

Cause

The onboard sensor, temperature sensor, or power consumption sensor for the Enflame S60 GPU is faulty.

Recommended action

No action is required.

 

GPU PCIe Bus Correctable Error

Event code

0x137200de

Message text

GPU PCIe Bus Correctable Error---Slot:$1

Variable fields

$1: Slot number of the faulty PCIe module.

Severity level

Minor

Example

GPU PCIe Bus Correctable Error---Slot:9

Impact

The GPU can operate correctly, and system operation is not affected.

Cause

The PCIe bus for the S60 GPU is faulty.

Recommended action

No action is required.

 

GPU Vedio Memory Correctable Error

Event code

0x137300de

Message text

GPU Vedio Memory Correctable Error---Slot:$1

Variable fields

$1: Slot number of the faulty PCIe module.

Severity level

Minor

Example

GPU Vedio Memory Correctable Error---Slot:9

Impact

The GPU can operate correctly, and system operation is not affected.

Cause

The S60 GPU video memory is faulty.

Recommended action

No action is required.

 

Bus Uncorrectable Error

Event code

0x138000de

Message text

Bus Uncorrectable Error ---Slot $1---PCIE Name:$2

Variable fields

$1: PCIe slot number.

$2: PCIe module name, or IEH if the error is an IEH error.

Severity level

Major

Example

·     Bus Uncorrectable Error---Slot 3---PCIE Name: RAID-LSI-9361-8i

·     Bus Uncorrectable Error---PCIE Name: IEH

Impact

An error occurred on the PCIe module, which might lead to a system-level failure if the error is severe enough.

Cause

An uncorrectable error occurred on the device or link when the specified PCIe device interacted with CPUs.

Recommended action

1.     Restart the server to identify whether the log message is still generated.

2.     Update the BIOS, PCIe device firmware, and drivers.

3.     If this log message is generated due to an IEH error, replace the system board or CPU.

4.     If this log message is generated due to a non-IEH error, perform the following steps:

a.     Verify that the PCIe device is installed securely and the gold contacts on the device are not contaminated.

b.     Install the PCIe device in another slot. If the fault follows the device to the new slot, replace the PCIe device. If the fault stays with the original slot, replace the card on which that slot resides.

c.     If this log message is generated on devices integrated on the system board, replace the system board.

5.     If multiple similar errors occur in the same time period, check the connection status of the riser cards and other link components or system board. If no anomalies are detected, replace link components such as the switch board or system board.

6.     If the issue persists, contact Technical Support.
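The slot swap check in step 4 can be expressed as a small decision function. The following is a minimal sketch, assuming the slot number reported after the move is read from the new log message. The function name and the returned strings are illustrative only.

```python
def swap_test_verdict(original_slot, new_slot, slot_reported_after_move):
    """Interpret the result of moving a PCIe device to another slot.

    slot_reported_after_move is the slot number in the message generated after
    the move, or None if the message was not generated again.
    """
    if slot_reported_after_move is None:
        return "no recurrence: keep monitoring"
    if slot_reported_after_move == new_slot:
        # The fault moved with the device.
        return "replace the PCIe device"
    if slot_reported_after_move == original_slot:
        # The fault stayed with the original slot.
        return "replace the card that provides the original slot"
    return "inconclusive: check riser cards and other link components"


print(swap_test_verdict(original_slot=3, new_slot=5, slot_reported_after_move=5))
```

The inconclusive branch corresponds to errors on other slots, which the later steps address.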

 

GPU Device Uncorrectable Error

Event code

0x138100de

Message text

GPU Device Uncorrectable Error---Slot:$1

Variable fields

$1: Slot number of the faulty PCIe module.

Severity level

Major

Example

GPU Device Uncorrectable Error---Slot:9

Impact

The GPU is being reset, which interrupts services.

Cause

The onboard sensor, temperature sensor, or power consumption sensor for the Enflame S60 GPU is faulty.

Recommended action

1.     ‍Wait for the GPU reset to finish.

2.     If the issue persists, restart the server.

3.     If the issue persists, re-install the GPU.

4.     If the issue persists, replace the GPU.

5.     If the issue persists, contact Technical Support.

 

GPU PCIe Bus Uncorrectable Error

Event code

0x138200de

Message text

GPU PCIe Bus Uncorrectable Error---Slot:$1

Variable fields

$1: Slot number of the faulty PCIe module.

Severity level

Major

Example

GPU PCIe Bus Uncorrectable Error---Slot:9

Impact

The GPU is being reset, which interrupts services.

Cause

The PCIe bus for the S60 GPU is faulty.

Recommended action

1.     ‍Wait for the GPU reset to finish.

2.     If the issue persists, restart the server.

3.     If the issue persists, re-install the GPU.

4.     If the issue persists, replace the GPU.

5.     If the issue persists, contact Technical Support.

 

GPU Vedio Memory Uncorrectable Error

Event code

0x138300de

Message text

GPU Vedio Memory Uncorrectable Error---Slot:$1

Variable fields

$1: Slot number of the faulty PCIe module.

Severity level

Major

Example

GPU Vedio Memory Uncorrectable Error---Slot:9

Impact

The GPU is being reset, which interrupts services.

Cause

The S60 GPU video memory is faulty.

Recommended action

1.     ‍Wait for the GPU reset to finish.

2.     If the issue persists, restart the server.

3.     If the issue persists, re-install the GPU.

4.     If the issue persists, replace the GPU.

5.     If the issue persists, contact Technical Support.

 

Bus Fatal Error

Event code

0x13a000de

Message text

Bus Fatal Error ------Slot $1---PCIE Name: $2

Variable fields

$1: PCIe slot number.

$2: PCIe module name, or IEH if the error is an IEH error.

Severity level

Major

Example

·     Bus Fatal Error---Slot 3---PCIE Name: RAID-LSI-9361-8i

·     Bus Fatal Error---PCIE Name: IEH

Impact

An error occurred on the PCIe module. The server might fail to power on because of the error.

Cause

An uncorrectable error occurred on the PCIe link or device, and the software layer could not tolerate the fault, which caused irreversible effects on the system.

Recommended action

1.     Restart the server to identify whether the log message is still generated.

2.     Update the BIOS, PCIe device firmware, and drivers.

3.     If this log message is generated due to an IEH error, replace the system board or CPU.

4.     If this log message is generated due to a non-IEH error, perform the following steps:

a.     ‍Verify that the PCIe device is installed securely and the gold contacts on the device are not contaminated.

b.     Install the PCIe device in another slot. If the fault follows the device to the new slot, replace the PCIe device. If the fault stays with the original slot, replace the card on which that slot resides.

c.     If this log message is generated on devices integrated on the system board, replace the system board.

5.     If multiple similar errors occur in the same time period, check the connection status of the riser cards and other link components or system board. If no anomalies are detected, replace link components such as the switch board or system board.

6.     If the issue persists, contact Technical Support.

 

Bus Degraded

Event code

0x13b000de

Message text

Bus Degraded ------Slot $1---PCIE Name: $2

Variable fields

$1: PCIe slot number.

$2: PCIe module name.

Severity level

Major

Example

Bus Degraded ---Slot 3---PCIE Name: RAID-LSI-9361-8i

Impact

System performance degradation might occur.

Cause

The speed and bandwidth of the PCIe module decreased.

Recommended action

1.     If the message is reported several times during a period of time, ensure that the riser card is securely connected to the system board.

2.     Reboot the server and verify that the message is not generated again.

3.     Locate the PCIe module based on the slot number.

4.     If the PCIe module is a removable component, perform the following operations:

a.     ‍Verify that the PCIe module is installed correctly.

b.     Verify that the gold plating on the PCIe module is not contaminated.

c.     Install the PCIe module to another slot to identify whether the error is present on the PCIe module or the slot.

d.     Update all firmware and drivers, including non-Intel components.

e.     If the error occurs on the PCIe slot, verify that the slot is normal and the gold plating on the riser card is not contaminated.

f.     Replace the PCIe module.

5.     If the PCIe module is embedded on the system board, perform the following operations:

a.     ‍Update the BIOS, firmware, and drivers.

b.     Replace the system board.

 

$1 triggered an uncorrectable error

Event code

0x138400de

Message text

$1 triggered an uncorrectable error

Variable fields

$1: PCIe module type.

Severity level

Major

Example

NIC triggered an uncorrectable error

Impact

An error occurred on the PCIe module, which might lead to a system-level failure if the error is severe enough.

Cause

An IERR or MCERR error occurred, which is identified as a PCIe uncorrectable error by HDM.

Recommended action

1.     ‍Locate the PCIe module based on the slot number.

2.     If the PCIe module is a removable component, perform the following operations:

a.     ‍Verify that the PCIe module is installed correctly.

b.     Install the PCIe module to another slot to identify whether the error is present on the PCIe module or the slot.

c.     Update all firmware and drivers, including non-Intel components.

3.     If the PCIe module is embedded on the system board, perform the following operations:

a.     ‍Update the BIOS, firmware, and driver.

b.     Replace the system board.

 

$1 triggered a correctable error

Event code

0x137400de

Message text

$1 triggered a correctable error

Variable fields

$1: PCIe module type.

Severity level

Minor

Example

NIC triggered a correctable error

Impact

An error occurred on the PCIe module, which might lead to a system-level failure if the error is severe enough.

Cause

An IERR or MCERR error occurred, which is identified as a PCIe correctable error by HDM.

Recommended action

1.     ‍If this alarm occurs occasionally, you can ignore it.

2.     If the alarm persists, locate the faulty PCIe module based on the slot number.

3.     If the PCIe module is a removable component, perform the following operations:

a.     ‍Verify that the PCIe module is installed correctly.

b.     Install the PCIe module to another slot to identify whether the error is present on the PCIe module or the slot.

c.     Update all firmware and drivers, including non-Intel components.

4.     If the PCIe module is embedded on the system board, perform the following operations:

a.     ‍Update the BIOS, firmware, and driver.

b.     Replace the system board.

 

Button / Switch

Power Button pressed---Physical button---Button pressed

Event code

0x140000de

Message text

Power Button pressed---$1---$2

Variable fields

$1: Button type, including Physical button and Virtual button.

$2: Action, including Power off command, Power on command, and Soft off command.

Severity level

Info

Example

Power Button pressed---Physical button---Power off command

Impact

No negative impact.

Cause

This message is generated in the following conditions:

·     The physical power button on the front panel of the server is pressed.

·     A command is executed to forcibly power off, gracefully power off, or power cycle the server.

Recommended action

No action is required.
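The message text uses three hyphens as a field separator, so the button type and action can be recovered by splitting on that separator. The following is a minimal sketch, assuming the field values themselves never contain three consecutive hyphens. The function name and the returned dictionary keys are illustrative only.

```python
# Values listed in the field descriptions above.
BUTTON_TYPES = {"Physical button", "Virtual button"}
ACTIONS = {"Power off command", "Power on command", "Soft off command"}


def parse_power_button(message):
    """Split a Power Button pressed message into its button type and action."""
    parts = [part.strip() for part in message.split("---")]
    if len(parts) != 3 or parts[0] != "Power Button pressed":
        return None
    _, button, action = parts
    return {
        "button": button,
        "action": action,
        # True only when both values appear in the lists above.
        "documented": button in BUTTON_TYPES and action in ACTIONS,
    }


print(parse_power_button("Power Button pressed---Physical button---Power off command"))
```

The documented flag makes it easy to spot values that are not listed in this section.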

 

Reset Button pressed

Event code

0x142000de

Message text

Reset Button pressed---Virtual button---reset command

Variable fields

N/A

Severity level

Info

Example

Reset Button pressed---Virtual button---reset command

Impact

No negative impact.

Cause

This message is generated when one of the following conditions exists:

·     The reset command is executed.

·     An IERR event occurs.

Recommended action

No action is required.

 

Module / Board

Transition to Non-Critical from OK($1)

Event code

0x1510000e

Message text

Transition to Non-Critical from OK($1)

Variable fields

$1: Alarm type, such as VGA_REAR, USB_REAR_UP, USB_REAR_DOW, EAR_VGA2, EAR_LCD, L_EAR_USB, INNER_USB, and R_EAR_USB.

Severity level

Minor

Example

Transition to Non-Critical from OK(VGA_REAR)

Impact

No negative impact if this message is generated occasionally.

Cause

An internal correctable error occurred on the motherboard.

Recommended action

1.     ‍Verify that the power supply for the system is normal.

2.     If the issue persists, contact Technical Support.

 

Transition to Critical from less severe

Event code

0x1520000e

Message text

Transition to Critical from less severe

Variable fields

N/A

Severity level

Major

Example

Transition to Critical from less severe

Impact

An error occurred on the PCIe BUS0 device, which might lead to a system-level failure if the error is severe enough.

Cause

An internal uncorrectable error occurred on the PCIe BUS0 device.

Recommended action

1.     Verify that the power supply for the system is normal.

2.     Verify that all components are operating correctly.

3.     If the issue persists, contact Technical Support.

 

Transition to Non-Recoverable from less severe

Event code

0x1530000e

Message text

Transition to Non-Recoverable from less severe--System detected a power supply failure on Motherboard($1)

Variable fields

$1: P5V, P5V_STBY, P12V_SHORTOUT, P1V0_STBY, P1V8_STBY, P1V05_PCH_STBY, PVNN_PCH_STBY, P1V8_PCH_STBY.

Severity level

Critical

Example

Transition to Non-Recoverable from less severe---System detected a power supply failure on Motherboard(P5V)

Impact

The system will be powered off.

Cause

The voltage for the system board is abnormal.

Recommended action

1.     ‍Ignore this message if it is triggered by a system power-on or power-off event.

2.     Reconnect power cords and identify whether the server can be powered on correctly.

¡     If the server can be powered on, the message might have been generated because of interference with the detection signals. No action is required.

¡     If the server cannot be powered on, replace the system board.

3.     If the issue persists, replace the I/O expander module if any.

4.     If the issue persists, contact Technical Support.
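The failed power rail appears in parentheses at the end of the message, so it can be extracted regardless of how the earlier separators are written. The following is a minimal sketch; the function name is illustrative, and the rail names are the ones listed under Variable fields.

```python
import re

# Rail names listed in the field description above.
DOCUMENTED_RAILS = {
    "P5V", "P5V_STBY", "P12V_SHORTOUT", "P1V0_STBY", "P1V8_STBY",
    "P1V05_PCH_STBY", "PVNN_PCH_STBY", "P1V8_PCH_STBY",
}

RAIL = re.compile(r"power supply failure on Motherboard\((?P<rail>[^)]+)\)")


def failed_rail(message):
    """Return the rail name from the message, or None if it is not present."""
    match = RAIL.search(message)
    return match["rail"] if match else None


sample = (
    "Transition to Non-Recoverable from less severe---"
    "System detected a power supply failure on Motherboard(P5V)"
)
rail = failed_rail(sample)
print(rail, rail in DOCUMENTED_RAILS)
```

A rail name outside DOCUMENTED_RAILS is still returned, so the caller can decide how to treat it.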

 

System board triggered a correctable error

Event code

0x1511000e

Message text

System board triggered a correctable error

Variable fields

N/A

Severity level

Minor

Example

System board triggered a correctable error

Impact

An IERR or MCERR error occurred in the system, which causes services to become unavailable.

Cause

An IERR or MCERR error was triggered. The error was identified as a correctable error on the system board (including backplanes) by HDM.

Recommended action

If the issue persists, contact Technical Support.

 

System board triggered an uncorrectable error

Event code

0x1521000e

Message text

System board triggered an uncorrectable error

Variable fields

N/A

Severity level

Major

Example

System board triggered an uncorrectable error

Impact

An IERR or MCERR error occurred in the system, which causes services to become unavailable.

Cause

An IERR or MCERR error was triggered. The error was identified as an uncorrectable error on the system board (including backplanes) by HDM.

Recommended action

If the issue persists, contact Technical Support.

 

Add-in Card

Transition to OK

Event code

0x1700000e

Message text

Transition to OK---PCIe slot: $1---LDDevno:$2

Variable fields

$1: PCIe slot where the logical drive resides.

$2: Logical drive number.

Severity level

Info

Example

Transition to OK---PCIe slot:1---LDDevno:0

Impact

No negative impact.

Cause

This message is generated if the logical drive managed by the storage controller changes from abnormal to normal.

Recommended action

No action is required.

 

Transition to Critical from less severe

Event code

0x1720000e

Message text

Transition to Critical from less severe

Variable fields

N/A

Severity level

Major

Example

Transition to Critical from less severe

Impact

The system will be powered off.

Cause

The backplane power supply is faulty.

Recommended action

1.     ‍Log in to HDM to identify whether the logical drive is degraded or faulty.

2.     If the logical drive is degraded, perform the following operations:

a.     ‍Verify that all member drives in the logical drive are operating correctly.

b.     Re-install member drives to identify whether the drives can be correctly identified.

c.     Access the BIOS to identify whether all member drives have been configured correctly.

d.     Check the error logs for the drives.

e.     Replace the faulty drive.

f.     If the issue persists, contact Technical Support.

3.     If the logical drive is faulty, perform the following operations:

a.     ‍Verify that the drive has not been uninstalled.

b.     Re-install the member drives and rebuild the RAID.

c.     Replace the faulty drive, and then reboot the server.

d.     If the issue persists, contact Technical Support.

 

Transition to Critical from less severe

Event code

0x172a000e

Message text

Transition to Critical from less severe---PCIe slot:$1---LDDevno:$2

Variable fields

$1: PCIe slot where the logical drive resides.

$2: Logical drive number.

Severity level

Major

Example

Transition to Critical from less severe---PCIe slot: 1---LDDevno:0

Impact

The logical drive is degraded, which might impact data reliability.

Cause

This message is generated when the logical drive managed by the storage controller is degraded or faulty.

Recommended action

1.     ‍Log in to HDM to identify whether the logical drive is degraded or faulty.

2.     If the logical drive is degraded, perform the following operations:

a.     ‍Verify that all member drives in the logical drive are operating correctly.

b.     Re-install member drives to identify whether the drives can be correctly identified.

c.     Access the BIOS to identify whether all member drives have been configured correctly.

d.     Check the error logs for the drives.

e.     Replace the faulty drive.

f.     If the issue persists, contact Technical Support.

3.     If the logical drive is faulty, perform the following operations:

a.     ‍Verify that the drive has not been uninstalled.

b.     Re-install the member drives and rebuild the RAID.

c.     Replace the faulty drive, and then reboot the server.

d.     If the issue persists, contact Technical Support.
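The Transition to Critical and Transition to OK messages for logical drives carry the same slot and logical drive fields, so they can be paired to list the drives that are still abnormal. The following is a minimal sketch, assuming messages are processed in chronological order and that both fields are decimal integers. The function name is illustrative only.

```python
import re

# Tolerates optional spaces after the colons and a doubled colon after LDDevno.
LD_EVENT = re.compile(
    r"Transition to (?P<state>OK|Critical from less severe)"
    r"---PCIe slot:\s*(?P<slot>\d+)---LDDevno:+\s*(?P<ld>\d+)"
)


def degraded_logical_drives(messages):
    """Return the set of (slot, logical drive) pairs that have not returned to OK."""
    degraded = set()
    for text in messages:
        match = LD_EVENT.fullmatch(text.strip())
        if match is None:
            continue
        key = (int(match["slot"]), int(match["ld"]))
        if match["state"] == "OK":
            degraded.discard(key)
        else:
            degraded.add(key)
    return degraded


log = [
    "Transition to Critical from less severe---PCIe slot: 1---LDDevno:0",
    "Transition to Critical from less severe---PCIe slot: 2---LDDevno:1",
    "Transition to OK---PCIe slot:1---LDDevno:0",
]
print(degraded_logical_drives(log))
```

In this sample, the drive in slot 1 recovers and only the drive in slot 2 remains in the result.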

 

Transition to Non-recoverable from less severe

Event code

0x1730000e

Message text

Transition to Non-recoverable from less severe

Variable fields

N/A

Severity level

Critical

Example

Transition to Non-recoverable from less severe

Impact

The system will be powered off.

Cause

The backplane power supply or the riser power supply is faulty.

Recommended action

1.     ‍Ignore this message if it is triggered by a system power-on or power-off event.

2.     Reconnect power cords and identify whether the server can be powered on correctly.

¡     If the server can be powered on, the message might have been generated because of interference with the detection signals. No action is required.

¡     If the server cannot be powered on, review the SDS logs to locate the fault and replace the faulty component.

3.     If the issue persists, replace the faulty component.

4.     If the issue persists, contact Technical Support.

 

ChipSet

Transition to Critical from less severe

Event code

0x1920000e

Message text

Transition to Critical from less severe

Variable fields

N/A

Severity level

Minor

Example

Transition to Critical from less severe

Impact

System performance degradation might occur.

Cause

The PCH status was abnormal.

Recommended action

1.     ‍If this message is generated during the host restart process, ignore this message.

2.     If this message is repeatedly generated during the operation, replace the system board.

3.     If the issue persists, contact Technical Support.

 

Cable/Interconnect

Configuration Error - Incorrect cable connected / Incorrect interconnection

Event code

0x1b1000de

Message text

Configuration Error - Incorrect cable connected / Incorrect interconnection

Variable fields

N/A

Severity level

Minor

Example

Configuration Error - Incorrect cable connected / Incorrect interconnection

Impact

The network is abnormal, which might cause loss of network connectivity in the system.

Cause

Incorrect cable configuration.

Recommended action

1.     ‍Verify that the cables are connected to the correct interfaces.

2.     Verify that the power cables are connected properly.

 

Configuration Error - Incorrect cable connected / Incorrect interconnection

Event code

0x1b1200de

Message text

Configuration Error - Incorrect cable connected / Incorrect interconnection---Slot$1 support S5 power supply, $2

Variable fields

$1: PCIe slot with S5 power supply enabled.

$2: Description of the reason.

Severity level

Minor

Example

Configuration Error - Incorrect cable connected / Incorrect interconnection---Slot5 support S5 power supply, but Card not support this feature

Impact

No negative impact on the system.

Cause

The slot with S5 power supply enabled either has no smart network adapter installed or has a smart network adapter that is not properly inserted.

Recommended action

1.     Disable the S5 function for the current slot.

2.     Insert the smart network adapter into the slot again.

 

Configuration Error - Incorrect cable connected / Incorrect interconnection

Event code

0x1b1800de

Message text

Configuration Error - Incorrect cable connected / Incorrect interconnection---$1

Variable fields

$1: Incorrect cable configuration.

Severity level

Minor

Example

Configuration Error - Incorrect cable connected / Incorrect interconnection---Incorrect SATA cable connection to the backplane

Impact

A communication exception might occur on the backplane.

Cause

Incorrect cable configuration.

Recommended action

1.     ‍Verify that the cables are connected to the correct interfaces.

2.     Verify that the power cables are connected properly.

 

Configuration Error - Incorrect cable connected / Incorrect interconnection

Event code

0x1b1400de

Message text

Configuration Error - Incorrect cable connected / Incorrect interconnection ($1)

Variable fields

$1: Cable connection location.

Severity level

Minor

Example

Configuration Error - Incorrect cable connected / Incorrect interconnection (FrontBackplane1)

Impact

A communication exception might occur on the backplane.

Cause

Incorrect cable configuration.

Recommended action

1.     ‍Verify that the cables are connected to the correct interfaces.

2.     Verify that the power cables are connected properly.

 

System Boot / Restart Initiated

Initiated by power up

Event code

0x1d0000de

Message text

Initiated by power up

Variable fields

N/A

Severity level

Info

Example

Initiated by power up

Impact

No negative impact.

Cause

This event is triggered by a system power-on.

Recommended action

No action is required.

 

Initiated by hard reset

Event code

0x1d1000de

Message text

Initiated by hard reset

Variable fields

N/A

Severity level

Info

Example

Initiated by hard reset

Impact

No negative impact.

Cause

This event is triggered by a system restart.

Recommended action

No action is required.

 

Initiated by warm reset

Event code

0x1d2000de

Message text

Initiated by warm reset

Variable fields

N/A

Severity level

Info

Example

Initiated by warm reset

Impact

No negative impact.

Cause

This event is triggered by a system warm restart.

Recommended action

No action is required.

 

System restart

Event code

0x1d7000de

Message text

System Restart---$1:$2

Variable fields

$1: Reboot cause.

$2: Power mode. Options include power off, power reset, and power cycle. This field might be empty.

Severity level

Info

Example

System Restart---due to power button pressed:power off

Impact

No negative impact.

Cause

The system restarts.

Recommended action

No action is required.
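Because the power mode field might be empty, a parser for this message has to treat the second field as optional. The following is a minimal sketch, assuming the reboot cause never contains a colon and that an empty power mode leaves nothing after the colon. The function name is illustrative only.

```python
PREFIX = "System Restart---"


def parse_system_restart(message):
    """Return (cause, power_mode); power_mode is None when the field is empty."""
    text = message.strip()
    if not text.startswith(PREFIX):
        return None
    # partition() also handles a message that has no colon at all.
    cause, _, mode = text[len(PREFIX):].partition(":")
    return cause.strip(), (mode.strip() or None)


print(parse_system_restart("System Restart---due to power button pressed:power off"))
print(parse_system_restart("System Restart---due to power button pressed:"))
```

The second call shows the empty power mode case, for which the sketch returns None as the power mode.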

 

Boot Error

No bootable media

Event code

0x1e0000de

Message text

No bootable media

Variable fields

N/A

Severity level

Info

Example

No bootable media

Impact

No negative impact.

Cause

No bootable media was detected. This status typically has no negative impact.

Recommended action

1.     ‍Specify an available boot device.

2.     If the issue persists, contact Technical Support.

 

OS_BOOT

C: boot completed

Event code

0x1f1000de

Message text

C: boot completed

Variable fields

N/A

Severity level

Info

Example

C: boot completed

Impact

No negative impact.

Cause

The operating system booted from a hard drive. This event occurs on most Windows operating systems.

Recommended action

No action is required.

 

Boot completed - boot device not specified

Event code

0x1f6000de

Message text

Boot completed - boot device not specified

Variable fields

N/A

Severity level

Info

Example

Boot completed - boot device not specified

Impact

No negative impact.

Cause

This message is generated when the server exits the BIOS boot phase.

Recommended action

No action is required.

 

OS Stop / Shutdown

Run-time Critical Stop

Event code

0x201000de

Message text

Run-time Critical Stop

Variable fields

N/A

Severity level

Critical

Example

Run-time Critical Stop

Impact

The system crashes.

Cause

A critical error occurred during operating system operation.

Recommended action

1.     ‍Verify that the installed system, drivers, firmware, and software do not have bugs and are compatible with the server.

2.     Update the versions if bugs or compatibility issues exist.

3.     Verify that the installed hardware options are compatible with the server. For more information about component and server compatibility, access the component compatibility query tool at the official website.

4.     If the issue persists, contact Technical Support.

 

OS Graceful Stop

Event code

0x202000de

Message text

OS Graceful Stop

Variable fields

N/A

Severity level

Info

Example

OS Graceful Stop

Impact

The system shut down.

Cause

The Windows OS was forcibly stopped.

Recommended action

No action is required.

 

OS Graceful Shutdown

Event code

0x203000de

Message text

OS Graceful Shutdown

Variable fields

N/A

Severity level

Info

Example

OS Graceful Shutdown

Impact

The system shut down.

Cause

The Windows OS was shut down gracefully.

Recommended action

No action is required.

 

Slot / Connector

Device Disabled: PCIe module information not obtained

Event code

0x21000012

Message text

Device Disabled: PCIe module information not obtained---Slot $1

Variable fields

$1: PCIe slot number.

Severity level

Major

Example

Device Disabled: PCIe module information not obtained---Slot 1

Impact

The PCIe module cannot be identified, which decreases system performance.

Cause

The PCIe module is faulty.

Recommended action

1.     ‍Verify that the server starts up with the minimum configuration. For more information, see the troubleshooting guide for the server.

2.     Verify that the PCIe port is not disabled in the BIOS.

3.     Verify that the PCIe module is compatible with the server.

4.     Verify that the PCIe module is installed correctly.

5.     Install the PCIe module into another slot to verify that the PCIe module is not faulty.

6.     If the issue persists, contact Technical Support.

 

Fault Status asserted

Event code

0x210000de

Message text

Fault Status asserted:---fan error in slot $1

Variable fields

$1: Slot number.

Severity level

Major

Example

Fault Status asserted:---fan error in slot 15

Impact

The system might crash due to a PCIe module error.

Cause

This message is generated when the OCP fan is absent or blocked.

Recommended action

1.     ‍Re-install the OCP fan.

2.     If the issue persists, replace the OCP fan.

 

Transition to Non-Critical from OK

Event code

0x2110000e

Message text

Transition to Non-Critical from OK---slot $1----PCIe Name:$2

Variable fields

$1: PCIe slot number.

$2: PCIe module name.

Severity level

Major

Example

Transition to Non-Critical from OK---slot 15----PCIe Name:NIC-620F-B2-25Gb-2P-1-X

Impact

The system might crash due to a PCIe module error.

Cause

This message is generated when the system fails to obtain information about network adapter connection.

Recommended action

1.     Verify that the network adapter is not faulty.

2.     Verify that the related links are operating correctly, for example, I2C or MCTP.

 

System ACPI Power State

S0 / G0 "working"

Event code

0x220000de

Message text

S0 / G0 "working"

Variable fields

N/A

Severity level

Info

Example

S0 / G0 "working"

Impact

No negative impact.

Cause

S0/G0 indicates that the system is operating correctly. G(0-2) indicates the global states (G-States), and S(0-5) indicates the sleep states (S-States).

G0 operating status: In this state, you can run the applications.

S0 sleep state: Normal operating status.

Recommended action

No action is required.

 

S0 / G0 "working"

Event code

0x220800de

Message text

S0 / G0 "working"---$1

Variable fields

$1: Reason for a power-on operation, including:

·     due to virtual power button pressed

·     due to physical power button pressed

·     due to ipmi cmd

·     due to redfish cmd

·     due to AC lost

·     due to kvm button pressed

·     due to snmp cmd

Severity level

Info

Example

S0 / G0 "working"--- due to virtual power button pressed

Impact

No negative impact.

Cause

The system is powered on.

Recommended action

No action is required.

 

S5 / G2 "soft-off"

Event code

0x225000de

Message text

S5 / G2 "soft-off"

Variable fields

N/A

Severity level

Info

Example

S5 / G2 "soft-off"

Impact

No negative impact.

Cause

S5 / G2 indicates the soft-off state. You cannot run applications or the operating system in this state. A soft-off shuts down the entire system except the main power supply unit, so almost no power is consumed. The system takes longer to start up again after a soft-off.

Recommended action

No action is required.

 

S5 / G2 "soft-off"

Event code

0x225800de

Message text

S5 / G2 "soft-off"---$1

Variable fields

$1: Reason for a power-off operation, including:

·     due to virtual power button pressed

·     due to physical power button pressed

·     due to ipmi cmd

·     due to redfish cmd

·     due to AC lost

·     due to kvm button pressed

·     due to snmp cmd

·     due to pef expiration

Severity level

Info

Example

S5 / G2 "soft-off"--- due to virtual power button pressed

Impact

No negative impact.

Cause

S5 / G2 indicates the soft-off state. You cannot run applications or the operating system in this state. A soft-off shuts down the entire system except the main power supply unit, so almost no power is consumed. The system takes longer to start up again after a soft-off.

Recommended action

No action is required.
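The S0 / G0 and S5 / G2 messages above come in two variants: one without a variable field and one with a `---$1` reason. The following minimal sketch shows how a script might classify these messages and validate the reason against the documented lists. The function name `parse_power_event` and the operation labels `power-on` and `power-off` are illustrative assumptions, not HDM identifiers.

```python
# Reason strings copied from the power-on Variable fields list above.
POWER_ON_REASONS = frozenset({
    "due to virtual power button pressed",
    "due to physical power button pressed",
    "due to ipmi cmd",
    "due to redfish cmd",
    "due to AC lost",
    "due to kvm button pressed",
    "due to snmp cmd",
})
# The power-off list documents one additional reason.
POWER_OFF_REASONS = POWER_ON_REASONS | {"due to pef expiration"}

# Hypothetical labels for the two documented states.
_STATES = {
    'S0 / G0 "working"': ("power-on", POWER_ON_REASONS),
    'S5 / G2 "soft-off"': ("power-off", POWER_OFF_REASONS),
}


def parse_power_event(message):
    """Return (operation, reason), or None for unknown states or reasons.

    reason is None for the message variants that carry no "---$1" field.
    """
    state, sep, reason = message.partition("---")
    entry = _STATES.get(state.strip())
    if entry is None:
        return None
    operation, known_reasons = entry
    if not sep:
        return operation, None
    # The documented examples show a space after "---", so strip it.
    reason = reason.strip()
    return (operation, reason) if reason in known_reasons else None
```

Validating against the documented lists makes an unexpected reason string visible to the caller instead of silently accepting it.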

 

S4 / S5 soft-off, particular S4 / S5 state cannot be determined

Event code

0x226000de

Message text

S4 / S5 soft-off, particular S4 / S5 state cannot be determined

Variable fields

N/A

Severity level

Info

Example

S4 / S5 soft-off, particular S4 / S5 state cannot be determined

Impact

No negative impact.

Cause

S4/S5 indicates the soft-off state, but whether the current state is S4 or S5 cannot be determined.

S(0-5) indicate the sleep states (S-States).

S4 state:

·     All components are closed including ARM.

·     Only the platform settings are retained, while other settings are saved in a special location on the drive.

·     After a successful switch to S4, the system will shut down.

·     Because almost all programs and configurations have stopped, the power consumption is less than 3 W.

·     Upon wake-up, the system must enter the BIOS boot sequence again.

·     No system restart is required. The system will continue with the S5 shutdown state.

Recommended action

No action is required.

 

Watchdog2

Watchdog overflow.Action:Timer expired

Event code

0x230000de

Message text

Watchdog overflow.Action:Timer expired - status only (no action and no interrupt)---interrupt type:$1---timer use at expiration:$2

Variable fields

$1: Interrupt type. Options include none, SMI, NMI, Messaging Interrupt, and unspecified.

$2: Watchdog timer use at expiration. Options include reserved, BIOS FRB2, BIOS POST, OS Load, SMS OS, OEM, and unspecified.

Severity level

Info

Example

Watchdog overflow.Action:Timer expired - status only (no action and no interrupt)---interrupt type:none---timer use at expiration:BIOS FRB2

Impact

System startup failure might occur.

Cause

This message is generated when the following conditions are met:

·     The watchdog is enabled in the BIOS.

·     The watchdog timer expires.

·     The timeout action is set to no action.

Recommended action

1.     For a BIOS POST watchdog timeout, review the event logs to identify hardware errors or BIOS startup errors, and troubleshoot the errors as instructed in the logs.

2.     For an OS Load watchdog timeout, verify that no error is present in the system startup environment. If no error is present, proceed to step 5.

3.     For an OS Running watchdog timeout, review the OS logs to identify whether software exceptions occurred and troubleshoot the exceptions as instructed in the logs.

4.     Identify whether data storms have occurred. If yes, troubleshoot network errors.

5.     If the issue persists, contact Technical Support.
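The recommended action above starts at a different step depending on the watchdog phase. The following minimal sketch maps the `timer use at expiration` value ($2) to the first applicable step. It assumes that the `SMS OS` value in the message corresponds to the OS Running watchdog named in step 3. The document does not state this mapping, and the name `first_recommended_step` is hypothetical.

```python
# Step numbers refer to the Recommended action list above.
# Assumption: "SMS OS" in the message is the "OS Running" watchdog of step 3.
FIRST_STEP_BY_TIMER_USE = {
    "BIOS POST": 1,
    "OS Load": 2,
    "SMS OS": 3,
}


def first_recommended_step(timer_use):
    """Return the first applicable step number, or None if none is documented."""
    return FIRST_STEP_BY_TIMER_USE.get(timer_use.strip())
```

For timer use values without a phase-specific step, such as BIOS FRB2 or OEM, the sketch returns None rather than guessing a step.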

 

Watchdog overflow.Action:Hard Reset

Event code

0x231000de

Message text

Watchdog overflow.Action:Hard Reset---interrupt type:$1---timer use at expiration:$2

Variable fields

$1: Interrupt type. Options include none, SMI, NMI, Messaging Interrupt, and unspecified.

$2: Watchdog timer use at expiration. Options include reserved, BIOS FRB2, BIOS POST, OS Load, SMS OS, OEM, and unspecified.

Severity level

Major

Example

Watchdog overflow.Action:Hard Reset---interrupt type:none---timer use at expiration:BIOS FRB2

Impact

System startup failure might occur.

Cause

This message is generated when the following conditions are met:

·     The watchdog is enabled in the BIOS.

·     The watchdog timer expires during the BIOS POST, OS Load, or SMS/OS phase (indicated by the watchdog timer type).

·     The timeout action is set to hard reset.

Recommended action

1.     For a BIOS POST watchdog timeout, review the event logs to identify hardware errors or BIOS startup errors, and troubleshoot the errors as instructed in the logs.

2.     For an OS Load watchdog timeout, verify that no error is present in the system startup environment. If no error is present, proceed to step 5.

3.     For an OS Running watchdog timeout, review the OS logs to identify whether software exceptions occurred and troubleshoot the exceptions as instructed in the logs.

4.     Identify whether data storms have occurred. If yes, troubleshoot network errors.

5.     If the issue persists, contact Technical Support.

 

Watchdog overflow.Action:Power Down

Event code

0x232000de

Message text

Watchdog overflow.Action:Power Down---interrupt type:$1---timer use at expiration:$2

Variable fields

$1: Interrupt type. Options include none, SMI, NMI, Messaging Interrupt, and unspecified.

$2: Watchdog timer use at expiration. Options include reserved, BIOS FRB2, BIOS POST, OS Load, SMS OS, OEM, and unspecified.

Severity level

Major

Example

Watchdog overflow.Action:Power Down---interrupt type:none---timer use at expiration:BIOS FRB2

Impact

System startup failure might occur.

Cause

This message is generated when the following conditions are met:

·     The watchdog is enabled in the BIOS.

·     The watchdog timer expires during the BIOS POST, OS Load, or SMS/OS phase (indicated by the watchdog timer type).

·     The timeout action is set to power down.

The watchdog forcibly powered off the system. Services are interrupted, and unsaved data is lost.

Recommended action

1.     For a BIOS POST watchdog timeout, review the event logs to identify hardware errors or BIOS startup errors, and troubleshoot the errors as instructed in the logs.

2.     For an OS Load watchdog timeout, verify that no error is present in the system startup environment. If no error is present, proceed to step 5.

3.     For an OS Running watchdog timeout, review the OS logs to identify whether software exceptions occurred and troubleshoot the exceptions as instructed in the logs.

4.     Identify whether data storms have occurred. If yes, troubleshoot network errors.

5.     If the issue persists, contact Technical Support.

 

Watchdog overflow.Action:Power Cycle

Event code

0x233000de

Message text

Watchdog overflow.Action:Power Cycle---interrupt type:$1---timer use at expiration:$2

Variable fields

$1: Interrupt type. Options include none, SMI, NMI, Messaging Interrupt, and unspecified.

$2: Watchdog timer use at expiration. Options include reserved, BIOS FRB2, BIOS POST, OS Load, SMS OS, OEM, and unspecified.

Severity level

Major

Example

Watchdog overflow.Action:Power Cycle---interrupt type:none---timer use at expiration:BIOS FRB2

Impact

System startup failure might occur.

Cause

This message is generated when the following conditions are met:

·     The watchdog is enabled in the BIOS.

·     The watchdog timer expires during the BIOS POST, OS Load, or SMS/OS phase (indicated by the watchdog timer type).

·     The timeout action is set to power cycle.

Recommended action

1.     For a BIOS POST watchdog timeout, review the event logs to identify hardware errors or BIOS startup errors, and troubleshoot the errors as instructed in the logs.

2.     For an OS Load watchdog timeout, verify that no error is present in the system startup environment. If no error is present, proceed to step 5.

3.     For an OS Running watchdog timeout, review the OS logs to identify whether software exceptions occurred and troubleshoot the exceptions as instructed in the logs.

4.     Identify whether data storms have occurred. If yes, troubleshoot network errors.

5.     If the issue persists, contact Technical Support.
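The four Watchdog2 messages above share one layout: an action, an interrupt type ($1), and a timer use ($2), each separated by `---`. The following minimal sketch splits such a message into its fields and looks up the severity level listed for each action in the entries above. The names `parse_watchdog` and `SEVERITY_BY_ACTION` are hypothetical and not part of HDM.

```python
import re

# Layout taken from the four documented message texts.
WATCHDOG = re.compile(
    r"^Watchdog overflow\.Action:(?P<action>.+?)"
    r"---interrupt type:(?P<interrupt_type>.+?)"
    r"---timer use at expiration:(?P<timer_use>.+)$"
)

# Severity level per action, as listed in the entries above.
SEVERITY_BY_ACTION = {
    "Timer expired - status only (no action and no interrupt)": "Info",
    "Hard Reset": "Major",
    "Power Down": "Major",
    "Power Cycle": "Major",
}


def parse_watchdog(message):
    """Return a dict of message fields plus severity, or None if no match."""
    match = WATCHDOG.match(message.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Severity is None for an action that is not documented above.
    fields["severity"] = SEVERITY_BY_ACTION.get(fields["action"])
    return fields
```

The non-greedy groups stop at the first `---interrupt type:` and `---timer use at expiration:` markers, so the single hyphen inside the Timer expired action text does not split the field.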

 

Entity Presence

Entity Present---License is about to expire

Event code

0x250000de

Message text

Entity Present---License is about to expire

Variable fields

N/A

Severity level

Minor

Example

Entity Present---License is about to expire

Impact

No negative impact.

Cause

This message is generated when the remaining validity period of the license is less than 10 days.

Recommended action

Purchase a formal license before the temporary license expires.
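The Cause above states a threshold of 10 days of remaining validity. The following minimal sketch shows that check as it might appear in an external monitoring script. How HDM itself counts the remaining days is not documented here, so the whole-day arithmetic and the name `license_about_to_expire` are assumptions.

```python
from datetime import date

# Threshold stated in the Cause text above.
EXPIRY_WARNING_DAYS = 10


def license_about_to_expire(expiry, today):
    """Return True if the license is still valid but has fewer than 10 days left.

    Assumption: remaining validity is counted in whole calendar days.
    """
    remaining_days = (expiry - today).days
    return 0 <= remaining_days < EXPIRY_WARNING_DAYS
```

A license that has already expired returns False, because that condition is reported by the separate Entity Disabled message.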

 

Entity Disabled---License has expired

Event code

0x252000de

Message text

Entity Disabled---$1

Variable fields

$1: License state:

·     License has expired.

·     License is unavailable.

Severity level

Minor

Example

Entity Disabled---License has expired

Impact

No negative impact.

Cause

The license has expired or is unavailable.

Recommended action

1.     If the temporary license has expired, purchase and activate the formal license.

2.     If the license is not available, re-install and activate the existing license or contact Technical Support.

 

Management Subsystem Health

Controller access degraded or unavailable

Event code

0x281000de

Message text

Controller access degraded or unavailable---$1

Variable fields

$1: Possible options include:

·     Failed to access the SD card

·     SD card partitions are missing

Severity level

Major

Example

Controller access degraded or unavailable---Failed to access the SD card.

Impact

No negative impact.

Cause

SD card reading failed or the SD card was missing.

Recommended action

1.     Restart HDM.

2.     Reset the SD module for BMC.

3.     If the issue persists, contact Technical Support.

 

Management controller off-line

Event code

0x282000de

Message text

Management controller off-line---$1

Variable fields

$1: BMC reboot cause.

Severity level

Info

Example

Management controller off-line---BMC reset

Impact

No negative impact.

Cause

BMC was restarted.

Recommended action

No action is required.

 

Battery

Battery low (predictive failure)

Event code

0x290000de

Message text

Battery low (predictive failure)---PCIe slot:$1

Variable fields

$1: PCIe slot number of the storage controller.

Severity level

Minor

Example

Battery low (predictive failure)---PCIe slot:1

Impact

The reliability of the RAID controller will degrade, which might cause system performance degradation.

Cause

The supercapacitor of the storage controller has a low charge, overtemperature, overvoltage, or overcurrent condition.

Recommended action

1.     Power on the server to charge the supercapacitor. Log in to HDM, verify that the supercapacitor of the RAID controller is in normal state, and check whether the alarm is cleared.

2.     Verify that the power fail safeguard module is installed correctly.

3.     Replace the corresponding components, including the battery, supercapacitor, or flash card (if any), and then restart the server.

4.     If the issue persists, contact Technical Support.

 

Battery failed

Event code

0x291000de

Message text

Battery failed---PCIe slot:$1

Variable fields

$1: PCIe slot number of the storage controller.

Severity level

Minor

Example

Battery failed---PCIe slot:1

Impact

The reliability of the RAID controller will degrade, which might cause system performance degradation.

Cause

An internal error occurred on the power fail safeguard module of the storage controller.

Possible reasons include:

·     The supercapacitor is exhausted or has expired.

·     The power fail safeguard module failed to be initialized.

·     The power fail safeguard module subsystem failed.

·     The supercapacitor failed to be charged.

·     The battery or supercapacitor failed.

Recommended action

1.     Log in to HDM, and verify that the supercapacitor of the RAID controller is in normal state.

2.     Verify that the power fail safeguard module is installed correctly.

3.     Replace the corresponding components, including the battery, supercapacitor, or flash card (if any), and then restart the server.

4.     If the issue persists, contact Technical Support.

 

Battery presence detected

Event code

0x292000de

Message text

Battery presence detected---PCIe slot:$1

Variable fields

$1: PCIe slot number of the storage controller.

Severity level

Info

Example

Battery presence detected---PCIe slot:1

Impact

The reliability of the RAID controller will degrade, which might cause system performance degradation.

Cause

The battery or supercapacitor of the RAID controller is absent.

Recommended action

1.     Log in to HDM, and verify that the supercapacitor of the RAID controller is in normal state.

2.     Verify that the supercapacitor is installed correctly and the supercapacitor cable is connected correctly.

3.     Replace the corresponding components, including the battery, supercapacitor, or flash card (if any), and then restart the server.

4.     If the issue persists, contact Technical Support.
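The three Battery messages above carry the same `---PCIe slot:$1` field. The following minimal sketch extracts the event name and slot number and looks up the severity level listed in each entry. The names `parse_battery` and `BATTERY_SEVERITY` are hypothetical, and the sketch assumes a decimal slot number.

```python
import re

# Layout shared by the three documented Battery message texts.
BATTERY = re.compile(r"^(?P<event>Battery .+?)---PCIe slot:(?P<slot>\d+)$")

# Severity level per event, as listed in the entries above.
BATTERY_SEVERITY = {
    "Battery low (predictive failure)": "Minor",
    "Battery failed": "Minor",
    "Battery presence detected": "Info",
}


def parse_battery(message):
    """Return (event, slot, severity), or None for text that is not documented."""
    match = BATTERY.match(message.strip())
    if match is None or match.group("event") not in BATTERY_SEVERITY:
        return None
    event = match.group("event")
    return event, int(match.group("slot")), BATTERY_SEVERITY[event]
```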

 

Version Change

Hardware incompatibility detected with associated Entity---Memory is not certified

Event code

0x2b2000de

Message text

Hardware incompatibility detected with associated Entity---Memory is not certified---Location:CPU:$1 CH:$2 DIMM:$3

Variable fields

$1: CPU number.

$2: Channel number.

$3: DIMM number.

Severity level

Minor

Example

Hardware incompatibility detected with associated Entity---Memory is not certified---Location:CPU:1 CH:1 DIMM:0

Impact

No negative impact.

Cause

This message is generated when the DIMM is not certified.

Recommended action

1.     Install DIMMs certified by the server vendor.

2.     If the issue persists, contact Technical Support.
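The message text above ends with a `Location:CPU:$1 CH:$2 DIMM:$3` field that identifies the DIMM. The following minimal sketch pulls the three numbers out of the message so that a script can report the physical position. The names `DimmLocation` and `parse_dimm_location` are hypothetical, and the sketch assumes that all three fields are decimal integers.

```python
import re
from typing import NamedTuple, Optional


class DimmLocation(NamedTuple):
    cpu: int
    channel: int
    dimm: int


# Field layout taken from the documented message text.
LOCATION = re.compile(r"Location:CPU:(\d+) CH:(\d+) DIMM:(\d+)\s*$")


def parse_dimm_location(message: str) -> Optional[DimmLocation]:
    """Return the CPU, channel, and DIMM numbers, or None if no field is found."""
    match = LOCATION.search(message)
    if match is None:
        return None
    cpu, channel, dimm = (int(group) for group in match.groups())
    return DimmLocation(cpu, channel, dimm)
```

Searching for the trailing field rather than matching the whole message keeps the sketch usable if other messages use the same location layout, which is an assumption and not confirmed by this section.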

 

 
