H3C HDM2 System Log Messages Reference (6W100)
Copyright © 2023 New H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of New H3C Technologies Co., Ltd.
Except for the trademarks of New H3C Technologies Co., Ltd., any trademarks that may be mentioned in this document are the property of their respective owners.
The information in this document is subject to change without notice.
Contents
Dropped below the lower minor threshold
Dropped below the lower major threshold
Dropped below the lower critical threshold
Exceeded the upper minor threshold
Exceeded the upper major threshold
Exceeded the upper critical threshold
Dropped below the lower minor threshold
Dropped below the lower major threshold
Dropped below the lower major threshold
Dropped below the lower critical threshold
Exceeded the upper minor threshold
Exceeded the upper major threshold
Exceeded the upper major threshold
Exceeded the upper critical threshold
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Non-recoverable from less severe
Transition to Critical from less severe
Exceeded the upper minor threshold
Exceeded the upper major threshold
Exceeded the upper major threshold
Exceeded the upper critical threshold
Non-redundant:Sufficient Resources from Redundant
Non-redundant:Insufficient Resources
FRB3/Processor Startup/Initialization failure
Processor Automatically Throttled
Processor Automatically Throttled
Processor Automatically Throttled
Triggered an uncorrectable error
Correctable Machine Check Error
Correctable Machine Check Error
Correctable Machine Check Error
Power Supply Predictive Failure
Power Supply input lost (AC/DC)
Power Supply input lost or out-of-range
Power Supply input out-of-range - but present
Configuration error---Vendor mismatch
Configuration error---Power Supply rating mismatch
Configuration error---Power supply rating mismatch
Power Supply Inactive/standby state
Power limit is exceeded over correction time limit
Exceeded the upper minor threshold
Correctable ECC or other correctable memory error
Correctable ECC or other correctable memory error
Correctable ECC or other correctable memory error
CPU triggered a correctable error
Uncorrectable ECC or other uncorrectable memory error
Uncorrectable ECC or other uncorrectable memory error
Triggered an uncorrectable error
Uncorrectable ECC or other uncorrectable memory error
Parity---An uncorrectable error occurs during the memory test phase
Parity---The memory interleaving configuration cannot meet the requirements of the server
Parity---The memory interleaving configuration cannot meet the requirements of the server
Parity---The memory interleaving configuration cannot meet the requirements of the server
Parity---CMD eye width is too small
Parity---CmdPiGroup: No Eye width
Parity---The command is not in the FNv table
Parity---Memory read DqDqs training failed
Parity---Memory Receive Enable Training Error
Parity---Memory write DqDqs training failed
Parity---An error occurs during memory test, and the rank is disabled
Parity---LRDIMM RCVEN training failed
Parity---Read delay training failed
Parity---Write delay training failed
Parity---Mapped out because failed critical mask test at cold boot
Parity---The DCPMM memory modules of the unexpected model are installed
Parity---Failed to set the VDD voltage of the DIMM
Parity---Timing error occurred during signal line adjustment for memory write leveling training
Parity---CS is not consistent with clock in timing, and the channel is isolated
Parity---CS is not consistent with clock in timing, and the channel is isolated
Parity---LRDIMM external coarse training failed
Parity---LRDIMM external fine training failed
Parity---LRDIMM internal coarse training failed
Parity---LRDIMM internal fine training failed
Memory Device Disabled---The Rank is disabled
Memory Device Disabled---The DIMM is disabled
Correctable ECC or other memory error limit reached
Correctable ECC or other memory error limit reached
Parity---The DCPMM memory modules of the unexpected model are installed
Memory patrol scrub CE occurred
Memory patrol scrub UCE occurred and degraded to CE
Configuration error---RDIMMs are installed on the server that supports only UDIMMs
Configuration error---UDIMMs are installed on the server that supports only RDIMMs
Configuration error---SODIMMs are installed on the server that supports only RDIMMs
Configuration error---The number of ranks per channel can be only 1, 2, or 4
Configuration error---The number of ranks in the channel exceeds 8
Configuration error---The CPU is not compatible with 3DS DIMMs
Configuration error---NVDIMMs with stepping lower than 0x10 are not supported
Configuration error---The CPU is not compatible with the DIMMs
Configuration error---The frequency of the DIMM is not supported on the server
Configuration error---24Gb or higher Capacity DRAMs not supported with this CPU
Configuration error---The CPU is not compatible with LRDIMMs
Configuration error---DCPMM + HBM config is not supported. Disable DCPMM populated channel
Configuration error---Failed to enable the full mirror mode
Configuration error---Failed to enable patrol scrubbing
Configuration error---The DDR-T memory module is installed in the white slot
Configuration error---ODT configuration error. The channel is isolated
Configuration error---REQ is not consistent with clock in timing
Configuration error---Failed to enable ADDDC
Configuration error---NVMCTRL_MEDIA_NOTREADY
The disk triggered a media error
The disk triggered an uncorrectable error
System Firmware Error (POST Error)---Run sense AMP HW FSM failed
System Firmware Error (POST Error)---No Dimm on socket0
System Firmware Error (POST Error)---No memory found
System Firmware Error (POST Error)---No DIMM is available for memory-mapping operation
System Firmware Error (POST Error)---DIMM population error
System Firmware Error (POST Error)---CPU stepping mismatch detected
System Firmware Error (POST Error)---KTI Topology Change Logged
System Firmware Error (POST Error)---CPU matching failure---CPU stepping is detected
System Firmware Error (POST Error)---CPU matching failure---CPU frequency is detected
System Firmware Error (POST Error)---CPU matching failure---CPU Microcode is detected
System Firmware Error (POST Error)---CPU matching failure---UPI Topology is detected
System Firmware Error (POST Error)---Unrecoverable video controller failure
System Firmware Progress---Video initialization---Detection unsuccessful
System Firmware Progress---Secondary processor(s) initialization---Detection unsuccessful
System Reconfigured---BIOS load default. CMOS cleared
Limit Exceeded---Cpu usage exceeds the threshold
Limit Exceeded---Mem usage exceeds the threshold
Limit Exceeded---Network usage exceeds the threshold
Limit Exceeded---Hard disk usage exceeds the threshold
Timestamp clock synch---BMC Time SYNC succeed
$1 triggered an uncorrectable error
$1 triggered a correctable error
Power Button pressed---Physical button---Button pressed
Transition to Non-Critical from OK
Transition to Critical from less severe
Transition to Non-recoverable from less severe
Transition to Non-Critical from OK---System is operating in KTI Link Slow Speed Mode
Transition to Non-Critical from OK---Requested Link Speed is not supported. Defaulting to 18GT
System board triggered an uncorrectable error
System board triggered a correctable error
Transition to Critical from less severe
Transition to Critical from less severe
Transition to Non-recoverable from less severe
Transition to Critical from less severe
Configuration Error - Incorrect cable connected / Incorrect interconnection
Configuration Error - Incorrect cable connected / Incorrect interconnection
Configuration Error - Incorrect cable connected / Incorrect interconnection
System Boot / Restart Initiated
Boot completed - boot device not specified
Device disabled: PCIe module information not obtained
Transition to Non-Critical from OK
S4 / S5 soft-off, particular S4 / S5 state cannot be determined
Watchdog overflow---Action: Timer expired
Watchdog overflow---Action: Hard Reset
Watchdog overflow---Action: Power Down
Watchdog overflow---Action: Power Cycle
Entity Present---License is about to expire
Entity Disabled---License has expired
Controller access degraded or unavailable
Controller access degraded or unavailable
Battery low (predictive failure)
Hardware incompatibility detected with associated Entity---Memory is not certified
Temperature
Dropped below the lower minor threshold
Event code |
0x01000002 |
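Message text |
Dropped below the lower minor threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the temperature sensor. $2: Threshold (in degrees Celsius) for triggering a minor low-temperature notification. |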
Severity level |
Minor |
Example |
Dropped below the lower minor threshold---Current reading:8---Threshold reading:10 |
Impact |
Low temperatures might degrade the performance of device components and make them unstable. If the temperature does not rise and the alarm persists, the temperature might drop further and trigger a major alarm. Identify the cause of the low temperature as early as possible to prevent the problem from escalating. |
Cause |
The temperature is too low. |
Recommended action |
1. Adjust the temperature of the equipment room. 2. If the problem persists, contact technical support. |
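Every threshold message in this section follows the template in its Message text row: a description, then `Current reading:$1`, then `Threshold reading:$2`, separated by hyphens. The following is a minimal Python sketch for extracting those fields from a log line. It assumes the readings are plain integers or decimals and accepts a run of two or more hyphens as the separator. The function and type names are hypothetical and are not part of HDM.

```python
import re
from typing import NamedTuple, Optional


class ThresholdEvent(NamedTuple):
    description: str  # for example "Dropped below the lower minor threshold"
    current: float    # $1: current sensor reading
    threshold: float  # $2: threshold that was crossed


# Assumed layout: <description>---Current reading:$1---Threshold reading:$2
_PATTERN = re.compile(
    r"^(?P<desc>.+?)-{2,}Current reading:\s*(?P<cur>-?\d+(?:\.\d+)?)"
    r"-{2,}Threshold reading:\s*(?P<thr>-?\d+(?:\.\d+)?)\s*$"
)


def parse_threshold_message(text: str) -> Optional[ThresholdEvent]:
    """Return the parsed fields, or None if the text does not match the template."""
    m = _PATTERN.match(text.strip())
    if m is None:
        return None
    return ThresholdEvent(m["desc"], float(m["cur"]), float(m["thr"]))
```

For a line that does not follow the template, the function returns `None` instead of raising an exception, so you can run it over a mixed event log.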
Dropped below the lower major threshold
Event code |
0x01200002 |
Message text |
Dropped below the lower major threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the temperature sensor. $2: Threshold (in degrees Celsius) for triggering a major low-temperature notification. |
Severity level |
Major |
Example |
Dropped below the lower major threshold---Current reading:4---Threshold reading:5 |
Impact |
Low temperatures might degrade the performance of device components and make them unstable. If the temperature does not rise and the alarm persists, the temperature might drop further and trigger a critical alarm. Identify the cause of the low temperature as early as possible to prevent the problem from escalating. |
Cause |
The temperature is too low. |
Recommended action |
1. Adjust the temperature of the equipment room. 2. If the problem persists, contact technical support. |
Dropped below the lower critical threshold
Event code |
0x01400002 |
Message text |
Dropped below the lower critical threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the temperature sensor. $2: Threshold (in degrees Celsius) for triggering a critical low-temperature notification. |
Severity level |
Critical |
Example |
Dropped below the lower critical threshold---Current reading:0---Threshold reading:1 |
Impact |
Operating the device at extremely low temperatures can degrade performance, shorten device lifespan, disrupt services, and cause system downtime. |
Cause |
The temperature is too low. |
Recommended action |
1. Adjust the temperature of the equipment room. 2. If the problem persists, contact technical support. |
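The three lower-threshold entries above escalate from Minor to Major to Critical as the reading keeps falling. The sketch below is a hypothetical helper, not HDM logic. It reports the most severe lower threshold that a temperature reading has crossed. It assumes that "dropped below" means strictly less than the threshold. The thresholds 10, 5, and 1 in the usage note come from the Example rows and are for illustration only, because real thresholds are sensor-specific.

```python
from typing import Optional


def lower_temperature_severity(reading: float, minor: float, major: float,
                               critical: float) -> Optional[str]:
    """Return the most severe lower threshold the reading has dropped below.

    Expects minor > major > critical (lower thresholds get colder as severity
    rises). Returns None if no lower threshold is crossed.
    """
    if not (minor > major > critical):
        raise ValueError("expected minor > major > critical")
    # Check the most severe threshold first so the highest severity wins.
    if reading < critical:
        return "Critical"
    if reading < major:
        return "Major"
    if reading < minor:
        return "Minor"
    return None
```

With thresholds 10, 5, and 1, `lower_temperature_severity(4, 10, 5, 1)` returns `"Major"`, which matches the Example row for event 0x01200002.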
Exceeded the upper minor threshold
Event code |
0x01700002 |
Message text |
Exceeded the upper minor threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the temperature sensor. $2: Threshold (in degrees Celsius) for triggering a minor high-temperature notification. |
Severity level |
Minor |
Example |
Exceeded the upper minor threshold---Current reading:85---Threshold reading:80 |
Impact |
High temperatures might degrade the performance of device components and make them unstable. If the temperature does not drop and the alarm persists, the temperature might rise further and trigger a major alarm. Identify the cause of the high temperature as early as possible to prevent the problem from escalating. |
Cause |
High ambient temperature, a blocked air inlet or outlet, or low fan speed. |
Recommended action |
1. Adjust the temperature of the equipment room. 2. Verify that the air inlet and outlet are not blocked. 3. Log in to HDM and verify that the fans are operating correctly. Replace any faulty fans. 4. Log in to HDM, access the fan management page, and verify that the fan speed is appropriate. 5. If the problem persists, contact technical support. |
Exceeded the upper major threshold
Event code |
0x01900002 |
Message text |
Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the temperature sensor. $2: Threshold (in degrees Celsius) for triggering a major high-temperature notification. |
Severity level |
Major |
Example |
Exceeded the upper major threshold---Current reading:90---Threshold reading:88 |
Impact |
High temperatures might degrade the performance of device components and make them unstable. If the temperature does not drop and the alarm persists, the temperature might rise further and trigger a critical alarm. Identify the cause of the high temperature as early as possible to prevent the problem from escalating. |
Cause |
High ambient temperature, a blocked air inlet or outlet, or low fan speed. |
Recommended action |
1. Adjust the temperature of the equipment room. 2. Verify that the air inlet and outlet are not blocked. 3. Log in to HDM and verify that the fans are operating correctly. Replace any faulty fans. 4. Log in to HDM, access the fan management page, and verify that the fan speed is appropriate. 5. If the problem persists, contact technical support. |
Exceeded the upper critical threshold
Event code |
0x01b00002 |
Message text |
Exceeded the upper critical threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the temperature sensor. $2: Threshold (in degrees Celsius) for triggering a critical high-temperature notification. |
Severity level |
Critical |
Example |
Exceeded the upper critical threshold---Current reading:95---Threshold reading:90 |
Impact |
Operating the device at high temperatures can degrade performance, shorten device lifespan, increase energy consumption, disrupt services, and cause system crashes. |
Cause |
High ambient temperature, a blocked air inlet or outlet, or low fan speed. |
Recommended action |
1. Adjust the temperature of the equipment room. 2. Verify that the air inlet and outlet are not blocked. 3. Log in to HDM and verify that the fans are operating correctly. Replace any faulty fans. 4. Log in to HDM, access the fan management page, and verify that the fan speed is appropriate. 5. If the problem persists, contact technical support. |
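The upper thresholds work the same way, but in the opposite direction. This sketch maps a temperature reading to the highest upper threshold it exceeds and to the matching event code listed in this section (0x01700002, 0x01900002, and 0x01b00002). It assumes that "exceeded" means strictly greater than the threshold. The function name and threshold values are illustrative only.

```python
from typing import Optional, Tuple

# Upper-threshold temperature event codes, as listed in this section.
UPPER_TEMP_EVENTS = {
    "Minor": 0x01700002,
    "Major": 0x01900002,
    "Critical": 0x01B00002,
}


def upper_temperature_event(reading: float, minor: float, major: float,
                            critical: float) -> Optional[Tuple[str, int]]:
    """Return (severity, event code) for the highest upper threshold exceeded."""
    if not (minor < major < critical):
        raise ValueError("expected minor < major < critical")
    # Walk from the most severe threshold down so the highest severity wins.
    for severity, limit in (("Critical", critical), ("Major", major), ("Minor", minor)):
        if reading > limit:
            return severity, UPPER_TEMP_EVENTS[severity]
    return None
```

With thresholds 80, 88, and 90, a reading of 95 maps to `("Critical", 0x01B00002)`.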
Voltage
Dropped below the lower minor threshold
Event code |
0x02000002 |
Message text |
Dropped below the lower minor threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the voltage sensor. $2: Threshold for triggering a minor low-voltage notification. |
Severity level |
Minor |
Example |
Dropped below the lower minor threshold---Current reading:8---Threshold reading:10 |
Impact |
Performance degradation and unstable operation might occur on the device components if the voltage is too low. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Verify whether the log was generated during device power-on or power-off. If it was, no action is required. 2. If the device was running correctly when the log was generated, replace the system board. 3. If the problem persists, contact technical support. |
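Step 1 of the recommended action is to rule out voltage alarms raised while the device was powering on or off. The following is a minimal sketch of that check. It assumes you have already collected the power-on and power-off timestamps from the event log. The 60-second window is a hypothetical tolerance, not a value defined by HDM, and the function name is illustrative.

```python
from datetime import datetime, timedelta
from typing import Iterable

# Hypothetical tolerance: how close to a power transition an alarm must be
# to count as transitional. HDM does not define this value.
TRANSITION_WINDOW = timedelta(seconds=60)


def during_power_transition(alarm_time: datetime,
                            power_events: Iterable[datetime],
                            window: timedelta = TRANSITION_WINDOW) -> bool:
    """Return True if the alarm lies within `window` of any power-on/off time."""
    return any(abs(alarm_time - t) <= window for t in power_events)
```

If this returns `True`, step 1 applies and no action is required. Otherwise, continue with step 2.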
Dropped below the lower major threshold
Event code |
0x02200002 |
Message text |
Dropped below the lower major threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the voltage sensor. $2: Threshold for triggering a major low-voltage notification. |
Severity level |
Major |
Example |
Dropped below the lower major threshold---Current reading:4---Threshold reading:5 |
Impact |
Performance degradation and unstable operation might occur on the device components if the voltage is too low. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Verify whether the log was generated during device power-on or power-off. If it was, no action is required. 2. If the device was running correctly when the log was generated, replace the system board. 3. If the problem persists, contact technical support. |
Dropped below the lower major threshold
Event code |
0x02220002 |
Message text |
Dropped below the lower major threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the voltage sensor. $2: Threshold for triggering a major low-voltage notification. |
Severity level |
Major |
Example |
Dropped below the lower major threshold---Current reading:10---Threshold reading:2 |
Impact |
Memory and system performance degradation might occur. |
Cause |
The PMIC voltage reading of the DIMM is below the major low-voltage alarm threshold. |
Recommended action |
1. Verify whether the log was generated during device power-on or power-off. If it was, no action is required. 2. If the device was running correctly when the log was generated, replace the DIMM. 3. If the problem persists, contact technical support. |
Dropped below the lower critical threshold
Event code |
0x02400002 |
Message text |
Dropped below the lower critical threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the voltage sensor. $2: Threshold for triggering a critical low-voltage notification. |
Severity level |
Critical |
Example |
Dropped below the lower critical threshold---Current reading:0---Threshold reading:1 |
Impact |
An extremely low voltage can disrupt the system power supply or cause a board to power off, which might crash the system. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Verify whether the log was generated during device power-on or power-off. If it was, no action is required. 2. If the device was running correctly when the log was generated, replace the system board. 3. If the problem persists, contact technical support. |
Exceeded the upper minor threshold
Event code |
0x02700002 |
Message text |
Exceeded the upper minor threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the voltage sensor. $2: Threshold for triggering a minor high-voltage notification. |
Severity level |
Minor |
Example |
Exceeded the upper minor threshold---Current reading:85---Threshold reading:80 |
Impact |
Performance degradation and unstable operation might occur on the device components if the voltage is too high. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Verify whether the log was generated during device power-on or power-off. If it was, no action is required. 2. If the device was running correctly when the log was generated, replace the system board. 3. If the problem persists, contact technical support. |
Exceeded the upper major threshold
Event code |
0x02900002 |
Message text |
Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the voltage sensor. $2: Threshold for triggering a major high-voltage notification. |
Severity level |
Major |
Example |
Exceeded the upper major threshold---Current reading:90---Threshold reading:88 |
Impact |
Performance degradation and unstable operation might occur on the device components if the voltage is too high. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Verify whether the log was generated during device power-on or power-off. If it was, no action is required. 2. If the device was running correctly when the log was generated, replace the system board. 3. If the problem persists, contact technical support. |
Exceeded the upper major threshold
Event code |
0x02920002 |
Message text |
Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the voltage sensor. $2: Threshold for triggering a major high-voltage notification. |
Severity level |
Major |
Example |
Exceeded the upper major threshold---Current reading:10---Threshold reading:1 |
Impact |
Memory and system performance degradation might occur. |
Cause |
The PMIC voltage reading of the DIMM is above the major high-voltage alarm threshold. |
Recommended action |
1. Verify whether the log was generated during device power-on or power-off. If it was, no action is required. 2. If the device was running correctly when the log was generated, replace the DIMM. 3. If the problem persists, contact technical support. |
Exceeded the upper critical threshold
Event code |
0x02b00002 |
Message text |
Exceeded the upper critical threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the voltage sensor. $2: Threshold for triggering a critical high-voltage notification. |
Severity level |
Critical |
Example |
Exceeded the upper critical threshold---Current reading:95---Threshold reading:90 |
Impact |
An extremely high voltage can disrupt the system power supply or cause a board to power off, which might crash the system. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Verify whether the log was generated during device power-on or power-off. If it was, no action is required. 2. If the device was running correctly when the log was generated, replace the system board. 3. If the problem persists, contact technical support. |
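All voltage threshold entries share one procedure but differ in which part to replace. The PMIC events (0x02220002 and 0x02920002) call for replacing the DIMM, and the other voltage events call for replacing the system board. The lookup table below encodes that mapping from this section. The table and function names are hypothetical.

```python
from typing import Optional

# Voltage threshold events in this section:
# code -> (direction, severity, part to replace if the device was running normally)
VOLTAGE_EVENTS = {
    0x02000002: ("lower", "Minor", "system board"),
    0x02200002: ("lower", "Major", "system board"),
    0x02220002: ("lower", "Major", "DIMM"),
    0x02400002: ("lower", "Critical", "system board"),
    0x02700002: ("upper", "Minor", "system board"),
    0x02900002: ("upper", "Major", "system board"),
    0x02920002: ("upper", "Major", "DIMM"),
    0x02B00002: ("upper", "Critical", "system board"),
}


def part_to_replace(event_code: int, during_power_transition: bool) -> Optional[str]:
    """Apply steps 1 and 2 of the recommended action for a voltage event."""
    if during_power_transition:
        return None  # Step 1: logs raised at power-on/power-off need no action.
    try:
        return VOLTAGE_EVENTS[event_code][2]
    except KeyError:
        raise ValueError(f"not a voltage threshold event: {event_code:#010x}") from None
```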
Transition to Non-recoverable from less severe
Event code |
0x0231500e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: OCP1 network card, OCP2 network card, or OCP3 network card |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure(OCP1 network card) |
Impact |
System power-off might occur. |
Cause |
The power supply of the OCP network adapter is abnormal. |
Recommended action |
1. Verify whether an AC power outage has occurred. If one has, unplug and re-plug the power cord, and verify that the device can power on correctly. 2. If no AC power outage occurred, replace the corresponding OCP network adapter with a spare one. 3. If the problem persists, replace the adapter. 4. If the problem persists, replace the system board. 5. If the problem persists, contact technical support. |
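The $1 field identifies which OCP network adapter lost power, which tells you which adapter the replacement steps apply to. The following sketch extracts that field from the message text. It assumes the field always takes one of the three values listed in the Variable fields row. The function name is hypothetical.

```python
import re
from typing import Optional

# Matches "...power supply failure(OCP1 network card)" and captures "OCP1".
_OCP_FAILURE = re.compile(r"power supply failure\((OCP[1-3]) network card\)")


def failed_ocp_adapter(message: str) -> Optional[str]:
    """Return "OCP1", "OCP2", or "OCP3", or None if the message does not match."""
    m = _OCP_FAILURE.search(message)
    return m.group(1) if m else None
```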
Transition to Non-recoverable from less severe
Event code |
0x0233000e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug the power cord, and verify whether the device can power on correctly. If the device cannot be powered on, replace the BMC module. 2. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0233a00e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug the power cord, and verify whether the device can power on correctly. If the device cannot be powered on, replace the DSD module. 2. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0233d00e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure in system board ($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
P12V is the main power supply for the entire system, so troubleshoot step by step to narrow down the scope. 1. Check the PSUs, fans, riser cards, drive backplane, and system board in sequence. 2. Replace the faulty component. 3. If the problem persists, download the SDS logs and contact technical support. |
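The step-by-step narrowing described for P12V can be expressed as an ordered list of checks that stops at the first failure. In this sketch, each check is a hypothetical callable that you supply, such as the result of a manual inspection or a sensor query. The inspection order follows the Recommended action row for event 0x0233d00e.

```python
from typing import Callable, Dict, Optional

# Inspection order taken from the Recommended action for event 0x0233d00e.
P12V_CHECK_ORDER = ("PSU", "fans", "riser cards", "drive backplane", "system board")


def first_faulty_component(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the checks in order and return the first component reported faulty.

    Each callable in `checks` is hypothetical and must return True when the
    component is healthy. Returns None if every component passes.
    """
    for component in P12V_CHECK_ORDER:
        if not checks[component]():
            return component
    return None
```

Stopping at the first failure mirrors the manual procedure: replace that component, then power on again before you inspect anything further down the list.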
Transition to Non-recoverable from less severe
Event code |
0x0233e00e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Press and hold the physical power button until the fault is cleared (the four LEDs stop flashing rapidly), and then press the power button again to power on the device. 2. If the device still cannot be powered on, unplug and re-plug all power cables to remove power completely from the system, and then power on the device again. 3. If the problem persists, the damaged component might be the system board, BMC card, or rear backplane. 4. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0233f00e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Press and hold the physical power button until the fault is cleared (the four LEDs stop flashing rapidly), and then press the power button again to power on the device. 2. If the device still cannot be powered on, unplug and re-plug all power cables to remove power completely from the system, and then power on the device again. 3. If the problem persists, replace the system board. 4. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0234000e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Press and hold the physical power button until the fault is cleared (the four LEDs stop flashing rapidly), and then press the power button again to power on the device. 2. If the device still cannot be powered on, unplug and re-plug all power cables to remove power completely from the system, and then power on the device again. 3. If the problem persists, check the PSUs for errors and replace any faulty PSU. If all PSUs are operating correctly, the damaged component might be the system board, OCP network adapter, DIMMs, CPU, or fans. 4. If the problem persists, download the SDS logs and contact technical support. |
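Step 3 for event 0x0234000e branches on PSU health. It replaces faulty PSUs first and examines other components only when every PSU is healthy. The following is a sketch of that branch. The 1-based PSU numbering is a hypothetical convention. The suspect list is interpreted here as the system board, OCP network adapter, DIMMs, CPU, and fans.

```python
from typing import List, Sequence

# Suspects named in step 3 when every PSU is healthy (component list as
# interpreted here: system board, OCP network adapter, DIMMs, CPU, fans).
SUSPECTS_IF_PSUS_OK = ["system board", "OCP network adapter", "DIMMs", "CPU", "fans"]


def step3_actions(psu_healthy: Sequence[bool]) -> List[str]:
    """Return step 3 actions given per-PSU health flags (PSU numbering from 1)."""
    faulty = [slot for slot, ok in enumerate(psu_healthy, start=1) if not ok]
    if faulty:
        return [f"replace PSU {slot}" for slot in faulty]
    return [f"inspect {part}" for part in SUSPECTS_IF_PSUS_OK]
```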
Transition to Non-recoverable from less severe
Event code |
0x0234100e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. An overcurrent on the system board cannot be cleared by pressing and holding the physical power button. 2. Unplug and re-plug all power cables to remove power completely from the system, and then power on the device again. 3. If the problem persists, the damaged component might be a fan, the system board, or a DIMM. 4. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0234200e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug all power cables, and then verify whether the device can be powered on. 2. If the device cannot be powered on, replace the system board or CPU. 3. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0234300e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug all power cables, and then verify whether the device can be powered on. 2. If the device cannot be powered on, replace the system board or CPU. 3. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0234400e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug all power cables, and then verify whether the device can be powered on. 2. Replace the system board or CPU. 3. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0234500e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug all power cables, and then verify whether the device can be powered on. 2. Replace the system board or CPU. 3. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0234600e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug all power cables, and then verify whether the device can be powered on. 2. Replace the system board or CPU. 3. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0234700e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug all power cables, and then verify whether the device can be powered on. 2. Replace the system board or CPU. 3. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0234800e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug all power cables, and then verify whether the device can be powered on. 2. Replace the system board or CPU. 3. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0234900e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug all power cables, and then verify whether the device can be powered on. 2. Replace the system board or CPU. 3. If the problem persists, download the SDS logs and contact technical support. |
Transition to Non-recoverable from less severe
Event code |
0x0234a00e |
Message text |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Variable fields |
$1: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe---System detected a power supply failure($1) |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Unplug and re-plug all power cables, and then verify whether the device can be powered on. 2. Replace the system board or CPU. 3. If the problem persists, download the SDS logs and contact technical support. |
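The power supply failure entries above (event codes 0x0234200e through 0x0234a00e) share one message format, and the faulty rail is carried in $1. When collecting these logs with a script, the rail name can be extracted as sketched below. This is a minimal sketch under the assumption that the log text matches the documented Message text exactly; the helper name `faulty_component` is hypothetical and is not part of HDM.

```python
import re
from typing import Optional

# Hypothetical helper: matches the documented message text
# "...---System detected a power supply failure($1)".
_PSU_FAILURE = re.compile(
    r"System detected a power supply failure\((?P<component>[^)]+)\)"
)


def faulty_component(message: str) -> Optional[str]:
    """Return the $1 component name (for example, P5V), or None if absent."""
    match = _PSU_FAILURE.search(message)
    return match.group("component") if match else None


if __name__ == "__main__":
    sample = (
        "Transition to Non-recoverable from less severe---"
        "System detected a power supply failure(P5V_STBY)"
    )
    print(faulty_component(sample))  # P5V_STBY
```

Rail names such as CPU1_PVCSA identify the CPU involved, which can help decide between the "replace the system board or CPU" options, but confirm with SDS logs before replacing hardware.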
Current
Transition to Critical from less severe
Event code |
0x0320000e |
Message text |
Transition to Critical from less severe |
Variable fields |
N/A |
Severity level |
Major |
Example |
Transition to Critical from less severe |
Impact |
Powering off a module affects system operations. |
Cause |
Abnormal board current. |
Recommended action |
1. Check the HDM Web alarm page for abnormal alarms on the power module and the system board. 2. Make sure the power supply system is functioning properly and the voltage is stable. 3. If the problem persists, contact technical support. |
Exceeded the upper minor threshold
Event code |
0x03700002 |
Message text |
Exceeded the upper minor threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the current sensor. $2: Threshold for triggering a minor current notification. |
Severity level |
Minor |
Example |
Exceeded the upper minor threshold---Current reading:85---Threshold reading:80 |
Impact |
Performance degradation and unstable operation might occur on the device components if the current is too high. |
Cause |
The current of the corresponding part is abnormal. |
Recommended action |
1. Replace the component. 2. If the problem persists, contact technical support. |
Exceeded the upper major threshold
Event code |
0x03900002 |
Message text |
Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the current sensor. $2: Threshold for triggering a major current notification. |
Severity level |
Major |
Example |
Exceeded the upper major threshold---Current reading:90---Threshold reading:88 |
Impact |
Performance degradation and unstable operation might occur on the device components if the current is too high. |
Cause |
Abnormal board current. |
Recommended action |
1. Replace the component. 2. If the problem persists, contact technical support. |
Exceeded the upper major threshold
Event code |
0x03920002 |
Message text |
Exceeded the upper major threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the current sensor. $2: Threshold for triggering a major current notification. |
Severity level |
Major |
Example |
Exceeded the upper major threshold---Current reading:0.50---Threshold reading:0.20 |
Impact |
Memory and system performance degradation might occur. |
Cause |
This alarm is triggered when the current reading of the PMIC for the memory exceeds the major alarm threshold. |
Recommended action |
1. Replace the DIMM. 2. If the problem persists, contact technical support. |
Exceeded the upper critical threshold
Event code |
0x03b00002 |
Message text |
Exceeded the upper critical threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current reading of the current sensor. $2: Threshold for triggering a critical current notification. |
Severity level |
Critical |
Example |
Exceeded the upper critical threshold---Current reading:95---Threshold reading:90 |
Impact |
Components might be damaged, which can lead to a system crash. |
Cause |
Abnormal board current. |
Recommended action |
1. Replace the component. 2. If the problem persists, contact technical support. |
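The current threshold entries above all use the format "Exceeded the upper ... threshold---Current reading:$1---Threshold reading:$2". A monitoring script could parse the two readings and attach the severity that this reference lists for each event code, as in the sketch below. It assumes the log text matches the documented Message text exactly; `parse_threshold` and `ThresholdEvent` are hypothetical names, and no units are assumed for the readings.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Event codes and severities copied from the current threshold entries above.
SEVERITY_BY_CODE = {
    0x03700002: "Minor",
    0x03900002: "Major",
    0x03920002: "Major",  # DIMM PMIC current
    0x03b00002: "Critical",
}

_THRESHOLD = re.compile(
    r"Exceeded the upper (?P<level>minor|major|critical) threshold"
    r"---Current reading:(?P<reading>[\d.]+)"
    r"---Threshold reading:(?P<threshold>[\d.]+)"
)


@dataclass
class ThresholdEvent:
    level: str
    reading: float
    threshold: float
    severity: Optional[str]  # None if the event code is not in the table

    @property
    def margin(self) -> float:
        """How far the reading is above the threshold."""
        return self.reading - self.threshold


def parse_threshold(event_code: int, message: str) -> Optional[ThresholdEvent]:
    """Parse $1 and $2 from a threshold message, or return None."""
    m = _THRESHOLD.search(message)
    if m is None:
        return None
    return ThresholdEvent(
        level=m.group("level"),
        reading=float(m.group("reading")),
        threshold=float(m.group("threshold")),
        severity=SEVERITY_BY_CODE.get(event_code),
    )


if __name__ == "__main__":
    ev = parse_threshold(
        0x03700002,
        "Exceeded the upper minor threshold---Current reading:85---Threshold reading:80",
    )
    print(ev)
```

The severity comes from the event code rather than from the message wording because two different event codes (0x03900002 and 0x03920002) share the same "major" text but cover different components.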
Fan
Transition to Running
Event code |
0x04000014 |
Message text |
Transition to Running |
Variable fields |
N/A |
Severity level |
Info |
Example |
Transition to Running |
Impact |
No negative impact. |
Cause |
The fan is operating correctly. |
Recommended action |
No action is required. |
Transition to Off Line
Event code |
0x04400014 |
Message text |
Transition to Off Line |
Variable fields |
N/A |
Severity level |
Info |
Example |
Transition to Off Line |
Impact |
This affects system heat dissipation and reduces the performance of the system board components. |
Cause |
The fan module has been removed, or the fan module has poor contact with the system board. |
Recommended action |
1. If the fan has been removed, reinstall it as a best practice. 2. Check whether the pins of the fan and the system board connector are normal. If an abnormality is present, replace the component. Otherwise, reseat the fan to ensure proper contact. 3. Replace the fan. 4. If the problem persists, contact technical support. |
Transition to Degraded
Event code |
0x04600014 |
Message text |
Transition to Degraded |
Variable fields |
N/A |
Severity level |
Major |
Example |
Transition to Degraded |
Impact |
This affects system heat dissipation and reduces the performance of the system board components. |
Cause |
The fan speed is abnormal. |
Recommended action |
1. Use the HDM Web page to check the fan speed and confirm the cause of the fan failure. If the speed is too low, the fan might be aging. If the speed is close to zero, the fan might be blocked by foreign objects or might have failed. 2. Verify that the fan is not blocked. 3. Replace the fan. 4. If the problem persists, contact technical support. |
Fully Redundant
Event code |
0x04000016 |
Message text |
Fully Redundant |
Variable fields |
N/A |
Severity level |
Info |
Example |
Fully Redundant |
Impact |
No negative impact. |
Cause |
All fan slots are equipped with fans. |
Recommended action |
No action is required. |
Non-redundant:Sufficient Resources from Redundant
Event code |
0x04300016 |
Message text |
Non-redundant:Sufficient Resources from Redundant |
Variable fields |
N/A |
Severity level |
Major |
Example |
Non-redundant:Sufficient Resources from Redundant |
Impact |
This issue does not affect system heat dissipation. |
Cause |
The fan has failed or is absent. |
Recommended action |
1. If the fan has been removed, reinstall it as a best practice. 2. Reseat the fan to ensure proper contact. 3. If the fan status sensor reports a malfunction, the fan has failed. Replace the fan. 4. If the problem persists, contact technical support. |
Non-redundant:Insufficient Resources
Event code |
0x04500016 |
Message text |
Non-redundant:Insufficient Resources |
Variable fields |
N/A |
Severity level |
Critical |
Example |
Non-redundant:Insufficient Resources |
Impact |
This affects system heat dissipation, causing the system to overheat and automatically shut down. |
Cause |
The fan has failed or is absent. |
Recommended action |
1. If the fan has been removed, reinstall it as a best practice. 2. Reseat the fan to ensure proper contact. 3. If the fan status sensor reports a malfunction, the fan has failed. Replace the fan. 4. If the problem persists, contact technical support. |
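The fan entries above show that severity alone is not a reliable filter for alerting: "Transition to Off Line" is logged at Info level but still has a recommended action. The sketch below encodes the fan event codes, message texts, and severities from this section and flags every event whose recommended action is something other than "No action is required." The table and the `requires_action` helper are hypothetical illustrations, not an HDM interface.

```python
from typing import Dict, Tuple

# Fan event codes mapped to (message text, severity), copied from the
# Fan entries in this reference.
FAN_EVENTS: Dict[int, Tuple[str, str]] = {
    0x04000014: ("Transition to Running", "Info"),
    0x04400014: ("Transition to Off Line", "Info"),
    0x04600014: ("Transition to Degraded", "Major"),
    0x04000016: ("Fully Redundant", "Info"),
    0x04300016: ("Non-redundant:Sufficient Resources from Redundant", "Major"),
    0x04500016: ("Non-redundant:Insufficient Resources", "Critical"),
}

# Only these two entries state "No action is required."
NO_ACTION_REQUIRED = {0x04000014, 0x04000016}


def requires_action(event_code: int) -> bool:
    """True if the documented recommended action is more than 'no action'."""
    if event_code not in FAN_EVENTS:
        raise KeyError(f"not a documented fan event: {event_code:#010x}")
    return event_code not in NO_ACTION_REQUIRED


if __name__ == "__main__":
    for code, (text, severity) in FAN_EVENTS.items():
        print(f"{code:#010x} {severity:<8} action={requires_action(code)} {text}")
```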
Physical Security
General Chassis Intrusion
Event code |
0x050000de |
Message text |
General Chassis Intrusion |
Variable fields |
N/A |
Severity level |
Minor |
Example |
General Chassis Intrusion |
Impact |
No negative impact. |
Cause |
The chassis access panel is removed. |
Recommended action |
1. Check whether the access panel was removed manually. 2. Check whether the access panel is installed properly. If necessary, open and then close the access panel to see whether the error log is cleared. 3. Check whether the connection between the access-open alarm module and the chassis ear is normal. 4. If the problem persists, contact technical support. |
LAN Leash Lost
Event code |
0x054000de |
Message text |
LAN Leash Lost |
Variable fields |
N/A |
Severity level |
Info |
Example |
LAN Leash Lost |
Impact |
No negative impact. |
Cause |
The NCSI channel of the BMC detected a physical network disconnection. |
Recommended action |
1. Check whether the network adapter is disabled in the operating system. If it is disabled, no action is required. 2. If the system reports this log during the power-on or power-off phase, it can be ignored. 3. Check whether the shared network port cable is properly connected. 4. If the shared network port is not needed, disable it. 5. If the problem persists, contact technical support. |
Processor
IERR
Event code |
0x070000de |
Message text |
$1 IERR err---Socket $2 |
Variable fields |
$1: Signal type. Options include MSMI and CATERR. $2: Faulty CPU. |
Severity level |
Critical |
Example |
CATERR IERR err---Socket 1 |
Impact |
It can cause a system crash. |
Cause |
CPU internal error. For example, if the Package Control Unit (PCU) encounters an unrecoverable error, this alarm will be triggered. |
Recommended action |
1. Upgrade the BIOS and HDM firmware to the latest version. 2. Troubleshoot based on the component event logs reported at the same time as this log. 3. If the problem persists, contact technical support. |
MCERR
Event code |
0x070010de |
Message text |
$1 MCERR err---Socket $2 |
Variable fields |
$1: Signal type. Options include MSMI and CATERR. $2: Faulty CPU. |
Severity level |
Critical |
Example |
CATERR MCERR err---Socket 1 |
Impact |
It can cause a system restart. |
Cause |
CPU internal error. For example, if an uncorrectable error occurs on the memory, this alarm will be triggered. |
Recommended action |
1. The CPU detected an internal error and generated this log. Check the hardware information and sensor pages for errors or disabled components based on the description information. 2. Restart the host to check whether the problem persists. 3. Use the contextual logs to check for memory, PCIe, and UPI failures, and perform the recommended actions for those events. |
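The IERR and MCERR entries above share the message format "$1 IERR err---Socket $2" or "$1 MCERR err---Socket $2", where $1 is MSMI or CATERR and $2 is the faulty CPU. The sketch below parses either form so the socket number can be matched against other logs from the same CPU. It assumes the log text matches the documented format exactly; `parse_cpu_error` and `CpuError` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Matches "$1 IERR err---Socket $2" and "$1 MCERR err---Socket $2".
_CPU_ERR = re.compile(
    r"^(?P<signal>MSMI|CATERR) (?P<kind>IERR|MCERR) err---Socket (?P<socket>\d+)$"
)


class CpuError(NamedTuple):
    signal: str  # MSMI or CATERR
    kind: str    # IERR or MCERR
    socket: int  # faulty CPU


def parse_cpu_error(message: str) -> Optional[CpuError]:
    """Return the parsed fields, or None if the message has another format."""
    m = _CPU_ERR.match(message.strip())
    if m is None:
        return None
    return CpuError(m.group("signal"), m.group("kind"), int(m.group("socket")))


if __name__ == "__main__":
    print(parse_cpu_error("CATERR IERR err---Socket 1"))
```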
Thermal Trip
Event code |
0x071000de |
Message text |
Thermal Trip |
Variable fields |
N/A |
Severity level |
Critical |
Example |
Thermal Trip |
Impact |
It can cause the host to power off. |
Cause |
This event is triggered when the CPU overheats, and the host might shut down and power off. If Thermal Trip is reported without any other errors, a sharp change in CPU load might have outpaced the cooling policy, causing a temporary temperature rise that triggered the event. |
Recommended action |
1. Log in to HDM, and verify that the fans are in normal state. 2. Reseat or replace any fan module that reports a speed alarm. 3. Check the system resource monitoring tab to see whether the system workload is too heavy, and close non-essential tasks to reduce the workload. 4. Check whether the ambient temperature is too high, and keep the server operating within its normal temperature range. 5. Check for blockages at the air inlet and outlet, and remove any obstructions. 6. Power off the server, check the CPU heatsink for poor contact, reapply thermal grease, reinstall the heatsink, and power on the server again. 7. If the problem persists, contact technical support. |
FRB1/BIST failure
Event code |
0x072000de |
Message text |
FRB1/BIST failure. |
Variable fields |
N/A |
Severity level |
Minor |
Example |
FRB1/BIST failure |
Impact |
The operating system might fail to start up. |
Cause |
This alarm is generated when the CPU self-check detects an error during system startup. |
Recommended action |
1. Power cycle the device. 2. If the issue persists, some CPU cores have failed the self-check. Replace the CPU. 3. If the problem persists, contact technical support. |
FRB2/Hang in POST failure
Event code |
0x073000de |
Message text |
FRB2/Hang in POST failure |
Variable fields |
N/A |
Severity level |
Major |
Example |
FRB2/Hang in POST failure |
Impact |
The operating system might fail to start up. |
Cause |
The CPU self-check detected an error during system startup. |
Recommended action |
1. Power off and restart the device, and then check whether the log is triggered again. 2. If the log is still triggered, replace the CPU. |
FRB3/Processor Startup/Initialization failure
Event code |
0x074000de |
Message text |
FRB3/Processor Startup/Initialization failure |
Variable fields |
N/A |
Severity level |
Minor |
Example |
FRB3/Processor Startup/Initialization failure |
Impact |
The operating system might fail to start up. |
Cause |
The CPU self-check detected an error during system startup. |
Recommended action |
1. Power off and restart the device, and then check whether the log is triggered again. 2. If the log is still triggered, replace the CPU. |
Configuration Error
Event code |
0x075000de |
Message text |
Configuration Error---$1, ErrorType: $2,Severity:$3, $4, Location: Socket: $5 |
Variable fields |
$1: When the error occurred. It can be Current Boot Error or Last Boot Error. $2: Fault type. It can be IIO Internal Error or Spare core Error. $3: Fault severity. $4: Faulty component. $5: CPU number. |
Severity level |
Minor |
Example |
Configuration Error---Current Boot Error, ErrorType: IIO Internal Error,Severity:Correctable, Component:VTD, IIO Stack: 1, Location: Socket: 1 |
Impact |
The operating system might fail to start up. |
Cause |
The CPU detected an internal correctable error during operation. |
Recommended action |
This log is generated when correctable internal errors are detected during server operation, such as IIO internal errors or CPU core errors. No action is required for correctable internal errors. |
Processor Presence detected
Event code |
0x077000df |
Message text |
Processor Presence detected |
Variable fields |
N/A |
Severity level |
Info |
Example |
Processor Presence detected |
Impact |
If the primary CPU is not in place, it may result in system startup failure. |
Cause |
This event log is triggered when the primary CPU is not in place or installed incorrectly. |
Recommended action |
1. Verify that the primary CPU is installed correctly. 2. If the primary CPU fails, replace the CPU. 3. If the problem persists, contact technical support. |
Processor Automatically Throttled
Event code |
0x07a000de |
Message text |
Processor Automatically Throttled---due to fan error |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Processor Automatically Throttled---due to fan error |
Impact |
System performance decreases due to CPU throttling. |
Cause |
The CPU throttles because of a fan failure, such as the failure of a single fan. The processor automatically throttles by entering T-states (clock duty-cycle modulation), which reduces performance and triggers this event. |
Recommended action |
1. Log in to HDM, and verify that the fans are running correctly. 2. Verify that the air conditioner in the equipment room is running correctly. |
Processor Automatically Throttled
Event code |
0x07a010de |
Message text |
Processor Automatically Throttled---prochot |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Processor Automatically Throttled---prochot |
Impact |
System performance decreases due to CPU throttling. |
Cause |
The CPU might throttle because it overheats or because another temperature sensor exceeds a set temperature, for example, after a single fan fails. The processor automatically throttles by entering T-states (clock duty-cycle modulation), which reduces performance and triggers this event. |
Recommended action |
1. Log in to HDM, and verify that the fans are running correctly. 2. Verify that the air conditioner in the equipment room is running correctly. |
Processor Automatically Throttled
Event code |
0x07a020de |
Message text |
Processor Automatically Throttled---memhot |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Processor Automatically Throttled---memhot |
Impact |
System performance decreases due to CPU throttling. |
Cause |
The CPU might throttle because the memory overheats, for example, after a single fan fails. The processor automatically throttles by entering T-states (clock duty-cycle modulation), which reduces performance and triggers this event. |
Recommended action |
1. Log in to HDM, and verify that the fans are running correctly. 2. Verify that the air conditioner in the equipment room is running correctly. |
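The three Processor Automatically Throttled entries differ only in the suffix after "---" (due to fan error, prochot, or memhot), and each suffix has its own event code. The sketch below extracts the suffix and maps it back to the event code listed in this reference, which can help separate fan-caused throttling from CPU or memory overheating. It assumes the message text is logged exactly as documented; `throttle_reason` and `THROTTLE_EVENT_CODES` are hypothetical names.

```python
from typing import Dict, Optional

PREFIX = "Processor Automatically Throttled---"

# Reason suffixes and event codes copied from the three entries above.
THROTTLE_EVENT_CODES: Dict[str, int] = {
    "due to fan error": 0x07a000de,
    "prochot": 0x07a010de,
    "memhot": 0x07a020de,
}


def throttle_reason(message: str) -> Optional[str]:
    """Return the text after the '---' separator, or None for other messages."""
    if not message.startswith(PREFIX):
        return None
    return message[len(PREFIX):].strip()


if __name__ == "__main__":
    reason = throttle_reason("Processor Automatically Throttled---memhot")
    print(reason, hex(THROTTLE_EVENT_CODES[reason]))  # memhot 0x7a020de
```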
Machine Check Exception
Event code |
0x07b000de |
Message text |
Machine Check Exception---$1---$2---Location: Socket:$3 |
Variable fields |
$1: Fault type. $2: Specifies if the error occurred during the current boot or the previous boot. $3: Faulty CPU. |
Severity level |
Critical |
Example |
Machine Check Exception---PIE---Last Boot Error---Location: Socket:1 |
Impact |
The system might stop responding. |
Cause |
This event is triggered only on AMD models, when the CPU generates an uncorrectable error. |
Recommended action |
1. Check whether corresponding faults exist in the operating system. 2. Check the CPU microcode, and upgrade the BIOS and BMC to the latest versions. 3. If the issue persists, preliminarily determine the fault range based on the bank location, and check whether other warning logs have been generated. 4. Safely power off the server, and replace the CPU or peripheral with a known working one to see whether the warning disappears. 5. Replace the system board. |
Triggered a uncorrectable error
Event code |
0x07b201de |
Message text |
CPU $1 triggered a uncorrectable error. |
Variable fields |
$1: CPU number. |
Severity level |
Critical |
Example |
CPU 1 triggered a uncorrectable error. |
Impact |
The system might stop responding. |
Cause |
An IERR or MCERR error was triggered, and SHD diagnosed it as a CPU uncorrectable error. |
Recommended action |
1. Upgrade the BIOS and HDM firmware to the latest version. 2. Analyze other error and warning logs to determine whether the faulty memory, PCIe, or CPU component can be identified. 3. Safely power off the server, and replace the memory, PCIe, or CPU with a known working one to see whether the warning disappears. 4. Replace the system board. 5. If the problem persists, contact technical support. |
Machine Check Exception
Event code |
0x07b100de |
Message text |
Machine Check Exception---HBM error---Location: Socket:$1 |
Variable fields |
$1: CPU number. |
Severity level |
Critical |
Example |
Machine Check Exception---HBM error---Location: Socket:1 |
Impact |
The system might stop responding. |
Cause |
HBM failed. |
Recommended action |
1. Check if there are any corresponding faults in the operating system. 2. Check the CPU microcode and upgrade the BIOS and BMC to the latest versions. 3. If the issue persists, preliminarily determine the range of the fault based on the bank location and check if there are any other warning logs generated. 4. Safely power off the server and replace the CPU or peripheral with a known working one to see if the warning disappears. 5. Replace the system board. |
Triggered a correctable error
Event code |
0x07c201de |
Message text |
CPU $1 triggered a correctable error. |
Variable fields |
$1: CPU number. |
Severity level |
Minor |
Example |
CPU 1 triggered a correctable error. |
Impact |
The system might stop responding. |
Cause |
An IERR or MCERR error was triggered, and SHD diagnosed it as a CPU correctable error. |
Recommended action |
1. Upgrade the BIOS and HDM firmware to the latest version. 2. Troubleshoot based on the component event logs reported at the same time. 3. Replace the CPU, memory, or PCIe module. 4. Replace the system board. 5. If the problem persists, contact technical support. |
Correctable Machine Check Error
Event code |
0x07c000de |
Message text |
Correctable Machine Check Error---$1---$2---Location: Socket:$3 |
Variable fields |
$1: Fault type. $2: Specifies if the error occurred during the current boot or the previous boot. $3: Faulty CPU. |
Severity level |
Minor |
Example |
Correctable Machine Check Error---PIE---Current Boot Error---Location: Socket:1 |
Impact |
No negative impact. |
Cause |
This alarm is generated only on AMD models, when correctable errors such as TWIX, WAFL, or SMU errors occur. |
Recommended action |
1. Random occurrences can be temporarily ignored. 2. Check if there are any relevant errors in the OS software and drivers. 3. If the problem persists, replace the system board, CPU, or peripheral. |
Correctable Machine Check Error
Event code |
0x07c100de |
Message text |
Correctable Machine Check Error---HBM error---Location: Socket:$1 |
Variable fields |
$1: Faulty CPU. |
Severity level |
Minor |
Example |
Correctable Machine Check Error---HBM error---Location: Socket:1 |
Impact |
No negative impact. |
Cause |
HBM failed. |
Recommended action |
No action is required. |
Machine Check Exception
Event code |
0x07b001de |
Message text |
Machine Check Exception---$1, Bank: $2,Severity:$3, Error Info:$4, Location: Socket: $5 |
Variable fields |
$1: Specifies if the error occurred during the current boot or the previous boot. $2: Fault bank. $3: Fault severity. $4: Fault information. $5: CPU number. |
Severity level |
Critical |
Example |
Machine Check Exception---Current Boot Error, Bank: IFU,Severity:FATAL, Error Info:Cache, Location: Socket: 1 |
Impact |
The system might stop responding. |
Cause |
This event occurs when there is an internal fault in the CPU. |
Recommended action |
1. Check if there are any corresponding faults present in the operating system. 2. Check the CPU microcode and upgrade the BIOS and BMC to the latest versions. 3. If the issue persists, preliminarily determine the range of the fault based on the bank location and check if any other warning logs have been generated. 4. Power off the server safely and replace the CPU or peripheral with a known working one to see if the warning disappears. 5. Replace the system board. |
Correctable Machine Check Error
Event code |
0x07c001de |
Message text |
Correctable Machine Check Error---$1, Bank: $2,Severity:$3, Error Info:$4, Location: Socket: $5 |
Variable fields |
$1: Specifies if the error occurred during the current boot or the previous boot. $2: Fault bank. $3: Fault severity. $4: Fault information. $5: CPU number. |
Severity level |
Minor |
Example |
Correctable Machine Check Error---Current Boot Error, Bank: IFU,Severity:Corrected, Error Info:Cache, Location: Socket: 1 |
Impact |
No negative impact. |
Cause |
This event occurs when there is an internal failure in the CPU. |
Recommended action |
1. If the occurrence is sporadic, it can be temporarily ignored. 2. Check if there are any related errors in the operating system software and drivers. 3. If the problem persists, replace the system board, CPU, or peripheral. |
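The two Machine Check entries with event codes 0x07b001de (Critical) and 0x07c001de (Minor) share one field layout: boot, Bank, Severity, Error Info, and Socket. The recommended actions ask you to narrow the fault range from the bank location, so a parser that pulls out these fields can help group repeated events by bank and socket. This is a minimal sketch assuming the text matches the documented Message text; `parse_mce` is a hypothetical name, and the older "---PIE---" style messages use a different layout and deliberately return None.

```python
import re
from typing import Dict, Optional

# One pattern for both "Machine Check Exception---..." and
# "Correctable Machine Check Error---..." in the $1..$5 layout above.
_MCE = re.compile(
    r"^(?P<title>Machine Check Exception|Correctable Machine Check Error)"
    r"---(?P<boot>[^,]+)"               # $1: current or previous boot
    r", Bank: (?P<bank>[^,]+)"          # $2: fault bank
    r",\s*Severity:\s*(?P<severity>[^,]+)"  # $3: fault severity
    r", Error Info:\s*(?P<info>[^,]+)"  # $4: fault information
    r", Location: Socket: (?P<socket>\d+)$"  # $5: CPU number
)


def parse_mce(message: str) -> Optional[Dict[str, str]]:
    """Return the named fields as strings, or None if the layout differs."""
    m = _MCE.match(message.strip())
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse_mce(
        "Machine Check Exception---Current Boot Error, Bank: IFU,"
        "Severity:FATAL, Error Info:Cache, Location: Socket: 1"
    ))
```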
Power Supply
Presence detected
Event code |
0x080000de |
Message text |
Presence detected |
Variable fields |
N/A |
Severity level |
Info |
Example |
Presence detected |
Impact |
If a power module is absent, the reliability of the device power supply is reduced. |
Cause |
When the power module is detected as being inserted, this event is triggered, indicating a transition from the power module not being in place to being in place. When the power module is detected as being removed, this event is cleared, indicating a transition from the power module being in place to not being in place. |
Recommended action |
1. Check whether the power module was removed. 2. Check whether the power module is installed correctly. 3. If the problem persists, contact technical support. |
Power Supply Failure detected
Event code |
0x081000de |
Message text |
Power Supply Failure detected |
Variable fields |
N/A |
Severity level |
Major |
Example |
Power Supply Failure detected |
Impact |
It affects system power supply and may result in abnormal system power-off. |
Cause |
This event occurs when a power supply failure is detected. |
Recommended action |
1. Check whether the fan of the power module is spinning. 2. Remove and reinstall the power module. 3. Check whether the input voltage of the power module is normal. 4. Replace the power module. 5. If the problem persists, contact technical support. |
Power Supply Predictive Failure
Event code |
0x082000de |
Message text |
Power Supply Predictive Failure |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Power Supply Predictive Failure |
Impact |
The power module may have malfunctions that affect system power supply. |
Cause |
The power module generates a minor alarm internally. |
Recommended action |
1. Check whether the status LED of the power module is normal. 2. Check whether the fan of the power module is spinning. 3. Check whether the input voltage of the power module is normal. 4. If the problem persists, contact technical support. |
Power Supply input lost (AC/DC)
Event code |
0x083000de |
Message text |
Power Supply input lost (AC/DC) |
Variable fields |
N/A |
Severity level |
Major |
Example |
Power Supply input lost (AC/DC) |
Impact |
It may cause the server to power off abnormally. |
Cause |
The AC power cable of the power supply is unplugged or there is an abnormal AC input. |
Recommended action |
1. Verify that all power cables are undamaged and properly connected. 2. Ensure that all power modules are correctly installed. 3. Check whether the fans of the power modules are spinning. 4. Confirm that the power input is normal. 5. If the problem persists, contact technical support. |
Power Supply input lost or out-of-range
Event code |
0x084000de |
Message text |
Power Supply input lost or out-of-range |
Variable fields |
N/A |
Severity level |
Major |
Example |
Power Supply input lost or out-of-range |
Impact |
This may cause the server to power off abnormally. |
Cause |
The power supply module is present, but the power supply is interrupted or has exceeded the threshold. |
Recommended action |
1. Check whether the power was deliberately interrupted. 2. Check whether the input voltage of the power module is normal. 3. Verify that the power cables and power modules are installed correctly. 4. Unplug and re-plug the power module to ensure a good power connection. 5. Check whether the fans of the power module are spinning. 6. If the problem persists, contact technical support. |
Power Supply input out-of-range - but present
Event code |
0x085000de |
Message text |
Power Supply input out-of-range - but present |
Variable fields |
N/A |
Severity level |
Major |
Example |
Power Supply input out-of-range - but present |
Impact |
Abnormal power input beyond the supported range may cause the server to power off. |
Cause |
The input voltage of the power supply is too high. |
Recommended action |
1. Check whether the input voltage of the power module is normal. 2. Verify that the power cables and power modules are installed correctly. 3. Unplug and re-plug the power module to ensure a good power connection. 4. Check whether the fans of the power module are spinning. 5. If the problem persists, contact technical support. |
Configuration error ---Vendor mismatch
Event code |
0x086000de |
Message text |
Configuration error ---Vendor mismatch |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Configuration error ---Vendor mismatch |
Impact |
This may cause unstable power supply and abnormal shutdown of the system. |
Cause |
Power modules that are not original certified modules are installed. |
Recommended action |
Install original certified power modules. |
Configuration error---Power Supply rating mismatch
Event code |
0x086030de |
Message text |
Configuration error --- Power Supply rating mismatch |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Configuration error --- Power Supply rating mismatch |
Impact |
This may result in unstable power supply and abnormal system shutdown. |
Cause |
Original certified power modules are installed, but the models of the two power modules do not match. |
Recommended action |
1. If the power modules have the same rated power, remove and reinstall them one at a time to see whether the issue is resolved. 2. If the rated power of the power modules differs, replace them with power modules that have the same rated power. 3. If the problem persists, contact technical support. |
Configuration error---Power supply rating mismatch
Event code |
0x086200de |
Message text |
Configuration error---Power supply rating mismatch:PSU$1,POUT:$2W |
Variable fields |
$1: PSU ID, which can be 1 or 2. $2: Output power of the power supply. |
Severity level |
Minor |
Example |
Configuration error---Power supply rating mismatch:PSU1,POUT:2000W |
Impact |
This may result in unstable power supply and abnormal system shutdown. |
Cause |
The rated power of the installed power supplies may be inconsistent. |
Recommended action |
1. If the power supplies have the same rated power, remove and reinstall them one at a time to see whether the issue is resolved. 2. If the rated power of the power supplies differs, replace them with power supplies that have the same rated power. 3. If the problem persists, contact technical support. |
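Because this message carries the PSU ID and output power as variable fields, a log-processing script can extract them with a regular expression. The following is a minimal Python sketch. It assumes the message follows the `Configuration error---Power supply rating mismatch:PSU$1,POUT:$2W` layout shown in this entry. The function name `parse_psu_rating_mismatch` is hypothetical and is not part of HDM.

```python
from __future__ import annotations

import re

# Layout assumed from the message text of event 0x086200de.
_PSU_RATING_MISMATCH = re.compile(
    r"Configuration error---Power supply rating mismatch:"
    r"PSU(?P<psu>\d+),POUT:(?P<pout>\d+)W"
)


def parse_psu_rating_mismatch(message: str) -> tuple[int, int] | None:
    """Return (PSU ID, output power in watts), or None if the message does not match."""
    match = _PSU_RATING_MISMATCH.search(message)
    if match is None:
        return None
    return int(match["psu"]), int(match["pout"])
```

The function returns `None` instead of raising an exception, so callers can run it across a mixed SEL export and keep only the entries that match.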
Power Supply Inactive/standby state
Event code |
0x087000de |
Message text |
Power Supply Inactive/standby state |
Variable fields |
N/A |
Severity level |
Info |
Example |
Power Supply Inactive/standby state |
Impact |
No negative impact. |
Cause |
The power supply has exited cold standby mode. When the cold standby power supply function is enabled and the device is running at high power, the standby power supply automatically exits cold standby mode and supplies power to the device. |
Recommended action |
No action is required. |
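The event codes in this section appear to follow a pattern. The leading byte matches the section: 0x08 for power supply, 0x09 for power unit, 0x0a for cooling device, 0x0b for other units-based sensor, and 0x0c for memory. These values correspond to the IPMI sensor-type codes. The next hex digit seems to distinguish events of the same sensor, as in 0x085000de, 0x086000de, and 0x087000de. The following Python sketch decodes only those two fields. The bit layout is inferred from the codes listed in this reference and is not a documented H3C format, so treat the remaining digits as opaque.

```python
from __future__ import annotations

# Section names as used in this reference; the byte values match IPMI sensor types.
SENSOR_TYPES = {
    0x08: "Power Supply",
    0x09: "Power Unit",
    0x0A: "Cooling Device",
    0x0B: "Other Units-based Sensor",
    0x0C: "Memory",
}


def decode_event_code(code: str) -> tuple[str, int]:
    """Split an event code into (sensor type name, event digit).

    Assumption: bits 31-24 hold the sensor type and bits 23-20 hold the event
    digit. All other bits are left undecoded.
    """
    value = int(code, 16)  # int() accepts an optional "0x" prefix with base 16
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit event code: {code}")
    sensor_type = (value >> 24) & 0xFF
    event_digit = (value >> 20) & 0xF
    name = SENSOR_TYPES.get(sensor_type, f"Unknown (0x{sensor_type:02x})")
    return name, event_digit
```

For example, `decode_event_code("0x085000de")` yields `("Power Supply", 5)`, which matches the "input out-of-range - but present" entry above.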
PSU failure detected by CPLD
Event code |
0x088000de |
Message text |
PSU failure detected by CPLD |
Variable fields |
N/A |
Severity level |
Critical |
Example |
PSU failure detected by CPLD |
Impact |
This may result in unstable power supply and abnormal system shutdown. |
Cause |
The server has experienced an AC power failure. |
Recommended action |
1. Check for environmental issues such as high temperature or an abnormal power supply fan. 2. Reseat the power module and check whether the alarm clears. 3. If the problem persists, replace the power module. |
Redundancy Lost
Event code |
0x08100016 |
Message text |
Redundancy Lost |
Variable fields |
N/A |
Severity level |
Major |
Example |
Redundancy Lost |
Impact |
Loss of power redundancy reduces the reliability of the device power supply. |
Cause |
Power supply redundancy has been lost. |
Recommended action |
1. Check whether the power supply environment is normal. 2. Check whether any power supply has been removed. 3. Check for poor contact between the power supplies and the power cables. 4. Check the logs for power-related fault alarms to determine whether a power supply has failed. 5. If the problem persists, contact technical support. |
Power Unit
Power limit is exceeded over correction time limit
Event code |
0x095010de |
Message text |
Power limit is exceeded over correction time limit---$1 Current Power: $2W. |
Variable fields |
$1: GPU or None. $2: Current power value, in watts. |
Severity level |
Minor |
Example |
Power limit is exceeded over correction time limit---GPU Current Power: 2000W |
Impact |
Exceeding the maximum power output will cause the system to shut down. |
Cause |
Power capping triggers this alarm when the power consumption remains above the power cap for longer than the correction time limit. |
Recommended action |
1. Adjust the power capping threshold or the GPU workload. 2. If the problem persists, contact technical support. |
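To decide whether to adjust the power cap or the GPU workload, you first need the power value that the alarm reported. The following Python sketch extracts it. It assumes the `---$1 Current Power: $2W` layout of the message text and example above, with `$1` being `GPU` or `None`. The function name `parse_power_limit` is hypothetical.

```python
from __future__ import annotations

import re

# Layout assumed from event 0x095010de; the space after "Power:" is treated as optional.
_POWER_LIMIT = re.compile(
    r"Power limit is exceeded over correction time limit---"
    r"(?P<scope>GPU|None) Current Power: ?(?P<watts>\d+)W"
)


def parse_power_limit(message: str) -> tuple[str, int] | None:
    """Return (scope, current power in watts), or None if the message does not match."""
    match = _POWER_LIMIT.search(message)
    if match is None:
        return None
    return match["scope"], int(match["watts"])
```

Because the pattern uses `search`, a trailing period after `W` (as in the message text) does not affect matching.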
Cooling Device
Transition to OK
Event code |
0x0a00000e |
Message text |
Transition to OK |
Variable fields |
N/A |
Severity level |
Info |
Example |
Transition to OK |
Impact |
No negative impact. |
Cause |
The liquid cooling module is in place and free of faults. |
Recommended action |
No action is required. |
Transition to Non-recoverable
Event code |
0x0a60000e |
Message text |
Transition to Non-recoverable |
Variable fields |
N/A |
Severity level |
Critical |
Example |
Transition to Non-recoverable |
Impact |
It affects CPU cooling. |
Cause |
This message is generated when liquid leakage occurs. |
Recommended action |
1. Check if the liquid cooling device is functioning properly or if there is any liquid leakage. 2. Replace the liquid cooling module. |
Monitor
Event code |
0x0a70000e |
Message text |
Monitor |
Variable fields |
N/A |
Severity level |
Major |
Example |
Monitor |
Impact |
Unable to detect coolant leakage. |
Cause |
The liquid leakage sensor cannot be detected. |
Recommended action |
1. Check if the liquid cooling device is present. 2. Check if the liquid leakage sensor is installed correctly. 3. Replace the liquid cooling module. |
Other Units-based Sensor
Exceeded the upper minor threshold
Event code |
0x0b700002 |
Message text |
Exceeded the upper minor threshold---Current reading:$1---Threshold reading:$2 |
Variable fields |
$1: Current power value. $2: Threshold for triggering a minor power notification. |
Severity level |
Minor |
Example |
Exceeded the upper minor threshold---Current reading:20---Threshold reading:18 |
Impact |
Exceeding the maximum power limit will cause the system to shut down. |
Cause |
Power consumption has exceeded the upper minor threshold. |
Recommended action |
1. Log in to HDM and verify that the threshold value is appropriate. 2. On the HDM web interface, check whether the total power consumption of the server is too high. 3. Check whether the total capacity of the power supplies meets the service requirements. 4. If the problem persists, contact technical support. |
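Step 1 asks whether the threshold is appropriate, so it helps to know how far the reading exceeded it. The following Python sketch parses the current and threshold readings from this message and returns the margin. It assumes the `Current reading:$1---Threshold reading:$2` layout. This entry does not state units for the readings, so the sketch returns plain numbers. The function name is hypothetical.

```python
from __future__ import annotations

import re

_NUMBER = r"-?\d+(?:\.\d+)?"  # integer or decimal reading

_UPPER_MINOR = re.compile(
    r"Exceeded the upper minor threshold---"
    rf"Current reading:(?P<current>{_NUMBER})---"
    rf"Threshold reading:(?P<threshold>{_NUMBER})"
)


def parse_threshold_event(message: str) -> tuple[float, float, float] | None:
    """Return (current reading, threshold, amount over threshold), or None."""
    match = _UPPER_MINOR.search(message)
    if match is None:
        return None
    current = float(match["current"])
    threshold = float(match["threshold"])
    return current, threshold, current - threshold
```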
Memory
Correctable ECC or other correctable memory error
Event code |
0x0c0000de |
Message text |
Correctable ECC or other correctable memory error--$1-Location:CPU:$2 CH:$3 DIMM:$4 $5 |
Variable fields |
$1: When the error occurred, which can be Current Boot Error or Last Boot Error. $2: CPU number. $3: Channel number. $4: DIMM number. $5: DIMM mark. |
Severity level |
Minor |
Example |
Correctable ECC or other correctable memory error---Current Boot Error-Location:CPU:1 CH:1 DIMM:0 A1 |
Impact |
No immediate negative impact. However, if a large number of ECC errors occur, monitor the DIMM closely, because accumulated correctable errors can escalate into uncorrectable errors and cause a system crash. |
Cause |
A correctable memory error has occurred. |
Recommended action |
No action is required. |
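The following Python sketch splits this message into its variable fields so that the affected DIMM can be tracked. The message text above shows `--$1-` but the example shows `---Current Boot Error-`, so the pattern accepts either two or three dashes before the boot field. That tolerance is an assumption made to cover both forms. The class and function names are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectableDimmError:
    boot: str      # "Current Boot Error" or "Last Boot Error"
    cpu: int
    channel: int
    dimm: int
    mark: str      # DIMM mark, for example "A1"


# Two or three dashes before the boot field, one or more after it.
_CE_PATTERN = re.compile(
    r"Correctable ECC or other correctable memory error-{2,3}"
    r"(?P<boot>(?:Current|Last) Boot Error)-+"
    r"Location:CPU:(?P<cpu>\d+) CH:(?P<ch>\d+) DIMM:(?P<dimm>\d+) (?P<mark>\w+)"
)


def parse_correctable_error(message: str) -> CorrectableDimmError | None:
    """Return the parsed fields of event 0x0c0000de, or None if the message does not match."""
    match = _CE_PATTERN.search(message)
    if match is None:
        return None
    return CorrectableDimmError(
        boot=match["boot"],
        cpu=int(match["cpu"]),
        channel=int(match["ch"]),
        dimm=int(match["dimm"]),
        mark=match["mark"],
    )
```

The match is case-sensitive, so uncorrectable messages (which contain the lowercase substring `correctable ECC`) are not mistaken for this event.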
Correctable ECC or other correctable memory error
Event code |
0x0c0020de |
Message text |
Correctable ECC or other correctable memory error---$1---Location:CPU:$2 CH:$3 DIMM:$4 |
Variable fields |
$1: Fault type, which can be ECC, Parity, CRC, or Other. $2: CPU number. $3: Channel number. $4: DIMM number. |
Severity level |
Minor |
Example |
Correctable ECC or other correctable memory error---CRC---Location:CPU:1 CH:1 DIMM:0 |
Impact |
No negative impact. |
Cause |
On AMD models, a correctable ECC or other correctable memory error has occurred. Such errors have no immediate negative impact. However, if a large number of them occur, monitor the DIMM closely, because accumulated correctable errors can escalate into uncorrectable errors and cause a system crash. |
Recommended action |
No action is required. |
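This reference does not define how many correctable errors count as "a large number." One way to act on the advice above is to count correctable-error messages per DIMM location and flag locations that reach a chosen level. The following Python sketch does this. The threshold of 10 is a hypothetical placeholder, not an H3C recommendation. The location pattern accepts both the `CPU:1` and `CPU1` forms used in this section.

```python
from __future__ import annotations

import re
from collections import Counter
from typing import Iterable

# Hypothetical alert level; choose a value that suits your own operations policy.
CE_ALERT_THRESHOLD = 10

_LOCATION = re.compile(r"Location:CPU:?(?P<cpu>\d+) CH:(?P<ch>\d+) DIMM:(?P<dimm>\w+)")


def dimms_to_watch(messages: Iterable[str], threshold: int = CE_ALERT_THRESHOLD) -> list[tuple[int, int, str]]:
    """Return sorted (CPU, channel, DIMM) locations whose correctable-error count reaches threshold."""
    counts = Counter()
    for message in messages:
        if not message.startswith("Correctable ECC or other correctable memory error"):
            continue  # count correctable memory errors only
        match = _LOCATION.search(message)
        if match:
            counts[(int(match["cpu"]), int(match["ch"]), match["dimm"])] += 1
    return sorted(location for location, count in counts.items() if count >= threshold)
```

The DIMM field is kept as a string because some messages report a number (`0`) and others a mark (`A1`).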
Correctable ECC or other correctable memory error
Event code |
0x0c0060de |
Message text |
Correctable ECC or other correctable memory error---$1---$2---Location:CPU$3 CH:$4 DIMM:$5 |
Variable fields |
$1: Fault type, which can be ECC, Parity, or CRC. $2: Whether the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error. $3: CPU number. $4: Channel number. $5: DIMM number. |
Severity level |
Minor |
Example |
Correctable ECC or other correctable memory error---ECC---Current Boot Error---Location:CPU1 CH:8 DIMM:0 |
Impact |
No negative impact. |
Cause |
On AMD models, a correctable ECC or other correctable memory error has occurred. |
Recommended action |
Correctable errors in memory do not directly affect the normal operation of the system. No action is required. |
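On AMD models, this event and the uncorrectable event 0x0c1600de (listed later in this section) appear to share one layout: fault type, boot phase, and then a location field written as `CPU1` with no colon. The following Python sketch parses both forms based on the example messages. The function name and the returned dictionary keys are hypothetical.

```python
from __future__ import annotations

import re

# Shared layout assumed for events 0x0c0060de and 0x0c1600de.
_AMD_MEMORY = re.compile(
    r"(?P<kind>Correctable|Uncorrectable) ECC or other (?:un)?correctable memory error"
    r"---(?P<fault>ECC|Parity|CRC)"
    r"---(?P<boot>(?:Current|Last) Boot Error)"
    r"---Location:CPU(?P<cpu>\d+) CH:(?P<ch>\d+) DIMM:(?P<dimm>\d+)"
)


def parse_amd_memory_error(message: str) -> dict | None:
    """Return the parsed fields as a dict, or None if the message does not match."""
    match = _AMD_MEMORY.search(message)
    if match is None:
        return None
    return {
        "correctable": match["kind"] == "Correctable",
        "fault": match["fault"],
        "boot": match["boot"],
        "cpu": int(match["cpu"]),
        "channel": int(match["ch"]),
        "dimm": int(match["dimm"]),
    }
```

Messages without a boot-phase field, such as those for event 0x0c0020de, do not match and return `None`.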
CPU triggered a correctable error
Event code |
0x0c0500de |
Message text |
CPU $1 $2 triggered a correctable error |
Variable fields |
$1: CPU number. $2: DIMM mark. |
Severity level |
Minor |
Example |
CPU 1 A0 triggered a correctable error |
Impact |
No negative impact. |
Cause |
An IERR or MCERR error was triggered, and the SHD diagnostic results show correctable errors in memory. |
Recommended action |
Correctable errors in memory do not directly affect the normal operation of the system. No action is required. |
Uncorrectable ECC or other uncorrectable memory error
Event code |
0x0c1000de |
Message text |
Uncorrectable ECC or other uncorrectable memory error--$1-Location:CPU:$2 CH:$3 DIMM:$4 $5 |
Variable fields |
$1: Whether the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error. $2: CPU number. $3: Channel number. $4: DIMM number. $5: DIMM mark. |
Severity level |
Major |
Example |
Uncorrectable ECC or other uncorrectable memory error---Current Boot Error-Location:CPU:1 MEM CTRL:1 CH:1 DIMM:0 A1 |
Impact |
The OS might crash unless the memory operates in a RAS mode such as mirroring or MCA recovery. |
Cause |
An uncorrectable (multi-bit) ECC error has occurred. |
Recommended action |
1. Verify that the temperature and humidity are appropriate. 2. Clean the memory slots and memory contacts, ensuring that there are no foreign objects in the memory slots and that the contacts are not contaminated. Then, reinstall the memory module. 3. If the issue persists, check whether the pins in the corresponding memory socket are bent. If the pins are bent, replace the system board. 4. Replace the DIMM. 5. If the problem persists, contact technical support. |
Uncorrectable ECC or other uncorrectable memory error
Event code |
0x0c1020de |
Message text |
Uncorrectable ECC or other uncorrectable memory error--$1-Location:CPU:$2 CH:$3 DIMM:$4 |
Variable fields |
$1: Whether the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error. $2: CPU number. $3: Channel number. $4: DIMM number. |
Severity level |
Major |
Example |
Uncorrectable ECC or other uncorrectable memory error---Current Boot Error-Location:CPU:1 MEM CTRL:1 CH:1 DIMM:0 A1 |
Impact |
The OS might crash unless the memory operates in a RAS mode such as mirroring or MCA recovery. |
Cause |
On AMD models, an uncorrectable (multi-bit) ECC error has occurred. This is an urgent issue that usually causes the OS to crash. |
Recommended action |
1. Verify that the temperature and humidity are appropriate. 2. Clean the memory slots and memory contacts, ensuring that there are no foreign objects in the memory slots and that the contacts are not contaminated. Then, reinstall the memory module. 3. If the issue persists, check whether the pins in the corresponding memory socket are bent. If the pins are bent, replace the system board. 4. Replace the DIMM. 5. If the problem persists, contact technical support. |
Triggered an uncorrectable error
Event code |
0x0c1500de |
Message text |
CPU$1 $2 triggered an uncorrectable error |
Variable fields |
$1: CPU number. $2: DIMM mark. |
Severity level |
Major |
Example |
CPU1 A0 triggered an uncorrectable error |
Impact |
The system might restart or stop responding. |
Cause |
An IERR or MCERR error was triggered, and the SHD diagnostic results show uncorrectable errors in memory. |
Recommended action |
1. Verify that the temperature and humidity are appropriate. 2. Clean the memory slots and memory contacts, ensuring that there are no foreign objects in the memory slots and that the contacts are not contaminated. Then, reinstall the memory module. 3. If the issue persists, check whether the pins in the corresponding memory socket are bent. If the pins are bent, replace the system board. |
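The SHD-reported memory events 0x0c0500de (correctable) and 0x0c1500de (uncorrectable) differ only in their wording and in the space after `CPU`. The following Python sketch parses both into a CPU number, DIMM mark, and severity flag. The pattern is based solely on the message texts and examples in these two entries, and the function name is hypothetical.

```python
from __future__ import annotations

import re

# "CPU 1 A0 triggered a correctable error" and "CPU1 A0 triggered an uncorrectable error".
_SHD_PATTERN = re.compile(
    r"CPU ?(?P<cpu>\d+) (?P<mark>\w+) triggered an? (?P<kind>correctable|uncorrectable) error"
)


def parse_shd_memory_error(message: str) -> tuple[int, str, bool] | None:
    """Return (CPU number, DIMM mark, is_correctable), or None if the message does not match."""
    match = _SHD_PATTERN.search(message)
    if match is None:
        return None
    return int(match["cpu"]), match["mark"], match["kind"] == "correctable"
```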
Uncorrectable ECC or other uncorrectable memory error
Event code |
0x0c1600de |
Message text |
Uncorrectable ECC or other uncorrectable memory error---$1---$2---Location:CPU$3 CH:$4 DIMM:$5 |
Variable fields |
$1: Fault type, which can be ECC, Parity, or CRC. $2: Whether the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error. $3: CPU number. $4: Channel number. $5: DIMM number. |
Severity level |
Major |
Example |
Uncorrectable ECC or other uncorrectable memory error---ECC---Last Boot Error---Location:CPU1 CH:8 DIMM:0 |
Impact |
The system might restart or stop responding. |
Cause |
On AMD models, an uncorrectable ECC or other uncorrectable memory error has occurred. |
Recommended action |
1. Verify that the temperature and humidity are appropriate. 2. Clean the memory slots and memory contacts, ensuring that there are no foreign objects in the memory slots and that the contacts are not contaminated. Then, reinstall the memory module. 3. If the issue persists, check whether the pins in the corresponding memory socket are bent. If the pins are bent, replace the system board. |
Parity
Event code |
0x0c2000de |
Message text |
Parity---$1---Location:CPU:$2 CH:$3 DIMM:$4 $5 |
Variable fields |
$1: Whether the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error. $2: CPU number. $3: Channel number. $4: DIMM number. $5: DIMM mark. |
Severity level |
Minor |
Example |
Parity---Current Boot Error-Location:CPU:1 CH:1 DIMM:0 A0 |
Impact |
No negative impact. |
Cause |
This message is generated when a parity check fails on the command/address lines while memory cell data is being read, which results in abnormal memory access. The SEL records the command/address parity error and the DIMM that was accessed. |
Recommended action |
1. Verify that the temperature and humidity are appropriate. 2. Clean the memory slots and memory contacts, ensuring that there are no foreign objects in the memory slots and that the contacts are not contaminated. Then, reinstall the memory module. 3. If the issue persists, check whether the pins in the corresponding memory socket are bent. If the pins are bent, replace the system board. 4. Replace the DIMM. 5. If the problem persists, contact technical support. |
Parity
Event code |
0x0c2020de |
Message text |
Parity---Location:CPU:$1 CH:$2 DIMM:$3 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM number. |
Severity level |
Minor |
Example |
Parity---Location:CPU:1 CH:1 DIMM:0 |
Impact |
No negative impact. |
Cause |
On AMD models, this message is generated when a parity check fails on the command/address lines while memory cell data is being read, which results in abnormal memory access. The SEL records the command/address parity error and the DIMM that was accessed. |
Recommended action |
1. Verify that the temperature and humidity are appropriate. 2. Clean the memory slots and memory contacts, ensuring that there are no foreign objects in the memory slots and that the contacts are not contaminated. Then, reinstall the memory module. 3. If the issue persists, check whether the pins in the corresponding memory socket are bent. If the pins are bent, replace the system board. 4. Replace the DIMM. 5. If the problem persists, contact technical support. |
Parity---An uncorrectable error occurs during the memory test phase
Event code |
0x0c20b1c4 |
Message text |
Parity---An uncorrectable error occurs during the memory test phase---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---An uncorrectable error occurs during the memory test phase---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
A memory test error has occurred. |
Recommended action |
1. If a UCE was generated during the memory test phase, isolate the corresponding rank. 2. Replace the DIMM. |
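To isolate a rank, you need the CPU, channel, DIMM mark, and rank number that the alarm reports. The Parity entries with a rank field in this section share the layout `Parity---<description>---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4`. The following Python sketch extracts the description and location from any of them. The layout is inferred from the message texts listed here, and the names `RankEvent` and `parse_parity_rank_event` are hypothetical.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class RankEvent(NamedTuple):
    description: str
    cpu: int
    channel: int
    dimm_mark: str
    rank: int


# The description is matched lazily, so colons or commas inside it are kept.
_PARITY_RANK = re.compile(
    r"Parity---(?P<desc>.+?)---"
    r"Location:CPU:(?P<cpu>\d+) CH:(?P<ch>\d+) DIMM:(?P<dimm>\w+) Rank:(?P<rank>\d+)"
)


def parse_parity_rank_event(message: str) -> RankEvent | None:
    """Return the description and rank location, or None if the message does not match."""
    match = _PARITY_RANK.match(message)
    if match is None:
        return None
    return RankEvent(
        match["desc"], int(match["cpu"]), int(match["ch"]), match["dimm"], int(match["rank"])
    )
```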
Parity---The memory interleaving configuration cannot meet the requirements of the server
Event code |
0x0c20e014 |
Message text |
Parity---The memory interleaving configuration cannot meet the requirements of the server---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---The memory interleaving configuration cannot meet the requirements of the server---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Configuration error. Memory interleave configuration does not meet the requirements of the server. |
Recommended action |
1. In BIOS Setup, check the memory interleave settings (such as NUMA and Interleave). 2. Update the BIOS firmware. 3. Collect the BIOS logs and contact technical support. |
Parity---The memory interleaving configuration cannot meet the requirements of the server
Event code |
0x0c20e024 |
Message text |
Parity---The memory interleaving configuration cannot meet the requirements of the server---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---The memory interleaving configuration cannot meet the requirements of the server---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Configuration error. Memory interleave configuration does not meet the requirements of the server. |
Recommended action |
1. In BIOS Setup, check the memory interleave settings (such as NUMA and Interleave). 2. Update the BIOS firmware. 3. Collect the BIOS logs and contact technical support. |
Parity---The memory interleaving configuration cannot meet the requirements of the server
Event code |
0x0c20e0e4 |
Message text |
Parity---The memory interleaving configuration cannot meet the requirements of the server---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---The memory interleaving configuration cannot meet the requirements of the server---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Configuration error. Memory interleave configuration does not meet the requirements of the server. |
Recommended action |
1. In BIOS Setup, check the memory interleave settings (such as NUMA and Interleave). 2. Update the BIOS firmware. 3. Collect the BIOS logs and contact technical support. |
Parity---CMD eye width is too small
Event code |
0x0c226014 |
Message text |
Parity---CMD eye width is too small---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---CMD eye width is too small---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory parity error. CMD eye width is too small. |
Recommended action |
1. Identify the memory slot from the alarm information. 2. Check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. Reseat the DIMM. If the issue recurs, replace the DIMM if necessary. |
Parity---CmdPiGroup: No Eye width
Event code |
0x0c226024 |
Message text |
Parity---CmdPiGroup: No Eye width---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---CmdPiGroup: No Eye width---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory parity error. No CMD eye width exists. |
Recommended action |
1. Identify the memory slot from the alarm information. 2. Check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. Reseat the DIMM. If the issue recurs, replace the DIMM if necessary. |
Parity---The command is not in the FNv table
Event code |
0x0c228004 |
Message text |
Parity---The command is not in the FNv table---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---The command is not in the FNv table---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory parity error. The command sent is not in the FNv table. |
Recommended action |
Update the BIOS and DCPMM firmware. |
Parity---Memory read DqDqs training failed
Event code |
0x0c231134 |
Message text |
Parity---Memory read DqDqs training failed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---Memory read DqDqs training failed---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory parity error. Memory read Dq or Dqs training failed. |
Recommended action |
1. If the error is a CE, the system can continue to run normally. 2. If the error is a UCE, check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. If the error persists after you reseat the DIMM, replace the DIMM. |
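The three recommended-action steps above form a short decision procedure. The same procedure recurs in several training-failure entries later in this section. The following Python sketch encodes it as a function that returns the next step. It assumes that step 3 applies only to the UCE path, after the cleaning and reseating in step 2. The names `Action` and `next_action` are hypothetical.

```python
from enum import Enum


class Action(Enum):
    CONTINUE = "The system can continue to run normally."
    CLEAN_AND_RESEAT = "Clean the DIMM gold fingers and the memory slot, then reseat the DIMM."
    REPLACE_DIMM = "Replace the DIMM."


def next_action(uncorrectable: bool, persists_after_reseat: bool = False) -> Action:
    """Return the next recommended step for a memory training failure."""
    if not uncorrectable:
        return Action.CONTINUE          # step 1: correctable error (CE)
    if persists_after_reseat:
        return Action.REPLACE_DIMM      # step 3: UCE still present after reseating
    return Action.CLEAN_AND_RESEAT      # step 2: first response to a UCE
```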
Parity---Memory Receive Enable Training Error
Event code |
0x0c231144 |
Message text |
Parity---Memory Receive Enable Training Error---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---Memory Receive Enable Training Error---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory Faulty Parts Tracking failure. The Receive Enable signal of the memory failed to be trained to the required timing. |
Recommended action |
1. If the error is a CE, the system can continue to run normally. 2. If the error is a UCE, check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. If the error persists after you reseat the DIMM, replace the DIMM. |
Parity---Memory write DqDqs training failed
Event code |
0x0c231164 |
Message text |
Parity---Memory write DqDqs training failed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---Memory write DqDqs training failed---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory parity error. Memory write Dq or Dqs training failed. |
Recommended action |
1. If the error is a CE, the system can continue to run normally. 2. If the error is a UCE, check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. If the error persists after you reseat the DIMM, replace the DIMM. |
Parity---An error occurs during memory test, and the rank is disabled
Event code |
0x0c2311c4 |
Message text |
Parity---An error occurrs during memory test, and the rank is disabled---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---An error occurrs during memory test, and the rank is disabled---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory parity error. An error occurred during memory testing, and the rank has been disabled. |
Recommended action |
Replace the DIMM. |
Parity---LRDIMM RCVEN training failed
Event code |
0x0c231264 |
Message text |
Parity---LRDIMM RCVEN training failed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---LRDIMM RCVEN training failed---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory parity error. LRDIMM RCVEN training failed. |
Recommended action |
1. If the error is a CE, the system can continue to run normally. 2. If the error is a UCE, check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. If the error persists after you reseat the DIMM, replace the DIMM. |
Parity---Read delay training failed
Event code |
0x0c231284 |
Message text |
Parity---Read delay training failed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---Read delay training failed---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory parity error. Read delay training has failed. |
Recommended action |
1. If the error is a CE, the system can continue to run normally. 2. If the error is a UCE, check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. If the error persists after you reseat the DIMM, replace the DIMM. |
Parity---Write delay training failed
Event code |
0x0c2312b4 |
Message text |
Parity---Write delay training failed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---Write delay training failed---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory parity error. Write delay training failed. |
Recommended action |
1. If the error is a CE, the system can continue to run normally. 2. If the error is a UCE, check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. If the error persists after you reseat the DIMM, replace the DIMM. |
Parity---Mapped out because failed critical mask test at cold boot
Event code |
0x0c28c024 |
Message text |
Parity---Mapped out because failed critical mask test at cold boot---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---Mapped out because failed critical mask test at cold boot---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
The DIMM failed the critical mask test during cold boot and was mapped out. |
Recommended action |
Replace the DIMM. |
Parity---Invalid SPD contents
Event code |
0x0c2ed094 |
Message text |
Parity---Invalid SPD contents---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---Invalid SPD contents---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Memory parity error. Invalid SPD content. |
Recommended action |
Replace the isolated memory module. |
Parity---The DCPMM memory modules of the unexpected model are installed
Event code |
0x0c2ed0c4 |
Message text |
Parity---The DCPMM memory modules of the unexpected model are installed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---The DCPMM memory modules of the unexpected model are installed---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Unsupported DCPMMs are installed. |
Recommended action |
1. Based on the alarm information, identify the DCPMM of the unsupported type and confirm its specifications. 2. Replace it with a supported DCPMM. |
Parity---Failed to set the VDD voltage of the DIMM
Event code |
0x0c2f0014 |
Message text |
Parity---Failed to set the VDD voltage of the DIMM---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---Failed to set the VDD voltage of the DIMM---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Software data structure abnormality. |
Recommended action |
Replace the CPU or system board. |
Parity---Delay exceeded
Event code |
0x0c214024 |
Message text |
Parity---Delay exceeded---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---Delay exceeded---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Latency exceeded. |
Recommended action |
1. If the error is a CE, the system can continue to run normally. 2. If the error is a UCE, check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. If the error persists after you reseat the DIMM, replace the DIMM. |
Parity---Timing error occurred during signal line adjustment for memory write leveling training
Event code |
0x0c215014 |
Message text |
Parity---Timing error occurred during signal line adjustment for memory write leveling training---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---Timing error occurred during signal line adjustment for memory write leveling training---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
A timing error occurred on the signal line being adjusted during write leveling training. |
Recommended action |
1. If the error is a CE, the system can continue to run normally. 2. If the error is a UCE, check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. If the error persists after you reseat the DIMM, replace the DIMM. |
Parity---CS is not consistent with clock in timing, and the channel is isolated
Event code |
0x0c229044 |
Message text |
Parity---CS is not consistent with clock in timing, and the channel is isolated---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---CS is not consistent with clock in timing, and the channel is isolated---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Timing between CS and clock does not meet the requirements. |
Recommended action |
1. Identify the memory slot from the alarm information. 2. Check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. If the issue persists after you reseat the DIMM, replace the DIMM. |
Parity---CS is not consistent with clock in timing, and the channel is isolated
Event code |
0x0c229054 |
Message text |
Parity---CS is not consistent with clock in timing, and the channel is isolated---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---CS is not consistent with clock in timing, and the channel is isolated---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
Timing between CS and clock does not meet the requirements. |
Recommended action |
1. Identify the memory slot from the alarm information. 2. Check the DIMM gold fingers and the memory slot for foreign objects, and clean them. 3. If the issue persists after you reinstall the DIMM, replace the DIMM. |
Parity---LRDIMM external coarse training failed
Event code |
0x0c231204 |
Message text |
Parity---LRDIMM external coarse training failed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---LRDIMM external coarse training failed---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
LRDIMM RCVEN training failed. |
Recommended action |
Replace the DIMM. |
Parity---LRDIMM external fine training failed
Event code |
0x0c231214 |
Message text |
Parity---LRDIMM external fine training failed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---LRDIMM external fine training failed---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
LRDIMM RCVEN training failed. |
Recommended action |
Replace the DIMM. |
Parity---LRDIMM internal coarse training failed
Event code |
0x0c231224 |
Message text |
Parity---LRDIMM internal coarse training failed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---LRDIMM internal coarse training failed---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
LRDIMM RCVEN training failed. |
Recommended action |
Replace the DIMM. |
Parity---LRDIMM internal fine training failed
Event code |
0x0c231234 |
Message text |
Parity---LRDIMM internal fine training failed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---LRDIMM internal fine training failed---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
LRDIMM RCVEN training failed. |
Recommended action |
Replace the DIMM. |
Memory Device Disabled---The rank is disabled
Event code |
0x0c40a034 |
Message text |
Memory Device Disabled---The rank is disabled---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Memory Device Disabled---The rank is disabled---Location:CPU:2 CH:1 DIMM:B1 Rank:1 |
Impact |
System performance degradation might occur. This does not affect normal use of the system. |
Cause |
One rank of the memory is disabled, but it does not affect the use of the remaining ranks. |
Recommended action |
Record the disabled rank. For the specific reason for the isolation, see the other error logs generated for the DIMM during the same startup. |
Memory Device Disabled---The DIMM is disabled
Event code |
0x0c40a044 |
Message text |
Memory Device Disabled---The DIMM is disabled---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM number. $4: Rank number. |
Severity level |
Minor |
Example |
Memory Device Disabled---The DIMM is disabled---Location:CPU:1 CH:1 DIMM:0 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
The DIMM is disabled. |
Recommended action |
Record the disabled DIMM. For the specific reason for the isolation, see the other error logs generated for that DIMM during the same startup. |
Memory Device Disabled
Event code |
0x0c4000de |
Message text |
Memory Device Disabled---$1---Location:CPU:$2 CH:$3 DIMM:$4 $5 |
Variable fields |
$1: Specifies whether the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error. $2: CPU number. $3: Channel number. $4: DIMM number. $5: DIMM mark. |
Severity level |
Major |
Example |
Memory Device Disabled---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A1 |
Impact |
Memory is disabled. System performance degradation or system startup failure might occur. |
Cause |
The memory was disabled by the BIOS configuration, or a memory fault was detected during system startup. |
Recommended action |
1. Check whether the BIOS configuration disables the memory. If it does, enable the memory in the BIOS settings. 2. If the memory is enabled in the BIOS configuration but the issue persists, check the memory channel for faults. 3. If the problem persists, contact technical support. |
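Several events in this section (for example, 0x0c4000de, 0x0c5000de, 0x0c3010de, and 0x0c3020de) carry a $1 field that distinguishes errors from the current boot from errors carried over from the last boot, followed by a location with both a DIMM number and a DIMM mark. The sketch below separates the two cases when filtering logs. It is illustrative only: `BootPhaseEvent`, `parse_boot_phase`, and the regular expression are hypothetical, and the pattern assumes the layout of the examples in this reference ("---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A1").

```python
import re
from typing import NamedTuple, Optional


class BootPhaseEvent(NamedTuple):
    description: str
    last_boot: bool  # True for "Last Boot Error", False for "Current Boot Error"
    cpu: int
    channel: int
    dimm_number: int
    dimm_mark: str


# Hypothetical pattern for "<description>---$1---Location:CPU:$2 CH:$3 DIMM:$4 $5",
# assuming the layout shown in the examples.
BOOT_PHASE_RE = re.compile(
    r"^(?P<desc>.+?)---(?P<phase>Current|Last) Boot Error---"
    r"Location:CPU:(?P<cpu>\d+) CH:(?P<ch>\d+) DIMM:(?P<num>\d+) (?P<mark>\w+)$"
)


def parse_boot_phase(message: str) -> Optional[BootPhaseEvent]:
    """Split a boot-phase memory message into its fields, or return None."""
    m = BOOT_PHASE_RE.match(message.strip())
    if m is None:
        return None
    return BootPhaseEvent(
        description=m["desc"],
        last_boot=(m["phase"] == "Last"),
        cpu=int(m["cpu"]),
        channel=int(m["ch"]),
        dimm_number=int(m["num"]),
        dimm_mark=m["mark"],
    )


event = parse_boot_phase(
    "Memory Device Disabled---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A1"
)
print(event)
```

Filtering on `last_boot` lets you set aside events that were already present before the most recent restart when you check whether a reinstalled or replaced DIMM still reports errors.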
Memory Device Disabled
Event code |
0x0c4020de |
Message text |
Memory Device Disabled---Location:CPU:$1 CH:$2 DIMM:$3 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM number. |
Severity level |
Major |
Example |
Memory Device Disabled---Location:CPU:1 CH:1 DIMM:0 |
Impact |
Memory is disabled. System performance degradation or system startup failure might occur. |
Cause |
On AMD models, the memory was disabled by the BIOS configuration, or a memory fault was detected during system startup. |
Recommended action |
1. Check whether the BIOS configuration disables the memory. If it does, enable the memory in the BIOS settings. 2. If the memory is enabled in the BIOS configuration but the issue persists, check the memory channel for faults. 3. If the problem persists, contact technical support. |
Correctable ECC or other memory error limit reached
Event code |
0x0c5000de |
Message text |
Correctable ECC or other memory error limit reached---$1---Location:CPU:$2 CH:$3 DIMM:$4 $5 |
Variable fields |
$1: Specifies whether the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error. $2: CPU number. $3: Channel number. $4: DIMM number. $5: DIMM mark. |
Severity level |
Minor |
Example |
Correctable ECC or other memory error limit reached---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A1 |
Impact |
The system might restart or stop responding. |
Cause |
The DIMM might be installed incorrectly, or an internal memory fault might exist. The number of correctable errors on the DIMM has reached the configured threshold. If the corresponding memory RAS mode is enabled, the system executes the corresponding RAS features and does not crash. The errors exceed the threshold even in memory repair mode. |
Recommended action |
1. Reinstall the memory module. Make sure it is installed correctly, the gold fingers are clean, no foreign objects exist in the memory slot, and the ambient temperature and humidity are normal. 2. Check the memory funnel threshold in the BIOS. If it is too low, adjust the funnel threshold in the BIOS. 3. If the problem persists, contact technical support. |
Correctable ECC or other memory error limit reached
Event code |
0x0c5020de |
Message text |
Correctable ECC or other correctable memory error logging limit reached---$1 $2:$3---Location:CPU:$4 CH:$5 DIMM:$6 |
Variable fields |
$1: MCA or UMC (available only for CE Count Overflow). $2: Threshold type, which can be CE Count Overflow, Memory CE Storm Threshold, or Memory CE Accumulation Threshold. $3: Threshold. $4: CPU number. $5: Channel number. $6: DIMM number. |
Severity level |
Minor |
Example |
Correctable ECC or other correctable memory error logging limit reached---MCA CE Count Overflow:8769---Location:CPU:1 CH:5 DIMM:0 |
Impact |
The system might restart or stop responding. |
Cause |
The DIMM might be installed incorrectly, or an internal memory fault might exist. The number of correctable errors on the DIMM has reached the configured threshold. This does not cause a system crash. The errors exceed the threshold even in memory repair mode. |
Recommended action |
1. Reinstall the memory module. Make sure it is installed correctly, the gold contacts are clean, no foreign objects exist in the memory slot, and the ambient temperature and humidity are normal. 2. Check whether the memory funnel threshold in the BIOS is too low. If it is, adjust the funnel threshold in the BIOS. 3. If the problem persists, contact technical support. |
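Event 0x0c5020de packs three variable fields into one segment ("---$1 $2:$3---"), where $1 (MCA or UMC) appears only for CE Count Overflow. A minimal sketch for splitting that segment is shown below. The names `parse_ce_limit` and `CE_LIMIT_RE` are hypothetical, and the pattern assumes the layout of the message text and example above; it accepts only the three threshold types listed for $2.

```python
import re
from typing import Optional

# Threshold types listed for $2 of event 0x0c5020de.
THRESHOLD_TYPES = {
    "CE Count Overflow",
    "Memory CE Storm Threshold",
    "Memory CE Accumulation Threshold",
}

# Hypothetical pattern, assuming the "---$1 $2:$3---Location:CPU:$4 CH:$5 DIMM:$6"
# layout shown in the message text. $1 (MCA/UMC) is optional.
CE_LIMIT_RE = re.compile(
    r"limit reached---(?:(?P<source>MCA|UMC) )?(?P<kind>[A-Za-z ]+?):(?P<value>\d+)---"
    r"Location:CPU:(?P<cpu>\d+) CH:(?P<ch>\d+) DIMM:(?P<dimm>\d+)"
)


def parse_ce_limit(message: str) -> Optional[dict]:
    """Return the fields of a CE logging-limit message, or None if it does not match."""
    m = CE_LIMIT_RE.search(message)
    if m is None or m["kind"] not in THRESHOLD_TYPES:
        return None
    return {
        "source": m["source"],      # None when $1 is absent
        "threshold_type": m["kind"],
        "value": int(m["value"]),   # documented as the threshold ($3)
        "cpu": int(m["cpu"]),
        "channel": int(m["ch"]),
        "dimm": int(m["dimm"]),
    }


example = ("Correctable ECC or other correctable memory error logging limit reached"
           "---MCA CE Count Overflow:8769---Location:CPU:1 CH:5 DIMM:0")
print(parse_ce_limit(example))
```

Rejecting unknown threshold types keeps the parser from misreading the similar 0x0c5000de message, which carries a boot-phase field in the same position.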
Presence detected
Event code |
0x0c6000de |
Message text |
Presence detected |
Variable fields |
N/A |
Severity level |
Info |
Example |
Presence detected |
Impact |
If the memory modules are not properly seated, the system might fail to start correctly. |
Cause |
This alarm message is generated when the corresponding sensor detects that the monitored memory module is not properly seated. |
Recommended action |
1. Check on the BIOS page whether the server is in minimal boot mode. In minimal boot mode, the BIOS might isolate the device, so HDM cannot recognize it. 2. Reinstall the memory module. Make sure it is installed correctly, the gold contacts are clean, and no foreign objects exist in the memory slots. 3. If the problem persists, contact technical support. |
Parity---The DCPMM memory modules of the unexpected model are installed
Event code |
0x0c2ed0c4 |
Message text |
Parity---The DCPMM memory modules of the unexpected model are installed---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Parity---The DCPMM memory modules of the unexpected model are installed---Location:CPU:2 CH:1 DIMM:B1 Rank:0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
An unsupported DCPMM model is installed. |
Recommended action |
1. Verify that the DCPMM model is supported. 2. Based on the alarm information, confirm the DCPMM specifications and replace the DCPMM with a supported model. |
Memory patrol scrub CE occured
Event code |
0x0c3010de |
Message text |
Memory patrol scrub CE occured---$1---Location:CPU:$2 CH:$3 DIMM:$4 $5 |
Variable fields |
$1: Specifies whether the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error. $2: CPU number. $3: Channel number. $4: DIMM number. $5: DIMM mark. |
Severity level |
Minor |
Example |
Memory patrol scrub CE occured---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A0 |
Impact |
A check failed when memory data was read. There is no negative impact. |
Cause |
Patrol scrub CE. This message indicates that a data parity error occurred when a memory cell was read. The error occurred on the command/address lines, which caused abnormal data to be read from the memory. The error is recorded in the SEL together with the DIMM that was being accessed when the error occurred. |
Recommended action |
1. Check the gold contacts on the DIMM edge and make sure they are clean. 2. Check the processor socket for bent pins. If bent pins are found, replace the system board. 3. Consider replacing the DIMM as a preventive measure. If the issue recurs multiple times, replace the DIMM. |
Memory patrol scrub UCE occurred and degraded to CE
Event code |
0x0c3020de |
Message text |
Memory patrol scrub UCE occurred and degraded to CE---$1---Location:CPU:$2 CH:$3 DIMM:$4 $5 |
Variable fields |
$1: Specifies whether the error occurred during the current boot or the previous boot. It can be Current Boot Error or Last Boot Error. $2: CPU number. $3: Channel number. $4: DIMM number. $5: DIMM mark. |
Severity level |
Minor |
Example |
Memory patrol scrub UCE occurred and degraded to CE---Current Boot Error---Location:CPU:1 CH:1 DIMM:0 A0 |
Impact |
A check failed when memory data was read. There is no negative impact. |
Cause |
Patrol scrub UCE degraded to CE. This message indicates that a data parity error occurred when a memory cell was read. The error occurred on the command/address lines, which caused abnormal data to be read from the memory. The error is recorded in the SEL together with the DIMM that was being accessed when the error occurred. |
Recommended action |
1. Check the gold contacts on the DIMM edge and make sure they are clean. 2. Check the processor socket for bent pins. If bent pins are found, replace the system board. 3. Consider replacing the DIMM as a preventive measure. If the issue recurs multiple times, replace the DIMM. |
Configuration error---RDIMMs are installed on the server that supports only UDIMMs
Event code |
0x0c701014 |
Message text |
Configuration error---RDIMMs are installed on the server that supports only UDIMMs---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---RDIMMs are installed on the server that supports only UDIMMs---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
An RDIMM is installed on a CPU platform that supports only UDIMMs. |
Recommended action |
Check the DIMM type and replace the unsupported memory modules. |
Configuration error---UDIMMs are installed on the server that supports only RDIMMs
Event code |
0x0c702014 |
Message text |
Configuration error---UDIMMs are installed on the server that supports only RDIMMs---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---UDIMMs are installed on the server that supports only RDIMMs---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
A UDIMM is installed on a CPU platform that supports only RDIMMs. |
Recommended action |
Check the DIMM type and replace the unsupported memory modules. |
Configuration error---SODIMMs are installed on the server that supports only RDIMMs
Event code |
0x0c703014 |
Message text |
Configuration error---SODIMMs are installed on the server that supports only RDIMMs-Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---SODIMMs are installed on the server that supports only RDIMMs-Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
An SODIMM is installed on a platform that supports only RDIMMs. |
Recommended action |
Check the DIMM type and replace the unsupported memory modules. |
Configuration error---The number of ranks per channel can be only 1, 2, or 4
Event code |
0x0c707024 |
Message text |
Configuration error---The number of ranks per channel can be only 1, 2, or 4---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---The number of ranks per channel can be only 1, 2, or 4---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
The number of memory ranks does not meet the requirements of the CPU platform. The current CPU platform supports a maximum of 4 ranks. |
Recommended action |
1. Based on the error message, determine the memory slot. 2. Replace the memory module that does not meet the rank requirements. |
Configuration error---Columns, rows, or banks of the DIMM cannot meet the JEDEC standards, and LRDIMMs are not supported
Event code |
0x0c707044 |
Message text |
Configuration error---Columns, rows, or banks of the DIMM cannot meet the JEDEC standards, and LRDIMMs are not supported---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---Columns, rows, or banks of the DIMM cannot meet the JEDEC standards, and LRDIMMs are not supported---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Unsupported memory type: · The memory design (columns, rows, banks) does not comply with the JEDEC standards. · The LRDIMM is not on the server's supported list. |
Recommended action |
1. Based on the error message, determine the memory slot. 2. Replace the memory module with one that complies with the JEDEC standards and is supported by the server. |
Configuration error---The number of ranks in the channel exceeds 8
Event code |
0x0c707054 |
Message text |
Configuration error---The number of ranks in the channel exceeds 8---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---The number of ranks in the channel exceeds 8---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
The total number of ranks of all memory in the channel exceeds the maximum supported number of ranks (8). |
Recommended action |
1. Check the number of memory ranks in the channel as indicated in the error message. 2. Replace the memory module that is causing the total number of memory ranks to exceed 8. |
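The 8-rank limit above is simple arithmetic over the DIMMs in one channel. The following sketch shows one way to check a planned population before installation and to flag the DIMM that pushes the running total past the limit. It is a hypothetical helper, not part of HDM or the BIOS: `first_dimm_over_limit` is an assumed name, and treating the DIMM list in population order as "the DIMM causing the overflow" is an assumption made for illustration.

```python
from itertools import accumulate
from typing import List, Optional

MAX_RANKS_PER_CHANNEL = 8  # limit stated for event 0x0c707054


def first_dimm_over_limit(ranks_per_dimm: List[int],
                          limit: int = MAX_RANKS_PER_CHANNEL) -> Optional[int]:
    """Return the index (in population order) of the first DIMM that makes the
    channel's running rank total exceed the limit, or None if the total fits."""
    for index, running_total in enumerate(accumulate(ranks_per_dimm)):
        if running_total > limit:
            return index
    return None


# Hypothetical channel populations, given as rank counts per DIMM.
print(first_dimm_over_limit([4, 4]))     # total 8, within the limit
print(first_dimm_over_limit([2, 4, 4]))  # running totals 2, 6, 10
```

The rank counts used here are made-up examples; take the actual values from the DIMM specifications or the HDM memory information.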
Configuration error---Support for ECC on the DIMMs is not consistent with support for ECC on the server
Event code |
0x0c707094 |
Message text |
Configuration error---Support for ECC on the DIMMs is not consistent with support for ECC on the server---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---Support for ECC on the DIMMs is not consistent with support for ECC on the server---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
ECC support on the DIMMs is inconsistent with ECC support on the server. |
Recommended action |
1. Identify the memory type according to the error message and the memory slot. 2. Determine whether to replace the DIMM or disable ECC. |
Configuration error---The voltage for a DDR4 DIMM must be 12V, and the voltage for a DDR5 DIMM must be 11V
Event code |
0x0c7070a4 |
Message text |
Configuration error---The voltage for a DDR4 DIMM must be 12V, and the voltage for a DDR5 DIMM must be 11V---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---The voltage for a DDR4 DIMM must be 12V, and the voltage for a DDR5 DIMM must be 11V---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
The current voltage does not match the voltage supported by the memory. · DDR4 memory requires a voltage of 1.2 V. · DDR5 memory requires a voltage of 1.1 V. |
Recommended action |
1. Based on the memory slot, check the DIMM datasheet for the voltage the DIMM supports. 2. Replace the DIMM. |
Configuration error---The CPU is not compatible with 3DS DIMMs
Event code |
0x0c707104 |
Message text |
Configuration error---The CPU is not compatible with 3DS DIMMs---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---The CPU is not compatible with 3DS DIMMs---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
The current CPU does not support memory with 3DS packaging. |
Recommended action |
The current CPU does not support DIMMs with 3DS packaging. Replace the DIMM in the reported slot with a compatible one. |
Configuration error---NVDIMMs with stepping lower than 0x10 are not supported
Event code |
0x0c707114 |
Message text |
Configuration error---NVDIMMs with stepping lower than 0x10 are not supported---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---NVDIMMs with stepping lower than 0x10 are not supported---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Configuration error. NVDIMMs with a stepping lower than 0x10 (16) are not supported. |
Recommended action |
The current CPU does not support the stepping of this DCPMM. Check the detailed information of the DIMM. |
Configuration error---The CPU is not compatible with the DIMMs
Event code |
0x0c707144 |
Message text |
Configuration error---The CPU is not compatible with the DIMMs---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---The CPU is not compatible with the DIMMs---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Configuration error: CPU and DIMM are not compatible. |
Recommended action |
1. Replace the CPU with one that supports up to two memory modules per channel. 2. Alternatively, adjust the memory population so that each channel has a maximum of one memory module. |
Configuration error---The frequency of the DIMM is not supported on the server
Event code |
0x0c707154 |
Message text |
Configuration error---The frequency of the DIMM is not supported on the server---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---The frequency of the DIMM is not supported on the server---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
The current platform configuration does not support the frequency of the memory module. |
Recommended action |
The current configuration does not support the memory frequency settings. Confirm whether the Enforce Population POR/Enforce DDR Memory Frequency POR option in the Setup menu is enabled and whether the supported frequency of the memory module is within the supported range. |
Configuration error---24Gb or higher Capacity DRAMs not supported with this CPU
Event code |
0x0c7071f4 |
Message text |
Configuration error---24Gb or higher Capacity DRAMs not supported with this CPU---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---24Gb or higher Capacity DRAMs not supported with this CPU---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
The CPU does not support DIMMs built with DRAM devices of 24 Gb or higher density. |
Recommended action |
The current CPU does not support DIMMs built with DRAM devices of 24 Gb or higher density. Identify the DIMM from the error message and replace it with a DIMM that uses a supported DRAM density. |
Configuration error---The CPU is not compatible with LRDIMMs
Event code |
0x0c707214 |
Message text |
Configuration error---The CPU is not compatible with LRDIMMs---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---The CPU is not compatible with LRDIMMs---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Cause |
The CPU does not support load-reduced DIMMs (LRDIMMs). |
Recommended action |
Replace the LRDIMMs or the CPU. |
Configuration error--- DCPMM + HBM config is not supported. Disable DCPMM populated channel
Event code |
0x0c707224 |
Message text |
Configuration error--- DCPMM + HBM config is not supported. Disable DCPMM populated channel---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error--- DCPMM + HBM config is not supported. Disable DCPMM populated channel---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
DCPMM and HBM cannot coexist. The channel populated with a DCPMM, as detected by the memory population check, must be disabled. |
Recommended action |
DCPMM and HBM cannot coexist. Disable the DCPMM-populated channel detected by the memory population check. |
Configuration error--- Failed to enable the lockstep mode The memory RAS mode has degraded to independent
Event code |
0x0c709014 |
Message text |
Configuration error---Failed to enable the lockstep mode The memory RAS mode has degraded to independent---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---Failed to enable the lockstep mode The memory RAS mode has degraded to independent---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Lockstep mode cannot be enabled with the current memory configuration. The memory RAS mode is downgraded to independent mode. |
Recommended action |
The lockstep configuration has been downgraded. Check whether the memory population meets the requirements of lockstep mode. |
Configuration error---Failed to enable the full mirror mode
Event code |
0x0c70c014 |
Message text |
Configuration error---Failed to enable the full mirror mode---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---Failed to enable the full mirror mode---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Full mirror RAS mode could not be enabled for the memory. The mirror configuration is downgraded. |
Recommended action |
Mirror configuration has been downgraded. Check if the memory installation satisfies mirror mode. |
Configuration error---Failed to enable the partial mirror mode The memory RAS mode degraded to independent
Event code |
0x0c70d014 |
Message text |
Configuration error---Failed to enable the partial mirror mode The memory RAS mode degraded to independent---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---Failed to enable the partial mirror mode The memory RAS mode degraded to independent---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Partial mirror mode cannot be enabled. The system switches to independent channel mode. |
Recommended action |
Partial mirror configuration has been downgraded. Check if the memory installation satisfies Partial mirror mode. |
Configuration error---The memory interleaving configuration cannot meet the requirements of the server
Event code |
0x0c70e034 |
Message text |
Configuration error---The memory interleaving configuration cannot meet the requirements of the server---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---The memory interleaving configuration cannot meet the requirements of the server---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Memory configuration error. The memory interleaving configuration does not meet the requirements of the server. |
Recommended action |
1. Check the memory interleaving settings (NUMA and interleaving) in the BIOS Setup utility. 2. Update the BIOS firmware. 3. Collect the BIOS logs and contact technical support. |
Configuration error---Failed to enable the rank sparing mode The memory RAS mode has degraded to independent
Event code |
0x0c710014 |
Message text |
Configuration error---Failed to enable the rank sparing mode The memory RAS mode has degraded to independent---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---Failed to enable the rank sparing mode The memory RAS mode has degraded to independent---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Configuration error. Rank Sparing mode cannot be enabled. The memory RAS mode has been downgraded to independent mode. |
Recommended action |
The Sparing configuration has been downgraded. Check if the memory installation satisfies the Sparing mode. |
Configuration error---Failed to enable patrol scrubbing
Event code |
0x0c711004 |
Message text |
Configuration error---Failed to enable patrol scrubbing---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---Failed to enable patrol scrubbing---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Configuration error. Memory patrol scrubbing cannot be enabled. |
Recommended action |
Enabling patrol scrub has failed. Check the RAS (Reliability, Availability, Serviceability) features supported by the CPU specifications. |
Configuration error---The number of ranks in the black slot is greater than that in the white slot, or the DIMM is installed in the black slot with the white slot empty
Event code |
0x0c717014 |
Message text |
Configuration error---The number of ranks in the black slot is greater than that in the white slot, or the DIMM is installed in the black slot with the white slot empty---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---The number of ranks in the black slot is greater than that in the white slot, or the DIMM is installed in the black slot with the white slot empty---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
1. In the channel, the DIMM with more ranks is not installed in the front (white) slot. 2. The white slot is not populated first. |
Recommended action |
The memory installation is incorrect. Refer to the Intel PDG for DDR5/DCPMM and other relevant resources for proper memory installation guidelines. |
Configuration error---DIMM population error Two DDR-T memory modules cannot be installed in a channel
Event code |
0x0c717034 |
Message text |
Configuration error---DIMM population error Two DDR-T memory modules cannot be installed in a channel---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---DIMM population error Two DDR-T memory modules cannot be installed in a channel---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Two DCPMMs are installed in a channel, which does not meet DIMM installation requirements. |
Recommended action |
The DIMMs are installed incorrectly. For more information, see DDR5/DCPMM related information from Intel PDG. |
Configuration error---The DDR-T memory module is installed in the white slot
Event code |
0x0c717054 |
Message text |
Configuration error---The DDR-T memory module is installed in the white slot---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---The DDR-T memory module is installed in the white slot---Location:CPU:1 CH:1 DIMM:A1 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
The DCPMM is installed in a white slot, which does not meet DIMM installation requirements. |
Recommended action |
The DIMMs are installed incorrectly. For more information, see the DDR5/DCPMM information in the Intel PDG. |
Configuration error---ODT configuration error The channel is isolated
Event code |
0x0c729034 |
Message text |
Configuration error---ODT configuration error The channel is isolated---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---ODT configuration error The channel is isolated---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Memory ODT (on-die termination) is configured incorrectly, and the channel is isolated. |
Recommended action |
1. Identify the DIMM slot from the Location field in the message. 2. Remove the DIMM, make sure the gold contacts on the DIMM and the DIMM slot are clean, and then reinstall the DIMM. 3. If the message is generated again after reinstallation, replace the DIMM. |
Configuration error---REQ is not consistent with clock in timing
Event code |
0x0c729064 |
Message text |
Configuration error---REQ is not consistent with clock in timing---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---REQ is not consistent with clock in timing---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
REQ and the clock input have inconsistent timing. |
Recommended action |
1. Identify the DIMM slot from the Location field in the message. 2. Remove the DIMM, make sure the gold contacts on the DIMM and the DIMM slot are clean, and then reinstall the DIMM. 3. If the message is generated again after reinstallation, replace the DIMM. |
Configuration error---Failed to enable ADDDC
Event code |
0x0c73a014 |
Message text |
Configuration error---Failed to enable ADDDC---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---Failed to enable ADDDC---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
Failed to enable ADDDC due to incorrect memory configuration. |
Recommended action |
Verify that the memory configuration meets the ADDDC requirements. |
Configuration error---NVMCTRL_MEDIA_NOTREADY
Event code |
0x0c784024 |
Message text |
Configuration error---NVMCTRL_MEDIA_NOTREADY---Location:CPU:$1 CH:$2 DIMM:$3 Rank:$4 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM mark. $4: Rank number. |
Severity level |
Minor |
Example |
Configuration error---NVMCTRL_MEDIA_NOTREADY---Location:CPU:1 CH:2 DIMM:A0 Rank:0 |
Impact |
The system might restart or stop responding. |
Cause |
The memory configuration is incorrect. The DCPMM firmware reports that the media is not ready. |
Recommended action |
1. Access the BIOS setup utility, check the DCPMM status, and update the DCPMM firmware. 2. If the problem persists, replace the DIMM. |
Drive Slot
Drive Presence
Event code |
0x0d0000de |
Message text |
Drive Presence |
Variable fields |
N/A |
Severity level |
Info |
Example |
Drive Presence |
Impact |
If this event is cleared, the drive has been removed or is not installed correctly, which affects the stability of the storage system. |
Cause |
This message is not generated when the server starts up for the first time unless an error occurs. |
Recommended action |
No action is required. |
Drive Fault
Event code |
0x0d1000de |
Message text |
Drive Fault |
Variable fields |
N/A |
Severity level |
Major |
Example |
Drive Fault --- Bay Slot: 1, HDD Slot: 2 |
Impact |
The drive is faulty, which might cause data loss. |
Cause |
The drive cannot be identified or has failed. |
Recommended action |
1. Log in to HDM and view drive information. If the drive in the corresponding slot cannot be identified, verify that the drive is installed correctly. 2. Reinstall the drive and check whether it can be identified. If the drive still cannot be identified after reinstallation, replace the drive. 3. View drive information and verify that the drive status is Unconfigured Good. 4. View drive information and verify that the drive is identified and in normal state, and that the drive number in HDM matches the drive number in the message. If the numbers differ, verify that the drive cables are connected correctly. 5. If multiple drives are absent, verify that the drive cables and the drive backplane are normal, and replace any faulty components. 6. Verify that the drive LEDs are normal and that the drive can be identified and accessed in the OS. An orange drive LED indicates a faulty drive. Replace any faulty components. 7. Verify that the storage controller is in normal state. 8. If the problem persists, contact technical support. |
Predictive Failure
Event code |
0x0d2000de |
Message text |
Predictive Failure |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Predictive Failure |
Impact |
The drive reliability decreases, which might impact the OS storage performance and service operation. |
Cause |
The RAID controller or NVMe SSD reports a predictive failure, which can be a storage medium reserved block alarm, drive lifetime alarm, Prefail alarm, or bad sector alarm. |
Recommended action |
1. Log in to HDM, and verify that the drive is running correctly. 2. If the drive is abnormal, replace the drive. 3. If the problem persists, contact technical support. |
In Critical Array
Event code |
0x0d5000de |
Message text |
In Critical Array---PCIe slot:$1---LDDevno:$2 |
Variable fields |
$1: PCIe slot where the logical drive resides. $2: Logical drive number. |
Severity level |
Major |
Example |
In Critical Array---PCIe slot:1---LDDevno:1 |
Impact |
The logical drive is degraded, which might affect data reliability. |
Cause |
A member drive of the logical drive was removed or failed, and the logical drive became degraded. |
Recommended action |
1. Verify that no drive has been removed. If a drive has been removed, reinstall the drive and recreate the RAID array. 2. Log in to HDM, view drive information on the storage page, and verify that all drives in the logical drive are identified correctly. If a drive cannot be identified, reinstall it. If the drive still cannot be identified after reinstallation, replace the drive. 3. Log in to HDM, view drive information, and verify that the drive status is Unconfigured Good. 4. After the drive is identified correctly, recreate the RAID array. 5. If the problem persists, contact technical support. |
In Failed Array
Event code |
0x0d6000de |
Message text |
In Failed Array---PCIe slot:$1---LDDevno:$2 |
Variable fields |
$1: PCIe slot where the logical drive resides. $2: Logical drive number. |
Severity level |
Major |
Example |
In Failed Array---PCIe slot:1---LDDevno:1 |
Impact |
The RAID array fails and goes offline, causing data loss. |
Cause |
A member drive of the logical drive was removed or failed, and the logical drive failed completely. |
Recommended action |
1. Verify that no drive has been removed. If a drive has been removed, reinstall the drive. 2. If the drive is installed correctly, log in to HDM, view drive information on the storage page, and verify that the drive is identified correctly. If the drive cannot be identified, reinstall it. If the drive still cannot be identified after reinstallation, replace the drive. 3. If the drive is installed correctly, log in to HDM, view drive information on the storage page, and verify that the drive status is Unconfigured Good. 4. After the drive is identified correctly, verify that the RAID array is normal. If the RAID array is faulty, recreate it. 5. If the problem persists, contact technical support. |
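The In Critical Array and In Failed Array messages share one format. The following Python sketch extracts the array state, the PCIe slot, and the logical drive number, which can help when matching a message to the logical drive shown on the HDM storage page. It assumes the message text matches the Message text rows exactly. The names `parse_array_event` and `ArrayEvent` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches "In Critical Array---PCIe slot:$1---LDDevno:$2" and the "In Failed Array" variant.
ARRAY_RE = re.compile(
    r"^In (?P<state>Critical|Failed) Array---PCIe slot:(?P<slot>\d+)---LDDevno:(?P<ld>\d+)$"
)


class ArrayEvent(NamedTuple):
    state: str          # "Critical" (degraded) or "Failed"
    pcie_slot: int      # $1: PCIe slot where the logical drive resides
    logical_drive: int  # $2: logical drive number


def parse_array_event(message: str) -> Optional[ArrayEvent]:
    """Return the parsed fields, or None if the message has a different format."""
    match = ARRAY_RE.match(message.strip())
    if match is None:
        return None
    return ArrayEvent(match["state"], int(match["slot"]), int(match["ld"]))


event = parse_array_event("In Failed Array---PCIe slot:1---LDDevno:1")
print(event)  # prints: ArrayEvent(state='Failed', pcie_slot=1, logical_drive=1)
```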
Rebuild/Remap in progress
Event code |
0x0d7000de |
Message text |
Rebuild/Remap in progress |
Variable fields |
N/A |
Severity level |
Info |
Example |
Rebuild/Remap in progress |
Impact |
No negative impact. |
Cause |
This message is generated during RAID rebuilding after a drive is installed. |
Recommended action |
No action is required. |
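The Drive Slot event codes in this section appear to share a layout: the top byte is 0x0d, which matches the IPMI generic sensor type for a drive slot (bay), and for Drive Presence, Drive Fault, Predictive Failure, In Critical Array, In Failed Array, and Rebuild/Remap in progress, the next hex digit matches the IPMI sensor-specific offset for the event of the same name. This layout is an observation, not something this reference states, so the Python sketch below relies on a lookup table copied from the entries in this section rather than decoding codes generically. `DRIVE_SLOT_EVENTS`, `describe`, `sensor_type`, and `event_offset` are hypothetical names.

```python
from typing import Optional, Tuple

# Event codes, names, and severity levels copied from the Drive Slot entries in this section.
DRIVE_SLOT_EVENTS = {
    0x0D0000DE: ("Drive Presence", "Info"),
    0x0D1000DE: ("Drive Fault", "Major"),
    0x0D2000DE: ("Predictive Failure", "Minor"),
    0x0D5000DE: ("In Critical Array", "Major"),
    0x0D6000DE: ("In Failed Array", "Major"),
    0x0D7000DE: ("Rebuild/Remap in progress", "Info"),
    0x0DA000DE: ("The disk triggered an media error", "Info"),
    0x0DB000DE: ("The disk triggered an uncorrectable error", "Minor"),
    0x0DC000DE: ("The disk is missing", "Major"),
}


def describe(event_code: str) -> Optional[Tuple[str, str]]:
    """Look up a code written as in this reference, for example '0x0d5000de'."""
    return DRIVE_SLOT_EVENTS.get(int(event_code, 16))


def sensor_type(event_code: int) -> int:
    """Top byte of the code; assumed (not documented) to be the IPMI sensor type."""
    return (event_code >> 24) & 0xFF


def event_offset(event_code: int) -> int:
    """Next hex digit; assumed (not documented) to be the IPMI sensor-specific offset."""
    return (event_code >> 20) & 0xF


print(describe("0x0d6000de"))  # prints: ('In Failed Array', 'Major')
```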
The disk triggered an media error
Event code |
0x0da000de |
Message text |
The disk triggered an media error--$1 |
Variable fields |
$1: Drive location. |
Severity level |
Info |
Example |
The disk triggered an media error--Front 1 |
Impact |
A media error on the storage media might cause data loss. |
Cause |
The number of media errors exceeded the threshold. |
Recommended action |
1. Update the drive firmware. 2. Replace the drive. 3. If the problem persists, contact technical support. |
The disk triggered an uncorrectable error
Event code |
0x0db000de |
Message text |
The disk triggered an uncorrectable error--$1 |
Variable fields |
$1: Drive location. |
Severity level |
Minor |
Example |
The disk triggered an uncorrectable error--Front 1 |
Impact |
An uncorrectable error on the storage media might cause data loss. |
Cause |
The number of uncorrectable errors exceeded the threshold. |
Recommended action |
1. Update the drive firmware. 2. Replace the drive. 3. If the problem persists, contact technical support. |
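Both drive error messages above append the drive location ($1) after `--`. When several of these messages appear, counting them per drive location can help decide which drive to update or replace first. The following Python sketch does that count, assuming the message text matches the Message text rows exactly. `parse_drive_error` and `errors_per_drive` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Message prefixes copied from the two entries above; the text after "--" is $1 (drive location).
DRIVE_ERROR_PREFIXES = {
    "The disk triggered an media error--": "media",
    "The disk triggered an uncorrectable error--": "uncorrectable",
}


def parse_drive_error(message: str) -> Optional[Tuple[str, str]]:
    """Return (error kind, drive location), or None for any other message."""
    for prefix, kind in DRIVE_ERROR_PREFIXES.items():
        if message.startswith(prefix):
            return kind, message[len(prefix):].strip()
    return None


def errors_per_drive(messages: Iterable[str]) -> Counter:
    """Count media and uncorrectable error reports per (error kind, drive location)."""
    counts: Counter = Counter()
    for message in messages:
        parsed = parse_drive_error(message)
        if parsed is not None:
            counts[parsed] += 1
    return counts


log = [
    "The disk triggered an media error--Front 1",
    "The disk triggered an media error--Front 1",
    "The disk triggered an uncorrectable error--Front 1",
]
print(errors_per_drive(log).most_common(1))  # prints: [(('media', 'Front 1'), 2)]
```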
The disk is missing
Event code |
0x0dc000de |
Message text |
The disk is missing |
Variable fields |
N/A |
Severity level |
Major |
Example |
The disk is missing |
Impact |
The drive is removed or not installed correctly, which affects the stability of the storage system. |
Cause |
The drive cannot be identified by the storage controller or drive cables are connected incorrectly. |
Recommended action |
1. Log in to HDM, and verify that the drive can be identified. 2. Verify that the drive data cables, power cords, and signal cables are connected correctly. 3. Reinstall the drive. 4. Replace the drive. 5. If the problem persists, contact technical support. |
System Firmware Progress
System Firmware Error (POST Error)---Run sense AMP HW FSM failed
Event code |
0x0f0fe044 |
Message text |
System Firmware Error (POST Error)---Run sense AMP HW FSM failed |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---Run sense AMP HW FSM failed |
Impact |
System startup failure might occur. |
Cause |
A memory configuration error occurred. |
Recommended action |
1. Update the BIOS firmware. 2. Verify that the CPUs and DIMMs are installed correctly. 3. Reduce interleaving configuration (memory interleaving and NUMA). |
System Firmware Error (POST Error)--- Memory population enforcement mismatch, Please check the DIMM symmetry on the socket
Event code |
0x0f017134 |
Message text |
System Firmware Error (POST Error)--- Memory population enforcement mismatch, Please check the DIMM symmetry on the socket |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)--- Memory population enforcement mismatch, Please check the DIMM symmetry on the socket |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
The DIMM population is incorrect. |
Recommended action |
See the DDR5/DCPMM information in the Intel PDG. |
System Firmware Error (POST Error)---No Dimm on socket0
Event code |
0x0f017184 |
Message text |
System Firmware Error (POST Error)---No Dimm on socket$1 |
Variable fields |
$1: CPU number. |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---No Dimm on socket0 |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
No DIMMs are installed for the CPU in the specified socket. |
Recommended action |
1. Verify that the DIMMs are installed correctly as required in the user guide for the server. Reinstall all DIMMs if needed. 2. Update the BIOS and HDM firmware to the latest version. 3. Power off the server, disconnect and reconnect all power cords, and then power on the server. Make sure the server is completely powered off before powering it on. 4. Reinstall the DIMM. Verify that the gold contacts on the DIMM are not contaminated and the DIMM slots do not contain any foreign objects. 5. Replace the DIMM, and then power on the server. 6. If the problem persists, contact technical support. |
System Firmware Error (POST Error)---No memory found
Event code |
0x0f0e8014 |
Message text |
System Firmware Error (POST Error)---No memory found |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---No memory found |
Impact |
The system cannot start up correctly. |
Cause |
No DIMMs are available. |
Recommended action |
Verify that the DIMMs are available in the system. |
System Firmware Error (POST Error)---No DIMM is available for memory-mapping operation
Event code |
0x0f0e8024 |
Message text |
System Firmware Error (POST Error)---No DIMM is available for memory-mapping operation |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---No DIMM is available for memory-mapping operation |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
No DIMM is available for memory mapping. |
Recommended action |
1. Log in to HDM, access the memory page, and verify that available DIMMs exist. 2. If the problem persists, contact technical support. |
System Firmware Error (POST Error)---DIMM population error
Event code |
0x0f0ed024 |
Message text |
System Firmware Error (POST Error)---DIMM population error |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---DIMM population error |
Impact |
System startup failure might occur. |
Cause |
A DIMM compatibility error occurred. |
Recommended action |
See the BMC maintenance guide for the server. |
System Firmware Error (POST Error)---Some CPU links failed to train. KTI topology changed across reset
Event code |
0x0f003ff4 |
Message text |
System Firmware Error (POST Error)---Some CPU links failed to train. KTI topology changed across reset |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---Some CPU links failed to train. KTI topology changed across reset |
Impact |
System startup failure might occur. |
Cause |
Some KTI links between CPUs failed to train, and the KTI topology changed across a reset. |
Recommended action |
Verify that CPUs are installed correctly. |
System Firmware Error (POST Error)---CPU stepping mismatch detected
Event code |
0x0f010ff4 |
Message text |
System Firmware Error (POST Error)---CPU stepping mismatch detected |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---CPU stepping mismatch detected |
Impact |
System startup failure might occur. |
Cause |
The installed CPUs do not have the same stepping. |
Recommended action |
Verify that the CPU stepping is consistent between the installed CPUs. |
System Firmware Error (POST Error)---KTI Topology Change Logged
Event code |
0x0f0ffff4 |
Message text |
System Firmware Error (POST Error)---KTI Topology Change Logged |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---KTI Topology Change Logged |
Impact |
System startup failure might occur. |
Cause |
A CPU error occurred. |
Recommended action |
Verify that the CPUs are installed correctly. |
System Firmware Error (POST Error)---CPU matching failure---CPU stepping is detected
Event code |
0x0f0d00de |
Message text |
System Firmware Error (POST Error)---CPU matching failure---CPU stepping is detected |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---CPU matching failure---CPU stepping is detected |
Impact |
System startup failure might occur. |
Cause |
A CPU stepping mismatch error occurred at POST. |
Recommended action |
1. Verify that the corresponding CPU is installed correctly. 2. Verify that the CPU is the same model as the primary CPU. 3. Verify that the stepping and microcode of the CPU match those of the primary CPU, and check with the BIOS whether the CPU microcode needs to be updated at power-on. |
System Firmware Error (POST Error)---CPU matching failure---CPU frequency is detected
Event code |
0x0f0d10de |
Message text |
System Firmware Error (POST Error)---CPU matching failure---CPU frequency is detected |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---CPU matching failure---CPU frequency is detected |
Impact |
System startup failure might occur. |
Cause |
A CPU frequency mismatch error occurred at POST. |
Recommended action |
1. Verify that the corresponding CPU is installed correctly. 2. Verify that the CPU is the same model as the primary CPU. 3. Verify that the stepping and microcode of the CPU match those of the primary CPU, and check with the BIOS whether the CPU microcode needs to be updated at power-on. |
System Firmware Error (POST Error)---CPU matching failure---CPU Microcode is detected
Event code |
0x0f0d20de |
Message text |
System Firmware Error (POST Error)---CPU matching failure---CPU Microcode is detected |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---CPU matching failure---CPU Microcode is detected |
Impact |
System startup failure might occur. |
Cause |
A CPU microcode mismatch error occurred at POST. |
Recommended action |
1. Verify that the corresponding CPU is installed correctly. 2. Verify that the CPU is the same model as the primary CPU. 3. Verify that the stepping and microcode of the CPU match those of the primary CPU, and check with the BIOS whether the CPU microcode needs to be updated at power-on. |
System Firmware Error (POST Error)---CPU matching failure---UPI Topology is detected
Event code |
0x0f0d30de |
Message text |
System Firmware Error (POST Error)---CPU matching failure---UPI Topology is detected |
Variable fields |
N/A |
Severity level |
Major |
Example |
System Firmware Error (POST Error)---CPU matching failure---UPI Topology is detected |
Impact |
System startup failure might occur. |
Cause |
A CPU UPI topology mismatch error occurred at POST. |
Recommended action |
1. Verify that the corresponding CPU is installed correctly. 2. Verify that the CPU is the same model as the primary CPU. 3. Verify that the stepping and microcode of the CPU match those of the primary CPU, and check with the BIOS whether the CPU microcode needs to be updated at power-on. |
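The recommended actions for the CPU matching failures above all compare each CPU with the primary CPU. The following Python sketch shows that comparison, assuming the CPU attributes have already been collected from an inventory source of your choice and that the first CPU in the list is the primary CPU. The `CpuInfo` record, its field names, and `find_mismatches` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Attributes the recommended actions ask you to compare with the primary CPU.
COMPARED_FIELDS = ("model", "stepping", "microcode", "frequency_mhz")


@dataclass(frozen=True)
class CpuInfo:
    """Hypothetical CPU inventory record; populate it from your own inventory data."""
    socket: int
    model: str
    stepping: str
    microcode: str
    frequency_mhz: int


def find_mismatches(cpus: List[CpuInfo]) -> Dict[int, List[str]]:
    """Map each non-primary socket to the attributes that differ from the primary CPU (cpus[0])."""
    if not cpus:
        return {}
    primary = cpus[0]
    mismatches: Dict[int, List[str]] = {}
    for cpu in cpus[1:]:
        differing = [name for name in COMPARED_FIELDS
                     if getattr(cpu, name) != getattr(primary, name)]
        if differing:
            mismatches[cpu.socket] = differing
    return mismatches
```

An empty result means every CPU matches the primary CPU on the compared attributes; any listed socket points to the CPU to check first.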
System Firmware Error(POST Error)---Unrecoverable video controller failure
Event code |
0x0f0090de |
Message text |
System Firmware Error(POST Error)---Unrecoverable video controller failure |
Variable fields |
N/A |
Severity level |
Minor |
Example |
System Firmware Error(POST Error)---Unrecoverable video controller failure |
Impact |
KVM video display is abnormal. |
Cause |
Two VGA screen captures taken during host startup are identical. |
Recommended action |
If the problem persists, contact technical support. |
System Firmware Hang
Event code |
0x0f1000de |
Message text |
System Firmware Hang |
Variable fields |
N/A |
Severity level |
Critical |
Example |
System Firmware Hang |
Impact |
System operation failure might occur. |
Cause |
The BIOS cannot start up. |
Recommended action |
1. Resolve the issue based on other event logs reported for the component at the same time. 2. If the problem persists, contact technical support. |
System Firmware Progress---Video initialization---Detection unsuccessful
Event code |
0x0f1000de |
Message text |
System Firmware Progress---Video initialization---Detection unsuccessful |
Variable fields |
N/A |
Severity level |
Minor |
Example |
System Firmware Progress---Video initialization---Detection unsuccessful |
Impact |
No negative impact. |
Cause |
A TPM/TCM check failure occurred. |
Recommended action |
This message is generated when the TPM/TCM self-test signal is lost or a device access failure occurs, and it typically does not affect system operation. If the problem persists, contact technical support. |
System Firmware Progress---Secondary processor(s) initialization---Detection unsuccessful
Event code |
0x0f1000de |
Message text |
System Firmware Progress---Secondary processor(s) initialization---Detection unsuccessful |
Variable fields |
N/A |
Severity level |
Minor |
Example |
System Firmware Progress---Secondary processor(s) initialization---Detection unsuccessful |
Impact |
No negative impact. |
Cause |
A video controller check error occurred. |
Recommended action |
This message is generated when the video controller check fails, and it typically does not affect system operation. If the problem persists, contact technical support. |
Event Logging Disabled
Log Area Reset/Cleared
Event code |
0x102000de |
Message text |
Log Area Reset/Cleared |
Variable fields |
N/A |
Severity level |
Info |
Example |
Log Area Reset/Cleared |
Impact |
No negative impact. |
Cause |
This message is generated when all event log entries are cleared. |
Recommended action |
No action is required. |
SEL Full
Event code |
0x104000de |
Message text |
SEL Full |
Variable fields |
N/A |
Severity level |
Minor |
Example |
SEL Full |
Impact |
The system stops logging new events. |
Cause |
This message is generated when one of the following occurs: · The event log reaches its maximum size. The system stops logging new events, and the old logs might be overwritten. · A user disables event logging. |
Recommended action |
Log in to HDM, enter the Event Log page, and clear all event logs. |
SEL Almost Full
Event code |
0x105000de |
Message text |
SEL Almost Full |
Variable fields |
N/A |
Severity level |
Minor |
Example |
SEL Almost Full |
Impact |
No negative impact. |
Cause |
The event log is approaching its maximum size. |
Recommended action |
Log in to HDM, enter the Event Log page, and clear all event logs. |
System Event
System Reconfigured---BIOS load default. CMOS cleared
Event code |
0x120000de |
Message text |
System Reconfigured---BIOS load default. CMOS cleared |
Variable fields |
N/A |
Severity level |
Minor |
Example |
System Reconfigured---BIOS load default. CMOS cleared |
Impact |
The BIOS loads the default settings and the user-configured settings get lost. |
Cause |
The system board battery is abnormal. |
Recommended action |
1. Verify that the BIOS boot mode meets the requirements of secure boot. If it does not, change the boot mode to UEFI. 2. Verify that the BIOS firmware was upgraded successfully. 3. Upgrade the BIOS and choose to restore the factory defaults (if any) or the BIOS default settings. 4. If the problem persists, contact technical support. |
Limit Exceeded---Cpu usage exceeds the threshold
Event code |
0x120100de |
Message text |
Limit Exceeded---Cpu usage exceeds the threshold---Current usage $1, Threshold $2 |
Variable fields |
$1: Current CPU usage. $2: CPU usage threshold. |
Severity level |
Major |
Example |
Limit Exceeded---Cpu usage exceeds the threshold---Current usage 82%, Threshold 80% |
Impact |
System performance degradation might occur. |
Cause |
The CPU usage exceeds the threshold. |
Recommended action |
No action is required. |
Limit Exceeded---Mem usage exceeds the threshold
Event code |
0x120200de |
Message text |
Limit Exceeded---Mem usage exceeds the threshold---Current usage $1, Threshold $2 |
Variable fields |
$1: Current memory usage. $2: Memory usage threshold. |
Severity level |
Major |
Example |
Limit Exceeded---Mem usage exceeds the threshold---Current usage 81%, Threshold 80% |
Impact |
System performance degradation might occur. |
Cause |
The memory usage exceeds the threshold. |
Recommended action |
No action is required. |
Limit Exceeded---Network usage exceeds the threshold
Event code |
0x120300de |
Message text |
Limit Exceeded---Network usage exceeds the threshold---Current usage $1, Threshold $2 |
Variable fields |
$1: Current network usage. $2: Network usage threshold. |
Severity level |
Major |
Example |
Limit Exceeded---Network usage exceeds the threshold---Current usage 81%, Threshold 80% |
Impact |
Network connectivity might be lost. |
Cause |
The network usage exceeds the threshold. |
Recommended action |
This message is triggered by FIST SMS according to the system resource usage. |
Limit Exceeded---Hard disk usage exceeds the threshold
Event code |
0x120400de |
Message text |
Limit Exceeded---Hard disk usage exceeds the threshold---OS:Linux/Unix,See disk details about Logical disk name,Current usage $1, Threshold $2 |
Variable fields |
$1: Current drive usage. $2: Drive usage threshold. |
Severity level |
Major |
Example |
Limit Exceeded---Hard disk usage exceeds the threshold---OS:Linux/Unix,See disk details about Logical disk name,Current usage 81%, Threshold 80% |
Impact |
The drive reliability decreases, which might impact the storage performance and service operation of the OS. |
Cause |
The drive usage exceeds the threshold. |
Recommended action |
This message is triggered by FIST SMS according to the system resource usage. |
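The four Limit Exceeded messages above report the current usage and the threshold in the same `Current usage $1, Threshold $2` form. The following Python sketch extracts the resource name and both percentages, assuming the values carry a % sign as in the examples. `parse_usage` and `UsageAlarm` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

PREFIX = "Limit Exceeded---"
# "Current usage 82%, Threshold 80%" -> ("82", "80"); accepting decimals is an assumption.
USAGE_RE = re.compile(r"Current usage (\d+(?:\.\d+)?)%, Threshold (\d+(?:\.\d+)?)%")


class UsageAlarm(NamedTuple):
    resource: str     # "Cpu", "Mem", "Network", or "Hard disk"
    current: float    # $1, in percent
    threshold: float  # $2, in percent

    @property
    def margin(self) -> float:
        """Percentage points by which the usage exceeds the threshold."""
        return self.current - self.threshold


def parse_usage(message: str) -> Optional[UsageAlarm]:
    """Return the parsed alarm, or None if the message is not a Limit Exceeded message."""
    if not message.startswith(PREFIX):
        return None
    match = USAGE_RE.search(message)
    if match is None:
        return None
    # The resource name is the text between the prefix and " usage exceeds".
    resource = message[len(PREFIX):].split(" usage exceeds", 1)[0]
    return UsageAlarm(resource, float(match.group(1)), float(match.group(2)))
```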
Timestamp clock synch---BMC Time SYNC succeed
Event code |
0x125000de |
Message text |
Timestamp Clock Synch---BMC Time SYNC succeed. |
Variable fields |
N/A |
Severity level |
Info |
Example |
Timestamp Clock Synch---BMC Time SYNC succeed. |
Impact |
No negative impact. |
Cause |
HDM synchronized the ME clock successfully. |
Recommended action |
No action is required. |
Timestamp clock synch
Event code |
0x128000de |
Message text |
Timestamp Clock Synch---event is $1 of pair---SEL Timestamp Clock updated |
Variable fields |
$1: first or second. first indicates the event logged before time synchronization, and second indicates the event logged after time synchronization. |
Severity level |
Info |
Example |
Timestamp Clock Synch---event is first of pair---SEL Timestamp Clock updated |
Impact |
No negative impact. |
Cause |
HDM synchronizes time with the server when the server is powered on. The first event is triggered before time synchronization and the second event is triggered after time synchronization. |
Recommended action |
No action is required. |
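Because the first and second events of a pair are logged before and after the clock update, the difference between their timestamps approximates how far the SEL clock moved. The following Python sketch uses that difference to shift earlier entries onto the updated clock. It assumes the two events were logged back to back, that entries are supplied in SEL record order, and that the second-of-pair event directly follows the first. `realign_before_sync` is a hypothetical name.

```python
from datetime import datetime
from typing import List, Tuple

Entry = Tuple[datetime, str]  # (SEL timestamp, message text)


def realign_before_sync(entries: List[Entry], first_index: int) -> List[Entry]:
    """Shift entries logged before the first-of-pair event by the clock change.

    entries must be in SEL record order, and entries[first_index + 1] must be
    the matching "second of pair" event.
    """
    first_ts = entries[first_index][0]
    second_ts = entries[first_index + 1][0]
    delta = second_ts - first_ts  # approximate clock adjustment
    return [
        (ts + delta, text) if i < first_index else (ts, text)
        for i, (ts, text) in enumerate(entries)
    ]
```

Entries at or after the first-of-pair event are returned unchanged, because the second event and everything after it already use the updated clock.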
Critical Interrupt
PCI PERR
Event code |
0x134000de |
Message text |
PCI PERR ---Slot $1---PCIE Name:$2 |
Variable fields |
$1: Slot number. $2: PCIe name. |
Severity level |
Major |
Example |
PCI PERR ---Slot 3---PCIE Name: RAID-LSI-9361-8i |
Impact |
An error occurred on the PCIe module, which might cause a system-level failure if the error is severe. |
Cause |
An internal parity error occurred on the PCIe module. This message is generated when the PERR (parity error) signal on the PCIe module is abnormal. |
Recommended action |
1. If the message is reported several times within a period of time, make sure the riser card is securely connected to the system board. 2. Reboot the server. 3. Locate the PCIe module based on the slot number. 4. If the PCIe module is a removable component, perform the following operations: a. Verify that the PCIe module is installed correctly. b. Verify that the gold plating on the PCIe module is not contaminated. c. Install the PCIe module in another slot to identify whether the error is on the PCIe module or the slot. d. Update all firmware and drivers, including non-Intel components. e. If the error occurs on the slot, verify that the gold plating on the riser card is not contaminated. f. Replace the PCIe module. 5. If the PCIe module is embedded on the system board, perform the following operations: a. Update the BIOS, firmware, and drivers. b. Replace the system board. |
PCI SERR
Event code |
0x13500000 |
Message text |
PCI SERR---Slot $1---PCIE Name:$2 |
Variable fields |
$1: Slot number. $2: PCIe name. |
Severity level |
Major |
Example |
PCI SERR---Slot 3---PCIE Name: RAID-LSI-9361-8i |
Impact |
An error occurred on the PCIe module, which might cause a system-level failure if the error is severe. |
Cause |
An internal system error occurred on the PCIe module. This message is generated when the SERR signal on the PCIe module is abnormal. System errors include address parity errors, data parity errors on Special Cycle commands, and other fatal errors. |
Recommended action |
1. If the message is reported several times within a period of time, make sure the riser card is securely connected to the system board. 2. Reboot the server. 3. Locate the PCIe module based on the slot number. 4. If the PCIe module is a removable component, perform the following operations: a. Verify that the PCIe module is installed correctly. b. Verify that the gold plating on the PCIe module is not contaminated. c. Install the PCIe module in another slot to identify whether the error is on the PCIe module or the slot. d. Update all firmware and drivers, including non-Intel components. e. If the error occurs on the slot, verify that the gold plating on the riser card is not contaminated. f. Replace the PCIe module. 5. If the PCIe module is embedded on the system board, perform the following operations: a. Update the BIOS, firmware, and drivers. b. Replace the system board. |
Bus Correctable Error
Event code |
0x137000de |
Message text |
Bus Correctable Error ---Slot $1---PCIE Name:$2 |
Variable fields |
$1: PCIe slot number. $2: PCIe module name. |
Severity level |
Minor |
Example |
Bus Correctable Error---Slot 3---PCIE Name: RAID-LSI-9361-8i |
Impact |
If this message is generated occasionally, no negative impact occurs on the system. If this message is generated frequently, the PCIe module performance might be affected. |
Cause |
An internal correctable error occurred on the PCIe module. |
Recommended action |
1. If this message is generated during access to the PCIe module, ignore it. 2. If the same message is generated repeatedly, use the slot number to locate the faulty PCIe module. 3. If the PCIe module is removable, verify that the PCIe module is installed correctly or install the PCIe module in another slot to identify the cause. 4. Replace the PCIe module. |
Bus Correctable Error
Event code |
0x137800de |
Message text |
Bus Correctable Error ---Slot $1---PCIE Name:$2 |
Variable fields |
$1: PCIe slot number. $2: PCIe module name. |
Severity level |
Minor |
Example |
Bus Correctable Error---Slot 3---PCIE Name: RAID-LSI-9361-8i |
Impact |
If this message is generated occasionally, no negative impact occurs on the system. If this message is generated frequently, the PCIe module performance might be affected. |
Cause |
An internal correctable error occurred on the PCIe module on an AMD model. |
Recommended action |
1. If this message is generated during access to the PCIe module, ignore it. 2. If the same message is generated repeatedly, use the slot number to locate the faulty PCIe module. 3. If the PCIe module is removable, verify that the PCIe module is installed correctly or install the PCIe module in another slot to identify the cause. 4. Replace the PCIe module. |
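The Bus Correctable Error entries distinguish occasional messages, which have no negative impact, from repeated ones, which call for locating the PCIe module. One way to make that distinction automatically is a sliding time window per slot, as in the following Python sketch. `RepeatDetector` is a hypothetical name, the `limit` and `window` values are hypothetical tuning parameters (not HDM defaults), and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict


class RepeatDetector:
    """Flags a PCIe slot whose error messages repeat more than `limit` times within `window`."""

    def __init__(self, limit: int = 3, window: timedelta = timedelta(hours=1)) -> None:
        self.limit = limit    # hypothetical threshold, not an HDM default
        self.window = window  # hypothetical window length, not an HDM default
        self._history: Dict[int, Deque[datetime]] = defaultdict(deque)

    def record(self, slot: int, when: datetime) -> bool:
        """Record one message for `slot`; return True if the slot now counts as repeating."""
        times = self._history[slot]
        times.append(when)
        # Drop messages that fall outside the window ending at the newest message.
        while when - times[0] > self.window:
            times.popleft()
        return len(times) > self.limit
```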
Bus Uncorrectable Error
Event code |
0x138000de |
Message text |
Bus Uncorrectable Error ---Slot $1---PCIE Name:$2 |
Variable fields |
$1: PCIe slot number. $2: PCIe module name. |
Severity level |
Major |
Example |
Bus Uncorrectable Error---Slot 3---PCIE Name: RAID-LSI-9361-8i |
Impact |
An error occurred on the PCIe module, which might cause a system-level failure if the error is severe. |
Cause |
An internal uncorrectable error occurred on the PCIe module. |
Recommended action |
1. If the message is reported several times within a period of time, make sure the riser card is securely connected to the system board. 2. Reboot the server. 3. Locate the PCIe module based on the slot number. 4. If the PCIe module is a removable component, perform the following operations: a. Verify that the PCIe module is installed correctly. b. Verify that the gold plating on the PCIe module is not contaminated. c. Install the PCIe module in another slot to identify whether the error is on the PCIe module or the slot. d. If the error occurs on the PCIe module, update all firmware and drivers. e. If the error occurs on the slot, verify that the gold plating on the riser card is not contaminated. f. If the issue persists, replace the PCIe module. 5. If the PCIe module is embedded on the system board, perform the following operations: a. Update the BIOS, firmware, and drivers. b. Replace the system board. 6. If the problem persists, contact technical support. |
Bus Uncorrectable Error
Event code |
0x138800de |
Message text |
Bus Uncorrectable Error ---Slot $1---PCIE Name:$2 |
Variable fields |
$1: PCIe slot number. $2: PCIe module name. |
Severity level |
Major |
Example |
Bus Uncorrectable Error---Slot 3---PCIE Name: RAID-LSI-9361-8i |
Impact |
An error occurred on the PCIe module, which might lead to a system-level failure if the error is severe enough. |
Cause |
An internal uncorrectable error identified by SHD occurred on the PCIe module on an AMD model. |
Recommended action |
1. Locate the PCIe module based on the slot number. 2. If the PCIe module is a removable component, perform the following operations: a. Verify that the PCIe module is installed correctly. b. Install the PCIe module in another slot. c. Update the firmware and driver of the PCIe module. 3. If the PCIe module is embedded on the system board, perform the following operations: a. Update the BIOS, firmware, and driver. b. Replace the system board. |
Bus Fatal Error
Event code |
0x13a000de |
Message text |
Bus Fatal Error ------Slot $1---PCIE Name: $2 |
Variable fields |
$1: PCIe slot number. $2: PCIe module name. |
Severity level |
Major |
Example |
Bus Fatal Error---Slot 3---PCIE Name: RAID-LSI-9361-8i |
Impact |
An error occurred on the PCIe module, which might lead to a system-level failure if the error is severe enough. |
Cause |
An internal fatal error occurred on the PCIe module. |
Recommended action |
1. If the message is reported several times within a period of time, verify that the riser card is securely connected to the system board. 2. Reboot the server. 3. Locate the PCIe module based on the slot number. 4. If the PCIe module is a removable component, perform the following operations: a. Verify that the PCIe module is installed correctly. b. Verify that the gold plating on the PCIe module is not contaminated. c. Install the PCIe module in another slot to identify whether the error is on the PCIe module or the slot. d. If the error occurs on the PCIe module, update the firmware and drivers of the PCIe module. e. If the error occurs on the slot, verify that the gold plating on the riser card is not contaminated. f. If the problem persists, replace the PCIe module. 5. If the PCIe module is embedded on the system board, perform the following operations: a. Update the BIOS, firmware, and drivers. b. Replace the system board. 6. If the problem persists, contact technical support. |
Bus Degraded
Event code |
0x13b000de |
Message text |
Bus Degraded ------Slot $1---PCIE Name: $2 |
Variable fields |
$1: PCIe slot number. $2: PCIe module name. |
Severity level |
Major |
Example |
Bus Degraded ---Slot 3---PCIE Name: RAID-LSI-9361-8i |
Impact |
System performance degradation might occur. |
Cause |
The speed and bandwidth of the PCIe module decreased. |
Recommended action |
1. If the message is reported several times within a period of time, verify that the riser card is securely connected to the system board. 2. Reboot the server. 3. Locate the PCIe module based on the slot number. 4. If the PCIe module is a removable component, perform the following operations: a. Verify that the PCIe module is installed correctly. b. Verify that the gold plating on the PCIe module is not contaminated. c. Install the PCIe module in another slot to identify whether the error is on the PCIe module or the slot. d. Update all firmware and drivers, including non-Intel components. e. If the error occurs on the slot, verify that the gold plating on the riser card is not contaminated. f. Replace the PCIe module. 5. If the PCIe module is embedded on the system board, perform the following operations: a. Update the BIOS, firmware, and drivers. b. Replace the system board. |
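The Bus Uncorrectable Error, Bus Fatal Error, and Bus Degraded messages share one layout: the event name, the PCIe slot number ($1), and the PCIe module name ($2), separated by runs of dashes whose length and surrounding spaces vary between the documented message texts and examples. The following hypothetical Python parser (`parse_bus_event` is not an HDM function) extracts the slot number needed for the "locate the PCIe module based on the slot number" step. It assumes the log text matches the examples in this reference.

```python
import re
from typing import NamedTuple, Optional

# Tolerates the varying dash runs and spaces seen in the documented
# message texts and examples (for example, "---" versus "------").
BUS_EVENT_RE = re.compile(
    r"^(?P<event>Bus (?:Uncorrectable Error|Fatal Error|Degraded))"
    r"\s*-+\s*Slot\s+(?P<slot>\d+)"
    r"\s*-+\s*PCIE Name:\s*(?P<name>\S.*?)\s*$"
)


class BusEvent(NamedTuple):
    event: str   # event name
    slot: int    # $1: PCIe slot number
    module: str  # $2: PCIe module name


def parse_bus_event(text: str) -> Optional[BusEvent]:
    """Return the event name, slot number, and module name, or None if no match."""
    m = BUS_EVENT_RE.match(text.strip())
    if m is None:
        return None
    return BusEvent(m["event"], int(m["slot"]), m["name"])
```

Returning None for non-matching lines lets a caller feed every log line through the parser and keep only the PCIe bus events.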
$1 triggered an uncorrectable error
Event code |
0x138400de |
Message text |
$1 triggered an uncorrectable error |
Variable fields |
$1: PCIe module type. |
Severity level |
Major |
Example |
NIC triggered an uncorrectable error |
Impact |
An error occurred on the PCIe module, which might lead to a system-level failure if the error is severe enough. |
Cause |
An IERR or MCERR error occurred, which is identified as a PCIe uncorrectable error by SHD. |
Recommended action |
1. Locate the PCIe module based on the slot number. 2. If the PCIe module is a removable component, perform the following operations: a. Verify that the PCIe module is installed correctly. b. Install the PCIe module to another slot to identify whether the error is present on the PCIe module or the slot. c. Update all firmware and drivers, including non-Intel components. 3. If the PCIe module is embedded on the system board, perform the following operations: a. Update the BIOS, firmware, and driver. b. Replace the system board. |
$1 triggered a correctable error
Event code |
0x138400de |
Message text |
$1 triggered a correctable error |
Variable fields |
$1: PCIe module type. |
Severity level |
Major |
Example |
NIC triggered a correctable error |
Impact |
An error occurred on the PCIe module, which might lead to a system-level failure if the error is severe enough. |
Cause |
An IERR or MCERR error occurred, which is identified as a PCIe correctable error by SHD. |
Recommended action |
1. Locate the PCIe module based on the slot number. 2. If the PCIe module is a removable component, perform the following operations: a. Verify that the PCIe module is installed correctly. b. Install the PCIe module in another slot to identify whether the error is on the PCIe module or the slot. c. Update all firmware and drivers, including non-Intel components. 3. If the PCIe module is embedded on the system board, perform the following operations: a. Update the BIOS, firmware, and driver. b. Replace the system board. |
Button / Switch
Power Button pressed---Physical button---Button pressed
Event code |
0x140000de |
Message text |
Power Button pressed---Physical button---Button pressed |
Variable fields |
N/A |
Severity level |
Info |
Example |
Power Button pressed---Physical button---Button pressed |
Impact |
No negative impact. |
Cause |
This message is generated when the physical power button on the front panel of the server is pressed. |
Recommended action |
No action is required. |
Reset Button pressed
Event code |
0x142000de |
Message text |
Reset Button pressed |
Variable fields |
N/A |
Severity level |
Info |
Example |
Reset Button pressed |
Impact |
No negative impact. |
Cause |
This message is generated when one of the following conditions exists: · The reset command is executed. · An IERR event occurs. |
Recommended action |
1. Review the operation logs to verify whether the reset command was executed. If the reset command was executed, no action is required. 2. Identify whether an IERR event log message is also generated. If yes, resolve the issue as described by that event log message. 3. If the problem persists, contact technical support. |
Module / Board
Transition to Non-Critical from OK
Event code |
0x1510000e |
Message text |
Transition to Non-Critical from OK |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Transition to Non-Critical from OK |
Impact |
No negative impact if this message is generated occasionally. |
Cause |
An internal correctable error occurred on the PCIe BUS0 device. |
Recommended action |
1. Verify that the power supply for the system is normal. 2. If the problem persists, contact technical support. |
Transition to Critical from less severe
Event code |
0x1520000e |
Message text |
Transition to Critical from less severe |
Variable fields |
N/A |
Severity level |
Major |
Example |
Transition to Critical from less severe |
Impact |
An error occurred on the PCIe BUS0 device, which might lead to a system-level failure if the error is severe enough. |
Cause |
An internal uncorrectable error occurred on the PCIe BUS0 device. |
Recommended action |
1. Verify that the power supply for the system is normal. 2. Verify that all components are operating correctly. 3. If the problem persists, contact technical support. |
Transition to Non-Recoverable from less severe
Event code |
0x1530000e |
Message text |
Transition to Non-Recoverable from less severe---System detected a power supply failure on $1($2). |
Variable fields |
$1: Faulty component, such as the system board, PDB, compute module (SMDB), and riser card. $2: Specific faulty component, such as P5V, P5V_STBY, CPU1_PVCSA, CPU2_PVCCIO. |
Severity level |
Major |
Example |
Transition to Non-Recoverable from less severe---System detected a power supply failure on Motherboard(P5V). |
Impact |
System power-off might occur. |
Cause |
Abnormal board voltage. |
Recommended action |
1. Ignore this message if it is triggered by a system power-on or power-off event. 2. Reconnect the power cords and identify whether the server can be powered on correctly. · If the server can be powered on, the message might have been generated because the detection signals were subject to interference. No action is required. · If the server cannot be powered on, review the SDS logs to locate the fault and replace the faulty component. 3. If the problem persists, replace the faulty component. 4. If the problem persists, contact technical support. |
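The power supply failure message names the faulty component ($1) and, in parentheses, the specific faulty item ($2) that the SDS log review should focus on. A minimal Python sketch that pulls both fields out of the message text is shown below; `parse_power_failure` is a hypothetical helper, and the pattern assumes the "...power supply failure on <component>(<item>)" layout shown in the example.

```python
import re
from typing import Optional, Tuple

# Matches "...power supply failure on <component>(<item>)" regardless of
# how the "Non-Recoverable" prefix before it is spaced.
POWER_FAIL_RE = re.compile(
    r"power supply failure on\s+(?P<component>[^()]+?)\s*\((?P<item>[^()]+)\)"
)


def parse_power_failure(text: str) -> Optional[Tuple[str, str]]:
    """Return ($1, $2): the faulty component and the specific faulty item."""
    m = POWER_FAIL_RE.search(text)
    if m is None:
        return None
    return m["component"].strip(), m["item"].strip()
```

Using `search` rather than `match` keeps the helper independent of the "Transition to ..." prefix.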
Transition to Non-Critical from OK---System is operating in KTI Link Slow Speed Mode
Event code |
0x15101ff4 |
Message text |
Transition to Non-Critical from OK---System is operating in KTI Link Slow Speed Mode |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Transition to Non-Critical from OK---System is operating in KTI Link Slow Speed Mode |
Impact |
System startup failure might occur. |
Cause |
The system is operating in Keizer Technology Interconnect (KTI) low speed mode. |
Recommended action |
Verify that the signal quality and hardware parameters are correct. |
Transition to Non-Critical from OK---Requested Link Speed is not supported. Defaulting to 18GT
Event code |
0x15102ff4 |
Message text |
Transition to Non-Critical from OK---Requested Link Speed is not supported. Defaulting to 18GT |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Transition to Non-Critical from OK---Requested Link Speed is not supported. Defaulting to 18GT |
Impact |
System startup failure might occur. |
Cause |
The requested link speed is not supported. |
Recommended action |
Verify that hardware parameters are correct. |
Transition to Non-Critical from OK---One or more per Link option mismatch detected. Forcing to common setting
Event code |
0x15104ff4 |
Message text |
Transition to Non-Critical from OK---One or more per Link option mismatch detected. Forcing to common setting |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Transition to Non-Critical from OK---One or more per Link option mismatch detected. Forcing to common setting |
Impact |
System startup failure might occur. |
Cause |
Some CPU links are faulty. |
Recommended action |
Verify that the UPI configuration is correct in the BIOS setup utility. |
Transition to Non-Critical from OK---Some CPU has more than one link connecting to other CPU. Disable one of the Dual-Link
Event code |
0x15105ff4 |
Message text |
Transition to Non-Critical from OK---Some CPU has more than one link connecting to other CPU. Disable one of the Dual-Link |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Transition to Non-Critical from OK---Some CPU has more than one link connecting to other CPU. Disable one of the Dual-Link |
Impact |
System startup failure might occur. |
Cause |
A UPI link error occurred. |
Recommended action |
Verify that the UPI link is connected as required. |
Transition to Non-Critical from OK---KTI Adaptation is in progress, or High Speed adaptation is failed
Event code |
0x15106ff4 |
Message text |
Transition to Non-Critical from OK---KTI Adaptation is in progress, or High Speed adaptation is failed |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Transition to Non-Critical from OK---KTI Adaptation is in progress, or High Speed adaptation is failed |
Impact |
System startup failure might occur. |
Cause |
KTI adaptation is in progress, or high-speed adaptation failed. |
Recommended action |
Verify that the signal quality and hardware parameters are correct. |
System board triggered an uncorrectable error
Event code |
0x1521000e |
Message text |
System board triggered an uncorrectable error |
Variable fields |
N/A |
Severity level |
Major |
Example |
System board triggered an uncorrectable error |
Impact |
An IERR or MCERR error occurred in the system, which causes services to become unavailable. |
Cause |
An IERR or MCERR error was triggered. The error was identified as an uncorrectable error on the system board (including backplanes) by SHD. |
Recommended action |
If the problem persists, contact technical support. |
System board triggered a correctable error
Event code |
0x1521000e |
Message text |
System board triggered a correctable error |
Variable fields |
N/A |
Severity level |
Minor |
Example |
System board triggered a correctable error |
Impact |
An IERR or MCERR error occurred in the system, which causes services to become unavailable. |
Cause |
An IERR or MCERR error was triggered. The error was identified as a correctable error on the system board (including backplanes) by SHD. |
Recommended action |
If the problem persists, contact technical support. |
Add-in Card
Transition to OK
Event code |
0x1700000e |
Message text |
Transition to OK---PCIe slot: $1---LDDevno:$2 |
Variable fields |
$1: PCIe slot where the logical drive resides. $2: Logical drive number. |
Severity level |
Info |
Example |
Transition to OK---PCIe slot:1---LDDevno:0 |
Impact |
No negative impact. |
Cause |
This message is generated if the logical drive managed by the storage controller changes from abnormal to normal. |
Recommended action |
No action is required. |
Transition to Critical from less severe
Event code |
0x1720000e |
Message text |
Transition to Critical from less severe |
Variable fields |
N/A |
Severity level |
Major |
Example |
Transition to Critical from less severe |
Impact |
System power-off might occur. |
Cause |
The backplane power supply is faulty. |
Recommended action |
1. Ignore this message if it is triggered by a system power-on or power-off event. 2. Reconnect the power cords and identify whether the server can be powered on correctly. · If the server can be powered on, the message might have been generated because the detection signals were subject to interference. No action is required. · If the server cannot be powered on, review the SDS logs to locate the fault and replace the faulty component. 3. If the problem persists, replace the faulty component. 4. If the problem persists, contact technical support. |
Transition to Critical from less severe
Event code |
0x172a000e |
Message text |
Transition to Critical from less severe---PCIe slot:$1---LDDevno:$2 |
Variable fields |
$1: PCIe slot where the logical drive resides. $2: Logical drive number. |
Severity level |
Major |
Example |
Transition to Critical from less severe---PCIe slot: 1---LDDevno:0 |
Impact |
The logical drive is degraded, which might affect data reliability. |
Cause |
This message is generated when the logical drive managed by the storage controller is degraded or faulty. |
Recommended action |
1. Log in to HDM to identify whether the logical drive is degraded or faulty. 2. If the logical drive is degraded, perform the following operations: a. Verify that all member drives in the logical drive are operating correctly. b. Reinstall the member drives to identify whether the drives can be correctly identified. c. Access the BIOS to identify whether all member drives have been configured correctly. d. Check the error logs for the drives. e. Replace the faulty drive. f. If the problem persists, contact technical support. 3. If the logical drive is faulty, perform the following operations: a. Verify that the member drives have not been removed. b. Reinstall the member drives and rebuild the RAID. c. Replace the faulty drive, and then reboot the server. d. If the problem persists, contact technical support. |
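The logical drive messages (Transition to OK, 0x1700000e, and Transition to Critical from less severe, 0x172a000e) both carry the PCIe slot of the storage controller ($1) and the logical drive number ($2). The Python sketch below, built around a hypothetical `parse_logical_drive` helper, extracts those fields so that a degraded or faulty drive can be matched with the later recovery message for the same slot and drive number.

```python
import re
from typing import NamedTuple, Optional

# "LDDevno:+" also tolerates a doubled colon in case the log text contains one.
LD_RE = re.compile(r"PCIe slot:\s*(?P<slot>\d+)\s*-+\s*LDDevno:+\s*(?P<ld>\d+)")


class LogicalDriveEvent(NamedTuple):
    transition: str  # text before the first "---"
    slot: int        # $1: PCIe slot where the logical drive resides
    ld_number: int   # $2: logical drive number


def parse_logical_drive(text: str) -> Optional[LogicalDriveEvent]:
    m = LD_RE.search(text)
    if m is None:
        return None
    transition = text.split("---", 1)[0].strip()
    return LogicalDriveEvent(transition, int(m["slot"]), int(m["ld"]))


def needs_attention(event: LogicalDriveEvent) -> bool:
    """True for the degraded-or-faulty transition, False for Transition to OK."""
    return event.transition.startswith("Transition to Critical")
```

Keying open issues on `(slot, ld_number)` would let a Transition to OK message close the matching Transition to Critical entry.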
Transition to Non-recoverable from less severe
Event code |
0x1730000e |
Message text |
Transition to Non-recoverable from less severe |
Variable fields |
N/A |
Severity level |
Critical |
Example |
Transition to Non-recoverable from less severe |
Impact |
System power-off might occur. |
Cause |
The backplane power supply is faulty. |
Recommended action |
1. Ignore this message if it is triggered by a system power-on or power-off event. 2. Reconnect the power cords and identify whether the server can be powered on correctly. · If the server can be powered on, the message might have been generated because the detection signals were subject to interference. No action is required. · If the server cannot be powered on, review the SDS logs to locate the fault and replace the faulty component. 3. If the problem persists, replace the faulty component. 4. If the problem persists, contact technical support. |
ChipSet
Transition to Critical from less severe
Event code |
0x1920000e |
Message text |
Transition to Critical from less severe |
Variable fields |
N/A |
Severity level |
Major |
Example |
Transition to Critical from less severe |
Impact |
System performance degradation or system startup failure might occur. |
Cause |
The PCH status was abnormal. |
Recommended action |
Review the event log to locate the error. |
Cable/Interconnect
Configuration Error - Incorrect cable connected / Incorrect interconnection
Event code |
0x1b1000de |
Message text |
Configuration Error - Incorrect cable connected / Incorrect interconnection |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Configuration Error - Incorrect cable connected / Incorrect interconnection |
Impact |
The network is abnormal, which might cause loss of network connectivity in the system. |
Cause |
Incorrect cable configuration. |
Recommended action |
1. Verify that the cables are connected to the correct interfaces. 2. Verify that the power cables are connected properly. |
Configuration Error - Incorrect cable connected / Incorrect interconnection
Event code |
0x1b1800de |
Message text |
Configuration Error - Incorrect cable connected / Incorrect interconnection---$1 |
Variable fields |
$1: Incorrect cable configuration. |
Severity level |
Minor |
Example |
Configuration Error - Incorrect cable connected / Incorrect interconnection---Incorrect SATA cable connection to the backplane |
Impact |
A communication exception might occur on the backplane. |
Cause |
Incorrect cable configuration. |
Recommended action |
1. Verify that the cables are connected to the correct interfaces. 2. Verify that the power cables are connected properly. |
Configuration Error - Incorrect cable connected / Incorrect interconnection
Event code |
0x1b1400de |
Message text |
Configuration Error - Incorrect cable connected / Incorrect interconnection ($1) |
Variable fields |
$1: Cable connection location. |
Severity level |
Minor |
Example |
Configuration Error-Incorrect cable connected / Incorrect interconnection(FrontBackplane1) |
Impact |
A communication exception might occur on the backplane. |
Cause |
Incorrect cable configuration. |
Recommended action |
1. Verify that the cables are connected to the correct interfaces. 2. Verify that the power cables are connected properly. |
System Boot / Restart Initiated
Initiated by power up
Event code |
0x1d0000de |
Message text |
Initiated by power up |
Variable fields |
N/A |
Severity level |
Info |
Example |
Initiated by power up |
Impact |
No negative impact. |
Cause |
This event is triggered by a system power-on. |
Recommended action |
1. Review other logs for the cause and recommended action. 2. If the problem persists, contact technical support. |
Initiated by hard reset
Event code |
0x1d1000de |
Message text |
Initiated by hard reset |
Variable fields |
N/A |
Severity level |
Info |
Example |
Initiated by hard reset |
Impact |
No negative impact. |
Cause |
This event is triggered by a system restart. |
Recommended action |
1. Review other logs for the cause and recommended action. 2. If the problem persists, contact technical support. |
Initiated by warm reset
Event code |
0x1d2000de |
Message text |
Initiated by warm reset |
Variable fields |
N/A |
Severity level |
Info |
Example |
Initiated by warm reset |
Impact |
No negative impact. |
Cause |
This event is triggered by a system warm restart. |
Recommended action |
1. Review other logs for the cause and recommended action. 2. If the problem persists, contact technical support. |
System restart
Event code |
0x1d7000de |
Message text |
System Restart---$1:$2 |
Variable fields |
$1: Reboot cause. |
Severity level |
Info |
Example |
System Restart---due to power button pressed:power off |
Impact |
No negative impact. |
Cause |
This event is triggered by a proactive restart within the OS. |
Recommended action |
No action is required. |
Boot Error
No bootable media
Event code |
0x1e0000de |
Message text |
No bootable media |
Variable fields |
N/A |
Severity level |
Info |
Example |
No bootable media |
Impact |
No negative impact. |
Cause |
No bootable media was found. This status message typically has no negative impact. |
Recommended action |
1. Specify an available boot device. 2. If the problem persists, contact technical support. |
OS_BOOT
C: boot completed
Event code |
0x1f1000de |
Message text |
C: boot completed |
Variable fields |
N/A |
Severity level |
Info |
Example |
C: boot completed |
Impact |
No negative impact. |
Cause |
The operating system booted from a hard drive. This event is generated for most Windows operating systems. |
Recommended action |
No action is required. |
Boot completed - boot device not specified
Event code |
0x1f6000de |
Message text |
Boot completed - boot device not specified |
Variable fields |
N/A |
Severity level |
Info |
Example |
Boot completed - boot device not specified |
Impact |
No negative impact. |
Cause |
This message is generated when the server exits the BIOS boot phase. |
Recommended action |
No action is required. |
OS Stop / Shutdown
Run-time Critical Stop
Event code |
0x201000de |
Message text |
Run-time Critical Stop |
Variable fields |
N/A |
Severity level |
Critical |
Example |
Run-time Critical Stop |
Impact |
The system crashes. |
Cause |
A critical error occurred while the operating system was running. |
Recommended action |
1. Verify that the installed system, drivers, firmware, and software do not have bugs and are compatible with the server. 2. Update the versions if bugs or compatibility issues exist. 3. Verify that the installed hardware options are compatible with the server. For more information about component and server compatibility, use the component compatibility query tool on the official website. 4. If the problem persists, contact technical support. |
OS Graceful Stop
Event code |
0x202000de |
Message text |
OS Graceful Stop |
Variable fields |
N/A |
Severity level |
Info |
Example |
OS Graceful Stop |
Impact |
The system shut down. |
Cause |
The Windows OS was forcibly stopped. |
Recommended action |
No action is required. |
OS Graceful Shutdown
Event code |
0x203000de |
Message text |
OS Graceful Shutdown |
Variable fields |
N/A |
Severity level |
Info |
Example |
OS Graceful Shutdown |
Impact |
The system shut down. |
Cause |
The Windows OS was shut down gracefully. |
Recommended action |
No action is required. |
Slot / Connector
Device disabled: PCIe module information not obtained
Event code |
0x21000012 |
Message text |
Device disabled: PCIe module information not obtained---Slot $1 |
Variable fields |
$1: PCIe slot number. |
Severity level |
Major |
Example |
Device Disabled: PCIe module information not obtained---Slot 1 |
Impact |
The PCIe module cannot be identified, which decreases system performance. |
Cause |
The PCIe module is faulty. |
Recommended action |
1. Verify that the server starts up with the minimum configuration. For more information, see the H3C Servers Troubleshooting Guide. 2. Verify whether the port is disabled in the BIOS. 3. Verify that the PCIe module is compatible with the server. 4. Verify that the PCIe module is installed correctly. 5. Install the PCIe module in another slot to verify that the PCIe module is not faulty. 6. If the problem persists, contact technical support. |
Fault Status asserted
Event code |
0x210000de |
Message text |
Fault Status asserted:---fan error in slot $1 |
Variable fields |
$1: Slot number. |
Severity level |
Major |
Example |
Fault Status asserted:---fan error in slot 1 |
Impact |
The system might crash due to a PCIe module error. |
Cause |
This message is generated when the OCP fan is absent or blocked. |
Recommended action |
1. Re-install the OCP fan. 2. If the issue persists, replace the OCP fan. |
Transition to Non-Critical from OK
Event code |
0x2110000e |
Message text |
Transition to Non-Critical from OK---slot $1----PCIe Name:$2 |
Variable fields |
$1: PCIe slot number. $2: PCIe module name. |
Severity level |
Major |
Example |
Transition to Non-Critical from OK---slot 2----PCIe Name:NIC-620F-B2-25Gb-2P-1-X |
Impact |
The system might crash due to a PCIe module error. |
Cause |
This message is generated when the system fails to obtain information about network adapter connection. |
Recommended action |
1. Verify that the network adapter is not faulty. 2. Verify that the related links, for example, I2C or MCTP, are operating correctly. |
System ACPI Power State
S0 / G0 "working"
Event code |
0x220000de |
Message text |
S0 / G0 "working" |
Variable fields |
N/A |
Severity level |
Info |
Example |
S0 / G0 "working" |
Impact |
No negative impact. |
Cause |
S0/G0 indicates that the system is operating correctly. G0 through G3 are the global states (G-states), and S0 through S5 are the sleep states (S-states). G0 (working): applications can run. S0: the normal working state. |
Recommended action |
No action is required. |
S0 / G0 "working"
Event code |
0x220800de |
Message text |
S0 / G0 "working"---$1 |
Variable fields |
$1: Reason for a power-on operation, including: · due to virtual power button pressed · due to physical power button pressed · due to ipmi cmd · due to redfish cmd · due to AC lost · due to kvm button pressed · due to snmp cmd |
Severity level |
Info |
Example |
S0 / G0 "working"--- due to virtual power button pressed |
Impact |
No negative impact. |
Cause |
The system is powered on. |
Recommended action |
No action is required. |
S5 / G2 "soft-off"
Event code |
0x225000de |
Message text |
S5 / G2 "soft-off" |
Variable fields |
N/A |
Severity level |
Info |
Example |
S5 / G2 "soft-off" |
Impact |
No negative impact. |
Cause |
S5 / G2 indicates the soft-off state. Neither applications nor the operating system can run in this state. The entire system is shut down except for the power supply unit, and almost no power is consumed. Waking the system from soft-off takes longer, because the system must reboot. |
Recommended action |
No action is required. |
S5 / G2 "soft-off"
Event code |
0x225000de |
Message text |
S5 / G2 "soft-off"---$1 |
Variable fields |
$1: Reason for a power-off operation, including: · due to virtual power button pressed · due to physical power button pressed · due to ipmi cmd · due to redfish cmd · due to AC lost · due to kvm button pressed · due to snmp cmd |
Severity level |
Info |
Example |
S5 / G2 "soft-off"--- due to virtual power button pressed |
Impact |
No negative impact. |
Cause |
S5 / G2 indicates the soft-off state. Neither applications nor the operating system can run in this state. The entire system is shut down except for the power supply unit, and almost no power is consumed. Waking the system from soft-off takes longer, because the system must reboot. |
Recommended action |
No action is required. |
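The S0 / G0 "working" and S5 / G2 "soft-off" messages optionally append a reason ($1) after "---", drawn from the list documented above. A minimal Python sketch for splitting the state from the reason is shown below; `parse_power_state` and `is_documented_reason` are hypothetical helpers, and the reason set is copied from this reference, so a firmware version that reports other reasons would need the set extended.

```python
from typing import Optional, Tuple

# Reasons listed for $1 in the S0 / G0 "working" (power-on) and
# S5 / G2 "soft-off" (power-off) messages in this reference.
DOCUMENTED_REASONS = frozenset({
    "due to virtual power button pressed",
    "due to physical power button pressed",
    "due to ipmi cmd",
    "due to redfish cmd",
    "due to AC lost",
    "due to kvm button pressed",
    "due to snmp cmd",
})


def parse_power_state(text: str) -> Tuple[str, Optional[str]]:
    """Split a power-state message into the state and the optional $1 reason."""
    state, sep, reason = text.partition("---")
    if not sep:
        # Variant without $1, for example event code 0x220000de.
        return state.strip(), None
    return state.strip(), reason.strip()


def is_documented_reason(reason: Optional[str]) -> bool:
    """True if the reason is one of the values listed in this reference."""
    return reason in DOCUMENTED_REASONS
```

Pairing each power-off reason with the next power-on reason gives a simple audit of who or what cycled the server, for example an IPMI or Redfish command versus an AC loss.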
S4 / S5 soft-off, particular S4 / S5 state cannot be determined
Event code |
0x226000de |
Message text |
S4 / S5 soft-off, particular S4 / S5 state cannot be determined |
Variable fields |
N/A |
Severity level |
Info |
Example |
S4 / S5 soft-off, particular S4 / S5 state cannot be determined |
Impact |
No negative impact. |
Cause |
S4/S5 indicates a soft-off state, but whether the current state is S4 or S5 cannot be determined. S0 through S5 are the sleep states (S-states). In the S4 state: · All components are powered off, including RAM. · Only the platform settings are retained; other data is saved to a dedicated area on the drive. · After the system enters S4, it shuts down. · Because almost all programs and configurations have stopped, power consumption is less than 3 W. · Upon wake-up, the system must go through the BIOS boot sequence again. · No system restart is required; the system continues in the S5 shutdown state. |
Recommended action |
No action is required. |
LPC Reset occurred
Event code |
0x22d000de |
Message text |
LPC Reset occurred |
Variable fields |
N/A |
Severity level |
Info |
Example |
LPC Reset occurred |
Impact |
No negative impact. |
Cause |
The server was reset. This message is available only for servers that use Intel processors. |
Recommended action |
No action is required. |
Watchdog2
Watchdog overflow.Action:Timer expired
Event code |
0x230000de |
Message text |
Watchdog overflow.Action:Timer expired - status only (no action and no interrupt)---interrupt type:$1---timer use at expiration:$2 |
Variable fields |
$1: Interrupt type. Options include none, SMI, NMI, Messaging Interrupt, and unspecified. $2: Watchdog timer use at expiration. Options include reserved, BIOS FRB2, BIOS POST, OS Load, SMS OS, OEM, and unspecified. |
Severity level |
Info |
Example |
Watchdog overflow.Action:Timer expired - status only (no action and no interrupt)---interrupt type:none---timer use at expiration:BIOS FRB2 |
Impact |
System startup failure might occur. |
Cause |
This message is generated when the following conditions are met: · The watchdog is enabled in the BIOS. · The watchdog timer expires. · The timeout action is set to no action. |
Recommended action |
1. For a BIOS POST watchdog timeout, review the event logs to identify hardware errors or BIOS startup errors, and troubleshoot the errors as instructed in the logs. 2. For an OS Load watchdog timeout, verify that no error is present in the system startup environment. If no error is present, proceed to step 5. 3. For an OS Running watchdog timeout, review the OS logs to identify whether software exceptions occurred and troubleshoot the exceptions as instructed in the logs. 4. Identify whether data storms have occurred. If yes, troubleshoot network errors. 5. If the problem persists, contact technical support. |
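All watchdog overflow messages share the layout "Watchdog overflow.Action:<action>---interrupt type:$1---timer use at expiration:$2", and the timer use ($2) decides which recommended step applies first. The Python sketch below parses the three fields and maps the timer use to a starting step. The helper names are hypothetical, and treating "SMS OS" as the OS Running watchdog is an assumption based on IPMI timer-use naming, not something this reference states.

```python
import re
from typing import NamedTuple, Optional

WATCHDOG_RE = re.compile(
    r"^Watchdog overflow\.Action:(?P<action>.+?)"
    r"---interrupt type:(?P<interrupt>.+?)"
    r"---timer use at expiration:(?P<timer_use>.+?)\s*$"
)


class WatchdogEvent(NamedTuple):
    action: str          # for example "Hard Reset" or "Power Down"
    interrupt_type: str  # $1
    timer_use: str       # $2


def parse_watchdog(text: str) -> Optional[WatchdogEvent]:
    m = WATCHDOG_RE.match(text.strip())
    if m is None:
        return None
    return WatchdogEvent(m["action"], m["interrupt"], m["timer_use"])


# Maps $2 to the first recommended step. "SMS OS" -> step 3 (OS Running)
# is an assumption; unmapped values such as "BIOS FRB2" return None.
FIRST_STEP = {"BIOS POST": 1, "OS Load": 2, "SMS OS": 3}


def first_step(event: WatchdogEvent) -> Optional[int]:
    """Return the step to start from, or None if no timer-specific step applies."""
    return FIRST_STEP.get(event.timer_use)
```

When `first_step` returns None, the timer-independent steps (checking for data storms, then contacting technical support) remain the starting point.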
Watchdog overflow.Action:Hard Reset
Event code |
0x231000de |
Message text |
Watchdog overflow.Action:Hard Reset---interrupt type:$1---timer use at expiration:$2 |
Variable fields |
$1: Interrupt type. Options include none, SMI, NMI, Messaging Interrupt, and unspecified. $2: Watchdog timer use at expiration. Options include reserved, BIOS FRB2, BIOS POST, OS Load, SMS OS, OEM, and unspecified. |
Severity level |
Major |
Example |
Watchdog overflow.Action:Hard Reset---interrupt type:none---timer use at expiration:BIOS FRB2 |
Impact |
System startup failure might occur. |
Cause |
This message is generated when the following conditions are met: · The watchdog is enabled in the BIOS. · The watchdog timer expires during the BIOS POST, OS Load, or SMS/OS phase (indicated by the watchdog timer type). · The timeout action is set to hard reset. |
Recommended action |
1. For a BIOS POST watchdog timeout, review the event logs to identify hardware errors or BIOS startup errors, and troubleshoot the errors as instructed in the logs. 2. For an OS Load watchdog timeout, verify that no error is present in the system startup environment. If no error is present, proceed to step 5. 3. For an OS Running watchdog timeout, review the OS logs to identify whether software exceptions occurred and troubleshoot the exceptions as instructed in the logs. 4. Identify whether data storms have occurred. If yes, troubleshoot network errors. 5. If the problem persists, contact technical support. |
Watchdog overflow.Action:Power Down
Event code |
0x232000de |
Message text |
Watchdog overflow.Action:Power Down---interrupt type:$1---timer use at expiration:$2 |
Variable fields |
$1: Interrupt type. Options include none, SMI, NMI, Messaging Interrupt, and unspecified. $2: Timer use at expiration (watchdog timer type). Options include reserved, BIOS FRB2, BIOS POST, OS Load, SMS OS, OEM, and unspecified. |
Severity level |
Major |
Example |
Watchdog overflow.Action:Power Down---interrupt type:none---timer use at expiration:BIOS FRB2 |
Impact |
System startup failure might occur. |
Cause |
This message is generated when the following conditions are met: · The watchdog is enabled in the BIOS. · The watchdog timer expires during the BIOS POST, OS Load, or SMS/OS phase (indicated by the watchdog timer type). · The timeout action is set to power down. The watchdog then forcibly powers off the system, which interrupts services and causes unsaved data to be lost. |
Recommended action |
1. For a BIOS POST watchdog timeout, review the event logs to identify hardware errors or BIOS startup errors, and troubleshoot the errors as instructed in the logs. 2. For an OS Load watchdog timeout, verify that no error is present in the system startup environment. If no error is present, proceed to step 5. 3. For an OS Running watchdog timeout, review the OS logs to identify whether software exceptions occurred, and troubleshoot the exceptions as instructed in the logs. 4. Identify whether data storms have occurred. If they have, troubleshoot the network errors. 5. If the problem persists, contact technical support. |
Watchdog overflow.Action:Power Cycle
Event code |
0x233000de |
Message text |
Watchdog overflow.Action:Power Cycle---interrupt type:$1---timer use at expiration:$2 |
Variable fields |
$1: Interrupt type. Options include none, SMI, NMI, Messaging Interrupt, and unspecified. $2: Timer use at expiration (watchdog timer type). Options include reserved, BIOS FRB2, BIOS POST, OS Load, SMS OS, OEM, and unspecified. |
Severity level |
Major |
Example |
Watchdog overflow.Action:Power Cycle---interrupt type:none---timer use at expiration:BIOS FRB2 |
Impact |
System startup failure might occur. |
Cause |
This message is generated when the following conditions are met: · The watchdog is enabled in the BIOS. · The watchdog timer expires during the BIOS POST, OS Load, or SMS/OS phase (indicated by the watchdog timer type). · The timeout action is set to power cycle. |
Recommended action |
1. For a BIOS POST watchdog timeout, review the event logs to identify hardware errors or BIOS startup errors, and troubleshoot the errors as instructed in the logs. 2. For an OS Load watchdog timeout, verify that no error is present in the system startup environment. If no error is present, proceed to step 5. 3. For an OS Running watchdog timeout, review the OS logs to identify whether software exceptions occurred, and troubleshoot the exceptions as instructed in the logs. 4. Identify whether data storms have occurred. If they have, troubleshoot the network errors. 5. If the problem persists, contact technical support. |
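The three watchdog overflow messages share one message-text template that differs only in the action (Hard Reset, Power Down, or Power Cycle). The Python sketch below is an illustration, not an HDM tool: it splits a message into its action, interrupt type ($1), and timer use at expiration ($2), and then picks the first recommended step to try. Two assumptions are not stated in this reference: the SMS OS timer use is treated as the OS Running phase named in the recommended action, and any other timer use (for example, BIOS FRB2 or OEM) falls back to step 4. The names `WatchdogEvent`, `parse_watchdog`, and `first_recommended_step` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Template from the Message text rows:
# "Watchdog overflow.Action:<action>---interrupt type:$1---timer use at expiration:$2"
WATCHDOG_RE = re.compile(
    r"^Watchdog overflow\.Action:(?P<action>Hard Reset|Power Down|Power Cycle)"
    r"---interrupt type:(?P<interrupt>.+?)"
    r"---timer use at expiration:(?P<timer_use>.+)$"
)


@dataclass
class WatchdogEvent:
    action: str          # Hard Reset, Power Down, or Power Cycle
    interrupt_type: str  # $1, for example "none" or "NMI"
    timer_use: str       # $2, for example "BIOS POST" or "OS Load"


def parse_watchdog(message: str) -> Optional[WatchdogEvent]:
    """Return the parsed fields, or None if the line is not a watchdog overflow message."""
    m = WATCHDOG_RE.match(message.strip())
    if m is None:
        return None
    return WatchdogEvent(m["action"], m["interrupt"], m["timer_use"])


# Assumption: "SMS OS" maps to the "OS Running" step; unlisted timer uses start at step 4.
FIRST_STEP = {"BIOS POST": 1, "OS Load": 2, "SMS OS": 3}


def first_recommended_step(event: WatchdogEvent) -> int:
    """Return the number of the recommended-action step to start with."""
    return FIRST_STEP.get(event.timer_use, 4)
```

For example, `first_recommended_step(parse_watchdog("Watchdog overflow.Action:Power Down---interrupt type:NMI---timer use at expiration:OS Load"))` returns 2, the OS Load step.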
Entity Presence
Entity Present---License is about to expire
Event code |
0x250000de |
Message text |
Entity Present---License is about to expire |
Variable fields |
N/A |
Severity level |
Minor |
Example |
Entity Present---License is about to expire |
Impact |
No negative impact. |
Cause |
This message is generated when the remaining validity period of the license is less than 10 days. |
Recommended action |
The temporary license is about to expire. Purchase and activate the formal license. |
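As a minimal sketch of the threshold in the Cause row, the Python function below returns the license message you would expect for a given expiry date: the about-to-expire message when fewer than 10 days of validity remain and, for comparison, the Entity Disabled---License has expired message documented in the next entry. It models only the date arithmetic and does not query HDM. The function name `license_event` is hypothetical, and treating a license as expired only after its expiry date has passed is an assumption, because this reference does not define that boundary.

```python
from datetime import date
from typing import Optional

# From the Cause row: generated when the remaining validity period is less than 10 days.
EXPIRY_WARNING_DAYS = 10


def license_event(expiry: date, today: date) -> Optional[str]:
    """Return the expected license message text, or None while the license is comfortably valid."""
    remaining = (expiry - today).days
    if remaining < 0:
        # Assumption: the license counts as expired once the expiry date has passed.
        return "Entity Disabled---License has expired"
    if remaining < EXPIRY_WARNING_DAYS:
        return "Entity Present---License is about to expire"
    return None
```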
Entity Disabled---License has expired
Event code |
0x252000de |
Message text |
Entity Disabled---$1 |
Variable fields |
$1: License state: · License has expired. · License is unavailable. |
Severity level |
Minor |
Example |
Entity Disabled---License has expired |
Impact |
No negative impact. |
Cause |
The license has expired or is unavailable. |
Recommended action |
1. If the temporary license has expired, purchase and activate the formal license. 2. If the license is unavailable, reinstall and activate the existing license, or contact technical support. |
Management Subsystem Health
Controller access degraded or unavailable
Event code |
0x281000de |
Message text |
Controller access degraded or unavailable---$1 |
Variable fields |
$1: Error description. Options include Failed to access the SD card and SD card partitions are missing. |
Severity level |
Major |
Example |
Controller access degraded or unavailable---Failed to access the SD card. |
Impact |
No negative impact. |
Cause |
The BMC failed to read the SD card, or the SD card is missing. |
Recommended action |
1. Restart the BMC. 2. Reinstall the SD module of the BMC. 3. If the problem persists, contact technical support. |
Management controller off-line
Event code |
0x282000de |
Message text |
Management controller off-line---$1 |
Variable fields |
$1: BMC reboot cause. |
Severity level |
Info |
Example |
Management controller off-line---BMC reset |
Impact |
No negative impact. |
Cause |
The BMC was restarted. |
Recommended action |
No action is required. |
Battery
Battery low (predictive failure)
Event code |
0x290000de |
Message text |
Battery low (predictive failure)---PCIe slot:$1 |
Variable fields |
$1: PCIe slot number of the storage controller. |
Severity level |
Minor |
Example |
Battery low (predictive failure)---PCIe slot:1 |
Impact |
The reliability of the RAID controller will degrade, which might cause system performance degradation. |
Cause |
The supercapacitor of the storage controller has a low charge, overtemperature, overvoltage, or overcurrent condition. |
Recommended action |
1. Power on the server to charge the supercapacitor. Then log in to HDM, verify that the supercapacitor of the RAID controller is in normal state, and verify that the alarm has been cleared. 2. Verify that the power fail safeguard module is installed correctly. 3. Replace the corresponding components, including the battery, supercapacitor, or flash card (if any), and then restart the server. 4. If the problem persists, contact technical support. |
Battery failed
Event code |
0x291000de |
Message text |
Battery failed---PCIe slot:$1 |
Variable fields |
$1: PCIe slot number of the storage controller. |
Severity level |
Minor |
Example |
Battery failed---PCIe slot:1 |
Impact |
The reliability of the RAID controller will degrade, which might cause system performance degradation. |
Cause |
An internal error occurred on the power fail safeguard module of the storage controller. Possible reasons include: · The supercapacitor is exhausted or has expired. · The power fail safeguard module failed to be initialized. · The power fail safeguard module subsystem failed. · The supercapacitor failed to be charged. · The battery or supercapacitor failed. |
Recommended action |
1. Log in to HDM, and verify that the supercapacitor of the RAID controller is in normal state. 2. Verify that the power fail safeguard module is installed correctly. 3. Replace the corresponding components, including the battery, supercapacitor, or flash card (if any), and then restart the server. 4. If the problem persists, contact technical support. |
Battery presence detected
Event code |
0x292000df |
Message text |
Battery presence detected---PCIe slot:$1 |
Variable fields |
$1: PCIe slot number of the storage controller. |
Severity level |
Info |
Example |
Battery presence detected---PCIe slot:1 |
Impact |
The reliability of the RAID controller will degrade, which might cause system performance degradation. |
Cause |
The battery or supercapacitor of the RAID controller is absent. |
Recommended action |
1. Log in to HDM, and verify that the supercapacitor of the RAID controller is in normal state. 2. Verify that the supercapacitor is installed correctly and that the supercapacitor cable is connected correctly. 3. Replace the corresponding components, including the battery, supercapacitor, or flash card (if any), and then restart the server. 4. If the problem persists, contact technical support. |
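All three Battery messages end in the same ---PCIe slot:$1 suffix. As a minimal sketch, assuming log lines arrive exactly in the documented message-text format, the Python code below extracts the PCIe slot number of the storage controller and looks up the event code and severity level listed in the corresponding entry. `BATTERY_EVENTS` and `parse_battery` are hypothetical names and are not part of HDM.

```python
import re
from typing import Optional, Tuple

# Event codes and severity levels copied from the three Battery entries.
BATTERY_EVENTS = {
    "Battery low (predictive failure)": ("0x290000de", "Minor"),
    "Battery failed": ("0x291000de", "Minor"),
    "Battery presence detected": ("0x292000df", "Info"),
}

# "<message name>---PCIe slot:$1", where $1 is the storage controller's PCIe slot number.
BATTERY_RE = re.compile(r"^(?P<name>.+)---PCIe slot:(?P<slot>\d+)$")


def parse_battery(message: str) -> Optional[Tuple[str, str, str, int]]:
    """Return (name, event code, severity, PCIe slot), or None for non-battery messages."""
    m = BATTERY_RE.match(message.strip())
    if m is None or m["name"] not in BATTERY_EVENTS:
        return None
    code, severity = BATTERY_EVENTS[m["name"]]
    return m["name"], code, severity, int(m["slot"])
```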
Version Change
Hardware incompatibility detected with associated Entity---Memory is not certified
Event code |
0x2b2000de |
Message text |
Hardware incompatibility detected with associated Entity---Memory is not certified---Location:CPU:$1 CH:$2 DIMM:$3 |
Variable fields |
$1: CPU number. $2: Channel number. $3: DIMM number. |
Severity level |
Minor |
Example |
Hardware incompatibility detected with associated Entity---Memory is not certified---Location:CPU:1 CH:1 DIMM:0 |
Impact |
No negative impact. |
Cause |
This message is generated when the DIMM is not certified. |
Recommended action |
The DIMM is not certified. Replace it with a certified DIMM. |
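To locate the uncertified DIMM from a log line, the Location:CPU:$1 CH:$2 DIMM:$3 suffix can be split into its three numbers. The Python sketch below assumes the suffix appears exactly as in the Message text row; `DimmLocation` and `parse_dimm_location` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional


class DimmLocation(NamedTuple):
    cpu: int      # $1: CPU number
    channel: int  # $2: channel number
    dimm: int     # $3: DIMM number


LOCATION_RE = re.compile(r"Location:CPU:(\d+) CH:(\d+) DIMM:(\d+)")


def parse_dimm_location(message: str) -> Optional[DimmLocation]:
    """Return the DIMM location, or None if the message has no Location suffix."""
    m = LOCATION_RE.search(message)
    if m is None:
        return None
    return DimmLocation(*(int(g) for g in m.groups()))
```

Applied to the Example row, the function returns `DimmLocation(cpu=1, channel=1, dimm=0)`.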