Dell OM : Memory is in critical state

Dell.Server.OOB.SNMPTrap.2265 (Rule)

Knowledge Base article:

Summary

Memory critical state alert

Causes

A memory device has generated a critical alert. Probable causes and corresponding resolutions for this condition are:

Cause

Resolutions

Multi-bit memory errors detected on a memory device at location(s) <location>.

Re-install the memory component. If the problem persists, contact technical support. Refer to the product documentation to choose a convenient contact method.

Parity memory errors detected on a memory device at location <location>.

Re-install the memory component. If the problem persists, contact technical support. Refer to the product documentation to choose a convenient contact method.

Stuck bit memory error detected on a memory device at location <location>.

Re-install the memory component. If the problem persists, contact technical support. Refer to the product documentation to choose a convenient contact method.

Memory device at location <location> is disabled.

Re-install the memory component. Review product documentation for supported memory configurations. If the problem continues, contact support.

Persistent correctable memory error limit reached for a memory device at location(s) <location>.

Re-install the memory component. If the problem persists, contact technical support. Refer to the product documentation to choose a convenient contact method.

Unsupported memory configuration; check memory device at location <location>.

Review product documentation for supported memory configurations.

Memory device at location <location> is overheating.

If unexpected, review system logs for power or thermal exceptions.

Correctable memory error rate exceeded for <location>.

Re-install the memory component. If the problem continues, contact support.

Memory device at location <location> failed to transition to a running state.

Re-install the memory component. If the problem continues, contact support.

Memory device at location <location> failed to power off.

Re-attempt the memory removal process.

Memory device at location <location> failed to transition to online.

Re-attempt the memory removal process.

Memory device at location <location> failed to transition to offline.

Re-attempt the memory removal process.

Memory device at location <location> is not installed correctly.

Re-install the memory component. If the problem continues, contact support.

Memory RAID redundancy is lost. Check memory device at location(s) <location>.

Re-install the memory component. If the problem continues, contact support.

Memory mirror redundancy is lost. Check memory device at location(s) <location>.

Review system logs for memory exceptions. Re-install memory at location <location>.

Memory spare redundancy is lost. Check memory device at location <location>.

Review system logs for memory exceptions. Re-install memory at location <location>.

Memory redundancy is lost.

Review system logs for memory exceptions. Re-install memory at location <location>.

A hardware mismatch detected for memory riser.

Review product documentation for proper memory riser installation and configuration.

Correctable memory error logging disabled for a memory device at location <location>.

Review system logs for memory exceptions. Re-install memory at location <location>.

Memory interconnect degraded.

  • Reseat the memory riser and restart the server.

  • If the issue is not resolved by the preceding action, reseat the processor and restart the server.

  • If the issue persists, contact your service provider.

Intel QPI interconnect <QPI link number> has degraded.

  • Restart the server and check whether the issue persists.

  • If the issue is not resolved by the preceding action, reseat the processors and restart the server.

  • If the issue persists, contact your service provider.

Intel SMI 2 Memory interconnect <link number> has degraded.

  • Reseat the memory riser and restart the server.

  • If the issue is not resolved by the preceding action, reseat the processor and restart the server.

  • If the issue persists, contact your service provider.

Intel QPI interconnect <QPI link number> has a non-recoverable issue.

  • Restart the server and check whether the issue persists.

  • If the issue is not resolved by the preceding action, reseat the processors and restart the server.

  • If the issue persists, contact your service provider.

Intel SMI 2 Memory interconnect <link number> has a non-recoverable issue.

  • Reseat the memory riser and restart the server.

  • If the issue is not resolved by the preceding action, reseat the processor and restart the server.

  • If the issue persists, contact your service provider.

Intel DDR Memory interconnect <link number> has a non-recoverable issue.

Reseat all DIMMs on the memory riser, reseat the memory riser, and restart the server. If the issue persists, contact your service provider.

Resolutions

Additional information on this issue may be available. Launch the iDRAC Console to debug further.

Element properties:

Target: Dell.Server
Category: AvailabilityHealth
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Remotable: True
Alert Message:
Dell OM : Memory is in critical state
Event Description: {0}
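
As a rough illustration of how the `{0}` placeholder in the alert message is filled: `{0}` stands for the first alert parameter, and the rule source binds `AlertParameter1` to SNMP varbind 4 (the alert message text). The Python sketch below only mimics that substitution. The function name `render_alert_description` and the sample message are hypothetical and are not part of the management pack.

```python
# Hypothetical helper: mimics how "{0}" in the alert message is replaced
# by the first alert parameter (AlertParameter1 in the rule source).
ALERT_NAME = "Dell OM : Memory is in critical state"
ALERT_DESCRIPTION_TEMPLATE = "Event Description: {0}"


def render_alert_description(alert_parameters):
    """Substitute positional alert parameters into the description template."""
    return ALERT_DESCRIPTION_TEMPLATE.format(*alert_parameters)


if __name__ == "__main__":
    # Sample text only; a real trap carries whatever message iDRAC sends.
    sample_parameters = [
        "Memory device at location DIMM.Socket.A1 is disabled."
    ]
    print(ALERT_NAME)
    print(render_alert_description(sample_parameters))
```

Because the template has a single placeholder, any parameters beyond the first are ignored in this sketch.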

Member Modules:

ID            Module Type  TypeId                                          RunAs
DS            DataSource   System.NetworkManagement.SnmpTrapEventProvider  Default
Alert         WriteAction  System.Health.GenerateAlert                     Default
ResolveAlert  WriteAction  Dell.Operations.Server.ResolveAlert.WAT         Default

Source Code:

<Rule ID="Dell.Server.OOB.SNMPTrap.2265" Enabled="true" Target="DellModelServer!Dell.Server" ConfirmDelivery="false" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>AvailabilityHealth</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="NetworkDevice!System.NetworkManagement.SnmpTrapEventProvider">
      <IP>$Target/Property[Type="DellModelServer!Dell.Server"]/RemoteAccessIP$</IP>
      <OIDProps>
        <OIDProp>.1.3.6.1.4.1.674.10892.5.3.2.1.0.2265</OIDProp>
      </OIDProps>
      <EventOriginId>$Target/Id$</EventOriginId>
      <PublisherId>$Target/Id$</PublisherId>
      <PublisherName>iDRAC</PublisherName>
      <Channel>SnmpEvent</Channel>
      <LoggingComputer/>
      <EventNumber>2265</EventNumber>
      <EventCategory>5</EventCategory>
      <EventLevel>10</EventLevel>
      <UserName/>
      <Params/>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="Alert" TypeID="SystemHealth!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>2</Severity>
      <AlertName/>
      <AlertDescription/>
      <AlertOwner/>
      <AlertMessageId>$MPElement[Name="Dell.Server.OOB.SNMPTrap.2265.Rule"]$</AlertMessageId>
      <AlertParameters>
        <AlertParameter1>$Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[4]/Value$</AlertParameter1>
      </AlertParameters>
      <Suppression>
        <SuppressionValue>$Data/EventDisplayNumber$</SuppressionValue>
        <SuppressionValue>$Data/Channel$</SuppressionValue>
        <SuppressionValue>$Data/PublisherName$</SuppressionValue>
        <SuppressionValue>$Data/LoggingComputer$</SuppressionValue>
        <SuppressionValue>$Data/EventCategory$</SuppressionValue>
        <SuppressionValue>$Data/EventLevel$</SuppressionValue>
        <SuppressionValue>$Data/UserName$</SuppressionValue>
        <SuppressionValue>$Data/EventNumber$</SuppressionValue>
        <SuppressionValue>$Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[3]/Value$</SuppressionValue>
        <SuppressionValue>$Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[4]/Value$</SuppressionValue>
        <SuppressionValue>$Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[6]/Value$</SuppressionValue>
        <SuppressionValue>$Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[8]/Value$</SuppressionValue>
      </Suppression>
      <Custom1>Alert Message ID = $Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[3]/Value$ </Custom1>
      <Custom2>Alert Message = $Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[4]/Value$ </Custom2>
      <Custom3>Alert Status = $Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[5]/Value$ </Custom3>
      <Custom4>Alert Service Tag = $Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[6]/Value$ </Custom4>
      <Custom5>Alert FQDN = $Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[7]/Value$ </Custom5>
      <Custom6>Alert FQDD = $Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[8]/Value$ </Custom6>
      <Custom7/>
      <Custom8/>
      <Custom9/>
      <Custom10/>
    </WriteAction>
    <WriteAction ID="ResolveAlert" TypeID="DellOperationsCommon!Dell.Operations.Server.ResolveAlert.WAT">
      <IP>$Target/Property[Type="DellModelServer!Dell.Server"]/RemoteAccessIP$</IP>
      <MessageID>$Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[3]/Value$</MessageID>
      <FQDN>$Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[7]/Value$</FQDN>
      <FQDD>$Data/EventData/DataItem/SnmpVarBinds/SnmpVarBind[8]/Value$</FQDD>
      <EventNumber>$Data/EventNumber$</EventNumber>
      <Prefix>OOB</Prefix>
    </WriteAction>
  </WriteActions>
</Rule>
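
The varbind positions used throughout the rule can be summarized in a short Python sketch. It assumes the trap has already been decoded into an ordered list of varbind values (the decoding step is not shown) and mirrors only the 1-based positions 3 to 8 that the rule references. The names `VARBIND_FIELDS`, `extract_fields`, `suppression_key`, and `event_number_from_oid`, along with all sample values, are hypothetical. The suppression key here covers only the four varbind-based values, not the event-level ones such as `$Data/Channel$`.

```python
# Trap OID matched by the rule's OIDProp element.
TRAP_OID = ".1.3.6.1.4.1.674.10892.5.3.2.1.0.2265"

# 1-based varbind positions, labeled as in the rule's Custom1..Custom6 fields.
VARBIND_FIELDS = {
    3: "message_id",   # Custom1: Alert Message ID
    4: "message",      # Custom2: Alert Message (also AlertParameter1)
    5: "status",       # Custom3: Alert Status
    6: "service_tag",  # Custom4: Alert Service Tag
    7: "fqdn",         # Custom5: Alert FQDN
    8: "fqdd",         # Custom6: Alert FQDD
}

# Varbind positions that the rule lists as SuppressionValue entries.
SUPPRESSION_POSITIONS = (3, 4, 6, 8)


def event_number_from_oid(oid):
    """Return the last arc of the trap OID as an integer."""
    return int(oid.rstrip(".").rsplit(".", 1)[-1])


def extract_fields(varbinds):
    """Map an ordered list of varbind values to named fields."""
    needed = max(VARBIND_FIELDS)
    if len(varbinds) < needed:
        raise ValueError(f"expected at least {needed} varbinds, got {len(varbinds)}")
    return {name: varbinds[pos - 1] for pos, name in VARBIND_FIELDS.items()}


def suppression_key(varbinds):
    """Build the varbind part of the key used to group repeated alerts."""
    return tuple(varbinds[pos - 1] for pos in SUPPRESSION_POSITIONS)


if __name__ == "__main__":
    # Illustrative values only; positions 1 and 2 are not used by the rule.
    sample = [
        "unused-1",
        "unused-2",
        "SAMPLE-ID",
        "Memory device at location DIMM.Socket.A1 is disabled.",
        "sample-status",
        "ABC1234",
        "server01.example.com",
        "DIMM.Socket.A1",
    ]
    print(event_number_from_oid(TRAP_OID))
    print(extract_fields(sample))
    print(suppression_key(sample))
```

Two traps that yield the same suppression key (together with identical event-level values) would be expected to increment the repeat count of one alert rather than raise a second alert. The status varbind at position 5 is deliberately left out of the key, matching the rule's `Suppression` element.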