Dell OM : Controller is in critical state

Dell.ManagedServer.Alert.4329 (Rule)

Knowledge Base article:

Summary

Controller critical state alert

Causes

Controller has generated critical alert. Probable causes and corresponding resolutions for this condition are:

Cause

Resolutions

An invalid SAS configuration has been detected on <Controller name>. Details: <error message>

Refer to the storage hardware documentation for information on correct cabling configurations.

Multi-bit ECC error on <Controller name> DIMM.

Replace the Dual In-line Memory Module (DIMM). The DIMM is a part of the controller battery pack. Refer to the storage hardware documentation for information on replacing the DIMM. You may need to restore data from backup.

Diagnostic message <message> from <Controller name>

Refer to the storage hardware documentation for more information on the diagnostic test.

Single-bit ECC error. The <Controller name> DIMM is critically degraded.

Replace the DIMM immediately to avoid data loss or data corruption. The DIMM is a part of the controller battery pack. Refer to the storage hardware documentation for information on replacing the DIMM.

Single-bit ECC error on <Controller name>.

Replace the DIMM immediately. The DIMM is a part of the controller battery pack. Refer to your hardware documentation for information on replacing the DIMM.

<Controller name> SAS SMP communications error <args>

There may be a SAS topology error. See the hardware documentation for information on correct SAS topology configurations. There may be problems with the cables such as a loose connection or an invalid cabling configuration. See the Cables Attached Correctly section for more information on checking the cables. See the hardware documentation for information on correct cabling configurations. Verify that the firmware version is supported.

<Controller name> SAS expander error: <args>

There may be a problem with the enclosure. Verify the health of the enclosure and its components. To verify the health of the enclosure, select the enclosure object in the tree view. The Health subtab displays a red X or yellow exclamation mark for enclosure components that are failed or degraded. See the enclosure documentation for more information.

A configuration command could not be committed to disk on <Controller name>

Re-issue the failed configuration command or try with a different set of physical disks. Contact technical support if the problem persists.

Resolutions

Additional information on this issue may be available. Launch the iDRAC Console to debug further.

Element properties:

TargetDell.ManagedServer
CategoryAlert
EnabledTrue
Event_ID4329
Event SourceLifeCycle Controller Log
Alert GenerateTrue
Alert SeverityError
Alert PriorityNormal
RemotableTrue
Alert Message
Dell OM : Controller is in critical state
Event Description: {0}
Event LogSystem

Member Modules:

ID Module Type TypeId RunAs 
DS DataSource Microsoft.Windows.EventProvider Default
Alert WriteAction System.Health.GenerateAlert Default
WriteToDW WriteAction Microsoft.SystemCenter.DataWarehouse.PublishEventData Default

Source Code:

<Rule ID="Dell.ManagedServer.Alert.4329" Enabled="true" Target="DellManagedServer!Dell.ManagedServer" ConfirmDelivery="false" Remotable="true" Priority="Normal" DiscardLevel="100">
<Category>Alert</Category>
<DataSources>
<DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
<ComputerName>$Target/Property[Type="DellManagedServer!Dell.ManagedServer"]/HostName$</ComputerName>
<LogName>System</LogName>
<Expression>
<And>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value Type="UnsignedInteger">4329</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="String">PublisherName</XPathQuery>
</ValueExpression>
<Operator>Equal</Operator>
<ValueExpression>
<Value Type="String">LifeCycle Controller Log</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</And>
</Expression>
</DataSource>
</DataSources>
<WriteActions>
<WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
<Priority>1</Priority>
<Severity>2</Severity>
<AlertMessageId>$MPElement[Name="Dell.ManagedServer.Alert.4329.Rule"]$</AlertMessageId>
<AlertParameters>
<AlertParameter1>$Data/EventDescription$</AlertParameter1>
</AlertParameters>
<Suppression>
<SuppressionValue>$Data/EventDisplayNumber$</SuppressionValue>
<SuppressionValue>$Data/Channel$</SuppressionValue>
<SuppressionValue>$Data/PublisherName$</SuppressionValue>
<SuppressionValue>$Data/LoggingComputer$</SuppressionValue>
<SuppressionValue>$Data/EventCategory$</SuppressionValue>
<SuppressionValue>$Data/EventLevel$</SuppressionValue>
<SuppressionValue>$Data/UserName$</SuppressionValue>
<SuppressionValue>$Data/EventNumber$</SuppressionValue>
<SuppressionValue>$Data/EventDescription$</SuppressionValue>
</Suppression>
<Custom1/>
<Custom2/>
<Custom3/>
<Custom4/>
<Custom5/>
<Custom6/>
<Custom7/>
<Custom8/>
<Custom9/>
<Custom10/>
</WriteAction>
<WriteAction ID="WriteToDW" TypeID="SCDW!Microsoft.SystemCenter.DataWarehouse.PublishEventData"/>
</WriteActions>
</Rule>