Dell OM : PCI is in critical state

Dell.ManagedServer.Alert.2417 (Rule)

Knowledge Base article:

Summary

PCI critical state alert

Causes

PCI has generated a critical alert. The probable causes of this condition and their corresponding resolutions are:

Cause

Resolutions

A bus time-out was detected on a component at bus <bus> device <device> function <func>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

An I/O channel check error was detected.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A software error was detected on a component at bus <bus> device <device> function <func>.

Reboot the system and update the component drivers.

A PCI parity error was detected on a component at bus <bus> device <device> function <func>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A PCI system error was detected on a component at bus <bus> device <device> function <func>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A bus correctable error was detected on a component at bus <bus> device <device> function <func>.

Cycle input power and update the component drivers. If the device is removable, re-install it at the next scheduled service time.

A bus uncorrectable error was detected on a component at bus <bus> device <device> function <func>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A fatal error was detected on a component at bus <bus> device <device> function <func>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A bus fatal error was detected on a component at bus <bus> device <device> function <func>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A bus time-out was detected on a component at slot <number>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

An I/O channel check error was detected.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A software error was detected on a component at slot <number>.

Reboot the system and update the component drivers.

A PCI parity error was detected on a component at slot <number>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A PCI system error was detected on a component at slot <number>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A bus correctable error was detected on a component at slot <number>.

Cycle input power, update the component drivers, and remove and re-install the device at the next scheduled service time.

A bus uncorrectable error was detected on a component at slot <number>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A fatal error was detected on a component at slot <number>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A bus fatal error was detected on a component at slot <number>.

Cycle input power and update the component drivers. If the device is removable, re-install it.

A fatal IO error detected on a component at bus <bus> device <device> function <func>.

Cycle input power, update the component drivers, and remove and re-install the device.

A fatal IO error detected on a component at slot <number>.

Cycle input power, update the component drivers, and remove and re-install the device.

Device option ROM on embedded NIC failed to support Link Tuning or FlexAddress.

Update the BIOS, BMC/iDRAC, and LOM firmware. If the problem persists, contact customer support.

Failed to program virtual MAC address on a component at bus <bus> device <device> function <func>.

Update the BIOS, BMC/iDRAC, LOM, and mezzanine card firmware. If the problem persists, contact customer support.

Device option ROM on mezzanine card <number> failed to support Link Tuning or FlexAddress.

Update the BIOS, BMC/iDRAC, and mezzanine card firmware. If the problem persists, contact customer support.

Failed to get Link Tuning or FlexAddress data from iDRAC.

Update the BIOS and BMC/iDRAC firmware. If the problem persists, contact customer support.

Device option ROM on mezzanine card failed to support Link Tuning or FlexAddress.

Update the BIOS, BMC/iDRAC, and mezzanine card firmware. If the problem persists, contact customer support.

A power fault issue is detected in the PCIe adapter that was turned on in PCIe slot <slot number>.

Make sure that: 1) The PCIe adapter used is supported by the Chassis Management Controller (CMC). For a list of supported adapters, contact your service provider. 2) The adapter is not damaged. 3) The power requirement of the adapter does not exceed the power allocated to the slot. 4) The adapter is properly inserted in the slot.

An auxiliary power fault issue is detected in the PCIe adapter that was turned on in PCIe slot <slot number>.

Make sure that: 1) The PCIe adapter used is supported by the Chassis Management Controller (CMC). For a list of supported adapters, contact your service provider. 2) The adapter is not damaged. 3) The adapter is properly inserted in the slot.

The Chassis Management Controller (CMC) is unable to communicate with the PCIe switch board.

Do the following: 1) Perform an AC Power Cycle operation on the Chassis by disconnecting the chassis from AC power, waiting for 30 seconds, and then reconnecting AC power. 2) Power on at least one server. Using the Chassis Management Controller (CMC) web interface, select Server Overview->Power->Control and then click the Operation field and select the Power On Server operation. If the issue persists, contact your service provider.
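
Most of the cause messages above identify the failing component either by bus, device, and function or by slot number. As a minimal sketch, assuming the event description follows the wording of those messages, the location could be pulled out with two regular expressions. The function name `locate_component` and both patterns are hypothetical and are not part of the management pack.

```python
import re

# Assumed patterns, derived only from the placeholder wording in the
# cause messages ("at bus <bus> device <device> function <func>" and
# "slot <number>"). Real event text may differ.
BDF_RE = re.compile(r"\bat bus (\w+) device (\w+) function (\w+)")
SLOT_RE = re.compile(r"\bslot (\w+)")


def locate_component(description):
    """Return the component location found in an event description.

    The result is a dict with either bus/device/function keys or a
    slot key, or an empty dict when the message names no location.
    """
    match = BDF_RE.search(description)
    if match:
        bus, device, func = match.groups()
        return {"bus": bus, "device": device, "function": func}
    match = SLOT_RE.search(description)
    if match:
        return {"slot": match.group(1)}
    return {}
```

Messages that carry no location, such as the I/O channel check error, yield an empty dict, so a caller can fall back to the generic resolution steps.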

Resolutions

Additional information on this issue may be available. Launch the iDRAC Console to debug further.

Element properties:

Target: Dell.ManagedServer
Category: Alert
Enabled: True
Event_ID: 2417
Event Source: LifeCycle Controller Log
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Remotable: True
Alert Message:
Dell OM : PCI is in critical state
Event Description: {0}
Event Log: System
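
The alert severity and priority listed above appear as numeric codes in the rule's GenerateAlert write action (Severity 2 and Priority 1 in the source code for this rule). The sketch below shows one way to translate those codes into the labels this page uses. Only the pairs 2 = Error and 1 = Normal are confirmed by this page; the remaining entries and the helper name `describe_alert` are assumptions for illustration.

```python
# Code-to-label tables. Only severity 2 ("Error") and priority 1
# ("Normal") are confirmed by this page; the other entries are assumed.
SEVERITY_LABELS = {0: "Information", 1: "Warning", 2: "Error"}
PRIORITY_LABELS = {0: "Low", 1: "Normal", 2: "High"}


def describe_alert(priority, severity):
    """Translate numeric alert codes into readable labels."""
    return {
        "priority": PRIORITY_LABELS.get(priority, f"Unknown ({priority})"),
        "severity": SEVERITY_LABELS.get(severity, f"Unknown ({severity})"),
    }
```

Calling `describe_alert(1, 2)` with the values from this rule returns the Normal priority and Error severity shown in the element properties.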

Member Modules:

ID         Module Type  TypeId                                                 RunAs
DS         DataSource   Microsoft.Windows.EventProvider                        Default
Alert      WriteAction  System.Health.GenerateAlert                            Default
WriteToDW  WriteAction  Microsoft.SystemCenter.DataWarehouse.PublishEventData  Default

Source Code:

<Rule ID="Dell.ManagedServer.Alert.2417" Enabled="true" Target="DellManagedServer!Dell.ManagedServer" ConfirmDelivery="false" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Alert</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.EventProvider">
      <ComputerName>$Target/Property[Type="DellManagedServer!Dell.ManagedServer"]/HostName$</ComputerName>
      <LogName>System</LogName>
      <Expression>
        <And>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="UnsignedInteger">2417</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
          <Expression>
            <SimpleExpression>
              <ValueExpression>
                <XPathQuery Type="String">PublisherName</XPathQuery>
              </ValueExpression>
              <Operator>Equal</Operator>
              <ValueExpression>
                <Value Type="String">LifeCycle Controller Log</Value>
              </ValueExpression>
            </SimpleExpression>
          </Expression>
        </And>
      </Expression>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>2</Severity>
      <AlertMessageId>$MPElement[Name="Dell.ManagedServer.Alert.2417.Rule"]$</AlertMessageId>
      <AlertParameters>
        <AlertParameter1>$Data/EventDescription$</AlertParameter1>
      </AlertParameters>
      <Suppression>
        <SuppressionValue>$Data/EventDisplayNumber$</SuppressionValue>
        <SuppressionValue>$Data/Channel$</SuppressionValue>
        <SuppressionValue>$Data/PublisherName$</SuppressionValue>
        <SuppressionValue>$Data/LoggingComputer$</SuppressionValue>
        <SuppressionValue>$Data/EventCategory$</SuppressionValue>
        <SuppressionValue>$Data/EventLevel$</SuppressionValue>
        <SuppressionValue>$Data/UserName$</SuppressionValue>
        <SuppressionValue>$Data/EventNumber$</SuppressionValue>
        <SuppressionValue>$Data/EventDescription$</SuppressionValue>
      </Suppression>
      <Custom1/>
      <Custom2/>
      <Custom3/>
      <Custom4/>
      <Custom5/>
      <Custom6/>
      <Custom7/>
      <Custom8/>
      <Custom9/>
      <Custom10/>
    </WriteAction>
    <WriteAction ID="WriteToDW" TypeID="SCDW!Microsoft.SystemCenter.DataWarehouse.PublishEventData"/>
  </WriteActions>
</Rule>
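
The rule above does two things: it filters System log events on EventDisplayNumber 2417 and the publisher name "LifeCycle Controller Log", and it groups matching events through the listed suppression values. The following is a rough sketch of that logic, not the Operations Manager implementation. It models each event as a plain dict keyed by the field names from the XML, which is an assumption, as are the function names and the exact, case-sensitive string comparison.

```python
# Filter criteria copied from the rule's <Expression> block.
RULE_EVENT_ID = 2417
RULE_PUBLISHER = "LifeCycle Controller Log"

# Field names copied from the rule's <Suppression> block, in order.
SUPPRESSION_FIELDS = (
    "EventDisplayNumber", "Channel", "PublisherName", "LoggingComputer",
    "EventCategory", "EventLevel", "UserName", "EventNumber",
    "EventDescription",
)


def matches_rule(event):
    """Mirror the <And> expression: both conditions must hold."""
    return (event.get("EventDisplayNumber") == RULE_EVENT_ID
            and event.get("PublisherName") == RULE_PUBLISHER)


def suppression_key(event):
    """Build a grouping key from the suppression fields.

    Fields missing from the event dict contribute None to the key.
    """
    return tuple(event.get(field) for field in SUPPRESSION_FIELDS)


def count_alerts(events):
    """Count matching events per suppression key.

    Each distinct key stands for one alert; the count stands for how
    often that alert would repeat (hypothetical model of suppression).
    """
    counts = {}
    for event in events:
        if matches_rule(event):
            key = suppression_key(event)
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Because EventDescription is part of the key, two events from the same server with different cause messages end up as separate alerts in this model, while exact repeats collapse into one entry with a higher count.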