Monitor REC_FAILED_HOST_IO_CARD (150)

NetAppESeries.FailureID_0150_Monitor (UnitMonitor)

A host I/O card has failed and the alternate controller is locked down. This enumeration value is numerically equivalent to REC_HOST_BOARD_FAULT. It was added to align the failure recovery terminology with the major event log terminology. The two enumeration values may be used interchangeably.

Knowledge Base article:

Failed I/O Host Card

What Caused the Problem?

An I/O host card in one of the controllers is not functioning properly. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.

 Caution: Electrostatic discharge can damage sensitive components. Always use proper antistatic protection when handling components. Touching components without using a proper ground may damage the equipment.

Important Notes

 

Recovery Steps

If: Your storage array has one controller
Then: Go to Procedure for Storage Arrays with One Controller.

If: Your storage array has two controllers
Then: If there are any hosts connected to this storage array that are NOT running a host-based, multi-path failover driver, stop I/O to the storage array from each of these hosts. Go to Procedure for Storage Arrays with Two Controllers.

Procedure for Storage Arrays with One Controller

1. Check the replacement part number of the affected controller to ensure that the new controller has the same replacement part number.

a. On the Hardware tab in the Array Management Window (AMW), select the affected controller.

b. Identify the "Replacement part number" in the Properties pane.

If: The replacement controller has the same part number
Then: Go to step 2.

If: The replacement controller does NOT have the same part number
Then: Do not continue with the remaining recovery steps. Contact your Technical Support Representative.

2. Stop all I/O to this storage array.

3. Turn off power to all power-fan canisters in the tray containing the failed controller.

4. Remove the affected controller. Refer to the Enterprise Management Window to determine which management method you are using to manage this storage array.

If: You are using In-Band management for ALL hosts attached to this storage array
Then: Go to step 5.

If: You are using Out-of-Band management for ANY host attached to this storage array
Then: Before you insert a new controller canister into the storage array, you must update the DHCP/BOOTP server for each Out-of-Band managed host so that it associates the new controller's hardware Ethernet (MAC) address with the DNS/network name and IP address previously assigned to the removed controller.

To update the DHCP/BOOTP server, find the entry associated with the removed controller and replace its Ethernet (MAC) address with the new controller's Ethernet (MAC) address. The controller's Ethernet (MAC) address is located on an Ethernet ID label on the controller canister in the form xx.xx.xx.xx.xx.xx.

When you are finished, go to step 5.

5. If necessary, insert the battery from the old controller canister into the new replacement controller canister. Make sure at least 1 minute has elapsed and then insert the new (compatible) controller canister firmly into place.

6. Turn on power to all power-fan canisters in the tray. Wait until all drives have completed the spin-up process, and then go to step 7.

7. On the Hardware tab in the AMW, select the affected controller and view the status of the controller in the Properties pane.

8. Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative.

Procedure for Storage Arrays with Two Controllers

1. Place the affected controller offline.

a. Select the controller on the Hardware tab in the Array Management Window.

b. Select the Hardware > Controller > Advanced > Place > Offline menu option.

c. Follow the instructions in the dialog, then click the Yes button.

2. Read all of the following steps before taking any action. The remaining recovery steps will no longer be accessible from the Recovery Guru dialog after you complete step a.

a. Click the Recheck button to rerun the Recovery Guru.

b. Select the "Offline Controller" problem that is being reported in the Summary area.

c. Complete the recovery steps in the "Offline Controller" recovery procedure to replace the affected controller.

Element properties:

Target: NetAppESeries.StorageArray
Parent Monitor: NetAppESeries.StorageArrayAvailability
Category: Custom
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: NetAppESeries.FailureUnitMonitorType
Remotable: True
Accessibility: Internal
Alert Message: Alert: REC_FAILED_HOST_IO_CARD
A host I/O card has failed and the alternate controller is locked down. This enumeration value is numerically equivalent to REC_HOST_BOARD_FAULT. It has been added so as to make the failure recovery terminology line up with the major event log terminology. The two enumeration values may be used interchangeably. Alert Value: {0}
RunAs: Default
Comment: Machine generated entity

Source Code:

<UnitMonitor ID="NetAppESeries.FailureID_0150_Monitor" Accessibility="Internal" Enabled="true" Target="NetAppESeries.StorageArray" ParentMonitorID="NetAppESeries.StorageArrayAvailability" Remotable="true" Priority="Normal" TypeID="NetAppESeries.FailureUnitMonitorType" ConfirmDelivery="true" Comment="Machine generated entity">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="NetAppESeries.REC_FAILED_HOST_IO_CARD_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Property[@Name='FailureDescription']$</AlertParameter1>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="NetAppESeries.StateIdC4319E02E02503AFCCA15F3433F68031" MonitorTypeStateID="NoIssue" HealthState="Success"/>
    <OperationalState ID="NetAppESeries.StateId4E1AE622E206A6F9416D1E6A4131EF74" MonitorTypeStateID="IssueFound" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <FailureID>150</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
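
The AlertMessage attribute of the monitor names a string resource, and the {0} placeholder in the alert text listed under Element properties is filled from AlertParameter1 (the FailureDescription property). The management pack's own resource and language pack sections are not reproduced on this page. The fragment below is only a sketch of how such a resource is typically declared, assuming the standard Operations Manager management pack schema and an English (ENU) language pack; the placement and the language pack ID are assumptions, while the resource ID and the text are taken from the monitor and its Alert Message property.

```xml
<!-- Sketch only: fragments from two different sections of a management pack. -->

<!-- Presentation section: declares the resource ID referenced by AlertSettings/@AlertMessage. -->
<Presentation>
  <StringResources>
    <StringResource ID="NetAppESeries.REC_FAILED_HOST_IO_CARD_AlertMessageResourceID"/>
  </StringResources>
</Presentation>

<!-- LanguagePacks section: supplies the alert name and description for that resource.
     The language pack ID "ENU" is an assumption. -->
<LanguagePacks>
  <LanguagePack ID="ENU" IsDefault="true">
    <DisplayStrings>
      <DisplayString ElementID="NetAppESeries.REC_FAILED_HOST_IO_CARD_AlertMessageResourceID">
        <Name>Alert: REC_FAILED_HOST_IO_CARD</Name>
        <!-- {0} is replaced at alert time by AlertParameter1 (FailureDescription). -->
        <Description>A host I/O card has failed and the alternate controller is locked down. This enumeration value is numerically equivalent to REC_HOST_BOARD_FAULT. It has been added so as to make the failure recovery terminology line up with the major event log terminology. The two enumeration values may be used interchangeably. Alert Value: {0}</Description>
      </DisplayString>
    </DisplayStrings>
  </LanguagePack>
</LanguagePacks>
```

Read this way, each additional AlertParameterN element in the monitor would map to the next placeholder ({1}, {2}, and so on) in the description; this monitor defines only AlertParameter1.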