A host I/O card has failed and the alternate controller is locked down. This enumeration value is numerically equivalent to REC_HOST_BOARD_FAULT. It was added to align the failure recovery terminology with the major event log terminology. The two enumeration values may be used interchangeably.
What Caused the Problem?
A host I/O card in one of the controllers is not functioning properly. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.
Caution: Electrostatic discharge can damage sensitive components. Always use proper antistatic protection when handling components. Touching components without using a proper ground may damage the equipment.
Important Notes
You will need to replace the controller that has the failed host card. The failed controller is listed in the Component requiring service field in the Details area.
To ensure a complete configuration restore (both disk pool and traditional volume group), back up the storage array configuration data before you perform this procedure. This is especially important for simplex storage arrays and for controllers that operate without batteries. To save your configuration, open either the Command Line Interface (CLI) or the Script Editor from the Enterprise Management Window (EMW), and run the following command:
save storageArray dbmDatabase sourceLocation=onboard controller[a] contentType=all file="hostfile.zip";
Recovery Steps
If... | Then... |
Your storage array has one controller | Go to Procedure for Storage Arrays with One Controller. |
Your storage array has two controllers | If there are any hosts connected to this storage array that are NOT running a host-based, multi-path failover driver, stop I/O to the storage array from each of these hosts. Go to Procedure for Storage Arrays with Two Controllers. |
Procedure for Storage Arrays with One Controller
1 | Check the replacement part number of the affected controller to ensure that the new controller has the same replacement part number. |
2 | Stop all I/O to this storage array. |
3 | Turn off power to all power-fan canisters in the tray containing the failed controller. |
4 | Remove the affected controller. Refer to the Enterprise Management Window to view which management method you are using to manage this storage array. |
5 | If necessary, insert the battery from the old controller canister into the new replacement controller canister. Make sure at least 1 minute has elapsed and then insert the new (compatible) controller canister firmly into place. |
6 | Turn on power to all power-fan canisters in the tray. Wait until all drives have completed the spin-up process, and then go to step 7. |
7 | On the Hardware tab in the AMW, select the affected controller and view the status of the controller in the Properties pane. |
8 | Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative. |
Procedure for Storage Arrays with Two Controllers
1 | Place the affected controller offline. |
2 | Read all of the following steps before taking any action. The remaining recovery steps will no longer be accessible from the Recovery Guru dialog after you complete step a. |
Target | NetAppESeries.StorageArray |
Parent Monitor | NetAppESeries.StorageArrayAvailability |
Category | Custom |
Enabled | True |
Alert Generate | True |
Alert Severity | Error |
Alert Priority | Normal |
Alert Auto Resolve | True |
Monitor Type | NetAppESeries.FailureUnitMonitorType |
Remotable | True |
Accessibility | Internal |
Alert Message | |
RunAs | Default |
Comment | Machine generated entity |
<UnitMonitor ID="NetAppESeries.FailureID_0150_Monitor" Accessibility="Internal" Enabled="true" Target="NetAppESeries.StorageArray" ParentMonitorID="NetAppESeries.StorageArrayAvailability" Remotable="true" Priority="Normal" TypeID="NetAppESeries.FailureUnitMonitorType" ConfirmDelivery="true" Comment="Machine generated entity">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="NetAppESeries.REC_FAILED_HOST_IO_CARD_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Property[@Name='FailureDescription']$</AlertParameter1>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="NetAppESeries.StateIdC4319E02E02503AFCCA15F3433F68031" MonitorTypeStateID="NoIssue" HealthState="Success"/>
    <OperationalState ID="NetAppESeries.StateId4E1AE622E206A6F9416D1E6A4131EF74" MonitorTypeStateID="IssueFound" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <FailureID>150</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
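The monitor definition above can also be read programmatically, for example to audit polling intervals across many monitors. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It assumes a trimmed copy of the XML is available as a string. The helper name `summarize_monitor` and the dictionary keys are hypothetical and are not part of the management pack.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the monitor definition shown above
# (some attributes, alert parameters, and operational states omitted for brevity).
MONITOR_XML = """
<UnitMonitor ID="NetAppESeries.FailureID_0150_Monitor" Enabled="true" Target="NetAppESeries.StorageArray" ParentMonitorID="NetAppESeries.StorageArrayAvailability" TypeID="NetAppESeries.FailureUnitMonitorType">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="NetAppESeries.REC_FAILED_HOST_IO_CARD_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
  </AlertSettings>
  <Configuration>
    <FailureID>150</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text):
    """Return the monitor's key settings as a plain dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    config = root.find("Configuration")
    alert = root.find("AlertSettings")
    return {
        "id": root.get("ID"),
        "target": root.get("Target"),
        "enabled": root.get("Enabled") == "true",
        "failure_id": int(config.findtext("FailureID")),
        "interval_seconds": int(config.findtext("IntervalSeconds")),
        "timeout_seconds": int(config.findtext("TimeoutSeconds")),
        "alert_severity": alert.findtext("AlertSeverity"),
        "auto_resolve": alert.findtext("AutoResolve") == "true",
    }


summary = summarize_monitor(MONITOR_XML)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The values it reports (failure ID 150, a 59-second interval, a 300-second timeout) come directly from the `Configuration` element of the definition. Nothing is queried from a live management group.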