The failure type is Impending Drive Failure Risk - Medium.
What Caused the Problem?
A drive is reporting internal errors that could cause the drive to fail. If this drive fails, the volumes in the disk pool or volume group will become degraded. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.
Caution: Risk of Data Loss. If a "Degraded Volume" problem is also displayed in the Recovery Guru Summary area, always fix the "Degraded Volume" problem first. Fixing the "Impending Drive Failure" problem before fixing a "Degraded Volume" may result in data loss.
Caution: Electrostatic discharge can damage sensitive components. Always use proper antistatic protection when handling components. Touching the components without using a proper ground may damage the equipment.
Important Notes
The affected volumes are RAID 1, 3, 5, or 6. If the drive fails, the volumes may lose redundancy. Correct this problem as soon as possible.
If the affected drive is a member of a volume group, the recommended recovery procedure will require the use of a standby hot spare. Because hot spares will automatically spare for any similar failed or missing drives in the storage array, it is recommended that all failed or missing drives be replaced prior to following this recovery procedure.
Because the drive has not failed, its fault indicator light will not be turned on.
You can replace the drive while the volumes associated with the affected drive are receiving I/O.
Recovery Steps
If... | Then... |
You want to replace the drive now | Go to step 2. |
You want to take no further action and wait for the affected drive to fail. | You can then complete the recovery steps to fix the problem when it is reported. You need to do nothing more in this procedure. |
Check the status of the volumes associated with the affected drive. To determine the associated volumes, on the Hardware tab in the Array Management Window (AMW), highlight the affected drive and view the "Associated volume group" or "Associated disk pool" in the Properties pane. Next, on the Storage and Copy Services tab, view the status of the identified disk pool or volume group and its volumes.
All volumes on the Storage and Copy Services tab in the AMW should be Optimal before continuing with this procedure.
If any volumes in the disk pool or volume group currently show Operation in Progress, wait for all volumes to change to Optimal before continuing.
If the volumes change from Operation in Progress to any status other than Optimal, click the Recheck button to rerun the Recovery Guru and fix the failures reported.
If the affected drive is a member of a volume group, make sure there is at least one standby hot spare in the storage array that is eligible to spare for the affected drive before proceeding to step 4. If there are no eligible standby hot spares available in the storage array, assign as a hot spare an unassigned drive that is similar in media and interface type and that has a capacity equal to or larger than that of the affected drive. Information on assigning a hot spare can be found in the "Assigning and Unassigning Hot Spare Drives" online help topic. If you are unable to assign a hot spare because of your storage array's configuration, go to Recovering a Volume Group Drive without a Hot Spare.
Highlight the affected drive on the Hardware tab in the AMW and select the Hardware > Drive > Advanced > Fail menu option. This will open the Confirm Fail Drive dialog.
Making sure that the Copy contents of drive before failing checkbox is checked, type 'yes' into the text field and press OK. The drive's contents will now be copied to either an available hot spare (if the drive is a member of a volume group), or to the appropriate disk pool's preservation capacity. The drive will then be failed.
After the drive copy operation has completed and all of the volumes in the drive's disk pool or volume group have changed back to Optimal, remove the failed drive (its fault indicator light should be on).
Wait 30 seconds, then insert the new drive. Its fault indicator light may be lit for a short time (one minute or less). The newly inserted drive's status will then transition to "Replaced", and the original drive's data will begin copying to the drive.
Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative.
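The hot spare check in the steps above can be sketched in code. This is a minimal illustration, not part of the storage management software: the `Drive` class, the `can_spare_for` and `choose_procedure` functions, and the returned labels are hypothetical names, and "similar in media and interface type" is assumed here to mean an exact match. The storage array applies its own eligibility rules, which may be stricter. The sketch applies only to a drive that is a member of a volume group.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Drive:
    """Hypothetical drive record; field names are illustrative only."""
    media_type: str       # for example "HDD" or "SSD"
    interface_type: str   # for example "SAS"
    capacity_gb: float


def can_spare_for(candidate: Drive, affected: Drive) -> bool:
    """Assumed eligibility rule: same media type, same interface type,
    and capacity equal to or larger than the affected drive."""
    return (
        candidate.media_type == affected.media_type
        and candidate.interface_type == affected.interface_type
        and candidate.capacity_gb >= affected.capacity_gb
    )


def choose_procedure(
    affected: Drive,
    standby_hot_spares: Iterable[Drive],
    unassigned_drives: Iterable[Drive],
) -> str:
    """Return a label for the recovery path suggested by the text above."""
    # An eligible standby hot spare exists: copy contents, then fail the drive.
    if any(can_spare_for(d, affected) for d in standby_hot_spares):
        return "copy-to-hot-spare"
    # No eligible hot spare, but an unassigned drive could be assigned as one.
    if any(can_spare_for(d, affected) for d in unassigned_drives):
        return "assign-hot-spare-first"
    # Neither is available: fall back to the procedure without a hot spare.
    return "fail-without-copy"
```

For example, `choose_procedure(Drive("HDD", "SAS", 600.0), [], [Drive("HDD", "SAS", 900.0)])` returns `"assign-hot-spare-first"`, because no standby hot spare exists but a large enough unassigned drive of the same type does.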
Recovering a Volume Group Drive without a Hot Spare
Use the following procedure if all of the following conditions apply:
The affected drive is a member of a RAID 1, 3, 5, or 6 volume group
There are no eligible standby hot spares in the storage array
There are no unassigned drives that can be assigned as a hot spare for the affected drive
Note: Whenever possible, copy the drive's contents to a hot spare before failing the drive. Doing so greatly reduces the possibility of data loss.
Recovery Steps
Highlight the affected drive on the Hardware tab in the AMW and select the Hardware > Drive > Advanced > Fail menu option. This will open the Confirm Fail Drive dialog.
Making sure that the Copy contents of drive before failing checkbox is unchecked, type 'yes' into the text field and press OK. The associated volumes will become Degraded.
Remove the failed drive (its fault indicator light should be on).
Wait 30 seconds, then insert the new drive. Its fault indicator light may be lit for a short time (one minute or less).
Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative.
Target | NetAppESeries.StorageArray |
Parent Monitor | NetAppESeries.StorageArrayAvailability |
Category | Custom |
Enabled | True |
Alert Generate | True |
Alert Severity | Error |
Alert Priority | Normal |
Alert Auto Resolve | True |
Monitor Type | NetAppESeries.FailureUnitMonitorType |
Remotable | True |
Accessibility | Internal |
Alert Message | |
RunAs | Default |
Comment | Machine generated entity |
<UnitMonitor ID="NetAppESeries.FailureID_0025_Monitor" Accessibility="Internal" Enabled="true" Target="NetAppESeries.StorageArray" ParentMonitorID="NetAppESeries.StorageArrayAvailability" Remotable="true" Priority="Normal" TypeID="NetAppESeries.FailureUnitMonitorType" ConfirmDelivery="true" Comment="Machine generated entity">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="NetAppESeries.REC_IMPENDING_DRIVE_FAILURE_RISK_MED_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Property[@Name='FailureDescription']$</AlertParameter1>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="NetAppESeries.StateId45E72E4D437AE65DE91A3A82DA5CB7BD" MonitorTypeStateID="NoIssue" HealthState="Success"/>
    <OperationalState ID="NetAppESeries.StateId5DA434C6E4DB6A6D9760417968B20162" MonitorTypeStateID="IssueFound" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <FailureID>25</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
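The monitor definition above can be read programmatically to confirm how the property table maps onto the XML. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The `summarize_monitor` function and the `MONITOR_XML` constant are hypothetical names, and the embedded XML is a trimmed copy of the definition above (some attributes and the alert settings are omitted for brevity), not a complete management pack.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the UnitMonitor definition shown above (assumption:
# the fragment is parsed on its own, outside a full management pack).
MONITOR_XML = """
<UnitMonitor ID="NetAppESeries.FailureID_0025_Monitor" Enabled="true"
             Target="NetAppESeries.StorageArray">
  <OperationalStates>
    <OperationalState ID="NetAppESeries.StateId45E72E4D437AE65DE91A3A82DA5CB7BD"
                      MonitorTypeStateID="NoIssue" HealthState="Success"/>
    <OperationalState ID="NetAppESeries.StateId5DA434C6E4DB6A6D9760417968B20162"
                      MonitorTypeStateID="IssueFound" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <FailureID>25</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text: str) -> dict:
    """Extract the monitor ID, target, state mapping, and configuration."""
    root = ET.fromstring(xml_text)
    # Every child of <Configuration> in this fragment holds an integer.
    config = {child.tag: int(child.text) for child in root.find("Configuration")}
    # Map each monitor-type state to the health state it produces.
    states = {
        state.get("MonitorTypeStateID"): state.get("HealthState")
        for state in root.iter("OperationalState")
    }
    return {
        "id": root.get("ID"),
        "target": root.get("Target"),
        "enabled": root.get("Enabled") == "true",
        "states": states,
        "config": config,
    }


summary = summarize_monitor(MONITOR_XML)
```

In this sketch, `summary["states"]` maps `IssueFound` to the `Error` health state, which matches the Alert Severity row of the property table, and `summary["config"]` exposes the polling interval and timeout in seconds.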