A volume group has transitioned to the degraded state due to one or more drive failures.
What Caused the Problem?
One or more drives have failed in a disk pool or volume group and the associated volumes have become degraded. The data on the volumes is still accessible; however, data may be lost if another drive in the same disk pool or volume group fails. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.
Caution: Possible loss of data accessibility. Do not remove a component when either (1) the Service Action (removal) Allowed (SAA) field in the Details area of this recovery procedure is NO, or (2) the SAA LED on the affected component is OFF (note that some products do not have SAA LEDs). Removing a component while its SAA LED is OFF may result in temporary loss of access to your data. Refer to the following Important Notes for more detail.
Caution: Electrostatic discharge can damage sensitive components. Always use proper antistatic protection when handling components. Touching components without using a proper ground may damage the equipment.
Important Notes
Usually, a single drive failure in a RAID 1, 3, or 5 volume group, or up to two drive failures in a RAID 6 disk pool or volume group, will cause a degraded volume status. However, a degraded RAID 1 volume group may have more than one failed drive if the failed drives belong to different mirrored pairs.
When you replace a failed drive, data from the failed drive is reconstructed on the new unassigned drive. This reconstruction should begin automatically after you insert the new drive.
Make sure the replacement drives have a capacity equal to or greater than the failed drives you will remove.
You can replace failed drives while the affected volumes are receiving I/O only if there are no other operations in progress for those volumes.
Service Action Allowed Important Information
The Service Action (removal) Allowed field in the Details area indicates whether or not you can safely remove the component. If the SAA field is NO, then the affected component must remain in place until you service another component first.
The Service action LED on Component field in the Details area indicates whether or not a physical SAA LED is present on the hardware component. This field does NOT indicate whether the LED is ON or OFF (that indication is provided by the Service Action (removal) Allowed field).
If a component does not have an SAA LED, then it is OK to remove the component when its fault LED is lit and the Service Action (removal) Allowed field is YES in the Details area.
The Service Action (removal) Allowed field shown in the Details area and the physical SAA LED on the hardware component (if supported) MUST match before you remove the affected component. In rare cases (such as multiple problems), the status of the LED and the SAA field may not match. If there is a mismatch, do NOT remove the component until these indications match.
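Taken together, these rules reduce to a check that the Details-area field and the physical indicators agree before you pull a component. The following is a minimal sketch of that decision logic only; the function and parameter names are hypothetical and are not part of any NetApp tool.

```python
def safe_to_remove(saa_field_yes: bool,
                   has_saa_led: bool,
                   saa_led_on: bool,
                   fault_led_on: bool) -> bool:
    """Illustrative only: encode the removal rules described above.

    saa_field_yes -- the Service Action (removal) Allowed field shows YES
    has_saa_led   -- the component physically has an SAA LED
    saa_led_on    -- that SAA LED is currently lit
    fault_led_on  -- the component's fault LED is lit
    """
    if not saa_field_yes:
        # SAA field is NO: leave the component in place and service
        # another component first.
        return False
    if has_saa_led:
        # The field and the LED must match; YES plus a lit SAA LED
        # means removal is allowed, any mismatch means wait.
        return saa_led_on
    # No SAA LED on this hardware: rely on the fault LED plus the field.
    return fault_led_on
```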
Recovery Steps
1. Check the Recovery Guru Details area to identify the failed drive(s).
2. Remove all failed drives associated with this disk pool or volume group (the fault indicator lights on the failed drives should be on). Note: To determine the failed drives, select one of the degraded volumes (identified in the Details area) on the Storage and Copy Services tab in the AMW. Each failed drive will have an association dot underneath it.
3. Wait 30 seconds, then insert the new drives. The fault indicator light on the new drives may be lit for a short time (one minute or less). Data reconstruction should begin on the new drive(s); their fault indicator lights will go off, and the activity indicator lights of the drives in the disk pool or volume group will start flashing. When the reconstruction starts, the disk pool or volume group's volume icons on the Storage and Copy Services tab in the AMW change to Operation in Progress, then to Optimal, as the volumes are reconstructed.
4. Click the Recheck button to rerun the Recovery Guru. When ALL failed drives are replaced, this failure should no longer appear in the Summary area. If the failure appears again after all failed drives have been replaced, contact your Technical Support Representative.
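If the storage array is reachable through the SANtricity Web Services proxy, a short script can watch the reconstruction instead of repeatedly refreshing the AMW. This is a hedged sketch: the proxy URL, credentials, storage-system ID, endpoint path, and JSON field names below are assumptions for illustration and should be checked against your Web Services API documentation.

```python
# Sketch: poll volume status after drive replacement until everything
# reports optimal again. All connection details below are placeholders.
import time
import requests

PROXY = "https://webservices.example.com:8443"  # hypothetical proxy host
SYSTEM_ID = "1"                                 # hypothetical storage-system id
AUTH = ("monitor_user", "password")             # hypothetical credentials

def non_optimal_volumes():
    """Return labels of volumes that do not yet report an optimal status."""
    resp = requests.get(
        f"{PROXY}/devmgr/v2/storage-systems/{SYSTEM_ID}/volumes",
        auth=AUTH, timeout=30, verify=False)    # proxies often use self-signed certs
    resp.raise_for_status()
    return [v.get("label", "?") for v in resp.json()
            if str(v.get("status", "")).lower() != "optimal"]

if __name__ == "__main__":
    while True:
        remaining = non_optimal_volumes()
        if not remaining:
            print("All volumes report optimal; click Recheck in the Recovery Guru.")
            break
        print("Still reconstructing:", ", ".join(remaining))
        time.sleep(60)
```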
Target | NetAppESeries.StorageArray
Parent Monitor | NetAppESeries.StorageArrayAvailability
Category | Custom
Enabled | True
Alert Generate | True
Alert Severity | Error
Alert Priority | Normal
Alert Auto Resolve | True
Monitor Type | NetAppESeries.FailureUnitMonitorType
Remotable | True
Accessibility | Internal
Alert Message |
RunAs | Default
Comment | Machine generated entity
<UnitMonitor ID="NetAppESeries.FailureID_0013_Monitor" Accessibility="Internal" Enabled="true" Target="NetAppESeries.StorageArray" ParentMonitorID="NetAppESeries.StorageArrayAvailability" Remotable="true" Priority="Normal" TypeID="NetAppESeries.FailureUnitMonitorType" ConfirmDelivery="true" Comment="Machine generated entity">
<Category>Custom</Category>
<AlertSettings AlertMessage="NetAppESeries.REC_DEGRADED_VOLUME_AlertMessageResourceID">
<AlertOnState>Error</AlertOnState>
<AutoResolve>true</AutoResolve>
<AlertPriority>Normal</AlertPriority>
<AlertSeverity>Error</AlertSeverity>
<AlertParameters>
<AlertParameter1>$Data/Context/Property[@Name='FailureDescription']$</AlertParameter1>
</AlertParameters>
</AlertSettings>
<OperationalStates>
<OperationalState ID="NetAppESeries.StateId7702F2D7AF204700E3FD540244153ABA" MonitorTypeStateID="NoIssue" HealthState="Success"/>
<OperationalState ID="NetAppESeries.StateId4E71F12D1F1E33A761E45FD10ABFE3F1" MonitorTypeStateID="IssueFound" HealthState="Error"/>
</OperationalStates>
<Configuration>
<FailureID>13</FailureID>
<IntervalSeconds>59</IntervalSeconds>
<TimeoutSeconds>300</TimeoutSeconds>
<Trace>0</Trace>
</Configuration>
</UnitMonitor>
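The same settings summarized in the table above can be pulled straight out of this fragment. A minimal sketch, assuming the XML has been saved to a local file (the filename is hypothetical); it works on the fragment as shown because no XML namespaces are declared on it.

```python
# Sketch: read the key monitor settings from the UnitMonitor fragment above.
import xml.etree.ElementTree as ET

monitor = ET.parse("NetAppESeries.FailureID_0013.xml").getroot()

print("Monitor ID:        ", monitor.get("ID"))
print("Target class:      ", monitor.get("Target"))
print("Monitor type:      ", monitor.get("TypeID"))

alert = monitor.find("AlertSettings")
print("Alert message ID:  ", alert.get("AlertMessage"))
print("Alert severity:    ", alert.findtext("AlertSeverity"))

config = monitor.find("Configuration")
print("Failure ID:        ", config.findtext("FailureID"))
print("Poll interval (s): ", config.findtext("IntervalSeconds"))
print("Timeout (s):       ", config.findtext("TimeoutSeconds"))
```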