A volume group has been marked failed due to excessive drive failures.
One or more volumes on the storage array have failed. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.
Caution: Possible loss of data accessibility. Do not remove a component when either (1) the Service Action (removal) Allowed (SAA) field in the Details area of this recovery procedure is NO, or (2) the SAA LED on the affected component is OFF (note that some products do not have SAA LEDs). Removing a component while its SAA LED is OFF may cause a temporary loss of access to your data. Refer to the following Important Notes for more detail.
Caution: Electrostatic discharge can damage sensitive components. Always use proper antistatic protection when handling components. Touching components without using a proper ground may damage the equipment.
Important Notes
You may be able to recover data from a failed volume; whether this is possible depends on how the failure occurred. This procedure offers two ways to restore data: contact your Technical Support Representative to attempt a data recovery, or restore the data from backup media.
If the volume is marked failed because you replaced the wrong drive during a degraded volume recovery procedure, you have not lost data. Follow the recovery procedure to reinsert the drive and return the volume to the Degraded state.
This problem occurs when one or more drives in the disk pool or volume group fail, causing the associated volumes to fail. In some situations, such as unreadable sectors, a volume can fail even though the drives associated with it are Optimal. Follow the recovery procedure to identify the status of the associated drives and the appropriate recovery steps.
All I/O to the affected volumes will fail.
To the operating system (OS), a failed volume appears exactly like a failed non-RAID drive. Refer to your operating system documentation for any special requirements concerning failed drives, and perform them where necessary.
Make sure the replacement drives have a capacity equal to or greater than the failed drives you will remove in the following steps.
You can replace the failed drives while other disk pools and volume groups on the storage array are receiving I/O.
Service Action Allowed Important Information:
The Service Action (removal) Allowed field in the Details area indicates whether or not you can safely remove the component. If the SAA field is NO, the affected component must remain in place until you service another component first.
The Service action LED on Component field in the Details area indicates whether or not a physical SAA LED is present on the hardware component. This field does NOT indicate whether the LED is ON or OFF; that indication is provided by the Service Action (removal) Allowed field.
If a component does not have an SAA LED, you can remove the component when its fault LED is lit and the Service Action (removal) Allowed field in the Details area is YES.
The Service Action (removal) Allowed field shown in the Details area and the physical SAA LED on the hardware component (if supported) MUST match before you remove the affected component. In rare cases (such as multiple problems), the LED status and the SAA field may not match. If there is a mismatch, do NOT remove the component until these indications match.
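The removal rules above reduce to a small decision. The Python sketch below encodes them as a single check. The `ComponentStatus` record and `safe_to_remove` function are hypothetical illustrations that you would fill in by hand from the Details area and the physical LEDs. They are not part of any NetApp or SCOM tool, and the Recovery Guru and the hardware indicators remain authoritative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComponentStatus:
    """Hypothetical snapshot of the Details area and the hardware LEDs."""
    saa_field_yes: bool         # Service Action (removal) Allowed field: YES -> True
    has_saa_led: bool           # Service action LED on Component field
    saa_led_on: Optional[bool]  # physical SAA LED state; None if the component has no SAA LED
    fault_led_on: bool          # physical fault LED state


def safe_to_remove(c: ComponentStatus) -> bool:
    """Sketch of the SAA removal rules described above (illustrative only)."""
    # Never remove a component while the SAA field reads NO.
    if not c.saa_field_yes:
        return False
    if c.has_saa_led:
        # The field (YES) and the LED must match, so the LED must be ON.
        # A mismatch means: wait until the indications agree.
        return bool(c.saa_led_on)
    # No SAA LED: the fault LED must be lit (the field is already YES here).
    return c.fault_led_on


# Example: SAA LED present and ON, field YES -> removal allowed.
print(safe_to_remove(ComponentStatus(True, True, True, True)))  # prints True
```

Checking the SAA field first mirrors the caution at the top of this page: a NO in the Details area blocks removal regardless of what any LED shows.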
Recovery Steps
1 | In the Array Management Window (AMW), verify the status of the drives associated with the failed volumes:
2 |
3 | Important: You will delete the disk pool or volume group later in these recovery steps. If you wish to re-create the disk pool or volume group later using the same configuration, select the Monitor > Reports > Storage Array Profile menu option and then click the Save As button to save a copy of the Storage Array Profile. Make sure the "Storage" option is selected in the Save As dialog. There are several different types of volumes that can exist in a disk pool or volume group. Use the Recovery Guru details area to determine the affected disk pool or volume group. Then, find the disk pool or volume group on the Storage and Copy Services tab in the AMW. Use the information provided by the AMW to determine the types of volumes on the affected disk pool or volume group. Step through every entry in the following table and perform all procedures associated with the volume type combination for the affected disk pool or volume group.
4 | Locate all failed drives associated with this disk pool or volume group (the fault indicator lights on the failed drives should be lit). Note: To determine the associated drives, select one of the affected volumes (identified in the Details area) on the Storage and Copy Services tab in the AMW. Each associated drive will have an association dot underneath it.
5 | Remove each failed drive.
6 | Wait 30 seconds, then insert the new drives into the same slots (if you want to keep the disk pool or volume group on drives in the same slot locations). The fault indicator light on the new drives may become lit for a short time (one minute or less). Note: Wait until the replaced drives are ready (fault indicator light off) before going to step 7.
7 | Important: All data on the disk pool or volume group will be lost once you complete this step. Be sure that you have an adequate backup, or go back to step 1 if you want to attempt data recovery.
8 |
9 |
10 | Add the volumes in the new disk pool or volume group back to the operating system. You may need to reboot the system to see the volumes. Note: Do not start I/O to these volumes until after you restore from backup.
11 | Restore the data for the new volumes from backup.
12 | Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative.
Target | NetAppESeries.StorageArray
Parent Monitor | NetAppESeries.StorageArrayAvailability
Category | Custom
Enabled | True
Alert Generate | True
Alert Severity | Error
Alert Priority | Normal
Alert Auto Resolve | True
Monitor Type | NetAppESeries.FailureUnitMonitorType
Remotable | True
Accessibility | Internal
Alert Message |
RunAs | Default
Comment | Machine generated entity
<UnitMonitor ID="NetAppESeries.FailureID_0017_Monitor" Accessibility="Internal" Enabled="true" Target="NetAppESeries.StorageArray" ParentMonitorID="NetAppESeries.StorageArrayAvailability" Remotable="true" Priority="Normal" TypeID="NetAppESeries.FailureUnitMonitorType" ConfirmDelivery="true" Comment="Machine generated entity">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="NetAppESeries.REC_FAILED_VOLUME_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Property[@Name='FailureDescription']$</AlertParameter1>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="NetAppESeries.StateId493B053E5427ABE32FA8DDA0B4B01DD" MonitorTypeStateID="NoIssue" HealthState="Success"/>
    <OperationalState ID="NetAppESeries.StateId75539B8ABEDE577AE9D7084006B4A39A" MonitorTypeStateID="IssueFound" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <FailureID>17</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
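To see at a glance which settings drive this monitor, you can read the definition programmatically. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the definition above, keeping only the elements it reads. `summarize_monitor` is a hypothetical helper for illustration, not part of the management pack or of SCOM. Reading `IntervalSeconds` as a polling interval is an assumption based only on the element name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the UnitMonitor definition above (only the elements read below).
MONITOR_XML = """\
<UnitMonitor ID="NetAppESeries.FailureID_0017_Monitor" Enabled="true" Target="NetAppESeries.StorageArray" TypeID="NetAppESeries.FailureUnitMonitorType">
  <AlertSettings AlertMessage="NetAppESeries.REC_FAILED_VOLUME_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertSeverity>Error</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="NetAppESeries.StateId493B053E5427ABE32FA8DDA0B4B01DD" MonitorTypeStateID="NoIssue" HealthState="Success"/>
    <OperationalState ID="NetAppESeries.StateId75539B8ABEDE577AE9D7084006B4A39A" MonitorTypeStateID="IssueFound" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <FailureID>17</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text: str) -> dict:
    """Hypothetical helper: pull the key settings out of a UnitMonitor element."""
    root = ET.fromstring(xml_text)
    cfg = root.find("Configuration")
    # Map each monitor-type state (NoIssue / IssueFound) to its health state.
    states = {s.get("MonitorTypeStateID"): s.get("HealthState")
              for s in root.iter("OperationalState")}
    return {
        "id": root.get("ID"),
        "failure_id": int(cfg.findtext("FailureID")),
        "interval_s": int(cfg.findtext("IntervalSeconds")),  # presumably the polling interval (assumption)
        "timeout_s": int(cfg.findtext("TimeoutSeconds")),
        "alert_on": root.findtext("AlertSettings/AlertOnState"),
        "states": states,
    }


summary = summarize_monitor(MONITOR_XML)
print(summary["failure_id"], summary["interval_s"], summary["states"]["IssueFound"])
# prints: 17 59 Error
```

The output confirms the mapping in the table above: the `IssueFound` state maps to the Error health state, which is also the state that raises the alert.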