Monitor REC_FAILED_VOLUME (17)

NetAppESeries.FailureID_0017_Monitor (UnitMonitor)

A volume group has been marked failed due to excessive drive failures.

Knowledge Base article:

Failed Volume

One or more volumes on the storage array have failed. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.

Caution: Possible loss of data accessibility. Do not remove a component when either (1) the Service Action (removal) Allowed (SAA) field in the Details area of this recovery procedure is NO, or (2) the SAA LED on the affected component is OFF (note that some products do not have SAA LEDs). Removing a component while its SAA LED is OFF may result in temporary loss of access to your data. Refer to the following Important Notes for more detail.

Caution: Electrostatic discharge can damage sensitive components. Always use proper antistatic protection when handling components. Touching components without using a proper ground may damage the equipment.

Important Notes

Recovery Steps

1

In the Array Management Window (AMW), verify the status of the drives associated with the failed volumes:

If...

Then...

The status of the associated drives is Optimal

It may be possible to recover data from the failed volumes. Do not continue with the remaining recovery steps and contact your Technical Support Representative. Performing any recovery actions before contacting your Technical Support Representative could jeopardize any chance of recovering data.

The status of the associated drives is NOT Optimal

Go to step 2.

2

If...

Then...

You mistakenly removed the wrong drive while performing a degraded volume recovery procedure

You can return the volumes to the Degraded state by reinserting the drive you removed. After the volumes return to the Degraded state, click the Recheck button and perform the recovery steps listed for the "Degraded Volume" procedure. You are finished with this procedure.

You did NOT mistakenly remove the wrong drive while performing a degraded volume recovery procedure

It may be possible to recover data from the failed volumes. If you wish to attempt a data recovery, do not continue with the remaining recovery steps and contact your Technical Support Representative. Performing any recovery actions before contacting your Technical Support Representative could jeopardize any chance of recovering data.

If you prefer to recover from an existing backup, go to step 3.
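The decision tables in steps 1 and 2 can be read as a single triage function. The following is a minimal Python sketch of that logic, not part of the product: the names `NextAction` and `triage_failed_volume` and the three boolean inputs are hypothetical and stand in for what the operator observes in the AMW.

```python
from enum import Enum


class NextAction(Enum):
    """Possible outcomes of steps 1 and 2 (names are illustrative)."""
    CONTACT_SUPPORT = "stop and contact your Technical Support Representative"
    REINSERT_DRIVE = "reinsert the removed drive, then follow the Degraded Volume procedure"
    CONTINUE_TO_STEP_3 = "go to step 3 and recover from an existing backup"


def triage_failed_volume(drives_optimal: bool,
                         removed_wrong_drive: bool,
                         prefer_backup_restore: bool) -> NextAction:
    # Step 1: Optimal drives mean data may be recoverable, so stop here.
    if drives_optimal:
        return NextAction.CONTACT_SUPPORT
    # Step 2, first row: the failure was caused by pulling the wrong drive.
    if removed_wrong_drive:
        return NextAction.REINSERT_DRIVE
    # Step 2, second row: continue only if restoring from backup is preferred.
    if prefer_backup_restore:
        return NextAction.CONTINUE_TO_STEP_3
    return NextAction.CONTACT_SUPPORT
```

For example, `triage_failed_volume(False, False, True)` returns `NextAction.CONTINUE_TO_STEP_3`. Every other path ends either in a support call or in the Degraded Volume procedure, which mirrors the procedure's warning against acting before a data recovery attempt has been ruled out.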

3

Important: You will delete the disk pool or volume group later in these recovery steps. If you wish to re-create the disk pool or volume group later using the same configuration, select the Monitor > Reports > Storage Array Profile menu option and then click the Save As button to save a copy of the Storage Array Profile. Make sure the "Storage" option is selected in the Save As dialog.

There are several different types of volumes that can exist in a disk pool or volume group. Use the Recovery Guru details area to determine the affected disk pool or volume group. Then, find the disk pool or volume group on the Storage and Copy Services tab in the AMW. Use the information provided by the AMW to determine the types of volumes on the affected disk pool or volume group. Step through every entry in the following table and perform all procedures associated with the volume type combination for the affected disk pool or volume group.

If...

Then...

The affected disk pool or volume group contains one or more source or target volumes in a copy operation

Perform the following steps:

a

Select the Copy Services > Volume Copy > Manage Copies menu option from the AMW.

b

Check to see if any of the copy operations involving the affected volumes have a copy status of Pending, In Progress, or Failed.

c

Highlight the copy pair that contains the affected volume and select the Copy > Stop menu option.

d

Check to see if any of the target volumes have read-only enabled.

e

Disable read-only by selecting ALL target volumes that have read-only enabled and then selecting the Change > Target Volume Permissions > Disable Read-Only menu option.

f

Go to step 4.

One or more snapshot volumes exist on the affected disk pool or volume group

The information on the snapshots is no longer valid and cannot be retrieved.

Delete all snapshot volumes associated with the affected disk pool or volume group by highlighting the snapshot volume and selecting the Copy Services > Snapshot (Legacy) > Delete menu option. You will be able to create any needed snapshot volumes after this procedure has been completed.

One or more snapshot repository volumes exist on the affected disk pool or volume group

The information on the snapshot volumes associated with the affected snapshot repository volumes is no longer valid and cannot be retrieved, even if the associated snapshot volumes exist on a different disk pool or volume group.

Delete all snapshot volumes associated with the snapshot repositories on the affected disk pool or volume group by highlighting the associated snapshot volumes and selecting the Copy Services > Snapshot (Legacy) > Delete menu option. You will be able to create any needed snapshots after this procedure has been completed.

The mirror repository volumes exist on the affected disk pool or volume group

Perform the following steps:

a

Save the Storage Array Profile by selecting the Monitor > Reports > Storage Array Profile menu option and then selecting the Save As button. The profile will give you a roadmap of any mirror relationships you may want to recreate after re-activating Synchronous Mirroring.

b

Remove all mirror relationships on this storage array by highlighting any primary volume and selecting the Copy Services > Mirroring > Synchronous Mirroring > Remove Mirror Relationship menu option. You can then select all mirror relationships on the storage array.

c

Deactivate the Synchronous Mirroring feature by selecting the Copy Services > Mirroring > Deactivate menu option.

d

Re-activate the Synchronous Mirroring feature by selecting the Copy Services > Mirroring > Activate menu option. Your mirror repository volumes will now reside on a different disk pool or volume group. Once the affected volume group or disk pool is restored, step 9 will help you restore your mirror relationships.

One or more Synchronous Mirroring primary or secondary volumes exist on the affected volume group, but the mirror repository volumes exist on a different disk pool or volume group

Perform the following steps:

a

Save the Storage Array Profile by selecting the Monitor > Reports > Storage Array Profile menu option and then selecting the Save As button. The profile will give you a roadmap of any mirror relationships you may want to recreate after re-activating Synchronous Mirroring.

b

Remove the mirror relationships for only those primary or secondary volumes on the affected disk pool or volume group by highlighting any of the affected primary volumes and selecting the Copy Services > Mirroring > Synchronous Mirroring > Remove Mirror Relationship menu option. You can then select only those mirror relationships that exist on the affected disk pool or volume group.

c

Once the affected disk pool or volume group is restored, step 9 will help you restore your mirror relationships for the volumes on the affected disk pool or volume group. The mirror relationships that use secondary volumes on the affected disk pool or volume group will synchronize automatically once the affected disk pool or volume group is restored.

d

Go to step 4.

Only standard volumes exist on the affected disk pool or volume group

Go to step 4.
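Because step 3 asks you to walk every row of the table and perform all procedures that apply, it can help to treat the table as a lookup from volume type to preparatory actions. The sketch below is one way to express that in Python. The keys in `PREP_ACTIONS` and the function `prep_checklist` are hypothetical labels chosen for illustration, and the action strings only summarize the rows above.

```python
# Rows of the step 3 table, in table order. Keys are illustrative labels,
# not identifiers used by the storage management software.
PREP_ACTIONS = {
    "copy_source_or_target": [
        "Stop Pending, In Progress, or Failed copy operations (Copy > Stop)",
        "Disable read-only on all target volumes that have it enabled",
    ],
    "snapshot": [
        "Delete all snapshot volumes on the affected disk pool or volume group",
    ],
    "snapshot_repository": [
        "Delete all snapshot volumes associated with the affected repositories",
    ],
    "mirror_repository": [
        "Save the Storage Array Profile",
        "Remove all mirror relationships on the storage array",
        "Deactivate, then re-activate, Synchronous Mirroring",
    ],
    "mirror_primary_or_secondary": [
        "Save the Storage Array Profile",
        "Remove mirror relationships for the affected volumes only",
    ],
    "standard": [],  # nothing to prepare; go straight to step 4
}


def prep_checklist(volume_types):
    """Return every preparatory action for the volume types present."""
    present = set(volume_types)
    unknown = present - PREP_ACTIONS.keys()
    if unknown:
        raise ValueError(f"unknown volume types: {sorted(unknown)}")
    checklist = []
    for vtype, actions in PREP_ACTIONS.items():  # keep table order
        if vtype in present:
            checklist.extend(actions)
    return checklist
```

Calling `prep_checklist({"standard"})` yields an empty list, while a disk pool that holds both snapshots and copy targets yields the copy actions first and the snapshot action after them, in the same order as the table.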

4

Locate all failed drives associated with this disk pool or volume group (the fault indicator lights on the failed drives should be lit).

Note: To determine the associated drives, select one of the affected volumes (identified in the Details area) on the Storage and Copy Services tab in the AMW. Each associated drive will have an association dot underneath it.

5

Remove each failed drive.

6

Wait 30 seconds, then insert the new drives into the same slots (if you want to keep the disk pool or volume group on drives in the same slot locations). The fault indicator light on the new drives may become lit for a short time (one minute or less).

Note: Wait until the replaced drives are ready (fault indicator light off) before going to step 7.

7

Important: All data on the disk pool or volume group will be lost once you complete this step. Be sure that you have an adequate backup, or go back to step 1 if you want to attempt data recovery.

If...

Then...

The affected object is a disk pool

Perform the following steps:

a

On the Storage and Copy Services tab in the AMW, highlight the affected disk pool.

b

Select the Storage > Disk Pool > Delete menu option.

c

Follow the instructions in the dialog, and then type "yes" to confirm the operation.

d

Go to step 8.

The affected object is a volume group

Perform the following steps:

a

On the Storage and Copy Services tab in the AMW, highlight the affected volume group.

b

Select the Storage > Volume Group > Delete menu option.

c

Follow the instructions in the dialog, and then type "yes" to confirm the operation.

d

Go to step 8.

8

If...

Then...

You want to re-create the disk pool or volume group with the same configuration it had before the failure

If...

Then...

The affected object is a disk pool

Perform the following steps:

a

On the Storage and Copy Services tab in the AMW, select a Free Capacity node in the tree.

b

Select the Storage > Disk Pool > Create menu option.

c

While creating the new disk pool, refer to the Storage Array Profile you saved in step 3 for information about how the previous disk pool was configured.

d

Go to step 9.

The affected object is a volume group

Perform the following steps:

a

On the Storage and Copy Services tab in the AMW, select a Free Capacity node in the tree.

b

Select the Storage > Volume Group > Create menu option.

c

While creating the new volume group, refer to the Storage Array Profile you saved in step 3 for information about how the previous volume group was configured.

d

Go to step 9.

You do not want to re-create the disk pool or volume group

Go to step 12.

9

If...

Then...

You deleted one or more snapshot volumes or snapshot repository volumes in step 3.

If desired, create new snapshot volumes to replace those you deleted.

You stopped one or more copy operations in step 3.

If desired, re-create any copies you stopped by highlighting the copy pairs in the Copy Manager and selecting Copy > Re-Copy.

You disabled read-only on any target volumes in step 3.

Restore the data on those volumes from backup.

You removed mirror relationships during step 3

Re-create any desired mirror relationships by selecting any volume you wish to be a primary (note that some of these volumes may reside on remote storage arrays) and selecting the Storage > Volume > Create > Synchronous Mirrored Pair menu option.

10

Add the volumes in the new disk pool or volume group back to the operating system. You may need to reboot the system to see the volumes.

Note: Do not start I/O to these volumes until after you restore from backup.

11

Restore the data for the new volumes from backup.

12

Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative.

Element properties:

Target: NetAppESeries.StorageArray
Parent Monitor: NetAppESeries.StorageArrayAvailability
Category: Custom
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: NetAppESeries.FailureUnitMonitorType
Remotable: True
Accessibility: Internal
Alert Message:
Alert: REC_FAILED_VOLUME
A volume group has been marked failed due to excessive drive failures. Alert Value: {0}
RunAs: Default
Comment: Machine generated entity

Source Code:

<UnitMonitor ID="NetAppESeries.FailureID_0017_Monitor" Accessibility="Internal" Enabled="true" Target="NetAppESeries.StorageArray" ParentMonitorID="NetAppESeries.StorageArrayAvailability" Remotable="true" Priority="Normal" TypeID="NetAppESeries.FailureUnitMonitorType" ConfirmDelivery="true" Comment="Machine generated entity">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="NetAppESeries.REC_FAILED_VOLUME_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Property[@Name='FailureDescription']$</AlertParameter1>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="NetAppESeries.StateId493B053E5427ABE32FA8DDA0B4B01DD" MonitorTypeStateID="NoIssue" HealthState="Success"/>
    <OperationalState ID="NetAppESeries.StateId75539B8ABEDE577AE9D7084006B4A39A" MonitorTypeStateID="IssueFound" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <FailureID>17</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
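To check how a monitor like this one is configured without reading the XML by hand, the definition can be parsed with a standard XML library. The following is a minimal Python sketch using `xml.etree.ElementTree`. The helper name `summarize_monitor` is hypothetical, and `MONITOR_XML` is a trimmed copy of the definition above (some attributes and elements are omitted for brevity), not the complete management pack.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the UnitMonitor definition shown above (illustrative only).
MONITOR_XML = """
<UnitMonitor ID="NetAppESeries.FailureID_0017_Monitor" Enabled="true" Target="NetAppESeries.StorageArray">
  <AlertSettings AlertMessage="NetAppESeries.REC_FAILED_VOLUME_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertSeverity>Error</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="NetAppESeries.StateId493B053E5427ABE32FA8DDA0B4B01DD" MonitorTypeStateID="NoIssue" HealthState="Success"/>
    <OperationalState ID="NetAppESeries.StateId75539B8ABEDE577AE9D7084006B4A39A" MonitorTypeStateID="IssueFound" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <FailureID>17</FailureID>
    <IntervalSeconds>59</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <Trace>0</Trace>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text):
    """Extract the polling configuration and state mapping of a UnitMonitor."""
    root = ET.fromstring(xml_text.strip())
    config = root.find("Configuration")
    settings = root.find("AlertSettings")
    return {
        "id": root.get("ID"),
        "failure_id": int(config.findtext("FailureID")),
        "interval_seconds": int(config.findtext("IntervalSeconds")),
        "timeout_seconds": int(config.findtext("TimeoutSeconds")),
        "alert_severity": settings.findtext("AlertSeverity"),
        # Map each monitor-type state to the health state it produces.
        "states": {
            state.get("MonitorTypeStateID"): state.get("HealthState")
            for state in root.iter("OperationalState")
        },
    }
```

Running `summarize_monitor(MONITOR_XML)` on this fragment reports failure ID 17, a 59-second interval, a 300-second timeout, and a state mapping in which `IssueFound` produces the `Error` health state that raises the alert.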