Impending Physical Disk Failure (High Data Availability Risk)
The causes and resolutions refer to the Dell Modular Disk Storage Manager recovery guru. Launch Dell Modular Disk Storage Manager to diagnose and fix the recovery failure as follows:
Open Start >> Programs >> Dell >> MD Storage Manager >> Modular Disk Storage Manager Client.
If the MD Storage Array is already being managed by MDSM, you can proceed with the Causes and Resolution sections.
Otherwise, select Edit -> Add Storage Array, enter the IP address of the MD Storage Array, and add it to the discovered devices configuration so that MDSM can manage it.
Select the MD Storage Array and follow the steps specified in this recovery guru.
A physical disk is reporting internal errors that could cause the physical disk to fail. The Recovery Guru Details area provides specific information you will need as you follow the recovery steps.
Caution: Risk of Data Loss. This problem needs to be resolved immediately. Data loss will occur if the indicated physical disk fails before you follow these recovery steps.
Caution: Electrostatic discharge can damage sensitive components. Always use proper antistatic protection when handling components. Touching the components without using a proper ground may damage the equipment.
Important Notes
No more physical disks can fail in this disk pool or disk group without losing data. If this physical disk fails before you follow these recovery steps, the virtual disks in the disk pool or disk group will fail and all data on the virtual disks will be lost.
The current status and RAID level of the virtual disks determine the corrective action you must take in the following recovery steps.
Because the physical disk has not failed, its fault indicator light will not be turned on.
Make sure the replacement physical disk has a capacity equal to or greater than the physical disk you will remove.
To determine the associated virtual disks, disk pools, or disk groups, on the Hardware tab in the Array Management Window, highlight the affected physical disk and view the "Associated disk group" or "Associated disk pool" in the Properties pane. Next, on the Storage and Copy Services tab, view the status of the identified disk pool or disk group and its virtual disks.
If the current status/RAID level of the virtual disks is... | Then go to... |
---|---|
Optimal RAID 0 | 'Recovering RAID 0' |
Degraded RAID 1, 5, 6, or 10 | 'Recovering Degraded Virtual Disks' |
RAID 1, 5, 6, or 10 with a hot spare physical disk currently being reconstructed | 'Recovering with a Reconstructing Hot Spare' |
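The table above can be read as a small lookup. The following Python sketch is illustrative only and is not part of MDSM or the management pack: the names `Procedure` and `choose_procedure` are hypothetical, and the status and RAID level are assumed to be typed in by hand after reading them from the Array Management Window. Combinations that the table does not list return `None` rather than a guess.

```python
from enum import Enum


class Procedure(Enum):
    """Recovery procedures named in the table (hypothetical helper type)."""
    RAID0 = "Recovering RAID 0"
    DEGRADED = "Recovering Degraded Virtual Disks"
    HOT_SPARE = "Recovering with a Reconstructing Hot Spare"


# RAID levels that the table groups together as redundant.
REDUNDANT_LEVELS = {1, 5, 6, 10}


def choose_procedure(status, raid_level, hot_spare_reconstructing=False):
    """Return the procedure the table points to, or None if no row matches."""
    status = status.strip().lower()
    if raid_level == 0 and status == "optimal":
        return Procedure.RAID0
    if raid_level in REDUNDANT_LEVELS:
        # Check the hot spare row first: a disk group whose hot spare is
        # still reconstructing may also report a Degraded status.
        if hot_spare_reconstructing:
            return Procedure.HOT_SPARE
        if status == "degraded":
            return Procedure.DEGRADED
    return None
```

For example, `choose_procedure("Degraded", 5)` selects the degraded virtual disk procedure, while `choose_procedure("Optimal", 5)` returns `None` because the table has no row for that combination.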
Recovering RAID 0
Use the following procedure if the affected virtual disks are RAID 0.
1 | Stop all I/O to the affected virtual disks. |
2 | Back up all data on the affected virtual disks (step 5 will destroy all data on the affected virtual disks). Note: To the Operating System (OS), a failed virtual disk is exactly the same as a failed non-RAID physical disk. Refer to the OS documentation for any special requirements concerning failed physical disks and perform them where necessary. |
3 | |
4 | If you have snapshot (legacy) virtual disks associated with the affected virtual disks, these snapshot virtual disks will no longer be valid once you fail the physical disk in step 5. Perform any necessary operations (such as backup) on the snapshot virtual disks and then delete them. |
5 | Caution: Risk of Data Loss. The data on the affected virtual disks will be lost once you perform this step. Be sure you have backed up your data before performing this step. Highlight the affected physical disk on the Hardware tab in the AMW and select the Hardware > Physical Disk > Advanced > Fail... menu option. The affected virtual disks become Failed. |
6 | Remove the failed physical disk (its fault indicator light should be on). |
7 | Wait 30 seconds, then insert the new physical disk. Its fault indicator light may be lit for a short time (one minute or less). Note: Wait until the new physical disk is ready (its fault indicator light must be off) before attempting to initialize the virtual disks in step 8. |
8 | Highlight the disk group associated with the replaced physical disk on the Storage and Copy Services tab in the AMW and select the Storage > Disk Group > Advanced > Initialize... menu option. Note: Make sure you save this procedure by selecting Save As, because once you perform step 9 and the failure is fixed, you will not be able to access the information in steps 10 and 11 from the Recovery Guru. |
9 | Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative; otherwise, go to step 10. |
10 | Add the affected virtual disks back to the operating system. You may need to reboot the system to see the re-initialized virtual disks. Note: Do not start I/O to these virtual disks until after you restore from backup. |
11 | Restore the data for the affected virtual disks from backup. |
12 | If desired, re-create any snapshots that you deleted in step 4. |
13 | If desired, re-create any copies you stopped by highlighting the copy pairs in the Copy Manager and selecting the Copy > Re-Copy menu option. |
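Step 7 asks you to wait until the new physical disk is ready before initializing. If that readiness is checked from a script rather than by watching the fault indicator light, a polling loop with a timeout is one way to express the wait. The Python sketch below is a generic helper under stated assumptions: `wait_until` is a hypothetical name, and the `predicate` you pass in (for example, a function that reports whether the disk is ready) is something you must supply yourself, since this procedure does not describe a scripting interface for that check.

```python
import time


def wait_until(predicate, timeout_s, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    sleep and clock are injectable so the helper can be tested
    without real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for a real readiness check: not ready twice, then ready.
    readings = iter([False, False, True])
    print(wait_until(lambda: next(readings), timeout_s=90, interval_s=0))
```

A timeout somewhat longer than the "one minute or less" mentioned in step 7 leaves margin without waiting indefinitely; if the helper returns `False`, do not proceed to step 8.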
Recovering Degraded Virtual Disks
Use the following procedure if the affected virtual disks are degraded RAID 1, 5, 6, or 10. This procedure applies to both disk groups and disk pools. You will need two replacement physical disks for this procedure.
Caution: Risk of Data Loss. An impending physical disk failure means that the affected physical disk is likely to fail. If it fails while you are replacing the physical disk that has already failed in this disk pool or disk group (see steps 3 and 4 below), you will lose all data on the affected virtual disks.
1 | Although it is not required, you should stop all I/O to the affected virtual disks to reduce the possibility of data loss. |
2 | Although it is not required, you should back up all data on the affected virtual disks. |
3 | Remove the failed physical disk. The fault indicator light for the physical disk should be on. Note: The Service Action Allowed status in the Details area is always NO for this problem because the component has not yet failed. In this situation, it is acceptable to remove the component even though the Service Action Allowed is NO. |
4 | Wait 30 seconds, then insert the new physical disk. |
5 | Wait until all affected virtual disks have returned to an Optimal status. Resume I/O to the affected virtual disks, if you stopped it in step 1. |
6 | Highlight the physical disk that has the Impending Physical Disk Failure status on the Hardware tab in the Array Management Window and select the Hardware > Physical Disk > Advanced > Fail... menu option. The virtual disks in the disk pool or disk group return to a Degraded state. |
7 | Remove the failed physical disk (its fault indicator light should be on). Note: The Service Action (removal) Allowed (SAA) status in the Details area is always NO for this problem because the component has not yet failed. In this situation, it is acceptable to remove the component even though the SAA is NO. |
8 | Wait 30 seconds, then insert the new physical disk. |
9 | Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative. |
Recovering with a Reconstructing Hot Spare
Use the following procedure if all of the following conditions apply:
The affected virtual disks are RAID 1, 5, 6, or 10
A hot spare is currently being reconstructed to cover for a failed physical disk
The hot spare physical disk does not have an Impending Physical Disk Failure status
Caution: Risk of Data Loss. An Impending Physical Disk Failure means that the affected physical disk is likely to fail. If it fails while the hot spare is reconstructing, you will lose all data on the affected virtual disks. For this reason, you should stop all I/O to the affected virtual disks and back up all data on the affected virtual disks before replacing the physical disks.
1 | Although it is not required, you should stop all I/O to the affected virtual disks to reduce the possibility of data loss. |
2 | Although it is not required, you should back up all data on the affected virtual disks. |
3 | Wait for the hot spare physical disk to finish reconstructing. |
4 | Click the Recheck button to rerun the Recovery Guru. The failure should no longer appear in the Summary area. If the failure appears again, contact your Technical Support Representative. |
Target | Microsoft.SystemCenter.ManagementServer |
Category | Alert |
Enabled | True |
Alert Generate | True |
Alert Severity | Warning |
Alert Priority | Normal |
Remotable | True |
Alert Message | |
ID | Module Type | TypeId | RunAs |
---|---|---|---|
DS | DataSource | Microsoft.Windows.ScriptGenerated.EventProvider | Default |
Alert | WriteAction | System.Health.GenerateAlert | Default |
WriteToDW | WriteAction | Microsoft.SystemCenter.DataWarehouse.PublishEventData | Default |
<Rule ID="Dell.MDStorageArray.ABBXMLEvent24" Enabled="onEssentialMonitoring" Target="SystemCenter!Microsoft.SystemCenter.ManagementServer" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
  <Category>Alert</Category>
  <DataSources>
    <DataSource ID="DS" TypeID="Windows!Microsoft.Windows.ScriptGenerated.EventProvider">
      <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
      <ScriptName>RBODEventGenerator</ScriptName>
      <EventNumber>24</EventNumber>
    </DataSource>
  </DataSources>
  <WriteActions>
    <WriteAction ID="Alert" TypeID="SystemHealth!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>1</Severity>
      <AlertMessageId>$MPElement[Name="Dell.MDStorageArray.ABBXMLEvent24.StringResource"]$</AlertMessageId>
      <AlertParameters>
        <AlertParameter1>$Data/EventDescription$</AlertParameter1>
      </AlertParameters>
      <Suppression>
        <SuppressionValue>$Data/EventDisplayNumber$</SuppressionValue>
        <SuppressionValue>$Data/Channel$</SuppressionValue>
        <SuppressionValue>$Data/PublisherName$</SuppressionValue>
        <SuppressionValue>$Data/LoggingComputer$</SuppressionValue>
        <SuppressionValue>$Data/EventCategory$</SuppressionValue>
        <SuppressionValue>$Data/EventLevel$</SuppressionValue>
        <SuppressionValue>$Data/UserName$</SuppressionValue>
        <SuppressionValue>$Data/EventNumber$</SuppressionValue>
        <SuppressionValue>$Data/EventDescription$</SuppressionValue>
      </Suppression>
      <Custom1/>
      <Custom2/>
      <Custom3/>
      <Custom4/>
      <Custom5/>
      <Custom6/>
      <Custom7/>
      <Custom8/>
      <Custom9/>
      <Custom10/>
    </WriteAction>
    <WriteAction ID="WriteToDW" TypeID="SCDW!Microsoft.SystemCenter.DataWarehouse.PublishEventData"/>
  </WriteActions>
</Rule>
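The numeric `Priority` and `Severity` values in the rule correspond to the "Normal" and "Warning" entries in the properties table above. The Python sketch below shows one way to read those values, and the suppression fields, out of a rule definition using only the standard library. It is an illustration, not a tool shipped with the management pack: `summarize_rule` is a hypothetical helper, `RULE_XML` is a trimmed copy of the rule above, and the number-to-name mappings are assumed to follow the usual Operations Manager convention (0/1/2 for Low/Normal/High priority and Information/Warning/Critical severity).

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the rule above; only the elements read below are kept.
RULE_XML = """
<Rule ID="Dell.MDStorageArray.ABBXMLEvent24" Enabled="onEssentialMonitoring">
  <WriteActions>
    <WriteAction ID="Alert" TypeID="SystemHealth!System.Health.GenerateAlert">
      <Priority>1</Priority>
      <Severity>1</Severity>
      <Suppression>
        <SuppressionValue>$Data/EventDisplayNumber$</SuppressionValue>
        <SuppressionValue>$Data/EventDescription$</SuppressionValue>
      </Suppression>
    </WriteAction>
  </WriteActions>
</Rule>
"""

# Assumed mappings for the GenerateAlert write action.
PRIORITY = {0: "Low", 1: "Normal", 2: "High"}
SEVERITY = {0: "Information", 1: "Warning", 2: "Critical"}


def summarize_rule(xml_text):
    """Extract the alert settings from a rule definition."""
    rule = ET.fromstring(xml_text)
    alert = rule.find("./WriteActions/WriteAction[@ID='Alert']")
    return {
        "id": rule.get("ID"),
        "priority": PRIORITY[int(alert.findtext("Priority"))],
        "severity": SEVERITY[int(alert.findtext("Severity"))],
        # Fields that the rule lists as suppression values.
        "suppression": [
            e.text for e in alert.findall("./Suppression/SuppressionValue")
        ],
    }


summary = summarize_rule(RULE_XML)
```

Here `summary["severity"]` and `summary["priority"]` come out as the same "Warning" and "Normal" labels shown in the properties table, which is a quick consistency check between the table and the XML.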