Dell Chassis Storage External PhysicalDisk Primary Health Status PollBased UnitMonitor

Dell.Chassis.Storage.ExternalPhysicalDiskPrimaryStatusHealth.PollBasedUnitMonitor (UnitMonitor)

Knowledge Base article:

Summary

Dell Chassis Storage External PhysicalDisk Primary Health Status PollBased UnitMonitor

Causes

Probable causes and corresponding resolutions for this condition are:

Cause

Resolutions

A physical disk included in the virtual disk has failed or is corrupted. In addition, you may have cancelled the rebuild.

Replace the failed or corrupt disk, and then start the rebuild operation.

The RAID Controller may not be able to read/write data to the physical disk drive indicated in the message. This may be due to a failure with the physical disk drive or because the physical disk drive was removed from the system.

Remove and re-insert the physical disk drive identified in the message and make sure the physical disk drive is inserted properly. If the issue persists, replace the physical disk drive.

A Clear task was being performed on a physical disk but the task was interrupted and did not complete successfully. The controller may have lost communication with the disk, the disk was removed, or the cables may be loose or defective.

Verify that the disk is present and not in a Failed state. Make sure the cables are attached securely. See the storage hardware documentation for more information on checking the cables. Restart the Clear task.

The Patrol Read task has encountered an error that cannot be corrected. There may be a bad disk block that cannot be remapped.

Back up your data from the disk. Start disk initialization and wait for it to complete, then restore the data from the backup to the disk.

The controller encountered an unrecoverable medium error when attempting to read a block on the physical disk and marked that block as invalid. If the controller encountered the unrecoverable medium error on a source physical disk during a rebuild or reconfigure operation, it punctures the corresponding block on the target physical disk. The invalid block clears on a write operation.

Back up the data from the disk. Start disk initialization and wait for it to complete, and then restore the data from a backup copy.

The controller firmware attempted to do SMART polling on the hot spare but could not complete the SMART polling. The controller may have lost communication with the hot spare.

Verify the health of the disk assigned as a hot spare. Replace the disk and reassign the hot spare. Make sure the cables are attached securely. See the storage hardware documentation for more information on checking the cables.

The bad block table is the table used for remapping bad disk blocks. This table fills as bad disk blocks are remapped. When the table is full, bad disk blocks are no longer remapped, which means that disk errors are no longer corrected. At this point, data loss can occur.

Replace the disk generating this message and restore from a backup copy. You may have lost data.

The rebuild of <physical disk> failed due to errors on the source physical disk.

Replace the source disk and restore data from backup.

The rebuild failed due to errors on the target <physical disk>.

Replace the target disk. If a rebuild does not automatically start after replacing the disk, then initiate the Rebuild task. You may need to assign the new disk as a hot spare to initiate the rebuild.

A write operation cannot complete because the disk contains bad disk blocks that could not be reassigned. Data loss may have occurred. Data redundancy may be lost.

Replace the disk.

The rebuild or recovery operation encountered an unrecoverable disk media error.

Replace the disk.

The physical disk participating in the copyback operation failed.

Replace the disk and retry the operation.

Errors were detected with security related operations on the disk. The data on the disk might not be retrieved or stored successfully. In addition, the security of the stored data might be at risk.

Verify the disk is a Secure Encrypted Disk and is not locked. If it is not, replace the disk with a Secure Encrypted Disk. See the storage hardware documentation for more information.

When physical drives in the spun-down power state are configured, the drives should transition to the spun-up power state. If a drive is not functioning properly, this transition could fail.

Replace the physical disk and try again. Contact technical support if the issue persists.

This message is generated after a copyback stops on a physical disk during a rebuild operation.

Wait for the rebuild to finish. The copyback operation should then resume.

The physical disk is predicted to fail. Many physical disks contain Self-Monitoring, Analysis and Reporting Technology (SMART). When enabled, SMART monitors the disk health based on indications such as the number of write operations that were performed on the disk.

Replace the physical disk. Even though the disk may not have failed yet, it is strongly recommended that the disk be replaced. Review other messages for additional information.

This message is generated if a physical disk does not have enough space to do a copyback operation.

Replace the physical disk with a larger physical disk, and then restart the copyback operation.

The physical device may not have a supported version of the firmware or the physical device may not be supported.

If the physical device is supported, then update the firmware to a supported version. If the physical device is not supported, then replace the physical device with one that is supported.

The dedicated hot spare is not large enough to protect all virtual disks that reside on the disk group.

Assign a larger disk as the dedicated hot spare.

A physical disk has been removed from the disk group. This alert can also be caused by loose or defective cables or by problems with the enclosure.

Do the following, as applicable:
1) If a physical disk was removed from the disk group, either replace the disk or restore the original disk. Identify the disk that was removed by locating the disk that has a red "X" for its status.
2) Perform a rescan after replacing or restoring the disk.
3) If a disk was not removed from the disk group, check for cable problems. Refer to the product documentation for more information on checking the cables.
4) Make sure that the enclosure is powered on.
5) If the problem persists, check the enclosure documentation for further diagnostic information.

The global hot spare is not large enough to protect all virtual disks that reside on the controller.

Assign a larger disk as the global hot spare.

The controller has two connectors that are connected to the same enclosure. The communication path on one connector has lost connection with the enclosure. The communication path on the other connector is reporting this loss.

Make sure the cables are attached securely. See the Cables Attached Correctly section for more information on checking the cables. Make sure both Enclosure Management Modules (EMMs) are healthy.

The physical disk inserted is too small for the rebuild to occur.

Replace the physical disk with one that uses the correct protocol (SAS or SATA) and has at least the required capacity. Force a rebuild if one does not start automatically.

The bad block table is the table used for remapping bad disk blocks. This table fills as bad disk blocks are remapped. When the table is full, bad disk blocks are no longer remapped, which means that disk errors are no longer corrected. At this point, data loss can occur. The bad block table is now 80 percent full.

Replace the disk generating this message.

A physical disk in the virtual disk is offline. This may have been user-initiated.

Force the physical disk online or assign a hot spare to the virtual disk.

The reason for the error can vary depending on the situation. The firmware error code is indicated in the message.

Verify the health of attached devices. Review the Lifecycle log for significant events. Replace faulty hardware, if required. Make sure the cables are attached securely. Refer to the storage hardware documentation for more information on checking the cable connections.

The inserted physical disk uses the incorrect protocol. Mixing SAS and SATA disks in the same virtual disk is not supported.

Remove the drive, insert a drive of the correct protocol type, and force a rebuild as required by the controller and system.

The number of blocks on the disk that exhibit an error has exceeded the capacity of the drive to remap. Any future writes to bad sectors will be unrecoverable.

Replace the disk.

The attempt to update the physical disk has failed. This may be due to too much activity on the bus, a bad update package, or a bad disk.

Retry the update. If the update fails a second time, verify the update package is valid. If the update package is valid, replace the failing disk. Contact technical support if the problem persists.

An error occurred while performing an action on the disk.

Verify that the disk is present, or replace the disk.

This message is generated after a rebuild completes on a physical disk.

No response action is required.

A user has cancelled the <physical disk> rebuild operation.

Restart the rebuild operation.

The physical disk was assigned as a global hot spare to a virtual disk by a user operation.

No response action is required.

This message is generated after a rebuild starts on a physical disk.

No response action is required.

The physical disk was assigned as a dedicated hot spare to a virtual disk by a user operation.

No response action is required.

A physical disk that was assigned as a hot spare has been unassigned and is no longer functioning as a hot spare. The physical disk was unassigned by a user or automatically by the storage management software. When one of the disks in a virtual disk fails, data is rebuilt onto the hot spare. The hot spare becomes a member of the virtual disk and is no longer assigned as a hot spare. In this situation, assigning a new hot spare to maintain data protection is recommended.

No response action is required. Assigning a new dedicated hot spare to the virtual disk is recommended.

A previously offline physical disk is now online.

No response action is required.

A user has initiated a clear operation on a physical disk.

No response action is required.

A physical drive that was previously in an error state has returned to a ready state.

No response action is required.

The clear command did not complete on the physical disk. This means that some data was not cleared and may be recoverable.

No response action is required.

This message is generated after rebuild resumes on a physical disk.

No response action is required.

This message is generated after a disk media error is corrected on a physical disk.

No response action is required.

The disk has a bad block. Data was remapped to another disk block. No data loss has occurred.

Monitor the disk for other messages or indications of poor health.

A copyback operation began to copy data from one physical disk to another physical disk. This may be due to a user-initiated operation or a predictive failure.

No response action is required.

This alert is provided for informational purposes.

No response action is required.

A user aborted an ongoing copyback operation. The operation did not complete.

Re-issue the command.

The physical disk firmware update operation has started.

No response action is required.

Security on a secure encrypted disk was activated.

No response action is required.

The security key on a secure encrypted disk was activated.

No response action is required.

The physical device was reset. This is a normal part of operations and is not a cause for concern.

No response action is required.

The physical disk firmware update operation has completed.

No response action is required.

The attempt to update the physical disk has timed out. This may be due to too much activity on the bus, a bad update package, or a bad disk.

Retry the update. The update utility may have already performed the retry operation.

Security on a secure encrypted disk was disabled.

No response action is required.

The controller detected drives that require security keys for access. Without the security keys, the drives are unusable.

Provide the security key required to unlock the secure encrypted drives.

Resolutions

Launch the CMC Console to debug further.

Element properties:

Target: Dell.Chassis.Storage.Controller.Enclosure.ExternalPhysicalDisk
Parent Monitor: System.Health.AvailabilityState
Category: Custom
Enabled: True
Alert Generate: False
Alert Auto Resolve: False
Monitor Type: Dell.Chassis.Storage.HealthCookDownUMT
Remotable: True
Accessibility: Public
RunAs: Default

Source Code:

<UnitMonitor ID="Dell.Chassis.Storage.ExternalPhysicalDiskPrimaryStatusHealth.PollBasedUnitMonitor" Accessibility="Public" Enabled="true" Target="DAD!Dell.Chassis.Storage.Controller.Enclosure.ExternalPhysicalDisk" ParentMonitorID="SystemHealth!System.Health.AvailabilityState" Remotable="true" TypeID="Dell.Chassis.Storage.HealthCookDownUMT" Priority="Normal" ConfirmDelivery="false">
  <Category>Custom</Category>
  <OperationalStates>
    <OperationalState ID="Success" MonitorTypeStateID="Success" HealthState="Success"/>
    <OperationalState ID="Critical" MonitorTypeStateID="Error" HealthState="Error"/>
    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>21600</IntervalSeconds>
    <LogLevel>0</LogLevel>
    <InstanceIndex>$Target/Property[Type="DAD!Dell.Chassis.Storage.Controller.Enclosure.ExternalPhysicalDisk"]/FQDD$</InstanceIndex>
    <RemoteAccessIP>$Target/Host/Host/Host/Host/Property[Type="DAD!Dell.Chassis.Storage"]/RemoteAccessIP$</RemoteAccessIP>
    <RemoteConfig>$Target/Host/Host/Host/Host/Property[Type="DAD!Dell.Chassis.Storage"]/RemoteSettings$</RemoteConfig>
    <LogDirectory>ChassisRemoteAccess_Logs</LogDirectory>
    <LogFileName>Dell_ChassisDetailed_Health_</LogFileName>
    <ComponentType>Dell.Chassis.Storage.Controller.Enclosure.ExternalPhysicalDisk.PrimaryStatus</ComponentType>
    <Username>$RunAs[Name='DAD!Dell.CMC.RemoteAccount']/UserName$</Username>
    <Password>$RunAs[Name="DAD!Dell.CMC.RemoteAccount"]/Password$</Password>
  </Configuration>
</UnitMonitor>
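
Reading the definition programmatically:

The monitor definition above can be inspected with a short script. The following is a minimal Python sketch, not part of the management pack: it assumes a trimmed copy of the definition held in a string, and the names `MONITOR_XML` and `summarize_monitor` are illustrative only. It converts the polling interval to hours and lists how each operational state maps to a health state.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the UnitMonitor shown above (assumption: only the parts
# read by this sketch are kept; the real definition has more attributes
# and configuration elements).
MONITOR_XML = """
<UnitMonitor ID="Dell.Chassis.Storage.ExternalPhysicalDiskPrimaryStatusHealth.PollBasedUnitMonitor" Enabled="true">
  <OperationalStates>
    <OperationalState ID="Success" MonitorTypeStateID="Success" HealthState="Success"/>
    <OperationalState ID="Critical" MonitorTypeStateID="Error" HealthState="Error"/>
    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>21600</IntervalSeconds>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text):
    """Return the monitor ID, polling interval in hours, and state mapping."""
    root = ET.fromstring(xml_text)
    interval_seconds = int(root.findtext("Configuration/IntervalSeconds"))
    # Map each operational state ID to the health state it reports.
    states = {
        state.get("ID"): state.get("HealthState")
        for state in root.iter("OperationalState")
    }
    return {
        "id": root.get("ID"),
        "interval_hours": interval_seconds / 3600,
        "states": states,
    }


summary = summarize_monitor(MONITOR_XML)
print(summary["interval_hours"])
print(summary["states"])
```

With the values in the source, the 21600-second interval works out to 6 hours, so a change in the disk's primary status may take up to about one polling interval to appear in the monitor state.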