Dell Chassis Storage Controller Health PollBased UnitMonitor

Dell.Chassis.Storage.ControllerHealth.PollBasedUnitMonitor (UnitMonitor)

Dell Chassis Storage Controller Health PollBased UnitMonitor

Knowledge Base article:

Summary

Dell Chassis Storage Controller Health PollBased UnitMonitor

Causes

Probable causes and corresponding resolutions for this condition are:

Cause

Resolutions

The controller and attached enclosures are not cabled correctly.

Refer to the storage hardware documentation for information on correct cabling configurations.

An error involving multiple bits has been encountered during a read or write operation. The error correction algorithm recalculates parity data during read and write operations. If an error involves only a single bit, the error correction algorithm may be able to correct the error and maintain parity data. An error involving multiple bits, however, generally indicates data loss. In some cases, if the multi-bit error occurs during a read operation, the data on the disk may still be intact. If the multi-bit error occurs during a write operation, data loss has occurred.

Replace the Dual In-line Memory Module (DIMM). The DIMM is a part of the controller battery pack. Refer to the storage hardware documentation for information on replacing the DIMM. You may need to restore data from backup.

A diagnostics test failed. The text for this message is generated by the tool that ran the diagnostics and can vary depending on the situation.

Refer to the storage hardware documentation for more information on the diagnostic test.

The Dual In-line Memory Module (DIMM) is malfunctioning. Data loss or data corruption may occur. The DIMM must be replaced immediately.

Replace the DIMM immediately. The DIMM is a part of the controller battery pack. Refer to your hardware documentation for information on replacing the DIMM.

The text for this alert is generated by the firmware and can vary depending on the situation. The reference to SMP in this text refers to SAS Management Protocol.

There may be a SAS topology error. See the hardware documentation for information on correct SAS topology configurations. There may be problems with the cables such as a loose connection or an invalid cabling configuration. See the Cables Attached Correctly section for more information on checking the cables. See the hardware documentation for information on correct cabling configurations. Verify that the firmware version is supported.

There may be a problem with the enclosure. Verify the health of the enclosure and its components. To verify the health of the enclosure, select the enclosure object in the tree view. The Health sub-tab displays a red X or yellow exclamation point for enclosure components that are failed or degraded. See the enclosure documentation for more information.

There may be a problem with the enclosure. Verify the health of the enclosure and its components. To verify the health of the enclosure, select the enclosure object in the tree view. The Health sub-tab displays a red X or yellow exclamation mark for enclosure components that are failed or degraded. See the enclosure documentation for more information.

The controller encountered errors while applying the configuration on the physical disks. The configuration command applied has not taken effect.

Re-issue the failed configuration command or try with a different set of physical disks. Contact technical support if the problem persists.

The controller has flushed the cache and all data in the cache has been lost. This may occur when the system has memory or battery problems that cause the controller to distrust the cache. Although user data may have been lost, this alert does not always indicate that relevant or user data has been lost.

Verify that the battery and memory are functioning properly.

The controller memory is malfunctioning.

Replace the controller. For assistance, contact Dell technical support at support.dell.com.

An error involving a single bit has been encountered during a read or write operation. The error correction algorithm has corrected this error.

No response action is required.

The Dual In-line Memory Module (DIMM) is beginning to malfunction.

Replace the Dual In-line Memory Module (DIMM) to avoid data loss or data corruption. The DIMM is a part of the controller battery pack. Refer to hardware documentation for information on replacing the DIMM.

The NVRAM has corrupted data. This may occur after a power surge, a battery failure, or for other reasons. The controller is reinitializing the NVRAM.

No response action is required because the controller is taking the required corrective action. However, if this message is generated often (such as during each reboot), replace the controller.

The NVRAM has corrupt data. The controller is unable to correct the situation.

Replace the controller.

The text for this message is generated by the controller during runtime and can vary depending on the situation.

Make sure the SAS cables attached to the enclosure/backplane are attached securely. Refer to the storage hardware documentation for more information on checking the cables. Contact technical support if the issue persists.

The controller has detected a change in the storage configuration since its last shutdown. This generally happens if configured physical disks were removed while the system was shut down.

Shut down the system. Reinsert the removed physical disks and restart the system.

The controller is unable to retrieve the storage configuration saved during the last shutdown. This generally happens if configured physical disks were removed while the system was shut down.

Shut down the system. Reinsert the removed physical disks and restart the system.

There is too much foreign configuration data to be imported in one attempt.

Import these foreign configurations in multiple attempts.

The controller has preserved data in the cache from a virtual disk, which it can no longer locate. No data loss has occurred as the controller is saving these operations in its cache.

Check for a foreign configuration and import it if one exists. Check that the enclosures are cabled correctly.

The controller cache was discarded by the user. Data may have been lost or corrupted.

If data was corrupted or lost, restore the data to the disk from a backup copy.

This event was generated while iDRAC storage monitoring was not running and was later retrieved from the controller. Such past events are logged with informational severity.

No response action is required.

The controller alarm test has run successfully. This alert is provided for informational purposes.

No response action is required.

A user operation to reset the controller configuration was performed.

No response action is required.

This alert is provided for informational purposes.

No response action is required.

This is the result of a user operation.

No response action is required.

The operation to import the foreign configuration (virtual disks, hot spares, and so on) completed successfully.

Verify the foreign configuration was successfully imported by viewing the virtual disks and hot spares. See the hardware documentation for more information.

A patrol read operation was initiated by the controller or the user.

No response action is required.

The patrol read task ended.

No response action is required.

The controller has physical disks that were moved from another controller or were removed and replaced in the current configuration. These physical disks contain virtual disks that were created on the current controller or on another controller. A foreign configuration is also seen when a controller is replaced. See the storage hardware documentation on Import Foreign Configuration and Clear Foreign Configuration for more information.

Import or clear the detected foreign configuration using a configuration tool or utility.

This alert occurs when the controller firmware is flashed successfully.

No response action is required.

An attempt to hot plug an Enclosure Management Module (EMM) has been detected.

If the new enclosure and its device are not detected by the operating system, try restarting your system to verify the newly attached enclosure and its sub-components are detected. See the hardware documentation for more information.

This alert is provided for informational purposes. The text for this alert is generated by the controller and can vary depending on the situation.

No response action is required.

The controller was reset to factory default configuration settings.

No response action is required.

Physical disk(s) have been removed from a virtual disk. The virtual disk will be in Failed state during the next system reboot. This alert is provided for informational purposes.

No response action is required.

Resolutions

Launch the CMC Console to debug further.

Element properties:

Target: Dell.Chassis.Storage.Controller
Parent Monitor: System.Health.AvailabilityState
Category: Custom
Enabled: True
Alert Generate: False
Alert Auto Resolve: False
Monitor Type: Dell.Chassis.Storage.HealthCookDownUMT
Remotable: True
Accessibility: Public
RunAs: Default
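
The properties above show that this monitor changes health state but does not raise alerts (Alert Generate is False). If alerts are wanted, an override stored in a separate, unsealed management pack is the usual route. The fragment below is a minimal sketch and is not taken from the Dell management pack: the override ID and the `DCS` alias (standing in for a reference to whichever management pack defines this monitor) are hypothetical, and the `DAD` alias is assumed to point at the same management pack it references in the monitor's source. The fragment would sit inside the `Overrides` element of the `Monitoring` section.

```xml
<!-- Hypothetical override: turns on alert generation for this monitor. -->
<!-- "DCS" and the override ID are illustrative names, not from the Dell pack. -->
<MonitorPropertyOverride ID="Example.Override.ControllerHealth.GenerateAlert"
                         Context="DAD!Dell.Chassis.Storage.Controller"
                         Enforced="false"
                         Monitor="DCS!Dell.Chassis.Storage.ControllerHealth.PollBasedUnitMonitor"
                         Property="GenerateAlert">
  <Value>true</Value>
</MonitorPropertyOverride>
```

Because the monitor's source defines no alert settings, the name and description of any resulting alert may be generic. Test the override on a single controller before applying it broadly.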

Source Code:

<UnitMonitor ID="Dell.Chassis.Storage.ControllerHealth.PollBasedUnitMonitor" Accessibility="Public" Enabled="true" Target="DAD!Dell.Chassis.Storage.Controller" ParentMonitorID="SystemHealth!System.Health.AvailabilityState" Remotable="true" TypeID="Dell.Chassis.Storage.HealthCookDownUMT" Priority="Normal" ConfirmDelivery="false">
  <Category>Custom</Category>
  <OperationalStates>
    <OperationalState ID="Success" MonitorTypeStateID="Success" HealthState="Success"/>
    <OperationalState ID="Critical" MonitorTypeStateID="Error" HealthState="Error"/>
    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>21600</IntervalSeconds>
    <LogLevel>0</LogLevel>
    <InstanceIndex>$Target/Property[Type="DAD!Dell.Chassis.Storage.Controller"]/FQDD$</InstanceIndex>
    <RemoteAccessIP>$Target/Host/Property[Type="DAD!Dell.Chassis.Storage"]/RemoteAccessIP$</RemoteAccessIP>
    <RemoteConfig>$Target/Host/Property[Type="DAD!Dell.Chassis.Storage"]/RemoteSettings$</RemoteConfig>
    <LogDirectory>ChassisRemoteAccess_Logs</LogDirectory>
    <LogFileName>Dell_ChassisDetailed_Health_</LogFileName>
    <ComponentType>Dell.Chassis.Storage.Controller</ComponentType>
    <Username>$RunAs[Name='DAD!Dell.CMC.RemoteAccount']/UserName$</Username>
    <Password>$RunAs[Name="DAD!Dell.CMC.RemoteAccount"]/Password$</Password>
  </Configuration>
</UnitMonitor>
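
The `IntervalSeconds` value of 21600 in the configuration means the controller health is polled every 6 hours (21600 / 3600 = 6), so a state change on the controller can take up to that long to appear. The fragment below sketches how the interval might be shortened to one hour with a configuration override. It is an illustration under assumptions rather than a verified setting: it only works if the monitor type `Dell.Chassis.Storage.HealthCookDownUMT` declares `IntervalSeconds` as an overridable parameter, which the source shown here does not confirm. The override ID and the `DCS` alias for the management pack that defines the monitor are hypothetical.

```xml
<!-- Hypothetical override: polls every 3600 seconds (1 hour) instead of 21600. -->
<!-- Assumes IntervalSeconds is an overridable parameter of the monitor type. -->
<MonitorConfigurationOverride ID="Example.Override.ControllerHealth.IntervalSeconds"
                              Context="DAD!Dell.Chassis.Storage.Controller"
                              Enforced="false"
                              Monitor="DCS!Dell.Chassis.Storage.ControllerHealth.PollBasedUnitMonitor"
                              Parameter="IntervalSeconds">
  <Value>3600</Value>
</MonitorConfigurationOverride>
```

The monitor type's name suggests it relies on cookdown, where one data source run is shared across many instances. Applying different interval values to individual controllers can break that sharing, so an override of this kind is probably best applied to the whole class rather than to single instances.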