Alert monitor for Lenovo FlexSystem Cooling Zone inadequate cooling

IBM.FlexSystem.CoolZoonInadequate (UnitMonitor)


Knowledge Base article:

Summary

This monitor watches for a CMM event that indicates that a cooling zone might not have adequate cooling.

Configuration

You can disable this monitor through the Operations Manager's Operations Console. See the "Disable monitors" topic in the Operations Manager's Operations User's Guide for more information.

The Flex System event is delivered to this monitor asynchronously. There is no monitoring interval to configure for this monitor.

The Flex System event is delivered to this monitor from the CMM (Chassis Management Module) of the chassis via SNMP (Simple Network Management Protocol). On the way, it passes through the Flex System runtime support of the Hardware Management Pack installed on the management server that was designated to manage the PureFlex System during the Network Device Discovery process.

For the proper Flex System CMM SNMP settings that are required for the Hardware Management Pack to discover Flex System modules and report events, consult the Hardware Management Pack's User's Guide.

Causes

One or more fan modules in the specified cooling zone have failed or have been removed. If additional fan modules fail or are removed, chassis devices might shut down or throttle because of excessive temperatures. Consider moving applications that are running on nodes in the specified cooling zone to nodes in another cooling zone to ensure the availability of those applications. Note that the fan modules might run faster than normal to compensate for reduced cooling.

For a particular incident, review the history in the State Changes tab. Consult the relevant hardware knowledge articles listed below, keeping in mind the relevant event data.

The relevant Lenovo hardware knowledge articles are available on a system with the Lenovo Hardware Management Pack package installed.

Resolutions

Review the relevant Lenovo hardware knowledge articles listed above for information about how to resolve the hardware problem for a particular incident.

After the hardware problem is resolved, manually reset the health state of this monitor. Once the health state is reset, any outstanding corresponding alerts are closed automatically. See the "Reset Health" topic in the Operations Manager's Operations User's Guide for more information.

To verify that the hardware problem has been resolved, refer to the most recent health state of the corresponding "regular health checkup monitor." Be sure to refer to a health state that was reported later than the hardware event.

Additional

For the proper CMM SNMP settings needed for the Hardware Management Pack, see the "Configuring CMM SNMP settings" topic in the Lenovo Hardware Management Pack for Microsoft System Center Operations Manager Installation and User's Guide.

External

Links to Lenovo resources

Element properties:

Target: IBM.FlexSystem.Chassis
Parent Monitor: System.Health.AvailabilityState
Category: Custom
Enabled: True
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: IBM.FlexSystem.SNMPTrap.3StateManualResetMonitorTypeForChassis
Remotable: True
Accessibility: Public
Alert Message:
Lenovo FlexSystem Cooling Zone inadequate cooling

{0} -- EventID = {1}
RunAs: Default
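
The alert description template `{0} -- EventID = {1}` is filled from the two alert parameters in the monitor definition, each of which reads an SNMP varbind value by OID. The following Python sketch illustrates that substitution under the assumption that the varbinds arrive as OID/value pairs. The `render_alert_description` helper and the sample values are hypothetical and are not part of the management pack.

```python
# OIDs copied from AlertParameter1 and AlertParameter2 in the monitor definition.
ALERT_PARAM1_OID = ".1.3.6.1.4.1.2.6.158.3.1.1.8"   # fills {0}; assumed to carry the event text
ALERT_PARAM2_OID = ".1.3.6.1.4.1.2.6.158.3.1.1.14"  # fills {1}, labelled EventID in the template

# Alert description template as listed under "Alert Message".
TEMPLATE = "{0} -- EventID = {1}"


def render_alert_description(varbinds):
    """Fill the template from a dict mapping OID -> value (hypothetical helper)."""
    # In this sketch, a missing varbind simply renders as an empty string.
    return TEMPLATE.format(
        varbinds.get(ALERT_PARAM1_OID, ""),
        varbinds.get(ALERT_PARAM2_OID, ""),
    )


# Made-up trap data, for illustration only.
sample_varbinds = {
    ALERT_PARAM1_OID: "Cooling zone 1 might not have adequate cooling",
    ALERT_PARAM2_OID: "229633",
}

description = render_alert_description(sample_varbinds)
print(description)
```

How Operations Manager itself resolves the `$Data/Context/...$` expressions is not shown on this page; the sketch only mirrors the positional mapping of parameter 1 to `{0}` and parameter 2 to `{1}`.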

Source Code:

<UnitMonitor ID="IBM.FlexSystem.CoolZoonInadequate" Accessibility="Public" Enabled="true" Target="IBM.FlexSystem.Chassis" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="IBM.FlexSystem.SNMPTrap.3StateManualResetMonitorTypeForChassis" ConfirmDelivery="false">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="IBM.FlexSystem.CoolZoonInadequate.AlertMessageID">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/SnmpVarBinds/SnmpVarBind[OID='.1.3.6.1.4.1.2.6.158.3.1.1.8']/Value$</AlertParameter1>
      <AlertParameter2>$Data/Context/SnmpVarBinds/SnmpVarBind[OID='.1.3.6.1.4.1.2.6.158.3.1.1.14']/Value$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="ComponentSuccess" MonitorTypeStateID="SuccessEventRaised" HealthState="Success"/>
    <OperationalState ID="ComponentWarning" MonitorTypeStateID="WarningEventRaised" HealthState="Warning"/>
    <OperationalState ID="ComponentError" MonitorTypeStateID="ErrorEventRaised" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <EventIds>22963[3-6]</EventIds>
  </Configuration>
</UnitMonitor>
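
The `EventIds` value `22963[3-6]` reads as a character-class pattern that covers four consecutive event IDs. The Python sketch below illustrates that reading. It assumes the monitor type treats the value as a regular expression that must match the whole event ID; the monitor type's actual matching logic is not shown on this page, and `matches_monitor` is a hypothetical helper.

```python
import re

# Pattern copied from the <EventIds> element of the monitor configuration.
EVENT_IDS_PATTERN = re.compile(r"22963[3-6]")


def matches_monitor(event_id):
    """Return True if the event ID matches the pattern in full (hypothetical helper)."""
    # fullmatch is an assumption: it rejects longer IDs that merely contain the pattern.
    return EVENT_IDS_PATTERN.fullmatch(str(event_id)) is not None


# Probe the neighbouring IDs to see exactly which ones the pattern covers.
matched = [event_id for event_id in range(229630, 229640) if matches_monitor(event_id)]
print(matched)
```

Under that assumption the monitor reacts to event IDs 229633 through 229636. Which of these IDs drives the Success, Warning, or Error state is decided by the monitor type and cannot be inferred from this definition alone.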