Alert monitor for Lenovo System temperature condition (platform extended checks)

IBM.SystemX.TemperatureSensor.PlatformExtendedEvents (UnitMonitor)

Knowledge Base article:

Summary

This monitor watches for unusual hardware events that are specific to the temperature sensor on the system hardware platform. Data in the alert and in the state change provides details about the hardware event.

Configuration

You can disable this monitor through the Operations Manager's Operations Console. See the "Disable monitors" topic in the Operations Manager's Operations User's Guide for more information.
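Disabling a monitor from the console is stored as an override in an unsealed management pack. As a rough sketch of what that override might look like in management pack XML (the override ID and the `IBMHW` reference alias are hypothetical; the alias would have to be declared in the References section of your own override pack):

```xml
<!-- Sketch only. The ID value and the IBMHW alias are assumptions, not taken from the Lenovo pack. -->
<Monitoring>
  <Overrides>
    <MonitorPropertyOverride ID="Example.Override.TemperatureSensor.PlatformExtendedEvents.Disable"
                             Context="IBMHW!IBM.SystemX.TemperatureSensor"
                             Enforced="false"
                             Monitor="IBMHW!IBM.SystemX.TemperatureSensor.PlatformExtendedEvents"
                             Property="Enabled">
      <!-- Turns the monitor off for every instance of the target class -->
      <Value>false</Value>
    </MonitorPropertyOverride>
  </Overrides>
</Monitoring>
```

In practice it is usually simpler to create the override in the Operations Console and let it generate this XML.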

Hardware-platform-specific events, such as this one, are delivered asynchronously to this monitor. There is no monitoring interval to configure for this monitor.

The hardware event for this monitor is available only on a Lenovo system with the appropriate hardware sensors and with a management controller (also called a Service Processor), such as Integrated Management Module (IMM), Baseboard Management Controller (BMC), Remote Supervisor Adapter (RSA), or an equivalent management controller on an older Lenovo system.

This monitor depends on hardware instrumentation software, namely the IBM Director Platform Agent (also called Core Services) and the Intelligent Platform Management Interface (IPMI) driver stack. This software raises the hardware event to the WMI level, so that the monitor can be notified. On certain configurations, the RSA daemon can be used in place of, or in parallel with, the IPMI driver stack. See the "Additional Information" section below for more information about the IBM Director Platform Agent, the IPMI driver stack, and the RSA daemon.

Causes

Details about the cause of the hardware event are recorded in the alert data and in the state change record. The latest state change of this monitor reflects the severity level of the most recent hardware event recorded by this monitor.

Resolutions

Review the details of the hardware event. Contact Lenovo support (see links below) if the reports or relevant articles do not provide enough information to resolve the hardware problem.

After the hardware problem is resolved, manually reset the health state of this monitor. You do not need to close the corresponding alerts yourself: any that are still outstanding are closed automatically. See the "Reset Health" topic in the Operations Manager's Operations User's Guide for more information.

To verify that the hardware problem has been resolved, refer to the most recent health state of the corresponding "regular health checkup monitor." Be sure to refer to a health state that was reported later than the hardware event.

Additional Information

External

Links to Lenovo resources

Element properties:

Target: IBM.SystemX.TemperatureSensor
Parent Monitor: System.Health.PerformanceState
Category: Custom
Enabled: True
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: IBM.SingleClass.HWInstanceNotEqualMonitorType.ManualReset3State
Remotable: True
Accessibility: Public
Alert Message:
    A temperature sensor platform extended event occurred
    {0} -- EventClass = {1}
RunAs: Default

Source Code:

<UnitMonitor ID="IBM.SystemX.TemperatureSensor.PlatformExtendedEvents" Accessibility="Public" Enabled="true" Target="IBM.SystemX.TemperatureSensor" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="IBM.SingleClass.HWInstanceNotEqualMonitorType.ManualReset3State" ConfirmDelivery="false">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="IBM.SystemX.TemperatureSensor.PlatformExtendedEvents.AlertMessageResourceID">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Property[@Name="Description"]$</AlertParameter1>
      <AlertParameter2>$Data/Context/Property[@Name="__CLASS"]$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Success" MonitorTypeStateID="ManualResetEventRaised" HealthState="Success"/>
    <OperationalState ID="Warning" MonitorTypeStateID="WarningEventRaised" HealthState="Warning"/>
    <OperationalState ID="Error" MonitorTypeStateID="ErrorEventRaised" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <NameSpace>root\ibmsd</NameSpace>
    <Query>SELECT __Class, AlertingManagedElement, Description, EventID, PerceivedSeverity FROM CIM_AlertIndication</Query>
    <CIMAlertClassName>IBMPSG_TemperatureEvent</CIMAlertClassName>
    <SpecialFilter>Temperature</SpecialFilter>
    <PollInterval>10</PollInterval>
    <WinEventFiltering>$Target/Host/Host/Property[Type="IBM.SystemX.Platform"]/ibmInternalWinEventFiltering$</WinEventFiltering>
    <Licensed>$Target/Host/Host/Property[Type="IBM.SystemX.Platform"]/Licensed$</Licensed>
  </Configuration>
</UnitMonitor>
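
The AlertMessage attribute in the source above points at a string resource, and AlertParameter1 and AlertParameter2 fill the {0} and {1} placeholders of the alert message listed under Element properties. The pack's own language pack is not reproduced on this page, so the following is only a sketch of how such a message is typically declared in management pack XML. It assumes the first line of the alert message is the display name and the second is the description; the ENU language pack ID is likewise an assumption.

```xml
<!-- Sketch only: reconstructed from the alert message text shown on this page, not copied from the pack. -->
<Presentation>
  <StringResources>
    <StringResource ID="IBM.SystemX.TemperatureSensor.PlatformExtendedEvents.AlertMessageResourceID"/>
  </StringResources>
</Presentation>
<LanguagePacks>
  <LanguagePack ID="ENU" IsDefault="true">
    <DisplayStrings>
      <DisplayString ElementID="IBM.SystemX.TemperatureSensor.PlatformExtendedEvents.AlertMessageResourceID">
        <Name>A temperature sensor platform extended event occurred</Name>
        <!-- {0} receives the event Description property, {1} receives the WMI __CLASS name -->
        <Description>{0} -- EventClass = {1}</Description>
      </DisplayString>
    </DisplayStrings>
  </LanguagePack>
</LanguagePacks>
```

With this layout, an event whose __CLASS is IBMPSG_TemperatureEvent would produce an alert description that ends in "EventClass = IBMPSG_TemperatureEvent".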