Health monitor for processor over-temperature condition

Lenovo.IMM.Processor.OverTemp (UnitMonitor)

Knowledge Base article:

Summary

This monitor watches for an IMM event indicating that the IMM has detected an over-temperature condition for a processor.

Configuration

You can disable this monitor through the Operations Manager's Operations Console. See the "Disable monitors" topic in the Operations Manager's Operations User's Guide for more information.

The IMM (Integrated Management Module) event is delivered to this monitor asynchronously, so there is no monitoring interval to configure.

The IMM event is delivered to this monitor only from an authenticated IMM, so make sure the IMM is authenticated first. To authenticate an IMM, select it under Lenovo Integrated Management Module (IMM), then run the "Authenticate IMM" task.

The IMM event is delivered to this monitor from the server through network port 9500. Make sure that this port is not blocked by a firewall. If it is blocked, create a firewall rule that allows traffic on this port.
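
As a quick way to confirm that a firewall is not blocking port 9500, a reachability check along the following lines may help. This is a minimal sketch under two assumptions that this page does not confirm: that the port carries TCP traffic, and that the listener runs on the host you point the check at. The function name `tcp_port_reachable` and the host name in the usage example are hypothetical.

```python
import socket


def tcp_port_reachable(host: str, port: int = 9500, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Assumption: the event channel uses TCP. A False result only means the
    connection attempt failed (blocked, refused, or timed out); it does not
    say which of those happened.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical host name; replace it with the machine that listens on 9500.
    host = "opsmgr-server.example.com"
    state = "reachable" if tcp_port_reachable(host) else "not reachable"
    print(f"{host}:9500 is {state}")
```

Run the check from the machine that sends the events, so that the test crosses the same firewall as the real traffic.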

Resolutions

Complete the following steps:

1. Check the IMM event log for any fan or cooling related issues.
2. Make sure that the airflow at the front and rear of the server is not obstructed and that fillers are in place and correctly installed.
3. Make sure that the room temperature is within operating specifications.
4. Make sure that the microprocessor 1 heat sink is securely installed.
5. (Trained service technician only) Make sure that the microprocessor 1 heat sink is installed correctly and the thermal material is correctly applied.
6. (Trained service technician only) Replace microprocessor 1.

The monitor generates an alert when its health state changes to either Critical or Warning. After the hardware problem is resolved, the health state will be automatically restored to the Healthy state. Any outstanding corresponding alerts will also be automatically closed.

External

Links to Lenovo resources

Element properties:

Target: Lenovo.HardwareMgmtPack.IMM2.Processor
Parent Monitor: System.Health.AvailabilityState
Category: Custom
Enabled: True
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Low
Alert Auto Resolve: True
Monitor Type: Lenovo.IMM.SubComponent.UMT.Id3State
Remotable: True
Accessibility: Public
Alert Message:
{2} - {3}
The alert was generated because event "{0}" received from IMM {1}
RunAs: Default
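
The placeholders in the alert message are filled from the alert parameters defined in the monitor's source: in Operations Manager, AlertParameter1 supplies {0}, AlertParameter2 supplies {1}, and so on. The sketch below illustrates that substitution with Python string formatting. It assumes the first message line is the alert name and the second is the description. The function name `render_alert` and all sample values are hypothetical.

```python
# Template text taken from the Alert Message property above.
ALERT_NAME_TEMPLATE = "{2} - {3}"
ALERT_DESCRIPTION_TEMPLATE = (
    'The alert was generated because event "{0}" received from IMM {1}'
)


def render_alert(event_description, event_source_name, message_id, message_name):
    """Fill both templates from the four alert parameters.

    Parameter order mirrors AlertParameter1..AlertParameter4:
    {0} EventDescription, {1} EventSourceName, {2} MessageID, {3} MessageName.
    """
    params = (event_description, event_source_name, message_id, message_name)
    name = ALERT_NAME_TEMPLATE.format(*params)
    description = ALERT_DESCRIPTION_TEMPLATE.format(*params)
    return name, description


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    name, description = render_alert(
        "Processor 1 over-temperature", "192.0.2.10", "PLAT0036", "OverTemp"
    )
    print(name)
    print(description)
```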

Source Code:

<UnitMonitor ID="Lenovo.IMM.Processor.OverTemp" Accessibility="Public" Target="Lenovo.HardwareMgmtPack.IMM2.Processor" Enabled="true" TypeID="Lenovo.IMM.SubComponent.UMT.Id3State" ParentMonitorID="Health!System.Health.AvailabilityState">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="Lenovo.IMM.Alert.Rule.AlertMessageID">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Low</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/EventDescription$</AlertParameter1>
      <AlertParameter2>$Data/Context/EventSourceName$</AlertParameter2>
      <AlertParameter3>$Data/Context/EventData/Data/MessageID[1]$</AlertParameter3>
      <AlertParameter4>$Data/Context/EventData/Data/MessageName[1]$</AlertParameter4>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState HealthState="Success" MonitorTypeStateID="HealthyEventRaised" ID="Success"/>
    <OperationalState HealthState="Warning" MonitorTypeStateID="WarningEventRaised" ID="Warning"/>
    <OperationalState HealthState="Error" MonitorTypeStateID="ErrorEventRaised" ID="Error"/>
  </OperationalStates>
  <Configuration>
    <IMMIP>$Target/Property[Type="Lenovo.HardwareMgmtPack.IMM2.BaseSubModule"]/IMMIP$</IMMIP>
    <UnhealthyPlat>PLAT0036</UnhealthyPlat>
    <HealthyPlat>PLAT0037</HealthyPlat>
    <Identifier>6F010703$Target/Property[Type="Lenovo.HardwareMgmtPack.IMM2.Processor"]/Identifier$</Identifier>
  </Configuration>
</UnitMonitor>
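
To review a monitor definition like the one above without reading the raw XML, the state mapping and configuration values can be extracted with a short script. This is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The embedded XML is a trimmed copy of the definition above, and the function name `summarize_monitor` is hypothetical. The script only reads the definition; it does not reproduce how the Id3State monitor type evaluates events.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the monitor definition shown above.
MONITOR_XML = """
<UnitMonitor ID="Lenovo.IMM.Processor.OverTemp" TypeID="Lenovo.IMM.SubComponent.UMT.Id3State">
  <OperationalStates>
    <OperationalState HealthState="Success" MonitorTypeStateID="HealthyEventRaised" ID="Success"/>
    <OperationalState HealthState="Warning" MonitorTypeStateID="WarningEventRaised" ID="Warning"/>
    <OperationalState HealthState="Error" MonitorTypeStateID="ErrorEventRaised" ID="Error"/>
  </OperationalStates>
  <Configuration>
    <UnhealthyPlat>PLAT0036</UnhealthyPlat>
    <HealthyPlat>PLAT0037</HealthyPlat>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text):
    """Return the monitor ID, its state mapping, and its configuration values."""
    root = ET.fromstring(xml_text.strip())
    # Map each monitor-type state to the health state it produces.
    states = {
        state.get("MonitorTypeStateID"): state.get("HealthState")
        for state in root.iter("OperationalState")
    }
    config_node = root.find("Configuration")
    config = {}
    if config_node is not None:
        config = {child.tag: (child.text or "").strip() for child in config_node}
    return {"id": root.get("ID"), "states": states, "config": config}


if __name__ == "__main__":
    summary = summarize_monitor(MONITOR_XML)
    print(summary["id"])
    for state_id, health in summary["states"].items():
        print(f"  {state_id} -> {health}")
    for key, value in summary["config"].items():
        print(f"  {key} = {value}")
```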