Repeated 'Temperature' Performance Monitor Communication Problem Monitor

Fujitsu.Servers.PRIMERGY.OutOfBand.PerfMon.Temperature.RepeatedCommunicationProblem (UnitMonitor)

This monitor checks if there are multiple communication problems related to retrieving the 'Temperature Performance Monitor' information of the Fujitsu Out-Of-Band Server within a defined timespan.

Knowledge Base article:

Summary

This monitor checks whether multiple communication problems have been logged within a certain time period when accessing the 'Temperature Performance Monitor' information of the Fujitsu Out-Of-Band Server. This typically indicates a networking problem or an internal resource problem with the iRMC itself.

Note: The health state resets itself to Success (OK) if no further events are logged within a second, separately configurable time period.
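The raise-and-reset behaviour described above can be illustrated with a small model. This is only a sketch under stated assumptions, not the implementation of the monitor type: the class `RepeatedEventsMonitor` and its methods are hypothetical, time is passed in as plain seconds, and the defaults (3 events, 7200 seconds, 3000 seconds) are taken from the Configuration element in the Source Code section.

```python
from collections import deque


class RepeatedEventsMonitor:
    """Toy model of a self-resolving repeated-events monitor (illustrative only)."""

    def __init__(self, repeated_event_count=3, interval_seconds=7200,
                 no_event_interval_seconds=3000):
        self.repeated_event_count = repeated_event_count
        self.interval_seconds = interval_seconds
        self.no_event_interval_seconds = no_event_interval_seconds
        self._events = deque()  # timestamps (seconds) of recent matching events
        self.state = "Success"

    def on_event(self, now):
        """Record a matching event at time `now` and return the health state."""
        self._events.append(now)
        # Forget events that fell out of the counting window.
        while self._events and now - self._events[0] > self.interval_seconds:
            self._events.popleft()
        if len(self._events) >= self.repeated_event_count:
            self.state = "Warning"
        return self.state

    def on_tick(self, now):
        """Periodic check: reset to Success after a quiet period without events."""
        if self._events and now - self._events[-1] >= self.no_event_interval_seconds:
            self._events.clear()
            self.state = "Success"
        return self.state


monitor = RepeatedEventsMonitor()
for timestamp in (0, 900, 1800):  # three failed polls, 900 seconds apart
    state = monitor.on_event(timestamp)
print(state)                           # state after the third event
print(monitor.on_tick(1800 + 3000))    # state after 3000 quiet seconds
```

In this model the third event inside the window moves the state to Warning, and a later tick with no new events for the quiet period moves it back to Success.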

Causes

The iRMC is no longer answering any requests over the network.

The iRMC Web Server is no longer reliably answering HTTPS requests over the network.

Resolutions

Check if the iRMC can be reached over the network with ping. If not, please contact your Network Administrator.

Check if the iRMC Web Interface can be reached over HTTP or HTTPS. If the iRMC responds to pings but not to HTTP or HTTPS requests, this typically indicates an internal resource problem of the iRMC.

Check if the iRMC can be reached over the network with an IPMI based tool.

If the problem persists, reboot the iRMC (and not the Out-Of-Band Server!) with the help of an IPMI tool such as ipmiview32/ipmiview64 from Fujitsu or any Open Source tool for IPMI such as ipmiutil (see http://ipmiutil.sourceforge.net/), FreeIPMI (see http://www.gnu.org/software/freeipmi/) or ipmitool (see http://sourceforge.net/projects/ipmitool/).

If you do not have access to an IPMI tool, or the iRMC does not answer IPMI requests, you need to A/C fail the server: unplug all power cables and wait at least 60 seconds before connecting the server to your power source again.
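The resolution steps above form a small decision sequence, which can be sketched as follows. This is an illustrative helper, not part of the management pack: the function names are hypothetical, the three boolean inputs are assumed to come from your own ping, web and IPMI checks, and port 443 is an assumption, since the iRMC may be configured to use a different HTTPS port.

```python
import socket


def tcp_port_open(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    This only shows that something is listening; it does not validate the
    HTTPS response. Port 443 is an assumption, not taken from the article.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def suggest_next_step(ping_ok, web_ok, ipmi_ok):
    """Map three reachability results to (likely_cause, next_step)."""
    if not ping_ok:
        return ("network problem", "contact your network administrator")
    cause = "iRMC reachable" if web_ok else "internal iRMC resource problem"
    if ipmi_ok:
        return (cause, "if the problem persists, reboot the iRMC "
                       "(not the server) with an IPMI tool")
    return (cause, "A/C fail the server: unplug all power cables, "
                   "wait at least 60 seconds, then reconnect")


# Example: the iRMC answers pings and IPMI, but not HTTPS requests.
print(suggest_next_step(ping_ok=True, web_ok=False, ipmi_ok=True))
```

The helper only encodes the order of the checks; the actual ping and IPMI probes are left to the tools named in the steps above.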

Element properties:

Target: Fujitsu.Servers.PRIMERGY.OutOfBand.CommunicationMonitor
Parent Monitor: System.Health.AvailabilityState
Category: AvailabilityHealth
Enabled: True
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Fujitsu.Servers.PRIMERGY.OutOfBand.SelfResolvingRepeatedEventsMonitorType
Remotable: True
Accessibility: Public
Alert Message:
Fujitsu Out-Of-Band {0}: Repeated 'Temperature' Performance Monitor Communication Problem

There have been at least {1} communication problems retrieving the 'Temperature Performance Monitor' information from the Fujitsu Out-Of-Band Server {0} within the following time window:
Time Window Start: {2}
Time Window End: {3}
Time First Event: {4}
Time Last Event: {5}
RunAs: Default
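The alert message uses positional placeholders: {0} is filled from AlertParameter1, {1} from AlertParameter2, and so on up to {5} from AlertParameter6. The following sketch mimics that substitution with Python's `str.format`. All sample values are hypothetical and chosen only for illustration; the real values come from the monitored object and the event context at run time.

```python
# Body of the alert message as shown above; {0}..{5} are positional placeholders.
ALERT_BODY = (
    "There have been at least {1} communication problems retrieving the "
    "'Temperature Performance Monitor' information from the Fujitsu "
    "Out-Of-Band Server {0} within the following time window:\n"
    "Time Window Start: {2}\n"
    "Time Window End: {3}\n"
    "Time First Event: {4}\n"
    "Time Last Event: {5}"
)

# Hypothetical sample values, in AlertParameter1..AlertParameter6 order.
alert_parameters = [
    "irmc-example-01",       # AlertParameter1: NetworkName of the server
    3,                       # AlertParameter2: Count
    "2024-01-01T10:00:00Z",  # AlertParameter3: TimeWindowStart
    "2024-01-01T12:00:00Z",  # AlertParameter4: TimeWindowEnd
    "2024-01-01T10:05:00Z",  # AlertParameter5: TimeFirst
    "2024-01-01T10:35:00Z",  # AlertParameter6: TimeLast
]

message = ALERT_BODY.format(*alert_parameters)
print(message)
```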

Source Code:

<UnitMonitor ID="Fujitsu.Servers.PRIMERGY.OutOfBand.PerfMon.Temperature.RepeatedCommunicationProblem" Accessibility="Public" Enabled="true" Remotable="true" Priority="Normal" ConfirmDelivery="true" Target="FujitsuOutOfBand!Fujitsu.Servers.PRIMERGY.OutOfBand.CommunicationMonitor" ParentMonitorID="Health!System.Health.AvailabilityState" TypeID="FujitsuOutOfBand!Fujitsu.Servers.PRIMERGY.OutOfBand.SelfResolvingRepeatedEventsMonitorType">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="Fujitsu.Servers.PRIMERGY.OutOfBand.PerfMon.Temperature.RepeatedCommunicationProblem_AlertMessageResourceID">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Property[Type="FTSLIB!Fujitsu.ServerView.Server"]/NetworkName$</AlertParameter1>
      <AlertParameter2>$Data/Context/Count$</AlertParameter2>
      <AlertParameter3>$Data/Context/TimeWindowStart$</AlertParameter3>
      <AlertParameter4>$Data/Context/TimeWindowEnd$</AlertParameter4>
      <AlertParameter5>$Data/Context/TimeFirst$</AlertParameter5>
      <AlertParameter6>$Data/Context/TimeLast$</AlertParameter6>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="RepeatedEventRaised" MonitorTypeStateID="RepeatedEventRaised" HealthState="Warning"/>
    <OperationalState ID="RepeatedEventReset" MonitorTypeStateID="RepeatedEventReset" HealthState="Success"/>
  </OperationalStates>
  <Configuration>
    <ComputerName>.</ComputerName>
    <LogName>Operations Manager</LogName>
    <FilterExpression>
      <And>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="UnsignedInteger">EventDisplayNumber</XPathQuery>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <!-- $ERROR_NO_TEMP_INFORMATION -->
              <Value Type="UnsignedInteger">8233</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <!-- IP Address ... -->
              <XPathQuery Type="String">Params/Param[1]</XPathQuery>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value Type="String">$Target/Host/Property[Type="FTSLIB!Fujitsu.ServerView.Server"]/NetworkName$</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
      </And>
    </FilterExpression>
    <ConsolidationEventDisplayNumber>EventDisplayNumber</ConsolidationEventDisplayNumber>
    <ConsolidationPublisherName>PublisherName</ConsolidationPublisherName>
    <!-- Default monitoring is every 900 Seconds, report 3 consecutive failed attempts within 2 hours -->
    <RepeatedEventCount>3</RepeatedEventCount>
    <IntervalSeconds>7200</IntervalSeconds>
    <!-- slightly larger than 3 times default monitoring interval -->
    <NoEventIntervalSeconds>3000</NoEventIntervalSeconds>
  </Configuration>
</UnitMonitor>
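The FilterExpression in the definition above combines two equality checks with And: the event number must be 8233, and the first event parameter must equal the NetworkName of the target server. A minimal sketch of that logic, assuming a hypothetical dictionary representation of an event (the real evaluation is done by the Operations Manager event modules, not by this code):

```python
ERROR_NO_TEMP_INFORMATION = 8233  # event number named in the XML comment


def matches_filter(event, network_name):
    """Return True if `event` passes both <SimpleExpression> checks."""
    params = event.get("Params", [])
    number_matches = int(event.get("EventDisplayNumber", -1)) == ERROR_NO_TEMP_INFORMATION
    # XPath Params/Param[1] is 1-based, so it corresponds to index 0 here.
    target_matches = len(params) >= 1 and params[0] == network_name
    return number_matches and target_matches


# Hypothetical event and server name, for illustration only.
sample_event = {"EventDisplayNumber": 8233, "Params": ["irmc-example-01"]}
print(matches_filter(sample_event, "irmc-example-01"))

# Timing values from the XML comments and Configuration element.
DEFAULT_MONITORING_SECONDS = 900
NO_EVENT_INTERVAL_SECONDS = 3000
# The reset interval is slightly larger than three polling cycles (2700 s).
print(3 * DEFAULT_MONITORING_SECONDS < NO_EVENT_INTERVAL_SECONDS)
```

Because both conditions must hold, events with number 8233 that were logged for a different server do not count towards this monitor instance.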