Regular health checkup monitor for IBM hardware management software failures
This monitor periodically checks the health of the system management software.
You can disable this monitor through the Operations Manager's Operations Console. See the "Disable monitors" topic in the Operations Manager's Operations User's Guide for more information.
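If you keep overrides as management pack XML rather than setting them in the console, disabling this monitor might look like the sketch below. The override ID and the `IBMHW` reference alias are hypothetical names chosen for illustration. The monitor and target IDs come from this monitor's definition. The fragment is assumed to sit in an unsealed override management pack that references the IBM Hardware Management Pack.

```xml
<!-- Sketch only: the override ID and the IBMHW alias are assumed names. -->
<Overrides>
  <MonitorPropertyOverride ID="Custom.Override.IBM.WinSw.HwMgmt.Failed.Disable"
                           Context="IBMHW!IBM.WinSw.HwMgmt"
                           Enforced="false"
                           Monitor="IBMHW!IBM.WinSw.HwMgmt.Failed"
                           Property="Enabled">
    <!-- false turns the monitor off for every instance of the target class -->
    <Value>false</Value>
  </MonitorPropertyOverride>
</Overrides>
```

Scoping the override to the whole target class, as here, disables the health check everywhere. A narrower context (a group or a single instance) is usually the safer choice.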
You can also change the interval between health checks by overriding the monitor's "IntervalSeconds" parameter, which is set to 7200 seconds (two hours) in the monitor's configuration. See the "Override" topic in the Operations Manager's Operations User's Guide.
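As a rough illustration of such an override in management pack XML, the fragment below sets the interval to one hour. The override ID, the `IBMHW` reference alias, and the value 3600 are assumptions made for this example. Only the monitor ID, the target ID, and the parameter name come from this monitor's definition.

```xml
<!-- Sketch only: the override ID, the IBMHW alias, and the value are assumed. -->
<Overrides>
  <MonitorConfigurationOverride ID="Custom.Override.IBM.WinSw.HwMgmt.Failed.Interval"
                                Context="IBMHW!IBM.WinSw.HwMgmt"
                                Enforced="false"
                                Monitor="IBMHW!IBM.WinSw.HwMgmt.Failed"
                                Parameter="IntervalSeconds">
    <!-- Interval between health checks, in seconds -->
    <Value>3600</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

A shorter interval surfaces failures sooner but runs the check more often on every monitored system, so it is worth weighing against the number of targets.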
The hardware event for this monitor is available only on an IBM system that has the appropriate hardware sensors and a management controller (also called a Service Processor), such as an Integrated Management Module (IMM), a Baseboard Management Controller (BMC), a Remote Supervisor Adapter (RSA), or an equivalent management controller on an older IBM system.
This monitor depends on hardware instrumentation software, namely the IBM Director Platform Agent (also called Core Services) and the Intelligent Platform Management Interface (IPMI) driver stack. This software raises the hardware event to the WMI level, so that the monitor can be notified. On certain configurations, the RSA daemon can be used in place of, or in parallel with, the IPMI driver stack. See the "Additional Information" section below for more information about IBM Director Platform Agent, the IPMI driver stack and the RSA daemon.
When the system management software fails on a target system, a hardware event is generated. The health state of this monitor is then set to the Critical or Warning state.
For a particular incident, review the history in the State Changes tab. Consult the relevant hardware knowledge articles listed below, keeping in mind the relevant event data.
The relevant IBM hardware knowledge articles are available on a system with the IBM Hardware Management Pack package installed.
Director Core Services failed or is not started
The OSA/Avocent IPMI driver failed or is not started
The Microsoft IPMI stack failed or is not started
The RSA-II Daemon failed or is not started
The ServeRAID Manager extension failed or is not started
The ServeRAID-MR Provider failed or is not started
Review the details about the system management software in the health checkup report. Contact IBM support (see links below) if the report or the relevant articles do not provide enough information to help you resolve the hardware problem.
After the hardware problem is resolved, the overall health state of this monitor is automatically restored to the Healthy state. However, you must manually close any corresponding alerts that might have occurred.
IBM Director Platform Agent
IMM, BMC and IPMI driver stack
IBM RSA-II and the RSA-II daemon
Links to IBM resources
Target | IBM.WinSw.HwMgmt
Parent Monitor | System.Health.AvailabilityState
Category | Custom
Enabled | True
Alert Generate | True
Alert Severity | MatchMonitorHealth
Alert Priority | Normal
Alert Auto Resolve | True
Monitor Type | IBM.WinSw.HwMgmt.Failed.MonitorType
Remotable | True
Accessibility | Public
Alert Message |
RunAs | Default
<UnitMonitor ID="IBM.WinSw.HwMgmt.Failed" Accessibility="Public" Enabled="true" Target="IBM.WinSw.HwMgmt" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="IBM.WinSw.HwMgmt.Failed.MonitorType" ConfirmDelivery="false">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="IBM.WinSw.HwMgmt.Failed.AlertMessageResourceID">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
    <OperationalState ID="Critical" MonitorTypeStateID="Error" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>7200</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
  </Configuration>
</UnitMonitor>