Component health check.
This monitor regularly performs health checkups on all components and reports critical and warning health problems from the SCVMM PRO perspective. More details on this event are available through the Lenovo Hardware Management Pack in SCOM. NOTE: If you dismiss this PRO Tip, you must manually clear the monitor state of the affected machine in the Lenovo HW PRO MP in SCOM. If you implement this PRO Tip, the machine that generated this event is placed into Maintenance Mode in SCVMM and any VMs on it are migrated. You must manually remove it from Maintenance Mode once the problem is resolved.
You can enable or disable this monitor or configure it to run with a different monitoring interval by changing the "override-controlled" parameters of this monitor. See the Operations Manager documentation about "Override" for more information.
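As a rough illustration of the override mechanism described above, the fragment below sketches what an override management pack might contain. It is an assumption-laden sketch, not the vendor's shipped configuration: the `HWPRO` reference alias, both override IDs, and the example values are hypothetical, and it assumes the monitor type exposes an overridable parameter named `IntervalSeconds` (matching the configuration element of the same name). Overrides are normally created through the Operations Manager console, which generates XML of this general shape.

```xml
<!-- Sketch only: goes inside <Monitoring> of an unsealed override management pack. -->
<!-- "HWPRO" is a hypothetical alias for a reference to the Lenovo HW PRO MP. -->
<Overrides>
  <!-- Enable or disable the monitor for the target class. -->
  <MonitorPropertyOverride ID="Example.Override.RegularCheckup.Enabled"
                           Context="HWPRO!IBM.HWPRO.VMHost.DirAgent.5.x"
                           Enforced="false"
                           Monitor="HWPRO!IBM.HWPRO.ComponentHealth.RegularCheckup"
                           Property="Enabled">
    <Value>true</Value>
  </MonitorPropertyOverride>
  <!-- Change the monitoring interval (assumed parameter name; value in seconds). -->
  <MonitorConfigurationOverride ID="Example.Override.RegularCheckup.Interval"
                                Context="HWPRO!IBM.HWPRO.VMHost.DirAgent.5.x"
                                Enforced="false"
                                Monitor="HWPRO!IBM.HWPRO.ComponentHealth.RegularCheckup"
                                Parameter="IntervalSeconds">
    <Value>3600</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

Here the first override switches the monitor on and the second shortens the interval to one hour. Check the parameter names actually offered in the console's override dialog before relying on this fragment.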
When one or more hardware components have health problems, an alert is raised and the health state is set to indicate an error, according to the severity of the impact on the virtual machines on the host. Note that the health checkup report shown in "State Change Events" is based on the severity level as seen from the hardware component's perspective. This monitor applies further filtering and reports the severity from the SCVMM PRO perspective. For example, a critical error on a system cooling fan is not considered critical from the SCVMM PRO perspective and is therefore reported as a warning.
A component health problem can have a variety of causes. Exact details of the problem are available in the "State Change Events" tab in Operations Manager's Health Explorer.
State View: SCVMM-Managed Hosts on Lenovo Servers
For critical PRO errors, migrate all virtual machines off this host immediately and put the host into maintenance mode until all problems are resolved. For PRO warnings, this can be done somewhat later, but it should still be handled at the earliest possible moment. Once all errors are cleared, the alert is closed and the health state is reset automatically.
Target | IBM.HWPRO.VMHost.DirAgent.5.x
Parent Monitor | IBM.HWPRO.IBMPRORollupMonitor
Category | Custom
Enabled | False
Alert Generate | True
Alert Severity | MatchMonitorHealth
Alert Priority | Normal
Alert Auto Resolve | True
Monitor Type | IBM.HWPRO.ComponentHealth.RegularCheckup.MonitorType
Remotable | True
Accessibility | Internal
Alert Message |
RunAs | Default
<UnitMonitor Accessibility="Internal" ConfirmDelivery="false" Enabled="true" ID="IBM.HWPRO.ComponentHealth.RegularCheckup" ParentMonitorID="IBM.HWPRO.IBMPRORollupMonitor" Priority="Normal" Remotable="true" Target="IBM.HWPRO.VMHost.DirAgent.5.x" TypeID="IBM.HWPRO.ComponentHealth.RegularCheckup.MonitorType">
  <!-- This is an auto-reset monitor, as defined by RegularCheckup.MonitorType. -->
  <Category>Custom</Category>
  <AlertSettings AlertMessage="IBM.HWPRO.ComponentHealth.RegularCheckup.AlertMessageResourceID">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Property[@Name="[List of Critical Errors]"]$</AlertParameter1>
      <AlertParameter2>$Data/Context/Property[@Name="[List of Warnings]"]$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState HealthState="Error" ID="Error" MonitorTypeStateID="State.Error"/>
    <OperationalState HealthState="Warning" ID="Warning" MonitorTypeStateID="State.Warning"/>
    <OperationalState HealthState="Success" ID="Success" MonitorTypeStateID="State.Healthy"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>7200</IntervalSeconds>
    <!-- default to be 7200 secs = 2 hrs -->
    <TimeoutSeconds>300</TimeoutSeconds>
  </Configuration>
</UnitMonitor>