Monitoring Host Handle Count Threshold

Microsoft.SystemCenter.Agent.MonitoringHost.HandleCountThreshold (UnitMonitor)

This monitor checks if the average Process\Handle Count counter for the MonitoringHost.exe process exceeds the configured threshold. When this threshold is reached, a Recovery is automatically triggered to restart the Health Service.

Knowledge Base article:

Summary

This monitor measures the Process\Handle Count counter for all instances of the MonitoringHost.exe process. If the handle count exceeds the configured threshold, a response attempts to restart the Health Service so that it doesn’t continue to overwhelm the computer.

The threshold differs depending on the role that the Health Service is configured to perform. The following table summarizes the default thresholds:

Health Service Role    Handle Count Threshold
Agent                  2,000
Management Server      10,000

Below is the configuration for the response that attempts to restart the Health Service:

Health Service Role    Restart Response Behavior
Agent                  Enabled
Management Server      No restart response

Causes

A brief summary of potential causes:

Too many rules and monitors are loaded from all the management packs this Health Service has been configured with.

A misconfigured rule or monitor is collecting or processing too much data (e.g., a performance counter collection rule that collects data every second).

A high handle count can be caused by the Health Service running many management packs. Each management pack may contain a lot of monitoring, each piece of which uses a small amount of resources. When many management packs add up to many thousands of rules and monitors, each MonitoringHost.exe instance may start consuming more resources.

This may be expected for this Health Service depending on the type of monitoring the Health Service is performing.

Another cause could be one or more rules or monitors that do not conform to best practices. An example is a performance counter rule that attempts to collect performance data every second. Too many rules or monitors configured this way will cause the MonitoringHost.exe process to consume more resources.

Resolutions

The default action for this monitor on agents is to restart the Health Service. Because this recovery is enabled by default on agents, no user action is required.

Note that the Health Service may not restart correctly if the action account that this agent has been configured with does not have the right permissions to restart the service.

If this is the case, start the HealthService Windows service manually.

If you consider the resource utilization appropriate for the amount of monitoring being performed by this agent, you can override the threshold or disable the monitor.

Element properties:

Target                 Microsoft.SystemCenter.Agent
Parent Monitor         Microsoft.SystemCenter.HealthService.ServiceStateRollup
Category               StateCollection
Enabled                True
Alert Generate         False
Alert Auto Resolve     False
Monitor Type           Microsoft.SystemCenter.Agent.Performance.AveragerThresholdWithSingleSampleSuccessState
Remotable              False
Accessibility          Internal
RunAs                  Default

Source Code:

<UnitMonitor ID="Microsoft.SystemCenter.Agent.MonitoringHost.HandleCountThreshold" Accessibility="Internal" Enabled="true" Target="SCLibrary!Microsoft.SystemCenter.Agent" ParentMonitorID="Microsoft.SystemCenter.HealthService.ServiceStateRollup" Remotable="false" Priority="High" TypeID="Microsoft.SystemCenter.Agent.Performance.AveragerThresholdWithSingleSampleSuccessState" ConfirmDelivery="false">
  <Category>StateCollection</Category>
  <OperationalStates>
    <OperationalState ID="HandleCountUnderThreshold" MonitorTypeStateID="UnderThreshold" HealthState="Success"/>
    <OperationalState ID="HandleCountOverThreshold" MonitorTypeStateID="OverThreshold" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <ObjectName>Process</ObjectName>
    <CounterName>Handle Count</CounterName>
    <InstanceName>MonitoringHost*</InstanceName>
    <Frequency>120</Frequency>
    <NumSamples>5</NumSamples>
    <Threshold>2000</Threshold>
  </Configuration>
</UnitMonitor>
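
Illustrative sketch (not part of the management pack):

To make the Frequency, NumSamples, and Threshold values above more concrete, the following Python sketch models one plausible reading of an averaged threshold check: the most recent NumSamples handle counts are averaged and the average is compared with the threshold. The function name evaluate and the rule of waiting for a full window are assumptions made for illustration. This is not the actual implementation of the monitor type, and the special handling that the "SingleSampleSuccessState" part of the monitor type name suggests is not modeled.

```python
from collections import deque

# Values copied from the <Configuration> element above.
FREQUENCY_SECONDS = 120   # <Frequency>: one sample every 120 seconds
NUM_SAMPLES = 5           # <NumSamples>: samples averaged per evaluation
THRESHOLD = 2000          # <Threshold>: default for the Agent role

# Time needed to collect one full window of samples.
WINDOW_SECONDS = FREQUENCY_SECONDS * NUM_SAMPLES


def evaluate(samples, threshold=THRESHOLD, num_samples=NUM_SAMPLES):
    """Return the state after processing a sequence of handle counts.

    Hypothetical helper: keeps the last num_samples values, and once the
    window is full, reports "OverThreshold" when their average exceeds
    the threshold and "UnderThreshold" otherwise.
    """
    window = deque(maxlen=num_samples)
    state = "UnderThreshold"  # assumed starting state
    for value in samples:
        window.append(value)
        # Assumption: no evaluation happens until the window is full.
        if len(window) == num_samples:
            average = sum(window) / num_samples
            state = "OverThreshold" if average > threshold else "UnderThreshold"
    return state


if __name__ == "__main__":
    # Average of these five samples is 2100, which exceeds 2000.
    print(evaluate([1500, 1800, 2100, 2400, 2700]))
    print(f"Full window: {WINDOW_SECONDS} seconds")
```

Under these assumptions, a single spike in handle count does not change the state on its own; with the default configuration, the average has to stay above the threshold across a window of about ten minutes of samples.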