Agent processor utilization

Microsoft.SystemCenter.HealthService.SCOMpercentageCPUTimeMonitor (UnitMonitor)

Monitors all agent processes to identify cases where the agent is using too much processor time.

Knowledge Base article:

Summary

This monitor calculates the total CPU utilization of the Operations Manager agent and its related processes, and generates an alert when CPU utilization exceeds a specified threshold for a specified number of consecutive samples.

The monitor’s underlying script locates and samples the CPU utilization of the Operations Manager agent process (HealthService.exe), its child monitoring host processes (MonitoringHost.exe), and the child processes of those monitoring host processes (cscript.exe, PowerShell.exe, and so on). The script runs the calculation three times and outputs the average of the three consecutive samples. The monitor uses that average to determine whether the state is critical or healthy.

Configuration

You can use overrides to customize the following parameters to alter the default behavior of this monitor:

This monitor is disabled by default for all management servers.
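
As an illustration of what such an override might look like in an unsealed management pack, the fragment below raises the CPU threshold. It is a sketch only. The override ID and the "AgentMP" alias are hypothetical, the value 40 is arbitrary, and it assumes that Threshold is one of the parameters the monitor type exposes for overrides. The SCLibrary alias would also need to be declared in the references of the management pack that holds the override.

```xml
<!-- Illustrative sketch only. The override ID and the "AgentMP" alias are hypothetical,
     and "Threshold" is assumed to be exposed as an overridable parameter. -->
<Overrides>
  <MonitorConfigurationOverride
      ID="Example.AgentCpuMonitor.Threshold.Override"
      Context="SCLibrary!Microsoft.SystemCenter.HealthService"
      Enforced="false"
      Monitor="AgentMP!Microsoft.SystemCenter.HealthService.SCOMpercentageCPUTimeMonitor"
      Parameter="Threshold">
    <!-- Example value; the default in the monitor configuration is 25. -->
    <Value>40</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

In practice the same result is usually achieved through the Overrides dialog in the Operations console, which writes an equivalent element for you.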

Causes

Excessive CPU utilization by the Operations Manager agent processes may indicate that the agent or one of its underlying dependencies is not operating properly. If the agent and its underlying dependencies are up to date, then the agent is being over-utilized on the monitored system. This may be short-lived, for example after a recent change in the management group such as the deployment of a new management pack, or the agent may genuinely be under excessive load, in which case tuning may be required.

Resolutions

To ensure that the agent and its underlying dependencies are operating properly, check the following:

If the condition persists after those configurations are verified, then deeper investigation is required to understand what is driving CPU utilization. Investigate further using any combination of the following steps:

When the cause or causes are identified, any one of the following steps may be taken to address the issue:

If none of the steps above resolves the issue, contact Microsoft Customer Service and Support (http://support.microsoft.com/).

Additional

This monitor has a related diagnostic task, “Collect agent processor utilization diagnostic”, which reruns the sampling of CPU utilization. The diagnostic task is disabled by default.

There is also a task in the Operations console, “Get the agent processor utilization”, which reruns the sampling of CPU utilization. When you run this task, you can set the time-out and number-of-samples parameters. The task returns a table of results.

Element properties:

Target: Microsoft.SystemCenter.HealthService
Parent Monitor: Microsoft.SystemCenter.HealthService.PerformanceHealthRollup
Category: Custom
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.SystemCenter.HealthService.SCOMpercentageCPUTimeCounterMonitorType
Remotable: False
Accessibility: Public
Alert Message:
  Name: The Operations Manager agent processes are using too much processor time
  Description: The total processor utilization on computer {0} of all agent processes has exceeded the threshold of {1} over multiple samples.
RunAs: Default
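
The {0} and {1} placeholders in the alert message are filled from AlertParameter1 and AlertParameter2 in the monitor's alert settings. As a rough sketch of how this text is typically wired up in a management pack, the fragment below declares the string resource named by the AlertMessage attribute and binds the alert name and description to it. The surrounding element layout and the "ENU" language pack ID are assumptions, not taken from the actual management pack.

```xml
<!-- Sketch only: element placement and the ENU language pack are assumed. -->
<Presentation>
  <StringResources>
    <!-- Referenced by the AlertMessage attribute of the monitor's AlertSettings. -->
    <StringResource ID="Microsoft.SystemCenter.HealthService.SCOMpercentageCPUTimeMonitor.AlertMessage"/>
  </StringResources>
</Presentation>
<LanguagePacks>
  <LanguagePack ID="ENU" IsDefault="true">
    <DisplayStrings>
      <DisplayString ElementID="Microsoft.SystemCenter.HealthService.SCOMpercentageCPUTimeMonitor.AlertMessage">
        <Name>The Operations Manager agent processes are using too much processor time</Name>
        <!-- {0} is replaced by AlertParameter1, {1} by AlertParameter2. -->
        <Description>The total processor utilization on computer {0} of all agent processes has exceeded the threshold of {1} over multiple samples.</Description>
      </DisplayString>
    </DisplayStrings>
  </LanguagePack>
</LanguagePacks>
```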

Source Code:

<UnitMonitor ID="Microsoft.SystemCenter.HealthService.SCOMpercentageCPUTimeMonitor" Accessibility="Public" Enabled="onEssentialMonitoring" Target="SCLibrary!Microsoft.SystemCenter.HealthService" ParentMonitorID="Microsoft.SystemCenter.HealthService.PerformanceHealthRollup" Remotable="false" Priority="Normal" TypeID="Microsoft.SystemCenter.HealthService.SCOMpercentageCPUTimeCounterMonitorType" ConfirmDelivery="true">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="Microsoft.SystemCenter.HealthService.SCOMpercentageCPUTimeMonitor.AlertMessage">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</AlertParameter1>
      <AlertParameter2>$Data/Context/SampleValue$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="CPUTimeOverThreshold" MonitorTypeStateID="OverThreshold" HealthState="Error"/>
    <OperationalState ID="CPUTimeUnderThreshold" MonitorTypeStateID="UnderThreshold" HealthState="Success"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>321</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <SyncTime>00:00</SyncTime>
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</ComputerName>
    <Threshold>25</Threshold>
    <ConsecutiveSampleCountCritical>6</ConsecutiveSampleCountCritical>
    <ConsecutiveSampleCountHealthy>3</ConsecutiveSampleCountHealthy>
  </Configuration>
</UnitMonitor>