Performance Threshold: CPU usage > 95% for 10 minutes

Performance_Threshold__CPU_usage__95__for_10_minutes_1_Rule.AdvancedAlertCriteriaMonitor (UnitMonitor)

Knowledge Base article:

Management Pack
Summary

This warning alert is generated in MOM when CPU usage on a Data Protection Manager (DPM) server exceeds 95 percent for longer than 10 minutes. This level of CPU usage could indicate that a processing bottleneck is affecting performance of the DPM server.

CPU usage is measured by using the Processor/% Processor Time counter for the Microsoft® Windows® operating system. % Processor Time displays the average percentage of busy time observed during a 2-minute sampling interval. This alert is generated if the processor usage remains higher than 95 percent during five successive samples.

The repeat count for this alert is incremented for each successive 10-minute period during which CPU usage remains above 95 percent. When CPU usage again falls below 95 percent, the alert becomes inactive.
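
The sampling rule described above (2-minute samples, five successive samples above 95 percent, one repeat per further 10-minute period) can be sketched as follows. This is only an illustration, not the MOM implementation: the function name alert_timeline is hypothetical, and the choice to report a repeat count of 0 for the first full 10-minute period is an assumption.

```python
from typing import Iterable, List, Tuple

THRESHOLD_PERCENT = 95.0  # threshold stated in the article
SAMPLES_REQUIRED = 5      # five successive 2-minute samples = 10 minutes


def alert_timeline(samples: Iterable[float]) -> List[Tuple[bool, int]]:
    """Return (alert_active, repeat_count) after each 2-minute sample."""
    timeline = []
    run = 0  # consecutive samples strictly above the threshold
    for value in samples:
        run = run + 1 if value > THRESHOLD_PERCENT else 0
        active = run >= SAMPLES_REQUIRED
        # Assumption: the first full 10-minute period raises the alert with
        # repeat count 0; each further full period adds one repeat.
        repeats = run // SAMPLES_REQUIRED - 1 if active else 0
        timeline.append((active, repeats))
    return timeline
```

A single sample at or below 95 percent resets the run, which mirrors the alert becoming inactive when CPU usage falls back below the threshold.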

 
Causes

Possible causes for very high CPU usage on a DPM server include:

  • Multiple DPM jobs are running simultaneously. Synchronization with consistency check jobs are particularly CPU-intensive.
  • On-the-wire compression has been enabled on the DPM server. On-the-wire compression allows faster data throughput during replica creation and synchronization without negatively affecting network performance. However, it places a large processing load on both the file server and the DPM server.
  • A runaway process is exhausting system resources.
  • The DPM server does not have sufficient processing capacity to handle the DPM workload.
 
Resolutions

If the performance of the DPM server is unacceptably slow, and you determine that CPU usage is a contributing factor, perform the following steps to isolate and relieve the CPU bottleneck.

To get more information about CPU usage

  • On the DPM server, use Task Manager to determine which processes are consuming the most processing capacity. (On the Processes tab, review CPU usage.) If those processes do not return to their normal performance ranges, you may need to perform additional, application-specific diagnostics to identify the cause.

    In DPM, three processes are of interest:

    • DPM File Agent (MsDpmFsAgent.exe)
    • DPM service (MsDpm.exe)
    • DPM Administrator Console (an instance of Mmc.exe)
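
As a rough companion to the Task Manager check, the sketch below ranks a snapshot of per-process CPU figures and flags the three DPM-related image names listed above. The function name top_cpu_consumers and the snapshot values are hypothetical; how the snapshot is collected (Task Manager, a performance counter export, and so on) is left to the reader.

```python
from typing import Dict, List, Tuple

# Image names taken from the list above. Mmc.exe can also host other
# consoles, so a match only means "possibly DPM Administrator Console".
DPM_PROCESSES = {"msdpmfsagent.exe", "msdpm.exe", "mmc.exe"}


def top_cpu_consumers(
    snapshot: Dict[str, float], limit: int = 5
) -> List[Tuple[str, float, bool]]:
    """Return the busiest processes as (name, cpu_percent, is_dpm_related)."""
    ranked = sorted(snapshot.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, cpu, name.lower() in DPM_PROCESSES)
        for name, cpu in ranked[:limit]
    ]


# Hypothetical snapshot, for illustration only.
snapshot = {
    "MsDpm.exe": 61.0,
    "sqlservr.exe": 20.5,
    "Mmc.exe": 9.0,
    "explorer.exe": 1.5,
}
busiest = top_cpu_consumers(snapshot, limit=3)
```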

To resolve the immediate problem

  1. Close DPM Administrator Console.
  2. Consider canceling some non-critical DPM jobs and running them later, when there is less demand for the processor. Do this with care: canceling a protection job can compromise data protection. For more information, see "How to cancel a job" in DPM Help at http://go.microsoft.com/fwlink/?linkid=46350.
  3. Stop and restart the DPM service and the SQL Server service. In Administrative Tools, open Services, and then restart the services by performing the following steps:
    1. Stop the DPM service if it is running. (The service stops automatically when not in use.)
    2. Stop the SQL Server service (MSSQL$Microsoft$DPM$).
    3. Start the SQL Server service (MSSQL$Microsoft$DPM$).
    4. The DPM service will start automatically the next time that it is needed.
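
The restart order in step 3 can be expressed as a small plan builder. This is a sketch under stated assumptions: restart_plan is a hypothetical helper, the DPM service name is supplied by the caller because the article does not give it, and the commands use the standard Windows net stop and net start syntax. The function only builds the command list and executes nothing.

```python
from typing import List

# SQL Server service name exactly as written in the steps above.
SQL_SERVICE = "MSSQL$Microsoft$DPM$"


def restart_plan(dpm_service: str, dpm_running: bool) -> List[List[str]]:
    """Build the ordered `net` commands for the restart; nothing is run here."""
    plan = []
    if dpm_running:
        # Stop the DPM service only if it is currently running.
        plan.append(["net", "stop", dpm_service])
    plan.append(["net", "stop", SQL_SERVICE])
    plan.append(["net", "start", SQL_SERVICE])
    # The DPM service is deliberately not started: per the steps above,
    # it starts automatically the next time it is needed.
    return plan
```

Each command in the returned list could then be passed to subprocess.run(command, check=True) from an elevated session, or simply typed in by hand.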

To solve long-term CPU bottlenecks related to DPM jobs

  1. Turn off on-the-wire compression for protection groups. This is an advanced protection option that can be configured for individual protection groups. For more information, see "How to modify protection options" in DPM Help at http://go.microsoft.com/fwlink/?linkid=46350. If you require on-the-wire compression to run protection jobs over a wide area network (WAN), you may need to upgrade your hardware to accommodate the extra CPU load.
  2. Modify protection schedules for protection groups in order to stagger protection jobs. Consider offsetting some synchronization jobs from the beginning of the hour. For more information, see "How to modify protection schedules" in DPM Help.
  3. If the CPU usage continues to be a factor in degraded performance, you might need to either upgrade your hardware or offload some of the protection workload to another DPM server.

    For information about processor requirements for a DPM server, see the "Planning a Deployment" chapter in the Microsoft® System Center Data Protection Manager 2006 Planning and Deployment Guide at http://go.microsoft.com/fwlink/?LinkId=46355.
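
To illustrate the staggering suggested in step 2, the sketch below spreads protection groups evenly across the hour. The function name stagger_offsets and the group names are hypothetical, and the result is only a set of candidate offsets to enter by hand in the protection schedule; it does not call any DPM interface.

```python
from typing import Dict, List


def stagger_offsets(groups: List[str], period_minutes: int = 60) -> Dict[str, int]:
    """Map each protection group to a start offset in minutes past the hour."""
    if not groups:
        return {}
    # Integer step between consecutive groups; with more groups than minutes
    # in the period the step becomes 0 and all groups share offset 0.
    step = period_minutes // len(groups)
    return {name: index * step for index, name in enumerate(groups)}


# Hypothetical protection group names, for illustration only.
offsets = stagger_offsets(["GroupA", "GroupB", "GroupC", "GroupD"])
```

With four groups this yields offsets of 0, 15, 30, and 45 minutes, so no two synchronization jobs start at the same time.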

 
External Knowledge Sources
  • For information about planning the hardware capacity of the DPM server, see the "Planning a Deployment" chapter in the DPM 2006 Planning and Deployment Guide at http://go.microsoft.com/fwlink/?LinkId=46355.
  • For information about monitoring and resolving processor bottlenecks, in addition to other factors in the performance of a computer running the Microsoft® Windows Server™ 2003 operating system, see "Solving performance problems" in Help and Support for Windows Server 2003.
  • For information about analyzing and correcting potential processor bottlenecks on a Windows-based server, see "Analyzing Processor Activity" in the "Performance Monitoring" chapter of the Server Operations Guide in the Microsoft® Windows® 2000 Server Resource Kit at http://go.microsoft.com/fwlink/?LinkId=46518.
 
© 2000-2004 Microsoft Corporation, all rights reserved.

Element properties:

Target: Microsoft.SystemCenter.DPM.DPM_Class
Parent Monitor: Performance
Category: StateCollection
Enabled: True
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: System.Mom.BackwardCompatibility.PerformanceThreshold.TwoStateMonitorType
Remotable: True
Accessibility: Internal
Alert Message:
  Performance Threshold: CPU usage > 95% for 10 minutes
  {1}
RunAs: Default
Comment: Mom2005ID='{942C8155-9E07-4B93-A7EC-68572B12A18F}'

Source Code:

<UnitMonitor ID="Performance_Threshold__CPU_usage__95__for_10_minutes_1_Rule.AdvancedAlertCriteriaMonitor" TypeID="MomBackwardCompatibility!System.Mom.BackwardCompatibility.PerformanceThreshold.TwoStateMonitorType" Accessibility="Internal" Target="Microsoft.SystemCenter.DPM.DPM_Class" Enabled="true" ParentMonitorID="Performance" Comment="Mom2005ID='{942C8155-9E07-4B93-A7EC-68572B12A18F}'">
  <Category>StateCollection</Category>
  <AlertSettings AlertMessage="Performance_Threshold__CPU_usage__95__for_10_minutes_1_Rule.AdvancedAlertCriteriaMonitor.StringResource">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Name$</AlertParameter1>
      <AlertParameter2>$Data/Context/Description$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState HealthState="Warning" MonitorTypeStateID="ConditionTrue" ID="AlertConditionTrue"/>
    <OperationalState HealthState="Success" MonitorTypeStateID="ConditionFalse" ID="AlertConditionFalse"/>
  </OperationalStates>
  <Configuration>
    <ServerRole>DPM</ServerRole>
    <Component>Performance</Component>
    <ServerRoleInstance>$Target/Property[Type="Microsoft.SystemCenter.DPM.DPM_Class"]/Name$</ServerRoleInstance>
    <RuleId>$MPElement[Name="Performance_Threshold__CPU_usage__95__for_10_minutes_1_Rule"]$</RuleId>
    <Threshold>95</Threshold>
    <Operator>Greater</Operator>
  </Configuration>
</UnitMonitor>
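
For readers who want to inspect the definition programmatically, the following sketch reads the threshold, operator, and state mapping from an abbreviated copy of the XML above using Python's standard xml.etree.ElementTree module. The function name summarize_monitor is hypothetical, and the embedded fragment omits attributes and elements that the sketch does not read.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the monitor definition above (unused parts omitted).
MONITOR_XML = """
<UnitMonitor ID="Performance_Threshold__CPU_usage__95__for_10_minutes_1_Rule.AdvancedAlertCriteriaMonitor" Target="Microsoft.SystemCenter.DPM.DPM_Class" Enabled="true">
  <OperationalStates>
    <OperationalState HealthState="Warning" MonitorTypeStateID="ConditionTrue" ID="AlertConditionTrue"/>
    <OperationalState HealthState="Success" MonitorTypeStateID="ConditionFalse" ID="AlertConditionFalse"/>
  </OperationalStates>
  <Configuration>
    <Threshold>95</Threshold>
    <Operator>Greater</Operator>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text: str) -> dict:
    """Extract the target, threshold, operator, and state mapping."""
    root = ET.fromstring(xml_text.strip())
    config = root.find("Configuration")
    return {
        "target": root.get("Target"),
        "threshold": float(config.findtext("Threshold")),
        "operator": config.findtext("Operator"),
        # Maps each monitor-type state to the health state it produces.
        "states": {
            state.get("MonitorTypeStateID"): state.get("HealthState")
            for state in root.iter("OperationalState")
        },
    }


summary = summarize_monitor(MONITOR_XML)
```

The summary shows that the ConditionTrue state (value greater than 95) maps to a Warning health state, which is the state that raises the alert.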