Total daily execution time of the replication agent

Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentsLoad (UnitMonitor)

This monitors the total daily execution time of the replication agents, Distribution, Log Reader, Merge, Queue Reader and Snapshot on the Distributor.

Knowledge Base article:

Summary

This alert calculates the total run time of each replication agent for the last 24 hours (default), and checks to see if any agent’s total run time exceeds the configured threshold.

The default thresholds produce a warning at 3 hours and a critical warning at 4 hours.

NOTE: Continuous jobs are not monitored.

Causes

When a large number of changes and commands are being replicated, performance bottlenecks can occur. Common causes of the bottlenecks are as follows:

Resolutions

Ensure indexes on the subscribing database are not heavily fragmented.

Additional

Further insights in troubleshooting Replication performance issues can be found in the following article: http://technet.microsoft.com/library/ms152522.aspx

Enable verbose logging to get more information about the specific replication agent’s actions/errors: http://support.microsoft.com/kb/312292

Overrideable Parameters

Name

Description

Default Value

Alert Priority

Defines Alert Priority.

Normal

Alert Severity

Defines Alert Severity.

Error

Enabled

Enables or disables the workflow.

Yes

Error Threshold

Error Threshold

4

Generates Alerts

Defines whether the workflow generates an Alert.

Yes

Interval (seconds)

The recurring interval of time in seconds in which to run the workflow.

300

Measurement Period (hours)

Period of time used for measurement (hours)

24

Synchronization Time

Synchronization Time

 

Timeout (seconds)

Specifies the time the workflow is allowed to run before being closed and marked as failed.

200

Timeout for database connection (seconds)

The workflow will fail and register an event, if it cannot access the database during the specified period.

15

Warning Threshold

Warning Threshold

3

Element properties:

TargetMicrosoft.SQLServer.Replication.Windows.Distributor
Parent MonitorSystem.Health.PerformanceState
CategoryPerformanceHealth
EnabledTrue
Alert GenerateTrue
Alert SeverityError
Alert PriorityNormal
Alert Auto ResolveTrue
Monitor TypeMicrosoft.SQLServer.Replication.Windows.MonitorType.DistributorAgentsLoad
RemotableTrue
AccessibilityPublic
Alert Message
MSSQL on Windows Replication: Replication agents exceeded the total daily execution time threshold.
Distributor (Name: '{0}', Server: '{1}') has experienced heavy utilization for {2} hours of the past 24 hours.
{3}
RunAsMicrosoft.SQLServer.Core.RunAs.Monitoring

Source Code:

<UnitMonitor ID="Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentsLoad" Accessibility="Public" Enabled="true" Target="SQLReplWD!Microsoft.SQLServer.Replication.Windows.Distributor" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Microsoft.SQLServer.Replication.Windows.MonitorType.DistributorAgentsLoad" ConfirmDelivery="false" RunAs="SqlCoreLib!Microsoft.SQLServer.Core.RunAs.Monitoring">
<Category>PerformanceHealth</Category>
<AlertSettings AlertMessage="Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentsLoad.AlertMessage">
<AlertOnState>Error</AlertOnState>
<AutoResolve>true</AutoResolve>
<AlertPriority>Normal</AlertPriority>
<AlertSeverity>Error</AlertSeverity>
<AlertParameters>
<AlertParameter1>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</AlertParameter1>
<AlertParameter2>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</AlertParameter2>
<AlertParameter3>$Data/Context/Property[@Name='LoadHours']$</AlertParameter3>
<AlertParameter4>$Data/Context/Property[@Name='Message']$</AlertParameter4>
</AlertParameters>
</AlertSettings>
<OperationalStates>
<OperationalState ID="Health" MonitorTypeStateID="Health" HealthState="Success"/>
<OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
<OperationalState ID="Error" MonitorTypeStateID="Error" HealthState="Error"/>
</OperationalStates>
<Configuration>
<MachineName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/MachineName$</MachineName>
<InstanceName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</InstanceName>
<SqlTimeoutSeconds>15</SqlTimeoutSeconds>
<ConnectionString>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</ConnectionString>
<MonitoringType>$Target/Host/Property[Type="SqlDiscW!Microsoft.SQLServer.Windows.DBEngine"]/MonitoringType$</MonitoringType>
<WarningThreshold>3</WarningThreshold>
<ErrorThreshold>4</ErrorThreshold>
<LastHours>24</LastHours>
<CategoryList>Distribution, LogReader, Merge, QueueReader, Snapshot</CategoryList>
<ExcludeCategoryList/>
<IntervalSeconds>300</IntervalSeconds>
<SyncTime/>
<TimeoutSeconds>200</TimeoutSeconds>
</Configuration>
</UnitMonitor>