Total daily execution time of the replication agent

Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentsLoad (UnitMonitor)

Monitors the total daily execution time of the replication agents (Distribution, Log Reader, Merge, Queue Reader, and Snapshot) on the Distributor.

Knowledge Base article:

Summary

This monitor calculates the total run time of each replication agent over the last 24 hours (the default measurement period) and checks whether any agent’s total run time exceeds the configured threshold.

With the default thresholds, the monitor enters the warning state at 3 hours and the critical (error) state at 4 hours.

NOTE: Continuous jobs are not monitored.
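
The check described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the management pack's actual implementation: the function name `classify_load`, the `(agent, seconds)` input shape, and the strict "greater than" comparison are hypothetical, chosen to match the wording "exceeds the configured threshold". Only the default values 3 and 4 come from this page.

```python
from collections import defaultdict

WARNING_HOURS = 3.0  # default Warning Threshold listed on this page
ERROR_HOURS = 4.0    # default Error Threshold listed on this page


def classify_load(runs, warning_hours=WARNING_HOURS, error_hours=ERROR_HOURS):
    """Return (state, worst_agent, worst_hours) for a list of (agent, seconds).

    Assumption: `runs` already contains only the non-continuous job runs
    that fall inside the measurement period (24 hours by default).
    """
    # Sum the run time per agent, converted to hours.
    totals = defaultdict(float)
    for agent, seconds in runs:
        totals[agent] += seconds / 3600.0

    if not totals:
        return ("Health", None, 0.0)

    # The state is driven by the agent with the largest total run time.
    worst_agent = max(totals, key=totals.get)
    worst_hours = totals[worst_agent]

    if worst_hours > error_hours:
        state = "Error"
    elif worst_hours > warning_hours:
        state = "Warning"
    else:
        state = "Health"
    return (state, worst_agent, worst_hours)


# Hypothetical sample: Distribution ran 2 h + 1.5 h, Snapshot ran 0.5 h.
sample_runs = [
    ("Distribution", 2 * 3600),
    ("Distribution", 1.5 * 3600),
    ("Snapshot", 0.5 * 3600),
]
print(classify_load(sample_runs))  # ('Warning', 'Distribution', 3.5)
```

In this sample the Distribution agent totals 3.5 hours, which is above the warning threshold but below the error threshold, so the sketch reports a warning.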

Causes

When a large number of changes and commands are being replicated, performance bottlenecks can occur. Common causes of the bottlenecks are as follows:

Poor query performance on the Subscriber due to outdated database statistics or fragmented indexes.
Resource contention on the Subscriber: write queries executed by the replication agent may experience contention with read queries from applications running against the subscription database.
Network latency.
Untraced errors at the agent level, which are only logged when verbose logging is enabled.

Resolutions

In the Job Activity Monitor of SQL Server Management Studio, or in Replication Monitor, check the progress messages of the replication agents.
Identify and investigate any slow-running queries on the Subscriber.
Ensure that the replication agents are not slowed down by other processes on the Subscriber. If they are, consider scheduling the competing activities at different times.
Ensure statistics are up to date by running sp_updatestats against the distribution and subscribing databases (http://msdn.microsoft.com/library/ms173804.aspx).

Ensure indexes on the subscribing database are not heavily fragmented.

Additional

For more guidance on troubleshooting replication performance issues, see the following article: http://technet.microsoft.com/library/ms152522.aspx

Enable verbose logging to get more information about a specific replication agent’s actions and errors: http://support.microsoft.com/kb/312292

Overrideable Parameters

Name	Description	Default Value
Alert Priority	Defines Alert Priority.	Normal
Alert Severity	Defines Alert Severity.	Error
Enabled	Enables or disables the workflow.	Yes
Error Threshold	Total agent run time (hours) above which the monitor enters the critical (error) state.	4
Generates Alerts	Defines whether the workflow generates an Alert.	Yes
Interval (seconds)	The recurring interval of time in seconds in which to run the workflow.	300
Measurement Period (hours)	Period of time used for measurement (hours)	24
Synchronization Time	Synchronization Time
Timeout (seconds)	Specifies the time the workflow is allowed to run before being closed and marked as failed.	200
Timeout for database connection (seconds)	The workflow will fail and register an event, if it cannot access the database during the specified period.	15
Warning Threshold	Total agent run time (hours) above which the monitor enters the warning state.	3

Element properties:

Target

Microsoft.SQLServer.Replication.Windows.Distributor

Parent Monitor

System.Health.PerformanceState

Category

PerformanceHealth

Enabled

True

Alert Generate

True

Alert Severity

Error

Alert Priority

Normal

Alert Auto Resolve

True

Monitor Type

Microsoft.SQLServer.Replication.Windows.MonitorType.DistributorAgentsLoad

Remotable

True

Accessibility

Public

Alert Message

MSSQL on Windows Replication: Replication agents exceeded the total daily execution time threshold.

Distributor (Name: '{0}', Server: '{1}') has experienced heavy utilization for {2} hours of the past 24 hours.
{3}
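
As a rough illustration of how the numbered placeholders in the alert description are filled, the sketch below uses Python's positional `str.format` as a stand-in for the placeholder substitution. The helper `render_alert` and all sample values are hypothetical; the mapping of {0} to InstanceName, {1} to ConnectionString, {2} to LoadHours, and {3} to Message follows the AlertParameter1 through AlertParameter4 elements in the source code on this page.

```python
# Alert description template copied from this page; {0}..{3} are positional.
ALERT_TEMPLATE = (
    "Distributor (Name: '{0}', Server: '{1}') has experienced heavy "
    "utilization for {2} hours of the past 24 hours.\n{3}"
)


def render_alert(instance_name, connection_string, load_hours, message):
    """Fill the template in AlertParameter1..AlertParameter4 order."""
    return ALERT_TEMPLATE.format(
        instance_name,      # {0}: InstanceName
        connection_string,  # {1}: ConnectionString
        load_hours,         # {2}: LoadHours
        message,            # {3}: Message
    )


# Hypothetical values, for illustration only.
text = render_alert("MSSQLSERVER", "SQL01", 4.5, "Distribution agent: 4.5 h")
print(text)
```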

RunAs

Microsoft.SQLServer.Core.RunAs.Monitoring

Source Code:

<UnitMonitor ID="Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentsLoad" Accessibility="Public" Enabled="true" Target="SQLReplWD!Microsoft.SQLServer.Replication.Windows.Distributor" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Microsoft.SQLServer.Replication.Windows.MonitorType.DistributorAgentsLoad" ConfirmDelivery="false" RunAs="SqlCoreLib!Microsoft.SQLServer.Core.RunAs.Monitoring">

  <Category>PerformanceHealth</Category>

  <AlertSettings AlertMessage="Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentsLoad.AlertMessage">

    <AlertOnState>Error</AlertOnState>

    <AutoResolve>true</AutoResolve>

    <AlertPriority>Normal</AlertPriority>

    <AlertSeverity>Error</AlertSeverity>

    <AlertParameters>

      <AlertParameter1>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</AlertParameter1>

      <AlertParameter2>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</AlertParameter2>

      <AlertParameter3>$Data/Context/Property[@Name='LoadHours']$</AlertParameter3>

      <AlertParameter4>$Data/Context/Property[@Name='Message']$</AlertParameter4>

    </AlertParameters>

  </AlertSettings>

  <OperationalStates>

    <OperationalState ID="Health" MonitorTypeStateID="Health" HealthState="Success"/>

    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>

    <OperationalState ID="Error" MonitorTypeStateID="Error" HealthState="Error"/>

  </OperationalStates>

  <Configuration>

    <MachineName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/MachineName$</MachineName>

    <InstanceName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</InstanceName>

    <SqlTimeoutSeconds>15</SqlTimeoutSeconds>

    <ConnectionString>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</ConnectionString>

    <MonitoringType>$Target/Host/Property[Type="SqlDiscW!Microsoft.SQLServer.Windows.DBEngine"]/MonitoringType$</MonitoringType>

    <WarningThreshold>3</WarningThreshold>

    <ErrorThreshold>4</ErrorThreshold>

    <LastHours>24</LastHours>

    <CategoryList>Distribution, LogReader, Merge, QueueReader, Snapshot</CategoryList>

    <ExcludeCategoryList/>

    <IntervalSeconds>300</IntervalSeconds>

    <SyncTime/>

    <TimeoutSeconds>200</TimeoutSeconds>

  </Configuration>

</UnitMonitor>