This monitors the total daily execution time of the replication agents, Distribution, Log Reader, Merge, Queue Reader and Snapshot on the Distributor.
This alert calculates the total run time of each replication agent for the last 24 hours (default), and checks to see if any agent’s total run time exceeds the configured threshold.
The default thresholds produce a warning at 3 hours and a critical warning at 4 hours.
NOTE: Continuous jobs are not monitored.
When a large number of changes and commands are being replicated, performance bottlenecks can occur. Common causes of the bottlenecks are as follows:
Poor query performance on the Subscriber due to outdated database statistics or fragmented indexes.
Resource contention on the Subscriber: write queries executed by the replication agent may experience contention with read queries from applications running against the subscription database.
Network latency
Untraced errors at the agent level, which are only logged when verbose logging is enabled.
From the Job Monitor of SQL Server Management Studio or Replication Monitor, check for progress messages of the replication agents.
Identify and investigate any slow running queries on the Subscriber.
Ensure the activities of the Replication agents are not slowed down by other processes on the Subscriber. If this happens, consider scheduling these competing activities at different times.
Ensure statistics are up to date by running sp_updatestats against the distribution and subscribing databases. ( http://msdn.microsoft.com/library/ms173804.aspx)
Ensure indexes on the subscribing database are not heavily fragmented.
Further insights in troubleshooting Replication performance issues can be found in the following article: http://technet.microsoft.com/library/ms152522.aspx
Enable verbose logging to get more information about the specific replication agent’s actions/errors: http://support.microsoft.com/kb/312292
Name | Description | Default Value |
Alert Priority | Defines Alert Priority. | Normal |
Alert Severity | Defines Alert Severity. | Error |
Enabled | Enables or disables the workflow. | Yes |
Error Threshold | Error Threshold | 4 |
Generates Alerts | Defines whether the workflow generates an Alert. | Yes |
Interval (seconds) | The recurring interval of time in seconds in which to run the workflow. | 300 |
Measurement Period (hours) | Period of time used for measurement (hours) | 24 |
Synchronization Time | Synchronization Time |
|
Timeout (seconds) | Specifies the time the workflow is allowed to run before being closed and marked as failed. | 200 |
Timeout for database connection (seconds) | The workflow will fail and register an event, if it cannot access the database during the specified period. | 15 |
Warning Threshold | Warning Threshold | 3 |
Target | Microsoft.SQLServer.Replication.Windows.Distributor | ||
Parent Monitor | System.Health.PerformanceState | ||
Category | PerformanceHealth | ||
Enabled | True | ||
Alert Generate | True | ||
Alert Severity | Error | ||
Alert Priority | Normal | ||
Alert Auto Resolve | True | ||
Monitor Type | Microsoft.SQLServer.Replication.Windows.MonitorType.DistributorAgentsLoad | ||
Remotable | True | ||
Accessibility | Public | ||
Alert Message |
| ||
RunAs | Microsoft.SQLServer.Core.RunAs.Monitoring |
<UnitMonitor ID="Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentsLoad" Accessibility="Public" Enabled="true" Target="SQLReplWD!Microsoft.SQLServer.Replication.Windows.Distributor" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Microsoft.SQLServer.Replication.Windows.MonitorType.DistributorAgentsLoad" ConfirmDelivery="false" RunAs="SqlCoreLib!Microsoft.SQLServer.Core.RunAs.Monitoring">
<Category>PerformanceHealth</Category>
<AlertSettings AlertMessage="Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentsLoad.AlertMessage">
<AlertOnState>Error</AlertOnState>
<AutoResolve>true</AutoResolve>
<AlertPriority>Normal</AlertPriority>
<AlertSeverity>Error</AlertSeverity>
<AlertParameters>
<AlertParameter1>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</AlertParameter1>
<AlertParameter2>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</AlertParameter2>
<AlertParameter3>$Data/Context/Property[@Name='LoadHours']$</AlertParameter3>
<AlertParameter4>$Data/Context/Property[@Name='Message']$</AlertParameter4>
</AlertParameters>
</AlertSettings>
<OperationalStates>
<OperationalState ID="Health" MonitorTypeStateID="Health" HealthState="Success"/>
<OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
<OperationalState ID="Error" MonitorTypeStateID="Error" HealthState="Error"/>
</OperationalStates>
<Configuration>
<MachineName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/MachineName$</MachineName>
<InstanceName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</InstanceName>
<SqlTimeoutSeconds>15</SqlTimeoutSeconds>
<ConnectionString>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</ConnectionString>
<MonitoringType>$Target/Host/Property[Type="SqlDiscW!Microsoft.SQLServer.Windows.DBEngine"]/MonitoringType$</MonitoringType>
<WarningThreshold>3</WarningThreshold>
<ErrorThreshold>4</ErrorThreshold>
<LastHours>24</LastHours>
<CategoryList>Distribution, LogReader, Merge, QueueReader, Snapshot</CategoryList>
<ExcludeCategoryList/>
<IntervalSeconds>300</IntervalSeconds>
<SyncTime/>
<TimeoutSeconds>200</TimeoutSeconds>
</Configuration>
</UnitMonitor>