Replication agents (Distribution and Merge) load on Subscriber
This Monitor calculates total run time of each replication agent (Distribution, Merge) on a Subscriber for the time period defined by the threshold. If any agent’s total run time exceeds the defined threshold, an alert is fired in the following format:
The Subscriber (Name: '{0}', Server: '{1}') has detected {2}% utilization.
SQL Server agent failures and replication agent jobs retries
High volume of commands and large transactions.
Slow agent job performance.
Slow network between servers participating to the Replication topology.
Open the retrying agent job history for any error message, then investigate and address the errors accordingly.
Run sp_who2 when the Job runs and check if it is blocked by some other session. If it is blocked, resolve the blocking or wait for the other session to complete.
Ensure that the network latency between Distributor and Subscriber (distribution agent) is not too high.
Identify and investigate any slow running queries.
Ensure statistics are up to date by running sp_updatestats against the subscribing databases. ( http://msdn.microsoft.com/library/ms173804.aspx)
Ensure indexes on the subscribing database are not heavily fragmented.
Name | Description | Default Value |
Alert Priority | Defines Alert Priority. | Normal |
Alert Severity | Defines Alert Severity. | Error |
Enabled | Enables or disables the workflow. | Yes |
Error Threshold | Error Threshold | 4 |
Generates Alerts | Defines whether the workflow generates an Alert. | Yes |
Interval (seconds) | The recurring interval of time in seconds in which to run the workflow. | 300 |
Measurement Period (hours) | Period of time used for measurement (hours) | 24 |
Synchronization Time | Synchronization Time |
|
Timeout (seconds) | Specifies the time the workflow is allowed to run before being closed and marked as failed. | 300 |
Timeout for database connection (seconds) | The workflow will fail and register an event, if it cannot access the database during the specified period. | 15 |
Warning Threshold | Warning Threshold | 3 |
Target | Microsoft.SQLServer.2012.Replication.Subscriber | ||
Parent Monitor | System.Health.PerformanceState | ||
Category | PerformanceHealth | ||
Enabled | True | ||
Alert Generate | True | ||
Alert Severity | Error | ||
Alert Priority | Normal | ||
Alert Auto Resolve | True | ||
Monitor Type | Microsoft.SQLServer.2012.Replication.MonitorType.DistributorAgentsLoad | ||
Remotable | True | ||
Accessibility | Public | ||
Alert Message |
| ||
RunAs | Microsoft.SQLServer.Replication.Monitoring.RunAs.Monitor |
<UnitMonitor ID="Microsoft.SQLServer.2012.Replication.Monitor.SubscriberAgentsLoad" Accessibility="Public" Enabled="true" Target="MS2RD!Microsoft.SQLServer.2012.Replication.Subscriber" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Microsoft.SQLServer.2012.Replication.MonitorType.DistributorAgentsLoad" ConfirmDelivery="false" RunAs="MSRL!Microsoft.SQLServer.Replication.Monitoring.RunAs.Monitor">
<Category>PerformanceHealth</Category>
<AlertSettings AlertMessage="Microsoft.SQLServer.2012.Replication.Monitor.SubscriberAgentsLoad.AlertMessage">
<AlertOnState>Error</AlertOnState>
<AutoResolve>true</AutoResolve>
<AlertPriority>Normal</AlertPriority>
<AlertSeverity>Error</AlertSeverity>
<AlertParameters>
<AlertParameter1>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericSubscriber']/InstanceName$</AlertParameter1>
<AlertParameter2>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericSubscriber']/ConnectionString$</AlertParameter2>
<AlertParameter3>$Data/Context/Property[@Name='LoadHours']$</AlertParameter3>
<AlertParameter4>$Data/Context/Property[@Name='Message']$</AlertParameter4>
</AlertParameters>
</AlertSettings>
<OperationalStates>
<OperationalState ID="Health" MonitorTypeStateID="Health" HealthState="Success"/>
<OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
<OperationalState ID="Error" MonitorTypeStateID="Error" HealthState="Error"/>
</OperationalStates>
<Configuration>
<SqlTimeout>15</SqlTimeout>
<ConnectionString>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericSubscriber']/ConnectionString$</ConnectionString>
<WarningThreshold>3</WarningThreshold>
<ErrorThreshold>4</ErrorThreshold>
<LastHours>24</LastHours>
<CategoryList>Distribution, Merge</CategoryList>
<ExcludeCategoryList/>
<IntervalSeconds>300</IntervalSeconds>
<SyncTime/>
<TimeoutSeconds>300</TimeoutSeconds>
</Configuration>
</UnitMonitor>