This monitor checks if the following Replication agent jobs are in healthy state: Distribution agent, Merge agent, Queue Reader agent, Log reader agent or Snapshot agent. If any of the agents are in failed state, the monitor will be triggered.
This monitor checks the health state of the jobs for the follow replication agents: Distribution Agent, Merge Agent, Queue Reader Agent, Log reader Agent and Snapshot Agent. If any of the agents’ jobs are in failed state, the monitor will be triggered.
The Replication agent jobs can fail due to many reasons:
SQL Server Agent failure.
Agent configuration issues, such as improper parameter values.
Network issue that prevent or slow down access to the Subscriber or Distributor.
Data integrity errors, such as “row not found on Subscriber”.
Query timeouts.
To resolve the issue, try the following:
Make sure the SQL Server Agent is running;
Check replication monitor, or look at the agent job history for any error message and investigate/fix accordingly.
Enable verbose logging and rerun the job to obtain detailed error information.
How to enable replication agents for logging to output files in SQL Server:
http://support.microsoft.com/kb/312292
Name | Description | Default Value |
Alert Priority | Defines Alert Priority. | Normal |
Alert Severity | Defines Alert Severity. | Warning |
Enabled | Enables or disables the workflow. | Yes |
Failed jobs count threshold | Failed jobs count threshold | 1 |
Generates Alerts | Defines whether the workflow generates an Alert. | Yes |
Interval (seconds) | The recurring interval of time in seconds in which to run the workflow. | 300 |
Per-Job threshold | Per-Job threshold | 1 |
Synchronization Time | Synchronization Time |
|
Timeout (seconds) | Specifies the time the workflow is allowed to run before being closed and marked as failed. | 300 |
Timeout for database connection (seconds) | The workflow will fail and register an event, if it cannot access the database during the specified period. | 15 |
Target | Microsoft.SQLServer.2008.Replication.Distributor | ||
Parent Monitor | System.Health.PerformanceState | ||
Category | PerformanceHealth | ||
Enabled | True | ||
Alert Generate | True | ||
Alert Severity | Warning | ||
Alert Priority | Normal | ||
Alert Auto Resolve | True | ||
Monitor Type | Microsoft.SQLServer.2008.Replication.MonitorType.DistributorFailJobs | ||
Remotable | True | ||
Accessibility | Public | ||
Alert Message |
| ||
RunAs | Microsoft.SQLServer.Replication.Monitoring.RunAs.Monitor |
<UnitMonitor ID="Microsoft.SQLServer.2008.Replication.Monitor.ReplicationAgentFailJobs" Accessibility="Public" Enabled="true" Target="MS2RD!Microsoft.SQLServer.2008.Replication.Distributor" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Microsoft.SQLServer.2008.Replication.MonitorType.DistributorFailJobs" ConfirmDelivery="false" RunAs="MSRL!Microsoft.SQLServer.Replication.Monitoring.RunAs.Monitor">
<Category>PerformanceHealth</Category>
<AlertSettings AlertMessage="Microsoft.SQLServer.2008.Replication.Monitor.ReplicationAgentFailJobs.AlertMessage">
<AlertOnState>Warning</AlertOnState>
<AutoResolve>true</AutoResolve>
<AlertPriority>Normal</AlertPriority>
<AlertSeverity>Warning</AlertSeverity>
<AlertParameters>
<AlertParameter1>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericDistributor']/InstanceName$</AlertParameter1>
<AlertParameter2>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericDistributor']/ConnectionString$</AlertParameter2>
<AlertParameter3>$Data/Context/Property[@Name='DistributorFailJobs']$</AlertParameter3>
<AlertParameter4>$Data/Context/Property[@Name='Message']$</AlertParameter4>
</AlertParameters>
</AlertSettings>
<OperationalStates>
<OperationalState ID="Health" MonitorTypeStateID="Health" HealthState="Success"/>
<OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
</OperationalStates>
<Configuration>
<SqlTimeout>15</SqlTimeout>
<ConnectionString>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericDistributor']/ConnectionString$</ConnectionString>
<ThresholdCountOfFailsForJob>1</ThresholdCountOfFailsForJob>
<ThresholdCountOfFailedJobs>1</ThresholdCountOfFailedJobs>
<CategoryList>Distribution</CategoryList>
<ExcludeCategoryList/>
<IntervalSeconds>300</IntervalSeconds>
<SyncTime/>
<TimeoutSeconds>300</TimeoutSeconds>
</Configuration>
</UnitMonitor>