This monitor checks whether the jobs of the following replication agents are in a healthy state: Distribution Agent, Merge Agent, Queue Reader Agent, Log Reader Agent, and Snapshot Agent. If any agent job is in a failed state, the monitor is triggered.
This monitor checks the health state of the jobs for the following replication agents: Distribution Agent, Merge Agent, Queue Reader Agent, Log Reader Agent, and Snapshot Agent. If any of the agents’ jobs are in a failed state, the monitor is triggered.
Replication agent jobs can fail for many reasons, including:
SQL Server Agent failure.
Agent configuration issues, such as improper parameter values.
Network issues that prevent or slow down access to the subscriber or distributor.
Data integrity errors such as “row not found on subscriber”.
Query timeouts.
To resolve the issue, try the following:
Make sure the SQL Server Agent is running.
Check Replication Monitor or review the agent job history for error messages, then investigate and fix accordingly.
Enable verbose logging and rerun the job to obtain detailed error information.
How to enable replication agents for logging to output files in SQL Server:
http://support.microsoft.com/kb/312292
Name | Description | Default Value |
Enabled | Enables or disables the workflow | Yes |
Generates Alerts | Defines whether the workflow generates an Alert | Yes |
Interval (seconds) | The recurring interval of time in seconds in which to run the workflow. | 300 |
Timeout (seconds) | Timeout (seconds) | 300 |
Synchronization Time | Synchronization Time | |
Failed jobs count threshold | Failed jobs count threshold | 1 |
Per-Job threshold | Per-Job threshold | 1 |
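As a rough illustration of how one of the overridable parameters above could be changed, the following management pack fragment sketches a configuration override that raises the failed jobs count threshold from 1 to 3. It is a sketch under assumptions, not taken from the management pack itself: the override ID and the MS2RM alias (standing for a reference to the management pack that defines the monitor) are hypothetical, and the parameter name ThresholdCountOfFailedJobs is assumed to match the configuration element of the same name. The actual override parameter ID is defined by the monitor type and should be checked there.

```xml
<Monitoring>
  <Overrides>
    <!-- Hypothetical override: the ID and the MS2RM alias are illustrative only. -->
    <!-- Parameter is assumed to be named after the ThresholdCountOfFailedJobs configuration element. -->
    <MonitorConfigurationOverride ID="Custom.Replication.Override.FailedJobsCountThreshold"
                                  Context="MS2RD!Microsoft.SQLServer.2008.Replication.Distributor"
                                  Enforced="false"
                                  Monitor="MS2RM!Microsoft.SQLServer.2008.Replication.Monitor.ReplicationAgentFailJobs"
                                  Parameter="ThresholdCountOfFailedJobs">
      <!-- Raise the threshold from the default of 1 to 3. -->
      <Value>3</Value>
    </MonitorConfigurationOverride>
  </Overrides>
</Monitoring>
```

An override like this would normally live in a separate, unsealed management pack that references the sealed replication management packs, so the original monitor definition stays untouched.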
Target | Microsoft.SQLServer.2008.Replication.Distributor | ||
Parent Monitor | System.Health.PerformanceState | ||
Category | PerformanceHealth | ||
Enabled | True | ||
Alert Generate | True | ||
Alert Severity | Warning | ||
Alert Priority | Normal | ||
Alert Auto Resolve | True | ||
Monitor Type | Microsoft.SQLServer.2008.Replication.MonitorType.DistributorFailJobs | ||
Remotable | True | ||
Accessibility | Public | ||
Alert Message | | ||
RunAs | Microsoft.SQLServer.Replication.Monitoring.RunAs.Monitor |
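The monitor properties listed above (for example Enabled, Alert Severity, or Alert Auto Resolve) can typically be adjusted through a property override rather than by editing the monitor. The fragment below is a minimal sketch of raising the alert severity from Warning to Error. The override ID and the MS2RM alias for the management pack that defines the monitor are hypothetical names introduced for illustration.

```xml
<Monitoring>
  <Overrides>
    <!-- Hypothetical override: the ID and the MS2RM alias are illustrative only. -->
    <MonitorPropertyOverride ID="Custom.Replication.Override.AlertSeverity"
                             Context="MS2RD!Microsoft.SQLServer.2008.Replication.Distributor"
                             Enforced="false"
                             Monitor="MS2RM!Microsoft.SQLServer.2008.Replication.Monitor.ReplicationAgentFailJobs"
                             Property="AlertSeverity">
      <!-- The monitor's default alert severity is Warning. -->
      <Value>Error</Value>
    </MonitorPropertyOverride>
  </Overrides>
</Monitoring>
```

Setting Property to Enabled with a value of false would, under the same assumptions, disable the monitor for the targeted distributor class.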
<UnitMonitor ID="Microsoft.SQLServer.2008.Replication.Monitor.ReplicationAgentFailJobs" Accessibility="Public" Enabled="true" Target="MS2RD!Microsoft.SQLServer.2008.Replication.Distributor" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Microsoft.SQLServer.2008.Replication.MonitorType.DistributorFailJobs" ConfirmDelivery="false" RunAs="MSRL!Microsoft.SQLServer.Replication.Monitoring.RunAs.Monitor">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.SQLServer.2008.Replication.Monitor.ReplicationAgentFailJobs.AlertMessage">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Warning</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericDistributor']/InstanceName$</AlertParameter1>
      <AlertParameter2>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericDistributor']/ConnectionString$</AlertParameter2>
      <AlertParameter3>$Data/Context/Property[@Name='DistributorFailJobs']$</AlertParameter3>
      <AlertParameter4>$Data/Context/Property[@Name='Message']$</AlertParameter4>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Health" MonitorTypeStateID="Health" HealthState="Success"/>
    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <SqlTimeout>300</SqlTimeout>
    <ConnectionString>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericDistributor']/ConnectionString$</ConnectionString>
    <ThresholdCountOfFailsForJob>1</ThresholdCountOfFailsForJob>
    <ThresholdCountOfFailedJobs>1</ThresholdCountOfFailedJobs>
    <CategoryList>Distribution</CategoryList>
    <ExcludeCategoryList/>
    <IntervalSeconds>300</IntervalSeconds>
    <SyncTime/>
  </Configuration>
</UnitMonitor>