This monitor checks if the following Replication Agent jobs are in a healthy state: Distribution Agent, Merge Agent, Snapshot Agent, Log Reader Agent, Queue Reader Agent. If any of the Agents are in a failed state, the monitor will be triggered.
This monitor checks if the following Replication Agent jobs are in a healthy state: Distribution Agent, Merge Agent, Snapshot Agent, Log Reader Agent, Queue Reader Agent. If any of the Agents are in a failed state, the monitor will be triggered.
The Replication Agent jobs can fail due to many reasons:
SQL Server Agent failure.
Agent configuration issues such as improper parameter values.
Network issue that prevent or slow down access to the Subscriber or distributor.
Data integrity errors, such as "row not found on Subscriber".
Query timeouts.
To resolve the issue, try the following:
Make sure the SQL Server Agent is running.
Check replication monitor or look at the Agent job history for any error message and investigate/fix accordingly.
Enable verbose logging and rerun the job to obtain detailed error information.
How to enable replication Agents for logging to output files in SQL Server:
http://support.microsoft.com/kb/312292
Name | Description | Default Value |
Alert Priority | Defines Alert Priority. | Normal |
Alert Severity | Defines Alert Severity. | Warning |
Enabled | Enables or disables the workflow. | Yes |
Failed jobs count threshold | Failed jobs count threshold. | 1 |
Generates Alerts | Defines whether the workflow generates an Alert. | Yes |
Interval (seconds) | The recurring interval of time in seconds in which to run the workflow. | 300 |
Per-Job threshold | Count of fails per-job threshold. | 1 |
Synchronization Time | The synchronization time specified by using a 24-hour format. May be omitted. |
|
Timeout (seconds) | Specifies the time the workflow is allowed to run before being closed and marked as failed. | 200 |
Timeout for database connection (seconds) | The workflow will fail and register an event if it cannot access the database during the specified period. | 15 |
Target | Microsoft.SQLServer.Replication.Windows.Distributor | ||
Parent Monitor | System.Health.PerformanceState | ||
Category | PerformanceHealth | ||
Enabled | True | ||
Alert Generate | True | ||
Alert Severity | Warning | ||
Alert Priority | Normal | ||
Alert Auto Resolve | True | ||
Monitor Type | Microsoft.SQLServer.Replication.Windows.MonitorType.DistributorFailJobs | ||
Remotable | True | ||
Accessibility | Public | ||
Alert Message |
| ||
RunAs | Microsoft.SQLServer.Core.RunAs.Monitoring |
<UnitMonitor ID="Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentFailJobs" Accessibility="Public" Enabled="true" Target="SQLReplWD!Microsoft.SQLServer.Replication.Windows.Distributor" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Microsoft.SQLServer.Replication.Windows.MonitorType.DistributorFailJobs" ConfirmDelivery="false" RunAs="SqlCoreLib!Microsoft.SQLServer.Core.RunAs.Monitoring">
<Category>PerformanceHealth</Category>
<AlertSettings AlertMessage="Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentFailJobs.AlertMessage">
<AlertOnState>Warning</AlertOnState>
<AutoResolve>true</AutoResolve>
<AlertPriority>Normal</AlertPriority>
<AlertSeverity>Warning</AlertSeverity>
<AlertParameters>
<AlertParameter1>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</AlertParameter1>
<AlertParameter2>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</AlertParameter2>
<AlertParameter3>$Data/Context/Property[@Name='DistributorFailJobs']$</AlertParameter3>
<AlertParameter4>$Data/Context/Property[@Name='Message']$</AlertParameter4>
</AlertParameters>
</AlertSettings>
<OperationalStates>
<OperationalState ID="Health" MonitorTypeStateID="Health" HealthState="Success"/>
<OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
</OperationalStates>
<Configuration>
<MachineName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/MachineName$</MachineName>
<NetbiosComputerName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/NetbiosComputerName$</NetbiosComputerName>
<InstanceName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</InstanceName>
<SqlTimeoutSeconds>15</SqlTimeoutSeconds>
<ConnectionString>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</ConnectionString>
<MonitoringType>$Target/Host/Property[Type="SqlDiscW!Microsoft.SQLServer.Windows.DBEngine"]/MonitoringType$</MonitoringType>
<ThresholdCountOfFailsForJob>1</ThresholdCountOfFailsForJob>
<ThresholdCountOfFailedJobs>1</ThresholdCountOfFailedJobs>
<CategoryList>Distribution, LogReader, Merge, QueueReader, Snapshot</CategoryList>
<ExcludeCategoryList/>
<IntervalSeconds>300</IntervalSeconds>
<SyncTime/>
<TimeoutSeconds>200</TimeoutSeconds>
</Configuration>
</UnitMonitor>