Replication Agents failed on the Distributor

Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentFailJobs (UnitMonitor)

This monitor checks if the following Replication Agent jobs are in a healthy state: Distribution Agent, Merge Agent, Snapshot Agent, Log Reader Agent, Queue Reader Agent. If any of the Agents are in a failed state, the monitor will be triggered.

Knowledge Base article:

Summary

This monitor checks if the following Replication Agent jobs are in a healthy state: Distribution Agent, Merge Agent, Snapshot Agent, Log Reader Agent, Queue Reader Agent. If any of the Agents are in a failed state, the monitor will be triggered.

Causes

The Replication Agent jobs can fail due to many reasons:

Resolutions

To resolve the issue, try the following:

External

How to enable replication Agents for logging to output files in SQL Server:

http://support.microsoft.com/kb/312292

Overrideable Parameters

Name

Description

Default Value

Alert Priority

Defines Alert Priority.

Normal

Alert Severity

Defines Alert Severity.

Warning

Enabled

Enables or disables the workflow.

Yes

Failed jobs count threshold

Failed jobs count threshold.

1

Generates Alerts

Defines whether the workflow generates an Alert.

Yes

Interval (seconds)

The recurring interval of time in seconds in which to run the workflow.

300

Per-Job threshold

Count of fails per-job threshold.

1

Synchronization Time

The synchronization time specified by using a 24-hour format. May be omitted.

 

Timeout (seconds)

Specifies the time the workflow is allowed to run before being closed and marked as failed.

200

Timeout for database connection (seconds)

The workflow will fail and register an event if it cannot access the database during the specified period.

15

Element properties:

TargetMicrosoft.SQLServer.Replication.Windows.Distributor
Parent MonitorSystem.Health.PerformanceState
CategoryPerformanceHealth
EnabledTrue
Alert GenerateTrue
Alert SeverityWarning
Alert PriorityNormal
Alert Auto ResolveTrue
Monitor TypeMicrosoft.SQLServer.Replication.Windows.MonitorType.DistributorFailJobs
RemotableTrue
AccessibilityPublic
Alert Message
MSSQL on Windows Replication: Failed jobs detected on the Distributor.
The Distributor (Name: '{0}', Server: '{1}') has detected {2} failed job(s).
{3}
RunAsMicrosoft.SQLServer.Core.RunAs.Monitoring

Source Code:

<UnitMonitor ID="Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentFailJobs" Accessibility="Public" Enabled="true" Target="SQLReplWD!Microsoft.SQLServer.Replication.Windows.Distributor" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Microsoft.SQLServer.Replication.Windows.MonitorType.DistributorFailJobs" ConfirmDelivery="false" RunAs="SqlCoreLib!Microsoft.SQLServer.Core.RunAs.Monitoring">
<Category>PerformanceHealth</Category>
<AlertSettings AlertMessage="Microsoft.SQLServer.Replication.Windows.Monitor.ReplicationAgentFailJobs.AlertMessage">
<AlertOnState>Warning</AlertOnState>
<AutoResolve>true</AutoResolve>
<AlertPriority>Normal</AlertPriority>
<AlertSeverity>Warning</AlertSeverity>
<AlertParameters>
<AlertParameter1>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</AlertParameter1>
<AlertParameter2>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</AlertParameter2>
<AlertParameter3>$Data/Context/Property[@Name='DistributorFailJobs']$</AlertParameter3>
<AlertParameter4>$Data/Context/Property[@Name='Message']$</AlertParameter4>
</AlertParameters>
</AlertSettings>
<OperationalStates>
<OperationalState ID="Health" MonitorTypeStateID="Health" HealthState="Success"/>
<OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
</OperationalStates>
<Configuration>
<MachineName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/MachineName$</MachineName>
<NetbiosComputerName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/NetbiosComputerName$</NetbiosComputerName>
<InstanceName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</InstanceName>
<SqlTimeoutSeconds>15</SqlTimeoutSeconds>
<ConnectionString>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</ConnectionString>
<MonitoringType>$Target/Host/Property[Type="SqlDiscW!Microsoft.SQLServer.Windows.DBEngine"]/MonitoringType$</MonitoringType>
<ThresholdCountOfFailsForJob>1</ThresholdCountOfFailsForJob>
<ThresholdCountOfFailedJobs>1</ThresholdCountOfFailedJobs>
<CategoryList>Distribution, LogReader, Merge, QueueReader, Snapshot</CategoryList>
<ExcludeCategoryList/>
<IntervalSeconds>300</IntervalSeconds>
<SyncTime/>
<TimeoutSeconds>200</TimeoutSeconds>
</Configuration>
</UnitMonitor>