Replication Agents failed on the Distributor

Microsoft.SQLServer.2008.Replication.Monitor.ReplicationAgentFailJobs (UnitMonitor)

This monitor checks if the following Replication agent jobs are in healthy state: Distribution agent, Merge agent, Queue Reader agent, Log reader agent or Snapshot agent. If any of the agents are in failed state, the monitor will be triggered.

Knowledge Base article:

Summary

This monitor checks the health state of the jobs for the follow replication agents: Distribution Agent, Merge Agent, Queue Reader Agent, Log reader Agent and Snapshot Agent. If any of the agents’ jobs are in failed state, the monitor will be triggered.

Causes

The Replication agent jobs can fail due to many reasons:

Resolutions

To resolve the issue, try the following:

External

How to enable replication agents for logging to output files in SQL Server:

http://support.microsoft.com/kb/312292

Overrideable Parameters

Name

Description

Default Value

Alert Priority

Defines Alert Priority.

Normal

Alert Severity

Defines Alert Severity.

Warning

Enabled

Enables or disables the workflow.

Yes

Failed jobs count threshold

Failed jobs count threshold

1

Generates Alerts

Defines whether the workflow generates an Alert.

Yes

Interval (seconds)

The recurring interval of time in seconds in which to run the workflow.

300

Per-Job threshold

Per-Job threshold

1

Synchronization Time

Synchronization Time

 

Timeout (seconds)

Specifies the time the workflow is allowed to run before being closed and marked as failed.

300

Timeout for database connection (seconds)

The workflow will fail and register an event, if it cannot access the database during the specified period.

15

Element properties:

TargetMicrosoft.SQLServer.2008.Replication.Distributor
Parent MonitorSystem.Health.PerformanceState
CategoryPerformanceHealth
EnabledTrue
Alert GenerateTrue
Alert SeverityWarning
Alert PriorityNormal
Alert Auto ResolveTrue
Monitor TypeMicrosoft.SQLServer.2008.Replication.MonitorType.DistributorFailJobs
RemotableTrue
AccessibilityPublic
Alert Message
MSSQL2008 Replication: Failed jobs detected on the Distributor.
The Distributor (Name: '{0}', Server: '{1}') has detected {2} failed job(s).
{3}
RunAsMicrosoft.SQLServer.Replication.Monitoring.RunAs.Monitor

Source Code:

<UnitMonitor ID="Microsoft.SQLServer.2008.Replication.Monitor.ReplicationAgentFailJobs" Accessibility="Public" Enabled="true" Target="MS2RD!Microsoft.SQLServer.2008.Replication.Distributor" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Microsoft.SQLServer.2008.Replication.MonitorType.DistributorFailJobs" ConfirmDelivery="false" RunAs="MSRL!Microsoft.SQLServer.Replication.Monitoring.RunAs.Monitor">
<Category>PerformanceHealth</Category>
<AlertSettings AlertMessage="Microsoft.SQLServer.2008.Replication.Monitor.ReplicationAgentFailJobs.AlertMessage">
<AlertOnState>Warning</AlertOnState>
<AutoResolve>true</AutoResolve>
<AlertPriority>Normal</AlertPriority>
<AlertSeverity>Warning</AlertSeverity>
<AlertParameters>
<AlertParameter1>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericDistributor']/InstanceName$</AlertParameter1>
<AlertParameter2>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericDistributor']/ConnectionString$</AlertParameter2>
<AlertParameter3>$Data/Context/Property[@Name='DistributorFailJobs']$</AlertParameter3>
<AlertParameter4>$Data/Context/Property[@Name='Message']$</AlertParameter4>
</AlertParameters>
</AlertSettings>
<OperationalStates>
<OperationalState ID="Health" MonitorTypeStateID="Health" HealthState="Success"/>
<OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
</OperationalStates>
<Configuration>
<SqlTimeout>15</SqlTimeout>
<ConnectionString>$Target/Property[Type='MSRL!Microsoft.SQLServer.Replication.Library.GenericDistributor']/ConnectionString$</ConnectionString>
<ThresholdCountOfFailsForJob>1</ThresholdCountOfFailsForJob>
<ThresholdCountOfFailedJobs>1</ThresholdCountOfFailedJobs>
<CategoryList>Distribution</CategoryList>
<ExcludeCategoryList/>
<IntervalSeconds>300</IntervalSeconds>
<SyncTime/>
<TimeoutSeconds>300</TimeoutSeconds>
</Configuration>
</UnitMonitor>