MSSQL on Windows Replication: The Maintenance Job(s) Failed on Distributor Alert Rule

Microsoft.SQLServer.Replication.Windows.DistributorFailedMaintenanceJobsAlertRule (Rule)

The Maintenance Job(s) Failed on Distributor Alert Rule.

Knowledge Base article:

Summary

Replication uses the following jobs to perform scheduled and on-demand maintenance.

Clean up job

Description

Default schedule

Agent History Clean Up: Distribution

Removes replication agent history from the distribution database.

Runs every ten minutes.

Distribution Clean Up: Distribution

Removes replicated transactions from the distribution database. Deactivates subscriptions that have not been synchronized within the maximum distribution retention period.

Runs every ten minutes.

Expired Subscription Clean Up

Detects and removes expired subscriptions from publication databases.

Runs every day at 1:00 A.M.

Reinitialize Subscriptions Having Data Validation Failures

Detects all subscriptions that have data validation failures and marks them for re-initialization. The next time the Merge Agent or Distribution Agent runs, a new snapshot will be applied at the Subscribers.

No default schedule (not enabled by default).

Replication Agents Checkup

Detects replication agents that are not actively logging history. It writes to the Microsoft Windows event log if a job step fails.

Runs every ten minutes.

Replication monitoring refresher for distribution

Refreshes cached queries used by Replication Monitor.

Runs continuously.

Causes

The Replication maintenance jobs can fail due to many reasons:

Resolutions

To resolve the issue, try the following:

Overrideable Parameters

Name

Description

Default Value

Enabled

Enables or disables the workflow.

Yes

Interval (seconds)

The recurring interval of time in seconds in which to run the workflow.

300

Priority

Defines Alert Priority.

2

Severity

Defines Alert Severity.

2

Synchronization Time

The synchronization time specified by using a 24-hour format. May be omitted.

 

Timeout (seconds)

Specifies the time the workflow is allowed to run before being closed and marked as failed.

200

Timeout for database connection (seconds)

The workflow will fail and register an event if it cannot access the database during the specified period.

15

Element properties:

TargetMicrosoft.SQLServer.Replication.Windows.Distributor
CategoryAlert
EnabledTrue
Alert GenerateTrue
Alert SeverityError
Alert PriorityHigh
RemotableTrue
Alert Message
MSSQL on Windows Replication: Some of the Replication maintenance jobs failed on the Distributor.
{0}

Member Modules:

ID Module Type TypeId RunAs 
DS DataSource Microsoft.SQLServer.Replication.Windows.DataSource.DistributorFailJobsMonitor Microsoft.SQLServer.Core.RunAs.Monitoring
CDMaintenanceJobsCountNotNull ConditionDetection System.ExpressionFilter Default
Alert WriteAction System.Health.GenerateAlert Default

Source Code:

<Rule ID="Microsoft.SQLServer.Replication.Windows.DistributorFailedMaintenanceJobsAlertRule" ConfirmDelivery="true" DiscardLevel="100" Enabled="true" Priority="Normal" Remotable="true" Target="SQLReplWD!Microsoft.SQLServer.Replication.Windows.Distributor">
<Category>Alert</Category>
<DataSources>
<DataSource ID="DS" TypeID="Microsoft.SQLServer.Replication.Windows.DataSource.DistributorFailJobsMonitor" RunAs="SqlCoreLib!Microsoft.SQLServer.Core.RunAs.Monitoring">
<MachineName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/MachineName$</MachineName>
<NetbiosComputerName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/NetbiosComputerName$</NetbiosComputerName>
<InstanceName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</InstanceName>
<SqlTimeoutSeconds>15</SqlTimeoutSeconds>
<ConnectionString>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</ConnectionString>
<MonitoringType>$Target/Host/Property[Type="SqlDiscW!Microsoft.SQLServer.Windows.DBEngine"]/MonitoringType$</MonitoringType>
<NumberOfConsecutiveFails>1</NumberOfConsecutiveFails>
<CategoryList/>
<ExcludeCategoryList>Distribution, LogReader, Merge, QueueReader, Snapshot</ExcludeCategoryList>
<IntervalSeconds>300</IntervalSeconds>
<SyncTime/>
<TimeoutSeconds>200</TimeoutSeconds>
</DataSource>
</DataSources>
<ConditionDetection ID="CDMaintenanceJobsCountNotNull" TypeID="System!System.ExpressionFilter">
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="Integer">Property[@Name='DistributorFailJobs']</XPathQuery>
</ValueExpression>
<Operator>Greater</Operator>
<ValueExpression>
<Value Type="Integer">0</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</ConditionDetection>
<WriteActions>
<WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
<Priority>2</Priority>
<Severity>2</Severity>
<AlertName/>
<AlertDescription/>
<AlertOwner/>
<AlertMessageId>$MPElement[Name="Microsoft.SQLServer.Replication.Windows.DistributorFailedMaintenanceJobsAlertRule.AlertMessage"]$</AlertMessageId>
<AlertParameters>
<AlertParameter1>$Data/Property[@Name="Message"]$</AlertParameter1>
</AlertParameters>
<Suppression>
<SuppressionValue/>
</Suppression>
<Custom1/>
<Custom2/>
<Custom3/>
<Custom4/>
<Custom5/>
<Custom6/>
<Custom7/>
<Custom8/>
<Custom9/>
<Custom10/>
</WriteAction>
</WriteActions>
</Rule>