The Maintenance Job(s) Failed on Distributor Alert Rule.
Replication uses the following jobs to perform scheduled and on-demand maintenance.
Clean up job | Description | Default schedule |
Agent History Clean Up: Distribution | Removes replication agent history from the distribution database. | Runs every ten minutes. |
Distribution Clean Up: Distribution | Removes replicated transactions from the distribution database. Deactivates subscriptions that have not been synchronized within the maximum distribution retention period. | Runs every ten minutes. |
Expired Subscription Clean Up | Detects and removes expired subscriptions from publication databases. | Runs every day at 1:00 A.M. |
Reinitialize Subscriptions Having Data Validation Failures | Detects all subscriptions that have data validation failures and marks them for re-initialization. The next time the Merge Agent or Distribution Agent runs, a new snapshot will be applied at the Subscribers. | No default schedule (not enabled by default). |
Replication Agents Checkup | Detects replication agents that are not actively logging history. It writes to the Microsoft Windows event log if a job step fails. | Runs every ten minutes. |
Replication monitoring refresher for distribution | Refreshes cached queries used by Replication Monitor. | Runs continuously. |
The Replication maintenance jobs can fail due to many reasons:
SQL Server Agent configuration issues
Replication maintenance job issues such as improper parameter values.
Data integrity errors.
Query timeouts.
To resolve the issue, try the following:
Examine the SQL Server Agent history to determine the cause of the agent failure.
Look in the replication maintenance job history for any error message mentioned in the alert, and investigate/fix accordingly. To review the job history, use this KB.
Enable verbose logging and rerun the job to obtain detailed error information.
Name | Description | Default Value |
Enabled | Enables or disables the workflow. | Yes |
Interval (seconds) | The recurring interval of time in seconds in which to run the workflow. | 300 |
Priority | Defines Alert Priority. | 2 |
Severity | Defines Alert Severity. | 2 |
Synchronization Time | The synchronization time specified by using a 24-hour format. May be omitted. |
|
Timeout (seconds) | Specifies the time the workflow is allowed to run before being closed and marked as failed. | 200 |
Timeout for database connection (seconds) | The workflow will fail and register an event if it cannot access the database during the specified period. | 15 |
Target | Microsoft.SQLServer.Replication.Windows.Distributor | ||
Category | Alert | ||
Enabled | True | ||
Alert Generate | True | ||
Alert Severity | Error | ||
Alert Priority | High | ||
Remotable | True | ||
Alert Message |
|
ID | Module Type | TypeId | RunAs |
---|---|---|---|
DS | DataSource | Microsoft.SQLServer.Replication.Windows.DataSource.DistributorFailJobsMonitor | Microsoft.SQLServer.Core.RunAs.Monitoring |
CDMaintenanceJobsCountNotNull | ConditionDetection | System.ExpressionFilter | Default |
Alert | WriteAction | System.Health.GenerateAlert | Default |
<Rule ID="Microsoft.SQLServer.Replication.Windows.DistributorFailedMaintenanceJobsAlertRule" ConfirmDelivery="true" DiscardLevel="100" Enabled="true" Priority="Normal" Remotable="true" Target="SQLReplWD!Microsoft.SQLServer.Replication.Windows.Distributor">
<Category>Alert</Category>
<DataSources>
<DataSource ID="DS" TypeID="Microsoft.SQLServer.Replication.Windows.DataSource.DistributorFailJobsMonitor" RunAs="SqlCoreLib!Microsoft.SQLServer.Core.RunAs.Monitoring">
<MachineName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/MachineName$</MachineName>
<NetbiosComputerName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/NetbiosComputerName$</NetbiosComputerName>
<InstanceName>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/InstanceName$</InstanceName>
<SqlTimeoutSeconds>15</SqlTimeoutSeconds>
<ConnectionString>$Target/Host/Property[Type='SqlCoreLib!Microsoft.SQLServer.Core.DBEngine']/ConnectionString$</ConnectionString>
<MonitoringType>$Target/Host/Property[Type="SqlDiscW!Microsoft.SQLServer.Windows.DBEngine"]/MonitoringType$</MonitoringType>
<NumberOfConsecutiveFails>1</NumberOfConsecutiveFails>
<CategoryList/>
<ExcludeCategoryList>Distribution, LogReader, Merge, QueueReader, Snapshot</ExcludeCategoryList>
<IntervalSeconds>300</IntervalSeconds>
<SyncTime/>
<TimeoutSeconds>200</TimeoutSeconds>
</DataSource>
</DataSources>
<ConditionDetection ID="CDMaintenanceJobsCountNotNull" TypeID="System!System.ExpressionFilter">
<Expression>
<SimpleExpression>
<ValueExpression>
<XPathQuery Type="Integer">Property[@Name='DistributorFailJobs']</XPathQuery>
</ValueExpression>
<Operator>Greater</Operator>
<ValueExpression>
<Value Type="Integer">0</Value>
</ValueExpression>
</SimpleExpression>
</Expression>
</ConditionDetection>
<WriteActions>
<WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
<Priority>2</Priority>
<Severity>2</Severity>
<AlertName/>
<AlertDescription/>
<AlertOwner/>
<AlertMessageId>$MPElement[Name="Microsoft.SQLServer.Replication.Windows.DistributorFailedMaintenanceJobsAlertRule.AlertMessage"]$</AlertMessageId>
<AlertParameters>
<AlertParameter1>$Data/Property[@Name="Message"]$</AlertParameter1>
</AlertParameters>
<Suppression>
<SuppressionValue/>
</Suppression>
<Custom1/>
<Custom2/>
<Custom3/>
<Custom4/>
<Custom5/>
<Custom6/>
<Custom7/>
<Custom8/>
<Custom9/>
<Custom10/>
</WriteAction>
</WriteActions>
</Rule>