Long Running Jobs

Microsoft.SQLServer.2016.Agent.LongRunningJobs (UnitMonitor)

This monitor checks for long running SQL Agent jobs.
Note that the SQL Server Agent Windows service is not supported by any edition of SQL Server Express, so no corresponding object is discovered for those editions. This monitor is disabled by default; use overrides to enable it when necessary.

Knowledge Base article:

Summary

This monitor checks for long running SQL Agent jobs. A warning or error alert will be raised if a job has been running for longer than the configured threshold.

Causes

An unhealthy state is caused by a SQL Server Agent job that has run longer than the defined threshold. This could indicate a problem with the job.

The SQL Server Agent is responsible for running SQL Server tasks scheduled to occur at specific times or intervals. It also detects specific conditions for which administrators have defined an action, such as alerting someone through pages or e-mail, or running a task that addresses the conditions. In addition, the SQL Server Agent runs replication tasks defined by administrators.

To identify the job that caused the warning or error state, examine the context data for the state change or alert.

Resolutions

Check SQL Server Management Studio to identify which jobs are running. If any job is running longer than expected, investigate why.

Use sp_help_jobactivity to see information about currently running jobs.
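As a rough illustration of what to look for in that output, the sketch below filters rows shaped like the sp_help_jobactivity result set and reports how long each still-running job has been executing. It assumes a job counts as running when start_execution_date is set and stop_execution_date is empty. The sample rows, the job names, and the running_jobs helper are hypothetical, and the monitor's own query may work differently.

```python
from datetime import datetime


def running_jobs(rows, now):
    """Return (job_name, elapsed_minutes) for jobs that started but have not stopped."""
    result = []
    for row in rows:
        started = row.get("start_execution_date")
        stopped = row.get("stop_execution_date")
        # Assumption: a start time without a stop time means the job is still running.
        if started is not None and stopped is None:
            elapsed = (now - started).total_seconds() / 60
            result.append((row["job_name"], elapsed))
    # Longest-running jobs first.
    return sorted(result, key=lambda item: item[1], reverse=True)


# Hypothetical sample rows; real rows would come from sp_help_jobactivity.
now = datetime(2024, 1, 1, 12, 0)
rows = [
    {"job_name": "NightlyBackup",
     "start_execution_date": datetime(2024, 1, 1, 9, 30),
     "stop_execution_date": None},
    {"job_name": "IndexRebuild",
     "start_execution_date": datetime(2024, 1, 1, 11, 45),
     "stop_execution_date": None},
    {"job_name": "PurgeHistory",
     "start_execution_date": datetime(2024, 1, 1, 8, 0),
     "stop_execution_date": datetime(2024, 1, 1, 8, 5)},
]

for name, minutes in running_jobs(rows, now):
    print(f"{name}: {minutes:.0f} min")
```

Sorting by elapsed time puts the most likely cause of the alert at the top of the list.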

Alternatively, if some agent jobs are expected to run for a long time, use the overrideable parameters below to adjust the thresholds.

Overrideable Parameters

| Name | Description | Default Value |
| --- | --- | --- |
| Alert Priority | Defines Alert Priority. | Normal |
| Alert Severity | Defines Alert Severity. | MatchMonitorHealth |
| Critical Threshold (minutes) | The monitor will change its state to Critical if the value exceeds this threshold. Being between this threshold and the warning threshold (inclusive) will result in the monitor being in a warning state. | 120 |
| Enabled | Enables or disables the workflow. | No |
| Generates Alerts | Defines whether the workflow generates an Alert. | Yes |
| Interval (seconds) | The recurring interval of time in seconds in which to run the workflow. | 600 |
| Synchronization Time | The synchronization time specified by using a 24-hour format. May be omitted. | |
| Timeout (seconds) | Specifies the time the workflow is allowed to run before being closed and marked as failed. | 300 |
| Warning Threshold (minutes) | Warning threshold. Exceeding this threshold will result in the monitor changing to at least a warning state. | 60 |
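The interaction between the two thresholds can be sketched as a small function. This is only an illustration built on the default values (60 and 120 minutes) and the health states named in the monitor's source. The classify helper is hypothetical, and the boundary handling is an assumption: a value equal to the warning threshold is treated as a warning (following the "inclusive" wording of the Critical Threshold description), and only values strictly above the critical threshold are treated as an error.

```python
def classify(minutes, warning=60, critical=120):
    """Map a job's running time in minutes to a health state.

    Assumed boundaries: warning <= minutes <= critical gives "Warning",
    minutes > critical gives "Error", anything lower gives "Success".
    """
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical threshold")
    if minutes > critical:
        return "Error"      # OverThreshold2
    if minutes >= warning:
        return "Warning"    # OverThreshold1UnderThreshold2
    return "Success"        # UnderThreshold1


for sample in (30, 90, 120, 121):
    print(sample, classify(sample))
```

Raising both thresholds through overrides, for example `classify(150, warning=180, critical=360)`, keeps a job that is expected to run for a long time in the healthy state.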

Element properties:

Target: Microsoft.SQLServer.2016.Agent
Parent Monitor: System.Health.PerformanceState
Category: PerformanceHealth
Enabled: False
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.SQLServer.2016.AgentLongRunningJobsProvider
Remotable: True
Accessibility: Public
Alert Message: MSSQL 2016: Long Running Jobs
There are long running agent jobs on SQL instance {1} on computer {0}. This may indicate an issue with one or more jobs.
RunAs: Default
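The {0} and {1} placeholders in the alert message are positional. Going by the monitor's source, the first alert parameter is the computer's network name and the second is the SQL instance name. The sketch below mimics that substitution with Python's str.format purely for illustration; the render_alert helper and the sample host and instance names are hypothetical, and Operations Manager performs the actual substitution itself.

```python
ALERT_TEMPLATE = (
    "There are long running agent jobs on SQL instance {1} on computer {0}. "
    "This may indicate an issue with one or more jobs."
)


def render_alert(computer, instance):
    """Fill {0} with the computer name and {1} with the instance name."""
    return ALERT_TEMPLATE.format(computer, instance)


# Hypothetical values for illustration only.
print(render_alert("SQLHOST01", "MSSQLSERVER"))
```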

Source Code:

<UnitMonitor ID="Microsoft.SQLServer.2016.Agent.LongRunningJobs" Accessibility="Public" Enabled="false" Target="SQL2016Core!Microsoft.SQLServer.2016.Agent" ParentMonitorID="SystemHealth!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="Microsoft.SQLServer.2016.AgentLongRunningJobsProvider" ConfirmDelivery="false">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.SQLServer.2016.Agent.LongRunningJobs.AlertMessage">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</AlertParameter1>
      <AlertParameter2>$Target/Host/Property[Type="SQL2016Core!Microsoft.SQLServer.2016.ServerRole"]/InstanceName$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="UnderThreshold1" MonitorTypeStateID="UnderThreshold1" HealthState="Success"/>
    <OperationalState ID="OverThreshold1UnderThreshold2" MonitorTypeStateID="OverThreshold1UnderThreshold2" HealthState="Warning"/>
    <OperationalState ID="OverThreshold2" MonitorTypeStateID="OverThreshold2" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>600</IntervalSeconds>
    <SyncTime/>
    <ConnectionString>$Target/Host/Property[Type="SQL2016Core!Microsoft.SQLServer.2016.DBEngine"]/ConnectionString$</ConnectionString>
    <Threshold1>60</Threshold1>
    <Threshold2>120</Threshold2>
    <TimeoutSeconds>300</TimeoutSeconds>
  </Configuration>
</UnitMonitor>
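
For anyone auditing these defaults outside the console, the Configuration element can be read with a few lines of Python. The sketch below parses a trimmed copy of the element (the ConnectionString entry is left out for brevity) with the standard library's xml.etree.ElementTree. The read_config helper is hypothetical and is not part of the management pack.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the monitor's Configuration element.
CONFIG_XML = """
<Configuration>
  <IntervalSeconds>600</IntervalSeconds>
  <SyncTime/>
  <Threshold1>60</Threshold1>
  <Threshold2>120</Threshold2>
  <TimeoutSeconds>300</TimeoutSeconds>
</Configuration>
"""


def read_config(xml_text):
    """Return the child elements as a dict, converting numeric values to int."""
    root = ET.fromstring(xml_text.strip())
    config = {}
    for child in root:
        text = (child.text or "").strip()
        # Empty elements such as <SyncTime/> are stored as None.
        config[child.tag] = int(text) if text.isdigit() else (text or None)
    return config


config = read_config(CONFIG_XML)
print(config)
```

Here Threshold1 and Threshold2 correspond to the warning and critical thresholds in minutes, while IntervalSeconds and TimeoutSeconds are expressed in seconds.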