MSSQL on Windows: The agent is suspect. No response within last minutes

Microsoft.SQLServer.Windows.CollectionRule.Agent.The_agent_is_suspect._No_response_within_last_minutes_1_5_Rule (Rule)

This behavior occurs because the replication agent is too busy to respond when SQL Server Enterprise Manager polls the replication agent; therefore, SQL Server Enterprise Manager does not know the status of the replication agent and it cannot report whether the replication agent is functioning or not.

Knowledge Base article:

Summary

This behavior occurs because the replication agent is too busy to respond when SQL Server Enterprise Manager polls the replication agent; therefore, SQL Server Enterprise Manager does not know the status of the replication agent and it cannot report whether the replication agent is functioning or not.

If the replication agent fails, you receive the following error message:

Message 20536 Severity 10 "Replication: Agent failure."

There are many reasons why the replication agent is busy: There may be a lot of data that is being replicated, or there may be configuration or replication design issues that cause processes to run for a long time.

Note that this rule does not work if SQL Server on Windows instance is monitored agentlessly.

Resolutions

To reduce the frequency of the message, increase the inactivity threshold.

Changing this value does not fix anything. Instead, it changes how frequently the replication agent is polled for its current status. To change the value of the inactivity threshold:

Unless you receive additional error messages that indicate that there is a problem with the replication agent, the "agent is suspect" message is only an informational message. When you receive this message, do not stop the replication agent if there are no additional related errors. If you stop the replication agent, it rolls back the process in which it is engaged, and then you must restart the process; instead, wait for the process to complete.

Overridable Parameters

Name

Description

Default Value

Enabled

Enables or disables the workflow.

Yes

Interval (seconds)

The recurring interval of time in seconds in which to run the workflow.

300

Priority

Defines Alert Priority.

1

Severity

Defines Alert Severity.

2

Synchronization Time

The synchronization time specified by using a 24-hour format. May be omitted.

 

Timeout (seconds)

Specifies the time the workflow is allowed to run before being closed and marked as failed.

200

Timeout for query execution (seconds)

The workflow will fail and register an event, if the query execution takes longer than the specified period.

60

Timeout for database connection (seconds)

The workflow will fail and register an event, if it cannot access the database during the specified period.

15

Element properties:

TargetMicrosoft.SQLServer.Windows.DBEngine
CategoryEventCollection
EnabledTrue
Alert GenerateTrue
Alert SeverityError
Alert PriorityNormal
RemotableTrue
Alert Message
MSSQL on Windows: The agent is suspect. No response within last minutes
{2}
CommentMom2017ID='{C65DF52B-B877-48C3-B546-67D69C494E84}';MOM2017GroupID={467ECC75-C5DA-42BD-955C-A73BBB51AF74}

Member Modules:

ID Module Type TypeId RunAs 
_F6DA1507_12AF_11D3_AB21_00A0C98620CE_ DataSource Microsoft.SQLServer.Windows.DataSource.EventCollectionFiltered Default
GenerateAlert WriteAction System.Health.GenerateAlert Default

Source Code:

<Rule ID="Microsoft.SQLServer.Windows.CollectionRule.Agent.The_agent_is_suspect._No_response_within_last_minutes_1_5_Rule" Target="SqlDiscW!Microsoft.SQLServer.Windows.DBEngine" Enabled="true" ConfirmDelivery="true" Remotable="true" Comment="Mom2017ID='{C65DF52B-B877-48C3-B546-67D69C494E84}';MOM2017GroupID={467ECC75-C5DA-42BD-955C-A73BBB51AF74}">
<Category>EventCollection</Category>
<DataSources>
<DataSource ID="_F6DA1507_12AF_11D3_AB21_00A0C98620CE_" Comment="{F6DA1507-12AF-11D3-AB21-00A0C98620CE}" TypeID="Microsoft.SQLServer.Windows.DataSource.EventCollectionFiltered">
<MachineName>$Target/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/MachineName$</MachineName>
<NetbiosComputerName>$Target/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/NetbiosComputerName$</NetbiosComputerName>
<InstanceName>$Target/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/InstanceName$</InstanceName>
<ConnectionString>$Target/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/ConnectionString$</ConnectionString>
<InstanceVersion>$Target/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/Version$</InstanceVersion>
<InstanceEdition>$Target/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/Edition$</InstanceEdition>
<MonitoringType>$Target/Property[Type="SqlDiscW!Microsoft.SQLServer.Windows.DBEngine"]/MonitoringType$</MonitoringType>
<SqlExecTimeoutSeconds>60</SqlExecTimeoutSeconds>
<SqlTimeoutSeconds>15</SqlTimeoutSeconds>
<TimeoutSeconds>200</TimeoutSeconds>
<IntervalSeconds>300</IntervalSeconds>
<SyncTime/>
<EventDisplayNumber>20554</EventDisplayNumber>
</DataSource>
</DataSources>
<WriteActions>
<WriteAction ID="GenerateAlert" TypeID="Health!System.Health.GenerateAlert">
<Priority>1</Priority>
<Severity>2</Severity>
<AlertMessageId>$MPElement[Name="Microsoft.SQLServer.Windows.CollectionRule.Agent.The_agent_is_suspect._No_response_within_last_minutes_1_5_Rule.AlertMessage"]$</AlertMessageId>
<AlertParameters>
<AlertParameter1>$Target/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/MachineName$</AlertParameter1>
<AlertParameter2>$Target/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/InstanceName$</AlertParameter2>
<AlertParameter3>Event ID: $Data/Property[@Name='EventID']$. $Data/Property[@Name='Message']$</AlertParameter3>
</AlertParameters>
<Suppression>
<SuppressionValue/>
</Suppression>
</WriteAction>
</WriteActions>
</Rule>