Memory-Optimized Data Stale Checkpoint File Pairs Ratio

Microsoft.SQLServer.Linux.Monitor.DBFilegroupFx.StaleCFPs (UnitMonitor)

The monitor reports a warning state and raises an alert when the ratio of stale checkpoint file pairs (CFPs) in a memory-optimized data filegroup exceeds the configured threshold (60 by default).
Alerts are raised only if the corresponding database is reasonably large (300 or more checkpoint file pairs in total).

Knowledge Base article:

Summary

The monitor reports a warning state and raises an alert when the portion of stale checkpoint file pairs in a memory-optimized data filegroup exceeds the configured threshold (60 by default).

Alerts are raised only if the corresponding database is reasonably large (300 or more checkpoint file pairs in total).

Stale CFPs are checkpoint file pairs that remain in the system while waiting for log truncation or garbage collection.

Causes

CFP merges are started according to an internal merge policy. Please refer to this article for details.

Resolutions

Perform a manual merge and force garbage collection as described in these articles:

Overridable Parameters

| Name | Description | Default Value |
|---|---|---|
| Alert Priority | Defines Alert Priority. | Normal |
| Alert Severity | Defines Alert Severity. | Warning |
| Checkpoint File Pairs Threshold | An alert is generated only if the total Checkpoint File Pairs count is greater than or equal to the Checkpoint File Pairs Threshold. | 300 |
| Enabled | Enables or disables the workflow. | Yes |
| Generates Alerts | Defines whether the workflow generates an Alert. | Yes |
| Interval (seconds) | The recurring interval of time in seconds in which to run the workflow. | 300 |
| Number of samples | Indicates how many times a measured value should breach the thresholds before the state is changed. | 6 |
| Synchronization Time | The synchronization time specified by using a 24-hour format. May be omitted. | |
| Threshold | The collected ratio will be compared against this parameter. | 60 |
| Timeout (seconds) | Specifies the time the workflow is allowed to run before being closed and marked as failed. | 200 |
| Timeout for query execution (seconds) | The workflow will fail and register an event if the query execution takes longer than the specified period. | 60 |
| Timeout for database connection (seconds) | The workflow will fail and register an event if it cannot access the database during the specified period. | 15 |
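
As an illustration of how one of the parameters above might be changed, the following is a minimal sketch of a management pack override that raises the ratio threshold. It is an assumption-laden example rather than the product's documented procedure: the override ID, the `SqlMonL` alias for the management pack that defines the monitor, and the value 75 are hypothetical, and the parameter ID is assumed to be `Threshold` because that is the name of the configuration element in the monitor definition. The `SqlDiscL` alias is the one used in the monitor's own source.

```xml
<!-- Sketch only: belongs under Monitoring/Overrides in a custom, unsealed management pack. -->
<Overrides>
  <!-- The ID is a hypothetical name; "SqlMonL" is an assumed reference alias
       for the management pack that contains the monitor. -->
  <MonitorConfigurationOverride ID="Custom.Override.StaleCFPs.Threshold"
      Context="SqlDiscL!Microsoft.SQLServer.Linux.DBFilegroupFx"
      Enforced="false"
      Monitor="SqlMonL!Microsoft.SQLServer.Linux.Monitor.DBFilegroupFx.StaleCFPs"
      Parameter="Threshold">
    <!-- Assumed example value; the default is 60. -->
    <Value>75</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

The same pattern would presumably apply to the other configuration parameters (for example the CFP count threshold or the number of samples), provided the parameter IDs exposed by the monitor type match the names assumed here. In practice such overrides are usually created from the console rather than written by hand.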

Element properties:

Target: Microsoft.SQLServer.Linux.DBFilegroupFx
Parent Monitor: System.Health.PerformanceState
Category: PerformanceHealth
Enabled: True
Alert Generate: True
Alert Severity: Warning
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.SQLServer.Linux.MonitorType.DBFilegroupFx.StaleCFPs
Remotable: True
Accessibility: Public
Alert Message:
    MSSQL on Linux: The portion of stale CFPs is above the thresholds
    The ratio of stale checkpoint file pairs to total number of CFPs reached the thresholds.
    Server: {0}
    SQL Server Instance: {1}
    Database: {2}
    Filegroup: {3}
    Stale Checkpoint File Pairs Ratio: {4}
RunAs: Default

Source Code:

<UnitMonitor ID="Microsoft.SQLServer.Linux.Monitor.DBFilegroupFx.StaleCFPs" Target="SqlDiscL!Microsoft.SQLServer.Linux.DBFilegroupFx" ParentMonitorID="Health!System.Health.PerformanceState" TypeID="Microsoft.SQLServer.Linux.MonitorType.DBFilegroupFx.StaleCFPs" Accessibility="Public" Enabled="true" Remotable="true" Priority="Normal" ConfirmDelivery="true">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.SQLServer.Linux.Monitor.DBFilegroupFx.StaleCFPs.AlertMessage">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Warning</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/MachineName$</AlertParameter1>
      <AlertParameter2>$Target/Host/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/InstanceName$</AlertParameter2>
      <AlertParameter3>$Target/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.Database"]/DatabaseName$</AlertParameter3>
      <AlertParameter4>$Target/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.Filegroup"]/GroupName$</AlertParameter4>
      <AlertParameter5>$Data/Context/SampleValue$</AlertParameter5>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="ErrorState" MonitorTypeStateID="ErrorState" HealthState="Warning"/>
    <OperationalState ID="SuccessState" MonitorTypeStateID="SuccessState" HealthState="Success"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>300</IntervalSeconds>
    <SyncTime/>
    <MachineName>$Target/Host/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/MachineName$</MachineName>
    <NetbiosComputerName>$Target/Host/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/NetbiosComputerName$</NetbiosComputerName>
    <InstanceName>$Target/Host/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/InstanceName$</InstanceName>
    <ConnectionString>$Target/Host/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/ConnectionString$</ConnectionString>
    <InstanceVersion>$Target/Host/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/Version$</InstanceVersion>
    <InstanceEdition>$Target/Host/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.DBEngine"]/Edition$</InstanceEdition>
    <DatabaseName>$Target/Host/Property[Type="SqlCoreLib!Microsoft.SQLServer.Core.Database"]/DatabaseName$</DatabaseName>
    <Threshold>60</Threshold>
    <CfpCountThreshold>300</CfpCountThreshold>
    <SqlExecTimeoutSeconds>60</SqlExecTimeoutSeconds>
    <SqlTimeoutSeconds>15</SqlTimeoutSeconds>
    <TimeoutSeconds>200</TimeoutSeconds>
    <NumSamples>6</NumSamples>
  </Configuration>
</UnitMonitor>