Critical State

Microsoft.SQLServerAppliance.APS.Cluster.NodeStateMonitor.Critical (UnitMonitor)

This monitor detects whether the node component is in a Critical state.

Knowledge Base article:

Summary

The Windows Failover Clustering service is reporting a critical status for this node. The clustered node is down, but the cluster should still be operational.

Address this failure quickly. Subsequent failures in this cluster will cause a system outage.

Causes

The clustered node is down (status: 1).

For more details, view the component's node_state property in the PDW Admin Console or query the sys.dm_pdw_component_health_status DMV.

Resolutions

To diagnose this issue:

1) On the Host node, use a Windows administrator account to view the Cluster Events in the Failover Cluster Manager.

2) Use the sys.dm_pdw_os_event_logs DMV to check for relevant failures in the Windows Event Log for the node.
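The DMV checks named in this article could be sketched roughly as follows. This is a minimal illustration, not part of the management pack: the helper names are hypothetical, the functions only build parameterized T-SQL text and never connect to the appliance, and the column names are assumed from the public DMV documentation, so verify them on your appliance. The event log DMV is written under its documented name, sys.dm_pdw_os_event_logs, and the `?` placeholder assumes a qmark-style driver such as pyodbc.

```python
from typing import Tuple


def build_node_health_query(pdw_node_id: int) -> Tuple[str, Tuple[int]]:
    """Build a query for the component health rows of one node.

    Hypothetical helper; column names are assumptions to verify.
    """
    sql = (
        "SELECT pdw_node_id, component_id, property_id, "
        "property_current_value, update_time "
        "FROM sys.dm_pdw_component_health_status "
        "WHERE pdw_node_id = ? "
        "ORDER BY update_time DESC;"
    )
    return sql, (pdw_node_id,)


def build_event_log_query(pdw_node_id: int) -> Tuple[str, Tuple[int]]:
    """Build a query for the Windows Event Log entries of one node.

    Hypothetical helper; column names are assumptions to verify.
    """
    sql = (
        "SELECT pdw_node_id, log_name, log_source, event_id, event_type, "
        "event_message, generate_time "
        "FROM sys.dm_pdw_os_event_logs "
        "WHERE pdw_node_id = ? "
        "ORDER BY generate_time DESC;"
    )
    return sql, (pdw_node_id,)
```

Each function returns the SQL text and its parameter tuple, which can be passed to a database cursor's `execute` call once a connection to the appliance is available.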

To resolve this issue, contact Microsoft Support and provide the alert name and details. Microsoft Support will help you understand the failure and bring the node back online or into a healthy state.

Element properties:

Target: Microsoft.SQLServerAppliance.APS.Cluster.Node
Parent Monitor: System.Health.AvailabilityState
Category: AvailabilityHealth
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.SQLServerAppliance.APS.ComponentTwoStateType
Remotable: True
Accessibility: Public
Alert Message:
    Node in a cluster has CRITICAL status
    Appliance Name: {0}
    Node Name: {1}
    Component: {2}
    Component Details: https://{3}/Fabric/Health/NodeDetails/{4}?compId={5}
RunAs: Default
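To illustrate how the placeholders in the alert message are filled, here is a minimal Python sketch. It assumes the usual positional mapping in which AlertParameter1 through AlertParameter6 from the source below supply {0} through {5}, so {0} receives ApplianceID, {1} NodeName, {2} DisplayName, {3} ApplianceNetworkAddress, {4} NodeID, and {5} the component ID. The function name and all sample values are hypothetical.

```python
# Alert message template as listed in the element properties.
ALERT_TEMPLATE = (
    "Node in a cluster has CRITICAL status\n"
    "Appliance Name: {0}\n"
    "Node Name: {1}\n"
    "Component: {2}\n"
    "Component Details: https://{3}/Fabric/Health/NodeDetails/{4}?compId={5}"
)


def render_alert(appliance_id, node_name, display_name,
                 network_address, node_id, component_id):
    """Fill the template in AlertParameter1..6 order (hypothetical helper)."""
    return ALERT_TEMPLATE.format(
        appliance_id, node_name, display_name,
        network_address, node_id, component_id,
    )


# Sample values are invented for illustration only.
message = render_alert(
    "APS01", "APS01-HST01", "Cluster Node",
    "aps01.example.test", "201001", "500",
)
print(message)
```

With these sample values, the last line of the message is `Component Details: https://aps01.example.test/Fabric/Health/NodeDetails/201001?compId=500`.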

Source Code:

<UnitMonitor ID="Microsoft.SQLServerAppliance.APS.Cluster.NodeStateMonitor.Critical" Accessibility="Public" Enabled="true" Target="APSLibrary!Microsoft.SQLServerAppliance.APS.Cluster.Node" ParentMonitorID="Health!System.Health.AvailabilityState" TypeID="Microsoft.SQLServerAppliance.APS.ComponentTwoStateType" Remotable="true" Priority="Normal" ConfirmDelivery="false">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="Microsoft.SQLServerAppliance.APS.Cluster.NodeStateMonitor.Critical.AlertMessage">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Property[Type="APSLibrary!Microsoft.SQLServerAppliance.APS.Component"]/ApplianceID$</AlertParameter1>
      <AlertParameter2>$Target/Property[Type="APSLibrary!Microsoft.SQLServerAppliance.APS.Component"]/NodeName$</AlertParameter2>
      <AlertParameter3>$Target/Property[Type="System!System.Entity"]/DisplayName$</AlertParameter3>
      <AlertParameter4>$Target/Property[Type="APSLibrary!Microsoft.SQLServerAppliance.APS.Component"]/ApplianceNetworkAddress$</AlertParameter4>
      <AlertParameter5>$Target/Property[Type="APSLibrary!Microsoft.SQLServerAppliance.APS.Component"]/NodeID$</AlertParameter5>
      <AlertParameter6>$Target/Property[Type="APSLibrary!Microsoft.SQLServerAppliance.APS.Component"]/ID$</AlertParameter6>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Good" MonitorTypeStateID="Good" HealthState="Success"/>
    <OperationalState ID="Bad" MonitorTypeStateID="Bad" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <SyncTime/>
    <TimeoutSeconds>600</TimeoutSeconds>
    <ConnectionString>$Target/Property[Type="APSLibrary!Microsoft.SQLServerAppliance.APS.Component"]/ApplianceTdsAddress$</ConnectionString>
    <NodeName>$Target/Property[Type="APSLibrary!Microsoft.SQLServerAppliance.APS.Component"]/NodeName$</NodeName>
    <GroupName>$Target/Property[Type="APSLibrary!Microsoft.SQLServerAppliance.APS.Component"]/GroupName$</GroupName>
    <ComponentName>$Target/Property[Type="System!System.Entity"]/DisplayName$</ComponentName>
    <MonitoredState>Critical</MonitoredState>
  </Configuration>
</UnitMonitor>
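The two-state behavior configured above can be pictured with the following Python sketch. The internals of ComponentTwoStateType are not shown in this article, so the exact-match comparison against MonitoredState is an assumption, and the function name and the sample state "Ok" are hypothetical. The state-to-health mapping and the AlertOnState value are taken from the source.

```python
from typing import Tuple

MONITORED_STATE = "Critical"   # <MonitoredState> in the configuration
ALERT_ON_STATE = "Error"       # <AlertOnState> in the alert settings

# <OperationalStates>: MonitorTypeStateID -> HealthState
OPERATIONAL_STATES = {"Good": "Success", "Bad": "Error"}


def evaluate(reported_state: str) -> Tuple[str, bool]:
    """Return (health state, whether an alert would be raised).

    Assumption: the monitor type reports "Bad" when the node's reported
    state equals the monitored state, and "Good" otherwise.
    """
    state_id = "Bad" if reported_state == MONITORED_STATE else "Good"
    health = OPERATIONAL_STATES[state_id]
    return health, health == ALERT_ON_STATE


print(evaluate("Critical"))  # ('Error', True)
print(evaluate("Ok"))        # ('Success', False)
```

Under this reading, the check would run every 900 seconds with a 600-second timeout, and because AutoResolve is true the alert closes once the monitor returns to the Success state.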