Failed State

Microsoft.SQLServerAppliance.PDW.Cooling.DeviceStateMonitor.Failed (UnitMonitor)

This monitor detects whether a cooling device is in the Failed state.

Knowledge Base article:

Summary

The cooling device is reporting a value that the vendor considers dangerous to the server. Treat this as a critical situation.

Causes

The cooling device has detected a condition that could permanently damage the system. The system could shut down if this condition is met. The cooling device itself could also be in a failed state and unable to continue cooling the server components.

The status is reported in the component's "device_status" property, which you can view in the PDW Admin Console or by querying the sys.dm_pdw_component_health_status DMV.
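The DMV lookup described above could be sketched roughly as follows. This is a minimal illustration, not a supported procedure: the join assumes the catalog views sys.pdw_health_components and sys.pdw_health_component_properties and the column names shown, which should be verified against your appliance version. The helper name failed_components and the sample rows are hypothetical.

```python
# Assumed query shape: joins the health-status DMV to the component and
# property catalog views so that property_name can be filtered by name.
DEVICE_STATUS_QUERY = """
SELECT s.pdw_node_id, c.component_name, p.property_name,
       s.property_value, s.update_time
FROM sys.dm_pdw_component_health_status AS s
JOIN sys.pdw_health_components AS c
  ON s.component_id = c.component_id
JOIN sys.pdw_health_component_properties AS p
  ON s.property_id = p.property_id AND s.component_id = p.component_id
WHERE p.property_name = 'device_status'
"""


def failed_components(rows):
    """Return the rows whose device_status value is 'Failed' (case-insensitive)."""
    return [
        row for row in rows
        if row["property_name"] == "device_status"
        and str(row["property_value"]).strip().lower() == "failed"
    ]


# Hypothetical result rows, shaped like the columns selected above.
sample_rows = [
    {"pdw_node_id": 201001, "component_name": "Fan 1",
     "property_name": "device_status", "property_value": "Ok"},
    {"pdw_node_id": 201001, "component_name": "Fan 2",
     "property_name": "device_status", "property_value": "FAILED"},
]

for row in failed_components(sample_rows):
    print(row["pdw_node_id"], row["component_name"], row["property_value"])
```

The query text would be run with whatever client you normally use against the appliance; the filter function only post-processes rows that have already been fetched.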

Resolutions

Further triage and resolution require access to the PDW Management Node. Use the manufacturer's installed diagnostic tool there to interrogate the server's cooling devices and temperature sensors and understand the issue. The cause might be a data center air conditioning event or a failed cooling device. Look for other alerts related to cooling devices.

Report the information to Microsoft Support Services to understand the failure and how to bring the device back online or return it to a healthy state. The server vendor might need to replace a cooling device.

Element properties:

Target: Microsoft.SQLServerAppliance.PDW.Cooling.Device
Parent Monitor: System.Health.AvailabilityState
Category: AvailabilityHealth
Enabled: True
Alert Generate: True
Alert Severity: Warning
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.SQLServerAppliance.PDW.ComponentTwoStateType
Remotable: True
Accessibility: Public
Alert Message:
Cooling device status is FAILED

Appliance Name: {0}
Node Name: {1}
Component: {2}
Component Details: https://{3}/Topology/NodeDetails/{4}?compId={5}
RunAs: Default
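The placeholders {0} through {5} in the alert message are filled from AlertParameter1 through AlertParameter6 in the monitor definition, in that order. The sketch below mimics that substitution to show which property lands where. The function name render_alert and all sample values are hypothetical; Operations Manager performs the real substitution itself.

```python
# Alert message text as listed in the element properties.
ALERT_TEMPLATE = (
    "Cooling device status is FAILED\n"
    "\n"
    "Appliance Name: {0}\n"
    "Node Name: {1}\n"
    "Component: {2}\n"
    "Component Details: https://{3}/Topology/NodeDetails/{4}?compId={5}"
)


def render_alert(appliance_id, node_name, display_name,
                 network_address, node_id, component_id):
    """Fill the placeholders in AlertParameter1..6 order:
    ApplianceID, NodeName, DisplayName, ApplianceNetworkAddress, NodeID, ID."""
    return ALERT_TEMPLATE.format(
        appliance_id, node_name, display_name,
        network_address, node_id, component_id,
    )


# Hypothetical values for illustration only.
print(render_alert("PDW01", "PDW01-CMP01", "Fan 1", "10.0.0.5", "201001", "42"))
```

Note that {0}, labeled "Appliance Name" in the message, is bound to the ApplianceID property in the source.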

Source Code:

<UnitMonitor ID="Microsoft.SQLServerAppliance.PDW.Cooling.DeviceStateMonitor.Failed" Accessibility="Public" Enabled="true" Target="PDWLibrary!Microsoft.SQLServerAppliance.PDW.Cooling.Device" ParentMonitorID="Health!System.Health.AvailabilityState" TypeID="Microsoft.SQLServerAppliance.PDW.ComponentTwoStateType" Remotable="true" Priority="Normal" ConfirmDelivery="false">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="Microsoft.SQLServerAppliance.PDW.Cooling.DeviceStateMonitor.Failed.AlertMessage">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Warning</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Property[Type="PDWLibrary!Microsoft.SQLServerAppliance.PDW.Component"]/ApplianceID$</AlertParameter1>
      <AlertParameter2>$Target/Property[Type="PDWLibrary!Microsoft.SQLServerAppliance.PDW.Component"]/NodeName$</AlertParameter2>
      <AlertParameter3>$Target/Property[Type="System!System.Entity"]/DisplayName$</AlertParameter3>
      <AlertParameter4>$Target/Property[Type="PDWLibrary!Microsoft.SQLServerAppliance.PDW.Component"]/ApplianceNetworkAddress$</AlertParameter4>
      <AlertParameter5>$Target/Property[Type="PDWLibrary!Microsoft.SQLServerAppliance.PDW.Component"]/NodeID$</AlertParameter5>
      <AlertParameter6>$Target/Property[Type="PDWLibrary!Microsoft.SQLServerAppliance.PDW.Component"]/ID$</AlertParameter6>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Good" MonitorTypeStateID="Good" HealthState="Success"/>
    <OperationalState ID="Bad" MonitorTypeStateID="Bad" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <SyncTime/>
    <TimeoutSeconds>600</TimeoutSeconds>
    <ConnectionString>$Target/Property[Type="PDWLibrary!Microsoft.SQLServerAppliance.PDW.Component"]/ApplianceID$</ConnectionString>
    <NodeName>$Target/Property[Type="PDWLibrary!Microsoft.SQLServerAppliance.PDW.Component"]/NodeName$</NodeName>
    <GroupName>$Target/Property[Type="PDWLibrary!Microsoft.SQLServerAppliance.PDW.Component"]/GroupName$</GroupName>
    <ComponentName>$Target/Property[Type="System!System.Entity"]/DisplayName$</ComponentName>
    <MonitoredState>Failed</MonitoredState>
  </Configuration>
</UnitMonitor>
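To read the key settings out of a definition like the one above, a short script along these lines may help. It is only a sketch: the embedded XML is a trimmed copy of the monitor that keeps just the elements being read, and the function name summarize_monitor is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the monitor definition: only the parts read below.
MONITOR_XML = """
<UnitMonitor ID="Microsoft.SQLServerAppliance.PDW.Cooling.DeviceStateMonitor.Failed">
  <OperationalStates>
    <OperationalState ID="Good" MonitorTypeStateID="Good" HealthState="Success"/>
    <OperationalState ID="Bad" MonitorTypeStateID="Bad" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <SyncTime/>
    <TimeoutSeconds>600</TimeoutSeconds>
    <MonitoredState>Failed</MonitoredState>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text):
    """Extract polling interval, timeout, monitored state, and state mapping."""
    root = ET.fromstring(xml_text)
    config = root.find("Configuration")
    # Map each monitor-type state to the health state it produces.
    states = {
        state.get("MonitorTypeStateID"): state.get("HealthState")
        for state in root.iter("OperationalState")
    }
    return {
        "interval_minutes": int(config.findtext("IntervalSeconds")) / 60,
        "timeout_minutes": int(config.findtext("TimeoutSeconds")) / 60,
        "monitored_state": config.findtext("MonitoredState"),
        "health_states": states,
    }


print(summarize_monitor(MONITOR_XML))
```

Read this way, the definition polls every 15 minutes with a 10-minute timeout, and the "Bad" state (device status Failed) maps to a Warning health state.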