DataNode Component State

Microsoft.HDInsight.UnitMonitor.DataNodeComponentHealthState (UnitMonitor)

Monitors the health state of the DataNode process.

Knowledge Base article:

Summary

This monitor checks the health state of the DataNode process on the host virtual machine. If the DataNode on a particular host is down, data availability and overall system performance suffer. DataNodes that are down increase the number of under-replicated blocks in the system. If a critical number of DataNodes are down, some data blocks can become completely unavailable to the rest of the system and to end users.
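The following Python sketch models the effect described above. It is illustrative only: the block IDs, host names, and the `classify_blocks` helper are hypothetical, and it assumes a uniform replication factor of 3 (the HDFS default for `dfs.replication`). A block counts as under-replicated when it still has at least one live replica but fewer than the replication factor, and as unavailable when every DataNode holding a replica is down.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical replica placement: block ID -> DataNodes that hold a replica.
Placement = Dict[str, Set[str]]


def classify_blocks(
    placement: Placement, down_nodes: Set[str], replication: int = 3
) -> Tuple[List[str], List[str]]:
    """Split blocks into (under_replicated, unavailable) lists.

    Under-replicated: at least one live replica, but fewer than `replication`.
    Unavailable: no live replica left, so no client can read the block.
    """
    under_replicated: List[str] = []
    unavailable: List[str] = []
    for block_id, holders in sorted(placement.items()):
        live = holders - down_nodes  # replicas on DataNodes that are still up
        if not live:
            unavailable.append(block_id)
        elif len(live) < replication:
            under_replicated.append(block_id)
    return under_replicated, unavailable


if __name__ == "__main__":
    placement = {
        "blk_1": {"dn1", "dn2", "dn3"},
        "blk_2": {"dn2", "dn3", "dn4"},
        "blk_3": {"dn1", "dn2", "dn4"},
    }
    # One DataNode down: every block it hosted loses a replica.
    print(classify_blocks(placement, {"dn2"}))
    # prints (['blk_1', 'blk_2', 'blk_3'], [])

    # Three DataNodes down: blk_1 has no live replica left.
    print(classify_blocks(placement, {"dn1", "dn2", "dn3"}))
    # prints (['blk_2', 'blk_3'], ['blk_1'])
```

With a single DataNode down, data stays readable but redundancy drops; only when all holders of a block fail does that block become unavailable, which is why the number of DataNodes down matters.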

HDInsight Appliance

The monitor is active and reports the actual component state.

HDInsight Azure

This monitor is not available in HDInsight clusters on Azure, so the diagnostic and resolution steps below do not apply to that environment.

Causes

The DataNode service can go offline for various reasons:

Resolutions

If the DataNode was not stopped on purpose, use the following steps to diagnose the issue:

Connecting remotely to the virtual machine that hosts the failed DataNode is a two-step operation:

To resolve the issue:

Element properties:

Target: Microsoft.HDInsight.HostComponent.DataNode
Parent Monitor: System.Health.AvailabilityState
Category: AvailabilityHealth
Enabled: True
Alert Generate: False
Alert Auto Resolve: True
Monitor Type: Microsoft.HDInsight.UnitMonitorType.HostComponentHealthState
Remotable: True
Accessibility: Public
RunAs: Default

Source Code:

<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.DataNodeComponentHealthState" TypeID="Microsoft.HDInsight.UnitMonitorType.HostComponentHealthState" Target="Microsoft.HDInsight.HostComponent.DataNode" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" Accessibility="Public" Enabled="true" ConfirmDelivery="true">
  <Category>AvailabilityHealth</Category>
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Unhealthy" MonitorTypeStateID="Unhealthy" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
  </Configuration>
</UnitMonitor>
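To make the definition above concrete, the following Python sketch reads an abridged copy of it with the standard-library `xml.etree.ElementTree` module and summarizes how the monitor behaves. The `summarize_monitor` helper is hypothetical; it is not part of the management pack or of any Operations Manager API and only inspects the XML text. It shows that the `Unhealthy` monitor-type state maps to a Warning health state (not Critical), and that the check runs every 900 seconds (15 minutes) with a 300-second (5-minute) timeout.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the UnitMonitor definition above (root attributes trimmed
# to the ones this sketch reads; child elements copied as-is).
MONITOR_XML = """\
<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.DataNodeComponentHealthState"
             Target="Microsoft.HDInsight.HostComponent.DataNode" Enabled="true">
  <Category>AvailabilityHealth</Category>
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Unhealthy" MonitorTypeStateID="Unhealthy" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text: str) -> dict:
    """Extract the target, state mapping, and schedule from a UnitMonitor element."""
    root = ET.fromstring(xml_text)
    # Map each monitor-type state to the health state it produces in the rollup.
    states = {
        s.get("MonitorTypeStateID"): s.get("HealthState")
        for s in root.iter("OperationalState")
    }
    config = root.find("Configuration")
    interval = int(config.findtext("IntervalSeconds"))
    timeout = int(config.findtext("TimeoutSeconds"))
    return {
        "target": root.get("Target"),
        "states": states,
        "interval_minutes": interval / 60,
        "timeout_minutes": timeout / 60,
    }


if __name__ == "__main__":
    print(summarize_monitor(MONITOR_XML))
```

Combined with Alert Generate set to False in the element properties, this means a stopped DataNode turns the monitor to Warning in the availability rollup but, with default settings, raises no alert.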