NameNode Component State

Microsoft.HDInsight.UnitMonitor.NameNodeComponentHealthState (UnitMonitor)

Monitors the health state of the NameNode process.

Knowledge Base article:

Summary

This monitor checks the health state of the NameNode process on the head node. NameNode is vital to the HDFS service: when it fails, the entire file system appears unavailable to end users, who can neither load new data nor read existing files from the cluster. It is also a single point of failure for the entire cluster. If it is down, all dependent services (including JobTracker) keep trying to connect for a certain period (15 minutes by default) and go offline if the connection does not succeed.

HDInsight Appliance

The monitor is active and reports the actual component state.

HDInsight Azure

This monitor is not available in HDInsight clusters on Azure, so the diagnostic and resolution steps below do not apply to this type of environment.

Causes

If NameNode is not running, this may indicate problems with the service itself or with the infrastructure hosting the cluster (host virtual machine, storage, or networking).

It can also be taken offline as part of a maintenance action performed by the HDInsight cluster administrator.

Resolutions

If NameNode was not stopped on purpose, review the component logs to diagnose the issue. Log files can be accessed using one of the following approaches:

Connecting remotely to the head node is a two-step operation:

To resolve the issue:

Element properties:

Target: Microsoft.HDInsight.HostComponent.NameNode
Parent Monitor: System.Health.AvailabilityState
Category: AvailabilityHealth
Enabled: True
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.HDInsight.UnitMonitorType.HostComponentHealthState
Remotable: True
Accessibility: Public
Alert Message:
NameNode is not running.
NameNode hosted on "{0}" in the cluster "{1}" reports "{2}" state.
RunAs: Default
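
The placeholders {0}, {1}, and {2} in the alert message are filled, in order, from AlertParameter1 through AlertParameter3 in the source code of this monitor: the host name, the cluster name, and the state reported for the component. As a rough sketch of how such a message is usually declared in a management pack, the fragment below pairs the AlertMessage string resource ID with its display strings. The enclosing LanguagePack attributes are assumptions for illustration; only the resource ID and the two strings come from this page.

```xml
<!-- Sketch only: the LanguagePack ID and IsDefault values are assumed, not taken from the HDInsight management pack. -->
<Presentation>
  <StringResources>
    <!-- Resource ID referenced by the AlertSettings element of the monitor -->
    <StringResource ID="Microsoft.HDInsight.UnitMonitor.NameNodeComponentHealthState.AlertMessage"/>
  </StringResources>
</Presentation>
<LanguagePacks>
  <LanguagePack ID="ENU" IsDefault="true">
    <DisplayStrings>
      <DisplayString ElementID="Microsoft.HDInsight.UnitMonitor.NameNodeComponentHealthState.AlertMessage">
        <!-- Alert name -->
        <Name>NameNode is not running.</Name>
        <!-- {0} = AlertParameter1 (HostName), {1} = AlertParameter2 (ClusterName), {2} = AlertParameter3 (reported state) -->
        <Description>NameNode hosted on "{0}" in the cluster "{1}" reports "{2}" state.</Description>
      </DisplayString>
    </DisplayStrings>
  </LanguagePack>
</LanguagePacks>
```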

Source Code:

<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.NameNodeComponentHealthState" TypeID="Microsoft.HDInsight.UnitMonitorType.HostComponentHealthState" Target="Microsoft.HDInsight.HostComponent.NameNode" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" Accessibility="Public" Enabled="true" ConfirmDelivery="true">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HDInsight.UnitMonitor.NameNodeComponentHealthState.AlertMessage">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Host/Property[Type="Microsoft.HDInsight.Host.Private"]/HostName$</AlertParameter1>
      <AlertParameter2>$Target/Host/Host/Property[Type="Microsoft.HDInsight.Host.Private"]/ClusterName$</AlertParameter2>
      <AlertParameter3>$Data/Context/Property[@Name='$Target/Property[Type="Microsoft.HDInsight.HostComponent"]/ComponentName$']$</AlertParameter3>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Unhealthy" MonitorTypeStateID="Unhealthy" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
  </Configuration>
</UnitMonitor>
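
The Configuration element above sets a 900-second polling interval and a 300-second timeout. If a shorter interval is needed, one possible approach is a monitor configuration override kept in a separate, unsealed management pack. The fragment below is a minimal sketch under two assumptions: that the monitor type exposes IntervalSeconds as an overridable parameter (not confirmed by this page), and that the HDInsight management pack is referenced under the hypothetical alias "HDI". The override ID is likewise invented for illustration.

```xml
<!-- Sketch only: the "HDI" alias and the override ID are hypothetical, and
     IntervalSeconds is assumed to be declared overridable by the monitor type. -->
<Overrides>
  <MonitorConfigurationOverride ID="Custom.HDInsight.Override.NameNodeComponentHealthState.IntervalSeconds"
                                Context="HDI!Microsoft.HDInsight.HostComponent.NameNode"
                                Enforced="false"
                                Monitor="HDI!Microsoft.HDInsight.UnitMonitor.NameNodeComponentHealthState"
                                Parameter="IntervalSeconds">
    <!-- Poll every 5 minutes instead of the default 900 seconds -->
    <Value>300</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

A shorter interval detects a NameNode outage sooner but runs the health check more often, so the value is a trade-off and not a recommendation.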