Monitors the health state of the NameNode process.
This monitor checks the health state of the NameNode process on the head node. NameNode is vital to the HDFS service, and its failure makes the entire file system appear unavailable to end users: they can neither load new data nor read existing files from the cluster. NameNode is also a single point of failure for the entire cluster: if it is down, all dependent services (including JobTracker) keep trying to connect for a limited period (15 minutes by default) and go offline if the connection does not succeed.
HDInsight Appliance
The monitor is active and reports the actual component state.
HDInsight Azure
This monitor is not available in HDInsight clusters on Azure, so the diagnostic and resolution steps below do not apply to this type of environment.
If NameNode is not running, that may indicate problems with the service itself or with the cluster hosting infrastructure (host virtual machine, storage, or networking).
It can also be taken offline as part of a maintenance action performed by the HDInsight cluster administrator.
If NameNode was not stopped on purpose, review the component logs to diagnose the issue. Log files can be accessed using one of the following approaches:
Browse the NameNode web interface by logging in to the Developer Dashboard (https://<secure node>:81) and choosing the "Namenode Logs" option. Alternatively, access the following address directly: https://<secure node>/namenode/logs.
Remotely connect to the head node virtual machine. NameNode log files are located at <OS disk>:\hadoop\hadoop-<HDP version>\logs (for example, C:\hadoop\hadoop-1.2.0.1.3.0.0-0514\logs).
Connecting remotely to the head node is a two-step operation:
Use Remote Desktop Connection to log in to the secure node of the HDInsight cluster.
Use another Remote Desktop Connection from the secure node to connect to the head node virtual machine.
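When scripting diagnostics, the two log locations described above can be assembled from their parts. The following is a minimal sketch only: the function names and the host name `secure-node.example` are hypothetical and chosen for illustration, while the URL and directory patterns are taken from the text above.

```python
from pathlib import PureWindowsPath


def developer_dashboard_url(secure_node: str) -> str:
    """Developer Dashboard address on the secure node (port 81, per the text)."""
    return f"https://{secure_node}:81"


def namenode_log_url(secure_node: str) -> str:
    """Direct address of the NameNode logs served through the secure node."""
    return f"https://{secure_node}/namenode/logs"


def namenode_log_dir(os_disk: str, hdp_version: str) -> PureWindowsPath:
    """Log directory on the head node: <OS disk>:\\hadoop\\hadoop-<HDP version>\\logs."""
    return PureWindowsPath(f"{os_disk}:\\") / "hadoop" / f"hadoop-{hdp_version}" / "logs"


if __name__ == "__main__":
    # "secure-node.example" is a placeholder host name, not a real cluster address.
    print(developer_dashboard_url("secure-node.example"))
    print(namenode_log_url("secure-node.example"))
    print(namenode_log_dir("C", "1.2.0.1.3.0.0-0514"))
```

`PureWindowsPath` is used so that the path is formatted with Windows separators even when the script runs on another operating system.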
To resolve the issue:
Based on the findings from the diagnostic step, fix the problems that caused NameNode to fail, then start it again using the Start HDInsight Host Component action available on the Tasks pane.
If the procedure above does not solve the issue, contact the Microsoft Support team and provide them with the alert name and details. Be aware that the diagnostic actions may require administrator permissions for the HDInsight cluster.
Target | Microsoft.HDInsight.HostComponent.NameNode
Parent Monitor | System.Health.AvailabilityState
Category | AvailabilityHealth
Enabled | True
Alert Generate | True
Alert Severity | MatchMonitorHealth
Alert Priority | Normal
Alert Auto Resolve | True
Monitor Type | Microsoft.HDInsight.UnitMonitorType.HostComponentHealthState
Remotable | True
Accessibility | Public
Alert Message |
RunAs | Default
<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.NameNodeComponentHealthState" TypeID="Microsoft.HDInsight.UnitMonitorType.HostComponentHealthState" Target="Microsoft.HDInsight.HostComponent.NameNode" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" Accessibility="Public" Enabled="true" ConfirmDelivery="true">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HDInsight.UnitMonitor.NameNodeComponentHealthState.AlertMessage">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Host/Property[Type="Microsoft.HDInsight.Host.Private"]/HostName$</AlertParameter1>
      <AlertParameter2>$Target/Host/Host/Property[Type="Microsoft.HDInsight.Host.Private"]/ClusterName$</AlertParameter2>
      <AlertParameter3>$Data/Context/Property[@Name='$Target/Property[Type="Microsoft.HDInsight.HostComponent"]/ComponentName$']$</AlertParameter3>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Unhealthy" MonitorTypeStateID="Unhealthy" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
  </Configuration>
</UnitMonitor>
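To read the polling settings and state mapping out of a definition like the one above without scanning the XML by hand, a short script can help. This is a sketch under stated assumptions: `MONITOR_XML` is a trimmed copy of the definition (alert settings omitted for brevity), and `summarize_monitor` is a hypothetical helper name, not part of any management pack tooling.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the monitor definition; alert settings are left out for brevity.
MONITOR_XML = """
<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.NameNodeComponentHealthState" Enabled="true">
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Unhealthy" MonitorTypeStateID="Unhealthy" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text: str) -> dict:
    """Extract the monitor ID, polling settings, and state mapping."""
    root = ET.fromstring(xml_text.strip())
    config = root.find("Configuration")
    # Map each operational state ID to the health state it reports.
    states = {
        state.get("ID"): state.get("HealthState")
        for state in root.iter("OperationalState")
    }
    return {
        "id": root.get("ID"),
        "interval_seconds": int(config.findtext("IntervalSeconds")),
        "timeout_seconds": int(config.findtext("TimeoutSeconds")),
        "states": states,
    }


if __name__ == "__main__":
    summary = summarize_monitor(MONITOR_XML)
    print(summary["id"])
    print("interval (minutes):", summary["interval_seconds"] / 60)
    print("timeout (minutes):", summary["timeout_seconds"] / 60)
    print(summary["states"])
```

With the values shown in the definition, the interval of 900 seconds corresponds to 15 minutes and the timeout of 300 seconds to 5 minutes.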