Memory Heap Usage

Microsoft.HDInsight.UnitMonitor.NameNodeMemoryHeapUsed (UnitMonitor)

Monitors the heap memory usage of the NameNode process.

Knowledge Base article:

Summary

This monitor checks the heap memory usage of the NameNode service process. The alert indicates that NameNode heap usage is above the acceptable threshold, which degrades cluster performance.

Default monitor thresholds:

Warning: 85% of maximum heap memory used
Critical: 95% of maximum heap memory used

Causes

Most NameNode memory is consumed by the in-memory HDFS file system image. The total memory consumed by the file system image is proportional to the number of file blocks. An increase in the memory used by the NameNode may be caused by the following:

Resolutions

To resolve this issue, consider the following options:

Contact Microsoft Support and provide the alert name and details. The Microsoft Support team will require administrator access to the HDInsight cluster to determine the root cause of the problem.

Element properties:

Target: Microsoft.HDInsight.HostComponent.NameNode
Parent Monitor: System.Health.PerformanceState
Category: PerformanceHealth
Enabled: True
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.HDInsight.UnitMonitorType.HostComponentThreeStateThreshold
Remotable: True
Accessibility: Public
Alert Message:
NameNode is working under high memory pressure.
NameNode is using {1} % of its maximum heap memory in the cluster "{0}".
RunAs: Default

Source Code:

<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.NameNodeMemoryHeapUsed"
             TypeID="Microsoft.HDInsight.UnitMonitorType.HostComponentThreeStateThreshold"
             Target="Microsoft.HDInsight.HostComponent.NameNode"
             ParentMonitorID="Health!System.Health.PerformanceState"
             Remotable="true" Priority="Normal" Accessibility="Public" Enabled="true" ConfirmDelivery="true">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HDInsight.UnitMonitor.NameNodeMemoryHeapUsed.AlertMessage">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Host/Property[Type="Microsoft.HDInsight.Host.Private"]/ClusterName$</AlertParameter1>
      <AlertParameter2>$Data/Context/Property[@Name='calculated.memheapusedpercent']$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
    <OperationalState ID="Critical" MonitorTypeStateID="Critical" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <PropertyName>calculated.memheapusedpercent</PropertyName>
    <TheGreaterTheBetter>false</TheGreaterTheBetter>
    <WarningThreshold>85</WarningThreshold>
    <CriticalThreshold>95</CriticalThreshold>
  </Configuration>
</UnitMonitor>
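
Example override (illustrative):

The thresholds in the Configuration element of this monitor can, in principle, be tuned from a separate unsealed management pack rather than by editing the monitor itself. The fragment below is a minimal sketch of such an override. It assumes that the monitor type exposes WarningThreshold as an overridable parameter, which this page does not confirm. The override ID, the reference alias "HDI" and the value 80 are hypothetical.

```xml
<!-- Sketch only: belongs in the Monitoring section of a custom, unsealed
     management pack that references the HDInsight management pack.
     "HDI" is a hypothetical reference alias; the override ID and the
     value 80 are illustrative. Assumes WarningThreshold is overridable. -->
<Overrides>
  <MonitorConfigurationOverride
      ID="Custom.HDInsight.Override.NameNodeMemoryHeapUsed.WarningThreshold"
      Context="HDI!Microsoft.HDInsight.HostComponent.NameNode"
      Enforced="false"
      Monitor="HDI!Microsoft.HDInsight.UnitMonitor.NameNodeMemoryHeapUsed"
      Parameter="WarningThreshold">
    <Value>80</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

The same kind of override is usually created from the Operations Manager console, which writes an equivalent element into the chosen unsealed management pack.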