Memory Heap Usage

Microsoft.HDInsight.UnitMonitor.ResourceManagerMemoryHeapUsed (UnitMonitor)

Monitors heap memory usage of the ResourceManager process.

Knowledge Base article:

Summary

This monitor checks the heap memory usage of the ResourceManager service process. ResourceManager is vital to the YARN subsystem, so its parameters should stay within acceptable ranges at all times.

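Default monitor thresholds: Warning at 85 % and Critical at 95 % of maximum heap memory, as set in the monitor configuration in the source code below.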

HDInsight Appliance

The monitor is active and reports the actual component state.

HDInsight Azure

This monitor is not available in HDInsight clusters on Azure, so the diagnostic and resolution steps below do not apply to that type of environment.

Causes

Under normal conditions, ResourceManager does not have problems with running low on memory. Its memory usage can increase if YARN/MapReduce parameters are changed to non-optimal values and when it runs the maximum number of jobs allowed by the system capacity.

Resolutions

To diagnose the issue:

Connecting remotely to the head node is a two-step operation:

To resolve the issue:

Element properties:

Target: Microsoft.HDInsight.HostComponent.ResourceManager
Parent Monitor: System.Health.PerformanceState
Category: PerformanceHealth
Enabled: True
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.HDInsight.UnitMonitorType.HostComponentThreeStateThreshold
Remotable: True
Accessibility: Public
Alert Message:
ResourceManager is working under high memory pressure.
ResourceManager is using {1} % of its maximum heap memory in the cluster "{0}".
RunAs: Default
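
The {0} and {1} placeholders in the alert message are filled from the alert parameters declared in the monitor's source code: the cluster name and the calculated.memheapusedpercent property. As a rough illustration only, the Python sketch below renders the message in the same way. The helpers heap_used_percent and render_alert are hypothetical, and the used/max formula is an assumption inferred from the message wording, not the management pack's actual calculation.

```python
# Alert text from the element properties; {0} = cluster name, {1} = heap used percent.
ALERT_TEMPLATE = (
    'ResourceManager is using {1} % of its maximum heap memory '
    'in the cluster "{0}".'
)


def heap_used_percent(used_bytes, max_bytes):
    """Return used heap as a percentage of maximum heap (assumed formula)."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return round(100.0 * used_bytes / max_bytes, 1)


def render_alert(cluster_name, used_percent):
    """Substitute the two alert parameters into the message template."""
    return ALERT_TEMPLATE.format(cluster_name, used_percent)


if __name__ == "__main__":
    # Hypothetical values: 900 MB used out of a 1000 MB maximum heap.
    percent = heap_used_percent(900 * 1024 ** 2, 1000 * 1024 ** 2)
    print(render_alert("example-cluster", percent))
```

The cluster name "example-cluster" and the byte counts are made-up inputs for the example.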

Source Code:

<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.ResourceManagerMemoryHeapUsed" TypeID="Microsoft.HDInsight.UnitMonitorType.HostComponentThreeStateThreshold" Target="Microsoft.HDInsight.HostComponent.ResourceManager" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" Accessibility="Public" Enabled="true" ConfirmDelivery="true">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HDInsight.UnitMonitor.ResourceManagerMemoryHeapUsed.AlertMessage">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Host/Property[Type="Microsoft.HDInsight.Host.Private"]/ClusterName$</AlertParameter1>
      <AlertParameter2>$Data/Context/Property[@Name='calculated.memheapusedpercent']$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
    <OperationalState ID="Critical" MonitorTypeStateID="Critical" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <PropertyName>calculated.memheapusedpercent</PropertyName>
    <TheGreaterTheBetter>false</TheGreaterTheBetter>
    <WarningThreshold>85</WarningThreshold>
    <CriticalThreshold>95</CriticalThreshold>
  </Configuration>
</UnitMonitor>
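
The configuration above describes a three-state threshold check on calculated.memheapusedpercent, with lower values being better. The following Python sketch shows one way such a check could map a sampled percentage to the operational states listed in the XML. It is not the implementation of HostComponentThreeStateThreshold, which is not shown in this document. The function name classify_heap_usage is hypothetical, and treating a value exactly equal to a threshold as a breach is an assumption.

```python
# Defaults taken from the <Configuration> element of the monitor.
WARNING_THRESHOLD = 85.0
CRITICAL_THRESHOLD = 95.0

# Operational state ID -> HealthState, as declared in <OperationalStates>.
HEALTH_STATE = {
    "Healthy": "Success",
    "Warning": "Warning",
    "Critical": "Error",
}


def classify_heap_usage(used_percent,
                        warning=WARNING_THRESHOLD,
                        critical=CRITICAL_THRESHOLD):
    """Return the operational state ID for a heap usage percentage.

    Lower is better (TheGreaterTheBetter is false). Assumption: a value
    equal to a threshold counts as having reached that state.
    """
    if warning > critical:
        raise ValueError("warning threshold must not exceed critical")
    if used_percent >= critical:
        return "Critical"
    if used_percent >= warning:
        return "Warning"
    return "Healthy"


if __name__ == "__main__":
    for sample in (40.0, 85.0, 97.5):
        state = classify_heap_usage(sample)
        print(sample, state, HEALTH_STATE[state])
```

Because AlertOnState is Warning and AlertSeverity is MatchMonitorHealth, an alert would be expected for both the Warning and Critical states in this sketch, with severity following the health state.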