Monitors heap memory usage of the ResourceManager process.
This monitor checks the memory usage of the ResourceManager service process. ResourceManager is vital to the YARN subsystem, so it is highly desirable to keep its parameters within acceptable ranges at all times.
Default monitor thresholds:
Warning: when the percentage of memory used is between 85% and 95%.
Error: when the percentage of memory used is 95% or higher.
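The default thresholds can be read as a simple three-state classification. The following is a minimal sketch, not the monitor's actual implementation: the function and enum names are hypothetical, and it assumes the lower bound of each range is inclusive, which this page does not state explicitly for the 85% boundary.

```python
from enum import Enum


class HealthState(Enum):
    HEALTHY = "Healthy"
    WARNING = "Warning"
    CRITICAL = "Critical"


# Default thresholds from this page, expressed as percent of heap used.
WARNING_THRESHOLD = 85.0
CRITICAL_THRESHOLD = 95.0


def classify_heap_usage(used_percent: float,
                        warning: float = WARNING_THRESHOLD,
                        critical: float = CRITICAL_THRESHOLD) -> HealthState:
    """Map a heap-used percentage to a three-state health value.

    Assumption: each threshold is inclusive (a value equal to the
    threshold already counts as the worse state).
    """
    if used_percent >= critical:
        return HealthState.CRITICAL
    if used_percent >= warning:
        return HealthState.WARNING
    return HealthState.HEALTHY
```

For example, under these assumptions a reading of 90 falls into the Warning state and a reading of 95 into the Critical state.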
HDInsight Appliance
The monitor is active and reports the actual component state.
HDInsight Azure
This monitor is not available in HDInsight clusters on Azure, so the diagnostic and resolution steps below do not apply to this type of environment.
Normally, ResourceManager does not run low on memory. Its memory usage can increase if YARN/MapReduce parameters are changed to non-optimal values, or when it runs the maximum number of jobs allowed by the system capacity.
To diagnose the issue:
Remotely connect to the head node and check the Hadoop configuration. Review the configuration parameters in the files yarn-site.xml and mapred-site.xml, which are located at <OS disk>:\hadoop\hadoop-<HDP version>\etc\hadoop.
Connecting remotely to the head node is a two-step operation:
Use Remote Desktop Connection to log in to the secure node of the HDInsight cluster.
Use another Remote Desktop Connection from the secure node to connect to the head node virtual machine.
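Once connected to the head node, reviewing the configuration files is easier if the parameters are listed as name/value pairs. The sketch below assumes the standard Hadoop site-file layout (a `<configuration>` root containing `<property>` elements with `<name>` and `<value>` children). The helper name `read_hadoop_properties` and the sample content are illustrative only and are not taken from an actual cluster.

```python
import xml.etree.ElementTree as ET


def read_hadoop_properties(xml_text: str) -> dict:
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing <value> element is treated as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Illustrative content only; the real files on the head node will differ.
SAMPLE = """<configuration>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in sorted(read_hadoop_properties(SAMPLE).items()):
        print(f"{name} = {value}")
```

To inspect a real file, read its contents from the configuration directory named above and pass the text to the same function, or use `ET.parse(path).getroot()` in place of `ET.fromstring`.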
To resolve the issue:
If high ResourceManager memory usage is caused by an inappropriate Hadoop configuration, use different configuration values or revert yarn-site.xml to factory settings. Restart ResourceManager after changing the configuration.
If you are not able to resolve the issue, contact the Microsoft Support team and provide them with the alert name and details. Be aware that diagnostic actions may require administrator permissions on the HDInsight cluster.
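Before changing or reverting values, it can help to see exactly which parameters differ from a known-good configuration. The following sketch compares two sets of name/value pairs. It assumes you have a baseline copy of the settings to compare against, which this page does not describe how to obtain, and the function name and sample values are hypothetical.

```python
def diff_properties(current: dict, baseline: dict) -> dict:
    """Return {name: (baseline_value, current_value)} for every property that differs.

    A value of None means the property is absent on that side.
    """
    changed = {}
    for name in sorted(set(current) | set(baseline)):
        old, new = baseline.get(name), current.get(name)
        if old != new:
            changed[name] = (old, new)
    return changed


if __name__ == "__main__":
    # Illustrative values only, not recommended settings.
    baseline = {
        "yarn.scheduler.maximum-allocation-mb": "8192",
        "yarn.resourcemanager.max-completed-applications": "10000",
    }
    current = {
        "yarn.scheduler.maximum-allocation-mb": "8192",
        "yarn.resourcemanager.max-completed-applications": "500000",
    }
    for name, (old, new) in diff_properties(current, baseline).items():
        print(f"{name}: baseline={old} current={new}")
```

Only the properties that differ are reported, which narrows the review to the values that were actually changed.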
Target | Microsoft.HDInsight.HostComponent.ResourceManager
Parent Monitor | System.Health.PerformanceState
Category | PerformanceHealth
Enabled | True
Alert Generate | True
Alert Severity | MatchMonitorHealth
Alert Priority | Normal
Alert Auto Resolve | True
Monitor Type | Microsoft.HDInsight.UnitMonitorType.HostComponentThreeStateThreshold
Remotable | True
Accessibility | Public
Alert Message |
RunAs | Default
<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.ResourceManagerMemoryHeapUsed" TypeID="Microsoft.HDInsight.UnitMonitorType.HostComponentThreeStateThreshold" Target="Microsoft.HDInsight.HostComponent.ResourceManager" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" Accessibility="Public" Enabled="true" ConfirmDelivery="true">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HDInsight.UnitMonitor.ResourceManagerMemoryHeapUsed.AlertMessage">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Host/Property[Type="Microsoft.HDInsight.Host.Private"]/ClusterName$</AlertParameter1>
      <AlertParameter2>$Data/Context/Property[@Name='calculated.memheapusedpercent']$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
    <OperationalState ID="Critical" MonitorTypeStateID="Critical" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <PropertyName>calculated.memheapusedpercent</PropertyName>
    <TheGreaterTheBetter>false</TheGreaterTheBetter>
    <WarningThreshold>85</WarningThreshold>
    <CriticalThreshold>95</CriticalThreshold>
  </Configuration>
</UnitMonitor>
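The Configuration element of the definition above carries the values that drive the monitor: the sampled property, the polling interval, the timeout, and the two thresholds. The sketch below pulls those values out of the XML and converts the intervals to minutes. The function name and dictionary keys are hypothetical, and reading `TheGreaterTheBetter` as "higher values are healthier" is an assumption based on the element name.

```python
import xml.etree.ElementTree as ET

# The <Configuration> element copied from the monitor definition above.
CONFIGURATION = """<Configuration>
  <IntervalSeconds>900</IntervalSeconds>
  <TimeoutSeconds>300</TimeoutSeconds>
  <PropertyName>calculated.memheapusedpercent</PropertyName>
  <TheGreaterTheBetter>false</TheGreaterTheBetter>
  <WarningThreshold>85</WarningThreshold>
  <CriticalThreshold>95</CriticalThreshold>
</Configuration>"""


def summarize_configuration(xml_text: str) -> dict:
    """Extract the monitor settings from a <Configuration> element."""
    cfg = ET.fromstring(xml_text)
    return {
        "property": cfg.findtext("PropertyName"),
        "interval_minutes": int(cfg.findtext("IntervalSeconds")) / 60,
        "timeout_minutes": int(cfg.findtext("TimeoutSeconds")) / 60,
        # Assumption: "false" means a higher reading is worse.
        "higher_is_better": cfg.findtext("TheGreaterTheBetter") == "true",
        "warning_percent": float(cfg.findtext("WarningThreshold")),
        "critical_percent": float(cfg.findtext("CriticalThreshold")),
    }


if __name__ == "__main__":
    for key, value in summarize_configuration(CONFIGURATION).items():
        print(f"{key}: {value}")
```

With the values shown, the property is sampled every 15 minutes with a 5-minute timeout, and the thresholds match the defaults listed at the top of this page.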