Monitors heap memory usage of the NameNode process.
This monitor checks the heap memory usage of the NameNode service process. The alert indicates that NameNode memory usage is above the acceptable threshold, which has a negative impact on cluster performance.
Default monitor thresholds:
Warning: the percentage of heap memory used is between 85% and 95%.
Error: the percentage of heap memory used is 95% or higher.
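The default thresholds can be read as a simple three-state classification. The following is a minimal sketch, not the monitor type's actual implementation: the function name `classify_heap_usage` is hypothetical, and treating the 85% boundary as inclusive is an assumption.

```python
# Default thresholds from this monitor, in percent of heap used.
WARNING_THRESHOLD = 85.0
CRITICAL_THRESHOLD = 95.0


def classify_heap_usage(used_percent, warning=WARNING_THRESHOLD, critical=CRITICAL_THRESHOLD):
    """Map a heap-usage percentage to a health state (illustrative only).

    Assumption: both boundaries are inclusive, so exactly 85 is a
    Warning and exactly 95 is an Error.
    """
    if used_percent >= critical:
        return "Error"
    if used_percent >= warning:
        return "Warning"
    return "Healthy"


if __name__ == "__main__":
    for sample in (60.0, 85.0, 94.9, 95.0):
        print(sample, classify_heap_usage(sample))
```

Lower usage is better for this monitor, so the checks run from the most severe state down to the least severe one.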
The majority of NameNode memory consumption is driven by the HDFS in-memory file system image. The total memory consumed by the file system image is proportional to the number of file blocks. An increase in NameNode memory usage may be caused by the following:
There are too many file blocks in the system (the system is approaching its physical limits).
The default block size is too small (compared to the average file size and the recommended defaults).
The replication factor (default or per file) is set to a high (conservative) value.
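The relationship between block size, block count, and heap usage can be illustrated with a rough estimate. This is a sketch under stated assumptions: the figure of about 150 bytes of heap per file or block object is a commonly quoted rule of thumb rather than a value taken from this monitor, the function names are hypothetical, and directories and replication are not modelled.

```python
# Assumed rule-of-thumb heap cost per namespace object (file or block).
BYTES_PER_OBJECT = 150

MIB = 1024 * 1024
GIB = 1024 * MIB


def estimate_block_count(file_sizes_bytes, block_size_bytes):
    """Count the blocks needed to store the given files.

    Each non-empty file occupies ceil(size / block_size) blocks.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    return sum(-(-size // block_size_bytes) for size in file_sizes_bytes if size > 0)


def estimate_heap_bytes(file_sizes_bytes, block_size_bytes):
    """Rough NameNode heap estimate: one object per file plus one per block."""
    files = len(file_sizes_bytes)
    blocks = estimate_block_count(file_sizes_bytes, block_size_bytes)
    return (files + blocks) * BYTES_PER_OBJECT


if __name__ == "__main__":
    files = [1 * GIB] * 1000  # hypothetical workload: 1,000 files of 1 GiB each
    for block_size in (64 * MIB, 128 * MIB, 256 * MIB):
        blocks = estimate_block_count(files, block_size)
        heap = estimate_heap_bytes(files, block_size)
        print(f"{block_size // MIB} MiB blocks: {blocks} blocks, ~{heap} bytes of heap")
```

For the same data, doubling the block size halves the number of blocks in this model, which is why a block size that is too small for the typical file inflates the file system image.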
To resolve this issue, consider the following options:
Use a larger block size that better matches the typical or average file size in the system (avoid block sizes that are too small).
Apply a lower replication factor (if the specified value is too high or too conservative).
Clean up storage space by removing files that are no longer relevant or necessary.
Restart the NameNode by using the actions available on the Tasks pane (stop and start), and check whether the problem persists.
Contact Microsoft Support and provide the alert name and details. The Microsoft Support team will require administrator access to the HDInsight cluster to determine the root cause of the problem.
Target | Microsoft.HDInsight.HostComponent.NameNode
Parent Monitor | System.Health.PerformanceState
Category | PerformanceHealth
Enabled | True
Alert Generate | True
Alert Severity | MatchMonitorHealth
Alert Priority | Normal
Alert Auto Resolve | True
Monitor Type | Microsoft.HDInsight.UnitMonitorType.HostComponentThreeStateThreshold
Remotable | True
Accessibility | Public
Alert Message |
RunAs | Default
<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.NameNodeMemoryHeapUsed" TypeID="Microsoft.HDInsight.UnitMonitorType.HostComponentThreeStateThreshold" Target="Microsoft.HDInsight.HostComponent.NameNode" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" Accessibility="Public" Enabled="true" ConfirmDelivery="true">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HDInsight.UnitMonitor.NameNodeMemoryHeapUsed.AlertMessage">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Host/Property[Type="Microsoft.HDInsight.Host.Private"]/ClusterName$</AlertParameter1>
      <AlertParameter2>$Data/Context/Property[@Name='calculated.memheapusedpercent']$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Warning" MonitorTypeStateID="Warning" HealthState="Warning"/>
    <OperationalState ID="Critical" MonitorTypeStateID="Critical" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <PropertyName>calculated.memheapusedpercent</PropertyName>
    <TheGreaterTheBetter>false</TheGreaterTheBetter>
    <WarningThreshold>85</WarningThreshold>
    <CriticalThreshold>95</CriticalThreshold>
  </Configuration>
</UnitMonitor>
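When reviewing or auditing a definition like this one, it can help to pull the effective settings out of the `Configuration` element. The following is a minimal sketch using the Python standard library; the helper name `read_monitor_config` and the dictionary keys are hypothetical, and the XML string is a trimmed copy of the `Configuration` element, not the full management pack.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <Configuration> element from the monitor definition.
CONFIG_XML = """
<Configuration>
  <IntervalSeconds>900</IntervalSeconds>
  <TimeoutSeconds>300</TimeoutSeconds>
  <PropertyName>calculated.memheapusedpercent</PropertyName>
  <TheGreaterTheBetter>false</TheGreaterTheBetter>
  <WarningThreshold>85</WarningThreshold>
  <CriticalThreshold>95</CriticalThreshold>
</Configuration>
"""


def read_monitor_config(xml_text):
    """Parse a <Configuration> element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "interval_seconds": int(root.findtext("IntervalSeconds")),
        "timeout_seconds": int(root.findtext("TimeoutSeconds")),
        "property_name": root.findtext("PropertyName"),
        "greater_is_better": root.findtext("TheGreaterTheBetter").strip().lower() == "true",
        "warning_threshold": float(root.findtext("WarningThreshold")),
        "critical_threshold": float(root.findtext("CriticalThreshold")),
    }


if __name__ == "__main__":
    config = read_monitor_config(CONFIG_XML)
    for key, value in config.items():
        print(f"{key}: {value}")
```

Read this way, the definition samples `calculated.memheapusedpercent` every 900 seconds (15 minutes) with a 300-second timeout, and treats lower values as healthier.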