Monitors the health state of the History Server process.
This monitor checks the health state of the History Server process on the host. History Server keeps track of executed jobs that JobTracker no longer handles (jobs that JobTracker has marked as "retired"). When History Server is not running, the cluster administrator cannot access details of executed jobs through the "Job Tracker History" link in the Map Reduce Web UI.
HDInsight Appliance
Monitor is active and reports actual component state.
HDInsight Azure
This monitor is not available in HDInsight clusters on Azure, so the diagnostic and resolution steps below do not apply to this type of environment.
The History Server service may be offline for various reasons:
A maintenance action on the service is in progress, performed by the HDInsight cluster administrator.
There are failures in the physical or virtual cluster infrastructure (fabric layer, i.e. head node) that hosts the History Server component.
If History Server was not stopped on purpose, use the following steps to diagnose the issue:
Review the History Server logs. The log files can be accessed using one of the following approaches:
Browse the JobTracker web interface by logging in to the Developer Dashboard (https://<secure node>:81) and choosing the "Log directory" option. Alternatively, access the following address directly: https://<secure node>/jobtracker/logs
Remotely connect to the head node virtual machine. History Server log files are located at <OS disk>:\hadoop\hadoop-<HDP version>\logs. An example of this path is C:\hadoop\hadoop-1.2.0.1.3.0.0-0514\logs
Connecting remotely to the head node is a two-step operation:
Use Remote Desktop Connection to log in to the secure node of the HDInsight cluster.
Use another Remote Desktop Connection from the secure node to connect to the head node virtual machine.
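Once connected to the head node, the log review can be partly automated. The following is a minimal sketch rather than part of the management pack: it assumes the logs are plain-text files matching `*.log` and that problems are flagged with log4j-style `ERROR` or `FATAL` tokens. The function name `find_problem_lines` and its parameters are hypothetical.

```python
from pathlib import Path
import re

# Assumption: severity appears as a standalone ERROR or FATAL token,
# as is typical for log4j-style Hadoop logs.
SEVERITY = re.compile(r"\b(ERROR|FATAL)\b")


def find_problem_lines(log_dir, pattern="*.log", limit=50):
    """Return up to `limit` (file name, line number, text) tuples
    for lines that contain an ERROR or FATAL token."""
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a file has odd bytes.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if SEVERITY.search(line):
                    hits.append((path.name, number, line.rstrip()))
                    if len(hits) >= limit:
                        return hits
    return hits
```

For example, `find_problem_lines(r"C:\hadoop\hadoop-1.2.0.1.3.0.0-0514\logs")` would scan the example path given above; substitute the HDP version of your own cluster.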
To resolve the issue:
Based on the findings from the diagnostic steps, fix all problems that caused History Server to fail, then start it again using the Start HDInsight Host Component action available on the Tasks pane.
If the procedure above does not resolve the issue, contact the Microsoft Support team and provide the alert name and details. Be aware that diagnostic actions may require administrator permissions on the HDInsight cluster.
Target | Microsoft.HDInsight.HostComponent.HistoryServerYarn
Parent Monitor | System.Health.AvailabilityState
Category | AvailabilityHealth
Enabled | True
Alert Generate | True
Alert Severity | MatchMonitorHealth
Alert Priority | Normal
Alert Auto Resolve | True
Monitor Type | Microsoft.HDInsight.UnitMonitorType.HostComponentHealthState
Remotable | True
Accessibility | Public
Alert Message |
RunAs | Default
<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.HistoryServerYarnComponentHealthState" TypeID="Microsoft.HDInsight.UnitMonitorType.HostComponentHealthState" Target="Microsoft.HDInsight.HostComponent.HistoryServerYarn" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" Accessibility="Public" Enabled="true" ConfirmDelivery="true">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HDInsight.UnitMonitor.HistoryServerComponentHealthState.AlertMessage">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Host/Property[Type="Microsoft.HDInsight.Host.Private"]/HostName$</AlertParameter1>
      <AlertParameter2>$Target/Host/Host/Property[Type="Microsoft.HDInsight.Host.Private"]/ClusterName$</AlertParameter2>
      <AlertParameter3>$Data/Context/Property[@Name='$Target/Property[Type="Microsoft.HDInsight.HostComponent"]/ComponentName$']$</AlertParameter3>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Unhealthy" MonitorTypeStateID="Unhealthy" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
  </Configuration>
</UnitMonitor>
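To read the key settings out of a definition like the one above, a short script can help. This is an illustrative sketch using Python's standard `xml.etree.ElementTree`; the embedded XML is a trimmed copy of the monitor definition (most attributes and the alert settings are left out), and `summarize_monitor` is a hypothetical helper name. The values suggest that the monitor maps the Unhealthy state to Warning and, presumably, probes every 900 seconds with a 300-second timeout.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the monitor definition; only the parts read below are kept.
MONITOR_XML = """
<UnitMonitor ID="Microsoft.HDInsight.UnitMonitor.HistoryServerYarnComponentHealthState" Enabled="true">
  <OperationalStates>
    <OperationalState ID="Healthy" MonitorTypeStateID="Healthy" HealthState="Success"/>
    <OperationalState ID="Unhealthy" MonitorTypeStateID="Unhealthy" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>900</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
  </Configuration>
</UnitMonitor>
"""


def summarize_monitor(xml_text):
    """Extract the monitor ID, state mapping, and timing values."""
    root = ET.fromstring(xml_text.strip())
    # Map each monitor-type state to the health state it produces.
    states = {
        state.get("MonitorTypeStateID"): state.get("HealthState")
        for state in root.iter("OperationalState")
    }
    return {
        "id": root.get("ID"),
        "states": states,
        "interval_seconds": int(root.findtext("Configuration/IntervalSeconds")),
        "timeout_seconds": int(root.findtext("Configuration/TimeoutSeconds")),
    }


summary = summarize_monitor(MONITOR_XML)
print(summary["states"], summary["interval_seconds"], summary["timeout_seconds"])
```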