Node Manager Service

Microsoft.HPC.2008.Monitor.ComputeNode.NodeManager (UnitMonitor)

Knowledge Base article:

Summary

This monitor tracks the status of the HPC Node Manager Service. When this service is stopped, no jobs will be able to run on this node.

Causes

This error can be caused by any of the following:

Resolutions

To troubleshoot and fix this problem:

Additional

A recovery task runs automatically to restart the service, so you may find that the service keeps restarting while you are trying to stop it. There are a couple of options to avoid this:

Element properties:

Target: Microsoft.HPC.2008.ComputeNode
Parent Monitor: System.Health.AvailabilityState
Category: AvailabilityHealth
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.Windows.CheckNTServiceStateMonitorType
Remotable: True
Accessibility: Public
Alert Message: Compute Node HPC Node Manager Service is not running
RunAs: Default

Source Code:

<UnitMonitor ID="Microsoft.HPC.2008.Monitor.ComputeNode.NodeManager"
             Accessibility="Public"
             Enabled="true"
             Target="Microsoft.HPC.2008.ComputeNode"
             ParentMonitorID="Health!System.Health.AvailabilityState"
             Remotable="true"
             Priority="Normal"
             TypeID="Windows!Microsoft.Windows.CheckNTServiceStateMonitorType"
             ConfirmDelivery="false">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HPC.2008.Monitor.ComputeNode.NodeManager_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="Running" MonitorTypeStateID="Running" HealthState="Success"/>
    <OperationalState ID="NotRunning" MonitorTypeStateID="NotRunning" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <ServiceName>HpcNodeManager</ServiceName>
    <CheckStartupType/>
  </Configuration>
</UnitMonitor>
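
Because the recovery task is driven by this monitor, one way to stop the HPC Node Manager Service from being restarted during planned work may be to disable the monitor with an override. The fragment below is a minimal sketch of such an override in a separate, unsealed management pack. It is an assumption, not necessarily one of the options this article intends. The override ID and the "HPC" reference alias are hypothetical. Only the monitor ID and target class come from the definition above.

```xml
<!-- Sketch only: belongs in a separate override management pack.
     "HPC" is an assumed alias for a reference to the management pack
     that defines the monitor; the override ID is illustrative. -->
<Monitoring>
  <Overrides>
    <MonitorPropertyOverride ID="Example.Override.Disable.ComputeNode.NodeManager"
                             Context="HPC!Microsoft.HPC.2008.ComputeNode"
                             Enforced="false"
                             Monitor="HPC!Microsoft.HPC.2008.Monitor.ComputeNode.NodeManager"
                             Property="Enabled">
      <!-- Disables the monitor for every instance of the target class -->
      <Value>false</Value>
    </MonitorPropertyOverride>
  </Overrides>
</Monitoring>
```

With the monitor disabled, no state change is detected, so the recovery should not be triggered. Remove the override afterwards so that the service is monitored again.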