Node Connectivity

Microsoft.HPC.2008R2.Monitor.NodeRole.Availability.Connectivity (UnitMonitor)

Monitors the connectivity of an HPC 2008 R2 node to the job scheduler.

Knowledge Base article:

Summary

This monitor tracks whether a node can open a TCP/IP connection to the job scheduler port on the head node (the monitor passes port 5800 to its script). If the node cannot connect to this port, it cannot communicate with the job scheduler to run jobs. By default, the node attempts the connection every 15 minutes.
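When troubleshooting this alert, the same check can be run by hand on the node. The following is a minimal PowerShell sketch, not part of the management pack: `Test-SchedulerPort` is a hypothetical name, and the explicit connect timeout is an assumption (the monitor's own script uses the blocking `TcpClient` constructor and relies on the 300-second script timeout). It needs PowerShell 2.0 or later for `try`/`catch`/`finally`.

```powershell
# Hypothetical troubleshooting helper, not part of the management pack.
# Returns "Success" if a TCP connection to the scheduler port can be opened
# within the timeout, otherwise "Failure" (the same two values the monitor uses).
function Test-SchedulerPort {
    param(
        [Parameter(Mandatory = $true)][string]$Server,
        [int]$Port = 5800,                # value the monitor passes in <Parameters>
        [int]$TimeoutMilliseconds = 5000  # assumption: the original script has no connect timeout
    )
    $client = New-Object System.Net.Sockets.TcpClient
    try {
        # BeginConnect/EndConnect bounds the wait instead of blocking indefinitely.
        $async = $client.BeginConnect($Server, $Port, $null, $null)
        if (-not $async.AsyncWaitHandle.WaitOne($TimeoutMilliseconds)) {
            return "Failure"   # no answer within the timeout
        }
        $client.EndConnect($async)   # throws if the connection was refused
        return "Success"
    }
    catch {
        return "Failure"       # name resolution or connection error
    }
    finally {
        $client.Close()        # always release the socket
    }
}
```

For example, `Test-SchedulerPort -Server HEADNODE` returns `Success` or `Failure`, where `HEADNODE` is a placeholder for the head node name; the monitor itself connects to the `ClusterName` value stored under `HKLM:\SOFTWARE\Microsoft\HPC`.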

Causes

This error can be caused by any of the following:

Resolutions

Additional

Element properties:

Target: Microsoft.HPC.2008R2.NodeRole
Parent Monitor: System.Health.AvailabilityState
Category: AvailabilityHealth
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.HPC.2008R2.MonitorType.PowershellScriptMonitor.TwoStates
Remotable: True
Accessibility: Public
Alert Message: Node cannot connect to the head node
               Please see the alert context for details.
RunAs: Microsoft.HPC.RunAsProfile.AdminActionAccount

Source Code:

<UnitMonitor ID="Microsoft.HPC.2008R2.Monitor.NodeRole.Availability.Connectivity" Accessibility="Public" Enabled="true" Target="Microsoft.HPC.2008R2.NodeRole" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" RunAs="HPCLibrary!Microsoft.HPC.RunAsProfile.AdminActionAccount" TypeID="Microsoft.HPC.2008R2.MonitorType.PowershellScriptMonitor.TwoStates" ConfirmDelivery="true">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HPC.2008R2.Monitor.NodeRole.Availability.Connectivity_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="UIGeneratedOpStateId00f669790fe541dd80803872986dcbb5" MonitorTypeStateID="Success" HealthState="Success"/>
    <OperationalState ID="UIGeneratedOpStateId578767281f37481db1b682c81ac9845a" MonitorTypeStateID="Failure" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <ScriptName>GetConnectivityToScheduler</ScriptName>
    <ScriptBody><Script>

param([int]$schedulerPort)

Function TryConnect
{
    param([string]$server, [int]$port)
    &amp;{ #TRY
        $client = New-Object System.Net.Sockets.TcpClient $server,$port
        $client.Close()
        return "Success"
    }
    trap [Exception]
    {
        return "Failure"
    }
}

$api = New-Object -ComObject "MOM.ScriptAPI"
$bag = $api.CreatePropertyBag()

$clusterName = (Get-ItemProperty hklm:\SOFTWARE\Microsoft\HPC).ClusterName
if (($clusterName -ne $null) -and ($clusterName -ne ""))
{
    $result = TryConnect $clusterName $schedulerPort
    $bag.AddValue("Result", $result)
}
else
{
    $bag.AddValue("Result", "Failure")
}

$api.Return($bag)

</Script></ScriptBody>
    <Parameters>5800</Parameters>
    <TimeoutSeconds>300</TimeoutSeconds>
    <IntervalSeconds>900</IntervalSeconds>
  </Configuration>
</UnitMonitor>
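The script's decision logic, together with the `<OperationalStates>` mapping and the alert settings above, can be restated as a small testable function. This is a hypothetical sketch, not the management pack's code: `Get-ConnectivityResult` is an illustrative name, a hashtable stands in for the `MOM.ScriptAPI` property bag, the connection probe is passed in as a script block, and it assumes the monitor type maps a `Result` value of `Success` or `Failure` to the monitor type state of the same name (the monitor type definition is not shown here).

```powershell
# Hypothetical restatement of the monitor's logic; not part of the management pack.
function Get-ConnectivityResult {
    param(
        [string]$ClusterName,        # the real script reads ClusterName from HKLM:\SOFTWARE\Microsoft\HPC
        [int]$SchedulerPort = 5800,  # value from <Parameters>
        [scriptblock]$Probe          # must return "Success" or "Failure"
    )

    # Same rule as the script: a missing or empty cluster name is reported
    # as a failure and no connection attempt is made.
    if ([string]::IsNullOrEmpty($ClusterName)) {
        $result = "Failure"
    }
    else {
        $result = & $Probe $ClusterName $SchedulerPort
    }

    # Assumed mapping from Result to health state, following <OperationalStates>:
    # Success -> Success, Failure -> Error.
    $healthState = @{ "Success" = "Success"; "Failure" = "Error" }[$result]

    # <AlertOnState>Error</AlertOnState>: an alert is raised only in the Error state;
    # <AutoResolve>true</AutoResolve> closes it once the state returns to Success.
    return @{
        Result      = $result
        HealthState = $healthState
        AlertRaised = ($healthState -eq "Error")
    }
}
```

Injecting the probe keeps the decision logic separate from the network call, so the state and alert outcomes can be checked without a reachable head node or an Operations Manager agent.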