Job Submission Test Result

Microsoft.HPC.2008.Monitor.JobSubmissionTestResult (UnitMonitor)

Knowledge Base article:

Summary

This monitor runs a diagnostic test of the HPC Job Scheduler Service once a day. The Job Submission Test submits a simple job to all nodes in the cluster. It tests the ability of the HPC Job Scheduler Service to accept and run a job on the cluster.

This monitor will enter the Critical state if the Job Submission Test fails on the cluster.

Causes

This error can have a variety of causes. The following are some of the major ones:

Resolutions

To troubleshoot and fix this problem:

Element properties:

Target: Microsoft.HPC.2008.HeadNode.HPCPack.JobScheduler
Parent Monitor: System.Health.AvailabilityState
Category: AvailabilityHealth
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.HPC.2008.MonitorType.RunDiagnosticResult
Remotable: True
Accessibility: Public
Alert Message: Job Submission Simple Scheduler Test Failed
RunAs: Default

Source Code:

<UnitMonitor ID="Microsoft.HPC.2008.Monitor.JobSubmissionTestResult" Accessibility="Public" Enabled="true" Target="Microsoft.HPC.2008.HeadNode.HPCPack.JobScheduler" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Microsoft.HPC.2008.MonitorType.RunDiagnosticResult" ConfirmDelivery="true">
  <Category>AvailabilityHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HPC.2008.Monitor.JobSubmissionTestResult_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="UIGeneratedOpStateIde51dafbb58294d4b8b35f49c51d37074" MonitorTypeStateID="Success" HealthState="Success"/>
    <OperationalState ID="UIGeneratedOpStateIde60a0185d56c4731b2a0b71eb2957002" MonitorTypeStateID="Failed" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <IntervalSeconds>86400</IntervalSeconds>
    <TimeoutSeconds>300</TimeoutSeconds>
    <NodeName>Null</NodeName>
    <TestName>SimpleSchedulerTest</TestName>
  </Configuration>
</UnitMonitor>
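Because this monitor ships with Enabled="true" and runs the test every 86400 seconds (once a day), an administrator who does not want the daily job submission may prefer to turn it off with an override rather than edit the sealed pack. The fragment below is a minimal sketch of such an override, placed in a separate unsealed management pack. It assumes the HPC management pack is referenced under the hypothetical alias `HPC` in that pack's manifest, and the override ID `Contoso.Override.DisableJobSubmissionTest` is likewise hypothetical. Setting the context to the monitor's target class applies the override to every instance of that class.

```xml
<Monitoring>
  <Overrides>
    <!-- Hypothetical ID; "HPC" is an assumed alias for the sealed HPC management pack. -->
    <MonitorPropertyOverride ID="Contoso.Override.DisableJobSubmissionTest" Context="HPC!Microsoft.HPC.2008.HeadNode.HPCPack.JobScheduler" Enforced="false" Monitor="HPC!Microsoft.HPC.2008.Monitor.JobSubmissionTestResult" Property="Enabled">
      <!-- false stops the monitor from running the daily SimpleSchedulerTest. -->
      <Value>false</Value>
    </MonitorPropertyOverride>
  </Overrides>
</Monitoring>
```

With the monitor disabled, no health state or alert is produced for the Job Submission Test, so cluster scheduler failures would need to be caught by other means.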