Daily Job Queue Time

Microsoft.HPC.2008.Monitor.JobScheduler.WaitTime (UnitMonitor)

Knowledge Base article:

Summary

This monitor tracks the average job queue time. Wait time is one indicator of whether the cluster is congested. The monitor is disabled by default because typical job queue times vary widely from one organization to another.

Causes

This error can be caused by any of the following:

Resolutions

To troubleshoot and fix this problem:

Element properties:

Target: Microsoft.HPC.2008.HeadNode.HPCPack.JobScheduler
Parent Monitor: System.Health.PerformanceState
Category: PerformanceHealth
Enabled: False
Instance Name: HPC Scheduler
Counter Name: Daily job queue time
Frequency: 60
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: System.Performance.DoubleThreshold
Remotable: True
Accessibility: Public
Alert Message: Daily Job Queue Time has exceeded the upper threshold
RunAs: Default

Source Code:

<UnitMonitor ID="Microsoft.HPC.2008.Monitor.JobScheduler.WaitTime" Accessibility="Public" Enabled="false" Target="Microsoft.HPC.2008.HeadNode.HPCPack.JobScheduler" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" TypeID="SystemPerf!System.Performance.DoubleThreshold" ConfirmDelivery="false">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HPC.2008.Monitor.JobScheduler.WaitTime_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="UnderThreshold1" MonitorTypeStateID="UnderThreshold1" HealthState="Success"/>
    <OperationalState ID="OverThreshold1UnderThreshold2" MonitorTypeStateID="OverThreshold1UnderThreshold2" HealthState="Warning"/>
    <OperationalState ID="OverThreshold2" MonitorTypeStateID="OverThreshold2" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Host/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <CounterName>Daily job queue time</CounterName>
    <ObjectName>HPC Scheduler</ObjectName>
    <InstanceName/>
    <AllInstances>false</AllInstances>
    <Frequency>60</Frequency>
    <Threshold1>-2</Threshold1>
    <Threshold2>-1</Threshold2>
  </Configuration>
</UnitMonitor>
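
Because the monitor ships disabled and with placeholder thresholds (-2 and -1), putting it to use generally means enabling it and supplying thresholds through overrides in a separate management pack. The fragment below is a minimal sketch of such overrides, not part of this management pack: the override IDs, the "HPC" reference alias, and the threshold values 30 and 60 are assumptions chosen for illustration. The unit of the counter is not stated here, so choose values that match what the counter reports in your cluster.

```xml
<!-- Hypothetical override fragment; it would sit under <Monitoring> in an override management pack. -->
<!-- Assumes that pack's <References> section declares the alias "HPC" for the HPC management pack. -->
<Overrides>
  <!-- Enable the monitor, which is disabled by default. -->
  <MonitorPropertyOverride ID="Example.HPC.Override.WaitTime.Enabled" Context="HPC!Microsoft.HPC.2008.HeadNode.HPCPack.JobScheduler" Enforced="false" Monitor="HPC!Microsoft.HPC.2008.Monitor.JobScheduler.WaitTime" Property="Enabled">
    <Value>true</Value>
  </MonitorPropertyOverride>
  <!-- Lower threshold (illustrative value). -->
  <MonitorConfigurationOverride ID="Example.HPC.Override.WaitTime.Threshold1" Context="HPC!Microsoft.HPC.2008.HeadNode.HPCPack.JobScheduler" Enforced="false" Monitor="HPC!Microsoft.HPC.2008.Monitor.JobScheduler.WaitTime" Parameter="Threshold1">
    <Value>30</Value>
  </MonitorConfigurationOverride>
  <!-- Upper threshold (illustrative value). -->
  <MonitorConfigurationOverride ID="Example.HPC.Override.WaitTime.Threshold2" Context="HPC!Microsoft.HPC.2008.HeadNode.HPCPack.JobScheduler" Enforced="false" Monitor="HPC!Microsoft.HPC.2008.Monitor.JobScheduler.WaitTime" Parameter="Threshold2">
    <Value>60</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

Keeping Threshold1 below Threshold2 preserves the ordering of the three operational states. The same overrides can usually be created from the Operations Manager console instead of being written by hand.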