Service Monitoring: Scheduler service is not running

Service_Monitoring__Scheduler_service_is_not_running_1_Rule.AdvancedAlertCriteriaMonitor (UnitMonitor)

Knowledge Base article:

Management Pack
Summary

This alert is generated when the Microsoft Compute Cluster Scheduler service has stopped running on the head node. When this service stops, the Windows Compute Cluster Server 2003 command-line and graphical user interfaces, including Job Manager, do not function properly.

No jobs can be submitted to the job queue. Jobs already running on compute nodes are not affected.

 
Causes

This error can be caused by any of the following:

  • The Microsoft Compute Cluster Scheduler service encountered an error and stopped running.
  • The Microsoft Compute Cluster Scheduler service was disabled.
  • Group Policy does not allow this service to start.
 
Resolutions

To troubleshoot and fix this problem:

  1. Service Control Manager will produce an error event if the service has been terminated unexpectedly. Start Event Viewer on the affected head node and check for any system events from Service Control Manager, or application events from the Microsoft Compute Cluster Scheduler service (CcpScheduler). Resolve any errors reported by these events.
  2. Restart the Microsoft Compute Cluster Scheduler service on the affected head node.
  3. If the service cannot be restarted, contact the network domain administrator to make sure this service is not disabled by domain policy.
  4. If none of the above solves the problem, uninstall and reinstall Compute Cluster Pack on the affected head node.
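
Steps 1 and 2 above can be partly scripted. The following is a minimal Python sketch, not part of the management pack. It assumes that the Windows service name is CcpScheduler (the article gives this name only in connection with the service's application events), that the built-in `sc` command is available on the head node, and that its output is in English. The helper names `parse_sc_state` and `restart_if_stopped` are hypothetical.

```python
import subprocess

# Assumption: the service is registered under this name; verify on the head node.
SERVICE_NAME = "CcpScheduler"


def parse_sc_state(sc_output):
    """Return the state word (for example 'RUNNING' or 'STOPPED') from
    `sc query` output, or None if no STATE line is present."""
    for line in sc_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "STATE":
            parts = value.split()  # e.g. ['1', 'STOPPED']
            return parts[1] if len(parts) >= 2 else None
    return None


def restart_if_stopped(service=SERVICE_NAME):
    """Query the service and try to start it when it reports STOPPED.
    Returns the state that was observed before any start attempt."""
    result = subprocess.run(["sc", "query", service],
                            capture_output=True, text=True)
    state = parse_sc_state(result.stdout)
    if state == "STOPPED":
        # check=True raises CalledProcessError if the start request fails,
        # which is the cue to continue with steps 3 and 4.
        subprocess.run(["sc", "start", service], check=True)
    return state
```

Calling `restart_if_stopped()` from an elevated prompt on the head node would attempt the restart described in step 2. It does not replace reviewing the Service Control Manager events from step 1.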
 
© 2006 Microsoft Corporation, all rights reserved.

Element properties:

Target: Microsoft.Windows.Server.ComputeCluster.2003.Head_Node_Class
Parent Monitor: Scheduler
Category: StateCollection
Enabled: True
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: System.Mom.BackwardCompatibility.StateAlert.RuleGenerated.AdvancedRuleCriteriaMonitor
Remotable: True
Accessibility: Internal
Alert Message:
  Service Monitoring: Scheduler service is not running
  {1}
RunAs: Default
Comment: Mom2005ID='{F3549068-C569-4304-91AC-2A29EFF0A70D}'

Source Code:

<UnitMonitor ID="Service_Monitoring__Scheduler_service_is_not_running_1_Rule.AdvancedAlertCriteriaMonitor" TypeID="MomBackwardCompatibility!System.Mom.BackwardCompatibility.StateAlert.RuleGenerated.AdvancedRuleCriteriaMonitor" Accessibility="Internal" Target="Microsoft.Windows.Server.ComputeCluster.2003.Head_Node_Class" Enabled="true" ParentMonitorID="Scheduler" Comment="Mom2005ID='{F3549068-C569-4304-91AC-2A29EFF0A70D}'">
  <Category>StateCollection</Category>
  <AlertSettings AlertMessage="Service_Monitoring__Scheduler_service_is_not_running_1_Rule.AdvancedAlertCriteriaMonitor.StringResource">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Name$</AlertParameter1>
      <AlertParameter2>$Data/Context/Description$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState HealthState="Success" MonitorTypeStateID="Success" ID="AlertLevelSuccess"/>
    <OperationalState HealthState="Warning" MonitorTypeStateID="Warning" ID="AlertLevelWarning"/>
    <OperationalState HealthState="Error" MonitorTypeStateID="Error" ID="AlertLevelError"/>
  </OperationalStates>
  <Configuration>
    <ServerRole>Head Node</ServerRole>
    <Component>Scheduler</Component>
    <ServerRoleInstance>$Target/Property[Type="Microsoft.Windows.Server.ComputeCluster.2003.Head_Node_Class"]/Server_Name$</ServerRoleInstance>
    <RuleId>$MPElement[Name="Service_Monitoring__Scheduler_service_is_not_running_1_Rule"]$</RuleId>
    <ServiceUnavailableExpression>
      <And>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="String">AlertContext/DataItem/Params/Param[10]</XPathQuery>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value Type="Integer">2</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
        <Expression>
          <Or>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="String">AlertContext/DataItem/Params/Param[9]</XPathQuery>
                </ValueExpression>
                <Operator>Equal</Operator>
                <ValueExpression>
                  <Value Type="String">1</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="String">AlertContext/DataItem/Params/Param[9]</XPathQuery>
                </ValueExpression>
                <Operator>Equal</Operator>
                <ValueExpression>
                  <Value Type="String">3</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
          </Or>
        </Expression>
      </And>
    </ServiceUnavailableExpression>
    <SecurityIssueExpression>
      <Not>
        <Expression/>
      </Not>
    </SecurityIssueExpression>
    <CriticalErrorExpression>
      <Not>
        <Expression/>
      </Not>
    </CriticalErrorExpression>
    <ErrorExpression>
      <Not>
        <Expression/>
      </Not>
    </ErrorExpression>
    <WarningExpression>
      <Not>
        <Expression/>
      </Not>
    </WarningExpression>
    <SuccessExpression>
      <Not>
        <Expression/>
      </Not>
    </SuccessExpression>
    <InformationExpression>
      <Not>
        <Expression/>
      </Not>
    </InformationExpression>
  </Configuration>
</UnitMonitor>
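
The ServiceUnavailableExpression above reduces to: Param[10] equals 2, and Param[9] equals 1 or 3. The following Python sketch restates that logic for illustration only. It is not how Operations Manager evaluates the monitor. The function names are hypothetical, the source does not say what each parameter means, and treating Param[10] as an integer while comparing Param[9] as a string is an assumption based on the `Type` attributes of the `Value` elements.

```python
def xpath_param(params, n):
    """Return Param[n] using XPath's 1-based indexing over a Python list."""
    return params[n - 1]


def equals_integer(text, expected):
    """Integer comparison that treats non-numeric text as a non-match."""
    try:
        return int(text) == expected
    except (TypeError, ValueError):
        return False


def service_unavailable(params):
    """Mirror of <ServiceUnavailableExpression>:
    Param[10] == 2 AND (Param[9] == "1" OR Param[9] == "3")."""
    if len(params) < 10:
        return False  # the expression cannot match without both parameters
    p9 = xpath_param(params, 9)
    p10 = xpath_param(params, 10)
    return equals_integer(p10, 2) and (p9 == "1" or p9 == "3")
```

For example, a list of ten event parameters whose ninth entry is "3" and whose tenth entry is "2" satisfies the expression, while the same list with a tenth entry of "4" does not.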