Service Monitoring: SDM Store service is not running

Service_Monitoring__SDM_Store_service_is_not_running_1_Rule.AdvancedAlertCriteriaMonitor (UnitMonitor)

Knowledge Base article:

Management Pack
Summary

This alert occurs when the Microsoft Compute Cluster SDM Store service, which is installed on the head node, stops running. This service maintains the integrity of data read from and written to the System Definition Model (SDM) data store, which holds cluster configuration information. If this service is not running, the Microsoft Compute Cluster Management service also stops functioning, and with it the entire cluster management infrastructure: no cluster configuration changes are propagated from the head node to the cluster, and no status updates or job statistics are accepted from the nodes.

 
Causes

This alert can be caused by any of the following:

  • The Microsoft Compute Cluster SDM Store service encountered an error and stopped running.
  • The Microsoft Compute Cluster SDM Store service was disabled.
  • Group Policy does not allow this service to start.
 
Resolutions

To troubleshoot and fix this problem:

  1. Service Control Manager logs an error event when a service terminates unexpectedly. Start Event Viewer on the affected head node and check the System log for events from Service Control Manager and the Application log for events from the Microsoft Compute Cluster SDM Store service (CcpSdm). Resolve any errors that these events report.
  2. Restart the Microsoft Compute Cluster SDM Store service on the head node. To do this, click Start, click All Programs, click Administrative Tools, click Services, and then restart the service. Restart the Microsoft Compute Cluster Management service if it has been stopped.
  3. If the service cannot be restarted, contact the network domain administrator to make sure this service is not disabled by domain policy.
  4. If none of the above solves the problem, uninstall and reinstall Compute Cluster Pack on the head node. Note that all job and cluster configuration information will be lost. If you reinstall Compute Cluster Pack, you will have to configure the cluster again and re-add compute nodes to the cluster.
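
Before working through the steps above, it can help to confirm the current service state from a script. The following is a minimal sketch, not part of the management pack. It assumes that CcpSdm (the name shown in step 1) is also the service's short name and that the Windows `sc query` command is available on the head node; the function names are hypothetical.

```python
import re
import subprocess

# Assumption: "CcpSdm" (the name given in step 1) is also the service's short name.
SERVICE_NAME = "CcpSdm"

# `sc query` prints a line such as "        STATE              : 4  RUNNING".
_STATE_RE = re.compile(r"^\s*STATE\s*:\s*(\d+)\s+(\w+)", re.MULTILINE)


def parse_sc_state(sc_output):
    """Return the state name (for example 'RUNNING' or 'STOPPED'), or None if absent."""
    match = _STATE_RE.search(sc_output)
    return match.group(2) if match else None


def query_service_state(name=SERVICE_NAME):
    """Run `sc query <name>` on the local Windows machine and return the state name."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_sc_state(result.stdout)
```

Run on the head node, a result from `query_service_state()` other than RUNNING suggests continuing with step 2. A result of None means the output contained no STATE line, for example because the service name was not recognized.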
 
© 2006 Microsoft Corporation, all rights reserved.

Element properties:

Target: Microsoft.Windows.Server.ComputeCluster.2003.Head_Node_Class
Parent Monitor: SDM
Category: StateCollection
Enabled: True
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: System.Mom.BackwardCompatibility.StateAlert.RuleGenerated.AdvancedRuleCriteriaMonitor
Remotable: True
Accessibility: Internal
Alert Message:
  Service Monitoring: SDM Store service is not running
  {1}
RunAs: Default
Comment: Mom2005ID='{8BE2A265-6B4F-429E-8D21-E038C0F26E0D}'

Source Code:

<UnitMonitor ID="Service_Monitoring__SDM_Store_service_is_not_running_1_Rule.AdvancedAlertCriteriaMonitor" TypeID="MomBackwardCompatibility!System.Mom.BackwardCompatibility.StateAlert.RuleGenerated.AdvancedRuleCriteriaMonitor" Accessibility="Internal" Target="Microsoft.Windows.Server.ComputeCluster.2003.Head_Node_Class" Enabled="true" ParentMonitorID="SDM" Comment="Mom2005ID='{8BE2A265-6B4F-429E-8D21-E038C0F26E0D}'">
  <Category>StateCollection</Category>
  <AlertSettings AlertMessage="Service_Monitoring__SDM_Store_service_is_not_running_1_Rule.AdvancedAlertCriteriaMonitor.StringResource">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Name$</AlertParameter1>
      <AlertParameter2>$Data/Context/Description$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState HealthState="Success" MonitorTypeStateID="Success" ID="AlertLevelSuccess"/>
    <OperationalState HealthState="Warning" MonitorTypeStateID="Warning" ID="AlertLevelWarning"/>
    <OperationalState HealthState="Error" MonitorTypeStateID="Error" ID="AlertLevelError"/>
  </OperationalStates>
  <Configuration>
    <ServerRole>Head Node</ServerRole>
    <Component>SDM</Component>
    <ServerRoleInstance>$Target/Property[Type="Microsoft.Windows.Server.ComputeCluster.2003.Head_Node_Class"]/Server_Name$</ServerRoleInstance>
    <RuleId>$MPElement[Name="Service_Monitoring__SDM_Store_service_is_not_running_1_Rule"]$</RuleId>
    <ServiceUnavailableExpression>
      <And>
        <Expression>
          <SimpleExpression>
            <ValueExpression>
              <XPathQuery Type="String">AlertContext/DataItem/Params/Param[10]</XPathQuery>
            </ValueExpression>
            <Operator>Equal</Operator>
            <ValueExpression>
              <Value Type="Integer">2</Value>
            </ValueExpression>
          </SimpleExpression>
        </Expression>
        <Expression>
          <Or>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="String">AlertContext/DataItem/Params/Param[9]</XPathQuery>
                </ValueExpression>
                <Operator>Equal</Operator>
                <ValueExpression>
                  <Value Type="String">1</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
            <Expression>
              <SimpleExpression>
                <ValueExpression>
                  <XPathQuery Type="String">AlertContext/DataItem/Params/Param[9]</XPathQuery>
                </ValueExpression>
                <Operator>Equal</Operator>
                <ValueExpression>
                  <Value Type="String">3</Value>
                </ValueExpression>
              </SimpleExpression>
            </Expression>
          </Or>
        </Expression>
      </And>
    </ServiceUnavailableExpression>
    <SecurityIssueExpression>
      <Not>
        <Expression/>
      </Not>
    </SecurityIssueExpression>
    <CriticalErrorExpression>
      <Not>
        <Expression/>
      </Not>
    </CriticalErrorExpression>
    <ErrorExpression>
      <Not>
        <Expression/>
      </Not>
    </ErrorExpression>
    <WarningExpression>
      <Not>
        <Expression/>
      </Not>
    </WarningExpression>
    <SuccessExpression>
      <Not>
        <Expression/>
      </Not>
    </SuccessExpression>
    <InformationExpression>
      <Not>
        <Expression/>
      </Not>
    </InformationExpression>
  </Configuration>
</UnitMonitor>
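
The ServiceUnavailableExpression in the monitor above reduces to a simple boolean test: Param[10] equals 2, and Param[9] equals 1 or 3. The following Python sketch illustrates that logic only; it is not how the monitor is implemented. The function name is hypothetical, the meaning of each parameter is not stated in this article, and the sketch assumes the parameters arrive as a list of strings in document order.

```python
def service_unavailable(params):
    """Evaluate the ServiceUnavailableExpression against a list of Param strings.

    params[0] corresponds to Param[1], because XPath positions are 1-based.
    """
    def param(position):
        # Translate a 1-based XPath position into a 0-based list index.
        if 1 <= position <= len(params):
            return params[position - 1]
        return None

    p9 = param(9)
    p10 = param(10)

    # The monitor compares Param[10] with the integer 2.
    try:
        p10_is_two = p10 is not None and int(p10) == 2
    except ValueError:
        p10_is_two = False

    # The monitor compares Param[9] with the strings "1" and "3".
    return p10_is_two and p9 in ("1", "3")
```

For example, a list whose ninth entry is "3" and whose tenth entry is "2" satisfies the expression, while any other value in the tenth position does not.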