Cluster Power Consumption Efficiency

Microsoft.HPC.2008R2.Monitor.HeadNode.Performance.PowerConsumption (UnitMonitor)

Cluster power consumption efficiency performance monitor for an HPC 2008 R2 cluster

Knowledge Base article:

Summary

This monitor tracks the power consumption efficiency of the compute nodes in the HPC cluster. The power consumption efficiency value is reflected by the utilization of the compute nodes, defined as the number of currently busy cores (the cores that have tasks allocated to them) divided by the number of currently available cores (the cores that are online and reachable).
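As a rough illustration of the ratio described above, the following Python sketch computes the efficiency value as a percentage. The function name and the handling of a cluster with no available cores are assumptions made for illustration; they are not taken from the management pack.

```python
def utilization_percent(busy_cores: int, available_cores: int) -> float:
    """Return busy cores as a percentage of available cores.

    busy_cores: cores that currently have tasks allocated to them.
    available_cores: cores that are online and reachable.
    """
    if available_cores <= 0:
        # Assumption: with no reachable cores, report 0% instead of
        # dividing by zero. The real monitor may behave differently.
        return 0.0
    return 100.0 * busy_cores / available_cores
```

For example, 48 busy cores out of 128 available cores gives 37.5%.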

Configuration

This monitor is disabled by default. You can enable it and configure the power consumption efficiency threshold that separates the Healthy and Warning states. (For information about how to override the threshold values, see the management pack guide.)

Causes

By default, the monitor is in a Warning state when the power consumption efficiency of the compute node cores in the cluster is less than 40% at five consecutive sampling points. The monitor samples every 5 minutes by default, and the sampling interval is configurable. Otherwise, the monitor is in a Healthy state.
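The consecutive-sample rule can be sketched in Python as follows. This is a minimal sketch, not the monitor type's actual implementation: the function name is hypothetical, the defaults mirror the Threshold (40) and NumSamples (5) values in the monitor configuration, and the sketch assumes that only the most recent window of samples decides the state.

```python
def monitor_state(samples, threshold=40.0, num_samples=5):
    """Classify the monitor state from efficiency samples (percentages).

    samples: efficiency values ordered oldest to newest.
    Returns "Warning" if the last num_samples values are all below
    threshold, otherwise "Healthy".
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    recent = samples[-num_samples:]
    # Assumption: with fewer than num_samples samples collected so far,
    # there is not enough evidence for a Warning state.
    if len(recent) == num_samples and all(s < threshold for s in recent):
        return "Warning"
    return "Healthy"
```

With the default 300-second interval, five consecutive low samples correspond to roughly 25 minutes of sustained low utilization before the state changes.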

Resolutions

You can enable one of the power consumption adjustment rules to increase the cluster power consumption efficiency:

Element properties:

Target: Microsoft.HPC.2008R2.ActiveHeadNode
Parent Monitor: System.Health.PerformanceState
Category: PerformanceHealth
Enabled: False
Alert Generate: True
Alert Severity: MatchMonitorHealth
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Microsoft.HPC.2008R2.MonitorType.PowerConsumptionMonitor
Remotable: True
Accessibility: Public
Alert Message:
    Power consumption efficiency of the cluster is low
    Please see the alert context for details.
RunAs: Microsoft.HPC.RunAsProfile.AdminActionAccount

Source Code:

<UnitMonitor ID="Microsoft.HPC.2008R2.Monitor.HeadNode.Performance.PowerConsumption" Accessibility="Public" Enabled="false" Target="Microsoft.HPC.2008R2.ActiveHeadNode" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" RunAs="HPCLibrary!Microsoft.HPC.RunAsProfile.AdminActionAccount" TypeID="Microsoft.HPC.2008R2.MonitorType.PowerConsumptionMonitor" ConfirmDelivery="true">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HPC.2008R2.Monitor.HeadNode.Performance.PowerConsumption_AlertMessageResourceID">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="UIGeneratedOpStateId2fab8a3b60b34522aef5c20d9ca4a347" MonitorTypeStateID="Low" HealthState="Warning"/>
    <OperationalState ID="UIGeneratedOpStateId89b8bd8ca6fc41ccbddbfce327846b40" MonitorTypeStateID="High" HealthState="Success"/>
  </OperationalStates>
  <Configuration>
    <ClusterName>$Target/Property[Type="Microsoft.HPC.2008R2.ActiveHeadNode"]/ClusterName$</ClusterName>
    <Threshold>40</Threshold>
    <NumSamples>5</NumSamples>
    <TimeoutSeconds>300</TimeoutSeconds>
    <IntervalSeconds>300</IntervalSeconds>
  </Configuration>
</UnitMonitor>