Memory Utilization

Microsoft.HPC.2008R2.Monitor.BrokerNode.Performance.MemoryUtilization (UnitMonitor)

Memory utilization performance monitor for HPC 2008 R2 Broker Node

Knowledge Base article:

Summary

This monitor tracks memory utilization on a broker node. Utilization is read from the Windows performance counter Memory\% Committed Bytes In Use.

Configuration

This monitor is disabled by default. You can enable it and configure the memory threshold (90 percent in the monitor definition) and the sampling frequency (300 seconds in the monitor definition).
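The enable-and-tune step described above could be expressed as overrides in a separate, unsealed management pack. The fragment below is a minimal sketch, not the product's shipped configuration. The alias `HPC` (a reference to the sealed management pack that defines this monitor), the three override IDs, and the values 95 and 600 are assumptions chosen for illustration. It also assumes that the monitor type exposes `Threshold` and `Frequency` as overridable parameters.

```xml
<!-- Sketch only: goes inside the Monitoring section of an unsealed override management pack. -->
<!-- "HPC" is a hypothetical alias for the management pack that defines the monitor. -->
<Overrides>
  <!-- Enable the monitor for all broker nodes. -->
  <MonitorPropertyOverride ID="Example.Override.BrokerNode.MemoryUtilization.Enabled"
                           Context="HPC!Microsoft.HPC.2008R2.BrokerNode"
                           Enforced="false"
                           Monitor="HPC!Microsoft.HPC.2008R2.Monitor.BrokerNode.Performance.MemoryUtilization"
                           Property="Enabled">
    <Value>true</Value>
  </MonitorPropertyOverride>
  <!-- Raise the threshold from 90 to 95 percent (illustrative value). -->
  <MonitorConfigurationOverride ID="Example.Override.BrokerNode.MemoryUtilization.Threshold"
                                Context="HPC!Microsoft.HPC.2008R2.BrokerNode"
                                Enforced="false"
                                Monitor="HPC!Microsoft.HPC.2008R2.Monitor.BrokerNode.Performance.MemoryUtilization"
                                Parameter="Threshold">
    <Value>95</Value>
  </MonitorConfigurationOverride>
  <!-- Sample every 600 seconds instead of every 300 (illustrative value). -->
  <MonitorConfigurationOverride ID="Example.Override.BrokerNode.MemoryUtilization.Frequency"
                                Context="HPC!Microsoft.HPC.2008R2.BrokerNode"
                                Enforced="false"
                                Monitor="HPC!Microsoft.HPC.2008R2.Monitor.BrokerNode.Performance.MemoryUtilization"
                                Parameter="Frequency">
    <Value>600</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

The same overrides can usually be created from the Operations Manager console, which writes equivalent elements into the chosen override management pack.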

Causes

Sustained high memory usage is usually caused by the node managing a large number of service-oriented architecture (SOA) sessions. If the node stays occupied for an extended period, other jobs may wait in the queue longer than usual.

Resolutions

To troubleshoot and fix this problem:

You can take the node offline so that the job scheduler does not schedule jobs on the node.

You can optionally enable the recovery task in Operations Manager to take a broker node offline automatically if the memory utilization exceeds the configured threshold. However, to use the broker node again after the utilization goes below the threshold, you must manually bring the broker node online.
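Enabling the recovery task mentioned above could also be done with an override. The fragment below is a hedged sketch: this page does not give the recovery's ID, so `Example.Recovery.TakeBrokerNodeOffline` is a hypothetical placeholder that must be replaced with the real recovery ID from the management pack. The alias `HPC` and the override ID are likewise assumptions.

```xml
<!-- Sketch only: goes inside the Monitoring section of an unsealed override management pack. -->
<!-- The Recovery value is a hypothetical placeholder, not an ID taken from the HPC management pack. -->
<Overrides>
  <RecoveryPropertyOverride ID="Example.Override.BrokerNode.MemoryUtilization.Recovery.Enabled"
                            Context="HPC!Microsoft.HPC.2008R2.BrokerNode"
                            Enforced="false"
                            Recovery="HPC!Example.Recovery.TakeBrokerNodeOffline"
                            Property="Enabled">
    <Value>true</Value>
  </RecoveryPropertyOverride>
</Overrides>
```

With an override of this kind in place, the node would still have to be brought back online manually once utilization drops below the threshold, as noted above.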

Element properties:

Target: Microsoft.HPC.2008R2.BrokerNode
Parent Monitor: System.Health.PerformanceState
Category: PerformanceHealth
Enabled: True
Object Name: Memory
Counter Name: % Committed Bytes In Use
Frequency: 300
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: System.Performance.ThresholdMonitorType
Remotable: True
Accessibility: Public
Alert Message:
Memory Utilization has exceeded the upper threshold
Please see the alert context for details.
RunAs: Microsoft.HPC.RunAsProfile.AdminActionAccount

Source Code:

<UnitMonitor ID="Microsoft.HPC.2008R2.Monitor.BrokerNode.Performance.MemoryUtilization"
             Accessibility="Public"
             Enabled="true"
             Target="Microsoft.HPC.2008R2.BrokerNode"
             ParentMonitorID="Health!System.Health.PerformanceState"
             Remotable="true"
             Priority="Normal"
             RunAs="HPCLibrary!Microsoft.HPC.RunAsProfile.AdminActionAccount"
             TypeID="Performance!System.Performance.ThresholdMonitorType"
             ConfirmDelivery="false">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HPC.2008R2.Monitor.BrokerNode.Performance.MemoryUtilization_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="UnderThreshold" MonitorTypeStateID="UnderThreshold" HealthState="Success"/>
    <OperationalState ID="OverThreshold" MonitorTypeStateID="OverThreshold" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <CounterName>% Committed Bytes In Use</CounterName>
    <ObjectName>Memory</ObjectName>
    <InstanceName/>
    <AllInstances>false</AllInstances>
    <Frequency>300</Frequency>
    <Threshold>90</Threshold>
  </Configuration>
</UnitMonitor>