Memory utilization performance monitor for HPC 2008 R2 Broker Node
This monitor tracks memory utilization on a broker node. It samples the Windows performance counter Memory\% Committed Bytes In Use.
This monitor is enabled by default, as the Enabled setting in the definition below shows. You can configure the memory threshold (90 percent by default) and the sampling frequency (300 seconds by default).
Sustained high memory usage is usually caused by the node managing a large number of service-oriented architecture (SOA) sessions. If the node stays occupied for an extended period, other jobs may wait in the queue longer than usual.
To troubleshoot and fix this problem:
You can take the node offline so that the job scheduler does not schedule jobs on the node.
You can optionally enable the recovery task in Operations Manager to take a broker node offline automatically if the memory utilization exceeds the configured threshold. However, to use the broker node again after the utilization goes below the threshold, you must manually bring the broker node online.
Target | Microsoft.HPC.2008R2.BrokerNode
Parent Monitor | System.Health.PerformanceState
Category | PerformanceHealth
Enabled | True
Object Name | Memory
Counter Name | % Committed Bytes In Use
Frequency | 300
Alert Generate | True
Alert Severity | Error
Alert Priority | Normal
Alert Auto Resolve | True
Monitor Type | System.Performance.ThresholdMonitorType
Remotable | True
Accessibility | Public
Alert Message |
RunAs | Microsoft.HPC.RunAsProfile.AdminActionAccount
<UnitMonitor ID="Microsoft.HPC.2008R2.Monitor.BrokerNode.Performance.MemoryUtilization"
             Accessibility="Public"
             Enabled="true"
             Target="Microsoft.HPC.2008R2.BrokerNode"
             ParentMonitorID="Health!System.Health.PerformanceState"
             Remotable="true"
             Priority="Normal"
             RunAs="HPCLibrary!Microsoft.HPC.RunAsProfile.AdminActionAccount"
             TypeID="Performance!System.Performance.ThresholdMonitorType"
             ConfirmDelivery="false">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HPC.2008R2.Monitor.BrokerNode.Performance.MemoryUtilization_AlertMessageResourceID">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="UnderThreshold" MonitorTypeStateID="UnderThreshold" HealthState="Success"/>
    <OperationalState ID="OverThreshold" MonitorTypeStateID="OverThreshold" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <CounterName>% Committed Bytes In Use</CounterName>
    <ObjectName>Memory</ObjectName>
    <InstanceName/>
    <AllInstances>false</AllInstances>
    <Frequency>300</Frequency>
    <Threshold>90</Threshold>
  </Configuration>
</UnitMonitor>
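The threshold and sampling frequency in the definition above are normally changed through an override rather than by editing the sealed management pack. The fragment below is a minimal sketch of how such overrides might look inside a custom, unsealed management pack. It assumes that the pack's manifest declares a reference to the HPC management pack under the alias `HPC`; that alias, the two override IDs, and the values 80 and 120 are hypothetical and chosen only for illustration.

```xml
<!-- Sketch only: this belongs in the Monitoring section of a custom, unsealed
     management pack. The "HPC" alias and both override IDs are assumptions. -->
<Overrides>
  <!-- Go to the Error state at 80 percent committed bytes instead of 90 -->
  <MonitorConfigurationOverride ID="Custom.HPC.BrokerNode.MemoryUtilization.Threshold.Override"
                                Context="HPC!Microsoft.HPC.2008R2.BrokerNode"
                                Enforced="false"
                                Monitor="HPC!Microsoft.HPC.2008R2.Monitor.BrokerNode.Performance.MemoryUtilization"
                                Parameter="Threshold">
    <Value>80</Value>
  </MonitorConfigurationOverride>
  <!-- Sample the counter every 120 seconds instead of every 300 -->
  <MonitorConfigurationOverride ID="Custom.HPC.BrokerNode.MemoryUtilization.Frequency.Override"
                                Context="HPC!Microsoft.HPC.2008R2.BrokerNode"
                                Enforced="false"
                                Monitor="HPC!Microsoft.HPC.2008R2.Monitor.BrokerNode.Performance.MemoryUtilization"
                                Parameter="Frequency">
    <Value>120</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

Because `Context` names the broker node class, overrides written this way would apply to every broker node. The same changes can also be made from the monitor's Overrides menu in the Operations Manager console, which generates equivalent XML.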