Memory utilization performance monitor for HPC 2008 R2 Workstation Node
This monitor tracks the memory utilization on a compute node or a workstation node. The memory utilization is based on the corresponding Windows performance counter.
This monitor is disabled by default. You can enable it and configure the memory threshold (90 percent by default) and the sampling frequency (300 seconds by default).
Sustained high memory usage is usually caused by a memory-intensive job occupying the node. It may cause the node to hang or significantly degrade the performance of the tasks running on that node.
To troubleshoot and fix this problem:
You can take the node offline so that the job scheduler does not schedule jobs on the node.
You can optionally add a recovery task in Operations Manager to automatically trigger custom actions if the memory utilization exceeds the configured threshold.
Target | Microsoft.HPC.2008R2.WorkstationNode
Parent Monitor | System.Health.PerformanceState
Category | PerformanceHealth
Enabled | False
Object Name | Memory
Counter Name | % Committed Bytes In Use
Frequency | 300 seconds
Alert Generate | True
Alert Severity | MatchMonitorHealth
Alert Priority | Normal
Alert Auto Resolve | True
Monitor Type | System.Performance.ThresholdMonitorType
Remotable | True
Accessibility | Public
Alert Message |
RunAs | Microsoft.HPC.RunAsProfile.AdminActionAccount
<UnitMonitor ID="Microsoft.HPC.2008R2.Monitor.WorkstationNode.Performance.MemoryUtilization" Accessibility="Public" Enabled="false" Target="Microsoft.HPC.2008R2.WorkstationNode" ParentMonitorID="Health!System.Health.PerformanceState" Remotable="true" Priority="Normal" RunAs="HPCLibrary!Microsoft.HPC.RunAsProfile.AdminActionAccount" TypeID="Performance!System.Performance.ThresholdMonitorType" ConfirmDelivery="false">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.HPC.2008R2.Monitor.WorkstationNode.Performance.MemoryUtilization_AlertMessageResourceID">
    <AlertOnState>Warning</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>MatchMonitorHealth</AlertSeverity>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="UnderThreshold" MonitorTypeStateID="UnderThreshold" HealthState="Success"/>
    <OperationalState ID="OverThreshold" MonitorTypeStateID="OverThreshold" HealthState="Warning"/>
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <CounterName>% Committed Bytes In Use</CounterName>
    <ObjectName>Memory</ObjectName>
    <InstanceName/>
    <AllInstances>false</AllInstances>
    <Frequency>300</Frequency>
    <Threshold>90</Threshold>
  </Configuration>
</UnitMonitor>
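The monitor definition above ships with Enabled="false", a Threshold of 90 and a Frequency of 300. Because it is disabled by default, enabling and tuning it is typically done through overrides stored in a separate, unsealed management pack. The fragment below is a minimal sketch of what such overrides might look like, not the shipped configuration. It assumes that the override pack declares a reference to the HPC 2008 R2 management pack under the hypothetical alias `HPC`, and that `Threshold` and `Frequency` are overridable parameters of the monitor type. The override IDs and the values 95 and 600 are purely illustrative.

```xml
<!-- Sketch only: belongs under <Monitoring> in a custom override management pack. -->
<!-- "HPC" is a hypothetical reference alias; the override IDs are made up for illustration. -->
<Overrides>
  <!-- Turn the monitor on for all workstation nodes. -->
  <MonitorPropertyOverride ID="Custom.HPC.Override.WorkstationNode.MemoryUtilization.Enabled" Context="HPC!Microsoft.HPC.2008R2.WorkstationNode" Enforced="false" Monitor="HPC!Microsoft.HPC.2008R2.Monitor.WorkstationNode.Performance.MemoryUtilization" Property="Enabled">
    <Value>true</Value>
  </MonitorPropertyOverride>
  <!-- Raise the warning threshold from 90 to an illustrative 95 percent. -->
  <MonitorConfigurationOverride ID="Custom.HPC.Override.WorkstationNode.MemoryUtilization.Threshold" Context="HPC!Microsoft.HPC.2008R2.WorkstationNode" Enforced="false" Monitor="HPC!Microsoft.HPC.2008R2.Monitor.WorkstationNode.Performance.MemoryUtilization" Parameter="Threshold">
    <Value>95</Value>
  </MonitorConfigurationOverride>
  <!-- Sample every 600 seconds instead of every 300 (illustrative value). -->
  <MonitorConfigurationOverride ID="Custom.HPC.Override.WorkstationNode.MemoryUtilization.Frequency" Context="HPC!Microsoft.HPC.2008R2.WorkstationNode" Enforced="false" Monitor="HPC!Microsoft.HPC.2008R2.Monitor.WorkstationNode.Performance.MemoryUtilization" Parameter="Frequency">
    <Value>600</Value>
  </MonitorConfigurationOverride>
</Overrides>
```

The same overrides can usually be created from the Operations Manager console instead of being authored by hand. The XML form is mainly useful when the settings need to be reviewed or deployed across management groups.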