Send Queue % Used

Microsoft.SystemCenter.HealthService.Performance.SendQueuePercentUsedMonitor (UnitMonitor)

This monitor measures the Health Service Management Groups\Send Queue % Used counter for the Health Service.

Knowledge Base article:

Summary

This monitor measures the Health Service Management Groups\Send Queue % Used counter and generates the following states:

Monitor State: Critical

Send Queue % Used Threshold: 60%

Causes

This can be caused by a low-bandwidth or high-latency connection from this System Center Management Health Service to its parent Management Server. It can also be caused by rules that collect more data than the parent Management Server can process, especially when the parent Management Server has many agents reporting to it and sending large amounts of data.

Resolutions

Check with your network administrators whether the network connection from the System Center Management Health Service to the parent Management Servers is saturated. If so, you may need to upgrade your network to accommodate the traffic.

If you cannot upgrade your network (e.g. if your System Center Management Health Service or Gateway Server is at a remote branch office), you can disable unnecessary collection rules. Below is a list of rule types you can disable and the impact of disabling each:

Rule Type: Performance Collection

Rule Purpose: Collects performance data to the Operational Database, the Data Warehouse, or both.

Impact when disabled: When a performance collection rule is disabled, any view that shows that performance data will no longer have data to display. If the rule was collecting data to the Data Warehouse, reports that depend on that performance data will no longer render any data.

Rule Type: Event Collection

Rule Purpose: Collects event data for diagnosis. In some cases, an event may not be helpful to alert on, but is helpful for either forensic troubleshooting or near-real-time troubleshooting.

Impact when disabled: When an event collection rule is disabled, any view that shows that event data will no longer have data to display. If the rule was collecting data to the Data Warehouse, reports that depend on that event will no longer render any data.

Lastly, if you still need that data, another option for reducing the amount of data sent over the network is to use optimized performance counter collection rules and event consolidation collection rules. The table below summarizes their benefits and explains how the data is summarized.

Rule Type

Benefit

How data is summarized

Optimized Performance Collection Rule

Only sends a performance data sample if it deviates from the last sample by more than a configured percentage. E.g., if the last sample was 42 and the rule is configured with a tolerance of 10%, the next sample is sent only if it falls outside 42 +/- 4.2 (i.e. it needs to be greater than 46.2 or less than 37.8).

Because only performance data that exceeds the configured tolerance is sent to the Operational Database or Data Warehouse, the data will be less precise. The larger the tolerance, the lower the precision.

Consolidated Event Collection Rule

This type of event collection rule sends the data if one of the parameters it is configured with differs from the last event. E.g., you can configure a consolidated collection rule to consolidate events where the following are identical:

  • Event Source

  • Event ID

  • Source Computer

  • Description

You can then configure a timeframe to consolidate these events (e.g. 10 minutes). If the above criteria match for any event within that 10-minute window, only 1 event is sent up, with its Repeat Count property incremented. If this event was occurring frequently on a single agent, at most 144 events (one per 10-minute window) would be sent up in a 24-hour period, which may be significantly fewer than the number of events actually logged to the event log.

You need to be aware of which event parameters and properties you consolidate on. Consolidating on the Description, for example, means that if the Event Description is typically unique (e.g. it contains a username), you will still get many events sent up. For that example, you would instead want to consolidate over the Event Parameter that represents the username field.

Also, having a very large consolidation window has two effects:

  • Delayed events viewable in the Event View or Reports (since the data needs to be consolidated until the end of that consolidation window)

  • Slightly higher resource utilization on the agent. With a low number of consolidation rules, this may be negligible. With a large number of these rule types compounded with long consolidation windows, the resource utilization will increase correspondingly.
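
As a rough illustration of the consolidation behavior described above, the following Python sketch groups events that match on a chosen set of fields within a fixed window and keeps one event per group with a repeat count. The function name `consolidate`, the dictionary field names, the alignment of windows to time zero, and the choice to count every occurrence in `repeat_count` are all assumptions made for this example; it is not the Health Service's actual implementation.

```python
def consolidate(events, window_seconds, key_fields):
    """Collapse events that share key_fields within the same time window.

    Hypothetical helper: each event is a dict with a 'time' in seconds.
    Returns one event per (window, key) with a 'repeat_count' field that
    counts every occurrence in that window (an assumption of this sketch).
    """
    consolidated = {}
    for event in events:
        # Fixed windows aligned to time zero (assumed for simplicity).
        window_index = int(event["time"] // window_seconds)
        key = (window_index,) + tuple(event[field] for field in key_fields)
        if key in consolidated:
            consolidated[key]["repeat_count"] += 1
        else:
            first = dict(event)
            first["repeat_count"] = 1
            consolidated[key] = first
    return list(consolidated.values())


# One identical event per minute for 24 hours on a single agent.
events = [
    {"time": t, "source": "App", "event_id": 100,
     "computer": "srv01", "description": "Service failed"}
    for t in range(0, 24 * 60 * 60, 60)
]
fields = ["source", "event_id", "computer", "description"]
sent = consolidate(events, 10 * 60, fields)
print(len(events), "logged,", len(sent), "sent")  # 1440 logged, 144 sent
```

If the `description` value were different for every event (for instance because it embeds a username), every key in this sketch would be unique and nothing would be consolidated, which is the pitfall noted above.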

See the product help or navigate to the Authoring space in the console to create the type of rules mentioned above.
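
The tolerance check behind optimized performance collection, as described in the table above, can be sketched in a few lines of Python. This is a minimal illustration that assumes each new sample is compared against the last sample that was actually sent; the function name `filter_samples` and the sample values are hypothetical and not part of the product.

```python
def filter_samples(samples, tolerance_pct):
    """Return the samples that would be sent under a percentage tolerance.

    Assumption of this sketch: a sample is sent only when it differs from
    the last *sent* sample by more than tolerance_pct percent of it.
    """
    sent = []
    last_sent = None
    for value in samples:
        if last_sent is None:
            # The first sample is always sent so there is a baseline.
            sent.append(value)
            last_sent = value
            continue
        allowed = abs(last_sent) * tolerance_pct / 100.0
        if abs(value - last_sent) > allowed:
            sent.append(value)
            last_sent = value
    return sent


# With a 10% tolerance around 42, values between 37.8 and 46.2 are dropped.
print(filter_samples([42, 43, 45, 46.5, 47, 41, 37], 10))  # [42, 46.5, 41]
```

Only three of the seven samples are forwarded here, which shows both the bandwidth saving and the loss of precision that comes with a larger tolerance.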

Element properties:

Target: Microsoft.SystemCenter.HealthService
Parent Monitor: Microsoft.SystemCenter.HealthService.PerformanceHealthRollup
Category: PerformanceHealth
Enabled: True
Instance Name: Health Service Management Groups
Counter Name: Send Queue % Used
Frequency: 60
Alert Generate: True
Alert Severity: Error
Alert Priority: High
Alert Auto Resolve: True
Monitor Type: System.Performance.ConsecutiveSamplesThreshold
Remotable: True
Accessibility: Public
Alert Message:
{0}: The Health Service send queue on this system is filling up
When the System Center Management Health Service is receiving data faster than it can send that data out, it will begin queuing the excess data. The queue has a fixed size and if that is reached, then the System Center Management Health Service will start grooming data out of the queue. When this alert was generated, the "Send Queue % Used" counter for this system was {0}. Refer to the knowledge for more details on possible causes and troubleshooting steps.
RunAs: Default

Source Code:

<UnitMonitor ID="Microsoft.SystemCenter.HealthService.Performance.SendQueuePercentUsedMonitor" Accessibility="Public" Enabled="true" Target="SCLibrary!Microsoft.SystemCenter.HealthService" ParentMonitorID="Microsoft.SystemCenter.HealthService.PerformanceHealthRollup" Remotable="true" Priority="Normal" TypeID="Performance!System.Performance.ConsecutiveSamplesThreshold" ConfirmDelivery="false">
  <Category>PerformanceHealth</Category>
  <AlertSettings AlertMessage="Microsoft.SystemCenter.HealthService.Performance.SendQueuePercentUsedMonitor.AlertMessage">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>High</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Value$</AlertParameter1>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="BelowThreshold" MonitorTypeStateID="ConditionFalse" HealthState="Success"/>
    <OperationalState ID="OverThreshold" MonitorTypeStateID="ConditionTrue" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <CounterName>Send Queue % Used</CounterName>
    <ObjectName>Health Service Management Groups</ObjectName>
    <InstanceName>$Target/ManagementGroup/Name$</InstanceName>
    <AllInstances>false</AllInstances>
    <Frequency>60</Frequency>
    <Threshold>90</Threshold>
    <Direction>greaterequal</Direction>
    <NumSamples>5</NumSamples>
  </Configuration>
</UnitMonitor>
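
To make the Configuration values above easier to reason about, here is a small Python sketch of how a consecutive-samples threshold check might behave with Threshold 90, Direction greaterequal, and NumSamples 5. The function name `evaluate` is hypothetical, and the rule that a single sample below the threshold resets the count is an assumption about the monitor type rather than something stated in this page.

```python
def evaluate(samples, threshold=90, num_samples=5):
    """Return the operational state after each sample.

    Assumed semantics: the state is OverThreshold only while the most
    recent num_samples samples are all >= threshold; any lower sample
    resets the consecutive count and returns the state to BelowThreshold.
    """
    states = []
    consecutive = 0
    for value in samples:
        if value >= threshold:  # Direction: greaterequal
            consecutive += 1
        else:
            consecutive = 0
        if consecutive >= num_samples:
            states.append("OverThreshold")
        else:
            states.append("BelowThreshold")
    return states


# Five samples in a row at or above 90 are needed before the state changes.
print(evaluate([95, 92, 90, 99, 91])[-1])      # OverThreshold
print(evaluate([95, 92, 90, 99, 80, 91])[-1])  # BelowThreshold
```

Under these assumptions, and with a Frequency of 60 seconds, the counter would have to stay at or above the threshold for roughly five minutes before the monitor changes state.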