Error Group Computers Monitor

Microsoft.SystemCenter.CM.AEM.Views.Internal.ComputersAffectedWatsonBucketThreshold (AggregateMonitor)

This monitor checks the number of computers affected by this error group.

Knowledge Base article:

Summary

This monitor checks the total number of unique computers experiencing crashes for an error group. An alert indicates that the number of unique computers reporting application errors to AEM for this error group has exceeded the threshold.
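
The counting rule described above can be sketched in Python. This is a minimal illustration, not AEM's implementation. The `ErrorReport` record, the `unique_computers` and `exceeds_threshold` helpers, the error-group names, and the sampling window length are all hypothetical. The sketch also assumes that "exceeded" means strictly greater than the threshold. Several reports from the same computer count as one computer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ErrorReport:
    # Hypothetical shape of one crash report received by AEM.
    computer: str
    error_group: str
    reported_at: datetime


def unique_computers(reports, error_group, window_end, window):
    """Count distinct computers that reported `error_group` inside the sampling window."""
    window_start = window_end - window
    return len({
        r.computer
        for r in reports
        if r.error_group == error_group and window_start <= r.reported_at <= window_end
    })


def exceeds_threshold(reports, error_group, window_end, window, threshold):
    """Return (exceeded, count). Assumption: 'exceeded' means count > threshold."""
    count = unique_computers(reports, error_group, window_end, window)
    return count > threshold, count


# Usage with made-up data.
now = datetime(2024, 1, 1, 12, 0)
reports = [
    ErrorReport("PC01", "bucket-A", now - timedelta(hours=1)),
    ErrorReport("PC01", "bucket-A", now - timedelta(minutes=30)),  # same computer, counted once
    ErrorReport("PC02", "bucket-A", now - timedelta(hours=2)),
    ErrorReport("PC03", "bucket-B", now - timedelta(hours=1)),     # different error group
    ErrorReport("PC04", "bucket-A", now - timedelta(days=3)),      # outside a one-day window
]
print(exceeds_threshold(reports, "bucket-A", now, timedelta(days=1), threshold=1))
# (True, 2)
```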

Causes

The number of unique computers that reported errors to AEM within the sampling period has exceeded the threshold value. The underlying cause depends on the application that is crashing. The crashes could also result from hardware problems on the computers reporting to AEM.

Resolution

Additional steps that can be taken to diagnose the problem are as follows:

If transmission of error reports has been turned on, check whether a Microsoft solution is available. If one is, it can provide additional information for resolving the problem.

Review the Error Events view to obtain the link to the crash's cab file in the Persisted Cabs folder. If the error report is old, the cab file may already have been groomed out.

For applications that do not receive a Microsoft Solution URL response, the application support or application development team will need the crash information.

A Service Pack or Quick Fix Engineering (QFE) update recently applied to AEM-managed computers could cause a sudden rise in the number of errors across those computers. Check whether any such software update was recently applied.

Further diagnosis requires reviewing the related Top N Error group report, which ranks error groups by aggregated count, to determine whether this is the most critical error group.
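
The ranking that the Top N Error group report performs can be illustrated with a short Python sketch. It assumes a hypothetical input of one error-group identifier per error report. `top_n_error_groups` and `is_most_critical` are illustrative names. The real report is produced by Operations Manager reporting, not by code like this.

```python
from collections import Counter


def top_n_error_groups(report_groups, n):
    """Rank error groups by aggregated error count, highest first (ties keep first-seen order)."""
    return Counter(report_groups).most_common(n)


def is_most_critical(report_groups, group):
    """True when `group` has the highest aggregated count; a tie for first also counts."""
    counts = Counter(report_groups)
    return bool(counts) and counts[group] == max(counts.values())


# Usage with made-up data: one entry per error report.
report_groups = ["bucket-A", "bucket-B", "bucket-A", "bucket-C", "bucket-A", "bucket-B"]
print(top_n_error_groups(report_groups, 2))
# [('bucket-A', 3), ('bucket-B', 2)]
print(is_most_critical(report_groups, "bucket-B"))
# False
```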

Element properties:

Target: Microsoft.SystemCenter.CM.AEM.WatsonBucket
Parent Monitor: System.Health.AvailabilityState
Algorithm: WorstOf
Category: Alert
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: High
Alert Auto Resolve: True
Remotable: True
Accessibility: Public
Alert Message:
Error Group Computers Monitor
The number of unique computers experiencing crashes for the error group have exceeded the threshold. The total number of unique computers impacted are {0}. The threshold value for the monitor is {1}.
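
The WorstOf algorithm listed above means that the aggregate monitor takes the least healthy state among its member monitors. The following Python sketch models that rollup together with the Error-only alert condition (`AlertOnState` is `Error` in the source code). The `HealthState` enum and its numeric ordering are assumptions made for illustration. Treating an empty member list as healthy is also an assumption.

```python
from enum import IntEnum


class HealthState(IntEnum):
    # Hypothetical ordering: a larger value means a less healthy state.
    SUCCESS = 1
    WARNING = 2
    ERROR = 3


def worst_of(member_states):
    """WorstOf rollup: return the least healthy member state.

    Assumption: with no member states, the aggregate is treated as SUCCESS.
    """
    return max(member_states, default=HealthState.SUCCESS)


def should_alert(state):
    # Mirrors <AlertOnState>Error</AlertOnState>: alert only when the rollup is Error.
    return state == HealthState.ERROR


# Usage: one failing member drives the whole aggregate to Error.
rollup = worst_of([HealthState.SUCCESS, HealthState.ERROR, HealthState.WARNING])
print(rollup.name, should_alert(rollup))
# ERROR True
```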

Source Code:

<AggregateMonitor ID="Microsoft.SystemCenter.CM.AEM.Views.Internal.ComputersAffectedWatsonBucketThreshold" Target="AEMLib!Microsoft.SystemCenter.CM.AEM.WatsonBucket" Accessibility="Public" Enabled="true" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal">
  <Category>Alert</Category>
  <AlertSettings AlertMessage="Microsoft.SystemCenter.CM.AEM.Views.Internal.ComputersAffectedWatsonBucketThreshold.AlertMessage">
    <!-- Raise an alert only when the rolled-up state is Error. -->
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>High</AlertPriority>
    <AlertParameters>
      <!-- Fills {0} in the alert message (unique computers impacted). -->
      <AlertParameter1>$Data/Context/Columns/Column[1]$</AlertParameter1>
      <!-- Fills {1} in the alert message (threshold value). -->
      <AlertParameter2>$Data/Context/Columns/Column[3]$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <!-- Health state is the worst state among the member monitors. -->
  <Algorithm>WorstOf</Algorithm>
</AggregateMonitor>
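
In the source code, `AlertParameter1` and `AlertParameter2` supply the values for the `{0}` and `{1}` placeholders in the alert message. The Python sketch below mimics that positional substitution with `str.format`. The `render_alert` helper is hypothetical, and so are the sample values standing in for `$Data/Context/Columns/Column[1]$` and `$Data/Context/Columns/Column[3]$`. Operations Manager performs the real substitution itself.

```python
# Alert description copied verbatim from the monitor's alert message.
ALERT_DESCRIPTION = (
    "The number of unique computers experiencing crashes for the error group "
    "have exceeded the threshold. The total number of unique computers impacted "
    "are {0}. The threshold value for the monitor is {1}."
)


def render_alert(description, alert_parameters):
    """Hypothetical helper: AlertParameter1 fills {0}, AlertParameter2 fills {1}, and so on."""
    return description.format(*alert_parameters)


# Made-up values standing in for Column[1] (computers impacted) and Column[3] (threshold).
params = ["42", "25"]
print(render_alert(ALERT_DESCRIPTION, params))
```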