Error Group Error Count Monitor

Microsoft.SystemCenter.CM.AEM.Views.Internal.HitCountWatsonBucketThreshold (AggregateMonitor)

This monitor checks the number of errors reported for this error group.

Knowledge Base article:

Summary

This monitor checks the total number of errors reported for an error group. An alert from this monitor means that the number of application errors reported to AEM for the error group has exceeded the threshold.

Causes

The number of errors reported to AEM within the sampling period has exceeded the threshold value. Several factors can cause this, depending on which application in this error group is crashing. Hardware problems on the computers reporting to AEM can also be responsible.
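The condition described above can be sketched as a count over a time window. This is only an illustration and not the management pack's actual implementation: the function name, its parameters, and the choice of a strict "greater than" comparison are assumptions made for the example.

```python
from datetime import datetime, timedelta


def exceeds_threshold(error_times, window_end, sampling_period, threshold):
    """Return True when more than `threshold` errors fall inside the window.

    All names here are hypothetical; the comparison is strict because the
    text says the count has "exceeded" the threshold.
    """
    window_start = window_end - sampling_period
    hits = sum(1 for t in error_times if window_start < t <= window_end)
    return hits > threshold


# Example: four reported errors, three of them within the last hour.
now = datetime(2024, 1, 1, 12, 0)
reported = [now - timedelta(minutes=m) for m in (1, 5, 20, 90)]
print(exceeds_threshold(reported, now, timedelta(hours=1), threshold=2))
```

Errors older than the sampling period do not count toward the total, which is why a burst of crashes matters more than the same number spread over a long time.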

Resolution

Take the following steps to diagnose the problem:

If transmission of error reports has been turned on, check whether a Microsoft solution is available. If one is available, it can provide additional information to help resolve the problem.

Review the Error Events view to obtain the link to the cab file for the crash in the Persisted Cabs folder. If this is an old error report, the cab file may already have been groomed out.

For applications that do not receive a Microsoft Solution URL response, the application support or application development team will need the crash information.

A Service Pack or Quick Fix (QFE) recently applied to AEM-managed computers could cause a sudden rise in the number of errors across these computers. Check whether any such software update was applied.

Further diagnosis requires reviewing the related 'Top N Error Groups' report to determine whether this is the most critical error group by aggregated count.

Element properties:

Target: Microsoft.SystemCenter.CM.AEM.WatsonBucket
Parent Monitor: System.Health.AvailabilityState
Algorithm: WorstOf
Category: Alert
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: High
Alert Auto Resolve: True
Remotable: True
Accessibility: Public
Alert Message
Error Group Error Count Monitor
The number of crashes for the error group has exceeded the threshold. The total number of crashes recorded are {0}. The threshold value for the monitor is {1}.
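The {0} and {1} placeholders in the alert message are positional: as far as the source code for this monitor shows, they are filled from AlertParameter1 and AlertParameter2 respectively. The sketch below mimics that substitution with ordinary string formatting. The function name and the sample values 125 and 100 are made up for illustration.

```python
# Alert message text copied from this monitor's definition.
ALERT_MESSAGE = (
    "The number of crashes for the error group has exceeded the threshold. "
    "The total number of crashes recorded are {0}. "
    "The threshold value for the monitor is {1}."
)


def render_alert(message, *alert_parameters):
    """Fill positional placeholders: {0} gets the first parameter, {1} the second."""
    return message.format(*alert_parameters)


# Hypothetical values: 125 recorded crashes against a threshold of 100.
print(render_alert(ALERT_MESSAGE, 125, 100))
```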

Source Code:

<AggregateMonitor ID="Microsoft.SystemCenter.CM.AEM.Views.Internal.HitCountWatsonBucketThreshold" Target="AEMLib!Microsoft.SystemCenter.CM.AEM.WatsonBucket" Accessibility="Public" Enabled="true" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal">
  <Category>Alert</Category>
  <AlertSettings AlertMessage="Microsoft.SystemCenter.CM.AEM.Views.Internal.HitCountWatsonBucketThreshold.AlertMessage">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>High</AlertPriority>
    <AlertParameters>
      <AlertParameter1>$Data/Context/Columns/Column[1]$</AlertParameter1>
      <AlertParameter2>$Data/Context/Columns/Column[3]$</AlertParameter2>
    </AlertParameters>
  </AlertSettings>
  <Algorithm>WorstOf</Algorithm>
</AggregateMonitor>
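The WorstOf algorithm named in the definition above means the aggregate monitor takes on the most severe state found among its child monitors. A minimal sketch of that rollup follows. The enum, its numeric values, and the function name are illustrative assumptions, not Operations Manager's internal representation, and treating an empty set of children as uninitialized is also an assumption.

```python
from enum import IntEnum


class HealthState(IntEnum):
    # Numeric values are illustrative; they only need to order by severity.
    UNINITIALIZED = 0
    SUCCESS = 1
    WARNING = 2
    ERROR = 3


def worst_of(child_states):
    """Roll up child monitor states by returning the most severe one."""
    return max(child_states, default=HealthState.UNINITIALIZED)


# One child in the Error state is enough to put the aggregate in Error.
children = [HealthState.SUCCESS, HealthState.ERROR, HealthState.WARNING]
print(worst_of(children).name)
```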