Ambari Timeout monitor

Ambari.SCOM.TimeoutMonitor (UnitMonitor)

This monitor detects timeouts on connections to the Ambari API.

Knowledge Base article:

Summary

This monitor detects timeouts on connections to the Ambari API.

Causes

This monitor changes to the Error state when the Ambari Management Pack gets repeated timeouts while trying to connect to the Ambari server. The Ambari server might be overloaded.

Resolutions

- Check whether the Ambari server is overloaded.

- Check for network issues that may affect the connection between the watcher and the Ambari server.

Element properties:

Target: Ambari.SCOM.AmbariWatcherNode
Parent Monitor: System.Health.AvailabilityState
Category: Custom
Enabled: True
Alert Generate: True
Alert Severity: Error
Alert Priority: Normal
Alert Auto Resolve: True
Monitor Type: Ambari.SCOM.RepeatedEventLogTimer2StateMonitorType
Remotable: True
Accessibility: Public
Alert Message: Timeout errors on the watcher {0} during connection to Hadoop Ambari server.
RunAs: Default

Source Code:

<UnitMonitor ID="Ambari.SCOM.TimeoutMonitor" Accessibility="Public" Enabled="true" Target="Ambari.SCOM.AmbariWatcherNode" ParentMonitorID="Health!System.Health.AvailabilityState" Remotable="true" Priority="Normal" TypeID="Ambari.SCOM.RepeatedEventLogTimer2StateMonitorType" ConfirmDelivery="true">
  <Category>Custom</Category>
  <AlertSettings AlertMessage="Ambari.SCOM.TimeoutMonitor.AlertMessage">
    <AlertOnState>Error</AlertOnState>
    <AutoResolve>true</AutoResolve>
    <AlertPriority>Normal</AlertPriority>
    <AlertSeverity>Error</AlertSeverity>
    <AlertParameters>
      <AlertParameter1>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</AlertParameter1>
    </AlertParameters>
  </AlertSettings>
  <OperationalStates>
    <OperationalState ID="HealthyState" MonitorTypeStateID="TimerEventRaised" HealthState="Success"/>
    <OperationalState ID="ErrorState" MonitorTypeStateID="RepeatedEventRaised" HealthState="Error"/>
  </OperationalStates>
  <Configuration>
    <ComputerName>$Target/Host/Property[Type="Windows!Microsoft.Windows.Computer"]/NetworkName$</ComputerName>
    <LogName>Operations Manager</LogName>
    <EventNumber>1904</EventNumber>
    <TimeInterval>4500</TimeInterval>
    <Count>3</Count>
  </Configuration>
</UnitMonitor>
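
Illustration:

The Configuration block above pairs a Count of 3 with a TimeInterval of 4500 for event 1904, and the two operational states map RepeatedEventRaised to Error and TimerEventRaised to Success. The following Python sketch is one possible reading of that behaviour, not the management pack's actual implementation, which this page does not show. The class RepeatedEventMonitor and its methods are hypothetical names, the rolling-window logic is an assumption, and TimeInterval is assumed to be in seconds.

```python
from collections import deque


class RepeatedEventMonitor:
    """Toy model of a repeated-event / timer two-state monitor (hypothetical)."""

    def __init__(self, count=3, time_interval=4500):
        self.count = count                  # <Count> from the configuration
        self.time_interval = time_interval  # <TimeInterval>, assumed to be seconds
        self.events = deque()               # timestamps of events still in the window
        self.state = "Success"

    def _expire(self, now):
        # Drop events that are older than the assumed rolling window.
        while self.events and now - self.events[0] >= self.time_interval:
            self.events.popleft()

    def on_event(self, timestamp):
        # A matching event (for example, event 1904) was logged at `timestamp`.
        self._expire(timestamp)
        self.events.append(timestamp)
        if len(self.events) >= self.count:
            self.state = "Error"            # corresponds to RepeatedEventRaised
        return self.state

    def on_timer(self, now):
        # Timer tick: return to healthy once no recent events remain.
        self._expire(now)
        if not self.events:
            self.state = "Success"          # corresponds to TimerEventRaised
        return self.state


monitor = RepeatedEventMonitor()
first = monitor.on_event(0)
second = monitor.on_event(100)
third = monitor.on_event(200)     # third event inside the window
recovered = monitor.on_timer(5000)  # all three events have aged out
```

In this model the third event inside the window moves the monitor to Error, and a later timer tick with no remaining events returns it to Success, which matches the Alert Auto Resolve setting in spirit.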