Alarms & Conditions


Overview

The STM can generate an alarm or event based on a condition, such as a threshold being exceeded. For example, on a 10G interface, a critical alarm can be flagged if the rate exceeds 9G continuously for 30 seconds.

The GUI provides menus for creating and modifying conditions. Alarm conditions can also be configured via the CLI (examples below). The GUI also provides a window where alarm status is tracked: all currently active alarms are listed, color-coded by criticality, and clicking on an alarm displays all information about it.
Each time a condition is triggered, whether or not it results in an alarm, the event is captured by the history collector with the same information that appears in an alarm.

Key notes on alarms are as follows:

  • All objects in the system are continuously monitored
  • One or more conditions can be specified, based on the attributes of an object - for example "any application with distress>0.1 and total_rate>1 Mbit/sec"
  • Every condition is associated with one or more actions
  • An action can specify a variety of things to happen when it is triggered by a condition, such as making a configuration change or creating an alarm
  • When an action is triggered, an alarm object is created. This records that the action was triggered and contains information about the triggering condition, such as the specific object and attribute value that caused it
  • Alarms can be monitored via the REST API and can be updated, e.g. to record that they have been noted or acted upon
  • When the alarm condition is cleared, the alarm itself is marked as cleared. At that point it is possible to restore changed attribute values, set them explicitly, and/or run a script
  • Every alarm is recorded in the history, and the GUI provides a means of superimposing alarms on history and real-time graphs

Descriptions

  • Condition: a condition object represents a condition to be monitored. It specifies an object, a condition to be monitored, and one or more actions to be taken if the condition is met. Conditions are explicitly created via the REST API. The action to be taken is chosen from:
      • make a configuration change, i.e. set the value of an attribute of the object which triggered it
      • send an SNMP trap
      • run a script
      • send an email / SMS / etc
      • create an alarm, with specified information associated with it
  • Alarm: an alarm is created automatically by an action when it is triggered. An alarm records the triggering object, one or more associated attribute values, creation time, criticality level, and status. It also includes some tracking facilities, such as when state changes were made and by whom.

Conditions

A condition has the following attributes (in addition to the common attributes such as name and creation time):

  • object class: the class of object to be monitored (e.g. application, user)
  • filter condition: a condition to be evaluated. This uses the same syntax as a REST '?with' filter, e.g. 'distress>0.1,total_rate>1000'.
  • clearing filter condition: the condition evaluated to decide whether the alarm can be cleared. It is written in the same sense as the filter condition, i.e. the clearing filter must test false for clearing to be possible
  • applicable name: a (possibly empty) set of possibly wildcarded object names that this condition applies to, e.g. '*.youtube.*,*.google.*'. If nothing is specified (the default), the condition will match any object of the specified class
  • applicable groups: a (possibly empty) set of groups that the object must be part of for the condition to apply. Membership of any one of the groups is enough.
  • delay: a delay during which the condition must remain true before the condition is triggered, to avoid triggering on short-term spikes
  • clearing delay: delay during which the clearing condition must be false, before the alarm is automatically cleared
  • enabled: if false, the condition will not trigger
  • severity: a value selected from minor/major/critical/warning 
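
The way the filter, applicable name, and applicable groups attributes combine can be sketched in Python. This is an illustration only, not the STM implementation: the function names, the dictionary layout of an object, the shell-style interpretation of wildcards, and the small set of comparison operators are all assumptions. The only behaviour taken from the text is that comma-separated filter terms must all hold, that an empty name set matches any object, and that membership of any one group is enough.

```python
import operator
import re
from fnmatch import fnmatchcase

# Operators understood by this sketch (an assumption, not the full filter grammar).
_OPS = {">=": operator.ge, "<=": operator.le, "!=": operator.ne,
        ">": operator.gt, "<": operator.lt, "=": operator.eq}
_TERM = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|>|<|=)\s*(\S+)\s*$")


def filter_matches(filter_text, attrs):
    """Return True if every comma-separated term holds for attrs."""
    for term in filter_text.split(","):
        m = _TERM.match(term)
        if m is None:
            # Assumption: a bare attribute name is a boolean test.
            if not attrs.get(term.strip(), False):
                return False
            continue
        name, op, value = m.groups()
        if name not in attrs:
            return False
        if not _OPS[op](float(attrs[name]), float(value)):
            return False
    return True


def condition_applies(obj, filter_text, names=(), groups=()):
    """Check name patterns, then group membership, then the filter."""
    # An empty name set matches any object of the class.
    if names and not any(fnmatchcase(obj["name"], p) for p in names):
        return False
    # Membership of any one of the listed groups is enough.
    if groups and not set(groups) & set(obj.get("groups", ())):
        return False
    return filter_matches(filter_text, obj["attrs"])


# Hypothetical application object used only for illustration.
app = {"name": "www.youtube.com", "groups": ["video"],
       "attrs": {"distress": 0.2, "total_rate": 1500}}
matched = condition_applies(app, "distress>0.1,total_rate>1000",
                            names=["*.youtube.*", "*.google.*"])
```

Here `matched` is true because the name matches the first pattern and both filter terms hold; adding `groups=["bad_places"]` would make it false, since the object is not in that group.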

The following attributes describe the action to be taken when the condition is triggered.

  • change attribute: the attribute to be changed (if specified)
  • attribute value: the new value for the attribute (a constant)
  • clear value: the value to be set when the alarm is cleared
  • restore value: a boolean indicating (if True) that the attribute value should be restored to its former value when the alarm is cleared
  • script: the name of a script object to run
  • clearing script: the name of a script object to run when the alarm is cleared
  • email address: the destination address of an email to be sent (it's assumed that other message types, e.g. SMS, can be sent using an email address)
  • email subject: the subject line to be used in the email
  • email body: the body of the text to be sent in the email. 
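
The interaction between change attribute, attribute value, clear value, and restore value can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and dictionary keys are hypothetical, and the choice that restore value takes precedence over clear value when both are set is an assumption, since the text does not say which wins.

```python
def apply_action(obj, action):
    """Apply the attribute change and return the previous value."""
    attr = action.get("change_attribute")
    if attr is None:
        return None  # this action does not change an attribute
    previous = obj.get(attr)
    obj[attr] = action["attribute_value"]
    return previous


def clear_action(obj, action, previous):
    """Undo or override the attribute change when the alarm is cleared."""
    attr = action.get("change_attribute")
    if attr is None:
        return
    if action.get("restore_value"):
        # Assumption: restoring the former value wins over clear_value.
        obj[attr] = previous
    elif "clear_value" in action:
        obj[attr] = action["clear_value"]
    # Otherwise the changed value is left in place.


# Hypothetical host object and action, for illustration only.
host = {"policy": "normal"}
action = {"change_attribute": "policy", "attribute_value": "hostile",
          "restore_value": True}
saved = apply_action(host, action)   # host["policy"] is now "hostile"
clear_action(host, action, saved)    # host["policy"] is back to "normal"
```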

Alarms

An alarm object has the following attributes (in addition to the common named_object attributes):

  • object name: the name of the object that triggered the condition
  • condition: the name of the condition that triggered the action
  • severity: the severity of the alarm, as defined in the triggering condition
  • cleared_time: the time the alarm was cleared, or blank if it has not yet been cleared
  • acknowledged_time: the time the alarm was acknowledged, or blank if it has not been acknowledged
  • status changes: a set of zero or more status change objects, each containing the time, administrator name, previous and new status, and a description (free-form text)
  • attribute values: a snapshot of the values of all attributes specified in the filter, at the time the alarm object was created

In addition, the attribute acknowledged can be set to true to acknowledge the alarm. This automatically sets the acknowledged_time.

Alarm objects are automatically deleted once a certain time has passed since they were cleared, by default one hour.
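
The acknowledgement and retention rules above can be modelled with a small Python sketch. The class, its method names, and the `created_time` field are illustrative assumptions rather than the STM's data model; only the one-hour default retention and the automatic setting of acknowledged_time come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RETENTION = timedelta(hours=1)  # default retention after clearing


@dataclass
class Alarm:
    object_name: str
    condition: str
    severity: str
    created_time: datetime
    cleared_time: Optional[datetime] = None
    acknowledged_time: Optional[datetime] = None

    def acknowledge(self, now):
        """Setting 'acknowledged' records the time once."""
        if self.acknowledged_time is None:
            self.acknowledged_time = now

    def clear(self, now):
        """Mark the alarm as cleared, keeping the first clearing time."""
        if self.cleared_time is None:
            self.cleared_time = now

    def is_expired(self, now, retention=RETENTION):
        """True once the retention period has passed since clearing."""
        if self.cleared_time is None:
            return False
        return now - self.cleared_time >= retention


t0 = datetime(2024, 1, 1, 12, 0, 0)
alarm = Alarm("www.youtube.com", "c1", "major", created_time=t0)
alarm.acknowledge(t0 + timedelta(minutes=5))
alarm.clear(t0 + timedelta(minutes=10))
expired_early = alarm.is_expired(t0 + timedelta(minutes=30))
expired_late = alarm.is_expired(t0 + timedelta(minutes=70))
```

In this run `expired_early` is false and `expired_late` is true, because the alarm was cleared ten minutes after creation and the one-hour retention ends seventy minutes after creation.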

Configuration Examples

The following example creates an alarm condition which will be triggered when an application has a total_rate greater than 10000 (10 Mbit/sec). Note the quoted syntax for the filter. All other attributes take their default values. A trap will be generated for the alarm.

cli# condition c1 object_class application filter 'total_rate>10000'

The following example creates an alarm condition which will be triggered if the rate for a geolocation in the group 'bad_places' exceeds 5 Mbit/sec. It will not clear until the rate has been below 1 Mbit/sec for at least 10 minutes.

cli# condition c2 object_class geolocation filter 'total_rate>5000' group bad_places clearing_filter 'total_rate<1000' clear_delay 10m
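
The trigger-and-clear behaviour described for c2 can be simulated in Python to see how the clearing delay suppresses premature clearing. This is a sketch under assumptions, not the STM's evaluation logic: the class and its method are hypothetical, times are plain seconds, the trigger delay is taken to be zero (the actual default is not stated here), and the clearing test is expressed directly as "rate below 1 Mbit/sec".

```python
class HysteresisAlarm:
    """Trigger after `delay`; clear only after `clear_delay` of calm."""

    def __init__(self, trigger, clear, delay=0.0, clear_delay=0.0):
        self.trigger = trigger          # predicate: should the alarm raise?
        self.clear = clear              # predicate: may the alarm clear?
        self.delay = delay
        self.clear_delay = clear_delay
        self.active = False
        self._since = None              # start of the current streak

    def update(self, now, value):
        """Feed one sample; return whether the alarm is active."""
        wanted = self.clear if self.active else self.trigger
        limit = self.clear_delay if self.active else self.delay
        if wanted(value):
            if self._since is None:
                self._since = now
            if now - self._since >= limit:
                self.active = not self.active
                self._since = None
        else:
            # The streak is broken, so the delay starts over.
            self._since = None
        return self.active


# Rates in kbit/sec, matching 'total_rate>5000' and a 10 minute clear delay.
c2 = HysteresisAlarm(lambda r: r > 5000, lambda r: r < 1000,
                     delay=0, clear_delay=600)
states = [c2.update(t, rate) for t, rate in [
    (0, 6000),    # exceeds 5 Mbit/sec: alarm raised
    (60, 800),    # below 1 Mbit/sec, calm streak starts
    (300, 1200),  # back above 1 Mbit/sec: streak resets
    (360, 900),   # new calm streak starts
    (959, 900),   # 599 s of calm: still active
    (960, 900),   # 600 s of calm: alarm clears
]]
```

The resulting `states` list is true for the first five samples and false for the last, showing that a brief rise above 1 Mbit/sec restarts the ten-minute clearing period.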

The following example creates an alarm condition which will be triggered immediately if a host's is_potential_threat attribute becomes true, and which will set the host's policy to 'hostile'.

cli# condition c3 object_class host filter is_potential_threat delay 0s attribute_name policy attribute_value hostile