Using SLA alerting to build a 2-tier monitoring system

About SLA alerting

  • SLA alerting concepts were introduced in TrackMe 2.0.92.

  • The concepts of SLA alerting can be leveraged to design a 2-tier monitoring system, where a first alert is emitted when TrackMe entities switch to a red state, and a second independent alert is emitted when the SLA is breached.

  • This can be translated as “alert: this TrackMe entity is red!”, then later on “alert: This entity has now spent too long in alert and is breaching its SLA!”.

  • Each entity is associated with an SLA class. Each SLA class has a threshold in seconds and a numerical rank value.

  • Available SLA classes and their configuration are defined at the level of the TrackMe system configuration, as a JSON object which defines the list of classes and their parameters.

  • You can then define the SLA class per entity (a manually defined class takes precedence over policy-based classes) or via SLA policies. SLA policies support two modes:

    • Regex-based policies match regular expressions against the entity name and assign an SLA class.

    • Lookup-based policies leverage existing Splunk lookups (CSV or KVstore) to assign SLA classes based on field mappings, making it easy to integrate with a CMDB or asset inventory.

  • If multiple policies match, the highest ranked SLA class wins (based on the rank value defined in the SLA class configuration).

  • Finally, you can create an SLA alert, which leverages the SLA information to emit an alert when the SLA is breached.

SLA Classes and Thresholds

SLA classes are defined at the level of the TrackMe system configuration. Each SLA class defines a threshold in seconds and a numerical rank value:

configure_classes.png

The default SLA classes are:

{
    "gold": {
        "sla_threshold": 14400,
        "rank": 3
    },
    "silver": {
        "sla_threshold": 86400,
        "rank": 2
    },
    "platinum": {
        "sla_threshold": 172800,
        "rank": 1
    }
}

Notes:

  • You can add / remove / change classes as needed.

  • You can define the default class to be used when TrackMe entities are discovered. (parameter: default_sla_class)

  • The threshold value is in seconds. For an SLA to be breached, a given TrackMe entity must be in alert for a continuous amount of time exceeding the threshold.

  • The rank value is a numerical value used to resolve conflicts when applying SLA policies: the class with the highest rank value always wins.
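
The threshold rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not TrackMe's implementation: the function name is_sla_breached and the seconds_in_alert argument are hypothetical, and the class definitions are copied from the default configuration shown above.

```python
# Default SLA classes as listed above (thresholds are in seconds).
SLA_CLASSES = {
    "gold": {"sla_threshold": 14400, "rank": 3},
    "silver": {"sla_threshold": 86400, "rank": 2},
    "platinum": {"sla_threshold": 172800, "rank": 1},
}


def is_sla_breached(sla_class, seconds_in_alert, classes=SLA_CLASSES):
    """Return True when the continuous time in alert exceeds the class threshold.

    Hypothetical helper for illustration only.
    """
    threshold = classes[sla_class]["sla_threshold"]
    return seconds_in_alert > threshold


# An entity that has been red for 5 hours (18000 seconds):
print(is_sla_breached("gold", 5 * 3600))    # True: above the 14400 s threshold
print(is_sla_breached("silver", 5 * 3600))  # False: below the 86400 s threshold
```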

SLA Tab in TrackMe UI

The SLA feature is first exposed in the UI as a tab called SLA, which shows the current SLA status for the selected TrackMe entity:

The entity is in green state, therefore the SLA cannot be breached:

sla_tab_green.png

The entity is in red state, but the continuous amount of time it has spent in alert has not yet exceeded the SLA threshold:

sla_tab_red_not_breached.png

The entity is in red state, and it has breached the SLA because the continuous amount of time it has spent in alert exceeds the threshold:

sla_tab_red_breached.png

Defining the SLA class per entity

You can define the SLA class manually on a per-entity basis, via the UI or via the associated endpoint:

Hint

If the SLA class has been defined manually, it takes precedence over policy-based classes. (see next section)

sla_per_entity_001.png sla_per_entity_002.png

Defining SLA policies to assign SLA classes automatically

TrackMe supports the management of SLA classes via policy, which can be defined per Virtual Tenant:

  • SLA policies allow you to automate SLA class assignment across entities using regex-based or lookup-based rules.

  • Regex-based policies match regular expressions against the entity name (object field) and assign the configured SLA class.

  • Lookup-based policies leverage existing Splunk lookups (CSV or KVstore) to assign SLA classes based on field mappings, making it easy to integrate with a CMDB or asset inventory.

  • Matching entities are automatically updated with the SLA class defined in the policy.

  • If multiple policies match a given entity, the highest ranked SLA class takes precedence across all policy types (based on the rank value defined in the SLA class configuration).

  • An entity managed by policies can still be updated manually, and the policy will not override the manual update.

Lookup-based SLA policies

Lookup-based policies leverage existing Splunk lookup transforms (CSV files or KVstore collections) to assign SLA classes to entities based on field mappings. This is the recommended approach when integrating with an external CMDB, asset inventory, or any structured data source.

Creating a lookup-based SLA policy:

  • Click Create new policy and select the Lookup mode

  • Enter a policy identifier (or let TrackMe auto-generate one)

  • Select a Splunk lookup transform from the dropdown — TrackMe lists all available lookup transforms in your Splunk environment

  • Configure field mappings to map lookup fields to entity fields — for example, map the lookup field index to the entity field data_index, and the lookup field host to the entity field object

  • Select the SLA class field — the field in the lookup that contains the SLA class value

  • Optionally configure SLA class value mappings to translate foreign values to TrackMe’s SLA classes (e.g., tier1 → gold, tier2 → silver)

  • Select the match mode: Exact (case-insensitive string matching) or Wildcard (supports * and ? patterns)

  • Click Simulate to preview which entities would be matched and what SLA classes would be assigned

Lookup policy configuration details

  • Field mappings: Define how lookup fields map to entity fields. All mapped fields must match for an entity to be selected. For example, if you map index → data_index and host → object, both conditions must be satisfied.

  • SLA class field: The column in the lookup containing the SLA class value (e.g., a sla_tier or service_level column).

  • SLA class value mappings (optional): If your lookup uses custom values like tier1, tier2, tier3, you can map them to TrackMe SLA classes (gold, silver, platinum, etc.). If no mappings are configured, the lookup values must already match TrackMe’s SLA class names.

  • Match modes: Exact performs case-insensitive string comparison. Wildcard supports * (matches any characters) and ? (matches a single character) for flexible pattern matching.
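
The configuration options above can be illustrated with a minimal Python sketch. It is not TrackMe's implementation: the function names, the sample lookup rows and the entity are hypothetical, and the sketch assumes that wildcard patterns live in the lookup values, that wildcard matching is case-insensitive, and that the first matching row wins.

```python
import re


def value_matches(lookup_value, entity_value, mode="exact"):
    """Compare one lookup value with one entity value."""
    if mode == "exact":
        # Exact mode: case-insensitive string comparison.
        return lookup_value.lower() == entity_value.lower()
    # Wildcard mode: * matches any characters, ? matches a single character.
    # Case-insensitive wildcard matching is an assumption of this sketch.
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in lookup_value
    )
    return re.fullmatch(regex, entity_value, flags=re.IGNORECASE) is not None


def row_matches(row, entity, field_mappings, mode):
    """All mapped fields must match for the row to select the entity."""
    for lookup_field, entity_field in field_mappings.items():
        if lookup_field not in row or entity_field not in entity:
            return False
        if not value_matches(str(row[lookup_field]), str(entity[entity_field]), mode):
            return False
    return True


def assign_from_lookup(entity, lookup_rows, field_mappings, sla_class_field,
                       value_mappings=None, mode="exact"):
    """Return the SLA class from the first matching row, or None (assumed behaviour)."""
    value_mappings = value_mappings or {}
    for row in lookup_rows:
        if row_matches(row, entity, field_mappings, mode):
            raw_value = row[sla_class_field]
            # Translate foreign values (tier1, tier2, ...) when a mapping exists.
            return value_mappings.get(raw_value, raw_value)
    return None


# Hypothetical lookup content and entity, for illustration only.
lookup_rows = [
    {"index": "network", "host": "fw-*", "sla_tier": "tier1"},
    {"index": "linux", "host": "web01", "sla_tier": "tier2"},
]
field_mappings = {"index": "data_index", "host": "object"}
value_mappings = {"tier1": "gold", "tier2": "silver"}
entity = {"data_index": "network", "object": "fw-paris-01"}

print(assign_from_lookup(entity, lookup_rows, field_mappings, "sla_tier",
                         value_mappings, mode="wildcard"))  # gold
print(assign_from_lookup(entity, lookup_rows, field_mappings, "sla_tier",
                         value_mappings, mode="exact"))     # None
```

In exact mode the value fw-* is compared literally, so the same entity is not matched; this is why Simulate is useful before saving a policy.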

Regex-based SLA policies

Regex-based policies match a regular expression pattern against the entity name (object field). When the pattern matches, the configured SLA class is assigned to the entity.

Creating a regex-based SLA policy:

  • Click Create new policy and select the Regex mode

  • Enter a policy identifier (or let TrackMe auto-generate one)

  • Enter the regex pattern to match against entities

  • Select the SLA class to assign to matched entities

  • Click Simulate to preview which entities would be matched before saving

For instance, say we want to match entities containing “cribl”:

sla_policies_001.png sla_policies_002.png sla_policies_003.png
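
As a rough illustration of what such a policy does, the following Python sketch applies a regular expression to a list of entity names. The entity names and the function name are hypothetical, and the use of an unanchored search (rather than a full match) is an assumption of the sketch, not a statement about how TrackMe evaluates the pattern.

```python
import re


def match_regex_policy(pattern, sla_class, entity_names):
    """Return {entity name: SLA class} for every entity matching the pattern."""
    regex = re.compile(pattern)
    # Unanchored search: the pattern may match anywhere in the name (assumption).
    return {name: sla_class for name in entity_names if regex.search(name)}


# Hypothetical entity names (object field), for illustration only.
entities = ["cribl:cribl_logs", "linux:syslog", "cribl:metrics"]

print(match_regex_policy(r"cribl", "gold", entities))
# {'cribl:cribl_logs': 'gold', 'cribl:metrics': 'gold'}
```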

SLA class hierarchy and conflict resolution

When multiple policies (regex, lookup, or a mix of both) match the same entity, TrackMe applies the highest ranked SLA class across all matching policies.

The rank is determined by the numerical rank value defined in the SLA class configuration (see SLA Classes and Thresholds above). The SLA class with the highest rank value wins.

For example, if a regex policy assigns silver (rank 2) and a lookup policy assigns gold (rank 3) to the same entity, the entity will receive the gold SLA class.
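
The resolution rule can be sketched as follows. This is a simplified illustration rather than TrackMe's code: resolve_sla_class is a hypothetical function, the class ranks are taken from the default configuration, and returning None when no policy matches (meaning the entity keeps its current class) is an assumption. The manual override reflects the precedence rule described earlier.

```python
# Default SLA classes and ranks, as in the default configuration.
SLA_CLASSES = {
    "gold": {"sla_threshold": 14400, "rank": 3},
    "silver": {"sla_threshold": 86400, "rank": 2},
    "platinum": {"sla_threshold": 172800, "rank": 1},
}


def resolve_sla_class(policy_classes, manual_class=None, classes=SLA_CLASSES):
    """Pick the effective SLA class for one entity.

    policy_classes: classes proposed by all matching policies (regex and lookup).
    manual_class: class set manually on the entity, if any.
    """
    # A manually defined class takes precedence over policy-based classes.
    if manual_class is not None:
        return manual_class
    # Ignore values that are not known SLA classes.
    candidates = [name for name in policy_classes if name in classes]
    if not candidates:
        return None  # no policy matched: the entity is left unchanged (assumption)
    # The class with the highest rank value wins.
    return max(candidates, key=lambda name: classes[name]["rank"])


print(resolve_sla_class(["silver", "gold"]))                           # gold
print(resolve_sla_class(["silver", "gold"], manual_class="platinum"))  # platinum
print(resolve_sla_class([]))                                           # None
```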

Running the SLA policy tracker

After creating one or more policies, you can execute the SLA policy tracker to apply all configured policies to your entities:

  • Click the Run policy tracker button in the policy management screen

  • TrackMe will evaluate all regex and lookup policies against all entities in the selected component

  • A summary of the results is displayed, including the number of entities updated, matched, and any errors encountered

sla_policies_004.png

From this stage, each matching entity is assigned the highest ranked SLA class among its matching policies, along with the associated threshold:

sla_policies_005.png

SLA Alerts

From TrackMe alert tabs, you can now create an SLA alert:

Hint

The TrackMe component must be defined and must match your target. You would create one SLA alert per component in the tenant.

sla_alerts_001.png

Notes:

  • This alert is independent from the TrackMe main alerts and notable alerts

  • If it triggers, it means that the SLA has been breached for one or more entities

  • The concept is a 2-tier alerting system: a first alert is emitted when the entity turns red, and if the issue is not fixed within an acceptable amount of time, a second alert is generated once the SLA threshold has been breached.

Example:

sla_alerts_002.png