Outliers Anomaly Detection

Machine Learning Outliers Anomaly Detection in TrackMe

TrackMe implements Machine Learning Outliers Anomaly detection across all components, from feeds tracking to the monitoring of scheduled activity in Splunk.

TrackMe implements its own native ML density function engine (TrackMeNativeDensityFunction), providing a built-in outlier detection system. Models are stored and managed in per-tenant KVstore collections, which improves performance, simplifies management, and eliminates replication overhead in Search Head Cluster (SHC) environments. The DensityFunction algorithm from the Splunk AI Toolkit is also available and can be selected per model. The workflow operates in two phases:

  • ML training (mltrain): models are generated and trained from historical metrics stored in TrackMe’s metric store indexes, and persisted to KVstore (or optionally to file)

  • ML monitoring (mlmonitor): TrackMe evaluates the anomaly detection status for each entity by applying trained models

Models are created automatically when entities are discovered and can be customized per entity to adjust behavior as needed.

Impact-Based Alerting and ML Outliers

TrackMe uses Impact-Based Alerting (IBA) to determine how ML Outliers affect entity status. When an outlier is detected, TrackMe generates outlier score events with a configurable impact score (default: 36). Only when the cumulative impact score reaches the alerting threshold (score >= 100) does the entity transition to a red alert state. This approach provides high flexibility and significantly reduces false positive risks.

  • Outlier impact scores are configurable per tenant and per component

  • Outliers alone (at default score 36) will not trigger a red alert unless combined with other anomalies or the score is tuned higher

  • One-click false positive suppression is available for individual outliers or entire entities

  • See QUICK START - TrackMe Impact-Based Alerting (IBA) for full details on Impact-Based Alerting

Key capabilities

  • Confidence levels: ML models have a confidence level (low or normal) based on available historical data. Models with low confidence do not influence entity status until sufficient data has been collected (configurable, default: 7 days)

  • Seasonality control: Models can include time-based seasonality factors, or the time_factor can be set to none to exclude seasonality for KPIs that are not driven by time-of-day or day-of-week patterns

  • Threshold guards: Per-model minimal and maximal thresholds for LowerBound and UpperBound outliers allow rejecting outliers that do not meet defined value boundaries

  • Auto-correction: Built-in deviation checks reject outliers where the variation is too small relative to the current KPI value, reducing false positives

  • True context simulation: Simulation mode trains a dedicated simulation model that fully replicates live model behavior, ensuring accurate previews before applying changes

  • Native ML engine: TrackMe’s built-in TrackMeNativeDensityFunction engine uses scipy with automatic best-fit distribution selection (normal, exponential, Gaussian KDE, beta), with the MLTK DensityFunction also available

  • KVstore model storage: ML models are stored in per-tenant KVstore collections by default, with file-based storage available as an alternative

  • Custom algorithms and parameters: Define custom algorithms, extra fit/apply parameters, and custom boundaries extraction macros for advanced ML requirements

  • Selective enablement: ML Outliers can be enabled or disabled per tenant and per component using the mloutliers and mloutliers_allowlist configuration options

  • Bulk operations: Reset, enable/disable, train, and monitor operations can be performed in bulk via the UI

See also: Use TrackMe to detect abnormal events count drop in Splunk feeds for a practical use case around Machine Learning outlier detection.

Seasonality Concepts

Most data sources exhibit recurring patterns — higher activity during business hours, lower volumes on weekends, periodic batch ingestion cycles. TrackMe’s ML Outliers engine leverages these patterns by training models that learn the expected behavior over time and flag deviations as potential anomalies.

Sample pattern over the past 30 days — seasonality by weekday with higher activity during working hours:

However, not all KPIs follow seasonal patterns. For metrics with steady, time-independent behavior (for example, a fixed-rate feed or a constant host count), applying seasonality to the model can introduce unnecessary noise. To handle these cases, ML models can be configured with the time_factor set to none, which excludes day-of-week and hour-of-day seasonality from the outlier calculations. This can be set on a per-model basis, or chosen as the default for new models via the splk_outliers_detection_timefactor_default configuration option.
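As a rough illustration of what a time factor does, the sketch below groups samples by a seasonal key so that each weekday/hour slot gets its own baseline. It assumes the time_factor values behave like strftime patterns (which is how they are written, for example %w%H). The function name and sample data are hypothetical, and this is not TrackMe's actual implementation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def baseline_by_time_factor(samples, time_factor):
    """Average value per seasonal slot; 'none' puts every sample in one slot."""
    groups = defaultdict(list)
    for ts, value in samples:
        # Assumption: the time factor is applied as a strftime pattern
        key = "all" if time_factor == "none" else ts.strftime(time_factor)
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}

samples = [
    (datetime(2024, 1, 1, 9), 100.0),  # Monday 09:00
    (datetime(2024, 1, 8, 9), 120.0),  # Monday 09:00, one week later
    (datetime(2024, 1, 6, 9), 10.0),   # Saturday 09:00
]

by_weekday_hour = baseline_by_time_factor(samples, "%w%H")
by_hour = baseline_by_time_factor(samples, "%H")
no_seasonality = baseline_by_time_factor(samples, "none")
```

With %w%H, the quiet Saturday sample gets its own slot and does not drag down the Monday baseline. With %H or none, the three samples are averaged together, which is only appropriate when the KPI does not depend on the weekday.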

Note

Generating samples for outliers detection

You can find, download, and use the following sample generator with no restrictions: https://github.com/trackme-limited/mlgen-python — we use it to generate data with seasonality concepts for development, qualification, and documentation purposes.

Confidence Level

TrackMe assigns a confidence level to each ML model during training, based on the amount of historical data available:

  • low: The model is trained and rendered, but outlier results do not influence entity status. This prevents false positives while the model is still learning from limited data.

  • normal: The model is fully trusted and outlier results contribute to the entity’s impact score.

The minimum number of days of historical metrics required to reach normal confidence is controlled by the splk_outliers_min_days_history configuration option (default: 7 days). The confidence level and reason are visible in the Manage Outliers screen and stored in the rules KVstore collection.
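The rule above can be sketched as a simple comparison. This is a minimal illustration only: the function name is hypothetical, and it assumes history is measured from the oldest available metric and that the boundary is inclusive, neither of which is specified here.

```python
from datetime import datetime, timedelta

def confidence_level(first_metric_time, now, min_days_history=7):
    """Return 'normal' once enough history exists, otherwise 'low'."""
    # Assumption: history is measured from the oldest available metric
    history = now - first_metric_time
    return "normal" if history >= timedelta(days=min_days_history) else "low"

now = datetime(2024, 3, 15)
recent = confidence_level(datetime(2024, 3, 12), now)      # 3 days of history
established = confidence_level(datetime(2024, 2, 1), now)  # about 6 weeks
```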

Outliers Impact Score & triggering on Upper Bound/Lower Bound breaches

When TrackMe’s ML engine detects an outlier, it generates outlier score events that feed into the entity’s cumulative impact score. Whether an outlier actually influences the entity status depends on several factors:

Upper Bound and Lower Bound breach configuration:

Each ML model defines which breach directions are active:

  • alert_lower_breached: When enabled, a LowerBound outlier (observed value falls below the predicted lower boundary) generates a score event

  • alert_upper_breached: When enabled, an UpperBound outlier (observed value exceeds the predicted upper boundary) generates a score event

These settings are configured per model, giving you full control over which types of volume deviations are meaningful for each metric. For example, you may want to alert only on lower bound breaches for a critical feed (detecting data drops) while ignoring upper bound breaches that represent harmless spikes.

Impact score behavior:

  • The default outlier impact score is 36 (configurable per tenant and per component)

  • At the default score, a single outlier detection alone will not trigger a red alert (which requires a cumulative score >= 100)

  • Multiple concurrent outlier detections, or outliers combined with other anomalies (delay breach, latency breach, etc.), can accumulate to reach the alerting threshold

  • Outlier score events are aggregated over a 24-hour rolling window — older scores automatically expire
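The arithmetic behind these rules can be sketched as follows. The 36-point score, the threshold of 100, and the 24-hour window come from the text above; the function names and the event representation are hypothetical, and how TrackMe emits or de-duplicates score events is not covered by this sketch.

```python
from datetime import datetime, timedelta

ALERT_THRESHOLD = 100        # cumulative score at which the entity turns red
WINDOW = timedelta(hours=24)  # rolling aggregation window

def cumulative_score(score_events, now):
    """Sum the scores of events that are still inside the rolling window."""
    return sum(score for ts, score in score_events if now - ts <= WINDOW)

def is_red(score_events, now):
    return cumulative_score(score_events, now) >= ALERT_THRESHOLD

now = datetime(2024, 3, 15, 12, 0)
events = [
    (now - timedelta(hours=30), 36),  # expired, no longer counted
    (now - timedelta(hours=5), 36),   # first outlier score event
    (now - timedelta(hours=1), 36),   # second outlier score event
]
two_outliers = cumulative_score(events, now)      # 72, below the threshold
events.append((now - timedelta(minutes=10), 36))  # third outlier score event
three_outliers = cumulative_score(events, now)    # 108, threshold reached
```

At the default score, two concurrent outliers (72) stay below the threshold, while a third one (108) or an additional anomaly of another type is enough to reach it.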

Threshold guards:

Per-model minimal and maximal value thresholds provide an additional layer of filtering:

  • min_value_for_lowerbound_breached: If the observed value remains above this minimum threshold, the LowerBound breach is considered insignificant and rejected

  • min_value_for_upperbound_breached: If the observed value remains below this minimum threshold, the UpperBound breach is considered insignificant and rejected

This filters out insignificant deviations — for example, a small dip in event count that technically breaches the lower boundary but represents a negligible variation. Rejected outliers do not generate score events and do not affect the entity’s impact score.
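The breach directions and threshold guards described above can be combined into one decision, sketched below. The option names are the ones listed in this section; the function itself is hypothetical, and treating an unset guard as "no guard" is an assumption.

```python
def scored_breach(value, lower_bound, upper_bound, model):
    """Return 'lower', 'upper', or None if no score event should be generated."""
    if value < lower_bound and model.get("alert_lower_breached"):
        guard = model.get("min_value_for_lowerbound_breached")
        # Guard: the value is still above the threshold, so the drop is rejected
        if guard is not None and value > guard:
            return None
        return "lower"
    if value > upper_bound and model.get("alert_upper_breached"):
        guard = model.get("min_value_for_upperbound_breached")
        # Guard: the value is still below the threshold, so the spike is rejected
        if guard is not None and value < guard:
            return None
        return "upper"
    return None

model = {
    "alert_lower_breached": 1,
    "alert_upper_breached": 0,
    "min_value_for_lowerbound_breached": 50,
}
drop = scored_breach(20, lower_bound=80, upper_bound=200, model=model)
small_dip = scored_breach(70, lower_bound=80, upper_bound=200, model=model)
spike = scored_breach(500, lower_bound=80, upper_bound=200, model=model)
```

Here only the drop to 20 produces a score event: the dip to 70 is rejected by the guard, and the spike is ignored because upper bound alerting is disabled for this model.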

False positive suppression:

When an outlier is identified as a false positive, a one-click action generates a negative score event that cancels out the outlier’s positive score. The anomaly reason is preserved for audit purposes but no longer contributes to the entity status. See QUICK START - TrackMe Impact-Based Alerting (IBA) for details on false positive management.

Outliers impact score and breach configuration in the Manage Outliers screen:

Outliers per component

ML Outliers detection is available across TrackMe components, but its value and applicability vary significantly depending on the nature of the data being monitored.

Enabling Outliers at tenant creation

When creating a new Virtual Tenant, the Advanced Options section includes the Outliers enablement setting. TrackMe automatically pre-selects the recommended value based on the component type:

  • dsm and flx tenants: Outliers is enabled by default

  • dhm, wlk, and fqm tenants: Outliers is disabled by default

  • mhm tenants: Outliers is not applicable

This setting can be changed at any time after creation from the Virtual Tenants account configuration page, allowing you to enable or disable Outliers detection as your requirements evolve.

Applicability per component

ML Outliers applicability per component

| Component | Recommendation | Details |
| --- | --- | --- |
| dsm (Data Sources) | Recommended | High value for volume-based anomaly detection on data feeds. Particularly valuable for Foundation Edition customers who cannot leverage Flex Objects for a more segmented approach to outliers detection. |
| dhm (Data Hosts) | Not recommended | Host-level metrics tend to have low predictability and stability due to the dynamic nature of endpoints. Outliers detection on hosts typically generates excessive false positives. |
| mhm (Metric Hosts) | Not applicable | Outliers detection is not applicable to metric hosts. |
| wlk (Workload) | Not recommended | Workload monitoring generally has low predictability and limited value from outliers detection. TrackMe automatically disables Outliers by default when creating a Workload tenant. |
| flx (Flex Objects) | Key feature | Outliers is a core capability of Flex Objects, enabling precise anomaly detection on custom metrics with full control over model configuration. Can be disabled at the tenant level for use cases where outliers detection is not applicable (no metrics, or simply not wanted). |
| fqm (Fields Quality) | Not recommended | Fields quality metrics are generally not suitable for outliers detection due to their inherent variability. |

Hint

The mloutliers_allowlist configuration option controls which components have ML Outliers enabled per tenant. By default, all applicable components are included (dsm,dhm,flx,wlk,fqm). Adjust this list to match your environment — for example, set it to dsm,flx to focus on the highest-value components.

Enabling/disabling Anomaly Outliers at the tenant level during tenant creation:

Enabling/disabling Anomaly Outliers at the tenant level from the Virtual Tenants account configuration page:

Flex Objects and Outliers

For Flex Objects (flx), ML Outliers models are primarily driven by the tracker configuration. When creating or editing a Flex tracker through the UI, the Outliers Metrics (ML) section allows you to define outliers models directly as part of the tracker setup:

  • Select which metrics should have outliers detection enabled

  • Configure per-metric parameters at creation time: impact score, breach directions, density thresholds, time factor, auto-correction, period calculation, and static bounds

  • These settings are applied automatically when entities are discovered by the tracker, creating pre-configured ML models without manual intervention

This approach is a key advantage of Flex Objects — the outliers configuration is part of the tracker definition itself, ensuring consistent model parameters across all entities discovered by that tracker.

For other components (dsm, dhm, etc.), ML models are created with system-wide defaults at entity discovery and must be tuned individually or via bulk actions afterward.

Hint

When a Flex tracker returns an outliers_metrics field in its search results, the ML models are automatically created with the specified parameters. The field can therefore also be set directly in the tracker search, without going through the UI; see splk-flx - Creating and managing Flex Trackers for details.

Demonstrating Machine Learning Outliers Detection in TrackMe

How ML Outliers Detection Works in TrackMe

In short:

  • Outliers rely on TrackMe-generated metrics only

  • This allows running fast and efficient training and rendering searches, with minimal cost in terms of resources

For the purpose of this demonstration, we create a Flex Object TrackMe tenant that uses our ML generator:

Our Flex tracker:

  • Tracker name: “demo”

  • Runs every 5 minutes (earliest: -5m, latest: now)

index=mlgen ref=* instance_id=*
| stats avg(dcount_hosts) as dcount_hosts, avg(events_count) as events_count, values(instance_id) as instance_id by ref

| eval group = "demo"
| eval object = instance_id . ":" . ref, alias = ref
| eval object_description = "Demo Outliers in TrackMe"
| eval metrics = "{'dcount_hosts': " . dcount_hosts . ", 'events_count': " . events_count . "}"
| eval outliers_metrics="{'dcount_hosts': {'alert_lower_breached': 1, 'alert_upper_breached': 1, 'time_factor': '%H', 'period_calculation': '-90d'}, 'events_count': {'alert_lower_breached': 1, 'alert_upper_breached': 1, 'time_factor': '%H', 'period_calculation': '-90d'}}"
| eval status=1
| eval status_description="Machine Learning Outliers detection demo"
| table group, object, alias, object_description, metrics, outliers_metrics, status, status_description
``` default metric for the TrackMe UI to pick when opening the entity screen ```
| eval default_metric="events_count"
``` alert if inactive for more than 3600 sec```
| eval max_sec_inactive=3600

This Flex Tracker creates entities by monitoring the availability of data in our ML index; it also generates metrics and automates the definition of models which alert on both lower bound and upper bound outliers.

Our ML generator is running and has backfilled the past 90 days of data; it currently does not generate any outliers:

index=mlgen ref="security:linux_secure" earliest=-90d
| timechart avg(events_count) as avg_events_count span=1h

Our ML generator takes weekdays into account; we can use the following search to compare today’s activity against the same weekday in each of the 3 previous weeks:

index=mlgen ref="security:linux_secure" earliest=@d latest=+1d@d
| timechart span=5m avg(events_count) as events_count_today
| appendcols [
search index=mlgen ref="security:linux_secure" earliest=-7d@d latest=-6d@d
| timechart span=5m avg(events_count) as events_count_ref
]
| appendcols [
search index=mlgen ref="security:linux_secure" earliest=-14d@d latest=-13d@d
| timechart span=5m avg(events_count) as events_count_ref2
]
| appendcols [
search index=mlgen ref="security:linux_secure" earliest=-21d@d latest=-20d@d
| timechart span=5m avg(events_count) as events_count_ref3
]

Results:

TrackMe automatically discovered the entity. Let’s take note of its internal identifier, which we will use to manually backfill TrackMe metrics as if we had been monitoring this entity since the beginning:

We use mcollect to force the backfill of metrics. Make sure to replace the tenant identifier in the lookup name (demo-outliers in this example) with your own tenant_id value:

index=mlgen ref=* instance_id=* earliest=-90d latest=now
| bucket _time span=5m
| stats avg(dcount_hosts) as trackme.splk.flx.dcount_hosts, avg(events_count) as trackme.splk.flx.events_count by _time, instance_id, ref

``` form the object ```
| eval object = "demo" . ":" . instance_id . ":" . ref

``` lookup the tenant/component ```
| lookup trackme_flx_tenant_demo-outliers object OUTPUT tenant_id, _key as object_id, object_category
| where isnotnull(object_id)

``` collect ```
| mcollect index=trackme_metrics split=t object, object_category, object_id, tenant_id

Opening the entity shows that the metrics have now been backfilled:

Depending on whether TrackMe has already run the ML training job for the tenant, ML Outliers may not be ready yet:

We can either run the mltrain job manually, or train the models via the UI:

Machine Learning Outliers is ready: (we have no outliers yet)

Let’s review the model definitions. For now, we only increase the training period to the past 90 days:

  • Click on “Manage Outliers detection”

  • Update the models to increase the time range for the calculation

  • Manually run a training for each model

  • Click on Simulate Selected to review the results (here, with the event count model selected); the results look good for now

Scenario: Detecting a lower bound outlier

Although we know there is weekday behavior in the data, we stick with the default settings for now and start by generating a lower bound outlier.

To achieve this, we stop run_backfill.sh and start run_gen_lowerbound_outlier.sh, which:

  • Lowers the metrics by approximately 75%, relative to the usual level for the weekday/hour range

After a few minutes, we start to see a clear outlier using the previous days’ comparison timechart search:

The outliers condition will also be reflected in TrackMe. It can take 5 to 10 minutes to be detected as an effective outlier:

The sudden decrease in activity has been detected successfully.

The next action is to run the ML rendering process, which can be achieved in different ways:

  • Through the bulk actions for Outliers Detection

  • Through the Outliers management screen

Note

By default, TrackMe monitors each entity’s models at most once per hour. This interval is configurable through the system-wide options; see System-wide configuration options.

At some point, the Outliers condition is pertinent enough to generate a sufficient impact score to trigger a red alert:

If an AI provider is configured, the AI Assistant can be leveraged to investigate the Outliers condition and generate an AI status report in stateful email notifications:

Fine tuning the ML models

True context simulation

TrackMe performs true context simulations — when a simulation is executed, TrackMe trains a dedicated model specifically for that simulation, allowing the outlier detection to fully reflect live behavior:

  • All parameters, whether saved or not yet saved, are taken into account on the fly during the simulation

  • You can add specific conditions such as periods of exclusion

Adjusting model parameters

From the Manage Outliers detection screen, each model can be individually tuned to match the behavior of the underlying data. The key parameters are:

  • Outlier impact score (outlier_impact_score): The score value generated when an outlier is detected. Default: 36. A single outlier at the default score will not trigger a red alert on its own (cumulative score >= 100 is required). Increase the value to make outlier detections more impactful, or decrease it to reduce their weight.

  • Training period (period_calculation): Controls how far back in time the model looks during training. Longer periods capture more historical patterns but may dilute recent trends. Shorter periods react faster to changes.

  • Time factor (time_factor): Determines the seasonality granularity. Common values include %w%H (weekday + hour), %H (hour only), %w (weekday only), or none to disable seasonality entirely.

  • Density thresholds (density_lowerthreshold / density_upperthreshold): Control the sensitivity of the density function algorithm (both TrackMeNativeDensityFunction and DensityFunction). Lower values produce tighter boundaries (more sensitive), higher values produce wider boundaries (more tolerant).

  • Auto-correction (auto_correct): When enabled, outliers where the deviation is below a configurable percentage of the current KPI value are automatically rejected as insignificant. The deviation thresholds are controlled by perc_min_lowerbound_deviation and perc_min_upperbound_deviation.

All changes can be previewed using the Simulate function before saving — see True context simulation above.
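The auto-correction check can be sketched as a relative deviation test. This is an illustration under assumptions: the exact formula is not given here, so the sketch measures the gap between the observed value and the breached boundary as a percentage of the observed value, and the function name is hypothetical.

```python
def passes_auto_correct(value, bound, perc_min_deviation=5.0):
    """True if the gap between value and bound is large enough to count."""
    if value == 0:
        # Assumption: a zero KPI has no meaningful relative deviation,
        # so the outlier is kept rather than auto-corrected
        return True
    deviation_pct = abs(value - bound) / abs(value) * 100
    return deviation_pct >= perc_min_deviation

marginal = passes_auto_correct(value=98.0, bound=100.0)     # about 2% off
significant = passes_auto_correct(value=25.0, bound=100.0)  # far below the bound
```

With the default 5.0 percent minimum, a value of 98 against a lower bound of 100 is rejected as insignificant, while a value of 25 is kept as a valid outlier.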

Excluding periods from training

When an incident or planned maintenance causes abnormal data patterns, the affected time period can be excluded from ML model training to prevent the model from learning incorrect baselines.

How to use:

  • Open the Manage Outliers detection screen for the entity

  • Click Period Exclusions on the target model

  • Define the start and end time for the exclusion window

Behavior:

  • Excluded periods are removed from the training dataset during the next mltrain cycle

  • When the exclusion window falls entirely outside the model’s training period (e.g., the exclusion is older than the 30-day training window), TrackMe automatically removes the exclusion entry during training

  • Exclusion events are logged in index=_internal with sourcetype trackme:custom_commands:trackmesplkoutlierstrain and the keyword period exclusion
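The behavior described above can be sketched as two filters: one that prunes exclusions no longer overlapping the training window, and one that drops the samples covered by the remaining exclusions. The function name and data layout are hypothetical; only the 30-day window and the pruning rule come from the text.

```python
from datetime import datetime, timedelta

def apply_exclusions(samples, exclusions, now, training_days=30):
    """Drop excluded samples and prune exclusions outside the training window."""
    window_start = now - timedelta(days=training_days)
    # Keep only exclusions that still overlap the training window
    active = [
        (start, end) for start, end in exclusions
        if end >= window_start and start <= now
    ]
    kept = [
        (ts, value)
        for ts, value in samples
        if ts >= window_start
        and not any(start <= ts <= end for start, end in active)
    ]
    return kept, active

now = datetime(2024, 3, 31)
exclusions = [
    (datetime(2024, 3, 10), datetime(2024, 3, 11)),  # incident inside the window
    (datetime(2024, 1, 5), datetime(2024, 1, 6)),    # older than the window
]
samples = [
    (datetime(2024, 3, 9, 12), 100.0),
    (datetime(2024, 3, 10, 12), 5.0),   # falls in the excluded incident
    (datetime(2024, 3, 12, 12), 101.0),
]
kept, active = apply_exclusions(samples, exclusions, now)
```

The abnormal sample recorded during the incident is removed before training, and the January exclusion is dropped because it no longer affects any data in the window.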

Periods of exclusion can be added through bulk actions:

Or added and managed through the Outliers management screen:

Managing Outliers

Manage Outliers screen

The Manage Outliers detection screen is the central interface for reviewing and configuring ML models for a given entity. It is accessible from the entity’s detail view by clicking Manage Outliers detection.

The screen provides:

  • An overview of all ML models defined for the entity, with their current status, confidence level, and last training/monitoring timestamps

  • Per-model configuration: training period, time factor, density thresholds, breach direction settings, auto-correction, and threshold guards

  • Simulate Selected: Runs a true context simulation for the selected model, training a dedicated simulation model to preview results before applying changes

  • Train Selected: Triggers an immediate training cycle for the selected model

  • Save: Persists configuration changes to the rules KVstore collection

Bulk actions

TrackMe provides bulk action capabilities for ML Outliers from the entity detail view. Two categories of bulk actions are available:

Outliers actions:

  • Train All Models: Triggers an immediate training cycle for all models of the entity

  • Monitor All Models: Triggers an immediate monitoring (rendering) cycle for all models of the entity

  • Reset All Models: Resets all models to their default configuration and retrains them

Outliers rules:

  • Enable All Models: Enables outlier detection for all models of the entity

  • Disable All Models: Disables outlier detection for all models of the entity

These bulk actions are available directly from the entity screen, allowing efficient management without opening individual model configurations.

Enabling and disabling Outliers detection

ML Outliers detection can be controlled at multiple levels:

Per-tenant:

In the Configuration UI, the mloutliers setting controls whether ML Outliers is enabled for the entire tenant. When set to 0 (disabled), no ML training or monitoring jobs run for that tenant.

Per-component:

The mloutliers_allowlist setting defines which components have ML Outliers enabled. By default, all components are included (dsm,dhm,flx,wlk,fqm). Remove a component from the list to disable ML Outliers for that component type across the tenant.

Per-entity:

Individual models can be disabled using the is_disabled flag in the Manage Outliers detection screen, or via the bulk Disable All Models action.

At discovery:

The splk_outliers_detection_disable_default system-wide option controls whether outlier detection is enabled or disabled by default when new entities are discovered.
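The tenant, component, and model switches described above can be read as a chain of checks, sketched below. The setting names are those used in this section; the function and the dictionary layout are hypothetical.

```python
def outliers_enabled(tenant, component, model):
    """Apply the tenant, component, and model switches in order."""
    if not tenant["mloutliers"]:
        return False  # disabled for the whole tenant
    allowlist = [c.strip() for c in tenant["mloutliers_allowlist"].split(",")]
    if component not in allowlist:
        return False  # component removed from the allowlist
    return not model.get("is_disabled", 0)

tenant = {"mloutliers": 1, "mloutliers_allowlist": "dsm,flx"}
dsm_model = outliers_enabled(tenant, "dsm", {"is_disabled": 0})
dhm_model = outliers_enabled(tenant, "dhm", {"is_disabled": 0})
disabled_model = outliers_enabled(tenant, "flx", {"is_disabled": 1})
```

A model is only evaluated when every level allows it: here the dsm model is active, the dhm model is blocked by the allowlist, and the flx model is blocked by its own is_disabled flag.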

Backend & Scheduling

ML training scheduled jobs

The mltrain scheduled job orchestrates the training of ML models across all enabled tenants and components.

  • Schedule: Runs every hour

  • Max runtime: 3600 seconds minus a 120-second safety margin

  • Behavior: The orchestrator iterates through eligible entities sequentially, training models that have exceeded the splk_outliers_time_train_mlmodels_default interval since their last training

  • Orchestrator command: trackmesplkoutlierstrainhelper

  • Per-entity command: trackmesplkoutlierstrain

The training frequency per entity is governed by the splk_outliers_time_train_mlmodels_default system-wide option (default: 7 days). When the mltrain job runs, it selects entities whose models are due for retraining based on this interval.

ML monitoring scheduled jobs

The mlmonitor scheduled job evaluates trained models against current data to detect outliers.

  • Schedule: Runs every 20 minutes

  • Max runtime: 900 seconds minus a 120-second safety margin

  • Behavior: The orchestrator iterates through eligible entities sequentially, rendering models that have exceeded the splk_outliers_time_monitor_mlmodels_default interval since their last monitoring

  • Orchestrator command: trackmesplkoutlierstrackerhelper

  • Per-entity command: trackmesplkoutliersrender

The monitoring frequency per entity is governed by the splk_outliers_time_monitor_mlmodels_default system-wide option (default: 1 hour).

Note

Both training and monitoring jobs process entities sequentially within their allocated runtime window. If the job reaches its max runtime before processing all entities, remaining entities are picked up in the next cycle.
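The sequential processing with a runtime budget can be sketched as follows. This is a simplified model of the behavior described above, not the code of the orchestrator commands: the function, its parameters, and the entity layout are hypothetical, and a simulated clock is used so the example runs instantly.

```python
import time

def run_orchestrator(entities, process, interval_sec, max_runtime_sec,
                     safety_margin_sec=120, clock=time.time):
    """Process due entities one by one until the runtime budget is spent."""
    started = clock()
    budget = max_runtime_sec - safety_margin_sec
    processed = []
    for entity in entities:
        if clock() - started >= budget:
            break  # remaining entities wait for the next scheduled run
        if clock() - entity["last_run"] < interval_sec:
            continue  # processed recently enough, not due yet
        process(entity)
        entity["last_run"] = clock()
        processed.append(entity["object"])
    return processed

# Simulated clock: each processed entity "takes" 400 seconds
fake_now = [1_000_000.0]

def fake_clock():
    return fake_now[0]

def fake_process(entity):
    fake_now[0] += 400

entities = [
    {"object": "a", "last_run": 0.0},
    {"object": "b", "last_run": fake_now[0] - 60},  # handled a minute ago
    {"object": "c", "last_run": 0.0},
    {"object": "d", "last_run": 0.0},
]
done = run_orchestrator(entities, fake_process, interval_sec=3600,
                        max_runtime_sec=900, clock=fake_clock)
```

In this run, entity b is skipped because it is not due yet, and entity d is left for the next cycle because the 780-second budget (900 minus the 120-second margin) is exhausted after a and c.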

System-wide configuration options

The following options are available in Configuration > System Options and control the default behavior of ML Outliers across all tenants and components. These options are applied when new entities are discovered and new ML models are created.

ML Outliers system-wide options

| Option | Default | Description |
| --- | --- | --- |
| splk_outliers_min_days_history | 7 | Minimal number of days of historical metrics required to reach normal confidence level for outlier detection. |
| splk_outliers_time_train_mlmodels_default | 604800 | Interval in seconds between model training cycles per entity. Default: 7 days. |
| splk_outliers_time_monitor_mlmodels_default | 3600 | Interval in seconds between model monitoring (rendering) cycles per entity. Default: 1 hour. |
| splk_outliers_max_runtime_train_mlmodels_default | 900 | Maximum duration in seconds for the ML training job. Should align with the job’s cron schedule. Default: 15 minutes. |
| splk_outliers_max_days_since_last_train_default | 15 | If a model has not been trained within this many days, it is automatically retrained before rendering. Default: 15 days. |
| splk_outliers_detection_disable_default | 0 | When set to 1, outlier detection is disabled by default for newly discovered entities. Can still be enabled per entity. |
| splk_outliers_calculation_default | stdev | Default calculation mode for anomaly detection. Can be updated per entity. |
| splk_outliers_density_lower_threshold_default | 0.005 | Default lower threshold for the density function algorithm. Lower values = tighter boundaries. |
| splk_outliers_density_upper_threshold_default | 0.005 | Default upper threshold for the density function algorithm. Lower values = tighter boundaries. |
| splk_outliers_alert_lower_threshold_volume_default | 1 | Alert on lower bound breaches for volume-based KPIs. 1 = enabled, 0 = disabled. |
| splk_outliers_alert_upper_threshold_volume_default | 0 | Alert on upper bound breaches for volume-based KPIs. 1 = enabled, 0 = disabled. |
| splk_outliers_alert_lower_threshold_latency_default | 0 | Alert on lower bound breaches for latency-based KPIs. 1 = enabled, 0 = disabled. |
| splk_outliers_alert_upper_threshold_latency_default | 1 | Alert on upper bound breaches for latency-based KPIs. 1 = enabled, 0 = disabled. |
| splk_outliers_detection_period_default | -30d | Default relative time period used for outlier calculations. Applied at discovery, can be updated per entity. |
| splk_outliers_detection_period_latest_default | -1d | Default latest time quantifier for outlier calculations. Accepts Splunk relative time quantifiers (e.g., -1h@h). |
| splk_outliers_detection_timefactor_default | %w%H | Default time factor for seasonality. Values: %H (hour), %H%M (hour/min), %w%H (weekday/hour), %w%H%M (weekday/hour/min), %w (weekday), none (no seasonality). |
| splk_outliers_detection_latency_kpi_metric_default | None | Default KPI metric for latency outlier detection. None disables latency outliers by default. Options: splk.feeds.avg_latency_5m, splk.feeds.latest_latency_5m, splk.feeds.perc95_latency_5m, splk.feeds.stdev_latency_5m. |
| splk_outliers_detection_volume_kpi_metric_default | splk.feeds.avg_eventcount_5m | Default KPI metric for volume outlier detection. |
| splk_outliers_auto_correct | 1 | Enable auto-correction by default. When enabled, outliers with insignificant deviations are rejected based on min deviation percentages. |
| splk_outliers_perc_min_lowerbound_deviation_default | 5.0 | Minimum percentage deviation required for a LowerBound outlier to be considered valid. Below this threshold, the outlier is auto-corrected. |
| splk_outliers_perc_min_upperbound_deviation_default | 5.0 | Minimum percentage deviation required for an UpperBound outlier to be considered valid. Below this threshold, the outlier is auto-corrected. |
| splk_outliers_mltk_algorithms_list | TrackMeNativeDensityFunction,DensityFunction | Comma-separated list of selectable algorithms. TrackMeNativeDensityFunction is the built-in native engine; DensityFunction is the MLTK algorithm from the Splunk AI Toolkit. Custom algorithms added here become available in all Outliers configuration screens. |
| splk_outliers_mltk_algorithms_default | TrackMeNativeDensityFunction | Default algorithm used when ML model rules are created (at entity discovery or model reset). Set to DensityFunction to use the MLTK algorithm instead. |
| splk_outliers_native_model_storage | kvstore | Storage backend for native TrackMeNativeDensityFunction models. kvstore stores models in per-tenant KVstore collections (kv_trackme_native_ml_models_tenant_<tenant_id>); file stores models as JSON files in the Splunk lookups directory. KVstore is recommended for SHC environments. |
| splk_outliers_fit_extra_parameters | (empty) | Extra parameters appended to the fit command during training. For the native engine, example: exclude_dist="beta" to exclude Beta distributions from auto-selection. For MLTK, parameters are passed to the MLTK fit command. |
| splk_outliers_apply_extra_parameters | (empty) | Extra parameters appended to the apply command during rendering. For MLTK, example: sample="True". |
| splk_outliers_boundaries_extraction_macro_default | (empty) | Default Splunk macro used for boundaries extraction when defining ML model rules. Leave empty for standard behavior. |
| splk_outliers_boundaries_extraction_macros_list | (empty) | Comma-separated list of custom boundaries extraction macros. These become selectable in the Outliers management screens. |
| splk_outliers_static_lower_threshold_default | (empty) | Static override for the calculated lowerBound. When set, replaces the dynamically computed lower boundary. |
| splk_outliers_static_upper_threshold_default | (empty) | Static override for the calculated upperBound. When set, replaces the dynamically computed upper boundary. |

Per-model options

Each ML model stores the following options in the rules KVstore collection. These are set from system-wide defaults at discovery and can be customized per model through the Manage Outliers detection screen.

ML Outliers per-model options

Option

Description

kpi_metric

The KPI metric used for outlier detection (e.g., splk.feeds.avg_eventcount_5m).

kpi_span

The time span used for metric aggregation during training and rendering.

method_calculation

The algorithm used for this model. TrackMeNativeDensityFunction (native engine, default) or DensityFunction (MLTK).

model_storage

Storage backend for this model’s trained data. kvstore (default for native engine) stores models in per-tenant KVstore collections; file stores models as files on disk.

period_calculation

The relative time period used for training data (e.g., -30d).

time_factor

The seasonality factor applied to the model (e.g., %w%H, none).

density_lowerthreshold

The lower threshold for the density function algorithm.

density_upperthreshold

The upper threshold for the density function algorithm.

auto_correct

When enabled (1), outliers with insignificant deviations are auto-rejected.

perc_min_lowerbound_deviation

Minimum deviation percentage for LowerBound outliers to be considered valid.

perc_min_upperbound_deviation

Minimum deviation percentage for UpperBound outliers to be considered valid.

alert_lower_breached

When enabled (1), LowerBound breaches generate outlier score events.

alert_upper_breached

When enabled (1), UpperBound breaches generate outlier score events.

min_value_for_lowerbound_breached

Threshold guard for LowerBound outliers. If the observed value remains above this threshold, the breach is considered insignificant and rejected.

min_value_for_upperbound_breached

Threshold guard for UpperBound outliers. If the observed value remains below this threshold, the breach is considered insignificant and rejected.

outlier_impact_score

The impact score value generated when an outlier is detected. Default: 36. Higher values increase the weight of outlier detections in the entity’s cumulative impact score.

is_disabled

When set to 1, outlier detection is disabled for this model.
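One way to picture how these per-model options interact is the Python sketch below. It is an illustration only, not TrackMe's implementation: the function name, the order of the checks, and the deviation formula (distance from the breached bound, expressed as a percentage of that bound) are assumptions. Only the option names are taken from the table above.

```python
def evaluate_outlier(value, lower_bound, upper_bound, options):
    """Return (flagged, reason) for a single observed value.

    `options` is a dict keyed by the per-model option names.
    The check order and the deviation formula are assumptions.
    """
    if options.get("is_disabled") == 1:
        return False, "model disabled"

    # Work out which bound, if any, was breached.
    if value < lower_bound:
        side, bound = "lower", lower_bound
    elif value > upper_bound:
        side, bound = "upper", upper_bound
    else:
        return False, "within bounds"

    # alert_lower_breached / alert_upper_breached
    if options.get(f"alert_{side}_breached", 1) != 1:
        return False, f"{side} breach alerting disabled"

    # min_value_for_lowerbound_breached / min_value_for_upperbound_breached
    guard = options.get(f"min_value_for_{side}bound_breached")
    if guard is not None:
        if side == "lower" and value > guard:
            return False, "value still above lower guard"
        if side == "upper" and value < guard:
            return False, "value still below upper guard"

    # auto_correct with perc_min_lowerbound_deviation / perc_min_upperbound_deviation
    if options.get("auto_correct") == 1 and bound != 0:
        deviation_pct = abs(value - bound) / abs(bound) * 100
        min_pct = options.get(f"perc_min_{side}bound_deviation", 0)
        if deviation_pct < min_pct:
            return False, "deviation below minimum percentage"

    return True, f"{side} bound breached"
```

With auto_correct enabled and a 5% minimum deviation, a value of 98 against a lower bound of 100 would be rejected in this sketch, while a value of 50 would be kept as an outlier.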

Advanced Topics

Accessing ML model rules and results

Model rules (configuration) can be retrieved using the dedicated command or directly from the KVstore:

| trackmesplkoutliersgetrules tenant_id="<tenant_id>" component="<component>" object="<entity_name>"

Or via the underlying KVstore lookup:

| inputlookup trackme_<component>_outliers_entity_rules_tenant_<tenant_id>

Model results (current outlier detection state) can be retrieved similarly:

| trackmesplkoutliersgetdata tenant_id="<tenant_id>" component="<component>" object="<entity_name>"

Or via the underlying KVstore lookup:

| inputlookup trackme_<component>_outliers_entity_data_tenant_<tenant_id>

Algorithms and model storage

TrackMe provides two density function algorithms for outlier detection:

TrackMeNativeDensityFunction (default):

The native engine is TrackMe’s built-in density function, powered by scipy. It automatically selects the best-fit distribution among four candidates (normal, exponential, Gaussian KDE, and beta), using the Wasserstein distance as the selection criterion. Models are stored in per-tenant KVstore collections (kv_trackme_native_ml_models_tenant_<tenant_id>) by default; file-based storage is available as an alternative via the splk_outliers_native_model_storage configuration option.

  • Training uses the trackmefit streaming command

  • Rendering uses the trackmeapply streaming command

  • Distribution exclusion is supported via the exclude_dist parameter (e.g., exclude_dist="beta")
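The idea of picking a best-fit distribution by Wasserstein distance can be sketched with the Python standard library alone. This is a simplified illustration under stated assumptions, not the code behind trackmefit: it covers only the normal and exponential candidates (Gaussian KDE and beta are omitted), the candidate labels and function names are hypothetical, and the distance is approximated by comparing the sorted data with the fitted distribution's quantiles at the midpoints (i + 0.5) / n.

```python
import math
from statistics import NormalDist, fmean, pstdev


def wasserstein_to_quantiles(sorted_data, quantile_fn):
    """Approximate 1-D Wasserstein distance between the data and a fitted
    distribution, using the distribution's quantiles at (i + 0.5) / n."""
    n = len(sorted_data)
    return fmean(
        abs(x - quantile_fn((i + 0.5) / n)) for i, x in enumerate(sorted_data)
    )


def select_distribution(values, exclude_dist=()):
    """Fit each candidate and return (best_label, {label: distance}).

    Labels ("normal", "exponential") are illustrative, not TrackMe's.
    """
    data = sorted(values)
    mean = fmean(data)
    candidates = {}

    sigma = pstdev(data)
    if "normal" not in exclude_dist and sigma > 0:
        candidates["normal"] = NormalDist(mean, sigma).inv_cdf

    # An exponential fit (location 0) only makes sense for non-negative data.
    if "exponential" not in exclude_dist and data[0] >= 0 and mean > 0:
        candidates["exponential"] = lambda p, scale=mean: -scale * math.log(1.0 - p)

    if not candidates:
        raise ValueError("no candidate distribution left to evaluate")

    scores = {
        label: wasserstein_to_quantiles(data, quantile_fn)
        for label, quantile_fn in candidates.items()
    }
    return min(scores, key=scores.get), scores
```

Passing exclude_dist=("exponential",) removes that candidate before scoring, which mirrors the effect the exclude_dist parameter is described as having on auto-selection.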

DensityFunction (MLTK):

The DensityFunction algorithm from the Splunk AI Toolkit (formerly the Machine Learning Toolkit, MLTK) can be selected per model as an alternative. It relies on the toolkit’s fit and apply commands, and its models are stored as .mlmodel files on disk.

Selecting the algorithm:

Add algorithms to the splk_outliers_mltk_algorithms_list system-wide option as a comma-separated list. These algorithms become selectable in all Outliers configuration screens. Set the default algorithm for new models via splk_outliers_mltk_algorithms_default.

Extra fit/apply parameters:

  • splk_outliers_fit_extra_parameters: Additional parameters passed to the fit command during training (e.g., exclude_dist="beta" to exclude Beta distributions)

  • splk_outliers_apply_extra_parameters: Additional parameters passed to the apply command during rendering (e.g., sample="True" for the MLTK DensityFunction)

These parameters are applied when ML model rules are defined (typically at entity discovery or model reset).

Custom boundaries extraction macros:

For algorithms that require custom boundary calculations, define Splunk macros and register them in splk_outliers_boundaries_extraction_macros_list. The default macro used for new models is set via splk_outliers_boundaries_extraction_macro_default.

KVstore model lifecycle management:

When using the native engine with KVstore storage, TrackMe automatically manages the model lifecycle:

  • Models are stored in per-tenant KVstore collections, providing tenant isolation

  • The general health manager periodically detects and removes orphan models — models whose associated entity or outlier rule no longer exists

  • After upgrading from file-based to KVstore storage, the render command detects missing KVstore models and triggers automatic training on first access, ensuring zero downtime for ML outlier results
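The orphan detection described above amounts to a set comparison. The Python sketch below shows the idea only; the record keys (model_id, object_id) and the function name are hypothetical and do not reflect the actual KVstore schema or the health manager's code.

```python
def find_orphan_models(models, entity_ids, rule_entity_ids):
    """Return the IDs of models whose entity or outlier rule no longer exists.

    models: iterable of dicts with hypothetical keys "model_id" and "object_id".
    entity_ids: IDs of entities that still exist.
    rule_entity_ids: IDs of entities that still have outlier rules.
    """
    # A model is kept only if its entity exists AND still has a rule.
    live = set(entity_ids) & set(rule_entity_ids)
    return [m["model_id"] for m in models if m["object_id"] not in live]
```

A model is reported as an orphan as soon as either side is missing, which matches the "entity or outlier rule no longer exists" wording.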

Expanding ML model data

The trackmesplkoutliersexpand streaming command expands the complex JSON dictionaries stored in the ML model KVstore collections into individual fields, making them easier to analyze and filter.

Expanding model results:

| trackmesplkoutliersgetdata tenant_id="<tenant_id>" component="<component>" object="<entity_name>"
| trackmesplkoutliersexpand

Expanding model rules (definitions):

| trackmesplkoutliersgetrules tenant_id="<tenant_id>" component="<component>" object="<entity_name>"
| rename entities_outliers as models_summary
| trackmesplkoutliersexpand

Extracting specific fields from outlier reasons:

After expanding results, use rex to extract details from the isOutlierReason field:

| rex field=isOutlierReason "time=(?<outlier_time>[^,]+)"
| rex field=isOutlierReason "pct_decrease=(?<pct_decrease>[^,\}]+)"
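When expanded results are exported and processed outside Splunk, the same two patterns can be applied with Python's re module. This is a small sketch: the function name is hypothetical, and the exact layout of isOutlierReason is assumed only to the extent implied by the rex expressions above (key=value pairs separated by commas).

```python
import re

# Same patterns as the rex commands, using Python's named-group syntax.
TIME_RE = re.compile(r"time=(?P<outlier_time>[^,]+)")
PCT_RE = re.compile(r"pct_decrease=(?P<pct_decrease>[^,}]+)")


def parse_outlier_reason(reason):
    """Extract outlier_time and pct_decrease when present in the reason text."""
    fields = {}
    for pattern in (TIME_RE, PCT_RE):
        match = pattern.search(reason)
        if match:
            fields.update(match.groupdict())
    return fields
```

Fields that are absent from the reason text are simply left out of the returned dictionary.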

REST API endpoints

TrackMe exposes REST API endpoints for programmatic control of ML Outliers operations. These endpoints are used internally by the UI and can be called directly for automation.

Train models:

| trackme mode=post url="/services/trackme/v2/splk_outliers_engine/write/outliers_train_models" body="{'tenant_id': '<tenant_id>', 'component': '<component>', 'object': '<entity_name>'}"

Reset models (restore to defaults and retrain):

| trackme mode=post url="/services/trackme/v2/splk_outliers_engine/write/outliers_reset_models" body="{'tenant_id': '<tenant_id>', 'component': '<component>', 'object': '<entity_name>'}"

Delete models:

| trackme mode=post url="/services/trackme/v2/splk_outliers_engine/write/outliers_delete_models" body="{'tenant_id': '<tenant_id>', 'component': '<component>', 'object': '<entity_name>'}"
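For automation scripts that generate these searches, the three calls differ only in the endpoint name. The Python helper below is a hypothetical convenience, not part of TrackMe: it only assembles the search string shown above, and how that search is then submitted to Splunk is left out.

```python
ENDPOINT_BASE = "/services/trackme/v2/splk_outliers_engine/write"

# Endpoint names as documented above; the short action keys are illustrative.
ACTIONS = {
    "train": "outliers_train_models",
    "reset": "outliers_reset_models",
    "delete": "outliers_delete_models",
}


def build_trackme_search(action, tenant_id, component, entity_name):
    """Return the `| trackme mode=post ...` search string for one entity."""
    for value in (tenant_id, component, entity_name):
        # Quotes would break the quoting of the body, so reject them outright.
        if "'" in value or '"' in value:
            raise ValueError(f"quotes are not supported in value: {value!r}")
    body = (
        "{"
        f"'tenant_id': '{tenant_id}', "
        f"'component': '{component}', "
        f"'object': '{entity_name}'"
        "}"
    )
    return f'| trackme mode=post url="{ENDPOINT_BASE}/{ACTIONS[action]}" body="{body}"'
```

An unknown action raises KeyError, which keeps a typo from silently targeting the wrong endpoint.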

Mass operations via REST API

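For bulk operations across multiple entities, combine the trackmesplkoutliersexpand command with Splunk’s map command to iterate over filtered results. Note that map runs at most 10 searches by default, so set its maxsearches argument when more entities match the filter.

Example: mass delete models matching a specific KPI metric: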

| trackmesplkoutliersgetrules tenant_id="<tenant_id>" component="<component>"
| rename entities_outliers as models_summary
| trackmesplkoutliersexpand
| search kpi_metric="<target_metric>"
| map search="| trackme mode=post url=\"/services/trackme/v2/splk_outliers_engine/write/outliers_delete_models\" body=\"{'tenant_id': '$tenant_id$', 'component': '$component$', 'object': '$object$'}\""

Warning

Mass delete operations are irreversible. Always verify the filter criteria by running the search without the | map portion first to confirm the targeted entities.

Troubleshooting

Understanding ML rendering decisions

To inspect the detailed decision-making process of the ML rendering engine, use the trackmeprettyjson command on the raw rendering results. This shows per-time-frame density function output, boundary calculations, and auto-correction decisions:

| trackmesplkoutliersgetdata tenant_id="<tenant_id>" component="<component>" object="<entity_name>"
| fields _time, _raw
| trackmeprettyjson fields=_raw

The JSON output includes the rendering results for each time bucket, the calculated boundaries (lowerBound / upperBound), whether the value was flagged as an outlier, and whether auto-correction rejected the outlier.

Training logs

ML training activity is logged in index=_internal under two sourcetypes:

Orchestrator logs (job-level activity, entity selection, timing):

index=_internal sourcetype=trackme:custom_commands:trackmesplkoutlierstrainhelper

Per-entity training logs (model training details, period exclusions, errors):

index=_internal sourcetype=trackme:custom_commands:trackmesplkoutlierstrain

Monitoring logs

ML monitoring (rendering) activity is logged in index=_internal under two sourcetypes:

Orchestrator logs (job-level activity, entity selection, timing):

index=_internal sourcetype=trackme:custom_commands:trackmesplkoutlierstrackerhelper

Per-entity rendering logs (outlier detection results, score events, errors):

index=_internal sourcetype=trackme:custom_commands:trackmesplkoutliersrender