Outliers Anomaly Detection¶
Machine Learning Outliers Anomaly Detection in TrackMe¶
TrackMe implements Machine Learning Outliers Anomaly detection across all components, from feeds tracking to the monitoring of scheduled activity in Splunk.
TrackMe implements its own native ML density function engine (TrackMeNativeDensityFunction), providing a built-in outlier detection system. Models are stored and managed in per-tenant KVstore collections, delivering better performance, easier management, and eliminating replication overhead in Search Head Cluster (SHC) environments. The DensityFunction algorithm from the Splunk AI Toolkit is also available and can be selected per model. The workflow operates in two phases:
- ML training (`mltrain`): models are generated and trained from historical metrics stored in TrackMe's metric store indexes, and persisted to the KVstore (or optionally to file)
- ML monitoring (`mlmonitor`): TrackMe evaluates the anomaly detection status for each entity by applying trained models
Models are created automatically when entities are discovered and can be customized per entity to adjust behavior as needed.
Impact-Based Alerting and ML Outliers
TrackMe uses Impact-Based Alerting (IBA) to determine how ML Outliers affect entity status. When an outlier is detected, TrackMe generates outlier score events with a configurable impact score (default: 36). Only when the cumulative impact score reaches the alerting threshold (score >= 100) does the entity transition to a red alert state. This approach provides high flexibility and significantly reduces false positive risks.
Outlier impact scores are configurable per tenant and per component
Outliers alone (at default score 36) will not trigger a red alert unless combined with other anomalies or the score is tuned higher
One-click false positive suppression is available for individual outliers or entire entities
See QUICK START - TrackMe Impact-Based Alerting (IBA) for full details on Impact-Based Alerting
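The cumulative scoring logic above can be sketched in a few lines. This is an illustrative model of Impact-Based Alerting, not TrackMe's internal implementation: score events within a 24-hour rolling window are summed, older events expire, and the entity turns red only when the total reaches 100. All names here are hypothetical.

```python
from datetime import datetime, timedelta

ALERT_THRESHOLD = 100
WINDOW = timedelta(hours=24)

def cumulative_score(score_events, now):
    """Sum scores of events newer than 24 hours; older events expire."""
    return sum(s for t, s in score_events if now - t <= WINDOW)

now = datetime(2024, 6, 1, 12, 0)
events = [
    (now - timedelta(hours=30), 36),    # expired: outside the 24h window
    (now - timedelta(hours=2), 36),     # ML outlier (default score)
    (now - timedelta(hours=1), 36),     # second concurrent outlier
    (now - timedelta(minutes=10), 40),  # e.g. a delay breach anomaly
]

score = cumulative_score(events, now)
print(score, score >= ALERT_THRESHOLD)  # 112 True -> red alert
```

A single outlier at 36 stays well below the threshold; it takes several concurrent anomalies (or a tuned-up score) to cross 100.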
Key capabilities
- Confidence levels: ML models have a confidence level (`low` or `normal`) based on available historical data. Models with `low` confidence do not influence entity status until sufficient data has been collected (configurable, default: 7 days)
- Seasonality control: Models can include time-based seasonality factors, or set the `time_factor` to `none` to exclude seasonal concepts for KPIs that are not driven by time-of-day or day-of-week patterns
- Threshold guards: Per-model minimal and maximal thresholds for LowerBound and UpperBound outliers allow rejecting outliers that do not meet defined value boundaries
- Auto-correction: Built-in deviation checks reject outliers where the variation is too small relative to the current KPI value, reducing false positives
- True context simulation: Simulation mode trains a dedicated simulation model that fully replicates live model behavior, ensuring accurate previews before applying changes
- Native ML engine: TrackMe's built-in `TrackMeNativeDensityFunction` engine uses scipy with automatic best-fit distribution selection (normal, exponential, Gaussian KDE, beta), with the MLTK `DensityFunction` also available
- KVstore model storage: ML models are stored in per-tenant KVstore collections by default, with file-based storage available as an alternative
- Custom algorithms and parameters: Define custom algorithms, extra `fit`/`apply` parameters, and custom boundaries extraction macros for advanced ML requirements
- Selective enablement: ML Outliers can be enabled or disabled per tenant and per component using the `mloutliers` and `mloutliers_allowlist` configuration options
- Bulk operations: Reset, enable/disable, train, and monitor operations can be performed in bulk via the UI
See also: Use TrackMe to detect abnormal events count drop in Splunk feeds for a practical use case around Machine Learning outlier detection.
Seasonality Concepts¶
Most data sources exhibit recurring patterns — higher activity during business hours, lower volumes on weekends, periodic batch ingestion cycles. TrackMe’s ML Outliers engine leverages these patterns by training models that learn the expected behavior over time and flag deviations as potential anomalies.
Sample pattern over the past 30 days — seasonality by weekday with higher activity during working hours:
However, not all KPIs follow seasonal patterns. For metrics with steady, time-independent behavior (for example, a fixed-rate feed or a constant host count), applying seasonality to the model can introduce unnecessary noise. To handle these cases, ML models can be configured with the time_factor set to none, which excludes day-of-week and hour-of-day seasonality from the outlier calculations. This can be set on a per-model basis, or chosen as the default for new models via the splk_outliers_detection_timefactor_default configuration option.
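The effect of a time factor can be illustrated with standard strftime formatting. This is a hedged sketch of the concept, not TrackMe internals: a factor such as "%w%H" buckets observations per weekday+hour so each bucket learns its own baseline, while "none" collapses everything into a single global baseline.

```python
from datetime import datetime

def seasonality_bucket(ts, time_factor):
    """Return the seasonality bucket key for a timestamp (illustrative)."""
    if time_factor == "none":
        return "all"  # no seasonality: one global baseline
    return ts.strftime(time_factor)

monday_9am = datetime(2024, 6, 3, 9, 0)  # a Monday
print(seasonality_bucket(monday_9am, "%w%H"))  # "109" (weekday 1, hour 09)
print(seasonality_bucket(monday_9am, "%H"))    # "09"
print(seasonality_bucket(monday_9am, "none"))  # "all"
```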
Note
Generating samples for outliers detection
You can find, download, and use the following sample generator with no restrictions: https://github.com/trackme-limited/mlgen-python — we use it to generate data with seasonality concepts for development, qualification, and documentation purposes.
Confidence Level¶
TrackMe assigns a confidence level to each ML model during training, based on the amount of historical data available:
low: The model is trained and rendered, but outlier results do not influence entity status. This prevents false positives while the model is still learning from limited data.
normal: The model is fully trusted and outlier results contribute to the entity’s impact score.
The minimum number of days of historical metrics required to reach normal confidence is controlled by the splk_outliers_min_days_history configuration option (default: 7 days). The confidence level and reason are visible in the Manage Outliers screen and stored in the rules KVstore collection.
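The confidence rule can be expressed as a small decision function. This sketch is illustrative (the function name is hypothetical); only the 7-day default from splk_outliers_min_days_history comes from the documentation:

```python
from datetime import datetime

def confidence_level(first_metric_time, now, min_days_history=7):
    """Return 'normal' once enough historical metrics exist, else 'low'."""
    days_available = (now - first_metric_time).days
    return "normal" if days_available >= min_days_history else "low"

now = datetime(2024, 6, 10)
print(confidence_level(datetime(2024, 6, 7), now))  # "low": only 3 days
print(confidence_level(datetime(2024, 5, 1), now))  # "normal": 40 days
```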
Outliers Impact Score & triggering on Upper Bound/Lower Bound breaches¶
When TrackMe’s ML engine detects an outlier, it generates outlier score events that feed into the entity’s cumulative impact score. Whether an outlier actually influences the entity status depends on several factors:
Upper Bound and Lower Bound breach configuration:
Each ML model defines which breach directions are active:
- `alert_lower_breached`: When enabled, a LowerBound outlier (observed value falls below the predicted lower boundary) generates a score event
- `alert_upper_breached`: When enabled, an UpperBound outlier (observed value exceeds the predicted upper boundary) generates a score event
These settings are configured per model, giving you full control over which types of volume deviations are meaningful for each metric. For example, you may want to alert only on lower bound breaches for a critical feed (detecting data drops) while ignoring upper bound breaches that represent harmless spikes.
Impact score behavior:
The default outlier impact score is 36 (configurable per tenant and per component)
At the default score, a single outlier detection alone will not trigger a red alert (which requires a cumulative score >= 100)
Multiple concurrent outlier detections, or outliers combined with other anomalies (delay breach, latency breach, etc.), can accumulate to reach the alerting threshold
Outlier score events are aggregated over a 24-hour rolling window — older scores automatically expire
Threshold guards:
Per-model minimal and maximal value thresholds provide an additional layer of filtering:
- `min_value_for_lowerbound_breached`: If the observed value remains above this minimum threshold, the LowerBound breach is considered insignificant and rejected
- `min_value_for_upperbound_breached`: If the observed value remains below this minimum threshold, the UpperBound breach is considered insignificant and rejected
This filters out insignificant deviations — for example, a small dip in event count that technically breaches the lower boundary but represents a negligible variation. Rejected outliers do not generate score events and do not affect the entity’s impact score.
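Combining breach directions with threshold guards can be sketched as follows. The parameter names mirror the model options described above, but the evaluation function itself is a hypothetical illustration, not TrackMe's code:

```python
def evaluate_outlier(value, lower, upper, rules):
    """Return the breach type, or None if disabled or rejected by a guard."""
    if value < lower and rules.get("alert_lower_breached"):
        guard = rules.get("min_value_for_lowerbound_breached")
        # a dip only counts if the value actually fell below the guard
        if guard is None or value < guard:
            return "LowerBound"
    if value > upper and rules.get("alert_upper_breached"):
        guard = rules.get("min_value_for_upperbound_breached")
        # a spike only counts if the value actually rose above the guard
        if guard is None or value > guard:
            return "UpperBound"
    return None

rules = {
    "alert_lower_breached": 1,
    "alert_upper_breached": 0,  # ignore harmless spikes for this feed
    "min_value_for_lowerbound_breached": 500,
}
print(evaluate_outlier(450, 800, 2000, rules))   # "LowerBound"
print(evaluate_outlier(600, 800, 2000, rules))   # None: stayed above the guard
print(evaluate_outlier(2500, 800, 2000, rules))  # None: upper breach disabled
```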
False positive suppression:
When an outlier is identified as a false positive, a one-click action generates a negative score event that cancels out the outlier’s positive score. The anomaly reason is preserved for audit purposes but no longer contributes to the entity status. See QUICK START - TrackMe Impact-Based Alerting (IBA) for details on false positive management.
Outliers impact score and breach configuration in the Manage Outliers screen:
Outliers per component¶
ML Outliers detection is available across TrackMe components, but its value and applicability vary significantly depending on the nature of the data being monitored.
Enabling Outliers at tenant creation¶
When creating a new Virtual Tenant, the Advanced Options section includes the Outliers enablement setting. TrackMe automatically pre-selects the recommended value based on the component type:
dsm and flx tenants: Outliers is enabled by default
dhm, wlk, and fqm tenants: Outliers is disabled by default
mhm tenants: Outliers is not applicable
This setting can be changed at any time after creation from the Virtual Tenants account configuration page, allowing you to enable or disable Outliers detection as your requirements evolve.
Applicability per component¶
| Component | Recommendation | Details |
|---|---|---|
| dsm (Data Sources) | Recommended | High value for volume-based anomaly detection on data feeds. Particularly valuable for Foundation Edition customers who cannot leverage Flex Objects for a more segmented approach to outliers detection. |
| dhm (Data Hosts) | Not recommended | Host-level metrics tend to have low predictability and stability due to the dynamic nature of endpoints. Outliers detection on hosts typically generates excessive false positives. |
| mhm (Metric Hosts) | Not applicable | Outliers detection is not applicable to metric hosts. |
| wlk (Workload) | Not recommended | Workload monitoring generally has low predictability and limited value from outliers detection. TrackMe automatically disables Outliers by default when creating a Workload tenant. |
| flx (Flex Objects) | Key feature | Outliers is a core capability of Flex Objects, enabling precise anomaly detection on custom metrics with full control over model configuration. Can be disabled at the tenant level for use cases where outliers detection is not applicable (no metrics, or simply not wanted). |
| fqm (Fields Quality) | Not recommended | Fields quality metrics are generally not suitable for outliers detection due to their inherent variability. |
Hint
The mloutliers_allowlist configuration option controls which components have ML Outliers enabled per tenant. By default, all applicable components are included (`dsm`, `dhm`, `flx`, `wlk`, `fqm`). Adjust this list to match your environment: for example, set it to `dsm,flx` to focus on the highest-value components.
Enabling/disabling Anomaly Outliers at the tenant level during tenant creation:
Enabling/disabling Anomaly Outliers at the tenant level from the Virtual Tenants account configuration page:
Flex Objects and Outliers¶
For Flex Objects (flx), ML Outliers models are primarily driven by the tracker configuration. When creating or editing a Flex tracker through the UI, the Outliers Metrics (ML) section allows you to define outliers models directly as part of the tracker setup:
Select which metrics should have outliers detection enabled
Configure per-metric parameters at creation time: impact score, breach directions, density thresholds, time factor, auto-correction, period calculation, and static bounds
These settings are applied automatically when entities are discovered by the tracker, creating pre-configured ML models without manual intervention
This approach is a key advantage of Flex Objects — the outliers configuration is part of the tracker definition itself, ensuring consistent model parameters across all entities discovered by that tracker.
For other components (dsm, dhm, etc.), ML models are created with system-wide defaults at entity discovery and must be tuned individually or via bulk actions afterward.
Hint
When a Flex tracker defines outliers_metrics in its search results, the ML models are automatically created with the specified parameters. This can also be set programmatically via the outliers_metrics field in the tracker search output — see splk-flx - Creating and managing Flex Trackers for details.
Demonstrating Machine Learning Outliers Detection in TrackMe¶
How ML Outliers Detection Works in TrackMe¶
In short:
Outliers rely on TrackMe-generated metrics only
This allows running fast and efficient training and rendering searches, with minimal cost in terms of resources
For the purpose of this demonstration, we create a Flex Object TrackMe tenant that uses our ML generator:
For more information about the Flex Objects component: splk-flx - Creating and managing Flex Trackers
Our Flex tracker:
Tracker name: “demo”
Runs every 5 minutes (earliest: -5m, latest: now)
index=mlgen ref=* instance_id=*
| stats avg(dcount_hosts) as dcount_hosts, avg(events_count) as events_count, values(instance_id) as instance_id by ref
| eval group = "demo"
| eval object = instance_id . ":" . ref, alias = ref
| eval object_description = "Demo Outliers in TrackMe"
| eval metrics = "{'dcount_hosts': " . dcount_hosts . ", 'events_count': " . events_count . "}"
| eval outliers_metrics="{'dcount_hosts': {'alert_lower_breached': 1, 'alert_upper_breached': 1, 'time_factor': '%H', 'period_calculation': '-90d'}, 'events_count': {'alert_lower_breached': 1, 'alert_upper_breached': 1, 'time_factor': '%H', 'period_calculation': '-90d'}}"
| eval status=1
| eval status_description="Machine Learning Outliers detection demo"
| table group, object, alias, object_description, metrics, outliers_metrics, status, status_description
``` default metric for the TrackMe UI to pick when opening the entity screen ```
| eval default_metric="events_count"
``` alert if inactive for more than 3600 sec```
| eval max_sec_inactive=3600
This Flex Tracker creates entities by monitoring the availability of data in our ML index; it also generates metrics and automates the definition of models which alert on both lower bound and upper bound outliers.
Our ML generator is running and has backfilled the past 90 days of data; it currently does not generate any outliers:
index=mlgen ref="security:linux_secure" earliest=-90d
| timechart avg(events_count) as avg_events_count span=1h
Our ML generator takes weekdays into account; we can use the following search to compare today's activity against the same weekday in the three previous weeks:
index=mlgen ref="security:linux_secure" earliest=@d latest=+1d@d
| timechart span=5m avg(events_count) as events_count_today
| appendcols [
search index=mlgen ref="security:linux_secure" earliest=-7d@d latest=-6d@d
| timechart span=5m avg(events_count) as events_count_ref
]
| appendcols [
search index=mlgen ref="security:linux_secure" earliest=-14d@d latest=-13d@d
| timechart span=5m avg(events_count) as events_count_ref2
]
| appendcols [
search index=mlgen ref="security:linux_secure" earliest=-21d@d latest=-20d@d
| timechart span=5m avg(events_count) as events_count_ref3
]
Results:
TrackMe automatically discovered the entity. Let's take note of its internal identifier, which we will use to manually backfill TrackMe metrics as if we had been monitoring this entity from the beginning:
We use mcollect to force-backfill the metrics; make sure to replace the lookup name with your valid tenant_id value:
index=mlgen ref=* instance_id=* earliest=-90d latest=now
| bucket _time span=5m
| stats avg(dcount_hosts) as trackme.splk.flx.dcount_hosts, avg(events_count) as trackme.splk.flx.events_count by _time, instance_id, ref
``` form the object ```
| eval object = "demo" . ":" . instance_id . ":" . ref
``` lookup the tenant/component ```
| lookup trackme_flx_tenant_demo-outliers object OUTPUT tenant_id, _key as object_id, object_category
| where isnotnull(object_id)
``` collect ```
| mcollect index=trackme_metrics split=t object, object_category, object_id, tenant_id
Opening the entity shows we have backfilled metrics now:
Depending on whether TrackMe has already run the ML training job for the tenant, ML Outliers may not be ready yet:
We can either run the mltrain job manually, or train the models via the UI:
Machine Learning Outliers is ready (we have no outliers yet):
Let's access and review the model definitions; for now, we will only increase the training period to the past 90 days:
Click on “Manage Outliers detection”
Update the models to increase the time range for the calculation
Manually run a training for each model
Click on Simulate Selected to review the results (we have selected the event count model); this is looking great for now
Scenario: Detecting a lower bound outlier¶
Although we know there is a weekday behavior in the data, for now we will keep the default settings and start generating a lower bound outlier.
To achieve this, we stop run_backfill.sh and start run_gen_lowerbound_outlier.sh, which:
Influences the metrics with a large decrease of approximately 75%, scaled to the magnitude of the weekday/hour range
After a few minutes, we start to see a clear outlier using the previous days’ comparison timechart search:
The outliers condition will also be reflected in TrackMe. It can take 5 to 10 minutes to be detected as an effective outlier:
Excellent—the sudden decrease in activity has been detected successfully!
The next action is to run the ML rendering process, which can be achieved in different ways:
Through the bulk actions for Outliers Detection
Through the Outliers management screen
Note
TrackMe monitors every entity’s models by default once per hour at the earliest, which is configurable through the system-wide options: System-wide configuration options
At some point, the Outliers condition is pertinent enough to generate a sufficient impact score to trigger a red alert:
If an AI provider is configured, the AI Assistant can be leveraged to investigate the Outliers condition and generate an AI status report in stateful email notifications:
Fine tuning the ML models¶
True context simulation¶
TrackMe performs true context simulations — when a simulation is executed, TrackMe trains a dedicated model specifically for that simulation, allowing the outlier detection to fully reflect live behavior:
All parameters, whether saved or not yet saved, are taken into account on the fly during the simulation
You can add specific conditions such as periods of exclusion
Adjusting model parameters¶
From the Manage Outliers detection screen, each model can be individually tuned to match the behavior of the underlying data. The key parameters are:
- Outlier impact score (`outlier_impact_score`): The score value generated when an outlier is detected. Default: 36. A single outlier at the default score will not trigger a red alert on its own (a cumulative score >= 100 is required). Increase the value to make outlier detections more impactful, or decrease it to reduce their weight.
- Training period (`period_calculation`): Controls how far back in time the model looks during training. Longer periods capture more historical patterns but may dilute recent trends; shorter periods react faster to changes.
- Time factor (`time_factor`): Determines the seasonality granularity. Common values include `%w%H` (weekday + hour), `%H` (hour only), `%w` (weekday only), or `none` to disable seasonality entirely.
- Density thresholds (`density_lowerthreshold` / `density_upperthreshold`): Control the sensitivity of the density function algorithm (both `TrackMeNativeDensityFunction` and `DensityFunction`). Lower values produce tighter boundaries (more sensitive), higher values produce wider boundaries (more tolerant).
- Auto-correction (`auto_correct`): When enabled, outliers where the deviation is below a configurable percentage of the current KPI value are automatically rejected as insignificant. The deviation thresholds are controlled by `perc_min_lowerbound_deviation` and `perc_min_upperbound_deviation`.
All changes can be previewed using the Simulate function before saving — see True context simulation above.
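The auto-correction idea lends itself to a short sketch. The exact formula TrackMe applies is not documented here, so this is a hedged illustration of the principle (deviation measured as a percentage of the current KPI value, compared against the minimum deviation parameter):

```python
def passes_auto_correct(value, boundary, perc_min_deviation=5.0):
    """Keep an outlier only if its deviation from the breached boundary
    is at least perc_min_deviation percent of the current KPI value."""
    if value == 0:
        return True
    deviation_pct = abs(value - boundary) / abs(value) * 100
    return deviation_pct >= perc_min_deviation

# A ~2% dip below the lower boundary is rejected as insignificant:
print(passes_auto_correct(value=980, boundary=1000))  # False
# A large drop is kept:
print(passes_auto_correct(value=750, boundary=1000))  # True
```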
Excluding periods from training¶
When an incident or planned maintenance causes abnormal data patterns, the affected time period can be excluded from ML model training to prevent the model from learning incorrect baselines.
How to use:
Open the Manage Outliers detection screen for the entity
Click Period Exclusions on the target model
Define the start and end time for the exclusion window
Behavior:
- Excluded periods are removed from the training dataset during the next `mltrain` cycle
- When the exclusion window falls entirely outside the model's training period (e.g., the exclusion is older than the 30-day training window), TrackMe automatically removes the exclusion entry during training
- Exclusion events are logged in `index=_internal` with sourcetype `trackme:custom_commands:trackmesplkoutlierstrain` and the keyword `period exclusion`
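The automatic cleanup of stale exclusions can be sketched as a pruning step. This is an illustrative function (names are hypothetical): exclusion windows that end before the start of the training period are dropped during training.

```python
from datetime import datetime, timedelta

def prune_exclusions(exclusions, now, training_days=30):
    """Drop exclusion windows that end before the training period starts."""
    train_start = now - timedelta(days=training_days)
    return [(start, end) for (start, end) in exclusions if end >= train_start]

now = datetime(2024, 6, 30)
exclusions = [
    (datetime(2024, 4, 1), datetime(2024, 4, 2)),    # older than 30d: removed
    (datetime(2024, 6, 20), datetime(2024, 6, 21)),  # still relevant: kept
]
print(prune_exclusions(exclusions, now))
```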
Periods of exclusion can be added through bulk actions:
Or added and managed through the Outliers management screen:
Managing Outliers¶
Manage Outliers screen¶
The Manage Outliers detection screen is the central interface for reviewing and configuring ML models for a given entity. It is accessible from the entity’s detail view by clicking Manage Outliers detection.
The screen provides:
An overview of all ML models defined for the entity, with their current status, confidence level, and last training/monitoring timestamps
Per-model configuration: training period, time factor, density thresholds, breach direction settings, auto-correction, and threshold guards
Simulate Selected: Runs a true context simulation for the selected model, training a dedicated simulation model to preview results before applying changes
Train Selected: Triggers an immediate training cycle for the selected model
Save: Persists configuration changes to the rules KVstore collection
Bulk actions¶
TrackMe provides bulk action capabilities for ML Outliers from the entity detail view. Two categories of bulk actions are available:
Outliers actions:
Train All Models: Triggers an immediate training cycle for all models of the entity
Monitor All Models: Triggers an immediate monitoring (rendering) cycle for all models of the entity
Reset All Models: Resets all models to their default configuration and retrains them
Outliers rules:
Enable All Models: Enables outlier detection for all models of the entity
Disable All Models: Disables outlier detection for all models of the entity
These bulk actions are available directly from the entity screen, allowing efficient management without opening individual model configurations.
Enabling and disabling Outliers detection¶
ML Outliers detection can be controlled at multiple levels:
Per-tenant:
In the Configuration UI, the mloutliers setting controls whether ML Outliers is enabled for the entire tenant. When set to 0 (disabled), no ML training or monitoring jobs run for that tenant.
Per-component:
The mloutliers_allowlist setting defines which components have ML Outliers enabled. By default, all components are included (`dsm`, `dhm`, `flx`, `wlk`, `fqm`). Remove a component from the list to disable ML Outliers for that component type across the tenant.
Per-entity:
Individual models can be disabled using the is_disabled flag in the Manage Outliers detection screen, or via the bulk Disable All Models action.
At discovery:
The splk_outliers_detection_disable_default system-wide option controls whether outlier detection is enabled or disabled by default when new entities are discovered.
Backend & Scheduling¶
ML training scheduled jobs¶
The ml_train scheduled job orchestrates the training of ML models across all enabled tenants and components.
- Schedule: Runs every hour
- Max runtime: 3600 seconds minus a 120-second safety margin
- Behavior: The orchestrator iterates through eligible entities sequentially, training models that have exceeded the `splk_outliers_time_train_mlmodels_default` interval since their last training
- Orchestrator command: `trackmesplkoutlierstrainhelper`
- Per-entity command: `trackmesplkoutlierstrain`
The training frequency per entity is governed by the splk_outliers_time_train_mlmodels_default system-wide option (default: 7 days). When the ml_train job runs, it selects entities whose models are due for retraining based on this interval.
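The due-date selection described above can be sketched as a simple filter. This is a hypothetical illustration of the orchestrator's selection logic; only the 604800-second (7-day) default of splk_outliers_time_train_mlmodels_default comes from the documentation:

```python
from datetime import datetime, timedelta

def due_for_training(entities, now, interval_sec=604800):
    """Select entities whose last training exceeds the retrain interval."""
    return [
        name for name, last_train in entities
        if last_train is None
        or (now - last_train).total_seconds() >= interval_sec
    ]

now = datetime(2024, 6, 10, 12, 0)
entities = [
    ("feed_a", now - timedelta(days=8)),  # overdue
    ("feed_b", now - timedelta(days=2)),  # trained recently
    ("feed_c", None),                     # never trained
]
print(due_for_training(entities, now))  # ['feed_a', 'feed_c']
```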
ML monitoring scheduled jobs¶
The mlmonitor scheduled job evaluates trained models against current data to detect outliers.
- Schedule: Runs every 20 minutes
- Max runtime: 900 seconds minus a 120-second safety margin
- Behavior: The orchestrator iterates through eligible entities sequentially, rendering models that have exceeded the `splk_outliers_time_monitor_mlmodels_default` interval since their last monitoring
- Orchestrator command: `trackmesplkoutlierstrackerhelper`
- Per-entity command: `trackmesplkoutliersrender`
The monitoring frequency per entity is governed by the splk_outliers_time_monitor_mlmodels_default system-wide option (default: 1 hour).
Note
Both training and monitoring jobs process entities sequentially within their allocated runtime window. If the job reaches its max runtime before processing all entities, remaining entities are picked up in the next cycle.
System-wide configuration options¶
The following options are available in Configuration > System Options and control the default behavior of ML Outliers across all tenants and components. These options are applied when new entities are discovered and new ML models are created.
| Option | Default | Description |
|---|---|---|
| `splk_outliers_min_days_history` | 7 | Minimal number of days of historical metrics required to reach `normal` confidence. |
| `splk_outliers_time_train_mlmodels_default` | 604800 | Interval in seconds between model training cycles per entity. Default: 7 days. |
| `splk_outliers_time_monitor_mlmodels_default` | 3600 | Interval in seconds between model monitoring (rendering) cycles per entity. Default: 1 hour. |
|  | 900 | Maximum duration in seconds for the ML training job. Should align with the job's cron schedule. Default: 15 minutes. |
|  | 15 | If a model has not been trained within this many days, it is automatically retrained before rendering. Default: 15 days. |
| `splk_outliers_detection_disable_default` | 0 | When set to 1, outlier detection is disabled by default when new entities are discovered. |
|  | stdev | Default calculation mode for anomaly detection. Can be updated per entity. |
|  | 0.005 | Default lower threshold for the density function algorithm. Lower values = tighter boundaries. |
|  | 0.005 | Default upper threshold for the density function algorithm. Lower values = tighter boundaries. |
|  | 1 | Alert on lower bound breaches for volume-based KPIs. |
|  | 0 | Alert on upper bound breaches for volume-based KPIs. |
|  | 0 | Alert on lower bound breaches for latency-based KPIs. |
|  | 1 | Alert on upper bound breaches for latency-based KPIs. |
|  | -30d | Default relative time period used for outlier calculations. Applied at discovery, can be updated per entity. |
|  | -1d | Default latest time quantifier for outlier calculations. Accepts Splunk relative time quantifiers. |
| `splk_outliers_detection_timefactor_default` | %w%H | Default time factor for seasonality. Values include `%w%H`, `%H`, `%w`, or `none`. |
|  | None | Default KPI metric for latency outlier detection. |
|  | splk.feeds.avg_eventcount_5m | Default KPI metric for volume outlier detection. |
|  | 1 | Enable auto-correction by default. When enabled, outliers with insignificant deviations are rejected based on min deviation percentages. |
|  | 5.0 | Minimum percentage deviation required for a LowerBound outlier to be considered valid. Below this threshold, the outlier is auto-corrected. |
|  | 5.0 | Minimum percentage deviation required for an UpperBound outlier to be considered valid. Below this threshold, the outlier is auto-corrected. |
| `splk_outliers_mltk_algorithms_list` | TrackMeNativeDensityFunction, DensityFunction | Comma-separated list of selectable algorithms. |
| `splk_outliers_mltk_algorithms_default` | TrackMeNativeDensityFunction | Default algorithm used when ML model rules are created (at entity discovery or model reset). |
| `splk_outliers_native_model_storage` | kvstore | Storage backend for native `TrackMeNativeDensityFunction` models (KVstore or file-based). |
| `splk_outliers_fit_extra_parameters` | (empty) | Extra parameters appended to the `fit` command during training. |
| `splk_outliers_apply_extra_parameters` | (empty) | Extra parameters appended to the `apply` command during rendering. |
| `splk_outliers_boundaries_extraction_macro_default` | (empty) | Default Splunk macro used for boundaries extraction when defining ML model rules. Leave empty for standard behavior. |
| `splk_outliers_boundaries_extraction_macros_list` | (empty) | Comma-separated list of custom boundaries extraction macros. These become selectable in the Outliers management screens. |
|  | (empty) | Static override for the calculated lowerBound. When set, replaces the dynamically computed lower boundary. |
|  | (empty) | Static override for the calculated upperBound. When set, replaces the dynamically computed upper boundary. |
Per-model options¶
Each ML model stores the following options in the rules KVstore collection. These are set from system-wide defaults at discovery and can be customized per model through the Manage Outliers detection screen.
| Option | Description |
|---|---|
|  | The KPI metric used for outlier detection (e.g., `splk.feeds.avg_eventcount_5m`). |
|  | The time span used for metric aggregation during training and rendering. |
|  | The algorithm used for this model. |
|  | Storage backend for this model's trained data. |
| `period_calculation` | The relative time period used for training data (e.g., `-90d`). |
| `time_factor` | The seasonality factor applied to the model (e.g., `%w%H`, `%H`, `%w`, or `none`). |
| `density_lowerthreshold` | The lower threshold for the density function algorithm. |
| `density_upperthreshold` | The upper threshold for the density function algorithm. |
| `auto_correct` | When enabled (`1`), outliers with insignificant deviations are automatically rejected based on the minimum deviation percentages. |
| `perc_min_lowerbound_deviation` | Minimum deviation percentage for LowerBound outliers to be considered valid. |
| `perc_min_upperbound_deviation` | Minimum deviation percentage for UpperBound outliers to be considered valid. |
| `alert_lower_breached` | When enabled (`1`), LowerBound outliers generate score events. |
| `alert_upper_breached` | When enabled (`1`), UpperBound outliers generate score events. |
| `min_value_for_lowerbound_breached` | Threshold guard for LowerBound outliers. If the observed value remains above this threshold, the breach is considered insignificant and rejected. |
| `min_value_for_upperbound_breached` | Threshold guard for UpperBound outliers. If the observed value remains below this threshold, the breach is considered insignificant and rejected. |
| `outlier_impact_score` | The impact score value generated when an outlier is detected. Default: 36. Higher values increase the weight of outlier detections in the entity's cumulative impact score. |
| `is_disabled` | When set to `1`, the model is disabled and its outlier results do not influence the entity status. |
Advanced Topics¶
Accessing ML model rules and results¶
Model rules (configuration) can be retrieved using the dedicated command or directly from the KVstore:
| trackmesplkoutliersgetrules tenant_id="<tenant_id>" component="<component>" object="<entity_name>"
Or via the underlying KVstore lookup:
| inputlookup trackme_<component>_outliers_entity_rules_tenant_<tenant_id>
Model results (current outlier detection state) can be retrieved similarly:
| trackmesplkoutliersgetdata tenant_id="<tenant_id>" component="<component>" object="<entity_name>"
Or via the underlying KVstore lookup:
| inputlookup trackme_<component>_outliers_entity_data_tenant_<tenant_id>
Algorithms and model storage¶
TrackMe provides two density function algorithms for outlier detection:
TrackMeNativeDensityFunction (default):
The native engine is TrackMe’s built-in density function, powered by scipy. It automatically selects the best-fit distribution from four types — normal, exponential, Gaussian KDE, and beta — using Wasserstein distance. Models are stored in per-tenant KVstore collections (kv_trackme_native_ml_models_tenant_<tenant_id>) by default, with file-based storage available as an alternative via the splk_outliers_native_model_storage configuration option.
- Training uses the `trackmefit` streaming command
- Rendering uses the `trackmeapply` streaming command
- Distribution exclusion is supported via the `exclude_dist` parameter (e.g., `exclude_dist="beta"`)
DensityFunction (MLTK):
The DensityFunction algorithm from the Splunk AI Toolkit is also available and can be selected per model. This algorithm uses the Splunk AI Toolkit’s fit and apply commands. Models are stored as .mlmodel files on disk.
Selecting the algorithm:
Add algorithms to the splk_outliers_mltk_algorithms_list system-wide option as a comma-separated list. These algorithms become selectable in all Outliers configuration screens. Set the default algorithm for new models via splk_outliers_mltk_algorithms_default.
Extra fit/apply parameters:
- `splk_outliers_fit_extra_parameters`: Additional parameters passed to the `fit` command during training (e.g., `exclude_dist="beta"` to exclude Beta distributions)
- `splk_outliers_apply_extra_parameters`: Additional parameters passed to the `apply` command during rendering (e.g., `sample="True"` for the MLTK `DensityFunction`)
These parameters are applied when ML model rules are defined (typically at entity discovery or model reset).
Custom boundaries extraction macros:
For algorithms that require custom boundary calculations, define Splunk macros and register them in splk_outliers_boundaries_extraction_macros_list. The default macro used for new models is set via splk_outliers_boundaries_extraction_macro_default.
KVstore model lifecycle management:
When using the native engine with KVstore storage, TrackMe automatically manages the model lifecycle:
Models are stored in per-tenant KVstore collections, providing tenant isolation
The general health manager periodically detects and removes orphan models — models whose associated entity or outlier rule no longer exists
After upgrading from file-based to KVstore storage, the render command detects missing KVstore models and triggers automatic training on first access, ensuring zero downtime for ML outlier results
Expanding ML model data¶
The trackmesplkoutliersexpand streaming command expands the complex JSON dictionaries stored in the ML model KVstore collections into individual fields, making them easier to analyze and filter.
Expanding model results:
| trackmesplkoutliersgetdata tenant_id="<tenant_id>" component="<component>" object="<entity_name>"
| trackmesplkoutliersexpand
Expanding model rules (definitions):
| trackmesplkoutliersgetrules tenant_id="<tenant_id>" component="<component>" object="<entity_name>"
| rename entities_outliers as models_summary
| trackmesplkoutliersexpand
Extracting specific fields from outlier reasons:
After expanding results, use rex to extract details from the isOutlierReason field:
| rex field=isOutlierReason "time=(?<outlier_time>[^,]+)"
| rex field=isOutlierReason "pct_decrease=(?<pct_decrease>[^,\}]+)"
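Putting these pieces together, a complete inspection search might look as follows (a sketch: field names such as kpi_metric and isOutlierReason come from the examples on this page, but the exact set of expanded fields can vary by component):

```spl
| trackmesplkoutliersgetdata tenant_id="<tenant_id>" component="<component>" object="<entity_name>"
| trackmesplkoutliersexpand
| rex field=isOutlierReason "time=(?<outlier_time>[^,]+)"
| rex field=isOutlierReason "pct_decrease=(?<pct_decrease>[^,\}]+)"
| table object, kpi_metric, isOutlierReason, outlier_time, pct_decrease
```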
REST API endpoints¶
TrackMe exposes REST API endpoints for programmatic control of ML Outliers operations. These endpoints are used internally by the UI and can be called directly for automation.
Train models:
| trackme mode=post url="/services/trackme/v2/splk_outliers_engine/write/outliers_train_models" body="{'tenant_id': '<tenant_id>', 'component': '<component>', 'object': '<entity_name>'}"
Reset models (restore to defaults and retrain):
| trackme mode=post url="/services/trackme/v2/splk_outliers_engine/write/outliers_reset_models" body="{'tenant_id': '<tenant_id>', 'component': '<component>', 'object': '<entity_name>'}"
Delete models:
| trackme mode=post url="/services/trackme/v2/splk_outliers_engine/write/outliers_delete_models" body="{'tenant_id': '<tenant_id>', 'component': '<component>', 'object': '<entity_name>'}"
Mass operations via REST API¶
For bulk operations across multiple entities, combine the trackmesplkoutliersexpand command with Splunk’s map command to iterate over filtered results:
Example — mass delete models matching a specific KPI metric:
| trackmesplkoutliersgetrules tenant_id="<tenant_id>" component="<component>"
| rename entities_outliers as models_summary
| trackmesplkoutliersexpand
| search kpi_metric="<target_metric>"
| map search="| trackme mode=post url=\"/services/trackme/v2/splk_outliers_engine/write/outliers_delete_models\" body=\"{'tenant_id': '$tenant_id$', 'component': '$component$', 'object': '$object$'}\""
Warning
Mass delete operations are irreversible. Always verify the filter criteria by running the search without the | map portion first to confirm the targeted entities.
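For example, a dry run of the same filter with the | map stage dropped and a summary stage appended lets you review exactly which models would be targeted before deleting anything:

```spl
| trackmesplkoutliersgetrules tenant_id="<tenant_id>" component="<component>"
| rename entities_outliers as models_summary
| trackmesplkoutliersexpand
| search kpi_metric="<target_metric>"
| stats count by object, kpi_metric
```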
Troubleshooting¶
Understanding ML rendering decisions¶
To inspect the detailed decision-making process of the ML rendering engine, use the trackmeprettyjson command on the raw rendering results. This shows per-time-frame density function output, boundary calculations, and auto-correction decisions:
| trackmesplkoutliersgetdata tenant_id="<tenant_id>" component="<component>" object="<entity_name>"
| fields _time, _raw
| trackmeprettyjson fields=_raw
The JSON output includes the rendering results for each time bucket, the calculated boundaries (lowerBound / upperBound), whether the value was flagged as an outlier, and whether auto-correction rejected the outlier.
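If a tabular view is preferred over the pretty-printed JSON, the same raw results can be flattened with spath. This is a sketch: the exact JSON field names (lowerBound, upperBound, and the outlier flag) may differ slightly depending on the component and model:

```spl
| trackmesplkoutliersgetdata tenant_id="<tenant_id>" component="<component>" object="<entity_name>"
| spath input=_raw
| table _time, lowerBound, upperBound, isOutlier
```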
Training logs¶
ML training activity is logged in index=_internal under two sourcetypes:
Orchestrator logs (job-level activity, entity selection, timing):
index=_internal sourcetype=trackme:custom_commands:trackmesplkoutlierstrainhelper
Per-entity training logs (model training details, period exclusions, errors):
index=_internal sourcetype=trackme:custom_commands:trackmesplkoutlierstrain
Monitoring logs¶
ML monitoring (rendering) activity is logged in index=_internal under two sourcetypes:
Orchestrator logs (job-level activity, entity selection, timing):
index=_internal sourcetype=trackme:custom_commands:trackmesplkoutlierstrackerhelper
Per-entity rendering logs (outlier detection results, score events, errors):
index=_internal sourcetype=trackme:custom_commands:trackmesplkoutliersrender
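To quickly surface errors across both the training and monitoring pipelines, a wildcard on the sourcetype can be combined with a free-text filter (a simple sketch; adjust the filter to match your log format):

```spl
index=_internal sourcetype=trackme:custom_commands:trackmesplkoutliers* ERROR
| stats count by sourcetype
| sort - count
```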