Tracking Splunk Cloud SVC consumption in TrackMe

Monitoring the Splunk Cloud SVC consumption using TrackMe

  • This white paper provides a detailed guide to monitoring Splunk Cloud SVC consumption with TrackMe.

  • SVC consumption is a key metric to monitor to ensure that the Splunk Cloud service is used efficiently and effectively.

Introduction to Splunk Cloud SVC

Whether you are a Splunk Cloud customer paying by volume or by the capacity allocated to your environment (stack), SVC (Splunk Virtual Compute) consumption is the key indicator in Splunk Cloud for understanding the costs associated with anything that consumes compute in Splunk.

Essentially:

  • Indexing related activities

  • User ad-hoc searches

  • User and application scheduled searches

  • Data model acceleration (DMA)

In TrackMe, you can leverage this key performance indicator to monitor the consumption of your Splunk Cloud environment, and detect abnormal trends.

To do so, we can leverage two components in TrackMe:

  • TrackMe Flex Objects trackers, which can monitor SVC consumption based on different patterns and, notably, leverage Machine Learning to detect abnormal trends.

  • TrackMe Workload, which can monitor SVC consumption at the level of individual scheduled searches.

Splunk generates SVC consumption metrics in the summary index:

Global SVC consumption:

index=summary source=splunk-svc

Detailed SVC consumption per consumer:

index=summary source=splunk-svc-consumer

TrackMe reads these indicators, stores them as metrics in the TrackMe metric indexes, and can then apply either basic logic or more advanced logic based on Machine Learning outlier detection.

TrackMe Flex trackers for Splunk Cloud SVC consumption

We start by creating a dedicated Virtual Tenant for tracking Splunk Cloud SVC consumption. We can use the UI to create a new Flex Object enabled tenant:

ui_tenant001.png ui_tenant002.png

Or we can use the trackme custom command:

| trackme url="/services/trackme/v2/vtenants/admin/add_tenant" mode="post" body="{'tenant_alias':'splunk-cloud-svc','tenant_desc':'Splunk Cloud SVC monitoring','tenant_name':'splunk-cloud-svc','tenant_roles_admin':['trackme_admin'],'tenant_roles_power':['trackme_power'],'tenant_roles_user':['trackme_user'],'tenant_owner':'nobody','tenant_idx_settings':'{\'trackme_summary_idx\': \'trackme_summary\', \'trackme_audit_idx\': \'trackme_audit\', \'trackme_notable_idx\': \'trackme_notable\', \'trackme_metric_idx\': \'trackme_metrics\'}','tenant_outliers_set_state':'1','tenant_mloutliers':'1','tenant_cmdb_lookup':'1','tenant_default_priority':'medium','tenant_flx_enabled':1}"
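Because the body above is long and nests one quoted structure inside another, it can be easier to generate it than to type it. The following is a minimal Python sketch that rebuilds the same single-quoted notation. The helper names `render` and `build_tenant_body` are hypothetical, and the sketch assumes the command accepts exactly the notation shown in the example above; it does not call TrackMe itself.

```python
import ast


def render(value):
    """Render dicts, lists, strings and numbers in single-quoted notation."""
    if isinstance(value, dict):
        items = ", ".join(f"{render(k)}: {render(v)}" for k, v in value.items())
        return "{" + items + "}"
    if isinstance(value, list):
        return "[" + ", ".join(render(v) for v in value) + "]"
    if isinstance(value, str):
        # Escape backslashes first, then single quotes, so nesting stays valid
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return str(value)  # plain numbers such as tenant_flx_enabled


def build_tenant_body(name, description):
    """Build the body string for the add_tenant call (hypothetical helper)."""
    idx_settings = {
        "trackme_summary_idx": "trackme_summary",
        "trackme_audit_idx": "trackme_audit",
        "trackme_notable_idx": "trackme_notable",
        "trackme_metric_idx": "trackme_metrics",
    }
    payload = {
        "tenant_alias": name,
        "tenant_desc": description,
        "tenant_name": name,
        "tenant_roles_admin": ["trackme_admin"],
        "tenant_roles_power": ["trackme_power"],
        "tenant_roles_user": ["trackme_user"],
        "tenant_owner": "nobody",
        # The index settings are passed as a string, hence the escaped quotes
        "tenant_idx_settings": render(idx_settings),
        "tenant_outliers_set_state": "1",
        "tenant_mloutliers": "1",
        "tenant_cmdb_lookup": "1",
        "tenant_default_priority": "medium",
        "tenant_flx_enabled": 1,
    }
    return render(payload)


body = build_tenant_body("splunk-cloud-svc", "Splunk Cloud SVC monitoring")
# Round-trip check: the generated text parses back to the same structure
parsed = ast.literal_eval(body)
print(body)
```

The generated string contains no double quotes as long as the inputs contain none, so it can be placed between the double quotes of the body argument.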

Once the Virtual Tenant is created, we can use the TrackMe Flex Object library and rely on its two out-of-the-box Flex Object use cases to monitor SVC consumption:

Hint

TrackMe version 2.0.99

  • These two use cases were last updated in TrackMe 2.0.99; make sure you are using this version or later.

TrackMe stack global SVC consumption

Once in the TrackMe UI, create a new Flex Object tracker and select the following use case template:

flex_global_svc001.png

Scroll down and execute the tracker. You can also customise it if you wish, although this isn’t mandatory and the use case works out of the box:

flex_global_svc002.png flex_global_svc003.png flex_global_svc004.png flex_global_svc005.png

Once executed, you will see one TrackMe entity, which corresponds to the Splunk Cloud stack name:

flex_global_svc006.png

If you open the entity, you will see two SVC KPIs:

  • splunk_cloud.svc.pct_svc_used: the percentage of SVC used in the stack

  • splunk_cloud.svc.utilized_svc: the amount of SVC used in the stack

Example:

flex_global_svc007.png

After some time, this will look like:

flex_global_svc008.png

Looking at the outlier anomaly detection, the use case has automatically created a model for SVC consumption based on the SVC units. It will trigger an outlier alert if an abnormal increasing trend is detected:

flex_global_svc009.png flex_global_svc010.png flex_global_svc011.png
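To give an intuition for what an upper-bound outlier check on SVC consumption does, here is a minimal Python sketch. It is not TrackMe's Machine Learning implementation, which is not detailed here; it swaps in a simple rule based on the mean and standard deviation of a historical window. The function name, the 3-sigma threshold and the sample values are illustrative assumptions.

```python
from statistics import mean, stdev


def is_upper_outlier(history, latest, sigmas=3.0):
    """Return True when `latest` exceeds mean(history) + sigmas * stdev(history)."""
    if len(history) < 2:
        return False  # not enough history to estimate a spread
    upper_bound = mean(history) + sigmas * stdev(history)
    return latest > upper_bound


# Hypothetical hourly SVC values for one entity
history = [41.0, 43.5, 40.2, 42.8, 44.1, 41.7, 43.0, 42.3]

print(is_upper_outlier(history, 43.9))  # close to the usual range
print(is_upper_outlier(history, 75.0))  # far above the usual range
```

Only the upper bound is checked, since the concern in this context is an abnormal increase in consumption rather than a drop.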

TrackMe stack app consumer SVC consumption

Similarly, we can now leverage the second out of the box Flex Object use case to monitor the SVC consumption at the app level:

flx_app001.png

Once created and executed, this use case tracks SVC consumption per app consumer, as recorded in the SVC consumption KPIs:

flx_app002.png

The same SVC KPI is leveraged per consumer:

flx_app003.png

After some time, once enough historical data has been collected:

flx_app004.png

Outlier anomaly detection:

flx_app005.png flx_app006.png

From this stage, if a consumer starts to abnormally consume SVC, TrackMe’s Outlier detection will eventually trigger an alert.

Accessing TrackMe’s metrics and building your own dashboards and reports

You can easily access the metrics collected for a given TrackMe Virtual Tenant from the Virtual Tenant home UI:

ui_tenant_metrics001.png ui_tenant_metrics002.png

Query example (replace the tenant_id value with the identifier of your own tenant):

| mstats avg(trackme.splk.flx.splunk_cloud.svc.utilized_svc) as avg_svc,
max(trackme.splk.flx.splunk_cloud.svc.utilized_svc) as max_svc,
sum(trackme.splk.flx.splunk_cloud.svc.utilized_svc) as sum_svc,
perc95(trackme.splk.flx.splunk_cloud.svc.utilized_svc) as perc95_svc
where index=trackme_metrics tenant_id=svc-tracking object_category="splk-flx" object=* by object
| foreach *_svc [ eval <<FIELD>> = round('<<FIELD>>', 2) ]

ui_tenant_metrics003.png

When building a dashboard based on the above, the following search highlights the biggest SVC consumers:

| mstats avg(trackme.splk.flx.splunk_cloud.svc.utilized_svc) as avg_svc,
max(trackme.splk.flx.splunk_cloud.svc.utilized_svc) as max_svc,
sum(trackme.splk.flx.splunk_cloud.svc.utilized_svc) as sum_svc,
perc95(trackme.splk.flx.splunk_cloud.svc.utilized_svc) as perc95_svc
where index=trackme_metrics tenant_id=svc-tracking object_category="splk-flx" object=Splunk-Cloud:02-* by object
| foreach *_svc [ eval <<FIELD>> = round('<<FIELD>>', 2) ]
| fields object avg_svc

ui_tenant_metrics004.png
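For readers less familiar with mstats, the aggregation performed by the searches above can be approximated in plain Python. This is only a sketch under stated assumptions: the data points and object names are made up, the helper name `summarize_by_object` is hypothetical, and the percentile uses the nearest-rank method, so values may differ slightly from what Splunk's perc95 returns.

```python
import math
from collections import defaultdict


def summarize_by_object(points):
    """Group (object, value) pairs and compute avg, max, sum and 95th percentile."""
    grouped = defaultdict(list)
    for obj, value in points:
        grouped[obj].append(value)

    summary = {}
    for obj, values in grouped.items():
        ordered = sorted(values)
        # Nearest-rank percentile; Splunk may use a different method
        rank = max(1, math.ceil(0.95 * len(ordered)))
        summary[obj] = {
            "avg_svc": round(sum(ordered) / len(ordered), 2),
            "max_svc": round(ordered[-1], 2),
            "sum_svc": round(sum(ordered), 2),
            "perc95_svc": round(ordered[rank - 1], 2),
        }
    return summary


# Hypothetical data points: (object, utilized_svc)
points = [
    ("app-a", 1.5),
    ("app-a", 2.5),
    ("app-b", 10.0),
    ("app-a", 4.0),
    ("app-b", 12.0),
]

summary = summarize_by_object(points)
# Rank consumers by average SVC, largest first
ranking = sorted(summary, key=lambda obj: summary[obj]["avg_svc"], reverse=True)
print(ranking)
```

Rounding to two decimals mirrors the foreach step of the searches, and the final sort corresponds to what a "top consumers" dashboard panel would display.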

TrackMe Workload component and Splunk Cloud SVC consumption

Licensed TrackMe customers can leverage the Workload component, which can also track SVC consumption per Splunk scheduled search.

When you create the Workload Virtual Tenant, make sure to select “Splunk Cloud” as the type of Splunk environment:

ui_tenant_workload001.png

Once configured, the Workload component tracks various aspects of the health of Splunk scheduled searches. Among other KPIs, it also tracks SVC usage:

ui_tenant_workload002.png ui_tenant_workload003.png

We can, for instance, add an Outlier model against SVC consumption:

ui_tenant_workload004.png

Note that by default the Workload component enables ML against the elapsed run time KPI. You can use SVC consumption instead, either while creating the tenant or at a later stage.

ui_tenant_workload005.png

Accessing metrics from the Workload component

The following search query can be used to access the SVC consumption metrics collected by the Workload component. Replace the tenant_id value with the name of the tenant you have created:

| mstats avg(trackme.splk.wlk.svc_usage) as avg_svc,
max(trackme.splk.wlk.svc_usage) as max_svc,
sum(trackme.splk.wlk.svc_usage) as sum_svc,
perc95(trackme.splk.wlk.svc_usage) as perc95_svc
where index=trackme_metrics tenant_id=workload object_category="splk-wlk" object=* by object
| foreach *_svc [ eval <<FIELD>> = round('<<FIELD>>', 2) ]

Based on the previous example, you can build dashboards on top of these metrics:

ui_tenant_workload006.png

TrackMe out of the box SVC consumption dashboard

Although not directly related to the SVC consumption tracked through the components described above, TrackMe also ships with an out-of-the-box dashboard for auditing the SVC usage of TrackMe itself.

You can find this dashboard in the menu “Audit & Troubleshoot” / “TrackMe SVC usage stack”:

Note: in this screenshot the stack SVC entitlement shows as N/A because the example uses a dev Cloud stack, which, unlike a regular Cloud stack, does not come with a standard limit.

ui_tenant_audit001.png
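If you reuse SVC figures in your own reports, a missing entitlement such as the N/A case noted above has to be handled explicitly. The following minimal Python sketch shows one way to do so. It assumes that a usage percentage is computed as the SVC used divided by the entitlement; this is an assumption for illustration, not a confirmed description of how TrackMe or Splunk derive their values, and the function name mirrors the KPI name only for readability.

```python
def pct_svc_used(utilized_svc, entitlement_svc):
    """Return the percentage of the entitlement used, or None when it is unknown."""
    if entitlement_svc is None or entitlement_svc <= 0:
        return None  # e.g. a dev stack reporting N/A for the entitlement
    return round(utilized_svc / entitlement_svc * 100, 2)


# Hypothetical values: 18 SVC used out of an entitlement of 60
print(pct_svc_used(18.0, 60.0))
# No entitlement available, so no percentage can be derived
print(pct_svc_used(18.0, None))
```

Returning None rather than zero keeps "no entitlement known" distinct from "nothing consumed" in downstream reports.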