Elastic sources for feeds tracking

Introduction to Elastic sources

Elastic sources feature versus Hybrid Trackers in TrackMe V2

  • In TrackMe V2 and in a large number of use cases, the concept of Elastic Sources is superseded by Hybrid Trackers

  • Indeed, an Elastic Source creates and maintains a single TrackMe entity, where Hybrid Trackers can manage from a few to tens of thousands of entities

  • However, there are also use cases where Elastic Sources are required and can be very valuable to address specific needs, such as tracking lookups and more

What does the Elastic sources feature do?

  • The Elastic sources feature provides a built-in workflow to create virtual data sources based on any constraints and any flavour of the Splunk search language

  • In TrackMe V2, the target can be the local Search Head tier, as well as any remote Splunk deployment or instance using TrackMe’s built-in remote search capabilities

  • Elastic Sources allow extending TrackMe’s feeds tracking capability in a rich and flexible way

  • Elastic Sources can be based on tstats, raw, from (datamodel and lookup) and mstats searches, as illustrated in the sketches below
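
For illustration, here is a minimal sketch of each supported search flavour (all index, sourcetype, datamodel and lookup names are hypothetical):

  • tstats: | tstats count where index=webserver sourcetype="nginx:access" by host
  • raw: index=webserver sourcetype="nginx:access"
  • from (datamodel): | from datamodel:"Web.Web"
  • from (lookup): | from lookup:my_lookup
  • mstats: | mstats avg(cpu.usage) where index=metrics by host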

Some examples of use cases for Elastic Sources:

  • Creating a Virtual entity which combines different indexes, sourcetypes, sources and so forth, with additional search filters using indexed or search time extracted fields

  • Creating a Virtual entity based on Splunk search language not supported by Hybrid Trackers (for example from, which allows calling lookups or datamodels)

  • Creating a Virtual entity based on a raw search, as opposed to the tstats searches typically used by Hybrid Trackers for performance purposes, where the use case requires fields that are only extracted at search time

  • Tracking specific Splunk objects such as lookup content over time (CSV based lookup or KVstore), to detect whether the lookup is still being updated and also apply Machine Learning based outlier detection on the number of records

Shared Elastic versus Dedicated Elastic:

  • An Elastic Source can be added to the Shared tracker, or created as an independent Dedicated Tracker

  • Shared Elastic Sources are orchestrated by the Elastic Sources shared tracker

  • Dedicated Elastic Sources are orchestrated by their own scheduled report

  • The Shared Elastic Sources tracker processes searches in a multithreaded, parallel way (the max number of concurrent searches is configurable at the TrackMe system level and can be overridden on a per tenant basis)

  • When the Shared Elastic Sources tracker reaches its maximal run time (which is automatically calculated depending on its cron schedule and time margin configuration), it stops its execution to avoid generating skipped searches

  • On the next iteration, the Shared Elastic Sources tracker resumes its execution, starting with the entities that have not been inspected yet or have the oldest inspection date

  • On the other hand, Dedicated Elastic trackers are fully independent scheduled reports; their life cycle remains orchestrated by TrackMe but they run independently from each other

Accessing the Elastic source creation UI

First, let’s see how to access the Elastic sources interface: from the data sources tab in the main UI, click on the Elastic Sources button:

img_v2/admin_elastic/screen01.png

The following screen appears:

img_v2/admin_elastic/screen02.png

Elastic source example 1: raw search with search time only extracted fields

Hint

tstats versus raw

  • These steps are in fact the same whether you use a raw search or a tstats based search

  • In the first case and in the example below, we use a raw search to benefit from search time extracted fields

  • A tstats based search is highly recommended whenever possible: performance is vastly better than with raw searches, and computing costs are much lower

In this example, requirements are:

  • To monitor the Web data coming from a specific Website (trackme-solutions.com)

  • Fields that can be used are only available at search time; creating index time fields for this use case would be challenging and possibly overkill for this edge use case (assuming we use tstats Hybrid Trackers globally)

  • The volume of data is relatively low, allowing efficient searches to be performed

The following search constraint provides access to the data we need:

(index=webserver site="trackme-solutions.com")
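
Before creating the entity, you can quickly verify that the constraint actually returns events, for instance with a simple sanity check over a short time range:

(index=webserver site="trackme-solutions.com") earliest=-1h | stats count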

We will create a new Shared Elastic source in just a couple of easy steps:

Step 1: define the name of the entity

  • This defines the name of the entity, stored as the value for object and alias in TrackMe’s collections (you probably want something meaningful)

  • Also define the target: local means that we are looking at data searchable from the local Search Head tier, remote would target a remote account previously configured

img_v2/admin_elastic/example1_01.png

Step 2: choose the language

  • You can choose between different options; in our case we will use raw because we want to use search time extracted fields

img_v2/admin_elastic/example1_02.png

Then we define the search constraint:

  • The search constraint should target valid filters, allowing the search to be as efficient as possible and targeting the data we want to monitor

img_v2/admin_elastic/example1_03.png

Hint

Defining the search constraint

  • The search constraint is meant to be the list of filters providing access to the data you want to monitor

  • This can be a simple constraint, or a complex one using OR and AND operators, different levels of parentheses, etc (see the examples below)

  • However, it is not meant to be the place where you would perform complex regular expression extractions, for instance; these should happen at the props and transforms level instead
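
For example, a simple constraint versus a more complex one (all values here are hypothetical):

(index=webserver site="trackme-solutions.com")

(index=webserver OR index=webserver_dmz) sourcetype="nginx:access" (site="trackme-solutions.com" OR site="docs.trackme-solutions.com")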

In Step 3, you should define proper values for the index and sourcetype:

  • These are virtual values; they do not impact the search itself but are used to classify the entity in the UI

  • You can choose anything you want, but it is recommended to use values that are meaningful to you

img_v2/admin_elastic/example1_04.png

Then in Step 4, define the earliest and latest time ranges:

  • These are relevant for both Shared and Dedicated Sources

  • The values influence how well the search will perform, especially with raw searches

  • In this example, we reduce the earliest time to -1h; the latest time should be in the future to be able to detect events with timestamps abnormally set in the future

img_v2/admin_elastic/example1_05.png

Finally, after the simulation, we validate the creation as Shared or Dedicated:

img_v2/admin_elastic/example1_06.png img_v2/admin_elastic/example1_07.png img_v2/admin_elastic/example1_08.png img_v2/admin_elastic/example1_10.png

Elastic source example 2: tracking lookups update and number of records

In this example, we will demonstrate how to monitor the content of a Splunk lookup, KVstore or CSV based, and get alerted if the lookup is not updated as expected.

Lookups are very useful in Splunk; we use them all the time as intermediate storage, for enrichment purposes, etc.

However, and this is especially true for security use cases, being able to detect that a lookup used by other use cases has stopped being updated or maintained is a critical requirement.

In our example, we have a KVstore based lookup called cybops_net_traffic_monitor which is updated on a regular basis and contains some associations and pre-calculations for network devices:

img_v2/admin_elastic/example2_01.png

We will use the following Elastic Source definition to monitor our lookup, and we will leverage the exec_time field to define the _time value:

| from lookup:cybops_net_traffic_monitor | eval _time=exec_time
img_v2/admin_elastic/example2_02.png
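
For this pattern to work, the search maintaining the lookup needs to stamp the exec_time field at update time. Here is a minimal sketch of such a populating search, assuming a hypothetical netflow source and field set:

index=network sourcetype=netflow
| stats count as traffic_count by src_ip dest_ip
| eval exec_time=now()
| outputlookup cybops_net_traffic_monitor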

Administering Shared Elastic Sources

Shared Elastic sources are orchestrated by the Shared Elastic tracker; this tracker is automatically created along with the Virtual Tenant (if the splk-dsm component is enabled).

The tracker is called:

  • trackme_dsm_shared_elastic_tracker_tenant_<tenant_id>

The tracker is a scheduled wrapper which calls a Python engine:

| trackmeelasticexecutor tenant_id="<tenant_id>" component="splk-dsm"

You can find its execution logs using the following search:

index=_internal sourcetype=trackme:custom_commands:trackmeelasticexecutor
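
Since the tenant_id appears as a key="value" pair in every log event, you can also narrow the logs down to a given tenant (relying on Splunk’s automatic field extraction):

index=_internal sourcetype=trackme:custom_commands:trackmeelasticexecutor tenant_id="<tenant_id>"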

The Shared Elastic tracker loads the entities to be processed from a KVstore; the following search shows its content (update the tenant_id):

| inputlookup trackme_dsm_elastic_shared_tenant_<tenant_id>
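
For instance, to check how many Shared Elastic entities the tracker has to process on each run:

| inputlookup trackme_dsm_elastic_shared_tenant_<tenant_id> | stats count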

Its execution is influenced by the following configuration item(s), available in Configuration / splk-general:

Shared Elastic tracker configuration:

| Option | Purpose | Default |
|--------|---------|---------|
| Concurrent searches Elastic | System level number of parallel concurrent searches for Shared Elastic sources; this can be overridden on a per tenant basis using max_concurrent_searches on the Shared Elastic tracker | 3 |

You can review the performance run time of the Shared Elastic Tracker using the deep dive dashboard:

img_v2/admin_elastic/shared_deepdive.png

When the job starts, the following message is shown:

2023-11-09 22:21:11,840 INFO trackmeelasticexecutor.py generate 426 tenant_id="09-demo-elastic", component="splk-dsm", report="trackme_dsm_shared_elastic_tracker_tenant_09-demo-elastic", Elastic Sources shared job started, max_concurrent_searches=3, margin_sec=60

This message indicates the current number of max concurrent searches, as well as the time margin in seconds to calculate the max run time of the full job.

The max concurrent searches value is first defined at the global level as explained above, but it can also be overridden on a per tenant basis by adding the argument to the search definition:

**Syntax:** max_concurrent_searches=<integer>
**Description:** The max number of searches to be executed in parallel; if set to a value different from the system default, this value wins.

The margin in seconds defines the time to be deducted from the cron schedule translated into seconds: for instance, if the job is scheduled to run every 5 minutes, the cron translation is 300 seconds and the job will interrupt at 300-<margin_sec>. This value can be overridden on a per tenant basis by adding the argument:

**Syntax:** margin_sec=<integer>
**Description:** The time in seconds used as a margin when calculating the max runtime depending on the cron schedule.
If the search is triggered every 5 minutes, the max runtime will be 5 minutes less the margin_sec value.
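
For instance, the Shared Elastic tracker search definition could be updated as follows to override both values for the tenant (the values shown are arbitrary):

| trackmeelasticexecutor tenant_id="<tenant_id>" component="splk-dsm" max_concurrent_searches=5 margin_sec=120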

When the job inspects an entity to be processed, it shows the following message (example):

2023-11-09 22:32:28,903 INFO trackmeelasticexecutor.py process_elastic_object 176 tenant_id="09-demo-elastic", component="splk-dsm", report="trackme_dsm_shared_elastic_tracker_tenant_09-demo-elastic", processing elastic object, object="elastic:local:lookup:cybops_net_traffic_monitor", last inspection with tracker_runtime="2023-11-09 22:26:57", elastic_report_root_search="| from lookup:cybops_net_traffic_monitor | eval _time=exec_time
| eventstats max(_time) as indextime | eval _indextime=if(isnum(_indextime), _indextime, indextime) | fields - indextime
| eval host=if(isnull(host), "none", host)
| stats max(_indextime) as data_last_ingest, min(_time) as data_first_time_seen, max(_time) as data_last_time_seen, count as data_eventcount, dc(host) as dcount_host
| eval latest_eventcount_5m=data_eventcount
| eval object="elastic:local:lookup:cybops_net_traffic_monitor", data_index="lookups", data_sourcetype="cybops_net_traffic_monitor" | `trackme_elastic_dedicated_tracker("09-demo-elastic")`
| eval tenant_id="09-demo-elastic"
| stats count as report_entities_count by tenant_id
| `register_tenant_component_summary(09-demo-elastic, dsm)`"

Once the search has been executed, TrackMe shows its summary result and run time information:

2023-11-09 22:32:57,742 INFO trackmeelasticexecutor.py process_elastic_object 245 tenant_id="09-demo-elastic", component="splk-dsm", object="elastic:local:lookup:cybops_net_traffic_monitor", report="trackme_dsm_shared_elastic_tracker_tenant_09-demo-elastic", Entity search successfully executed, status="success", run_time="28.839"

If the system reaches the max number of concurrent searches (Splunk wise), TrackMe automatically re-attempts a certain number of times, which is visible via a warning message:

2023-11-09 22:46:54,302 WARNING trackmeelasticexecutor.py process_elastic_object 265 tenant_id="09-demo-elastic", component="splk-dsm", report="trackme_dsm_shared_elastic_tracker_tenant_09-demo-elastic", temporary search failure, current_failures_count=4, max_failures_count=24, exception="HTTP 503 Service Unavailable -- b'{"messages":[{"type":"WARN","text":"Search not executed: The maximum number of concurrent historical searches on this instance has been reached., concurrency_category=\\"historical\\", concurrency_context=\\"instance-wide\\", current_concurrency=30, concurrency_limit=30","help":""}]}'"
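
To review how often a tenant hits this condition, you can search for these warnings over time:

index=_internal sourcetype=trackme:custom_commands:trackmeelasticexecutor "temporary search failure" | timechart count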

Once all the entities have been processed effectively, the job shows the following final message, which includes the total run time (this run time is shown in the deep dive dashboard):

2023-11-09 22:53:18,138 INFO trackmeelasticexecutor.py generate 535 tenant_id="09-demo-elastic", component="splk-dsm", report="trackme_dsm_shared_elastic_tracker_tenant_09-demo-elastic", Elastic Sources shared job successfully executed, status="success", run_time="82.069", entities_count="6"
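
As run_time is logged as a key="value" pair, you can also chart the job run time over time, relying on Splunk’s automatic field extraction:

index=_internal sourcetype=trackme:custom_commands:trackmeelasticexecutor "Elastic Sources shared job successfully executed" | timechart avg(run_time) as avg_runtime_sec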

Administering Dedicated Elastic Sources

Dedicated Elastic trackers are independent scheduled reports, their life cycle is orchestrated by TrackMe but they run independently from each other.

img_v2/admin_elastic/dedicated_screen01.png

Each tracker is composed of two reports: the tracker, which is scheduled, and the wrapper, which is called by the tracker:

img_v2/admin_elastic/dedicated_screen02.png

The tracker calls the trackmetrackerexecutor command:

img_v2/admin_elastic/dedicated_screen03.png

Execution logs can be found with the following search:

index=_internal sourcetype=trackme:custom_commands:trackmetrackerexecutor

Similarly, use the deep dive performance dashboard to review the tracker performance over time:

img_v2/admin_elastic/dedicated_deepdive.png

TrackMe stores the main metadata of Dedicated Elastic trackers in the following KVstore:

| inputlookup trackme_dsm_elastic_dedicated_tenant_<tenant_id>