Splunk SOAR Cloud & on-premise monitoring and active actions in TrackMe

Hint

Version 2.0.46 and later

  • The SOAR integration requires TrackMe version 2.0.46 or later

  • SOAR High Availability management for Automation Brokers requires TrackMe version 2.0.50 or later

Caution

Splunk Remote Account: You must use the trackme namespace

  • If you use a remote account to run the SOAR active monitoring (such as managing Automation Brokers active/passive), you must use the TrackMe namespace when configuring the Splunk remote account.

  • This is required because TrackMe is shared at the application level by default: unless you share TrackMe at the system level, your remote account would not be able to access TrackMe custom commands on the remote partner.

remote_account_warning.png

1. Introduction to SOAR monitoring

TrackMe provides built-in use cases to actively and efficiently monitor one or more Splunk SOAR environments at scale.

SOAR Cloud & on-premise monitoring is performed via the TrackMe Flex Object component (splk-flx), which is a restricted component not available in the free Community Edition of TrackMe.

The monitoring relies on TrackMe’s SOAR integration and the Splunk Application for SOAR integration, combined with TrackMe’s Flex Object concepts, and provides the following use cases:

Each use case is listed by its reference (uc_ref), followed by its description (uc_description) and implementation comments (uc_implementation_comments).

splk_soar_actions_apprun_failures

  • Description: Monitor for app run action failures in SOAR

  • Implementation comments: This use case relies on SOAR events indexed in Splunk through the Splunk Application for SOAR integration. It monitors for app run action failures and triggers if any failure is found; the alert clears itself once no failures can be found in the search time range. You may want to adapt the search time range to keep the alert active for a certain amount of time after a detection.

splk_soar_actions_playbooks_failures

  • Description: Monitor for playbook action failures in SOAR

  • Implementation comments: This use case relies on SOAR events indexed in Splunk through the Splunk Application for SOAR integration. It monitors for playbook action failures and triggers if any failure is found; the alert clears itself once no failures can be found in the search time range. You may want to adapt the search time range to keep the alert active for a certain amount of time after a detection.

splk_soar_assets_health

  • Description: Run an active asset health check, then parse and render the results

  • Implementation comments: This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. If active_check is set to True, it performs an active asset health check (connectivity test): it discovers the assets, runs a POST call to request the connectivity test, then parses and renders the results for each asset. If active_check is set to False, it only parses the assets, checks the latest connectivity test performed by SOAR and renders the results. Because the active check sends a test request to the SOAR REST API, the SOAR automation user used for the SOAR server account configuration must have the edit asset permission. If you cannot or do not want to grant this permission, you can disable the active check and rely on SOAR running the test automatically once per day, in which case detecting a failure can take up to 24 hours; enabling the active check is recommended. The SOAR server can be specified with the option soar_server=<account>; if unspecified or set to *, the first account is used. You can restrict the assets taken into account with the assets_allow_list option, and/or exclude some assets with the assets_block_list option.

splk_soar_infra_load

  • Description: Monitor SOAR CPU load from the SOAR API health endpoint

  • Implementation comments: This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. It retrieves the 1, 5 and 15 minute CPU load by performing a GET call to the health endpoint. The SOAR server can be specified with the option soar_server=<account>; if unspecified or set to *, the first account is used.

splk_soar_infra_memory

  • Description: Monitor SOAR memory usage from the SOAR API health endpoint

  • Implementation comments: This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. It retrieves and calculates the percentage of memory used by the SOAR instance by performing a GET call to the health endpoint. The SOAR server can be specified with the option soar_server=<account>; if unspecified or set to *, the first account is used.

splk_soar_services_health

  • Description: Get the status of SOAR services from the SOAR API health endpoint

  • Implementation comments: This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. It retrieves the status of SOAR services by performing a GET call to the health endpoint. The SOAR server can be specified with the option soar_server=<account>; if unspecified or set to *, the first account is used.

splk_soar_automation_brokers_monitor

  • Description: Monitor the status of SOAR Automation Brokers from the SOAR API

  • Implementation comments: This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. It retrieves the status of SOAR Automation Brokers by performing a GET call to the automation_proxy endpoint. The SOAR server can be specified with the option soar_server=<account>; if unspecified or set to *, the first account is used.

splk_soar_automation_brokers_manage

  • Description: Monitor the status of SOAR Automation Brokers and update SOAR Assets

  • Implementation comments: This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. It monitors the status of Automation Brokers and updates SOAR Assets as needed when Automation Brokers are detected as inactive, enabling automated High Availability for SOAR Automation Brokers via TrackMe’s integration.

These use cases are provided via the Flex Object use cases library, but note that you can also manually implement new use cases, or customise the built-in use cases as needed.

You can review the use case details ahead of their creation in TrackMe with the following command:

| trackmesplkflxgetuc | search uc_vendor=Splunk uc_category=splunk_soar
screen0.png

2. Requirements for SOAR monitoring

SOAR Cloud & SOAR on-premise

TrackMe’s SOAR integration is compatible with both SOAR Cloud and SOAR on-premise deployments.

Splunk Application for SOAR integration

TrackMe’s SOAR integration relies on the SOAR connectivity provided by the Splunk App for SOAR:

Hint

Splunk App for SOAR

  • TrackMe performs active and bi-directional interactions with the SOAR API

  • TrackMe relies on the Splunk App for SOAR integration and connectivity definition to do so

  • You do not need any additional configuration in TrackMe as long as the Splunk App for SOAR is installed and configured

  • Some of the SOAR use cases rely on the SOAR events indexed in Splunk (index=phantom_*), which are also part of the Splunk App for SOAR integration

  • TrackMe leverages the Splunk App for SOAR and extends its capabilities even further

TrackMe tenant with the Flex Object component for SOAR

You need a TrackMe tenant with the Flex Object component enabled. You can create a dedicated tenant for the monitoring of SOAR, or use any existing tenant of your choice.

Once the Flex trackers have been created, TrackMe automatically groups the resulting entities for SOAR into the following groups:

  • splk_soar_actions_apprun_failures, splk_soar_actions_playbooks_failures: Splunk_SOAR:actions_health

  • splk_soar_assets_health: Splunk_SOAR:asset_health

  • splk_soar_infra_load, splk_soar_infra_memory: Splunk_SOAR:infrastructure

  • splk_soar_services_health: Splunk_SOAR:services_health

  • splk_soar_automation_brokers_monitor: Splunk_SOAR:automation_broker_monitor

  • splk_soar_automation_brokers_manage: Splunk_SOAR:automation_broker_manage

TrackMe deployment target

In TrackMe, you can choose where the SOAR monitoring Flex Object use cases are executed: either “local” or a remote deployment.

Note that some of the use cases require TrackMe to be deployed on the remote target, as well as the Splunk App for SOAR to be configured on the remote target.

3. Implementation of SOAR monitoring

3.1 Integration architecture overview

The following diagram represents the integration from a high-level perspective:

diagram.png

As a brief summary:

  • REST API related use cases in TrackMe imply an active interaction with the SOAR API

  • TrackMe retrieves and loads the SOAR account configuration from the Splunk App for SOAR; you do not need to define anything in TrackMe

  • TrackMe implements its own REST queries against the SOAR API as needed. In some cases, such as the Assets connectivity check, this interaction implies a bi-directional integration with POST and GET calls performed by TrackMe

  • Some other use cases only deal with the SOAR data indexed in Splunk, and do not imply an API interaction

3.2 Creating SOAR Flex Object trackers

Hint

Running actions on a remote Splunk Search Head tier

  • You can leverage TrackMe’s splunkremotesearch to run these actions transparently on a remote Splunk Search Head tier

  • In this case, the Splunk App for SOAR would be installed and configured on the target (not on TrackMe’s Search Head)

  • However, TrackMe needs to be installed on the target Splunk Search Head tier, even if it remains unconfigured and does not actively run anything (if you run in Splunk Cloud Victoria, TrackMe is installed on all Search Heads automatically)

  • You need to create a TrackMe remote account which uses a token related to a Splunk user on the remote Search Head

  • TrackMe follows a least-privilege approach: the user on the remote Search Head tier only needs to be a member of the power role and the trackme_admin role (or have equivalent capabilities)

  • This requires TrackMe version 2.0.48 or later, as some issues were addressed in this version to allow minimal permissions to be used

Once a TrackMe Virtual Tenant with the Flex Object component (splk-flx) has been created, the setup is straightforward and done via the UI:

screen1.png

Then click on Create a new Flex Tracker and select Splunk as the vendor and splunk_soar as the category. The next step is to review and validate the results; that is all:

screen2.png

Example of results with the Assets active health check:

screen3.png

This asset is failing the connectivity check:

screen4.png

This asset passes successfully:

screen5.png

Setup is complete; the number of resulting entities depends on your environment:

screen6.png screen7.png

4. Interacting with the SOAR API with GET and POST calls in pure SPL

The following sections show the root searches and technical details of each of TrackMe’s SOAR use cases.

Refer to the SOAR API reference documentation as needed:

4.1 Running GET calls to SOAR API

trackmesplksoar:

TrackMe comes with a generating custom command called trackmesplksoar. This SPL command interfaces with the Splunk App for SOAR and the SOAR API itself. Usage:

| trackmesplksoar soar_server=<soar_server> action=<action> action_data=<json action data>

The command handles different options:

Syntax:

  • soar_server: the name of the SOAR server as configured in the Splunk App for SOAR

  • action: an action from the following supported list: soar_get|soar_post|soar_test_apps|soar_health_status|soar_health_memory|soar_health_load|soar_automation_broker_manage

  • action_data: a JSON formatted object, either used by specific actions or used to perform a POST query to a SOAR endpoint
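When these searches are generated programmatically (for instance from a script that creates saved searches), quoting action_data is the fiddly part. The following Python sketch assumes that trackmesplksoar accepts strict JSON in action_data with inner double quotes escaped by backslashes, as in the soar_test_apps examples of this page, and that SPL's usual backslash escaping applies inside quoted arguments. The helpers spl_action_data and trackmesplksoar_search are hypothetical and not part of TrackMe.

```python
import json


def spl_action_data(payload: dict) -> str:
    """Serialise a dict to JSON and quote it as an SPL action_data argument.

    Hypothetical helper: assumes trackmesplksoar accepts strict JSON with
    backslash-escaped double quotes, as in the soar_test_apps examples.
    """
    raw = json.dumps(payload)
    # Escape backslashes first, then double quotes, so the SPL parser
    # passes the original JSON text to the command unchanged (assumption).
    escaped = raw.replace("\\", "\\\\").replace('"', '\\"')
    return f'action_data="{escaped}"'


def trackmesplksoar_search(action: str, payload: dict, soar_server: str = "*") -> str:
    """Build a complete trackmesplksoar search string (illustrative only)."""
    return (
        f"| trackmesplksoar soar_server={soar_server} action={action} "
        f"{spl_action_data(payload)}"
    )
```

Calling trackmesplksoar_search("soar_test_apps", {"active_check": "True", "assets_allow_list": "None", "assets_block_list": "None"}) yields the same default active check search as the one shown in section 5. Using json.dumps rather than hand-written strings avoids unbalanced quotes when values change.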

Example: the following command retrieves the health information from the SOAR API; it targets the endpoint rest/health:

| trackmesplksoar soar_server=* action=soar_get action_data="{'endpoint': 'health'}" | spath
screen8.png

This action allows calling any SOAR API endpoint performing a GET call only.
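To illustrate what such a GET call amounts to at the HTTP level, here is a minimal Python sketch using only the standard library. It is not TrackMe’s implementation: it assumes the SOAR REST API is served under https://<soar_host>/rest/<endpoint> and that the automation user’s token is sent in the ph-auth-token header. The host, the token and the build_soar_get name are placeholders; TrackMe itself takes the server URL and credentials from the Splunk App for SOAR.

```python
import urllib.request


def build_soar_get(base_url: str, endpoint: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a SOAR REST endpoint.

    Assumes the layout https://<host>/rest/<endpoint> and token
    authentication through the ph-auth-token header.
    """
    url = f"{base_url.rstrip('/')}/rest/{endpoint.lstrip('/')}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={"ph-auth-token": token, "Accept": "application/json"},
    )


# Placeholder host and token, for illustration only.
req = build_soar_get("https://soar.example.com", "health", "REPLACE_ME")
# To actually send it: urllib.request.urlopen(req, timeout=30)
```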

As another example, to retrieve the configuration of all SOAR Assets, you would run:

| trackmesplksoar soar_server=* action=soar_get action_data="{'endpoint': 'asset'}" | spath

4.2 Pre-built actions and advanced parsing

In some cases, such as the one above, the API endpoint results may not be straightforward to parse and use, for instance when the response contains nested JSON structures.

To ease the integration, the custom command provides built-in actions which parse the JSON results as needed and render them properly. For instance, the following relies on the health endpoint and extracts the status of SOAR services:

| trackmesplksoar soar_server=* action="soar_health_status"
screen9.png
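The general technique behind such pre-built actions is to flatten a nested JSON document into one row per item, with dotted keys for nested fields. The exact structure of the SOAR health response is not documented on this page, so this Python sketch uses a hypothetical shape {"services": {"<name>": {...}}}; the field names and the flatten/services_to_rows helpers are assumptions for illustration, not TrackMe’s parser.

```python
from typing import Any


def flatten(obj: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts and lists into dotted keys, e.g. "a.b{0}"."""
    flat: dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}.{key}" if prefix else str(key)))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{{{index}}}"))
    else:
        flat[prefix] = obj
    return flat


def services_to_rows(health: dict) -> list[dict[str, Any]]:
    """Turn a hypothetical {"services": {name: {...}}} mapping into one row per service."""
    rows = []
    for name, details in health.get("services", {}).items():
        row = {"service": name}
        row.update(flatten(details))
        rows.append(row)
    return rows
```

One row per service maps naturally onto one result per service in Splunk, which is what makes the output easy to alert on.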

4.3 Performing a POST call to a SOAR API endpoint via TrackMe

You can also perform a POST call to a SOAR API endpoint using TrackMe’s integration, a capability the Splunk App for SOAR does not provide.

For instance, to run a POST against an Asset and request an immediate health check, you would first retrieve the list of assets to identify the ID of the Asset:

See:

| trackmesplksoar soar_server=* action=soar_get action_data="{'endpoint': 'asset'}" | spath
get_and_post_001.png
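Identifying the ID from the GET results can also be scripted. This Python sketch assumes the usual SOAR list response layout {"count": ..., "num_pages": ..., "data": [...]}, with each asset carrying "id" and "name" keys; the find_asset_id name and the sample data are illustrative, not real SOAR output.

```python
from typing import Optional


def find_asset_id(asset_response: dict, asset_name: str) -> Optional[int]:
    """Return the id of the asset named asset_name, or None if absent.

    Assumes {"count": ..., "num_pages": ..., "data": [...]} with "id" and
    "name" on each asset. Only the page contained in the response is searched.
    """
    for asset in asset_response.get("data", []):
        if asset.get("name") == asset_name:
            return asset.get("id")
    return None


# Illustrative response, not real SOAR output.
sample = {"count": 2, "num_pages": 1, "data": [
    {"id": 3, "name": "internal_smtp"},
    {"id": 7, "name": "virustotal"},
]}
```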

Then, you would:

| trackmesplksoar soar_server=* action=soar_post action_data="{'endpoint': 'asset/7', 'data': '{\"test\": \"true\"}'}"
get_and_post_002.png

Note: “recieved: true” is the actual response from SOAR; this is not a typo in TrackMe but a typo in the current SOAR API response for this endpoint.
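In the soar_post action_data, the endpoint names the target and data holds the request body as a JSON string. The equivalent raw HTTP call can be sketched in Python as follows, under the same assumptions as a plain REST call (URL layout https://<host>/rest/<endpoint>, ph-auth-token header) plus the assumption that data is sent as a JSON request body. build_soar_post, the host and the token are placeholders; the request is built but not sent.

```python
import json
import urllib.request


def build_soar_post(base_url: str, endpoint: str, token: str,
                    data: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for a SOAR REST endpoint.

    Assumes https://<host>/rest/<endpoint>, ph-auth-token authentication and
    a JSON body mirroring the 'data' member of the soar_post action_data.
    """
    url = f"{base_url.rstrip('/')}/rest/{endpoint.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        method="POST",
        headers={"ph-auth-token": token, "Content-Type": "application/json"},
    )


# Same call as the asset/7 connectivity test above (placeholder host/token).
req = build_soar_post("https://soar.example.com", "asset/7", "REPLACE_ME", {"test": "true"})
```

The SCM refresh below follows the same pattern with endpoint scm/5 and data {"pull": "true", "force": "true"}.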

As another example, let’s request a sync refresh of a Git repository via the SCM endpoint:

See:

First, we run a GET call to get the ID of the SCM Git repository configuration:

| trackmesplksoar soar_server=* action=soar_get action_data="{'endpoint': 'scm'}"
post_example01.png

We then run a POST call to request the SCM update; in this case, a few changes had actually been merged to the branch:

| trackmesplksoar soar_server=* action=soar_post action_data="{'endpoint': 'scm/5', 'data': '{\"pull\": \"true\", \"force\": \"true\"}'}"
post_example02.png

If no operations had been pending, the following message would have been returned:

post_example03.png

4.4 TrackMe’s REST API splk-soar endpoints

Underneath, the trackmesplksoar command interacts with the TrackMe API endpoints for SOAR. You can find the endpoint references and their documentation in the API reference dashboard (menu API & Tooling):

screen10.png

5. SOAR Application status active health check

A key monitoring capability for SOAR is to be able to detect as soon as possible when an application and its corresponding asset is experiencing an issue.

An asset that fails the status health check can no longer perform actions, whether ad-hoc actions requested by analysts or automated actions in the context of SOAR Playbooks.

Failures can happen for a variety of reasons, such as service account credential expiration, firewall rule changes or connectivity loss, API changes, SOAR Application upgrades, and many more.

This is a highly critical aspect of SOAR monitoring, especially because Playbooks only execute an action when needed, so an application integration failure could remain undetected until it is too late.

SOAR has a built-in and mandatory connectivity test per application and per asset. However, SOAR does not perform this health check continuously, so monitoring only the /rest/app_status endpoint would mean monitoring potentially outdated information.
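To make the staleness problem concrete: a purely passive check would at least need to flag results whose last connectivity test is older than SOAR’s daily cycle. This Python sketch is an illustration, not TrackMe’s logic; it assumes the last test time is available as an ISO 8601 string with a timezone offset (the actual field name in the SOAR response is not covered here), and is_stale is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(last_tested: str, now: Optional[datetime] = None,
             max_age: timedelta = timedelta(hours=24)) -> bool:
    """Return True if an ISO 8601 test timestamp is older than max_age.

    last_tested is assumed to look like "2024-05-01T08:00:00+00:00";
    naive timestamps are treated as UTC.
    """
    tested_at = datetime.fromisoformat(last_tested)
    if tested_at.tzinfo is None:
        tested_at = tested_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now - tested_at > max_age
```

Even with such a guard, a failure that occurs just after SOAR’s daily test stays invisible for up to a day, which is why TrackMe requests an immediate test instead.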

TrackMe handles this task by performing a multi-step, bi-directional integration with the SOAR REST API, summarised in the following diagram:

diagram_active_check.png

Advanced pre-built bi-directional interactions with Assets active check

The SOAR Assets active check in TrackMe relies on multi-step, bi-directional actions performed by TrackMe. The default usage of the command is the following:

| trackmesplksoar soar_server=* action=soar_test_apps action_data="{\"active_check\": \"True\", \"assets_allow_list\": \"None\", \"assets_block_list\": \"None\"}"
example_active_check.png

In summary, the command does the following:

  • Retrieve the list of assets eligible for a connection test

  • For each asset, perform a POST call against the Asset API endpoint which requests an immediate application health check

  • Record the response and store it for later use in the process

  • Retrieve the Asset app_status endpoint result, parse and render the final results
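The control flow of these steps can be sketched in Python with the HTTP calls abstracted as callables, which keeps the sketch testable without a SOAR instance. request_test and fetch_status are hypothetical stand-ins for the POST connectivity test and the app status lookup; the "id"/"name" keys are assumed, and the sketch handles assets one at a time for simplicity, whereas TrackMe may batch the calls differently.

```python
from typing import Callable, Iterable, Optional


def active_asset_check(
    assets: Iterable[dict],
    request_test: Callable[[int], dict],
    fetch_status: Callable[[int], dict],
    active_check: bool = True,
) -> list[dict]:
    """Sketch of the multi-step asset check described above.

    assets: eligible asset records (assumed to carry "id" and "name").
    request_test: hypothetical callable issuing the POST test for an asset id.
    fetch_status: hypothetical callable returning the latest app status.
    """
    results = []
    for asset in assets:                                  # step 1: eligible assets
        asset_id = asset["id"]
        post_response: Optional[dict] = None
        if active_check:
            post_response = request_test(asset_id)        # steps 2 and 3
        status = fetch_status(asset_id)                   # step 4: read the status
        results.append({
            "asset": asset["name"],
            "test_requested": active_check,
            "post_response": post_response,
            "status": status,
        })
    return results
```

With active_check=False, no POST is issued and only the status recorded by SOAR’s own daily test is read, matching the option described below.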

The command accepts several options:

Active checks

The default behaviour, active_check: True, means TrackMe actively performs the check by running the POST call described above.

If disabled with active_check: False, TrackMe does not perform the POST call and instead relies on SOAR performing the test once per day, so it can take up to 24 hours before a failing application is detected.

Note that active_check: True requires the SOAR automation user to have the edit Assets permission:

permissions-assets.png

Allow list and Block list:

You can combine assets_allow_list and assets_block_list to restrict the assets taken into account and/or exclude some of them. For instance, to exclude the SMTP asset “internal_smtp”:

| trackmesplksoar soar_server=* action=soar_test_apps action_data="{\"active_check\": \"True\", \"assets_allow_list\": \"None\", \"assets_block_list\": \"internal_smtp\"}"
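One plausible reading of how the two lists combine, as a Python sketch: "None" means a list is not set, the allow list first restricts the candidates, and the block list then removes entries. The comma-separated format for several assets is an assumption (the example above shows a single name), and filter_assets is a hypothetical helper.

```python
def filter_assets(names: list[str], allow_list: str = "None",
                  block_list: str = "None") -> list[str]:
    """Apply assets_allow_list, then assets_block_list, to asset names.

    Assumptions: "None" (or empty) means the list is not set, and several
    assets are given as a comma-separated string such as "a,b".
    """
    def parse(value: str) -> set[str]:
        if value.strip() in ("", "None"):
            return set()
        return {item.strip() for item in value.split(",") if item.strip()}

    allowed = parse(allow_list)
    blocked = parse(block_list)
    # An empty allow list means every asset is a candidate.
    selected = [name for name in names if not allowed or name in allowed]
    return [name for name in selected if name not in blocked]
```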

6. SOAR Automation Brokers High Availability with TrackMe

Hint

TrackMe Version 2.0.50 required

  • This feature is available from TrackMe Version 2.0.50

The built-in use case splk_soar_automation_brokers_manage allows managing SOAR Automation Brokers High Availability using TrackMe, based on the following workflow:

  • Retrieve the list of Automation Brokers and their status from the SOAR API

  • Retrieve the list of SOAR Assets and the association with Automation Brokers

  • For each Automation Broker, if the broker status is inactive, update the associated Assets to use the next available online Automation Broker
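The workflow above can be sketched in Python as a pure planning step: given the broker statuses and the current asset-to-broker associations, compute which assets move and where. The "active"/"inactive" labels, the dictionary shapes and the plan_broker_failover name are assumptions; the random choice mirrors the note further down that TrackMe picks any active broker when more than two exist, and the rng parameter only exists to make the sketch reproducible.

```python
import random
from typing import Optional


def plan_broker_failover(
    broker_status: dict[str, str],
    asset_broker: dict[str, str],
    rng: Optional[random.Random] = None,
) -> dict[str, tuple[str, str]]:
    """Return {asset: (old_broker, new_broker)} for assets on inactive brokers.

    broker_status: broker name -> "active" or "inactive" (assumed labels).
    asset_broker: asset name -> broker currently associated with it.
    Nothing moves when no broker is active.
    """
    rng = rng or random.Random()
    active = sorted(name for name, status in broker_status.items() if status == "active")
    moves: dict[str, tuple[str, str]] = {}
    if not active:
        return moves
    for asset, broker in asset_broker.items():
        if broker_status.get(broker) == "inactive":
            moves[asset] = (broker, rng.choice(active))
    return moves
```

Returning a plan rather than applying it fits the simulation mode described below: the plan can be reported as-is, or turned into asset updates in live mode. Assets on a broker that recovers are not moved back, consistent with the examples.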

Automation broker status: the status is provided by the SOAR API and is based on the health check performed automatically by the SOAR Automation Broker.

Additional options can be used to control and restrict the High Availability features, consult the examples below.

High level workflow diagram:

ab-high-availability-diagram.png

6.1 Requirements

The SOAR Automation User requires Assets management permissions:

permissions-assets.png

6.2 SOAR Automation Broker High Availability - failure detection and Assets updates

The tracking and management of SOAR Automation Brokers is achieved by running the following command through the built-in use case splk_soar_automation_brokers_manage:

| trackmesplksoar soar_server=* action=soar_automation_broker_manage

The command calls the TrackMe REST API endpoint /services/trackme/v2/splk_soar/admin/soar_automation_broker_manage. The following options are available:

  • soar_server: the SOAR server account as defined in the Splunk App for SOAR; if unspecified or set to *, the first server in the Splunk App for SOAR configuration is used

  • mode: optional, the run mode; valid options are simulation | live. In simulation mode, Assets are not updated and only a message describing the action to be performed is registered; in live mode, Assets are updated as needed. Defaults to live

  • automation_active1_broker_name: optional, the first Automation Broker of an active1/active2 pair. Either both active1 and active2 must be specified, or neither. When set, TrackMe targets only these two brokers and switches the Assets configuration depending on the broker status.

  • automation_active2_broker_name: optional, the second Automation Broker of an active1/active2 pair. Either both active1 and active2 must be specified, or neither. When set, TrackMe targets only these two brokers and switches the Assets configuration depending on the broker status.
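If you call the endpoint directly rather than through the custom command, the option rules above can be validated client-side first. This Python sketch builds (without sending) a POST to the endpoint on the Splunk management port using a Splunk authentication token as a Bearer token. The JSON body whose keys match the option names is an assumption; check the API reference dashboard for the exact contract. build_broker_manage_request is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional

ENDPOINT = "/services/trackme/v2/splk_soar/admin/soar_automation_broker_manage"


def build_broker_manage_request(
    splunk_url: str,
    token: str,
    soar_server: str = "*",
    mode: str = "live",
    active1: Optional[str] = None,
    active2: Optional[str] = None,
) -> urllib.request.Request:
    """Validate options and build (not send) a POST to the TrackMe endpoint.

    The JSON body layout is an assumption based on the documented option names.
    """
    if mode not in ("simulation", "live"):
        raise ValueError("mode must be 'simulation' or 'live'")
    # Either both brokers of the active1/active2 pair are set, or neither.
    if (active1 is None) != (active2 is None):
        raise ValueError("set both active1 and active2 broker names, or neither")
    payload = {"soar_server": soar_server, "mode": mode}
    if active1 is not None:
        payload["automation_active1_broker_name"] = active1
        payload["automation_active2_broker_name"] = active2
    return urllib.request.Request(
        splunk_url.rstrip("/") + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
```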

ab-ha-implementation001.png ab-ha-implementation002.png ab-ha-implementation003.png ab-ha-implementation004.png ab-ha-implementation005.png

Example 1: Monitor all Automation Brokers and act automatically

Let’s consider the following scenario:

  • AB-UK-01 is our first Automation Broker

  • AB-UK-02 is our second Automation Broker

  • Both Automation Brokers can equally handle Assets actions

Hint

More than two brokers

  • If there are more than two brokers, TrackMe will update Assets and associate these with any of the active Automation Brokers, randomly chosen

The command is called:

| trackmesplksoar soar_server=* action=soar_automation_broker_manage | spath

Which results in:

ab-ha-001.png

As we can observe:

  • Both Automation Brokers are currently online and active (last_seen_status)

  • Currently, all Assets using a Broker are using AB-UK-01 (associated_assets_count and associated_assets)

  • No errors or specific messages are reported, as both brokers are active

Now, let’s provoke an outage on the first broker and run the command again:

ab-ha-002.png

The following happened:

  • AB-UK-01 is seen as inactive because the SOAR Automation Broker health check failed, the SOAR API knows that the broker is offline

  • TrackMe built the list of SOAR Assets related to this failing broker, then automatically updated each Asset to use any of the remaining active Automation Brokers (currently we only have AB-UK-02)

  • Messages were added to the result of the call to provide a clear context of what was performed

Logs can be found at:

index=_internal sourcetype=trackme:rest_api post_soar_automation_broker_manage
ab-ha-003.png

At the next run of the command:

ab-ha-004.png

All Assets are now associated with the Automation Broker AB-UK-02, while AB-UK-01 remains offline. Let’s fix that:

ab-ha-005.png

Both Automation Brokers are now active and functional. Assets that were updated to use AB-UK-02 remain associated with it as long as the broker remains active.

Example 2: Monitor a pair of active/active Automation Brokers

In more complex scenarios, you may have different zones that are addressed by specific Automation Brokers.

TrackMe’s integration allows you to specify options to target a pair of active/active brokers:

  • automation_active1_broker_name: name of the first Automation Broker

  • automation_active2_broker_name: name of the second active Automation Broker

With these options, TrackMe considers these two Automation Brokers only, and updates Assets according to the status of the brokers: if Active1 is inactive and Active2 is active, associated Assets are updated to use Active2, and vice versa.
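The pair rule can be expressed as a small Python decision function. The "active"/"inactive" labels and the pair_target name are assumptions; the page does not describe what happens when both brokers are inactive, so this sketch simply leaves the asset where it is in that case.

```python
from typing import Optional


def pair_target(current: str, active1: str, active2: str,
                status: dict[str, str]) -> Optional[str]:
    """Return the broker an asset should move to, or None to leave it as is.

    current: broker the asset currently uses.
    status: broker name -> "active" or "inactive" (assumed labels).
    """
    if current not in (active1, active2):
        return None  # outside this pair: not managed by this tracker
    other = active2 if current == active1 else active1
    if status.get(current) == "inactive" and status.get(other) == "active":
        return other
    return None
```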

Hint

More than two brokers

  • With more than two brokers, simply create one Flex Tracker per pair of active/active Automation Brokers

ab-ha-006.png

Active1 has an issue and is now inactive, Assets are updated to use active2:

ab-ha-007.png

At the next run, Assets are now associated with Active2 instead:

ab-ha-008.png

Both Automation Brokers are now active and functional. Assets that were updated to target the second active Automation Broker remain associated with it as long as the broker remains active.