Splunk SOAR Cloud & on-premise monitoring and active actions in TrackMe
Hint
Version 2.0.46 and later
The SOAR integration requires TrackMe version 2.0.46 or later
SOAR High Availability management for Automation Brokers is available in TrackMe version 2.0.50 and later
Caution
Splunk Remote Account: You must use the trackme namespace
If you use a remote account to run the SOAR active monitoring (such as managing Automation Brokers active/passive), you must use the TrackMe namespace when configuring the Splunk remote account.
This is required because TrackMe is shared at the application level by default: unless you share TrackMe at the system level, your remote account would not be able to access TrackMe custom commands on the remote partner.
1. Introduction to SOAR monitoring
TrackMe provides built-in use cases to actively and efficiently monitor one or more Splunk SOAR environments at scale.
SOAR Cloud & on-premise monitoring is performed via the TrackMe Flex Object component (splk-flx), which is a restricted component not available in the free Community edition of TrackMe.
The monitoring relies on TrackMe’s SOAR integration and the Splunk Application for SOAR integration, associated with TrackMe’s Flex Object concepts, and provides the following use cases:
uc_description | uc_implementation_comments |
---|---|
Monitor for playbook run action failures in SOAR | This use case relies on SOAR events indexed in Splunk through the Splunk Application for SOAR integration. It monitors for app run action failures and triggers if any failures are found; the alert clears itself once no failures can be found in the time range. You may want to adapt the search time range to keep the alert active for a certain amount of time after detection. |
Monitor for playbook action failures in SOAR | This use case relies on SOAR events indexed in Splunk through the Splunk Application for SOAR integration. It monitors for playbook action failures and triggers if any failures are found; the alert clears itself once no failures can be found in the time range. You may want to adapt the search time range to keep the alert active for a certain amount of time after detection. |
Run an active assets health check, then parse and render the results | This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. If active_check is set to True, it performs an active asset health check (connectivity test): it discovers assets, runs a POST call to request the connectivity test, then parses and renders the results for each asset. If active_check is set to False, it only parses assets, checks the latest connectivity test performed by SOAR and renders the results. Because the active check sends a test request against the SOAR REST API, the SOAR automation user used for the SOAR server account configuration must have the edit asset permission. If you cannot or do not want to grant it, you can disable the active check and rely on SOAR running the test automatically once per day, in which case failure detection can take up to 24 hours; enabling the active check is therefore recommended. The soar_server can be specified with the option soar_server=<account>; if unspecified or set to *, the first account is used. You can restrict the list of assets taken into account with the assets_allow_list option, and/or exclude some assets with the assets_block_list option. |
Monitor SOAR CPU load from the SOAR API health endpoint | This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. It retrieves the 1/5/15-minute CPU load by performing a GET call to the health endpoint. The soar_server can be specified with the option soar_server=<account>; if unspecified or set to *, the first account is used. |
Monitor SOAR memory usage from the SOAR API health endpoint | This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. It retrieves and calculates the percentage of memory usage of the SOAR instance by performing a GET call to the health endpoint. The soar_server can be specified with the option soar_server=<account>; if unspecified or set to *, the first account is used. |
Get the status of SOAR services from the SOAR API health endpoint | This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. It retrieves the status of SOAR services by performing a GET call to the health endpoint. The soar_server can be specified with the option soar_server=<account>; if unspecified or set to *, the first account is used. |
Monitor the status of SOAR Automation Brokers from the SOAR API | This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. It retrieves the status of SOAR Automation Brokers by performing a GET call to the automation_proxy endpoint. The soar_server can be specified with the option soar_server=<account>; if unspecified or set to *, the first account is used. |
Monitor the status of SOAR Automation Brokers and update SOAR Assets | This use case relies on TrackMe’s SOAR integration and the Splunk App for SOAR integration. It monitors the status of Automation Brokers and updates SOAR Assets as needed when Automation Brokers are detected as inactive, which provides automated High Availability for SOAR Automation Brokers via TrackMe’s integration. |
Monitor the status of SOAR forwarding to Splunk | This use case relies on Splunk indexed data to detect any issues with SOAR embedded forwarding to Splunk. It monitors for containers indexed in Splunk and would ideally be used in conjunction with the SOAR Timer App to regularly create health check containers based on an ingest interval. |
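Several of the use cases above share the same soar_server rule: an unspecified value or * selects the first account configured in the Splunk App for SOAR. The following Python sketch illustrates that rule only. resolve_soar_server and the account names are hypothetical, and rejecting an unknown account name is an assumption rather than documented TrackMe behaviour.

```python
from typing import Optional, Sequence


def resolve_soar_server(accounts: Sequence[str], soar_server: Optional[str] = None) -> str:
    """Return the SOAR account a use case would target (hypothetical helper).

    accounts: account names in the order configured in the Splunk App for SOAR.
    soar_server: the value passed as soar_server=<account>, if any.
    """
    if not accounts:
        raise ValueError("no SOAR account is configured in the Splunk App for SOAR")
    # Documented rule: unspecified or "*" means "use the first account".
    if soar_server is None or soar_server in ("", "*"):
        return accounts[0]
    # Assumption: an explicit name must match a configured account.
    if soar_server not in accounts:
        raise ValueError(f"unknown SOAR account: {soar_server}")
    return soar_server
```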
These use cases are provided via the Flex Object use cases library, but note that you can also manually implement new use cases, or customise built-in use cases as needed.
You can review the use case details ahead of their creation in TrackMe with the following command:
| trackmesplkflxgetuc | search uc_vendor=Splunk uc_category=splunk_soar
2. Requirements for SOAR monitoring
SOAR Cloud & SOAR on-premise
TrackMe’s SOAR integration is compatible with both SOAR Cloud and SOAR on-premise deployments.
Splunk Application for SOAR integration
TrackMe’s integration for SOAR relies on the SOAR integration for Splunk provided by the Splunk App for SOAR:
Hint
Splunk App for SOAR
TrackMe performs active and bi-directional interactions with the SOAR API
TrackMe relies on the Splunk App for SOAR integration and connectivity definition to do this
You do not need any additional configuration in TrackMe as long as the Splunk App for SOAR is installed and configured
Some of the SOAR use cases rely on the SOAR events indexed in Splunk (index=phantom_*), which are also part of the Splunk App for SOAR integration
TrackMe leverages the Splunk App for SOAR and extends its capabilities even further
TrackMe tenant with the Flex Object component for SOAR
You need a TrackMe tenant with the Flex Object component enabled. You can create a dedicated tenant for the monitoring of SOAR, or use any existing tenant of your choice.
Once the Flex trackers have been created, TrackMe automatically groups the resulting entities for SOAR into the following groups:
uc_ref | grouping |
---|---|
splk_soar_actions_apprun_failures, splk_soar_actions_playbooks_failures | Splunk_SOAR:actions_health |
splk_soar_assets_health | Splunk_SOAR:asset_health |
splk_soar_infra_load, splk_soar_services_health | Splunk_SOAR:infrastructure |
splk_soar_services_health | Splunk_SOAR:services_health |
splk_soar_automation_brokers_monitor | Splunk_SOAR:automation_broker_monitor |
splk_soar_automation_brokers_manage | Splunk_SOAR:automation_broker_manage |
TrackMe deployment target
In TrackMe, you can choose where the SOAR monitoring Flex Object use cases are executed: either “local” or a remote deployment.
Note that some of the use cases require TrackMe to be deployed on the remote target, as well as the Splunk App for SOAR to be configured on the remote target.
3. Implementation of SOAR monitoring
3.1 Integration architecture overview
The following diagram represents the integration from a high-level perspective:
As a brief summary:
REST API related use cases in TrackMe imply an active interaction with the SOAR API
TrackMe retrieves and loads the SOAR account configuration from the Splunk App for SOAR, you do not need to define anything in TrackMe
TrackMe implements its own REST queries against the SOAR API as needed; in some cases, such as the Assets connectivity check, this interaction implies a bi-directional integration with POST and GET calls performed by TrackMe
Some other use cases only deal with the SOAR data indexed in Splunk, and do not imply an API interaction
3.2 Creating SOAR Flex Object trackers
Hint
Running actions on a remote Splunk Search Head tier
You can leverage TrackMe’s splunkremotesearch command to run these actions transparently on a remote Splunk Search Head tier
In this case, the Splunk App for SOAR would be installed and configured on the target (not on TrackMe’s Search Head)
However, TrackMe needs to be installed on the target Splunk Search Head tier, even if it is not configured and not actively running anything (if you run in Splunk Cloud Victoria, TrackMe is installed on all Search Heads automatically)
You need to create a TrackMe remote account which uses a token related to a Splunk user on the remote Search Head
TrackMe follows a least-privilege approach: the user on the remote Search Head tier only needs to be a member of the power role and the trackme_admin role (or have equivalent capabilities)
This requires TrackMe version 2.0.48 or later, as some issues were addressed to allow minimal permissions to be used
Once a TrackMe Virtual Tenant with the Flex Object component (splk-flx) has been created, the setup is straightforward and done via the UI:
Then click on Create a new Flex Tracker, select Splunk as the vendor and splunk_soar as the category, then review and validate the results; that is all:
Example of results with the Assets active health check:
This asset is failing the connectivity check:
This asset passes successfully:
Setup is complete; the number of resulting entities depends on your environment:
4. Interacting with the SOAR API with GET and POST calls in pure SPL
The following sections show the root searches and technical details of each of TrackMe’s SOAR use cases.
Refer to the SOAR API reference documentation as needed:
4.1 Running GET calls to SOAR API
trackmesplksoar:
TrackMe comes with a generating custom command called trackmesplksoar, which interfaces with the Splunk App for SOAR and the SOAR API itself. Usage:
| trackmesplksoar soar_server=<soar_server> action=<action> action_data=<json action data>
The command handles different options:
Syntax:
soar_server: the name of the SOAR server as configured in the Splunk App for SOAR
action: an action from the following supported list:
soar_get|soar_post|soar_test_apps|soar_health_status|soar_health_memory|soar_health_load|soar_automation_broker_manage
action_data: a JSON formatted object, either used by specific actions or used to perform a POST query to a SOAR endpoint
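To make the syntax concrete, here is a Python sketch that assembles a trackmesplksoar search string and checks the action against the supported list above. build_trackmesplksoar is a hypothetical helper, not part of TrackMe; it emits action_data as a double-quoted JSON object with escaped inner quotes, which is the style used by the soar_test_apps examples in section 5.

```python
import json
from typing import Optional

# Actions listed as supported by trackmesplksoar in this documentation.
SUPPORTED_ACTIONS = {
    "soar_get", "soar_post", "soar_test_apps", "soar_health_status",
    "soar_health_memory", "soar_health_load", "soar_automation_broker_manage",
}


def build_trackmesplksoar(soar_server: str, action: str, action_data: Optional[dict] = None) -> str:
    """Assemble a trackmesplksoar search string (hypothetical helper)."""
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    search = f"| trackmesplksoar soar_server={soar_server} action={action}"
    if action_data is not None:
        # Serialise to JSON, then escape each double quote so the object can
        # sit inside a double-quoted SPL argument.
        payload = json.dumps(action_data).replace('"', '\\"')
        search += f' action_data="{payload}"'
    return search
```

For example, passing {"active_check": "True", "assets_allow_list": "None", "assets_block_list": "internal_smtp"} with action soar_test_apps reproduces the allow/block-list search shown in section 5.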
Example: the following command retrieves the health information from the SOAR API by targeting the rest/health endpoint:
| trackmesplksoar soar_server=* action=soar_get action_data="{'endpoint': 'health'}" | spath
The soar_get action can call any SOAR API endpoint, but performs GET calls only.
As another example, to retrieve the configuration of all SOAR Assets, run:
| trackmesplksoar soar_server=* action=soar_get action_data="{'endpoint': 'asset'}" | spath
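For context, a soar_get call boils down to an authenticated HTTP GET against /rest/<endpoint> on the SOAR server. The Python sketch below only builds such a request with the standard library to show its shape; it is not TrackMe’s implementation. The host name and token are placeholders for what TrackMe reads from the Splunk App for SOAR, and the ph-auth-token header is an assumption based on SOAR’s token authentication for automation users.

```python
import urllib.request


def build_soar_get(base_url: str, endpoint: str, token: str) -> urllib.request.Request:
    """Build, without sending, a GET request for a SOAR REST endpoint.

    Assumes token authentication through the ph-auth-token header; base_url
    and token are placeholders, not values TrackMe exposes.
    """
    url = f"{base_url.rstrip('/')}/rest/{endpoint.lstrip('/')}"
    return urllib.request.Request(url, headers={"ph-auth-token": token}, method="GET")


# Equivalent of action_data="{'endpoint': 'asset'}" against a placeholder host.
request = build_soar_get("https://soar.example.com", "asset", "<automation-user-token>")
# urllib.request.urlopen(request) would send it; it is deliberately not called here.
```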
4.2 Pre-built actions and advanced parsing
In some cases, such as the one above, the API endpoint results may not be straightforward to parse and use, for instance when the response contains nested JSON structures.
To ease the integration, the custom command provides built-in actions which parse the JSON results as needed and render them properly. For instance, the following relies on the health endpoint and extracts the status of SOAR services:
| trackmesplksoar soar_server=* action="soar_health_status"
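The difference is easiest to see on a small example. The Python sketch below mimics how spath-style flattening turns an array of objects into parallel multivalue fields (services{}.name, services{}.running), which keep each service and its state in separate fields paired only by position, whereas a pre-built action can render one row per service. The payload is hypothetical and does not reproduce the actual SOAR health response; flatten and one_row_per_item are illustrative helpers, not TrackMe code.

```python
from typing import Any, Dict, List


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON into spath-like paths; array items use "{}"."""
    out: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            out.update(flatten(value, f"{prefix}.{key}" if prefix else key))
    elif isinstance(obj, list):
        for item in obj:
            for path, value in flatten(item, f"{prefix}{{}}").items():
                # Values from every array item pile up into one multivalue list.
                out.setdefault(path, []).extend(value if isinstance(value, list) else [value])
    else:
        out[prefix] = obj
    return out


def one_row_per_item(payload: Dict[str, Any], array_key: str) -> List[Dict[str, Any]]:
    """Render one row per array item, keeping each item's fields together."""
    return [dict(item) for item in payload.get(array_key, [])]


# Hypothetical payload shape, for illustration only.
payload = {"status": "ok", "services": [{"name": "svc_a", "running": True},
                                        {"name": "svc_b", "running": False}]}
```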
4.3 Performing a POST call to a SOAR API endpoint via TrackMe
TrackMe’s integration also allows you to perform POST calls to SOAR API endpoints, which the Splunk App for SOAR does not provide.
For instance, to run a POST against an Asset and request an immediate health check, first retrieve the list of assets to identify the ID of the Asset:
See:
https://docs.splunk.com/Documentation/SOAR/current/PlatformAPI/RESTInfo
https://docs.splunk.com/Documentation/SOAR/current/PlatformAPI/RESTAssets
| trackmesplksoar soar_server=* action=soar_get action_data="{'endpoint': 'asset'}" | spath
Then, run the POST call against that Asset ID (7 in this example):
| trackmesplksoar soar_server=* action=soar_post action_data="{'endpoint': 'asset/7', 'data': '{\"test\": \"true\"}'}"
Note: “recieved: true” is the actual response from SOAR; the misspelling is not a typo from TrackMe but comes from the SOAR API for this endpoint.
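The POST examples nest two levels of encoding: an outer object with the endpoint, and a data member holding the JSON body as a string. The Python sketch below unpacks that structure to show what ends up being sent. It assumes the Splunk search parser has already turned each \" into a plain double quote; parse_post_action_data is a hypothetical helper, and how TrackMe parses action_data internally is not described here.

```python
import ast
import json
from typing import Any, Dict, Tuple


def parse_post_action_data(action_data: str) -> Tuple[str, Dict[str, Any]]:
    """Split action_data into the target endpoint and the decoded POST body."""
    # The outer object uses single quotes, which json.loads rejects, so it is
    # read as a Python literal instead.
    outer = ast.literal_eval(action_data)
    # The 'data' member is itself a JSON document carried as a string.
    body = json.loads(outer.get("data", "{}"))
    return outer["endpoint"], body


# action_data from the asset test example, after SPL has unescaped \" to ".
asset_test = "{'endpoint': 'asset/7', 'data': '{\"test\": \"true\"}'}"
endpoint, body = parse_post_action_data(asset_test)
# A POST to /rest/asset/7 would then carry this JSON body.
post_body = json.dumps(body).encode("utf-8")
```

The SCM example that follows has the same shape, with scm/5 as the endpoint and pull/force flags in the body.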
As another example, let’s request a sync refresh of a Git repository via the SCM endpoint:
See:
First, we run a GET call to get the ID of the SCM Git repository configuration:
| trackmesplksoar soar_server=* action=soar_get action_data="{'endpoint': 'scm'}"
We then run a POST call to request the SCM update; in this example, a few changes had actually been merged to the branch:
| trackmesplksoar soar_server=* action=soar_post action_data="{'endpoint': 'scm/5', 'data': '{\"pull\": \"true\", \"force\": \"true\"}'}"
If no operations had been pending, the following message would have been returned:
4.4 TrackMe’s REST API splk-soar endpoints
Under the hood, the trackmesplksoar command interacts with the TrackMe API endpoints for SOAR. You can find the endpoint references and their documentation in the API reference dashboard (menu API & Tooling).
5. SOAR Application status active health check
A key monitoring capability for SOAR is to be able to detect as soon as possible when an application and its corresponding asset is experiencing an issue.
An asset that fails the status health check means that actions can no longer be performed, whether ad-hoc actions requested by analysts or automated actions in the context of SOAR Playbooks.
Failures can happen for a variety of reasons, such as service account credential expiration, firewall rule changes or loss, API changes, SOAR Application upgrades, and many more.
This is a highly critical aspect of SOAR monitoring, especially as Playbooks only execute an action when needed, so an Application integration failure could remain undetected until it is too late.
SOAR has a built-in and mandatory connectivity test per application and per asset; however, SOAR does not perform this health check continuously, so monitoring only the /rest/app_status endpoint would mean monitoring outdated information.
TrackMe handles this task by performing a multi-step, bi-directional integration with the SOAR REST API, summarised in the following diagram:
Advanced pre-built bi-directional interactions with Assets active check
The SOAR Assets active check in TrackMe relies on multi-step, bi-directional actions performed by TrackMe; the default usage of the command is the following:
| trackmesplksoar soar_server=* action=soar_test_apps action_data="{\"active_check\": \"True\", \"assets_allow_list\": \"None\", \"assets_block_list\": \"None\"}"
In summary, the command does the following:
Retrieve the list of assets eligible for a connection test
For each asset, perform a POST call against the Asset API endpoint which requests an immediate application health check
Record the response and store it for later use in the process
Retrieve the Asset app_status endpoint result, parse and render the final results
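The four steps can be sketched in Python with the SOAR calls passed in as functions, which keeps the control flow visible without touching a real SOAR instance. The function and parameter names, the asset dictionary shape and the app_status mapping (asset ID to status string) are assumptions for illustration; they do not reflect TrackMe’s internal code or the exact SOAR response format. With active_check disabled, the POST step is skipped and only the latest status recorded by SOAR is read.

```python
from typing import Callable, Dict, Iterable, List, Optional


def run_assets_check(
    list_assets: Callable[[], Iterable[dict]],
    request_test: Callable[[int], dict],
    read_app_status: Callable[[], Dict[int, str]],
    active_check: bool = True,
) -> List[Dict[str, Optional[object]]]:
    """Sketch of the soar_test_apps flow with the SOAR calls injected as callables."""
    # 1. Retrieve the assets eligible for a connection test.
    assets = list(list_assets())
    # 2-3. Request an immediate test per asset (POST) and keep each response.
    responses = {a["id"]: request_test(a["id"]) for a in assets} if active_check else {}
    # 4. Read the app_status results once and join them to each asset.
    statuses = read_app_status()
    return [
        {
            "asset": a["name"],
            "test_response": responses.get(a["id"]),
            "status": statuses.get(a["id"], "unknown"),
        }
        for a in assets
    ]
```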
The command can accept several options:
Active checks
The default behaviour with active_check: True means TrackMe actively performs the check by running the POST call as described above:
If disabled with active_check: False, TrackMe does not perform the POST call and instead relies on SOAR performing the test once per day, so it can take up to 24 hours before a failing application is detected.
Note that active_check: True requires the SOAR automation user to have the edit Assets permission:
Allow list and Block list:
You can combine assets_allow_list and assets_block_list to restrict which assets are taken into account and/or exclude some of them, for instance to exclude the SMTP asset “internal_smtp”:
| trackmesplksoar soar_server=* action=soar_test_apps action_data="{\"active_check\": \"True\", \"assets_allow_list\": \"None\", \"assets_block_list\": \"internal_smtp\"}"
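The combined effect of the two lists can be sketched as follows. This Python example assumes comma-separated asset names, exact matching and the string None meaning "no list"; TrackMe may support other separators or patterns, so treat filter_assets as a hypothetical illustration of the allow-then-block logic rather than the command’s actual parser.

```python
from typing import Iterable, List, Optional, Set


def _as_set(value: Optional[str]) -> Optional[Set[str]]:
    """'None' or an empty value means no list; otherwise split on commas."""
    if value is None or value.strip() in ("", "None"):
        return None
    return {item.strip() for item in value.split(",") if item.strip()}


def filter_assets(names: Iterable[str], allow_list: Optional[str] = "None",
                  block_list: Optional[str] = "None") -> List[str]:
    allowed = _as_set(allow_list)
    blocked = _as_set(block_list) or set()
    # Keep an asset if it passes the allow list (when set) and is not blocked.
    return [n for n in names if (allowed is None or n in allowed) and n not in blocked]
```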
6. SOAR Automation Brokers High Availability with TrackMe
Hint
TrackMe Version 2.0.50 required
This feature is available from TrackMe Version 2.0.50
The built-in use case splk_soar_manage_automation_brokers allows managing SOAR Automation Brokers High Availability using TrackMe, based on the following workflow:
Retrieve the list of Automation Brokers and their status from the SOAR API
Retrieve the list of SOAR Assets and the association with Automation Brokers
For each Automation Broker, if the broker status is inactive, update the associated Assets to use the next available online Automation Broker
Automation broker status: the status is provided by the SOAR API and is based on the health check performed automatically by the SOAR Automation Broker.
Additional options can be used to control and restrict the High Availability features, consult the examples below.
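The decision at the heart of this workflow can be sketched in Python as a pure function: given broker statuses and the current Asset-to-broker mapping, it plans which Assets to move. The "active"/"inactive" status strings, the dictionary shapes and plan_failover itself are assumptions for illustration, not TrackMe’s implementation; the random choice mirrors the documented behaviour when more than two brokers are available.

```python
import random
from typing import Dict, Optional


def plan_failover(broker_status: Dict[str, str], asset_broker: Dict[str, str],
                  rng: Optional[random.Random] = None) -> Dict[str, str]:
    """Return {asset: new_broker} for Assets bound to an inactive broker."""
    rng = rng or random.Random()
    active = sorted(b for b, s in broker_status.items() if s == "active")
    if not active:
        return {}  # no online broker to fail over to
    plan: Dict[str, str] = {}
    for asset, broker in sorted(asset_broker.items()):
        if broker_status.get(broker) == "inactive":
            # With several active brokers, one is picked at random.
            plan[asset] = rng.choice(active)
    return plan
```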
High level workflow diagram:
6.1 Requirements
The SOAR Automation User requires Assets management permissions:
6.2 SOAR Automation Broker High Availability - failure detection and Assets updates
The tracking and management of SOAR Automation Brokers is achieved by running the following command through the builtin use case splk_soar_manage_automation_brokers:
| trackmesplksoar soar_server=* action=soar_automation_broker_manage
The command calls the TrackMe REST API endpoint /services/trackme/v2/splk_soar/admin/soar_automation_broker_manage. The following options are available:
soar_server: The SOAR server account as defined in the Splunk App for SOAR; if unspecified or set to *, the first server in the Splunk App for SOAR configuration is used.
mode: Optional, the run mode; valid options are simulation | live. In simulation mode, Assets are not updated and only a message describing the action to be performed is registered; in live mode, Assets are updated as needed. Defaults to live.
automation_active1_broker_name: Optional, the first active Automation Broker of an active1/active2 pair. Both brokers must be specified, or neither. When set, TrackMe targets only this pair and switches the Assets configuration depending on the broker status.
automation_active2_broker_name: Optional, the second active Automation Broker of the active1/active2 pair, subject to the same rules as automation_active1_broker_name.
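The option rules above can be expressed as a small validation step plus a mode switch. In this Python sketch, check_options and apply_plan are hypothetical helpers, the message wording is invented, and update_asset stands in for the SOAR Asset update call; it only illustrates that simulation mode records what would happen without changing any Asset.

```python
from typing import Callable, Dict, List, Optional, Tuple


def check_options(mode: str = "live", active1: Optional[str] = None,
                  active2: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Validate the documented options; return the broker pair, if any."""
    if mode not in ("simulation", "live"):
        raise ValueError("mode must be simulation or live")
    if (active1 is None) != (active2 is None):
        raise ValueError("set both active1/active2 broker names, or neither")
    return (active1, active2) if active1 is not None and active2 is not None else None


def apply_plan(plan: Dict[str, str], mode: str,
               update_asset: Callable[[str, str], None]) -> List[str]:
    """Update Assets in live mode; in simulation mode only record messages."""
    messages = []
    for asset, broker in plan.items():
        if mode == "live":
            update_asset(asset, broker)
        verb = "updated" if mode == "live" else "would update"
        messages.append(f"{verb} asset {asset} to broker {broker}")
    return messages
```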
Example 1: Monitor all Automation Brokers and act automatically
Let’s consider the following scenario:
AB-UK-01 is our first Automation Broker
AB-UK-02 is our second Automation Broker
Both Automation Brokers can equally handle Assets actions
Hint
More than two brokers
If there are more than two brokers, TrackMe updates Assets and associates them with one of the active Automation Brokers, chosen randomly
The command is called:
| trackmesplksoar soar_server=* action=soar_automation_broker_manage | spath
Which results in:
As we can observe:
Both Automation Brokers are currently online and active (last_seen_status)
Currently, all Assets using a Broker are using AB-UK-01 (associated_assets_count and associated_assets)
There are no errors reported nor specific messages, as both brokers are active
Now, let’s provoke an outage on the first broker and run the command again:
The following happened:
AB-UK-01 is seen as inactive because the SOAR Automation Broker health check failed; the SOAR API knows that the broker is offline
TrackMe built the list of SOAR Assets related to this failing broker, then automatically updated each Asset to use one of the remaining active Automation Brokers (currently we only have AB-UK-02)
Messages were added to the result of the call to provide clear context on what was performed
Logs can be found at:
index=_internal sourcetype=trackme:rest_api post_soar_automation_broker_manage
At the next run of the command:
All Assets are now associated with the Automation Broker AB-UK-02, while AB-UK-01 remains offline. Let’s fix that:
Both Automation Brokers are now active and functional. Assets that were updated to use AB-UK-02 remain associated with it as long as the broker remains active.
Example 2: Monitor a pair of active/active Automation Brokers
In more complex scenarios, you may have different zones that are handled by specific Automation Brokers.
TrackMe’s integration allows you to specify options to target a pair of active/active brokers:
automation_active1_broker_name: name of the first active Automation Broker
automation_active2_broker_name: name of the second active Automation Broker
With these options, TrackMe only considers these two Automation Brokers and updates Assets according to the status of the brokers: if Active1 is inactive and Active2 is active, associated Assets are updated to use Active2, and vice versa.
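For a single active/active pair, the rule reduces to a per-Asset decision. The Python sketch below assumes "active"/"inactive" status strings; broker names and pair_target are hypothetical. Assets already on a healthy broker stay put, which matches the behaviour described below where Assets remain on Active2 after Active1 recovers.

```python
from typing import Dict, Optional


def pair_target(status: Dict[str, str], active1: str, active2: str, current: str) -> Optional[str]:
    """Return the broker an Asset should move to, or None to leave it as is."""
    if current not in (active1, active2):
        return None  # Asset is outside the managed pair
    other = active2 if current == active1 else active1
    # Move only when the current broker is down and its partner is up.
    if status.get(current) == "inactive" and status.get(other) == "active":
        return other
    return None
```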
Hint
More than two brokers
With more than two brokers, simply create one Flex Tracker per pair of active/active Automation Brokers
Active1 has an issue and is now inactive; Assets are updated to use Active2:
At the next run, Assets are now associated with Active2 instead:
Both Automation Brokers are now active and functional. Assets that were updated to target the second active Automation Broker remain associated with it as long as the broker remains active.