Accelerated searching is a great way to reduce search runtime by pre-computing (basically pre-running) frequently searched data. There are different ways to get value from pre-computing these results and different ways to store these results depending on how they’ll get used. In this module, you will explore a few different options. You'll work with storing summarized data in a lookup table first, then storing data in a summary index, and wrap up with metricizing events to store in a metrics index.
The SOC is interested in using machine learning to help detect when an anomalous amount of data is being downloaded from or uploaded to OneDrive. They’d like to use outlier detection to accomplish this, but to do so they want to search over the past 90 days of OneDrive data every hour to establish a baseline for what a normal amount of data transfer looks like. Searching over the past 90 days of OneDrive data every hour to detect anomalies would put an unnecessary amount of search load on Splunk, and there’s a better way: summary indexing!
In this task you'll configure a search to gather metrics related to OneDrive usage, and save those results to a lookup table. This lookup table can then be used to establish a baseline for what "normal" OneDrive usage looks like, and used in an alert instead of searching over the raw 90 days of data every time the search powering the alert needs to run.
- In module 1, you redirected the OneDrive-related events to the `storage_services` index. However, for this task you'll want to reference as much data as possible, so you'll be searching against OneDrive data in both the `main` and `storage_services` indexes. Run the search below to make sure data is getting pulled back for the whole 30 minutes:
index=storage_services OR index=main sourcetype=o365:management:activity Workload=OneDrive
- Add the `timechart` command to the search to see the per-minute count for each Operation. Just for testing purposes (and this workshop), you'll be looking at the data with a 1-minute granularity (span), but if you were deploying this for the SOC you'd use a 1-hour granularity (span).
index=storage_services OR index=main sourcetype=o365:management:activity Workload=OneDrive
| timechart span=1m count by Operation
- Create a new lookup named `onedrive-activity.csv` by adding the `outputlookup` command to the search:
index=storage_services OR index=main sourcetype=o365:management:activity Workload=OneDrive
| timechart span=1m count by Operation
| outputlookup onedrive-activity.csv
This search can be saved and configured to run once a day so that the SOC's anomaly detection search has up-to-date information to reference via the lookup table, as opposed to searching the raw data every time the alert needs to run. In the real world (outside of the workshop), the timeframe for the search would be changed to something more substantial (e.g., the past 90 days as opposed to 30 minutes), and the span would be updated to match the SOC's alert (e.g., span=1h instead of span=1m), as shown in the sketch below.
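For reference, a production version of this saved search might look something like the following sketch; the 90-day window and 1-hour span are illustrative and should be matched to the SOC's actual requirements:
index=storage_services OR index=main sourcetype=o365:management:activity Workload=OneDrive earliest=-90d latest=now
| timechart span=1h count by Operation
| outputlookup onedrive-activity.csv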
- Make sure the lookup table was created correctly by running the search below, which dumps the contents of the lookup table:
| inputlookup onedrive-activity.csv
Now that the lookup table has been created, the search that created it can be saved and configured to run automatically, keeping the lookup table up to date via the `outputlookup` command. A sketch of what that configuration might look like follows below.
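If you were to define that scheduled search directly in savedsearches.conf, it might look roughly like this sketch; the stanza name and cron schedule are assumptions, and in practice you would typically save and schedule the search through Splunk Web:
[update_onedrive_activity_lookup]
search = index=storage_services OR index=main sourcetype=o365:management:activity Workload=OneDrive \
| timechart span=1h count by Operation \
| outputlookup onedrive-activity.csv
cron_schedule = 0 1 * * *
dispatch.earliest_time = -90d
dispatch.latest_time = now
enableSched = 1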
- The data from the lookup table can be appended to a search using the `append` and `inputlookup` commands. Set the time range picker to 1 minute and run the search below to see this in action:
index=storage_services OR index=main sourcetype=o365:management:activity Workload=OneDrive
| timechart span=1m count by Operation
| append
[| inputlookup onedrive-activity.csv]
Notice that even though the search only covers 1 minute of data (as set by the time picker), the results span more than 1 minute. The additional results are being populated from the lookup table. A rough sketch of how this appended data could feed outlier detection follows below.
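As an illustration of how the SOC might use this appended data, the sketch below flags minutes where the count exceeds three standard deviations above the average. The FileDownloaded column is an assumption; substitute whichever Operation values actually appear in your data:
index=storage_services OR index=main sourcetype=o365:management:activity Workload=OneDrive
| timechart span=1m count by Operation
| append
    [| inputlookup onedrive-activity.csv]
| eventstats avg(FileDownloaded) as avg_downloads, stdev(FileDownloaded) as stdev_downloads
| where FileDownloaded > avg_downloads + (3 * stdev_downloads)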
The sales department would like to keep summarized, statistical data on product purchases long-term. Ideally, these summarized events would be kept for seven years, which is significantly longer than the retention period of the purchase events in the purchases index. The sales department has explained they don't need the raw purchase records in Splunk long-term, just the summarized data.
In this task, you'll be configuring summary indexing to store this data long-term.
You’ll be using the `collect` command and summary indexing in this task. Here are a few tips and tricks for using `collect` with summary indexing:
- It's considered a best practice to use a transforming command (like `timechart` or `stats`) to aggregate the data before sending it to the summary index.
- The `marker` option in the `collect` command can be used to add additional fields that help describe the summarized data.
- We highly recommend versioning the data being sent to the summary index in case you need to iterate on the search generating the data or on the `collect` command itself. You'll be doing that in this task using the `marker` option.
- Use the `testmode` option for the `collect` command when writing and testing the search to avoid writing data to the destination index.
- By default, `collect` will set the sourcetype to `stash`, which will not impact license utilization. If you set the sourcetype in the `collect` command to something other than `stash`, this will impact license usage.
- Use a `search_name` marker to make it easy to figure out which search the summarized data came from.
- Make the search summarizing the data performant to minimize Splunk resource (and SVC) utilization.
Follow these instructions to configure the summary indexing:
- Find the data you'll be summarizing by running the search below:
index=purchases sourcetype=web_purchases
- The sales team is only interested in knowing about purchases, so filter for just those records:
index=purchases sourcetype=web_purchases action=purchase
- The sales team would like the total revenue (cost of the product) per product. They'd like the data aggregated by day, but for testing purposes (and this workshop), use a 1-minute granularity:
index=purchases sourcetype=web_purchases action=purchase
| timechart span=1m sum(cost) as revenue by product
- Add the `collect` command next. The search below takes the results from the `timechart` command and sends the data to the `summary` index with markers of `version=100` and `search_name=summarize_purchases`. Set `testmode` on this search to view the results without sending any data to the summary index:
index=purchases sourcetype=web_purchases action=purchase
| timechart span=1m sum(cost) as revenue by product
| collect index=summary source=summarize_purchases marker="version=100, search_name=summarize_purchases" testmode=true
If you zoom in on the screenshot (or the results on your screen), you can see the data that will be sent to the summary index in the `_raw` field.
Notice how the data is stored in key-value format, with each field of the timechart results becoming a key and each field's value becoming the value for that key.
- At this point, the data looks good, so the scheduled search could be created without the `testmode=true` option. However, since this is a workshop, run the searches below to send data to the summary index and then verify the data can be searched back:
index=purchases sourcetype=web_purchases action=purchase
| timechart span=1m sum(cost) as revenue by product
| collect index=summary source=summarize_purchases marker="version=100, search_name=summarize_purchases"
index=summary version=100
| table *
Notice how the data can be put into a table just as if the original `timechart` command had been run. This summarized data can be kept in the `summary` index, and since it takes up less space, it costs less to store than the raw events.
- If you're interested in determining how much less space the summarized events take up, run the search below:
(index=purchases sourcetype=web_purchases action=purchase) OR (index=summary version=100)
| eval eventSize=len(_raw)
| stats sum(eventSize) as total_event_size by index
Notice how the total event size per index differs considerably, with the summarized events in the `summary` index being much smaller than the raw events in the `purchases` index.
Similarly, the networking team would like to keep aggregated statistics on VPC Flow Logs long-term. Ideally, this data would be metricized and stored as metric-style events to minimize data storage costs and maximize query performance. The networking team would like to store how many bytes were transmitted to/from each ENI in each AWS account at a 1-minute granularity, then keep this data for 365 days. They’d like this data placed in the `summary_metrics` index.
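As a point of reference, a metrics index with roughly 365 days of retention might be defined in indexes.conf along these lines; the stanza paths below are assumptions, and your environment's settings will differ:
[summary_metrics]
homePath   = $SPLUNK_DB/summary_metrics/db
coldPath   = $SPLUNK_DB/summary_metrics/colddb
thawedPath = $SPLUNK_DB/summary_metrics/thaweddb
datatype = metric
# 365 days, expressed in seconds
frozenTimePeriodInSecs = 31536000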
- Run the search below to find the VPC Flow Log data that will be metricized:
index=aws sourcetype=aws:cloudwatchlogs:vpcflow
- Since the `timechart` command can only group by one field, use a combination of the `bin` and `stats` commands to aggregate the data at a 1-minute granularity by the ENI ID, AWS account number, and time (`interface_id`, `account_id`, and `_time` respectively):
index=aws sourcetype=aws:cloudwatchlogs:vpcflow
| bin span=1m _time
| stats sum(bytes) as sum_bytes by interface_id, account_id, _time
- Use the `rename` command to format the `sum_bytes` field into the metric name format:
index=aws sourcetype=aws:cloudwatchlogs:vpcflow
| bin span=1m _time
| stats sum(bytes) as sum_bytes by interface_id, account_id, _time
| rename sum_bytes as metric_name:sum_bytes
- Now, use the `mcollect` command to send the metricized events to the `summary_metrics` index, specifying `account_id` and `interface_id` as metric dimensions and using markers of `version=100` and `search_name=summarize_vpcflow`:
index=aws sourcetype=aws:cloudwatchlogs:vpcflow
| bin span=1m _time
| stats sum(bytes) as sum_bytes by interface_id, account_id, _time
| rename sum_bytes as metric_name:sum_bytes
| mcollect index=summary_metrics source=aws-vpcflow marker="version=100, search_name=summarize_vpcflow" account_id, interface_id
- You can check your work by using the `mpreview` command:
| mpreview index=summary_metrics filter=version=100
From here, a saved search can be created to metricize the events on a regular cadence and send them to the `summary_metrics` index. The networking team can then search the metricized events using the `mstats` command, as in the sketch below.
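A minimal sketch of such an `mstats` search, assuming the metric name and dimensions created above; adjust the span and grouping to the networking team's needs:
| mstats sum(sum_bytes) WHERE index=summary_metrics BY account_id, interface_id span=1h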