Summary
This module offers an in-depth exploration of Splunk, a leading platform in the field of cybersecurity analytics and threat detection. The module begins with a comprehensive introduction to Splunk, providing a solid understanding of its architecture, components, and core functionalities.
A significant focus of the module is on crafting effective detection-related SPL (Search Processing Language) searches. We will delve into the intricacies of the Search Processing Language, which serves as the backbone of Splunk's querying capabilities. Through hands-on exercises and guided tutorials, we will learn the syntax, commands, and techniques required to create powerful and targeted searches to identify security incidents, anomalies, and potential threats.
To reinforce the practical application of Splunk as a Security Information and Event Management (SIEM) tool, the module presents a real-world scenario. We will step into the shoes of a security analyst tasked with investigating a simulated security incident using Splunk. We will navigate through large volumes of machine data, leverage Splunk's search capabilities, and apply various data analysis techniques to identify the root cause of the incident and gather critical forensic evidence.
In addition to detection-focused searches, the module covers the creation of TTP-driven and analytics-driven SPL searches. TTP-driven searches involve crafting queries that align with known Tactics, Techniques, and Procedures commonly employed by threat actors. This approach helps security teams proactively detect and respond to sophisticated attacks by recognizing patterns associated with specific attack methods. Analytics-driven searches, on the other hand, leverage statistical analysis and mathematical models to identify abnormal behaviors and anomalies that may indicate potential security breaches.
Throughout the module, we will gain valuable insights into leveraging Splunk as a robust security monitoring and incident investigation tool. We will develop the skills needed to identify and understand the ingested data and available fields within Splunk.
This module is broken into sections with accompanying hands-on exercises to practice the techniques we cover. The module ends with a practical hands-on skills assessment to gauge your understanding of the various topic areas.
As you work through the module, you will see detection examples for the topics introduced. It is worth reproducing as many of these examples as possible to reinforce further the concepts presented in each section. You can do this in the target host provided in the interactive sections or your virtual machine.
You can start and stop the module anytime and pick up where you left off. There is no time limit or "grading," but you must complete all of the exercises and the skills assessment to receive the maximum number of cubes and have this module marked as complete in any paths you have chosen.
The module is classified as "hard" and assumes basic knowledge of Windows event logs and common attack principles.
A firm grasp of the following modules can be considered prerequisites for successful completion of this module:
- Penetration Testing Process
- Incident Handling Process
- Security Monitoring & SIEM Fundamentals
- Windows Event Logs & Finding Evil
- Introduction to Threat Hunting & Hunting With Elastic
Introduction To Splunk & SPL
What Is Splunk?
Splunk is a highly scalable, versatile, and robust data analytics software solution known for its ability to ingest, index, analyze, and visualize massive amounts of machine data. Splunk has the capability to drive a wide range of initiatives, encompassing cybersecurity, compliance, data pipelines, IT monitoring, observability, as well as overall IT and business management.
Splunk's (Splunk Enterprise) architecture
consists of several layers that work together to collect, index, search, analyze, and visualize data. The architecture can be divided into the following main components:
-
Forwarders
: Forwarders are responsible for data collection. They gather machine data from various sources and forward it to the indexers. The types of forwarders used in Splunk are:-
Universal Forwarder (UF)
: This is a lightweight agent that collects data and forwards it to the Splunk indexers without any preprocessing. Universal Forwarders are individual software packages that can be easily installed on remote sources without significantly affecting network or host performance. -
Heavy Forwarder (HF)
: This agent serves the purpose of collecting data from remote sources, especially for intensive data aggregation assignments involving sources like firewalls or data routing/filtering points. According to Splexicon, heavy forwarders stand out from other types of forwarders as they parse data before forwarding, allowing them to route data based on specific criteria such as event source or type. They can also index data locally while simultaneously forwarding it to another indexer. Typically, Heavy Forwarders are deployed as dedicated "data collection nodes" for API/scripted data access, and they exclusively support Splunk Enterprise. - Please note that there are
HTTP Event Collectors (HECs)
available for directly collecting data from applications in a scalable manner. HECs operate by using token-based JSON or raw API methods. In this process, data is sent directly to the Indexer level for further processing.
-
-
Indexers
: The indexers receive data from the forwarders, organize it, and store it in indexes. While indexing data, the indexers generate sets of directories categorized by age, wherein each directory hold compressed raw data and corresponding indexes that point to the raw data. They also process search queries from users and return results. -
Search Heads
: Search heads coordinate search jobs, dispatching them to the indexers and merging the results. They also provide an interface for users to interact with Splunk. On Search Heads,Knowledge Objects
can be crafted to extract supplementary fields and manipulate data without modifying the original index data. It is important to mention that Search Heads also offer various tools to enrich the search experience, including reports, dashboards, and visualizations. -
Deployment Server
: It manages the configuration for forwarders, distributing apps and updates. -
Cluster Master
: The cluster master coordinates the activities of indexers in a clustered environment, ensuring data replication and search affinity. -
License Master
: It manages the licensing details of the Splunk platform.
Splunk's key components
include:
-
Splunk Web Interface
: This is the graphical interface through which users can interact with Splunk, carrying out tasks like searching, creating alerts, dashboards, and reports. -
Search Processing Language (SPL)
: The query language for Splunk, allowing users to search, filter, and manipulate the indexed data. -
Apps and Add-ons
: Apps provide specific functionalities within Splunk, while add-ons extend capabilities or integrate with other systems. Splunk Apps enable the coexistence of multiple workspaces on a single Splunk instance, catering to different use cases and user roles. These ready-made apps can be found on Splunkbase, providing additional functionalities and pre-configured solutions. Splunk Technology Add-ons serve as an abstraction layer for data collection methods. They often include relevant field extractions, allowing for schema-on-the-fly functionality. Additionally, Technology Add-ons encompass pertinent configuration files (props/transforms) and supporting scripts or binaries. A Splunk App, on the other hand, can be seen as a comprehensive solution that typically utilizes one or more Technology Add-ons to enhance its capabilities. -
Knowledge Objects
: These include fields, tags, event types, lookups, macros, data models, and alerts that enhance the data in Splunk, making it easier to search and analyze.
Splunk As A SIEM Solution
When it comes to cybersecurity, Splunk can play a crucial role as a log management solution, but its true value lies in its analytics-driven Security Information and Event Management (SIEM) capabilities. Splunk as a SIEM solution can aid in real-time and historical data analysis, cybersecurity monitoring, incident response, and threat hunting. Moreover, it empowers organizations to enhance their detection capabilities by leveraging User Behavior Analytics.
As discussed, Splunk Processing Language (SPL) is a language containing over a hundred commands, functions, arguments, and clauses. It's the backbone of data analysis in Splunk, used for searching, filtering, transforming, and visualizing data.
Let's assume that main
is an index containing Windows Security and Sysmon logs, among others.
-
Basic Searching
The most fundamental aspect of SPL is searching. By default, a search returns all events, but it can be narrowed down with keywords, boolean operators, comparison operators, and wildcard characters. For instance, a search for error would return all events containing that word.
Boolean operators
AND
,OR
, andNOT
are used for more specific queries.The
search
command is typically implicit at the start of each SPL query and is not usually written out. However, here's an example using explicit search syntax:search index="main" "UNKNOWN"
By specifying the index as
main
, the query narrows down the search to only the events stored in themain
index. The termUNKNOWN
is then used as a keyword to filter and retrieve events that include this specific term.Note: Wildcards (
*
) can replace any number of characters in searches and field values. Example (implicit search syntax):index="main" "*UNKNOWN*"
This SPL query will search within the
main
index for events that contain the termUNKNOWN
anywhere in the event data. -
Fields and Comparison Operators
Splunk automatically identifies certain data as fields (like
source
,sourcetype
,host
,EventCode
, etc.), and users can manually define additional fields. These fields can be used with comparison operators (=
,!=
,<
,>
,<=
,>=
) for more precise searches. Example:index="main" EventCode!=1
This SPL (Splunk Processing Language) query is used to search within the
main
index for events that donot
have anEventCode
value of1
. -
The fields command
The
fields
command specifies which fields should be included or excluded in the search results. Example:index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 | fields - User
After retrieving all process creation events from the
main
index, thefields
command excludes theUser
field from the search results. Thus, the results will contain all fields normally found in the Sysmon Event ID 1 logs, except for the user that initiated the process. Please note that utilizingsourcetype
restricts the scope exclusively to Sysmon event logs. -
The table command
The
table
command presents search results in a tabular format. Example:index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 | table _time, host, Image
This query returns process creation events, then arranges selected fields (_time, host, and Image) in a tabular format.
_time
is the timestamp of the event,host
is the name of the host where the event occurred, andImage
is the name of the executable file that represents the process. -
The rename command
The
rename
command renames a field in the search results. Example:index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 | rename Image as Process
This command renames the
Image
field toProcess
in the search results.Image
field in Sysmon logs represents the name of the executable file for the process. By renaming it, all the subsequent references toProcess
would now refer to what was originally theImage
field. -
The dedup command
The 'dedup' command removes duplicate events. Example:
index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 | dedup Image
The
dedup
command removes duplicate entries based on theImage
field from the process creation events. This means if the same process (Image
) is created multiple times, it will appear only once in the results, effectively removing repetition. -
The sort command
The
sort
command sorts the search results. Example:index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 | sort - _time
This command sorts all process creation events in the
main
index in descending order of their timestamps (_time), i.e., the most recent events are shown first. -
The stats command
The
stats
command performs statistical operations. Example:index="main" sourcetype="WinEventLog:Sysmon" EventCode=3 | stats count by _time, Image
This query will return a table where each row represents a unique combination of a timestamp (
_time
) and a process (Image
). The count column indicates the number of network connection events that occurred for that specific process at that specific time.However, it's challenging to visualize this data over time for each process because the data for each process is interspersed throughout the table. We'd need to manually filter by process (
Image
) to see the counts over time for each one. -
The chart command
The
chart
command creates a data visualization based on statistical operations. Example:index="main" sourcetype="WinEventLog:Sysmon" EventCode=3 | chart count by _time, Image
This query will return a table where each row represents a unique timestamp (
_time
) and each column represents a unique process (Image
). The cell values indicate the number of network connection events that occurred for each process at each specific time.With the
chart
command, you can easily visualize the data over time for each process because each process has its own column. You can quickly see at a glance the count of network connection events over time for each process. -
The eval command
The
eval
command creates or redefines fields. Example:index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 | eval Process_Path=lower(Image)
This command creates a new field
Process_Path
which contains the lowercase version of theImage
field. It doesn't change the actualImage
field, but creates a new field that can be used in subsequent operations or for display purposes. -
The rex command
The
rex
command extracts new fields from existing ones using regular expressions. Example:index="main" EventCode=4662 | rex max_match=0 "[^%](?<guid>{.*})" | table guid
-
index="main" EventCode=4662
filters the events to those in themain
index with theEventCode
equal to4662
. This narrows down the search to specific events with the specified EventCode. -
rex max_match=0 "[^%](?<guid>{.*})"
uses the rex command to extract values matching the pattern from the events' fields. The regex pattern{.*}
looks for substrings that begin with{
and end with}
. The[^%]
part ensures that the match does not begin with a%
character. The captured value within the curly braces is assigned to the named capture groupguid
. -
table guid
displays the extracted GUIDs in the output. This command is used to format the results and display only theguid
field. - The
max_match=0
option ensures that all occurrences of the pattern are extracted from each event. By default, the rex command only extracts the first occurrence.
This is useful because GUIDs are not automatically extracted from 4662 event logs.
-
-
The lookup command
The
lookup
command enriches the data with external sources. Example:Suppose the following CSV file called
malware_lookup.csv
.filename, is_malware notepad.exe, false cmd.exe, false powershell.exe, false sharphound.exe, true randomfile.exe, true
This CSV file should be added as a new Lookup table as follows.
index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 | rex field=Image "(?P<filename>[^\\\]+)$" | eval filename=lower(filename) | lookup malware_lookup.csv filename OUTPUTNEW is_malware | table filename, is_malware
-
index="main" sourcetype="WinEventLog:Sysmon" EventCode=1
: This is the search criteria. It's looking for Sysmon logs (as identified by the sourcetype) with an EventCode of 1 (which represents process creation events) in the "main" index. -
| rex field=Image "(?P<filename>[^\\\]+)$"
: This command is using the regular expression (regex) to extract a part of the Image field. The Image field in Sysmon EventCode=1 logs typically contains the full file path of the process. This regex is saying: Capture everything after the last backslash (which should be the filename itself) and save it as filename. -
| eval filename=lower(filename)
: This command is taking the filename that was just extracted and converting it to lowercase. The lower() function is used to ensure the search is case-insensitive. -
| lookup malware_lookup.csv filename OUTPUTNEW is_malware
: This command is performing a lookup operation using the filename as a key. The lookup table (malware_lookup.csv) is expected to contain a list of filenames of known malicious executables. If a match is found in the lookup table, the new field is_malware is added to the event, which indicates whether or not the process is considered malicious based on the lookup table.<-- filename in this part of the query is the first column title in the CSV
. -
| table filename, is_malware
: This command is formatting the output to show only the fields filename and is_malware. If is_malware is not present in a row, it means that no match was found in the lookup table for that filename.
In summary, this query is extracting the filenames of newly created processes, converting them to lowercase, comparing them against a list of known malicious filenames, and presenting the findings in a table.
An equivalent that also removes duplicates is the following.
index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 | eval filename=mvdedup(split(Image, "\\")) | eval filename=mvindex(filename, -1) | eval filename=lower(filename) | lookup malware_lookup.csv filename OUTPUTNEW is_malware | table filename, is_malware | dedup filename, is_malware
-
index="main" sourcetype="WinEventLog:Sysmon" EventCode=1
: This command is the search criteria. It is pulling from themain
index where the sourcetype isWinEventLog:Sysmon
and theEventCode
is1
. The SysmonEventCode
of1
indicates a process creation event. -
| eval filename=mvdedup(split(Image, "\\"))
: This command is splitting theImage
field, which contains the file path, into multiple elements at each backslash and making it a multivalue field. Themvdedup
function is used to eliminate any duplicates in this multivalue field. -
| eval filename=mvindex(filename, -1)
: Here, themvindex
function is being used to select the last element of the multivalue field generated in the previous step. In the context of a file path, this would typically be the actual file name. -
| eval filename=lower(filename)
: This command is taking the filename field and converting it into lowercase using the lower function. This is done to ensure the search is not case-sensitive and to standardize the data. -
| lookup malware_lookup.csv filename OUTPUTNEW is_malware
: This command is performing a lookup operation. Thelookup
command is taking thefilename
field, and checking if it matches any entries in themalware_lookup.csv
lookup table. If there is a match, it appends a new field,is_malware
, to the event, indicating whether the process is flagged as malicious. -
| table filename, is_malware
: Thetable
command is used to format the output, in this case showing only thefilename
andis_malware
fields in a tabular format. -
| dedup filename, is_malware
: This command eliminates any duplicate events based on the filename and is_malware fields. In other words, if there are multiple identical entries for thefilename
andis_malware
fields in the search results, thededup
command will retain only the first occurrence and remove all subsequent duplicates.
In summary, this SPL query searches the Sysmon logs for process creation events, extracts the
file name
from theImage
field, converts it to lowercase, matches it against a list of known malware from themalware_lookup.csv
file, and then displays the results in a table, removing any duplicates based on thefilename
andis_malware
fields. -
-
The inputlookup command
The
inputlookup
command retrieves data from a lookup file without joining it to the search results. Example:| inputlookup malware_lookup.csv
This command retrieves all records from the
malware_lookup.csv
file. The result is not joined with any search results but can be used to verify the content of the lookup file or for subsequent operations like filtering or joining with other datasets. -
Time Range
Every event in Splunk has a timestamp. Using the time range picker or the
earliest
andlatest
commands, you can limit searches to specific time periods. Example:index="main" earliest=-7d EventCode!=1
By combining the
index="main"
condition withearliest=-7d
andEventCode!=1
, the query will retrieve events from themain
index that occurred in the last seven days and do not have anEventCode
value of1
. -
The transaction command
The
transaction
command is used in Splunk to group events that share common characteristics into transactions, often used to track sessions or user activities that span across multiple events. Example:index="main" sourcetype="WinEventLog:Sysmon" (EventCode=1 OR EventCode=3) | transaction Image startswith=eval(EventCode=1) endswith=eval(EventCode=3) maxspan=1m | table Image | dedup Image
-
index="main" sourcetype="WinEventLog:Sysmon" (EventCode=1 OR EventCode=3)
: This is the search criteria. It's pulling from themain
index where the sourcetype isWinEventLog:Sysmon
and theEventCode
is either1
or3
. In Sysmon logs,EventCode 1
refers to a process creation event, andEventCode 3
refers to a network connection event. -
| transaction Image startswith=eval(EventCode=1) endswith=eval(EventCode=3) maxspan=1m
: The transaction command is used here to group events based on the Image field, which represents the executable or script involved in the event. This grouping is subject to the conditions: the transaction starts with an event whereEventCode
is1
and ends with an event whereEventCode
is3
. Themaxspan=1m
clause limits the transaction to events occurring within a 1-minute window. The transaction command can link together related events to provide a better understanding of the sequences of activities happening within a system. -
| table Image
: This command formats the output into a table, displaying only theImage
field. -
| dedup Image
: Finally, thededup
command removes duplicate entries from the result set. Here, it's eliminating any duplicateImage
values. The command keeps only the first occurrence and removes subsequent duplicates based on theImage
field.
In summary, this query aims to identify sequences of activities (process creation followed by a network connection) associated with the same executable or script within a 1-minute window. It presents the results in a table format, ensuring that the listed executables/scripts are unique. The query can be valuable in threat hunting, particularly when looking for indicators of compromise such as rapid sequences of process creation and network connection events initiated by the same executable.
-
-
Subsearches
A subsearch in Splunk is a search that is nested inside another search. It's used to compute a set of results that are then used in the outer search. Example:
index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 NOT [ search index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 | top limit=100 Image | fields Image ] | table _time, Image, CommandLine, User, ComputerName
In this query:
-
index="main" sourcetype="WinEventLog:Sysmon" EventCode=1
: The main search that fetchesEventCode=1 (Process Creation)
events. -
NOT []
: The square brackets contain the subsearch. By placingNOT
before it, the main search will exclude any results that are returned by the subsearch. -
search index="main" sourcetype="WinEventLog:Sysmon" EventCode=1 | top limit=100 Image | fields Image
: The subsearch that fetchesEventCode=1 (Process Creation)
events, then uses thetop
command to return the 100 most commonImage
(process) names. -
table _time, Image, CommandLine, User, Computer
: This presents the final results as a table, displaying the timestamp of the event (_time
), the process name (Image
), the command line used to execute the process (CommandLine
), the user that executed the process (User
), and the computer on which the event occurred (ComputerName
).
This query can help to highlight unusual or rare processes, which may be worth investigating for potential malicious activity. Be sure to adjust the limit in the subsearch as necessary to fit your environment.
As a note, this type of search can generate a lot of noise in environments where new and unique processes are frequently created, so careful tuning and context are important.
-
This is just the tip of the iceberg when it comes to SPL. Its vast command set and flexible syntax provide comprehensive data analysis capabilities. As with any language, proficiency comes with practice and experience. Find below some excellent resources to start with:
- https://docs.splunk.com/Documentation/SCS/current/SearchReference/Introduction
- https://docs.splunk.com/Documentation/SplunkCloud/latest/SearchReference/
- https://docs.splunk.com/Documentation/SplunkCloud/latest/Search/
How To Identify The Available Data
Data and field identification approach 1: Leverage Splunk's Search & Reporting Application (SPL)
In any robust Security Information and Event Management (SIEM) system like Splunk, understanding the available data sources, the data they provide, and the fields within them is critical to leveraging the system effectively. In Splunk, we primarily use the Search & Reporting application to do this. Let's delve into how we can identify data source types, data, and fields within Splunk.
Splunk can ingest a wide variety of data sources. We classify these data sources into source types that dictate how Splunk formats the incoming data. To identify the available source types, we can run the following SPL command, after selecting the suitable time range in the time picker of the Search & Reporting application
.
| eventcount summarize=false index=* | table index
This query uses eventcount
to count events in all indexes, then summarize=false
is used to display counts for each index separately, and finally, the table
command is used to present the data in tabular form.
| metadata type=sourcetypes
This search uses the metadata
command, which provides us with various statistics about specified indexed fields. Here, we're focusing on sourcetypes
. The result is a list of all sourcetypes
in our Splunk environment, along with additional metadata such as the first time a source type was seen (firstTime
), the last time it was seen (lastTime
), and the number of hosts (totalCount
).
For a simpler view, we can use the following search.
| metadata type=sourcetypes index=* | table sourcetype
Here, the metadata
command retrieves metadata about the data in our indexes. The type=sourcetypes
argument tells Splunk to return metadata about sourcetypes. The table
command is used to present the data in tabular form.
| metadata type=sources index=* | table source
This command returns a list of all data sources in the Splunk environment.
Once we know our source types, we can investigate the kind of data they contain. Let's say we are interested in a sourcetype named WinEventLog:Security
, we can use the table command to present the raw data as follows.
sourcetype="WinEventLog:Security" | table _raw
The table
command generates a table with the specified fields as columns. Here, _raw
represents the raw event data. This command will return the raw data for the specified source type.
Splunk automatically extracts a set of default fields for every event it indexes, but it can also extract additional fields depending on the source type of the data. To see all fields available in a specific source type, we can use the fields
command.
sourcetype="WinEventLog:Security" | table *
This command generates a table with all fields available in the WinEventLog:Security
sourcetype. However, be cautious, as the use of table *
can result in a very wide table if our events have a large number of fields. This may not be visually practical or effective for data analysis.
A better approach is to identify the fields you are interested in using the fields
command as mentioned before, and then specifying those field names in the table
command. Example:
sourcetype="WinEventLog:Security" | fields Account_Name, EventCode | table Account_Name, EventCode
If we want to see a list of field names only, without the data, we can use the fieldsummary
command instead.
sourcetype="WinEventLog:Security" | fieldsummary
This search will return a table that includes every field found in the events returned by the search (across the sourcetype we've specified). The table includes several columns of information about each field:
-
field
: The name of the field. -
count
: The number of events that contain the field. -
distinct_count
: The number of distinct values in the field. -
is_exact
: Whether the count is exact or estimated. -
max
: The maximum value of the field. -
mean
: The mean value of the field. -
min
: The minimum value of the field. -
numeric_count
: The number of numeric values in the field. -
stdev
: The standard deviation of the field. -
values
: Sample values of the field.
We may also see:
-
modes
: The most common values of the field. -
numBuckets
: The number of buckets used to estimate the distinct count.
Please note that the values provided by the fieldsummary
command are calculated based on the events returned by our search. So if we want to see all fields within a specific sourcetype
, we need to make sure our time range is large enough to capture all possible fields.
index=* sourcetype=* | bucket _time span=1d | stats count by _time, index, sourcetype | sort - _time
Sometimes, we might want to know how events are distributed over time. This query retrieves all data (index=* sourcetype=*
), then bucket
command is used to group the events based on the _time
field into 1-day buckets. The stats
command then counts the number of events for each day (_time
), index
, and sourcetype
. Lastly, the sort
command sorts the result in descending order of _time
.
index=* sourcetype=* | rare limit=10 index, sourcetype
The rare
command can help us identify uncommon event types, which might be indicative of abnormal behavior. This query retrieves all data and finds the 10 rarest combinations of indexes and sourcetypes.
index="main" | rare limit=20 useother=f ParentImage
This command displays the 20 least common values of the ParentImage
field.
index=* sourcetype=* | fieldsummary | where count < 100 | table field, count, distinct_count
A more complex query can provide a detailed summary of fields. This search shows a summary of all fields (fieldsummary
), filters out fields that appear in less than 100 events (where count < 100
), and then displays a table (table
) showing the field name, total count, and distinct count.
index=* | sistats count by index, sourcetype, source, host
We can also use the sistats
command to explore event diversity. This command counts the number of events per index, sourcetype, source, and host, which can provide us a clear picture of the diversity and distribution of our data.
index=* sourcetype=* | rare limit=10 field1, field2, field3
The rare command can also be used to find uncommon combinations of field values. Replace field1
, field2
, field3
with the fields of interest. This command will display the 10 rarest combinations of these fields.
By combining the above SPL commands and techniques, we can explore and understand the types of data source, the data they contain, and the fields within them. This understanding is the foundation upon which we build effective searches, reports, alerts, and dashboards in Splunk.
Lastly, remember to follow your organization's data governance policies when exploring data and source types to ensure you're compliant with all necessary privacy and security guidelines.
You can apply each of the searches mentioned above successfully against the data within the Splunk instance of the target that you can spawn below. We highly recommend running and even modifying these searches to familiarize yourself with SPL (Search Processing Language) and its capabilities.
Data and field identification approach 2: Leverage Splunk's User Interface
When using the Search & Reporting
application's user interface, identifying the available data source types, the data they contain, and the fields within them becomes a task that involves interacting with various sections of the UI. Let's examine how we can effectively use the Splunk Web interface to identify these elements.
-
Data Sources
: The first thing we want to identify is our data sources. We can do this by navigating to theSettings
menu, which is usually located on the top right corner of our Splunk instance. In the dropdown, we'll findData inputs
. By clicking onData inputs
, we'll see a list of various data input methods, including files & directories, HTTP event collector, forwarders, scripts, and many more. These represent the various sources through which data can be brought into Splunk. Clicking into each of these will give us an overview of the data sources being utilized. -
Data (Events)
: Now, let's move on to identifying the data itself, in other words, the events. For this, we'll want to navigate to theSearch & Reporting
app. By exploring the events in theFast
mode, we can quickly scan through our data. TheVerbose
mode, on the other hand, lets us dive deep into each event's details, including its raw event data and all the fields that Splunk has extracted from it. In the search bar, we could simply put*
and hit search, which will bring up all the data that we have indexed. However, this is usually a massive amount of data, and it might not be the most efficient way to go about it. A better approach might be to leverage the time picker and select a smaller time range (let's be careful while doing so though to not miss any important/useful historic logs). -
Fields
: Lastly, to identify the fields in our data, let's look at an event in detail. We can click on any event in our search results to expand it. We can also see on the left hand side of the "Search & Reporting" application two categories of fields:Selected Fields
andInteresting Fields
.Selected Fields
are fields that are always shown in the events (likehost
,source
, andsourcetype
), whileInteresting Fields
are those that appear in at least 20% of the events. By clickingAll fields
, we can see all the fields present in our events. -
Data Models
: Data Models provide an organized, hierarchical view of our data, simplifying complex datasets into understandable structures. They're designed to make it easier to create meaningful reports, visualizations, and dashboards without needing a deep understanding of the underlying data sources or the need to write complex SPL queries. Here is how we can use the Data Models feature to identify and understand our data:-
Accessing Data Models
: To access Data Models, we click on theSettings
tab on the top right corner of the Splunk Web interface. Then we selectData Models
under theKnowledge
section. This will lead us to the Data Models management page.<-- If it appears empty, please execute a search and navigate to the Data Models page again.
-
Understanding Existing Data Models
: On the Data Models management page, we can see a list of available Data Models. These might include models created by ourselves, our team, or models provided by Splunk Apps. Each Data Model is associated with a specific app context and is built to describe the structured data related to the app's purpose. -
Exploring Data Models
: By clicking on the name of a Data Model, we are taken to theData Model Editor
. This is where the true power of Data Models lies. Here, we can view the hierarchical structure of the data model, which is divided intoobjects
. Each object represents a specific part of our data and containsfields
that are relevant to that object. For example, if we have a Data Model that describes web traffic, we might see objects likeWeb Transactions
,Web Errors
, etc. Within these objects, we'll find fields likestatus
,url
,user
, etc.
-
-
Pivots
: Pivots are an extremely powerful feature in Splunk that allows us to create complex reports and visualizations without writing SPL queries. They provide an interactive, drag-and-drop interface for defining and refining our data reporting criteria. As such, they're also a fantastic tool for identifying and exploring the available data and fields within our Splunk environment. To start with Pivots to identify available data and fields, we can use thePivot
button that appears when we're browsing a particular data model in theData Models
page.
Practical Exercises
Navigate to the bottom of this section and click on Click here to spawn the target system!
Now, navigate to http://[Target IP]:8000
, open the Search & Reporting
application, and answer the questions below.