IPMES (Incremental Behavioral Pattern Matching Algorithm over the System Audit Event Stream for APT Detection) is a system that performs incremental pattern matching over event streams.
This repository holds the original source code for IPMES-java.
- littleponywork/IPMES - the official version (Java) for DSN 2024.
- XYFC128/IPMES_PLUS - the successor version, implemented in Rust.
If you use IPMES in your research, please cite our paper published at the 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN): IPMES: A Tool for Incremental TTP Detection Over the System Audit Event Stream
BibTeX entry:
```bibtex
@INPROCEEDINGS {10646966,
author = { Li, Hong-Wei and Liu, Ping-Ting and Lin, Bo-Wei and Liao, Yi-Chun and Huang, Yennun },
booktitle = { 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) },
title = {{ IPMES: A Tool for Incremental TTP Detection Over the System Audit Event Stream }},
year = {2024},
volume = {},
ISSN = {},
pages = {265-273},
doi = {10.1109/DSN58291.2024.00036},
url = {https://doi.ieeecomputersociety.org/10.1109/DSN58291.2024.00036},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = Jun
}
```
The provenance graph is a streaming directed graph constructed from system logs. Each event in the log corresponds to a directed edge (arc) in the provenance graph. The vertices of the provenance graph represent objects in the system, such as files, processes, or network sockets. The endpoints of an arc are the object that triggers the event and the object that receives it, respectively. For example, if a process opens a file, there is an arc from the process to the file labeled `open`.
The provenance graph is called streaming because it is constructed from the event log stream: the graph is given arc by arc over time and keeps growing.
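As a rough illustration of the "streaming" aspect, the Java sketch below keeps a provenance graph that grows by one arc per event. The classes `Arc` and `StreamingProvenanceGraph` are hypothetical names made up for this README, not IPMES classes, and the use of `long` vertex ids and `double` timestamps is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: one system event, i.e. one arc of the provenance graph.
final class Arc {
    final long from;     // id of the object that triggers the event (e.g. a process)
    final long to;       // id of the object that receives the event (e.g. a file)
    final String label;  // event type, e.g. "open"
    final double time;   // assumed timestamp representation

    Arc(long from, long to, String label, double time) {
        this.from = from;
        this.to = to;
        this.label = label;
        this.time = time;
    }
}

// Hypothetical: a provenance graph that only grows, one arc per incoming event.
final class StreamingProvenanceGraph {
    private final Map<Long, String> vertexLabels = new HashMap<>();
    private final List<Arc> arcs = new ArrayList<>();

    // Called once per event read from the log stream; new endpoints become new vertices.
    void onEvent(long fromId, String fromLabel, long toId, String toLabel,
                 String eventLabel, double time) {
        vertexLabels.putIfAbsent(fromId, fromLabel);
        vertexLabels.putIfAbsent(toId, toLabel);
        arcs.add(new Arc(fromId, toId, eventLabel, time));
    }

    int vertexCount() { return vertexLabels.size(); }

    List<Arc> arcs() { return Collections.unmodifiableList(arcs); }

    String labelOf(long id) { return vertexLabels.get(id); }
}
```

Nothing is ever removed in this sketch; a practical matcher would bound memory in some way (IPMES, for instance, joins within a time window), which is not attempted here.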
A behavioral pattern consists of a graph and an order relation. The graph part of the pattern is defined like the provenance graph. The order relation specifies the order of some edges in the pattern graph.
A subgraph $G$ of the provenance graph matches a behavioral pattern with pattern graph $P$ if:
- there is a bijective function $F$ from $V(G)$ to $V(P)$ such that
  - $uv \in E(G)$ if and only if $F(u)F(v) \in E(P)$ for all $u, v \in V(G)$, and
  - the label of $v$ is the same as the label of $F(v)$ for all $v \in V(G)$;
- the incoming order of the edges in $G$ satisfies the order relations.
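Once a candidate mapping $F$ is known, the first condition can be checked mechanically. The sketch below is a hypothetical `MatchCheck` helper written for this README, not code from IPMES; it assumes vertices are identified by strings, arcs are ordered pairs whose endpoints are all in the corresponding vertex set, and it leaves the order condition aside.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: checks the bijection, arc, and label conditions of a match.
final class MatchCheck {
    // gLabels / pLabels: vertex id -> label for G and P.
    // gArcs / pArcs: arcs as [from, to] pairs (endpoints assumed to be in the vertex sets).
    // f: the candidate mapping F from V(G) to V(P).
    static boolean isIsomorphicMatch(Map<String, String> gLabels, Set<List<String>> gArcs,
                                     Map<String, String> pLabels, Set<List<String>> pArcs,
                                     Map<String, String> f) {
        // F must be defined on exactly V(G) ...
        if (!f.keySet().equals(gLabels.keySet())) return false;
        // ... and hit every vertex of P exactly once (injective and surjective).
        Set<String> image = new HashSet<>(f.values());
        if (image.size() != f.size() || !image.equals(pLabels.keySet())) return false;

        // The label of v equals the label of F(v) for every v in V(G).
        for (Map.Entry<String, String> e : f.entrySet()) {
            if (!gLabels.get(e.getKey()).equals(pLabels.get(e.getValue()))) return false;
        }

        // uv in E(G) iff F(u)F(v) in E(P): because F is a bijection, this is equivalent
        // to F mapping the arc set of G exactly onto the arc set of P.
        Set<List<String>> mapped = new HashSet<>();
        for (List<String> arc : gArcs) {
            mapped.add(List.of(f.get(arc.get(0)), f.get(arc.get(1))));
        }
        return mapped.equals(pArcs);
    }
}
```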
TL;DR: we are given
- Provenance Graph
- Behavioral Pattern
  - Graph
  - Order relations

Expected output:
- The matched subgraphs
See the example at TTP11_regex_edge
The oRels file represents the order relations of the edges in the pattern. The file format is a JSON object like this:
```json
{
  "root": {
    "parents": [],
    "children": [0, 3, 6]
  },
  "0": {
    "parents": ["root"],
    "children": [2]
  }, ...
}
```
The order relations are expressed as a connected DAG with a root. The root is a virtual vertex that does not represent any edge. Every other vertex in the DAG has a numeric label n, corresponding to the n-th edge in the edge file (counting from 0).
In the DAG, the parents of a vertex are its dependencies: a child edge must occur after all of its parent edges.
See the example at TTP11_oRels
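The dependency rule can be checked directly on a candidate match. In this hypothetical Java sketch (not IPMES code), `parents` maps each pattern edge to its parent edges with the virtual root omitted, and `arrival` gives the arrival position of the data edge matched to each pattern edge. It assumes every pattern edge has an arrival entry and treats "after" as strictly later. Checking only direct parents suffices, since "after" is transitive along DAG paths.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper: does a matched subgraph respect the oRels DAG?
final class OrderRelationCheck {
    // parents: pattern edge number -> parent edge numbers (virtual "root" omitted).
    // arrival: pattern edge number -> arrival position of its matched data edge.
    static boolean satisfies(Map<Integer, List<Integer>> parents, Map<Integer, Long> arrival) {
        for (Map.Entry<Integer, List<Integer>> e : parents.entrySet()) {
            long childTime = arrival.get(e.getKey());
            for (int parent : e.getValue()) {
                // Assumption: the child must arrive strictly after each parent.
                if (childTime <= arrival.get(parent)) return false;
            }
        }
        return true;
    }
}
```

With the example above, edge 2 has edge 0 as its only non-root parent, so `parents` would contain `2 -> [0]`.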
The preprocessed data graph is in CSV format, with the columns `[start_time, end_time, event_sig, eid, start_id, end_id]`:
- `start_time`: the event start time
- `end_time`: the event end time
- `event_sig`: the event signature, in the format `{edge label}#{start node label}#{end node label}`
- `eid`: the edge id
- `start_id`: the id of the start node
- `end_id`: the id of the end node
You can download the preprocessed provenance graph at link
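A row of this file can be split into its columns and the signature into its three labels. The `EventRow` class below is a hypothetical sketch, not IPMES's own parser. It assumes there is no header row, that no field (including the labels) contains a comma or `#`, that times parse as `double`, and that ids parse as `long`; adjust these if the real files differ.

```java
// Hypothetical sketch of one row of the data graph CSV.
final class EventRow {
    final double startTime;
    final double endTime;
    final String edgeLabel;
    final String startLabel;
    final String endLabel;
    final long eid;
    final long startId;
    final long endId;

    private EventRow(double startTime, double endTime, String edgeLabel, String startLabel,
                     String endLabel, long eid, long startId, long endId) {
        this.startTime = startTime;
        this.endTime = endTime;
        this.edgeLabel = edgeLabel;
        this.startLabel = startLabel;
        this.endLabel = endLabel;
        this.eid = eid;
        this.startId = startId;
        this.endId = endId;
    }

    // Parses "start_time,end_time,event_sig,eid,start_id,end_id".
    // Assumption: no commas inside fields and no '#' inside the labels.
    static EventRow parse(String line) {
        String[] cols = line.split(",", -1);
        if (cols.length != 6) {
            throw new IllegalArgumentException("expected 6 columns: " + line);
        }
        // event_sig = {edge label}#{start node label}#{end node label}
        String[] sig = cols[2].split("#", -1);
        if (sig.length != 3) {
            throw new IllegalArgumentException("malformed event_sig: " + cols[2]);
        }
        return new EventRow(Double.parseDouble(cols[0]), Double.parseDouble(cols[1]),
                sig[0], sig[1], sig[2],
                Long.parseLong(cols[3]), Long.parseLong(cols[4]), Long.parseLong(cols[5]));
    }
}
```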
The output contains the number of matched subgraphs, the memory usage, and other metrics. An example of the output format:
```json
{
  "PeakHeapSize": 522190848,
  "NumResults": 1000,
  "UsageCount": {
    "1": 7490,
    "7": 1133
  },
  "PeakPoolSize": 2336
}
```
If the debug option (`--debug`) is enabled, the edge ids of all edges in each matched subgraph are printed to stderr.
The workflow of IPMES is:
- Convert raw input data into DataEdges.
- Decompose the pattern we want to match into TC subqueries (this step accelerates matching).
- Match each TC subquery separately and store the match results in dedicated buffers.
- Decide a strict join order and join the match results according to it.
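A set of events without a strict timing order can arrive in many different permutations (up to $k!$ orders for $k$ mutually unordered events), and considering all of them would be expensive. To avoid this, we decompose patterns without a strict timing order into several TC subqueries, each of which has a strict timing order.
For the definition of a TC query, see https://ieeexplore.ieee.org/document/9248627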
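IPMES's actual decomposition procedure is not described here, so the following is only a hypothetical Java sketch of the property a TC subquery needs: every pair of its edges must be ordered by the oRels DAG, which then fixes a single strict timing order. `children` maps each pattern edge to its children (virtual root omitted). The check counts, for each edge of the subquery, how many other subquery edges must precede it; the edges form a strict total order exactly when these counts are $0, 1, \dots, k-1$.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: is a set of pattern edges totally ordered by the oRels DAG?
final class TcCheck {
    // Returns the edges in their forced timing order, or empty if some pair is unordered.
    static Optional<List<Integer>> strictOrder(Map<Integer, List<Integer>> children,
                                               Set<Integer> subquery) {
        // predecessors[e] = number of other subquery edges that must occur before e.
        Map<Integer, Integer> predecessors = new HashMap<>();
        for (int e : subquery) predecessors.put(e, 0);
        for (int a : subquery) {
            for (int b : reachable(children, a)) {
                if (subquery.contains(b)) predecessors.merge(b, 1, Integer::sum);
            }
        }
        // In a strict total order of k edges, the counts are exactly 0, 1, ..., k-1.
        List<Integer> order = new ArrayList<>(subquery);
        order.sort(Comparator.comparing(predecessors::get));
        for (int i = 0; i < order.size(); i++) {
            if (predecessors.get(order.get(i)) != i) return Optional.empty();
        }
        return Optional.of(order);
    }

    // All edges reachable from start by following child links (start itself excluded).
    private static Set<Integer> reachable(Map<Integer, List<Integer>> children, int start) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>(children.getOrDefault(start, List.of()));
        while (!stack.isEmpty()) {
            int v = stack.pop();
            if (seen.add(v)) stack.addAll(children.getOrDefault(v, List.of()));
        }
        return seen;
    }
}
```

In the oRels example above, edges 0 and 2 are ordered (0 is a parent of 2), while edges 0 and 3 are both children of the root only, so a subquery containing both would not have a strict timing order.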
When joining match results, a naive approach has to scan the whole table of partial match results to make sure no possible match is missed. Since the number of partial match results grows explosively, we introduce a new algorithm, Priority Join, to make joining more efficient.
Priority Join organizes the table: different kinds of partial match results are stored in separate buffers. Each time we join, we only need to check the partial match results in the buffer selected by the strict join order. This significantly reduces the number of match results that need to be checked.
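The buffer idea can be sketched as follows. This is a simplified, hypothetical Java sketch, not IPMES's actual Priority Join: buffer `i` holds partial matches covering the first `i + 1` subqueries of the join order, so a new result of the subquery at position `i` is joined only against buffer `i - 1` instead of the whole table. For brevity it assumes subquery results arrive in join order and never expires old entries; two partial matches are joined only if their shared pattern vertices are bound to the same data vertices, no two pattern vertices share a data vertex, and the combined time span fits in the window (cf. `--window-size`).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical partial match: pattern vertex -> data vertex, plus the covered time span.
final class Partial {
    final Map<String, Long> bindings;
    final long minTime;
    final long maxTime;

    Partial(Map<String, Long> bindings, long minTime, long maxTime) {
        this.bindings = bindings;
        this.minTime = minTime;
        this.maxTime = maxTime;
    }
}

final class PriorityJoinSketch {
    private final List<Integer> joinOrder;                       // subquery ids, strict join order
    private final long windowSize;                               // max time span of a match
    private final List<List<Partial>> buffers = new ArrayList<>(); // buffer i covers joinOrder[0..i]

    PriorityJoinSketch(List<Integer> joinOrder, long windowSize) {
        this.joinOrder = joinOrder;
        this.windowSize = windowSize;
        for (int i = 0; i < joinOrder.size(); i++) buffers.add(new ArrayList<>());
    }

    // Feeds one match of a subquery; returns any newly completed full matches.
    List<Partial> onSubqueryMatch(int subqueryId, Partial m) {
        int pos = joinOrder.indexOf(subqueryId);
        List<Partial> produced = new ArrayList<>();
        if (pos == 0) {
            produced.add(m);
        } else {
            // Only the buffer for the preceding prefix is scanned.
            for (Partial p : buffers.get(pos - 1)) {
                Partial joined = tryJoin(p, m);
                if (joined != null) produced.add(joined);
            }
        }
        buffers.get(pos).addAll(produced);
        return pos == joinOrder.size() - 1 ? produced : List.of();
    }

    private Partial tryJoin(Partial a, Partial b) {
        long lo = Math.min(a.minTime, b.minTime);
        long hi = Math.max(a.maxTime, b.maxTime);
        if (hi - lo > windowSize) return null;                     // outside the time window
        Map<String, Long> merged = new HashMap<>(a.bindings);
        for (Map.Entry<String, Long> e : b.bindings.entrySet()) {
            Long prev = merged.putIfAbsent(e.getKey(), e.getValue());
            if (prev != null && !prev.equals(e.getValue())) return null; // conflicting binding
        }
        // Keep the mapping injective: distinct pattern vertices need distinct data vertices.
        if (new HashSet<>(merged.values()).size() != merged.size()) return null;
        return new Partial(merged, lo, hi);
    }
}
```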
- Java >= 11
- Apache Maven >= 3.8.7
```shell
cd ipmes-java/
mvn compile
mvn exec:java -Dexec.args="[options] pattern_prefix data_graph"
```
```text
usage: ipmes-java [-h] [-r] [--darpa] [-w WINDOWSIZE] [--debug] pattern_prefix data_graph

IPMES implemented in Java.

positional arguments:
  pattern_prefix         The path prefix of pattern's files, e.g. ./data/patterns/TTP11
  data_graph             The path to the preprocessed data graph

named arguments:
  -h, --help             show this help message and exit
  -r, --regex            Explicitly use regex matching. Default will automatically depend on the pattern prefix name (default: false)
  --darpa                We are running on DARPA dataset. (default: false)
  -w WINDOWSIZE, --window-size WINDOWSIZE
                         Time window size (sec) when joining. (default: 1800)
  --debug                Output debug information. (default: false)
```
- `data/`: the example input data for the program
  - `patterns/`: the patterns for the SPADE dataset
  - `darpa_pattern/`: the patterns for the DARPA dataset
- `ipmes-python/`: the Python implementation of IPMES, which is now deprecated. The files are kept in the repository for data preprocessing.
- `ipmes-java/`: the Java implementation of IPMES
  - For a more detailed description of the folder, please read the README in the folder.