Skip to content

IPMES: A Tool for Incremental Pattern Matching over Event Stream

Notifications You must be signed in to change notification settings

Datou0718/IPMES

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

IPMES

IPMES (Incremental Behavioral Pattern Matching Algorithm over the System Audit Event Stream for APT Detection) is a system that performs incremental pattern matching over event streams.

This repository holds the original source code for IMPES-java.

Related Repositories

If you use IPMES in your research, please cite our paper published at the 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN): IPMES: A Tool for Incremental TTP Detection Over the System Audit Event Stream

BibTeX entry:

@INPROCEEDINGS {10646966,
author = { Li, Hong-Wei and Liu, Ping-Ting and Lin, Bo-Wei and Liao, Yi-Chun and Huang, Yennun },
booktitle = { 2024 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) },
title = {{ IPMES: A Tool for Incremental TTP Detection Over the System Audit Event Stream }},
year = {2024},
volume = {},
ISSN = {},
pages = {265-273},
doi = {10.1109/DSN58291.2024.00036},
url = {https://doi.ieeecomputersociety.org/10.1109/DSN58291.2024.00036},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month =Jun}

Description

Problems

The Provenance Graph is a streaming directed graph constructed by system logs. Each event in the log corresponds to an directed edge (arc) in the provenance graph. The points in the provenance graph represent the objects in the system like a file, a process or a network socket. The endpoints of an arc is the object that triggers the event and the object that receives the event respectively. For example, if a process opens a file, there will be an arc from the process to the file and labeled open.

The provenance graph is called streaming sice it is constrcted by event log stream. The graph is given arc-by-arc over time and will keep growing.

The behavioral pattern is consists of a graph and a order relation. The graph part of the pattern is similar to the definition of the provenance graph. The order relation specifies the order of some edges in the pattern graph.

The subgraph $G$ of the provenance graph is called matched subgraph for a pattern $P$ if:

  • there is a bijective function $F$ from $V(G)$ to $V(P)$ such that
    • $uv \in E(G)$ if and ony if $F(u)F(v) \in E(P)$ for all $v \in V(G)$
    • lable of $v$ is the same as the lable of $F(v)$ for all $v \in V(G)$
  • the incoming order of the edges in $G$ satisifies the order relations

TL;DR, we are given:

  • Provenance Graph
  • Behavioral Pattern
    • Graph
    • Order relations

expected output:

  • The matched subgraphs

Input / Output Format

Input format for pattern graph

See the example at TTP11_regex_edge

Input format for pattern order relations

The oRels file reprensents the order relations of the edges in the pattern. The file format is a json object like this:

{
    "root": {
        "parents": [],
        "children": [
            0,
            3,
            6
        ]
    },
    "0": {
        "parents": ["root"],
        "children": [2]
    }, ...
}

The order relations is expressed in a connected DAG with a root. The root is a virtual vertex, not representing any edge. The other vertices in the graph have a number lable n, corresponding to the n-th edge in the edge file, n starting from 0.

In the DAG, the parents are the dependencies of it's childs, meaning the occurrence of a child must after all of it's parent.

See the example at TTP11_oRels

Input format for provanence graph

In csv format. The columns in the csv: [start_time, end_time, event_sig, eid, start_id, end_id]

  • start_time: the event start time
  • end_time: the event end time
  • event_sig: event signature, a signature is in the format: {edge label}#{start node label}#{end node label}
  • eid: edge id
  • start_id: id of the start node
  • end_id: id of the end node

You can download the preprocessed provanence graph at link

Output Format

The output contains the number of matched subgraphs, memory usage and other metircs. The example output format is:

{
  "PeakHeapSize": 522190848,
  "NumResults": 1000,
  "UsageCount": {
    "1": 7490,
    "7": 1133
  },
  "PeakPoolSize": 2336
}

If enable the debug options (--debug), the edge id of all edges in each matched subgraph will be printed to stderr.

Solution

The workflow of IPMES is

  1. Convert raw input data into DataEdges.
  2. Decompose the pattern we want to match into TC subqueries (this step is to accelerate matching).
  3. Match each TC subqueries separately and store the match results in specific buffers.
  4. Decide a strict join order and join match results according to it.

workflowOfIPMES

Decompose:

Event collections without strict timing order would have countless permutations. To avoid it, we decompose patterns without strict timing order into several TC subqueries that have strict timing order.

What is TC Query: https://ieeexplore.ieee.org/document/9248627

Join:

When join match result, we need to scan through the whole table we use to store partial match results in order to ensure not missing any possible match. Since the number of partial match results grow explosively, we introduce a new algorithm, Priority Join, to make join more efficient.

Priority Join organize the table, We store diffrient kinds of partial match results in individual buffer. Every time we want to join, we only need to check the partial match results in the corresponding buffer according to the strict join order. This significantly reduce the number of match results we need to check.

Requirement

  • Java >= 11
  • Apache Maven >= 3.8.7

Build from source

cd ipmes-java/
mvn compile

Running

mvn exec:java -Dexec.args="[options] data_graph pattern_graph"

Usage

usage: ipmes-java [-h] [-r] [--darpa] [-w WINDOWSIZE] [--debug] pattern_prefix data_graph

IPMES implemented in Java.

positional arguments:
  pattern_prefix         The path prefix of pattern's files, e.g. ./data/patterns/TTP11
  data_graph             The path to the preprocessed data graph

named arguments:
  -h, --help             show this help message and exit
  -r, --regex            Explicitly use regex matching. Default will automatically depend on the pattern prefix name (default: false)
  --darpa                We are running on DARPA dataset. (default: false)
  -w WINDOWSIZE, --window-size WINDOWSIZE
                         Time window size (sec) when joining. (default: 1800)
  --debug                Output debug information. (default: false)

Directory Structure

  • data/: the example input data for the program
    • patterns/: the patterns for the SPADE dataset
    • darpa_pattern/: the patterns for the DARPA dataset
  • ipmes-python/: the Python implementation of IPMES which is deprecated now. The files are kept in the repository for data preprocessing.
  • ipmes-java/: the java implementation for IPMES
    • For a more detailed description of the folder, please read the README in the folder

About

IPMES: A Tool for Incremental Pattern Matching over Event Stream

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published