Audit Record Labeling

Label audit records for supervised machine learning

Introduction

Labeling is the process of adding classification information to each audit record. For intrusion detection, this is usually a label stating whether the record is normal or malicious. This is called binary classification, since there are only two choices for the label (good / bad). Another option is to use multi-class labels, which can represent attack names or categories. Creating labeled datasets efficiently and precisely is important for supervised machine learning techniques.

To create labeled data, Netcap parses the logs produced by Suricata and extracts information for each alert. The quality of the labels therefore depends on the quality of the ruleset in use. In the next step, Netcap iterates over the audit records it generated and maps each alert to the corresponding packets, connections or flows. This mapping takes into account all available information on the audit records and alerts. More details on the mapping logic can be found in the implementation chapter.

While labeling is usually performed by marking all packets of a known malicious IP address, Netcap implements a more granular and thus more precise approach that maps labels to each individual record. Labeling happens asynchronously, with each audit record type handled in a separate goroutine.
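The per-type concurrency described above can be pictured with a minimal sketch. It is not Netcap's implementation: the labelAll function, its callback signature, and the record type names are hypothetical and only illustrate running one goroutine per audit record type and waiting for all of them to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// labelAll runs labelFn once per audit record type, each call in its own
// goroutine, and returns the number of labeled records reported per type.
// The function and its signature are hypothetical, not part of Netcap.
func labelAll(types []string, labelFn func(recordType string) int) map[string]int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results = make(map[string]int, len(types))
	)
	for _, t := range types {
		wg.Add(1)
		go func(recordType string) {
			defer wg.Done()
			n := labelFn(recordType)
			// The result map is shared, so guard writes with a mutex.
			mu.Lock()
			results[recordType] = n
			mu.Unlock()
		}(t)
	}
	wg.Wait() // block until every record type has been processed
	return results
}

func main() {
	// Illustrative record type names only.
	types := []string{"TCP", "UDP", "Connection"}
	counts := labelAll(types, func(recordType string) int {
		return len(recordType) // placeholder for the real labeling work
	})
	for _, t := range types {
		fmt.Println(t, counts[t])
	}
}
```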

Labeling with Suricata

To label with Suricata, install Suricata first and make sure it can be found in your $PATH.

The installation guide can be found here:

The Suricata config file is expected at /usr/local/etc/suricata/suricata.yaml by default, but you can override this path with the -suricata-config flag.

// SuricataAlert is a summary structure for an alert
type SuricataAlert struct {
	Timestamp      string
	Proto          string
	SrcIP          string
	SrcPort        int
	DstIP          string
	DstPort        int
	Classification string
	Description    string
}

The label tool expects a packet capture dump, passed with the -read flag. The dump is scanned with Suricata, and the alerts are then extracted from Suricata's fast.log file using regular expressions.
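To illustrate the regular expression approach, here is a minimal sketch that parses one fast.log line into the SuricataAlert structure. The pattern and the parseFastLogLine helper are assumptions written for this example, based on the usual fast.log line layout; they are not the expressions Netcap itself uses, and the sample line is made up.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// SuricataAlert mirrors the summary structure shown above.
type SuricataAlert struct {
	Timestamp      string
	Proto          string
	SrcIP          string
	SrcPort        int
	DstIP          string
	DstPort        int
	Classification string
	Description    string
}

// fastLogRe is a hypothetical pattern for the common fast.log layout:
// TIMESTAMP [**] [gid:sid:rev] DESCRIPTION [**] [Classification: X] [Priority: N] {PROTO} SRC:PORT -> DST:PORT
var fastLogRe = regexp.MustCompile(
	`^(\S+)\s+\[\*\*\]\s+\[\d+:\d+:\d+\]\s+(.*?)\s+\[\*\*\]\s+` +
		`\[Classification:\s*(.*?)\]\s+\[Priority:\s*\d+\]\s+` +
		`\{([^}]+)\}\s+(\S+):(\d+)\s+->\s+(\S+):(\d+)$`)

// parseFastLogLine returns the alert and true if the line matches the pattern.
func parseFastLogLine(line string) (SuricataAlert, bool) {
	m := fastLogRe.FindStringSubmatch(strings.TrimSpace(line))
	if m == nil {
		return SuricataAlert{}, false
	}
	srcPort, err := strconv.Atoi(m[6])
	if err != nil {
		return SuricataAlert{}, false
	}
	dstPort, err := strconv.Atoi(m[8])
	if err != nil {
		return SuricataAlert{}, false
	}
	return SuricataAlert{
		Timestamp:      m[1],
		Description:    m[2],
		Classification: m[3],
		Proto:          m[4],
		SrcIP:          m[5],
		SrcPort:        srcPort,
		DstIP:          m[7],
		DstPort:        dstPort,
	}, true
}

func main() {
	// Made-up sample line in the fast.log layout.
	line := `10/05/2010-10:08:59.667372  [**] [1:2009187:4] ET WEB_CLIENT Example Rule [**] ` +
		`[Classification: Web Application Attack] [Priority: 3] {TCP} 192.168.1.10:80 -> 192.168.1.4:56068`
	if alert, ok := parseFastLogLine(line); ok {
		fmt.Printf("%+v\n", alert)
	}
}
```

Lines that do not match the pattern are rejected rather than partially parsed, so malformed log entries cannot produce half-filled alerts.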

Future versions could use the eve.json log for this.

The audit records generated for the provided PCAP file are expected to be present in the output directory (-out, or the current directory by default). That means you need to generate them before using the label tool.

When the tool runs, it creates labeled CSV files based on the alerts produced by Suricata, appending the attack class or description (depending on the configuration) as the last element of every line.
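The idea of appending the label as the last element of every line can be sketched as follows. This is an illustration under assumptions, not Netcap's code: the appendLabels helper, the "label" column name, the "normal" default value, and the presence of a header row are all hypothetical.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// appendLabels reads CSV data and returns it with one extra trailing column.
// The first row is assumed to be a header and receives headerName; every
// other row receives the value returned by labelFor. Hypothetical helper.
func appendLabels(input, headerName string, labelFor func(row []string) string) (string, error) {
	rows, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	for i, row := range rows {
		label := headerName
		if i > 0 {
			label = labelFor(row)
		}
		if err := w.Write(append(row, label)); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	// Made-up audit record data with illustrative column names.
	input := "Timestamp,SrcIP,DstIP\n1,10.0.0.1,10.0.0.2\n2,10.0.0.3,10.0.0.2\n"
	out, err := appendLabels(input, "label", func(row []string) string {
		if row[1] == "10.0.0.1" {
			return "Attempted Information Leak" // classification taken from an alert
		}
		return "normal" // assumed default for records without an alert
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```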

Labeling with custom attack information

Custom attack information can be loaded as a CSV file. The data is expected to have the following fields:

type AttackInfo struct {
	Num      int       `csv:"num"`
	Name     string    `csv:"name"`
	Start    time.Time `csv:"start"`
	End      time.Time `csv:"end"`
	IPs      []string  `csv:"ips"`
	Proto    string    `csv:"proto"`
	Notes    string    `csv:"notes"`
	Category string    `csv:"category"`
}

The time format for the start and end markers is:

2006/1/2 15:04:05

Audit records will be labeled as a part of an attack if all of the following conditions are met:

  • at least one of the IPs from the AttackInfo is either the source or the destination of the audit record

  • the timestamp of the audit record lies within the attack period, including its exact start and end times

This specification was derived from a specific dataset used in a research project and can easily be adapted if you have different requirements or different data.
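The two conditions can be sketched as a small matching function. This is a minimal illustration, not Netcap's mapping code: the attack type carries only the AttackInfo fields needed here, and the matches and mustTime helpers are hypothetical. The attack period is treated as inclusive of its start and end times.

```go
package main

import (
	"fmt"
	"time"
)

// attack holds the subset of AttackInfo fields needed for matching.
type attack struct {
	Name  string
	Start time.Time
	End   time.Time
	IPs   []string
}

// matches reports whether a record with the given endpoints and timestamp
// belongs to the attack: one endpoint must be a listed IP, and the
// timestamp must lie in [Start, End], boundaries included.
func (a attack) matches(srcIP, dstIP string, ts time.Time) bool {
	if ts.Before(a.Start) || ts.After(a.End) {
		return false
	}
	for _, ip := range a.IPs {
		if ip == srcIP || ip == dstIP {
			return true
		}
	}
	return false
}

// mustTime parses a marker in the documented layout and panics on bad input.
func mustTime(s string) time.Time {
	t, err := time.Parse("2006/1/2 15:04:05", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// Made-up attack description for illustration.
	a := attack{
		Name:  "example-scan",
		Start: mustTime("2017/7/4 09:30:00"),
		End:   mustTime("2017/7/4 09:45:00"),
		IPs:   []string{"10.0.0.5"},
	}
	fmt.Println(a.matches("10.0.0.5", "192.168.1.1", mustTime("2017/7/4 09:30:00"))) // on the start boundary
	fmt.Println(a.matches("192.168.1.1", "10.0.0.5", mustTime("2017/7/4 09:40:00"))) // attacker as destination
	fmt.Println(a.matches("10.0.0.5", "192.168.1.1", mustTime("2017/7/4 10:00:00"))) // after the attack period
}
```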

Usage

Scan the input PCAP and create labeled CSV files by mapping audit records in the current directory:

$ net label -read traffic.pcap

Scan the input PCAP and create output files by mapping audit records from the output directory:

$ net label -read traffic.pcap -out output_dir

Abort if there is more than one alert for the same timestamp:

$ net label -read traffic.pcap -strict

Display progress bar while processing input (experimental):

$ net label -read traffic.pcap -progress

Append classifications for duplicate labels:

$ net label -read traffic.pcap -collect
