Audit Record Labeling
Label audit records for supervised machine learning
Introduction
Labeling is the procedure of adding classification information to each audit record. For intrusion detection, this is usually a label stating whether the record is normal or malicious. This is called binary classification, since there are only two choices for the label (good / bad) [BK14]. Efficient and precise creation of labeled datasets is important for supervised machine learning techniques.
To create labeled data, Netcap parses the logs produced by Suricata and extracts information for each alert. The quality of the labels therefore depends on the quality of the ruleset used. In the next step, Netcap iterates over the audit records it generated itself and maps each alert to the corresponding packets, connections or flows. The mapping takes into account all available information on the audit records and alerts. More details on the mapping logic can be found in the implementation chapter.
While labeling is usually performed by marking all packets of a known malicious IP address, Netcap implements a more granular and thus more precise approach that maps labels to each individual record. Labeling happens asynchronously: each audit record type is processed in a separate goroutine.
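The idea of mapping alerts to individual records, with one goroutine per audit record type, can be illustrated with a minimal Go sketch. It is not Netcap's implementation: the `Alert` and `Record` types, the function names and the "normal" default label are hypothetical, and the sketch matches on the timestamp only, whereas the real mapping uses more of the available information.

```go
package main

import (
	"fmt"
	"sync"
)

// Alert is a hypothetical, simplified view of a Suricata alert.
type Alert struct {
	Timestamp      string // timestamp of the packet that triggered the alert
	Classification string // classification assigned by the ruleset
}

// Record is a hypothetical, simplified audit record.
type Record struct {
	Timestamp string
	Label     string
}

// labelRecords assigns the classification of a matching alert to each record
// and labels all remaining records as "normal" (an assumption of this sketch).
func labelRecords(records []Record, alerts map[string]Alert) {
	for i := range records {
		if a, ok := alerts[records[i].Timestamp]; ok {
			records[i].Label = a.Classification
		} else {
			records[i].Label = "normal"
		}
	}
}

// labelAll labels every record type concurrently, one goroutine per type.
func labelAll(byType map[string][]Record, alerts []Alert) {
	// Index the alerts by timestamp once; the goroutines only read this map.
	index := make(map[string]Alert, len(alerts))
	for _, a := range alerts {
		index[a.Timestamp] = a
	}

	var wg sync.WaitGroup
	for _, recs := range byType {
		wg.Add(1)
		go func(recs []Record) {
			defer wg.Done()
			labelRecords(recs, index)
		}(recs)
	}
	wg.Wait() // block until all record types are labeled
}

func main() {
	alerts := []Alert{
		{Timestamp: "1499254962.084372", Classification: "Attempted Information Leak"},
	}
	byType := map[string][]Record{
		"TCP":  {{Timestamp: "1499254962.084372"}, {Timestamp: "1499254963.000001"}},
		"HTTP": {{Timestamp: "1499254970.500000"}},
	}

	labelAll(byType, alerts)

	for _, name := range []string{"TCP", "HTTP"} {
		for _, r := range byType[name] {
			fmt.Println(name, r.Timestamp, r.Label)
		}
	}
}
```

Because each goroutine writes only to the records of its own type and the alert index is read-only after construction, the sketch needs no locking beyond the `sync.WaitGroup`.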
Netlabel Command-Line Tool
This section presents and explains common operations with the netlabel tool on the command line. The tool can be found in the label/cmd package.
Use the -h flag to display the available command-line flags:
$ net.label -h 
Usage of netlabel:
    -collect
        append classifications from alert with duplicate timestamps to the generated label
    -description
        use attack description instead of classification for labels
    -disable-layers
        do not map layer types by timestamp
    -exclude string
        specify a comma separated list of suricata classifications that shall be excluded from the generated labeled csv
    -out string
        specify output directory, will be created if it does not exist
    -progress
        use progress bars
    -r string
        read specified file, can either be a pcap or netcap audit record file
    -sep string
        set separator string for csv output (default ",")
    -strict
        fail when there is more than one alert for the same timestamp
Scan the input pcap and create labeled csv files by mapping audit records in the current directory:
$ net.label -r traffic.pcap
Scan the input pcap and create output files by mapping audit records from the output directory:
$ net.label -r traffic.pcap -out output_dir
Abort if there is more than one alert for the same timestamp:
$ net.label -r traffic.pcap -strict
Display a progress bar while processing input (experimental):
$ net.label -r traffic.pcap -progress
Append classifications for duplicate labels:
$ net.label -r traffic.pcap -collect
