Internals

Framework inner workings and implementation details

Packages

cmd

The cmd package contains all netcap commandline applications.

label

The label package contains the code for creating labeled datasets. For now, the Suricata IDS / IPS engine is used to scan the input PCAP and generate alerts; support for YARA could be added in the future. Alerts are parsed with regular expressions and transformed into the label.SuricataAlert type. Upcoming versions could instead parse Suricata's eve.json event logs. A Suricata alert contains the following information:

// SuricataAlert is a summary structure of an alert's contents
type SuricataAlert struct {
	Timestamp      string
	Proto          string
	SrcIP          string
	SrcPort        int
	DstIP          string
	DstPort        int
	Classification string
	Description    string
}
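To illustrate the regular expression step, the following is a minimal sketch that parses one alert line into the structure above. It assumes the single-line layout of Suricata's fast.log output with IPv4 addresses; the pattern alertPattern and the function parseAlert are hypothetical names and are not taken from the label package.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// SuricataAlert mirrors the structure shown above.
type SuricataAlert struct {
	Timestamp      string
	Proto          string
	SrcIP          string
	SrcPort        int
	DstIP          string
	DstPort        int
	Classification string
	Description    string
}

// alertPattern is an assumption: it targets the fast.log line layout for
// IPv4 traffic and is not the expression used by the label package.
var alertPattern = regexp.MustCompile(
	`^(\S+)\s+\[\*\*\]\s+\[[\d:]+\]\s+(.*?)\s+\[\*\*\]\s+` +
		`\[Classification:\s+(.*?)\]\s+\[Priority:\s+\d+\]\s+` +
		`\{(\w+)\}\s+([\d.]+):(\d+)\s+->\s+([\d.]+):(\d+)$`)

// parseAlert converts a single alert line into a SuricataAlert.
func parseAlert(line string) (SuricataAlert, error) {
	m := alertPattern.FindStringSubmatch(line)
	if m == nil {
		return SuricataAlert{}, fmt.Errorf("unrecognized alert line: %q", line)
	}
	srcPort, err := strconv.Atoi(m[6])
	if err != nil {
		return SuricataAlert{}, err
	}
	dstPort, err := strconv.Atoi(m[8])
	if err != nil {
		return SuricataAlert{}, err
	}
	return SuricataAlert{
		Timestamp:      m[1],
		Description:    m[2],
		Classification: m[3],
		Proto:          m[4],
		SrcIP:          m[5],
		SrcPort:        srcPort,
		DstIP:          m[7],
		DstPort:        dstPort,
	}, nil
}

func main() {
	line := `01/02/2020-10:11:12.123456  [**] [1:2100498:7] GPL ATTACK_RESPONSE id check returned root [**] [Classification: Potentially Bad Traffic] [Priority: 2] {TCP} 10.0.0.1:80 -> 10.0.0.2:51234`
	alert, err := parseAlert(line)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", alert)
}
```

Lines that do not match the pattern are reported as errors instead of being silently dropped, which makes format changes in the alert log visible early.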

In the next iteration, the gathered alerts are mapped onto the collected data. For layer types that are not handled separately, the mapping currently relies solely on the timestamp of the packet, since this is the only field required by Netcap. However, multiple alerts might exist for the same timestamp. To detect this and throw an error, the -strict flag can be used. The default is to use the first label encountered for a timestamp and ignore the rest. Another option is to collect all labels that match the timestamp and append them to the final label with the -collect flag. To filter out classifications that shall be excluded, the -excluded flag can be used; alerts matching an excluded classification are then ignored when collecting the generated alerts. The mapping logic for Flow, Connection, HTTP and TLS records also takes source and destination information into consideration.

The created output files follow the naming convention: NetcapType_labeled.csv. The label package includes a standalone command-line application in label/cmd.
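The three ways of handling duplicate timestamps can be sketched as follows. This is an illustration only, not the code of the label package: the types alert and options, the function buildLabels, the use of the classification as the label, and the " | " separator are all assumptions made for the example.

```go
package main

import "fmt"

// alert is a reduced, hypothetical view of an alert for this sketch.
type alert struct {
	Timestamp      string
	Classification string
}

// options models the behavior described for -strict, -collect and -excluded.
type options struct {
	Strict   bool            // fail on a second alert for the same timestamp
	Collect  bool            // append all labels for the same timestamp
	Excluded map[string]bool // classifications to ignore
}

// buildLabels maps timestamps to labels.
func buildLabels(alerts []alert, opts options) (map[string]string, error) {
	labels := make(map[string]string)
	for _, a := range alerts {
		if opts.Excluded[a.Classification] {
			continue // excluded classifications never become labels
		}
		existing, seen := labels[a.Timestamp]
		switch {
		case !seen:
			labels[a.Timestamp] = a.Classification
		case opts.Strict:
			return nil, fmt.Errorf("multiple alerts for timestamp %s", a.Timestamp)
		case opts.Collect:
			// The separator is an assumption of this sketch.
			labels[a.Timestamp] = existing + " | " + a.Classification
		default:
			// Default: keep the first label and ignore the rest.
		}
	}
	return labels, nil
}

func main() {
	alerts := []alert{
		{"1580000000.000001", "Potentially Bad Traffic"},
		{"1580000000.000001", "Attempted Information Leak"},
	}
	labels, err := buildLabels(alerts, options{Collect: true})
	if err != nil {
		panic(err)
	}
	fmt.Println(labels["1580000000.000001"])
}
```

Checking the exclusion list before the duplicate check means that an excluded alert never triggers the strict-mode error.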

types

The types package contains types.AuditRecord interface implementations for each supported protocol, to enable converting data to the CSV format. For this purpose, each protocol must provide a CSVRecord() []string and a CSVHeader() []string function. Additionally, a NetcapTimestamp() string function that returns the Netcap timestamp must be implemented.
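As a rough illustration of this contract, the sketch below declares an interface with the three functions named above and implements it for a made-up record type. The type exampleRecord, its fields, the timestamp format and the helper renderCSV are hypothetical and only serve the example.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strconv"
)

// AuditRecord lists the three functions named in the text above.
type AuditRecord interface {
	CSVHeader() []string
	CSVRecord() []string
	NetcapTimestamp() string
}

// exampleRecord is a hypothetical record type, not part of the types package.
type exampleRecord struct {
	Timestamp string
	SrcPort   int32
	DstPort   int32
}

func (r exampleRecord) CSVHeader() []string {
	return []string{"Timestamp", "SrcPort", "DstPort"}
}

func (r exampleRecord) CSVRecord() []string {
	return []string{
		r.Timestamp,
		strconv.Itoa(int(r.SrcPort)),
		strconv.Itoa(int(r.DstPort)),
	}
}

func (r exampleRecord) NetcapTimestamp() string { return r.Timestamp }

// renderCSV writes the header of the first record, followed by one row per record.
func renderCSV(records []AuditRecord) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	for i, r := range records {
		if i == 0 {
			if err := w.Write(r.CSVHeader()); err != nil {
				return "", err
			}
		}
		if err := w.Write(r.CSVRecord()); err != nil {
			return "", err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderCSV([]AuditRecord{
		exampleRecord{Timestamp: "1580000000.000001", SrcPort: 80, DstPort: 443},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because the CSV writer only depends on the interface, the same routine can serve any record type that implements it.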

encoder

The encoder package implements conversion of decoded network protocols to protocol buffers. This has to be defined for each supported protocol. Two types of encoders exist: the LayerEncoder and the CustomEncoder.

Layer Encoder

A LayerEncoder operates on a gopacket.Layer and has to provide the gopacket.LayerType constant, as well as a handler function that receives the layer and the timestamp and converts them into a protocol buffer.

Custom Encoder

A CustomEncoder operates on a gopacket.Packet and is used to decode traffic into abstractions such as Flows or Connections. To create one, a name has to be supplied along with three handler functions that control initialization, decoding and deinitialization. The decoding handler receives a gopacket.Packet interface type and returns a proto.Message. The postinit function is called after the initial initialization has taken place; the deinit function is used to tear down any additionally created structures for a clean exit. Both postinit and deinit are optional and can be omitted by supplying nil as value.
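The lifecycle described above can be sketched without external dependencies. In the following example, Packet and Message are stand-ins for gopacket.Packet and proto.Message, and the constructor NewCustomEncoder, its parameter order and the methods Encode and Close are assumptions of this sketch rather than the encoder package's actual API.

```go
package main

import "fmt"

// Packet and Message are stand-ins for gopacket.Packet and proto.Message,
// so that the sketch compiles without external dependencies.
type Packet interface{ Data() []byte }
type Message interface{ String() string }

// CustomEncoder is a hypothetical reduction of the concept described above.
type CustomEncoder struct {
	Name    string
	handler func(Packet) Message
	deinit  func(*CustomEncoder) error
}

// NewCustomEncoder takes a name and three handlers; postinit and deinit may be nil.
func NewCustomEncoder(
	name string,
	postinit func(*CustomEncoder) error,
	handler func(Packet) Message,
	deinit func(*CustomEncoder) error,
) (*CustomEncoder, error) {
	if handler == nil {
		return nil, fmt.Errorf("encoder %s: handler is required", name)
	}
	e := &CustomEncoder{Name: name, handler: handler, deinit: deinit}
	if postinit != nil {
		// Runs once, after the basic initialization is done.
		if err := postinit(e); err != nil {
			return nil, err
		}
	}
	return e, nil
}

// Encode passes a packet to the decoding handler.
func (e *CustomEncoder) Encode(p Packet) Message { return e.handler(p) }

// Close tears down additional structures if a deinit handler was supplied.
func (e *CustomEncoder) Close() error {
	if e.deinit == nil {
		return nil
	}
	return e.deinit(e)
}

// rawPacket and sizeRecord are toy types used only to exercise the sketch.
type rawPacket []byte

func (p rawPacket) Data() []byte { return p }

type sizeRecord struct{ Bytes int }

func (r sizeRecord) String() string { return fmt.Sprintf("size=%d", r.Bytes) }

func main() {
	enc, err := NewCustomEncoder("Size", nil, func(p Packet) Message {
		return sizeRecord{Bytes: len(p.Data())}
	}, nil)
	if err != nil {
		panic(err)
	}
	defer enc.Close()
	fmt.Println(enc.Name, enc.Encode(rawPacket{0x01, 0x02, 0x03}))
}
```

Treating nil handlers as no-ops keeps simple encoders short, since only the decoding handler has to be written.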

utils

The utils package contains shared utility functions used by several other packages.

collector

The collector package provides an interface for fetching packets from a data source, which can be either a PCAP / PCAPNG file or a named network interface. It is used to implement the command-line interface for Netcap.

io

The io package provides primitives for atomic maps and write operations.

Caveats

Protocol buffers have a few caveats that developers and researchers should be aware of.

First, protobuf has no types for 16 bit signed (int16) and unsigned (uint16) integers, and no type for unsigned 8 bit integers (uint8). These data types are seen a lot in network protocols, so the question arises how to represent them in protocol buffers. The non-fixed integer types use variable-length encoding, so int32 is used instead; the variable-length encoding takes care of not sending the bytes that are not being used. Unfortunately, the mu type is too short for this purpose.

Second, protocol buffers require all strings to be encoded as valid UTF-8, otherwise encoding to proto will fail. All input data that will be encoded as a string in protobuf must therefore be checked for valid UTF-8, or it will cause an error upon serialization and end up in the errors.pcap file. If this behavior is not desired, strings must be filtered prior to setting them on the protocol buffer instances.

Third, Netcap processes packets in parallel, so the order in which packets are written to the dump file is not guaranteed. In experiments, no mixup was detected, and records were tracked in the correct order. However, under heavy load or with a high number of workers, this might be different. Because of this caveat, the Netcap specification requires each record to preserve the timestamp, so that the packets can be sorted afterwards if required.

Data Race Detection Builds

In concurrent programming, access to shared resources needs to be synchronized in order to guarantee their state when modifying or reading them. If access is not synchronized, race conditions can occur, which lead to faulty program behavior. To detect race conditions early in the development cycle, the Go toolchain offers compiling the program with the race detector enabled. When a data race occurs at runtime, the detector prints a report with stack traces to assist the developer in debugging. Programs with active race detection run considerably slower: the Go documentation cites an increase in execution time by a factor of 2 to 20 and in memory usage by a factor of 5 to 10 for a typical program. To compile a Go program with race detection enabled, the -race flag must be added to the compilation command.
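A small, self-contained example of synchronized access is shown below. It is not taken from Netcap; the counter type and the function countConcurrently are hypothetical. Removing the mutex calls turns the increment into a data race that the race detector reports.

```go
package main

import (
	"fmt"
	"sync"
)

// counter protects its value with a mutex so that concurrent increments
// do not race.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

func (c *counter) value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently increments a shared counter from several goroutines
// and returns the final value.
func countConcurrently(workers, perWorker int) int {
	var (
		c  counter
		wg sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.inc()
			}
		}()
	}
	wg.Wait()
	return c.value()
}

func main() {
	fmt.Println(countConcurrently(8, 1000))
}
```

The -race flag is accepted by go build, go run and go test, for example: go run -race main.go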

Unit Tests

Unit tests have been implemented for parts of the core functionality. Currently there are benchmarks for reading pcap and pcapng data, as well as tests and benchmarks for common utility functions, such as progress display and time conversions. The tests and benchmarks can be executed from the repository root by running:

$ go test -v -bench=. ./...

Netcap Audit Record File Header

Each Netcap protocol buffer dump file has a Header (type definition in netcap.proto) as its first element. The header contains information about the creation date, Netcap version, input source and the data type of the audit records.

message Header {
    string Created          = 1; // Timestamp of creation date
    string InputSource      = 2; // interface name or name of dumpfile
    Type   Type             = 3; // netcap data type
    string Version          = 4; // Netcap version string
    bool   ContainsPayloads = 5; // Do the audit records contain payloads?
}

Type Enumerations

Type enumerations are maintained in the netcap.proto definitions. Due to the C++ scoping implemented by the Protocol Buffer compiler, enumeration names cannot be the same as the corresponding message type. To solve this, an NC prefix is prepended to each entry (NC stands for NetCap). Constants follow the naming scheme Type_NC_RecordName in the generated code; for example, the TCP constant is named Type_NC_TCP.

enum Type {
    NC_Header
    NC_Batch
    NC_Flow
    NC_Connection
    NC_LinkFlow
    NC_NetworkFlow
    NC_TransportFlow
    NC_Ethernet
    NC_ARP
    NC_Dot1Q
    NC_Dot11
    ...
}
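To show what the naming scheme looks like on the Go side, the following hand-written sketch imitates the shape of the generated constants for the first three entries. It is not the generated code: the numeric values assume that entries are numbered in declaration order starting at zero, and the String method is a simplified stand-in.

```go
package main

import "fmt"

// Type imitates the enum type that the Protocol Buffer compiler generates.
type Type int32

// The numbering is an assumption: declaration order, starting at zero.
const (
	Type_NC_Header Type = iota
	Type_NC_Batch
	Type_NC_Flow
)

// typeNames maps each constant back to the entry name used in netcap.proto.
var typeNames = map[Type]string{
	Type_NC_Header: "NC_Header",
	Type_NC_Batch:  "NC_Batch",
	Type_NC_Flow:   "NC_Flow",
}

// String returns the proto entry name, or a numeric fallback for unknown values.
func (t Type) String() string {
	if name, ok := typeNames[t]; ok {
		return name
	}
	return fmt.Sprintf("Type(%d)", int32(t))
}

func main() {
	fmt.Println(Type_NC_Flow)
}
```

The Type_ prefix comes from the enum name and the NC_ prefix from the entry name, which is why both appear in every Go constant.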
