Internals
Framework inner workings and Implementation details
Packages
cmd
The cmd package contains the command-line application. It receives configuration param- eters from command-line flags, creates and configures a collector instance, and then starts collecting data from the desired source.
label
The label package contains the code for creating labeled datasets. For now, the suricata IDS / IPS engine is used to scan the input PCAP and generate alerts. In the future, support could also be added for using YARA. Alerts are then parsed with regular expressions and trans- formed into the label.SuricataAlert type. This could also be replaced by parsing suricatas eve.json event logs in upcoming versions. A suricata alert contains the following information:
In the next iteration, the gathered alerts are mapped onto the collected data. For layer types which are not handled separately, this is currently by using solely the timestamp of the packet, since this is the only field required by Netcap, however multiple alerts might exist for the same timestamp. To detect this and throw an error, the -strict flag can be used. The default is to ignore duplicate alerts for the same timestamp, use the first encountered label and ignore the rest. Another option is to collect all labels that match the timestamp, and append them to the final label with the -collect flag. To allow filtering out classifications that shall be excluded, the -excluded flag can be used. Alerts matching the excluded classi- fication will then be ignored when collecting the generated alerts. Flow, Connection, HTTP and TLS records mapping logic also takes source and destination information into consider- ation. The created output files follow the naming convention: NetcapType labeled.csv. The label package includes a standalone command-line application in label/cmd.
types
The types package contains types.AuditRecord interface implementations for each supported protocol, to enable converting data to the CSV format. For this purpose, each protocol must provide a CSVRecord() []string and a CSVHeader() []string function. Additionally, a NetcapTimestamp() string function that returns the Netcap timestamp must be implemented.
encoder
The encoder package implements conversion of decoded network protocols to protocol buffers. This has to be defined for each supported protocol. Two types of encoders exist: The LayerEncoder and the CustomEncoder.
Layer Encoder
A LayerEncoder operates on a gopacket.Layer and has to provide the gopacket.LayerType constant, as well a handler function to receive the layer and the timestamp and convert it into a protocol buffer.
Custom Encoder
A CustomEncoder operates on a gopacket.Packet and is used to decode traffic into abstrac- tions such as Flows or Connections. To create it a name has to be supplied among three different handler functions to control initialization, decoding and deinitialization. Its handler function receives a gopacket.Packet interface type and returns a proto.Message. The postinit function is called after the initial initialization has taken place, the deinit function is used to teardown any additionally created structures for a clean exit. Both functions are optional and can be omitted by supplying nil as value.
utils
The utils package contains shared utility functions used by several other packages.
collector
The collector package provides an interface for fetching packets from a data source, this can either be a PCAP / PCAPNG file or directly from a named network interface. It is used to implement the command-line interface for Netcap.
io
Primitives for atomic maps and write operations
Caveats
Protocol buffers have a few caveats that developers and researchers should be aware of. First, there are no types for 16 bit signed (int16) and unsigned (uint16) integers in protobuf, also there is no type for unsigned 8 bit integers (uint8). This data type is seen a lot in network protocols, so the question arises how to represent it in protocol buffers. The non-fixed integer types use variable length encoding, so int32 is used instead. The variable-length encoding will take care of not sending the bytes that are not being used. Unfortunately, the mu type is too short for this purpose. Second, protocol buffers require all strings to be encoded as valid UTF-8, otherwise encoding to proto will fail. This means all input data that will be encoded as a string in protobuf must be checked to contain valid UTF-8, or they will create an error upon serialization and end up in the errors.pcap file. If this behavior is not desired strings must be filtered prior to setting them on the protocol buffer instances. Another thing that has to be kept in mind is that Netcap processes packets in parallel, thus the order in which packets are written to the dump file is not guaranteed. In experiments, no mixup was detected, and records were tracked in the correct order. However, under heavy load conditions or with a high number of workers, this might be different. Because of this caveat, the Netcap specification requires each record to preserve the timestamp, in order to allow sorting the packets afterwards, if required.
Data Race Detection Builds
In concurrent programming, shared resources need to be synchronized, in order to guarantee their state when modifying or reading them. If access is not synchronized, race conditions occur, which will lead to faulty program behavior. To avoid this and detect race conditions early in the development cycle, the go toolchain offers compiling the program with the race detector enabled. This will let the application crash with stack traces to assist the developer in debugging, if a data race occurs. Programs with active race detection are slower by the factor of 10 to 100. To compile a Go program with the race detection enabled the -race flag must be added to the compilation command.
Unit Tests
Unit tests have been implemented for parts of the core functionality. Currently there are benchmarks for reading pcap and pcapng data, as well as tests and benchmarks for common utility functions, such progress displaying and time conversions. The tests and benchmarks can be executed from the repository root by running:
Last updated