File Extraction
Extract transferred files and save them to disk

Introduction

Various protocols can carry files (e.g., HTTP, POP3), and some exist for the sole purpose of transferring files (e.g., FTP, SMB).
From a network security monitoring perspective, transferred files are interesting because they can contain malicious software or prohibited content.
Netcap extracts files from HTTP and saves them to disk, for both HTTP responses and HTTP requests.
It uses the File audit record type to model the extracted information.
Future versions will add file extraction support for other protocols as well.

File Audit Records

The audit record definition for a file looks like this:
    message File {
        string Timestamp = 1;
        string Name = 2;
        int64 Length = 3;
        string Hash = 4;
        string Location = 5;
        string Ident = 6;
        string Source = 7;
        string ContentType = 8;
        PacketContext Context = 9;
        string Host = 10;
        string ContentTypeDetected = 11;
    }
As can be seen, the record includes both the content type indicated by the HTTP header and the content type that was detected from the data. In addition, it specifies the source of the File (e.g., HTTP, mail attachment), as well as the identifier of the connection it originated from.
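Having both the declared and the detected content type makes it possible to flag files whose payload does not match what the header claims. The following is a minimal sketch of such a check, not part of Netcap: the `file` struct is a hypothetical stand-in that mirrors only the two relevant fields, and `baseType` and `mismatch` are illustrative helper names.

```go
package main

import (
	"fmt"
	"mime"
)

// file is a hypothetical stand-in for the File audit record.
// It mirrors only the two content type fields, not the generated type.
type file struct {
	ContentType         string // content type indicated by the HTTP header
	ContentTypeDetected string // content type detected from the payload
}

// baseType strips parameters such as "; charset=utf-8" and lowercases
// the media type. Empty or unparseable input yields the empty string.
func baseType(v string) string {
	t, _, err := mime.ParseMediaType(v)
	if err != nil {
		return ""
	}
	return t
}

// mismatch reports whether both types are known and differ.
func mismatch(f file) bool {
	declared := baseType(f.ContentType)
	detected := baseType(f.ContentTypeDetected)
	return declared != "" && detected != "" && declared != detected
}

func main() {
	f := file{
		ContentType:         "image/png",
		ContentTypeDetected: "application/x-gzip",
	}
	fmt.Println("suspicious:", mismatch(f))
}
```

Records with an empty header value are treated as "unknown" rather than as a mismatch, since the header may simply be absent.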
The Hash field currently holds an MD5 hash of the file; this will likely be replaced with a stronger hash function in the future.
The Location field points to the path on disk where the file is stored.
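Together, Hash and Location allow checking that a stored file still matches its audit record. The sketch below assumes the hash is the hex-encoded MD5 digest of the file contents, as the sample output on this page suggests. The names `md5Hex` and `verifyFile` are hypothetical helpers, not part of Netcap.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"os"
	"strings"
)

// md5Hex returns the lowercase hex MD5 digest of data.
func md5Hex(data []byte) string {
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:])
}

// verifyFile reads the file at location into memory and compares its
// MD5 digest with wantHash, ignoring case.
func verifyFile(location, wantHash string) (bool, error) {
	data, err := os.ReadFile(location)
	if err != nil {
		return false, err
	}
	return strings.EqualFold(md5Hex(data), wantHash), nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: verify <location> <hash>")
		os.Exit(2)
	}
	ok, err := verifyFile(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("hash matches:", ok)
}
```

Reading the whole file into memory keeps the sketch short; for large extracted files, streaming the content through `md5.New()` with `io.Copy` would be the better choice.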

Usage

To enable file capture, set the -fileStorage flag and supply the path where the files should be stored (the directory is created if it does not exist):

    $ net capture -read traffic.pcap -fileStorage files

After capturing, let's inspect the directory contents:

    $ tree files
    files
    ├── application
    │   └── x-gzip
    │       └── unknown-193.24.227.12->216.66.80.30-80->60075.gz
    ├── image
    │   └── x-icon
    │       └── favicon.ico-193.24.227.12->216.66.80.30-80->60076.ico
    └── text
        └── html
            ├── unknown-193.24.227.12->216.66.80.30-80->55031.html
            ├── unknown-193.24.227.12->216.66.80.30-80->55032.html
            ├── unknown-193.24.227.12->216.66.80.30-80->55033.html
            └── unknown-80.237.133.136->192.168.110.10-80->1152.html

    6 directories, 6 files

As you can see, files are sorted into directories by MIME type, which is determined by classifying their content with the Go standard library, and each file is named after the TCP connection it originated from.
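The directory names in the listing match the types returned by `http.DetectContentType` from Go's `net/http` package, so the sketch below assumes that function is the classifier. The helper `storageDir` is hypothetical and only illustrates how a detected type could map to a `<root>/<type>/<subtype>` directory; it is not Netcap's actual implementation.

```go
package main

import (
	"fmt"
	"net/http"
	"path"
	"strings"
)

// storageDir returns the directory for a payload, derived from its
// detected MIME type. Hypothetical helper for illustration only.
func storageDir(root string, payload []byte) string {
	// DetectContentType inspects at most the first 512 bytes and
	// falls back to "application/octet-stream" for unknown data.
	ctype := http.DetectContentType(payload)

	// Drop parameters such as "; charset=utf-8".
	if i := strings.Index(ctype, ";"); i >= 0 {
		ctype = ctype[:i]
	}
	return path.Join(root, strings.TrimSpace(ctype))
}

func main() {
	gzipMagic := []byte("\x1f\x8b\x08\x00")
	html := []byte("<html><body>hello</body></html>")

	fmt.Println(storageDir("files", gzipMagic))
	fmt.Println(storageDir("files", html))
}
```

Because detection relies on the leading bytes only, a truncated or unusual payload may be classified differently than its header suggests.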
By default, only complete requests and responses are captured. To also extract incomplete data, use the -writeincomplete flag:

    $ net capture -read traffic.pcap -fileStorage files -writeincomplete

Dumping a File audit record on the command line looks like this:

    $ net dump -read File.ncap.gz -struc
    NC_File
    Timestamp: "2015-03-08 14:05:29.664213 +0000 UTC"
    Name: "ads.bmp"
    Length: 126
    Hash: "2d5a035011854b04a456b244b15a583b"
    Location: "files/image/bmp/ads.bmp-80.239.178.178->192.168.0.51-80->41214.bmp"
    Ident: "80.239.178.178->192.168.0.51-80->41214"
    Source: "HTTP RESPONSE from /ads.bmp"
    Context: <
      SrcIP: "192.168.0.51"
      DstIP: "80.239.178.178"
      SrcPort: "41214"
      DstPort: "80"
    >
    ContentTypeDetected: "image/bmp"
    ...

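The Ident value in the sample output appears to follow the layout `<ip>-><ip>-<port>-><port>`. Assuming that layout holds in general, the following sketch splits an identifier into its endpoints, which can be useful for grouping extracted files by host. The `flow` type and `parseIdent` function are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// flow holds the endpoints encoded in a connection identifier,
// in the order they appear in the string.
type flow struct {
	SrcIP, DstIP, SrcPort, DstPort string
}

// parseIdent splits an identifier such as
// "80.239.178.178->192.168.0.51-80->41214" into its parts.
// The layout is an assumption based on the sample output.
func parseIdent(ident string) (flow, error) {
	parts := strings.Split(ident, "->")
	if len(parts) != 3 {
		return flow{}, fmt.Errorf("unexpected ident format: %q", ident)
	}
	// The middle part holds "<second ip>-<first port>".
	i := strings.LastIndex(parts[1], "-")
	if i < 0 {
		return flow{}, fmt.Errorf("missing port separator in %q", ident)
	}
	return flow{
		SrcIP:   parts[0],
		DstIP:   parts[1][:i],
		SrcPort: parts[1][i+1:],
		DstPort: parts[2],
	}, nil
}

func main() {
	f, err := parseIdent("80.239.178.178->192.168.0.51-80->41214")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", f)
}
```

Note that in the sample record the identifier lists the server address first, while the Context block lists the client as SrcIP, so the two should not be assumed to share a direction.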
For exploring the files of each host, I recommend using the Maltego Integration.