This blog post was originally published on 17 July 2015, by Sebastian Garcia, at https://stratosphereips.org/new-dataset-ctu-13-extended-now-includes-pcap-files-of-normal-traffic.html.
After considering several requests, we decided to extend the previous CTU-13 dataset to include truncated versions of the original pcap files. The pcap files now include all the traffic: Normal, Botnet and Background. The pcap files were truncated to protect the privacy of the users, but in such a way that it is still possible to read the complete TCP, UDP and ICMP headers.
How the dataset was truncated
Each original pcap file was truncated following this methodology:
$ tcpdump -n -s0 -r originalcapturefile.pcap -w originalcapturefile.tcp.pcap tcp
$ editcap -s 54 originalcapturefile.tcp.pcap originalcapturefile.tcp.truncated.pcap
$ tcpdump -n -s0 -r originalcapturefile.pcap -w originalcapturefile.udp.pcap udp
$ editcap -s 42 originalcapturefile.udp.pcap originalcapturefile.udp.truncated.pcap
$ tcpdump -n -s0 -r originalcapturefile.pcap -w originalcapturefile.icmp.pcap icmp
$ editcap -s 66 originalcapturefile.icmp.pcap originalcapturefile.icmp.truncated.pcap
$ mergecap -w originalcapturefile.truncated.pcap originalcapturefile.tcp.truncated.pcap originalcapturefile.udp.truncated.pcap originalcapturefile.icmp.truncated.pcap
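The seven commands above repeat one filter-then-truncate pattern per protocol, so they can be folded into a loop. The following is only a sketch of that idea, not the script the authors used: the helper names `run` and `truncate_capture` and the `DRY_RUN` variable are hypothetical, while the `tcpdump`, `editcap` and `mergecap` invocations are copied from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the methodology: split by protocol, truncate
# each part to its snapshot length, then merge the truncated parts.
truncate_capture() {
    base="${1%.pcap}"   # originalcapturefile.pcap -> originalcapturefile
    set --              # reuse the positional parameters to collect the parts
    for spec in tcp:54 udp:42 icmp:66; do
        proto="${spec%%:*}"   # protocol name, also used as the tcpdump filter
        snap="${spec##*:}"    # snapshot length in bytes for editcap -s
        run tcpdump -n -s0 -r "$base.pcap" -w "$base.$proto.pcap" "$proto" || return 1
        run editcap -s "$snap" "$base.$proto.pcap" "$base.$proto.truncated.pcap" || return 1
        set -- "$@" "$base.$proto.truncated.pcap"
    done
    run mergecap -w "$base.truncated.pcap" "$@"
}
```

Calling `DRY_RUN=1 truncate_capture originalcapturefile.pcap` only prints the seven commands, which is a cheap way to check the file names before processing a large capture.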
The values of 54 bytes for TCP, 42 for UDP and 66 for ICMP ensured that the complete headers were present while no information about the payload was included. (Technically speaking, some bytes of the payload may be included, but they are insignificant.)
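As a rough check of where these numbers come from, the snapshot lengths can be rebuilt from the minimum header sizes. This sketch assumes Ethernet II framing without a VLAN tag and IPv4 and TCP headers without options; those assumptions are ours and are not stated in the original post. Packets that carry IP or TCP options have longer headers, so their last header bytes would fall beyond the cut. The post does not explain the choice of 66 for ICMP, so the sketch only reports how many bytes it leaves after the minimal 8-byte ICMP header.

```shell
#!/bin/sh
# Minimum header sizes in bytes (assumptions: Ethernet II, no VLAN tag, no options).
ETH_HDR=14    # destination MAC, source MAC, EtherType
IPV4_HDR=20   # IPv4 header without options
TCP_HDR=20    # TCP header without options
UDP_HDR=8     # source port, destination port, length, checksum
ICMP_HDR=8    # type, code, checksum, 4 bytes of type-specific header

tcp_snap=$((ETH_HDR + IPV4_HDR + TCP_HDR))
udp_snap=$((ETH_HDR + IPV4_HDR + UDP_HDR))
icmp_min=$((ETH_HDR + IPV4_HDR + ICMP_HDR))
icmp_extra=$((66 - icmp_min))   # bytes kept beyond the minimal ICMP header

echo "TCP snapshot length:  $tcp_snap"
echo "UDP snapshot length:  $udp_snap"
echo "ICMP minimal headers: $icmp_min (66 keeps $icmp_extra more bytes)"
```

The TCP and UDP sums match the 54 and 42 bytes passed to `editcap -s`.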
Content of the CTU-13-Extended dataset
The final content of this dataset is all the previous files in the CTU-13 dataset plus the truncated pcap files of the complete traffic. Remember that the CTU-13 dataset, and now the CTU-13-Extended dataset, are composed of 13 different experiments or scenarios. Each of these scenarios already has its own folder with all the files, and in that folder we included the new truncated pcap file of all the traffic. Therefore, you can download the complete dataset as a single compressed file, or you can download just the new truncated pcap file from each scenario folder.
Download the Single File Compressed Version of the CTU-13-Extended Dataset
Like the CTU-13 dataset, the new CTU-13-Extended dataset is also available as a single compressed file for your convenience. The file is here:
Scenario CTU-Malware-Capture-Botnet-42
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-43
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-44
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-45
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-46
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-47
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-48
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-49
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-50
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-51
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-52
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-53
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
Scenario CTU-Malware-Capture-Botnet-54
This folder includes all the previous files, plus the new truncated pcap file:
- Binary executable file of the malware used.
- README file.
- Complete pcap file of the Botnet traffic. Not truncated.
- Text flow file. Bidirectional.
- Argus binary flow file. Labeled. Bidirectional.
- Argus and ra tools configuration files.
- Truncated pcap file of the Normal, Botnet and Background traffic.
What is not included in these new files?
The only traffic that is present in the original pcap files but not included in these new files consists of some ARP packets and some IPX packets. Since there were only a few of them, we decided to exclude them.
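These packets are missing because they match none of the `tcp`, `udp` and `icmp` filters used when splitting the original capture. For anyone holding an untruncated capture, one way to see how much traffic falls outside those filters is to negate them. This is a sketch under that assumption: the function names `leftover_filter` and `count_leftover` are hypothetical, and only standard `tcpdump` options (`-n`, `-r`) and filter syntax are used.

```shell
#!/bin/sh
# Hypothetical helper: build a filter that matches every packet NOT selected
# by the given protocol filters, e.g. "not (tcp or udp or icmp)".
leftover_filter() {
    filter=""
    for proto in "$@"; do
        if [ -z "$filter" ]; then
            filter="$proto"
        else
            filter="$filter or $proto"
        fi
    done
    echo "not ($filter)"
}

# Hypothetical helper: count the packets in a capture that the three
# per-protocol filters would skip (requires tcpdump to be installed).
count_leftover() {
    tcpdump -n -r "$1" "$(leftover_filter tcp udp icmp)" 2>/dev/null | wc -l
}
```

For example, `count_leftover originalcapturefile.pcap` prints the number of packets that would not appear in the merged truncated file.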