Datasets Overview
The Stratosphere IPS relies on models created from real malware traffic captures. By studying how malware actually behaves, we ensure that the models we create are accurate and that our performance measurements are realistic. Our sister project, the Malware Capture Facility Project, is in charge of continuously monitoring the threat landscape for new and emerging threats, retrieving malicious samples, and running them in our facilities to capture their traffic.
Malware captures
The Stratosphere IPS Project has a sister project, the Malware Capture Facility Project, which is responsible for making long-term malware captures. Through it, we continually obtain malware and normal traffic data to feed the Stratosphere IPS.
Normal captures
To correctly verify the machine learning algorithms, it is paramount to have high-quality datasets. Capturing normal traffic is key to accurately calculating the true values of False Positives and True Negatives.
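As a minimal sketch of why normal captures matter: in a capture that contains only verified normal traffic, every flow is benign by construction, so each alert a detector raises is a False Positive and each flow it leaves alone is a True Negative. The false positive rate is then FP / (FP + TN). The Python below assumes hypothetical per-flow prediction strings "malicious" and "normal"; it is an illustration, not part of the Stratosphere IPS code.

```python
def false_positive_rate(predictions: list[str]) -> float:
    """Compute the false positive rate on a capture of purely normal traffic.

    Assumption: every flow in the capture is benign, and the detector's
    output per flow is the hypothetical string "malicious" or "normal".
    """
    fp = tn = 0
    for p in predictions:
        if p == "malicious":
            fp += 1  # benign flow wrongly flagged: False Positive
        elif p == "normal":
            tn += 1  # benign flow correctly left alone: True Negative
        else:
            raise ValueError(f"unexpected prediction: {p!r}")
    if fp + tn == 0:
        raise ValueError("no predictions to evaluate")
    return fp / (fp + tn)
```

For example, `false_positive_rate(["normal", "normal", "malicious", "normal"])` returns `0.25`: one false positive out of four benign flows.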
Mixed captures
The mixed captures provide a real scenario in which a machine is first not infected, then becomes infected, and after some time has the infection cleaned up. This type of scenario facilitates the testing of the Stratosphere IPS machine learning algorithms and models.
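One way to picture why this helps: if the times at which the machine was infected and cleaned up are known, each flow can be placed in one of three phases, and a detector would be expected to raise alerts only during the infected phase. The sketch below assumes those two timestamps are available and uses the hypothetical function name `infection_phase`; it is a coarse, time-based simplification and not necessarily how the Stratosphere datasets are labeled.

```python
from datetime import datetime


def infection_phase(flow_time: datetime, infected_at: datetime, cleaned_at: datetime) -> str:
    """Place a flow from a mixed capture into one of three phases.

    Assumption (hypothetical): the infection start and cleanup times are known.
    """
    if infected_at > cleaned_at:
        raise ValueError("infection must start before it is cleaned up")
    if flow_time < infected_at:
        return "pre-infection"  # machine not yet infected
    if flow_time < cleaned_at:
        return "infected"       # detector alerts are expected here
    return "post-cleanup"       # infection has been removed
```

Comparing a detector's alerts against these phases shows both whether it catches the infection and whether it keeps alerting after the cleanup.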
To cite these datasets, please use the following:
Stratosphere. (2015). Stratosphere Laboratory Datasets. Retrieved March 13, 2020, from https://www.stratosphereips.org/datasets-overview
Bibtex format:
@misc{stratodatasets,
  title={Stratosphere Laboratory Datasets},
  author={Stratosphere},
  year={2015},
  note={Retrieved March 13, 2020, from \url{https://www.stratosphereips.org/datasets-overview}}
}
Malware on IoT captures
As part of the Aposemat project, we execute, capture and analyze malware on IoT devices. This is the list of complete captures available for download.
SPECIAL DATASET CTU-13
The CTU-13 dataset consists of 13 different malware captures made in a real network environment. The captures include Botnet, Normal and Background traffic. The Botnet traffic comes from the infected hosts, the Normal traffic comes from verified normal hosts, and the Background traffic is all the remaining traffic whose nature we cannot determine with certainty. The dataset is labeled on a flow-by-flow basis, making it one of the largest and most thoroughly labeled botnet datasets available. The files that can be downloaded are:
Binetflow files
For Botnet, Normal and Background traffic.
Text files with bidirectional flows generated by Argus.
Biargus files
For Botnet, Normal and Background traffic.
Binary files with bidirectional flows generated by Argus.
Complete Pcap files
For Botnet traffic.
Pcap files with all the payload data.
Truncated Pcap files
For Botnet, Normal and Background traffic.
Pcap files containing only header information.
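The Binetflow files are plain text, so they can be inspected with standard tools. The following Python sketch counts how many flows fall into each traffic category. It assumes a comma-separated file with a header row containing a `Label` column whose values include one of the words Botnet, Normal or Background (for example, a label beginning with `flow=From-Botnet-`); check the header of the file you download before relying on it. The function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def label_category(label: str) -> str:
    """Map a raw flow label to Botnet, Normal or Background.

    Assumption: the label text contains one of these three words.
    """
    for category in ("Botnet", "Normal", "Background"):
        if category in label:
            return category
    return "Unknown"  # label did not match the assumed format


def count_labels(lines: Iterable[str]) -> Counter:
    """Count flows per category in a Binetflow text file.

    Assumption: comma-separated rows with a header that includes "Label".
    """
    reader = csv.DictReader(lines)
    return Counter(label_category(row["Label"]) for row in reader)
```

Usage, with a hypothetical file name: `with open("capture.binetflow", newline="") as f: print(count_labels(f))`. Matching on substrings keeps the sketch independent of the exact label suffixes, at the cost of relying on the three category words appearing in each label.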
Download the CTU-13 Dataset
The CTU-13 dataset is published under the Creative Commons CC-BY license and can be downloaded from the following link:
CTU-13-Dataset: large dataset of 13 captures with Malware, Normal and Background traffic.
Backup site for the CTU-13 dataset: if our main file repository is unavailable, you can still find the CTU-13 files HERE.