This blog post was originally published on 12 March 2015, by Sebastian Garcia, at https://stratosphereips.org/nomad.html.
Description of the project
The goal of the NoMaD project is to collect, label, organize, and make available a large, verified, and labeled dataset of normal and malicious HTTPS connections. This dataset is designed to support the research team at Cisco Prague as well as the research activities and publications of the CTU University. The project gives Cisco Systems an evolving dataset to generate better and faster analyses, and gives the CTU University the opportunity to research HTTPS behaviors in the network as part of its Stratosphere Project.
During 2016 and 2017, the Computer Science department of the Faculty of Electrical Engineering of the CTU University completed the research project called Nomad in collaboration with the CTA (Cognitive Threat Analytics) group of Cisco Systems. The motivation for the project was the sudden surge of malware using HTTPS during 2015 and 2016, which was very difficult to identify in the network given the encryption protocols. The problem was particularly important because the Cisco CTA group uses logs from web proxies to find new threats affecting its clients. The Nomad project was designed to take advantage of the security knowledge of the CTU University to research and analyze this type of malware.
Nomad Dataset
The most important challenge in analyzing malware that uses HTTPS is the lack of a good public dataset. As part of our work, we spent almost one year collecting real, long-term malware traffic. The resulting dataset is part of our Nomad Project and consists of more than 80 network captures of malware traffic. One of the goals of the dataset is to study the behavior of malware and how it changes over time. To obtain this type of data we executed the malware for long periods, up to three weeks or even months. The dataset contains captures of different types of malware (such as Trojans, adware, botnets, etc.). For each capture, we generated several files to improve future analyses. The process of creating the dataset can be described in four phases: (1) design and creation of the laboratory, (2) design of the capture methodology, (3) generation of the experiments, and (4) output of the information.
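As an illustration of how such a collection of captures might be worked with, the minimal sketch below indexes a local copy of the dataset and counts captures per label. The directory layout, the per-capture "label" file, and the dataset path are assumptions made only for this example; they are not part of the published dataset description.

from collections import Counter
from pathlib import Path

DATASET_ROOT = Path("nomad-dataset")  # hypothetical path to a local copy of the dataset

def index_captures(root: Path) -> Counter:
    """Count captures per label, assuming (hypothetically) one directory per
    capture that contains the pcap file and a plain-text 'label' file."""
    counts = Counter()
    if not root.is_dir():
        return counts
    for capture_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        label_file = capture_dir / "label"  # assumed file name
        label = label_file.read_text().strip() if label_file.exists() else "unlabeled"
        counts[label] += 1
    return counts

if __name__ == "__main__":
    for label, count in index_captures(DATASET_ROOT).most_common():
        print(f"{label}: {count} captures")

A small index like this makes it easy to verify, for example, that a local copy indeed holds more than 80 captures and to see how the different malware types are distributed.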