New dataset, CTU-13-Extended, now includes pcap files of normal traffic

After considering several request we decided to extend the previous CTU-13 dataset to include truncated versions of the original pcap files. The pcap files include now all the traffic: Normal, Botnet and Background. The pcap files where however truncated to protect the privacy of the users, but in such a way that it is still possible to read the complete TCP, UDP and ICMP headers.

Differences on The Behavioral Patterns of Malware and Normal DNS Connections

This blog post is a comparison and analysis of the differences in the behavioral patterns found in the DNS traffic of malware and normal connections. We captured malware and normal traffic in the MCFP project and we extracted the DNS behavior with the stf tool. The captures correspond to DNStraffic of a SPAM malware, DGA-based malware and a normal computer. The idea is to analyze the differences in the behaviors as they are shown by the stf program. For an explanation of how the stfprogram is generating this data see this explanation.

The Importance of Good Labels in Security Datasets

Working as security researchers is common to create a new machine learning algorithm that we want to evaluate. It may be that we are trying to detect malware, identify attacks or analyze IDS logs, but at some point we figure it out that we need a good dataset to complete our task. But not any dataset; in fact we need a labeled dataset. The dataset will be used not only to learn the features of, for example, malware traffic, but also to verify how good our algorithm is. Since getting a dataset is difficult and time consuming, the most common solution is to get a third-party dataset; although some researchers with time and resources may create their own. Either way, most usually we obtain a dataset of malware traffic (continuing with the malware traffic detection example) and we assign the label Malware to all of its instances. This looks good, so we make our training and testing, we obtain results and we publish. However, there are important problems in this approach that can jeopardize the results of our algorithm and the verification process. Let’s analyze each problem in turn.

MALWARE USES MULTIPLE WEB SERVERS TO HAVE A PERIODIC HTTP C&C CONNECTION WHILE ITS NETFLOWS ARE NOT PERIODIC

 MALWARE USES MULTIPLE WEB SERVERS TO HAVE A PERIODIC HTTP C&C CONNECTION WHILE ITS NETFLOWS ARE NOT PERIODIC

While analyzing our capture CTU-Malware-Capture-Botnet-89-1 we found out that there were some strange issues with the periodicity of the C&C channels. In this capture there were a lot of HTTP connections, but few of them were periodic. During the analysis of the network capture we usually start looking at the NetFlows and then we move to the payload data. What we found is that several periodic HTTP connections had non-periodic NetFlows. This was strange for us so we took a deeper look. 

ANALISIS OF CTU-MALWARE-CAPTURE-1 (ZBOT.OOWO)

ANALISIS OF CTU-MALWARE-CAPTURE-1 (ZBOT.OOWO)

CTU-MALWARE-CAPTURE-1 (ZBOT.OOWO)

This capture was done between Thu Sep  5 15:40:07 CEST 2013 and Tue Oct  1 13:38:29 CEST 2013, having a total of 25 days and 21 hours. It corresponds to a binary with the MD5 46b3df3eaf1312f80788abd43343a9d2 of and that was classified by Kaspersky in VirusTotal as Trojan-Spy.Win32.Zbot.oowo. However we are not sure of the name.