Garcia

Ensembling to improve infected hosts detection

In this paper we describe the main ensemble learning techniques and their application to cybersecurity threat detection. The state of the art in ensemble learning is presented as an alternative to current intrusion detection mechanisms, with an analysis of its advantages and disadvantages. We propose incorporating ensemble learning into SLIPS [3], a behavior-based intrusion detection and prevention system that uses machine learning algorithms to detect malicious behaviors, in order to obtain better results by taking advantage of the existing SLIPS classifiers and modules. As part of this work we extend ensembling to algorithms from domains other than machine learning, such as Threat Intelligence. As a first stage of this project, we tested the performance of ensemble learning algorithms at detecting malware from flows and evaluated their accuracy. The results of these tests are presented here, together with the conclusions and future work.
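The idea of combining machine learning classifiers with a non-learning source such as Threat Intelligence can be illustrated with a simple majority vote. This is only a minimal sketch: the three detectors, their thresholds, the flow fields and the blocklist entry are hypothetical and are not taken from SLIPS or from the paper.

```python
from collections import Counter
from typing import Callable, Dict, List

Flow = Dict[str, object]
Detector = Callable[[Flow], str]

# Hypothetical threat intelligence feed (address from a documentation range).
BLOCKLIST = {"203.0.113.7"}


def long_connection(flow: Flow) -> str:
    # Hypothetical behavioral rule: very long flows are suspicious.
    return "malicious" if flow["duration"] > 300 else "normal"


def small_payload(flow: Flow) -> str:
    # Hypothetical behavioral rule: tiny payloads are suspicious.
    return "malicious" if flow["bytes"] < 200 else "normal"


def threat_intel_lookup(flow: Flow) -> str:
    # Non-learning detector: a plain blocklist lookup.
    return "malicious" if flow["dst_ip"] in BLOCKLIST else "normal"


def majority_vote(detectors: List[Detector], flow: Flow) -> str:
    # Each detector casts one vote; the most common label wins.
    votes = Counter(detector(flow) for detector in detectors)
    label, _ = votes.most_common(1)[0]
    return label


DETECTORS = [long_connection, small_payload, threat_intel_lookup]
```

Using an odd number of detectors with two labels avoids ties. A weighted vote or a stacked meta-classifier would be natural next steps, but they are outside this sketch.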

Detecting DNS Threats: A Deep Learning Model to Rule Them All

The Domain Name System (DNS) is a central part of the regular operation of the Internet. This importance has made it a common target of malicious behaviors, such as the use of Domain Generation Algorithms (DGA) for the command and control of a group of infected computers, or Tunneling techniques for bypassing system administrator restrictions. A common detection approach is to train separate models for DGA and Tunneling, each capable of lexicographic discrimination of domain names. However, since both DGA and Tunneling produce domain names with observable lexicographical differences from normal domains, it is reasonable to apply a single detection approach to both threats. In the present work, we propose a multi-class convolutional network (MC-CNN) capable of detecting both DNS threats. The resulting MC-CNN correctly detects 99% of normal domains, 97% of DGA and 92% of Tunneling, with False Positive Rates of 2.8%, 0.7% and 0.0015% respectively, and has the advantage of 44% fewer trainable parameters than similar models applied to DNS threat detection.
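Per-class detection rates and false positive rates such as those quoted above are usually derived from a multi-class confusion matrix in a one-vs-rest fashion. The sketch below shows that arithmetic under the assumption that rows are true classes and columns are predicted classes. The counts in `EXAMPLE_CONFUSION` are made up for illustration and are not the paper's data.

```python
from typing import Dict, List


def per_class_rates(
    confusion: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, float]]:
    """One-vs-rest detection rate (TPR) and false positive rate per class.

    confusion[i][j] is the number of samples of true class i predicted as j.
    """
    total = sum(sum(row) for row in confusion)
    rates = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp               # class i predicted as another class
        fp = sum(row[i] for row in confusion) - tp  # other classes predicted as i
        tn = total - tp - fn - fp
        rates[label] = {"tpr": tp / (tp + fn), "fpr": fp / (fp + tn)}
    return rates


# Made-up counts, for illustration only.
EXAMPLE_LABELS = ["normal", "dga", "tunneling"]
EXAMPLE_CONFUSION = [
    [90, 8, 2],
    [5, 95, 0],
    [10, 0, 90],
]
```

With these illustrative counts, the "dga" class has a detection rate of 95/100 and a false positive rate of 8/200, since 8 of the 200 non-DGA samples were labeled as DGA.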

Geost Botnet: Operational security failures lead to a new Android banking threat

This paper describes the rare discovery of a new Android banking botnet, named Geost, from the operational security failures of its botmaster. They made many mistakes, including using the illegal proxy network of the HtBot malware, not encrypting their Command and Control servers, re-using security services, trusting other attackers with less operational security, and not encrypting chat sessions.

Machete: Dissecting the Operations of a Cyber Espionage Group in Latin America

Reports on cyber espionage operations have been on the rise in the last decade. However, operations in Latin America are heavily under-researched and potentially underestimated. In this paper we analyze and dissect a cyber espionage tool known as Machete. Our research shows that Machete is operated by a highly coordinated and organized group that focuses on Latin American targets. We describe the five phases of the APT operations, from delivery to exfiltration of information, and we show why Machete is considered a cyber espionage tool. Furthermore, our analysis indicates that the targeted victims belong to military, political, or diplomatic sectors. The review of almost six years of Machete operations shows that it is likely operated by a single group, and that its activities are possibly state-sponsored. Machete is still active and operational to this day.

Deep Convolutional Neural Networks for DGA Detection

A Domain Generation Algorithm (DGA) is an algorithm that generates domain names in a deterministic but seemingly random way. Malware uses DGAs to generate the next domain for reaching its Command & Control (C&C) communication server. Given the simplicity of the generation process and the speed at which the domains are generated, a fast and accurate detection method is required. Convolutional neural networks (CNNs) are well known for performing real-time detection in fields like image and video recognition, and therefore seem suitable for DGA detection. The present work provides an analysis and comparison of the detection performance of a CNN for DGA detection. A CNN with minimal architectural complexity was evaluated on a dataset with 51 DGA malware families and normal domains. Despite its simple architecture, the resulting CNN model correctly detected more than 97% of all DGA domains, with a false positive rate close to 0.7%.
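The phrase "deterministic but seemingly random" can be made concrete with a toy generator. The sketch below does not reproduce any of the 51 malware families in the dataset. The seed format, the use of SHA-256 and the reserved `.example` suffix are assumptions chosen purely for illustration.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(
    seed: str, day: date, count: int = 5, length: int = 12, tld: str = ".example"
) -> List[str]:
    """Generate `count` pseudo-random looking names from a seed and a date."""
    domains = []
    for index in range(count):
        material = f"{seed}|{day.isoformat()}|{index}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map each hexadecimal digit (0-15) to a letter in the range a-p.
        name = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:length])
        domains.append(name + tld)
    return domains
```

Anyone who knows the seed and the date obtains the same list, which is why both the malware and its operator can agree on the next domain without communicating. To an observer the names show no obvious structure, which is what makes lexical detection a learning problem.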

An Analysis of Convolutional Neural Networks for detecting DGA

A Domain Generation Algorithm (DGA) is an algorithm that generates domain names in a deterministic but seemingly random way. Malware uses DGAs to generate the next domain for reaching its Command & Control (C&C) communication channel. Given the simplicity and speed of the domain generation process, machine learning methods have emerged as a suitable detection solution. However, since periodic retraining is mandatory, a fast and accurate detection method is needed. Convolutional neural networks (CNNs) are well known for performing real-time detection in fields like image and video recognition, and therefore seem suitable for DGA detection. The present work is a preliminary analysis of the detection performance of CNNs for DGA detection. A CNN with minimal architectural complexity was evaluated on a dataset with 51 DGA malware families as well as normal domains. Despite its simple architecture, the resulting CNN model correctly detected more than 97% of all DGA domains, with a false positive rate close to 0.7%.
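Before a CNN can process a domain name, the string is typically turned into a fixed-length sequence of integers that an embedding layer can consume. The abstract does not describe the preprocessing actually used, so the alphabet, the padding value and the maximum length below are assumptions.

```python
from typing import List

# Assumed character set: lowercase letters, digits and the symbols "-", "." and "_".
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._"
# Index 0 is reserved for padding, so known characters start at 1.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
# Any character outside the alphabet shares a single "unknown" index.
UNKNOWN = len(ALPHABET) + 1


def encode_domain(domain: str, max_len: int = 45) -> List[int]:
    """Lowercase, truncate to max_len, map to indices and right-pad with zeros."""
    indices = [CHAR_TO_INDEX.get(ch, UNKNOWN) for ch in domain.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))
```

For example, `encode_domain("ab.c", max_len=6)` yields `[1, 2, 38, 3, 0, 0]` under this alphabet. Batches of such vectors have a uniform shape, which is what a one-dimensional convolutional model expects.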

Bringing a GAN to a Knife-Fight: Adapting Malware Communication to Avoid Detection.

Generative Adversarial Networks (GANs) have been successfully used in a large number of domains. This paper proposes the use of GANs for generating network traffic that mimics other types of traffic. In particular, our method modifies the network behavior of real malware so that it mimics the traffic of a legitimate application and therefore avoids detection. By modifying the source code of a malware sample to receive parameters from a GAN, it was possible to adapt the behavior of its Command and Control (C2) channel to mimic the behavior of Facebook chat network traffic. In this way, it was possible to avoid detection by new-generation Intrusion Prevention Systems that use machine learning and behavioral characteristics. A real-life scenario was successfully implemented using the Stratosphere behavioral IPS in a router, while the malware and the GAN were deployed in the local network of our laboratory and the C2 server was deployed in the cloud. Results show that a GAN can successfully modify the traffic of malware to make it undetectable. The modified malware also tested whether it was being blocked and used this information as feedback to the GAN. This work envisions the possibility of self-adapting malware and self-adapting IPS.

Reliable Machine Learning for Networking: Key Issues and Approaches.

Machine learning has become one of the go-to methods for solving problems in the field of networking. This development is driven by data availability in large-scale networks and the commodification of machine learning frameworks. While this makes it easier for researchers to implement and deploy machine learning solutions on networks quickly, there are a number of vital factors to account for when using machine learning to approach a networking problem and when translating test performance into successful deployments on real networks. This paper, rather than presenting a particular technical result, discusses the considerations necessary to obtain good results when using machine learning to analyze network-related data.

Detection of HTTPS Malware Traffic

In recent years there has been an increase in the amount of malware using HTTPS traffic for its communications. This situation poses a challenge for security analysts because the traffic is encrypted and because it mostly looks like normal traffic. Therefore, there is a need to discover new features and methods to detect malware without decrypting the traffic. A detection method that does not need to decrypt the traffic is cheaper (because no traffic interceptor is needed), faster and more private, respecting the original idea of HTTPS. The goal of this thesis is to detect HTTPS malware connections by extracting new features and using data from the Bro IDS program. Since data for this research is hard to come by, we used data from the Stratosphere project and created our own datasets by hand. Our unit of analysis is an aggregation of all the information that can be obtained without decrypting the data. We group together flows, SSL data and X.509 certificate data as they are generated by Bro. To classify the HTTPS malware traffic we used several algorithms, such as Neural Networks, XGBoost and Random Forest. Our results show that HTTPS malware behaviour is distinct from normal HTTPS behaviour and that our methods are able to separate them with an accuracy of at least 96.64%.

Observer effect: How Intercepting HTTPS traffic forces malware to change their behavior

During the last couple of years there has been a significant surge in the use of HTTPS by malware. The reason for this increase is not completely understood yet, but it is hypothesized that it was forced by organizations only allowing web traffic to the Internet. Using HTTPS makes malware behavior similar to normal connections. Therefore, there has been a growing interest in understanding the usage of HTTPS by malware. This paper describes our research to obtain large quantities of real malware traffic using HTTPS, our use of man-in-the-middle HTTPS interceptor proxies to open and study the content, and our analysis of how the behavior of the malware changes after being intercepted. The research goal is to understand how malware uses HTTPS and the impact of intercepting its traffic. We conclude that the use of an interceptor proxy forces the malware to change its behavior, and that such interception should therefore be carefully considered before being implemented.

Detecting DGA malware traffic through behavioral models

Some botnets use special algorithms to generate the domain names they need to connect to their command and control servers. These are referred to as Domain Generation Algorithms (DGAs). A DGA generates domain names and tries to resolve their IP addresses. If a domain has an IP address, it is used to connect to that command and control server. Otherwise, the DGA generates a new domain and keeps trying to connect. In both cases it is possible to capture and analyze the special behavior shown by those DNS packets in the network. The behavior of Domain Generation Algorithms is difficult to detect automatically because each domain is usually randomly generated and therefore unpredictable. Hence, it is challenging to separate the DNS traffic generated by malware from the DNS traffic generated by normal computers. In this work we analyze the use of behavioral detection approaches based on Markov Models to differentiate Domain Generation Algorithm traffic from normal DNS traffic. The evaluation methodology of our detection models focuses on a real-time approach based on the use of time windows for reporting alerts. All the detection models have shown a clear differentiation between normal and malicious DNS traffic, and most have also shown a good detection rate. We believe this work is a further step in using behavioral models for network detection, and we hope to facilitate the development of more general and better behavioral detection methods for malware traffic.
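Reporting alerts per time window, rather than per packet, can be sketched as follows. The abstract does not give the window width or the aggregation rule, so the 300-second width and the "any malicious detection raises the window alert" rule are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def alerts_per_window(
    detections: Iterable[Tuple[float, bool]], width: float = 300.0
) -> Dict[int, bool]:
    """Aggregate (timestamp_in_seconds, is_malicious) pairs into window alerts.

    A window is flagged if at least one detection inside it was malicious.
    """
    windows: Dict[int, bool] = defaultdict(bool)
    for timestamp, is_malicious in detections:
        index = int(timestamp // width)  # which fixed-width window this falls in
        windows[index] = windows[index] or is_malicious
    return dict(windows)
```

Evaluating per window makes the error metrics reflect what an operator would see in real time: one alert or no alert for each interval, regardless of how many individual DNS packets were involved.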

The Network Behavior of Targeted Attacks. Models for Malware Identification and Detection.

The network patterns of Targeted Attacks are very different from those of the usual malware because the attackers' goals are different. Therefore, it is difficult to detect targeted attacks by looking for DNS anomalies, DGA traffic or HTTP patterns. However, our analysis of targeted attacks reveals novel patterns in their network communication. These patterns were incorporated into our Stratosphere IPS in order to model, identify and detect the traffic of targeted attacks. With this knowledge it is possible to alert on attacks in the network within a short time, independently of the malware used. The Stratosphere project analyzes the inherent patterns of malware actions in the network using Machine Learning. It uses Markov Chain algorithms to find patterns that are independent of static features. These patterns are used to build behavioral models of malware actions that are later used to detect similar traffic in the network. The tool and datasets are freely published.

Modelling the Network Behaviour of Malware To Block Malicious Patterns

Current malware traffic detection solutions work mostly by using static fingerprints, white and black lists and crowd-sourced threat intelligence analytics. These methods are useful for detecting known malware in real time, but are insufficient for detecting unknown malicious trends and attacks. Our proposed complementary solution is to analyse the inherent patterns of malware actions in the network by means of machine learning algorithms. In particular, we use Markov chains-based algorithms to find network patterns that are independent of static features, such as IP addresses or payloads. These patterns are used to build behavioural models of malware actions that are later used to detect similar traffic in the network. All these models and detection algorithms have been used to create a free software intrusion prevention system called Stratosphere IPS, which has been thoroughly tested with normal and malware traffic. The IPS is able to detect new network patterns that are similar to known malicious behaviours. The Stratosphere IPS tool will be used to show how behavioural models can detect real malware traffic.
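The general idea of a Markov chain-based behavioural model can be sketched as: learn transition probabilities between abstract states from known traffic, then score how well a new state sequence fits. The single-letter states and the additive smoothing below are assumptions made for illustration, and this is not the Stratosphere IPS implementation.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence

Model = Dict[str, Dict[str, float]]


def train_markov_chain(sequences: Iterable[Sequence[str]], smoothing: float = 1.0) -> Model:
    """Estimate first-order transition probabilities with additive smoothing."""
    sequences = list(sequences)
    alphabet = sorted({state for seq in sequences for state in seq})
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1.0
    model: Model = {}
    for current in alphabet:
        total = sum(counts[current].values()) + smoothing * len(alphabet)
        model[current] = {
            following: (counts[current][following] + smoothing) / total
            for following in alphabet
        }
    return model


def average_log_likelihood(model: Model, seq: Sequence[str], floor: float = 1e-6) -> float:
    """Mean log-probability per transition; unseen states fall back to `floor`."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        raise ValueError("sequence needs at least two states")
    total = sum(math.log(model.get(a, {}).get(b, floor)) for a, b in pairs)
    return total / len(pairs)
```

A model trained on the alternating sequence `"ababababab"` gives `"abab"` a higher score than `"aabb"`. A detector would compare such scores against a threshold, or against models of other behaviours, to decide which known pattern a new connection most resembles.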

Identifying, Modeling and Detecting Botnet Behaviors in the Network

Garcia, S. (2014). Identifying, Modeling and Detecting Botnet Behaviors in the Network. UNICEN University. PhD Thesis. doi:10.13140/2.1.3488.8006

Abstract

Botnets are the technological backbone supporting a myriad of attacks, including identity theft, organizational spying, DoS, SPAM, government-sponsored attacks and spying on political dissidents, among others. The research community works hard at creating detection algorithms for botnet network traffic. These algorithms have been partially successful, but they are difficult to reproduce and verify, and are often commercialized. However, advances in machine learning algorithms and access to better botnet datasets are starting to show promising results. The shift of detection techniques to behavior-based models has proved to be a better approach to the analysis of botnet patterns. However, the current knowledge of botnet actions and patterns does not seem to be deep enough to create adequate traffic models that could be used to detect botnets in real networks. This thesis proposes three new botnet detection methods and a new model of botnet behavior, all based on a deep understanding of botnet behaviors in the network. First, the SimDetect method, which analyzes the structural similarities of clustered botnet traffic. Second, the BClus method, which clusters traffic according to its connection patterns and uses decision rules to detect unknown botnets in the network. Third, the CCDetector method, which uses a novel state-based behavioral model of known Command and Control channels to train a Markov Chain and to detect similar traffic in unknown real networks. The BClus and CCDetector methods were compared with third-party detection methods, showing their use in real environments. The core of the CCDetector method is our state-based behavioral model of botnet actions. This model is capable of representing the changes in behaviors over time. To support the research we use a huge dataset of botnet traffic that was captured in our Malware Capture Facility Project. The dataset is varied, large, public, real and has Background, Normal and Botnet labels.
The tools, dataset and algorithms were released as free software. Our algorithms give a new high-level interface to identify, visualize and block botnet behaviors in networks.

Figure: Summary schema of the BClus detection method.

An Empirical Comparison of Botnet Detection Methods

The results of botnet detection methods are usually presented without any comparison. Although it is generally accepted that more comparisons with third-party methods may help to improve the area, few papers have been able to do so. Among the factors that prevent a comparison are the difficulty of sharing a dataset, the lack of a good dataset, the absence of a proper description of the methods and the lack of a comparison methodology. This paper compares the output of three different botnet detection methods by executing them over a new, real, labeled and large botnet dataset.

Survey on Network-based Botnet Detection Methods.

Botnets are an important security problem on the Internet. They continuously evolve their structure, protocols and attacks. This survey analyzes and compares the most important efforts made in the network-based detection area. It accomplishes four tasks. First, a comparison of previous surveys and the proposal of four new dimensions to analyze their classification schemes. Second, a new classification and comparison of network-based botnet detection proposals, including the definition of twenty desired properties of every botnet detection paper. Third, an extensive comparison between the most representative detection proposals. Fourth, a description of the most important problems and highlights in the area. We conclude that the area has achieved great advances so far, but there are still many open problems.