Master's Thesis
Detecting malware and attacks by analyzing network traffic remains a challenge. Although several well-known detection mechanisms can accurately separate malicious behavior from normal behavior, it is still extremely difficult to build a detection system that handles every situation that arises in a network. These known mechanisms include machine learning techniques, static signatures and experience-based rules. In particular, the method most used today relies on rules contributed by a large community of analysts. The most important impediments to good detection are the following. First, normal traffic is extremely complex, diverse and changing. Second, malicious actions change continuously, adapting, migrating and disguising themselves as normal traffic. Third, the amount of data to analyze is huge, forcing analysts to discard data in favor of speed. Fourth, detection must occur in near real time to be useful.
To solve some of these problems, the security community began to implement ensemble algorithms, also known as ensemble learning, in its systems. These algorithms are techniques for using, aggregating and summarizing the information from several different detectors into a single final decision. They allow analysts to use weak detectors in series, vote on the maliciousness of a domain and decide on a better blocking action based on contradictory data.
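As an illustration of the voting idea described above, the following is a minimal sketch of a weighted majority vote over several detectors judging the same domain. It is not the method developed in the thesis; the detector names, the weights and the 0.5 threshold are assumptions chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    """One detector's opinion about a domain (hypothetical structure)."""
    detector: str
    malicious: bool
    weight: float = 1.0  # assumed trust in this detector


def weighted_vote(verdicts, threshold=0.5):
    """Combine contradictory verdicts into a single decision.

    Returns (score, is_malicious), where score is the fraction of the
    total weight that voted "malicious".
    """
    total = sum(v.weight for v in verdicts)
    if total <= 0:
        raise ValueError("at least one verdict with positive weight is required")
    score = sum(v.weight for v in verdicts if v.malicious) / total
    return score, score > threshold


# Three hypothetical detectors that disagree about the same domain.
verdicts = [
    Verdict("static_signature", malicious=True, weight=2.0),
    Verdict("ml_classifier", malicious=False, weight=1.0),
    Verdict("community_feed", malicious=True, weight=1.0),
]
score, is_malicious = weighted_vote(verdicts)
print(f"score={score:.2f} malicious={is_malicious}")
```

Here the signature detector is trusted twice as much as the others, so the two "malicious" votes carry 3 of the 4 total weight units and the domain is flagged. A real system would have to choose the weights and the threshold from data.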
Although there have been some good proposals for ensemble techniques applied to network security, two aspects of ensemble learning algorithms have not been fully studied. First, the application of ensemble learning algorithms to community Threat Intelligence data. Second, there are no ensemble learning algorithms that work as a function of time when detecting on the same hosts. These two problems form the basis and objectives of this thesis.
You can download the thesis from here:
http://sedici.unlp.edu.ar/handle/10915/120856