CCDetector and BotnetDetectorsComparer
A few days ago we finally made public two tools that were central to starting this project: CCDetector and BotnetDetectorsComparer. With these tools we ran the experiments in the paper “An empirical comparison of botnet detection methods”. You can download them to verify the paper and to test new ideas. Please contact us if you need assistance.
CCDetector is a machine-learning-based detector of Command and Control (C&C) channels in malware and botnet traffic (see www.researchgate.net/profile/Sebastian_Garcia6). It uses an implementation of Markov Chains to model the state transitions of the traffic according to a trained model and to detect similar behavioral traffic in other binetflow files.
It can also read binetflow files from the network in real time and display the states in a nice ncurses interface.
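The core idea can be sketched as follows: learn first-order transition probabilities over the letters of a tuple's state string, then score new state strings by log-probability. This is an illustrative simplification, not CCDetector's actual code; the function names and the use of 1e-45 as a floor for unseen transitions (borrowed from the -p/-q lower limit in the usage below) are assumptions.

```python
from collections import defaultdict
from math import log

def train_markov_chain(state_string):
    """Estimate first-order transition probabilities from a state string.

    Hypothetical helper: the real model also handles initial-state
    probabilities and per-label training across many tuples.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(state_string, state_string[1:]):
        counts[a][b] += 1
    probs = {}
    for a, nxt in counts.items():
        total = sum(nxt.values())
        probs[a] = {b: c / total for b, c in nxt.items()}
    return probs

def log_prob(model, state_string, floor=1e-45):
    """Log-probability that `model` generated `state_string`.

    Unseen transitions are floored at 1e-45 (an assumption mirroring
    the lower limit mentioned for the -p/-q thresholds).
    """
    lp = 0.0
    for a, b in zip(state_string, state_string[1:]):
        lp += log(model.get(a, {}).get(b, floor))
    return lp

# A periodic C&C-like state string scores high under its own model;
# an unrelated string scores very low.
cc_model = train_markov_chain("ababababab")
print(log_prob(cc_model, "abab"))   # 0.0 (every transition has probability 1)
print(log_prob(cc_model, "azqz"))   # very negative
```

At testing time the real tool compares each tuple against every stored model and assigns the label of the chain with the highest probability of generating it, subject to the -p/-q thresholds.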
- Training: give it a labeled netflow text file (generated by Argus) with -f and use -r. See the example file for details. This generates an MCModels folder containing the Markov Chain models for the tuples in the file.
- Testing: after training the models, give it a labeled binetflow file with -f and use -e. Without any other option, a new file is generated with the original binetflow information plus an additional column holding the label predicted by the trained models. A sorted version of the binetflow file is also created. Nothing is printed to the console.
Verification and performance metrics
To verify the results and obtain the performance metrics you should use another program called BotnetDetectorsComparer (https://bitbucket.org/eldraco2000/botnetdetectorscomparer). With this program you run:
BotnetDetectorsComparer.py -f <binetflowfile>.labeled.sorted -t weight -T 300 -a 0.01
This reports the performance metrics according to a time window and the weighted error logic. Please see the papers for details.
- 0.90: this is the first public version. For any problem please contact firstname.lastname@example.org or email@example.com
usage: ./CCDetector.py <options>

options:
  -h, --help                Show this help message and exit
  -V, --version             Output version information and exit
  -D, --debug               Debug level. E.g. -D 3
  -f, --file                Input netflow file to analyze. If - is used, netflows are read from stdin. Remember to pass the header!
  -t, --time-threshold      First threshold of time difference.
  -b, --bytes-threshold     First threshold of bytes size.
  -d, --duration-threshold  First threshold of duration.
  -T, --analyze-tuple       Analyze only this tuple and print detailed information for each netflow.
  -u, --tuple-mode          The tuple mode. Can be 3 for sip-dip-dport or 4 for sip-sport-dip-dport.
  -R, --thresholds          Threshold mode. Prints all the values of the features for training the thresholds.
  -r, --training            Training mode. Reads one binetflow from the -f file and outputs one Markov Chain for each label in the folder 'MCModels'. Don't use -r, -v or -e at the same time.
  -v, --validation          Validation mode. Reads one or several binetflow files (comma separated in -f) and considers them as training-validation. Uses 10-fold validation to compute the models, applies them and finds the best thresholds. Don't use -r, -v or -e at the same time.
  -e, --testing             Testing mode. Reads binetflows from the -f file and Markov Chains from the folder 'MCModels', predicts for each tuple the chain (label) with the highest probability of generating it, and outputs a labeled netflow file. Don't use -r, -v or -e at the same time.
  -w, --without-colors      Do not use colors in the output.
  -l, --state-length        Minimum length of the state string to consider it for analysis.
  -p, --min-prob-threshold  Threshold to use when comparing each tuple to every model. If -q is specified, this is the minimal threshold to try. You can also specify it as 1e-10. The lower limit is 1e-45.
  -q, --max-prob-threshold  Maximum threshold to use when comparing each tuple to every model. -p must also be specified. You can also specify it as 1e-11. The lower limit is 1e-45.
  -L, --label               Print all the information about all the tuples with this label.
  -P, --print-mode          Print mode: normal, csv, oneline, and epoch. You can combine them with '-'.
  -a, --all-models          Generate and include all models in the process. By default only the models of the C&Cs are used. Only for training.
  -n, --num-folds           Number of folds in the k-fold validation.
  -s, --step-threshold      Step to use when moving the threshold. Defaults to 10, i.e. from 0.1 to 0.01. Use multiples of 10.
This is a program to compare different botnet/malware detectors based on network traffic. It reads a netflow file that has one extra column per algorithm holding that algorithm's prediction, and compares how each algorithm detects the traffic. It computes the TP, TN, FP and FN for each flow in a time window, counting the errors per IP address. At the end of each time window several performance metrics are compared, and again at the end of the capture.
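The per-IP, per-time-window counting described above can be sketched like this. It is a simplification under stated assumptions: the function name and the flow-tuple layout are hypothetical, and the real comparer additionally applies the time-based weighting controlled by -a, which is omitted here.

```python
from collections import defaultdict

def window_ip_metrics(flows, window=300):
    """Per-time-window, per-IP confusion counts.

    `flows` is a list of (time_seconds, src_ip, true_label, predicted_label)
    tuples with labels 'Botnet' or 'Normal'. Within each window an IP counts
    once: a botnet IP is a TP if the detector flagged at least one of its
    flows (otherwise an FN); a normal IP is an FP if any of its flows was
    flagged (otherwise a TN).
    """
    # (window_index, ip) -> [is_botnet, was_flagged]
    state = defaultdict(lambda: [False, False])
    for t, ip, truth, pred in flows:
        s = state[(int(t // window), ip)]
        s[0] |= (truth == "Botnet")
        s[1] |= (pred == "Botnet")
    metrics = defaultdict(lambda: {"TP": 0, "FP": 0, "FN": 0, "TN": 0})
    for (w, _ip), (is_bot, flagged) in state.items():
        key = ("TP" if flagged else "FN") if is_bot else ("FP" if flagged else "TN")
        metrics[w][key] += 1
    return dict(metrics)
```

Counting per IP rather than per flow keeps a single chatty host from dominating the score, which is the motivation behind the weighted comparison mode.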
To use it you should give a binetflow file, the type of comparison and the width of the time window.
./BotnetDetectorsComparer.py -f statisticGenerator.testcasewithheaders9.txt -t weight -T 300
Giving an alpha is also a good idea; if omitted, the program assumes a default of 0.01 (as in our experiments).
With -p it will plot and open a window with the graph information for each method. With -P it will store the plot in a file instead of showing it on the screen.
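For reference, the F-measure plotted for each method combines precision and recall from the confusion counts. A minimal version using the standard definitions (this is not taken from the tool's code):

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Standard F-measure from confusion counts.

    beta weights recall against precision; beta=1 gives the usual F1.
    Returns 0.0 when precision and recall are both undefined or zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_measure(5, 0, 0))  # 1.0: perfect detection
print(f_measure(1, 1, 1))  # 0.5: precision and recall both 0.5
```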
For any problem contact firstname.lastname@example.org or email@example.com
usage: ./BotnetDetectorsComparer.py <options>

options:
  -h, --help          Show this help message and exit
  -V, --version       Output version information and exit
  -v, --verbose       Output more information.
  -D, --debug         Debug. In debug mode the statistics run live.
  -f, --file          SORTED input labeled netflow file to analyze (Netflow or Argus).
  -t, --type          Type of comparison: flow based (-t flow), time based (-t time), or weighted (-t weight). The weighted type is the new IP-based and time-based error metric.
  -T, --time          When using a time-based comparison, the time window to use, in seconds. E.g. -T 120
  -p, --plot          Plot the F-measures of all methods and show them on the display.
  -a, --alpha         In weight mode, use this alpha for computing the score (defaults to 0.4).
  -c, --cvs           Print the final scores in csv format into the specified file. E.g. -c results.csv
  -o, --out           Store in a log file everything that is shown on the screen. E.g. -o all-info.txt
  -P, --plot-to-file  Instead of showing the plot on the screen, store it in a file. The type of plot is given by the file extension. E.g. -P all-info.eps. The name of each algorithm is added at the beginning of the file name.