This blog post was originally published on 10 November 2014, by Sebastian Garcia, at https://mcfp.weebly.com/analysis/archives/11-2014.
While analyzing our capture CTU-Malware-Capture-Botnet-89-1 we found some strange issues with the periodicity of the C&C channels. The capture contained a lot of HTTP connections, but few of them appeared to be periodic. When we analyze a network capture we usually start by looking at the NetFlows and then move on to the payload data. What we found is that several periodic HTTP connections had non-periodic NetFlows. This seemed strange to us, so we took a deeper look.
The traffic of this malware looks something like this in our monitoring server:
We first converted the pcap file to a web log file (using justniffer) to see the HTTP requests better. Some example requests are:
TimeStamp Method URL
1339.609 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
1632.742 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
1933.323 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
To find out the periodicity of these requests we just compute the difference between timestamps and we print it in the first column. These differences were around 300 seconds, i.e. 5 minutes:
293.133 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
300.581 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
293.941 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY&version=4.0.0.72&p2p=0&http=0&ts=0&up=0
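The difference computation is simple enough to sketch in a few lines of Python. This is only an illustration, not the script we used: the function name inter_request_gaps is made up for this example, and the input is just the three timestamps from the sample requests shown earlier.

```python
def inter_request_gaps(timestamps):
    """Return the gap between each timestamp and the one before it."""
    ordered = sorted(timestamps)
    # Round to milliseconds, the precision of the web log timestamps.
    return [round(b - a, 3) for a, b in zip(ordered, ordered[1:])]


# Timestamps (in seconds) of the three example requests.
timestamps = [1339.609, 1632.742, 1933.323]
gaps = inter_request_gaps(timestamps)
print(gaps)
```

For these three requests the gaps come out at roughly 293.1 and 300.6 seconds, i.e. close to the 5 minute period.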
This confirmed that these HTTP requests were periodic, but what about their NetFlows? To find the corresponding NetFlows we extracted the IP addresses used in these requests and sorted them by number of requests. The results are:
Amount, IP
- 16, 202.108.14.236
- 18, 202.108.14.235
- 18, 220.181.184.199
- 18, 220.181.184.75
- 19, 220.181.184.74
- 20, 111.206.22.76
- 21, 202.108.14.19
- 25, 111.206.22.77
- 25, 220.181.109.16
- 26, 202.108.14.221
- 29, 220.181.184.166
- 30, 220.181.109.15
- 31, 202.108.14.219
This was the first interesting part, since the same URL was being requested alternately from different IP addresses. Once the IP addresses were extracted, we generated their 4-tuples to look at their periodicity (described in the HackLu 2014 presentation). To get the 4-tuples we first convert the pcap file to a bidirectional Argus file:
argus -F argus.conf -r 2014-09-15_capture-win2.pcap -w 2014-09-15_capture-win2.biargus
Then we extract the NetFlows (the ra.conf file has specific fields):
ra -r 2014-09-15_capture-win2.biargus -n -Z b -F ra.conf > 2014-09-15_capture-win2.binetflow
Then we run our CCDetector.py program (to be released soon), which implements the state-based behavioral model (also described in the HackLu 2014 presentation):
CCDetector.py -f 2014-09-15_capture-win2.binetflow -P oneline > 2014-09-15_capture-win2.3model
From this .3model file we can see the characteristics of the 4-tuples related to the IP addresses:
- 10.0.2.102-202.108.14.221-80-tcp State:220s0ssss0ssss0ssss0ss0ssssssst0s
- 10.0.2.102-220.181.184.74-80-tcp State:120ss0s0sss0s0s0s0s0s0s0ss0s0ss0s
- 10.0.2.102-220.181.109.158-80-tcp State:110r
- 10.0.2.102-220.181.109.16-80-tcp State:22sssss0s0s0ssb0ss0sss0s0sssss0ss
- 10.0.2.102-220.181.184.199-80-tcp State:220s0s0s0s0ss0s0s0s0s0ss0s0sss
- 10.0.2.102-220.181.184.166-80-tcp State:220s0sssbbss0sss0ss0ss0ss0ssssssssss
- 10.0.2.102-220.181.109.15-80-tcp State:220ssbs0ss0s0ss0sssssssssbB0ss0s0ssss0s0s
- 10.0.2.102-202.108.14.236-80-tcp State:22ssss0s0s0sss0sssss
- 10.0.2.102-220.181.109.159-80-tcp State:110r
- 10.0.2.102-202.108.14.219-80-tcp State:22sss0sssssss0ssss0s0ss0ssssss0ssbsb0s
- 10.0.2.102-202.108.14.19-80-tcp State:22ssss0s0st0ss0sst0s0ss0s0s0ss
- 10.0.2.102-111.206.22.77-80-tcp State:22Bbsbssssss0s0sss0ssB0ss0ss0ss
- 10.0.2.102-220.181.184.75-80-tcp State:220ts0sss0ssss0s0ssss0ss
- 10.0.2.102-111.206.22.76-80-tcp State:22ssts0s0ss0ss0sss0s0ss0s0ss
- 10.0.2.102-202.108.14.235-80-tcp State:23b0sss0s0ssss0s0sss0s0s0s
In our state-based behavioral model the letters for periodic flows are 'a' to 'f' and 'A' to 'F'. Considering that the letters in these previous states were mostly 's' and '0', we conclude that there are NO periodic flows in these connections. However, we know that the HTTP requests are periodic. So what happened?
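The letter check described above can be sketched as follows. This is a rough illustration and not part of CCDetector.py: the function name periodic_letter_counts is hypothetical, and the only rule it encodes is the one stated above (letters 'a' to 'f' and 'A' to 'F' mark periodic flows). Every other character in the state is simply ignored.

```python
# Letters that mark a periodic flow, as stated in the text.
PERIODIC_LETTERS = set("abcdefABCDEF")


def periodic_letter_counts(state):
    """Return (periodic_letters, total_letters) for a state string."""
    letters = [c for c in state if c.isalpha()]
    periodic = [c for c in letters if c in PERIODIC_LETTERS]
    return len(periodic), len(letters)


# Two state strings copied from the listing above.
states = {
    "10.0.2.102-202.108.14.221-80-tcp": "220s0ssss0ssss0ssss0ss0ssssssst0s",
    "10.0.2.102-220.181.109.16-80-tcp": "22sssss0s0s0ssb0ss0sss0s0sssss0ss",
}

for tuple_id, state in states.items():
    periodic, total = periodic_letter_counts(state)
    print(f"{tuple_id}: {periodic} periodic letters out of {total}")
```

A 4-tuple with only an occasional 'b' among many 's' letters is still far from what a periodic connection would look like.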
To confirm that these 4-tuples are not periodic we can 'open' a 4-tuple and look at its flow-by-flow analysis. This is the information for the first 4-tuple:
1970-01-01 02:37:09.810596 T1=-1 T2=-1 TD= 0.0
1970-01-01 02:42:09.781309 T1=-1 T2=299.970713 TD= 0.0
1970-01-01 05:27:11.154282 T1=299.970713 T2=9901.372973 TD=9601.4
1970-01-01 07:17:11.504023 T1=9901.372973 T2=6600.349741 TD=-3301.0
1970-01-01 08:07:12.292119 T1=6600.349741 T2=3000.788096 TD=-3599.6
1970-01-01 08:37:12.330020 T1=3000.788096 T2=1800.037901 TD=-1200.8
(...)
Here T2 is the time difference between the current flow and the previous one, and T1 is the time difference between the previous flow and the one before it. In this output TD matches T2 minus T1. The values shown mean that the gaps between these flows were about 300s, 9901s, 6600s, 3000s, etc., which is not periodic. So we confirm that the flows to the IP 202.108.14.221 were not periodic.
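The T1, T2 and TD columns can be recomputed from the flow start times alone. The sketch below is an assumption about how such values could be derived, not the code of CCDetector.py: the function name flow_timing is hypothetical, and the handling of the first two flows (-1 for a missing difference, 0.0 for TD) simply imitates what the output above shows.

```python
from datetime import datetime


def flow_timing(timestamps):
    """Return a list of (T1, T2, TD) tuples, one per flow."""
    times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f") for t in timestamps]
    rows = []
    t2 = -1.0
    for i, now in enumerate(times):
        t1 = t2  # the previous T2 becomes the current T1
        t2 = (now - times[i - 1]).total_seconds() if i > 0 else -1.0
        # TD is only meaningful once both differences exist.
        td = t2 - t1 if t1 != -1.0 and t2 != -1.0 else 0.0
        rows.append((t1, t2, td))
    return rows


# Flow start times of the first 4-tuple, copied from the output above.
flows = [
    "1970-01-01 02:37:09.810596",
    "1970-01-01 02:42:09.781309",
    "1970-01-01 05:27:11.154282",
    "1970-01-01 07:17:11.504023",
    "1970-01-01 08:07:12.292119",
    "1970-01-01 08:37:12.330020",
]

rows = flow_timing(flows)
for start, (t1, t2, td) in zip(flows, rows):
    print(f"{start} T1={t1:.6f} T2={t2:.6f} TD={td:.1f}")
```

A periodic 4-tuple would show T1 and T2 close to each other and TD close to zero on most rows, which is clearly not the case here.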
The answer to this problem is that the bot was sending HTTP requests to a specific URL, but the IP addresses assigned to the web server kept changing in some sort of load balancing scheme. This is very common in normal applications, but in this case the result is a C&C channel that is periodic at the HTTP level while its requests are spread over many IP addresses, so no single 4-tuple looks periodic.
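The effect is easy to reproduce with made-up data. The following sketch uses synthetic requests, not data from the capture: the host name cc.example, the three 192.0.2.x addresses and the rotation order are all invented for illustration. It sends one request every 300 seconds and compares the gaps seen when grouping by host with the gaps seen when grouping by destination IP.

```python
from collections import defaultdict


def gaps(times):
    """Gaps between consecutive timestamps."""
    ordered = sorted(times)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def gaps_by_key(requests, key):
    """Group request timestamps by the given field and compute the gaps."""
    grouped = defaultdict(list)
    for req in requests:
        grouped[req[key]].append(req["ts"])
    return {k: gaps(v) for k, v in grouped.items()}


# Synthetic example: one request every 300 s, each answered by one of
# three hypothetical server IPs chosen in an irregular order.
ips = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
rotation = [0, 1, 0, 2, 2, 1, 0, 1, 2, 0]
requests = [
    {"ts": 300 * i, "host": "cc.example", "ip": ips[r]}
    for i, r in enumerate(rotation)
]

per_host = gaps_by_key(requests, "host")
per_ip = gaps_by_key(requests, "ip")

print("by host:", per_host)
print("by IP:  ", per_ip)
```

Grouped by host, every gap is 300 seconds. Grouped by IP, the gaps are different multiples of 300 seconds with no fixed pattern, which is what a per-4-tuple analysis would see.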
IMPLICATIONS
The implications of this load balancing schema are that:
- When researchers analyze network traffic, we tend to consider each connection separately. If the detection method is using NetFlows, it is most probably going to miss this periodicity.
- If the web log analysis uses the IP address of the web server as an index, the researcher may miss the connections to the rest of the IP addresses.
- Finally, we think that the owner of the malware is not aware of these complications, because the load balancing seems to be designed to give more resilience to the botnet and not to hide its network patterns.