This blogpost was authored by Agustin Parmisano (@AgustinParmisan) on October 8th, 2019
Analysts who study malware on the network face several challenges, such as large traffic capture files and many protocols and hosts. The work is costly in time and effort and requires considerable expertise. To draw conclusions about the malware under study (its type, whether it is working, who is attacking, and whether it maintains a connection and exchanges data with a Command and Control server), analysts need to detect certain patterns in the network traffic. Examples of such patterns are the ports being used and whether they are known to be malicious, the duration of the connections, the amount of data exchanged between the endpoints, and whether IRC, SSH or another communication channel is used to receive orders from a Command and Control server.
When analyzing the connections between the infected device and other systems on the network, it is often necessary to look more closely at the data being exchanged to find a pattern that indicates malicious traffic. When this data is encoded in hexadecimal, it is hard for analysts to understand what is happening at a glance.
Some tools automate the analysis of network communications and decode almost all hexadecimal messages. On some occasions, however, the decoded characters fall outside the most commonly used character set, for example the Cyrillic characters used in Russian. These tools cannot display such characters, which prevents network analysts from understanding the attack that is occurring.
Hexa Payload Decoder
In order to facilitate the automatic analysis of encoded data, we developed “Hexa Payload Decoder”. This tool is able to process a pcap file and return any decoded characters translated to English. The tool automates and speeds up the manual process previously done by the human analyst.
The tool receives a network capture file with the pcap extension, filters all TCP traffic, extracts the hexadecimal-encoded data, decodes it including all characters (both the most common and the less common ones), detects the language and translates the text into English. This is possible because the tool, written in Bash, relies on three key tools: Tshark [1], a tool for filtering and analyzing network traffic, used here to filter TCP packets; CyberChef [2], an advanced decoding tool written in JavaScript; and GoogleTrans [3], a Python library that detects and translates languages.
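The extraction step can be sketched in Python as follows. This is only an illustration and not the tool's actual Bash code: the function names `build_tshark_command`, `parse_payload_lines` and `extract_payloads` are hypothetical, and the sketch assumes a Tshark version that exposes the `tcp.payload` field.

```python
import subprocess

def build_tshark_command(pcap_path, port=None):
    """Build a tshark invocation that prints one TCP payload (hex) per packet."""
    display_filter = "tcp.payload"  # keep only TCP packets that carry data
    if port is not None:
        display_filter += f" && tcp.port == {port}"
    return ["tshark", "-r", pcap_path, "-Y", display_filter,
            "-T", "fields", "-e", "tcp.payload"]

def parse_payload_lines(output):
    """Turn tshark's text output into a list of raw byte strings."""
    payloads = []
    for line in output.splitlines():
        # Some tshark versions print colon-separated bytes (aa:bb:cc).
        hex_text = line.strip().replace(":", "")
        if hex_text:
            payloads.append(bytes.fromhex(hex_text))
    return payloads

def extract_payloads(pcap_path, port=None):
    """Run tshark (it must be installed) and return the payload bytes."""
    result = subprocess.run(build_tshark_command(pcap_path, port),
                            capture_output=True, text=True, check=True)
    return parse_payload_lines(result.stdout)
```

Building the command and parsing the output are kept as separate functions so that the parsing can be checked without a capture file or a Tshark installation.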
Use case example
To perform network malware analysis the research team infects Raspberry Pi (RPI) devices in a controlled network connected to the internet. During the analysis of CTU-IoT-Malware-Capture-52-1, one of the network traffic captures of the infected RPI devices, we found some suspicious behaviours:
The infected device maintained long-lived connections, with a large amount of data exchanged, to port 4441 of a server with IP 185.244.25.108.
When analyzing the TCP traffic to destination port 4441, we found data in hexadecimal format, most of which was decoded by the well-known network traffic analysis tool Wireshark:
One of the hexadecimal payloads, with a length of 7 bytes, was "0674656c6e6574" and was decoded by Wireshark as the ASCII string ".telnet".
Another, more interesting hexadecimal payload, with a length of 44 bytes, was "1b5b313b33346dd0bfd0bed0bbd18cd0b7d0bed0b2d0b0d182d0b5d0bbd18c1b5b313b33336d3a201b5b306d". Wireshark decoded it as the ASCII string "[1;34m.........................[1;33m: .[0m". This decoding gives us very little information.
When we analyzed the long suspicious payload with various tools that convert hexadecimal to ASCII, we did not obtain any coherent result.
The multi-purpose decoding tool CyberChef gave us the following result: ". [1; 34mпользователь. [1; 33m: . [0m". Translated from Russian into English with the GoogleTrans Python library, the text turns out to be the word "user", surrounded by the ANSI codes used to color characters in terminals.
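The difference between the two decodings can be reproduced with a few lines of Python. This is a minimal sketch that uses the standard library instead of CyberChef. The helper names `ascii_view` and `decode_payload` are hypothetical, and the sketch assumes the payload is UTF-8 text, which holds for this capture but not necessarily for others.

```python
import re

# The 44-byte payload from the capture, split only for readability.
PAYLOAD_HEX = ("1b5b313b33346d"
               "d0bfd0bed0bbd18cd0b7d0bed0b2d0b0d182d0b5d0bbd18c"
               "1b5b313b33336d3a201b5b306d")

# Matches ANSI color/style sequences such as ESC[1;34m.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

def ascii_view(raw):
    """Render bytes like a hex dump's ASCII column: non-printables become dots."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)

def decode_payload(hex_text):
    """Decode hex as UTF-8 (assumed encoding) and strip ANSI color codes."""
    raw = bytes.fromhex(hex_text)
    text = raw.decode("utf-8", errors="replace")
    return ANSI_SGR.sub("", text)

raw = bytes.fromhex(PAYLOAD_HEX)
print(len(raw))                     # 44
print(ascii_view(raw))              # Cyrillic bytes show up only as dots
print(decode_payload(PAYLOAD_HEX))  # пользователь: 
```

Each Cyrillic letter takes two bytes in UTF-8, and neither byte is printable ASCII, which is why a byte-by-byte ASCII view shows only dots. The cleaned string is what would then be passed to a translation library; that step is left out here because it needs network access.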
Conclusions
The tool can save network analysts the time they would otherwise spend analyzing each data stream of a suspicious traffic capture by hand. Whether or not the suspicious ports are known in advance, running the tool and filtering by payload size can produce results that give malware analysts the data they need to reach their conclusions. In our tests with the analyzed capture, the tool helped discover and make readable data that otherwise could not be found in a manual analysis.
This tool has the following software packages as prerequisites:
Node JavaScript engine: https://nodejs.org/en/
Python3: https://www.python.org/
GoogleTrans pip3 package: https://py-googletrans.readthedocs.io/en/latest/
Download
Link to the tool: https://github.com/stratosphereips/Hexa_Payload_Decoder/blob/master/README.md
References
[1] Tshark: https://www.wireshark.org/docs/man-pages/tshark.html
[2] CyberChef: https://github.com/gchq/CyberChef
[3] GoogleTrans: https://py-googletrans.readthedocs.io/en/latest/