This blog post was authored by Veronica Valeros (@verovaleros) on August 4th, 2023.
In this blog post, we describe how to run AIP on a cloud server so that it reads Zeek logs and generates your own blocklist feed of IPs to block. The blog is divided into five parts:
What AIP is.
How to set up a new cloud server on Digital Ocean.
How to configure the cloud server and get Zeek running.
How to prepare the environment and configuration for AIP to run.
How to run AIP and generate your own blocklists.
What is AIP?
AIP (Attacker IP Prioritizer) is a framework developed at the Stratosphere Laboratory to process input network flows from honeypots and output a blocklist feed that can be used to block incoming attacks [1]. AIP was born out of the need to generate small, compact, and effective blocklist feeds for IoT devices that usually do not have the capacity to load hundreds of thousands of IPs to block. Originally developed by Thomas O'Hara for his bachelor thesis [2], the project was re-engineered, improved, and further developed by Joaquin Bogado [3].
The magic of the framework is in the machine learning models used to generate the different blocklists. The AIP ‘Prioritize New’ model generates a list where new attackers have more weight and end up at the top of the list [2]. In contrast, the AIP ‘Prioritize Consistent’ model generates a list where attackers that consistently and repeatedly attack the honeypot over time have more weight and end up at the top of the list [2]. AIP currently features three additional models: alpha, alpha7 [3], and random forest. The Alpha models focus on the attackers observed in a given time window, independently of how they attack, how much they attack, or when they attack. The Random Forest model is trained with the data of the previous days and tries to predict which IPs will be active in the next 24 hours.
As illustrated in Figure 1, the AIP pipeline currently consists of seven phases. The first phase is the data input, where AIP reads Zeek connection logs. The second phase is the attack identification, where AIP keeps only the incoming traffic to the IPs specified in the configuration and discards all other traffic. The third phase is the attack aggregation, where incoming flows are aggregated per attacking IP address and various metrics are calculated. The fourth phase is the model execution, where each model processes the aggregated traffic following its own method. The fifth phase is the blocklist generation, where the blocklists are created, one per model. In the sixth phase, the created blocklists are parsed and all IPs marked in the configuration as allowed are removed from the final list. The seventh and last phase is where the final blocklist feed is produced, ready to be consumed.
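To make the attack identification and attack aggregation phases more concrete, here is a rough approximation with standard tools. This is not AIP's implementation, only a sketch. It assumes the default tab-separated Zeek conn.log layout, where the third column is the source IP (id.orig_h) and the fifth column is the destination IP (id.resp_h). The sample flows, the honeypot address 203.0.113.10, and the function name count_attackers are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample: four flows in Zeek conn.log TSV layout. Only the first
# six columns are included; real logs have more columns and header lines.
workdir=$(mktemp -d)
printf '#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\n' > "$workdir/conn.log"
printf '1691100000.0\tC1\t198.51.100.7\t40000\t203.0.113.10\t22\n' >> "$workdir/conn.log"
printf '1691100001.0\tC2\t198.51.100.7\t40001\t203.0.113.10\t23\n' >> "$workdir/conn.log"
printf '1691100002.0\tC3\t198.51.100.9\t40002\t203.0.113.10\t80\n' >> "$workdir/conn.log"
printf '1691100003.0\tC4\t203.0.113.10\t50000\t192.0.2.53\t53\n' >> "$workdir/conn.log"

# count_attackers HONEYPOT_IP FILE
# Keeps only flows whose destination is the honeypot (attack identification)
# and counts flows per source IP (attack aggregation), most active first.
count_attackers() {
    awk -F '\t' -v hp="$1" '
        /^#/ { next }                 # skip Zeek header lines
        $5 == hp { flows[$3]++ }      # incoming flow to the honeypot
        END { for (ip in flows) print flows[ip], ip }
    ' "$2" | sort -k1,1nr -k2,2
}

result=$(count_attackers 203.0.113.10 "$workdir/conn.log")
echo "$result"
rm -rf "$workdir"
```

The outgoing flow from the honeypot (the last sample line) is discarded, and the two remaining source IPs are listed with their flow counts. AIP computes additional metrics per attacker that are not reproduced here.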
Setting up a Cloud Server with Zeek
For this blog, we will use Digital Ocean as the cloud service provider, as it offers a wide variety of low-cost servers and regions worldwide in which to locate them [4]. To build the AIP Docker image from source, select a server with at least 4GB of RAM and Ubuntu 22.04 (LTS) x64 or a similar operating system. In our case, we will use the pre-built AIP image from DockerHub, so we will select the low-cost VPS shown in Figure 2. Please note that this is a test system, and more memory and storage would be required for a longer, more stable deployment.
After selecting the Operating System, CPU and Memory, and selecting the SSH keys to be able to log in, hit the create droplet button at the bottom of the page. We have other step-by-step guides on how to perform these steps if you need them [5].
Configuring the Server
Now that the droplet is created, it is time to upgrade it to the latest package versions and get Docker and Zeek up and running. The steps that follow perform a system upgrade, install Docker and Zeek, and run Zeek to start collecting data. Please note that on Digital Ocean the default user is root, which is why the commands do not need 'sudo'.
System Upgrade
Before continuing, we will do a general system upgrade to ensure that all packages are at their latest versions before installing anything else.
apt update && apt dist-upgrade -y
Install Docker
We need Docker installed to be able to build and run AIP. There are detailed instructions on the official Docker page [6]. Please refer to them for the most up-to-date steps. Here are the steps we took at the time of writing this blog:
Add Docker’s official GPG key:
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod a+r /etc/apt/keyrings/docker.gpg
Set up the Docker repository sources:
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
Install Docker with apt:
apt update && apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Install Zeek
AIP needs Zeek logs in the format and directory structure provided by Zeek when reading from a network interface. To have data in this format, we will install and run Zeek on the host.
Set up repository sources:
echo 'deb http://download.opensuse.org/repositories/security:/zeek/xUbuntu_22.04/ /' | sudo tee /etc/apt/sources.list.d/security:zeek.list
curl -fsSL https://download.opensuse.org/repositories/security:zeek/xUbuntu_22.04/Release.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/security_zeek.gpg > /dev/null
Install Zeek with apt:
apt update && apt install -y zeek-lts
Reboot (optional)
We did a lot of updates and upgrades, and Ubuntu recommends a reboot. This is a personal choice, but given no services are running right now, we can safely comply.
Run Zeek
To run Zeek, invoke the zeekctl manager to deploy Zeek on the current machine:
/opt/zeek/bin/zeekctl deploy
If all is running well, you should already be able to see some logs in /opt/zeek/logs/current. If you reboot the machine after this point, check that Zeek is running. For long-term deployment, it is recommended to install Zeek as a systemd service to guarantee the process is running continuously. Note also that by default, Zeek will automatically delete the logs after 7 days, so if you want to keep historical logs, you need to adjust the Zeek settings.
AIP: Prepare Environment
Currently, the best way to run AIP is through a Docker image. There is already a Docker image available on DockerHub and as a GitHub package that you can download and use. Remember that if you want to build the image yourself, the low-cost VPS selected above is insufficient, as at least 4GB of RAM is required.
Create a group to control access to the Zeek data:
groupadd dataset
Create a new user on the system that will build and run AIP, with access to the data:
adduser aiplocal
Add the new user to the groups docker and dataset:
usermod -a -G docker aiplocal
usermod -a -G dataset aiplocal
The rest of the steps will be performed as the user aiplocal:
Switch to the user aiplocal:
sudo su - aiplocal
Clone the AIP repository as we need the output directory structure and some configuration files from it:
git clone https://github.com/stratosphereips/AIP.git --depth=1 --branch main AIP
Navigate to the AIP directory:
cd AIP
AIP requires two configuration files to run. The first file is called honeypots_public_ips.csv and is related to the Attack Identification phase illustrated in Figure 1. The input Zeek logs will be filtered, and only data matching these IPs will be considered for analysis. The second file, do_not_block_these_ips.csv, is related to the later phase of Blocklist Filtering, where AIP can allow certain IPs and never include them in the resulting blocklist. For instance, the public IP you use to SSH into the cloud server can be added here, as you are not an attacker.
Make a copy or rename the CSVs:
mv data/external/honeypots_public_ips_example.csv data/external/honeypots_public_ips.csv
mv data/external/do_not_block_these_ips_example.csv data/external/do_not_block_these_ips.csv
Edit the honeypot public IPs file to include the IPs of your honeypot:
vim data/external/honeypots_public_ips.csv
Add your public IP
Edit the external file to include your public IP so it is not blocklisted:
vim data/external/do_not_block_these_ips.csv
Wait for Zeek data
AIP needs Zeek data to run. How much data? Well, this depends on the models. For the Alpha and Alpha7 models, one day of traffic may be enough. The Prioritize Consistent and Prioritize New models need at least 3 days of data. AIP is under development, and we hope to implement more options soon to add flexibility on the type and dates of the data ingested.
If you have a fresh installation of the VPS, as we just did in this guide, you will have to wait for some data to be available. Note that AIP does not read from Zeek's current directory; it needs logs that have already been rotated.
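To check whether enough rotated logs have accumulated, you can count the dated directories under the Zeek logs folder. This is a small sketch that assumes zeekctl's default archive layout, with one YYYY-MM-DD directory per day next to current. The helper name count_log_days is made up for this post.

```shell
#!/bin/sh
# count_log_days LOGS_DIR
# Prints how many YYYY-MM-DD directories exist directly under LOGS_DIR.
# Assumes zeekctl's default archive layout (one dated directory per day).
count_log_days() {
    n=0
    for d in "$1"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]; do
        # An unmatched pattern stays literal and fails the -d test.
        [ -d "$d" ] && n=$((n + 1))
    done
    echo "$n"
}

# Usage on the server from this guide:
# count_log_days /opt/zeek/logs
```

A result of 1 or more should be enough to try the Alpha models, while the Prioritize models need at least 3.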
Once you have some data, change the ownership of the Zeek logs output folder so the new user can access it. Run this command as root, since aiplocal cannot change the ownership of files it does not own:
chown -R aiplocal:dataset /opt/zeek/logs/
Remember this setup is valid for testing purposes. In a more professional setting, you may want to set up your data permissions and workflow differently.
AIP: Generate Your Own Blocklists
To run AIP through docker, we need to map the input and output of data. The input data should be mapped read-only, as AIP only needs to read these files, and it would avoid any potential issues. The output folder needs to be read-write as this is where AIP stores the data generated.
Now that there is some data, let's run AIP:
docker run --rm -v /opt/zeek/logs/:/home/aip/AIP/data/raw:ro -v ${PWD}/data/:/home/aip/AIP/data/:rw --name aip stratosphereips/aip:latest bin/aip
--rm → remove the container after execution finishes
-v /opt/zeek/logs/:/home/aip/AIP/data/raw:ro → mount the Zeek data into the container with read-only permissions.
-v ${PWD}/data/:/home/aip/AIP/data/:rw → mount the current directory data folder to the container with read-write permissions.
--name aip → name of the container that is being created
stratosphereips/aip:latest → AIP docker image to use.
bin/aip → run AIP
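If you plan to run AIP regularly, for example once a day, it can help to wrap the command so the two mounted directories are explicit. The sketch below only prints the docker run command from this post with the given directories, so you can review it before executing it. The function name aip_docker_cmd is hypothetical, and the paths are assumed not to contain spaces.

```shell
#!/bin/sh
# aip_docker_cmd ZEEK_LOGS_DIR AIP_DATA_DIR
# Prints the docker run command from this post for the given directories.
# Nothing is executed; the output is meant to be reviewed first.
aip_docker_cmd() {
    printf 'docker run --rm -v %s:/home/aip/AIP/data/raw:ro -v %s:/home/aip/AIP/data/:rw --name aip stratosphereips/aip:latest bin/aip\n' "$1" "$2"
}

# Review the command, then pipe it to sh to actually run it:
# aip_docker_cmd /opt/zeek/logs/ "$HOME/AIP/data/"
# aip_docker_cmd /opt/zeek/logs/ "$HOME/AIP/data/" | sh
```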
AIP logs will be printed in the console, as shown in Figure 4.
Once AIP finishes, the blocklists will be found in the data/output directory as shown below:
AIP Prioritizers
The Prioritizer models generate a list of attackers' IPs taking into account the characteristics of the attackers. The Prioritize New model places at the top of the list the attackers' IPs that were observed for the first time most recently. Attackers' IPs that continuously attack the honeypots have a lower priority and are placed lower in the list. The Prioritize Consistent model, in turn, places at the top of the list the attackers' IPs that were observed to be continuously and consistently attacking the honeypots. These two lists allow you to select more precisely what to block.
Example of the Prioritize New blocklist, showing the first 10 lines of the file (the header followed by the top 9 IPs):
zcat data/output/Prioritize_New/AIP-Prioritize_New-2023-08-04.csv.gz | head -n 10
ip,score
147.32.82.228,0.4807168097880752
170.64.145.33,0.2898113065152538
94.232.43.94,0.24804037133182327
165.22.128.162,0.2095664121765605
165.22.128.150,0.2076171023171474
89.248.163.16,0.19964333492453712
89.248.163.19,0.1781200684183091
77.90.185.151,0.13878762365497624
185.224.128.142,0.11514293462198784
Example of the Prioritize Consistent blocklist, showing the first 10 lines of the file (the header followed by the top 9 IPs):
zcat data/output/Prioritize_Consistent/AIP-Prioritize_Consistent-2023-08-04.csv.gz | head -n 10
ip,score
147.32.82.228,0.34190687714576296
89.248.163.16,0.08249171672628652
86.49.227.160,0.0750044206706934
89.248.163.19,0.07416107758357388
107.179.43.178,0.06149777174643439
77.90.185.151,0.05895208443456023
185.224.128.142,0.054371508536245006
91.191.209.218,0.03960083027900289
185.28.39.31,0.03938465411664789
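Small devices may only be able to load a limited number of IPs, so a common next step is to keep just the highest-scoring entries. The sketch below assumes the gzip-compressed "ip,score" CSV layout shown in the examples above and sorts by score explicitly, so it does not rely on the order of the file. The function name top_ips is made up for this post, the three sample rows are taken from the Prioritize New example, and the -g sort option assumes GNU sort as shipped with Ubuntu.

```shell
#!/bin/sh
# top_ips FILE.csv.gz N
# Prints the N highest-scoring IPs from a gzip-compressed "ip,score" CSV.
# tail drops the header line; sort -g orders by score as a general number.
top_ips() {
    gzip -dc < "$1" | tail -n +2 | sort -t, -k2,2gr | head -n "$2" | cut -d, -f1
}

# Demo with three rows copied from the example above, deliberately unsorted.
sample=$(mktemp)
printf 'ip,score\n94.232.43.94,0.24804037133182327\n147.32.82.228,0.4807168097880752\n170.64.145.33,0.2898113065152538\n' | gzip -c > "$sample"
top2=$(top_ips "$sample" 2)
echo "$top2"
rm -f "$sample"
```

On the server, the same function can be pointed at a real file, for example top_ips data/output/Prioritize_New/AIP-Prioritize_New-2023-08-04.csv.gz 100, and the resulting list fed to whatever blocking mechanism you use.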
AIP Alphas
The Alpha models generate a list of attackers’ IPs seen in the last N days, regardless of the type of attack, duration, or amount of traffic sent. This list has no prioritization; therefore, no ranking is provided.
There are currently two Alpha models, Alpha and Alpha7. Alpha generates a list of IPs taking into account the last day of data (last 24 hours). Alpha7 generates a list of IPs taking into account the last 7 days of data (last 168 hours).
Example of the Alpha blocklist, showing the first 10 lines of the file (the header followed by 9 IPs):
zcat data/output/Alpha/AIP-Alpha-2023-08-04.csv.gz | head -n 10
attacker
1.117.221.245
1.117.223.105
1.15.107.76
1.182.90.159
1.192.245.73
1.2.185.105
1.23.26.210
1.53.144.106
1.58.31.158
Example of the Alpha7 blocklist, showing the first 10 lines of the file (the header followed by 9 IPs):
zcat data/output/Alpha7/AIP-Alpha7-2023-08-04.csv.gz | head -n 10
attacker
1.117.221.245
1.117.236.166
1.117.91.115
1.14.11.148
1.14.250.37
1.15.17.10
1.15.94.16
1.22.131.19
1.22.138.189
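Since Alpha covers the last day and Alpha7 the last seven, comparing the two lists shows which attackers were seen earlier in the week but not in the last 24 hours. The sketch below assumes both files are gzip-compressed, single-column lists with one header line, as in the examples above. The function name only_in_second is hypothetical, and the sample lists are shortened versions of the examples.

```shell
#!/bin/sh
# only_in_second FIRST.csv.gz SECOND.csv.gz
# Prints the IPs that appear in SECOND but not in FIRST.
# comm needs sorted input, so both lists are sorted in the C locale first.
only_in_second() {
    a=$(mktemp) b=$(mktemp)
    gzip -dc < "$1" | tail -n +2 | LC_ALL=C sort -u > "$a"
    gzip -dc < "$2" | tail -n +2 | LC_ALL=C sort -u > "$b"
    LC_ALL=C comm -13 "$a" "$b"
    rm -f "$a" "$b"
}

# Demo with shortened sample lists.
alpha=$(mktemp) alpha7=$(mktemp)
printf 'attacker\n1.117.221.245\n1.15.107.76\n' | gzip -c > "$alpha"
printf 'attacker\n1.117.221.245\n1.117.236.166\n1.15.107.76\n' | gzip -c > "$alpha7"
older=$(only_in_second "$alpha" "$alpha7")
echo "$older"
rm -f "$alpha" "$alpha7"
```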
AIP Random Forest
The Random Forest model focuses on predicting which IPs will be active in the next 24 hours. To do this, the model is trained with the data from the previous days. The performance of this blocklist is still under evaluation, but you can try it yourself and experiment with it.
Example of the Random Forest blocklist, showing the first 10 lines of the file (the header followed by 9 IPs):
zcat data/output/random_forest/AIP-Random_Forest-2023-08-04.csv.gz | head -n 10
ip
1.117.221.245
1.15.107.76
1.2.185.105
1.85.217.117
101.109.205.69
101.32.103.44
101.34.32.158
101.36.97.137
101.42.47.126
Before you go…
AIP is the core of our Stratosphere Blocklists, which are created with data received from more than two dozen bare metal honeypots, including IoT and non-IoT devices. The blocklists are updated daily and can be downloaded from our website: https://bit.ly/StratosphereAIPBlocklists
If you are interested in contributing to the free-software project, join our Discord at https://discord.gg/9QvuCrsZax and check our repository: https://github.com/stratosphereips/AIP
References
[1] Stratosphere (2023) Stratosphere Laboratory [online] Available at: https://www.stratosphereips.org [Accessed: 23rd July 2023].
[2] O’Hara, T. (2021) ‘The Attacker IP Prioritizer: An IoT Optimized Blacklisting Algorithm’, Czech Technical University in Prague [online] Available at: https://dspace.cvut.cz/handle/10467/96722 [Accessed: 27th July 2023].
[3] Bogado, J. (2023) The Blocklist Generation Project. [online] Stratosphere IPS. Available at: https://www.stratosphereips.org/the-blocklist-generation-project [Accessed: 27th July 2023].
[4] DigitalOcean (2023) Cloud Hosting for Builders. [online] Available at: https://www.digitalocean.com [Accessed: 27th July 2023].
[5] Valeros, V. (2020) Installing T-Pot Honeypot Framework in the Cloud. [online] Stratosphere IPS. Available at: https://www.stratosphereips.org/blog/2020/10/10/installing-t-pot-honeypot-framework-in-the-cloud [Accessed: 27th July 2023].
[6] Docker Documentation (2023) Install Docker Engine on Ubuntu. [online] Available at: https://docs.docker.com/engine/install/ubuntu/ [Accessed: 27th July 2023].