Generating Your Own Blocklists with the Stratosphere AIP Framework

This blog post was authored by Veronica Valeros (@verovaleros) on August 4th, 2023.

In this blog post, we describe how to run AIP on a cloud server to read Zeek logs and generate your own blocklist feed of IPs to block. The blog is divided into five parts:

  1. What AIP is.

  2. How to set up a new cloud server in Digital Ocean.

  3. How to configure the cloud server with Zeek running.

  4. How to prepare the environment and configurations for AIP to run.

  5. How to run AIP and generate your own blocklists.

What is AIP?

AIP (Attacker IP Prioritizer) is a framework developed at the Stratosphere Laboratory to process input network flows from honeypots and output a blocklist feed that can be used to block incoming attacks [1]. AIP was born out of the need to generate small, compact, and effective blocklist feeds for IoT devices that usually do not have the capacity to load hundreds of thousands of IPs to block. Originally developed by Thomas O'Hara for his bachelor thesis [2], the project was re-engineered, improved, and further developed by Joaquin Bogado [3].

The magic of the framework is in the models used to generate the different blocklists. The AIP ‘Prioritize New’ model generates a list where new attackers have more weight and end up at the top of the list [2]. In contrast, the AIP ‘Prioritize Consistent’ model generates a list where attackers that consistently and repeatedly attack the honeypot over time have more weight and are at the top of the list [2]. AIP currently features three more models: Alpha, Alpha7 [3], and Random Forest. The Alpha models focus on the attackers observed in a given time window, independently of how they attack, how much they attack, or when they attack. The Random Forest model is trained with the data of the previous days and tries to predict which IPs will be active in the next 24 hours.

As illustrated in Figure 1, the AIP pipeline currently consists of seven phases. The first phase is the data input, where AIP reads Zeek connection logs. The second phase is the attack identification, where AIP keeps only the incoming traffic to the IPs specified in the configuration and discards all other traffic. The third phase is the attack aggregation, where incoming flows are aggregated per attacking IP address and various metrics are calculated. The fourth phase is the model execution, where each model processes the aggregated traffic following its own method. The fifth phase is the blocklist generation, where the blocklists are created, one per model. In the sixth phase, the created blocklists are parsed and all IPs marked in the configuration as allowed are removed from the final list. The seventh and last phase is where the final blocklist feed is produced, ready to be consumed.

Figure 1 - Simplified diagram of the AIP framework process from data ingestion to feed generation.
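
To make the attack identification and attack aggregation phases more concrete, here is a minimal sketch in shell. It is not AIP's actual implementation. It assumes Zeek's default tab-separated conn.log layout, where the third column is the originating IP (id.orig_h) and the fifth column is the responding IP (id.resp_h). The function name count_flows_per_attacker and its single-IP parameter are hypothetical; AIP reads the honeypot IPs from its configuration and calculates more metrics than a simple flow count.

```shell
# Sketch only: keep flows whose destination is the honeypot IP (phase 2),
# then count flows per source IP (phase 3). Reads conn.log from stdin.
# Assumption: default Zeek TSV conn.log, column 3 = id.orig_h, column 5 = id.resp_h.
count_flows_per_attacker() {
    honeypot_ip="$1"   # hypothetical parameter: one honeypot public IP
    awk -F '\t' -v hp="$honeypot_ip" '
        /^#/ { next }                  # skip Zeek header and metadata lines
        $5 == hp { flows[$3]++ }       # incoming flow to the honeypot
        END { for (ip in flows) print flows[ip] "\t" ip }
    ' | sort -t "$(printf '\t')" -k1,1nr -k2,2   # most active sources first
}
```

Piping a decompressed conn log into count_flows_per_attacker with your honeypot IP as the argument prints one line per source IP, with the flow count in the first column.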

Setting up a Cloud Server with Zeek

For this blog, we will use Digital Ocean as the cloud service provider, as they offer a wide variety of low-cost servers and regions worldwide in which to locate them [4]. To build the AIP Docker image from source, select a server with at least 4GB of RAM and Ubuntu 22.04 (LTS) x64, or similar, as the operating system. In our case, we will use the pre-built AIP image from DockerHub, so we will select the low-cost VPS shown in Figure 2. Please note that this is a test system, and more memory and storage would be required for a longer, more stable deployment.

Figure 2 - Digital Ocean droplet for this guide is a 1GB 1CPU droplet, with a regular SSD, of 25GB storage.


After selecting the Operating System, CPU and Memory, and selecting the SSH keys to be able to log in, hit the create droplet button at the bottom of the page. We have other step-by-step guides on how to perform these steps if you need them [5].

Configuring the Server

Now that the droplet is created, it is time to upgrade it to the latest package versions and get Docker and Zeek up and running. The steps that follow perform a system upgrade, install Docker and Zeek, and run Zeek to start collecting data. Please note that on Digital Ocean the default user is root, which is why most commands are run without 'sudo'; where 'sudo' does appear, it is kept from the upstream instructions and is harmless when running as root.

System Upgrade

Before continuing, we will do a general system upgrade to ensure all packages are at their latest versions before installing anything else.

apt update && apt dist-upgrade -y

Install Docker

We need Docker installed to be able to build and run AIP. There are detailed instructions on the official Docker page [6]. Please refer to them to get the latest valid actions. Here are the steps we took at the time of writing this blog:

  • Add Docker’s official GPG key:

    • install -m 0755 -d /etc/apt/keyrings
    • curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
    • chmod a+r /etc/apt/keyrings/docker.gpg
  • Set up the Docker repository sources:

    • echo \
      "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
      "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  • Install docker with apt:

    • apt update && apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Install Zeek

AIP needs Zeek logs in the format and directory structure provided by Zeek when reading from a network interface. To have data in this format, we will install and run Zeek on the host.

  • Set up repository sources:

    • echo 'deb http://download.opensuse.org/repositories/security:/zeek/xUbuntu_22.04/ /' | sudo tee /etc/apt/sources.list.d/security:zeek.list
    • curl -fsSL https://download.opensuse.org/repositories/security:zeek/xUbuntu_22.04/Release.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/security_zeek.gpg > /dev/null
  • Install Zeek with apt:

    • apt update && apt install -y zeek-lts

Reboot (optional)

We did a lot of updates and upgrades, and Ubuntu recommends a reboot. This is a personal choice, but given no services are running right now, we can safely comply. 

Run Zeek

To run Zeek, invoke the zeekctl manager to deploy Zeek on the current machine:

  • /opt/zeek/bin/zeekctl deploy

If all is running well, you should already see some logs in /opt/zeek/logs/current. If you reboot the machine after this point, check that Zeek is running. For a long-term deployment, it is recommended to install Zeek as a systemd service to guarantee the process runs continuously. Note also that by default, Zeek will automatically delete the logs after 7 days, so if you want to keep historical logs, you will need to change the Zeek settings.

AIP: Prepare Environment

Currently, the best way to run AIP is through a Docker image. A pre-built image is available on DockerHub and as a GitHub package that you can download and use. Remember that if you want to build the image yourself, the VPS selected above is insufficient, as at least 4GB of RAM is required.

  • Create a group to control access to the Zeek data:

    • groupadd dataset
  • Create a new user on the system that will build and run AIP, with access to the data:

    • adduser aiplocal
  • Add the new user to the groups docker and dataset: 

    • usermod -a -G docker aiplocal
    • usermod -a -G dataset aiplocal


The rest of the steps will be performed as the user aiplocal:

  • Switch to the user aiplocal:

    • sudo su - aiplocal
  • Clone the AIP repository as we need the output directory structure and some configuration files from it:

    • git clone https://github.com/stratosphereips/AIP.git --depth=1 --branch main AIP
  • Navigate to the AIP directory:

    • cd AIP

AIP requires two configuration files to run. The first file is called honeypots_public_ips.csv and is related to the Attack Identification phase illustrated in Figure 1. The input Zeek logs will be filtered, and only data matching these IPs will be considered for analysis. The second file, do_not_block_these_ips.csv, is related to the later Blocklist Filtering phase, where AIP can allow certain IPs and never include them in the resulting blocklist. For instance, the public IP you use to SSH into the cloud server can be added here, as you are not an attacker.

  • Make a copy or rename the CSVs:

    • mv data/external/honeypots_public_ips_example.csv data/external/honeypots_public_ips.csv
    • mv data/external/do_not_block_these_ips_example.csv data/external/do_not_block_these_ips.csv
  • Edit the honeypot public IPs file to include the IPs of your honeypot:

    • vim data/external/honeypots_public_ips.csv
      • Add the public IP of your server

  • Edit the do-not-block file to include the public IP you connect from, so it is not blocklisted:

    • vim data/external/do_not_block_these_ips.csv

Wait for Zeek data

AIP needs Zeek data to run. How much data? Well, this depends on the models. For the Alpha and Alpha7 models, one day of traffic may be enough. The Prioritize Consistent and Prioritize New models need at least 3 days of data. AIP is under development, and we hope to soon implement more options to add flexibility on the type and dates of the data ingested.

If you have a fresh installation of the VPS, as we do in this guide, you will have to wait until some data is available. Note that AIP does not read from Zeek's current directory; it needs logs that have already been rotated.
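
A quick way to check how many days of rotated data are available is sketched below. It assumes the archive layout zeekctl uses by default, with one YYYY-MM-DD directory per day containing files named conn.*.log.gz. The function name count_rotated_days is hypothetical and not part of Zeek or AIP.

```shell
# Sketch only: count the per-day directories that hold rotated conn logs.
# Assumption: zeekctl's default layout, <log_root>/YYYY-MM-DD/conn.*.log.gz.
count_rotated_days() {
    log_root="${1:-/opt/zeek/logs}"
    days=0
    for dir in "$log_root"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]; do
        [ -d "$dir" ] || continue
        # Count the day only if it holds at least one rotated conn log.
        for f in "$dir"/conn.*.log.gz; do
            if [ -e "$f" ]; then
                days=$((days + 1))
                break
            fi
        done
    done
    echo "$days"
}
```

Running count_rotated_days without arguments inspects /opt/zeek/logs. A result of 3 or more suggests there is enough data for the Prioritize models described above.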

Once you have some data, change the ownership of the Zeek logs output folder, as root, so the new user can access it:

  • chown -R aiplocal:dataset /opt/zeek/logs/

Figure 3 - Directory with the Zeek logs output with 3 days of data.

Remember this setup is valid for testing purposes. In a more professional setting, you may want to set up your data permissions and workflow differently. 

AIP: Generate Your Own Blocklists

To run AIP through Docker, we need to map the input and output of data. The input data should be mapped read-only, as AIP only needs to read these files, which avoids any accidental modification of the logs. The output folder needs to be read-write, as this is where AIP stores the data it generates.

Now that there is some data, let's run AIP:

  • docker run --rm -v /opt/zeek/logs/:/home/aip/AIP/data/raw:ro -v ${PWD}/data/:/home/aip/AIP/data/:rw --name aip stratosphereips/aip:latest bin/aip
    • --rm → remove the container after execution finished

    • -v /opt/zeek/logs/:/home/aip/AIP/data/raw:ro → mount the Zeek data to the container with read-only permissions.

    • -v ${PWD}/data/:/home/aip/AIP/data/:rw → mount the current directory data folder to the container with read-write permissions.

    •  --name aip → name of the container that is being created

    • stratosphereips/aip:latest → AIP docker image to use.

    • bin/aip → run AIP

AIP logs will be printed in the console, as shown in Figure 4.

Figure 4 - AIP running as a docker container builds the attack metrics based on Zeek input data.

Once AIP finishes, the blocklists will be found in the data/output directory as shown below:

Figure 5 - AIP writes data in the data/output folder where all the output blocklists can be found


AIP Prioritizers

The Prioritizer models generate a list of attackers' IPs taking into account the characteristics of the attackers. Prioritize New places at the top of the list the attackers' IPs that were first observed most recently. Attackers' IPs that continuously attack the honeypots have a lower priority and are placed lower in the list. Prioritize Consistent, in turn, places at the top of the list the attackers' IPs that were observed continuously and consistently attacking the honeypots. These two lists allow you to select more precisely what to block.

Example of the Prioritize New blocklist, showing the header and the first nine IPs on the list:

zcat data/output/Prioritize_New/AIP-Prioritize_New-2023-08-04.csv.gz | head -n 10

ip,score
147.32.82.228,0.4807168097880752
170.64.145.33,0.2898113065152538
94.232.43.94,0.24804037133182327
165.22.128.162,0.2095664121765605
165.22.128.150,0.2076171023171474
89.248.163.16,0.19964333492453712
89.248.163.19,0.1781200684183091
77.90.185.151,0.13878762365497624
185.224.128.142,0.11514293462198784

Example of the Prioritize Consistent blocklist, showing the header and the first nine IPs on the list:

zcat data/output/Prioritize_Consistent/AIP-Prioritize_Consistent-2023-08-04.csv.gz | head -n 10

ip,score
147.32.82.228,0.34190687714576296
89.248.163.16,0.08249171672628652
86.49.227.160,0.0750044206706934
89.248.163.19,0.07416107758357388
107.179.43.178,0.06149777174643439
77.90.185.151,0.05895208443456023
185.224.128.142,0.054371508536245006
91.191.209.218,0.03960083027900289
185.28.39.31,0.03938465411664789
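
Since the Prioritize lists are ranked, a small device can load only the top of the list. The sketch below assumes the format shown above: a gzip-compressed CSV with an "ip,score" header, already sorted with the highest scores first. The function name top_n_ips is hypothetical and not part of AIP.

```shell
# Sketch only: print the first N IPs, without scores, from a Prioritize blocklist.
# Assumption: gzip-compressed CSV with an "ip,score" header, ranked top to bottom.
top_n_ips() {
    blocklist="$1"   # path to the .csv.gz blocklist
    n="$2"           # how many IPs to keep
    # gzip -dc is equivalent to zcat; skip the header, keep N rows, drop the score.
    gzip -dc "$blocklist" | tail -n +2 | head -n "$n" | cut -d, -f1
}
```

For example, top_n_ips data/output/Prioritize_New/AIP-Prioritize_New-2023-08-04.csv.gz 100 would print the 100 highest-ranked IPs, one per line.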

AIP Alphas

The Alpha models generate a list of attackers’ IPs seen in the last N days, regardless of the type of attack, duration, or amount of traffic sent. This list has no prioritization, so no ranking is provided.

There are currently two Alpha models, Alpha and Alpha7. Alpha generates a list of IPs taking into account the last day of data (last 24 hours). Alpha7 generates a list of IPs taking into account the last 7 days of data (last 168 hours). 

Example of the Alpha blocklist, showing the header and the first nine IPs on the list:

zcat data/output/Alpha/AIP-Alpha-2023-08-04.csv.gz | head -n 10

attacker
1.117.221.245
1.117.223.105
1.15.107.76
1.182.90.159
1.192.245.73
1.2.185.105
1.23.26.210
1.53.144.106
1.58.31.158

Example of the Alpha7 blocklist, showing the header and the first nine IPs on the list:

zcat data/output/Alpha7/AIP-Alpha7-2023-08-04.csv.gz | head -n 10

attacker
1.117.221.245
1.117.236.166
1.117.91.115
1.14.11.148
1.14.250.37
1.15.17.10
1.15.94.16
1.22.131.19
1.22.138.189
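
To see the effect of the two time windows, one option is to list the IPs that appear in the Alpha7 list but not in the Alpha list, that is, attackers seen during the last week but not in the last day. The sketch below assumes the single-column format shown above, with a one-line header. The function name only_in_first is hypothetical and not part of AIP.

```shell
# Sketch only: print the IPs present in the first blocklist but not in the second.
# Assumption: both files are gzip-compressed, one IP per line, with a one-line header.
only_in_first() {
    first="$1"
    second="$2"
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    # comm needs sorted input, so sort both lists the same way first.
    gzip -dc "$first"  | tail -n +2 | sort -u > "$tmp_a"
    gzip -dc "$second" | tail -n +2 | sort -u > "$tmp_b"
    comm -23 "$tmp_a" "$tmp_b"   # lines unique to the first list
    rm -f "$tmp_a" "$tmp_b"
}
```

For example, only_in_first data/output/Alpha7/AIP-Alpha7-2023-08-04.csv.gz data/output/Alpha/AIP-Alpha-2023-08-04.csv.gz prints the IPs that are only in the seven-day list.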

AIP Random Forest

The Random Forest model focuses on predicting which IPs will be active in the next 24 hours. To do this, the model is trained with the data from the previous days. The performance of this blocklist is still under evaluation, but you can try it yourself and experiment with it.

Example of the Random Forest blocklist, showing the header and the first nine IPs on the list:

zcat data/output/random_forest/AIP-Random_Forest-2023-08-04.csv.gz | head -n 10

ip
1.117.221.245
1.15.107.76
1.2.185.105
1.85.217.117
101.109.205.69
101.32.103.44
101.34.32.158
101.36.97.137
101.42.47.126
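
Any of the blocklists above can then be fed to a firewall. As one possible approach, which AIP itself does not prescribe, the sketch below converts a blocklist into the text format read by "ipset restore" on Linux. The function name blocklist_to_ipset and the set name aip_blocklist are hypothetical. Only the first CSV column is used, and only lines that look like IPv4 addresses are kept, which also drops the header.

```shell
# Sketch only: turn an AIP blocklist into input for "ipset restore".
# Assumption: gzip-compressed CSV whose first column is an IPv4 address.
blocklist_to_ipset() {
    blocklist="$1"
    set_name="${2:-aip_blocklist}"   # hypothetical default set name
    echo "create $set_name hash:ip"
    gzip -dc "$blocklist" \
        | cut -d, -f1 \
        | grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' \
        | sed "s/^/add $set_name /"
}
```

The output could be piped into "ipset restore -exist" as root, and the resulting set matched from an iptables rule with "-m set --match-set aip_blocklist src". Treat this as a starting point and test it on a non-production machine first, since long lists may require raising the default maximum size of the set.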

Before you go…

AIP is the core of our Stratosphere Blocklists, which are created with data received from more than two dozen bare metal honeypots, including IoT and non-IoT devices. The blocklists are updated daily and can be downloaded from our website: https://bit.ly/StratosphereAIPBlocklists

If you are interested in contributing to the free-software project, join our Discord at https://discord.gg/9QvuCrsZax and check our repository: https://github.com/stratosphereips/AIP

References

[1] Stratosphere (2023) Stratosphere Laboratory [online]  Available at: https://www.stratosphereips.org [Accessed: 23rd July 2023].

[2] O’Hara, T. (2021) ‘The Attacker IP Prioritizer : An IoT Optimized Blacklisting Algorithm’, Czech Technical University in Prague [online] Available at: https://dspace.cvut.cz/handle/10467/96722. [Accessed: 27th July 2023]

[3] Bogado, J. (2023) The Blocklist Generation Project. [online] Stratosphere IPS. Available at: https://www.stratosphereips.org/the-blocklist-generation-project [Accessed: 27th July 2023].

[4] DigitalOcean  (2023) Cloud Hosting for Builders. [online] Available at: https://www.digitalocean.com [Accessed: 27th July 2023].

[5] Valeros, V. (2020) Installing T-Pot Honeypot Framework in the Cloud. [online] Stratosphere IPS. Available at: https://www.stratosphereips.org/blog/2020/10/10/installing-t-pot-honeypot-framework-in-the-cloud [Accessed: 27th July 2023].

[6] Docker Documentation (2023) Install Docker Engine on Ubuntu. [online] Available at: https://docs.docker.com/engine/install/ubuntu/ [Accessed: 27th July 2023].