The Blocklist Generation Project aims to produce high-quality Threat Intelligence feeds for the community. We use the AIP Tool[1] to preprocess the network traffic arriving at a honeypot network and to implement several algorithms that prioritize different aspects of the attackers. A different blocklist is generated for each algorithm every day.
The Alpha and Alpha7 Models
These models identify the attackers that connected to the honeypots in the previous days and add them to the next day's blocklist. An attacker that stops attacking is removed from the blocklist: in the Alpha model after one day without attacks, and in the Alpha7 model after 7 days without new attacks.
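The windowing idea behind both models can be sketched in a few lines of Python. This is only an illustration, not the AIP Tool code: the function name window_blocklist, the attacks_by_day mapping, and the toy IP addresses are all assumptions made for the example.

```python
from datetime import date, timedelta

def window_blocklist(attacks_by_day, for_date, window_days):
    """Collect every IP seen attacking in the `window_days` days before `for_date`.

    Hypothetical helper: window_days=1 mimics Alpha, window_days=7 mimics Alpha7.
    """
    blocklist = set()
    for offset in range(1, window_days + 1):
        day = for_date - timedelta(days=offset)
        blocklist |= attacks_by_day.get(day, set())
    return blocklist

# Toy data (assumed): day -> set of source IPs that connected to the honeypots.
attacks_by_day = {
    date(2022, 3, 1): {"198.51.100.7", "203.0.113.9"},
    date(2022, 3, 2): {"203.0.113.9"},
    date(2022, 3, 7): {"192.0.2.44"},
}

alpha = window_blocklist(attacks_by_day, date(2022, 3, 8), window_days=1)
alpha7 = window_blocklist(attacks_by_day, date(2022, 3, 8), window_days=7)
# One day later, 198.51.100.7 (last seen on March 1) falls out of the 7-day window.
alpha7_next = window_blocklist(attacks_by_day, date(2022, 3, 9), window_days=7)
```

With this toy data the Alpha-style list for March 8 holds only the address seen on March 7, while the Alpha7-style list holds all three addresses.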
These simple algorithms can block more than 60% of the attacks seen the following day. The Alpha7 model is more effective, but its blocklist contains more non-attackers (addresses that connected to the honeypots in the past but do not return).
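The trade-off between effectiveness and non-attackers can be made concrete with a small calculation. The sketch below is an assumption about how such a comparison could be done, not the evaluation used by the AIP Tool: it counts distinct IP addresses rather than individual attacks, and the function name evaluate and the toy data are hypothetical.

```python
def evaluate(blocklist, next_day_attackers):
    """Return (coverage, stale_fraction) for a blocklist.

    coverage: share of next-day attacker IPs that were on the blocklist.
    stale_fraction: share of blocklist entries that did not attack the next day.
    Both are per-IP measures, which is an assumption of this sketch.
    """
    blocked = blocklist & next_day_attackers
    coverage = len(blocked) / len(next_day_attackers) if next_day_attackers else 0.0
    stale = len(blocklist - next_day_attackers) / len(blocklist) if blocklist else 0.0
    return coverage, stale

# Toy data (assumed): a short list and a longer 7-day style list.
short_list = {"192.0.2.44"}
long_list = {"192.0.2.44", "203.0.113.9", "198.51.100.7", "198.51.100.8"}
next_day = {"192.0.2.44", "203.0.113.9", "192.0.2.200"}

short_cov, short_stale = evaluate(short_list, next_day)
long_cov, long_stale = evaluate(long_list, next_day)
```

In this toy case the longer list blocks two of the three next-day attackers instead of one, but half of its entries never attack again.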
The Prioritize Consistent and Prioritize New Models
The prioritizers assign a rank to every IP address that has ever attacked the honeypots. This rank allows them to give more importance to some attackers than to others. The Prioritize New model gives more priority to attackers that connect to the honeypots for the first time, while the Prioritize Consistent model gives more importance to attackers that have been attacking the honeypots repeatedly for some time. These algorithms calculate a set of features for every attacker and update the ranking every day. If an IP address is no longer important (because its rank is too low), it is removed from the blocklist.
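A toy version of rank-and-threshold prioritization might look like the following. The features and formulas used by the real models are not described here, so everything in this sketch is an assumption: the two scoring functions, the thresholds, and the sample history are invented purely to show the mechanism.

```python
from datetime import date

def score_new(days, for_date):
    """Assumed score: higher when the IP was first seen recently."""
    age = (for_date - min(days)).days
    return 1.0 / (1 + age)

def score_consistent(days, for_date):
    """Assumed score: higher with many distinct attack days and a recent last attack."""
    active_days = len(set(days))
    idle = (for_date - max(days)).days
    return active_days / (1 + idle)

def build_blocklist(history, for_date, score, threshold):
    """Rank every IP with `score` and keep those at or above `threshold`."""
    ranked = [(ip, score(days, for_date)) for ip, days in history.items()]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return [ip for ip, value in ranked if value >= threshold]

# Toy history (assumed): IP -> dates on which it attacked the honeypots.
history = {
    "198.51.100.7": [date(2022, 3, d) for d in (1, 2, 3, 6, 7)],  # repeated attacker
    "192.0.2.44": [date(2022, 3, 7)],                             # first seen yesterday
    "203.0.113.9": [date(2022, 2, 1)],                            # long inactive
}
for_date = date(2022, 3, 8)

new_list = build_blocklist(history, for_date, score_new, threshold=0.2)
consistent_list = build_blocklist(history, for_date, score_consistent, threshold=1.0)
```

Here the long-inactive address scores too low under both functions and drops off both lists, which mirrors the removal rule described above.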
Future Models
The AIP Tool is designed to make adding new models easy. It is enough to subclass the Base model and implement the run(for_date) method. This method returns a blocklist that the AIP Tool can later evaluate. Because all blocklists are evaluated the same way, you get immediate feedback on how good a new model is. The tool also provides several handy functions, shared by all models, to access the attack data and to sanitize the blocklist (remove duplicates, apply a whitelist).
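The subclassing pattern could look roughly like this. To keep the example runnable on its own, it defines a stand-in base class instead of importing the real one; the class name BaseModel, the helpers attacks_on and sanitize, and the list return type are assumptions of this sketch, so check the AIP Tool source for the actual names and signatures. Only the run(for_date) method comes from the description above.

```python
from abc import ABC, abstractmethod
from datetime import date, timedelta

class BaseModel(ABC):
    """Stand-in for the AIP Tool base model (hypothetical, not the real class)."""

    def __init__(self, attacks_by_day, whitelist=()):
        self._attacks_by_day = attacks_by_day
        self._whitelist = set(whitelist)

    def attacks_on(self, day):
        """Hypothetical accessor: source IPs seen on `day`."""
        return self._attacks_by_day.get(day, [])

    def sanitize(self, ips):
        """Hypothetical helper: drop duplicates and whitelisted IPs, keep order."""
        seen = set(self._whitelist)
        clean = []
        for ip in ips:
            if ip not in seen:
                seen.add(ip)
                clean.append(ip)
        return clean

    @abstractmethod
    def run(self, for_date):
        """Return the blocklist for `for_date`."""

class YesterdayModel(BaseModel):
    """Example model: block everything that attacked the day before."""

    def run(self, for_date):
        yesterday = for_date - timedelta(days=1)
        return self.sanitize(self.attacks_on(yesterday))

# Toy data (assumed) with a duplicate and a whitelisted address.
model = YesterdayModel(
    {date(2022, 3, 7): ["192.0.2.44", "192.0.2.44", "10.0.0.1", "203.0.113.9"]},
    whitelist={"10.0.0.1"},
)
blocklist = model.run(date(2022, 3, 8))
```

Keeping the sanitizing step in the base class means each new model only has to decide which addresses to propose.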
If you want to try it but have neither honeypots nor access to Zeek files with real attacks, you can download our 2022 dataset of attacks[2].