Use Case: UptimeRobot & Stratosphere IoT Laboratory

This blog post was authored by Veronica Valeros (@verovaleros) on 2023/10/09.

The Stratosphere Laboratory is a research group at the AIC, FEL, Czech Technical University in Prague dedicated to research at the intersection of machine learning, cybersecurity, and helping others [1] [2]. Back in 2018, as part of a joint project between Stratosphere and a private cybersecurity company, we were tasked with building a top-of-the-line Internet of Things (IoT) laboratory to research cybersecurity attacks. Our IoT laboratory includes IP cameras, smart assistants, network-attached storage servers, routers, and other devices that are directly exposed to attacks from the Internet. Today, this laboratory is our main source of intelligence on malicious cyber attacks. We use this intelligence to build our AIP IP blocklist feeds [3], which we share with the community so that anyone can use them to block attacks in their networks.

For research, it's vital to minimize data-gathering gaps if we are to produce sound scientific analysis. When it came to working with IoT devices, we quickly realized how hard this was going to be. Maintaining a physical laboratory with real devices exposed to attacks on the Internet requires constant supervision and physical presence to bring devices back online when they crash. During the first few months, we struggled significantly. We had the devices connected to the network, but they would often go down without anyone noticing, and by the time the issue was discovered, we had already lost a day of data. We needed to increase visibility in order to trigger a quick response and reduce the data gaps; this is where UptimeRobot comes in.

We started working with UptimeRobot in our IoT lab, and it was as if someone had suddenly turned on the light. We configured monitors for each of our IoT devices, the main IoT lab router, and the server where we temporarily store the data. We created a dashboard with visual indicators of each device's status. We set up webhook notifications to report incidents directly in our workspace. We also set up an automation that takes a screenshot of the dashboard and sends it to the same workspace, so we can check it early in the morning. This gave us visibility into what we already knew was happening, but it also gave us insights into things we had no idea were happening. This new information helped us take measures to alleviate underlying issues and save time and money, as our team could spend less time maintaining the infrastructure and more time researching. We summarize two key things we learned with UptimeRobot next.

The magnitude of the problem was way larger than we expected

We knew we were experiencing downtime, but to what extent was very hard to measure. Also, because the laboratory consisted of devices exposed to the Internet, it was hard to pinpoint the causes of downtime. When we finally set up our monitoring, we realized that some devices had an uptime of only 50-60%, which was not good enough. Having this visibility allowed us to start investigating the root causes of such poor uptime and come up with new, low-cost, innovative solutions.
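To make the uptime figures concrete, here is a minimal sketch of how an uptime percentage can be derived from a list of outage intervals. The function name `uptime_percent` and the sample outage are illustrative assumptions, not measurements from our lab or the way UptimeRobot computes its own numbers.

```python
from datetime import datetime, timedelta


def uptime_percent(window_start, window_end, outages):
    """Return the percentage of the window not covered by any outage.

    outages is a list of (start, end) datetimes. Intervals are clipped to
    the window and overlapping intervals are merged before summing.
    """
    total = (window_end - window_start).total_seconds()
    if total <= 0:
        raise ValueError("window must have positive length")

    # Keep only outages that intersect the window, clipped to its bounds.
    clipped = sorted(
        (max(start, window_start), min(end, window_end))
        for start, end in outages
        if end > window_start and start < window_end
    )

    down = 0.0
    cur_start = cur_end = None
    for start, end in clipped:
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                down += (cur_end - cur_start).total_seconds()
            cur_start, cur_end = start, end
        else:
            # Overlapping outage: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        down += (cur_end - cur_start).total_seconds()

    return 100.0 * (total - down) / total


if __name__ == "__main__":
    day_start = datetime(2023, 1, 1)
    day_end = day_start + timedelta(days=1)
    # Hypothetical example: one device unnoticed as down for 10 hours.
    outage = (day_start + timedelta(hours=2), day_start + timedelta(hours=12))
    print(f"{uptime_percent(day_start, day_end, [outage]):.1f}%")  # prints 58.3%
```

In this made-up example, a single unnoticed 10-hour outage in a day is enough to pull a device down to roughly 58% uptime, which is in the range we observed for some devices.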

In the IoT laboratory, most of the downtime was caused by attacks from the Internet that crashed the devices, which then had to be rebooted to come back online. One of the key factors was how long it took for a person to see the alert that a device was down and then physically reboot it. We needed to remove this dependency on physical access. To do so, our team incorporated smart power sockets, which allow us to control the power remotely. When we receive an alert, we remotely power the device off and on, reducing the downtime considerably. This solution alone allowed us to reach, on average, >94% uptime across all devices.
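The remote power-cycle step can be sketched roughly as follows. This is an illustration only: the `SmartSocket` interface, the `LoggingSocket` stand-in, and the 10-second off period are hypothetical, since real smart sockets expose vendor-specific APIs that are not described here.

```python
import time
from typing import Callable, Protocol


class SmartSocket(Protocol):
    """Hypothetical interface; real smart sockets have vendor-specific APIs."""

    def power_off(self) -> None: ...

    def power_on(self) -> None: ...


def power_cycle(
    socket: SmartSocket,
    off_seconds: float = 10.0,  # assumed pause, not a recommended value
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Cut power, wait, and restore power to force a device reboot."""
    socket.power_off()
    try:
        sleep(off_seconds)
    finally:
        # Always restore power, even if the wait is interrupted.
        socket.power_on()


class LoggingSocket:
    """Stand-in socket that records calls instead of switching real power."""

    def __init__(self) -> None:
        self.calls = []

    def power_off(self) -> None:
        self.calls.append("off")

    def power_on(self) -> None:
        self.calls.append("on")


if __name__ == "__main__":
    socket = LoggingSocket()
    power_cycle(socket, off_seconds=0)
    print(socket.calls)  # prints ['off', 'on']
```

Passing `sleep` as a parameter keeps the sketch testable without waiting, and the `finally` block ensures a device is never left powered off if the pause is interrupted.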

Integration is the key

UptimeRobot is now one of our core monitoring tools. The integration with Slack and the mobile application lets us know what is happening as it happens, so we can act quickly and resolve incidents as fast as possible.

We have set up workflows around the alerts, such as the smart power socket control, to help speed up the recovery of devices. We also send our daily dashboard to Slack so everyone can quickly see the status, and whoever is available can take action.
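As an illustration of how a workflow can consume an alert, the sketch below turns a webhook payload into a short status message. The field names (`monitorFriendlyName`, `alertType`, `alertDetails`) and the down/up codes are assumptions about how the webhook is configured and should be checked against your own setup; the device name is made up.

```python
from dataclasses import dataclass

# Assumed alert codes; verify them against your own webhook configuration.
ALERT_DOWN = "1"
ALERT_UP = "2"


@dataclass
class Alert:
    monitor: str
    is_down: bool
    details: str


def parse_alert(payload: dict) -> Alert:
    """Build an Alert from a webhook payload (field names are assumptions)."""
    alert_type = str(payload.get("alertType", ""))
    if alert_type not in (ALERT_DOWN, ALERT_UP):
        raise ValueError(f"unknown alertType: {alert_type!r}")
    return Alert(
        monitor=str(payload.get("monitorFriendlyName", "unknown monitor")),
        is_down=alert_type == ALERT_DOWN,
        details=str(payload.get("alertDetails", "")),
    )


def format_message(alert: Alert) -> str:
    """Render a one-line status message suitable for a chat workspace."""
    status = "DOWN" if alert.is_down else "UP"
    text = f"[{status}] {alert.monitor}"
    if alert.details:
        text += f": {alert.details}"
    return text


if __name__ == "__main__":
    # Hypothetical payload for a made-up device name.
    example = {
        "monitorFriendlyName": "IP camera 1",
        "alertType": "1",
        "alertDetails": "Connection timeout",
    }
    print(format_message(parse_alert(example)))
```

A handler like this could sit between the alert and follow-up actions, for example posting the message to a workspace or triggering a power cycle when `is_down` is true.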

In conclusion

Our use of UptimeRobot has helped us tremendously over the past years to finally see, understand, and take action on what was happening in our IoT infrastructure. It has become an indispensable tool that saves us time and effort and reduces our stress, as we are now aware of what is happening as it happens.

References

[1] Stratosphere (2023) Stratosphere Laboratory. [online] Available at: https://www.stratosphereips.org [Accessed: 23rd July 2023].

[2] Michal Pěchouček (2023) Artificial Intelligence Center. [online] Available at: https://www.aic.fel.cvut.cz//about [Accessed: 23rd July 2023].

[3] Joaquin Bogado, Thomas O’Hara, and Sebastian Garcia (2023) The Stratosphere Blocklist Generation Project. [online] Available at: https://mcfp.felk.cvut.cz/publicDatasets/CTU-AIPP-BlackList/ [Accessed: 23rd July 2023].