25th April Incident Report

Issue Overview

Apr 27, 2024

Malicious users were able to alter the metadata of devices, such as device names and statuses (e.g., changing a device’s status from up to down). This was visually confirmed through the front-end display of the devices.

The actual hardware of the GPUs remained secure. Malicious users could not access the underlying resources, because running compute jobs on users’ hardware is protected by multiple permission layers.

Full Incident Breakdown:

On 23rd April 2024, 26 hours before the incident, we introduced a Proof of Work mechanism aimed at identifying counterfeit GPUs, as part of our ongoing weekly security enhancements. These aggressive security patches led the attackers to escalate their methods significantly, which in turn prompted continuous reviews and enhancements of our security protocols.

In this incident, the attackers authenticated to the workers-api using a device that granted the user a valid universal authentication token, which was historically identical across all recognized GPUs.

Previously, our system safeguarded against unauthorized changes by ensuring that metadata modifications (including user ID and device ID) could only be executed by the legitimate user associated with that device ID.

How it Happened:

A vulnerability in another API, designed to display content within the IO explorer, accidentally exposed user IDs when searching by device IDs. Malicious actors exploited this leak a few weeks ago, before we found and fixed it, and compiled the information into their own database. They saved the data in order to identify who had the most GPUs for the airdrop.
Later on, they discovered that a different API, the ‘worker-api’, which is called from within the worker routine crons, would update any device’s metadata if the owner’s userID was passed in the header, without requiring user-level authentication.
Approximately 26 hours before the incident, we activated the proof of work systems and other checks that blocked spoofed GPUs from the network.
When some of their accounts/devices were blocked by the proof of work checks, they used the leaked data to alter the metadata of devices belonging to other users, in retaliation for being blocked. It is also important to note that we haven’t yet finished flushing spoofed GPUs, so we expect more attacks as we complete the cleanup.
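To make the sequence above concrete, here is a minimal sketch of the class of flaw described: a shared token plus a caller-supplied user ID. Every name in it (the header names, the token value, the in-memory dictionaries) is hypothetical and chosen only for illustration; it is not the actual worker API code.

```python
# Hypothetical in-memory stand-ins for the device store and credentials.
UNIVERSAL_TOKEN = "shared-worker-token"  # assumed: same value on every recognized GPU
DEVICE_OWNERS = {"device-1": "user-alice", "device-2": "user-bob"}
DEVICE_METADATA = {"device-1": {"status": "up"}, "device-2": {"status": "up"}}


def update_metadata_flawed(headers: dict, device_id: str, changes: dict) -> bool:
    """Accept the write if the shared token is valid and the claimed user ID
    matches the device owner. The claimed user ID is never proven."""
    if headers.get("Authorization") != UNIVERSAL_TOKEN:
        return False
    claimed_user = headers.get("X-User-Id")  # hypothetical header name
    if DEVICE_OWNERS.get(device_id) != claimed_user:
        return False
    DEVICE_METADATA[device_id].update(changes)
    return True


# An attacker holds the shared token (via their own device) and knows from
# leaked data that device-1 belongs to user-alice.
attacker_headers = {"Authorization": UNIVERSAL_TOKEN, "X-User-Id": "user-alice"}
accepted = update_metadata_flawed(attacker_headers, "device-1", {"status": "down"})
```

In this sketch the ownership check passes because the user ID is only asserted by the caller, which is why a harvested mapping of device IDs to user IDs is enough to modify other users' device metadata.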

IO Explorer (API powering this UI was leaking userID):

Code where we have now removed the user information leakage
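One common way to prevent this kind of leak is to build public responses from an explicit allowlist of fields, so that internal identifiers are dropped by default. The sketch below is illustrative only; the field names and the `to_public_view` helper are assumptions, not the actual IO explorer code.

```python
# Hypothetical allowlist of fields that are safe to show publicly.
PUBLIC_DEVICE_FIELDS = ("device_id", "device_name", "status")


def to_public_view(record: dict) -> dict:
    """Return only allowlisted fields; anything else (e.g. user_id) is dropped."""
    return {key: record[key] for key in PUBLIC_DEVICE_FIELDS if key in record}


# Hypothetical internal record as it might be stored server-side.
internal_record = {
    "device_id": "device-1",
    "device_name": "gpu-a",
    "status": "up",
    "user_id": "user-alice",  # must never reach the public API response
}
public_record = to_public_view(internal_record)
```

An allowlist fails closed: a field added to the internal record later stays private until someone deliberately exposes it.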

Detection of the Issue:

On 25th April, an unusually high rate of write operations to our GPU metadata API triggered alerts at 1:05 AM PST. Our on-call engineer contacted our Dev team at 1:16 AM PST, and we began investigating the issue immediately.
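An alert of this kind can be approximated with a sliding-window counter over write timestamps. The following is a minimal sketch under assumed thresholds; the `WriteRateMonitor` class and its parameters are hypothetical and do not describe our actual alerting stack.

```python
from collections import deque


class WriteRateMonitor:
    """Flags when more than `max_writes` writes occur within `window_seconds`."""

    def __init__(self, max_writes: int, window_seconds: float):
        self.max_writes = max_writes
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_write(self, now: float) -> bool:
        """Record one write at time `now` (in seconds) and return True
        if the number of writes inside the window exceeds the threshold."""
        self._timestamps.append(now)
        cutoff = now - self.window_seconds
        # Drop writes that have fallen out of the sliding window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_writes


# Assumed threshold for illustration: more than 3 writes in 10 seconds.
monitor = WriteRateMonitor(max_writes=3, window_seconds=10.0)
alerts = [monitor.record_write(t) for t in (0.0, 1.0, 2.0, 3.0)]
```

Here the fourth write inside the same ten-second window is the first one to raise the flag.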

Actions Taken:

We commenced remedial actions within 10 minutes of identifying the issue. Despite blocking the originating IP, the malicious user circumvented this by IP hopping and leveraging the previously harvested user ID and device ID database.
As an immediate response, we temporarily halted the ability to update metadata, such as changing hostnames, to prevent misuse. This functionality was not business-critical, and access to the GPUs was always safe. However, the halt was necessary because the changes to the information shown on the front end had caused some unrest within the community and were the source of FUD.
Concurrently, we expedited the implementation of a user-specific authentication solution using Auth0 with OKTA. We had been working with OKTA for three weeks, and this rollout was planned to coincide with the launch of the new platform at the same time as the TGE. We completed this within hours, ahead of schedule, eliminating the vulnerability associated with the universal authorization token.

We also migrated existing user authentication data to the new system, a process that required six hours with our SaaS database product infrastructure (Supabase Postgres).
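The difference between a universal token and user-specific authentication can be illustrated with a small sketch. It uses an HMAC-signed token from the Python standard library as a stand-in; it is not Auth0 or OKTA code, the secret and function names are hypothetical, and a production token would also carry an expiry.

```python
import hashlib
import hmac
from typing import Optional

SERVER_SECRET = b"hypothetical-server-side-secret"  # assumption for the sketch


def _sign(user_id: str) -> str:
    return hmac.new(SERVER_SECRET, user_id.encode(), hashlib.sha256).hexdigest()


def issue_token(user_id: str) -> str:
    """Issue a token that is cryptographically bound to one user ID."""
    return f"{user_id}.{_sign(user_id)}"


def verify_token(token: str) -> Optional[str]:
    """Return the user ID if the signature is valid, otherwise None."""
    user_id, separator, signature = token.rpartition(".")
    if not separator:
        return None
    expected = _sign(user_id)
    if hmac.compare_digest(signature.encode(), expected.encode()):
        return user_id
    return None


def may_update(token: str, device_id: str, owners: dict) -> bool:
    """Allow the write only if the verified user owns the device."""
    user_id = verify_token(token)
    return user_id is not None and owners.get(device_id) == user_id


owners = {"device-1": "user-alice"}  # hypothetical ownership table
alice_token = issue_token("user-alice")
bob_token = issue_token("user-bob")
```

With a token bound to the user, knowing someone else's user ID is no longer sufficient: the caller must present a credential that the server issued for that specific user.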

Further Security Upgrades Completed:

We performed SQL injection checks on our APIs to strengthen security. All changes to public API endpoints are now subject to review by the security team and are pen-tested to prevent overexposure of sensitive data. We have also enhanced our logging of unauthorized attempts and initiated the development of a model to detect and respond to such anomalies more swiftly.

SQL injection handling done in various APIs
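As a generic illustration of SQL injection handling, the sketch below contrasts string-built SQL with a parameterized query. It uses Python's built-in `sqlite3` module as a stand-in database, and the table and function names are hypothetical; it does not reproduce our API code.

```python
import sqlite3

# In-memory stand-in database with a hypothetical devices table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE devices (device_id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO devices VALUES (?, ?)",
    [("device-1", "gpu-a"), ("device-2", "gpu-b")],
)


def find_device_unsafe(device_id: str) -> list:
    """Vulnerable: user input is spliced directly into the SQL text."""
    query = f"SELECT device_id, name FROM devices WHERE device_id = '{device_id}'"
    return conn.execute(query).fetchall()


def find_device_safe(device_id: str) -> list:
    """Parameterized: the driver passes the input as data, never as SQL."""
    query = "SELECT device_id, name FROM devices WHERE device_id = ?"
    return conn.execute(query, (device_id,)).fetchall()


payload = "x' OR '1'='1"
leaked_rows = find_device_unsafe(payload)  # condition becomes always true
safe_rows = find_device_safe(payload)      # payload is treated as a literal ID
```

The unsafe version returns every row for this payload, while the parameterized version returns nothing because no device has that literal ID.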

The security team will conduct thorough reviews and penetration tests on any future changes in any public endpoint to ensure robust defense against information leakage, SQL injection, and other vulnerabilities. Our ongoing efforts aim to detect and neutralize threats at an early stage, strengthening our overall security posture.

Report by:
Gaurav Sharma | CTO