At 11:58 AM PT on April 9, 2024, a scaling event reduced the number of Inkling's main API instances in response to an unusual drop in traffic. When traffic increased again, the scaling logic added instances, but those took some time to come online. In the meantime, the remaining instances were overwhelmed, which led first to intermittent timeouts and ultimately to 8 minutes of platform downtime between 12:30 and 12:38 PM PT.
To mitigate the immediate problem, Engineering added capacity and restarted the core internal services. This restored the platform. The scaling logic has been updated to ensure sufficient capacity at all times.
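
For illustration only, one common way to keep capacity from dropping too far during a traffic lull is to combine a minimum instance floor, some headroom above measured load, and a limit on how many instances can be removed per scaling decision. The sketch below is not Inkling's actual scaling logic; the function name, parameters, and default values are all hypothetical.

```python
import math


def desired_instances(current, requests_per_sec, per_instance_rps,
                      min_instances=4, headroom=1.5, max_scale_in_step=1):
    """Return a target instance count (hypothetical policy, assumed values).

    current           -- instances running now
    requests_per_sec  -- measured incoming load
    per_instance_rps  -- load one instance is assumed to handle comfortably
    """
    # Instances needed to serve the measured load with spare headroom.
    needed = math.ceil(requests_per_sec * headroom / per_instance_rps)

    # Never go below the floor, even when traffic drops to nearly zero.
    target = max(needed, min_instances)

    # Scale in gradually so a brief lull cannot strip most of the fleet;
    # scale out is not limited, so capacity can be added immediately.
    if target < current:
        target = max(target, current - max_scale_in_step)

    return target
```

With these assumed defaults, a fleet of 10 instances that sees traffic fall sharply would shrink by only one instance per decision and would never go below 4, which leaves more capacity in place while new instances are still starting up.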