At 7:14 AM PT on Wednesday, November 23, 2022, an elevated volume of ordinary database queries generated a heavy disk write load on Inkling's user database. This load locked up an internal authentication service. Because other services depend on that authentication service, the failure cascaded, and those services became unable to process their load.
During this time, a self-healing mechanism in our database configuration prevented the database from becoming completely overloaded, which allowed it to recover. Inkling engineers responded by restarting the internal authentication service and our core internal API service. With the database recovered and these services restarted, the services came back up successfully, and the recovery then spread to the other affected services. In total, Inkling systems were affected for approximately 12 minutes.
Inkling Engineering is pursuing several options to prevent the issue from recurring, including optimizing database memory settings, refactoring the problematic query, and archiving unused data to reduce the load that query places on the database.