On September 7, 2023, from 10:17 AM PT to 10:18 AM PT, performance issues in Inkling's central authentication service caused failures across the rest of the platform. Requests queued up at a load balancer and stalled until enough resources freed up to process them, resulting in approximately one minute of platform downtime.
Engineering teams have responded to these issues in the following ways:
- Many of the known performance issues have been isolated and corrected.
- Hardware capacity has been added to this service to help it handle peak demand.
- Additional performance monitoring tools have been installed to detect this type of issue going forward. These tools have already identified several places where the performance of this critical service can be improved.
- Engineering work to identify and resolve this type of problem has been prioritized.