On September 21, 2023, at 7:18 AM PT, the Inkling service that processes analytics events experienced a spike of abnormally heavy load. Inkling engineers responded by configuring the analytics application to temporarily store incoming events on the server instead of processing them immediately, which allowed the service to recover. By 9:45 AM PT, the analytics service was operating normally.
The events stored on the servers have since been replayed; their delivery was delayed but not otherwise affected. Throughout the roughly two-and-a-half-hour incident, the platform remained functional, although users may have experienced brief, intermittent slowdowns.
To prevent a recurrence, additional capacity has been added to the service. Engineers are also moving the processing of these events to a more robust system, already used by other platform components, that is designed to avoid this kind of load issue entirely.