Inkling Web Reader Performance Degradation

Resolved
Major outage
Started April 20, 2023. Lasted about 1 hour.

Affected

Inkling Web Reader
Updates
  • Resolved
    Update

    On April 20, 2023, Inkling services experienced an outage that began at 06:52 AM PT and lasted approximately 43 minutes. The root cause analysis follows.

    A change to the database schema for Inkling's central authentication service was applied overnight as the first step in delivering a new version of that software. Before the new software could be deployed, an unrelated transient issue arose involving events emitted by the service, and the authentication service went down at 06:52 AM PT.

    Engineering responded immediately by restarting the affected service, which would ordinarily have resolved the issue. However, on startup the service detected the mismatch between its own version and the updated database schema and attempted to self-heal. A security feature that prevents database tampering then caused the service to enter an infinite retry loop.

    After examining the system logs, Inkling personnel identified the root cause, rolled back the schema change so that the software could initialize normally, and restarted services again. This restored the platform after approximately 43 minutes of unavailability.

    To prevent a recurrence of similar issues, Inkling has identified several changes that are being implemented:

    * Add monitoring to detect services in rapid boot cycles, which indicate problems that may not be reported through our standard telemetry.
    * Improve the logging of the authentication service.
    * Automate and streamline the process around multi-step software changes and the hand-off between the development and DevOps teams.
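
    The first remediation item, detecting rapid boot cycles, could be approached as a sliding-window count of boot events per service. The sketch below is illustrative only and is not Inkling's monitoring implementation: the class `BootCycleDetector`, the service name `auth-service`, and the thresholds (more than 3 boots within 300 seconds) are hypothetical values chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BootCycleDetector:
    """Hypothetical sketch: flags a service that boots too often in a time window.

    The default threshold and window are illustrative assumptions,
    not Inkling's actual monitoring configuration.
    """

    max_boots: int = 3              # boots tolerated inside the window
    window_seconds: float = 300.0   # sliding-window length, in seconds
    _boots: dict = field(default_factory=dict, init=False, repr=False)

    def record_boot(self, service: str, timestamp: float) -> bool:
        """Record a boot event and return True if the service is boot-cycling.

        Assumes timestamps for a given service arrive in nondecreasing order.
        """
        boots = self._boots.setdefault(service, deque())
        boots.append(timestamp)
        # Discard boots more than window_seconds older than the newest one.
        while timestamp - boots[0] > self.window_seconds:
            boots.popleft()
        return len(boots) > self.max_boots


# Usage: a hypothetical service restarting every 30 seconds trips the
# alert on its fourth boot inside the 300-second window.
detector = BootCycleDetector(max_boots=3, window_seconds=300)
for t in (0, 30, 60, 90):
    alert = detector.record_boot("auth-service", t)
print(alert)  # True
```

    A detector like this works from boot events alone, so it can flag a crash-looping service even when, as noted above, the underlying problem is not reported through standard telemetry. A service that restarts only occasionally (for example, once an hour) never accumulates enough boots inside the window to raise an alert.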

  • Resolved

    This incident has been resolved.

  • Monitoring

    The issue is resolved, and Inkling engineers are continuing to monitor it closely.