Investigating Incident

Resolved
Partial outage
Started 8 months ago
Lasted 39 minutes

Affected

Axis Web Reader

Partial outage from 5:05 PM to 5:44 PM

Inkling Web Reader

Partial outage from 5:05 PM to 5:44 PM

Habitat

Partial outage from 5:05 PM to 5:44 PM

Updates
  • Resolved
    Update

    On February 1, 2024, at approximately 3:00 AM PT, an automated AWS process intended to maintain data redundancy while replacing defective hardware placed a large amount of data on a single server, filling its storage volume. With the volume full, new user and attribute data could not be recorded. Inkling Engineering increased the storage capacity of these servers, which restored the ability to save incoming changes to this data. All jobs that failed during the incident were re-run to restore the correct data state and ensure proper distribution of Inkdocs. In total, the issue persisted for approximately 5 hours and 36 minutes.
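
    As a rough illustration of the failure mode described above, a storage volume approaching capacity can be detected with a simple usage check. This is a minimal sketch, not Inkling's tooling: the threshold values and the function names `used_percent`, `classify_usage`, and `check_volume` are hypothetical, and the report does not say which thresholds or monitoring system are actually in use.

```python
import shutil

# Hypothetical thresholds; the incident report does not state real values.
WARN_PERCENT = 80.0
CRITICAL_PERCENT = 90.0


def used_percent(total_bytes: int, used_bytes: int) -> float:
    """Return the share of a volume that is in use, as a percentage."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def classify_usage(percent: float) -> str:
    """Map a usage percentage to 'ok', 'warning', or 'critical'."""
    if percent >= CRITICAL_PERCENT:
        return "critical"
    if percent >= WARN_PERCENT:
        return "warning"
    return "ok"


def check_volume(path: str) -> str:
    """Classify the volume holding `path` using local filesystem statistics."""
    usage = shutil.disk_usage(path)
    return classify_usage(used_percent(usage.total, usage.used))
```

    For example, `check_volume("/")` classifies the root volume of the machine it runs on. A check like this only helps if a "critical" result reaches someone who can act on it.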


    Monitoring for this issue did trigger alerts, but they were set to a low priority, so they did not notify the on-call engineer. Engineering has raised the severity of some of these alarms to improve response time in the future. Inkling is also investigating several approaches to managing the size of this data, which would make this kind of routine automated operation more efficient.
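
    The alerting gap described above can be sketched as a severity-based routing rule. The severity levels, the paging threshold, the alert name, and the destinations `page-on-call` and `ticket-queue` are all assumptions made for illustration. The report does not describe how Inkling's alert routing is actually configured.

```python
from dataclasses import dataclass

# Hypothetical severity scale and paging rule, for illustration only.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}
PAGE_AT_OR_ABOVE = "high"


@dataclass(frozen=True)
class Alert:
    name: str
    severity: str


def should_page(alert: Alert) -> bool:
    """Return True when the alert's severity meets the paging threshold."""
    return SEVERITY_RANK[alert.severity] >= SEVERITY_RANK[PAGE_AT_OR_ABOVE]


def route(alert: Alert) -> str:
    """Send paging-level alerts to the on-call engineer, the rest to a queue."""
    return "page-on-call" if should_page(alert) else "ticket-queue"
```

    Under these assumptions, `route(Alert("storage-volume-full", "low"))` returns `"ticket-queue"`, so the alert fires without reaching the on-call engineer. Raising the same alert to `"high"` returns `"page-on-call"`.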

  • Resolved
    Resolved

    The incident has been resolved.

  • Monitoring
    Monitoring

    We have resolved the issue and are monitoring.

  • Identified
    Identified

    We have identified the problem and are working towards resolution.

  • Investigating
    Investigating

    We are currently investigating issues affecting Inkdoc assignment and the display of new users in the People tab in Habitat.