API

99.96% uptime
Jul 2022: 100.0% uptime
Aug 2022: 99.89% uptime
Sep 2022: 100.0% uptime
Axis Web Reader
100.0% uptime
Jul 2022: 100.0% uptime
Aug 2022: 100.0% uptime
Sep 2022: 100.0% uptime
Inkling Web Reader
99.93% uptime
Jul 2022: 100.0% uptime
Aug 2022: 100.0% uptime
Sep 2022: 99.80% uptime

Habitat

99.88% uptime
Jul 2022: 100.0% uptime
Aug 2022: 100.0% uptime
Sep 2022: 99.65% uptime

InkForms

100.0% uptime
Jul 2022: 100.0% uptime
Aug 2022: 100.0% uptime
Sep 2022: 100.0% uptime

Learning Pathways

100.0% uptime
Jul 2022: 100.0% uptime
Aug 2022: 100.0% uptime
Sep 2022: 100.0% uptime
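
For readers who want to translate the monthly percentages above into time, here is a minimal sketch of the arithmetic. It assumes uptime is measured against the full calendar month; this page does not state how its figures are computed or rounded, so the result is only an approximation. The function name `downtime_minutes` is hypothetical.

```python
def downtime_minutes(uptime_percent: float, days_in_month: int) -> float:
    """Convert a monthly uptime percentage into approximate minutes of downtime.

    Assumption: uptime is measured over every minute of the calendar month.
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = days_in_month * 24 * 60
    return (100.0 - uptime_percent) / 100.0 * total_minutes


# September has 30 days; 99.65% is the Habitat figure listed above.
print(f"{downtime_minutes(99.65, 30):.1f} minutes")  # prints "151.2 minutes"
```

Because the published percentages are rounded to two decimals, and a month is 43,200 to 44,640 minutes long, each 0.01% corresponds to roughly four to five minutes.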

Notice history

Sep 2022

Habitat Performance Degradation
  • Resolved
    Update

    Around 9:03 AM PT on Tuesday, September 27, 2022, Inkling experienced an increased amount of traffic to our site. That traffic exceeded certain operating system limits on our external-facing load balancers, causing them to reject incoming requests even though the servers were not overloaded. The traffic increase was a result of increased customer usage of the Inkling platform. The load balancer outage prevented external traffic from entering the application environment, causing a site outage. During the outage, traffic would fail, connection counts would drop to acceptable levels, and then the connections would quickly fill up again, so services were intermittently and briefly available before going offline again. Inkling engineers diagnosed the problem and set about increasing the limits.

    The issue was compounded by the fact that our Analytics platform receives all traffic through these same load balancers. This system is designed to retry analytics events until they succeed, which caused a backlog of events trying to work their way through the system. These constant retries further increased the load and delayed our ability to resolve the problem. Inkling personnel drained these events slowly to relieve that pressure, which brought the site back up. To restore system operations in a timely manner and minimize customer impact, the Inkling team had to make some difficult choices that caused a very limited amount of data loss (see details below). Altogether, these issues resulted in 1 hour, 13 minutes of intermittent availability of Inkling services.

    As part of the issue resolution, Inkling personnel increased the capacity of the load balancer servers to allow for increased traffic and configured the operating system's network stack accordingly. Additionally, to prevent similar issues in the future, Inkling deployed new monitoring and metric visualizations to ensure the team is alerted well in advance of the system approaching operating system limits. Following the event, Inkling personnel reviewed additional operating system parameters related to connection capacity across all load balancers, made the changes needed to allow for utilization growth, upgraded the server configurations to a larger capacity, and added monitoring alerts. As part of the root cause analysis process, Inkling has created additional development tasks that will reduce the compounding effect of retry logic and eliminate the possibility of data loss.

    Details about events lost as a result of the outage:

    * The vast majority of lost events are utilization metrics (e.g., page views or click-throughs) emitted during the outage period. Customers should be aware that utilization metrics for the outage and recovery period (between 9:01 AM and 12:41 PM Pacific time) are incomplete. Unfortunately, despite our best efforts, these events are not recoverable.
    * For Learning Pathways users, no data was lost. A very small number of events (fewer than 10 across all customers) had not been recorded in the analytics database; the Inkling engineering team replayed these events on Thursday, 9/29/2022. At this time, all Learning Pathways data is synchronized across systems.
    * A subset of customers experienced a loss of assessment result events (submitted using the Assessment Widget). These customers will be contacted directly by Inkling support staff regarding the recovery timeline.
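
The update above describes analytics events being retried until they succeed, and the resulting load. One common way to soften that effect is capped exponential backoff with random jitter plus a bounded number of attempts. The sketch below is illustrative only and is not Inkling's implementation; `backoff_delays`, `send_with_retry`, and every parameter value are hypothetical.

```python
import random
import time


def backoff_delays(count, base=0.5, cap=60.0, rng=None):
    """Return `count` randomized delays (in seconds) for successive retries."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(count):
        # The ceiling doubles on each attempt but never exceeds `cap`.
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": a uniform draw in [0, ceiling] spreads clients apart.
        delays.append(rng.uniform(0, ceiling))
    return delays


def send_with_retry(send, event, max_attempts=5, sleep=time.sleep):
    """Call send(event) up to max_attempts times; return True on success."""
    delays = backoff_delays(max_attempts - 1)
    for attempt in range(max_attempts):
        if send(event):
            return True
        if attempt < max_attempts - 1:
            sleep(delays[attempt])  # wait before the next attempt
    # Give up rather than retry forever; the caller can store the event
    # durably and replay it once the system has recovered.
    return False
```

Bounding the attempts means a struggling endpoint sees a finite, spread-out stream of retries rather than a constant one, at the cost of needing somewhere durable to park events that exhaust their attempts.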

  • Resolved
    Resolved

    This incident has been resolved.

  • Identified
    Identified

    The issue has been identified and we are working towards a fix.

  • Investigating
    Investigating

    Inkling has received reports of Habitat performance degradation and our team is currently investigating.

Inkling Web Reader Performance Degradation
  • Resolved
    Update

    At 7:36 AM PT on September 9, 2022, a large burst of roughly 45,000 tasks was emitted by an internal system and entered the API task queue. The queueing system was not capable of holding those tasks in memory and became unresponsive. At first, Inkling tried offloading those tasks to a "dead" queue where no other service components would attempt to operate on them. This restored service to certain parts of the system, but all components that needed to work with the task queue continued to fail, because the queueing system was still holding that large collection of tasks in memory and still failing to interface with other parts of the system. It was decided that these tasks needed to be completely purged to restore operation. This was done, and service improved again. Engineering then monitored systems and restarted services that were shown not to be fully functional. Our systems show a total of 13 minutes and 58 seconds of downtime. Investigation continues into the source of these tasks and into how the task queueing system can be improved so that it is not overloaded by such a rare, high task count.
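
The failure described above, a queue trying to hold an unexpectedly large burst in memory, is often mitigated by giving the queue a hard size limit and rejecting work beyond it. The sketch below uses Python's standard-library in-process `queue.Queue` as a stand-in; the update does not name the queueing system involved, and the limit of 1,000 and the `enqueue_all` helper are hypothetical.

```python
import queue


def enqueue_all(task_queue, tasks):
    """Try to enqueue every task; return the tasks that were rejected."""
    rejected = []
    for task in tasks:
        try:
            # put_nowait raises queue.Full instead of blocking or growing.
            task_queue.put_nowait(task)
        except queue.Full:
            rejected.append(task)
    return rejected


task_queue = queue.Queue(maxsize=1000)  # hypothetical capacity limit
rejected = enqueue_all(task_queue, range(45_000))
print(task_queue.qsize(), len(rejected))  # prints "1000 44000"
```

With a bound in place, the producer learns immediately that the queue is full and can slow down, spill to disk, or drop work deliberately, while the queue itself stays responsive.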

  • Resolved
    Resolved

    This incident has been resolved.

  • Monitoring
    Monitoring

    The issue is resolved and Inkling engineers are continuing to monitor this closely.

Jul 2022 to Sep 2022