Increased API error rate and delayed processing for a subset of customers

Incident Report for Iterable

Resolved

The incident has been resolved, and data processing for cluster 13 has caught up. We have seen no further 5xx API errors resulting from the incident. If you have any questions, please contact your account manager or email support@iterable.com.

Posted Dec 27, 2021 - 16:03 PST

Monitoring

The API has fully recovered, and impacted customers on cluster 13 should no longer experience 5xx errors related to this incident. Event processing is still recovering as we work through the backlog of data delayed during the downtime, so customers on cluster 13 may still experience delays. We will continue to monitor the platform and will resolve the incident once data processing has fully caught up.

Posted Dec 27, 2021 - 14:05 PST

Identified

We have identified the underlying cause, and the mitigations put in place by our engineering team have improved cluster health. Customers on the impacted cluster 13 are seeing event processing improve, and the number of 5xx API errors continues to drop. Please note that customers will need to retry API calls that failed with a 5xx error. We will continue to monitor the cluster as it progresses toward a full recovery.

Posted Dec 27, 2021 - 13:09 PST

Investigating

A subset of customers on cluster 13 are currently seeing an increase in API 5xx errors and data processing delays throughout the Iterable platform. Any data already enqueued will not be dropped, only delayed. Failed API calls (5xx) will need to be retried. Our engineering team is investigating the root cause and taking steps to mitigate the API failures. To identify which cluster you are on, please refer to your project settings page within the Iterable web app.

Posted Dec 27, 2021 - 12:41 PST

This incident affected: Global API Success and Cluster 13 (Journey Processing, User Updates, List Uploads, User Deletions).
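
The updates above ask customers to retry API calls that failed with a 5xx error. One common way to do that is to retry with exponential backoff and jitter, so retries do not add load to a cluster that is still recovering. The following is a minimal sketch, not an official Iterable client: the function name `retry_on_5xx`, its parameters, and the `send` callable (which is assumed to perform one API request and return a status code and body) are all hypothetical.

```python
import random
import time
from typing import Callable, Tuple


def retry_on_5xx(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it returns a non-5xx status or attempts run out.

    `send` is a hypothetical caller-supplied function that performs one
    request and returns (status_code, body). The last response is returned
    even if it is still a 5xx, so the caller can log or queue it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Anything below 500 is not a server error; stop retrying.
        if status < 500 or attempt == max_attempts:
            return status, body
        # Exponential backoff capped at max_delay, with full jitter.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, delay))

    # Unreachable: the loop always returns on the final attempt.
    raise AssertionError("unreachable")
```

In practice `send` would wrap a single HTTP request made with whatever client the integration already uses. Automatic retries are safest for calls that can be repeated without side effects; for other calls, check whether the original request was applied before resending.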