Increased API error rate and delayed processing for a subset of customers
Incident Report for Iterable
Resolved
The incident has been resolved, and data processing for cluster 13 has caught up. We have seen no further 5xx API errors resulting from the incident. If you have any questions, please contact your account manager or email support@iterable.com.
Posted Dec 27, 2021 - 16:03 PST
Monitoring
The API has fully recovered, and impacted customers on cluster 13 should no longer experience 5xx errors related to this incident. Event processing is still recovering as we work through the backlog of data that was delayed during the downtime, so customers on cluster 13 may still experience delays. We will continue to monitor the platform and will resolve the incident once data processing is fully caught up.
Posted Dec 27, 2021 - 14:05 PST
Identified
We have identified the underlying cause, and the mitigations put in place by our engineering team have improved cluster health. Customers on the impacted cluster 13 are seeing event processing improve, and the number of 5xx API errors continues to drop. Please note that customers will need to retry API calls that failed with a 5xx error. We will continue to monitor the cluster as it progresses toward a full recovery.
Posted Dec 27, 2021 - 13:09 PST
Investigating
A subset of customers on cluster 13 are currently seeing an increase in API 5xx errors and data processing delays throughout the Iterable platform. Any data already enqueued will not be dropped, only delayed. Failed API calls (5xx) will need to be retried. Our engineering team is investigating the root cause and taking steps to mitigate the API failures. To identify which cluster you are on, please refer to your project settings page within the Iterable web app.
Posted Dec 27, 2021 - 12:41 PST
This incident affected: Cluster 13 (Workflow Processing, User Updates, List Uploads, User Deletions) and Global API.
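
The updates above note that API calls that failed with a 5xx error need to be retried by the caller. The following is a minimal sketch of one common way to do that, retrying with exponential backoff and jitter. It is not an official Iterable client: `retry_on_5xx`, its parameters, and the `send` callable are hypothetical names introduced for illustration, and `send` is assumed to perform one API request and return a `(status_code, body)` tuple.

```python
import random
import time
from typing import Callable, Tuple


def retry_on_5xx(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-5xx status or attempts run out.

    `send` is a hypothetical callable that performs a single API request
    and returns (status_code, body). The last response is returned either
    way, so the caller can still inspect a final 5xx.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Anything outside 500-599 (success or a client error) is not retried.
        if not 500 <= status <= 599:
            return status, body
        if attempt < max_attempts:
            # Exponential backoff capped at max_delay, with full jitter so
            # many clients do not retry in lockstep against a recovering API.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    return status, body
```

A caller would wrap its own request function, for example `retry_on_5xx(lambda: post_event(payload))`, where `post_event` is again a hypothetical helper. Retrying is safest for calls that can be repeated without side effects; whether a given call can be repeated safely depends on the endpoint and is not something this sketch checks.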