Routing Engine Performance Issues

Resolved

We are pleased to inform you that all issues have been resolved.

11:30 UTC - 23 January 2020

After isolating the problematic traffic, we were able to restore stability to our network. We have made some performance enhancements and will be testing these and then re-merging this traffic with our core platform this coming week.

As of now, our platform is stable. We will keep this incident open until all of our changes have been completed.

11:12 UTC - 19 January 2020

Update: We were able to identify a key traffic profile which was responsible for causing the problems and have now isolated this onto its own separate infrastructure so we can analyse it in further detail to identify the underlying problem.

We are not anticipating any problems today, and if any occur we will be able to remove the traffic quickly. This incident will remain open until we have identified and fixed the problematic traffic and reintegrated it into our core platform.

10:33 UTC - 16 January 2020

We are very sorry and disappointed to report that we have a recurrence of the routing engine problem.

We are continuing to investigate this to identify the root cause. Please be assured that we are devoting all of our resources to this incident.

15:19 UTC - 15 January 2020

We have made progress in deploying additional infrastructure to handle the additional load. We have 2 new routing engines, 2 more capture servers and an additional database server.

Based on the data we have on this incident, the highest-risk period on our system appears to be around 14:00 UTC - 17:00 UTC, when load is high because both EU and USA traffic are running.

Note: EU and USA traffic should be running in separate zones and so should be independent of each other. We are continuing to investigate any commonalities between the two.

Our systems have been stable since 17:30 UTC yesterday and remain stable now, but we still consider them "at risk" until we can confirm that the changes we have made have addressed the problem.

We will continue to update you. Thank you for your understanding.

09:12 UTC - 15 January 2020

We have identified that the problem is related to a database bottleneck.

We have redistributed the load across additional secondary servers, and we will be deploying more secondary servers to ensure that we have enough capacity going forward.

Further updates will be issued as we progress in permanently resolving this incident.

We apologise for the inconvenience caused.

17:07 UTC - 14 January 2020

Ongoing

We are experiencing load issues on our call platform. We are working as fast as possible to rectify the situation. Sorry for the inconvenience caused.

16:14 UTC - 14 January 2020
