Contact Us

EZRADIUS Resilience Update

Reliable Cloud RADIUS for Entra ID
05 Feb 2025

What was our Resiliency Plan Before the Incident?

Before this incident, we had taken into account that an outage could happen. Due to the sensitive nature of the service we provide, we have a multi-availability zone setup in all regions where we have customers. This means that if one availability zone goes down, the service will still be available in the other availability zones.

In addition to this, we have a caching system, which not only helps with the performance of the service but also mitigates the impact of an outage for our database provider or even Entra ID/Intune. Our services run with automated runners that check the health of the service every few seconds. If the service is not healthy, the runner will attempt to restart it (or, depending on the logs, take other actions known to restore the service). If the service remains unhealthy, the runner will send an alert to our on-call team.


What Happened on the 4th of February 2025?

On the 4th of February 2025, some customers experienced issues connecting to the EZRADIUS service. We first received reports from a customer that they were unable to connect to the service from some of their offices, while other offices were working fine.

Upon investigation, we confirmed that the service was up, running, and healthy. While on a call with that customer, we discovered that DownDetector was showing widespread reports of outages from different providers and cloud services. Further packet tracing from different locations revealed that there was an issue with packets reaching our service.

Overall, our analysis estimates that two-thirds of our U.S. customers were affected by this issue, along with about half of our Australian customers. Below, you can see the drastic drop in the number of requests to the service during the incident (in the graph, the red area represents accepted authentication requests, and the blue represents dropped authentication requests). As shown, there was a significant drop in packets between 4:30 PM and 5:45 PM EST before the service returned to normal levels.


EZRADIUS Authentication Requests during the incident

What Did We Learn from the Incident?

There were multiple key learnings from this incident:



What are We Doing to Prevent This From Happening Again?

Now that we have experienced this, we must learn from it and take steps to prevent it from happening again. Here are some of the measures we are implementing:

Closing Thoughts

The incident on the 4th of February 2025 was a wake-up call for us. We have learned valuable lessons and are taking steps to prevent a similar occurrence in the future.

We apologize for any inconvenience this may have caused and remain committed to providing the best possible service. If you have any questions or concerns, please do not hesitate to reach out to us.

Thank you for your understanding and continued support.


You Might Also Want to Read