Building Resilient ExpressRoute Connectivity for Business Continuity and Disaster Recovery
As more organizations adopt Azure for their business-critical workloads, the connectivity between their on-premises networks and Microsoft becomes crucial. ExpressRoute provides private connectivity between on-premises networks and Microsoft. By default, an ExpressRoute circuit provides redundant network connections to the Microsoft backbone network and is designed for carrier-grade high availability. However, the availability of a network connection is only as good as the weakest link in its end-to-end path. Therefore, it is imperative that the customer and service provider segments of ExpressRoute connectivity are also architected for high availability.
Designing for high availability with ExpressRoute addresses these design considerations and explains how to architect robust end-to-end ExpressRoute connectivity between a customer on-premises network and the Microsoft network core. The document covers how to maximize the availability of an ExpressRoute circuit in general, as well as components specific to private peering and to Microsoft peering.
Private Peering High Availability
Every component of ExpressRoute connectivity must be built for high availability: the first mile from the on-premises network to the peering location, multiple circuits connecting to the same virtual network (VNet), and the virtual network gateway within the VNet.
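To make the weakest-link point concrete, here is a minimal sketch of end-to-end availability arithmetic. It assumes the segments fail independently, and every availability figure in it is a hypothetical placeholder, not a published Azure or provider SLA. Components that must all work are multiplied together (series), while redundant components fail only if all of them fail (parallel).

```python
from math import prod


def series(*availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def parallel(*availabilities):
    """Availability of redundant components where any single one suffices."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical per-segment availabilities, for illustration only.
first_mile = 0.999   # one customer/provider link to the peering location
circuit = 0.9995     # ExpressRoute circuit
gateway = 0.9995     # virtual network gateway

# A single first-mile link caps the whole path below its own availability.
single_path = series(first_mile, circuit, gateway)

# Duplicating only the weakest segment lifts the end-to-end figure.
redundant_first_mile = series(parallel(first_mile, first_mile), circuit, gateway)

print(f"single first mile:    {single_path:.6f}")
print(f"redundant first mile: {redundant_first_mile:.6f}")
```

Under these assumptions, the end-to-end figure can never exceed the availability of the weakest segment in series, which is why redundancy has to be added to the customer and provider segments as well as the Microsoft side.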
To improve the availability of the ExpressRoute virtual network gateway, Azure offers zone-redundant virtual network gateways that use Availability Zones. ExpressRoute also supports Bidirectional Forwarding Detection (BFD) to expedite link failure detection and thereby significantly improve Mean Time To Recover (MTTR) following a link failure.
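The effect of BFD on failure detection can be estimated with simple arithmetic. The sketch below assumes that, without BFD, a link failure is noticed only when the BGP hold timer expires, and that with BFD it is noticed after a small number of missed BFD packets. The timer values are assumptions chosen for illustration (a 60-second BGP keepalive with a hold time of three keepalives, and a 300-millisecond BFD interval with a multiplier of 3); check the values actually negotiated on your own devices.

```python
def detection_time_ms(interval_ms, multiplier):
    """Worst-case time to declare a neighbor down: missed packets x interval."""
    return interval_ms * multiplier


# Assumed timer values, for illustration only.
BGP_KEEPALIVE_MS = 60_000   # BGP keepalive interval
BGP_HOLD_MULTIPLIER = 3     # hold time expressed as a number of keepalives
BFD_INTERVAL_MS = 300       # BFD transmit/receive interval
BFD_MULTIPLIER = 3          # missed BFD packets before the session is down

bgp_ms = detection_time_ms(BGP_KEEPALIVE_MS, BGP_HOLD_MULTIPLIER)
bfd_ms = detection_time_ms(BFD_INTERVAL_MS, BFD_MULTIPLIER)

print(f"BGP hold timer detection: {bgp_ms / 1000:.1f} s")
print(f"BFD detection:            {bfd_ms / 1000:.1f} s")
print(f"Speed-up factor:          {bgp_ms // bfd_ms}x")
```

Detection time is only one part of MTTR; route reconvergence after the failure is detected adds to it and is not modeled here.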
Microsoft Peering High Availability
For Microsoft peering, where and how you implement Network Address Translation (NAT) affects the MTTR of Microsoft PaaS services (including O365) consumed over Microsoft peering following a connection failure. Path selection between the Internet and ExpressRoute on Microsoft peering is also critical to a highly reliable and scalable architecture.
ExpressRoute Disaster Recovery Strategy
What about architecting ExpressRoute connectivity for disaster recovery and business continuity? Is it possible to optimize ExpressRoute circuits in different regions both for local connectivity and as a backup when the ExpressRoute circuit in another region fails? In the following architecture, how do you ensure symmetrical cross-regional traffic flow, either via the Microsoft backbone or via the organization’s global connectivity (outside Microsoft)? Designing for disaster recovery with ExpressRoute private peering addresses these concerns and explains how to architect for disaster recovery using ExpressRoute private peering.
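One way to reason about local preference and failover between two circuits is with a toy route-selection model. The sketch below is a deliberately simplified assumption, not Azure's actual route selection logic: it prefers the longest matching prefix, then the higher weight, then the shorter AS path. The `Route` class, the circuit names, and the prefixes are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Route:
    prefix: str           # advertised prefix, e.g. "10.1.0.0/16"
    circuit: str          # hypothetical name of the circuit it was learned over
    weight: int = 0       # higher is preferred (simplified connection weight)
    as_path_len: int = 1  # shorter is preferred (models AS path prepending)


def select_route(routes, destination):
    """Pick a route: longest prefix, then highest weight, then shortest AS path."""
    dest = ip_address(destination)
    candidates = [r for r in routes if dest in ip_network(r.prefix)]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda r: (ip_network(r.prefix).prefixlen, r.weight, -r.as_path_len),
    )


# The same on-premises prefix is advertised over both circuits; the local
# circuit carries a higher weight so it is preferred while it is healthy.
routes = [
    Route("10.1.0.0/16", "circuit-local", weight=100),
    Route("10.1.0.0/16", "circuit-remote", weight=0),
]

normal = select_route(routes, "10.1.2.3")

# Simulate losing the local circuit: its routes are withdrawn.
after_failure = select_route(
    [r for r in routes if r.circuit != "circuit-local"], "10.1.2.3"
)

print("normal path:  ", normal.circuit)
print("after failure:", after_failure.circuit)
```

In this model, traffic stays symmetrical only if the on-premises side applies a matching preference for the return direction; if each side prefers a different circuit, the forward and return paths diverge.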
To build a robust ExpressRoute circuit, architect end-to-end ExpressRoute connectivity for high availability, maximizing redundancy and minimizing MTTR following a failure. A robust ExpressRoute circuit can withstand many single-point failures. However, to safeguard against disasters that impact an entire peering location, your disaster recovery plans should include geo-redundant ExpressRoute circuits. Failing over to geo-redundant ExpressRoute circuits poses challenges, including asymmetrical routing. The following documents help you architect a highly available ExpressRoute circuit and design for disaster recovery using geo-redundant ExpressRoute circuits.
- Designing for high availability with ExpressRoute
- Designing for disaster recovery with ExpressRoute private peering
- What are Availability Zones in Azure?
- About zone-redundant virtual network gateways in Azure Availability Zones
- Path selection between the Internet and ExpressRoute
- Configure BFD over ExpressRoute