Cisco Datacenter: Inter-VXLAN Routing Design

Today I am going to talk about one of the data center topics: Inter-VXLAN routing design.

As in a traditional VLAN environment, routing between VXLAN segments, or from VXLAN to VLAN segments, is required in many situations. Because the current Cisco NX-OS releases (Release 6.1(2)I2(3) and earlier) don’t support VXLAN routing, specific designs need to be applied to achieve this network function.
Inter-VXLAN Routing Design Option A: Routing Block Design
The figure below depicts a VXLAN routing solution that adds a routing block to the Layer 3 pod network. The routing block uses a router-on-a-stick design consisting of a VTEP or a pair of vPC VTEPs to terminate the VXLAN tunnels, and one or a pair of routers that serve as the IP gateway for the VXLAN-extended VLANs and perform routing functions for these VLANs.

Fig 1.1 - Inter-VXLAN Routing
For Layer 2 traffic within a VXLAN VNI, the traffic will go directly between the local VTEP and the remote VTEPs. For Layer 3 routed traffic between VXLAN VNIs, the traffic will first reach the IP gateway of the source VXLAN VLAN IP subnet that is on the routers in the routing block and will be routed to the destination VXLAN VLAN IP subnet by the gateway router. 

The gateway router will then forward the packets back to the VTEP in the routing block for encapsulation in the destination VXLAN and forwarding toward the destination host. The logical traffic flow is shown in the figure below:

Fig 1.2
Routing Block Configuration
The routing block in the recommended design for VXLAN routing consists of a physical VTEP or vPC VTEP pair that converts VXLAN VNIs back to VLANs, and a router or a pair of routers that functions as the IP gateway for the VLAN IP subnets and routes between them. For device redundancy, redundant VTEP devices, such as a pair of Cisco Nexus 9300 platform switches as vPC VTEPs, and a pair of routers running a first-hop redundancy protocol such as Hot Standby Router Protocol (HSRP), are recommended.

Figure above shows a sample VXLAN routing block that is designed with two pairs of Cisco Nexus 9300 platform switches. One pair of Cisco Nexus 9300 platform switches functions as a vPC VTEP that maps between the VXLAN and VLAN. 

The second pair is the IP gateway for the VXLAN-extended VLANs. There is a double-sided vPC between the two pairs of switches for Layer 2 connectivity. A separate set of Layer 3 links can be installed for routing between the VXLAN VLANs and non-VXLAN VLANs or an IP network. The relevant configuration of the devices in the routing block is provided.
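A minimal sketch of the two roles in the routing block follows, using the flood-and-learn VXLAN configuration model of these NX-OS releases. The VLAN, VNI, multicast group, and IP addressing (VLAN 100, VNI 10100, 239.1.1.100, 10.1.100.0/24) are illustrative values, not taken from the original:

```
! vPC VTEP pair (both switches): map the VNI back to a VLAN
feature nv overlay
feature vn-segment-vlan-based

vlan 100
  vn-segment 10100                          ! VLAN 100 <-> VXLAN VNI 10100

interface nve1
  no shutdown
  source-interface loopback0
  member vni 10100 mcast-group 239.1.1.100  ! flood-and-learn via underlay multicast

! Gateway router (first of the HSRP pair): SVI for the VXLAN-extended VLAN
feature interface-vlan
feature hsrp

interface Vlan100
  no shutdown
  ip address 10.1.100.2/24
  hsrp 100
    ip 10.1.100.1                           ! virtual gateway IP for the VLAN subnet
```

The second router of the HSRP pair would carry the same SVI with its own physical address (for example 10.1.100.3/24) and the same HSRP virtual IP.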

Note: Because of a known software issue, the peer links of the vPC VTEPs and the Layer 2 links to the routers in the routing block can’t be on the 40 Gigabit Ethernet links of Cisco Nexus 9300 platform switches before Cisco NX-OS Release 6.1(2)I2(2a). This problem is fixed in Cisco NX-OS Release 6.1(2)I2(2a).
Inter-VXLAN Routing Design Option B: VTEP-on-a-Stick Design

One alternative design for inter-VXLAN routing is shown in Figure below. It has a VTEP-on-a-stick design, in which one or a pair of Cisco Nexus 9300 VTEPs is connected to the aggregation switches through a Layer 2 link and a Layer 3 link. 

The Layer 3 links are used to establish VXLAN tunnels with the in-rack VTEP access switches to extend the host VLANs across the Layer 3 network. The aggregation switches are configured with the host VLANs and switch virtual interfaces (SVIs) for their IP subnets. 

HSRP and Virtual Router Redundancy Protocol (VRRP) can be used to provide the first-hop redundancy with a Layer 2 link in place between the two aggregation switches. The Cisco Nexus 9300 VTEPs map the VXLAN VNIs back to VLANs and send the traffic over the Layer 2 links to the aggregation switches for inter-VLAN routing. 

After the packets are routed to the destination VLAN IP subnet, the aggregation switches will send the packets back to the Cisco Nexus 9300 VTEPs through the Layer 2 links for VXLAN encapsulation. The encapsulated packets will be forwarded to the destination rack through the underlay Layer 3 network. 
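The VTEP-on-a-stick switch combines the VNI-to-VLAN mapping with a Layer 2 trunk toward the aggregation switches, which hold the SVIs. A hedged sketch, with the same illustrative VLAN/VNI/multicast values as before and a hypothetical trunk port Ethernet1/1:

```
! VTEP-on-a-stick: terminate the VXLAN tunnel and hand the VLAN to aggregation
feature nv overlay
feature vn-segment-vlan-based

vlan 100
  vn-segment 10100                          ! VLAN 100 <-> VXLAN VNI 10100

interface nve1
  no shutdown
  source-interface loopback0                ! reached over the Layer 3 uplinks
  member vni 10100 mcast-group 239.1.1.100

interface Ethernet1/1
  description Layer 2 trunk to aggregation switches (inter-VLAN routing there)
  switchport mode trunk
  switchport trunk allowed vlan 100
```

Routed traffic thus hairpins: in on the Layer 3 link as VXLAN, out the Layer 2 trunk as VLAN traffic, and back again after the aggregation switches route it.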

In this design, the added Cisco Nexus 9300 VTEPs extend the host VLAN segments and bring them onto the aggregation switches. The aggregation switches are the centralized IP gateway for the VXLAN-extended VLANs.

Fig 1.3
The VTEP-on-a-stick design keeps the IP gateway of the VXLAN-extended VLANs on the aggregation switches, which preserves the IP gateway placement of the traditional Layer 2 data center pod. However, it may create obstacles to migrating the network to a spine-and-leaf fabric architecture in the future.

The routing block design, by contrast, makes it easier to transform the existing aggregation- and access-layer architecture into a true spine-and-leaf fabric, as shown in the figure below. This architecture truly enables Layer 2 adjacency across a routed (Layer 3) fabric.

Fig 1.4
Currently Cisco Nexus 9300 platform switches support only VXLAN gateway and bridging functions. A planned future release of Cisco NX-OS will bring the VXLAN routing function to the Cisco Nexus 9300 platform, which will greatly simplify the network design for inter-VXLAN routing.

In addition, Cisco is working on a BGP EVPN control plane for VXLAN. The current multicast-based VXLAN lacks a control plane and has to rely on flooding and learning to build the Layer 2 forwarding information base in the overlay network.

Multicast in the underlay network is used to support the overlay flood-and-learn behavior. The Cisco BGP EVPN control plane is standards based and does not depend on any fabric controllers. It will offer the following main benefits:
- Eliminate or reduce flooding in the data center
- Achieve optimal handling of multiple-destination traffic (broadcast, unknown unicast, and multicast) on overlay networks
- Provide reliable and quick address resolution and updates for hosts in VXLAN VNIs: essential to support workload mobility in the data center
- Provide a distributed anycast IP gateway for VXLAN overlay networks, enabling optimal VXLAN traffic routing across the Layer 3 network
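The distributed anycast gateway in the last point can be sketched with the configuration used on later, EVPN-capable NX-OS releases; the VLAN, virtual MAC, and addressing below are illustrative:

```
! Every leaf acting as the gateway shares the same virtual MAC and gateway IP
fabric forwarding anycast-gateway-mac 0000.2222.3333

interface Vlan100
  no shutdown
  ip address 10.1.100.1/24
  fabric forwarding mode anycast-gateway    ! identical gateway on each leaf
```

Because each leaf answers locally for the gateway address, routed traffic no longer hairpins through a centralized routing block or aggregation pair.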

VXLAN is a network virtualization technology. It uses MAC-in-UDP tunneling to build Layer 2 overlay networks across a Layer 3 infrastructure. This approach decouples the tenant network view from the shared common infrastructure, allowing organizations to build a scalable and reliable Layer 3 data center network while maintaining direct Layer 2 adjacency in the overlay network.