So I decided to take a deep dive into eVPN, I’ll mostly be looking into VLAN-aware bundling, as per RFC 7432 – and mostly because I think this will fit more closely, with the types of deployments most of the customers are used to – good old IRB interfaces and bridge-tables!
As everyone knows, VPLS has been available for many years now and it’s pretty widely deployed, most of the customers I see have some flavour of VPLS configured on their networks and use it to good effect – so why eVPN? what’s the point in introducing a new technology if the current one appears to work fine.
The reality is that multipoint layer-2 VPNs (VPLS) were never quite as polished as layer-3 VPNs, when layer-3 VPNs were first invented they became, and still are the in many cases the “go to” technology for layer-3 connectivity across MPLS networks, and the technology itself hasn’t really changed that much for well over a decade. The same cannot be said for VPLS, over the years we’ve had many different iterations of the technology:
- Vanilla VPLS
- LDP signalled
- BGP signalled
- H-VPLS (hierarchical VPLS)
- BGP based
- LDP based
- VPLS auto-discovery
Along with the different types of VPLS, the technology itself has been repeatedly modified with hacks and patches, in order to get around some annoyingly simple problems, for example:
- VPLS auto-discovery is only supported under BGP signalling – you can’t do it if you’re using LDP signalled VPLS,
- H-VPLS – in order to get around the fully meshed psedudowire problem of vanilla VPLS, H-VPLS introduced a hierarchy, in order to cut down on the amount of pseduowires in large networks, unfortunately the design often ends up being cumbersome and complicated.
- mac-address learning – VPLS has no layer-2 control plane, it learns mac-addresses directly from the data-plane like a standard switch – which is fine if it’s taking place inside a single device, but across a large distributed network with many thousands of mac-addresses, a loss of any attachment circuit can result in stale forwarding state and slow convergence/recovery
- all-active CE-Multihoming – simply can’t do it in VPLS, single-homed only, which is a major pain for large-scale modern data centres with lots and lots of layer-2 connectivity
- Layer-3 integration – With VPLS it’s typical to use a BVI or IRB interface as the layer-3 gateway to a VLAN, however there’s no real integration between the layer-2 and layer-3 world, you still need VRRP for first hop redundancy – which comes with all the pain you’d expect (traffic black holding, complex tracking requirements, interface timers, etc)
The topology I’m going to use for this is shown below:
A few basic points about the network:
- The 3x “P” routers in the core of the network are Juniper M10i series, running nothing other than ISIS/LDP/MPLS
- The 3x “PE” routers, are Juniper MX5 – each with 14.1.R6.4 loaded on, connectivity is via a 20x1G MIC
- The 3x “EX4200” switches are doing nothing other than trunking VLAN 100 towards each MX-5
- Each IXIA port has a single host on VLAN 100
The first lab will look at eVPN with basic MPLS transport – this is essentially a replacement for vanilla VPLS, we have three sites each with a single switch – all in Vlan 100 on a common /24 subnet, nothing fancy going on, no layer-3 routing or bridging anywhere, this is all strictly layer-2 for now.
The first thing to note about eVPN is that the core of it is built around a BGP control-plane, no LDP or anything else, it’s BGP only which is great because we all love BGP, the first thing is to enable the evpn address family, (AFI 25 for L2VPN and the new of SAFI 70 evpn)
(Output taken from MX5-1, but identical on all 3 PEs, <except for IP addressing obviously>)
This essentially enables the evpn signalling which is essential, unlike VPLS there’s no manual provisioning of pseudowires, because there are no pseudowires, just like L3 VPNs everything is handled via BGP and uses the same route-distinguishers and route-targets that we’ve all come to love.
The configuration for this lab is pretty much identical across all three PEs but we’ll look at MX5-1 for this example, first the LAN facing interface:
Followed by the evpn routing-instance:
A few things to note about the routing-instance:
- Lines 4 and 5 mark the “RD” and “RT” which essentially the same as a standard L3VPN setup
- The routing-instance is of type “virtual-switch” and the bridge-domain sits inside it,
- This is essentially is configured the same as a VPLS virtual-switch, except with a different protocol.
Before we send any traffic or try to get any connectivity, lets take a look at the basic control-plane and exactly what sort of things BGP is getting up to, whilst things are simple.
You’ll notice that before we’ve sent any traffic or done anything, that we have two types of table under each established BGP peer:
- “bgp.evpn.0” for the core-facing BGP adjacency, (the same as regular L3VPN)
- “EVPN-100.evpn.0” for the routing-instance table, (again the same as regular L3VPN)
You’ll also notice that we’re receiving 1 route from each PE, for each table, if we investigate further and take a look:
Because everyone reading this has eyes like hawks 😉 you’ll immediately notice the strange looking /304 routes coming from each adjacent PE, let’s examine the first one:
The format is essentially: 3 : <RD> :: <VLAN-ID> :: <ROUTER-ID> /304
It also contains the “ROUTER-ID-LENGTH” which is obviously /32 however Juniper hides this from the output. It should be obvious to most people what all these values are, except for the “3” what does that mean?
It’s important to note, that evpn defines a set of route-route types as shown below:
- Type 1 – Ethernet auto-discovery route
- Type 2 – MAC/IP advertisement route
- Type 3 – Inclusive multicast Ethernet tag route
- Type 4 – Ethernet segment (ES) route
- Type 5 – IP prefix route
Type 3 routes are for signalling the inclusive tunnel, with VLAN-Aware evpn each PE generates a VLAN specific inclusive tunnel which is used for BUM (broadcast unknown multicast) traffic. Basically – it’s used to send BUM traffic to all PEs that have sites in the same VLAN, lets look at it in even more detail:
Line 6 shows the route-type as PMSI (provider multicast service interface) and is type “ingress-replication” one important thing to note – label 300512 is a downstream allocated label, the same as what’s commonly used in P2MP LSPs for multicast services. Essentially, in this case MX5-1 uses the remotely learnt service label to send BUM traffic to the remote PEs – OR, the other way round, it expects to receive BUM traffic from other remote PEs, tagged with IR label 300512.
Moving on – for people new to evpn, one of the coolest concepts is the way in which BGP is used to advertise mac-addresses… rather than plain old IP subnets – this is fantastic because we now have an intelligent control-plane maintained across the whole network in a scalable and stable fashion, rather than having to rely on less reliable data-plane learning.
For the first basic test, we’ll send bi-directional traffic between host connected to EX4200-1 on MX5-1 and the host connected to EX4200-2 on MX5-2
Lets recap the diagram and spin up some hosts:
We’ll start with a single host at each site, and send traffic both ways, 1Mbps each way for a total of 2Mbps, (the hosts are in the same /24 VLAN100 – 192.168.100.1 and 192.168.100.2)
Traffic is being forwarded end to end, lets check the routing and see how the control-plane has changed:
The type-3 routes are still present as before for the inclusive tunnels, but you’ll notice the addition of the new type-2 MAC/IP route, this is essentially a BGP NLRI containing a mac-address instead of an IP subnet – pretty cool huh?
The indirect route is the one learnt locally from the connected LAN, the one known via BGP/170 is the one from the remote PE, packets destined for that mac-address have label 299936 pushed on them, and are forwarded directly out of the MPLS facing core interface, like any regular MPLS packet.
Lets take a more detailed look at a type-2 route:
A basic recap on MPLS forwarding, for the above route MX5-1 is notifying all other PEs in the network, that if they receive a frame on an interface inside “EVPN-100” on VLAN 100 for destination MAC-address 00:00:0e:52:42:29, impose MPLS label 300048 and send it my way.
Another new aspect of evpn can be seen under the “ESI” field, “ESI” stands for “Ethernet segment identifier” essentially it’s a way of labelling individual Ethernet segments, but it’s only used for all-active multihomed designs, any other design it should remain the default of 0x0 (more on ESIs in the next blog)
To demonstrate the control-plane learning and MAC/IP advertisement mechanism more effectively, lets spin up all 3 sites with 50 hosts per site – then send a full mesh of traffic (150 streams in total) and see what the control-plane looks like,
Quick recap of the diagram showing all 3 sites, with 50 hosts per site:
Plenty of juicy MAC/IP routes!
Lots of MAC/IP routes 🙂
A quick look at the BGP table:
So yeah – it basically goes on and on,
Incidentally, what we gain in using more of the networks resources – we lose in scalability because you cannot get something for nothing. We all know that TCAM, forwarding-tables and BGP tables are limiting factors on even the largest routers, with evpn a very large amount of information is loaded into BGP (every single mac-address on the network) and because each mac-address is totally non-contiguous (different blocks for different vendor nics) they can’t be aggregated or summarised in any way.
If you had a data centre with 500k servers, you’d have 500k MAC/IP advertisements, which is a pretty large burden on the control-plane, in my own time I did some comparisons with tens of thousands of hosts on MX480 routers, with RE1800x4’s and high-end MPCs, and the results were not pretty on a very large network (more than 100k hosts) the control-plane learning was very laggy, and RE’s tended to suffer from very high CPU during the learning process, or if a failover occurred.
The evolution onwards from this is PBB-EVPN (provider backbone bridging EVPN) which essentially allows large numbers of hosts to be represented by a single mac-address, which enables absolutely enormous scalability (millions of hosts per site), at the expense of some feature loss – PBB-EVPNs will be the topic for another blog, where I can hopefully use IXIA to show hundreds of thousands of hosts connected!
Hope you found this useful, (if anyone even read it! 😀 )