This is the fourth post in my EVPN series. The previous posts covered the following topics:
- EVPN basics, route-types and basic L2 forwarding
- EVPN IRB and Inter-VLAN routing
- EVPN single-active multi-homing
This post covers EVPN’s ability to provide all-active multi-homing for layer-2 traffic, where the topology contains two active PE routers connecting to a switch via a LAG. The setup is similar to the previous labs. Due to some restrictions and in the interests of simplicity, this lab covers all-active multi-homing for a single VLAN only (VLAN 100 in this case). Consider the network topology:
The topology and general connectivity are the same as in the previous examples. The two big differences are that only VLAN 100 is present here, and that MX-1 and MX-2 now connect to EX4200-1 using MC-LAG.
The first consideration when running EVPN in all-active mode is that it must connect to the upstream devices using some sort of LAG or MC-LAG. Consider the wording from RFC 7432:
https://tools.ietf.org/html/rfc7432#section-14.1.2
“If a bridged network is multihomed to more than one PE in an EVPN network via switches, then the support of All-Active redundancy mode requires the bridged network to be connected to two or more PEs using a LAG.”
Essentially, this boils down to some basic facts about how switches work. You can’t have two different PE routers, spanning two different control planes, with active access interfaces presenting the same MAC address to the switch, for the simple reason that you’ll create a duplicate MAC address in the layer-2 network, which will cause a nightmare.
Consider the below scenario:
I tried this in a lab before I read the RFC, and discovered that EX4200-1 floods egress traffic to MX-1 and MX-2, resulting in lots of traffic duplication and flooding. Each time a packet with MAC address “X” lands on ge-0/0/0 or ge-0/0/1 from MX-1 or MX-2, the switch has to update its CAM table, so essentially the whole thing is broken, which explains the wording of the RFC in relation to all-active mode.
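To make the CAM-table problem concrete, here is a minimal Python sketch of a toy MAC table. It is only a model of the behaviour described above, not how the EX4200 is implemented, and the function name `count_mac_moves` is made up for illustration. It counts how often the entry for a MAC has to be rewritten when the same source MAC keeps arriving on two different ports, and compares that with the case where both links are members of one LAG.

```python
# Toy model of a switch MAC (CAM) table: MAC address -> port it was last seen on.
def count_mac_moves(frames):
    """Return how many times an already-known MAC shows up on a different port."""
    cam = {}
    moves = 0
    for src_mac, ingress_port in frames:
        previous = cam.get(src_mac)
        if previous is not None and previous != ingress_port:
            moves += 1  # the switch has to rewrite its entry for this MAC
        cam[src_mac] = ingress_port
    return moves


# Two independent links: MAC "X" arrives alternately from MX-1 and MX-2.
separate_links = [("X", "ge-0/0/0"), ("X", "ge-0/0/1")] * 3

# Same frames, but both links are members of one LAG, so the switch
# learns the MAC against the single logical port ae0.
single_lag = [("X", "ae0")] * 6

print("moves without LAG:", count_mac_moves(separate_links))
print("moves with LAG:   ", count_mac_moves(single_lag))
```

With the links bundled, the switch only ever sees one logical port, so there is nothing to flap.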
With Juniper, the way to get around this problem is simply to convert the Ethernet interfaces connecting to EX4200-1 to a basic MC-LAG configuration. We don’t need to configure ICCP or any serious multi-chassis configuration; we just need to make sure the LACP system-id is identical on MX-1 and MX-2, so that the EX4200 thinks it’s connected to a single downstream device.
Let’s check the LAG configuration on MX-1 and MX-2:
MX-1
tim@MX5-1> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}
MX-2
tim@MX5-2> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}
And finally on EX4200-1 we have a basic standard LAG configuration, with nothing fancy or sexy going on 🙂
EX4200-1
imtech@ex4200-1> show configuration interfaces ae0
aggregated-ether-options {
    lacp {
        active;
    }
}
unit 0 {
    family ethernet-switching {
        port-mode trunk;
        vlan {
            members vlan-100;
        }
    }
}

{master:0}
imtech@ex4200-1>
From the perspective of the EX4200, it’s just a totally standard LAG with two interfaces running LACP. So long as we have EVPN all-active configured correctly on MX-1 and MX-2, everything is taken care of.
EX4200-1 verification:
imtech@ex4200-1> show lacp interfaces
Aggregated interface: ae0
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-0/0/0       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/0     Partner    No    No   Yes  Yes  Yes   Yes     Fast   Passive
      ge-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Fast   Passive
    LACP protocol:        Receive State  Transmit State          Mux State
      ge-0/0/0                  Current   Fast periodic Collecting distributing
      ge-0/0/1                  Current   Fast periodic Collecting distributing

{master:0}
imtech@ex4200-1>
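The reason the identical LACP system-id matters can be sketched in a few lines of Python. This is a deliberately simplified model: real LACP also compares keys, port state and more, and the function name `can_bundle` and the second system ID (`00:00:00:00:00:02`) are hypothetical values used only for contrast. The point is that the switch will only aggregate member links whose LACP partner reports the same system ID.

```python
def can_bundle(partner_system_ids):
    """Simplified check: all member links must see the same LACP partner system ID."""
    return len(set(partner_system_ids.values())) == 1


# What EX4200-1 sees in this lab: MX-1 and MX-2 both advertise the
# system-id configured under ae0, so they look like one device.
lab_setup = {
    "ge-0/0/0": "00:00:00:00:00:01",  # link to MX-1
    "ge-0/0/1": "00:00:00:00:00:01",  # link to MX-2
}

# Hypothetical misconfiguration: each MX advertises its own system ID.
mismatched = {
    "ge-0/0/0": "00:00:00:00:00:01",
    "ge-0/0/1": "00:00:00:00:00:02",  # made-up value for illustration
}

print("lab setup bundles: ", can_bundle(lab_setup))
print("mismatched bundles:", can_bundle(mismatched))
```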
Aside from the fact that we’ve converted the access Ethernet interfaces to MC-LAG on MX-1 and MX-2, let’s see what’s changed in the EVPN configuration in order to get all-active EVPN working. First let’s check MX-1:
tim@MX5-1> show configuration routing-instances
EVPN-100 {
    instance-type virtual-switch;
    route-distinguisher 1.1.1.1:100;
    vrf-target target:100:100;
    protocols {
        evpn {
            extended-vlan-list 100;
            default-gateway do-not-advertise;
        }
    }
    bridge-domains {
        VL-100 {
            vlan-id 100;
            interface ae0.100;
            routing-interface irb.100;
        }
    }
}
VPN-100 {
    instance-type vrf;
    interface irb.100;
    route-distinguisher 100.100.100.1:100;
    vrf-target target:1:100;
    vrf-table-label;
}

tim@MX5-1>
The configuration is absolutely identical on MX-2. You’ll notice that the only thing which has changed on MX-1 is that the physical interface ge-1/1/5 has been replaced by the new LAG interface ae0.100 for VLAN 100; everything else is exactly the same as the previous single-active example from last week. Let’s take a closer look at the interface on MX-1:
tim@MX5-1> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}
It’s clear to see that under the interface ESI configuration, we’ve changed the ESI mode from single-active to “all-active”, which again should be self-explanatory to most readers 🙂 Note again that this configuration is 100% identical on both MX-1 and MX-2.
Let’s check the EVPN instance and see what’s changed since the single-active example:
tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                13      96
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode        Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active  Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode      MAC sync  IM route label
    100          1   1     Extended  Enabled   300432
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:             49
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE    MAC label  Aliasing label  Mode
        10.10.10.2   300416     300416          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300400
      Advertised aliasing label: 300400
      Advertised split horizon label: 300416

Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet Segment:                       1

tim@MX5-1>
So we can see that MX-1 has changed from single-active to all-active, and is in the Up/Forwarding state.
Let’s check MX-2 to see what it looks like:
tim@MX5-2> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.2:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                47      64
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode        Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active  Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode      MAC sync  IM route label
    100          1   1     Extended  Enabled   300528
  Number of neighbors: 2
    10.10.10.1
      Received routes
        MAC address advertisement:             14
        MAC+IP address advertisement:           1
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE    MAC label  Aliasing label  Mode
        10.10.10.1   300400     300400          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300416
      Advertised aliasing label: 300416
      Advertised split horizon label: 300432

Instance: __default_evpn__
  Route Distinguisher: 10.10.10.2:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.1
      Received routes
        Ethernet Segment:                       1

tim@MX5-2>
Excellent! Both MX-1 and MX-2 are in the Up/Forwarding state for VLAN 100, meaning that in theory they can both send and receive traffic on their access LAG interface and on the MPLS side. You’ll also notice how simple it is to get working.
I currently have 50 IXIA hosts sat behind MX-1 and MX-2, and a further 50 hosts sat behind MX-3. 50Mbps of traffic is being sent bidirectionally between each IXIA host. Let’s recap the diagram:
With an all-active configuration, traffic from multiple hosts at the top of the network should be sent towards MX-1 and MX-2 by EX4200-1 according to its standard LAG hashing algorithm (source/destination MAC). Because I have 100 hosts in total, there should be enough granularity at layer-2 to roughly distribute some traffic to MX-1 and some traffic to MX-2.
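As a rough illustration of why more hosts gives better distribution, the sketch below hashes 50 source/destination MAC pairs onto the two LAG members. Everything here is an assumption for illustration: the EX4200’s real hashing algorithm is not documented in this post, so CRC32 stands in for it, and the MAC addresses are made up. The only point is that each flow is pinned to one member link, and that the spread depends entirely on how many distinct MAC pairs there are.

```python
import zlib
from collections import Counter

MEMBERS = ["ge-0/0/0", "ge-0/0/1"]  # LAG member links on EX4200-1


def pick_member(src_mac, dst_mac, members=MEMBERS):
    """Pick a member link from a hash of the MAC pair (CRC32 is a stand-in)."""
    key = f"{src_mac}>{dst_mac}".encode()
    return members[zlib.crc32(key) % len(members)]


def make_flows(count=50):
    """Build hypothetical host pairs; these MACs are not the lab's real ones."""
    flows = []
    for i in range(count):
        src = f"00:00:66:00:00:{i:02x}"
        dst = f"00:00:2e:00:00:{i:02x}"
        flows.append((src, dst))
    return flows


usage = Counter(pick_member(src, dst) for src, dst in make_flows())
for member in MEMBERS:
    print(member, usage[member], "flows")
```

With only a handful of MAC pairs the split can easily be lopsided, which is exactly the polarisation risk that aliasing addresses on the MPLS side.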
Let’s send the IXIA traffic:
Now let’s look at the physical access interfaces on MX-1 and MX-2 to see how the traffic is being handled:
MX-1
tim@MX5-1> show configuration interfaces ge-1/1/5
gigether-options {
    802.3ad ae0;
}

tim@MX5-1> show interfaces ae0 | match pps
  Input rate     : 5404040 bps (484 pps)
  Output rate    : 10384856 bps (929 pps)
So that’s roughly 5Mbps in and 10Mbps out on MX-1.
Let’s check MX-2:
tim@MX5-2> show configuration interfaces ge-1/0/5
gigether-options {
    802.3ad ae0;
}

tim@MX5-2> show interfaces ae0 | match pps
  Input rate     : 19535296 bps (1750 pps)
  Output rate    : 14546816 bps (1302 pps)
So it seems to be working. MX-1 and MX-2 are both sending and receiving traffic in the same layer-2 broadcast domain.
Let’s check their MPLS-facing interfaces:
MX-1
tim@MX5-1> show isis adjacency
Interface             System         L State        Hold (secs) SNPA
ge-1/1/0.0            m10i-1         2  Up                   19

tim@MX5-1> show interfaces ge-1/1/0 | match pps
  Input rate     : 10415216 bps (930 pps)
  Output rate    : 5404040 bps (484 pps)

tim@MX5-1>
MX-2
tim@MX5-2> show isis adjacency
Interface             System         L State        Hold (secs) SNPA
ge-1/1/0.0            m10i-2         2  Up                   24

tim@MX5-2> show interfaces ge-1/1/0 | match pps
  Input rate     : 14583752 bps (1303 pps)
  Output rate    : 19535576 bps (1751 pps)

tim@MX5-2>
And so all seems right with the world: traffic from the MPLS network is being sent from MX-3 to both MX-1 and MX-2. Let’s look at the EVPN BGP control plane on MX-3 to see what’s going on with all-active. We’ll take a look at a slice of the BGP table for brevity:
2:1.1.1.1:100::100::00:00:66:cf:82:df/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:cf:82:e1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:cf:82:e3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:d0:5d:f3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.2:100::100::00:00:2e:18:6d:e1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:2e:18:f3:c4/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:66:cf:82:d1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:66:cf:82:d3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
You’ll notice that MX-3’s BGP EVPN table contains those good old type-2 MAC routes, but some of them are learnt from MX-1 and others from MX-2. That’s exactly what we want, and exactly what MX-3 needs in order for egress traffic to be sent towards MX-1 and MX-2 in the all-active fashion that we desire.
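If the route keys in that table look cryptic, this small Python sketch splits a MAC-only type-2 prefix, as displayed above, into its fields: route type, route distinguisher, Ethernet tag (VLAN 100 here) and MAC address. The function name `parse_mac_route` is my own, and the parser only handles the MAC-only form shown in this post; it is a reading aid, not a general EVPN route parser.

```python
def parse_mac_route(prefix):
    """Split a MAC-only type-2 route key as displayed in the table above."""
    body, _, prefix_len = prefix.rpartition("/")
    route_type, rest = body.split(":", 1)
    rd, eth_tag, mac = rest.split("::")
    return {
        "route_type": int(route_type),
        "rd": rd,
        "eth_tag": int(eth_tag),
        "mac": mac,
        "prefix_len": int(prefix_len),
    }


# Two prefixes taken from the MX-3 output above.
samples = [
    "2:1.1.1.1:100::100::00:00:66:cf:82:df/304",
    "2:1.1.1.2:100::100::00:00:2e:18:6d:e1/304",
]

for sample in samples:
    route = parse_mac_route(sample)
    print(route["mac"], "advertised with RD", route["rd"], "tag", route["eth_tag"])
```

The RD is what tells you which PE originated each MAC: 1.1.1.1:100 is MX-1 and 1.1.1.2:100 is MX-2, matching the instance output earlier.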
Remember that because EVPN maintains a forwarding-based layer-2 control plane, whether traffic from MX-3 goes to MX-1 or MX-2 depends on how EX4200-1 hashes egress traffic in the first place: a MAC is advertised by whichever PE learnt it from the switch. See the diagram below for an attempt at a better explanation:
But what happens if the EX4200 switch has a really rubbish hashing algorithm, or there’s no granularity, to the point where nearly all the traffic comes from MX-1 and hardly any comes from MX-2? You’d end up with traffic polarisation and really bad load-balancing. EVPN solves this problem by using an aliasing label.
MX-3, for example, has a full table of EVPN MAC routes, so it can load-balance traffic on a per-flow basis back to MX-1 and MX-2 by making use of the aliasing label. In the case of the IXIA hosts at the top of the network, they’re all being advertised with an ESI of 00:11:22:33:44:55:66:77:88:99, which means they’re all coming from the same place. This means MX-3 will simply treat the aliasing route as a normal MAC route and send the traffic anyway.
If there’s a failure somewhere on either MX-1 or MX-2, the aliasing label gets withdrawn and you’re left with MAC routes for one site only, which prevents the black-holing of traffic.
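Here is a minimal Python sketch of the aliasing idea from MX-3’s point of view. It is a conceptual model, not Junos internals: the data structure and the function name `aliasing_next_hops` are assumptions made for illustration, while the ESI and the aliasing labels (300400 for MX-1, 300416 for MX-2) are taken from the outputs earlier in this post. Any PE that advertises an auto-discovery route for the ESI is a valid next hop for every MAC on that segment, and withdrawing the route removes that PE from the set.

```python
ESI = "00:11:22:33:44:55:66:77:88:99"


def aliasing_next_hops(esi, ad_routes):
    """Return (PE, aliasing label) pairs for every PE advertising this ESI."""
    return sorted(
        (pe, labels[esi]) for pe, labels in ad_routes.items() if esi in labels
    )


# Auto-discovery state as a remote PE might hold it:
# PE loopback -> {ESI: aliasing label advertised by that PE}.
ad_routes = {
    "10.10.10.1": {ESI: 300400},  # MX-1
    "10.10.10.2": {ESI: 300416},  # MX-2
}

print("normal operation:", aliasing_next_hops(ESI, ad_routes))

# Simulate MX-1 losing its access link and withdrawing its route.
after_failure = {pe: labels for pe, labels in ad_routes.items() if pe != "10.10.10.1"}
print("after MX-1 fails:", aliasing_next_hops(ESI, after_failure))
```

So even if a given MAC was only ever advertised by MX-2, MX-3 can still spread flows towards it across both PEs while both are healthy.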
The last thing to consider is the concept of the “designated forwarder”. Let’s re-check the EVPN instance output from earlier on:
tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                13      96
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode        Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active  Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode      MAC sync  IM route label
    100          1   1     Extended  Enabled   300432
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:             49
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE    MAC label  Aliasing label  Mode
        10.10.10.2   300416     300416          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300400
      Advertised aliasing label: 300400
      Advertised split horizon label: 300416

Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet Segment:                       1

tim@MX5-1>
When running in all-active mode, it’s obvious that both PE routers are forwarding traffic, but it’s important to know that the PEs can only forward unicast traffic in an all-active fashion. When two PE routers discover each other on the same Ethernet segment via the MPLS network, using BGP Ethernet Segment (type-4) routes, they elect a “designated forwarder”. You can see the received Ethernet Segment route under the __default_evpn__ instance in the output above.
The primary role of the designated forwarder is to forward BUM (broadcast, unknown unicast and multicast) traffic towards the attached switch. It would be highly undesirable for both PEs to forward broadcasts, so only one is responsible for this in order to prevent traffic duplication.
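For the curious, the default election procedure described in RFC 7432 (section 8.5, often called service carving) is simple enough to sketch in Python. I’m assuming here that the lab is using that default procedure; the post’s output doesn’t say so explicitly, although the result happens to match it. The function name `elect_df` is my own. The PEs on the segment are ordered by IP address, and the DF for a given VLAN is the PE at position (VLAN mod number-of-PEs).

```python
import ipaddress


def elect_df(pe_addresses, vlan_id):
    """Default RFC 7432 style election: order PEs by IP, pick index (VLAN mod N)."""
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    return ordered[vlan_id % len(ordered)]


# The two PEs attached to ESI 00:11:22:33:44:55:66:77:88:99 in this lab.
segment_pes = ["10.10.10.2", "10.10.10.1"]

print("DF for VLAN 100:", elect_df(segment_pes, 100))
print("DF for VLAN 101:", elect_df(segment_pes, 101))
```

For VLAN 100 with two PEs, 100 mod 2 is 0, which selects 10.10.10.1 and lines up with the “Designated forwarder: 10.10.10.1” line in the output. An odd VLAN ID would land on 10.10.10.2 instead, which is how the procedure spreads the BUM forwarding role across PEs when there are many VLANs.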
Anyways, that’s about all I have time for tonight – I hope you found this useful!