EVPN – All-active multihoming

This is the fourth post in my EVPN series; the previous posts covered the following topics:

  • EVPN basics, route-types and basic L2 forwarding
  • EVPN IRB and Inter-VLAN routing
  • EVPN single-active multi-homing

This post covers EVPN’s ability to provide all-active multi-homing for layer-2 traffic, where two PE routers are both active and connect to the same switch via a LAG. The setup is similar to the previous labs. Due to some restrictions, and in the interests of simplicity, this lab covers all-active multi-homing for a single VLAN only (VLAN 100 in this case). Consider the network topology:

[Image: Capture5]

The topology and general connectivity are the same as in the previous examples. The two big differences are that only VLAN 100 is present here, and that MX-1 and MX-2 now connect to EX4200-1 using MC-LAG.

The first consideration when running EVPN in all-active mode is that the PEs must connect to the attached switch using some sort of LAG or MC-LAG. Consider the wording from RFC 7432, section 14.1.2:


https://tools.ietf.org/html/rfc7432#section-14.1.2

“If a bridged network is multihomed to more than one PE in an EVPN network via switches, then the support of All-Active redundancy mode requires the bridged network to be connected to two or more PEs using a LAG.”

Essentially, this boils down to some basic facts about how switches work: you can’t have two PE routers, with two separate control planes, presenting the same MAC addresses to the switch on two independent active access interfaces, for the simple reason that the switch will see duplicate MAC addresses in the layer-2 network, which causes a nightmare.

Consider the below scenario:

[Image: Capture6]

I tried this in a lab before I read the RFC, and discovered that EX4200-1 floods egress traffic to both MX-1 and MX-2, resulting in lots of traffic duplication. The reason is simple: each time a frame with source MAC address “X” lands on ge-0/0/0 or ge-0/0/1 from MX-1 or MX-2, the switch has to update its CAM table to point “X” at a different port. So essentially the whole thing is broken, which explains the wording of the RFC in relation to all-active mode.
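To make the CAM table problem concrete, here is a minimal Python sketch of a toy switch that learns source MAC addresses per port. It is not how the EX4200 is implemented; the function name `simulate_cam` and the frame list are made up purely for illustration.

```python
from collections import Counter

def simulate_cam(frames):
    """Replay (source_mac, ingress_port) frames through a toy CAM table.

    Returns the final table and how many times each MAC moved ports.
    """
    cam = {}            # source MAC -> port it was last seen on
    moves = Counter()   # source MAC -> number of port changes
    for mac, port in frames:
        if mac in cam and cam[mac] != port:
            moves[mac] += 1      # the MAC "flapped" to a different port
        cam[mac] = port
    return cam, moves

# Without a LAG: MAC "X" arrives alternately from MX-1 and MX-2
frames = [("X", "ge-0/0/0"), ("X", "ge-0/0/1")] * 3
cam, moves = simulate_cam(frames)
print("no LAG:", dict(moves))

# With a LAG: both member links are one logical port, so nothing moves
lag_cam, lag_moves = simulate_cam([("X", "ae0")] * 6)
print("LAG:   ", dict(lag_moves))
```

In the first case the entry for “X” is rewritten on every frame after the first; in the second it never changes, which is the behaviour the LAG requirement in the RFC is there to guarantee.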

With Juniper, the way around this problem is simply to convert the Ethernet interfaces connecting to EX4200-1 to a basic MC-LAG configuration. We don’t need to configure ICCP or any serious multi-chassis configuration; we just need to make sure the LACP system-id is identical on MX-1 and MX-2, so that the EX4200 thinks it’s connected to a single downstream device.

Let’s check the LAG configuration on MX-1 and MX-2:

MX-1

tim@MX5-1> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}

 

MX-2

tim@MX5-2> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}

 

And finally on EX4200-1 we have a basic standard LAG configuration, with nothing fancy or sexy going on 🙂

EX4200-1

 

imtech@ex4200-1> show configuration interfaces ae0
aggregated-ether-options {
    lacp {
        active;
    }
}
unit 0 {
    family ethernet-switching {
        port-mode trunk;
        vlan {
            members vlan-100;
        }
    }
}

{master:0}
imtech@ex4200-1>

 

 

From the perspective of the EX4200, it’s just a totally standard LAG with two interfaces running LACP. So long as EVPN all-active is configured correctly on MX-1 and MX-2, everything is taken care of.

EX4200-1 verification:

imtech@ex4200-1> show lacp interfaces
Aggregated interface: ae0
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-0/0/0       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/0     Partner    No    No   Yes  Yes  Yes   Yes     Fast   Passive
      ge-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Fast   Passive
    LACP protocol:        Receive State  Transmit State          Mux State
      ge-0/0/0                  Current   Fast periodic Collecting distributing
      ge-0/0/1                  Current   Fast periodic Collecting distributing

{master:0}
imtech@ex4200-1>

 

Aside from converting the access Ethernet interfaces to MC-LAG on MX-1 and MX-2, let’s see what has changed in the EVPN configuration to get all-active working. First, MX-1:

tim@MX5-1> show configuration routing-instances
EVPN-100 {
    instance-type virtual-switch;
    route-distinguisher 1.1.1.1:100;
    vrf-target target:100:100;
    protocols {
        evpn {
            extended-vlan-list 100;
            default-gateway do-not-advertise;
        }
    }
    bridge-domains {
        VL-100 {
            vlan-id 100;
            interface ae0.100;
            routing-interface irb.100;
        }
    }
}
VPN-100 {
    instance-type vrf;
    interface irb.100;
    route-distinguisher 100.100.100.1:100;
    vrf-target target:1:100;
    vrf-table-label;
}

tim@MX5-1>

 

The configuration on MX-2 is identical apart from per-router values such as the route distinguisher (1.1.1.2:100 for EVPN-100 on MX-2). You’ll notice that the only thing which has changed on MX-1 is that the physical interface ge-1/1/5 has been replaced by the new LAG interface ae0.100 for VLAN 100. Everything else is exactly the same as the previous single-active example from last week. Let’s take a closer look at the interface on MX-1:

tim@MX5-1> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}

 

It’s clear to see that under the interface ESI configuration, we’ve changed the ESI mode from single-active to “all-active”, which again should be self-explanatory to most readers 🙂 Again, note that this interface configuration is 100% identical on both MX-1 and MX-2.
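Since the whole design relies on the ESI, the redundancy mode and the LACP system-id matching on both PEs, a quick offline check can be handy. The following is only a sketch: `lag_identity` is a hypothetical helper, and its regular expressions assume the exact `show configuration interfaces ae0` layout shown above.

```python
import re

def lag_identity(config_text):
    """Extract (ESI, redundancy mode, LACP system-id) from ae0 config text."""
    esi = re.search(r"esi\s*\{\s*((?:[0-9a-f]{2}:){9}[0-9a-f]{2});", config_text, re.I)
    mode = re.search(r"\b(all-active|single-active);", config_text)
    system_id = re.search(r"system-id\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2});", config_text, re.I)
    return (
        esi.group(1) if esi else None,
        mode.group(1) if mode else None,
        system_id.group(1) if system_id else None,
    )

# The relevant part of the ae0 configuration, pasted from each router
MX1_AE0 = """
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
"""
MX2_AE0 = MX1_AE0  # identical on MX-2 in this lab

if lag_identity(MX1_AE0) == lag_identity(MX2_AE0):
    print("ESI, mode and LACP system-id match:", lag_identity(MX1_AE0))
else:
    print("Mismatch between MX-1 and MX-2")
```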

Let’s check the EVPN instance and see what’s changed since the single-active example:

tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                13      96
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active       Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   300432
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:             49
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.2       300416     300416          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300400
      Advertised aliasing label: 300400
      Advertised split horizon label: 300416
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet Segment:                       1

tim@MX5-1>

 

So we can see that MX-1 has changed from single-active to all-active, and is in the Up/Forwarding state.

Let’s check MX-2 to see what it looks like:

tim@MX5-2> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.2:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                47      64
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active       Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   300528
  Number of neighbors: 2
    10.10.10.1
      Received routes
        MAC address advertisement:             14
        MAC+IP address advertisement:           1
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.1       300400     300400          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300416
      Advertised aliasing label: 300416
      Advertised split horizon label: 300432
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.2:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.1
      Received routes
        Ethernet Segment:                       1

tim@MX5-2>

 

Excellent! Both MX-1 and MX-2 are in the Up/Forwarding state for VLAN 100, meaning that, in theory, they can both send and receive traffic on their access LAG interface and on the MPLS side. You’ll also notice how simple it is to get working.

I currently have 50 IXIA hosts sat behind MX-1 and MX-2, and a further 50 hosts sat behind MX-3, with 50Mbps of traffic being sent bidirectionally between the IXIA hosts. Let’s recap the diagram:

[Image: Capture7]

With an all-active configuration, traffic from the hosts at the top of the network should be sent towards MX-1 and MX-2 by EX4200-1 according to its standard LAG hashing algorithm (source/destination MAC). Because I have 100 hosts in total, there should be enough granularity at layer 2 to get a rough distribution, with some traffic landing on MX-1 and some on MX-2.

Let’s send the IXIA traffic:
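As a rough illustration of why more MAC pairs give a better spread, here is a Python sketch of a source/destination MAC hash over a two-member LAG. The hash (XOR of the two addresses, modulo the number of links) is an assumption chosen for simplicity, not the EX4200’s actual algorithm, and the 50 source MACs are made up.

```python
from collections import Counter

def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)

def pick_member(src_mac, dst_mac, members):
    """Toy LAG hash: XOR the two MACs, then take it modulo the link count."""
    return members[(mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % len(members)]

members = ["ge-0/0/0", "ge-0/0/1"]      # links towards MX-1 and MX-2

# 50 made-up local hosts, all talking to one remote host
remote = "00:00:2e:18:6d:e1"
flows = [(f"00:00:66:cf:82:{i:02x}", remote) for i in range(50)]
spread = Counter(pick_member(src, dst, members) for src, dst in flows)
print("50 hosts:", dict(spread))

# A single MAC pair always hashes to the same link, however much it sends
single = Counter(pick_member(*flows[0], members) for _ in range(1000))
print("1 host:  ", dict(single))
```

With only one MAC pair every frame lands on the same member link, which is the lack of granularity that leads to polarisation.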

[Image: IXIA]

Now let’s look at the physical access interfaces on MX-1 and MX-2 to see how the traffic is being handled:

MX-1


tim@MX5-1> show configuration interfaces ge-1/1/5 
gigether-options {
 802.3ad ae0;
}

tim@MX5-1> show interfaces ae0 | match pps 
 Input rate : 5404040 bps (484 pps)
 Output rate : 10384856 bps (929 pps)

So roughly 5.4Mbps in and 10.4Mbps out on MX-1.

Let’s check MX-2:


tim@MX5-2> show configuration interfaces ge-1/0/5 
gigether-options {
 802.3ad ae0;
}

tim@MX5-2> show interfaces ae0 | match pps 
 Input rate : 19535296 bps (1750 pps)
 Output rate : 14546816 bps (1302 pps)

So it seems to be working: MX-1 and MX-2 are both sending and receiving traffic in the same layer-2 broadcast domain.

Let’s check their MPLS-facing interfaces:

MX-1


tim@MX5-1> show isis adjacency 
Interface System L State Hold (secs) SNPA
ge-1/1/0.0 m10i-1 2 Up 19

tim@MX5-1> show interfaces ge-1/1/0 | match pps 
 Input rate : 10415216 bps (930 pps)
 Output rate : 5404040 bps (484 pps)

tim@MX5-1>

MX-2


tim@MX5-2> show isis adjacency 
Interface System L State Hold (secs) SNPA
ge-1/1/0.0 m10i-2 2 Up 24

tim@MX5-2> show interfaces ge-1/1/0 | match pps 
 Input rate : 14583752 bps (1303 pps)
 Output rate : 19535576 bps (1751 pps)

tim@MX5-2>

 

And so all seems right with the world: traffic from the MPLS network is being sent from MX-3 to both MX-1 and MX-2. Let’s look at the EVPN BGP control plane on MX-3 to see what’s going on with all-active. We’ll take a look at a slice of the BGP table for brevity:

 

2:1.1.1.1:100::100::00:00:66:cf:82:df/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:cf:82:e1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:cf:82:e3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:d0:5d:f3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.2:100::100::00:00:2e:18:6d:e1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:2e:18:f3:c4/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:66:cf:82:d1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:66:cf:82:d3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960

 

 

You’ll notice that in MX-3’s BGP EVPN table, it’s receiving those good old type-2 MAC routes; however, some of them are learnt from MX-1 (10.10.10.1) and others from MX-2 (10.10.10.2). That is exactly what we want, and exactly what MX-3 needs in order for egress traffic to be sent towards MX-1 and MX-2 in the all-active fashion that we desire.

Remember that a PE only advertises a type-2 MAC route for a MAC address it has learnt locally from data-plane traffic, so whether MX-3 sends traffic for a given MAC to MX-1 or MX-2 depends on how EX4200-1 hashed that host’s egress traffic in the first place. See the diagram below for an attempt at a better explanation:
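If you want to see at a glance which PE advertised which MAC, the route prefixes are easy to pick apart. The sketch below assumes the prefix layout visible in the output above (route type, route distinguisher, Ethernet tag and MAC, separated by `::`, followed by the prefix length); `parse_mac_route` is a made-up helper name, and only four of the prefixes are included.

```python
from collections import defaultdict

def parse_mac_route(prefix):
    """Split '2:<RD>::<tag>::<MAC>/<len>' into (type, RD, tag, MAC)."""
    head, tag, tail = prefix.split("::")
    route_type, rd = head.split(":", 1)
    mac, _length = tail.split("/")
    return int(route_type), rd, int(tag), mac

prefixes = [
    "2:1.1.1.1:100::100::00:00:66:cf:82:df/304",
    "2:1.1.1.1:100::100::00:00:66:cf:82:e1/304",
    "2:1.1.1.2:100::100::00:00:2e:18:6d:e1/304",
    "2:1.1.1.2:100::100::00:00:66:cf:82:d1/304",
]

macs_by_rd = defaultdict(list)
for prefix in prefixes:
    route_type, rd, tag, mac = parse_mac_route(prefix)
    if route_type == 2:              # MAC advertisement routes only
        macs_by_rd[rd].append(mac)

for rd, macs in sorted(macs_by_rd.items()):
    print(rd, macs)
```

Grouping by route distinguisher works here because MX-1 and MX-2 use different RDs (1.1.1.1:100 and 1.1.1.2:100) for the same EVPN instance.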

[Image: Capture8]

 

But what happens if the EX4200 switch has a really rubbish hashing algorithm, or there’s no granularity, to the point where nearly all the traffic arrives via MX-1 and hardly any via MX-2? MX-3 would learn almost every MAC from MX-1, and you’d end up with traffic polarisation and really bad load-balancing in the return direction. EVPN solves this problem by using an aliasing label.

MX-3, for example, has a full table of EVPN MAC routes, so it can load-balance traffic on a per-flow basis back to MX-1 and MX-2 by making use of the aliasing label. In the case of the IXIA hosts at the top of the network, they’re all being advertised with an ESI of 00:11:22:33:44:55:66:77:88:99, which tells MX-3 that they all sit behind the same Ethernet segment. Because both MX-1 and MX-2 advertise an aliasing label for that ESI, MX-3 can send traffic for any of those MAC addresses to either PE, even if the MAC route itself was only learnt from one of them.

If there’s a failure somewhere on either MX-1 or MX-2, that PE’s aliasing label gets withdrawn and MX-3 is left with routes towards the surviving PE only, which prevents the black-holing of traffic.
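The aliasing idea can be sketched in a few lines of Python. This is a simplified model of what a remote PE such as MX-3 does, not Junos behaviour: the data structures, the function names `candidate_pes` and `pick_pe`, and the CRC-based flow hash are all assumptions made for illustration.

```python
import zlib

def candidate_pes(mac, mac_routes, aliasing_routes):
    """PEs that a remote PE may use to reach `mac`.

    mac_routes:      MAC -> (advertising PE, ESI), from type-2 routes
    aliasing_routes: ESI -> set of PEs advertising an aliasing label for it
    """
    _advertising_pe, esi = mac_routes[mac]
    # Every PE attached to the segment is usable, not just the advertiser
    return sorted(aliasing_routes.get(esi, set()))

def pick_pe(src_mac, dst_mac, pes):
    """Per-flow choice: the same MAC pair always maps to the same PE."""
    return pes[zlib.crc32(f"{src_mac}>{dst_mac}".encode()) % len(pes)]

ESI = "00:11:22:33:44:55:66:77:88:99"
mac_routes = {"00:00:66:cf:82:df": ("10.10.10.1", ESI)}   # learnt from MX-1 only
aliasing_routes = {ESI: {"10.10.10.1", "10.10.10.2"}}

print(candidate_pes("00:00:66:cf:82:df", mac_routes, aliasing_routes))

# MX-1 fails and withdraws its aliasing route: only MX-2 is left
aliasing_routes[ESI].discard("10.10.10.1")
print(candidate_pes("00:00:66:cf:82:df", mac_routes, aliasing_routes))
```

The point of the model is that the set of usable PEs is keyed on the ESI rather than on whichever PE happened to advertise the MAC, so a poor hash on the switch no longer dictates the return path.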

 

The last thing to consider is the concept of the “designated forwarder”. Let’s re-check the EVPN instance output from earlier on:

tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                13      96
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active       Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   300432
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:             49
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.2       300416     300416          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300400
      Advertised aliasing label: 300400
      Advertised split horizon label: 300416
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet Segment:                       1

tim@MX5-1>

 

When running in all-active mode, it’s obvious that both PE routers are forwarding traffic, but it’s important to know that both PEs can only forward unicast traffic in an all-active fashion. When two PE routers discover, via BGP Ethernet Segment (type-4) routes exchanged over the MPLS network, that they are attached to the same Ethernet segment, they elect a “designated forwarder”. This is the Ethernet Segment route counted under the __default_evpn__ instance in the output above.

The primary role of the designated forwarder is to forward BUM (broadcast, unknown unicast and multicast) traffic arriving from the MPLS network onto the Ethernet segment. It would be highly undesirable for both PEs to forward the same broadcasts to the switch, so only one is responsible for this, in order to prevent traffic duplication.
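RFC 7432 (section 8.5) describes a default election procedure: the PEs attached to the segment are ordered by IP address in ascending order, numbered from 0, and for VLAN V with N PEs the designated forwarder is the PE at position V mod N. The Python sketch below implements just that rule; `elect_df` is a made-up name, and I’m assuming Junos follows the default procedure here, which is at least consistent with the output above.

```python
import ipaddress

def elect_df(pe_addresses, vlan_id):
    """Default DF election: sort PEs by IP address, pick index (VLAN mod N)."""
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    return ordered[vlan_id % len(ordered)]

pes = ["10.10.10.2", "10.10.10.1"]      # MX-2 and MX-1, in any order

for vlan in (100, 101):
    print(f"VLAN {vlan}: designated forwarder is {elect_df(pes, vlan)}")
```

For VLAN 100 with two PEs, 100 mod 2 is 0, so the lowest address (10.10.10.1, MX-1) wins, matching the “Designated forwarder: 10.10.10.1” line. With several VLANs on the segment, this rule would spread the DF role across the PEs.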

Anyways, that’s about all I have time for tonight – I hope you found this useful!

18 thoughts on “EVPN – All-active multihoming”

  1. Thanks for those great articles on EVPN, found them very useful! Any Twitter username so I can give you credit when sharing those links?

  2. Thank you for your awesome blog posts!

    Will there be another blog about EVPN – All-active multihoming with multiple VLANs?

  3. Regarding the question about all-active for interVLAN traffic, that is NOT implemented in the current version, at least that is what I was informed by Juniper Engineers.
    Regards Alexander

  4. Great article, thanks! 🙂
    What will happen to ARP requests (broadcast traffic) received on MX-2 (Backup Forwarder)? I guess it would drop this BUM traffic. What is the recommended configuration to handle this problem?

    1. Hi thanks! because the scenario is dealing with layer-2 forwarding only, only the designated forwarder will transmit the arp broadcast to other sites in the EVPN, this shouldn’t be a problem as the ARP broadcast will still reach all sites and ARP resolution will be successful, it just won’t be amplified by both all-active PEs sending a single arp request – if that makes any sense 🙂

      1. But since there is no guarantee that the ARP request will be received on the designated forwarder, you could have a situation, where only the backup forwarder receive the ARP request over the MC-LAG-AE0. I guess the backup forwarder will drop this broadcast traffic, right? Maybe the ‘evpn multicast ingress-replication’ needs to be enabled to solve this problem, but I’m not sure if that would give other issues..

  5. Hi,

    I am implementing EVPN on almost the same topology of you, but I am receiving two copies of each packet from MX-3 to MX-1 and MX-2, if i send 1M traffic from tester connected to MX-3 I see 2M traffic on MX-3’s core/P/uplink.

    can you suggest what could be the reason, i have configured system-id under lacp though.

    thanks,

    1. I see exactly the same!!! For the life of me I cannot see why. My setup is almost the same, but without the dedicated ‘P’ routers. When pinging between my MBPs, I see duplicates of each packet. Really stuck and cannot find any information to help. Any help is much appreciated. Using Junos 16.2R1.6 on three MX240s, and have used Cisco and Juniper EX switches as CEs.

      1. In case anyone ends up here searching for the answer you need to ensure you have per-packet load balancing in place, otherwise the kernel is not able to install the next-hop list correctly and causes packet duplication.

    1. lol – last time we spoke, I was leaving Axians to go to Arbor – that didn’t go so well…… during that I ended up having a conversation with Riot games (who make league of legends) and started there this week! tonnes of Juniper and Alcatel, so I feel much more at home :p going awesome so far!
