EVPN – All-active multihoming

So this is the fourth blog on EVPN; the previous posts covered the following topics:

  • EVPN basics, route-types and basic L2 forwarding
  • EVPN IRB and Inter-VLAN routing
  • EVPN single-active multi-homing

This post covers EVPN's ability to provide all-active multi-homing for layer-2 traffic, where the topology contains two active PE routers connecting to a switch via a LAG. The setup is similar to the previous labs. Due to some restrictions, and in the interests of simplicity, this lab covers all-active multi-homing for a single VLAN only (VLAN 100 in this case). Consider the network topology:

Capture5

The topology and general connectivity are the same as in the previous examples; the two big differences are that only VLAN 100 is present here, and that MX-1 and MX-2 now connect down to the switch using MC-LAG.

The first consideration when running EVPN in all-active mode is that the multihomed access device must connect to the PEs using some sort of LAG or MC-LAG – consider the wording from RFC 7432:


https://tools.ietf.org/html/rfc7432#section-14.1.2

“If a bridged network is multihomed to more than one PE in an EVPN network via switches, then the support of All-Active redundancy mode requires the bridged network to be connected to two or more PEs using a LAG.”

Essentially, this boils down to some basic facts about how switches work – you can't have two different PE routers with active access interfaces using the same MAC address, spanning two different control-planes, for the simple reason that you'll create a duplicate MAC address in the layer-2 network, which causes a nightmare.

Consider the below scenario:

Capture6

I tried this in a lab before I read the RFC and discovered that EX4200-1 floods egress traffic to MX-1 and MX-2, resulting in lots of traffic duplication and flooding, simply because every time a frame with MAC address "X" lands on ge-0/0/0 or ge-0/0/1 from MX-1 or MX-2, the switch has to update its CAM table. Essentially the whole thing is broken – which explains the wording of the RFC in relation to all-active mode.
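
To picture why it breaks, here's a rough Python sketch (purely illustrative – the MAC address and frame sequence are made up) of what EX4200-1's CAM table goes through when the same source MAC keeps arriving on both member ports:

# Toy model of a switch CAM (MAC learning) table. Every frame re-learns the
# source MAC against its ingress port, so a MAC that keeps appearing on two
# different ports flaps back and forth and traffic gets flooded/duplicated.

cam_table = {}          # mac -> port
flap_count = 0

# Hypothetical frames: the same host MAC arrives via MX-1 (ge-0/0/0)
# and via MX-2 (ge-0/0/1) because both PEs are active for the same VLAN.
frames = [
    ("00:00:5e:00:53:01", "ge-0/0/0"),
    ("00:00:5e:00:53:01", "ge-0/0/1"),
    ("00:00:5e:00:53:01", "ge-0/0/0"),
    ("00:00:5e:00:53:01", "ge-0/0/1"),
]

for mac, port in frames:
    previous = cam_table.get(mac)
    if previous is not None and previous != port:
        flap_count += 1
        print(f"MAC {mac} moved {previous} -> {port} (flap #{flap_count})")
    cam_table[mac] = port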

With Juniper, the way around this problem is simply to convert the Ethernet interfaces connecting to EX4200-1 to a basic MC-LAG configuration. We don't need to configure ICCP or any serious multi-chassis configuration – we just need to make sure the LACP system-id is identical on MX-1 and MX-2, so that the EX4200 thinks it's connected to a single downstream device.
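
As a rough illustration of why the shared system-id matters – this is a sketch of the check from the switch's point of view, not how the EX4200 actually implements LACP – the switch will only bundle member links whose partner advertises the same system-id:

# Minimal sketch of the aggregation check from the switch's point of view:
# member links can only join the same LAG if every partner advertises an
# identical LACP system-id (the real protocol also checks keys, speeds, etc.).
# The values below mirror the lab: both MX routers spoof 00:00:00:00:00:01.

from dataclasses import dataclass

@dataclass
class LacpPartner:
    link: str
    system_id: str

partners = [
    LacpPartner("ge-0/0/0", "00:00:00:00:00:01"),   # towards MX-1
    LacpPartner("ge-0/0/1", "00:00:00:00:00:01"),   # towards MX-2
]

def can_bundle(partners):
    """True if all member links appear to terminate on a single LACP system."""
    return len({p.system_id for p in partners}) == 1

print("ae0 forms with both members:", can_bundle(partners))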

Let's check the LAG configuration on MX-1 and MX-2:

MX-1

tim@MX5-1> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}

 

MX-2

tim@MX5-2> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}

 

And finally on EX4200-1 we have a basic standard LAG configuration, with nothing fancy or sexy going on 🙂

EX4200-1

 

imtech@ex4200-1> show configuration interfaces ae0
aggregated-ether-options {
    lacp {
        active;
    }
}
unit 0 {
    family ethernet-switching {
        port-mode trunk;
        vlan {
            members vlan-100;
        }
    }
}

{master:0}
imtech@ex4200-1>

 

 

From the perspective of the EX4200 it's just a totally standard LAG with two interfaces running LACP; as long as we have EVPN all-active configured correctly on MX-1 and MX-2, everything else is taken care of.

EX4200-1 verification:

imtech@ex4200-1> show lacp interfaces
Aggregated interface: ae0
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-0/0/0       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/0     Partner    No    No   Yes  Yes  Yes   Yes     Fast   Passive
      ge-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Fast   Passive
    LACP protocol:        Receive State  Transmit State          Mux State
      ge-0/0/0                  Current   Fast periodic Collecting distributing
      ge-0/0/1                  Current   Fast periodic Collecting distributing

{master:0}
imtech@ex4200-1>

 

Aside from the fact that we've converted the access Ethernet interfaces to MC-LAG on MX-1 and MX-2, let's see what's changed in the EVPN configuration in order to get all-active EVPN working. First, MX-1:

tim@MX5-1> show configuration routing-instances
EVPN-100 {
    instance-type virtual-switch;
    route-distinguisher 1.1.1.1:100;
    vrf-target target:100:100;
    protocols {
        evpn {
            extended-vlan-list 100;
            default-gateway do-not-advertise;
        }
    }
    bridge-domains {
        VL-100 {
            vlan-id 100;
            interface ae0.100;
            routing-interface irb.100;
        }
    }
}
VPN-100 {
    instance-type vrf;
    interface irb.100;
    route-distinguisher 100.100.100.1:100;
    vrf-target target:1:100;
    vrf-table-label;
}

tim@MX5-1>

 

The configuration is absolutely identical on MX-2. You'll notice that the only thing that has changed on MX-1 is that the physical interface ge-1/1/5 has been replaced by the new LAG interface ae0.100 for VLAN 100; everything else is exactly the same as the previous single-active example from last week. Let's take a closer look at the interface on MX-1:

tim@MX5-1> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}

 

It's clear to see that under the interface ESI configuration we've changed the ESI mode from single-active to "all-active", which should be self-explanatory to most readers 🙂 and again, note that this configuration is 100% identical on both MX-1 and MX-2.

Let's check the EVPN instance and see what's changed since the single-active example:

tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                13      96
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active       Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   300432
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:             49
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.2       300416     300416          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300400
      Advertised aliasing label: 300400
      Advertised split horizon label: 300416
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet Segment:                       1
tim@MX5-1>

 

So we can see that MX-1 has changed from single-active to all-active, and its local interface is in the Up/Forwarding state.

Let's check MX-2 to see what it looks like:

tim@MX5-2> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.2:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                47      64
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active       Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   300528
  Number of neighbors: 2
    10.10.10.1
      Received routes
        MAC address advertisement:             14
        MAC+IP address advertisement:           1
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.1       300400     300400          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300416
      Advertised aliasing label: 300416
      Advertised split horizon label: 300432
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.2:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.1
      Received routes
        Ethernet Segment:                       1
tim@MX5-2>

 

Excellent! Both MX-1 and MX-2 are in the Up/Forwarding state for VLAN 100, meaning that in theory they can both send and receive traffic on their access LAG interface and on the MPLS side – you'll also notice how simple it is to get working.

I currently have 50x IXIA hosts sat behind MX-1 and MX-2, and a further 50x hosts sat behind MX-3, with 50Mbps of traffic being sent bidirectionally between the IXIA hosts. Let's recap the diagram:

Capture7

With an all-active configuration, traffic from the hosts at the top of the network should be sent towards MX-1 and MX-2 by EX4200-1 according to its standard LAG hashing algorithm (source/destination MAC). Because I have 100 hosts in total, there should be enough granularity at layer-2 to roughly distribute some traffic onto MX-1 and some onto MX-2.
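
To get a feel for how much spread 100 hosts give you, here's a quick sketch of a source/destination-MAC hash splitting flows across the two LAG members – this is not the EX4200's actual hashing algorithm, just an illustrative stand-in with made-up MAC ranges:

# Illustrative src/dst-MAC hash over a two-member LAG. A CRC over the MAC
# pair stands in for the switch's real hash; with ~50 flows the split is
# reasonably even, but never perfectly 50/50.

import zlib
from collections import Counter

members = ["ge-0/0/0 (to MX-1)", "ge-0/0/1 (to MX-2)"]

def pick_member(src_mac, dst_mac):
    key = (src_mac + dst_mac).encode()
    return members[zlib.crc32(key) % len(members)]

# 50 hypothetical hosts on the LAN side talking to 50 hosts behind MX-3.
local_hosts  = [f"00:00:66:cf:82:{i:02x}" for i in range(50)]
remote_hosts = [f"00:00:2e:18:6d:{i:02x}" for i in range(50)]

counts = Counter(pick_member(s, d) for s, d in zip(local_hosts, remote_hosts))
for member, flows in counts.items():
    print(f"{member}: {flows} flows")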

Let's send the IXIA traffic:

IXIA

Now let's look at the physical access interfaces on MX-1 and MX-2 to see how the traffic is being handled:

MX-1


tim@MX5-1> show configuration interfaces ge-1/1/5 
gigether-options {
 802.3ad ae0;
}

tim@MX5-1> show interfaces ae0 | match pps 
 Input rate : 5404040 bps (484 pps)
 Output rate : 10384856 bps (929 pps)

So roughly 5Mbps in and 10Mbps out on MX-1.

Let's check MX-2:


tim@MX5-2> show configuration interfaces ge-1/0/5 
gigether-options {
 802.3ad ae0;
}

tim@MX5-2> show interfaces ae0 | match pps 
 Input rate : 19535296 bps (1750 pps)
 Output rate : 14546816 bps (1302 pps)

So it seems to be working – MX-1 and MX-2 are both sending and receiving traffic in the same layer-2 broadcast domain.

Let's check their MPLS-facing interfaces:

MX-1


tim@MX5-1> show isis adjacency 
Interface System L State Hold (secs) SNPA
ge-1/1/0.0 m10i-1 2 Up 19

tim@MX5-1> show interfaces ge-1/1/0 | match pps 
 Input rate : 10415216 bps (930 pps)
 Output rate : 5404040 bps (484 pps)

tim@MX5-1>

MX-2


tim@MX5-2> show isis adjacency 
Interface System L State Hold (secs) SNPA
ge-1/1/0.0 m10i-2 2 Up 24

tim@MX5-2> show interfaces ge-1/1/0 | match pps 
 Input rate : 14583752 bps (1303 pps)
 Output rate : 19535576 bps (1751 pps)

tim@MX5-2>

 

And so all seems right with the world – traffic from the MPLS network is being sent from MX-3 to both MX-1 and MX-2. Let's look at the EVPN BGP control-plane on MX-3 to see what's going on with all-active; for brevity we'll take a look at just a slice of the BGP table:

 

2:1.1.1.1:100::100::00:00:66:cf:82:df/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:cf:82:e1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:cf:82:e3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:d0:5d:f3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.2:100::100::00:00:2e:18:6d:e1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:2e:18:f3:c4/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:66:cf:82:d1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:66:cf:82:d3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960

 

 

You'll notice that in MX-3's BGP EVPN table it's receiving those good old type-2 MAC routes, with some learnt from MX-1 and some from MX-2 – which is exactly what we want, and exactly what MX-3 needs in order for egress traffic to be sent towards both MX-1 and MX-2 in the all-active fashion we desire.

Remember that because EVPN maintains a MAC-learning-based layer-2 control plane, whether MX-3 sends traffic for a given host towards MX-1 or MX-2 depends on which PE learnt that host's MAC address – which in turn depends on how EX4200-1 hashed the egress traffic in the first place. See the diagram below for an attempt at a better explanation:

Capture8

 

But what happens if the EX4200 switch has a really rubbish hashing algorithm, or there's no granularity, to the point where nearly all of the traffic arrives via MX-1 and hardly any via MX-2? You'd end up with traffic polarisation and really bad load-balancing. EVPN solves this problem with an aliasing label.

MX-3, for example, has a full table of EVPN MAC routes, so it can load-balance traffic on a per-flow basis back to MX-1 and MX-2 by making use of the aliasing label. The MAC routes for the IXIA hosts at the top of the network are all advertised with an ESI of 00:11:22:33:44:55:66:77:88:99, which tells MX-3 they all sit behind the same Ethernet segment – so even if a particular MAC was only advertised by one PE, MX-3 can use the aliasing label advertised by the other PE for that ESI and send some of the traffic there anyway.

If there's a failure on either MX-1 or MX-2, that PE's aliasing routes are withdrawn and you're left with MAC routes pointing at the remaining PE only – preventing the black-holing of traffic.
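
Here's a rough sketch of the decision MX-3 makes (heavily simplified – real Junos resolves this in the forwarding table): a MAC route carries the ESI it was learned against, and any other PE that has advertised an Ethernet A-D per-EVI (aliasing) route for that ESI is added as an extra next hop, so flows can be balanced across both PEs even when only one of them advertised the MAC. Withdraw the A-D routes and that PE simply drops out of the set:

# Simplified view of EVPN aliasing from the remote PE's (MX-3's) perspective.
# mac_routes: MAC -> (advertising PE, ESI), learned via type-2 routes.
# aliasing:   ESI -> set of PEs that advertised an Ethernet A-D per-EVI route.

ESI = "00:11:22:33:44:55:66:77:88:99"

mac_routes = {
    "00:00:66:cf:82:df": ("10.10.10.1", ESI),   # MAC only advertised by MX-1
}

aliasing = {
    ESI: {"10.10.10.1", "10.10.10.2"},          # both PEs advertised A-D routes
}

def next_hops(mac):
    pe, esi = mac_routes[mac]
    # Start with the PE that advertised the MAC, then add any PE that
    # advertised an aliasing route for the same Ethernet segment.
    return {pe} | aliasing.get(esi, set())

print(next_hops("00:00:66:cf:82:df"))   # -> both 10.10.10.1 and 10.10.10.2

# On a failure, MX-2 withdraws its Ethernet A-D routes (mass withdrawal):
aliasing[ESI].discard("10.10.10.2")
print(next_hops("00:00:66:cf:82:df"))   # -> only 10.10.10.1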

 

The last thing to consider is the concept of the "designated forwarder". Let's re-check the EVPN instance output from earlier on:

tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                13      96
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active       Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   300432
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:             49
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.2       300416     300416          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300400
      Advertised aliasing label: 300400
      Advertised split horizon label: 300416
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet Segment:                       1
tim@MX5-1>

 

When running in all-active mode it's obvious that both PE routers are forwarding traffic, but it's important to know that the PEs can only forward known unicast traffic in an all-active fashion. When two PE routers discover, via the BGP Ethernet Segment routes exchanged across the MPLS network, that they're attached to the same Ethernet segment, they elect a "designated forwarder".

The primary role of the designated forwarder is to forward BUM (broadcast, unknown unicast and multicast) traffic onto the segment. It would be highly undesirable for both PEs to forward broadcasts, so only one of them is responsible for this, in order to prevent traffic duplication.
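
The default election procedure in RFC 7432 (section 8.5) is just a modulo over the ordered list of PE addresses attached to the segment; a small sketch of it reproduces the result shown in the output above, with 10.10.10.1 winning for VLAN 100:

# Default DF election per RFC 7432 section 8.5: order the PEs attached to the
# Ethernet segment by IP address, then pick ordinal (vlan_id mod N) as the DF
# for that VLAN. The non-DF PEs drop BUM traffic towards the segment.

import ipaddress

def elect_df(pe_addresses, vlan_id):
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    return ordered[vlan_id % len(ordered)]

pes = ["10.10.10.1", "10.10.10.2"]
print("DF for VLAN 100:", elect_df(pes, 100))   # 100 % 2 = 0 -> 10.10.10.1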

Anyways, that’s about all I have time for tonight – I hope you found this useful!

EVPN Inter-VLAN routing + mobility

So in the last blog I looked at one of the most basic aspects of EVPN – a multi-site layer-2 network with nothing fancy going on, with traffic forwarded between multiple sites in the same VLAN. The fact of the matter is that there was nothing there you couldn't do with a traditional VPLS configuration; the general idea was to demonstrate the basics and take a look at the basic control-plane first.

In this update we'll be looking at some of the more exclusive and highly useful aspects of EVPN, which make it a very attractive technology for things such as data-centre interconnect – there are a few things that are possible with EVPN which simply cannot be done with VPLS.

Consider the revised topology:

Capture

It's the same topology from the first blog post; I've simply added an additional VLAN (VLAN 101) to ge-0/0/22 of each EX4200 LAN switch, plus an additional IXIA host.

For this post we're going to look at a rather cool way of performing inter-VLAN forwarding between hosts in VLAN 100 and VLAN 101. Not that I want to spend time teaching people how to suck eggs, but generally in a simple network with multiple VLANs you have two common ways of performing inter-VLAN forwarding:

  • Use a good ole’ fashioned router on a stick topology
  • Bolt some additional layer-3 functionality onto your layer-2 switch

As everyone knows, the latter method is by far the most common – the vast majority of switches support layer-3 routing functionality, usually in the form of IRB/BVI/SVI depending on the vendor in question.

In a service provider network, where we generally have a number of PE routers acting together as a large distributed switch providing layer-2 connectivity, the old-fashioned way of doing this would be with VPLS. To enable inter-VLAN forwarding we'd add a BVI interface to the VPLS instance, which lets a PE do standard layer-2 switching and also route between VLANs at layer-3 – very important for data-centre interconnect applications.

EVPN has a number of enhancements which make it more suitable for modern day data-centre interconnect designs, especially where things such as VM mobility are concerned. A company or organisation with a traditional MPLS based network, might require the ability to move hosts around between data centres seamlessly, without causing any real downtime.

Let's take a look at the basic interface configuration and routing-instance configuration:

  1. interfaces {
  2.     irb {
  3.         unit 100 {
  4.             family inet {
  5.                 address 192.168.100.1/24;
  6.             }
  7.             mac 00:00:19:21:68:10;
  8.         }
  9.         unit 101 {
  10.             family inet {
  11.                 address 192.168.101.1/24;
  12.             }
  13.             mac 00:00:19:21:68:11;
  14.         }
  15.     }
  16. routing-instances {
  17. EVPN-100 {
  18.     instance-type virtual-switch;
  19.     route-distinguisher 1.1.1.1:100;
  20.     vrf-target target:100:100;
  21.     protocols {
  22.         evpn {
  23.             extended-vlan-list 100-101;
  24.             default-gateway do-not-advertise;
  25.         }
  26.     }
  27.     bridge-domains {
  28.         VL-100 {
  29.             vlan-id 100;
  30.             interface ge-1/1/5.100;
  31.             routing-interface irb.100;
  32.         }
  33.         VL-101 {
  34.             vlan-id 101;
  35.             interface ge-1/1/5.101;
  36.             routing-interface irb.101;
  37.         }
  38.     }
  39. }
  40. VPN-100 {
  41.     instance-type vrf;
  42.     interface irb.100;
  43.     interface irb.101;
  44.     route-distinguisher 100.100.100.1:100;
  45.     vrf-target target:1:100;
  46.     vrf-table-label;
  47. }

 

First things first – lines 1 – 15 take care of the IRB interfaces for VLAN 100 and VLAN 101; more on that shortly.

Lines 16 – 39 form the configuration for the EVPN routing instance; you'll note a couple of differences from the first EVPN blog post:

  • The extended-vlan-list has been increased to include both VLANs within the routing instance
  • A new command “default-gateway do-not-advertise” is present under the EVPN protocol configuration
  • An additional bridge-domain has been configured for VLAN 101 under the routing-instance, along with an IRB routing-interface for each VLAN
  • What looks like a totally standard L3VPN has been configured, albeit with different RTs and RDs – and it contains the IRB interfaces from the EVPN routing instance.

The command "default-gateway do-not-advertise" controls the generation of the default-gateway extended community. If your PE routers have different IRB MAC and IPv4 addresses, each PE will generate a "default-gateway" route, which tells the other PEs in the EVPN that this MAC/IP is a default gateway somewhere. In this example, however – and as best practice – it's simpler and easier to configure the same IRB MAC/IP on all your PEs, so the command here is "do-not-advertise", as we don't need that behaviour.

But perhaps the coolest feature, and one of the biggest advantages EVPN has over VPLS, is the way the IRB interfaces are configured. In this topology the 3x PE routers (MX5-1, MX5-2 and MX5-3) all have an identical IRB interface configuration for VLAN 100 and VLAN 101 – each PE has the exact same IP address and MAC address:

MX5-1:

imtech@MX5-1# run show configuration interfaces irb
unit 100 {
    family inet {
        address 192.168.100.1/24;
    }
    mac 00:00:19:21:68:10;
}
unit 101 {
    family inet {
        address 192.168.101.1/24;
    }
    mac 00:00:19:21:68:11;
}

MX5-2

imtech@MX5-2# run show configuration interfaces irb
unit 100 {
    family inet {
        address 192.168.100.1/24;
    }
    mac 00:00:19:21:68:10;
}
unit 101 {
    family inet {
        address 192.168.101.1/24;
    }
    mac 00:00:19:21:68:11;
}

MX5-3

imtech@MX5-3# run show configuration interfaces irb
unit 100 {
    family inet {
        address 192.168.100.1/24;
    }
    mac 00:00:19:21:68:10;
}
unit 101 {
    family inet {
        address 192.168.101.1/24;
    }
    mac 00:00:19:21:68:11;
}

The first time you see it, you think: surely that can't work?

But it's true! All the PEs in the network have the exact same IP address and MAC address on their IRB interfaces. Why would we do that, and how does it work?

Consider the following scenario:

Capture2

Imagine a basic data-centre environment running something like VMware or OpenStack – we can provision servers and move them around all over the place using things like vMotion. Picture the active server in the left-hand portion of the data-centre with business as usual from a network perspective: ARP is resolved between the host and the left-hand PE, and the default gateway is 192.168.100.1.

Now imagine that the DC admin flicks the switch, and that active VM on the left is immediately torn down and spun up inside the right-hand DC (which could be many miles away). You'll notice that the gateway MAC address and IP address are exactly the same there. This gives us the ability to move hosts around our data centres without having to worry about different default gateways, and without incurring much downtime while we wait for things to re-ARP – because everything is identical at each DC site, there's no problem moving things around between one site and the next.

Capture3

You cannot do this with VPLS, as the implementation demands that you use unique MAC addresses – which moves us deeper into the technology: how does EVPN achieve this?

It essentially boils down to the way EVPN has been engineered to integrate more closely with the layer-3 world – the software has a number of hooks between EVPN and L3VPN that work in a much more elegant fashion than VPLS. The first blog post showed how MAC addresses are learnt and inserted into the BGP control-plane; in this inter-VLAN forwarding example, a few extra things happen (sketched in code after the list):

  • Firstly, we have the BGP MAC advertisement route from the L2 world
  • Secondly, we get a new MAC/IP advertisement containing the host's MAC and IP address – this is linked to the PE's ARP table
  • Thirdly, we get a totally standard /32 IPv4 L3VPN route for the host's address, advertised to all remote PEs
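
To make that concrete, here's a rough sketch of the three advertisements a PE originates once it learns a local host's MAC and ARP entry. The field names are simplified illustrations rather than real BGP NLRI encodings, and the MX-2 VRF route-distinguisher shown here is assumed (the other values mirror the lab output further down):

# Simplified sketch of the three routes a PE originates when a local host
# (MAC + IP learned via ARP) appears behind an EVPN IRB interface.

def advertise_host(evi_rd, vlan, host_mac, host_ip, vrf_rd):
    return [
        # 1. EVPN type-2 MAC route: plain layer-2 reachability.
        {"type": 2, "rd": evi_rd, "vlan": vlan, "mac": host_mac},
        # 2. EVPN type-2 MAC+IP route: carries the ARP binding so every
        #    other PE can pre-populate its own ARP table.
        {"type": 2, "rd": evi_rd, "vlan": vlan, "mac": host_mac, "ip": host_ip},
        # 3. Ordinary L3VPN /32 host route from the VRF holding the IRB,
        #    so plain L3VPN PEs can reach the host with no EVPN awareness.
        {"afi": "inet-vpn", "rd": vrf_rd, "prefix": f"{host_ip}/32"},
    ]

for route in advertise_host("1.1.1.2:100", 101,
                            "00:00:2e:e6:77:97", "192.168.101.11",
                            "100.100.100.2:100"):
    print(route)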

Let’s recap a more basic version of the lab diagram and see what the control-plane looks like when we send some traffic between hosts in different VLANs:

Capture4

Now let's look at the BGP control-plane on MX-1 and see what's going on:

  1. imtech@MX5-1> show route protocol bgp table EVPN-100.evpn.0
  2. EVPN-100.evpn.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)
  3. + = Active Route, – = Last Active, * = Both
  4. 2:1.1.1.2:100::101::00:00:2e:e6:77:97/304
  5.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  6.                       AS path: I, validation-state: unverified
  7.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  8. 2:1.1.1.2:100::101::00:00:2e:e6:77:97::192.168.101.11/304  
  9.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  10.                       AS path: I, validation-state: unverified
  11.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  12. 3:1.1.1.2:100::100::10.10.10.2/304
  13.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  14.                       AS path: I, validation-state: unverified
  15.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  16. 3:1.1.1.2:100::101::10.10.10.2/304
  17.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  18.                       AS path: I, validation-state: unverified
  19.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  20. imtech@MX5-1> show route protocol bgp table VPN-100.inet.0
  21. VPN-100.inet.0: 6 destinations, 9 routes (6 active, 0 holddown, 0 hidden)
  22. + = Active Route, – = Last Active, * = Both
  23. 192.168.100.0/24    [BGP/170] 00:04:44, localpref 100, from 10.10.10.2
  24.                       AS path: I, validation-state: unverified
  25.                     > to 192.169.100.11 via ge-1/1/0.0, Push 16, Push 299968(top)
  26. 192.168.101.0/24    [BGP/170] 00:04:44, localpref 100, from 10.10.10.2
  27.                       AS path: I, validation-state: unverified
  28.                     > to 192.169.100.11 via ge-1/1/0.0, Push 16, Push 299968(top)
  29. 192.168.101.11/32   [BGP/170] 00:04:44, localpref 100, from 10.10.10.2
  30.                       AS path: I, validation-state: unverified
  31.                     > to 192.169.100.11 via ge-1/1/0.0, Push 16, Push 299968(top)

You'll immediately notice that, compared to the vanilla L2-only implementation, there's a lot more going on – let's break it down:

  • Line 4 is the standard MAC advertisement route – the same sort of advertisement we went over with the vanilla L2-only version of EVPN; this is for layer-2 connectivity only.
  • Line 8 is an EVPN MAC/IP route, which is basically the ARP mapping learnt directly from MX-2 – this route makes it possible for all PEs in the network to synchronise their ARP tables with each other!
  • Line 29 is a standard L3VPN route, containing the /32 of the host behind MX-2

The MAC/IP route (line 8) essentially means that as soon as you move a host from one place to another, the moment a packet lands on the ingress PE interface it generates a new MAC/IP route, and all the other PEs synchronise their ARP tables accordingly. Meanwhile the host that's moved doesn't need to do anything other than keep sending packets to the exact same gateway IP/MAC as it did before it was moved – essentially we have layer-2 and layer-3 working together in harmony.

Line 29 is a standard L3VPN /32 host route for the host behind MX-2. This means that if you have EVPN running across numerous data-centres connected to a wider layer-3 network – such as traditional residential/business PE routers – those other routers don't need any awareness of EVPN whatsoever; as long as they can participate in regular L3VPN, packets will always be delivered to the right place when things get moved around, because these host routes are dynamically generated and advertised accordingly. This is a massive advantage over VPLS: you don't need to configure EVPN in every corner of the network for it to be useful, it simply lives at your DC edge – the rest is left to vanilla L3VPN.

There are a few more enhancements due at some point soon, including quite an interesting one – the "MAC mobility extended community" – which is essentially a safeguard to prevent a few rather nasty situations from arising:

  • A layer-2 loop, where two PEs constantly re-advertise the same MAC addresses – which could overwhelm the BGP control-plane
  • A situation where a pair of hosts, each in a different DC, are mis-configured with the same MAC address – if they're both sending data, each PE will keep generating route advertisements

The MAC mobility extended community defined in RFC 7432 introduces a sequence number: if the same MAC is re-advertised more than a certain number of times within a specific period, it's assumed that something is broken, and the routers should perform some sort of damping and alerting procedure to prevent a network meltdown.
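
A sketch of the idea: each time a MAC legitimately moves, the new advertisement carries an incremented sequence number, and a PE that sees too many moves in a short window flags the MAC as a duplicate and damps it. The 5-moves-in-180-seconds values below follow the defaults suggested in the RFC; real implementations expose these as configurable thresholds:

# Sketch of MAC-mobility handling per RFC 7432 section 15: a moved MAC is
# re-advertised with an incremented sequence number, and a PE that sees too
# many moves within the window treats the MAC as a duplicate and damps it.

import time

MOVE_LIMIT, WINDOW = 5, 180.0    # N moves within M seconds
move_history = {}                # mac -> list of recent move timestamps
sequence = {}                    # mac -> current sequence number

def mac_moved(mac, now=None):
    now = time.time() if now is None else now
    sequence[mac] = sequence.get(mac, 0) + 1
    history = [t for t in move_history.get(mac, []) if now - t < WINDOW]
    history.append(now)
    move_history[mac] = history
    if len(history) >= MOVE_LIMIT:
        return f"{mac}: seq {sequence[mac]}, DUPLICATE detected - damp and alert"
    return f"{mac}: seq {sequence[mac]}, advertise with higher sequence number"

for _ in range(6):
    print(mac_moved("00:00:5e:00:53:01"))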

I hope you found this useful! In the next one I'll be looking at some of the redundancy designs, including single-active and all-active multi-homing.