EVPN – All-active multihoming

This is the fourth post on EVPN. The previous posts covered the following topics:

  • EVPN basics, route-types and basic L2 forwarding
  • EVPN IRB and Inter-VLAN routing
  • EVPN single-active multi-homing

This post covers EVPN’s ability to provide all-active multi-homing for layer-2 traffic, where two active PE routers connect to a switch via a LAG. The setup is similar to the previous labs. Due to some restrictions, and in the interests of simplicity, this lab covers all-active multi-homing for a single VLAN only (VLAN 100 in this case). Consider the network topology:

Capture5

The topology and general connectivity are the same as in the previous examples. The two big differences are that only VLAN 100 is present here, and that MX-1 and MX-2 now connect to EX4200-1 using MC-LAG.

The first consideration when running EVPN in all-active mode is that the PEs must connect to the attached switch using some sort of LAG or MC-LAG. Consider the wording from RFC 7432:


https://tools.ietf.org/html/rfc7432#section-14.1.2

“If a bridged network is multihomed to more than one PE in an EVPN network via switches, then the support of All-Active redundancy mode requires the bridged network to be connected to two or more PEs using a LAG.”

Essentially, this boils down to some basic facts about how switches work. You can’t have two different PE routers, spanning two different control planes, with active access interfaces presenting the same MAC address, for the simple reason that you’ll create a duplicate MAC address in the layer-2 network, which will cause a nightmare.

Consider the below scenario:

Capture6

I tried this in a lab before I read the RFC, and discovered that EX4200-1 floods egress traffic to both MX-1 and MX-2, resulting in lots of traffic duplication. Each time a packet with source MAC address “X” lands on ge-0/0/0 or ge-0/0/1 from MX-1 or MX-2, the switch has to update its CAM table, so the entry keeps moving between the two ports and essentially the whole thing is broken. That explains the wording of the RFC in relation to all-active mode.
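
To make the CAM table problem concrete, here is a minimal Python sketch of a learning switch that sees the same source MAC arriving on two independent ports. It is a toy model, not how the EX4200 is implemented; the port names mirror the lab, and the MAC address is a made-up value.

```python
# Toy model of a learning switch's CAM (MAC) table.
# Port names mirror the lab; the MAC value is hypothetical.

def learn(cam, mac, port):
    """Record the port a source MAC was seen on; return True if the entry moved."""
    moved = mac in cam and cam[mac] != port
    cam[mac] = port
    return moved

cam = {}
shared_mac = "00:00:5e:00:53:01"  # hypothetical MAC presented by both PEs

# Frames with the same source MAC arrive alternately from MX-1 and MX-2
# on two ports that the switch treats as unrelated links.
arrivals = ["ge-0/0/0", "ge-0/0/1"] * 3
moves = sum(learn(cam, shared_mac, port) for port in arrivals)

print(f"MAC moves: {moves}, final port: {cam[shared_mac]}")
```

Every arrival after the first one moves the entry, which is the flapping described above. Once both ports are members of the same LAG, the switch learns the MAC against the single ae0 interface instead, and the entry stops moving.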

With Juniper, the way around this problem is simply to convert the Ethernet interfaces connecting to EX4200-1 to a basic MC-LAG configuration. We don’t need to configure ICCP or any serious multi-chassis configuration; we just need to make sure the LACP system-id is identical on MX-1 and MX-2, so that the EX4200 thinks it’s connected to a single downstream device.
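
As a rough illustration of why the shared system-id matters, the Python sketch below models a simplified version of the check a switch makes before bundling links: the LACP partner on every member link has to report the same system ID. Real LACP also compares keys and other fields, and the "mismatched" addresses are hypothetical values standing in for per-chassis defaults.

```python
# Simplified view of LACP aggregation: member links can only join the same
# LAG if the partner on each link reports the same system ID.
# (Real LACP also compares keys and other fields; this is a sketch.)

def can_aggregate(partner_system_ids):
    """partner_system_ids maps a local port to the partner system ID seen on it."""
    return len(set(partner_system_ids.values())) == 1

# Both MX routers configured with the same system-id, as in this lab.
matching = {"ge-0/0/0": "00:00:00:00:00:01", "ge-0/0/1": "00:00:00:00:00:01"}

# Hypothetical values if each MX used its own chassis-derived system ID.
mismatched = {"ge-0/0/0": "00:00:5e:00:53:0a", "ge-0/0/1": "00:00:5e:00:53:0b"}

print(can_aggregate(matching), can_aggregate(mismatched))
```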

Let’s check the LAG configuration on MX-1 and MX-2:

MX-1

tim@MX5-1> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}

 

MX-2

tim@MX5-2> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}

 

And finally on EX4200-1 we have a basic standard LAG configuration, with nothing fancy or sexy going on 🙂

EX4200-1

 

imtech@ex4200-1> show configuration interfaces ae0
aggregated-ether-options {
    lacp {
        active;
    }
}
unit 0 {
    family ethernet-switching {
        port-mode trunk;
        vlan {
            members vlan-100;
        }
    }
}

{master:0}
imtech@ex4200-1>

 

 

From the perspective of the EX4200, it’s just a totally standard LAG with two interfaces running LACP. So long as we have EVPN all-active configured correctly on MX-1 and MX-2, everything is taken care of.

EX4200-1 verification:

imtech@ex4200-1> show lacp interfaces
Aggregated interface: ae0
    LACP state:       Role   Exp   Def  Dist  Col  Syn  Aggr  Timeout  Activity
      ge-0/0/0       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/0     Partner    No    No   Yes  Yes  Yes   Yes     Fast   Passive
      ge-0/0/1       Actor    No    No   Yes  Yes  Yes   Yes     Fast    Active
      ge-0/0/1     Partner    No    No   Yes  Yes  Yes   Yes     Fast   Passive
    LACP protocol:        Receive State  Transmit State          Mux State
      ge-0/0/0                  Current   Fast periodic Collecting distributing
      ge-0/0/1                  Current   Fast periodic Collecting distributing

{master:0}
imtech@ex4200-1>

 

Aside from the fact that we’ve converted the access Ethernet interfaces to MC-LAG on MX-1 and MX-2, let’s see what’s changed in the EVPN configuration in order to get all-active EVPN working. First, let’s check MX-1:

tim@MX5-1> show configuration routing-instances
EVPN-100 {
    instance-type virtual-switch;
    route-distinguisher 1.1.1.1:100;
    vrf-target target:100:100;
    protocols {
        evpn {
            extended-vlan-list 100;
            default-gateway do-not-advertise;
        }
    }
    bridge-domains {
        VL-100 {
            vlan-id 100;
            interface ae0.100;
            routing-interface irb.100;
        }
    }
}
VPN-100 {
    instance-type vrf;
    interface irb.100;
    route-distinguisher 100.100.100.1:100;
    vrf-target target:1:100;
    vrf-table-label;
}
tim@MX5-1>

 

The configuration is identical on MX-2, apart from the route distinguisher. You’ll notice that, aside from the removal of VLAN 101, the only thing which has changed on MX-1 is that the physical interface ge-1/1/5 has been replaced by the new LAG interface ae0.100 for VLAN 100; everything else is exactly the same as the single-active example from last week. Let’s take a closer look at the interface on MX-1:

tim@MX5-1> show configuration interfaces ae0
description "MCLAG to EX4500-1";
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    all-active;
}
aggregated-ether-options {
    lacp {
        system-id 00:00:00:00:00:01;
    }
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
    family bridge;
}

 

It’s clear to see that under the interface ESI configuration we’ve changed the ESI mode from single-active to “all-active”, which again should be self-explanatory to most readers 🙂 Note again that this configuration is 100% identical on both MX-1 and MX-2.

Let’s check the EVPN instance and see what’s changed since the single-active example:

tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                13      96
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active       Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   300432
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:             49
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.2       300416     300416          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300400
      Advertised aliasing label: 300400
      Advertised split horizon label: 300416
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet Segment:                       1
tim@MX5-1>

 

So we can see that MX-1 has changed from single-active to all-active, and is in the up/forwarding state.

Let’s check MX-2 to see what it looks like:

tim@MX5-2> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.2:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                47      64
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active       Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   300528
  Number of neighbors: 2
    10.10.10.1
      Received routes
        MAC address advertisement:             14
        MAC+IP address advertisement:           1
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.1       300400     300400          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300416
      Advertised aliasing label: 300416
      Advertised split horizon label: 300432
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.2:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.1
      Received routes
        Ethernet Segment:                       1
tim@MX5-2>

 

Excellent! Both MX-1 and MX-2 are in the up/forwarding state for VLAN 100, meaning that, in theory, they can both send and receive traffic on their access LAG interface and on the MPLS side. You’ll also notice how simple it is to get working.

I currently have 50x IXIA hosts sat behind MX-1 and MX-2, and a further 50x hosts sat behind MX-3. 50Mbps of traffic is being sent bidirectionally between each IXIA host. Let’s recap the diagram:

Capture7

With an all-active configuration, traffic from the hosts at the top of the network should be sent towards MX-1 and MX-2 by EX4200-1 according to its standard LAG hashing algorithm (source/destination MAC). Because I have 100 hosts in total, there should be enough granularity at layer-2 to get a rough distribution, with some traffic on MX-1 and some on MX-2.
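
A quick Python sketch of the idea follows. The hash below (XOR of the two MAC addresses, modulo the number of member links) is a deliberately simple stand-in and is not the EX4200’s actual algorithm; the host MAC addresses are made up for illustration.

```python
from collections import Counter

def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)

def pick_member(src_mac, dst_mac, members):
    """Toy LAG hash: XOR the two MACs and take the result modulo the link count."""
    return members[(mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % len(members)]

members = ["ge-0/0/0", "ge-0/0/1"]  # links towards MX-1 and MX-2

# Hypothetical MAC addresses for 50 local and 50 remote hosts.
local_hosts = [f"00:00:5e:00:53:{i:02x}" for i in range(50)]
remote_hosts = [f"00:00:5e:00:53:{100 + i:02x}" for i in range(50)]

# Every local host talking to every remote host: plenty of granularity.
mesh = Counter(pick_member(s, d, members) for s in local_hosts for d in remote_hosts)

# Each local host talking to exactly one remote host: far less granularity.
paired = Counter(pick_member(s, d, members) for s, d in zip(local_hosts, remote_hosts))

print("full mesh:", dict(mesh))
print("one-to-one:", dict(paired))
```

With this particular toy hash the full mesh splits evenly across both links, while the one-to-one pairing lands every flow on the same link, which shows how much the result depends on the traffic mix as well as on the hash itself.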

Let’s send the IXIA traffic:

IXIA

Now let’s look at the physical access interfaces on MX-1 and MX-2 to see how the traffic is being handled:

MX-1


tim@MX5-1> show configuration interfaces ge-1/1/5 
gigether-options {
 802.3ad ae0;
}

tim@MX5-1> show interfaces ae0 | match pps 
 Input rate : 5404040 bps (484 pps)
 Output rate : 10384856 bps (929 pps)

So 5Mbps in and 10Mbps out on MX-1.

Let’s check MX-2:


tim@MX5-2> show configuration interfaces ge-1/0/5 
gigether-options {
 802.3ad ae0;
}

tim@MX5-2> show interfaces ae0 | match pps 
 Input rate : 19535296 bps (1750 pps)
 Output rate : 14546816 bps (1302 pps)

So it seems to be working: MX-1 and MX-2 are both sending and receiving traffic in the same layer-2 broadcast domain.
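
For a feel of how even the split actually is, this small Python snippet works out each PE’s share from the ae0 rates shown above. The figures are copied from the outputs; the variable names are just for illustration.

```python
# ae0 rates in bps, copied from the "show interfaces ae0 | match pps" outputs.
rates = {
    "MX-1": {"input": 5404040, "output": 10384856},
    "MX-2": {"input": 19535296, "output": 14546816},
}

def share(direction, pe):
    """Percentage of the total rate in one direction carried by one PE."""
    total = sum(r[direction] for r in rates.values())
    return 100 * rates[pe][direction] / total

for pe in rates:
    print(f"{pe}: {share('input', pe):.1f}% of input, {share('output', pe):.1f}% of output")
```

The input direction (traffic hashed by EX4200-1) is noticeably less even than the output direction, so both PEs are active but the balance is only rough.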

Let’s check their MPLS facing interfaces:

MX-1


tim@MX5-1> show isis adjacency 
Interface System L State Hold (secs) SNPA
ge-1/1/0.0 m10i-1 2 Up 19

tim@MX5-1> show interfaces ge-1/1/0 | match pps 
 Input rate : 10415216 bps (930 pps)
 Output rate : 5404040 bps (484 pps)

tim@MX5-1>

MX-2


tim@MX5-2> show isis adjacency 
Interface System L State Hold (secs) SNPA
ge-1/1/0.0 m10i-2 2 Up 24

tim@MX5-2> show interfaces ge-1/1/0 | match pps 
 Input rate : 14583752 bps (1303 pps)
 Output rate : 19535576 bps (1751 pps)

tim@MX5-2>

 

And so all seems right with the world: traffic from the MPLS network is being sent from MX-3 to both MX-1 and MX-2. Let’s look at the EVPN BGP control-plane on MX-3 to see what’s going on with all-active. We’ll take a look at a slice of the BGP table for brevity:

 

2:1.1.1.1:100::100::00:00:66:cf:82:df/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:cf:82:e1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:cf:82:e3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.1:100::100::00:00:66:d0:5d:f3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300944
2:1.1.1.2:100::100::00:00:2e:18:6d:e1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:2e:18:f3:c4/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:66:cf:82:d1/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960
2:1.1.1.2:100::100::00:00:66:cf:82:d3/304
                   *[BGP/170] 01:28:27, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300960

 

 

You’ll notice that in MX-3’s BGP EVPN table, it’s receiving those good old type-2 MAC routes, but some of them are being learnt from MX-1 and some from MX-2, which is exactly what we want and exactly what MX-3 needs in order for egress traffic to be sent towards MX-1 and MX-2 in the all-active fashion that we desire.

Remember that a PE only advertises a MAC route once it has learnt that MAC from traffic arriving on its access interface, so whether MX-3 sends traffic for a given host to MX-1 or MX-2 depends on how EX4200-1 hashed that host’s egress traffic in the first place. See the diagram below for an attempt at a better explanation:

Capture8

 

But what happens if the EX4200 switch has a really rubbish hashing algorithm, or there’s no granularity, to the point where nearly all the traffic comes from MX-1 and hardly any comes from MX-2? MX-3 would learn nearly all the MAC routes from MX-1, and you’d end up with traffic polarisation and really bad load-balancing on the return path. EVPN solves this problem by using an aliasing label.

MX-3, for example, has a full table of EVPN MAC routes, and by making use of the aliasing label it can load-balance traffic on a per-flow basis back to MX-1 and MX-2. In the case of the IXIA hosts at the top of the network, they’re all being advertised with an ESI of 00:11:22:33:44:55:66:77:88:99, which means they’re all coming from the same Ethernet segment. Both MX-1 and MX-2 advertise an Ethernet auto-discovery route with an aliasing label for that ESI, so even for a MAC that only one PE has advertised, MX-3 can simply treat the other PE’s aliasing route as a normal MAC route and send the traffic anyway.

If there’s a failure somewhere on either MX-1 or MX-2, that PE’s aliasing label gets withdrawn and you’re left with routes via the surviving PE only, which prevents traffic from being black-holed.
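
The aliasing behaviour can be sketched in a few lines of Python. This is a simplified model of the idea rather than Junos internals: labels are ignored, and the two dictionaries are hypothetical stand-ins for what a remote PE such as MX-3 holds in its EVPN table. The MAC address and PE addresses are taken from the outputs above.

```python
ESI = "00:11:22:33:44:55:66:77:88:99"

# MAC route as seen by the remote PE: mac -> (ESI, PE that advertised it).
mac_routes = {"00:00:66:cf:82:df": (ESI, "10.10.10.1")}

# PEs that have advertised an auto-discovery (aliasing) route for each ESI.
ad_routes = {ESI: {"10.10.10.1", "10.10.10.2"}}

def next_hops(mac):
    """PEs the remote PE may load-balance across for this MAC."""
    esi, advertiser = mac_routes[mac]
    pes = ad_routes.get(esi)
    # With aliasing, any PE attached to the segment is a valid next hop,
    # not just the one that happened to learn and advertise the MAC.
    return sorted(pes) if pes else [advertiser]

before = next_hops("00:00:66:cf:82:df")

# MX-1 fails and its auto-discovery route is withdrawn.
ad_routes[ESI].discard("10.10.10.1")
after = next_hops("00:00:66:cf:82:df")

print(before, after)
```

Before the failure the MAC is reachable via both PEs even though only MX-1 advertised it; afterwards only MX-2 remains.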

 

The last thing to consider is the concept of the “designated forwarder”. Let’s re-check the EVPN instance output from earlier on:

tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                13      96
    Default gateway MAC addresses:       1       0
  Number of local interfaces: 1 (1 up)
    Interface name  ESI                            Mode             Status
    ae0.100         00:11:22:33:44:55:66:77:88:99  all-active       Up
  Number of IRB interfaces: 1 (1 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
  Number of bridge domains: 1
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   300432
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:             49
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:             60
        MAC+IP address advertisement:           0
        Inclusive multicast:                    1
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ae0.100
      Local interface: ae0.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.2       300416     300416          all-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 300400
      Advertised aliasing label: 300400
      Advertised split horizon label: 300416
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet Segment:                       1
tim@MX5-1>

 

When running in all-active mode, it’s obvious that both PE routers are forwarding traffic, but it’s important to know that the PEs can only forward unicast traffic in an all-active fashion. When two PE routers discover each other on the same Ethernet segment across the MPLS network, via the BGP Ethernet Segment routes visible under the __default_evpn__ instance above, they elect a “designated forwarder”.

The primary role of the designated forwarder is to forward BUM (broadcast, unknown unicast and multicast) traffic towards the multihomed switch. It would be highly undesirable for both PEs to forward broadcasts, so only one is responsible for this in order to prevent traffic duplication.
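
For reference, the default election procedure described in RFC 7432 (section 8.5) is simple enough to sketch in Python: order the PEs on the segment by IP address, number them from zero, and pick the one whose position equals the VLAN ID modulo the number of PEs. Whether the Junos release in this lab follows exactly this procedure is an assumption on my part, although the result for VLAN 100 matches the output above.

```python
import ipaddress

def elect_df(pe_addresses, vlan_id):
    """RFC 7432 default DF election: ordinal = VLAN ID mod number of PEs."""
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    return ordered[vlan_id % len(ordered)]

# The two PEs attached to ESI 00:11:22:33:44:55:66:77:88:99 in this lab.
pes = ["10.10.10.2", "10.10.10.1"]

print("VLAN 100 DF:", elect_df(pes, 100))
print("VLAN 101 DF:", elect_df(pes, 101))
```

With two PEs, even VLAN IDs land on the lower address and odd VLAN IDs on the higher one, so the DF role is spread across the PEs once a segment carries more than one VLAN.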

Anyways, that’s about all I have time for tonight – I hope you found this useful!

EVPN – Single-active redundancy

In the previous 2 posts I looked at the basics of EVPN, including the new BGP-based control-plane, and later at the integration between the layer-2 and layer-3 worlds within EVPN. However, all the previous examples were shown with basic single-site networks with no link or device redundancy. In this post I’m going to look at the first and simplest EVPN redundancy mode.

First – consider the new lab topology:

Capture4

The topology and configuration remains pretty much the same, except that MX-1 and MX-2 each connect back to EX4200-1, for VLAN 100 and VLAN 101, with the same IRB interfaces present on each MX router, essentially a very basic site with 2 PEs for redundancy.

Let’s recap the EVPN configuration on MX-1. I’ve got the exact same configuration loaded on MX-2 and MX-3; the only differences are the interface numbers and a unique RD for each site.

MX-1: 

tim@MX5-1> show configuration routing-instances
EVPN-100 {
    instance-type virtual-switch;
    route-distinguisher 1.1.1.1:100;
    vrf-target target:100:100;
    protocols {
        evpn {
            extended-vlan-list 100-101;
            default-gateway do-not-advertise;
        }
    }
    bridge-domains {
        VL-100 {
            vlan-id 100;
            interface ge-1/1/5.100;
            routing-interface irb.100;
        }
        VL-101 {
            vlan-id 101;
            interface ge-1/1/5.101;
            routing-interface irb.101;
        }
    }
}
VPN-100 {
    instance-type vrf;
    interface irb.100;
    interface irb.101;
    route-distinguisher 100.100.100.1:100;
    vrf-target target:1:100;
    vrf-table-label;
}
tim@MX5-1>

 

 

Essentially, each site is configured exactly the same, except for a unique RD per site, and differences in the interface numbering.

In terms of providing active/standby redundancy at the main site, for layer-2 and layer-3 simultaneously, we would historically use VPLS combined with VRRP on the IRB interfaces to provide connectivity.

However this isn’t a perfect solution, for the following reasons:

  1. Unlike EVPN – VPLS needs unique IPv4 GW/MAC addresses at each site, inside the same VPN, so the only way to do active-standby redundancy is with VRRP.
  2. VRRP designs can become complex, ensuring that everything is tracked and monitored – partial failures can be hard to track and things can get over-complicated.
  3. Traffic tromboning can occur where VRRP is used

Regarding point 3

Imagine a scenario where each PE provides a layer-3 default gateway for each VLAN, where MX1 is VRRP active for VLAN 100 and MX2 is VRRP active for VLAN 101:

Capture5

It looks simple enough, but traffic tromboning can occur quite easily – due to the reliance on VRRP, for example if host-1 in VLAN 100 wants to send traffic to host-2 in VLAN 101, connected to the same switch – the following things happen:

  1. The packet hits the VRRP active VLAN 100 IRB interface on MX1
  2. Because VLAN 101 is in standby mode on MX1 – it can’t be switched locally
  3. MX1 forwards the packet towards the MPLS network, because there’s a BGP route coming from MX2 (because it’s VRRP active for VLAN 101)
  4. Rather than being routed locally, the packet has to traverse the MPLS network, in order to route between VLANs:

Capture6

Things like this are a pain, and can be mitigated by design and awareness from the start. But in my opinion these sorts of scenarios are good examples of why EVPN was invented: VPLS never properly solved the basic problems that we get in day-to-day designs, so for simple bread-and-butter problems like routing between VLANs you end up having a nightmare.

So how does EVPN do it differently?

First, let’s look at the configuration required to convert the lab topology into EVPN active-standby. It’s pretty simple:

MX-1: 

tim@MX5-1# run show configuration interfaces ge-1/1/5
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    single-active;
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
}
unit 101 {
    encapsulation vlan-bridge;
    vlan-id 101;
}

[edit]
tim@MX5-1#

 

MX-2:

tim@MX5-2# run show configuration interfaces ge-1/0/5
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    single-active;
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
}
unit 101 {
    encapsulation vlan-bridge;
    vlan-id 101;
}

[edit]
tim@MX5-2#

 

In basic EVPN where sites are single-homed, the “ESI” (Ethernet segment identifier) remains at zero. However, whenever you have single-active or all-active multi-homing, the ESI must be configured to a non-zero value. Its purpose is to identify an Ethernet segment, and as such it identifies the entire “site” or “data-centre” to other PE routers on the network. It’s configured under the physical Ethernet interface and must be the same across the segment, in this case on the access-facing interfaces of MX1 and MX2.
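
As a small illustration of the format, the Python sketch below parses the 10-octet ESI used in this lab and checks whether it is the all-zero (single-homed) value. The helper names are my own; the only facts relied on are that an ESI is 10 octets long and that, per RFC 7432 section 5, the first octet is a type field where 0x00 denotes an operator-configured value.

```python
def parse_esi(text):
    """Parse a colon-separated ESI such as 00:11:22:33:44:55:66:77:88:99."""
    octets = text.split(":")
    if len(octets) != 10:
        raise ValueError("an ESI must be exactly 10 octets")
    return bytes(int(octet, 16) for octet in octets)

def is_multihomed(esi):
    """An all-zero ESI denotes a single-homed segment."""
    return any(esi)

lab_esi = parse_esi("00:11:22:33:44:55:66:77:88:99")
zero_esi = parse_esi(":".join(["00"] * 10))

print("type octet:", lab_esi[0])                      # 0 = operator-configured
print("lab ESI multihomed:", is_multihomed(lab_esi))
print("zero ESI multihomed:", is_multihomed(zero_esi))
```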

Secondly, under the ESI configuration the PE interfaces are configured to operate in “single-active” mode, which should be self explanatory to most readers 🙂

How does this alter the EVPN control-plane? Let’s have a more detailed look at the EVPN instance on MX-1:

 

tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                 2       2
    Default gateway MAC addresses:       2       0
  Number of local interfaces: 2 (2 up)
    Interface name  ESI                            Mode             Status
    ge-1/1/5.100    00:11:22:33:44:55:66:77:88:99  single-active    Up
    ge-1/1/5.101    00:11:22:33:44:55:66:77:88:99  single-active    Up
  Number of IRB interfaces: 2 (2 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
    irb.101         101      Up      VPN-100
  Number of bridge domains: 2
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   302080
    101          1   1     Extended         Enabled   301872
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:              0
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                1
    10.10.10.3
      Received routes
        MAC address advertisement:              2
        MAC+IP address advertisement:           2
        Inclusive multicast:                    2
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ge-1/1/5.100
      Local interface: ge-1/1/5.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.2       301008     0               single-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 301232
      Advertised aliasing label: 301232
      Advertised split horizon label: 0
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  VLAN ID: None
  Per-instance MAC route label: 299808
  MAC database status                Local  Remote
    Total MAC addresses:                 0       0
    Default gateway MAC addresses:       0       0
  Number of local interfaces: 0 (0 up)
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet auto-discovery:                0
        Ethernet Segment:                       1
  Number of ethernet segments: 0
tim@MX5-1>

 

 

A couple of things to note:

  • EVPN is running in single-active mode, for ge-1/1/5.100 and ge-1/1/5.101
  • The access-interface (ge-1/1/5) on MX1 is shown to be up/forwarding, making this the active PE
  • MX1 is operating in single-active mode
  • The designated forwarder is MX1 (10.10.10.1)
  • The backup designated forwarder is MX2 (10.10.10.2)

Because MX-1 is the active PE, let’s take a look at BGP on MX-3 to see what routes are advertised from the redundant site to a remote site:

(Note: I currently have 2Mbps of IXIA traffic flowing bidirectionally between each site, in each VLAN)

EVPN-100.evpn.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

1:1.1.1.1:100::112233445566778899::0/304
                   *[BGP/170] 04:17:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
1:10.10.10.1:0::112233445566778899::FFFF:FFFF/304
                   *[BGP/170] 04:17:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
1:10.10.10.2:0::112233445566778899::FFFF:FFFF/304
                   *[BGP/170] 13:50:18, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300848
2:1.1.1.1:100::100::00:00:2e:18:6d:e1/304
                   *[BGP/170] 04:17:23, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
2:1.1.1.1:100::101::00:00:2e:e6:77:95/304
                   *[BGP/170] 04:17:23, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
2:1.1.1.1:100::100::00:00:2e:18:6d:e1::192.168.100.10/304
                   *[BGP/170] 04:17:23, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
2:1.1.1.1:100::101::00:00:2e:e6:77:95::192.168.101.10/304
                   *[BGP/170] 04:17:23, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
3:1.1.1.1:100::100::10.10.10.1/304
                   *[BGP/170] 04:17:26, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
3:1.1.1.1:100::101::10.10.10.1/304
                   *[BGP/170] 13:50:26, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
3:1.1.1.2:100::100::10.10.10.2/304
                   *[BGP/170] 13:50:18, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300848
3:1.1.1.2:100::101::10.10.10.2/304
                   *[BGP/170] 13:50:18, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300848
tim@MX5-3>

 

We covered type-2 and type-3 routes in the previous labs, but here we have a new type-1 route being received on MX-3. What’s that all about? Let’s take a deeper look:

  1. tim@MX5-3> show route protocol bgp table EVPN-100.evpn.0 extensive
  2. EVPN-100.evpn.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
  3. 1:1.1.1.1:100::112233445566778899::0/304 (1 entry, 1 announced)
  4.         *BGP    Preference: 170/-101
  5.                 Route Distinguisher: 1.1.1.1:100
  6.                 Next hop type: Indirect
  7.                 Address: 0x2a7b880
  8.                 Next-hop reference count: 16
  9.                 Source: 10.10.10.1
  10.                 Protocol next hop: 10.10.10.1
  11.                 Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  12.                 State: <Secondary Active Int Ext>
  13.                 Local AS:   100 Peer AS:   100
  14.                 Age: 4:21:25    Metric2: 1
  15.                 Validation State: unverified
  16.                 Task: BGP_100.10.10.10.1+179
  17.                 Announcement bits (1): 0-EVPN-100-evpn
  18.                 AS path: I
  19.                 Communities: target:100:100
  20.                 Import Accepted
  21.                 Route Label: 301232
  22.                 Localpref: 100
  23.                 Router ID: 10.10.10.1
  24.                 Primary Routing Table bgp.evpn.0
  25.                 Indirect next hops: 1
  26.                         Protocol next hop: 10.10.10.1 Metric: 1
  27.                         Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  28.                         Indirect path forwarding next hops: 1
  29.                                 Next hop type: Router
  30.                                 Next hop: 192.169.100.15 via ge-1/1/0.0
  31.                                 Session Id: 0x0
  32.             10.10.10.1/32 Originating RIB: inet.3
  33.               Metric: 1           Node path count: 1
  34.               Forwarding nexthops: 1
  35.                 Nexthop: 192.169.100.15 via ge-1/1/0.0
  36. 1:10.10.10.1:0::112233445566778899::FFFF:FFFF/304 (1 entry, 1 announced)
  37.         *BGP    Preference: 170/-101
  38.                 Route Distinguisher: 10.10.10.1:0
  39.                 Next hop type: Indirect
  40.                 Address: 0x2a7b880
  41.                 Next-hop reference count: 16
  42.                 Source: 10.10.10.1
  43.                 Protocol next hop: 10.10.10.1
  44.                 Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  45.                 State: <Secondary Active Int Ext>
  46.                 Local AS:   100 Peer AS:   100
  47.                 Age: 4:21:25    Metric2: 1
  48.                 Validation State: unverified
  49.                 Task: BGP_100.10.10.10.1+179
  50.                 Announcement bits (1): 0-EVPN-100-evpn
  51.                 AS path: I
  52.                 Communities: target:100:100 esi-label:single-active (label 0)
  53.                 Import Accepted
  54.                 Localpref: 100
  55.                 Router ID: 10.10.10.1
  56.                 Primary Routing Table bgp.evpn.0
  57.                 Indirect next hops: 1
  58.                         Protocol next hop: 10.10.10.1 Metric: 1
  59.                         Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  60.                         Indirect path forwarding next hops: 1
  61.                                 Next hop type: Router
  62.                                 Next hop: 192.169.100.15 via ge-1/1/0.0
  63.                                 Session Id: 0x0
  64.             10.10.10.1/32 Originating RIB: inet.3
  65.               Metric: 1           Node path count: 1
  66.               Forwarding nexthops: 1
  67.                 Nexthop: 192.169.100.15 via ge-1/1/0.0
  68. 1:10.10.10.2:0::112233445566778899::FFFF:FFFF/304 (1 entry, 1 announced)
  69.         *BGP    Preference: 170/-101
  70.                 Route Distinguisher: 10.10.10.2:0
  71.                 Next hop type: Indirect
  72.                 Address: 0x2a7ae54
  73.                 Next-hop reference count: 6
  74.                 Source: 10.10.10.2
  75.                 Protocol next hop: 10.10.10.2
  76.                 Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  77.                 State: <Secondary Active Int Ext>
  78.                 Local AS:   100 Peer AS:   100
  79.                 Age: 13:54:16   Metric2: 1
  80.                 Validation State: unverified
  81.                 Task: BGP_100.10.10.10.2+179
  82.                 Announcement bits (1): 0-EVPN-100-evpn
  83.                 AS path: I
  84.                 Communities: target:100:100 esi-label:single-active (label 0)
  85.                 Import Accepted
  86.                 Localpref: 100
  87.                 Router ID: 10.10.10.2
  88.                 Primary Routing Table bgp.evpn.0
  89.                 Indirect next hops: 1
  90.                         Protocol next hop: 10.10.10.2 Metric: 1
  91.                         Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  92.                         Indirect path forwarding next hops: 1
  93.                                 Next hop type: Router
  94.                                 Next hop: 192.169.100.15 via ge-1/1/0.0
  95.                                 Session Id: 0x0
  96.             10.10.10.2/32 Originating RIB: inet.3
  97.               Metric: 1           Node path count: 1
  98.               Forwarding nexthops: 1
  99.                 Nexthop: 192.169.100.15 via ge-1/1/0.0

 

The type-1 route is known as an AD or Auto-Discovery route, and it comes in two distinct flavours:

  • A per-EVI AD route (line 3)
  • A per-ESI AD route (lines 36 and 68)

The first route (line 3) is known as a per-EVI route, and contains what’s known as the “aliasing label” (the Route Label on line 21). Technically this isn’t required in an active-standby situation, as it exists to ensure that traffic can be forwarded equally where you have multiple PEs in an active-active setup. It solves the problem of traffic polarisation: a CE hashes a given flow onto one egress link only, so only one PE learns and advertises that MAC, and return traffic ends up polarised as well. The aliasing label gets around this simply because a remote PE treats it like a regular MAC/IP route, but more on that in the next blog 🙂

The other two routes (lines 36 and 68) are per-ESI AD routes. They contain the ESI of the site and are advertised from MX-1 and MX-2. Notice that the community is set as “target:100:100 esi-label:single-active” with a label value of 0 (lines 52 and 84). This is essentially telling MX-3 that the ESI is running in single-active mode; if it were running in active-active mode, a non-zero MPLS label would be present in order to cater for split horizon and BUM traffic. In this case the setup is single-active, so there will only ever be one active path at a time back to site 1.

These routes also speed up convergence. If you’re advertising thousands of MAC/IP routes and you get a link failure, rather than having to send BGP messages to withdraw all of those routes one by one, the PE can simply withdraw the Ethernet AD route for the segment, and remote PEs can invalidate everything learnt via that segment in one go.
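
To make the per-EVI versus per-ESI distinction a bit more concrete, here is a minimal Python sketch that classifies the route prefixes exactly as Junos prints them in the outputs above. It is only string handling on the displayed format, not a BGP decoder, and the function and constant names are my own for illustration. The rule it applies is the one visible in the output: a type-1 route whose Ethernet tag is FFFF:FFFF is the per-ESI route, anything else is per-EVI.

```python
# Sketch only: classifies EVPN prefixes in the textual form shown by
# "show route table EVPN-100.evpn.0". Names here are illustrative.

ROUTE_TYPES = {
    1: "Ethernet auto-discovery",
    2: "MAC/IP advertisement",
    3: "Inclusive multicast",
    4: "Ethernet segment",
}

# Ethernet tag value seen on the per-ESI AD routes in the output above.
MAX_ETHERNET_TAG = "FFFF:FFFF"


def classify_evpn_route(prefix: str) -> str:
    """Return a short description for a Junos-style EVPN prefix string."""
    key = prefix.rsplit("/", 1)[0]            # drop the trailing /304
    type_str, rest = key.split(":", 1)        # leading digit is the route type
    route_type = int(type_str)
    name = ROUTE_TYPES.get(route_type, "unknown")
    fields = rest.split("::")                 # RD, then the type-specific fields

    if route_type == 1:
        scope = "per-ESI" if fields[-1] == MAX_ETHERNET_TAG else "per-EVI"
        return f"{scope} {name}"
    if route_type == 2:
        # RD :: tag :: MAC           -> MAC only
        # RD :: tag :: MAC :: IP     -> MAC plus IP
        return f"{name} ({'MAC+IP' if len(fields) == 4 else 'MAC only'})"
    return name


if __name__ == "__main__":
    for p in (
        "1:1.1.1.1:100::112233445566778899::0/304",
        "1:10.10.10.1:0::112233445566778899::FFFF:FFFF/304",
        "2:1.1.1.1:100::100::00:00:2e:18:6d:e1::192.168.100.10/304",
    ):
        print(p, "->", classify_evpn_route(p))
```

Feeding it the three type-1 prefixes from MX-3 gives one per-EVI route (from RD 1.1.1.1:100) and two per-ESI routes (from RDs 10.10.10.1:0 and 10.10.10.2:0), which matches the breakdown above.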

Next, let’s take a look at what’s going on at the main site, and see what MX1 is advertising to MX2:

 

tim@MX5-1> show route advertising-protocol bgp 10.10.10.2 evpn-esi-value 00:11:22:33:44:55:66:77:88:99 detail
VPN-100.inet.0: 8 destinations, 14 routes (8 active, 0 holddown, 0 hidden)
EVPN-100.evpn.0: 16 destinations, 16 routes (16 active, 0 holddown, 0 hidden)
* 1:1.1.1.1:100::112233445566778899::0/304 (1 entry, 1 announced)
 BGP group iBGP-PEs type Internal
     Route Distinguisher: 1.1.1.1:100
     Route Label: 301232
     Nexthop: Self
     Flags: Nexthop Change
     Localpref: 100
     AS path: [100] I
     Communities: target:100:100
__default_evpn__.evpn.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
* 1:10.10.10.1:0::112233445566778899::FFFF:FFFF/304 (1 entry, 1 announced)
 BGP group iBGP-PEs type Internal
     Route Distinguisher: 10.10.10.1:0
     Nexthop: Self
     Flags: Nexthop Change
     Localpref: 100
     AS path: [100] I
     Communities: target:100:100 esi-label:single-active (label 0)
* 4:10.10.10.1:0::112233445566778899:10.10.10.1/304 (1 entry, 1 announced)
 BGP group iBGP-PEs type Internal
     Route Distinguisher: 10.10.10.1:0
     Nexthop: Self
     Flags: Nexthop Change
     Localpref: 100
     AS path: [100] I
     Communities: es-import-target:22-33-44-55-66-77

 

You can see that there’s a new “type-4” route being advertised. This is known as an “Ethernet Segment (ES) route” and is advertised by PE routers which are configured with non-zero ESI values. Essentially, it carries a special extended community (the ES-Import target) that a PE router will only import if it has the same ESI configured, which means that two PE routers remote from one another know that they’re both connected to the same Ethernet segment. All other PE routers, whether they have the default ESI or a different non-zero ESI, filter these advertisements.
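
As a rough illustration of that filtering, the sketch below derives the ES-Import value from the configured ESI and decides whether a PE would keep a received type-4 route. The octet selection is an assumption inferred purely from this lab’s output, where ESI 00:11:22:33:44:55:66:77:88:99 produced es-import-target:22-33-44-55-66-77 (octets 3 to 8 of the 10-octet ESI); other ESI types or platforms may derive it differently, so treat the helper names and the rule as hypothetical rather than as the Junos implementation.

```python
# Sketch only: mirrors what the lab output shows, not a vendor algorithm.
from typing import Optional


def es_import_target(esi: str) -> str:
    """Derive the ES-Import value as observed in this lab (ESI octets 3-8)."""
    octets = esi.lower().split(":")
    if len(octets) != 10:
        raise ValueError("expected a 10-octet ESI, e.g. 00:11:22:33:44:55:66:77:88:99")
    return "-".join(octets[2:8])


def imports_es_route(local_esi: Optional[str], received_es_import: str) -> bool:
    """A PE keeps the type-4 route only if its own ESI yields the same target."""
    if local_esi is None:          # no multihomed segment configured
        return False
    return es_import_target(local_esi) == received_es_import


if __name__ == "__main__":
    advertised = es_import_target("00:11:22:33:44:55:66:77:88:99")
    print("MX1 advertises es-import-target:" + advertised)
    print("MX2 imports:", imports_es_route("00:11:22:33:44:55:66:77:88:99", advertised))
    print("MX3 imports:", imports_es_route(None, advertised))
```

In this lab that means MX2 (same ESI) imports the ES route from MX1, while MX3 (no ESI on its access interface) drops it.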

So, a quick recap: we’ve looked at the new route types, the control-plane and the configuration. The next step is to see how well it works. First, a quick reminder of the diagram:

Capture7

I’ve created a flow of IXIA traffic bidirectionally between the top site and the bottom site. If I go to MX-1 and look at the MPLS-facing interface, we should see the traffic:


Physical interface: ge-1/1/0, Enabled, Physical link is Up
Interface index: 147, SNMP ifIndex: 525
Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 1000mbps, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
Flow control: Enabled, Auto-negotiation: Enabled, Remote fault: Online
Pad to minimum frame size: Disabled
Device flags : Present Running
Interface flags: SNMP-Traps Internal: 0x0
Link flags : None
CoS queues : 8 supported, 8 maximum usable queues
Current address: a8:d0:e5:5b:7c:90, Hardware address: a8:d0:e5:5b:7c:90
Last flapped : 2016-06-10 20:15:19 UTC (5d 19:13 ago)
Input rate : 5599000 bps (500 pps)
Output rate : 5583408 bps (499 pps)

So it’s clear that traffic is being forwarded by MX-1. Because I’m sending packets at an exact rate of 1000pps, we should be able to measure how quickly fail-over occurs by counting the number of lost packets. For example, at 1000pps, if I lose 50 packets, that yields a fail-over time of 50ms.
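
The arithmetic is trivial, but for completeness here is a small Python sketch of it. The function name is mine, and it assumes the traffic generator really is sending at a constant rate, so that every lost frame represents a fixed slice of time.

```python
def failover_ms(lost_frames: int, rate_pps: float) -> float:
    """Estimate outage duration in milliseconds from frames lost at a constant rate."""
    if rate_pps <= 0:
        raise ValueError("rate_pps must be positive")
    # Each frame represents 1000 / rate_pps milliseconds of traffic.
    return lost_frames * 1000.0 / rate_pps


if __name__ == "__main__":
    print(failover_ms(50, 1000))   # 50 lost frames at 1000pps
```

At exactly 1000pps the number of lost frames and the outage in milliseconds are the same number, which is why that rate is convenient for this kind of test.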

First an easy failure – I’ll shut down ge-0/0/0 on EX4200-1, this will put the interface down/down on MX-1 and we’ll measure how long it takes to recover:


imtech@ex4200-1# set interfaces ge-0/0/0 disable
{master:0}[edit]
imtech@ex4200-1# commit
configuration check succeeds
commit complete
{master:0}[edit]
imtech@ex4200-1#

Let’s look at how much traffic was lost:

Fail1

Frames delta = 1077, so just a fraction longer than 1 second to fail over, which isn’t THAT bad. We might be able to improve it later.

Let’s check the EVPN instance to see how things have changed:

On MX1:

tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                 0       3
    Default gateway MAC addresses:       0       0
  Number of local interfaces: 2 (0 up)
    Interface name  ESI                            Mode             Status
    ge-1/1/5.100    00:11:22:33:44:55:66:77:88:99  single-active    Down
    ge-1/1/5.101    00:11:22:33:44:55:66:77:88:99  single-active    Down
  Number of IRB interfaces: 2 (0 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Down    VPN-100
    irb.101         101      Down    VPN-100
  Number of bridge domains: 2
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   0     Extended         Enabled
    101          1   0     Extended         Enabled
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:              1
        MAC+IP address advertisement:           1
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:              2
        MAC+IP address advertisement:           2
        Inclusive multicast:                    2
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by NH 1048582
      Local interface: ge-1/1/5.100, Status: Down
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.2       301008     301008          single-active
      Designated forwarder: 10.10.10.2
      Advertised MAC label: 301232
      Advertised aliasing label: 301232
      Advertised split horizon label: 0
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  VLAN ID: None
  Per-instance MAC route label: 299808
  MAC database status                Local  Remote
    Total MAC addresses:                 0       0
    Default gateway MAC addresses:       0       0
  Number of local interfaces: 0 (0 up)
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet auto-discovery:                0
        Ethernet Segment:                       1
  Number of ethernet segments: 0
tim@MX5-1>

 

So it’s pretty clear that things have gone down, and MX2 is the new active PE router. Let’s check it out:

tim@MX5-2> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.2:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                 1       2
    Default gateway MAC addresses:       2       0
  Number of local interfaces: 2 (2 up)
    Interface name  ESI                            Mode             Status
    ge-1/0/5.100    00:11:22:33:44:55:66:77:88:99  single-active    Up
    ge-1/0/5.101    00:11:22:33:44:55:66:77:88:99  single-active    Up
  Number of IRB interfaces: 2 (2 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
    irb.101         101      Up      VPN-100
  Number of bridge domains: 2
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   302272
    101          1   1     Extended         Enabled   302224
  Number of neighbors: 1
    10.10.10.3
      Received routes
        MAC address advertisement:              2
        MAC+IP address advertisement:           2
        Inclusive multicast:                    2
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ge-1/0/5.100
      Local interface: ge-1/0/5.100, Status: Up/Forwarding
      Designated forwarder: 10.10.10.2
      Advertised MAC label: 301008
      Advertised aliasing label: 301008
      Advertised split horizon label: 0
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.2:0
  VLAN ID: None
  Per-instance MAC route label: 299808
  MAC database status                Local  Remote
    Total MAC addresses:                 0       0
    Default gateway MAC addresses:       0       0
  Number of local interfaces: 0 (0 up)
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 0
  Number of neighbors: 0
  Number of ethernet segments: 0
tim@MX5-2>

 

 

If we look at the MPLS facing interface on MX2, we should see that all traffic is being sent and received via the MPLS network:


tim@MX5-2> show interfaces ge-1/1/0
Physical interface: ge-1/1/0, Enabled, Physical link is Up
Interface index: 147, SNMP ifIndex: 526
Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 1000mbps, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
Flow control: Enabled, Auto-negotiation: Enabled, Remote fault: Online
Pad to minimum frame size: Disabled
Device flags : Present Running
Interface flags: SNMP-Traps Internal: 0x0
Link flags : None
CoS queues : 8 supported, 8 maximum usable queues
Current address: a8:d0:e5:5b:75:90, Hardware address: a8:d0:e5:5b:75:90
Last flapped : 2016-06-10 20:08:17 UTC (5d 19:42 ago)
Input rate : 5605824 bps (502 pps)
Output rate : 5584392 bps (501 pps)

 

The solution itself is a lot more elegant than traditional FHRPs (first-hop redundancy protocols) such as VRRP or HSRP.

  • Because MX1 and MX2 automatically learn about each other via the MPLS network and the type-4 Ethernet Segment route, and NOT via the LAN (like HSRP), if there’s any problem with the MPLS side connected to the active router, it transitions to standby and the solution fails over.

If I fail the MPLS interface on the “P” router connected to MX1, we get failover in less than 1 second:


Axians@m10i-1# set interfaces ge-0/0/2 disable
[edit]
Axians@m10i-1# commit
commit complete

Then check the packet loss in IXIA:

Fail2

The solution recovers from the failure in 912ms.

This is pretty great, not least because it works reliably, but also because most of this functionality is built directly into the protocol. I haven’t had to do any crazy tracking of routes, and I haven’t needed to go anywhere near IP SLA or any of that horror that is a massive pain when designing this sort of thing. With EVPN, things are pretty simple and work reliably.

It’s not perfect however. Unlike HSRP or VRRP, which form an adjacency over the LAN via multicast, EVPN doesn’t do this; all information about other PEs is sent and received via BGP. If you have a complex LAN environment and a failure leaves the PEs isolated on the LAN side, you don’t get a traditional split-brain scenario like you would with HSRP or VRRP. Instead, the solution simply doesn’t fail over at all. The basic triggers for fail-over are that the physical interface goes down, the MPLS side goes down, or the entire PE goes down.
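
One way to picture this is as a tiny state model. The Python sketch below is purely illustrative (the class and field names are invented for this post, and it is not how Junos implements anything); it just encodes the three triggers listed above and shows that a broken LAN path behind an up/up port is invisible to the decision.

```python
from dataclasses import dataclass


@dataclass
class PeState:
    """Hypothetical view of what the active PE can and cannot observe."""
    pe_alive: bool = True          # the router itself is up
    access_link_up: bool = True    # physical state of the CE-facing interface
    core_reachable: bool = True    # MPLS/BGP side is still working
    lan_path_ok: bool = True       # frames really reach the CE (NOT seen by EVPN)


def stays_active(pe: PeState) -> bool:
    """The PE keeps forwarding unless one of the three basic triggers fires."""
    return pe.pe_alive and pe.access_link_up and pe.core_reachable


def traffic_blackholed(pe: PeState) -> bool:
    """True when the PE still thinks it is active but the LAN path is broken."""
    return stays_active(pe) and not pe.lan_path_ok


if __name__ == "__main__":
    print(stays_active(PeState(access_link_up=False)))     # physical failure: fails over
    print(traffic_blackholed(PeState(lan_path_ok=False)))  # logical failure: stuck
```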

This can easily be demonstrated by breaking the logical interface on EX4200-1 whilst leaving the physical interface up/up:


imtech@ex4200-1# set interfaces ge-0/0/0.0 disable
{master:0}[edit]
imtech@ex4200-1# commit
configuration check succeeds
commit complete

The whole solution breaks, and stays broken forever:

Fail3

So you still need to be careful with the design and the different way in which EVPN operates. Incidentally, you can use things like Ethernet OAM to get around this problem.

Just for laughs, let’s apply a basic Ethernet OAM config to MX1, MX2 and the EX4200:

OAM template (shown just on MX-1):

oam {
    ethernet {
        connectivity-fault-management {
            action-profile bring-down {
                event {
                    interface-status-tlv down;
                    adjacency-loss;
                }
                action {
                    interface-down;
                }
            }
            maintenance-domain "IEEE level 4" {
                level 4;
                maintenance-association PE1 {
                    short-name-format character-string;
                    continuity-check {
                        interval 100ms;
                        interface-status-tlv;
                    }
                    mep 1 {
                        interface ge-1/1/5.100;
                        direction down;
                        auto-discovery;
                        remote-mep 2 {
                            action-profile bring-down;
                        }
                    }
                }
            }
        }
    }
}
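
As a back-of-the-envelope check on how quickly this template should notice a broken path, the sketch below multiplies the continuity-check interval by the number of consecutive missed CCMs needed to declare adjacency loss. The interval of 100ms comes from the config above; the loss threshold of 3 is my assumption (it isn’t set in the template, so whatever default the platform uses applies), and the function name is made up for illustration.

```python
def cfm_detection_ms(interval_ms: int, loss_threshold: int = 3) -> int:
    """Approximate time to declare adjacency loss after CCMs stop arriving.

    loss_threshold is an assumed value, not taken from the config shown.
    """
    if interval_ms <= 0 or loss_threshold <= 0:
        raise ValueError("interval_ms and loss_threshold must be positive")
    return interval_ms * loss_threshold


if __name__ == "__main__":
    print(cfm_detection_ms(100))      # 100ms CCMs with the assumed threshold
    print(cfm_detection_ms(1000))     # the same threshold with 1s CCMs
```

Under that assumption, detection takes roughly 300ms, and the EVPN fail-over itself still has to happen on top of that, so the total outage will be somewhat longer.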

 

Just for clarity, the OAM configuration ensures that if there’s a problem with connectivity between MX1 and EX4200-1, or between MX2 and EX4200-1, while the physical interfaces remain up/up, OAM will detect the connectivity loss, automatically bring the line-protocol of the interface down, and force EVPN to fail over.

Let’s repeat the exact same test again, with the OAM configuration applied to the PEs and the switch:


imtech@ex4200-1# set interfaces ge-0/0/0.0 disable
{master:0}[edit]
imtech@ex4200-1# commit
configuration check succeeds
commit complete

And check the packet loss with IXIA:

Fail4

Not bad! 612 packets lost, which means failure and convergence in a little over 600ms. That’s a lot better than the original 1077ms when failing the physical interface, and a hell of a lot better than it being down forever if the network experiences a non-direct failure (software/logical fail).

Anyway, I hope you’ve found this useful. There are a few bits I’ve skipped over, but I’ll cover those in more detail when I do all-active redundancy in the next blog 🙂

 

EVPN Inter-VLAN routing + mobility

So in the last blog I essentially looked at one of the most basic aspects of EVPN – a multi-site layer-2 network with nothing fancy going on, with traffic forwarding occurring between multiple sites in the same VLAN. The fact of the matter is that there was nothing going on there that you couldn’t do with a traditional VPLS configuration, however the general idea was to demonstrate the basics and take a look at the basic control-plane first.

In this update we’ll be looking at some of the more exclusive and highly useful aspects of EVPNs which make it a very attractive technology for things such as data-centre interconnect, there are a few things which are possible with EVPN which cannot be done with VPLS.

Consider the revised topology:

Capture

It’s the same topology from the first blog post, however I’ve simply added an additional VLAN (VLAN 101) to ge-0/0/22 of each EX4200 LAN switch, and an additional IXIA host.

For this post we’re going to look at a rather cool way of performing inter-VLAN forwarding between hosts in VLAN100 and VLAN101. Not that I want to spend time teaching people how to suck eggs, but generally in a simple network with multiple VLANs you have 2 common ways of performing inter-VLAN forwarding:

  • Use a good ole’ fashioned router on a stick topology
  • Bolt some additional layer-3 functionality onto your layer-2 switch

As everyone knows, the latter method is by far the most common – the vast majority of switches support layer-3 routing functionality, usually in the form of IRB/BVI/SVI depending on the vendor in question.

In a service provider network, where we generally have a number of PE routers acting together as a large distributed switch, providing layer-2 connectivity – the old fashioned way of doing this would be with VPLS. In order to enable inter-VLAN forwarding we’d add a BVI interface to the VPLS instance, this enables a PE to do standard layer-2 switching and route between VLANs at layer-3 – which is very important for data-centre interconnect applications.

EVPN has a number of enhancements which make it more suitable for modern day data-centre interconnect designs, especially where things such as VM mobility are concerned. A company or organisation with a traditional MPLS based network, might require the ability to move hosts around between data centres seamlessly, without causing any real downtime.

Let’s take a look at the basic interface configuration and routing-instance configuration:

  1. interfaces {
  2.     irb {
  3.         unit 100 {
  4.             family inet {
  5.                 address 192.168.100.1/24;
  6.             }
  7.             mac 00:00:19:21:68:10;
  8.         }
  9.         unit 101 {
  10.             family inet {
  11.                 address 192.168.101.1/24;
  12.             }
  13.             mac 00:00:19:21:68:11;
  14.         }
  15.     }
  16. routing-instances {
  17. EVPN-100 {
  18.     instance-type virtual-switch;
  19.     route-distinguisher 1.1.1.1:100;
  20.     vrf-target target:100:100;
  21.     protocols {
  22.         evpn {
  23.             extended-vlan-list 100-101;
  24.             default-gateway do-not-advertise;
  25.         }
  26.     }
  27.     bridge-domains {
  28.         VL-100 {
  29.             vlan-id 100;
  30.             interface ge-1/1/5.100;
  31.             routing-interface irb.100;
  32.         }
  33.         VL-101 {
  34.             vlan-id 101;
  35.             interface ge-1/1/5.101;
  36.             routing-interface irb.101;
  37.         }
  38.     }
  39. }
  40. VPN-100 {
  41.     instance-type vrf;
  42.     interface irb.100;
  43.     interface irb.101;
  44.     route-distinguisher 100.100.100.1:100;
  45.     vrf-target target:1:100;
  46.     vrf-table-label;
  47. }

 

First things first – lines 1 – 15 take care of the IRB interfaces for VLAN 100 and VLAN 101; more of that shortly.

Lines 16 – 39 form the configuration for the EVPN routing instance. You’ll note a couple of differences from the first EVPN blog post:

  • The extended-vlan-list has been increased to include both VLANs within the routing instance
  • A new command “default-gateway do-not-advertise” is present under the EVPN protocol configuration
  • An additional bridge-domain has been configured for VLAN 101 under the routing-instance, along with the IRB interface for each VLAN
  • What looks like a totally standard L3VPN has been configured (lines 40 – 47), albeit with different RTs and RDs, and it contains the IRB interfaces from the EVPN routing instance.

The “default-gateway” statement controls whether a PE advertises its IRB MAC and IP address with the default-gateway extended community. If your PE routers have different IRB MAC addresses and IPv4 addresses, each PE will generate a “default-gateway route” which tells the other PEs in the EVPN that this MAC/IP is a default gateway somewhere. However, in this example, and as best practice, it’s simpler and easier to configure the same IRB MAC/IP on all your PEs, so the option used here is “do-not-advertise”, as we don’t need the route at this time.

But perhaps the coolest feature, and one of the biggest advantages EVPN has over VPLS, is the way the IRB interfaces are configured. In this topology the three PE routers (MX5-1, MX5-2 and MX5-3) all have an identical IRB interface configuration for VLAN 100 and VLAN 101; each PE has the exact same IP address and MAC address:

MX5-1:

imtech@MX5-1# run show configuration interfaces irb
unit 100 {
    family inet {
        address 192.168.100.1/24;
    }
    mac 00:00:19:21:68:10;
}
unit 101 {
    family inet {
        address 192.168.101.1/24;
    }
    mac 00:00:19:21:68:11;
}

MX5-2:

imtech@MX5-2# run show configuration interfaces irb
unit 100 {
    family inet {
        address 192.168.100.1/24;
    }
    mac 00:00:19:21:68:10;
}
unit 101 {
    family inet {
        address 192.168.101.1/24;
    }
    mac 00:00:19:21:68:11;
}

MX5-3:

imtech@MX5-3# run show configuration interfaces irb
unit 100 {
    family inet {
        address 192.168.100.1/24;
    }
    mac 00:00:19:21:68:10;
}
unit 101 {
    family inet {
        address 192.168.101.1/24;
    }
    mac 00:00:19:21:68:11;
}

The first time you see it, you think:

15omtr

But it’s true! All the PEs in the network have the exact same IP address and MAC address on their IRB interfaces. Why would we do that? And how does it work?

Consider the following scenario:

Capture2

Imagine a basic data-centre environment running things like VMware or OpenStack, where we can provision servers and move them around all over the place using things like vMotion. Picture the active server in the left-hand portion of the data-centre, with business as usual from a network perspective: ARP is learnt between the host and the left-hand PE, and the default-gateway is 192.168.100.1.

Now imagine that the DC admin flicks the switch, and that active VM on the left is immediately torn down and spun up inside the right-hand DC (which could be many miles away). You’ll notice that the gateway’s MAC address and the default-gateway IP are the same. This gives us the ability to move hosts around our data centres without having to worry about different default-gateways, or incurring too much downtime whilst we wait for things to re-ARP. Because everything is identical at each DC site, there’s no problem moving things between one site and the next.

Capture3
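
To show why the identical IRB configuration matters for a moving VM, here is a small Python sketch. The gateway IP and MAC are the irb.100 values from the configs above; everything else (the dictionary layout, the function name, and the “unique MAC” comparison table) is hypothetical and only there to contrast the two designs.

```python
# irb.100 as configured identically on all three PEs in this lab.
ANYCAST_GATEWAYS = {
    "MX5-1": {"ip": "192.168.100.1", "mac": "00:00:19:21:68:10"},
    "MX5-2": {"ip": "192.168.100.1", "mac": "00:00:19:21:68:10"},
    "MX5-3": {"ip": "192.168.100.1", "mac": "00:00:19:21:68:10"},
}

# Hypothetical contrast: same gateway IP, but a different MAC per PE.
UNIQUE_MAC_GATEWAYS = {
    "MX5-1": {"ip": "192.168.100.1", "mac": "00:00:19:21:68:10"},
    "MX5-3": {"ip": "192.168.100.1", "mac": "00:00:19:21:68:30"},
}


def arp_entry_still_valid(host_arp: dict, new_pe: str, gateways: dict) -> bool:
    """After a move, does the host's cached gateway MAC match the new PE?"""
    gw = gateways[new_pe]
    return host_arp.get(gw["ip"]) == gw["mac"]


if __name__ == "__main__":
    # ARP cache the VM built while it lived behind MX5-1.
    vm_arp = {"192.168.100.1": "00:00:19:21:68:10"}
    print(arp_entry_still_valid(vm_arp, "MX5-3", ANYCAST_GATEWAYS))
    print(arp_entry_still_valid(vm_arp, "MX5-3", UNIQUE_MAC_GATEWAYS))
```

With the anycast-style configuration the VM’s cached entry is still correct the moment it lands behind MX5-3; with per-PE MACs it would keep sending frames to a MAC that isn’t local until its ARP entry is refreshed.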

You cannot do this with VPLS as the implementation demands that you use unique MAC-addresses, which moves us on deeper into the technology – how does EVPN achieve this breakthrough?

It essentially boils down to the way that EVPN has been engineered to more closely integrate with the layer-3 world. The software has a number of hooks which go between EVPN and L3VPN in a much more elegant fashion than VPLS. The first blog post showed how MAC addresses were learnt and inserted into the BGP control-plane; in this example, for inter-VLAN forwarding, a few extra things are happening:

  • Firstly, we have the BGP MAC advertisement from the L2 world
  • Secondly, we get a new MAC/IP advertisement containing the host’s MAC and IP address, which the PE learns via the ARP table on its IRB interface
  • Thirdly, we get a totally standard /32 IPv4 L3VPN route for the host’s address, which is advertised to all remote PEs
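
The three advertisements can be sketched as plain strings, using the same prefix layout that Junos prints for EVPN and L3VPN routes in these posts. This is only string formatting for illustration (the function name is mine, and nothing here encodes real BGP NLRI); note that the MAC/IP route carries the host’s addresses, as it does in the route tables shown in this post.

```python
def host_advertisements(rd: str, vlan: int, mac: str, ip: str) -> dict:
    """Build the three routes a PE originates for one locally learnt host."""
    return {
        # Type-2 route with the MAC only (pure L2 learning).
        "evpn_mac": f"2:{rd}::{vlan}::{mac}/304",
        # Type-2 route with MAC and IP (learnt via ARP on the IRB).
        "evpn_mac_ip": f"2:{rd}::{vlan}::{mac}::{ip}/304",
        # Ordinary host route exported from the L3VPN that holds the IRB.
        "l3vpn_host": f"{ip}/32",
    }


if __name__ == "__main__":
    routes = host_advertisements(
        rd="1.1.1.2:100", vlan=101, mac="00:00:2e:e6:77:97", ip="192.168.101.11"
    )
    for name, prefix in routes.items():
        print(f"{name:12} {prefix}")
```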

Let’s recap a more basic version of the lab diagram and see what the control-plane looks like when we send some traffic between hosts in different VLANs:

Capture4

Now let’s look at the BGP control-plane on MX-1 and see what’s going on:

  1. imtech@MX5-1> show route protocol bgp table EVPN-100.evpn.0
  2. EVPN-100.evpn.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)
  3. + = Active Route, – = Last Active, * = Both
  4. 2:1.1.1.2:100::101::00:00:2e:e6:77:97/304
  5.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  6.                       AS path: I, validation-state: unverified
  7.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  8. 2:1.1.1.2:100::101::00:00:2e:e6:77:97::192.168.101.11/304  
  9.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  10.                       AS path: I, validation-state: unverified
  11.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  12. 3:1.1.1.2:100::100::10.10.10.2/304
  13.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  14.                       AS path: I, validation-state: unverified
  15.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  16. 3:1.1.1.2:100::101::10.10.10.2/304
  17.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  18.                       AS path: I, validation-state: unverified
  19.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  20. imtech@MX5-1> show route protocol bgp table VPN-100.inet.0
  21. VPN-100.inet.0: 6 destinations, 9 routes (6 active, 0 holddown, 0 hidden)
  22. + = Active Route, – = Last Active, * = Both
  23. 192.168.100.0/24    [BGP/170] 00:04:44, localpref 100, from 10.10.10.2
  24.                       AS path: I, validation-state: unverified
  25.                     > to 192.169.100.11 via ge-1/1/0.0, Push 16, Push 299968(top)
  26. 192.168.101.0/24    [BGP/170] 00:04:44, localpref 100, from 10.10.10.2
  27.                       AS path: I, validation-state: unverified
  28.                     > to 192.169.100.11 via ge-1/1/0.0, Push 16, Push 299968(top)
  29. 192.168.101.11/32   [BGP/170] 00:04:44, localpref 100, from 10.10.10.2
  30.                       AS path: I, validation-state: unverified
  31.                     > to 192.169.100.11 via ge-1/1/0.0, Push 16, Push 299968(top)

You’ll immediately notice that compared to the vanilla L2VPN implementation, there’s a lot more going on. Let’s break it down:

  • Line 4 is the standard MAC advertisement route, the same sort of advertisement we went over with the vanilla L2-only version of EVPN. This is for layer-2 connectivity only.
  • Line 8 is an EVPN MAC/IP route, which is basically the ARP mapping learnt directly from MX2. This route makes it possible for all PEs in the network to synchronise their ARP tables with each other!
  • Line 29 is a standard L3VPN route, containing the /32 host behind MX2

Line 8 essentially means that as soon as you move a host from one place to another, the moment a packet lands on the ingress PE interface, that PE generates a new MAC/IP ARP route and all other PEs synchronise accordingly. Meanwhile the host that’s moved doesn’t need to do anything other than keep sending packets at the exact same gateway IP/MAC as it did before it was moved. Essentially we have layer-2 and layer-3 working together in harmony.

Line 29 is a standard L3VPN /32 host route for the host behind MX2. This means that if you have EVPN running across numerous data-centres in various places, and this is connected to a wider layer-3 network such as traditional residential/business PE routers, these other routers don’t need to have any awareness of EVPN whatsoever. So long as they can participate in regular L3VPN, packets will always be delivered to the right place when things get moved around, because these routes are dynamically generated and advertised accordingly. This is a massive advantage over VPLS, as you don’t need to configure it in every corner of the network for it to be useful. It simply lives on your DC edge, and the rest is left to vanilla L3VPN.
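To make the ARP synchronisation idea a bit more concrete, here is a minimal Python sketch of what the MAC/IP route achieves when a host moves. It is only a toy model of the control-plane behaviour described above: the `PE` class and the `advertise_mac_ip` function are hypothetical names invented for illustration, not anything that exists in Junos, and the PE names are just labels.

```python
from dataclasses import dataclass, field


@dataclass
class PE:
    """A toy PE holding an ARP-style table: host IP -> (MAC, originating PE)."""
    name: str
    arp: dict = field(default_factory=dict)


def advertise_mac_ip(origin: PE, all_pes: list, ip: str, mac: str) -> None:
    """Model a type-2 MAC/IP route: every PE records the binding and who sent it."""
    for pe in all_pes:
        pe.arp[ip] = (mac, origin.name)


mx1, mx2, mx3 = PE("MX-1"), PE("MX-2"), PE("MX-3")
pes = [mx1, mx2, mx3]

# The host is first seen behind MX-2 (IP and MAC taken from the output above).
advertise_mac_ip(mx2, pes, "192.168.101.11", "00:00:2e:e6:77:97")

# The VM is moved; its first packet lands on MX-1, which re-advertises it.
advertise_mac_ip(mx1, pes, "192.168.101.11", "00:00:2e:e6:77:97")

for pe in pes:
    print(pe.name, pe.arp["192.168.101.11"])
```

After the second advertisement every PE points at MX-1 for the host, while the host itself has not changed its IP, MAC or gateway.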

There are a few more enhancements due at some point soon, including quite an interesting one which is the “MAC mobility extended-community” which is essentially a safeguard to prevent a few rather nasty situations from arising:

  • A layer-2 loop, where two PEs constantly advertise the same MAC addresses – which could overwhelm the BGP control-plane
  • A situation where a pair of hosts, each in a different DC, are mis-configured with the same MAC address. If they’re both sending data then each PE will keep generating route advertisements.

The MAC mobility extended community defined in RFC 7432 (section 15) carries a sequence number that is incremented each time a MAC address moves. If the same MAC moves a certain number of times within a specific period (the RFC suggests defaults of 5 moves in 180 seconds), it’s assumed that something is broken, and the routers should perform some sort of damping and alerting procedure to prevent network meltdown.
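The duplicate-MAC safeguard can be sketched in a few lines of Python. This is a simplified illustration only: the class name `MacMoveTracker` and its methods are made up for this post, the thresholds are the RFC 7432 defaults (5 moves in 180 seconds), and a real implementation would also compare the sequence numbers received from other PEs rather than just counting local moves.

```python
from collections import defaultdict, deque

MOVE_LIMIT = 5        # N: number of moves that triggers detection (RFC 7432 default)
WINDOW_SECONDS = 180  # M: length of the detection window in seconds (RFC 7432 default)


class MacMoveTracker:
    """Counts recent moves per MAC address and flags likely duplicates."""

    def __init__(self, limit=MOVE_LIMIT, window=WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        self.moves = defaultdict(deque)   # mac -> timestamps of recent moves
        self.sequence = defaultdict(int)  # mac -> sequence number we would advertise

    def record_move(self, mac, now):
        """Register one move at time `now`; return True if a duplicate is suspected."""
        self.sequence[mac] += 1
        history = self.moves[mac]
        history.append(now)
        # Forget moves that have aged out of the detection window.
        while history and now - history[0] > self.window:
            history.popleft()
        return len(history) >= self.limit


tracker = MacMoveTracker()
flags = [tracker.record_move("00:00:2e:e6:77:97", t) for t in (0, 10, 20, 30, 40)]
print(flags)  # [False, False, False, False, True]
```

Once `record_move` returns True, the PE would stop advertising that MAC and raise an alert, which is the damping behaviour described above.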

I hope you found this useful! In the next one I’ll be looking at some of the redundant designs, including single-active and all-active multi-homing.

EVPN – the basics

So I decided to take a deep dive into EVPN. I’ll mostly be looking into VLAN-aware bundling, as per RFC 7432, mostly because I think this fits more closely with the types of deployments most of the customers are used to: good old IRB interfaces and bridge-tables!

As everyone knows, VPLS has been available for many years now and it’s pretty widely deployed. Most of the customers I see have some flavour of VPLS configured on their networks and use it to good effect, so why EVPN? What’s the point in introducing a new technology if the current one appears to work fine?

The reality is that multipoint layer-2 VPNs (VPLS) were never quite as polished as layer-3 VPNs. When layer-3 VPNs were first invented they became, and in many cases still are, the “go to” technology for layer-3 connectivity across MPLS networks, and the technology itself hasn’t really changed that much for well over a decade. The same cannot be said for VPLS; over the years we’ve had many different iterations of the technology:

  • Vanilla VPLS
    • LDP signalled
    • BGP signalled
  • H-VPLS (hierarchical VPLS)
    • BGP based
    • LDP based
  • VPLS auto-discovery

Along with the different types of VPLS, the technology itself has been repeatedly modified with hacks and patches, in order to get around some annoyingly simple problems, for example:

  1. VPLS auto-discovery is only supported under BGP signalling – you can’t do it if you’re using LDP signalled VPLS,
  2. H-VPLS – in order to get around the fully meshed pseudowire problem of vanilla VPLS, H-VPLS introduced a hierarchy to cut down on the number of pseudowires in large networks. Unfortunately the design often ends up being cumbersome and complicated.
  3. mac-address learning – VPLS has no layer-2 control plane, it learns mac-addresses directly from the data-plane like a standard switch. That is fine if it’s taking place inside a single device, but across a large distributed network with many thousands of mac-addresses, a loss of any attachment circuit can result in stale forwarding state and slow convergence/recovery.
  4. all-active CE-Multihoming – you simply can’t do it in VPLS. A multihomed CE can only forward over one active link at a time, which is a major pain for large-scale modern data centres with lots and lots of layer-2 connectivity.
  5. Layer-3 integration – With VPLS it’s typical to use a BVI or IRB interface as the layer-3 gateway to a VLAN, however there’s no real integration between the layer-2 and layer-3 world. You still need VRRP for first hop redundancy, which comes with all the pain you’d expect (traffic black-holing, complex tracking requirements, interface timers, etc).

The topology I’m going to use for this is shown below:

Capture

A few basic points about the network:

  • The 3x “P” routers in the core of the network are Juniper M10i series, running nothing other than ISIS/LDP/MPLS
  • The 3x “PE” routers are Juniper MX5s, each running Junos 14.1R6.4, with connectivity via a 20x1G MIC
  • The 3x “EX4200” switches are doing nothing other than trunking VLAN 100 towards each MX5
  • Each IXIA port has a single host on VLAN 100

The first lab will look at EVPN with basic MPLS transport, which is essentially a replacement for vanilla VPLS. We have three sites, each with a single switch, all in VLAN 100 on a common /24 subnet. There’s nothing fancy going on, no layer-3 routing or bridging anywhere; this is all strictly layer-2 for now.

The first thing to note about EVPN is that the core of it is built around a BGP control-plane. There’s no LDP or anything else involved in the signalling, it’s BGP only, which is great because we all love BGP. The first step is to enable the EVPN address family (AFI 25 for L2VPN and the new SAFI 70 for EVPN):

(Output taken from MX5-1, but identical on all 3 PEs, <except for IP addressing obviously>)

  bgp {
      group iBGP-PEs {
          type internal;
          local-address 10.10.10.1;
          family evpn {
              signaling;
          }
          neighbor 10.10.10.2;
          neighbor 10.10.10.3;
      }
  }

 

This enables EVPN signalling, which is essential. Unlike VPLS there’s no manual provisioning of pseudowires, because there are no pseudowires. Just like L3 VPNs, everything is handled via BGP and uses the same route-distinguishers and route-targets that we’ve all come to love.

The configuration for this lab is pretty much identical across all three PEs but we’ll look at MX5-1 for this example, first the LAN facing interface:

  ge-1/1/5 {
      flexible-vlan-tagging;
      encapsulation flexible-ethernet-services;
      unit 100 {
          encapsulation vlan-bridge;
          vlan-id 100;
      }
  }

 

Followed by the evpn routing-instance:

  1. routing-instances {
  2.     EVPN-100 {
  3.         instance-type virtual-switch;
  4.         route-distinguisher 1.1.1.1:100;
  5.         vrf-target target:100:100;
  6.         protocols {
  7.             evpn {
  8.                 extended-vlan-list 100;
  9.             }
  10.         }
  11.         bridge-domains {
  12.             VL-100 {
  13.                 vlan-id 100;
  14.                 interface ge-1/1/5.100;
  15.             }
  16.         }
  17.     }
  18. }

 

A few things to note about the routing-instance:

  • Lines 4 and 5 mark the “RD” and “RT”, which are essentially the same as in a standard L3VPN setup
  • The routing-instance is of type “virtual-switch” and the bridge-domain sits inside it
  • This is essentially configured the same way as a VPLS virtual-switch, except with a different protocol.

Before we send any traffic or try to get any connectivity, let’s take a look at the basic control-plane and exactly what BGP is getting up to whilst things are simple.

  greg@MX5-1# run show bgp summary
  Groups: 1 Peers: 2 Down peers: 0
  Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
  bgp.evpn.0
                         2          2          0          0          0          0
  Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped…
  10.10.10.2              100        231        231       0       1     1:40:54 Establ
    bgp.evpn.0: 1/1/1/0
    EVPN-100.evpn.0: 1/1/1/0
    __default_evpn__.evpn.0: 0/0/0/0
  10.10.10.3              100        229        231       0       1     1:40:40 Establ
    bgp.evpn.0: 1/1/1/0
    EVPN-100.evpn.0: 1/1/1/0
    __default_evpn__.evpn.0: 0/0/0/0
  [edit]
  greg@MX5-1#

 

You’ll notice that before we’ve sent any traffic or done anything, that we have two types of table under each established BGP peer:

  • “bgp.evpn.0” for the core-facing BGP adjacency, (the same as regular L3VPN)
  • “EVPN-100.evpn.0” for the routing-instance table, (again the same as regular L3VPN)

You’ll also notice that we’re receiving 1 route from each PE, for each table, if we investigate further and take a look:

  greg@MX5-1# run show route table bgp.evpn.0
  bgp.evpn.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
  + = Active Route, - = Last Active, * = Both
  3:1.1.1.2:100::100::10.10.10.2/304
                     *[BGP/170] 00:10:42, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  3:1.1.1.3:100::100::10.10.10.3/304
                     *[BGP/170] 00:10:40, localpref 100, from 10.10.10.3
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299936
  [edit]
  greg@MX5-1# run show route table EVPN-100.evpn.0
  EVPN-100.evpn.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
  + = Active Route, - = Last Active, * = Both
  3:1.1.1.1:100::100::10.10.10.1/304
                     *[EVPN/170] 00:10:54
                        Indirect
  3:1.1.1.2:100::100::10.10.10.2/304
                     *[BGP/170] 00:10:49, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  3:1.1.1.3:100::100::10.10.10.3/304
                     *[BGP/170] 00:10:47, localpref 100, from 10.10.10.3
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299936

 

Because everyone reading this has eyes like hawks 😉  you’ll immediately notice the strange looking /304 routes coming from each adjacent PE, let’s examine the first one:

3:1.1.1.2:100::100::10.10.10.2/304  

The format is essentially: 3 : <RD> :: <VLAN-ID> :: <ROUTER-ID> /304

It also contains the “ROUTER-ID-LENGTH”, which is obviously /32, however Juniper hides this from the output. It should be obvious to most people what all these values are, except for the “3”. What does that mean?
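If you want to pick these routes apart in a script, the displayed format splits quite cleanly. The snippet below is a small Python sketch that parses the string exactly as Junos prints it in the output above; the `InclusiveMulticastRoute` class and `parse_type3` function are names I have made up for illustration, and it only understands the type-3 layout described here.

```python
from dataclasses import dataclass


@dataclass
class InclusiveMulticastRoute:
    route_type: int
    rd: str
    vlan_id: int
    router_id: str
    prefix_len: int


def parse_type3(text: str) -> InclusiveMulticastRoute:
    """Split a string such as '3:1.1.1.2:100::100::10.10.10.2/304' into its fields."""
    body, _, length = text.strip().rpartition("/")
    head, vlan, router_id = body.split("::")
    route_type, _, rd = head.partition(":")
    if route_type != "3":
        raise ValueError(f"not a type-3 route: {text!r}")
    return InclusiveMulticastRoute(int(route_type), rd, int(vlan), router_id, int(length))


route = parse_type3("3:1.1.1.2:100::100::10.10.10.2/304")
print(route)
```

The parser leans on the double colon as the field separator, which works because the RD itself only ever contains single colons.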

It’s important to note that evpn defines a set of route types, as shown below:

  • Type 1 – Ethernet auto-discovery route
  • Type 2 – MAC/IP advertisement route
  • Type 3 – Inclusive multicast Ethernet tag route
  • Type 4 – Ethernet segment (ES) route
  • Type 5 – IP prefix route

Type 3 routes are for signalling the inclusive tunnel. With VLAN-aware evpn each PE generates a VLAN-specific inclusive tunnel which is used for BUM (broadcast, unknown unicast and multicast) traffic. Basically, it’s used to send BUM traffic to all PEs that have sites in the same VLAN. Let’s look at it in even more detail:

 

  1. greg@MX5-1# run show route table bgp.evpn.0 extensive
  2. bgp.evpn.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
  3. 3:1.1.1.2:100::100::10.10.10.2/304 (1 entry, 0 announced)
  4.         *BGP    Preference: 170/-101
  5.                 Route Distinguisher: 1.1.1.2:100
  6. PMSI: Flags 0x0: Label 300512: Type INGRESS-REPLICATION 10.10.10.2
  7.                 Next hop type: Indirect
  8.                 Address: 0x2fa4c34
  9.                 Next-hop reference count: 2
  10.                 Source: 10.10.10.2
  11.                 Protocol next hop: 10.10.10.2
  12.                 Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  13.                 State: <Active Int Ext>
  14.                 Local AS:   100 Peer AS:   100
  15.                 Age: 30:23  Metric2: 1
  16.                 Validation State: unverified
  17.                 Task: BGP_100.10.10.10.2+56692
  18.                 AS path: I
  19.                 Communities: target:100:100
  20.                 Import Accepted
  21.                 Localpref: 100
  22.                 Router ID: 10.10.10.2
  23.                 Secondary Tables: EVPN-100.evpn.0
  24.                 Indirect next hops: 1
  25.                         Protocol next hop: 10.10.10.2 Metric: 1
  26.                         Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  27.                         Indirect path forwarding next hops: 1
  28.                                 Next hop type: Router
  29.                                 Next hop: 192.169.100.11 via ge-1/1/0.0
  30.                                 Session Id: 0x0
  31.             10.10.10.2/32 Originating RIB: inet.3
  32.               Metric: 1           Node path count: 1
  33.               Forwarding nexthops: 1
  34.                 Nexthop: 192.169.100.11 via ge-1/1/0.0

 

Line 6 shows the PMSI (provider multicast service interface) tunnel attribute carried by the route, which is of type “ingress-replication”. One important thing to note: label 300512 is a downstream allocated label, the same as what’s commonly used in P2MP LSPs for multicast services. Essentially, in this case MX5-1 uses the remotely learnt service label to send BUM traffic to the remote PE. Put the other way round, the advertising PE (10.10.10.2) expects to receive BUM traffic from the other PEs tagged with IR label 300512.
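Ingress replication is simple enough to sketch in a few lines of Python: the ingress PE makes one copy of each BUM frame per remote PE and tags each copy with the label that PE advertised in its type-3 route. The function name `replicate_bum` is hypothetical, and so is the label for 10.10.10.3 (only the 300512 label for 10.10.10.2 comes from the output above). The sketch also ignores the transport label that would sit on top.

```python
def replicate_bum(frame: bytes, local_pe: str, im_routes: dict) -> list:
    """Return one (remote PE, IR label, frame) copy per remote PE in the VLAN."""
    copies = []
    for pe, label in sorted(im_routes.items()):
        if pe == local_pe:
            continue  # never replicate back to ourselves
        copies.append((pe, label, frame))
    return copies


# Labels learnt from each remote PE's type-3 route.
im_routes = {
    "10.10.10.2": 300512,  # from the PMSI attribute shown above
    "10.10.10.3": 300528,  # hypothetical value, not taken from the lab
}

broadcast = b"\xff" * 6 + b"payload"
copies = replicate_bum(broadcast, "10.10.10.1", im_routes)
for pe, label, _ in copies:
    print(f"send copy to {pe} with IR label {label}")
```

The cost of this approach is that the ingress PE does all the copying, so the replication load grows with the number of PEs in the VLAN.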

Moving on – for people new to evpn, one of the coolest concepts is the way in which BGP is used to advertise mac-addresses… rather than plain old IP subnets – this is fantastic because we now have an intelligent control-plane maintained across the whole network in a scalable and stable fashion, rather than having to rely on less reliable data-plane learning.

For the first basic test, we’ll send bi-directional traffic between the host connected to EX4200-1 on MX5-1 and the host connected to EX4200-2 on MX5-2.

Let’s recap the diagram and spin up some hosts:

Capture2

We’ll start with a single host at each site, and send traffic both ways, 1Mbps each way for a total of 2Mbps, (the hosts are in the same /24 VLAN100 – 192.168.100.1 and 192.168.100.2) 

Capture3

Traffic is being forwarded end to end, so let’s check the routing and see how the control-plane has changed:

 

  greg@MX5-1# run show route table bgp.evpn.0
  bgp.evpn.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
  + = Active Route, - = Last Active, * = Both
  2:1.1.1.3:100::100::00:00:0e:52:42:29/304
                     *[BGP/170] 00:04:04, localpref 100, from 10.10.10.3
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299936
  3:1.1.1.2:100::100::10.10.10.2/304
                     *[BGP/170] 00:53:37, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  3:1.1.1.3:100::100::10.10.10.3/304
                     *[BGP/170] 00:53:35, localpref 100, from 10.10.10.3
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299936
  [edit]
  greg@MX5-1# run show route table EVPN-100.evpn.0
  EVPN-100.evpn.0: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)
  + = Active Route, - = Last Active, * = Both
  2:1.1.1.1:100::100::00:00:0e:52:23:91/304
                     *[EVPN/170] 00:04:13
                        Indirect
  2:1.1.1.3:100::100::00:00:0e:52:42:29/304
                     *[BGP/170] 00:04:13, localpref 100, from 10.10.10.3
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299936
  3:1.1.1.1:100::100::10.10.10.1/304
                     *[EVPN/170] 00:53:51
                        Indirect
  3:1.1.1.2:100::100::10.10.10.2/304
                     *[BGP/170] 00:53:46, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  3:1.1.1.3:100::100::10.10.10.3/304
                     *[BGP/170] 00:53:44, localpref 100, from 10.10.10.3
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299936
  [edit]
  greg@MX5-1#

 

The type-3 routes are still present as before for the inclusive tunnels, but you’ll notice the addition of the new type-2 MAC/IP route, this is essentially a BGP NLRI containing a mac-address instead of an IP subnet – pretty cool huh?

The indirect route is the one learnt locally from the connected LAN; the one known via BGP/170 is the one from the remote PE. Packets destined for that mac-address have label 299936 pushed on them and are forwarded directly out of the MPLS-facing core interface, like any regular MPLS packet. The same label appears on the type-3 route from 10.10.10.3, which suggests it is the transport label towards that PE rather than anything specific to the MAC.

Let’s take a more detailed look at a type-2 route:

  2:1.1.1.3:100::100::00:00:0e:52:42:29/304 (1 entry, 1 announced)
          *BGP    Preference: 170/-101
                  Route Distinguisher: 1.1.1.3:100
                  Next hop type: Indirect
                  Address: 0x2705954
                  Next-hop reference count: 4
                  Source: 10.10.10.3
                  Protocol next hop: 10.10.10.3
                  Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                  State: <Secondary Active Int Ext>
                  Local AS:   100 Peer AS:   100
                  Age: 14:20  Metric2: 1
                  Validation State: unverified
                  Task: BGP_100.10.10.10.3+64545
                  Announcement bits (1): 0-EVPN-100-evpn
                  AS path: I
                  Communities: target:100:100
                  Import Accepted
                  Route Label: 300048
                  ESI: 00:00:00:00:00:00:00:00:00:00
                  Localpref: 100
                  Router ID: 10.10.10.3
                  Primary Routing Table bgp.evpn.0
                  Indirect next hops: 1
                          Protocol next hop: 10.10.10.3 Metric: 1
                          Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                          Indirect path forwarding next hops: 1
                                  Next hop type: Router
                                  Next hop: 192.169.100.11 via ge-1/1/0.0
                                  Session Id: 0x0
              10.10.10.3/32 Originating RIB: inet.3
                Metric: 1           Node path count: 1
                Forwarding nexthops: 1
                  Nexthop: 192.169.100.11 via ge-1/1/0.0

 

A basic recap on MPLS forwarding: with the above route, the originating PE (MX5-3, 10.10.10.3) is notifying all other PEs in the network that if they receive a frame on an interface inside “EVPN-100” on VLAN 100 for destination MAC-address 00:00:0e:52:42:29, they should impose MPLS label 300048 and send it MX5-3’s way.
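The forwarding decision on the ingress PE can be sketched as a pair of lookups. This is a rough Python illustration, not how the PFE actually does it: the `label_stack` function and both tables are invented for this example, and it assumes that the 299936/299904 values seen in the route output are the LDP transport labels towards each remote PE, with 300048 being the per-route service label.

```python
# MAC table built from received type-2 routes: MAC -> (remote PE loopback, service label).
mac_table = {"00:00:0e:52:42:29": ("10.10.10.3", 300048)}

# Transport labels towards each remote PE (assumed to be the LDP labels from inet.3).
transport = {"10.10.10.2": 299904, "10.10.10.3": 299936}


def label_stack(dst_mac):
    """Return the label stack, top label first, or None for an unknown MAC."""
    entry = mac_table.get(dst_mac)
    if entry is None:
        return None  # unknown unicast: would be flooded via the inclusive tunnel
    remote_pe, service_label = entry
    return [transport[remote_pe], service_label]


print(label_stack("00:00:0e:52:42:29"))  # [299936, 300048]
print(label_stack("00:00:0e:52:99:99"))  # None
```

The inner label tells the remote PE which EVPN instance the frame belongs to, and the outer label simply gets the packet across the core.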

Another new aspect of evpn can be seen under the “ESI” field. “ESI” stands for “Ethernet segment identifier”; essentially it’s a way of labelling individual Ethernet segments, but it’s only used for multihomed designs. For a single-homed site like this one it should remain at the default of all zeros (more on ESIs in the next blog).

To demonstrate the control-plane learning and MAC/IP advertisement mechanism more effectively, let’s spin up all 3 sites with 50 hosts per site, then send a full mesh of traffic (150 streams in total) and see what the control-plane looks like.

Quick recap of the diagram showing all 3 sites, with 50 hosts per site:

Capture4

Plenty of juicy MAC/IP routes!

 

  greg@MX5-1# run show route summary
  Autonomous system number: 100
  Router ID: 10.10.10.1
  inet.0: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)
                Direct:      3 routes,      3 active
                 Local:      2 routes,      2 active
                Static:      1 routes,      1 active
                 IS-IS:      7 routes,      7 active
                   LDP:      1 routes,      1 active
  inet.3: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)
                   LDP:      5 routes,      5 active
  iso.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
                Direct:      1 routes,      1 active
  mpls.0: 18 destinations, 18 routes (18 active, 0 holddown, 0 hidden)
                  MPLS:      6 routes,      6 active
                   LDP:      6 routes,      6 active
                  EVPN:      6 routes,      6 active
  bgp.evpn.0: 102 destinations, 102 routes (102 active, 0 holddown, 0 hidden)
                   BGP:    102 routes,    102 active
  EVPN-100.evpn.0: 153 destinations, 153 routes (153 active, 0 holddown, 0 hidden)
                   BGP:    102 routes,    102 active
                  EVPN:     51 routes,     51 active
  [edit]
  greg@MX5-1#

 

Lots of MAC/IP routes 🙂
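The numbers in the summary add up nicely if you assume each PE originates one type-2 route per local MAC plus a single type-3 route for the VLAN. Here is a quick Python sanity check of that arithmetic; the `expected_routes` function is just a helper written for this post.

```python
def expected_routes(sites: int, hosts_per_site: int) -> dict:
    """Estimate EVPN route counts as seen from one PE in a single-VLAN lab."""
    per_pe = hosts_per_site + 1    # one type-2 per MAC plus one type-3 for the VLAN
    remote = (sites - 1) * per_pe  # routes learnt from the other PEs via BGP
    local = per_pe                 # routes this PE originates itself
    return {"bgp.evpn.0": remote, "EVPN-100.evpn.0": remote + local}


print(expected_routes(3, 50))  # {'bgp.evpn.0': 102, 'EVPN-100.evpn.0': 153}
```

That matches the 102 BGP routes and the 153 routes (102 BGP plus 51 EVPN) in the instance table above.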

A quick look at the BGP table:

 

  bgp.evpn.0: 102 destinations, 102 routes (102 active, 0 holddown, 0 hidden)
  + = Active Route, - = Last Active, * = Both
  2:1.1.1.2:100::100::00:00:0f:45:a2:8a/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  2:1.1.1.2:100::100::00:00:0f:45:a2:8c/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  2:1.1.1.2:100::100::00:00:0f:45:a2:8e/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  2:1.1.1.2:100::100::00:00:0f:45:a2:90/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  2:1.1.1.2:100::100::00:00:0f:45:a2:92/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  2:1.1.1.2:100::100::00:00:0f:45:a2:94/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  2:1.1.1.2:100::100::00:00:0f:45:a2:96/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  2:1.1.1.2:100::100::00:00:0f:45:a2:98/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  2:1.1.1.2:100::100::00:00:0f:45:a2:9a/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  2:1.1.1.2:100::100::00:00:0f:45:a2:9c/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904
  2:1.1.1.2:100::100::00:00:0f:45:a2:9e/304
                     *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                        AS path: I, validation-state: unverified
                      > to 192.169.100.11 via ge-1/1/0.0, Push 299904

 

So yeah, it basically goes on and on.

Incidentally, what we gain in making better use of the network’s resources, we lose in scalability, because you cannot get something for nothing. We all know that TCAM, forwarding-tables and BGP tables are limiting factors on even the largest routers. With evpn a very large amount of information is loaded into BGP (every single mac-address on the network), and because mac-addresses are totally non-contiguous (different blocks for different vendor NICs) they can’t be aggregated or summarised in any way.

If you had a data centre with 500k servers, you’d have 500k MAC/IP advertisements, which is a pretty large burden on the control-plane. In my own time I did some comparisons with tens of thousands of hosts on MX480 routers, with RE1800x4s and high-end MPCs, and the results were not pretty. On a very large network (more than 100k hosts) the control-plane learning was very laggy, and the REs tended to suffer from very high CPU during the learning process, or if a failover occurred.

The evolution onwards from this is PBB-EVPN (provider backbone bridging EVPN) which essentially allows large numbers of hosts to be represented by a single mac-address, which enables absolutely enormous scalability (millions of hosts per site), at the expense of some feature loss – PBB-EVPNs will be the topic for another blog, where I can hopefully use IXIA to show hundreds of thousands of hosts connected!

Hope you found this useful, (if anyone even read it! 😀 )