EVPN – Single-active redundancy

In the previous two posts I looked at the basics of EVPN, including the new BGP-based control plane, and then at the integration between the layer-2 and layer-3 worlds within EVPN. However, all the previous examples used basic single-site networks with no link or device redundancy. In this post I’m going to look at the first and simplest EVPN redundancy mode.

First – consider the new lab topology:

Capture4

The topology and configuration remain pretty much the same, except that MX-1 and MX-2 each connect back to EX4200-1 for VLAN 100 and VLAN 101, with the same IRB interfaces present on each MX router. Essentially, it’s a very basic site with 2 PEs for redundancy.

Let’s recap the EVPN configuration on MX-1. I’ve got exactly the same configuration loaded on MX-2 and MX-3, the only differences being the interface numbers and a unique RD on each PE.

MX-1: 

tim@MX5-1> show configuration routing-instances
EVPN-100 {
    instance-type virtual-switch;
    route-distinguisher 1.1.1.1:100;
    vrf-target target:100:100;
    protocols {
        evpn {
            extended-vlan-list 100-101;
            default-gateway do-not-advertise;
        }
    }
    bridge-domains {
        VL-100 {
            vlan-id 100;
            interface ge-1/1/5.100;
            routing-interface irb.100;
        }
        VL-101 {
            vlan-id 101;
            interface ge-1/1/5.101;
            routing-interface irb.101;
        }
    }
}
VPN-100 {
    instance-type vrf;
    interface irb.100;
    interface irb.101;
    route-distinguisher 100.100.100.1:100;
    vrf-target target:1:100;
    vrf-table-label;
}
tim@MX5-1>

Essentially, each PE is configured exactly the same, except for a unique RD and differences in the interface numbering.

In terms of providing active/standby redundancy at the main site, for layer-2 and layer-3 simultaneously, we would historically use VPLS combined with VRRP on the IRB interfaces to provide connectivity.

However this isn’t a perfect solution, for the following reasons:

  1. Unlike EVPN – VPLS needs unique IPv4 GW/MAC addresses at each site, inside the same VPN, so the only way to do active-standby redundancy is with VRRP.
  2. VRRP designs can become complex, ensuring that everything is tracked and monitored – partial failures can be hard to track and things can get over-complicated.
  3. Traffic tromboning can occur where VRRP is used.

Regarding point 3:

Imagine a scenario where each PE provides a layer-3 default gateway for each VLAN, where MX1 is active for VLAN 100 and MX2 is active for VLAN 101.

Capture5

It looks simple enough, but because of the reliance on VRRP, traffic tromboning can occur quite easily. For example, if host-1 in VLAN 100 wants to send traffic to host-2 in VLAN 101, connected to the same switch, the following things happen:

  1. The packet hits the VRRP active VLAN 100 IRB interface on MX1
  2. Because VLAN 101 is in standby mode on MX1 – it can’t be switched locally
  3. MX1 forwards the packet towards the MPLS network, because there’s a BGP route coming from MX2 (because it’s VRRP active for VLAN 101)
  4. Rather than being routed locally, the packet has to traverse the MPLS network, in order to route between VLANs:

Capture6

Things like this are a pain, and can be mitigated by design and awareness from the start. In my opinion, though, these sorts of scenarios are good examples of why EVPN was invented: VPLS never properly solved the basic problems that we get in day-to-day designs, and simple bread-and-butter tasks like routing between VLANs end up being a nightmare.

So how does EVPN do it differently?

First, let’s look at the configuration required to convert the lab topology into EVPN active-standby. It’s pretty simple:

MX-1: 

tim@MX5-1# run show configuration interfaces ge-1/1/5
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    single-active;
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
}
unit 101 {
    encapsulation vlan-bridge;
    vlan-id 101;
}
[edit]
tim@MX5-1#

MX-2:

tim@MX5-2# run show configuration interfaces ge-1/0/5
flexible-vlan-tagging;
encapsulation flexible-ethernet-services;
esi {
    00:11:22:33:44:55:66:77:88:99;
    single-active;
}
unit 100 {
    encapsulation vlan-bridge;
    vlan-id 100;
}
unit 101 {
    encapsulation vlan-bridge;
    vlan-id 101;
}
[edit]
tim@MX5-2#

In basic EVPN where sites are single-homed, the “ESI” (Ethernet segment identifier) remains at zero. Whenever you have single-active or active-active multi-homing, however, the ESI must be configured to a non-default value. Its purpose is to identify an Ethernet segment, and as such it identifies the entire “site” or “data-centre” to other PE routers on the network. It’s configured under the physical Ethernet interface and must be the same across the segment, in this case on the access-facing interfaces of MX1 and MX2.

Secondly, under the ESI configuration the PE interfaces are configured to operate in “single-active” mode, which should be self explanatory to most readers 🙂
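To make the ESI rules above a bit more concrete, here is a minimal Python sketch that classifies an ESI string and checks whether two PEs would be considered part of the same segment. The function names are hypothetical and purely for illustration. The treatment of the all-zero value as single-homed follows the text above, and the all-ones value being reserved is taken from RFC 7432.

```python
import re

# Ten colon-separated hex octets, as entered under the "esi" stanza.
ESI_PATTERN = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){9}$")


def classify_esi(esi: str) -> str:
    """Return 'single-homed', 'reserved' or 'multihomed' for an ESI string."""
    if not ESI_PATTERN.match(esi):
        raise ValueError(f"not a 10-octet colon-separated ESI: {esi!r}")
    octets = bytes(int(part, 16) for part in esi.split(":"))
    if octets == bytes(10):
        return "single-homed"   # all-zero ESI: the default, no multi-homing
    if octets == b"\xff" * 10:
        return "reserved"       # all-ones ESI is reserved by RFC 7432
    return "multihomed"


def same_segment(esi_a: str, esi_b: str) -> bool:
    """Two PE interfaces share a segment only if both carry the same non-default ESI."""
    return classify_esi(esi_a) == "multihomed" and esi_a.lower() == esi_b.lower()


if __name__ == "__main__":
    lab_esi = "00:11:22:33:44:55:66:77:88:99"   # value used on MX1 and MX2 in this lab
    print(classify_esi(lab_esi))
    print(same_segment(lab_esi, lab_esi))
```

A check like this could sit in a config-generation script to catch the case where the two PEs of a site end up with mismatched ESI values.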

How does this alter the EVPN control-plane? Let’s have a more detailed look at the EVPN instance on MX-1:

tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                 2       2
    Default gateway MAC addresses:       2       0
  Number of local interfaces: 2 (2 up)
    Interface name  ESI                            Mode             Status
    ge-1/1/5.100    00:11:22:33:44:55:66:77:88:99  single-active    Up
    ge-1/1/5.101    00:11:22:33:44:55:66:77:88:99  single-active    Up
  Number of IRB interfaces: 2 (2 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
    irb.101         101      Up      VPN-100
  Number of bridge domains: 2
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   302080
    101          1   1     Extended         Enabled   301872
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:              0
        MAC+IP address advertisement:           0
        Inclusive multicast:                    2
        Ethernet auto-discovery:                1
    10.10.10.3
      Received routes
        MAC address advertisement:              2
        MAC+IP address advertisement:           2
        Inclusive multicast:                    2
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ge-1/1/5.100
      Local interface: ge-1/1/5.100, Status: Up/Forwarding
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.2       301008     0               single-active
      Designated forwarder: 10.10.10.1
      Backup forwarder: 10.10.10.2
      Advertised MAC label: 301232
      Advertised aliasing label: 301232
      Advertised split horizon label: 0
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  VLAN ID: None
  Per-instance MAC route label: 299808
  MAC database status                Local  Remote
    Total MAC addresses:                 0       0
    Default gateway MAC addresses:       0       0
  Number of local interfaces: 0 (0 up)
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet auto-discovery:                0
        Ethernet Segment:                       1
  Number of ethernet segments: 0
tim@MX5-1>

A couple of things to note:

  • EVPN on MX1 is running in single-active mode for ge-1/1/5.100 and ge-1/1/5.101
  • The access-interface (ge-1/1/5) on MX1 is shown to be up/forwarding, making this the active PE
  • The designated forwarder is MX1 (10.10.10.1)
  • The backup designated forwarder is MX2 (10.10.10.2)

Because MX-1 is the active PE, let’s take a look at BGP on MX-3 to see what routes are advertised from the redundant site to a remote site:

(Note: I currently have 2Mbps of IXIA traffic flowing bidirectionally between each site, in each VLAN)

EVPN-100.evpn.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
1:1.1.1.1:100::112233445566778899::0/304
                   *[BGP/170] 04:17:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
1:10.10.10.1:0::112233445566778899::FFFF:FFFF/304
                   *[BGP/170] 04:17:27, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
1:10.10.10.2:0::112233445566778899::FFFF:FFFF/304
                   *[BGP/170] 13:50:18, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300848
2:1.1.1.1:100::100::00:00:2e:18:6d:e1/304
                   *[BGP/170] 04:17:23, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
2:1.1.1.1:100::101::00:00:2e:e6:77:95/304
                   *[BGP/170] 04:17:23, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
2:1.1.1.1:100::100::00:00:2e:18:6d:e1::192.168.100.10/304
                   *[BGP/170] 04:17:23, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
2:1.1.1.1:100::101::00:00:2e:e6:77:95::192.168.101.10/304
                   *[BGP/170] 04:17:23, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
3:1.1.1.1:100::100::10.10.10.1/304
                   *[BGP/170] 04:17:26, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
3:1.1.1.1:100::101::10.10.10.1/304
                   *[BGP/170] 13:50:26, localpref 100, from 10.10.10.1
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300912
3:1.1.1.2:100::100::10.10.10.2/304
                   *[BGP/170] 13:50:18, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300848
3:1.1.1.2:100::101::10.10.10.2/304
                   *[BGP/170] 13:50:18, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.15 via ge-1/1/0.0, Push 300848
tim@MX5-3>

We covered type-2 and type-3 routes in the previous labs, but here we have a new type-1 route being received on MX-3. What’s that all about? Let’s take a deeper look:

  1. tim@MX5-3> show route protocol bgp table EVPN-100.evpn.0 extensive
  2. EVPN-100.evpn.0: 17 destinations, 17 routes (17 active, 0 holddown, 0 hidden)
  3. 1:1.1.1.1:100::112233445566778899::0/304 (1 entry, 1 announced)
  4.         *BGP    Preference: 170/-101
  5.                 Route Distinguisher: 1.1.1.1:100
  6.                 Next hop type: Indirect
  7.                 Address: 0x2a7b880
  8.                 Next-hop reference count: 16
  9.                 Source: 10.10.10.1
  10.                 Protocol next hop: 10.10.10.1
  11.                 Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  12.                 State: <Secondary Active Int Ext>
  13.                 Local AS:   100 Peer AS:   100
  14.                 Age: 4:21:25    Metric2: 1
  15.                 Validation State: unverified
  16.                 Task: BGP_100.10.10.10.1+179
  17.                 Announcement bits (1): 0-EVPN-100-evpn
  18.                 AS path: I
  19.                 Communities: target:100:100
  20.                 Import Accepted
  21.                 Route Label: 301232
  22.                 Localpref: 100
  23.                 Router ID: 10.10.10.1
  24.                 Primary Routing Table bgp.evpn.0
  25.                 Indirect next hops: 1
  26.                         Protocol next hop: 10.10.10.1 Metric: 1
  27.                         Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  28.                         Indirect path forwarding next hops: 1
  29.                                 Next hop type: Router
  30.                                 Next hop: 192.169.100.15 via ge-1/1/0.0
  31.                                 Session Id: 0x0
  32.             10.10.10.1/32 Originating RIB: inet.3
  33.               Metric: 1           Node path count: 1
  34.               Forwarding nexthops: 1
  35.                 Nexthop: 192.169.100.15 via ge-1/1/0.0
  36. 1:10.10.10.1:0::112233445566778899::FFFF:FFFF/304 (1 entry, 1 announced)
  37.         *BGP    Preference: 170/-101
  38.                 Route Distinguisher: 10.10.10.1:0
  39.                 Next hop type: Indirect
  40.                 Address: 0x2a7b880
  41.                 Next-hop reference count: 16
  42.                 Source: 10.10.10.1
  43.                 Protocol next hop: 10.10.10.1
  44.                 Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  45.                 State: <Secondary Active Int Ext>
  46.                 Local AS:   100 Peer AS:   100
  47.                 Age: 4:21:25    Metric2: 1
  48.                 Validation State: unverified
  49.                 Task: BGP_100.10.10.10.1+179
  50.                 Announcement bits (1): 0-EVPN-100-evpn
  51.                 AS path: I
  52.                 Communities: target:100:100 esi-label:single-active (label 0)
  53.                 Import Accepted
  54.                 Localpref: 100
  55.                 Router ID: 10.10.10.1
  56.                 Primary Routing Table bgp.evpn.0
  57.                 Indirect next hops: 1
  58.                         Protocol next hop: 10.10.10.1 Metric: 1
  59.                         Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  60.                         Indirect path forwarding next hops: 1
  61.                                 Next hop type: Router
  62.                                 Next hop: 192.169.100.15 via ge-1/1/0.0
  63.                                 Session Id: 0x0
  64.             10.10.10.1/32 Originating RIB: inet.3
  65.               Metric: 1           Node path count: 1
  66.               Forwarding nexthops: 1
  67.                 Nexthop: 192.169.100.15 via ge-1/1/0.0
  68. 1:10.10.10.2:0::112233445566778899::FFFF:FFFF/304 (1 entry, 1 announced)
  69.         *BGP    Preference: 170/-101
  70.                 Route Distinguisher: 10.10.10.2:0
  71.                 Next hop type: Indirect
  72.                 Address: 0x2a7ae54
  73.                 Next-hop reference count: 6
  74.                 Source: 10.10.10.2
  75.                 Protocol next hop: 10.10.10.2
  76.                 Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  77.                 State: <Secondary Active Int Ext>
  78.                 Local AS:   100 Peer AS:   100
  79.                 Age: 13:54:16   Metric2: 1
  80.                 Validation State: unverified
  81.                 Task: BGP_100.10.10.10.2+179
  82.                 Announcement bits (1): 0-EVPN-100-evpn
  83.                 AS path: I
  84.                 Communities: target:100:100 esi-label:single-active (label 0)
  85.                 Import Accepted
  86.                 Localpref: 100
  87.                 Router ID: 10.10.10.2
  88.                 Primary Routing Table bgp.evpn.0
  89.                 Indirect next hops: 1
  90.                         Protocol next hop: 10.10.10.2 Metric: 1
  91.                         Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  92.                         Indirect path forwarding next hops: 1
  93.                                 Next hop type: Router
  94.                                 Next hop: 192.169.100.15 via ge-1/1/0.0
  95.                                 Session Id: 0x0
  96.             10.10.10.2/32 Originating RIB: inet.3
  97.               Metric: 1           Node path count: 1
  98.               Forwarding nexthops: 1
  99.                 Nexthop: 192.169.100.15 via ge-1/1/0.0

 

The Type-1 route is known as an AD or Auto-Discovery route, and it’s broken up into two distinct chunks:

  • A per-EVI AD route (line 3)
  • A per-ESI AD route (lines 36 and 68)

The first route (line 3) is known as a per-EVI route, and contains what’s known as the “aliasing label”. Technically this isn’t required in an active-standby situation, as it exists to ensure that traffic can be forwarded equally where you have multiple PEs in an active-active setup. It solves the problem of traffic polarisation, where a CE hashes traffic onto one egress link only, that behaviour is replicated in the control-plane, and the return traffic ends up polarised as well. The aliasing label gets around this simply because a remote PE treats it like a regular MAC/IP route, but more on that in the next blog 🙂

The other two routes (lines 36 and 68) are per-ESI AD routes. They contain the ESI of the site and are advertised from PE1 and PE2. Notice that the community is set as “target:100:100 esi-label:single-active” with a label value of 0. This is essentially telling MX3 that the ESI is running in single-active mode. If it were running in active-active mode, a non-zero MPLS label would be present in order to cater for split horizon and BUM traffic. In this case the setup is single-active, so there will only ever be one route at a time back to site 1.
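As a rough illustration of how the two kinds of type-1 route can be told apart, the following Python sketch splits the prefix strings exactly as they are displayed in the routing table output above. The parsing rules are my own assumption based on that display format (fields separated by “::”), not an official grammar, and the function names are hypothetical. The all-ones Ethernet tag (FFFF:FFFF) marking a per-ESI route matches the RFC 7432 definition.

```python
from dataclasses import dataclass

MAX_ET = "FFFF:FFFF"  # all-ones Ethernet tag, used by per-ESI AD routes


@dataclass
class EvpnPrefix:
    route_type: int
    rd: str
    fields: list


def parse_evpn_prefix(prefix: str) -> EvpnPrefix:
    """Split a displayed EVPN prefix into route type, RD and remaining fields."""
    body = prefix.split("/", 1)[0]        # drop the trailing /304
    head, *fields = body.split("::")      # fields are separated by "::"
    route_type, rd = head.split(":", 1)   # first token is the route type
    return EvpnPrefix(int(route_type), rd, fields)


def describe(prefix: str) -> str:
    """Give a one-line, human-readable description of a displayed prefix."""
    p = parse_evpn_prefix(prefix)
    if p.route_type == 1:
        esi, tag = p.fields
        kind = "per-ESI" if tag.upper() == MAX_ET else "per-EVI"
        return f"{kind} AD route for ESI {esi} from RD {p.rd}"
    names = {2: "MAC/IP advertisement", 3: "inclusive multicast", 4: "Ethernet segment"}
    return f"{names.get(p.route_type, 'unknown')} route from RD {p.rd}"


if __name__ == "__main__":
    for route in (
        "1:1.1.1.1:100::112233445566778899::0/304",
        "1:10.10.10.1:0::112233445566778899::FFFF:FFFF/304",
        "2:1.1.1.1:100::100::00:00:2e:18:6d:e1/304",
    ):
        print(describe(route))
```

Note how the per-EVI route carries the RD of the EVPN instance (1.1.1.1:100), while the per-ESI routes carry the RD of the default EVPN instance (10.10.10.1:0 and 10.10.10.2:0).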

These routes also speed up convergence. If you’re advertising 1000s of MAC/IP routes and you get a link failure, rather than the PE having to send BGP messages to withdraw all those routes one by one, it can simply withdraw the Ethernet AD routes, and remote PEs can stop using every MAC behind that segment at once.
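The convergence benefit can be sketched with a toy model in Python. This is not how any router implements it internally; the class and method names are hypothetical, and the MAC addresses are generated purely for the example. The idea it shows is that a remote PE only treats MACs behind a segment as usable while it still holds the AD route from the advertising PE, so a single withdrawal invalidates all of them.

```python
from collections import defaultdict


class RemotePeView:
    """Toy model of what a remote PE knows about one multi-homed site."""

    def __init__(self):
        self.ad_routes = set()               # (esi, advertising PE) pairs
        self.mac_routes = defaultdict(set)   # (esi, advertising PE) -> MACs

    def receive_ad(self, esi, pe):
        self.ad_routes.add((esi, pe))

    def withdraw_ad(self, esi, pe):
        self.ad_routes.discard((esi, pe))

    def receive_mac(self, esi, pe, mac):
        self.mac_routes[(esi, pe)].add(mac)

    def usable_macs(self, esi, pe):
        # MACs are only usable while the AD route for the segment is present.
        if (esi, pe) not in self.ad_routes:
            return set()
        return set(self.mac_routes.get((esi, pe), set()))


ESI = "00:11:22:33:44:55:66:77:88:99"
view = RemotePeView()
view.receive_ad(ESI, "10.10.10.1")
for i in range(1000):
    # Hypothetical MAC addresses, one type-2 route each.
    view.receive_mac(ESI, "10.10.10.1", f"00:00:2e:00:{i >> 8:02x}:{i & 0xff:02x}")

before = len(view.usable_macs(ESI, "10.10.10.1"))
view.withdraw_ad(ESI, "10.10.10.1")          # one withdrawal message
after = len(view.usable_macs(ESI, "10.10.10.1"))
print(before, after)
```

In this model the 1000 individual MAC routes can still be withdrawn afterwards at leisure; forwarding has already stopped using them.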

Next, let’s take a look at what’s going on at the main site, and see what MX1 is advertising to MX2:

tim@MX5-1> show route advertising-protocol bgp 10.10.10.2 evpn-esi-value 00:11:22:33:44:55:66:77:88:99 detail
VPN-100.inet.0: 8 destinations, 14 routes (8 active, 0 holddown, 0 hidden)
EVPN-100.evpn.0: 16 destinations, 16 routes (16 active, 0 holddown, 0 hidden)
* 1:1.1.1.1:100::112233445566778899::0/304 (1 entry, 1 announced)
 BGP group iBGP-PEs type Internal
     Route Distinguisher: 1.1.1.1:100
     Route Label: 301232
     Nexthop: Self
     Flags: Nexthop Change
     Localpref: 100
     AS path: [100] I
     Communities: target:100:100
__default_evpn__.evpn.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
* 1:10.10.10.1:0::112233445566778899::FFFF:FFFF/304 (1 entry, 1 announced)
 BGP group iBGP-PEs type Internal
     Route Distinguisher: 10.10.10.1:0
     Nexthop: Self
     Flags: Nexthop Change
     Localpref: 100
     AS path: [100] I
     Communities: target:100:100 esi-label:single-active (label 0)
* 4:10.10.10.1:0::112233445566778899:10.10.10.1/304 (1 entry, 1 announced)
 BGP group iBGP-PEs type Internal
     Route Distinguisher: 10.10.10.1:0
     Nexthop: Self
     Flags: Nexthop Change
     Localpref: 100
     AS path: [100] I
     Communities: es-import-target:22-33-44-55-66-77

You can see that there’s a new “type-4” route being advertised. This is known as an “Ethernet Segment (ES) route” and is advertised by PE routers which are configured with non-zero ESI values. It carries a special extended community (ES-Import target) which a PE router will only import if it has the same ESI configured. This means that two PE routers remote from one another know that they’re both connected to the same Ethernet segment. All other PE routers, whether they have the default ESI or a different non-zero ESI, filter these advertisements.
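The import behaviour described above can be sketched in a few lines of Python. This is only a model of the filtering decision; the function name and the dictionary layout are hypothetical, and the target value is copied as an opaque string from the output above rather than derived from the ESI.

```python
def accepting_pes(es_import_target, local_targets_by_pe, sender):
    """Return the PEs (other than the sender) that would import an ES route."""
    return sorted(
        pe
        for pe, local_targets in local_targets_by_pe.items()
        if pe != sender and es_import_target in local_targets
    )


# ES-Import target as displayed for the lab's Ethernet segment.
SEGMENT_TARGET = "22-33-44-55-66-77"

# MX1 and MX2 are attached to the segment; MX3 is single-homed elsewhere.
local_targets_by_pe = {
    "MX1": {SEGMENT_TARGET},
    "MX2": {SEGMENT_TARGET},
    "MX3": set(),
}

print(accepting_pes(SEGMENT_TARGET, local_targets_by_pe, sender="MX1"))
```

This lines up with the earlier output, where only the neighbour 10.10.10.2 shows a received “Ethernet Segment” route in the __default_evpn__ instance on MX-1.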

So a quick recap: we’ve looked at the new route types, the control-plane and the configuration. The next step is to see how well it works. First, a quick reminder of the diagram:

Capture7

I’ve created a flow of IXIA traffic bidirectionally between the top site and the bottom site. If I go to MX-1 and look at the MPLS-facing interface, we should see the traffic:


Physical interface: ge-1/1/0, Enabled, Physical link is Up
Interface index: 147, SNMP ifIndex: 525
Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 1000mbps, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
Flow control: Enabled, Auto-negotiation: Enabled, Remote fault: Online
Pad to minimum frame size: Disabled
Device flags : Present Running
Interface flags: SNMP-Traps Internal: 0x0
Link flags : None
CoS queues : 8 supported, 8 maximum usable queues
Current address: a8:d0:e5:5b:7c:90, Hardware address: a8:d0:e5:5b:7c:90
Last flapped : 2016-06-10 20:15:19 UTC (5d 19:13 ago)
Input rate : 5599000 bps (500 pps)
Output rate : 5583408 bps (499 pps)

So it’s clear that traffic is being forwarded by MX-1. Because I’m sending packets at an exact rate of 1000pps, we should be able to measure how quickly fail-over occurs by counting the number of lost packets. For example, at 1000pps, if I lose 50 packets, that yields a fail-over time of 50ms.
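The arithmetic behind that measurement is simple enough to write down as a small Python helper. The function name is hypothetical; it just converts a lost-frame count at a known, constant send rate into an outage duration.

```python
def failover_ms(lost_frames: int, rate_pps: float) -> float:
    """Convert frames lost at a constant send rate into an outage time in ms."""
    if rate_pps <= 0:
        raise ValueError("rate_pps must be positive")
    # Each frame represents 1/rate_pps seconds of traffic.
    return lost_frames * 1000.0 / rate_pps


if __name__ == "__main__":
    print(failover_ms(50, 1000))   # the example from the text: 50 frames at 1000pps
```

The estimate is only as good as the assumption that the generator sends at a steady rate and that every lost frame was lost because of the failure being tested.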

First an easy failure – I’ll shut down ge-0/0/0 on EX4200-1, this will put the interface down/down on MX-1 and we’ll measure how long it takes to recover:


imtech@ex4200-1# set interfaces ge-0/0/0 disable
{master:0}[edit]
imtech@ex4200-1# commit
configuration check succeeds
commit complete
{master:0}[edit]
imtech@ex4200-1#

Let’s look at how much traffic was lost:

Fail1

Frames delta = 1077, so just a fraction longer than 1 second to fail over, which isn’t THAT bad. We might be able to improve it later.

Let’s check the EVPN instance to see how things have changed:

On MX1:

tim@MX5-1> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.1:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                 0       3
    Default gateway MAC addresses:       0       0
  Number of local interfaces: 2 (0 up)
    Interface name  ESI                            Mode             Status
    ge-1/1/5.100    00:11:22:33:44:55:66:77:88:99  single-active    Down
    ge-1/1/5.101    00:11:22:33:44:55:66:77:88:99  single-active    Down
  Number of IRB interfaces: 2 (0 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Down    VPN-100
    irb.101         101      Down    VPN-100
  Number of bridge domains: 2
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   0     Extended         Enabled
    101          1   0     Extended         Enabled
  Number of neighbors: 2
    10.10.10.2
      Received routes
        MAC address advertisement:              1
        MAC+IP address advertisement:           1
        Inclusive multicast:                    2
        Ethernet auto-discovery:                2
    10.10.10.3
      Received routes
        MAC address advertisement:              2
        MAC+IP address advertisement:           2
        Inclusive multicast:                    2
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by NH 1048582
      Local interface: ge-1/1/5.100, Status: Down
      Number of remote PEs connected: 1
        Remote PE        MAC label  Aliasing label  Mode
        10.10.10.2       301008     301008          single-active
      Designated forwarder: 10.10.10.2
      Advertised MAC label: 301232
      Advertised aliasing label: 301232
      Advertised split horizon label: 0
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.1:0
  VLAN ID: None
  Per-instance MAC route label: 299808
  MAC database status                Local  Remote
    Total MAC addresses:                 0       0
    Default gateway MAC addresses:       0       0
  Number of local interfaces: 0 (0 up)
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 0
  Number of neighbors: 1
    10.10.10.2
      Received routes
        Ethernet auto-discovery:                0
        Ethernet Segment:                       1
  Number of ethernet segments: 0
tim@MX5-1>

So it’s pretty clear that things have gone down, and MX2 is the new active PE router. Let’s check it out:

tim@MX5-2> show evpn instance extensive
Instance: EVPN-100
  Route Distinguisher: 1.1.1.2:100
  Per-instance MAC route label: 299776
  MAC database status                Local  Remote
    Total MAC addresses:                 1       2
    Default gateway MAC addresses:       2       0
  Number of local interfaces: 2 (2 up)
    Interface name  ESI                            Mode             Status
    ge-1/0/5.100    00:11:22:33:44:55:66:77:88:99  single-active    Up
    ge-1/0/5.101    00:11:22:33:44:55:66:77:88:99  single-active    Up
  Number of IRB interfaces: 2 (2 up)
    Interface name  VLAN ID  Status  L3 context
    irb.100         100      Up      VPN-100
    irb.101         101      Up      VPN-100
  Number of bridge domains: 2
    VLAN ID  Intfs / up    Mode             MAC sync  IM route label
    100          1   1     Extended         Enabled   302272
    101          1   1     Extended         Enabled   302224
  Number of neighbors: 1
    10.10.10.3
      Received routes
        MAC address advertisement:              2
        MAC+IP address advertisement:           2
        Inclusive multicast:                    2
        Ethernet auto-discovery:                0
  Number of ethernet segments: 1
    ESI: 00:11:22:33:44:55:66:77:88:99
      Status: Resolved by IFL ge-1/0/5.100
      Local interface: ge-1/0/5.100, Status: Up/Forwarding
      Designated forwarder: 10.10.10.2
      Advertised MAC label: 301008
      Advertised aliasing label: 301008
      Advertised split horizon label: 0
Instance: __default_evpn__
  Route Distinguisher: 10.10.10.2:0
  VLAN ID: None
  Per-instance MAC route label: 299808
  MAC database status                Local  Remote
    Total MAC addresses:                 0       0
    Default gateway MAC addresses:       0       0
  Number of local interfaces: 0 (0 up)
  Number of IRB interfaces: 0 (0 up)
  Number of bridge domains: 0
  Number of neighbors: 0
  Number of ethernet segments: 0
tim@MX5-2>

If we look at the MPLS facing interface on MX2, we should see that all traffic is being sent and received via the MPLS network:


tim@MX5-2> show interfaces ge-1/1/0
Physical interface: ge-1/1/0, Enabled, Physical link is Up
Interface index: 147, SNMP ifIndex: 526
Link-level type: Ethernet, MTU: 1514, MRU: 1522, Speed: 1000mbps, BPDU Error: None, MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
Flow control: Enabled, Auto-negotiation: Enabled, Remote fault: Online
Pad to minimum frame size: Disabled
Device flags : Present Running
Interface flags: SNMP-Traps Internal: 0x0
Link flags : None
CoS queues : 8 supported, 8 maximum usable queues
Current address: a8:d0:e5:5b:75:90, Hardware address: a8:d0:e5:5b:75:90
Last flapped : 2016-06-10 20:08:17 UTC (5d 19:42 ago)
Input rate : 5605824 bps (502 pps)
Output rate : 5584392 bps (501 pps)

 

The solution itself is a lot more elegant than traditional FHRPs (first-hop redundancy protocols) such as VRRP or HSRP.

  • Because MX1 and MX2 automatically learn about each other via the MPLS network and the type-4 Ethernet-Segment route, and NOT via the LAN (like HSRP), any problem with the MPLS side connected to the active router causes it to transition to standby, and the solution fails over.

If I fail the MPLS interface on the “P” router connected to MX1, we get failover in less than 1 second:


Axians@m10i-1# set interfaces ge-0/0/2 disable
[edit]
Axians@m10i-1# commit
commit complete

Then check the packet loss in IXIA:

Fail2

The solution recovers from the failure in 912ms.

This is pretty great, not least because it works reliably, but also because most of this functionality is built directly into the protocol. I haven’t had to do any crazy tracking of routes, and I haven’t needed to go anywhere near IP SLA or any of that horror that is a massive pain when designing this sort of thing. With EVPN, things are pretty simple and work reliably.

It’s not perfect however. HSRP and VRRP form an adjacency over the LAN via multicast, whereas EVPN doesn’t do this: all information about other PEs is sent and received via BGP. If you have a complex LAN environment and a failure leaves the PEs isolated, you don’t get a traditional split-brain scenario like you would with HSRP or VRRP. Instead, the solution simply doesn’t fail over at all. The basic triggers for fail-over are that the physical interface goes down, the MPLS side goes down, or the entire PE goes down.

This can easily be demonstrated by breaking the logical interface on EX4200-1 whilst leaving the physical interface up/up:


imtech@ex4200-1# set interfaces ge-0/0/0.0 disable
{master:0}[edit]
imtech@ex4200-1# commit
configuration check succeeds
commit complete

The whole solution breaks, and stays broken forever:

Fail3

So you still need to be careful with the design and the different way in which EVPN operates. Incidentally, you can use things like Ethernet OAM to get around this problem.

Just for laughs, let’s apply a basic Ethernet OAM config to MX1, MX2 and the EX4200:

OAM template (shown just on MX-1):

oam {
    ethernet {
        connectivity-fault-management {
            action-profile bring-down {
                event {
                    interface-status-tlv down;
                    adjacency-loss;
                }
                action {
                    interface-down;
                }
            }
            maintenance-domain "IEEE level 4" {
                level 4;
                maintenance-association PE1 {
                    short-name-format character-string;
                    continuity-check {
                        interval 100ms;
                        interface-status-tlv;
                    }
                    mep 1 {
                        interface ge-1/1/5.100;
                        direction down;
                        auto-discovery;
                        remote-mep 2 {
                            action-profile bring-down;
                        }
                    }
                }
            }
        }
    }
}
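To make the timers in that template a bit more concrete, here is a small Python sketch of the decision the action-profile is making. The 100ms interval comes from the config; the loss threshold of 3 missed CCMs is an assumption (it isn’t set in the template), and the function names are purely illustrative.

```python
def ccm_adjacency_lost(last_ccm_ms, now_ms, interval_ms=100, loss_threshold=3):
    """True once no CCM has been seen for loss_threshold whole intervals.

    loss_threshold=3 is an assumed value, it is not part of the template above.
    """
    return (now_ms - last_ccm_ms) >= interval_ms * loss_threshold


def action_for(adjacency_lost, remote_interface_status_down):
    """Mirror the bring-down profile: either event results in interface-down."""
    if adjacency_lost or remote_interface_status_down:
        return "interface-down"
    return "none"


# Last CCM from the remote MEP arrived at t=0, we check again at t=300ms.
print(action_for(ccm_adjacency_lost(0, 300), False))
```

With these assumed numbers the PE would notice a silent logical failure after roughly 300ms, well before anyone has to step in by hand.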

 

Just for clarity: the OAM configuration ensures that if there’s a problem with connectivity between MX1 – EX4200-1 or MX2 – EX4200-1 while the physical interfaces remain up/up, OAM will detect the connectivity loss, automatically bring the interface’s line protocol down, and force EVPN to fail over.

Let’s repeat the exact same test again, with the OAM configuration applied to the PEs and the switch:


imtech@ex4200-1# set interfaces ge-0/0/0.0 disable
{master:0}[edit]
imtech@ex4200-1# commit
configuration check succeeds
commit complete

and check the packet-loss with IXIA:

Fail4

Not bad! 612 packets lost equals failure and convergence in 624ms, which is a lot better than the original 1077ms when failing the physical interface, and a hell of a lot better than being down forever when the network experiences a non-direct failure (a software or logical failure).
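For anyone wanting to sanity-check this kind of figure by hand, the arithmetic is just lost packets divided by send rate. The sketch below is plain Python; the 1000 pps rate is an assumption for illustration only, since the actual test rate in packets per second isn’t stated here.

```python
def outage_ms(lost_packets, rate_pps):
    """Outage duration implied by a packet-loss count at a constant send rate."""
    return lost_packets * 1000.0 / rate_pps


def implied_rate_pps(lost_packets, outage_ms_value):
    """Send rate implied by a loss count and a reported outage duration."""
    return lost_packets * 1000.0 / outage_ms_value


# Assumed rate of 1000 pps, purely for illustration.
print(outage_ms(612, 1000))
# Rate implied by the two figures quoted above.
print(round(implied_rate_pps(612, 624), 1))
```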

Anyway I hope you’ve found this useful, there’s a few bits I’ve skipped over – but I’ll cover those in more detail when I do all-active redundancy in the next blog 🙂

 

EVPN Inter-VLAN routing + mobility

So in the last blog I essentially looked at one of the most basic aspects of EVPN – a multi-site layer-2 network with nothing fancy going on, with traffic forwarding occurring between multiple sites in the same VLAN. The fact of the matter is that there was nothing going on there that you couldn’t do with a traditional VPLS configuration, however the general idea was to demonstrate the basics and take a look at the basic control-plane first.

In this update we’ll be looking at some of the more exclusive and highly useful aspects of EVPNs which make it a very attractive technology for things such as data-centre interconnect, there are a few things which are possible with EVPN which cannot be done with VPLS.

Consider the revised topology:

Capture

It’s the same topology from the first blog post, however I’ve simply added an additional VLAN (VLAN 101) to ge-0/0/22 of each EX4200 LAN switch, and an additional IXIA host.

For this post we’re going to look at a rather cool way of performing inter-VLAN forwarding between hosts in VLAN100 and VLAN101. Not that I want to spend time teaching people how to suck eggs, but generally in a simple network with multiple VLANs you have 2 common ways of performing inter-VLAN forwarding:

  • Use a good ole’ fashioned router on a stick topology
  • Bolt some additional layer-3 functionality onto your layer-2 switch

As everyone knows, the latter method is by far the most common – the vast majority of switches support layer-3 routing functionality, usually in the form of IRB/BVI/SVI depending on the vendor in question.

In a service provider network, where we generally have a number of PE routers acting together as a large distributed switch, providing layer-2 connectivity – the old fashioned way of doing this would be with VPLS. In order to enable inter-VLAN forwarding we’d add a BVI interface to the VPLS instance, this enables a PE to do standard layer-2 switching and route between VLANs at layer-3 – which is very important for data-centre interconnect applications.

EVPN has a number of enhancements which make it more suitable for modern day data-centre interconnect designs, especially where things such as VM mobility are concerned. A company or organisation with a traditional MPLS based network, might require the ability to move hosts around between data centres seamlessly, without causing any real downtime.

Let’s take a look at the basic interface configuration and routing-instance configuration:

  1. interfaces {
  2.     irb {
  3.         unit 100 {
  4.             family inet {
  5.                 address 192.168.100.1/24;
  6.             }
  7.             mac 00:00:19:21:68:10;
  8.         }
  9.         unit 101 {
  10.             family inet {
  11.                 address 192.168.101.1/24;
  12.             }
  13.             mac 00:00:19:21:68:11;
  14.         }
  15.     }
  16. routing-instances {
  17. EVPN-100 {
  18.     instance-type virtual-switch;
  19.     route-distinguisher 1.1.1.1:100;
  20.     vrf-target target:100:100;
  21.     protocols {
  22.         evpn {
  23.             extended-vlan-list 100-101;
  24.             default-gateway do-not-advertise;
  25.         }
  26.     }
  27.     bridge-domains {
  28.         VL-100 {
  29.             vlan-id 100;
  30.             interface ge-1/1/5.100;
  31.             routing-interface irb.100;
  32.         }
  33.         VL-101 {
  34.             vlan-id 101;
  35.             interface ge-1/1/5.101;
  36.             routing-interface irb.101;
  37.         }
  38.     }
  39. }
  40. VPN-100 {
  41.     instance-type vrf;
  42.     interface irb.100;
  43.     interface irb.101;
  44.     route-distinguisher 100.100.100.1:100;
  45.     vrf-target target:1:100;
  46.     vrf-table-label;
  47. }

 

First things first – lines 1 – 15 take care of the IRB interfaces for VLAN 100 and VLAN 101; more of that shortly.

Lines 16 – 39 form the configuration for the EVPN routing instance, you’ll note a couple of differences from the first EVPN blog post:

  • The extended-vlan-list has been increased to include both VLANs within the routing instance
  • A new command “default-gateway do-not-advertise” is present under the EVPN protocol configuration
  • An additional bridge-domain has been configured for VLAN 101 under the routing-instance, along with the IRB interface for each VLAN
  • What looks like a totally standard L3VPN has been configured (lines 40 – 47), albeit with different RTs and RDs – but it does contain the IRB interfaces from the EVPN routing instance.

The “default-gateway” statement controls whether a PE tags the MAC/IP advertisement for its IRB interface with the default-gateway extended community. If your PE routers have different IRB MAC addresses and IPv4 addresses, each PE advertises its gateway this way, which tells the other PEs in the EVPN that this MAC/IP is a default gateway somewhere. However, in this example and in best practice, it’s simpler and easier to configure the same IRB MAC/IP on all your PEs, and so the option used here is “do-not-advertise” as we don’t need it at this time.

But perhaps the coolest feature and one of the biggest advantages EVPN has over VPLS is the way the IRB interfaces are configured, in this topology the 3x PE routers, (MX5-1, MX5-2 and MX5-3) all have an identical IRB interface configuration for VLAN 100 and VLAN 101, each PE has the exact same IP address, and MAC address…:

MX5-1:

imtech@MX5-1# run show configuration interfaces irb
unit 100 {
    family inet {
        address 192.168.100.1/24;
    }
    mac 00:00:19:21:68:10;
}
unit 101 {
    family inet {
        address 192.168.101.1/24;
    }
    mac 00:00:19:21:68:11;
}

MX5-2

imtech@MX5-2# run show configuration interfaces irb
unit 100 {
    family inet {
        address 192.168.100.1/24;
    }
    mac 00:00:19:21:68:10;
}
unit 101 {
    family inet {
        address 192.168.101.1/24;
    }
    mac 00:00:19:21:68:11;
}

MX5-3

imtech@MX5-3# run show configuration interfaces irb
unit 100 {
    family inet {
        address 192.168.100.1/24;
    }
    mac 00:00:19:21:68:10;
}
unit 101 {
    family inet {
        address 192.168.101.1/24;
    }
    mac 00:00:19:21:68:11;
}

The first time you see it, you think:

15omtr

But it’s true! All the PEs in the network have the exact same IP address and MAC address on their IRB interfaces. Why would we do that, and how does it work?

Consider the following scenario:

Capture2

Imagine a basic data-centre environment running things like VMware or OpenStack – basically we can provision servers and move them around all over the place using things like vMotion. Picture the active server in the left-hand portion of the data-centre, with business as usual from a network perspective: ARP is learnt between the host and the left-hand PE, and the default-gateway is 192.168.100.1.

Now, imagine that the DC admin flicks the switch, and the active VM on the left is immediately torn down and spun up inside the right-hand DC (which could be many miles away). You’ll notice that the interface MAC address and the default-gateway are the same. This gives us the ability to move hosts around our data centres without having to worry about different default-gateways, or incurring too much downtime whilst we wait for things to re-ARP. Because everything is identical at each DC site, there’s no problem moving things from one site to the next.

Capture3

You cannot do this with VPLS as the implementation demands that you use unique MAC-addresses, which moves us on deeper into the technology – how does EVPN achieve this breakthrough?

It essentially boils down to the way that EVPN has been engineered to integrate more closely with the layer-3 world. The software has a number of hooks between EVPN and L3VPN that work in a much more elegant fashion than VPLS. For example, the first blog post showed how MAC addresses are learnt and inserted into the BGP control-plane; in this example, for inter-VLAN forwarding, a few extra things are happening:

  • Firstly we have the BGP MAC advertisement from the L2 world
  • Secondly, we get a new MAC/IP advertisement containing the host’s MAC and IP address, learnt via the PE’s IRB interface – this is linked to the PE’s ARP table
  • Thirdly, we get a totally standard /32 IPv4 L3VPN route for the host’s /32 address, this is advertised to all remote PEs
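To picture those three advertisements for a single learnt host, here is a small Python sketch that builds the route keys in the same display format as the outputs in this post. The class and function names are made up for illustration; the RD, VLAN, MAC and IP values are the ones that show up for the host behind MX2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnedHost:
    rd: str    # EVPN route distinguisher of the PE that learnt the host
    vlan: int
    mac: str
    ip: str


def advertisements(host):
    """Return the three routes a PE originates for one locally learnt host."""
    mac_only = f"2:{host.rd}::{host.vlan}::{host.mac}"
    return {
        "evpn_mac": mac_only,                   # layer-2 only
        "evpn_mac_ip": f"{mac_only}::{host.ip}",  # the ARP binding
        "l3vpn_host": f"{host.ip}/32",            # plain L3VPN host route
    }


host = LearnedHost("1.1.1.2:100", 101, "00:00:2e:e6:77:97", "192.168.101.11")
for name, route in advertisements(host).items():
    print(f"{name:12} {route}")
```

The first two live in the EVPN table, the third lives in the L3VPN table, which is why routers that know nothing about EVPN can still follow the host around.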

Let’s recap a more basic version of the lab diagram and see what the control-plane looks like when we send some traffic between hosts in different VLANs:

Capture4

Now let’s look at the BGP control-plane on MX-1 and see what’s going on:

  1. imtech@MX5-1> show route protocol bgp table EVPN-100.evpn.0
  2. EVPN-100.evpn.0: 8 destinations, 8 routes (8 active, 0 holddown, 0 hidden)
  3. + = Active Route, - = Last Active, * = Both
  4. 2:1.1.1.2:100::101::00:00:2e:e6:77:97/304
  5.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  6.                       AS path: I, validation-state: unverified
  7.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  8. 2:1.1.1.2:100::101::00:00:2e:e6:77:97::192.168.101.11/304  
  9.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  10.                       AS path: I, validation-state: unverified
  11.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  12. 3:1.1.1.2:100::100::10.10.10.2/304
  13.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  14.                       AS path: I, validation-state: unverified
  15.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  16. 3:1.1.1.2:100::101::10.10.10.2/304
  17.                    *[BGP/170] 00:04:38, localpref 100, from 10.10.10.2
  18.                       AS path: I, validation-state: unverified
  19.                     > to 192.169.100.11 via ge-1/1/0.0, Push 299968
  20. imtech@MX5-1> show route protocol bgp table VPN-100.inet.0
  21. VPN-100.inet.0: 6 destinations, 9 routes (6 active, 0 holddown, 0 hidden)
  22. + = Active Route, - = Last Active, * = Both
  23. 192.168.100.0/24    [BGP/170] 00:04:44, localpref 100, from 10.10.10.2
  24.                       AS path: I, validation-state: unverified
  25.                     > to 192.169.100.11 via ge-1/1/0.0, Push 16, Push 299968(top)
  26. 192.168.101.0/24    [BGP/170] 00:04:44, localpref 100, from 10.10.10.2
  27.                       AS path: I, validation-state: unverified
  28.                     > to 192.169.100.11 via ge-1/1/0.0, Push 16, Push 299968(top)
  29. 192.168.101.11/32   [BGP/170] 00:04:44, localpref 100, from 10.10.10.2
  30.                       AS path: I, validation-state: unverified
  31.                     > to 192.169.100.11 via ge-1/1/0.0, Push 16, Push 299968(top)

You’ll immediately notice that compared to the vanilla L2VPN implementation, there’s a lot more going on, so let’s break it down:

  • Line 4 is the standard MAC advertisement route, the same sort of advertisement we went over with the vanilla L2-only version of EVPN – this is for layer-2 connectivity only.
  • Line 8 is an EVPN MAC/IP route, which is basically the ARP mapping learnt directly from MX2 – this route makes it possible for all PEs in the network to synchronise their ARP tables with each other!
  • Line 29 is a standard L3VPN route, containing the /32 host behind MX2

Line 8 essentially means that as soon as you move a host from one place to another, the moment a packet lands on the ingress PE interface the PE generates a new MAC/IP ARP route, and all other PEs synchronise accordingly. Meanwhile the host that’s moved doesn’t need to do anything other than keep sending packets to the exact same gateway IP/MAC as it did before it was moved. Essentially we have layer-2 and layer-3 working together in harmony.

Line 29 is a standard L3VPN /32 host route for the host behind MX2. This means that if you have EVPN running across numerous data-centres in various places, and this is connected to a wider layer-3 network – such as traditional residential/business PE routers – these other routers don’t need to have any awareness of EVPN whatsoever. So long as they can participate in regular L3VPN, packets will always be delivered to the right place when things get moved around, because these routes are dynamically generated and advertised accordingly. This is a massive advantage over VPLS, as you don’t need to configure it in every corner of the network for it to be useful; it simply lives on your DC edge, and the rest is left to vanilla L3VPN.

There are a few more enhancements due at some point soon, including quite an interesting one which is the “MAC mobility extended-community” which is essentially a safeguard to prevent a few rather nasty situations from arising:

  • A layer-2 loop, where two PEs constantly advertise the same MAC addresses – which could overwhelm the BGP control-plane
  • A situation where a pair of hosts each in a different DC are mis-configured with the same MAC address – if they’re both sending data then each PE will be generating route advertisements,

The MAC mobility extended community defined in RFC 7432 carries a sequence number that increases each time a MAC address moves. If the same MAC moves a certain number of times within a specific period, it’s assumed that something is broken, and the routers should perform some sort of damping and alerting procedure to prevent network meltdown.
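As a minimal sketch of that safeguard, the Python below counts moves per MAC inside a sliding window and flags the address once it moves too often. The 5 moves in 180 seconds are the default values RFC 7432 suggests; real implementations may use different values or make them configurable, and the class name is just illustrative.

```python
from collections import deque


class MacMoveTracker:
    """Track moves of one MAC address and flag a likely duplicate or loop."""

    def __init__(self, max_moves=5, window_s=180.0):
        self.max_moves = max_moves
        self.window_s = window_s
        self.sequence = 0       # MAC mobility sequence number
        self._moves = deque()   # timestamps of recent moves

    def record_move(self, now_s):
        """Register one move; return True if the MAC now looks duplicated."""
        self.sequence += 1
        self._moves.append(now_s)
        # Forget moves that have aged out of the detection window.
        while self._moves and now_s - self._moves[0] > self.window_s:
            self._moves.popleft()
        return len(self._moves) >= self.max_moves


tracker = MacMoveTracker()
flags = [tracker.record_move(t) for t in (0, 10, 20, 30, 40)]
print(flags, tracker.sequence)
```

A VM that moves once an hour never trips this, whereas two hosts fighting over the same MAC trip it within seconds.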

I hope you found this useful! In the next one I’ll be looking at some of the redundant designs, including single-active and all-active multi-homing.

 

 

 

EVPN – the basics

So I decided to take a deep dive into eVPN. I’ll mostly be looking into VLAN-aware bundling, as per RFC 7432, mostly because I think this fits more closely with the types of deployments most customers are used to – good old IRB interfaces and bridge-tables!

As everyone knows, VPLS has been available for many years now and it’s pretty widely deployed, most of the customers I see have some flavour of VPLS configured on their networks and use it to good effect – so why eVPN? What’s the point in introducing a new technology if the current one appears to work fine?

The reality is that multipoint layer-2 VPNs (VPLS) were never quite as polished as layer-3 VPNs. When layer-3 VPNs were first invented they became, and in many cases still are, the “go to” technology for layer-3 connectivity across MPLS networks, and the technology itself hasn’t really changed that much for well over a decade. The same cannot be said for VPLS, over the years we’ve had many different iterations of the technology:

  • Vanilla VPLS
    • LDP signalled
    • BGP signalled
  • H-VPLS (hierarchical VPLS)
    • BGP based
    • LDP based
  • VPLS auto-discovery

Along with the different types of VPLS, the technology itself has been repeatedly modified with hacks and patches, in order to get around some annoyingly simple problems, for example:

  1. VPLS auto-discovery is only supported under BGP signalling – you can’t do it if you’re using LDP signalled VPLS,
  2. H-VPLS – in order to get around the fully meshed pseudowire problem of vanilla VPLS, H-VPLS introduced a hierarchy to cut down on the number of pseudowires in large networks; unfortunately the design often ends up being cumbersome and complicated.
  3. mac-address learning – VPLS has no layer-2 control plane, it learns mac-addresses directly from the data-plane like a standard switch – which is fine if it’s taking place inside a single device, but across a large distributed network with many thousands of mac-addresses, a loss of any attachment circuit can result in stale forwarding state and slow convergence/recovery
  4. all-active CE-Multihoming – simply can’t do it in VPLS, single-homed only, which is a major pain for large-scale modern data centres with lots and lots of layer-2 connectivity
  5. Layer-3 integration – With VPLS it’s typical to use a BVI or IRB interface as the layer-3 gateway to a VLAN, however there’s no real integration between the layer-2 and layer-3 world, you still need VRRP for first hop redundancy – which comes with all the pain you’d expect (traffic black holding, complex tracking requirements, interface timers, etc)

The topology I’m going to use for this is shown below:

Capture

A few basic points about the network:

  • The 3x “P” routers in the core of the network are Juniper M10i series, running nothing other than ISIS/LDP/MPLS
  • The 3x “PE” routers, are Juniper MX5 – each with 14.1.R6.4 loaded on, connectivity is via a 20x1G MIC
  • The 3x “EX4200” switches are doing nothing other than trunking VLAN 100 towards each MX-5
  • Each IXIA port has a single host on VLAN 100

The first lab will look at eVPN with basic MPLS transport – this is essentially a replacement for vanilla VPLS, we have three sites each with a single switch – all in Vlan 100 on a common /24 subnet, nothing fancy going on, no layer-3 routing or bridging anywhere, this is all strictly layer-2 for now.

The first thing to note about eVPN is that the core of it is built around a BGP control-plane: no LDP signalling or anything else, it’s BGP only, which is great because we all love BGP. The first thing is to enable the evpn address family (AFI 25 for L2VPN and the new SAFI 70 for evpn).

(Output taken from MX5-1, but identical on all 3 PEs, <except for IP addressing obviously>)

bgp {
    group iBGP-PEs {
        type internal;
        local-address 10.10.10.1;
        family evpn {
            signaling;
        }
        neighbor 10.10.10.2;
        neighbor 10.10.10.3;
    }
}

 

This essentially enables the evpn signalling which is essential, unlike VPLS there’s no manual provisioning of pseudowires, because there are no pseudowires, just like L3 VPNs everything is handled via BGP and uses the same route-distinguishers and route-targets that we’ve all come to love.
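On the wire, enabling the address family boils down to each PE announcing the AFI 25 / SAFI 70 pair as a multiprotocol capability when the BGP session opens. The Python sketch below only builds that one capability (code 1, as used by the multiprotocol extensions), it is not a full OPEN message, and the function name is just for illustration.

```python
import struct

AFI_L2VPN = 25
SAFI_EVPN = 70
CAP_MULTIPROTOCOL = 1  # capability code for the multiprotocol extensions


def mp_capability(afi, safi):
    """Encode one multiprotocol capability: code, length, AFI, reserved, SAFI."""
    value = struct.pack("!HBB", afi, 0, safi)
    return struct.pack("!BB", CAP_MULTIPROTOCOL, len(value)) + value


encoded = mp_capability(AFI_L2VPN, SAFI_EVPN)
print(encoded.hex())
```

If both ends of the session advertise this pair, the evpn tables shown in the outputs that follow can start exchanging routes.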

The configuration for this lab is pretty much identical across all three PEs but we’ll look at MX5-1 for this example, first the LAN facing interface:

ge-1/1/5 {
    flexible-vlan-tagging;
    encapsulation flexible-ethernet-services;
    unit 100 {
        encapsulation vlan-bridge;
        vlan-id 100;
    }
}

 

Followed by the evpn routing-instance:

  1. routing-instances {
  2.     EVPN-100 {
  3.         instance-type virtual-switch;
  4.         route-distinguisher 1.1.1.1:100;
  5.         vrf-target target:100:100;
  6.         protocols {
  7.             evpn {
  8.                 extended-vlan-list 100;
  9.             }
  10.         }
  11.         bridge-domains {
  12.             VL-100 {
  13.                 vlan-id 100;
  14.                 interface ge-1/1/5.100;
  15.             }
  16.         }
  17.     }
  18. }

 

A few things to note about the routing-instance:

  • Lines 4 and 5 mark the “RD” and “RT”, which are essentially the same as in a standard L3VPN setup
  • The routing-instance is of type “virtual-switch” and the bridge-domain sits inside it
  • This is essentially configured the same as a VPLS virtual-switch, except with a different protocol.

Before we send any traffic or try to get any connectivity, let’s take a look at the basic control-plane and exactly what sort of things BGP is getting up to, whilst things are simple.

greg@MX5-1# run show bgp summary
Groups: 1 Peers: 2 Down peers: 0
Table          Tot Paths  Act Paths Suppressed    History Damp State    Pending
bgp.evpn.0
                       2          2          0          0          0          0
Peer                     AS      InPkt     OutPkt    OutQ   Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped…
10.10.10.2              100        231        231       0       1     1:40:54 Establ
  bgp.evpn.0: 1/1/1/0
  EVPN-100.evpn.0: 1/1/1/0
  __default_evpn__.evpn.0: 0/0/0/0
10.10.10.3              100        229        231       0       1     1:40:40 Establ
  bgp.evpn.0: 1/1/1/0
  EVPN-100.evpn.0: 1/1/1/0
  __default_evpn__.evpn.0: 0/0/0/0
[edit]
greg@MX5-1#

 

You’ll notice that before we’ve sent any traffic or done anything, that we have two types of table under each established BGP peer:

  • “bgp.evpn.0” for the core-facing BGP adjacency, (the same as regular L3VPN)
  • “EVPN-100.evpn.0” for the routing-instance table, (again the same as regular L3VPN)

You’ll also notice that we’re receiving 1 route from each PE, for each table, if we investigate further and take a look:

greg@MX5-1# run show route table bgp.evpn.0
bgp.evpn.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
3:1.1.1.2:100::100::10.10.10.2/304
                   *[BGP/170] 00:10:42, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
3:1.1.1.3:100::100::10.10.10.3/304
                   *[BGP/170] 00:10:40, localpref 100, from 10.10.10.3
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299936
[edit]
greg@MX5-1# run show route table EVPN-100.evpn.0
EVPN-100.evpn.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
3:1.1.1.1:100::100::10.10.10.1/304
                   *[EVPN/170] 00:10:54
                      Indirect
3:1.1.1.2:100::100::10.10.10.2/304
                   *[BGP/170] 00:10:49, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
3:1.1.1.3:100::100::10.10.10.3/304
                   *[BGP/170] 00:10:47, localpref 100, from 10.10.10.3
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299936

 

Because everyone reading this has eyes like hawks 😉  you’ll immediately notice the strange looking /304 routes coming from each adjacent PE, let’s examine the first one:

3:1.1.1.2:100::100::10.10.10.2/304  

The format is essentially: 3 : <RD> :: <VLAN-ID> :: <ROUTER-ID> /304

It also contains the “ROUTER-ID-LENGTH”, which is obviously /32, however Juniper hides this from the output. It should be obvious to most people what all these values are, except for the “3”: what does that mean?

It’s important to note that evpn defines a set of route types, as shown below:

  • Type 1 – Ethernet auto-discovery route
  • Type 2 – MAC/IP advertisement route
  • Type 3 – Inclusive multicast Ethernet tag route
  • Type 4 – Ethernet segment (ES) route
  • Type 5 – IP prefix route
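If you want to pick these keys apart in a script, the format and the type list above can be sketched in a few lines of Python. This only handles the type-2 and type-3 keys exactly as they’re displayed in this post; the function name and the dictionary field names are my own, not anything Junos defines.

```python
ROUTE_TYPES = {
    1: "Ethernet auto-discovery",
    2: "MAC/IP advertisement",
    3: "Inclusive multicast Ethernet tag",
    4: "Ethernet segment",
    5: "IP prefix",
}


def parse_evpn_key(text):
    """Parse a type-2 or type-3 route key as displayed in this post."""
    prefix, _, length = text.strip().partition("/")
    route_type, rest = prefix.split(":", 1)
    fields = rest.split("::")  # RD, VLAN, then type-specific fields
    parsed = {
        "type": int(route_type),
        "name": ROUTE_TYPES[int(route_type)],
        "rd": fields[0],
        "vlan": int(fields[1]),
        "prefix_length": int(length),
    }
    if parsed["type"] == 3:
        parsed["originator"] = fields[2]
    elif parsed["type"] == 2:
        parsed["mac"] = fields[2]
        if len(fields) > 3:
            parsed["ip"] = fields[3]  # present on MAC/IP bindings only
    else:
        raise ValueError("only type-2 and type-3 keys are handled in this sketch")
    return parsed


print(parse_evpn_key("3:1.1.1.2:100::100::10.10.10.2/304"))
```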

Type 3 routes are for signalling the inclusive tunnel. With VLAN-aware evpn each PE generates a VLAN-specific inclusive tunnel which is used for BUM (broadcast, unknown unicast and multicast) traffic. Basically, it’s used to send BUM traffic to all PEs that have sites in the same VLAN. Let’s look at it in even more detail:

 

  1. greg@MX5-1# run show route table bgp.evpn.0 extensive
  2. bgp.evpn.0: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
  3. 3:1.1.1.2:100::100::10.10.10.2/304 (1 entry, 0 announced)
  4.         *BGP    Preference: 170/-101
  5.                 Route Distinguisher: 1.1.1.2:100
  6. PMSI: Flags 0x0: Label 300512: Type INGRESS-REPLICATION 10.10.10.2
  7.                 Next hop type: Indirect
  8.                 Address: 0x2fa4c34
  9.                 Next-hop reference count: 2
  10.                 Source: 10.10.10.2
  11.                 Protocol next hop: 10.10.10.2
  12.                 Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  13.                 State: <Active Int Ext>
  14.                 Local AS:   100 Peer AS:   100
  15.                 Age: 30:23  Metric2: 1
  16.                 Validation State: unverified
  17.                 Task: BGP_100.10.10.10.2+56692
  18.                 AS path: I
  19.                 Communities: target:100:100
  20.                 Import Accepted
  21.                 Localpref: 100
  22.                 Router ID: 10.10.10.2
  23.                 Secondary Tables: EVPN-100.evpn.0
  24.                 Indirect next hops: 1
  25.                         Protocol next hop: 10.10.10.2 Metric: 1
  26.                         Indirect next hop: 0x2 no-forward INH Session ID: 0x0
  27.                         Indirect path forwarding next hops: 1
  28.                                 Next hop type: Router
  29.                                 Next hop: 192.169.100.11 via ge-1/1/0.0
  30.                                 Session Id: 0x0
  31.             10.10.10.2/32 Originating RIB: inet.3
  32.               Metric: 1           Node path count: 1
  33.               Forwarding nexthops: 1
  34.                 Nexthop: 192.169.100.11 via ge-1/1/0.0

 

Line 6 shows the PMSI (provider multicast service interface) tunnel attribute, with tunnel type “ingress-replication”. One important thing to note: label 300512 is a downstream allocated label, the same as what’s commonly used in P2MP LSPs for multicast services. Essentially, in this case MX5-1 uses the remotely learnt service label to send BUM traffic to the remote PE that advertised it; or, looking at it the other way round, MX5-2 expects to receive BUM traffic from the other PEs tagged with IR label 300512.
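Ingress replication is easy to picture in code: the ingress PE makes one copy of the BUM frame per remote PE and tags each copy with the label that PE advertised in its type-3 route. In this Python sketch, 300512 is the label seen from 10.10.10.2 in the output; the label for 10.10.10.3 is a made-up placeholder, as it isn’t shown in the post.

```python
def replicate_bum(frame, ir_labels):
    """Return one (remote PE, service label, frame) copy per remote PE."""
    return [(pe, label, frame) for pe, label in sorted(ir_labels.items())]


ir_labels = {
    "10.10.10.2": 300512,  # from the PMSI attribute in the output
    "10.10.10.3": 300528,  # hypothetical, MX5-3's IR label isn't shown
}

broadcast_frame = b"\xff" * 6 + b"rest-of-frame"
copies = replicate_bum(broadcast_frame, ir_labels)
for pe, label, _ in copies:
    print(f"send copy to {pe} with IR label {label}")
```

The obvious cost is that the ingress PE sends N copies for N remote PEs, which is the trade-off against building P2MP LSPs in the core.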

Moving on – for people new to evpn, one of the coolest concepts is the way in which BGP is used to advertise mac-addresses… rather than plain old IP subnets – this is fantastic because we now have an intelligent control-plane maintained across the whole network in a scalable and stable fashion, rather than having to rely on less reliable data-plane learning.

For the first basic test, we’ll send bi-directional traffic between the host connected to EX4200-1 on MX5-1 and the host connected to EX4200-2 on MX5-2.

Let’s recap the diagram and spin up some hosts:

Capture2

We’ll start with a single host at each site, and send traffic both ways, 1Mbps each way for a total of 2Mbps, (the hosts are in the same /24 VLAN100 – 192.168.100.1 and 192.168.100.2) 

Capture3

Traffic is being forwarded end to end, let’s check the routing and see how the control-plane has changed:

 

greg@MX5-1# run show route table bgp.evpn.0
bgp.evpn.0: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
2:1.1.1.3:100::100::00:00:0e:52:42:29/304
                   *[BGP/170] 00:04:04, localpref 100, from 10.10.10.3
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299936
3:1.1.1.2:100::100::10.10.10.2/304
                   *[BGP/170] 00:53:37, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
3:1.1.1.3:100::100::10.10.10.3/304
                   *[BGP/170] 00:53:35, localpref 100, from 10.10.10.3
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299936
[edit]
greg@MX5-1# run show route table EVPN-100.evpn.0
EVPN-100.evpn.0: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
2:1.1.1.1:100::100::00:00:0e:52:23:91/304
                   *[EVPN/170] 00:04:13
                      Indirect
2:1.1.1.3:100::100::00:00:0e:52:42:29/304
                   *[BGP/170] 00:04:13, localpref 100, from 10.10.10.3
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299936
3:1.1.1.1:100::100::10.10.10.1/304
                   *[EVPN/170] 00:53:51
                      Indirect
3:1.1.1.2:100::100::10.10.10.2/304
                   *[BGP/170] 00:53:46, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
3:1.1.1.3:100::100::10.10.10.3/304
                   *[BGP/170] 00:53:44, localpref 100, from 10.10.10.3
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299936
[edit]
greg@MX5-1#

 

The type-3 routes are still present as before for the inclusive tunnels, but you’ll notice the addition of the new type-2 MAC/IP route, this is essentially a BGP NLRI containing a mac-address instead of an IP subnet – pretty cool huh?

The indirect route is the one learnt locally from the connected LAN; the one known via BGP/170 is the one from the remote PE. Packets destined for that mac-address have label 299936 pushed on them to reach the remote PE, and are forwarded directly out of the MPLS facing core interface, like any regular MPLS packet.

Let’s take a more detailed look at a type-2 route:

2:1.1.1.3:100::100::00:00:0e:52:42:29/304 (1 entry, 1 announced)
        *BGP    Preference: 170/-101
                Route Distinguisher: 1.1.1.3:100
                Next hop type: Indirect
                Address: 0x2705954
                Next-hop reference count: 4
                Source: 10.10.10.3
                Protocol next hop: 10.10.10.3
                Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                State: <Secondary Active Int Ext>
                Local AS:   100 Peer AS:   100
                Age: 14:20  Metric2: 1
                Validation State: unverified
                Task: BGP_100.10.10.10.3+64545
                Announcement bits (1): 0-EVPN-100-evpn
                AS path: I
                Communities: target:100:100
                Import Accepted
                Route Label: 300048
                ESI: 00:00:00:00:00:00:00:00:00:00
                Localpref: 100
                Router ID: 10.10.10.3
                Primary Routing Table bgp.evpn.0
                Indirect next hops: 1
                        Protocol next hop: 10.10.10.3 Metric: 1
                        Indirect next hop: 0x2 no-forward INH Session ID: 0x0
                        Indirect path forwarding next hops: 1
                                Next hop type: Router
                                Next hop: 192.169.100.11 via ge-1/1/0.0
                                Session Id: 0x0
            10.10.10.3/32 Originating RIB: inet.3
              Metric: 1           Node path count: 1
              Forwarding nexthops: 1
                Nexthop: 192.169.100.11 via ge-1/1/0.0

 

A basic recap on MPLS forwarding: with the above route, the remote PE (10.10.10.3) is notifying all other PEs in the network that if they receive a frame on an interface inside “EVPN-100” on VLAN 100 for destination MAC address 00:00:0e:52:42:29, they should impose MPLS label 300048 and send it its way.
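As a rough illustration of what ends up on the wire, the sketch below encodes a two-label stack in Python. It assumes that the “Push 299936” seen in the earlier route output is the outer transport label towards 10.10.10.3 and that the Route Label 300048 sits beneath it as the bottom-of-stack label; that layering is my reading of the output rather than something captured off the wire. The TTL of 64 and the function names are arbitrary choices.

```python
import struct


def encode_label_stack(labels, ttl=64, tc=0):
    """Encode MPLS label stack entries: 20-bit label, 3-bit TC, 1-bit S, 8-bit TTL."""
    out = b""
    for i, label in enumerate(labels):
        if not 0 <= label < 2 ** 20:
            raise ValueError(f"label {label} does not fit in 20 bits")
        bottom = 1 if i == len(labels) - 1 else 0   # S bit is set on the last entry only
        entry = (label << 12) | (tc << 9) | (bottom << 8) | ttl
        out += struct.pack("!I", entry)
    return out


def decode_label_stack(data):
    """Return a list of (label, bottom_of_stack) tuples."""
    return [(entry >> 12, bool((entry >> 8) & 1))
            for (entry,) in struct.iter_unpack("!I", data)]


stack = encode_label_stack([299936, 300048])   # outer label first
print(stack.hex())
print(decode_label_stack(stack))   # [(299936, False), (300048, True)]
```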

Another new aspect of EVPN can be seen under the “ESI” field. “ESI” stands for “Ethernet segment identifier”, and it’s essentially a way of labelling individual Ethernet segments. It’s only used for multihomed designs; for a single-homed site like this one it should remain at the default of all zeros (more on ESIs in the next blog).

To demonstrate the control-plane learning and MAC/IP advertisement mechanism more effectively, let’s spin up all 3 sites with 50 hosts per site, then send a full mesh of traffic (150 streams in total) and see what the control-plane looks like.

Quick recap of the diagram showing all 3 sites, with 50 hosts per site:

Capture4

Plenty of juicy MAC/IP routes!

 

greg@MX5-1# run show route summary
Autonomous system number: 100
Router ID: 10.10.10.1
inet.0: 14 destinations, 14 routes (14 active, 0 holddown, 0 hidden)
              Direct:      3 routes,      3 active
               Local:      2 routes,      2 active
              Static:      1 routes,      1 active
               IS-IS:      7 routes,      7 active
                 LDP:      1 routes,      1 active
inet.3: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)
                 LDP:      5 routes,      5 active
iso.0: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
              Direct:      1 routes,      1 active
mpls.0: 18 destinations, 18 routes (18 active, 0 holddown, 0 hidden)
                MPLS:      6 routes,      6 active
                 LDP:      6 routes,      6 active
                EVPN:      6 routes,      6 active
bgp.evpn.0: 102 destinations, 102 routes (102 active, 0 holddown, 0 hidden)
                 BGP:    102 routes,    102 active

EVPN-100.evpn.0: 153 destinations, 153 routes (153 active, 0 holddown, 0 hidden)
                 BGP:    102 routes,    102 active
                EVPN:     51 routes,     51 active
[edit]
greg@MX5-1#

 

Lots of MAC/IP routes 🙂

A quick look at the BGP table:

 

bgp.evpn.0: 102 destinations, 102 routes (102 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
2:1.1.1.2:100::100::00:00:0f:45:a2:8a/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
2:1.1.1.2:100::100::00:00:0f:45:a2:8c/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
2:1.1.1.2:100::100::00:00:0f:45:a2:8e/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
2:1.1.1.2:100::100::00:00:0f:45:a2:90/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
2:1.1.1.2:100::100::00:00:0f:45:a2:92/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
2:1.1.1.2:100::100::00:00:0f:45:a2:94/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
2:1.1.1.2:100::100::00:00:0f:45:a2:96/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
2:1.1.1.2:100::100::00:00:0f:45:a2:98/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
2:1.1.1.2:100::100::00:00:0f:45:a2:9a/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
2:1.1.1.2:100::100::00:00:0f:45:a2:9c/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904
2:1.1.1.2:100::100::00:00:0f:45:a2:9e/304
                   *[BGP/170] 00:07:38, localpref 100, from 10.10.10.2
                      AS path: I, validation-state: unverified
                    > to 192.169.100.11 via ge-1/1/0.0, Push 299904

 

So yeah, it basically goes on and on.

Incidentally, what we gain in using more of the network’s resources we lose in scalability, because you cannot get something for nothing. We all know that TCAM, forwarding-tables and BGP tables are limiting factors on even the largest routers. With EVPN a very large amount of information is loaded into BGP (every single MAC address on the network), and because MAC addresses are totally non-contiguous (different blocks for different vendor NICs) they can’t be aggregated or summarised in any way.

If you had a data centre with 500k servers, you’d have 500k MAC/IP advertisements, which is a pretty large burden on the control-plane. In my own time I did some comparisons with tens of thousands of hosts on MX480 routers with RE1800x4s and high-end MPCs, and the results were not pretty. On a very large network (more than 100k hosts) the control-plane learning was very laggy, and the REs tended to suffer from very high CPU during the learning process, or if a failover occurred.
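The numbers in the route summary earlier can be reproduced with some simple arithmetic, which also shows how linearly the table grows. This is a back-of-the-envelope sketch in Python, assuming one PE per site, one type-2 MAC/IP route per host and one type-3 inclusive multicast route per PE, which is what this lab output suggests; the function name is made up.

```python
def expected_evpn_routes(sites, hosts_per_site, im_routes_per_pe=1):
    """Rough route counts seen in the EVPN instance table on one PE."""
    local = hosts_per_site + im_routes_per_pe                    # learnt locally (EVPN)
    remote = (sites - 1) * (hosts_per_site + im_routes_per_pe)   # learnt via BGP
    return {"local_evpn": local, "remote_bgp": remote, "total": local + remote}


print(expected_evpn_routes(3, 50))
# {'local_evpn': 51, 'remote_bgp': 102, 'total': 153}
```

That matches the 51 EVPN, 102 BGP and 153 total routes shown for EVPN-100.evpn.0. Because none of those routes can be summarised, the total simply tracks the number of hosts on the network.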

The evolution onwards from this is PBB-EVPN (provider backbone bridging EVPN) which essentially allows large numbers of hosts to be represented by a single mac-address, which enables absolutely enormous scalability (millions of hosts per site), at the expense of some feature loss – PBB-EVPNs will be the topic for another blog, where I can hopefully use IXIA to show hundreds of thousands of hosts connected!

Hope you found this useful, (if anyone even read it! 😀 )

 

iBGP for PE-CE

I’ve worked on many large-scale MPLS VPN solutions, some with as many as 20k-30k managed CPEs, and as everybody knows, where you run BGP with this sort of setup it’s almost always eBGP. Either a single AS is used across all sites with AS-override, or each site gets a different AS number, to get around the age-old eBGP loop prevention mechanisms which tend to get in the way when we use L3VPNs.
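For anyone who hasn’t bumped into that loop prevention problem, here is a tiny Python model of it. The AS numbers (65000 for the provider, 65001 for the customer) and the function names are purely hypothetical, and it only models the AS path check, nothing else about BGP.

```python
def ce_accepts(as_path, local_as):
    """eBGP loop prevention: reject any path that already contains our own AS."""
    return local_as not in as_path


def pe_advertise(as_path, provider_as, ce_as, as_override=False):
    """Prepend the provider AS towards the CE, optionally rewriting the CE's own AS."""
    if as_override:
        as_path = [provider_as if asn == ce_as else asn for asn in as_path]
    return [provider_as] + as_path


PROVIDER_AS, CUSTOMER_AS = 65000, 65001   # hypothetical AS numbers
site_a_route = [CUSTOMER_AS]              # route originated at site A

plain = pe_advertise(site_a_route, PROVIDER_AS, CUSTOMER_AS)
overridden = pe_advertise(site_a_route, PROVIDER_AS, CUSTOMER_AS, as_override=True)

print(plain, ce_accepts(plain, CUSTOMER_AS))             # [65000, 65001] False
print(overridden, ce_accepts(overridden, CUSTOMER_AS))   # [65000, 65000] True
```

With a single customer AS at every site, the remote CE throws the route away unless the PE rewrites the path, which is exactly the sort of provider fingerprint that iBGP for PE-CE avoids.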

Recently I came across RFC 6368 which describes how iBGP can actually be used as a PE-CE protocol, in order to make the provider network more transparent from a BGP perspective. Usually there’s no problem running eBGP and 99% of networks seem to operate perfectly fine with it, however if the customer CE routers have a large BGP element behind them, the provider’s AS numbers and interactions with the BGP updates can in some cases cause problems.

Recently Cisco added support for running iBGP for PE-CE with the addition of a new command placed under the VRF: “neighbor <x.x.x.x> internal-vpn-client”. In JUNOS the command is “independent-domain”, which goes under the routing-options for the routing-instance.

For this configuration, consider the following basic topology:

Untitled-2

CE-1 and CE-2 are both Cisco routers. MX-1 and MX-2 are Juniper MXs running inet-vpn unicast between loopbacks, with IS-IS L2 and LDP configured in the simplest way possible, and all devices inside BGP AS 100.

The routing instances on MX-1 and MX-2 are identical, apart from the peering IP address and the route-distinguishers.

routing-instances {
    as100 {
        instance-type vrf;
        interface ge-0/0/4.0;
        route-distinguisher 100:100;
        vrf-target target:100:100;
        routing-options {
            autonomous-system 100 independent-domain;
        }
        protocols {
            bgp {
                group iBGP-CE {
                    type internal;
                    neighbor 10.10.11.0 {
                        family inet {
                            unicast;
                        }
                    }
                }
            }
        }
    }
}

Notice the “independent-domain” statement under the autonomous-system configuration in the routing-instance on each MX. This is essentially what allows the device to run iBGP for PE-CE.

The Cisco routers are running a simple configuration, again they’re both identical except for the peering address and LAN interface range:

router bgp 100
bgp log-neighbor-changes
network 10.10.100.0 mask 255.255.255.0
neighbor 10.10.11.1 remote-as 100

BGP comes up as expected on both devices, and the LAN range is reachable from each CE:

CE-1#sh ip route
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
a - application route
+ - replicated route, % - next hop override
Gateway of last resort is not set
      10.0.0.0/8 is variably subnetted, 6 subnets, 3 masks
B        10.10.10.0/31 [200/0] via 10.10.11.1, 00:06:08
C        10.10.11.0/31 is directly connected, Ethernet0/0
L        10.10.11.0/32 is directly connected, Ethernet0/0
C        10.10.100.0/24 is directly connected, Ethernet0/1
L        10.10.100.1/32 is directly connected, Ethernet0/1
B        10.10.200.0/24 [200/0] via 10.10.11.1, 00:06:08
CE-1#ping 10.10.200.1 source 10.10.100.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.200.1, timeout is 2 seconds:
Packet sent with a source address of 10.10.100.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 14/21/32 ms
CE-1#

From the perspective of CE-1 and CE-2, and also MX-1 and MX-2, all sessions are iBGP sessions, so no AS numbers have been added to the AS path. As far as CE-1 and CE-2 are concerned, it’s business as usual for iBGP. It must be noted, however, that MX-1 and MX-2 are changing the next-hops when BGP routes are loaded into the RIB, but this is normal VRF L3VPN behaviour anyway.

One of the interesting aspects of this particular feature, apart from the ability to run iBGP for PE-CE sessions, is that it uses a new attribute (optional transitive 128) to effectively hide specific BGP customer attributes and tunnel them through the provider core between PE routers. This means that internal BGP settings set by customers such as local-preference can be tunnelled through the provider core without it interfering with the provider’s best-path selection process. This system is analogous to running OSPF as the PE-CE protocol, where route-types are encoded into VPNv4 and transported across the core, as the OSPF domain-tag.
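A very simplified way to picture the attribute tunnelling is the toy Python model below. It is not the wire encoding and not how either vendor implements it; the dictionaries, function names and the fallback branch are all assumptions made for illustration. The only ideas taken from the text are that the customer’s attributes are tucked away inside an attribute set on the ingress PE and unwrapped again on the egress PE.

```python
PROVIDER_DEFAULT_LOCAL_PREF = 100   # what the provider core uses in this lab


def ingress_pe_export(customer_attrs, customer_as):
    """Wrap the customer's attributes in an ATTR_SET and apply provider defaults."""
    return {
        "local_pref": PROVIDER_DEFAULT_LOCAL_PREF,
        "attr_set": {"origin_as": customer_as, "attrs": dict(customer_attrs)},
    }


def egress_pe_to_ce(vpn_route, vrf_as):
    """Restore the customer's attributes when the ATTR_SET origin AS matches the VRF AS."""
    attr_set = vpn_route.get("attr_set")
    if attr_set and attr_set["origin_as"] == vrf_as:
        return dict(attr_set["attrs"])
    # Simplification: with no matching ATTR_SET, hand over the core's own attributes.
    return {k: v for k, v in vpn_route.items() if k != "attr_set"}


from_ce1 = {"local_pref": 250, "med": 0}
across_core = ingress_pe_export(from_ce1, customer_as=100)
print(across_core["local_pref"])                 # 100, the provider's value
print(egress_pe_to_ce(across_core, vrf_as=100))  # {'local_pref': 250, 'med': 0}
```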

To demonstrate this, if we modify the local-preference on CE-1 so that outgoing routes are set with a local-preference of 250, MX-1 should hide the local-pref value in its L3VPN advertisement to MX-2. It’s only on the route’s subsequent advertisement from MX-2 to CE-2 that the local-preference value is unmasked.

Set the local-preference to 250 on CE1:

router bgp 100
bgp log-neighbor-changes
network 10.10.100.0 mask 255.255.255.0
neighbor 10.10.11.1 remote-as 100
neighbor 10.10.11.1 route-map lpref out
!
route-map lpref permit 10
set local-preference 250
!

We also see the route (10.10.100.0/24), complete with a local-preference of 250 received intact on CE-2:

CE-2#sh ip bgp
BGP table version is 31, local router ID is 10.10.10.0
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network          Next Hop            Metric LocPrf Weight Path
*>i 10.10.11.0/31    10.10.10.1                    100      0 i
*>i 10.10.100.0/24   10.10.10.1               0    250      0 i
*>  10.10.200.0/24   0.0.0.0                  0         32768 i
CE-2#

When we take a look at the RIB-IN on MX-1, we clearly see the route coming in with a local-pref of 250:

root@PE1> show route 10.10.100.0/24

as100.inet.0: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.100.0/24     *[BGP/170] 00:21:48, MED 0, localpref 250
AS path: I, validation-state: unverified
> to 10.10.11.0 via ge-0/0/4.0

root@PE1>

However, the big difference is in the L3VPN advertisement from MX-1 to MX-2. The local-preference of 250 is tunnelled inside a new attribute set, and the route has the local-preference of whatever the provider’s core is using (in this case the default of 100):

Output taken on the 10.10.100.0/24 route, received on MX-2 from MX-1:

 

100:100:10.10.100.0/24 (1 entry, 0 announced)
*BGP Preference: 170/-101
Route Distinguisher: 100:100
Next hop type: Indirect
Address: 0x940db4c
Next-hop reference count: 8
Source: 4.4.4.4
Next hop type: Router, Next hop index: 531
Next hop: 192.169.1.3 via ge-0/0/2.0, selected
Label operation: Push 299840
Label TTL action: prop-ttl
Load balance label: Label 299840: None;
Session Id: 0x3
Protocol next hop: 4.4.4.4
Label operation: Push 299840
Label TTL action: prop-ttl
Load balance label: Label 299840: None;
Indirect next hop: 0x972c220 1048576 INH Session ID: 0x10
State:
Local AS: 100 Peer AS: 100
Age: 20:11 Metric2: 1
Validation State: unverified
Task: BGP_100.4.4.4.4+55077
AS path: I
Communities: target:100:100
Import Accepted
VPN Label: 299840
Localpref: 100
Router ID: 4.4.4.4
Secondary Tables: as100.inet.0
Indirect next hops: 1
Protocol next hop: 4.4.4.4 Metric: 1
Label operation: Push 299840
Label TTL action: prop-ttl
Load balance label: Label 299840: None;
Indirect next hop: 0x972c220 1048576 INH Session ID: 0x10
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 192.169.1.3 via ge-0/0/2.0
Session Id: 0x3
4.4.4.4/32 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 1
Nexthop: 192.169.1.3 via ge-0/0/2.0

What’s interesting is what happens if we disable the “independent-domain” feature on MX-1 and re-check the output from MX-2:

100:100:10.10.100.0/24 (1 entry, 0 announced)
*BGP Preference: 170/-251
Route Distinguisher: 100:100
Next hop type: Indirect
Address: 0x940fcc4
Next-hop reference count: 4
Source: 4.4.4.4
Next hop type: Router, Next hop index: 531
Next hop: 192.169.1.3 via ge-0/0/2.0, selected
Label operation: Push 299856
Label TTL action: prop-ttl
Load balance label: Label 299856: None;
Session Id: 0x3
Protocol next hop: 4.4.4.4
Label operation: Push 299856
Label TTL action: prop-ttl
Load balance label: Label 299856: None;
Indirect next hop: 0x972c220 1048576 INH Session ID: 0x11
State:
Local AS: 100 Peer AS: 100
Age: 25 Metric: 0 Metric2: 1
Validation State: unverified
Task: BGP_100.4.4.4.4+54013
AS path: I
Communities: target:100:100
Import Accepted
VPN Label: 299856
Localpref: 250
Router ID: 4.4.4.4
Secondary Tables: as100.inet.0
Indirect next hops: 1
Protocol next hop: 4.4.4.4 Metric: 1
Label operation: Push 299856
Label TTL action: prop-ttl
Load balance label: Label 299856: None;
Indirect next hop: 0x972c220 1048576 INH Session ID: 0x11
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 192.169.1.3 via ge-0/0/2.0
Session Id: 0x3
4.4.4.4/32 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 1
Nexthop: 192.169.1.3 via ge-0/0/2.0

Essentially this breaks the feature, and the customer’s iBGP attributes are no longer tunnelled: the route now arrives on MX-2 with a local-preference of 250. If the provider has any sort of common policy for modifying the best-path selection process using local-preference for L3VPNs, it would obviously conflict with the customer’s setting.

The RFC states that customer-specific iBGP attributes are encoded by the receiving PE router using the “ATTR_SET” attribute, and these are applied using the attribute-flags, the same as vanilla L3VPNs.

I haven’t used this feature in any designs yet. Whilst the RFC was finished in 2011, Cisco only released support for it last year, although it has been present in Juniper for a while. It could be very effective for simplifying some requirements where customers have significant BGP setups behind PE routers.

Happy Easter!

Subscriber management on Juniper MX with FreeRadius

Quite often on my travels I encounter technologies I worked on a long time ago and then bump into again later in life; in this case it’s terminating broadband subscribers. Many years ago I worked on a large-scale Cisco platform terminating DSL business broadband users on Cisco 7200s over ATM. Recently I’ve been involved in a couple of jobs where FTTC users are being terminated on Juniper MX480 routers using double-tagging and PPPoE. This first post looks into how to set up a Juniper MX router from scratch and terminate PPPoE subscribers authenticated by RADIUS (in this case FreeRadius).

The topology:

topology

 

Equipment used for this is as follows:

  • MX-1 is a Juniper MX-5 router, acting as the BRAS or BNG
  • MX-2 is also an MX-5, acting as a generic PE with simulated external connectivity
  • EX-4500 is self explanatory, and is basically doing QinQ towards the BNG
  • RADIUS is an Ubuntu server running FreeRadius (explained in more detail later)
  • For Broadband subscribers, I’m lucky to have access to an IXIA XG12 tester

Before we get to the BNG side of things, let’s take a look at the access network (EX-4500). Essentially, this switch is doing several things:

  • Each individual subscriber is assigned their own VLAN, which can be imposed on the port, or the subscriber CPE can send tagged frames
  • The EX-4500 pushes an “S-VLAN” onto all subscriber traffic, so frames heading towards the BNG are double-tagged, or “QinQ’d”
    • Outer VLAN or S-TAG = to mark frames coming from the EX-4500
    • Inner VLAN or C-TAG = to mark frames coming from each subscriber
  • This approach allows for a high degree of scale, as you can encapsulate 4096 C-TAGs inside 4096 S-TAGs (basically you can have 16777216 customers in theory)
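The scale figure is easy to sanity-check. The few lines of Python below are just the arithmetic, with one extra observation of my own: VLAN IDs 0 and 4095 are reserved in 802.1Q, so the usable number is slightly lower than the theoretical one. The last line uses the 100-4000 customer VLAN range configured on the EX-4500 in this lab.

```python
TAG_BITS = 12                        # the VLAN ID field in an 802.1Q tag is 12 bits wide

ids_per_tag = 2 ** TAG_BITS          # 4096 possible values per tag
theoretical = ids_per_tag ** 2       # every C-TAG inside every S-TAG

usable_per_tag = ids_per_tag - 2     # VLAN IDs 0 and 4095 are reserved
usable = usable_per_tag ** 2

lab_range = len(range(100, 4001))    # C-VLANs 100-4000 behind a single S-VLAN

print(theoretical)   # 16777216
print(usable)        # 16760836
print(lab_range)     # 3901
```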

If we take a look at the VLAN and interface configuration of the EX-4500, it’s relatively straightforward:

VLAN10 {
    vlan-id 10;
    dot1q-tunneling {
        customer-vlans 100-4000;
    }
}
ge-0/0/14 {
    unit 0 {
        family ethernet-switching {
            port-mode trunk;
            vlan {
                members all;
            }
        }
    }
}
xe-0/0/16 {   # link to PPPoE subs
    unit 0 {
        family ethernet-switching {
            port-mode access;
            vlan {
                members VLAN10;
            }
        }
    }
}

This snippet of configuration does a couple of basic things:

  • Any frames arriving on xe-0/0/16 with C-VLAN tags within the range of 100-4000 have an additional S-VLAN tag of 10 pushed onto them
  • Double-tagged frames can then egress via ge-0/0/14 towards MX-1
  • ge-0/0/14 must be configured to trunk “all” VLANs
  • Although xe-0/0/16 is configured as an access port in VLAN 10, because that VLAN is configured for dot1q-tunneling, it’s actually expecting single-tagged frames:

 

tim@EX4500-1> show vlans VLAN10 extensive
VLAN: VLAN10, Created at: Sun Dec 27 05:15:19 2015
802.1Q Tag: 10, Internal index: 30, Admin State: Enabled, Origin: Static
Dot1q Tunneling status: Enabled
Customer VLAN ranges:
 100-4000
Protocol: Port Mode, Mac aging time: 300 seconds
Number of interfaces: Tagged 1 (Active = 0), Untagged 1 (Active = 1)
 ge-0/0/14.0, tagged, trunk
 xe-0/0/16.0*, untagged, access

{master:0}
tim@EX4500-1>

Essentially, that’s all we really need to do to enable basic QinQ switching. If this was an actual FTTx deployment the switch would really be acting as an MSAN (multi-service access node), but the principle would be exactly the same: each subscriber has their own VLAN, and all of those subscribers are represented by a single S-VLAN (in this case VLAN 10).
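To show what the switch is doing to each frame, here is a small Python sketch that pushes an outer tag onto a single-tagged frame. It assumes the S-TAG uses the same 0x8100 TPID as the C-TAG, which is an assumption on my part and may differ on other platforms or configurations; the helper names are made up.

```python
import struct

TPID_8021Q = 0x8100   # assumption: S-TAG and C-TAG both use this TPID


def vlan_tag(vid, pcp=0, tpid=TPID_8021Q):
    """Build a 4-byte VLAN tag: 16-bit TPID, 3-bit priority, 1-bit DEI, 12-bit VLAN ID."""
    if not 0 <= vid < 4096:
        raise ValueError("VLAN ID must fit in 12 bits")
    return struct.pack("!HH", tpid, (pcp << 13) | vid)


def push_s_tag(frame, s_vid):
    """Insert an outer tag straight after the destination and source MAC addresses."""
    return frame[:12] + vlan_tag(s_vid) + frame[12:]


def vlan_ids(frame):
    """Walk the tags after the MAC addresses and return the VLAN IDs, outermost first."""
    ids, offset = [], 12
    while struct.unpack_from("!H", frame, offset)[0] == TPID_8021Q:
        ids.append(struct.unpack_from("!H", frame, offset + 2)[0] & 0x0FFF)
        offset += 4
    return ids


# A subscriber frame tagged with C-VLAN 100, carrying PPPoE discovery (0x8863).
macs = bytes.fromhex("ffffffffffff") + bytes.fromhex("001401000001")
from_subscriber = macs + vlan_tag(100) + struct.pack("!H", 0x8863)

towards_bng = push_s_tag(from_subscriber, 10)
print(vlan_ids(from_subscriber))   # [100]
print(vlan_ids(towards_bng))       # [10, 100]
```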

Before we configure the MX as a BNG, let’s have a quick recap on what PPPoE is and how it works.

A PPPoE device sat on a layer-2 Ethernet network has the same bootstrapping problem that any normal host has. A normal host will generally have the luxury of having its own IP address configured, or assigned via DHCP through mechanisms most people are familiar with. When a PPPoE host appears on the wire, a mechanism known as PPPoED (PPPoE discovery) takes place, which includes the following elements:

PADI (PPPoE active discovery initiation)

When a PPPoE host first appears on the wire, it needs a way of discovering the PPPoE BNG or BRAS server. This is achieved rather simply: the host sends a PADI as an Ethernet broadcast (ff:ff:ff:ff:ff:ff) containing its own unicast MAC address as the source. A Wireshark capture of it looks like so:

Ethernet II, Src: 00:14:01:00:00:01 (00:14:01:00:00:01), Dst: ff:ff:ff:ff:ff:ff (ff:ff:ff:ff:ff:ff)
 Destination: ff:ff:ff:ff:ff:ff (ff:ff:ff:ff:ff:ff)
 Address: ff:ff:ff:ff:ff:ff (ff:ff:ff:ff:ff:ff)
 .... ..1. .... .... .... .... = LG bit: Locally administered address (this is NOT the factory default)
 .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
 Source: 00:14:01:00:00:01 (00:14:01:00:00:01)
 Address: 00:14:01:00:00:01 (00:14:01:00:00:01)
 .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
 .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
 Type: 802.1Q Virtual LAN (0x8100)
802.1Q Virtual LAN
 000. .... .... .... = Priority: 0
 ...0 .... .... .... = CFI: 0
 .... 0000 0110 0100 = ID: 100
 Type: PPPoE Discovery (0x8863)
PPP-over-Ethernet Discovery
 Version: 1
 Type: 1
 Code: Active Discovery Initiation (PADI)
 Session ID: 0000
 Payload Length: 4
PPPoE Tags
 Tag: Service-Name

  • The Destination field (ff:ff:ff:ff:ff:ff) shows the frame as an Ethernet broadcast
  • The 802.1Q Virtual LAN section shows the 802.1Q header, set with a single C-VLAN of 100
  • The Code, Session ID and Payload Length fields detail the PADI discovery packet
  • The Service-Name tag is one of the PPPoE tags, which might contain an ISP name or QoS variables
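The PADI is simple enough that you can build one by hand. The Python sketch below reconstructs the frame from the capture using only the standard library; the code and tag values (0x09 for PADI, 0x0101 for Service-Name) come from the PPPoE specification, while the helper names are my own. It builds the bytes only and doesn’t send anything, and it ignores Ethernet padding.

```python
import struct

ETH_P_8021Q = 0x8100
ETH_P_PPPOE_DISC = 0x8863
PADI = 0x09                  # PPPoE code for Active Discovery Initiation
TAG_SERVICE_NAME = 0x0101


def mac(text):
    return bytes.fromhex(text.replace(":", ""))


def pppoe_tag(tag_type, value=b""):
    """PPPoE tags are TLVs: 2-byte type, 2-byte length, then the value."""
    return struct.pack("!HH", tag_type, len(value)) + value


def build_padi(src_mac, vlan_id, service_name=b""):
    payload = pppoe_tag(TAG_SERVICE_NAME, service_name)
    ethernet = mac("ff:ff:ff:ff:ff:ff") + mac(src_mac)
    dot1q = struct.pack("!HHH", ETH_P_8021Q, vlan_id, ETH_P_PPPOE_DISC)
    # Version 1 and type 1 share one byte (0x11); the session ID is 0 during discovery.
    pppoe = struct.pack("!BBHH", 0x11, PADI, 0x0000, len(payload))
    return ethernet + dot1q + pppoe + payload


frame = build_padi("00:14:01:00:00:01", vlan_id=100)
print(frame.hex())
```

An empty Service-Name tag is four bytes of type and length with no value, which is why the capture shows a payload length of 4.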

Seeing as this is an Ethernet broadcast, the BNG node should receive it. When it does, the mechanism progresses to the next stage:

PADO (PPPoE active discovery offer)

Once the BNG node receives the PADI frame, it responds with a PADO (offer) message. This message basically tells the client that there’s an available server on the network, and gives it the BNG server name (MX5-1 in the capture) and, crucially, the MAC address of the BNG terminating interface.

Ethernet II, Src: a8:d0:e5:5b:75:78 (a8:d0:e5:5b:75:78), Dst: 00:14:01:00:00:01 (00:14:01:00:00:01)
Destination: 00:14:01:00:00:01 (00:14:01:00:00:01)
Source: a8:d0:e5:5b:75:78 (a8:d0:e5:5b:75:78)
Type: 802.1Q Virtual LAN (0x8100)
802.1Q Virtual LAN
110. .... .... .... = Priority: 6
...0 .... .... .... = CFI: 0
.... 0000 0110 0100 = ID: 100
Type: PPPoE Discovery (0x8863)
PPP-over-Ethernet Discovery
Version: 1
Type: 1
Code: Active Discovery Offer (PADO)
Session ID: 0000
Payload Length: 33
PPPoE Tags
Tag: AC-Name
String Data: MX5-1
Tag: Service-Name
Tag: AC-Cookie
Binary Data: (16 bytes)

  • The Source field contains the BNG’s MAC address
  • The Code field denotes the frame as a PADO offer
  • The AC-Name tag carries the BNG’s name, MX5-1

It’s also worth noting that multiple BNGs can be present on the same Ethernet segment. If more than one BNG is present, the client will basically pick one of them, essentially whichever one replies first.
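The payload lengths in the captures fall straight out of the tag encoding. As a quick check, this Python sketch encodes the three tags seen in the PADO as type/length/value triplets and decodes them again. The tag type numbers are from the PPPoE specification; the all-zero cookie is just a placeholder, since the real 16 bytes aren’t shown in the capture.

```python
import struct

TAG_SERVICE_NAME = 0x0101
TAG_AC_NAME = 0x0102
TAG_AC_COOKIE = 0x0104


def encode_tags(tags):
    """Encode (type, value) pairs as PPPoE TLVs."""
    return b"".join(struct.pack("!HH", t, len(v)) + v for t, v in tags)


def decode_tags(payload):
    """Decode a PPPoE tag payload back into (type, value) pairs."""
    tags, offset = [], 0
    while offset < len(payload):
        tag_type, length = struct.unpack_from("!HH", payload, offset)
        offset += 4
        tags.append((tag_type, payload[offset:offset + length]))
        offset += length
    return tags


pado_tags = [
    (TAG_AC_NAME, b"MX5-1"),
    (TAG_SERVICE_NAME, b""),
    (TAG_AC_COOKIE, bytes(16)),   # placeholder; the real cookie is opaque to the client
]
payload = encode_tags(pado_tags)
print(len(payload))               # 33
print(decode_tags(payload) == pado_tags)
```

9 bytes for AC-Name, 4 for the empty Service-Name and 20 for the cookie gives the 33 bytes seen in the PADO.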

Once the PPPoE client receives the PADO offer frame, it immediately responds with a request known as a PADR:

PADR (PPPoE active discovery request) 

The function of the PADR is for the PPPoE client to say to the BNG “Yep, I’d like to connect, please get me online and give me a session ID”, because in all the steps up to this point the session-ID has been 0x0000. A PADR packet looks like so:

Frame 94: 64 bytes on wire (512 bits), 64 bytes captured (512 bits) on interface 0
Ethernet II, Src: 00:14:01:00:00:01 (00:14:01:00:00:01), Dst: a8:d0:e5:5b:75:78 (a8:d0:e5:5b:75:78)
Destination: a8:d0:e5:5b:75:78 (a8:d0:e5:5b:75:78)
Source: 00:14:01:00:00:01 (00:14:01:00:00:01)
Type: 802.1Q Virtual LAN (0x8100)
802.1Q Virtual LAN
000. .... .... .... = Priority: 0
...0 .... .... .... = CFI: 0
.... 0000 0110 0100 = ID: 100
Type: PPPoE Discovery (0x8863)
PPP-over-Ethernet Discovery
Version: 1
Type: 1
Code: Active Discovery Request (PADR)
Session ID: 0000
Payload Length: 24
PPPoE Tags
Tag: Service-Name
Tag: AC-Cookie
Binary Data: (16 bytes)

PADS (PPPoE active discovery session-confirmation)

The last part of the PPPoE discovery sequence is when the BNG sends a PADS packet towards the PPPoE client. This contains a unique session-ID and the AC-Name, in this case MX5-1 with a session-ID of 0x0001:

Ethernet II, Src: a8:d0:e5:5b:75:78 (a8:d0:e5:5b:75:78), Dst: 00:14:01:00:00:01 (00:14:01:00:00:01)
Destination: 00:14:01:00:00:01 (00:14:01:00:00:01)
Source: a8:d0:e5:5b:75:78 (a8:d0:e5:5b:75:78)
Type: 802.1Q Virtual LAN (0x8100)
802.1Q Virtual LAN
110. .... .... .... = Priority: 6
...0 .... .... .... = CFI: 0
.... 0000 0110 0100 = ID: 100
Type: PPPoE Discovery (0x8863)
PPP-over-Ethernet Discovery
Version: 1
Type: 1
Code: Active Discovery Session-confirmation (PADS)
Session ID: 0001
Payload Length: 33
PPPoE Tags
Tag: Service-Name
Tag: AC-Name
String Data: MX5-1
Tag: AC-Cookie
Binary Data: (16 bytes)

Once the client receives the PADS session-confirmation, the PPPoE discovery mechanism is complete. Next come PPP LCP and NCP, in order to authenticate the host with RADIUS and provide it with things such as an IP address. LCP and NCP are skipped here for brevity, but will be covered a little later when we move onto the MX configuration.
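The whole discovery exchange boils down to four packets in a fixed order, which the Python sketch below checks. It’s a deliberately naive model with invented names: it only looks at the packet code, who sent it and the session ID, and it ignores retries, multiple PADOs and PADT.

```python
# Expected order of the discovery exchange and who sends each packet.
DISCOVERY_SEQUENCE = [
    ("PADI", "client"),
    ("PADO", "bng"),
    ("PADR", "client"),
    ("PADS", "bng"),
]


def run_discovery(packets):
    """Check (code, sender, session_id) tuples against the expected order.

    Returns the session ID carried in the PADS, or raises ValueError."""
    if len(packets) != len(DISCOVERY_SEQUENCE):
        raise ValueError("incomplete discovery exchange")
    for (code, sender, session_id), (want_code, want_sender) in zip(packets, DISCOVERY_SEQUENCE):
        if (code, sender) != (want_code, want_sender):
            raise ValueError(f"expected {want_code} from {want_sender}, got {code} from {sender}")
        if code != "PADS" and session_id != 0:
            raise ValueError(f"{code} must carry session ID 0")
    session_id = packets[-1][2]
    if session_id == 0:
        raise ValueError("PADS carried session ID 0, so no session was assigned")
    return session_id


capture = [("PADI", "client", 0), ("PADO", "bng", 0),
           ("PADR", "client", 0), ("PADS", "bng", 1)]
print(run_discovery(capture))   # 1
```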

Juniper MX-5 BNG configuration

Before we get started on the BNG configuration, note that MXs require licensing to enable the subscriber management features. The routers I’m using are all licensed, but you can enable these features for a 30 day trial. In terms of hardware I’m using an MX-5 with a 20x 1GE MIC, shown as “3D 20x 1GE(LAN) SFP” in the output below:


tim@MX5-1> show chassis hardware
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                G0081             MX5-T
Midplane         REV 08   711-038215   CAAM9197          MX5-T
PEM 0            Rev 04   740-028288   WA02242           AC Power Entry Module
Routing Engine            BUILTIN      BUILTIN           Routing Engine
TFEB 0                    BUILTIN      BUILTIN           Forwarding Engine Processor
  QXM 0          REV 06   711-028408   CAAM9539          MPC QXM
FPC 0                     BUILTIN      BUILTIN           MPC BUILTIN
  MIC 0                   BUILTIN      BUILTIN           4x 10GE XFP
    PIC 0                 BUILTIN      BUILTIN           4x 10GE XFP
FPC 1                     BUILTIN      BUILTIN           MPC BUILTIN
  MIC 0          REV 24   750-028392   ZF6501            3D 20x 1GE(LAN) SFP
    PIC 0                 BUILTIN      BUILTIN           10x 1GE(LAN) SFP
      Xcvr 0     p        NON-JNPR     FNS1005Q7VW       SFP-SX
    PIC 1                 BUILTIN      BUILTIN           10x 1GE(LAN) SFP
      Xcvr 0              NON-JNPR     ZZ201280393       SFP-SX
      Xcvr 9              NON-JNPR     AABM362           UNSUPPORTED
Fan Tray                                                 Fan Tray

Traceoptions

The first step in doing BNG on a Juniper MX is to enable plenty of traceoptions. There are a lot of individual pieces to the puzzle which must all fit perfectly, so if you find yourself in a position where certain things don’t work, you’re going to need to know which traceoptions to use and where to apply them. For this example I’ll globally define an apply-group containing everything I need:

groups {
    BNG-TRACES {
        system {
            auto-configuration {
                traceoptions {
                    file autoconf.log size 10m files 10;
                    level verbose;
                    flag all;
                }
            }
            processes {
                general-authentication-service {
                    traceoptions {
                        file authlog.log size 10m files 10;
                        flag radius;
                    }
                }
                dhcp-service {
                    traceoptions {
                        file dhcp.log size 10m files 10;
                    }
                }
            }
        }
        protocols {
            ppp-service {
                traceoptions {
                    file pppserv.log size 10m files 10;
                    level all;
                    flag all;
                }
            }
            ppp {
                traceoptions {
                    file ppp.log size 10m files 10;
                    level all;
                    flag all;
                }
            }
        }
    }
}
apply-groups BNG-TRACES;

It must be pointed out that with BNG, a large number of clients trying to connect to the device imposes a heavy load on the routing-engine, so you should only really have the traceoptions on during initial setup or when fault finding. Leaving them on can drastically increase the time it takes to get subscribers terminated and connected. Turning them all on or off with an apply-group is the easiest way to do this.

Dynamic profiles

Dynamic profiles form the core of the configuration for doing BNG on an MX; this is basically where the magic happens. A dynamic profile is a way of using pre-defined variables which trigger the router to do certain things automatically when it receives packets on an interface with a dynamic-profile applied. For example, in most switched networks you generally need to configure sub-interfaces in order to process traffic coming from different VLANs, and you then also need to create layer-3 interfaces to route that very same traffic, all of which takes time and effort. In the broadband world you might have 64000 subscribers logging in and out constantly, so doing this manually would be impossible. By using a dynamic profile, the router can detect packets and automatically create and tear down interfaces based on certain variables. For example, if the packets are tagged with certain VLAN tags, or if it detects DHCP requests, the router can create the interfaces for you, rather than you having to configure them statically.

Because in this example we’re terminating subscribers using stacked-VLAN tagging, or “QinQ”, we need the router to create two kinds of interfaces. The first is the layer-2 VLAN interface, or demux interface; the second is the layer-3 interface that effectively does the routing for the subscriber:

 

VLAN Demux

 

dynamic-profiles {
 vlan-prof-0 {
 interfaces {
 demux0 {
 unit $junos-interface-unit {
 no-traps;
 proxy-arp;
 vlan-tags outer $junos-stacked-vlan-id inner $junos-vlan-id;
 demux-options {
 underlying-interface $junos-interface-ifd-name;
 }
 family pppoe {
 duplicate-protection;
 dynamic-profile pppoe-client-profile;
 }
 }
 }
 }
 }

The snippet above contains some predefined variables, identified by anything starting with “$junos-”. It’s basically a template configuration, where the blanks are filled in from specific parts of the subscriber traffic, for example:

  • Line 5 specifies $junos-interface-unit; this is the interface unit number that the router generates when the interface is created, instead of you entering it manually
  • Line 8 specifies the S-VLAN (outer) and C-VLAN (inner) tags that the interface should handle once it’s created. For example, if layer-2 frames tagged with S-VLAN 10 and C-VLAN 100 hit the router’s main physical interface, the logical unit associated with VLANs 10/100 will always process those frames.
  • Line 10 represents the underlying physical interface on which the packet was received, for example ge-1/0/0
  • Line 12 represents the protocol family configured on the underlying interface; in this case it’s PPPoE, and it references the PPPoE client profile covered in the next section
  • Line 13, duplicate-protection, prevents multiple interfaces from being created if the same user with the same MAC address keeps trying to log in
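
To make the "template with blanks" idea a bit more concrete, here is a minimal Python sketch of the substitution. It is only an illustration of the concept, not how Junos implements it: the render function is hypothetical, the template is a trimmed copy of the stanza above, and the values are the ones that appear for this lab's subscriber later in the post (S-VLAN 10, C-VLAN 100, ge-1/0/0).

```python
import re

# Trimmed copy of the demux0 stanza from the dynamic profile above.
TEMPLATE = """\
demux0 {
    unit $junos-interface-unit {
        vlan-tags outer $junos-stacked-vlan-id inner $junos-vlan-id;
        demux-options {
            underlying-interface $junos-interface-ifd-name;
        }
    }
}"""

# Matches "$junos-" followed by hyphen-separated lowercase words.
VARIABLE = re.compile(r"\$junos-[a-z0-9]+(?:-[a-z0-9]+)*")


def render(template: str, values: dict) -> str:
    """Replace every $junos-* variable with its value (KeyError if one is missing)."""
    return VARIABLE.sub(lambda match: str(values[match.group(0)]), template)


# Illustrative values for a frame tagged S-VLAN 10 / C-VLAN 100 arriving on ge-1/0/0.
values = {
    "$junos-interface-unit": 1073776124,
    "$junos-stacked-vlan-id": 10,
    "$junos-vlan-id": 100,
    "$junos-interface-ifd-name": "ge-1/0/0",
}

rendered = render(TEMPLATE, values)
print(rendered)
```

The printed result is what a static configuration for that one subscriber VLAN pair would look like if you had typed it by hand.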

PPPoE Client profile

The next part of the configuration is the client profile, in which we specify the various options we want to apply to the client when it initiates its connection to the BNG. Like the example above, it contains a number of variables:

pppoe-client-profile {
 routing-instances {
 $junos-routing-instance {
 interface $junos-interface-name;
 routing-options {
 access {
 route $junos-framed-route-ip-address-prefix {
 next-hop $junos-framed-route-nexthop;
 metric $junos-framed-route-cost;
 preference $junos-framed-route-distance;
 }
 }
 access-internal {
 route $junos-subscriber-ip-address {
 qualified-next-hop $junos-interface-name;
 }
 }
 }
 }
 }
 

 

  • Lines 2-3 are relatively self-explanatory, but they give an insight into how scalable such a solution can be: “$junos-routing-instance” is supplied by Radius, so you can quite easily put subscribers into different routing-instances or just the global routing table.
  • Lines 7-10 apply default variables for generating a static host-route for the subscriber’s connection; “$junos-framed-route-nexthop” is simply the subscriber itself and points to 0.0.0.0 unless specified by Radius
  • Lines 13-15 allow the “$junos-framed-route-nexthop” to be resolved via the CPE IP address and logical interface, using “$junos-interface-name” as a qualified next-hop
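
As a rough illustration of where the framed-route variables could get their values, the sketch below splits a Radius Framed-Route string in the conventional "prefix gateway metric" layout suggested by RFC 2865. The mapping of fields onto the $junos- variables is my assumption for illustration, the parse_framed_route helper is hypothetical, and the exact string format Junos accepts should be checked against the Juniper documentation.

```python
def parse_framed_route(text: str) -> dict:
    """Split a 'prefix gateway [metric]' Framed-Route string into named fields.

    The keys reuse the variable names from the client profile; that mapping
    is an assumption made for this sketch.
    """
    fields = text.split()
    if len(fields) < 2:
        raise ValueError("expected at least a prefix and a gateway")
    metric = int(fields[2]) if len(fields) > 2 else None
    return {
        "$junos-framed-route-ip-address-prefix": fields[0],
        "$junos-framed-route-nexthop": fields[1],
        "$junos-framed-route-cost": metric,
    }


# Example route using a documentation prefix; a 0.0.0.0 gateway stands for the subscriber itself.
route = parse_framed_route("203.0.113.0/24 0.0.0.0 1")
for name, value in route.items():
    print(name, "=", value)
```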

 

The PPPoE logical interface

As mentioned earlier, when a PPPoE subscriber is terminated on the BNG the router creates a dynamic PPPoE interface; this is essentially the layer-3 routed interface for the subscriber. Variables are defined under the pp0 interface stanza and follow a similar pattern to the previous sections; things like the address-families and PPP options are defined here.

 

interfaces {
 interface-set $junos-svlan-interface-set-name {
 interface pp0 {
 unit $junos-interface-unit;
 }
 }
 pp0 {
 unit $junos-interface-unit {
 no-traps;
 ppp-options {
 chap;
 pap;
 }
 pppoe-options {
 underlying-interface $junos-underlying-interface;
 server;
 }
 keepalives interval 30;
 family inet {
 rpf-check;
 unnumbered-address $junos-loopback-interface;
 }
 family inet6 {
 address $junos-ipv6-address;
 unnumbered-address $junos-loopback-interface;
 }
 }
 }
 }
 }
}
  • Lines 1 – 4 instruct the router to generate a pp0 interface-set based off the SVLAN (outer VLAN tag of the subscriber’s session)
  • Lines 7-25 define the PPP settings, such as PAP/CHAP authentication and the underlying interface, which itself becomes a PPPoE server, followed by the inet and inet6 address-families
  • The address-families themselves are configured for unnumbered IP addressing, in this case borrowing from the loopback interface specified under the routing-instance.

 

The physical access-facing interface

Before we get to the physical interface, we need to apply the access-profile (aaa-profile) to the router; this contains the Radius settings covered in the next section. The configuration below binds everything we did in the above 3 sections directly to a physical interface, in this case ge-1/0/0:

access-profile aaa-profile;
interfaces {
 ge-1/0/0 {
 flexible-vlan-tagging;
 auto-configure {
 stacked-vlan-ranges {
 dynamic-profile vlan-prof-0 {
 accept [ inet pppoe ];
 ranges {
 10-100,100-4000;
 }
 }
 }
 remove-when-no-subscribers;
 }
 mtu 1530;
 hold-time up 10000 down 10000;
 }

 

 

  • Line 4 specifies flexible-vlan-tagging, which enables double-tagged QinQ transmission on logical interfaces
  • Line 5 is an important one – this instructs the router to auto-configure dynamic interfaces based on all the things we specified so far
  • Line 7 applies the VLAN Demux dynamic profile to the physical interface, so that the layer-2 elements will be built correctly
  • Lines 9-10 specify the VLAN ranges for the S and C VLAN tags entering the interface; in this case the interface will accept frames with S-VLANs 10-100 and C-VLANs 100-4000.
  • Line 14 basically tells the router to delete the logical interface when no subscribers are logged in
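
The range check described above can be sketched in a few lines of Python. This is only a model of the accept/reject decision for a given tag pair, not the router's actual logic; both helper functions are hypothetical, and the range string is copied from the configuration.

```python
from typing import Tuple

Range = Tuple[int, int]


def parse_stacked_range(text: str) -> Tuple[Range, Range]:
    """Turn 'outer_low-outer_high,inner_low-inner_high' into two (low, high) pairs."""
    outer_text, inner_text = text.split(",")

    def bounds(part: str) -> Range:
        low, high = part.split("-")
        return int(low), int(high)

    return bounds(outer_text), bounds(inner_text)


def accepts(stacked_range: Tuple[Range, Range], svlan: int, cvlan: int) -> bool:
    """True if the S-VLAN and C-VLAN both fall inside their configured ranges."""
    (s_low, s_high), (c_low, c_high) = stacked_range
    return s_low <= svlan <= s_high and c_low <= cvlan <= c_high


configured = parse_stacked_range("10-100,100-4000")
print(accepts(configured, 10, 100))   # the tag pair used by the lab subscriber
print(accepts(configured, 200, 100))  # S-VLAN outside 10-100
```

Only frames whose tag pair passes this kind of check would trigger the vlan-prof-0 dynamic profile on ge-1/0/0.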

Radius and access settings

The settings in the section below are relatively self-explanatory. In this lab the Radius server is reached via the fxp0 management interface on the router, and you can see the Radius server address, ports and secret:

access {
 radius-server {
 192.168.3.158 {
 port 1812;
 accounting-port 1813;
 secret $9$6vga9CucyKM87n/clMWdV; ## SECRET-DATA
 timeout 10;
 retry 10;
 source-address 192.168.3.61;
 }
 }
 profile lab3 {
 authentication-order radius;
 radius {
 authentication-server 192.168.3.158;
 }
 }
 profile aaa-profile {
 authentication-order radius;
 radius {
 authentication-server 192.168.3.158;
 accounting-server 192.168.3.158;
 options {
 interface-description-format {
 exclude-sub-interface;
 }
 nas-identifier mx5-1;
 accounting-session-id-format decimal;
 vlan-nas-port-stacked-format;
 }
 }
 radius-server {
 192.168.3.158 {
 port 1812;
 accounting-port 1813;
 secret $9$u9AAOBEvWxNdsp0vLN-g4; ## SECRET-DATA
 timeout 10;
 retry 10;
 source-address 192.168.3.61;
 }
 }
 accounting {
 order radius;
 accounting-stop-on-failure;
 accounting-stop-on-access-deny;
 immediate-update;
 coa-immediate-update;
 update-interval 60;
 statistics volume-time;
 }

 

 

Routing-instances and address assignment

For this lab we’re going to terminate subscribers directly into a routing-instance, as specified by the Radius server under the user profile. The routing-instance does a number of things: first we define the standard attributes, such as the fact that it’s a VRF instance, its interfaces, and the usual L3VPN settings like the route-distinguisher and the route-target; we also specify some access properties and address-assignment settings:

routing-instances {
 residential {
 instance-type vrf;
 access {
 address-assignment {
 high-utilization 90;
 abated-utilization 70;
 pool TG-POOL {
 family inet {
 network 10.170.0.0/15;
 range range0 {
 low 10.170.0.0;
 high 10.171.255.255;
 }
 }
 }
 }
 }
 interface lo0.100;
 route-distinguisher 100:101;
 vrf-target target:100:101;
 vrf-table-label;
}

 

  • Lines 4-13 define the IP address pool directly on the router and specify a /15 range; when the subscriber authenticates with Radius, it’ll get allocated a single /32 host address from that range
  • Line 19 is required in order for the VRF to function; the address under lo0.100 can effectively be anything, but you cannot commit the configuration without having an interface inside the routing-instance
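
As a quick sanity check of the pool arithmetic, the following Python sketch uses the standard ipaddress module to confirm that the configured low and high addresses sit inside 10.170.0.0/15 and to count how many addresses the range covers. The subscriber address is the one handed out later in this post.

```python
import ipaddress

pool = ipaddress.ip_network("10.170.0.0/15")
low = ipaddress.ip_address("10.170.0.0")
high = ipaddress.ip_address("10.171.255.255")

# Number of addresses between low and high, inclusive.
range_size = int(high) - int(low) + 1

print("pool spans", pool.network_address, "to", pool.broadcast_address)
print("range covers", range_size, "of", pool.num_addresses, "addresses")

# A subscriber is handed a single /32 from the pool.
subscriber = ipaddress.ip_interface("10.170.55.177/32")
print(subscriber.ip, "in pool:", subscriber.ip in pool)
```

In other words, range0 covers the whole /15, which is far more than a lab will ever need.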

 

That’s the subscriber configuration complete! Engineers familiar with the Cisco syntax will note that there’s a lot more configuration required to get a Juniper MX router to work; the Cisco PPPoE server settings were very basic and the majority of the configuration is hidden. Juniper BRAS functionality on MX is relatively new compared to Cisco’s, which has been around for decades, so there’s a chance it might be simplified in future.

Freeradius 

For the lab we’re using a fresh install of Freeradius (setup instructions can be found here: http://freeradius.org/doc/) running on an Ubuntu VM with 1 vCPU and 4GB of RAM. Subscribers are maintained inside a “users” file on the Freeradius server. Radius supports many attributes and it’s a very versatile platform, but for this example we’re just going to use basic settings:

user1@users.com Cleartext-Password := password1
 ERX-virtual-Router-Name := residential,
 ERX-Address-Pool-Name := TG-POOL,
 Framed-IP-Netmask = 255.255.255.255

The above snippet from the users file is pretty simple and self explanatory, Freeradius is actually a very flexible platform with many different configuration options and various different front-ends and GUIs.

  • Line 1 defines the credentials: the PPPoE client will authenticate via CHAP as user1@users.com with a password of “password1”
  • Lines 2 and 3 are interesting in that they use ERX attribute names. The ERX was an old BRAS platform that Juniper bought in 2002, before it had any real BRAS product of its own; it came from a company formerly known as “Redstone Communications”, who specialised in BRAS and competed primarily with the Cisco GSR10k and the old 7500 series. Essentially, Juniper has left the old-style ERX attributes in place and not renamed them.
  • Line 2 tells the BNG to put the subscriber into the “residential” routing-instance; removing it will place the subscriber into the global routing table
  • Line 4 tells the BNG to allocate a /32 host route from the pool

 

Connectivity!

A quick re-cap of the diagram:

Untitled-2

 

After all that we’re ready to start some connections and see what happens. As I mentioned ages ago, I’ll be using IXIA to simulate PPPoE clients; each client is configured with the username and password specified in the Freeradius section above. I’ll begin by starting a single connection on IXIA:

Capture

The connection is successful, so let’s look at a Wireshark trace of the PPPoE and PPP negotiation:

93 144.144488380 a8:d0:e5:5b:75:78 00:14:01:00:00:01 PPPoED 64 Active Discovery Offer (PADO)
94 144.144715040 00:14:01:00:00:01 a8:d0:e5:5b:75:78 PPPoED 64 Active Discovery Request (PADR)
95 144.220336780 a8:d0:e5:5b:75:78 00:14:01:00:00:01 PPPoED 64 Active Discovery Session-confirmation (PADS)
96 144.221360740 00:14:01:00:00:01 a8:d0:e5:5b:75:78 PPP LCP 64 Configuration Request
97 144.221955780 a8:d0:e5:5b:75:78 00:14:01:00:00:01 PPP LCP 64 Configuration Request
98 144.222081460 00:14:01:00:00:01 a8:d0:e5:5b:75:78 PPP LCP 64 Configuration Ack
99 144.222448200 a8:d0:e5:5b:75:78 00:14:01:00:00:01 PPP LCP 64 Configuration Ack
100 144.223760200 a8:d0:e5:5b:75:78 00:14:01:00:00:01 PPP CHAP 70 Challenge
101 144.223884140 00:14:01:00:00:01 a8:d0:e5:5b:75:78 PPP CHAP 68 Response
102 144.230453160 a8:d0:e5:5b:75:78 00:14:01:00:00:01 PPP CHAP 64 Success
103 144.230675300 00:14:01:00:00:01 a8:d0:e5:5b:75:78 PPP IPCP 64 Configuration Request
104 144.231844920 a8:d0:e5:5b:75:78 00:14:01:00:00:01 PPP IPCP 64 Configuration Nak

 

Once the PPPoE session is established, LCP negotiates the link parameters, followed by the CHAP authentication, followed by IPCP IP address negotiation (all standard old-school PPP stuff)
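
To make the CHAP Challenge/Response/Success exchange in the trace a bit more concrete, here is a minimal Python sketch of the MD5 hash defined in RFC 1994: the response is the MD5 digest of the identifier octet, the shared secret and the challenge, in that order. The identifier and challenge bytes below are made up for illustration; only the password comes from the users file. In this lab the comparison would be performed by the Radius server rather than in a script like this.

```python
import hashlib
import hmac


def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """RFC 1994 CHAP with MD5: hash of identifier octet + secret + challenge."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()


def chap_verify(identifier: int, secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Recompute the expected response and compare in constant time."""
    expected = chap_response(identifier, secret, challenge)
    return hmac.compare_digest(expected, response)


# Illustrative values: a 16-byte challenge and an identifier of 1.
challenge = bytes(range(16))
response = chap_response(1, b"password1", challenge)

print(response.hex())
print(chap_verify(1, b"password1", challenge, response))
```

This is also why the users file stores a Cleartext-Password: the verifier needs the original secret to recompute the hash.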

Let’s take a look on the MX:

tim@MX5-1> show subscribers summary
Subscribers by State
 Active: 2
 Total: 2

Subscribers by Client Type
 VLAN: 1
 PPPoE: 1
 Total: 2

tim@MX5-1> show subscribers extensive
Type: VLAN
Logical System: default
Routing Instance: default
Interface: demux0.1073776124
Interface type: Dynamic
Underlying Interface: ge-1/0/0
Dynamic Profile Name: vlan-prof-0
State: Active
Session ID: 34301
Stacked VLAN Id: 0x8100.10
VLAN Id: 0x8100.100
Login Time: 2016-03-11 23:13:24 UTC

Type: PPPoE
User Name: user1@users.com
IP Address: 10.170.55.177
IP Netmask: 255.255.255.255
Logical System: default
Routing Instance: residential
Interface: pp0.1073776125
Interface type: Dynamic
Interface Set: ge-1/0/0-10
Underlying Interface: demux0.1073776124
Dynamic Profile Name: pppoe-client-profile
MAC Address: 00:14:01:00:00:01
State: Active
Radius Accounting ID: 34302
Session ID: 34302
Stacked VLAN Id: 10
VLAN Id: 100
Login Time: 2016-03-11 23:13:35 UTC
IP Address Pool: TG-POOL

tim@MX5-1>

The astute reader will notice that 2 interfaces are generated. If you remember from the earlier sections, the MX creates two interfaces: a VLAN demux interface for demultiplexing the layer-2 QinQ traffic, and a layer-3 PPPoE interface for routing the subscriber traffic. These interfaces appear on the router along with the others, and can also be viewed by using the standard “show subscribers” command:


tim@MX5-1> show subscribers
Interface IP Address/VLAN ID User Name LS:RI
demux0.1073776124 0x8100.10 0x8100.100 default:default
pp0.1073776125 10.170.55.177 user1@users.com default:residential

tim@MX5-1>
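
Each VLAN entry in the output above has the form Ether-type.VLAN-ID, where 0x8100 is the standard 802.1Q Ether-type. If you ever need to post-process this output, a small helper like the hypothetical parse_tag below splits an entry into its two parts; it is just a sketch for the format shown here.

```python
from typing import Tuple


def parse_tag(text: str) -> Tuple[int, int]:
    """Split an entry such as '0x8100.10' into (ether_type, vlan_id)."""
    ether_type_text, vlan_text = text.split(".")
    return int(ether_type_text, 16), int(vlan_text)


outer = parse_tag("0x8100.10")   # S-VLAN entry
inner = parse_tag("0x8100.100")  # C-VLAN entry

print(hex(outer[0]), outer[1])
print(hex(inner[0]), inner[1])
```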

From here you can see the VLAN demux interface “demux0.1073776124” and its associated Ether-type and VLAN tags, 0x8100.10 and 0x8100.100. Directly below sits the PPPoE interface, pp0.1073776125; the subscriber has been given an IP address of 10.170.55.177, and the interface has been placed into the default logical-system inside the residential routing-instance. We can see the connected routes by looking in the residential routing-table:


tim@MX5-1> show route table residential.inet.0

residential.inet.0: 3 destinations, 4 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.100.0.2/32 *[Direct/0] 2d 10:41:50
 via lo0.100
10.170.55.177/32 *[Access-internal/12] 00:23:32
 via pp0.1073776125

tim@MX5-1>

The routing-instance is configured to export the /32 host route directly into the residential L3VPN, so if I hop across to MX-2 on the other side of the network, I should see a host route for 10.170.55.177/32 learned via VPNv4 BGP in the VRF table:


tim@MX5-2> show route table VRF1.inet.0 10.170.55.177/32

VRF1.inet.0: 4 destinations, 6 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.170.55.177/32 *[BGP/170] 00:34:06, localpref 100, from 1.1.1.253
AS path: I, validation-state: unverified
to 192.169.100.15 via ae0.0, Push 16, Push 300112(top)
[BGP/170] 00:34:06, localpref 100, from 1.1.1.254
AS path: I, validation-state: unverified
to 192.169.100.15 via ae0.0, Push 16, Push 300112(top)

tim@MX5-2>

I could also ping, but things get a bit weird when trying to ping IXIA, so I’ll just send real IXIA traffic from the PPPoE subscriber to another endpoint behind MX-2:

450Mbps both ways:

Capture

 

If we take a look at the pp0 interface, it shows us the statistics for the traffic and the single logical interface attached to it:


tim@MX5-1> show interfaces pp0 extensive
Physical interface: pp0, Enabled, Physical link is Up
Interface index: 131, SNMP ifIndex: 504, Generation: 134
Type: PPPoE, Link-level type: PPPoE, MTU: 1532, Speed: Unspecified
Device flags : Present Running
Interface flags: Point-To-Point SNMP-Traps
Link type : Full-Duplex
Link flags : None
Physical info : Unspecified
Hold-times : Up 0 ms, Down 0 ms
Current address: Unspecified, Hardware address: Unspecified
Alternate link address: Unspecified

Logical interface pp0.1073776125 (Index 336) (SNMP ifIndex 675) (Generation 34445)
Flags: Up Point-To-Point 0x0 Encapsulation: PPPoE
Interface set: ge-1/0/0-10
PPPoE:
State: SessionUp, Session ID: 1,
Session AC name: MX5-1, Remote MAC address: 00:14:01:00:00:01,
Underlying interface: demux0.1073776124 (Index 335)
Traffic statistics:
Input bytes : 28681394289
Output bytes : 29220920079
Input packets: 20057255
Output packets: 20264510
Local statistics:
Input bytes : 107
Output bytes : 287
Input packets: 6
Output packets: 7
Transit statistics:
Input bytes : 28681394182 440755448 bps
Output bytes : 29220919792 444454096 bps
Input packets: 20057249 38527 pps
Output packets: 20264503 38527 pps
Keepalive settings: Interval 30 seconds, Up-count 1, Down-count 3
LCP state: Opened
NCP state: inet: Opened, inet6: Not-configured, iso: Not-configured, mpls: Not-configured
CHAP state: Success
PAP state: Closed
Protocol inet, MTU: 1492, Generation: 34339, Route table: 6
Flags: Sendbcast-pkt-to-re, uRPF
RPF Failures: Packets: 0, Bytes: 0
Addresses, Flags: Is-Primary
Destination: Unspecified, Local: 10.100.0.2, Broadcast: Unspecified, Generation: 14403

tim@MX5-1>
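
The transit rates in the output above can be turned into an average packet size with a bit of arithmetic: bits per second divided by 8 gives bytes per second, and dividing that by packets per second gives bytes per packet. A short Python sketch, using the input figures copied from the pp0 output (the helper name is mine):

```python
def average_packet_size(bits_per_second: int, packets_per_second: int) -> float:
    """Average bytes per packet from a bit rate and a packet rate."""
    return bits_per_second / 8 / packets_per_second


# Input transit rates taken from the pp0.1073776125 output above.
input_bps = 440755448
input_pps = 38527

size = average_packet_size(input_bps, input_pps)
print(f"{input_bps / 1e6:.1f} Mbps at {input_pps} pps is about {size:.0f} bytes per packet")
```

That works out at roughly 440 Mbps of traffic made up of packets averaging around 1430 bytes as counted on pp0.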

And likewise the Demux0 interface:


tim@MX5-1> show interfaces demux0 extensive
Physical interface: demux0, Enabled, Physical link is Up
Interface index: 128, SNMP ifIndex: 501, Generation: 131
Type: Software-Pseudo, Link-level type: Unspecified, MTU: 9192, Clocking: 1, Speed: Unspecified
Device flags : Present Running
Interface flags: Point-To-Point SNMP-Traps
Link type : Full-Duplex
Link flags : None
Physical info : Unspecified
Hold-times : Up 0 ms, Down 0 ms
Current address: Unspecified, Hardware address: Unspecified
Alternate link address: Unspecified
Last flapped : Never
Statistics last cleared: Never
Traffic statistics:
Input bytes : 0
Output bytes : 0
Input packets: 0
Output packets: 0
IPv6 transit statistics:
Input bytes : 0
Output bytes : 0
Input packets: 0
Output packets: 0
Input errors:
Errors: 0, Drops: 0, Framing errors: 0, Runts: 0, Giants: 0, Policed discards: 0, Resource errors: 0
Output errors:
Carrier transitions: 0, Errors: 0, Drops: 0, MTU errors: 0, Resource errors: 0

Logical interface demux0.1073776124 (Index 335) (SNMP ifIndex 665) (Generation 34444)
Flags: Up 0x0 VLAN-Tag [ 0x8100.10 0x8100.100 ] Encapsulation: ENET2
Demux:
Underlying interface: ge-1/0/0 (Index 140)
Traffic statistics:
Input bytes : 36620166019
Output bytes : 37174746186
Input packets: 25573105
Output packets: 25780359
Local statistics:
Input bytes : 84
Output bytes : 117
Input packets: 2
Output packets: 1
Transit statistics:
Input bytes : 36620165935 441367016 bps
Output bytes : 37174746069 444449192 bps
Input packets: 25573103 38527 pps
Output packets: 25780358 38527 pps
Protocol pppoe
Dynamic Profile: pppoe-client-profile,
Service Name Table: None,
Max Sessions: 32000, Max Sessions VSA Ignore: Off,
Duplicate Protection: On, Short Cycle Protection: Off,
Direct Connect: Off,
AC Name: MX5-1
Generation: 34338, Route table: 0

tim@MX5-1>

And that about wraps it up. With an authenticated subscriber and successful end-to-end traffic, there’s not that much more to a basic configuration. In recent times there have been some more exotic ways of terminating subscribers, which I may demonstrate in a later post, specifically pseudowire headend termination, where the PPPoE sessions are tunnelled directly to a centrally located BNG using L2VPN pseudowires; this makes for a very scalable broadband model.

I hope you found this useful!