iBGP for PE-CE

I’ve worked on many large-scale MPLS VPN solutions, some with as many as 20k-30k managed CPEs, and as everybody knows, BGP is what you run in this sort of setup. It’s almost always eBGP, either with a single AS across all sites using AS-override, or with a different AS number per site, to get around the age-old eBGP loop-prevention mechanism which tends to get in the way when we use L3VPNs.
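
To make the loop-prevention problem concrete, here’s a minimal Python sketch of it. This isn’t vendor code: the function names are made up for illustration, the customer AS 65001 is an assumed private AS, and 100 stands in for the provider AS.

```python
# Illustrative only: models eBGP loop prevention and AS-override on plain lists.
# AS 65001 (customer) is an assumption; AS 100 stands in for the provider.

def ebgp_accepts(as_path, local_as):
    """A BGP speaker drops any eBGP route whose AS path already contains its own AS."""
    return local_as not in as_path


def advertise_to_ce(as_path, provider_as, ce_as, as_override=False):
    """AS path a PE sends to its CE over eBGP, optionally with AS-override."""
    if as_override:
        # Replace every occurrence of the CE's AS with the provider AS.
        as_path = [provider_as if asn == ce_as else asn for asn in as_path]
    # The PE prepends its own AS on an eBGP advertisement.
    return [provider_as] + as_path


site_a_route = [65001]  # originated at site A, which shares its AS with site B

plain = advertise_to_ce(site_a_route, provider_as=100, ce_as=65001)
overridden = advertise_to_ce(site_a_route, provider_as=100, ce_as=65001, as_override=True)

print(plain, ebgp_accepts(plain, 65001))            # [100, 65001] False
print(overridden, ebgp_accepts(overridden, 65001))  # [100, 100] True
```

Without AS-override the CE at site B sees its own AS in the path and drops the route; with it, the path only contains the provider AS and the route is accepted.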

Recently I came across RFC 6368, which describes how iBGP can be used as a PE-CE protocol in order to make the provider network more transparent from a BGP perspective. Usually there’s no problem running eBGP, and 99% of networks seem to operate perfectly fine with it. However, if the customer CE routers have a large BGP element behind them, the provider’s AS number and its interaction with the customer’s BGP updates can in some cases cause problems.

Cisco recently added support for iBGP as a PE-CE protocol with a new neighbor command configured under the VRF address-family: “neighbor <x.x.x.x> internal-vpn-client”. In JUNOS the equivalent is “independent-domain”, which goes under the routing-options of the routing-instance.

For this configuration, consider the following basic topology:

[Figure: topology diagram showing CE-1, MX-1, MX-2 and CE-2]

CE-1 and CE-2 are both Cisco routers. MX-1 and MX-2 are Juniper MXs running inet-vpn unicast between loopbacks, with IS-IS L2 and LDP configured in the simplest way possible. All devices are inside BGP AS 100.

The routing instances on MX-1 and MX-2 are identical, apart from the peering IP address and the route-distinguishers.

routing-instances {
    as100 {
        instance-type vrf;
        interface ge-0/0/4.0;
        route-distinguisher 100:100;
        vrf-target target:100:100;
        routing-options {
            autonomous-system 100 independent-domain;
        }
        protocols {
            bgp {
                group iBGP-CE {
                    type internal;
                    neighbor 10.10.11.0 {
                        family inet {
                            unicast;
                        }
                    }
                }
            }
        }
    }
}

Notice the “independent-domain” keyword on the autonomous-system statement, under routing-options in the routing-instance on each MX. This is essentially what enables the RFC 6368 behaviour for iBGP as the PE-CE protocol.

The Cisco routers are running a simple configuration. Again, they’re identical except for the peering address and LAN range. This is CE-1:

router bgp 100
 bgp log-neighbor-changes
 network 10.10.100.0 mask 255.255.255.0
 neighbor 10.10.11.1 remote-as 100

BGP comes up as expected on both devices, and the LAN range is reachable from each CE:

CE-1#sh ip route
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route, H - NHRP, l - LISP
       a - application route
       + - replicated route, % - next hop override
Gateway of last resort is not set
      10.0.0.0/8 is variably subnetted, 6 subnets, 3 masks
B        10.10.10.0/31 [200/0] via 10.10.11.1, 00:06:08
C        10.10.11.0/31 is directly connected, Ethernet0/0
L        10.10.11.0/32 is directly connected, Ethernet0/0
C        10.10.100.0/24 is directly connected, Ethernet0/1
L        10.10.100.1/32 is directly connected, Ethernet0/1
B        10.10.200.0/24 [200/0] via 10.10.11.1, 00:06:08
CE-1#ping 10.10.200.1 source 10.10.100.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.10.200.1, timeout is 2 seconds:
Packet sent with a source address of 10.10.100.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 14/21/32 ms
CE-1#

From the perspective of CE-1 and CE-2, and also MX-1 and MX-2, all sessions are iBGP sessions, so no AS numbers have been added to the AS path. For CE-1 and CE-2 it’s business as usual as far as iBGP is concerned. It must be noted that MX-1 and MX-2 change the next-hops when BGP routes are loaded into the RIB, but this is normal L3VPN behaviour anyway.

One of the interesting aspects of this feature, apart from the ability to run iBGP for PE-CE sessions, is that it uses a new path attribute (ATTR_SET, optional transitive, type code 128) to hide the customer’s BGP attributes and tunnel them through the provider core between PE routers. This means that iBGP attributes set by customers, such as local-preference, can be tunnelled through the provider core without interfering with the provider’s best-path selection process. This is analogous to running OSPF as the PE-CE protocol, where the OSPF route type and domain ID are encoded as extended communities on the VPNv4 route and transported across the core.
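
For anyone curious what that looks like on the wire, here’s a rough Python sketch that packs a LOCAL_PREF of 250 into an ATTR_SET, following my reading of the layout in RFC 6368 (flags, type code 128, length, a 4-octet origin AS, then the customer’s path attributes). The helper names are my own, and this only builds the bytes of the attribute; it isn’t a BGP implementation.

```python
import struct

# Path attribute type codes and flag bits (RFC 4271 / RFC 6368).
LOCAL_PREF = 5
ATTR_SET = 128
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_EXTENDED_LENGTH = 0x10


def encode_attr(flags, type_code, value):
    """Encode one path attribute as flags, type, length, value (helper name is hypothetical)."""
    if len(value) > 255:
        # Values longer than 255 octets need the 2-octet length form.
        header = struct.pack("!BBH", flags | FLAG_EXTENDED_LENGTH, type_code, len(value))
    else:
        header = struct.pack("!BBB", flags, type_code, len(value))
    return header + value


def encode_attr_set(origin_as, customer_attrs):
    """Wrap already-encoded customer attributes in an ATTR_SET."""
    value = struct.pack("!I", origin_as) + b"".join(customer_attrs)
    return encode_attr(FLAG_OPTIONAL | FLAG_TRANSITIVE, ATTR_SET, value)


# The customer's LOCAL_PREF of 250, as CE-1 would send it over iBGP.
customer_local_pref = encode_attr(FLAG_TRANSITIVE, LOCAL_PREF, struct.pack("!I", 250))

# What the ingress PE would attach to the VPNv4 route, with customer AS 100.
attr_set = encode_attr_set(100, [customer_local_pref])
print(attr_set.hex())  # c0800b00000064400504000000fa
```

The provider’s own LOCAL_PREF sits alongside this as a normal attribute on the VPNv4 route, which is why the core’s best-path selection never looks at the customer’s value.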

To demonstrate this, we modify the local-preference on CE-1 so that outgoing routes are set with a local-preference of 250. MX-1 should hide the local-pref value in its L3VPN advertisement to MX-2; it’s only on the route’s subsequent advertisement from MX-2 to CE-2 that the local-preference value is unmasked.

Set the local-preference to 250 on CE-1:

router bgp 100
 bgp log-neighbor-changes
 network 10.10.100.0 mask 255.255.255.0
 neighbor 10.10.11.1 remote-as 100
 neighbor 10.10.11.1 route-map lpref out
!
route-map lpref permit 10
 set local-preference 250
!

On CE-2 we see the route (10.10.100.0/24) received intact, complete with a local-preference of 250:

CE-2#sh ip bgp
BGP table version is 31, local router ID is 10.10.10.0
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network          Next Hop            Metric LocPrf Weight Path
*>i 10.10.11.0/31    10.10.10.1                    100      0 i
*>i 10.10.100.0/24   10.10.10.1               0    250      0 i
*>  10.10.200.0/24   0.0.0.0                  0         32768 i
CE-2#

When we take a look at the RIB-IN on MX-1, we clearly see the route coming in with a local-pref of 250:

root@PE1> show route 10.10.100.0/24

as100.inet.0: 5 destinations, 5 routes (5 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

10.10.100.0/24     *[BGP/170] 00:21:48, MED 0, localpref 250
AS path: I, validation-state: unverified
> to 10.10.11.0 via ge-0/0/4.0

root@PE1>

However, the big difference is in the L3VPN advertisement from MX-1 to MX-2: the local-preference of 250 is tunnelled inside the new ATTR_SET attribute, and the route itself carries whatever local-preference the provider’s core is using (in this case the default of 100):

Output for the 10.10.100.0/24 route, received on MX-2 from MX-1:

100:100:10.10.100.0/24 (1 entry, 0 announced)
*BGP Preference: 170/-101
Route Distinguisher: 100:100
Next hop type: Indirect
Address: 0x940db4c
Next-hop reference count: 8
Source: 4.4.4.4
Next hop type: Router, Next hop index: 531
Next hop: 192.169.1.3 via ge-0/0/2.0, selected
Label operation: Push 299840
Label TTL action: prop-ttl
Load balance label: Label 299840: None;
Session Id: 0x3
Protocol next hop: 4.4.4.4
Label operation: Push 299840
Label TTL action: prop-ttl
Load balance label: Label 299840: None;
Indirect next hop: 0x972c220 1048576 INH Session ID: 0x10
State:
Local AS: 100 Peer AS: 100
Age: 20:11 Metric2: 1
Validation State: unverified
Task: BGP_100.4.4.4.4+55077
AS path: I
Communities: target:100:100
Import Accepted
VPN Label: 299840
Localpref: 100
Router ID: 4.4.4.4
Secondary Tables: as100.inet.0
Indirect next hops: 1
Protocol next hop: 4.4.4.4 Metric: 1
Label operation: Push 299840
Label TTL action: prop-ttl
Load balance label: Label 299840: None;
Indirect next hop: 0x972c220 1048576 INH Session ID: 0x10
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 192.169.1.3 via ge-0/0/2.0
Session Id: 0x3
4.4.4.4/32 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 1
Nexthop: 192.169.1.3 via ge-0/0/2.0

What’s interesting is what happens if we disable the “independent-domain” feature on MX-1 and re-check the output from MX-2:

100:100:10.10.100.0/24 (1 entry, 0 announced)
*BGP Preference: 170/-251
Route Distinguisher: 100:100
Next hop type: Indirect
Address: 0x940fcc4
Next-hop reference count: 4
Source: 4.4.4.4
Next hop type: Router, Next hop index: 531
Next hop: 192.169.1.3 via ge-0/0/2.0, selected
Label operation: Push 299856
Label TTL action: prop-ttl
Load balance label: Label 299856: None;
Session Id: 0x3
Protocol next hop: 4.4.4.4
Label operation: Push 299856
Label TTL action: prop-ttl
Load balance label: Label 299856: None;
Indirect next hop: 0x972c220 1048576 INH Session ID: 0x11
State:
Local AS: 100 Peer AS: 100
Age: 25 Metric: 0 Metric2: 1
Validation State: unverified
Task: BGP_100.4.4.4.4+54013
AS path: I
Communities: target:100:100
Import Accepted
VPN Label: 299856
Localpref: 250
Router ID: 4.4.4.4
Secondary Tables: as100.inet.0
Indirect next hops: 1
Protocol next hop: 4.4.4.4 Metric: 1
Label operation: Push 299856
Label TTL action: prop-ttl
Load balance label: Label 299856: None;
Indirect next hop: 0x972c220 1048576 INH Session ID: 0x11
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 192.169.1.3 via ge-0/0/2.0
Session Id: 0x3
4.4.4.4/32 Originating RIB: inet.3
Metric: 1 Node path count: 1
Forwarding nexthops: 1
Nexthop: 192.169.1.3 via ge-0/0/2.0

Essentially this breaks the feature, and the customer’s iBGP attributes are no longer tunnelled. If the provider has any sort of common policy for modifying the best-path selection process using local-preference for L3VPNs, it would obviously conflict with the customer’s setting.
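
The difference between the two outputs can be modelled in a few lines of Python. This is a toy model under my own assumptions (routes as dicts, hypothetical function names, only local-preference handled, and only the case where the customer AS matches on both sides), not how JUNOS implements it.

```python
PROVIDER_LOCAL_PREF = 100  # default local-preference in the provider core


def ingress_pe_export(ce_route, customer_as, independent_domain=True):
    """VPNv4 route the ingress PE sends across the core for a route learned from its CE."""
    if not independent_domain:
        # Customer attributes are copied straight onto the provider's route.
        return {"prefix": ce_route["prefix"], "local_pref": ce_route["local_pref"]}
    return {
        "prefix": ce_route["prefix"],
        "local_pref": PROVIDER_LOCAL_PREF,
        # Customer attributes are tucked away where the core ignores them.
        "attr_set": {"origin_as": customer_as, "local_pref": ce_route["local_pref"]},
    }


def egress_pe_to_ce(vpn_route, customer_as):
    """Route the egress PE advertises to its CE over iBGP."""
    attr_set = vpn_route.get("attr_set")
    if attr_set is not None and attr_set["origin_as"] == customer_as:
        # Restore the customer's own attributes from the ATTR_SET.
        return {"prefix": vpn_route["prefix"], "local_pref": attr_set["local_pref"]}
    return {"prefix": vpn_route["prefix"], "local_pref": vpn_route["local_pref"]}


from_ce1 = {"prefix": "10.10.100.0/24", "local_pref": 250}

tunnelled = ingress_pe_export(from_ce1, customer_as=100)
leaked = ingress_pe_export(from_ce1, customer_as=100, independent_domain=False)

# First number: what the core sees. Second number: what CE-2 receives.
print(tunnelled["local_pref"], egress_pe_to_ce(tunnelled, 100)["local_pref"])  # 100 250
print(leaked["local_pref"], egress_pe_to_ce(leaked, 100)["local_pref"])        # 250 250
```

Either way CE-2 ends up with 250; the only thing that changes is whether the provider core has to look at the customer’s value along the way.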

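The RFC states that customer-specific iBGP attributes are encoded into the “ATTR_SET” attribute by the PE router that receives the route from the CE, and unpacked again by the PE that advertises the route to the remote CE. The attribute is marked with ordinary BGP attribute flags (optional, transitive) and rides on the VPNv4 route the same as any other path attribute in a vanilla L3VPN.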

I haven’t used this feature in any designs yet. Whilst the RFC was finished in 2011, Cisco only released support for it last year, although it has been present in Juniper for a while. It could be very effective for simplifying some requirements where customers have significant BGP setups behind PE routers.

Happy Easter!
