BGP Optimal-route-reflection (BGP-ORR)

It’s been a while since my last update, as I’ve been quite busy, but I thought I’d do a post on something BGP related, as everyone loves BGP!

There’s an interesting addition to BGP route-reflection that has found its way into a few trains of code on Juniper and Cisco (I assume it’s on others too). It attempts to solve one of the annoying issues that occurs when centralised route-reflectors are used.

It all boils down to the basics of path selection in relatively simple networks, where identical routes are received at different edge routers within the network, similar to anycast routing.

Consider the below lab topology;

Screen Shot 2017-06-01 at 22.25.36

The core of the network is super simple: basic ISIS and basic LDP/MPLS, with ASR-9kv as an out-of-path route-reflector. iBGP adjacencies are configured along the green arrows, and the red arrows signify the eBGP sessions between AS 100 and AS 200, and between AS 100 and AS 300, where IOSv-7 and IOSv-8 advertise an identical 6.6.6.6/32 route. IOSv-3 and IOSv-4 are just P routers running ISIS/LDP only, for the sake of adding a few hops.

With everything configured as defaults, let’s look at the path selection;

 iosv-1#show ip bgp
BGP table version is 6, local router ID is 192.168.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i – IGP, e – EGP, ? – incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network Next Hop Metric LocPrf Weight Path
* i 6.6.6.6/32 192.168.0.4 0 100 0 300 i
*>                    10.0.0.2 0 0 200 i

 

iosv-2#show ip bgp
BGP table version is 29, local router ID is 192.168.0.2
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i – IGP, e – EGP, ? – incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network Next Hop Metric LocPrf Weight Path
*>i 6.6.6.6/32 192.168.0.4 0 100 0 300 i

iosv-5#sh ip bgp
BGP table version is 27, local router ID is 192.168.0.3
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i – IGP, e – EGP, ? – incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network Next Hop Metric LocPrf Weight Path
*>i 6.6.6.6/32 192.168.0.4 0 100 0 300 i

iosv-6#sh ip bgp
BGP table version is 6, local router ID is 192.168.0.4
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i – IGP, e – EGP, ? – incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network Next Hop Metric LocPrf Weight Path
*> 6.6.6.6/32 10.1.0.2 0 0 300 i

 

 

If we take a brief look at the situation, specifically at IOSv-2 and IOSv-5, it’s pretty easy to see what’s happening: the network has basically converged to prefer the path via AS-300 to get to 6.6.6.6/32.

For many networks, this sort of thing isn’t a problem – there’s a functional, working path to 6.6.6.6/32, and if the edge router connected to AS-300 fails, the path through AS-200 via IOSv-1 will be used to get to the same prefix. Everybody is happy because we can ping stuff.

Screen Shot 2017-06-01 at 22.41.33

The problem though, is that even a layman with no knowledge of networks or routing would look at this situation and think ‘that seems a bit rubbish’, especially considering that each of those routers (in a large scale environment) might cost as much as $1million – it seems a bit lame that they can’t make better use of paths.

Surely there has to be a simple way to make better use of paths? First – let’s look at why the network has converged in such a way, starting with the route-reflector (ASR-9kv)

RP/0/RP0/CPU0:iosxrv9000-1#sh bgp
Thu Jun 1 21:46:12.042 UTC
BGP router identifier 192.168.0.5, local AS number 100
BGP generic scan interval 60 secs
Non-stop routing is enabled
BGP table state: Active
Table ID: 0xe0000000 RD version: 41
BGP main routing table version 41
BGP NSR Initial initsync version 2 (Reached)
BGP NSR/ISSU Sync-Group versions 0/0
BGP scan interval 60 secs

Status codes: s suppressed, d damped, h history, * valid, > best
i – internal, r RIB-failure, S stale, N Nexthop-discard
Origin codes: i – IGP, e – EGP, ? – incomplete
Network Next Hop Metric LocPrf Weight Path
* i6.6.6.6/32 192.168.0.1 0 100 0 200 i
*>i                 192.168.0.4 0 100 0 300 i

Processed 1 prefixes, 2 paths
RP/0/RP0/CPU0:iosxrv9000-1#show bgp 6.6.6.6/32
Thu Jun 1 21:47:19.015 UTC
BGP routing table entry for 6.6.6.6/32
Versions:
Process bRIB/RIB SendTblVer
Speaker 41 41
Last Modified: Jun 1 20:28:41.601 for 01:18:38
Paths: (2 available, best #2)
Advertised to update-groups (with more than one peer):
0.2
Path #1: Received by speaker 0
Not advertised to any peer
200, (Received from a RR-client)
192.168.0.1 (metric 22) from 192.168.0.1 (192.168.0.1)
Origin IGP, metric 0, localpref 100, valid, internal, group-best
Received Path ID 0, Local Path ID 0, version 0
Path #2: Received by speaker 0
Advertised to update-groups (with more than one peer):
0.2
300, (Received from a RR-client)
192.168.0.4 (metric 21) from 192.168.0.4 (192.168.0.4)
Origin IGP, metric 0, localpref 100, valid, internal, best, group-best
Received Path ID 0, Local Path ID 1, version 23
RP/0/RP0/CPU0:iosxrv9000-1#

 

So it’s pretty easy to see why the path through AS-300 has been selected. With two competing routes, the BGP path selection process works through the attributes of each route until it finds a difference and can select a winner;

1: Weight (no weight configured anywhere on the network other than defaults)

2: Local-preference (both routes have the default of 100)

3: Prefer locally originated routes (both identical, neither is locally originated – the RR receives both from its clients)

4: Prefer shortest AS-Path (both path lengths are identical)

5: Prefer lowest origin code (both routes have the same origin of IGP)

6: Prefer the lowest MED (both MED values are unconfigured and 0)

7: Prefer eBGP paths over iBGP (the RR receives both paths over iBGP from its clients)

8: Prefer the path with the lowest IGP metric (Bingo! the path via IOSv-6 on AS-300 has an IGP next-hop metric of 21, vs the path via IOSv-1 with its IGP next-hop metric of 22)
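To make the tie-break concrete, here is a minimal Python sketch of the eight steps above, fed with the two paths from the route-reflector’s output. The `Path` class and the helper names are hypothetical and only for illustration, not any vendor’s implementation, and the comparison is simplified (for example, real implementations by default only compare MED between paths from the same neighbouring AS).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    weight: int
    local_pref: int
    locally_originated: bool
    as_path: tuple
    origin: int       # 0 = IGP, 1 = EGP, 2 = incomplete
    med: int
    is_ebgp: bool
    igp_metric: int   # IGP cost to the next hop, as seen by the deciding router


STEPS = ["weight", "local-pref", "locally originated", "AS-path length",
         "origin", "MED", "eBGP over iBGP", "IGP metric"]


def preference_key(p):
    # The smallest tuple wins, so "higher is better" attributes are negated.
    return (-p.weight, -p.local_pref, not p.locally_originated,
            len(p.as_path), p.origin, p.med, not p.is_ebgp, p.igp_metric)


def best_path(paths):
    return min(paths, key=preference_key)


def deciding_step(a, b):
    # Name of the first step at which the two paths differ, or None on a tie.
    for name, x, y in zip(STEPS, preference_key(a), preference_key(b)):
        if x != y:
            return name
    return None


# Values taken from the 'show bgp 6.6.6.6/32' output on the route-reflector.
via_as200 = Path("192.168.0.1", 0, 100, False, (200,), 0, 0, False, 22)
via_as300 = Path("192.168.0.4", 0, 100, False, (300,), 0, 0, False, 21)

winner = best_path([via_as200, via_as300])
print(winner.next_hop, "wins on:", deciding_step(via_as200, via_as300))
```

Everything down to step 7 is a draw, so the single point of IGP cost between 21 and 22 is all that separates the two exits.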

The problem here is that once the route-reflector has made this decision, the alternate paths can’t be used at all. As everyone knows, BGP normally only advertises best paths, so any other routes received by the route-reflector go no further and aren’t advertised to the network.

In the case of this lab, the only reason this has happened is that one edge router is slightly closer to the route-reflector than the other, so the route-reflector has gone ahead and made the decision for everyone. This is despite the obvious fact that, from a packet forwarding and latency perspective, IOSv-2 has a suboptimal path: it would be much better if IOSv-2 went via IOSv-1 rather than all the way through IOSv-6 to get to 6.6.6.6/32.

The diagram with the ISIS metrics superimposed shows the simplicity of the problem;

Screen Shot 2017-06-01 at 23.16.02

If we had 1000 edge routers on this network, every single one of them would select the path through IOSv-6 towards AS-300, and IOSv-1 wouldn’t receive a single packet of egress traffic apart from anything it sends locally (because eBGP routes are preferred over iBGP).

The problem with IGPs in service-provider networks is that they’re difficult to tweak at the best of times. Even if we made the metrics the same, the RR would still only advertise a single route, based on the next decision in the BGP path selection process (oldest path followed by RID) <yes we know add-paths exists, but that’s not without issues 🙂 >

If we start to manipulate the metrics, that normally has the undesirable result of moving lots of traffic from one link to another – which makes management and planning difficult.

My personal approach would normally be to try and stick to good design in order to prevent this sort of behaviour. An obvious and simple method, and one that’s normally employed in larger ISPs, is to have POP-based route-reflectors in a hierarchy, so that route-reflector clients are always served by the route-reflector that’s closest to them. That way the IGP next-hop costs will always be lower than when relying on a centralised route-reflector that’s buried in the middle of the network, somewhere behind 20 P routers.

For example, in the below change to the design, IOSv-1 and IOSv-6 each have their own local route-reflector (RR1 and RR2). Each RR is metrically closer to the edge router it serves, meaning that if the BGP tiebreaker happens to fall on the IGP next-hop cost, the closest exit will always be chosen.

Screen Shot 2017-06-01 at 23.33.15

The problem with the above design is that whilst it’s simpler from a protocols perspective, it ends up being much more expensive and eventually more complex in the long run. If I have 500 POPs, that’s a lot of route-reflectors and a more complex hierarchy, along with longer convergence times – but then again with 500 POPs, I’d also have many other issues to contend with.

In smaller networks with perhaps a pair of centralised route-reflectors, we can use BGP-ORR (optimal route reflection) to employ some of the information held inside the IGP link-state database to assist BGP in making a better routing decision.

This is possible because, as we all know, link-state IGPs such as ISIS or OSPF each hold a full, live state of all links and all paths in the network, so it makes sense to hook into this information rather than having BGP act in isolation and compute a suboptimal path.

More information on the draft is given below;

https://tools.ietf.org/html/draft-ietf-idr-bgp-optimal-route-reflection-13

So – I’ll go ahead with the existing topology and configure BGP-ORR on the route-reflector only, and we’ll look at how the routing has changed;

A reminder of the topology;

Screen Shot 2017-06-01 at 23.55.37

A quick look at the BGP configuration on ASR9kv;

RP/0/RP0/CPU0:iosxrv9000-1#show run router bgp
Thu Jun 1 22:57:50.574 UTC
router bgp 100
bgp router-id 192.168.0.5
address-family ipv4 unicast
optimal-route-reflection r2 192.168.0.2
optimal-route-reflection r5 192.168.0.3
optimal-route-reflection r7 192.168.0.1
optimal-route-reflection r8 192.168.0.4

!
! iBGP
! iBGP clients
neighbor 192.168.0.1
remote-as 100
description RR client iosv-1
update-source Loopback0
address-family ipv4 unicast
optimal-route-reflection r7
route-reflector-client
!
!
neighbor 192.168.0.2
remote-as 100
description RR client iosv-2
update-source Loopback0
address-family ipv4 unicast
optimal-route-reflection r2
route-reflector-client
!
!
neighbor 192.168.0.3
remote-as 100
description RR client iosv-5
update-source Loopback0
address-family ipv4 unicast
optimal-route-reflection r5
route-reflector-client
!
!
neighbor 192.168.0.4
remote-as 100
description RR client iosv-6
update-source Loopback0
address-family ipv4 unicast
optimal-route-reflection r8
route-reflector-client
!
!
!

RP/0/RP0/CPU0:iosxrv9000-1# sh run router isis
Thu Jun 1 22:57:56.223 UTC
router isis 100
is-type level-2-only
net 49.1921.6800.0005.00
distribute bgp-ls
address-family ipv4 unicast
metric-style wide
!
interface Loopback0
passive
circuit-type level-2-only
address-family ipv4 unicast
!
!
interface GigabitEthernet0/0/0/0
point-to-point
address-family ipv4 unicast
!
!
!
RP/0/RP0/CPU0:iosxrv9000-1#

 

Before we go over the configuration, let’s look at the results on IOSv-2 and IOSv-5 (recall from a few pages up that both routers had previously picked the route via IOSv-6 towards AS-300)

iosv-2#sh ip bgp
BGP table version is 32, local router ID is 192.168.0.2
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i – IGP, e – EGP, ? – incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network Next Hop Metric LocPrf Weight Path
*>i 6.6.6.6/32 192.168.0.1 0 100 0 200 i
iosv-2#

iosv-5#sh ip bgp
BGP table version is 29, local router ID is 192.168.0.3
Status codes: s suppressed, d damped, h history, * valid, > best, i – internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i – IGP, e – EGP, ? – incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network Next Hop Metric LocPrf Weight Path
*>i 6.6.6.6/32 192.168.0.4 0 100 0 300 i
iosv-5#

 

Notice how IOSv-2 and IOSv-5 have each selected their closest peering router (IOSv-1 and IOSv-6 respectively) to get to 6.6.6.6/32, instead of everything going via IOSv-6, as illustrated below;

Screen Shot 2017-06-02 at 10.07.11

For ye of little faith – a traceroute confirms the new optimised best path from IOSv-2 and IOSv-5, with both routers choosing their closest exit (1 hop away)

iosv-2#traceroute 6.6.6.6 source lo0
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
1 10.0.128.1 1 msec 0 msec 0 msec
2 10.0.0.2 1 msec * 0 msec

iosv-2#

iosv-5#trace 6.6.6.6 source lo0
Type escape sequence to abort.
Tracing the route to 6.6.6.6
VRF info: (vrf in name/id, vrf out name/id)
1 10.0.0.14 1 msec 0 msec 0 msec
2 10.1.0.2 1 msec * 0 msec

iosv-5#

 

So with the configuration applied, how does BGP-ORR actually work?

It all boils down to perspective. Rather than the route-reflector making a decision based purely on its own information, such as its own IGP cost to the next-hop, with BGP-ORR the route-reflector can ‘hook’ into the link-state database and check the IGP next-hop cost from the perspective of the RR client rather than the RR itself.

This is possible because link-state IGPs distribute a full database of link states to all devices running the IGP, which ultimately means that with BGP-ORR we can put the route-reflector anywhere in the network, because we can ‘hack’ the protocol to make a calculation from the perspective of wherever we choose, rather than the RR’s current location.

The below diagram illustrates it as simply as possible in the current topology for IOSv-2 only;

Screen Shot 2017-06-02 at 11.13.42

In the above diagram, ASR-9kv decides the best path using IOSv-2’s cost to IOSv-1, by looking at the ISIS database in the same way that IOSv-2 looks at it, or from the perspective of IOSv-2.
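The idea of running the calculation from someone else’s viewpoint can be sketched in a few lines of Python. This is only an illustration of the concept, assuming a plain Dijkstra SPF over a link-state database: the node names are borrowed from the lab, but the links and metrics in `LINKS` are made up and are not the real lab values, and `spf` and `closest_exit` are hypothetical helper names.

```python
import heapq

# Hypothetical link-state database: (node, node, metric). Not the lab's real metrics.
LINKS = [
    ("iosv-1", "iosv-2", 10),
    ("iosv-2", "iosv-3", 10),
    ("iosv-3", "iosv-4", 10),
    ("iosv-4", "iosv-5", 10),
    ("iosv-5", "iosv-6", 10),
    ("iosv-4", "rr", 5),
]
EXITS = ["iosv-1", "iosv-6"]  # the two routers advertising 6.6.6.6/32


def build_graph(links):
    graph = {}
    for a, b, metric in links:
        graph.setdefault(a, {})[b] = metric
        graph.setdefault(b, {})[a] = metric
    return graph


def spf(graph, root):
    """Dijkstra shortest-path-first: cost from root to every reachable node."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale heap entry, a cheaper path was already found
        for neighbour, metric in graph[node].items():
            new_cost = cost + metric
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                heapq.heappush(heap, (new_cost, neighbour))
    return dist


def closest_exit(graph, root, exits):
    dist = spf(graph, root)
    return min(exits, key=lambda exit_node: dist[exit_node])


GRAPH = build_graph(LINKS)
for root in ("rr", "iosv-2", "iosv-5"):
    print(f"SPF rooted at {root}: closest exit is {closest_exit(GRAPH, root, EXITS)}")
```

The database is the same in all three runs and only the root of the SPF changes, which is enough to change the answer.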

If we look at the ISIS routes on IOSv-2, followed by the BGP-ORR policy on the route-reflector, we can see that the route-reflector uses the very same costs.

iosv-2#show ip route isis
Codes: L – local, C – connected, S – static, R – RIP, M – mobile, B – BGP
D – EIGRP, EX – EIGRP external, O – OSPF, IA – OSPF inter area
N1 – OSPF NSSA external type 1, N2 – OSPF NSSA external type 2
E1 – OSPF external type 1, E2 – OSPF external type 2
i – IS-IS, su – IS-IS summary, L1 – IS-IS level-1, L2 – IS-IS level-2
ia – IS-IS inter area, * – candidate default, U – per-user static route
o – ODR, P – periodic downloaded static route, H – NHRP, l – LISP
a – application route
+ – replicated route, % – next hop override, p – overrides from PfR

Gateway of last resort is not set

10.0.0.0/8 is variably subnetted, 8 subnets, 3 masks
i L2 10.2.128.0/30 [115/12] via 10.2.0.1, 01:25:02, GigabitEthernet0/2
192.168.0.0/32 is subnetted, 7 subnets
i L2 192.168.0.1 [115/11] via 10.0.128.1, 01:35:19, GigabitEthernet0/1
i L2 192.168.0.3 [115/13] via 10.2.0.1, 01:34:49, GigabitEthernet0/2
[115/13] via 10.0.128.1, 01:34:49, GigabitEthernet0/1
i L2 192.168.0.4 [115/12] via 10.2.0.1, 01:34:49, GigabitEthernet0/2
i L2 192.168.0.5 [115/2] via 10.2.0.1, 01:25:02, GigabitEthernet0/2
i L2 192.168.0.9 [115/12] via 10.0.128.1, 01:35:09, GigabitEthernet0/1
i L2 192.168.0.10 [115/11] via 10.2.0.1, 01:35:09, GigabitEthernet0/2
iosv-2#

RP/0/RP0/CPU0:iosxrv9000-1#show orrspf database r2
Fri Jun 2 10:20:25.187 UTC

ORR policy: r2, IPv4, RIB tableid: 0xe0000002
Configured root: primary: 192.168.0.2, secondary: NULL, tertiary: NULL
Actual Root: 192.168.0.2, Root node: 1921.6800.0002.0000

Prefix                                  Cost
192.168.0.1                           11
192.168.0.2                           10
192.168.0.3                           13
192.168.0.4                           12
192.168.0.5                            2
192.168.0.9                           12
192.168.0.10                         11

Number of mapping entries: 8
RP/0/RP0/CPU0:iosxrv9000-1#

 

Essentially, the ISIS costs are copied and pasted from the IGP database into the BGP-ORR database, so that the route-reflector can use this information in its path selection process.

Let’s have a quick review of the route-reflector config;
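As a rough sketch of what that lookup amounts to, the Python below picks a next-hop using either the route-reflector’s own IGP costs or the per-policy ORR costs. The numbers are the ones shown in the outputs above (21 and 22 from the RR, 11 and 12 from the r2 policy); the dictionary layout and the `next_hop_for` function are hypothetical and say nothing about how IOS-XR stores this internally.

```python
# IGP cost to each BGP next-hop as seen by the route-reflector itself.
RR_OWN_COST = {"192.168.0.1": 22, "192.168.0.4": 21}

# IGP cost to each BGP next-hop per ORR policy (only r2 is shown in the post).
ORR_COST = {
    "r2": {"192.168.0.1": 11, "192.168.0.4": 12},
}

CANDIDATE_NEXT_HOPS = ["192.168.0.1", "192.168.0.4"]


def next_hop_for(policy=None):
    """Pick the lowest-cost next-hop, using the ORR policy's view if one applies."""
    costs = ORR_COST.get(policy, RR_OWN_COST)
    return min(CANDIDATE_NEXT_HOPS, key=lambda next_hop: costs[next_hop])


print("no ORR policy:", next_hop_for())
print("policy r2    :", next_hop_for("r2"))
```

Without a policy the RR’s own costs decide and everybody gets 192.168.0.4; with the r2 policy applied, IOSv-2 is sent 192.168.0.1 instead, which matches the before and after BGP tables on IOSv-2.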

router bgp 100
bgp router-id 192.168.0.5
address-family ipv4 unicast
optimal-route-reflection r2 192.168.0.2
optimal-route-reflection r5 192.168.0.3
optimal-route-reflection r7 192.168.0.1
optimal-route-reflection r8 192.168.0.4

!
! iBGP
! iBGP clients
neighbor 192.168.0.1
remote-as 100
description RR client iosv-1
update-source Loopback0
address-family ipv4 unicast
optimal-route-reflection r7
route-reflector-client
!
!
neighbor 192.168.0.2
remote-as 100
description RR client iosv-2
update-source Loopback0
address-family ipv4 unicast
optimal-route-reflection r2
route-reflector-client
!
!
neighbor 192.168.0.3
remote-as 100
description RR client iosv-5
update-source Loopback0
address-family ipv4 unicast
optimal-route-reflection r5
route-reflector-client
!
!
neighbor 192.168.0.4
remote-as 100
description RR client iosv-6
update-source Loopback0
address-family ipv4 unicast
optimal-route-reflection r8
route-reflector-client
!
!
!

RP/0/RP0/CPU0:iosxrv9000-1#sh run router isis
Fri Jun 2 10:35:00.027 UTC
router isis 100
is-type level-2-only
net 49.1921.6800.0005.00
distribute bgp-ls
address-family ipv4 unicast
metric-style wide
!
interface Loopback0
passive
circuit-type level-2-only
address-family ipv4 unicast
!
!
interface GigabitEthernet0/0/0/0
point-to-point
address-family ipv4 unicast
!
!
!

 

In the first portion of the configuration, we specify the root device that we want to use to compute the IGP cost, which in the case of IOSv-2 is the policy r2 with root 192.168.0.2 in the config. We can also specify secondary and tertiary devices.

Under the RR neighbour configuration, we apply ‘optimal-route-reflection’ along with the policy we configured previously; in the case of IOSv-2, that is policy r2 under neighbour 192.168.0.2.

Lastly, under ISIS we need to configure ‘distribute bgp-ls’, which essentially tells ISIS to distribute its information into BGP. For more information on BGP-LS (BGP Link-State), see my previous blog post on Segment-routing and ODL.
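Since the per-client stanzas are so repetitive, they lend themselves to being templated. The Python below is a small sketch that renders the same BGP statements used in this lab from a list of clients. The `CLIENTS` list and `render_orr_config` function are hypothetical, the output only mirrors the configuration shown above, and it hasn’t been validated against a device, so treat it as an illustration rather than a tool.

```python
# (ORR policy name, client loopback used as SPF root, hostname), as used in the lab.
CLIENTS = [
    ("r7", "192.168.0.1", "iosv-1"),
    ("r2", "192.168.0.2", "iosv-2"),
    ("r5", "192.168.0.3", "iosv-5"),
    ("r8", "192.168.0.4", "iosv-6"),
]


def render_orr_config(asn, clients):
    """Render the ORR policies and RR-client neighbours as config text."""
    lines = [f"router bgp {asn}", " address-family ipv4 unicast"]
    # One ORR policy per client, rooted at that client's loopback.
    for policy, root, _ in clients:
        lines.append(f"  optimal-route-reflection {policy} {root}")
    lines.append(" !")
    # One neighbour stanza per client, referencing its policy.
    for policy, root, name in clients:
        lines += [
            f" neighbor {root}",
            f"  remote-as {asn}",
            f"  description RR client {name}",
            "  update-source Loopback0",
            "  address-family ipv4 unicast",
            f"   optimal-route-reflection {policy}",
            "   route-reflector-client",
            "  !",
            " !",
        ]
    lines.append("!")
    return "\n".join(lines)


config = render_orr_config(100, CLIENTS)
print(config)
```

One policy per client is the simplest mapping; in a bigger network it may make more sense to share one policy between clients that sit in the same place in the IGP.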

As a conclusion, I think BGP-ORR is a useful addition to the protocol, and I’ve certainly worked on networks where it would make sense to implement. Unfortunately it only seems to exist in a few trains of code on certain devices. In this lab example I was using Cisco VIRL spun up under Vagrant on packet.net, where the XR9kv router is the only one that supports BGP-ORR, but I have seen it recently on JUNOS.

As for potential downsides of BGP-ORR, it could become quite complicated to design in larger networks, where you have lots of different routers that all need to be balanced correctly. In those networks, having centralised route-reflectors can be a big downside in itself, and distributed RR designs may work better.

I’d also be interested to see how well BGP-ORR converges in networks with larger LSA databases and BGP tables.

Bye for now! 🙂

6 thoughts on “BGP Optimal-route-reflection (BGP-ORR)”

  1. RE your comment- ‘I’d also be interested to see how well BGP-ORR converges in networks with larger LSA databases and BGP tables’

    I think this is the key concern. We are essentially taking the number of CPU cycles required to compute best path from 1 cycle per route to 1 cycle per route per client router. It would be interesting to think about how this will scale – I would be concerned that it wouldn’t, frankly. In your testing, did you see any CPU impact? Did you test with full tables?


    1. Thanks for the comments,

      I didn’t test with full tables, but I’m pretty sure it would add more CPU load. I’d also add that it’s probably comparable to running something like R-LFA, or having loads of policies applied to a BGP neighbour. If I was going to deploy this in a live environment, I would certainly scale test it with the global routing table and run some comparisons.


  2. I’ve been trying to test BGP-ORR but I’m running into issues. I’ve tried several XR code versions, but for some reason, as soon as ORR is enabled, my BGP routes start flapping on the RR client routers. When I check pcaps, there’s a ton of BGP update messages being sent. This was in a topology smaller than yours, with no topology changes occurring. Any ideas on what may be wrong?

