Anybody who’s been to any seminar, associated with any major networking systems manufacturer or bought any recent study material, will almost certainly have come across something new called “Segment Routing” it sounds pretty cool – but what is it and why has it been created?
To understand this we first need to rewind to what most of us are used to doing on a daily basis – designing/building/maintaining/troubleshooting networks, that are built mostly around LDP or RSVP-TE based protocols. But what’s wrong with these protocols? why has Segment-Routing been invented and what problems does it solve?
Before we delve into the depths of Segment-Routing, lets first remind ourselves of what basic LDP based MPLS is. LDP or “Label Distribution Protocol” was first invented around 1999, superseding the now defunct “TGP” or “Tag distribution protocol” in order to solve the problems of traditional IPv4 based routing. Where control-plane resources were finite in nature, MPLS enabled routers to forward packets based solely on labels, rather than destination IP address, allowing for a much more simple design. The fact that the “M” in MPLS stands for “Multiprotocol” allowed engineers to support a whole range of different services and encapsulations, that could be tunnelled between devices in a network running nothing other than traditional IPv4, the role of LDP was to generate and distribute MPLS label bindings to other devices in a network, alongside a common IGP such as ISIS or OSPF.
Back in the late 1990’s and early 2000’s, routers were much smaller and far less powerful – especially where relatively resource intense protocols like OSPF or ISIS were concerned, there was also the problem that protocols like OSPF – which is based on IP were very difficult to modify due to the size of the IP header, as a result rather than modify the IGPs to support MPLS natively – the decision was made to invent a totally separate protocol (LDP) to run alongside the IGPs simply to provide the MPLS label distribution and binding capability – many people today regard LDP as a “Sticking plaster” I myself prefer the phrase “Gaffer tape” 🙂
A quick refresher on how LDP works using a pile of MX routers, consider the following basic topology;
All routers have an identical configuration, the only difference is the ISIS ISO address and the IP addressing;
Assume LDP adjacencies are established between all devices, the following sequence of events occurs;
- MX-4 injects it’s local loopback 22.214.171.124/32 into ISIS, this is advertised throughout the network – LDP also creates an MPLS label-binding for label-value 3 (the implicit-null label) which is advertised towards MX-3;
- MX-3 receives the prefix with the label-binding of 3 (implicit-null) and creates an entry in it’s forwarding table with a “pop” action, for any traffic destined for 126.96.36.199 out of interface xe-0/0/0 (essentially sending the packet unlabelled) at the same time it generates a new outgoing label of “299780” for 188.8.131.52 which is advertised towards MX-2;
- When MX-2 receives 184.108.40.206 with a label binding of 299780, it adds the entry to it’s forwarding table out of interface xe-0/0/1, whilst at the same time forwarding the prefix towards MX-1 with a different label of, “299781” MX-2 is now aware of 2x MPLS labels for 220.127.116.11 – the label of 299780 it received from MX-3 and the new label of 299781 it generated and sent to MX-1, this essentially means any packets coming from MX-1 towards 18.104.22.168, tagged with label 299781 on xe-0/0/0 will be swapped to 299780 and forwarded out of xe-0/0/1 – hence the “hop by hop” forwarding paradigm;
With such a small network involving only 4x routers, it’s difficult to imagine running into problems with LDP because it’s so simple and easy, however the moment you go from 4x routers to 1000x routers or beyond it starts to become far less efficient;
- Because LSRs generate labels for remote FEC’s on a hop-by-hop basis you end up with a large amount of MPLS labels contained in the LFIB which have to be distributed alongside the IGP, resulting in a large amount of overhead. In the above example we have multiple labels for a single prefix with only 3 routers (with the fourth performing PHP)
- We have to run LDP alongside the IGP everywhere, simply for MPLS to work – it’s true that we’ve all been doing this for years so why complain about it now when it works just fine? A simple solution is always the best solution, larger networks would be much simpler if the IGP could be made to accommodate the MPLS label advertisement functionality.
- No traffic-engineering functionality; ultimately at the end of the day, in 99% of networks LDP simply “follows” the IGP best-path mechanism, if you change the IGP metrics you end up shifting large amounts of traffic around which is often undesirable – as such LDP tends to be a pain in the neck, if you have more complex traffic requirements, for example making sure that 40Gbps of streaming video avoids a certain link in the network – with LDP it can’t be done very easily without resorting to endless hacks and tactical tweaks.
So LDP is far from perfect when we get into more complicated scenarios, if we have a larger network where we want to do any sort of traffic-engineering – the only real alternative is RSVP-TE.
RSVP-TE – essentially is an extension of the original “RSVP” Resource Reservation Protocol that allows it to generate MPLS labels for prefixes, whilst at the same time using it’s Resource reservation capabilities to reserve specific LSPs through the network, that require a certain amount of bandwidth – or simply reserving a path that’s determined by the network designer, rather than the IGP and it’s lowest-path-cost mentality.
The rather obvious cost with RSVP-TE is that it’s a lot more complex, I’ve lost count of the amount of times I’ve suggested a relatively simple RSVP-TE solution to a traffic-engineering problem, for the people in the room to simply rule it out just because it’s just too complex in nature – I’ve worked with a small number of global carrier/mobile networks who almost exclusively use RSVP-TE along with it’s fancy features, such as “auto-bandwidth” but the vast majority of smaller networks tend to stay away from it.
A further problem with RSVP-TE is that in large networks with numerous “P” routers and “PE” routers, the LSP state between the ingress and egress LSR must be maintained – in a network with 1000’s of routers, all of that information needs to be signalled – including bandwidth reservations, path reservations so on and so fourth, as opposed to LDP where we simply bind an MPLS label. The end result can be that in some networks control-plane processing can be extremely intense on the route engines if the network encounters a significant failure – imagine a P router with 5k signalled LSPs traversing it, if it drops a link or card – those 5k LSPs need to be recalculated and re-signalled throughout the entire network.
To make matters worse, many networks run LDP and RSVP-TE at the same time, LDP for traditional basic MPLS connectivity, with RSVP-TE LSPs running over the top to provide the traffic-engineering capability, that might be needed in certain niche parts of the network – like keeping sensitive VOIP traffic separate from bulk internet traffic – the complexity ramps up pretty quickly in these environments and you end up with a lot of different protocols stacked up on top of each other – when all we really want to do is just forward packets between routers in a network………. 😀
Which brings me finally to Segment routing!
Segment routing is essentially proposed as a replacement for LDP or RSVP-TE, where the IGP (currently ISIS or OSPF) has been extended to incorporate the MPLS labelling and segment-routing functions internally, leading to the immediate obvious benefit, of not having to run an additional protocol alongside the IGP to provide the MPLS functionality – we can do everything inside ISIS or OSPF.
To make things even cooler, Segment-routing can operate over an IPv4 or IPv6 data-plane, supports ECMP and also has extensions built into it, which allow it cater for things like L3-VPNs or VPLS running over the top. The only thing it can’t do is reserve bandwidth in the same way that RSVP-TE can, but this can be accomplished via the use of an external controller (SDN)
Segment routing support was released on Juniper MX routers under 15.1F6
For now lets look at a basic topology, along with some of the basic concepts and configurations, consider the below expanded topology from the LDP examples above;
Everything is the same, except that I’ve gone an added an additional link between MX-2 and MX-4. The first step is to enable segment-routing, for this network I’m using ISIS as the IGP. Turning segment-routing on is pretty simple – I just need to have MPLS and ISIS enabled on the correct interfaces and switch on “source-packet-routing” under ISIS;
Notice how it’s called “source-packet-routing” essentially, Segment-routing uses a source routing paradigm, where the ingress PE determines the path through the network based on a set of instructions or “segments”
Take this on contrast with RSVP-TE, where the control-plane is source routed (the head-end LSR computes the path through the network to the tail-end) but the packets are only sent with a single RSVP MPLS label, and so the control-plane is source-routed, but the data-plane is not.
With “segment-routing” enabled on all the routers in the network, lets take a look and see what’s what;
We have a normal ISIS adjacency on MX-1;
Let’s check out the ISIS database and see if anything new is present;
So if we look at the ISIS database against MX-1’s neighbour (MX-2) we can see some additional things happening in ISIS;
- We can see that SPRING (Segment-routing) is turned on and is a known TLV
- We can see something called a “P2P IPv4 Adj-SID” with an associated MPLS label
The “IPv4 Adj-SID” is known as the IGP adjacency segment, and is essentially a segment attached to a directly connected IGP adjacency, it’s injected locally by the router at either side of the adjacency – this can easily be demonstrated if we simply have a link between MX-1 and MX-2;
We take another look at the ISIS database on MX1;
So we can see from the ISIS database, that each router on either side of the adjacency has locally generated a label for it’s own side of the link. Consider that this information is injected into the ISIS database, and the ISIS database is flooded throughout the entire network – this gives any ingress LSR the required knowledge to perform traffic-engineering by simply imposing whichever adjacency segment instructions it needs for a packet to take a specific path through the network, for the purposes of traffic-engineering.
Take the below example, if MX-1 sends packets containing the IGP Adj-SID of 10 for MX-2’s link to MX-3 (ADJ-SID = 10) traffic can be steered via MX-3 as soon as it lands on MX-2. Note that whilst MX-2 will allocate it’s ADJ-SID of 10 and distribute it via the IGP, only MX-2 will install that label in the forwarding-table – because it’s locally significant.
The Adjacency segment is of the two main building blocks of segment-routing, and is generally known as a local segment, simply because it’s designed to have a local significance – if a packet arrives on an interface with a specific local-segment instruction in the stack, the device will act on that instruction and forward the packet in a particular way for that segment, or part of the network.
The next type of segment is known as the “nodal segment” or “global segment” and is globally significant, it generally represents the loopback address of each router in the network and is configured as an index, lets go ahead and look at the configuration;
So a relatively straightforward configuration, I’ll go ahead and configure the rest of the network as above but with the following indexes;
- MX-1 = node-segment index-10
- MX-2 = node-segment index-20
- MX-3 = node-segment index-30
- MX-4 = node-segment index-40
So with the node-segment index configured on each router, lets check what’s changed inside the ISIS database on MX-1, for the LSAs received for MX-2 to keep things simple for now;
- Line 8 signifies that MX-2 is advertising a nodal segment block or SRGB “Segment-routing global block” this is essentially a range that all networking vendors have agreed, from which to allocate nodal-segment labels, here is starts at value 800000 and has a maximum range of 4096
- Lines5 51, 56 and 61 show the IGP Adjecency segments we’ve already talked about (for the links to MX-2’s neighbours
- Line 68 is the important one – here we can see a node SID with a value of 20, which is the value I configured under MX-2;
So if I go back onto MX-1 and look at the mpls.0 routing-table – I should see an egress label of 20 for 22.214.171.124?
Wrong! Label 2o doesn’t seem to be anywhere, instead I have 800020..
Remember from the previous example above on line 42 – we have the “SRGB” base starting at 800000. Because global-segments are unique, all routers use the same SRGB block starting at 800000, then each configured loopback index shifts the SRGB base value by the index value. If I configured an index of “666” on MX-4, then it’s global-segment ID would be 800666 and so on.
If we look at the entire ISIS Database on MX-1 for all routers – we can see all the node segments, and their configured values;
We can look at the inet.3 table to see the loopback prefixes of all the routers in the network, being resolved down to their nodal-segment labels;
We see the node-segments for MX-3 and MX-4, but not for MX-2 simply because of PHP – but nevertheless, we can see how it all fits together quite nicely.
It must be pointed out that in a network where packets are simply being forwarded using the global-segment label of the destination, for example; if we wanted to send packets from MX-1 to MX-4 without any traffic-engineering, the same label will be used end to end, (the SRGB base of 800000 + the index 40 = 800040) as opposed to LDP, where labels for a single destination or FEC, are generated on a hop-by-hop basis, and get swapped to different values at every hop. Routers will also perform the same IGP based ECMP hashing for equal-cost paths, essentially the packet forwarding behaves the same as LDP, but with much less state information in the network.
The whole aim of basic segment routing, is to use the global “nodal-segments” alongside local “adjacency-segments” to allow an ingress LSR to calculate an exact path through the network – with much less state than what was previously possible with protocols such as RSVP-TE
For example, if we wanted to perform basic traffic-engineering, and send packets from MX-1 to MX-4, but via the longer path through MX-3, the following things would occur;
MX-1 imposes 2x labels, label 299784 for the Adj-SID of MX-2’s path via MX-3, and label value 800040, (the node-index 40 configured at Mx-4, plus the SRGB base value of 800000) and forwards the packet to MX-2;
MX-2 receives the packet, due to the presence of the ADJ-SID=299784 label, it follows the instruction and forwards the packet out of that link, towards MX-3 – popping the ADJ-SID label in the process;
MX-3 receives the packet with label 800040 (the node-SID of MX-4) performs PHP in the standard way, and forwards the packet direct to MX-4, completing the process. It’s entirely acceptable to use explicit-null to preserve the MPLS label on egress towards MX-4 for the purposes of EXP QoS if you’re running pipe-mode.
Clever readers will notice that segment-routing basically all boils down to a head-end LSR programming it’s own path through the network, by imposing a number of MPLS labels which are treated as instructions – this leads the obvious question of hardware support, even high-end routers have a limitation to the number of MPLS labels that can be handled by an ASIC, the maximum label-depth tends to be 3-5 depending on which model of router or chipset you’re using, so it might be a while until more hardware vendors accommodate larger numbers of labels in the label stack.
Consider the fact that with segment-routing, it’s possible to perform VPN connectivity along with traffic-engineering purely inside ISIS or OSPF, by simply using a much deeper label stack – we could quite quickly end up with 3-5 labels in the stack and hit the limits of our already very expensive linecards.
In terms of providing VPN services and performing things like traffic-engineering, as far as I can tell it’s not possible to do this manually on Juniper router inside the CLI at this time – you need a centralised controller to do this, or a “PCE” – “path computational element” which is generally a server running the controller software, this connects into a “PCC” – “path computational client” which would be the head-end LSR node performing the signalling, as directed by the server (PCE). This generally takes place via a protocol known as PCEP (path computational element protocol)
Essentially the difference between a PCE that’s provisioning RSVP-TE tunnels, and a PCE that’s signalling segments – both tell the head-end LSR how to forward traffic, except with segment-routing, no LSP’s are provisioned – it simply imposes a set of instructions (labels) as opposed to constructing an actual LSP through a chain of devices – again saving on state in the network.
At this time there are a few different controllers on the Market, Juniper’s Northstar, Cisco’s Open SDN, and a freeware controller known as “open daylight” one of my colleagues has managed to get open daylight working with IOS-XR to good effect, I may try and get hold of a demo Northstar license so I can demo this technology in action with IXIA – but that’ll be for next time,
Thanks for reading 🙂