Segment Routing on JUNOS – The basics

Anybody who’s been to a seminar run by any major networking vendor, or bought any recent study material, will almost certainly have come across something new called “Segment Routing”. It sounds pretty cool, but what is it and why has it been created?

To understand this we first need to rewind to what most of us are used to doing on a daily basis: designing, building, maintaining and troubleshooting networks that are built mostly around LDP or RSVP-TE. But what’s wrong with these protocols? Why has Segment Routing been invented, and what problems does it solve?

Before we delve into the depths of Segment Routing, let’s first remind ourselves of what basic LDP-based MPLS is. LDP or “Label Distribution Protocol” was first invented around 1999, superseding the now defunct “TDP” or “Tag Distribution Protocol”, in order to solve the problems of traditional IPv4-based routing. At a time when control-plane resources were very limited, MPLS enabled routers to forward packets based solely on labels rather than the destination IP address, allowing for a much simpler design. The fact that the “M” in MPLS stands for “Multiprotocol” allowed engineers to support a whole range of different services and encapsulations, which could be tunnelled between devices in a network running nothing other than traditional IPv4. The role of LDP was to generate and distribute MPLS label bindings to other devices in the network, alongside a common IGP such as ISIS or OSPF.

Back in the late 1990s and early 2000s, routers were much smaller and far less powerful, especially where relatively resource-intensive protocols like OSPF or ISIS were concerned. There was also the problem that protocols like OSPF, which runs directly over IP, were very difficult to modify due to the size of the IP header. As a result, rather than modify the IGPs to support MPLS natively, the decision was made to invent a totally separate protocol (LDP) to run alongside the IGPs, simply to provide the MPLS label distribution and binding capability. Many people today regard LDP as a “sticking plaster”; I myself prefer the phrase “gaffer tape” 🙂

A quick refresher on how LDP works, using a pile of MX routers. Consider the following basic topology:

seg3

All routers have an identical configuration; the only differences are the ISIS ISO address and the IP addressing:

    tim@MX-1> show configuration protocols
    isis {
        level 1 disable;
        interface xe-2/0/0.0 {
            point-to-point;
        }
        interface lo0.0 {
            passive;
        }
    }
    ldp {
        interface xe-2/0/0.0;
    }

 

Assuming LDP adjacencies are established between all devices, the following sequence of events occurs:

  • MX-4 injects its local loopback 4.4.4.4/32 into ISIS, which is advertised throughout the network. LDP also creates an MPLS label binding for 4.4.4.4/32 with label value 3 (the implicit-null label), which is advertised towards MX-3;

seg4

  • MX-3 receives the prefix with the label binding of 3 (implicit-null) and creates an entry in its forwarding table with a “pop” action for any traffic destined for 4.4.4.4 out of interface xe-0/0/0 (essentially sending the packet unlabelled). At the same time it generates a new local label of “299780” for 4.4.4.4, which is advertised towards MX-2;

seg5

  • When MX-2 receives 4.4.4.4 with a label binding of 299780, it adds the entry to its forwarding table out of interface xe-0/0/1, whilst at the same time advertising the prefix towards MX-1 with a different label, “299781”. MX-2 is now aware of two MPLS labels for 4.4.4.4: the label of 299780 it received from MX-3, and the new label of 299781 it generated and sent to MX-1. This essentially means any packets coming from MX-1 towards 4.4.4.4, tagged with label 299781 on xe-0/0/0, will be swapped to 299780 and forwarded out of xe-0/0/1, hence the “hop by hop” forwarding paradigm.

seg6
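
To make the hop-by-hop swap a little more concrete, here is a minimal Python sketch of the label operations described above. The table layout and the `forward` function are illustrative names of my own (this is not how Junos stores its forwarding state); only the router names and label values come from the example.

```python
# Label forwarding entries for FEC 4.4.4.4/32, taken from the walkthrough above.
# Key: (router, incoming label) -> (action, outgoing label, next hop)
LFIB = {
    ("MX-2", 299781): ("swap", 299780, "MX-3"),
    ("MX-3", 299780): ("pop", None, "MX-4"),  # MX-4 advertised implicit-null, so MX-3 pops
}


def forward(ingress_label, first_hop):
    """Follow a labelled packet hop by hop and return (router, label on arrival) pairs."""
    router, label = first_hop, ingress_label
    trace = []
    while label is not None:
        trace.append((router, label))
        _action, out_label, next_hop = LFIB[(router, label)]
        router, label = next_hop, out_label
    trace.append((router, None))  # the packet reaches the egress router unlabelled
    return trace


# MX-1 pushes 299781 (the label it learned from MX-2) and sends the packet to MX-2.
ldp_trace = forward(299781, "MX-2")
for router, label in ldp_trace:
    print(router, label)
```

Note how the label value is different on every labelled hop, which is exactly the state that has to be learned and stored per router.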

With such a small network involving only four routers, it’s difficult to imagine running into problems with LDP because it’s so simple and easy. However, the moment you go from four routers to 1,000 routers or beyond, it starts to become far less efficient:

  • Because LSRs generate labels for remote FECs on a hop-by-hop basis, you end up with a large number of MPLS labels in the LFIB, all of which have to be distributed alongside the IGP, resulting in a large amount of overhead. In the above example we have multiple labels for a single prefix with only three routers (the fourth only advertises implicit-null so that PHP takes place).
  • We have to run LDP alongside the IGP everywhere, simply for MPLS to work. It’s true that we’ve all been doing this for years, so why complain about it now when it works just fine? Because a simple solution is always the best solution: larger networks would be much simpler if the IGP could be made to accommodate the MPLS label advertisement functionality.
  • No traffic-engineering functionality. In 99% of networks LDP simply “follows” the IGP best path, and if you change the IGP metrics you end up shifting large amounts of traffic around, which is often undesirable. As such, LDP tends to be a pain in the neck if you have more complex traffic requirements, for example making sure that 40Gbps of streaming video avoids a certain link in the network. With LDP this can’t be done very easily without resorting to endless hacks and tactical tweaks.

So LDP is far from perfect when we get into more complicated scenarios. If we have a larger network where we want to do any sort of traffic-engineering, the only real alternative is RSVP-TE.

RSVP-TE is essentially an extension of the original “RSVP” (Resource Reservation Protocol) that allows it to distribute MPLS labels, whilst at the same time using its resource reservation capabilities to signal specific LSPs through the network that require a certain amount of bandwidth, or simply to reserve a path that’s determined by the network designer rather than by the IGP and its lowest-path-cost mentality.

The rather obvious cost with RSVP-TE is that it’s a lot more complex. I’ve lost count of the number of times I’ve suggested a relatively simple RSVP-TE solution to a traffic-engineering problem, only for the people in the room to rule it out because it’s just too complex in nature. I’ve worked with a small number of global carrier and mobile networks who almost exclusively use RSVP-TE along with its fancy features, such as “auto-bandwidth”, but the vast majority of smaller networks tend to stay away from it.

A further problem with RSVP-TE is that in large networks with numerous “P” and “PE” routers, the LSP state between the ingress and egress LSR must be maintained. In a network with thousands of routers, all of that information needs to be signalled, including bandwidth reservations, path reservations and so forth, as opposed to LDP where we simply bind an MPLS label. The end result can be that in some networks, control-plane processing becomes extremely intense on the routing engines if the network encounters a significant failure. Imagine a P router with 5k signalled LSPs traversing it: if it drops a link or card, those 5k LSPs need to be recalculated and re-signalled throughout the entire network.
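
As a rough feel for how quickly that state grows, the sketch below counts the LSPs needed for a full mesh between PE routers. The full-mesh assumption and the function name are mine, purely for illustration; real networks vary widely in how many LSPs they actually signal.

```python
def full_mesh_lsps(pe_count: int) -> int:
    """Unidirectional LSPs needed so that every PE has an LSP to every other PE."""
    if pe_count < 0:
        raise ValueError("pe_count must not be negative")
    return pe_count * (pe_count - 1)


for pes in (10, 100, 1000):
    print(f"{pes} PEs -> {full_mesh_lsps(pes)} LSPs")
```

Every one of those LSPs has state on each router it crosses, which is why a single failure in the core can trigger so much re-signalling.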

To make matters worse, many networks run LDP and RSVP-TE at the same time: LDP for traditional basic MPLS connectivity, with RSVP-TE LSPs running over the top to provide the traffic-engineering capability that might be needed in certain niche parts of the network, like keeping sensitive VoIP traffic separate from bulk internet traffic. The complexity ramps up pretty quickly in these environments and you end up with a lot of different protocols stacked on top of each other, when all we really want to do is forward packets between routers in a network… 😀

 

Which brings me finally to Segment routing!

 

Segment routing is essentially proposed as a replacement for LDP or RSVP-TE, where the IGP (currently ISIS or OSPF) has been extended to incorporate the MPLS labelling and segment-routing functions internally. The immediate, obvious benefit is not having to run an additional protocol alongside the IGP to provide the MPLS functionality; we can do everything inside ISIS or OSPF.

To make things even cooler, segment routing can operate over an MPLS or IPv6 data-plane, supports ECMP, and has extensions built in which allow it to cater for things like L3VPNs or VPLS running over the top. The only thing it can’t do is reserve bandwidth in the same way that RSVP-TE can, but this can be accomplished via the use of an external controller (SDN).

Segment routing support was released on Juniper MX routers in Junos 15.1F6.

For now let’s look at a basic topology, along with some of the basic concepts and configurations. Consider the below topology, expanded from the LDP examples above:

seg7

Everything is the same, except that I’ve gone and added an additional link between MX-2 and MX-4. The first step is to enable segment routing; for this network I’m using ISIS as the IGP. Turning segment routing on is pretty simple: I just need to have MPLS and ISIS enabled on the correct interfaces and switch on “source-packet-routing” under ISIS:

    tim@MX-1# show protocols
    mpls {
        interface xe-2/0/0.0;
    }
    isis {
        source-packet-routing;
        level 1 disable;
        interface xe-2/0/0.0 {
            point-to-point;
        }
        interface lo0.0 {
            passive;
        }
    }

 

Notice how it’s called “source-packet-routing”. Essentially, segment routing uses a source-routing paradigm, where the ingress PE determines the path through the network based on a set of instructions or “segments”.

Contrast this with RSVP-TE, where the control-plane is source routed (the head-end LSR computes the path through the network to the tail-end) but packets are only sent with a single RSVP MPLS label. So with RSVP-TE the control-plane is source-routed, but the data-plane is not.

With “segment-routing” enabled on all the routers in the network, let’s take a look and see what’s what.

We have a normal ISIS adjacency on MX-1:

    tim@MX-1> show isis adjacency
    Interface             System         L State        Hold (secs) SNPA
    xe-2/0/0.0            MX-2           2  Up                   21
    {master}
    tim@MX-1>

 

Let’s check out the ISIS database and see if anything new is present:

    tim@MX-1> show isis database extensive MX-2.00
    IS-IS level 1 link-state database:
    IS-IS level 2 link-state database:
    MX-2.00-00 Sequence: 0x28, Checksum: 0x4cff, Lifetime: 616 secs
       IS neighbor: MX-1.00                       Metric:       10
         Two-way fragment: MX-1.00-00, Two-way first fragment: MX-1.00-00
       IS neighbor: MX-3.00                       Metric:       10
         Two-way fragment: MX-3.00-00, Two-way first fragment: MX-3.00-00
       IS neighbor: MX-4.00                       Metric:       10
         Two-way fragment: MX-4.00-00, Two-way first fragment: MX-4.00-00
       IP prefix: 2.2.2.2/32                      Metric:        0 Internal Up
       IP prefix: 10.10.10.0/31                   Metric:       10 Internal Up
       IP prefix: 10.10.10.2/31                   Metric:       10 Internal Up
       IP prefix: 10.10.10.4/31                   Metric:       10 Internal Up
      Header: LSP ID: MX-2.00-00, Length: 315 bytes
        Allocated length: 335 bytes, Router ID: 2.2.2.2
        Remaining lifetime: 616 secs, Level: 2, Interface: 327
        Estimated free bytes: 81, Actual free bytes: 20
        Aging timer expires in: 616 secs
        Protocols: IP, IPv6
      Packet: LSP ID: MX-2.00-00, Length: 315 bytes, Lifetime : 1198 secs
        Checksum: 0x4cff, Sequence: 0x28, Attributes: 0x3 <L1 L2>
        NLPID: 0x83, Fixed length: 27 bytes, Version: 1, Sysid length: 0 bytes
        Packet type: 20, Packet version: 1, Max area: 0
      TLVs:
        Area address: 49.0001 (3)
        LSP Buffer Size: 1492
        Speaks: IP
        Speaks: IPV6
        IP router id: 2.2.2.2
        IP address: 2.2.2.2
        Hostname: MX-2
        Router Capability:  Router ID 2.2.2.2, Flags: 0x01
          SPRING Algorithm - Algo: 0
        IS neighbor: MX-1.00, Internal, Metric: default 10
        IS neighbor: MX-3.00, Internal, Metric: default 10
        IS neighbor: MX-4.00, Internal, Metric: default 10
        IS extended neighbor: MX-1.00, Metric: default 10
          IP address: 10.10.10.1
          Neighbor’s IP address: 10.10.10.0
          Local interface index: 328, Remote interface index: 327
          P2P IPV4 Adj-SID - Flags:0x30(F:0,B:0,V:1,L:1,S:0), Weight:0, Label: 299784
        IS extended neighbor: MX-3.00, Metric: default 10
          IP address: 10.10.10.2
          Neighbor’s IP address: 10.10.10.3
          Local interface index: 329, Remote interface index: 333
          P2P IPV4 Adj-SID - Flags:0x30(F:0,B:0,V:1,L:1,S:0), Weight:0, Label: 299783
        IS extended neighbor: MX-4.00, Metric: default 10
          IP address: 10.10.10.4
          Neighbor’s IP address: 10.10.10.5
          Local interface index: 331, Remote interface index: 333
          P2P IPV4 Adj-SID - Flags:0x30(F:0,B:0,V:1,L:1,S:0), Weight:0, Label: 299785
        IP prefix: 2.2.2.2/32, Internal, Metric: default 0, Up
        IP prefix: 10.10.10.0/31, Internal, Metric: default 10, Up
        IP prefix: 10.10.10.2/31, Internal, Metric: default 10, Up
        IP prefix: 10.10.10.4/31, Internal, Metric: default 10, Up
        IP extended prefix: 2.2.2.2/32 metric 0 up
        IP extended prefix: 10.10.10.0/31 metric 10 up
        IP extended prefix: 10.10.10.2/31 metric 10 up
        IP extended prefix: 10.10.10.4/31 metric 10 up
      No queued transmissions
    {master}
    tim@MX-1>

 

So if we look at the ISIS database entry for MX-1’s neighbour (MX-2), we can see some additional things happening in ISIS:

  • We can see that SPRING (segment routing) is turned on and is advertised as a known TLV
  • We can see something called a “P2P IPv4 Adj-SID” with an associated MPLS label

The “IPv4 Adj-SID” is known as the IGP adjacency segment, and is essentially a segment attached to a directly connected IGP adjacency. It’s injected locally by the router at either side of the adjacency, which can easily be demonstrated if we strip the topology back to a single link between MX-1 and MX-2:

seg8

We take another look at the ISIS database on MX-1:

    tim@MX-1> show isis database extensive
    IS-IS level 1 link-state database:
    IS-IS level 2 link-state database:
    MX-1.00-00 Sequence: 0x2, Checksum: 0xf229, Lifetime: 827 secs
       IS neighbor: MX-2.00                       Metric:       10
         Two-way fragment: MX-2.00-00, Two-way first fragment: MX-2.00-00
       IP prefix: 1.1.1.1/32                      Metric:        0 Internal Up
       IP prefix: 10.10.10.0/31                   Metric:       10 Internal Up
      Header: LSP ID: MX-1.00-00, Length: 171 bytes
        Allocated length: 1492 bytes, Router ID: 1.1.1.1
        Remaining lifetime: 827 secs, Level: 2, Interface: 0
        Estimated free bytes: 1273, Actual free bytes: 1321
        Aging timer expires in: 827 secs
        Protocols: IP, IPv6
      Packet: LSP ID: MX-1.00-00, Length: 171 bytes, Lifetime : 1198 secs
        Checksum: 0xf229, Sequence: 0x2, Attributes: 0x3 <L1 L2>
        NLPID: 0x83, Fixed length: 27 bytes, Version: 1, Sysid length: 0 bytes
        Packet type: 20, Packet version: 1, Max area: 0
      TLVs:
        Area address: 49.0001 (3)
        LSP Buffer Size: 1492
        Speaks: IP
        Speaks: IPV6
        IP router id: 1.1.1.1
        IP address: 1.1.1.1
        Hostname: MX-1
        Router Capability:  Router ID 1.1.1.1, Flags: 0x01
          SPRING Algorithm - Algo: 0
        IP prefix: 1.1.1.1/32, Internal, Metric: default 0, Up
        IP prefix: 10.10.10.0/31, Internal, Metric: default 10, Up
        IP extended prefix: 1.1.1.1/32 metric 0 up
        IP extended prefix: 10.10.10.0/31 metric 10 up
        IS neighbor: MX-2.00, Internal, Metric: default 10
        IS extended neighbor: MX-2.00, Metric: default 10
          IP address: 10.10.10.0
          Neighbor’s IP address: 10.10.10.1
          Local interface index: 327, Remote interface index: 328
          P2P IPV4 Adj-SID - Flags:0x30(F:0,B:0,V:1,L:1,S:0), Weight:0, Label: 299856
      No queued transmissions
    MX-2.00-00 Sequence: 0x2, Checksum: 0x90bf, Lifetime: 825 secs
       IS neighbor: MX-1.00                       Metric:       10
         Two-way fragment: MX-1.00-00, Two-way first fragment: MX-1.00-00
       IP prefix: 2.2.2.2/32                      Metric:        0 Internal Up
       IP prefix: 10.10.10.0/31                   Metric:       10 Internal Up
      Header: LSP ID: MX-2.00-00, Length: 171 bytes
        Allocated length: 284 bytes, Router ID: 2.2.2.2
        Remaining lifetime: 825 secs, Level: 2, Interface: 327
        Estimated free bytes: 113, Actual free bytes: 113
        Aging timer expires in: 825 secs
        Protocols: IP, IPv6
      Packet: LSP ID: MX-2.00-00, Length: 171 bytes, Lifetime : 1198 secs
        Checksum: 0x90bf, Sequence: 0x2, Attributes: 0x3 <L1 L2>
        NLPID: 0x83, Fixed length: 27 bytes, Version: 1, Sysid length: 0 bytes
        Packet type: 20, Packet version: 1, Max area: 0
      TLVs:
        Area address: 49.0001 (3)
        LSP Buffer Size: 1492
        Speaks: IP
        Speaks: IPV6
        IP router id: 2.2.2.2
        IP address: 2.2.2.2
        Hostname: MX-2
        Router Capability:  Router ID 2.2.2.2, Flags: 0x01
          SPRING Algorithm - Algo: 0
        IP prefix: 2.2.2.2/32, Internal, Metric: default 0, Up
        IP prefix: 10.10.10.0/31, Internal, Metric: default 10, Up
        IP extended prefix: 2.2.2.2/32 metric 0 up
        IP extended prefix: 10.10.10.0/31 metric 10 up
        IS neighbor: MX-1.00, Internal, Metric: default 10
        IS extended neighbor: MX-1.00, Metric: default 10
          IP address: 10.10.10.1
          Neighbor’s IP address: 10.10.10.0
          Local interface index: 328, Remote interface index: 327
          P2P IPV4 Adj-SID - Flags:0x30(F:0,B:0,V:1,L:1,S:0), Weight:0, Label: 299784
      No queued transmissions
    {master}
    tim@MX-1>

 

So we can see from the ISIS database that each router on either side of the adjacency has locally generated a label for its own side of the link. Because this information is injected into the ISIS database, and the ISIS database is flooded throughout the entire network, any ingress LSR has the knowledge required to perform traffic-engineering, simply by imposing whichever adjacency-segment instructions a packet needs to take a specific path through the network.
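
The Flags value shown next to each Adj-SID can be unpicked with a few lines of Python. This is only a sketch: the bit positions below are my reading of the IS-IS segment routing extensions, and `decode_adj_sid_flags` is a made-up helper name, so treat the layout as an assumption rather than gospel.

```python
# Assumed bit layout of the Adj-SID flags octet, most significant bit first.
ADJ_SID_FLAG_BITS = (
    ("F", 0x80),  # address family of the adjacency (0 = IPv4)
    ("B", 0x40),  # backup: adjacency is eligible for protection
    ("V", 0x20),  # value: the SID carries a label value rather than an index
    ("L", 0x10),  # local: the value is only significant to the advertising router
    ("S", 0x08),  # set: the SID refers to a set of adjacencies
)


def decode_adj_sid_flags(flags: int) -> str:
    """Render the flags octet in the same style as the Junos output."""
    return ",".join(
        f"{name}:{1 if flags & mask else 0}" for name, mask in ADJ_SID_FLAG_BITS
    )


decoded = decode_adj_sid_flags(0x30)
print(decoded)  # F:0,B:0,V:1,L:1,S:0
```

The V and L bits being set lines up with what we see in the output: each Adj-SID is advertised as an actual label value that only means something to the router that allocated it.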

Take the below example: if MX-1 sends packets carrying the Adj-SID that MX-2 allocated for its link to MX-3 (Adj-SID = 10), traffic is steered towards MX-3 as soon as it lands on MX-2. Note that whilst MX-2 allocates its Adj-SID of 10 and distributes it via the IGP, only MX-2 installs that label in its forwarding table, because it’s locally significant.

seg9

The adjacency segment is one of the two main building blocks of segment routing, and is generally known as a local segment, simply because it’s designed to have local significance. If a packet arrives on an interface with a specific local-segment instruction in the stack, the device will act on that instruction and forward the packet in a particular way for that segment, or part of the network.

The next type of segment is known as the “nodal segment” or “global segment”, and is globally significant. It generally represents the loopback address of each router in the network and is configured as an index. Let’s go ahead and look at the configuration:

    tim@MX-1> show configuration protocols isis
    source-packet-routing {
        node-segment ipv4-index 10;
    }
    level 1 disable;
    interface xe-2/0/0.0 {
        point-to-point;
    }
    interface lo0.0 {
        passive;
    }

 

So a relatively straightforward configuration. I’ll go ahead and configure the rest of the network as above, but with the following indexes:

  • MX-1 = node-segment index-10
  • MX-2 = node-segment index-20
  • MX-3 = node-segment index-30
  • MX-4 = node-segment index-40

seg10

So with the node-segment index configured on each router, let’s check what’s changed inside the ISIS database on MX-1. To keep things simple for now, we’ll just look at the LSP received from MX-2:

  1. tim@MX-1> show isis database extensive MX-2
  2. IS-IS level 1 link-state database:
  3. IS-IS level 2 link-state database:
  4. MX-2.00-00 Sequence: 0x73, Checksum: 0xd32e, Lifetime: 479 secs
  5.   IPV4 Index: 20
  6.   Node Segment Blocks Advertised:
  7.     Start Index : 0, Size : 4096, Label-Range: [ 800000, 804095 ]
  8.    IS neighbor: MX-1.00                       Metric:       10
  9.      Two-way fragment: MX-1.00-00, Two-way first fragment: MX-1.00-00
  10.    IS neighbor: MX-3.00                       Metric:       10
  11.      Two-way fragment: MX-3.00-00, Two-way first fragment: MX-3.00-00
  12.    IS neighbor: MX-4.00                       Metric:       10
  13.      Two-way fragment: MX-4.00-00, Two-way first fragment: MX-4.00-00
  14.    IP prefix: 2.2.2.2/32                      Metric:        0 Internal Up
  15.    IP prefix: 10.10.10.0/31                   Metric:       10 Internal Up
  16.    IP prefix: 10.10.10.2/31                   Metric:       10 Internal Up
  17.    IP prefix: 10.10.10.4/31                   Metric:       10 Internal Up
  18.   Header: LSP ID: MX-2.00-00, Length: 335 bytes
  19.     Allocated length: 335 bytes, Router ID: 2.2.2.2
  20.     Remaining lifetime: 479 secs, Level: 2, Interface: 327
  21.     Estimated free bytes: 113, Actual free bytes: 0
  22.     Aging timer expires in: 479 secs
  23.     Protocols: IP, IPv6
  24.   Packet: LSP ID: MX-2.00-00, Length: 335 bytes, Lifetime : 1198 secs
  25.     Checksum: 0xd32e, Sequence: 0x73, Attributes: 0x3 <L1 L2>
  26.     NLPID: 0x83, Fixed length: 27 bytes, Version: 1, Sysid length: 0 bytes
  27.     Packet type: 20, Packet version: 1, Max area: 0
  28.   TLVs:
  29.     Area address: 49.0001 (3)
  30.     LSP Buffer Size: 1492
  31.     Speaks: IP
  32.     Speaks: IPV6
  33.     IP router id: 2.2.2.2
  34.     IP address: 2.2.2.2
  35.     Hostname: MX-2
  36.     Router Capability:  Router ID 2.2.2.2, Flags: 0x01
  37.       SPRING Capability - Flags: 0xc0(I:1,V:1), Range: 4096, SID-Label: 800000
  38.       SPRING Algorithm - Algo: 0
  39.     IS neighbor: MX-1.00, Internal, Metric: default 10
  40.     IS neighbor: MX-3.00, Internal, Metric: default 10
  41.     IS neighbor: MX-4.00, Internal, Metric: default 10
  42.     IS extended neighbor: MX-1.00, Metric: default 10
  43.       IP address: 10.10.10.1
  44.       Neighbor’s IP address: 10.10.10.0
  45.       Local interface index: 328, Remote interface index: 0
  46.       P2P IPV4 Adj-SID - Flags:0x30(F:0,B:0,V:1,L:1,S:0), Weight:0, Label: 299784
  47.     IS extended neighbor: MX-3.00, Metric: default 10
  48.       IP address: 10.10.10.2
  49.       Neighbor’s IP address: 10.10.10.3
  50.       Local interface index: 329, Remote interface index: 0
  51.       P2P IPV4 Adj-SID - Flags:0x30(F:0,B:0,V:1,L:1,S:0), Weight:0, Label: 299789
  52.     IS extended neighbor: MX-4.00, Metric: default 10
  53.       IP address: 10.10.10.4
  54.       Neighbor’s IP address: 10.10.10.5
  55.       Local interface index: 331, Remote interface index: 0
  56.       P2P IPV4 Adj-SID - Flags:0x30(F:0,B:0,V:1,L:1,S:0), Weight:0, Label: 299788
  57.     IP prefix: 2.2.2.2/32, Internal, Metric: default 0, Up
  58.     IP prefix: 10.10.10.0/31, Internal, Metric: default 10, Up
  59.     IP prefix: 10.10.10.2/31, Internal, Metric: default 10, Up
  60.     IP prefix: 10.10.10.4/31, Internal, Metric: default 10, Up
  61.     IP extended prefix: 2.2.2.2/32 metric 0 up
  62.       8 bytes of subtlvs
  63.       Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 20
  64.     IP extended prefix: 10.10.10.0/31 metric 10 up
  65.     IP extended prefix: 10.10.10.2/31 metric 10 up
  66.     IP extended prefix: 10.10.10.4/31 metric 10 up
  67.   No queued transmissions
  68. {master}
  69. tim@MX-1>

 

Some explanations:

  • Line 7 shows that MX-2 is advertising a node segment block, or SRGB (“segment routing global block”). This is essentially the label range from which node-segment labels are allocated; here it starts at value 800000 and has a size of 4096
  • Lines 46, 51 and 56 show the IGP adjacency segments we’ve already talked about (for the links to MX-2’s neighbours)
  • Line 63 is the important one: here we can see a node SID with a value of 20, which is the value I configured on MX-2:

    tim@MX-2> show configuration protocols isis
    source-packet-routing {
        node-segment ipv4-index 20;
    }
    level 1 disable;
    interface xe-0/0/0.0 {
        point-to-point;
    }
    interface xe-0/0/1.0 {
        point-to-point;
    }
    interface xe-0/0/2.0 {
        point-to-point;
    }
    interface lo0.0 {
        passive;
    }

So if I go back onto MX-1 and look at the mpls.0 routing table, I should see a label of 20 for 2.2.2.2?

    tim@MX-1> show route table mpls.0
    mpls.0: 12 destinations, 12 routes (12 active, 0 holddown, 0 hidden)
    + = Active Route, - = Last Active, * = Both
    0                  *[MPLS/0] 16:12:24, metric 1
                          to table inet.0
    0(S=0)             *[MPLS/0] 16:12:24, metric 1
                          to table mpls.0
    1                  *[MPLS/0] 16:12:24, metric 1
                          Receive
    2                  *[MPLS/0] 16:12:24, metric 1
                          to table inet6.0
    2(S=0)             *[MPLS/0] 16:12:24, metric 1
                          to table mpls.0
    13                 *[MPLS/0] 16:12:24, metric 1
                          Receive
    299856             *[L-ISIS/14] 15:24:07, metric 0
                        > to 10.10.10.1 via xe-2/0/0.0, Pop
    299856(S=0)        *[L-ISIS/14] 00:07:46, metric 0
                        > to 10.10.10.1 via xe-2/0/0.0, Pop
    800020             *[L-ISIS/14] 00:22:52, metric 10
                        > to 10.10.10.1 via xe-2/0/0.0, Pop
    800020(S=0)        *[L-ISIS/14] 00:07:46, metric 10
                        > to 10.10.10.1 via xe-2/0/0.0, Pop
    800030             *[L-ISIS/14] 00:22:47, metric 20
                        > to 10.10.10.1 via xe-2/0/0.0, Swap 800030
    800040             *[L-ISIS/14] 00:22:40, metric 20
                        > to 10.10.10.1 via xe-2/0/0.0, Swap 800040
    {master}
    tim@MX-1>

Wrong! Label 20 doesn’t seem to be anywhere; instead I have 800020.

Remember from line 37 of the earlier output that the “SRGB” base starts at 800000. Because global segments must be unique, all routers here use the same SRGB block starting at 800000, and each configured loopback index is added to the SRGB base value as an offset. If I configured an index of “666” on MX-4, its global-segment label would be 800666, and so on.
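
The arithmetic is simple enough to sketch in a few lines of Python. The function and variable names here are mine, and the base and size are just the values seen in this lab (800000 and 4096), not something to rely on in another network.

```python
SRGB_BASE = 800000  # start of the block advertised by the routers in this lab
SRGB_SIZE = 4096    # number of labels in the block


def node_segment_label(index: int, base: int = SRGB_BASE, size: int = SRGB_SIZE) -> int:
    """Turn a configured node-segment index into the MPLS label that gets installed."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} does not fit in an SRGB of size {size}")
    return base + index


# Indexes as configured on the four routers in this post.
NODE_INDEX = {"MX-1": 10, "MX-2": 20, "MX-3": 30, "MX-4": 40}
labels = {router: node_segment_label(index) for router, index in NODE_INDEX.items()}
print(labels)
```

Running it gives 800010, 800020, 800030 and 800040 for MX-1 to MX-4, which matches the labels in the mpls.0 table above.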

If we look at the entire ISIS database on MX-1 for all routers, we can see all the node segments and their configured values:

    tim@MX-1> show isis database extensive | match node
      Node Segment Blocks Advertised:
          Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 10
      Node Segment Blocks Advertised:
          Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 20
      Node Segment Blocks Advertised:
          Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 30
      Node Segment Blocks Advertised:
          Node SID, Flags: 0x40(R:0,N:1,P:0,E:0,V:0,L:0), Algo: SPF(0), Value: 40
    {master}
    tim@MX-1>

 

We can look at the inet.3 table to see the loopback prefixes of all the routers in the network being resolved to their nodal-segment labels:

    tim@MX-1> show route table inet.3
    inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
    + = Active Route, - = Last Active, * = Both
    2.2.2.2/32         *[L-ISIS/14] 00:31:02, metric 10
                        > to 10.10.10.1 via xe-2/0/0.0
    3.3.3.3/32         *[L-ISIS/14] 00:30:57, metric 20
                        > to 10.10.10.1 via xe-2/0/0.0, Push 800030
    4.4.4.4/32         *[L-ISIS/14] 00:30:50, metric 20
                        > to 10.10.10.1 via xe-2/0/0.0, Push 800040

 

We see the node-segment labels for MX-3 and MX-4, but not for MX-2, simply because of PHP (MX-2 is directly connected, so no label needs to be pushed). Nevertheless, we can see how it all fits together quite nicely.
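
The inet.3 entries above can be reproduced with a small sketch. It assumes every router shares the same SRGB base and that PHP is in use, and `ingress_push_label` is a hypothetical helper of my own, not anything Junos exposes.

```python
SRGB_BASE = 800000
NODE_INDEX = {"MX-2": 20, "MX-3": 30, "MX-4": 40}  # indexes configured in this lab


def ingress_push_label(destination: str, next_hop: str):
    """Label the ingress router pushes for a destination, or None if nothing is pushed."""
    if destination == next_hop:
        # The next hop is the egress router itself, so with PHP there is nothing to push.
        return None
    return SRGB_BASE + NODE_INDEX[destination]


# From MX-1 everything is reached via MX-2.
pushes = {dest: ingress_push_label(dest, next_hop="MX-2") for dest in NODE_INDEX}
print(pushes)
```

MX-2 comes out with no label, while MX-3 and MX-4 get 800030 and 800040, the same as the Push actions shown in inet.3.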

It must be pointed out that in a network where packets are simply being forwarded using the global-segment label of the destination (for example, if we wanted to send packets from MX-1 to MX-4 without any traffic-engineering), the same label is used end to end (the SRGB base of 800000 + the index 40 = 800040). Compare this with LDP, where labels for a single destination or FEC are generated on a hop-by-hop basis and get swapped to different values at every hop. Routers will also perform the same IGP-based ECMP hashing for equal-cost paths, so essentially the packet forwarding behaves the same as LDP, but with much less state information in the network.

 

The whole aim of basic segment routing is to use the global “nodal segments” alongside local “adjacency segments” to allow an ingress LSR to specify an exact path through the network, with much less state than was previously possible with protocols such as RSVP-TE.

For example, if we wanted to perform basic traffic-engineering and send packets from MX-1 to MX-4 via the longer path through MX-3, the following things would occur:

seg11

MX-1 imposes two labels: label 299784 for the Adj-SID of MX-2’s link to MX-3, and label 800040 (the node index 40 configured on MX-4, plus the SRGB base value of 800000). It then forwards the packet to MX-2;

seg12

MX-2 receives the packet and, due to the presence of the Adj-SID label 299784, follows the instruction and forwards the packet out of that link towards MX-3, popping the Adj-SID label in the process;

seg13

MX-3 receives the packet with label 800040 (the node-SID of MX-4), performs PHP in the standard way and forwards the packet directly to MX-4, completing the process. It’s entirely acceptable to use explicit-null to preserve the MPLS label on egress towards MX-4 for the purposes of EXP-based QoS if you’re running pipe mode.
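
The three steps above can be played back with a short Python sketch. The table and the `walk` function are illustrative only (they model nothing more than the pop operations in this example); the router names and label values are the ones used in the walkthrough.

```python
# Only the entries needed for this example: {router: {top label: (action, next hop)}}
LABEL_TABLE = {
    "MX-2": {299784: ("pop", "MX-3")},  # Adj-SID, only meaningful on MX-2
    "MX-3": {800040: ("pop", "MX-4")},  # node-SID of MX-4, popped here because of PHP
}


def walk(stack, router):
    """Return (router, label stack on arrival) for every hop until the stack is empty."""
    stack = list(stack)
    hops = []
    while stack:
        hops.append((router, tuple(stack)))
        action, next_hop = LABEL_TABLE[router][stack[0]]
        if action != "pop":
            raise NotImplementedError(f"only 'pop' is modelled, got {action!r}")
        stack.pop(0)  # act on the top label and remove it
        router = next_hop
    hops.append((router, ()))  # the egress router receives an unlabelled packet
    return hops


# MX-1 imposes the Adj-SID on top of the node-SID and hands the packet to MX-2.
te_hops = walk([299784, 800040], "MX-2")
for router, labels in te_hops:
    print(router, labels)
```

The point to take away is that the whole path lives in the label stack imposed by MX-1; MX-2 and MX-3 hold no per-path state at all.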

 

Clever readers will notice that segment routing basically boils down to a head-end LSR programming its own path through the network, by imposing a number of MPLS labels which are treated as instructions. This leads to the obvious question of hardware support: even high-end routers have a limit on the number of MPLS labels that can be handled by an ASIC. The maximum label depth tends to be 3-5, depending on which model of router or chipset you’re using, so it might be a while until more hardware vendors accommodate larger numbers of labels in the label stack.

Consider the fact that with segment routing, it’s possible to combine VPN connectivity with traffic-engineering, with the transport labels signalled purely inside ISIS or OSPF, simply by using a much deeper label stack. We could quite quickly end up with 3-5 labels in the stack and hit the limits of our already very expensive linecards.
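
A quick way to sanity-check a design against that limit is to count the labels before you commit to it. The sketch below is just that, a sketch: the helper names are mine, the single service label is an assumption, and the extra Adj-SID values in the longer stack are made up for illustration.

```python
def stack_depth(transport_labels, service_labels: int = 1) -> int:
    """Total labels imposed: the segment list plus any service (for example VPN) labels."""
    return len(transport_labels) + service_labels


def fits_hardware(transport_labels, max_depth: int, service_labels: int = 1) -> bool:
    """True if the head-end can impose the whole stack within its label-depth limit."""
    return stack_depth(transport_labels, service_labels) <= max_depth


te_stack = [299784, 800040]                      # the two-label example from this post
longer_stack = [299784, 299001, 299002, 800040]  # 299001 and 299002 are hypothetical Adj-SIDs

print(fits_hardware(te_stack, max_depth=3))      # 2 transport + 1 service = 3
print(fits_hardware(longer_stack, max_depth=3))  # 4 transport + 1 service = 5
print(fits_hardware(longer_stack, max_depth=5))
```

With a limit of 3 the simple example only just fits, and a path pinned over two more links already needs hardware that can impose 5 labels.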

In terms of providing VPN services and performing things like traffic-engineering, as far as I can tell it’s not possible to do this manually on Juniper router inside the CLI at this time – you need a centralised controller to do this, or a “PCE” – “path computational element” which is generally a server running the controller software, this connects into a “PCC” – “path computational client” which would be the head-end LSR node performing the signalling, as directed by the server (PCE). This generally takes place via a protocol known as PCEP (path computational element protocol)

Essentially, a PCE that’s provisioning RSVP-TE tunnels and a PCE that’s signalling segments both tell the head-end LSR how to forward traffic. The difference is that with segment-routing no LSPs are provisioned: the head-end simply imposes a set of instructions (labels), as opposed to constructing an actual LSP through a chain of devices, again saving on state in the network.
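The state saving can be sketched with a very rough count. This is a deliberately simplified model with hypothetical router names and paths (including MX-1 as the head-end): it assumes RSVP-TE keeps one entry per LSP on every router along the path, while segment-routing keeps one entry per path on the head-end only. Routers running SR still hold the SID labels learned from the IGP, but those don't grow with the number of engineered paths.

```python
from collections import Counter

# Simplified per-path state model; names and paths are hypothetical.


def rsvp_te_state(lsp_paths):
    """One state entry per LSP on every router the LSP traverses."""
    state = Counter()
    for path in lsp_paths:
        for router in path:
            state[router] += 1
    return state


def sr_state(lsp_paths):
    """One state entry per path, held only on the head-end (first router)."""
    state = Counter()
    for path in lsp_paths:
        state[path[0]] += 1
    return state


if __name__ == "__main__":
    paths = [
        ["MX-1", "MX-2", "MX-3", "MX-4"],
        ["MX-1", "MX-3", "MX-4"],
    ]
    print("RSVP-TE:", dict(rsvp_te_state(paths)))
    print("SR:     ", dict(sr_state(paths)))
```

In this model the transit and egress routers carry no per-path entries at all with segment-routing, which is the point being made above.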

At this time there are a few different controllers on the market: Juniper’s NorthStar, Cisco’s Open SDN, and an open-source controller known as OpenDaylight. One of my colleagues has managed to get OpenDaylight working with IOS-XR to good effect. I may try to get hold of a demo NorthStar license so I can demo this technology in action with IXIA, but that’ll be for next time.

Thanks for reading 🙂

 

 

12 thoughts on “Segment Routing on JUNOS – The basics”

  1. I’m unfamiliar with how you can replace MP-BGP with SR, can you elaborate? Surely the scale alone prevents the IGP carrying the detail required in an NLRI? For sure, use SR in your IGP to ultimately replace LDP and RSVP-TE, but I don’t see how it can replace VPN label signalling. I don’t know a huge amount about SR extensions to BGP, but I would have thought that would indicate less of a reliance on the IGP and provide a stronger case for BGP everywhere?

    1. You’re right actually! I got confused when reading the RFC and some stuff on ODN (on-demand next-hop) that Cisco are working on, which does actually use MP-BGP (I thought it used a controller). I’ll edit accordingly, cheers 🙂

      1. Cool, I can’t replace my ‘BGP all the things’ with ‘IGP all the things’ just yet 🙂

  2. Hi Tim,

    Great article, and I believe the first to describe SR on Junos.
    Some comments:
    The precursor of LDP was called TDP and was implemented on Cisco gear; for a long time you could choose between the two.

    I think it is not obvious from the reading that without configuring a node-SID, adj-SIDs can’t be used, because, as you mentioned, they are local to the originating node (and signalled with the L bit set).

    With regard to hardware support for label stack depth, it is around 4 for low-end merchant silicon and 10+ for high-end vendor and merchant silicon.
    The max depth could be signalled through IGP/BGP-LS extensions as defined in:
    draft-tantsura-ospf/isis/bgp-ls-segment-routing-msd

    PCEP has SR extensions as well, defined in draft-ietf-pce-segment-routing

    There’s a full implementation in ODL; ONOS is working on it.

    Hope this helps,
    Jeff

      1. One of my colleagues went to the very same conference and said the same things… It’s pretty frustrating at the moment, as most of our customers are looking at SR but Juniper are miles behind on it. I had SR working in the lab on a Cisco NCS6k in 2014/15 and it’s taken until now for it to appear on Junos.

      2. I was there.
        This was kind of pathetic, as were the following presentations on why RSVP is better than SR.
        Ever since I first configured a routing policy on an M40 around 2002 I have been in love with Junos, so seeing this behavior was rather upsetting… You’d expect an old company protecting its assets to act that way, not a company that used to be among the most innovative ones. Hopefully, with Rami, things have changed…

    1. Cheers Jeff, some useful information there. One of my colleagues actually has ODL working inside VIRL on IOS-XR, which is pretty nice; he’s also looking at Pathman, which we checked out today. It would be interesting to see where vendors are regarding label depth and hardware, as there seems to be a lot of smoke and mirrors when trying to get answers from some of the vendors, although I imagine most of it hasn’t been ironed out yet.
