This is a follow-up netlab article to the simple EXOS ESRP setup. The prerequisite is the article EXOS ESRP adding OSPF. In this step, the OSPF router-id (routerid) is configured, and loopback interfaces are created and added to OSPF. Routers RS3 and RS4 currently share the same OSPF router-id, 5.5.5.5, so the configuration of both routers will be changed.

The goal is to make the floating IP prefix 10.1.2.0/24 stable and to minimise the number of OSPF events generated in area 0. With this in place, the ESRP-based network solution should become deterministic and operational.

Network topology

The OSPF configuration is changed on RS3 and RS4 only, on top of the already running configuration from the previous ESRP article, EXOS ESRP adding OSPF.

OSPF router-id overview:

sysName ospf router-id
RS1 1.1.1.1
RS2 2.2.2.2
RS3 3.3.3.3
RS4 4.4.4.4
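
Since the whole problem comes down to two routers sharing one router-id, a small check over such an overview can catch it early. The following is a minimal Python sketch, not part of the lab itself; the function name `find_duplicate_router_ids` and the two dictionaries are illustrative assumptions, filled with the values from the previous article (5.5.5.5 on RS3 and RS4) and from the table above.

```python
from collections import defaultdict

def find_duplicate_router_ids(router_ids):
    """Return {router_id: [sysNames]} for every router-id used more than once."""
    by_id = defaultdict(list)
    for sys_name, rid in router_ids.items():
        by_id[rid].append(sys_name)
    return {rid: sorted(names) for rid, names in by_id.items() if len(names) > 1}

# State from the previous article: RS3 and RS4 share 5.5.5.5.
before = {"RS1": "1.1.1.1", "RS2": "2.2.2.2", "RS3": "5.5.5.5", "RS4": "5.5.5.5"}
# State after the changes in this article.
after = {"RS1": "1.1.1.1", "RS2": "2.2.2.2", "RS3": "3.3.3.3", "RS4": "4.4.4.4"}

print(find_duplicate_router_ids(before))  # {'5.5.5.5': ['RS3', 'RS4']}
print(find_duplicate_router_ids(after))   # {}
```

An empty result means every router in the overview has a unique router-id.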

Network topology showing relevant IP and OSPF settings:

     1.1.1.1                                     2.2.2.2
    +-------+                                   +-------+
    |       | 12                             12 |       |
    |  RS1  |-----------------------------------|  RS2  |
    |       |                                   |       |
    +-------+                                   +-------+
     10 |                                           | 10
        |                  (OSPF)                   |
        |                                           |
        |                                           |
     3.3.3.3                                     4.4.4.4 
    +-------+                                   +-------+
    |       | 3                               3 |       |
    |  RS3  +-----------------------------------+  RS4  |
    |       |                                   |       |
    +---+---+                                   +---+---+
     1  |                                           | 1
        |                 ESRP VR1                  |
        |                 10.1.2.3                  |
        |                                           |
        |                 +-------+                 |
        |                 |       |                 |
        +-----------------+  RS5  +-----------------+
                          |       |
                          +-+---+-+
                            |   |
              +-------------+   +-------------+
              | eth0                     eth0 |
          +---+---+                       +---+---+
          |       |                       |       |
          | node1 |                       | node2 |
          |       |                       |       |
          +-------+                       +-------+

EXOS version

EXOS version used in this netlab:

RS1.1 # show session images

Card    Partition  Installation Date             Version    Name        Branch
------------------------------------------------------------------------------
Switch  primary    Thu Jan 16 13:06:22 UTC 2025  32.7.2.19  rescue.xos  32.7.2.19
Switch  secondary  Thu Jan 16 13:06:22 UTC 2025  32.7.2.19  rescue.xos  32.7.2.19

Configuration

Before fixing anything, get an overview of what works in the current state. This is the output of ip route on one of the connected OSPF routers:

ip route
1.1.1.1 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
2.2.2.2 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
10.0.0.0/31 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
10.0.0.2/31 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
10.0.0.4/31 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
10.0.0.6/31 dev eth0 proto kernel scope link src 10.0.0.7 
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
  • 2 loopback IP addresses
    • 1.1.1.1 and 2.2.2.2
  • many /31 IP transfer networks
  • one IP prefix 10.1.2.0/24

The routing table above contains no loopback IP address for RS3 or RS4. In addition, the following events currently recur all the time, with the IP prefix flapping in the OSPF area:

ip monitor
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20 

This is what needs to be fixed.
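
To put a number on the flapping, the ip monitor output can be counted per prefix. This is a minimal Python sketch under the assumption that the output was saved as plain text in the format shown above; the function name `count_route_flaps` is hypothetical.

```python
from collections import Counter

def count_route_flaps(monitor_output):
    """Count 'Deleted' events per prefix in saved `ip monitor` text output."""
    flaps = Counter()
    for line in monitor_output.splitlines():
        fields = line.split()
        # A withdrawal looks like: "Deleted <prefix> nhid ... proto ospf ..."
        if len(fields) >= 2 and fields[0] == "Deleted":
            flaps[fields[1]] += 1
    return flaps

sample = """\
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
"""

for prefix, count in count_route_flaps(sample).items():
    print(f"{prefix}: withdrawn {count} times")
```

Running the same count again after the fix should report no withdrawals for 10.1.2.0/24 while the network is in a stable state.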

RS3

Adjust the OSPF routerid settings and add a loopback interface to RS3 and RS4. Configuration commands:

#
# RS3
#
create vlan lo0
configure vlan lo0 ipaddress 3.3.3.3/32
enable ipforwarding vlan lo0
enable loopback-mode vlan lo0
#
disable ospf
configure ospf routerid 3.3.3.3 
configure ospf add vlan lo0 area 0.0.0.0 passive
enable ospf
#
save
y

RS4

Configuration commands for RS4:

#
# RS4
#
create vlan lo0
configure vlan lo0 ipaddress 4.4.4.4/32
enable ipforwarding vlan lo0
enable loopback-mode vlan lo0
#
disable ospf
configure ospf routerid 4.4.4.4
configure ospf add vlan lo0 area 0.0.0.0 passive
enable ospf
#
save
y
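
The two command sets differ only in the router name and the router-id, so they can be generated from one template. The sketch below is an illustration in Python, with the hypothetical helper `loopback_ospf_commands`; it emits only the commands already shown above and leaves out the final save, which stays a manual step.

```python
import ipaddress

def loopback_ospf_commands(sys_name, router_id):
    """Return the EXOS commands used above, filled in for one router."""
    # Raises ValueError if router_id is not a valid dotted-quad IPv4 address.
    rid = str(ipaddress.IPv4Address(router_id))
    return [
        "#",
        f"# {sys_name}",
        "#",
        "create vlan lo0",
        f"configure vlan lo0 ipaddress {rid}/32",
        "enable ipforwarding vlan lo0",
        "enable loopback-mode vlan lo0",
        "#",
        "disable ospf",
        f"configure ospf routerid {rid}",
        "configure ospf add vlan lo0 area 0.0.0.0 passive",
        "enable ospf",
    ]

for name, rid in {"RS3": "3.3.3.3", "RS4": "4.4.4.4"}.items():
    print("\n".join(loopback_ospf_commands(name, rid)))
    print()
```

Generating both configurations from the same router-id table makes it harder to end up with an identical routerid on two routers by copy and paste.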

node1

Full router configurations

R100 (FRRouting, 10.255.255.100) is connected to port 1 on router RS2, using an SVI/VLAN named vlan255.

  • RS3 full configuration
  • RS4 full configuration

Verify OSPF

When the configuration is finished and the routing protocol has converged, inspect the OSPF state using the following command:

show ospf neighbor

Show the IP routing table:

show iproute

RS3

OSPF neighbour output. This router has only one OSPF neighbour, RS1:

RS3.1 # show ospf neighbor 
Neighbor ID     Pri State              Up/Dead Time             Address         Interface
          BFD Session State 
==========================================================================================
1.1.1.1           1 FULL      /DROTHER 00:00:31:44/00:00:00:09  10.0.0.2        vlan13    
          None              

Total number of neighbors: 1 (All neighbors in Full state)

The routing table of RS3. Verify that a hash (#) is attached to the IP prefix 10.1.2.0/24:

RS3.2 # show iproute 
Ori  Destination        Gateway         Mtr  Flags         VLAN       Duration
#oa  1.1.1.1/32         10.0.0.2        15   UG-D---um--f- vlan13     0d:0h:32m:26s
#oa  2.2.2.2/32         10.0.0.2        20   UG-D---um--f- vlan13     0d:0h:32m:26s
#d   3.3.3.3/32         3.3.3.3         1    U------um--f- lo0        0d:0h:32m:41s
#oa  4.4.4.4/32         10.0.0.2        25   UG-D---um--f- vlan13     0d:0h:23m:20s
#oa  10.0.0.0/31        10.0.0.2        10   UG-D---um--f- vlan13     0d:0h:32m:26s
#d   10.0.0.2/31        10.0.0.3        1    U------um--f- vlan13     0d:0h:32m:41s
#oa  10.0.0.4/31        10.0.0.2        15   UG-D---um--f- vlan13     0d:0h:32m:26s
#oa  10.0.0.6/31        10.0.0.2        15   UG-D---um--f- vlan13     0d:0h:32m:26s
#d   10.1.2.0/24        10.1.2.3        1    U------um--f- sales      0d:0h:32m:41s
#oa  10.255.255.100/32  10.0.0.2        15   UG-D---um--f- vlan13     0d:0h:32m:26s

The hash (#) indicates that RS3 is currently the ESRP Master router.

(#) Preferred unicast and multicast route.

Verify the ESRP state:

RS3.3 # show esrp

Configured Mode:          Extended
# ESRP domain configuration :
--------------------------------------------------------------------------------
 Domain Grp Ver VLAN    VID  DId   IP/IPX         State  Master MAC Address Nbr
--------------------------------------------------------------------------------
 esrp1   0  E  sales   4094 4096 10.1.2.3        Master   0c:7e:ff:ed:00:00  1 
--------------------------------------------------------------------------------
# ESRP Port configuration:
--------------------------------------------------------------------------------
Port      Weight    Host    Restart
--------------------------------------------------------------------------------

RS3 is the ESRP Master router.

RS4

OSPF neighbour output. This router also has only one OSPF neighbour, RS2:

RS4.4 # show ospf neighbor 
Neighbor ID     Pri State              Up/Dead Time             Address         Interface
          BFD Session State 
==========================================================================================
2.2.2.2           1 FULL      /DROTHER 00:00:16:20/00:00:00:09  10.0.0.4        vlan24    
          None              

Total number of neighbors: 1 (All neighbors in Full state)

Converged IP routing table. Take note of the Duration column and look at the times shown here for the IP prefix 10.1.2.0/24:

RS4.5 # show iproute 
Ori  Destination        Gateway         Mtr  Flags         VLAN       Duration
#oa  1.1.1.1/32         10.0.0.4        20   UG-D---um--f- vlan24     0d:0h:16m:53s
#oa  2.2.2.2/32         10.0.0.4        15   UG-D---um--f- vlan24     0d:0h:16m:53s
#oa  3.3.3.3/32         10.0.0.4        25   UG-D---um--f- vlan24     0d:0h:16m:53s
#d   4.4.4.4/32         4.4.4.4         1    U------um--f- lo0        0d:0h:17m:7s
#oa  10.0.0.0/31        10.0.0.4        10   UG-D---um--f- vlan24     0d:0h:16m:53s
#oa  10.0.0.2/31        10.0.0.4        15   UG-D---um--f- vlan24     0d:0h:16m:53s
#d   10.0.0.4/31        10.0.0.5        1    U------um--f- vlan24     0d:0h:17m:7s
#oa  10.0.0.6/31        10.0.0.4        10   UG-D---um--f- vlan24     0d:0h:16m:53s
 d   10.1.2.0/24        10.1.2.3        1    -------um---- sales      0d:0h:17m:7s
#oa  10.1.2.0/24        10.0.0.4        20   UG-D---um--f- vlan24     0d:0h:16m:53s
#oa  10.255.255.100/32  10.0.0.4        10   UG-D---um--f- vlan24     0d:0h:16m:53s

Compare these times with those shown in the previous blog entry, EXOS ESRP adding OSPF, where both routers had an identical OSPF router-id set. RS4 is the ESRP Slave router.

(#) Preferred unicast and multicast route.
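
To compare the Duration values numerically, they can be converted to seconds. The following Python sketch assumes the d:h:m:s format shown in the table above and assumes that Duration reflects the time since the route was last installed; the helper name `duration_seconds` is hypothetical.

```python
import re

# Format as printed by show iproute above, for example "0d:0h:16m:53s".
_DURATION = re.compile(r"(\d+)d:(\d+)h:(\d+)m:(\d+)s")

def duration_seconds(text):
    """Convert an EXOS route Duration string into seconds."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    days, hours, minutes, seconds = map(int, match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

direct = duration_seconds("0d:0h:17m:7s")   # direct route 10.1.2.0/24 on vlan sales
ospf = duration_seconds("0d:0h:16m:53s")    # OSPF route 10.1.2.0/24 via vlan24
print(direct, ospf, direct - ospf)          # 1027 1013 14
```

On RS4 the OSPF route for 10.1.2.0/24 is only 14 seconds younger than the direct route and as old as every other OSPF route in the table, which is what a prefix that no longer flaps would be expected to look like.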

Result

The results are split into two different states:

  • ESRP converged (stable operations)
  • ESRP Master power outage

To get meaningful results for the convergence time, node1 and node2 measure convergence from the HA LAN side, as clients of ESRP. In addition, an OSPF FRR appliance is connected to router RS2.

ESRP converged

Testing with all routers working in a converged network. Nothing is shut down or powered off; all components run all the time in a stable state. Here are the results from the Linux nodes pinging the RS1 and RS2 routers.

node1

This is the reachability of one of the backbone routers, 1.1.1.1, while ESRP is converged and stable:

user % ping 1.1.1.1 -c 100

...
--- 1.1.1.1 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 99166ms
rtt min/avg/max/mdev = 0.666/1.226/1.465/0.135 ms

Clean results.

node2

This is the reachability of one of the backbone routers, 2.2.2.2:

user % ping 2.2.2.2 -c 100

...
--- 2.2.2.2 ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 1.217/1.758/2.074 ms

Clean results. No packet loss.

RS1

Now sending ICMP packets from router RS1 to a node in the 10.1.2.0/24 network:

RS1.1 # ping 10.1.2.11

Ping(ICMP) 10.1.2.11: 4 packets, 8 data bytes, interval 1 second(s).
16 bytes from 10.1.2.11: icmp_seq=0 ttl=63 time=0.970 ms
16 bytes from 10.1.2.11: icmp_seq=1 ttl=63 time=0.775 ms
16 bytes from 10.1.2.11: icmp_seq=2 ttl=63 time=1.339 ms
16 bytes from 10.1.2.11: icmp_seq=3 ttl=63 time=1.453 ms
--- 10.1.2.11 ping statistics ---
4 packets transmitted, 4 packets received, 0% loss
round-trip min/avg/max = 0/1/1 ms

The routing table shows that the IP prefix is available via OSPF from the directly connected router RS3:

* RS1.2 # show iproute 10.1.2.0/24
Ori  Destination        Gateway         Mtr  Flags         VLAN       Duration
#oa  10.1.2.0/24        10.0.0.3        10   UG-D---um--f- vlan13     0d:0h:41m:42s

Let's check the same from RS2. It shows that the prefix is available via gateway 10.0.0.0 on interface vlan12:

* RS2.5 # show iproute 10.1.2.0/24
Ori  Destination        Gateway         Mtr  Flags         VLAN       Duration
#oa  10.1.2.0/24        10.0.0.0        15   UG-D---um--f- vlan12     0d:0h:52m:49s

RS1 is the OSPF neighbor on vlan12:

* RS2.8 # show ospf neighbor vlan12
Neighbor ID     Pri State              Up/Dead Time             Address         Interface
          BFD Session State 
==========================================================================================
1.1.1.1           1 FULL      /DROTHER 00:02:22:59/00:00:00:09  10.0.0.0        vlan12    
          None              

Total number of neighbors: 1 (All neighbors in Full state)

So in this converged state, RS3 is in the ESRP Master state and has an OSPF adjacency with RS1. This shows that the ESRP Master router is also the only router in the OSPF network topology advertising the 10.1.2.0/24 IP prefix.

Just by fixing the OSPF routerid setting, the IP prefix is now stable in the OSPF network and all the flapping events have stopped.

ESRP failover

These are the results of shutting down the ESRP Master router RS3. The tests are simple: they measure the number of ICMP packets lost during convergence when pinging an IP destination in the OSPF area.

node1

ICMP test from 10.1.2.11 to 10.255.255.100:

--- 10.255.255.100 ping statistics ---
100 packets transmitted, 91 received, +3 errors, 9% packet loss, time 99553ms
rtt min/avg/max/mdev = 3.145/7.006/18.737/3.210 ms, pipe 4

node2

ICMP test from 10.1.2.12 to 2.2.2.2:

--- 2.2.2.2 ping statistics ---
100 packets transmitted, 92 received, 8% packet loss, time 99349ms
rtt min/avg/max/mdev = 2.332/11.284/514.835/52.860 ms

FRR

ICMP test from 10.255.255.100 to 10.1.2.11, which is the IP address of node1:

--- 10.1.2.11 ping statistics ---
100 packets transmitted, 92 packets received, 8% packet loss
round-trip min/avg/max = 3.116/18.900/1047.462 ms

The ESRP (FHRP) protocol timer defaults:

Timer Configuration:       Hello     2s(2)      Neighbor     8s(0)
                           PreMaster     6s(0)      Neutral      4s(0)
                           NbrRestart    2s(0)
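
The ping summaries above can be turned into a rough outage estimate. This is a minimal Python sketch with hypothetical helper names; it assumes the pings were sent at the default interval of one second and that the lost packets fall into one contiguous gap, neither of which the summaries alone prove.

```python
import re

# Matches both summary styles seen above:
#   "100 packets transmitted, 91 received, ..."
#   "100 packets transmitted, 92 packets received, ..."
_SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")

def lost_packets(summary_line):
    """Return transmitted minus received from a ping summary line."""
    match = _SUMMARY.search(summary_line)
    if match is None:
        raise ValueError(f"not a ping summary line: {summary_line!r}")
    transmitted, received = map(int, match.groups())
    return transmitted - received

def estimated_outage_seconds(summary_line, interval_s=1.0):
    """Rough outage estimate: lost packets times the send interval (assumed 1 s)."""
    return lost_packets(summary_line) * interval_s

results = {
    "node1 -> 10.255.255.100": "100 packets transmitted, 91 received, +3 errors, 9% packet loss, time 99553ms",
    "node2 -> 2.2.2.2": "100 packets transmitted, 92 received, 8% packet loss, time 99349ms",
    "FRR -> 10.1.2.11": "100 packets transmitted, 92 packets received, 8% packet loss",
}
for name, line in results.items():
    print(f"{name}: ~{estimated_outage_seconds(line):.0f} s without replies")
```

Under these assumptions the three tests come out at roughly 9, 8 and 8 seconds without replies, which is in the same range as the 8 s Neighbor timer listed above. That is an observation from this single run, not a proven relation.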

Summary

This example scenario is limited to fixing the frequent IP prefix announcements in the routing table. Fixing the OSPF routerid stabilized the network topology. It has also shown that EXOS ties the role of forwarding router for the IP network to both ESRP and OSPF: only the ESRP Master router forwards IP traffic from inside the LAN to the backbone. I am not sure how this is done in the protocol, but it is a good design choice when dealing with dynamic routing protocols and FHRP protocols.

The applied workaround also lacks many network tests simulating outages that happen in a production network:

  • OSPF neighbor link fail (RS3 and RS4 each have only one OSPF neighbor):
    • OSPF ESRP active/primary/master
    • OSPF ESRP backup/secondary/slave
  • ISL link fail (RS3<->RS4)
  • RS5 link fail
    • ESRP active/primary/master
    • ESRP backup/secondary/slave
  • Router fail (power outage)
    • ESRP active/primary/master
    • ESRP backup/secondary/slave

The aim is to check how the configured networking solution (ESRP/OSPF/STP) behaves when things begin to fail.
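
The missing tests listed above can be written down as a small test matrix, so that none of the combinations is forgotten. This Python sketch only enumerates the cases from the list; the names `FAILURES`, `ROLES` and `build_test_matrix` are illustrative assumptions and nothing here talks to a router.

```python
ROLES = ("ESRP master", "ESRP slave")

# Failure types from the list above; True means the test is run once per ESRP role.
FAILURES = [
    ("OSPF neighbor link fail", True),
    ("ISL link fail (RS3<->RS4)", False),
    ("RS5 link fail", True),
    ("Router fail (power outage)", True),
]

def build_test_matrix(failures=FAILURES, roles=ROLES):
    """Expand the failure list into individual (failure, role) test cases."""
    cases = []
    for failure, per_role in failures:
        if per_role:
            cases.extend((failure, role) for role in roles)
        else:
            cases.append((failure, None))
    return cases

for number, (failure, role) in enumerate(build_test_matrix(), start=1):
    target = f" on the {role}" if role else ""
    print(f"{number}. {failure}{target}")
```

This expands to seven test cases, each of which could be paired with the ping measurements used in the failover section.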

Spanning Tree configuration is also missing in this example; it is another networking protocol to take care of. Adding BFD to all point-to-point interfaces would speed up the convergence time of OSPF. What is completely left out of this blog entry is the further tuning of ESRP-related settings to reach a highly available and fast-converging networking solution.

ESRP is the only protocol so far, among those I know and have tested, that by default has a direct physical link between both FHRP routers. All other FHRP protocols do not have a direct link.

Update

A reader of the blog confirmed that it is a bug in the documentation, pointing out where the official EXOS ESRP guidelines state that a unique OSPF router-id must be used. It is probably the most important setting.

📘 Note
If you configure the OSPF routing protocol and ESRP, you MUST manually configure an OSPF router identifier. Be sure that you configure a unique OSPF router ID on each switch running ESRP. For more information, see OSPF

And this particular example configuration is wrong, as it sets the OSPF routerid of both routers to the identical value 5.5.5.5.

Thank you, Erik, for confirming.

See also

References