This is a follow-up netlab article to the EXOS ESRP simple setup configuration. The prerequisite is the previous article, EXOS ESRP adding OSPF. In this step, the OSPF router ID is configured and loopback interfaces are created and added to OSPF. Routers RS3 and RS4 currently share the same OSPF router ID, 5.5.5.5, so the configuration of both routers will be changed.
The goal is to make the floating IP prefix 10.1.2.0/24 stable and to minimise the number of OSPF events generated in area 0. With this change, the ESRP-based network solution should become deterministic and operational.
network topology
The OSPF configuration is added to RS3 and RS4 only, on top of the already running configuration from the previous ESRP article, EXOS ESRP adding OSPF.
OSPF router-id overview:
sysName | ospf router-id |
---|---|
RS1 | 1.1.1.1 |
RS2 | 2.2.2.2 |
RS3 | 3.3.3.3 |
RS4 | 4.4.4.4 |
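Because the whole article hinges on router IDs being unique, a quick check can be sketched in Python. This is only an illustration: the dictionaries are typed in by hand from the table above and from the 5.5.5.5 value mentioned earlier, and the function name `duplicate_router_ids` is made up for this example.

```python
from collections import defaultdict

# "before": RS3 and RS4 share 5.5.5.5, as in the previous article.
# "after": the router IDs from the table above.
before = {"RS1": "1.1.1.1", "RS2": "2.2.2.2", "RS3": "5.5.5.5", "RS4": "5.5.5.5"}
after = {"RS1": "1.1.1.1", "RS2": "2.2.2.2", "RS3": "3.3.3.3", "RS4": "4.4.4.4"}


def duplicate_router_ids(router_ids):
    """Return {router_id: [sysNames]} for every router ID used more than once."""
    owners = defaultdict(list)
    for name, rid in router_ids.items():
        owners[rid].append(name)
    return {rid: sorted(names) for rid, names in owners.items() if len(names) > 1}


print(duplicate_router_ids(before))  # {'5.5.5.5': ['RS3', 'RS4']}
print(duplicate_router_ids(after))   # {}
```

An empty result means every router in the area has its own router ID.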
Network topology showing relevant IP and OSPF settings:
1.1.1.1 2.2.2.2
+-------+ +-------+
| | 12 12 | |
| RS1 |-----------------------------------| RS2 |
| | | |
+-------+ +-------+
10 | | 10
| (OSPF) |
| |
| |
3.3.3.3 4.4.4.4
+-------+ +-------+
| | 3 3 | |
| RS3 +-----------------------------------+ RS4 |
| | | |
+---+---+ +---+---+
1 | | 1
| ESRP VR1 |
| 10.1.2.3 |
| |
| +-------+ |
| | | |
+-----------------+ RS5 +-----------------+
| |
+-+---+-+
| |
+-------------+ +-------------+
| eth0 eth0 |
+---+---+ +---+---+
| | | |
| node1 | | node2 |
| | | |
+-------+ +-------+
EXOS version
EXOS version used in this netlab:
Card    Partition   Installation Date              Version     Name         Branch
------------------------------------------------------------------------------
Switch  primary     Thu Jan 16 13:06:22 UTC 2025   32.7.2.19   rescue.xos   32.7.2.19
Switch  secondary   Thu Jan 16 13:06:22 UTC 2025   32.7.2.19   rescue.xos   32.7.2.19
Configuration
Before fixing anything, get an overview of what works in the current state. This is the output of ip route on one of the connected OSPF routers:
ip route
1.1.1.1 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
2.2.2.2 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.0.0.0/31 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.0.0.2/31 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.0.0.4/31 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.0.0.6/31 dev eth0 proto kernel scope link src 10.0.0.7
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
- 2 loopback IP addresses, 1.1.1.1 and 2.2.2.2
- many /31 IP transfer networks
- one IP prefix, 10.1.2.0/24
There is no loopback IP address for RS3 or RS4 in the routing table above. The following events currently recur all the time, with the IP prefix flapping in the OSPF area:
ip monitor
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
This is what needs to be fixed.
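To put a number on the flapping, the ip monitor output can be counted per prefix. The following Python sketch assumes the input contains only route lines in the shape shown above (an optional leading "Deleted", then the prefix); other event types that ip monitor can print are not handled. The function name `count_route_events` is hypothetical.

```python
from collections import Counter

# Sample lines copied from the ip monitor output above.
SAMPLE = """\
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
Deleted 10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
10.1.2.0/24 nhid 12 via 10.0.0.6 dev eth0 proto ospf metric 20
"""


def count_route_events(text):
    """Count (prefix, 'deleted' | 'added') events in route-only ip monitor output."""
    events = Counter()
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Deleted" and len(fields) > 1:
            events[(fields[1], "deleted")] += 1
        else:
            events[(fields[0], "added")] += 1
    return events


flaps = count_route_events(SAMPLE)
print(flaps[("10.1.2.0/24", "deleted")], flaps[("10.1.2.0/24", "added")])  # 3 3
```

After the router ID fix, the same count over a comparable capture window should stay at zero for 10.1.2.0/24.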
RS3
Adjust the OSPF routerid setting and add a loopback interface on RS3 and RS4. Configuration commands for RS3:
#
# RS3
#
create vlan lo0
configure vlan lo0 ipaddress 3.3.3.3/32
enable ipforwarding vlan lo0
enable loopback-mode vlan lo0
#
disable ospf
configure ospf routerid 3.3.3.3
configure ospf add vlan lo0 area 0.0.0.0 passive
enable ospf
#
save
y
RS4
Configuration commands for RS4:
#
# RS4
#
create vlan lo0
configure vlan lo0 ipaddress 4.4.4.4/32
enable ipforwarding vlan lo0
enable loopback-mode vlan lo0
#
disable ospf
configure ospf routerid 4.4.4.4
configure ospf add vlan lo0 area 0.0.0.0 passive
enable ospf
#
save
y
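The two command listings differ only in the router name and the router ID, so they can be generated from one template. The Python sketch below reuses exactly the EXOS commands shown above; the function name `render_router_id_change` and its parameters are made up for this illustration, and the final save step is left out because it asks for confirmation in the listings above.

```python
def render_router_id_change(name, router_id, area="0.0.0.0", vlan="lo0"):
    """Return the EXOS commands from the listings above for one router."""
    return "\n".join([
        "#",
        f"# {name}",
        "#",
        f"create vlan {vlan}",
        f"configure vlan {vlan} ipaddress {router_id}/32",
        f"enable ipforwarding vlan {vlan}",
        f"enable loopback-mode vlan {vlan}",
        "#",
        # OSPF is disabled while the router ID is changed, as in the listings.
        "disable ospf",
        f"configure ospf routerid {router_id}",
        f"configure ospf add vlan {vlan} area {area} passive",
        "enable ospf",
    ])


for name, rid in {"RS3": "3.3.3.3", "RS4": "4.4.4.4"}.items():
    print(render_router_id_change(name, rid))
    print()
```

Generating both listings from one source makes it harder to end up with the same router ID on two routers by copy and paste.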
Full router configurations
R100 (FRRouting, 10.255.255.100) is connected to port1 on router RS2, using the SVI/VLAN named vlan255. The full configurations:
- RS3 full configuration
- RS4 full configuration
Verify OSPF
When the configuration is finished and the routing protocol has converged, inspect the OSPF state using the following command:
show ospf neighbor
Show the IP routing table:
show iproute
RS3
OSPF neighbour output; this router has only one OSPF neighbour, RS1:
RS3.1 # show ospf neighbor
Neighbor ID Pri State Up/Dead Time Address Interface
BFD Session State
==========================================================================================
1.1.1.1 1 FULL /DROTHER 00:00:31:44/00:00:00:09 10.0.0.2 vlan13
None
Total number of neighbors: 1 (All neighbors in Full state)
The routing table of RS3. Verify that the hash # is attached to the IP prefix 10.1.2.0/24:
RS3.2 # show iproute
Ori Destination Gateway Mtr Flags VLAN Duration
#oa 1.1.1.1/32 10.0.0.2 15 UG-D---um--f- vlan13 0d:0h:32m:26s
#oa 2.2.2.2/32 10.0.0.2 20 UG-D---um--f- vlan13 0d:0h:32m:26s
#d 3.3.3.3/32 3.3.3.3 1 U------um--f- lo0 0d:0h:32m:41s
#oa 4.4.4.4/32 10.0.0.2 25 UG-D---um--f- vlan13 0d:0h:23m:20s
#oa 10.0.0.0/31 10.0.0.2 10 UG-D---um--f- vlan13 0d:0h:32m:26s
#d 10.0.0.2/31 10.0.0.3 1 U------um--f- vlan13 0d:0h:32m:41s
#oa 10.0.0.4/31 10.0.0.2 15 UG-D---um--f- vlan13 0d:0h:32m:26s
#oa 10.0.0.6/31 10.0.0.2 15 UG-D---um--f- vlan13 0d:0h:32m:26s
#d 10.1.2.0/24 10.1.2.3 1 U------um--f- sales 0d:0h:32m:41s
#oa 10.255.255.100/32 10.0.0.2 15 UG-D---um--f- vlan13 0d:0h:32m:26s
The hash # indicates that RS3 is the ESRP Master router at the current time.
(#) Preferred unicast and multicast route.
Verify the ESRP state:
RS3.3 # show esrp
Configured Mode: Extended
# ESRP domain configuration :
--------------------------------------------------------------------------------
Domain Grp Ver VLAN VID DId IP/IPX State Master MAC Address Nbr
--------------------------------------------------------------------------------
esrp1 0 E sales 4094 4096 10.1.2.3 Master 0c:7e:ff:ed:00:00 1
--------------------------------------------------------------------------------
# ESRP Port configuration:
--------------------------------------------------------------------------------
Port Weight Host Restart
--------------------------------------------------------------------------------
RS3 is the ESRP Master router.
RS4
OSPF neighbour output; this router also has only one OSPF neighbour, RS2:
RS4.4 # show ospf neighbor
Neighbor ID Pri State Up/Dead Time Address Interface
BFD Session State
==========================================================================================
2.2.2.2 1 FULL /DROTHER 00:00:16:20/00:00:00:09 10.0.0.4 vlan24
None
Total number of neighbors: 1 (All neighbors in Full state)
Converged IP routing table. Take note of the Duration column, in particular for the IP prefix 10.1.2.0/24:
RS4.5 # show iproute
Ori Destination Gateway Mtr Flags VLAN Duration
#oa 1.1.1.1/32 10.0.0.4 20 UG-D---um--f- vlan24 0d:0h:16m:53s
#oa 2.2.2.2/32 10.0.0.4 15 UG-D---um--f- vlan24 0d:0h:16m:53s
#oa 3.3.3.3/32 10.0.0.4 25 UG-D---um--f- vlan24 0d:0h:16m:53s
#d 4.4.4.4/32 4.4.4.4 1 U------um--f- lo0 0d:0h:17m:7s
#oa 10.0.0.0/31 10.0.0.4 10 UG-D---um--f- vlan24 0d:0h:16m:53s
#oa 10.0.0.2/31 10.0.0.4 15 UG-D---um--f- vlan24 0d:0h:16m:53s
#d 10.0.0.4/31 10.0.0.5 1 U------um--f- vlan24 0d:0h:17m:7s
#oa 10.0.0.6/31 10.0.0.4 10 UG-D---um--f- vlan24 0d:0h:16m:53s
d 10.1.2.0/24 10.1.2.3 1 -------um---- sales 0d:0h:17m:7s
#oa 10.1.2.0/24 10.0.0.4 20 UG-D---um--f- vlan24 0d:0h:16m:53s
#oa 10.255.255.100/32 10.0.0.4 10 UG-D---um--f- vlan24 0d:0h:16m:53s
Compare these times with those shown in the previous blog entry, EXOS ESRP adding OSPF, where both routers had an identical OSPF router-id set. RS4 is the ESRP Slave router.
(#) Preferred unicast and multicast route.
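The difference between the two routing tables comes down to which route for 10.1.2.0/24 carries the # marker: the direct route on RS3, the OSPF route on RS4. A small Python sketch can pull that out of the show iproute text. It assumes the column layout shown in the listings above (origin code first, destination second); the function name `preferred_origin` is hypothetical, and the sample lines are copied from the two tables.

```python
RS3_EXCERPT = """\
#d 10.1.2.0/24 10.1.2.3 1 U------um--f- sales 0d:0h:32m:41s
"""

RS4_EXCERPT = """\
d 10.1.2.0/24 10.1.2.3 1 -------um---- sales 0d:0h:17m:7s
#oa 10.1.2.0/24 10.0.0.4 20 UG-D---um--f- vlan24 0d:0h:16m:53s
"""


def preferred_origin(show_iproute, prefix):
    """Return the origin code of the '#'-marked route to prefix, or None."""
    for line in show_iproute.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("#") and fields[1] == prefix:
            return fields[0].lstrip("#")
    return None


print(preferred_origin(RS3_EXCERPT, "10.1.2.0/24"))  # d
print(preferred_origin(RS4_EXCERPT, "10.1.2.0/24"))  # oa
```

In this lab, the origin d on RS3 matches its ESRP Master state, while oa on RS4 shows that the slave reaches the prefix through OSPF instead of its own interface.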
Result
The results are split into two different states:
- ESRP converged (stable operations)
- ESRP Master power outage
To get meaningful results regarding the convergence time, node1 and node2 measure convergence from the HA LAN side, as clients of ESRP. In addition, an OSPF FRR appliance is connected to router RS2.
ESRP converged
This test runs while all routers are working in a converged network. Nothing is shut down or powered off; all components run continuously in a stable state. Here are the results from the Linux nodes pinging the RS1 and RS2 routers.
node1
This is the reachability for one of the backbone routers 1.1.1.1
while ESRP is converged and stable:
...
--- 1.1.1.1 ping statistics ---
100 packets transmitted, 100 received, 0% packet loss, time 99166ms
rtt min/avg/max/mdev = 0.666/1.226/1.465/0.135 ms
Clean results.
node2
This is the reachability of one of the backbone routers, 2.2.2.2:
...
--- 2.2.2.2 ping statistics ---
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 1.217/1.758/2.074 ms
Clean results. No packet loss.
RS1
Now sending ICMP packets from router RS1 to a node in the 10.1.2.0/24 network:
Ping(ICMP) 10.1.2.11: 4 packets, 8 data bytes, interval 1 second(s).
16 bytes from 10.1.2.11: icmp_seq=0 ttl=63 time=0.970 ms
16 bytes from 10.1.2.11: icmp_seq=1 ttl=63 time=0.775 ms
16 bytes from 10.1.2.11: icmp_seq=2 ttl=63 time=1.339 ms
16 bytes from 10.1.2.11: icmp_seq=3 ttl=63 time=1.453 ms
--- 10.1.2.11 ping statistics ---
4 packets transmitted, 4 packets received, 0% loss
round-trip min/avg/max = 0/1/1 ms
The routing table shows that the IP prefix is learned via OSPF from the directly connected router RS3:
* RS1.2 # show iproute 10.1.2.0/24
Ori Destination Gateway Mtr Flags VLAN Duration
#oa 10.1.2.0/24 10.0.0.3 10 UG-D---um--f- vlan13 0d:0h:41m:42s
Let's check the same from RS2. It reports that the prefix is reachable via gateway 10.0.0.0 on interface vlan12:
* RS2.5 # show iproute 10.1.2.0/24
Ori Destination Gateway Mtr Flags VLAN Duration
#oa 10.1.2.0/24 10.0.0.0 15 UG-D---um--f- vlan12 0d:0h:52m:49s
RS1 is the OSPF neighbor on vlan12:
* RS2.8 # show ospf neighbor vlan12
Neighbor ID Pri State Up/Dead Time Address Interface
BFD Session State
==========================================================================================
1.1.1.1 1 FULL /DROTHER 00:02:22:59/00:00:00:09 10.0.0.0 vlan12
None
Total number of neighbors: 1 (All neighbors in Full state)
So in this converged state, RS3 is the ESRP Master and has an OSPF adjacency with RS1. This shows that the ESRP Master router is the only router in the OSPF topology advertising the 10.1.2.0/24 IP prefix.
Just by fixing the OSPF routerid setting, the IP prefix is now stable in the OSPF network and all flapping events have stopped.
ESRP failover
These are the results of shutting down the ESRP Master router RS3. The tests are simple: they measure the number of ICMP packets lost during convergence when pinging an IP destination in the OSPF area.
node1
ICMP test from 10.1.2.11 to 10.255.255.100:
--- 10.255.255.100 ping statistics ---
100 packets transmitted, 91 received, +3 errors, 9% packet loss, time 99553ms
rtt min/avg/max/mdev = 3.145/7.006/18.737/3.210 ms, pipe 4
node2
ICMP test from 10.1.2.12 to 2.2.2.2:
--- 2.2.2.2 ping statistics ---
100 packets transmitted, 92 received, 8% packet loss, time 99349ms
rtt min/avg/max/mdev = 2.332/11.284/514.835/52.860 ms
FRR
ICMP test from 10.255.255.100 to 10.1.2.11, which is the node1 IP address:
--- 10.1.2.11 ping statistics ---
100 packets transmitted, 92 packets received, 8% packet loss
round-trip min/avg/max = 3.116/18.900/1047.462 ms
The ESRP FHRP protocol defaults:
Timer Configuration: Hello 2s(2) Neighbor 8s(0)
PreMaster 6s(0) Neutral 4s(0)
NbrRestart 2s(0)
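The ping summaries above can be turned into a rough outage estimate by multiplying the number of lost packets by the send interval. The Python sketch below assumes a send interval of one second, which fits the roughly 99 s total time reported for 100 packets but is not stated explicitly for every test; the function names are made up for this illustration.

```python
import re

# Summary lines copied from the three failover tests above.
SUMMARIES = {
    "node1": "100 packets transmitted, 91 received, +3 errors, 9% packet loss, time 99553ms",
    "node2": "100 packets transmitted, 92 received, 8% packet loss, time 99349ms",
    "FRR": "100 packets transmitted, 92 packets received, 8% packet loss",
}


def lost_packets(summary):
    """Return transmitted minus received from a ping summary line."""
    match = re.search(r"(\d+) packets transmitted, (\d+) (?:packets )?received", summary)
    if match is None:
        raise ValueError("not a ping summary line")
    return int(match.group(1)) - int(match.group(2))


def estimated_outage_seconds(summary, interval_s=1.0):
    """Rough outage estimate; interval_s = 1.0 is an assumption."""
    return lost_packets(summary) * interval_s


for name, line in SUMMARIES.items():
    print(name, estimated_outage_seconds(line))
```

Under that assumption the three tests lost about 9, 8 and 8 seconds of traffic, which is in the same range as the 8 s Neighbor timer shown above. Whether that timer is what actually drives the outage is not verified here.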
Summary
This example scenario is limited to fixing the frequent IP prefix announcements in the routing table. Fixing the OSPF routerid stabilized the network topology. It has also shown that EXOS ties the role of forwarding router to both ESRP and OSPF: only the ESRP Master router forwards IP traffic from inside the LAN to the backbone. I am not sure how this is done in the protocol, but it is a good design choice when combining dynamic routing protocols and FHRP protocols.
The applied workaround also lacks many network tests simulating outages that happen in production networks:
- OSPF neighbor link failure (RS3 and RS4 each have only one OSPF neighbor):
  - OSPF ESRP active/primary/master
  - OSPF ESRP backup/secondary/slave
- ISL link failure (RS3<->RS4)
- RS5 link failure
  - ESRP active/primary/master
  - ESRP backup/secondary/slave
- Router failure (power outage)
  - ESRP active/primary/master
  - ESRP backup/secondary/slave
These tests would show how the configured ESRP/OSPF/STP networking solution behaves when things begin to fail.
Spanning Tree configuration is also missing in this example; it is another networking protocol to take care of. Adding BFD to all point-to-point interfaces would speed up the convergence time of OSPF. What is completely left out of this blog entry is the further tuning of ESRP-related settings to reach a highly available and fast-converging network.
ESRP is the only FHRP protocol I have tested so far that has a direct physical link between both FHRP routers by default. All other FHRP protocols do not have a direct link.
Update
A reader of the blog confirmed that it is a bug in the documentation, and pointed out where the official EXOS ESRP guidelines state that a unique OSPF router-id must be used. It is probably the most important setting.
If you configure the OSPF routing protocol and ESRP, you MUST manually configure an OSPF router identifier. Be sure that you configure a unique OSPF router ID on each switch running ESRP. For more information, see OSPF
And this particular example configuration is wrong, because it sets the OSPF routerid of both routers to the identical value 5.5.5.5.
Thank you, Erik, for confirming.
See also
- 01 - EXOS ESRP simple setup
- 02 - EXOS ESRP adding OSPF
- 03 - EXOS ESRP change OSPF routerid