networking n stuff

Friday, April 10, 2020

Demystifying BFD intervals

If I were to make a list of the most widely used yet poorly documented protocols, BFD would definitely be on it. In this post I'll try to answer several questions I was looking for answers to.

TL;DR: by default, when you enable BFD, both Echo and Asynchronous modes run simultaneously, so your devices exchange two different types of packets. The one responsible for failure detection is Echo mode, and it's the one you're configuring intervals and multipliers for. In the outputs you'll also see intervals for Asynchronous mode packets, which are significantly higher, but you don't have to worry about them, because failures are detected with Echo packets.

Let's say you configured BFD as follows (I'm using CSR1000V with IOS XE 16.04.03 for this example):

interface GigabitEthernet2
<...>
 bfd interval 50 min_rx 50 multiplier 3
!
router ospf 1
 bfd all-interfaces

So we're all good: our timers are set to the minimum possible values. But when you start checking things, you see this:

Core3#sh bfd nei details 

IPv4 Sessions
NeighAddr                              LD/RD         RH/RS     State     Int
10.30.1.30                           4097/4097       Up        Up        Gi2
Session state is UP and using echo function with 50 ms interval.
Session Host: Software
OurAddr: 10.30.1.29     
Handle: 1
Local Diag: 0, Demand mode: 0, Poll bit: 0
MinTxInt: 1000000, MinRxInt: 1000000, Multiplier: 3
Received MinRxInt: 1000000, Received Multiplier: 3
Holddown (hits): 0(0), Hello (hits): 1000(1590)
Rx Count: 1587, Rx Interval (ms) min/max/avg: 2/1003/875 last: 358 ms ago
Tx Count: 1594, Tx Interval (ms) min/max/avg: 2/1004/871 last: 315 ms ago
Elapsed time watermarks: 0 0 (last: 0)
Registered protocols: OSPF CEF 
Uptime: 00:23:09
Last packet: Version: 1                  - Diagnostic: 0
             State bit: Up               - Demand bit: 0
             Poll bit: 0                 - Final bit: 0
             C bit: 0                                   
             Multiplier: 3               - Length: 24
             My Discr.: 4097             - Your Discr.: 4097
             Min tx interval: 1000000    - Min rx interval: 1000000
             Min Echo interval: 50000  

What's this? Why are MinTxInt and MinRxInt set to 1000000 (these values are in microseconds, so that's 1 second)? And what's echo? OK, let's figure out how exactly BFD works.

RFC 5880 says BFD has several modes of operation:

   The primary mode is known as Asynchronous mode.  In this mode, the
   systems periodically send BFD Control packets to one another, and if
   a number of those packets in a row are not received by the other
   system, the session is declared to be down.

   The second mode is known as Demand mode.  In this mode, it is assumed
   that a system has an independent way of verifying that it has
   connectivity to the other system.  Once a BFD session is established,
   such a system may ask the other system to stop sending BFD Control
   packets, except when the system feels the need to verify connectivity
   explicitly, in which case a short sequence of BFD Control packets is
   exchanged, and then the far system quiesces.  Demand mode may operate
   independently in each direction, or simultaneously.

   An adjunct to both modes is the Echo function.  When the Echo
   function is active, a stream of BFD Echo packets is transmitted in
   such a way as to have the other system loop them back through its
   forwarding path.  If a number of packets of the echoed data stream
   are not received, the session is declared to be down.  The Echo
   function may be used with either Asynchronous or Demand mode.  Since
   the Echo function is handling the task of detection, the rate of
   periodic transmission of Control packets may be reduced (in the case
   of Asynchronous mode) or eliminated completely (in the case of Demand
   mode).

Sounds interesting. But we haven't configured any mode explicitly, so what is the default? A quick look at the configuration guide shows that Echo mode is enabled by default on IOS-XE, IOS and NX-OS. In any case, if you see "using echo function with 50 ms interval" in your sh bfd nei output, echo is enabled.
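If you'd rather check this programmatically, here's a rough sketch that pulls the echo state and the control-packet timers out of the "show bfd neighbors details" text. The helper name and the regular expressions are my own, based only on the IOS XE 16.04 output shown above; other releases may format it differently. It also makes the unit mismatch explicit: MinTxInt/MinRxInt are printed in microseconds, the echo interval in milliseconds.

```python
import re

# Excerpt of the "show bfd neighbors details" output from Core3 above.
OUTPUT = """\
Session state is UP and using echo function with 50 ms interval.
MinTxInt: 1000000, MinRxInt: 1000000, Multiplier: 3
Received MinRxInt: 1000000, Received Multiplier: 3
"""


def parse_bfd_details(text: str) -> dict:
    """Hypothetical helper: extract echo state and control timers.

    MinTxInt/MinRxInt are in microseconds, so they are converted to ms here;
    the echo interval is already printed in ms.
    """
    echo = re.search(r"using echo function with (\d+) ms interval", text)
    # Anchored at line start so the "Received MinRxInt" line is not matched.
    timers = re.search(
        r"^MinTxInt: (\d+), MinRxInt: (\d+), Multiplier: (\d+)", text, re.MULTILINE
    )
    if timers is None:
        raise ValueError("MinTxInt line not found")
    return {
        "echo_enabled": echo is not None,
        "echo_interval_ms": int(echo.group(1)) if echo else None,
        "control_tx_ms": int(timers.group(1)) / 1000,
        "control_rx_ms": int(timers.group(2)) / 1000,
        "multiplier": int(timers.group(3)),
    }


print(parse_bfd_details(OUTPUT))
```

Run against the output above, it reports echo at 50 ms while the control timers sit at 1000 ms, which is exactly the discrepancy this post is about.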

Since the RFC says Echo is an adjunct to both modes, and I was curious anyway, I decided to take a closer look at the actual packets. Here's what the capture looks like:

We can see that packets are indeed sent at a 50 ms interval, and they are echo packets. BFD Control packets are still sent every second, which means we're running echo and asynchronous mode at the same time. Note that the echo packets' destination IP is the same as the source IP, while the destination MAC is the BFD neighbor's MAC. When the neighbor receives such a packet, it sends it back to the originator, swapping only the destination and source MACs and leaving the IP addresses unchanged. Here's a closer look.
The first pair of packets, originated by 10.30.1.29 and sent back by 10.30.1.30:

The second pair, originated by 10.30.1.30 and sent back by 10.30.1.29. Note the different value in the packet data.
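To summarize what the two pairs show, here is a minimal model of an echo packet's round trip. The MAC addresses and the /30 mask are made up for illustration (neither appears in the post), the UDP port 3785 for echo comes from RFC 5881, and the redirect check is a simplified version of the usual router condition (packet forwarded back out its ingress interface, source on that subnet), not IOS's exact logic.

```python
import ipaddress
from dataclasses import dataclass, replace

BFD_ECHO_UDP_PORT = 3785  # RFC 5881 (single-hop control packets use 3784)

# Hypothetical MACs and /30 mask; neither appears in the post.
CORE3_MAC = "50:00:00:03:00:01"
CORE4_MAC = "50:00:00:04:00:01"
LINK = ipaddress.ip_network("10.30.1.28/30")


@dataclass(frozen=True)
class EchoPacket:
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    udp_dst_port: int = BFD_ECHO_UDP_PORT


def build_echo(my_mac: str, my_ip: str, neighbor_mac: str) -> EchoPacket:
    """Source and destination IP are both our own address; only the
    destination MAC points at the neighbor."""
    return EchoPacket(src_mac=my_mac, dst_mac=neighbor_mac, src_ip=my_ip, dst_ip=my_ip)


def loop_back(pkt: EchoPacket) -> EchoPacket:
    """The neighbor sends the packet back toward its IP destination:
    the MAC addresses are swapped, the IP addresses are left as they are."""
    return replace(pkt, src_mac=pkt.dst_mac, dst_mac=pkt.src_mac)


def redirect_candidate(pkt: EchoPacket, ingress: str, egress: str) -> bool:
    """Simplified ICMP-redirect condition: the packet leaves through the
    interface it arrived on and its source is on that interface's subnet."""
    return ingress == egress and ipaddress.ip_address(pkt.src_ip) in LINK


sent = build_echo(CORE3_MAC, "10.30.1.29", CORE4_MAC)
returned = loop_back(sent)
print(sent)
print(returned)
# On Core4 the echo arrives on Gi2 and is routed back out Gi2 toward 10.30.1.29.
print(redirect_candidate(sent, ingress="Gi2", egress="Gi2"))
```

The last line prints True: every echo packet the neighbor bounces back looks, to a router with redirects enabled, like traffic that should trigger an ICMP redirect.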

Oh, cool. Now we know why it's recommended to disable IP redirects when turning BFD on: the neighbor sends every echo packet back out the same interface it arrived on, which is exactly the situation in which a router with redirects enabled generates ICMP redirects. Still, the intervals are confusing. As we saw in the outputs, echoes are indeed sent at 50 ms intervals and control packets at 1 s intervals, so how soon will a failure be detected? Let's find out by shutting down an interface. Since I'm using EVE-NG for this lab, I can just shut down the interface on one of the routers; the interface on the other router stays up (there's no link-down event), so it's a perfectly valid test. Let's see:

Core4(config)#int gi2
Core4(config-if)#sh
Core4(config-if)#
Apr 10 10:58:28.861: %OSPF-5-ADJCHG: Process 1, Nbr 10.30.1.203 on GigabitEthernet2 from FULL to DOWN, Neighbor Down: Interface down or detached
Apr 10 10:58:28.863: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed,  ld:4097 neigh proc:OSPF, handle:1 act

Core4(config-if)#


Core3#
Apr 10 10:58:28.980: %BFDFSM-6-BFD_SESS_DOWN: BFD-SYSLOG: BFD session ld:4097 handle:1,is going Down Reason: ECHO FAILURE
Apr 10 10:58:28.983: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed,  ld:4097 neigh proc:OSPF, handle:1 act
Apr 10 10:58:28.985: %OSPF-5-ADJCHG: Process 1, Nbr 10.30.1.204 on GigabitEthernet2 from FULL to DOWN, Neighbor Down: BFD node down
Core3#
Apr 10 10:59:28.989: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed,  ld:4097 neigh proc:CEF, handle:1 act

Core3#
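Lining up the two routers' syslog timestamps gives a rough number for the detection time. This is a back-of-the-envelope check only: it assumes both clocks are synchronized, and Core4's OSPF message is logged slightly after the shutdown itself.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"

# Syslog timestamps copied from the outputs above. Comparing them assumes the
# two routers' clocks are in sync, which this lab doesn't guarantee.
core4_shutdown = datetime.strptime("10:58:28.861", FMT)  # Core4: OSPF down after "shutdown"
core3_bfd_down = datetime.strptime("10:58:28.980", FMT)  # Core3: BFD down, ECHO FAILURE

ECHO_INTERVAL_MS = 50  # bfd interval 50 min_rx 50
MULTIPLIER = 3         # multiplier 3
detection_budget_ms = ECHO_INTERVAL_MS * MULTIPLIER

elapsed_ms = (core3_bfd_down - core4_shutdown).total_seconds() * 1000
print(f"observed ~{elapsed_ms:.0f} ms, budget {detection_budget_ms} ms")
# prints: observed ~119 ms, budget 150 ms
```

Roughly 119 ms fits within the 3 x 50 ms = 150 ms echo budget, and is far below the 3 x 1000 ms the control-packet timers alone would allow.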

As we can see, Core3 declared the BFD session down (reason: ECHO FAILURE) in less than 150 ms, which means the timers we configured are in effect: while both echo and control packets are being sent, the echo packets are the ones responsible for failure detection. Now, let's take a look at a control packet:

Let's check the RFC once again:
   Required Min RX Interval

      This is the minimum interval, in microseconds, between received
      BFD Control packets that this system is capable of supporting,
      less any jitter applied by the sender (see section 6.8.2).  If
      this value is zero, the transmitting system does not want the
      remote system to send any periodic BFD Control packets.

So, as we can see, control packets and echo packets have different timers, which is why the output is so confusing! I wasn't able to find a way to change the control packet values while still using echo mode (to be honest, I didn't look very hard). It seems that while you're using Echo, you can only change the Echo intervals, not the control packet intervals.
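Both sets of timers travel in the same BFD Control packet, just in different fields. Here's a minimal parser for the 24-byte mandatory section defined in RFC 5880 section 4.1 (the optional authentication section is not handled), fed a packet rebuilt from the "Last packet" values Core3 printed earlier rather than captured bytes. It's a sketch for reading captures, not a BFD implementation.

```python
import struct

STATES = {0: "AdminDown", 1: "Down", 2: "Init", 3: "Up"}

# RFC 5880 section 4.1 mandatory section: 24 bytes, network byte order.
# Vers|Diag, Sta|P|F|C|A|D|M, Detect Mult, Length, My Discr, Your Discr,
# Desired Min TX, Required Min RX, Required Min Echo RX (intervals in microseconds).
BFD_CONTROL = struct.Struct("!BBBBIIIII")


def parse_control(data: bytes) -> dict:
    (vers_diag, sta_flags, detect_mult, length,
     my_disc, your_disc, desired_min_tx, required_min_rx,
     required_min_echo_rx) = BFD_CONTROL.unpack_from(data)
    return {
        "version": vers_diag >> 5,
        "diag": vers_diag & 0x1F,
        "state": STATES[sta_flags >> 6],
        "poll": bool(sta_flags & 0x20),
        "final": bool(sta_flags & 0x10),
        "c_bit": bool(sta_flags & 0x08),
        "demand": bool(sta_flags & 0x02),
        "detect_mult": detect_mult,
        "length": length,
        "my_disc": my_disc,
        "your_disc": your_disc,
        "desired_min_tx_us": desired_min_tx,
        "required_min_rx_us": required_min_rx,
        "required_min_echo_rx_us": required_min_echo_rx,
    }


# Packet rebuilt from the "Last packet" fields in Core3's output (echo enabled).
sample = BFD_CONTROL.pack(
    (1 << 5) | 0,           # version 1, diagnostic 0
    3 << 6,                 # state Up, no flags set
    3, 24,                  # multiplier 3, length 24
    4097, 4097,             # my / your discriminator
    1_000_000, 1_000_000,   # control timers: 1 s
    50_000,                 # echo timer: 50 ms
)
print(parse_control(sample))
```

The 1 s values sit in Desired Min TX / Required Min RX, while our configured 50 ms only shows up in Required Min Echo RX.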

Now, let's try running asynchronous mode only. I can't immediately think of a reason I'd really need this, since you can only truly call the detection bidirectional when you use echo packets, but let's try it:

(config-if)#no bfd echo 
Core4#sh bfd nei det

IPv4 Sessions
NeighAddr                              LD/RD         RH/RS     State     Int
10.30.1.29                           4097/4097       Up        Up        Gi2
Session state is UP and not using echo function.
Session Host: Software
OurAddr: 10.30.1.30     
Handle: 1
Local Diag: 0, Demand mode: 0, Poll bit: 0
MinTxInt: 50000, MinRxInt: 50000, Multiplier: 3
Received MinRxInt: 50000, Received Multiplier: 3
Holddown (hits): 140(0), Hello (hits): 50(3560)
Rx Count: 1184, Rx Interval (ms) min/max/avg: 1/224/117 last: 10 ms ago
Tx Count: 1186, Tx Interval (ms) min/max/avg: 1/242/117 last: 32 ms ago
Elapsed time watermarks: 0 0 (last: 0)
Registered protocols: OSPF CEF 
Uptime: 00:37:10
Last packet: Version: 1                  - Diagnostic: 0
             State bit: Up               - Demand bit: 0
             Poll bit: 0                 - Final bit: 0
             C bit: 0                                   
             Multiplier: 3               - Length: 24
             My Discr.: 4097             - Your Discr.: 4097
             Min tx interval: 50000      - Min rx interval: 50000
             Min Echo interval: 0  

Cool! Note that I didn't have to reconfigure intervals and multipliers. I still only have bfd interval 50 min_rx 50 multiplier 3 configured on my physical interface, but previously it applied to echo packets, and now that echo is off it applies to control packets. Kinda confusing, if you ask me.
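This lines up with how RFC 5880 (section 6.8.4) computes the Asynchronous mode detection time: the Detect Mult received from the remote system, multiplied by the agreed remote transmit interval, i.e. the greater of our Required Min RX Interval and the remote's Desired Min TX Interval. A small sketch (the function name is my own) with the two sets of values seen in this post:

```python
def async_detection_time_us(local_required_min_rx: int,
                            remote_desired_min_tx: int,
                            remote_detect_mult: int) -> int:
    """RFC 5880 section 6.8.4, Asynchronous mode (all values in microseconds)."""
    agreed_remote_tx = max(local_required_min_rx, remote_desired_min_tx)
    return remote_detect_mult * agreed_remote_tx


# Echo off: "bfd interval 50 min_rx 50 multiplier 3" now drives the control packets.
print(async_detection_time_us(50_000, 50_000, 3) / 1000)        # 150.0 (ms)
# Echo on: the control packets ran with the 1 s values from the first output.
print(async_detection_time_us(1_000_000, 1_000_000, 3) / 1000)  # 3000.0 (ms)
```

So with echo off, the same configuration still yields 150 ms detection, whereas with echo on, control packets alone would need 3 s to declare the session down. If the two sides disagree, the slower value wins: a local min_rx of 100 ms against a remote 50 ms transmit interval gives 3 x 100 ms = 300 ms.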

Just to be sure, let's go through the capture:

Yes, now there are just regular control packets. Cool! I still can't think of why I'd want to use only this mode, aside from situations where your devices don't support echo.

