Understanding BGP Route Convergence

BGP Route Convergence

What is Routing Convergence? Routing convergence can be broadly defined as how quickly a routing protocol can become stable after changes occur in the network, for example, a protocol or link flap.

Faster convergence leads to higher availability and improved network stability. Thus it is important that before the network is deployed in production, convergence time is properly calculated with thorough testing. But what is convergence time?

If a link on the primary path fails, the best path is affected and traffic is lost. The failure event triggers computation of a next-best path. The convergence time is the interval during which traffic is lost: from the moment of the failure, while no alternate path is available to forward traffic, until traffic starts flowing again.
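One common way to measure this interval in a test lab is to send traffic at a constant rate and count the packets lost during the failover. The sketch below assumes a constant send rate and that all loss is caused by the failure; the function name and the sample numbers are illustrative, not taken from any specific test tool.

```python
def convergence_time_seconds(packets_lost: int, packets_per_second: float) -> float:
    """Estimate convergence time from packet loss at a constant send rate."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return packets_lost / packets_per_second


# Hypothetical test run: 10,000 packets/s offered, 3,500 packets lost during failover.
estimate = convergence_time_seconds(3500, 10_000)
print(f"{estimate:.2f} s")  # prints "0.35 s"
```

A higher send rate gives finer resolution, since each lost packet then represents a smaller slice of time.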

Like any other dynamic routing protocol, BGP accepts routing updates from its neighbors. If a received route is selected as the best route, BGP advertises it to its peers, except the one from which it was received. BGP uses an explicit withdrawn-routes section in the UPDATE message to inform peers that a path has been lost so they can update their BGP tables accordingly.

Topology with Primary and Secondary Path

As networks grow larger, maintaining an ever-increasing number of Transmission Control Protocol (TCP) sessions and routes can pose scalability and convergence challenges, especially for service provider and enterprise networks. As the scale of the network increases, the BGP process has to process all the routes present in the BGP table and update its peers. In addition, a router processing updates in such a scaled environment demands more memory and CPU resources. Because BGP is a key protocol for the Internet, it is important to ensure that BGP converges quickly even at increased scale.

BGP convergence depends on various factors. BGP convergence is all about the speed of the following:

  • Establishing sessions with a number of peers
  • Locally generating all the BGP paths, either via network statements or redistribution of static, connected, or IGP routes, and/or from other components for other address families (for example, Multicast Virtual Private Network (MVPN) from multicast, Layer 2 Virtual Private Network (L2VPN) from the l2vpn manager, and so on)
  • Sending and receiving multiple BGP tables, that is, different BGP address families, to and from each peer
  • Performing the best-path calculation, once all paths have been received from peers, to find the best path and/or multipath, additional path, and backup path
  • Installing the best path into multiple routing tables, such as the default or Virtual Routing and Forwarding (VRF) routing table
  • Running the import and export mechanisms
  • Passing the path calculation result to the relevant lower-layer components for other address families, such as l2vpn or multicast

BGP uses a lot of CPU cycles when processing BGP updates and requires memory for maintaining BGP peers and routes in the BGP table. Appropriate hardware should be chosen based on the role of the BGP router in the network. The more memory a router has, the more routes it can support, much as a router with a faster CPU can support a larger number of peers.

BGP updates rely on TCP. Optimizing router resources such as memory, along with TCP session parameters such as maximum segment size (MSS), path MTU discovery, interface input queues, and TCP window size, helps improve convergence.
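To see why the MSS matters, it helps to count how many TCP segments are needed to carry the same amount of BGP update data. The sketch below is simple arithmetic under stated assumptions: the 50 MB table size is hypothetical, 536 bytes is a commonly seen MSS when path MTU discovery is not in use, and 1460 bytes corresponds to a 1500-byte Ethernet MTU. It ignores headers, retransmissions, and windowing.

```python
def segments_needed(payload_bytes: int, mss: int) -> int:
    """Number of full or partial TCP segments needed to carry payload_bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return (payload_bytes + mss - 1) // mss  # integer ceiling division


table_bytes = 50_000_000  # hypothetical volume of BGP update data
for mss in (536, 1460):
    print(f"MSS {mss}: {segments_needed(table_bytes, mss)} segments")
```

Fewer, larger segments mean fewer packets for the receiver's input queue and CPU to handle for the same table.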

Scenario

  • R1's session to R7 has just come up. We follow the path that prefix 20.0.0.0/8 takes as it propagates through AS 300.

BGP Read-Only Mode

Upon session establishment and exchange of the BGP OPEN messages, the router enters "BGP Read-Only Mode". This means that R1 will not start the BGP best-path selection process until it either receives all prefixes from R7 or reaches the BGP read-only mode timeout. The timeout is defined using the BGP process command bgp update-delay.

The reason to hold the BGP best-path selection process is to ensure that the peer has supplied us with all of its routing information. This minimizes the number of best-path selection runs, simplifies update generation, and ensures better prefix-per-message packing, thus improving transport efficiency.

BGP Update-Delay Timer Read-Only Mode (Update Reception)

This timer ensures that the peer has supplied us with all of its routing information in order to minimize the number of BGP best-path runs, simplify update generation, and pack routes into TCP segments more efficiently.

When BGP establishes its first peer, a timer called the update-delay is triggered. By default it is set to 120 seconds, and the BGP best-path algorithm will not run until this timer expires or until the peer signals that it has sent all routes. The peer can signal that it is done by sending either a BGP Keepalive or the BGP End-of-RIB message, which is normally used with graceful restart (GR).

The BGP End-of-RIB message is normally used for BGP graceful restart, but it can also be used to explicitly signal the end of the BGP UPDATE exchange. Even if the BGP process does not support the End-of-RIB marker, Cisco's BGP implementation always sends a Keepalive message when it finishes sending updates to a peer.

Clearly, the best-path selection delay will be longer when peers have to exchange larger routing tables, or when the underlying TCP transport and router ingress queue settings make the exchange slower.
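The exit condition for read-only mode described above can be sketched as a small function. This is a simplified single-peer model, not Cisco's implementation: the function name and parameters are hypothetical, and times are measured in seconds from the moment the session comes up.

```python
from typing import Optional


def best_path_start(update_delay: float, end_of_updates_at: Optional[float]) -> float:
    """Seconds after session establishment at which best-path selection may begin.

    end_of_updates_at is when the peer's Keepalive or End-of-RIB marker arrived,
    or None if no such signal was seen before the timer expired.
    """
    if end_of_updates_at is None:
        return update_delay
    # Whichever happens first ends read-only mode.
    return min(update_delay, end_of_updates_at)


print(best_path_start(120, 35.0))   # peer finished early: 35.0
print(best_path_start(120, None))   # no signal seen: 120
print(best_path_start(120, 300.0))  # table exchange too slow: 120
```

The third case illustrates the risk of a timer that is too short for the table size: best-path selection starts on incomplete information and has to run again as more prefixes arrive.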

Defaults: 120 seconds
bgp update-delay seconds [always]
no bgp update-delay [seconds] [always]

				
router bgp 64530
 bgp update-delay 240

BGP Best-Path Selection

When a BGP router leaves read-only mode, it starts the best-path selection process. This process walks over the new information and compares it with the local BGP RIB contents, selecting the best path for every prefix. As soon as the best-path process is finished, BGP has to install all best routes into the RIB before advertising them to its peers.

This is a requirement of distance vector protocols: routing information must be active in the RIB before it is propagated further. The RIB update will in turn trigger a FIB upload to the router's line cards if the platform supports distributed forwarding. Both RIB and FIB updates are time-consuming and take time proportional to the number of prefixes being updated.

BGP Advertisement-Interval Timer (Update Generation)

The primary cause of slow BGP convergence is the Minimum Route Advertisement Interval (MRAI). This timer forces BGP routers to wait at least that amount of time before sending another advertisement for the same prefix.

The goal of this timer is to reduce route churn and produce fewer BGP updates, but it does slow down convergence. Instead of sending flash updates triggered by each change, BGP waits for the advertisement interval to expire before sending out the BGP update. This way, if there are other changes to advertise, the BGP process can prepare a more efficient update.

After information has been committed to the RIB, the router needs to replicate the best paths to every peer that should receive them. The replication process can be the most memory- and CPU-intensive step, as the process has to perform a full BGP table walk for every peer and construct the output for the corresponding BGP Adj-RIB-Out. This may require additional transient memory during the update batch calculation. However, the update generation process is highly optimized in Cisco's BGP implementation by means of dynamic update groups.

With dynamic update groups, the BGP process dynamically finds all neighbors that share the same outbound policy, elects the peer with the lowest IP address as the group leader, and generates the update batch only for the group leader. All other members of the same group receive the same updates.
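The grouping logic can be sketched in a few lines. This is an illustration of the idea only, not Cisco's implementation: the peer addresses and policy names are hypothetical, each neighbor's outbound policy is reduced to a single string, and all peers are assumed to be IPv4.

```python
from collections import defaultdict
from ipaddress import ip_address
from typing import Dict, List


def build_update_groups(neighbors: Dict[str, str]) -> Dict[str, dict]:
    """Group peers by outbound policy; neighbors maps peer IP -> policy name."""
    by_policy: Dict[str, List[str]] = defaultdict(list)
    for peer, policy in neighbors.items():
        by_policy[policy].append(peer)

    groups = {}
    for policy, peers in by_policy.items():
        # Sort numerically, not as strings, so 10.0.0.9 comes before 10.0.0.11.
        ordered = sorted(peers, key=ip_address)
        groups[policy] = {"leader": ordered[0], "members": ordered}
    return groups


# Hypothetical addresses for two route reflectors and one other peer.
groups = build_update_groups({
    "10.0.0.12": "to-route-reflectors",
    "10.0.0.11": "to-route-reflectors",
    "10.0.0.5": "to-r5",
})
for policy, group in groups.items():
    print(policy, group["leader"], group["members"])
```

With three neighbors but only two distinct outbound policies, the table walk and update formatting happen twice instead of three times.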

In our case, R1 has to generate two update sets: one for R5 and another for the pair of RR1 and RR2 route reflectors.

R1 starts sending updates to R5, RR1, and RR2. This will take some time, depending on the BGP TCP transport settings and BGP table size. However, before R1 starts sending updates to any peer or update group, it checks whether the advertisement-interval timer is running for that peer.

A BGP speaker starts this timer on a per-peer basis every time it finishes sending a full batch of updates to the peer. If a subsequent batch is ready to be sent while the timer is still running, the update is delayed until the timer expires. This is a dampening mechanism that prevents unstable peers from flooding the network with updates. The timer really comes into play only for "Down-Up" or "Up-Down" convergence, as any rapid flapping changes are delayed by the advertisement-interval.
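The effect of the timer on a flapping route can be sketched with a small simulation. It is a simplified per-peer model under stated assumptions: sending a batch is instantaneous, the timer restarts at each send, and every change that arrives while an update is already waiting is folded into that update. The function name and event times are hypothetical.

```python
from typing import Iterable, List


def advertisement_times(change_times: Iterable[float], interval: float) -> List[float]:
    """Times at which updates are sent to one peer, given when changes occur."""
    sent: List[float] = []
    for t in sorted(change_times):
        if sent and t <= sent[-1]:
            # An update is already scheduled for sent[-1]; this change rides along.
            continue
        # The timer started at the previous send; wait for it if it is still running.
        earliest = sent[-1] + interval if sent else t
        sent.append(max(t, earliest))
    return sent


flaps = [0, 3, 7, 12, 40]  # seconds at which the route changes
print(advertisement_times(flaps, 30))  # [0, 30, 60]
print(advertisement_times(flaps, 0))   # [0, 3, 7, 12, 40]
```

With a 30-second interval the five changes collapse into three updates, but the change at 40 seconds is not advertised until 60 seconds, which is the convergence cost the text describes.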

The process repeats itself on RR1 and RR2, starting with reception of the incoming UPDATE messages, followed by best-path selection and update generation.

As we can see, the main limiting factors of BGP convergence are BGP table size, transport-level settings, and advertisement delay. Both the best-path selection time and the time required for update batching are proportional to the table size.

Defaults: IBGP 5 seconds / EBGP 30 seconds
Command: neighbor {ip-address | peer-group-name} advertisement-interval seconds
If an advertised route is flapping, usually because an interface is unstable, a flood of UPDATE messages carrying advertisements and withdrawals occurs.
With the default value of 30 seconds for EBGP neighbors, BGP routing updates are sent only every 30 seconds, even if a route flaps many times during this 30-second interval.

				
router bgp 1
 neighbor 10.1.1.1 remote-as 1
 neighbor 10.2.1.2 remote-as 2
 neighbor 10.1.1.1 advertisement-interval 15
 neighbor 10.2.1.2 advertisement-interval 45
exit

Update Generation Improvements

The following methods improve update generation and form the basis for any BGP convergence tuning:

  • Peer Groups
  • BGP Dynamic Update Peer Groups
  • BGP read-only mode
