BGP Graceful-Restart Feature (GR)

NSF/SSO, NSR, Graceful Restart

Nonstop forwarding (NSF) refers to the capability of the data plane to continue forwarding IP packets while the control plane is momentarily unavailable, most commonly during an RP switchover (failover to a standby RP).

Stateful switchover (SSO) refers to the capability of the control plane to preserve configuration and various states across this switchover, which reduces the time before the newly active control plane is usable. This is also useful for scheduled hitless upgrades performed through ISSU. The time for the newly active RP to reach SSO may vary depending on the type and scale of the configuration.

Graceful restart (GR) refers to the capability of a router to delay advertising the absence of a peer that is going through a control-plane switchover for a “grace period”, which helps minimize disruption during that time (assuming the standby control plane comes up). GR is based on per-protocol extensions that are interoperable across vendors. The downside of the grace period is significant when the peer fails completely and never comes back up, because it slows overall network convergence. That brings us to the final concept: nonstop routing (NSR).

NSR is an internal (vendor-specific) mechanism to extend the awareness of routing to the standby control plane so that in case of failover, the newly active control plane can take charge of the already established sessions.

BGP Graceful-Restart Feature

The BGP Graceful-Restart (GR) feature allows a BGP speaker to express its ability to preserve forwarding state during a BGP restart or Route Processor (RP) switchover. In other words, it is the capability that BGP speakers exchange to indicate their ability to perform Nonstop Forwarding (NSF). This helps minimize the impact on services caused by a BGP restart. In large network deployments, where BGP carries a large number of prefixes, a BGP restart, especially on a route-reflector (RR) router, can have a severe performance and service impact and can lead to major outages.


Consider a router, R1, that is acting as the RR and is peering with multiple clients. If there is a BGP restart or RP switchover on R1, the peers detect the session flap and propagate routing updates throughout the network. This can lead to increased CPU utilization if the RR is holding a large BGP table. Traffic destined to the prefixes that were removed is impacted.

Impact of Node Failure in a Network with BGP Route Reflectors

RFC 4724 defines the GR mechanism for BGP. The BGP GR was developed with the following motivations:

  • Avoid widespread routing changes.
  • Decrease control plane overhead throughout the network.
  • Enhance overall stability of routing.

A GR-capable device announces to its BGP peer its ability to perform GR. It initiates the graceful-restart process when an RP switchover occurs, and it also acts as a GR-aware device. A GR-aware device, also described as operating in GR helper mode, understands that a peer router is transitioning and takes appropriate actions based on the configured or default timers.

GR capability should always be enabled for all routing protocols, especially when the routers run with dual route processors (RPs) and perform a switchover on failure. Because BGP runs on TCP, GR should be enabled on both peering devices. After GR is configured or enabled on both peering devices, reset the BGP session to exchange the capability and activate the GR feature.

Note: GR is always on by default for non-TCP–based protocols such as Interior Gateway Protocols (IGPs).

BGP GR is an optional feature and is not enabled by default. BGP peers announce the GR capability in the BGP OPEN message. The capability carries the following information:

  • Restart Flags: Contain the Restart State bit, which indicates whether the peer sending the GR capability has just restarted.
  • Restart Time: Indicates the length of time, in seconds, that the sender of the GR capability requires to complete a restart. The restart timer also helps speed up convergence in the event the peer never comes back up after a restart.
  • AFI/SAFI: Address family for which GR is supported.
  • AFI Flags: Contain the Forwarding State bit, which indicates whether the peer sending the GR capability has preserved its forwarding state for that address family during the previous restart.
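The fields listed above can be illustrated with a small decoder. This is a minimal sketch, assuming the capability value layout from RFC 4724 (4 bits of restart flags, a 12-bit restart time, then one 4-byte entry per AFI/SAFI). The class and function names are hypothetical and are not taken from any router software or BGP library.

```python
import struct
from dataclasses import dataclass
from typing import List

GR_CAPABILITY_CODE = 64  # capability code assigned to Graceful Restart (RFC 4724)


@dataclass
class AfiSafiEntry:
    afi: int
    safi: int
    forwarding_preserved: bool  # Forwarding State (F) bit


@dataclass
class GracefulRestartCapability:
    restart_state: bool  # Restart State (R) bit
    restart_time: int    # seconds, 12-bit field
    families: List[AfiSafiEntry]


def parse_gr_capability(value: bytes) -> GracefulRestartCapability:
    """Decode the value portion of a Graceful Restart capability."""
    if len(value) < 2 or (len(value) - 2) % 4 != 0:
        raise ValueError("malformed Graceful Restart capability")
    (head,) = struct.unpack_from("!H", value, 0)
    restart_state = bool(head & 0x8000)  # most significant bit of the flags
    restart_time = head & 0x0FFF         # low 12 bits
    families = []
    for offset in range(2, len(value), 4):
        afi, safi, flags = struct.unpack_from("!HBB", value, offset)
        families.append(AfiSafiEntry(afi, safi, bool(flags & 0x80)))
    return GracefulRestartCapability(restart_state, restart_time, families)


# Hypothetical example: R bit set, restart time 120 s, IPv4 unicast with F bit set.
example = bytes([0x80, 0x78, 0x00, 0x01, 0x01, 0x80])
cap = parse_gr_capability(example)
print(cap.restart_state, cap.restart_time, cap.families)
```

A capability with no AFI/SAFI entries is still valid input here; it simply yields an empty families list.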

When BGP restarts on the peer router or an RP switchover occurs, the routes currently held in the forwarding table (that is, in hardware) are marked as stale. The forwarding state is preserved in this way because the control plane and the forwarding plane operate independently. The process then proceeds as follows:

  1. On the restarting peer (where the switchover occurred), BGP on the newly active RP starts to establish sessions with all the configured peers.
  2. BGP on the other side, the nonrestarting side, sees new connection requests coming in while its BGP session is still in the Established state. Such an event indicates to the nonrestarting peer that the peer has restarted. At this point, the restarting peer sends the GR capability with the Restart State bit set to 1 and the Forwarding State bit set to 1 for the AFI/SAFIs.
  3. The nonrestarting peer at this point cleans up old (dead) BGP sessions and marks all the routes in the BGP table that are received from the restarting peer as stale.
    • If the restarting peer never reestablishes the BGP session, the nonrestarting peer purges all stale routes after the Restart Time expires.
    • The nonrestarting peer sends an initial routing table update, followed by an End-of-RIB (EoR) marker.
    • The restarting peer defers best-path calculation for an AFI until it has received EoR from all peers, except those that are not GR capable or that have the Restart State bit set.
  4. The restarting peer finally generates updates for its peers and sends the EoR marker for each AFI after the initial table is sent.
  5. The nonrestarting peers receive the routing updates from the restarting peer and remove the stale marking from any refreshed route. They purge any remaining stale routes after EoR is received from the restarting peer or when the stale path timer expires.
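The helper-side handling in steps 3 and 5 can be sketched as a small bookkeeping class. This is a simplified illustration, not any vendor's implementation: the class and method names are hypothetical, timers are not modeled, and purge_stale stands in for every event that triggers a purge (EoR received, Restart Time expiry, or stale path timer expiry).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HelperRib:
    """Routes learned from one restarting peer, keyed by prefix."""
    routes: Dict[str, dict] = field(default_factory=dict)

    def learn(self, prefix: str, attrs: str) -> None:
        # A new or refreshed route is never stale.
        self.routes[prefix] = {"attrs": attrs, "stale": False}

    def peer_restarted(self) -> None:
        # Step 3: keep the routes but mark everything from the peer as stale.
        for route in self.routes.values():
            route["stale"] = True

    def purge_stale(self) -> List[str]:
        # Step 5: remove whatever was not refreshed by the restarting peer.
        removed = sorted(p for p, r in self.routes.items() if r["stale"])
        for prefix in removed:
            del self.routes[prefix]
        return removed


# Hypothetical prefixes and attributes, for illustration only.
rib = HelperRib()
rib.learn("10.0.0.0/24", "via R1")
rib.learn("10.0.1.0/24", "via R1")
rib.peer_restarted()                 # session loss detected, routes kept as stale
rib.learn("10.0.0.0/24", "via R1")   # refreshed after the session re-establishes
purged = rib.purge_stale()           # EoR received from the restarting peer
print(purged, list(rib.routes))
```

Only the prefix that the restarting peer did not re-advertise is removed; forwarding for the refreshed prefix is never interrupted.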


GR is configured with the following commands:

  • Use the command bgp graceful-restart to enable GR globally.
  • Use the command bgp graceful-restart restart-time value to set the GR restart timer.
  • Use the command bgp graceful-restart stalepath-time value to set the maximum time for which the router maintains the stale path entries in case it does not receive an EoR from the restarting peer.

The GR restart timer, which defaults to 120 seconds, takes care of clearing the stale path entries in case the BGP peer does not come up within this time period.

If a BGP session is already in the Established state before GR is configured, the session must be reset in order to exchange the GR capability.

The GR capability is verified by using the command show bgp afi safi neighbors ip-address. In the command output, the GR capability should appear in both the advertised and received states. If either the advertised or the received state is missing, one of the peers does not have GR configured, or GR was configured after the session came up.

// IOS and NX-OS
router bgp 100
 bgp graceful-restart
 bgp graceful-restart restart-time 300
 bgp graceful-restart stalepath-time 400

// IOS XR
router bgp 100
 bgp graceful-restart
 bgp graceful-restart restart-time 300
 bgp graceful-restart stalepath-time 400
 bgp graceful-restart purge-time 400

Sometimes not all peers are GR capable, nor do they need to be. GR can also be configured on a per-neighbor basis while leaving it disabled globally. This exchanges the GR capability only with those neighbors for which forwarding should not be impacted, or should be impacted the least.

  • On Cisco IOS XR and NX-OS, GR is enabled for an individual neighbor using the command neighbor ip-address graceful-restart.
  • On Cisco IOS software, the command is neighbor ip-address ha-mode graceful-restart.

// IOS
router bgp 100
 neighbor ip-address ha-mode graceful-restart

// NX-OS and IOS XR
router bgp 100
 neighbor ip-address graceful-restart

Cisco’s implementation of GR assumes NSF is enabled and tells the peers: “If I ever drop this session, it is because I am failing over from primary RP to secondary RP and will keep forwarding packets.” This makes the peer think that it needs to keep sending packets.

This scenario works as long as there is no reload or reboot of the router. If the router goes down, the neighbor router keeps sending packets to it instead of forwarding the traffic over a working path, because it assumes the restarting router is performing a switchover and still has an up-to-date Forwarding Information Base (FIB). The traffic is black-holed, which causes an outage.

The problem is not with the feature itself but with how the relationship between GR and NSF is understood. GR does not mean that NSF is enabled; it only assumes that NSF is enabled on the router. NSF is not configurable but is enabled by default when the router is running in Stateful Switchover (SSO) mode. NSF can also be described as a function that checkpoints the FIB to the standby RP.

Stateful Switchover (SSO) is a redundancy feature that allows a Cisco device with two route processors to synchronize router configuration and control plane state information. In a modular chassis with dual supervisors, NSF/SSO synchronizes information between the primary and backup supervisor, allowing for rapid supervisor switchover in case the primary fails.

Note: It is important to understand the different high-availability operating modes of routers and switches with dual RPs.

  • Stateful Switchover (SSO): Failover from the active RP (crashing or reloading) to the standby RP (which takes over the active role), where state is preserved and the standby RP was in hot-standby mode before the switchover.
  • RPR+: RP redundancy mode where the standby RP is partially initialized, but there is no synchronization of state.

Features such as NSF, Nonstop Routing (NSR), and GR require the router to be in SSO mode.

| Parameter | RPR | RPR+ | SSO |
|---|---|---|---|
| Failover Time | 2-4 minutes | 30-60 seconds | 2-4 seconds |
| Status in "show module" output | Cold | Warm | Hot |
| Backup SUP Engine Status | The backup SUP engine is partially initialized and must reload every switch module after the primary engine fails. | The backup SUP engine is partially initialized but doesn't need to reload each switch module after the primary engine fails. | The backup SUP engine is completely initialized and Layer 2 information is synchronized with the primary engine. |
| Configuration | redundancy / mode rpr | redundancy / mode rpr-plus | redundancy / mode sso |
| FIB Table Status | The backup SUP engine doesn't have the FIB table synchronized. All tables must be rebuilt after the backup engine is initialized. | The backup SUP engine doesn't have the FIB table synchronized. All tables must be rebuilt after the backup engine is initialized. | The FIB table is not flushed since it is already updated. |
| NSF | Not supported | Not supported | Supported |
| NetFlow Records | Not maintained | Not maintained | Maintained |
