New standard minimizes duration and impact of BGP router failure.
Extending the capabilities of the Border Gateway Protocol (BGP) is one of the first implementations of this initiative.
The Internet-Draft describing the protocol extensions to BGP is called "Graceful Restart Mechanism for BGP". Using BGP graceful restart, the data-forwarding plane can continue to process and forward packets even if the control plane - which is responsible for determining best paths - fails.
Graceful restart also reduces routing flaps, which stabilizes the network and reduces the consumption of control-plane resources.
BGP is an especially strong candidate for high-availability modifications. One reason is that it has been designed to carry a large number of routes. Convergence after a BGP software failure, then, usually takes longer than with other routing protocols, resulting in an outage of greater duration. In addition, because BGP is typically deployed at the WAN edge - where corporate and service provider networks meet - the effect of a failed BGP process can propagate across multiple networks rather than being confined to one domain.
BGP graceful restart was developed to minimize the duration and reach of an outage associated with a failed BGP process. To do so, the software extensions must be deployed on the router restarting the BGP process and on that router's BGP peers. The peers help the BGP process regain lost forwarding information and also help isolate failures from the rest of the network.
The protocol modifications take effect when the initial BGP connection is established. Both the restarting router and its peers indicate that they support the BGP graceful restart mechanism by exchanging a new BGP capability (BGP capability code 64) in the OPEN messages that establish the session.
The restarting router also gives its peers a list of IP-based protocols for which it can maintain forwarding state across a BGP restart. This list could include such protocols as IPv4, IPv6, IP Multicast and Multiprotocol Label Switching.
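The capability exchange can be sketched in a few lines of Python. This is an illustration only: it assumes the field layout later published in RFC 4724 (4 bits of restart flags, a 12-bit restart time in seconds, then one 4-byte entry per address family), and the function and constant names are hypothetical, not taken from any router implementation.

```python
import struct

GRACEFUL_RESTART_CAPABILITY = 64  # capability code named in the article

# (AFI, SAFI) pairs using IANA-assigned numbers; the constant names are illustrative.
IPV4_UNICAST = (1, 1)
IPV4_MULTICAST = (1, 2)
IPV6_UNICAST = (2, 1)


def encode_graceful_restart(restart_time, families, restarted=False):
    """Encode capability 64 as code, length and value bytes.

    restart_time: seconds the peer should wait for the session to return (12 bits).
    families: dict mapping (afi, safi) to True if forwarding state is preserved.
    restarted: True when this OPEN follows a restart of the BGP process.
    """
    if not 0 <= restart_time < 4096:
        raise ValueError("restart time must fit in 12 bits")
    restart_flags = 0x8 if restarted else 0x0  # top bit of the 4-bit flags field
    value = struct.pack("!H", (restart_flags << 12) | restart_time)
    for (afi, safi), preserved in families.items():
        family_flags = 0x80 if preserved else 0x00  # top bit: forwarding state kept
        value += struct.pack("!HBB", afi, safi, family_flags)
    return struct.pack("!BB", GRACEFUL_RESTART_CAPABILITY, len(value)) + value
```

For example, `encode_graceful_restart(120, {IPV4_UNICAST: True})` advertises a 120-second restart time (an arbitrary value chosen for the example) and preserved forwarding state for IPv4 unicast.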
When the router restarts its BGP process, the TCP connection to the peer router might be cleared. Under normal circumstances, this would cause the peer router to clear all routes associated with the restarting router. This does not occur with BGP graceful restart, however. Instead, the peer router marks all routes learned from the restarting router as "stale," but continues to use them to forward packets based on the expectation that the restarting router will re-establish the BGP session shortly. Likewise, the restarting router also continues forwarding packets in the interim.
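The peer's side of this behavior might look like the following minimal sketch. The class and method names are hypothetical, and the deadline after which stale routes are dropped is an assumption borrowed from the restart time defined in RFC 4724. Time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    next_hop: str
    stale: bool = False


class StaleRouteTable:
    """Routes learned from one peer that supports graceful restart."""

    def __init__(self, restart_time):
        self.restart_time = restart_time  # seconds to wait for the peer to return
        self.routes = {}
        self.deadline = None

    def learn(self, prefix, next_hop):
        # A fresh announcement replaces any stale copy of the same prefix.
        self.routes[prefix] = Route(prefix, next_hop)

    def session_lost(self, now):
        # Keep every route, but mark it stale and start the clock.
        for route in self.routes.values():
            route.stale = True
        self.deadline = now + self.restart_time

    def lookup(self, prefix):
        # Stale routes are still returned, so forwarding continues.
        return self.routes.get(prefix)

    def expire(self, now):
        # If the deadline has passed, drop whatever is still stale.
        if self.deadline is not None and now >= self.deadline:
            self.routes = {p: r for p, r in self.routes.items() if not r.stale}
            self.deadline = None
```

The key design point is that `session_lost` flags routes instead of deleting them, so `lookup` keeps answering while the peer waits for the session to come back.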
When the restarting router opens the new BGP session, it will again send BGP capability 64 to its peers. But this time, flags will be set in the graceful restart capabilities exchange to let the peer router know that the BGP process has restarted.
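A peer could detect the restart by decoding the capability value it receives in the new session, as in this hedged sketch. It assumes the RFC 4724 layout, in which the most significant bit of the first two bytes is the restart flag and the low 12 bits are the restart time. The function name is hypothetical.

```python
import struct


def parse_graceful_restart(value):
    """Decode the value field of capability 64.

    Returns (restarted, restart_time, families), where families maps
    (afi, safi) to True if forwarding state was preserved.
    """
    if len(value) < 2 or (len(value) - 2) % 4 != 0:
        raise ValueError("malformed graceful restart capability")
    (head,) = struct.unpack("!H", value[:2])
    restarted = bool(head & 0x8000)   # restart flag: the BGP process has restarted
    restart_time = head & 0x0FFF      # low 12 bits, in seconds
    families = {}
    for offset in range(2, len(value), 4):
        afi, safi, flags = struct.unpack("!HBB", value[offset:offset + 4])
        families[(afi, safi)] = bool(flags & 0x80)
    return restarted, restart_time, families
```

A peer would treat `restarted == True` as the cue to resend its routes while it keeps forwarding on the stale entries.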
While continuing to forward packets, the peer router will refresh the restarting router with any relevant BGP routing information base (RIB) updates. The peer signals that it has finished sending the updates with an "End-of-RIB" (EOR) marker - an "empty" BGP update message. EOR markers help speed convergence because once the restarting router has received them from all peers, it knows it can begin best-path selection again using the new routing information. Similarly, the restarting router then sends any updates to its peer routers and uses the EOR marker to indicate the completion of the process.
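The "empty" update and the wait for all peers can be sketched as follows. The byte layout is the standard BGP message header followed by zero-length withdrawn-routes and path-attribute fields, which is the End-of-RIB form for IPv4 unicast in RFC 4724 (other address families use a different encoding that is not shown here). The `EndOfRibTracker` class and the function names are hypothetical.

```python
import struct

BGP_MARKER = b"\xff" * 16  # 16-byte all-ones marker that starts every BGP message
UPDATE = 2                 # BGP message type code for UPDATE


def build_end_of_rib():
    """Build an UPDATE with no withdrawn routes, no attributes and no prefixes."""
    body = struct.pack("!HH", 0, 0)  # withdrawn routes length, path attribute length
    length = len(BGP_MARKER) + 3 + len(body)  # header is 19 bytes, total is 23
    return BGP_MARKER + struct.pack("!HB", length, UPDATE) + body


def is_end_of_rib(message):
    return message == build_end_of_rib()


class EndOfRibTracker:
    """Tracks which peers the restarting router is still waiting on."""

    def __init__(self, peers):
        self.waiting_on = set(peers)

    def on_message(self, peer, message):
        # Returns True once every peer has sent its End-of-RIB marker,
        # meaning best-path selection can begin.
        if is_end_of_rib(message):
            self.waiting_on.discard(peer)
        return not self.waiting_on
```

Waiting for the marker from every peer, rather than relying only on a fixed timer, lets the restarting router start best-path selection as soon as it has the complete picture.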
Throughout this entire recovery procedure, user data packets have continued to flow between the BGP peers.
All the major routing vendors are in agreement on the IETF "Graceful Restart Mechanism for BGP" draft, and the two primary Internet backbone router suppliers currently are shipping code that supports the protocol extensions.