High Availability Overview
Communication interruptions can seriously affect widely deployed value-added services such as IPTV and video conferencing. Therefore, the basic network infrastructure must provide high availability.
There are three effective ways to improve availability:
· Increasing fault tolerance
· Speeding up fault recovery
· Reducing impact of faults on services
Availability requirements
Availability requirements fall into three levels based on purpose and implementation.
Table 1 Availability requirements
Level | Requirement | Solution |
---|---|---|
1 | Decrease system software and hardware faults | Hardware: simplified circuit design, enhanced production techniques, and reliability tests. Software: reliability design and tests. |
2 | Protect system functions from being affected by failures | Device and link redundancy and switchover |
3 | Enable the system to recover as fast as possible | Fault detection, diagnosis, isolation, and recovery technologies |
The level 1 availability requirement should be considered during the design and production process of network devices. The level 2 availability requirement should be considered during network design. The level 3 availability requirement should be considered during network deployment according to the network infrastructure and service characteristics.
Availability evaluation
Typically, Mean Time Between Failures (MTBF) and Mean Time to Repair (MTTR) are used to evaluate the availability of a network.
MTBF
MTBF is the predicted elapsed time between inherent failures of a system during operation. It is typically expressed in hours. A higher MTBF means higher availability.
MTTR
MTTR is the average time required to repair a failed system. MTTR in a broad sense also involves spare parts management and customer services.
MTTR = fault detection time + hardware replacement time + system initialization time + link recovery time + routing time + forwarding recovery time. The smaller each item is, the smaller the MTTR and the higher the availability.
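The MTTR sum above can be combined with MTBF into a single availability figure. The following is a minimal sketch, not part of the original guide: it assumes the commonly used steady-state definition availability = MTBF / (MTBF + MTTR), and the class name `RepairTimes`, the function `availability`, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class RepairTimes:
    """The six MTTR components named in the text, all in hours."""
    fault_detection: float
    hardware_replacement: float
    system_initialization: float
    link_recovery: float
    routing: float
    forwarding_recovery: float

    def mttr(self) -> float:
        # MTTR is the plain sum of the six components.
        return sum(getattr(self, f.name) for f in fields(self))


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up.

    Assumption: steady-state availability = MTBF / (MTBF + MTTR).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative values only; they do not describe any real device.
example = RepairTimes(
    fault_detection=0.5,
    hardware_replacement=2.0,
    system_initialization=0.25,
    link_recovery=0.125,
    routing=0.0625,
    forwarding_recovery=0.0625,
)
print(f"MTTR = {example.mttr()} h")
print(f"Availability = {availability(9997.0, example.mttr()):.4%}")
```

Under this definition, shrinking any single component (for example, fault detection time) lowers MTTR and raises availability, which is the effect the technologies in the following sections aim for.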
High availability technologies
As previously mentioned, increasing MTBF or decreasing MTTR can enhance the availability of a network. The high availability technologies described in this section address the level 3 availability requirement by decreasing MTTR.
High availability technologies can be classified as fault detection technologies or protection switchover technologies.
Fault detection technologies
Table 2 Fault detection technologies
Technology | Introduction | Reference |
---|---|---|
CFD | Connectivity Fault Detection (CFD), which conforms to IEEE 802.1ag Connectivity Fault Management (CFM) and ITU-T Y.1731, is an end-to-end per-VLAN link layer Operations, Administration and Maintenance (OAM) mechanism used for link connectivity detection, fault verification, and fault location. | Chapter "Configuring CFD" |
DLDP | The Device Link Detection Protocol (DLDP) detects unidirectional links that may occur in a network. On detecting a unidirectional link, DLDP, depending on its configuration, either shuts down the affected port automatically or prompts users to take action, so that network problems are avoided. | Chapter "Configuring DLDP" |
Ethernet OAM | Ethernet OAM is a tool for monitoring Layer 2 link status, mainly used to address common link-related issues on the "last mile". You can monitor the status of the point-to-point link between two directly connected devices by enabling Ethernet OAM on both of them. | Chapter "Configuring Ethernet OAM" |
BFD | Bidirectional Forwarding Detection (BFD) provides a single mechanism to quickly detect and monitor the connectivity of links or IP forwarding in networks. To improve network performance, devices must detect communication failures quickly so that communication can be restored through backup paths as soon as possible. | Chapter "Configuring BFD" |
NQA | Network Quality Analyzer (NQA) analyzes network performance, services, and service quality by sending test packets, and provides network performance and service quality parameters such as jitter, TCP connection delay, FTP connection delay, and file transfer rate. | Network Management and Monitoring Configuration Guide |
Monitor Link | Monitor Link works together with Layer 2 topology protocols to adapt the up/down state of a downlink port to the state of an uplink port. This feature enables fast link switchover on a downstream device in response to an uplink state change on its upstream device. | Chapter "Configuring Monitor Link" |
Track | The track module implements collaboration between different modules. The collaboration involves three parts: the application modules, the track module, and the detection modules, which work together through collaboration entries. The detection modules probe link status, network performance, and so on, and inform the application modules of the detection results through the track module. Once notified of network status changes, the application modules respond to the changes to avoid communication interruption and network performance degradation. | Chapter "Configuring track" |
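The three-part collaboration described for the track module (detection modules, track module, application modules) resembles a publish/subscribe arrangement. The sketch below is a loose illustration of that idea under stated assumptions and is not the device's implementation: the class names `TrackEntry` and `BackupPathApp`, the state strings, and the callback signature are all hypothetical.

```python
from typing import Callable, List

# Hypothetical state names used only in this sketch.
UNKNOWN, UP, DOWN = "UNKNOWN", "UP", "DOWN"


class TrackEntry:
    """Stands in for one collaboration entry held by the track module."""

    def __init__(self, entry_id: int) -> None:
        self.entry_id = entry_id
        self.state = UNKNOWN
        self._subscribers: List[Callable[[int, str], None]] = []

    def subscribe(self, callback: Callable[[int, str], None]) -> None:
        # An application module registers interest in this entry.
        self._subscribers.append(callback)

    def report(self, reachable: bool) -> None:
        # A detection module reports its latest probe result.
        new_state = UP if reachable else DOWN
        if new_state == self.state:
            return  # no change, so applications are not notified again
        self.state = new_state
        for callback in self._subscribers:
            callback(self.entry_id, new_state)


class BackupPathApp:
    """Hypothetical application module that falls back to a backup path."""

    def __init__(self) -> None:
        self.using_backup = False

    def on_track_change(self, entry_id: int, state: str) -> None:
        # React to the state change relayed by the track entry.
        self.using_backup = state == DOWN


entry = TrackEntry(entry_id=1)
app = BackupPathApp()
entry.subscribe(app.on_track_change)

entry.report(reachable=True)    # detection result: target reachable
entry.report(reachable=False)   # detection result: target unreachable
print(app.using_backup)
```

The point of the indirection is that the application never talks to the detection module directly, so either side can be swapped without changing the other.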
Protection switchover technologies
Protection switchover technologies aim at recovering from network faults. They back up hardware, link, routing, and service information so that a switchover can take place when a network fault occurs, ensuring continuity of network services.
Table 3 Protection switchover technologies
Technology | Introduction | Reference |
---|---|---|
Active and standby switchover | Devices that support active and standby switchover are normally equipped with two main boards: one is the active main processing unit (MPU), and the other is the standby MPU. The configuration on the standby MPU is the same as that on the active MPU. When the active MPU fails or is removed, the standby MPU automatically becomes the active MPU to ensure non-stop operation of the device. | Chapter "Configuring active and standby switchover" |
Ethernet Link Aggregation | Ethernet link aggregation, most often simply called link aggregation, aggregates multiple physical Ethernet links into one logical link to increase link bandwidth beyond the limit of any single link. This logical link is called an aggregate link. It allows for link redundancy because the member physical links can dynamically back up one another. | Layer 2—LAN Switching Configuration Guide |
Smart Link | Smart Link is a feature developed to address the slow convergence issue with STP. It provides link redundancy as well as fast convergence in a dual uplink network, allowing the backup link to take over quickly when the primary link fails. | Chapter "Configuring Smart Link" |
MSTP | As a Layer 2 management protocol, the Multiple Spanning Tree Protocol (MSTP) eliminates Layer 2 loops by selectively blocking redundant links in a network, and at the same time allows for link redundancy. | Layer 2—LAN Switching Configuration Guide |
RPR | Resilient Packet Ring (RPR) is a MAC layer protocol designed for transferring mass data services over MANs. It can operate on synchronous optical network/synchronous digital hierarchy (SONET/SDH), Dense Wavelength Division Multiplexing (DWDM), and Ethernet to provide flexible and efficient networking schemes for broadband IP MAN carriers. | Chapter "Configuring RPR" |
RRPP | The Rapid Ring Protection Protocol (RRPP) is a link layer protocol designed for Ethernet rings. RRPP can prevent broadcast storms caused by data loops when an Ethernet ring is healthy, and rapidly restore the communication paths between the nodes when a link on the ring is disconnected. | Chapter "Configuring RRPP" |
FRR | Fast Reroute (FRR) provides quick per-link or per-node protection on an LSP. When a link or node fails on a path, FRR reroutes the path over a new link or node to bypass the failed one. The reroute can complete in as little as 50 milliseconds, which minimizes data loss. Protocols such as RIP, OSPF, IS-IS, static routing, and RSVP-TE support this technology. | Layer 3—IP Routing Configuration Guide, MPLS Configuration Guide, and Configuration Guide of the corresponding protocols |
GR | Graceful Restart (GR) ensures the continuity of packet forwarding when a protocol, such as BGP, IS-IS, OSPF, LDP, or RSVP-TE, restarts or when an active/standby switchover occurs. It requires the cooperation of other devices to back up and recover routing information. | Layer 3—IP Routing Configuration Guide, MPLS Configuration Guide, and Configuration Guide of the corresponding protocols |
NSR | Non-stop Routing (NSR) ensures non-stop data transmission during an active/standby switchover. It backs up IP/MPLS forwarding information from the active MPU to the standby MPU. Upon an active/standby switchover, NSR can complete link state recovery and route regeneration without requiring the cooperation of other devices. IS-IS supports this feature. | Layer 3—IP Routing Configuration Guide |
VRRP | Virtual Router Redundancy Protocol (VRRP) is a fault-tolerant protocol that provides highly reliable default links on multicast and broadcast LANs such as Ethernet, avoiding network interruption due to the failure of a single link. | Chapter "Configuring VRRP" |
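Several rows of Table 3 (for example, active and standby switchover and Smart Link) share one pattern: a primary element carries traffic and a backup takes over when the primary fails. The following is a minimal sketch of that pattern only, with hypothetical names (`Link`, `ProtectionGroup`); it does not model any specific protocol. It also assumes that traffic always returns to the primary once it recovers, which real features may or may not do depending on their configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """Hypothetical link with a simple up/down state."""
    name: str
    up: bool = True


class ProtectionGroup:
    """Selects which of two redundant links should carry traffic."""

    def __init__(self, primary: Link, backup: Link) -> None:
        self.primary = primary
        self.backup = backup

    def active(self) -> Optional[Link]:
        # Prefer the primary whenever it is up (assumed behavior).
        if self.primary.up:
            return self.primary
        # Otherwise fall back to the backup if it is usable.
        if self.backup.up:
            return self.backup
        # Both links are down: nothing can carry traffic.
        return None


group = ProtectionGroup(Link("uplink-1"), Link("uplink-2"))
print(group.active().name)   # primary is up

group.primary.up = False     # simulate a primary link failure
print(group.active().name)   # backup takes over
```

How quickly the failure is noticed is a separate concern handled by the fault detection technologies in Table 2; this sketch covers only the selection step that follows detection.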
A single availability technology cannot solve all problems. Therefore, a combination of availability technologies, chosen on the basis of a detailed analysis of the network environment and user requirements, should be used to enhance network availability. For example, access-layer devices should be connected to distribution-layer devices over redundant links, and core-layer devices should be fully meshed. Network availability should also be considered during planning, before the network is built.