ISSU Technology White Paper

    16-08-2018



Contents

Overview

Software segmentation

ISSU methods

Incremental upgrade

Supporting technologies and design

Modularity

Process-level GR

Application scenarios

Adding features

Removing features

Incremental upgrade implementation

Upgrading features that are not running

Upgrading features that are running

ISSU reboot upgrade

Supporting technologies and design

ISSU reboot data restoration

Protocol agent

Process-level GR

Packet redirection

System-level switchover

Protocol GR

Application scenarios

Interface card ISSU reboot

ISSU reboot of a dual-MPU device or IRF fabric

ISSU reboot of a centralized device

ISSU reboot of a single-MPU device

Reboot upgrade

ISSU procedures

Upgrading a centralized device or a single-MPU distributed device

Upgrade prerequisites

Upgrade procedure

Upgrading a dual-MPU distributed device

Upgrade prerequisites

Upgrade procedure

Upgrading an IRF fabric

Uninstalling feature images

Support for MDCs


Overview

Companies, organizations, and governments depend heavily on the network to provide services. Even a brief period of downtime might result in significant losses, so network availability is more important than ever.

The In-Service Software Upgrade (ISSU) feature upgrades software with a minimum amount of downtime. The feature eliminates scheduled downtime to ensure service continuity while software is being upgraded.

Comware V7 provides multiple ISSU methods to meet the upgrade requirements in various scenarios. Each upgrade method takes advantage of multiple technologies and system designs. This document describes the major technologies used in ISSU and how these technologies work together to implement ISSU in different scenarios.

Software segmentation

Most upgrades add new features, enhance existing features, or fix bugs, and do not require upgrading the kernel or basic system functions. Segmenting the software code allows such an upgrade to be performed without touching the kernel or basic functionality, which improves upgrade efficiency and reduces service interruption.

Comware V7 software is segmented and packaged into the following types of image files:

Boot image file—A .bin file that contains the Linux operating system kernel. It provides process management, memory management, file system management, and the emergency shell.

System image file—A .bin file that contains the minimum feature modules required for device operation and some basic features, including device management, interface management, configuration management, and routing.

Feature image file—A .bin file that contains advanced software features. Users purchase feature images as needed.

Patch image file—A .bin file irregularly released for fixing bugs without rebooting the device. A patch image does not add new features or functions.

Comware V7 software images might be released separately or together in one image package envelope (.ipe) file. With an .ipe file, you do not need to check version compatibility between the images.

If an .ipe file is used, you can use the display install ipe-info command to view the .bin files. You can also use the install add command to decompress an .ipe file to obtain the .bin files.

ISSU methods

Before releasing a software image, H3C determines, based on version compatibility, the ISSU methods for upgrading each earlier version of the image to the new version. Users can identify the ISSU methods from the release notes or by using CLI commands.

ISSU supports the following upgrade types:

Compatible upgrade—The running version is compatible with the new version. This upgrade type supports the ISSU methods listed in Table 1. Compatible upgrade has less service impact than incompatible upgrade.

Incompatible upgrade—The running software version is incompatible with the new software version. The two versions cannot run concurrently.

This upgrade type supports only one upgrade method (also called incompatible upgrade). This method requires a cold reboot to upgrade both control and data planes. Incompatible upgrade disrupts services if hardware redundancy is not available.

This document focuses on compatible upgrade.

Table 1 ISSU methods for compatible upgrade

ISSU method | Description | Applicable scenarios

Incremental upgrade | Upgrades only user mode processes that have differences between the new and old software versions. The user mode processes must have backup processes to provide services during the upgrade. | This method is typically used for user mode processes.

ISSU reboot | Reboots CPUs to complete software upgrade. Before the reboot, this method saves all hardware data, configuration settings, running data, and status information to memory. During the reboot, the data plane still forwards traffic. For services that require regular communication with their peers, this method uses protocol agents to maintain their connectivity and status. | This method is typically used for critical processes, including kernel mode processes and user mode processes that cannot be upgraded by using incremental upgrade.

Reboot | Reboots both the control and data planes to complete the software upgrade. This method disrupts service if hardware redundancy is not available. | N/A
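The upgrade types and Table 1 can be read as a decision rule. The following Python sketch is only an illustration of that reading: the function name and its three inputs are hypothetical, and on a real device the applicable method is determined by H3C for each release rather than computed this way.

```python
from enum import Enum


class IssuMethod(Enum):
    INCREMENTAL = "incremental upgrade"
    ISSU_REBOOT = "ISSU reboot"
    REBOOT = "reboot"
    INCOMPATIBLE = "incompatible upgrade"


def choose_method(compatible, incremental_ok, data_restoration_ok):
    """Pick the least disruptive method the inputs allow (hypothetical rule).

    compatible          -- the running and new versions are compatible
    incremental_ok      -- only user mode processes differ and they can be
                           upgraded incrementally
    data_restoration_ok -- the two versions can use ISSU reboot data restoration
    """
    if not compatible:
        return IssuMethod.INCOMPATIBLE
    if incremental_ok:
        return IssuMethod.INCREMENTAL
    if data_restoration_ok:
        return IssuMethod.ISSU_REBOOT
    return IssuMethod.REBOOT
```

For example, a compatible upgrade that changes the kernel but supports data restoration maps to ISSU reboot in this sketch.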

Incremental upgrade

Supporting technologies and design

Modularity

Comware V7 uses a modular framework. Each network service runs and maintains its own process. Starting, stopping, or restarting one process does not affect any other processes.

To upgrade a feature, you only need to use the new software image to reboot the feature process, eliminating the need to reboot the entire system.

Modular design also brings scalability. You can add new features without affecting system operation.

Figure 1 Comware V7 modular framework

Process-level GR

Comware V7 uses process-level GR technologies to prevent process reboots from interrupting ongoing services. Process-level GR technologies include single-process GR and standby-process GR. Single-process GR provides data backup and standby-process GR provides process backup.

Single-process GR

To implement single-process GR for a process, the system maintains a copy of operating data for the process in an in-memory database, as shown in Figure 2. When the process reboots, the system retains the data and continues to provide services. After reboot, the process restores the operating data from the in-memory database, communicates with relevant processes to update the operating data, and continues to provide services. During the reboot, no ongoing services are interrupted, no other processes on the device are affected, and no neighbor devices detect the change.

Compared with standby-process GR, single-process GR requires fewer resources and has fewer system constraints.

Figure 2 Single-process GR
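Single-process GR can be sketched as a process that checkpoints its operating data into a store that outlives the process. The class and method names below are hypothetical stand-ins, not Comware interfaces, and the route entry is made-up sample data.

```python
class InMemoryDatabase:
    """Stand-in for the system's in-memory database (hypothetical API)."""

    def __init__(self):
        self._store = {}

    def save(self, process_name, data):
        self._store[process_name] = dict(data)  # keep a copy, not a reference

    def load(self, process_name):
        return dict(self._store.get(process_name, {}))


class FeatureProcess:
    def __init__(self, name, db, version):
        self.name = name
        self.db = db
        self.version = version
        # A restarted process starts from the saved copy instead of from scratch.
        self.state = db.load(name)

    def learn(self, key, value):
        self.state[key] = value
        self.db.save(self.name, self.state)  # checkpoint on every change


db = InMemoryDatabase()
old = FeatureProcess("routing", db, version="old")
old.learn("10.0.0.0/8", "next-hop 192.0.2.1")
del old  # the process stops for the upgrade; the database keeps its data

new = FeatureProcess("routing", db, version="new")
print(new.state)
```

The new process instance recovers the entry learned by the old one because the data lives in the database rather than in the process.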

Standby-process GR

To implement standby-process GR, the standby process receives operating data from the active process in real time and is ready to take over, as shown in Figure 3. When the active process fails or restarts, the standby process immediately takes over. The original active process reboots and becomes the standby process.

Compared with single-process GR, standby-process GR takes less time and provides higher availability, but it requires more resources.

Figure 3 Standby-process GR
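The role swap in standby-process GR can be illustrated with a toy model. This is a sketch under simplifying assumptions: replication is modeled as a synchronous copy, and the process names are invented for the example.

```python
class Process:
    def __init__(self, name):
        self.name = name
        self.data = {}


class StandbyGrPair:
    """Active/standby pair with real-time replication (illustrative only)."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def update(self, key, value):
        # Every change on the active process is mirrored to the standby at once.
        self.active.data[key] = value
        self.standby.data[key] = value

    def restart_active(self):
        # The standby takes over immediately. The old active process restarts,
        # receives the data from the new active process, and becomes the standby.
        restarted = Process(self.active.name)
        self.active, self.standby = self.standby, restarted
        self.standby.data = dict(self.active.data)


pair = StandbyGrPair(Process("ospf@mpu0"), Process("ospf@mpu1"))
pair.update("neighbor 192.0.2.2", "full")
pair.restart_active()
print(pair.active.name, pair.active.data)
```

Because the standby already holds the data, takeover needs no restore step, which is why this method is faster than single-process GR at the cost of a second process.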

Process-level GR prevents an incremental upgrade from interrupting ongoing services and confines the impact to an individual feature.

Application scenarios

Adding features

Incremental upgrade is performed to add new service features. Because of the modular framework of Comware V7, features can be added to the system without any impact on the system.

After being added to the system, some features run by default and some features do not. You can enable or disable features as required.

Removing features

Incremental upgrade is also performed to remove features, for example, for a software downgrade. Removing inactive features does not affect system operation, and removing active features affects only the services provided by the features. If a feature is active, H3C recommends that you disable the feature before removing it to minimize the impact on the users and the network.

Incremental upgrade implementation

Upgrading features that are not running

Upgrading features that are not running does not affect system operation.

Upgrading features that are running

Upgrading a feature that has standby processes

If a running feature has redundant processes, all of the processes must be upgraded, and at least one standby process must be upgraded before the active process. Typically, a program has redundant processes across MPUs (supervisor engines) in a distributed system, for example, a dual-MPU distributed device or an IRF fabric.

Upgrading a standby process does not affect system operation, as shown in Figure 4.

Figure 4 Upgrading the standby process

ISSU uses standby-process GR when it upgrades the active process. After a graceful restart, the upgraded process becomes a standby process.

Figure 5 Upgrading the active process
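The ordering rule for a feature that has standby processes can be written as a small function. This sketch is an assumption-laden simplification: it upgrades all standby processes before the active one (the text requires at least one), and the process names are hypothetical.

```python
def incremental_upgrade_order(processes):
    """Return the order in which redundant processes of one feature are upgraded.

    `processes` maps a process name to its role, "active" or "standby".
    Standby processes go first so that one of them can take over while the
    active process is upgraded.
    """
    standbys = [name for name, role in processes.items() if role == "standby"]
    actives = [name for name, role in processes.items() if role == "active"]
    if len(actives) != 1:
        raise ValueError("expected exactly one active process")
    if not standbys:
        raise ValueError("no standby process; single-process GR applies instead")
    return standbys + actives
```

For a feature with an active process on one MPU and a standby process on the other, the function returns the standby process first.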

Upgrading a feature that does not have a standby process

A program has only one process when it runs on the MPU of a single-MPU distributed device, on a centralized device, or on an interface card of a distributed device. ISSU uses single-process GR for such a program.

Figure 6 Upgrading a feature that has a single process

ISSU reboot upgrade

Supporting technologies and design

ISSU reboot data restoration

ISSU reboot data restoration ensures that processes can immediately provide services after an upgrade, rather than start from scratch.

Process-level GR technologies maintain a copy of the real-time operating data for a process in memory. Data in memory cannot survive a software reboot. For processes to obtain the operating data after a software reboot, the system saves the data to a non-volatile storage medium. After the software upgrade, the system restores the operating data from the storage medium to memory. The processes can continue to use the data to provide services.

Figure 7 Saving and restoring data for ISSU reboot
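The save-and-restore cycle can be sketched with a file that stands in for the non-volatile storage medium. The JSON format, the file name, and the sample entries are assumptions made for the example; the document does not describe the actual on-device format.

```python
import json
import os
import tempfile


def save_operating_data(memory, path):
    """Write the in-memory operating data of all processes to storage."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(memory, f)


def restore_operating_data(path):
    """Read the saved operating data back into memory after the reboot."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Operating data held in memory before the ISSU reboot (made-up values).
memory = {"arp": {"192.0.2.10": "00e0-fc00-0001"}, "lldp": {"GE1/0/1": "peer-a"}}

with tempfile.TemporaryDirectory() as storage:
    snapshot = os.path.join(storage, "issu-snapshot.json")
    save_operating_data(memory, snapshot)
    memory_before = memory
    memory = {}  # memory content is lost across the software reboot
    memory = restore_operating_data(snapshot)
```

After the restore, the processes see the same operating data that they held before the reboot.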

Protocol agent

Most control protocols periodically send hello or keepalive packets to maintain or detect connectivity. If such a protocol does not receive a reply in the specified period, the protocol determines that the control session is down and changes the protocol status. Protocol status changes might result in network topology flapping. To avoid this issue during an upgrade, Comware V7 provides the protocol agent feature. This feature enables the system to select a properly running card to operate as the agent of the protocols and send hello or keepalive packets on their behalf during an upgrade.

As shown in Figure 8, the system starts the protocol agent to take over the keepalive responsibility before performing an ISSU reboot upgrade. During the upgrade, the protocol agent sends hello or keepalive packets to maintain the protocol status. After the upgrade, the system stops the agent and the protocols continue to work.

Figure 8 Protocol agent working process
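The effect of the protocol agent on a neighbor's hold timer can be shown with a toy simulation. The timer values and the one-second tick model are assumptions chosen for illustration and do not correspond to any particular protocol.

```python
def session_survives(upgrade_start, upgrade_end, hello_interval, hold_time,
                     use_agent, total_time=200):
    """Simulate a neighbor's hold timer in one-second ticks (toy model).

    The protocol process sends a hello every `hello_interval` seconds except
    while it is being upgraded. If `use_agent` is True, the protocol agent
    sends the hellos during the upgrade window instead.
    """
    last_hello = 0
    for t in range(total_time + 1):
        upgrading = upgrade_start <= t < upgrade_end
        sender_alive = (not upgrading) or use_agent
        if t % hello_interval == 0 and sender_alive:
            last_hello = t
        if t - last_hello > hold_time:
            return False  # the neighbor declares the session down
    return True
```

With a 10-second hello interval and a 30-second hold time, an 80-second upgrade window breaks the session unless the agent keeps sending hellos.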

Process-level GR

Process-level GR technologies are also supported for ISSU reboot upgrade. For more information, see "Process-level GR."

Packet redirection

Packet redirection prevents packets destined for the native device from being dropped when they arrive at a card that is being upgraded.

Before performing an ISSU reboot on a card, Comware V7 redirects packets on the card to an operating MPU if the packets are destined for the device. After the reboot is complete, Comware V7 cancels the redirection.

Packet redirection does not apply to packets that are not destined for the native device. The card can forward these packets in hardware while an ISSU reboot is being performed on it.
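The redirection rule amounts to a two-way decision per packet. The following sketch uses hypothetical names and string results to make the rule concrete; real devices implement it with hardware entries rather than a software lookup.

```python
def handle_packet(dst_ip, local_addresses, card_in_issu_reboot):
    """Decide what happens to a packet arriving at an interface card."""
    if dst_ip not in local_addresses:
        # Transit traffic is forwarded in hardware, upgrade or not.
        return "forward in hardware"
    if card_in_issu_reboot:
        # The card is being upgraded, so an operating MPU handles the packet.
        return "redirect to operating MPU"
    return "process locally as usual"
```

Only packets addressed to the device itself change path during the upgrade; transit packets are unaffected.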

System-level switchover

For a system with two or more MPUs, MPUs can be upgraded one by one. While one MPU is being upgraded, any one of the other MPUs controls the system to provide uninterrupted services. The key to this method is to implement graceful takeover of the controller role from one MPU to another MPU. The system-level high availability features of Comware V7 can meet this requirement.

Protocol GR

Protocol GR is used when the device has only one MPU and service continuity cannot be ensured through protocol status backup within the device.

Protocol GR ensures continuous services while the device is performing an active/standby switchover or rebooting an IP or MPLS forwarding protocol (for example, BGP, IS-IS, OSPF, LDP, and RSVP-TE). Protocol GR requires the cooperation of neighboring devices for information backup and restoration, such as route information backup and restoration.

Each protocol has its own GR mechanism, but the GR processes are the same. Two neighboring devices first negotiate the GR capability, and they start the GR process only if both have the GR capability.

The following is a GR process summary:

1. When a protocol reboots, the device retains its forwarding entries and continues to forward packets based on the entries. Each neighboring device, upon detecting the reboot, flags the routes learned from the device and continues to use the routes to forward packets.

2. After reboot, the protocol re-establishes a session with each neighboring device and exchanges information to restore its route information. The neighboring devices determine which routes to restore and which routes to delete.

If the protocol fails to start up in the required period, the neighboring devices delete the flagged routes.
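The neighbor's side of the GR process summarized above can be sketched as follows. This is a protocol-neutral toy model with hypothetical method names; real GR mechanisms differ in detail from protocol to protocol.

```python
class GrHelperNeighbor:
    """Route table of a neighboring device that helps with protocol GR (toy model)."""

    def __init__(self, routes):
        # prefix -> True if the route is flagged (learned from the restarting peer)
        self.routes = {prefix: False for prefix in routes}

    def peer_restart_detected(self):
        for prefix in self.routes:
            self.routes[prefix] = True  # flag, but keep forwarding on the route

    def peer_resynchronized(self, advertised):
        # Keep routes the peer advertises again, delete the ones it does not.
        self.routes = {prefix: False for prefix in advertised}

    def restart_timer_expired(self):
        # The peer did not come back in time: delete every flagged route.
        self.routes = {p: f for p, f in self.routes.items() if not f}


neighbor = GrHelperNeighbor(["10.1.0.0/16", "10.2.0.0/16"])
neighbor.peer_restart_detected()
neighbor.peer_resynchronized(["10.1.0.0/16"])
print(neighbor.routes)
```

Flagged routes stay usable for forwarding until either the peer resynchronizes or the timer expires, which is what keeps traffic flowing during the restart.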

Application scenarios

Interface card ISSU reboot

For ISSU reboot upgrade of an interface card, the system performs the following operations to ensure service continuity:

Starts the protocol agent on the active MPU to provide the keepalive function for the interface card.

Creates a redirection entry for packets destined for the device that will arrive at the interface card. These packets will be redirected to the active MPU.

Uses the ISSU reboot data restoration function for processes on the interface card.

The ISSU reboot upgrade process for an interface card is as follows:

1. Processes save their real-time operating data to memory while they are operating.

2. Before starting an ISSU reboot for an interface card, the system backs up the operating data of the processes on the card from memory to a non-volatile storage medium. Then, the system starts the protocol agent and creates a packet redirection entry for the card.

3. The system stops the old software and loads the new version. The protocol agent provides the keepalive function for the interface card. During this process, the hardware continues to forward traffic because it is not reset. The control planes of other cards operate without interruption.

4. After the new version starts up, the system restores the backup operating data to memory.

5. The processes on the card restart and restore their operating data. The system stops the protocol agent and cancels the packet redirection.

Figure 9 ISSU reboot upgrade for an interface card

During the ISSU reboot upgrade, the device continues to forward traffic, the cards that are not rebooted provide services normally, and the rebooted card restores quickly after reboot.
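The interface card sequence can be sketched as an ordered log in which the protocol agent and the packet redirection bracket the reboot. The function names are hypothetical, and tearing down the protection in a finally clause is a design choice of this sketch, not something the document specifies.

```python
from contextlib import contextmanager


@contextmanager
def card_protection(log):
    """Keep protocols and device-bound packets safe while the card reboots."""
    log.append("start protocol agent on active MPU")
    log.append("create packet redirection entry")
    try:
        yield
    finally:
        log.append("stop protocol agent")
        log.append("cancel packet redirection")


def issu_reboot_card(log):
    log.append("back up operating data from memory to storage")
    with card_protection(log):
        log.append("stop old software and load new version")
        log.append("restore operating data to memory")
        log.append("restart processes and restore their data")
    return log


steps = issu_reboot_card([])
print("\n".join(steps))
```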

ISSU reboot of a dual-MPU device or IRF fabric

During an ISSU reboot, a dual-MPU device or IRF fabric typically uses active/standby switchover to ensure the continuity of services on the MPUs. If the MPUs also provide functionality of interface cards, there might be processes that do not have backup processes. The device will perform the same ISSU upgrade operations as in an interface card upgrade, in addition to active/standby switchover.

Figure 10 ISSU reboot upgrade for the active MPU

As shown in Figure 10, processes 4, 5, and 6 do not have backup processes. ISSU reboot data restoration is used to ensure service continuity for these processes. Processes 1, 2, and 3 have backup processes on the standby MPU. An active/standby switchover is performed for these processes.

Standby MPU upgrade does not affect system operation and is simpler than active MPU upgrade. After upgrade, the standby MPU receives backup data from the active MPU. As for the services that the standby MPU provides as an interface card, the ISSU reboot upgrade is the same as upgrading an interface card.

Upgrading the standby MPU imposes less impact on the system than upgrading the active MPU. H3C recommends upgrading the standby MPU before upgrading the active MPU.

Figure 11 ISSU reboot upgrade for the standby MPU

ISSU reboot of a centralized device

A centralized device has only one MPU and does not have a second MPU to take over during upgrade. Before performing an ISSU reboot upgrade, you must perform the following tasks:

Increase the keepalive interval for each protocol that uses a keepalive mechanism, or disable the keepalive mechanism.

Enable GR for Layer 3 protocols so the neighboring devices can help with entry backup and restoration.

The following is the ISSU reboot upgrade process for a centralized device:

1. Processes save their real-time operating data to memory while they are operating.

2. Before starting an ISSU reboot, the system backs up the operating data of the processes in memory to a non-volatile storage medium.

3. The system stops the old software and loads the new version. During this process, the hardware continues to forward traffic because it is not reset.

4. After the new version starts up, the system restores the backup operating data to memory.

5. The processes restart, restore their operating data, and continue to provide services.

Figure 12 ISSU reboot upgrade of a centralized device

After the upgrade is complete, revert the settings that you changed before the upgrade as needed.

ISSU reboot of a single-MPU device

A single-MPU device uses the same ISSU reboot method as a centralized device. Due to lack of backup processes, all processes use ISSU reboot data restoration for ISSU reboot. Routing protocols and MPLS protocols must use GR for data restoration.

Reboot upgrade

A reboot upgrade is required if the old and new software versions cannot use ISSU reboot data restoration for ISSU. Service continuity is ensured only if the system has redundant MPUs and the upgrade does not involve interface cards.

Figure 13 Reboot upgrade process

To perform a reboot upgrade for a dual-MPU device:

1. Upgrade the standby MPU.

2. Upgrade the active MPU.

When you reboot the active MPU, an active/standby switchover occurs. The standby MPU takes over the role of the active MPU. The reboot typically does not disrupt services because MPUs used in most distributed devices do not have network service interfaces.

ISSU procedures

This document provides a simplified version of ISSU procedures. When you perform an ISSU, read the fundamentals configuration guide for your device. Strictly follow the recommended ISSU procedure to ensure a successful upgrade or downgrade.

Upgrading a centralized device or a single-MPU distributed device

Upgrade prerequisites

1. (Optional.) Disable running features that are not available in the new version.

If exceptions occur after you disable the features, quickly restore the original settings, and find a solution before you continue with the ISSU. This step helps reduce inappropriate upgrade or downgrade attempts.

2. Verify that the free storage space is more than twice the .ipe file size. This amount of storage space is required for storing the .ipe file and the .bin files extracted from the .ipe file. If the free space is insufficient, delete unused files.

3. Use FTP or TFTP to transfer the .ipe file to the root directory of the device's storage medium.

4. Examine the system operating status. Make sure the system is operating stably and no cards are being installed or removed.

5. Use the display version comp-matrix command to identify the version compatibility and the ISSU method.

6. Identify the protocols that will restart during the upgrade. Make sure GR is enabled for all GR-capable protocols.

For an ISSU reboot upgrade, all running protocols will restart.

For an incremental upgrade, the display version comp-matrix command displays protocols that will restart.

7. (Optional.) For ISSU reboot upgrade of a single-MPU device, specify a long keepalive interval or disable the keepalive feature. Make sure the neighboring devices use the same keepalive settings as the local device. For example, you must change MSTP keepalive settings on the peer network devices.

8. (Optional.) Use the save command to save the running configuration.

If you do not save the running configuration, configuration changes will be lost after a reboot.
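Several of the prerequisites above are simple checks that can be collected into a pre-upgrade checklist. The sketch below assumes that the facts have already been gathered from the device by some other means; the DeviceFacts fields and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviceFacts:
    free_bytes: int               # free space on the storage medium
    ipe_bytes: int                # size of the .ipe file
    system_stable: bool
    card_change_in_progress: bool
    gr_capable_protocols: set     # running protocols that support GR
    gr_enabled_protocols: set     # protocols on which GR is enabled


def unmet_prerequisites(facts):
    """Return a list of prerequisite violations; an empty list means ready."""
    problems = []
    if facts.free_bytes <= 2 * facts.ipe_bytes:
        problems.append("free storage space is not more than twice the .ipe size")
    if not facts.system_stable or facts.card_change_in_progress:
        problems.append("system is not stable or cards are being installed or removed")
    missing = sorted(facts.gr_capable_protocols - facts.gr_enabled_protocols)
    if missing:
        problems.append("GR not enabled for: " + ", ".join(missing))
    return problems
```

The checklist covers only the mechanical checks (storage, stability, GR). Identifying the ISSU method and adjusting keepalive settings still require the steps described above.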

Upgrade procedure

1. Use the issu load file ipe filename command to upgrade the software.

The affected processes stop, and then start with the new version of images.

If the upgrade involves interface cards, the interface cards are automatically upgraded when the MPU is upgraded.

2. Execute the issu commit command to commit the upgrade.

If you do not commit the upgrade within the required period, the system rolls back to the original software version.

3. (Optional.) If you modified the configuration when preparing for the upgrade, restore the configuration as needed, and then use the save command to save the running configuration.

Upgrading a dual-MPU distributed device

Upgrade prerequisites

1. Verify that the two MPUs are running the same software version and are operating correctly.

2. Verify that no ISSU process is in progress.

3. For other upgrade prerequisites, see "Upgrade prerequisites" in "Upgrading a centralized device or a single-MPU distributed device."

Upgrade procedure

To reduce ISSU impact on the system, do not modify the device configuration or perform any other device management operations during the upgrade process.

Figure 14 Upgrading a dual-MPU device

To upgrade a dual-MPU device:

1. Use the issu load file ipe filename slot slot-number command to upgrade the standby MPU.

On a dual-MPU device, the two MPUs operate in load-balanced mode. Active processes of protocols are distributed between the MPUs. However, the active processes of some fundamental functions always run on the active MPU. Upgrading the standby MPU first imposes less impact on the system. If an error occurs while the standby MPU is being upgraded, you can roll back the software.

2. Use the issu run switchover command to perform an active/standby switchover.

After the switchover, the upgraded standby MPU becomes the new active MPU.

3. Execute the issu accept command to accept the upgrade.

To improve the reliability of the ISSU process, the system maintains a rollback timer. If you do not accept the upgrade before the timer expires, the system automatically rolls back to the original software version.

4. Use the issu commit command to commit the upgrade.

When you execute this command, the system upgrades the new standby MPU and the interface cards.

5. (Optional.) If you modified the device configuration when preparing for the upgrade, restore the configuration as needed, and then use the save command to save the running configuration.
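The command order and the rollback timer in the procedure above can be modeled as a small state tracker. This sketch abbreviates the load command, treats the timer as an external event, and uses placeholder version names; it models only the ordering and rollback rules stated in the steps.

```python
class DualMpuIssu:
    """Tracks the four-command sequence and rejects out-of-order steps (sketch)."""

    ORDER = ["issu load", "issu run switchover", "issu accept", "issu commit"]

    def __init__(self, old_version, new_version):
        self.old_version = old_version
        self.new_version = new_version
        self.done = []
        self.running_version = old_version

    def run(self, command):
        finished = len(self.done) == len(self.ORDER)
        expected = None if finished else self.ORDER[len(self.done)]
        if command != expected:
            raise RuntimeError(f"expected {expected!r}, got {command!r}")
        self.done.append(command)
        if command == "issu run switchover":
            self.running_version = self.new_version  # upgraded MPU is now active

    def rollback_timer_expired(self):
        # Without "issu accept", the system returns to the original version.
        if "issu accept" not in self.done:
            self.done.clear()
            self.running_version = self.old_version
```

In this model, letting the timer expire after the switchover but before issu accept returns the device to the old version, while expiry after issu accept has no effect.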

Upgrading an IRF fabric

An IRF fabric uses the same ISSU procedure as a single-MPU distributed device if the fabric contains a single centralized device or a single-MPU distributed device. For more information, see "Upgrading a centralized device or a single-MPU distributed device."

A multichassis or multi-MPU IRF fabric uses the same ISSU procedure as a dual-MPU distributed device except that you must perform the upgrade member by member. If a member has two MPUs, upgrade the active MPU first. In addition, you must use the issu commit command to commit the upgrade for each subordinate member.

Uninstalling feature images

Uninstall a feature image if you do not want to run the feature. This process is simple because ISSU does not need to run new functions or handle changes to running functions.

To uninstall a feature image:

1. (Optional.) If the feature is enabled, disable it and remove relevant settings.

2. (Optional.) Use the save command to save the running configuration.

If you do not save the running configuration, configuration changes will be lost after a reboot.

3. Use the install deactivate command to uninstall the feature image from MPUs in the order of subordinate members' MPUs, the master's standby MPU, and the master's active MPU.

4. Use the install commit command to commit the downgrade.
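The MPU order in step 3 can be expressed as a sort key. The tuple layout and the slot names below are assumptions made for the example.

```python
def deactivate_order(mpus):
    """Sort MPUs into the order used for `install deactivate`.

    Each MPU is a tuple (member_role, mpu_role, name), where member_role is
    "master" or "subordinate" and mpu_role is "active" or "standby".
    """
    def rank(mpu):
        member_role, mpu_role, _ = mpu
        if member_role == "subordinate":
            return 0  # subordinate members' MPUs first
        return 1 if mpu_role == "standby" else 2  # master: standby, then active

    return sorted(mpus, key=rank)


mpus = [
    ("master", "active", "chassis 1 slot 0"),
    ("master", "standby", "chassis 1 slot 1"),
    ("subordinate", "active", "chassis 2 slot 0"),
    ("subordinate", "standby", "chassis 2 slot 1"),
]
names = [name for _, _, name in deactivate_order(mpus)]
print(names)
```

Because Python's sort is stable, MPUs with the same rank keep their input order.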

Support for MDCs

All multitenant device contexts (MDCs) on a device must run the same software images. If a device has MDCs, you must perform the upgrade tasks on the default MDC. All other MDCs are automatically upgraded.

The upgrade methods and procedures are the same as those for devices that do not support MDCs.

On an MDC-capable device, only one copy of the new software images is required. You do not need to reserve storage space on any MDCs except for the default MDC. However, you must perform the preparation tasks on each MDC. For example, disable the running features that are not provided in the new version, and prolong the keepalive intervals on each MDC.
