StarlingX Patching

Patching Overview

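A StarlingX system can be updated in place through a feature called "patching". Patching moves a deployed system from one software level to another and is used mainly for bug fixes, security fixes and feature enhancements.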

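Patching supports two kinds of patch: In-Service patches and Reboot-Required patches. An In-Service patch does not require a host reboot; only the affected service processes are restarted. A Reboot-Required patch takes effect only after the host reboots: lock the host first, wait until the patch has been installed on it, and then unlock the host so that the patch takes effect.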

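This document is aimed at developers working with the patching feature; it is not a product user guide. It focuses on the core patching workflow rather than covering every aspect of patching.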

In short, patching has two stages: creating a patch and applying it. Both are described in detail below.

Creating a Patch

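A StarlingX patch contains one or more RPM packages needed to update the system. Before creating a patch, confirm that the packages being patched are the ones installed on the deployed StarlingX system. The following steps help with this.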

  1. Determine the software version of the deployed system, in either of two ways:
    • In Horizon: Admin -> Platform -> System Configuration -> Systems
    • From the command line, with system show:
    controller-0:~$ . /etc/platform/openrc
    [sysadmin@controller-0 ~(keystone_admin)]$ system show
    +----------------------+--------------------------------------+
    | Property             | Value                                |
    +----------------------+--------------------------------------+
    | contact              | None                                 |
    | created_at           | 2019-10-14T03:10:50.862114+00:00     |
    | description          | None                                 |
    | https_enabled        | False                                |
    | location             | None                                 |
    | name                 | 608dfe48-9a05-4b21-afc1-ea122574caa7 |
    | region_name          | RegionOne                            |
    | sdn_enabled          | False                                |
    | security_feature     | spectre_meltdown_v1                  |
    | service_project_name | services                             |
    | software_version     | 19.09                                |
    | system_mode          | duplex                               |
    | system_type          | All-in-one                           |
    | timezone             | UTC                                  |
    | updated_at           | 2019-10-14T03:12:41.983029+00:00     |
    | uuid                 | 2639ad15-08a7-4f1b-a372-f927a5e4ab31 |
    | vswitch_type         | none                                 |
    +----------------------+--------------------------------------+

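  2. Check the latest build to find the RPMs that need to be updated for this release, and select the RPMs to include in the patch.
    Once the RPMs to update or install are identified, the next step is to prepare the patch build environment. For a StarlingX developer, the simplest approach is to use the StarlingX build container, which needs only minor changes. The build container can be created by following the StarlingX build guide.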
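When this check is scripted, the version can be pulled out of the system show table instead of read by eye. The sketch below is a small awk filter that assumes the pipe-delimited "| Property | Value |" layout shown above; the function name get_software_version is hypothetical.

```shell
#!/usr/bin/env bash
# Print the software_version value from `system show` output read on stdin.
# Assumes the pipe-delimited "| Property | Value |" layout shown above.
get_software_version() {
  awk -F'|' '
    {
      key = $2; val = $3
      gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the Property cell
      gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim the Value cell
      if (key == "software_version") print val
    }'
}
```

On a controller, after sourcing /etc/platform/openrc, run system show | get_software_version; with the output above it prints 19.09.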

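From here on, assume that the StarlingX source code has been downloaded and the RPMs to be installed are ready. We now set up the patch build environment. Again, this guide targets developers, not production use.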

  1. Install crypto and pycrypto, two dependencies of cgcs-patch:
    sudo pip install crypto pycrypto

  2. Create the patch with the script $MY_REPO/stx/stx-update/extras/scripts/patch_build.sh.

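This script reads PLATFORM_RELEASE from the release-info.inc file and points PYTHONPATH at the cgcs-patch package in the repo, so cgcs-patch does not need to be installed and PLATFORM_RELEASE does not need to be specified by hand. Use the following command to see the build tool's usage: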

$ $MY_REPO/stx/stx-update/cgcs-patch/bin/patch_build --help
Usage: patch_build [ <args> ] ... <rpm list>
Options:
--id <id>                   Patch ID
--release <version>         Platform release version
--status <status>           Patch Status Code (ie. O, R, V)
--unremovable               Marks patch as unremovable
--reboot-required <Y|N>     Marks patch as reboot-required (default=Y)
--summary <summary>         Patch Summary
--desc <description>        Patch Description
--warn <warnings>           Patch Warnings
--inst <instructions>       Patch Install Instructions
--req <patch_id>            Required Patch
--controller <rpm>          New package for controller
--worker <rpm>              New package for worker node
--worker-lowlatency <rpm>   New package for worker-lowlatency node
--storage <rpm>             New package for storage node
--controller-worker <rpm>   New package for combined node
--controller-worker-lowlatency <rpm>   New package for lowlatency combined node
--all-nodes <rpm>           New package for all node types

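With this tool you can specify the patch ID, whether a reboot is required, the patches it depends on, the RPM list, and so on. Packages that are not yet on the system and must be newly installed need a target node type; for example, --controller installs the new package on controller nodes. When the script finishes, it produces a file named "<patch-id>.patch".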
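The sketch below shows how the options above might be combined for an In-Service patch, with the release read from release-info.inc the way the wrapper script is described as doing. It assumes that release-info.inc contains a shell-style line such as PLATFORM_RELEASE="19.09" and that PYTHONPATH already points at the in-repo cgcs-patch package (as the wrapper arranges); the function names, summary text and RPM names are hypothetical. Setting PATCH_BUILD=echo turns it into a dry run that only prints the arguments.

```shell
#!/usr/bin/env bash
# Sketch: build an In-Service patch with the release taken from release-info.inc.
# Assumptions: PYTHONPATH already points at the in-repo cgcs-patch package,
# and release-info.inc holds a line like PLATFORM_RELEASE="19.09".
# Set PATCH_BUILD=echo for a dry run that only prints the arguments.
PATCH_BUILD=${PATCH_BUILD:-$MY_REPO/stx/stx-update/cgcs-patch/bin/patch_build}

# Print the value of the last PLATFORM_RELEASE assignment, without quotes.
read_platform_release() {
  grep -E '^[[:space:]]*PLATFORM_RELEASE=' "$1" | tail -n 1 \
    | sed -E 's/^[[:space:]]*PLATFORM_RELEASE=//' | tr -d "\"'"
}

# Usage: build_in_service_patch <patch-id> <release-info.inc> <rpm>...
build_in_service_patch() {
  local patch_id=$1 inc_file=$2 release
  shift 2
  release=$(read_platform_release "$inc_file")
  if [ -z "$release" ]; then
    echo "PLATFORM_RELEASE not found in $inc_file" >&2
    return 1
  fi
  # The remaining arguments are the updated RPMs (the <rpm list> in the usage).
  "$PATCH_BUILD" \
    --id "$patch_id" \
    --release "$release" \
    --reboot-required N \
    --summary "Example in-service patch $patch_id" \
    "$@"
}
```

New packages that are not yet installed would be passed with the node-type options instead (for example --all-nodes <rpm>), as described in the usage output.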

Let's take a closer look at this patch file.

  1. First, the patch file is a gzip-compressed archive, as the file command confirms:
    $ file 001.patch
    001.patch: gzip compressed data, was "001.patch", last modified: Fri Aug 16 05:56:59 2019, max compression

  2. After extracting it, you will see the following files:
    $ tar -xf 001.patch
    $ tree
    ├── 001.patch
    ├── metadata.tar
    ├── signature
    ├── signature.v2
    └── software.tar

  3. Extract software.tar: it contains all the RPMs to be installed. Note: when the patch is built, every RPM is signed with the following key:

$MY_REPO/build-tools/signing/ima_signing_key.priv

  4. metadata.tar contains a single file, metadata.xml, which holds all the information about the patch build. The StarlingX cluster reads this file.

  5. The signature file contains a combination of the MD5 checksums of software.tar and metadata.tar.

  6. signature.v2 is the signature over software.tar and metadata.tar. In this environment it is generated with the key file $MY_REPO/build-tools/signing/dev-private-key.pem.
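The inspection steps above can be bundled into a small helper. It is a sketch using only standard tar, and it assumes the gzip-compressed layout shown above (metadata.tar and software.tar at the top level); the function name and the scratch-directory argument are hypothetical.

```shell
#!/usr/bin/env bash
# Unpack a .patch file into a scratch directory and list what it carries.
# Usage: inspect_patch <file.patch> <scratch-dir>
inspect_patch() {
  local patch_file=$1 work_dir=$2
  mkdir -p "$work_dir"
  tar -xzf "$patch_file" -C "$work_dir"   # outer archive is gzip-compressed
  echo "== software.tar (RPMs) =="
  tar -tf "$work_dir/software.tar"
  echo "== metadata.tar =="
  tar -tf "$work_dir/metadata.tar"
}
```

For example, inspect_patch 001.patch /tmp/001 lists the RPMs shipped in the patch followed by metadata.xml.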

Installing a Patch

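Once generated, a patch can be installed manually on a given StarlingX system, from either Horizon or the command line. A patch goes through four states in its life cycle: Available, Partial-Apply, Applied and Partial-Remove.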

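• Available: the patch has been uploaded to the patch storage, but it has not yet been added to the software update repository and is not installed on any host.

• Partial-Apply: applying the patch has been triggered (sw-patch apply) and the patch is installed on some, but not yet all, of the hosts that need it.

• Applied: the patch is installed on all hosts that need it.

• Partial-Remove: removal has been triggered (sw-patch remove) and is in progress, but the patch has not yet been removed from all hosts.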
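To make the transitions explicit, here is a simplified model of the life cycle described above. It is illustrative only, not the patch controller's implementation: the event names are invented for this sketch, and it assumes that once removal has completed on every host the patch returns to Available.

```shell
#!/usr/bin/env bash
# Toy model of the patch life cycle. Events: apply / remove (the sw-patch
# commands) and all-hosts-installed (host-install has run on every host).
next_patch_state() {
  local state=$1 event=$2
  case "$state:$event" in
    Available:apply)                    echo Partial-Apply ;;
    Partial-Apply:all-hosts-installed)  echo Applied ;;
    Applied:remove)                     echo Partial-Remove ;;
    Partial-Remove:all-hosts-installed) echo Available ;;
    *) echo "$state" ;;   # simplification: any other event changes nothing
  esac
}
```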

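To install a patch from the command line, copy it to the active controller. StarlingX provides the sw-patch client command, through which all patch operations are performed; its subcommands include upload, apply, query, host-install, delete and remove.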

controller-0:~$ sw-patch --help
usage: sw-patch [--debug]
                <subcommand> ...
Subcomands:
upload:         Upload one or more patches to the patching system.
upload-dir:     Upload patches from one or more directories to the
                patching system.
apply:          Apply one or more patches. This adds the specified
                patches to the repository, making the update(s)
                available to the hosts in the system. Use --all to
                apply all available patches.
                Patches are specified as a space-separated list of
                patch IDs.
remove:         Remove one or more patches. This removes the specified
                patches from the repository.
                Patches are specified as a space-separated list of
                patch IDs.
delete:         Delete one or more patches from the patching system.
                Patches are specified as a space-separated list of
                patch IDs.
query:          Query system patches. Optionally, specify 'query
                applied' to query only those patches that are applied,
                or 'query available' to query those that are not.
show:           Show details for specified patches.
what-requires:  List patches that require the specified patches.
query-hosts:    Query patch states for hosts in the system.
host-install:   Trigger patch install/remove on specified host. To
                force install on unlocked node, use the --force option.
host-install-async: Trigger patch install/remove on specified host. To
                force install on unlocked node, use the --force option.
                Note: This command returns immediately upon dispatching
                installation request.
install-local:  Trigger patch install/remove on the local host. This
                command can only be used for patch installation prior
                to initial configuration.
drop-host:      Drop specified host from table.
query-dependencies: List dependencies for specified patch. Use
                --recursive for recursive query.
is-applied:     Query Applied state for list of patches. Returns True
                if all are Applied, False otherwise.
report-app-dependencies: Report application patch dependencies,
                specifying application name with --app option, plus a
                list of patches. Reported dependencies can be dropped
                by specifying app with no patch list.
query-app-dependencies: Display set of reported application patch
                dependencies.
commit:         Commit patches to free disk space. WARNING: This
                action is irreversible!
--os-region-name: Send the request to a specified region

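The following walks through installing a patch with this command. The example patch is an In-Service patch that applies to all hosts, and the target is a standard 2+2+2 StarlingX configuration (two controllers, two compute nodes and two storage nodes).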

  1. Upload the patch file:
    controller-0:~$ sudo sw-patch upload 001.patch
    001 is now available
    Check the patch status:
    controller-0:~$ sudo sw-patch query
    Patch ID  RR  Release  Patch State
    ========  ==  =======  ===========
    001       N   19.09    Available
    Check the patch status of all hosts:
    controller-0:/$ sudo sw-patch query-hosts
    Hostname      IP Address      Patch Current  Reboot Required  Release  State
    ============  ==============  =============  ===============  =======  =====
    compute-0     192.178.204.7   Yes            No               19.09    idle
    compute-1     192.178.204.9   Yes            No               19.09    idle
    controller-0  192.178.204.3   Yes            No               19.09    idle
    controller-1  192.178.204.4   Yes            No               19.09    idle
    storage-0     192.178.204.12  Yes            No               19.09    idle
    storage-1     192.178.204.11  Yes            No               19.09    idle

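Patch Current shows whether a host is up to date with the patch repository: Yes means nothing is pending on that host, and No means at least one patch still has to be installed (or removed) on it.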
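Because both query commands print whitespace-separated columns, their output is easy to check from a script, for example to see which hosts still need host-install. The two filters below are a sketch that assumes the column layout shown above (a single-word patch state in the last column of query, Patch Current in the third column of query-hosts); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the state of one patch from `sw-patch query` output on stdin.
patch_state() {
  # Data rows look like "001  N  19.09  Available"; header rows never
  # have the patch ID in the first column.
  awk -v id="$1" '$1 == id { print $NF }'
}

# Print hosts whose Patch Current column is "No" in `sw-patch query-hosts`
# output on stdin. Header and "====" rows never have "No" in column 3.
hosts_not_current() {
  awk '$3 == "No" { print $1 }'
}
```

For example, sudo sw-patch query | patch_state 001 prints the state of patch 001, and sudo sw-patch query-hosts | hosts_not_current lists the hosts that still need host-install.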

  2. Once the patch is Available, trigger its application:
    controller-0:/$ sudo sw-patch apply 001
    001 is now in the repo
    Check the patch status:
    controller-0:~$ sudo sw-patch query
    Patch ID  RR  Release  Patch State
    ========  ==  =======  =============
    001       N   19.09    Partial-Apply
    Check the host status:
    controller-0:~$ sudo sw-patch query-hosts
    Hostname      IP Address      Patch Current  Reboot Required  Release  State
    ============  ==============  =============  ===============  =======  =====
    compute-0     192.178.204.7   No             No               19.09    idle
    compute-1     192.178.204.9   No             No               19.09    idle
    controller-0  192.178.204.3   No             No               19.09    idle
    controller-1  192.178.204.4   No             No               19.09    idle
    storage-0     192.178.204.12  No             No               19.09    idle
    storage-1     192.178.204.11  No             No               19.09    idle

  3. Install the patch on each host. Because this is an In-Service patch, the hosts do not need to be locked.
    controller-0:~$ sudo sw-patch host-install controller-0
    ...
    Installation was successful.
    Check the patch status of the hosts:
    controller-0:~$ sudo sw-patch query-hosts
    Hostname      IP Address      Patch Current  Reboot Required  Release  State
    ============  ==============  =============  ===============  =======  =====
    compute-0     192.178.204.7   No             No               19.09    idle
    compute-1     192.178.204.9   No             No               19.09    idle
    controller-0  192.178.204.3   Yes            No               19.09    idle
    controller-1  192.178.204.4   No             No               19.09    idle
    storage-0     192.178.204.12  No             No               19.09    idle
    storage-1     192.178.204.11  No             No               19.09    idle

    To install the patch on all hosts, run the command once for each host:
    controller-0:~$ sudo sw-patch host-install controller-1
    ....
    Installation was successful.
    controller-0:~$ sudo sw-patch host-install compute-0
    ....
    Installation was successful.
    controller-0:~$ sudo sw-patch host-install compute-1
    ....
    Installation was successful.
    controller-0:~$ sudo sw-patch host-install storage-0
    ...
    Installation was successful.
    controller-0:~$ sudo sw-patch host-install storage-1
    ...
    Installation was successful.

  4. After installation has finished on all hosts, the status is as follows:
    controller-0:~$ sudo sw-patch query
    Patch ID  RR  Release  Patch State
    ========  ==  =======  ===========
    001       N   19.09    Applied
    controller-0:~$ sudo sw-patch query-hosts
    Hostname      IP Address      Patch Current  Reboot Required  Release  State
    ============  ==============  =============  ===============  =======  =====
    compute-0     192.178.204.7   Yes            No               19.09    idle
    compute-1     192.178.204.9   Yes            No               19.09    idle
    controller-0  192.178.204.3   Yes            No               19.09    idle
    controller-1  192.178.204.4   Yes            No               19.09    idle
    storage-0     192.178.204.12  Yes            No               19.09    idle
    storage-1     192.178.204.11  Yes            No               19.09    idle
    The patch application is now complete.
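Running host-install by hand for every host is repetitive; the loop below performs the same sequence. It is a sketch for In-Service patches only: a Reboot-Required patch additionally needs each host locked before host-install and unlocked afterwards, which this loop does not do. SW_PATCH and the function name are hypothetical.

```shell
#!/usr/bin/env bash
# Run host-install on each host given as an argument, stopping at the first failure.
# SW_PATCH defaults to the real client; SW_PATCH=echo gives a dry run.
SW_PATCH=${SW_PATCH:-"sudo sw-patch"}

install_on_all_hosts() {
  local host
  for host in "$@"; do
    # $SW_PATCH is deliberately unquoted so "sudo sw-patch" splits into two words.
    $SW_PATCH host-install "$host" || {
      echo "host-install failed on $host" >&2
      return 1
    }
  done
}
```

For the 2+2+2 example: install_on_all_hosts controller-0 controller-1 compute-0 compute-1 storage-0 storage-1 && sudo sw-patch query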

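Besides applying patches, StarlingX also supports removing (rolling back) and deleting them. Removal works much like installation: trigger it with sw-patch remove, then run sw-patch host-install on each host. A patch that is no longer needed can then be deleted from the patching system with sw-patch delete.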
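The removal sequence can be sketched the same way. The sketch assumes that the hosts listed are those the patch was installed on, and that sw-patch delete is accepted once removal has finished on every host and the patch is back in the Available state; SW_PATCH and the function name are hypothetical.

```shell
#!/usr/bin/env bash
# Remove a patch, uninstall it host by host, then delete it.
# Usage: remove_patch <patch-id> <host>...   (SW_PATCH=echo for a dry run)
SW_PATCH=${SW_PATCH:-"sudo sw-patch"}

remove_patch() {
  local patch_id=$1 host
  shift
  $SW_PATCH remove "$patch_id" || return 1        # patch -> Partial-Remove
  for host in "$@"; do
    $SW_PATCH host-install "$host" || return 1     # uninstall it on this host
  done
  $SW_PATCH delete "$patch_id"                     # assumed to need state Available
}
```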

Patch Orchestration

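The example above showed patching a cluster by hand. In a large cluster, however, this process takes a long time, and for Reboot-Required patches in particular it is inefficient and creates a lot of work for administrators. StarlingX therefore provides a more advanced feature, patch orchestration, which patches the whole cluster through a few simple operations, greatly reducing administrator workload and the chance of error. It can be used in three ways: the CLI, the Horizon GUI and the VIM REST API.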

  1. CLI. StarlingX provides the sw-manager client tool for patch orchestration. As shown below, the whole cluster can be patched by creating and applying a patch strategy:

    controller-0:~$ sw-manager patch-strategy -h
    usage: sw-manager patch-strategy [-h] ...
    optional arguments:
    -h, --help show this help message and exit
    Software Patch Commands:
    create Create a strategy
    delete Delete a strategy
    apply Apply a strategy
    abort Abort a strategy
    show Show a strategy
    controller-0:~$ sw-manager patch-strategy create -h
    usage: sw-manager patch-strategy create [-h]
    [--controller-apply-type {serial,ignore}]
    [--storage-apply-type {serial,parallel,ignore}]
    [--worker-apply-type {serial,parallel,ignore}]
    [--max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10,
    11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,
    28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,
    45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,
    62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,
    79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,
    96,97,98,99,100}]
    [--instance-action {migrate,stop-start}]
    [--alarm-restrictions {strict,relaxed}]
    optional arguments:
    -h, --help show this help message and exit
    --controller-apply-type {serial,ignore}
    defaults to serial
    --storage-apply-type {serial,parallel,ignore}
    defaults to serial
    --worker-apply-type {serial,parallel,ignore}
    defaults to serial
    --max-parallel-worker-hosts {2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,
    17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,
    37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,
    57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,
    77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
    97,98,99,100}
    maximum worker hosts to patch in parallel
    --instance-action {migrate,stop-start}
    defaults to stop-start
    --alarm-restrictions {strict,relaxed}
    defaults to strict

  2. Horizon GUI. Open the Admin -> Platform -> Software Management -> Patch Orchestration tab.

  3. VIM REST API, served at http://<oam_ip>:4545:
    +--------+-----------------------------------------------+---------------------------------+
    | Method | URI                                           | Description                     |
    +========+===============================================+=================================+
    | POST   | /api/orchestration/sw-update/strategy         | Create a patch strategy         |
    +--------+-----------------------------------------------+---------------------------------+
    | DELETE | /api/orchestration/sw-update/strategy         | Delete current patch strategy   |
    +--------+-----------------------------------------------+---------------------------------+
    | GET    | /api/orchestration/sw-update/strategy         | Get detailed information of     |
    |        |                                               | current patch strategy          |
    +--------+-----------------------------------------------+---------------------------------+
    | POST   | /api/orchestration/sw-update/strategy/actions | Apply or abort a patch strategy |
    +--------+-----------------------------------------------+---------------------------------+
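A typical CLI session creates a strategy, applies it and then checks its progress. The sketch below uses only options listed in the sw-manager help output; the chosen values (workers patched two at a time, instances migrated off each worker) are example choices rather than recommendations, and SW_MANAGER and the function name are hypothetical. It assumes the admin credentials have already been loaded with . /etc/platform/openrc.

```shell
#!/usr/bin/env bash
# Create and apply a patch strategy, then show it. SW_MANAGER=echo gives a dry run.
SW_MANAGER=${SW_MANAGER:-sw-manager}

orchestrate_patching() {
  # Controllers allow only serial|ignore; workers are patched two at a time,
  # with instances migrated off each worker before it is patched.
  $SW_MANAGER patch-strategy create \
    --controller-apply-type serial \
    --storage-apply-type serial \
    --worker-apply-type parallel \
    --max-parallel-worker-hosts 2 \
    --instance-action migrate \
    --alarm-restrictions strict || return 1
  $SW_MANAGER patch-strategy apply || return 1
  $SW_MANAGER patch-strategy show          # re-run to follow progress
}
```

Choosing migrate relies on the cluster having spare capacity to move instances; stop-start (the default) avoids that at the cost of instance downtime.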

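Patch orchestration requires the cluster to be in a healthy state when patches are applied:
• All hosts must be in the unlocked-enabled-available state
• The system has no alarms
• There is enough capacity for VM migration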
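Part of this health check can be scripted before creating a strategy. The filter below lists hosts that are not unlocked/enabled/available. It assumes system host-list prints a pipe-delimited table with the columns id, hostname, personality, administrative, operational and availability; the function name is hypothetical. Alarms still need to be checked separately, for example with fm alarm-list.

```shell
#!/usr/bin/env bash
# Print hostnames that are not unlocked/enabled/available, reading
# `system host-list` output on stdin (column layout is an assumption).
hosts_not_ready() {
  awk -F'|' '
    {
      for (i = 2; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)   # trim cells
      # Data rows have a numeric id; borders and the header row do not.
      if ($2 ~ /^[0-9]+$/ && !($5 == "unlocked" && $6 == "enabled" && $7 == "available"))
        print $3
    }'
}
```

An empty result from system host-list | hosts_not_ready means every host meets the first condition.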

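Current Development Status

• All of the source code is open source in the StarlingX repositories, including "update" and "nfv"
• Building and installing both In-Service and Reboot-Required patches has been verified
• Patch orchestration has not yet been verified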