Overview 概览

Prior to the 3.5.0 release, the membership and all other configuration parameters of ZooKeeper were static: loaded during boot and immutable at runtime. Operators resorted to “rolling restarts”, a manually intensive and error-prone method of changing the configuration that has caused data loss and inconsistency in production.

在3.5.0版本之前,ZooKeeper 的成员关系和所有其他配置参数都是静态的:在启动期间加载,在运行时不可变。操作员只能采用“滚动重启”,这是一种手动步骤密集且容易出错的配置变更方法,曾在生产环境中造成数据丢失和不一致。

Starting with 3.5.0, “rolling restarts” are no longer needed! ZooKeeper comes with full support for automated configuration changes: the set of ZooKeeper servers, their roles (participant / observer), all ports, and even the quorum system can be changed dynamically, without service interruption and while maintaining data consistency. Reconfigurations are performed immediately, just like other operations in ZooKeeper. Multiple changes can be made using a single reconfiguration command. The dynamic reconfiguration functionality does not limit operation concurrency, does not require client operations to be stopped during reconfigurations, has a very simple interface for administrators, and adds no complexity to other client operations.

从3.5.0开始,“滚动重启”就不再需要了!ZooKeeper 提供了对自动化配置更改的全面支持: ZooKeeper 服务器集、它们的角色(参与者/观察者)、所有端口,甚至 quorum 系统都可以动态更改,不会出现服务中断,同时保持数据一致性。

就像 ZooKeeper 中的其他操作一样,重新配置会立即执行。可以使用单个重新配置命令进行多个更改。动态重新配置功能不限制操作并发性,不需要在重新配置期间停止客户端操作,具有非常简单的管理员接口,对其他客户端操作没有增加复杂性。

New client-side features allow clients to find out about configuration changes and to update the connection string (list of servers and their client ports) stored in their ZooKeeper handle. A probabilistic algorithm is used to rebalance clients across the new configuration servers while keeping the extent of client migrations proportional to the change in ensemble membership.

新的客户端特性允许客户端发现配置更改,并更新存储在 ZooKeeper 句柄中的连接字符串(服务器及其客户端端口列表)。使用概率算法在新的配置服务器之间重新平衡客户端,同时保持客户端迁移的范围与集成成员的变化成正比。

This document provides the administrator manual for reconfiguration. For a detailed description of the reconfiguration algorithms, performance measurements, and more, please see our paper:

本文档提供了重新配置的管理员手册。关于重构算法的详细描述,性能测量等,请参阅我们的论文:

Shraer, A., Reed, B., Malkhi, D., Junqueira, F. Dynamic Reconfiguration of Primary/Backup Clusters. In USENIX Annual Technical Conference (ATC) (2012), 425-437.

Note: Starting with 3.5.3, the dynamic reconfiguration feature is disabled by default and has to be explicitly turned on via the reconfigEnabled configuration option.
注意: 从3.5.3开始,动态配置特性默认是禁用的,必须通过 reconfigEnabled配置选项显式打开。

Changes to Configuration Format 更改配置格式

Specifying the client port 指定客户端端口

A client port of a server is the port on which the server accepts client connection requests. Starting with 3.5.0 the clientPort and clientPortAddress configuration parameters should no longer be used. Instead, this information is now part of the server keyword specification, which becomes as follows:

服务器的客户端端口是服务器接受客户端连接请求的端口。从3.5.0开始,应该不再使用 clientPort 和 clientPortAddress 配置参数。相反,这些信息现在是服务器关键字规范的一部分,如下所示:

server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>

The client port specification is to the right of the semicolon. The client port address is optional, and if not specified it defaults to “0.0.0.0”. As usual, role is also optional, it can be participant or observer (participant by default).

客户端端口规范位于分号的右侧。客户端端口地址是可选的,如果没有指定,则默认为“0.0.0.0”。通常,角色也是可选的,它可以是参与者或观察者(缺省情况下是参与者)。

Examples of legal server statements:

合法的服务器语句示例:

server.5 = 125.23.63.23:1234:1235;1236
server.5 = 125.23.63.23:1234:1235:participant;1236
server.5 = 125.23.63.23:1234:1235:observer;1236
server.5 = 125.23.63.23:1234:1235;125.23.63.24:1236
server.5 = 125.23.63.23:1234:1235:participant;125.23.63.23:1236
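
To make the format above concrete, here is a minimal Python sketch that parses a single-address server statement into its parts. It is an illustration only, not ZooKeeper's own parser: the names `ServerSpec` and `parse_server_statement` are hypothetical, and the sketch assumes hostnames or IPv4 addresses (no IPv6 literals).

```python
from typing import NamedTuple


class ServerSpec(NamedTuple):
    server_id: int
    address: str
    port1: int           # <port1> in the format above
    port2: int           # <port2> in the format above
    role: str            # "participant" or "observer"
    client_address: str
    client_port: int


def parse_server_statement(line: str) -> ServerSpec:
    """Parse 'server.<id> = <addr>:<port1>:<port2>[:role];[<client addr>:]<client port>'."""
    key, value = (part.strip() for part in line.split("=", 1))
    if not key.startswith("server."):
        raise ValueError(f"not a server statement: {line!r}")
    server_id = int(key[len("server."):])

    # Everything to the right of the semicolon is the client port specification.
    quorum_part, client_part = value.split(";", 1)
    fields = quorum_part.split(":")
    address, port1, port2 = fields[0], int(fields[1]), int(fields[2])
    role = fields[3] if len(fields) > 3 else "participant"  # participant by default

    # The client port address is optional and defaults to 0.0.0.0.
    if ":" in client_part:
        client_address, client_port = client_part.rsplit(":", 1)
    else:
        client_address, client_port = "0.0.0.0", client_part
    return ServerSpec(server_id, address, port1, port2, role,
                      client_address, int(client_port))
```

For instance, the first legal statement listed above parses to server id 5, role participant, and client address 0.0.0.0 with client port 1236.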

Specifying multiple server addresses 指定多个服务器地址

Since ZooKeeper 3.6.0 it is possible to specify multiple addresses for each ZooKeeper server (see ZOOKEEPER-3188). This helps to increase availability and adds network-level resiliency to ZooKeeper. When multiple physical network interfaces are used for the servers, ZooKeeper is able to bind on all interfaces and switch to a working interface at runtime in case of a network error. The different addresses can be specified in the config using a pipe ('|') character.

从 ZooKeeper 3.6.0 开始,可以为每个 ZooKeeper 服务器指定多个地址(参见 https://issues.apache.org/jira/projects/ZOOKEEPER/issues/ZOOKEEPER-3188)。这有助于提高可用性,并增加 ZooKeeper 的网络级弹性。当服务器使用多个物理网络接口时,ZooKeeper 能够绑定到所有接口,并在出现网络错误时于运行时切换到可用的接口。不同的地址可以在配置中使用管道(“|”)字符分隔。

Examples of valid configurations using multiple addresses:

使用多个地址的有效配置示例:

server.2=zoo2-net1:2888:3888|zoo2-net2:2889:3889;2188
server.2=zoo2-net1:2888:3888|zoo2-net2:2889:3889|zoo2-net3:2890:3890;2188
server.2=zoo2-net1:2888:3888|zoo2-net2:2889:3889;zoo2-net1:2188
server.2=zoo2-net1:2888:3888:observer|zoo2-net2:2889:3889:observer;2188
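
The pipe-separated form can be read the same way as a single-address statement, one entry at a time. The following Python sketch splits the right-hand side of such a statement into its address entries; `split_server_addresses` is a hypothetical helper written for this illustration, not part of ZooKeeper.

```python
def split_server_addresses(value: str):
    """Split the right-hand side of a server statement into address entries.

    Returns a list of (host, port1, port2, role) tuples, one per
    pipe-separated address, plus the raw client port specification.
    """
    quorum_part, client_part = value.split(";", 1)
    addresses = []
    for entry in quorum_part.split("|"):
        fields = entry.split(":")
        host, port1, port2 = fields[0], int(fields[1]), int(fields[2])
        role = fields[3] if len(fields) > 3 else "participant"
        addresses.append((host, port1, port2, role))
    return addresses, client_part
```

Applied to `zoo2-net1:2888:3888|zoo2-net2:2889:3889;2188`, this yields two address entries and the client port specification `2188`.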

The standaloneEnabled flag 独立模式启用的标志

Prior to 3.5.0, one could run ZooKeeper in Standalone mode or in a Distributed mode. These are separate implementation stacks, and switching between them during run time is not possible. By default (for backward compatibility) standaloneEnabled is set to true. The consequence of using this default is that if started with a single server the ensemble will not be allowed to grow, and if started with more than one server it will not be allowed to shrink to contain fewer than two participants.

在3.5.0之前,可以在独立模式或分布式模式下运行 ZooKeeper。这两种模式是互相独立实现的,在运行时不可能在它们之间进行切换。默认情况下(对于向下兼容来说),standaloneEnabled设置为 true 。使用这个默认值的结果是,如果从一个服务器开始,集群将不允许增长,如果从多个服务器开始,它将不允许缩小到少于两个参与者。

Setting the flag to false instructs the system to run the Distributed software stack even if there is only a single participant in the ensemble. To achieve this the (static) configuration file should contain:

将标志设置为 false 会指示系统以分布式方式运行,即使集合中只有一个参与者。为了实现这一点,(静态)配置文件应该包含:

standaloneEnabled=false

With this setting it is possible to start a ZooKeeper ensemble containing a single participant and to dynamically grow it by adding more servers. Similarly, it is possible to shrink an ensemble so that just a single participant remains, by removing servers.

有了这个设置,就可以启动一个包含单个参与者的 ZooKeeper 集群,并通过添加更多的服务器来动态地增长它。同样也可以通过移除服务器缩小它,甚至只保留一个参与者。

Since running the Distributed mode allows more flexibility, we recommend setting the flag to false. We expect that the legacy Standalone mode will be deprecated in the future.

由于运行分布式模式允许更多的灵活性,我们建议将标志设置为 false。我们希望,传统的独立模式将在未来被弃用。

The reconfigEnabled flag 重新配置启用的标志

Starting with 3.5.0 and prior to 3.5.3, there was no way to disable the dynamic reconfiguration feature. We wanted to offer the option of disabling it because, with reconfiguration enabled, a malicious actor could make arbitrary changes to the configuration of a ZooKeeper ensemble, including adding a compromised server to the ensemble. We prefer to leave it to the discretion of the user to decide whether to enable it and to make sure that appropriate security measures are in place. So 3.5.3 introduced the reconfigEnabled configuration option, which allows the reconfiguration feature to be completely disabled. By default, any attempt to reconfigure a cluster through the reconfig API, with or without authentication, will fail unless reconfigEnabled is set to true.

从3.5.0开始,在3.5.3之前,没有办法禁用动态重新配置功能。我们希望提供禁用重新配置功能的选项,因为启用重新配置后存在一个安全问题,即恶意行为者可以对 ZooKeeper 集群的配置进行任意更改,包括在集群中添加一个被攻陷的服务器。我们倾向于让用户自行决定是否启用它,并确保适当的安全措施到位。因此在3.5.3中引入了 reconfigEnabled 配置选项,这样可以完全禁用重新配置功能;默认情况下,任何通过 reconfig API 重新配置集群的尝试都会失败,无论是否使用身份验证,除非 reconfigEnabled 设置为 true。

To set the option to true, the configuration file (zoo.cfg) should contain:

要将该选项设置为 true,配置文件(zoo.cfg)应该包含:

reconfigEnabled=true

Dynamic configuration file 动态配置文件

Starting with 3.5.0 we’re distinguishing between dynamic configuration parameters, which can be changed during runtime, and static configuration parameters, which are read from a configuration file when a server boots and don’t change during its execution. For now, the following configuration keywords are considered part of the dynamic configuration: server, group and weight.

从3.5.0开始,我们区分了动态配置参数和静态配置参数,前者可以在运行时更改,后者可以在服务器启动时从配置文件中读取,而且在执行过程中不会更改。现在,以下配置关键字被认为是动态配置的一部分: server、 group 和 weight。

Dynamic configuration parameters are stored in a separate file on the server (which we call the dynamic configuration file). This file is linked from the static config file using the new dynamicConfigFile keyword.

动态配置参数存储在服务器上的一个单独的文件中(我们称之为动态配置文件)。该文件使用新的dynamicConfigFile 关键字从静态配置文件链接。

Example 1

zoo_replicated1.cfg

tickTime=2000
dataDir=/zookeeper/data/zookeeper1
initLimit=5
syncLimit=2
dynamicConfigFile=/zookeeper/conf/zoo_replicated1.cfg.dynamic

zoo_replicated1.cfg.dynamic

server.1=125.23.63.23:2780:2783:participant;2791
server.2=125.23.63.24:2781:2784:participant;2792
server.3=125.23.63.25:2782:2785:participant;2793

When the ensemble configuration changes, the static configuration parameters remain the same. The dynamic parameters are pushed by ZooKeeper and overwrite the dynamic configuration files on all servers. Thus, the dynamic configuration files on the different servers are usually identical (they can only differ momentarily when a reconfiguration is in progress, or if a new configuration hasn’t propagated yet to some of the servers). Once created, the dynamic configuration file should not be manually altered. Changes are only made through the new reconfiguration commands outlined below. Note that changing the config of an offline cluster could result in an inconsistency with respect to configuration information stored in the ZooKeeper log (and the special configuration znode, populated from the log) and is therefore highly discouraged.

当集成配置发生变化时,静态配置参数保持不变。动态参数由 ZooKeeper 推送,并覆盖所有服务器上的动态配置文件。因此,不同服务器上的动态配置文件通常是相同的(只有在重新配置正在进行时,或者新配置尚未传播到某些服务器时,它们才会暂时不同)。创建后,不应手动更改动态配置文件,仅通过下面概述的新的重新配置命令进行更改。请注意,更改离线集群的配置可能会导致存储在 ZooKeeper 日志中的配置信息(以及从日志中填充的特殊配置 znode)不一致,因此非常不鼓励这样做。

Example 2

Users may prefer to initially specify a single configuration file. The following is thus also legal:

用户可能更喜欢在开始时指定一个配置文件,因此以下内容也是合法的:

zoo_replicated1.cfg

tickTime=2000
dataDir=/zookeeper/data/zookeeper1
initLimit=5
syncLimit=2
clientPort=

The configuration files on each server will be automatically split into dynamic and static files, if they are not already in this format. So the configuration file above will be automatically transformed into the two files in Example 1. Note that the clientPort and clientPortAddress lines (if specified) will be automatically removed during this process, if they are redundant (as in the example above). The original static configuration file is backed up (in a .bak file).

如果每个服务器上的配置文件还不是这种格式,它们将被自动拆分为动态文件和静态文件。因此,上面的配置文件将自动转换为示例1中的两个文件。注意,clientPort 和 clientPortAddress 行(如果指定的话)如果是冗余的(如上例所示),将在这个过程中自动删除。原始静态配置文件会被备份(保存为 .bak 文件)。
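
The split described above can be pictured with a small Python sketch. It only separates lines by keyword (server, group and weight are dynamic, everything else is static); it is not ZooKeeper's implementation, and it deliberately leaves out the removal of redundant clientPort lines, the .bak backup, and the dynamicConfigFile line that links the two files. `split_config` is a hypothetical name.

```python
DYNAMIC_KEYWORDS = ("server", "group", "weight")


def split_config(text: str):
    """Separate a single configuration file into static and dynamic lines."""
    static_lines, dynamic_lines = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key = line.split("=", 1)[0].strip()
        # Keys such as "server.1" start with a dynamic keyword followed by a dot.
        if key.split(".", 1)[0] in DYNAMIC_KEYWORDS:
            dynamic_lines.append(line)
        else:
            static_lines.append(line)
    return static_lines, dynamic_lines
```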

Backward compatibility 向下兼容

We still support the old configuration format. For example, the following configuration file is acceptable (but not recommended):

我们仍然支持旧的配置格式。例如,下面的配置文件是可以接受的(但不推荐) :

zoo_replicated1.cfg

tickTime=2000
dataDir=/zookeeper/data/zookeeper1
initLimit=5
syncLimit=2
clientPort=2791
server.1=125.23.63.23:2780:2783:participant
server.2=125.23.63.24:2781:2784:participant
server.3=125.23.63.25:2782:2785:participant

During boot, a dynamic configuration file is created and contains the dynamic part of the configuration as explained earlier. In this case, however, the line “clientPort=2791” will remain in the static configuration file of server 1 since it is not redundant: it was not specified as part of the “server.1=…” statement using the format explained in the section Changes to Configuration Format. If a reconfiguration is invoked that sets the client port of server 1, we remove “clientPort=2791” from the static configuration file (the dynamic file now contains this information as part of the specification of server 1).

在启动期间,将创建一个动态配置文件,其中包含前面解释的配置的动态部分。但是,在这种情况下,“clientPort=2791”一行将保留在 server 1 的静态配置文件中,因为它不是冗余的:它没有使用“更改配置格式”一节中解释的格式指定为“server.1=…”的一部分。如果调用一次重新配置来设置 server 1 的客户端端口,我们将从静态配置文件中删除“clientPort=2791”(动态文件现在包含这个信息,作为 server 1 规范的一部分)。

Upgrading to 3.5.0 升级到3.5.0

Upgrading a running ZooKeeper ensemble to 3.5.0 should be done only after upgrading your ensemble to the 3.4.6 release. Note that this is only necessary for rolling upgrades (if you’re fine with shutting down the system completely, you don’t have to go through 3.4.6). If you attempt a rolling upgrade without going through 3.4.6 (for example from 3.4.5), you may get the following error:

升级到3.5.0版本只能在升级到3.4.6版本之后。请注意,这只是滚动升级所必需的(如果您对完全关闭系统没有意见,则不必通过3.4.6)。如果您尝试滚动升级而没有通过3.4.6(例如从3.4.5) ,您可能会得到以下错误:

2013-01-30 11:32:10,663 [myid:2] - INFO [localhost/127.0.0.1:2784:QuorumCnxManager$Listener@498] - Received connection request /127.0.0.1:60876
2013-01-30 11:32:10,663 [myid:2] - WARN [localhost/127.0.0.1:2784:QuorumCnxManager@349] - Invalid server id: -65536

During a rolling upgrade, each server is taken down in turn and rebooted with the new 3.5.0 binaries. Before starting the server with 3.5.0 binaries, we highly recommend updating the configuration file so that all server statements “server.x=…” contain client ports (see the section Specifying the client port). As explained earlier you may leave the configuration in a single file, as well as leave the clientPort/clientPortAddress statements (although if you specify client ports in the new format, these statements are now redundant).

在滚动升级期间,每个服务器依次关闭并使用新的3.5.0二进制文件重新启动。在使用3.5.0二进制文件启动服务器之前,我们强烈建议更新配置文件,使所有服务器语句server.x = ...包含客户端端口(参见指定客户端端口一节)。如前所述,您可以将配置保留在单个文件中,也可以保留 clientPort/clientPortAddress 语句(尽管如果您以新的格式指定客户端,这些语句现在是多余的)。

Dynamic Reconfiguration of the ZooKeeper Ensemble ZooKeeper集群的动态重构

The ZooKeeper Java and C API were extended with getConfig and reconfig commands that facilitate reconfiguration. Both commands have a synchronous (blocking) variant and an asynchronous one. We demonstrate these commands here using the Java CLI, but note that you can similarly use the C CLI or invoke the commands directly from a program just like any other ZooKeeper command.

ZooKeeper 的 Java 和 C API 扩展了 getConfig 和 reconfig 命令,用于支持重新配置。两个命令都有同步(阻塞)和异步两种变体。我们在这里使用 Java CLI 演示这些命令,但是请注意,您也可以类似地使用 C CLI,或者像其他 ZooKeeper 命令一样直接从程序调用这些命令。

API

There are two sets of APIs for both the Java and C clients.

Java 和 C 客户端都有两套 API。

  • Reconfiguration API : The reconfiguration API is used to reconfigure the ZooKeeper cluster. Starting with 3.5.3, the reconfiguration Java APIs were moved from the ZooKeeper class into the ZooKeeperAdmin class, and use of this API requires ACL setup and user authentication (see Security for more information).

    Reconfiguration API: Reconfiguration API用于重新配置 ZooKeeper 集群。从3.5.3开始,重新配置的 Java API 被从 ZooKeeper 类移动到 ZooKeeperAdmin类中,使用这个 API 需要 ACL 设置和用户身份验证.

  • Get Configuration API : The get configuration APIs are used to retrieve the ZooKeeper cluster configuration information stored in the /zookeeper/config znode. Use of this API does not require specific setup or authentication, because /zookeeper/config is readable by any user.

    Get Configuration API: Get Configuration API 用于检索存储在 /zookeeper/config znode 中的 ZooKeeper 集群配置信息。使用这个 API 不需要特定的设置或身份验证,因为 /zookeeper/config 对于任何用户都是可读的。

Security

Prior to 3.5.3, there was no enforced security mechanism over reconfig, so any ZooKeeper client that could connect to the ZooKeeper server ensemble had the ability to change the state of a ZooKeeper cluster via reconfig. It was thus possible for a malicious client to tamper with an ensemble, e.g., to add a compromised server or to remove legitimate servers. Cases like these could be security vulnerabilities, depending on the deployment.

在3.5.3之前,没有针对 reconfig 的强制安全机制,因此任何可以连接到 ZooKeeper 服务器集群的 ZooKeeper 客户端都可以通过 reconfig 来更改 ZooKeeper 集群的状态。因此,恶意客户端有可能篡改集群,例如添加被攻陷的服务器或删除合法服务器。这样的情况是否构成安全漏洞,取决于具体部署。

To address this security concern, we introduced access control over reconfig starting from 3.5.3, such that only a specific set of users can use the reconfig commands or APIs, and these users need to be configured explicitly. In addition, the ZooKeeper cluster must be set up with authentication enabled so that ZooKeeper clients can be authenticated.

为了解决这个安全问题,我们从3.5.3开始引入了对重新配置的访问控制,这样只有一组特定的用户可以使用重新配置命令或 API,并且这些用户需要显式配置。此外,ZooKeeper 集群的设置必须启用身份验证,以便 ZooKeeper 客户端能够进行身份验证。

We also provide an escape hatch for users who operate and interact with a ZooKeeper ensemble in a secured environment (e.g., behind a company firewall). Users who want to use the reconfiguration feature but don’t want the overhead of configuring an explicit list of authorized users for reconfig access checks can set “skipACL” to “yes”, which skips the ACL check and allows any user to reconfigure the cluster.

我们还为那些在安全的环境(即公司防火墙之后)中操作和与 ZooKeeper 集成交互的用户提供了一个逃生窗口。对于那些想要使用重新配置特性但不想为重新配置访问检查配置明确的授权用户列表的用户,他们可以将skipACL设置为“ yes”,这将跳过 ACL 检查,允许任何用户重新配置集群。

Overall, ZooKeeper provides flexible configuration options for the reconfiguration feature that allow users to choose based on their security requirements. We leave it to the discretion of the user to decide whether appropriate security measures are in place.

总的来说,ZooKeeper 为重新配置功能提供了灵活的配置选项,允许用户根据用户的安全需求进行选择。我们让用户自行决定适当的安全措施是否到位。

  • Access Control : The dynamic configuration is stored in a special znode, ZooDefs.CONFIG_NODE = /zookeeper/config. By default this node is read-only for all users except the super user and users that are explicitly configured for write access. Clients that need to use the reconfig commands or the reconfig API should be configured as users that have write access to CONFIG_NODE. By default, only the super user has full control, including write access to CONFIG_NODE. The super user can grant additional users write access by setting an ACL that associates write permission with the specified user. A few examples of how to set up ACLs and use the reconfiguration API with authentication can be found in ReconfigExceptionTest.java and TestReconfigServer.cc.

    访问控制: 动态配置存储在一个特殊的 znode ZooDefs.CONFIG_NODE = /zookeeper/config 中。默认情况下,此节点对所有用户只读,但超级用户和显式配置了写访问权限的用户除外。需要使用 reconfig 命令或 reconfig API 的客户端应配置为对 CONFIG_NODE 具有写访问权限的用户。默认情况下,只有超级用户拥有完全的控制权,包括对 CONFIG_NODE 的写访问权。超级用户可以通过设置将写权限与指定用户关联的 ACL,授予其他用户写访问权限。在 ReconfigExceptionTest.java 和 TestReconfigServer.cc 中可以找到一些关于如何设置 ACL 以及在身份验证下使用重新配置 API 的例子。

  • Authentication : Authentication of users is orthogonal to the access control and is delegated to existing authentication mechanism supported by ZooKeeper’s pluggable authentication schemes. See ZooKeeper and SASL for more details on this topic.

    身份验证: 用户的身份验证与访问控制是正交的,并被委托给由 ZooKeeper 的可插入身份验证方案支持的现有身份验证机制。请参阅 ZooKeeper 和 SASL 了解更多关于这个主题的详细信息。

  • Disable ACL check : ZooKeeper supports a “skipACL” option; if skipACL is set to “yes”, ACL checks are skipped completely. In that case any unauthenticated user can use the reconfig API.

    禁用 ACL 检查: ZooKeeper 支持skipACL选项,这样,如果 skipACL设置为“ yes”,ACL 检查将被完全跳过。在这种情况下,任何未经身份验证的用户都可以使用 reconfig API

Retrieving the current dynamic configuration 检索当前动态配置

The dynamic configuration is stored in a special znode ZooDefs.CONFIG_NODE = /zookeeper/config. The new config CLI command reads this znode (currently it is simply a wrapper to get /zookeeper/config). As with normal reads, to retrieve the latest committed value you should do a sync first.

动态配置存储在一个特殊的 znode ZooDefs.CONFIG_NODE = /zookeeper/config 中。新的 config CLI 命令读取这个 znode(目前它只是 get /zookeeper/config 的包装)。与正常读取一样,要检索最新提交的值,您应该首先执行 sync。

[zk: 127.0.0.1:2791(CONNECTED) 3] config
server.1=localhost:2780:2783:participant;localhost:2791
server.2=localhost:2781:2784:participant;localhost:2792
server.3=localhost:2782:2785:participant;localhost:2793
version=400000003

Notice the last line of the output. This is the configuration version. The version equals the zxid of the reconfiguration command that created this configuration. The version of the first established configuration equals the zxid of the NEWLEADER message sent by the first successfully established leader. When a configuration is written to a dynamic configuration file, the version automatically becomes part of the filename and the static configuration file is updated with the path to the new dynamic configuration file. Configuration files corresponding to earlier versions are retained for backup purposes.

注意输出的最后一行。这是配置版本。版本等于创建此配置的重新配置命令的 zxid。第一个建立的配置的版本等于第一个成功建立的领导者发送的 NEWLEADER 消息的 zxid。当配置写入动态配置文件时,版本自动成为文件名的一部分,静态配置文件随着新的动态配置文件的路径更新。保留与早期版本相对应的配置文件以备份。

During boot time the version (if it exists) is extracted from the filename. The version should never be altered manually by users or the system administrator. It is used by the system to know which configuration is most up-to-date. Manipulating it manually can result in data loss and inconsistency.

在启动期间,从文件名中提取版本(如果存在的话)。这个版本永远不应该被用户或者系统管理员修改。系统使用它来知道哪种配置是最新的。手动操作可能导致数据丢失和不一致。
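
As a rough illustration of how a version embedded in a filename can be used to find the most recent configuration, the Python sketch below reads the last dot-separated component of a filename as a hexadecimal number. The filename layout (version appended as the final suffix) and the hexadecimal encoding are assumptions made for this sketch, as is the helper name `version_from_filename`; this is for inspection only and is not a tool for editing versions, which should never be altered by hand.

```python
from typing import Optional


def version_from_filename(filename: str) -> Optional[int]:
    """Return the version encoded as the last suffix of a dynamic config
    filename, or None if the name carries no version (assumed layout)."""
    suffix = filename.rsplit(".", 1)[-1]
    try:
        return int(suffix, 16)  # versions are zxids, assumed printed in hex
    except ValueError:
        return None
```

With this helper, picking the newest of several retained files is a matter of taking the maximum by version.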

Just like a get command, the config CLI command accepts the -w flag for setting a watch on the znode, and -s flag for displaying the Stats of the znode. It additionally accepts a new flag -c which outputs only the version and the client connection string corresponding to the current configuration. For example, for the configuration above we would get:

与 get 命令一样,config CLI 命令接受 -w 标志用于在 znode 上设置 watch,接受 -s 标志用于显示 znode 的 Stats。它还接受一个新的标志 -c,该标志只输出与当前配置对应的版本和客户端连接字符串。例如,对于上面的配置,我们会得到:

[zk: 127.0.0.1:2791(CONNECTED) 17] config -c
400000003 localhost:2791,localhost:2793,localhost:2792

Note that when using the API directly, this command is called getConfig.

注意,当直接使用 API 时,这个命令称为 getConfig。
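
A client-side script that consumes the `config -c` output shown above might split it into the version and the individual host:port pairs. The Python sketch below does that for the single-line format in the example; `parse_config_c` is a hypothetical helper, and treating the version as hexadecimal is an assumption.

```python
def parse_config_c(output: str):
    """Parse '<version> <host:port>,<host:port>,...' as printed by config -c."""
    version_text, connection_string = output.strip().split(None, 1)
    servers = []
    for entry in connection_string.split(","):
        host, port = entry.rsplit(":", 1)
        servers.append((host, int(port)))
    # Assumption: the version is a zxid printed in hexadecimal.
    return int(version_text, 16), servers
```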

Like any read command, it returns the configuration known to the follower to which your client is connected, which may be slightly out-of-date. One can use the sync command for stronger guarantees. For example, using the Java API:

与任何 read 命令一样,它返回客户端所连接的跟随者已知的配置,这可能有点过时。你可以使用同步命令来获得更强的保证。例如使用 Java API:

zk.sync(ZooDefs.CONFIG_NODE, void_callback, context);
zk.getConfig(watcher, callback, context);

Note: in 3.5.0 it doesn’t really matter which path is passed to the sync() command, as all of the server’s state is brought up to date with the leader (so one could use a different path instead of ZooDefs.CONFIG_NODE). However, this may change in the future.

注意: 在3.5.0中,传递给 sync() 命令的路径并不重要,因为服务器的全部状态都会与 leader 同步到最新(因此可以使用不同的路径而不是 ZooDefs.CONFIG_NODE)。然而,这在未来可能会改变。

Modifying the current dynamic configuration 修改当前的动态配置

Modifying the configuration is done through the reconfig command. There are two modes of reconfiguration: incremental and non-incremental (bulk). The non-incremental simply specifies the new dynamic configuration of the system. The incremental specifies changes to the current configuration. The reconfig command returns the new configuration.

修改配置是通过 reconfig 命令完成的。重新配置有两种模式: 增量和非增量(批量)。非增量式只是简单地指定系统的新的动态配置。增量指定对当前配置的更改。reconfig 命令返回新的配置。
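
The difference between the two modes can be modeled on plain data, independent of the actual reconfig command or API. In the Python sketch below a configuration is a dictionary from server id to server specification; `apply_bulk` and `apply_incremental` are hypothetical names for this illustration and do not correspond to ZooKeeper functions.

```python
def apply_bulk(current: dict, new_config: dict) -> dict:
    """Non-incremental mode: the new configuration is given in full."""
    return dict(new_config)


def apply_incremental(current: dict, joining: dict, leaving: set) -> dict:
    """Incremental mode: describe only the changes to the current configuration."""
    updated = {sid: spec for sid, spec in current.items() if sid not in leaving}
    # A joining entry for an id that already exists replaces its parameters.
    updated.update(joining)
    return updated
```

Both functions return the resulting configuration and leave the input untouched, mirroring the fact that the reconfig command returns the new configuration.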

A few examples are in: ReconfigTest.java, ReconfigRecoveryTest.java and TestReconfigServer.cc.

下面是一些例子: ReconfigTest.java、ReconfigRecoveryTest.java 和 TestReconfigServer.cc。

General

Removing servers: Any server can be removed, including the leader (although removing the leader will result in a short unavailability, see Figures 6 and 8 in the paper). The server will not be shut-down automatically. Instead, it becomes a “non-voting follower”. This is somewhat similar to an observer in that its votes don’t count towards the Quorum of votes necessary to commit operations. However, unlike a non-voting follower, an observer doesn’t actually see any operation proposals and does not ACK them. Thus a non-voting follower has a more significant negative effect on system throughput compared to an observer. Non-voting follower mode should only be used as a temporary mode, before shutting the server down, or adding it as a follower or as an observer to the ensemble. We do not shut the server down automatically for two main reasons. The first reason is that we do not want all the clients connected to this server to be immediately disconnected, causing a flood of connection requests to other servers. Instead, it is better if each client decides when to migrate independently. The second reason is that removing a server may sometimes (rarely) be necessary in order to change it from “observer” to “participant” (this is explained in the section Additional comments).

删除服务器: 可以删除任何服务器,包括领导者(尽管删除领导者会导致短暂的不可用)。服务器不会自动关闭,相反,它变成了一个“无投票权的追随者”。这有点类似于观察员,因为它的投票不计入提交操作所需的法定人数。然而,不像一个无投票权的跟随者,一个观察者实际上不会看到任何操作建议,也不会对它们进行 ACK。因此,与观察者相比,无投票关注者对系统吞吐量的负面影响更为显著。无投票跟随者模式应该只作为一个临时模式,在关闭服务器之前,或添加它作为一个跟随者或作为一个观察者的集合。

我们不会自动关闭服务器,主要有两个原因。第一个原因是,我们不希望连接到此服务器的所有客户端立即断开连接,这会导致大量连接请求涌向其他服务器。相反,如果每个客户端都能独立决定何时进行迁移,那就更好了。第二个原因是,为了将服务器从“观察者”更改为“参与者”,有时(很少)可能需要删除服务器。

Note that the new configuration must have a minimum number of participants in order to be considered legal. If the proposed change would leave the cluster with fewer than two participants and standalone mode is enabled (standaloneEnabled=true, see the section The standaloneEnabled flag), the reconfig will not be processed (BadArgumentsException). If standalone mode is disabled (standaloneEnabled=false), it is legal to remain with one or more participants.

请注意,新的配置必须有最低数量的参与者,才会被视为合法。如果提议的更改将使集群的参与者少于2个,并且启用了独立模式(standaloneEnabled=true,参见“standaloneEnabled 标志”一节),则不会处理重新配置(BadArgumentsException)。如果禁用了独立模式(standaloneEnabled=false),那么保留一个或多个参与者是合法的。
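
The rule above reduces to a small check on the number of participants that would remain. A minimal Python sketch, with the hypothetical name `is_legal_participant_count`:

```python
def is_legal_participant_count(participants: int, standalone_enabled: bool) -> bool:
    """Check the minimum number of participants a proposed configuration needs."""
    # standaloneEnabled=true: at least two participants must remain.
    # standaloneEnabled=false: a single participant is enough.
    minimum = 2 if standalone_enabled else 1
    return participants >= minimum
```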

Adding servers: Before a reconfiguration is invoked, the administrator must make sure that a quorum (majority) of participants from the new configuration are already connected and synced with the current leader. To achieve this we need to connect a new joining server to the leader before it is officially part of the ensemble. This is done by starting the joining server using an initial list of servers which is technically not a legal configuration of the system but (a) contains the joiner, and (b) gives sufficient information to the joiner in order for it to find and connect to the current leader. We list a few different options of doing this safely.

添加服务器: 在调用重新配置之前,管理员必须确保新配置中的法定数量(多数)参与者已经连接到当前 leader 并与其同步。为了实现这一点,我们需要在新加入的服务器正式成为集群的一部分之前,将其连接到 leader。做法是使用一个初始服务器列表来启动加入的服务器,这个列表在技术上不是系统的合法配置,但(a)包含 joiner,并且(b)为 joiner 提供足够的信息,以便它找到并连接到当前的 leader。我们列出了几种安全的做法:

  1. Initial configuration of joiners is comprised of servers in the last committed configuration and one or more joiners, where joiners are listed as observers. For example, if servers D and E are added at the same time to (A, B, C) and server C is being removed, the initial configuration of D could be (A, B, C, D) or (A, B, C, D, E), where D and E are listed as observers. Similarly, the configuration of E could be (A, B, C, E) or (A, B, C, D, E), where D and E are listed as observers. Note that listing the joiners as observers will not actually make them observers - it will only prevent them from accidentally forming a quorum with other joiners. Instead, they will contact the servers in the current configuration and adopt the last committed configuration (A, B, C), where the joiners are absent. Configuration files of joiners are backed up and replaced automatically as this happens. After connecting to the current leader, joiners become non-voting followers until the system is reconfigured and they are added to the ensemble (as participant or observer, as appropriate).

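    joiners 的初始配置由最后提交配置中的服务器和一个或多个 joiners 组成,其中 joiners 被列为观察者。例如,如果服务器 D 和 E 同时被添加到 (A, B, C),而服务器 C 正在被删除,那么 D 的初始配置可以是 (A, B, C, D) 或 (A, B, C, D, E),其中 D 和 E 被列为观察者。类似地,E 的配置可以是 (A, B, C, E) 或 (A, B, C, D, E),其中 D 和 E 被列为观察者。请注意,将 joiners 列为观察者实际上并不会让它们成为观察者,只会防止它们意外地与其他 joiners 组成法定数量。它们会联系当前配置中的服务器,并采用最后提交的配置 (A, B, C),其中不包含 joiners。在此过程中,joiners 的配置文件会被自动备份并替换。连接到当前 leader 之后,joiners 成为无投票权的跟随者,直到系统被重新配置并将它们加入集群(视情况作为参与者或观察者)。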

  2. Initial configuration of each joiner is comprised of servers in the last committed configuration + the joiner itself, listed as a participant. For example, to add a new server D to a configuration consisting of servers (A, B, C), the administrator can start D using an initial configuration file consisting of servers (A, B, C, D). If both D and E are added at the same time to (A, B, C), the initial configuration of D could be (A, B, C, D) and the configuration of E could be (A, B, C, E). Similarly, if D is added and C is removed at the same time, the initial configuration of D could be (A, B, C, D). Never list more than one joiner as participant in the initial configuration (see warning below).

    每个 joiner 的初始配置由最后提交配置中的服务器加上 joiner 本身组成,joiner 被列为参与者。例如,要将新服务器 D 添加到由服务器 (A, B, C) 组成的配置中,管理员可以使用由服务器 (A, B, C, D) 组成的初始配置文件启动 D。如果 D 和 E 同时被添加到 (A, B, C),D 的初始配置可以是 (A, B, C, D),E 的配置可以是 (A, B, C, E)。同样,如果添加 D 的同时删除 C,D 的初始配置可以是 (A, B, C, D)。切勿在初始配置中将多个 joiner 列为参与者(参见下面的警告)。

  3. Whether listing the joiner as an observer or as participant, it is also fine not to list all the current configuration servers, as long as the current leader is in the list. For example, when adding D we could start D with a configuration file consisting of just (A, D) if A is the current leader. However, this is more fragile: if A fails before D officially joins the ensemble, D doesn’t know anyone else, and the administrator will have to intervene and restart D with another server list.

    无论将joiner作为观察者还是参与者列出,也可以不列出所有当前配置服务器,只要当前leader在列表中。例如,当添加D时,如果A是当前的leader,我们可以在D开始时使用一个只包含(A, D)的配置文件。然而,这是更脆弱的,因为如果A在D正式加入集合之前失败,D不认识任何人,因此管理员将不得不干预和重新启动D与另一个服务器列表。
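
Option 1 above (last committed configuration plus the joiners, all listed as observers) can be sketched as a small Python helper that assembles the initial server list for a joiner. The function name `joiner_initial_config` and the host names in the usage note are hypothetical; the sketch only manipulates single-address server statement strings and starts no servers.

```python
def joiner_initial_config(last_committed: list, joiners: list) -> list:
    """Build the initial server list for a joining server.

    last_committed: server statements of the last committed configuration.
    joiners: server statements of the joining servers; each is rewritten so
    that the joiner is listed as an observer, whatever role it will have later.
    """
    lines = list(last_committed)
    for line in joiners:
        quorum_part, client_part = line.split(";", 1)
        key, value = quorum_part.split("=", 1)
        fields = value.split(":")[:3]  # keep address and two ports, drop any role
        lines.append(f"{key}={':'.join(fields)}:observer;{client_part}")
    return lines
```

For example, passing the three statements of (A, B, C) and `server.4=hostD:2888:3888;2181` produces the list (A, B, C, D) with D marked as an observer.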

Warning 警告

Never specify more than one joining server in the same initial configuration as participants. Currently, the joining servers don’t know that they are joining an existing ensemble; if multiple joiners are listed as participants they may form an independent quorum creating a split-brain situation such as processing operations independently from your main ensemble. It is OK to list multiple joiners as observers in an initial config.

切勿在同一个初始配置中将多个加入服务器指定为参与者。目前,加入的服务器并不知道自己正在加入一个现有的集群;如果多个加入者被列为参与者,它们可能形成一个独立的法定数量,造成脑裂,例如独立于主集群处理操作。在初始配置中将多个加入者列为观察者是可以的。

If the configuration of existing servers changes or they become unavailable before the joiner succeeds in connecting and learning about configuration changes, the joiner may need to be restarted with an updated configuration file in order to be able to connect.

如果现有服务器的配置发生变化,或者在 joiner 成功连接并了解配置变化之前它们变得不可用,那么 joiner 可能需要使用更新的配置文件重新启动,以便能够连接。

Finally, note that once connected to the leader, a joiner adopts the last committed configuration, in which it is absent (the initial config of the joiner is backed up before being rewritten). If the joiner restarts in this state, it will not be able to boot since it is absent from its configuration file. In order to start it you’ll once again have to specify an initial configuration.

最后,请注意,一旦连接到领导者,joiner 就会采用最后提交的配置,在这个配置中它是不存在的(joiner 的初始配置在被重写之前会得到备份)。如果 joiner 在这种状态下重新启动,它将无法引导,因为它不在其配置文件中。为了启动它,你需要再次指定一个初始配置。

Modifying server parameters: One can modify any of the ports of a server, or its role (participant/observer) by adding it to the ensemble with different parameters. This works in both the incremental and the bulk reconfiguration modes. It is not necessary to remove the server and then add it back; just specify the new parameters as if the server is not yet in the system. The server will detect the configuration change and perform the necessary adjustments. See an example in the section Incremental mode and an exception to this rule in the section Additional comments.

修改服务器参数: 可以修改服务器的任何端口或其角色(参与者/观察者) ,方法是使用不同的参数将其添加到集成中。这在增量和批量重新配置模式下都可以工作。不需要先删除服务器,然后再将其添加回来; 只需指定新的参数,就好像服务器尚未在系统中一样。服务器将检测配置更改并执行必要的调整。请参阅增量模式一节中的示例以及附加注释一节中对此规则的异常。

It is also possible to change the Quorum System used by the ensemble (for example, change the Majority Quorum System to a Hierarchical Quorum System on the fly). This, however, is only allowed using the bulk (non-incremental) reconfiguration mode. In general, incremental reconfiguration only works with the Majority Quorum System. Bulk reconfiguration works with both Hierarchical and Majority Quorum Systems.

还可以更改集合所使用的法定人数系统(例如,动态地将多数法定人数系统更改为分级法定人数系统)。但是,这只允许使用批量(非增量)重新配置模式。一般来说,增量重新配置只适用于多数仲裁系统。批量重新配置适用于分级和多数仲裁系统。

Performance Impact: There is practically no performance impact when removing a follower, since it is not automatically shut down (the effect of removal is that the server's votes are no longer counted). When adding a server, there is no leader change and no noticeable performance disruption. For details and graphs, please see Figures 6, 7 and 8 in the paper.

The most significant disruption happens when the reconfiguration causes a leader change, in one of the following cases:

  1. The leader is removed from the ensemble.
  2. The leader's role is changed from participant to observer.
  3. The port used by the leader to send transactions to others (the quorum port) is modified.

In these cases we perform a leader hand-off, where the old leader nominates a new leader. The resulting unavailability is usually shorter than when a leader crashes, since detecting a leader failure is unnecessary and electing a new leader can usually be avoided during a hand-off (see Figures 6 and 8 in the paper).
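
The three cases above can be written as a small predicate. This is only an illustrative sketch: ServerEntry and LeaderChangeCheck are hypothetical names, not ZooKeeper classes, and the check assumes the current leader is a participant.

```java
import java.util.Map;

/** Hypothetical description of one server entry; not a ZooKeeper class. */
final class ServerEntry {
    final int quorumPort;
    final boolean observer;

    ServerEntry(int quorumPort, boolean observer) {
        this.quorumPort = quorumPort;
        this.observer = observer;
    }
}

final class LeaderChangeCheck {
    /**
     * Returns true if moving to newConfig falls into one of the three
     * leader-change cases for the current leader.
     */
    static boolean causesLeaderChange(int leaderId, ServerEntry current,
                                      Map<Integer, ServerEntry> newConfig) {
        ServerEntry next = newConfig.get(leaderId);
        if (next == null) {
            return true;   // case 1: the leader is removed from the ensemble
        }
        if (next.observer) {
            return true;   // case 2: the leader becomes an observer
        }
        // case 3: the leader's quorum port is modified
        return next.quorumPort != current.quorumPort;
    }
}
```

A tool that prepares reconfigurations could run such a check first and schedule leader-affecting changes for a quiet period.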

When the client port of a server is modified, the server does not drop existing client connections. New connections to the server must use the new client port.

Progress guarantees: Up to the invocation of the reconfig operation, a quorum of the old configuration must be available and connected for ZooKeeper to make progress. Once reconfig is invoked, a quorum of both the old and the new configurations must be available. The final transition happens once (a) the new configuration is activated, and (b) all operations scheduled before the new configuration is activated by the leader are committed. Once (a) and (b) happen, only a quorum of the new configuration is required.

Note, however, that neither (a) nor (b) is visible to a client. Specifically, when a reconfiguration operation commits, it only means that an activation message was sent out by the leader. It does not necessarily mean that a quorum of the new configuration received this message (which is required in order to activate it) or that (b) has happened. To make sure that both (a) and (b) have occurred (for example, in order to know that it is safe to shut down old servers that were removed), one can simply invoke an update (set-data, or some other quorum operation, but not a sync) and wait for it to commit. An alternative way to achieve this would have been to introduce another round to the reconfiguration protocol (which, for simplicity and compatibility with Zab, we decided to avoid).
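
The quorum requirements in the three phases can be sketched as follows. This is a simplified model, assuming the Majority Quorum System; the class name, the Phase enum and the use of integer server ids are illustrative choices, not part of ZooKeeper.

```java
import java.util.Set;

final class ProgressCheck {
    /** Phases of a reconfiguration, named here only for illustration. */
    enum Phase { BEFORE_RECONFIG, DURING_RECONFIG, AFTER_ACTIVATION }

    /** True if more than half of the voters are in the set of live servers. */
    static boolean hasMajority(Set<Integer> voters, Set<Integer> live) {
        long up = voters.stream().filter(live::contains).count();
        return up > voters.size() / 2;
    }

    /** Applies the rule for each phase: old quorum, both quorums, new quorum. */
    static boolean canMakeProgress(Phase phase, Set<Integer> oldVoters,
                                   Set<Integer> newVoters, Set<Integer> live) {
        switch (phase) {
            case BEFORE_RECONFIG:
                return hasMajority(oldVoters, live);
            case DURING_RECONFIG:
                return hasMajority(oldVoters, live) && hasMajority(newVoters, live);
            default:
                return hasMajority(newVoters, live);
        }
    }
}
```

For example, when moving from voters {1, 2, 3} to {3, 4, 5} with only servers 3, 4 and 5 alive, the model reports no progress until the new configuration has been activated, which is why removed servers should stay up until an update issued after the reconfig has committed.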

Incremental mode

The incremental mode allows adding servers to and removing servers from the current configuration. Multiple changes are allowed. For example:

> reconfig -remove 3 -add server.5=125.23.63.23:1234:1235;1236

Both the add and the remove options take a comma-separated list of arguments (no spaces):

> reconfig -remove 3,4 -add server.5=localhost:2111:2112;2113,6=localhost:2114:2115:observer;2116

The format of the server statement is exactly the same as described in the section Specifying the client port, and includes the client port. Notice that here, instead of “server.5=”, you can just say “5=”. In the example above, if server 6 is already in the system but has different ports or is not an observer, it is updated: once the configuration commits, it becomes an observer and starts using these new ports. This is an easy way to turn participants into observers and vice versa, or to change any of their ports, without rebooting the server.
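
When such arguments are generated by a script or tool, a small helper can assemble them. The following is a minimal sketch; ReconfigArgs and its methods are hypothetical helpers, not part of the ZooKeeper API, and the role is always written out explicitly, as in the -members example in this document.

```java
import java.util.List;

final class ReconfigArgs {
    /** Builds "<id>=<host>:<quorumPort>:<electionPort>:<role>;<clientPort>". */
    static String serverSpec(int id, String host, int quorumPort, int electionPort,
                             boolean observer, int clientPort) {
        String role = observer ? "observer" : "participant";
        return id + "=" + host + ":" + quorumPort + ":" + electionPort
                + ":" + role + ";" + clientPort;
    }

    /** Joins server specs with commas and no spaces, as the -add option expects. */
    static String addArgument(List<String> specs) {
        return String.join(",", specs);
    }
}
```

For instance, serverSpec(6, "localhost", 2114, 2115, true, 2116) yields the string 6=localhost:2114:2115:observer;2116 used in the example above.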

ZooKeeper supports two types of Quorum Systems: the simple Majority system (where the leader commits operations after receiving ACKs from a majority of voters) and a more complex Hierarchical system, where votes of different servers have different weights and servers are divided into voting groups. Currently, incremental reconfiguration is allowed only if the last proposed configuration known to the leader uses a Majority Quorum System (BadArgumentsException is thrown otherwise).

Incremental mode - examples using the Java API:

// Example 1: remove servers 1 and 2
List<String> leavingServers = new ArrayList<String>();
leavingServers.add("1");
leavingServers.add("2");
byte[] config = zk.reconfig(null, leavingServers, null, -1, new Stat());

// Example 2 (separate snippet): remove server 1 and add server 4
List<String> leavingServers = new ArrayList<String>();
List<String> joiningServers = new ArrayList<String>();
leavingServers.add("1");
joiningServers.add("server.4=localhost:1234:1235;1236");
byte[] config = zk.reconfig(joiningServers, leavingServers, null, -1, new Stat());

// Print the resulting configuration
String configStr = new String(config);
System.out.println(configStr);

There is also an asynchronous API, and an API accepting comma-separated Strings instead of List. See src/java/main/org/apache/zookeeper/ZooKeeper.java.

Non-incremental mode

The second mode of reconfiguration is non-incremental, whereby a client gives a complete specification of the new dynamic system configuration. The new configuration can either be given in place or read from a file:

> reconfig -file newconfig.cfg

// newconfig.cfg is a dynamic config file, see Dynamic configuration file

> reconfig -members server.1=125.23.63.23:2780:2783:participant;2791,server.2=125.23.63.24:2781:2784:participant;2792,server.3=125.23.63.25:2782:2785:participant;2793

The new configuration may use a different Quorum System. For example, you may specify a Hierarchical Quorum System even if the current ensemble uses a Majority Quorum System.

Bulk mode - example using the Java API:

List<String> newMembers = new ArrayList<String>();
newMembers.add("server.1=1111:1234:1235;1236");
newMembers.add("server.2=1112:1237:1238;1239");
newMembers.add("server.3=1114:1240:1241:observer;1242");

// Replace the whole membership in one bulk reconfiguration
byte[] config = zk.reconfig(null, null, newMembers, -1, new Stat());

String configStr = new String(config);
System.out.println(configStr);

There is also an asynchronous API, and an API accepting a comma-separated String containing the new members instead of List. See src/java/main/org/apache/zookeeper/ZooKeeper.java.

Conditional reconfig

Sometimes (especially in non-incremental mode) a new proposed configuration depends on what the client “believes” to be the current configuration, and should be applied only to that configuration. Specifically, the reconfig succeeds only if the last configuration at the leader has the specified version.

> reconfig -file <filename> -v <version>

In the previously listed Java examples, instead of -1 one could specify a configuration version to condition the reconfiguration.
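
To condition a reconfiguration, the client first needs the version of the configuration it read. The sketch below assumes that the configuration string returned to the client contains a line of the form version=<hex digits>; that format, the class name ConfigVersion and the choice of returning -1 when no version line is found are assumptions made for illustration.

```java
final class ConfigVersion {
    /**
     * Extracts the version from a dynamic configuration string, assuming it
     * contains a line of the form "version=<hex digits>".
     * Returns -1 (meaning "unconditional") if no such line is found.
     */
    static long extractVersion(String configStr) {
        for (String line : configStr.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("version=")) {
                // The version is written in hexadecimal
                return Long.parseLong(trimmed.substring("version=".length()), 16);
            }
        }
        return -1L;
    }
}
```

The returned value could then be passed in place of -1 in the reconfig calls shown earlier, so that the call fails instead of overwriting a configuration the client has not seen.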

Error conditions

In addition to normal ZooKeeper error conditions, a reconfiguration may fail for the following reasons:

  1. another reconfig is currently in progress (ReconfigInProgress)

  2. the proposed change would leave the cluster with fewer than 2 participants while standalone mode is enabled; if standalone mode is disabled, it is legal to remain with 1 or more participants (BadArgumentsException)

  3. no quorum of the new configuration was connected and up-to-date with the leader when the reconfiguration processing began (NewConfigNoQuorum)

  4. -v x was specified, but the version y of the latest configuration is not x (BadVersionException)

  5. an incremental reconfiguration was requested, but the last configuration at the leader uses a Quorum System other than the Majority system (BadArgumentsException)

  6. syntax error (BadArgumentsException)

  7. I/O exception when reading the configuration from a file (BadArgumentsException)

Most of these are illustrated by test cases in ReconfigFailureCases.java.

Additional comments

Liveness: To better understand the difference between incremental and non-incremental reconfiguration, suppose that client C1 adds server D to the system while a different client C2 adds server E. With the non-incremental mode, each client would first invoke config to find out the current configuration, and then locally create a new list of servers by adding its own suggested server. The new configuration can then be submitted using the non-incremental reconfig command. After both reconfigurations complete, only one of E or D will be added (not both), depending on which client's request reaches the leader second and overwrites the previous configuration. The other client can repeat the process until its change takes effect. This method guarantees system-wide progress (i.e., for one of the clients), but does not ensure that every client succeeds. To have more control, C2 may request that the reconfiguration be executed only if the version of the current configuration hasn't changed, as explained in the section Conditional reconfig. In this way it may avoid blindly overwriting the configuration of C1 if C1's configuration reached the leader first.

With incremental reconfiguration, both changes will take effect, as they are simply applied by the leader one after the other to the current configuration, whatever that is (assuming that the second reconfig request reaches the leader after it sends a commit message for the first reconfig request; currently the leader will refuse to propose a reconfiguration if another one is already pending). Since both clients are guaranteed to make progress, this method guarantees stronger liveness. In practice, multiple concurrent reconfigurations are probably rare. Non-incremental reconfiguration is currently the only way to dynamically change the Quorum System. Incremental reconfiguration is currently only allowed with the Majority Quorum System.
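
The difference between the modes can be reproduced with a toy model. ToyLeader below is a hypothetical in-memory stand-in for the configuration held by the leader, not ZooKeeper code; it only tracks a membership set and a version counter, and it assumes the two requests arrive one after the other.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory stand-in for the leader's configuration; not ZooKeeper code. */
final class ToyLeader {
    private Set<String> members = new TreeSet<>();
    private long version = 0;

    ToyLeader(Set<String> initial) {
        members.addAll(initial);
    }

    /** Returns a copy, like a client reading the current configuration. */
    Set<String> members() {
        return new TreeSet<>(members);
    }

    long version() {
        return version;
    }

    /** Non-incremental: replace the whole membership; -1 means unconditional. */
    boolean reconfigBulk(Set<String> newMembers, long expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            return false; // models the bad-version failure of a conditional reconfig
        }
        members = new TreeSet<>(newMembers);
        version++;
        return true;
    }

    /** Incremental: apply the change to whatever the current membership is. */
    void reconfigAdd(String server) {
        members.add(server);
        version++;
    }
}
```

If C1 and C2 both read {A, B, C} and submit unconditional bulk changes adding D and E respectively, the model ends with {A, B, C, E}. With the version condition, the second request is rejected and can be retried. With reconfigAdd, both D and E end up in the membership.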

Changing an observer into a follower: Clearly, changing a server that participates in voting into an observer may fail if error (2) occurs, i.e., if fewer than the minimal allowed number of participants would remain. However, converting an observer into a participant may sometimes fail for a more subtle reason. Suppose, for example, that the current configuration is (A, B, C, D), where A is the leader, B and C are followers and D is an observer. In addition, suppose that B has crashed. If a reconfiguration is submitted in which D becomes a follower, it will fail with error (3), since a majority of the voters in the new configuration (any 3 voters) must be connected and up-to-date with the leader. An observer cannot acknowledge the history prefix sent during reconfiguration, and therefore it does not count towards these 3 required servers, so the reconfiguration is aborted. If this happens, a client can achieve the same result with two reconfig commands: first invoke a reconfig to remove D from the configuration, and then invoke a second command to add it back as a participant (follower). During the intermediate state D is a non-voting follower and can ACK the state transfer performed during the second reconfig command.
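
The arithmetic behind this scenario can be checked with a short sketch. It assumes the Majority Quorum System and models only who is able to acknowledge; ObserverPromotion and its parameters are hypothetical names used for illustration.

```java
import java.util.Set;

final class ObserverPromotion {
    /**
     * True if a majority of the new configuration's voters are able to
     * acknowledge, i.e. are connected and are not currently observers.
     */
    static boolean newConfigHasQuorum(Set<String> newVoters, Set<String> connected,
                                      Set<String> currentObservers) {
        long ackers = newVoters.stream()
                .filter(connected::contains)
                .filter(s -> !currentObservers.contains(s))
                .count();
        return ackers > newVoters.size() / 2;
    }
}
```

With new voters {A, B, C, D}, connected servers {A, C, D} and D still an observer, only A and C can acknowledge, which is fewer than the 3 required. After D has been removed from the configuration it no longer counts as an observer, so the second command sees 3 servers that can acknowledge and succeeds.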

Rebalancing Client Connections

When a ZooKeeper cluster is started, if each client is given the same connection string (list of servers), the client will randomly choose a server in the list to connect to, which makes the expected number of client connections the same for each of the servers. We implemented a method that preserves this property when the set of servers changes through reconfiguration. See Sections 4 and 5.1 in the paper.
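
To see how the property can be preserved, consider only the simplest case, where servers are added and none are removed. If C clients are spread evenly over M servers, each server holds C/M of them; after growing to M' servers the target is C/M' per server, so each old server should keep a fraction M/M' of its clients. The sketch below computes the resulting per-client migration probability. It is an illustration of this reasoning under the add-only assumption, not the implementation used by the ZooKeeper client, which also handles removals and mixed changes; the class and method names are hypothetical.

```java
final class Rebalance {
    /**
     * Probability with which a client connected to one of the oldCount servers
     * should move to a newly added server when the ensemble grows to newCount
     * servers. Returns 0 when the ensemble did not grow.
     */
    static double migrationProbability(int oldCount, int newCount) {
        if (oldCount <= 0 || newCount <= oldCount) {
            return 0.0;
        }
        // Each old server keeps a fraction oldCount / newCount of its clients
        return 1.0 - (double) oldCount / newCount;
    }
}
```

When growing from 3 to 5 servers, each client moves with probability 0.4, so on average 40% of the clients migrate, which matches the share of the ensemble made up by the two new servers.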

In order for the method to work, all clients must subscribe to configuration changes (by setting a watch on /zookeeper/config, either directly or through the getConfig API command). When the watch is triggered, the client should read the new configuration by invoking sync and getConfig and, if the configuration is indeed new, invoke the updateServerList API command. To avoid mass client migration at the same time, it is better to have each client sleep for a short random period of time before invoking updateServerList.

A few examples can be found in: StaticHostProviderTest.java and TestReconfig.cc

Example (this is not a recipe, but a simplified example just to explain the general idea):

public void process(WatchedEvent event) {
    synchronized (this) {
        if (event.getType() == EventType.None) {
            connected = (event.getState() == KeeperState.SyncConnected);
            notifyAll();
        } else if (event.getPath() != null && event.getPath().equals(ZooDefs.CONFIG_NODE)) {
            // in prod code never block the event thread!
            zk.sync(ZooDefs.CONFIG_NODE, this, null);
            zk.getConfig(this, this, null);
        }
    }
}

public void processResult(int rc, String path, Object ctx, byte[] data, Stat stat) {
    if (path != null && path.equals(ZooDefs.CONFIG_NODE)) {
        String config[] = ConfigUtils.getClientConfigStr(new String(data)).split(" ");   // similar to config -c
        long version = Long.parseLong(config[0], 16);
        if (this.configVersion == null) {
            this.configVersion = version;
        } else if (version > this.configVersion) {
            hostList = config[1];
            try {
                // the following command is not blocking but may cause the client to close the socket and
                // migrate to a different server. In practice it's better to wait a short period of time, chosen
                // randomly, so that different clients migrate at different times
                zk.updateServerList(hostList);
            } catch (IOException e) {
                System.err.println("Error updating server list");
                e.printStackTrace();
            }
            this.configVersion = version;
        }
    }
}
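
The parsing step and the random wait mentioned in the comments can be isolated into a small helper that is easy to test without a running ensemble. The sketch below assumes the client configuration string has the form "<hex version> <host:port,host:port,...>", which is how the example above splits it. ClientConfig and its methods are hypothetical names, not part of the ZooKeeper API.

```java
import java.util.concurrent.ThreadLocalRandom;

final class ClientConfig {
    final long version;
    final String hostList;

    private ClientConfig(long version, String hostList) {
        this.version = version;
        this.hostList = hostList;
    }

    /** Parses "<hex version> <host:port,host:port,...>". */
    static ClientConfig parse(String clientConfigStr) {
        String[] parts = clientConfigStr.trim().split(" ");
        if (parts.length != 2) {
            throw new IllegalArgumentException("unexpected format: " + clientConfigStr);
        }
        return new ClientConfig(Long.parseLong(parts[0], 16), parts[1]);
    }

    /**
     * Random delay in [0, maxMillis) used to spread client migrations over time.
     * maxMillis must be positive.
     */
    static long randomDelayMillis(long maxMillis) {
        return ThreadLocalRandom.current().nextLong(maxMillis);
    }
}
```

A client could wait for randomDelayMillis(...) milliseconds on a separate thread (not the event thread) and then pass hostList to updateServerList.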

Original article: https://zookeeper.apache.org/doc/current/zookeeperReconfig.html#sc_reconfig_file
