Today I will talk about the most convenient way to set up a ZooKeeper server. Many people may not know much about this topic, so I have summarized the following content in the hope that you will get something out of this article.
What is ZooKeeper?
ZooKeeper is a top-level Apache project that provides efficient and highly available distributed coordination services for distributed applications, such as data publishing/subscribing, load balancing, naming services, distributed coordination/notification and distributed locking. Because it is easy to use, performs well and is stable, ZooKeeper is widely used in large distributed systems such as Hadoop, HBase, Kafka and Dubbo.
There are three operation modes of ZooKeeper: stand-alone mode, pseudo-cluster mode and cluster mode.
Stand-alone mode: this mode is generally suitable for development and test environments. On the one hand, we usually do not have that many machine resources; on the other hand, ordinary development and debugging does not require strong stability.
Cluster mode: a ZooKeeper cluster is usually composed of a group of machines; in general, three or more machines can form a usable ZooKeeper cluster. Each machine in the cluster maintains the current server state in memory, and all the machines keep communicating with one another.
Pseudo-cluster mode: this is a special cluster mode in which all the servers of the cluster are deployed on a single machine. When you have a powerful machine on hand, deploying it in stand-alone mode would waste resources. In this case ZooKeeper allows you to start several ZooKeeper service instances on one machine on different ports, thereby providing cluster-like service to the outside.
Related knowledge of ZooKeeper
Roles in ZooKeeper
Leader (leader): responsible for initiating and deciding votes and for updating the system state.
Follower (follower): receives client requests, returns results to the client, and votes during leader election.
Observer (observer): accepts client connections and forwards write requests to the leader, but does not take part in voting; observers exist only to scale the system and improve read throughput.
The data model of ZooKeeper
Hierarchical directory structure, named according to regular file-system conventions, similar to Linux
Each node in ZooKeeper is called a Znode and has a unique path identifier
A Znode can contain data and child nodes, but a node of type EPHEMERAL cannot have child nodes
The data in a Znode can have multiple versions; for example, if several versions of data are stored under a certain path, you need to supply the version when querying the data under that path
Client applications can set watchers on nodes
Nodes do not support partial reads and writes; data is read and written in full each time
Node characteristics of ZooKeeper
ZooKeeper nodes have a lifecycle that depends on the node type. In ZooKeeper, nodes can be divided into persistent nodes (PERSISTENT) and temporary nodes (EPHEMERAL) according to their lifetime, and into sequential nodes (SEQUENTIAL) and unordered nodes (unordered by default) according to whether they are ordered.
Once a persistent node is created, it is kept in ZooKeeper until it is actively removed (it does not disappear because the session of the client that created it ends). A temporary (ephemeral) node, by contrast, is bound to the client session and is deleted automatically when that session ends.
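To make these node types concrete, here is a minimal sketch using ZooKeeper's native Java client. It assumes a server reachable at 127.0.0.1:2181, and the paths /app-config, /app-config/worker-1 and /app-config/item- are made up for illustration:

import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;
import java.util.concurrent.CountDownLatch;

public class ZnodeTypesDemo {
    public static void main(String[] args) throws Exception {
        // Wait until the session is actually connected before issuing requests.
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // A persistent node survives the session that created it.
        zk.create("/app-config", "v1".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // An ephemeral node is removed automatically when this session ends;
        // it cannot have children of its own.
        zk.create("/app-config/worker-1", "alive".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // A sequential node gets a monotonically increasing suffix appended to its name.
        String seq = zk.create("/app-config/item-", "task".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        System.out.println("created sequential node: " + seq);

        // Reads return a Stat carrying the data version; conditional writes pass it back.
        Stat stat = new Stat();
        byte[] data = zk.getData("/app-config", false, stat);
        zk.setData("/app-config", "v2".getBytes(), stat.getVersion());
        System.out.println("old data=" + new String(data) + ", old version=" + stat.getVersion());

        zk.close();
    }
}

Running the sketch a second time fails with NodeExistsException on the persistent node, since persistent nodes outlive the session that created them, while the ephemeral node has to be recreated each run.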
Application scenarios of ZooKeeper
ZooKeeper is a highly available framework for distributed data management and system coordination. Based on its ZAB atomic broadcast protocol (an algorithm in the Paxos family), the framework guarantees strong data consistency in a distributed environment, and it is on this characteristic that ZooKeeper builds its solutions to many distributed problems.
It is worth noting that ZooKeeper was not originally designed for these application scenarios; they are typical usages that many developers later worked out based on the characteristics of the framework and the series of API interfaces (or primitives) it provides.
Data publish and subscribe (configuration center)
A publish-and-subscribe model, the so-called configuration center: as the name implies, publishers publish data to ZooKeeper nodes for subscribers to obtain dynamically, achieving centralized management and dynamic updating of configuration information. Global configuration information, the service address list of a service framework and so on are very suitable for this.
Configuration information used in an application is put on ZK for centralized management. The usual pattern is this: when the application starts, it actively fetches the configuration and at the same time registers a Watcher on the node, so that every time the configuration is updated the subscribing client is notified in real time and always obtains the latest configuration.
In a distributed search service, the meta-information of the index and the node state of the machines in the server cluster are stored in designated ZooKeeper nodes for every client to subscribe to.
In a distributed log collection system, the core job is to collect logs spread over different machines. The collector usually allocates collection task units by application, so a node P named after the application is created on ZooKeeper, and all machine IPs of that application are registered under node P as child nodes; when the machines change, the collector is notified in real time and adjusts the task assignment.
Some information in a system needs to be obtained dynamically, and there is also a need to modify it manually. This is usually done by exposing an interface, such as a JMX interface, to fetch runtime information. With ZooKeeper you do not have to implement your own solution; just store the information on a designated ZooKeeper node.
Note: all of the application scenarios above share a default premise: the amount of data is small, but the data may be updated quickly.
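As an illustration of the configuration-center pattern, the following sketch (using the native Java client) fetches a configuration value and re-registers a watch each time it changes. The path /config/db.url is hypothetical and is assumed to already exist on a server at 127.0.0.1:2181:

import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

public class ConfigCenterDemo {
    private static final String CONFIG_PATH = "/config/db.url";  // hypothetical config node
    private static ZooKeeper zk;

    public static void main(String[] args) throws Exception {
        zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });
        System.out.println("initial config: " + readAndWatch());
        Thread.sleep(Long.MAX_VALUE);  // keep the process alive to receive notifications
    }

    // Read the configuration and leave a one-shot watch on the node.
    private static String readAndWatch() throws Exception {
        byte[] data = zk.getData(CONFIG_PATH, event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                try {
                    // The watch has fired and is gone; read again to get the new
                    // value and to register the next watch.
                    System.out.println("config updated: " + readAndWatch());
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, new Stat());
        return new String(data);
    }
}

Because ZooKeeper watches are one-shot, the callback must read the node again both to obtain the new value and to set the next watch, which is exactly what readAndWatch does.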
Load balancing
Load balancing here refers to soft load balancing. In a distributed environment, to ensure high availability, the same application or the provider of the same service usually deploys multiple copies that offer equivalent service. Consumers then have to choose one of these equivalent servers to execute the relevant business logic; a typical case is producer and consumer load balancing in message middleware.
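A minimal sketch of the consumer side of such soft load balancing, using the native Java client: the provider path /services/order-service/providers is hypothetical, and random selection stands in for whatever strategy (round robin, weighting) a real framework would use:

import org.apache.zookeeper.ZooKeeper;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

public class SoftLoadBalanceDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });

        // Providers of the same service register themselves as children of this
        // (hypothetical) path; each child name carries one provider's address.
        String providersPath = "/services/order-service/providers";

        // watch=true means the consumer is notified when the provider list changes,
        // so it can refresh its local copy.
        List<String> providers = zk.getChildren(providersPath, true);
        if (providers.isEmpty()) {
            throw new IllegalStateException("no provider available");
        }

        // The simplest soft load-balancing strategy: pick one provider at random.
        String chosen = providers.get(ThreadLocalRandom.current().nextInt(providers.size()));
        System.out.println("calling provider " + chosen);

        zk.close();
    }
}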
Naming service
A naming service is also a common scenario in distributed systems. With a naming service, client applications can obtain information such as the address and provider of a resource or service from a given name. The named entities can be machines in a cluster, addresses of provided services, remote objects and so on; all of these can collectively be called names (Name). A common example is the service address list in some distributed service frameworks. By calling the node-creation API provided by ZK, it is easy to create a globally unique path, and that path can be used as a name.
Dubbo, the open-source distributed service framework of Alibaba Group, uses ZooKeeper as its naming service to maintain the global service address list. In the Dubbo implementation, the service provider writes its URL address at startup to the designated node /dubbo/${serviceName}/providers on ZooKeeper, which completes the publication of the service. When a service consumer starts, it subscribes to the provider URL addresses under /dubbo/${serviceName}/providers and writes its own URL address to the /dubbo/${serviceName}/consumers directory. Note that all addresses registered with ZooKeeper are ephemeral nodes, which guarantees that service providers and consumers automatically become aware of changes in resources.
In addition, Dubbo provides monitoring at service granularity by subscribing to the information of all providers and consumers under the /dubbo/${serviceName} directory.
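The following is a simplified sketch of the provider side of this pattern, not Dubbo's actual code: a provider registers its address as an ephemeral node under /dubbo/${serviceName}/providers (the service name and address below are made up, and a local server at 127.0.0.1:2181 is assumed):

import org.apache.zookeeper.*;
import java.net.URLEncoder;

public class ProviderRegisterDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });

        String serviceName = "com.example.OrderService";                  // made-up service
        String providerUrl = "dubbo://192.168.1.10:20880/" + serviceName; // made-up address
        String providersDir = "/dubbo/" + serviceName + "/providers";

        // The directory nodes are persistent: created once, kept forever.
        createIfAbsent(zk, "/dubbo");
        createIfAbsent(zk, "/dubbo/" + serviceName);
        createIfAbsent(zk, providersDir);

        // The address itself is an EPHEMERAL node: if this provider crashes, its
        // session ends and the address disappears from the list automatically.
        String node = providersDir + "/" + URLEncoder.encode(providerUrl, "UTF-8");
        zk.create(node, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        System.out.println("registered " + node);

        Thread.sleep(Long.MAX_VALUE);  // keep the session (and the registration) alive
    }

    private static void createIfAbsent(ZooKeeper zk, String path) throws Exception {
        if (zk.exists(path, false) == null) {
            zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
    }
}

A consumer only needs to call getChildren on the providers directory with a watch to receive the current address list and be notified when providers come and go.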
Distributed notification / coordination
ZooKeeper has a unique watcher registration and asynchronous notification mechanism, which makes it a good fit for notification and coordination between different systems in a distributed environment and for real-time handling of data changes. The usual approach is that different systems all register on the same Znode on ZooKeeper and listen for changes to that Znode (including the content of the Znode itself and its child nodes); when one system updates the Znode, the other systems receive the notification and handle it accordingly.
One mode is heartbeat detection: the detecting system and the detected system are not directly related, but are associated through a node on ZK, which greatly reduces coupling between systems. Another mode is system scheduling: a system consists of a console and a push system, and the console's duty is to control the push system to perform the corresponding push work. Some operations the administrators perform in the console actually modify the state of certain nodes on ZooKeeper; ZooKeeper notifies the clients that registered Watchers on those nodes, namely the push system, which then carries out the corresponding push tasks.
Yet another mode is work reporting, somewhat like a task distribution system: after a subtask starts, it registers a temporary node on ZooKeeper and reports its progress regularly (by writing the progress back to this temporary node), so the task manager can know the task progress in real time.
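A small sketch of the work-reporting mode with the native Java client: the parent node /taskList is assumed to have been created in advance by the task manager, and the progress values are made up:

import org.apache.zookeeper.*;

public class ProgressReportDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });

        // Register this subtask as an ephemeral sequential child of /taskList
        // (the parent node is assumed to exist already).
        String myNode = zk.create("/taskList/subtask-", "0%".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        // Periodically write the current progress back to the ephemeral node.
        // If this worker dies, the node disappears and the manager notices at once.
        for (int done = 10; done <= 100; done += 10) {
            zk.setData(myNode, (done + "%").getBytes(), -1);  // -1 skips the version check
            Thread.sleep(1000);
        }

        zk.close();  // the session ends and the ephemeral node is removed
    }
}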
Distributed lock
Distributed locks are possible mainly because ZooKeeper guarantees strong data consistency for us. Lock services can be divided into two categories: one keeps the lock exclusive, the other controls ordering.
Keeping the lock exclusive means that of all the clients that try to acquire the lock, only one can ultimately succeed. The usual practice is to treat a Znode on ZooKeeper as the lock, implemented by creating the Znode: all clients try to create the /distribute_lock node, and the client that creates it successfully owns the lock. Controlling ordering means that all clients trying to acquire the lock will eventually be scheduled to execute, but in a global order. The practice is similar to the above, except that /distribute_lock already exists and each client creates a temporary ordered node under it (this is specified by the node creation attribute CreateMode.EPHEMERAL_SEQUENTIAL). The parent node /distribute_lock maintains a sequence counter that guarantees the order in which the child nodes are created, thereby forming the global ordering of the clients.
Since children of the same node cannot have the same name, successfully creating your Znode under that node means the lock has been acquired. A listener registered on the Znode notifies the other clients to try to acquire the lock as soon as the Znode is deleted.
Create temporary sequential nodes: each request creates a node under a given parent node. Because the nodes are sequential, the one with the lowest sequence number acquires the lock, and when the lock is released the holder of the next sequence number is notified to acquire it.
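Below is a simplified sketch of this sequential-node lock recipe, assuming the parent node /distribute_lock already exists and that a connected ZooKeeper handle is passed in. Production systems usually rely on a battle-tested recipe such as Apache Curator's InterProcessMutex instead:

import org.apache.zookeeper.*;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class SimpleDistributedLock {
    private final ZooKeeper zk;
    private final String lockRoot = "/distribute_lock";  // parent node, created in advance
    private String myNode;

    public SimpleDistributedLock(ZooKeeper zk) {
        this.zk = zk;
    }

    public void lock() throws Exception {
        // 1. Create an ephemeral sequential child; ZooKeeper appends a unique,
        //    monotonically increasing suffix, which fixes the global order.
        myNode = zk.create(lockRoot + "/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        while (true) {
            List<String> children = zk.getChildren(lockRoot, false);
            Collections.sort(children);
            String myName = myNode.substring(lockRoot.length() + 1);

            // 2. The lowest sequence number holds the lock.
            if (children.indexOf(myName) == 0) {
                return;
            }

            // 3. Otherwise wait for the node just in front of us to be deleted,
            //    then re-check (this avoids waking every waiting client at once).
            String previous = lockRoot + "/" + children.get(children.indexOf(myName) - 1);
            CountDownLatch gone = new CountDownLatch(1);
            if (zk.exists(previous, event -> {
                if (event.getType() == Watcher.Event.EventType.NodeDeleted) {
                    gone.countDown();
                }
            }) != null) {
                gone.await();
            }
        }
    }

    public void unlock() throws Exception {
        // Deleting our node releases the lock; if the client crashes instead, the
        // ephemeral node is removed automatically when its session expires.
        zk.delete(myNode, -1);
    }
}

Watching only the immediate predecessor, rather than the parent node, avoids the herd effect of notifying every waiting client each time a lock is released.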
Distributed queue
In terms of queues, there are two kinds: one is the conventional first-in-first-out (FIFO) queue, and the other waits until all members of the queue have gathered before executing in order. The first kind follows the same basic principle as the ordering control in the distributed lock service described above, so it is not repeated here.
The second kind of queue is actually an enhancement on top of the FIFO queue. Usually a /queue/num node is set up in advance under the /queue Znode and assigned a value n (or n is assigned directly to /queue) to indicate the queue size. After that, each time a member joins the queue, you check whether the queue size has been reached and decide whether execution can start. A typical scenario: in a distributed environment, a large task, Task A, can only proceed when many subtasks are completed (or ready). Whenever one of the subtasks is completed (ready), it creates its own temporary sequential node (CreateMode.EPHEMERAL_SEQUENTIAL) under /taskList; when the number of child nodes under /taskList reaches the specified number, the next step can proceed in order.
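A sketch of this "gather then execute" queue, run by each subtask. The nodes /queue/num and /taskList are hypothetical and assumed to have been created in advance by the coordinator; a real implementation would watch the children instead of polling:

import org.apache.zookeeper.*;

public class GatherBarrierDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });

        // /queue/num stores the required number of subtasks (e.g. "3") and
        // /taskList collects one child per finished subtask; both are assumed
        // to have been created in advance by the coordinator.
        int required = Integer.parseInt(new String(zk.getData("/queue/num", false, null)));

        // This subtask announces that it is ready.
        zk.create("/taskList/ready-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        // Check the child count; when it reaches the required size, Task A can
        // move on to the next step.
        while (zk.getChildren("/taskList", false).size() < required) {
            Thread.sleep(500);
        }
        System.out.println("all " + required + " subtasks are ready, Task A can start");

        zk.close();
    }
}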
Use docker-compose to build a cluster
We have introduced many ZooKeeper application scenarios above; now let's learn how to build a ZooKeeper cluster and then try out those scenarios.
The directory structure of the file is as follows:
├── docker-compose.yml
Write the docker-compose.yml file
The docker-compose.yml file is as follows:
version: '3.4'

services:
  zoo1:
    image: zookeeper
    restart: always
    hostname: zoo1
    ports:
      - 2181:2181
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181

  zoo2:
    image: zookeeper
    restart: always
    hostname: zoo2
    ports:
      - 2182:2181
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zoo3:2888:3888;2181

  zoo3:
    image: zookeeper
    restart: always
    hostname: zoo3
    ports:
      - 2183:2181
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181
In this configuration file, Docker runs three ZooKeeper images and binds the local ports 2181, 2182 and 2183 to port 2181 of the corresponding containers through the ports field.
ZOO_MY_ID and ZOO_SERVERS are the two environment variables needed to build a ZooKeeper cluster. ZOO_MY_ID identifies the id of the service; it is an integer between 1 and 255 and must be unique within the cluster. ZOO_SERVERS is the list of hosts in the cluster.
Execute docker-compose up under the directory where docker-compose.yml is located, and you can see the startup log.
Connect ZooKeeper
After starting the cluster, we can connect to ZooKeeper to perform node-related operations on it.
First, we need to download ZooKeeper from the official Apache ZooKeeper download page.
Decompress it.
Enter its conf directory and rename zoo_sample.cfg to zoo.cfg.
Configuration file description:
# The number of milliseconds of each tick
# tickTime: CS communication heartbeat
# The interval at which heartbeats are maintained between ZooKeeper servers or between clients and servers,
# that is, one heartbeat is sent every tickTime. tickTime is in milliseconds.
tickTime=2000

# The number of ticks that the initial synchronization phase can take
# initLimit: LF initial communication time limit
# The maximum number of heartbeats (in tickTime units) that can be tolerated during the initial connection
# between a follower server (F) and the leader server (L) in the cluster.
initLimit=5

# The number of ticks that can pass between sending a request and getting an acknowledgement
# syncLimit: LF synchronous communication time limit
# The maximum number of heartbeats (in tickTime units) that can be tolerated between a request and a response
# between follower and leader servers in the cluster.
syncLimit=2

# The directory where the snapshot is stored.
# Do not use /tmp for storage, /tmp here is just for example's sake.
# dataDir: data file directory
# The directory where ZooKeeper saves its data. By default, ZooKeeper also saves the transaction log files in this directory.
dataDir=/data/soft/zookeeper-3.4.12/data

# dataLogDir: directory of log files
# The directory where ZooKeeper saves its log files.
dataLogDir=/data/soft/zookeeper-3.4.12/logs

# The port at which the clients will connect
# clientPort: client connection port
# The port on which clients connect to the ZooKeeper server. ZooKeeper listens on this port and accepts client access requests.
clientPort=2181

# The maximum number of client connections.
# Increase this if you need to handle more clients
# maxClientCnxns=60

# Be sure to read the maintenance section of the administrator guide before turning on autopurge.
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
# autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable autopurge feature
# autopurge.purgeInterval=1

# Server name and address: cluster information (server number, server address, LF communication port, election port)
# This configuration item has a special format. The rules are as follows:
# server.N=YYY:A:B
# where N is the server number, YYY is the IP address of the server, A is the LF communication port
# (the port this server uses to exchange information with the leader in the cluster), and B is the election port.
You don't have to modify zoo.cfg; the default configuration is fine. Then, in the bin directory of the unzipped package, execute ./zkCli.sh -server 127.0.0.1:2181 to connect.
Welcome to ZooKeeper!
2020-06-01 15:03:52,512 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1025] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
JLine support is enabled
2020-06-01 15:03:52,576 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@879] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2020-06-01 15:03:52,599 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1299] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x100001140080000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0]
Next we can use the command to view the nodes.
Use the ls command to see what is contained in the current ZooKeeper.
Command: ls /
[zk: 127.0.0.1:2181(CONNECTED) 10] ls /
[zookeeper]
Create a new Znode "zk" with an associated string.
Command: create /zk myData
[zk: 127.0.0.1:2181(CONNECTED) 11] create /zk myData
Created /zk
[zk: 127.0.0.1:2181(CONNECTED) 12] ls /
[zk, zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 13]
Get the data of the Znode zk.
Command: get /zk
[zk: 127.0.0.1:2181(CONNECTED) 13] get /zk
myData
cZxid = 0x400000008
ctime = Mon Jun 01 15:07:50 CST 2020
mZxid = 0x400000008
mtime = Mon Jun 01 15:07:50 CST 2020
pZxid = 0x400000008
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 6
numChildren = 0
Delete the Znode zk.
Command: delete /zk
[zk: 127.0.0.1:2181(CONNECTED) 14] delete /zk
[zk: 127.0.0.1:2181(CONNECTED) 15] ls /
[zookeeper]
After reading the above, do you have a better understanding of the most convenient way to set up a ZooKeeper server? If you want to learn more, stay tuned for related content. Thank you for your support.