Shulou (Shulou.com), SLTechnology News & Howtos: Servers. Updated 2025-04-06.
This article introduces how to use ZooKeeper, a coordination component in the Hadoop ecosystem. Many people run into the situations described below in real-world operations, so let the editor walk you through how to handle them. I hope you read carefully and get something out of it!
1. What is ZooKeeper?
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
2. Why use ZooKeeper?
Most distributed applications need a master, coordinator, or controller to manage physically distributed child processes (resource allocation, task assignment, and so on). Today most applications develop their own private coordinator programs for lack of a general mechanism. Writing coordination programs over and over is wasteful, and it is hard to make one that is general and scalable. ZooKeeper provides that general mechanism.
ZooKeeper is a highly available distributed data-management and system-coordination framework. It guarantees strong consistency of data in a distributed environment through the Zab protocol (a Paxos-like algorithm), and it is on this property that ZooKeeper's solutions to many distributed problems rest. There is plenty of material on ZK application scenarios online; below are the scenarios in which I have used ZooKeeper myself, as I am less familiar with the others.
a. Hadoop 2.0 uses ZooKeeper's event handling to ensure that the whole cluster has only one active NameNode, to store configuration information, and so on.
b. HBase uses ZooKeeper's event handling to ensure that the whole cluster has only one HMaster, to detect HRegionServers going online or down, to store access control lists, and so on.
c. Dubbo uses ZooKeeper as a registry for registering Dubbo services, for publish and subscribe purposes.
ZK was not designed for these specific scenarios; they are typical usages that many developers later worked out from the characteristics of the framework and the set of API interfaces (or primitives) it provides.
3. Installation and configuration of ZooKeeper (standalone mode)
a. Download the resource: http://labs.renren.com/apache-mirror/zookeeper/zookeeper-3.4.3/zookeeper-3.4.3.tar.gz
b. Unpack the archive: tar xzvf zookeeper-3.4.3.tar.gz
c. Modify the configuration file. The commands are as follows:
cd zookeeper-3.4.3
cp conf/zoo_sample.cfg conf/zoo.cfg
vi conf/zoo.cfg
The parameters are configured as follows:
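For reference, the shipped conf/zoo_sample.cfg contains essentially the following parameters (the dataDir default of /tmp/zookeeper should be changed to a persistent, writable path in production):

```
# Basic heartbeat unit in milliseconds
tickTime=2000
# Ticks a follower may take to connect and sync with the leader initially
initLimit=10
# Ticks allowed between a request and an acknowledgement
syncLimit=5
# Where snapshots (and, by default, transaction logs) are stored
dataDir=/tmp/zookeeper
# Port on which clients connect
clientPort=2181
```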
d. Start: ./bin/zkServer.sh start
e. Stop: ./bin/zkServer.sh stop
f. Test: connect with the command-line client, e.g. ./bin/zkCli.sh -server 127.0.0.1:2181
4. Installation and configuration of ZooKeeper (cluster mode)
a. Upload Zookeeper package
b. Decompress the Zookeeper package
c. On one machine (yun10-5), modify the configuration file zoo.cfg.
d. In the dataDir directory (dataDir=/home/lisai/app/zookeeper-3.4.5/data), create a myid file containing the N from server.N (5 for server.5). Command: echo "5" >
e. Copy the configured ZK installation to the other nodes. Commands:
scp -r /home/lisai/app/zookeeper-3.4.5/ yun10-6:/home/lisai/app/
scp -r /home/lisai/app/zookeeper-3.4.5/ yun10-7:/home/lisai/app/
f. SSH into the yun10-6 and yun10-7 hosts, then modify the contents of myid under data, changing them to 6 and 7 respectively.
g. Start ZooKeeper on each node in turn, beginning with yun10-5: ./bin/zkServer.sh start
Cluster mode requires a myid file. The file lives in the dataDir directory and contains a single value: the N in server.N.
When ZooKeeper starts, it reads this file and compares the value in it with the configuration in zoo.cfg to determine which server it is.
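Putting the cluster steps together, a zoo.cfg for the three hosts in this example might look like the following sketch (the quorum port 2888 and election port 3888 are the conventional choices; the server numbers match the myid values 5, 6, and 7 used above):

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/lisai/app/zookeeper-3.4.5/data
clientPort=2181
# server.N=host:quorum-port:leader-election-port
server.5=yun10-5:2888:3888
server.6=yun10-6:2888:3888
server.7=yun10-7:2888:3888
```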
h. Test results.
5. Data model of ZooKeeper
Zookeeper maintains a hierarchical data structure that is very similar to a standard file system, as shown in the figure:
We can sum up:
1. Each subdirectory entry, such as NameService, is called a znode, and a znode is uniquely identified by its path; for example, the znode Server1 is identified as /NameService/Server1.
2. A znode can have child znodes, and each znode can store data. Note that znodes of EPHEMERAL type cannot have children.
3. Znodes are versioned: each znode can store multiple versions of data, that is, a single access path can hold multiple copies of data.
4. A znode can be a temporary (ephemeral) node: once the client that created it loses contact with the server, the znode is deleted automatically. ZooKeeper clients and servers communicate over long-lived connections maintained by heartbeats; this connection state is called a session. If a znode is ephemeral and its session expires, the znode is deleted.
5. Znode names can be numbered automatically: if App1 already exists and another node is created, it is automatically named App2.
6. Znodes can be watched: changes to the data stored in a znode, changes to its children, and so on, notify the watching client. This is the core feature of ZooKeeper, and many of its capabilities are built on it.
6. Role analysis of ZooKeeper
Leader: responsible for initiating and deciding votes and for updating system state.
Learners: include followers and observers. A follower accepts client requests, returns results to the client, and votes during leader election.
An observer accepts client connections and forwards write requests to the leader, but it does not vote; it only synchronizes the leader's state. The purpose of observers is to scale the system and improve read speed.
Client: the request initiator.
Command line actions:
7. Java API of ZooKeeper
ZooKeeper's Java API is not hard to use; below are simple, practical basic operations.
// Create a connection to the server
ZooKeeper zk = new ZooKeeper("localhost:" + CLIENT_PORT, ClientBase.CONNECTION_TIMEOUT, new Watcher() {
    // Watch all triggered events
    public void process(WatchedEvent event) {
        System.out.println("has triggered " + event.getType() + " event!");
    }
});
// Create a directory node
zk.create("/testRootPath", "testRootData".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
// Create a child node
zk.create("/testRootPath/testChildPathOne", "testChildDataOne".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
// Read the parent node's data
System.out.println(new String(zk.getData("/testRootPath", false, null)));
// List the child nodes
System.out.println(zk.getChildren("/testRootPath", true));
// Modify the child node's data
zk.setData("/testRootPath/testChildPathOne", "modifyChildDataOne".getBytes(), -1);
System.out.println("directory node status: [" + zk.exists("/testRootPath", true) + "]");
// Create another child node
zk.create("/testRootPath/testChildPathTwo", "testChildDataTwo".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
System.out.println(new String(zk.getData("/testRootPath/testChildPathTwo", true, null)));
// Delete the child nodes
zk.delete("/testRootPath/testChildPathTwo", -1);
zk.delete("/testRootPath/testChildPathOne", -1);
// Delete the parent node
zk.delete("/testRootPath", -1);
// Close the connection
zk.close();
Output result:
The None event has been triggered!
testRootData
[testChildPathOne]
directory node status: [...]
The NodeChildrenChanged event has been triggered!
testChildDataTwo
The NodeDeleted event has been triggered!
The NodeDeleted event has been triggered!
8. How ZooKeeper works
The core of ZooKeeper is atomic broadcast, which keeps the servers in sync. The protocol that implements this mechanism is called the Zab protocol. Zab has two modes: recovery mode and broadcast mode. When the service starts, or after the leader crashes, Zab enters recovery mode; recovery mode ends when a leader has been elected and a majority of servers have completed state synchronization with the leader. State synchronization ensures that the leader and the servers have the same system state.
Once the leader has synchronized its state with a majority of followers, it can start broadcasting messages, that is, enter broadcast mode. From then on, when a server joins the ZooKeeper service, it starts in recovery mode, discovers the leader, and synchronizes state with it; when synchronization finishes, it too participates in message broadcast. The ZooKeeper service stays in broadcast mode until the leader crashes or loses the support of a majority of followers.
Broadcast mode must guarantee that proposals are processed in order, so ZK uses an increasing transaction id (zxid) for this. Every proposal carries a zxid when it is made. In the implementation, a zxid is a 64-bit number: its high 32 bits are the epoch, which identifies whether the leader has changed (every newly elected leader gets a new epoch), and its low 32 bits are an incrementing counter.
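As a small illustration of the epoch/counter split described above, the 64-bit layout can be sketched with plain bit arithmetic (the class and method names here are invented for illustration; this is not ZooKeeper source code):

```java
// Illustrates the 64-bit zxid layout: high 32 bits = epoch, low 32 bits = counter.
public class ZxidDemo {
    // Pack an epoch and a counter into a single zxid.
    public static long compose(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xFFFFFFFFL);
    }
    // High 32 bits: change every time a new leader is elected.
    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }
    // Low 32 bits: increment for each proposal within an epoch.
    public static long counterOf(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }
    public static void main(String[] args) {
        long zxid = compose(3, 42);
        System.out.println(Long.toHexString(zxid)); // 30000002a
        System.out.println(epochOf(zxid) + " " + counterOf(zxid)); // 3 42
    }
}
```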
When the leader crashes or loses a majority of its followers, ZK enters recovery mode, in which a new leader must be elected so that all servers are restored to a correct state.
9. ZooKeeper's election process
Election process:
1. After each server starts, it asks the other servers whom to vote for.
2. When queried, each server replies, based on its own state, with the id of its recommended leader and the zxid of its last transaction (at system startup, every server recommends itself).
3. After receiving all responses, a server determines which server has the largest zxid and sets that server's information as its vote for the next round.
4. The server that gets the most votes in this process is the winner. If the winner has more than half of the votes, it becomes the leader; otherwise the process continues until a leader is elected.
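The voting rule in steps 2 through 4 can be sketched as a pure function over (server id, zxid) pairs. This is a simplification: a real ZooKeeper election exchanges more state, and breaking ties by the larger server id is the conventional refinement rather than something spelled out above. All names here are invented:

```java
import java.util.List;

public class ElectionDemo {
    // A server's vote: its id and the zxid of its last transaction.
    public static final class Vote {
        final long id, zxid;
        public Vote(long id, long zxid) { this.id = id; this.zxid = zxid; }
    }

    // Pick the candidate to vote for next round: largest zxid wins;
    // ties are broken by the larger server id (conventional refinement).
    public static long pickCandidate(List<Vote> votes) {
        Vote best = votes.get(0);
        for (Vote v : votes) {
            if (v.zxid > best.zxid || (v.zxid == best.zxid && v.id > best.id)) {
                best = v;
            }
        }
        return best.id;
    }

    public static void main(String[] args) {
        // Server 2 and 3 are equally up to date; the tie-break picks 3.
        long winner = pickCandidate(List.of(
                new Vote(1, 100), new Vote(2, 120), new Vote(3, 120)));
        System.out.println(winner); // 3
    }
}
```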
After the election:
1. The leader starts waiting for server connections.
2. Each follower connects to the leader and sends it its largest zxid.
3. The leader determines the synchronization point from each follower's zxid.
4. After synchronization completes, the leader notifies the follower that it is up to date.
5. After receiving the up-to-date message, a follower can again accept client requests and serve them.
And how does the observer fit in? The server states are as follows:
Observing: the observing state. An observer watches whether the leader has changed and synchronizes the leader's state; it does not vote.
Following: the following state. A follower receives the leader's proposals, votes on them, and synchronizes state with the leader.
Looking: the searching state. A server in this state does not know who the leader is and will initiate a leader election.
10. Application scenarios of ZooKeeper
a. Unified naming service (similar to JNDI)
Naming service is also a common scenario in distributed systems. By using a naming service, client applications can obtain the address, provider, and other information of a resource or service from its specified name. Named entities are usually machines in a cluster, service addresses, remote objects, and so on, all of which we can collectively call names (Name). A common example is the service address list in a distributed service framework. By calling the node-creation API that ZK provides, it is easy to create a globally unique path, which can then be used as a name.
Typical case: Dubbo, Alibaba Group's open-source distributed service framework, uses ZooKeeper as its naming service to maintain a global service address list. In the Dubbo implementation:
On startup, a service provider writes its URL address to the node /dubbo/${serviceName}/providers on ZK; this completes publication of the service.
On startup, a service consumer subscribes to the provider URLs under /dubbo/${serviceName}/providers and writes its own URL under /dubbo/${serviceName}/consumers.
Note that all addresses registered with ZK are ephemeral nodes, which guarantees that service providers and consumers automatically notice resource changes.
In addition, Dubbo monitors at service granularity by subscribing to the information of all providers and consumers under the /dubbo/${serviceName} directory.
b. Configuration center
Configuration management is very common in distributed application environments. For example, the same application system may run on multiple PC servers, with some configuration items identical across them. To change those shared items, you would have to modify every server running the application at the same time, which is troublesome and error-prone.
Instead, save the configuration information in a znode of ZooKeeper and have all the machines that depend on it watch that node. Once the configuration changes, each machine receives a notification from ZooKeeper, fetches the new configuration from ZooKeeper, and applies it.
Case study: ZooKeeper makes such centralized configuration management easy. For example, put all of APP1's configuration under the znode /APP1. As soon as each APP1 machine starts, it watches the node /APP1 (zk.exists("/APP1", true)) and implements the Watcher callback. When the data on the /APP1 znode changes, every machine is notified and the Watcher method runs; the application can then fetch the data again (zk.getData("/APP1", false, null)).
c. Cluster Management and Master Election
ZooKeeper makes cluster management easy to implement. For example, if multiple servers form a service cluster, a "manager" must know the service status of every machine in the current cluster: once a machine can no longer provide service, the other machines must know, so that the service strategy can be adjusted and work reallocated. Likewise, when the cluster's capacity is increased by adding one or more servers, the manager must know that too.
Case study: create nodes of EPHEMERAL type, for example server1 creates /APP1SERVERS/SERVER1 (an IP address can be used to guarantee uniqueness) and server2 creates /APP1SERVERS/SERVER2; then SERVER1 and SERVER2 both watch the parent node /APP1SERVERS, so any client watching that node is notified of data or child changes under it. Because an EPHEMERAL node has the important property of disappearing when the client connection is lost or the session expires, when a machine dies or is disconnected its node vanishes, and every client in the cluster watching /APP1SERVERS receives a notification and fetches the latest list.
Besides maintaining the service status of the machines in the current cluster, ZooKeeper can also select a "manager" for the cluster; this is its Leader Election function. Once the master dies, a new master can be chosen from the slaves immediately. The implementation steps are the same as before, except that the node type each machine creates under APP1SERVERS at startup is changed to EPHEMERAL_SEQUENTIAL, so every node is numbered automatically. As long as all cluster machines agree on the rule that the lowest-numbered node is the master, watching the SERVERS node yields the server list and the master is determined. When the master goes down, its znode disappears, the new server list is pushed to the clients, and each node again takes the lowest-numbered entry as master, achieving dynamic master election.
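The "lowest number is master" convention above can be sketched as a small helper over the child names that getChildren would return. The node-name format and the class name are assumptions for illustration; ZooKeeper appends a 10-digit counter to EPHEMERAL_SEQUENTIAL node names:

```java
import java.util.Comparator;
import java.util.List;

public class MasterElectionDemo {
    // Given EPHEMERAL_SEQUENTIAL children such as "server-0000000002",
    // the node with the lowest sequence suffix is logically the master.
    public static String currentMaster(List<String> children) {
        return children.stream()
                .min(Comparator.comparingInt(MasterElectionDemo::seq))
                .orElseThrow();
    }

    // The 10-digit counter ZooKeeper appends is the last 10 characters.
    private static int seq(String name) {
        return Integer.parseInt(name.substring(name.length() - 10));
    }

    public static void main(String[] args) {
        List<String> children =
                List.of("server-0000000007", "server-0000000003", "server-0000000012");
        System.out.println(currentMaster(children)); // server-0000000003
    }
}
```

When the current master's znode disappears, every client simply re-applies the same function to the new child list, so no extra coordination is needed.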
Case study: HBase also uses ZooKeeper for dynamic HMaster election. In the HBase implementation, the addresses of the ROOT table and the HMaster are stored on ZK, and each HRegionServer registers itself in ZooKeeper as an ephemeral node, so that the HMaster can sense the liveness of every HRegionServer at any time. If the HMaster fails, a new HMaster is re-elected to run, avoiding a single point of failure for the HMaster.
d. Shared lock
Shared locks are easy to implement within a single process, but hard across processes or across different servers. ZooKeeper, however, implements this easily: the server that wants the lock creates an EPHEMERAL_SEQUENTIAL directory node, then calls getChildren to check whether the smallest node in the current child list is the one it created. If so, it holds the lock. If not, it calls exists(String path, boolean watch) to watch for changes in the node list, until the node it created is the lowest-numbered one in the list, at which point it has acquired the lock. Releasing the lock is simply deleting the node it created.
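A minimal sketch of the lock decision described above, operating on a snapshot of child names (all names are invented for illustration). The second helper reflects a common refinement of the recipe: instead of watching the whole child list, each waiter watches only the node with the next-smaller number, which avoids waking every waiter on each change:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class LockDemo {
    // Sequence counter ZooKeeper appends to an EPHEMERAL_SEQUENTIAL node name.
    static int seq(String name) {
        return Integer.parseInt(name.substring(name.length() - 10));
    }

    // True if our node has the smallest sequence number, i.e. we hold the lock.
    public static boolean holdsLock(List<String> children, String mine) {
        return children.stream().mapToInt(LockDemo::seq).min().orElseThrow() == seq(mine);
    }

    // Otherwise: the node with the next-smaller sequence, whose deletion we wait for.
    public static Optional<String> nodeToWatch(List<String> children, String mine) {
        return children.stream()
                .filter(c -> seq(c) < seq(mine))
                .max(Comparator.comparingInt(LockDemo::seq));
    }

    public static void main(String[] args) {
        List<String> kids = List.of("lock-0000000001", "lock-0000000002", "lock-0000000004");
        System.out.println(holdsLock(kids, "lock-0000000001"));         // true
        System.out.println(holdsLock(kids, "lock-0000000004"));         // false
        System.out.println(nodeToWatch(kids, "lock-0000000004").get()); // lock-0000000002
    }
}
```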
e. Queue management
Two types of queues can be handled. The first becomes available only when all of its members have gathered; otherwise it keeps waiting for all members to arrive. This is a synchronous queue. The second does enqueue and dequeue operations in FIFO order, with slight enhancements such as implementing the producer-consumer model.
Case study: a large task, Task A, can proceed only when many subtasks have completed (or their preconditions are ready).
Create a parent directory /synchronizing, and have each member watch for the existence of /synchronizing/start. Each member then joins the queue by creating a temporary node /synchronizing/member_i, lists all children of /synchronizing, and checks whether the number of children has reached the number of members. If it is still smaller, the member waits for /synchronizing/start to appear; if it is already equal, the member creates /synchronizing/start.
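The membership check in this synchronous-queue recipe can be sketched as a pure predicate over a snapshot of /synchronizing's children (class and names are assumptions for illustration):

```java
import java.util.List;

public class BarrierDemo {
    // Decide, from a snapshot of /synchronizing's children, whether the last
    // member has arrived and /synchronizing/start should therefore be created.
    public static boolean shouldCreateStart(List<String> children, int expectedMembers) {
        long members = children.stream()
                .filter(c -> c.startsWith("member_")) // ignore "start" itself, if present
                .count();
        return members >= expectedMembers;
    }

    public static void main(String[] args) {
        System.out.println(shouldCreateStart(List.of("member_1", "member_2"), 3));             // false
        System.out.println(shouldCreateStart(List.of("member_1", "member_2", "member_3"), 3)); // true
    }
}
```

A member that gets false simply keeps watching for /synchronizing/start; the member that gets true creates it, releasing everyone at once.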
This is the end of "how to use ZooKeeper, the component of Hadoop". Thank you for reading.