
Basic concepts and functions of zookeeper

2025-04-05 Update From: SLTechnology News&Howtos


Tuesday, 2019-2-19


ZooKeeper is an important low-level framework in the Hadoop ecosystem. Its main job is to provide distributed coordination services to the frameworks built on top of it.

The Hadoop SPOF (single point of failure) problem and the HA solution

Why a cluster coordination service framework is needed

Introduction to ZooKeeper

ZooKeeper is a distributed application coordination service, based on which distributed applications can implement synchronization services, configuration maintenance, naming services and so on.

At present, zookeeper is widely used in the distributed coordination of various frameworks in the hadoop ecosystem, and we can also use zookeeper to simplify distributed application development.

Introduction to Zk

1. What does zookeeper do? Taken literally, a zookeeper is the keeper of a zoo.

2. It lets the elephant (Hadoop), the beehive (Hive) and the pig (Pig) get along with each other. All of these are components of the Hadoop ecosystem.

3. ZooKeeper is a distributed, open source coordination service for distributed applications.

4. Zookeeper is actually a piece of software, and every server with zookeeper installed is called a zookeeper server.

5. Zookeeper servers take one of two main roles: leader or follower. If the leader dies, an election is held and one of the followers takes over as the new leader. There is only one leader at a time; all the other servers are followers.

6. The data structure (a tree) is exactly the same on every zookeeper server. In other words, if I build a zookeeper cluster, every machine in the cluster holds the same data.

7. The data is tree-shaped, like the Linux directory structure. Each data directory in zk is a znode.

8. How many ZooKeeper servers should I run? Running a single zookeeper works, but in a production environment you should deploy 3, 5 or 7 nodes. The more nodes you deploy, the more failures the cluster can survive. An odd number is best. An even number is not impossible, but the cluster stops serving as soon as it loses its majority, that is, once half or more of the nodes are down. An even-sized cluster therefore tolerates no more failures than the odd-sized cluster that is one node smaller.
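The odd-number advice follows from simple majority arithmetic. The sketch below is illustrative only; the function names quorum_size and tolerated_failures are made up for this example and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


# Print the trade-off for small ensembles.
for n in range(1, 8):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

A 4-node cluster tolerates one failure, the same as a 3-node cluster, which is why the extra even node buys nothing in terms of availability.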

What role does zk play in an HDFS HA (high availability) cluster?

1. The QJM cluster (the JournalNodes, which make the edit log highly available) works together with the zk cluster: the JournalNodes share the edit log, while zk coordinates the automatic failover.

2. Which NameNode is active and which is standby is recorded in zk.

3. ZKFC is a failover controller based on zk.
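How an active/standby decision can be recorded in zk is easiest to see in a toy model. The sketch below assumes the common pattern in which every candidate tries to create the same ephemeral lock znode and the one that succeeds becomes active. It is an in-memory imitation: LockZnode, elect and the names nn1/nn2 are hypothetical, and this is not the real ZKFC code.

```python
class LockZnode:
    """Toy stand-in for one ephemeral lock znode (not a real ZooKeeper client)."""

    def __init__(self):
        self.owner = None  # name of the node whose session holds the znode

    def try_create(self, name: str) -> bool:
        """Succeed only if the znode does not exist yet (exclusive create)."""
        if self.owner is None:
            self.owner = name
            return True
        return False

    def session_lost(self, name: str) -> None:
        """An ephemeral znode disappears when its creator's session ends."""
        if self.owner == name:
            self.owner = None


def elect(lock: LockZnode, candidates: list[str]) -> dict[str, str]:
    """Each candidate tries to create the znode; the holder is the active one."""
    roles = {}
    for name in candidates:
        created = lock.try_create(name)
        roles[name] = "active" if created or lock.owner == name else "standby"
    return roles


lock = LockZnode()
first_round = elect(lock, ["nn1", "nn2"])   # nn1 wins, nn2 stays standby
lock.session_lost("nn1")                    # nn1 crashes, its znode vanishes
second_round = elect(lock, ["nn2"])         # nn2 can now take over
```

The point of the design is that the failure of the active node needs no extra bookkeeping: the loss of its session removes the znode, which frees the role for the standby.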

What role does zk play in YARN cluster HA?

1. Which ResourceManager is active and which is the standby RM is recorded in zk.

It's basically the same as in hdfs.

What role does zk play in HBase cluster HA?

1. Which server is the HMaster is recorded in zk.

2. Zk keeps information about the health and availability of the servers and provides server failure notifications. Through the zookeeper cluster you can look up the RegionServer on which the system table .META. is currently stored.

3. Zk uses a consensus protocol to protect the shared state. Note that a consensus protocol is normally run by an odd number of machines, typically 3 or 5.

// official explanation

1. Guarantee that there is only one master in the cluster at any time.

2. Store the addressing entry of all Regions.

3. Monitor the status of the Region Servers in real time, and notify the Master when a Region Server comes online or goes offline.

4. Store the HBase schema, including which tables exist and which column families each table has.

What role does zk play in Kafka clusters?

1. When a Kafka broker starts, it first registers its own node information with zookeeper as an ephemeral znode. When the connection between the broker and zookeeper is lost, the znode is deleted.

2. The location (host:port) of each partition leader is registered in zookeeper.

3. Storing and using offsets: the offsets maintained by consumers are saved in zookeeper (this applies to older Kafka consumers; newer versions keep offsets in Kafka itself).

Summary:

Distributed coordination services that can be implemented with ZooKeeper (acting as a third party) include:

1. Unified naming service // if each server is thought of as a resource, clients look up these resources by name.

2. Configuration management

3. Distributed shared lock // for example, when a shared resource has to be modified from several distributed servers, conflicts can occur

4. Cluster node state coordination (load balancing / master-slave coordination) // dynamic awareness of the servers in a cluster and failover, e.g. zkfc

Note: the four functions above are not built into zk itself, but they can be implemented on top of zk.
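As an illustration of how a function such as the distributed shared lock can be built on top of zk, here is a minimal in-memory sketch of the well-known recipe in which every client creates an ephemeral sequential child under a lock node and the lowest sequence number holds the lock. The class LockQueue and its method names are hypothetical; a real implementation would talk to a zookeeper server and use watches to wait for its turn.

```python
import itertools


class LockQueue:
    """In-memory imitation of the ephemeral-sequential lock recipe."""

    def __init__(self):
        self._counter = itertools.count()
        self._waiting = {}  # sequence number -> client name

    def enqueue(self, client: str) -> int:
        """Like creating an ephemeral sequential child under the lock node."""
        seq = next(self._counter)
        self._waiting[seq] = client
        return seq

    def holder(self):
        """The client owning the lowest sequence number holds the lock."""
        if not self._waiting:
            return None
        return self._waiting[min(self._waiting)]

    def release(self, seq: int) -> None:
        """Deleting the child (or losing the session) passes the lock on."""
        self._waiting.pop(seq, None)


queue = LockQueue()
seq_a = queue.enqueue("client-A")
seq_b = queue.enqueue("client-B")
first_holder = queue.holder()    # client-A arrived first
queue.release(seq_a)
second_holder = queue.holder()   # the lock passes to client-B
```

Zk itself only supplies the ordered, session-bound nodes; the locking rule (lowest number wins) lives entirely in the clients.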

Functions of zookeeper:

Zookeeper is simply a third party: when a client asks for data, zookeeper returns it. What the client then does with the data is not zookeeper's concern.

(Its most important functions are keeping data on behalf of clients and providing clients with a data watch service.)

Internally, zookeeper implements its own distributed in-memory database (for keeping the data).
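The combination of "keep data" and "watch data" can be sketched in a few lines. The toy store below assumes one-shot watches, which is how standard ZooKeeper watches behave: a watch fires once and the client has to read again to get the new value and re-register. WatchedStore is a made-up class for illustration, not a client library.

```python
class WatchedStore:
    """Toy key-value store with one-shot watches (illustrative only)."""

    def __init__(self):
        self._data = {}
        self._watches = {}  # path -> callbacks waiting for the next change

    def get(self, path, watch=None):
        """Return the stored value; optionally register a one-shot watch."""
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data.get(path)

    def set(self, path, value):
        """Store the value, then fire and clear the watches on this path."""
        self._data[path] = value
        for callback in self._watches.pop(path, []):
            callback(path)


events = []
store = WatchedStore()
store.set("/config", "v1")
current = store.get("/config", watch=events.append)  # read and watch
store.set("/config", "v2")   # fires the watch once
store.set("/config", "v3")   # no watch left, nothing fires
```

The callback only receives the path that changed, not the new value, so a client that cares about every change must read and watch again after each notification.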

ZooKeeper data model and hierarchical namespace

The namespace that ZooKeeper provides is very similar to a standard file system. A name is a sequence of path elements separated by slashes (/). Every node in the ZooKeeper namespace is identified by a path.

Data nodes in ZooKeeper:

Each node is called a znode and is accessed through its path.

Each znode maintains its data and a stat data structure (ACL, timestamps and version numbers).

The data kept in a znode is mainly coordination data such as status, configuration and location information. The amount of data stored in each node is very small, at the KB level.

When the data of a znode is updated, control information such as the version number is updated (incremented) as well.

Reads and writes on a znode are atomic: a write replaces all of the data, and a read returns all of the data.

Znodes are either persistent or ephemeral. An ephemeral node is deleted by zookeeper as soon as the session that created it ends.
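The znode properties listed above (path addressing, whole-value writes, version numbers, ephemeral nodes) can be put together in a small in-memory model. This is a sketch under simplifying assumptions, not the real server: ZnodeTree, BadVersion and the session names are hypothetical, ACLs and timestamps are left out, and -1 is used to mean "any version".

```python
class BadVersion(Exception):
    """Raised when the caller's expected version does not match."""


class ZnodeTree:
    """Minimal in-memory model of a znode tree (not the real server)."""

    def __init__(self):
        # path -> [data, version, owning session or None if persistent]
        self._nodes = {"/": [b"", 0, None]}

    def create(self, path, data=b"", session=None):
        """Create a node; passing a session makes it ephemeral."""
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        self._nodes[path] = [data, 0, session]

    def get(self, path):
        """A read returns the whole payload plus its version."""
        data, version, _ = self._nodes[path]
        return data, version

    def set(self, path, data, expected_version=-1):
        """A write replaces the whole payload and bumps the version."""
        node = self._nodes[path]
        if expected_version not in (-1, node[1]):
            raise BadVersion(path)
        node[0] = data
        node[1] += 1

    def close_session(self, session):
        """Delete every ephemeral node created by this session."""
        owned = [p for p, n in self._nodes.items()
                 if n[2] is not None and n[2] == session]
        for path in owned:
            del self._nodes[path]


tree = ZnodeTree()
tree.create("/app", b"cfg")                         # persistent node
tree.create("/app/worker-1", b"up", session="s1")   # ephemeral node
tree.set("/app", b"cfg2", expected_version=0)       # version 0 -> 1
tree.close_session("s1")                            # removes /app/worker-1
```

Passing the version that was read back into set gives a compare-and-set: a writer working from stale data gets BadVersion instead of silently overwriting someone else's update.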

Zk performance:

Zookeeper reads and writes are very fast (it is based on an in-memory database), and reads are faster than writes.

Sequential consistency: updates from a client are applied in the order in which they were sent.

Atomicity: an update either succeeds or fails. There is no third result.

Single system image: no matter which server the client connects to, the client will see the same ZooKeeper view.

Reliability: once an update has been applied, its value will not change until a client updates it again. This guarantee has the following two consequences:

1. If the client receives a success return code, the update has been applied. If no return code is received (because of a communication error, a timeout and so on), the client cannot know whether the update took effect.

2. When recovering from a failure, any successful update operation that can be seen by the client will not be rolled back.

Timeliness: the client's view of the system is guaranteed to be up to date within a certain time bound. Within that bound, either the client sees any change made to the system, or it detects a service outage.

Given these consistency guarantees, it becomes much easier to design and implement higher-level features on top of ZooKeeper, for example leader election, queues and revocable locks.

Zookeeper cluster components:

There are three kinds of server within one zookeeper service: the leader server, follower servers and observer servers.

What makes the leader special is that it has the decision-making power and the Request Processor (an observer server differs from a follower server in that it does not take part in leader election).

The algorithm behind internal leader election in zk: ZAB (ZooKeeper Atomic Broadcast), a protocol similar to Paxos

When a client modifies data in the zk cluster, the request is first routed to the leader, the data is modified on the leader, and the change is then synchronized to every follower.
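The write path described above can be approximated in a few lines. The sketch below is a heavy simplification that assumes a write is committed once a majority of the ensemble (leader included) acknowledges it; the real protocol also deals with ordering, logging and recovery. Server and leader_write are hypothetical names.

```python
class Server:
    """Toy replica; `up` marks whether it can acknowledge proposals."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def ack(self, key, value):
        """A reachable follower acknowledges the proposed change."""
        return self.up


def leader_write(leader, followers, key, value):
    """Commit a write only if a majority of the ensemble acknowledges it."""
    ensemble = [leader] + followers
    acks = 1 + sum(f.ack(key, value) for f in followers)  # leader counts itself
    if acks <= len(ensemble) // 2:
        return False  # no majority: the write is not committed
    for server in ensemble:
        if server.up:
            server.data[key] = value  # apply on every reachable server
    return True


leader = Server("zk1")
followers = [Server("zk2"), Server("zk3", up=False)]
committed = leader_write(leader, followers, "/config", "v1")  # 2 of 3 ack
```

With one of three servers down the write still goes through; with two down the leader alone is not a majority and the write is rejected.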

Meaning of the ports in zk

(2181 is the port that clients use to connect to the server)

(2888 is the port used for communication between the leader and the followers)

(3888 is the port used for leader election voting between the servers)
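These ports usually show up in the server configuration file, where each ensemble member is listed as server.<id>=<host>:<port>:<port>. The sketch below parses such lines to make the two roles of the ports explicit. The host names are placeholders, and parse_server_line and the dictionary keys are names chosen for this example.

```python
def parse_server_line(line: str) -> dict:
    """Parse 'server.<id>=<host>:<peer_port>:<election_port>'."""
    key, value = line.strip().split("=", 1)
    server_id = int(key.split(".", 1)[1])
    host, peer_port, election_port = value.split(":")[:3]
    return {
        "id": server_id,
        "host": host,
        "peer_port": int(peer_port),          # leader <-> follower traffic
        "election_port": int(election_port),  # leader election votes
    }


# Sample configuration with placeholder host names.
SAMPLE = """\
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
"""

servers = [parse_server_line(line) for line in SAMPLE.splitlines()
           if line.startswith("server.")]
```

The numbers 2181, 2888 and 3888 are conventions rather than fixed values; whatever is written in the configuration is what the servers use.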
