This article explains why Eureka is a better fit than ZooKeeper as a service registry. The discussion aims to be concise and easy to follow; I hope you get something out of it.
1. Preface

A service registry provides clients with the list of services available to invoke. When a client calls a remote service, it consults this list and then selects a service provider's address. Service registries are widely used in distributed systems and are an indispensable component of them: the name server in RocketMQ, the NameNode in HDFS, the ZooKeeper registry in Dubbo, and Eureka in Spring Cloud are all examples. In Spring Cloud, besides Eureka, you can also configure ZooKeeper as the registry. How, then, do we choose a registry implementation? The famous CAP theorem states that a distributed system cannot satisfy C (consistency), A (availability), and P (partition tolerance) at the same time. Because partition tolerance must be guaranteed in a distributed system, we can only trade off between A and C. Here, ZooKeeper guarantees CP, while Eureka guarantees AP.

2. ZooKeeper guarantees CP

When querying the registry for a list of services, we can tolerate a registry that returns registration information from a few minutes ago, but we cannot accept the registry being down and unavailable. In other words, the service registration function demands availability more than consistency. ZooKeeper, however, can run into exactly this situation: when the master node loses contact with the other nodes because of a network failure, the remaining nodes re-elect a leader. The problem is that leader election takes too long, 30 to 120 seconds, and the entire ZooKeeper cluster is unavailable while the election runs, which paralyzes the registration service for that period. In a cloud deployment environment, losing the master node of a ZooKeeper cluster to network problems is quite likely, and although service is eventually restored, the prolonged registration outage caused by a long election cannot be tolerated.

3. Eureka guarantees AP

Eureka understands this, so availability was the design priority. All Eureka nodes are equal; the failure of a few nodes does not affect the work of the remaining nodes, which continue to provide registration and query services. Furthermore, when a Eureka client registers with a Eureka server and finds the connection has failed, it automatically switches to another node. As long as one Eureka server is still standing, the registration service remains available (availability is guaranteed), although the information it returns may not be the latest (strong consistency is not guaranteed). In addition, Eureka has a self-protection mechanism: if the server receives fewer heartbeat renewals than expected within 15 minutes (by default, less than 85% of the expected rate), Eureka concludes that there is a network failure between the clients and the registry, and the following happens:
1. Eureka stops removing services from the registry that would otherwise expire because their heartbeats have not arrived for too long.
2. Eureka still accepts registration and query requests for new services, but does not synchronize them to other nodes (that is, it guarantees that the current node remains available).
3. When the network stabilizes, the new registrations held by the current instance are synchronized to the other nodes.

Eureka therefore copes well with some nodes losing contact because of a network failure, and does not paralyze the entire registration service the way ZooKeeper does.

4. A more in-depth discussion

What follows is a deeper discussion of the difference between ZooKeeper and Eureka as a registry. The article is forwarded from http://dockone.io/article/78 and is translated from an article originally written in English.

Why ZooKeeper should not be used for service discovery

[Editor's note] By comparing ZooKeeper and Eureka as service discovery services (note: UDDI in the WebServices architecture is a discovery service), the author shares Knewton's experience deploying services on a cloud computing platform. Although the article is a little extreme, it shows that Knewton has considerable cloud-platform experience. From a practical standpoint it compares the strengths and weaknesses of ZooKeeper and Eureka as discovery services from three angles: cloud-platform characteristics, the CAP theorem, and operations. It also proposes a methodology for building a discovery service on a cloud platform.

Background

Many companies choose ZooKeeper as their service discovery service (Service Discovery), but while building Knewton (Knewton is a personalized education platform that provides adaptive learning materials to companies, schools, publishers, and students), we found this to be a fundamental mistake. In this article we use problems we encountered in practice to explain why using ZooKeeper as a service discovery service is a mistake.

4.3 The service deployment environment

Let us start from the beginning. When deploying services, we should first consider the platform (the deployment environment) and only then the software system that runs on it, or how to build a system on the chosen platform. For a cloud deployment platform, for example, hardware-level redundancy (note: the author means the system's redundant design, i.e., the ability to switch work quickly to other nodes when a single point of failure occurs) and the handling of network failures are the first things to consider. When your services run on a cluster built from a large number of servers, single points of failure are inevitable. As for Knewton, although we deploy on AWS, we have run into all kinds of failures in past operations; you should therefore design the system to expect failure. Many companies run on AWS and have encountered similar problems (and much has been written about them). You must be able to anticipate the problems the platform may produce, such as box failures (note: the original text says 'box failure'; a 'box' here is an individual server, so this means a single machine going down), high latency, and network partitions (note: the original text says 'network partitions', meaning that when a network switch fails, communication between different subnets is cut off), and at the same time you must be able to build systems resilient enough to handle them. Never assume you will be deploying services on the same platform as everyone else!
Of course, if you run and maintain a data center yourself, you may spend a great deal of time and money to avoid hardware failures and network partitions, and that is a different situation; but on a cloud computing platform such as AWS, different problems arise and call for different solutions. You will understand once you actually use one, but you had better deal with them in advance (note: the box failures, high latency, and network partition problems mentioned in the previous section).

4.4 The problems with ZooKeeper as a discovery service

ZooKeeper (note: ZooKeeper is a sub-project of the famous Hadoop project, designed to solve service coordination and synchronization (Coordination Service) in large-scale distributed applications; it can provide the other services in the same distributed system with unified naming, configuration management, distributed locks, cluster management, and so on) is a great open-source project. It is very mature, has a considerable community supporting its development, and is widely used in production environments; but using it as a service discovery solution is a mistake. In the field of distributed systems there is the famous CAP theorem (C: data consistency; A: service availability; P: fault tolerance against network partitions; no distributed system can satisfy all three at once, at most two). ZooKeeper is CP: at any moment, a request to ZooKeeper gets a consistent data result, and the system tolerates network partitions, but it does not guarantee the availability of every request (that is, in extreme cases ZooKeeper may drop some requests, and the consumer program must retry to get a result). But do not forget that ZooKeeper is a distributed coordination service; its responsibility is to ensure that data (note: configuration data, state data) stays synchronized and consistent among all the services under its jurisdiction, so it is not hard to understand why it was designed as CP rather than AP. If it were AP, the consequences would be dreadful (note: ZooKeeper is like the traffic light at an intersection; can you imagine the lights at a major junction suddenly failing?). Moreover, Zab, ZooKeeper's core algorithm, exists precisely to solve the problem of keeping data synchronized across the multiple nodes of a distributed system.

As a distributed coordination service, ZooKeeper is very good, but it is not suitable as a service discovery service, because for service discovery it is better to return results that may contain stale information than to return nothing at all. For a discovery service, you would rather get back the servers on which a service was available five minutes ago than get no result at all because of a temporary network failure. So using ZooKeeper for service discovery is simply wrong, and if you use it that way you will suffer for it. What is more, when used as a discovery service, ZooKeeper does not itself handle network partitions correctly; and in the cloud, network partitions occur just like other kinds of failure, so it is best to be 100% prepared for them in advance.
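To make this CP behavior concrete, here is a minimal sketch of a service lookup against ZooKeeper using Apache Curator. The /services/payment-service path, the host names, and the convention that each instance registers an ephemeral child znode are illustrative assumptions rather than anything from the original article; the point is simply that during a partition or a leader election the read fails outright instead of returning stale data.

    import java.util.Collections;
    import java.util.List;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class ZkDiscoveryLookup {
        // Hypothetical path under which instances register ephemeral child znodes.
        private static final String SERVICE_PATH = "/services/payment-service";

        static List<String> lookup(CuratorFramework client) {
            try {
                // Each child znode is one registered instance, e.g. "10.0.0.5:8080".
                return client.getChildren().forPath(SERVICE_PATH);
            } catch (Exception e) {
                // During a partition or leader election the read fails:
                // ZooKeeper (CP) refuses to answer rather than serve stale data.
                return Collections.emptyList();
            }
        }

        public static void main(String[] args) {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk1:2181,zk2:2181,zk3:2181",
                    new ExponentialBackoffRetry(1000, 3));
            client.start();
            System.out.println("instances: " + lookup(client));
            client.close();
        }
    }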
As the Jepsen blog post on ZooKeeper put it: if the nodes inside a network partition cannot reach the quorum needed to elect a ZooKeeper leader, they are disconnected from ZooKeeper and, of course, can no longer provide service discovery.

Adding client-side caching or similar techniques on top of ZooKeeper can mitigate the loss of registration information when the connection to ZooKeeper fails; Pinterest and Airbnb both use this approach to guard against ZooKeeper outages. On the surface it solves the problem: when some or all nodes are disconnected from ZooKeeper, each node can still serve data from its local cache. Even so, it is impossible for all the nodes under ZooKeeper to cache every service registration at all times. And if all the nodes lose their ZooKeeper connection, or a network partition splits the cluster, ZooKeeper removes the unreachable nodes from its own management and the outside world can no longer find them, even though those nodes are 'healthy' and perfectly able to serve; as a result, requests to the services on them are lost (note: this is also why ZooKeeper fails the A in CAP). The deeper reason is that ZooKeeper is built on the CP principle, that is, it guarantees that every node's data is consistent, whereas the caching workaround tries to make ZooKeeper more reliable (more available). ZooKeeper was designed to keep node data consistent, i.e., CP; so by layering caching on top you may end up with a discovery service that is neither consistent (CP) nor highly available (AP), because this amounts to bolting an AP system onto an existing CP system, which fundamentally cannot work. A service discovery service should be designed for high availability from the very start!
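The client-side caching workaround just described can be sketched as a last-known-good wrapper around the same kind of ZooKeeper read. This illustrates the pattern and its limits; it is not Pinterest's or Airbnb's actual code.

    import java.util.List;
    import java.util.concurrent.atomic.AtomicReference;
    import org.apache.curator.framework.CuratorFramework;

    public class CachedZkLookup {
        private final CuratorFramework client;
        private final String path;
        private final AtomicReference<List<String>> lastKnownGood =
                new AtomicReference<>(List.of());

        public CachedZkLookup(CuratorFramework client, String path) {
            this.client = client;
            this.path = path;
        }

        public List<String> instances() {
            try {
                List<String> fresh = client.getChildren().forPath(path);
                lastKnownGood.set(fresh);   // refresh the cache on every success
                return fresh;
            } catch (Exception e) {
                // Partitioned from ZooKeeper: serve the stale list instead of failing.
                // Note the limits discussed above: the cache only covers services this
                // client has already seen, and ZooKeeper may meanwhile have dropped
                // ephemeral nodes for servers that are in fact healthy.
                return lastKnownGood.get();
            }
        }
    }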
Even setting the CAP theorem aside, setting up and maintaining ZooKeeper correctly is genuinely difficult. Errors occur so often that many projects exist solely to reduce the difficulty of operating ZooKeeper, and they occur not only in clients but also in the ZooKeeper servers themselves. Many failures of the Knewton platform were caused by improper use of ZooKeeper. Seemingly simple operations, such as correctly re-establishing watchers, handling client sessions and exceptions, and managing memory on the ZooKeeper machines (note: the original text says 'ZK boxes', i.e., the servers running ZooKeeper), can easily lead to ZooKeeper errors. We also ran into some of ZooKeeper's well-known bugs, ZOOKEEPER-1159 and ZOOKEEPER-1576, and we even saw ZooKeeper fail to elect a leader in a production environment. These problems arise because ZooKeeper must manage and guarantee the sessions and network connections of the service group under its jurisdiction (note: managing these resources is extremely difficult in a distributed environment); but since it is not responsible for service discovery, using ZooKeeper for service discovery costs more than it gains.

Making the right choice: the success of Eureka

We switched the service discovery service from ZooKeeper to Eureka, an open-source service discovery solution developed by Netflix (note: Eureka consists of two components, the Eureka server and the Eureka client; the Eureka server acts as the service registration server, and the Eureka client is a Java client that simplifies interaction with the server, acts as a round-robin load balancer, and provides failover support for services). Eureka was designed from the very beginning as a highly available and scalable service discovery service; these are also two traits of every platform Netflix develops, and Eureka is no exception. Since the switch began, we have achieved a record of no downtime maintenance in production for any product that depends on Eureka. We had been told that service migration on a cloud platform is doomed to fail, but the lesson we draw from this example is that a good service discovery service plays a vital role in it.

First, on the Eureka platform, if a server goes down, Eureka does not hold a ZooKeeper-style leader election; client requests automatically switch to another Eureka node, and when the downed server recovers, Eureka takes it back into the server cluster; all it has to do is synchronize any new registration information. So you no longer have to worry about a 'lagging' server being expelled from the Eureka cluster after it recovers. Eureka was even designed to withstand wider network partition failures and to achieve 'zero' downtime maintenance. During a network partition, every Eureka node keeps providing service (note: ZooKeeper does not): it accepts new registrations and serves them to downstream discovery requests, so within the same side of the partition, newly published services can still be discovered and reached.

But Eureka goes further than that. Normally, Eureka uses a built-in heartbeat service to weed out 'dying' servers: if a service registered with Eureka stops sending timely heartbeats, Eureka removes it from the registry (a bit like ZooKeeper). This is a good feature, but it becomes very dangerous during a network partition, because the servers being removed (note: removed for missed heartbeats) are themselves 'healthy'; the partition has merely split the Eureka cluster into subnets that cannot reach one another. Fortunately, Netflix accounted for this flaw: if a Eureka node loses a large number of heartbeat connections in a short period, it enters 'self-preservation mode' and keeps the 'heartbeat-dead' registrations from expiring. In this mode the node can still register new services and retains the 'dead' entries in case clients request them; when the network recovers, the node exits self-preservation mode. Eureka's philosophy is that it is better to keep both 'good data' and 'bad data' than to lose any 'good data', and this model has proven very effective in practice.
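As a rough illustration of the self-preservation trigger just described (and of the 85%-within-15-minutes rule from section 3), here is a simplified model of the renewal-threshold check. It assumes Eureka's documented defaults, a 30-second client renewal interval and a 0.85 renewal-percent threshold; it is a sketch of the published behavior, not Netflix's implementation.

    public class SelfPreservationModel {
        static final int RENEWALS_PER_CLIENT_PER_MINUTE = 2;  // one heartbeat per 30s
        static final double RENEWAL_PERCENT_THRESHOLD = 0.85;

        static boolean shouldEnterSelfPreservation(int registeredClients,
                                                   int renewalsLastMinute) {
            int expected = registeredClients * RENEWALS_PER_CLIENT_PER_MINUTE;
            int floor = (int) (expected * RENEWAL_PERCENT_THRESHOLD);
            // Below the floor, the server assumes a network problem rather than
            // mass client death, and stops evicting "expired" registrations.
            return renewalsLastMinute < floor;
        }

        public static void main(String[] args) {
            // 100 clients should renew 200 times a minute; the floor is 170.
            System.out.println(shouldEnterSelfPreservation(100, 150)); // true
            System.out.println(shouldEnterSelfPreservation(100, 190)); // false
        }
    }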
Finally, Eureka also has client-side caching (note: Eureka is divided into a client program and a server program; the client program provides the registration and discovery interfaces). So even if every node in the Eureka cluster fails, or a network partition leaves the client unable to reach any Eureka server, consumers of Eureka services can still obtain existing registration information from the Eureka client's cache. Even in the most extreme case, when no healthy Eureka node responds and no better fallback exists, consumers can still query and obtain registration information through the Eureka client, thanks to that cache.

It is Eureka's architecture that qualifies it as a service discovery service. Compared with ZooKeeper, it does away with leader election and the transaction log mechanism, which reduces the maintenance burden and keeps Eureka robust at run time. Eureka was also designed specifically for service discovery: it has its own client library and provides heartbeats, service health monitoring, automatic service publication, and automatic cache refresh, all features you would have to implement yourself on top of ZooKeeper. All of Eureka's libraries are open source, so everyone can read and use the code, which is better than a client library that only one or two people ever see or maintain. Maintaining the Eureka servers is also simple: replacing a node only requires removing the existing node from the existing EIP and attaching a new one. Eureka provides a web-based graphical operations console where you can view the running state of the registered services it manages (health, running logs, and so on), and it even exposes a RESTful API so that third-party programs can integrate Eureka's functionality.
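As a small example of the RESTful interface just mentioned, the sketch below queries a Eureka server's registry over plain HTTP with the JDK's built-in client. The host name, the default port 8761, the /eureka/apps base path, and the PAYMENT-SERVICE application name are assumptions for illustration; check your own server's configuration.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class EurekaRestQuery {
        public static void main(String[] args) throws Exception {
            HttpClient http = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://eureka-host:8761/eureka/apps/PAYMENT-SERVICE"))
                    .header("Accept", "application/json")
                    .build();
            HttpResponse<String> response =
                    http.send(request, HttpResponse.BodyHandlers.ofString());
            // The body lists every registered instance of PAYMENT-SERVICE,
            // including host, port, and status (UP, DOWN, STARTING, ...).
            System.out.println(response.statusCode());
            System.out.println(response.body());
        }
    }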
With regard to service discovery services, we want to make two points in this article:
1. Pay attention to the hardware platform on which the service runs
2. Always focus on the problem you want to solve, and then decide which platform to use. It was from these two considerations that Knewton chose Eureka to replace ZooKeeper as its service discovery service. A cloud deployment platform is full of unreliability, and Eureka is able to cope with those weaknesses; a service discovery service must be both highly reliable and highly resilient, and Eureka is exactly what we want.
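As a practical footnote to point 2: in Spring Cloud, the choice of registry is largely invisible to application code, because both Eureka and ZooKeeper sit behind the same DiscoveryClient abstraction. The sketch below is a minimal consumer, assuming a service registered under the hypothetical name payment-service; swapping the spring-cloud-starter-netflix-eureka-client dependency for spring-cloud-starter-zookeeper-discovery changes the registry without changing this code.

    import java.util.List;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.client.ServiceInstance;
    import org.springframework.cloud.client.discovery.DiscoveryClient;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RestController;

    @SpringBootApplication
    @RestController
    public class ConsumerApplication {

        @Autowired
        private DiscoveryClient discoveryClient;

        @GetMapping("/instances")
        public List<ServiceInstance> instances() {
            // Returns whatever the configured registry (Eureka or ZooKeeper)
            // currently knows about "payment-service".
            return discoveryClient.getInstances("payment-service");
        }

        public static void main(String[] args) {
            SpringApplication.run(ConsumerApplication.class, args);
        }
    }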
The above is why Eureka is better than ZooKeeper as a service registry.