How to solve the problems related to akka cluster 07/15 Update SLTechnology News&Howtos


2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--


Background

In a recent project, we deployed an Akka (2.6.8) cluster on Kubernetes (K8s). Whenever a node became unreachable and was not restarted manually, no other node could join the cluster.

Concretely: the pod running one of the non-seed nodes restarted and was rescheduled onto a different K8s node, so it came back with a new IP. The cluster, however, kept trying to connect to the old address (IP), which led to the exception.

Root cause analysis

First, look at the concept of gossip convergence, as described in the Akka documentation:

Gossip convergence cannot occur while any nodes are unreachable. The nodes need to become reachable again, or moved to the down and removed states (see the Cluster Membership Lifecycle section). This only blocks the leader from performing its cluster membership management and does not influence the application running on top of the cluster. For example this means that during a network partition it is not possible to add more nodes to the cluster. The nodes can join, but they will not be moved to the up state until the partition has healed or the unreachable nodes have been downed.


In short, Akka requires every node to be either reachable or downed before the members can agree (converge) on the cluster state. Until then, the leader cannot move joining nodes to up.

The Membership Lifecycle section of the documentation adds:
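To make the convergence rule concrete, here is a minimal Python model of the situation from the background section. It is a simplified sketch, not Akka's implementation: `Member`, `has_convergence` and `leader_actions` are hypothetical names, only three statuses are modeled, and real convergence also requires every reachable node to have seen the latest gossip, which is omitted here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Member:
    address: str
    status: str             # simplified: "joining", "up" or "down"
    reachable: bool = True


def has_convergence(members):
    """Simplified rule: any unreachable member that has not been downed
    blocks convergence."""
    return all(m.reachable or m.status == "down" for m in members)


def leader_actions(members):
    """Promote joining members to up and drop downed members (down -> removed),
    but only when gossip convergence holds; otherwise change nothing."""
    if not has_convergence(members):
        return list(members)
    return [
        replace(m, status="up") if m.status == "joining" else m
        for m in members
        if m.status != "down"
    ]


if __name__ == "__main__":
    seed = Member("10.0.0.1", "up")
    stale = Member("10.0.0.5", "up", reachable=False)  # old pod IP, never downed
    fresh = Member("10.0.0.9", "joining")              # restarted pod, new IP
    print(leader_actions([seed, stale, fresh]))        # fresh stays "joining"
    print(leader_actions([seed, replace(stale, status="down"), fresh]))  # fresh becomes "up"
```

The restarted pod can join, but it stays in the joining state until the stale address is downed, which is exactly the symptom described above.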

If a node is unreachable then gossip convergence is not possible and therefore most leader actions are impossible (for instance, allowing a node to become a part of the cluster). To be able to move forward, the node must become reachable again or the node must be explicitly "downed". This is required because the state of an unreachable node is unknown and the cluster cannot know if the node has crashed or is only temporarily unreachable because of network issues or GC pauses. See the section about User Actions below for ways a node can be downed.

In other words, an unreachable node must either become reachable again or be moved to the down state, because unreachability can also be caused by network jitter or by long GC pauses under heavy load. Akka cannot tell these cases apart from a crash, so on its own it simply keeps trying to reconnect indefinitely.

Solution
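The reason Akka cannot distinguish a crash from a pause is that a heartbeat-based failure detector only observes missing heartbeats. Akka itself uses a phi accrual failure detector; the sketch below swaps in a plain fixed-timeout detector (the class name `TimeoutFailureDetector` is hypothetical) because it shows the same blind spot with less code. Times are passed in explicitly to keep it deterministic.

```python
class TimeoutFailureDetector:
    """Marks a node unreachable when no heartbeat has arrived within `timeout`
    seconds. A simplified stand-in for Akka's phi accrual detector."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heartbeat = {}  # node -> time of last heartbeat (seconds)

    def heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def is_reachable(self, node, now):
        last = self.last_heartbeat.get(node)
        return last is not None and now - last <= self.timeout


if __name__ == "__main__":
    fd = TimeoutFailureDetector(timeout=5.0)
    fd.heartbeat("crashed-node", now=0.0)
    fd.heartbeat("gc-paused-node", now=0.0)
    # At t=10 both nodes have been silent for 10 s, so they look identical.
    print(fd.is_reachable("crashed-node", now=10.0), fd.is_reachable("gc-paused-node", now=10.0))  # False False
    # The paused node resumes heartbeating at t=12; the crashed one never does.
    fd.heartbeat("gc-paused-node", now=12.0)
    print(fd.is_reachable("crashed-node", now=12.0), fd.is_reachable("gc-paused-node", now=12.0))  # False True
```

At t=10 the detector has no way to know which node will come back, which is why downing has to be an explicit decision rather than something inferred from reachability alone.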

The official documentation points to the fix: automatically move unreachable nodes to the down state. There are two ways to do this:

1. Down the node actively by sending an HTTP request

2. Use the Split Brain Resolver (SBR)

The first approach is left for readers to explore on their own; we use the second. SBR offers five strategies: static-quorum, keep-majority, keep-oldest, down-all and lease-majority.

We chose keep-majority. The Strategies section of the official documentation explains the advantages, disadvantages and intended use cases of all five.

Below is the Akka configuration we use with the keep-majority strategy:

akka.coordinated-shutdown.exit-jvm = on
akka.coordinated-shutdown.exit-code = 0
akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
akka.cluster.split-brain-resolver.down-all-when-unstable = off
akka.cluster.split-brain-resolver.stable-after = 20s
akka.cluster.split-brain-resolver.active-strategy = keep-majority
akka.cluster.split-brain-resolver.keep-majority.role = "admin"

Parameter descriptions:

akka.coordinated-shutdown.exit-jvm: whether to exit the JVM when the node is removed from the cluster (on or off).
akka.coordinated-shutdown.exit-code: the status code the JVM exits with.
akka.cluster.downing-provider-class: set to akka.cluster.sbr.SplitBrainResolverProvider to enable SBR.
akka.cluster.split-brain-resolver.down-all-when-unstable: whether to down all nodes when the cluster stays in an unstable state; on, off, or a duration such as 15s.
akka.cluster.split-brain-resolver.stable-after: how long the cluster must stay stable (no membership or reachability changes) before SBR downs the unreachable nodes.
akka.cluster.split-brain-resolver.active-strategy: the SBR strategy to use, here keep-majority.
akka.cluster.split-brain-resolver.keep-majority.role: if set, SBR bases the keep-majority decision only on nodes with this role.
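The stable-after setting means SBR waits for a quiet period before acting: every membership or reachability change restarts the wait. The sketch below models that timer, plus the duration that down-all-when-unstable takes when it is set to on (3/4 of stable-after, but not less than 4 seconds, as we read the Akka 2.6 reference documentation). Both function names are hypothetical and this is a simplified model, not Akka's code.

```python
def sbr_may_decide(change_times, now, stable_after=20.0):
    """True once no membership/reachability change has been observed for at
    least `stable_after` seconds (simplified model of stable-after)."""
    if not change_times:
        return True
    return now - max(change_times) >= stable_after


def derived_down_all_when_unstable(stable_after):
    """Duration used when down-all-when-unstable = on: 3/4 of stable-after,
    but not less than 4 seconds (per our reading of the Akka 2.6 docs)."""
    return max(stable_after * 3 / 4, 4.0)


if __name__ == "__main__":
    changes = [0.0, 8.0]  # e.g. one node became unreachable at t=0, another at t=8
    print(sbr_may_decide(changes, now=25.0))     # False: only 17 s without changes
    print(sbr_may_decide(changes, now=28.0))     # True: 20 s without changes
    print(derived_down_all_when_unstable(20.0))  # 15.0
```

With stable-after = 20s, a restarted pod's old address would therefore be downed no earlier than 20 seconds after the last reachability change, not 20 seconds after the first one.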

Note on akka.cluster.split-brain-resolver.keep-majority.role: if, for whatever reason, only a minority of the cluster's nodes (fewer than half) remain, and those remaining nodes happen to have this role, they are not downed and do not exit.

If this setting is not configured, such a minority downs itself and exits, which brings the entire cluster down.
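The note can be checked with a simplified sketch of the keep-majority rule as the Akka documentation describes it: the side with more (counted) members is kept, and on a tie the side holding the lowest address is kept. The function name `keep_majority_survives` is hypothetical, addresses are compared as plain strings rather than with Akka's address ordering, and the behavior when neither side has any counted member is our own assumption.

```python
def keep_majority_survives(my_side, other_side, role=None):
    """Simplified keep-majority decision for a two-way partition.

    Each side is a list of (address, roles) tuples. If `role` is given, only
    members carrying that role are counted. Returns True if `my_side` is kept
    (and downs the other side), False if `my_side` should down itself.
    """
    def counted(side):
        return [addr for addr, roles in side if role is None or role in roles]

    mine, theirs = counted(my_side), counted(other_side)
    if len(mine) != len(theirs):
        return len(mine) > len(theirs)
    # Equal size: keep the side holding the lowest address. Plain string
    # comparison is a simplification of Akka's address ordering.
    counted_all = mine + theirs
    if not counted_all:
        return False  # assumption: no counted members on either side -> do not survive
    return min(counted_all) in mine


if __name__ == "__main__":
    # 5-node cluster: 3 worker nodes crashed, 2 nodes with the "admin" role remain.
    survivors = [("10.0.0.1", {"admin"}), ("10.0.0.2", {"admin"})]
    lost = [("10.0.0.3", set()), ("10.0.0.4", set()), ("10.0.0.5", set())]
    print(keep_majority_survives(survivors, lost, role="admin"))  # True: 2 admins vs 0
    print(keep_majority_survives(survivors, lost))                # False: 2 nodes vs 3
```

With role = "admin" the two surviving nodes count as the majority and stay up; without it they are the minority and down themselves, so the whole cluster goes down.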
