
How to understand keycloak clustering

2025-07-19 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

This article introduces Keycloak clustering: the cluster modes Keycloak offers and the key concepts behind them, including load balancing, client IP forwarding, sticky sessions, shared data, and multicasting.

Clusters in Keycloak

Keycloak has two operating modes: standalone and domain.

The main difference between the two is whether the deployment files are managed centrally. If the deployment files have to be copied to each machine by hand, it is standalone mode. If they are managed in one place and distributed to every server automatically, it is domain mode.

In standalone mode, the cluster is configured in an XML file, /standalone/configuration/standalone-ha.xml.

In domain mode, the configuration lives on the domain controller machine, in domain/configuration/domain.xml.

Let's take a look at the cluster-related components that standalone-ha.xml uses:

... ......

The main components used are mod_cluster, Infinispan, and JGroups.

In addition, Keycloak supports a kind of cluster that spans data centers, called cross-data center mode.

This mode is mainly used when the service is deployed across data centers, for example at geographically separate sites, where strong disaster recovery is required.

With the basic cluster setup covered, let's look at some of the key concepts behind Keycloak clusters and how they are used.

Load balancing

Because it is a cluster, the backend consists of multiple servers. When a user accesses the service through a client, which server should the request go to?

This is the job of load balancing software.

Generally speaking, there are three ways of load balancing:

The first is client-side load balancing. The client already knows the addresses of all the servers, and chooses which one to use each time it sends a request.

This mode generally requires a capable client-side API that performs the routing. Memcached clients are an example.

The magic of Memcached comes from two-stage hashing. Memcached works like a huge hash table that stores many key-value pairs. Given a key, you can store or query arbitrary data.

The client can store data on multiple Memcached nodes. When querying, the client first computes the hash of the key and uses it to select a node from its node list (phase 1 hash). It then sends the request to that node, and the Memcached node uses its internal hash algorithm (phase 2 hash) to find the actual data (item).
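The two phases can be sketched in a few lines of Python. This is only an illustration, not a real Memcached client: the names Node and ShardedClient are hypothetical, each node is just an in-memory dict, and the node is chosen with a plain modulo over a SHA-256 digest (an assumption; real clients often use consistent hashing instead).

```python
import hashlib


class Node:
    """Hypothetical stand-in for one cache server: an in-memory hash table."""

    def __init__(self, name):
        self.name = name
        self.table = {}  # phase 2: the node's own internal hash lookup


class ShardedClient:
    """Hypothetical client that selects a node by hashing the key (phase 1)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def _pick(self, key):
        # Use a stable digest; Python's built-in hash() is salted per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self.nodes)
        return self.nodes[index]

    def set(self, key, value):
        self._pick(key).table[key] = value

    def get(self, key):
        return self._pick(key).table.get(key)


nodes = [Node("cache-a"), Node("cache-b"), Node("cache-c")]
client = ShardedClient(nodes)
client.set("user:42", {"name": "alice"})
```

Because every client applies the same phase 1 rule to the same node list, any of them finds the item on the same node without asking a coordinator.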

The second is proxy load balancing. In this mode, a proxy server sits in front of multiple backend services and the client talks only to the proxy. The proxy server, not the client, chooses which service to route to.

There are many options for this kind of proxy, such as the familiar nginx and Apache HTTPD, as well as WildFly with mod_cluster, HAProxy, or hardware load balancers.
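As a rough picture of what such a proxy decides for each request, here is a minimal round-robin sketch in Python. The class RoundRobinProxy and the backend names are hypothetical, and round-robin is only one possible policy; the products listed above each have their own configuration and algorithms.

```python
import itertools


class RoundRobinProxy:
    """Hypothetical proxy that hands each request to the next backend in turn."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def route(self, request):
        # The client never sees this choice; it only talks to the proxy.
        backend = next(self._cycle)
        return backend, request


proxy = RoundRobinProxy(["kc-1", "kc-2", "kc-3"])
```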

The third is routing load balancing. In this mode, the user connects to an arbitrary backend server, and that server routes the request internally to other servers in the cluster.

In this mode, the load balancing logic generally has to be implemented inside the server itself.
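A minimal sketch of this third mode, assuming every node knows all of its peers and applies the same ownership rule (a CRC32 of a request key, modulo the cluster size). ClusterNode and the key format are hypothetical and only illustrate the forwarding step.

```python
import zlib


class ClusterNode:
    """Hypothetical server that accepts any request and forwards it to its owner."""

    def __init__(self, name):
        self.name = name
        self.peers = []    # all nodes in the cluster, including this one
        self.handled = []  # requests this node actually processed

    def owner_of(self, key):
        # Every node applies the same rule, so they all agree on the owner.
        return self.peers[zlib.crc32(key.encode("utf-8")) % len(self.peers)]

    def receive(self, key):
        owner = self.owner_of(key)
        if owner is self:
            self.handled.append(key)
            return self.name
        return owner.receive(key)  # forward inside the cluster


cluster = [ClusterNode(name) for name in ("n1", "n2", "n3")]
for node in cluster:
    node.peers = cluster
```

Whichever node the user happens to reach, the request ends up on the same owner.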

Exposing the client IP address

No matter which load balancing mode is used, the business logic may need the IP address the client connected from.

For example, we may want to record the user's IP address in an operation log. If we cannot obtain the real address, we may log the wrong one. The same applies to IP-based authentication and abuse prevention.

If a reverse proxy sits in front of the service, there is a problem: the server sees the proxy's address rather than the client's. So the reverse proxy must be configured to set valid X-Forwarded-For and X-Forwarded-Proto HTTP headers.

The server can then read the client's real IP address from X-Forwarded-For.
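To illustrate how a server might read that header, here is a small Python sketch. It is not Keycloak's implementation: the function client_ip, the plain-dict headers, and the trusted_proxies parameter are assumptions made for the example. It relies on the usual convention that each proxy appends the address it received the request from, so only the entries written by your own proxies can be trusted.

```python
def client_ip(headers, remote_addr, trusted_proxies=1):
    """Return the client address as recorded by the outermost trusted proxy.

    headers         -- plain dict of request headers (assumed exact-case keys)
    remote_addr     -- address of the direct TCP peer (the proxy, if any)
    trusted_proxies -- how many of our own proxies sit in front of the server
    """
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    # Entries further left than our own proxies' entries can be forged
    # by the client, so count from the right.
    if trusted_proxies > 0 and len(hops) >= trusted_proxies:
        return hops[-trusted_proxies]
    return remote_addr
```

Taking the leftmost entry is simpler but lets a client forge its own address, which matters for the IP-based checks mentioned above.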

In Keycloak, if requests are forwarded over HTTP, you can configure it as follows:

......

If requests are forwarded over AJP, for example with Apache HTTPD + mod_cluster, configure it as follows:

.........

Sticky sessions and non-sticky sessions

If the application keeps sessions, as a web application does, and the backend is a cluster, you also need to consider session sharing.

Each server maintains its sessions locally, so how can multiple servers share them?

One way is for all servers to store sessions in the same external cache, such as Redis. Then no matter which server the user reaches, the same session data can be read.
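The idea can be sketched as follows. SessionStore is a hypothetical stand-in for the external cache (a plain dict here, not a Redis client), and AppServer is a hypothetical application server that loads and saves the session on every request.

```python
class SessionStore:
    """Hypothetical stand-in for an external cache shared by all servers."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so servers never share mutable state by accident.
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = dict(session)


class AppServer:
    """Hypothetical server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id, updates=None):
        session = self.store.load(session_id)
        session.update(updates or {})
        self.store.save(session_id, session)
        return session


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)
server_a.handle("s1", {"user": "alice"})
```

A later request for session s1 that lands on server_b sees the data written through server_a.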

Of course, the cache can itself be a single node or a cluster, and if there are several data centers, the cache cluster even has to be synchronized across them.

Cache synchronization works, but it has overhead. Is there a simpler way? For example, could each user always be sent to the same server, so that the synchronization problem goes away?

Pinning a user to a particular server in this way is called sticky sessions mode. In this mode you do not need to worry about synchronizing sessions on every request. However, if a server goes down, its users' sessions are lost, so some session synchronization is still needed, just not in real time.
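One way to picture stickiness is a function that maps a session ID to the same backend every time and falls back to another backend only when the preferred one is down. The sketch below is an assumption for illustration (the function sticky_backend and the hashing rule are hypothetical); real load balancers often implement stickiness with a route cookie instead.

```python
import hashlib


def sticky_backend(session_id, backends, healthy=None):
    """Map a session to the same backend every time, skipping dead ones."""
    healthy = set(backends) if healthy is None else set(healthy)
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(backends)
    # Walk the list from the preferred position so failover is deterministic.
    for offset in range(len(backends)):
        candidate = backends[(start + offset) % len(backends)]
        if candidate in healthy:
            return candidate
    raise RuntimeError("no healthy backend available")


backends = ["kc-1", "kc-2", "kc-3"]
preferred = sticky_backend("sess-abc", backends)
```

When the preferred backend fails, the session moves to a different server, which is exactly the case where the session data must already have been copied somewhere else.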

Sticky sessions have another disadvantage: a background request, sent by an application rather than the user's browser, carries no session information, so it cannot be routed stickily. In that case the data has to be replicated so that the result is the same no matter which server receives the request.

Shared databases

All applications need to save data. Generally speaking, there are two kinds of data:

One is database data, which stores user information permanently.

The other is cache data, which acts as a buffer between the database and the application.

Either kind can run in cluster mode, where multiple servers read and write the data at the same time. Sharing data this way raises the problem of how updates propagate through the cluster.

There are two modes for updating cluster data:

One is reliability first, Active/Active mode, in which data updated on one node is immediately synchronized to the other nodes.

The other is performance first, Active/Passive mode, in which data updated on one node is not immediately synchronized to the other nodes.

With reliability first, an update request does not succeed until every node in the cluster has confirmed the update. With performance first, the update succeeds as soon as the primary node has been updated, and the other nodes synchronize with it asynchronously.
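The difference between the two update modes can be sketched with a toy store. Replica and ReplicatedStore are hypothetical names, and the explicit flush() call stands in for whatever background process would perform the asynchronous step in a real system.

```python
from collections import deque


class Replica:
    """Hypothetical node holding one copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedStore:
    """Hypothetical store with one primary and several backups."""

    def __init__(self, primary, backups, synchronous):
        self.primary = primary
        self.backups = list(backups)
        self.synchronous = synchronous
        self.pending = deque()  # updates not yet applied to the backups

    def put(self, key, value):
        self.primary.data[key] = value
        if self.synchronous:
            # Reliability first: succeed only once every backup has the update.
            for backup in self.backups:
                backup.data[key] = value
        else:
            # Performance first: acknowledge now, replicate later.
            self.pending.append((key, value))
        return True

    def flush(self):
        """Apply queued updates to the backups (the asynchronous step)."""
        while self.pending:
            key, value = self.pending.popleft()
            for backup in self.backups:
                backup.data[key] = value
```

In the asynchronous case there is a window between put() and flush() during which a backup still returns stale or missing data, which is the price paid for the faster acknowledgement.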

Keycloak uses Infinispan for caching and defines several session caches, each with its own synchronization strategy:

authenticationSessions: this cache holds data while a user is authenticating. If you are in sticky sessions mode, the data does not need to be replicated.

actionTokens: this cache is used when a user has to confirm an action asynchronously, for example in the forgotten-password flow. Because such a token may only be used once, the data must be replicated.

Other user session information: because sticky sessions cannot be relied on here, it needs to be replicated.

loginFailures: this cache counts failed login attempts per user and does not need to be replicated.

When data is kept in a cache, we also need to handle invalidation after the data is updated.

Keycloak uses a separate work cache for this. It is synchronized across all data centers and stores no actual data, only notifications about which data has become invalid. The service in each data center reads the invalidation notices from the work cache and evicts the corresponding entries from its own cache.
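The invalidation flow can be pictured with the following toy model. It is only a sketch of the idea, not how Infinispan or Keycloak implement it: WorkCache, DataCenter, and the append-only list of notices are hypothetical, and a shared dict plays the role of the database.

```python
class WorkCache:
    """Hypothetical shared channel that carries only invalidation notices."""

    def __init__(self):
        self.notices = []  # append-only list of invalidated keys

    def publish(self, key):
        self.notices.append(key)


class DataCenter:
    """Hypothetical site with its own local cache in front of a shared database."""

    def __init__(self, name, database, work_cache):
        self.name = name
        self.database = database  # shared source of truth
        self.work = work_cache
        self.local_cache = {}
        self._seen = 0            # how many notices were already processed

    def apply_invalidations(self):
        for key in self.work.notices[self._seen:]:
            self.local_cache.pop(key, None)  # evict, do not copy data
        self._seen = len(self.work.notices)

    def read(self, key):
        self.apply_invalidations()
        if key not in self.local_cache:
            self.local_cache[key] = self.database[key]
        return self.local_cache[key]

    def write(self, key, value):
        self.database[key] = value
        self.work.publish(key)  # tell every site its copy is stale


database = {"realm": "v1"}
work = WorkCache()
dc1 = DataCenter("dc1", database, work)
dc2 = DataCenter("dc2", database, work)
```

Only the key travels through the work cache; each site reloads the value from the database the next time it is needed.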

Multicasting

Finally, if the cluster needs to discover and manage nodes dynamically, IP multicast is required. JGroups, for example, can be used for this.
