This article analyzes Harbor's high availability solutions in detail and is intended as a practical reference for readers who want to understand how to run Harbor in a highly available way.
As Harbor is increasingly deployed in production environments, its high availability has become a key concern for users. For many medium and large enterprises, a single Harbor instance is a single point of failure: if it fails, the pipeline from development to delivery may be forced to stop, which does not meet their availability requirements.
The following sections describe high availability solutions for the different Harbor installation packages, with the goal of removing single points of failure and improving the overall availability of the system. The solution based on the Harbor Helm Chart is the officially verified one; the solutions based on multiple Kubernetes clusters and on the offline installation package are reference designs.
1. High Availability Solution Based on the Harbor Helm Chart
The Kubernetes platform is self-healing: when a container crashes or stops responding, it is restarted automatically, and if necessary it is rescheduled from a failed node to a healthy one. This solution deploys the Harbor Helm Chart to a Kubernetes cluster with Helm and runs more than one replica of each Harbor component, so that the Harbor service remains available even when an individual Harbor container fails.
To achieve high availability of Harbor in a Kubernetes cluster, most Harbor components are stateless, and the state of the stateful components is kept in shared storage rather than in memory. With this design, you only need to configure the number of replicas of each component to obtain high availability from the Kubernetes platform (see the configuration sketch after the list below).
◎ The Kubernetes platform keeps the number of replicas of each Harbor component at the desired value through its reconciliation loop (Reconciliation Loop), which provides high availability of the service.
◎ PostgreSQL and Redis clusters provide high availability and consistency of data, as well as sharing of front-end sessions.
◎ Shared data storage keeps Artifact data consistent.
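As a rough illustration, the values.yml overrides for such a deployment might look like the sketch below. The layout follows the goharbor/harbor-helm chart, but key names vary between chart versions, so treat this as an assumption to verify against the chart actually in use; host names and storage types are placeholders.

```yaml
# Sketch of a values.yml override for the Harbor Helm Chart (assumed
# goharbor/harbor-helm layout; verify key names for your chart version).

# Run more than one replica of each stateless Harbor component.
portal:
  replicas: 2
core:
  replicas: 2
jobservice:
  replicas: 2
registry:
  replicas: 2

# Point the stateful parts at shared, highly available services
# instead of the single-replica in-cluster defaults.
database:
  type: external            # shared PostgreSQL cluster
redis:
  type: external            # shared Redis cluster
persistence:
  imageChartStorage:
    type: s3                # shared object storage for Artifact data
```

Such a file would typically be applied with helm install (or helm upgrade) and the -f option, for example `helm install harbor harbor/harbor -f values.yml`, after the chart repository has been added.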
High Availability Architecture with Multiple Kubernetes Clusters
The solution above builds a Harbor high availability environment with the Harbor Helm Chart inside a single Kubernetes cluster. The Harbor service itself is highly available, but its overall availability is still bounded by the availability of the Kubernetes cluster it runs on: if the cluster goes down, the service becomes unavailable. Some production environments have stricter availability requirements, which makes solutions based on multiple data centers and multiple Kubernetes clusters particularly important. The following is a reference design for building a highly available Harbor environment on multiple Kubernetes clusters across data centers.
In this architecture, Harbor keeps separate data and content storage in each of the two data centers. Harbor's remote replication is configured between the two data centers to replicate Artifact data (for example, image replication); in other words, Artifact consistency between the storage of the two Kubernetes clusters is ensured by remote replication. For the consistency of PostgreSQL and Redis data between the two data centers, users must provide their own backup and synchronization solutions appropriate to their data centers.
This design uses Harbor in active-standby mode. Because images and other Artifacts are synchronized by remote replication, there is some delay in data synchronization, so its impact on applications needs to be considered in practice. Users with relaxed real-time requirements can refer to this design to build a highly available deployment across data centers and multiple Kubernetes clusters.
Note: when performing the multiple installations, make sure that the values.yml configuration items core.secretName and core.xsrfKey have the same values across all installations; other configuration items can be set according to the needs of each data center.
For the specific reasons why core.secretName and core.xsrfKey must have the same values, see subsection E, "Files or configurations that need to be shared among multiple Harbor instances", later in this article.
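As an illustration, the shared items in values.yml might look like the following sketch; the Secret name and the key value are placeholders, and the key names are the ones referenced above from the goharbor/harbor-helm chart.

```yaml
# Items that must be identical in the values.yml of both installations.
core:
  # Kubernetes Secret holding the token certificate and private key,
  # so tokens issued by one data center are accepted by the other.
  secretName: harbor-token-cert          # create this Secret in both clusters
  # Shared XSRF key so CSRF tokens are valid on both sides
  # (a random string; commonly 32 characters).
  xsrfKey: "ChangeMeToTheSameRandomValue"

# Other items, such as the external URL, can differ per data center.
externalURL: https://harbor-dc1.example.com
```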
2. High Availability Solution Based on the Offline Installation Package
The Kubernetes-based high availability architecture is the official solution provided by Harbor. For various reasons, however, some users cannot deploy a dedicated Kubernetes cluster and prefer to build a highly available setup from the Harbor offline installation package. Building a high availability system from the offline installation package is a complex task that requires a solid background in high availability techniques and an in-depth understanding of Harbor's architecture and configuration. The two general approaches described in this section are for reference only; they mainly explain the problems that must be solved and the points that need attention when building high availability from the offline installation package. Readers are advised to first read the rest of this chapter to understand Harbor installation and deployment, and then adapt the approaches to their actual production environment.
(1) High Availability Solution Based on Shared Services
The basic idea of this solution is that multiple Harbor instances share PostgreSQL, Redis and storage, and a load balancer in front of the servers exposes them as a single Harbor service.
Important configuration: multiple Harbor instances need to be configured correctly to work together. The relevant principles are described below; in practice, users can apply them flexibly.
A) Load balancer settings
You need to set the external_url option in each Harbor instance's configuration file to the address of the load balancer. Once this option is set, Harbor no longer uses the hostname in the configuration file as the access address, and clients (Docker, browsers and so on) reach the API of the backend services through the address given by external_url, that is, the address of the load balancer.
If this value is not set, clients access the backend API through the hostname address and the load balancer plays no role: service requests do not reach the backend through the load balancer, and access fails when the backend address is not reachable from outside (for example, behind NAT or a firewall).
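For example, the relevant part of harbor.yml on each instance might look like the sketch below; the host names are placeholders, and only the load balancer address changes compared with a single-instance setup.

```yaml
# Excerpt of harbor.yml on one Harbor instance (illustrative values).
hostname: harbor-node1.internal            # this instance's own address
# Clients (Docker, browsers, ...) are sent to the load balancer instead
# of the hostname above once external_url is set.
external_url: https://harbor.example.com   # address of the load balancer
```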
If the Harbor instances use HTTPS, especially with your own certificates, the load balancer must be configured to trust the certificate of each Harbor instance behind it. At the same time, the load balancer's certificate needs to be placed in each Harbor instance, in a manually created "ca_download" folder under the path specified by the data_volume item in harbor.yml. In this way, the certificate a user downloads from the UI of any Harbor instance is the load balancer's certificate.
B) External database configuration
Users need to provide their own shared PostgreSQL instance or cluster and configure its information in the external database configuration section of each Harbor instance. Note: the empty databases registry, clair, notary_server and notary_signer must be created in the external PostgreSQL in advance for the Harbor Core, Clair, Notary Server and Notary Signer components, and the information of these databases must be filled into the external database configuration of the corresponding components. When Harbor starts, it automatically creates the tables in the corresponding databases.
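The external database section of harbor.yml could then look roughly like the sketch below. The field names follow the harbor.yml template shipped with the offline installation package and may differ slightly between Harbor versions; host names and passwords are placeholders.

```yaml
# Sketch of the external_database section of harbor.yml.
external_database:
  harbor:
    host: pg.example.com
    port: 5432
    db_name: registry          # empty database created in advance
    username: harbor
    password: "<db-password>"
    ssl_mode: disable
  clair:
    host: pg.example.com
    port: 5432
    db_name: clair
    username: harbor
    password: "<db-password>"
    ssl_mode: disable
  # notary_server and notary_signer are configured in the same way,
  # pointing at the pre-created notary_server and notary_signer databases.
```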
C) External Redis configuration
Users need to provide their own shared Redis instance or cluster and configure its information in the external Redis configuration section of each Harbor instance.
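The corresponding external Redis section might look like the sketch below; the host and password are placeholders, and in some Harbor versions the port is written as part of the host value (for example redis.example.com:6379) rather than as a separate field.

```yaml
# Sketch of the external_redis section of harbor.yml.
external_redis:
  host: redis.example.com
  port: 6379
  password: "<redis-password>"
  # Redis database indexes used by the individual components.
  registry_db_index: 1
  jobservice_db_index: 2
  chartmuseum_db_index: 3
```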
D) External storage configuration
Users need to provide local or cloud-based shared storage and configure its information in the external storage configuration section of each Harbor instance.
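For example, with S3-compatible object storage the storage_service section of harbor.yml might look like the sketch below; the available options mirror the Docker Distribution storage drivers, and all values shown are placeholders.

```yaml
# Sketch of the storage_service section of harbor.yml (S3 backend).
storage_service:
  s3:
    accesskey: "<access-key>"
    secretkey: "<secret-key>"
    region: us-east-1
    bucket: harbor-registry
    regionendpoint: https://s3.example.com   # only for S3-compatible services
```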
E) Files or configurations that need to be shared among multiple Harbor instances
Highly available deployments based on the offline installation package need to keep the following files consistent across all instances. Because these files are generated independently during the installation of each Harbor instance, users have to copy them manually to keep them consistent.
private_key.pem and root.crt files
Harbor provides a certificate and private key file that Distribution uses to create and verify the Bearer tokens in requests during client authentication. In a multi-instance Harbor high availability deployment, a Bearer token created by any one instance must be recognized and verified by every other instance, which means all instances must use the same private_key.pem and root.crt files.
If these two files differ between instances, authentication may succeed or fail at random. The failures occur when the load balancer forwards a request to an instance other than the one that created the Bearer token; that instance cannot verify a token it did not issue, so authentication fails.
The private_key.pem file is located in the "secret/core" subdirectory, and the root.crt file in the "secret/registry" subdirectory, of the path specified by the data_volume item in harbor.yml.
csrf_key
To prevent cross-site request forgery (Cross Site Request Forgery, CSRF), Harbor enables a CSRF token check. Harbor generates a random value and attaches it to the cookie as the CSRF token; when the user submits a request, the client extracts the value from the cookie and submits it as the CSRF token, and Harbor rejects the request if the value is empty or invalid. Across multiple instances, a token created by any one instance must be verifiable by every other instance, which means the CSRF token key must be the same on all instances.
This configuration is located in the "common/config/core/env" file under the Harbor installation directory; users need to copy the value from one Harbor instance to all the others manually so that the value is consistent across all instances.
Note: after manually modifying the files or configuration above, you need to restart the Harbor instances with docker-compose for the changes to take effect. In addition, if you later run the prepare script from the Harbor installation package again, you need to repeat the manual copy steps above.
(2) High Availability Solution Based on a Replication Strategy
The basic idea of this solution is that multiple Harbor instances use Harbor's native remote replication to keep Artifacts consistent, while a load balancer in front of the servers exposes them as a single Harbor service.
Solution (2) differs from solution (1) in that there is no need to specify an external PostgreSQL, Redis or storage when installing the Harbor instances; each instance uses its own independent storage. Artifact consistency is achieved by remote replication between the Harbor instances, while data consistency for PostgreSQL and Redis still requires users to implement their own synchronization. A replication-based multi-instance deployment is not as real-time as one based on shared storage, but it is easier to build: users can simply use the PostgreSQL and Redis shipped with the Harbor offline installation package.