This article explains how to configure Docker Swarm clusters. The content is practical, so it is shared here for reference; I hope you find it useful after reading.
Overview
Each host (physical or virtual machine) normally runs a single Docker daemon (a host can be configured to run several daemons, but as of version 18.09.6 this feature is still experimental, has many unresolved problems, and should be avoided in production). By default, the Docker daemons on different hosts are independent of one another, and each daemon manages only its local containers; this is the standalone container mode. Since version 1.12, docker-ce has provided swarm mode, which manages the daemons as a cluster and adds container scheduling, scaling, load balancing, update and rollback, network configuration, and related functions.
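Whether a given daemon is running in swarm mode can be checked at any time with docker info; a minimal sketch, assuming only a standard Docker 18.09 installation:

# Query the local daemon's swarm membership state (inactive, pending, or active).
docker info --format '{{.Swarm.LocalNodeState}}'
# True only on manager nodes, i.e. nodes that can serve swarm API calls.
docker info --format '{{.Swarm.ControlAvailable}}'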
Node
The cluster contains one or more hosts running in swarm mode. Each host is called a node and takes one of two roles: manager or worker. Multiple nodes can share the same role, and roles can be assigned during cluster initialization or changed afterwards.
Manager nodes are responsible for cluster management: maintaining cluster state, scheduling services, and handling swarm-mode API calls (both HTTP requests and the command line interface). Worker nodes are responsible for running containers. Apart from leaving the cluster, a non-manager node cannot perform any operation on clusters, nodes, or services. By default, a manager node also acts as a worker. Swarm allows a cluster with only a single manager node, but it does not allow a cluster that has only worker nodes and no manager.
The swarm scheduler decides whether to assign tasks to a node based on its availability (availability), including:
Active: the node can accept new tasks.
Pause: the node does not accept new tasks, but its existing tasks continue to run.
Drain: the node does not accept new tasks; its existing tasks are shut down and the scheduler reassigns them to other available nodes.
The availability of nodes can be specified during cluster initialization, defaults to Active, or can be changed after cluster initialization.
When a cluster has multiple manager nodes, only one of them is the primary manager, with status Leader; the others are standby managers with status Reachable. Swarm provides manager fault tolerance through the Raft consensus algorithm, which automatically elects and switches the primary manager as long as failures stay within the quota. If the total number of managers is N, the failure quota is M = (N-1)/2, rounded down. For example, with 5 or 6 managers, at most 2 of them may fail at the same time. If the number of failed managers is within the quota, swarm automatically elects a new primary manager from the remaining Reachable managers; if the quota is exceeded (including the case where the only manager fails), cluster functions become unavailable and the cluster must be rebuilt manually to recover. The containers already running in the cluster are not affected by manager failures, unless every node, managers and workers alike, becomes unavailable at the physical level.
A cluster with a single manager node should be avoided in production, although state synchronization between managers does add network overhead. Docker officially recommends an odd number of managers, with 7 as a practical maximum, to make the best use of swarm's fault tolerance. A production cluster should therefore deploy at least 3 manager nodes to provide basic availability and fault tolerance.
Service
The service is the main object operated on in a swarm cluster. It defines the desired running state of containers: the image used, the command executed inside the container, and other configurable parameters such as environment variables, published ports, network mode, system resources, mount options, and update/rollback policy. The service definition can be given on the command line or in a configuration file, and the swarm scheduler assigns tasks to available worker nodes according to that definition.
Services run in one of two modes: replicated or global. In replicated mode, a specified number of tasks is created and the scheduler decides how many run on each available node; in global mode, exactly one task runs on every available worker node, so the number of tasks follows the number of available nodes. A given service runs in only one of the two modes, and the default is replicated.
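As a quick illustration of the two modes, the sketch below creates one service of each kind; it is only a sketch, and the service names (web_r, web_g) and the nginx:alpine image are placeholders rather than part of this article's example environment:

# Replicated mode: the scheduler decides where the 4 tasks run.
docker service create --name web_r --replicas 4 -p 80:80 nginx:alpine
# Global mode: exactly one task per available node; --replicas does not apply.
docker service create --name web_g --mode global -p 8081:80 nginx:alpine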
Task
The task is the atomic scheduling unit of a swarm cluster. A task's life cycle follows its container from startup to termination: the container is the instantiation of the task, and the two correspond one to one. Container operations are performed by the Docker daemon on the worker node, so a task is likewise bound to a single node.
After the task is assigned, it has the following states, according to which the scheduler continuously monitors the operation of containers and nodes:
NEW: the task is initialized.
PENDING: the task is ready, waiting for assignment.
ASSIGNED: the task is assigned to the node.
ACCEPTED: the task has been accepted by the worker node, waiting to be executed.
PREPARING: the node is preparing to perform the task.
STARTING: the task is starting.
RUNNING: the task is running.
COMPLETE: the task exited due to successful execution.
FAILED: the task exited due to an execution error.
SHUTDOWN: the task is closed by the node.
REJECTED: the task was rejected by the node.
ORPHANED: the task has been assigned, but the node is unreachable for a long time.
REMOVE: the task itself was not terminated, but its associated service was removed or scaled down.
When a container fails to start or terminates with an error, the scheduler creates a new task and tries to run the container again. Because of the correspondence between tasks, nodes, and containers, a task runs only on the node it was assigned to until it terminates; it cannot be migrated from one node to another. Inspecting task states is shown in the sketch below.
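To observe these states in practice, the task list of a service can be formatted, or a single task inspected by ID; a minimal sketch (the service name webapp_g appears later in this article, while the task ID shown here is a placeholder):

# One line per task: name, desired state, and current state.
docker service ps --format '{{.Name}}: {{.DesiredState}} -> {{.CurrentState}}' webapp_g
# Raw state of a single task (replace the ID with a real one from docker service ps).
docker inspect --format '{{.Status.State}}' ncd0lscuk5ts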
Example
This section takes the tomcat container as an example to outline the basic management of clusters, services, and tasks.
Environment
There are 3 hosts: docker_host_0 (192.168.9.168), docker_host_1 (192.168.9.169), and docker_host_2 (192.168.9.170). All three have the same system and software environment: a fresh minimal installation with a single physical network card, operating system CentOS Linux release 7.6.1810 (Core), kernel 3.10.0-957.12.2.el7.x86_64, and SELinux and the firewall turned off. Docker 18.09.6 is installed with default settings and no additional configuration.
The basic image is the latest version of CentOS 7 official image.
The JDK environment is mounted into the container's /opt/jdks directory through a named volume, jdks.
The source code packages jdk-8u212-linux-x64.tar.gz and apache-tomcat-8.5.40.tar.gz are located in the / opt/ directory of the host.
The tomcat environment is located in the / opt/apps/app_0 directory of the container.
Initialize the cluster
The initialization of the cluster includes docker swarm init and docker swarm join, both of which turn on the swarm mode of the docker engine.
1. Create a cluster.
The docker swarm init command creates a cluster and makes the current node the primary manager.
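On hosts with more than one network interface the listening address usually has to be given explicitly; a minimal sketch using this example's docker_host_0 address and standard docker swarm init flags (the options shown are not required by this single-NIC environment):

# Advertise a specific address/port to other nodes and start with the default Active availability.
docker swarm init --advertise-addr 192.168.9.168:2377 --availability active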
Host docker_host_0 creates a cluster, and the management status of the node is Leader:
[root@docker_host_0 ~]# ip addr show eth0 | sed -n '/inet /p' | awk '{print $2}'
192.168.9.168/24
[root@docker_host_0 ~]# uname -r
3.10.0-957.12.2.el7.x86_64
[root@docker_host_0 ~]# docker -v
Docker version 18.09.6, build 481bc77156
[root@docker_host_0 ~]# docker swarm init
Swarm initialized: current node (5h7m2fspnhtg0lr0x6d481qdr) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-434u94ack6bd9gwgxbvf2dqiw 192.168.9.168:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
[root@docker_host_0 ~]#
2. View node properties.
The docker node ls command is used to view node properties within a cluster, including:
ID: node ID.
HOSTNAME: node hostname.
STATUS: node status, Ready indicates that the node is available, Down indicates that the node has exited the cluster, and Unknown indicates an error in session synchronization between the node and the management node.
AVAILABILITY: node availability (Active/Pause/Drain).
MANAGER STATUS: management status/role; Leader is the primary manager, Reachable is a standby manager, and an empty value means a worker-only node.
ENGINE VERSION: Docker engine version.
[root@docker_host_0 ~]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr *   docker_host_0   Ready    Active         Leader           18.09.6
[root@docker_host_0 ~]#
3. View the join commands for the cluster.
The docker swarm join-token [--rotate] manager|worker command is used to view or rotate (--rotate) the token that manager/worker nodes use to join the cluster; its output contains the full command for joining the cluster in the corresponding role.
[root@docker_host_0 ~]# docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-381n4jpj6ur4il4k6qo0wifhq 192.168.9.168:2377

[root@docker_host_0 ~]# docker swarm join-token worker
To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-434u94ack6bd9gwgxbvf2dqiw 192.168.9.168:2377

[root@docker_host_0 ~]#
4. Join the cluster.
Following the output of the docker swarm join-token manager/worker command on the manager node, hosts docker_host_1 and docker_host_2 join the cluster in the manager and worker roles, respectively.
[root@docker_host_1 ~]# ip addr show eth0 | sed -n '/inet /p' | awk '{print $2}'
192.168.9.169/24
[root@docker_host_1 ~]# uname -r
3.10.0-957.12.2.el7.x86_64
[root@docker_host_1 ~]# docker -v
Docker version 18.09.6, build 481bc77156
[root@docker_host_1 ~]# docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-381n4jpj6ur4il4k6qo0wifhq 192.168.9.168:2377
This node joined a swarm as a manager.
[root@docker_host_1 ~]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Ready    Active         Leader           18.09.6
cos4ftcikaykcit9m15kqmvlh *   docker_host_1   Ready    Active         Reachable        18.09.6
[root@docker_host_1 ~]#

[root@docker_host_2 ~]# ip addr show eth0 | sed -n '/inet /p' | awk '{print $2}'
192.168.9.170/24
[root@docker_host_2 ~]# uname -r
3.10.0-957.12.2.el7.x86_64
[root@docker_host_2 ~]# docker -v
Docker version 18.09.6, build 481bc77156
[root@docker_host_2 ~]# docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-434u94ack6bd9gwgxbvf2dqiw 192.168.9.168:2377
This node joined a swarm as a worker.
[root@docker_host_2 ~]# docker node ls
Error response from daemon: This node is not a swarm manager. Worker nodes can't be used to view or modify cluster state. Please run this command on a manager node or promote the current node to a manager.
[root@docker_host_2 ~]#
5. Set the role of the node.
The docker node promote/demote command is used to upgrade / demote the role of a specified node.
The standby management node has the same cluster operation rights as the primary management node.
Raise the role level of the docker_host_2 node:
[root@docker_host_1 ~]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Ready    Active         Leader           18.09.6
cos4ftcikaykcit9m15kqmvlh *   docker_host_1   Ready    Active         Reachable        18.09.6
rvomnj0q7aari989o3c4t6w02     docker_host_2   Ready    Active                          18.09.6
[root@docker_host_1 ~]# docker node promote docker_host_2
Node docker_host_2 promoted to a manager in the swarm.
[root@docker_host_1 ~]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Ready    Active         Leader           18.09.6
cos4ftcikaykcit9m15kqmvlh *   docker_host_1   Ready    Active         Reachable        18.09.6
rvomnj0q7aari989o3c4t6w02     docker_host_2   Ready    Active         Reachable        18.09.6
[root@docker_host_1 ~]#
The docker info command can view cluster-related information, including whether swarm mode is enabled, cluster ID, number of management nodes, total number of nodes, management node IP, current node role / ID/IP, and so on.
[root@docker_host_1 ~]# docker info
...
Swarm: active
 NodeID: cos4ftcikaykcit9m15kqmvlh
 Is Manager: true
 ClusterID: odbfcfeayjogvdn34m3nruq2f
 Managers: 3
 Nodes: 3
 Default Address Pool: 10.0.0.0/8
 SubnetSize: 24
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 10
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 192.168.9.169
 Manager Addresses:
  192.168.9.168:2377
  192.168.9.169:2377
  192.168.9.170:2377
...
[root@docker_host_1 ~]#
At this point, there are three management nodes in the cluster, the docker_host_0 is the primary management node, and the rest are standby management nodes.
Run the global service
1. Prepare the application image.
Build the image from a dockerfile on docker_host_0 and name it tomcat_app:8.5.40.
Set up tomcat:
The pattern field in server.xml defines the access log format. It is changed so that local IP:port and remote IP:port are recorded, which makes the source of each access easy to distinguish.
[root@docker_host_0 ~]# cd /opt/
[root@docker_host_0 opt]# ls
apache-tomcat-8.5.40.tar.gz  containerd  jdk-8u212-linux-x64.tar.gz
[root@docker_host_0 opt]# tar axf apache-tomcat-8.5.40.tar.gz
[root@docker_host_0 opt]# sed -i 's#pattern="%h %l %u %t .*#pattern="%A:%{local}p %a:%{remote}p %t \&quot;%r\&quot; %s %b" />#' apache-tomcat-8.5.40/conf/server.xml
[root@docker_host_0 opt]# sed -n '/pattern="%A/p' apache-tomcat-8.5.40/conf/server.xml
               pattern="%A:%{local}p %a:%{remote}p %t &quot;%r&quot; %s %b" />
[root@docker_host_0 opt]#
Set the data volume for mounting the jdk environment:
[root@docker_host_0 opt]# docker volume create jdks
jdks
[root@docker_host_0 opt]# tar axf jdk-8u212-linux-x64.tar.gz -C /var/lib/docker/volumes/jdks/_data/
[root@docker_host_0 opt]# docker volume ls
DRIVER    VOLUME NAME
local     jdks
[root@docker_host_0 opt]#
Set up the dockerfile:
[root@docker_host_0 opt]# vi dockerfile-for-tomcat
FROM centos:latest
COPY apache-tomcat-8.5.40 /opt/apps/app_0
EXPOSE 8080
ENV JAVA_HOME /opt/jdks/jdk1.8.0_212
WORKDIR /opt/apps/app_0
CMD bin/catalina.sh run
[root@docker_host_0 opt]#
Build the image:
[root@docker_host_0 opt]# docker image ls
REPOSITORY   TAG   IMAGE ID   CREATED   SIZE
[root@docker_host_0 opt]# docker build -f dockerfile-for-tomcat -t tomcat_app:8.5.40 .
Sending build context to Docker daemon  219.1MB
Step 1/6 : FROM centos:latest
latest: Pulling from library/centos
8ba884070f61: Pull complete
Digest: sha256:b5e66c4651870a1ad435cd75922fe2cb943c9e973a9673822d1414824a1d0475
Status: Downloaded newer image for centos:latest
 ---> 9f38484d220f
Step 2/6 : COPY apache-tomcat-8.5.40 /opt/apps/app_0
 ---> 155b18437d11
Step 3/6 : EXPOSE 8080
 ---> Running in 93fdd5ea8433
Removing intermediate container 93fdd5ea8433
 ---> 1c2487ffdd9b
Step 4/6 : ENV JAVA_HOME /opt/jdks/jdk1.8.0_212
 ---> Running in 2ef953a36a71
Removing intermediate container 2ef953a36a71
 ---> 459c7c25ccc2
Step 5/6 : WORKDIR /opt/apps/app_0
 ---> Running in 8dc1cde1177e
Removing intermediate container 8dc1cde1177e
 ---> 35af515cc94f
Step 6/6 : CMD bin/catalina.sh run
 ---> Running in 6733ba74c3d0
Removing intermediate container 6733ba74c3d0
 ---> 74df48f4f0fc
Successfully built 74df48f4f0fc
Successfully tagged tomcat_app:8.5.40
[root@docker_host_0 opt]# docker image ls
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
tomcat_app   8.5.40   74df48f4f0fc   5 seconds ago   216MB
centos       latest   9f38484d220f   2 months ago    202MB
[root@docker_host_0 opt]#
2. Create a global service.
On node docker_host_0, create the service (docker service create) from the tomcat_app:8.5.40 image, with mode (--mode) global, name (--name) webapp_g, the jdks data volume mounted (--mount), and port 8080 published (-p/--publish):
[root@docker_host_0 opt]# docker service create --name webapp_g --mount type=volume,src=jdks,dst=/opt/jdks --mode global -p 8080:8080 tomcat_app:8.5.40
image tomcat_app:8.5.40 could not be accessed on a registry to record
its digest. Each node will access tomcat_app:8.5.40 independently,
possibly leading to different nodes running different
versions of the image.

kp6qdrzoswljwfmiphh29pogv
overall progress: 1 out of 3 tasks
5h7m2fspnhtg: running
rvomnj0q7aar: No such image: tomcat_app:8.5.40
cos4ftcikayk: No such image: tomcat_app:8.5.40
^C
operation continuing in background.
Use `docker service ps kp6qdrzoswljwfmiphh29pogv` to check progress.
[root@docker_host_0 opt]#
3. View the service properties.
The docker service ls command is used to view a list of services currently running in the cluster and related information, including:
ID: service ID.
NAME: service name.
MODE: service operation mode (global/replicated).
REPLICAS: number of tasks assigned successfully / number of tasks requested to be assigned.
IMAGE: image name.
PORTS: open ports and protocols.
[root@docker_host_0 opt]# docker service ls
ID             NAME       MODE     REPLICAS   IMAGE               PORTS
kp6qdrzoswlj   webapp_g   global   1/3        tomcat_app:8.5.40   *:8080->8080/tcp
[root@docker_host_0 opt]#
4. View task properties.
The docker service ps command is used to view the execution of tasks in the specified service, including:
ID: task ID.
NAME: the container name corresponding to the task.
IMAGE: image name.
NODE: the node assigned by the task.
DESIRED STATE: the expected state of the task.
CURRENT STATE: the current state of the task.
ERROR: error message.
PORTS: open port.
The -f/--filter option filters the output with key-value pairs; currently supported keys include id/name/node/desired-state, corresponding to the fields above.
[root@docker_host_0 opt]# docker service ps -f "node=docker_host_0" webapp_g
ID             NAME                                 IMAGE               NODE            DESIRED STATE   CURRENT STATE                ERROR   PORTS
ncd0lscuk5ts   webapp_g.5h7m2fspnhtg0lr0x6d481qdr   tomcat_app:8.5.40   docker_host_0   Running         Running about a minute ago
[root@docker_host_0 opt]# docker service ps -f "node=docker_host_1" webapp_g
ID             NAME                                   IMAGE               NODE            DESIRED STATE   CURRENT STATE             ERROR                               PORTS
umkwfusbj5rt   webapp_g.cos4ftcikaykcit9m15kqmvlh     tomcat_app:8.5.40   docker_host_1   Ready           Preparing 3 seconds ago
bp49pjyqh6ku    \_ webapp_g.cos4ftcikaykcit9m15kqmvlh tomcat_app:8.5.40   docker_host_1   Shutdown        Rejected 3 seconds ago    "No such image: tomcat_app:8.5..."
qepo1tzhcz68    \_ webapp_g.cos4ftcikaykcit9m15kqmvlh tomcat_app:8.5.40   docker_host_1   Shutdown        Rejected 8 seconds ago    "No such image: tomcat_app:8.5..."
2gg2f0d8d3tk    \_ webapp_g.cos4ftcikaykcit9m15kqmvlh tomcat_app:8.5.40   docker_host_1   Shutdown        Rejected 15 seconds ago   "No such image: tomcat_app:8.5..."
rc41gutotc64    \_ webapp_g.cos4ftcikaykcit9m15kqmvlh tomcat_app:8.5.40   docker_host_1   Shutdown        Rejected 21 seconds ago   "No such image: tomcat_app:8.5..."
[root@docker_host_0 opt]# docker service ps -f "node=docker_host_2" webapp_g
ID             NAME                                   IMAGE               NODE            DESIRED STATE   CURRENT STATE             ERROR                               PORTS
k8iyvkp5iv14   webapp_g.rvomnj0q7aari989o3c4t6w02     tomcat_app:8.5.40   docker_host_2   Ready           Rejected 1 second ago     "No such image: tomcat_app:8.5..."
wbxd2787npfl    \_ webapp_g.rvomnj0q7aari989o3c4t6w02 tomcat_app:8.5.40   docker_host_2   Shutdown        Rejected 5 seconds ago    "No such image: tomcat_app:8.5..."
tv7x0fl8qwpe    \_ webapp_g.rvomnj0q7aari989o3c4t6w02 tomcat_app:8.5.40   docker_host_2   Shutdown        Rejected 11 seconds ago   "No such image: tomcat_app:8.5..."
vatre7kv4ggt    \_ webapp_g.rvomnj0q7aari989o3c4t6w02 tomcat_app:8.5.40   docker_host_2   Shutdown        Rejected 16 seconds ago   "No such image: tomcat_app:8.5..."
xge3egwymkmj    \_ webapp_g.rvomnj0q7aari989o3c4t6w02 tomcat_app:8.5.40   docker_host_2   Shutdown        Rejected 22 seconds ago   "No such image: tomcat_app:8.5..."
[root@docker_host_0 opt]#
A service in global mode runs one task on each available node, but the docker_host_1 and docker_host_2 nodes have no tomcat_app:8.5.40 image either locally or in a registry, so the task instances on those two nodes keep being rejected with a "No such image" error (task state Rejected), after which the scheduler assigns new tasks and retries.
The cluster service hides the operational details of individual containers and behaves uniformly: a request sent to the published port of any node is served, even if no container is running on that node, because swarm forwards the incoming connection to a node in the cluster where the container is running.
The task on the docker_host_0 node executes successfully, the specified container is running and the port is open:
[root@docker_host_0 opt]# docker container ls -a
CONTAINER ID   IMAGE               COMMAND                    CREATED         STATUS         PORTS      NAMES
8330cf1374db   tomcat_app:8.5.40   "/bin/sh -c 'bin/cat..."   4 minutes ago   Up 4 minutes   8080/tcp   webapp_g.5h7m2fspnhtg0lr0x6d481qdr.ncd0lscuk5tsvsdmcqse3vibm
[root@docker_host_0 opt]# ss -atn | grep 8080
LISTEN   0   128   :::8080   :::*
[root@docker_host_0 opt]#
The tasks on the docker_host_1 and docker_host_2 nodes did not run successfully, but the port is still open on those nodes and serves requests:
[root@docker_host_1 ~]# docker container ls -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
[root@docker_host_1 ~]# ss -atn | grep 8080
LISTEN   0   128   :::8080   :::*
[root@docker_host_1 ~]# curl -I -o /dev/null -s -w %{http_code} 192.168.9.168:8080
200
[root@docker_host_1 ~]# curl -I -o /dev/null -s -w %{http_code} 192.168.9.169:8080
200
[root@docker_host_1 ~]# curl -I -o /dev/null -s -w %{http_code} 192.168.9.170:8080
200
[root@docker_host_1 ~]#

[root@docker_host_2 ~]# docker container ls -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
[root@docker_host_2 ~]# ss -atn | grep 8080
LISTEN   0   128   :::8080   :::*
[root@docker_host_2 ~]# curl -I -o /dev/null -s -w %{http_code} 192.168.9.168:8080
200
[root@docker_host_2 ~]# curl -I -o /dev/null -s -w %{http_code} 192.168.9.169:8080
200
[root@docker_host_2 ~]# curl -I -o /dev/null -s -w %{http_code} 192.168.9.170:8080
200
[root@docker_host_2 ~]#
5. Set node availability.
The --availability option of the docker swarm init/join commands sets the availability of a node when it creates or joins the cluster; the default is active.
The docker node update command sets the availability of a specified node (--availability active|pause|drain), its labels (--label-add/--label-rm), and its role (--role worker|manager).
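Labels added with --label-add can be read back from the node specification; a minimal sketch (the label key env and value test are hypothetical):

# Attach a label to a node, then print the node's label map.
docker node update --label-add env=test docker_host_1
docker node inspect --format '{{.Spec.Labels}}' docker_host_1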
Change the availability of the docker_host_1 node to pause and that of the docker_host_2 node to drain:
[root@docker_host_1 ~]# docker node update --availability pause docker_host_1
docker_host_1
[root@docker_host_1 ~]# docker node update --availability drain docker_host_2
docker_host_2
[root@docker_host_1 ~]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Ready    Active         Leader           18.09.6
cos4ftcikaykcit9m15kqmvlh *   docker_host_1   Ready    Pause          Reachable        18.09.6
rvomnj0q7aari989o3c4t6w02     docker_host_2   Ready    Drain          Reachable        18.09.6
[root@docker_host_1 ~]#
The docker_host_1 and docker_host_2 nodes were rejecting tasks because the image does not exist there, so once their availability is changed to pause and drain, the number of tasks the service requests drops from 3 to 1 and the two previously failing tasks are no longer retried. As a result, only one task (container) is running, on the docker_host_0 node.
[root@docker_host_0 opt]# docker service ls
ID             NAME       MODE     REPLICAS   IMAGE               PORTS
kp6qdrzoswlj   webapp_g   global   1/1        tomcat_app:8.5.40   *:8080->8080/tcp
[root@docker_host_0 opt]# docker service ps -f 'desired-state=running' webapp_g
ID             NAME                                 IMAGE               NODE            DESIRED STATE   CURRENT STATE            ERROR   PORTS
ncd0lscuk5ts   webapp_g.5h7m2fspnhtg0lr0x6d481qdr   tomcat_app:8.5.40   docker_host_0   Running         Running 40 minutes ago
[root@docker_host_0 opt]#
6. Demote node roles.
Demote the role of the docker_host_1 and docker_host_2 nodes to non-administrative nodes:
[root@docker_host_0 opt]# docker node demote docker_host_1
Manager docker_host_1 demoted in the swarm.
[root@docker_host_0 opt]# docker node demote docker_host_2
Manager docker_host_2 demoted in the swarm.
[root@docker_host_0 opt]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr *   docker_host_0   Ready    Active         Leader           18.09.6
cos4ftcikaykcit9m15kqmvlh     docker_host_1   Ready    Pause                           18.09.6
rvomnj0q7aari989o3c4t6w02     docker_host_2   Ready    Drain                           18.09.6
[root@docker_host_0 opt]#
7. Exit the cluster.
Exiting the cluster is the only cluster-related operation that a non-manager node is allowed to perform. It can be done either with the docker swarm leave command or with the POST /swarm/leave API endpoint, which have the same effect. After leaving, swarm mode is turned off on that node.
The docker daemon listens on the local UNIX domain socket / var/run/docker.sock by default.
The docker_host_1 node exits the cluster through the command line interface:
[root@docker_host_1 ~]# docker swarm leave
Node left the swarm.
[root@docker_host_1 ~]# docker info -f '{{.Swarm}}'
{inactive false [] 0}
[root@docker_host_1 ~]#
The docker_host_2 node exits the cluster through the HTTP API:
[root@docker_host_2 ~]# curl -0 -I -X POST --unix-socket /var/run/docker.sock http:/swarm/leave
HTTP/1.0 200 OK
Api-Version: 1.39
Docker-Experimental: false
Ostype: linux
Server: Docker/18.09.6 (linux)
Date: Fri, 24 May 2019 05:04:55 GMT
Content-Length: 0

[root@docker_host_2 ~]# docker info -f '{{.Swarm}}'
{inactive false [] 0}
[root@docker_host_2 ~]#
After exiting the cluster, the information entry for the corresponding node still exists, but the status changes to Down:
[root@docker_host_0 opt]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr *   docker_host_0   Ready    Active         Leader           18.09.6
cos4ftcikaykcit9m15kqmvlh     docker_host_1   Down     Pause                            18.09.6
rvomnj0q7aari989o3c4t6w02     docker_host_2   Down     Drain                            18.09.6
[root@docker_host_0 opt]# docker info -f '{{.Swarm.Nodes}}'
3
[root@docker_host_0 opt]#
8. Remove the node.
The docker node rm command is used to remove a specified node from within the cluster.
Remove the docker_host_1 and docker_host_2 nodes from the cluster:
[root@docker_host_0 opt]# docker node rm docker_host_1
docker_host_1
[root@docker_host_0 opt]# docker node rm docker_host_2
docker_host_2
[root@docker_host_0 opt]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr *   docker_host_0   Ready    Active         Leader           18.09.6
[root@docker_host_0 opt]# docker info -f '{{.Swarm.Nodes}}'
1
[root@docker_host_0 opt]#
9. Remove the service.
The docker service rm command is used to remove the specified service from within the cluster.
Remove the webapp_g service from the cluster:
[root@docker_host_0 opt]# docker service rm webapp_g
webapp_g
[root@docker_host_0 opt]# docker service ls
ID   NAME   MODE   REPLICAS   IMAGE   PORTS
[root@docker_host_0 opt]# docker container ls -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
[root@docker_host_0 opt]#
Rejoin the hosts docker_host_1 and docker_host_2 to the cluster in the manager role:
[root@docker_host_0 opt]# docker swarm join-token --rotate manager
Successfully rotated manager join token.

To add a manager to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-cav7ypxfv6hzuyz5hq7jvn87l 192.168.9.168:2377

[root@docker_host_0 opt]#

[root@docker_host_1 ~]# docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-cav7ypxfv6hzuyz5hq7jvn87l 192.168.9.168:2377
This node joined a swarm as a manager.
[root@docker_host_1 ~]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Ready    Active         Leader           18.09.6
upn0vc4vx47224gxaxn6hwec9 *   docker_host_1   Ready    Active         Reachable        18.09.6
[root@docker_host_1 ~]#

[root@docker_host_2 ~]# docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-cav7ypxfv6hzuyz5hq7jvn87l 192.168.9.168:2377
This node joined a swarm as a manager.
[root@docker_host_2 ~]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Ready    Active         Leader           18.09.6
upn0vc4vx47224gxaxn6hwec9     docker_host_1   Ready    Active         Reachable        18.09.6
jekdpdzmwcxrdfsxaudzbdp2z *   docker_host_2   Ready    Active         Reachable        18.09.6
[root@docker_host_2 ~]#
So far, there are still three management nodes in the cluster, with docker_host_0 as the main management node.
Run a replicated service
1. Import the image.
Package the tomcat_app:8.5.40 image on the docker_host_0 node and transfer it to the docker_host_1 and docker_host_2 nodes:
[root@docker_host_0 opt]# docker image ls
REPOSITORY   TAG      IMAGE ID       CREATED             SIZE
tomcat_app   8.5.40   74df48f4f0fc   About an hour ago   216MB
centos       latest   9f38484d220f   2 months ago        202MB
[root@docker_host_0 opt]# docker image save tomcat_app:8.5.40 -o tomcat_app.tar
[root@docker_host_0 opt]# ll -h tomcat_app.tar
-rw------- 1 root root 214M May 24 05:28 tomcat_app.tar
[root@docker_host_0 opt]# scp tomcat_app.tar root@192.168.9.169:/opt
tomcat_app.tar                              100%  214MB  78.5MB/s   00:02
[root@docker_host_0 opt]# scp tomcat_app.tar root@192.168.9.170:/opt
tomcat_app.tar                              100%  214MB  96.0MB/s   00:02
[root@docker_host_0 opt]#
Import the image on the docker_host_1 and docker_host_2 nodes, and set up the data volume required for the JDK environment on each:
[root@docker_host_1 ~]# cd /opt/
[root@docker_host_1 opt]# ls
containerd  jdk-8u212-linux-x64.tar.gz  tomcat_app.tar
[root@docker_host_1 opt]# docker image load -i tomcat_app.tar
d69483a6face: Loading layer  209.5MB/209.5MB
59eb00de447b: Loading layer  14.39MB/14.39MB
Loaded image: tomcat_app:8.5.40
[root@docker_host_1 opt]# docker image ls
REPOSITORY   TAG      IMAGE ID       CREATED             SIZE
tomcat_app   8.5.40   74df48f4f0fc   About an hour ago   216MB
[root@docker_host_1 opt]# docker volume create jdks
jdks
[root@docker_host_1 opt]# tar axf jdk-8u212-linux-x64.tar.gz -C /var/lib/docker/volumes/jdks/_data/
[root@docker_host_1 opt]#

[root@docker_host_2 ~]# cd /opt/
[root@docker_host_2 opt]# ls
containerd  jdk-8u212-linux-x64.tar.gz  tomcat_app.tar
[root@docker_host_2 opt]# docker image load -i tomcat_app.tar
d69483a6face: Loading layer  209.5MB/209.5MB
59eb00de447b: Loading layer  14.39MB/14.39MB
Loaded image: tomcat_app:8.5.40
[root@docker_host_2 opt]# docker image ls
REPOSITORY   TAG      IMAGE ID       CREATED             SIZE
tomcat_app   8.5.40   74df48f4f0fc   About an hour ago   216MB
[root@docker_host_2 opt]# docker volume create jdks
jdks
[root@docker_host_2 opt]# tar axf jdk-8u212-linux-x64.tar.gz -C /var/lib/docker/volumes/jdks/_data/
[root@docker_host_2 opt]#
2. Create a replicated service.
If the docker service create command does not specify the --mode option, a replicated service with 1 replica is created by default.
Run the tomcat_app:8.5.40 image in replicated mode with 3 replicas (--replicas) and the name webapp_d:
[root@docker_host_1 opt]# docker service create --name webapp_d --mount type=volume,src=jdks,dst=/opt/jdks --replicas 3 -p 8080:8080 tomcat_app:8.5.40
image tomcat_app:8.5.40 could not be accessed on a registry to record
its digest. Each node will access tomcat_app:8.5.40 independently,
possibly leading to different nodes running different
versions of the image.

hmhqo34e46m1syf4eoawre3fx
overall progress: 3 out of 3 tasks
1/3: running
2/3: running
3/3: running
verify: Service converged
[root@docker_host_1 opt]#
The tasks are spread evenly over the 3 nodes and run successfully:
[root@docker_host_1 opt]# docker service ls
ID             NAME       MODE         REPLICAS   IMAGE               PORTS
hmhqo34e46m1   webapp_d   replicated   3/3        tomcat_app:8.5.40   *:8080->8080/tcp
[root@docker_host_1 opt]# docker service ps webapp_d
ID             NAME         IMAGE               NODE            DESIRED STATE   CURRENT STATE           ERROR   PORTS
e2eynnrned2j   webapp_d.1   tomcat_app:8.5.40   docker_host_1   Running         Running 6 minutes ago
nd8qg74l4t7b   webapp_d.2   tomcat_app:8.5.40   docker_host_2   Running         Running 6 minutes ago
e2ef0oc66sqh   webapp_d.3   tomcat_app:8.5.40   docker_host_0   Running         Running 6 minutes ago
[root@docker_host_1 opt]#
3. Change node availability (pause).
Change the availability of the docker_host_2 node to pause:
[root@docker_host_1 opt]# docker node update --availability pause docker_host_2
docker_host_2
[root@docker_host_1 opt]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Ready    Active         Leader           18.09.6
upn0vc4vx47224gxaxn6hwec9 *   docker_host_1   Ready    Active         Reachable        18.09.6
jekdpdzmwcxrdfsxaudzbdp2z     docker_host_2   Ready    Pause          Reachable        18.09.6
[root@docker_host_1 opt]#
The running task on the docker_host_2 node continues to run:
[root@docker_host_1 opt]# docker service ps webapp_d
ID             NAME         IMAGE               NODE            DESIRED STATE   CURRENT STATE            ERROR   PORTS
e2eynnrned2j   webapp_d.1   tomcat_app:8.5.40   docker_host_1   Running         Running 15 minutes ago
nd8qg74l4t7b   webapp_d.2   tomcat_app:8.5.40   docker_host_2   Running         Running 15 minutes ago
e2ef0oc66sqh   webapp_d.3   tomcat_app:8.5.40   docker_host_0   Running         Running 15 minutes ago
[root@docker_host_1 opt]#
4. Change the replica size.
The docker service scale command changes the number of replicas of a replicated service. Its argument has the form service-name-or-ID=value, where the value is the final number of replicas, which may be larger or smaller than the current number.
Set the number of copies of the service to 6:
[root@docker_host_1 opt]# docker service scale webapp_d=6
webapp_d scaled to 6
overall progress: 6 out of 6 tasks
1/6: running
2/6: running
3/6: running
4/6: running
5/6: running
6/6: running
verify: Service converged
[root@docker_host_1 opt]# docker service ls
ID             NAME       MODE         REPLICAS   IMAGE               PORTS
hmhqo34e46m1   webapp_d   replicated   6/6        tomcat_app:8.5.40   *:8080->8080/tcp
[root@docker_host_1 opt]#
Because the availability of the docker_host_2 node is pause, it no longer accepts new tasks, and the 3 new tasks are assigned to the other 2 nodes:
[root@docker_host_1 opt]# docker service ps webapp_d
ID             NAME         IMAGE               NODE            DESIRED STATE   CURRENT STATE            ERROR   PORTS
e2eynnrned2j   webapp_d.1   tomcat_app:8.5.40   docker_host_1   Running         Running 21 minutes ago
nd8qg74l4t7b   webapp_d.2   tomcat_app:8.5.40   docker_host_2   Running         Running 21 minutes ago
e2ef0oc66sqh   webapp_d.3   tomcat_app:8.5.40   docker_host_0   Running         Running 21 minutes ago
67mfqjvqgi7b   webapp_d.4   tomcat_app:8.5.40   docker_host_0   Running         Running 2 minutes ago
qrdqrzm2f6si   webapp_d.5   tomcat_app:8.5.40   docker_host_1   Running         Running 2 minutes ago
mejk0zee8ovy   webapp_d.6   tomcat_app:8.5.40   docker_host_1   Running         Running 2 minutes ago
[root@docker_host_1 opt]#
5. Change node availability (drain).
Change the availability of the docker_host_2 node to drain:
[root@docker_host_1 opt]# docker node update --availability drain docker_host_2
docker_host_2
[root@docker_host_1 opt]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Ready    Active         Leader           18.09.6
upn0vc4vx47224gxaxn6hwec9 *   docker_host_1   Ready    Active         Reachable        18.09.6
jekdpdzmwcxrdfsxaudzbdp2z     docker_host_2   Ready    Drain          Reachable        18.09.6
[root@docker_host_1 opt]#
The total number of replicas remains the same, but the task previously running on the docker_host_2 node (nd8qg74l4t7b) is shut down and a new task (tuuq6q1tlcib) is assigned to the docker_host_0 node, leaving 3 tasks running on each of the docker_host_0 and docker_host_1 nodes:
[root@docker_host_1 opt]# docker service ls
ID             NAME       MODE         REPLICAS   IMAGE               PORTS
hmhqo34e46m1   webapp_d   replicated   6/6        tomcat_app:8.5.40   *:8080->8080/tcp
[root@docker_host_1 opt]# docker service ps webapp_d
ID             NAME             IMAGE               NODE            DESIRED STATE   CURRENT STATE             ERROR   PORTS
e2eynnrned2j   webapp_d.1       tomcat_app:8.5.40   docker_host_1   Running         Running 28 minutes ago
tuuq6q1tlcib   webapp_d.2       tomcat_app:8.5.40   docker_host_0   Running         Running 13 seconds ago
nd8qg74l4t7b    \_ webapp_d.2   tomcat_app:8.5.40   docker_host_2   Shutdown        Shutdown 15 seconds ago
e2ef0oc66sqh   webapp_d.3       tomcat_app:8.5.40   docker_host_0   Running         Running 28 minutes ago
67mfqjvqgi7b   webapp_d.4       tomcat_app:8.5.40   docker_host_0   Running         Running 9 minutes ago
qrdqrzm2f6si   webapp_d.5       tomcat_app:8.5.40   docker_host_1   Running         Running 9 minutes ago
mejk0zee8ovy   webapp_d.6       tomcat_app:8.5.40   docker_host_1   Running         Running 9 minutes ago
[root@docker_host_1 opt]#
The container on the docker_host_2 node has exited, but the port remains open and still serves external requests:
[root@docker_host_2 opt]# docker container ls -a
CONTAINER ID   IMAGE               COMMAND                    CREATED          STATUS                 PORTS   NAMES
50187866b04e   tomcat_app:8.5.40   "/bin/sh -c 'bin/cat..."   36 minutes ago   Exited 8 minutes ago           webapp_d.2.nd8qg74l4t7b2oju7bzs9qsk1
[root@docker_host_2 opt]# ss -atn | grep 8080
LISTEN   0   128   :::8080   :::*
[root@docker_host_2 opt]#

[root@docker_host_1 opt]# curl -I -o /dev/null -s -w %{http_code} 192.168.9.170:8080
200
[root@docker_host_1 opt]#
So far, there are still three management nodes in the cluster, with docker_host_0 as the main management node.
Fault recovery
The failure quota of three manager nodes is (3-1)/2 = 1. If the number of failed managers is within the quota, swarm elects and switches the leader automatically; if the quota is exceeded, the cluster must be rebuilt manually with docker swarm init --force-new-cluster.
When a failure occurs, inspect the cluster, node, service, and task information, and use the command-line output together with the logs to locate the cause. On CentOS 7 the Docker log goes to /var/log/messages by default and can be redirected to a separate file; refer to "Docker CE on Linux (1): installation and basic operation".
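On a systemd-based CentOS 7 host the daemon log can also be read straight from the journal; a minimal sketch (the time window is arbitrary):

# Show recent Docker daemon log entries from the systemd journal.
journalctl -u docker.service --since "30 min ago"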
The /var/lib/docker/swarm/ directory on a manager node holds the cluster state and management logs. Failure recovery can be performed by backing up this directory, importing it on a manager, and rebuilding the cluster. Docker officially recommends performing the backup and import while the Docker daemon is stopped.
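A minimal backup-and-restore sketch along those lines, assuming the default /var/lib/docker path and an arbitrary archive name, and following the recommendation to stop the daemon first:

# Back up the swarm state on a manager node.
systemctl stop docker
tar czf /root/swarm-backup.tar.gz -C /var/lib/docker swarm
systemctl start docker

# Restore on a (stopped) manager, then rebuild the cluster from the restored state.
systemctl stop docker
rm -rf /var/lib/docker/swarm
tar xzf /root/swarm-backup.tar.gz -C /var/lib/docker
systemctl start docker
docker swarm init --force-new-cluster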
1. Fault simulation.
Stop the docker daemon in the primary management node docker_host_0:
[root@docker_host_0 opt]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr *   docker_host_0   Ready    Active         Leader           18.09.6
upn0vc4vx47224gxaxn6hwec9     docker_host_1   Ready    Active         Reachable        18.09.6
jekdpdzmwcxrdfsxaudzbdp2z     docker_host_2   Ready    Drain          Reachable        18.09.6
[root@docker_host_0 opt]# systemctl stop docker
[root@docker_host_0 opt]#
The primary manager role automatically switches to the docker_host_2 node and cluster functions are unaffected, but the status of the docker_host_0 node changes to Unknown and then Down, and its manager status changes to Unreachable:
[root@docker_host_1 opt]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Unknown   Active         Unreachable      18.09.6
upn0vc4vx47224gxaxn6hwec9 *   docker_host_1   Ready     Active         Reachable        18.09.6
jekdpdzmwcxrdfsxaudzbdp2z     docker_host_2   Ready     Drain          Leader           18.09.6
[root@docker_host_1 opt]#
The number of replicas is unchanged, but the three tasks previously running on the docker_host_0 node are marked for shutdown and their replacements are assigned to the docker_host_1 node, so 6 tasks end up running on docker_host_1:
[root@docker_host_1 opt]# docker service ls
ID             NAME       MODE         REPLICAS   IMAGE               PORTS
hmhqo34e46m1   webapp_d   replicated   6/6        tomcat_app:8.5.40   *:8080->8080/tcp
[root@docker_host_1 opt]# docker service ps webapp_d
ID             NAME             IMAGE               NODE            DESIRED STATE   CURRENT STATE             ERROR   PORTS
e2eynnrned2j   webapp_d.1       tomcat_app:8.5.40   docker_host_1   Running         Running 2 minutes ago
wnrm2ndqjk7r   webapp_d.2       tomcat_app:8.5.40   docker_host_1   Running         Running 2 minutes ago
tuuq6q1tlcib    \_ webapp_d.2   tomcat_app:8.5.40   docker_host_0   Shutdown        Running 16 minutes ago
nd8qg74l4t7b    \_ webapp_d.2   tomcat_app:8.5.40   docker_host_2   Shutdown        Shutdown 16 minutes ago
xazm6xhtji5d   webapp_d.3       tomcat_app:8.5.40   docker_host_1   Running         Running 2 minutes ago
e2ef0oc66sqh    \_ webapp_d.3   tomcat_app:8.5.40   docker_host_0   Shutdown        Running 44 minutes ago
oervwdwtj9ei   webapp_d.4       tomcat_app:8.5.40   docker_host_1   Running         Running 2 minutes ago
67mfqjvqgi7b    \_ webapp_d.4   tomcat_app:8.5.40   docker_host_0   Shutdown        Running 25 minutes ago
qrdqrzm2f6si   webapp_d.5       tomcat_app:8.5.40   docker_host_1   Running         Running 2 minutes ago
mejk0zee8ovy   webapp_d.6       tomcat_app:8.5.40   docker_host_1   Running         Running 2 minutes ago
[root@docker_host_1 opt]#
Stop the docker daemon in the management node docker_host_2:
[root@docker_host_2 opt]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Down     Active         Unreachable      18.09.6
upn0vc4vx47224gxaxn6hwec9     docker_host_1   Ready    Active         Reachable        18.09.6
jekdpdzmwcxrdfsxaudzbdp2z *   docker_host_2   Ready    Drain          Leader           18.09.6
[root@docker_host_2 opt]# systemctl stop docker
[root@docker_host_2 opt]#
Cluster functions are no longer available, but the running containers can still be accessed:
[root@docker_host_1 opt]# docker node ls
Error response from daemon: rpc error: code = DeadlineExceeded desc = context deadline exceeded
[root@docker_host_1 opt]# docker service ls
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
[root@docker_host_1 opt]# docker service ps webapp_g
Error response from daemon: rpc error: code = Unknown desc = The swarm does not have a leader. It's possible that too few managers are online. Make sure more than half of the managers are online.
[root@docker_host_1 opt]# docker container ls -a
CONTAINER ID   IMAGE               COMMAND                    CREATED             STATUS             PORTS      NAMES
da48a25cf5b2   tomcat_app:8.5.40   "/bin/sh -c 'bin/cat..."   13 minutes ago      Up 13 minutes      8080/tcp   webapp_d.4.oervwdwtj9eixi9ye225bfaqu
2cfc8b941397   tomcat_app:8.5.40   "/bin/sh -c 'bin/cat..."   13 minutes ago      Up 13 minutes      8080/tcp   webapp_d.2.wnrm2ndqjk7r6couisn3jhuuv
c419aa6ac995   tomcat_app:8.5.40   "/bin/sh -c 'bin/cat..."   13 minutes ago      Up 13 minutes      8080/tcp   webapp_d.3.xazm6xhtji5dir6xed8gdmmim
577267e59128   tomcat_app:8.5.40   "/bin/sh -c 'bin/cat..."   37 minutes ago      Up 37 minutes      8080/tcp   webapp_d.6.mejk0zee8ovyiukv2yxh88n4s
bfa09130f72f   tomcat_app:8.5.40   "/bin/sh -c 'bin/cat..."   37 minutes ago      Up 37 minutes      8080/tcp   webapp_d.5.qrdqrzm2f6siee5pz7a84p6gi
9c9455285a21   tomcat_app:8.5.40   "/bin/sh -c 'bin/cat..."   About an hour ago   Up About an hour   8080/tcp   webapp_d.1.e2eynnrned2j9732uhf1v0dhi
[root@docker_host_1 opt]# ss -atn | grep 8080
LISTEN   0   128   :::8080   :::*
[root@docker_host_1 opt]# curl -I -o /dev/null -s -w %{http_code} 192.168.9.169:8080
200
2. Rebuild the cluster.
Manually force the rebuilding of the cluster on the docker_host_1 node and remove the failed docker_host_0 and docker_host_2 nodes:
[root@docker_host_1 opt]# docker swarm init --force-new-cluster
Swarm initialized: current node (upn0vc4vx47224gxaxn6hwec9) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-434u94ack6bd9gwgxbvf2dqiw 192.168.9.169:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
[root@docker_host_1 opt]# docker node ls
ID                            HOSTNAME        STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
5h7m2fspnhtg0lr0x6d481qdr     docker_host_0   Down      Active                          18.09.6
upn0vc4vx47224gxaxn6hwec9 *   docker_host_1   Ready     Active         Leader           18.09.6
jekdpdzmwcxrdfsxaudzbdp2z     docker_host_2   Unknown   Drain                           18.09.6
[root@docker_host_1 opt]# docker node rm docker_host_0
docker_host_0
[root@docker_host_1 opt]# docker node rm docker_host_2
docker_host_2
[root@docker_host_1 opt]#
Because the /var/lib/docker/swarm/ directory of the docker_host_1 node holds the cluster state data, after the rebuild the cluster ID is the same as before (odbfcfeayjogvdn34m3nruq2f), showing that no new cluster was created.
[root@docker_host_1 opt]# ll /var/lib/docker/swarm/
total 8
drwxr-xr-x 2 root root  75 May 24 05:10 certificates
-rw------- 1 root root 193 May 24 07:24 docker-state.json
drwx------ 4 root root  55 May 24 05:10 raft
-rw------- 1 root root  69 May 24 07:24 state.json
drwxr-xr-x 2 root root  22 May 24 05:10 worker
[root@docker_host_1 opt]# docker info -f '{{.Swarm.Cluster.ID}}'
odbfcfeayjogvdn34m3nruq2f
[root@docker_host_1 opt]# docker swarm join-token manager
To add a manager to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-cav7ypxfv6hzuyz5hq7jvn87l 192.168.9.169:2377

[root@docker_host_1 opt]#
After the failed nodes docker_host_0 and docker_host_2 are forced out of the cluster, rejoin them in the manager role:
[root@docker_host_0 opt]# systemctl start docker
[root@docker_host_0 opt]# docker swarm leave -f
Node left the swarm.
[root@docker_host_0 opt]# docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-cav7ypxfv6hzuyz5hq7jvn87l 192.168.9.169:2377
This node joined a swarm as a manager.
[root@docker_host_0 opt]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
i6jberxzc51hprbtgh94e1nzw *   docker_host_0   Ready    Active         Reachable        18.09.6
upn0vc4vx47224gxaxn6hwec9     docker_host_1   Ready    Active         Leader           18.09.6
[root@docker_host_0 opt]#

[root@docker_host_2 opt]# systemctl start docker
[root@docker_host_2 opt]# docker swarm leave -f
Node left the swarm.
[root@docker_host_2 opt]# docker swarm join --token SWMTKN-1-4nsmenxl72484akypkevpirfse35u2ouxusbgemzzkuz0otgyv-cav7ypxfv6hzuyz5hq7jvn87l 192.168.9.169:2377
This node joined a swarm as a manager.
[root@docker_host_2 opt]# docker node ls
ID                            HOSTNAME        STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
i6jberxzc51hprbtgh94e1nzw     docker_host_0   Ready    Active         Reachable        18.09.6
upn0vc4vx47224gxaxn6hwec9     docker_host_1   Ready    Active         Leader           18.09.6
sp544qzpe3ghr4ox6gvdv3ylo *   docker_host_2   Ready    Active         Reachable        18.09.6
[root@docker_host_2 opt]#
After the cluster is restored, all tasks run on the docker_host_1 node:
[root@docker_host_2 opt]# docker service ps -f 'desired-state=running' webapp_d
ID             NAME         IMAGE               NODE            DESIRED STATE   CURRENT STATE            ERROR   PORTS
m9qad5jldkmf   webapp_d.1   tomcat_app:8.5.40   docker_host_1   Running         Running 22 minutes ago
43ycztehfjft   webapp_d.2   tomcat_app:8.5.40   docker_host_1   Running         Running 22 minutes ago
eu49cks7twj1   webapp_d.3   tomcat_app:8.5.40   docker_host_1   Running         Running 22 minutes ago
pagn85s95a4l   webapp_d.4   tomcat_app:8.5.40   docker_host_1   Running         Running 22 minutes ago
mep9zebz50be   webapp_d.5   tomcat_app:8.5.40   docker_host_1   Running         Running 22 minutes ago
q8cetbu1lgpa   webapp_d.6   tomcat_app:8.5.40   docker_host_1   Running         Running 22 minutes ago
[root@docker_host_2 opt]#
The docker service update --force command forces the cluster to rebalance a service's tasks: existing tasks are stopped and new tasks are assigned across the nodes.
[root@docker_host_2 opt]# docker service update --force webapp_d
webapp_d
overall progress: 6 out of 6 tasks
1/6: running
2/6: running
3/6: running
4/6: running
5/6: running
verify: Service converged
[root@docker_host_2 opt]# docker service ps -f 'desired-state=running' webapp_d
ID             NAME         IMAGE               NODE            DESIRED STATE   CURRENT STATE            ERROR   PORTS
pezzriea1ql3   webapp_d.1   tomcat_app:8.5.40   docker_host_0   Running         Running 35 seconds ago
miehr525l161   webapp_d.2   tomcat_app:8.5.40   docker_host_2   Running         Running 22 seconds ago
ivo43js9eolh   webapp_d.3   tomcat_app:8.5.40   docker_host_1   Running         Running 13 seconds ago
ool0tu1tyke3   webapp_d.4   tomcat_app:8.5.40   docker_host_1   Running         Running 18 seconds ago
unysta4y6woe   webapp_d.5   tomcat_app:8.5.40   docker_host_0   Running         Running 26 seconds ago
j63gtlovl0k9   webapp_d.6   tomcat_app:8.5.40   docker_host_2   Running         Running 31 seconds ago
[root@docker_host_2 opt]#

The above is how to configure swarm clusters. These are points you may well encounter or use in daily work, and I hope you can learn more from this article.