2025-04-06 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report --
Consistent clocks on the three servers are critical, so make sure their time is synchronized.
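One common way to keep the clocks in step is chrony, which recent Proxmox VE / Debian releases ship by default. A minimal sketch (the NTP server name is a placeholder; substitute your own source and use the same one on all three nodes):

```
# /etc/chrony/chrony.conf (identical entry on all three nodes)
server ntp.example.com iburst
```

After restarting the chrony service, `chronyc tracking` on each node should report a small offset.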
ceph health detail
HEALTH_WARN application not enabled on 1 pool(s)
POOL_APP_NOT_ENABLED application not enabled on 1 pool(s)
    application not enabled on pool 'kube'
    use 'ceph osd pool application enable <pool-name> <app-name>', where <app-name> is 'cephfs', 'rbd', 'rgw', or freeform for custom applications.
ceph osd pool application enable kube rbd
enabled application 'rbd' on pool 'kube'
ceph health
HEALTH_OK
All nodes must be on the same subnet so that each node can communicate using corosync multicast (see the Corosync Cluster Engine documentation for details). Corosync uses UDP ports 5404 and 5405 for cluster communication.
Note: some switches disable IP multicast by default, so you may need to enable multicast communication manually first.
Add nodes located in different network segments
If the node you want to add is in a different network segment from the cluster network, you need to use the ringX_addr parameter to specify the address that the node uses within the cluster network.
pvecm add IP-ADDRESS-CLUSTER --ring0_addr IP-ADDRESS-RING0
If you want to use the redundant ring protocol, you also need to set the ring1_addr parameter to pass the address on the second cluster network.
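Adding a node with a redundant second ring might then look like this (a sketch; the uppercase placeholders stand for real addresses in your environment):

```
pvecm add IP-ADDRESS-CLUSTER --ring0_addr IP-ADDRESS-RING0 --ring1_addr IP-ADDRESS-RING1
```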
Delete a node
Warning: please read the deletion procedure carefully before deleting the node, otherwise something you did not expect may happen.
First, migrate all virtual machines on the node to be deleted to other nodes. Make sure that there are no data and backups you need to keep on the node to be deleted, or that the related data has been properly backed up.
Log in to the node to be deleted through ssh. Execute the pvecm nodes command to confirm the node ID again.
Important: at this point, you must shut down and power off the node to be deleted to ensure that the node is no longer started (within the current cluster network).
root@pve-1:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 192.168.77.160 (local)
         2          1 192.168.77.170
         3          1 192.168.77.180
root@pve-1:~#
Log in to any other node in the cluster through ssh and execute the node delete command (node hp4 will be deleted here):
hp1# pvecm delnode hp4
If the command executes successfully, it returns without any output. You can run pvecm nodes or pvecm status afterwards to check the cluster status with the node removed.
Important: as mentioned earlier, the node to be deleted must be shut down before the delete command is executed, and you must ensure that the deleted node is never started again (in the original cluster network). This is very important!
If you restart the deleted nodes in the original cluster network, your cluster will crash and it will be difficult to restore to a clean state.
If, for some reason, you need to rejoin the deleted node to the original cluster, follow these steps:
Format the deleted node and reinstall Proxmox VE.
Rejoin the node into the cluster as described in the previous section.
Isolate a node
Important: we do not recommend isolating a node this way; proceed with caution. If you are unsure about the outcome, delete the node instead.
You can isolate a node from the cluster without formatting and reinstalling it. However, after being isolated from the cluster, the node can still access any shared storage that was configured for it by the original Proxmox VE cluster. You must resolve this before isolating the node, because there is no way to guarantee that virtual machine IDs will not conflict.
Therefore, the same storage device cannot be shared between Proxmox VE clusters. It is recommended to create a new storage service exclusively for the nodes to be isolated. For example, you can assign a new NFS service or Ceph storage pool to the node to be isolated. You must ensure that the storage service is exclusive. After the storage is allocated, the virtual machines of the node can be migrated to the new storage service, and then the operation of isolating the node can begin.
Warning: you must ensure that all resources are completely isolated. Otherwise, conflicts or other problems may arise.
First, stop the pve-cluster service on the node to be isolated:
systemctl stop pve-cluster
systemctl stop corosync
Then set the cluster file system of the node to be isolated to local mode:
pmxcfs -l
Next, delete the corosync configuration file:
rm /etc/pve/corosync.conf
rm /etc/corosync/*
Finally, restart the cluster file system service:
killall pmxcfs
systemctl start pve-cluster
At this point, the node has been isolated from the cluster. You can execute the delete command on any node in the original cluster:
pvecm delnode oldnode
If the remaining nodes in the original cluster no longer constitute a quorum because of the isolation, the node deletion command will fail. You can set the expected number of votes to 1, as follows:
pvecm expected 1
Then repeat the node delete command.
Then you can log back in to the isolated node and delete the configuration files left behind by the original cluster:

rm /var/lib/corosync/*

When this is complete, the node can rejoin any other cluster.
The cluster file system of the isolated node still contains configuration files for the other nodes of the original cluster, and these also need to be deleted. You can recursively delete the /etc/pve/nodes/NODENAME directory to clear them. Double-check before performing the deletion to make sure it is correct.
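For example (NODENAME is a placeholder for the hostname being removed; verify the path before running, as the deletion is irreversible):

```
rm -r /etc/pve/nodes/NODENAME
```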
Warning: the SSH public keys of the other nodes in the original cluster will remain in the authorized_keys file. This means the isolated node and the original cluster nodes can still reach each other via SSH public key authentication. To avoid surprises, delete the corresponding public keys from the /etc/pve/priv/authorized_keys file.
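A sketch of removing such a key by matching its trailing comment; the key material and the hostname "hp4" below are made-up placeholders, and the edit is done on a sample copy rather than the live file:

```shell
# Build a sample authorized_keys copy with two placeholder entries.
keys=authorized_keys.sample
cat > "$keys" <<'EOF'
ssh-rsa AAAAB3...xyz root@hp1
ssh-rsa AAAAB3...abc root@hp4
EOF
# Drop every key whose trailing comment names the removed node.
sed -i '/ root@hp4$/d' "$keys"
cat "$keys"
```

On a real node you would apply the same sed expression to /etc/pve/priv/authorized_keys after backing it up.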
Quorum (majority vote)
Proxmox VE uses a quorum (majority vote) mechanism to keep the state of cluster nodes consistent. A quorum is the minimum number of votes that a distributed transaction must obtain in order to be allowed to execute in a distributed system. -- Wikipedia, Quorum (distributed computing)
In the case that the network may be split into multiple areas, modifying the cluster state requires that most of the nodes are online. If the number of nodes in the cluster is not enough to constitute a majority of votes, the cluster will automatically become read-only.
Note: by default, each node in the Proxmox VE cluster has one vote.
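As a quick sanity check of the arithmetic: with one vote per node, a quorum is a strict majority of votes, i.e. floor(n/2) + 1. A small sketch, not tied to any Proxmox tooling:

```shell
# Quorum (strict majority) for an n-node cluster with one vote each.
for n in 2 3 4 5; do
  echo "$n nodes -> quorum $((n / 2 + 1))"
done
```

This is also why a 2-node cluster cannot tolerate the loss of either node: its quorum is still 2.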
Cluster network
The cluster network is the core of a Proxmox VE cluster. It must ensure that cluster communication packets are delivered reliably and in order to all nodes. Proxmox VE uses corosync for cluster network communication, which provides high performance, low latency and high availability. Our distributed cluster file system (pmxcfs) is built on top of it.
Cluster network configuration requirements
The Proxmox VE cluster network only works properly when network latency is below 2 ms (i.e. within a local area network). Although corosync supports unicast communication between nodes, we strongly recommend multicast for cluster communication. There should be no other heavy traffic on the cluster network. Ideally, corosync should have its own dedicated network.
Be careful not to run both Proxmox VE clustering and storage services on the same network.
The best practice is to check the quality of the network before creating a cluster to ensure that the network meets the requirements of cluster communication.
Make sure all nodes are on the same network segment, and that only the network card used for cluster communication (corosync) is connected to that network.
Ensure that the networks between the nodes are connected to each other properly. You can use the ping command to test.
Make sure that multicast communication works properly and achieves a high packet rate. You can use the omping command to test; under normal circumstances, the packet loss rate should be less than 1%.
omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
Make sure multicast communication works reliably over a longer period. This is mainly to catch physical switches that have IGMP snooping enabled but no multicast querier configured. The test should run for at least 10 minutes.
omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
If any of these tests fail, your network is not suitable for building a Proxmox VE cluster, and you need to check the network configuration. In general, either the switch has multicast disabled, or it has IGMP snooping enabled without a multicast querier.
If you have a small number of cluster nodes, you can also consider using unicast if you really can't use multicast communication.
Independent cluster network
By default, when you create a cluster without extra parameters, the Proxmox VE cluster network shares the same network as the web GUI and the virtual machines. If not configured carefully, storage traffic may also flow over the cluster network. We recommend against sharing the cluster network with other applications, because corosync is a latency-sensitive real-time application.
Prepare a new network
First, you need to prepare a new network port connected to a separate physical network. Then make sure this network meets the cluster network configuration requirements described above.
Configure a separate network when creating a cluster
You can use the pvecm command with ring0_addr and bindnet0_addr parameters to create a Proxmox VE cluster with a separate network.
If you want to use a dedicated network card for cluster communication, and that card is configured with the static IP address 10.10.10.1/25, then you can use the following command:

pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
You can then use the following command to check whether the cluster communication is normal:
systemctl status corosync
Configure an independent network after creating a cluster
Even after creating a cluster, you can switch cluster communication to a separate network without rebuilding the whole cluster. To change the cluster communication network, the corosync service on each node must be restarted one by one so that it starts using the new network, which may cause the cluster to lose quorum briefly.
First, make sure you know how to edit the corosync.conf file, then open it. An example of its contents is as follows:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: due
  }
  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: tre
  }
  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: uno
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.30.50
    ringnumber: 0
  }
}
First, if the name attribute is missing from the node object, you need to add it manually. Note that the value of the name attribute must match the hostname of the node.
Then change the value of each ring0_addr attribute to the node's address on the new cluster network. You can use either an IP address or a hostname to set ring0_addr; if you use a hostname, make sure every node can resolve it.
Here we plan to move cluster communication to the 10.10.10.1/25 network, so we modify the ring0_addr of every node accordingly. You also need to change the bindnetaddr attribute in the totem section to an address on the new network; this can be the IP address of the current node's interface on the new cluster network.
Finally, you need to increase the value of the config_version parameter by 1. An example of the modified configuration file is as follows:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: due
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.10.2
  }
  node {
    name: tre
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.10.3
  }
  node {
    name: uno
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.10.1
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: thomas-testcluster
  config_version: 4
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.10.10.1
    ringnumber: 0
  }
}
Finally, check again that the changes are correct, then apply the new configuration as described in the section on editing the corosync.conf file.
Since the modified configuration does not take effect online in real time, the corosync service must be restarted.
Execute on one node:
systemctl restart corosync
Then check whether the cluster communication is normal.
systemctl status corosync
If the corosync service can be successfully restarted and run normally on all nodes, then all nodes will be connected to the new cluster network one by one.
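One detail that is easy to get wrong is the config_version increment described above. A minimal sketch of scripting the bump on a throwaway copy (the file name and contents are stand-ins; on a real node you would work on a copy of /etc/pve/corosync.conf):

```shell
# Bump config_version in a corosync.conf copy by one.
cfg=corosync.conf.new
cat > "$cfg" <<'EOF'
totem {
  config_version: 3
}
EOF
# Extract the current version, then rewrite it incremented.
old=$(sed -n 's/.*config_version: *//p' "$cfg")
sed -i "s/config_version: $old/config_version: $((old + 1))/" "$cfg"
grep config_version "$cfg"
```

If the version is not increased, corosync will refuse to adopt the modified configuration.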
Cold start of cluster
Obviously, when all nodes are offline, the cluster cannot meet the quorum requirement. This is a common state after an unexpected power failure in the server room, for example.
Note: using an uninterruptible power supply (UPS, also known as "backup battery power") is a good way to prevent a cluster from losing most votes due to a power outage, especially if you need to achieve a HA effect.
When a node boots, the pve-manager service waits for the node to join the cluster and for quorum to be reached. Once quorum is obtained, the service starts all virtual machines that have the onboot flag set.
Therefore, when you start nodes, or when power is restored after an unexpected outage, you will find that some nodes boot faster than others. Also note that no virtual machine can start before your cluster reaches quorum.
Virtual machine migration
The ability to migrate virtual machines from one node to another is an important feature of clusters. Proxmox VE provides some ways for you to control the virtual machine migration process. First, datacenter.cfg provides some configuration parameters, and secondly, the migration command line and the API interface provide relevant control parameters.
5.10.1 Migration Type
The migration type means that the migration process uses encrypted (secure) or unencrypted (insecure) channels to transfer virtual machine data. When the migration type is set to insecure, the memory data of the virtual machine will be transferred in clear text during the migration, which may lead to the disclosure of sensitive data (such as passwords, keys) on the virtual machine.
Therefore, we strongly recommend using secure channels to migrate virtual machines, especially if you cannot control the entire network link and cannot guarantee that the network will not be eavesdropped.
Note: virtual machine disk migration is not affected by this configuration. Currently, virtual machine disks are always migrated through secure channels.
Because encryption consumes significant CPU resources, administrators often choose the "insecure" mode for migration to save computing resources. Newer systems use hardware-accelerated AES encryption, so the impact is smaller; but on 10 Gb or faster networks, the effect of this setting on performance can be significant.
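The migration type is set in datacenter.cfg; a hedged sketch of such an entry (the network value is an example of a dedicated migration network):

```
# /etc/pve/datacenter.cfg
migration: secure,network=10.10.10.0/25
```

Setting `migration: insecure` instead selects the unencrypted channel discussed above.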