2025-01-16 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report--
2.3.3 Configuring the Cluster Resource Manager Pacemaker
Introduction to Pacemaker on Linux
On the Windows Server operating system, Windows Server Failover Cluster (WSFC) provides high availability, failure detection, and automatic failover for SQL Server AlwaysOn AG. WSFC is a cluster resource manager (CRM), responsible for maintaining a consistent image of cluster state across all nodes in the cluster. The purpose of the cluster resource manager is to provide high availability and fault tolerance for the resources running on the cluster.
On Linux, the cluster resource manager is the open-source software Pacemaker. It is maintained primarily by the ClusterLabs community, with Red Hat and SUSE driving collaborative development. Pacemaker is available on most Linux distributions, but SQL Server AlwaysOn AG is currently supported only on Red Hat Enterprise Linux 7.3 and 7.4, SUSE Linux Enterprise Server 12 SP2, and Ubuntu 16.04.
The Pacemaker stack consists of the following components:
The Pacemaker software itself, which is analogous to the cluster service on Windows.
Corosync, a communication and membership system analogous to heartbeat and quorum on Windows (not to be confused with Heartbeat, a Linux program whose function is similar to Corosync's); it is also responsible for restarting failed application processes.
libQB, a high-performance logging, tracing, inter-process communication, and polling system, analogous to the way cluster.log is generated on Windows.
Resource agents, software that lets Pacemaker manage services and resources, such as starting or stopping SQL Server AlwaysOn AG resources; analogous to the cluster resource DLLs on Windows.
Fence agents, software that lets Pacemaker fence nodes whose abnormal behavior would affect cluster availability.
Install the Pacemaker package on all nodes
sudo yum install pacemaker pcs fence-agents-all resource-agents
The installed packages supply the different components of the Pacemaker stack:
pcs, the Pacemaker Configuration System, a configuration tool for Pacemaker and Corosync.
fence-agents-all, a collection of all supported fence agents.
resource-agents, a repository of resource agents conforming to the Open Cluster Framework (OCF) specification.
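After installation, you can confirm that the expected components are present. A minimal sketch for RPM-based systems (output will vary with your distribution and package versions):

```shell
# Query the installed versions of the Pacemaker stack packages
rpm -q pacemaker corosync pcs resource-agents fence-agents-all

# pcs also reports its own version directly
pcs --version
```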
Set the password for the default user created when installing the Pacemaker and Corosync packages
Use the same password on all nodes.
sudo passwd hacluster
Enable and start the pcsd service and Pacemaker
This allows nodes to rejoin the cluster after a reboot. Run the following commands on all nodes:
sudo systemctl enable pcsd
sudo systemctl start pcsd
sudo systemctl enable pacemaker
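To confirm the services came up as expected, a quick check such as the following can be run on each node (a sketch; these are standard systemd queries, not part of the original walkthrough):

```shell
# Verify that pcsd is running and that both services are enabled at boot
sudo systemctl status pcsd --no-pager
systemctl is-enabled pcsd pacemaker
```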
Create a cluster
First, to prevent leftover cluster configuration files from interfering with the new build, delete any existing cluster:
sudo pcs cluster destroy   # on all nodes
sudo systemctl enable pacemaker
Then create and configure the cluster:
sudo pcs cluster auth <node1> <node2> <node3> -u hacluster -p <password>
sudo pcs cluster setup --name <clustername> <node1> <node2> <node3>
sudo pcs cluster start --all
sudo pcs cluster enable --all
After Pacemaker is configured, use pcs to interact with the cluster. Execute all commands on one node in the cluster.
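For example, typical read-only pcs commands for inspecting the cluster from any one node look like the following (a sketch, assuming the cluster created above is running):

```shell
# Show overall cluster, node, and resource status
sudo pcs status

# Show only cluster membership and quorum information
sudo pcs cluster status

# List all currently set cluster properties (e.g. stonith-enabled)
sudo pcs property list
```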
Configure fencing
Pacemaker cluster vendors require STONITH to be enabled and a fencing device configured for a supported cluster installation. When the cluster resource manager cannot determine the state of a node or of a resource on a node, fencing brings the cluster back to a known state.
Resource-level fencing ensures, through resource configuration, that no data corruption occurs during an outage. For example, when a communication link goes down, you can use resource-level fencing to mark the disk on a node as outdated.
Node-level fencing ensures that a node does not run any resources, which is achieved by resetting the node. Pacemaker supports a variety of fencing devices, depending on your environment: intelligent power distribution units (PDUs), network switches, HP iLO devices, or agents such as the VMware STONITH agent. STONITH agents for Hyper-V and Microsoft Azure are not currently supported.
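As an illustration only, a node-level fencing device can be registered with a fence agent appropriate to your hardware. The device name, node name, IP address, and credentials below are placeholders, not values from this article:

```shell
# Hypothetical example: register an IPMI-based fencing device for node1
# (fence_ipmilan is one of the agents shipped in fence-agents-all)
sudo pcs stonith create ipmi-fence-node1 fence_ipmilan \
    pcmk_host_list="node1" ipaddr="10.0.0.101" \
    login="admin" passwd="<password>" lanplus=1 \
    op monitor interval=60s
```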
Note: disable STONITH for testing purposes only. If you plan to use Pacemaker in a production environment, you should plan a STONITH implementation appropriate to your environment and keep it enabled.
For production fencing deployments, refer to the official documentation: Red Hat High Availability Add-On with Pacemaker: Fencing.
Because node-level fencing configuration depends heavily on your environment, in a test environment you can disable it with the following script:
sudo pcs property set stonith-enabled=false
Configure the cluster property cluster-recheck-interval
cluster-recheck-interval is the polling interval at which the cluster checks for changes in resource parameters, constraints, and other cluster options. If a replica fails, the cluster tries to restart it at an interval bounded by the failure-timeout value and the cluster-recheck-interval value. For example, if failure-timeout is set to 60 seconds and cluster-recheck-interval to 120 seconds, a restart is attempted at an interval greater than 60 seconds and less than 120 seconds. The official recommendation is to set failure-timeout to 60 seconds and cluster-recheck-interval to a value greater than 60 seconds; setting cluster-recheck-interval to a smaller value is not recommended. The following script updates the property to 2 minutes:
sudo pcs property set cluster-recheck-interval=2min
Configure the cluster property start-failure-is-fatal
All distributions, including RHEL 7.3 and 7.4, that use the latest available Pacemaker package 1.1.18-11.el7 introduce a behavior change for clusters configured with start-failure-is-fatal set to false. The change affects the failover workflow: if the primary replica suffers a service outage, the cluster is expected to fail over to one of the available secondary replicas; instead, users will notice that the cluster keeps trying to start the failed primary replica. If that primary never comes online (because of a permanent outage), the cluster never fails over to another available secondary replica. Because of this change, the previously recommended start-failure-is-fatal configuration is no longer valid, and the property must be restored to its default value of true.
sudo pcs property set start-failure-is-fatal=true
In addition, the AG resource needs to be updated to include the failure-timeout property.
Use the following script to update the failure-timeout property of the ag1 resource to 60s:
sudo pcs resource update ag1 meta failure-timeout=60s