Storage Spaces Direct (abbreviated S2D) is the third generation of Microsoft's software-defined storage technology, integrated into the Datacenter edition of Windows Server 2016. S2D aggregates the local disks of industry-standard x86 servers to build a software-defined storage architecture that is highly available, high performing, and easy to scale. Its main advantage is tight integration with Microsoft's own products such as Hyper-V, WAC, SCVMM, and SCOM, which makes it a good fit for users who already rely heavily on Microsoft enterprise software. Microsoft not only has mature enterprise products but also a mature OEM ecosystem: this year it released Windows Server 2019 and announced a roadmap for WSSD products. As Windows Server 2019 continues to be optimized and enhanced, the performance and reliability of S2D have gained recognition from enterprise users. Some S2D users already running the 2016 version will likely choose to upgrade their existing environment to the latest release, so in this article Lao Wang shows how to upgrade an S2D 2016 environment to 2019 without downtime.
In this article Lao Wang will not spend much time on the concepts and principles behind S2D; the focus is on the rolling-upgrade process itself.
The example in this article uses the S2D converged architecture; brief notes are given where the hyper-converged scenario differs.
As there are currently no related articles online, Lao Wang presents all the details of the S2D rolling upgrade here for future reference by domestic users.
Before this rolling upgrade, Lao Wang wrote an article about rolling-upgrading a Hyper-V 2012 cluster to 2016. The idea of a rolling upgrade is that the upgrade can be completed without downtime within the same cluster, given two prerequisites: the cluster supports mixed mode, and the virtualization software supports backward compatibility. The same applies to S2D, except that what is supported is not a rolling upgrade from 2012 to 2016 but from 2016 to 2019. The 2012 Storage Spaces architecture differs from the 2016 S2D architecture, so upgrading 2012 Storage Spaces to 2016 S2D requires reinstalling the 2012 nodes and then upgrading the storage pool, which inevitably involves downtime. From 2016 to 2019 it is different: because 2016 and 2019 use the same S2D architecture, data can be spread across 2016 nodes and 2019 nodes at the same time during the upgrade. That means when we reinstall a 2016 node as 2019, the node can rejoin the fault domain of the existing storage pool directly, without any impact on business availability.
With the prerequisites understood, let's continue. As is well known, S2D is built on Microsoft's WSFC cluster architecture. Once a WSFC cluster becomes an S2D cluster, each machine not only acts as an ordinary cluster node but also serves as a fault domain for S2D data writes. S2D aggregates the local disks of every cluster node into a virtual storage enclosure inside the cluster. When data is written, S2D splits it into 1 GB extents and writes them according to the virtual disk's resiliency rules: with mirroring, for example, it ensures that two copies of each extent are always written to two different nodes, and so on, so that virtual disk availability is never affected by a single node failure.
This means we need to consider some additional issues; an S2D upgrade cannot be treated as an ordinary cluster rolling upgrade. For example, when we pause or evict a node, what is the impact on S2D, and what actions should be performed after the node is restored? With these questions in mind, let's go into the environment and get started.
In the current environment there is a three-node S2D 2016 cluster with S2D enabled and a mirror-resilient virtual disk already created and in use.
We will eventually upgrade all three nodes to 2019 through a rolling upgrade, without downtime. To verify that there really is no downtime, Lao Wang uses a script that keeps writing to the volume in real time. The script is as follows:
for /L %I in (1,1,30000) do fsutil file createnew %I 1024
This command writes a 1 KB file to the S2D volume on each iteration, 30,000 times in total. If the writes continue without interruption throughout the upgrade, the rolling upgrade has had no impact on write availability. The script can run on one node until that node is the last one to be upgraded, at which point it is switched to another node and continues running.
Before performing an S2D rolling upgrade, the first step is a health check. Check the following four things (a PowerShell sketch for these checks follows the list):
1. Is the virtual disk healthy?
2. Are there any outstanding storage jobs?
3. Check the disk operational status and see whether there is a Needs Rebalance prompt. If so, run Optimize-StoragePool first.
4. Are there automatic update tasks such as CAU or VMM baselines? If so, disable them on all cluster nodes so they do not interfere with the manually operated cluster upgrade process.
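A quick way to run these checks is with PowerShell, using the standard Storage cmdlets. The following is only a minimal sketch; the pool friendly name "S2D on Cluster" is an assumption and should be replaced with the actual pool name in your environment.
# Check virtual disk health and operational status (look for OK / Healthy, no Needs Rebalance)
Get-VirtualDisk | Select-Object FriendlyName, OperationalStatus, HealthStatus
# Check for outstanding storage jobs (the list should be empty or all jobs completed)
Get-StorageJob
# If a rebalance is needed, optimize the pool (replace the friendly name with your pool's name)
Optimize-StoragePool -FriendlyName "S2D on Cluster"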
With the environment confirmed ready, we begin the first operation of the S2D rolling upgrade: pausing a node. For an ordinary cluster node, pausing means draining its roles to other nodes according to placement rules; for S2D it means a bit more. Normally, each write is mirrored to two of the three nodes chosen at random. If a cluster node simply goes down, the virtual disk becomes degraded, because the extent copies on that node are no longer available, which means some data blocks may temporarily have no second copy available. After a node is paused, newly written data no longer places extents on the paused node. When the node resumes, the incremental data written by the other nodes during the pause is automatically resynchronized to it. In a hyper-converged scenario, this step also drains the virtual machines to other nodes.
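A minimal sketch of the pause step, assuming the node being upgraded is named "Node01" (the name is hypothetical):
# Pause the node and drain its clustered roles to the other nodes
Suspend-ClusterNode -Name "Node01" -Drain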
S2D flushes and commits data when a node is paused, so be sure to perform the pause operation first. Once the node is in the paused state, the next action is to remove it from the S2D cluster. We perform only an evict: the node itself is removed from the cluster, but the physical disks it registered in the cluster pool are not. Even after the node is evicted, its local disks remain recorded in the cluster pool, just marked as lost, and even after the node's operating system is reinstalled, as long as the disks are still plugged into that node they can be re-associated with their entries in the cluster pool.
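A sketch of the evict step, again assuming the node name "Node01"; note that this removes only the cluster node, not the disks already registered in the pool.
# Evict the paused node from the cluster (its disks stay recorded in the cluster pool, marked as lost)
Remove-ClusterNode -Name "Node01" -Force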
After eviction, remove the node from the domain and perform a clean installation of the 2019 operating system. The machine name can be kept the same as before or changed; this process is omitted here.
With the node paused and evicted, you can see that the virtual disk's operational status is now Degraded. This is because one fault domain node has been lost, but writes to the disk are not affected, as the write script shows. In this degraded state the virtual disk cannot tolerate any further node failure, otherwise it becomes inaccessible; if it were a three-way mirror, with two fault domains still available, one more node could be allowed to go down.
Friends performing this operation in a virtualized environment may hit a bug: the freshly installed 2019 node cannot join the cluster. The root cause is an issue in the 1809 release: the node is joining a cluster that has Storage Spaces Direct enabled, but S2D has not been validated on the current version, so the node is quarantined.
See KB https://support.microsoft.com/en-us/help/4464776/software-defined-data-center-and-software-defined-networking for details
WSSD is a new Microsoft program: Microsoft partners with vendors such as DELL, DataON, and HPE to offer joint solutions, and the vendors provide Microsoft-certified hardware that can run Microsoft's latest SDDC technologies such as S2D, SDN, and WAC. But some customers run virtualized, or have not bought WSSD-certified hardware, and they will run into this problem. According to research, when a 2019 node joins an S2D cluster it performs a check on a key value in the registry; if the value already exists, the node is allowed to join the S2D cluster.
The location is HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters. On each node of the S2D cluster the administrator needs to create a DWORD value named S2D with a value of 1. On the 2019 nodes you may need to create the Parameters key first and then create the value, to avoid this problem.
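A minimal sketch of creating this value with PowerShell, run on each node; the path and value name are exactly as described above.
# Create the Parameters key if it does not exist, then add the S2D DWORD value set to 1
New-Item -Path "HKLM:\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters" -Force | Out-Null
New-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\ClusSvc\Parameters" -Name "S2D" -PropertyType DWord -Value 1 -Force | Out-Null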
After this has been done, the 2019 node can join the S2D 2016 cluster normally.
Once a 2019 node has joined the cluster, we can observe in the registry that the cluster has entered mixed mode: the MixedMode value in the cluster registry is 1, which indicates the cluster is currently in mixed mode. After all nodes have been upgraded to 2019 and the cluster functional level has been raised, the value returns to 0.
By getting the cluster details you can see that the current cluster functional level is 9, which corresponds to Windows Server 2016. After all cluster nodes have been upgraded to 2019 and the cluster functional level has been raised, the functional level becomes 10.
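A sketch of checking this from PowerShell; the ClusterFunctionalLevel property is reported by Get-Cluster.
# Show the current cluster functional level (9 = Windows Server 2016, 10 = Windows Server 2019)
Get-Cluster | Format-List Name, ClusterFunctionalLevel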
After the node rejoins the cluster normally, the disks that had been marked as lost in the cluster pool because the system was reinstalled become healthy again once they are reconnected.
Looking at the storage jobs, you can see that after the 2019 node joins the S2D cluster, S2D first runs a Repair job. The purpose of this job is to resynchronize to the node the incremental data written to each virtual disk while it was absent. This operation demonstrates that S2D supports placing data across 2016 and 2019 nodes at the same time. It does not affect normal writes to the disk, but disk performance is reduced from the time the node was paused until the repair finishes.
You may also see an Optimize job, which explains why this needs attention during the upgrade and why it should be checked before starting. As S2D runs, with large numbers of writes and deletions, it is possible that at some point one physical disk reaches 90 percent utilization while the others are only at 60 percent, so a single physical disk can fill up and affect writes, or even the virtual disk itself. Therefore, during the rolling upgrade, pay attention on each node to the usage of each disk, or to the virtual disk status in Failover Cluster Manager, and watch for Needs Rebalance; if it appears, run Optimize-StoragePool immediately. In addition, when a new node joins the S2D cluster, an Optimize-StoragePool operation runs automatically every 30 minutes; if you cannot wait, you can run it manually.
The Repair job synchronizes the data updated by other nodes while a node was paused, and the Optimize job rebalances physical disk usage across nodes after a node is added. Do not confuse the two.
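A sketch of monitoring these jobs from PowerShell while waiting for them to finish:
# Watch the Repair and Optimize jobs until they report Completed
Get-StorageJob | Select-Object Name, JobState, PercentComplete, BytesProcessed, BytesTotal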
Because the virtual disk is degraded each time a node is redone, and the node needs to resynchronize data and rebalance load after the rebuild, be sure to upgrade only one node at a time. If you start on the next node before the previous node's Repair and Optimize jobs have completed, there is a real risk of downtime.
Confirm that the first node's Repair and Optimize jobs have completed and that the virtual disk's operational status has returned to OK before starting on the next node.
Pause node - evict node - remove node from domain - reinstall the system - modify the registry - join the cluster - wait for the Repair and Optimize jobs to finish - check the cluster virtual disk status
The second node follows the same steps. A rolling upgrade means that on each pass we reinstall only as many nodes as the fault domains allow, so that after the reinstall the virtual disk is merely degraded and reads and writes are not affected; once the rebuilt node rejoins the cluster and the virtual disk returns to full health, we move on to the next node, never allowing reads and writes to become unavailable. The steps themselves are not difficult; the key is to understand what happens behind each step and what to watch for after each operation.
Following this rolling approach, once the last node has completed all of the steps and the cluster virtual disk status checks out as OK, all nodes have been upgraded to 2019 without downtime. At this point, however, the cluster functional level is still Windows Server 2016. After raising the cluster functional level, it becomes 10 and all of the new WSFC 2019 features become available. This operation is irreversible.
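A sketch of raising the functional level; remember that this cannot be rolled back, so confirm everything is healthy first.
# Raise the cluster functional level from 2016 (9) to 2019 (10); this is irreversible
Update-ClusterFunctionalLevel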
Checking the registry location again shows that the MixedMode value has returned to 0.
In addition, if you check the storage pool, you can see that its version is still Windows Server 2016.
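A sketch of checking the pool version; the Version property on the storage pool object reports which release the pool is at.
# Show the version of the non-primordial (S2D) storage pool
Get-StoragePool -IsPrimordial $false | Select-Object FriendlyName, Version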
Run the command Get-StoragePool -FriendlyName "PoolName" | Update-StoragePool to upgrade the storage pool to the latest version; this too is irreversible.
After the cluster functional level and the storage pool have been upgraded, the new 2019 features become available, such as the performance history that integrates with WAC. S2D creates a 10 GB disk to hold the history data; this is specific to S2D 2019.
Finally, in a hyper-converged architecture you also need to upgrade the virtual machine configuration version. This requires shutting the virtual machines down first, so plan a separate window for it.
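A minimal sketch of upgrading one VM's configuration version, assuming a VM named "VM01" (the name is hypothetical); the VM must be shut down first.
# Shut down the VM, upgrade its configuration version to the host's latest, then start it again
Stop-VM -Name "VM01"
Update-VMVersion -Name "VM01" -Force
Start-VM -Name "VM01"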
Note that upgrading the cluster functional level, upgrading the storage pool, and upgrading the virtual machine configuration are all irreversible; once the command has been run, there is no going back. If you are undecided, you can observe the environment and still fall back while the cluster is in mixed mode; after the cluster functional level has been upgraded, rollback is no longer possible.
The most important thing in the upgrade process is to understand what each operational step does inside the S2D cluster. With a clear line of thought, the steps are not tedious: upgrade each node step by step, skip nothing, and check each node carefully before moving on to the next. Operating in this sequence, we can complete the S2D rolling upgrade without downtime, finishing with the operating system upgrade, the cluster functional level upgrade, the storage pool upgrade, and the virtual machine upgrade.