How does vSAN handle a capacity device failure or a cache device failure? Many newcomers are unclear on this, so this article walks through both failure modes in detail. Readers who need this should find something useful here.
Capacity device failure analysis:
Disk failures are probably the most common failure in any storage environment, and vSAN is no exception. The disk group is vSAN's unit of storage management: each disk group contains one cache device and one or more capacity devices (in hybrid configurations the capacity devices are typically SATA/SAS magnetic disks). A host can contribute up to five disk groups to vSAN, and each disk group requires one cache SSD plus a minimum of one and a maximum of seven capacity devices. So a host can have at most 5 x 7 = 35 capacity devices and at most 5 x 1 = 5 cache SSDs.
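To make those limits concrete, here is a minimal sketch that validates a host's disk-group layout against the per-host limits just described. The constants are taken from the paragraph above; the function name and inputs are illustrative, not a vSAN API.

```python
# Minimal sketch: validate a host's disk-group layout against the vSAN
# per-host limits described above. All names here are illustrative.

MAX_DISK_GROUPS_PER_HOST = 5
CACHE_DEVICES_PER_GROUP = 1
MAX_CAPACITY_DEVICES_PER_GROUP = 7

def validate_host_layout(disk_groups):
    """disk_groups: list of capacity-device counts, one entry per disk group."""
    if len(disk_groups) > MAX_DISK_GROUPS_PER_HOST:
        raise ValueError(f"too many disk groups: {len(disk_groups)}")
    for i, capacity_devices in enumerate(disk_groups):
        if not 1 <= capacity_devices <= MAX_CAPACITY_DEVICES_PER_GROUP:
            raise ValueError(f"disk group {i}: {capacity_devices} capacity "
                             f"devices (must be 1..{MAX_CAPACITY_DEVICES_PER_GROUP})")
    return {
        "cache_devices": len(disk_groups) * CACHE_DEVICES_PER_GROUP,
        "capacity_devices": sum(disk_groups),
    }

# A fully populated host: 5 disk groups x 7 capacity devices = 35.
print(validate_host_layout([7, 7, 7, 7, 7]))
# {'cache_devices': 5, 'capacity_devices': 35}
```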
In day-to-day operations, thin provisioning is generally used: a virtual machine's virtual disk consumes only the space of the data actually written, which saves considerable cost. However, if space growth is rapid and unmonitored, storage can become over-allocated, degrading the performance of business applications or even bringing them down.
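A simple way to catch the over-allocation risk described above is to track the overcommitment ratio of a datastore. The following is a minimal sketch; the numbers and the 1.0 threshold are illustrative, not vSAN defaults.

```python
# Minimal sketch: flag thin-provisioning overcommitment on a datastore.
# Sample sizes and the alert threshold are invented for illustration.

def overcommit_ratio(provisioned_gb, datastore_capacity_gb):
    """Ratio > 1.0 means more space is promised to VMs than physically exists."""
    return sum(provisioned_gb) / datastore_capacity_gb

vm_disks_gb = [500, 800, 1200]   # thin-provisioned sizes promised to VMs
ratio = overcommit_ratio(vm_disks_gb, datastore_capacity_gb=2000)
print(f"overcommit ratio: {ratio:.2f}")   # 1.25 -> over-allocated
if ratio > 1.0:
    print("warning: datastore is over-allocated; monitor actual usage growth")
```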
So how does vSAN handle capacity disk failures? What happens if a read or write happens to be in flight on the disk when the failure occurs? vSAN's handling of capacity device failures is analyzed below.
Suppose, for example, that a capacity storage component on host esxi-03 returns a read error. vSAN checks whether a replica component exists and, if so, serves the read from that replica. By default, each object is created with a failures-to-tolerate (FTT) value of 1, which means two identical replica components always exist for each object. A read failure falls into one of two cases: repairable or not repairable. When the problem is repairable, the I/O error is reported to the object owner, and the owner initiates component rebuilding; when the rebuild completes, the failed component is deleted. If, for some reason, no replica component exists, vSAN reports an I/O error to the virtual machine.
If a write error is returned, it is likewise passed to the object owner; the component is marked "degraded" and a rebuild is triggered on another disk in the vSAN cluster. When the rebuild completes, the cluster directory is updated. Note that the flash cache device (which has no errors) continues to serve reads from cache.
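To make the read/write error flow above concrete, here is a minimal conceptual model in Python. It is emphatically not vSAN's implementation; the class, hostnames, and function names are invented for illustration.

```python
# Minimal sketch of the error-handling flow described above, modeled as
# plain Python. Conceptual illustration only, not vSAN internals.

class Component:
    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.degraded = name, healthy, False

    def read(self):
        if not self.healthy:
            raise IOError(f"{self.name}: read error")
        return f"data from {self.name}"

def trigger_rebuild(component):
    print(f"rebuilding {component.name} on another disk in the cluster")

def read_object(replicas):
    """Try each replica in turn; on error, fall back and trigger a rebuild."""
    for component in replicas:
        try:
            return component.read()
        except IOError:
            component.degraded = True    # object owner marks it degraded
            trigger_rebuild(component)   # rebuild on another disk
    raise IOError("no replica available: I/O error surfaced to the VM")

# FTT=1 -> two replicas; the failed one on esxi-03 is skipped transparently.
replicas = [Component("esxi-03/disk1", healthy=False), Component("esxi-02/disk4")]
print(read_object(replicas))   # data from esxi-02/disk4
```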
In the original version of vSAN, when one or more components were rebuilt after a failure, the vSphere Web Client did not show how much data still needed to be synchronized. Starting with vSAN 6.0, however, the vSphere Web Client can monitor data synchronization after a failure, showing the number of components being resynchronized, the bytes remaining, and the time required to complete the resynchronization.
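From the two figures that view exposes (bytes remaining and observed throughput), a rough completion estimate is simple arithmetic. The sample numbers below are made up for illustration.

```python
# Minimal sketch: estimate resync time remaining from bytes left to resync
# and the observed resync throughput. Inputs are illustrative.

def resync_eta_minutes(bytes_remaining, throughput_bytes_per_sec):
    return bytes_remaining / throughput_bytes_per_sec / 60

bytes_left = 120 * 1024**3    # 120 GiB still to resynchronize
throughput = 200 * 1024**2    # observed 200 MiB/s resync rate
print(f"estimated resync time: {resync_eta_minutes(bytes_left, throughput):.0f} min")
# estimated resync time: 10 min
```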
Note: when capacity disks are full, vSAN pauses writes while it waits for new disk space for the write request. If new disks are not added in time, the write operation fails and the virtual machine receives an I/O error.
Cache device failure analysis:
What happens if the cache device (an SSD) becomes inaccessible? When the cache device is inaccessible, every capacity device behind it in the same disk group becomes inaccessible too: a cache device failure is equivalent to failing all the capacity devices behind it. In essence, when a cache device fails, the entire disk group is considered "degraded". If there is spare capacity in the vSAN cluster, vSAN attempts to rebuild the affected storage objects on another host or disk. From an architectural standpoint, then, creating several small disk groups may be better than one large disk group, depending on the type of host used, because a disk group can be treated as a failure domain.
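The failure-domain argument can be quantified as the fraction of a host's capacity that one cache-device failure takes down. A minimal sketch, with illustrative layouts of equal total capacity:

```python
# Minimal sketch: compare the "blast radius" of a single cache-device failure
# for layouts with the same total capacity. Illustrative numbers only.

def blast_radius(num_disk_groups):
    """Fraction of a host's capacity lost when one cache SSD fails."""
    return 1 / num_disk_groups

for groups in (1, 2, 5):
    print(f"{groups} disk group(s): one cache failure degrades "
          f"{blast_radius(groups):.0%} of the host's capacity")
# 1 disk group(s): 100% ; 2: 50% ; 5: 20%
```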
Note: vSAN uses an elevator algorithm to periodically flush data from the write cache in the cache tier to the capacity disks in address order; it is a self-tuning algorithm that decides how often the SSD is written back to disk. When an application in a virtual machine on esxi-01 issues a write, the object owner clones the write: concurrent write requests are sent over the 10 Gigabit network to the write caches on esxi-02 and esxi-03. Once the data is written to cache, the write is acknowledged, completing the prepare operation on the SSD. The owner waits for ACKs from both hosts before completing the I/O. The write is later destaged to capacity disk as part of a batch. Each host destages independently, so the destage times on esxi-02 and esxi-03 may differ, because conditions vary from host to host: how fast the cache space fills, how much space remains, and where the data will land on disk.
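The essence of elevator-style destaging is flushing buffered writes in ascending address order, in batches. The following is a conceptual toy, not vSAN's actual (self-tuning, internal) algorithm; the batch size and addresses are invented.

```python
# Minimal sketch of elevator-style destaging: buffered writes are flushed to
# the capacity disk in ascending address (LBA) order, in batches.

write_buffer = {}   # lba -> data, accumulated from acknowledged writes

def buffered_write(lba, data):
    write_buffer[lba] = data    # acknowledged as soon as it hits the cache

def destage(batch_size=4):
    """Flush up to batch_size entries to disk in address order."""
    for lba in sorted(write_buffer)[:batch_size]:
        print(f"destaging LBA {lba} -> capacity disk")
        del write_buffer[lba]

for lba in (900, 17, 512, 44, 3):
    buffered_write(lba, b"...")
destage()   # flushes LBAs 3, 17, 44, 512 in order; 900 waits for the next pass
```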
Supplement: guidelines for sizing capacity
1. Leave at least 30% of capacity unused to keep vSAN from rebalancing the storage load. vSAN rebalances components in the cluster whenever consumption on a single capacity device reaches 80%, and rebalancing may affect the performance of your applications. To avoid this, keep storage consumption below 70% (see the sketch after this list).
2. Plan extra capacity to handle potential failures or to replace capacity devices, disk groups, and hosts. When a capacity device is inaccessible, vSAN recovers its components onto other devices in the cluster. When a flash cache device fails or is removed, vSAN recovers the components of the entire disk group.
3. Reserve extra capacity so that vSAN can recover components after a host failure or when a host enters maintenance mode. For example, provision hosts with enough headroom that components can still be rebuilt successfully during a host failure or maintenance window. This matters when there are more than three hosts: after a host fails, its components are rebuilt on the free storage of the remaining hosts, which lets the cluster tolerate another failure. However, in a three-host cluster with Primary level of failures to tolerate set to 1, vSAN cannot perform a rebuild after one host fails, because only two hosts remain. To allow rebuilding after a failure, at least four hosts are required.
4. Provide sufficient temporary space for changes to vSAN virtual machine storage policies. When you change a VM storage policy dynamically, vSAN may create new layouts for the replicas of the affected objects. While vSAN instantiates those replicas and synchronizes them with the originals, the cluster must temporarily hold the additional space.
5. If you plan to use advanced features such as software checksums or deduplication and compression, reserve extra space for their operational overhead.
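As referenced in guideline 1, here is a minimal sketch that turns guidelines 1 and 3 into a consumption budget. The 70% threshold comes from the list above; the raw capacity, host count, and one-host headroom rule are illustrative assumptions.

```python
# Minimal sketch for guidelines 1 and 3 above: keep consumption below 70%
# and leave one host's worth of headroom for rebuilds. Inputs are illustrative.

SOFT_LIMIT = 0.70   # keep consumption under 70% to avoid rebalancing

def usable_budget_tb(raw_capacity_tb, num_hosts):
    rebuild_headroom = raw_capacity_tb / num_hosts   # one host's share
    return (raw_capacity_tb - rebuild_headroom) * SOFT_LIMIT

raw_tb, hosts = 100.0, 5
print(f"plan to consume at most {usable_budget_tb(raw_tb, hosts):.1f} TB "
      f"of {raw_tb} TB raw")   # 56.0 TB
```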
Supplementary issue:
SSD congestion can cause the vSAN cluster to stall. When the active working set of writes to a particular disk group is significantly larger than that disk group's cache tier, SSD congestion usually results, which in turn stalls the vSAN cluster. In both hybrid and all-flash vSAN clusters, data is first written to the write cache (also known as the write buffer). A process called destaging moves data from the write buffer to the capacity disks. The write cache sustains a high write rate, which keeps write performance from being limited by the capacity disks. However, if the write cache fills very quickly, destaging may not keep up with the rate at which I/O arrives. In that case SSD congestion occurs, and the vSAN DOM client layer is instructed to slow I/O down to a rate the vSAN disk group can handle.
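A toy simulation shows why the throttling is inevitable: when the arrival rate exceeds the destage rate, the buffer must eventually fill. The rates and the 95% trigger below are invented; only the 600 GB buffer cap comes from the text.

```python
# Minimal sketch: when the write arrival rate exceeds the destage rate, the
# write buffer fills and backpressure (congestion) kicks in. Rates invented.

buffer_gb, buffer_cap_gb = 0.0, 600.0   # all-flash write buffer cap (600 GB)
arrival_gb_s, destage_gb_s = 3.0, 1.0   # ingest outpaces destaging by 2 GB/s

for second in range(600):
    buffer_gb = min(buffer_cap_gb, buffer_gb + arrival_gb_s - destage_gb_s)
    if buffer_gb >= buffer_cap_gb * 0.95:   # near full: DOM client throttles
        print(f"t={second}s: SSD congestion, I/O throttled to destage rate")
        break
```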
Remedy: to avoid SSD congestion, size the virtual machine disks in use appropriately. For best results, the size of the active working set should not exceed 40% of the cumulative write-cache size across all disk groups. Note that in a hybrid vSAN cluster the write cache is 30% of the cache tier disk size, while in an all-flash cluster the write cache is the full cache tier disk size, capped at 600 GB. If writes greatly exceed these limits, the vSAN cluster can easily stall and the capacity tier disks may become inaccessible.
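The 40% rule is easy to turn into a planning number. A minimal sketch, using the 30% hybrid ratio and 600 GB all-flash cap from the paragraph above; the host layout in the example is illustrative.

```python
# Minimal sketch of the 40% rule: compute the recommended maximum active
# working set from the disk-group cache sizes. Inputs are illustrative.

ALL_FLASH_WRITE_BUFFER_CAP_GB = 600   # per disk group, per the text above

def write_cache_gb(cache_tier_gb, all_flash):
    if all_flash:
        return min(cache_tier_gb, ALL_FLASH_WRITE_BUFFER_CAP_GB)
    return cache_tier_gb * 0.30   # hybrid: write cache is 30% of cache tier

def max_working_set_gb(cache_tiers_gb, all_flash):
    total = sum(write_cache_gb(size, all_flash) for size in cache_tiers_gb)
    return total * 0.40           # keep the working set under 40%

# Host with two disk groups, each with an 800 GB cache SSD:
print(max_working_set_gb([800, 800], all_flash=True))    # 480.0 GB
print(max_working_set_gb([800, 800], all_flash=False))   # 192.0 GB
```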