Shulou (Shulou.com), 05/31 report. Updated 2025-01-18.
This article covers how to analyze and optimize Lustre performance. The approach is practical, so it is shared here for reference.
Any discussion of HPC seems to involve Lustre. Lustre is practically synonymous with HPC: it holds the leading market share among open source HPC parallel file systems and has been strongly supported by vendors such as Intel and DDN. Intel's Lustre business has since been taken over by DDN.
1 Lustre performance optimization reference
1.1 Network bandwidth
Network bandwidth often determines the aggregate bandwidth of a Lustre file system. Lustre reads and writes data through multiple OSS nodes at the same time to improve overall throughput, but if network performance is too low, the file system cannot deliver that throughput. Consider the impact of the network on performance from the following points:
Network type (TCP/IP network or InfiniBand network)
Network card type (Gigabit / 10 Gigabit)
The number of network cards and whether they are bonded together
Network card bonding mode
Notes:
In general, InfiniBand performs much better than a TCP/IP network, but costs more.
A 10 Gigabit network performs better than a Gigabit network.
The network card bonding mode is generally mode 6 (balance-alb in the Linux bonding driver).
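The points above can be turned into a rough upper bound on aggregate bandwidth. The following is only a back-of-envelope sketch: the function name, the cluster sizes, and the assumption that bonded network cards add up linearly are all illustrative, and real throughput will be lower because of protocol overhead.

```python
# Back-of-envelope bound on aggregate bandwidth (illustrative model, not a Lustre API).
# Nominal line rates in MB/s; real throughput is lower because of protocol overhead.
NOMINAL_MB_PER_S = {
    "1GbE": 125.0,    # 1 Gbit/s divided by 8
    "10GbE": 1250.0,  # 10 Gbit/s divided by 8
}


def aggregate_bandwidth_bound(oss_count, nics_per_oss, nic_type, storage_mb_per_s_per_oss):
    """Upper bound in MB/s: each OSS is limited by the slower of its
    network links and its back-end storage."""
    # Assumption: bonded network cards scale linearly, which is optimistic.
    network_per_oss = nics_per_oss * NOMINAL_MB_PER_S[nic_type]
    per_oss = min(network_per_oss, storage_mb_per_s_per_oss)
    return oss_count * per_oss


if __name__ == "__main__":
    # Hypothetical cluster: 4 OSS nodes, 2 bonded cards each, 800 MB/s of storage per OSS.
    for nic in ("1GbE", "10GbE"):
        print(nic, aggregate_bandwidth_bound(4, 2, nic, 800.0), "MB/s")
```

In this hypothetical setup the Gigabit cluster is limited by the network, while the 10 Gigabit cluster is limited by storage, which is why the network is the first thing to check.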
1.2 Lustre settings
Lustre's own settings mainly concern the number of stripes (that is, the number of OSTs a file is spread over) and how striping is done, which are the key to Lustre's concurrent I/O. Striping lets the system work in parallel and therefore affects performance. The main settings are:
Stripe size (stripesize, min=64KB)
Stripe count (stripecount)
Starting OST (start-ost, that is, the OST where striping begins)
Notes:
Normally, start-ost defaults to -1 and does not need to be changed. This value leaves the starting position unspecified, which gives good load balancing.
In general, aggregate bandwidth tends to fall as the stripe size grows. When stripes are too large, multiple I/O requests issued within a short period land on the same OST and have to wait on each other. The stripe size is usually set to 64KB.
In general, aggregate bandwidth tends to rise as the stripe count grows. In a given environment, a reasonable OST configuration lets Lustre reach its full performance.
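The effect of stripe size described above can be illustrated with a small model of round-robin striping. This is a simplified sketch, not Lustre code: the helper names are hypothetical, and it only tracks which of a file's stripe objects a request starts in, ignoring how Lustre picks the actual OSTs.

```python
from collections import Counter

KB = 1024
MB = 1024 * KB


def stripe_index(offset, stripe_size, stripe_count):
    """Index of the stripe object (0 .. stripe_count-1) holding byte `offset`
    when data is laid out round-robin in stripe_size chunks."""
    return (offset // stripe_size) % stripe_count


def requests_per_object(request_size, request_count, stripe_size, stripe_count):
    """Count how many back-to-back sequential requests start in each object."""
    offsets = (i * request_size for i in range(request_count))
    return Counter(stripe_index(off, stripe_size, stripe_count) for off in offsets)


if __name__ == "__main__":
    # Sixteen sequential 64KB requests against a file striped over 4 objects.
    print(dict(requests_per_object(64 * KB, 16, 64 * KB, 4)))  # spread over all 4 objects
    print(dict(requests_per_object(64 * KB, 16, 1 * MB, 4)))   # all on the first object
```

With a 64KB stripe the sixteen requests are spread evenly over the four objects, while with a 1MB stripe they all queue on the same one, which matches the I/O waiting described above.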
1.3 Client settings
In a Lustre file system, the client presents a global storage space, and user data is stored in the file system through the client. Client settings therefore also affect system performance.
The main factors are:
Number of processes per client (number of connections)
Read and write block size
Number of clients
Notes:
As the number of connections (processes) increases, aggregate bandwidth first rises while the system has not yet reached saturation, then levels off, and finally begins to fall as more connections are added.
As the I/O block size increases from 64KB to 64MB, aggregate bandwidth first rises, then levels off, then falls as the block size keeps growing, and stays stable at large block sizes.
As the number of clients increases, aggregate read bandwidth rises significantly, while aggregate write bandwidth does not change significantly.
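Trends like these are best confirmed by measurement on the target system. The sketch below is a minimal single-process write test for sweeping the block size; the function name and the sizes are illustrative assumptions, and it writes to the local temporary directory unless pointed at a directory on a Lustre mount.

```python
import os
import tempfile
import time


def measure_write_bandwidth(directory, block_size, total_bytes):
    """Write about total_bytes in block_size chunks to a temporary file in
    `directory` and return the achieved bandwidth in MB/s."""
    block = b"\0" * block_size
    blocks = total_bytes // block_size
    written = 0
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(blocks):
            written += os.write(fd, block)
        os.fsync(fd)  # include the time needed to flush data out of the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return written / (1024 * 1024) / elapsed


if __name__ == "__main__":
    target = tempfile.gettempdir()  # assumption: replace with a Lustre directory
    for block_size in (64 * 1024, 1024 * 1024, 4 * 1024 * 1024):
        mb_per_s = measure_write_bandwidth(target, block_size, 8 * 1024 * 1024)
        print(f"{block_size // 1024} KB blocks: {mb_per_s:.1f} MB/s")
```

To study the effect of the number of connections or clients, the same function could be run from several processes or nodes at once and the results summed.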
1.4 Storage RAID
Lustre runs on general-purpose storage devices, which can be single disks, RAID, or LVM. Most deployments use RAID, which provides both aggregate storage capacity and data protection. The main points are:
RAID implementation (hardware RAID / software RAID)
RAID level (RAID0/1/2/3/4/5/6/10/01)
Hardware RAID card type
Disk type in the RAID (SATA, SAS, SSD)
Notes:
Lustre file systems usually use hardware RAID for the underlying storage. It performs much better than software RAID, but costs more.
Lustre deployments usually use RAID6 to improve data protection.
OSTs generally use low-cost SATA disks, while the MDS generally uses SSDs.
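The trade-off between RAID levels can be made concrete with the usual textbook figures for usable capacity, guaranteed fault tolerance, and the cost of small random writes. The sketch below assumes identical disks and ignores controller caches; the function name and the example disk figures are illustrative.

```python
# Textbook figures per RAID level:
# (number of data disks out of n, disk failures always survived,
#  back-end I/Os caused by one small random write)
RAID_LEVELS = {
    "RAID0": (lambda n: n, 0, 1),
    "RAID5": (lambda n: n - 1, 1, 4),
    "RAID6": (lambda n: n - 2, 2, 6),
    "RAID10": (lambda n: n // 2, 1, 2),
}


def raid_summary(level, disks, disk_tb, disk_iops):
    """Estimate usable capacity, fault tolerance and small random write IOPS."""
    data_disks, tolerated, write_penalty = RAID_LEVELS[level]
    return {
        "usable_tb": data_disks(disks) * disk_tb,
        "tolerated_failures": tolerated,
        "random_write_iops": disks * disk_iops / write_penalty,
    }


if __name__ == "__main__":
    # Hypothetical array: ten 8TB SATA disks at roughly 150 IOPS each.
    for level in RAID_LEVELS:
        print(level, raid_summary(level, 10, 8, 150))
```

For this hypothetical array, RAID6 keeps more usable capacity and survives any two disk failures, but pays for it with the lowest small random write rate of the four levels.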
2 Lustre small file optimization
2.1 Overall setup
1. Aggregate reads and writes in the application, for example by packing small files with tar, creating large files, or storing small files through a loopback mount. The system call overhead and extra I/O overhead of small files are very high, so aggregation can improve performance significantly. In addition, use multiple nodes and multiple processes or threads where possible to increase aggregate I/O bandwidth.
2. Have the application use O_DIRECT for direct I/O, with the read/write record size set to 4KB to match the file system. Disable locking on output files to avoid contention between clients.
3. Have the application write contiguous data wherever possible. Sequential reads and writes of small files perform clearly better than random ones.
4. Use SSDs or more disks for OSTs to raise IOPS and improve small-file performance. Create one large-capacity OST rather than several small ones to reduce the load from logs, connections, and so on.
5. Use RAID 1+0 for OSTs instead of RAID 5/6 to avoid the parity overhead caused by frequent small-file I/O.
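The aggregation idea from item 1 can be sketched with Python's standard tarfile module. This is a minimal illustration under assumed file names and directories, not a prescribed workflow: many small files are packed into one uncompressed archive, so the file system sees one large file instead of many small ones.

```python
import os
import tarfile
import tempfile


def pack_small_files(source_dir, archive_path):
    """Bundle every regular file under source_dir into one uncompressed tar
    archive and return the number of files packed.
    archive_path must be outside source_dir."""
    count = 0
    with tarfile.open(archive_path, "w") as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                archive.add(full, arcname=os.path.relpath(full, source_dir))
                count += 1
    return count


def read_member(archive_path, member_name):
    """Read one small file back out of the archive without unpacking the rest."""
    with tarfile.open(archive_path, "r") as archive:
        return archive.extractfile(member_name).read()


if __name__ == "__main__":
    # Hypothetical data: 100 tiny files in a temporary directory.
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for i in range(100):
            with open(os.path.join(src, f"sample_{i:03d}.txt"), "w") as f:
                f.write(f"record {i}\n")
        archive = os.path.join(dst, "bundle.tar")
        print(pack_small_files(src, archive), "files packed")
        print(read_member(archive, "sample_042.txt"))
```

On Lustre, only the archive would need to live on the file system, so one set of metadata operations replaces one set per small file.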
2.2 System settings
1. Disable all client-side LNET debugging: several kinds of debug information are enabled by default. Running sysctl -w lnet.debug=0 reduces system overhead, but no log will be available to consult when errors occur.
2. Increase the client dirty cache size: the default is 32MB. A larger cache improves I/O performance, but also increases the risk of data loss.
3. Increase RPC parallelism: the default is 8, and raising it to 32 improves data and metadata performance. The downside is that if the server is already under heavy load, performance may suffer instead.
4. Control Lustre striping: lfs setstripe -c 1 /path/filename. If a file has more than one OST object, small-file performance declines, so set the number of OST objects to 1.
5. Consider using local locks on the client: mount -t lustre -o localflock. If you are sure that the processes writing a file all run on the same client, you can use localflock instead of flock to reduce the number of RPCs sent to the MDS.
6. Use loopback-mounted files: create a large Lustre file, associate it with a loop device, create a file system on it, and mount it. Small-file operations on that file system turn a large number of MDS metadata operations into OSS reads and writes, which removes the metadata bottleneck and can significantly improve small-file performance.
This method is workable for scratch space, but should be used with caution for production data, because Lustre still has some problems in this mode. The operation method is as follows:
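The original steps are not included here. As a stand-in, the sketch below only prints one plausible command sequence for the loopback approach using standard Linux tools (dd, losetup, mkfs, mount); it is an assumption, not the author's procedure. The paths, the loop device, the ext4 choice, and the function name are all hypothetical, and the commands need root privileges if run by hand.

```python
import shlex


def loopback_plan(backing_file, size_gb, mount_point,
                  loop_device="/dev/loop0", fs_type="ext4"):
    """Return the commands (as argument lists) for a loopback-mounted file.
    Nothing is executed; the plan is only built so it can be reviewed."""
    return [
        # 1. Create the large backing file on the Lustre file system.
        ["dd", "if=/dev/zero", f"of={backing_file}", "bs=1M", f"count={size_gb * 1024}"],
        # 2. Associate the file with a loop device.
        ["losetup", loop_device, backing_file],
        # 3. Create a local file system inside it.
        ["mkfs", "-t", fs_type, loop_device],
        # 4. Mount it; small files then live inside the one large Lustre file.
        ["mkdir", "-p", mount_point],
        ["mount", "-t", fs_type, loop_device, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical paths: a 10GB image on a Lustre scratch directory.
    for command in loopback_plan("/mnt/lustre/scratch/small.img", 10, "/mnt/smallfiles"):
        print(shlex.join(command))
```

Because the mounted file system is local to one client, this arrangement suits per-node scratch data rather than files shared between clients.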
The above covers how to analyze and optimize Lustre performance. Several of these points are likely to come up in day-to-day work.