High performance distributed storage Lustre

2025-01-28 Update From: SLTechnology News&Howtos > Servers


Shulou (Shulou.com) 06/03 Report

Preface

In recent years, distributed storage has gained ground in the storage market thanks to its high performance and high availability. Beyond commercial products, open-source distributed storage software is even more popular, with Lustre, CephFS and GlusterFS as typical representatives.

1. Brief introduction

Lustre is an open-source, distributed parallel file system with high scalability, high performance and high availability. Its design goal is to provide a globally consistent, POSIX-compliant namespace for large-scale high-performance computing systems. It supports hundreds of petabytes of storage capacity and aggregate bandwidths of hundreds of GB/s, or even several TB/s.

1.1 Environmental Architecture

MGS (Management Server): stores the configuration information for all Lustre file systems in the cluster and provides it to the other Lustre components.

MDS (Metadata Server): serves metadata to clients; each MDS manages the names and directories in a Lustre file system.

OSS (Object Storage Server): stores the file data that clients read and write.

1.2 Network Planning

2. Environment preparation

Note: do the following on all hosts

1. Set hostname

hostnamectl set-hostname node1
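Since the hosts refer to each other by name in later steps, the nodes also need mutual name resolution. A minimal sketch of the corresponding /etc/hosts entries, using the node1-node3 addresses that appear later in this guide (node4's address is an assumption for illustration):

```shell
# Cluster name-resolution entries. The node1-node3 IPs come from the
# mkfs.lustre/servicenode commands later in this guide; node4's IP is
# an assumption for illustration only.
hosts_entries='10.10.201.61 node1
10.10.201.62 node2
10.10.201.63 node3
10.10.201.64 node4'
printf '%s\n' "$hosts_entries"
```

After reviewing the entries, append them on every host as root: `printf '%s\n' "$hosts_entries" >> /etc/hosts`.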

2. Disable firewalld and SELinux

systemctl stop firewalld && systemctl disable firewalld
sed -i "s/SELINUX=enforcing/SELINUX=disabled/g" /etc/selinux/config

3. Create a temporary yum source

cat > /tmp/lustre-repo.conf

4. Format partitions and mount services

mkfs.lustre option descriptions:

--mgs: format this partition as the MGS
--mdt: format this partition as an MDT (metadata target)
--ost: format this partition as an OST (object storage target)
--servicenode=ServiceNodeIP@tcp0: specify the node that takes over the service if this node fails. On an InfiniBand network, replace tcp0 with o2ib.
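The repo file creation in step 3 is truncated above; for reference, a temporary Lustre yum source typically points at Whamcloud's public download server. A sketch, where the baseurl paths and the el7 release are assumptions to adjust for your kernel and distribution:

```shell
# Sketch of the temporary yum source from step 3. The baseurl values
# follow Whamcloud's public download layout and are assumptions; pick
# the release that matches your kernel and distribution.
cat > /tmp/lustre-repo.conf <<'EOF'
[lustre-server]
name=lustre-server
baseurl=https://downloads.whamcloud.com/public/lustre/latest-release/el7/server
gpgcheck=0

[e2fsprogs-wc]
name=e2fsprogs-wc
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7
gpgcheck=0
EOF
```

Copy the file into /etc/yum.repos.d/ (or point yum at it) before installing the Lustre server packages.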

Set up the MGS and MDT (Note: execute on the MGS server host node1)

mkdir -p /data/mdt
mkfs.lustre --fsname=sgfs --mgs --mdt --index=0 --servicenode=10.10.201.61@tcp0 --reformat /dev/sdb
mount -t lustre /dev/sdb /data/mdt/

Create OST1 (Note: execute on the OSS server host node2)

mkdir -p /data/ost1
mkfs.lustre --fsname=sgfs --mgsnode=10.10.201.61@tcp0 --servicenode=10.10.201.62@tcp0 --servicenode=10.10.201.63@tcp0 --ost --reformat --index=1 /dev/sdb
mount -t lustre /dev/sdb /data/ost1/

Create OST2 (Note: execute on the OSS server host node3)

mkdir -p /data/ost2
mkfs.lustre --fsname=sgfs --mgsnode=10.10.201.61@tcp0 --servicenode=10.10.201.63@tcp0 --servicenode=10.10.201.62@tcp0 --ost --reformat --index=2 /dev/sdb
mount -t lustre /dev/sdb /data/ost2/

5. Client mount access

Create a mount directory on the client and mount the file system. (Note: execute on the client host node4)

mkdir -p /lustre/sgfs/
mount.lustre 10.10.201.61@tcp0:/sgfs /lustre/sgfs/
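To survive reboots, the client mount can also go into /etc/fstab; a sketch of the entry (the `_netdev` option delays mounting until the network is up; written to a scratch file here so it can be reviewed before touching the real fstab):

```shell
# Sketch: persistent Lustre client mount via fstab. Written to a
# scratch file for review; append to the real /etc/fstab as root.
fstab_line='10.10.201.61@tcp0:/sgfs /lustre/sgfs lustre defaults,_netdev 0 0'
printf '%s\n' "$fstab_line" > /tmp/fstab-lustre-demo
cat /tmp/fstab-lustre-demo
```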

If the mount fails, use the lctl command to check network connectivity, and check syslog for troubleshooting.

lctl ping 10.10.201.61@tcp0

Verify that the mount succeeded:

df -ht lustre

6. Common problem handling

Error 1:

[root@node1 ~]# modprobe -v lustre
insmod /lib/modules/3.10.0-957.10.1.el7_lustre.x86_64/extra/lustre/net/libcfs.ko
insmod /lib/modules/3.10.0-957.10.1.el7_lustre.x86_64/extra/lustre/net/lnet.ko
insmod /lib/modules/3.10.0-957.10.1.el7_lustre.x86_64/extra/lustre/fs/obdclass.ko
insmod /lib/modules/3.10.0-957.10.1.el7_lustre.x86_64/extra/lustre/fs/ptlrpc.ko
modprobe: ERROR: could not insert 'lustre': Cannot allocate memory

Cause of error: the server has two NUMA nodes and no memory is installed on one of them, as shown below.

[root@node2 ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 20 21 22 23 25 26 28 29
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 30 31 33 35 37 38 39
node 1 size: 32654 MB
node 1 free: 30680 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

State after reseating the memory:

[root@node1 ~]# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 20 21 22 23 25 26 28 29
node 0 size: 16270 MB
node 0 free: 15480 MB
node 1 cpus: 10 11 12 13 14 15 16 17 18 30 31 33 35 37 38 39
node 1 size: 16384 MB
node 1 free: 15504 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10
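The memoryless-node condition above can be spotted mechanically from `numactl -H` output; a small sketch that parses sample output (on a live system, pipe in the real `numactl -H` output instead):

```shell
# Sketch: flag NUMA nodes reporting 0 MB of memory, the condition that
# caused the modprobe failure above. Sample input here; on a live
# system use: numactl -H | awk '/size:/ && $4 == 0 { print $2 }'
sample='node 0 size: 0 MB
node 1 size: 32654 MB'
flagged=$(printf '%s\n' "$sample" | awk '/size:/ && $4 == 0 { print "memoryless NUMA node:", $2 }')
echo "$flagged"
```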

Reference solution:

https://jira.whamcloud.com/browse/LU-11163
