2025-01-17 Update From: SLTechnology News&Howtos > Servers
Shulou (Shulou.com) 06/03 Report
For the original English text, see:
https://moosefs.com/documentation/faq.html#26
This FAQ was translated with the help of translation tools; it feels fairly accurate overall, but if you notice omissions or errors, please leave a message.
You are welcome to get in touch: QQ 249016681
-
Frequently asked questions
Table of contents:
1. What average write / read speed can we expect?
2. Does the goal setting affect the write / read speed?
3. Are concurrent read and write operations supported?
4. How much CPU / RAM resources are used?
5. Can I add / remove chunkservers and disks on the fly?
6. How do I mark a disk for deletion?
7. My experience with clustered file systems is that metadata operations are quite slow. How did you solve this problem?
8. What does the directory size value mean on MooseFS? It is different from the standard Linux ls -l output. Why?
9. When I perform df -h on the file system, the results differ from what I would expect, taking into account the actual sizes of the written files. Why?
10. Can I keep source code on MooseFS? Why do small files occupy more space than I would expect?
11. Do chunkservers and the metadata server do their own checksumming?
12. What resources are needed for the master server?
13. When I delete a file or directory, the MooseFS size does not change. Why?
14. When I added a third server as an additional chunkserver, it looked like the system started replicating data to the third server, even though the file goal was still set to 2.
15. Is MooseFS 64-bit compatible?
16. Can I modify the chunk size?
17. How do I know if a file has been successfully written to MooseFS?
18. What are the limits of MooseFS (e.g. file size limit, file system size limit, maximum number of files that can be stored on the file system)?
19. Can I set up HTTP basic authentication for mfscgiserv?
20. Can I run a mail server application on MooseFS? The mail server is a very busy application with a huge number of small files; will I lose any files?
21. Are there any suggestions for the network, MTU or bandwidth?
22. Does MooseFS support supplementary groups?
23. Does MooseFS support file locking?
24. Can I assign IP addresses to chunkservers via DHCP?
25. Some of my chunkservers utilize 90% of space while others only 10%. Why does the rebalancing process take so long?
26. I have a Metalogger running; should I additionally back up the metadata file on the master server?
27. I think one of my disks is slower / damaged. How do I find it?
28. How can I find the master server PID?
29. The web interface shows that there are some copies of chunks with goal 0. What does that mean?
30. Is every error message reported by mfsmount a serious problem?
31. How do I verify that the MooseFS cluster is online? What happens with mfsmount when the master server goes down?
1. What average write / read speed can we expect?
In addition to the factors common to most file systems, such as block size and access type (sequential or random), speed in MooseFS also depends on hardware performance. The main factors are hard disk performance and network capacity and topology (network latency). The better the performance of the hard disks used and the better the throughput of the network, the better the performance of the whole system.
2. Does the goal setting affect the write / read speed?
Generally speaking, no. When reading a file, a goal higher than one may in some cases help speed up the read operation: when two clients access a file with a goal of 2 or higher, they can read from different replicas, so each has the full available throughput to itself. On average, however, the goal setting does not change the speed of read operations in any way.
Similarly, the effect of the goal setting on write speed is negligible. Writing with a goal higher than 1 works as a chain: the client sends the data to one chunkserver, and that chunkserver simultaneously writes it and forwards it to the next chunkserver (which may in turn forward it further) until the goal is reached. This way the client does not send multiple copies itself, and all copies are written almost at the same time. Our tests show that write operations can use all the available bandwidth of the client on a 1 Gbps network.
3. Are concurrent read and write operations supported?
All read operations are parallel; several clients can read the same data at the same time without any problem. Write operations are also parallel, except for operations on the same chunk (file fragment), which are synchronized by the master server and therefore need to be sequential.
4. How much CPU / RAM resources are used?
In our environment (1 PiB of total space; 36 million files, 6 million folders and 38 million chunks on 100 machines), chunkserver CPU usage (with constant file transfers) is about 15-30%, and chunkserver RAM usage is typically between 100 MiB and 1 GiB (depending on the number of chunks on each chunkserver). The master server consumes about 50% of a modern 3.3 GHz CPU (approximately 5000 file system operations per second, of which about 1500 are modifications) and 12 GiB of RAM. The master's CPU load depends on the number of operations, and its RAM on the total number of files and folders, not on the total size of the files themselves. RAM usage is proportional to the number of entries in the file system, because the master server process keeps the entire metadata in memory for performance. HDD usage on our master server is 22 GB.
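As a rough illustration of that proportionality, the figures above imply a per-entry memory cost (back-of-the-envelope arithmetic only; MooseFS does not document an exact per-entry size):

```python
# Figures quoted above for the example installation
ram_bytes = 12 * 1024**3          # 12 GiB of master RAM
entries = 36_000_000 + 6_000_000  # files + folders kept in memory

bytes_per_entry = ram_bytes / entries
print(round(bytes_per_entry))     # roughly 300 bytes of RAM per entry
```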
5. Can I add / remove chunkservers and disks on the fly?
You can add / remove chunkservers on the fly. Keep in mind, however, that it is unwise to disconnect a chunkserver that contains the only copy of a chunk in the file system (the CGI monitor would mark such chunks in orange). You can also disconnect (replace) an individual hard drive. The scenario for this operation would be:
Mark the disk for deletion (see: How do I mark a disk for deletion?)
Reload the chunkserver process
Wait for the replication to finish (there should be no "undergoal" or "missing" chunks marked in yellow, orange or red in the CGI monitor)
Stop the chunkserver process
Remove the entry for the disconnected disk from mfshdd.cfg
Shut down the chunkserver machine
Remove the hard drive
Start the machine
Start the chunkserver process
If you have hot-swappable disks, you should follow these steps:
Mark the disk for deletion (see: How do I mark a disk for deletion?)
Reload the chunkserver process
Wait for the replication to finish (there should be no "undergoal" or "missing" chunks marked in yellow, orange or red in the CGI monitor)
Remove the entry for the disconnected disk from mfshdd.cfg
Reload the chunkserver process
Unmount the disk
Remove the hard drive
If you follow the above steps, the work of client computers will not be interrupted, and MooseFS users will not even notice the whole operation.
6. How do I mark a disk for deletion?
When you want to mark a disk for deletion from a chunkserver, you need to edit that chunkserver's mfshdd.cfg configuration file and put an asterisk "*" at the start of the line with the disk that is to be removed. For example, in this mfshdd.cfg we mark /mnt/hdd for deletion:
/mnt/hda
/mnt/hdb
/mnt/hdc
*/mnt/hdd
/mnt/hde
After changing mfshdd.cfg you need to reload the chunkserver (on Linux Debian/Ubuntu: service moosefs-pro-chunkserver reload).
Once the disk has been marked for deletion and the chunkserver process has been restarted, the system will make the appropriate number of copies of the chunks stored on this disk, to maintain the required "goal" number of copies.
Finally, before disconnecting the disk, make sure there are no "undergoal" chunks, i.e. that every chunk has enough copies on the other disks. This can be checked in the CGI monitor, using the chunks state matrix on the "Info" tab.
7. My experience with clustered file systems is that metadata operations are quite slow. How did you solve this problem?
During research and development we also observed the problem of slow metadata operations. We decided to alleviate these speed issues by keeping the whole file system structure in RAM on the metadata server. This is why the metadata server has increased memory requirements. The metadata is frequently flushed out to files on the master server.
Additionally, in the CE version, the metadata logger server(s) frequently receive updates of the metadata structure and write them to their file systems.
In the Pro version, metaloggers are optional, because follower masters are synchronized with the leader master and also save the metadata to their hard drives.
8. What does the directory size value mean on MooseFS? It is different from the standard Linux ls -l output. Why?
Directory size has no special meaning in any file system, so our development team decided to use it to present additional information. The number represents the total length of all files inside (the same value reported by mfsdirinfo -h -l), shown in a special exponent-style notation.
You can "translate" the directory size as follows:
The value has seven digits: xAAAABB. To convert this notation into a number of bytes, read it as:
AAAA.BB multiplied by the unit indicated by x
Where x is:
0 = bytes
1 = kibi (KiB)
2 = mebi (MiB)
3 = gibi (GiB)
4 = tebi (TiB)
Example:
Translate the following entry:
drwxr-xr-x 164 root root 2010616 May 24 11:47 test
(the size field reads xAAAABB)
The directory size 2010616 should therefore be read as 106.16 MiB.
When x = 0, the number may be shorter:
Example:
The directory size 10200 represents 102 bytes.
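The translation rule above can be sketched as a small helper (decode_dirsize is our own name for illustration; MooseFS itself only prints the raw number):

```python
def decode_dirsize(raw: int) -> str:
    """Decode a MooseFS directory size shown as xAAAABB into 'AAAA.BB unit'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]   # x = 0 .. 4
    s = str(raw).zfill(7)                        # pad to the 7-digit form
    x = int(s[0])                                # first digit: 1024-exponent
    value = int(s[1:5]) + int(s[5:7]) / 100      # AAAA . BB
    return f"{value:.2f} {units[x]}"

print(decode_dirsize(2010616))  # 106.16 MiB
print(decode_dirsize(10200))    # 102.00 B
```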
9. When I perform df -h on the file system, the results differ from what I would expect, taking into account the actual sizes of the written files. Why?
Every chunkserver reports its own disk usage increased by 256 MB for each partition/HDD in use, and the master sends the sum of these values to the client as the total disk usage. If you have 3 chunkservers with 7 HDDs each, your disk usage will be increased by 3 * 7 * 256 MB (about 5 GB).
Another reason for the difference: if you use disks dedicated exclusively to MooseFS on the chunkservers, df will show the correct disk usage, but if you keep other data on the MooseFS disks, df will count your own files as well.
If you want to see the actual space usage of the MooseFS file, use the mfsdirinfo command.
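The per-disk overhead above can be reproduced with quick arithmetic (numbers taken from the example in the text):

```python
# Disk-usage overhead reported by the master, per the example above
chunkservers = 3
hdds_per_server = 7
overhead_mb = 256                      # added per partition/HDD in use

total_mb = chunkservers * hdds_per_server * overhead_mb
print(total_mb, "MB is about", round(total_mb / 1024, 2), "GB")
```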
10. Can I keep the source code on MooseFS? Why do small files take up more space than expected?
The system was originally designed to keep large amounts (like several thousands) of very big files (tens of GB each) and has a hard-coded chunk size of 64 MiB with a block size of 64 KiB. Using a consistent chunk size helps improve network performance and efficiency, as all nodes in the system are able to work with a single "bucket" size. That is why even a small file occupies 64 KiB, plus additionally 4 KiB of checksums and 1 KiB for the header.
The issue of the space occupied by small files stored inside MooseFS chunks is real, but in our opinion still negligible. Let's take 25 million small files with a goal of 2. Calculating the storage overhead, this gives about 50 million chunks of roughly 69 KiB each that may not be fully utilized due to internal fragmentation (file size smaller than chunk size). The overall wasted space for those 50 million chunks is about 3.2 TiB. By modern standards this should not be a major concern. A more typical, medium to large project of 100 000 small files would occupy at most an additional 13 GiB of space due to the chunk size of the file system.
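The overhead arithmetic above can be checked directly, using the 69 KiB per-chunk figure quoted in the text:

```python
KiB, GiB, TiB = 1024, 1024**3, 1024**4

# 25 million small files with a goal of 2 -> about 50 million chunks,
# each wasting at most 69 KiB (64 KiB block + 4 KiB checksums + 1 KiB header)
wasted = 50_000_000 * 69 * KiB
print(round(wasted / TiB, 1))          # about 3.2 TiB

# A project of 100 000 small files, also at goal 2
project = 100_000 * 2 * 69 * KiB
print(round(project / GiB, 1))         # about 13.2 GiB
```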
So it makes sense to store source code files on MooseFS systems, whether during development or for long-term reliable storage or archiving purposes.
Considering network file system performance, a bigger factor to consider may be the comfort of developing code on it. When working on an actively developed project over MooseFS (or any other network-based file system, such as NFS or CIFS), the file system may not be able to perform file IO operations at the same speed as a directly attached regular hard drive.
Some modern integrated development environments (IDE), such as Eclipse, make frequent IO requests on several small workspace metadata files. Running Eclipse with the workspace folder on the MooseFS file system (and any other networked file system) will produce slightly slower user interface performance than running Eclipse with the workspace on the local hard drive.
If you use MooseFS as the active development working copy in IDE, you may need to evaluate it yourself.
In another example, editing source code with a typical text editor and keeping the project files checked out from a version control system such as Subversion onto the MooseFS file system does not usually cause any performance degradation. The IO overhead of the network file system nature of MooseFS is offset by the much larger IO latency of interacting with the remote Subversion repository. And with a simple text editor (as opposed to a complex IDE product), there are no observable delays in single file operations (opening, saving).
A more likely scenario is hosting the Subversion repository files on the MooseFS file system, where svnserve or Apache + mod_dav_svn serve the requests to the Subversion repository, while users check out their working sandboxes onto local hard drives.
11. Do chunkservers and the metadata server do their own checksumming?
Chunkservers do their own checksumming. The overhead is about 4 B per 64 KiB block, i.e. 4 KiB per 64 MiB chunk.
The metadata server does not. We decided this would be too CPU-consuming. We recommend using ECC RAM modules instead.
12. What resources are needed for the master server?
The most important factor is the RAM of the MooseFS Master machine, because the complete file system structure is cached in RAM to increase speed. In addition to RAM, the MooseFS Master machine needs some space on the HDD for the main metadata file and incremental logs.
The size of the metadata file depends on the number of files (not the size). The size of the incremental log depends on the number of operations per hour, but the length of this incremental log (in hours) is configurable.
13. When I delete a file or directory, the MooseFS size does not change. Why?
MooseFS does not erase files immediately when they are deleted, so that you can revert the delete operation. Deleted files are kept in the trash for a configurable amount of time before they are removed.
You can configure how long files are kept in the trash and empty the trash manually (to release the space). For more details, see the "Operations specific to MooseFS" section of the Reference Guide.
In short, the time for which a deleted file is stored can be verified with the mfsgettrashtime command and changed with mfssettrashtime.
14. When I added a third server as an additional chunkserver, it looked like the system started replicating data to the third server, even though the file goal was still set to 2.
Yes. The disk usage balancer treats chunks independently of files, so individual chunks of one file may be moved to the new chunkserver to even out disk usage across servers.
15. Is MooseFS 64-bit compatible?
Yes!
16. Can I change the block size?
No. File data is divided into fragments (chunks) with a maximum size of 64 MiB each. The value of 64 MiB is hard-coded into the system, so you cannot modify its size. We based the chunk size on real-world data and found it to be a very good compromise between the number of chunks and the speed of rebalancing / updating the file system. Of course, if a file is smaller than 64 MiB, it occupies less space.
In the systems we take care of, several file sizes significantly exceed 100 GB with no noticeable chunk size penalty.
17. How do I know if a file has been successfully written to MooseFS?
Let's briefly discuss the process of writing to a file system and its consequences.
In all modern file systems, files are written through a buffer (write cache). As a result, execution of the write command itself only transfers the data to a buffer (cache), and no actual writing takes place. Hence, a confirmed execution of the write command does not mean that the data has been correctly written to disk. Only invoking and completing the fsync (or close) command causes all the data kept in buffers (cache) to be physically written out. If an error occurs while that buffered data is being saved, fsync (or close) can return an error response.
The problem is that the vast majority of programmers do not test the status of the close command (which is generally a very common mistake). Consequently, a program writing data to disk may "assume" from the successful response of the write command that the data has been written correctly, while it may actually have failed during the subsequent close command.
In network file systems like MooseFS, due to their nature, the average amount of data "remaining" in buffers (cache) is higher than in regular file systems. Therefore the amount of data processed during the close or fsync command is often significant, and if an error occurs while that data is being written, it will be returned as an error during execution of that command. Hence, before closing, it is recommended (especially when using MooseFS) to perform an fsync operation after writing to a file and then to check the status of the fsync result. Then, for good measure, also check the return status of close.
Be careful! When stdio is used, the fflush function only executes the "write" command, so a correct execution of fflush is not enough to be sure that all the data has been written successfully; you should also check the status of fclose.
The above problem may occur when redirecting the standard output of a program to a file in a shell. Bash (and many other programs) do not check the status of the close execution. So a command of the "application > outcome.txt" type may finish successfully in the shell while an error actually occurred when writing out the "outcome.txt" file. You are strongly advised to avoid using this shell output redirection syntax when writing to a MooseFS mount point. If necessary, you can create a simple program that reads the standard input and writes everything to a chosen file, using fsync correctly and checking the status of the results, e.g. "application | mysaver outcome.txt", where mysaver is the name of such a writing program.
Please note that the problem described above is in no way exceptional and does not stem directly from the characteristics of MooseFS itself. It can affect any file system; network-based systems are simply more prone to such difficulties. Technically, the above recommendations should always be followed (also when using classic file systems).
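To make the "mysaver" idea concrete, here is a minimal sketch in Python (the function name save_stream and the 64 KiB read size are our own choices for illustration, not part of MooseFS):

```python
import os

def save_stream(stream, path: str) -> None:
    """Copy a binary stream to `path`, making sure the data reaches the disk.

    Any deferred write-back error surfaces as an OSError from os.fsync()
    or os.close(), instead of being silently lost.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while True:
            block = stream.read(65536)     # 64 KiB at a time (arbitrary size)
            if not block:
                break
            written = 0
            while written < len(block):    # os.write() may write partially
                written += os.write(fd, block[written:])
        os.fsync(fd)                       # force buffered data out to disk
    finally:
        os.close(fd)                       # close() errors propagate as well
```

Called as save_stream(sys.stdin.buffer, "outcome.txt"), this plays the role of the hypothetical mysaver in "application | mysaver outcome.txt".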
18. What are the limits of MooseFS (e.g. file size limit, file system size limit, maximum number of files that can be stored on the file system)?
The maximum file size in MooseFS is limited to 2^57 bytes = 128 PiB.
The maximum file system size is limited to 2^64 bytes = 16 EiB = 16 384 PiB.
The maximum number of files that can be stored on one MooseFS instance is 2^31, i.e. more than 2.1 billion.
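The powers of two behind these limits can be sanity-checked directly:

```python
# The published MooseFS limits, expressed as powers of two
max_file_size = 2**57   # bytes
max_fs_size = 2**64     # bytes
max_files = 2**31

assert max_file_size == 128 * 2**50       # 128 PiB
assert max_fs_size == 16 * 2**60          # 16 EiB
assert max_fs_size == 16_384 * 2**50      # 16 384 PiB
assert max_files > 2_100_000_000          # more than 2.1 billion files
```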
19. Can I set up HTTP basic authentication for mfscgiserv?
Mfscgiserv is a very simple HTTP server, capable only of running MooseFS CGI scripts. It does not support any additional features such as HTTP authentication. However, the MooseFS CGI scripts may be served from another full-featured HTTP server with CGI support, such as lighttpd or Apache. When using a full-featured HTTP server such as Apache, you can also take advantage of features offered by its other modules, such as HTTPS transport. Just place the CGI and its data files (index.html, mfs.cgi, chart.cgi, mfs.css, acidtab.js, logomini.png, err.gif) under the chosen DocumentRoot. If you already have an HTTP server instance running on a given host, you can optionally serve the CGI monitor on a separate virtual host or port.
20. Can I run a mail server application on MooseFS? The mail server is a very busy application with a huge number of small files; will I lose any files?
You can run a mail server on MooseFS. You won't lose any files under a high system load. When the file system is busy, it will block until its operations are complete, which will just cause the mail server to slow down.
21. Are there any suggestions for the network, MTU or bandwidth?
We recommend using jumbo frames (MTU = 9000). With a larger number of chunkservers, the switches should be connected through optical fiber or use aggregated links.
22. Does MooseFS support supplementary groups?
Yes.
23. Does MooseFS support file locking?
Yes, since MooseFS 3.0.
24. Can I assign IP addresses to chunkservers via DHCP?
Yes, but we strongly recommend setting "DHCP reservations" based on MAC addresses.
25. Some of my chunkservers utilize 90% of space while others only 10%. Why does the rebalancing process take so long?
Our experience from production environments shows that aggressive replication is not desirable, as it can substantially slow down the whole system. The overall performance of the system is more important than equal utilization of hard drives across all chunkservers. By default, replication is configured as a non-aggressive operation. In our environment it usually takes about one week for a new chunkserver to reach standard HDD utilization. Aggressive replication would make the whole system considerably slow for several days.
You can adjust the replication speed with the following two options, read when the master server starts:
CHUNKS_WRITE_REP_LIMIT
The maximum number of chunks being replicated to one chunkserver (default: 2,1,1,4).
A single number is treated as four identical numbers; otherwise, the four numbers are separated by commas.
The first limit applies to endangered chunks (chunks with only one copy)
The second limit applies to undergoal chunks (chunks with fewer copies than the specified goal)
The third limit applies to rebalancing between servers with space usage around the arithmetic mean
The fourth limit applies to rebalancing between the other servers (with very low or very high space usage)
Usually the first number should be greater than or equal to the second, the second greater than or equal to the third, and the fourth greater than or equal to the third (1st >= 2nd >= 3rd, 4th >= 3rd).
CHUNKS_READ_REP_LIMIT
The maximum number of chunks being replicated from one chunkserver; it takes four limits in the same format.
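Put together, a sketch of the relevant line in mfsmaster.cfg might look like this (the values shown are just the documented defaults, not a tuning recommendation):

```
# mfsmaster.cfg
# limits, in order: endangered, undergoal, rebalance near mean, rebalance extremes
CHUNKS_WRITE_REP_LIMIT = 2,1,1,4
```

After changing the value, restart or reload the master server for it to take effect.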