Big Data in Plain Language: HDFS

Updated 2025-02-27. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/03 Report--

Many books explain this in a way that is too esoteric: they translate the official documentation, perhaps with a few notes added. Beginners need plain language, the simpler the better, so let me sort it out. Here is the outline:

1. Why use Hadoop?

2. What is a file system?

3. The development process of the hard disk

4. Let's take a look at HDFS

5. Practical process

Let's go through them one by one.

1. Why use Hadoop?

Because distributed storage plus distributed computing can hold more than any single computer can, and a pile of CPUs spread over many machines computes faster than the handful of CPUs inside one machine.

2. What is a file system?

So what is HDFS? HDFS stands for Hadoop Distributed File System. You can think of it as the Hadoop module dedicated to managing file storage.

To understand HDFS properly, we first need to understand what a file system is, since HDFS is, translated literally, Hadoop's distributed file system. So let's look at file systems first.

Imagine you were given a disk with 100 GB of free space. How would you use it?

Of course we all know you format it first, but do you know why? Formatting is what sets up the file system.

It is actually very simple. Many people have defragmented a disk, so they probably have some idea of what a disk cluster is. Clusters are created at format time; they are part of the file system. Think of the disk's storage space as a blank sheet of paper, and the data as the words to be written on it. Formatting is like ruling lines on the paper to divide it into small squares. On a hard disk, that means marking out a great many clusters, each typically a few kilobytes in size (2 KB, say). A cluster on disk is a little more complicated than a square on paper, though, because it carries some extra bookkeeping, which at least records: where is my previous cluster, and where is my next cluster?

Now, when you store a file, you first have to find space for it on the disk. Ideally we want one continuous stretch of space, because then writing and reading are very efficient. But your file may keep being edited. One day it may grow from 20 MB to 30 MB, and when you need the extra space you find that, oh, the space right behind the 20 MB is gone, occupied by another file. What now? Do you move all the following files back by 10 MB? I doubt you would do anything that inefficient. Instead, you just find another 10 MB of space somewhere else, and then point the "where is my next cluster" record of the last cluster of the first 20 MB at the first cluster of the newly found 10 MB. That is the kind of service a file system provides.
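To make the "point the last cluster at the new space" trick concrete, here is a minimal toy sketch in Python. It is not how any real file system is coded; the dictionary, the END marker and the cluster numbers are all made up for illustration.

```python
# Toy model: each cluster records the number of the next cluster of the same file.
END = -1  # hypothetical marker meaning "there is no next cluster"

def read_chain(next_cluster, first):
    """Follow the 'next cluster' links from the first cluster to the end."""
    chain = []
    current = first
    while current != END:
        chain.append(current)
        current = next_cluster[current]
    return chain

# Our file sits in clusters 0..3, one after another.
# Clusters 4 and 5 right behind it already belong to another file.
next_cluster = {0: 1, 1: 2, 2: 3, 3: END, 4: 5, 5: END}

# The file grows. Nothing is moved: we take free clusters 6 and 7
# and point the old last cluster (3) at the first new one (6).
next_cluster[6] = 7
next_cluster[7] = END
next_cluster[3] = 6

file_clusters = read_chain(next_cluster, 0)
print(file_clusters)  # [0, 1, 2, 3, 6, 7]
```

The file is no longer continuous on disk, which is exactly why heavily edited disks get fragmented, but nothing had to be moved to make it bigger.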

By now you will have noticed that the file system has to know the overall state of every cluster on the disk, such as which clusters hold data and which are blank, right?

So how does the file system know? Remember some words you have heard before: FAT, then FAT16 and FAT32, then NTFS. (These are all file systems used on Windows. I am not covering the Linux file systems, but the principle should be the same.) Let's start with the simplest. FAT stands for File Allocation Table, and it is usually placed at the head of the disk. You can more or less guess what it does from the name: it is the file system's main register of which clusters each file is stored in. The FAT is very important and must not be lost. Once it is lost, the files on the disk can no longer be found (unless a disk repair tool can rebuild it, or you are rich enough to pay for professional recovery). You can think of the FAT as the master table of contents for everything stored on the disk. Whenever files on the disk are deleted, added, moved and so on, the contents of the FAT change accordingly.
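If you prefer to see the idea as code, below is a small Python sketch of a FAT-like table. It is a simplified illustration, not the real on-disk FAT layout: the class name ToyFat, the FREE and END markers and the file names are all assumptions made up for this example.

```python
FREE = None  # toy marker: this cluster is unused
END = -1     # toy marker: this is the last cluster of a file

class ToyFat:
    def __init__(self, cluster_count):
        self.table = [FREE] * cluster_count  # one entry per cluster
        self.directory = {}                  # file name -> first cluster

    def free_clusters(self):
        """List every cluster the table marks as unused."""
        return [i for i, entry in enumerate(self.table) if entry is FREE]

    def create(self, name, clusters_needed):
        """Claim free clusters and link them into one chain."""
        free = self.free_clusters()
        if clusters_needed < 1 or len(free) < clusters_needed:
            raise ValueError("not enough free clusters")
        chosen = free[:clusters_needed]
        for here, nxt in zip(chosen, chosen[1:]):
            self.table[here] = nxt
        self.table[chosen[-1]] = END
        self.directory[name] = chosen[0]

    def clusters_of(self, name):
        """Follow the chain of one file through the table."""
        chain, current = [], self.directory[name]
        while current != END:
            chain.append(current)
            current = self.table[current]
        return chain

    def delete(self, name):
        """Deleting only changes the table; the data itself is not touched."""
        for cluster in self.clusters_of(name):
            self.table[cluster] = FREE
        del self.directory[name]

fat = ToyFat(8)
fat.create("a.txt", 3)   # takes clusters 0, 1, 2
fat.create("b.txt", 2)   # takes clusters 3, 4
fat.delete("a.txt")      # clusters 0, 1, 2 become free again
fat.create("c.txt", 4)   # reuses 0, 1, 2 and jumps over b.txt to 5
print(fat.clusters_of("c.txt"))  # [0, 1, 2, 5]
print(fat.free_clusters())       # [6, 7]
```

Notice that every operation only reads or changes the table. That is also why losing the table is so fatal: the data would still be sitting in the clusters, but nobody would know which cluster belongs to which file.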

So let's think about it: how big is such a FAT, how much can it record, and how many clusters can it manage? That's right, it has limits. FAT16 can manage no more than 2 GB of disk space. FAT32 improved on this and can manage about 2 TB. With NTFS there is, for practical purposes, no space limit, because NTFS keeps many FAT32-like structures that are no longer concentrated at the head of the disk but spread over many places, and their number varies with disk capacity.

What, you don't believe it?

FAT16 really could manage only 2 GB of disk space. Remember how, in those days, the disk of a newly bought computer had to be partitioned? This limit was one of the reasons.
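Where do numbers like 2 GB and 2 TB come from? A rough back-of-the-envelope calculation in Python is shown below. It rests on assumptions the article does not state: that FAT16 addresses at most 2^16 clusters of at most 32 KB each, and that FAT32's volume size is capped by a 32-bit count of 512-byte sectors. Reserved entries and other details are ignored, so treat it as an estimate, not a specification.

```python
KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4

# Assumption: 16-bit cluster numbers, largest common cluster size of 32 KB.
fat16_bytes = (2 ** 16) * 32 * KIB

# Assumption: 32-bit sector count, 512 bytes per sector.
fat32_bytes = (2 ** 32) * 512

print(fat16_bytes / GIB)  # 2.0
print(fat32_bytes / TIB)  # 2.0
```

So the limit is not magic. A table with fixed-width entries can only count so many pieces, and each piece can only be so big.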

Well, that was a quick review of the FAT file system of years past, and by now you probably understand what a file system is for. With it, our operating system can store files with peace of mind. It is like a newly bought house: we would never just pile our things on the floor. We buy some furniture so that the home is, so to speak, formatted. Then the things we put away are kept in order and are easy to find when we look for them.

3. The development process of hard disk

This part is relatively simple. When I first learned to use a computer, hard disk space was measured in megabytes. Later, when I got a 1 GB disk, I thought it was awesome.

Nowadays a disk is routinely 2 TB.

On top of that, our servers have RAID, that is, disk array technology, which can combine a pile of disks so that they are used as a single disk with more storage space. This used to be advanced technology; now it is commonplace.

So, do you think disks have been developing fast enough? In fact they have developed very fast. But data has been growing faster. Especially in the Internet age, it really is an explosion.

No matter how impressive the disks and arrays are, there is never enough storage. For example, last year the daily data increment was about 150 TB. Can you imagine?

Even if you fill several large drives today and find several more tomorrow, what happens when you need to look the data up, or analyse it?

What can be done? Does hard disk technology itself have to change?

People are clever, and it is natural to think: we can imitate disk array technology, write a piece of software, and have it manage all the disks on a whole lot of machines, so that together they form one super-large virtual disk. Yes, that has to be possible. This is the line of thinking behind a DFS (distributed file system). Hadoop took this idea and implemented it, and the result is its HDFS.

So you can see the development path of the disk.

Small disk -> large disk -> disk array -> virtual distributed disk group

4. Let's take a look at HDFS.

Since we want to build a virtual disk, does it have to store things the way a real disk does? The way of thinking is indeed similar.

Something like the FAT we discussed above is needed for the distributed disk too, though naturally under a different name. Hadoop calls it the namenode.

In addition, an ordinary hard disk has clusters after formatting. What does HDFS have after it is formatted? Small storage compartments as well, of course, one after another (HDFS calls them blocks), and Hadoop calls the machines that hold them datanodes.

The namenode must live on one machine or a few machines. It is the equivalent of the master directory of the entire virtual disk: it records which datanodes are empty and which datanodes have files on them. See, isn't that very similar to FAT? The mind-bending part is that it now manages a great deal of disk storage spread over many machines. So when you deploy an HDFS cluster, you can picture it as running one big virtual disk. Neat, isn't it?
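Here is a toy sketch in Python of the namenode and datanode idea. It is only an illustration under loud assumptions: the classes ToyNameNode and ToyDataNode, the 4-byte block size and the round-robin placement are all made up. In real HDFS the blocks are far larger, each block is copied to several datanodes, and the file data travels between the client and the datanodes rather than through the namenode; the toy lets the namenode hand the data over only to keep the example short.

```python
class ToyDataNode:
    """One machine's storage: it only holds blocks, it knows nothing about files."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

class ToyNameNode:
    """The master directory: it records which blocks make up which file, and where."""
    def __init__(self, datanodes, block_size):
        self.datanodes = datanodes
        self.block_size = block_size
        self.files = {}  # file name -> list of (datanode, block id)
        self.next_block_id = 0

    def write(self, filename, data):
        locations = []
        for offset in range(0, len(data), self.block_size):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Made-up placement rule: spread the blocks round-robin.
            node = self.datanodes[block_id % len(self.datanodes)]
            node.blocks[block_id] = data[offset:offset + self.block_size]
            locations.append((node, block_id))
        self.files[filename] = locations

    def read(self, filename):
        # Look up the block list, then collect each block from its datanode.
        return b"".join(node.blocks[bid] for node, bid in self.files[filename])

nodes = [ToyDataNode("dn1"), ToyDataNode("dn2"), ToyDataNode("dn3")]
namenode = ToyNameNode(nodes, block_size=4)
namenode.write("hello.txt", b"hello big data")

placement = [(node.name, bid) for node, bid in namenode.files["hello.txt"]]
print(placement)                   # [('dn1', 0), ('dn2', 1), ('dn3', 2), ('dn1', 3)]
print(namenode.read("hello.txt"))  # b'hello big data'
```

Compare this with the FAT picture: the table of "which piece belongs to which file" is still there, only now the pieces live on different machines instead of different places on one disk.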

Once you have understood the above, you can read another article that I think is very good. I am linking it here for reference:

http://www.cnblogs.com/laov/p/3434917.html
