How to ensure the high availability of data in the server

2025-04-18 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces how to ensure the high availability of data in the server, walking through a practical case. I hope it helps you solve the problem.

To ensure high availability and fault tolerance (resilience), distributed systems commonly use two techniques:

Replication

Partitioning

Replication saves N copies of every piece of data written, so the data can be recovered after a failure. It also makes read/write splitting possible, which relieves read pressure. Partitioning applies when the data exceeds the storage capacity of a single machine: the data is split across multiple machines according to certain rules. In practice, each partition is itself replicated so that the data of that partition is not lost.
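The two ideas can be combined in a few lines. The following is a minimal sketch, not how any particular system does it: the helper names `partition_for` and `replica_nodes`, the node names, and the ring-style replica choice are all assumptions made for illustration.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash.

    Python's built-in hash() is salted per process, so md5 is used here
    purely to get the same partition on every run.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


def replica_nodes(partition: int, nodes: list[str], replication_factor: int) -> list[str]:
    """Pick `replication_factor` distinct nodes for a partition.

    Hypothetical rule: start at the partition's "home" node and walk the
    node list as a ring, so every copy lands on a different node.
    """
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    start = partition % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    p = partition_for("photo-123.jpg", num_partitions=8)
    print(p, replica_nodes(p, nodes, replication_factor=3))
```

Partitioning decides which slice of the data a key belongs to, and replication decides how many machines hold that slice. Losing one node then costs at most one copy of any partition.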

Once the data is replicated, can you rest easy?

That would be the ideal situation. In reality, server room failures, hard disk damage, power outages and similar problems happen every day.

For example, suppose you back up the photos and files on your phone to a cloud drive, then happily clean up the phone and free a lot of space. One day you go to look, and some of the data on the cloud drive cannot be found. Customer service tells you that the hard drive holding your backup files in the server room broke one day, and the files can never be recovered. How would you feel?

You would surely ask customer service, angrily, why they did not save more copies. But what if the entire rack holding those hard drives goes down?

It is the same with application services: to deal with failures and ensure high availability, besides avoiding single points of failure, you have to consider deploying instances in different server rooms, so that even if one server room has a problem, another can still carry the load.

For data replication and backup, clever people came up with a similar idea: store the copies on different hard drives, different racks, or even in different server rooms. As the saying goes, a cunning rabbit has three burrows. :-)

The cloud drives we commonly use, and the various storage products offered by cloud vendors, are backed by distributed storage services that ensure high availability, elastic fault tolerance, and so on. Many of the techniques described in DDIA (Designing Data-Intensive Applications), the classic book we shared before, are used in these services.

As the core storage layer of Hadoop, HDFS also supports this safer way of spreading replicas across locations internally. In HDFS, this technique is called Rack Awareness.

A rack is a physical frame in a server room or data center that holds a group of machines, which are connected and managed over the network.

In Hadoop, to speed up reading and writing HDFS files over the cluster network, the NameNode, which manages the metadata, selects DataNodes on the nearest rack for reads and writes. The NameNode stores the mapping between rack IDs and DataNodes, and the replicas themselves are written to the DataNodes. A rack ID is essentially a label.

The process by which the NameNode chooses a closer DataNode based on rack ID is called Rack Awareness.
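The notion of "closer" can be illustrated with a tree distance over rack paths. This is a minimal sketch, not Hadoop's actual Java implementation. It assumes each node is described by a hypothetical location string of the form `/rack/node`, and the function names are made up for the example.

```python
def distance(a: str, b: str) -> int:
    """Hops from each node up to their closest common ancestor, summed.

    Same node -> 0, same rack -> 2, different racks -> 4.
    """
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


def sort_by_proximity(reader: str, replicas: list[str]) -> list[str]:
    """Order replica locations from nearest to farthest from the reader."""
    return sorted(replicas, key=lambda replica: distance(reader, replica))


if __name__ == "__main__":
    reader = "/rack1/dn1"
    replicas = ["/rack2/dn5", "/rack1/dn2", "/rack1/dn1"]
    print(sort_by_proximity(reader, replicas))
    # -> ['/rack1/dn1', '/rack1/dn2', '/rack2/dn5']
```

A reader on `/rack1/dn1` is served from its own node first, then from a neighbor on the same rack, and only then from another rack, which keeps traffic off the links between racks.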

By default, Hadoop treats all DataNodes as belonging to the same rack, which easily causes problems. With rack awareness enabled, the effect is similar to the following figure:

When we save a file, it is split into 128 MB blocks, and the client then asks the NameNode for the addresses of the DataNodes that will store them. The default is 3 replicas, placed according to this principle: "for each block, two replicas are stored on the same rack, and the third replica is stored on a different rack". This rule is known as the Replica Placement Policy.
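The block arithmetic and the "two on one rack, one on another" rule can be sketched as follows. This only illustrates the rule as stated above. The real HDFS policy also considers where the writer runs and the state of each node, which is ignored here, and the topology dictionary and function names are assumptions.

```python
import random

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size mentioned above


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed for a file (ceiling division)."""
    return -(-file_size // block_size)


def place_replicas(topology: dict[str, list[str]], rng=random) -> list[tuple[str, str]]:
    """Return three (rack, node) pairs: two on one rack, one on another.

    `topology` maps a rack ID to the DataNodes on that rack (hypothetical
    input format chosen for this example).
    """
    racks_with_two = [rack for rack, dns in topology.items() if len(dns) >= 2]
    if not racks_with_two:
        raise ValueError("need a rack with at least two DataNodes")
    shared = rng.choice(racks_with_two)
    other_racks = [rack for rack, dns in topology.items() if rack != shared and dns]
    if not other_racks:
        raise ValueError("need a second rack with at least one DataNode")
    other = rng.choice(other_racks)
    first, second = rng.sample(topology[shared], 2)
    third = rng.choice(topology[other])
    return [(shared, first), (shared, second), (other, third)]


if __name__ == "__main__":
    topology = {"/rack1": ["dn1", "dn2"], "/rack2": ["dn3", "dn4"]}
    print(block_count(300 * 1024 * 1024))  # a 300 MB file needs 3 blocks
    print(place_replicas(topology))
```

Because the three replicas always span two racks, losing any single rack still leaves at least one copy of every block.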

How is the rack decided when storing? As mentioned earlier, it is determined by the rack ID, which Hadoop obtains either by executing an external script or by calling a Java class specified in the configuration file.

The official documentation gives an example topology script written in Python.
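What follows is a minimal sketch of such a script, not the official listing. It assumes each rack sits on its own /24 subnet, so the network address can serve as the rack ID, and it uses the standard `ipaddress` module. Hadoop calls the script configured under `net.topology.script.file.name` with one or more addresses as arguments and reads one rack ID per address from standard output. The `/rack-unknown` fallback name is an arbitrary choice for this example.

```python
#!/usr/bin/env python3
import ipaddress
import sys

PREFIX_LENGTH = 24  # assumption: one /24 subnet per rack


def rack_for(host: str, prefix_length: int = PREFIX_LENGTH) -> str:
    """Derive a rack ID from an IP address by taking its network address."""
    try:
        network = ipaddress.ip_network(f"{host}/{prefix_length}", strict=False)
    except ValueError:
        # Not a parsable IP address (for example a bare hostname).
        return "/rack-unknown"
    return f"/{network.network_address}"


if __name__ == "__main__":
    # Print one rack ID per argument, in the order the arguments were given.
    for arg in sys.argv[1:]:
        print(rack_for(arg))
```

Running the script with `10.1.2.37 10.1.3.9` prints `/10.1.2.0` and `/10.1.3.0`, so the two DataNodes are treated as sitting on different racks.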

Why do we need Rack Awareness?

It ensures the high availability and reliability of the data.

It improves the performance of the cluster.

It avoids data loss when an entire rack fails.
