2025-02-24 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/02 Report--
I. fsimage and edits files
1. Basic concepts
Txid:
The namenode assigns a unique id to every operation (add, delete, or modify). This id is called the txid. It usually starts from 0 and is incremented by 1 for each new operation.
Fsimage:
It is an image of the namenode's in-memory metadata, stored on local disk. The fsimage usually does not contain the most recent operations that have not yet been merged into it, so in practice there is a gap between the fsimage and the in-memory metadata. It is not an operation log; it contains the serialized information of every directory and file inode in the HDFS file system. Files are generally named fsimage_txid, where the txid is that of the latest operation recorded in that fsimage.
Edits:
This is the operation log that records the add, delete, and modify operations performed on the namenode. If you know MySQL, it is similar to the MySQL binary log.
2. Directory structure of namenode
[root@bigdata121 tmp]# tree dfs/name
dfs/name
├── current
│   ├── edits_0000000000000000001-0000000000000000002
│   ├── edits_0000000000000000003-0000000000000000004
│   ├── edits_0000000000000000005-0000000000000000006
│   ├── edits_0000000000000000007-0000000000000000008
│   ├── edits_0000000000000000009-0000000000000000009
│   ├── edits_0000000000000000010-0000000000000000011
│   ├── edits_0000000000000000012-0000000000000000013
│   ├── edits_0000000000000000014-0000000000000000015
│   ├── edits_0000000000000000016-0000000000000000017
│   ├── edits_0000000000000000018-0000000000000000019
│   ├── edits_0000000000000000020-0000000000000000021
│   ├── edits_0000000000000000022-0000000000000000024
│   ├── edits_0000000000000000025-0000000000000000026
│   ├── edits_inprogress_0000000000000000027
│   ├── fsimage_0000000000000000024
│   ├── fsimage_0000000000000000024.md5
│   ├── fsimage_0000000000000000026
│   ├── fsimage_0000000000000000026.md5
│   ├── seen_txid
│   └── VERSION
└── in_use.lock
In summary, the directory can be simplified to the following structure:
dfs/name
├── current
│   ├── edits_txid1-txid2                      there may be several; old edits files that have already been rolled
│   ├── edits_inprogress_txid3                 the edits file currently in use
│   ├── fsimage_0000000000000000024            fsimage file
│   ├── fsimage_0000000000000000024.md5        md5 checksum of the fsimage file
│   ├── seen_txid                              records the latest txid
│   └── VERSION                                records some basic information about the HDFS cluster
└── in_use.lock                                lock file; prevents more than one namenode from being started on this directory
(1) Contents of the VERSION file
# HDFS can have multiple namenodes. Each namenode has a different namespaceID and manages its own block pool (blockpoolID).
namespaceID=983105879
# Cluster ID, globally unique.
clusterID=CID-c12b7022-0c51-49c5-942f-edc889d37fee
# Marks the time when the namenode storage directory was created. For a storage system that has just been created, this property is 0. After a file system upgrade, the value is updated to the new timestamp.
cTime=1558262787574
# Marks whether the storage directory belongs to a namenode or a datanode.
storageType=NAME_NODE
# A block pool id identifies a block pool and is globally unique across clusters. When a new namespace is created (as part of formatting), a unique ID is generated and persisted. Generating a globally unique blockpoolID at creation time is more reliable than configuring one by hand. The NN persists the blockpoolID to disk, then loads and reuses it on every later startup.
blockpoolID=BP-473222668-192.168.50.121-1558262787574
# Version of the on-disk layout of the HDFS metadata; you normally do not need to care about it.
layoutVersion=-63
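As a small illustration of the VERSION file described above, the sketch below reads its key=value lines into a dictionary. It assumes the file uses plain `key=value` lines with `#` comments, as in the listing; the function name `parse_version_file` is made up for this example and is not part of Hadoop.

```python
def parse_version_file(text):
    """Parse the key=value lines of a namenode VERSION file into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines starting with '#'
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Values copied from the VERSION file shown in the article
sample = """\
# cluster ID, globally unique
namespaceID=983105879
clusterID=CID-c12b7022-0c51-49c5-942f-edc889d37fee
cTime=1558262787574
storageType=NAME_NODE
blockpoolID=BP-473222668-192.168.50.121-1558262787574
layoutVersion=-63
"""

props = parse_version_file(sample)
print(props["clusterID"], props["storageType"])
```

All values are kept as strings; convert `cTime` or `layoutVersion` with `int()` if you need to compare them numerically.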
(2) seen_txid
The latest txid is recorded in this file.
(3) The directory structure of the SNN is the same as that of the namenode, except that the most recent edits files are missing.
3. The relationship between fsimage and edits file naming
The names of the fsimage and edits files above all end in a long string of digits. Those digits are txids, and the naming follows a few rules.
Edits file:
Finished edits files are named edits_00000xxx-000000xxx, which gives the range of txids recorded in that file. edits_inprogress_00000xxx is the edits file currently being written; the txid in its name is the first txid recorded in that file.
Fsimage file:
Named fsimage_000000xxx, where the txid is that of the latest operation recorded in the fsimage file. Note that edits files are merged into the fsimage only when a checkpoint is triggered; until then they are not merged. So in general, the txids of the newest edits files are larger than the txid of the newest fsimage.
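The naming rules above can be sketched in a few lines of Python. Given a listing of the current directory, the function finds the newest fsimage and the edits files that still contain operations after it. This is only an illustration of the naming convention, not Hadoop's own code; the function and helper names are invented for the example.

```python
import re

EDITS_RE = re.compile(r"^edits_(\d+)-(\d+)$")
INPROGRESS_RE = re.compile(r"^edits_inprogress_(\d+)$")
FSIMAGE_RE = re.compile(r"^fsimage_(\d+)$")


def edits_after_latest_fsimage(names):
    """Return (txid of newest fsimage, edits files holding later txids)."""
    fsimage_txids = []
    for name in names:
        m = FSIMAGE_RE.match(name)
        if m:
            fsimage_txids.append(int(m.group(1)))
    if not fsimage_txids:
        raise ValueError("no fsimage file found")
    latest = max(fsimage_txids)

    needed = []
    # Zero-padded names sort in txid order
    for name in sorted(names):
        finished = EDITS_RE.match(name)
        if finished and int(finished.group(2)) > latest:
            needed.append(name)
        elif INPROGRESS_RE.match(name):
            # The in-progress file always holds the newest operations
            needed.append(name)
    return latest, needed


def txid(n):
    """Format a txid the way the file names do: 19 zero-padded digits."""
    return f"{n:019d}"


# The tail of the directory listing shown earlier
listing = [
    f"edits_{txid(22)}-{txid(24)}",
    f"edits_{txid(25)}-{txid(26)}",
    f"edits_inprogress_{txid(27)}",
    f"fsimage_{txid(24)}",
    f"fsimage_{txid(24)}.md5",
    f"fsimage_{txid(26)}",
    f"fsimage_{txid(26)}.md5",
    "seen_txid",
    "VERSION",
]
latest, needed = edits_after_latest_fsimage(listing)
print(latest, needed)
```

With the listing from the article, only the in-progress file is newer than fsimage_...026; if that fsimage were missing, the 25-26 segment would be needed as well.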
4. View the content of the fsimage file
// Format: hdfs oiv -p <output format> -i <input file> -o <output file>
[root@bigdata121 current]# hdfs oiv -p XML -i fsimage_0000000000000000037 -o /tmp/fsimage37.xml
As mentioned earlier, the fsimage mainly records metadata: the directory structure stored in HDFS, the files under each directory, and the metadata of those directories and files. Here is an excerpt:
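<fsimage>
<version>
  <layoutVersion>-63</layoutVersion>
  <onDiskVersion>1</onDiskVersion>
  <oivRevision>17e75c2a11685af3e043aa5e604dc831e5b14674</oivRevision>
</version>
<NameSection>
  <namespaceId>983105879</namespaceId>
  <genstampV1>1000</genstampV1>
  <genstampV2>1014</genstampV2>
  <genstampV1Limit>0</genstampV1Limit>
  <lastAllocatedBlockId>1073741837</lastAllocatedBlockId>
  <txid>334</txid>
</NameSection>
<INodeSection>
  <lastInodeId>16407</lastInodeId>
  <numInodes>16</numInodes>
  <!-- This is the key part: it records the directory structure and the metadata -->
  <inode>
    <id>16386</id>
    <!-- this is a directory -->
    <type>DIRECTORY</type>
    <!-- the name is test -->
    <name>test</name>
    <!-- modification time -->
    <mtime>1558263065070</mtime>
    <!-- permission -->
    <permission>root:supergroup:0755</permission>
    <nsquota>-1</nsquota>
    <dsquota>-1</dsquota>
  </inode>
  <inode>
    <id>16387</id>
    <!-- this is a file named edit_new.xml -->
    <type>FILE</type>
    <name>edit_new.xml</name>
    <replication>2</replication>
    <mtime>1558263065045</mtime>
    <atime>1558269494520</atime>
    <preferredBlockSize>134217728</preferredBlockSize>
    <!-- permission -->
    <permission>root:supergroup:0644</permission>
    <!-- block information: which blocks the file consists of -->
    <blocks>
      <block>
        <id>1073741825</id>
        <genstamp>1001</genstamp>
        <numBytes>5800</numBytes>
      </block>
    </blocks>
  </inode>
  ...
</INodeSection>
</fsimage>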
As this excerpt shows, the fsimage records the directory structure of the current file system and the corresponding metadata. The edits file is different: it records the operations performed on the file system.
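To make the structure of the exported fsimage more concrete, here is a minimal sketch that lists the inodes from such an XML file with Python's standard library. The element names (`inode`, `id`, `type`, `name`, `permission`) are an assumption based on the XML that `hdfs oiv -p XML` produces; check them against your own output file. The sample document is a cut-down, hand-written stand-in that reuses the two inodes discussed above.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; element names are assumed, values come from the article
SAMPLE = """<fsimage><INodeSection>
<inode><id>16386</id><type>DIRECTORY</type><name>test</name>
<permission>root:supergroup:0755</permission></inode>
<inode><id>16387</id><type>FILE</type><name>edit_new.xml</name>
<permission>root:supergroup:0644</permission></inode>
</INodeSection></fsimage>"""


def list_inodes(xml_text):
    """Return (id, type, name, permission) for every inode element."""
    root = ET.fromstring(xml_text)
    rows = []
    for inode in root.iter("inode"):
        rows.append((
            int(inode.findtext("id")),
            inode.findtext("type"),
            inode.findtext("name"),
            inode.findtext("permission"),
        ))
    return rows


for row in list_inodes(SAMPLE):
    print(row)
```

For a real export such as /tmp/fsimage37.xml, read the file contents and pass them to `list_inodes`; very large images would be better handled with `ET.iterparse`.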
5. View the content of the edits file
// Format: hdfs oev -p <output format (default XML)> -i <input file> -o <output file>
[root@bigdata121 current]# hdfs oev -i edits_inprogress_0000000000000000038 -o /tmp/edits_inprogess.xml
Again, here is an excerpt:
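<EDITS>
  <EDITS_VERSION>-63</EDITS_VERSION>
  <RECORD>
    <!-- the category of the operation; here it means the log segment starts -->
    <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
    <DATA>
      <!-- similar to a transaction ID; it is unique -->
      <TXID>38</TXID>
    </DATA>
  </RECORD>
  <!-- each RECORD records a single operation -->
  <RECORD>
    <!-- an operation like this indicates a file upload -->
    <OPCODE>OP_ADD_BLOCK</OPCODE>
    <DATA>
      <TXID>34</TXID>
      <PATH>/jdk-8u144-linux-x64.tar.gz._COPYING_</PATH>
      <BLOCK>
        <BLOCK_ID>1073741825</BLOCK_ID>
        <GENSTAMP>1001</GENSTAMP>
      </BLOCK>
      <BLOCK>
        <BLOCK_ID>1073741826</BLOCK_ID>
        <NUM_BYTES>0</NUM_BYTES>
        <GENSTAMP>1002</GENSTAMP>
      </BLOCK>
      <RPC_CALLID>-2</RPC_CALLID>
    </DATA>
  </RECORD>
</EDITS>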
Each RECORD records one operation, such as the ones in the excerpt above.
OP_ADD stands for the add-file operation. In general, a record also contains very useful information such as:
File path (PATH)
Modification time (MTIME)
Access time (ATIME)
Client name (CLIENT_NAME)
Client address (CLIENT_MACHINE)
Permissions (PERMISSION_STATUS)
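As a rough illustration of how the exported edits XML can be inspected, the sketch below counts the operations by type and collects the paths of OP_ADD records. The element names (`RECORD`, `OPCODE`, `DATA`, `TXID`, `PATH`) are assumed from the field names mentioned above and from typical `hdfs oev` output; the sample document and its `/demo.txt` path are hand-written for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample; element names are assumed, /demo.txt is made up
SAMPLE = """<EDITS>
<EDITS_VERSION>-63</EDITS_VERSION>
<RECORD><OPCODE>OP_START_LOG_SEGMENT</OPCODE><DATA><TXID>38</TXID></DATA></RECORD>
<RECORD><OPCODE>OP_ADD</OPCODE><DATA><TXID>39</TXID><PATH>/demo.txt</PATH></DATA></RECORD>
<RECORD><OPCODE>OP_ADD_BLOCK</OPCODE><DATA><TXID>40</TXID><PATH>/demo.txt</PATH></DATA></RECORD>
</EDITS>"""


def summarize_edits(xml_text):
    """Return (Counter of opcodes, list of paths created by OP_ADD)."""
    root = ET.fromstring(xml_text)
    opcodes = Counter()
    added_paths = []
    for record in root.iter("RECORD"):
        opcode = record.findtext("OPCODE")
        opcodes[opcode] += 1
        if opcode == "OP_ADD":
            added_paths.append(record.findtext("DATA/PATH"))
    return opcodes, added_paths


opcodes, added_paths = summarize_edits(SAMPLE)
print(dict(opcodes), added_paths)
```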
6. Manually roll the edits log
Format: hdfs dfsadmin -rollEdits
7. Configuration of the NN, SNN, and DN data directories
(1) hadoop.tmp.dir is configured
If hadoop.tmp.dir is configured in core-site.xml, the respective data directories are as follows:
NN: ${hadoop.tmp.dir}/dfs/name              fsimage and edits files are stored in this directory
SNN: ${hadoop.tmp.dir}/dfs/namesecondary    SNN data directory
DN: ${hadoop.tmp.dir}/dfs/data              datanode data directory
(2) Set up separate directories
If you do not set hadoop.tmp.dir, you need to set the data directories of the NN, SNN, and DN manually; otherwise the data files are generated under /tmp/hadoop-root/dfs/ (when running as root). The parameters are as follows:
/* All of these are set in hdfs-site.xml */
NN:
// If only one of these two is set, both the fsimage and edits files are stored in that one directory.
dfs.namenode.name.dir     sets the fsimage storage path
dfs.namenode.edits.dir    sets the edits storage path
DN:
dfs.datanode.data.dir     the datanode storage directory
SNN:
dfs.namenode.checkpoint.dir    the SNN storage directory
(3) namenode multi-directory settings
When setting the namenode working directory separately, you can give dfs.namenode.name.dir several values separated by commas. Running hdfs namenode -format then formats every listed directory, and the contents of the directories are kept identical while the namenode runs, so they serve as an extra backup of the namenode data. For example:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///${hadoop.tmp.dir}/dfs/name1,file:///${hadoop.tmp.dir}/dfs/name2</value>
</property>
II. namenode and SNN workflow
1. Namenode startup phase
(1) When the namenode is started for the first time (that is, right after it has been formatted), the fsimage and edits files are created automatically. On later starts, the namenode loads the latest fsimage into memory and replays the edits files from the fsimage's txid up to the latest operation (as recorded in the seen_txid file), so that memory ends up holding the latest metadata. It then creates a new edits file for recording operations, named edits_inprogress_xxxx.
(2) The client sends an add, delete, or modify request to the namenode.
(3) The namenode responds to the request.
(4) When the namenode handles an add, delete, or modify operation, it writes the operation to the edits file first and modifies the metadata in memory only after the write has succeeded. This guarantees that the latest operations are always persisted to permanent storage such as disk, so that operation records are not lost by accident.
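The "write to edits first, then change memory" rule in step (4) can be illustrated with a toy model. This is not how the namenode is implemented; the class `TinyNameNode`, its JSON-lines log format, and the operation names are all invented for the sketch. The only point it makes is the ordering: the log write (including `fsync`) happens before the in-memory update.

```python
import json
import os
import tempfile


class TinyNameNode:
    """Toy model: log each operation to disk before applying it in memory."""

    def __init__(self, edits_path):
        self.edits_path = edits_path
        self.next_txid = 1
        self.namespace = {}  # in-memory metadata: path -> attributes

    def apply(self, op, path, attrs=None):
        if op not in ("add", "delete"):
            raise ValueError(f"unknown operation: {op}")
        record = {"txid": self.next_txid, "op": op, "path": path, "attrs": attrs}
        # 1. Persist the operation first and force it to disk
        with open(self.edits_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())
        # 2. Only after the write succeeded, update the in-memory metadata
        if op == "add":
            self.namespace[path] = attrs or {}
        else:
            self.namespace.pop(path, None)
        self.next_txid += 1
        return record["txid"]


with tempfile.TemporaryDirectory() as tmp:
    nn = TinyNameNode(os.path.join(tmp, "edits_inprogress"))
    nn.apply("add", "/test", {"perm": "0755"})
    nn.apply("add", "/test/a.txt", {"perm": "0644"})
    nn.apply("delete", "/test/a.txt")
    print(nn.namespace)
```

If the process dies between the two steps, the operation is still in the log and can be replayed on the next start, which is exactly why the order matters.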
2. Working stage of SNN
(1) At the configured checkpoint check interval, the SNN asks the namenode whether a checkpoint needs to be executed, and the namenode returns the result.
(2) If the answer is yes, the SNN asks the namenode to perform the checkpoint operation.
(3) The namenode starts the checkpoint. It first rolls the edits file currently in use: the rolled file is renamed to the form edits_txid1-txid2, and a new edits file named edits_inprogress_(txid2+1) is created. The main purpose of rolling the edits file is to keep the merge from affecting the service the namenode provides. After the roll, operations continue to be written to the new edits file as usual.
(4) The latest fsimage (the one with the largest txid in its file name) and the edits files from that txid up to the latest txid are copied to the SNN. Edits files whose txids are not greater than the txid of the fsimage are already reflected in it and do not need to be copied.
(5) The SNN reads the copied fsimage and edits files into memory and merges them.
(6) After the merge, a new fsimage file named fsimage.chkpoint is generated.
(7) fsimage.chkpoint is copied to the namenode.
(8) After the namenode receives fsimage.chkpoint, it renames the file to the form fsimage_txid, where the txid is that of the latest operation recorded in this fsimage file.
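The merge in steps (4) to (8) can be sketched as a small function: start from the old image, skip every operation the image already contains, replay the rest, and name the result after the last txid applied. The data model here (a dict for the image, `(txid, op, path)` tuples for the edits) is an assumption made purely for illustration and has nothing to do with the real on-disk formats.

```python
def merge_checkpoint(fsimage, fsimage_txid, edits):
    """Replay edits newer than fsimage_txid onto a copy of the image.

    fsimage: dict mapping path -> attributes
    edits:   iterable of (txid, op, path) with op in {"add", "delete"}
    Returns (merged image, file name for the new fsimage).
    """
    merged = dict(fsimage)
    last_txid = fsimage_txid
    for txid, op, path in sorted(edits, key=lambda e: e[0]):
        if txid <= fsimage_txid:
            continue  # already reflected in the old fsimage
        if op == "add":
            merged[path] = {"txid": txid}
        elif op == "delete":
            merged.pop(path, None)
        last_txid = txid
    return merged, f"fsimage_{last_txid:019d}"


# Old image at txid 24, plus edits that partly overlap it
old_image = {"/test": {"txid": 20}, "/tmp.txt": {"txid": 23}}
edits = [
    (24, "add", "/already-in-image"),  # skipped: not newer than the image
    (25, "add", "/test/a.txt"),
    (26, "delete", "/tmp.txt"),
]
new_image, new_name = merge_checkpoint(old_image, 24, edits)
print(new_name, sorted(new_image))
```

The original image is left untouched, mirroring the way the SNN works on copies while the namenode keeps serving requests.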
3. Checkpoint check parameter configuration
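hdfs-default.xml
<!-- interval between checkpoints, in seconds -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>
<!-- number of operations -->
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
</property>
<!-- how often, in seconds, the number of operations is checked -->
<property>
  <name>dfs.namenode.checkpoint.check.period</name>
  <value>60</value>
</property>
4. Conditions for triggering checkpoint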
(1) The size of the edits file exceeds 64 MB.
(2) The current edits file has existed for longer than a certain period. The default is 3600 seconds.
(3) The number of operations recorded in the edits file reaches the specified number. The default is 1000000.
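The three conditions above amount to a simple "any of these" check. The sketch below encodes them with the thresholds quoted in this article as defaults; the function and parameter names are invented for the example, and the real decision is made inside Hadoop, not by user code.

```python
MB = 1024 * 1024


def should_checkpoint(edits_size_bytes, seconds_since_last, uncheckpointed_txns,
                      max_size_bytes=64 * MB, period_seconds=3600, max_txns=1_000_000):
    """Return True if any of the three conditions listed above is met."""
    return (
        edits_size_bytes > max_size_bytes          # (1) edits file too large
        or seconds_since_last >= period_seconds    # (2) edits file too old
        or uncheckpointed_txns >= max_txns         # (3) too many operations
    )


# A small, recent edits file with few operations: no checkpoint yet
print(should_checkpoint(10 * MB, 120, 5_000))
# One hour has passed since the last checkpoint: checkpoint is due
print(should_checkpoint(10 * MB, 3600, 5_000))
```

In the workflow described earlier, a check like this would be evaluated once per check interval (dfs.namenode.checkpoint.check.period, 60 seconds by default).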