What kind of FileSystem is in HDFS?

This article walks through what kind of FileSystem sits behind HDFS. Most readers may not know the internals well, so it is shared here for reference; I hope you learn a lot from reading it. Let's get to know it!
First, let's take a look at FileSystem (org.apache.hadoop.fs.FileSystem), the abstract class that is the parent of all file system implementations.

If we want to download data from HDFS (Hadoop's DistributedFileSystem), we need an instance of DistributedFileSystem. So how do we get one?
```java
FileSystem fs = FileSystem.get(new Configuration());
```
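Expanding that one line into a complete, minimal download sketch (hedged: the NameNode address hdfs://node1:9000 comes from this article's environment, and the paths /data/a.txt and /tmp/a.txt are hypothetical):

```java
// A minimal download sketch. Assumes a NameNode at hdfs://node1:9000
// (this article's environment) and a hypothetical file /data/a.txt.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDownload {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://node1:9000"); // make get() resolve to HDFS
        FileSystem fs = FileSystem.get(conf);          // actually a DistributedFileSystem
        fs.copyToLocalFile(new Path("/data/a.txt"), new Path("/tmp/a.txt"));
        fs.close();
    }
}
```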
FileSystem defines three overloaded get() methods:
```java
// 1. Get a FileSystem instance from the configuration file
public static FileSystem get(Configuration conf)

// 2. Get a FileSystem instance from the specified FileSystem URI
public static FileSystem get(URI uri, Configuration conf)

// 3. Get a FileSystem instance from the specified FileSystem URI, the configuration file,
//    and the FileSystem user name
public static FileSystem get(final URI uri, final Configuration conf, final String user)
```
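A quick, hedged usage sketch of the three overloads (node1:9000 and the user name "root" are assumed values matching the Key shown later in this walkthrough):

```java
// Usage sketch of the three get() overloads; node1:9000 and "root"
// are assumed values from this article's environment.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class GetOverloads {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        URI uri = URI.create("hdfs://node1:9000");
        FileSystem fs1 = FileSystem.get(conf);              // URI taken from fs.defaultFS
        FileSystem fs2 = FileSystem.get(uri, conf);         // URI passed explicitly
        FileSystem fs3 = FileSystem.get(uri, conf, "root"); // additionally acts as user "root"
    }
}
```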
FileSystem.get(Configuration conf) is called first, and it in turn calls the overloaded method FileSystem.get(URI uri, Configuration conf):
```java
public static FileSystem get(URI uri, Configuration conf) throws IOException {
    // scheme is the URI scheme naming the concrete FileSystem, e.g. file, hdfs, webhdfs, har, ...
    String scheme = uri.getScheme();        // scheme = hdfs
    // authority is the NameNode's host name and port number
    String authority = uri.getAuthority();  // authority = node1:9000
    ...
    // disableCacheName = fs.hdfs.impl.disable.cache
    String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
    // read the configuration to decide whether the cache is disabled
    if (conf.getBoolean(disableCacheName, false)) {
        // cache disabled: call the method that creates a FileSystem instance directly
        return createFileSystem(uri, conf);
    }
    // cache enabled: fetch the instance through FileSystem's static member variable CACHE
    return CACHE.get(uri, conf);
}
```
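The first branch can be forced from user code through the per-scheme key the method reads, fs.hdfs.impl.disable.cache. A minimal sketch (the URI is this article's assumed NameNode address):

```java
// A minimal sketch of forcing the createFileSystem() branch: with the cache
// disabled, every get() returns a fresh instance that the caller must close.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DisableFsCache {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("fs.hdfs.impl.disable.cache", true); // the disableCacheName key above
        FileSystem fresh = FileSystem.get(URI.create("hdfs://node1:9000"), conf);
        try {
            // ... use fresh ...
        } finally {
            fresh.close(); // uncached instances are not closed for you
        }
    }
}
```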
Next, the FileSystem$Cache.get(URI uri, Configuration conf) method is called (Cache is a static inner class of FileSystem):
```java
FileSystem get(URI uri, Configuration conf) throws IOException {
    Key key = new Key(uri, conf);  // key = (root (auth:SIMPLE))@hdfs://node1:9000
    return getInternal(uri, conf, key);
}
```
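Since the Key is derived from the URI's scheme and authority plus the current user, two independent get() calls against the same address normally return the same cached instance. A small sketch to verify this (assumes fs.defaultFS is set to hdfs://node1:9000 in the loaded configuration):

```java
// Sketch: the Cache makes repeated get() calls return the *same* instance.
// Assumes fs.defaultFS = hdfs://node1:9000 in the loaded configuration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CacheIdentityCheck {
    public static void main(String[] args) throws Exception {
        FileSystem a = FileSystem.get(new Configuration());
        FileSystem b = FileSystem.get(new Configuration());
        System.out.println(a == b); // true: equal Keys -> one cached FileSystem
    }
}
```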
It then calls FileSystem$Cache.getInternal(URI uri, Configuration conf, FileSystem$Cache$Key key) (Key is in turn a static inner class of Cache):
```java
private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException {
    FileSystem fs;
    synchronized (this) {
        // map is the Cache member variable (a HashMap) that caches FileSystem instances
        fs = map.get(key);
    }
    if (fs != null) {
        // the matching FileSystem instance was found in the cache map: return it
        return fs;
    }
    // otherwise create one via FileSystem.createFileSystem(URI, Configuration)
    fs = createFileSystem(uri, conf);
    /* split line 1: waiting for createFileSystem() to return */
    synchronized (this) {  // take the lock again
        /*
         * In a multithreaded environment, another client (another thread) may have
         * created a DistributedFileSystem instance and cached it in map while the
         * lock was released, in which case the instance just created here must be
         * discarded. In effect this is a per-key singleton: one Key maps to one
         * DistributedFileSystem instance.
         */
        FileSystem oldfs = map.get(key);
        if (oldfs != null) {  // a file system was created while the lock was released
            fs.close();       // close the new file system
            return oldfs;     // return the old file system
        }
        // now insert the new file system into the map:
        // cache the newly created DistributedFileSystem instance
        fs.key = key;
        map.put(key, fs);
        return fs;
    }
}
```
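The check-create-recheck shape of getInternal() is worth isolating: create outside the lock because construction is slow, then re-check under the lock and discard the losing instance. A generic sketch of the pattern with hypothetical InstanceCache/Factory/Closer types (not Hadoop's classes):

```java
// Generic sketch of getInternal()'s check-create-recheck caching pattern
// (hypothetical InstanceCache, Factory, and Closer names).
import java.util.HashMap;
import java.util.Map;

class InstanceCache<K, V> {
    interface Factory<K, V> { V create(K key); }
    interface Closer<V>     { void close(V value); }

    private final Map<K, V> map = new HashMap<>();

    V get(K key, Factory<K, V> factory, Closer<V> closer) {
        synchronized (this) {             // first check, under the lock
            V cached = map.get(key);
            if (cached != null) return cached;
        }
        V created = factory.create(key);  // slow work happens outside the lock
        synchronized (this) {             // re-check: another thread may have won the race
            V old = map.get(key);
            if (old != null) {
                closer.close(created);    // like fs.close() on the losing instance
                return old;
            }
            map.put(key, created);        // we won: publish our instance
            return created;
        }
    }
}
```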
Picking up at split line 1, FileSystem.createFileSystem(URI uri, Configuration conf) is called first:
```java
private static FileSystem createFileSystem(URI uri, Configuration conf) throws IOException {
    // read the configuration to resolve the class object registered for the URI scheme (hdfs)
    Class<?> clazz = getFileSystemClass(uri.getScheme(), conf);
    // clazz = org.apache.hadoop.hdfs.DistributedFileSystem
    // reflectively construct a DistributedFileSystem instance
    FileSystem fs = (FileSystem) ReflectionUtils.newInstance(clazz, conf);
    // initialize the DistributedFileSystem instance
    fs.initialize(uri, conf);
    return fs;
}
```
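Both steps inside createFileSystem(), resolving the scheme to a class and instantiating it reflectively, can be reproduced from user code. A minimal sketch (assumes hadoop-hdfs is on the classpath and the article's NameNode at hdfs://node1:9000):

```java
// Sketch of the two steps inside createFileSystem(): resolve the class for
// a scheme, then instantiate it reflectively. Assumes hadoop-hdfs is on the
// classpath and a NameNode at hdfs://node1:9000.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.util.ReflectionUtils;

public class ReflectiveFs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Class<? extends FileSystem> clazz = FileSystem.getFileSystemClass("hdfs", conf);
        System.out.println(clazz.getName()); // org.apache.hadoop.hdfs.DistributedFileSystem
        FileSystem fs = ReflectionUtils.newInstance(clazz, conf);
        fs.initialize(URI.create("hdfs://node1:9000"), conf); // not usable until initialized
        fs.close();
    }
}
```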
Before stepping into DistributedFileSystem.initialize(URI uri, Configuration conf), let's take a look at the DistributedFileSystem class itself. DistributedFileSystem is a concrete subclass of the abstract class FileSystem:
```java
public class DistributedFileSystem extends FileSystem {
    ...
    DFSClient dfs;  // DistributedFileSystem holds a member of type DFSClient: its most important member variable!
    ...
}
```
Now the DistributedFileSystem.initialize(URI uri, Configuration conf) method is called:
```java
public void initialize(URI uri, Configuration conf) throws IOException {
    ...
    // construct a DFSClient instance; the member variable dfs refers to it
    this.dfs = new DFSClient(uri, conf, statistics);
    /* split line 2: waiting for new DFSClient() to return */
    ...
}
```
Before following the new DFSClient(...) call, let's look at the DFSClient class and see which member variables it has to assign:
```java
public class DFSClient implements java.io.Closeable, RemotePeerFactory {
    ...
    // DFSClient holds a member of type ClientProtocol, namenode: an RPC proxy object
    final ClientProtocol namenode;
    /* The service used for delegation tokens */
    private Text dtService;
    ...
}
```
Picking up at split line 2, DFSClient's constructor DFSClient(URI nameNodeUri, Configuration conf, FileSystem$Statistics statistics) is invoked, and it delegates to the overloaded constructor DFSClient(URI nameNodeUri, ClientProtocol rpcNamenode, Configuration conf, FileSystem$Statistics statistics):
```java
public DFSClient(URI nameNodeUri, ClientProtocol rpcNamenode, Configuration conf,
        FileSystem.Statistics stats) throws IOException {
    ...
    NameNodeProxies.ProxyAndInfo<ClientProtocol> proxyInfo = null;
    if (numResponseToDrop > 0) {  // numResponseToDrop = 0
        // This case is used for testing.
        LOG.warn(DFSConfigKeys.DFS_CLIENT_TEST_DROP_NAMENODE_RESPONSE_NUM_KEY
            + " is set to " + numResponseToDrop
            + ", this hacked client will proactively drop responses");
        proxyInfo = NameNodeProxies.createProxyWithLossyRetryHandler(conf,
            nameNodeUri, ClientProtocol.class, numResponseToDrop);
    }
    if (proxyInfo != null) {  // proxyInfo = null
        this.dtService = proxyInfo.getDelegationTokenService();
        this.namenode = proxyInfo.getProxy();
    } else if (rpcNamenode != null) {  // rpcNamenode = null
        // This case is used for testing.
        Preconditions.checkArgument(nameNodeUri == null);
        this.namenode = rpcNamenode;
        dtService = null;
    } else {
        // the first two branches are only taken in tests; this else block is the key path
        /*
         * Create an object of type NameNodeProxies.ProxyAndInfo and let proxyInfo refer to it.
         * Doesn't createProxy(conf, nameNodeUri, ClientProtocol.class) look very similar to
         * RPC.getProxy(Class protocol, long clientVersion, InetSocketAddress addr,
         * Configuration conf)? That's right, you read it correctly: the relevant RPC
         * methods must be called inside createProxy().
         *   conf                 -> the Configuration
         *   nameNodeUri          -> hdfs://node1:9000, i.e. the hostName and port that an
         *                           InetSocketAddress addr would carry
         *   ClientProtocol.class -> the class object of the RPC protocol interface
         * "ClientProtocol is used by user code via the DistributedFileSystem class to
         * communicate with the NameNode."
         */
        proxyInfo = NameNodeProxies.createProxy(conf, nameNodeUri, ClientProtocol.class);
        /* split line 3: waiting for createProxy() to return */
        this.dtService = proxyInfo.getDelegationTokenService();
        this.namenode = proxyInfo.getProxy();
    }
    ...
}
```
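To make the RPC.getProxy() analogy in the comment above concrete, here is a hedged client-side sketch with a hypothetical protocol (MyProtocol, its echo() method, and the server address are invented for illustration; the real ClientProtocol proxy is built via protobuf translator classes, which split line 4 leads into, rather than this plain form):

```java
// Client-side sketch of Hadoop's RPC proxy pattern. MyProtocol and echo()
// are hypothetical; a matching RPC server must be running at node1:9000.
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.VersionedProtocol;

interface MyProtocol extends VersionedProtocol {
    long versionID = 1L;      // version checked during the connection handshake
    String echo(String msg);  // an example remote method
}

public class RpcClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        InetSocketAddress addr = new InetSocketAddress("node1", 9000);
        // The returned object is a dynamic proxy: calling echo() sends an RPC request.
        MyProtocol proxy = RPC.getProxy(MyProtocol.class, MyProtocol.versionID, addr, conf);
        System.out.println(proxy.echo("ping"));
        RPC.stopProxy(proxy); // release the underlying connection
    }
}
```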
Picking up at split line 3, the NameNodeProxies.createProxy(Configuration conf, URI nameNodeUri, Class&lt;T&gt; xface) method is called:
```java
/**
 * Creates the namenode proxy with the passed protocol. This will handle
 * creation of either HA- or non-HA-enabled proxy objects, depending upon
 * if the provided URI is a configured logical URI.
 * (Whether an HA or a non-HA namenode proxy object is created depends on
 * the actual Hadoop environment.)
 */
public static <T> ProxyAndInfo<T> createProxy(Configuration conf,
        URI nameNodeUri, Class<T> xface) throws IOException {
    // read the HA configuration of the actual Hadoop environment
    Class<FailoverProxyProvider<T>> failoverProxyProviderClass =
        getFailoverProxyProviderClass(conf, nameNodeUri, xface);

    if (failoverProxyProviderClass == null) {
        // Non-HA case (this article runs a pseudo-distributed Hadoop build):
        // create a non-HA namenode proxy object
        return createNonHAProxy(conf, NameNode.getAddress(nameNodeUri), xface,
            UserGroupInformation.getCurrentUser(), true);
    } else {
        // HA case
        FailoverProxyProvider<T> failoverProxyProvider = NameNodeProxies
            .createFailoverProxyProvider(conf, failoverProxyProviderClass, xface, nameNodeUri);
        Conf config = new Conf(conf);
        T proxy = (T) RetryProxy.create(xface, failoverProxyProvider,
            RetryPolicies.failoverOnNetworkException(
                RetryPolicies.TRY_ONCE_THEN_FAIL, config.maxFailoverAttempts,
                config.maxRetryAttempts, config.failoverSleepBaseMillis,
                config.failoverSleepMaxMillis));
        Text dtService = HAUtil.buildTokenServiceForLogicalUri(nameNodeUri);
        // wrap proxy and dtService into a ProxyAndInfo object and return it
        return new ProxyAndInfo<T>(proxy, dtService);
    }
}
```
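The HA/non-HA fork boils down to whether a failover proxy provider is configured for the URI's authority. A small sketch of that same check, assuming a hypothetical nameservice named mycluster:

```java
// Sketch of the HA check behind getFailoverProxyProviderClass(): an HA
// logical URI has dfs.client.failover.proxy.provider.<nameservice> set.
// The nameservice "mycluster" is an assumed example.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;

public class HaCheckSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        URI uri = URI.create("hdfs://mycluster");
        String providerClass =
            conf.get("dfs.client.failover.proxy.provider." + uri.getHost());
        // null -> non-HA, createNonHAProxy(); non-null -> HA, RetryProxy path
        System.out.println(providerClass == null ? "non-HA" : "HA via " + providerClass);
    }
}
```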
For the non-HA case, the NameNodeProxies.createNonHAProxy(Configuration conf, InetSocketAddress nnAddr, Class&lt;T&gt; xface, UserGroupInformation ugi, boolean withRetries) method is called:
```java
public static <T> ProxyAndInfo<T> createNonHAProxy(Configuration conf,
        InetSocketAddress nnAddr, Class<T> xface, UserGroupInformation ugi,
        boolean withRetries) throws IOException {
    // e.g. dtService = 192.168.101.xxx:9000, the NameNode's IP and port
    Text dtService = SecurityUtil.buildTokenService(nnAddr);
    T proxy;
    if (xface == ClientProtocol.class) {  // xface = ClientProtocol.class
        // create a namenode proxy object
        proxy = (T) createNNProxyWithClientProtocol(nnAddr, conf, ugi, withRetries);
        /* split line 4: waiting for createNNProxyWithClientProtocol() to return */
    } else if (...) {
        ...  // branches for the other protocol interfaces are elided
    }
    // wrap proxy and dtService into a ProxyAndInfo object and return it
    return new ProxyAndInfo<T>(proxy, dtService);
}
```

That is all the content of the article "What kind of FileSystem is in HDFS". Thank you for reading! I hope it has helped you see how FileSystem.get() ultimately hands back a DistributedFileSystem whose DFSClient talks to the NameNode through an RPC proxy.