2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how Hadoop handles configuration information. It first reviews the configuration facilities available in Java and then walks through Hadoop's own Configuration class.
1 Introduction to configuration files
Configuration files are an integral part of any flexible system, yet despite their importance there is no standard for them.
1.1 Configuration files in Java
The JDK provides the java.util.Properties class for working with simple configuration files. Properties was introduced into the Java class library a long time ago and has changed little since. It inherits from Hashtable and represents a persistent set of properties that can be saved to or loaded from a stream. Each key in the property list and its corresponding value is a string.
    public class Properties extends Hashtable<Object,Object> { ... }
The configuration file format that Properties handles is very simple. It supports only key-value pairs, with the key on the left of the equals sign and the value on the right. The form is as follows:
ENTRY=VALUE
The main methods for working with property lists in java.util.Properties are as follows:
1. getProperty(): gets the value that corresponds to the specified key in the property list. It has two forms: one takes no default value, and the other accepts a default value to return when the key is not found.
2. setProperty(): sets or updates a value in the property list.
3. load(): reads key-value pairs from an input stream.
4. store(): writes the property list held in the Properties table to an output stream.
The related code is as follows:
    // Searches this property list for the property with the specified key
    public String getProperty(String key)

    // Same as above; defaultValue supplies the value returned when the key is not found
    public String getProperty(String key, String defaultValue)

    // Ultimately calls the Hashtable method put
    public synchronized Object setProperty(String key, String value)
Because Properties works on input and output streams, its contents can be saved not only in files but in any other system that supports streams, such as a Web server. Since J2SE 1.5, the data in Properties can also be saved in XML format; the corresponding loading and writing methods are loadFromXML() and storeToXML().
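The four methods above can be tried out with a few lines of code. The following is a minimal sketch that uses only the JDK; the class name PropertiesDemo and its helper methods are made up for illustration, and the keys are sample values rather than anything required by Properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper class, for illustration only.
class PropertiesDemo {
    // load(): read ENTRY=VALUE pairs from a character stream.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // getProperty(key, defaultValue): fall back to defaultValue when the key is absent.
    static String lookup(String text, String key, String defaultValue) {
        return parse(text).getProperty(key, defaultValue);
    }

    // setProperty() followed by store(): update one entry and write the list back out.
    static String updateAndStore(String text, String key, String value) {
        Properties props = parse(text);
        props.setProperty(key, value);
        StringWriter out = new StringWriter();
        try {
            props.store(out, "updated by PropertiesDemo");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Calling PropertiesDemo.lookup("io.sort.factor=10\n", "missing.key", "fallback") returns the default, because the key is not in the list. The text produced by store() starts with comment lines, so callers should not rely on its exact layout.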
1.2 Configuration tools provided by the Java community
Because java.util.Properties offers limited capabilities, the Java community has produced many solutions for reading and writing configuration information. One of the better known is Commons Configuration from the Apache Commons (formerly Jakarta Commons) toolset.
The PropertiesConfiguration class in Commons Configuration provides a rich set of methods for accessing configuration parameters. Commons Configuration has the following features:
1. It supports both text and XML configuration file formats.
2. It can load multiple configuration files.
3. It supports hierarchical (multi-level) configuration.
4. It provides type-based access to single-valued and multi-valued configuration parameters.
In short, Commons Configuration is a powerful tool for processing configuration files.
1.3 Configuration files in Hadoop
Hadoop uses neither java.util.Properties nor Apache Commons Configuration to manage its configuration files. Instead, it has its own configuration management system with its own API.
That is, Hadoop uses org.apache.hadoop.conf.Configuration to process configuration information.
2 Hadoop Configuration in detail
2.1 Format of Hadoop configuration files
The Hadoop configuration file is in XML format. Here is an example of a Hadoop configuration file:
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>io.sort.factor</name>
        <value>10</value>
        <final>true</final>
        <description>The number of ...</description>
      </property>
    </configuration>
The elements of a Hadoop configuration file have the following meanings:
1. The root element of a Hadoop configuration file is configuration, which generally contains only property child elements.
2. Each property element is one configuration item. The configuration file does not support hierarchy or nesting of items.
3. Each configuration item generally includes the name of the configuration property (name), its value (value), and a description of the item (description).
4. The final element is similar to the final keyword in Java: it means that the configuration item is fixed. final usually does not appear, but when resources are merged it prevents the value of the configuration item from being overwritten.
5. The Hadoop configuration system also has a very important feature: property expansion. For example, the value of the configuration item dfs.name.dir is ${hadoop.tmp.dir}/dfs/name, where ${hadoop.tmp.dir} is expanded with the value of the corresponding property in Configuration. If the value of hadoop.tmp.dir is "data", the expanded value of dfs.name.dir is "data/dfs/name".
2.2 Overview of the Configuration class
In the Configuration class, every property name is a String, but the value can be read as one of several types. These include Java primitive types such as boolean (getBoolean), int (getInt), long (getLong), and float (getFloat), as well as other types such as String (get), java.io.File (getFile), and String arrays (getStrings).
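The idea of storing every value as a string and converting it on access can be illustrated with the JDK alone. The sketch below is not Hadoop's code: TypedGetSketch is a hypothetical class, and the parsing rules (trimming, falling back to the default for an unrecognized boolean, splitting on commas) are assumptions chosen for the example.

```java
import java.util.Properties;

// Hypothetical class: string storage with typed accessors layered on top.
class TypedGetSketch {
    private final Properties props = new Properties();

    void set(String name, String value) {
        props.setProperty(name, value);
    }

    // The plain string getter; the typed getters below do the same lookup, then convert.
    String get(String name, String defaultValue) {
        return props.getProperty(name, defaultValue);
    }

    int getInt(String name, int defaultValue) {
        String v = props.getProperty(name);
        return v == null ? defaultValue : Integer.parseInt(v.trim());
    }

    boolean getBoolean(String name, boolean defaultValue) {
        String v = props.getProperty(name);
        if (v == null) return defaultValue;
        v = v.trim();
        if ("true".equalsIgnoreCase(v)) return true;
        if ("false".equalsIgnoreCase(v)) return false;
        return defaultValue; // assumption: unrecognized text falls back to the default
    }

    // Comma-separated values become an array; null when the key is absent.
    String[] getStrings(String name) {
        String v = props.getProperty(name);
        return v == null ? null : v.trim().split("\\s*,\\s*");
    }
}
```

Keeping a single string representation means a configuration file never has to declare types; the caller decides how to interpret a value at the moment it reads it.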
The Configuration class can also merge resources, that is, combine several configuration files into a single configuration. If you have two configuration files (two resources), such as core-default.xml and core-site.xml, you add both to a Configuration object, and they are merged into one configuration when the object loads its resources in loadResources(). The code is as follows:
    Configuration conf = new Configuration();
    conf.addResource("core-default.xml");
    conf.addResource("core-site.xml");
If both resources contain the same configuration item and the item is not marked as final in the earlier resource, the later value overrides the earlier one.
If a configuration item is marked as final in the first resource, a warning is issued when the second resource is loaded and the first value is kept.
The general way to use the Configuration class is to construct a Configuration object and add the resources to be loaded with the addResource() method. After that, the get* and set* methods can be used to read and set configuration items. The resources are loaded into the object automatically the first time they are needed.
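The override and final rules described above can be modelled in a few lines. This is a simplified sketch that uses plain maps instead of XML files; MergeSketch and its merge() method are hypothetical names, not part of Hadoop, and the warning text is invented for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of merging resources in order while honouring final items.
class MergeSketch {
    private final Map<String, String> properties = new HashMap<>();
    private final Set<String> finalParameters = new HashSet<>();

    // Stand-in for loading one resource. 'finals' lists the keys that
    // this resource marks with <final>true</final>.
    void merge(Map<String, String> resource, Set<String> finals) {
        for (Map.Entry<String, String> e : resource.entrySet()) {
            String key = e.getKey();
            if (finalParameters.contains(key)) {
                // An earlier resource fixed this key: warn and keep the old value.
                System.err.println("WARN: attempt to override final parameter: " + key);
                continue;
            }
            properties.put(key, e.getValue()); // a later resource overrides an earlier one
            if (finals.contains(key)) {
                finalParameters.add(key);
            }
        }
    }

    String get(String key) {
        return properties.get(key);
    }
}
```

If the first call to merge() marks io.sort.factor as final and a second call tries to change it, get("io.sort.factor") still returns the first value, while any non-final key takes the value from the second call.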
2.3 Member variables of Configuration
First, take a look at the class diagram of org.apache.hadoop.conf.Configuration:
[Figure: class diagram of org.apache.hadoop.conf.Configuration]
As the class diagram shows, Configuration has seven main non-static member variables.
1. quietmode: a boolean variable that sets the mode in which configuration is loaded. If quietmode is true (the default), no log information is output while configuration files are loaded and parsed. This variable exists only to make debugging easier for developers.
2. resources: a list that holds all resources added to the Configuration object through the addResource() method.
3. loadDefaults: a boolean variable that determines whether to load the default resources, which are stored in defaultResources.
Note: defaultResources is a static member variable. Default resources for the system can be added through the addDefaultResource() method.
In HDFS, hdfs-default.xml and hdfs-site.xml are treated as default resources and are saved in the member variable defaultResources through addDefaultResource().
In MapReduce, the default resources are mapred-default.xml and mapred-site.xml.
4. properties: a java.util.Properties object that holds the configuration items.
5. overlay: a java.util.Properties object that records the configuration items changed through set(). In other words, the key-value pairs in overlay were set by the application rather than obtained by parsing configuration resources.
6. finalParameters: a Set<String> that holds the keys of all key-value pairs declared as final in the configuration files.
Tip:
A. The key-value pairs parsed from Hadoop configuration files are stored in properties.
B. properties, overlay, and finalParameters are the member variables related to configuration items.
7. classLoader: a very important member variable. It is the class loader used to load specified classes or related resources.
Once the meaning of these member variables is clear, the rest of the Configuration class is easier to understand: it consists of the parsing, setting, and getting methods that operate on them.
2.4 Resource loading in Configuration
2.4.1 addResource method
Resources are added to a Configuration object through the object's addResource() method or through the class's static addDefaultResource() method (which requires the loadDefaults flag to be set).
There are four forms of the addResource method:
    addResource(String name)      // load from a classpath resource
    addResource(URL url)          // load from a URL resource
    addResource(Path file)        // load from a file path object
    addResource(InputStream in)   // load from an open input stream
An added resource is not loaded immediately. Instead, the reloadConfiguration() method clears properties and finalParameters so that the next access triggers a load. The related code is as follows:
    // Taking the input stream form as an example:
    public void addResource(InputStream in) {
      addResourceObject(new Resource(in));
    }

    private synchronized void addResourceObject(Resource resource) {
      resources.add(resource);    // add to the resources member variable
      reloadConfiguration();
    }

    public synchronized void reloadConfiguration() {
      properties = null;          // trigger a reload of the resources
      finalParameters.clear();    // clear the set of final parameters
    }
2.4.2 addDefaultResource method
The static method addDefaultResource can also clear the data held in Configuration objects (their non-static member variables). It does this through the class's static member variable REGISTRY.
REGISTRY records all Configuration objects in the system. The code is as follows:
    // Static variable
    private static final WeakHashMap<Configuration,Object> REGISTRY =
        new WeakHashMap<Configuration,Object>();

    // Record each Configuration object as it is constructed
    public Configuration(boolean loadDefaults) {
      this.loadDefaults = loadDefaults;
      updatingResource = new ConcurrentHashMap<String, String[]>();
      synchronized (Configuration.class) {
        REGISTRY.put(this, null);
      }
    }
When addDefaultResource is called, it iterates over all Configuration objects in REGISTRY and calls reloadConfiguration on each one that loads defaults, which triggers a fresh load of the resources. The related code is as follows:
    public static synchronized void addDefaultResource(String name) {
      if (!defaultResources.contains(name)) {
        defaultResources.add(name);
        for (Configuration conf : REGISTRY.keySet()) {
          if (conf.loadDefaults) {
            conf.reloadConfiguration();
          }
        }
      }
    }
2.4.3 getProps method
The data in the member variable properties is not loaded until it is needed. In the getProps method, if properties is found to be null, the loadResources() method is triggered to load the configuration resources. This is the lazy loading design pattern: the configuration files are parsed only when configuration data is actually needed, which saves system resources and improves performance. The related code is as follows:
    protected synchronized Properties getProps() {
      if (properties == null) {
        properties = new Properties();
        Map<String, String[]> backup =
            new ConcurrentHashMap<String, String[]>(updatingResource);
        loadResources(properties, resources, quietmode);
        if (overlay != null) {
          properties.putAll(overlay);
          for (Map.Entry<Object,Object> item : overlay.entrySet()) {
            String key = (String) item.getKey();
            String[] source = backup.get(key);
            if (source != null) {
              updatingResource.put(key, source);
            }
          }
        }
      }
      return properties;
    }
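The interplay of addResource, addDefaultResource, the registry, and lazy loading in sections 2.4.1 to 2.4.3 can be reproduced in a small stand-alone model. The sketch below is a simplification, not Hadoop's implementation: LazyConfSketch is a hypothetical class, its "loader" only records which resource names it saw, and the loadCount field exists purely so that the laziness can be observed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.WeakHashMap;

// Hypothetical model of a lazily loaded configuration with a weak registry.
class LazyConfSketch {
    // Every live instance registers itself so that a new default resource can invalidate it.
    private static final Map<LazyConfSketch, Object> REGISTRY = new WeakHashMap<>();
    private static final List<String> DEFAULT_RESOURCES = new ArrayList<>();

    private final List<String> resources = new ArrayList<>();
    private final Properties overlay = new Properties(); // values set by the application
    private Properties properties;                       // null means "not loaded yet"
    int loadCount = 0;                                   // demonstration only

    LazyConfSketch() {
        synchronized (LazyConfSketch.class) {
            REGISTRY.put(this, null);
        }
    }

    static synchronized void addDefaultResource(String name) {
        if (!DEFAULT_RESOURCES.contains(name)) {
            DEFAULT_RESOURCES.add(name);
            for (LazyConfSketch conf : REGISTRY.keySet()) {
                conf.reloadConfiguration();
            }
        }
    }

    synchronized void addResource(String name) {
        resources.add(name);
        reloadConfiguration(); // nothing is parsed at this point
    }

    synchronized void reloadConfiguration() {
        properties = null; // the next getProps() call rebuilds everything
    }

    synchronized void set(String key, String value) {
        overlay.setProperty(key, value);
        getProps().setProperty(key, value);
    }

    synchronized String get(String key) {
        return getProps().getProperty(key);
    }

    private synchronized Properties getProps() {
        if (properties == null) {
            properties = new Properties();
            loadCount++;
            // Hypothetical loader: a real one would parse each resource here.
            for (String r : DEFAULT_RESOURCES) properties.setProperty("loaded." + r, "true");
            for (String r : resources) properties.setProperty("loaded." + r, "true");
            properties.putAll(overlay); // application-set values survive a reload
        }
        return properties;
    }
}
```

Adding a resource only resets the cached properties, so several addResource calls in a row cost nothing until the first get or set. Because overlay is reapplied after every load, a value set by the application is still visible after addDefaultResource invalidates the object.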
2.4.4 loadResources method
Hadoop configuration files are all XML. JAXP (Java API for XML Processing) is a stable and reliable API for processing XML, and it generally offers two approaches:
1. SAX (Simple API for XML), which provides streaming, event-driven XML processing. It is more complex to program against and is better suited to large XML files.
2. DOM (Document Object Model), which works as follows:
A. Load the whole XML document into memory at once.
B. Build a tree structure in memory from the elements and attributes defined in the document. The document is turned into objects, and each node in the document corresponds to an object in the model.
C. Use the programming interface provided by these objects to access and manipulate the XML document.
Because Hadoop configuration files are very small, the Configuration object uses DOM to process XML.
First, look at the part of the code that loads the DOM:
    private Resource loadResource(Properties properties, Resource wrapper, boolean quiet) {
      String name = UNKNOWN_RESOURCE;
      try {
        Object resource = wrapper.getResource();
        name = wrapper.getName();

        // Get the factory used to create the DOM parser
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        // Ignore comments in the XML
        docBuilderFactory.setIgnoringComments(true);
        // Provide support for XML namespaces
        docBuilderFactory.setNamespaceAware(true);
        try {
          // Set the XInclude processing state to true, that is, allow the XInclude mechanism
          docBuilderFactory.setXIncludeAware(true);
        } catch (UnsupportedOperationException e) {
          ...
        }

        // Get the DocumentBuilder object that parses the XML
        DocumentBuilder builder = docBuilderFactory.newDocumentBuilder();
        Document doc = null;
        Element root = null;
        boolean returnCachedProperties = false;

        // Preprocess according to the kind of resource and parse it accordingly
        if (resource instanceof URL) {                   // URL resource
          ...
        } else if (resource instanceof String) {         // classpath resource
          ...
        } else if (resource instanceof Path) {           // Hadoop Path resource
          ...
        } else if (resource instanceof InputStream) {    // input stream resource
          ...
        } else if (resource instanceof Properties) {     // key-value pairs
          ...
        } else if (resource instanceof Element) {        // child element of a configuration element
          root = (Element) resource;
        }
        ...
Normal JAXP processing starts with a factory. Calling the newInstance method of DocumentBuilderFactory does not create a DOM parser; it returns a factory for creating DOM parsers. Some settings must be applied to this DocumentBuilderFactory object before it is used to obtain the DOM parser object builder, which is a DocumentBuilder.
The main settings applied to the DocumentBuilderFactory object are:
A. Ignore comments in XML documents.
B. Support XML namespaces.
C. Support the XML inclusion mechanism (XInclude).
The XInclude mechanism allows an XML document to be broken into manageable pieces, so that one or more smaller documents can be assembled into one large document. In other words, a Hadoop configuration file can use XInclude to include other configuration files, and those files are processed as well. An example is as follows:
    <configuration xmlns:xi="http://www.w3.org/2001/XInclude">
      <xi:include href="..."/>
    </configuration>
Through the XInclude mechanism, the referenced XML file is embedded into the current configuration file. This makes modular management of configuration files easier, and the referenced file does not have to be loaded separately with the addResource method.
After the DocumentBuilderFactory object has been set up, DocumentBuilderFactory.newDocumentBuilder() returns the DocumentBuilder object, which is used to parse XML from the various inputs.
In loadResource, each of the four resource forms accepted by addResource has to be handled separately, but in all four cases DocumentBuilder.parse() is eventually called and a DOM parsing result is returned.
If the input is a DOM element, the parsing result is simply set to that element. This handles the special case in which a configuration element contains a configuration child node.
The second part of the member function loadResource sets the Configuration member variables properties and finalParameters from the DOM parsing result.
After confirming that the root node of the XML is configuration, the method gets all children of the root node and processes them one by one. Note that a child of the configuration element can be either configuration or property. If it is configuration, loadResource is called recursively and the child node is processed as a root node in that call.
If it is a property child node, the method tries to get the property's child elements name, value, and final. Once name and value have been obtained, it sets the object's member variables properties and finalParameters as appropriate. The related code is as follows:
    if (root == null) {
      if (doc == null) {
        if (quiet) {
          return null;
        }
        throw new RuntimeException(resource + " not found");
      }
      // The root node should be configuration
      root = doc.getDocumentElement();
    }
    Properties toAddTo = properties;
    if (returnCachedProperties) {
      toAddTo = new Properties();
    }
    if (!"configuration".equals(root.getTagName()))
      LOG.fatal("bad conf file: top-level element not <configuration>");
    // Get all the children of the root node
    NodeList props = root.getChildNodes();
    DeprecationContext deprecations = deprecationContext.get();
    for (int i = 0; i < props.getLength(); i++) {
      Node propNode = props.item(i);
      if (!(propNode instanceof Element))   // ignore children that are not elements
        continue;
      Element prop = (Element) propNode;
      if ("configuration".equals(prop.getTagName())) {
        // The child node is itself a configuration element:
        // call loadResource recursively to process it
        loadResource(toAddTo, new Resource(prop, name), quiet);
        continue;
      }
      // Otherwise the child node should be a property element
      if (!"property".equals(prop.getTagName()))
        LOG.warn("bad conf file: element not <property>");
      NodeList fields = prop.getChildNodes();
      String attr = null;
      String value = null;
      boolean finalParameter = false;
      LinkedList<String> source = new LinkedList<String>();
      // Find the values of name, value, and final
      for (int j = 0; j < fields.getLength(); j++) {
        Node fieldNode = fields.item(j);
        if (!(fieldNode instanceof Element))
          continue;
        Element field = (Element) fieldNode;
        if ("name".equals(field.getTagName()) && field.hasChildNodes())
          attr = StringInterner.weakIntern(
              ((Text) field.getFirstChild()).getData().trim());
        if ("value".equals(field.getTagName()) && field.hasChildNodes())
          value = StringInterner.weakIntern(
              ((Text) field.getFirstChild()).getData());
        if ("final".equals(field.getTagName()) && field.hasChildNodes())
          finalParameter = "true".equals(((Text) field.getFirstChild()).getData());
        if ("source".equals(field.getTagName()) && field.hasChildNodes())
          source.add(StringInterner.weakIntern(
              ((Text) field.getFirstChild()).getData()));
      }
      source.add(name);
      ...
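The same DOM walk can be written as a compact, self-contained program using only JAXP classes from the JDK. The following is a simplified sketch rather than Hadoop's loadResource: DomConfParser is a hypothetical class, it accepts XML only as a string, and it leaves out XInclude, source tracking, and deprecation handling.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical parser for Hadoop-style <configuration> documents.
class DomConfParser {
    final Properties properties = new Properties();
    final Set<String> finalParameters = new HashSet<>();

    void parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setIgnoringComments(true); // skip XML comments
            factory.setNamespaceAware(true);   // support XML namespaces
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            load(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse configuration", e);
        }
    }

    private void load(Element root) {
        if (!"configuration".equals(root.getTagName())) {
            throw new IllegalArgumentException("top-level element is not <configuration>");
        }
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (!(node instanceof Element)) continue; // skip whitespace text nodes
            Element child = (Element) node;
            if ("configuration".equals(child.getTagName())) {
                load(child); // nested configuration: recurse
                continue;
            }
            if (!"property".equals(child.getTagName())) continue;
            String name = text(child, "name");
            String value = text(child, "value");
            if (name == null || value == null) continue;
            name = name.trim();
            if (finalParameters.contains(name)) continue; // keep a value fixed earlier
            properties.setProperty(name, value);
            if ("true".equals(text(child, "final"))) finalParameters.add(name);
        }
    }

    // Returns the text of the first child element with the given tag, or null.
    private static String text(Element property, String tag) {
        NodeList fields = property.getChildNodes();
        for (int i = 0; i < fields.getLength(); i++) {
            Node n = fields.item(i);
            if (n instanceof Element && tag.equals(((Element) n).getTagName())) {
                return n.getTextContent();
            }
        }
        return null;
    }
}
```

Feeding the parser a document whose first property is marked final and whose nested configuration element repeats the same name leaves the first value in properties, which mirrors the merge rule from section 2.2.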
2.4.5 get* methods
get* stands for a total of 21 methods that retrieve configuration information from the Configuration object.
The configuration information can be read as primitive types such as boolean, int, and long, or as other types commonly used in Hadoop, such as class information (ClassName, Classes, Class), string collections and arrays (StringCollection, Strings), URL, and so on. The most important of these methods is get(), which returns the value for the given configuration key; the two-argument form returns defaultValue if the key does not exist. The other methods are built on top of get() and add processing of their own. The code is as follows:
    public String get(String name) {
      String[] names = handleDeprecation(deprecationContext.get(), name);
      String result = null;
      for (String n : names) {
        result = substituteVars(getProps().getProperty(n));
      }
      return result;
    }
The get method calls Configuration's private method substituteVars, which performs property expansion.
Property expansion means that when the value of a configuration item contains variables in the format ${key}, they are automatically replaced with the corresponding values. That is, ${key} is replaced with the value of the configuration item whose key is key.
Note that if the value still contains variables after a ${key} replacement, the process continues until no variable remains in the replaced value.
Finally, note that property expansion in substituteVars can use not only the key-value pairs saved in the Configuration object but also the system properties of the Java virtual machine. System properties take precedence, followed by the key-value pairs saved in the Configuration object. To prevent an endless loop during expansion, substitution stops after 20 rounds; if the value is still not fully expanded after 20 rounds, an exception is thrown.
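The expansion rules just described (repeat until no variable remains, consult system properties first, give up after 20 rounds) fit into a short stand-alone method. This is a minimal sketch under those stated rules, not Hadoop's substituteVars: SubstitutionSketch is a hypothetical class, the regular expression is a simplification, and leaving an unknown variable untouched is an assumption made for the example.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone version of ${key} expansion.
class SubstitutionSketch {
    // Matches ${name} where name contains no '}', '$', or whitespace.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}$\\s]+)\\}");
    private static final int MAX_SUBST = 20;

    static String substituteVars(Properties props, String expr) {
        if (expr == null) return null;
        String eval = expr;
        for (int round = 0; round < MAX_SUBST; round++) {
            Matcher m = VAR.matcher(eval);
            if (!m.find()) return eval;              // nothing left to expand
            String var = m.group(1);
            String val = System.getProperty(var);    // system properties win
            if (val == null) val = props.getProperty(var);
            if (val == null) return eval;            // assumption: unknown variable stays as is
            // Replace this one occurrence; the next round rescans the whole string.
            eval = eval.substring(0, m.start()) + val + eval.substring(m.end());
        }
        throw new IllegalStateException(
            "Variable substitution depth too large: " + MAX_SUBST + " " + expr);
    }
}
```

With hadoop.tmp.dir set to "data", expanding "${hadoop.tmp.dir}/dfs/name" yields "data/dfs/name", as in section 2.1. Two properties that refer to each other never settle, so the round limit turns what would be an endless loop into an exception.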
2.4.6 set* methods
Compared with get*, most of the set* methods are very simple. After converting the type of the input and doing other processing on it, these methods end up calling the set() method, which calls the setProperty method of the member variables properties and overlay to save the key-value pair passed in. The code is as follows:
    public void set(String name, String value, String source) {
      // Check the parameters first
      Preconditions.checkArgument(
          name != null,
          "Property name must not be null");
      Preconditions.checkArgument(
          value != null,
          "The value of property " + name + " must not be null");
      name = name.trim();
      DeprecationContext deprecations = deprecationContext.get();
      if (deprecations.getDeprecatedKeyMap().isEmpty()) {
        // Load the resources
        getProps();
      }
      getOverlay().setProperty(name, value);
      getProps().setProperty(name, value);
      String newSource = (source == null ? "programatically" : source);

      if (!isDeprecated(name)) {
        updatingResource.put(name, new String[] {newSource});
        String[] altNames = getAlternativeNames(name);
        if (altNames != null) {
          for (String n : altNames) {
            if (!n.equals(name)) {
              getOverlay().setProperty(n, value);
              getProps().setProperty(n, value);
              updatingResource.put(n, new String[] {newSource});
            }
          }
        }
      } else {
        String[] names = handleDeprecation(deprecationContext.get(), name);
        String altSource = "because " + name + " is deprecated";
        for (String n : names) {
          getOverlay().setProperty(n, value);
          getProps().setProperty(n, value);
          updatingResource.put(n, new String[] {altSource});
        }
      }
    }
3 Configurable interface
Configurable is a very simple interface that is also located in the org.apache.hadoop.conf package. The class diagram is as follows:
[Figure: class diagram of the Configurable interface]
As the name suggests, Configurable means "can be configured". If a class implements the Configurable interface, the class is configurable: the configuration information that an object of the class needs can be supplied through a Configuration instance. A large amount of Hadoop code implements the Configurable interface, for example org.apache.hadoop.mapred.SequenceFileInputFilter.RegexFilter.
When is the Configurable.setConf method called? In general, after an object is created, setConf should be called to carry out further initialization of the object. To simplify the two consecutive steps of creating the object and calling setConf, org.apache.hadoop.util.ReflectionUtils provides the static method newInstance. It uses Java reflection to create a new object of the given type and then calls another static method in ReflectionUtils, setConf, to configure the object.
In the setConf method, if the object implements the Configurable interface, the object's setConf method is called and the object is further initialized based on the Configuration instance conf.
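The create-then-configure pattern can be shown without any Hadoop classes. In the sketch below, every type is a hypothetical stand-in: ConfigurableSketch plays the role of Configurable, java.util.Properties stands in for Configuration, ReflectionSketch imitates the two static helpers described above, and the key sketch.filter.regex is invented for the example.

```java
import java.lang.reflect.Constructor;
import java.util.Properties;

// Hypothetical stand-in for the Configurable interface.
interface ConfigurableSketch {
    void setConf(Properties conf);
    Properties getConf();
}

// A configurable class: it reads what it needs when setConf is called.
class RegexFilterSketch implements ConfigurableSketch {
    private Properties conf;
    String pattern;

    public void setConf(Properties conf) {
        this.conf = conf;
        this.pattern = conf.getProperty("sketch.filter.regex", ".*");
    }

    public Properties getConf() {
        return conf;
    }
}

// Hypothetical helper that creates an object by reflection and then configures it.
class ReflectionSketch {
    static <T> T newInstance(Class<T> type, Properties conf) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T result = ctor.newInstance();
            setConf(result, conf); // second step: pass the configuration in
            return result;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    // Only objects that implement the interface are configured; others are left alone.
    static void setConf(Object obj, Properties conf) {
        if (conf != null && obj instanceof ConfigurableSketch) {
            ((ConfigurableSketch) obj).setConf(conf);
        }
    }
}
```

The caller needs only a class object and a configuration: ReflectionSketch.newInstance(RegexFilterSketch.class, conf) returns a filter whose pattern field is already initialized. Classes that do not implement the interface pass through setConf unchanged, so the helper can be applied to any type.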