This article introduces the internal structure of Sqoop: how the program is driven, how its source packages are organized, and how it interfaces with MapReduce.
1.1 Sqoop internal structure
The Sqoop program is driven by the main class com.cloudera.sqoop.Sqoop. A limited number of additional classes live in the same package: SqoopOptions (described earlier) and ConnFactory (which dispatches to ManagerFactory instances).
1.1.1 General program flow
The general flow of the program is as follows:
com.cloudera.sqoop.Sqoop is the main class; it implements Tool and a new instance is launched via ToolRunner. The first argument to Sqoop is a string naming the SqoopTool to run, and the SqoopTool itself drives the user's requested operation (such as import, export, codegen, etc.).
The SqoopTool parses the remaining arguments, sets the corresponding fields in the SqoopOptions class, and then executes.
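The driver pattern in play here is Hadoop's standard Tool/ToolRunner dispatch. Below is a minimal sketch of that pattern; the class SqoopLikeDriver and its body are illustrative stand-ins, not Sqoop's actual code, though Tool, ToolRunner, and Configured are the real Hadoop APIs.

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class SqoopLikeDriver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        if (args.length == 0) {
          System.err.println("usage: <tool-name> [tool-args...]");
          return 1;
        }
        // The first argument names the tool (import, export, codegen, ...).
        String toolName = args[0];
        String[] toolArgs = java.util.Arrays.copyOfRange(args, 1, args.length);
        // A real SqoopTool would parse toolArgs into a SqoopOptions
        // instance and then execute the requested operation.
        System.out.println("would run tool: " + toolName);
        return 0;
      }

      public static void main(String[] args) throws Exception {
        // ToolRunner handles the generic Hadoop arguments (-D, -conf, ...)
        // before handing the remainder to run().
        System.exit(ToolRunner.run(new SqoopLikeDriver(), args));
      }
    }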
In the run() method of the SqoopTool, the requested import, export, or other action is executed. A ConnManager is then generally instantiated based on the data in SqoopOptions: the ConnFactory is used to obtain a ConnManager from a ManagerFactory, as described in the previous section. Imports, exports, and other large data-movement tasks are usually run as parallel, reliable MapReduce jobs, although an import does not have to run as a MapReduce job; ConnManager.importTable() determines how best to perform the import. Each major operation is controlled by the ConnManager, except for code generation, which is done by CompilationManager and ClassWriter (both in the com.cloudera.sqoop.orm package). Importing into Hive is handled by com.cloudera.sqoop.hive.HiveImport after importTable() completes, without concern for which ConnManager implementation was used.
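The factory lookup described above amounts to a chain of responsibility: each ManagerFactory is offered the job, and the first one that accepts supplies the ConnManager. A minimal sketch under that assumption; all names here are simplified stand-ins for the real classes.

    import java.util.List;

    interface ConnManagerSketch {
      void importTable(String table);
    }

    abstract class ManagerFactorySketch {
      // Return a manager if this factory can handle the connect string,
      // or null to decline.
      abstract ConnManagerSketch accept(String connectUrl);
    }

    class ConnFactorySketch {
      private final List<ManagerFactorySketch> factories;

      ConnFactorySketch(List<ManagerFactorySketch> factories) {
        this.factories = factories;
      }

      ConnManagerSketch getManager(String connectUrl) {
        for (ManagerFactorySketch f : factories) {
          ConnManagerSketch mgr = f.accept(connectUrl);
          if (mgr != null) {
            return mgr; // first factory that accepts wins
          }
        }
        throw new IllegalArgumentException("No manager for " + connectUrl);
      }
    }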
The importTable() method of ConnManager receives a single argument of type ImportJobContext, which bundles the parameter values the method requires. In the future, this class can be extended with additional parameters to enable more powerful import operations. Similarly, the exportTable() method receives an argument of type ExportJobContext. These classes contain the table to import or export, a reference to the SqoopOptions object, and other related data.
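The motivation for these context classes is to keep method signatures stable as the set of options grows. A hedged sketch of the idea follows; the field and accessor names are assumptions, not Sqoop's actual ImportJobContext.

    public final class ImportContextSketch {
      private final String tableName; // source table to import
      private final String jarFile;   // jar holding the generated record class
      private final Object options;   // stands in for the SqoopOptions object

      public ImportContextSketch(String tableName, String jarFile, Object options) {
        this.tableName = tableName;
        this.jarFile = jarFile;
        this.options = options;
      }

      public String getTableName() { return tableName; }
      public String getJarFile()   { return jarFile; }
      public Object getOptions()   { return options; }
    }

    // A manager method then needs only one parameter, and new settings can be
    // added to the context later without changing every importTable() caller:
    //   void importTable(ImportContextSketch context) throws IOException { ... }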
1.1.2 Subpackages
The com.cloudera.sqoop package contains the following subpackages:
hive: facilitates importing data into Hive.
io: implementations of the java.io.* APIs.
lib: the external public API (described earlier).
manager: the ConnManager and ManagerFactory interfaces and their implementations.
mapreduce: classes interfacing with the new (0.20+) MapReduce API.
orm: automatic code generation.
tool: implementations of SqoopTool.
util: miscellaneous utility classes.
The OutputStream and BufferedWriter implementations in the io package are used to write directly to HDFS. SplittableBufferedWriter lets the client open a single BufferedWriter that, under the hood, writes to a series of files, rolling to a new one each time a target size is reached. This allows non-splittable compression (such as gzip) to be used during a Sqoop import while still allowing the resulting dataset to be split across subsequent MapReduce tasks. The code for the large-object file storage system also lives in the io package.
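A minimal sketch of the size-threshold rollover idea behind SplittableBufferedWriter, written against local files for simplicity; the names and details are illustrative, and the real class writes to HDFS.

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;

    public class RollingWriterSketch implements AutoCloseable {
      private final long targetBytes; // roll to a new file past this size
      private final String prefix;
      private long written = 0;
      private int fileIndex = 0;
      private BufferedWriter current;

      public RollingWriterSketch(String prefix, long targetBytes) throws IOException {
        this.prefix = prefix;
        this.targetBytes = targetBytes;
        openNext();
      }

      private void openNext() throws IOException {
        if (current != null) current.close();
        current = new BufferedWriter(new FileWriter(prefix + "-" + fileIndex++));
        written = 0;
      }

      // Files are only split at record boundaries, so each output file stays
      // independently readable even under a non-splittable codec like gzip.
      public void writeLine(String line) throws IOException {
        current.write(line);
        current.newLine();
        written += line.length() + 1;
        if (written >= targetBytes) openNext();
      }

      @Override
      public void close() throws IOException { current.close(); }
    }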
The code in the mapreduce package interfaces directly with Hadoop MapReduce and is described in more detail in the next section.
The code in the orm package is used for code generation. It relies on the JDK's tools.jar, which provides the com.sun.tools.javac package.
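For illustration, here is how generated source can be compiled from within Java. Note that this sketch uses the standard javax.tools API as a stand-in, whereas CompilationManager actually drives com.sun.tools.javac from tools.jar; the file path is a placeholder.

    import javax.tools.JavaCompiler;
    import javax.tools.ToolProvider;

    public class CompileGeneratedSketch {
      public static void main(String[] args) {
        // Requires a JDK (returns null on a bare JRE).
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        // Compile an auto-generated record class (placeholder path).
        int rc = javac.run(null, null, null, "/tmp/generated/MyTableRecord.java");
        System.out.println(rc == 0 ? "compiled" : "compile failed");
      }
    }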
The util package contains various utilities used across Sqoop:
ClassLoaderStack: manages a stack of ClassLoader instances used by the current thread; this is how auto-generated code is loaded into the running thread when a MapReduce task executes in local mode.
DirectImportUtils: contains convenience methods for direct HDFS import operations.
Executor: launches external processes and connects their streams to handlers generated by an AsyncSink (see more detail below).
ExportException: thrown by ConnManagers when an export fails.
ImportException: thrown by ConnManagers when an import fails.
JdbcUrl: handles the parsing of connection strings, which are URL-like but not specification-conforming (see the sketch after this list).
PerfCounters: used to estimate the transfer rate displayed to the user.
ResultSetPrinter: pretty-prints a ResultSet.
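As an example of why such a helper is needed: a JDBC connect string like jdbc:mysql://host/db will not yield a host through java.net.URI until the jdbc: prefix is stripped. A minimal illustrative sketch, not Sqoop's actual JdbcUrl implementation:

    import java.net.URI;

    public class JdbcUrlSketch {
      // "jdbc:mysql://host:3306/db" is not a parseable URI as-is, because
      // the scheme would be "jdbc:mysql"; strip the "jdbc:" prefix before
      // delegating to java.net.URI.
      public static String hostOf(String connectString) {
        String trimmed = connectString.startsWith("jdbc:")
            ? connectString.substring("jdbc:".length())
            : connectString;
        return URI.create(trimmed).getHost();
      }

      public static void main(String[] args) {
        System.out.println(hostOf("jdbc:mysql://dbserver:3306/sales")); // dbserver
      }
    }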
In several places, Sqoop reads stdout from external programs; the most straightforward cases are the direct-mode imports performed by LocalMySQLManager and DirectPostgresqlManager. After a process is spawned by Runtime.exec(), its standard output (Process.getInputStream()) and potentially its error stream (Process.getErrorStream()) must be handled. Failing to read enough data from these streams will cause the external process to block before it can write more. Consequently, both streams must be handled, preferably asynchronously.
In Sqoop's parlance, an "async sink" is a thread that takes an InputStream and reads it to completion. These are realized as AsyncSink implementations. The com.cloudera.sqoop.util.AsyncSink abstract class defines the operations such a component must perform: processStream() spawns another thread that immediately begins handling the data read from its InputStream argument, and it must read that stream to completion; the join() method allows external threads to wait until this processing is complete.
Some "stock" are implemented synchronously: LoggingAsyncSink repeats everything on InputStream in the log4j INFO statement. NullAsyncSink consumes all input and does nothing.
The various ConnManagers that use external processes have their own AsyncSink implementations as inner classes, which read from the database tools and forward the data along to HDFS, possibly performing format conversions along the way.
1.1.3 Interface with MapReduce
Sqoop schedules MapReduce jobs to carry out imports and exports. Configuring and running a MapReduce job follows a few common steps (configuring the InputFormat, configuring the OutputFormat, setting the Mapper implementation, and so on). These steps are formalized in the com.cloudera.sqoop.mapreduce.JobBase class, which allows a user to specify the InputFormat, OutputFormat, and Mapper to use.
JobBase itself is subclassed by ImportJobBase and ExportJobBase, which offer better support for the configuration steps specific to import- and export-related jobs, respectively. ImportJobBase.runImport() invokes the configuration steps and runs a job to import a table to HDFS.
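This layering is a template-method pattern: the base class owns the shared configuration steps, and the subclasses own the operation-specific run logic. A hedged sketch with simplified names, not Sqoop's actual signatures:

    import java.io.IOException;

    abstract class JobBaseSketch {
      // Common configuration steps shared by import and export jobs.
      protected void configureInputFormat()  { /* ... */ }
      protected void configureOutputFormat() { /* ... */ }
      protected void configureMapper()       { /* ... */ }
    }

    class ImportJobBaseSketch extends JobBaseSketch {
      // Analogous to ImportJobBase.runImport(): run the shared configuration
      // steps, then submit a job that writes the table to HDFS.
      public void runImport(String tableName) throws IOException {
        configureInputFormat();
        configureOutputFormat();
        configureMapper();
        System.out.println("submitting import job for " + tableName);
      }
    }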