Sqoop (pronounced "scoop") is an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, and so on). It can import data from a relational database (such as MySQL, Oracle, or Postgres) into HDFS, and it can also export data from HDFS back into a relational database.
Incremental import is the feature that matters most for efficiency, since Sqoop is designed for big-data collection. Sqoop supports incremental updates: it can append only the records added since the last import, or pick up rows based on a last-modified timestamp.
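For example, a minimal sketch of an append-mode incremental import, reusing the connection details that appear later in this tutorial (the id check column and the last value of 100 are only illustrative):
# import only the rows of bbs_detail whose id is greater than the last imported value (here 100)
sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root \
--table bbs_detail --check-column id --incremental append --last-value 100
# for timestamp-based pickup, use --incremental lastmodified with a timestamp as --last-value instead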
Given Sqoop's ability to move data into and out of relational databases, it is not surprising that it has special support for Hive, the well-known SQL-like data warehouse of the Hadoop ecosystem. The create-hive-table tool can be used to import a table definition into Hive.
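As a rough sketch with the same connection details, the definition of the bbs_detail table could be copied into Hive like this (only the table definition is created; no rows are moved):
sqoop create-hive-table --connect jdbc:mysql://hadoop-server-00:3306/baba \
--username root --password root --table bbs_detail --hive-table bbs_detail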
To put it simply: Sqoop is a data migration tool.
Working mechanism: it meets our data migration needs by generating MapReduce programs that import and export the data.
In other words, Sqoop ships with built-in MR jobs for importing and exporting data.
Official website: http://sqoop.apache.org/
Download: http://archive.apache.org/dist/sqoop/
Sqoop is a client tool: it can be installed on a single machine and run in a Hadoop environment.
First download Sqoop from the official website, then upload it to the Linux machine.
Decompress it:
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local/apps
Then go into the Sqoop directory to modify the configuration:
cd conf/
We need to modify sqoop-env-template.sh. First, rename it:
mv sqoop-env-template.sh sqoop-env.sh
Then edit it:
vi sqoop-env.sh
First, configure the Hadoop installation directory:
# Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/usr/local/apps/hadoop-2.6.5
Next, configure HADOOP_MAPRED_HOME, which also points to the Hadoop installation path:
# Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/usr/local/apps/hadoop-2.6.5
Next, configure the HBase installation directory:
# set the path to where bin/hbase is available
export HBASE_HOME=/usr/local/apps/hbase-0.98.24-hadoop2
Configure the Hive installation directory:
# Set the path to where bin/hive is available
export HIVE_HOME=/usr/local/apps/apache-hive-2.1.1-bin
Finally, configure the conf directory of ZooKeeper:
# Set the path for where zookeper config dir is
export ZOOCFGDIR=/usr/local/apps/zookeeper-3.4.6/conf
Save and exit.
Then add Sqoop to the system's environment variables:
vi /etc/profile
Add at the bottom:
export SQOOP_HOME=/usr/local/apps/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
And append it to PATH:
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin
Save and exit.
Don't forget to make the environment variables take effect:
source /etc/profile
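To confirm that the variables took effect, a quick check could look like this:
echo $SQOOP_HOME    # should print /usr/local/apps/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
which sqoop         # should resolve to $SQOOP_HOME/bin/sqoop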
Next, run sqoop to check that it works properly.
Note: it is best to install all packages and installation directories under root, in a newly created apps directory under root; otherwise the system will report an error such as the following warning:
Warning: /usr/local/apps/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
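A simple way to check that Sqoop itself runs is to print its version or list its tools; apart from warnings like the one above, the output should be normal:
sqoop version    # prints the installed Sqoop version
sqoop help       # lists the available tools (import, export, create-hive-table, ...)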
The two most commonly used Sqoop commands:
import: import a table from an external database into HDFS
export: export an HDFS directory into a database table
Testing Sqoop usage (knowing how to use it is enough).
The first category: importing data from the database into HDFS.
First of all, make sure that both HDFS and YARN are started:
start-dfs.sh
start-yarn.sh
Next, run the Sqoop import command:
sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail
Sqoop puts the result into HDFS:
hadoop fs -ls /
hadoop fs -ls /user
hadoop fs -ls /user/root
hadoop fs -ls /user/root/bbs_detail
This is the data we imported. Sqoop splits the import roughly evenly across several map tasks, 4 map tasks by default.
If no output path is given, Sqoop writes the data to /user/root/bbs_detail, a directory it generates automatically.
The default field delimiter is ','.
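To spot-check the layout and delimiter of the default output, you can list the part files and print a few lines (a sketch assuming the default of 4 map tasks, i.e. part-m-00000 through part-m-00003):
hadoop fs -ls /user/root/bbs_detail
hadoop fs -cat /user/root/bbs_detail/part-m-00000 | head -n 5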
Advanced usage of importing data into HDFS with Sqoop (pay attention to the spaces between options).
First, specify the output path (--target-dir '/sqoop/td') and the field separator (--fields-terminated-by '\t'):
sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail --target-dir '/sqoop/td' --fields-terminated-by '\t'
Specify the number of map tasks with -m. Controlling the number of map tasks controls the number of small files; too many small files lowers efficiency, but if the table is very large we also need to increase the number of map tasks:
sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail --target-dir '/sqoop/td' --fields-terminated-by '\001' -m 1
(In real production data, fields often contain commas or tabs, so we use the non-printable character \001 as the delimiter.)
Add a where condition. Note: the condition must be enclosed in single quotes:
sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail --where 'id>30' --target-dir '/sqoop/td2' --fields-terminated-by '\001'
Add a query statement (a \ at the end of a line escapes the line break; a Sqoop command is logically one line, and it gets too long to write on one line, so we use \ to continue it):
sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root \
--query 'SELECT id,order_id,product_id FROM bbs_detail WHERE id > 30 AND $CONDITIONS' \
--split-by bbs_detail.id --target-dir '/sqoop/td3'
Note: when using --query, pay attention to the clause after where: the token AND $CONDITIONS must be added (AND $CONDITIONS marks the end of the select condition written above; this is fixed).
There is also a difference between single and double quotes: if --query is wrapped in double quotes, you must put a \ before $CONDITIONS, i.e. write \$CONDITIONS (see the sketch below).
If you set the number of map tasks to -m 1, you do not need to add --split-by ${tablename.column}; otherwise you must add it (--split-by bbs_detail.id tells the map tasks to split the data by bbs_detail.id).
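For reference, a sketch of the same --query import written with double quotes, where $CONDITIONS is escaped as \$CONDITIONS (the target directory /sqoop/td3_dq is only illustrative):
sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root \
--query "SELECT id,order_id,product_id FROM bbs_detail WHERE id > 30 AND \$CONDITIONS" \
--split-by bbs_detail.id --target-dir '/sqoop/td3_dq'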
We usually import the data into Hive (a data warehouse), which turns data-processing logic expressed in SQL into MapReduce programs that analyze huge amounts of data on the Hadoop cluster.
Import data from the database into Hive:
sqoop import --hive-import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail
Start Hive; the command is hive.
Then run show tables to confirm that the imported table appears.
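As a quick sanity check (a sketch), you can also look at a few rows of the imported table from the Hive shell:
show tables;
select * from bbs_detail limit 10;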
You can also import data from the database into HBase. Instead of --hive-import, the HBase import uses the --hbase-table and --column-family options, as in the sketch below.
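A rough sketch of such an HBase import, assuming an HBase table named bbs_detail with a column family called info (both names are illustrative):
sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root \
--table bbs_detail --hbase-table bbs_detail --column-family info --hbase-create-table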
The second category: exporting data from HDFS to the database.
sqoop export --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --export-dir '/sqoop/bak' --table td_bak --columns id,order_id --fields-terminated-by ',' -m 2
(Here baba is the database; --export-dir '/sqoop/bak' says which HDFS directory holds the data to export; --table td_bak names the target table, which must already exist, i.e. you have to create td_bak in MySQL first, as sketched after the command below; --columns id,order_id lists which fields to export into the table; --fields-terminated-by ',' says the file delimiter is a comma; and -m 2 uses 2 map tasks.)
Because the command is long, we can also write it like this:
sqoop export --connect jdbc:mysql://hadoop-server-00:3306/baba \
--username root --password root \
--export-dir '/sqoop/bak' --table td_bak \
--columns id,order_id --fields-terminated-by ',' -m 2
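Since the target table must already exist, here is a minimal sketch of creating td_bak in MySQL before running the export (the column types are assumptions; match them to your actual data):
-- run in MySQL, in the baba database
CREATE TABLE td_bak (id INT, order_id INT);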
Note: remote connections to MySQL were configured for the tests above:
GRANT ALL PRIVILEGES ON mytest.* TO 'root'@'192.168.0.104' IDENTIFIED BY 'itcast' WITH GRANT OPTION;
FLUSH PRIVILEGES;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'itcast' WITH GRANT OPTION;
To set up MySQL remote access:
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;
Solving the MySQL character-encoding (garbled text) problem:
Find a sample configuration file, copy it to the /etc/ directory, and name it my.cnf
(sometimes /etc/my.cnf does not exist yet):
cp /usr/share/doc/mysql-server-5.1.73/my-medium.cnf /etc/my.cnf
vim /etc/my.cnf
Add the following line under both [client] and [mysqld]:
default-character-set=utf8
Finally, press Esc and type :wq to save and exit.
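With that line added, the relevant part of /etc/my.cnf would look roughly like this (other settings copied from the template are omitted):
[client]
default-character-set=utf8

[mysqld]
default-character-set=utf8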