

Sqoop Basic Concepts, Installation, and Testing (2017)


Sqoop (pronounced "scoop")

Sqoop is an open source tool used mainly to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (such as MySQL, Oracle, or Postgres) into Hadoop's HDFS, and can also export data from HDFS back into a relational database.

Incremental import is the most important efficiency concern, because Sqoop is designed for big data collection. Sqoop supports incremental updates: it can append records that are newer than the data last exported, or select rows based on a last-modified timestamp.
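As a rough sketch (the connection string and table follow the examples later in this article; the check column and last value are only illustrative), an incremental append import looks like this:

# import only rows whose id is greater than the previously imported maximum (here 1000)
sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root \
--table bbs_detail --incremental append --check-column id --last-value 1000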

Given Sqoop's ability to move data into and out of relational databases, it is not surprising that it has special support for Hive, the well-known SQL-like data warehouse in the Hadoop ecosystem. The command "create-hive-table" can be used to import a data table definition into Hive.
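For example (a minimal sketch using the same connection and table as the examples later in this article; the Hive table name is illustrative):

# copy only the table definition (schema) of bbs_detail into Hive, without moving any data
sqoop create-hive-table --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root \
--table bbs_detail --hive-table bbs_detail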

To put it simply: Sqoop is a data migration tool.

Working mechanism: it satisfies our data migration needs by generating MapReduce (MR) programs that perform the import and export.

Sqoop ships with built-in MR programs for importing and exporting data.

Official website: http://sqoop.apache.org/

Download: http://archive.apache.org/dist/sqoop/

Sqoop is a tool that can be installed on a single machine and run within a Hadoop environment.

First download Sqoop from the official website, then upload it to the Linux machine.

Decompression

tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /usr/local/apps

Then go into the Sqoop directory to modify the configuration.

cd conf/

Modify sqoop-env-template.sh

First, we need to rename sqoop-env-template.sh:

mv sqoop-env-template.sh sqoop-env.sh

Then edit it:

vi sqoop-env.sh

First, configure the installation directory of Hadoop:

# Set path to where bin/hadoop is available

export HADOOP_COMMON_HOME=/usr/local/apps/hadoop-2.6.5

Next, configure HADOOP_MAPRED_HOME, which is also the Hadoop installation path:

# Set path to where hadoop-*-core.jar is available

export HADOOP_MAPRED_HOME=/usr/local/apps/hadoop-2.6.5

Next, configure the installation directory for HBase:

# set the path to where bin/hbase is available

export HBASE_HOME=/usr/local/apps/hbase-0.98.24-hadoop2

Configure the installation directory for Hive:

# Set the path to where bin/hive is available

export HIVE_HOME=/usr/local/apps/apache-hive-2.1.1-bin

Finally, configure the conf directory of ZooKeeper:

# Set the path for where zookeper config dir is

export ZOOCFGDIR=/usr/local/apps/zookeeper-3.4.6/conf

Save and exit.

Next, add Sqoop to the system's environment variables:

vi /etc/profile

Add at the bottom:

export SQOOP_HOME=/usr/local/apps/sqoop-1.4.6.bin__hadoop-2.0.4-alpha

Add to PATH

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SQOOP_HOME/bin

Save and exit.

Don't forget to make the system environment variables take effect:

source /etc/profile

Next, let's call sqoop to see if it works properly.
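A quick sanity check uses Sqoop's built-in commands (the exact version string printed depends on your installation):

sqoop version
sqoop help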

Note: it is best to install all packages and installation directories as the root user, and to create the installation directory apps (/usr/local/apps here) in advance.

Otherwise, the system will report errors or warnings such as:

Warning: /usr/local/apps/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/../hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

The two most commonly used commands in Sqoop:

import: import a table from an external database into HDFS

export: export an HDFS directory into a database table

Testing Sqoop usage (enough to be able to use it)

The first category: importing data from the database into HDFS

First of all, make sure that both HDFS and YARN are started:

start-dfs.sh

start-yarn.sh

Next, let's run a Sqoop command:

sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail

Sqoop places the imported data in HDFS:

hadoop fs -ls /

hadoop fs -ls /user

hadoop fs -ls /user/root

hadoop fs -ls /user/root/bbs_detail

This is the imported data. Sqoop splits the import roughly evenly across several map tasks, and uses 4 map tasks by default.

If no target directory is given, Sqoop writes the imported data to /user/root/bbs_detail, a directory it generates automatically.

The default field delimiter is ','.
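To peek at what was imported, something like the following works (part-m-00000 is the usual name of the first map task's output file and may differ on your cluster):

# list the files produced by the map tasks, then show the first few rows
hadoop fs -ls /user/root/bbs_detail
hadoop fs -cat /user/root/bbs_detail/part-m-00000 | head -n 5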

Advanced usage of Sqoop for importing data into HDFS (note the spaces between the command options)

First, specify the output path (--target-dir '/sqoop/td') and the field delimiter (--fields-terminated-by '\t'):

sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail --target-dir '/sqoop/td' --fields-terminated-by '\t'

Specify the number of map tasks with -m (controlling the number of map tasks controls the number of small output files; too many small files hurts efficiency, while a very large table calls for more map tasks):

sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail --target-dir '/sqoop/td' --fields-terminated-by '\001' -m 1

(In real production, commas or tabs are likely to appear inside the data and cause problems as delimiters, so the non-printable character \001 is used as the delimiter.)

Add a where condition. Note: the condition must be enclosed in single quotation marks:

sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail --where 'id > 30' --target-dir '/sqoop/td2' --fields-terminated-by '\001'

Add a query statement (use '\' at the end of a line to escape the line break, because a Sqoop command is logically one line and would otherwise be too long to write):

sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root \
--query 'SELECT id,order_id,product_id FROM bbs_detail where id > 30 AND $CONDITIONS' \
--split-by bbs_detail.id --target-dir '/sqoop/td3'

Note: if you use --query, pay attention to the where clause: the parameter AND $CONDITIONS must be appended (AND $CONDITIONS marks the end of the select condition written above and is required verbatim).

Also, single and double quotation marks differ: if the value after --query is wrapped in double quotes, the $ must be escaped, i.e. write \$CONDITIONS instead of $CONDITIONS.
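For illustration, the same query written with double quotes would look roughly like this (note the escaped \$CONDITIONS; the target directory is just an example):

sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root \
--query "SELECT id,order_id,product_id FROM bbs_detail where id > 30 AND \$CONDITIONS" \
--split-by bbs_detail.id --target-dir '/sqoop/td4'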

If the number of map tasks is set to -m 1, you do not need to add --split-by ${tablename.column}; otherwise it is required (--split-by bbs_detail.id tells the map tasks to split the data by bbs_detail.id).

We usually import the data into Hive (a data warehouse), where data-processing logic expressed in SQL syntax is converted into MapReduce programs that analyze huge amounts of data on the Hadoop cluster.

Import data from the database into Hive:

sqoop import --hive-import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail

Start Hive (the command is simply hive) and run:

show tables;
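As a quick sanity check (assuming the import above succeeded; by default the Hive table keeps the MySQL table name bbs_detail), counting the rows in the Hive shell is a simple test:

select count(*) from bbs_detail;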

You can also import data from the database into HBase. Note that Sqoop's HBase import uses the --hbase-table and --column-family options (the column family name below is only an example):

sqoop import --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --table bbs_detail --hbase-table bbs_detail --column-family detail --hbase-create-table

The second category: exporting data from HDFS to the database

sqoop export --connect jdbc:mysql://hadoop-server-00:3306/baba --username root --password root --export-dir '/sqoop/bak' --table td_bak --columns id,order_id --fields-terminated-by ',' -m 2

(baba is the database; --export-dir '/sqoop/bak' is the HDFS directory where the data to be exported currently lives; --table td_bak is the target table, which must already exist, i.e. the td_bak table has to be created in MySQL first; --columns id,order_id lists which columns to export into the table; --fields-terminated-by ',' says the file delimiter is a comma; -m 2 uses two map tasks.)
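Since td_bak must already exist, here is a minimal sketch of creating it from the shell (the column types are assumptions for illustration only):

# create the target table in the baba database before running sqoop export
mysql -uroot -proot -e "CREATE TABLE baba.td_bak (id INT, order_id INT);"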

Because the command is long, we can also write it across multiple lines:

sqoop export --connect jdbc:mysql://hadoop-server-00:3306/baba \
--username root --password root \
--export-dir '/sqoop/bak' --table td_bak \
--columns id,order_id --fields-terminated-by ',' -m 2

Note: remote access to MySQL was configured for the above test.

GRANT ALL PRIVILEGES ON mytest.* TO 'root'@'192.168.0.104' IDENTIFIED BY 'itcast' WITH GRANT OPTION;

FLUSH PRIVILEGES;

GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'itcast' WITH GRANT OPTION;

Set up MySQL remote access:

grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;

Solve the MySQL garbled-character (encoding) problem

Find a configuration file, copy it to the /etc/ directory, and name it my.cnf

(sometimes /etc/my.cnf does not already exist)

cp /usr/share/doc/mysql-server-5.1.73/my-medium.cnf /etc/my.cnf

vim /etc/my.cnf

Add under [client] and [mysqld]

default-character-set=utf8
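The relevant part of /etc/my.cnf then looks roughly like this (this matches the MySQL 5.1 setup used here; newer MySQL versions use character-set-server under [mysqld] instead):

[client]
default-character-set=utf8

[mysqld]
default-character-set=utf8

After saving, restart MySQL (for example with service mysqld restart) so the change takes effect.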

Finally, press Esc, type :wq, and press Enter to save and exit.
