Shulou(Shulou.com)06/03 Report--
Hive supports a variety of storage formats, such as plain text, lzo, and orc. To work out the relationship between them, I ran a test.
I. Create a sample table
hive> create table tbl (id int, name string) row format delimited fields terminated by '|' stored as textfile;
OK
Time taken: 0.338 seconds
hive> load data local inpath '/home/grid/users.txt' into table tbl;
Copying data from file:/home/grid/users.txt
Copying file: file:/home/grid/users.txt
Loading data to table default.tbl
Table default.tbl stats: [numFiles=1, numRows=0, totalSize=111, rawDataSize=0]
OK
Time taken: 0.567 seconds
hive> select * from tbl;
OK
1 Awyp
2 Azs
3 Als
4 Aww
5 Awyp2
6 Awyp3
7 Awyp4
8 Awyp5
9 Awyp6
10 Awyp7
11 Awyp8
12 Awyp5
13 Awyp9
14 Awyp20
Time taken: 0.237 seconds, Fetched: 14 row(s)
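For reference, the source file can be reconstructed from the query output above. The exact contents of /home/grid/users.txt are an assumption, but 14 '|'-delimited rows like these come to exactly 111 bytes, matching the reported totalSize:

```python
# Hypothetical reconstruction of /home/grid/users.txt: 14 '|'-delimited
# id|name rows, with names taken from the query output above.
names = ["Awyp", "Azs", "Als", "Aww", "Awyp2", "Awyp3", "Awyp4",
         "Awyp5", "Awyp6", "Awyp7", "Awyp8", "Awyp5", "Awyp9", "Awyp20"]
content = "".join(f"{i}|{name}\n" for i, name in enumerate(names, start=1))

print(len(content.splitlines()))  # 14 rows
print(len(content.encode()))      # 111 bytes, matching totalSize=111
```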
II. Test writes
1. No compression
hive> set hive.exec.compress.output;
hive.exec.compress.output=false
hive> create table tbltxt as select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498527794024_0001, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0001/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-27 10:55:29,906 Stage-1 map = 0%, reduce = 0%
2017-06-27 10:55:39,532 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.66 sec
MapReduce Total cumulative CPU time: 2 seconds 660 msec
Ended Job = job_1498527794024_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_10-55-18_962_2187345348997213497-1/-ext-10001
Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbltxt
Table default.tbltxt stats: [numFiles=1, numRows=14, totalSize=111, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.66 sec HDFS Read: 318 HDFS Write: 181 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 660 msec
OK
Time taken: 22.056 seconds
hive> show create table tbltxt;
OK
CREATE TABLE `tbltxt`(
  `id` int,
  `name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hadoop1:9000/user/hive/warehouse/tbltxt'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='14',
  'rawDataSize'='97',
  'totalSize'='111',
  'transient_lastDdlTime'='1498532140')
Time taken: 0.202 seconds, Fetched: 18 row(s)
hive> select * from tbltxt;
OK
1 Awyp
2 Azs
3 Als
4 Aww
5 Awyp2
6 Awyp3
7 Awyp4
8 Awyp5
9 Awyp6
10 Awyp7
11 Awyp8
12 Awyp5
13 Awyp9
14 Awyp20
Time taken: 0.059 seconds, Fetched: 14 row(s)
hive> dfs -ls /user/hive/warehouse/tbltxt;
Found 1 items
-rwxr-xr-x   1 grid supergroup        111 2017-06-27 10:55 /user/hive/warehouse/tbltxt/000000_0
hive> dfs -cat /user/hive/warehouse/tbltxt/000000_0;
1|Awyp
2|Azs
3|Als
4|Aww
5|Awyp2
6|Awyp3
7|Awyp4
8|Awyp5
9|Awyp6
10|Awyp7
11|Awyp8
12|Awyp5
13|Awyp9
14|Awyp20
The formats for reading and writing are:
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
The data can be read normally. The data format is plain text and can be viewed directly with cat.
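As a rough illustration of what "plain text" means here, the data file can be parsed outside Hive with nothing but string splitting, using the '|' delimiter declared in the DDL. This is a hedged sketch, not Hive's actual SerDe code:

```python
# Hedged sketch: parsing the textfile table's data outside Hive.
def parse_rows(text, delim="|"):
    """Split each line into an (id, name) pair, like LazySimpleSerDe would."""
    rows = []
    for line in text.splitlines():
        id_str, name = line.split(delim, 1)
        rows.append((int(id_str), name))
    return rows

sample = "1|Awyp\n2|Azs\n3|Als\n"
print(parse_rows(sample))  # [(1, 'Awyp'), (2, 'Azs'), (3, 'Als')]
```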
2. Compression enabled, using the default codec
hive> set hive.exec.compress.output=true;
hive> set mapred.output.compression.codec;
mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec
As you can see, the current compression codec is the default, DefaultCodec.
hive> create table tbldefault as select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498527794024_0002, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0002/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-27 11:14:44,845 Stage-1 map = 0%, reduce = 0%
2017-06-27 11:14:48,964 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec
MapReduce Total cumulative CPU time: 1 seconds 80 msec
Ended Job = job_1498527794024_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-14-39_351_6035948930260680086-1/-ext-10001
Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbldefault
Table default.tbldefault stats: [numFiles=1, numRows=14, totalSize=76, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.08 sec HDFS Read: 318 HDFS Write: 150 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 80 msec
OK
Time taken: 10.842 seconds
hive> show create table tbldefault;
OK
CREATE TABLE `tbldefault`(
  `id` int,
  `name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hadoop1:9000/user/hive/warehouse/tbldefault'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='14',
  'rawDataSize'='97',
  'totalSize'='76',
  'transient_lastDdlTime'='1498533290')
Time taken: 0.044 seconds, Fetched: 18 row(s)
hive> select * from tbldefault;
OK
1 Awyp
2 Azs
3 Als
4 Aww
5 Awyp2
6 Awyp3
7 Awyp4
8 Awyp5
9 Awyp6
10 Awyp7
11 Awyp8
12 Awyp5
13 Awyp9
14 Awyp20
Time taken: 0.037 seconds, Fetched: 14 row(s)
hive> dfs -ls /user/hive/warehouse/tbldefault;
Found 1 items
-rwxr-xr-x   1 grid supergroup         76 2017-06-27 11:14 /user/hive/warehouse/tbldefault/000000_0.deflate
hive> dfs -cat /user/hive/warehouse/tbldefault/000000_0.deflate;
(binary output; the deflate-compressed data is not human-readable)
As you can see, under default compression the table's read and write formats are the same as for plain text, but the data file is compressed with the default codec and carries the .deflate suffix, so you cannot view its contents directly. In other words, org.apache.hadoop.mapred.TextInputFormat recognizes default compression from the file suffix and reads the content transparently.
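Hadoop's DefaultCodec is essentially the zlib/DEFLATE algorithm. Here is a minimal sketch of why cat shows garbage while a codec-aware reader still sees the rows, using Python's zlib as a stand-in for the real codec (the exact byte layout of a .deflate file may differ slightly):

```python
import zlib

# A few rows of plain text, roughly as Hive writes them before compression.
rows = b"1|Awyp\n2|Azs\n3|Als\n4|Aww\n5|Awyp2\n"

# DefaultCodec is essentially zlib/DEFLATE; this approximates the bytes
# that land in the .deflate file.
compressed = zlib.compress(rows)

# The compressed bytes are not printable text, which is why dfs -cat shows
# garbage; a codec-aware reader simply decompresses before splitting rows.
assert compressed != rows
assert zlib.decompress(compressed) == rows
```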
3. Lzo compression
hive> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec;
hive> create table tbllzo as select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498527794024_0003, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0003/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-27 11:29:08,436 Stage-1 map = 0%, reduce = 0%
2017-06-27 11:29:14,638 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec
MapReduce Total cumulative CPU time: 1 seconds 870 msec
Ended Job = job_1498527794024_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-29-03_249_4340474818139134521-1/-ext-10001
Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbllzo
Table default.tbllzo stats: [numFiles=1, numRows=14, totalSize=106, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.87 sec HDFS Read: 318 HDFS Write: 176 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 870 msec
OK
Time taken: 13.744 seconds
hive> show create table tbllzo;
OK
CREATE TABLE `tbllzo`(
  `id` int,
  `name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hadoop1:9000/user/hive/warehouse/tbllzo'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='14',
  'rawDataSize'='97',
  'totalSize'='106',
  'transient_lastDdlTime'='1498534156')
Time taken: 0.044 seconds, Fetched: 18 row(s)
hive> select * from tbllzo;
OK
1 Awyp
2 Azs
3 Als
4 Aww
5 Awyp2
6 Awyp3
7 Awyp4
8 Awyp5
9 Awyp6
10 Awyp7
11 Awyp8
12 Awyp5
13 Awyp9
14 Awyp20
Time taken: 0.032 seconds, Fetched: 14 row(s)
hive> dfs -ls /user/hive/warehouse/tbllzo;
Found 1 items
-rwxr-xr-x   1 grid supergroup        106 2017-06-27 11:29 /user/hive/warehouse/tbllzo/000000_0.lzo_deflate
hive> dfs -cat /user/hive/warehouse/tbllzo/000000_0.lzo_deflate;
(mostly binary output; fragments of the rows are visible in literal runs, but the file is not directly readable)
Under lzo compression, the table's read and write formats are still the text ones; the data file carries the .lzo_deflate suffix and cannot be viewed directly. In other words, org.apache.hadoop.mapred.TextInputFormat also recognizes lzo compression and reads the content. Impressive!
4. Lzop compression
hive> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
hive> create table tbllzop as select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498527794024_0004, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0004/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-27 11:37:28,010 Stage-1 map = 0%, reduce = 0%
2017-06-27 11:37:32,127 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.1 sec
MapReduce Total cumulative CPU time: 2 seconds 100 msec
Ended Job = job_1498527794024_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-37-23_099_3493082162039010112-1/-ext-10001
Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbllzop
Table default.tbllzop stats: [numFiles=1, numRows=14, totalSize=148, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.1 sec HDFS Read: 318 HDFS Write: 219 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 100 msec
OK
Time taken: 10.233 seconds
hive> show create table tbllzop;
OK
CREATE TABLE `tbllzop`(
  `id` int,
  `name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hadoop1:9000/user/hive/warehouse/tbllzop'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='14',
  'rawDataSize'='97',
  'totalSize'='148',
  'transient_lastDdlTime'='1498534653')
Time taken: 0.046 seconds, Fetched: 18 row(s)
hive> select * from tbllzop;
OK
1 Awyp
2 Azs
3 Als
4 Aww
5 Awyp2
6 Awyp3
7 Awyp4
8 Awyp5
9 Awyp6
10 Awyp7
11 Awyp8
12 Awyp5
13 Awyp9
14 Awyp20
Time taken: 0.033 seconds, Fetched: 14 row(s)
hive> dfs -ls /user/hive/warehouse/tbllzop;
Found 1 items
-rwxr-xr-x   1 grid supergroup        148 2017-06-27 11:37 /user/hive/warehouse/tbllzop/000000_0.lzo
hive> dfs -cat /user/hive/warehouse/tbllzop/000000_0.lzo;
(mostly binary output; fragments of the rows are visible in literal runs, but the file is not directly readable)
Similarly, under lzop compression the table's read and write formats are still the text ones; the data file carries the .lzo suffix and cannot be viewed directly. org.apache.hadoop.mapred.TextInputFormat recognizes lzop compression and reads the content.
As the above shows, whichever compression is used, Hive still treats the data as plain text (just compressed in different ways) and can read it with org.apache.hadoop.mapred.TextInputFormat. When inserting, Hive compresses according to mapred.output.compression.codec, regardless of the inputformat defined for the table. This can be verified as follows:
1. With set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec, inserted data files are lzop-compressed and can be read back normally.
hive> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
hive> create table tbltest1 (id int, name string)
    > stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat'
    > outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
OK
Time taken: 0.493 seconds
hive> insert into table tbltest1 select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498660018952_0001, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0001/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498660018952_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-28 22:59:27,886 Stage-1 map = 0%, reduce = 0%
2017-06-28 22:59:36,427 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.25 sec
MapReduce Total cumulative CPU time: 2 seconds 250 msec
Ended Job = job_1498660018952_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_22-59-14_730_4437480099583255943-1/-ext-10000
Loading data to table default.tbltest1
Table default.tbltest1 stats: [numFiles=1, numRows=14, totalSize=148, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.25 sec HDFS Read: 318 HDFS Write: 220 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 250 msec
OK
Time taken: 24.151 seconds
hive> dfs -ls /user/hive/warehouse/tbltest1;
Found 1 items
-rwxr-xr-x   1 grid supergroup        148 2017-06-28 22:59 /user/hive/warehouse/tbltest1/000000_0.lzo
hive> select * from tbltest1;
OK
1 Awyp
2 Azs
3 Als
4 Aww
5 Awyp2
6 Awyp3
7 Awyp4
8 Awyp5
9 Awyp6
10 Awyp7
11 Awyp8
12 Awyp5
13 Awyp9
14 Awyp20
Time taken: 0.055 seconds, Fetched: 14 row(s)
2. With set mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, inserted data files are default-compressed and can be read back normally.
hive> set mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec;
hive> create table tbltest2 (id int, name string)
    > stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat'
    > outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
OK
Time taken: 0.142 seconds
hive> insert into table tbltest2 select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498660018952_0002, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0002/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498660018952_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-28 23:09:06,439 Stage-1 map = 0%, reduce = 0%
2017-06-28 23:09:11,668 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.15 sec
MapReduce Total cumulative CPU time: 1 seconds 150 msec
Ended Job = job_1498660018952_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_23-09-01_674_9172062679713398655-1/-ext-10000
Loading data to table default.tbltest2
Table default.tbltest2 stats: [numFiles=1, numRows=14, totalSize=76, rawDataSize=97]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.15 sec HDFS Read: 318 HDFS Write: 148 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 150 msec
OK
Time taken: 11.278 seconds
hive> dfs -ls /user/hive/warehouse/tbltest2;
Found 1 items
-rwxr-xr-x   1 grid supergroup         76 2017-06-28 23:09 /user/hive/warehouse/tbltest2/000000_0.deflate
hive> select * from tbltest2;
OK
1 Awyp
2 Azs
3 Als
4 Aww
5 Awyp2
6 Awyp3
7 Awyp4
8 Awyp5
9 Awyp6
10 Awyp7
11 Awyp8
12 Awyp5
13 Awyp9
14 Awyp20
Time taken: 0.035 seconds, Fetched: 14 row(s)
3. When the table is in orc format, data is compressed according to the ORC format and is not affected by mapred.output.compression.codec or hive.exec.compress.output.
hive> set hive.exec.compress.output=false;
hive> create table tbltest3 (id int, name string)
    > stored as orc tblproperties ("orc.compress"="SNAPPY");
OK
Time taken: 0.08 seconds
hive> insert into table tbltest3 select * from tbl;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498660018952_0003, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0003/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498660018952_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2017-06-28 23:30:30,150 Stage-1 map = 0%, reduce = 0%
2017-06-28 23:30:34,007 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.14 sec
MapReduce Total cumulative CPU time: 1 seconds 140 msec
Ended Job = job_1498660018952_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_23-30-25_350_7458831371800658041-1/-ext-10000
Loading data to table default.tbltest3
Table default.tbltest3 stats: [numFiles=1, numRows=14, totalSize=365, rawDataSize=1288]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.14 sec HDFS Read: 318 HDFS Write: 439 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 140 msec
OK
Time taken: 9.963 seconds
hive> dfs -ls /user/hive/warehouse/tbltest3;
Found 1 items
-rwxr-xr-x   1 grid supergroup        365 2017-06-28 23:30 /user/hive/warehouse/tbltest3/000000_0
hive> dfs -cat /user/hive/warehouse/tbltest3/000000_0;
(binary output; only the leading "ORC" magic bytes are recognizable)
hive> show create table tbltest3;
OK
CREATE TABLE `tbltest3`(
  `id` int,
  `name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://hadoop1:9000/user/hive/warehouse/tbltest3'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true',
  'numFiles'='1',
  'numRows'='14',
  'orc.compress'='SNAPPY',
  'rawDataSize'='1288',
  'totalSize'='365',
  'transient_lastDdlTime'='1498663835')
Time taken: 0.217 seconds, Fetched: 19 row(s)
hive> select * from tbltest3;
OK
1 Awyp
2 Azs
3 Als
4 Aww
5 Awyp2
6 Awyp3
7 Awyp4
8 Awyp5
9 Awyp6
10 Awyp7
11 Awyp8
12 Awyp5
13 Awyp9
14 Awyp20
Time taken: 0.689 seconds, Fetched: 14 row(s)
As you can see, with the orc format, inserts are unaffected by the compression parameters, and the inputformat and outputformat are no longer the text ones.
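To make the contrast explicit: for text-based tables compression is a session-level decision, while for ORC it is part of the table definition. A condensed sketch of the two paths, with illustrative table names:

```sql
-- Text table: codec chosen per session at write time.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
CREATE TABLE t_text AS SELECT * FROM tbl;          -- writes 000000_0.lzo

-- ORC table: compression fixed in the table definition; session codec is ignored.
CREATE TABLE t_orc (id INT, name STRING)
  STORED AS ORC TBLPROPERTIES ('orc.compress'='SNAPPY');
INSERT INTO TABLE t_orc SELECT * FROM tbl;
```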
III. Summary
1. No compression, default compression, lzo, lzop, and so on are all text formats as far as Hive is concerned: the codec is identified automatically from the data file's suffix on read, and whether and how to compress on write is decided by the session parameters.
2. Orc is a different kind of format for Hive: however the session parameters are set, data is read and written in the format specified by the table definition.
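The suffix-based identification in point 1 can be sketched in Python. The mapping below mirrors the suffixes observed above (.deflate, .lzo_deflate, .lzo); the function name and structure are my own illustration, not Hadoop's actual code:

```python
# Hypothetical sketch of suffix-based codec detection, mirroring how the
# TextInputFormat read path picks a decompressor from the file name.
SUFFIX_TO_CODEC = {
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".lzo_deflate": "com.hadoop.compression.lzo.LzoCodec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
}

def codec_for(path):
    """Return the codec class for a data file, or None for plain text."""
    for suffix, codec in SUFFIX_TO_CODEC.items():
        if path.endswith(suffix):
            return codec
    return None

print(codec_for("/user/hive/warehouse/tbldefault/000000_0.deflate"))
print(codec_for("/user/hive/warehouse/tbltxt/000000_0"))  # None: plain text
```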
© 2024 shulou.com SLNews company. All rights reserved.