
Hive file compression test


Hive tables can be stored in a variety of formats, such as plain text, lzo, and orc. To find out how these formats relate to one another, I ran the following test.

I. Create a sample table

Hive > create table tbl (id int, name string) row format delimited fields terminated by '|' stored as textfile

OK

Time taken: 0.338 seconds

Hive > load data local inpath '/home/grid/users.txt' into table tbl

Copying data from file:/home/grid/users.txt

Copying file: file:/home/grid/users.txt

Loading data to table default.tbl

Table default.tbl stats: [numFiles=1, numRows=0, totalSize=111, rawDataSize=0]

OK

Time taken: 0.567 seconds

Hive > select * from tbl

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.237 seconds, Fetched: 14 row(s)
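
Judging from the rows returned above, /home/grid/users.txt is just a pipe-delimited text file with one record per line. Reconstructed from the query output (not copied from the original file), it would look roughly like this:

1|Awyp
2|Azs
3|Als
4|Aww
5|Awyp2
(... remaining rows follow the same pattern, ending with 14|Awyp20)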

II. Test writes

1. No compression

Hive > set hive.exec.compress.output

hive.exec.compress.output=false

Hive >

>

> create table tbltxt as select * from tbl

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498527794024_0001, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0001/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-27 10:55:29,906 Stage-1 map = 0%, reduce = 0%

2017-06-27 10:55:39,532 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.66 sec

MapReduce Total cumulative CPU time: 2 seconds 660 msec

Ended Job = job_1498527794024_0001

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_10-55-18_962_2187345348997213497-1/-ext-10001

Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbltxt

Table default.tbltxt stats: [numFiles=1, numRows=14, totalSize=111, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 2.66 sec HDFS Read: 318 HDFS Write: 181 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 660 msec

OK

Time taken: 22.056 seconds

Hive >

> show create table tbltxt

OK

CREATE TABLE `tbltxt` (

`id` int,

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/tbltxt'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='14',

'rawDataSize'='97',

'totalSize'='111',

'transient_lastDdlTime'='1498532140')

Time taken: 0.202 seconds, Fetched: 18 row(s)

Hive >

>

> select * from tbltxt

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.059 seconds, Fetched: 14 row(s)

Hive >

>

> dfs -ls /user/hive/warehouse/tbltxt

Found 1 items

-rwxr-xr-x 1 grid supergroup 111 2017-06-27 10:55 /user/hive/warehouse/tbltxt/000000_0

Hive >

>

> dfs -cat /user/hive/warehouse/tbltxt/000000_0

1Awyp

2Azs

3Als

4Aww

5Awyp2

6Awyp3

7Awyp4

8Awyp5

9Awyp6

10Awyp7

11Awyp8

12Awyp5

13Awyp9

14Awyp20

The formats for reading and writing are:

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

The data can be read normally. The data format is plain text and can be viewed directly with cat.

2. Compression enabled, using the default codec

Hive >

> set hive.exec.compress.output=true

Hive >

>

> set mapred.output.compression.codec

mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec

It can be seen that the current compression format is the default DefaultCodec.
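
Other codecs that ship with Hadoop can be selected the same way, provided they are available on the cluster (Snappy, for example, needs the native library). A few illustrative settings, not run in this session:

hive> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
hive> set mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
hive> set mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;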

Hive >

> create table tbldefault as select * from tbl

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498527794024_0002, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0002/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-27 11:14:44,845 Stage-1 map = 0%, reduce = 0%

2017-06-27 11:14:48,964 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.08 sec

MapReduce Total cumulative CPU time: 1 seconds 80 msec

Ended Job = job_1498527794024_0002

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-14-39_351_6035948930260680086-1/-ext-10001

Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbldefault

Table default.tbldefault stats: [numFiles=1, numRows=14, totalSize=76, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.08 sec HDFS Read: 318 HDFS Write: 150 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 80 msec

OK

Time taken: 10.842 seconds

Hive >

>

> show create table tbldefault

OK

CREATE TABLE `tbldefault` (

`id` int,

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/tbldefault'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='14',

'rawDataSize'='97',

'totalSize'='76',

'transient_lastDdlTime'='1498533290')

Time taken: 0.044 seconds, Fetched: 18 row(s)

Hive >

>

> select * from tbldefault

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.037 seconds, Fetched: 14 row(s)

Hive >

>

> dfs -ls /user/hive/warehouse/tbldefault

Found 1 items

-rwxr-xr-x 1 grid supergroup 76 2017-06-27 11:14 /user/hive/warehouse/tbldefault/000000_0.deflate

Hive >

> dfs -cat /user/hive/warehouse/tbldefault/000000_0.deflate

(the output is unreadable binary data)

It can be seen that with default compression, the table's read and write formats are the same as for the plain-text table, but the data file is compressed with the default codec and carries the .deflate suffix, so it cannot be viewed directly. In other words, org.apache.hadoop.mapred.TextInputFormat recognizes the default compression from the file suffix and can still read the content.
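
If you do need to peek inside such a file, one option is dfs -text instead of dfs -cat: depending on the Hadoop version, -text can pick a decompression codec from the file suffix and print the decoded records. A sketch, not part of the original session:

hive> dfs -text /user/hive/warehouse/tbldefault/000000_0.deflate;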

3. Lzo compression

Hive >

> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzoCodec

Hive >

>

> create table tbllzo as select * from tbl

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498527794024_0003, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0003/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0003

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-27 11:29:08,436 Stage-1 map = 0%, reduce = 0%

2017-06-27 11:29:14,638 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.87 sec

MapReduce Total cumulative CPU time: 1 seconds 870 msec

Ended Job = job_1498527794024_0003

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-29-03_249_4340474818139134521-1/-ext-10001

Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbllzo

Table default.tbllzo stats: [numFiles=1, numRows=14, totalSize=106, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.87 sec HDFS Read: 318 HDFS Write: 176 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 870 msec

OK

Time taken: 13.744 seconds

Hive >

>

> show create table tbllzo

OK

CREATE TABLE `tbllzo` (

`id` int,

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/tbllzo'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='14',

'rawDataSize'='97',

'totalSize'='106',

'transient_lastDdlTime'='1498534156')

Time taken: 0.044 seconds, Fetched: 18 row(s)

Hive >

> select * from tbllzo

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.032 seconds, Fetched: 14 row(s)

Hive >

>

> dfs -ls /user/hive/warehouse/tbllzo

Found 1 items

-rwxr-xr-x 1 grid supergroup 106 2017-06-27 11:29 /user/hive/warehouse/tbllzo/000000_0.lzo_deflate

Hive >

>

> dfs -cat /user/hive/warehouse/tbllzo/000000_0.lzo_deflate

Ob1Awyp

2Azs

3Als

4Aww

5Awyp2

(garbled binary bytes omitted)

13Awyp9

14Awyp20

Under lzo compression, the table's read and write formats are still org.apache.hadoop.mapred.TextInputFormat, the data file has the .lzo_deflate suffix, and the content cannot be viewed directly. In other words, org.apache.hadoop.mapred.TextInputFormat can recognize lzo compression and read the content. Quite powerful!
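
One caveat: com.hadoop.compression.lzo.LzoCodec is not bundled with stock Hadoop; it comes from the separate hadoop-lzo library and has to be registered on the cluster, typically through the io.compression.codecs property. Assuming such a setup, you can check it from the Hive CLI:

hive> set io.compression.codecs;

The returned list should include com.hadoop.compression.lzo.LzoCodec and com.hadoop.compression.lzo.LzopCodec.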

4. Lzop compression

Hive >

> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec

Hive >

> create table tbllzop as select * from tbl

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498527794024_0004, Tracking URL = http://hadoop1:8088/proxy/application_1498527794024_0004/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498527794024_0004

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-27 11:37:28,010 Stage-1 map = 0%, reduce = 0%

2017-06-27 11:37:32,127 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.1 sec

MapReduce Total cumulative CPU time: 2 seconds 100 msec

Ended Job = job_1498527794024_0004

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-27_11-37-23_099_3493082162039010112-1/-ext-10001

Moving data to: hdfs://hadoop1:9000/user/hive/warehouse/tbllzop

Table default.tbllzop stats: [numFiles=1, numRows=14, totalSize=148, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 2.1 sec HDFS Read: 318 HDFS Write: 219 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 100 msec

OK

Time taken: 10.233 seconds

Hive >

>

> show create table tbllzop

OK

CREATE TABLE `tbllzop` (

`id` int,

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

STORED AS INPUTFORMAT

'org.apache.hadoop.mapred.TextInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/tbllzop'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='14',

'rawDataSize'='97',

'totalSize'='148',

'transient_lastDdlTime'='1498534653')

Time taken: 0.046 seconds, Fetched: 18 row(s)

Hive >

>

>

> select * from tbllzop

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.033 seconds, Fetched: 14 row(s)

Hive >

>

> dfs -ls /user/hive/warehouse/tbllzop

Found 1 items

-rwxr-xr-x 1 grid supergroup 148 2017-06-27 11:37 /user/hive/warehouse/tbllzop/000000_0.lzo

Hive >

>

> dfs -cat /user/hive/warehouse/tbllzop/000000_0.lzo

Ob1Awyp

2Azs

3Als

4Aww

5Awyp2

(garbled binary bytes omitted)

13Awyp9

14Awyp20

Similarly, under lzop compression, the table's read and write formats are still org.apache.hadoop.mapred.TextInputFormat, the data file has the .lzo suffix, and the content cannot be viewed directly. org.apache.hadoop.mapred.TextInputFormat recognizes lzop compression and reads the content.
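
Because LzopCodec writes standard lzop framing, such a file can also be copied out of HDFS and unpacked locally with the lzop command-line tool, assuming it is installed; a sketch:

$ hdfs dfs -get /user/hive/warehouse/tbllzop/000000_0.lzo .
$ lzop -d 000000_0.lzo    # produces the plain-text file 000000_0
$ cat 000000_0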

As the tests above show, no matter which compression is used, Hive still treats the data as plain text (just compressed in different ways) and reads it with org.apache.hadoop.mapred.TextInputFormat; when inserting, Hive compresses the output solely according to mapred.output.compression.codec, regardless of the input format declared on the table. This can be verified as follows:

1. With mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec set, inserted data is written as an lzop-compressed file and can still be read back normally.

Hive > set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec

Hive >

> create table tbltest1 (id int, name string)

> stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat'

> outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

OK

Time taken: 0.493 seconds

Hive >

> insert into table tbltest1 select * from tbl

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498660018952_0001, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0001/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498660018952_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-28 22:59:27,886 Stage-1 map = 0%, reduce = 0%

2017-06-28 22:59:36,427 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.25 sec

MapReduce Total cumulative CPU time: 2 seconds 250 msec

Ended Job = job_1498660018952_0001

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_22-59-14_730_4437480099583255943-1/-ext-10000

Loading data to table default.tbltest1

Table default.tbltest1 stats: [numFiles=1, numRows=14, totalSize=148, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 2.25 sec HDFS Read: 318 HDFS Write: 220 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 250 msec

OK

Time taken: 24.151 seconds

Hive >

> dfs -ls /user/hive/warehouse/tbltest1

Found 1 items

-rwxr-xr-x 1 grid supergroup 148 2017-06-28 22:59 /user/hive/warehouse/tbltest1/000000_0.lzo

Hive >

> select * from tbltest1

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.055 seconds, Fetched: 14 row(s)

2. With mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec set, inserted data is written with the default codec and can still be read back normally.

Hive > set mapred.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec

Hive > create table tbltest2 (id int, name string)

> stored as inputformat 'org.apache.hadoop.mapred.TextInputFormat'

> outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

OK

Time taken: 0.142 seconds

Hive > insert into table tbltest2 select * from tbl

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498660018952_0002, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0002/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498660018952_0002

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-28 23:09:06,439 Stage-1 map = 0%, reduce = 0%

2017-06-28 23:09:11,668 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.15 sec

MapReduce Total cumulative CPU time: 1 seconds 150 msec

Ended Job = job_1498660018952_0002

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_23-09-01_674_9172062679713398655-1/-ext-10000

Loading data to table default.tbltest2

Table default.tbltest2 stats: [numFiles=1, numRows=14, totalSize=76, rawDataSize=97]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.15 sec HDFS Read: 318 HDFS Write: 148 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 150 msec

OK

Time taken: 11.278 seconds

Hive >

>

>

> dfs -ls /user/hive/warehouse/tbltest2

Found 1 items

-rwxr-xr-x 1 grid supergroup 76 2017-06-28 23:09 /user/hive/warehouse/tbltest2/000000_0.deflate

Hive >

> select * from tbltest2

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.035 seconds, Fetched: 14 row(s)

3. When the table is stored as ORC, the data is compressed according to the ORC settings and is not affected by mapred.output.compression.codec or hive.exec.compress.output.

Hive > set hive.exec.compress.output=false

Hive > create table tbltest3 (id int, name string)

> stored as orc tblproperties ("orc.compress" = "SNAPPY")

OK

Time taken: 0.08 seconds

Hive > insert into table tbltest3 select * from tbl

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1498660018952_0003, Tracking URL = http://hadoop1:8088/proxy/application_1498660018952_0003/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1498660018952_0003

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2017-06-28 23 3015 Stage-1 map = 0%, reduce = 0%

2017-06-28 23:30:34,007 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.14 sec

MapReduce Total cumulative CPU time: 1 seconds 140 msec

Ended Job = job_1498660018952_0003

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://hadoop1:9000/tmp/hive-grid/hive_2017-06-28_23-30-25_350_7458831371800658041-1/-ext-10000

Loading data to table default.tbltest3

Table default.tbltest3 stats: [numFiles=1, numRows=14, totalSize=365, rawDataSize=1288]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Cumulative CPU: 1.14 sec HDFS Read: 318 HDFS Write: 439 SUCCESS

Total MapReduce CPU Time Spent: 1 seconds 140 msec

OK

Time taken: 9.963 seconds

Hive > dfs -ls /user/hive/warehouse/tbltest3

Found 1 items

-rwxr-xr-x 1 grid supergroup 365 2017-06-28 23:30 /user/hive/warehouse/tbltest3/000000_0

Hive >

> dfs -cat /user/hive/warehouse/tbltest3/000000_0

ORC

(the rest of the output is unreadable binary data)

Hive >

> show create table tbltest3

OK

CREATE TABLE `tbltest3` (

`id` int,

`name` string)

ROW FORMAT SERDE

'org.apache.hadoop.hive.ql.io.orc.OrcSerde'

STORED AS INPUTFORMAT

'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'

OUTPUTFORMAT

'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

LOCATION

'hdfs://hadoop1:9000/user/hive/warehouse/tbltest3'

TBLPROPERTIES (

'COLUMN_STATS_ACCURATE'='true',

'numFiles'='1',

'numRows'='14',

'orc.compress'='SNAPPY',

'rawDataSize'='1288',

'totalSize'='365',

'transient_lastDdlTime'='1498663835')

Time taken: 0.217 seconds, Fetched: 19 row(s)

Hive >

> select * from tbltest3

OK

1 Awyp

2 Azs

3 Als

4 Aww

5 Awyp2

6 Awyp3

7 Awyp4

8 Awyp5

9 Awyp6

10 Awyp7

11 Awyp8

12 Awyp5

13 Awyp9

14 Awyp20

Time taken: 0.689 seconds, Fetched: 14 row(s)

As you can see, when the table is in ORC format, inserts are not affected by the compression parameters, and the input and output formats are no longer the text formats.
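
To see what ORC actually did with the data (compression kind, stripes, row count), Hive ships an ORC file dump utility; a hedged example, since the exact invocation varies by version:

$ hive --orcfiledump /user/hive/warehouse/tbltest3/000000_0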

III. Summary

1. Uncompressed, default-compressed, lzo, and lzop files are all text formats as far as Hive is concerned: on read, Hive identifies the compression automatically from the data file's suffix; on write, it decides whether and how to compress based on the session parameters.
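
For quick reference, the codec-to-suffix mapping observed above: DefaultCodec writes .deflate files, LzoCodec writes .lzo_deflate files, and LzopCodec writes .lzo files, and all of them remain readable through org.apache.hadoop.mapred.TextInputFormat.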

2. ORC is a different kind of format for Hive. No matter how the compression parameters are set, data is read and written according to the format defined on the table itself.
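
Put together, the two write paths look roughly like this (a sketch based on the sessions above; the table names are placeholders):

-- text-backed table: compression is decided by session parameters at write time
hive> set hive.exec.compress.output=true;
hive> set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
hive> insert into table some_text_table select * from tbl;

-- ORC table: compression is fixed in the table definition and ignores those parameters
hive> create table some_orc_table (id int, name string)
    >   stored as orc tblproperties ("orc.compress" = "SNAPPY");
hive> insert into table some_orc_table select * from tbl;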
