Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use gpfdist and external tables to import data efficiently in greenplum

2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Share

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "how to use gpfdist and external tables to import data efficiently in greenplum". In the operation process of actual cases, many people will encounter such difficulties. Next, let Xiaobian lead you to learn how to deal with these situations! I hope you can read carefully and learn something!

Greenplum, as an OLAP analytical software, naturally cannot avoid loading a large amount of data from an external database. However, the traditional ETL data transmission method (select=>insert) needs to pass through a single point master of GP, which is very inefficient.

The following describes how to import data quickly from external tables using gpfdist:

Difference between ordinary external table and writable external table:

1. Ordinary external tables can only select, writable external tables can only insert

2, writable external table without error table

3, writable external table can specify the distribution key, if not specified, the default random distribution; ordinary external table can only be randomly distributed

GPFDIST Advantages:

1. Loading concurrently directly by segment

2, directly load the data file, and can read and write (and select the external table type)

3. Default data is randomly distributed, and each node Load Balancer (related to the selected external table type)

Examples:

1. Start gpfdist

After installing GP, it comes with gpfdist file, and you can start the service by directly specifying directory, port, etc. If you need a separate file server, you need to download gpfdist separately on the file server.

[gpadmin@gp-master ~]nohup /disk/GP/bin/gpfdist -p 8081 -d /disk/upload &

Use nohup & to function as a daemon, otherwise the process will be shut down after the client executing the startup service is shut down; specify port 8081, specify the file server directory/disk/upload

[gpadmin@gp-master ~]$ ps -ef|grep gpfdist

gpadmin 816 32606 0 17:08 pts/4 00:00:00 grep gpfdist

gpadmin 13036 1 0 Oct21 ? 00:00:44 /disk/GP/bin/gpfdist -p 8081 -d /disk/upload

2. Create an ordinary external table

CREATE EXTERNAL TABLE "ods". "order" (

"id" varchar(64),

"create_by" varchar(64),

"create_date" timestamp,

"update_by" varchar(64),

"update_date" timestamp,

"del_flag" char(1),

"user_id" varchar(64),

"user_name" varchar(64),

"account_id" varchar(64),

"equ_def_id" varchar(64),

"amount_in" numeric(20,2),

"amount_in_money" numeric(20,2),

"amount_out" numeric(20,2),

"amount_out_money" numeric(20,2),

"fee" numeric(20,2),

"fee_discount" numeric(20,2),

"money" numeric(20,2),

"actual_pay_money" numeric(20,2),

"discount_money" numeric(20,2),

"pay_time" timestamp,

"order_type" varchar(16),

"status" varchar(64),

"rel_biz_type" varchar(64),

"rel_biz_id" varchar(64),

"equ_agreement" varchar(64),

"remarks" varchar(100),

"transaction_type" varchar(1),

"pay_stop_date" timestamp,

"stop_time" timestamp,

"bid_method" varchar(64),

"profit_fee_rate" numeric(20,2),

"quit_charge_rate" numeric(20,2),

"return_tb" numeric(20,2),

"last_order_id" varchar(64),

"root_order_id" varchar(64),

"invest_name_for_me" varchar(64),

"invest_name_for_buyer" varchar(64),

"invest_start_time" timestamp,

"available_amount" numeric(20,2),

"surplus_days" int4,

"transferable_flag" varchar(64),

"trc_order_id" varchar(64)

) LOCATION ('gpfdist://gp-master:8081/order.csv') format 'csv' (DELIMITER ';');

Note that there are several holes:

1. This experimental data order.csv is the test data of mysql in the production environment. There is a time type field in the table. If you export the online data to txt text, your time format will become varchar at this time. When you import it into the time type field of GP, you will report a format error, so try to export it to csv format text.

2. Specify gpfdist information: LOCATION ('gpfdist://file server host name or IP:gpfdist port/load file') format 'file format'(DELIMITER ' delimiter character')

3. Export data from line to text, no wardrobe, no closing characters, only separator characters

Test data 90W recorded data, gpfdist loading data in seconds, high efficiency

Create writable external tables

CREATE WRITABLE EXTERNAL TABLE "ods". "order1" (

"id" varchar(64),

"create_by" varchar(64),

"create_date" timestamp,

"update_by" varchar(64),

"update_date" timestamp,

"del_flag" char(1),

"user_id" varchar(64),

"user_name" varchar(64),

"account_id" varchar(64),

"equ_def_id" varchar(64),

"amount_in" numeric(20,2),

"amount_in_money" numeric(20,2),

"amount_out" numeric(20,2),

"amount_out_money" numeric(20,2),

"fee" numeric(20,2),

"fee_discount" numeric(20,2),

"money" numeric(20,2),

"actual_pay_money" numeric(20,2),

"discount_money" numeric(20,2),

"pay_time" timestamp,

"order_type" varchar(16),

"status" varchar(64),

"rel_biz_type" varchar(64),

"rel_biz_id" varchar(64),

"equ_agreement" varchar(64),

"remarks" varchar(100),

"transaction_type" varchar(1),

"pay_stop_date" timestamp,

"stop_time" timestamp,

"bid_method" varchar(64),

"profit_fee_rate" numeric(20,2),

"quit_charge_rate" numeric(20,2),

"return_tb" numeric(20,2),

"last_order_id" varchar(64),

"root_order_id" varchar(64),

"invest_name_for_me" varchar(64),

"invest_name_for_buyer" varchar(64),

"invest_start_time" timestamp,

"available_amount" numeric(20,2),

"surplus_days" int4,

"transferable_flag" varchar(64),

"trc_order_id" varchar(64)

) LOCATION ('gpfdist://gp-master:8081/order1.csv') format 'CSV' (DELIMITER ';')

DISTRIBUTED BY (id);

"Greenplum how to use gpfdist and external table efficient import data" content is introduced here, thank you for reading. If you want to know more about industry-related knowledge, you can pay attention to the website. Xiaobian will output more high-quality practical articles for everyone!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Servers

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report