In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >
Share
Shulou(Shulou.com)05/31 Report--
This article introduces the relevant knowledge of "how to use gpfdist and external tables to import data efficiently in greenplum". In the operation process of actual cases, many people will encounter such difficulties. Next, let Xiaobian lead you to learn how to deal with these situations! I hope you can read carefully and learn something!
Greenplum, as an OLAP analytical software, naturally cannot avoid loading a large amount of data from an external database. However, the traditional ETL data transmission method (select=>insert) needs to pass through a single point master of GP, which is very inefficient.
The following describes how to import data quickly from external tables using gpfdist:
Difference between ordinary external table and writable external table:
1. Ordinary external tables can only select, writable external tables can only insert
2, writable external table without error table
3, writable external table can specify the distribution key, if not specified, the default random distribution; ordinary external table can only be randomly distributed
GPFDIST Advantages:
1. Loading concurrently directly by segment
2, directly load the data file, and can read and write (and select the external table type)
3. Default data is randomly distributed, and each node Load Balancer (related to the selected external table type)
Examples:
1. Start gpfdist
After installing GP, it comes with gpfdist file, and you can start the service by directly specifying directory, port, etc. If you need a separate file server, you need to download gpfdist separately on the file server.
[gpadmin@gp-master ~]nohup /disk/GP/bin/gpfdist -p 8081 -d /disk/upload &
Use nohup & to function as a daemon, otherwise the process will be shut down after the client executing the startup service is shut down; specify port 8081, specify the file server directory/disk/upload
[gpadmin@gp-master ~]$ ps -ef|grep gpfdist
gpadmin 816 32606 0 17:08 pts/4 00:00:00 grep gpfdist
gpadmin 13036 1 0 Oct21 ? 00:00:44 /disk/GP/bin/gpfdist -p 8081 -d /disk/upload
2. Create an ordinary external table
CREATE EXTERNAL TABLE "ods". "order" (
"id" varchar(64),
"create_by" varchar(64),
"create_date" timestamp,
"update_by" varchar(64),
"update_date" timestamp,
"del_flag" char(1),
"user_id" varchar(64),
"user_name" varchar(64),
"account_id" varchar(64),
"equ_def_id" varchar(64),
"amount_in" numeric(20,2),
"amount_in_money" numeric(20,2),
"amount_out" numeric(20,2),
"amount_out_money" numeric(20,2),
"fee" numeric(20,2),
"fee_discount" numeric(20,2),
"money" numeric(20,2),
"actual_pay_money" numeric(20,2),
"discount_money" numeric(20,2),
"pay_time" timestamp,
"order_type" varchar(16),
"status" varchar(64),
"rel_biz_type" varchar(64),
"rel_biz_id" varchar(64),
"equ_agreement" varchar(64),
"remarks" varchar(100),
"transaction_type" varchar(1),
"pay_stop_date" timestamp,
"stop_time" timestamp,
"bid_method" varchar(64),
"profit_fee_rate" numeric(20,2),
"quit_charge_rate" numeric(20,2),
"return_tb" numeric(20,2),
"last_order_id" varchar(64),
"root_order_id" varchar(64),
"invest_name_for_me" varchar(64),
"invest_name_for_buyer" varchar(64),
"invest_start_time" timestamp,
"available_amount" numeric(20,2),
"surplus_days" int4,
"transferable_flag" varchar(64),
"trc_order_id" varchar(64)
) LOCATION ('gpfdist://gp-master:8081/order.csv') format 'csv' (DELIMITER ';');
Note that there are several holes:
1. This experimental data order.csv is the test data of mysql in the production environment. There is a time type field in the table. If you export the online data to txt text, your time format will become varchar at this time. When you import it into the time type field of GP, you will report a format error, so try to export it to csv format text.
2. Specify gpfdist information: LOCATION ('gpfdist://file server host name or IP:gpfdist port/load file') format 'file format'(DELIMITER ' delimiter character')
3. Export data from line to text, no wardrobe, no closing characters, only separator characters
Test data 90W recorded data, gpfdist loading data in seconds, high efficiency
Create writable external tables
CREATE WRITABLE EXTERNAL TABLE "ods". "order1" (
"id" varchar(64),
"create_by" varchar(64),
"create_date" timestamp,
"update_by" varchar(64),
"update_date" timestamp,
"del_flag" char(1),
"user_id" varchar(64),
"user_name" varchar(64),
"account_id" varchar(64),
"equ_def_id" varchar(64),
"amount_in" numeric(20,2),
"amount_in_money" numeric(20,2),
"amount_out" numeric(20,2),
"amount_out_money" numeric(20,2),
"fee" numeric(20,2),
"fee_discount" numeric(20,2),
"money" numeric(20,2),
"actual_pay_money" numeric(20,2),
"discount_money" numeric(20,2),
"pay_time" timestamp,
"order_type" varchar(16),
"status" varchar(64),
"rel_biz_type" varchar(64),
"rel_biz_id" varchar(64),
"equ_agreement" varchar(64),
"remarks" varchar(100),
"transaction_type" varchar(1),
"pay_stop_date" timestamp,
"stop_time" timestamp,
"bid_method" varchar(64),
"profit_fee_rate" numeric(20,2),
"quit_charge_rate" numeric(20,2),
"return_tb" numeric(20,2),
"last_order_id" varchar(64),
"root_order_id" varchar(64),
"invest_name_for_me" varchar(64),
"invest_name_for_buyer" varchar(64),
"invest_start_time" timestamp,
"available_amount" numeric(20,2),
"surplus_days" int4,
"transferable_flag" varchar(64),
"trc_order_id" varchar(64)
) LOCATION ('gpfdist://gp-master:8081/order1.csv') format 'CSV' (DELIMITER ';')
DISTRIBUTED BY (id);
"Greenplum how to use gpfdist and external table efficient import data" content is introduced here, thank you for reading. If you want to know more about industry-related knowledge, you can pay attention to the website. Xiaobian will output more high-quality practical articles for everyone!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.