2025-04-07 Update From: SLTechnology News&Howtos
This article shows how to perform DDL operations in the Delta Lake data lake. The content is concise and easy to follow, and I hope you find it useful.
An earlier article covered a brief introduction to Delta Lake, its features, and its basic operations. This article focuses on Delta Lake DDL operations, which depend on the Spark DataSource V2 and Catalog APIs (Spark 3.0+), so when integrating Delta Lake with Spark it is best to start with Spark 3.0, which was recently released.
There are some requirements when creating the SparkSession; you need to add two configurations:

val spark = SparkSession.builder()
  .appName(this.getClass.getCanonicalName)
  .master("local[2]")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()
1. Create a table
Delta Lake supports two ways of creating a table:

1) DataFrameWriter, which should be familiar: it is Spark's default way of writing files.

df.write.format("delta").saveAsTable("events")   // create table in the metastore
df.write.format("delta").save("/delta/events")   // create table by path
2) Delta Lake also supports creating tables with Spark SQL's new DDL, i.e. CREATE TABLE.
-- Create table in the metastore
CREATE TABLE events (
  date DATE,
  eventId STRING,
  eventType STRING,
  data STRING)
USING DELTA
When you create a table in the metastore with Delta Lake, the location of the data is recorded in the metastore. The benefit is obvious: the table is easier for others to find, and they do not need to know the actual location of the data. However, the metastore does not track whether the data content is valid.
2. Data partitioning
When building a data warehouse in production, data is usually partitioned, which speeds up queries and optimizes DML operations. To create a partitioned table with Delta Lake, you only need to specify the partition column. Below is a common example of partitioning by date:
1) DDL operation
-- Create table in the metastore
CREATE TABLE events (
  date DATE,
  eventId STRING,
  eventType STRING,
  data STRING)
USING DELTA
PARTITIONED BY (date)
LOCATION '/delta/events'
2) Scala API

df.write.format("delta").partitionBy("date").saveAsTable("events")   // create table in the metastore
df.write.format("delta").partitionBy("date").save("/delta/events")   // create table by path
3. Specify storage location
We can control where a Delta Lake table stores its data files by specifying a path in the DDL.
This is very similar to a Hive external table: a Delta Lake table at a specified location can be regarded as unmanaged by the metastore, and when such a table is dropped, the data is not actually deleted.
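As a hedged sketch of this external-table-like behavior (assuming an events table that was created with an explicit LOCATION):

```sql
-- Removes only the metastore entry for the table; because the table
-- was created with an explicit LOCATION, the Delta files under
-- /delta/events remain on disk and can be re-registered later.
DROP TABLE events;
```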
If data files already exist at the specified path when the Delta Lake table is created, Delta Lake does the following:
1) If you specify only the table name and path when creating the table, as follows:

CREATE TABLE events
USING DELTA
LOCATION '/delta/events'
then the table in the Hive metastore automatically infers the schema, partitioning, and properties from the existing data. This feature can be used to import existing data into the metastore.
2) If you specify some configuration (schema, partitioning, or table properties), Delta Lake verifies only the configuration you specified against the existing data, not the full configuration. If the configuration you specify does not match the existing data, an inconsistency exception is thrown.
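To make point 2) concrete, here is a hedged sketch: assuming /delta/events already holds Delta data, a CREATE TABLE that declares a schema and partitioning must match what is on disk.

```sql
-- The declared schema and partitioning must match the existing data
-- at /delta/events; any mismatch raises an inconsistency exception.
CREATE TABLE events (
  date DATE,
  eventId STRING,
  eventType STRING,
  data STRING)
USING DELTA
PARTITIONED BY (date)
LOCATION '/delta/events'
```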
4. Read data
The data can be queried directly with SQL, and veteran Spark users can also query it directly with the DataFrame API.
SQL query:

SELECT * FROM events                  -- query table in the metastore
SELECT * FROM delta.`/delta/events`   -- query table by path

DataFrame query:

spark.table("events")                              // query table in the metastore
spark.read.format("delta").load("/delta/events")   // query table by path
The DataFrame automatically reads the latest snapshot of the data; users do not need to refresh the table. When predicates can be pushed down, Delta Lake automatically uses partitioning and statistics to optimize the query and reduce the amount of data loaded.
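For example, a filter on the partition column lets Delta Lake skip whole partitions and use per-file statistics to prune further (a sketch against the events table defined above):

```sql
-- Only the matching date partitions are scanned; file-level
-- statistics prune files whose min/max dates fall outside the range.
SELECT * FROM events
WHERE date >= '2017-01-01' AND date <= '2017-01-31'
```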
5. Write data
A) Append

Spark's own append mode can be used to append data to an existing table:

df.write.format("delta").mode("append").save("/delta/events")
df.write.format("delta").mode("append").saveAsTable("events")
Of course, delta also supports insert into:
INSERT INTO events SELECT * FROM newEvents
B) Overwrite

Delta Lake also supports overwriting an entire table directly, using overwrite mode.

The DataFrame API is as follows:

df.write.format("delta").mode("overwrite").save("/delta/events")
df.write.format("delta").mode("overwrite").saveAsTable("events")
The SQL API format is as follows:
INSERT OVERWRITE events SELECT * FROM newEvents
With the DataFrame API, you can also overwrite only the data in specified partitions. The following example replaces only the January 2017 data:

df.write
  .format("delta")
  .mode("overwrite")
  .option("replaceWhere", "date >= '2017-01-01' AND date <= '2017-01-31'")
  .save("/delta/events")