Updated 2025-03-28. From: SLTechnology News&Howtos
Shulou (Shulou.com), 06/01 report
This article introduces the main features of Delta Lake 0.5.0.
Delta Lake 0.5.0 was officially released on December 13, 2019, and the release can be downloaded from https://github.com/delta-io/delta/releases/tag/v0.5.0?spm=a2c6h.12873639.0.0.23ea24406RkWrF&file=v0.5.0. This version lets multiple query engines, such as the widely used Hive and Presto, query Delta Lake data, and it improves support for concurrent operations. It still does not support using SQL directly to insert, delete, update, and query Delta Lake data; that may have to wait until the release of Apache Spark 3.0.0 in January.
Support multiple query engines by using manifest files
In earlier versions of Delta Lake, only Spark could query Delta Lake data, which limited its usage scenarios. With the introduction of manifest files (see #76, https://github.com/delta-io/delta/issues/76?spm=a2c6h.12873639.0.0.152d2440UP8orM), query engines such as Presto and Amazon Athena can now query Delta Lake data. Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 directly with standard SQL; it is implemented internally using Presto.
Manifest files can be generated using Scala, Java, Python, or SQL. For more information, see the Delta Lake documentation: Presto and Athena to Delta Lake Integration, https://docs.delta.io/0.5.0/presto-integration.html?spm=a2c6h.12873639.0.0.152d2440FjP2qK
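As a rough illustration of the Python route, the sketch below wraps the `generate` call described in the Delta Lake 0.5.0 Presto and Athena integration documentation. The helper names `generate_manifest` and `manifest_location` are hypothetical, and `delta_table` is assumed to be a `delta.tables.DeltaTable` created in a Spark session that has the Delta Lake 0.5.0 library available.

```python
def generate_manifest(delta_table):
    """Write (or refresh) the manifest files for a Delta table.

    `delta_table` is assumed to be a delta.tables.DeltaTable, for example the
    result of DeltaTable.forPath(spark, "/path/to/table").
    """
    # "symlink_format_manifest" is the manifest mode read by Presto and Athena.
    delta_table.generate("symlink_format_manifest")


def manifest_location(table_path):
    """Return the directory an external Presto/Athena table would point at.

    Assumes manifests are written under <table_path>/_symlink_format_manifest/.
    """
    return table_path.rstrip("/") + "/_symlink_format_manifest/"
```

A manifest describes one snapshot of the table, so it most likely has to be regenerated after the table changes before other engines can see the new data.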
In addition to Presto and Amazon Athena, this release also supports Redshift Spectrum, Snowflake, and Hive. Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it easy and affordable to analyze all your data with existing business intelligence tools. Amazon Redshift cannot analyze data on S3 directly; it needs the data to be copied from S3 into Amazon Redshift. Redshift Spectrum is a feature of Amazon Redshift that supports analyzing data on S3 directly. Snowflake is a cloud data warehouse product from a US company that supports analyzing data on S3. For Hive, only Delta Lake data is supported, not the metastore.
However, because this support is implemented through manifest files, some native Delta Lake features are not available for the time being. For example, data consistency may not be guaranteed, and the query engine on top is not aware of changes to the underlying schema, so the table definition has to be re-created after a schema change. For more information, see the limitations section: https://docs.delta.io/0.5.0/presto-integration.html?spm=a2c6h.12873639.0.0.152d24401m0KhO#limitations
Better support for concurrent operations
More Delta Lake operations can now run at the same time. This is achieved by making the conflict detection in Delta Lake's optimistic concurrency control more fine-grained, which allows more complex workflows to run on a Delta Lake table:
Delete old partitions while new partitions are being added
Run updates and merges concurrently on disjoint partitions
Append data to a Delta Lake table while its files are being compacted
For details, see Concurrency Control, https://docs.delta.io/0.5.0/concurrency-control.html.
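To make the idea of partition-level conflict detection concrete, here is a toy model in plain Python. It is not Delta Lake's implementation, and it ignores isolation levels and blind appends; the `Txn` class and the `conflicts` function are hypothetical names used only for illustration. The point is simply that two writers collide only when one of them changed a partition the other one read or changed.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Txn:
    """A toy transaction: the partitions it read and the partitions it changed."""
    name: str
    read_partitions: FrozenSet[str] = frozenset()
    changed_partitions: FrozenSet[str] = frozenset()


def conflicts(winner: Txn, loser: Txn) -> bool:
    """Return True if `loser` must abort because `winner` committed first.

    Simplified rule: the loser conflicts when the winner changed any partition
    that the loser read or changed. A table-level check would instead reject
    every pair of concurrent writers.
    """
    touched_by_loser = loser.read_partitions | loser.changed_partitions
    return bool(winner.changed_partitions & touched_by_loser)


# Hypothetical partition values, used only for this example.
dec01 = frozenset({"date=2019-12-01"})
dec02 = frozenset({"date=2019-12-02"})
dec13 = frozenset({"date=2019-12-13"})

update_dec01 = Txn("update", read_partitions=dec01, changed_partitions=dec01)
merge_dec02 = Txn("merge", read_partitions=dec02, changed_partitions=dec02)
second_update_dec01 = Txn("update-2", read_partitions=dec01, changed_partitions=dec01)
add_new_partition = Txn("add", changed_partitions=dec13)
delete_old_partition = Txn("delete", read_partitions=dec01, changed_partitions=dec01)

# Disjoint partitions: both writers can commit.
assert not conflicts(update_dec01, merge_dec02)
# Adding a new partition does not disturb deleting an old one.
assert not conflicts(add_new_partition, delete_old_partition)
# Same partition: the second writer has to retry.
assert conflicts(update_dec01, second_update_dec01)
```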
Improved support for file compaction
When compacting data, you can now rewrite the files with the dataChange option of DataFrameWriter set to false. This option allows compaction to run concurrently with other batch and streaming operations. For information on how to compact files, see https://docs.delta.io/0.5.0/best-practices.html?spm=a2c6h.12873639.0.0.23ea2440hRtZny#compact-files .
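A minimal sketch of such a compaction job in PySpark is shown below, following the pattern in the Delta Lake best-practices documentation. The function name `compact_table` is hypothetical, and `spark` is assumed to be a SparkSession with the Delta Lake 0.5.0 library available; choosing `num_files` is left to the caller.

```python
def compact_table(spark, path, num_files):
    """Rewrite the Delta table at `path` into `num_files` files.

    `spark` is assumed to be a SparkSession configured with Delta Lake.
    """
    (
        spark.read.format("delta")
        .load(path)
        .repartition(num_files)          # fewer, larger files
        .write
        .option("dataChange", "false")   # only the file layout changes, not the data
        .format("delta")
        .mode("overwrite")
        .save(path)
    )
```

Because the rewrite is marked as not changing data, it should not interfere with concurrent batch or streaming jobs on the same table in the way a regular overwrite would.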
Improved performance of insert-only merge
Delta Lake now provides better-optimized performance for merge operations that contain only insert clauses and no update clauses. In addition, Delta Lake ensures that such an insert-only merge only appends new data to the table. For example, a common ETL operation appends collected log data to a Delta Lake table. The sources often generate duplicate log records, which would otherwise have to be removed by downstream steps; with this feature, the duplicates can be kept out of the table at insert time. For more information, see https://docs.delta.io/0.5.0/delta-update.html?spm=a2c6h.12873639.0.0.152d2440yuXqE9#-merge-in-dedup.
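The deduplicating append described above could look roughly like the following in PySpark. This is a sketch rather than the documented recipe verbatim: `insert_new_records` is a hypothetical helper, `delta_table` is assumed to be a `delta.tables.DeltaTable`, `new_logs` a DataFrame of freshly collected records, and `uniqueId` a placeholder name for whatever column identifies a record.

```python
def insert_new_records(delta_table, new_logs, key="uniqueId"):
    """Append only those rows of `new_logs` whose key is not yet in the table.

    The merge has a single "when not matched" insert clause and no update
    clause, which makes it an insert-only merge.
    """
    (
        delta_table.alias("logs")
        .merge(new_logs.alias("newLogs"), f"logs.{key} = newLogs.{key}")
        .whenNotMatchedInsertAll()
        .execute()
    )
```

Rows that already exist in the table are left untouched, so running the same batch twice should not create duplicates. Duplicates inside `new_logs` itself would still need to be removed first, for example with `dropDuplicates`.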
Convert Parquet table to Delta Lake table through SQL
Delta Lake 0.4.0 already supported this conversion through Scala, Java, and Python. To make it easier to use, Delta Lake 0.5.0 supports converting Parquet tables to Delta Lake tables directly through SQL, as follows:
-- Convert unpartitioned parquet table at path 'path/to/table'
CONVERT TO DELTA parquet.`path/to/table`
-- Convert partitioned parquet table at path 'path/to/table' and partitioned by integer column named 'part'
CONVERT TO DELTA parquet.`path/to/table` PARTITIONED BY (part int)