This article introduces the features of the Apache Hudi 0.5.2 release in detail. The content is laid out step by step; hopefully it answers your questions about what this version brings, so follow along and learn something new.
1. Migration Guide
The organizational structure of the Write Client module has been refactored; see HUDI-554 for details. The client package now contains all transaction management classes, the func package has been renamed to execution, some helper classes have moved to client/utils, and all code previously under the io and compaction packages has moved to table/compact. Code related to rollback operations lives under table/rollback, and some general-purpose classes sit directly under the table package. These changes only affect users who depend on the hudi-client module; users of DeltaStreamer or the datasource are not affected and need no changes.
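For code that imports hudi-client classes directly, the migration is mostly a matter of updating import paths. The sketch below only illustrates the package renames; the class names are hypothetical, so consult HUDI-554 for the actual class-to-package mapping.

// Hypothetical imports, shown only to illustrate the package renames from HUDI-554.
// Before 0.5.2:
//   import org.apache.hudi.func.SomeExecutionHelper
//   import org.apache.hudi.io.compact.SomeCompactionHelper
// From 0.5.2 on:
//   import org.apache.hudi.execution.SomeExecutionHelper
//   import org.apache.hudi.table.compact.SomeCompactionHelper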
2. Key Features
It is now possible to specify the hoodie.compaction.payload.class configuration item in hoodie.properties to override the payload implementation; previously, once the payload class was set in hoodie.properties it could not be changed. In some cases, for example when the jar is updated after a code refactor, you may need to pass in a new payload implementation; if you have this requirement, try this feature.
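As a minimal sketch of setting this option from a Spark datasource write (Scala): the table name, columns, and path below are hypothetical, and OverwriteWithLatestAvroPayload is one of Hudi's built-in payload implementations, used here purely as an example.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("hudi-payload-class-sketch")
  .master("local[2]")
  .getOrCreate()
import spark.implicits._

// Tiny illustrative DataFrame; the column names are hypothetical.
val df = Seq(("id-1", 1L, "2020/03/01", "v1")).toDF("uuid", "ts", "partition", "value")

df.write
  .format("org.apache.hudi")
  .option("hoodie.table.name", "payload_demo")                        // hypothetical table name
  .option("hoodie.datasource.write.recordkey.field", "uuid")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.datasource.write.partitionpath.field", "partition")
  // New in 0.5.2: override the payload class recorded in hoodie.properties.
  .option("hoodie.compaction.payload.class",
    "org.apache.hudi.common.model.OverwriteWithLatestAvroPayload")
  .mode(SaveMode.Append)
  .save("/tmp/hudi/payload_demo")                                     // hypothetical path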
TimestampBasedKeyGenerator now supports CharSequence types. Previously, TimestampBasedKeyGenerator only supported partition fields of type Double, Long, Float, or String; it has now been extended to support any CharSequence partition field type.
Hudi now supports incremental pulls over specified partitions via the hoodie.datasource.read.incr.path.glob configuration item. In some scenarios, users only need to incrementally pull a subset of partitions, and loading only the relevant Parquet data files speeds up the pull.
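A minimal Spark (Scala) sketch of such an incremental read follows; the base path, begin instant, and glob pattern are hypothetical. Note that 0.5.x selects the incremental view through hoodie.datasource.view.type, which later releases renamed to hoodie.datasource.query.type, so verify the option names against the version you run.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hudi-incremental-glob-sketch")
  .master("local[2]")
  .getOrCreate()

val incrementalDf = spark.read
  .format("org.apache.hudi")
  // Incremental view, starting after a given commit instant (hypothetical value).
  .option("hoodie.datasource.view.type", "incremental")
  .option("hoodie.datasource.read.begin.instanttime", "20200123211217")
  // New in 0.5.2: only load files under partitions matching this glob (hypothetical pattern).
  .option("hoodie.datasource.read.incr.path.glob", "/2020/03/*")
  .load("/tmp/hudi/payload_demo")   // hypothetical base path

incrementalDf.show(false)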
Version 0.5.2 supports updating partition paths under the GLOBAL_BLOOM index. Previously, with the GLOBAL_BLOOM index, when an updated record arrived with a different partition path, Hudi ignored the new partition path and updated the record in the old partition. Hudi can now insert the record into the new partition and delete it from the old one. This behavior is enabled with the hoodie.bloom.index.update.partition.path=true configuration item.
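A hedged sketch of a write configured this way follows (Scala, Spark datasource); the table name, columns, and path are hypothetical, and the record is assumed to have previously been written under a different partition value.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("hudi-global-bloom-sketch")
  .master("local[2]")
  .getOrCreate()
import spark.implicits._

// An update for a record previously stored under partition "2020/03/01"; with the options
// below, Hudi 0.5.2 inserts it under "2020/04/01" and deletes it from the old partition.
val updates = Seq(("id-1", 2L, "2020/04/01", "v2")).toDF("uuid", "ts", "partition", "value")

updates.write
  .format("org.apache.hudi")
  .option("hoodie.table.name", "payload_demo")                        // hypothetical table name
  .option("hoodie.datasource.write.recordkey.field", "uuid")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.datasource.write.partitionpath.field", "partition")
  // Use the global bloom index and allow updates to move records to a new partition path.
  .option("hoodie.index.type", "GLOBAL_BLOOM")
  .option("hoodie.bloom.index.update.partition.path", "true")
  .mode(SaveMode.Append)
  .save("/tmp/hudi/payload_demo")                                     // hypothetical path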
Version 0.5.2 supports fetching schemas through JDBC via the new JdbcbasedSchemaProvider. This is useful for users who want to ingest data from MySQL and fetch the schema directly from the database.
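For DeltaStreamer users, the schema provider is driven by properties. The Scala sketch below only illustrates the kind of settings involved: the connection details are hypothetical, the exact property keys should be verified against the Hudi docs for your version, and in practice these values normally go into the properties file passed to HoodieDeltaStreamer rather than being built in code.

import java.util.Properties

// Illustrative JdbcbasedSchemaProvider settings; property keys follow the 0.5.2 release notes,
// while all connection values below are hypothetical.
val schemaProviderProps = new Properties()
schemaProviderProps.setProperty(
  "hoodie.deltastreamer.schemaprovider.source.schema.jdbc.connection.url",
  "jdbc:mysql://localhost:3306/test")                                               // hypothetical
schemaProviderProps.setProperty(
  "hoodie.deltastreamer.schemaprovider.source.schema.jdbc.driver.type", "com.mysql.jdbc.Driver")
schemaProviderProps.setProperty(
  "hoodie.deltastreamer.schemaprovider.source.schema.jdbc.username", "test_user")   // hypothetical
schemaProviderProps.setProperty(
  "hoodie.deltastreamer.schemaprovider.source.schema.jdbc.password", "test_pass")   // hypothetical
schemaProviderProps.setProperty(
  "hoodie.deltastreamer.schemaprovider.source.schema.jdbc.dbtable", "test_table")   // hypothetical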
Version 0.5.2 removes the 2GB size limitation around HoodieBloomIndex. Before Spark 2.4.0, each Spark partition had a 2GB size limit; Hudi 0.5.1 upgraded Spark to 2.4.4, so the limit no longer applies, and the safe-parallelism calculation logic in HoodieBloomIndex has been removed.
CLI-related changes
Users can specify configuration items to print additional commit metadata, such as the total number of log blocks, the total number of rollback blocks, and totals for compacted and updated records.
temp_query and temp_delete are supported for querying and deleting temporary views. These commands create a temporary table that users can query with HiveQL, for example:
temp_query --sql "select Instant, NumInserts, NumWrites from satishkotha_debug where FileId='ed33bd99-466f-4417-bd92-5d914fa58a8f' and Instant > '20200123211217' order by Instant"
That concludes this walkthrough of the features in Apache Hudi 0.5.2; to really master them, try them out in practice yourself.