Today I would like to talk with you about Filter, Project, and Pushdowns through a case analysis. Many people may not know much about these techniques, so the following summary is meant to help you understand them better; I hope you can take something away from this article.
1. Filter (filtering) and Project (mapping)
In traditional OLAP systems, applying filtering and projection before a Join greatly improves performance. Likewise, using Filter and Project in Hadoop improves efficiency by reducing the amount of data a pipeline needs to process. Reducing the amount of data processed is critical in Hadoop, especially when that data has to travel over the network and through local disks. As we all know, MapReduce's shuffle phase writes data to disk and transfers it over the network, so less data means less work for the job and for the MapReduce framework, faster data transfer, and less pressure on CPUs, disks, and network devices.
Use filters and projections to reduce data size
Filters and projections should be executed as close to the data source as possible; in MapReduce, this is best done in the mapper. The following code excludes users under the age of 30 and emits only their names and states:
public static class JoinMap extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text value, Context context)
            throws IOException, InterruptedException {
        User user = User.fromText(value);
        // Filter: drop users under the age of 30
        if (user.getAge() >= 30) {
            // Project: keep only the name and state fields
            context.write(new Text(user.getName()), new Text(user.getState()));
        }
    }
}
When using filters in a Join, note that not every joined dataset necessarily contains the fields you need to filter on. In that case, you can use the Bloom filter technique instead.
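The article does not show the Bloom filter variant, so the following is only a minimal sketch in the same mapper style. It assumes a filter that was built over the smaller dataset's join keys and is deserialized in setup() by a made-up loadBloomFilter() helper (for example from the distributed cache), and it uses Hadoop's built-in org.apache.hadoop.util.bloom.BloomFilter and Key classes:

public static class BloomJoinMap extends Mapper<LongWritable, Text, Text, Text> {
    private BloomFilter userFilter;

    @Override
    protected void setup(Context context) throws IOException {
        // Hypothetical helper: deserialize a Bloom filter trained on the
        // join keys of the smaller dataset and shipped with the job.
        userFilter = loadBloomFilter(context.getConfiguration());
    }

    @Override
    protected void map(LongWritable offset, Text value, Context context)
            throws IOException, InterruptedException {
        User user = User.fromText(value);
        // Emit only records whose join key might exist on the other side;
        // false positives are possible, so the reducer still performs the real join.
        if (userFilter.membershipTest(new Key(user.getName().getBytes()))) {
            context.write(new Text(user.getName()), new Text(user.getState()));
        }
    }
}

The point is the same as with a plain filter: records that cannot possibly join are discarded map-side, before they ever reach the shuffle.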
2. Pushdowns
Predicate pushdown is a logical optimization: the optimizer pushes predicate filtering down to the data source, so that physical execution can skip irrelevant data. With Parquet, whole blocks of a file are more likely to be skipped, and dictionary encoding lets the system turn string comparisons into cheaper integer comparisons. With a relational database, predicates are pushed down into the external database to reduce data transfer. Logically, predicate pushdown can be understood as applying the filtering conditions of the WHERE clause as early as possible, discarding useless data so that only the desired rows and columns are returned.
Projection and predicate pushdown go a step further by pushing both the column selection and the filter predicates into the storage format itself. For a format like Parquet, whole records or whole blocks can be skipped outright, which greatly improves job performance and avoids unnecessary overhead.
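The article stops at the description, so purely as an illustration, here is a minimal sketch of registering such a predicate with the parquet-mr FilterApi on a MapReduce job; the int column name "age" and the job wiring are assumptions, not something from the original case:

import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.filter2.predicate.FilterApi;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetInputFormat;

public class ParquetPushdownExample {
    public static void configure(Job job) {
        // A predicate equivalent to "age >= 30". Parquet checks it against
        // row-group and page statistics, so blocks that cannot match are
        // skipped without being read or decoded.
        FilterPredicate atLeast30 = FilterApi.gtEq(FilterApi.intColumn("age"), 30);
        ParquetInputFormat.setFilterPredicate(job.getConfiguration(), atLeast30);
        // Column projection can be pushed down in the same spirit by
        // requesting a reduced read schema, so that only the needed columns
        // (for example name and state) are ever decoded.
    }
}

With plain MapReduce this is opt-in wiring; query engines such as Hive and Spark typically apply the equivalent pushdown automatically when they plan a query over Parquet.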
It is important to note here that, by contrast, Avro is a row-based storage format, so this kind of column-level skipping does not apply to it.
What you need to know:
1. For an Inner Join, Hive only supports equi-joins, not non-equi joins, because non-equality join conditions are awkward to translate into a MapReduce job.
2. Although Hive does not support non-equi joins directly, the same effect can still be achieved with a Cross Join plus a WHERE condition (for example, SELECT * FROM a CROSS JOIN b WHERE a.id < b.id). A Cross Join occurs in the following situations:
The CROSS JOIN keyword is used explicitly
A JOIN keyword appears with no ON condition
A JOIN keyword appears with an ON condition that is always true (for example, ON 1 = 1)
After reading the above, do you have a better understanding of Filter, Project, and Pushdowns? Thank you for reading, and I hope this case analysis gave you something to take away.