Shulou
2025-01-19 Update. From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
How do you use Spark Streaming? This article analyzes the question in detail, in the hope of helping readers who face it find a simpler, more practical approach.
1, basic use
Basic use mainly involves transformation operators, output operations (actions) and stateful operators. In practice, you write the business code by consulting the API documentation or the interface descriptions in the source code.
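To make the split between transformations and output operations concrete, here is a plain-Python simulation of a word count over micro-batches. It is only a sketch: it does not use the Spark API and nothing is distributed. The comments name the DStream operators (flatMap, map, reduceByKey, foreachRDD) that each step imitates, and the function names are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def transform(lines: Iterable[str]) -> Dict[str, int]:
    """Transformations: split lines into words (like flatMap), pair each
    word with 1 (like map) and sum the pairs per word (like reduceByKey)."""
    words = (word for line in lines for word in line.split())
    return dict(Counter(words))


def run_stream(batches: List[List[str]],
               output: Callable[[int, Dict[str, int]], None]) -> None:
    """Process each micro-batch in arrival order. `output` plays the role of
    an output operation such as foreachRDD: it receives each batch's result."""
    for batch_id, lines in enumerate(batches):
        output(batch_id, transform(lines))


sink: List = []
run_stream([["a b a"], ["b"]], lambda i, counts: sink.append((i, counts)))
print(sink)
```

In Spark, transformations are lazy and only an output operation triggers the job for each batch; the simulation keeps the same shape by computing each batch's result only when it is handed to `output`.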
To use Spark Streaming well, it is essential to understand the principles behind Spark Core, Spark RPC, Spark task scheduling and Spark parallelism.
2, intermediate state cache
When it comes to intermediate state, stateful operators such as updateStateByKey come to mind first. They involve many considerations, such as ordering and maintaining a timeout mechanism for keys. This approach suits small data volumes, especially when there are few distinct keys and the values are small.
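The semantics of updateStateByKey, including a hand-rolled key timeout, can be sketched in plain Python. This is a simulation under stated assumptions, not Spark code: the update function mirrors the shape of the real one (new values for a key plus its previous state in, new state out, with "no state" dropping the key), while the timeout rule, the (count, last_batch) state layout and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical state per key: (running count, index of last batch that saw the key).
State = Tuple[int, int]


def update(new_values: List[int], old: Optional[State],
           batch: int, timeout: int) -> Optional[State]:
    """Like an updateStateByKey update function: takes this batch's values
    for a key and its previous state; returning None removes the key."""
    if new_values:
        count = (old[0] if old else 0) + sum(new_values)
        return (count, batch)
    if old is not None and batch - old[1] > timeout:
        return None  # idle for more than `timeout` batches: expire the key
    return old


def step(state: Dict[str, State], pairs: List[Tuple[str, int]],
         batch: int, timeout: int) -> Dict[str, State]:
    """Run one batch. Every key with existing state OR new data is visited,
    which is why this gets expensive when the key space is large."""
    grouped: Dict[str, List[int]] = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    new_state: Dict[str, State] = {}
    for key in set(state) | set(grouped):
        result = update(grouped.get(key, []), state.get(key), batch, timeout)
        if result is not None:
            new_state[key] = result
    return new_state
```

Because the values of one batch are summed, their order inside the batch does not matter here; an update function that depends on order would have to sort them itself.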
What if the data volume grows and you still need to maintain intermediate state? At that point you have to use third-party storage such as Redis or Alluxio. Redis suits keys that need a timeout mechanism, provided the data volume is not too large, while Alluxio suits high-throughput cases such as deduplicated counting.
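Deduplication with expiring keys in Redis is commonly done with SET using the NX and EX options (set only if absent, with a TTL in seconds); with the redis-py client that is `r.set(key, 1, nx=True, ex=ttl)`, which returns True only for the first writer. The sketch below swaps Redis for a hypothetical in-memory stand-in so it can run on its own; the class, its method and the injectable clock are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Iterable


class TtlDedupStore:
    """Hypothetical in-memory stand-in for Redis `SET key 1 NX EX ttl`:
    a key can be set only if it is absent or expired, so the first sighting wins.
    Expired entries are overwritten lazily and never purged (Redis reclaims them)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._expiry: Dict[str, float] = {}

    def set_nx_ex(self, key: str) -> bool:
        now = self.clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False  # key is still live: this event is a duplicate
        self._expiry[key] = now + self.ttl
        return True


def count_unique(store: TtlDedupStore, event_ids: Iterable[str]) -> int:
    """Count events whose id has not been seen within the TTL window."""
    return sum(1 for event_id in event_ids if store.set_nx_ex(event_id))
```

In a streaming job the store would be shared across executors, which is exactly why an external system such as Redis is needed once the state no longer fits comfortably in Spark's own state.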
3, result output
The direct stream approach can guarantee that each record is processed exactly once, but exactly-once output also requires that the output store be idempotent, or that you actively write results as upserts (update the row if it exists, insert it otherwise) so that a replayed batch does not create duplicates. Better still, if the external storage system supports transactions, results can be written in a transaction to achieve exactly-once output.
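Both output strategies can be sketched with Python's built-in sqlite3 module standing in for the external store. This is an illustration under assumptions: the table names, the per-partition `until_offset` bookkeeping and the function names are hypothetical, and in a real job the offset ranges would come from the Kafka direct stream. The first function is idempotent because it overwrites absolute totals; the second applies increments and records the batch's end offset in the same transaction, so a replayed batch is detected and skipped.

```python
import sqlite3
from typing import Dict


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS totals "
                 "(word TEXT PRIMARY KEY, cnt INTEGER NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS offsets ("
                 "topic TEXT NOT NULL, part INTEGER NOT NULL, "
                 "until_offset INTEGER NOT NULL, PRIMARY KEY (topic, part))")
    return conn


def upsert_totals(conn: sqlite3.Connection, totals: Dict[str, int]) -> None:
    """Idempotent output: overwrite absolute totals, so writing the same
    batch twice leaves the table unchanged."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO totals (word, cnt) VALUES (?, ?)",
                         totals.items())


def commit_batch(conn: sqlite3.Connection, topic: str, part: int,
                 until_offset: int, increments: Dict[str, int]) -> bool:
    """Transactional output: increments and the batch's end offset commit
    together (or not at all). Assumes a single writer per partition."""
    with conn:  # commits on success, rolls back if anything raises
        row = conn.execute("SELECT until_offset FROM offsets "
                           "WHERE topic = ? AND part = ?", (topic, part)).fetchone()
        if row is not None and row[0] >= until_offset:
            return False  # this batch was already committed: a replay
        for word, n in increments.items():
            conn.execute("INSERT OR IGNORE INTO totals (word, cnt) VALUES (?, 0)", (word,))
            conn.execute("UPDATE totals SET cnt = cnt + ? WHERE word = ?", (n, word))
        conn.execute("INSERT OR REPLACE INTO offsets (topic, part, until_offset) "
                     "VALUES (?, ?, ?)", (topic, part, until_offset))
    return True
```

The increments alone are not idempotent; it is storing the offset in the same transaction that makes a retry after a failure safe.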
Offset management, meanwhile, differs considerably across combinations of Spark Streaming and Kafka versions.
4, Monitoring, alerting and automatic fault recovery
In my view, monitoring, alerting and automatic fault recovery matter no less than the business logic itself. However well the business code is written, you cannot predict when the system will go down, and you cannot watch it 24 hours a day. Many companies also set KPIs for automatic fault recovery, for example recovery within 3 minutes, which is impossible to meet by detecting and fixing failures manually, so they need to build their own monitoring system.
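One common shape for such a monitoring system is a watchdog that alerts and restarts the job when batches stop completing. The sketch below is hypothetical throughout: the thresholds, the restart budget and the `alert`/`restart` callbacks are assumptions. In a real job, the batch-completion signal could come from a StreamingListener's onBatchCompleted callback, and the restart would resubmit the application through your cluster manager.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Watchdog:
    """Hypothetical watchdog: if no batch has completed for `stall_seconds`,
    alert and restart the job, at most `max_restarts` times, then escalate."""
    stall_seconds: float
    max_restarts: int
    alert: Callable[[str], None]
    restart: Callable[[], None]
    last_batch_at: float = 0.0
    restarts: int = 0

    def on_batch_completed(self, now: float) -> None:
        self.last_batch_at = now

    def check(self, now: float) -> str:
        stalled_for = now - self.last_batch_at
        if stalled_for <= self.stall_seconds:
            return "ok"
        if self.restarts >= self.max_restarts:
            self.alert(f"job stalled {stalled_for:.0f}s, restart budget exhausted")
            return "escalated"  # hand over to a human
        self.restarts += 1
        self.alert(f"job stalled {stalled_for:.0f}s, restarting (attempt {self.restarts})")
        self.restart()
        self.last_batch_at = now  # give the restarted job a fresh window
        return "restarted"
```

With a 3-minute recovery target, `check` would have to run far more often than the stall threshold, and the threshold itself has to be short enough to leave time for the restart.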
5, tuning
Tuning is very important for Spark Streaming, because when one batch is processed late, subsequent jobs pile up, causing output delays and even data loss. Tuning depends above all on a command of Spark internals, an understanding of the data volume, and the relationship between resources and data.
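Why a slow batch causes job accumulation can be seen with a little arithmetic. The sketch below assumes batches are processed one after another, which is Spark Streaming's default, and computes each batch's scheduling delay (the time it waits before starting, one of the metrics in the Streaming tab of the Spark UI). The function name is hypothetical.

```python
from typing import List


def scheduling_delays(batch_interval: float,
                      processing_times: List[float]) -> List[float]:
    """Batch i is ready at i * batch_interval but cannot start until the
    previous batch has finished; return how long each batch waited."""
    delays = []
    previous_finish = 0.0
    for i, processing in enumerate(processing_times):
        ready = i * batch_interval
        start = max(ready, previous_finish)
        delays.append(start - ready)
        previous_finish = start + processing
    return delays


print(scheduling_delays(10, [12, 12, 12, 12]))
print(scheduling_delays(10, [8, 8, 8]))
```

With a 10 s interval, a 12 s processing time adds 2 s of delay per batch with no upper bound, while 8 s keeps the delay at zero; the goal of tuning is to keep processing time below the batch interval.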
6, source code
Reading the source code helps you understand the principles more thoroughly. It will be covered in three parts:
Spark Streaming with kafka-0.8.2: direct stream.
Spark Streaming with kafka-0.8.2: receiver-based stream.
Spark Streaming with kafka-0.10.2: direct API.
That is the answer to how to use Spark Streaming. I hope the content above is of some help to you.
© 2024 shulou.com SLNews company. All rights reserved.