This lesson covers the following:
Analysis of online blacklist filtering
Implementation of online blacklist filtering with Spark Streaming
An advertising billing system is an indispensable part of e-commerce. To prevent malicious ad clicks (suppose merchants A and B advertise on the same e-commerce platform and are competitors; if A uses a click robot to click B's ads maliciously, B's advertising budget will quickly be exhausted), ad clicks must be screened against a blacklist.
You can use leftOuterJoin to join the incoming click data with the blacklist data and then filter out the records that hit the blacklist.
This article mainly introduces the use of the transform function of DStream.
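Before the full streaming program, here is a minimal sketch of the join-and-filter idea on plain RDDs. It is an illustration only: the object name LeftOuterJoinSketch and the sample data are assumptions, not part of the program below.

import org.apache.spark.{SparkConf, SparkContext}

object LeftOuterJoinSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("LeftOuterJoinSketch").setMaster("local"))
    // Clicks as (name, record) pairs; the blacklist as (name, true) pairs.
    val clicksRDD = sc.parallelize(Seq(("hadoop", "124321 hadoop"), ("spark", "22555 spark")))
    val blackListRDD = sc.parallelize(Seq(("hadoop", true)))
    // leftOuterJoin keeps every click and attaches an Option[Boolean] that tells
    // us whether the name was found in the blacklist.
    val valid = clicksRDD.leftOuterJoin(blackListRDD)
      .filter(item => !item._2._2.getOrElse(false)) // drop clicks that hit the blacklist
      .map(item => item._2._1)                      // keep only the original click record
    valid.collect().foreach(println)                // prints: 22555 spark
    sc.stop()
  }
}

In the streaming program, the same RDD-level logic is applied to each batch of the DStream through transform, which takes a function from RDD to RDD.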
Spark Streaming code implementation
package com.dt.spark.sparkapps.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Online blacklist filtering program, developed in Scala and run on a Spark cluster.
 * Created by Limaoran on 2016-5-2.
 * Sina Weibo: http://weibo.com/ilovepains/
 *
 * Background: in an ad-click billing system, clicks from blacklisted users are
 * filtered out online to protect advertisers' interests, so that only valid ad
 * clicks are billed. The same pattern filters out invalid votes, ratings, or
 * traffic in anti-fraud scoring (or traffic) systems.
 * Implementation: use the transform API to program directly against the
 * underlying RDDs and perform a join operation.
 */
object OnlineBlackListFilter {
  def main(args: Array[String]) {
    /*
     * Step 1: create the Spark configuration object SparkConf and set the runtime
     * configuration of the Spark program. For example, setMaster sets the URL of
     * the Master of the Spark cluster the program connects to. Setting it to
     * "local" runs the program locally, which is especially suitable for beginners
     * on machines with very limited resources (for example, only 1 GB of memory).
     */
    val conf = new SparkConf()               // create the SparkConf object
    conf.setAppName("OnlineBlackListFilter") // the application name shown in the monitoring UI
    conf.setMaster("spark://Master:7077")    // run the program on the Spark cluster

    val ssc = new StreamingContext(conf, Seconds(30))

    /*
     * The blacklist data is prepared here in advance. In practice the blacklist
     * is usually dynamic, for example kept in Redis or a database, and generating
     * it often involves complex business logic that differs case by case. During
     * Spark Streaming processing, the workers can access the complete blacklist
     * for every batch.
     */
    val blackList = Array(("hadoop", true), ("mahout", true))
    val blackListRDD = ssc.sparkContext.parallelize(blackList, 8)

    val adsClickStream = ssc.socketTextStream("Master", 9999)

    /*
     * Each simulated ad-click record has the format "time name". The map
     * operation below produces pairs in the format (name, (time, name)).
     */
    val adsClientStreamFormatted = adsClickStream.map(ads => (ads.split(" ")(1), ads))

    adsClientStreamFormatted.transform(userClickRDD => {
      /*
       * The leftOuterJoin keeps all of the user-click RDD on the left while also
       * telling us, for each record, whether the clicked name is in the blacklist,
       * so that blacklisted records can then be dropped by filter.
       * Each joined element is a Tuple: (name, ((time, name), boolean)), where the
       * first element is the name and the second indicates whether a blacklist
       * value was found during the leftOuterJoin. If it was, the current ad click
       * is from the blacklist and must be filtered out; otherwise it is a valid click.
       */
      val joinedBlackListRDD = userClickRDD.leftOuterJoin(blackListRDD)
      val validClicked = joinedBlackListRDD.filter(joinedItem => {
        if (joinedItem._2._2.getOrElse(false)) {
          false
        } else {
          true
        }
      })
      validClicked.map(validClick => {
        validClick._2._1
      })
    }).print()

    /*
     * The valid data computed here is generally written to Kafka; the downstream
     * billing system pulls the valid data from Kafka for billing.
     */
    ssc.start()
    ssc.awaitTermination()
  }
}
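The closing comment notes that in practice the valid clicks are written to Kafka instead of printed, and the downstream billing system consumes them from there. A hedged sketch of that step using the kafka-clients producer API follows; the topic name validClicks, the broker address Master:9092, and the helper name writeToKafka are assumptions, not from the original program.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.dstream.DStream

// Replace .print() in the program above with a call such as writeToKafka(validClicks).
def writeToKafka(validClicks: DStream[String]): Unit = {
  validClicks.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // One producer per partition; a production job would pool or broadcast it.
      val props = new Properties()
      props.put("bootstrap.servers", "Master:9092") // assumed broker address
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      val producer = new KafkaProducer[String, String](props)
      partition.foreach(record => producer.send(new ProducerRecord[String, String]("validClicks", record)))
      producer.close()
    }
  }
}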
Package the program and upload it to the Spark cluster.
On the spark-master node, start nc:
root@spark-master:~# nc -lk 9999
Run the OnlineBlackListFilter program:
root@spark-master:~# /usr/local/spark-1.6.0/bin/spark-submit --class com.dt.spark.sparkapps.streaming.OnlineBlackListFilter --master spark://Master:7077 ./sparkApps.jar
Enter data on the nc side:
root@spark-master:~# nc -lk 9999
22555 spark
124321 hadoop
5555 Flink
6666 HDFS
2222 Kafka
572231 Java
66662 mahout
Output of the Spark Streaming run:
16-05-02 08:28:00 INFO MapPartitionsRDD: Removing RDD 8 from persistence list
---
5555 Flink
6666 HDFS
572231 Java
22555 spark
2222 Kafka
As the output shows, hadoop and mahout, the names on the blacklist, have been filtered out.
More complex business logic rules can be added on top of this program to meet enterprise needs.
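For example, as the comment in the program notes, a real blacklist is usually dynamic (kept in Redis or a database). Because the function passed to transform runs on the driver for every batch, one simple way to pick up changes is to reload the blacklist at the start of each batch. A hedged sketch follows, reloading from a text file with one blacklisted name per line; the path /data/blacklist.txt is an assumption.

// Fragment replacing the fixed blackListRDD in the program above.
adsClientStreamFormatted.transform(userClickRDD => {
  // Reload the blacklist for every 30-second batch so that updates take
  // effect without restarting the application.
  val dynamicBlackListRDD = ssc.sparkContext
    .textFile("/data/blacklist.txt")      // assumed blacklist location
    .map(name => (name.trim, true))
  userClickRDD.leftOuterJoin(dynamicBlackListRDD)
    .filter(item => !item._2._2.getOrElse(false))
    .map(item => item._2._1)
}).print()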
Note:
1. DT Big Data DreamWorks WeChat official account: DT_Spark
2. IMF big data hands-on YY live broadcast (8:00 p.m.), channel number: 68917580
3. Sina Weibo: http://www.weibo.com/ilovepains