Practical project case: query and analysis of Sogou logs
Data: Sogou query log (SogouQ1.txt)
I. The overall architecture of the e-commerce big data platform
1. Big data (Hadoop, Spark, Hive) is one way to implement a data warehouse.
Core problems: data storage and data computing.
What is a data warehouse? Traditionally it is implemented on a database and is usually used only for queries.
2. Overall deployment of the big data platform: Apache, Ambari (HDP), CDH
II. Use the waterfall model in the project (software engineering methodology)
1. How many stages does the waterfall model have?
2. The tasks completed in each stage
III. Use MapReduce for analysis and processing (Java program)
1. The basic principle of MapReduce (programming model)
(*) Origin of the idea: Google's MapReduce paper; the motivating problem was PageRank (page ranking)
(*) Split and then merge -> distributed computing (illustrated in the sketch after this list)
2. Use MapReduce for log analysis
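As a rough illustration of the split-then-merge idea, here is a minimal local Scala sketch of the map -> shuffle -> reduce flow, counting search keywords. It is only a sketch of the programming model: the real project would run this as a Hadoop MapReduce job in Java, and the sample keywords below are invented for illustration.

object MapReduceSketch {
  def main(args: Array[String]): Unit = {
    // Pretend each inner Seq is an input split handed to a separate mapper.
    val splits = Seq(
      Seq("free movies", "weather", "free movies"),
      Seq("weather", "train tickets")
    )
    // Map phase: every record is turned into a (key, 1) pair.
    val mapped = splits.flatMap(split => split.map(keyword => (keyword, 1)))
    // Shuffle phase: pairs with the same key are grouped together.
    val grouped = mapped.groupBy(_._1)
    // Reduce phase: the values of each group are merged (summed).
    val reduced = grouped.map { case (keyword, pairs) => (keyword, pairs.map(_._2).sum) }
    reduced.foreach(println)  // e.g. (free movies,2), (weather,2), (train tickets,1)
  }
}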
IV. Use Spark for analysis and processing (Scala or Java)
1. The advantages and architecture of Spark
2. Use Scala to develop a Spark task for log analysis
bin/spark-shell --master spark://bigdata11:7077
val rdd1 = sc.textFile("hdfs://mydemo71:8020/myproject/data/SogouQ1.txt")
// Keep only records that split into exactly 6 fields
val rdd2 = rdd1.map(_.split("\t")).filter(_.length == 6)
rdd2.count()
// Records where field 3 (no1) equals 1 and field 4 (clickid) equals 2
val rdd3 = rdd2.filter(_(3).toInt == 1).filter(_(4).toInt == 2)
rdd3.count()
rdd3.take(3)
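A possible extension, not part of the original write-up: the same cleaned RDD can be used to rank the ten most frequent search keywords. This sketch assumes it runs in the same spark-shell session and that field index 2 is the keyword column of SogouQ1.txt.

// Hypothetical extension: top 10 search keywords in the cleaned data
val topKeywords = rdd2.map(fields => (fields(2), 1))
                      .reduceByKey(_ + _)
                      .sortBy(_._2, ascending = false)
                      .take(10)
topKeywords.foreach(println)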
V. Use Hive for analysis and processing
1. What is Hive? Its characteristics and architecture
It is a data warehouse built on top of HDFS.
It supports SQL statements.
It is a translator: SQL -> MapReduce (or Spark) tasks.
2. Use Hive for query operations
① Create the table (fields separated by ','):
create table sogoulog (accesstime string, useID string, keyword string, no1 int, clickid int, url string) row format delimited fields terminated by ',';
② Clean the raw data with Spark, because some lines do not split into 6 fields, and turn each record into a comma-separated string (an optional check on the cleaned output is sketched after these steps):
val rdd1 = sc.textFile("hdfs://mydemo71:8020/myproject/data/SogouQ1.txt")
val rdd2 = rdd1.map(_.split("\t")).filter(_.length == 6)
val rdd3 = rdd2.map(x => x.mkString(","))
rdd3.saveAsTextFile("hdfs://mydemo71:8020/myproject/cleandata/sogou")
③ Load the cleaned data into Hive:
load data inpath '/myproject/cleandata/sogou/part-00000' into table sogoulog;
load data inpath '/myproject/cleandata/sogou/part-00001' into table sogoulog;
④ Use SQL to query the qualifying data (show only the first 10 rows):
select * from sogoulog where no1=1 and clickid=2 limit 10;
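As an optional sanity check (my addition, not one of the original steps), the rows in the cleaned HDFS output can be counted from the same spark-shell session before loading them into Hive; the count should match rdd2.count() from step ②.

// Optional sanity check, assuming the paths used above
val cleaned = sc.textFile("hdfs://mydemo71:8020/myproject/cleandata/sogou")
println(s"cleaned rows: ${cleaned.count()}")  // expected to equal rdd2.count()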