The Idea Behind MapReduce

Updated 2025-03-17. From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/03

Practical project case: querying and analyzing Sogou search logs

Data:

I. Overall structure of the e-commerce big data platform

1. Big data technology (Hadoop, Spark, Hive) is one way to implement a data warehouse.

Core issues: data storage and data computation.

What is a data warehouse? It is the traditional way of handling large volumes of data: essentially a database that is usually used only for queries.

2. Deployment options for the overall big data platform architecture: Apache, Ambari (HDP), CDH

II. Using the waterfall model in the project (software engineering methodology)

1. How many stages does the waterfall model have?

2. The tasks completed in each stage

III. Analysis and processing with MapReduce (Java program)

1. The basic principle of MapReduce (programming model)

(1) Origin of the idea: Google's MapReduce paper. Problem: PageRank (web page ranking).

(2) Split first, then merge -> distributed computing

2. Use MapReduce for log analysis
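
The "split first, then merge" idea can be sketched without Hadoop, using plain Scala collections. This is only an illustration of the programming model, not the Java MapReduce job the outline refers to: the object name `MapReduceSketch`, its helper functions, and the sample log lines are all invented here, and the keyword is assumed to be the third tab-separated field, as in the `sogoulog` table layout used later.

```scala
// Plain-Scala sketch of the MapReduce model: map -> shuffle (group by key) -> reduce.
// No Hadoop involved; all names and sample lines are hypothetical.
object MapReduceSketch {
  // Map phase: turn one tab-separated log line into (keyword, 1).
  // Lines that do not have exactly 6 fields emit nothing.
  def mapPhase(line: String): Seq[(String, Int)] = {
    val fields = line.split("\t")
    if (fields.length == 6) Seq((fields(2), 1)) else Seq.empty
  }

  // Shuffle phase: collect all emitted values under their key.
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

  // Reduce phase: merge the values of one key into a single result.
  def reducePhase(key: String, values: Seq[Int]): (String, Int) = (key, values.sum)

  // Each inner Seq stands for one input split handled by one mapper.
  def run(splits: Seq[Seq[String]]): Map[String, Int] = {
    val mapped = splits.flatMap(split => split.flatMap(mapPhase))
    shuffle(mapped).map { case (k, vs) => reducePhase(k, vs) }
  }

  def main(args: Array[String]): Unit = {
    val splits = Seq(
      Seq("t1\tu1\thadoop\t1\t2\thttp://a.example", "t2\tu2\tspark\t1\t1\thttp://b.example"),
      Seq("t3\tu3\thadoop\t2\t1\thttp://c.example", "broken line")
    )
    // Prints (hadoop,2) and (spark,1)
    run(splits).toSeq.sortBy(_._1).foreach(println)
  }
}
```

In a real cluster the splits are processed on different machines and the shuffle moves data over the network; here everything runs in one process, which keeps the three phases easy to see.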

IV. Analysis and processing with Spark (Scala, Java)

1. The advantages and architecture of Spark

2. Use Scala to develop Spark tasks for log analysis

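    bin/spark-shell --master spark://bigdata11:7077

    // Read the raw log from HDFS
    val rdd1 = sc.textFile("hdfs://mydemo71:8020/myproject/data/SogouQ1.txt")
    // Split each line on tabs and keep only records with exactly 6 fields
    val rdd2 = rdd1.map(_.split("\t")).filter(_.length == 6)
    rdd2.count()
    // Keep records whose 4th field is 1 and whose 5th field is 2
    val rdd3 = rdd2.filter(_(3).toInt == 1).filter(_(4).toInt == 2)
    rdd3.count()
    rdd3.take(3)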
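
The same cleaning and filtering logic can be tried locally on ordinary Scala collections before running it on the cluster. The sketch below is an assumption-laden illustration, not the article's code: `SogouFilterSketch`, its function names, and the sample lines are invented, and it wraps `.toInt` in `scala.util.Try` so that a non-numeric field is skipped instead of throwing a `NumberFormatException`.

```scala
import scala.util.Try

// Local stand-in for the Spark shell pipeline; names and sample data are hypothetical.
object SogouFilterSketch {
  // Cleaning step: split on tabs and keep only records with exactly 6 fields.
  def clean(lines: Seq[String]): Seq[Array[String]] =
    lines.map(_.split("\t")).filter(_.length == 6)

  // Keep records whose 4th field is 1 and whose 5th field is 2.
  // Try(...).toOption turns an unparsable field into None, so the record is dropped.
  def firstRankSecondClick(records: Seq[Array[String]]): Seq[Array[String]] =
    records.filter { r =>
      Try(r(3).toInt).toOption.contains(1) && Try(r(4).toInt).toOption.contains(2)
    }

  def main(args: Array[String]): Unit = {
    // Invented sample lines; the real input is SogouQ1.txt on HDFS.
    val lines = Seq(
      "00:00:01\tuser1\thadoop\t1\t2\thttp://a.example",
      "00:00:02\tuser2\tspark\t1\t1\thttp://b.example",
      "00:00:03\tuser3\thive\tx\t2\thttp://c.example",
      "truncated\tline"
    )
    val cleaned = clean(lines)
    val matched = firstRankSecondClick(cleaned)
    println(cleaned.size) // 3
    println(matched.size) // 1
    matched.foreach(r => println(r.mkString(",")))
  }
}
```

The two functions take and return plain `Seq` values, so the bodies of the lambdas can be reused unchanged inside `rdd.map` and `rdd.filter` if the guarded parsing is wanted on the cluster as well.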

V. Analysis and processing with Hive

1. What is Hive? Its characteristics and architecture

It is a data warehouse built on HDFS.

It supports SQL statements.

It is a translator: SQL -> MapReduce (or Spark) jobs

2. Use Hive for query operations

![](https://cache.yisu.com/upload/information/20200310/72/153260.jpg?x-oss-process=image/watermark,size_16,text_QDUxQ1RP5Y2a5a6i,color_FFFFFF,t_100,g_se,x_10,y_10,shadow_90,type_ZmFuZ3poZW5naGVpdGk=)

① Create the corresponding table in Hive:

    create table sogoulog (accesstime string, useID string, keyword string, no1 int, clickid int, url string)
    row format delimited fields terminated by ',';

② Clean the raw data, because some records do not have 6 fields:

    val rdd1 = sc.textFile("hdfs://mydemo71:8020/myproject/data/SogouQ1.txt")
    val rdd2 = rdd1.map(_.split("\t")).filter(_.length == 6)
    // Note: each record must be converted to a comma-separated string before saving
    val rdd3 = rdd2.map(x => x.mkString(","))
    rdd3.saveAsTextFile("hdfs://mydemo71:8020/myproject/cleandata/sogou")

③ Import the cleaned data into Hive:

    load data inpath '/myproject/cleandata/sogou/part-00000' into table sogoulog;
    load data inpath '/myproject/cleandata/sogou/part-00001' into table sogoulog;

④ Use SQL to query the qualifying data (only the first 10 rows are displayed):

    select * from sogoulog where no1=1 and clickid=2 limit 10;

Similar query: the employees in department 10 whose salary is greater than 2000.
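
One thing worth checking in the cleaning step is that joining fields with `,` is only safe when no field itself contains a comma, since the `sogoulog` table also splits rows on `,`. The sketch below illustrates this with plain Scala; `HiveRowSketch` and its helpers are invented names, the sample records are made up, and `columns` is only a rough stand-in for delimiter-based splitting, not Hive's actual parsing code.

```scala
// Hypothetical helper showing the round trip: fields -> comma-joined row -> columns.
object HiveRowSketch {
  // Step ②: join the six fields into one comma-separated line.
  def toRow(fields: Array[String]): String = fields.mkString(",")

  // Assumption: approximate a comma-delimited table by splitting on ','.
  // The -1 limit keeps trailing empty columns.
  def columns(row: String): Array[String] = row.split(",", -1)

  // A record round-trips cleanly only if no field contains the delimiter.
  def isSafe(fields: Array[String]): Boolean = !fields.exists(_.contains(","))

  def main(args: Array[String]): Unit = {
    val ok  = Array("00:00:01", "user1", "hadoop", "1", "2", "http://a.example")
    val bad = Array("00:00:02", "user2", "spark,hive", "1", "2", "http://b.example")
    println(columns(toRow(ok)).length)  // 6
    println(columns(toRow(bad)).length) // 7
    println(Seq(ok, bad).count(isSafe)) // 1
  }
}
```

If such records turn up in the real data, two options under this assumption are to filter them out with a check like `isSafe` before saving, or to keep the tab delimiter in both `mkString` and the table definition.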
