2025-02-22 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article walks through how to find common friends with Hadoop MapReduce, step by step, and should be a useful reference for interested readers.
Preface
In many social apps, such as the familiar QQ friends list, when you open a chat window you can often see a list of recommended mutual friends below it. This helps users add people they are likely to know.
How can this feature be implemented? Readers familiar with Redis will quickly think of it, and Redis is indeed a good fit for this kind of feature.
However, a Redis implementation has limitations: Redis keeps data in memory for both storage and computation. At the scale of Tencent QQ, the cost of the Redis servers alone would be considerable.
Alternatively, the feature can be implemented with MapReduce in Hadoop. How?
Business analysis
The following is the original data file. The first column can be read as "me", and the second column is that user's friend list, separated by commas. For example, the friends of user A are B, C, D, F, E, and O, and so on.
A:B,C,D,F,E,O
B:A,C,E,K
C:F,A,D,I
D:A,E,F,L
E:B,C,D,M,L
F:A,B,C,D,E,O,M
G:A,C,D,E,F
H:A,C,D,E,O
I:A,O
J:B,O
K:A,C,D
L:D,E,F
M:E,F,G
O:A,H,I,J
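As a minimal sketch of parsing this input format (plain Java, no Hadoop; the class and method names here are invented for illustration), a raw line splits on ":" and then on ",":

```java
import java.util.Arrays;
import java.util.List;

public class InputLineDemo {
    // Extract the user from a raw line like "A:B,C,D,F,E,O"
    static String user(String line) {
        return line.split(":")[0];
    }

    // Extract the comma-separated friend list from the same line
    static List<String> friends(String line) {
        return Arrays.asList(line.split(":")[1].split(","));
    }

    public static void main(String[] args) {
        String line = "A:B,C,D,F,E,O";
        System.out.println(user(line));    // prints: A
        System.out.println(friends(line)); // prints: [B, C, D, F, E, O]
    }
}
```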
The requirement is to compute, from this data file, which users have common friends, output in the following format:
A-B C,E
A-C D,F
A-D E,F
...
Analysis of realization ideas
Step 1: split the original data into the following format
Through this step you get a set of K-V pairs; each pair records that the key user appears in the value user's friend list.
B:A # B is A's friend.
C:A # C is A's friend.
D:A # D is A's friend.
F:A
E:A
O:A
A:B
C:B
E:B
K:B
F:C
A:C
D:C
I:C
B:E
C:E
D:E
M:E
L:E
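The inversion in step 1 can be sketched in plain Java (no Hadoop; the helper name `invert` is made up for this sketch): for each friend in a user's list, emit one friend/user record.

```java
import java.util.ArrayList;
import java.util.List;

public class InvertDemo {
    // Turn "A:B,C,..." into one "friend<TAB>user" record per friend,
    // mirroring the map output of step 1
    static List<String> invert(String line) {
        String[] parts = line.split(":");
        String user = parts[0];
        List<String> out = new ArrayList<>();
        for (String friend : parts[1].split(",")) {
            out.add(friend + "\t" + user);
        }
        return out;
    }

    public static void main(String[] args) {
        // One friend<TAB>user record for each of A's six friends
        System.out.println(invert("A:B,C,D,F,E,O"));
    }
}
```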
Step 2. The data in the first step is further processed into the following format
From the formatted data of step 1, a pattern emerges: the value list on the right names the users whose friend lists contain the key on the left. Taking user C as an example, the step-1 rows C:A, C:B, and C:E show that C appears in the friend lists of A, B, and E; conversely, users A, B, and E share the common friend C.
The other rows follow by analogy.
C A-B-E # C is a common friend of A, B, and E.
D A-C # D is a common friend of A and C.
A B-C # A is a common friend of B and C.
B A-E # B is a common friend of A and E.
.
Step 3. Swap the key and value of the data from step 2
We know from step 2 that C's "friends" (the users who list C) are A, B, and E; conversely, A, B, and E share the common friend C. For keys with three or more users like this, pairwise combination is needed in the next step.
A-B-E C # A, B, and E have the common friend C.
A-C D # A and C have the common friend D.
B-C A # B and C have the common friend A.
A-E B # A and E have the common friend B.
Step 4. Continue to split the data obtained from step 3.
In step 3, a record such as A-B-E C clearly needs further splitting, because the final goal is the common friends of each *pair* of users. It is therefore split into A-B C, A-E C, and B-E C, preparing for the merge in the next step.
A-B C
A-E C
B-E C
A-C D
B-C A
A-E B
...
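The pairwise split of step 4 can be sketched in plain Java (no Hadoop; the helper name `pairs` is invented for this sketch). Sorting first guarantees that the same two users always form the same key, so A-B and B-A land in the same group later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PairDemo {
    // Split a key like "A-B-E" with shared friend "C" into all two-user
    // combinations: "A-B C", "A-E C", "B-E C"
    static List<String> pairs(String joinedUsers, String friend) {
        String[] users = joinedUsers.split("-");
        Arrays.sort(users); // normalize order so "B-A" and "A-B" produce the same key
        List<String> out = new ArrayList<>();
        for (int i = 0; i < users.length - 1; i++) {
            for (int j = i + 1; j < users.length; j++) {
                out.add(users[i] + "-" + users[j] + "\t" + friend);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three pairs: A-B, A-E, B-E, each with the shared friend C
        System.out.println(pairs("A-B-E", "C"));
    }
}
```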
Step 5. Merge the data obtained from step 4.
In MapReduce, all records leaving the Map phase with the same key are delivered to a single reduce call. In step 4, for example, there are two records with the key A-E, so the reduce output for that key is A-E C,B; that is, users A and E have the common friends C and B.
A-B C # A and B have the common friend C.
A-E C,B # A and E have the common friends C and B.
B-E C # B and E have the common friend C.
A-C D # A and C have the common friend D.
B-C A # B and C have the common friend A.
...
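The reduce-side merge of step 5 can be sketched in plain Java (no Hadoop; the helper name `merge` is invented for this sketch): records with the same pair key accumulate their friends into one comma-joined value.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeDemo {
    // Group "pair<TAB>friend" records by pair, joining friends with commas,
    // mirroring the reduce of step 5
    static Map<String, String> merge(List<String> records) {
        Map<String, String> out = new LinkedHashMap<>();
        for (String rec : records) {
            String[] kv = rec.split("\t");
            out.merge(kv[0], kv[1], (a, b) -> a + "," + b);
        }
        return out;
    }

    public static void main(String[] args) {
        // The two A-E records collapse into a single entry A-E -> C,B
        System.out.println(merge(Arrays.asList(
                "A-B\tC", "A-E\tC", "B-E\tC", "A-C\tD", "B-C\tA", "A-E\tB")));
    }
}
```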
The analysis above shows how to reach the desired result. It also shows that, when the steps are mapped onto MapReduce, a single MapReduce job is clearly not enough: at least two are required.
Combining the step-by-step analysis above yields a data-flow diagram spanning two MapReduce jobs; refer to it when writing and checking the code logic.
Coding implementation

1. The first Mapper class

```java
public class FirstMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input line: A:B,C,D,F,E,O — the left side is the user, the right side is the friend list
        String val = value.toString();
        String[] split = val.split(":");
        String user = split[0];
        String[] friendList = split[1].split(",");
        // Map output for the line above:
        //   B  A
        //   C  A
        //   D  A
        //   F  A
        //   E  A
        //   O  A
        for (String friend : friendList) {
            context.write(new Text(friend), new Text(user));
        }
    }
}
```

2. The first Reducer class

```java
public class FirstReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Join all users who share this friend, e.g. key C with values A, B, E
        // is written out as:  A-B-E   C
        StringBuilder sb = new StringBuilder();
        for (Text text : values) {
            if (sb.length() > 0) {
                sb.append("-");
            }
            sb.append(text);
        }
        context.write(new Text(sb.toString()), key);
    }
}
```

3. The first Job class

```java
public class FirstJob {
    public static void main(String[] args) throws Exception {
        // 1. Get the job
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        // 2. Set the jar path
        job.setJarByClass(FirstJob.class);
        // 3. Associate the Mapper and Reducer
        job.setMapperClass(FirstMapper.class);
        job.setReducerClass(FirstReducer.class);
        // 4. Set the key/value types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // 5. Set the key/value types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // 6. Set the input and output paths
        String inputPath = "F:\\network disk\\csv\\friends.txt";
        String outPath = "F:\\network disk\\csv\\friends1";
        FileInputFormat.setInputPaths(job, new Path(inputPath));
        FileOutputFormat.setOutputPath(job, new Path(outPath));
        // 7. Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}
```
Run the Job code above, then open the output file of the first stage. Its content matches the required format of the first phase: a "-"-joined user list, a tab, and the shared friend.
4. The second Mapper class

```java
public class SecondMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Stage-1 output line, e.g.:  I-K-C-B-G-F-H-O-D   A
        // Final output format:
        //   I-K  A
        //   I-C  A
        //   I-B  A
        //   ...
        // The user list on the left is split and combined pairwise, each pair
        // written out with the shared friend as the value
        String val = value.toString();
        String[] split = val.split("\t");
        String friend = split[1];
        String[] allUsers = split[0].split("-");
        // Sort so the same two users always form the same key (A-B, never B-A)
        Arrays.sort(allUsers);
        for (int i = 0; i < allUsers.length - 1; i++) {
            for (int j = i + 1; j < allUsers.length; j++) {
                context.write(new Text(allUsers[i] + "-" + allUsers[j]), new Text(friend));
            }
        }
    }
}
```
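The second-stage Reducer and Job follow the same pattern as the first stage. To check the end-to-end logic without a Hadoop cluster, the whole two-pass pipeline can be simulated in plain Java; the class and method names below are invented for this sketch.

```java
import java.util.*;

public class CommonFriendsDemo {
    // Simulate both MapReduce passes in memory.
    static Map<String, String> commonFriends(List<String> lines) {
        // Pass 1: friend -> list of users whose friend lists contain that friend
        Map<String, List<String>> byFriend = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split(":");
            for (String friend : parts[1].split(",")) {
                byFriend.computeIfAbsent(friend, k -> new ArrayList<>()).add(parts[0]);
            }
        }
        // Pass 2: every pair of users sharing a friend -> accumulated friend list
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : byFriend.entrySet()) {
            List<String> users = e.getValue();
            Collections.sort(users); // normalize pair order
            for (int i = 0; i < users.size() - 1; i++) {
                for (int j = i + 1; j < users.size(); j++) {
                    result.merge(users.get(i) + "-" + users.get(j),
                            e.getKey(), (a, b) -> a + "," + b);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList(
                "A:B,C,D,F,E,O", "B:A,C,E,K", "C:F,A,D,I", "D:A,E,F,L",
                "E:B,C,D,M,L", "F:A,B,C,D,E,O,M", "G:A,C,D,E,F", "H:A,C,D,E,O",
                "I:A,O", "J:B,O", "K:A,C,D", "L:D,E,F", "M:E,F,G", "O:A,H,I,J");
        Map<String, String> result = commonFriends(data);
        System.out.println("A-B " + result.get("A-B")); // prints: A-B C,E
        System.out.println("A-C " + result.get("A-C")); // prints: A-C D,F
    }
}
```

Running this against the sample data reproduces the expected results from the business analysis, e.g. A and B share the friends C and E.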
© 2024 shulou.com SLNews company. All rights reserved.