2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
1. Principle explanation
(1) The original data set List is sorted according to some rule, and two initial distance thresholds T1 and T2 are set, with T1 > T2.
(2) A data vector A is randomly selected from List, and a cheap (rough) distance measure is used to calculate the distance d between A and the other sample data vectors in List.
(3) Based on the distance d from step 2, every sample data vector with d less than T1 is placed in the same canopy as A, and every sample data vector with d less than T2 is removed from List.
(4) Repeat steps 2 and 3 until List is empty.
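The four steps above can be sketched in plain Java, without Hadoop or Mahout. This is only an illustration under stated assumptions: the class name CanopySketch, the method names, and the use of Manhattan distance as the "rough" distance are hypothetical choices, and the center is taken from the head of the list instead of at random so that the result is reproducible.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class CanopySketch {

    /** Cheap distance measure. Manhattan distance is an assumption for illustration. */
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    /**
     * Builds canopies from the points using thresholds t1 > t2.
     * Each canopy is a list whose first element is its center.
     */
    public static List<List<double[]>> canopies(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("T1 must be greater than T2");
        }
        List<double[]> remaining = new ArrayList<>(points); // the working "List"
        List<List<double[]>> result = new ArrayList<>();
        while (!remaining.isEmpty()) {                      // step 4: until List is empty
            // Step 2: pick a center (head of the list here, not a random element).
            double[] center = remaining.remove(0);
            List<double[]> canopy = new ArrayList<>();
            canopy.add(center);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(center, p);
                if (d < t1) {
                    canopy.add(p);   // step 3: within T1, joins this canopy
                }
                if (d < t2) {
                    it.remove();     // step 3: within T2, removed from List
                }
            }
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {1.0, 1.0});
        points.add(new double[] {1.5, 1.0});
        points.add(new double[] {3.0, 1.0});
        points.add(new double[] {8.0, 8.0});
        List<List<double[]>> result = canopies(points, 3.0, 1.0);
        System.out.println("canopies: " + result.size());
    }
}
```

With T1 = 3.0 and T2 = 1.0, the point (3.0, 1.0) falls inside the first canopy but stays in List because its distance to the first center is not below T2, so it later becomes the center of a canopy of its own. This overlap between canopies is expected behaviour of the algorithm.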
2. Download test data
cd /tmp
hadoop dfs -mkdir /input
wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
hadoop dfs -copyFromLocal /tmp/synthetic_control.data /input/synthetic_control.data
3. Format conversion (text → vector)
Edit the source file Text2VectorWritable.java:
package mahout.fansy.utils.transform;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.common.AbstractJob;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;
/**
 * transform text data to vectorWritable data
 * @author fansy
 */
public class Text2VectorWritable extends AbstractJob {
    public static void main(String[] args) throws Exception {
        ToolRunner.run(new Configuration(), new Text2VectorWritable(), args);
    }

    @Override
    public int run(String[] arg0) throws Exception {
        addInputOption();
        addOutputOption();
        if (parseArguments(arg0) == null) {
            return -1;
        }
        Path input = getInputPath();
        Path output = getOutputPath();
        Configuration conf = getConf();
        // set job information
        Job job = new Job(conf, "text2vectorWritableCopy with input:" + input.getName());
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setMapperClass(Text2VectorWritableMapper.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(VectorWritable.class);
        job.setReducerClass(Text2VectorWritableReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(VectorWritable.class);
        job.setJarByClass(Text2VectorWritable.class);
        FileInputFormat.addInputPath(job, input);
        SequenceFileOutputFormat.setOutputPath(job, output);
        if (!job.waitForCompletion(true)) { // wait until the job is done
            throw new InterruptedException("Canopy Job failed processing " + input);
        }
        return 0;
    }
    /**
     * Mapper main procedure
     * @author fansy
     */
    public static class Text2VectorWritableMapper extends Mapper<LongWritable, Text, LongWritable, VectorWritable> {
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // split the line on one or more whitespace characters
            String[] str = value.toString().split("\\s{1,}");
            Vector vector = new RandomAccessSparseVector(str.length);
            for (int i = 0; i < str.length; i++) {
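The listing breaks off inside the mapper's loop. As a rough, self-contained illustration of the text → vector step, the sketch below splits one line of whitespace-separated numbers and parses each token into a double. The class LineToVectorSketch and the method parseLine are hypothetical names, not part of the original job, and a plain double[] stands in for the Mahout vector so that the example runs without Hadoop or Mahout on the classpath. It assumes every line is non-empty and contains only numeric tokens.

```java
import java.util.Arrays;

class LineToVectorSketch {

    /** Splits a line on runs of whitespace and parses every token as a double. */
    public static double[] parseLine(String line) {
        // trim() avoids an empty leading token when the line starts with blanks
        String[] tokens = line.trim().split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // throws NumberFormatException if a token is not a number
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        double[] v = parseLine("28.7812 34.4632   31.3381");
        System.out.println(Arrays.toString(v));
    }
}
```

In the mapper, each parsed value would presumably be stored at index i of the RandomAccessSparseVector created from str.length, and the vector wrapped in a VectorWritable before being written to the context; that part is an inference from the job configuration, not something the truncated listing shows.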