14/12/17 23:04:20 INFO mapred.MapTask: record buffer = 262144/327680
14/12/17 23:04:20 INFO mapred.MapTask: Starting flush of map output
14/12/17 23:04:20 INFO mapred.MapTask: Finished spill 0
14/12/17 23:04:20 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
14/12/17 23:04:20 INFO mapred.LocalJobRunner:
14/12/17 23:04:20 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
14/12/17 23:04:20 INFO mapred.LocalJobRunner:
14/12/17 23:04:20 INFO mapred.Merger: Merging 2 sorted segments
14/12/17 23:04:20 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 90 bytes
14/12/17 23:04:20 INFO mapred.LocalJobRunner:
14/12/17 23:04:20 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
14/12/17 23:04:20 INFO mapred.LocalJobRunner:
14/12/17 23:04:20 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
14/12/17 23:04:20 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to out
14/12/17 23:04:20 INFO mapred.LocalJobRunner: reduce > reduce
14/12/17 23:04:20 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
14/12/17 23:04:20 INFO mapred.JobClient: map 100% reduce 100%
14/12/17 23:04:20 INFO mapred.JobClient: Job complete: job_local_0001
14/12/17 23:04:20 INFO mapred.JobClient: Counters: 14
14/12/17 23:04:20 INFO mapred.JobClient: FileSystemCounters
14/12/17 23:04:20 INFO mapred.JobClient: FILE_BYTES_READ=46040
14/12/17 23:04:20 INFO mapred.JobClient: HDFS_BYTES_READ=51471
14/12/17 23:04:20 INFO mapred.JobClient: FILE_BYTES_WRITTEN=52808
14/12/17 23:04:20 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=98132
14/12/17 23:04:20 INFO mapred.JobClient: Map-Reduce Framework
14/12/17 23:04:20 INFO mapred.JobClient: Reduce input groups=3
14/12/17 23:04:20 INFO mapred.JobClient: Combine output records=0
14/12/17 23:04:20 INFO mapred.JobClient: Map input records=4
14/12/17 23:04:20 INFO mapred.JobClient: Reduce shuffle bytes=0
14/12/17 23:04:20 INFO mapred.JobClient: Reduce output records=4
14/12/17 23:04:20 INFO mapred.JobClient: Spilled Records=8
14/12/17 23:04:20 INFO mapred.JobClient: Map output bytes=78
14/12/17 23:04:20 INFO mapred.JobClient: Combine input records=0
14/12/17 23:04:20 INFO mapred.JobClient: Map output records=4
14/12/17 23:04:20 INFO mapred.JobClient: Reduce input records=4
As the output shows, by default MapReduce writes the input to the output unchanged.
The following describes some of MapReduce's components and their default settings:
(1) The InputFormat class
This class divides the input data into splits, and further parses each split into <key, value> pairs that serve as the input to the map function.
(2) The Mapper class
Implements the map function, which generates intermediate results from the input <key, value> pairs.
(3) The Combiner class
Implements the combine function, which merges intermediate key/value pairs that share the same key.
(4) The Partitioner class
Implements the getPartition function, which during the Shuffle phase divides the intermediate data into R partitions by key, with each partition handled by one Reduce task (see the sketch after this list).
(5) The Reducer class
Implements the reduce function, which merges the intermediate results into the final result.
(6) The OutputFormat class
This class is responsible for writing the final result.
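To make getPartition concrete, here is a minimal sketch of a custom Partitioner for the <LongWritable, Text> intermediate pairs used below. The class name OffsetPartitioner is made up for illustration; the logic reproduces what Hadoop's default HashPartitioner does.

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative class name; the body mirrors the default HashPartitioner.
public class OffsetPartitioner extends Partitioner<LongWritable, Text> {
    @Override
    public int getPartition(LongWritable key, Text value, int numReduceTasks) {
        // Mask the sign bit so the result is non-negative, then take the
        // remainder modulo R (the number of reduce tasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

Plugging it in would be a single call, job.setPartitionerClass(OffsetPartitioner.class); the job below keeps the stock HashPartitioner instead.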
Returning to the job driver: the earlier code can be rewritten to spell out all of these defaults explicitly:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
import org.apache.hadoop.util.GenericOptionsParser;

public class LazyMapReduce {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: LazyMapReduce <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "LazyMapReduce");
        // Each setter below spells out a framework default.
        job.setInputFormatClass(TextInputFormat.class);  // default input format
        job.setMapperClass(Mapper.class);                // identity mapper
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setPartitionerClass(HashPartitioner.class);  // default partitioner
        job.setReducerClass(Reducer.class);              // identity reducer
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        // The default output format is TextOutputFormat; the original listing
        // passed the abstract FileOutputFormat.class, which fails at runtime.
        job.setOutputFormatClass(TextOutputFormat.class);
        // Use the parsed otherArgs rather than the raw args.
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
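Assuming the class has been packaged into a jar (the jar name here is hypothetical), the job is launched the same way as the all-defaults version:

hadoop jar lazymapreduce.jar LazyMapReduce in out

Because the identity Mapper and Reducer are used with TextInputFormat and TextOutputFormat, each output line is an input line's byte offset, a tab, and the line itself, which is consistent with the counters above showing four records in and four records out.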
Source: http://www.uml.org.cn/sjjm/201501201.asp