Hello everyone. Today I will show how to build a basic template for MapReduce programs.

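MapReduce programs follow a programming model different from the ones you have learned. You will need some time and practice to get used to it. To help you become proficient, the next few chapters work through multiple examples that illustrate different MapReduce programming techniques. By applying MapReduce in different ways, you will start to build intuition and get into the habit of "thinking in MapReduce". The examples range from simple ones to advanced applications. One advanced application introduces the Bloom filter, a data structure that standard computer science courses do not usually teach. You will see that processing large data sets, whether or not you use Hadoop, often requires rethinking the underlying algorithms.

We assume you already have a grasp of Hadoop basics: you can set up Hadoop, and compile and run example programs such as the word count example from chapter 1. We will work through examples based on real-world data sets.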

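Getting the patent data set

To do anything meaningful with Hadoop, we need data. Many of our examples use the patent data sets, available from the National Bureau of Economic Research (NBER) at http://www.nber.org/patents/. The data sets were originally compiled for the paper "The NBER Patent Citation Data File: Lessons, Insights and Methodological Tools". We will use the patent citation data set cite75_99.txt and the patent description data set apat63_99.txt.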
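Note

Each data set is about 250 MB, which is small enough for Hadoop running in standalone or pseudo-distributed mode. You can use them to practice writing MapReduce programs without access to a cluster. One of the best things about Hadoop is that you can be fairly confident your MapReduce program will run on a cluster and process 100 or 1,000 times as much data with almost no code changes.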

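A recurring topic in development is building a smaller, sampled subset of your large production data, known as a development data set. Such a development data set may be only a few hundred megabytes. It shortens the round trip between the development and production environments during your development cycle, and lets you run on your own machine and debug in an isolated environment.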
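One way to build such a development data set is to keep the header line plus every n-th data row. The sketch below is only an illustration: the class name DevSubset and the method sample are hypothetical and do not come from the original text, and the helper works on an in-memory list of lines rather than on a real file. Note that thinning the citation file this way also thins the citation graph, so it is suitable for testing program logic, not for drawing conclusions from the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from the original post): builds a small
// development data set by keeping the header plus every n-th data row.
class DevSubset {
    static List<String> sample(List<String> lines, int keepEvery) {
        if (keepEvery < 1) {
            throw new IllegalArgumentException("keepEvery must be >= 1");
        }
        List<String> subset = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // Index 0 is the CSV header; data rows start at index 1.
            if (i == 0 || (i - 1) % keepEvery == 0) {
                subset.add(lines.get(i));
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "\"CITING\",\"CITED\"",
            "3858241,956203",
            "3858241,1324234",
            "3858241,3398406",
            "3858241,3557384",
            "3858241,3634889");
        // Keeps the header and the 1st, 3rd, and 5th data rows.
        for (String line : sample(lines, 2)) {
            System.out.println(line);
        }
    }
}
```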

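We chose these two data sets because they resemble most of the data types you will encounter. First, the citation data forms a "graph", the same data structure used to describe web links and social networks. Patents are published in chronological order, so some of their attributes form time series. Each patent is associated with a person (the inventor) and a location (the inventor's country), which you can treat as personal or geographic information. Finally, you can view the data as well-defined database relations stored in comma-separated format.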

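Note

Many data types are not fully represented by these two data sets, such as text, but you have already seen text in the word count example. Other types not covered include XML, images, and geographic locations (expressed as latitude and longitude). Mathematical matrices are not represented in general form, although the citation graph can be interpreted as a sparse 0/1 matrix.

The patent citation data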

The patent citation data contains the citations made by U.S. patents issued between 1975 and 1999. It has more than 16 million rows, and the first few lines look like this:

"CITING","CITED"
3858241,956203
3858241,1324234
3858241,3398406
3858241,3557384
3858241,3634889
3858242,1515701
3858242,3319261
3858242,3668705
3858242,3707004

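The data set is in standard comma-separated values (CSV) format, and the first line describes the columns. Each of the other lines records one particular citation. For example, the second line shows that patent 3858241 cites patent 956203. The file is sorted by the citing patent (not the cited patent), so we can see that patent 3858241 cites five patents in total. Analyzing the data more quantitatively will give us deeper insight into it.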
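As a small illustration of reading this format, the following sketch counts how many patents each citing patent references, using the sample lines shown above. It is plain Java with no Hadoop involved; the class name CitationCount and its method are hypothetical names chosen for this example, and the code assumes the only non-data line is the quoted header row.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example class: counts citations made per citing patent.
class CitationCount {
    static Map<String, Integer> countCitationsMade(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Skip blank lines and the quoted header row.
            if (line.isEmpty() || line.startsWith("\"")) {
                continue;
            }
            String[] fields = line.split(",");
            // fields[0] is the citing patent, fields[1] the cited patent.
            counts.merge(fields[0], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "\"CITING\",\"CITED\"",
            "3858241,956203", "3858241,1324234", "3858241,3398406",
            "3858241,3557384", "3858241,3634889",
            "3858242,1515701", "3858242,3319261",
            "3858242,3668705", "3858242,3707004");
        // Prints 3858241 -> 5 and 3858242 -> 4 for the sample above.
        countCitationsMade(lines).forEach(
            (patent, n) -> System.out.println(patent + " -> " + n));
    }
}
```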

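If you only read the data file, the citation data looks like a plain list of numbers. You can think about the data in more interesting terms. One way is to picture it as a graph. Figure 4.1 shows part of this citation graph. We can see that some patents are cited often while others are never cited. Patents 5936972 and 6009552 cite a similar set of patents (4354269, 4486882, 5598422), even though they do not cite each other. We can use Hadoop to derive descriptive statistics about the patent data and to look for interesting but less obvious patents.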
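The graph view can be sketched in a few lines of plain Java: store, for each citing patent, the set of patents it cites, and intersect two such sets to find shared citations. The class and method names below are hypothetical, and the sample edges contain only the three shared patents named in the text; the real citation lists of 5936972 and 6009552 may contain more entries.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical example class: citation graph as adjacency sets.
class SharedCitations {
    // Builds citing -> set of cited patents from "citing,cited" lines.
    static Map<String, Set<String>> buildGraph(List<String> lines) {
        Map<String, Set<String>> graph = new TreeMap<>();
        for (String line : lines) {
            String[] pair = line.split(",", 2);
            if (pair.length < 2) {
                continue; // not a citation record
            }
            graph.computeIfAbsent(pair[0], k -> new TreeSet<>()).add(pair[1]);
        }
        return graph;
    }

    // Returns the patents cited by both a and b.
    static Set<String> shared(Map<String, Set<String>> graph, String a, String b) {
        Set<String> result = new TreeSet<>(
            graph.getOrDefault(a, Collections.<String>emptySet()));
        result.retainAll(graph.getOrDefault(b, Collections.<String>emptySet()));
        return result;
    }

    public static void main(String[] args) {
        // Only the citations named in the text; the real lists may be longer.
        List<String> lines = Arrays.asList(
            "5936972,4354269", "5936972,4486882", "5936972,5598422",
            "6009552,4354269", "6009552,4486882", "6009552,5598422");
        Map<String, Set<String>> graph = buildGraph(lines);
        System.out.println(shared(graph, "5936972", "6009552"));
    }
}
```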

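The patent description data

The other data set we use is the patent description data. It contains the patent number, the application year, the grant year, the number of claims, and other metadata about each patent. Look at the first few lines of the data. It resembles a table in a relational database, but in CSV format. The data set has more than 2.9 million records. Like many real-world data sets, it may have missing values.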


"PATENT","GYEAR","GDATE","APPYEAR","COUNTRY","POSTATE","ASSIGNEE","ASSCODE","CLAIMS","NCLASS","CAT","SUBCAT","CMADE","CRECEIVE","RATIOCIT","GENERAL","ORIGINAL","FWDAPLAG","BCKGTLAG","SELFCTUB","SELFCTLB","SECDUPBD","SECDLWBD"
3070801,1963,1096,,"BE","",,1,,269,6,69,,1,,0,,,,,,,
3070802,1963,1096,,"US","TX",,1,,2,6,63,,0,,,,,,,,,
3070803,1963,1096,,"US","IL",,1,,2,6,63,,9,,0.3704,,,,,,,
3070804,1963,1096,,"US","OH",,1,,2,6,63,,3,,0.6667,,,,,,,
3070805,1963,1096,,"US","CA",,1,,2,6,63,,1,,0,,,,,,,
3070806,1963,1096,,"US","PA",,1,,2,6,63,,0,,,,,,,,,
3070807,1963,1096,,"US","OH",,1,,623,3,39,,3,,0.4444,,,,,,,
3070808,1963,1096,,"US","IA",,1,,623,3,39,,4,,0.375,,,,,,,
3070809,1963,1096,,"US","AZ",,1,,4,6,65,,0,,,,,,,,,
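Note

As with any data analysis, we must be very careful when interpreting this limited data. If a patent appears to cite no other patents, it may be an older patent for which we have no citation information. Conversely, more recent patents are cited less often, because only newer patents can be aware of their existence.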

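The first line contains the attribute names, which are meaningful only to patent specialists. Even though we do not understand every attribute, knowing some of them is still useful. Table 4.1 describes the first 10 attributes.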

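Attribute name   Content
PATENT           Patent number
GYEAR            Grant year
GDATE            Grant date, given as the number of days elapsed since January 1, 1960
APPYEAR          Application year (available only for patents granted since 1967)
COUNTRY          Country of the first inventor
POSTATE          State of the first inventor (if the country is the U.S.)
ASSIGNEE         Numeric identifier of the patent assignee (i.e., the patent owner)
ASSCODE          One-digit (1-9) assignee type (assignee types include U.S. individual, U.S. government, U.S. organization, non-U.S. individual, and so on)
CLAIMS           Number of claims (available only for patents granted since 1975)
NCLASS           Three-digit patent class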
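To make the attribute definitions concrete, here is a minimal sketch that parses one row of the description data into the first 10 attributes and converts GDATE into a calendar date. The class name PatentRecord and its methods are hypothetical. The code assumes that no field contains an embedded comma, that an empty field means a missing value, and that day 0 of GDATE is January 1, 1960.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example class: reads the first 10 attributes of a row.
class PatentRecord {
    static final String[] COLUMNS = {
        "PATENT", "GYEAR", "GDATE", "APPYEAR", "COUNTRY",
        "POSTATE", "ASSIGNEE", "ASSCODE", "CLAIMS", "NCLASS"};

    // Maps attribute name to value; missing values are stored as null.
    static Map<String, String> parse(String row) {
        // The -1 limit keeps trailing empty fields instead of dropping them.
        String[] fields = row.split(",", -1);
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length && i < fields.length; i++) {
            String value = fields[i].replace("\"", "");
            record.put(COLUMNS[i], value.isEmpty() ? null : value);
        }
        return record;
    }

    // Assumption: GDATE counts days elapsed since 1960-01-01 (day 0).
    static LocalDate grantDate(String gdate) {
        return LocalDate.of(1960, 1, 1).plusDays(Long.parseLong(gdate));
    }

    public static void main(String[] args) {
        String row = "3070801,1963,1096,,\"BE\",\"\",,1,,269,6,69,,1,,0,,,,,,,";
        Map<String, String> record = parse(row);
        System.out.println(record);
        // 1096 days after 1960-01-01 is 1963-01-01, matching GYEAR 1963.
        System.out.println(grantDate(record.get("GDATE")));
    }
}
```

In the sample row, APPYEAR, POSTATE, ASSIGNEE, and CLAIMS come back as null, which is consistent with the validity ranges listed in the table for a patent granted in 1963.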

Now that we have the two patent data sets, let's write Hadoop programs to process them.

Building the basic template of a MapReduce program

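Most of our MapReduce programs are short and are variations on a template. When writing a new MapReduce program, you usually take an existing one and modify it until it does what you want. In this section, we write our first MapReduce program and explain its different parts. This program can serve as a template for future MapReduce programs. Our first program takes the patent citation data as input and inverts it: for each patent, we want to find the patents that cite it and group them together.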
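Before turning to Hadoop, the inversion itself can be sketched in memory. The example below is not the Hadoop program; it only imitates the three phases in plain Java: a "map" step that swaps each pair to (cited, citing), a "shuffle" step that groups by cited patent, and a "reduce" step that joins the citing patents with commas. The class name InvertCitations is hypothetical, and the patent numbers in main are made up so that one patent is cited twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the invert-and-group job.
class InvertCitations {
    static Map<String, String> invert(List<String> lines) {
        // "Map" and "shuffle": emit (cited, citing) and group by cited.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] pair = line.split(",", 2);
            if (pair.length < 2) {
                continue; // not a citation record
            }
            grouped.computeIfAbsent(pair[1], k -> new ArrayList<>()).add(pair[0]);
        }
        // "Reduce": join all citing patents of one cited patent.
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
            result.put(e.getKey(), String.join(",", e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up patent numbers: 100 is cited by 1 and 2, 200 by 3.
        List<String> lines = Arrays.asList("1,100", "2,100", "3,200");
        invert(lines).forEach(
            (cited, citing) -> System.out.println(cited + "\t" + citing));
    }
}
```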

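Environment: VMware 8.0 and Ubuntu 11.04

Hadoop in Action: analyzing the patent citation data set (1) --- computing and sorting the patent citation data

Step 1: Create a project named HadoopTest. The directory structure is shown in the figure below:

Step 2: Create a script file named start.sh in the /home/tanglg1987 directory. Every time the virtual machine starts, all files under /tmp must be deleted and the namenode reformatted. The code is as follows: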

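#!/bin/bash
# Wipe the HDFS state kept under /tmp and the old logs (this destroys all HDFS data).
sudo rm -rf /tmp/*
rm -rf /home/tanglg1987/hadoop-0.20.2/logs
# Reformat the namenode. Datanodes have no format step, so the original
# "hadoop datanode -format" line is dropped.
hadoop namenode -format
start-all.sh
# Leave safe mode before writing to HDFS; mkdir can fail while safe mode is on.
hadoop dfsadmin -safemode leave
hadoop fs -mkdir input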

Step 3: Make start.sh executable and start the Hadoop pseudo-distributed cluster. The code is as follows:

chmod 777 /home/tanglg1987/start.sh
./start.sh 

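The execution output is as follows:

Step 4: Upload the local file to HDFS

Download the patent data from the NBER website at http://data.nber.org/patents/. Note that the link below points to the description data archive, while the upload command uses the citation file cite75_99.txt.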

http://data.nber.org/patents/apat63_99.zip

hadoop fs -put /home/tanglg1987/cite75_99.txt input

Step 5: Create a new file MyJob.java with the following code:

package com.baison.action;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {

    // Mapper: swaps each (citing, cited) pair so the cited patent becomes the key.
    public static class MapClass extends MapReduceBase
            implements Mapper<Text, Text, Text, Text> {
        public void map(Text key, Text value,
                OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
            output.collect(value, key);
        }
    }

    // Reducer: joins all patents citing the same patent into one comma-separated list.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {
        public void reduce(Text key, Iterator<Text> values,
                OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
            String csv = "";
            while (values.hasNext()) {
                if (csv.length() > 0) csv += ",";
                csv += values.next().toString();
            }
            output.collect(key, new Text(csv));
        }
    }

    public int run(String[] args) throws Exception {
        for (String string : args) {
            System.out.println(string);
        }
        Configuration conf = getConf();
        JobConf job = new JobConf(conf, MyJob.class);

        Path in = new Path(args[0]);
        Path out = new Path(args[1]);
        FileInputFormat.setInputPaths(job, in);
        FileOutputFormat.setOutputPath(job, out);

        job.setJobName("MyJob");
        job.setMapperClass(MapClass.class);
        job.setReducerClass(Reduce.class);

        // Split each input line into key and value at the first comma.
        job.setInputFormat(KeyValueTextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.set("key.value.separator.in.input.line", ",");

        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // The input and output paths are hardcoded, so command-line arguments are ignored.
        String[] arg = {
            "hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt",
            "hdfs://localhost:9100/user/tanglg1987/output"
        };
        int res = ToolRunner.run(new Configuration(), new MyJob(), arg);
        System.exit(res);
    }
}
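The job relies on KeyValueTextInputFormat together with the key.value.separator.in.input.line setting to turn each input line into a key and a value. The following plain Java sketch imitates that split as I understand it: the line is cut at the first occurrence of the separator, and a line without a separator becomes the key with an empty value. The class KeyValueSplit is a hypothetical stand-in written for this illustration, not Hadoop code.

```java
// Hypothetical stand-in that imitates splitting a line at the first separator.
class KeyValueSplit {
    // Returns a two-element array: {key, value}.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator: the whole line is the key, the value is empty.
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String[] record = split("3858241,956203", ',');
        System.out.println("key=" + record[0] + " value=" + record[1]);

        // The header row is treated like any other line.
        String[] header = split("\"CITING\",\"CITED\"", ',');
        System.out.println("key=" + header[0] + " value=" + header[1]);
    }
}
```

One consequence worth noting: the header row of cite75_99.txt is not skipped, so the mapper swaps it like any other record and the job output can be expected to contain a line pairing "CITED" with "CITING".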

Step 6: Run on Hadoop. The run output is as follows:

hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt
hdfs://localhost:9100/user/tanglg1987/output
12/10/17 21:16:14 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/10/17 21:16:14 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/10/17 21:16:14 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/17 21:16:14 INFO mapred.JobClient: Running job: job_local_0001
12/10/17 21:16:14 INFO mapred.FileInputFormat: Total input paths to process : 1
12/10/17 21:16:14 INFO mapred.MapTask: numReduceTasks: 1
12/10/17 21:16:14 INFO mapred.MapTask: io.sort.mb = 100
12/10/17 21:16:14 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/17 21:16:14 INFO mapred.MapTask: record buffer = 262144/327680
12/10/17 21:16:15 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:15 INFO mapred.MapTask: bufstart = 0; bufend = 4185926; bufvoid = 99614720
12/10/17 21:16:15 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
12/10/17 21:16:15 INFO mapred.JobClient:  map 0% reduce 0%
12/10/17 21:16:17 INFO mapred.MapTask: Finished spill 0
12/10/17 21:16:17 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:17 INFO mapred.MapTask: bufstart = 4185926; bufend = 8372612; bufvoid = 99614720
12/10/17 21:16:17 INFO mapred.MapTask: kvstart = 262144; kvend = 196607; length = 327680
12/10/17 21:16:17 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:18 INFO mapred.JobClient:  map 13% reduce 0%
12/10/17 21:16:18 INFO mapred.MapTask: Finished spill 1
12/10/17 21:16:18 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:18 INFO mapred.MapTask: bufstart = 8372612; bufend = 12558934; bufvoid = 99614720
12/10/17 21:16:18 INFO mapred.MapTask: kvstart = 196607; kvend = 131070; length = 327680
12/10/17 21:16:19 INFO mapred.MapTask: Finished spill 2
12/10/17 21:16:19 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:19 INFO mapred.MapTask: bufstart = 12558934; bufend = 16745318; bufvoid = 99614720
12/10/17 21:16:19 INFO mapred.MapTask: kvstart = 131070; kvend = 65533; length = 327680
12/10/17 21:16:20 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:20 INFO mapred.MapTask: Finished spill 3
12/10/17 21:16:20 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:20 INFO mapred.MapTask: bufstart = 16745318; bufend = 20931889; bufvoid = 99614720
12/10/17 21:16:20 INFO mapred.MapTask: kvstart = 65533; kvend = 327677; length = 327680
12/10/17 21:16:21 INFO mapred.JobClient:  map 26% reduce 0%
12/10/17 21:16:21 INFO mapred.MapTask: Finished spill 4
12/10/17 21:16:22 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:22 INFO mapred.MapTask: bufstart = 20931889; bufend = 25118734; bufvoid = 99614720
12/10/17 21:16:22 INFO mapred.MapTask: kvstart = 327677; kvend = 262140; length = 327680
12/10/17 21:16:23 INFO mapred.MapTask: Finished spill 5
12/10/17 21:16:23 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:23 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:23 INFO mapred.MapTask: bufstart = 25118734; bufend = 29305269; bufvoid = 99614720
12/10/17 21:16:23 INFO mapred.MapTask: kvstart = 262140; kvend = 196603; length = 327680
12/10/17 21:16:24 INFO mapred.JobClient:  map 43% reduce 0%
12/10/17 21:16:24 INFO mapred.MapTask: Finished spill 6
12/10/17 21:16:24 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:24 INFO mapred.MapTask: bufstart = 29305269; bufend = 33492764; bufvoid = 99614720
12/10/17 21:16:24 INFO mapred.MapTask: kvstart = 196603; kvend = 131066; length = 327680
12/10/17 21:16:25 INFO mapred.MapTask: Finished spill 7
12/10/17 21:16:26 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:26 INFO mapred.MapTask: bufstart = 33492764; bufend = 37680391; bufvoid = 99614720
12/10/17 21:16:26 INFO mapred.MapTask: kvstart = 131066; kvend = 65529; length = 327680
12/10/17 21:16:26 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:27 INFO mapred.MapTask: Finished spill 8
12/10/17 21:16:27 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:27 INFO mapred.MapTask: bufstart = 37680391; bufend = 41868206; bufvoid = 99614720
12/10/17 21:16:27 INFO mapred.MapTask: kvstart = 65529; kvend = 327673; length = 327680
12/10/17 21:16:27 INFO mapred.JobClient:  map 57% reduce 0%
12/10/17 21:16:28 INFO mapred.MapTask: Finished spill 9
12/10/17 21:16:28 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:28 INFO mapred.MapTask: bufstart = 41868206; bufend = 46056257; bufvoid = 99614720
12/10/17 21:16:28 INFO mapred.MapTask: kvstart = 327673; kvend = 262136; length = 327680
12/10/17 21:16:29 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:29 INFO mapred.MapTask: Finished spill 10
12/10/17 21:16:29 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:29 INFO mapred.MapTask: bufstart = 46056257; bufend = 50244288; bufvoid = 99614720
12/10/17 21:16:29 INFO mapred.MapTask: kvstart = 262136; kvend = 196599; length = 327680
12/10/17 21:16:30 INFO mapred.JobClient:  map 70% reduce 0%
12/10/17 21:16:30 INFO mapred.MapTask: Finished spill 11
12/10/17 21:16:31 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:31 INFO mapred.MapTask: bufstart = 50244288; bufend = 54432271; bufvoid = 99614720
12/10/17 21:16:31 INFO mapred.MapTask: kvstart = 196599; kvend = 131062; length = 327680
12/10/17 21:16:32 INFO mapred.MapTask: Finished spill 12
12/10/17 21:16:32 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:32 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:32 INFO mapred.MapTask: bufstart = 54432271; bufend = 58620635; bufvoid = 99614720
12/10/17 21:16:32 INFO mapred.MapTask: kvstart = 131062; kvend = 65525; length = 327680
12/10/17 21:16:33 INFO mapred.JobClient:  map 87% reduce 0%
12/10/17 21:16:33 INFO mapred.MapTask: Finished spill 13
12/10/17 21:16:33 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:33 INFO mapred.MapTask: bufstart = 58620635; bufend = 62808941; bufvoid = 99614720
12/10/17 21:16:33 INFO mapred.MapTask: kvstart = 65525; kvend = 327669; length = 327680
12/10/17 21:16:34 INFO mapred.MapTask: Finished spill 14
12/10/17 21:16:35 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:35 INFO mapred.MapTask: bufstart = 62808941; bufend = 66997060; bufvoid = 99614720
12/10/17 21:16:35 INFO mapred.MapTask: kvstart = 327669; kvend = 262132; length = 327680
12/10/17 21:16:35 INFO mapred.MapTask: Starting flush of map output
12/10/17 21:16:35 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:36 INFO mapred.JobClient:  map 100% reduce 0%
12/10/17 21:16:36 INFO mapred.MapTask: Finished spill 15
12/10/17 21:16:36 INFO mapred.MapTask: Finished spill 16
12/10/17 21:16:36 INFO mapred.Merger: Merging 17 sorted segments
12/10/17 21:16:36 INFO mapred.Merger: Merging 8 intermediate segments out of a total of 17
12/10/17 21:16:38 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:41 INFO mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 75511476 bytes
12/10/17 21:16:41 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:44 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:47 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:48 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/10/17 21:16:48 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:0+67108864
12/10/17 21:16:48 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/10/17 21:16:48 INFO mapred.MapTask: numReduceTasks: 1
12/10/17 21:16:48 INFO mapred.MapTask: io.sort.mb = 100
12/10/17 21:16:48 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/17 21:16:48 INFO mapred.MapTask: record buffer = 262144/327680
12/10/17 21:16:48 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:48 INFO mapred.MapTask: bufstart = 0; bufend = 4188247; bufvoid = 99614720
12/10/17 21:16:48 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
12/10/17 21:16:49 INFO mapred.MapTask: Finished spill 0
12/10/17 21:16:49 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:49 INFO mapred.MapTask: bufstart = 4188247; bufend = 8376832; bufvoid = 99614720
12/10/17 21:16:49 INFO mapred.MapTask: kvstart = 262144; kvend = 196607; length = 327680
12/10/17 21:16:50 INFO mapred.MapTask: Finished spill 1
12/10/17 21:16:51 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:51 INFO mapred.MapTask: bufstart = 8376832; bufend = 12565282; bufvoid = 99614720
12/10/17 21:16:51 INFO mapred.MapTask: kvstart = 196607; kvend = 131070; length = 327680
12/10/17 21:16:51 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:16:51 INFO mapred.JobClient:  map 59% reduce 0%
12/10/17 21:16:52 INFO mapred.MapTask: Finished spill 2
12/10/17 21:16:52 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:52 INFO mapred.MapTask: bufstart = 12565282; bufend = 16754414; bufvoid = 99614720
12/10/17 21:16:52 INFO mapred.MapTask: kvstart = 131070; kvend = 65533; length = 327680
12/10/17 21:16:53 INFO mapred.MapTask: Finished spill 3
12/10/17 21:16:54 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:54 INFO mapred.MapTask: bufstart = 16754414; bufend = 20943273; bufvoid = 99614720
12/10/17 21:16:54 INFO mapred.MapTask: kvstart = 65533; kvend = 327677; length = 327680
12/10/17 21:16:54 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:16:54 INFO mapred.JobClient:  map 65% reduce 0%
12/10/17 21:16:55 INFO mapred.MapTask: Finished spill 4
12/10/17 21:16:55 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:55 INFO mapred.MapTask: bufstart = 20943273; bufend = 25132396; bufvoid = 99614720
12/10/17 21:16:55 INFO mapred.MapTask: kvstart = 327677; kvend = 262140; length = 327680
12/10/17 21:16:57 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:16:57 INFO mapred.JobClient:  map 69% reduce 0%
12/10/17 21:16:58 INFO mapred.MapTask: Finished spill 5
12/10/17 21:16:59 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:16:59 INFO mapred.MapTask: bufstart = 25132396; bufend = 29321431; bufvoid = 99614720
12/10/17 21:16:59 INFO mapred.MapTask: kvstart = 262140; kvend = 196603; length = 327680
12/10/17 21:17:00 INFO mapred.MapTask: Finished spill 6
12/10/17 21:17:00 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:17:00 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:00 INFO mapred.MapTask: bufstart = 29321431; bufend = 33510738; bufvoid = 99614720
12/10/17 21:17:00 INFO mapred.MapTask: kvstart = 196603; kvend = 131066; length = 327680
12/10/17 21:17:00 INFO mapred.JobClient:  map 73% reduce 0%
12/10/17 21:17:01 INFO mapred.MapTask: Finished spill 7
12/10/17 21:17:01 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:01 INFO mapred.MapTask: bufstart = 33510738; bufend = 37700146; bufvoid = 99614720
12/10/17 21:17:01 INFO mapred.MapTask: kvstart = 131066; kvend = 65529; length = 327680
12/10/17 21:17:02 INFO mapred.MapTask: Finished spill 8
12/10/17 21:17:02 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:02 INFO mapred.MapTask: bufstart = 37700146; bufend = 41889795; bufvoid = 99614720
12/10/17 21:17:02 INFO mapred.MapTask: kvstart = 65529; kvend = 327673; length = 327680
12/10/17 21:17:03 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:17:03 INFO mapred.JobClient:  map 81% reduce 0%
12/10/17 21:17:03 INFO mapred.MapTask: Finished spill 9
12/10/17 21:17:04 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:04 INFO mapred.MapTask: bufstart = 41889795; bufend = 46079341; bufvoid = 99614720
12/10/17 21:17:04 INFO mapred.MapTask: kvstart = 327673; kvend = 262136; length = 327680
12/10/17 21:17:05 INFO mapred.MapTask: Finished spill 10
12/10/17 21:17:05 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:05 INFO mapred.MapTask: bufstart = 46079341; bufend = 50269061; bufvoid = 99614720
12/10/17 21:17:05 INFO mapred.MapTask: kvstart = 262136; kvend = 196599; length = 327680
12/10/17 21:17:06 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:17:06 INFO mapred.MapTask: Finished spill 11
12/10/17 21:17:06 INFO mapred.JobClient:  map 88% reduce 0%
12/10/17 21:17:06 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:06 INFO mapred.MapTask: bufstart = 50269061; bufend = 54458860; bufvoid = 99614720
12/10/17 21:17:06 INFO mapred.MapTask: kvstart = 196599; kvend = 131062; length = 327680
12/10/17 21:17:07 INFO mapred.MapTask: Finished spill 12
12/10/17 21:17:08 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:08 INFO mapred.MapTask: bufstart = 54458860; bufend = 58648475; bufvoid = 99614720
12/10/17 21:17:08 INFO mapred.MapTask: kvstart = 131062; kvend = 65525; length = 327680
12/10/17 21:17:09 INFO mapred.MapTask: Finished spill 13
12/10/17 21:17:09 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:17:09 INFO mapred.JobClient:  map 95% reduce 0%
12/10/17 21:17:09 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:09 INFO mapred.MapTask: bufstart = 58648475; bufend = 62838648; bufvoid = 99614720
12/10/17 21:17:09 INFO mapred.MapTask: kvstart = 65525; kvend = 327669; length = 327680
12/10/17 21:17:10 INFO mapred.MapTask: Finished spill 14
12/10/17 21:17:10 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:10 INFO mapred.MapTask: bufstart = 62838648; bufend = 67028992; bufvoid = 99614720
12/10/17 21:17:10 INFO mapred.MapTask: kvstart = 327669; kvend = 262132; length = 327680
12/10/17 21:17:10 INFO mapred.MapTask: Starting flush of map output
12/10/17 21:17:11 INFO mapred.MapTask: Finished spill 15
12/10/17 21:17:11 INFO mapred.MapTask: Finished spill 16
12/10/17 21:17:11 INFO mapred.Merger: Merging 17 sorted segments
12/10/17 21:17:11 INFO mapred.Merger: Merging 8 intermediate segments out of a total of 17
12/10/17 21:17:12 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:17:12 INFO mapred.JobClient:  map 100% reduce 0%
12/10/17 21:17:15 INFO mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 75507463 bytes
12/10/17 21:17:15 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:17:18 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:17:21 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:17:23 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/10/17 21:17:23 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:67108864+67108864
12/10/17 21:17:23 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
12/10/17 21:17:23 INFO mapred.MapTask: numReduceTasks: 1
12/10/17 21:17:23 INFO mapred.MapTask: io.sort.mb = 100
12/10/17 21:17:23 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/17 21:17:23 INFO mapred.MapTask: record buffer = 262144/327680
12/10/17 21:17:23 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:23 INFO mapred.MapTask: bufstart = 0; bufend = 4190341; bufvoid = 99614720
12/10/17 21:17:23 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
12/10/17 21:17:24 INFO mapred.MapTask: Finished spill 0
12/10/17 21:17:24 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:24 INFO mapred.MapTask: bufstart = 4190341; bufend = 8380745; bufvoid = 99614720
12/10/17 21:17:24 INFO mapred.MapTask: kvstart = 262144; kvend = 196607; length = 327680
12/10/17 21:17:25 INFO mapred.MapTask: Finished spill 1
12/10/17 21:17:25 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:25 INFO mapred.MapTask: bufstart = 8380745; bufend = 12571307; bufvoid = 99614720
12/10/17 21:17:25 INFO mapred.MapTask: kvstart = 196607; kvend = 131070; length = 327680
12/10/17 21:17:26 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:134217728+67108864
12/10/17 21:17:26 INFO mapred.MapTask: Finished spill 2
12/10/17 21:17:26 INFO mapred.JobClient:  map 73% reduce 0%
12/10/17 21:17:26 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:26 INFO mapred.MapTask: bufstart = 12571307; bufend = 16762033; bufvoid = 99614720
12/10/17 21:17:26 INFO mapred.MapTask: kvstart = 131070; kvend = 65533; length = 327680
12/10/17 21:17:27 INFO mapred.MapTask: Finished spill 3
12/10/17 21:17:27 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:27 INFO mapred.MapTask: bufstart = 16762033; bufend = 20952800; bufvoid = 99614720
12/10/17 21:17:27 INFO mapred.MapTask: kvstart = 65533; kvend = 327677; length = 327680
12/10/17 21:17:28 INFO mapred.MapTask: Finished spill 4
12/10/17 21:17:29 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:134217728+67108864
12/10/17 21:17:29 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:29 INFO mapred.MapTask: bufstart = 20952800; bufend = 25143412; bufvoid = 99614720
12/10/17 21:17:29 INFO mapred.MapTask: kvstart = 327677; kvend = 262140; length = 327680
12/10/17 21:17:29 INFO mapred.JobClient:  map 79% reduce 0%
12/10/17 21:17:30 INFO mapred.MapTask: Finished spill 5
12/10/17 21:17:30 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:30 INFO mapred.MapTask: bufstart = 25143412; bufend = 29334124; bufvoid = 99614720
12/10/17 21:17:30 INFO mapred.MapTask: kvstart = 262140; kvend = 196603; length = 327680
12/10/17 21:17:31 INFO mapred.MapTask: Finished spill 6
12/10/17 21:17:31 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:31 INFO mapred.MapTask: bufstart = 29334124; bufend = 33524778; bufvoid = 99614720
12/10/17 21:17:31 INFO mapred.MapTask: kvstart = 196603; kvend = 131066; length = 327680
12/10/17 21:17:32 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:134217728+67108864
12/10/17 21:17:32 INFO mapred.JobClient:  map 83% reduce 0%
12/10/17 21:17:32 INFO mapred.MapTask: Finished spill 7
12/10/17 21:17:33 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:33 INFO mapred.MapTask: bufstart = 33524778; bufend = 37715716; bufvoid = 99614720
12/10/17 21:17:33 INFO mapred.MapTask: kvstart = 131066; kvend = 65529; length = 327680
12/10/17 21:17:34 INFO mapred.MapTask: Finished spill 8
12/10/17 21:17:34 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:34 INFO mapred.MapTask: bufstart = 37715716; bufend = 41906544; bufvoid = 99614720
12/10/17 21:17:34 INFO mapred.MapTask: kvstart = 65529; kvend = 327673; length = 327680
12/10/17 21:17:35 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:134217728+67108864
12/10/17 21:17:35 INFO mapred.MapTask: Finished spill 9
12/10/17 21:17:35 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:35 INFO mapred.MapTask: bufstart = 41906544; bufend = 46097751; bufvoid = 99614720
12/10/17 21:17:35 INFO mapred.MapTask: kvstart = 327673; kvend = 262136; length = 327680
12/10/17 21:17:35 INFO mapred.JobClient:  map 88% reduce 0%
12/10/17 21:17:36 INFO mapred.MapTask: Finished spill 10
12/10/17 21:17:36 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:36 INFO mapred.MapTask: bufstart = 46097751; bufend = 50288897; bufvoid = 99614720
12/10/17 21:17:36 INFO mapred.MapTask: kvstart = 262136; kvend = 196599; length = 327680
12/10/17 21:17:37 INFO mapred.MapTask: Finished spill 11
12/10/17 21:17:37 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:37 INFO mapred.MapTask: bufstart = 50288897; bufend = 54480358; bufvoid = 99614720
12/10/17 21:17:37 INFO mapred.MapTask: kvstart = 196599; kvend = 131062; length = 327680
12/10/17 21:17:38 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:134217728+67108864
12/10/17 21:17:38 INFO mapred.MapTask: Finished spill 12
12/10/17 21:17:38 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:38 INFO mapred.MapTask: bufstart = 54480358; bufend = 58671642; bufvoid = 99614720
12/10/17 21:17:38 INFO mapred.MapTask: kvstart = 131062; kvend = 65525; length = 327680
12/10/17 21:17:38 INFO mapred.JobClient:  map 94% reduce 0%
12/10/17 21:17:39 INFO mapred.MapTask: Finished spill 13
12/10/17 21:17:39 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:39 INFO mapred.MapTask: bufstart = 58671642; bufend = 62863227; bufvoid = 99614720
12/10/17 21:17:39 INFO mapred.MapTask: kvstart = 65525; kvend = 327669; length = 327680
12/10/17 21:17:40 INFO mapred.MapTask: Finished spill 14
12/10/17 21:17:40 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:40 INFO mapred.MapTask: bufstart = 62863227; bufend = 67054803; bufvoid = 99614720
12/10/17 21:17:40 INFO mapred.MapTask: kvstart = 327669; kvend = 262132; length = 327680
12/10/17 21:17:40 INFO mapred.MapTask: Starting flush of map output
12/10/17 21:17:41 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:134217728+67108864
12/10/17 21:17:41 INFO mapred.MapTask: Finished spill 15
12/10/17 21:17:41 INFO mapred.MapTask: Finished spill 16
12/10/17 21:17:41 INFO mapred.Merger: Merging 17 sorted segments
12/10/17 21:17:41 INFO mapred.Merger: Merging 8 intermediate segments out of a total of 17
12/10/17 21:17:41 INFO mapred.JobClient:  map 100% reduce 0%
12/10/17 21:17:44 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:134217728+67108864
12/10/17 21:17:44 INFO mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 75504227 bytes
12/10/17 21:17:47 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:134217728+67108864
12/10/17 21:17:50 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:134217728+67108864
12/10/17 21:17:51 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000002_0 is done. And is in the process of commiting
12/10/17 21:17:51 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:134217728+67108864
12/10/17 21:17:51 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
12/10/17 21:17:51 INFO mapred.MapTask: numReduceTasks: 1
12/10/17 21:17:51 INFO mapred.MapTask: io.sort.mb = 100
12/10/17 21:17:51 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/17 21:17:51 INFO mapred.MapTask: record buffer = 262144/327680
12/10/17 21:17:51 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:51 INFO mapred.MapTask: bufstart = 0; bufend = 4191578; bufvoid = 99614720
12/10/17 21:17:51 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
12/10/17 21:17:52 INFO mapred.MapTask: Finished spill 0
12/10/17 21:17:53 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:53 INFO mapred.MapTask: bufstart = 4191578; bufend = 8383078; bufvoid = 99614720
12/10/17 21:17:53 INFO mapred.MapTask: kvstart = 262144; kvend = 196607; length = 327680
12/10/17 21:17:53 INFO mapred.MapTask: Finished spill 1
12/10/17 21:17:54 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:54 INFO mapred.MapTask: bufstart = 8383078; bufend = 12574490; bufvoid = 99614720
12/10/17 21:17:54 INFO mapred.MapTask: kvstart = 196607; kvend = 131070; length = 327680
12/10/17 21:17:54 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:201326592+62748839
12/10/17 21:17:54 INFO mapred.JobClient:  map 80% reduce 0%
12/10/17 21:17:54 INFO mapred.MapTask: Finished spill 2
12/10/17 21:17:55 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:55 INFO mapred.MapTask: bufstart = 12574490; bufend = 16766127; bufvoid = 99614720
12/10/17 21:17:55 INFO mapred.MapTask: kvstart = 131070; kvend = 65533; length = 327680
12/10/17 21:17:55 INFO mapred.MapTask: Finished spill 3
12/10/17 21:17:56 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:56 INFO mapred.MapTask: bufstart = 16766127; bufend = 20958217; bufvoid = 99614720
12/10/17 21:17:56 INFO mapred.MapTask: kvstart = 65533; kvend = 327677; length = 327680
12/10/17 21:17:57 INFO mapred.MapTask: Finished spill 4
12/10/17 21:17:57 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:57 INFO mapred.MapTask: bufstart = 20958217; bufend = 25150065; bufvoid = 99614720
12/10/17 21:17:57 INFO mapred.MapTask: kvstart = 327677; kvend = 262140; length = 327680
12/10/17 21:17:57 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:201326592+62748839
12/10/17 21:17:57 INFO mapred.JobClient:  map 85% reduce 0%
12/10/17 21:17:58 INFO mapred.MapTask: Finished spill 5
12/10/17 21:17:58 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:58 INFO mapred.MapTask: bufstart = 25150065; bufend = 29341612; bufvoid = 99614720
12/10/17 21:17:58 INFO mapred.MapTask: kvstart = 262140; kvend = 196603; length = 327680
12/10/17 21:17:59 INFO mapred.MapTask: Finished spill 6
12/10/17 21:17:59 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:17:59 INFO mapred.MapTask: bufstart = 29341612; bufend = 33533322; bufvoid = 99614720
12/10/17 21:17:59 INFO mapred.MapTask: kvstart = 196603; kvend = 131066; length = 327680
12/10/17 21:18:00 INFO mapred.MapTask: Finished spill 7
12/10/17 21:18:00 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:18:00 INFO mapred.MapTask: bufstart = 33533322; bufend = 37725421; bufvoid = 99614720
12/10/17 21:18:00 INFO mapred.MapTask: kvstart = 131066; kvend = 65529; length = 327680
12/10/17 21:18:00 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:201326592+62748839
12/10/17 21:18:00 INFO mapred.JobClient:  map 90% reduce 0%
12/10/17 21:18:01 INFO mapred.MapTask: Finished spill 8
12/10/17 21:18:01 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:18:01 INFO mapred.MapTask: bufstart = 37725421; bufend = 41917251; bufvoid = 99614720
12/10/17 21:18:01 INFO mapred.MapTask: kvstart = 65529; kvend = 327673; length = 327680
12/10/17 21:18:02 INFO mapred.MapTask: Finished spill 9
12/10/17 21:18:02 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:18:02 INFO mapred.MapTask: bufstart = 41917251; bufend = 46108919; bufvoid = 99614720
12/10/17 21:18:02 INFO mapred.MapTask: kvstart = 327673; kvend = 262136; length = 327680
12/10/17 21:18:03 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:201326592+62748839
12/10/17 21:18:03 INFO mapred.JobClient:  map 93% reduce 0%
12/10/17 21:18:03 INFO mapred.MapTask: Finished spill 10
12/10/17 21:18:03 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:18:03 INFO mapred.MapTask: bufstart = 46108919; bufend = 50300567; bufvoid = 99614720
12/10/17 21:18:03 INFO mapred.MapTask: kvstart = 262136; kvend = 196599; length = 327680
12/10/17 21:18:04 INFO mapred.MapTask: Finished spill 11
12/10/17 21:18:05 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:18:05 INFO mapred.MapTask: bufstart = 50300567; bufend = 54492159; bufvoid = 99614720
12/10/17 21:18:05 INFO mapred.MapTask: kvstart = 196599; kvend = 131062; length = 327680
12/10/17 21:18:06 INFO mapred.MapTask: Finished spill 12
12/10/17 21:18:06 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:201326592+62748839
12/10/17 21:18:06 INFO mapred.JobClient:  map 97% reduce 0%
12/10/17 21:18:06 INFO mapred.MapTask: Spilling map output: record full = true
12/10/17 21:18:06 INFO mapred.MapTask: bufstart = 54492159; bufend = 58684086; bufvoid = 99614720
12/10/17 21:18:06 INFO mapred.MapTask: kvstart = 131062; kvend = 65525; length = 327680
12/10/17 21:18:07 INFO mapred.MapTask: Finished spill 13
12/10/17 21:18:07 INFO mapred.MapTask: Starting flush of map output
12/10/17 21:18:08 INFO mapred.MapTask: Finished spill 14
12/10/17 21:18:08 INFO mapred.Merger: Merging 15 sorted segments
12/10/17 21:18:08 INFO mapred.Merger: Merging 6 intermediate segments out of a total of 15
12/10/17 21:18:09 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:201326592+62748839
12/10/17 21:18:09 INFO mapred.JobClient:  map 100% reduce 0%
12/10/17 21:18:11 INFO mapred.Merger: Down to the last merge-pass, with 10 segments left of total size: 70597223 bytes
12/10/17 21:18:12 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:201326592+62748839
12/10/17 21:18:15 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:201326592+62748839
12/10/17 21:18:17 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000003_0 is done. And is in the process of commiting
12/10/17 21:18:17 INFO mapred.LocalJobRunner: hdfs://localhost:9100/user/tanglg1987/input/cite75_99.txt:201326592+62748839
12/10/17 21:18:17 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000003_0' done.
12/10/17 21:18:17 INFO mapred.LocalJobRunner:
12/10/17 21:18:17 INFO mapred.Merger: Merging 4 sorted segments
12/10/17 21:18:17 INFO mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 297120317 bytes
12/10/17 21:18:17 INFO mapred.LocalJobRunner:
12/10/17 21:18:23 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:24 INFO mapred.JobClient:  map 100% reduce 71%
12/10/17 21:18:26 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:27 INFO mapred.JobClient:  map 100% reduce 74%
12/10/17 21:18:29 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:30 INFO mapred.JobClient:  map 100% reduce 77%
12/10/17 21:18:32 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:33 INFO mapred.JobClient:  map 100% reduce 80%
12/10/17 21:18:35 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:36 INFO mapred.JobClient:  map 100% reduce 83%
12/10/17 21:18:38 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:39 INFO mapred.JobClient:  map 100% reduce 86%
12/10/17 21:18:41 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:42 INFO mapred.JobClient:  map 100% reduce 89%
12/10/17 21:18:44 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:45 INFO mapred.JobClient:  map 100% reduce 91%
12/10/17 21:18:47 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:48 INFO mapred.JobClient:  map 100% reduce 94%
12/10/17 21:18:50 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:51 INFO mapred.JobClient:  map 100% reduce 97%
12/10/17 21:18:53 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/10/17 21:18:53 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:53 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/10/17 21:18:53 INFO mapred.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://localhost:9100/user/tanglg1987/output
12/10/17 21:18:53 INFO mapred.LocalJobRunner: reduce > reduce
12/10/17 21:18:53 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
12/10/17 21:18:54 INFO mapred.JobClient:  map 100% reduce 100%
12/10/17 21:18:54 INFO mapred.JobClient: Job complete: job_local_0001
12/10/17 21:18:54 INFO mapred.JobClient: Counters: 15
12/10/17 21:18:54 INFO mapred.JobClient:   FileSystemCounters
12/10/17 21:18:54 INFO mapred.JobClient:     FILE_BYTES_READ=1853504370
12/10/17 21:18:54 INFO mapred.JobClient:     HDFS_BYTES_READ=930853207
12/10/17 21:18:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=2603767487
12/10/17 21:18:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=158078539
12/10/17 21:18:54 INFO mapred.JobClient:   Map-Reduce Framework
12/10/17 21:18:54 INFO mapred.JobClient:     Reduce input groups=3258984
12/10/17 21:18:54 INFO mapred.JobClient:     Combine output records=0
12/10/17 21:18:54 INFO mapred.JobClient:     Map input records=16522439
12/10/17 21:18:54 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/10/17 21:18:54 INFO mapred.JobClient:     Reduce output records=3258984
12/10/17 21:18:54 INFO mapred.JobClient:     Spilled Records=57431615
12/10/17 21:18:54 INFO mapred.JobClient:     Map output bytes=264075431
12/10/17 21:18:54 INFO mapred.JobClient:     Map input bytes=264075431
12/10/17 21:18:54 INFO mapred.JobClient:     Combine input records=0
12/10/17 21:18:54 INFO mapred.JobClient:     Map output records=16522439
12/10/17 21:18:54 INFO mapred.JobClient:     Reduce input records=16522439
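
The counters at the end of the log can be read as a quick sanity check on the job. "Reduce input groups" is the number of distinct keys that reached the reducer, and "Reduce input records" is the number of values, so their ratio is the average number of values per key. The following is only a minimal sketch in plain Java (no Hadoop dependency). The class name `CounterSummary` and its helper methods are hypothetical and are not part of the original program. It assumes the log lines have the `name=value` shape shown above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "name=value" counters from JobClient log lines.
public class CounterSummary {

    // Matches lines such as "... INFO mapred.JobClient:     Map input records=16522439".
    private static final Pattern COUNTER =
            Pattern.compile("INFO mapred\\.JobClient:\\s+(.+?)=(\\d+)\\s*$");

    // Returns the counters in the order they appear; non-counter lines are skipped.
    static Map<String, Long> parseCounters(List<String> logLines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) {
                counters.put(m.group(1), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    // Average number of reduce input records per distinct key.
    static double averagePerGroup(Map<String, Long> counters) {
        long records = counters.get("Reduce input records");
        long groups = counters.get("Reduce input groups");
        return (double) records / groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "12/10/17 21:18:54 INFO mapred.JobClient:     Reduce input groups=3258984",
                "12/10/17 21:18:54 INFO mapred.JobClient:     Map input records=16522439",
                "12/10/17 21:18:54 INFO mapred.JobClient:     Reduce input records=16522439");
        Map<String, Long> counters = parseCounters(lines);
        System.out.printf("average records per key: %.2f%n", averagePerGroup(counters));
    }
}
```

For this run the ratio is 16522439 / 3258984, roughly 5.07 records per key. If the job keys its map output by the cited patent (an assumption, since the counters alone do not say what the key is), this would mean each cited patent is referenced about five times on average.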
Step 7: View the result set. The run produces the following output:
