Building an Offline Analytics System with Hadoop + Hive: A Quick Setup Walkthrough
I recently got a requirement to consolidate the data of all our shops into an offline analytics system. Until now the data had been sharded by shop — one set of databases and tables per merchant — and each merchant got a multi-dimensional Highcharts view of their own shop's performance. That shop-level partitioning suits the current online business very well. But now the boss, himself a former data analyst whose SQL is fluent, wants the data presented from the buyer's perspective instead: build user profiles to drive better data push and user-behavior analysis. Since this is offline analysis, I haven't looked into Spark, Impala, or Drill yet.
Part 1: Setting up the Hadoop cluster
Setting up Hadoop is a fairly tedious process. I used three CentOS machines: one master (192.168.23.196) and two slaves (192.168.23.150 and 192.168.23.146). Enough talk — the steps below walk through it.
Part 2: Basic configuration
1. Turn off the firewall
[root@localhost ~]# systemctl stop firewalld.service      # stop the firewall
[root@localhost ~]# systemctl disable firewalld.service   # don't start it on boot
[root@localhost ~]# firewall-cmd --state                  # check the firewall status
not running
[root@localhost ~]#
2. Configure passwordless SSH login
Whether starting or stopping Hadoop, the cluster scripts drive the other nodes over SSH, so passwordless SSH login has to be configured. The approach is to copy one machine's public key into the authorized_keys file of the other machines.
<1> Generate the key pair on 196. As the output below shows, ssh-keygen produces two files, id_rsa and id_rsa.pub; the one we care about here is the public key, id_rsa.pub.
[root@localhost ~]# ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
40:72:cc:f4:c3:e7:15:c9:9f:ee:f8:48:ec:22:be:a1 root@localhost.localdomain
The key's randomart image is:
+--[ RSA 2048]----+
|       .++ ...   |
|      +oo o.     |
|  . + . .. .     |
|   . + . o       |
|    S . .        |
|     . .         |
|      . oo       |
|   ....o...      |
|   E.oo .o..     |
+-----------------+
[root@localhost ~]# ls /root/.ssh/id_rsa
/root/.ssh/id_rsa
[root@localhost ~]# ls /root/.ssh
id_rsa  id_rsa.pub
<2> Use scp to copy the public key to the 146 and 150 hosts, and append id_rsa.pub to the local authorized_keys as well. (Note that scp-ing straight onto authorized_keys overwrites any keys already there; ssh-copy-id appends instead.)
[root@master ~]# scp /root/.ssh/id_rsa.pub root@192.168.23.146:/root/.ssh/authorized_keys
root@192.168.23.146's password:
id_rsa.pub                          100%  408     0.4KB/s   00:00
[root@master ~]# scp /root/.ssh/id_rsa.pub root@192.168.23.150:/root/.ssh/authorized_keys
root@192.168.23.150's password:
id_rsa.pub                          100%  408     0.4KB/s   00:00
[root@master ~]# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
<3> Add host mappings, mainly to give the machines aliases for easier management.
[root@master ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.23.196 master
192.168.23.150 slave1
192.168.23.146 slave2
[root@master ~]#
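The alias mapping can be sanity-checked from a script. The helper below is purely illustrative — it is not part of Hadoop or of this setup — and simply prints the IP that a hosts-style file maps a given alias to:

```shell
# resolve_host — hypothetical helper: print the IP address that an
# /etc/hosts-style file ($1) maps a given alias ($2) to.
# Skips comment lines; an alias may appear in any column after the IP.
resolve_host() {
  awk -v alias="$2" '
    $0 !~ /^[[:space:]]*#/ {
      for (i = 2; i <= NF; i++)
        if ($i == alias) print $1
    }' "$1"
}
```

Running `resolve_host /etc/hosts slave1` on each node is a quick way to confirm every machine agrees on the aliases.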
<4> Install the Java environment
Hadoop is written in Java, so a Java environment is required. You can search online for the detailed steps: first remove the OpenJDK that ships with CentOS, install a JDK, and finally add the environment variables to /etc/profile.
[root@master ~]# cat /etc/profile
# /etc/profile

# System wide environment and startup programs, for login setup
# Functions and aliases go in /etc/bashrc

# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

pathmunge () {
    case ":${PATH}:" in
        *:"$1":*)
            ;;
        *)
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
    esac
}

if [ -x /usr/bin/id ]; then
    if [ -z "$EUID" ]; then
        # ksh workaround
        EUID=`id -u`
        UID=`id -ru`
    fi
    USER="`id -un`"
    LOGNAME=$USER
    MAIL="/var/spool/mail/$USER"
fi

# Path manipulation
if [ "$EUID" = "0" ]; then
    pathmunge /usr/sbin
    pathmunge /usr/local/sbin
else
    pathmunge /usr/local/sbin after
    pathmunge /usr/sbin after
fi

HOSTNAME=`/usr/bin/hostname 2>/dev/null`
HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL

# By default, we want umask to get set. This sets it for login shell
# Current threshold for system reserved uid/gids is 200
# You could check uidgid reservation validity in
# /usr/share/doc/setup-*/uidgid file
if [ $UID -gt 199 ] && [ "`id -gn`" = "`id -un`" ]; then
    umask 002
else
    umask 022
fi

for i in /etc/profile.d/*.sh ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done

unset i
unset -f pathmunge

export JAVA_HOME=/usr/big/jdk1.8
export HADOOP_HOME=/usr/big/hadoop
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
[root@master ~]#
Part 3: Installing the Hadoop package
1. You can find the download links on the official site: http://hadoop.apache.org/releases.html. I picked what was then the latest release, 2.9.0, as a binary install.
2. After that it's just a sequence of commands (watch the directories — mkdir anything that's missing). Note that the tarball extracts to hadoop-2.9.0; rename it to /usr/big/hadoop so it matches the paths used below.
[root@localhost big]# pwd
/usr/big
[root@localhost big]# ls
hadoop-2.9.0  hadoop-2.9.0.tar.gz
[root@localhost big]# tar -xvzf hadoop-2.9.0.tar.gz
3. Configure core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves, and hadoop-env.sh, all under the etc/hadoop directory — this is the most tedious part.
[root@master hadoop]# pwd
/usr/big/hadoop/etc/hadoop
[root@master hadoop]# ls
capacity-scheduler.xml      hadoop-policy.xml        kms-log4j.properties        slaves
configuration.xsl           hdfs-site.xml            kms-site.xml                ssl-client.xml.example
container-executor.cfg      httpfs-env.sh            log4j.properties            ssl-server.xml.example
core-site.xml               httpfs-log4j.properties  mapred-env.cmd              yarn-env.cmd
hadoop-env.cmd              httpfs-signature.secret  mapred-env.sh               yarn-env.sh
hadoop-env.sh               httpfs-site.xml          mapred-queues.xml.template  yarn-site.xml
hadoop-metrics2.properties  kms-acls.xml             mapred-site.xml
hadoop-metrics.properties   kms-env.sh               mapred-site.xml.template
[root@master hadoop]#
<1> In core-site.xml I set Hadoop's base directory for temporary files, plus the NameNode's address and port.
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/myapp/hadoop/data</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
<2> hdfs-site.xml configures HDFS storage; here I set the block replication factor.
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
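A later step creates /usr/hadoop/namenode and /usr/hadoop/datanode, but none of the configs shown here actually point HDFS at them — by default HDFS stores its data under hadoop.tmp.dir. If you want HDFS to use those directories explicitly, hdfs-site.xml can name them; this is a sketch under that assumption, not part of the original setup:

```xml
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/hadoop/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/hadoop/datanode</value>
</property>
```

Both properties go inside the existing <configuration> element of hdfs-site.xml on every node.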
<3> mapred-site.xml — tell MapReduce to run on the YARN framework.
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
<4> yarn-site.xml configuration
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
</configuration>
<5> In the slaves file under etc/hadoop, add the slave1 and slave2 aliases we put in /etc/hosts; this is how Hadoop knows where the slaves are at startup.
[root@master hadoop]# cat slaves
slave1
slave2
[root@master hadoop]# pwd
/usr/big/hadoop/etc/hadoop
[root@master hadoop]#
<6> Set the Java path in hadoop-env.sh — essentially copy the JAVA_HOME line from /etc/profile and append it to the end of the file.
[root@master hadoop]# vim hadoop-env.sh
export JAVA_HOME=/usr/big/jdk1.8
There is one more pitfall here: Hadoop's default client heap size is 512 MB, which easily leads to out-of-memory errors when crunching larger data sets, so change 512 to 2048 in hadoop-env.sh.
export HADOOP_NFS3_OPTS="$HADOOP_NFS3_OPTS"
export HADOOP_PORTMAP_OPTS="-Xmx2048m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS"
# set heap args when HADOOP_HEAPSIZE is empty
if [ "$HADOOP_HEAPSIZE" = "" ]; then
  export HADOOP_CLIENT_OPTS="-Xmx2048m $HADOOP_CLIENT_OPTS"
fi
4. Don't forget to create these directories under /usr, and then add the Hadoop paths to /etc/profile.
/usr/hadoop
/usr/hadoop/namenode
/usr/hadoop/datanode
export JAVA_HOME=/usr/big/jdk1.8
export HADOOP_HOME=/usr/big/hadoop
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
5. Use scp to push the fully configured hadoop directory from 196 into /usr/big on the 146 and 150 servers. Later on you could also keep the hadoop directory under svn, which makes distributing configuration changes more convenient.
scp -r /usr/big/hadoop root@192.168.23.146:/usr/big
scp -r /usr/big/hadoop root@192.168.23.150:/usr/big
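With more slaves, those scp lines can be generated instead of typed. The function below is a hypothetical convenience helper, not part of the original setup: it prints one scp command per host, copying the directory into the same parent path on the remote side; piping its output to sh would actually run the copies.

```shell
# sync_dir_cmds — hypothetical helper: emit the scp command needed to copy
# a local directory ($1) to each of the remaining argument hosts, placing
# it under the same parent directory remotely.
sync_dir_cmds() {
  dir="$1"; shift
  for h in "$@"; do
    printf 'scp -r %s root@%s:%s\n' "$dir" "$h" "$(dirname "$dir")"
  done
}
```

For example, `sync_dir_cmds /usr/big/hadoop 192.168.23.146 192.168.23.150 | sh` reproduces the two commands above.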
Part 4: Starting Hadoop
1. Before starting for the first time, format the HDFS filesystem with hadoop namenode -format (as the output notes, the command is deprecated in favor of hdfs namenode -format).
[root@master hadoop]# hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

17/11/24 20:13:19 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.23.196
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.9.0
2. On the master, run start-all.sh to bring up the whole Hadoop cluster.
[root@master hadoop]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
root@master's password:
master: starting namenode, logging to /usr/big/hadoop/logs/hadoop-root-namenode-master.out
slave1: starting datanode, logging to /usr/big/hadoop/logs/hadoop-root-datanode-slave1.out
slave2: starting datanode, logging to /usr/big/hadoop/logs/hadoop-root-datanode-slave2.out
Starting secondary namenodes [0.0.0.0]
root@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /usr/big/hadoop/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/big/hadoop/logs/yarn-root-resourcemanager-master.out
slave1: starting nodemanager, logging to /usr/big/hadoop/logs/yarn-root-nodemanager-slave1.out
slave2: starting nodemanager, logging to /usr/big/hadoop/logs/yarn-root-nodemanager-slave2.out
[root@master hadoop]# jps
8851 NameNode
9395 ResourceManager
9655 Jps
9146 SecondaryNameNode
[root@master hadoop]#
jps shows that the master is now running the NameNode and ResourceManager (plus the SecondaryNameNode). Next, check slave1 and slave2 to confirm that the NodeManager and DataNode processes are up as well.
[root@slave1 hadoop]# jps
7112 NodeManager
7354 Jps
6892 DataNode
[root@slave1 hadoop]#

[root@slave2 hadoop]# jps
7553 NodeManager
7803 Jps
7340 DataNode
[root@slave2 hadoop]#
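Rather than eyeballing jps output on every node, the expected daemons can be checked mechanically. The function below is a hypothetical helper, not part of Hadoop: it reads jps output on stdin and reports any of the expected daemon names that are missing.

```shell
# check_daemons — hypothetical helper: read jps output on stdin and check
# that every daemon name given as an argument appears in it.
# Prints "OK" when all are present, otherwise "MISSING: <names>".
check_daemons() {
  out=$(cat)          # capture the jps output
  missing=""
  for d in "$@"; do
    echo "$out" | grep -qw "$d" || missing="$missing $d"
  done
  if [ -z "$missing" ]; then
    echo "OK"
  else
    echo "MISSING:$missing"
  fi
}
```

Usage on a slave would be `jps | check_daemons NodeManager DataNode`, and on the master `jps | check_daemons NameNode SecondaryNameNode ResourceManager`.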
Part 5: Setup complete — verifying the result
Running netstat -tlnp below shows that ports 50070 and 8088 are open: 50070 is the HDFS (NameNode) web UI, where you can inspect the DataNodes, and 8088 is the YARN ResourceManager UI, where you can watch MapReduce jobs.
[root@master hadoop]# netstat -tlnp
Part 6: Closing out the walkthrough with Hadoop's built-in wordcount
Hadoop's share directory ships with a wordcount test program that counts how many times each word occurs: hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar.
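Before running the distributed job, it may help to see wordcount's logic in miniature. The sketch below is a rough local analogue built from standard Unix tools — not what the example jar actually executes: tokenize the input on whitespace (the map side), then count occurrences of each token (the reduce side).

```shell
# wordcount — a minimal local analogue of the WordCount example's logic:
# split stdin into one token per line, then count occurrences per token,
# most frequent first. The real job does the same counting, but as
# distributed map and reduce phases over HDFS blocks.
wordcount() {
  tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c | sort -rn
}
```

For instance, `printf 'a b a\nc b a\n' | wordcount` reports token `a` three times, `b` twice, and `c` once.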
1. In /usr/soft I generated a 39 MB file, 2.txt, with a small program (it's all random Chinese characters).
[root@master soft]# ls -lsh 2.txt
39M -rw-r--r--. 1 root root 39M Nov 24 00:32 2.txt
[root@master soft]#
2. Create an input directory in HDFS, then upload 2.txt to it.
[root@master soft]# hadoop fs -mkdir /input
[root@master soft]# hadoop fs -put /usr/soft/2.txt /input
[root@master soft]# hadoop fs -ls /
Found 1 items
drwxr-xr-x   - root supergroup          0 2017-11-24 20:30 /input
3. Run the wordcount MapReduce job.
[root@master soft]# hadoop jar /usr/big/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar wordcount /input/2.txt /output/v1
17/11/24 20:32:21 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/11/24 20:32:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/11/24 20:32:21 INFO input.FileInputFormat: Total input files to process : 1
17/11/24 20:32:21 INFO mapreduce.JobSubmitter: number of splits:1
17/11/24 20:32:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1430356259_0001
17/11/24 20:32:22 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/11/24 20:32:22 INFO mapreduce.Job: Running job: job_local1430356259_0001
17/11/24 20:32:22 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/11/24 20:32:22 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/11/24 20:32:22 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/11/24 20:32:22 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/11/24 20:32:22 INFO mapred.LocalJobRunner: Waiting for map tasks
17/11/24 20:32:22 INFO mapred.LocalJobRunner: Starting task: attempt_local1430356259_0001_m_000000_0
17/11/24 20:32:22 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/11/24 20:32:22 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/11/24 20:32:22 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/11/24 20:32:22 INFO mapred.MapTask: Processing split: hdfs://192.168.23.196:9000/input/2.txt:0+40000002
17/11/24 20:32:22 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/11/24 20:32:22 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/11/24 20:32:22 INFO mapred.MapTask: soft limit at 83886080
17/11/24 20:32:22 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/11/24 20:32:22 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/11/24 20:32:22 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/11/24 20:32:23 INFO mapreduce.Job: Job job_local1430356259_0001 running in uber mode : false
17/11/24 20:32:23 INFO mapreduce.Job:  map 0% reduce 0%
17/11/24 20:32:23 INFO input.LineRecordReader: Found UTF-8 BOM and skipped it
17/11/24 20:32:27 INFO mapred.MapTask: Spilling map output
17/11/24 20:32:27 INFO mapred.MapTask: bufstart = 0; bufend = 27962024; bufvoid = 104857600
17/11/24 20:32:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 12233388(48933552); length = 13981009/6553600
17/11/24 20:32:27 INFO mapred.MapTask: (EQUATOR) 38447780 kvi 9611940(38447760)
17/11/24 20:32:32 INFO mapred.MapTask: Finished spill 0
17/11/24 20:32:32 INFO mapred.MapTask: (RESET) equator 38447780 kv 9611940(38447760) kvi 6990512(27962048)
17/11/24 20:32:33 INFO mapred.MapTask: Spilling map output
17/11/24 20:32:33 INFO mapred.MapTask: bufstart = 38447780; bufend = 66409804; bufvoid = 104857600
17/11/24 20:32:33 INFO mapred.MapTask: kvstart = 9611940(38447760); kvend = 21845332(87381328); length = 13981009/6553600
17/11/24 20:32:33 INFO mapred.MapTask: (EQUATOR) 76895558 kvi 19223884(76895536)
17/11/24 20:32:34 INFO mapred.LocalJobRunner: map > map
17/11/24 20:32:34 INFO mapreduce.Job:  map 67% reduce 0%
17/11/24 20:32:38 INFO mapred.MapTask: Finished spill 1
17/11/24 20:32:38 INFO mapred.MapTask: (RESET) equator 76895558 kv 19223884(76895536) kvi 16602456(66409824)
17/11/24 20:32:39 INFO mapred.LocalJobRunner: map > map
17/11/24 20:32:39 INFO mapred.MapTask: Starting flush of map output
17/11/24 20:32:39 INFO mapred.MapTask: Spilling map output
17/11/24 20:32:39 INFO mapred.MapTask: bufstart = 76895558; bufend = 100971510; bufvoid = 104857600
17/11/24 20:32:39 INFO mapred.MapTask: kvstart = 19223884(76895536); kvend = 7185912(28743648); length = 12037973/6553600
17/11/24 20:32:40 INFO mapred.LocalJobRunner: map > sort
17/11/24 20:32:43 INFO mapred.MapTask: Finished spill 2
17/11/24 20:32:43 INFO mapred.Merger: Merging 3 sorted segments
17/11/24 20:32:43 INFO mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 180000 bytes
17/11/24 20:32:43 INFO mapred.Task: Task:attempt_local1430356259_0001_m_000000_0 is done. And is in the process of committing
17/11/24 20:32:43 INFO mapred.LocalJobRunner: map > sort
17/11/24 20:32:43 INFO mapred.Task: Task 'attempt_local1430356259_0001_m_000000_0' done.
17/11/24 20:32:43 INFO mapred.LocalJobRunner: Finishing task: attempt_local1430356259_0001_m_000000_0
17/11/24 20:32:43 INFO mapred.LocalJobRunner: map task executor complete.
17/11/24 20:32:43 INFO mapred.LocalJobRunner: Waiting for reduce tasks
17/11/24 20:32:43 INFO mapred.LocalJobRunner: Starting task: attempt_local1430356259_0001_r_000000_0
17/11/24 20:32:43 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/11/24 20:32:43 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/11/24 20:32:43 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/11/24 20:32:43 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@f8eab6f
17/11/24 20:32:43 INFO mapreduce.Job:  map 100% reduce 0%
17/11/24 20:32:43 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=1336252800, maxSingleShuffleLimit=334063200, mergeThreshold=881926912, ioSortFactor=10, memToMemMergeOutputsThreshold=10
17/11/24 20:32:43 INFO reduce.EventFetcher: attempt_local1430356259_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
17/11/24 20:32:43 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1430356259_0001_m_000000_0 decomp: 60002 len: 60006 to MEMORY
17/11/24 20:32:43 INFO reduce.InMemoryMapOutput: Read 60002 bytes from map-output for attempt_local1430356259_0001_m_000000_0
17/11/24 20:32:43 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 60002, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->60002
17/11/24 20:32:43 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
17/11/24 20:32:43 INFO mapred.LocalJobRunner: 1 / 1 copied.
17/11/24 20:32:43 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
17/11/24 20:32:43 INFO mapred.Merger: Merging 1 sorted segments
17/11/24 20:32:43 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 59996 bytes
17/11/24 20:32:43 INFO reduce.MergeManagerImpl: Merged 1 segments, 60002 bytes to disk to satisfy reduce memory limit
17/11/24 20:32:43 INFO reduce.MergeManagerImpl: Merging 1 files, 60006 bytes from disk
17/11/24 20:32:43 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
17/11/24 20:32:43 INFO mapred.Merger: Merging 1 sorted segments
17/11/24 20:32:43 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 59996 bytes
17/11/24 20:32:43 INFO mapred.LocalJobRunner: 1 / 1 copied.
17/11/24 20:32:43 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
17/11/24 20:32:44 INFO mapred.Task: Task:attempt_local1430356259_0001_r_000000_0 is done. And is in the process of committing
17/11/24 20:32:44 INFO mapred.LocalJobRunner: 1 / 1 copied.
17/11/24 20:32:44 INFO mapred.Task: Task attempt_local1430356259_0001_r_000000_0 is allowed to commit now
17/11/24 20:32:44 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1430356259_0001_r_000000_0' to hdfs://192.168.23.196:9000/output/v1/_temporary/0/task_local1430356259_0001_r_000000
17/11/24 20:32:44 INFO mapred.LocalJobRunner: reduce > reduce
17/11/24 20:32:44 INFO mapred.Task: Task 'attempt_local1430356259_0001_r_000000_0' done.
17/11/24 20:32:44 INFO mapred.LocalJobRunner: Finishing task: attempt_local1430356259_0001_r_000000_0
17/11/24 20:32:44 INFO mapred.LocalJobRunner: reduce task executor complete.
17/11/24 20:32:44 INFO mapreduce.Job:  map 100% reduce 100%
17/11/24 20:32:44 INFO mapreduce.Job: Job job_local1430356259_0001 completed successfully
17/11/24 20:32:44 INFO mapreduce.Job: Counters: 35
    File System Counters
        FILE: Number of bytes read=1087044
        FILE: Number of bytes written=2084932
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=80000004
        HDFS: Number of bytes written=54000
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=1
        Map output records=10000000
        Map output bytes=80000000
        Map output materialized bytes=60006
        Input split bytes=103
        Combine input records=10018000
        Combine output records=24000
        Reduce input groups=6000
        Reduce shuffle bytes=60006
        Reduce input records=6000
        Reduce output records=6000
        Spilled Records=30000
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=1770
        Total committed heap usage (bytes)=1776287744
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=40000002
    File Output Format Counters
        Bytes Written=54000
4. Finally, look at the result under /output/v1. There are too many characters to show them all, so only the tail of the output appears here.
[root@master soft]# hadoop fs -ls /output/v1
Found 2 items
-rw-r--r--   2 root supergroup          0 2017-11-24 20:32 /output/v1/_SUCCESS
-rw-r--r--   2 root supergroup      54000 2017-11-24 20:32 /output/v1/part-r-00000
[root@master soft]# hadoop fs -ls /output/v1/part-r-00000
-rw-r--r--   2 root supergroup      54000 2017-11-24 20:32 /output/v1/part-r-00000
[root@master soft]# hadoop fs -tail /output/v1/part-r-00000
1609
攟	1685
攠	1636
攡	1682
攢	1657
攣	1685
攤	1611
攥	1724
攦	1732
攧	1657
攨	1767
攩	1768
攪	1624
And that's it. The setup process really is tedious; I'll cover the Hive setup in a later post. I hope this one helps.
Reposted from: https://www.cnblogs.com/huangxincheng/p/7895019.html