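1. Introduction

HiBench is an open-source benchmarking tool for Hadoop clusters. It supports compute frameworks such as MapReduce, Hive, and Spark, and it can test a cluster along several dimensions. I have used it since HiBench 5.0 and found it works well with Apache Hadoop, CDH, and HDP alike.

I recently noticed that HiBench 7.x has been released, so I am taking the opportunity to summarize the problems I ran into with the new version, in the hope of helping others avoid the same pitfalls.

HiBench on GitHub: https://github.com/intel-hadoop/HiBench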

Environment:

OS: CentOS 7.3

Hadoop: HDP 2.6.5.x

HiBench: 7.1

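2. Installation

How to install:

Build with Maven. For details, see https://github.com/intel-hadoop/HiBench/blob/master/docs/build-hibench.md

Problems encountered:

Maven build failure:

Symptom:

The Maven build fails with the following error (mahout is used as the example):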

[INFO] --- download-maven-plugin:1.2.0:wget (default) @ mahout ---
Downloading: http://archive.apache.org/dist/mahout/0.11.0//apache-mahout-distribution-0.11.0.tar.gz
org.apache.maven.wagon.TransferFailedException: Failed to transfer file: http://archive.apache.org/dist/mahout/0.11.0/apache-ma0.11.0.tar.gz. Return code is: 503 , ReasonPhrase:Service Unavailable.
    at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:1023)
    at org.apache.maven.wagon.providers.http.AbstractHttpClientWagon.fillInputData(AbstractHttpClientWagon.java:962)
    at org.apache.maven.wagon.StreamWagon.getInputStream(StreamWagon.java:126)
    at org.apache.maven.wagon.StreamWagon.getIfNewer(StreamWagon.java:88)
    at org.apache.maven.wagon.StreamWagon.get(StreamWagon.java:61)
    at com.googlecode.WGet.doGet(WGet.java:293)
    at com.googlecode.WGet.execute(WGet.java:223)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:993)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:345)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:191)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[WARNING] Could not get content
[WARNING] Retrying (1 more)

The same exception and stack trace are printed again for every retry.

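I hit similar problems when building "hadoopbench-sql", "mahout", and "nutchindexing".

Cause:

The archive site changed its access protocol, so the http:// download URLs configured in the POM files no longer work.

Fix:

Option 1. Change "http" to "https" in pom.xml, for example: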

<properties>
  <repo>https://archive.apache.org</repo>
  <file>dist/hive/hive-0.14.0/apache-hive-0.14.0-bin.tar.gz</file>
</properties>

Option 2. Download the required files from "https://archive.apache.org/dist" to a local HTTP server, then change the file location in pom.xml accordingly, for example:

<properties>
  <repo>http://private.repo.io/archive.apache.org</repo>
  <file>dist/hive/hive-0.14.0/apache-hive-0.14.0-bin.tar.gz</file>
</properties>

Option 1 is the simplest. If your connection to the archive is slow, use option 2.
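Option 1 can be applied to every affected module at once. The following is a minimal sketch, not part of HiBench: it assumes the affected POM files contain a `<repo>http://archive.apache.org</repo>` property like the one shown above, and the function names `use_https_repo` and `patch_tree` are made up for this example.

```python
import re
from pathlib import Path

# Matches only a <repo> value that points at archive.apache.org over plain HTTP.
REPO_PATTERN = re.compile(r"(<repo>\s*)http://(archive\.apache\.org)")


def use_https_repo(pom_text: str) -> str:
    """Return pom_text with http://archive.apache.org repos switched to https."""
    return REPO_PATTERN.sub(r"\1https://\2", pom_text)


def patch_tree(root: str) -> list:
    """Rewrite every pom.xml under root in place and return the changed paths."""
    changed = []
    for pom in Path(root).rglob("pom.xml"):
        original = pom.read_text(encoding="utf-8")
        patched = use_https_repo(original)
        if patched != original:
            pom.write_text(patched, encoding="utf-8")
            changed.append(str(pom))
    return changed
```

Calling `patch_tree(".")` from the HiBench source root would rewrite the matching files. Back up the tree or check `git diff` afterwards, since the files are modified in place.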

3. Usage

How to run:

For the individual workloads, see the "run-*.md" files under https://github.com/intel-hadoop/HiBench/blob/master/docs/

Problems encountered:

===================================================================

Permission problem:

Symptom:

Running a HiBench command fails with:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/var/tmp":hdfs:hdfs:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:353)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:325)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:246)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1950)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1934)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1917)
    at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4185)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1109)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:645)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
    at org.apache.hadoop.ipc.Client.call(Client.java:1498)
    at org.apache.hadoop.ipc.Client.call(Client.java:1398)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:610)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:290)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:202)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:184)
    at com.sun.proxy.$Proxy15.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3099)
    ... 21 more

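Cause:

HiBench needs to create directories in HDFS while it runs. If the tests are run as root and HDFS uses simple permission mode (where hdfs is the superuser), root is denied write access to those locations.

Fix:

Create the following directories in advance and adjust their permissions (the blunt approach is to give them 777; otherwise make sure the test user has write access to each of them):

Note: some of these directories can be changed in conf/hibench.conf. The list below assumes the default configuration.

/HiBench
/var
/user/root # root is the user I ran the tests as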
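The directory preparation can be scripted. The sketch below only builds and prints the commands so they can be reviewed before being run; it assumes the HDFS superuser is hdfs and is reachable through sudo, and it uses the blunt 777 mode as its default. The helper name `hdfs_prepare_commands` is made up for this example.

```python
import shlex

# Default directories from the list above; adjust if conf/hibench.conf overrides them.
HIBENCH_DIRS = ["/HiBench", "/var", "/user/root"]


def hdfs_prepare_commands(dirs, mode="777", superuser="hdfs"):
    """Build argv lists that create each HDFS directory and set its mode.

    The commands run as `superuser` through sudo, because in the setup
    described here the parent directories belong to the HDFS superuser.
    """
    prefix = ["sudo", "-u", superuser, "hadoop", "fs"]
    commands = []
    for path in dirs:
        commands.append(prefix + ["-mkdir", "-p", path])
        commands.append(prefix + ["-chmod", mode, path])
    return commands


# Print the commands for review instead of executing them.
for argv in hdfs_prepare_commands(HIBENCH_DIRS):
    print(shlex.join(argv))
```

Passing a stricter mode, for example `mode="775"`, is possible as long as the test user still ends up with write access.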

===================================================================

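Spark benchmark logs do not show up in the job history server:

Symptom:

Run a Spark workload with HiBench's default command, for example:

bin/workloads/sql/scan/spark/run.sh

The logs of that Spark job do not appear in the job history server.

Cause:

By default, HiBench jobs use a spark.conf that HiBench generates itself rather than the spark-defaults.conf of the actual environment.

Fix:

First, look up the following three parameters, which control event logging, in the environment's spark-defaults.conf: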

spark.eventLog.enabled
spark.yarn.historyServer.address
spark.eventLog.dir

Turn the three parameters into --conf options for spark-submit:

--conf "spark.eventLog.enabled=true" --conf "spark.yarn.historyServer.address=node1.com:18081" --conf "spark.eventLog.dir=hdfs:///spark2-history/" 
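The option string can also be derived from the file rather than typed by hand. This is a small sketch under the assumption that spark-defaults.conf separates each key from its value with whitespace; the function name `conf_flags` is made up for this example.

```python
import shlex

# The three event-log settings named above.
EVENT_LOG_KEYS = (
    "spark.eventLog.enabled",
    "spark.yarn.historyServer.address",
    "spark.eventLog.dir",
)


def conf_flags(defaults_text, keys=EVENT_LOG_KEYS):
    """Return spark-submit --conf options for the given keys.

    defaults_text is the content of spark-defaults.conf. Blank lines,
    comments, and keys that are not requested are ignored.
    """
    found = {}
    for line in defaults_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: key and value are separated by whitespace.
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in keys:
            found[parts[0]] = parts[1].strip()
    return " ".join(
        "--conf " + shlex.quote(f"{key}={found[key]}")
        for key in keys
        if key in found
    )
```

For example, `conf_flags(open("/path/to/spark-defaults.conf").read())` returns one string that can be pasted into the submit command. The path is a placeholder; use the location of the file in your environment.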

Edit HiBench's "bin/functions/workload_functions.sh" and add the options above to the submit command, at around line 216 (back up the file first):

Before:

if [[ "$CLS" == *.py ]]; then
    LIB_JARS="$LIB_JARS --jars ${SPARKBENCH_JAR}"
    SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${CLS} $@"
else
    SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${SPARKBENCH_JAR} $@"
fi

After:

if [[ "$CLS" == *.py ]]; then
    LIB_JARS="$LIB_JARS --jars ${SPARKBENCH_JAR}"
    SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --conf spark.eventLog.enabled=true --conf spark.yarn.historyServer.address=node1.com:18081 --conf spark.eventLog.dir=hdfs:///spark2-history/ --properties-file ${SPARK_PROP_CONF} --master ${SPARK_MASTER} ${YARN_OPTS} ${CLS} $@"
else
    SUBMIT_CMD="${SPARK_HOME}/bin/spark-submit ${LIB_JARS} --conf spark.eventLog.enabled=true --conf spark.yarn.historyServer.address=node1.com:18081 --conf spark.eventLog.dir=hdfs:///spark2-history/ --properties-file ${SPARK_PROP_CONF} --class ${CLS} --master ${SPARK_MASTER} ${YARN_OPTS} ${SPARKBENCH_JAR} $@"
fi

===================================================================

Sending the Streaming seed data from HDFS to Kafka fails:

Symptom:

Run the following command to send the seed data generated in HDFS to Kafka:

bin/workloads/streaming/identity/prepare/dataGen.sh

It throws an exception:

====================================================
log4j:WARN No appenders could be found for logger (org.apache.kafka.clients.producer.ProducerConfig).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "pool-1-thread-1" java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:240)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:144)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:579)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:524)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
    at com.intel.hibench.datagen.streaming.util.SourceFileReader.getReader(SourceFileReader.java:33)
    at com.intel.hibench.datagen.streaming.util.CachedData.<init>(CachedData.java:56)
    at com.intel.hibench.datagen.streaming.util.CachedData.getInstance(CachedData.java:43)
    at com.intel.hibench.datagen.streaming.util.KafkaSender.<init>(KafkaSender.java:55)
    at com.intel.hibench.datagen.streaming.DataGenerator$DataGeneratorJob.run(DataGenerator.java:116)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: mycluster
    ... 20 more

Cause:

My Hadoop cluster runs HDFS in high-availability mode, with NameNodes in active/standby and a nameservice called "mycluster". As a result, the setting in HiBench's conf/hadoop.conf was:

# The root HDFS path to store HiBench data
hibench.hdfs.master       hdfs://mycluster

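The root cause most likely lies in how the data transfer is implemented; see "com.intel.hibench.datagen.streaming.DataGenerator". My guess is that this implementation cannot address HDFS through the nameservice name: the createNonHAProxy frame in the trace suggests the client treated "mycluster" as an ordinary host name. I did not dig any deeper; if you are interested, read the source of that class.

Fix:

Change the setting in conf/hadoop.conf to point at a specific NameNode, for example: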

# The root HDFS path to store HiBench data
hibench.hdfs.master       hdfs://node1.com:8020

===================================================================

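Miscellaneous:

Converting SequenceFile output:

Many HiBench workloads write their results as SequenceFiles by default (mahout requires SequenceFile). See each workload's run script for details, for example this excerpt from Wordcount's run.sh: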

run_hadoop_job ${HADOOP_EXAMPLES_JAR} wordcount \
    -D mapreduce.job.maps=${NUM_MAPS} \
    -D mapreduce.job.reduces=${NUM_REDS} \
    -D mapreduce.inputformat.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat \
    -D mapreduce.outputformat.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat \
    -D mapreduce.job.inputformat.class=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat \
    -D mapreduce.job.outputformat.class=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat \
    ${INPUT_HDFS} ${OUTPUT_HDFS}
END_TIME=`timestamp`

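To make the results easier to read or to reuse later, convert them in one of the following ways:

Print them as text with the FS shell:

sudo -u hdfs hadoop fs -text /path/xxx

Convert the files with MapReduce (third-party post):

https://blog.csdn.net/sinat_29508201/article/details/49155127

Use Spark (pyspark):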

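# Run in the pyspark shell: SparkSession is available as 'spark', SparkContext as 'sc'
reader = sc.sequenceFile("xxxx", keyClass="org.apache.hadoop.io.Text", valueClass="org.apache.hadoop.io.IntWritable")
# Print the records directly
def out(x):
    print(x)
reader.foreach(out)
# Convert to a DataFrame for SQL; each record is a (key, value) tuple
df = spark.createDataFrame(reader, ["key", "value"])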
