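This post is a translation of the official Tez documentation [1]. The original post highlighted the key points in bold red; everything else is background and can be skimmed.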

Install/Deploy Instructions for Tez

Replace x.y.z with the tez release number that you are using. E.g. 0.5.0. For Tez versions 0.8.3 and higher, Tez needs Apache Hadoop to be of version 2.6.0 or higher. For Tez version 0.9.0 and higher, Tez needs Apache Hadoop to be version 2.7.0 or higher.

  1. Deploy Apache Hadoop using version of 2.7.0 or higher.

    • You need to change the value of the hadoop.version property in the top-level pom.xml to match the version of the hadoop branch being used. You can check the installed version with:
    $ hadoop version
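
The version requirement above can be checked in a script before building. The following is a minimal sketch, not part of Tez or Hadoop: `version_ge` and `parse_hadoop_version` are hypothetical helper names, and it assumes a `sort` that supports `-V` (GNU coreutils) and plain dotted version numbers.

```shell
#!/usr/bin/env bash
# Sketch: check that a Hadoop version string meets a minimum version.
# version_ge and parse_hadoop_version are hypothetical helpers.

# Returns 0 when version $1 >= version $2 (plain dotted versions only).
version_ge() {
  local lowest
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Reads `hadoop version` output on stdin and prints the second field
# of the first line, e.g. "3.1.2" from "Hadoop 3.1.2".
parse_hadoop_version() {
  awk 'NR == 1 { print $2 }'
}

# Only query the client if the hadoop command is actually installed.
if command -v hadoop >/dev/null 2>&1; then
  installed=$(hadoop version | parse_hadoop_version)
  if version_ge "$installed" "2.7.0"; then
    echo "Hadoop $installed is new enough for Tez 0.9.x"
  else
    echo "Hadoop $installed is older than 2.7.0" >&2
  fi
fi
```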
    
  2. Build tez using mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

    • This assumes that you have already installed JDK8 or later and Maven 3 or later.
    • Tez also requires Protocol Buffers 2.5.0, including the protoc-compiler.
      • This can be downloaded from https://github.com/google/protobuf/tags/.
      • On Mac OS X with the Homebrew package manager: brew install protobuf250
      • For rpm-based linux systems, the yum repos may not have the 2.5.0 version. rpm.pbone.net has the protobuf-2.5.0 and protobuf-compiler-2.5.0 packages.
    • If you prefer to run the unit tests, remove skipTests from the command above.
    • If you use Eclipse IDE, you can import the projects using “Import/Maven/Existing Maven Projects”. Eclipse does not automatically generate Java sources or include the generated sources into the projects. Please build using maven as described above and then use Project Properties to include “target/generatedsources/java” as a source directory into the “Java Build Path” for these projects: tez-api, tez-mapreduce, tez-runtime-internals and tez-runtime-library. This needs to be done just once after importing the project.
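
Because the build depends on Protocol Buffers 2.5.0 specifically, it can help to verify the compiler before starting Maven. This is a sketch under the assumption that `protoc --version` prints a line of the form `libprotoc 2.5.0`; `is_protoc_250` and `build_tez` are hypothetical helper names, not Tez tooling.

```shell
#!/usr/bin/env bash
# Sketch of a pre-build check; the function names are hypothetical helpers.

# Succeeds only when the given `protoc --version` output reports 2.5.0.
is_protoc_250() {
  [ "$1" = "libprotoc 2.5.0" ]
}

# Runs the build from the Tez source root, but only after the protoc check.
build_tez() {
  local reported
  reported=$(protoc --version 2>/dev/null) || reported="not found"
  if ! is_protoc_250 "$reported"; then
    echo "Expected libprotoc 2.5.0, got: $reported" >&2
    return 1
  fi
  mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
}

# Usage (from the top-level Tez directory): build_tez
```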
  3. Copy the relevant tez tarball into HDFS, and configure tez-site.xml

    • A tez tarball containing tez and hadoop libraries will be found at tez-dist/target/tez-0.9.2-SNAPSHOT.tar.gz
    • Assuming that the tez jars are put in /apps/ on HDFS, the command would be
    hadoop fs -mkdir /apps/tez-0.9.2-SNAPSHOT
    hadoop fs -copyFromLocal tez-dist/target/tez-0.9.2-SNAPSHOT.tar.gz /apps/tez-0.9.2-SNAPSHOT/
    
    • tez-site.xml configuration.

      • Set tez.lib.uris to point to the tar.gz uploaded to HDFS. Assuming the steps mentioned so far were followed, set tez.lib.uris to ${fs.defaultFS}/apps/tez-x.y.z-SNAPSHOT/tez-x.y.z-SNAPSHOT.tar.gz
      • Ensure tez.use.cluster.hadoop-libs is not set in tez-site.xml, or if it is set, the value should be false
    • Please note that the tarball version should match the version of the client jars used when submitting Tez jobs to the cluster. Please refer to the Version Compatibility Guide for more details on version compatibility and detecting mismatches.
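
The tez-site.xml setting from this step can be written out as a file. The sketch below assumes the tarball was uploaded to the /apps path used above; `write_tez_site` is a hypothetical helper, and the target directory is whatever you later use as TEZ_CONF_DIR.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal tez-site.xml for the full-tarball setup.
# write_tez_site is a hypothetical helper, not a Tez command.

# $1 = directory to write into, $2 = value for tez.lib.uris
write_tez_site() {
  local conf_dir="$1" lib_uri="$2"
  mkdir -p "$conf_dir"
  printf '%s\n' \
    '<?xml version="1.0"?>' \
    '<configuration>' \
    '  <property>' \
    '    <name>tez.lib.uris</name>' \
    "    <value>${lib_uri}</value>" \
    '  </property>' \
    '</configuration>' > "$conf_dir/tez-site.xml"
}

# Usage: pass the URI in single quotes so the shell leaves ${fs.defaultFS}
# untouched and Hadoop resolves it when the configuration is loaded.
# write_tez_site "$TEZ_CONF_DIR" '${fs.defaultFS}/apps/tez-0.9.2-SNAPSHOT/tez-0.9.2-SNAPSHOT.tar.gz'
```

The helper does no XML escaping, which is acceptable here only because the value contains no special characters.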
  4. Optional: if running existing MapReduce jobs on Tez, modify mapred-site.xml to change the “mapreduce.framework.name” property from its default value of “yarn” to “yarn-tez”.
  5. Configure the client node to include the tez-libraries in the hadoop classpath

    • Extract the tez minimal tarball created in step 2 to a local directory (assuming TEZ_JARS is where the files will be decompressed for the next steps)
    tar -xvzf tez-dist/target/tez-0.9.2-minimal.tar.gz -C $TEZ_JARS
    
    • set TEZ_CONF_DIR to the location of tez-site.xml
    • Add $TEZ_CONF_DIR, ${TEZ_JARS}/* and ${TEZ_JARS}/lib/* to the application classpath. For example, doing it via the standard Hadoop tool chain would use the following command to set up the application classpath:
    export HADOOP_CLASSPATH=${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
    
    • Please note the “*” which is an important requirement when setting up classpaths for directories containing jar files.
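
The client setup in this step can be collected into a small script to source from a shell profile. This is a sketch only: the two default directories are placeholder assumptions, and `tez_classpath` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: client-side environment for Tez. Both default directories are
# placeholder assumptions; point them at your own locations.
export TEZ_JARS="${TEZ_JARS:-/opt/tez/jars}"
export TEZ_CONF_DIR="${TEZ_CONF_DIR:-/opt/tez/conf}"

# Builds "<conf>:<jars>/*:<jars>/lib/*". The "*" stays literal because it is
# never subject to shell globbing here; the JVM expands it to the jar files.
tez_classpath() {
  printf '%s:%s/*:%s/lib/*' "$1" "$2" "$2"
}

HADOOP_CLASSPATH="$(tez_classpath "$TEZ_CONF_DIR" "$TEZ_JARS")"
export HADOOP_CLASSPATH
```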
  6. There is a basic example of using an MRR job in the tez-examples.jar. Refer to OrderedWordCount.java in the source code. To run this example:

    $HADOOP_PREFIX/bin/hadoop jar tez-examples.jar orderedwordcount <input> <output>

    In plain terms, with concrete input and output paths:

    hadoop jar tez-examples.jar orderedwordcount /input /output
    

    This will use the TEZ DAG ApplicationMaster to run the ordered word count job. This job is similar to the word count example except that it also orders all words based on the frequency of occurrence.

    Tez DAGs could be run separately as different applications or serially within a single TEZ session. There is a different variation of orderedwordcount in tez-tests that supports the use of Sessions and handling multiple input-output pairs. You can use it to run multiple DAGs serially on different inputs/outputs.

    $HADOOP_PREFIX/bin/hadoop jar tez-tests.jar testorderedwordcount <input1> <output1> <input2> <output2> <input3> <output3> ...
    

    The above will run multiple DAGs for each input-output pair.

    To use TEZ sessions, set -DUSE_TEZ_SESSION=true

    $HADOOP_PREFIX/bin/hadoop jar tez-tests.jar testorderedwordcount -DUSE_TEZ_SESSION=true <input1> <output1> <input2> <output2>
    
  7. Submit an MR job as you normally would using something like:

    $HADOOP_PREFIX/bin/hadoop jar hadoop-mapreduce-client-jobclient-3.0.0-SNAPSHOT-tests.jar sleep -mt 1 -rt 1 -m 1 -r 1
    

    This will use the TEZ DAG ApplicationMaster to run the MR job. This can be verified by looking at the AM’s logs from the YARN ResourceManager UI. This needs mapred-site.xml to have “mapreduce.framework.name” set to “yarn-tez”

Various ways to configure tez.lib.uris

The tez.lib.uris configuration property supports a comma-separated list of values.

Each value can be one of three types:

  1. Path to a simple file.
  2. Path to a directory.
  3. Path to a compressed archive (tarball, zip, etc.).

For simple files and directories, Tez will add all these files and the first-level entries in the directories (recursive traversal of directories is not supported) into the working directory of the Tez runtime, and they will automatically be included in the classpath. Archives, i.e. files whose names end with generally known compressed archive suffixes such as ‘tgz’, ‘tar.gz’, ‘zip’, etc., will be uncompressed into the container working directory too. However, given that the archive structure is not known to the Tez framework, the user is expected to configure tez.lib.uris.classpath to ensure that the nested directory structure of an archive is added to the classpath. These classpath values should be relative, i.e. the entries should start with “./”.
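
The "entries should start with ./" rule is easy to check mechanically before a value goes into tez-site.xml. The following sketch assumes entries are joined with ":"; `classpath_is_relative` is a hypothetical helper, and the `./tez/...` directory names in the usage line are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: validate a candidate tez.lib.uris.classpath value.
# classpath_is_relative is a hypothetical helper, not a Tez command.

# Succeeds when the value is non-empty and every ":"-separated entry
# starts with "./".
classpath_is_relative() {
  local rest="$1" entry
  [ -n "$rest" ] || return 1
  while [ -n "$rest" ]; do
    entry="${rest%%:*}"          # text before the first ":"
    case "$entry" in
      ./*) ;;                    # relative entry, keep going
      *) return 1 ;;             # absolute or empty entry
    esac
    case "$rest" in
      *:*) rest="${rest#*:}" ;;  # drop the entry just checked
      *) rest="" ;;              # that was the last entry
    esac
  done
  return 0
}

# Usage: classpath_is_relative './tez/*:./tez/lib/*' && echo "looks relative"
```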

Hadoop Installation dependent Install/Deploy Instructions

The above install instructions use Tez with pre-packaged Hadoop libraries included in the package, which is the recommended method for installation. A full tarball with all dependencies is a better approach to ensure that existing jobs continue to run during a cluster’s rolling upgrade.

Although the tez.lib.uris configuration options enable a wide variety of usage patterns, there are 2 main alternative modes that are supported by the framework:

  1. Mode A: Using a tez tarball on HDFS along with Hadoop libraries available on the cluster.
  2. Mode B: Using a tez tarball along with Hadoop tarball.

Both these modes require a Tez build without Hadoop dependencies, which is available at tez-dist/target/tez-x.y.z-minimal.tar.gz.

Mode A:

    hadoop fs -mkdir /apps/tez-0.9.2
    hadoop fs -copyFromLocal tez-dist/target/tez-0.9.2-minimal.tar.gz /apps/tez-0.9.2

tez.lib.uris=${fs.defaultFS}/apps/tez-0.9.2/tez-0.9.2-minimal.tar.gz

tez.use.cluster.hadoop-libs=true
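
The two Mode A settings above are written as key=value pairs, but tez-site.xml uses Hadoop's XML property format. As a sketch of the equivalent file content, the script below prints it to standard output; `xml_property` and `mode_a_tez_site` are hypothetical helper names, and no XML escaping is done because these values need none.

```shell
#!/usr/bin/env bash
# Sketch: print a Mode A tez-site.xml to stdout.
# xml_property and mode_a_tez_site are hypothetical helpers.

# Prints one Hadoop-style <property> element for name $1 and value $2.
xml_property() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

mode_a_tez_site() {
  printf '<?xml version="1.0"?>\n<configuration>\n'
  # Single quotes keep ${fs.defaultFS} literal so Hadoop resolves it later.
  xml_property 'tez.lib.uris' '${fs.defaultFS}/apps/tez-0.9.2/tez-0.9.2-minimal.tar.gz'
  xml_property 'tez.use.cluster.hadoop-libs' 'true'
  printf '</configuration>\n'
}

# Usage: mode_a_tez_site > "$TEZ_CONF_DIR/tez-site.xml"
```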

Mode B:

    hadoop fs -mkdir /apps/tez-0.9.2
    hadoop fs -copyFromLocal tez-dist/target/tez-0.9.2-minimal.tar.gz /apps/tez-0.9.2

or, to upload the extracted jars instead of the archive:

    hadoop fs -copyFromLocal tez-dist/target/tez-0.9.2-minimal/* /apps/tez-0.9.2

Then upload the Hadoop tarball:

    hadoop fs -mkdir /apps/hadoop-3.1.2
    hadoop fs -copyFromLocal hadoop-dist/target/hadoop-3.1.2-SNAPSHOT.tar.gz /apps/hadoop-3.1.2

Values of tez.lib.uris and tez.lib.uris.classpath for Mode B

Entries in tez.lib.uris are joined with ","; entries in tez.lib.uris.classpath are joined with ":".

Both Tez and Hadoop archives

  tez.lib.uris:

    ${fs.defaultFS}/apps/tez-0.9.2/tez-0.9.2-minimal.tar.gz
    ${fs.defaultFS}/apps/hadoop-3.1.2/hadoop-3.1.2-SNAPSHOT.tar.gz

  tez.lib.uris.classpath:

    $TEZ_HOME/*
    $TEZ_HOME/lib/*
    $HADOOP_HOME/share/hadoop/common/*
    $HADOOP_HOME/share/hadoop/common/lib/*
    $HADOOP_HOME/share/hadoop/hdfs/*
    $HADOOP_HOME/share/hadoop/hdfs/lib/*
    $HADOOP_HOME/share/hadoop/yarn/*
    $HADOOP_HOME/share/hadoop/yarn/lib/*
    $HADOOP_HOME/share/hadoop/mapreduce/*
    $HADOOP_HOME/share/hadoop/mapreduce/lib/*

Tez jars with a Hadoop archive

  tez.lib.uris:

    ${fs.defaultFS}/apps/tez-0.9.2
    ${fs.defaultFS}/apps/tez-0.9.2/lib
    ${fs.defaultFS}/apps/hadoop-3.1.2/hadoop-3.1.2-SNAPSHOT.tar.gz

  tez.lib.uris.classpath:

    $HADOOP_HOME/share/hadoop/common/*
    $HADOOP_HOME/share/hadoop/common/lib/*
    $HADOOP_HOME/share/hadoop/hdfs/*
    $HADOOP_HOME/share/hadoop/hdfs/lib/*
    $HADOOP_HOME/share/hadoop/yarn/*
    $HADOOP_HOME/share/hadoop/yarn/lib/*
    $HADOOP_HOME/share/hadoop

Here, “archives” means compressed packages such as tarballs and zip files.

If any archives are specified in tez.lib.uris, then tez.lib.uris.classpath must be set to define the classpath for these archives, as the archive structure is not known.
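
As a reminder aid for this rule, a script can detect whether a tez.lib.uris value names any archive. The sketch below is an assumption-laden illustration: `uris_contain_archive` is a hypothetical helper, entries are assumed to be comma-separated without surrounding spaces, and only the suffixes named in the text (tar.gz, tgz, zip) are recognized.

```shell
#!/usr/bin/env bash
# Sketch: detect archives in a tez.lib.uris value by file suffix.
# uris_contain_archive is a hypothetical helper, not a Tez command.

# Succeeds when at least one ","-separated entry ends in a known
# archive suffix (only tar.gz, tgz and zip are checked here).
uris_contain_archive() {
  local rest="$1" uri
  while [ -n "$rest" ]; do
    uri="${rest%%,*}"            # text before the first ","
    case "$uri" in
      *.tar.gz|*.tgz|*.zip) return 0 ;;
    esac
    case "$rest" in
      *,*) rest="${rest#*,}" ;;  # drop the entry just checked
      *) rest="" ;;              # that was the last entry
    esac
  done
  return 1
}

# Usage:
# if uris_contain_archive "$lib_uris"; then
#   echo "Remember to set tez.lib.uris.classpath for the archive contents"
# fi
```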

Reference:

[1] Install/Deploy Instructions for Tez
