Reposted from the official documentation. For the latest version, see: http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html

Addendum: adding the following environment variables is recommended.

#hadoop configuration
export PATH=$PATH:/home/jediael/hadoop-2.4.1/bin:/home/jediael/hadoop-2.4.1/sbin
export HADOOP_HOME=/home/jediael/hadoop-2.4.1
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
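
A quick sanity check for the variables above, as a minimal sketch. The helper name `check_hadoop_env` is hypothetical (it is not part of Hadoop), and it only tests that the main directory-valued variables are set and point to existing directories.

```shell
# check_hadoop_env: hypothetical helper, not part of Hadoop.
# Prints one line per variable that is unset or is not an existing directory.
# Returns non-zero if any problem was found.
check_hadoop_env() {
  rc=0
  for name in HADOOP_HOME HADOOP_COMMON_HOME HADOOP_HDFS_HOME \
              HADOOP_MAPRED_HOME HADOOP_YARN_HOME HADOOP_CONF_DIR; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"
      rc=1
    elif [ ! -d "$value" ]; then
      echo "$name points to a missing directory: $value"
      rc=1
    fi
  done
  return $rc
}
```

After sourcing the profile that contains the exports, running `check_hadoop_env` should print nothing if every directory exists.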

Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.

  • Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.

    • Purpose
    • Prerequisites
      • Supported Platforms
      • Required Software
      • Installing Software
    • Download
    • Prepare to Start the Hadoop Cluster
    • Standalone Operation
    • Pseudo-Distributed Operation
      • Configuration
      • Setup passphraseless ssh
      • Execution
      • YARN on Single Node
    • Fully-Distributed Operation

Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Prerequisites

Supported Platforms

  • GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
  • Windows is also a supported platform, but the following steps are for Linux only. To set up Hadoop on Windows, see the wiki page.

Required Software

Required software for Linux includes:

  1. Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
  2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
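
To see which prerequisites are already present, a small check along these lines may help. This is a sketch only: `missing_tools` is a hypothetical helper name, and it merely looks each command up on `PATH`.

```shell
# missing_tools: hypothetical helper. Prints each named command that is
# not found on PATH, one per line. Prints nothing if all are found.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools java ssh rsync` lists whichever of the three still needs installing. Note that `sshd` often lives outside a normal user's `PATH`, so this check is not a reliable way to tell whether the ssh server is installed or running.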

Installing Software

If your cluster doesn't have the requisite software you will need to install it.

For example on Ubuntu Linux:

  $ sudo apt-get install ssh
  $ sudo apt-get install rsync

Download

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

  # set to the root of your Java installation
  export JAVA_HOME=/usr/java/latest

  # Assuming your installation directory is /usr/local/hadoop
  export HADOOP_PREFIX=/usr/local/hadoop

Note: skipping the second step (setting HADOOP_PREFIX) does not seem to have any effect.
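
If you are unsure what value to use for JAVA_HOME, one common approach is to resolve the `java` executable found on `PATH` and strip the trailing `/bin/java`. The sketch below assumes GNU `readlink` (for the `-f` option). The helper name `java_home_from_binary` is hypothetical, and on some JDK layouts the result may point at a `jre` subdirectory rather than the JDK root.

```shell
# java_home_from_binary: hypothetical helper. Given the path to a java
# executable, resolve symlinks and strip the trailing /bin/java to get
# a candidate JAVA_HOME. Assumes GNU readlink (for the -f option).
java_home_from_binary() {
  resolved=$(readlink -f "$1") || return 1
  dirname "$(dirname "$resolved")"
}
```

A typical call would be `java_home_from_binary "$(command -v java)"`. Treat the printed directory as a candidate to verify, not a guaranteed answer.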

Try the following command:

  $ bin/hadoop

This will display the usage documentation for the hadoop script.

Now you are ready to start your Hadoop cluster in one of the three supported modes:

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode

Standalone Operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

  $ mkdir input
  $ cp etc/hadoop/*.xml input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
  $ cat output/*
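
To get a feel for what the regular expression matches, roughly the same count can be produced with standard tools, without Hadoop. This is only an approximation of the example job, written as a sketch: `local_grep_count` is a hypothetical helper, and it assumes a `grep` that supports `-o` (GNU and BSD grep both do).

```shell
# local_grep_count: hypothetical helper. Extracts every match of an extended
# regular expression from the given files and prints "count<TAB>match",
# most frequent first.
local_grep_count() {
  pattern=$1
  shift
  tab=$(printf '\t')
  grep -ohE "$pattern" "$@" \
    | sort \
    | uniq -c \
    | awk '{ print $1 "\t" $2 }' \
    | sort -t "$tab" -k1,1nr -k2,2
}
```

Running `local_grep_count 'dfs[a-z.]+' input/*.xml` after the `cp` step gives a result that can be compared with `cat output/*`.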

Pseudo-Distributed Operation

Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following:

etc/hadoop/core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
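
To confirm that Hadoop actually picks up these two settings, the `hdfs getconf -confKey` subcommand can print the effective value of a property. The wrapper below is a sketch: `show_hdfs_settings` is a hypothetical name, and it assumes the `hdfs` command is on `PATH` (otherwise call `bin/hdfs` from the distribution directory).

```shell
# show_hdfs_settings: hypothetical helper. Prints the effective values of
# the two properties configured above, using `hdfs getconf -confKey`.
# Assumes the `hdfs` command is on PATH.
show_hdfs_settings() {
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs command not found on PATH" >&2
    return 1
  fi
  for key in fs.defaultFS dfs.replication; do
    printf '%s=%s\n' "$key" "$(hdfs getconf -confKey "$key")"
  done
}
```

With the configuration above in place, the two printed values should match what was written into core-site.xml and hdfs-site.xml.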

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

  $ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

  $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
  $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
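
The login check can also be done non-interactively, which is handy in scripts. The sketch below relies on the ssh option `BatchMode=yes`, which makes ssh fail instead of prompting. The helper name `can_ssh_without_passphrase` is hypothetical. Note that the check also fails if the host key of the target has not been accepted yet, so a failure does not always mean the key setup is wrong.

```shell
# can_ssh_without_passphrase: hypothetical helper. With BatchMode=yes ssh
# never prompts, so the exit status shows whether key-based login works.
# The host argument defaults to localhost.
can_ssh_without_passphrase() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 "${1:-localhost}" true
}
```

If the check keeps failing after the keys are generated, the permissions on `~/.ssh/authorized_keys` are worth a look: sshd commonly rejects a file that is writable by group or others, and `chmod 0600 ~/.ssh/authorized_keys` is a frequent fix.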

Execution

The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.

  1. Format the filesystem:

      $ bin/hdfs namenode -format

  2. Start NameNode daemon and DataNode daemon (this step prints many warnings, but they do not affect the result):
      $ sbin/start-dfs.sh

    The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

  3. Browse the web interface for the NameNode; by default it is available at:
    • NameNode - http://localhost:50070/
  4. Make the HDFS directories required to execute MapReduce jobs:
      $ bin/hdfs dfs -mkdir /user
      $ bin/hdfs dfs -mkdir /user/<username>

  5. Copy the input files into the distributed filesystem:
      $ bin/hdfs dfs -put etc/hadoop input

  6. Run some of the examples provided:
      $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'

  7. Examine the output files:

    Copy the output files from the distributed filesystem to the local filesystem and examine them:

      $ bin/hdfs dfs -get output output
      $ cat output/*

    or

    View the output files on the distributed filesystem:

      $ bin/hdfs dfs -cat output/*

  8. When you're done, stop the daemons with:
      $ sbin/stop-dfs.sh
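
After step 2, one way to confirm that the daemons really started is to look for them in the output of `jps`, the JDK tool that lists running Java processes. The following is a sketch under that assumption. `missing_daemons` is a hypothetical helper, and if `jps` is not installed it reports every daemon as missing.

```shell
# missing_daemons: hypothetical helper. Prints each expected daemon name
# that does not appear as a process name in the output of jps.
# The match is exact, so NameNode does not match SecondaryNameNode.
missing_daemons() {
  running=$(jps 2>/dev/null)
  for daemon in "$@"; do
    echo "$running" \
      | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
      || echo "$daemon"
  done
}
```

For the HDFS steps above, `missing_daemons NameNode DataNode` should print nothing while the daemons are up. When a name is listed, the log files under the log directory mentioned in step 2 are the place to look.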

YARN on Single Node

You can run a MapReduce job on YARN in pseudo-distributed mode by setting a few parameters and additionally running the ResourceManager and NodeManager daemons.

The following instructions assume that steps 1 through 4 of the above instructions have already been executed.

  1. Configure parameters as follows:

    etc/hadoop/mapred-site.xml:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>

    etc/hadoop/yarn-site.xml:

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>

  2. Start ResourceManager daemon and NodeManager daemon:
      $ sbin/start-yarn.sh

  3. Browse the web interface for the ResourceManager; by default it is available at:
    • ResourceManager - http://localhost:8088/
  4. Run a MapReduce job.
  5. When you're done, stop the daemons with:
      $ sbin/stop-yarn.sh
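
For step 4, the grep example from the previous section can be reused. One thing to watch for: a MapReduce job fails if its output directory already exists in HDFS, so a leftover `output` directory from an earlier run has to be removed first. The sketch below wraps both actions. `rerun_grep_example` is a hypothetical helper, and it assumes that `hdfs` and `hadoop` are on `PATH`, that it is run from the distribution directory, and that the `input` directory has already been copied into HDFS.

```shell
# rerun_grep_example: hypothetical helper. Removes a previous HDFS output
# directory if one exists, then submits the same grep example used earlier.
# Assumes `hdfs` and `hadoop` are on PATH and the current directory is the
# root of the Hadoop distribution.
rerun_grep_example() {
  if hdfs dfs -test -d output; then
    hdfs dfs -rm -r output || return 1
  fi
  hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar \
    grep input output 'dfs[a-z.]+'
}
```

While the job runs, it should appear in the ResourceManager web interface from step 3.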
