Quoted from http://bradhedlund.com/2011/09/10/understanding-hadoop-clusters-and-the-network/

Hadoop has the concept of “Rack Awareness”.  A Hadoop administrator can manually define the rack number of each slave Data Node in the cluster.  Why would you go through the trouble of doing this?  There are two key reasons: data loss prevention and network performance.  Remember that each block of data is replicated to multiple machines so that the failure of one machine does not lose every copy of the data.  Wouldn’t it be unfortunate if all copies of a block happened to be located on machines in the same rack, and that rack experienced a failure, such as a switch failure or power failure?  That would be a mess.  To avoid this, somebody needs to know where Data Nodes are located in the network topology and use that information to make an intelligent decision about where data replicas should exist in the cluster.  That “somebody” is the Name Node.

There is also an assumption that two machines in the same rack have more bandwidth and lower latency between each other than two machines in two different racks.  This is true most of the time.  The rack switch’s uplink bandwidth is usually (but not always) less than its downlink bandwidth, and in-rack latency is usually (but not always) lower than cross-rack latency.  If at least one of those two basic assumptions is true, wouldn’t it be cool if Hadoop could use the same Rack Awareness that protects data to also optimally place work streams in the cluster, improving network performance?  Well, it does!

Quoted from http://developer.yahoo.com/hadoop/tutorial/module2.html#rack

Rack Awareness

For small clusters in which all servers are connected by a single switch, there are only two levels of locality: "on-machine" and "off-machine." When loading data from a DataNode's local drive into HDFS, the NameNode will schedule one copy to go into the local DataNode, and will pick two other machines at random from the cluster.

For larger Hadoop installations which span multiple racks, it is important to ensure that replicas of data exist on multiple racks. This way, the loss of a switch does not render portions of the data unavailable due to all replicas being underneath it.

HDFS can be made rack-aware by the use of a script which allows the master node to map the network topology of the cluster. While alternate configuration strategies can be used, the default implementation allows you to provide an executable script which returns the "rack address" of each of a list of IP addresses.

The network topology script receives as arguments one or more IP addresses of nodes in the cluster. It returns on stdout a list of rack names, one for each input. The input and output order must be consistent.
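As a sketch of that contract (purely illustrative; a real script would look each address up), a script like the following satisfies the interface by mapping every node to the default rack:

#!/bin/bash
# Topology-script contract sketch: emit one rack name per input
# address, in input order. This placeholder maps everything to
# /default-rack; a real script would perform an actual lookup.
for node in "$@" ; do
  echo -n "/default-rack "
done
echo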

To set the rack mapping script, specify the key topology.script.file.name in conf/hadoop-site.xml. This provides a command to run to return a rack id; it must be an executable script or program. By default, Hadoop will attempt to send a set of IP addresses to the script as several separate command-line arguments. You can control the maximum acceptable number of arguments with the topology.script.number.args key.
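As a sketch, the corresponding entries in conf/hadoop-site.xml might look like the following (the script path /etc/hadoop/conf/topology.sh is a hypothetical example):

<property>
  <name>topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>  <!-- hypothetical path -->
</property>
<property>
  <name>topology.script.number.args</name>
  <value>1</value>
</property>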

Rack ids in Hadoop are hierarchical and look like path names. By default, every node has a rack id of /default-rack. You can set rack ids for nodes to any arbitrary path, e.g., /foo/bar-rack. Path elements further to the left are higher up the tree. Thus a reasonable structure for a large installation may be /top-switch-name/rack-name.

Hadoop rack ids are not currently expressive enough to handle an unusual routing topology such as a 3-d torus; they assume that each node is connected to a single switch which in turn has a single upstream switch. This is not usually a problem, however. Actual packet routing will be directed using the topology discovered by or set in switches and routers. The Hadoop rack ids will be used to find "near" and "far" nodes for replica placement (and in 0.17, MapReduce task placement).

The following example script performs rack identification based on IP addresses given a hierarchical IP addressing scheme enforced by the network administrator. This may work directly for simple installations; more complex network configurations may require a file- or table-based lookup process. Care should be taken in that case to keep the table up-to-date as nodes are physically relocated, etc. This script requires that the maximum number of arguments be set to 1.

#!/bin/bash
# Set rack id based on IP address.
# Assumes network administrator has complete control
# over IP addresses assigned to nodes and they are
# in the 10.x.y.z address space. Assumes that
# IP addresses are distributed hierarchically, e.g.,
# 10.1.y.z is one data center segment and 10.2.y.z is another;
# 10.1.1.z is one rack, 10.1.2.z is another rack in
# the same segment, etc.
#
# This is invoked with an IP address as its only argument.

# get IP address from the input
ipaddr=$1

# select "x.y" and convert it to "x/y"
segments=`echo $ipaddr | cut --delimiter=. --fields=2-3 --output-delimiter=/`
echo /${segments}
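If the script were saved as, say, rack-topology.sh (name hypothetical), a by-hand invocation illustrates the mapping:

$ ./rack-topology.sh 10.1.2.15
/1/2

Here the data center segment octet becomes the first path element and the rack octet the second, so all 10.1.2.z machines share the rack id /1/2.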

Topology script sample from the Hadoop Wiki.

Topology scripts are used by Hadoop to determine the rack location of nodes. This information is used by Hadoop to replicate block data to redundant racks. Here is a sample script that uses a separate data file. As before, you specify the rack mapping script via the key topology.script.file.name in conf/hadoop-site.xml; it must be an executable script or program.

Topology Script

#!/bin/bash
# Look up each node name given on the command line in the
# topology.data file and print its rack id; nodes that are
# not listed fall back to /default/rack.
HADOOP_CONF=/etc/hadoop/conf

while [ $# -gt 0 ] ; do
  nodeArg=$1
  exec< ${HADOOP_CONF}/topology.data
  result=""
  while read line ; do
    ar=( $line )
    if [ "${ar[0]}" = "$nodeArg" ] ; then
      result="${ar[1]}"
    fi
  done
  shift
  if [ -z "$result" ] ; then
    echo -n "/default/rack "
  else
    echo -n "$result "
  fi
done

The data file topology.data used by the above topology script:

hadoopdata1.ec.com     /dc1/rack1
hadoopdata1            /dc1/rack1
10.1.1.1               /dc1/rack2
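Given that data file, invoking the script by hand (script name hypothetical) shows one answer per argument, with unknown nodes falling back to the default rack:

$ ./topology.sh hadoopdata1 10.1.1.1 unknown-node
/dc1/rack1 /dc1/rack2 /default/rack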

OpenFlow

Even more interesting would be an OpenFlow network, where the Name Node could query the OpenFlow controller about a node’s location in the topology. Refer to http://bradhedlund.com/2011/04/21/data-center-scale-openflow-sdn/
