Before reading this post, please go through my previous post, "Introduction to Hadoop", to get some Apache Hadoop basics.

In this post, we will discuss the Apache Hadoop 1.x architecture and how its components work, in detail.

Post's Brief Table of Contents

  • Hadoop 1.x Architecture
  • Hadoop 1.x Major Components
  • How the Hadoop 1.x Major Components Work
  • How Store and Compute Operations Work in Hadoop

Hadoop 1.x Architecture

Apache Hadoop 1.x (and earlier versions) uses the following architecture. This is the Hadoop 1.x high-level architecture; we will discuss the detailed low-level architecture in the coming sections.

If you don't fully understand this architecture at this stage, there is no need to worry. Read the next sections of this post, as well as the coming posts, to understand it well.

  • The Hadoop Common module is the Hadoop base API (a JAR file) for all Hadoop components. All other components work on top of this module.
  • HDFS stands for Hadoop Distributed File System. It is also known as HDFS V1, as it is part of Hadoop 1.x. It serves as the distributed storage system in the Hadoop architecture.
  • MapReduce is a batch-processing or distributed data-processing module. It is built following Google's MapReduce algorithm. It is also known as "MR V1" or "Classic MapReduce", as it is part of Hadoop 1.x.
  • All remaining Hadoop ecosystem components work on top of these two major components: HDFS and MapReduce. We will discuss all Hadoop ecosystem components in detail in my coming posts.

NOTE:-
Hadoop 1.x MapReduce is also known as "Classic MapReduce", as it was developed following Google's MapReduce algorithm technical paper.

Hadoop 1.x Major Components

The Hadoop 1.x major components are HDFS and MapReduce. They are also known as the "Two Pillars" of Hadoop 1.x.

HDFS:
HDFS is the Hadoop Distributed File System, where our Big Data is stored using commodity hardware. It is designed to work with large data sets, with a default block size of 64 MB (we can change it per our project requirements).
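As a quick illustration (a minimal sketch in plain Python, not Hadoop code), splitting a file into 64 MB blocks is simple ceiling division:

```python
import math

DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the HDFS V1 default

def num_blocks(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Number of HDFS blocks needed to store a file of the given size."""
    if file_size_bytes == 0:
        return 0
    return math.ceil(file_size_bytes / block_size)

# A 200 MB file needs 4 blocks: three full 64 MB blocks plus one 8 MB block.
print(num_blocks(200 * 1024 * 1024))  # → 4
```

Note that the last block of a file only occupies as much space as its actual data, not a full 64 MB.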

The HDFS component is further divided into two sub-components:

  1. Name Node

     The Name Node is placed in the Master Node. It stores metadata about the Data Nodes, such as how many blocks are stored on each Data Node, which Data Nodes hold data, slave node details, Data Node locations, timestamps, etc.

  2. Data Node

     Data Nodes are placed in the Slave Nodes. They store our application's actual data, in data blocks of 64 MB each by default.
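To make the metadata idea concrete, here is a toy sketch (plain Python with made-up file, block, and node names; not the real NameNode API) of the kind of mapping the Name Node keeps: which blocks make up a file, and which Data Nodes hold each block:

```python
# Toy in-memory "Name Node" metadata: file path -> ordered block IDs,
# and block ID -> the Data Nodes holding a replica of that block.
file_to_blocks = {
    "/logs/access.log": ["blk_001", "blk_002"],
}
block_to_datanodes = {
    "blk_001": ["datanode1", "datanode3", "datanode5"],
    "blk_002": ["datanode2", "datanode4", "datanode5"],
}

def locate(path):
    """Answer a client's question: where are the blocks of this file?"""
    return [(blk, block_to_datanodes[blk]) for blk in file_to_blocks[path]]

for blk, nodes in locate("/logs/access.log"):
    print(blk, "is stored on", nodes)
```

The actual data never passes through the Name Node; it only hands out these block locations so clients can talk to Data Nodes directly.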

MapReduce:
MapReduce is a distributed data-processing or batch-processing programming model. Like HDFS, the MapReduce component also uses commodity hardware to process a "high volume of a variety of data at a high velocity" in a reliable and fault-tolerant manner.

The MapReduce component is further divided into two sub-components:

  1. Job Tracker

     The Job Tracker assigns MapReduce tasks to Task Trackers in the cluster of nodes. Sometimes it reassigns the same tasks to other Task Trackers, when the previous Task Trackers have failed or shut down.

     The Job Tracker maintains the status of every Task Tracker: up/running, failed, recovered, etc.

  2. Task Tracker

     The Task Tracker executes the tasks assigned by the Job Tracker and sends the status of those tasks back to the Job Tracker.
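The assign-and-reassign behaviour described above can be sketched as follows (a simplified simulation with invented tracker names; not the actual Job Tracker implementation):

```python
# Simplified "Job Tracker": assign tasks round-robin, then move the
# tasks of a failed Task Tracker onto the trackers that are still up.
def assign(tasks, trackers):
    assignments = {t: [] for t in trackers}
    for i, task in enumerate(tasks):
        assignments[trackers[i % len(trackers)]].append(task)
    return assignments

def reassign_on_failure(assignments, failed, alive):
    orphaned = assignments.pop(failed)  # tasks stranded on the dead tracker
    for i, task in enumerate(orphaned):
        assignments[alive[i % len(alive)]].append(task)
    return assignments

assignments = assign(["map-0", "map-1", "map-2", "map-3"], ["tt1", "tt2"])
# tt1 fails; its tasks are reassigned to tt2.
assignments = reassign_on_failure(assignments, "tt1", ["tt2"])
print(assignments)  # {'tt2': ['map-1', 'map-3', 'map-0', 'map-2']}
```

The real Job Tracker detects failures through missed heartbeats; this sketch only models the reassignment step itself.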

We will discuss these four sub-components' responsibilities, and how they interact with each other to perform a client application's tasks, in detail in the next section.

How Hadoop 1.x Major Components Work

Hadoop 1.x components follow this architecture to interact with each other and to work in parallel in a reliable and fault-tolerant manner.

Hadoop 1.x Components High-Level Architecture

  • Both the Master Node and the Slave Nodes contain two Hadoop components:
  1. HDFS component
  2. MapReduce component
  • The Master Node's HDFS component is also known as the "Name Node".
  • The Slave Node's HDFS component is also known as the "Data Node".
  • The Master Node's "Name Node" component stores the metadata.
  • The Slave Node's "Data Node" component stores our application's actual Big Data.
  • HDFS stores data in "data blocks" of 64 MB by default.
  • The Master Node's MapReduce component is also known as the "Job Tracker".
  • The Slave Node's MapReduce component is also known as the "Task Tracker".
  • The Master Node's "Job Tracker" takes care of assigning tasks to the "Task Trackers" and receiving results from them.
  • The Slave Node's MapReduce component, the "Task Tracker", runs two kinds of MapReduce tasks:
    1. Map Task
    2. Reduce Task

    We will discuss MapReduce tasks (Mapper and Reducer) in detail in my coming post, with some simple end-to-end examples.

  • The Slave Node's "Task Tracker" actually performs the client's tasks, using the MapReduce batch-processing model.
  • The Master Node is the primary node that takes care of all the remaining Slave Nodes (secondary nodes).
Hadoop 1.x Components In-Detail Architecture

Hadoop 1.x Architecture Description

  • Clients (one or more) submit their work to the Hadoop system.
  • When the Hadoop system receives a client request, it is first received by the Master Node.
  • The Master Node's MapReduce component, the "Job Tracker", is responsible for receiving the client's work; it divides the work into manageable independent tasks and assigns them to Task Trackers.
  • The Slave Node's MapReduce component, the "Task Tracker", receives those tasks from the "Job Tracker" and performs them using the MapReduce components.
  • Once all Task Trackers have finished their jobs, the Job Tracker takes those results and combines them into a final result.
  • Finally, the Hadoop system sends that final result to the client.
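To make this flow concrete, here is the classic word-count example as a small simulation (plain Python, not actual Hadoop code): the "job" is split into one map task per input line, the map outputs are grouped by key, and reduce tasks combine them into the final result:

```python
from collections import defaultdict

def map_task(line):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in line.split()]

def reduce_task(word, counts):
    """Reduce: combine all counts for one word."""
    return word, sum(counts)

lines = ["hello hadoop", "hello world"]

# The "Job Tracker" splits the work: one map task per line.
intermediate = defaultdict(list)
for line in lines:
    for word, one in map_task(line):    # runs on "Task Trackers"
        intermediate[word].append(one)  # shuffle: group values by key

# Reduce tasks produce the combined final result for the client.
result = dict(reduce_task(w, c) for w, c in intermediate.items())
print(result)  # {'hello': 2, 'hadoop': 1, 'world': 1}
```

In a real cluster the map and reduce calls run in parallel on different Slave Nodes; the sequential loop here only models the data flow.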

How Store and Compute Operations Work in Hadoop

All these Master and Slave Nodes are organized into a network of clusters. Each cluster is divided into racks, and each rack contains a set of nodes (commodity computers).

When the Hadoop system receives a "store" operation, such as storing a large data set into HDFS, it stores that data on 3 different nodes (since the replication factor is 3 by default). The complete data is not stored on one single node: the large data file is divided into manageable blocks, which are distributed across different nodes with 3 copies each.
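A minimal sketch of that placement idea (simplified: real HDFS placement is also rack-aware, which this toy round-robin scheme ignores):

```python
REPLICATION_FACTOR = 3  # HDFS default

def place_blocks(block_ids, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, blk in enumerate(block_ids):
        placement[blk] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["blk_0", "blk_1", "blk_2"], nodes)
for blk, replicas in placement.items():
    print(blk, "->", replicas)
# Every block ends up on 3 different nodes, e.g. blk_0 -> node1, node2, node3.
```

The key property is that no two replicas of the same block land on the same node, so losing one node never loses a block.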

When the Hadoop system receives a "compute" operation, it talks to nearby nodes to retrieve the needed blocks of data. If one or more nodes fail while reading data or computing, the system automatically picks those tasks back up by approaching any nearby, available node that holds a replica.
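The fault-tolerant read described above can be sketched like this (a toy model with invented names; a real HDFS client obtains the replica locations from the Name Node):

```python
def read_block(block_id, replica_nodes, cluster):
    """Try each replica in turn; skip nodes that are down."""
    for node in replica_nodes:
        if cluster.get(node, {}).get("up") and block_id in cluster[node]["blocks"]:
            return cluster[node]["blocks"][block_id]
    raise IOError(f"all replicas of {block_id} are unavailable")

cluster = {
    "node1": {"up": False, "blocks": {"blk_0": b"data..."}},  # failed node
    "node2": {"up": True,  "blocks": {"blk_0": b"data..."}},  # healthy replica
}

# node1 is down, so the read transparently falls back to node2.
print(read_block("blk_0", ["node1", "node2"], cluster))
```

Because every block has 3 replicas by default, a read only fails when all 3 replica nodes are unavailable at once.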

That's why the Hadoop system provides highly available and fault-tolerant Big Data solutions.

NOTE:-

  • The Hadoop 1.x architecture has a lot of limitations and drawbacks, so the Hadoop community re-evaluated and redesigned it into the Hadoop 2.x architecture.
  • The Hadoop 2.x architecture is quite different and resolves all of the Hadoop 1.x architecture's limitations and drawbacks.

That's all about the Hadoop 1.x architecture, the major Hadoop components, and how those components work together to fulfill client requirements. We will discuss the Hadoop 2.x architecture, its major components, and how those components work in my coming post.

We hope you now understand the Hadoop 1.x architecture and how it works.

Please drop me a comment if you like my post or have any issues/suggestions.

Translated from: https://www.journaldev.com/8808/hadoop1-architecture-and-how-major-components-works
