Before reading this post, please go through my previous post, "Introduction to Hadoop", to get some Apache Hadoop basics.

In this post, we will discuss the Apache Hadoop 1.x architecture and how its components work, in detail.

Post's Brief Table of Contents

  • Hadoop 1.x Architecture
  • Hadoop 1.x Major Components
  • How the Hadoop 1.x Major Components Work
  • How Store and Compute Operations Work in Hadoop

Hadoop 1.x Architecture

Apache Hadoop 1.x (and earlier versions) uses the following architecture. This is the Hadoop 1.x high-level architecture; we will discuss the detailed low-level architecture in the coming sections.

If you don't fully understand this architecture at this stage, there is no need to worry. Read the next sections of this post, as well as the coming posts, to understand it well.

  • The Hadoop Common module is the Hadoop base API (a JAR file) for all Hadoop components. All other components work on top of this module.
  • HDFS stands for Hadoop Distributed File System. It is also known as HDFS V1, as it is part of Hadoop 1.x. It serves as the distributed storage system in the Hadoop architecture.
  • MapReduce is a batch-processing or distributed data-processing module. It is built following Google's MapReduce algorithm. It is also known as "MR V1" or "Classic MapReduce", as it is part of Hadoop 1.x.
  • All remaining Hadoop ecosystem components work on top of these two major components: HDFS and MapReduce. We will discuss all Hadoop ecosystem components in detail in my coming posts.

NOTE:-
Hadoop 1.x MapReduce is also known as "Classic MapReduce", as it was developed following Google's MapReduce algorithm technical paper.

Hadoop 1.x Major Components

The Hadoop 1.x major components are HDFS and MapReduce. They are also known as the "Two Pillars" of Hadoop 1.x.

HDFS:
HDFS is the Hadoop Distributed File System, where our Big Data is stored using commodity hardware. It is designed to work with large data sets, with a default block size of 64 MB (we can change it per our project requirements).
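As a quick illustration (a minimal sketch in plain Python, not Hadoop code), splitting a file into 64 MB blocks is simple ceiling division:

```python
import math

DEFAULT_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the HDFS V1 default

def num_blocks(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE):
    """Number of HDFS blocks needed to store a file of the given size."""
    if file_size_bytes == 0:
        return 0
    return math.ceil(file_size_bytes / block_size)

# A 200 MB file needs 4 blocks: three full 64 MB blocks plus one 8 MB block.
print(num_blocks(200 * 1024 * 1024))  # → 4
```

Note that the last block of a file only occupies as much space as its actual data, not a full 64 MB.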

The HDFS component is further divided into two sub-components:

  1. Name Node

     The Name Node is placed in the Master Node. It stores metadata about the Data Nodes, such as how many blocks are stored on each Data Node, which Data Nodes hold data, slave node details, Data Node locations, timestamps, etc.

  2. Data Node

     Data Nodes are placed in the Slave Nodes. They store our application's actual data, in data blocks of 64 MB each by default.
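To make the metadata idea concrete, here is a toy sketch (plain Python with made-up file, block, and node names; not the real NameNode API) of the kind of mapping the Name Node keeps: which blocks make up a file, and which Data Nodes hold each block:

```python
# Toy in-memory "Name Node" metadata: file path -> ordered block IDs,
# and block ID -> the Data Nodes holding a replica of that block.
file_to_blocks = {
    "/logs/access.log": ["blk_001", "blk_002"],
}
block_to_datanodes = {
    "blk_001": ["datanode1", "datanode3", "datanode5"],
    "blk_002": ["datanode2", "datanode4", "datanode5"],
}

def locate(path):
    """Answer a client's question: where are the blocks of this file?"""
    return [(blk, block_to_datanodes[blk]) for blk in file_to_blocks[path]]

for blk, nodes in locate("/logs/access.log"):
    print(blk, "is stored on", nodes)
```

The actual data never passes through the Name Node; it only hands out these block locations so clients can talk to Data Nodes directly.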

MapReduce:
MapReduce is a distributed data-processing or batch-processing programming model. Like HDFS, the MapReduce component also uses commodity hardware to process a "high volume of a variety of data at a high velocity" in a reliable and fault-tolerant manner.

The MapReduce component is further divided into two sub-components:

  1. Job Tracker

     The Job Tracker assigns MapReduce tasks to Task Trackers in the cluster of nodes. Sometimes it reassigns the same tasks to other Task Trackers, when the previous Task Trackers have failed or shut down.

     The Job Tracker maintains the status of every Task Tracker: up/running, failed, recovered, etc.

  2. Task Tracker

     The Task Tracker executes the tasks assigned by the Job Tracker and sends the status of those tasks back to the Job Tracker.
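The assign-and-reassign behaviour described above can be sketched as follows (a simplified simulation with invented tracker names; not the actual Job Tracker implementation):

```python
# Simplified "Job Tracker": assign tasks round-robin, then move the
# tasks of a failed Task Tracker onto the trackers that are still up.
def assign(tasks, trackers):
    assignments = {t: [] for t in trackers}
    for i, task in enumerate(tasks):
        assignments[trackers[i % len(trackers)]].append(task)
    return assignments

def reassign_on_failure(assignments, failed, alive):
    orphaned = assignments.pop(failed)  # tasks stranded on the dead tracker
    for i, task in enumerate(orphaned):
        assignments[alive[i % len(alive)]].append(task)
    return assignments

assignments = assign(["map-0", "map-1", "map-2", "map-3"], ["tt1", "tt2"])
# tt1 fails; its tasks are reassigned to tt2.
assignments = reassign_on_failure(assignments, "tt1", ["tt2"])
print(assignments)  # {'tt2': ['map-1', 'map-3', 'map-0', 'map-2']}
```

The real Job Tracker detects failures through missed heartbeats; this sketch only models the reassignment step itself.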

We will discuss these four sub-components' responsibilities, and how they interact with each other to perform a client application's tasks, in detail in the next section.

How Hadoop 1.x Major Components Work

Hadoop 1.x components follow this architecture to interact with each other and to work in parallel in a reliable and fault-tolerant manner.

Hadoop 1.x Components High-Level Architecture

  • Both the Master Node and the Slave Nodes contain two Hadoop components:
  1. HDFS component
  2. MapReduce component
  • The Master Node's HDFS component is also known as the "Name Node".
  • The Slave Node's HDFS component is also known as the "Data Node".
  • The Master Node's "Name Node" component stores the metadata.
  • The Slave Node's "Data Node" component stores our application's actual Big Data.
  • HDFS stores data in "data blocks" of 64 MB by default.
  • The Master Node's MapReduce component is also known as the "Job Tracker".
  • The Slave Node's MapReduce component is also known as the "Task Tracker".
  • The Master Node's "Job Tracker" takes care of assigning tasks to the "Task Trackers" and receiving results from them.
  • The Slave Node's MapReduce component, the "Task Tracker", runs two kinds of MapReduce tasks:
    1. Map Task
    2. Reduce Task

    We will discuss MapReduce tasks (Mapper and Reducer) in detail in my coming post, with some simple end-to-end examples.

  • The Slave Node's "Task Tracker" actually performs the client's tasks, using the MapReduce batch-processing model.
  • The Master Node is the primary node that takes care of all the remaining Slave Nodes (secondary nodes).
Hadoop 1.x Components In-Detail Architecture

Hadoop 1.x Architecture Description

  • Clients (one or more) submit their work to the Hadoop system.
  • When the Hadoop system receives a client request, it is first received by the Master Node.
  • The Master Node's MapReduce component, the "Job Tracker", is responsible for receiving the client's work; it divides the work into manageable independent tasks and assigns them to Task Trackers.
  • The Slave Node's MapReduce component, the "Task Tracker", receives those tasks from the "Job Tracker" and performs them using the MapReduce components.
  • Once all Task Trackers have finished their jobs, the Job Tracker takes those results and combines them into a final result.
  • Finally, the Hadoop system sends that final result to the client.
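To make this flow concrete, here is the classic word-count example as a small simulation (plain Python, not actual Hadoop code): the "job" is split into one map task per input line, the map outputs are grouped by key, and reduce tasks combine them into the final result:

```python
from collections import defaultdict

def map_task(line):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in line.split()]

def reduce_task(word, counts):
    """Reduce: combine all counts for one word."""
    return word, sum(counts)

lines = ["hello hadoop", "hello world"]

# The "Job Tracker" splits the work: one map task per line.
intermediate = defaultdict(list)
for line in lines:
    for word, one in map_task(line):    # runs on "Task Trackers"
        intermediate[word].append(one)  # shuffle: group values by key

# Reduce tasks produce the combined final result for the client.
result = dict(reduce_task(w, c) for w, c in intermediate.items())
print(result)  # {'hello': 2, 'hadoop': 1, 'world': 1}
```

In a real cluster the map and reduce calls run in parallel on different Slave Nodes; the sequential loop here only models the data flow.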

How Store and Compute Operations Work in Hadoop

All these Master and Slave Nodes are organized into a network of clusters. Each cluster is divided into racks, and each rack contains a set of nodes (commodity computers).

When the Hadoop system receives a "store" operation, such as storing a large data set into HDFS, it stores that data on 3 different nodes (since the replication factor is 3 by default). The complete data is not stored on one single node: the large data file is divided into manageable blocks, which are distributed across different nodes with 3 copies each.
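A minimal sketch of that placement idea (simplified: real HDFS placement is also rack-aware, which this toy round-robin scheme ignores):

```python
REPLICATION_FACTOR = 3  # HDFS default

def place_blocks(block_ids, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes, round-robin."""
    placement = {}
    for i, blk in enumerate(block_ids):
        placement[blk] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement

nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(["blk_0", "blk_1", "blk_2"], nodes)
for blk, replicas in placement.items():
    print(blk, "->", replicas)
# Every block ends up on 3 different nodes, e.g. blk_0 -> node1, node2, node3.
```

The key property is that no two replicas of the same block land on the same node, so losing one node never loses a block.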

When the Hadoop system receives a "compute" operation, it talks to nearby nodes to retrieve the needed blocks of data. If one or more nodes fail while reading data or computing, the system automatically picks those tasks back up by approaching any nearby, available node that holds a replica.
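The fault-tolerant read described above can be sketched like this (a toy model with invented names; a real HDFS client obtains the replica locations from the Name Node):

```python
def read_block(block_id, replica_nodes, cluster):
    """Try each replica in turn; skip nodes that are down."""
    for node in replica_nodes:
        if cluster.get(node, {}).get("up") and block_id in cluster[node]["blocks"]:
            return cluster[node]["blocks"][block_id]
    raise IOError(f"all replicas of {block_id} are unavailable")

cluster = {
    "node1": {"up": False, "blocks": {"blk_0": b"data..."}},  # failed node
    "node2": {"up": True,  "blocks": {"blk_0": b"data..."}},  # healthy replica
}

# node1 is down, so the read transparently falls back to node2.
print(read_block("blk_0", ["node1", "node2"], cluster))
```

Because every block has 3 replicas by default, a read only fails when all 3 replica nodes are unavailable at once.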

That's why the Hadoop system provides highly available and fault-tolerant Big Data solutions.

NOTE:-

  • The Hadoop 1.x architecture has a lot of limitations and drawbacks, so the Hadoop community re-evaluated and redesigned it into the Hadoop 2.x architecture.
  • The Hadoop 2.x architecture is quite different and resolves all of the Hadoop 1.x architecture's limitations and drawbacks.

That's all about the Hadoop 1.x architecture, the major Hadoop components, and how those components work together to fulfill client requirements. We will discuss the Hadoop 2.x architecture, its major components, and how those components work in my coming post.

We hope you now understand the Hadoop 1.x architecture and how it works.

Please drop me a comment if you like my post or have any issues/suggestions.

Translated from: https://www.journaldev.com/8808/hadoop1-architecture-and-how-major-components-works
