Notes on Developing Distributed Systems

by Shubheksha

Distributed Computing in a nutshell: How distributed systems work

This post distills the material presented in the paper titled “A Note on Distributed Systems” published in 1994 by Jim Waldo and others.

The paper presents the differences between local and distributed computing in the context of Object Oriented Programming. It explains why treating them the same is incorrect and leads to applications that aren’t robust or reliable.

Introduction

The paper kicks off by stating that the current work in distributed systems is modeled around objects — more specifically, a unified view of objects. Objects are defined by their supported interfaces and the operations they support.

Naturally, this can be extended to imply that objects in the same address space, or in a different address space on the same machine, or on a different machine, all behave in a similar manner. Their location is an implementation detail.

Let’s define the most common terms in this paper:

Local Computing

It deals with programs that are confined to a single address space only.

Distributed Computing

It deals with programs that can make calls to objects in different address spaces either on the same machine or on a different machine.

The Vision of Unified Objects

Implicit in this vision is that the system will be “objects all the way down.” This means that all current invocations, or calls for system services, will eventually be converted into calls that might be made to an object residing on some other machine. There is a single paradigm of object use and communication used no matter what the location of the object might be.

This refers to the assumption that all objects are defined only in terms of their interfaces. Their implementation, which includes the location of the object, is independent of their interfaces and hidden from the programmer.

As far as the programmer is concerned, they write the same type of call for every object, whether local or remote. The system takes care of sending the message, working out the underlying mechanisms that are not visible to the programmer writing the application.

The hard problems in distributed computing are not the problems of how to get things on and off the wire.

The paper goes on to define the toughest challenges of building a distributed system:

Latency
Memory access
Partial failure and concurrency

Ensuring reasonable performance while dealing with all of the above doesn’t make the life of a distributed systems engineer any easier. The lack of any central resource or state manager adds to the challenges. Let’s look at each of these in turn.

Latency

This is the fundamental difference between local and distributed object invocation.

The paper states that a remote call is four to five orders of magnitude slower than a local call. A system whose design fails to recognize this difference is bound to suffer from serious performance problems, especially if it relies heavily on remote communication.

You need to have a thorough understanding of the application being designed so you can decide which objects should be kept together and which can be placed remotely.
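
To make the cost of ignoring latency concrete, here is a rough back-of-the-envelope sketch. Every number in it is an assumption chosen for illustration (a 100 ns local call, and a remote call 10,000 times slower, the low end of the paper's range), and the two function names are hypothetical. It compares fetching 1,000 items with one remote call each against fetching them in a single remote call and then processing them locally.

```python
# All figures below are illustrative assumptions, not measurements.
LOCAL_CALL_NS = 100        # assumed cost of one local call
REMOTE_FACTOR = 10_000     # four orders of magnitude, the low end of the paper's range
REMOTE_CALL_NS = LOCAL_CALL_NS * REMOTE_FACTOR


def chatty_cost_ns(n_items: int) -> int:
    """Cost when every item is fetched with its own remote call."""
    return n_items * REMOTE_CALL_NS


def batched_cost_ns(n_items: int) -> int:
    """Cost of one remote call for the whole batch, plus one local call per item."""
    return REMOTE_CALL_NS + n_items * LOCAL_CALL_NS


if __name__ == "__main__":
    n = 1_000
    print(f"chatty:  {chatty_cost_ns(n) / 1e6:.1f} ms")   # 1000.0 ms
    print(f"batched: {batched_cost_ns(n) / 1e6:.1f} ms")  # 1.1 ms
```

Under these assumptions the same logical work differs by roughly three orders of magnitude depending on where the object boundary is drawn, which is why placement cannot be treated as an afterthought.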

If the goal is to unify local and remote calls despite the difference in latency, we have two options:

Rely on the hardware to get faster with time to eliminate the difference in efficiency.
Develop tools which allow us to visualize communication patterns between different objects and move them around as required. Since location is an implementation detail, this shouldn’t be too hard to achieve.

Memory

Another difference that’s very relevant to the design of distributed systems is the pattern of memory access between local and remote objects. A pointer in the local address space isn’t valid in a remote address space.

We’re left with two choices:

The developer must be made aware of the difference between the access patterns.
To unify local and remote access, we need to let the system handle all aspects of access to memory.

There are several ways to do that:

Distributed shared memory
Using the OOP (object-oriented programming) paradigm, compose a system entirely of objects, one that deals only with object references.

The transfer of data between address spaces can be dealt with by marshalling and unmarshalling the data by the layer underneath. This approach, however, makes the use of address-space-relative pointers obsolete.
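
A small sketch can show why marshalled data does not behave like a local reference. It uses Python's standard `pickle` module as a stand-in for the marshalling layer; the helper `send_to_other_address_space` and the `account` dictionary are hypothetical, and no real network hop takes place.

```python
import pickle


def send_to_other_address_space(obj):
    """Stand-in for a remote hop: marshal to bytes, then unmarshal a fresh copy."""
    wire_bytes = pickle.dumps(obj)
    return pickle.loads(wire_bytes)


account = {"owner": "alice", "entries": []}

alias = account                                      # local reference: the same object
remote_copy = send_to_other_address_space(account)   # what the "remote" side receives

alias["entries"].append(10)         # visible through `account`
remote_copy["entries"].append(99)   # NOT visible through `account`

print(account["entries"])       # [10]
print(remote_copy["entries"])   # [99]
print(remote_copy is account)   # False
```

The local alias and the original share state, while the unmarshalled copy diverges from the moment it is created. A call that looks identical in both cases therefore has different semantics depending on where the object lives.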

The danger lies in promoting the myth that “remote access and local access are exactly the same.” We should not reinforce this myth. An underlying mechanism that does not unify all memory accesses while still promoting this myth is both misleading and prone to error.

It’s important for programmers to be made aware of the various differences between accessing local and remote objects. We don’t want them to get bitten by not knowing what’s happening under the covers.

Partial failure & concurrency

Partial failure is a central reality of distributed computing.

The paper argues that both local and distributed systems are subject to failure. But it’s harder to discover what went wrong in the case of distributed systems.

For a local system, either everything is shut down or there is some central authority which can detect what went wrong (the OS, for example).

Yet, in the case of a distributed system, there is no global state or resource manager available to keep track of everything happening in and across the system. So there is no way to tell the components that are still functioning correctly which other components have failed. Components in a distributed system fail independently.

A central problem in distributed computing is insuring that the state of the whole system is consistent after such a failure. This is a problem that simply does not occur in local computing.
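
The following sketch illustrates why consistency after a failure is hard to guarantee. `FlakyCounterService` is a hypothetical in-process stand-in for a remote counter, and the `fault` parameter is an assumption added so that the two failure cases can be reproduced deterministically. In one case the request is lost before the server sees it; in the other the server does the work and the reply is lost.

```python
class FlakyCounterService:
    """Hypothetical in-process stand-in for a remote counter."""

    def __init__(self):
        self.value = 0

    def increment(self, fault=None):
        if fault == "request_lost":
            # The request never reached the server: nothing was applied.
            raise TimeoutError("no reply")
        self.value += 1
        if fault == "reply_lost":
            # The server applied the update, but the reply vanished.
            raise TimeoutError("no reply")
        return self.value


def try_increment(service, fault):
    """Return what the caller is able to observe about the call."""
    try:
        service.increment(fault)
        return "ok"
    except TimeoutError as exc:
        return f"timeout: {exc}"


a, b = FlakyCounterService(), FlakyCounterService()
seen_a = try_increment(a, "request_lost")
seen_b = try_increment(b, "reply_lost")

print(seen_a == seen_b)   # True: the caller sees the same thing in both cases
print(a.value, b.value)   # 0 1: the remote state differs
```

From the caller's side the two failures are indistinguishable, yet the remote state is different. A local call has no equivalent of this case.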

For a system to withstand partial failure, it’s important that it deals with indeterminacy, and that the objects react to it in a consistent manner. The interfaces must be able to state the cause of failure, if possible. And then allow the reconstruction of a “reasonable state” in case the cause can’t be determined.
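
One way such an interface might look is sketched below. This is not taken from the paper: `Outcome`, `RemoteResult`, `reconcile`, and the `read_back` callback are all hypothetical names. The idea is that a remote call reports a definite cause when it has one, reports "unknown" when it does not, and gives the caller a path back to a reasonable state in the unknown case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    OK = "ok"
    REJECTED = "rejected"   # the callee reported a definite cause; nothing was applied
    UNKNOWN = "unknown"     # no reply; the call may or may not have been applied


@dataclass
class RemoteResult:
    outcome: Outcome
    value: Optional[int] = None
    cause: Optional[str] = None


def reconcile(last_known: int, result: RemoteResult, read_back: Callable[[], int]) -> int:
    """Return a value the caller can safely continue from."""
    if result.outcome is Outcome.OK:
        return result.value
    if result.outcome is Outcome.REJECTED:
        return last_known    # cause is known, so the old state still holds
    return read_back()       # cause unknown: re-read the remote state


print(reconcile(5, RemoteResult(Outcome.OK, value=6), read_back=lambda: 6))               # 6
print(reconcile(5, RemoteResult(Outcome.REJECTED, cause="quota"), read_back=lambda: 6))   # 5
print(reconcile(5, RemoteResult(Outcome.UNKNOWN), read_back=lambda: 6))                   # 6
```

In a real system `read_back` is itself a remote call and can fail in the same ways, so this only moves the indeterminacy into a place where the caller is forced to handle it.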

The question is not “can you make remote method invocation look like local method invocation,” but rather “what is the price of making remote method invocation identical to local method invocation?”

Two approaches come to mind:

Treat all interfaces and objects as local. The problem with this approach is that it doesn’t take into account the failure modes associated with distributed systems. Therefore, it’s nondeterministic by nature.
Treat all interfaces and objects as remote. The flaw with this approach is that it over-complicates local computing. It adds a ton of work for objects that are never accessed remotely.

A better approach is to accept that there are irreconcilable differences between local and distributed computing, and to be conscious of those differences at all stages of the design and implementation of distributed applications.

Translated from: https://www.freecodecamp.org/news/a-note-on-distributed-systems-3c796f1eb0a0/