by Shubheksha

Building on Quicksand: A Summary

Let’s try to break down the paper “Building On Quicksand” published by Pat Helland and David Campbell in 2009. All pull quotes are from the paper.

The paper focuses on the design of large, fault-tolerant, replicated distributed systems. It also discusses how such designs have evolved as requirements changed over time. It starts off by stating “Reliable systems have always been built out of unreliable components”.

As the granularity of the unreliable component grows (from a mirrored disk to a system to a data center), the latency to communicate with a backup becomes unpalatable. This leads to a more relaxed model for fault tolerance. The primary system will acknowledge the work request and its actions without waiting to ensure that the backup is notified of the work. This improves the responsiveness of the system because the user is not delayed behind a slow interaction with the backup.

Fault-tolerant systems can be made of many components. Their goal is to keep functioning when one of those components fails. We don’t consider Byzantine failures in this discussion. Instead, we assume the fail-fast model, in which a component either works correctly or fails.

The paper goes on to compare two versions of the Tandem NonStop system: one that used synchronous checkpointing and one that used asynchronous checkpointing. Refer to section 3 of the paper for all the details. I’d like to touch upon the difference between the two checkpointing strategies.

Synchronous checkpointing: with every write to the primary, the state needed to be sent to the backup. Only after the backup acknowledged the write did the primary send a response to the client that issued the write request. This ensured that when the primary failed, the backup could take over without losing any work.
Asynchronous checkpointing: the primary acknowledges and commits the write as soon as it has processed it, without waiting for a reply from the backup. This technique improves latency but poses other challenges, which are addressed later.
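
To make the difference concrete, here is a minimal sketch of the two strategies. The `Primary` and `Backup` classes, the `pending` queue, and the `flush` method are hypothetical names invented for illustration; they are not taken from the paper or from the Tandem NonStop system.

```python
class Backup:
    def __init__(self):
        self.state = {}

    def apply(self, key, value):
        self.state[key] = value


class Primary:
    def __init__(self, backup, synchronous):
        self.backup = backup
        self.synchronous = synchronous
        self.state = {}
        self.pending = []  # writes acknowledged but not yet checkpointed

    def write(self, key, value):
        self.state[key] = value
        if self.synchronous:
            # Checkpoint to the backup before acknowledging the client.
            self.backup.apply(key, value)
        else:
            # Acknowledge immediately; the checkpoint happens later.
            self.pending.append((key, value))
        return "ack"

    def flush(self):
        # Ship the queued checkpoints to the backup.
        while self.pending:
            self.backup.apply(*self.pending.pop(0))


sync_backup, async_backup = Backup(), Backup()
sync_primary = Primary(sync_backup, synchronous=True)
async_primary = Primary(async_backup, synchronous=False)

sync_primary.write("balance", 100)
async_primary.write("balance", 100)

# If both primaries crashed right now, only the synchronous backup
# would know about the acknowledged write.
print(sync_backup.state)   # {'balance': 100}
print(async_backup.state)  # {}
```

Both clients received an acknowledgement, but only one backup could take over without losing work. The asynchronous backup catches up only once `flush` runs.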

Log Shipping

A classic database system has a process that reads the log and ships it to a backup data-center. The normal implementation of this mechanism commits transactions at the primary system (acknowledging the user’s commit request) and asynchronously ships the log. The backup database replays the log, constantly playing catch-up.

The mechanism described above is called log shipping. The main problem it poses is that when the primary fails and the backup takes over, some recent transactions might be lost.

This inherently opens up a window in which the work is acknowledged to the client but it has not yet been shipped to the backup. A failure of the primary during this window will lock the work inside the primary for an unknown period of time. The backup will move ahead without knowledge of the locked-up work.
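
The window described in this quote can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the `LogShippingPrimary` class, its `ship` batching, and the `locked_up_work` helper are hypothetical and do not model any particular database.

```python
class LogShippingPrimary:
    def __init__(self):
        self.log = []      # committed transactions, in commit order
        self.shipped = 0   # count of log entries already sent to the backup

    def commit(self, txn):
        self.log.append(txn)
        return "committed"  # acknowledged before shipping

    def ship(self, backup_log, batch=1):
        # Send up to `batch` unshipped entries to the backup.
        entries = self.log[self.shipped:self.shipped + batch]
        backup_log.extend(entries)
        self.shipped += len(entries)

    def locked_up_work(self):
        # Acknowledged transactions the backup has never seen.
        return self.log[self.shipped:]


primary = LogShippingPrimary()
backup_log = []

for txn in ["t1", "t2", "t3"]:
    primary.commit(txn)
primary.ship(backup_log, batch=2)

# Suppose the primary fails here: the backup takes over with t1 and t2 only.
print(backup_log)                # ['t1', 't2']
print(primary.locked_up_work())  # ['t3']
```

Transaction `t3` was acknowledged to its client, yet it exists only inside the failed primary.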

The introduction of asynchrony into the system has an advantage in latency, response time and performance. However, it makes the system more prone to the possibility of losing work when the primary fails. There are two ways to deal with this:

Discard the work locked in the primary when it fails. Whether a system can do that depends on its requirements and business rules.
Have a recovery mechanism that syncs the primary with the backups when it comes back up and retries the lost work. This is possible only if the operations can be retried idempotently and can tolerate being retried out of order.

The system loses the notion of what the authors call “an authoritative truth”. Nobody knows the accurate state of the system at any given point in time if the work is locked in an unavailable backup or primary.

The authors conclude that business rules in a system with asynchronous checkpointing are probabilistic.

If a primary uses asynchronous checkpointing and applies a business rule on the incoming work, it is necessarily a probabilistic rule. The primary, despite its best intentions, cannot know it will be alive to enforce the business rules.

When the backup system that participates in the enforcement of these business rules is asynchronously tied to the primary, the enforcement of these rules inevitably becomes probabilistic!

The authors state that commutative operations, operations that can be reordered, can be executed independently, as long as the operation preserves business rules. However, this is hard to do with storage systems because the write operation isn’t commutative.
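
A small check makes the contrast visible. The sketch below assumes a toy account balance: the `deposits` and `writes` lists and both helper functions are hypothetical examples, not something the paper defines.

```python
from itertools import permutations


def apply_all(initial, operations):
    state = initial
    for op in operations:
        state = op(state)
    return state


def is_order_independent(initial, operations):
    # True if every ordering of the operations yields the same final state.
    results = {apply_all(initial, order) for order in permutations(operations)}
    return len(results) == 1


# Business-level operations: deposits add to a balance.
deposits = [lambda b: b + 10, lambda b: b + 25, lambda b: b + 5]

# Storage-level operations: each write overwrites the previous value.
writes = [lambda _: 10, lambda _: 25, lambda _: 5]

print(is_order_independent(0, deposits))  # True
print(is_order_independent(0, writes))    # False
```

The deposits commute, so replicas can apply them in any order. With plain writes, the final state depends on which write happened to arrive last.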

Another consideration is that the work of a single operation should be idempotent. That is, executing the operation more than once should leave the system in the same state as executing it once.

To ensure this, applications typically assign a unique number or ID to the work. This is assigned at the ingress to the system (i.e. whichever replica first handles the work). As the work request rattles around the network, it is easy for a replica to detect that it has already seen that operation and, hence, not do the work twice.
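
The ID-based deduplication described in this quote might look roughly like the following. The `Replica` class, its `seen` set, and the `new_request` helper are assumptions made for illustration; a real system would also need to bound or persist the set of seen IDs.

```python
import uuid


class Replica:
    def __init__(self):
        self.balance = 0
        self.seen = set()  # IDs of requests already applied

    def handle(self, request_id, amount):
        if request_id in self.seen:
            return False  # duplicate delivery: do nothing
        self.seen.add(request_id)
        self.balance += amount
        return True


def new_request(amount):
    # The ID is assigned once, at ingress, and travels with the request.
    return (str(uuid.uuid4()), amount)


replica = Replica()
request = new_request(50)

replica.handle(*request)
replica.handle(*request)  # retried or redelivered
print(replica.balance)    # 50
```

Because the ID travels with the request, a retry is harmless: the second delivery is recognized and ignored.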

The authors suggest that different operations within a system can provide different consistency guarantees, depending on the business requirements. Some operations can choose classic consistency over availability, and others the reverse.

Next, the authors argue that as soon as there is no notion of authoritative truth in a system, all computing boils down to three things: memories, guesses, and apologies.

Memories: you can only hope that your replica remembers what it has already seen.
Guesses: Due to only partial knowledge being available, the replicas take actions based on local state and may be wrong. “In any system which allows a degradation of the absolute truth, any action is, at best, a guess.” Any action in such a system has a high probability of being successful, but it’s still a guess.
Apologies: Mistakes are inevitable. Hence, every business needs to have an apology mechanism in place either through human intervention or by automating it.

The paper next discusses eventual consistency, using two examples: the Amazon shopping cart built on Dynamo and a system for clearing checks. A single replica identifies and processes the work coming into these systems. The work then flows to other replicas as and when connectivity permits. The requests coming into these systems are commutative (reorderable), so they can be processed at different replicas in different orders.

Storage systems alone cannot provide the commutativity we need to create robust systems that function with asynchronous checkpointing. We need the business operations to reorder. Amazon’s Dynamo does not do this by itself. The shopping cart application on top of the Dynamo storage system is responsible for the semantics of eventual consistency and commutativity. The authors think it is time for us to move past the examination of eventual consistency in terms of updates and storage systems. The real action comes when examining application based operation semantics.
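
One way to picture application-level operation semantics is a cart that stores operations rather than a single overwritten value. The sketch below is purely illustrative: `CartReplica`, its operation IDs, and the quantity deltas are hypothetical, and this is not Dynamo's API or a description of how Amazon's cart is actually implemented.

```python
from collections import Counter


class CartReplica:
    def __init__(self):
        self.ops = {}  # op_id -> (item, quantity_delta)

    def record(self, op_id, item, delta):
        self.ops[op_id] = (item, delta)

    def merge(self, other):
        # Union of operations; duplicates collapse because of the shared op_id.
        self.ops.update(other.ops)

    def contents(self):
        totals = Counter()
        for item, delta in self.ops.values():
            totals[item] += delta
        return {item: qty for item, qty in totals.items() if qty > 0}


a, b = CartReplica(), CartReplica()
a.record("op-1", "book", +2)
b.record("op-2", "pen", +1)
b.record("op-3", "book", -1)

a.merge(b)
b.merge(a)
print(a.contents() == b.contents())  # True
print(a.contents())                  # {'book': 1, 'pen': 1}
```

The replicas saw the operations in different orders, yet they converge, because the cart is computed from a set of reorderable operations rather than from the last write.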

Next, they discuss two strategies for allocating resources in replicas that might not be able to communicate with each other:

Over-provisioning: the resources are partitioned between replicas. Each replica has a fixed subset of resources that it can allocate. No replica can allocate a resource that’s not actually available.
Over-booking: the resources can be individually allocated without ensuring strict partitioning. This may lead to replicas allocating a resource that’s not available, promising something they can’t deliver.
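
The trade-off between the two strategies can be sketched with two disconnected replicas selling the same ten seats. Everything here (`TOTAL_SEATS`, the two replica classes, and the `sell` helper) is a hypothetical illustration rather than anything specified in the paper.

```python
TOTAL_SEATS = 10


class PartitionedReplica:
    """Over-provisioning: may only hand out its own fixed quota."""

    def __init__(self, quota):
        self.quota = quota
        self.allocated = 0

    def allocate(self):
        if self.allocated >= self.quota:
            return False  # refuses, even if another replica has spare seats
        self.allocated += 1
        return True


class OverbookingReplica:
    """Over-booking: allocates against its local, possibly stale, view."""

    def __init__(self, believed_free):
        self.believed_free = believed_free
        self.allocated = 0

    def allocate(self):
        if self.believed_free <= 0:
            return False
        self.believed_free -= 1
        self.allocated += 1
        return True


def sell(replicas, requests_each):
    # Count the successful allocations across all replicas.
    return sum(r.allocate() for r in replicas for _ in range(requests_each))


partitioned = [PartitionedReplica(5), PartitionedReplica(5)]
overbooked = [OverbookingReplica(TOTAL_SEATS), OverbookingReplica(TOTAL_SEATS)]

print(sell(partitioned, 8))  # 10: never exceeds TOTAL_SEATS
print(sell(overbooked, 8))   # 16: six promises that need an apology
```

Partitioning never over-commits but turns away customers while seats may sit unused elsewhere. Over-booking accepts more work at the cost of occasional apologies.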

The paper also talks about what it calls the “seat reservation pattern”. This is a compromise between over-provisioning and over-booking:

Anyone who has purchased tickets online will recognize the “Seat Reservation” pattern where you can identify potential seats and then you have a bounded period of time, (typically minutes), to complete the transaction. If the transaction is not successfully concluded within the time period, the seats are once again marked as “available”.
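
The pattern in this quote amounts to a small state machine with a timeout. In the sketch below, the `Seat` class, the 300-second `HOLD_SECONDS` value, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all assumptions made for illustration.

```python
HOLD_SECONDS = 300  # assumed hold window; the quote only says "typically minutes"


class Seat:
    def __init__(self):
        self.state = "available"  # available -> held -> purchased
        self.holder = None
        self.expires_at = None

    def _expire(self, now):
        # Release the seat if a hold has run past its deadline.
        if self.state == "held" and now >= self.expires_at:
            self.state, self.holder, self.expires_at = "available", None, None

    def hold(self, user, now):
        self._expire(now)
        if self.state != "available":
            return False
        self.state, self.holder, self.expires_at = "held", user, now + HOLD_SECONDS
        return True

    def purchase(self, user, now):
        self._expire(now)
        if self.state == "held" and self.holder == user:
            self.state = "purchased"
            return True
        return False


seat = Seat()
print(seat.hold("alice", now=0))        # True
print(seat.hold("bob", now=60))         # False: alice still holds it
print(seat.purchase("alice", now=400))  # False: the hold expired at 300
print(seat.hold("bob", now=400))        # True: the seat is available again
```

The hold is a temporary, bounded promise: the seat is neither sold nor free, and it returns to the pool on its own if the buyer never completes the purchase.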

ACID 2.0

The classic definition of ACID stands for “Atomic, Consistent, Isolated, and Durable”. Its goal is to make the application think that there is a single computer which isn’t doing anything else when the transaction is being processed. The authors propose a new definition for ACID, which stands for Associative, Commutative, Idempotent, and Distributed.

The goal for ACID2.0 is to succeed if the pieces of the work happen: At least once, anywhere in the system, in any order. This defines a new KIND of consistency. The individual steps happen at one or more system. The application is explicitly tolerant of work happening out of order. It is tolerant of the work happening more than once per machine, too.
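
A quick way to see why these properties give "at least once, anywhere, in any order" is to pick a merge operation that has all three algebraic properties and then shuffle and duplicate the deliveries. Set union is used below only as a convenient example; the `merge` and `deliver` functions and the operation names are hypothetical.

```python
from itertools import permutations


def merge(a, b):
    # Set union is associative, commutative, and idempotent.
    return a | b


x, y, z = frozenset({"op-1"}), frozenset({"op-2"}), frozenset({"op-1", "op-3"})

associative = merge(merge(x, y), z) == merge(x, merge(y, z))
commutative = merge(x, y) == merge(y, x)
idempotent = merge(x, x) == x


def deliver(messages):
    state = frozenset()
    for m in messages:
        state = merge(state, m)
    return state


# "At least once, ... in any order": duplicate and reorder the deliveries.
outcomes = {deliver(order) for order in permutations([x, y, z, y, x])}

print(associative, commutative, idempotent)  # True True True
print(len(outcomes))                         # 1
```

Every ordering, with duplicates included, ends in the same state, which is what lets replicas share information lazily.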

Going by the classic definition of ACID, a linear history is a basis for fault tolerance. If we want to achieve the same guarantees in a distributed system, it’ll require concurrency control mechanisms which “tend to be fragile”.

When the application is constrained to the additional requirements of commutativity and associativity, the world gets a LOT easier. No longer must the state be checkpointed across failure units in a synchronous fashion. Instead, it is possible to be very lazy about the sharing of information. This opens up offline, slow links, low-quality datacenters, and more.

In conclusion:

We have attempted to describe the patterns in use by many applications today as they cope with failures in widely distributed systems. It is the reorderability of work and repeatability of work that is essential to allowing successful application execution on top of the chaos of a distributed world in which systems come and go when they feel like it.

Translated from: https://www.freecodecamp.org/news/building-on-quicksand-a-summary-bc4e9e7c347/