Improving Kubernetes CronJobs at Scale (Part 1)

At Lyft, we chose to move our server infrastructure onto Kubernetes, a distributed container orchestration system, in order to take advantage of automation, have a solid platform we can build upon, and lower overall cost with efficiency gains.

Distributed systems can be difficult to reason about and understand, and Kubernetes is no exception. Despite the many benefits of Kubernetes, we discovered several pain points while adopting Kubernetes’ built-in CronJob as a platform for running repeated, scheduled tasks. In this two-part blog series, we will dive deep into the technical and operational shortcomings of Kubernetes CronJob at scale and share what we did to overcome them.

Part 1 (this article) of this series discusses in detail the shortcomings we’ve encountered using Kubernetes CronJob at Lyft. In part 2, we share what we did to address these issues in our Kubernetes stack to improve usability and reliability.

Who is this for?

  • Users of Kubernetes CronJob
  • Anyone building a platform on top of Kubernetes
  • Anyone interested in running distributed, scheduled tasks on Kubernetes
  • Anyone interested in learning about Kubernetes usage at scale in the real world
  • Kubernetes contributors

What will you gain from reading this?

  • Insight into how parts of Kubernetes (in particular, CronJob) behave at scale in the real world.
  • Lessons learned from using Kubernetes as a platform at a company like Lyft, and how we addressed the shortcomings.

Prerequisites

  • Basic familiarity with the cron concept

  • Basic understanding of how CronJob works, specifically the relationship between the CronJob controller, the Jobs it creates, and their underlying Pods, in order to better understand the CronJob deep-dives and comparisons with Unix cron later in this article (a minimal example manifest is sketched just after this list).

  • Familiarity with the sidecar container pattern and what it is used for. At Lyft, we make use of sidecar container ordering to make sure that runtime dependencies like Envoy, statsd, etc., packaged as sidecar containers, are up and running prior to the application container itself.
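
For orientation, the sketch below is a minimal, hypothetical CronJob manifest of the kind discussed throughout this article. All names and values are illustrative, but the fields shown (schedule, concurrencyPolicy, startingDeadlineSeconds, backoffLimit) are the ones referenced later on. The cronjobcontroller turns each scheduled invocation of this object into a Job, and the jobcontroller in turn creates a Pod that runs the containers.

    apiVersion: batch/v1beta1          # CronJob was still a beta API as of Kubernetes v1.18
    kind: CronJob
    metadata:
      name: example-cron               # hypothetical name
    spec:
      schedule: "0 * * * *"            # standard cron syntax: top of every hour
      concurrencyPolicy: Forbid        # disallow overlapping invocations (discussed later)
      startingDeadlineSeconds: 300     # skip an invocation that cannot start within this window
      jobTemplate:
        spec:
          backoffLimit: 3              # retries before the Job is marked as terminally failed
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: app            # application container; sidecars would sit alongside it
                  image: example/app:latest
                  command: ["/bin/run-my-task"]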

Background & Terminology

  • The cronjobcontroller is the piece of code in the Kubernetes control-plane that reconciles CronJobs

  • A cron is said to be invoked when it is executed by some machinery (usually in accordance with its schedule)

  • Lyft Engineering operates on a platform infrastructure model where there is an infrastructure team (henceforth referred to as platform team, platform engineers, or platform infrastructure) and the customers of the platform are other engineers at Lyft (henceforth referred to as developers, service developers, users, or customers). Engineers at Lyft own, operate, and maintain what they build, hence “operat-” is used throughout this article.

CronJobs at Lyft

Today at Lyft, we run nearly 500 cron tasks with more than 1,500 invocations per hour in our multi-tenant production Kubernetes environment.

Repeated, scheduled tasks are widely used at Lyft for a variety of use cases. Prior to adopting Kubernetes, these were executed using Unix cron directly on Linux boxes. Developer teams were responsible for writing their crontab definitions and provisioning the instances that run them using the Infrastructure As Code (IaC) pipelines that the platform infrastructure team maintained.

As part of a larger effort to containerize and migrate workloads to our internal Kubernetes platform, we chose to adopt Kubernetes CronJob* to replace Unix cron as a cron executor in this new, containerized environment. Like many, we chose Kubernetes for many of its theoretical benefits, one of which is efficient resource usage.

Consider a cron that runs once a week for 15 minutes. In our old environment, the machine running that cron is sitting idle 99.85% of the time. With Kubernetes CronJob, compute resources (CPU, memory) are only used during the lifetime of a cron invocation. The rest of the time, Kubernetes can efficiently use those resources to run other CronJobs or scale down the cluster altogether. Given the previous method for executing cron tasks, there was much to gain by transitioning to a model where jobs are made ephemeral.

Figure: The platform and developer ownership boundary in Lyft’s K8s stack

Since adopting Kubernetes as a platform, developer teams no longer provision and operate their own compute instances. Instead, the platform engineering team is responsible for maintaining and operating the compute resources and runtime dependencies used in our Kubernetes stack, as well as generating the Kubernetes CronJob objects themselves. Developers need only configure their cron schedule and application code.

This all sounds good on paper, but in practice, we discovered several pain points in moving crons away from the well-understood environment of traditional Unix cron to the distributed, ephemeral environment of Kubernetes using CronJob.

* while CronJob was, and still is (as of Kubernetes v1.18), a beta API, we found that it fit the bill for the requirements we had at the time, and further, it fit in nicely with the rest of the Kubernetes infrastructure tooling we had already built.

What’s so different about Kubernetes CronJob (versus Unix cron)?

Figure: A simplified sequence of events and K8s software components involved in executing a Kubernetes CronJob

To better understand why Kubernetes CronJobs can be difficult to work with in a production environment, we must first discuss what makes CronJob different. Kubernetes CronJobs promise to run like cron tasks on a Linux or Unix system; however, there are a few key differences in their behavior compared to a Unix cron: Startup Performance and Failure handling.

Startup Performance

We begin by defining start delay to be the wall time from expected cron start to application code actually executing. That is, if a cron is expected to run at 00:00:00, and the application code actually begins execution at 00:00:22, then the particular cron invocation has a start delay of 22 seconds.

Traditional Unix crons experience very minimal start delay. When it is time for a Unix cron to be invoked, the specified command just runs. To illustrate this, consider the following cron definition:

    # run the date command at midnight every night
    0 0 * * * date >> date-cron.log

With this cron definition, one can expect the following output:

    # date-cron.log
    Mon Jun 22 00:00:00 PDT 2020
    Tue Jun 23 00:00:00 PDT 2020

On the other hand, Kubernetes CronJobs can experience significant start delays because they require several events to happen prior to any application code beginning to run. Just to name a few:

  1. cronjobcontroller processes and decides to invoke the CronJob
  2. cronjobcontroller creates a Job out of the CronJob’s Job spec
  3. jobcontroller notices the newly created Job and creates a Pod
  4. Kubernetes admission controllers inject sidecar Container specs into the Pod spec*
  5. kube-scheduler schedules the Pod onto a kubelet
  6. kubelet runs the Pod (pulling all container images)
  7. kubelet starts all sidecar containers*
  8. kubelet starts the application container*

* unique to Lyft’s Kubernetes stack

At Lyft, we found that start delay was especially compounded by #1, #5, and #7 once we reached a certain scale of CronJobs in our Kubernetes environment.
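
As an illustration, one way to see part of this delay for a single invocation is to compare object timestamps directly (the namespace, Job name, and label values below are hypothetical):

    # When the cronjobcontroller created the Job for this invocation:
    kubectl -n my-namespace get job my-cron-1593648000 \
      -o jsonpath='{.metadata.creationTimestamp}'

    # When the application container actually started (while the Pod is still running;
    # for a finished Pod the timestamp lives under state.terminated.startedAt instead):
    kubectl -n my-namespace get pods -l job-name=my-cron-1593648000 \
      -o jsonpath='{.items[0].status.containerStatuses[0].state.running.startedAt}'

Note that this only captures the delay after the Job object exists; the gap between the cron’s scheduled time and the Job’s creation (step #1 above) has to be inferred from the schedule itself.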

Cronjobcontroller Processing Latency

To better understand where this latency comes from, let’s dive into the source code of the built-in cronjobcontroller. Through Kubernetes 1.18, the cronjobcontroller simply lists all CronJobs every 10 seconds and runs some controller logic over each one. The cronjobcontroller implementation does so synchronously, issuing at least one additional API call for every CronJob. When the number of CronJobs exceeds a certain threshold, these API calls begin to be rate-limited client-side. The latencies from the 10-second polling cycle and the API client rate-limiting add up and contribute to noticeable start delay for CronJobs.
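
As a rough, back-of-the-envelope illustration (assuming the controller-manager’s default client-side limit of roughly 20 requests per second, which is configurable and may differ per cluster):

    ~500 CronJobs x (at least 1 API call each) / ~20 requests per second  =  25+ seconds per sync pass

At that scale, a single synchronous pass can already take longer than the 10-second polling interval, before any other control-plane work is accounted for.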

Scheduling Cron Pods

Due to the nature of cron schedules, most crons are expected to run at the top of the minute (XX:YY:00). For example, an @hourly cron is expected to execute at 01:00:00, 02:00:00, and so on. In a multi-tenant cron platform with lots of crons scheduled to run every hour, every 15 minutes, every 5 minutes, etc., this produces hot-spots where lots of crons need to be invoked simultaneously. At Lyft, we noticed that one such hot spot is the top of the hour (XX:00:00). These hot-spots can put strain on, and expose additional client-side rate-limiting in, control-plane components involved in the happy path of CronJob execution, like the kube-scheduler and kube-apiserver, causing start delay to increase noticeably.

Additionally, if you do not provision compute for peak demand (and/or use a cloud-provider for compute instances) and instead use something like cluster autoscaler to dynamically scale nodes, then node launch times can contribute additional delays to launching CronJob Pods.

Pod Execution: Non-application Containers

Once a CronJob Pod has been successfully scheduled onto a kubelet, the kubelet needs to pull and execute the container images of all sidecars and the application itself. Due to the way Lyft uses sidecar ordering to gate application containers, if any of these sidecar containers are slow to start, or need to be restarted, they will propagate additional start delay.

To summarize, each of these events that happens prior to application code actually executing, combined with the scale of CronJobs in a multi-tenant environment, can introduce noticeable and unpredictable start delay. As we will see later on, this start delay can negatively affect the behavior of a CronJob in the real world by causing CronJobs to miss runs.

Container Failure handling

It is good practice to monitor the execution of crons. With Unix cron, doing so is fairly straightforward. Unix crons interpret the given command with the specified $SHELL, and, when the command exits (whether successful or not), that particular invocation is done. One rudimentary way of monitoring a Unix cron then is to introduce a command-wrapper script like so:
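
(The sketch below is illustrative: stat-and-log stands in for a hypothetical internal helper that emits a metric and a log line tagged with the exit code.)

    #!/bin/sh
    # Hypothetical cron wrapper: run the real command, then report its result exactly once.
    "$@"                          # the actual cron command, passed as arguments
    exitcode=$?
    stat-and-log "$exitcode"      # emit a metric and a log line with the exit code
    exit "$exitcode"

The crontab entry then invokes the wrapper instead of the command directly.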

With Unix cron, stat-and-log will be executed exactly once per complete cron invocation, regardless of the $exitcode. One can then use these metrics for simple alerts on failed executions.

With Kubernetes CronJob, where there are retries on failures by default and an execution can have multiple failure states (Job failure and container failure), monitoring is not as straightforward.

Using a similar script in an application container and with Jobs configured to restart on failure, a CronJob will instead repeatedly execute and spew metrics and logs up to a BackoffLimit number of times on failure, introducing lots of noise to a developer trying to debug it. Additionally, a naive alert using the first failure from the wrapper script can be un-actionable noise as the application container may recover and complete successfully on its own.

Alternatively, you could alert at the Job level instead of the application container level, using an API-layer metric for Job failures like kube_job_status_failed from kube-state-metrics. The drawback of this approach is that the on-call won’t be alerted until the Job has reached its terminal failure state after BackoffLimit has been exhausted, which can be much later than the first application container failure.
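
A hedged sketch of what such an alert might look like as a Prometheus rule (the metric comes from kube-state-metrics; the rule group name, namespace, and thresholds are illustrative):

    groups:
      - name: cronjob-alerts                     # hypothetical rule group
        rules:
          - alert: CronJobInvocationFailed
            # kube_job_status_failed is exported per Job by kube-state-metrics
            expr: kube_job_status_failed{namespace="my-namespace"} > 0
            for: 1m
            labels:
              severity: page
            annotations:
              summary: "Job {{ $labels.job_name }} has failed"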

What causes CronJobs to fail intermittently?

Non-negligible start delay and retry-on-failure loops contribute additional delay that can interfere with the repeated execution of Kubernetes CronJobs. For frequent CronJobs, or those with long application execution times relative to idling time, this additional delay can carry over into the next scheduled invocation. If the CronJob has ConcurrencyPolicy: Forbid set to disallow concurrent runs, then this carry-over causes future invocations to not execute on time and to get backed up.

Figure: Example timeline (from the perspective of the cronjobcontroller) where startingDeadlineSeconds is exceeded for a particular hourly CronJob — the CronJob misses its run and won’t be invoked until the next scheduled time

A more sinister scenario that we observed at Lyft where CronJobs can miss invocations entirely is when a CronJob has startingDeadlineSeconds set. In that scenario, when start delay exceeds the startingDeadlineSeconds, the CronJob will miss the run entirely. Additionally, if the CronJob also has ConcurrencyPolicy set to Forbid, a previous invocation’s retry-on-failure loop can also delay the next invocation, causing the CronJob to miss as well.

The Real-world operational burden of Kubernetes CronJobs

Since beginning to move these repeated, scheduled tasks onto Kubernetes, we found that using CronJob out-of-the-box introduced several pain points, from both the developers’ and the platform team’s points of view, that began to negate the benefits and cost savings we initially chose Kubernetes CronJob for. We soon realized that neither our developers nor the platform team was equipped with the necessary tools for operating and understanding the complex life cycles of CronJobs.

Developers at Lyft came to us with lots of questions and complaints when trying to operate and debug their Kubernetes CronJobs, like:

  • “Why isn’t my cron running?”
  • “I think my cron stopped running. How can I tell if my cron is actually running?”
  • “I didn’t know the cron wasn’t running, I just assumed it was!”
  • “How do I remedy X failed cron? I can’t just ssh in and run the command myself.”
  • “Can you explain why this cron seemed to miss a few schedules between X and Y [time periods]?”
  • “We have X (large number) of crons, each with their own alarms, and it’s becoming tedious/painful to maintain them all.”
  • “What is all this Job, Pod, and sidecar nonsense?”

As a platform team, we were not equipped to answer questions like:

  • How do we quantify the performance characteristics of our Kubernetes Cron platform?
  • What is the impact of on-boarding more CronJobs onto our Kubernetes environment?
  • How does running multi-tenant Kubernetes CronJobs perform compared to single-tenant Unix cron?
  • How do we begin to define Service-Level-Objectives (SLOs) to communicate with our customers?
  • What do we monitor and alarm on as platform operators to make sure platform-wide issues are tended to quickly with minimal impact on our customers?

Debugging CronJob failures is no easy task, and often requires an intuition for where failures happen and where to look for proof. Sometimes this evidence can be difficult to dig up, such as logs in the cronjobcontroller, which are only emitted at a high verbosity log-level. Or, the traces simply disappear after a certain time period, making debugging a game of “whack-a-mole”: for example, Kubernetes Events on the CronJob, Job, and Pod objects themselves are only retained for one hour by default. None of these methods is easy to use, and none scales well from a support point of view as more and more CronJobs land on the platform.
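
For example, before those Events expire they can be pulled with something along these lines (namespace and CronJob name are hypothetical):

    # Events recorded against the CronJob object itself; the same field selectors work
    # for its Jobs and Pods. By default these Events are only retained for about an hour.
    kubectl -n my-namespace get events \
      --field-selector involvedObject.kind=CronJob,involvedObject.name=my-cron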

In addition, sometimes Kubernetes would just quit scheduling a CronJob that had missed too many runs, requiring someone to manually “un-stick” it. This happens in real-world usage more often than you would think, and remedying it manually each time became painful.
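
For reference, the symptom here typically surfaces as an error from the cronjobcontroller along these lines (exact wording varies by Kubernetes version):

    Cannot determine if job needs to be started. Too many missed start time (> 100).
    Set or decrease .spec.startingDeadlineSeconds or check clock skew.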

This concludes the dive into the technical and operational issues we’ve encountered using Kubernetes CronJob at scale. In Part 2 we share what we did to address these issues in our Kubernetes stack to improve the usability and reliability of CronJobs.

As always, Lyft is hiring! If you’re passionate about Kubernetes and building infrastructure platforms, read more about them on our blog and join our team!

Original article: https://eng.lyft.com/improving-kubernetes-cronjobs-at-scale-part-1-cf1479df98d4
