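The leap smear is a way to avoid the system failures that an abrupt leap second can cause. It is also a useful reference for time synchronization in general.
Original article: https://googleblog.blogspot.com/2011/09/time-technology-and-leaping-seconds.html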

Google’s Site Reliability team is responsible for keeping Google’s services and data centers up and running 24/7. In this post, you’ll hear about a project our Site Reliability Engineers took on to make sure that the fluctuations of time don’t adversely affect Google’s products and services. If you like this (detailed) glimpse at the tech behind the scenes, come back for more about this team’s work in the future. -Ed.

Have you ever had a watch that ran slow or fast, and that you’d correct every morning off your bedside clock? Computers have that same problem. Many computers, including some desktop and laptop computers, use a service called the “Network Time Protocol” (NTP), which does something very similar—it periodically checks the computers’ time against a more accurate server, which may be connected to an external source of time, such as an atomic clock. NTP also takes into account variable factors like how long the NTP server takes to reply, or the speed of the network between you and the server when setting a to-the-second or better time on the computer you’re using.
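As a rough illustration of how reply time and network speed enter the calculation, here is a minimal sketch of the standard NTP offset and round-trip delay arithmetic based on four timestamps. The function name and the sample timestamps are hypothetical and chosen only for this example; a real NTP client also filters and averages many such samples.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t1: request sent      (client clock)
    t2: request received  (server clock)
    t3: reply sent        (server clock)
    t4: reply received    (client clock)
    """
    # Time spent on the network, excluding the server's processing time.
    delay = (t4 - t1) - (t3 - t2)
    # How far the client clock is behind (+) or ahead of (-) the server,
    # assuming the outbound and return trips take equally long.
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    return offset, delay


# Hypothetical exchange: the client clock is 0.5 s behind the server,
# each network leg takes 40 ms, and the server needs 10 ms to reply.
offset, delay = ntp_offset_and_delay(t1=100.000, t2=100.540,
                                     t3=100.550, t4=100.090)
print(f"offset={offset:.3f}s delay={delay:.3f}s")
```

The estimate is only exact when both network legs take the same time, which is one reason a client prefers a nearby, lightly loaded server.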

Soon after the advent of ticking clocks, scientists observed that the time told by them (and now, much more accurate clocks), and the time told by the Earth’s position were rarely exactly the same. It turns out that being on a revolving imperfect sphere floating in space, being reshaped by earthquakes and volcanic eruptions, and being dragged around by gravitational forces makes your rotation somewhat irregular. Who knew?

These fluctuations in Earth’s rotational speed mean that even very accurate clocks, like the atomic clocks used by global timekeeping services, occasionally have to be adjusted slightly to bring them in line with “solar time.” There have been 24 such adjustments, called “leap seconds,” since they were introduced in 1972. Their effect on technology has become more and more profound as people come to rely on fast, accurate and reliable technology.

Why time matters at Google

Having accurate time is critical to everything we do at Google. Keeping replicas of data up to date, correctly reporting the order of searches and clicks, and determining which data-affecting operation came last are all examples of why accurate time is crucial to our products and to our ability to keep your data safe.

Very large-scale distributed systems, like ours, demand that time be well-synchronized and expect that time always moves forwards. Computers traditionally accommodate leap seconds by setting their clock backwards by one second at the very end of the day. But this “repeated” second can be a problem. For example, what happens to write operations that happen during that second? Does email that comes in during that second get stored correctly? What about all the unforeseen problems that may come up with the massive number of systems and servers that we run? Our systems are engineered for data integrity, and some will refuse to work if their time is sufficiently “wrong.” We saw some of our clustered systems stop accepting work on a small scale during the leap second in 2005, and while it didn’t affect the site or any of our data, we wanted to fix such issues once and for all.
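To make the “repeated” second concrete, here is a small sketch of a clock that is set back by one second at midnight, and of a naive “last write wins” rule that then picks the wrong write. The names `stepped_clock`, `MIDNIGHT` and the two sample writes are hypothetical and only serve the illustration; this is not a description of any particular Google system.

```python
MIDNIGHT = 86400.0  # clock reading at the end of the day, in seconds


def stepped_clock(real_elapsed):
    """Reading of a clock that is set back 1 s when it reaches midnight.

    real_elapsed is the true number of seconds since the start of the day.
    """
    if real_elapsed < MIDNIGHT:
        return real_elapsed
    return real_elapsed - 1.0  # the final second of the day is replayed


# Two writes in true chronological order: A first, then B half a second later.
writes = [("A", MIDNIGHT - 0.2), ("B", MIDNIGHT + 0.3)]

# Each write is stamped with the stepped clock's reading.
stamped = [(name, stepped_clock(t)) for name, t in writes]

# A naive "latest timestamp wins" rule now prefers A, although B came last.
winner = max(stamped, key=lambda w: w[1])[0]
print(stamped, "winner:", winner)
```

Any component that orders events by wall-clock timestamp can be misled in this way, which is why an audit of all time-dependent code would have been the alternative to fixing the clock itself.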

This was the problem that a group of our engineers identified during 2008, with a leap second scheduled for December 31. Given our observations in 2005, we wanted to be ready this time, and in the future. How could we make sure everything at Google stays running as if nothing happened, when all our server clocks suddenly see the same second happening twice? Also, how could we make this solution scale? Would we need to audit every line of code that cares about the time? (That’s a lot of code!)

The solution we came up with came to be known as the “leap smear.” We modified our internal NTP servers to gradually add a couple of milliseconds to every update, varying over a time window before the moment when the leap second actually happens. This meant that when it became time to add an extra second at midnight, our clocks had already taken this into account, by skewing the time over the course of the day. All of our servers were then able to continue as normal with the new year, blissfully unaware that a leap second had just occurred. We plan to use this “leap smear” technique again in the future, when new leap seconds are announced by the IERS.

Here’s the science bit

Usually when a leap second is almost due, NTP requires a server to indicate this to its clients by setting the “Leap Indicator” (LI) field in its response. This indicates that the last minute of that day will have 61 seconds, or 59 seconds. (Leap seconds can, in theory, be used to shorten a day too, although that hasn’t happened to date.) Rather than doing this, we applied a patch to the NTP server software on our internal Stratum 2 NTP servers so that they did not set LI and instead told a small “lie” about the time, modulating this “lie” over a time window w before midnight:

lie(t) = (1.0 - cos(pi * t / w)) / 2.0

What this did was make sure that the “lie” we were telling our servers about the time wouldn’t trigger any undesirable behavior in the NTP clients, such as causing them to suspect the time servers to be wrong and applying local corrections themselves. It also made sure the updates were sufficiently small so that any software running on the servers that were doing synchronization actions or had Chubby locks wouldn’t lose those locks or abandon any operations. It also meant this software didn’t necessarily have to be aware of or resilient to the leap second.
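The smear function can be tried out with a few lines of Python. This is only a sketch of the formula as published, not the patched NTP server code: the 20-hour window `W`, the 64-second update interval `POLL`, and the clamping of `t` to the window are assumptions made for the example, since the post does not state the values actually used.

```python
import math


def lie(t, w):
    """Fraction of the one-second lie applied t seconds into a window of w seconds."""
    t = min(max(t, 0.0), w)  # assumption: no lie before the window, full lie after it
    return (1.0 - math.cos(math.pi * t / w)) / 2.0


W = 20 * 3600.0  # hypothetical smear window: 20 hours
POLL = 64        # hypothetical interval between client updates, in seconds

# Extra lie introduced by each successive update across the window.
steps = [lie(t + POLL, W) - lie(t, W) for t in range(0, int(W), POLL)]

print(f"start={lie(0, W):.3f} middle={lie(W / 2, W):.3f} end={lie(W, W):.3f}")
print(f"largest single step: {max(steps) * 1000:.3f} ms")
```

The cosine shape starts and ends with a slope of zero, so the per-update change is smallest at the edges of the window and largest in the middle. With the assumed values, every step stays below two milliseconds while the steps add up to the full second.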

In an experiment, we performed two smears—one negative then one positive—and tested this setup using about 10,000 servers. We’d previously added monitoring to plot the skew between atomic time, our Stratum 2 servers and all those NTP clients, allowing us to constantly evaluate the performance of our time infrastructure. We were excited to see monitoring showing plots of those servers’ clocks tracking our model’s predictions, and that we were continuing to serve users’ requests without errors.

Following the successful test, we reconfigured all our production Stratum 2 NTP servers with details of the actual leap second, ready for New Year’s Eve, when they would automatically activate the smear for all production machines, without any further human intervention required. We had a “big red button” opt-out that allowed us to stop the smear in case anything went wrong.

What we learned

The leap smear is talked about internally in the Site Reliability Engineering group as one of our coolest workarounds. It took a lot of experimentation and verification, but paid off by ultimately saving us massive amounts of time and energy in inspecting and refactoring code. It meant that we didn’t have to sweep our entire (large) codebase, and Google engineers developing code don’t have to worry about leap seconds. The team involved in solving this issue was a handful of people, distributed around the world, who were able to work together without restriction in order to solve this problem.

The solution to this challenge drove a lot of thinking to develop better ways to implement locking and consistency, and synchronizing units of work between servers across the world. It also meant we thought more about the precision of our time systems, which has a knock-on effect on our ability to minimize resource wastage and run greener data centers by reducing the amount of time we must spend waiting for responses and by rarely doing excess work.

By anticipating potential problems and developing solutions like these, the Site Reliability Engineering group informs and inspires the development of new technology for distributed systems—the systems that you use every day in Google’s products.
