Contents

  • I Numbers Everyone Should Know
    • 1. Google AppEngine Numbers
    • 2. Writes Are Expensive!
    • 3. Reads Are Cheap!
    • 4. Numbers Miscellaneous
    • 5. The Lessons
    • 6. The Techniques
    • 7. Sharded Counters
    • 8. Paging Through Comments
  • II Industry Legend - Jeff Dean

I Numbers Everyone Should Know

1. Google AppEngine Numbers

This group of numbers is from Brett Slatkin in Building Scalable Web Apps with Google App Engine.

2. Writes Are Expensive!

  • Datastore is transactional: writes require disk access
  • Disk access means disk seeks
  • Rule of thumb: 10ms for a disk seek
  • Simple math: 1s / 10ms = 100 seeks/sec maximum
  • Depends on:
    * The size and shape of your data
    * Doing work in batches (batch puts and gets)
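The arithmetic above can be sketched in a few lines. This is a back-of-the-envelope model only: the `seeks_per_write` and `entities_per_batch` parameters are hypothetical knobs, and treating a whole batch as costing the same seeks as a single write is a simplifying assumption, not a measured App Engine figure.

```python
# Back-of-the-envelope write throughput from the 10 ms disk seek rule of thumb.
SEEK_TIME_US = 10_000  # one disk seek, in microseconds

def max_writes_per_sec(seeks_per_write=1, entities_per_batch=1):
    """Upper bound on entities written per second by one serialized writer.

    Both parameters are illustrative assumptions: a write is modeled as a
    fixed number of seeks, and a batch is assumed to share those seeks.
    """
    microseconds_per_write = SEEK_TIME_US * seeks_per_write
    return entities_per_batch * 1_000_000 / microseconds_per_write

print(max_writes_per_sec())                       # one entity per seek
print(max_writes_per_sec(entities_per_batch=20))  # hypothetical batch of 20
```

Under this model the only ways to raise the ceiling are fewer seeks per write or more entities per batch, which is why the size and shape of the data and batching both matter.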

3. Reads Are Cheap!

  • Reads do not need to be transactional, just consistent
  • Data is read from disk once, then it’s easily cached
  • All subsequent reads come straight from memory
  • Rule of thumb: 250usec for 1MB of data from memory
  • Simple math: 1s / 250usec = 4GB/sec maximum
    * For a 1MB entity, that’s 4000 fetches/sec
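The same kind of estimate works for reads. The sketch below assumes decimal units (1 MB = 1000 KB), which is what the 4GB/sec figure implies, and the function name and the 10 KB entity size are made up for illustration.

```python
# Back-of-the-envelope read throughput from the 250 usec per 1 MB rule of thumb.
MEMORY_READ_US_PER_MB = 250

def memory_fetches_per_sec(entity_size_kb):
    """Upper bound on cached entity fetches per second, ignoring all overhead.

    Assumes 1 MB = 1000 KB to match the 4 GB/sec arithmetic in the text.
    """
    mb_per_sec = 1_000_000 / MEMORY_READ_US_PER_MB  # 4000 MB/sec
    return mb_per_sec * 1000 / entity_size_kb

print(memory_fetches_per_sec(1000))  # 1 MB entities
print(memory_fetches_per_sec(10))    # hypothetical 10 KB entities
```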

4. Numbers Miscellaneous

This group of numbers is from a presentation Jeff Dean gave at an Engineering All-Hands Meeting at Google.

  • L1 cache reference 0.5 ns
  • Branch mispredict 5 ns
  • L2 cache reference 7 ns
  • Mutex lock/unlock 100 ns
  • Main memory reference 100 ns
  • Compress 1K bytes with Zippy 10,000 ns
  • Send 2K bytes over 1 Gbps network 20,000 ns
  • Read 1 MB sequentially from memory 250,000 ns
  • Round trip within same datacenter 500,000 ns
  • Disk seek 10,000,000 ns
  • Read 1 MB sequentially from network 10,000,000 ns
  • Read 1 MB sequentially from disk 30,000,000 ns
  • Send packet CA->Netherlands->CA 150,000,000 ns
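Raw nanosecond counts are hard to compare by eye. As a small illustration, the list can be put in a dictionary, printed in friendlier units, and used to compute ratios between operations. The `humanize` helper is a made-up name for this sketch; the values are copied from the list above.

```python
# Latency figures from the list above, in nanoseconds.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 100,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 10_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from network": 10_000_000,
    "Read 1 MB sequentially from disk": 30_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}

def humanize(ns):
    """Format nanoseconds using the largest unit that keeps the value >= 1."""
    for unit, size in (("ms", 1_000_000), ("us", 1_000), ("ns", 1)):
        if ns >= size:
            return f"{ns / size:g} {unit}"
    return f"{ns:g} ns"

for name, ns in LATENCY_NS.items():
    print(f"{name:<40} {humanize(ns):>8}")

# How many 1 MB memory reads fit in the time of a single disk seek?
seek_vs_memory = (LATENCY_NS["Disk seek"]
                  / LATENCY_NS["Read 1 MB sequentially from memory"])
print(seek_vs_memory)
```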

5. The Lessons

  • Writes are 40 times more expensive than reads: a 10 ms disk seek versus 250 usec to read 1 MB from memory.
  • Global shared data is expensive. This is a fundamental limitation of distributed systems. Lock contention on heavily written shared objects kills performance because transactions become serialized and slow.
  • Architect for scaling writes.
  • Optimize for low write contention.
  • Optimize wide. Make writes as parallel as you can.

6. The Techniques

Keep in mind these are from a Google AppEngine perspective, but the ideas are generally applicable.

7. Sharded Counters

We always seem to want to keep count of things. But BigTable doesn’t keep a count of entities because it’s a key-value store. It’s very good at getting data by keys; it’s not interested in how many you have. So the job of keeping counts is shifted to you.

The naive counter implementation is lock-read-increment-write. This is fine if there are a low number of writes, but frequent updates cause high contention. Given that the number of writes that can be made per second is so limited, a high write load serializes and slows down the whole process.

The solution is to shard counters. This means:

  • Create N counters in parallel.
  • Pick a shard to increment transactionally at random for each item counted.
  • To get the real current count sum up all the sharded counters.
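  • Contention on any single counter drops to roughly 1/N of what it was, because each write competes only with writes to the same shard. Writes have been optimized because they are spread over the different shards, and the bottleneck around shared state has been removed.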
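The steps above can be sketched in memory. This is not the App Engine datastore API: the `ShardedCounter` class is hypothetical, and a per-shard `threading.Lock` stands in for the transaction that would protect each shard entity.

```python
import random
import threading

class ShardedCounter:
    """In-memory stand-in for a sharded datastore counter (illustration only).

    Each shard has its own lock, playing the role of one transactional entity.
    """

    def __init__(self, num_shards=20):
        self._counts = [0] * num_shards
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self):
        # Pick a shard at random so writers rarely collide on the same lock.
        i = random.randrange(len(self._counts))
        with self._locks[i]:
            self._counts[i] += 1

    def value(self):
        # Reads are cheap: sum every shard. The sum is not transactional,
        # so it may miss increments that are still in flight.
        return sum(self._counts)

def worker(counter, times):
    for _ in range(times):
        counter.increment()

counter = ShardedCounter(num_shards=8)
threads = [threading.Thread(target=worker, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value())  # 4000 once all four workers have finished
```

The number of shards is the tuning knob: more shards mean less contention per write but more reads to recover the total.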

This approach seems counter-intuitive because we are used to a counter being a single incrementable variable. Reads are cheap so we replace having a single easily read counter with having to make multiple reads to recover the actual count. Frequently updated shared variables are expensive so we shard and parallelize those writes.

With a centralized database, letting the database be the source of sequence numbers is doable. But to scale writes you need to partition, and once you partition it becomes difficult to keep any shared state like counters. You might argue that so common a feature should be provided by GAE and I would agree 100 percent, but it’s the ideas that count (pun intended).

8. Paging Through Comments

How can comments be stored such that they can be paged through in roughly the order they were entered?

Under a high write load this is a surprisingly hard question to answer. Obviously what you want is just a counter. As a comment is made you get a sequence number, and that’s the order in which comments are displayed. But as we saw in the last section, shared state like a single counter won’t scale in high write environments.

A sharded counter won’t work in this situation either because summing the sharded counters isn’t transactional. There’s no way to guarantee each comment will get back the sequence number it allocated, so we could have duplicates.
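A tiny deterministic simulation shows how duplicates arise. The shard values and the interleaving below are invented for illustration: two comments each read the total before either increment has landed.

```python
# Two counter shards with a current total of 5 (made-up starting values).
shards = [2, 3]

def read_total():
    # Summing shards is a plain read across entities, not a transaction.
    return sum(shards)

# Two comments arrive at the same moment. Each derives its sequence number
# from the total it sees before the other writer has committed.
seq_a = read_total() + 1
seq_b = read_total() + 1

shards[0] += 1  # writer A commits its increment to shard 0
shards[1] += 1  # writer B commits its increment to shard 1

print(seq_a, seq_b, read_total())  # both comments hold 6, yet the total is 7
```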

Searches in BigTable return data in alphabetical order. So what is needed for a key is something unique and alphabetical so when searching through comments you can go forward and backward using only keys.

A lot of paging algorithms use counts. Give me records 1-20, 21-40, etc. SQL makes this easy, but it doesn’t work for BigTable. BigTable knows how to get things by keys, so you must make keys that return data in the proper order.

In the grand old tradition of making unique keys we just keep appending stuff until it becomes unique. The suggested key for GAE is: time stamp + user ID + user comment ID.

Ordering by date is obvious. The good thing is that getting a time stamp is a local decision: it doesn’t rely on writes and it is scalable. The problem is that timestamps are not unique, especially with a lot of users.

So we can add the user name to the key to distinguish it from all other comments made at the same time. We already have the user name so this too is a cheap call.

Theoretically even time stamps for a single user aren’t sufficient. What we need then is a sequence number for each user’s comments.

And this is where the GAE solution turns into something totally unexpected. Our goal is to remove write contention, so we want to parallelize writes. And we have a lot of available storage, so we don’t have to worry about that.

With these forces in mind, the idea is to create a counter per user. When a user adds a comment it’s added to that user’s comment list and a sequence number is allocated. Comments are added in a transactional context on a per user basis using Entity Groups. So each added comment is guaranteed a unique sequence number because updates within an Entity Group are serialized.

The resulting key is guaranteed unique and sorts properly in alphabetical order. When paging, a query is made across entity groups using the ID index, and the results come back in the correct order. Paging is a matter of getting the previous and next keys in the query for the current page. These keys can then be used to move through the index.

I certainly would have never thought of this approach. The idea of keeping per user comment indexes is out there. But it cleverly follows the rules of scaling in a distributed system. Writes and reads are done in parallel and that’s the goal. Write contention is removed.

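II Industry Legend - Jeff Dean

Dean realized that many things he considered “common sense” were unknown to the vast majority of software engineers, so he put together the table of numbers below.

Numbers every computer engineer should know: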

  • L1 cache reference 0.5 ns
  • Branch mispredict 5 ns
  • L2 cache reference 7 ns
  • Mutex lock/unlock 100 ns
  • Main memory reference 100 ns
  • Compress 1K bytes with Zippy 10,000 ns
  • Send 2K bytes over 1 Gbps network 20,000 ns
  • Read 1 MB sequentially from memory 250,000 ns
  • Round trip within same datacenter 500,000 ns
  • Disk seek 10,000,000 ns
  • Read 1 MB sequentially from network 10,000,000 ns
  • Read 1 MB sequentially from disk 30,000,000 ns
  • Send packet CA->Netherlands->CA 150,000,000 ns

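Since then, software engineers doing system design have been able to consult this table to judge how the performance of different designs compares.

For Dean, the table was only a small thing. What makes him remarkable is that he and his partner Sanjay Ghemawat built the cornerstones of the distributed systems that support big data and machine learning.

The first problem Dean tackled after joining Google was how to store massive amounts of data efficiently. Put simply, before Dean and Sanjay, software engineers who wanted to complete important tasks or solve core problems had to buy very high-end machines, because more powerful computers deliver more computing capacity, and more powerful computers invariably cost more.

Dean broke with that convention. In his view, stitching together a great many very cheap machines could also deliver formidable computing power. This did more than save Google money; it opened up an entirely new direction.

The distributed storage, distributed computing, and some of the hardware and networking technology used across today’s cloud services all grew out of this direction, for example GFS, MapReduce, BigTable, Spanner, and TensorFlow.

Dean’s career shows what top experts are like: they are able to open up new fields, they overturn some first principles, and they raise the whole industry’s understanding to a different level, which in turn drives the industry forward.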
