
Q:    I recently came to know that in Java 8 hash maps use a binary tree instead of a linked list, and that the hash code is used as the branching factor. I understand that under high collision the lookup cost is reduced from O(n) to O(log n) by using binary trees. My question is what good this really does, as the amortized time complexity is still O(1). Maybe if you force all the entries into the same bucket by providing the same hash code for all keys you would see a significant time difference, but no one in their right mind would do that.

A binary tree also uses more space than a singly linked list, as it stores both left and right nodes. Why increase the space complexity when there is absolutely no improvement in time complexity except for some spurious test cases?


A:    This is mostly a security-related change. While in normal situations it's rarely possible to have many collisions, if hash keys arrive from an untrusted source (e.g. HTTP header names received from the client), then it's possible, and not very hard, to specially craft the input so that the resulting keys all have the same hash code. If you then perform many look-ups, you may experience denial of service. It appears that there's quite a lot of code in the wild that is vulnerable to this kind of attack, so it was decided to fix this on the Java side.

For more information refer to JEP-180.
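To make the "specially crafted input" concrete, here is a minimal sketch of one well-known way to build many distinct strings with the same hash code. It relies only on the documented String.hashCode() formula, under which "Aa" and "BB" both hash to 2112. The class and method names (CollidingStrings, generate) are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: builds distinct strings that all share one String.hashCode() value.
class CollidingStrings {
    // "Aa" and "BB" have the same hash code and the same length, so any
    // sequence of these two blocks of a given length hashes identically.
    static List<String> generate(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>(result.size() * 2);
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = generate(10); // 2^10 = 1024 distinct strings
        int hash = keys.get(0).hashCode();
        boolean allEqual = keys.stream().allMatch(k -> k.hashCode() == hash);
        System.out.println(keys.size() + " keys, all same hash: " + allEqual);
        // prints "1024 keys, all same hash: true"
    }
}
```

If keys like these are used as map keys (for example as header names), they all end up in the same bucket, which is the situation the answer describes.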


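PS (from the original article):

When the hash function was designed, the table length n was already a power of two, so the bucket index is computed with a bitwise & rather than the % remainder operator: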

(n - 1) & hash

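The designers considered this approach prone to collisions. Why? Consider n - 1 = 15 (binary 1111): only the low 4 bits of the hash actually take part in choosing the bucket, so collisions come easily.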
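A small sketch of this effect, assuming a table of length 16. The class name IndexMask and the sample hash values are chosen only for illustration; the hashes differ in their high bits but share the same low 4 bits.

```java
// Sketch: with n = 16, only the low 4 bits of the hash select the bucket.
class IndexMask {
    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        // These values differ only above the low 4 bits.
        int[] hashes = {0x00005, 0x10005, 0x20005, 0x7FFF0005};
        for (int h : hashes) {
            System.out.println(Integer.toHexString(h) + " -> " + indexFor(h, n));
        }
        // every line ends in "-> 5"
    }
}
```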

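So the designers chose a compromise that weighs speed, utility, and quality: XOR the high 16 bits of the hash into the low 16 bits. They also explain that most hashCode implementations are already reasonably well distributed, and that any collisions that do occur are handled by an O(log n) tree. A single XOR keeps the overhead low while avoiding collisions caused by the high bits not taking part in the index computation when the table is small.

What happens if collisions are still frequent? The source comment says that trees are used to handle them ("we use trees to handle large sets of collisions in bins"), and JEP-180 describes the problem: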
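The XOR step can be sketched as follows. The expression h ^ (h >>> 16) is the spreading step described above; the class name HashSpread and the sample values are assumptions made for this example.

```java
// Sketch: XOR-ing the high 16 bits into the low 16 bits before masking.
class HashSpread {
    // Fold the high half of the hash into the low half.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        int a = 0x10005;
        int b = 0x20005; // same low 16 bits as a, different high bits

        System.out.println("without spread: " + indexFor(a, n) + " and " + indexFor(b, n));
        // prints "without spread: 5 and 5"
        System.out.println("with spread: " + indexFor(spread(a), n) + " and " + indexFor(spread(b), n));
        // prints "with spread: 4 and 7"
    }
}
```

The two sample hashes collide when only the low 4 bits are used, and separate once the high bits are mixed in. Keys whose full 32-bit hash codes are equal still collide, which is the case the tree bins are meant for.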

Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store map entries. Implement the same improvement in the LinkedHashMap class.

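As mentioned earlier, retrieving an element from a HashMap takes essentially two steps:

  1. Hash the key's hashCode(), then determine the bucket index;
  2. If the key of the bucket's first node is not the one we want, search the chain using key.equals().

Before Java 8, collisions were resolved with a linked list, so a get that hits a collision costs O(1) + O(n) for the two steps. When collisions are severe, n is large and the O(n) term clearly hurts performance.

Java 8 therefore replaces the linked list with a red-black tree once a bucket grows large, which brings the cost to O(1) + O(log n) and handles large n reasonably well. The article "Java 8:HashMap的性能提升" (HashMap performance improvements in Java 8) includes performance test results.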
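The two steps above can be sketched with a deliberately simplified chained table. This is not the JDK's code: the names ChainedLookup, Node, newTable, put, and get are made up for this illustration, null keys are not supported, and the table never resizes.

```java
// Sketch: a fixed-size chained hash table showing the two lookup steps.
class ChainedLookup {
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // n is assumed to be a power of two.
    @SuppressWarnings("unchecked")
    static <K, V> Node<K, V>[] newTable(int n) {
        return (Node<K, V>[]) new Node[n];
    }

    static int hash(Object key) {
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    static <K, V> void put(Node<K, V>[] table, K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && key.equals(e.key)) {
                e.value = value; // key already present: overwrite
                return;
            }
        }
        table[i] = new Node<>(h, key, value, table[i]); // prepend to the chain
    }

    static <K, V> V get(Node<K, V>[] table, Object key) {
        int h = hash(key);
        // Step 1: hash the key and pick the bucket, O(1).
        Node<K, V> e = table[(table.length - 1) & h];
        // Step 2: walk the chain comparing keys, O(n) in the chain length.
        for (; e != null; e = e.next) {
            if (e.hash == h && key.equals(e.key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node<String, Integer>[] table = ChainedLookup.<String, Integer>newTable(16);
        put(table, "Aa", 1);
        put(table, "BB", 2); // same hash code as "Aa", so same bucket
        System.out.println(get(table, "BB")); // prints 2
        System.out.println(get(table, "CC")); // prints null
    }
}
```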
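A rough way to observe this yourself is sketched below. BadKey is a hypothetical key type invented for this example: it returns the same hash code for every instance, so all entries share one bucket, and it implements Comparable. The timings printed depend on the JVM and machine, so no expected numbers are given.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: look-up cost when every key lands in the same bucket.
class WorstCaseKeys {
    // Hypothetical key: constant hash code, ordered by id.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // every instance collides
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    // Looks up keys 0..n-1 and returns the elapsed time in nanoseconds.
    static long timeLookups(Map<BadKey, Integer> map, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            Integer v = map.get(new BadKey(i));
            if (v == null || v != i) {
                throw new IllegalStateException("wrong value for key " + i);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 4_000, 16_000}) {
            Map<BadKey, Integer> map = fill(n);
            long nanos = timeLookups(map, n);
            System.out.println(n + " colliding keys: " + nanos / 1_000_000.0 + " ms");
        }
    }
}
```

With a linked-list bucket the total look-up time would be expected to grow roughly quadratically in n, and with a tree bucket roughly as n log n. A single timed pass like this is only indicative; a proper benchmark harness would give more reliable numbers.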

JEP 180: Handle Frequent HashMap Collisions with Balanced Trees

Author Mike Duigou
Owner Brent Christian
Type Feature
Scope Implementation
Status Closed / Delivered
Release 8
Component core-libs
Discussion core dash libs dash dev at openjdk dot java dot net
Effort M
Duration M
Reviewed by Alan Bateman
Endorsed by Brian Goetz
Created 2013/02/08 20:00
Updated 2017/06/14 18:44
Issue 8046170

Summary

Improve the performance of java.util.HashMap under high hash-collision conditions by using balanced trees rather than linked lists to store map entries. Implement the same improvement in the LinkedHashMap class.

Motivation

Earlier work in this area in JDK 8, namely the alternative string-hashing implementation, improved collision performance for string-valued keys only, and it did so at the cost of adding a new (private) field to every String instance.

The changes proposed here will improve collision performance for any key type that implements Comparable. The alternative string-hashing mechanism, including the private hash32 field added to the String class, can then be removed.

Description

The principal idea is that once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree. In the case of high hash collisions, this will improve worst-case performance from O(n) to O(log n).

This technique has already been implemented in the latest version of the java.util.concurrent.ConcurrentHashMap class, which is also slated for inclusion in JDK 8 as part of JEP 155. Portions of that code will be re-used to implement the same idea in the HashMap and LinkedHashMap classes. Only the implementations will be changed; no interfaces or specifications will be modified. Some user-visible behaviors, such as iteration order, will change within the bounds of their current specifications.

We will not implement this technique in the legacy Hashtable class. That class has been part of the platform since Java 1.0, and some legacy code that uses it is known to depend upon iteration order. Hashtable will be reverted to its state prior to the introduction of the alternative string-hashing implementation, and will maintain its historical iteration order.

We also will not implement this technique in WeakHashMap. An attempt was made, but the complexity of having to account for weak keys resulted in an unacceptable drop in microbenchmark performance. WeakHashMap will also be reverted to its prior state.

There is no need to implement this technique in the IdentityHashMap class. It uses System.identityHashCode() to generate hash codes, so collisions are generally rare.

Testing

  • Run Map tests from Doug Lea's JSR 166 CVS workspace (includes a couple microbenchmarks)
  • Run performance tests of standard workloads
  • Possibly develop new microbenchmarks

Risks and Assumptions

This change will introduce some overhead for the addition and management of the balanced trees; we expect that overhead to be negligible.

This change will likely result in a change to the iteration order of the HashMap class. The HashMap specification explicitly makes no guarantee about iteration order. The iteration order of the LinkedHashMap class will be maintained.

Reposted from: https://my.oschina.net/u/2935389/blog/3041636
