First, let's look at the class-level Javadoc of HashMap:

/**
 * Hash table based implementation of the <tt>Map</tt> interface.  This
 * implementation provides all of the optional map operations, and permits
 * <tt>null</tt> values and the <tt>null</tt> key.  (The <tt>HashMap</tt>
 * class is roughly equivalent to <tt>Hashtable</tt>, except that it is
 * unsynchronized and permits nulls.)  This class makes no guarantees as to
 * the order of the map; in particular, it does not guarantee that the order
 * will remain constant over time.
 *
 * <p>This implementation provides constant-time performance for the basic
 * operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
 * disperses the elements properly among the buckets.  Iteration over
 * collection views requires time proportional to the "capacity" of the
 * <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
 * of key-value mappings).  Thus, it's very important not to set the initial
 * capacity too high (or the load factor too low) if iteration performance is
 * important.
 *
 * <p>An instance of <tt>HashMap</tt> has two parameters that affect its
 * performance: <i>initial capacity</i> and <i>load factor</i>.  The
 * <i>capacity</i> is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created.  The
 * <i>load factor</i> is a measure of how full the hash table is allowed to
 * get before its capacity is automatically increased.  When the number of
 * entries in the hash table exceeds the product of the load factor and the
 * current capacity, the hash table is <i>rehashed</i> (that is, internal data
 * structures are rebuilt) so that the hash table has approximately twice the
 * number of buckets.
 *
 * <p>As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs.  Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations.  If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.
 *
 * <p>If many mappings are to be stored in a <tt>HashMap</tt>
 * instance, creating it with a sufficiently large capacity will allow
 * the mappings to be stored more efficiently than letting it perform
 * automatic rehashing as needed to grow the table.  Note that using
 * many keys with the same {@code hashCode()} is a sure way to slow
 * down performance of any hash table. To ameliorate impact, when keys
 * are {@link Comparable}, this class may use comparison order among
 * keys to help break ties.
 *
 * <p><strong>Note that this implementation is not synchronized.</strong>
 * If multiple threads access a hash map concurrently, and at least one of
 * the threads modifies the map structurally, it <i>must</i> be
 * synchronized externally.  (A structural modification is any operation
 * that adds or deletes one or more mappings; merely changing the value
 * associated with a key that an instance already contains is not a
 * structural modification.)  This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the map.
 *
 * If no such object exists, the map should be "wrapped" using the
 * {@link Collections#synchronizedMap Collections.synchronizedMap}
 * method.  This is best done at creation time, to prevent accidental
 * unsynchronized access to the map:<pre>
 *   Map m = Collections.synchronizedMap(new HashMap(...));</pre>
 *
 * <p>The iterators returned by all of this class's "collection view methods"
 * are <i>fail-fast</i>: if the map is structurally modified at any time after
 * the iterator is created, in any way except through the iterator's own
 * <tt>remove</tt> method, the iterator will throw a
 * {@link ConcurrentModificationException}.  Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the
 * future.
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>
 */
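
The Javadoc's advice on sizing ("if the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur") can be turned into a small helper. This is only a sketch: the class name CapacitySketch and the method initialCapacityFor are made up for illustration and are not part of the JDK.

```java
import java.util.HashMap;
import java.util.Map;

class CapacitySketch {
    // HashMap's documented default load factor.
    static final float LOAD_FACTOR = 0.75f;

    /** Hypothetical helper: an initial capacity strictly greater than expectedEntries / loadFactor. */
    static int initialCapacityFor(int expectedEntries) {
        return (int) (expectedEntries / LOAD_FACTOR) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        // Pre-sizing means the table should not need to grow while these entries are added.
        Map<Integer, String> map = new HashMap<>(initialCapacityFor(expected), LOAD_FACTOR);
        for (int i = 0; i < expected; i++) {
            map.put(i, "v" + i);
        }
        System.out.println("requested capacity = " + initialCapacityFor(expected)
                + ", size = " + map.size());
    }
}
```

HashMap rounds the requested capacity up to a power of two internally, so the real table may be larger than the value computed here.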

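In summary, HashMap has the following characteristics:

  • Both keys and values may be null (there can be at most one null key).
  • It is not thread-safe.
  • The capacity is always a power of two; a requested initial capacity is rounded up to the nearest one.
  • The default load factor of 0.75 is a good trade-off between time and space; the implementation notes relate it to a Poisson distribution of bin sizes.
  • The default initial capacity is 16.
  • Load factor and initial capacity together determine how evenly entries are spread, how efficient lookups and inserts are, and when the array is resized.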
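
To make the "rounded up to a power of two" point concrete, here is a minimal sketch of the bit-smearing trick that JDK 8's HashMap uses in its tableSizeFor method. The wrapper class name PowerOfTwoSketch is an assumption for illustration.

```java
class PowerOfTwoSketch {
    static final int MAXIMUM_CAPACITY = 1 << 30;

    /** Returns the smallest power of two that is >= cap (capped at MAXIMUM_CAPACITY). */
    static int tableSizeFor(int cap) {
        int n = cap - 1;      // so that an exact power of two maps to itself
        n |= n >>> 1;         // copy the highest set bit into every lower position
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

    public static void main(String[] args) {
        for (int cap : new int[] {1, 10, 16, 17, 100}) {
            System.out.println(cap + " -> " + tableSizeFor(cap));
        }
    }
}
```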

/*
 * Implementation notes.
 *
 * This map usually acts as a binned (bucketed) hash table, but
 * when bins get too large, they are transformed into bins of
 * TreeNodes, each structured similarly to those in
 * java.util.TreeMap. Most methods try to use normal bins, but
 * relay to TreeNode methods when applicable (simply by checking
 * instanceof a node).  Bins of TreeNodes may be traversed and
 * used like any others, but additionally support faster lookup
 * when overpopulated. However, since the vast majority of bins in
 * normal use are not overpopulated, checking for existence of
 * tree bins may be delayed in the course of table methods.
 *
 * Tree bins (i.e., bins whose elements are all TreeNodes) are
 * ordered primarily by hashCode, but in the case of ties, if two
 * elements are of the same "class C implements Comparable<C>",
 * type then their compareTo method is used for ordering. (We
 * conservatively check generic types via reflection to validate
 * this -- see method comparableClassFor).  The added complexity
 * of tree bins is worthwhile in providing worst-case O(log n)
 * operations when keys either have distinct hashes or are
 * orderable, Thus, performance degrades gracefully under
 * accidental or malicious usages in which hashCode() methods
 * return values that are poorly distributed, as well as those in
 * which many keys share a hashCode, so long as they are also
 * Comparable. (If neither of these apply, we may waste about a
 * factor of two in time and space compared to taking no
 * precautions. But the only known cases stem from poor user
 * programming practices that are already so slow that this makes
 * little difference.)
 *
 * Because TreeNodes are about twice the size of regular nodes, we
 * use them only when bins contain enough nodes to warrant use
 * (see TREEIFY_THRESHOLD). And when they become too small (due to
 * removal or resizing) they are converted back to plain bins.  In
 * usages with well-distributed user hashCodes, tree bins are
 * rarely used.  Ideally, under random hashCodes, the frequency of
 * nodes in bins follows a Poisson distribution
 * (http://en.wikipedia.org/wiki/Poisson_distribution) with a
 * parameter of about 0.5 on average for the default resizing
 * threshold of 0.75, although with a large variance because of
 * resizing granularity. Ignoring variance, the expected
 * occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
 * factorial(k)). The first values are:
 *
 * 0:    0.60653066
 * 1:    0.30326533
 * 2:    0.07581633
 * 3:    0.01263606
 * 4:    0.00157952
 * 5:    0.00015795
 * 6:    0.00001316
 * 7:    0.00000094
 * 8:    0.00000006
 * more: less than 1 in ten million
 *
 * The root of a tree bin is normally its first node.  However,
 * sometimes (currently only upon Iterator.remove), the root might
 * be elsewhere, but can be recovered following parent links
 * (method TreeNode.root()).
 *
 * All applicable internal methods accept a hash code as an
 * argument (as normally supplied from a public method), allowing
 * them to call each other without recomputing user hashCodes.
 * Most internal methods also accept a "tab" argument, that is
 * normally the current table, but may be a new or old one when
 * resizing or converting.
 *
 * When bin lists are treeified, split, or untreeified, we keep
 * them in the same relative access/traversal order (i.e., field
 * Node.next) to better preserve locality, and to slightly
 * simplify handling of splits and traversals that invoke
 * iterator.remove. When using comparators on insertion, to keep a
 * total ordering (or as close as is required here) across
 * rebalancings, we compare classes and identityHashCodes as
 * tie-breakers.
 *
 * The use and transitions among plain vs tree modes is
 * complicated by the existence of subclass LinkedHashMap. See
 * below for hook methods defined to be invoked upon insertion,
 * removal and access that allow LinkedHashMap internals to
 * otherwise remain independent of these mechanics. (This also
 * requires that a map instance be passed to some utility methods
 * that may create new nodes.)
 *
 * The concurrent-programming-like SSA-based coding style helps
 * avoid aliasing errors amid all of the twisty pointer operations.
 */

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
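
The Poisson table in the implementation notes can be reproduced directly from the formula they quote, exp(-0.5) * pow(0.5, k) / factorial(k). The sketch below does that; the class and method names are illustrative and do not exist in the JDK.

```java
class PoissonBinSketch {
    /** Probability that a bin holds exactly k nodes under a Poisson distribution with mean lambda. */
    static double poisson(double lambda, int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-lambda) * Math.pow(lambda, k) / factorial;
    }

    public static void main(String[] args) {
        // lambda = 0.5 is the average parameter quoted in the implementation notes.
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, poisson(0.5, k));
        }
    }
}
```

With a well-behaved hashCode, a bin of 8 nodes is so unlikely (under one in ten million) that it makes sense as the point where the cost of a tree is worth paying.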

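Common HashMap interview questions:

  • What is HashMap's underlying data structure?

  • How do put and get work?

  • What changed between Java 7 and Java 8? (In 1.7, resize re-inserts entries at the head of the singly linked list, so a newly moved element always ends up first in its bucket. Entries that shared one chain in the old array may land in different buckets of the new array once their index is recomputed. When several threads resize at the same time, this can produce a cyclic linked list. 1.8 appends at the tail, which avoids the problem.)

  • Why is it not thread-safe? (The most common symptom under multithreading: there is no guarantee that a get returns the value that was just put, because another thread may have overwritten or lost it in between.)

  • Which thread-safe classes can replace it?

  • What is the default initial capacity? Why that value? Why is the capacity always a power of two? (Even distribution, and the bucket index can be computed with a cheap bit mask.)

  • How does HashMap resize? What is the load factor, and why that value?

  • What are HashMap's main parameters?

  • How does HashMap handle hash collisions?

  • How is the hash computed?

  • Fail-fast vs. fail-safe?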

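Underlying data structure:

In JDK 1.8 a HashMap is an array of bins, each holding a linked list. When a list grows longer than the default threshold of 8 nodes (TREEIFY_THRESHOLD), it is converted to a red-black tree to speed up lookups.

Before converting, treeifyBin(Node<K,V>[] tab, int hash) also checks the table length. If it is below 64 (MIN_TREEIFY_CAPACITY), the method calls resize() to double the array instead of building a tree; the bin is treeified only once the table has at least 64 slots.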
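
The two checks can be modelled as a small decision function. This is a simplified sketch, not the JDK code: TREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY are the real constant names and values, while the class TreeifySketch, the enum Action and the method afterInsert are invented for illustration.

```java
class TreeifySketch {
    static final int TREEIFY_THRESHOLD = 8;      // same value as in HashMap
    static final int MIN_TREEIFY_CAPACITY = 64;  // same value as in HashMap

    enum Action { KEEP_LIST, RESIZE, TREEIFY }

    /**
     * Hypothetical model of what happens to a bin after an insert.
     * binLength is the number of nodes in the bin after the insert,
     * tableLength is the current length of the bucket array.
     */
    static Action afterInsert(int binLength, int tableLength) {
        if (binLength <= TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;             // list is still short enough
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE;                // small table: grow it instead of treeifying
        }
        return Action.TREEIFY;                   // long list in a large table: build a tree
    }

    public static void main(String[] args) {
        System.out.println(afterInsert(8, 16));
        System.out.println(afterInsert(9, 16));
        System.out.println(afterInsert(9, 64));
    }
}
```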

Here is the Node class:

    /**
     * Basic hash bin node, used for most entries.  (See below for
     * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
     */
    static class Node<K,V> implements Map.Entry<K,V> {
        final int hash;
        final K key;
        V value;
        Node<K,V> next;

        Node(int hash, K key, V value, Node<K,V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }

        public final K getKey()        { return key; }
        public final V getValue()      { return value; }
        public final String toString() { return key + "=" + value; }

        public final int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }

        public final V setValue(V newValue) {
            V oldValue = value;
            value = newValue;
            return oldValue;
        }

        public final boolean equals(Object o) {
            if (o == this)
                return true;
            if (o instanceof Map.Entry) {
                Map.Entry<?,?> e = (Map.Entry<?,?>)o;
                if (Objects.equals(key, e.getKey()) &&
                    Objects.equals(value, e.getValue()))
                    return true;
            }
            return false;
        }
    }

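Resizing

When the number of entries exceeds the threshold (load factor × current capacity, not the initial capacity), the array is doubled and the existing entries are redistributed into the new array. In 1.8 the hash itself is not recomputed: each entry either stays at its old index or moves to old index + old capacity, depending on the bit selected by hash & oldCap.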
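
The effect of doubling on bucket indices can be checked with a few lines of arithmetic. The sketch below assumes the usual index formula (capacity - 1) & hash and the JDK 8 split rule based on hash & oldCap; the class and method names are illustrative only.

```java
class ResizeSplitSketch {
    /** Bucket index of a hash in a table whose length is a power of two. */
    static int indexFor(int hash, int capacity) {
        return (capacity - 1) & hash;
    }

    /** Index after doubling, derived from the old index without recomputing the hash. */
    static int indexAfterDoubling(int hash, int oldCap) {
        int oldIndex = (oldCap - 1) & hash;
        // The single bit at position oldCap decides: stay put, or move up by oldCap.
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    /** Resize threshold: load factor times the current capacity. */
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        int oldCap = 16;
        System.out.println("threshold at capacity 16: " + threshold(oldCap, 0.75f));
        for (int hash : new int[] {5, 21, 37}) {
            System.out.println("hash " + hash + ": index " + indexFor(hash, oldCap)
                    + " -> " + indexAfterDoubling(hash, oldCap));
        }
    }
}
```

All three sample hashes share bucket 5 in a table of 16; after doubling, only the one whose bit 16 is set moves to bucket 21.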

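Thread-safe alternatives

Collections.synchronizedMap, Hashtable, and ConcurrentHashMap. ConcurrentHashMap offers the highest concurrency: 1.7 uses segment locks, while 1.8 relies on CAS and synchronized. 1.8 still declares a Segment class internally, but only for serialization compatibility: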

/**
 * Stripped-down version of helper class used in previous version,
 * declared for the sake of serialization compatibility.
 */
static class Segment<K,V> extends ReentrantLock implements Serializable {
    final float loadFactor;
    Segment(float lf) { this.loadFactor = lf; }
}
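
As a usage sketch of the alternatives listed above, the following counts from several threads into a shared map. With ConcurrentHashMap (or a Collections.synchronizedMap wrapper) the merge call is atomic and no increment is lost; a plain HashMap gives no such guarantee. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCounterSketch {
    /** Starts `threads` workers that each increment the "hits" entry `perThread` times. */
    static int countConcurrently(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.merge("hits", 1, Integer::sum); // atomic on a thread-safe map
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(new ConcurrentHashMap<>(), 4, 1000));
        System.out.println(countConcurrently(
                Collections.synchronizedMap(new HashMap<>()), 4, 1000));
    }
}
```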

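Handling hash collisions

Keys that map to the same bucket are chained in that bin (a linked list, or a tree once the list is long enough). To reduce collisions in the first place, HashMap spreads the high bits of hashCode() into the low bits before computing the index: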

    /**
     * Computes key.hashCode() and spreads (XORs) higher bits of hash
     * to lower.  Because the table uses power-of-two masking, sets of
     * hashes that vary only in bits above the current mask will
     * always collide. (Among known examples are sets of Float keys
     * holding consecutive whole numbers in small tables.)  So we
     * apply a transform that spreads the impact of higher bits
     * downward. There is a tradeoff between speed, utility, and
     * quality of bit-spreading. Because many common sets of hashes
     * are already reasonably distributed (so don't benefit from
     * spreading), and because we use trees to handle large sets of
     * collisions in bins, we just XOR some shifted bits in the
     * cheapest possible way to reduce systematic lossage, as well as
     * to incorporate impact of the highest bits that would otherwise
     * never be used in index calculations because of table bounds.
     */
    static final int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }
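
A small experiment shows why the XOR with h >>> 16 matters. The sketch below copies the spreading expression from the listing above and compares bucket indices with and without it, for Integer keys that differ only in bits above a 16-slot mask. The class name and the index helper are assumptions for illustration.

```java
class HashSpreadSketch {
    /** Same expression as HashMap.hash(Object) in the listing above. */
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    /** Bucket index for a table whose length is a power of two. */
    static int index(int hash, int tableLength) {
        return (tableLength - 1) & hash;
    }

    public static void main(String[] args) {
        Integer[] keys = {0x10000, 0x20000, 0x30000}; // differ only in bits 16 and above
        for (Integer key : keys) {
            System.out.println("key 0x" + Integer.toHexString(key)
                    + ": raw index " + index(key.hashCode(), 16)
                    + ", spread index " + index(spread(key), 16));
        }
    }
}
```

Without spreading, all three keys fall into bucket 0 of a 16-slot table; with it, they land in buckets 1, 2 and 3.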

Fail-fast vs. fail-safe

    /**
     * An optimized version of AbstractList.Itr
     */
    private class Itr implements Iterator<E> {
        int cursor;       // index of next element to return
        int lastRet = -1; // index of last element returned; -1 if no such
        int expectedModCount = modCount;

        public boolean hasNext() {
            return cursor != size;
        }

        @SuppressWarnings("unchecked")
        public E next() {
            checkForComodification();
            int i = cursor;
            if (i >= size)
                throw new NoSuchElementException();
            Object[] elementData = ArrayList.this.elementData;
            if (i >= elementData.length)
                throw new ConcurrentModificationException();
            cursor = i + 1;
            return (E) elementData[lastRet = i];
        }

        public void remove() {
            if (lastRet < 0)
                throw new IllegalStateException();
            checkForComodification();

            try {
                ArrayList.this.remove(lastRet);
                cursor = lastRet;
                lastRet = -1;
                expectedModCount = modCount;
            } catch (IndexOutOfBoundsException ex) {
                throw new ConcurrentModificationException();
            }
        }

        @Override
        @SuppressWarnings("unchecked")
        public void forEachRemaining(Consumer<? super E> consumer) {
            Objects.requireNonNull(consumer);
            final int size = ArrayList.this.size;
            int i = cursor;
            if (i >= size) {
                return;
            }
            final Object[] elementData = ArrayList.this.elementData;
            if (i >= elementData.length) {
                throw new ConcurrentModificationException();
            }
            while (i != size && modCount == expectedModCount) {
                consumer.accept((E) elementData[i++]);
            }
            // update once at end of iteration to reduce heap write traffic
            cursor = i;
            lastRet = i - 1;
            checkForComodification();
        }

        final void checkForComodification() {
            if (modCount != expectedModCount)
                throw new ConcurrentModificationException();
        }
    }

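The listing above is ArrayList's iterator; HashMap's iterators apply the same modCount / expectedModCount check. Removing an element from a HashMap with remove changes modCount, and so do the other methods that structurally modify the map. When an iterator is created, the current modCount is copied into expectedModCount. While iterating, the iterator checks whether modCount has changed, and if it has, it throws ConcurrentModificationException.

// Adding or removing entries directly on a Hashtable while iterating over it raises an error.
// Hashtable, HashMap and other non-concurrent collections fail fast if entries are added or removed during iteration (the exception is thrown as soon as the modification is detected). Replacing the value of an existing key is not a structural modification and does not throw.
// java.util.ConcurrentModificationException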
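
The three cases described above (removing through the map, overwriting an existing value, removing through the iterator) can be tried out with a short single-threaded sketch. The class and method names are made up for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastSketch {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    /** Removes through the map while iterating; reports whether the iterator failed fast. */
    static boolean removeViaMapThrows() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.remove(key); // structural change behind the iterator's back
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Overwrites values of existing keys while iterating; modCount is not touched. */
    static boolean overwriteThrows() {
        Map<String, Integer> map = sample();
        try {
            for (String key : map.keySet()) {
                map.put(key, 0); // key already present: not a structural modification
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Removes every entry through the iterator itself; returns the remaining size. */
    static int removeViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            it.next();
            it.remove(); // keeps expectedModCount in sync with modCount
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("remove via map throws: " + removeViaMapThrows());
        System.out.println("overwrite throws: " + overwriteThrows());
        System.out.println("size after iterator.remove: " + removeViaIterator());
    }
}
```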

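A fail-safe iterator works on a copy of the underlying collection, so it is not affected by modifications to the source collection.

Containers that use the fail-safe mechanism do not traverse the live contents directly; they first copy the original contents and iterate over the copy.

How it works: because the iteration runs over a copy, modifications made to the original collection during the traversal are not detected by the iterator, so no ConcurrentModificationException is thrown.

Drawback: the copy avoids ConcurrentModificationException, but the iterator cannot see the modified contents either. It traverses the snapshot taken when iteration began and is unaware of any later change to the original collection.

Where it applies: the containers in java.util.concurrent do not fail fast and can be used and modified by multiple threads concurrently. Strictly speaking, only the copy-on-write containers (such as CopyOnWriteArrayList) iterate over a snapshot; ConcurrentHashMap's iterators are weakly consistent, meaning they walk the live table without copying it and may or may not reflect later updates.

The general-purpose collections in java.util (HashMap, ArrayList and so on) have fail-fast iterators, while the collections in java.util.concurrent do not.
A fail-fast iterator throws ConcurrentModificationException (on a best-effort basis); a fail-safe iterator never does.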
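
For contrast with the fail-fast case, here is a minimal single-threaded sketch of the java.util.concurrent behaviour: CopyOnWriteArrayList iterates over the snapshot taken when the iterator was created, and ConcurrentHashMap tolerates removal during iteration. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIterationSketch {
    /** Adds to the list while iterating; returns {elements visited, final list size}. */
    static int[] iterateWhileAdding() {
        List<Integer> list = new CopyOnWriteArrayList<>(Arrays.asList(1, 2, 3));
        int visited = 0;
        for (Integer value : list) {
            list.add(value + 10); // goes into a new backing array, invisible to this iterator
            visited++;
        }
        return new int[] {visited, list.size()};
    }

    /** Removes each visited key while iterating; returns the remaining size. */
    static int removeWhileIterating() {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        for (String key : map.keySet()) {
            map.remove(key); // no ConcurrentModificationException here
        }
        return map.size();
    }

    public static void main(String[] args) {
        int[] result = iterateWhileAdding();
        System.out.println("visited " + result[0] + ", final size " + result[1]);
        System.out.println("remaining after removal: " + removeWhileIterating());
    }
}
```

The iterator visits only the three original elements even though the list ends up with six, which is exactly the "cannot see later modifications" drawback described above.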

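Collections are traversed with an Iterator. The Iterator runs in the calling thread and holds no lock. When it is created, it records the collection's modCount as expectedModCount. If the collection is then structurally modified by anything other than the iterator's own remove method, the two counts no longer match, and under the fail-fast principle the next call to next() throws java.util.ConcurrentModificationException. In other words, the collection must not be structurally modified behind the iterator's back while the iterator is in use.

HashMap's lack of thread safety shows up mainly in two ways:

1. In JDK 1.7, concurrent resizes can produce a cyclic linked list and lose data.
2. In JDK 1.8, concurrent put operations can overwrite each other's data; infinite loops are reportedly still possible as well.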
