

In Java 7, HashMap is implemented as an array in which each slot (bucket) holds a linked list.

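  • Inserting an element
    Compute the hashCode of the key and run it through the hash function to get the corresponding index in the array, then look at the linked list stored at that index. If the slot is empty, insert the new element directly. If it is not empty, walk the list and compare each existing key with the new key using equals. If every comparison returns false, the new element is placed at the head of the list. If any comparison returns true, the key is already present: no new node is added, and the existing entry's value is replaced.
  • Array resizing → load factor
    When the number of entries exceeds capacity * loadFactor, the array is enlarged and the entries are redistributed.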
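
The insertion steps above can be sketched as a tiny hash table. This is a teaching sketch under simplifying assumptions, not the JDK code: the class name `SimpleChainedMap` and its members are hypothetical, the table has a fixed length of 16 and never resizes, and the hashCode is mapped to an index with a plain bit mask instead of the JDK's own hash function.

```java
import java.util.Objects;

/** Teaching sketch of a Java 7 style hash table: an array of singly linked chains. */
class SimpleChainedMap<K, V> {
    /** One node of a bucket's chain. */
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Assumption: fixed length (a power of two), no resizing.
    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];
    private int size;

    /** Maps the key's hashCode to a bucket index (simplified: mask only). */
    private int indexFor(Object key) {
        return Objects.hashCode(key) & (table.length - 1);
    }

    /** Returns the previous value for the key, or null if the key was new. */
    V put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) { // equals returned true: replace the value
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // Every equals returned false (or the bucket was empty): insert at the head.
        table[i] = new Entry<>(key, value, table[i]);
        size++;
        return null;
    }

    /** Returns the value stored for the key, or null if absent. */
    V get(Object key) {
        for (Entry<K, V> e = table[indexFor(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

Calling `put("a", 1)` followed by `put("a", 2)` leaves one entry in the map, and the second call returns the old value 1.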


In Java 8 the in-memory structure of HashMap was upgraded to array + linked list + red-black tree.

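  • When the length of the bucket array is at least 64 and the linked list in one bucket reaches 8 nodes, that list is converted into a red-black tree. If the array is still shorter than 64, HashMap resizes the array instead of building a tree.
  • The red-black tree gives up some insertion performance in exchange for faster lookups.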
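
The treeify-or-resize rule can be written down as a small decision function. This is only a sketch: the class `TreeifyDecision`, the enum `Action`, and the method `onChainGrown` are hypothetical names. The two constants copy the values of the same-named constants in the Java 8 HashMap source, which are not part of the public API, and the exact moment at which the JDK performs the check is glossed over.

```java
/** Sketch of the "treeify or resize" decision for a bucket whose chain has grown. */
class TreeifyDecision {
    // Same values as the identically named constants in the Java 8 HashMap source.
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { KEEP_LIST, RESIZE_TABLE, TREEIFY_BIN }

    /**
     * @param chainLength number of nodes in the bucket after an insertion
     * @param tableLength current length of the bucket array
     */
    static Action onChainGrown(int chainLength, int tableLength) {
        if (chainLength < TREEIFY_THRESHOLD) {
            return Action.KEEP_LIST;      // short chain: a list is cheap enough
        }
        if (tableLength < MIN_TREEIFY_CAPACITY) {
            return Action.RESIZE_TABLE;   // small table: spreading entries out is preferred
        }
        return Action.TREEIFY_BIN;        // long chain in a large table: build a tree
    }

    public static void main(String[] args) {
        System.out.println(onChainGrown(3, 64));
        System.out.println(onChainGrown(8, 16));
        System.out.println(onChainGrown(8, 64));
    }
}
```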


1. You need to insert 1000_000 distinct records into a HashMap. How should you insert them, and why?

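This question is about capacity and loadFactor. Once the number of entries exceeds capacity * loadFactor, HashMap has to allocate a larger table and rehash the existing entries into it. So when a large number of entries will be inserted, it is better to pass an initial capacity big enough that this never happens. The Javadoc's condition is a capacity greater than size / loadFactor; size * 2 is a simple choice that satisfies it.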

* <p>As a general rule, the default load factor (.75) offers a good
* tradeoff between time and space costs.  Higher values decrease the
* space overhead but increase the lookup cost (reflected in most of
* the operations of the <tt>HashMap</tt> class, including
* <tt>get</tt> and <tt>put</tt>).  The expected number of entries in
* the map and its load factor should be taken into account when
* setting its initial capacity, so as to minimize the number of
* rehash operations.  If the initial capacity is greater than the
* maximum number of entries divided by the load factor, no rehash
* operations will ever occur.

    import java.util.HashMap;

    /**
     * Inserts 1000_000 elements into a HashMap to test the impact of the
     * initial capacity on performance.
     * Best performance in this run: entry count * 2.
     * Worst performance in this run: default capacity.
     */
    public class HashMapTest {
        static final int SIZE = 1000_000;

        public static void main(String[] args) {
            testMap(0);                            // default capacity
            testMap((int) (SIZE / 0.75 + 16));     // just above SIZE / loadFactor
            testMap(SIZE * 2);
            testMap(SIZE * 3);
        }

        /** Fills a map created with the given capacity and prints the elapsed time. */
        public static void testMap(int capacity) {
            long start = System.currentTimeMillis();
            HashMap<Integer, Integer> map;
            if (capacity <= 0) {
                map = new HashMap<>();
            } else {
                map = new HashMap<>(capacity);
            }
            for (int i = 0; i < SIZE; i++) {
                map.put(i, i);
            }
            System.out.println("capacity: " + capacity
                    + ", cost: " + (System.currentTimeMillis() - start) + "ms");
        }
    }

capacity: 0, cost: 119ms

capacity: 1333349, cost: 65ms

capacity: 2000000, cost: 32ms

capacity: 3000000, cost: 70ms
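
To make the capacity arithmetic concrete, the sketch below computes the smallest capacity that avoids rehashing and counts how many times the table would double when starting from the default capacity of 16. The class `CapacityMath` and its methods are hypothetical helpers, not JDK API; `tableSizeFor` here only imitates the fact that HashMap rounds the requested capacity up to a power of two.

```java
/** Capacity arithmetic for presizing a HashMap (a sketch, not JDK code). */
class CapacityMath {
    /** Smallest power of two that is >= requested. */
    static int tableSizeFor(int requested) {
        int n = 1;
        while (n < requested) {
            n <<= 1;
        }
        return n;
    }

    /** Smallest initial capacity for which 'expected' entries never trigger a resize. */
    static int capacityFor(int expected, float loadFactor) {
        return (int) Math.ceil(expected / (double) loadFactor);
    }

    /** Number of table doublings needed to hold 'expected' entries, starting from 'initial'. */
    static int resizeCount(int initial, int expected, float loadFactor) {
        int capacity = tableSizeFor(initial);
        int resizes = 0;
        while (capacity * loadFactor < expected) { // threshold too small: double the table
            capacity <<= 1;
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        int size = 1_000_000;
        System.out.println("minimum capacity: " + capacityFor(size, 0.75f));
        System.out.println("doublings from 16: " + resizeCount(16, size, 0.75f));
        for (int requested : new int[] {1_333_349, 2_000_000, 3_000_000}) {
            System.out.println(requested + " -> table length " + tableSizeFor(requested));
        }
    }
}
```

Under this rounding, the requested capacities 1333349 and 2000000 both end up with a table of length 2097152, so the gap between those two timings is probably measurement noise (for example JIT warm-up) rather than an effect of the capacity itself.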

2. How do you remove the elements of a HashMap that satisfy a condition?

* <p>The iterators returned by all of this class's "collection view methods"
* are <i>fail-fast</i>: if the map is structurally modified at any time after
* the iterator is created, in any way except through the iterator's own
* <tt>remove</tt> method, the iterator will throw a
* {@link ConcurrentModificationException}.  Thus, in the face of concurrent
* modification, the iterator fails quickly and cleanly, rather than risking
* arbitrary, non-deterministic behavior at an undetermined time in the
* future.

    /**
     * Wrong deletion: modifying the map while iterating over keySet()
     * throws java.util.ConcurrentModificationException.
     */
    public static void wrongDelete() {
        HashMap<Integer, Integer> map = new HashMap<>(10 * 2);
        for (int i = 0; i < 10; i++) {
            map.put(i, i);
        }
        for (Integer key : map.keySet()) {
            if (key % 2 == 0) {
                map.remove(key);   // structural modification outside the iterator
            }
        }
    }

    /**
     * Correct deletion: use Iterator.remove on the iterator of keySet(),
     * values(), or entrySet() to delete a key/value pair.
     * Requires java.util.HashMap and java.util.Iterator.
     */
    public static void rightDelete() {
        HashMap<Integer, Integer> map = new HashMap<>(10 * 2);
        for (int i = 0; i < 10; i++) {
            map.put(i, i);
        }
        // use iterator.remove to remove key and value
        Iterator<Integer> itr = map.keySet().iterator();
        while (itr.hasNext()) {
            Integer key = itr.next();
            if (key % 2 == 0) {
                itr.remove();
            }
        }
        for (Integer key : map.keySet()) {
            System.out.println(key + "=" + map.get(key));
        }
    }
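
On Java 8 and later, the same conditional deletion can also be written with `Collection.removeIf`, which is available on the key set, the value collection, and the entry set. The sketch below is an alternative to the explicit iterator, not a replacement prescribed by the original notes; the class and method names are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Removes all even keys from a small map using removeIf on the key set. */
class RemoveIfDemo {
    static Map<Integer, Integer> withoutEvenKeys(int n) {
        Map<Integer, Integer> map = new HashMap<>(n * 2);
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        // The key set is a view of the map, so removing a key here
        // removes the whole key/value pair from the map.
        map.keySet().removeIf(key -> key % 2 == 0);
        return map;
    }

    public static void main(String[] args) {
        System.out.println(withoutEvenKeys(10));
    }
}
```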

3. How was HashMap implemented internally before Java 8, and how is it implemented in Java 8?





* Tree bins (i.e., bins whose elements are all TreeNodes) are
* ordered primarily by hashCode, but in the case of ties, if two
* elements are of the same "class C implements Comparable<C>",
* type then their compareTo method is used for ordering. (We
* conservatively check generic types via reflection to validate
* this -- see method comparableClassFor).  The added complexity
* of tree bins is worthwhile in providing worst-case O(log n)
* operations when keys either have distinct hashes or are
* orderable, Thus, performance degrades gracefully under
* accidental or malicious usages in which hashCode() methods
* return values that are poorly distributed, as well as those in
* which many keys share a hashCode, so long as they are also
* Comparable. (If neither of these apply, we may waste about a
* factor of two in time and space compared to taking no
* precautions. But the only known cases stem from poor user
* programming practices that are already so slow that this makes
* little difference.)
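
The situation described in this comment can be reproduced with a deliberately bad key type. In the sketch below, `BadKey` is a hypothetical class whose `hashCode` always returns the same value, so every entry lands in one bin; because it also implements `Comparable`, a Java 8 HashMap can order the keys inside a tree bin. The code only demonstrates that the map stays correct; it makes no timing claim.

```java
import java.util.HashMap;
import java.util.Map;

/** All keys collide on purpose, but they are Comparable. */
class CollidingKeysDemo {
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // worst case: every key has the same hash
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id); // lets a tree bin order the keys
        }
    }

    /** Builds a map with n colliding keys, each mapped to its own id. */
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(10_000);
        System.out.println(map.size() + " entries, lookup of 1234 -> " + map.get(new BadKey(1234)));
    }
}
```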

5. Why is the threshold for converting a linked list into a red-black tree 8?

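The comment in the Java 8 HashMap source says, in essence: when hashCodes are randomly distributed, the number of nodes per bin follows a Poisson distribution. The probability that a bin holds 8 nodes, the point at which it would be turned into a red-black tree, is about 6 in 100 million (0.00000006), which is extremely small.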

* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins.  In
* usages with well-distributed user hashCodes, tree bins are
* rarely used.  Ideally, under random hashCodes, the frequency of
* nodes in bins follows a Poisson distribution
* (http://en.wikipedia.org/wiki/Poisson_distribution) with a
* parameter of about 0.5 on average for the default resizing
* threshold of 0.75, although with a large variance because of
* resizing granularity. Ignoring variance, the expected
* occurrences of list size k are (exp(-0.5) * pow(0.5, k) /
* factorial(k)). The first values are:
* 0:    0.60653066
* 1:    0.30326533
* 2:    0.07581633
* 3:    0.01263606
* 4:    0.00157952
* 5:    0.00015795
* 6:    0.00001316
* 7:    0.00000094
* 8:    0.00000006
* more: less than 1 in ten million
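
The table above can be reproduced directly from the formula in the comment, exp(-0.5) * pow(0.5, k) / factorial(k). The class name `BinSizePoisson` and the method `probability` are hypothetical; the parameter 0.5 is the value quoted in the comment.

```java
/** Recomputes the Poisson probabilities quoted in the HashMap source comment. */
class BinSizePoisson {
    static final double LAMBDA = 0.5; // average parameter quoted in the comment

    /** Probability that a bin holds exactly k nodes. */
    static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-LAMBDA) * Math.pow(LAMBDA, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            System.out.printf("%d: %.8f%n", k, probability(k));
        }
    }
}
```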







