Learning the JVM: Garbage Collection

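This chapter covers the principles behind garbage collection in the JVM. Parts of it draw on the official Oracle documentation and the 咕泡学院 course materials. The version discussed is JDK 1.8.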

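Defining garbage

The word "garbage" brings a famous movie scene to mind.

The senior disciple of the 断水流 school declares that everyone present is garbage. The scene maps onto the JVM as follows:

"Present": the memory area
"Everyone": the individual objects in memory
"Is garbage": they are all useless objects
The senior disciple: he surveys the room from a god's-eye view, which I read as the GC Root

In the end, having said his piece, the senior disciple takes everyone else out. In other words, the garbage gets collected.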

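Useless objects

What does it mean for an object to be useless?

Reference counting

Reference counting is easy to understand: every object tracks how many references point to it. If no reference anywhere in the program points to the object, it is garbage; otherwise it is not.

Consider: if A references B and B references A, neither count ever drops to zero, so under pure reference counting the two objects can never be reclaimed, even when nothing else in the program can reach them.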
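To make the cycle problem concrete, here is a minimal toy model of reference counting. It is only a sketch: the `Node` class and the `addRef`/`release` helpers are invented for illustration, and HotSpot does not manage Java objects this way.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of reference counting (hypothetical; not how HotSpot works).
class RefCountDemo {
    static final class Node {
        final String name;
        int refCount;                                  // incoming references
        final List<Node> fields = new ArrayList<>();   // outgoing references
        Node(String name) { this.name = name; }
    }

    // Store a reference to target inside holder and bump target's count.
    static void addRef(Node holder, Node target) {
        holder.fields.add(target);
        target.refCount++;
    }

    // Drop an external reference, such as a local variable going out of scope.
    static void release(Node n) { n.refCount--; }

    // Returns the counts of A and B after the program has let go of both.
    static int[] cycleCounts() {
        Node a = new Node("A");
        Node b = new Node("B");
        a.refCount++;   // external reference to A
        b.refCount++;   // external reference to B
        addRef(a, b);   // A -> B
        addRef(b, a);   // B -> A
        release(a);
        release(b);
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        // Neither count reaches zero, so a pure reference counter keeps both.
        System.out.println("A=" + counts[0] + ", B=" + counts[1]);   // A=1, B=1
    }
}
```

Both counts stay at 1 because each object is still referenced by the other, which is exactly the case reference counting cannot resolve on its own.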

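Reachability analysis

Starting from the GC Root objects, the collector follows references downward to determine whether each object is reachable. Objects that cannot be reached from any GC Root are garbage.

GC Roots: objects that can serve as a GC Root include class loaders, threads, references in the local variable tables of JVM stack frames, static fields, constant references, and references held in native method stacks.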
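The traversal described above can be sketched as a plain graph walk. This is a simplified illustration, not HotSpot's marking code: `Obj` and the single `root` object are hypothetical stand-ins for heap objects and GC Roots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy reachability analysis over a hand-built object graph.
class ReachabilityDemo {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    // Breadth-first walk from the roots; every object visited is live.
    static Set<String> reachable(List<Obj> roots) {
        Set<Obj> seen = new HashSet<>(roots);   // identity-based, Obj has no equals()
        Deque<Obj> queue = new ArrayDeque<>(roots);
        while (!queue.isEmpty()) {
            Obj current = queue.poll();
            for (Obj next : current.refs) {
                if (seen.add(next)) {
                    queue.add(next);
                }
            }
        }
        Set<String> names = new TreeSet<>();    // sorted for stable printing
        for (Obj o : seen) {
            names.add(o.name);
        }
        return names;
    }

    static Set<String> demo() {
        Obj root = new Obj("root");
        Obj c = new Obj("C");
        Obj a = new Obj("A");
        Obj b = new Obj("B");
        root.refs.add(c);   // root -> C
        a.refs.add(b);      // A -> B
        b.refs.add(a);      // B -> A, a cycle that no root can reach
        return reachable(Collections.singletonList(root));
    }

    public static void main(String[] args) {
        System.out.println(demo());   // [C, root]
    }
}
```

A and B reference each other but never show up in the result, so the cycle that defeats reference counting is simply treated as garbage here.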

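Garbage collection algorithms

Once we know which objects are garbage, the next question is how to reclaim them. The senior disciple knocked out the boxer with a straight punch, and broke the wooden sword before taking down the kendo club captain. In other words, he used a different technique for each kind of garbage.

Mark-Sweep

As the name suggests, the mark-sweep algorithm marks the garbage objects and then clears them out, as shown in the figure below:

Consider:

Mark-sweep does reclaim the garbage objects, but it leaves the free memory non-contiguous, with many small fragments. If the next allocation is a large object, it may fail because no contiguous block is big enough even though the total free space is sufficient, and that triggers another GC.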
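The fragmentation problem is easy to reproduce in a toy model. The sketch below assumes a heap of fixed-size slots represented as a `boolean[]` (true means occupied); the class and method names are made up for illustration.

```java
import java.util.Arrays;

// Toy mark-sweep over a heap of equal-sized slots.
class MarkSweepDemo {
    // Sweep: a slot stays occupied only if it was occupied and marked live.
    static boolean[] sweep(boolean[] occupied, boolean[] marked) {
        boolean[] after = new boolean[occupied.length];
        for (int i = 0; i < occupied.length; i++) {
            after[i] = occupied[i] && marked[i];
        }
        return after;
    }

    // Total number of free slots.
    static int totalFree(boolean[] occupied) {
        int free = 0;
        for (boolean slot : occupied) {
            if (!slot) free++;
        }
        return free;
    }

    // Length of the longest run of adjacent free slots.
    static int largestFreeRun(boolean[] occupied) {
        int best = 0;
        int run = 0;
        for (boolean slot : occupied) {
            run = slot ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        boolean[] occupied = new boolean[8];
        Arrays.fill(occupied, true);   // the heap is full
        boolean[] marked = { true, false, true, false, true, false, true, false };
        boolean[] after = sweep(occupied, marked);
        System.out.println("free slots: " + totalFree(after));             // 4
        System.out.println("largest free run: " + largestFreeRun(after));  // 1
    }
}
```

Half the heap is free after the sweep, yet an object that needs two adjacent slots still cannot be allocated.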

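Copying (Copy)

The copying algorithm divides the memory area into two halves. When the half in use runs out of space, the objects that are still alive are copied into the other half, and the first half is then reclaimed in one go, as shown in the figure below:

Consider:

As the figure shows, the copying algorithm solves the fragmentation problem but introduces a different one: wasted space, because only half of the memory is usable at any time.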
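A minimal sketch of the copy step, assuming a from-space of object slots and a `live` flag per slot (both hypothetical simplifications; a real collector discovers liveness by tracing and also fixes up references):

```java
import java.util.Arrays;

// Toy semi-space copy: survivors are packed into the start of the to-space.
class CopyingDemo {
    static String[] copyLive(String[] fromSpace, boolean[] live) {
        String[] toSpace = new String[fromSpace.length];
        int next = 0;   // next free index in the to-space
        for (int i = 0; i < fromSpace.length; i++) {
            if (fromSpace[i] != null && live[i]) {
                toSpace[next++] = fromSpace[i];
            }
        }
        return toSpace;   // the whole from-space can now be reused
    }

    public static void main(String[] args) {
        String[] from = { "a", "b", "c", "d" };
        boolean[] live = { true, false, true, false };
        System.out.println(Arrays.toString(copyLive(from, live)));   // [a, c, null, null]
    }
}
```

The survivors end up contiguous, so there is no fragmentation, but the program needs two spaces of this size and can allocate into only one of them at a time.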

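Mark-Compact

In the mark-compact algorithm, the marking phase is the same as in mark-sweep. The difference is the next step: the surviving objects are moved toward one end of the memory area, and everything beyond the boundary of the live objects is then cleared.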
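The compaction step can be sketched as an in-place slide. As before, this is a toy model: the heap is an array of slots and the `marked` flags are assumed to come from an earlier marking phase.

```java
import java.util.Arrays;

// Toy mark-compact: slide marked objects toward index 0, then clear the rest.
class MarkCompactDemo {
    // Returns the index of the first free slot after compaction.
    static int compact(String[] heap, boolean[] marked) {
        int next = 0;   // destination of the next live object
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[next++] = heap[i];   // next <= i, so nothing live is overwritten
            }
        }
        for (int i = next; i < heap.length; i++) {
            heap[i] = null;   // clear everything past the last live object
        }
        return next;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        boolean[] marked = { true, false, true, false, false, true };
        int firstFree = compact(heap, marked);
        System.out.println(Arrays.toString(heap) + ", first free index = " + firstFree);
        // [a, c, f, null, null, null], first free index = 3
    }
}
```

Unlike the copying approach, no second space is needed, and unlike mark-sweep, the free memory ends up as one contiguous block. The cost is the work of moving the live objects.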

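Generational collection strategy

Consider:

Having covered these algorithms, which one does the JVM actually use for garbage collection?

Young generation: copying (most objects die shortly after allocation, so few survive and copying is efficient)
Old generation: mark-sweep or mark-compact (objects here live longer, so copying them back and forth is pointless; it is better to mark them and then clean up)

Source: 咕泡学院 course materials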
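One way to see this split on a running JVM is to ask the standard `java.lang.management` API which collectors are active and which memory pools each one manages. The sketch below only prints what the JVM reports; the collector and pool names depend on the JVM version and the GC flags in use, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Lists the active collectors and the memory pools each one is responsible for.
class GcBeansDemo {
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + " -> " + Arrays.toString(gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

On a generational configuration you would typically expect one collector that covers only the young-generation pools and another that also covers the old generation.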

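Garbage collectors

A garbage collection algorithm supplies the strategy; a garbage collector is the concrete implementation of that strategy. The figure below shows the collectors Java provides, grouped by the generation each one works on:

Source: 咕泡学院 course materials

The Java HotSpot VM provides three types of collectors: the serial collector, the parallel collector, and the mostly concurrent collector. The official documentation describes them as follows: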

Available Collectors

The Java HotSpot VM includes three different types of collectors, each with different performance characteristics.

The serial collector uses a single thread to perform all garbage collection work, which makes it relatively efficient because there is no communication overhead between threads. It is best-suited to single processor machines, because it cannot take advantage of multiprocessor hardware, although it can be useful on multiprocessors for applications with small data sets (up to approximately 100 MB). The serial collector is selected by default on certain hardware and operating system configurations, or can be explicitly enabled with the option -XX:+UseSerialGC.

The parallel collector (also known as the throughput collector) performs minor collections in parallel, which can significantly reduce garbage collection overhead. It is intended for applications with medium-sized to large-sized data sets that are run on multiprocessor or multithreaded hardware. The parallel collector is selected by default on certain hardware and operating system configurations, or can be explicitly enabled with the option -XX:+UseParallelGC.

Parallel compaction is a feature that enables the parallel collector to perform major collections in parallel. Without parallel compaction, major collections are performed using a single thread, which can significantly limit scalability. Parallel compaction is enabled by default if the option -XX:+UseParallelGC has been specified. The option to turn it off is -XX:-UseParallelOldGC.

The mostly concurrent collector performs most of its work concurrently (for example, while the application is still running) to keep garbage collection pauses short. It is designed for applications with medium-sized to large-sized data sets in which response time is more important than overall throughput because the techniques used to minimize pauses can reduce application performance. The Java HotSpot VM offers a choice between two mostly concurrent collectors; see The Mostly Concurrent Collectors. Use the option -XX:+UseConcMarkSweepGC to enable the CMS collector or -XX:+UseG1GC to enable the G1 collector.

Serial

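Serial was the only choice for young-generation collection in early JVMs (before JDK 1.3.1). It is a serial, single-threaded collector: it uses only one CPU and one thread to do the garbage collection work.

Pros: simple and efficient, with very high single-threaded collection efficiency
Cons: all application threads must be paused during collection
Algorithm: copying
Scope: young generation
Use: default young-generation collector in Client mode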

ParNew

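ParNew is the multithreaded version of Serial. Think of it as the upgrade Serial received once multicore CPUs became common.

Pros: more efficient than Serial on multi-CPU machines.
Cons: all application threads are paused during collection, and it is less efficient than Serial on a single CPU.
Algorithm: copying
Scope: young generation
Use: the preferred young-generation collector for JVMs running in Server mode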

Parallel Scavenge

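Parallel Scavenge is similar to ParNew and is also a young-generation collector, but it focuses more on system throughput.

Throughput = time running user code / (time running user code + garbage collection time)
For example, if the JVM ran for 100 minutes in total and garbage collection took 1 minute, throughput = (100 - 1) / 100 = 99%.
The higher the throughput, the smaller the share of time spent on garbage collection, so user code can make fuller use of the CPU and finish its computation sooner.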
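The formula above is simple enough to write down directly. The sketch below also includes a rough estimate for the current JVM, built on the standard `GarbageCollectorMXBean` and `RuntimeMXBean` APIs. Treat that estimate as an approximation under stated assumptions: it uses JVM uptime as total time, and the reported collection time can include concurrent work for some collectors, so it is not a pure measure of pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class ThroughputDemo {
    // throughput = user time / (user time + GC time), in any consistent unit
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    // Rough estimate for this JVM (assumption: uptime = user time + GC time).
    static double estimateForThisJvm() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();   // -1 if the collector does not report it
            if (t > 0) gcMillis += t;
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        if (uptimeMillis <= 0) return 1.0;     // nothing measured yet
        return throughput(Math.max(uptimeMillis - gcMillis, 0), gcMillis);
    }

    public static void main(String[] args) {
        // The worked example from the text: 99 minutes of user code, 1 minute of GC.
        System.out.println(throughput(99, 1));   // 0.99
        System.out.println(estimateForThisJvm());
    }
}
```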

Serial Old

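Serial Old can be thought of as the old-generation version of Serial. The difference is that Serial Old uses the mark-compact algorithm.

Parallel Old

Parallel Old can be thought of as the old-generation version of Parallel Scavenge. The difference is that Parallel Old uses the mark-compact algorithm.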

Concurrent Mark Sweep(CMS)

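CMS was designed for shorter garbage collection pauses, for applications that can share CPU resources with the collector while they are running. The official documentation describes it as follows: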

The Concurrent Mark Sweep (CMS) collector is designed for applications that prefer shorter garbage collection pauses and that can afford to share processor resources with the garbage collector while the application is running. Typically applications that have a relatively large set of long-lived data (a large tenured generation) and run on machines with two or more processors tend to benefit from the use of this collector. However, this collector should be considered for any application with a low pause time requirement. The CMS collector is enabled with the command-line option -XX:+UseConcMarkSweepGC.

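In short: CMS targets applications that prefer shorter GC pauses and can afford to share processors with the collector at run time. Applications with a relatively large set of long-lived data (a large tenured generation) on machines with two or more processors tend to benefit most, but any application with a low pause time requirement should consider it. It is enabled with -XX:+UseConcMarkSweepGC.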

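Diagram of the collection phases:

(1) Initial mark (CMS initial mark): marks the objects directly reachable from GC Roots. Stop The World, but very fast.
(2) Concurrent mark (CMS concurrent mark): performs GC Roots Tracing.
(3) Remark (CMS remark): corrects the marks that changed because the application kept running during concurrent marking. Stop The World.
(4) Concurrent sweep (CMS concurrent sweep)

Because the collector threads work alongside the application threads during concurrent mark and concurrent sweep, the CMS collection process as a whole can be said to run concurrently with the application.

Pros: concurrent collection, short pauses
Cons: produces a lot of memory fragmentation; the concurrent phases reduce throughput

Source: 咕泡学院 course materials

For more detail, see the official documentation: Concurrent Mark Sweep (CMS) Collector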

Garbage-First（G1）

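G1 is a collector designed for modern machines with multiple processors and large memories. It tries to meet a user-specified garbage collection pause time goal as often as possible while still achieving high throughput, and it runs whole-heap operations concurrently with the application.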

The Garbage-First (G1) garbage collector is a server-style garbage collector, targeted for multiprocessor machines with large memories. It attempts to meet garbage collection (GC) pause time goals with high probability while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with the application threads. This prevents interruptions proportional to heap or live-data size.

The last sentence of the quote is the key point: because whole-heap work such as global marking runs concurrently with application threads, pauses do not grow in proportion to the heap size or the amount of live data.

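Parallel and concurrent
Generational (the concept of generations is retained)
Compaction (taken as a whole it behaves like a mark-compact algorithm, so it does not fragment the heap)
Predictable pauses (where it goes beyond CMS: the user can specify that within any time slice of M milliseconds, no more than N milliseconds may be spent on garbage collection)

With the G1 collector, the layout of the Java heap differs considerably from that of the other collectors. The whole heap is divided into many independent regions (Region) of equal size. The young and old generations still exist as concepts, but they are no longer physically separated; each is a set of Regions that need not be contiguous.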

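The collection steps are:

Initial Marking: marks the objects directly reachable from GC Roots and updates the TAMS (Top at Mark Start) value; application threads must be paused.
Concurrent Marking: performs reachability analysis from GC Roots to find the live objects; runs concurrently with application threads.
Final Marking: corrects the data that changed during concurrent marking because the application kept running; application threads must be paused.
Live Data Counting and Evacuation: ranks the Regions by reclamation value and cost, then builds a collection plan that fits the GC pause time the user expects.

Source: 咕泡学院 course materials

Diagram of the collection phases:

For more detail, see the official documentation: Garbage-First Garbage Collector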
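The idea behind the last step, picking the Regions that give the most reclaimed space for the least cost within a pause budget, can be sketched with a simple greedy selection. This is a deliberately simplified illustration and not G1's actual heuristics: the `Region` class, its `garbageMb` and `costMs` fields, and the numbers are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy "garbage first" selection under a pause-time budget.
class GarbageFirstDemo {
    static final class Region {
        final String name;
        final int garbageMb;   // space freed if this region is evacuated
        final int costMs;      // assumed evacuation cost
        Region(String name, int garbageMb, int costMs) {
            this.name = name;
            this.garbageMb = garbageMb;
            this.costMs = costMs;
        }
        double value() { return (double) garbageMb / costMs; }
    }

    // Take regions in order of value per millisecond while they fit the budget.
    static List<String> chooseCollectionSet(List<Region> regions, int pauseBudgetMs) {
        List<Region> sorted = new ArrayList<>(regions);
        Comparator<Region> byValue = Comparator.comparingDouble(Region::value);
        sorted.sort(byValue.reversed());
        List<String> chosen = new ArrayList<>();
        int spentMs = 0;
        for (Region r : sorted) {
            if (spentMs + r.costMs <= pauseBudgetMs) {
                chosen.add(r.name);
                spentMs += r.costMs;
            }
        }
        return chosen;
    }

    static List<String> demo() {
        List<Region> regions = Arrays.asList(
                new Region("R1", 30, 10),   // 3.0 MB per ms
                new Region("R2", 10, 10),   // 1.0 MB per ms
                new Region("R3", 40, 20),   // 2.0 MB per ms
                new Region("R4", 5, 1));    // 5.0 MB per ms
        return chooseCollectionSet(regions, 25);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // [R4, R1, R2]
    }
}
```

With a 25 ms budget, R3 is skipped because adding it would exceed the budget, even though it holds the most garbage in absolute terms.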

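Understanding throughput and pause time

Pause time: the time during which the garbage collector interrupts the application's execution in order to collect garbage

Throughput: time running user code / (time running user code + garbage collection time)

The shorter the pause time, the better suited the collector is to programs that interact with users, since good responsiveness improves the user experience.
High throughput makes efficient use of CPU time and finishes the computation sooner, which mainly suits background jobs that need little interaction.

These two metrics are also the yardsticks for judging how good a garbage collector is, and tuning largely comes down to watching these two variables.

Source: 咕泡学院 course materials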

Choosing a garbage collector

See the official documentation: Selecting a Collector

Unless your application has rather strict pause time requirements, first run your application and allow the VM to select a collector. If necessary, adjust the heap size to improve performance. If the performance still does not meet your goals, then use the following guidelines as a starting point for selecting a collector.

If the application has a small data set (up to approximately 100 MB), then select the serial collector with the option -XX:+UseSerialGC.

If the application will be run on a single processor and there are no pause time requirements, then let the VM select the collector, or select the serial collector with the option -XX:+UseSerialGC.

If (a) peak application performance is the first priority and (b) there are no pause time requirements or pauses of 1 second or longer are acceptable, then let the VM select the collector, or select the parallel collector with -XX:+UseParallelGC.

If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately 1 second, then select the concurrent collector with -XX:+UseConcMarkSweepGC or -XX:+UseG1GC.

These guidelines provide only a starting point for selecting a collector because performance is dependent on the size of the heap, the amount of live data maintained by the application, and the number and speed of available processors. Pause times are particularly sensitive to these factors, so the threshold of 1 second mentioned previously is only approximate: the parallel collector will experience pause times longer than 1 second on many data size and hardware combinations; conversely, the concurrent collector may not be able to keep pauses shorter than 1 second on some combinations.

If the recommended collector does not achieve the desired performance, first attempt to adjust the heap and generation sizes to meet the desired goals. If performance is still inadequate, then try a different collector: use the concurrent collector to reduce pause times and use the parallel collector to increase overall throughput on multiprocessor hardware.

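In short: unless the application has fairly strict pause time requirements, run it first and let the VM pick a collector, adjusting the heap size if needed. If performance still falls short, use the following as a starting point.

Small data set (up to roughly 100 MB): use the serial collector with -XX:+UseSerialGC.

Single processor and no pause time requirement: let the VM choose, or use the serial collector with -XX:+UseSerialGC.

Peak application performance comes first, and either there is no pause time requirement or pauses of 1 second or longer are acceptable: let the VM choose, or use the parallel collector with -XX:+UseParallelGC.

Response time matters more than overall throughput, and pauses must stay under roughly 1 second: use a concurrent collector with -XX:+UseConcMarkSweepGC or -XX:+UseG1GC.

These are only starting points, because performance depends on the heap size, the amount of live data the application maintains, and the number and speed of the available processors. Pause times are especially sensitive to these factors, so the 1-second threshold is approximate: the parallel collector will exceed it on many combinations of data size and hardware, and the concurrent collector may not stay under it on some.

If the recommended collector does not deliver the desired performance, first try adjusting the heap and generation sizes. If that is still not enough, try a different collector: a concurrent collector to reduce pause times, or the parallel collector to raise overall throughput on multiprocessor hardware.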
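The guidelines above can be sketched as a small decision function. It is only an encoding of the quoted starting points for JDK 1.8, not a tuning tool; the `suggest` method and its parameters are invented for illustration, and the thresholds are the approximate ones from the documentation.

```java
// Encodes the quoted starting-point guidelines as a lookup (illustrative only).
class CollectorChooser {
    static String suggest(int dataSetMb, int processors,
                          boolean peakPerformanceFirst, boolean needPausesUnderOneSecond) {
        if (dataSetMb <= 100) {
            return "-XX:+UseSerialGC";
        }
        if (processors == 1 && !needPausesUnderOneSecond) {
            return "VM default or -XX:+UseSerialGC";
        }
        if (peakPerformanceFirst && !needPausesUnderOneSecond) {
            return "VM default or -XX:+UseParallelGC";
        }
        if (needPausesUnderOneSecond) {
            return "-XX:+UseConcMarkSweepGC or -XX:+UseG1GC";
        }
        return "VM default";   // no guideline applies: let the VM select
    }

    public static void main(String[] args) {
        System.out.println(suggest(50, 4, false, false));    // -XX:+UseSerialGC
        System.out.println(suggest(4096, 8, true, false));   // VM default or -XX:+UseParallelGC
        System.out.println(suggest(4096, 8, false, true));   // -XX:+UseConcMarkSweepGC or -XX:+UseG1GC
    }
}
```

Whatever this returns is where measurement starts, not where it ends: the heap and generation sizes usually deserve a look before switching collectors.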

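Overall, the choice of garbage collector depends on the usage scenario, the server configuration, and the size of the application. Finding the setup that fits best takes repeated experimentation and tuning.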