Collector in JDK 8

Collector

Collector was introduced in JDK 8. What is it, and what is it for? The sections below walk through the Javadoc in the Collector source:

A mutable reduction operation that accumulates input elements into a mutable result container, optionally transforming the accumulated result into a final representation after all input elements have been processed. Reduction operations can be performed either sequentially or in parallel.

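In other words, a `Collector` describes a mutable reduction: input elements are accumulated into a mutable result container, and once all elements have been processed the accumulated result may optionally be transformed into a final representation. The reduction can run sequentially or in parallel. That one sentence already states quite precisely what a `Collector` is.

Examples: 1. accumulating elements into a collection; 2. concatenating strings with a StringBuilder; 3. computing summary information about elements, such as sum, min, max, or average.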

 A {@code Collector} is specified by four functions that work together to accumulate entries into a mutable result container, and optionally perform a final transform on the result. They are:<ul><li>creation of a new result container ({@link #supplier()})</li><li>incorporating a new data element into a result container ({@link #accumulator()})</li><li>combining two result containers into one ({@link #combiner()})</li><li>performing an optional final transform on the container ({@link #finisher()})</li></ul>

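A Collector is specified by the following four functions, which together accumulate elements into a mutable result container and optionally perform a final transformation on the result:

supplier: creates a new result container.
accumulator: incorporates a new data element into a result container.
combiner: combines two result containers into one.
finisher: performs an optional final transformation on the container.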
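As a minimal sketch of how these four functions fit together, the example below builds a collector with `Collector.of` that concatenates strings through a StringBuilder. The class name `FourFunctionsSketch` and the helpers `concat` and `join` are made up for illustration and are not part of the JDK.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

public class FourFunctionsSketch {

    // Hypothetical helper: a collector assembled from the four functions.
    static Collector<String, StringBuilder, String> concat() {
        return Collector.of(
                () -> new StringBuilder(),            // supplier: new result container
                (sb, s) -> sb.append(s),              // accumulator: fold one element in
                (left, right) -> left.append(right),  // combiner: merge two containers
                sb -> sb.toString());                 // finisher: container -> final result
    }

    // Hypothetical helper: runs the collector over the given parts.
    static String join(String... parts) {
        return Stream.of(parts).collect(concat());
    }

    public static void main(String[] args) {
        System.out.println(join("he", "llo")); // prints "hello"
    }
}
```

Here T is String, the container type A is StringBuilder, and the result type R is String, so the finisher is genuinely needed.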

From this Javadoc we know that Collector has four especially important methods, so let's look at each of them:

1.supplier

    /**
     * A function that creates and returns a new mutable result container.
     *
     * @return a function which returns a new, mutable result container
     */
    Supplier<A> supplier();

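This method returns a Supplier. As described earlier for Supplier, it is a provider (or producer): it creates a new mutable result container, typically one of the collections we use every day. Note that supplier() is not itself a generic method; A is a type parameter of the Collector<T, A, R> interface and stands for the type of the result container.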

2.accumulator

    /**
     * A function that folds a value into a mutable result container.
     *
     * @return a function which folds a value into a mutable result container
     */
    BiConsumer<A, T> accumulator();

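This method returns a BiConsumer, which can be thought of as a consumer. Its job is to fold each element into the result container created by the supplier. Here too the types come from the interface's type parameters: A is the result container type and T is the element type.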

3.combiner

    /**
     * A function that accepts two partial results and merges them.  The
     * combiner function may fold state from one argument into the other and
     * return that, or may return a new result container.
     *
     * @return a function which combines two partial results into a combined
     * result
     */
    BinaryOperator<A> combiner();

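This method returns a BinaryOperator<A>, which takes two arguments of type A and returns a result of type A. That makes the role of combiner clear: it merges two partial result containers into one. This raises a question: why create several result containers, add elements to each, and then merge them, instead of creating a single container from the start?

This is covered in detail below; in short, a stream can be sequential or parallel. A parallel stream is processed by multiple threads, so several result containers can be created and the elements distributed among them. Once accumulation is done, all partial containers have to be merged into one overall result container. Again, A is the result container type.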
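To make the role of the combiner concrete, here is a hedged sketch that imitates by hand, on a single thread, what a parallel evaluation does: the input is split in two, each half gets its own container, and the two partial results are merged. The helper `collectInTwoHalves` is hypothetical and only illustrates the contract; the real stream implementation splits and schedules work differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CombinerSketch {

    // Hypothetical helper: accumulate each half separately, then combine and finish.
    static <T, A, R> R collectInTwoHalves(Collector<T, A, R> collector, List<T> items) {
        int mid = items.size() / 2;
        BiConsumer<A, T> accumulator = collector.accumulator();

        A left = collector.supplier().get();           // container for the first half
        for (T item : items.subList(0, mid)) {
            accumulator.accept(left, item);
        }

        A right = collector.supplier().get();          // container for the second half
        for (T item : items.subList(mid, items.size())) {
            accumulator.accept(right, item);
        }

        A merged = collector.combiner().apply(left, right);
        return collector.finisher().apply(merged);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("nihao", "hello", "world", "welcome");
        System.out.println(collectInTwoHalves(Collectors.toList(), words));
        System.out.println(collectInTwoHalves(Collectors.joining("-"), words));
    }
}
```

Because the combiner merges the left container with the right one in order, the result matches what a plain sequential `collect` would produce for these collectors.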

4.finisher

    /**
     * Perform the final transformation from the intermediate accumulation type
     * {@code A} to the final result type {@code R}.
     *
     * <p>If the characteristic {@code IDENTITY_TRANSFORM} is
     * set, this function may be presumed to be an identity transform with an
     * unchecked cast from {@code A} to {@code R}.
     *
     * @return a function which transforms the intermediate result to the final
     * result
     */
    Function<A, R> finisher();

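This method returns a Function that takes the result container type A and returns the result type R. So finisher performs the final transformation, turning the result container into the final result R that we want. One point worth noting: the Javadoc quoted above writes IDENTITY_TRANSFORM, but the enum constant is actually named IDENTITY_FINISH. When that characteristic is set, the function may be presumed to be an identity transform, and in practice finisher is not called. The reason is explained separately below.

What we should know for now: if the final result type we want is the mutable container type itself, there is no need to run the finisher, which also improves efficiency.
A and R are the interface's type parameters for the result container type and the final result type, respectively.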
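A case where the finisher really is needed can be sketched as follows: the container is a `long[]` holding a running sum and count, while the final result is a `Double` average. The names `FinisherSketch`, `averaging`, and `average` are illustrative assumptions, not JDK API (the JDK ships its own `Collectors.averagingInt`).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class FinisherSketch {

    // Hypothetical collector: A is long[]{sum, count}, R is Double.
    static Collector<Integer, long[], Double> averaging() {
        return Collector.of(
                () -> new long[2],                                    // {sum, count}
                (acc, n) -> { acc[0] += n; acc[1]++; },               // add one element
                (a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; },  // merge partial sums
                acc -> acc[1] == 0 ? 0.0 : (double) acc[0] / acc[1]); // finisher: A -> R
    }

    // Hypothetical helper: average of a list, 0.0 for an empty list.
    static double average(List<Integer> numbers) {
        return numbers.stream().collect(averaging());
    }

    public static void main(String[] args) {
        System.out.println(average(Arrays.asList(1, 2, 3, 4))); // prints 2.5
    }
}
```

Since `long[]` cannot be cast to `Double`, this collector must not declare IDENTITY_FINISH.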

The main methods of Collector have now been covered. Next comes one more method of Collector that deserves special attention:

    /**
     * Returns a {@code Set} of {@code Collector.Characteristics} indicating
     * the characteristics of this Collector.  This set should be immutable.
     *
     * @return an immutable set of collector characteristics
     */
    Set<Characteristics> characteristics();

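The characteristics method returns an immutable set indicating the characteristics of this collector, that is, the properties specific to it. So which characteristics can a collector have?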

    /**
     * Characteristics indicating properties of a {@code Collector}, which can
     * be used to optimize reduction implementations.
     */
    enum Characteristics {
        /**
         * Indicates that this collector is <em>concurrent</em>, meaning that
         * the result container can support the accumulator function being
         * called concurrently with the same result container from multiple
         * threads.
         *
         * <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
         * then it should only be evaluated concurrently if applied to an
         * unordered data source.
         */
        CONCURRENT,

        /**
         * Indicates that the collection operation does not commit to preserving
         * the encounter order of input elements.  (This might be true if the
         * result container has no intrinsic order, such as a {@link Set}.)
         */
        UNORDERED,

        /**
         * Indicates that the finisher function is the identity function and
         * can be elided.  If set, it must be the case that an unchecked cast
         * from A to R will succeed.
         */
        IDENTITY_FINISH
    }

The Collector interface defines a nested enum, Characteristics, which lists the possible collector characteristics. Each constant is described in detail below:

1.CONCURRENT

    /**
     * Indicates that this collector is <em>concurrent</em>, meaning that
     * the result container can support the accumulator function being
     * called concurrently with the same result container from multiple
     * threads.
     *
     * <p>If a {@code CONCURRENT} collector is not also {@code UNORDERED},
     * then it should only be evaluated concurrently if applied to an
     * unordered data source.
     */
    CONCURRENT,

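The CONCURRENT characteristic marks the collector as concurrent: the result container supports the accumulator function being called on the same container from multiple threads. When such a collector is evaluated concurrently (a parallel stream, with either an unordered source or a collector that is also UNORDERED), it behaves as follows: 1. the supplier is invoked once and only one mutable container is created; 2. that container is accumulated into by multiple threads; 3. the combiner is not invoked. Without CONCURRENT, a parallel stream creates several containers through the supplier (one per sub-task, so a single thread may create more than one) and merges them with the combiner. These properties are verified below when we implement a custom Collector.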

2.UNORDERED

    /**
     * Indicates that the collection operation does not commit to preserving
     * the encounter order of input elements.  (This might be true if the
     * result container has no intrinsic order, such as a {@link Set}.)
     */
    UNORDERED,

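The meaning of UNORDERED is fairly simple: the collection operation does not commit to preserving the encounter order of the input elements. (This may be the case when the result container has no intrinsic order, such as a Set.)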

3.IDENTITY_FINISH

    /**
     * Indicates that the finisher function is the identity function and
     * can be elided.  If set, it must be the case that an unchecked cast
     * from A to R will succeed.
     */
    IDENTITY_FINISH

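IDENTITY_FINISH means that the finisher function is the identity function, so it will not be called and can be elided by the implementation. Setting it also requires that an unchecked cast from A to R succeeds. If it does not, a ClassCastException is thrown; because the cast is unchecked, the exception typically surfaces at the point where the result is first used as an R.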
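One way to get a feel for the three characteristics is to inspect the sets reported by some built-in collectors. The sketch below does that; the class `CharacteristicsSketch` and its helper `has` are hypothetical. The exact sets are what OpenJDK builds report, and the Javadoc only partly specifies them (for example, it documents `toSet()` as unordered and `groupingByConcurrent` as concurrent and unordered).

```java
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CharacteristicsSketch {

    // Hypothetical helper: does the collector report the given characteristic?
    static boolean has(Collector<?, ?, ?> collector, Collector.Characteristics characteristic) {
        return collector.characteristics().contains(characteristic);
    }

    public static void main(String[] args) {
        // toList(): container is the result, so no finisher is needed
        System.out.println("toList   " + Collectors.toList().characteristics());
        // toSet(): additionally makes no promise about encounter order
        System.out.println("toSet    " + Collectors.toSet().characteristics());
        // joining(): the internal container differs from the String result, so a finisher runs
        System.out.println("joining  " + Collectors.joining().characteristics());
        // groupingByConcurrent(): accumulates into one shared concurrent map
        System.out.println("grouping "
                + Collectors.groupingByConcurrent(Function.<String>identity()).characteristics());
    }
}
```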

That covers Collector fairly thoroughly, so next let's try implementing a custom collector:

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.EnumSet;
    import java.util.HashSet;
    import java.util.Set;
    import java.util.function.BiConsumer;
    import java.util.function.BinaryOperator;
    import java.util.function.Function;
    import java.util.function.Supplier;
    import java.util.stream.Collector;

    public class MyCollector<T> implements Collector<T, Set<T>, Set<T>> {

        @Override
        public Supplier<Set<T>> supplier() {
            System.out.println("supplier invoke");
            return HashSet<T>::new;
        }

        @Override
        public BiConsumer<Set<T>, T> accumulator() {
            System.out.println("accumulator invoke");
            return (set, item) -> set.add(item);
        }

        @Override
        public BinaryOperator<Set<T>> combiner() {
            System.out.println("combiner invoke");
            // fold the second partial set into the first and return it
            return (set1, set2) -> {
                set1.addAll(set2);
                return set1;
            };
        }

        @Override
        public Function<Set<T>, Set<T>> finisher() {
            System.out.println("finisher invoke");
            return set -> set;
        }

        @Override
        public Set<Characteristics> characteristics() {
            return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED));
        }

        public static void main(String[] args) {
            Set<String> set = new HashSet<>(Arrays.asList("nihao", "hello", "world", "hello", "welcome"));
            set.stream().collect(new MyCollector<String>());
        }
    }
    // output:
    // supplier invoke
    // accumulator invoke
    // combiner invoke
    // finisher invoke

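The output shows the order in which the collector's four methods are called. Note that these messages are printed when collect obtains the four functions, not when the returned functions run; that is why "combiner invoke" appears even for a sequential stream. Now let's add a print statement to the characteristics method: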

    @Override
    public Set<Characteristics> characteristics() {
        System.out.println("characteristics invoke");
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED));
    }
    // output:
    // supplier invoke
    // accumulator invoke
    // combiner invoke
    // characteristics invoke
    // characteristics invoke
    // finisher invoke

This time the output changes: why is characteristics called before finisher, and why is it called twice?

The answer can be found in the implementation of the collect method:

    public final <R, A> R collect(Collector<? super P_OUT, A, R> collector) {
        A container;
        if (isParallel()
                && (collector.characteristics().contains(Collector.Characteristics.CONCURRENT))
                && (!isOrdered() || collector.characteristics().contains(Collector.Characteristics.UNORDERED))) {
            container = collector.supplier().get();
            BiConsumer<A, ? super P_OUT> accumulator = collector.accumulator();
            forEach(u -> accumulator.accept(container, u));
        }
        else {
            container = evaluate(ReduceOps.makeRef(collector));
        }
        // characteristics() is evaluated before finisher(), and finisher() is
        // only called when IDENTITY_FINISH is absent.
        return collector.characteristics().contains(Collector.Characteristics.IDENTITY_FINISH)
               ? (R) container
               : collector.finisher().apply(container);
    }

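As the code shows, characteristics is called before finisher, and finisher is only called when IDENTITY_FINISH is absent. As for the two calls: with a sequential stream, isParallel() is false, so the characteristics checks in the if condition are short-circuited. One call is the IDENTITY_FINISH check in the return statement; the other, as far as the JDK 8 sources show, comes from the reduce operation built by ReduceOps.makeRef(collector), which checks for UNORDERED when computing its operation flags.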
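The ordering can also be checked programmatically instead of by reading console output. The following sketch records the name of each Collector method as it is called on a sequential stream. `CallOrderSketch`, `TracingCollector`, and `trace` are hypothetical names, and only the relative order of the last two calls is relied on, since the number of characteristics() calls is an implementation detail that may differ between JDK versions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class CallOrderSketch {

    // Hypothetical collector that logs which of its methods collect() calls, in order.
    static class TracingCollector implements Collector<String, List<String>, List<String>> {
        final List<String> events = new ArrayList<>();

        @Override public Supplier<List<String>> supplier() {
            events.add("supplier");
            return ArrayList::new;
        }
        @Override public BiConsumer<List<String>, String> accumulator() {
            events.add("accumulator");
            return List::add;
        }
        @Override public BinaryOperator<List<String>> combiner() {
            events.add("combiner");
            return (a, b) -> { a.addAll(b); return a; };
        }
        @Override public Function<List<String>, List<String>> finisher() {
            events.add("finisher");
            return Function.identity();
        }
        @Override public Set<Characteristics> characteristics() {
            events.add("characteristics");
            return Collections.emptySet();   // no IDENTITY_FINISH, so finisher() is used
        }
    }

    // Runs a small sequential stream and returns the recorded call sequence.
    static List<String> trace() {
        TracingCollector collector = new TracingCollector();
        Stream.of("a", "b").collect(collector);
        return collector.events;
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

In the recorded sequence, "finisher" comes last and is immediately preceded by a "characteristics" entry, which matches the return statement of collect.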

Next, let's rewrite the collector defined above:

    @Override
    public Supplier<Set<T>> supplier() {
        return () -> {
            System.out.println("supplier create container-------" + Thread.currentThread().getName());
            return new HashSet<T>();
        };
    }

    @Override
    public BiConsumer<Set<T>, T> accumulator() {
        return (set, item) -> {
            System.out.println("accumulator:" + set + "------" + Thread.currentThread().getName());
            set.add(item);
        };
    }

    @Override
    public BinaryOperator<Set<T>> combiner() {
        return (set1, set2) -> {
            System.out.println("combiner:" + set1 + "----" + Thread.currentThread().getName());
            set1.addAll(set2);
            return set1;
        };
    }

    @Override
    public Function<Set<T>, Set<T>> finisher() {
        return (set) -> {
            System.out.println("finisher:" + set + "----" + Thread.currentThread().getName());
            return set;
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(Characteristics.UNORDERED));
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("nihao", "hello", "world", "hello", "welcome"));
        System.out.println(set.stream().collect(new MyCollector<String>()));
    }

1. When characteristics returns only UNORDERED

Running the code above with a sequential stream prints:

supplier create container-------main
accumulator:[]------main
accumulator:[world]------main
accumulator:[world, nihao]------main
accumulator:[world, nihao, hello]------main
finisher:[world, nihao, hello, welcome]----main
[world, nihao, hello, welcome]

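The output shows: 1. only one result container is created; 2. only one thread (main) operates on it; 3. the combiner function is not executed; 4. the finisher function is executed.

Now switch to a parallel stream: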

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("nihao", "hello", "world", "hello", "welcome"));
        System.out.println(set.stream().parallel().collect(new MyCollector<String>()));
    }

Output after running:

supplier create container-------ForkJoinPool.commonPool-worker-1
supplier create container-------main
supplier create container-------ForkJoinPool.commonPool-worker-2
accumulator:[]------ForkJoinPool.commonPool-worker-2
accumulator:[]------ForkJoinPool.commonPool-worker-1
accumulator:[]------main
supplier create container-------ForkJoinPool.commonPool-worker-2
combiner:[world]----ForkJoinPool.commonPool-worker-1
accumulator:[]------ForkJoinPool.commonPool-worker-2
combiner:[hello]----ForkJoinPool.commonPool-worker-2
combiner:[world, nihao]----ForkJoinPool.commonPool-worker-2
finisher:[world, nihao, hello, welcome]----main
[world, nihao, hello, welcome]

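1. Several result containers are created. 2. Several threads take part. 3. The combiner function is executed. 4. The finisher function is executed.

The output confirms that when the set returned by characteristics contains only UNORDERED, the finisher is executed. With a sequential stream, only one result container is created and the combiner function is not executed; with a parallel stream, several result containers are created and the combiner function is executed.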
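The same observation can be made with counters instead of print statements. This is a minimal sketch under the assumption of a collector that declares only UNORDERED; `ContainerCountSketch` and `count` are hypothetical names. The parallel numbers depend on how the runtime splits the work, so only the sequential numbers are fixed.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ContainerCountSketch {

    // Returns {containers created, combiner calls, result size} for one run.
    static int[] count(boolean parallel) {
        AtomicInteger containers = new AtomicInteger();
        AtomicInteger merges = new AtomicInteger();

        Collector<Integer, Set<Integer>, Set<Integer>> collector = Collector.of(
                () -> { containers.incrementAndGet(); return new HashSet<Integer>(); },
                (set, n) -> set.add(n),
                (a, b) -> { merges.incrementAndGet(); a.addAll(b); return a; },
                set -> set,                              // explicit finisher, as in scenario 1
                Collector.Characteristics.UNORDERED);

        Stream<Integer> stream = IntStream.rangeClosed(1, 1000).boxed();
        if (parallel) {
            stream = stream.parallel();
        }
        Set<Integer> result = stream.collect(collector);
        return new int[] { containers.get(), merges.get(), result.size() };
    }

    public static void main(String[] args) {
        int[] sequential = count(false);
        int[] parallel = count(true);
        System.out.println("sequential: containers=" + sequential[0] + ", merges=" + sequential[1]);
        System.out.println("parallel:   containers=" + parallel[0] + ", merges=" + parallel[1]);
    }
}
```

The sequential run reports one container and zero merges; the parallel run normally reports several containers, each partial HashSet being touched by one task at a time, which is why a non-thread-safe container is acceptable here.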

2. When characteristics also returns IDENTITY_FINISH

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(
                EnumSet.of(Characteristics.UNORDERED, Characteristics.IDENTITY_FINISH));
    }

Output with a sequential stream:

supplier create container-------main
accumulator:[]------main
accumulator:[world]------main
accumulator:[world, nihao]------main
accumulator:[world, nihao, hello]------main
[world, nihao, hello, welcome]

Output with a parallel stream:

supplier create container-------main
supplier create container-------ForkJoinPool.commonPool-worker-2
supplier create container-------ForkJoinPool.commonPool-worker-1
supplier create container-------ForkJoinPool.commonPool-worker-3
accumulator:[]------main
accumulator:[]------ForkJoinPool.commonPool-worker-3
accumulator:[]------ForkJoinPool.commonPool-worker-1
accumulator:[]------ForkJoinPool.commonPool-worker-2
combiner:[hello]----ForkJoinPool.commonPool-worker-1
combiner:[world]----ForkJoinPool.commonPool-worker-2
combiner:[world, nihao]----ForkJoinPool.commonPool-worker-2
[world, nihao, hello, welcome]

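The results show that once IDENTITY_FINISH is added, the collector's finisher is no longer called. Keep in mind, though, that with IDENTITY_FINISH the conversion from A to R is done by an unchecked cast instead of by calling finisher. The result container type A must therefore be castable to the final result type R; otherwise a ClassCastException is thrown.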
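The failure mode can be reproduced with a deliberately wrong collector. In the sketch below, the container type is StringBuilder and the result type is String, yet IDENTITY_FINISH is declared, so collect returns the StringBuilder itself and the assignment to a String variable fails. `IdentityFinishPitfall`, `brokenJoin`, and `failsWithClassCast` are hypothetical names used only for this illustration.

```java
import java.util.stream.Collector;
import java.util.stream.Stream;

public class IdentityFinishPitfall {

    // Deliberately incorrect: A (StringBuilder) cannot be cast to R (String),
    // so declaring IDENTITY_FINISH breaks the Collector contract.
    static Collector<String, StringBuilder, String> brokenJoin() {
        return Collector.of(
                () -> new StringBuilder(),
                (sb, s) -> sb.append(s),
                (a, b) -> a.append(b),
                sb -> sb.toString(),                       // skipped because of the flag below
                Collector.Characteristics.IDENTITY_FINISH);
    }

    // Returns true if using the result as a String throws ClassCastException.
    static boolean failsWithClassCast() {
        try {
            String joined = Stream.of("a", "b").collect(brokenJoin());
            System.out.println("unexpectedly got: " + joined);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("ClassCastException thrown: " + failsWithClassCast());
    }
}
```

Removing the IDENTITY_FINISH argument makes the finisher run again and the collector returns "ab" as expected.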

3. When characteristics also returns CONCURRENT

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.unmodifiableSet(EnumSet.of(
                Characteristics.UNORDERED,
                Characteristics.IDENTITY_FINISH,
                Characteristics.CONCURRENT));
    }

Output with a sequential stream:

supplier create container-------main
accumulator:[]------main
accumulator:[world]------main
accumulator:[world, nihao]------main
accumulator:[world, nihao, hello]------main
[world, nihao, hello, welcome]

Output with a parallel stream:

supplier create container-------main
accumulator:[]------main
accumulator:[hello]------main
accumulator:[hello, welcome]------main
accumulator:[hello]------ForkJoinPool.commonPool-worker-1
[world, nihao, hello, welcome]

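The results show that with CONCURRENT the supplier provides only a single result container even under multi-threaded execution, so combiner does not need to be called. But note: because several threads operate on the same result container, the accumulator must not do anything that is not thread-safe, or the result may be wrong or an exception may be thrown. The HashSet used here is itself not thread-safe, and printing it inside the accumulator iterates over it while other threads may be adding to it, so real code should use a concurrent container.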
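A safer variant of the CONCURRENT scenario can be sketched with a thread-safe container. The example below swaps the HashSet for `ConcurrentHashMap.newKeySet()`, which is a substitution made here and not the article's original code, and counts supplier and combiner calls. `ConcurrentCollectorSketch` and `run` are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class ConcurrentCollectorSketch {

    // Returns {containers created, combiner calls, result size} for a parallel run.
    static int[] run() {
        AtomicInteger containers = new AtomicInteger();
        AtomicInteger merges = new AtomicInteger();

        // This Collector.of overload (no finisher argument) adds IDENTITY_FINISH itself.
        Collector<Integer, Set<Integer>, Set<Integer>> collector = Collector.of(
                () -> { containers.incrementAndGet(); return ConcurrentHashMap.<Integer>newKeySet(); },
                (set, n) -> set.add(n),                  // safe to call from several threads
                (a, b) -> { merges.incrementAndGet(); a.addAll(b); return a; },
                Collector.Characteristics.CONCURRENT,
                Collector.Characteristics.UNORDERED);

        Set<Integer> result = IntStream.rangeClosed(1, 1000)
                .boxed()
                .parallel()
                .collect(collector);
        return new int[] { containers.get(), merges.get(), result.size() };
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("containers=" + counts[0] + ", merges=" + counts[1] + ", size=" + counts[2]);
    }
}
```

With CONCURRENT and UNORDERED both set on a parallel stream, collect takes the single-container path shown earlier, so one container is created, the combiner is never called, and all 1000 elements end up in the shared set.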
