1. Usage of an ordinary query

org.apache.lucene.search.IndexSearcher

public void search(Query query, Collector results)

where Collector is defined as follows:

/**
 * <p>Expert: Collectors are primarily meant to be used to
 * gather raw results from a search, and implement sorting
 * or custom result filtering, collation, etc. </p>
 *
 * <p>Lucene's core collectors are derived from {@link Collector}
 * and {@link SimpleCollector}. Likely your application can
 * use one of these classes, or subclass {@link TopDocsCollector},
 * instead of implementing Collector directly:
 *
 * <ul>
 *
 *   <li>{@link TopDocsCollector} is an abstract base class
 *   that assumes you will retrieve the top N docs,
 *   according to some criteria, after collection is
 *   done. </li>
 *
 *   <li>{@link TopScoreDocCollector} is a concrete subclass
 *   {@link TopDocsCollector} and sorts according to score +
 *   docID. This is used internally by the {@link
 *   IndexSearcher} search methods that do not take an
 *   explicit {@link Sort}. It is likely the most frequently
 *   used collector.</li>
 *
 *   <li>{@link TopFieldCollector} subclasses {@link
 *   TopDocsCollector} and sorts according to a specified
 *   {@link Sort} object (sort by field). This is used
 *   internally by the {@link IndexSearcher} search methods
 *   that take an explicit {@link Sort}.
 *
 *   <li>{@link TimeLimitingCollector}, which wraps any other
 *   Collector and aborts the search if it's taken too much
 *   time.</li>
 *
 *   <li>{@link PositiveScoresOnlyCollector} wraps any other
 *   Collector and prevents collection of hits whose score
 *   is &lt;= 0.0</li>
 *
 * </ul>
 *
 * @lucene.experimental
 */
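As the javadoc notes, TopScoreDocCollector is the collector used most often. Below is a minimal sketch of driving search(Query, Collector) with it; it assumes a Lucene 7.x-era API (TopScoreDocCollector.create takes extra arguments in later versions), and the class and method names are made up for illustration.

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;

public class CollectorSearchDemo {
    // Collect the top 10 hits sorted by score + docID, which is what
    // IndexSearcher does internally when no explicit Sort is given.
    public static void searchTop10(Directory dir, Query query) throws Exception {
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopScoreDocCollector collector = TopScoreDocCollector.create(10);
            searcher.search(query, collector);   // the Collector-based entry point shown above
            for (ScoreDoc sd : collector.topDocs().scoreDocs) {
                System.out.println("doc=" + sd.doc + " score=" + sd.score);
            }
        }
    }
}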

The Collector class hierarchy:
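To make the hierarchy concrete, a custom collector usually extends SimpleCollector rather than implementing Collector directly. The following is a minimal sketch that only counts hits; CountingCollector is a made-up name, and needsScores() is the 5.x/7.x hook (replaced by scoreMode() in newer versions).

import java.io.IOException;
import org.apache.lucene.search.SimpleCollector;

// Counts matching documents without keeping any of them.
public class CountingCollector extends SimpleCollector {
    private int count;

    @Override
    public void collect(int doc) throws IOException {
        count++;                // called once per matching document in each segment
    }

    @Override
    public boolean needsScores() {
        return false;           // lets Lucene skip score computation entirely
    }

    public int getCount() {
        return count;
    }
}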

2. The lucene-grouping module

It provides GroupingSearch for grouped queries, along with the corresponding collectors; a sketch of its use follows.
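The sketch below shows the GroupingSearch convenience API, which wraps the two-pass collectors shown in the next section. The field name "category" and the limits are made up for illustration; the setters follow org.apache.lucene.search.grouping.GroupingSearch.

import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.util.BytesRef;

public class GroupingSearchDemo {
    // Group hits by the "category" field and print each group value with its hit count.
    public static void groupByCategory(IndexSearcher searcher, Query query) throws Exception {
        GroupingSearch groupingSearch = new GroupingSearch("category");
        groupingSearch.setGroupSort(Sort.RELEVANCE);        // order of the groups themselves
        groupingSearch.setSortWithinGroup(Sort.RELEVANCE);  // order of documents inside each group
        groupingSearch.setGroupDocsLimit(100);              // maxDocsPerGroup
        groupingSearch.setCachingInMB(4.0, true);           // cache the first pass, scores included

        // groupOffset = 0, return at most 10 groups
        TopGroups<BytesRef> topGroups = groupingSearch.search(searcher, query, 0, 10);
        for (GroupDocs<BytesRef> gd : topGroups.groups) {
            System.out.println(gd.groupValue.utf8ToString() + " -> " + gd.totalHits);
        }
    }
}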

3. Example:

public Map<String, Integer> groupBy(Query query, String field, int topCount) {
    Map<String, Integer> map = new HashMap<String, Integer>();
    long begin = System.currentTimeMillis();
    int topNGroups = topCount;
    int groupOffset = 0;
    int maxDocsPerGroup = 100;
    int withinGroupOffset = 0;
    try {
        FirstPassGroupingCollector c1 = new FirstPassGroupingCollector(field, Sort.RELEVANCE, topNGroups);
        boolean cacheScores = true;
        double maxCacheRAMMB = 4.0;
        CachingCollector cachedCollector = CachingCollector.create(c1, cacheScores, maxCacheRAMMB);
        indexSearcher.search(query, cachedCollector);
        Collection<SearchGroup<String>> topGroups = c1.getTopGroups(groupOffset, true);
        if (topGroups == null) {
            return null;
        }
        SecondPassGroupingCollector c2 = new SecondPassGroupingCollector(field, topGroups, Sort.RELEVANCE,
                Sort.RELEVANCE, maxDocsPerGroup, true, true, true);
        if (cachedCollector.isCached()) {
            // Cache fit within maxCacheRAMMB, so we can replay it:
            cachedCollector.replay(c2);
        } else {
            // Cache was too large; must re-execute query:
            indexSearcher.search(query, c2);
        }
        TopGroups<String> tg = c2.getTopGroups(withinGroupOffset);
        GroupDocs<String>[] gds = tg.groups;
        for (GroupDocs<String> gd : gds) {
            map.put(gd.groupValue, gd.totalHits);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    long end = System.currentTimeMillis();
    System.out.println("group by time :" + (end - begin) + "ms");
    return map;
}
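A call site for the method above might look like this (the query and field name are made up for illustration):

Map<String, Integer> counts = groupBy(new TermQuery(new Term("content", "lucene")), "category", 10);
for (Map.Entry<String, Integer> e : counts.entrySet()) {
    System.out.println(e.getKey() + " -> " + e.getValue());
}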

Parameter descriptions:

  • groupField: the field to group on
  • groupSort: how the groups themselves are sorted
  • topNGroups: maximum number of groups to return
  • groupOffset: offset into the group list, used for paging over groups
  • withinGroupSort: how documents inside each group are sorted
  • maxDocsPerGroup: maximum number of documents kept per group
  • withinGroupOffset: offset inside each group, used for paging within a group

References

https://blog.csdn.net/wyyl1/article/details/7388241

Reposted from: https://www.cnblogs.com/davidwang456/p/10000765.html
