Lucene Source Code Analysis (5): lucene-group
1. Using an ordinary query
org.apache.lucene.search.IndexSearcher
public void search(Query query, Collector results)
where Collector is documented as follows:
/**
 * <p>Expert: Collectors are primarily meant to be used to
 * gather raw results from a search, and implement sorting
 * or custom result filtering, collation, etc. </p>
 *
 * <p>Lucene's core collectors are derived from {@link Collector}
 * and {@link SimpleCollector}. Likely your application can
 * use one of these classes, or subclass {@link TopDocsCollector},
 * instead of implementing Collector directly:
 *
 * <ul>
 *
 *   <li>{@link TopDocsCollector} is an abstract base class
 *   that assumes you will retrieve the top N docs,
 *   according to some criteria, after collection is
 *   done. </li>
 *
 *   <li>{@link TopScoreDocCollector} is a concrete subclass
 *   {@link TopDocsCollector} and sorts according to score +
 *   docID. This is used internally by the {@link
 *   IndexSearcher} search methods that do not take an
 *   explicit {@link Sort}. It is likely the most frequently
 *   used collector.</li>
 *
 *   <li>{@link TopFieldCollector} subclasses {@link
 *   TopDocsCollector} and sorts according to a specified
 *   {@link Sort} object (sort by field). This is used
 *   internally by the {@link IndexSearcher} search methods
 *   that take an explicit {@link Sort}.
 *
 *   <li>{@link TimeLimitingCollector}, which wraps any other
 *   Collector and aborts the search if it's taken too much
 *   time.</li>
 *
 *   <li>{@link PositiveScoresOnlyCollector} wraps any other
 *   Collector and prevents collection of hits whose score
 *   is <= 0.0</li>
 *
 * </ul>
 *
 * @lucene.experimental
 */
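The Javadoc describes TopScoreDocCollector as keeping the top N docs ordered by score + docID. The idea can be sketched in plain Java as below. This is a simplified stand-in, not Lucene's Collector API: the class TopNCollectorSketch, its Hit type and the collect(docId, score) signature are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Simplified stand-in for the collector idea; NOT Lucene's Collector API.
class TopNCollectorSketch {

    /** A (docId, score) pair standing in for a scored hit. */
    static final class Hit {
        final int docId;
        final float score;

        Hit(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // Orders hits from worst to best: lower score first; on ties, higher docId first.
    private static final Comparator<Hit> WORST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.docId, a.docId);
    };

    private final int topN;
    private final PriorityQueue<Hit> queue = new PriorityQueue<>(WORST_FIRST);
    private int totalHits = 0;

    TopNCollectorSketch(int topN) {
        this.topN = topN;
    }

    /** Called once per matching document, like a collect callback. */
    void collect(int docId, float score) {
        totalHits++;
        queue.add(new Hit(docId, score));
        if (queue.size() > topN) {
            queue.poll(); // drop the current worst hit
        }
    }

    int getTotalHits() {
        return totalHits;
    }

    /** Returns the retained hits, best first (highest score, then lowest docId). */
    List<Hit> topDocs() {
        List<Hit> result = new ArrayList<>(queue);
        result.sort(WORST_FIRST.reversed());
        return result;
    }

    public static void main(String[] args) {
        TopNCollectorSketch collector = new TopNCollectorSketch(2);
        float[] scores = {1.0f, 3.0f, 3.0f, 0.5f};
        for (int docId = 0; docId < scores.length; docId++) {
            collector.collect(docId, scores[docId]);
        }
        for (Hit hit : collector.topDocs()) {
            System.out.println(hit.docId + " " + hit.score);
        }
    }
}
```

The heap holds at most topN + 1 hits at any time, so memory stays bounded no matter how many documents match; only the total hit count grows.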
[Figure: Collector class hierarchy]
2. lucene-group
The module provides grouped search through GroupingSearch, together with the corresponding collectors.
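Grouping collectors work in two passes: a first pass picks the top N groups, and a second pass gathers the best documents inside those groups. A minimal plain-Java sketch of that idea follows. It does not use Lucene classes; TwoPassGroupingSketch, Hit, firstPass and secondPass are hypothetical names, and ranking groups by their best score is an assumption that corresponds to sorting by relevance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of two-pass grouping; NOT Lucene's collector classes.
class TwoPassGroupingSketch {

    /** Hypothetical hit: a document id, its group value and its score. */
    static final class Hit {
        final int docId;
        final String group;
        final float score;

        Hit(int docId, String group, float score) {
            this.docId = docId;
            this.group = group;
            this.score = score;
        }
    }

    /** Pass 1: rank groups by their best-scoring hit and keep the top N group values. */
    static List<String> firstPass(List<Hit> hits, int topNGroups) {
        Map<String, Float> best = new LinkedHashMap<>();
        for (Hit h : hits) {
            Float prev = best.get(h.group);
            if (prev == null || h.score > prev) {
                best.put(h.group, h.score);
            }
        }
        List<String> groups = new ArrayList<>(best.keySet());
        groups.sort((a, b) -> Float.compare(best.get(b), best.get(a)));
        return new ArrayList<>(groups.subList(0, Math.min(topNGroups, groups.size())));
    }

    /** Pass 2: for the surviving groups only, keep up to maxDocsPerGroup hits, best first. */
    static Map<String, List<Hit>> secondPass(List<Hit> hits, List<String> topGroups,
                                             int maxDocsPerGroup) {
        Map<String, List<Hit>> result = new LinkedHashMap<>();
        for (String g : topGroups) {
            result.put(g, new ArrayList<>());
        }
        for (Hit h : hits) {
            List<Hit> docs = result.get(h.group);
            if (docs != null) {
                docs.add(h); // hits of groups that lost pass 1 are ignored
            }
        }
        for (List<Hit> docs : result.values()) {
            docs.sort((a, b) -> Float.compare(b.score, a.score));
            if (docs.size() > maxDocsPerGroup) {
                docs.subList(maxDocsPerGroup, docs.size()).clear();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(0, "a", 1.0f), new Hit(1, "b", 5.0f), new Hit(2, "a", 3.0f),
                new Hit(3, "c", 2.0f), new Hit(4, "b", 0.5f), new Hit(5, "a", 2.0f));
        List<String> topGroups = firstPass(hits, 2);
        Map<String, List<Hit>> grouped = secondPass(hits, topGroups, 2);
        for (Map.Entry<String, List<Hit>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " docs");
        }
    }
}
```

Two passes are needed because the winning groups are only known after every hit has been seen, so the per-group document lists cannot be built safely in the same sweep.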
3. Example:
public Map<String, Integer> groupBy(Query query, String field, int topCount) {
    Map<String, Integer> map = new HashMap<String, Integer>();
    long begin = System.currentTimeMillis();
    int topNGroups = topCount;
    int groupOffset = 0;
    int maxDocsPerGroup = 100;
    int withinGroupOffset = 0;
    try {
        FirstPassGroupingCollector c1 = new FirstPassGroupingCollector(field, Sort.RELEVANCE, topNGroups);
        boolean cacheScores = true;
        double maxCacheRAMMB = 4.0;
        CachingCollector cachedCollector = CachingCollector.create(c1, cacheScores, maxCacheRAMMB);
        indexSearcher.search(query, cachedCollector);
        Collection<SearchGroup<String>> topGroups = c1.getTopGroups(groupOffset, true);
        if (topGroups == null) {
            return null;
        }
        SecondPassGroupingCollector c2 = new SecondPassGroupingCollector(field, topGroups, Sort.RELEVANCE, Sort.RELEVANCE, maxDocsPerGroup, true, true, true);
        if (cachedCollector.isCached()) {
            // Cache fit within maxCacheRAMMB, so we can replay it:
            cachedCollector.replay(c2);
        } else {
            // Cache was too large; must re-execute query:
            indexSearcher.search(query, c2);
        }
        TopGroups<String> tg = c2.getTopGroups(withinGroupOffset);
        GroupDocs<String>[] gds = tg.groups;
        for (GroupDocs<String> gd : gds) {
            map.put(gd.groupValue, gd.totalHits);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    long end = System.currentTimeMillis();
    System.out.println("group by time :" + (end - begin) + "ms");
    return map;
}
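The isCached() / replay() branch in the example avoids running the query twice when the first-pass hits fit in the cache. A rough plain-Java sketch of that pattern is shown here. It is not Lucene's CachingCollector: CachingReplaySketch is a hypothetical class, and its budget is counted in documents rather than megabytes to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Plain-Java illustration of "cache the hits, replay if they fit"; NOT Lucene's CachingCollector.
class CachingReplaySketch {
    private final IntConsumer delegate; // first-pass consumer of doc ids
    private final int maxCachedDocs;    // hypothetical budget, counted in docs instead of MB
    private List<Integer> cache = new ArrayList<>();

    CachingReplaySketch(IntConsumer delegate, int maxCachedDocs) {
        this.delegate = delegate;
        this.maxCachedDocs = maxCachedDocs;
    }

    /** Forwards each hit to the delegate and remembers it while the budget allows. */
    void collect(int docId) {
        delegate.accept(docId);
        if (cache != null) {
            if (cache.size() < maxCachedDocs) {
                cache.add(docId);
            } else {
                cache = null; // budget exceeded: give up caching entirely
            }
        }
    }

    boolean isCached() {
        return cache != null;
    }

    /** Replays the cached hits into another consumer; only valid while isCached() is true. */
    void replay(IntConsumer other) {
        if (cache == null) {
            throw new IllegalStateException("cache was dropped; re-run the query instead");
        }
        for (int docId : cache) {
            other.accept(docId);
        }
    }

    public static void main(String[] args) {
        List<Integer> firstPass = new ArrayList<>();
        List<Integer> secondPass = new ArrayList<>();
        CachingReplaySketch caching = new CachingReplaySketch(d -> firstPass.add(d), 100);
        caching.collect(7);
        caching.collect(9);
        if (caching.isCached()) {
            caching.replay(d -> secondPass.add(d)); // no second search needed
        }
        System.out.println(secondPass);
    }
}
```

Dropping the whole cache once the budget is exceeded keeps the rule simple: a partial cache would be useless for the second pass, which must see every hit.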
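Notes on the parameters:
groupField: the field to group by
groupSort: sort order of the groups
topNGroups: maximum number of groups
groupOffset: offset used for paging through the groups
withinGroupSort: sort order of the results within each group
maxDocsPerGroup: maximum number of results per group
withinGroupOffset: offset used for paging within a group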
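How groupOffset and withinGroupOffset interact with topNGroups and maxDocsPerGroup can be illustrated with a small helper. The sketch assumes that an offset is applied by skipping leading entries of an already ranked top-N list, so a page of size P at offset O requires collecting O + P entries. GroupPagingSketch, topNFor and page are hypothetical names, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of offset-based paging over a ranked list; NOT a Lucene API.
class GroupPagingSketch {

    /** Number of entries to collect so that a page of pageSize starting at offset is covered. */
    static int topNFor(int offset, int pageSize) {
        return offset + pageSize;
    }

    /** Skips the first offset entries of an already ranked top-N list. */
    static <T> List<T> page(List<T> rankedTopN, int offset) {
        if (offset >= rankedTopN.size()) {
            return new ArrayList<>(); // the requested page lies beyond the collected entries
        }
        return new ArrayList<>(rankedTopN.subList(offset, rankedTopN.size()));
    }

    public static void main(String[] args) {
        // Second page of groups with 2 groups per page: collect 4 groups, skip the first 2.
        int topNGroups = topNFor(2, 2);
        List<String> rankedGroups = Arrays.asList("g1", "g2", "g3", "g4");
        System.out.println(topNGroups + " " + page(rankedGroups, 2));
    }
}
```

The same arithmetic would apply inside a group: under this assumption, maxDocsPerGroup has to cover withinGroupOffset plus the number of documents shown per page.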
References
https://blog.csdn.net/wyyl1/article/details/7388241
Reposted from: https://www.cnblogs.com/davidwang456/p/10000765.html