Lucene Source Code Analysis (5): lucene-group

1. Basic query usage

org.apache.lucene.search.IndexSearcher

public void search(Query query, Collector results)

Here, Collector is defined as follows:

/**
 * <p>Expert: Collectors are primarily meant to be used to
 * gather raw results from a search, and implement sorting
 * or custom result filtering, collation, etc. </p>
 *
 * <p>Lucene's core collectors are derived from {@link Collector}
 * and {@link SimpleCollector}. Likely your application can
 * use one of these classes, or subclass {@link TopDocsCollector},
 * instead of implementing Collector directly:
 *
 * <ul>
 *
 *   <li>{@link TopDocsCollector} is an abstract base class
 *   that assumes you will retrieve the top N docs,
 *   according to some criteria, after collection is
 *   done. </li>
 *
 *   <li>{@link TopScoreDocCollector} is a concrete subclass
 *   {@link TopDocsCollector} and sorts according to score +
 *   docID. This is used internally by the {@link
 *   IndexSearcher} search methods that do not take an
 *   explicit {@link Sort}. It is likely the most frequently
 *   used collector.</li>
 *
 *   <li>{@link TopFieldCollector} subclasses {@link
 *   TopDocsCollector} and sorts according to a specified
 *   {@link Sort} object (sort by field). This is used
 *   internally by the {@link IndexSearcher} search methods
 *   that take an explicit {@link Sort}.
 *
 *   <li>{@link TimeLimitingCollector}, which wraps any other
 *   Collector and aborts the search if it's taken too much
 *   time.</li>
 *
 *   <li>{@link PositiveScoresOnlyCollector} wraps any other
 *   Collector and prevents collection of hits whose score
 *   is &lt;= 0.0</li>
 *
 * </ul>
 *
 * @lucene.experimental
 */
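To make the "top N docs, sorted by score + docID" idea from the Javadoc concrete, here is a minimal plain-Java sketch. It does not use the Lucene API: Hit, HitCollector and TopNCollector are hypothetical names invented for this illustration, and the bounded heap is only one plausible way to keep the best N hits while the search feeds them in one at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a scored hit; not a Lucene class. */
final class Hit {
    final int docId;
    final float score;

    Hit(int docId, float score) {
        this.docId = docId;
        this.score = score;
    }
}

/** Hypothetical callback interface modeling the role a collector plays. */
interface HitCollector {
    void collect(int docId, float score);
}

/** Keeps the n best hits: higher score first, lower docId breaks ties. */
final class TopNCollector implements HitCollector {
    // Best-first ordering: score descending, then docId ascending.
    private static final Comparator<Hit> BEST_FIRST = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
    };

    private final int n;
    // Reversed ordering, so the head of the heap is the worst hit currently kept.
    private final PriorityQueue<Hit> heap = new PriorityQueue<>(BEST_FIRST.reversed());
    private int totalHits = 0;

    TopNCollector(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    @Override
    public void collect(int docId, float score) {
        totalHits++;
        Hit hit = new Hit(docId, score);
        if (heap.size() < n) {
            heap.add(hit);
        } else if (BEST_FIRST.compare(hit, heap.peek()) < 0) {
            // The new hit beats the worst kept hit, so it takes its place.
            heap.poll();
            heap.add(hit);
        }
    }

    /** Number of hits seen, including those that were not kept. */
    int totalHits() {
        return totalHits;
    }

    /** Returns the kept hits, best first. */
    List<Hit> topDocs() {
        List<Hit> result = new ArrayList<>(heap);
        result.sort(BEST_FIRST);
        return result;
    }
}
```

The search loop only ever calls collect(docId, score). Ranking and truncation happen inside the collector, which is why different collectors can be swapped in without touching the query execution.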

[Figure: Collector class hierarchy]

2. lucene-group

The lucene-group module provides grouped queries through GroupingSearch, which is backed by the corresponding grouping collectors.

3. Example:

public Map<String, Integer> groupBy(Query query, String field, int topCount) {
    Map<String, Integer> map = new HashMap<String, Integer>();
    long begin = System.currentTimeMillis();
    int topNGroups = topCount;
    int groupOffset = 0;
    int maxDocsPerGroup = 100;
    int withinGroupOffset = 0;
    try {
        // First pass: find the top groups, ranked by relevance.
        FirstPassGroupingCollector c1 = new FirstPassGroupingCollector(field, Sort.RELEVANCE, topNGroups);
        // Cache the hits (and scores) so the second pass can avoid re-running the query.
        boolean cacheScores = true;
        double maxCacheRAMMB = 4.0;
        CachingCollector cachedCollector = CachingCollector.create(c1, cacheScores, maxCacheRAMMB);
        indexSearcher.search(query, cachedCollector);
        Collection<SearchGroup<String>> topGroups = c1.getTopGroups(groupOffset, true);
        if (topGroups == null) {
            // No groups matched the query.
            return null;
        }
        // Second pass: collect the documents inside each of the top groups.
        SecondPassGroupingCollector c2 = new SecondPassGroupingCollector(field, topGroups, Sort.RELEVANCE, Sort.RELEVANCE, maxDocsPerGroup, true, true, true);
        if (cachedCollector.isCached()) {
            // Cache fit within maxCacheRAMMB, so we can replay it:
            cachedCollector.replay(c2);
        } else {
            // Cache was too large; must re-execute query:
            indexSearcher.search(query, c2);
        }
        TopGroups<String> tg = c2.getTopGroups(withinGroupOffset);
        GroupDocs<String>[] gds = tg.groups;
        // Map each group value to the total number of hits in that group.
        for (GroupDocs<String> gd : gds) {
            map.put(gd.groupValue, gd.totalHits);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    long end = System.currentTimeMillis();
    System.out.println("group by time :" + (end - begin) + "ms");
    return map;
}
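The cache-then-replay control flow in the method above can be modeled in plain Java. The following is only a sketch under stated assumptions: CachingDocCollector and ReplayDemo are hypothetical names, not Lucene classes, and the cache budget is counted in documents instead of megabytes to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/** Hypothetical model of a caching wrapper; not Lucene's CachingCollector. */
final class CachingDocCollector implements IntConsumer {
    private final IntConsumer delegate;
    private final int maxCachedDocs;          // assumption: budget counted in docs, not MB
    private List<Integer> cache = new ArrayList<>();

    CachingDocCollector(IntConsumer delegate, int maxCachedDocs) {
        this.delegate = delegate;
        this.maxCachedDocs = maxCachedDocs;
    }

    @Override
    public void accept(int docId) {
        delegate.accept(docId);               // always forward to the wrapped collector
        if (cache != null) {
            if (cache.size() < maxCachedDocs) {
                cache.add(docId);
            } else {
                cache = null;                 // budget exceeded: discard the cache for good
            }
        }
    }

    /** True if every collected doc fit into the cache. */
    boolean isCached() {
        return cache != null;
    }

    /** Feeds the cached docs to another collector, in collection order. */
    void replay(IntConsumer other) {
        if (cache == null) {
            throw new IllegalStateException("cache was discarded");
        }
        for (int docId : cache) {
            other.accept(docId);
        }
    }
}

final class ReplayDemo {
    /**
     * Runs two passes over the matching docs. Returns true if the second
     * pass was served from the cache, false if the "query" had to be re-run.
     */
    static boolean twoPasses(int[] matchingDocs, int maxCachedDocs,
                             IntConsumer first, IntConsumer second) {
        CachingDocCollector cached = new CachingDocCollector(first, maxCachedDocs);
        for (int doc : matchingDocs) {
            cached.accept(doc);               // stands in for search(query, cachedCollector)
        }
        if (cached.isCached()) {
            cached.replay(second);
            return true;
        }
        for (int doc : matchingDocs) {
            second.accept(doc);               // stands in for re-executing the query
        }
        return false;
    }
}
```

Either way the second collector sees the same documents. The cache only decides whether the cost of matching the query is paid once or twice.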

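Notes on the parameters:

groupField: the field to group by
groupSort: the sort order of the groups
topNGroups: the maximum number of groups
groupOffset: the offset used to page through the groups
withinGroupSort: the sort order of the results within each group
maxDocsPerGroup: the maximum number of results per group
withinGroupOffset: the offset used to page within a group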
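To see how these parameters interact, the following plain-Java sketch models two-pass grouping over an in-memory list. It is an illustration, not the Lucene implementation: Doc, GroupResult and TwoPassGrouping are hypothetical names, both sorts are fixed to relevance (score descending), and the sketch assumes that topNGroups and maxDocsPerGroup count the entries returned after the respective offset has been skipped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory document; not a Lucene class. */
final class Doc {
    final int id;
    final String group;
    final float score;

    Doc(int id, String group, float score) {
        this.id = id;
        this.group = group;
        this.score = score;
    }
}

/** Hypothetical result holder for one group. */
final class GroupResult {
    final int totalHits;      // all matching docs in the group
    final List<Doc> docs;     // the page of docs kept for the group

    GroupResult(int totalHits, List<Doc> docs) {
        this.totalHits = totalHits;
        this.docs = docs;
    }
}

final class TwoPassGrouping {
    // Relevance order: score descending, doc id ascending on ties.
    private static final Comparator<Doc> BY_SCORE_DESC = (a, b) -> {
        int c = Float.compare(b.score, a.score);
        return c != 0 ? c : Integer.compare(a.id, b.id);
    };

    /** First pass: rank groups by their best doc and return one page of group values. */
    static List<String> topGroups(List<Doc> hits, int topNGroups, int groupOffset) {
        Map<String, Doc> best = new LinkedHashMap<>();
        for (Doc d : hits) {
            best.merge(d.group, d, (x, y) -> BY_SCORE_DESC.compare(x, y) <= 0 ? x : y);
        }
        List<Doc> ranked = new ArrayList<>(best.values());
        ranked.sort(BY_SCORE_DESC);
        List<String> page = new ArrayList<>();
        for (int i = groupOffset; i < ranked.size() && page.size() < topNGroups; i++) {
            page.add(ranked.get(i).group);
        }
        return page;
    }

    /** Second pass: collect docs for the selected groups only, one page per group. */
    static Map<String, GroupResult> collect(List<Doc> hits, List<String> groups,
                                            int maxDocsPerGroup, int withinGroupOffset) {
        Map<String, List<Doc>> buckets = new LinkedHashMap<>();
        for (String g : groups) {
            buckets.put(g, new ArrayList<>());
        }
        for (Doc d : hits) {
            List<Doc> bucket = buckets.get(d.group);
            if (bucket != null) {
                bucket.add(d);                 // docs of unselected groups are skipped
            }
        }
        Map<String, GroupResult> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Doc>> e : buckets.entrySet()) {
            List<Doc> sorted = e.getValue();
            sorted.sort(BY_SCORE_DESC);
            int from = Math.min(withinGroupOffset, sorted.size());
            int to = Math.min(from + maxDocsPerGroup, sorted.size());
            out.put(e.getKey(),
                    new GroupResult(sorted.size(), new ArrayList<>(sorted.subList(from, to))));
        }
        return out;
    }
}
```

Note that totalHits counts every matching document in the group, while docs holds only the requested page. This mirrors the example method, which reads gd.totalHits rather than the length of the returned document list.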

References

https://blog.csdn.net/wyyl1/article/details/7388241

Reposted from: https://www.cnblogs.com/davidwang456/p/10000765.html
