Above we found the collect(…) method, whose parameter is the ID of a matching document. From the surrounding code, doc is obtained from iterator.nextDoc(). So when is DefaultBulkScorer.iterator assigned? The code is as follows.

```java
public abstract class Weight implements SegmentCacheable {
    protected static class DefaultBulkScorer extends BulkScorer {
        // ...
        public DefaultBulkScorer(Scorer scorer) {
            // ...
            this.scorer = scorer;
            this.iterator = scorer.iterator();
            this.twoPhase = scorer.twoPhaseIterator();
        }
        // ...
    }
}
```
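To make the role of this iterator concrete, here is a minimal sketch (stand-in types, not Lucene's actual classes) of the loop a bulk scorer runs over it: nextDoc() steps through the matching doc IDs in order until a NO_MORE_DOCS sentinel is returned, and each ID is handed to the collector's collect(doc).

```java
import java.util.ArrayList;
import java.util.List;

public class BulkScoreSketch {
    // Lucene uses Integer.MAX_VALUE as the end-of-iteration sentinel.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Stand-in for a DocIdSetIterator over a sorted postings list.
    static class DocIdIterator {
        private final int[] docs;
        private int pos = -1;
        DocIdIterator(int... docs) { this.docs = docs; }
        int nextDoc() {
            pos++;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }
    }

    // Stand-in for the scoring loop: drain the iterator, "collect" each doc ID.
    static List<Integer> scoreAll(DocIdIterator iterator) {
        List<Integer> collected = new ArrayList<>();
        for (int doc = iterator.nextDoc(); doc != NO_MORE_DOCS; doc = iterator.nextDoc()) {
            collected.add(doc); // in Lucene: leafCollector.collect(doc)
        }
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(scoreAll(new DocIdIterator(3, 7, 42)));
    }
}
```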

In the constructor, scorer.iterator() is what yields the matching document IDs. So where does scorer come from? Recall the Weight.bulkScorer(…) method below; as established earlier, the scorer(context) call is implemented by TermWeight.

```java
public abstract class Weight implements SegmentCacheable {
    public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
        Scorer scorer = scorer(context);
        // ...
        return new DefaultBulkScorer(scorer);
    }
}

public class TermQuery extends Query {
    final class TermWeight extends Weight {
        @Override
        public Scorer scorer(LeafReaderContext context) throws IOException {
            final TermsEnum termsEnum = getTermsEnum(context);
            if (termsEnum == null) {
                return null;
            }
            PostingsEnum docs = termsEnum.postings(null, needsScores ? PostingsEnum.FREQS : PostingsEnum.NONE);
            assert docs != null;
            return new TermScorer(this, docs, similarity.simScorer(stats, context));
        }
    }
}

final class TermScorer extends Scorer {
    private final PostingsEnum postingsEnum;

    TermScorer(Weight weight, PostingsEnum td, Similarity.SimScorer docScorer) {
        super(weight);
        this.docScorer = docScorer;
        this.postingsEnum = td;
    }

    @Override
    public DocIdSetIterator iterator() {
        return postingsEnum;
    }
}
```

At this point we can confirm that scorer.iterator() originates from termsEnum.postings(...). The inverted index is starting to come into view.

Next, let's focus on the actual type of termsEnum and its postings(...) method.

As shown above, termsEnum comes from TermQuery.getTermsEnum(...). The code is as follows.

```java
public class TermQuery extends Query {
    private TermsEnum getTermsEnum(LeafReaderContext context) throws IOException {
        final TermState state = termStates.get(context.ord);
        final TermsEnum termsEnum = context.reader().terms(term.field()).iterator();
        termsEnum.seekExact(term.bytes(), state);
        return termsEnum;
    }
}

public final class LeafReaderContext extends IndexReaderContext {
    private final LeafReader reader;
}
```

LeafReader itself provides no terms(...) implementation, so context.reader() cannot be a plain LeafReader; it must be one of its concrete subclasses. We already know that LeafReaderContext is an element of IndexSearcher.leafContexts, so finding where IndexSearcher.leafContexts is assigned will tell us the actual type of context.reader().

```java
public class IndexSearcher {
    public IndexSearcher(IndexReader r) {
        this(r, null);
    }

    public IndexSearcher(IndexReader r, ExecutorService executor) {
        this(r.getContext(), executor);
    }

    public IndexSearcher(IndexReaderContext context, ExecutorService executor) {
        // ...
        leafContexts = context.leaves();
        // ...
    }
}
```

From this code, IndexSearcher.leafContexts comes from IndexReader.getContext().leaves(). Typically this IndexReader is a StandardDirectoryReader returned by DirectoryReader.open(...). The code is as follows.

```java
public abstract class DirectoryReader extends BaseCompositeReader<LeafReader> {
    public static DirectoryReader open(final Directory directory) throws IOException {
        return StandardDirectoryReader.open(directory, null);
    }
}
```

So IndexSearcher.leafContexts actually comes from StandardDirectoryReader.getContext().leaves().

```java
public final class StandardDirectoryReader extends DirectoryReader {
    // ...
}

public abstract class DirectoryReader extends BaseCompositeReader<LeafReader> {
    // ...
}

public abstract class BaseCompositeReader<R extends IndexReader> extends CompositeReader {
    // ...
}

public abstract class CompositeReader extends IndexReader {
    @Override
    public final CompositeReaderContext getContext() {
        // ...
        readerContext = CompositeReaderContext.create(this);
        return readerContext;
    }

    @Override
    public List<LeafReaderContext> leaves() throws UnsupportedOperationException {
        return leaves;
    }

    private final List<LeafReaderContext> leaves;
}
```

How does CompositeReaderContext.create(…) build the context?

```java
public final class CompositeReaderContext extends IndexReaderContext {
    static CompositeReaderContext create(CompositeReader reader) {
        return new Builder(reader).build();
    }

    private static final class Builder {
        public Builder(CompositeReader reader) {
            this.reader = reader;
        }

        public CompositeReaderContext build() {
            return (CompositeReaderContext) build(null, reader, 0, 0);
        }

        private IndexReaderContext build(CompositeReaderContext parent, IndexReader reader, int ord, int docBase) {
            if (reader instanceof LeafReader) {
                final LeafReader ar = (LeafReader) reader;
                final LeafReaderContext atomic = new LeafReaderContext(parent, ar, ord, docBase, leaves.size(), leafDocBase);
                leaves.add(atomic);
                leafDocBase += reader.maxDoc();
                return atomic;
            } else {
                final CompositeReader cr = (CompositeReader) reader;
                final List<? extends IndexReader> sequentialSubReaders = cr.getSequentialSubReaders();
                final List<IndexReaderContext> children = Arrays.asList(new IndexReaderContext[sequentialSubReaders.size()]);
                final CompositeReaderContext newParent;
                if (parent == null) {
                    newParent = new CompositeReaderContext(cr, children, leaves);
                } else {
                    newParent = new CompositeReaderContext(parent, cr, ord, docBase, children);
                }
                int newDocBase = 0;
                for (int i = 0, c = sequentialSubReaders.size(); i < c; i++) {
                    final IndexReader r = sequentialSubReaders.get(i);
                    children.set(i, build(newParent, r, i, newDocBase));
                    newDocBase += r.maxDoc();
                }
                assert newDocBase == cr.maxDoc();
                return newParent;
            }
        }
    }

    private CompositeReaderContext(CompositeReaderContext parent, CompositeReader reader, int ordInParent, int docbaseInParent, List<IndexReaderContext> children, List<LeafReaderContext> leaves) {
        this.leaves = leaves == null ? null : Collections.unmodifiableList(leaves);
        // ...
    }
}
```

When build(...) runs, the reader passed in is a StandardDirectoryReader, so getSequentialSubReaders() is called to obtain all of its sub-readers; for each leaf sub-reader, a LeafReaderContext is created with that reader as a member variable and added to leaves.

So the reader of each LeafReaderContext element in IndexSearcher.leafContexts is one of the sub-readers returned by StandardDirectoryReader.getSequentialSubReaders().
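The docBase bookkeeping in that recursion is worth isolating. The sketch below (simplified types, not Lucene's) mirrors the newDocBase accumulation in Builder.build(...): each leaf gets a docBase equal to the sum of maxDoc() of the segments before it, which is what allows segment-local doc IDs to be mapped back to index-global ones.

```java
import java.util.ArrayList;
import java.util.List;

public class ContextBuildSketch {
    // Stand-in for a leaf (segment) reader plus its context's docBase.
    static class Leaf {
        final int maxDoc;
        int docBase;
        Leaf(int maxDoc) { this.maxDoc = maxDoc; }
    }

    // Mirrors the newDocBase accumulation loop over sequentialSubReaders.
    static List<Leaf> build(List<Leaf> segments) {
        List<Leaf> leaves = new ArrayList<>();
        int newDocBase = 0;
        for (Leaf leaf : segments) {
            leaf.docBase = newDocBase;   // passed as docBase to LeafReaderContext
            leaves.add(leaf);            // leaves.add(atomic)
            newDocBase += leaf.maxDoc;   // newDocBase += r.maxDoc()
        }
        return leaves;
    }

    public static void main(String[] args) {
        List<Leaf> leaves = build(java.util.Arrays.asList(new Leaf(10), new Leaf(5), new Leaf(7)));
        for (Leaf l : leaves) System.out.println(l.docBase);
    }
}
```

With segment sizes 10, 5, and 7, the leaves get docBases 0, 10, and 15, so global doc ID 12 falls in the second segment as local doc 2.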

```java
public final class StandardDirectoryReader extends DirectoryReader {
    static DirectoryReader open(final Directory directory, final IndexCommit commit) throws IOException {
        return new SegmentInfos.FindSegmentsFile<DirectoryReader>(directory) {
            @Override
            protected DirectoryReader doBody(String segmentFileName) throws IOException {
                SegmentInfos sis = SegmentInfos.readCommit(directory, segmentFileName);
                final SegmentReader[] readers = new SegmentReader[sis.size()];
                boolean success = false;
                try {
                    for (int i = sis.size() - 1; i >= 0; i--) {
                        readers[i] = new SegmentReader(sis.info(i), sis.getIndexCreatedVersionMajor(), IOContext.READ);
                    }
                    DirectoryReader reader = new StandardDirectoryReader(directory, readers, null, sis, false, false);
                    success = true;
                    return reader;
                }
                // ...
            }
        }.run(commit);
    }

    StandardDirectoryReader(Directory directory, LeafReader[] readers, IndexWriter writer, SegmentInfos sis, boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {
        super(directory, readers);
        this.writer = writer;
        this.segmentInfos = sis;
        this.applyAllDeletes = applyAllDeletes;
        this.writeAllDeletes = writeAllDeletes;
    }
}

public abstract class DirectoryReader extends BaseCompositeReader<LeafReader> {
    protected DirectoryReader(Directory directory, LeafReader[] segmentReaders) throws IOException {
        super(segmentReaders);
        this.directory = directory;
    }
}

public abstract class BaseCompositeReader<R extends IndexReader> extends CompositeReader {
    protected BaseCompositeReader(R[] subReaders) throws IOException {
        this.subReaders = subReaders;
        // ...
    }
}
```

From this we can deduce that each such reader is a SegmentReader, and that class (via its parent class CodecReader, in fact) does have a terms(…) method. The code is as follows.

```java
public final class SegmentReader extends CodecReader {
    // ...
    final SegmentCoreReaders core;

    @Override
    public FieldsProducer getPostingsReader() {
        return core.fields;
    }
}

public abstract class CodecReader extends LeafReader implements Accountable {
    @Override
    public final Terms terms(String field) throws IOException {
        return getPostingsReader().terms(field);
    }
}

final class SegmentCoreReaders {
    final FieldsProducer fields;

    SegmentCoreReaders(Directory dir, SegmentCommitInfo si, IOContext context) throws IOException {
        // ...
        final Codec codec = si.info.getCodec();
        final PostingsFormat format = codec.postingsFormat();
        fields = format.fieldsProducer(segmentReadState);
        // ...
    }
}
```

In lucene-7.3.0 the default codec is Lucene70Codec and the default postingsFormat is Lucene50PostingsFormat; that derivation is covered in "Lucene 源码分析之 segment" (to be published later).

So SegmentReader.terms(…) actually calls terms(…) on the FieldsProducer returned by Lucene50PostingsFormat.fieldsProducer(…).

```java
public final class Lucene50PostingsFormat extends PostingsFormat {
    @Override
    public FieldsProducer fieldsProducer(SegmentReadState state) throws IOException {
        PostingsReaderBase postingsReader = new Lucene50PostingsReader(state);
        FieldsProducer ret = new BlockTreeTermsReader(postingsReader, state);
        return ret;
    }
}
```

Ultimately, then, SegmentReader.terms(…) resolves to BlockTreeTermsReader.terms(…).

```java
public final class BlockTreeTermsReader extends FieldsProducer {
    @Override
    public Terms terms(String field) throws IOException {
        return fields.get(field);
    }

    private final TreeMap<String, FieldReader> fields = new TreeMap<>();

    public BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentReadState state) throws IOException {
        this.postingsReader = postingsReader;
        fields.put(fieldInfo.name, new FieldReader(...));
    }
}
```

So BlockTreeTermsReader.terms(…) actually returns a FieldReader.
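In other words, the per-field dispatch is just a map lookup. A minimal sketch (simplified types, not Lucene's API) of that registry, including the null result for a field that was never registered:

```java
import java.util.TreeMap;

public class FieldDispatchSketch {
    // Stand-in for the per-field terms dictionary reader.
    static class FieldReader {
        final String field;
        FieldReader(String f) { field = f; }
    }

    // Mirrors BlockTreeTermsReader's TreeMap<String, FieldReader> fields.
    private final TreeMap<String, FieldReader> fields = new TreeMap<>();

    // Mirrors fields.put(fieldInfo.name, new FieldReader(...)) in the constructor.
    FieldDispatchSketch register(String field) {
        fields.put(field, new FieldReader(field));
        return this;
    }

    // Mirrors terms(field): a plain lookup, null for unknown fields.
    FieldReader terms(String field) {
        return fields.get(field);
    }
}
```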

Let's revisit the core code from earlier.

```java
public class TermQuery extends Query {
    final class TermWeight extends Weight {
        @Override
        public Scorer scorer(LeafReaderContext context) throws IOException {
            final TermsEnum termsEnum = getTermsEnum(context);
            if (termsEnum == null) {
                return null;
            }
            PostingsEnum docs = termsEnum.postings(null, needsScores ? PostingsEnum.FREQS : PostingsEnum.NONE);
            assert docs != null;
            return new TermScorer(this, docs, similarity.simScorer(stats, context));
        }
    }

    private TermsEnum getTermsEnum(LeafReaderContext context) throws IOException {
        final TermState state = termStates.get(context.ord);
        final TermsEnum termsEnum = context.reader().terms(term.field()).iterator();
        termsEnum.seekExact(term.bytes(), state);
        return termsEnum;
    }
}
```

Therefore termsEnum is FieldReader.iterator(), which is a SegmentTermsEnum.

```java
public final class FieldReader extends Terms implements Accountable {
    @Override
    public TermsEnum iterator() throws IOException {
        return new SegmentTermsEnum(this);
    }
}
```

And termsEnum.postings(…) is SegmentTermsEnum.postings(…).

```java
final class SegmentTermsEnum extends TermsEnum {
    @Override
    public PostingsEnum postings(PostingsEnum reuse, int flags) throws IOException {
        currentFrame.decodeMetaData();
        return fr.parent.postingsReader.postings(fr.fieldInfo, currentFrame.state, reuse, flags);
    }

    final FieldReader fr;
}

public final class FieldReader extends Terms implements Accountable {
    final BlockTreeTermsReader parent;
}

public final class BlockTreeTermsReader extends FieldsProducer {
    final PostingsReaderBase postingsReader;
}
```

fr is assigned in the SegmentTermsEnum constructor.

```java
final class SegmentTermsEnum extends TermsEnum {
    public SegmentTermsEnum(FieldReader fr) throws IOException {
        this.fr = fr;
    }
}
```

And this FieldReader is constructed in the BlockTreeTermsReader constructor.

```java
public final class BlockTreeTermsReader extends FieldsProducer {
    public BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentReadState state) throws IOException {
        // ...
        fields.put(fieldInfo.name, new FieldReader(this, ...));
    }
}

public final class FieldReader extends Terms implements Accountable {
    FieldReader(BlockTreeTermsReader parent, ...) throws IOException {
        this.parent = parent;
    }
}
```

So fr.parent is the BlockTreeTermsReader, and fr.parent.postingsReader is a Lucene50PostingsReader, which is the core class behind the inverted index.
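Putting the whole chain together: a term-dictionary lookup followed by a postings-list read. The toy index below is an assumed, drastically simplified structure (nothing like the real block-tree terms dictionary or Lucene50PostingsReader's on-disk encoding), but it shows the shape of the data the chain ends at: term to sorted doc IDs, which TermScorer then iterates.

```java
import java.util.HashMap;
import java.util.Map;

public class InvertedIndexSketch {
    // term -> sorted doc IDs containing the term (the "postings list")
    private final Map<String, int[]> dictionary = new HashMap<>();

    // Analogous to indexing: record which documents contain a term.
    void add(String term, int... docIds) {
        dictionary.put(term, docIds);
    }

    // Analogous to seekExact(term) followed by postings(...): look the term
    // up in the dictionary and return the doc IDs a scorer would iterate.
    int[] postings(String term) {
        int[] docs = dictionary.get(term);
        return docs == null ? new int[0] : docs;
    }

    public static void main(String[] args) {
        InvertedIndexSketch idx = new InvertedIndexSketch();
        idx.add("lucene", 0, 2, 5);
        idx.add("index", 1, 2);
        System.out.println(java.util.Arrays.toString(idx.postings("lucene")));
    }
}
```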

Reposted from: https://www.cnblogs.com/studyhs/p/9092928.html
