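Some open-source search engine implementations: inverted indexes on raw files, the column store HBase, KV stores such as LevelDB, MongoDB, and Redis, and SQL stores such as SQLite or xxSQL...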

Note: this post collects open-source search engines other than the ES, Solr, and Sphinx families.

https://github.com/fergiemcdowall/norch A search engine based on Node.js and LevelDB

https://github.com/fergiemcdowall/search-index A persistent, network resilient, full text search library for the browser and Node.js

Both store their index in LevelDB, though I have not yet worked out whether LevelDB is a good fit for an inverted index. A similar approach:

https://github.com/patrickfrey/strus

Library implementing the storage and the query evaluation for a text search engine. It uses a key value store database interface to store its data. Currently there exists an implementation based on the Google LevelDB library. http://www.project-strus.net

Usage: https://www.codeproject.com/articles/1059766/building-a-search-engine-with-python-tornado-and-s
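One way to reason about whether an ordered key-value store such as LevelDB suits an inverted index is to give every posting its own key, `term\x00doc_id`, so that all postings of a term are adjacent and a prefix scan returns them in doc ID order. The sketch below is only an assumed layout, not how norch, search-index, or strus actually store their data. A sorted Python list stands in for LevelDB, and `OrderedKV`, `posting_key`, `index_doc`, and `postings` are hypothetical names.

```python
import bisect


class OrderedKV:
    """In-memory stand-in for an ordered KV store (keys kept sorted, like LevelDB)."""

    def __init__(self):
        self._keys = []  # sorted list of byte keys
        self._vals = {}  # key -> value

    def put(self, key, value=b""):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


def posting_key(term, doc_id):
    # Fixed-width big-endian doc ID so that byte order matches numeric order.
    return term.encode("utf-8") + b"\x00" + doc_id.to_bytes(8, "big")


def index_doc(kv, doc_id, text):
    """Store one key per (term, doc_id) pair; the value is left empty."""
    for term in set(text.lower().split()):
        kv.put(posting_key(term, doc_id))


def postings(kv, term):
    """Return the doc IDs for a term via a prefix range scan."""
    prefix = term.encode("utf-8") + b"\x00"
    return [int.from_bytes(k[len(prefix):], "big") for k, _ in kv.scan_prefix(prefix)]


kv = OrderedKV()
index_doc(kv, 2, "LevelDB is a key value store")
index_doc(kv, 1, "an inverted index maps a term to documents")
index_doc(kv, 3, "store the inverted index in a key value store")

print(postings(kv, "inverted"))  # doc IDs containing "inverted"
print(postings(kv, "store"))     # doc IDs containing "store"
```

With this layout, adding a document is a handful of small writes and reading a posting list is one sequential range scan, which is the access pattern an LSM-tree store handles well. The cost is that the term is repeated in every key, so a real system would likely pack many doc IDs into one value instead.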

Written in Go, relatively highly rated, and also using LevelDB for underlying storage:

https://github.com/blevesearch/bleve Powerful; supports facets and more.

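https://github.com/huichen/wukong 2000+ stars. It appears to use BoltDB for storage and supports distributed operation. A source code analysis is at https://ayende.com/blog/171745/code-reading-wukong-full-text-search-engine, which shows that its in-memory index can be persisted to a KV DB.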

https://github.com/basho/riak_search/ Probably stores the inverted index in sets inside Riak (a KV store). Sadly, it is written in Erlang.

https://github.com/victorparmar/zsearch Also built on LevelDB, and it looks impressive: Low data fragmentation and good random write performance by using LevelDB Log Structured Merge Trees. High performance query speed by using CompressedBitmap to store DocumentIds in an InvertedIndex interface provided by a simple libEvent2 HTTP server.

https://github.com/daviddengcn/gcse Apparently LevelDB as well.

https://github.com/eugeneware/fulltext-engine
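The zsearch description above mentions storing document IDs in a compressed bitmap. As a minimal sketch of the underlying idea, and leaving compression out entirely, each term can map to a bitmap in which bit `i` is set when document `i` contains the term, so an AND query becomes a bitwise AND. Python integers serve as the bitmaps here; `build_bitmaps`, `bitmap_to_ids`, and `search_and` are hypothetical names, and nothing below is taken from the zsearch code.

```python
from collections import defaultdict


def build_bitmaps(docs):
    """Map each term to an int whose bit i is set when document i contains the term."""
    bitmaps = defaultdict(int)
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            bitmaps[term] |= 1 << doc_id
    return bitmaps


def bitmap_to_ids(bitmap):
    """Expand a bitmap back into a sorted list of document IDs."""
    ids = []
    doc_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(doc_id)
        bitmap >>= 1
        doc_id += 1
    return ids


def search_and(bitmaps, terms):
    """Documents containing every term: bitwise AND of the term bitmaps."""
    if not terms:
        return []
    result = bitmaps.get(terms[0], 0)
    for term in terms[1:]:
        result &= bitmaps.get(term, 0)
    return bitmap_to_ids(result)


docs = {
    0: "low data fragmentation",
    1: "good random write performance",
    2: "high performance query speed",
    3: "random query performance",
}
bitmaps = build_bitmaps(docs)
print(search_and(bitmaps, ["performance", "query"]))
print(search_and(bitmaps, ["random", "performance"]))
```

An uncompressed bitmap costs one bit per document for every term, which is wasteful for rare terms; that is presumably why an engine would reach for a compressed representation.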

Using MongoDB to store the inverted index:

https://github.com/c-bata/gosearch Essentially relies on addToSet underneath, as this file makes clear: https://github.com/c-bata/gosearch/blob/master/models/index.go

https://github.com/hemslo/poky-engine A simple search engine in python using Tornado, Scrapy, Redis and MongoDB
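The addToSet approach noted for gosearch can be pictured as one document per term holding an array of doc IDs, where MongoDB's `$addToSet` operator appends an ID only if it is not already in the array. The sketch below imitates that behaviour with a plain dict instead of a real collection; the field names `term` and `postings` and the helper names are assumptions made for illustration, not gosearch's actual schema.

```python
def add_to_set(collection, term, doc_id):
    """Mimic an upsert with $addToSet: append doc_id to the term's array once."""
    entry = collection.setdefault(term, {"term": term, "postings": []})
    if doc_id not in entry["postings"]:
        entry["postings"].append(doc_id)


def index_doc(collection, doc_id, text):
    """Add doc_id to the posting array of every term in the text."""
    for term in text.lower().split():
        add_to_set(collection, term, doc_id)


# A plain dict stands in for a MongoDB collection keyed by term.
index = {}
index_doc(index, 1, "go search engine in go")
index_doc(index, 2, "search engine with mongodb")
index_doc(index, 1, "go search engine in go")  # re-indexing adds no duplicates

print(index["go"]["postings"])
print(index["search"]["postings"])
```

Against a real server, the equivalent pymongo call would be along the lines of `collection.update_one({"term": term}, {"$addToSet": {"postings": doc_id}}, upsert=True)`, again with hypothetical field names.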

From Qihoo 360:

A search engine which can hold 100 trillion lines of log data.

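It uses HDFS for storage and MapReduce for parallelism, and claims to target log search. Its metadata is kept in Redis (NoSQL) and its inverted index in HBase, so it is essentially a column store. Still to be read in detail.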

https://github.com/Qihoo360/poseidon

From LinkedIn:

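https://github.com/linkedin/indextank-engine Fairly powerful; supports facets and more, and builds indexes both in memory and in files. Worth studying properly when time allows; the underlying files probably support compression.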

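https://github.com/gigablast/open-source-search-engine The search engine behind http://www.gigablast.com/. The code is written in C++, though it looks a little messy, and it also supports persisting the index to a database.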

Distributed:

https://github.com/izenecloud/sf1r-lite Uses nginx for load balancing; the underlying inverted index appears to be built with Hadoop. Architecture diagram: https://raw.githubusercontent.com/izenecloud/sf1r/master/docs/source/images/sf1r.png

Worth a look:

https://github.com/aaalgo/donkey

https://github.com/groonga/groonga Groonga is an open-source fulltext search engine and column store. It lets you write high-performance applications that require fulltext search. A column store?

Ruby:

https://github.com/mrkamel/search_cop Backed by an SQL database; supports SQL statements plus full-text search. Search engine like fulltext query support for ActiveRecord

Using Lucene for full-text search:

CrateDB: The fast, scalable, easy to use SQL database with native full text search https://crate.io

http://www.opensearchserver.com/

yacy

The search engine behind the Go Search site (http://go-search.org/search?q=hello):

https://github.com/daviddengcn/gcse Uses https://github.com/daviddengcn/go-index for indexing; the latter is worth studying when time allows.

Using raw files for the inverted index:

https://github.com/bradleypeabody/fulltext Pure-Go full text indexer and search library

https://github.com/dchest/static-search Searches local files.

https://github.com/getwe/cse Uses https://github.com/getwe/goose, which is essentially an inverted index on raw files, written by a Baidu engineer: http://getwe.cn/%E6%8A%80%E6%9C%AF/%E6%90%9C%E7%B4%A2%E5%BC%95%E6%93%8E/goose/database-diskindex/
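To illustrate what an inverted index on raw files can look like, here is a minimal sketch under my own assumptions: posting lists are written back to back into one flat binary file, and a small in-memory dictionary records the byte offset and length of each term's list. It is not the format used by fulltext, static-search, or goose; `build_index`, `lookup`, and the file name `postings.bin` are hypothetical.

```python
import os
import struct
import tempfile
from collections import defaultdict


def build_index(docs, path):
    """Write postings to a flat file; return a dictionary term -> (offset, count)."""
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):
            postings[term].append(doc_id)

    dictionary = {}
    with open(path, "wb") as f:
        for term in sorted(postings):
            ids = postings[term]
            dictionary[term] = (f.tell(), len(ids))
            # Each doc ID is stored as a 4-byte little-endian unsigned int.
            f.write(struct.pack(f"<{len(ids)}I", *ids))
    return dictionary


def lookup(dictionary, path, term):
    """Seek to the term's postings in the file and decode them."""
    if term not in dictionary:
        return []
    offset, count = dictionary[term]
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(4 * count)
    return list(struct.unpack(f"<{count}I", data))


docs = {
    1: "pure go full text indexer",
    2: "full text search on local files",
    3: "disk index for a search engine",
}
path = os.path.join(tempfile.mkdtemp(), "postings.bin")
dictionary = build_index(docs, path)

print(lookup(dictionary, path, "full"))
print(lookup(dictionary, path, "search"))
```

A lookup costs one seek and one contiguous read, which is the main appeal of this layout. Updates are the weak point, since inserting a doc ID into an existing list means rewriting the file or adding a separate segment.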

Using Redis for the inverted index:

https://github.com/hymloth/pyredise/

Go bindings for Apache Lucy (a C port of Lucene):

https://github.com/philipsoutham/golucy

https://github.com/ipfs-search/ipfs-search Uses Elasticsearch 5 (ES5) for search.

Using SQLite to store the inverted index:

https://github.com/gansidui/gose
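Storing an inverted index in SQLite can be as simple as one row per (term, document) pair. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name `postings`, its columns, and the helper functions are assumptions for illustration and are not taken from gose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE postings ("
    " term TEXT NOT NULL,"
    " doc_id INTEGER NOT NULL,"
    " PRIMARY KEY (term, doc_id)"
    ")"
)


def index_doc(conn, doc_id, text):
    """Insert one row per distinct term in the document."""
    rows = [(term, doc_id) for term in set(text.lower().split())]
    # INSERT OR IGNORE keeps re-indexing the same document idempotent.
    conn.executemany("INSERT OR IGNORE INTO postings (term, doc_id) VALUES (?, ?)", rows)


def search_and(conn, terms):
    """Return the doc IDs that contain every one of the given terms."""
    unique = sorted(set(terms))
    if not unique:
        return []
    placeholders = ",".join("?" for _ in unique)
    sql = (
        f"SELECT doc_id FROM postings WHERE term IN ({placeholders}) "
        "GROUP BY doc_id HAVING COUNT(DISTINCT term) = ? ORDER BY doc_id"
    )
    return [row[0] for row in conn.execute(sql, unique + [len(unique)])]


index_doc(conn, 1, "sqlite stores the inverted index")
index_doc(conn, 2, "an inverted index in a sql table")
index_doc(conn, 3, "sqlite is a sql database")

print(search_and(conn, ["inverted", "index"]))
print(search_and(conn, ["sqlite", "sql"]))
```

The composite primary key on `(term, doc_id)` doubles as the index that makes term lookups fast, so the B-tree underneath ends up ordered much like the key layout of a KV store.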

Internals not yet understood:

https://github.com/gigablast/open-source-search-engine

https://github.com/sourcegraph/thesrc Source code search, but I have not yet worked out which search engine it uses.

https://github.com/rsesek/usda-ndb Searches food composition data.

https://github.com/devict/magopie Searches BitTorrent torrents.

https://github.com/yieldbot/ferret Ferret is a search engine that unifies search results from Github, Slack, Trello and more

https://github.com/ndmitchell/hoogle Written in Haskell.

https://github.com/BitFunnel/BitFunnel

https://github.com/Maxime2/dataparksearch

Others:

https://github.com/KunBetter/GridSearch real-time grid search engine. I do not know how it works.

https://github.com/kanatohodets/carbonsearch search engine for graphite metrics

https://github.com/beniz/seeks P2P web search; it appears to keep its data directly in an in-memory list.

https://github.com/carrot2/carrot2 May use ES or Solr.

https://github.com/reyesr/fullproof A JS search engine that stores the inverted index in WebSQL or another browser database.

https://github.com/nolanlawson/pouchdb-quick-search Offline search on top of a small database, for example in PhoneGap or other apps.

https://github.com/legendary001/SearchEngine A search engine built on Hadoop + Lucene.

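In my view, though, a search engine is essentially a column store keyed by the specific search words of each field. A fractal tree structure like the one in TokuDB should therefore be a better fit for the underlying implementation, and for log search a time-series store is more suitable.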

Reposted from: https://www.cnblogs.com/bonelee/p/6265853.html
