https://github.com/Stratio/cassandra-lucene-index

Stratio’s Cassandra Lucene Index

Stratio’s Cassandra Lucene Index, derived from Stratio Cassandra, is a plugin for Apache Cassandra that extends its index functionality to provide near real-time search similar to Elasticsearch or Solr, including full-text search capabilities and free multivariable, geospatial and bitemporal search. This is achieved through an Apache Lucene based implementation of Cassandra secondary indexes, where each node of the cluster indexes its own data. Stratio’s Cassandra indexes are one of the core modules on which Stratio’s BigData platform is based.

Index relevance searches allow you to retrieve the n most relevant results satisfying a search. The coordinator node sends the search to each node in the cluster, each node returns its n best results, and then the coordinator combines these partial results and returns the n best of them, avoiding a full scan. You can also base the sorting on a combination of fields.
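
The scatter-gather strategy described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of the idea, not the plugin's actual code: each hit is a made-up `(score, row_key)` tuple, `local_top_n` plays the role of a node and `coordinator_top_n` plays the role of the coordinator. The approach is correct because every row in the global top n must also be in the top n of the node that stores it, so the coordinator never needs more than n rows per node.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical hit representation: (score, row_key); higher score = more relevant.
Hit = Tuple[float, str]


def local_top_n(hits: Iterable[Hit], n: int) -> List[Hit]:
    """Per-node step: keep only this node's n best hits."""
    return heapq.nlargest(n, hits)


def coordinator_top_n(partials: Iterable[List[Hit]], n: int) -> List[Hit]:
    """Coordinator step: combine the per-node partial results and keep
    the n best overall, without reading any other rows."""
    merged = (hit for partial in partials for hit in partial)
    return heapq.nlargest(n, merged)


if __name__ == "__main__":
    # Hypothetical scored rows held by three nodes.
    nodes = [
        [(0.9, "a"), (0.2, "b"), (0.7, "c")],
        [(0.8, "d"), (0.1, "e")],
        [(0.95, "f"), (0.3, "g"), (0.6, "h")],
    ]
    n = 3
    partials = [local_top_n(rows, n) for rows in nodes]
    print(coordinator_top_n(partials, n))
```

Sorting on a combination of fields fits the same scheme if the score is replaced by a tuple of field values, since Python compares tuples element by element.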

Any cell in the tables can be indexed, including those in the primary key as well as collections. Wide rows are also supported. You can scan token/key ranges, apply additional CQL3 clauses and page through the filtered results.
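
Scanning token ranges is also a common way to split a large table into independent chunks, for example so that parallel workers each read one slice. The sketch below assumes the Murmur3 partitioner, whose tokens are signed 64-bit integers, divides the whole ring into contiguous ranges of equal width, and builds one CQL query per range. The table and key names are hypothetical placeholders; a real job would more likely align its splits with the ranges each node actually owns.

```python
from typing import List, Tuple

# Murmur3 partitioner tokens are signed 64-bit integers.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(splits: int) -> List[Tuple[int, int]]:
    """Split the full token ring into `splits` contiguous, non-overlapping
    (start, end] ranges of (almost) equal width."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // splits for i in range(splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_queries(table: str, key: str, splits: int) -> List[str]:
    """Build one CQL range query per split. `table` and `key` are
    hypothetical names used only for illustration."""
    return [
        f"SELECT * FROM {table} WHERE token({key}) > {lo} AND token({key}) <= {hi}"
        for lo, hi in split_token_range(splits)
    ]


if __name__ == "__main__":
    for query in range_queries("demo.tweets", "id", 4):
        print(query)
```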

Index filtered searches are a powerful help when analyzing the data stored in Cassandra with MapReduce frameworks such as Apache Hadoop or, even better, Apache Spark. Adding Lucene filters to the jobs' input can dramatically reduce the amount of data to be processed, avoiding a full scan.

The following benchmark results give an idea of the expected performance when combining Lucene indexes with Spark. We run successive queries requesting from 1% to 100% of the stored data. The index performs very well for queries requesting strongly filtered data, but its performance decays for less restrictive queries. As the number of records returned by the query increases, we reach a point where the index becomes slower than a full scan. So the decision to use indexes in your Spark jobs depends on the query selectivity, and the trade-off between both approaches depends on the particular use case. Generally, combining Lucene indexes with Spark is recommended for jobs retrieving no more than 25% of the stored data.
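
The rule of thumb above can be written down as a simple decision helper. This is only a sketch: the 25% default comes from the benchmark guideline, `matching_rows` is assumed to be an estimate you already have (for example from a sample of the data), and the best threshold for your own workload may differ.

```python
def use_lucene_index(matching_rows: int, total_rows: int,
                     threshold: float = 0.25) -> bool:
    """Return True if the query is selective enough that reading through
    the Lucene index is expected to beat a full scan."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    # Selectivity: fraction of the stored data the query returns.
    selectivity = matching_rows / total_rows
    return selectivity <= threshold


if __name__ == "__main__":
    print(use_lucene_index(10_000, 1_000_000))   # 1% of the data
    print(use_lucene_index(600_000, 1_000_000))  # 60% of the data
```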

This project is not intended to replace Apache Cassandra denormalized tables, inverted indexes, and/or secondary indexes. It is just a tool to perform some kinds of queries that are really hard to address using Apache Cassandra's out-of-the-box features, filling the gap between real-time and analytics.

More detailed information is available at Stratio’s Cassandra Lucene Index documentation.

Features

Stratio’s Cassandra Lucene Index and its integration with Lucene search technology provide:

  • Full text search (language-aware analysis, wildcard, fuzzy, regexp)
  • Boolean search (and, or, not)
  • Sorting by relevance, column value, and distance
  • Geospatial indexing (points, lines, polygons and their multiparts)
  • Geospatial transformations (bounding box, buffer, centroid, convex hull, union, difference, intersection)
  • Geospatial operations (intersects, contains, is within)
  • Bitemporal search (valid and transaction time durations)
  • CQL complex types (list, set, map, tuple and UDT)
  • CQL user defined functions (UDF)
  • CQL paging, even with sorted searches
  • Columns with TTL
  • Compatibility with third-party CQL-based drivers
  • Spark and Hadoop compatibility
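
Bitemporal search deals with two independent time dimensions: valid time (when a fact is true in the real world) and transaction time (when the database recorded it). The following Python sketch is a generic illustration of that model, not the plugin's storage format or query syntax; the `Version` record, the numeric timestamps and the half-open `[from, to)` intervals are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

INF = float("inf")  # open-ended upper bound ("until further notice")


@dataclass(frozen=True)
class Version:
    key: str
    value: str
    valid_from: float  # when the fact starts being true in the real world
    valid_to: float
    tx_from: float     # when the database started recording this version
    tx_to: float       # when the database stopped considering it current


def as_of(versions: List[Version], valid_time: float, tx_time: float) -> List[Version]:
    """Versions that were valid at `valid_time`, as the database knew
    them at `tx_time`. Both intervals are half-open [from, to)."""
    return [
        v for v in versions
        if v.valid_from <= valid_time < v.valid_to
        and v.tx_from <= tx_time < v.tx_to
    ]


if __name__ == "__main__":
    history = [
        Version("alice", "A", 0, INF, 10, 20),  # original belief, superseded at tx 20
        Version("alice", "A", 0, 15, 20, INF),  # corrected: A only until valid time 15
        Version("alice", "B", 15, INF, 20, INF),
    ]
    print([v.value for v in as_of(history, valid_time=16, tx_time=12)])
    print([v.value for v in as_of(history, valid_time=16, tx_time=25)])
```

In the example, Alice's address was recorded as A at time 10; at time 20 the database learned that she had moved to B at valid time 15, so the old version was closed and two corrected versions were recorded.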

Not yet supported:

  • Thrift API
  • Legacy compact storage option
  • Indexing counter columns
  • Indexing static columns
  • Partitioners other than Murmur3

Requirements

  • Cassandra (identified by the first three numbers of the plugin version)
  • Java >= 1.8 (OpenJDK and Sun JDKs have been tested)
  • Maven >= 3.0
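
Since the Cassandra version is given by the first three numbers of the plugin version, a build script could check compatibility before deploying the plugin. A minimal sketch, assuming plain dotted numeric versions such as the hypothetical plugin version `3.0.10.3` (which would target Cassandra 3.0.10); suffixes like `-SNAPSHOT` are deliberately rejected here.

```python
def cassandra_version(plugin_version: str) -> str:
    """Return the Cassandra version a plugin build targets, i.e. the
    first three numbers of the plugin version."""
    parts = plugin_version.split(".")
    # Assumption: at least four purely numeric components.
    if len(parts) < 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected plugin version: {plugin_version!r}")
    return ".".join(parts[:3])


def is_compatible(plugin_version: str, cassandra_release: str) -> bool:
    """True if the plugin build targets exactly this Cassandra release."""
    return cassandra_version(plugin_version) == cassandra_release


if __name__ == "__main__":
    print(cassandra_version("3.0.10.3"))
    print(is_compatible("3.0.10.3", "3.0.9"))
```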

Reposted from: https://www.cnblogs.com/bonelee/p/6757830.html
