Solr is Searching On Lucene w/Replication

Specifically, Solr is a scalable, ready-to-deploy enterprise search engine that’s optimized to search large volumes of text-centric data and return results sorted by relevance.

  • Scalable—Solr scales by distributing work (indexing and query processing) to multiple servers in a cluster.
  • Ready to deploy—Solr is open source, is easy to install and configure, and provides a preconfigured example to help you get started.
  • Optimized for search—Solr is fast and can execute complex queries in subsecond speed, often only tens of milliseconds.
  • Large volumes of documents—Solr is designed to deal with indexes containing many millions of documents.
  • Text-centric—Solr is optimized for searching natural-language text, like emails, web pages, resumes, PDF documents, and social messages such as tweets or blogs.
  • Results sorted by relevance—Solr returns documents in ranked order based on how relevant each document is to the user’s query.

Search engines like Solr are optimized to handle data exhibiting four main characteristics:

  • Text-centric
  • Read-dominant
  • Document-oriented
  • Flexible schema

You also want to consider which fields in your documents must be stored in Solr and which should be stored in another system, such as a database. A search engine isn’t the place to store data unless it’s useful for search or displaying results.

Building a web-scale inverted index

It might surprise you that search engines like Google also use an inverted index for searching the web. In fact, the need to build a web-scale inverted index led to the invention of MapReduce.

MapReduce is a programming model that distributes large-scale data-processing operations across a cluster of commodity servers by expressing an algorithm as two phases: map and reduce. With its roots in functional programming, MapReduce was adapted by Google for building its massive inverted index to power web search.

Using MapReduce, the map phase emits a pair for each term occurrence: the term and the ID of the document where it occurs. In the reduce phase, the pairs are sorted by term so that all term/docID pairs for a given unique term are sent to the same reducer process. The reducer then sums the term frequencies for each term to generate the inverted index.
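
The two phases described above can be sketched in a few lines of single-process Python. This is only an illustration of the idea, not how Google or Solr actually build their indexes: the function names (map_phase, shuffle, reduce_phase, build_inverted_index), the whitespace tokenizer, and the sample documents are all assumptions made for the example, and the grouping step that a real MapReduce framework performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair for every term occurrence in a document."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Group pairs by term, standing in for the framework's sort/shuffle step."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Sum the occurrences per document to build the postings for one term."""
    freqs = defaultdict(int)
    for doc_id in doc_ids:
        freqs[doc_id] += 1
    return term, dict(sorted(freqs.items()))


def build_inverted_index(docs):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = [
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    ]
    grouped = shuffle(pairs)
    return dict(reduce_phase(term, ids) for term, ids in sorted(grouped.items()))


# Hypothetical sample documents, keyed by document ID.
docs = {1: "solr is fast", 2: "solr scales and solr ranks"}
index = build_inverted_index(docs)
print(index["solr"])  # term frequency per document ID
```

Each entry of the resulting index maps a term to the documents that contain it, together with the term frequency in each document, which is the information a search engine needs to look up matching documents by term.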

------------------------------------------------------------------------------------------------------------------------------------

Diagram of the main components of Solr 4



 
