1.基本特性

一个分布式的实时文档存储库，每个字段都可以被索引与搜索；

一个分布式的实时分析搜索引擎；

能胜任上百个服务节点的扩展，并支持PB级别的结构化或者非结构化数据。

2.相关概念

近实时（Near Real Time，NRT）：

Elasticsearch是一个近实时的数据搜索和分析平台，这意味着从索引文档到可搜索文档都会有一段微小的延迟（通常是1s以内）。

集群（Cluster）：

集群是一个或多个节点的集合，这些节点将共同拥有完整的数据，并跨节点提供联合索引、搜索和分析功能。

集群由唯一的名称标识（elasticsearch.yml配置文件中的参数cluster.name，默认名称为：Elasticsearch）。可以存在多独立的集群，但每个集群必须都有自己唯一的集群名称！

节点（Node）：

节点是一个Elasticsearch的运行实例，也就是一个进程（process）。数据存储在节点，多个节点组成集群。节点还参与集群的索引、搜索和分析功能。

与集群一样，节点由一个名称标识。默认情况下，节点名称是启动时分配给节点的随机通用唯一标识符（UUID）。如果不想用默认值，可以自定义节点名称（谨记配置有意义的节点名称）。

可以通过集群名称将节点配置为加入特定集群。默认情况下，每个节点都被设置为加入一个名为Elasticsearch的集群（这意味着，如果在网络上启动了多个节点，并且它们可以发现彼此的话，那么它们将自动形成并加入一个名为Elasticsearch的集群）。

索引（index）：

索引是Elasticsearch数据管理的顶层单位，是具有某种相似特性的文档集合（相当于MySQL、MongoDB等中的“数据库”这个概念）。

索引由一个名称（必须全部小写）标识，当对其中的文档执行索引、搜索、更新和删除操作时，该名称指向这个特定的索引。

在单个集群中，可以定义任意多个索引。

类型（Type）：

这个概念在7.0版本以后已被彻底移除。

文档（document）：

文档是可以被索引的基本信息单元，并以JSON表示（JSON是一种普遍存在的Internet数据交换格式）。在单个索引中，理论上可以存储任意多的文档。

字段（Fields）：

每个文档中都包含许多字段，每个字段都有其对应的值。

3.准备工作

【必须】配置Elasticsearch：https://www.elastic.co/cn/downloads/elasticsearch

【必须】安装Python库（对接Elasticsearch操作）：pip install elasticsearch

【建议】配置kibana（配套的可视化管理工具）：https://www.elastic.co/cn/start

【可选】安装IK Analysis（中文分词插件）：https://github.com/medcl/elasticsearch-analysis-ik

注意三个工具的版本要保持一致！！

配置Elasticsearch：

1.下载并解压zip文件；

2.点击.\bin\elasticsearch.bat启动（在打开的命令窗口点击Ctrl+C关闭Elasticsearch）；

3.在CMD中运行：curl http://localhost:9200

或在PowerShell中运行：Invoke-RestMethod http://localhost:9200

或在浏览器中打开：http://localhost:9200

配置kibana：

1.下载并解压zip文件；

2.编辑.\config\kibana.yml；

3.点击.\bin\kibana.bat启动（在打开的命令窗口点击Ctrl+C关闭kibana）；

4.在浏览器中打开：http://localhost:5601

安装IK Analysis

在.\bin打开命令窗口，输入：

.\elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.16.3/elasticsearch-analysis-ik-7.16.3.zip

最新版本地址：https://github.com/medcl/elasticsearch-analysis-ik/releases

确保版本号和Elasticsearch一致！！

4.创建索引

from elasticsearch import Elasticsearches = Elasticsearch()
# 声明一个Elasticsearch对象es
result = es.indices.create(index='it_news')
# 调用es的indices对象的create方法，传入index的名称“it_news”
print(result)

{'acknowledged': True, 'shards_acknowledged': True, 'index': 'it_news'}

返回结果是JSON格式，其中的acknowledged字段表示创建操作执行成功。

如果把代码再执行一遍，会报错：

raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'resource_already_exists_exception', 'index [it_news/VQIJEDvjRzaqknxEHdQ6lA] already exists')

所以一般会加入ignore参数，把一些意外情况排除，来保证程序正常执行而不会中断。

from elasticsearch import Elasticsearches = Elasticsearch()
# 声明一个Elasticsearch对象es
result = es.indices.create(index='it_news',ignore=400)
# 调用es的indices对象的create方法传入index的名称“it_news”
print(result)

此时把代码再执行一遍，返回的结果同样是JSON格式，这表明程序没有中断（这是因为添加的ignore参数忽略了400状态码，因此程序正常执行输出JSON结果，而不是抛出异常。）。

{'error': {'root_cause': [{'type': 'resource_already_exists_exception', 'reason': 'index [it_news/VQIJEDvjRzaqknxEHdQ6lA] already exists', 'index_uuid': 'VQIJEDvjRzaqknxEHdQ6lA', 'index': 'it_news'}], 'type': 'resource_already_exists_exception', 'reason': 'index [it_news/VQIJEDvjRzaqknxEHdQ6lA] already exists', 'index_uuid': 'VQIJEDvjRzaqknxEHdQ6lA', 'index': 'it_news'}, 'status': 400}

创建完之后，还可以设置一下索引的字段映射定义（内容待补充）。

5.删除索引

from elasticsearch import Elasticsearches = Elasticsearch()
# 声明一个Elasticsearch对象es
result = es.indices.delete(index='it_news',ignore=[400, 404])
# 调用es的indices对象的delete方法删除名为“it_news”的index
print(result)

返回结果：

{'acknowledged': True}

如果再执行一遍，返回结果：

{'error': {'root_cause': [{'type': 'index_not_found_exception', 'reason': 'no such index [it_news]', 'resource.type': 'index_or_alias', 'resource.id': 'it_news', 'index_uuid': '_na_', 'index': 'it_news'}], 'type': 'index_not_found_exception', 'reason': 'no such index [it_news]', 'resource.type': 'index_or_alias', 'resource.id': 'it_news', 'index_uuid': '_na_', 'index': 'it_news'}, 'status': 404}

由于delete方法里面也使用了ignore参数，所以即使删除失败，也只是会返回一个JSON格式，而不会导致程序中断。

6.插入数据

用create方法：

Elasticsearch跟MongoDB一样，在插入数据的时候可以直接插入结构化字典数据，插入数据可以调用create方法或者index方法（create方法内部其实也是调用了index方法，是对index方法的封装）。

from elasticsearch import Elasticsearches = Elasticsearch()
# 声明一个Elasticsearch对象es
es.indices.create(index='it_news',ignore=400)
# 调用es的indices对象的create创建名为“it_news”的index（已创建就忽略）data = {'title': '微软大手笔：Xbox 宣布 687 亿美元收购动视暴雪为打造“元宇宙”添砖加瓦，后者盘前大涨 37%','url': 'https://www.ithome.com/0/599/223.htm'}
# 声明一条新闻数据（包含标题和链接）
result = es.create(index='it_news', id='1', document=data)
# 调用es的create方法插入，在调用时传入3个参数：
# index代表索引名称，id是数据的唯一标识，document代表文档的具体内容
print(result)

返回的同样是JSON格式，其中result字段为created，代表数据插入成功。

{'_index': 'it_news', '_type': '_doc', '_id': '1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 0, '_primary_term': 1}

用index方法：

另外也可以用index方法插入数据（区别在于index方法不需要指定id，会自动生成）

from elasticsearch import Elasticsearches = Elasticsearch()
# 声明一个Elasticsearch对象es
es.indices.create(index='it_news', ignore=400)
# 调用es的indices对象的create创建名为“it_news”的index（已创建就忽略）data = {'title': '三星 Exynos 2200 GPU 跑分首次出炉：AMD RDNA2 加持，紧追骁龙 8 Gen 1','url': 'https://www.ithome.com/0/599/261.htm'}
# 声明一条新闻数据（包含标题和链接）
result = es.index(index='it_news', document=data)
# 调用es的create方法插入，在调用时传入2个参数：
# index代表索引名称，document代表文档的具体内容
print(result)

{'_index': 'it_news', '_type': '_doc', '_id': 'QDE_cX4Bw_xw4x789mmU', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 1, '_primary_term': 1}

在返回信息里面可以看到，data已经插入成功，并自动生成了id。

7.更新数据

用update方法（局部更新）：

效率更高！可以看作是先删除后index的原子操作，省略了返回的过程，节省了来回传输的网络流量，同时避免了中间时间造成的文档修改冲突。（注意在body的参数中，需要添加doc或者script变量来指定修改的内容）

<此处参考：这4种Python更新Elasticsearch数据的方法你都会吗？>

from elasticsearch import Elasticsearches = Elasticsearch()
# 声明一个Elasticsearch对象esdata = {'doc': {'title': '微软大手笔：Xbox 宣布 687 亿美元收购动视暴雪为打造“元宇宙”添砖加瓦，后者盘前大涨 37%','url': 'https://www.ithome.com/0/599/223.htm','data': '2022-01-19'}
}
# 声明一条新闻数据（在原本基础上，新增日期）
result = es.update(index='it_news', id='1', body=data)
# 调用es的update方法插入，在调用时传入3个参数：
# index代表索引名称，id表示ID标识，body代表文档的具体内容
# 注意在body的参数中，需要添加doc或者script变量来指定修改的内容
print(result)

{'_index': 'it_news', '_type': '_doc', '_id': '2', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 4, '_primary_term': 1}

用index方法（全局更新）：

兼具插入与更新功能，更加便捷！本质上是进行了一次reindex的操作，所以当数据量或者document很庞大的时候，效率非常的低。

from elasticsearch import Elasticsearches = Elasticsearch()
# 声明一个Elasticsearch对象esdata = {'title': '微软大手笔：Xbox 宣布 687 亿美元收购动视暴雪为打造“元宇宙”添砖加瓦，后者盘前大涨 37%','url': 'https://www.ithome.com/0/599/223.htm','date': '2022-01-19'
}
# 声明一条新闻数据（在原本基础上，新增日期）
result = es.index(index='it_news', id='1', document=data)
# 调用es的index方法插入，在调用时传入3个参数：
# index代表索引名称，id表示ID标识，document代表文档的具体内容
print(result)

{'_index': 'it_news', '_type': '_doc', '_id': '1', '_version': 2, 'result': 'updated', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 2, '_primary_term': 1}

这里同样使用index方法进行更新操作。（如果数据不存在就执行插入操作，如果数据存在就执行更新操作，非常方便）在返回结果中，可以看到result字段为updated，表示更新成功。version字段为2，表示这是第二个版本（对比一下前面插入操作返回的信息，当时是1，每次更新后都会加1）。

8.删除数据

from elasticsearch import Elasticsearches = Elasticsearch()
# 声明一个Elasticsearch对象esresult = es.delete(index='it_news', id='1')
# 调用es的delete方法删除指定数据，在调用时传入3个参数：
# index代表索引名称，id表示ID标识
print(result)

{'_index': 'it_news', '_type': '_doc', '_id': '1', '_version': 3, 'result': 'deleted', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 3, '_primary_term': 1}

返回结果同样是JSON格式，result字段的deleted表示删除成功，此时version字段的值又加了1变成3。

9.查询数据

对于中文语境下的查询操作，必须先安装分词插件，这里用的是前面提及的IK Analysis。

初始化测试环境1（删除之前的索引，新建索引并更新索引的mapping信息）

from elasticsearch import Elasticsearches = Elasticsearch()
es.indices.delete(index='it_news', ignore=[400, 404])
# 删除之前创建的索引
es.indices.create(index='it_news', ignore=400)
# 新建一个it_news索引
mapping = {# 声明一个mapping信息'properties': {'title': {# 指定分词字段为“title”'type': 'text',# 指定字段类型为text'analyzer': 'ik_max_word',# 指定分词器为IK（如果不指定，则会使用默认的英文分词器）'search_analyzer': 'ik_max_word'# 指定搜索分词器为IK}}
}
result = es.indices.put_mapping(index='it_news', body=mapping)
# 更新索引的mapping信息
print(result)

{'acknowledged': True}

初始化测试环境2（导入四条document）：

from elasticsearch import Elasticsearches = Elasticsearch()
datas = [{'title': '微软大手笔：Xbox 宣布 687 亿美元收购动视暴雪为打造“元宇宙”添砖加瓦，后者盘前大涨 37%','url': 'https://www.ithome.com/0/599/223.htm',},{'title': '三星 Exynos 2200 GPU 跑分首次出炉：AMD RDNA2 加持，紧追骁龙 8 Gen 1','url': 'https://www.ithome.com/0/599/261.htm',},{'title': '三星晶圆代工客户超过 100 家，目标 2030 年超过台积电','url': 'https://www.ithome.com/0/599/403.htm',},{'title': '微软收购动视暴雪，游戏圈“大地震”：索尼在日本股市跌近 10%，创 2020 年 3 月以来最大跌幅','url': 'https://www.ithome.com/0/599/265.htm',},
]for data in datas:es.index(index='it_news', document=data)

基于索引名进行搜索查询：

from elasticsearch import Elasticsearches = Elasticsearch()result = es.search(index='it_news')
print(result)

{'took': 1020, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 4, 'relation': 'eq'}, 'max_score': 1.0, 'hits': [{'_index': 'it_news', '_type': '_doc', '_id': 'QTFKcn4Bw_xw4x78kmlW', '_score': 1.0, '_source': {'title': '微软大手笔：Xbox 宣布 687 亿美元收购动视暴雪为打造“元宇宙”添砖加瓦，后者盘前大涨 37%', 'url': 'https://www.ithome.com/0/599/223.htm'}}, {'_index': 'it_news', '_type': '_doc', '_id': 'QjFKcn4Bw_xw4x78k2k3', '_score': 1.0, '_source': {'title': '三星 Exynos 2200 GPU 跑分首次出炉：AMD RDNA2 加持，紧追骁龙 8 Gen 1', 'url': 'https://www.ithome.com/0/599/261.htm'}}, {'_index': 'it_news', '_type': '_doc', '_id': 'QzFKcn4Bw_xw4x78k2lt', '_score': 1.0, '_source': {'title': '三星晶圆代工客户超过 100 家，目标 2030 年超过台积电', 'url': 'https://www.ithome.com/0/599/403.htm'}}, {'_index': 'it_news', '_type': '_doc', '_id': 'RDFKcn4Bw_xw4x78k2mg', '_score': 1.0, '_source': {'title': '微软收购动视暴雪，游戏圈“大地震”：索尼在日本股市跌近 10%，创 2020 年 3 月以来最大跌幅', 'url': 'https://www.ithome.com/0/599/265.htm'}}]}}

▲ 这里查询出来了4条数据（在hits字段里面），其中total字段标明了查询结果的条目数，max_score代表了匹配结果中的最大分数。

基于全文信息进行检索：

from elasticsearch import Elasticsearches = Elasticsearch()
query = {'match': {'title': '微软 索尼'}}
result = es.search(index='it_news', query=query)
print(result)

这里使用Elasticsearch支持的Query语句进行查询，使用match指定全文检索，检索的字段是title，内容是“微软索尼”，搜索结果如下：

{'took': 2, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 2, 'relation': 'eq'}, 'max_score': 1.7566946, 'hits': [{'_index': 'it_news', '_type': '_doc', '_id': 'RDFKcn4Bw_xw4x78k2mg', '_score': 1.7566946, '_source': {'title': '微软收购动视暴雪，游戏圈“大地震”：索尼在日本股市跌近 10%，创 2020 年 3 月以来最大跌幅', 'url': 'https://www.ithome.com/0/599/265.htm'}}, {'_index': 'it_news', '_type': '_doc', '_id': 'QTFKcn4Bw_xw4x78kmlW', '_score': 0.67722493, '_source': {'title': '微软大手笔：Xbox 宣布 687 亿美元收购动视暴雪为打造“元宇宙”添砖加瓦，后者盘前大涨 37%', 'url': 'https://www.ithome.com/0/599/223.htm'}}]}}

▲返回了两条结果，第一条分数为1.7566946，第二条分数为0.67722493，这是因为第一条匹配到了两个关键词（微软+索尼），第二条只匹配到了一个关键词（微软）。可以看到，检索时会对对应字段进行全文检索，并且还会根据检索关键词的相关性进行排序——这其实就是一个基本的搜索引擎的雏形了。

补充一些文字：

ElasticSearch 是一个分布式的，高性能，高可用的，可伸缩的搜索和分析系统

（1）可以作为大型分布式集群（数百台服务器）技术，处理 PB 级的数据，服务大公司；也可以运行在单机上服务于小公司

（2）Elasticsearch 不是什么新技术，主要是将全文检索、数据分析以及分布式技术，合并在了一起，才形成了独一无二的 ES：lucene (全文检索)，商用的数据分析软件，分布式数据库

（3）对用户而言，是开箱即用的，非常简单，作为中小型应用，直接 3 分钟部署一下 ES，就可以作为生产环境的系统来使用了，此时的场景是数据量不大，操作不是太复杂

（4）数据库的功能面对很多领域是不够用的（事务，还有各种联机事务型的操作）；

特殊的功能，比如全文检索，同义词处理，相关度排名，复杂数据分析，海量数据的近实时处理，Elasticsearch 作为传统数据库的一个补充，提供了数据库所不能提供的很多功能。

<完>

知识补充2：Elasticsearch的基本使用（Windows+Python）相关推荐

谷粒商城之高级篇知识补充
谷粒商城高级篇之知识补充前言本篇主要是完成谷粒商城高级篇开发时,我们需要了解并学习一部分补充的知识,才能更好的完成商城业务. 以后我们将商城任务和额外知识分开来编写,方便商城业务的连贯性. 下面是 ...
{电脑救助站}常用知识1 来自常用知识( 网络安全中起重大作用的 Windows 命令)
常用知识( 网络安全中起重大作用的 Windows 命令) 常用在线查毒链接 3 楼超级兔子的在线检测功能 4楼准确探索间谍软件或广告软件 5楼预防间谍软 ...
Android知识补充(Android学习笔记)
Android知识补充 ●国际化所谓的国际化,就是指软件在开发时就应该具备支持多种语言和地区的功能,也就是说开发的软件能同时应对不同国家和地区的用户访问,并针对不同国家和地区的用户,提供相应的.符合 ...
【文件上传漏洞-01】文件上传漏洞概述、防御以及WebShell基础知识补充
目录 1 文件上传漏洞概述 2 文件上传漏洞防御.绕过.利用 2.1 黑白名单策略 3 WebShell基础知识补充 3.1 WebShell概述 3.2 大马与小马 1 文件上传漏洞概述概述:文件 ...
Python学习---django知识补充之CBV
Django知识补充之CBV Django: url --> def函数 FBV[function based view] 用函数和URL进行匹配 url --> ...
ASP分页要用到的属性ADO CursorLocation知识补充：
ASP分页要用到的属性ADO CursorLocation知识补充: 定义和用法 CursorLocation 属性可设置或返回一个 long 值,该值指示游标服务的位置.可被设置为 CursorLo ...
Java爬虫系列(五) - 爬虫补充内容+ElasticSearch展示数据
Java爬虫内容补充和ElasticSearch 网页去重之前我们对下载的url地址进行了去重操作,避免同样的url下载多次.其实不光url需要去重,我们对下载的内容也需要去重. 在网上我们可以找到 ...
前端知识补充说明（一）之HTTP/浏览器等基础原理部分
前端知识补充说明(一)之HTTP/浏览器等基础原理部分 1. 说一下http和https https的SSL加密是在传输层实现的.(1)http和https的基本概念http: 超文本传输协议,是互联 ...
疯狂的大柚柚带你玩转MSP-ESP430G2(基础篇) -----（六）基础知识补充
疯狂的大柚柚带你玩转MSP-ESP430G2(基础篇) (六)基础知识补充常用数据类型数据类型字节数值范围 Char 1 -128~127 Unsigned char 1 0`255 shor ...

知识补充2：Elasticsearch的基本使用（Windows+Python）