Elasticsearch

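1 What is Elasticsearch?
2 Installing and configuring Elasticsearch
- 2.1 Installing on Ubuntu
- 2.2 ES configuration
3 Basic use of the ES RESTful API
- 3.1 Mapping
- 3.2 REST API
  - 3.2.1 Cluster information
  - 3.2.2 Nodes in the cluster
  - 3.2.3 Indices in the cluster
  - 3.2.4 Creating an index
  - 3.2.5 Deleting an index
  - 3.2.6 Creating an index's mapping
  - 3.2.7 Deleting an index's mapping
  - 3.2.9 Viewing an index's mapping
  - 3.2.10 Adding a document
  - 3.2.11 Updating a document
  - 3.2.12 Deleting a document
  - 3.2.13 Viewing a document
- 3.3 Structure of the REST API specification files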

1 What is Elasticsearch?

Elasticsearch is an open-source search engine built on Apache Lucene™.

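Near real time (NRT)
Cluster
Node - a single server in the cluster
Index - comparable to a database in Redis
Type - a specific category within an index
Document - physically stored in an index and assigned to a type
Shards & replicas - an index is split into several shards; to guard against shard failure, shards can be replicated. By default an index has 5 primary shards, each with 1 replica (5 replica shards in total).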
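The shard arithmetic above can be sketched in a few lines. This is only an illustration of the counting rule; the function name `total_shards` is made up for this example and is not part of any Elasticsearch API.

```python
def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Every primary shard exists once, plus one copy per configured replica."""
    return primaries * (1 + replicas_per_primary)


# Defaults described above: 5 primary shards, 1 replica of each.
default_total = total_shards(5, 1)
print(default_total)
```

With the defaults this counts 5 primaries and 5 replica shards, which is why a single index occupies 10 shards once every replica is allocated.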

2 Installing and configuring Elasticsearch

Prerequisite: JDK 1.7.0_55 or later.

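2.1 Installing on Ubuntu

Download the 1.5.0 release in TAR.GZ format.
Extract the archive; once it is unpacked, the installation is complete.
tar -xvf elasticsearch-1.5.0.tar.gz
Run it (in the foreground):

cd /home/elasticsearch/elasticsearch-1.5.0/bin/
chmod +x *
./elasticsearch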

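Notes:

Starting in the background (three variants):
./elasticsearch -d # run Elasticsearch in the background
./elasticsearch -d -Xmx2g -Xms2g # run in the background with the heap size set to 2 GB
./elasticsearch -d -Des.logger.level=DEBUG # run in the background and write more detailed information to the log
If startup fails because the release is incompatible with the local JDK, switching to elasticsearch-7.6.1 solves the problem. That release uses its bundled JDK and prints this warning:

warning: ignoring JAVA_HOME=C:\Java\jdk1.8.0_191; using bundled JDK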

Verification
Request http://127.0.0.1:9200. A response like the following means the installation succeeded:

{
  "status" : 200,
  "name" : "Captain Zero",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "1.5.0",
    "build_hash" : "544816042d40151d3ce4ba4f95399d7860dc2e92",
    "build_timestamp" : "2015-03-23T14:30:58Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
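A minimal sketch of how the verification response could be checked from a script. It parses a trimmed copy of the response shown above instead of calling a live server; the helper name `summarize_root_response` is invented for this example.

```python
import json

# Trimmed copy of the response shown above, used in place of a live request.
SAMPLE_RESPONSE = """
{"status": 200, "name": "Captain Zero", "cluster_name": "elasticsearch",
 "version": {"number": "1.5.0", "lucene_version": "4.10.4"},
 "tagline": "You Know, for Search"}
"""


def summarize_root_response(body: str) -> dict:
    """Pull out the fields that tell us the node is up and which version it runs."""
    info = json.loads(body)
    return {
        "ok": info.get("status") == 200,
        "cluster": info["cluster_name"],
        "version": info["version"]["number"],
    }


summary = summarize_root_response(SAMPLE_RESPONSE)
print(summary)
```

Against a running node, the body would come from an HTTP GET on http://127.0.0.1:9200 instead of the constant.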

2.2 ES configuration

The configuration file is located at: $ES_HOME/config/elasticsearch.yml

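Meaning of the configuration options

Parameter	Meaning
cluster.name: elasticsearch	Name of the Elasticsearch cluster; the default is elasticsearch. Changing it is recommended in production.
node.name: "Franz Kafka"	Node name; by default one is picked at random from config/name.txt inside the elasticsearch jar.
node.master: true	Whether the node is eligible to be elected master; default true.
node.data: true	Whether the node stores index data; default true. A node configured with node.master: false and node.data: false acts as a load balancer.
index.number_of_shards: 5	Default number of primary shards per index; default 5.
index.number_of_replicas: 1	Default number of replicas per index; default 1.
path.conf: /path/to/conf	Directory that holds the configuration files.
path.data: /path/to/data1,/path/to/data2	Directories that hold the index data.
path.work: /path/to/work	Directory for temporary files.
path.logs: /path/to/logs	Directory for log files.
path.plugins: /path/to/plugins	Directory for plugins.
bootstrap.mlockall: true	Locks the process memory. Set the ES_MIN_MEM and ES_MAX_MEM environment variables to the same value and make sure the machine has enough memory to give to ES. The elasticsearch process must also be allowed to lock memory; on Linux this can be done with ulimit -l unlimited.
network.bind_host: 192.168.0.1	IP address to bind to; default 0.0.0.0.
network.publish_host: 192.168.0.1	IP address that other nodes use to communicate with this node.
network.host: 192.168.0.1	Sets bind_host and publish_host at the same time.
transport.tcp.port: 9300	TCP port for communication between nodes; default 9300.
transport.tcp.compress: false	Whether to compress data sent over TCP; default false.
http.port: 9200	HTTP port exposed to clients; default 9200.
http.max_content_length: 100mb	Maximum size of the HTTP content; default 100mb.
http.enabled: true	Whether to serve clients over HTTP; default true.
gateway.type: local	Gateway type; the default, local, is the local file system. It can be set to the local file system, a distributed file system, Hadoop HDFS, Amazon S3, or another file system.
gateway.recover_after_nodes: 1	Start data recovery once N nodes in the cluster have started; default 1.
gateway.recover_after_time: 5m	Timeout before the initial data recovery process starts; default 5 minutes.
gateway.expected_nodes: 2	Number of nodes expected in the cluster; default 2.
cluster.routing.allocation.node_initial_primaries_recoveries: 4	Number of concurrent recovery threads during initial data recovery; default 4.
cluster.routing.allocation.node_concurrent_recoveries: 2	Number of concurrent recovery threads when nodes are added or removed or shards are rebalanced; default 2.
indices.recovery.max_size_per_sec: 0	Bandwidth limit during data recovery, for example 100mb; the default, 0, means unlimited.
indices.recovery.concurrent_streams: 5	Maximum number of concurrent streams opened when recovering data from other shards; default 5.
discovery.zen.minimum_master_nodes: 1	Minimum number N of master-eligible nodes that a node must be able to see; default 1.
discovery.zen.ping.timeout: 3s	Ping timeout when automatically discovering other nodes in the cluster; default 3 seconds.
discovery.zen.ping.multicast.enabled: true	Whether multicast discovery of nodes is enabled; default true.
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3[portX-portY]"]	Initial list of master nodes in the cluster, used to discover nodes that newly join the cluster.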
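How the node.master and node.data flags from the table combine can be sketched as a small lookup. Only the last case (both flags false, load balancer) is stated in the table; the other three labels are descriptive names chosen for this sketch, and `node_role` is a hypothetical helper, not an Elasticsearch function.

```python
def node_role(master: bool = True, data: bool = True) -> str:
    """Describe a node from its node.master / node.data settings (both default to true)."""
    if master and data:
        return "master-eligible data node"
    if master:
        return "master-eligible node without data"
    if data:
        return "data-only node"
    # node.master: false and node.data: false, the case described in the table
    return "load balancer"


roles = {
    (m, d): node_role(master=m, data=d)
    for m in (True, False)
    for d in (True, False)
}
for flags, role in roles.items():
    print(flags, "->", role)
```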

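Other settings:

threadpool:
    search:
        type: fixed
        min: 60
        max: 80
        queue_size: 1000
Configures the thread pool that the ES server uses to run search operations; fixed means a pool with a fixed number of threads.

index:
    store:
        type: memory
Stores the index in memory, although ES does not really recommend this. In my own tests, queries against an in-memory index were no faster than against a normal index.

index.mapper.dynamic: false
Disables automatic mapping creation. By default, ES creates a mapping automatically from the data types it sees; this setting turns that behavior off. Mappings are introduced in section 3.1.

index.query.parse.allow_unmapped_fields: false
Queries may not reference fields that are not defined in the mapping.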

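3 Basic use of the ES RESTful API

ES exposes a REST API over HTTP.

3.1 Mapping

In ES there is no need to create a type (comparable to a table) or a mapping (comparable to a schema) by hand. With the default configuration, ES creates the type and its mapping automatically from the data that is inserted. This automatic mapping creation can also be turned off in the configuration file.

A mapping mainly defines three things: the field name, the field's data type, and the field's index type.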

Data types

Category	Types
String	string
Whole number	byte, short, integer, long
Floating point	float, double
Boolean	boolean
Date	date
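To make the idea of automatic mapping creation more concrete, here is a toy type-inference function. It is a simplified sketch and not Elasticsearch's actual algorithm: it assumes that JSON integers map to long and JSON floating-point numbers map to double, and it deliberately ignores date detection in strings. The name `infer_field_type` is invented for this example.

```python
def infer_field_type(value) -> str:
    """Guess a mapping type from a JSON value (toy model, no date detection)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


document = {"name": "zhangsan", "age": 12, "score": 1.5, "active": True}
inferred = {field: infer_field_type(value) for field, value in document.items()}
print(inferred)
```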

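Indexing
ES is built on Apache Lucene, and only fields that have been indexed can be used as query conditions; a field that is not indexed is just stored data.

Index types that can be configured for a string field in a mapping

Index type	Meaning
analyzed	The string is first run through an analyzer and then indexed. In other words, the field is indexed as full text.
not_analyzed	The field is indexed and therefore searchable, but the indexed content is exactly the given value. The field is not analyzed.
no	The field is not indexed and cannot be searched.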
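The difference between the three index types can be illustrated with a toy model of what ends up in the index. This is a deliberately simplified sketch: the "analyzer" here only lowercases and splits on whitespace, whereas a real analyzer does considerably more. The function name `index_terms` is hypothetical.

```python
def index_terms(value: str, index: str = "analyzed") -> list:
    """Return the terms a toy index would store for a string field."""
    if index == "analyzed":
        # Stand-in for an analyzer: lowercase, then split into words.
        return value.lower().split()
    if index == "not_analyzed":
        # The exact value is indexed as a single term.
        return [value]
    if index == "no":
        # Nothing is indexed, so nothing can match a query.
        return []
    raise ValueError(f"unknown index type: {index}")


text = "Quick Brown Fox"
for mode in ("analyzed", "not_analyzed", "no"):
    print(mode, index_terms(text, mode))
```

In this model a search for "fox" matches only the analyzed field, while an exact match on "Quick Brown Fox" matches only the not_analyzed field.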

3.2 REST API

3.2.1 Cluster information

curl -XGET "localhost:9200/_cat/health?v"
curl -XGET "localhost:9200/_cat/health?help" # explains the meaning of each column
curl -XGET "localhost:9200/_cat/health?h=cluster,pri,relo&v" # shows only the listed columns

Result:

epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks
1440206633 18:23:53  elasticsearch green           1         1      0   0    0    0        0             0

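Meaning of the main columns in the result:

Column	Meaning
cluster	Cluster name, the value of cluster.name in the ES configuration file.
status	Cluster status, one of green, yellow, or red. green means everything is fine (the cluster is fully functional). yellow means all data is available but some replicas have not been allocated (the cluster is still fully functional). red means that, for some reason, some data is unavailable. A red status needs urgent attention, because data may already have been lost.
node.total	Number of nodes in the cluster.
node.data	Number of data nodes in the cluster.
shards	Total number of shards in the cluster.
pri	Number of primary shards (pri is short for primary).
relo	Number of shards that are currently relocating.
unassign	Number of unassigned shards, that is, the difference between the number of shards there should be and the number that exist (counting both primary and replica shards).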
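Because the _cat output is a plain whitespace-aligned table, it is easy to turn into dictionaries in a script. The sketch below parses the sample output shown above; `parse_cat_table` is a name made up for this example, and the approach assumes that no cell contains spaces.

```python
SAMPLE_HEALTH = """\
epoch      timestamp cluster       status node.total node.data shards pri relo init unassign pending_tasks
1440206633 18:23:53  elasticsearch green           1         1      0   0    0    0        0             0
"""


def parse_cat_table(text: str) -> list:
    """Turn `_cat/...?v` output into a list of {column: value} dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


rows = parse_cat_table(SAMPLE_HEALTH)
print(rows[0]["cluster"], rows[0]["status"], rows[0]["unassign"])
```

All values come back as strings; convert columns such as node.total with int() where numbers are needed.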

3.2.2 Nodes in the cluster

curl -XGET "localhost:9200/_cat/nodes?v"

host          ip            heap.percent ram.percent load node.role master name
master.hadoop localhost            3          35 0.00 d         *      Ezekiel

3.2.3 Indices in the cluster

curl -XGET "localhost:9200/_cat/indices?v"

health status index      pri rep docs.count docs.deleted store.size pri.store.size
yellow open   index_test   5   1          0            0       575b           575b
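The yellow health of index_test in this output is most likely the single-node case: the index asks for 1 replica, but a replica cannot be placed on the same node as its primary, so the replicas stay unassigned. The status rules described in section 3.2.1 can be sketched as follows; `health_status` is a hypothetical helper written for this illustration.

```python
def health_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Apply the green / yellow / red rules to counts of unassigned shards."""
    if unassigned_primaries > 0:
        return "red"      # some data is unavailable
    if unassigned_replicas > 0:
        return "yellow"   # all data available, but some replicas are not allocated
    return "green"        # everything is allocated


# index_test on one node: 5 primaries allocated, 5 replicas with nowhere to go.
single_node_status = health_status(unassigned_primaries=0, unassigned_replicas=5)
print(single_node_status)
```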

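3.2.4 Creating an index

Create an index with the default settings:

curl -XPUT "localhost:9200/index_test"

Create an index together with its settings and mapping. Note the single quote that opens the request body after -d and the one that closes it. The body sets the number of replicas and the number of primary shards, then creates a mapping with a new type test_type (comparable to a table) that has a string field name (indexed as not_analyzed) and an integer field age:

curl -XPUT "localhost:9200/index_test" -d '
{
  "settings": {
    "index": {
      "number_of_replicas": "1",
      "number_of_shards": "5"
    }
  },
  "mappings": {
    "test_type": {
      "properties": {
        "name": {
          "type": "string",
          "index": "not_analyzed"
        },
        "age": {
          "type": "integer"
        }
      }
    }
  }
}'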
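JSON has no comment syntax, so annotations cannot live inside the request body that curl sends. One way around this is to build the body in a script, where comments are allowed, and serialize it. The sketch below does that for the create-index request; `build_create_index_body` is a name invented for this example, and it only produces the JSON text without sending anything.

```python
import json


def build_create_index_body(type_name: str, shards: int = 5, replicas: int = 1) -> str:
    """Build the create-index request body as a JSON string."""
    body = {
        "settings": {
            "index": {
                "number_of_replicas": replicas,  # replica count
                "number_of_shards": shards,      # primary shard count
            }
        },
        "mappings": {
            type_name: {  # the new type (comparable to a table)
                "properties": {
                    # string field indexed as-is, without analysis
                    "name": {"type": "string", "index": "not_analyzed"},
                    "age": {"type": "integer"},
                }
            }
        },
    }
    return json.dumps(body, indent=2)


request_body = build_create_index_body("test_type")
print(request_body)
```

The printed text can be saved to a file and passed to curl with -d @file, which also avoids shell quoting issues.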

3.2.5 Deleting an index

curl -XDELETE "localhost:9200/index_test"

3.2.6 Creating an index's mapping

Note that the test_type key in the request body must match the test_type in the URL.

curl -XPUT 'localhost:9200/index_test/_mapping/test_type' -d '
{
  "test_type": {
    "properties": {
      "name": {
        "type": "string",
        "index": "not_analyzed"
      },
      "age": {
        "type": "integer"
      }
    }
  }
}'

3.2.7 Deleting an index's mapping

curl -XDELETE 'localhost:9200/index_test/_mapping/test_type'

3.2.9 Viewing an index's mapping

curl -XGET 'localhost:9200/index_test/_mapping/test_type'

3.2.10 Adding a document

The pretty parameter makes ES pretty-print the returned JSON. The 1 in the URL is the document's id.

curl -XPUT 'localhost:9200/index_test/test_type/1?pretty' -d '
{
  "name": "zhangsan",
  "age": 12
}'

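3.2.11 Updating a document

The 1 here must be the id of a document that already exists in the index; otherwise the request adds a new document instead. The request body replaces the stored document as a whole.

curl -XPOST 'localhost:9200/index_test/test_type/1?pretty' -d '
{
  "name": "lisi",
  "age": 12
}'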
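The "update becomes an insert if the id does not exist" behavior can be modeled with a small in-memory store. This is a toy sketch, not how Elasticsearch is implemented: the class `ToyIndex` is invented for illustration, and the `_version` and `created` fields only mimic the kind of information the index response reports.

```python
class ToyIndex:
    """In-memory stand-in for one index/type, keyed by document id."""

    def __init__(self):
        self._docs = {}  # id -> (version, source)

    def index(self, doc_id: str, source: dict) -> dict:
        """Store a document: create it if the id is new, otherwise replace it."""
        created = doc_id not in self._docs
        version = 1 if created else self._docs[doc_id][0] + 1
        self._docs[doc_id] = (version, dict(source))  # whole document is replaced
        return {"_id": doc_id, "_version": version, "created": created}

    def get(self, doc_id: str) -> dict:
        return self._docs[doc_id][1]


store = ToyIndex()
first = store.index("1", {"name": "zhangsan", "age": 12})
second = store.index("1", {"name": "lisi"})
print(first, second, store.get("1"))
```

Because the second call replaces the document, the age field from the first version is gone afterwards; to keep it, the update request has to send it again.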

3.2.12 Deleting a document

curl -XDELETE 'localhost:9200/index_test/test_type/1?pretty'
The 1 here must be the id of a document that already exists in the index.

3.2.13 Viewing a document

curl -XGET 'localhost:9200/index_test/test_type/1?pretty'

3.3 Structure of the REST API specification files

Taking the cat.health.json file as an example, here is a brief look at how these REST API specification files are structured.

{
  "cat.health": {
    "documentation": "http://www.elastic.co/guide/en/elasticsearch/reference/master/cat-health.html", # official documentation page for this API
    "methods": ["GET"],
    "url": { # url section (optional)
      "path": "/_cat/health",
      "paths": ["/_cat/health"],
      "parts": {},
      "params": {
        "local": {
          "type" : "boolean",
          "description" : "Return local information, do not retrieve the state from master node (default: false)"
        },
        "master_timeout": {
          "type" : "time",
          "description" : "Explicit operation timeout for connection to master node"
        },
        "h": {
          "type": "list",
          "description" : "Comma-separated list of column names to display"
        },
        "help": {
          "type": "boolean",
          "description": "Return help information",
          "default": false
        },
        "ts": {
          "type": "boolean",
          "description": "Set to false to disable timestamping",
          "default": true
        },
        "v": {
          "type": "boolean",
          "description": "Verbose mode. Display column headers",
          "default": true
        }
      }
    },
    "body": null
  }
}

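Example command:
curl -XGET "localhost:9200/_cat/health?v" -d 'body'

Part 1 (-XGET): corresponds to the GET operation listed under methods in the specification.
Part 2 (localhost:9200): the hostname and port of the machine running the ES server.
Part 3 (/_cat/health): corresponds to url in the specification. path is the simplest URL; paths lists the other URLs besides path; parts describes the variable portions of the URLs in paths (usually wrapped in {}, such as {index}).
Part 4 (v): a parameter, corresponding to params in the specification. A boolean parameter such as v does not need an explicit value (true or false): if it is present it means true, otherwise false.
Part 5 (body): the data to send, corresponding to body in the specification. If body specifies "required=true", a body must be sent. To find out what data the body should contain, consult the official page given in the specification's documentation field.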
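The five parts above can be assembled programmatically. The following is a minimal sketch of building the request URL from a host, a path, and a params dictionary, following the rule that a boolean parameter is written by name alone when true and left out when false. `build_url` is a hypothetical helper, not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def build_url(host: str, path: str, params: dict = None) -> str:
    """Assemble an HTTP URL for a REST call such as /_cat/health."""
    pieces = []
    for name, value in (params or {}).items():
        if value is True:
            pieces.append(name)  # boolean flag: presence means true
        elif value is False or value is None:
            continue             # absent means false
        elif isinstance(value, (list, tuple)):
            # "list" parameters are sent as comma-separated values
            pieces.append(urlencode({name: ",".join(value)}, safe=","))
        else:
            pieces.append(urlencode({name: value}))
    query = "&".join(pieces)
    return f"http://{host}{path}" + (f"?{query}" if query else "")


url = build_url(
    "localhost:9200",
    "/_cat/health",
    {"h": ["cluster", "pri", "relo"], "v": True, "help": False},
)
print(url)
```

The result corresponds to the third curl command in section 3.2.1, with the scheme written out explicitly.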
