Elasticsearch Full-Text Search Engine: Setting Up a Cluster on Windows

1. Why Set Up a Cluster

2. Setting Up an ES Cluster on Windows

1. Why Set Up a Cluster

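Before setting up an Elasticsearch cluster, we should first understand why a cluster is needed at all. What advantages does it offer, and why is a single-node ES deployment not enough?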

(1) High availability

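The most obvious benefit of a cluster is high availability. As with the Redis and Eureka clusters covered earlier, a single-node deployment is only suitable for learning. In production a service is rarely deployed on a single machine; it is almost always clustered so that the failure of one machine does not make the service unavailable.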

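As a search engine, Elasticsearch is expected to store massive amounts of data and return the information we want within a very short time. The first thing to guarantee, therefore, is that Elasticsearch is highly available. What does high availability mean?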

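It usually means designing a system so that the time during which it cannot provide service is kept to a minimum. If a system can always provide service, its availability is 100%. If it goes down at some point, for example a website that crashes at a particular moment, it is unavailable during that time. To keep Elasticsearch highly available, we should therefore minimize its downtime.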
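To make the definition concrete, availability can be expressed as the share of total time during which the system was serving requests. The snippet below is only a small sketch of that arithmetic; the function name and the downtime figure are my own illustrative assumptions, not measurements.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return availability as a percentage of total elapsed time."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total time must be positive")
    return 100.0 * uptime_hours / total


# Illustrative numbers only: 8 hours of downtime over a 365-day year.
hours_per_year = 365 * 24
print(f"{availability(hours_per_year - 8, 8):.3f}%")
```

The less downtime there is, the closer the result gets to 100%, which is exactly what clustering is meant to achieve.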

So how do we improve the availability of Elasticsearch?

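The answer is to deploy Elasticsearch as a cluster. If Elasticsearch runs on a single server and that host suddenly loses its network connection or is attacked, the whole Elasticsearch service becomes unavailable. With a cluster, if one host goes down, the other hosts can take over, so the service remains available.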

Some readers may ask: if one host goes down, doesn't the data on that host become inaccessible?

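Answering this requires a look at how Elasticsearch stores data. The short answer is that the data stored on the failed host can still be accessed, because copies exist on other hosts. The copies are not made per host, however, but per shard, which brings us to the concept of a shard.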

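A shard, as the name suggests, is one slice of the data. An index in Elasticsearch is roughly comparable to a database; for example, to store the user information of a website we might create an index named user. An index is not stored as a single unit but is split into shards. Before version 7.0, Elasticsearch split an index into five primary shards by default; since 7.0 (including the 7.6.1 used in this article) the default is one. The number can be set when the index is created. Shards are containers for data: documents are stored in shards, and shards are allocated to the nodes of the cluster. When the cluster grows or shrinks, Elasticsearch automatically migrates shards between nodes so that the data stays evenly distributed. In effect, one data set is split into several parts that are kept on different hosts.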

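But if a host goes down, isn't the data in its shards inaccessible, since the other hosts store different shards? No, it is still accessible, because other hosts store copies of those shards, called replicas. This brings us to the second concept: the replica.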

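A replica, as the name suggests, is a copy of a primary shard with identical content. By default Elasticsearch creates one replica of each primary shard. With five primary shards, for example, that gives five primaries and five replicas: the data is stored twice, in ten shards in total. The number of replicas is also configurable. As long as the replica of a shard is stored on a different host from its primary, the data can still be found on the other host when one host goes down, so from the outside the results look no different.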
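The shard arithmetic above can be sketched in a few lines. The helper names are hypothetical; `number_of_shards` and `number_of_replicas` are the index settings that control the two counts, and the dictionary is the kind of body one might send when creating an index.

```python
from typing import Dict


def total_shards(primaries: int, replicas_per_primary: int) -> int:
    """Total shard copies for one index: every primary plus its replicas."""
    if primaries < 1 or replicas_per_primary < 0:
        raise ValueError("need at least one primary and a non-negative replica count")
    return primaries * (1 + replicas_per_primary)


def index_settings(primaries: int, replicas_per_primary: int) -> Dict[str, Dict[str, int]]:
    """Build an index-creation body with explicit shard and replica counts."""
    return {
        "settings": {
            "number_of_shards": primaries,
            "number_of_replicas": replicas_per_primary,
        }
    }


print(total_shards(5, 1))    # five primaries, one replica each
print(index_settings(5, 1))
```

Five primaries with one replica each give the ten shards mentioned above; with the 7.x default of one primary and one replica the total is two.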

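In general, Elasticsearch tries to place the different shards of an index on different hosts and keeps each replica on a different host from its primary, which improves fault tolerance and therefore availability.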
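To see why this placement helps, here is a toy model. It is not Elasticsearch's real allocator, just a round-robin sketch under my own assumptions (one replica per primary, hypothetical function names) that shows which shards still have a copy after a node is lost.

```python
from typing import Dict, List, Optional, Tuple

Layout = Dict[int, Tuple[str, Optional[str]]]


def place_shards(nodes: List[str], primaries: int) -> Layout:
    """Toy round-robin placement with one replica per primary.

    Returns {shard_number: (primary_node, replica_node)}. The replica is
    None when there is no second node that could hold it.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    layout: Layout = {}
    for shard in range(primaries):
        primary_node = nodes[shard % len(nodes)]
        # The replica goes to the next node, never to the primary's node.
        replica_node = nodes[(shard + 1) % len(nodes)] if len(nodes) > 1 else None
        layout[shard] = (primary_node, replica_node)
    return layout


def surviving_shards(layout: Layout, failed_node: str) -> List[int]:
    """Shard numbers that still have at least one copy after failed_node is lost."""
    return [
        shard
        for shard, (primary_node, replica_node) in layout.items()
        if primary_node != failed_node
        or (replica_node is not None and replica_node != failed_node)
    ]


two_nodes = place_shards(["node0", "node1"], 5)
print(surviving_shards(two_nodes, "node0"))   # every shard still has a copy

one_node = place_shards(["node0"], 5)
print(surviving_shards(one_node, "node0"))    # nothing survives
```

With two nodes every shard survives the loss of either node; with a single node, losing it loses everything.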

(2) Health status

Elasticsearch has a dedicated indicator for the health of an index, with three levels:

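green: all primary shards and replica shards are allocated. Your cluster is 100% operational.
yellow: all primary shards are allocated, but at least one replica is missing. No data is lost, so search results are still complete. However, high availability is compromised to some degree. If more shards disappear, you may lose data. Think of yellow as a warning that should be investigated promptly.
red: at least one primary shard and all of its replicas are missing. This means data is missing: searches return only partial results, and write requests routed to that shard return an exception.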

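If you have only one host, the health of an index that has replicas configured is yellow, because the cluster has no other host on which to place the replicas. That is not a healthy state, which is another reason a cluster is necessary.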
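The three levels can be summarized as a small decision rule. The function below is a simplified sketch of that rule as described above, not Elasticsearch's internal implementation; the tuple layout and the function name are assumptions made for illustration.

```python
from typing import Iterable, Tuple

# (primary_assigned, replicas_assigned, replicas_wanted) for one shard
ShardState = Tuple[bool, int, int]


def index_health(shards: Iterable[ShardState]) -> str:
    """Classify index health as green, yellow or red from per-shard state."""
    status = "green"
    for primary_assigned, replicas_assigned, replicas_wanted in shards:
        if not primary_assigned:
            return "red"          # a primary is missing: data is unavailable
        if replicas_assigned < replicas_wanted:
            status = "yellow"     # data is complete, but redundancy is reduced
    return status


# One node: the primary is assigned, but its replica has nowhere to go.
print(index_health([(True, 0, 1)]))
# Two nodes: primary and replica are both assigned.
print(index_health([(True, 1, 1)]))
```

This is why a single-node setup with the default of one replica reports yellow, while the two-node cluster built below can reach green.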

(3) Storage capacity

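In addition, a cluster pools the storage of its hosts. Since the storage of a single host is fixed, a cluster has more storage space than any single host and can hold more data.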

For all of the reasons above, we need to set up an ES cluster.

The explanation above is quoted from https://www.cnblogs.com/tianyiliang/p/10291305.html. I think it is very well written and recommend it for further reading.

2. Setting Up an ES Cluster on Windows

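For learning purposes I set up an ES cluster with only two nodes here. Configuring more nodes works much the same way: each additional node just needs its own configuration. The detailed steps follow: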

[a] Unzip elasticsearch-7.6.1-windows-x86_64.zip

After unzipping, make two copies. I named them elasticsearch-7.6.1_node1 and elasticsearch-7.6.1_node2.

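Note: if the copied directory comes from an ES installation that has already been used and contains existing data, that data must be deleted, otherwise the cluster will fail to form. In other words, when reusing a previously used installation, delete the data directory inside each copy.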
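The cleanup can also be scripted. The helper below is a minimal sketch that assumes the data lives in the default data folder directly under each node directory; the function name is hypothetical, and the deletion is irreversible, so check the path before running it.

```python
import shutil
from pathlib import Path


def clear_node_data(node_home: Path) -> bool:
    """Delete <node_home>/data if it exists; return True if it was removed."""
    data_dir = node_home / "data"
    if data_dir.is_dir():
        shutil.rmtree(data_dir)
        return True
    return False


# Usage (adjust the paths to wherever the two copies were placed):
#   for name in ("elasticsearch-7.6.1_node1", "elasticsearch-7.6.1_node2"):
#       print(name, clear_node_data(Path(name)))
```

The function returns False when there is nothing to delete, so it is safe to run against a freshly unzipped copy.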

[b] Configure cluster node node0

Edit the Elasticsearch 7.6.1 configuration file elasticsearch.yml. The modified file looks like this:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Cluster name; must be identical on every node in the cluster
cluster.name: myEsCluster
#
# ------------------------------------ Node ------------------------------------
#
# Node name; must be unique within the cluster
node.name: node0
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# Whether this node is master-eligible: true = yes, false = no
node.master: true
# Whether this node stores data: true = yes, false = no
node.data: true
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
# (location of the index data)
#path.data: /path/to/data
#
# Path to log files:
# (location of the log files)
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
# (whether to lock the process in physical memory: true = yes, false = no)
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
# Address this node listens on; used to reach this ES node
network.host: 127.0.0.1
#
# Set a custom port for HTTP:
# HTTP port exposed by ES (default 9200)
http.port: 9200
# TCP transport port used between nodes (default 9300)
transport.tcp.port: 9300
#
# For more information, consult the network module documentation.
#
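# --------------------------------- Discovery ----------------------------------
#
# Legacy Zen discovery setting: how many master-eligible nodes a node must be
# able to see (default 1; larger clusters used values such as 2-4).
# ES 7.x manages the voting quorum itself and ignores this setting,
# so it can be omitted.
discovery.zen.minimum_master_nodes: 2
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# Added in ES 7.x: addresses of the master-eligible nodes to contact on startup
discovery.seed_hosts: ["127.0.0.1:9300", "127.0.0.1:9301"]
# Legacy Zen fault-detection settings (ping timeout and retry count)
discovery.zen.fd.ping_timeout: 1m
discovery.zen.fd.ping_retries: 5
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# Added in ES 7.x: required to elect a master when a brand-new cluster first starts
cluster.initial_master_nodes: ["node0", "node1"]
#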
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
#
# Enable cross-origin requests (CORS): true = yes; required by the head plugin
http.cors.enabled: true
# "*" allows requests from any origin
http.cors.allow-origin: "*"

[c] Configure cluster node node1

Likewise, edit elasticsearch.yml for the other node. The modified file looks like this:

# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
#       Before you set out to tweak and tune the configuration, make sure you
#       understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/en/elasticsearch/reference/index.html
#
# ---------------------------------- Cluster -----------------------------------
#
# Cluster name; must be identical on every node in the cluster
cluster.name: myEsCluster
#
# ------------------------------------ Node ------------------------------------
#
# Node name; must be unique within the cluster
node.name: node1
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# Whether this node is master-eligible: true = yes, false = no
node.master: true
# Whether this node stores data: true = yes, false = no
node.data: true
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
# (location of the index data)
#path.data: /path/to/data
#
# Path to log files:
# (location of the log files)
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
# (whether to lock the process in physical memory: true = yes, false = no)
#bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
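# Address this node listens on; used to reach this ES node
network.host: 127.0.0.1
#
# Set a custom port for HTTP:
# HTTP port exposed by ES (default 9200); node1 uses 9201 to avoid clashing with node0
http.port: 9201
# TCP transport port used between nodes (default 9300); node1 uses 9301
transport.tcp.port: 9301
#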
# For more information, consult the network module documentation.
#
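# --------------------------------- Discovery ----------------------------------
#
# Legacy Zen discovery setting: how many master-eligible nodes a node must be
# able to see (default 1; larger clusters used values such as 2-4).
# ES 7.x manages the voting quorum itself and ignores this setting,
# so it can be omitted.
discovery.zen.minimum_master_nodes: 2
#
# Pass an initial list of hosts to perform discovery when this node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
# Added in ES 7.x: addresses of the master-eligible nodes to contact on startup
discovery.seed_hosts: ["127.0.0.1:9300", "127.0.0.1:9301"]
# Legacy Zen fault-detection settings (ping timeout and retry count)
discovery.zen.fd.ping_timeout: 1m
discovery.zen.fd.ping_retries: 5
#
# Bootstrap the cluster using an initial set of master-eligible nodes:
# Added in ES 7.x: required to elect a master when a brand-new cluster first starts
cluster.initial_master_nodes: ["node0", "node1"]
#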
# For more information, consult the discovery and cluster formation module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
#
# Enable cross-origin requests (CORS): true = yes; required by the head plugin
http.cors.enabled: true
# "*" allows requests from any origin
http.cors.allow-origin: "*"
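Only a handful of values differ between the two files, and a typo in any of them keeps the nodes from joining. The sketch below checks the rules stated in the comments: same cluster.name, unique node.name, and, because both nodes run on the same machine, unique ports. The function name is hypothetical and the settings are typed in by hand rather than parsed from the YAML files.

```python
from typing import Dict, List


def check_cluster_configs(configs: List[Dict[str, object]]) -> List[str]:
    """Return a list of problems found across the per-node settings."""
    problems: List[str] = []
    if len({cfg["cluster.name"] for cfg in configs}) != 1:
        problems.append("cluster.name differs between nodes")
    # Ports only need to be unique here because both nodes share one host.
    for key in ("node.name", "http.port", "transport.tcp.port"):
        values = [cfg[key] for cfg in configs]
        if len(set(values)) != len(values):
            problems.append(f"{key} is not unique")
    return problems


node0 = {"cluster.name": "myEsCluster", "node.name": "node0",
         "http.port": 9200, "transport.tcp.port": 9300}
node1 = {"cluster.name": "myEsCluster", "node.name": "node1",
         "http.port": 9201, "transport.tcp.port": 9301}

print(check_cluster_configs([node0, node1]))
```

An empty list means the two configurations are consistent with each other; it does not prove that the nodes will start.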

[d] Start the two Elasticsearch 7.6.1 nodes

Start the es-head plugin:

Once startup is complete, open http://localhost:9200/ in a browser:

Open http://localhost:9201/ in a browser:

Open http://localhost:9100/ in a browser:

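As the es-head page shows, ES now has two nodes, node0 and node1, which means the cluster was set up successfully. The information for each node can also be viewed there.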
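The browser check of the two HTTP ports can also be done from a script. This is a minimal sketch using only the Python standard library; the function names are my own, and the fields read here (name, cluster_name, version.number) are the ones the root endpoint returns.

```python
import json
import urllib.request
from typing import Dict


def summarize_root(body: str) -> Dict[str, str]:
    """Pick node name, cluster name and version out of the root endpoint's JSON."""
    info = json.loads(body)
    return {
        "node": info["name"],
        "cluster": info["cluster_name"],
        "version": info["version"]["number"],
    }


def fetch_root(base_url: str, timeout: float = 5.0) -> Dict[str, str]:
    """GET the root endpoint of one node and summarize the response."""
    with urllib.request.urlopen(base_url, timeout=timeout) as response:
        return summarize_root(response.read().decode("utf-8"))


# Usage (requires both nodes to be running):
#   for url in ("http://localhost:9200/", "http://localhost:9201/"):
#       print(fetch_root(url))
```

Both nodes should report the same cluster name and different node names.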

Next we can start Kibana. Once it is running, open http://localhost:5601/ in a browser and create two test documents:

PUT /user/_doc/1
{
  "name": "张三",
  "age": 10
}

PUT /user/_doc/2
{
  "name": "李四",
  "age": 20
}
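For readers without Kibana, the same two requests can be sent over plain HTTP. The sketch below only builds the requests with the standard library; the helper name is hypothetical, and the base URL assumes node0 is listening on port 9200 as configured above.

```python
import json
import urllib.request
from typing import Dict, Union


def build_index_request(base_url: str, index: str, doc_id: Union[int, str],
                        document: Dict[str, object]) -> urllib.request.Request:
    """Build a PUT /<index>/_doc/<id> request carrying the document as JSON."""
    url = f"{base_url.rstrip('/')}/{index}/_doc/{doc_id}"
    data = json.dumps(document, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


requests_to_send = [
    build_index_request("http://localhost:9200", "user", 1, {"name": "张三", "age": 10}),
    build_index_request("http://localhost:9200", "user", 2, {"name": "李四", "age": 20}),
]

# Sending them requires the cluster to be running:
#   for request in requests_to_send:
#       with urllib.request.urlopen(request, timeout=5) as response:
#           print(response.status, response.read().decode("utf-8"))
```

Either node can receive the requests, because any node in the cluster routes a document to the shard that owns it.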

After running these requests, view the data in the index:

Kibana can also be used to check the health of the cluster:

GET /_cat/health?v

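As the output shows, the cluster currently has two nodes, and the response also reports the number of active shards, primary shards and so on. The columns are: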

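cluster: the cluster name
status: the cluster status. green means healthy; yellow means all primary shards are allocated but at least one replica is missing, so the data is still complete; red means some primary shards are unavailable and data may have been lost.
node.total: the total number of nodes online
node.data: the number of data nodes online
shards: active_shards, the number of active shards
pri: active_primary_shards, the number of active primary shards. With one replica per primary, shards is normally twice pri.
relo: relocating_shards, the number of shards being relocated; normally 0
init: initializing_shards, the number of shards being initialized; normally 0
unassign: unassigned_shards, the number of unassigned shards; normally 0
pending_tasks: the number of pending tasks, such as shard relocations; normally 0
max_task_wait_time: the longest time a task has been waiting
active_shards_percent: the percentage of active shards; normally 100%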
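Because the ?v output is a header line followed by a value line, it is easy to turn into a dictionary keyed by the column names listed above. The parser below is a minimal sketch; the function name is my own and the sample values are made up for illustration, not copied from a real cluster.

```python
from typing import Dict


def parse_cat_health(text: str) -> Dict[str, str]:
    """Parse the header and value lines of GET /_cat/health?v into a dict."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and a value line")
    headers, values = lines[0].split(), lines[1].split()
    if len(headers) != len(values):
        raise ValueError("header and value counts differ")
    return dict(zip(headers, values))


# Illustrative sample only; real numbers depend on the indices in the cluster.
sample = (
    "epoch      timestamp cluster     status node.total node.data shards pri "
    "relo init unassign pending_tasks max_task_wait_time active_shards_percent\n"
    "1584000000 08:00:00  myEsCluster green  2          2         2      1   "
    "0    0    0        0             -                  100.0%\n"
)

health = parse_cat_health(sample)
print(health["status"], health["node.total"], health["shards"], health["pri"])
```

A script like this could, for example, raise an alert whenever status is not green or unassign is greater than 0.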

View the nodes in the cluster through Kibana:

GET /_cat/nodes?v

View the indices in the cluster through Kibana:

GET /_cat/indices?v
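The cat APIs can also return JSON when format=json is added to the query string, which is easier to process in a script than the text table. The sketch below filters such a response for indices that are not green; the function name and the sample rows are illustrative assumptions.

```python
import json
from typing import List


def non_green_indices(body: str) -> List[str]:
    """Return the names of indices whose health is not green.

    Expects the JSON returned by GET /_cat/indices?format=json.
    """
    rows = json.loads(body)
    return [row["index"] for row in rows if row["health"] != "green"]


# Illustrative sample only; "demo-logs" is a made-up index name.
sample_indices = json.dumps([
    {"health": "green", "status": "open", "index": "user", "pri": "1", "rep": "1"},
    {"health": "yellow", "status": "open", "index": "demo-logs", "pri": "1", "rep": "1"},
])

print(non_green_indices(sample_indices))
```

On a healthy two-node cluster the list should be empty.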

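At this point we have successfully set up a two-node ES cluster. In production, clusters are almost always deployed on Linux; Windows is used here purely for convenience while learning. I hope this helps with your studies.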