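Contents

  • Error summary
    • 1. Elasticsearch can accept and execute scripts, so for system security it refuses to start as root
    • 2. Cannot be reached from outside
    • 3. Fixing max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
    • 4. sysctl: setting key "vm.max_map_count": Read-only file system
    • 5. the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
  • I. Installation and plugins
    • 1. Download
    • 2. Extract
    • 3. Run
    • 4. Update the profile (environment variables)
    • 5. Allow external access
    • 6. Run in the background
    • 7. Problems encountered
    • 8. Installing plugins
      • 8.1 Kibana
      • 8.2 IK analyzer
  • II. Basic concepts
  • III. Creating and deleting an Index
  • IV. Chinese word segmentation settings
  • V. Data operations
  • VI. Querying data
    • 6.1 Returning all records
    • 6.2 Full-text search
    • 6.3 Logical operators
    • 6.4 More complex query examples
  • VII. Calling ES from Python: a simple Chinese-analyzed search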

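It has been a while since I last updated this blog. The company where I am interning uses ES, so I have spent the past couple of days studying it, and this post records what I learned so I can look it up later. A follow-up post may summarize how ES works internally and compare it with Lucene. This one covers the basic create, read, update and delete operations, plus an example of calling ES from Python. Work has just resumed after the New Year holiday; here is to another year of effort and, hopefully, good offers in the autumn recruiting season!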

Error summary

1. Elasticsearch can accept and execute scripts, so for system security it refuses to start as root

useradd -m work
su work

2. Cannot be reached from outside

vim elasticsearch.yml

# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#

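3. Fixing max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

# Edit the file below, which holds kernel parameters
vi /etc/sysctl.conf
# Add this setting:
vm.max_map_count=655360

Save the file, then run:

sysctl -p
# -p loads kernel parameters from the given file, or from /etc/sysctl.conf if none is given

Inside a Docker container this fails with the following error:

4. sysctl: setting key "vm.max_map_count": Read-only file system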

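This happens because Docker base images are stripped down and do not even run an init process, so the step that normally applies kernel parameters at OS boot (sysctl -p) is skipped and the values stay at the kernel defaults. In an unprivileged container /proc/sys is also mounted read-only, and vm.max_map_count is a setting of the host kernel rather than of the container. To change it from inside the container, start the container with --privileged so that it has permission to modify kernel parameters.

I chose instead to change the setting on the host and then restart the container, which also solves the problem. Exit the container and return to the host.
vm.max_map_count can be changed from the command line, but that change is lost when the machine reboots, so edit /etc/sysctl.conf as well to make it permanent.
To change it from the command line: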

sudo sysctl -w vm.max_map_count=655360
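
Before starting Elasticsearch it can be handy to confirm the current value from a script. The following is a minimal sketch and not part of the original notes: it assumes a Linux host where the setting is exposed at /proc/sys/vm/max_map_count, and the function names are made up for illustration.

```python
from pathlib import Path

# Minimum demanded by the bootstrap check quoted in the error message.
REQUIRED_MAX_MAP_COUNT = 262144


def read_max_map_count(path="/proc/sys/vm/max_map_count"):
    """Read the current vm.max_map_count (assumes Linux procfs)."""
    return int(Path(path).read_text().strip())


def is_sufficient(value, required=REQUIRED_MAX_MAP_COUNT):
    """Return True when the value satisfies the bootstrap check."""
    return value >= required


# The default from the error message fails; the value set above passes.
print(is_sufficient(65530))   # False
print(is_sufficient(655360))  # True
```

On the host, `is_sufficient(read_max_map_count())` should report True after `sysctl -w` or `sysctl -p` has been applied.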

5. the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured

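In elasticsearch.yml, uncomment node.name: node-1 (remove the leading #) and set cluster.initial_master_nodes: ["node-1"]. The name listed there has to match node.name. I had not set it this way, and it took me a long time to track down. (This bootstrap check belongs to Elasticsearch 7.x; the cluster.initial_master_nodes setting does not exist in the 6.4.3 release installed below.)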

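I. Installation and plugins

1. Download

Download the package from the official site, https://www.elastic.co/downloads/elasticsearch. This post uses elasticsearch-6.4.3.tar.gz. Upload the downloaded archive to the server.

2. Extract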

tar zxvf elasticsearch-6.4.3.tar.gz -C /usr/local

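3. Run

cd /usr/local/elasticsearch-6.4.3/
./bin/elasticsearch

4. Update the profile (environment variables)

vim ~/.bash_profile
export PATH=$PATH:/usr/local/elasticsearch-6.4.3/bin
# After reloading the profile, `elasticsearch` can be run directly

5. Allow external access

cd /usr/local/elasticsearch-6.4.3/config
vim elasticsearch.yml
# Add this setting:
network.host: 0.0.0.0

Open http://<server IP>:9200/ in a browser. If the following information appears, the installation succeeded.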

name "mQQdX2p"
cluster_name    "elasticsearch"
cluster_uuid    "em-fVPdIRLeG4BZAkdRarA"
version
number  "6.4.3"
build_flavor    "default"
build_type  "tar"
build_hash  "fe40335"
build_date  "2018-10-30T23:17:19.084789Z"
build_snapshot  false
lucene_version  "7.4.0"
minimum_wire_compatibility_version  "5.6.0"
minimum_index_compatibility_version "5.0.0"
tagline "You Know, for Search"
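
The same check can be scripted instead of done in a browser. The sketch below is an illustration rather than part of the original setup: `fetch_node_info` and `version_tuple` are hypothetical helper names, the default URL assumes a node on localhost, and the sample dictionary is a trimmed copy of the response shown above so that the parsing step runs without a live node.

```python
import json
from urllib.request import urlopen


def fetch_node_info(base_url="http://localhost:9200/", timeout=5):
    """GET the root endpoint and decode its JSON body (needs a running node)."""
    with urlopen(base_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def version_tuple(info):
    """Convert info["version"]["number"], e.g. "6.4.3", into (6, 4, 3)."""
    number = info["version"]["number"].split("-")[0]  # drop suffixes such as -SNAPSHOT
    return tuple(int(part) for part in number.split("."))


# Trimmed copy of the response shown above, used instead of a live call.
sample = {
    "name": "mQQdX2p",
    "cluster_name": "elasticsearch",
    "version": {"number": "6.4.3", "lucene_version": "7.4.0"},
}
print(version_tuple(sample))  # (6, 4, 3)
```

Against a real node, `version_tuple(fetch_node_info("http://<server IP>:9200/"))` gives a value that can be compared with tuples such as `(6, 0, 0)`.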

6. Run in the background

elasticsearch -d

7. Problems encountered

ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-11-23T10:00:28,057][INFO ][o.e.n.Node               ] [mQQdX2p] stopping ...
[2018-11-23T10:00:28,100][INFO ][o.e.n.Node               ] [mQQdX2p] stopped
[2018-11-23T10:00:28,101][INFO ][o.e.n.Node               ] [mQQdX2p] closing ...
[2018-11-23T10:00:28,111][INFO ][o.e.n.Node               ] [mQQdX2p] closed

Solution:

sudo su
vim /etc/sysctl.conf
# Add this setting:
vm.max_map_count=655360
# Apply it:
sysctl -p

8. Installing plugins

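8.1 Kibana

Note: Kibana should be the same version as Elasticsearch. The commands below use the 5.6.9 archive; for the 6.4.3 install above, substitute the matching Kibana release.

$ curl -O https://artifacts.elastic.co/downloads/kibana/kibana-5.6.9-linux-x86_64.tar.gz
$ sha1sum kibana-5.6.9-linux-x86_64.tar.gz
$ tar -xzf kibana-5.6.9-linux-x86_64.tar.gz
$ cd kibana-5.6.9-linux-x86_64/
# Start
$ ./bin/kibana
$ nohup ./bin/kibana & # run in the background
# Stop
$ ps -ef | grep node # find Kibana's pid (./bin/../node/bin/node --no-warnings ./bin/../src/cli)
$ kill <pid>

8.2 IK analyzer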

https://github.com/medcl/elasticsearch-analysis-ik

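II. Basic concepts

Elasticsearch, a full-text search engine: getting-started tutorial

Elasticsearch: basic introduction and analysis of indexing internals

2.1 Index
The following command lists all Indexes on the current node.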

curl -X GET 'http://localhost:9200/_cat/indices?v'
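
With `?v` the `_cat` APIs return a whitespace-aligned text table whose first line is the header. The sketch below shows one way to turn that text into dictionaries. It is an illustration under assumptions: `parse_cat_table` is a hypothetical helper, the sample text is hand-written rather than captured from a real node, and plain whitespace splitting breaks when a column is empty (as can happen for closed indices).

```python
def parse_cat_table(text):
    """Parse `_cat` output that includes a header row (?v) into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Pair each whitespace-separated cell with its column name.
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Hand-written sample in the shape of `_cat/indices?v` output.
sample = (
    "health status index    uuid   pri rep docs.count docs.deleted store.size pri.store.size\n"
    "yellow open   accounts abc123   5   1          2            0      8.1kb          8.1kb\n"
)
for row in parse_cat_table(sample):
    print(row["index"], row["health"], row["docs.count"])
```

The cat APIs also accept `format=json`, which avoids text parsing altogether when a script rather than a person reads the output.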

2.2 Type
The following command lists the Types contained in each Index.

$ curl 'localhost:9200/_mapping?pretty=true'
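
In 6.x the `_mapping` response is a JSON object keyed by index name, with the types nested under `mappings`. As a rough sketch of how to read it, the hypothetical helper below extracts the type names per index. The sample dictionary is hand-written in that 6.x layout; later major versions changed the layout, so this should not be assumed to work there.

```python
def types_by_index(mapping_response):
    """Map each index name to the sorted list of its mapping types (6.x layout)."""
    return {
        index: sorted(body.get("mappings", {}))
        for index, body in mapping_response.items()
    }


# Hand-written sample shaped like a 6.x `_mapping` response.
sample = {
    "accounts": {"mappings": {"person": {"properties": {"user": {"type": "text"}}}}},
    "weather": {"mappings": {}},
}
print(types_by_index(sample))
```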

III. Creating and deleting an Index

curl -X PUT 'localhost:9200/weather'
curl -X DELETE 'localhost:9200/weather'

IV. Chinese word segmentation settings

$ curl -H "Content-Type: application/json" -X PUT 'localhost:9200/accounts' -d '
{"mappings": {"person": {"properties": {"user": {"type": "text","analyzer": "smartcn","search_analyzer": "smartcn"},"title": {"type": "text","analyzer": "smartcn","search_analyzer": "smartcn"},"desc": {"type": "text","analyzer": "smartcn","search_analyzer": "smartcn"}}}}
}'
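
The mapping body above repeats the same analyzer settings for every field, so it can be generated instead of typed by hand. This is a minimal sketch with a hypothetical helper name, `text_mapping`. It assumes the 6.x mapping layout with a type level, and that the chosen analyzer is actually installed: `smartcn` comes from the analysis-smartcn plugin, while the IK plugin provides `ik_max_word` and `ik_smart`.

```python
import json


def text_mapping(doc_type, fields, analyzer="smartcn"):
    """Build a 6.x-style mapping in which every listed field is analyzed text."""
    field_def = {"type": "text", "analyzer": analyzer, "search_analyzer": analyzer}
    return {
        "mappings": {
            doc_type: {"properties": {name: dict(field_def) for name in fields}}
        }
    }


body = text_mapping("person", ["user", "title", "desc"])
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with `-d`, or `text_mapping("person", [...], analyzer="ik_max_word")` can be used when the IK plugin is the one installed.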

V. Data operations

5.1 Adding records

# Add a record with an explicit id (PUT)
$ curl -H "Content-Type: application/json" -X PUT 'localhost:9200/accounts/person/1' -d '
{"user": "张三","title": "工程师","desc": "数据库管理"
}'

# Add a record and let Elasticsearch generate the id (POST)
$ curl -H "Content-Type: application/json" -X POST 'localhost:9200/accounts/person' -d '
{"user": "李四","title": "工程师","desc": "系统管理"
}'

# Delete the record with id 1
$ curl -X DELETE 'localhost:9200/accounts/person/1'

# Update a record by sending the whole document again with PUT
$ curl -H "Content-Type: application/json" -X PUT 'localhost:9200/accounts/person/1' -d '
{"user" : "张三","title" : "工程师","desc" : "数据库管理,软件开发"
}'
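
The curl calls differ only in HTTP method, URL and body, which the following sketch makes explicit using the standard library. It is an illustration, not the author's code: `doc_request` is a hypothetical helper, `BASE_URL` assumes a node on localhost, and the requests are only built here, not sent.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local node


def doc_request(method, index, doc_type, doc_id=None, body=None):
    """Build (but do not send) the HTTP request for one document operation."""
    url = f"{BASE_URL}/{index}/{doc_type}"
    if doc_id is not None:
        url += f"/{doc_id}"
    data = None if body is None else json.dumps(body).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data else {}
    return Request(url, data=data, headers=headers, method=method)


# PUT with an explicit id, POST with a generated id, DELETE by id.
create = doc_request("PUT", "accounts", "person", 1,
                     {"user": "张三", "title": "工程师", "desc": "数据库管理"})
auto_id = doc_request("POST", "accounts", "person",
                      body={"user": "李四", "title": "工程师", "desc": "系统管理"})
delete = doc_request("DELETE", "accounts", "person", 1)

for req in (create, auto_id, delete):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send it to the node.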

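VI. Querying data

6.1 Returning all records

Send a GET request to /Index/Type/_search to match all records (by default only the first 10 hits are included in the response).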

curl 'localhost:9200/accounts/person/_search'
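
The matching documents sit under `hits.hits` in the response, each with its `_id` and `_source`. A small sketch of pulling them out follows. The helper name `extract_sources` is hypothetical and the sample response is hand-written in the 6.x shape, with a placeholder for the auto-generated id.

```python
def extract_sources(response):
    """Return (_id, _source) pairs from a search response."""
    return [(hit["_id"], hit["_source"]) for hit in response["hits"]["hits"]]


# Hand-written sample shaped like a 6.x search response.
sample = {
    "took": 2,
    "timed_out": False,
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {"_index": "accounts", "_type": "person", "_id": "1", "_score": 1.0,
             "_source": {"user": "张三", "title": "工程师", "desc": "数据库管理,软件开发"}},
            {"_index": "accounts", "_type": "person", "_id": "generated-id", "_score": 1.0,
             "_source": {"user": "李四", "title": "工程师", "desc": "系统管理"}},
        ],
    },
}
for doc_id, source in extract_sources(sample):
    print(doc_id, source["user"], source["desc"])
```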

6.2 Full-text search

Elastic queries use their own query syntax (the Query DSL), and the GET request has to carry a request body.

$ curl -H "Content-Type: application/json" 'localhost:9200/accounts/person/_search'  -d '
{"query" : { "match" : { "desc" : "软件" }},"from": 1,"size": 1
}'

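size sets how many results are returned per request, here one (the default is 10). The from field sets the offset (the default is 0, meaning start at the first hit), so from: 1 above skips the first match.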
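
Because from is a zero-based offset, page numbers have to be converted before they go into the request. The sketch below shows that arithmetic. `page_query` is a hypothetical helper and the 1-based page convention is an assumption made for the example.

```python
def page_query(query, page, page_size=10):
    """Wrap a query with from/size for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {"query": query, "from": (page - 1) * page_size, "size": page_size}


match_desc = {"match": {"desc": "软件"}}
print(page_query(match_desc, page=1, page_size=1))  # from 0: the first hit
print(page_query(match_desc, page=2, page_size=1))  # from 1: same body as the curl example
```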

6.3 Logical operators

When several search keywords are given, Elastic treats them as an OR.

$ curl -H "Content-Type: application/json" 'localhost:9200/accounts/person/_search'  -d '
{"query" : { "match" : { "desc" : "软件 系统" }}
}'

The query above searches for 软件 OR 系统.

To require all of several keywords (an AND search), use a bool query.

$ curl -H "Content-Type: application/json" 'localhost:9200/accounts/person/_search'  -d '
{"query": {"bool": {"must": [{ "match": { "desc": "软件" } },{ "match": { "desc": "系统" } }]}}
}'
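
The bool/must body grows by one match clause per keyword, so it is easy to build from a list. The sketch below does that and, as an alternative not mentioned in the original notes, also builds a single match query with `"operator": "and"`, which the match query accepts for the same purpose. Both function names are hypothetical.

```python
def match_all_terms(field, keywords):
    """bool/must query: every keyword has to match the field."""
    return {
        "query": {
            "bool": {"must": [{"match": {field: word}} for word in keywords]}
        }
    }


def match_with_operator(field, text, operator="and"):
    """Single match query; with operator "and" all analyzed terms must match."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


print(match_all_terms("desc", ["软件", "系统"]))
print(match_with_operator("desc", "软件 系统"))
```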

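6.4 More complex query examples

  1. Find records whose timestamp is later than a given time and whose shopId is 100000002 or 100000006. In SQL:
select * from shopsOrder where timestamp>1523671189000 and shopid in ("100000002","100000006")
  • In ES the query looks like this (gte means >=; use gt for a strict >):
POST http://192.168.0.1:9200/shopsinfo/shopsOrder/_search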
{"size":20,"query":{"bool":{"must":[{"range":{"timestamp":{"gte":1523671189000}}},{"terms":{"shopid":["100000002","100000006"]}}]}}
}
  2. For statistics, ES takes an aggs parameter, short for Aggregations. Continuing the previous query, computing the total amount of the matching records corresponds to this SQL:
select sum(amount) query_amount from shopsOrder where timestamp>1523671189000 and shopid in ("100000002","100000006")
  • In ES:
{"aggs":{"query_amount":{"sum":{"field":"amount"}}},"query":{"bool":{"must":[{"range":{"timestamp":{"gte":1523671189000}}},{"terms":{"shopid":["100000002","100000006"]}}]}}
}
  3. Grouping by day and summing per group looks like this in SQL:
select createdate, sum(amount) query_amount from shopsOrder where timestamp>1523671189000 and shopid in ("100000002","100000006")
group by createdate order by createdate
  • In ES:
{"size":0,"aggs":{"orderDate":{"terms":{"field":"createdate","order":{"_term":"asc"}},"aggs":{"query_amount":{"sum":{"field":"amount"}}}}},"query":{"bool":{"must":[{"range":{"timestamp":{"gte":1523671189000}}},{"terms":{"shopid":["100000002","100000006"]}}]}}
}
  • The result:
......
"aggregations": {"orderDate": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 99,"buckets": [......{"key": "20180415","doc_count": 8,"query_amount": {"value": 31632}},{"key": "20180417","doc_count": 3,"query_amount": {"value": 21401}},{"key": "20180418","doc_count": 2,"query_amount": {"value": 2333}}......]}
}
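  • buckets holds the result: key is the createdate value the group was formed on, doc_count is the equivalent of count, and query_amount is the sum. The request sets size: 0 because the individual records (hits) are not needed.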
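
To make the bucket layout concrete, here is a short sketch that turns such a response into rows like the ones the SQL GROUP BY would return. `bucket_sums` is a hypothetical helper, and the sample contains only the three buckets visible in the excerpt above; the elided parts of the real response are left out.

```python
def bucket_sums(response, agg_name, metric_name):
    """Return (key, doc_count, metric value) for each bucket of a terms aggregation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"], b[metric_name]["value"]) for b in buckets]


# Only the buckets visible in the excerpt above.
sample = {
    "aggregations": {
        "orderDate": {
            "buckets": [
                {"key": "20180415", "doc_count": 8, "query_amount": {"value": 31632}},
                {"key": "20180417", "doc_count": 3, "query_amount": {"value": 21401}},
                {"key": "20180418", "doc_count": 2, "query_amount": {"value": 2333}},
            ]
        }
    }
}
for day, count, total in bucket_sums(sample, "orderDate", "query_amount"):
    print(day, count, total)
```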
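  4. Finally, a more complex one: (1) compute the overall total; (2) group by payment method (paymentType) and sum amount per group, then within each payment method group by day and sum amount per day.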
 {"size":0,"aggs":{"amount":{"sum":{"field":"amount"}},"paymenttype":{"terms":{"field":"paymentType"},"aggs":{"query_amount":{"sum":{"field":"amount"}},"payment_date":{"terms":{"field":"createdate"},"aggs":{"query_amount":{"sum":{"field":"amount"}}}}}}},"query":{"bool":{"must":[{"range":{"timestamp":{"gte":1523671189000}}},{"terms":{"shopid":["100000002","100000006"]}}]}}
}
  • The result:
......
"amount": {"value": 684854
},
"paymenttype":{......"buckets": [{"key": "wechatpay","doc_count": 73,"amount": {"value": 351142},"payment_date": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 25,"buckets": [......{"key": "20180415","doc_count": 6,"amount": {"value": 29032}},{"key": "20180425","doc_count": 6,"amount": {"value": 21592}}......]}},{"key": "alipay","doc_count": 67,"amount": {"value": 333712},"payment_date": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 23,"buckets": [......{"key": "20180506","doc_count": 8,"amount": {"value": 38280}},{"key": "20180426","doc_count": 6,"amount": {"value": 41052}}......]}}]
}
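
Nested aggregations come back as buckets inside buckets, so reading them takes one loop per level. The sketch below flattens the two levels into (payment type, day, amount) rows. `flatten_nested` is a hypothetical helper; the sample keeps only the buckets visible in the excerpt above, and the metric name is a parameter because the excerpt labels the per-day sum `amount`.

```python
def flatten_nested(response, outer, inner, metric):
    """Flatten a two-level terms aggregation into (outer key, inner key, value) rows."""
    rows = []
    for outer_bucket in response["aggregations"][outer]["buckets"]:
        for inner_bucket in outer_bucket[inner]["buckets"]:
            rows.append(
                (outer_bucket["key"], inner_bucket["key"], inner_bucket[metric]["value"])
            )
    return rows


# Only the buckets visible in the excerpt above.
sample = {
    "aggregations": {
        "amount": {"value": 684854},
        "paymenttype": {
            "buckets": [
                {"key": "wechatpay", "doc_count": 73, "amount": {"value": 351142},
                 "payment_date": {"buckets": [
                     {"key": "20180415", "doc_count": 6, "amount": {"value": 29032}},
                     {"key": "20180425", "doc_count": 6, "amount": {"value": 21592}},
                 ]}},
                {"key": "alipay", "doc_count": 67, "amount": {"value": 333712},
                 "payment_date": {"buckets": [
                     {"key": "20180506", "doc_count": 8, "amount": {"value": 38280}},
                     {"key": "20180426", "doc_count": 6, "amount": {"value": 41052}},
                 ]}},
            ]
        },
    }
}
for row in flatten_nested(sample, "paymenttype", "payment_date", "amount"):
    print(row)
```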

VII. Calling ES from Python: a simple Chinese-analyzed search

# -*- coding:UTF-8 -*-
# Written against the elasticsearch-py 6.x client (its methods take doc_type).
from elasticsearch import Elasticsearch
import pymysql
from utils.util_tools import *  # project-local helpers, not shown in this post
from conf import tmp_dir  # project-local setting, not shown in this post
import os
from elasticsearch.helpers import bulk


class ElasticObj:
    def __init__(self, index_name, index_type, ip="127.0.0.1"):
        '''
        :param index_name: index name
        :param index_type: index type
        '''
        self.index_name = index_name
        self.index_type = index_type
        self.es = Elasticsearch([ip])
        # With a username and password:
        # self.es = Elasticsearch([ip],http_auth=('elastic', 'password'),port=9200)

    def create_index(self, index_mappings):
        # Create the index if it does not exist yet
        if self.es.indices.exists(index=self.index_name) is not True:
            res = self.es.indices.create(index=self.index_name, body=index_mappings, ignore=400)
            print(res)

    def delete_index(self):
        result = self.es.indices.delete(index=self.index_name)
        print(result)

    def bulk_index_data(self, in_list):
        '''
        Store a batch of records in ES with bulk
        :return:
        '''
        ACTIONS = []
        i = 1
        for line in in_list:
            action = {
                "_index": self.index_name,
                "_type": self.index_type,
                "_id": i,  # _id may be omitted, in which case ES generates one
                "_source": {
                    "date": line['date'],
                    "source": line['source'],
                    "link": line['link'],
                    "keyword": line['keyword'],
                    "title": line['title']
                }
            }
            i += 1
            ACTIONS.append(action)
        # Send the whole batch
        success, _ = bulk(self.es, ACTIONS, index=self.index_name, raise_on_error=True)
        print('Performed %d actions' % success)

    def build_index_doc(self, in_file, body_data_keys, extra_val, sep=None):
        # Index one document per line of a delimited text file
        try:
            i = 0
            with open(in_file, encoding='utf-8') as fin:
                for line in fin:
                    row = line.strip().split(sep)
                    body_data = {}
                    for idx, col in enumerate(row):
                        body_data[body_data_keys[idx]] = col
                    # The last key is filled from extra_val, not from the file
                    body_data[body_data_keys[-1]] = extra_val
                    self.es.index(index=self.index_name, doc_type=self.index_type, body=body_data)
                    i += 1
        except Exception as e:
            print(e)

    def build_index_db(self, body_data_keys):
        # Keyword arguments work with both old and new PyMySQL releases
        db = pymysql.connect(host="localhost", user="root", password="root",
                             database="gongan", charset='utf8')
        cursor = db.cursor()
        sql = "SELECT * FROM sheet1"
        try:
            # Run the SQL query with execute()
            cursor.execute(sql)
            # Fetch all rows
            results = cursor.fetchall()
            new = []
            i = -1
            for row in results:
                i += 1
                body_data = {}
                for idx, col in enumerate(row):
                    body_data[body_data_keys[idx]] = col
                self.es.index(index='gongan', doc_type='test-type', body=body_data)
        except Exception:
            print("Error: unable to fetch data")
        db.close()

    def delete_index_data(self, id):
        res = self.es.delete(index=self.index_name, doc_type=self.index_type, id=id)
        print(res)

    def get_data_by_id(self, id):
        _searched = self.es.get(index=self.index_name, doc_type=self.index_type, id=id)
        print(_searched['_source'])
        print('--' * 50)

    def get_data_by_body(self, doc):
        _searched = self.es.search(index=self.index_name, doc_type=self.index_type, body=doc)
        for hit in _searched['hits']['hits']:
            # print hit['_source']
            print(hit['_source']['date'], hit['_source']['source'], hit['_source']['link'],
                  hit['_source']['keyword'], hit['_source']['title'])

    # @excute_time_log
    def get_data_by_para(self, search_name, corpus_type, topk=10):
        body_data = {
            "size": topk,
            "query": {
                "bool": {
                    "must": [
                        {"match": {"standard_name": search_name}},
                        {"term": {"corpus_type": corpus_type}}
                    ]
                }
            }
        }
        _searched = self.es.search(index=self.index_name, doc_type=self.index_type, body=body_data)
        candidates = []
        for item in _searched["hits"]["hits"]:
            candidates.append(item["_source"]["standard_name"])
        return candidates


if __name__ == '__main__':
    # """
    _index_name = "medical_corpus"
    _index_type = "doc_type_test"
    _index_mappings = {
        "mappings": {
            _index_type: {
                "properties": {
                    # "id": {
                    #     "type": "long",
                    #     "index": "false"
                    # },
                    "icd_code": {"type": "keyword"},
                    "standard_name": {
                        "type": "text",
                        # "analyzer": "standard",
                        # "search_analyzer": "standard"
                    },
                    "corpus_type": {"type": "keyword"}
                }
            }
        }
    }
    obj = ElasticObj(_index_name, _index_type)
    obj = ElasticObj("test", "test_type")  # overrides the object created above
    # obj.delete_index()
    # obj.create_index(_index_mappings)
    corpus_names = ["drug.txt", "treatment.txt", "material.txt"]
    body_data_keys = ['icd_code', 'standard_name', 'corpus_type']
    sep = None
    for corpus_name in corpus_names:
        in_file = os.path.join(tmp_dir, "corpus_bm25/" + corpus_name)
        obj.build_index_doc(in_file, body_data_keys, corpus_name.split(".")[0], sep)
    # obj.get_data_by_id(1)
    # obj.delete_index_data(1)
    # obj.get_data_by_id(2)
    # search_name = "组织钳"
    # corpus_type = "material"
    # print(obj.get_data_by_para(search_name, corpus_type, topk=20))
    doc0 = {'query': {'match_all': {}}}
    doc = {"query": {"match": {"title": "电视"}}}
    doc1 = {"query": {"multi_match": {"query": "网", "fields": ["source", "title"]}}}
    doc = {
        "size": 10,
        "query": {
            "bool": {
                "must": [
                    {"term": {"title": "人民"}},
                    {"terms": {"source": ["慧聪网", "人民电视"]}}
                ]
            }
        }
    }
    obj.get_data_by_body(doc0)
    # """

# For testing:
""" in_list
in_list = [
    {"date": "2017-09-13", "source": "慧聪网",
     "link": "http://info.broadcast.hc360.com/2017/09/130859749974.shtml",
     "keyword": "电视", "title": "付费 电视 行业面临的转型和挑战"},
    {"date": "2017-09-13", "source": "中国文明网",
     "link": "http://www.wenming.cn/xj_pd/yw/201709/t20170913_4421323.shtml",
     "keyword": "电视", "title": "电视 专题片《巡视利剑》广获好评:铁腕反腐凝聚党心民心"},
    {"date": "2017-09-13", "source": "人民电视",
     "link": "http://tv.people.com.cn/BIG5/n1/2017/0913/c67816-29533981.html",
     "keyword": "电视", "title": "中国第21批赴刚果(金)维和部隊启程--人民 电视 --人民网"},
    {"date": "2017-09-13", "source": "站长之家",
     "link": "http://www.chinaz.com/news/2017/0913/804263.shtml",
     "keyword": "电视", "title": "电视 盒子 哪个牌子好? 吐血奉献三大选购秘笈"}
]
# obj.bulk_index_data(in_list)
"""

""" mapping
"serial": {
    "type": "keyword",  # keyword fields are not analyzed; text fields are
    "index": "false"  # do not index this field
},
# tags can hold a JSON object; access it as tags.content
"tags": {
    "type": "object",
    "properties": {
        "content": {"type": "keyword", "index": True},
        "dominant_color_name": {"type": "keyword", "index": True},
        "skill": {"type": "keyword", "index": True},
    }
},
"status": {
    "type": "long",
    "index": True
},
"createTime": {
    "type": "date",
    "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"updateTime": {
    "type": "date",
    "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
}
"""
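
bulk_index_data builds the full list of actions in memory before calling bulk. Since elasticsearch.helpers.bulk accepts any iterable of actions, a generator can produce them one at a time instead. The following is a minimal sketch of that idea, not the author's code: `iter_actions` is a hypothetical name, and the action layout follows the `_index`, `_type`, `_id`, `_source` keys used in the class above.

```python
def iter_actions(index_name, doc_type, rows, start_id=1):
    """Yield one bulk action per row, numbering _id from start_id."""
    for doc_id, row in enumerate(rows, start=start_id):
        yield {
            "_index": index_name,
            "_type": doc_type,
            "_id": doc_id,
            "_source": dict(row),  # copy so later changes to row do not leak in
        }


rows = [
    {"date": "2017-09-13", "source": "慧聪网", "keyword": "电视"},
    {"date": "2017-09-13", "source": "站长之家", "keyword": "电视"},
]
for action in iter_actions("news", "doc", rows):
    print(action["_id"], action["_source"]["source"])
```

With a live client this would be used as `bulk(es, iter_actions("news", "doc", rows))`, where `es` is an Elasticsearch client instance such as the one created in the class above.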
