Elasticsearch 6.x CRUD Operations: A Summary, with a Python Chinese Full-Text Search Example
Table of Contents
- Error Summary
- 1. Elasticsearch cannot be started as root, since it can receive and execute scripts
- 2. Not accessible from outside the host
- 3. Fixing "max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]"
- 4. sysctl: setting key "vm.max_map_count": Read-only file system
- 5. the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
- I. Installation and Plugins
- 1. Download
- 2. Unpack
- 3. Run
- 4. Edit the profile (environment variables)
- 5. Allow external access
- 6. Run in the background
- 7. Troubleshooting
- 8. Plugins
- 8.1 Kibana
- 8.2 The ik analyzer
- II. Basic Concepts
- III. Creating and Deleting an Index
- IV. Chinese Analyzer Setup
- V. Data Operations
- VI. Queries
- 6.1 Return all records
- 6.2 Full-text search
- 6.3 Boolean logic
- 6.4 Complex query examples
- VII. A simple Python client with Chinese-tokenized search
I realized I have not updated this blog in quite a while. My internship company uses ES, so I have spent the past couple of days studying it, and this post records the learning process for later reference. A follow-up may summarize ES internals and compare it with Lucene. This post mainly records the basic create/read/update/delete operations, plus an example of calling ES from Python. A new year of work has begun; keep pushing, and here's hoping for a good autumn recruiting season!
Error Summary
1. Elasticsearch cannot be started as root, since it can receive and execute scripts
useradd -m work   # create a non-root user
su work           # and switch to it before starting Elasticsearch
2. Not accessible from outside the host
vim elasticsearch.yml
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# For more information, consult the network module documentation.
#
3. Fixing "max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]"
# edit the file below, which holds kernel parameters
vi /etc/sysctl.conf   # add the following line:
vm.max_map_count=655360
Save, then run:
sysctl -p
# -p loads parameters from the given file; /etc/sysctl.conf is used if none is given
You may then hit this error:
4. sysctl: setting key "vm.max_map_count": Read-only file system
This happens because Docker base images are extremely stripped down. There is not even an init process, so the step that normally applies kernel parameters at OS boot (sysctl -p) is skipped, and those parameters keep their kernel defaults. One fix is to start the container with --privileged so it gains permission to change kernel parameters.
Here I chose instead to modify the host's own configuration file and restart the image, which also solves the problem. Exit the container and return to the host.
vm.max_map_count can also be changed on the command line, but that change is lost when the machine reboots, which is why editing the configuration file is preferred.
Command-line version:
sudo sysctl -w vm.max_map_count=655360
5. the default discovery settings are unsuitable for production use; at least one of [discovery.seed_hosts, discovery.seed_providers, cluster.initial_master_nodes] must be configured
In elasticsearch.yml, remove the # in front of node.name: node-1 and set cluster.initial_master_nodes: ["node-1"]. It must be set exactly like this, with the two names matching; I lost a lot of time here because I had not set it this way.
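For reference, the relevant elasticsearch.yml lines end up looking like this (node-1 is only the sample name from the default config; whatever name you pick, the value in cluster.initial_master_nodes must match node.name):

```yaml
node.name: node-1
cluster.initial_master_nodes: ["node-1"]
```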
I. Installation and Plugins
1. Download
Download the tarball from the official site https://www.elastic.co/downloads/elasticsearch (this article uses elasticsearch-6.4.3.tar.gz) and upload it to the server.
2. Unpack
tar zxvf elasticsearch-6.4.3.tar.gz -C /usr/local
3. Run
cd /usr/local/elasticsearch-6.4.3
./bin/elasticsearch
4. Edit the profile (environment variables)
vim ~/.bash_profile
export PATH=$PATH:/usr/local/elasticsearch-6.4.3/bin
# after this, the server can be started with just `elasticsearch`
5. Allow external access
cd /usr/local/elasticsearch-6.4.3/config
vim elasticsearch.yml
# add this setting:
network.host: 0.0.0.0
Visit http://<server IP>:9200/ in a browser; a response like the following means the installation succeeded.
{
  "name": "mQQdX2p",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "em-fVPdIRLeG4BZAkdRarA",
  "version": {
    "number": "6.4.3",
    "build_flavor": "default",
    "build_type": "tar",
    "build_hash": "fe40335",
    "build_date": "2018-10-30T23:17:19.084789Z",
    "build_snapshot": false,
    "lucene_version": "7.4.0",
    "minimum_wire_compatibility_version": "5.6.0",
    "minimum_index_compatibility_version": "5.0.0"
  },
  "tagline": "You Know, for Search"
}
6. Run in the background
elasticsearch -d
7. Troubleshooting
ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-11-23T10:00:28,057][INFO ][o.e.n.Node ] [mQQdX2p] stopping ...
[2018-11-23T10:00:28,100][INFO ][o.e.n.Node ] [mQQdX2p] stopped
[2018-11-23T10:00:28,101][INFO ][o.e.n.Node ] [mQQdX2p] closing ...
[2018-11-23T10:00:28,111][INFO ][o.e.n.Node ] [mQQdX2p] closed
Fix:
sudo su
vim /etc/sysctl.conf
# add this line:
vm.max_map_count=655360
# apply the change:
sysctl -p
8. Plugins
8.1 Kibana
Note that Kibana's major.minor version should match Elasticsearch's, so for ES 6.4.3 you would normally download kibana-6.4.3 rather than the 5.6.9 shown below.
$ curl -O https://artifacts.elastic.co/downloads/kibana/kibana-5.6.9-linux-x86_64.tar.gz
$ sha1sum kibana-5.6.9-linux-x86_64.tar.gz
$ tar -xzf kibana-5.6.9-linux-x86_64.tar.gz
$ cd kibana-5.6.9-linux-x86_64/
# start
$ ./bin/kibana
$ nohup ./bin/kibana &   # run in the background
# stop
$ ps -ef | grep node     # find kibana's pid (./bin/../node/bin/node --no-warnings ./bin/../src/cli)
$ kill pid
8.2 The ik analyzer
https://github.com/medcl/elasticsearch-analysis-ik
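That repo's README documents installation through the standard plugin tool. For ES 6.4.3 the command would be roughly the following; the release URL here just follows the project's v<version> naming convention, so verify the exact asset name on the releases page, and restart Elasticsearch afterwards so the plugin is loaded:

```
cd /usr/local/elasticsearch-6.4.3
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.4.3/elasticsearch-analysis-ik-6.4.3.zip
```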
II. Basic Concepts
Background reading (links in the original, in Chinese): "Getting Started with the Full-Text Search Engine Elasticsearch" and "Elasticsearch: Introduction and How Indexing Works".
2.1 Index
The following command lists every Index on the current node:
curl -X GET 'http://localhost:9200/_cat/indices?v'
2.2 Type
The following command lists the Types contained in each Index:
$ curl 'localhost:9200/_mapping?pretty=true'
III. Creating and Deleting an Index
# create an index named weather
curl -X PUT 'localhost:9200/weather'
# delete it again
curl -X DELETE 'localhost:9200/weather'
IV. Chinese Analyzer Setup
$ curl -H "Content-Type: application/json" -X PUT 'localhost:9200/accounts' -d '
{
  "mappings": {
    "person": {
      "properties": {
        "user":  { "type": "text", "analyzer": "smartcn", "search_analyzer": "smartcn" },
        "title": { "type": "text", "analyzer": "smartcn", "search_analyzer": "smartcn" },
        "desc":  { "type": "text", "analyzer": "smartcn", "search_analyzer": "smartcn" }
      }
    }
  }
}'
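One caveat: smartcn does not ship with Elasticsearch. The analysis-smartcn plugin has to be installed first with the standard plugin-install command (restart ES afterwards so it is picked up):

```
cd /usr/local/elasticsearch-6.4.3
./bin/elasticsearch-plugin install analysis-smartcn
```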
V. Data Operations
5.1 Adding, deleting, and updating records
# add a record under an explicit id (PUT)
$ curl -H "Content-Type: application/json" -X PUT 'localhost:9200/accounts/person/1' -d '
{
  "user": "张三",
  "title": "工程师",
  "desc": "数据库管理"
}'
# add a record with an auto-generated id (POST, no id in the URL)
$ curl -H "Content-Type: application/json" -X POST 'localhost:9200/accounts/person' -d '
{
  "user": "李四",
  "title": "工程师",
  "desc": "系统管理"
}'
# delete record 1
curl -X DELETE 'localhost:9200/accounts/person/1'
# update record 1 by PUTting a new document to the same id
curl -H "Content-Type: application/json" -X PUT 'localhost:9200/accounts/person/1' -d '
{
  "user" : "张三",
  "title" : "工程师",
  "desc" : "数据库管理,软件开发"
}'
VI. Queries
6.1 Return all records
A GET request straight to /Index/Type/_search returns all records:
curl 'localhost:9200/accounts/person/_search'
6.2 Full-text search
Elastic's queries are unusual: they use Elastic's own query syntax and require the GET request to carry a data body.
$ curl -H "Content-Type: application/json" 'localhost:9200/accounts/person/_search' -d '
{
  "query" : { "match" : { "desc" : "软件" } },
  "from": 1,
  "size": 1
}'
size specifies that each request returns only one result; the from field sets the offset to start from (here, skip the first hit).
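The from/size arithmetic is easy to get wrong when paging through results. A minimal sketch (the helper name is mine, not part of any ES API; page numbers are assumed to be 1-based):

```python
# Map a 1-based page number to the "from"/"size" parameters used above.
def page_to_from_size(page, page_size):
    """Translate a 1-based page number into ES 'from'/'size' parameters."""
    return {"from": (page - 1) * page_size, "size": page_size}

# page 2 with one hit per page reproduces the curl example's {"from": 1, "size": 1}
body = {"query": {"match": {"desc": "软件"}}}
body.update(page_to_from_size(page=2, page_size=1))
```

The merged body dict can be passed as-is to the _search endpoint.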
6.3 Boolean logic
If there are several search keywords, Elastic treats them as an OR relation.
$ curl -H "Content-Type: application/json" 'localhost:9200/accounts/person/_search' -d '
{
  "query" : { "match" : { "desc" : "软件 系统" } }
}'
The query above searches for 软件 OR 系统.
To AND multiple keywords together, you must use a bool query:
$ curl -H "Content-Type: application/json" 'localhost:9200/accounts/person/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        { "match": { "desc": "软件" } },
        { "match": { "desc": "系统" } }
      ]
    }
  }
}'
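Building these bool/must bodies by hand gets tedious from application code. A small builder, my own sketch rather than an official client API, that turns a keyword list into the AND form above:

```python
# Build a bool/must body: every keyword must match `field` (AND semantics).
def and_query(field, keywords):
    """Return an ES query body requiring all keywords to match `field`."""
    return {"query": {"bool": {"must": [{"match": {field: kw}} for kw in keywords]}}}

body = and_query("desc", ["软件", "系统"])
```

The resulting dict is exactly the JSON body of the curl example above.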
6.4 Complex query examples
- Selecting orders whose timestamp is greater than some value and whose shopId is 100000002 or 100000006 looks like this in SQL:
select * from shopsOrder where timestamp>1523671189000 and shopid in ("100000002","100000006")
- In ES the same query becomes:
POST http://192.168.0.1:9200/shopsinfo/shopsOrder/_search
{
  "size": 20,
  "query": {
    "bool": {
      "must": [
        { "range": { "timestamp": { "gte": 1523671189000 } } },
        { "terms": { "shopid": ["100000002", "100000006"] } }
      ]
    }
  }
}
- For statistics, ES takes an aggs parameter (short for Aggregations). Continuing the query above, computing the total amount of the matching orders corresponds to this SQL:
select sum(amount) query_amount from shopsOrder where timestamp>1523671189000 and shopid in ("100000002","100000006")
- In ES it is queried like this:
{
  "aggs": { "query_amount": { "sum": { "field": "amount" } } },
  "query": {
    "bool": {
      "must": [
        { "range": { "timestamp": { "gte": 1523671189000 } } },
        { "terms": { "shopid": ["100000002", "100000006"] } }
      ]
    }
  }
}
- Grouping the statistics by day is expressed in SQL as:
select createdate,sum(amount) query_amount from shopsOrder where timestamp>1523671189000 and shopid in ("100000002","100000006")
group by createdate order by createdate
- In ES it looks like this:
{
  "size": 0,
  "aggs": {
    "orderDate": {
      "terms": { "field": "createdate", "order": { "_term": "asc" } },
      "aggs": { "query_amount": { "sum": { "field": "amount" } } }
    }
  },
  "query": {
    "bool": {
      "must": [
        { "range": { "timestamp": { "gte": 1523671189000 } } },
        { "terms": { "shopid": ["100000002", "100000006"] } }
      ]
    }
  }
}
- The query result:
......
"aggregations": {
  "orderDate": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 99,
    "buckets": [
      ......
      { "key": "20180415", "doc_count": 8, "query_amount": { "value": 31632 } },
      { "key": "20180417", "doc_count": 3, "query_amount": { "value": 21401 } },
      { "key": "20180418", "doc_count": 2, "query_amount": { "value": 2333 } }
      ......
    ]
  }
}
- buckets holds the results: key is the createdate value of each group, doc_count is analogous to COUNT, and query_amount is the SUM. The size: 0 in the request is there because I do not need the individual records (the hits), so I pass 0.
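Extracting the per-day sums from such a response is a one-liner in Python; the sample below is trimmed from the result shown above:

```python
# A trimmed aggregation response, matching the buckets above.
response = {
    "aggregations": {"orderDate": {"buckets": [
        {"key": "20180415", "doc_count": 8, "query_amount": {"value": 31632}},
        {"key": "20180417", "doc_count": 3, "query_amount": {"value": 21401}},
        {"key": "20180418", "doc_count": 2, "query_amount": {"value": 2333}},
    ]}}
}

# Collapse the buckets into {createdate: daily_sum}.
daily_totals = {b["key"]: b["query_amount"]["value"]
                for b in response["aggregations"]["orderDate"]["buckets"]}
print(daily_totals)  # {'20180415': 31632, '20180417': 21401, '20180418': 2333}
```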
- Finally, something more complex: 1) compute the overall total; 2) group amount totals by paymentType, and within each payment type group again by day:
{
  "size": 0,
  "aggs": {
    "amount": { "sum": { "field": "amount" } },
    "paymenttype": {
      "terms": { "field": "paymentType" },
      "aggs": {
        "query_amount": { "sum": { "field": "amount" } },
        "payment_date": {
          "terms": { "field": "createdate" },
          "aggs": { "query_amount": { "sum": { "field": "amount" } } }
        }
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        { "range": { "timestamp": { "gte": 1523671189000 } } },
        { "terms": { "shopid": ["100000002", "100000006"] } }
      ]
    }
  }
}
- The result:
......
"amount": { "value": 684854 },
"paymenttype": {
  ......
  "buckets": [
    {
      "key": "wechatpay",
      "doc_count": 73,
      "amount": { "value": 351142 },
      "payment_date": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 25,
        "buckets": [
          ......
          { "key": "20180415", "doc_count": 6, "amount": { "value": 29032 } },
          { "key": "20180425", "doc_count": 6, "amount": { "value": 21592 } }
          ......
        ]
      }
    },
    {
      "key": "alipay",
      "doc_count": 67,
      "amount": { "value": 333712 },
      "payment_date": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 23,
        "buckets": [
          ......
          { "key": "20180506", "doc_count": 8, "amount": { "value": 38280 } },
          { "key": "20180426", "doc_count": 6, "amount": { "value": 41052 } }
          ......
        ]
      }
    }
  ]
}
VII. A simple Python client with Chinese-tokenized search
# -*- coding:UTF-8 -*-
from elasticsearch import Elasticsearch
import pymysql
from utils.util_tools import *
from conf import tmp_dir
import os
from elasticsearch.helpers import bulk


class ElasticObj:
    def __init__(self, index_name, index_type, ip="127.0.0.1"):
        '''
        :param index_name: index name
        :param index_type: index (doc) type
        '''
        self.index_name = index_name
        self.index_type = index_type
        # with username and password:
        # self.es = Elasticsearch([ip], http_auth=('elastic', 'password'), port=9200)
        self.es = Elasticsearch([ip])

    def create_index(self, index_mappings):
        # create the index if it does not exist yet
        if self.es.indices.exists(index=self.index_name) is not True:
            res = self.es.indices.create(index=self.index_name, body=index_mappings, ignore=400)
            print(res)

    def delete_index(self):
        result = self.es.indices.delete(index=self.index_name)
        print(result)

    def bulk_index_data(self, in_list):
        '''
        Store a batch of records in ES with the bulk helper.
        '''
        ACTIONS = []
        i = 1
        for line in in_list:
            action = {
                "_index": self.index_name,
                "_type": self.index_type,
                "_id": i,  # _id can also be left out and auto-generated
                "_source": {
                    "date": line['date'],
                    "source": line['source'],
                    "link": line['link'],
                    "keyword": line['keyword'],
                    "title": line['title'],
                }
            }
            i += 1
            ACTIONS.append(action)
        # send everything in one bulk request
        success, _ = bulk(self.es, ACTIONS, index=self.index_name, raise_on_error=True)
        print('Performed %d actions' % success)

    def build_index_doc(self, in_file, body_data_keys, extra_val, sep=None):
        # index a delimited text file line by line
        try:
            i = 0
            with open(in_file, encoding='utf-8') as fin:
                for line in fin:
                    row = line.strip().split(sep)
                    body_data = {}
                    for idx, col in enumerate(row):
                        body_data[body_data_keys[idx]] = col
                    body_data[body_data_keys[-1]] = extra_val
                    self.es.index(index=self.index_name, doc_type=self.index_type, body=body_data)
                    i += 1
        except Exception as e:
            print(e)

    def build_index_db(self, body_data_keys):
        # index rows straight out of a MySQL table
        db = pymysql.connect("localhost", "root", "root", "gongan", charset='utf8')
        cursor = db.cursor()
        sql = "SELECT * FROM sheet1"
        try:
            cursor.execute(sql)
            results = cursor.fetchall()
            for row in results:
                body_data = {}
                for idx, col in enumerate(row):
                    body_data[body_data_keys[idx]] = col
                self.es.index(index='gongan', doc_type='test-type', body=body_data)
        except:
            print("Error: unable to fetch data")
        db.close()

    def delete_index_data(self, id):
        res = self.es.delete(index=self.index_name, doc_type=self.index_type, id=id)
        print(res)

    def get_data_by_id(self, id):
        _searched = self.es.get(index=self.index_name, doc_type=self.index_type, id=id)
        print(_searched['_source'])
        print('--' * 50)

    def get_data_by_body(self, doc):
        _searched = self.es.search(index=self.index_name, doc_type=self.index_type, body=doc)
        for hit in _searched['hits']['hits']:
            print(hit['_source']['date'], hit['_source']['source'], hit['_source']['link'],
                  hit['_source']['keyword'], hit['_source']['title'])

    # @excute_time_log
    def get_data_by_para(self, search_name, corpus_type, topk=10):
        body_data = {
            "size": topk,
            "query": {
                "bool": {
                    "must": [
                        {"match": {"standard_name": search_name}},
                        {"term": {"corpus_type": corpus_type}},
                    ]
                }
            }
        }
        _searched = self.es.search(index=self.index_name, doc_type=self.index_type, body=body_data)
        candidates = []
        for item in _searched["hits"]["hits"]:
            candidates.append(item["_source"]["standard_name"])
        return candidates


if __name__ == '__main__':
    _index_name = "medical_corpus"
    _index_type = "doc_type_test"
    _index_mappings = {
        "mappings": {
            _index_type: {
                "properties": {
                    # "id": {"type": "long", "index": "false"},
                    "icd_code": {"type": "keyword"},
                    "standard_name": {
                        "type": "text",
                        # "analyzer": "standard",
                        # "search_analyzer": "standard"
                    },
                    "corpus_type": {"type": "keyword"},
                }
            }
        }
    }
    obj = ElasticObj(_index_name, _index_type)
    obj = ElasticObj("test", "test_type")  # note: this overrides the line above
    # obj.delete_index()
    # obj.create_index(_index_mappings)
    corpus_names = ["drug.txt", "treatment.txt", "material.txt"]
    body_data_keys = ['icd_code', 'standard_name', 'corpus_type']
    sep = None
    for corpus_name in corpus_names:
        in_file = os.path.join(tmp_dir, "corpus_bm25/" + corpus_name)
        obj.build_index_doc(in_file, body_data_keys, corpus_name.split(".")[0], sep)
    # obj.get_data_by_id(1)
    # obj.delete_index_data(1)
    # obj.get_data_by_id(2)
    # search_name = "组织钳"
    # corpus_type = "material"
    # print(obj.get_data_by_para(search_name, corpus_type, topk=20))

    doc0 = {'query': {'match_all': {}}}
    doc = {"query": {"match": {"title": "电视"}}}
    doc1 = {"query": {"multi_match": {"query": "网", "fields": ["source", "title"]}}}
    doc = {
        "size": 10,
        "query": {
            "bool": {
                "must": [
                    {"term": {"title": "人民"}},
                    {"terms": {"source": ["慧聪网", "人民电视"]}},
                ]
            }
        }
    }
    obj.get_data_by_body(doc0)

    # sample data for bulk_index_data:
    """
    in_list = [
        {"date": "2017-09-13", "source": "慧聪网",
         "link": "http://info.broadcast.hc360.com/2017/09/130859749974.shtml",
         "keyword": "电视", "title": "付费 电视 行业面临的转型和挑战"},
        {"date": "2017-09-13", "source": "中国文明网",
         "link": "http://www.wenming.cn/xj_pd/yw/201709/t20170913_4421323.shtml",
         "keyword": "电视", "title": "电视 专题片《巡视利剑》广获好评:铁腕反腐凝聚党心民心"},
        {"date": "2017-09-13", "source": "人民电视",
         "link": "http://tv.people.com.cn/BIG5/n1/2017/0913/c67816-29533981.html",
         "keyword": "电视", "title": "中国第21批赴刚果(金)维和部隊启程--人民 电视 --人民网"},
        {"date": "2017-09-13", "source": "站长之家",
         "link": "http://www.chinaz.com/news/2017/0913/804263.shtml",
         "keyword": "电视", "title": "电视 盒子 哪个牌子好? 吐血奉献三大选购秘笈"},
    ]
    # obj.bulk_index_data(in_list)
    """

    # more mapping examples:
    """
    "serial": {
        "type": "keyword",  # keyword is not analyzed; text is
        "index": "false"    # do not build an index for this field
    },
    # tags can hold JSON; access it as tags.content
    "tags": {
        "type": "object",
        "properties": {
            "content": {"type": "keyword", "index": True},
            "dominant_color_name": {"type": "keyword", "index": True},
            "skill": {"type": "keyword", "index": True},
        }
    },
    "status": {"type": "long", "index": True},
    "createTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"},
    "updateTime": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"}
    """