Loading Hive data into Elasticsearch 5.6.8

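1. Create the index

number_of_shards: number of primary shards. number_of_replicas: number of replica copies per shard. index.refresh_interval: how often newly indexed documents become searchable; -1 disables automatic refresh, which speeds up bulk loading.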
curl -XPUT 'http://192.168.10.69:9200/zhuanlidata9' -d '{"settings":{"number_of_shards":64,"number_of_replicas":0,"index.refresh_interval": -1}}'
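
The index above is created with zero replicas and refresh disabled, so these settings would normally be changed back once the load in step 4 has finished. The following is a minimal sketch of that step, not part of the original procedure: the helper names `settings_payload` and `restore_settings` are hypothetical, and the target values (`1s`, one replica) are assumptions to adapt to the cluster.

```shell
#!/bin/bash
# Sketch only: ES_URL and INDEX mirror the values used in step 1;
# the function names and the 1s / 1-replica targets are assumptions.
ES_URL="${ES_URL:-http://192.168.10.69:9200}"
INDEX="${INDEX:-zhuanlidata9}"

# Build the JSON body for the update-settings request.
# $1 = refresh interval (e.g. 1s), $2 = number of replicas
settings_payload() {
  printf '{"index":{"refresh_interval":"%s","number_of_replicas":%d}}' "$1" "$2"
}

# Send the body to the index's _settings endpoint.
restore_settings() {
  curl -sS -XPUT "$ES_URL/$INDEX/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(settings_payload "$1" "$2")"
}
```

After sourcing the file, `restore_settings 1s 1` would re-enable a one-second refresh and add one replica per shard. The block itself only defines functions and sends nothing until one is called.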

2. Create the mapping

curl -X PUT '192.168.10.69:9200/zhuanlidata9/_mapping/zhuanliquanwen' -d '
{
"properties":{
"uuid":{"type":"keyword"},
"filename":{"type":"keyword"},
"lang":{"type":"keyword"},
"country":{"type":"keyword"},
"doc_number":{"type":"keyword"},
"kind":{"type":"keyword"},
"date":{"type":"keyword"},
"gazette_num":{"type":"keyword"},
"gazette_date":{"type":"keyword"},
"appl_type":{"type":"keyword"},
"appl_country":{"type":"keyword"},
"appl_doc_number":{"type":"keyword"},
"appl_date":{"type":"keyword"},
"text":{"type":"keyword"},
"invention_title":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_max_word"},
"assignees":{"type":"text"},
"assignees_address":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_max_word"},
"abstracts":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_max_word"},
"applicants":{"type":"text"},
"applicants_address":{"type":"text"},
"inventors":{"type":"text"},
"agents":{"type":"text"},
"agency":{"type":"text"},
"descriptions":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_max_word"},
"claims":{"type":"text","analyzer":"ik_max_word","search_analyzer":"ik_max_word"},
"cn_related_publication":{"type":"text"},
"cn_publication_referen":{"type":"text"},
"cn_related_document":{"type":"text"},
"priority_claims":{"type":"text"},
"reference":{"type":"text"},
"searcher":{"type":"text"}
}
}'
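
Several fields in the mapping rely on the `ik_max_word` analyzer, which is only available if the IK analysis plugin is installed on every node. One way to check this before loading is to call the `_analyze` API with a sample string. The sketch below assumes the same host as above; the helper names `analyze_payload` and `analyze_text` are hypothetical, and no JSON escaping is done, so the sample text must not contain double quotes or backslashes.

```shell
#!/bin/bash
# Sketch only: ES_URL mirrors the host used above; function names are assumptions.
ES_URL="${ES_URL:-http://192.168.10.69:9200}"

# Build an _analyze request body.
# $1 = analyzer name, $2 = sample text (no double quotes or backslashes;
# this helper does not escape JSON).
analyze_payload() {
  printf '{"analyzer":"%s","text":"%s"}' "$1" "$2"
}

# Ask the cluster to tokenize the sample text with the given analyzer.
analyze_text() {
  curl -sS -XPOST "$ES_URL/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_payload "$1" "$2")"
}
```

For example, `analyze_text ik_max_word '一种数据存储装置'` should return a list of tokens if the plugin is present, and an error response if the analyzer is unknown.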

3. Create the Hive table mapped to ES

-- On the 11.31 host, type "hive" and then run the following commands.
hive
-- Add the elasticsearch-hadoop jar
add jar /data/2/zly/elasticsearch-hadoop-5.6.8/dist/elasticsearch-hadoop-5.6.8.jar;
-- Create the mapped table
CREATE EXTERNAL TABLE test.zhuanlidata9 (
uuid string,
filename string ,
lang string ,
country string ,
doc_number string ,
kind string ,
date string ,
gazette_num string ,
gazette_date string ,
appl_type string ,
appl_country string ,
appl_doc_number string ,
appl_date string ,
text string ,
invention_title string ,
assignees string ,
assignees_address string ,
abstracts string ,
applicants string ,
applicants_address string ,
inventors string ,
agents string ,
agency string ,
descriptions string ,
claims string ,
cn_related_publication string ,
cn_publication_referen string ,
Cn_related_document string ,
priority_claims string ,
Reference string ,
Searcher string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES(
'es.resource' = 'zhuanlidata9/zhuanliquanwen',
'es.nodes'='192.168.10.69,192.168.10.70,192.168.10.71',
'es.port'='9200',
'es.mapping.id' = 'uuid',
'es.write.operation'='upsert'
);
-- Exit hive
exit;

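4. Load the data into the Hive table mapped to ES (on the 11.31 host, edit the loop count and the table name in /data/2/zly/test_hive_es.sh)

-- {1..18} sets the number of loop iterations; mapreduce.job.running.map.limit caps how many map tasks of the job run at the same time.
-- Because uuid is generated randomly for every row, each iteration writes a fresh copy of the source table into ES.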
#!/bin/bash
for i in {1..18}
do
hive -e "
add jar /data/2/zly/elasticsearch-hadoop-5.6.8/dist/elasticsearch-hadoop-5.6.8.jar;
set mapreduce.job.running.map.limit=50;
insert into test.zhuanlidata9
select
regexp_replace(reflect(\"java.util.UUID\", \"randomUUID\"), \"-\", \"\") uuid,
filename,
lang,
country,
doc_number,
kind,
case when appl_date like '2%' then appl_date else '' end date ,
gazette_num,
gazette_date,
appl_type,
appl_country,
appl_doc_number,
case when appl_date like '2%' then appl_date else '' end appl_date ,
text,
invention_title,
assignees,
assignees_address,
abstracts,
applicants,
applicants_address,
inventors,
agents,
agency,
descriptions,
claims,
cn_related_publication,
cn_publication_referen,
Cn_related_document,
priority_claims,
Reference,
Searcher
from report_statistics.zhuanli_zlqw;
"
done
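
Since the index was created with automatic refresh disabled, a document count taken right after the loop may lag behind what has been written. One way to verify the load is to force a refresh and then read `_count`. This is a minimal sketch under the same host and index assumptions as above; the helper names `extract_count` and `refresh_and_count` are hypothetical, and the count is extracted with `sed` rather than a JSON parser, which assumes the single-line response format of `_count`.

```shell
#!/bin/bash
# Sketch only: ES_URL and INDEX mirror the values used above;
# function names are assumptions.
ES_URL="${ES_URL:-http://192.168.10.69:9200}"
INDEX="${INDEX:-zhuanlidata9}"

# Pull the numeric "count" field out of a _count response such as
# {"count":123,"_shards":{...}}.
extract_count() {
  printf '%s' "$1" |
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Force a refresh so all written documents are visible, then print the count.
refresh_and_count() {
  curl -sS -XPOST "$ES_URL/$INDEX/_refresh" > /dev/null
  extract_count "$(curl -sS "$ES_URL/$INDEX/_count")"
}
```

Calling `refresh_and_count` prints the number of documents in the index. With 18 iterations and a random uuid per row, the expected value is roughly 18 times the row count of report_statistics.zhuanli_zlqw.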

Reposted from: https://www.cnblogs.com/oneby/p/9187776.html
