Step 1: Crawl the target data, deduplicate it, and store it in MySQL.

Spring Boot + SSM + scheduled (timer-based) crawling + Redis deduplication + MyBatis persistence.

For details, see spider09 — crawling target data, deduplicating, and storing to MySQL:

https://blog.csdn.net/qq_41946557/article/details/102573282


Step 2: Set up the ELK platform and import the MySQL data into ES.

Step 3: Develop a service provider (8001) that reads the data in ES and offers keyword search.

Let's start with Step 2: setting up the ELK platform and importing the MySQL data into ES.

In Step 1 we already crawled the data into MySQL.

What we need to do now is integrate Elasticsearch (import the MySQL data into ES).

0. Download Logstash:

https://artifacts.elastic.co/downloads/logstash/logstash-7.3.2.zip

  1. Unzip it to a directory of your choice
  2. Create a folder named mysql, next to bin, to hold the MySQL driver jar
  3. Copy the MySQL driver jar into the mysql folder created in step 2
  4. Create the config file logstash.conf under config or bin [that said, it did not work for me under config]
  5. The contents of logstash.conf are as follows

input {

  # To sync multiple tables, just add one jdbc block per table
  jdbc {
    # MySQL connection string; "spider" is the database name
    jdbc_connection_string => "jdbc:mysql://localhost:3306/spider?useUnicode=true&characterEncoding=utf8&serverTimezone=UTC"
    # Username and password
    jdbc_user => "root"
    jdbc_password => "123456"
    # Driver jar
    jdbc_driver_library => "D:/es/logstash-7.3.2/mysql/mysql-connector-java-5.1.6-bin.jar"
    # Driver class name
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_validate_connection => "true"
    # Whether to page through results
    jdbc_paging_enabled => "true"
    jdbc_page_size => "1000"
    # Time zone
    jdbc_default_timezone => "Asia/Shanghai"
    # SQL statement to execute directly
    statement => "select * from news where id >=:sql_last_value order by id asc"
    # Or: path + name of a SQL file to execute
    # statement_filepath => "/hw/elasticsearch/logstash-6.2.4/bin/test.sql"
    # Polling schedule, cron-style; fields (left to right): minute, hour, day of month, month, day of week.
    # All asterisks means run every minute.
    schedule => "* * * * *"
    # Run every 10 minutes instead:
    # schedule => "*/10 * * * *"
    # Whether to record the last run; if true, the last value of tracking_column is saved to last_run_metadata_path
    record_last_run => true
    # File that records the latest sync offset
    last_run_metadata_path => "D:/es/logstash-7.3.2/logs/last_id.txt"
    use_column_value => true
    # Type of the tracking column: "numeric" or "timestamp"
    tracking_column_type => "numeric"
    tracking_column => "id"
    clean_run => false
    # Index type
    # type => "jdbc"
  }

}

output {

  elasticsearch {
    # ES host and port
    hosts => ["http://localhost:9200"]
    # ES index name (your choice)
    index => "spider"
    # Document type
    document_type => "_doc"
    # Use the database id column as the document id
    document_id => "%{id}"
  }

  stdout {
    codec => json_lines
  }

}
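With `record_last_run` and `use_column_value` enabled, the incremental sync works roughly like this: on each run Logstash substitutes the saved offset for `:sql_last_value`, and after the run writes the largest `id` it saw to `last_run_metadata_path`. A sketch (the exact serialized contents and the id value 1024 are illustrative; they depend on your data and Logstash version):

```
# First run: no state yet, so :sql_last_value starts at 0
select * from news where id >= 0 order by id asc

# D:/es/logstash-7.3.2/logs/last_id.txt afterwards (YAML-serialized value):
--- 1024

# Next scheduled run resumes from there:
select * from news where id >= 1024 order by id asc
```

This is also why `clean_run => false` matters: setting it to true would discard the saved offset and re-sync from scratch.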

6. Start Logstash: change into the bin folder and run: logstash -f logstash.conf

[Note] Elasticsearch must be running before the import starts.

Start Kibana to verify:
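For example, in Kibana's Dev Tools you can check that the documents arrived (a sketch; the index name `spider` matches the Logstash config above):

```
GET spider/_count

GET spider/_search
{
  "query": { "match_all": {} },
  "size": 3
}
```

The `_count` response should report roughly as many documents as there are rows in the `news` table.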


Step 3: Develop the service provider (8001) that reads the data in ES and offers keyword search.

1. Modify pom.xml

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.3.2</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>7.3.2</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.3.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>

2. Modify the yml

# Elasticsearch configuration
elasticSearch:
  hostlist: 127.0.0.1:9200
  client:
    connectNum: 10
    connectPerRoute: 50

3. Add the ES access utility classes

The code, shown top to bottom:

EsEntity

package com.henu.es.bean;

public final class EsEntity<T> {

    // document id
    private String id;
    // one document
    private T data;

    public EsEntity() {
    }

    public EsEntity(String id, T data) {
        this.data = data;
        this.id = id;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public T getData() {
        return data;
    }

    public void setData(T data) {
        this.data = data;
    }
}

EsPage

package com.henu.es.bean;

import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;
import lombok.ToString;

import java.util.List;
import java.util.Map;

@Getter
@Setter
@NoArgsConstructor
@ToString
public class EsPage {

    /** Current page */
    private int currentPage;
    /** Records per page */
    private int pageSize;
    /** Total number of records */
    private int recordCount;
    /** The records on this page */
    private List<Map<String, Object>> recordList;
    /** Total number of pages */
    private int pageCount;
    /** First index (inclusive) of the page-number list */
    private int beginPageIndex;
    /** Last index (inclusive) of the page-number list */
    private int endPageIndex;

    /**
     * Takes only the four required properties; the other three are computed automatically.
     *
     * @param currentPage
     * @param pageSize
     * @param recordCount
     * @param recordList
     */
    public EsPage(int currentPage, int pageSize, int recordCount, List<Map<String, Object>> recordList) {
        this.currentPage = currentPage;
        this.pageSize = pageSize;
        this.recordCount = recordCount;
        this.recordList = recordList;

        // total number of pages
        pageCount = (recordCount + pageSize - 1) / pageSize;

        // compute beginPageIndex and endPageIndex:
        // with at most 10 pages, show them all
        if (pageCount <= 10) {
            beginPageIndex = 1;
            endPageIndex = pageCount;
        }
        // with more than 10 pages, show a 10-page window around the current page
        else {
            // 4 pages before + current page + 5 pages after
            beginPageIndex = currentPage - 4;
            endPageIndex = currentPage + 5;
            // fewer than 4 pages before: show the first 10
            if (beginPageIndex < 1) {
                beginPageIndex = 1;
                endPageIndex = 10;
            }
            // fewer than 5 pages after: show the last 10
            if (endPageIndex > pageCount) {
                endPageIndex = pageCount;
                beginPageIndex = pageCount - 10 + 1;
            }
        }
    }
}
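The constructor arithmetic above can be checked in isolation. The following standalone sketch re-derives the page count and the 10-page window for one illustrative input (the class name and sample values are mine, not from the project):

```java
public class PageWindowDemo {
    public static void main(String[] args) {
        // mirrors EsPage's constructor arithmetic
        int currentPage = 12, pageSize = 10, recordCount = 305;

        // ceiling division: 305 records at 10 per page -> 31 pages
        int pageCount = (recordCount + pageSize - 1) / pageSize;

        int begin, end;
        if (pageCount <= 10) {
            begin = 1;
            end = pageCount;
        } else {
            // 4 before + current + 5 after
            begin = currentPage - 4;
            end = currentPage + 5;
            if (begin < 1) { begin = 1; end = 10; }
            if (end > pageCount) { end = pageCount; begin = pageCount - 10 + 1; }
        }

        System.out.println(pageCount + " " + begin + " " + end); // → 31 8 17
    }
}
```

So page 12 of 31 shows the window 8..17: four page numbers before the current page and five after, as the comments in EsPage describe.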

User

package com.henu.es.bean;

import lombok.Data;

/**
 * Bean operated on by userRepository
 */
@Data
public class User {
    private Integer id;
    private String name;
    private String address;
    private Integer sex;
}

ElasticsearchRestClient

package com.henu.es.client;

import com.henu.es.factory.ESClientSpringFactory;
import lombok.Getter;
import lombok.Setter;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Scope;

@Configuration
@Getter
@Setter
@ComponentScan(basePackageClasses = ESClientSpringFactory.class)
public class ElasticsearchRestClient {

    private final Logger LOGGER = LoggerFactory.getLogger(ElasticsearchRestClient.class);

    @Value("${elasticSearch.client.connectNum}")
    private Integer connectNum;

    @Value("${elasticSearch.client.connectPerRoute}")
    private Integer connectPerRoute;

    @Value("${elasticSearch.hostlist}")
    private String hostlist;

    @Bean
    public HttpHost[] httpHost() {
        // parse the hostlist setting
        String[] split = hostlist.split(",");
        // build an HttpHost array holding each configured ES host and port
        HttpHost[] httpHostArray = new HttpHost[split.length];
        for (int i = 0; i < split.length; i++) {
            String item = split[i];
            httpHostArray[i] = new HttpHost(item.split(":")[0], Integer.parseInt(item.split(":")[1]), "http");
        }
        LOGGER.info("init HttpHost");
        return httpHostArray;
    }

    @Bean(initMethod = "init", destroyMethod = "close")
    public ESClientSpringFactory getFactory() {
        LOGGER.info("initializing ESClientSpringFactory");
        return ESClientSpringFactory.build(httpHost(), connectNum, connectPerRoute);
    }

    @Bean
    @Scope("singleton")
    public RestClient getRestClient() {
        LOGGER.info("initializing RestClient");
        return getFactory().getClient();
    }

    @Bean(name = "restHighLevelClient")
    @Scope("singleton")
    public RestHighLevelClient getRHLClient() {
        LOGGER.info("initializing RestHighLevelClient");
        return getFactory().getRhlClient();
    }
}
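The `httpHost()` parsing logic can be exercised on its own, which helps when the `elasticSearch.hostlist` value has more than one entry. A minimal sketch without the Apache `HttpHost` dependency (the class name and sample host values are illustrative):

```java
public class HostlistDemo {
    public static void main(String[] args) {
        // mirrors the parsing in ElasticsearchRestClient.httpHost():
        // a comma-separated list of host:port pairs
        String hostlist = "127.0.0.1:9200,192.168.1.10:9201";
        String[] split = hostlist.split(",");
        for (String item : split) {
            String host = item.split(":")[0];
            int port = Integer.parseInt(item.split(":")[1]);
            System.out.println(host + " -> " + port);
        }
    }
}
```

Each pair becomes one `HttpHost` in the real config class, so a three-node cluster is configured simply by listing three pairs.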

SpiderController

package com.henu.es.controller;

import com.henu.es.bean.EsPage;
import com.henu.es.util.ElasticsearchUtil;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.ResponseBody;

/**
 * @author George
 */
@Controller
public class SpiderController {

    @RequestMapping("/search")
    @ResponseBody
    public String search(@RequestParam(value = "keyword") String keyword,
                         @RequestParam(value = "currentPage", defaultValue = "1") int currentPage,
                         @RequestParam(value = "pageSize", defaultValue = "10") int pageSize) {
        System.out.println("search keyword: " + keyword);
        QueryBuilder queryBuilder = QueryBuilders.matchQuery("intro", keyword);
        EsPage esPage = ElasticsearchUtil.searchDataPage("spider", currentPage, pageSize, queryBuilder,
                "id,appid,title,intro,url,source,updatetime", "id", "intro");
        return esPage.toString();
    }
}
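A hypothetical request to the provider looks like this (host, port, and the sample keyword are assumptions; the body is the `EsPage.toString()` of the matching page):

```
GET http://localhost:8001/search?keyword=elasticsearch&currentPage=1&pageSize=10
```

The controller matches the keyword against the `intro` field, sorts by `id`, and highlights `intro` in the returned records.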

ESClientSpringFactory

package com.henu.es.factory;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.util.Arrays;

public class ESClientSpringFactory {

    private final Logger LOGGER = LoggerFactory.getLogger(ESClientSpringFactory.class);

    public static int CONNECT_TIMEOUT_MILLIS = 1000;
    public static int SOCKET_TIMEOUT_MILLIS = 30000;
    public static int CONNECTION_REQUEST_TIMEOUT_MILLIS = 500;
    public static int MAX_CONN_PER_ROUTE = 10;
    public static int MAX_CONN_TOTAL = 30;

    private static HttpHost[] HTTP_HOST;
    private RestClientBuilder builder;
    private RestClient restClient;
    private RestHighLevelClient restHighLevelClient;

    private static ESClientSpringFactory esClientSpringFactory = new ESClientSpringFactory();

    private ESClientSpringFactory() {
    }

    public static ESClientSpringFactory build(HttpHost[] httpHostArray,
                                              Integer maxConnectNum, Integer maxConnectPerRoute) {
        HTTP_HOST = httpHostArray;
        MAX_CONN_TOTAL = maxConnectNum;
        MAX_CONN_PER_ROUTE = maxConnectPerRoute;
        return esClientSpringFactory;
    }

    public static ESClientSpringFactory build(HttpHost[] httpHostArray,
                                              Integer connectTimeOut, Integer socketTimeOut,
                                              Integer connectionRequestTime,
                                              Integer maxConnectNum, Integer maxConnectPerRoute) {
        HTTP_HOST = httpHostArray;
        CONNECT_TIMEOUT_MILLIS = connectTimeOut;
        SOCKET_TIMEOUT_MILLIS = socketTimeOut;
        CONNECTION_REQUEST_TIMEOUT_MILLIS = connectionRequestTime;
        MAX_CONN_TOTAL = maxConnectNum;
        MAX_CONN_PER_ROUTE = maxConnectPerRoute;
        return esClientSpringFactory;
    }

    public void init() {
        builder = RestClient.builder(HTTP_HOST);
        setConnectTimeOutConfig();
        setMutiConnectConfig();
        restClient = builder.build();
        restHighLevelClient = new RestHighLevelClient(builder);
        LOGGER.info("init factory" + Arrays.toString(HTTP_HOST));
    }

    /**
     * Configure connection timeouts
     */
    public void setConnectTimeOutConfig() {
        builder.setRequestConfigCallback(requestConfigBuilder -> {
            requestConfigBuilder.setConnectTimeout(CONNECT_TIMEOUT_MILLIS);
            requestConfigBuilder.setSocketTimeout(SOCKET_TIMEOUT_MILLIS);
            requestConfigBuilder.setConnectionRequestTimeout(CONNECTION_REQUEST_TIMEOUT_MILLIS);
            return requestConfigBuilder;
        });
    }

    /**
     * Set the concurrent connection limits of the async HTTP client
     */
    public void setMutiConnectConfig() {
        builder.setHttpClientConfigCallback(httpClientBuilder -> {
            httpClientBuilder.setMaxConnTotal(MAX_CONN_TOTAL);
            httpClientBuilder.setMaxConnPerRoute(MAX_CONN_PER_ROUTE);
            return httpClientBuilder;
        });
    }

    public RestClient getClient() {
        return restClient;
    }

    public RestHighLevelClient getRhlClient() {
        return restHighLevelClient;
    }

    public void close() {
        if (restClient != null) {
            try {
                restClient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        LOGGER.info("close client");
    }
}

ElasticsearchUtil

package com.henu.es.util;

import com.alibaba.fastjson.JSON;
import com.henu.es.bean.EsEntity;
import com.henu.es.bean.EsPage;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.DeleteByQueryRequest;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
import org.springframework.util.StringUtils;

import javax.annotation.PostConstruct;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

/**
 * @author Administrator
 * @date 2019/10/13 23:32
 */
@Component
public class ElasticsearchUtil<T> {

    private static final Logger LOGGER = LoggerFactory.getLogger(ElasticsearchUtil.class);

    @Autowired
    private RestHighLevelClient rhlClient;

    private static RestHighLevelClient client;

    /**
     * Runs when the Spring container initializes
     */
    @PostConstruct
    public void init() {
        client = this.rhlClient;
    }

    /**
     * Check whether an index exists
     *
     * @param index index (analogous to a database)
     * @return boolean
     * @auther: LHL
     */
    public static boolean isIndexExist(String index) {
        boolean exists = false;
        try {
            exists = client.indices().exists(new GetIndexRequest(index), RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        if (exists) {
            LOGGER.info("Index [" + index + "] exists!");
        } else {
            LOGGER.info("Index [" + index + "] does not exist!");
        }
        return exists;
    }

    /**
     * Create an index with its mapping, assigning the IK analyzer to certain fields,
     * so that later queries against the index use IK word segmentation.
     *
     * @param: indexName  index (analogous to a database)
     * @return: boolean
     * @auther: LHL
     */
    public static boolean createIndex(String indexName) {
        if (!isIndexExist(indexName)) {
            LOGGER.info("Index does not exist!");
        }
        CreateIndexResponse createIndexResponse = null;
        try {
            // build the mapping
            XContentBuilder mapping = null;
            try {
                mapping = XContentFactory.jsonBuilder()
                        .startObject()
                            .startObject("properties")
                                // field name / type / analyzer; analyzer options: ik_smart, ik_max_word, standard
                                // .startObject("m_id").field("type", "keyword").endObject()
                                .startObject("id").field("type", "text").endObject()
                                .startObject("title").field("type", "text").field("analyzer", "ik_smart").endObject()
                                .startObject("content").field("type", "text").field("analyzer", "ik_smart").endObject()
                                .startObject("state").field("type", "text").endObject()
                            .endObject()
                            .startObject("settings")
                                // number of shards
                                .field("number_of_shards", 3)
                                // number of replicas
                                .field("number_of_replicas", 1)
                            .endObject()
                        .endObject();
            } catch (IOException e) {
                e.printStackTrace();
            }
            CreateIndexRequest request = new CreateIndexRequest(indexName).source(mapping);
            // two-minute timeout for index creation
            request.setTimeout(TimeValue.timeValueMinutes(2));
            createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return createIndexResponse.isAcknowledged();
    }

    /**
     * Add a single document
     *
     * @param content   the data to add
     * @param indexName index (analogous to a database)
     * @param id        id
     * @return String
     * @auther: LHL
     */
    public static String addData(XContentBuilder content, String indexName, String id) {
        IndexResponse response = null;
        try {
            IndexRequest request = new IndexRequest(indexName).id(id).source(content);
            response = client.index(request, RequestOptions.DEFAULT);
            LOGGER.info("addData response status:{},id:{}", response.status().getStatus(), response.getId());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return response.getId();
    }

    /**
     * Bulk insert
     *
     * @param list  the data to insert in bulk
     * @param index index (analogous to a database)
     * @auther: LHL
     */
    public void insertBatch(String index, List<EsEntity> list) {
        BulkRequest request = new BulkRequest();
        list.forEach(item -> request.add(new IndexRequest(index).id(item.getId())
                .source(JSON.toJSONString(item.getData()), XContentType.JSON)));
        try {
            client.bulk(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Delete by query
     *
     * @param builder   the deletion condition, e.g. new TermQueryBuilder("userId", userId)
     * @param indexName index (analogous to a database)
     * @auther: LHL
     */
    public void deleteByQuery(String indexName, QueryBuilder builder) {
        DeleteByQueryRequest request = new DeleteByQueryRequest(indexName);
        request.setQuery(builder);
        // batch size per operation, max 10000
        request.setBatchSize(10000);
        request.setConflicts("proceed");
        try {
            client.deleteByQuery(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Bulk delete
     *
     * @param idList ids of the data to delete
     * @param index  index (analogous to a database)
     * @auther: LHL
     */
    public static <T> void deleteBatch(String index, Collection<T> idList) {
        BulkRequest request = new BulkRequest();
        idList.forEach(item -> request.add(new DeleteRequest(index, item.toString())));
        try {
            client.bulk(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Analyzed search with highlighting, sorting, and paging
     *
     * @param index          index name
     * @param startPage      current page
     * @param pageSize       records per page
     * @param query          query condition
     * @param fields         fields to return, comma-separated (default all), e.g. "id,appid,title,intro,source,updatetime"
     * @param sortField      sort field
     * @param highlightField highlight field
     * @return the result page
     */
    public static EsPage searchDataPage(String index, int startPage, int pageSize, QueryBuilder query,
                                        String fields, String sortField, String highlightField) {
        SearchRequest searchRequest = new SearchRequest(index);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // optional timeout limiting how long the search may run
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        // fields to return, comma-separated (default all)
        if (!StringUtils.isEmpty(fields)) {
            searchSourceBuilder.fetchSource(fields.split(","), null);
        }
        // sort field
        if (!StringUtils.isEmpty(sortField)) {
            searchSourceBuilder.sort(new FieldSortBuilder(sortField).order(SortOrder.ASC));
        }
        // highlighting
        if (!StringUtils.isEmpty(highlightField)) {
            HighlightBuilder highlightBuilder = new HighlightBuilder();
            // prefix tag
            highlightBuilder.preTags("<span style='color:red' >");
            // suffix tag
            highlightBuilder.postTags("</span>");
            HighlightBuilder.Field highlightTitle = new HighlightBuilder.Field(highlightField);
            // highlighter type
            highlightTitle.highlighterType("unified");
            // highlighted field
            highlightBuilder.field(highlightTitle);
            searchSourceBuilder.highlighter(highlightBuilder);
        }
        // whether to explain relevance scoring
        searchSourceBuilder.explain(true);
        if (startPage <= 0) {
            startPage = 0;
        }
        // deep paging: the threshold is startPage > (10000 - pageSize),
        // i.e. > 9990 for pageSize 10, > 9980 for 20, > 9950 for 50  TODO
        if (startPage > (10000 - pageSize)) {
            searchSourceBuilder.query(query);
            searchSourceBuilder
                    // .setScroll(TimeValue.timeValueMinutes(1))
                    .size(10000);
            // the logged query can be run in Elasticsearch head or Kibana
            LOGGER.info("\n{}", searchSourceBuilder);
            // execute the search and get the response
            searchRequest.source(searchSourceBuilder);
            SearchResponse searchResponse = null;
            try {
                searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            } catch (IOException e) {
                e.printStackTrace();
            }
            long totalHits = searchResponse.getHits().getTotalHits().value;
            if (searchResponse.status().getStatus() == 200) {
                // iterate over results via the scroll id
                List<Map<String, Object>> result = disposeScrollResult(searchResponse, highlightField);
                List<Map<String, Object>> sourceList = result.stream().parallel()
                        .skip((startPage - 1 - (10000 / pageSize)) * pageSize)
                        .limit(pageSize)
                        .collect(Collectors.toList());
                return new EsPage(startPage, pageSize, (int) totalHits, sourceList);
            }
        } else {
            // shallow paging
            searchSourceBuilder.query(QueryBuilders.matchAllQuery());
            searchSourceBuilder.query(query);
            /*
            MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("username", "pretty");
            matchQueryBuilder.fuzziness(Fuzziness.AUTO);   // enable fuzzy matching on the match query
            matchQueryBuilder.prefixLength(3);             // set the prefix-length option
            matchQueryBuilder.maxExpansions(10);           // cap expansions to bound the fuzzy process
            searchSourceBuilder.query(matchQueryBuilder);
            */
            searchSourceBuilder
                    // from: index of the first result, default 0
                    // .from(startPage)
                    .from((startPage - 1) * pageSize)
                    // size: number of hits to return, default 10
                    .size(pageSize);
            // the logged query can be run in Elasticsearch head or Kibana
            LOGGER.info("\n{}", searchSourceBuilder);
            // execute the search and get the response
            searchRequest.source(searchSourceBuilder);
            SearchResponse searchResponse = null;
            try {
                searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
            } catch (IOException e) {
                e.printStackTrace();
            }
            long totalHits = searchResponse.getHits().getTotalHits().value;
            long length = searchResponse.getHits().getHits().length;
            LOGGER.debug("found [{}] records in total, processing [{}]", totalHits, length);
            if (searchResponse.status().getStatus() == 200) {
                // parse the hits
                List<Map<String, Object>> sourceList = setSearchResponse(searchResponse, highlightField);
                return new EsPage(startPage, pageSize, (int) totalHits, sourceList);
            }
        }
        return null;
    }

    /**
     * Post-process a result set for highlighting
     *
     * @param searchResponse the search result set
     * @param highlightField highlight field
     */
    private static List<Map<String, Object>> setSearchResponse(SearchResponse searchResponse, String highlightField) {
        List<Map<String, Object>> sourceList = new ArrayList<>();
        for (SearchHit searchHit : searchResponse.getHits().getHits()) {
            Map<String, Object> resultMap = getResultMap(searchHit, highlightField);
            sourceList.add(resultMap);
        }
        return sourceList;
    }

    /**
     * Merge the highlight fragments into a hit's source map
     *
     * @param: [hit, highlightField]
     * @return: java.util.Map<java.lang.String, java.lang.Object>
     * @auther: LHL
     */
    private static Map<String, Object> getResultMap(SearchHit hit, String highlightField) {
        hit.getSourceAsMap().put("id", hit.getId());
        if (!StringUtils.isEmpty(highlightField)) {
            Text[] text = hit.getHighlightFields().get(highlightField).getFragments();
            String hightStr = null;
            if (text != null) {
                for (Text str : text) {
                    hightStr = str.string();
                }
                // overwrite the plain field with the highlighted fragment
                hit.getSourceAsMap().put(highlightField, hightStr);
            }
        }
        return hit.getSourceAsMap();
    }

    public static <T> List<T> search(String index, SearchSourceBuilder builder, Class<T> c) {
        SearchRequest request = new SearchRequest(index);
        request.source(builder);
        try {
            SearchResponse response = client.search(request, RequestOptions.DEFAULT);
            SearchHit[] hits = response.getHits().getHits();
            List<T> res = new ArrayList<>(hits.length);
            for (SearchHit hit : hits) {
                res.add(JSON.parseObject(hit.getSourceAsString(), c));
            }
            return res;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /**
     * Process scroll results
     *
     * @param: [response, highlightField]
     * @return: java.util.List<java.util.Map<java.lang.String, java.lang.Object>>
     * @auther: LHL
     */
    private static List<Map<String, Object>> disposeScrollResult(SearchResponse response, String highlightField) {
        List<Map<String, Object>> sourceList = new ArrayList<>();
        // iterate over results via the scroll id
        while (response.getHits().getHits().length > 0) {
            String scrollId = response.getScrollId();
            try {
                response = client.scroll(new SearchScrollRequest(scrollId), RequestOptions.DEFAULT);
            } catch (IOException e) {
                e.printStackTrace();
            }
            SearchHits hits = response.getHits();
            for (SearchHit hit : hits.getHits()) {
                Map<String, Object> resultMap = getResultMap(hit, highlightField);
                sourceList.add(resultMap);
            }
        }
        ClearScrollRequest request = new ClearScrollRequest();
        request.addScrollId(response.getScrollId());
        try {
            client.clearScroll(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return sourceList;
    }
}

SpringbootElasticsearchApplication

package com.henu.es;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

/**
 * Ways to work with Elasticsearch:
 * (1) Jest: off by default; add the io.searchbox.jest dependency,
 *     configure application.properties, then test adding and querying documents.
 * (2) spring-data-es: add the spring-data-elasticsearch dependency,
 *     configure cluster-name and cluster-nodes in application.properties.
 *     If startup fails, the versions may not match.
 *     Two usage styles:
 *     (a) write an interface extending ElasticsearchRepository
 *     (b) use ElasticsearchTemplate
 * (3) spring-data-es CRUD + paging + highlighting exercises
 */
@SpringBootApplication
public class SpringbootElasticsearchApplication {

    public static void main(String[] args) {
        // avoid a Netty conflict
        System.setProperty("es.set.netty.runtime.available.processors", "false");
        SpringApplication.run(SpringbootElasticsearchApplication.class, args);
    }
}

Search result:
