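Elasticsearch (Part 3)

Table of contents

10. Integrating with Spring Boot
- 10.1 Setting up the environment
- 10.2 Creating an index
- 10.3 Deleting an index
- 10.4 Looking up an index
- 10.5 Creating a document
- 10.6 Deleting a document
- 10.7 Updating a document
- 10.8 Looking up a document
- 10.9 Bulk insert
- 10.10 Complex queries
11. Hands-on project
- 11.1 Setting up the environment
- 11.2 Crawling the data
- 11.3 Displaying the data
- 11.4 Highlighting

This post is compiled from the course by Bilibili uploader 遇见狂神说: 【狂神说Java】ElasticSearch7.6.x最新完整教程通俗易懂.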

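10. Integrating with Spring Boot

10.1 Setting up the environment

1. Create a Spring Boot project.

2. When selecting dependencies, check the Spring Data Elasticsearch dependency.

3. Adjust the configuration.

The ES instance installed locally is version 7.6.1, and the client version used in the IDEA project has to match it, so change the managed Elasticsearch version accordingly. In a Maven build this is typically done by setting the elasticsearch.version property in pom.xml to 7.6.1.

4. Create a configuration class.

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ElasticSearchClientConfig {

    // Expose the high-level REST client as a bean, pointing at the local ES node
    @Bean
    public RestHighLevelClient restHighLevelClient() {
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost("127.0.0.1", 9200, "http")));
    }
}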

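Note: the core auto-configuration class for ES is RestClientAutoConfiguration, but in essence the work is done by the RestClientConfigurations class.

At this point the basic ES environment is in place; all that remains is to write test classes for the individual operations.

10.2 Creating an index

Test creating an index. This is equivalent to running the command directly: PUT mobian_test_api

Test code:

@SpringBootTest
class DemoApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    void createIndex() throws IOException {
        // 1. Build the create-index request
        CreateIndexRequest request = new CreateIndexRequest("mobian_test_api");
        // 2. Execute it through the IndicesClient and read the response
        CreateIndexResponse createIndexResponse =
                restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(createIndexResponse);
    }
}

Test result:

Use Kibana to inspect the index we just created.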
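
Each index-level call made through restHighLevelClient.indices() corresponds to a plain REST request, in the same way that creating an index corresponds to PUT mobian_test_api. As a rough sketch, the mapping for the create, delete, exists, and get operations used in sections 10.2 to 10.4 can be written down as follows. The IndexRestCalls class and IndexOp enum are hypothetical names used only for illustration; they are not part of the Elasticsearch client.

```java
/** Illustrative only: maps index-level operations to their REST request lines. */
class IndexRestCalls {

    /** The four index operations used in this post, with their HTTP methods. */
    enum IndexOp {
        CREATE("PUT"), DELETE("DELETE"), EXISTS("HEAD"), GET("GET");

        final String method;

        IndexOp(String method) {
            this.method = method;
        }
    }

    /** Returns the request line you would type into the Kibana console. */
    static String requestLine(IndexOp op, String index) {
        return op.method + " /" + index;
    }

    public static void main(String[] args) {
        for (IndexOp op : IndexOp.values()) {
            System.out.println(op + " -> " + requestLine(op, "mobian_test_api"));
        }
    }
}
```

Running the equivalent request in Kibana is a quick way to cross-check what a test method did.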

10.3 Deleting an index

Test code:

@SpringBootTest
class DemoApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    public void deleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("mobian_test_api2");
        AcknowledgedResponse delete =
                restHighLevelClient.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(delete);
    }
}

Test result: [screenshot]

10.4 Looking up an index

Check whether an index exists

Test code:

@SpringBootTest
class DemoApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    public void getIndex() throws IOException {
        GetIndexRequest request = new GetIndexRequest("mobian_test_api");
        boolean exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println("mobian_test_api index exists: " + exists);

        request = new GetIndexRequest("mobian_test_api3");
        exists = restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println("mobian_test_api3 index exists: " + exists);
    }
}

Test result: [screenshot]

Retrieve the index information

@SpringBootTest
class DemoApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    public void findIndex() throws IOException {
        GetIndexRequest request = new GetIndexRequest("mobian_test_api");
        GetIndexResponse getIndexResponse =
                restHighLevelClient.indices().get(request, RequestOptions.DEFAULT);
        System.out.println(getIndexResponse);
        // getIndices() returns a String[], so join it to get readable output
        System.out.println(String.join(", ", getIndexResponse.getIndices()));
    }
}

Test result: [screenshot]

10.5 Creating a document

Because objects have to be converted to and from JSON strings, we can add the fastjson dependency.

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.66</version>
</dependency>
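
The test code that follows builds a User object, but the class itself is not shown in this post. A minimal version, assuming it has only the two fields implied by new User("mobian", 4), might look like the sketch below. The field name age matches the field queried later in section 10.10; the field name name is an assumption. fastjson derives the JSON keys from the getters.

```java
/** Hypothetical entity for the document tests; the field names are assumptions. */
class User {
    private String name;
    private int age;

    // No-arg constructor so JSON libraries can instantiate the class
    public User() {
    }

    public User(String name, int age) {
        this.name = name;
        this.age = age;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    @Override
    public String toString() {
        return "User{name='" + name + "', age=" + age + "}";
    }
}
```

With Lombok on the classpath, the same class could instead be written with @Data, @AllArgsConstructor, and @NoArgsConstructor, as the Content class in section 11.2 does.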

Test code:

@SpringBootTest
public class DocumentApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    public void addDocument() throws IOException {
        // 1. Create the object
        User user = new User("mobian", 4);
        // 2. Build the request; think of it as PUT /mobian_test_api/_doc/1
        IndexRequest request = new IndexRequest("mobian_test_api");
        request.id("1");
        request.timeout("1s");
        // 3. Put the data into the request as a JSON string
        request.source(JSON.toJSONString(user), XContentType.JSON);
        // 4. Send the request and read the response
        IndexResponse index = restHighLevelClient.index(request, RequestOptions.DEFAULT);
        System.out.println(index.toString());
        System.out.println(index.status());
    }
}

Test result: [screenshot]

10.6 Deleting a document

Test code:

@SpringBootTest
public class DocumentApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Delete a document
    @Test
    public void deleteDocument() throws IOException {
        DeleteRequest request = new DeleteRequest("mobian_test_api", "1");
        DeleteResponse delete = restHighLevelClient.delete(request, RequestOptions.DEFAULT);
        System.out.println(delete);
        System.out.println(delete.status());
    }
}

Test result: [screenshot]

10.7 Updating a document

Test code:

@SpringBootTest
public class DocumentApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Update a document
    @Test
    public void updateDocument() throws IOException {
        UpdateRequest updateRequest = new UpdateRequest("mobian_test_api", "1");
        // Build the object carrying the new field values
        User user = new User("默辨", 18);
        updateRequest.doc(JSON.toJSONString(user), XContentType.JSON);
        UpdateResponse update = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
        System.out.println(update);
        System.out.println(update.status());
    }
}

Test result: [screenshot]

10.8 Looking up a document

Check whether a document exists

Test code:

@SpringBootTest
public class DocumentApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    @Test
    public void findDocument() throws IOException {
        GetRequest getRequest = new GetRequest("mobian_test_api", "1");
        // Skip fetching _source and stored fields: an existence check does not need them.
        // Leaving these two lines out does not change the result.
        getRequest.fetchSourceContext(new FetchSourceContext(false));
        getRequest.storedFields("_none_");
        boolean exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);
        System.out.println("Document with id 1 exists in mobian_test_api: " + exists);

        getRequest = new GetRequest("mobian_test_api", "3");
        exists = restHighLevelClient.exists(getRequest, RequestOptions.DEFAULT);
        System.out.println("Document with id 3 exists in mobian_test_api: " + exists);
    }
}

Test result: only the document with id 1 was added, so the existence check for id 3 returns false.

Retrieve the document itself

Test code:

@SpringBootTest
public class DocumentApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Fetch the document
    @Test
    public void getDocument() throws IOException {
        GetRequest getRequest = new GetRequest("mobian_test_api", "1");
        GetResponse documentFields = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
        // Call the matching getter for whatever information you need
        System.out.println(documentFields.getSource());
        // The full response is the same as what the console command returns
        System.out.println(documentFields);
    }
}

Test result: [screenshot]

Summary of the steps:

1. Build the request object for the operation we want to perform
2. Call the corresponding method on RestHighLevelClient

@SpringBootTest
public class DocumentApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;
}

10.9 Bulk insert

Test code:

@SpringBootTest
public class DocumentApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Add documents in bulk
    @Test
    public void batchDocument() throws IOException {
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s");

        ArrayList<User> users = new ArrayList<>();
        users.add(new User("mobian1", 22));
        users.add(new User("mobian2", 22));
        users.add(new User("mobian3", 22));
        users.add(new User("mobian4", 22));
        users.add(new User("mobian5", 22));
        users.add(new User("mobian6", 22));

        // Queue one index request per user
        for (int i = 0; i < users.size(); i++) {
            // For bulk updates instead, only this step needs to change
            bulkRequest.add(new IndexRequest("mobian_test_api")
                    .id("" + (i + 1))
                    .source(JSON.toJSONString(users.get(i)), XContentType.JSON));
        }

        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        System.out.println(bulk);
        System.out.println(bulk.status());
    }
}

Test result: [screenshot]
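
At the REST level, a bulk request goes to the _bulk endpoint as newline-delimited JSON: one action line followed by one source line per document. BulkRequest assembles this for us, but seeing the body can help when reproducing the call in Kibana. The sketch below builds such a body by hand using only the JDK. BulkBodySketch and bulkBody are hypothetical names, and the documents are assumed to be already serialized as single-line JSON strings.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: builds the newline-delimited body of a _bulk index request. */
class BulkBodySketch {

    /**
     * Emits one action line and one source line per document.
     * Ids are assigned as 1, 2, 3, ... to mirror the loop in batchDocument().
     */
    static String bulkBody(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < jsonDocs.size(); i++) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(i + 1).append("\"}}\n");
            body.append(jsonDocs.get(i)).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "{\"name\":\"mobian1\",\"age\":22}",
                "{\"name\":\"mobian2\",\"age\":22}");
        System.out.print(bulkBody("mobian_test_api", docs));
    }
}
```

Note that the body has to end with a newline, which the loop above guarantees by terminating every line.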

10.10 Complex queries

Test code:

@SpringBootTest
public class DocumentApplicationTests {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Search with conditions
    @Test
    public void searchDocument() throws IOException {
        SearchRequest searchRequest = new SearchRequest("mobian_test_api");
        // Build the search conditions; this is somewhat like the wrapper condition builder in MyBatis-Plus
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        // Exact (term) match
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("age", "22");
        // Match all
        // MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        sourceBuilder.query(termQueryBuilder);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // Highlighting
        // sourceBuilder.highlighter();
        // Pagination
        // sourceBuilder.from();
        // sourceBuilder.size();

        searchRequest.source(sourceBuilder);
        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        System.out.println(JSON.toJSONString(search.getHits()));
        for (SearchHit documentFields : search.getHits().getHits()) {
            System.out.println(documentFields.getSourceAsMap());
        }
    }
}

Test result: [screenshot]
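
For comparison with the Kibana console, the search built in searchDocument() corresponds roughly to a request body containing a term query and a timeout. The helper below assembles that body as a plain string. TermQuerySketch and termQueryBody are hypothetical names; the JSON the client actually sends may contain additional default fields, and the inputs are assumed not to need JSON escaping.

```java
/** Illustrative only: the approximate query DSL behind the term search above. */
class TermQuerySketch {

    /** Builds a search body with a single term query and a timeout in seconds. */
    static String termQueryBody(String field, String value, int timeoutSeconds) {
        return "{\"query\":{\"term\":{\"" + field + "\":{\"value\":\"" + value + "\"}}},"
                + "\"timeout\":\"" + timeoutSeconds + "s\"}";
    }

    public static void main(String[] args) {
        // Body for: GET /mobian_test_api/_search
        System.out.println(termQueryBody("age", "22", 60));
    }
}
```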

11. Hands-on project

This part has three stages: crawl the data from a given page, display the crawled data, and display it with highlighting.

The example: crawl JD.com product information (title, price, image) for the keyword python, then read the corresponding data back from the ES index and highlight the keyword (python).

11.1 Setting up the environment

1. Create a Spring Boot project.

2. Add the main Maven dependencies.

<dependencies>
    <!-- Elasticsearch -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <!-- Web -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <!-- Thymeleaf -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <!-- Hot reload (devtools) -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-devtools</artifactId>
        <scope>runtime</scope>
        <optional>true</optional>
    </dependency>
    <!-- Configuration processor -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-configuration-processor</artifactId>
        <optional>true</optional>
    </dependency>
    <!-- Lombok -->
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <!-- Testing -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <!-- fastjson -->
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.66</version>
    </dependency>
    <!-- jsoup -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.13.1</version>
    </dependency>
</dependencies>

11.2 Crawling the data

1. Create an entity class that matches the target data

Content.java

@Data
@AllArgsConstructor
@NoArgsConstructor
public class Content {
    private String img;
    private String title;
    private String price;
}

2. Create a parsing utility class

HtmlParseUtil.java

@Component
public class HtmlParseUtil {

    public static List<Content> parseJD(String keyWord) throws IOException {
        // Target URL: the JD.com search page
        String url = "https://search.jd.com/Search?keyword=" + keyWord;
        // Fetch and parse the page (30-second timeout)
        Document document = Jsoup.parse(new URL(url), 30000);
        // Locate the product list by its element id
        Element element = document.getElementById("J_goodsList");
        // Each product is an <li> inside that element
        Elements lis = element.getElementsByTag("li");

        ArrayList<Content> goodsList = new ArrayList<>();
        // Walk over each <li> and extract the fields we need
        for (Element li : lis) {
            // JD lazy-loads product images, so read the lazy-load attribute rather than src
            String img = li.getElementsByTag("img").eq(0).attr("data-lazy-img");
            String price = li.getElementsByClass("p-price").eq(0).text();
            String title = li.getElementsByClass("p-name").eq(0).text();
            goodsList.add(new Content(img, title, price));
        }
        return goodsList;
    }
}

3. Write the service-layer method

ContentService.java

@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Parse the web page and save the data to ES
    public Boolean parseContent(String keyWord) throws IOException {
        List<Content> contents = HtmlParseUtil.parseJD(keyWord);

        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("30s");
        for (int i = 0; i < contents.size(); i++) {
            // No id is set, so ES uses its default id generation
            bulkRequest.add(new IndexRequest("jd_goods")
                    .source(JSON.toJSONString(contents.get(i)), XContentType.JSON));
        }

        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        System.out.println(bulk);
        System.out.println(bulk.status());
        // Returns true if every item succeeded, false if any item failed
        return !bulk.hasFailures();
    }
}

4. Write the controller-layer endpoint

ContentController.java

@RestController
public class ContentController {

    @Autowired
    private ContentService contentService;

    @GetMapping("/parse/{keyWords}")
    public Boolean parseContent(@PathVariable("keyWords") String keyWords) throws IOException {
        return contentService.parseContent(keyWords);
    }
}

Test result:

Entering http://localhost:8080/parse/python in the browser address bar adds the specified data (name, price, image) from the JD.com page to the jd_goods index in ES and returns the result we defined. The stored data itself can be queried in ES.
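
One detail in HtmlParseUtil.parseJD: the keyword is appended to the search URL as-is. That is fine for python, but for keywords containing spaces or non-ASCII characters, encoding the keyword explicitly avoids depending on how the HTTP layer treats such characters. A minimal sketch using only the JDK is shown below. SearchUrlSketch and buildSearchUrl are hypothetical names, and the choice of UTF-8 form encoding for the target site is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Illustrative only: builds the search URL with a URL-encoded keyword. */
class SearchUrlSketch {

    private static final String BASE = "https://search.jd.com/Search?keyword=";

    /** Form-encodes the keyword as UTF-8 (spaces become '+') and appends it to the base URL. */
    static String buildSearchUrl(String keyWord) {
        try {
            return BASE + URLEncoder.encode(keyWord, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("python"));
        System.out.println(buildSearchUrl("spring boot"));
    }
}
```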

11.3 Displaying the data

1. Write the service-layer method

ContentService.java

@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Search the data stored in ES
    public List<Map<String, Object>> searchPage(String keyWord, int pageNo, int pageSize) throws IOException {
        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        // Exact (term) match
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyWord);
        sourceBuilder.query(termQueryBuilder);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // Pagination; note that from() takes a zero-based hit offset,
        // so pageNo is used here as an offset rather than a page index
        sourceBuilder.from(pageNo);
        sourceBuilder.size(pageSize);

        // Execute the search
        searchRequest.source(sourceBuilder);
        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the results
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : search.getHits().getHits()) {
            list.add(hit.getSourceAsMap());
        }
        return list;
    }
}

2. Write the controller-layer endpoint

ContentController.java

@RestController
public class ContentController {

    @Autowired
    private ContentService contentService;

    @GetMapping("/search/{keyWord}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(@PathVariable("keyWord") String keyWord,
                                            @PathVariable("pageNo") int pageNo,
                                            @PathVariable("pageSize") int pageSize) throws IOException {
        return contentService.searchPage(keyWord, pageNo, pageSize);
    }
}

Test result:

Before querying, make sure the ES index actually contains data matching the keyword; for how to add it, see the previous subsection. In this test, entering the keyword python displays the name, price, and image URL of the python-related books stored in the ES index.
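
In searchPage, pageNo is passed straight to sourceBuilder.from(), which Elasticsearch interprets as the zero-based offset of the first hit to return. A request such as /search/python/1/10 therefore skips one hit instead of starting at the first page. If 1-based page numbers are wanted, the offset can be derived first, as in the sketch below. PageOffsetSketch and toOffset are hypothetical names, and 1-based page numbering is an assumption about the intended API.

```java
/** Illustrative only: converts a 1-based page number into a zero-based hit offset. */
class PageOffsetSketch {

    /** Page 1 starts at offset 0, page 2 at offset pageSize, and so on. */
    static int toOffset(int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be at least 1");
        }
        return (pageNo - 1) * pageSize;
    }

    public static void main(String[] args) {
        // The result would be passed to sourceBuilder.from(...)
        System.out.println(toOffset(1, 10));
        System.out.println(toOffset(3, 10));
    }
}
```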

11.4 Highlighting

1. Write the service-layer method

ContentService.java

@Service
public class ContentService {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    // Highlighted search
    public List<Map<String, Object>> searchHighLightPage(String keyWord, int pageNo, int pageSize) throws IOException {
        // Conditional search
        SearchRequest searchRequest = new SearchRequest("jd_goods");
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

        // Exact (term) match
        TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyWord);
        sourceBuilder.query(termQueryBuilder);
        sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

        // Pagination
        sourceBuilder.from(pageNo);
        sourceBuilder.size(pageSize);

        // Highlighting: wrap matches in the title field with a red span
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("title");
        highlightBuilder.requireFieldMatch(false);
        highlightBuilder.preTags("<span style='color:red'>");
        highlightBuilder.postTags("</span>");
        sourceBuilder.highlighter(highlightBuilder);

        // Execute the search
        searchRequest.source(sourceBuilder);
        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Parse the results
        ArrayList<Map<String, Object>> list = new ArrayList<>();
        for (SearchHit hit : search.getHits().getHits()) {
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField title = highlightFields.get("title");
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            if (title != null) {
                // Replace the original title with the joined highlighted fragments
                StringBuilder newTitle = new StringBuilder();
                for (Text text : title.fragments()) {
                    newTitle.append(text);
                }
                sourceAsMap.put("title", newTitle.toString());
            }
            list.add(sourceAsMap);
        }
        return list;
    }
}

2. Write the controller-layer endpoint

ContentController.java

@RestController
public class ContentController {

    @Autowired
    private ContentService contentService;

    @GetMapping("/search2/{keyWord}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> searchHighLight(@PathVariable("keyWord") String keyWord,
                                                     @PathVariable("pageNo") int pageNo,
                                                     @PathVariable("pageSize") int pageSize) throws IOException {
        return contentService.searchHighLightPage(keyWord, pageNo, pageSize);
    }
}

Test result:

The service-layer logic highlights the search keyword, so in the test result the keyword inside the title attribute should be wrapped in the corresponding span tag: [screenshot]
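
To make the expected shape of a highlighted title concrete, the sketch below wraps each occurrence of a keyword in the same pre and post tags that the service configures. It only illustrates the output format with plain string replacement; Elasticsearch's highlighter works on analyzed terms and returns fragments, so this is not how the server computes highlights. HighlightSketch and wrapKeyword are hypothetical names.

```java
/** Illustrative only: shows what a highlighted title looks like, not how ES computes it. */
class HighlightSketch {

    /** Wraps every literal occurrence of keyWord in preTag/postTag. */
    static String wrapKeyword(String title, String keyWord, String preTag, String postTag) {
        if (keyWord.isEmpty()) {
            return title; // nothing to highlight
        }
        return title.replace(keyWord, preTag + keyWord + postTag);
    }

    public static void main(String[] args) {
        System.out.println(wrapKeyword(
                "python crash course", "python", "<span style='color:red'>", "</span>"));
    }
}
```

Because the returned title contains HTML, a page that displays it has to render the value as markup rather than as escaped text.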