Crawling Web Novels on Android with OkHttp + jsoup

1. Preparation

Test site: http://www.tlxs.net
Third-party dependencies:
implementation 'com.squareup.okhttp3:okhttp:4.10.0'
implementation 'org.jsoup:jsoup:1.15.3'
implementation 'com.github.bumptech.glide:glide:4.14.2'
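
The parsing code in this article fills in Book and Chapter objects, but the classes themselves are not shown. The following is a minimal sketch of what they might look like. The setter names are taken from the calls used in the article; the field types, the getters, and the default empty chapter list are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** One chapter of a novel. Setter names follow the calls used in the parsing code. */
class Chapter {
    private String name;     // chapter title
    private String url;      // link to the chapter page
    private String content;  // chapter text, set once the chapter page is parsed

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }
    public String getContent() { return content; }
    public void setContent(String content) { this.content = content; }
}

/** One novel with its metadata and chapter list (getters are assumed, not from the article). */
class Book {
    private String name;         // novel title
    private String imgUrl;       // cover image URL
    private String novelUrl;     // URL of the novel's detail page
    private String category;
    private String author;
    private String read_url;     // URL where reading starts
    private String status;       // e.g. ongoing or finished
    private String update_time;
    private List<Chapter> chapterList = new ArrayList<>();

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getImgUrl() { return imgUrl; }
    public void setImgUrl(String imgUrl) { this.imgUrl = imgUrl; }
    public String getNovelUrl() { return novelUrl; }
    public void setNovelUrl(String novelUrl) { this.novelUrl = novelUrl; }
    public String getCategory() { return category; }
    public void setCategory(String category) { this.category = category; }
    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }
    public String getRead_url() { return read_url; }
    public void setRead_url(String read_url) { this.read_url = read_url; }
    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
    public String getUpdate_time() { return update_time; }
    public void setUpdate_time(String update_time) { this.update_time = update_time; }
    public List<Chapter> getChapterList() { return chapterList; }
    public void setChapterList(List<Chapter> chapterList) { this.chapterList = chapterList; }
}
```

The snake_case names read_url and update_time are kept only because the article's code calls setRead_url and setUpdate_time.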

2. Fetching the HTML with OkHttp

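// Obtain the OkHttpClient instance (getOkHttpClient() is a helper not shown here)
OkHttpClient client = getOkHttpClient();
Request request = new Request.Builder().url(address).build();
// Execute the request synchronously; on Android this must run off the main thread
Response response = client.newCall(request).execute();
// Read the response body as an input stream
InputStream inputStream = response.body().byteStream();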
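
Because execute() is a blocking call, Android will throw NetworkOnMainThreadException if it runs on the UI thread. Below is a rough sketch of one way to move the request onto a worker thread using only java.util.concurrent. The class name BackgroundFetch and its two methods are hypothetical, and the Callable stands in for the OkHttp code above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper that runs a blocking request on a background thread. */
final class BackgroundFetch {
    // Single daemon worker thread, so a pending request never keeps the process alive.
    private static final ExecutorService WORKER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "novel-fetch");
        t.setDaemon(true);
        return t;
    });

    /** Queues the blocking request on the worker thread and returns immediately. */
    static Future<String> submit(Callable<String> blockingRequest) {
        return WORKER.submit(blockingRequest);
    }

    /** Waits up to timeoutMs for the HTML; returns null on failure, timeout, or interruption. */
    static String await(Future<String> pending, long timeoutMs) {
        try {
            return pending.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        } catch (ExecutionException | TimeoutException e) {
            pending.cancel(true); // the request failed or took too long
        }
        return null;
    }
}
```

In an Activity, the Callable would wrap the request-and-parse code, and the result would normally be handed back with runOnUiThread rather than by calling await on the UI thread, which would block it again.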

3. Converting the input stream to text

/**
 * Converts the response stream into an HTML string.
 *
 * @param inputStream response body stream
 * @return the HTML text, or null if reading fails
 */
public static String parseResponse(InputStream inputStream) {
    // Decode as GBK; try-with-resources closes the reader (and the stream) when done
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, "GBK"))) {
        StringBuilder response = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            // readLine() strips the line terminator, so add it back
            response.append(line).append('\n');
        }
        return response.toString();
    } catch (Exception e) {
        e.printStackTrace();
    }
    return null;
}
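
parseResponse hard-codes "GBK", which produces garbled text if a page is served in another encoding. As a hedged sketch, the charset could instead be read from the Content-Type response header, with GBK kept as the fallback. The class CharsetSniffer and its method are hypothetical names, not part of OkHttp or jsoup.

```java
import java.nio.charset.Charset;
import java.util.Locale;

/** Hypothetical helper that extracts the charset from a Content-Type header value. */
final class CharsetSniffer {
    /**
     * Returns the charset named in a header such as "text/html; charset=gbk",
     * or the fallback when the header is missing, has no charset, or names an unknown one.
     */
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Drop the "charset=" prefix and any surrounding quotes
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // malformed or unsupported charset name
                }
            }
        }
        return fallback;
    }
}
```

With OkHttp, the header value is available as response.header("Content-Type"), and the returned Charset can be passed to the InputStreamReader constructor in place of the "GBK" string. Pages that declare their encoding only in an HTML meta tag would still need the fallback.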

4. Parsing the HTML for novel titles and cover image URLs

/**
 * Parses the hot-novel ranking list.
 *
 * @param html HTML of the ranking page
 * @return the novels found on the page
 */
public static List<Book> getHotRank(String html) {
    // Parse the HTML
    Document doc = Jsoup.parse(html);
    // Get the <body> element
    Element body = doc.getElementsByTag("body").get(0);
    // Each novel entry is an element with class "p10"
    Elements p10 = body.getElementsByClass("p10");
    List<Book> books = new ArrayList<>();
    for (Element li : p10) {
        // The title link is the first <a> inside the first <dt>
        Element name = li.getElementsByTag("dt").first().getElementsByTag("a").first();
        // The cover is the first <img> inside the element with class "image"
        Element img = li.getElementsByClass("image").first().getElementsByTag("img").first();
        Book book = new Book();
        book.setName(name.text());
        book.setImgUrl(img.attr("src"));
        book.setNovelUrl(name.attr("href"));
        books.add(book);
    }
    return books;
}
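
The href and src values are stored exactly as they appear in the page, so they may be relative paths that cannot be requested directly. A minimal sketch of resolving them against the URL of the page they came from, using java.net.URI, is shown below. LinkResolver is a hypothetical helper, and the paths in the usage note are made up for illustration.

```java
import java.net.URI;

/** Hypothetical helper that turns a link found in a page into an absolute URL. */
final class LinkResolver {
    /**
     * Resolves link against pageUrl, the full URL of the page the link was found on.
     * Links that are already absolute come back unchanged.
     */
    static String resolve(String pageUrl, String link) {
        try {
            return URI.create(pageUrl).resolve(link.trim()).toString();
        } catch (IllegalArgumentException e) {
            return link; // not a valid URI; leave it for the caller to handle
        }
    }
}
```

For example, resolve("http://www.tlxs.net/", "/files/cover.jpg") gives "http://www.tlxs.net/files/cover.jpg". jsoup can also do this itself: when the page URL is passed as the base URI to Jsoup.parse(html, baseUri), element.absUrl("href") returns the absolute form.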

5. Fetching the novel's chapter list from its URL

/**
 * Parses the book detail page: metadata and chapter list.
 *
 * @param html HTML of the book detail page
 * @return the book with its metadata and chapter list filled in
 */
public static Book getBookInfo(String html) {
    // Parse the HTML
    Document doc = Jsoup.parse(html);
    Element head = doc.getElementsByTag("head").get(0);
    Element body = doc.getElementsByTag("body").get(0);
    // Book metadata is stored in <meta property="..." content="..."> tags
    Elements metas = head.getElementsByTag("meta");
    Book book = new Book();
    for (Element meta : metas) {
        String property = meta.attr("property");
        String content = meta.attr("content");
        // Order matters: "read_url" must be tested before "url", which it also contains
        if (property.contains("category")) {
            book.setCategory(content);
        } else if (property.contains("author")) {
            book.setAuthor(content);
        } else if (property.contains("book_name")) {
            book.setName(content);
        } else if (property.contains("read_url")) {
            book.setRead_url(content);
        } else if (property.contains("url")) {
            book.setNovelUrl(content);
        } else if (property.contains("status")) {
            book.setStatus(content);
        } else if (property.contains("update_time")) {
            book.setUpdate_time(content);
        }
    }
    // Chapter links are the <a> elements inside each <dd> of the "listmain" element
    Element listmain = body.getElementsByClass("listmain").get(0);
    Elements dds = listmain.getElementsByTag("dd");
    List<Chapter> chapterList = new ArrayList<>();
    for (Element dd : dds) {
        Element a = dd.getElementsByTag("a").get(0);
        Chapter chapter = new Chapter();
        chapter.setUrl(a.attr("href"));
        chapter.setName(a.text());
        chapterList.add(chapter);
    }
    book.setChapterList(chapterList);
    return book;
}
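
Matching meta properties with contains() is fragile: any property whose name happens to include "url" lands in the "url" branch, and the last one seen overwrites the earlier value. A small sketch of matching on the exact final segment of the property name instead is given below. MetaProperty is a hypothetical helper, and the "og:novel:..." property names in the usage note are assumptions about the page, not taken from the article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for matching meta properties by their exact final segment. */
final class MetaProperty {
    /** Returns the text after the last ':' in a property name, or the whole name if there is none. */
    static String key(String property) {
        int i = property.lastIndexOf(':');
        return i < 0 ? property : property.substring(i + 1);
    }

    /** Re-keys a property-to-content map by the final segment of each property name. */
    static Map<String, String> byKey(Map<String, String> metas) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metas.entrySet()) {
            out.put(key(e.getKey()), e.getValue());
        }
        return out;
    }
}
```

With this, a property such as "og:novel:latest_chapter_url" maps to the key "latest_chapter_url" and no longer collides with "url" or "read_url", so the if/else chain could compare with equals() in any order.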

6. Parsing the chapter content from the chapter URL

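/**
 * Parses the chapter page and stores its text in the given chapter.
 *
 * @param chapter chapter to fill in
 * @param html    HTML of the chapter page
 * @return the same chapter, with its content set
 */
public static Chapter getChapterInfo(Chapter chapter, String html) {
    // Parse the HTML
    Document doc = Jsoup.parse(html);
    Element body = doc.getElementsByTag("body").get(0);
    // The chapter text is in the element with id "content" inside the element with id "book"
    Element book = body.getElementById("book");
    Element content = book.getElementById("content");
    chapter.setContent(content.text());
    return chapter;
}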
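
Note that text() normalizes whitespace, so the chapter comes back as a single line with its paragraph breaks gone. If the page separates paragraphs with <br> tags (an assumption about the site's markup), one rough way to keep them is to convert the element's inner HTML to text by hand. ChapterText below is a hypothetical helper; the regex approach is only suitable for simple fragments like this, not for general HTML.

```java
/** Hypothetical helper that converts a simple HTML fragment to text, keeping line breaks. */
final class ChapterText {
    static String toPlainText(String innerHtml) {
        String s = innerHtml
                .replaceAll("(?i)<br\\s*/?>", "\n")   // <br>, <br/>, <br /> become newlines
                .replaceAll("(?i)</p\\s*>", "\n")      // a closing </p> also ends a line
                .replaceAll("<[^>]+>", "")             // drop any remaining tags
                .replace("&nbsp;", " ")                // indentation is often written as &nbsp;
                .replace('\u00A0', ' ');
        // Trim every line and skip the empty ones left behind by repeated breaks
        StringBuilder out = new StringBuilder();
        for (String line : s.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                if (out.length() > 0) {
                    out.append('\n');
                }
                out.append(trimmed);
            }
        }
        return out.toString();
    }
}
```

It could be used as chapter.setContent(ChapterText.toPlainText(content.html())) in place of content.text(). Other HTML entities are left as they are in this sketch.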
