Scraping a Biquge novel with Python 3

Step 1: Decide what to scrape
The URL of the site to scrape is the novel's table-of-contents page: http://www.xbiquge.la/6/6818/

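Step 2: Analyze the site
To scrape a novel we need two things from the table-of-contents page: the title of each chapter and the link to each chapter's reading page.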
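As a minimal sketch of this analysis step, the snippet below runs the chapter-link regular expression against a small sample of HTML. The sample markup and the chapter paths in it are hypothetical, written only to match the pattern; the real page may differ, so check the page source before relying on it. `parse_chapters` is an illustrative helper name, not part of any library.

```python
import re

# Hypothetical fragment of the table-of-contents page; the real markup may differ.
SAMPLE_HTML = """
<dl>
<dd><a href='/6/6818/1.html' >Chapter 1</a></dd>
<dd><a href='/6/6818/2.html' >Chapter 2</a></dd>
</dl>
"""

BASE_URL = 'http://www.xbiquge.la'

# Group 1 captures the relative chapter link, group 2 the chapter title.
CHAPTER_PATTERN = re.compile(r"<dd><a href='(.*?)' >(.*?)</a></dd>")


def parse_chapters(html):
    """Return a list of (absolute_url, title) pairs found in the catalogue HTML."""
    return [(BASE_URL + href, title)
            for href, title in CHAPTER_PATTERN.findall(html)]


chapters = parse_chapters(SAMPLE_HTML)
for url, title in chapters:
    print(title, url)
```

The non-greedy `(.*?)` groups stop at the first closing quote and the first `</a>`, so each match covers exactly one chapter entry.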

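Step 3: Write the code
The complete code for this exercise follows. It has a known limitation: if the novel has a large number of chapters, the site may ban the crawler's IP address.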

import os
import re
import time

import requests
from lxml import etree

# Browser-like User-Agent, sent with every request
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}


def get_html():
    # Table-of-contents URL; change it to download a different novel
    url = 'http://www.xbiquge.la/6/6818/'
    html = requests.get(url, headers=HEADERS).content.decode('utf-8')
    return html


def get_novel_url(html):
    ''' Extract the chapter titles and links, then download each chapter '''
    pat2 = r"<dd><a href='(.*?)' >(.*?)</a></dd>"
    title_name = re.findall(pat2, html)
    # Directory the novel is saved in
    path = '真武世界'
    os.makedirs(path, exist_ok=True)
    for title in title_name:
        # Relative chapter URL
        novel_url = title[0]
        # Chapter title
        novel_name = title[1]
        # Build the absolute chapter URL
        newUrl = 'http://www.xbiquge.la' + novel_url
        try:
            print("Downloading ----->>>>>> %s" % novel_name)
            # Send the same headers as for the table-of-contents request
            response = requests.get(newUrl, headers=HEADERS).content.decode('utf-8', 'ignore')
            response = etree.HTML(response)
            # Extract the chapter text
            content = response.xpath('//*[@id="content"]/text()')
            # content = content[0].replace('?', '')
            filename = os.path.join(path, '{}.txt'.format(novel_name))
            with open(filename, 'w', encoding='utf-8') as f:
                f.writelines(content)
            # Pause between chapters to go easy on the server
            time.sleep(1)
        except Exception as e:
            print("Download failed!", e)


def main():
    html = get_html()
    get_novel_url(html)


if __name__ == '__main__':
    main()
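One way to soften the IP-ban risk noted above is to retry failed requests with an increasing delay rather than hitting the server again straight away. The sketch below is an assumption about how that might look, not part of the original script. `fetch_with_retry` and its parameters are hypothetical names, and backing off only reduces request pressure; it does not guarantee the site will not block the crawler.

```python
import random
import time


def fetch_with_retry(fetch, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on failure, wait with exponential backoff and retry.

    fetch is any callable that returns the page text or raises an exception.
    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(retries):
        try:
            return fetch(url)
        except Exception as exc:
            if attempt == retries - 1:
                raise
            # The delay doubles on each attempt; a little random jitter
            # keeps the requests from arriving at a fixed rhythm.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            print("Request failed (%s); retrying in %.1f s" % (exc, delay))
            sleep(delay)
```

In the crawler, `fetch` could be a small function that wraps `requests.get` and returns the decoded page. The `sleep` parameter exists only so the waiting can be swapped out, for example in tests.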
