Python novel CMS: scraping a novel website (博文小说网) with a Python crawler

博文小说网

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Author : Woolei
# @File : book136_singleprocess.py

import requests
import time
import os
from bs4 import BeautifulSoup

# Request headers: present a desktop browser User-Agent to the site
headers = {
    'User-Agent':
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'
}

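# Fetch one chapter's content and write it to a text file
def getChapterContent(each_chapter_dict):
    content_html = requests.get(each_chapter_dict['chapter_url'], headers=headers).text
    soup = BeautifulSoup(content_html, 'lxml')
    # Look the container up by id alone, so the match does not depend on its tag name
    content_tag = soup.find(id='content')
    if content_tag is None:  # no content container on this page; nothing to save
        return
    p_tag = content_tag.find_all('p')
    print('Saving chapter --> ' + each_chapter_dict['name'])
    # Open the file once per chapter; the with block closes it automatically
    with open(each_chapter_dict['name'] + '.txt', 'a', encoding='utf8') as f:
        for each in p_tag:
            paragraph = each.get_text().strip()
            f.write(' ' + paragraph + '\n\n')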

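# Collect the name and URL of every chapter of the novel
def getChapterInfo(novel_url):
    import re  # used below to pull the chapter number out of the link text
    chapter_html = requests.get(novel_url, headers=headers).text
    soup = BeautifulSoup(chapter_html, 'lxml')
    chapter_list = soup.find_all('li')
    chapter_all_dict = {}
    for each in chapter_list:
        link = each.find('a')
        numbers = re.findall(r'\d+', each.get_text())
        if link is None or not link.get('href') or not numbers:
            continue  # skip list items that are not numbered chapter links
        chapter_each = {}
        chapter_each['name'] = link.get_text()  # chapter name
        chapter_each['chapter_url'] = link['href']  # chapter URL
        chapter_num = int(numbers[0])  # chapter number
        chapter_all_dict[chapter_num] = chapter_each  # store in the dict of all chapters
    return chapter_all_dict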

if __name__ == '__main__':
    start = time.perf_counter()  # record the start time (time.clock() was removed in Python 3.8)
    novel_url = 'https://www.136book.com/sanshengsanshimenglitaohua/'  # example novel: 三生三世十里桃花
    novel_info = getChapterInfo(novel_url)  # get the chapter records
    dir_name = 'saved_novel'  # directory in which to save the novel; change as needed
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)
    os.chdir(dir_name)  # switch to the directory where the novel is saved
    for each in novel_info:
        getChapterContent(novel_info[each])
        # time.sleep(1)
    end = time.perf_counter()  # record the end time
    print('Finished saving the novel: %d chapters saved, time taken: %f s' % (len(novel_info), (end - start)))
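
The script keys each chapter by the first number found in its link text. As a minimal standalone sketch of that step, the number can be pulled out with the standard library and used to put chapters in reading order. The helper name `chapter_number` and the sample titles are hypothetical, made up for illustration, and are not part of the original script.

```python
import re


def chapter_number(title):
    """Return the first run of digits in title as an int, or None if there is none."""
    match = re.search(r'\d+', title)
    return int(match.group()) if match else None


# Hypothetical link texts, in the order a chapter list page might present them
titles = ['第3章 C', 'Preface', '第1章 A', '第12章 B']

numbered = {}
for title in titles:
    number = chapter_number(title)
    if number is not None:  # entries without a number are skipped
        numbered[number] = title

# Sort by the integer key so that chapter 12 comes after chapter 3
ordered = [numbered[n] for n in sorted(numbered)]
print(ordered)
```

Sorting on the integer key rather than on the title string avoids the usual pitfall where "12" sorts before "3". Two entries that share the same first number would overwrite each other, as they would in the script's own dictionary.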
