Python crawler: an attempt at scraping a novel

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import requests
from bs4 import BeautifulSoup

url = 'http://www.zanghaihua.org/nanbudangan/'
req = requests.get(url=url, timeout=10)
req.raise_for_status()
# Let requests guess the encoding from the page bytes so the Chinese text decodes correctly
req.encoding = req.apparent_encoding
soup = BeautifulSoup(req.text, 'html.parser')
# The table of contents sits in <div class="booklist">; collect every <span> inside it
div = soup.find(name='div', attrs={'class': 'booklist'})
span_list = div.find_all('span')
```

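
The `req.encoding = req.apparent_encoding` line matters for Chinese pages. When a text response's `Content-Type` header names no charset, requests falls back to ISO-8859-1 for `.text`, while `apparent_encoding` guesses the encoding from the bytes themselves. The sketch below reproduces the failure with the standard library only, assuming (not verified for this site) that the page is GBK-encoded and sent without a charset:

```python
# Hypothetical setup: the page is GBK-encoded and its Content-Type header says
# only "text/html" with no charset, so requests would fall back to ISO-8859-1
# when building response.text.
raw = '南部档案馆'.encode('gbk')  # the bytes that actually arrive

wrong = raw.decode('iso-8859-1')  # what .text holds with the fallback encoding
right = raw.decode('gbk')  # what .text holds once .encoding is set correctly

print(repr(wrong))  # ten Latin-1 characters instead of five Chinese ones
print(right)  # prints 南部档案馆
```

Assigning `req.encoding` before reading `req.text` is what moves the decode from the first case to the second.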

```python
for span in span_list:
    a = span.find('a')
    if not a:
        continue  # skip <span>s that contain no chapter link
    a_url = a.get('href')

    response = requests.get(url=a_url, timeout=10)
    response.raise_for_status()
    response.encoding = response.apparent_encoding
    soup = BeautifulSoup(response.text, 'html.parser')

    Bookname = soup.find(name='h1', attrs={'align': 'center'}).text.strip()
    ChapterTitle = soup.find(name='div', attrs={'class': 'chaptertitle'}).text.strip()

    # get_text(separator, strip=True): join the text fragments between the
    # <br/> tags with '\n', trimming whitespace and dropping empty fragments
    Title = soup.find(name='div', attrs={'id': 'BookText'}).get_text('\n', strip=True)

    # Open in text append mode so each chapter is added to the end of the file
    with open(Bookname, 'a', encoding='utf-8') as f:
        # The first chapter ("On the study of the Southern Archives") also
        # gets the book title written above it
        if ChapterTitle == '关于南部档案馆的研究':
            f.write(Bookname + '\n')
        f.write(ChapterTitle + '\n')
        f.write(Title + '\n\n')
```
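
Passing `'<br/><br/>'` as the second argument of BeautifulSoup's `get_text()` does not replace anything: that parameter is `strip`, so any non-empty string just switches stripping on, and the line breaks come from the separator that joins the text fragments between the tags. The sketch below checks this offline on a made-up `div#BookText` snippet; the real chapter markup is assumed, not copied from the site:

```python
from bs4 import BeautifulSoup

# Hypothetical markup in the shape the script expects for div#BookText:
# paragraphs of plain text separated by <br/><br/>.
SAMPLE_CHAPTER = (
    '<div id="BookText">\n'
    '  First paragraph.<br/><br/>\n'
    '  Second paragraph.<br/><br/>\n'
    '  Third paragraph.\n'
    '</div>'
)

div = BeautifulSoup(SAMPLE_CHAPTER, 'html.parser').find(name='div', attrs={'id': 'BookText'})

# get_text(separator, strip): the text fragments between the tags are joined
# with the separator; strip=True trims them and drops whitespace-only ones.
explicit = div.get_text('\n', strip=True)

# A non-empty string in the strip position is simply truthy, so this call
# gives the same result; no '<br/><br/>' replacement takes place.
positional = div.get_text('\n', '<br/><br/>')

print(explicit)
print(explicit == positional)  # True
```

If a blank line between paragraphs is preferred in the output file, `get_text('\n\n', strip=True)` produces that layout.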

Reposted from: https://www.cnblogs.com/xiaoyujuan/p/11098668.html
