(python)查看糗事百科文字点赞作者等级评论

import requests
import re
headers = {
'User-Agent':'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)'
}
info_lists = []
def judgment_sex(class_name):
if class_name == 'womenIcon':
return '女'
else:
return '男'
def get_info(url):
res = requests.get(url)
ids = re.findall('<h2>(.*?)</h2>',res.text,re.S)
levels = re.findall('<div class="articleGender (.*?)">',res.text,re.S)
sexs = re.findall('<div class="number">',res.text,re.S)
contents = re.findall('<div class="content">.*?<span>(.*?)</span>',res.text,re.S)
laughs = re.findall('<span class="stats-vote"><i class="number">(\d+)<i>',res.text,re.S)
comments = re.findall('<i class="number">(\d+)</i>评论',res.text,re.S)
for id,level,sex,content,laugh,comment in zip(ids,levels,sexs,contents,laughs,comments):
info = {
'id':id,
'level':level,
'sex':judgment_sex(sex),
'content':content,
'laugh':laugh,
'comment':comment
}
info_lists.append(info)
if __name__ == '__main__':
urls = ['https://www.qiushibaike.com/text/page/{}/'.format(str(i)) for i in range(1,12)]
for url in urls:
get_info(url)
for info_list in info_lists:
f = open('E:/qiushi.text', 'a+')
try:
f.write(info_list['id']+'\n')
f.write(info_list['level']+'\n')
f.write(info_list['sex']+'\n')
f.write(info_list['content']+'\n')
f.write(info_list['laugh']+'\n')
f.write(info_list['comment']+'\n\n')
f.close()
except UnicodeEncodeError:
pass

问题：无法生成文档 debug无错

转载于:https://www.cnblogs.com/zhentaoFrezt/p/9255371.html

(python)查看糗事百科文字点赞作者等级评论相关推荐

python多线程糗事百科案例
案例:多线程爬虫目标:爬取糗事百科段子,待爬取页面URL:http://www.qiushibaike.com/8hr/page/1 要求: 使用requests获取页面信息,用XPATH/re 做 ...
正则表达式re模式(python爬虫糗事百科热点段子)
python编程快速上手(持续更新中-) python爬虫从入门到精通文章目录 python编程快速上手(持续更新中-) python爬虫从入门到精通非结构化数据与结构化数据提取概述非结构化的 ...
python爬虫糗事百科
#coding:utf-8 import urllib2 import re # 工具类 class Tools(object):remove_n = re.compile(r'\n')replace ...
python爬虫案例——糗事百科数据采集
全栈工程师开发手册 (作者:栾鹏) python教程全解 python爬虫案例--糗事百科数据采集通过python实现糗事百科页面的内容采集是相对来说比较容易的,因为糗事百科不需要登陆,不需要coo ...
利用Python爬取糗事百科段子信息
文章来源:公众号-智能化IT系统. 爬虫技术目前越来越流行,这里介绍一个爬虫的简单应用. 爬取的内容为糗事百科文字内容中的信息,如图所示: 爬取糗事百科文字35页的信息,通过手动浏览,以下为前四页的网 ...
Python爬虫基于Beautiful Soup的糗事百科爬虫
python爬虫 ---- 糗事百科爬虫首先进入糗事百科官网首页 -> 糗事百科本次爬虫的目标是翻页爬取糗事百科的信息,包括标题, 链接, 作者名, 好笑数&评论数之后右键检查, ...
爬虫实战1：爬取糗事百科段子
本文主要展示利用python3.7+urllib实现一个简单无需登录爬取糗事百科段子实例. 如何获取网页源代码对网页源码进行正则分析,爬取段子对爬取数据进行再次替换&删除处理易于阅读 0. ...
Android实战——jsoup实现网络爬虫，糗事百科项目的起步
Android实战--jsoup实现网络爬虫,爬糗事百科主界面本篇文章包括以下内容: 前言 jsoup的简介 jsoup的配置 jsoup的使用结语前言对于Android初学者想要做项目时,最 ...
python爬虫之糗事百科
历经1个星期的实践,终于把python爬虫的第一个实践项目完成了,此时此刻,心里有的只能用兴奋来形容,后续将继续加工,把这个做成一个小文件,发给同学,能够在cmd中运行的文件.简化版程序,即单单爬取页 ...

(python)查看糗事百科文字点赞作者等级评论