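Scraping Qzone content with Python: capturing traffic with Fiddler and using the requests library to scrape Qzone posts ("shuoshuo") and write them to a file...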

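#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
"""
@author: 东方不败

QQ Zone (Qzone) status post ("shuoshuo") scraper

1. Log in to Qzone and obtain the cookie
2. Use Fiddler to capture the URL that serves the Qzone posts (Fiddler shows the response parsed as JSON, but my own attempt to parse it did not succeed; still to be investigated)
3. Fetch the page data with the requests library
4. Use re to match the page content and extract each post's createTime and content
5. Write the results to a txt file
"""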

class Spider:
    """emotionSpider class"""

    def __init__(self, cookie):
        """Initialize the spider"""
        self.page = 0  # number of pages scraped so far
        self.counts = 0  # number of posts scraped so far
        self.url = "https://user.qzone.com"  # URL to scrape
        self.cookie = cookie  # login cookie

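    def get(self, url):
        """Fetch the page with requests and return it as a str"""
        import requests  # import the requests library

        # Build the request headers
        headers = {
            "Cookie": self.cookie,
            "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36",
        }
        self.url = url
        # Important: verify=False is essential here. It turns off HTTPS certificate
        # verification; without it the request may raise an error (a pitfall I ran into)
        try:
            response = requests.get(self.url, headers=headers, verify=False).content
            html = str(response, 'utf-8')
            return html
        except Exception as err:
            # "Error:" + err would raise TypeError, so pass err as a separate argument
            print("Error:", err)
            return ""  # empty page: nothing will match, so the caller stops cleanly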

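    def analyse(self, content):
        """Use re to match createTime and con"""
        import re

        content = content or ""  # guard against a missing page (e.g. a failed request)
        con = re.findall(r'"con":"([^"]+)', content)
        createTime = re.findall(r'"createTime":"([^"]+)', content)
        # Pair each createTime with the con at the same position
        emotion = tuple(zip(createTime, con))
        return emotion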

    def saveEmotin(self, emotion):
        """Append the (createTime, con) pairs to a TXT file"""
        for i in emotion:
            # Pass encoding='utf-8' explicitly to avoid Unicode encoding errors when saving (a pitfall I ran into)
            try:
                with open("emotion.txt", "a+", encoding='utf-8') as f:
                    f.write(i[0])
                    f.write(":")
                    f.write(i[1])
                    f.write("\n")
                self.counts += 1
                print("Saved "+str(self.counts)+" posts")
            except Exception as err:
                # "Error:" + err would raise TypeError, so pass err as a separate argument
                print("Error:", err)
        self.page += 1
        print("Finished scraping page "+str(self.page))
        print("#"*50)

if __name__ == "__main__":
    pos = 0
    url = ":"  # URL captured with Fiddler
    cookie = ":"  # cookie
    spider = Spider(cookie)
    while True:
        emotion = spider.analyse(spider.get(url))
        if len(emotion) > 0:
            spider.saveEmotin(emotion)
            pos += 20  # pos selects the next page (20 posts per page)
            url = "https://user.qzone.qq.com/proxy/domain/taotao.qq.com/cgi-bin/emotion_cgi_msglist_v6?uin=65&ftype=0&sort=0&pos="+str(pos)+"&num=20"
        else:
            # An empty result means there are no more posts (or the request failed)
            print("+"*50)
            print("Scraping finished!!! Total posts scraped: "+str(spider.counts))
            break
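
Step 2 of the docstring notes that Fiddler shows the response as JSON while parsing it directly did not work. One possible explanation, which is an assumption and not something the original confirms, is that the endpoint wraps its JSON in a JSONP callback such as `_Callback({...});`, which `json.loads` rejects. The sketch below strips such a wrapper before parsing. The helper names `parse_jsonp` and `extract_posts`, the `msglist` and `content` field names, and the sample payload are all hypothetical and should be checked against a real response captured in Fiddler.

```python
import json
import re


def parse_jsonp(text):
    """Strip a JSONP wrapper like `_Callback({...});` if present, then parse the JSON."""
    match = re.search(r"^[^(]*\((.*)\)\s*;?\s*$", text.strip(), re.S)
    payload = match.group(1) if match else text
    return json.loads(payload)


def extract_posts(data):
    """Return (createTime, content) pairs; the field names are assumptions."""
    posts = []
    for item in data.get("msglist") or []:
        posts.append((item.get("createTime", ""), item.get("content", "")))
    return posts


# Hypothetical sample response, for illustration only
sample = (
    '_Callback({"code":0,"msglist":['
    '{"createTime":"2018-01-01","content":"hello"},'
    '{"createTime":"2018-01-02","content":"world"}]});'
)
posts = extract_posts(parse_jsonp(sample))
print(posts)
```

If the real response does have this shape, reading both fields from the same object keeps each createTime attached to its own post, whereas zipping two independent `re.findall` results can drift out of step when one of the fields is missing or appears more than once in a post.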
