Use the requests, pyquery, and xlwings libraries to scrape the posts of a personal Weibo account.

(1) Target URL to scrape

(2) Analyze the structure of the Weibo page with Chrome or the 360 browser.
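The scripts in this post request the Ajax endpoint `https://m.weibo.cn/api/container/getIndex?` with the query parameters `type`, `value`, `containerid`, and `page`. As a minimal sketch of how such a URL is assembled, the helper below (the name `build_url` is hypothetical, not part of the original scripts) builds the address for a given page so it can be compared with what the browser's network panel shows:

```python
from urllib.parse import urlencode

# Endpoint as used by the scripts in this post.
BASE_URL = 'https://m.weibo.cn/api/container/getIndex?'


def build_url(page, uid='2830678474', containerid='1076032830678474'):
    """Return the Ajax URL for one page of the user's timeline.

    The default uid and containerid are the values used in this post.
    """
    params = {
        'type': 'uid',
        'value': uid,
        'containerid': containerid,
        'page': page,
    }
    return BASE_URL + urlencode(params)


if __name__ == '__main__':
    print(build_url(1))
```

Only `page` changes from one request to the next, which is what makes the simple page loop in the scripts possible.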

(3) Organize the code into separate functions, one per task.

(4) Scrape 100 Weibo posts.
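The scripts in this post loop over a fixed range of pages. If the goal is exactly 100 posts, one possible approach is to keep paging until the limit is reached. The sketch below is an illustration only: `collect_posts`, `fetch_page`, and `max_pages` are hypothetical names, and `fetch_page` stands for any function that returns the list of posts on a page.

```python
def collect_posts(fetch_page, limit=100, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... until `limit` posts are collected.

    fetch_page is any callable that takes a page number and returns a list of
    posts (an empty list or None when there is nothing more to fetch).
    """
    posts = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:               # no more data, stop early
            break
        posts.extend(batch)
        if len(posts) >= limit:     # enough posts collected
            break
    return posts[:limit]            # trim the overshoot from the last page


if __name__ == '__main__':
    # Stand-in for a real request: every page yields 9 made-up posts.
    def fake_fetch(page):
        return [{'id': '{}-{}'.format(page, i)} for i in range(9)]

    print(len(collect_posts(fake_fetch)))
```

Passing the fetch function in as a parameter keeps the paging logic independent of the network code, so it can be tried with fake data first.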

# -*- coding: utf-8 -*-
from urllib.parse import urlencode
import requests
from pyquery import PyQuery as pq
import xlwings as xw


def get_page(page):
    """Request one page of the user's timeline and return the decoded JSON."""
    global base_url
    headers = {
        'Host': 'm.weibo.cn',
        'Referer': 'https://m.weibo.cn/u/2830678474',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
        # Cookie copied from a browser session; replace it with your own.
        'Cookie': 'M_WEIBOCN_PARAMS=oid%3D4703960021074409%26luicode%3D10000011%26lfid%3D1076032830678474; expires=Wed, 17-Nov-2021 00:39:42 GMT; Max-Age=600; path=/; domain=.weibo.cn; HttpOnly'
    }
    params = {
        'type': 'uid',
        'value': '2830678474',
        'containerid': '1076032830678474',
        'page': page
    }
    url = base_url + urlencode(params)
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.json()
    except requests.ConnectionError as e:
        print('Error', e.args)


def parse_page(json_):
    """Append one row per post on the page to the global wblist."""
    global wblist
    if json_:
        items = json_.get('data').get('cards')
        for item in items:
            item = item.get('mblog')
            if item is None:  # skip cards that do not carry a post
                continue
            print(item)
            wblist.append([
                item.get('id'),
                pq(item.get('text')).text(),  # strip the HTML markup
                item.get('attitudes_count'),
                item.get('comments_count'),
                item.get('reposts_count')
            ])


if __name__ == '__main__':
    wblist = [['id', 'text', 'attitudes', 'comments', 'reposts']]
    base_url = 'https://m.weibo.cn/api/container/getIndex?'
    for page in range(1, 20):
        json_ = get_page(page)
        parse_page(json_)
    # Write to the Excel file (data.xlsx and its sheet 'Sheet4' must already exist)
    wb = xw.Book('./data.xlsx')
    sht = wb.sheets('Sheet4')
    sht.range('a1').value = wblist  # write the rows to the sheet, starting at A1
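To make the parsing step easier to follow, here is a small sketch that runs the same field extraction on a hand-made response. The `sample` dictionary is invented for illustration and only mimics the keys the script above reads (`data`, `cards`, `mblog`, `id`, `text`, `attitudes_count`, `comments_count`, `reposts_count`); a real response contains many more fields. The function name `extract_rows` is hypothetical, and the pyquery cleanup of `text` is left out so the sketch needs no third-party packages.

```python
def extract_rows(json_):
    """Return [id, text, attitudes, comments, reposts] rows for one page."""
    rows = []
    # Tolerate a missing response, a missing 'data' key, or a missing 'cards' key.
    cards = ((json_ or {}).get('data') or {}).get('cards') or []
    for card in cards:
        mblog = card.get('mblog')
        if not mblog:               # some cards may not carry a post
            continue
        rows.append([
            mblog.get('id'),
            mblog.get('text'),      # raw HTML; the script above cleans it with pyquery
            mblog.get('attitudes_count'),
            mblog.get('comments_count'),
            mblog.get('reposts_count'),
        ])
    return rows


# Made-up sample with one post card and one card without a post.
sample = {
    'data': {
        'cards': [
            {'mblog': {'id': '1', 'text': '<span>hello</span>',
                       'attitudes_count': 3, 'comments_count': 2,
                       'reposts_count': 1}},
            {'card_type': 11},
        ]
    }
}

if __name__ == '__main__':
    for row in extract_rows(sample):
        print(row)
```

Returning the rows instead of appending to a global list makes the function easy to check on sample data before pointing it at the live endpoint.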

# -*- coding: utf-8 -*-
from urllib.parse import urlencode
import requests
from pyquery import PyQuery as pq
import json
base_url = 'https://m.weibo.cn/api/container/getIndex?'
headers = {
    'Host': 'm.weibo.cn',
    'Referer': 'https://m.weibo.cn/u/2830678474',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    # Cookie copied from a browser session; replace it with your own.
    'Cookie': 'M_WEIBOCN_PARAMS=oid%3D4703960021074409%26luicode%3D10000011%26lfid%3D1076032830678474; expires=Wed, 17-Nov-2021 00:39:42 GMT; Max-Age=600; path=/; domain=.weibo.cn; HttpOnly'
}


def get_page(page):
    """Request one page of the user's timeline and return the decoded JSON."""
    params = {
        'type': 'uid',
        'value': '2830678474',
        'containerid': '1076032830678474',
        'page': page
    }
    url = base_url + urlencode(params)
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.json()
    except requests.ConnectionError as e:
        print('Error', e.args)


def parse_page(json_):
    """Yield one dict per post on the page."""
    if json_:
        items = json_.get('data').get('cards')
        for item in items:
            item = item.get('mblog')
            if item is None:  # skip cards that do not carry a post
                continue
            yield {
                'id': item.get('id'),
                'text': pq(item.get('text')).text(),  # strip the HTML markup
                'attitudes': item.get('attitudes_count'),
                'comments': item.get('comments_count'),
                'reposts': item.get('reposts_count'),
            }


def write_to_file(content):
    """
    Store one record: serialize the dict with json.dumps() and append it
    to a text file.
    :param content: dict describing one post
    :return: None
    """
    with open('result.txt', 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + ',\n')


if __name__ == '__main__':
    count = 0
    for page in range(1, 16):
        json_ = get_page(page)
        results = parse_page(json_)
        for result in results:
            print(result)
            count += 1
            write_to_file(result)
    print("Number of posts scraped:", count)
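Because `write_to_file` appends one JSON object per line followed by a comma, `result.txt` as a whole is not a valid JSON document. A minimal sketch for loading it back, assuming the file was produced by the script above (the function name `read_results` is hypothetical):

```python
import json


def read_results(path='result.txt'):
    """Load the records written by write_to_file into a list of dicts."""
    records = []
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:            # ignore blank lines
                continue
            if line.endswith(','):  # drop the trailing comma added when writing
                line = line[:-1]
            records.append(json.loads(line))
    return records
```

Calling `read_results()` in the directory where the script ran should return one dict per scraped post, with the keys `id`, `text`, `attitudes`, `comments`, and `reposts`.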

Results


