[Crawler] Analyzing Weibo Comment Data with Python

Original article: "Scraping Weibo comment data with Python: the crawler's road never ends (source code included)", from a Python-focused blog on CSDN.

#!/usr/bin/env python
# -*- coding: utf8 -*-
from datetime import datetime
from requests_html import HTMLSession
import re, time
import csv
# import tkinter as tk
import urllib3

# Suppress the warnings triggered by verify=False
urllib3.disable_warnings()

session = HTMLSession()
user_url = 'https://weibo.com/2318265821/KrBA7lvW4#comment'
pass_wd = 'WEIBOCN_FROM=1110005030; SUB=_2A25Mx3mlDeRhGeNM41sV8i7KyzWIHXVsSAftrDV6PUJbkdANLUfEkW1NSeR9M3dIjq3lBi61DJC0D26LvrU8YMVV; MLOGIN=1; _T_WM=14744352522; XSRF-TOKEN=781dcc'

# Output file ('评论.csv' means 'comments.csv'); the explicit encoding keeps
# Chinese text and emoji writable regardless of the platform default
f = open(r'评论.csv', 'a+', newline='', encoding='utf-8-sig')
fileheader = ['a', 'screen_names', 'genders', 'std_create_times', 'texts', 'like_counts']
fp = csv.DictWriter(f, fileheader)  # define the header
fp.writeheader()  # write the header
fp = csv.writer(f)


class WBSpider(object):

    def main(self, user_url, pass_wd):
        i = 1  # page counter
        a = 1  # running row number
        headers_1 = {
            'cookie': pass_wd,
            'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.0 Safari/537.36'
        }
        headers_2 = {
            "referer": "https://m.weibo.cn/status/Kk9Ft0FIg?jumpfrom=weibocom",
            'cookie': pass_wd,
            'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Mobile Safari/537.36'
        }
        # user_url = 'https://weibo.com/2318265821/KrBA7lvW4#comment'
        # print(re.findall('/(.*?)#', user_url))
        # Take the last path segment of the URL (the status code before '#')
        uid_1 = re.findall('/(.*?)#', user_url)[0]
        uid_2 = uid_1.split('/', 3)[3]
        # print(uid_2)
        url_1 = f'https://weibo.com/ajax/statuses/show?id={uid_2}'
        prox = ''
        response = session.get(url_1, proxies={'http': prox, 'https': prox},
                               headers=headers_1, verify=False).content.decode()
        # print(response)
        weibo_id = re.findall('"id":(.*?),"idstr"', response)[0]
        # print(weibo_id)
        # Build the starting URL
        start_url = f'https://m.weibo.cn/comments/hotflow?id={weibo_id}&mid={weibo_id}&max_id_type=0'
        # 2. Send the request and parse the response for the starting URL
        prox = ''
        response = session.get(start_url, proxies={'http': prox, 'https': prox},
                               headers=headers_2, verify=False).json()
        # Extract max_id and max_id_type, which drive pagination
        max_id = response['data']['max_id']
        max_id_type = response['data']['max_id_type']
        b = len(response['data']['data'])  # number of comments on this page
        print('Count', b)
        # Build the GET parameters for the next page
        data = {
            'id': weibo_id,
            'mid': weibo_id,
            'max_id': max_id,
            'max_id_type': max_id_type
        }
        # Parse the comments on this page
        self.parse_response_data(response, i, a)
        i += 1
        a += b
        print('Total count', a - 1)
        # Stop when the API reports no further page
        # (a max_id of 0 is assumed to mean the last page)
        if max_id == 0:
            return
        # Pass the parameters on to the paging method
        self.parse_page_func(data, weibo_id, headers_2, i, a)

    def parse_page_func(self, data, weibo_id, headers_2, i, a):
        start_url = 'https://m.weibo.cn/comments/hotflow?'
        prox = ''
        response = session.get(start_url, proxies={'http': prox, 'https': prox},
                               headers=headers_2, params=data, verify=False).json()
        # Extract max_id and max_id_type, which drive pagination
        max_id = response['data']['max_id']
        max_id_type = response['data']['max_id_type']
        b = len(response['data']['data'])  # number of comments on this page
        print('Count:', b)
        # Build the GET parameters for the next page
        data = {
            'id': weibo_id,
            'mid': weibo_id,
            'max_id': max_id,
            'max_id_type': max_id_type
        }
        # Parse the comments on this page
        self.parse_response_data(response, i, a)
        i += 1
        a += b
        print('Total count', a - 1)
        # Stop when the API reports no further page
        # (a max_id of 0 is assumed to mean the last page)
        if max_id == 0:
            return
        # Recurse to fetch the next page
        self.parse_page_func(data, weibo_id, headers_2, i, a)

    def parse_response_data(self, response, i, a):
        """Extract the comment content from the response."""
        # The full list of comments on this page
        data_list = response['data']['data']
        # print(data_list)
        for data_json_dict in data_list:
            # Extract the comment content
            try:
                texts_1 = data_json_dict['text']
                # Replace the <span> markup with the alt text it contains
                # (re.sub arguments: pattern, replacement, target string)
                alts = ''.join(re.findall(r'alt=(.*?) ', texts_1))
                texts = re.sub("<span.*?</span>", alts, texts_1)
                # Number of likes
                like_counts = str(data_json_dict['like_count'])
                # Comment time: parse the raw string, which carries a UTC
                # offset, into a standard datetime string
                created_at = data_json_dict['created_at']
                std_transfer = '%a %b %d %H:%M:%S %z %Y'
                std_create_times = str(datetime.strptime(created_at, std_transfer))
                # Gender: the API returns 'f' for female
                # ('女' = female, '男' = male)
                gender = data_json_dict['user']['gender']
                genders = '女' if gender == 'f' else '男'
                # User name
                screen_names = data_json_dict['user']['screen_name']
                # print(a, screen_names, genders, std_create_times, texts, like_counts)
                data = [a, screen_names, genders, std_create_times, texts, like_counts]
                print(data)
                fp.writerow(data)
                print()
                a = a + 1
            except Exception as e:
                # Skip comments that fail to parse
                continue
        print('*******************************************************************************************')
        print()
        print(f'***** Finished printing page {i} of comments *****')


if __name__ == '__main__':
    w = WBSpider()
    w.main(user_url, pass_wd)
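The script above pages through the comments by having parse_page_func call itself once per page. For a post with many pages that can run into Python's recursion limit, so the same paging logic can also be written as a loop. The following is only a minimal sketch, not the original author's method: `iter_comment_pages`, `fetch_page`, `max_pages`, `delay`, and `fake_fetch` are hypothetical names introduced for illustration, and treating a `max_id` of 0 as the end of the comments is an assumption about the hotflow endpoint. The network call is passed in as a function so the loop can be exercised offline with canned data.

```python
import time
from typing import Callable, Dict, Iterator, List, Optional


def iter_comment_pages(
    fetch_page: Callable[[Dict[str, object]], dict],
    weibo_id: str,
    max_pages: Optional[int] = None,
    delay: float = 0.0,
) -> Iterator[List[dict]]:
    """Yield one list of comment dicts per page, without recursion."""
    # The first request carries no max_id, mirroring the starting URL
    params: Dict[str, object] = {'id': weibo_id, 'mid': weibo_id, 'max_id_type': 0}
    page = 0
    while max_pages is None or page < max_pages:
        payload = fetch_page(params)
        data = payload.get('data')
        if not data or not data.get('data'):
            break  # no comment list in the response: stop
        yield data['data']
        page += 1
        max_id = data.get('max_id', 0)
        if not max_id:
            break  # assumption: max_id == 0 marks the last page
        params = {
            'id': weibo_id,
            'mid': weibo_id,
            'max_id': max_id,
            'max_id_type': data.get('max_id_type', 0),
        }
        if delay:
            time.sleep(delay)  # optional pause between requests


def fake_fetch(params: Dict[str, object]) -> dict:
    """Canned two-page response standing in for the real HTTP call."""
    canned = {
        None: {'data': {'data': [{'text': 'a'}, {'text': 'b'}],
                        'max_id': 111, 'max_id_type': 0}},
        111: {'data': {'data': [{'text': 'c'}],
                       'max_id': 0, 'max_id_type': 0}},
    }
    return canned[params.get('max_id')]


pages = list(iter_comment_pages(fake_fetch, '123'))
for number, comments in enumerate(pages, start=1):
    print(number, [c['text'] for c in comments])
```

With the real session, `fetch_page` could be a small function that returns `session.get('https://m.weibo.cn/comments/hotflow', params=params, headers=headers_2, verify=False).json()`, and each yielded list could be handed to the existing row-writing code. A non-zero `delay` spaces out the requests.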
