Web Scraping Tutorial for Beginners | Scraping Weibo Hot Search Data with Selenium

Scrape Weibo's real-time, hot-topic, trend, and celebrity hot search lists, and save each one to a CSV file in a fixed format.

The code is as follows:

```python
# coding=utf-8
import re
import requests
import xlwt
from bs4 import BeautifulSoup
from selenium import webdriver

# Raw string so the backslashes in the Windows path are not read as escapes.
# Newer Selenium 4 releases expect a Service object instead of a positional path.
driver = webdriver.Chrome(r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver")
driver.set_window_size(1080, 800)
driver.implicitly_wait(10)

user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
headers = {'User-Agent': user_agent}


class weibo():
    def __init__(self, url, filename):
        self.url = url
        self.filename = filename

    def sousuo(self):
        """Fetch one hot search list and save rank, keyword, and heat index."""
        url = self.url
        filename = self.filename
        # Open the page in Chrome. The HTML parsed below comes from requests.
        driver.get(url)

        myfile = xlwt.Workbook()
        table = myfile.add_sheet(u'filename', cell_overwrite_ok=True)
        # Header row: rank, keyword, hot search index.
        table.write(0, 0, u"排名")
        table.write(0, 1, u"关键词")
        table.write(0, 2, u"热搜指数")

        r = requests.get(url, headers=headers)
        html = r.text
        print(html)
        soup = BeautifulSoup(html, 'html.parser')

        # Column 1: keyword text of each hot search link.
        i = 1
        for tag in soup.find_all(href=re.compile("Refer=top"), target="_blank"):
            if tag.string is not None:
                print(tag.string)
                table.write(i, 1, tag.string)
                i += 1

        # Column 0: rank, one per hot search link.
        j = 1
        for tag in soup.find_all(href=re.compile("Refer=top"), target="_blank"):
            print(j)
            table.write(j, 0, j)
            j += 1

        # Column 2: hot search index.
        z = 1
        for tag in soup.find_all(class_="star_num"):
            if tag.string is not None:
                print(tag.string)
                table.write(z, 2, tag.string)
                z += 1

        # Note: xlwt writes the binary Excel format even with a .csv extension.
        filename = str(filename) + ".csv"
        myfile.save(filename)


# File names: real-time, hot-topic, trend, and celebrity hot search lists.
s1 = weibo('http://s.weibo.com/top/summary?cate=realtimehot', '实时热搜榜')
s1.sousuo()
s2 = weibo('http://s.weibo.com/top/summary?cate=total&key=all', '热点热搜榜')
s2.sousuo()
s3 = weibo('http://s.weibo.com/top/summary?cate=total&key=films', '潮流热搜榜')
s3.sousuo()
s4 = weibo('http://s.weibo.com/top/summary?cate=total&key=person', '名人热搜榜')
s4.sousuo()

# Close the browser once all four lists are saved.
driver.quit()
```
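The script saves its results through xlwt, which produces an Excel workbook rather than plain text, so the `.csv` files it creates will not open as comma-separated text. If true CSV output is wanted, a minimal sketch with Python's standard `csv` module might look like this. The helper name `write_hot_search_csv` and its `rows` parameter are hypothetical and not part of the original script.

```python
import csv


def write_hot_search_csv(path, rows):
    """Write (rank, keyword, hot search index) rows to a plain-text CSV file.

    Hypothetical helper: `rows` is assumed to be an iterable of 3-item
    tuples collected while parsing the page.
    """
    # utf-8-sig prepends a BOM so that Excel recognizes the UTF-8 encoding
    # of the Chinese header and keywords; newline="" is what the csv module
    # expects when it manages line endings itself.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        # Same header row as the script: rank, keyword, hot search index.
        writer.writerow(["排名", "关键词", "热搜指数"])
        writer.writerows(rows)
```

Inside `sousuo`, the three loops could append to a list of rows and pass it to this helper instead of calling `table.write` and `myfile.save`.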
