On the home page of Sina's Weibo Index (微指数), entering a keyword such as 欢乐颂 redirects to: http://data.weibo.com/index/hotword?wid=1091324230349&wname=欢乐颂

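I don't know what kind of ID wid is, or how it relates to the wid values of other keywords, so I removed the parameter and sent the request again. The page still loads without it.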
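
A minimal sketch of building that page URL without wid. It is written for Python 3, unlike the Python 2 listings in this post, and the helper name hotword_url is made up for illustration; only the host, path, and wname parameter come from the URL above.

```python
from urllib.parse import quote

BASE = "http://data.weibo.com/index/hotword"


def hotword_url(keyword):
    """Return the hotword page URL for a keyword, leaving out wid."""
    # quote() percent-encodes the UTF-8 bytes of the keyword.
    return BASE + "?wname=" + quote(keyword)


print(hotword_url("欢乐颂"))
```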

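The hot-word trend is a chart; moving the mouse over it shows the figure for each day, just like 360 Index and Baidu Index.

Also like 360 Index, Weibo Index returns all of the data as JSON in a single request.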

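With a network-inspection tool we can find a request to http://data.weibo.com/index/ajax/getchartdata?month=default&__rnd=1464188164238, whose response holds all of the data for the overall trend and the PC & mobile trend.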

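What I have not figured out yet is that the __rnd value differs for every keyword. I don't know how to obtain this value, what pattern it follows, or how to get this URL automatically. If I can't solve that, I can only collect data for a single keyword.

So for now, the code below collects a single keyword.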

#coding=utf-8
# Python 2 code
import sys
reload(sys)
sys.setdefaultencoding( "utf-8" )
import requests
import urllib

class xl():
    def pc(self):
        # Request the chart-data URL as captured, with no extra headers
        r=requests.get("http://data.weibo.com/index/ajax/getchartdata?month=default&__rnd=1464188164238")
        return r.text

x=xl()
print x.pc()

Result:

csrf

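That is clearly a cross-site request forgery (CSRF) check rejecting the request, so we need to send the browser's request headers along with it.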

#coding=utf-8
# Python 2 code
import sys
reload(sys)
sys.setdefaultencoding( "utf-8" )
import requests
import urllib

class xl():
    def pc(self,name):
        # Percent-encode the keyword so it can go into the Referer header
        url_name=urllib.quote(name)
        # Request headers copied from the browser
        headers={
            'Host': 'data.weibo.com',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
            'Accept-Encoding': 'gzip, deflate',
            'Content-Type': 'application/x-www-form-urlencoded',
            'X-Requested-With': 'XMLHttpRequest',
            'Referer': 'http://data.weibo.com/index/hotword?wname='+url_name,
            'Cookie': 'UOR=www.baidu.com,data.weibo.com,www.baidu.com; SINAGLOBAL=1213237876483.9214.1464074185942; ULV=1464183246396:2:2:2:3463179069239.6826.1464183246393:1464074185944; DATA=usrmdinst_12; _s_tentry=www.baidu.com; Apache=3463179069239.6826.1464183246393; WBStore=8ca40a3ef06ad7b2|undefined; PHPSESSID=3mn5oie7g3cm954prqan14hbg5',
            'Connection': 'keep-alive'
        }
        r=requests.get("http://data.weibo.com/index/ajax/getchartdata?month=default&__rnd=1464188164238",headers=headers)
        return r.text

x=xl()
print x.pc("欢乐颂")

Result:

{"data":[{"zt":[{"day_key":"2016-04-25","wid":"1091324230349","value":"224539"},{"day_key":"2016-04-26","wid":"1091324230349","value":"157686"},{"day_key":"2016-04-27","wid":"1091324230349","value":"180757"},{"day_key":"2016-04-28","wid":"1091324230349","value":"219171"},{"day_key":"2016-04-29","wid":"1091324230349","value":"165993"},{"day_key":"2016-04-30","wid":"1091324230349","value":"141948"},{"day_key":"2016-05-01","wid":"1091324230349","value":"126398"},{"day_key":"2016-05-02","wid":"1091324230349","value":"174244"},{"day_key":"2016-05-03","wid":"1091324230349","value":"180751"},{"day_key":"2016-05-04","wid":"1091324230349","value":"212351"},{"day_key":"2016-05-05","wid":"1091324230349","value":"252814"},{"day_key":"2016-05-06","wid":"1091324230349","value":"340472"},{"day_key":"2016-05-07","wid":"1091324230349","value":"316276"},{"day_key":"2016-05-08","wid":"1091324230349","value":"260587"},{"day_key":"2016-05-09","wid":"1091324230349","value":"222790"},{"day_key":"2016-05-10","wid":"1091324230349","value":"200010"},{"day_key":"2016-05-11","wid":"1091324230349","value":"224717"},{"day_key":"2016-05-12","wid":"1091324230349","value":"166743"},{"day_key":"2016-05-13","wid":"1091324230349","value":"103426"},{"day_key":"2016-05-14","wid":"1091324230349","value":"135842"},{"day_key":"2016-05-15","wid":"1091324230349","value":"75692"},{"day_key":"2016-05-16","wid":"1091324230349","value":"68669"},{"day_key":"2016-05-17","wid":"1091324230349","value":"79509"},{"day_key":"2016-05-18","wid":"1091324230349","value":"110907"},{"day_key":"2016-05-19","wid":"1091324230349","value":"44296"},{"day_key":"2016-05-20","wid":"1091324230349","value":"82582"},{"day_key":"2016-05-21","wid":"1091324230349","value":"41602"},{"day_key":"2016-05-22","wid":"1091324230349","value":"27270"},{"day_key":"2016-05-23","wid":"1091324230349","value":"31520"},{"day_key":"2016-05-24","wid":"1091324230349","value":"29199"},{"word":"\u6b22\u4e50\u9882"}],"yd":[{"daykey":"2016-04-25","pc":"67488",
"mobile":"157051"},{"daykey":"2016-04-26","pc":"47711","mobile":"109975"},{"daykey":"2016-04-27","pc":"43718","mobile":"137039"},{"daykey":"2016-04-28","pc":"43571","mobile":"175600"},{"daykey":"2016-04-29","pc":"42836","mobile":"123157"},{"daykey":"2016-04-30","pc":"41607","mobile":"100341"},{"daykey":"2016-05-01","pc":"25525","mobile":"100873"},{"daykey":"2016-05-02","pc":"45209","mobile":"129035"},{"daykey":"2016-05-03","pc":"52973","mobile":"127778"},{"daykey":"2016-05-04","pc":"57490","mobile":"154861"},{"daykey":"2016-05-05","pc":"71589","mobile":"181225"},{"daykey":"2016-05-06","pc":"133376","mobile":"207096"},{"daykey":"2016-05-07","pc":"92976","mobile":"223300"},{"daykey":"2016-05-08","pc":"49791","mobile":"210796"},{"daykey":"2016-05-09","pc":"62232","mobile":"160558"},{"daykey":"2016-05-10","pc":"59730","mobile":"140280"},{"daykey":"2016-05-11","pc":"80675","mobile":"144042"},{"daykey":"2016-05-12","pc":"81176","mobile":"85567"},{"daykey":"2016-05-13","pc":"40298","mobile":"63128"},{"daykey":"2016-05-14","pc":"42531","mobile":"93311"},{"daykey":"2016-05-15","pc":"13055","mobile":"62637"},{"daykey":"2016-05-16","pc":"20792","mobile":"47877"},{"daykey":"2016-05-17","pc":"41057","mobile":"38452"},{"daykey":"2016-05-18","pc":"70896","mobile":"40011"},{"daykey":"2016-05-19","pc":"13487","mobile":"30809"},{"daykey":"2016-05-20","pc":"45656","mobile":"36926"},{"daykey":"2016-05-21","pc":"20755","mobile":"20847"},{"daykey":"2016-05-22","pc":"7732","mobile":"19538"},{"daykey":"2016-05-23","pc":"13396","mobile":"18124"},{"daykey":"2016-05-24","pc":"10143","mobile":"19056"}]}],"len":1,"keyword":["\u6b22\u4e50\u9882"]}

That gets us the complete JSON.

zt is the overall trend data.

yd is the PC & mobile trend data.

"keyword":["the keyword goes here"]

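The structure described above can be flattened into one row per day. This is a sketch in Python 3 (the listings in this post are Python 2), and parse_chart is a hypothetical helper; the sample string is a trimmed copy of the response above, keeping only the first two days.

```python
import json

# Trimmed copy of the response shown above: first two days only.
sample = r'''{"data":[{"zt":[
{"day_key":"2016-04-25","wid":"1091324230349","value":"224539"},
{"day_key":"2016-04-26","wid":"1091324230349","value":"157686"},
{"word":"\u6b22\u4e50\u9882"}],
"yd":[
{"daykey":"2016-04-25","pc":"67488","mobile":"157051"},
{"daykey":"2016-04-26","pc":"47711","mobile":"109975"}]}],
"len":1,"keyword":["\u6b22\u4e50\u9882"]}'''


def parse_chart(text):
    """Return (keyword, rows); each row is (day, total, pc, mobile)."""
    payload = json.loads(text)
    series = payload["data"][0]
    # The last element of "zt" is {"word": ...} with no day_key, so skip it.
    total = {d["day_key"]: int(d["value"])
             for d in series["zt"] if "day_key" in d}
    rows = []
    for d in series["yd"]:
        day = d["daykey"]  # spelled "daykey" here but "day_key" in "zt"
        rows.append((day, total.get(day), int(d["pc"]), int(d["mobile"])))
    return payload["keyword"][0], rows


keyword, rows = parse_chart(sample)
for row in rows:
    print(keyword, *row)
```

For these two sample days the overall value equals pc plus mobile (67488 + 157051 = 224539), which is a handy sanity check on the parsed rows.
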
I tried a few more keywords and looked at the URL http://data.weibo.com/index/ajax/getchartdata?month=default&__rnd=xxxxxx each time. The __rnd parameter can be left empty, and it is probably a timestamp.
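
One way to check the timestamp guess is to decode the captured value as milliseconds since the Unix epoch. Under that assumption it falls on 2016-05-25 (UTC), the day after the last data point in the response, which fits. The sketch below is Python 3, and decode_rnd and chart_url are hypothetical helpers; whether the server accepts a freshly generated value has not been verified here.

```python
import time
from datetime import datetime, timezone

ENDPOINT = "http://data.weibo.com/index/ajax/getchartdata"


def decode_rnd(rnd):
    """Interpret __rnd as milliseconds since the Unix epoch (an assumption)."""
    return datetime.fromtimestamp(rnd / 1000, tz=timezone.utc)


def chart_url(now=None):
    """Build the chart-data URL with a millisecond timestamp as __rnd."""
    if now is None:
        now = time.time()
    return "%s?month=default&__rnd=%d" % (ENDPOINT, int(now * 1000))


print(decode_rnd(1464188164238))  # the value captured in this post
print(chart_url())                # a URL stamped with the current time
```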
