Scraping the Sina Weibo Hot Search List with a Python Crawler

Contents

- Scraping the Sina Weibo Hot Search List with a Python Crawler
Page analysis
Fetching the data
Storing the data
Complete code

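Page analysis

Inspect the page to find the rank, title, and heat value of each hot-search entry. All three sit inside the same table row, so they share a common XPath prefix and differ only in the last few steps of the path.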
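The idea of one shared row path plus short relative paths can be tried offline. The following is a minimal sketch, not the article's own code: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra packages, and the HTML fragment and its entries are made up for illustration. It assumes the real table has the same shape (rank in the first cell, title link and heat value in the second).

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the hot-search table; the entries are made up.
SAMPLE = """
<div>
  <div id="pl_top_realtimehot">
    <table>
      <tbody>
        <tr><td>1</td><td><a href="#">Example topic A</a><span>12345</span></td></tr>
        <tr><td>2</td><td><a href="#">Example topic B</a><span>6789</span></td></tr>
      </tbody>
    </table>
  </div>
</div>
""".strip()

root = ET.fromstring(SAMPLE)
# Shared part of the path: select every row of the table.
rows = root.findall(".//*[@id='pl_top_realtimehot']/table/tbody/tr")

entries = []
for row in rows:
    # Relative part: each field is addressed starting from the row itself.
    rank = row.findtext("td[1]", default="")
    title = row.findtext("td[2]/a", default="")
    heat = row.findtext("td[2]/span", default="")
    entries.append((rank, title, heat))

print(entries)
```

With lxml the same relative paths are written with a trailing text() step, for example td[2]/a/text(), because lxml's xpath() returns the matching text nodes rather than the element.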

Fetching the data

import requests
from lxml import etree

url = 'https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6'
# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
}
response = requests.get(url, headers=headers)
# print(response.text)  # uncomment to inspect the raw HTML
html = etree.HTML(response.text)
# One <tr> per hot-search entry
datas = html.xpath('//*[@id="pl_top_realtimehot"]/table/tbody/tr')
for data in datas:
    data_title = data.xpath('td[2]/a/text()')  # XPath of the title, relative to the row
    print(data_title)

Output
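Because xpath() with a text() step returns a list of strings, each title printed by the loop above is a list rather than a plain string, and a row that lacks the field yields an empty list. A small sketch of turning such a result into a single string follows; the helper name flatten and the sample values are hypothetical and only illustrate the shape of the data.

```python
def flatten(parts):
    """Join the text fragments returned by a text() query into one string."""
    return ''.join(parts).strip()


# Made-up values shaped like the lists that xpath('.../text()') returns
print(flatten(['Example topic A']))
print(repr(flatten([])))  # an empty result becomes an empty string
print(flatten(['Example ', 'topic B']))
```

Joining instead of indexing with [0] avoids an IndexError on rows where a cell is empty.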

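Storing the data

# 'a+' creates the file if it does not exist and appends to it otherwise;
# the explicit encoding keeps the Chinese text written consistently
with open('D:/老齐/微博.txt', 'a+', encoding='utf-8') as fp:
    for data in datas:  # datas is the list of rows from the previous step
        data_title = ''.join(data.xpath('td[2]/a/text()'))  # title
        data_rank = ''.join(data.xpath('td[1]/text()'))  # rank
        data_num = ''.join(data.xpath('td[2]/span/text()'))  # heat value
        print(data_rank, data_title, data_num, file=fp)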
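The combination of 'a+' mode and print(..., file=fp) means that every run of the script adds new lines to the end of the file instead of overwriting it. The sketch below shows this behaviour on a temporary file; the path and the written values are placeholders, not the article's data.

```python
import os
import tempfile

# Hypothetical scratch file so the demo does not touch a real path
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

for run in (1, 2):
    # 'a+' creates the file on the first pass and appends on the second
    with open(path, 'a+', encoding='utf-8') as fp:
        # print() separates the values with spaces and ends the line
        print(run, 'Example topic', 12345, file=fp)

with open(path, encoding='utf-8') as fp:
    lines = fp.read().splitlines()

print(lines)
```

If each run should start from an empty file instead, opening with mode 'w' is the usual alternative.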

Complete code

import requests
from lxml import etree

url = 'https://s.weibo.com/top/summary?Refer=top_hot&topnav=1&wvr=6'
# Send a browser-like User-Agent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
}
response = requests.get(url, headers=headers)
# print(response.text)  # uncomment to inspect the raw HTML
html = etree.HTML(response.text)
# One <tr> per hot-search entry
datas = html.xpath('//*[@id="pl_top_realtimehot"]/table/tbody/tr')
# 'a+' creates the file if it does not exist and appends to it otherwise
with open('D:/老齐/微博.txt', 'a+', encoding='utf-8') as fp:
    for data in datas:
        data_title = ''.join(data.xpath('td[2]/a/text()'))  # title
        data_rank = ''.join(data.xpath('td[1]/text()'))  # rank
        data_num = ''.join(data.xpath('td[2]/span/text()'))  # heat value
        print(data_rank, data_title, data_num, file=fp)
