Scraping Kuaishou Videos with Python 3

Background

We have a set of specific Kuaishou video page URLs, for example:

https://live.kuaishou.com/u/shengxue1111/3xwgehu7uyudyeq

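Opening it in a browser brings up the video's playback page.

Goal

Get the video's playback URL.

Approach

First, with the browser's developer tools open (F12), the returned page turns out to contain a JSON string at the very end.

When the same page is requested from code, however, that JSON string is missing. Meanwhile, the URL in the browser's address bar had changed to: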

https://live.kuaishou.com/u/shengxue1111/3xwgehu7uyudyeq?did=web_975948772fda54ca569800162f04e530

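This suggested a redirect was involved, so I cleared the cookies and cached files and requested the page again, which revealed what was going on.

The returned result also contains the actual MP4 playback URL.

On closer inspection, the first page request is redirected, and its response carries a Set-Cookie header.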
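
The cookies set on that first response can be sent back on later requests. Here is a minimal sketch, assuming the cookies are already available as a plain dict (for example from `response.cookies.get_dict()` in requests). The helper name `build_cookie_header` is illustrative only, and the sample `did` value is the one from the redirected URL above.

```python
def build_cookie_header(cookies):
    """Join a dict of cookies into a Cookie request-header value."""
    # Each pair is written as name=value, and pairs are separated by "; ".
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Illustrative cookie dict; a real one comes from the first page response.
sample = {"did": "web_975948772fda54ca569800162f04e530"}
cookie_header = build_cookie_header(sample)
print(cookie_header)
```

With requests, a `requests.Session` keeps cookies between calls on its own, so building the header by hand is only needed when it is set explicitly.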

The next steps follow directly from that, so here is the code:

# coding: utf-8
import json
import time

import requests

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
    'Host': 'live.kuaishou.com',
    'content-type': 'application/json',
}

# GraphQL query for a single video; playUrl is the field we are after.
FEED_QUERY = "query FeedQuery($principalId: String, $photoId: String) {\n  feedById(principalId: $principalId, photoId: $photoId) {\n    currentWork {\n      id\n      thumbnailUrl\n      poster\n      workType\n      type\n      useVideoPlayer\n      imgUrls\n      imgSizes\n      magicFace\n      musicName\n      caption\n      location\n      liked\n      onlyFollowerCanComment\n      relativeHeight\n      timestamp\n      width\n      height\n      counts {\n        displayView\n        displayLike\n        displayComment\n        __typename\n      }\n      user {\n        id\n        eid\n        name\n        avatar\n        __typename\n      }\n      expTag\n      playUrl\n      __typename\n    }\n    status\n    errMsg\n    __typename\n  }\n}\n"


def fetch_play_url(url):
    # The snippet originally ran inside a function that is not shown (it uses
    # `return`); it is wrapped here so that it can run on its own.
    # The last two path segments are the user ID and the video ID.
    author_id, video_id = url.split('?')[0].rstrip('/').split('/')[-2:]

    print(f"Requesting {url}")
    # First request: the page response sets the cookies, including did.
    response = requests.get(url, headers=headers)
    cookie = response.cookies.get_dict()
    if not cookie.get('did'):
        print(f"did cookie not found for {url}")
        return None
    time.sleep(3)

    # Send the cookies back as name=value pairs separated by "; ".
    request_headers = dict(headers)
    request_headers['cookie'] = '; '.join(f"{key}={value}" for key, value in cookie.items())
    request_headers['Referer'] = url + '?csr=true'
    params = {
        "operationName": "FeedQuery",
        "query": FEED_QUERY,
        "variables": {"principalId": author_id, "photoId": video_id},
    }
    # Second request: ask the GraphQL endpoint for the video's details.
    response = requests.post('https://live.kuaishou.com/m_graphql', headers=request_headers, json=params)
    try:
        j = json.loads(response.text)
        play_url = j['data']['feedById']['currentWork']['playUrl']
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, or one of the nested fields is missing or null.
        play_url = None
    if not play_url:
        print(f"Playback URL not found for {url}")
        return None
    print(play_url)
    return play_url
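
The code above reads the playback URL from the path data, feedById, currentWork, playUrl in the JSON response. Below is a small sketch of that lookup on its own, written so that a missing or null level yields `None` instead of an exception. The helper name `extract_play_url` and the sample body (including its example.com URL) are hypothetical, trimmed to just the fields on that path.

```python
import json


def extract_play_url(payload):
    """Return data.feedById.currentWork.playUrl, or None if any level is missing."""
    node = payload
    for key in ("data", "feedById", "currentWork", "playUrl"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    # Treat an empty string the same as a missing value.
    return node or None


# Hypothetical response body, trimmed to the fields on the path to playUrl.
sample_text = '{"data": {"feedById": {"currentWork": {"playUrl": "https://example.com/v.mp4"}}}}'
print(extract_play_url(json.loads(sample_text)))
```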

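In short: request the video page, take the cookies from its response, then request the JSON data. One thing to note is the structure of a Kuaishou link: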

https://live.kuaishou.com/u/shengxue1111/3xwgehu7uyudyeq

shengxue1111 is the user ID.

3xwgehu7uyudyeq is the ID of the specific video.
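
Splitting a link into those two IDs can be sketched with the standard library. This assumes every link follows the /u/<user ID>/<video ID> layout shown above; the function name `split_kuaishou_url` is made up for illustration.

```python
from urllib.parse import urlparse


def split_kuaishou_url(url):
    """Return (user_id, video_id) from a .../u/<user_id>/<video_id> URL."""
    # urlparse keeps the query string (e.g. ?did=...) out of .path.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 3 or parts[0] != "u":
        raise ValueError(f"unexpected URL layout: {url}")
    return parts[1], parts[2]


user_id, video_id = split_kuaishou_url(
    "https://live.kuaishou.com/u/shengxue1111/3xwgehu7uyudyeq"
)
print(user_id, video_id)
```

Using `urlparse` rather than plain string splitting means the redirected form of the link, with `?did=...` appended, yields the same two IDs.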
